A spam-Attack Detection and Prevention System
Management Overview
presented by
Royce Williams
April 24th, 2001
spamradar is a Perl program designed
to help ISPs and other large-volume mail servers detect, block, and report
incoming floods of unsolicited commercial email (UCE) and/or unsolicited bulk
email (UBE) - collectively known as "spam".
spamradar is unusual because it is
not content-based. It does not examine
the actual contents of any email messages.
Instead, it examines the behavior of remote mailservers while they are
connecting to the local mailserver, searching for patterns that are concurrent
indicators of an inbound spam attack.
spamradar is being released under the
GNU General Public License (see http://www.gnu.org/copyleft/gpl.html) in the
hope that it can be of general use. spamradar uses no proprietary
code. Its full source is available at http://www.tycho.org/spamradar/.
Note that the words SPAM® and Spam® (with the
specific case shown) are not used elsewhere in this document because they are
registered trademarks of Hormel Foods Corporation. The use of the word "spam" (all lower-case) is
deliberate because it is the common term for UCE and/or UBE. Any use of the words "SPAM" or
"Spam" elsewhere in this document is inadvertent and should be
considered a typographical error.
The purpose of this analysis was to find the optimal
way to solve a particular customer and business problem: the reduction of
incoming and outbound spam. The
analysis focused on an evaluation of both customer and resource impact,
searching for a solution to address these issues.
In particular, we felt that the solution needed to be:
· Flexible
· Fast
· Easy to use
· Easy to modify
The analysis addresses each of these needs.
Since most of the large companies that provide email
services (AOL, Yahoo, Hotmail, etc.) have their own internal proprietary
methods of reducing spam, this package is aimed towards the medium- and
small-sized ISPs, and small-to-medium-sized companies that run their own mail
servers.
spamradar is different from some
other anti-spam packages in that it does not attempt to examine the content of
the message itself to determine whether or not it is spam. Rather, it attempts to track and analyze
information outside the messages themselves (using information that is not
subject to most customer privacy regulations), and to intelligently use this
information to determine whether or not spam is likely to be coming from a
particular source. spamradar can act on a system-wide basis without invading customer
privacy.
Email is a very important and sensitive resource for
many people. Because of this, there are
a variety of different approaches to combating spam. spamradar’s features
are flexible enough to accommodate changes in policy, corporate philosophy, or
customer opinion and needs.
It is suggested that spamradar be used in conjunction with a customer-configurable
personal filtering system (such as Brightmail or various procmail filters) for
maximum spam reduction and flexibility.
· Perl 5.004 or higher
· Sendmail or Postfix mail server software
· Mail logging to a text file accessible by the machine that is running spamradar
Log analyzer
This is a Perl program implemented as a daemon to
constantly monitor raw mail logs, watching for spamlike activity. It can also be run in non-daemon mode,
reporting on spamlike activity to assist a systems administrator in the
investigation of mail problems. If
system resources are limited, many of the benefits of running as a daemon can
be achieved by arranging for spamradar
to be executed periodically from the Unix cron
facility.
Open-relay tester
Once spamradar has determined that a relay may be a spam source, it queues up the IP address of that remote mailserver to be tested for third-party relaying. If a relay has been recently tested, the test will be skipped.
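The queue-and-skip behavior described above can be sketched as follows. This is an illustration in Python, not spamradar's own Perl code; the names `queue_for_testing`, `mark_tested`, and the one-day `RETEST_INTERVAL` are assumptions made for the example.

```python
import time
from collections import deque

# Assumed value: how long a previous test result is considered fresh.
RETEST_INTERVAL = 24 * 60 * 60

last_tested = {}      # relay IP -> timestamp of its most recent test
test_queue = deque()  # relay IPs waiting to be tested


def queue_for_testing(ip, now=None):
    """Queue a suspect relay unless it is already queued or was tested recently."""
    now = time.time() if now is None else now
    tested_at = last_tested.get(ip)
    if tested_at is not None and now - tested_at < RETEST_INTERVAL:
        return False  # tested recently: skip
    if ip in test_queue:
        return False  # already waiting for a test
    test_queue.append(ip)
    return True


def mark_tested(ip, now=None):
    """Record that a relay test was just performed for this IP."""
    last_tested[ip] = time.time() if now is None else now
```

Keeping the timestamp separate from the queue means a relay that keeps misbehaving is retested once per interval rather than once per log line.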
Test-results analyzer
This component periodically checks the email inbox
that has been designated as the recipient of relay tests. It verifies that the message received
matches the one sent by checking the headers for the relay IP address and
matching a stored cookie with the cookie in the received message. Test results are then stored in the
database.
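A minimal sketch of the cookie check, written in Python for illustration only. The header name `X-Relay-Test-Cookie`, the `pending` table, and both function names are hypothetical; the overview does not say where spamradar places its cookie in the message.

```python
import re
import secrets

pending = {}  # cookie -> relay IP that the test message was sent through


def new_test(relay_ip):
    """Create and remember a random cookie for a test sent via relay_ip."""
    cookie = secrets.token_hex(8)
    pending[cookie] = relay_ip
    return cookie


def verify_test_message(raw_message):
    """Return the relay IP if the message carries a known cookie and its
    headers mention the matching relay IP; otherwise return None."""
    headers = raw_message.split("\n\n", 1)[0]
    match = re.search(r"^X-Relay-Test-Cookie:\s*(\S+)\s*$", headers, re.MULTILINE)
    if match is None:
        return None
    cookie = match.group(1)
    relay_ip = pending.get(cookie)
    if relay_ip is None:
        return None
    # Require the IP as a whole token, so 192.0.2.1 does not match 192.0.2.10.
    ip_pattern = r"(?<![\d.])" + re.escape(relay_ip) + r"(?![\d.])"
    if re.search(ip_pattern, headers) is None:
        return None
    del pending[cookie]  # each cookie confirms one test only
    return relay_ip
```

Requiring both the cookie and the relay IP guards against a stray or forged message being counted as a successful relay.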
Command-line utilities
While spamradar
is designed to run unattended and perform useful protection functions, it
requires tuning and is far from a perfect solution. It may be well-suited to a particular installation at a
particular time, but spammer tactics can change over time. spamradar’s
command-line options can be used to examine in closer detail any log entries
that are not recognized by spamradar
or are recognized incorrectly. They can
also be used to tune and monitor performance.
Data repository and data access module
This is a Berkeley DB database that contains
information about known mail relays, their testing timestamps, their behavior,
and their testing status. To maximize
portability, the database interface is abstracted into a separate module, Relaydata.pm.
To combine the speed of Berkeley DB hashes with their portability,
Relaydata.pm serializes each record into a single string when it is stored
and parses it back into individual data items when it is read. This happens
in a fashion transparent to the user.
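The serialize-to-one-string idea can be illustrated as below. This is a Python sketch, not the contents of Relaydata.pm: the field names, the `|` separator, and the plain dictionary standing in for the Berkeley DB hash are all assumptions.

```python
# Hypothetical record layout; the real field list is not given in this overview.
FIELDS = ("last_tested", "status", "failed_rcpts", "delivered")
SEPARATOR = "|"

store = {}  # stand-in for the on-disk hash: relay IP -> serialized record


def pack(record):
    """Flatten a record into one string, in a fixed field order."""
    return SEPARATOR.join(str(record[name]) for name in FIELDS)


def unpack(value):
    """Rebuild a record from its serialized form, restoring numeric fields."""
    record = dict(zip(FIELDS, value.split(SEPARATOR)))
    for name in ("last_tested", "failed_rcpts", "delivered"):
        record[name] = int(record[name])
    return record


def save(ip, record):
    store[ip] = pack(record)


def load(ip):
    return unpack(store[ip])
```

Callers only ever see `save` and `load`, which is what makes the storage format invisible to the rest of the program.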
Detect and track attempts to deliver to users that don't exist. Despite what
many people believe, the most common way for a spammer to guess your username
is exactly that: guessing. The other common approach is to purchase a
list of email addresses from another source.
A natural byproduct of both of these approaches is that the mail logs
will contain attempts to deliver to nonexistent users. If more than a few of
these come from a single remote mailserver, that server is probably a relay
being used by spammers and should be tested.
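As a rough illustration of this check, the sketch below counts failed deliveries per relay and nominates relays that exceed a threshold. It is written in Python and works on already-parsed events rather than raw log lines; the event shape, the `unknown_user` label, and the threshold of 5 are assumptions for the example.

```python
from collections import Counter

UNKNOWN_USER_THRESHOLD = 5  # assumed value; would be tuned per site


def suspects_by_unknown_users(events, threshold=UNKNOWN_USER_THRESHOLD):
    """events: iterable of (relay_ip, outcome) pairs taken from the mail log.
    Returns relays with more than `threshold` unknown-user attempts, sorted."""
    counts = Counter(relay for relay, outcome in events if outcome == "unknown_user")
    return sorted(relay for relay, n in counts.items() if n > threshold)
```

Note that a relay with many successful deliveries and only a couple of failures is not flagged by this check; that case is covered by volume tracking.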
Analyze email deliveries claiming to be from domains that don't exist
or don't have MX records. By claiming to come from
domains that don't exist, spammers try to obscure where they are really coming
from. The current best practice to
counter this is to block any incoming messages with "From:" fields
that use domains that don't exist, or that come from domains that don't have
any way to return the mail (missing MX records). spamradar takes this
one step further. A high number of
emails coming from a few relays and claiming to come from domains that don't
exist is a concurrent indicator of a spam attack.
spamradar watches for these
relays, summarizes and displays their activity, and can act when configured
thresholds are exceeded.
Make it easier
to manually hunt for spammers. While
actively fighting a spam attack, it is often handy to be able to manually
review the last few lines of the log, looking for suspicious activity. spamradar's
-m and -t options make it easy to
analyze the mail logs from the command line with a minimum of typing, and the
-o option lets you manually
test a host to see whether or not it is an open relay.
Track and
analyze successful deliveries. If a
spammer has a good list of your users, the number of unsuccessful attempts will
be relatively low, but the volume will be high. spamradar tracks the
number of successful deliveries as well as the number of delivery
attempts. Much like Kai's SpamShield
(http://spamshield.conti.nu/),
if a particular host is sending you 1500 messages per minute, it's probably
spamming you - and spamradar will
flag it as a possible spam source to be reported and tested.
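One way to picture the volume check is a sliding one-minute window per relay. The Python sketch below is illustrative only; the class name and the windowing approach are assumptions, and the limit is passed in rather than fixed because the overview gives 1500 messages per minute only as an example.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60


class DeliveryRateTracker:
    """Track recent delivery timestamps per relay and flag high-volume senders."""

    def __init__(self, limit_per_window):
        self.limit = limit_per_window
        self.recent = defaultdict(deque)  # relay IP -> timestamps inside the window

    def record(self, relay, timestamp):
        """Record one delivery; return True if the relay has reached the limit."""
        window = self.recent[relay]
        window.append(timestamp)
        # Discard deliveries that have aged out of the window.
        while window and window[0] <= timestamp - WINDOW_SECONDS:
            window.popleft()
        return len(window) >= self.limit
```

For instance, `DeliveryRateTracker(1500)` would flag a relay on its 1500th delivery within any 60-second span.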
Pay special
attention to mailing lists. Mailing lists are often
not well-maintained. It is common for
users to forget to unsubscribe from a mailing list before moving on to a new
email address. Ironically, this means
that the older and more popular a mailing list is, the more likely it is that a
number of your users will have previously subscribed to it and departed, making
it appear as though more and more usernames are being "guessed" by
the listserver.
To help counter this, hosts that have
"lyris", "list", "lists," or
"majordomo" in their names are not blocked by default. Other strings can also be added. Note that this has nothing to do with the
domain name used in the "From:" field of the email, but only deals
directly with the reverse-DNS name of the relaying host itself.
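The hostname exemption amounts to a substring test on the relay's reverse-DNS name. A minimal Python sketch follows; the function name is hypothetical, while the default strings are the ones listed above.

```python
# Default strings named in this overview; a site could add its own.
LIST_HOST_MARKERS = ["lyris", "list", "lists", "majordomo"]


def looks_like_list_server(reverse_dns_name, markers=LIST_HOST_MARKERS):
    """True if the relay's reverse-DNS name contains any list-server marker."""
    name = reverse_dns_name.lower()
    return any(marker in name for marker in markers)
```

Because this is a plain substring match, "list" already covers "lists", and unrelated hostnames that merely contain one of the strings would also be exempted, which is one reason the list is kept configurable.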
Test to see
whether or not a suspicious relay is an open relay. After detecting possible spam activity, spamradar can be configured to test the servers in question for
unauthorized third-party relaying. This
testing can also be manually performed with spamradar’s -o option.
If the server is truly an open relay, an email address of your choice
will receive the test message sent by spamradar.
Warn
administrators of open or blocked relays that they have been blocked. When a relay is blocked, spamradar
will automatically send an email to the Postmaster user at that host, both by
name and by IP address. It will also
optionally send an email to them via the abuse.net email facility, a
centralized database of known good mail-abuse contacts. This service requires a free
registration. More information about
abuse.net can be found at http://www.abuse.net/.
Optionally
take advantage of existing spam resources.
There are a number of spam resources on the Internet that are designed
to allow mailserver sysadmins to centralize their experience with spammers and
open relays. The more sites that
contribute to these resources, the harder it becomes for spammers to deliver
their messages unimpeded.
For example, some third-party services allow you to
submit a host for testing as a possible open relay. The Open Relay Behaviour-modification System (or ORBS) is one of
the most widely used systems. spamradar
can automatically forward confirmed relays to any such service that accepts
submissions of open relays by email. Be aware
that some of these services do not accept submissions without some sort of
registration. See http://www.orbs.org/ for
more information.
Users of spamradar
are strongly encouraged to take advantage of shared public databases such as
ORBS. The spammer most likely to slip
beneath spamradar's notice is the
spammer that sends a low number of spam messages per hour, claiming to come
from domains that really exist, using only a very few open relays at a time,
using real domains that have MX records, attempting delivery to usernames in
random order, and using "From:" usernames that look legitimate. In other words, spammers escape spamradar by becoming low-volume
spammers. The only exception to this is
the spammer that is smart enough to spam 10,000 sites simultaneously in an
interleaved and distributed fashion. It
is for this reason that you are encouraged to use these features of spamradar to share your experience with
others.
Detect
multiple sendings from multiple relays. If you receive five emails from five different mail servers on
diverse networks all claiming to be coming from "bill54332@loja.net",
there is an increased chance that those mail servers are open relays. spamradar
takes this into account and will be more likely to report such servers as
possible spam sources.
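The same-sender, many-relays signal can be sketched as a grouping step. This Python example is illustrative; the event shape and the `min_relays` default of 5 (taken from the example above) are assumptions, and it does not attempt the "diverse networks" part of the test.

```python
from collections import defaultdict


def senders_seen_via_many_relays(events, min_relays=5):
    """events: iterable of (relay_ip, sender_address) pairs from the mail log.
    Returns {sender: sorted relay list} for senders seen via at least
    `min_relays` distinct relays."""
    relays_by_sender = defaultdict(set)
    for relay, sender in events:
        relays_by_sender[sender.lower()].add(relay)
    return {
        sender: sorted(relays)
        for sender, relays in relays_by_sender.items()
        if len(relays) >= min_relays
    }
```

Every relay returned for a flagged sender becomes a candidate for open-relay testing.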
Concise. spamradar is designed
to show you only as much information as you need.
For example, during normal operation, if you (or spamradar) have spotted a spam source and blocked it, spamradar will not display this host in
its output by default (though all relays can be displayed with the -a option).
spamradar will also not
display statistics for hosts that you have marked as never to be blocked (in
the spamradar.dontblock file). This
can save valuable time when trying to track down spam-related problems.
Flexible. spamradar can
understand sendmail and Postfix logging formats. It can generate machine-readable output for processing by other
programs. You can define hosts that you
would like to always block, never block, or temporarily block for a specified
period of time. spamradar can run as a command-line tool or as a daemon. It can use an external configuration file in
a user-specified location. It has
multiple levels of debugging to allow you to examine the spam-detection process
in greater detail. spamradar can be configured to automatically block known open
relays, or to simply report suspicious relays and defer testing to a human
operator. In this way, spamradar can change as policies and
opinions change.
Portable and
self-contained. Written for Perl 5.0x, spamradar should be useable on any
platform running sendmail or Postfix that logs to a text file, and uses no platform-specific calls and very
few external programs. The only exceptions to this are the external
“tail” command, which is so much faster than Perl implementations that have
been tested that it is still an external call, and the single-relay-testing
utility rlytest.pl by Chip Rosenthal.
With some additional research and testing, it is believed that both features
can be internalized in future revisions.
By keeping these external dependencies to a minimum, spamradar can be easily ported to other
systems.
Open source. You may need to modify spamradar
to fit your own needs. With the full
source available, you will be able to do so.
If you modify spamradar in a
way that might benefit others, you are encouraged to contribute your changes
back to the main source tree.
Persistent. spamradar remembers
which servers you have tested, which servers you have not tested, and whether
or not a server has been blocked. When
a host has been blocked as a possible spam source, and then unblocked later, spamradar can remember the previous
block and will be more likely to block that host in the future. This maximizes the use of resources dedicated
to spamradar’s processing.
Efficient. On a Sun Ultra II with two 300 MHz processors, 512 MB of RAM, and
its mail logs stored on a NetApp mounted via NFS, spamradar analyzes mail logs at a rate of about 1000 lines per CPU
second. On average, the mail systems
generate about 1100 log records per real-time minute, or roughly 18 per second, which is under 2% of what one CPU can analyze. This should provide ample room for future growth.
· Make sure that the email address used as the source of test messages is
directed to someone who reads it on a regular basis.
This is especially important during early stages of implementation.
· Take advantage of spamradar’s export
functions to populate a mailserver lookup table that can be used by the
mailserver to refuse email.
· For organizations with multiple mailservers running, spamradar can be put to
best use by forwarding all mail logging to a centralized, shared log (via the
Unix syslog facility). This will enable spamradar to monitor patterns across
all servers. The list of rejected relays should also be shared, perhaps on a
centralized NFS-mounted share or pushed out with ssh and rsync.
· Use a hard-to-guess username for the recipient of test messages, to minimize
attempts to exploit the automated testing system.
· Run spamradar manually and note which
servers it detects as possibly spamming.
If domains you recognize show up frequently (like aol.com), add them to
the spamradar.ignore file.
· Syslogging as an alternative to writing directly to a log file would be more
Unix-friendly.
· A particularly thorny and interesting improvement: the abstraction of the
mail-log parsing into a separate module, so that separate modules for handling
other log formats could be easily developed.
· The two remaining system calls (tail and rlytest) should be converted to
internal routines. Unfortunately, the
current version of the File::Tail Perl module is considerably
less efficient at extracting the last 8000 lines of a file than is the external
tail program. Internalizing rlytest might be less difficult.
· If users bounced their spam emails to another mailbox (separate from the
TEST_RECIPIENT mailbox), spamradar could be taught to pop these messages,
extract relay IPs from the headers, and test them.
· Relaydata.pm could be modified to
emulate the DBI interface so that other DBI modules could be used in its place.
· The current version of spamradar does
not collect statistics or act on error messages generated by attempts to
deliver mail that are already denied by another mechanism (such as RSS or a
local “reject” list). There are stubs
in the code where these would fit, but they have not yet been implemented.
· The spamradar.allow and spamradar.ignore files currently do not
understand networks expressed in Classless Inter-Domain Routing (CIDR)
notation. This would be useful for
autonomous systems that have subnets of sizes between the standard classful
networks.
· The current method of testing for guessed-username randomness is to examine only
the first character of each guessed username.
There are more sophisticated statistical methods to calculate how
different one string is from another string.
If the calculation cost is not too high, this might be worth pursuing,
perhaps even as a way to detect spam that is being successfully delivered.
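For the CIDR item in the list above, the matching logic itself is small. The sketch below uses Python's standard `ipaddress` module purely to illustrate the behavior; the one-entry-per-line format with `#` comments is an assumption, not the actual layout of the spamradar.allow or spamradar.ignore files.

```python
import ipaddress


def parse_entries(lines):
    """Parse entries such as '192.0.2.0/26' or a bare address into networks.
    Blank lines and '#' comments are skipped (assumed file conventions)."""
    networks = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            # strict=False tolerates entries whose host bits are set.
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_listed(ip, networks):
    """True if the address falls inside any listed network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)
```

A /26 entry covers 64 addresses, a size that cannot be expressed with classful network boundaries.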
From searching the mail forums and the Internet, it
is clear that most small- to medium-sized ISPs and companies are not writing
their mail logs to an SQL database.
Therefore, the feature that read from SQL databases was discarded before
implementation.
Performing ARIN and domain WHOIS lookups for each
host resulted in an undesirable processing delay and increase in load. A centralized database of contact
information at abuse.net, coupled with an automatic attempt to deliver to
postmaster@domain.com and postmaster@192.168.42.73,
achieves a similar effect with much less impact.
Further research into the requirements for ORBS
submission has raised questions as to whether relay-testing results from an
automated service such as spamradar
will be accepted by ORBS. An email
inquiring about the feasibility of this has been sent to ORBS, but a response
has not yet been received.
In its current incarnation, spamradar calls Chip Rosenthal’s rlytest.pl
to perform its individual
relay testing. This is primarily because
Chip’s implementation is so well-designed and optimized that it has proven
difficult to improve upon. Because rlytest.pl
is not packaged with spamradar, users will need to download
it from http://www.unicom.com/sw/rlytest/.
What is spam? - Spam is usually defined
as bulk or commercial email that is unsolicited. It is often called UCE (Unsolicited Commercial Email) or UBE
(Unsolicited Bulk Email).
Why do I get so much spam? Unfortunately, the laws of economics would suggest that the only
reasons that spam is so prevalent are that it works, and that it is very
inexpensive for the sender. Quite a
large number of people must be responding to spam, because the spammers keep
sending it. Because spam is often
relayed through improperly configured mail servers administered by unsuspecting
administrators, it is often impractical to try to trace back to where the spam
came from (more on this later).
Why are spammers hard to stop? Spammers
use a vulnerability in some mail servers to transfer the cost of delivering the
spam to someone else. Because this
vulnerability is somewhat abstract, many mail servers are vulnerable for years
without the problem being noticed by their system administrators. This vulnerability is known as open relay or third-party relay.
In the past, many mail server software packages
shipped with no limit on who could use them to relay mail. System administrators had to deliberately
deactivate this feature. In the early
days of the Internet, this was harmless and even promoted inter-node
communication. On today's Internet,
however, this practice is ill-advised and even dangerous, because huge floods
of incoming spam can be as interruptive to the normal function of a mail server
as a denial-of-service attack can be.
Most mature mail server products ship with no third-party relaying
allowed, and the system administrator must enumerate the networks or domain
names that are allowed to relay through the server. However, since old software is often inexpensive or has licensing
that is no longer vigorously enforced, it is quite common for new installations
of older mail server packages to be deployed daily all over the world.
What is an open relay? An open relay is a mail server that allows connections from any
network to relay mail through it to any other network.
Normal mail servers only allow their own internal
users to relay mail to addresses outside their systems. This makes some amount of sense - the big
mail servers pass the mail between each other and are connected to the Internet
at all times, and there is a cost to maintaining these mail servers and providing
enough capacity to handle the load.
This cost is incurred by the ISP and is passed on to the customer. By limiting who can relay mail to only one's
customers, any costs associated with maintaining resources sufficient to handle
that load are tied directly to revenue.
However, if anyone is allowed to relay through your mailservers, all of
the associated costs increase without a corresponding increase in revenue.
For example, if I am an Internet Alaska customer,
and if I set my outgoing mail server to a mail server that belongs to Internet
Alaska, I can freely send email to bob@alaska.net, bob@gci.net, or bob@aol.com.
However, if I am an Internet Alaska customer and I set my
outgoing mail server to a GCI mail server, I should only be able to send
messages to bob@gci.net. As an
Internet Alaska customer, I should not be able to send email through GCI's mail
servers to AOL. That is why this type
of relaying is often called third-party relaying.
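The relay rule in this example reduces to two questions: is the mail for one of our own domains, and if not, is the sender one of our own customers? The Python sketch below illustrates that decision from the point of view of a GCI mail server; the function name and the customer address range (a documentation-only network) are assumptions.

```python
import ipaddress

# Assumed values for illustration: the server's own domains and customer networks.
LOCAL_DOMAINS = {"gci.net"}
CUSTOMER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def may_relay(client_ip, recipient):
    """Decide whether to accept a message from client_ip addressed to recipient."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain in LOCAL_DOMAINS:
        return True  # mail for our own users is accepted from anywhere
    # Mail for other domains is relayed only on behalf of our own customers.
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in CUSTOMER_NETWORKS)
```

An open relay behaves as if this function always returned True, which is exactly what the relay test probes for.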
How do spammers get my email address? From a spammer's perspective, the more people that the spammer can reach, the better. This is why, if you are a spammer, it is vital to get a list of known good email addresses. There are a few ways that a spammer can build a good user list:
· Build one by randomly guessing usernames. Spammers use this tactic more
often than you might think. spamradar's first goal was to figure
out which relays were trying to guess usernames and then display them.
· Steal a public one. This is becoming less common.
Fewer and fewer ISPs are exposing their user list (most often as their
/etc/passwd file) to the public. If a
spammer has a recent list of your users, the amount of apparent username
guessing will be relatively low, making it harder to detect that the messages
are spam.
· Harvest them from public Internet sources. These include public Usenet
archives, web pages, and even chat rooms and IRC.
· Purchase from and/or exchange with other spammers.
What is an MX record? An MX or Mail eXchanger record is a type of DNS record attached
to a particular domain. Performing a
DNS query about a particular domain and specifying the “MX” record type will
show the list of mail servers that are listed as the final destination for any
email destined for that domain. For
example:
royce@vegan:~>> host -t mx tycho.org
tycho.org mail is handled (pri=20) by mailhost.alaska.net
tycho.org mail is handled (pri=10) by smtpgate.alaska.net
This is a manual way to perform exactly the same
lookup that a mailserver performs whenever it has to deliver mail. If someone is trying to send email to billy@tycho.org,
the mailserver will look up the MX records for the tycho.org
domain, pick the one with the lowest priority value (in this case, 10), look up the IP
address of that server, and try to deliver the message to it.
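The selection step can be shown with the two records from the lookup above. This Python sketch only orders records that have already been retrieved; it performs no DNS query, and the function name is hypothetical.

```python
def delivery_order(mx_records):
    """mx_records: list of (priority_value, hostname) pairs.
    Returns hostnames in the order a mailserver would try them,
    lowest priority value first."""
    return [host for _, host in sorted(mx_records)]


# The two MX records shown for tycho.org in the lookup above.
tycho_mx = [(20, "mailhost.alaska.net"), (10, "smtpgate.alaska.net")]
first_choice = delivery_order(tycho_mx)[0]  # "smtpgate.alaska.net"
```

The higher-numbered record acts as a fallback: it is tried only if the first server cannot be reached. As a simplification, records with equal values are ordered alphabetically here.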
The reason that this is a spam issue is that a spammer can configure his or her spam software to use a “From:” address that uses a domain that has no MX record, like so:
royce@vegan:~>> host -t mx delphi.net
Host not found, try again.
This is a domain that exists (its name is
registered) but it has no MX records.
If a message claiming to come from sadie@delphi.net
is successfully delivered, then the spammer has obscured the origin of the
email and made it more difficult to respond.
It is rare for spammers to use email addresses that actually exist.
Many mail server software packages (including sendmail) can be configured to reject mail claiming to come from domains that do not have MX records. spamradar takes advantage of the fact that the mailserver has to perform this lookup by reading the relevant record from the logs and nominating that server for testing.
Why is spam bad? - For end users, spam is
bad because they must waste their time wading through it to process their
legitimate mail. For some people, it is
quite easy to just hit the Delete key.
For others, especially those with young children, this can be a very
frustrating issue.
For system administrators,
however, the problem is serious for different reasons. The sheer volume of spam traversing the
Internet has rapidly increased to the point that capacity planning and
anti-spam activities take up considerable time and resources. The company incurs a cost for which it will
never be reimbursed. This is the
equivalent of the mailman being forced to deliver any unstamped letter that you
drop in the mailbox, whether it is one envelope or ten thousand.
Why is it called "spamradar"? - Two reasons. First, spamradar
is designed as an early-warning system that can both detect incoming trouble
and help you to act upon it, so the “radar” simile seemed appropriate. Additionally, the name didn't come up in any
of the search engines that were tried. :-)