UNIVERSITY OF SOUTH CAROLINA Department of Computer Science and Engineering
An Effective Defense Against Spam Laundering
Mengjun Xie, Heng Yin, Haining Wang
Presented by Dustin Christmann
March 4, 2009
Feb 24, 2016
Outline
• Introduction
• Spam Laundering
• Anti-Spam Techniques
• Proxy-Based Spam Behavior
• DBSpam
• DBSpam Evaluation
• Potential Evasions
Introduction
What is spam?
Classic definition: a canned, precooked meat product made by the Hormel Foods Corporation, introduced in 1937. “SPAM” stands for “SPiced hAM.”
Modern definition: the abuse of electronic messaging systems to send unsolicited bulk messages indiscriminately.
Introduction
So how did we get from one definition to the other?
A 1970 Monty Python sketch entitled “Spam.”
Spam Laundering
[Diagram labels: MTA, email relay; proxy, MTA]
Anti-Spam Techniques
Three main categories:
1. Recipient-oriented techniques
2. Sender-oriented techniques
3. HoneySpam
Recipient-oriented Techniques
Two main categories:
1. Content-based techniques
2. Non-content-based techniques
Content-Based Techniques
• Email address filters
• Heuristic filters
• Machine-learning-based filters
Non-content-based Techniques
• DNSBLs
• MARID
• Challenge-Response
• Tempfailing
• Delaying
• Sender Behavior Analysis
Sender-oriented Techniques
• Usage regulation
• Cost-based approaches
HoneySpam
• Based on honeyd
• Set up
  – Fake web servers
  – Fake open proxies
  – Fake relays
• Log the users of these fake servers as spam sources
Proxy-based Spam Behavior
Normal email transmission
[Diagram labels: MTA, router, corporate / campus / home network]
Proxy-based Spam Behavior
Proxy-based spam
[Diagram labels: MTA, router, corporate / campus / home network]
Connection Correlation
• One-to-one mapping between upstream and downstream connections
• In normal email transmission, there is only one connection
• Problems
  – Upstream encryption
  – Overhead
  – Timing
Packet Symmetry
• Message symmetry
  – SMTP message from downstream connection results in TCP message to upstream connection
• Packet symmetry
  – One packet from downstream connection results in one packet to upstream connection
  – Exceptions
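The packet-symmetry idea can be illustrated with a short sketch. This is not DBSpam's implementation: it assumes packets have already been reduced to arrival timestamps (in seconds), and the function name `count_symmetric` and the 0.1-second window are hypothetical choices made only for illustration.

```python
from bisect import bisect_right

def count_symmetric(reply_times, outbound_times, window=0.1):
    """Count SMTP reply packets followed by exactly one outbound TCP
    packet within `window` seconds, i.e. a 1:1 packet symmetry.

    The timestamp representation and the window size are assumptions.
    """
    outbound_times = sorted(outbound_times)
    matched = 0
    for t in reply_times:
        first_after = bisect_right(outbound_times, t)           # outbound packets at or before t
        past_window = bisect_right(outbound_times, t + window)  # outbound packets up to t + window
        if past_window - first_after == 1:
            matched += 1
    return matched
```

A reply followed by no outbound packet, or by two or more, is treated here as uncorrelated; this matches the 1:0, 1:2 and 1:3 cases a proxy could produce by delaying or fragmenting.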
TCP Correlation Example
DBSpam
Goals:
1. Fast detection of spam laundering with high accuracy
2. Breaking spam laundering via throttling or blocking after detection
3. Support for spammer tracking and law enforcement
4. Support for spam message fingerprinting
5. Support for global forensic analysis
Deployment of DBSpam
• At a network vantage point where it can monitor the bi-directional traffic
Single-homed network:
Deployment of DBSpam
Multi-homed network:
Design of Spam Laundering Detection
• With proxy-based spam transmission, number of incoming SMTP reply packets = number of outgoing TCP packets
• Possible for this to occur with normal traffic, but very seldom
• Sequential Probability Ratio Test (SPRT) is used
SPRT
• Can be viewed as a one-dimensional “random walk” starting between two boundaries
  – One boundary defines “spam connection”
  – Other boundary defines “not a spam connection”
SPRT
• Each observation pushes the walk in one direction or the other
  – Observation of correlated SMTP-TCP packets pushes walk toward “spam connection”
  – Observation of no correlation pushes walk toward “not a spam connection”
• When the walk hits either boundary, the test ends
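The random walk described above can be sketched as a standard Wald SPRT over yes/no observations. This is a minimal illustration rather than DBSpam's code: the function name `sprt` is hypothetical, θ1 = 0.99 and β* = 0.01 are taken from the slides, and θ0 = 0.2 and α* = 0.01 are assumed values chosen only to make the example concrete.

```python
import math

def sprt(observations, theta0=0.2, theta1=0.99, alpha=0.01, beta=0.01):
    """Run a Wald SPRT over a sequence of booleans (True = correlated).

    Returns (decision, number_of_observations_used).
    theta0 and alpha defaults are assumptions, not values from the slides.
    """
    upper = math.log((1 - beta) / alpha)   # boundary for "spam connection"
    lower = math.log(beta / (1 - alpha))   # boundary for "not a spam connection"
    step_up = math.log(theta1 / theta0)                # push from a correlated observation
    step_down = math.log((1 - theta1) / (1 - theta0))  # push from an uncorrelated observation
    walk = 0.0
    for n, correlated in enumerate(observations, start=1):
        walk += step_up if correlated else step_down
        if walk >= upper:
            return "spam", n
        if walk <= lower:
            return "not spam", n
    return "undecided", len(observations)
```

With these assumed parameters, an unbroken run of correlated observations reaches the upper boundary after three steps, and a run of uncorrelated observations reaches the lower boundary after two.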
SPRT
• Average number of required observations to reach a determination depends on four variables:
  1. α* (the desired probability of false positives)
  2. β* (the desired probability of false negatives)
  3. θ1 (the probability of observing correlation on a spam connection)
  4. θ0 (the probability of observing correlation on a normal connection)
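How the four variables combine can be seen from Wald's standard approximation for the expected sample size of an SPRT on yes/no observations. The sketch below applies that textbook formula; it is not taken from the slides, the function name `expected_observations_h1` is hypothetical, and it treats α* and β* as the actual error rates, which is itself an approximation.

```python
import math

def expected_observations_h1(theta0, theta1, alpha, beta):
    """Wald's approximation of E[N | H1] for a Bernoulli SPRT.

    Numerator: expected final position of the walk when H1 is true.
    Denominator: expected size of one step when H1 is true.
    """
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    numerator = (1 - beta) * upper + beta * lower
    step = (theta1 * math.log(theta1 / theta0)
            + (1 - theta1) * math.log((1 - theta1) / (1 - theta0)))
    return numerator / step
```

Under this approximation, moving θ0 closer to θ1 or lowering α* both increase the expected number of observations.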
SPRT
E[N|H1] vs. θ0 and α* (θ1 = 0.99, β* = 0.01)
SPRT Detection Algorithm
Noise Reduction
• Maintain a set of external IP addresses that appear for each time window
• In M consecutive time windows, single out the external IP addresses that appear at least K times
• Can further reduce the incidence of false positives dramatically, depending on the selection of M and K
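The K-out-of-M rule above can be sketched as follows. This is an illustrative reading of the slide, not DBSpam's implementation: the class name `NoiseFilter` and its method are hypothetical, and the sketch assumes each time window is summarized as the set of external IP addresses flagged in it.

```python
from collections import Counter, deque

class NoiseFilter:
    """Report external IPs flagged in at least K of the last M time windows."""

    def __init__(self, m, k):
        self.k = k
        self.windows = deque(maxlen=m)  # the oldest window drops out automatically

    def add_window(self, flagged_ips):
        """Record one window's flagged IPs and return those seen >= K times."""
        self.windows.append(set(flagged_ips))
        counts = Counter(ip for window in self.windows for ip in window)
        return {ip for ip, seen in counts.items() if seen >= self.k}
```

For example, with (M, K) = (3, 2), an address flagged in only one window is suppressed, while an address flagged in two of the last three windows is reported.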
Noise reduction
DBSpam Evaluation
• Evaluation at College of William & Mary
• Two off-campus PCs as spam sources
• Two PCs in different campus subnets running SOCKS and HTTP proxies
• Spam “sink” in dark net
• Traces run in two different months
• N-* includes no spam traffic
• S-*-C encrypted spam, S-*-A and S-*-B unencrypted spam
DBSpam Evaluation
SPRT Detection Time
Trace    N = 6           N = 11        N ≥ 16
S-1-A    970 (100%)      0             0
S-1-B    5019 (96.9%)    139 (2.7%)    21 (0.4%)
S-1-C    2245 (92.8%)    169 (7.0%)    6 (0.2%)
S-2-A    433 (99.1%)     3 (0.7%)      1 (0.2%)
S-2-B    4298 (94.7%)    198 (4.4%)    40 (0.9%)
S-2-C    1758 (98.9%)    16 (1.0%)     3 (0.1%)
DBSpam Evaluation
Distribution of N|H0
DBSpam Evaluation
CDF of Detection Time for SPRT
DBSpam Evaluation
Accuracy of SPRT
Attribute           S-1-A     S-1-B     S-1-C     S-2-A     S-2-B     S-2-C     N-1       N-2
Detection           970       5179      2420      437       4536      1777      66        2368
True Positives      966       5108      2369      320       3510      1558      -         -
False Positives     4         71        51        117       1026      219       66        2368
True Negatives      290889    1156085   596979    1634307   8895993   4266100   687390    15941150
FP/(FP+TN)          0.0014%   0.0061%   0.0085%   0.0072%   0.012%    0.0051%   0.0096%   0.015%
Spam Connections    958       570       324       329       1351      969       -         -
Missed Connections  8         2         0         6         27        13        -         -
Missed Conn. Ratio  0.8%      0.4%      0         1.8%      2.0%      1.3%      -         -
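The two ratio rows follow directly from the counts in the table. As a small worked example (the function names are hypothetical; the inputs are the S-1-A and N-2 columns):

```python
def false_positive_rate(false_positives, true_negatives):
    """FP / (FP + TN), as a fraction."""
    return false_positives / (false_positives + true_negatives)

def missed_connection_ratio(missed, spam_connections):
    """Missed spam connections divided by all spam connections, as a fraction."""
    return missed / spam_connections

# S-1-A: 4 false positives, 290889 true negatives, 8 of 958 connections missed
s1a_fp_percent = 100 * false_positive_rate(4, 290889)
s1a_missed_percent = 100 * missed_connection_ratio(8, 958)

# N-2: 2368 false positives, 15941150 true negatives
n2_fp_percent = 100 * false_positive_rate(2368, 15941150)
```

Rounded to the precision used in the table, these give 0.0014% and 0.8% for S-1-A and 0.015% for N-2.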
DBSpam Evaluation
Accuracy of SPRT after noise reduction
         (M,K)
Trace    (3,2)     (4,3)     (5,3)     (5,4)
S-1-A    0/188     0/138     0/124     0/110
S-1-B    0/162     0/126     0/103     0/103
S-1-C    0/194     0/150     0/124     0/123
S-2-A    0/65      0/36      0/52      0/27
S-2-B    13/335    3/243     4/216     0/186
S-2-C    0/193     0/124     0/135     0/94
N-1      0/0       0/0       0/0       0/0
N-2      7/7       1/1       2/2       0/0
DBSpam Evaluation
Resource Consumption
Trace    CPU Util    CPU Time    Pps       Peak Mem
S-1-A    36.3%       9.0s        430283    2.2 MB
S-1-B    37.7%       9.8s        426384    1.6 MB
S-1-C    24.0%       9.3s        484875    1.2 MB
S-2-A    58.0%       36.8s       327076    11.9 MB
S-2-B    84.3%       109.2s      241965    10.5 MB
S-2-C    57.1%       78.6s       332989    2.8 MB
N-1      21.7%       51.1s       478171    5.6 MB
N-2      32.1%       789.9s      376925    8.4 MB
Potential Evasions
• Fragmenting SMTP replies at the proxy
  – Change the 1:1 packet symmetry into 1:2 or 1:3
• Inserting random delays at the proxy
  – Randomly change the 1:1 packet symmetry into 1:0 or 1:2
Strengths
• Simple to implement
• Moves spam detection closer to source, reducing network traffic
• Thwarts encryption
• Detects proxy-based spam quickly
• Few false positives
Weaknesses
• Easy to evade by breaking packet symmetry
• Can be thwarted by short SMTP dialogs
• Must be installed at ISP edge
• Too resource-intensive for embedded systems