Shades of Grey: A Closer Look at Emails in the Gray Area
Jelena Isacenkova, Davide Balzarotti
Nov 01, 2014
June 23, 2014 Eurecom 2
Evolution of Spam
[Timeline figure: spam rate (0% to 100%) plotted against spamming techniques and countermeasures]
1994, 1997, 1998
- Lawyers Canter and Siegel commercial spam scandal
- Abuse of dynamic dial-up IP addresses
- Message classifiers (Bayesian)
- RBLs
2002, 2003 (spam rate marks: 9%, 40%)
- Release of “Ratware” spamming tools: DarkMailer, SenderSafe
- Open relays for proxying spam
- Appearance of viruses automatically downloading email lists
- Directive 2002/58 on Privacy and Electronic Communications
- CAN-SPAM Act of 2003
2004, 2007, 2008, 2009-2012 (spam rate marks: 72%, 85%, 68%)
- First botnets: Bagle, Bobax
- Distributed spamming tool: Reactor Mailer
- Spammers got sentenced
- Srizbi takedown
- 7 botnet takedowns
Email Categories
SPAM: Botnet spam, 419 scam, Phishing, Targeted Email Attacks, Spear Phishing, Blackhole Spam, Snowshoe Spam
GRAY: Newsletters, Notifications, Customer Prospecting, Commercial ads
HAM: Personal User Emails
Gmail Spam folder
- Within our study, users checked 5-6 messages per day
- 1.5% of harmful spam emails had a malicious attachment
How significant is the gray category?
Gray Category in 2007
SPAM: Botnet spam, 419 scam, Phishing, Targeted Email Attacks, Spear Phishing, Blackhole Spam, Snowshoe Spam
GRAY: Newsletters, Notifications, Customer Prospecting, Commercial ads
HAM: Personal User Emails
“Most misclassified ham messages are advertising, news digests, … [that] represent a small fraction of incoming mail, ... [which] filters find more difficult to classify.”
- Cormack & Lynam, “Online Supervised Spam Filter Evaluation”, 2007
Gray Category in 2012
SPAM: Botnet spam, 419 scam, Phishing, Targeted Email Attacks, Spear Phishing, Blackhole Spam, Snowshoe Spam
GRAY: Newsletters, Notifications, Customer Prospecting, Commercial ads
HAM: Personal User Emails
“49% of consumers subscribe to 1-10 brands”
- Direct Marketing Association
“70% of 'this is spam' are actually legitimate newsletters, offers or notifications”
- 2012, ReturnPath
“Graymail emails represent 50% of all inbox traffic”
- 2012, Hotmail
“Graymail – the source of 75% of all spam complaints”
- 2012, Hotmail
Selecting a gray email dataset
Challenge-Response (CR) filtering
[Figure: CR filtering diagram with Ham and Spam labels]
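The slides name challenge-response filtering without spelling out the flow, so the following is only a minimal sketch of a commonly described CR design, not the system used in the study. It assumes that mail from known senders is delivered or dropped directly, while mail from unknown senders is held until the sender solves a challenge. The class name, method names and addresses are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ChallengeResponseFilter:
    """Toy challenge-response filter (all names are hypothetical)."""
    whitelist: set = field(default_factory=set)     # senders treated as ham
    blacklist: set = field(default_factory=set)     # senders treated as spam
    quarantine: dict = field(default_factory=dict)  # sender -> held messages

    def receive(self, sender: str, message: str) -> str:
        """Return 'inbox', 'dropped', or 'challenged' for one message."""
        if sender in self.whitelist:
            return "inbox"
        if sender in self.blacklist:
            return "dropped"
        # Unknown sender: hold the message; a real system would now
        # send a challenge (for example a CAPTCHA link) back to the sender.
        self.quarantine.setdefault(sender, []).append(message)
        return "challenged"

    def solve_challenge(self, sender: str) -> list:
        """Sender solved the challenge: whitelist them, release held mail."""
        self.whitelist.add(sender)
        return self.quarantine.pop(sender, [])
```

Under this assumed design, the messages sitting in the quarantine are the ones that are neither clearly ham nor clearly spam.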
Gray email analysis
Identification and classification of campaigns
- Grouping emails into campaigns
- Evaluation of email headers similarity per campaign
- Classification: LEGITIMATE or SPAM
Techniques named on the slide: N-grams, exact string matching
Features:
- Campaign sender consistency and geo-distribution
- Delivery statistics (rejections)
- CAPTCHAs solved
- Bulk headers
Limitation: only email header information was used
Results:
- False Positives: 0.9%
- False Negatives: 8.6%
- Classifier uncertainty zone: 6.4%
- Split between the two classes shown on the slide: 18% and 82%
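The grouping step can be illustrated with a small sketch. The slides mention N-grams and sender consistency but not the exact procedure, so everything below is an assumption: word 3-grams are taken from the subject line, two emails join the same campaign when they share at least one n-gram, and sender consistency is the share of a campaign's emails sent by its most frequent sender. The dictionary keys `subject` and `sender` and all function names are hypothetical.

```python
from collections import defaultdict


def word_ngrams(text, n=3):
    """Return the set of word n-grams of a subject line (lowercased)."""
    words = text.lower().split()
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def group_into_campaigns(emails, n=3):
    """Group emails whose subjects share at least one word n-gram.

    `emails` is a list of dicts with hypothetical keys "subject" and "sender".
    Returns a list of campaigns, each a sorted list of email indices.
    """
    parent = list(range(len(emails)))  # union-find forest over email indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # n-gram -> index of the first email containing it
    for idx, email in enumerate(emails):
        for gram in word_ngrams(email["subject"], n):
            if gram in first_seen:
                parent[find(idx)] = find(first_seen[gram])
            else:
                first_seen[gram] = idx

    campaigns = defaultdict(list)
    for idx in range(len(emails)):
        campaigns[find(idx)].append(idx)
    return sorted(campaigns.values())


def sender_consistency(emails, campaign):
    """Fraction of a campaign's emails sent by its most common sender."""
    counts = defaultdict(int)
    for idx in campaign:
        counts[emails[idx]["sender"]] += 1
    return max(counts.values()) / len(campaign)
```

A consistency close to 1 would indicate a single stable sender, while a low value would indicate that many different senders carry the same campaign. How such a feature is weighted in the actual classifier is not stated on the slides.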
Refinement with Graph Analysis
SPAM: 16%, UNCERTAIN: 7%, LEGITIMATE: 77%
- Decompose into groups with a community finding algorithm
- Propagate labels in homogeneous groups
- Extract graph metrics
- Compare them with known clusters
False positives drop from 0.9% to 0.2%
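The label propagation step above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: nodes stand for campaigns, edges for some shared attribute (for example a common sender), and plain connected components are used as a stand-in for a real community finding algorithm. A group counts as homogeneous when all of its already labeled nodes agree. Function names and the `min_share` parameter are hypothetical.

```python
from collections import Counter, deque


def connected_groups(nodes, edges):
    """Split an undirected graph into connected components.

    Stand-in for a real community finding algorithm (an assumption here).
    """
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        group, queue = [], deque([start])
        while queue:  # breadth-first walk over one component
            node = queue.popleft()
            group.append(node)
            for other in neighbours[node] - seen:
                seen.add(other)
                queue.append(other)
        groups.append(group)
    return groups


def propagate_labels(groups, labels, min_share=1.0):
    """Copy the dominant label to unlabeled nodes of homogeneous groups.

    `labels` maps node -> "spam" / "legitimate"; missing nodes are uncertain.
    A group is homogeneous when at least `min_share` of its labeled nodes
    agree (1.0 means all of them). Existing labels are never overwritten.
    """
    refined = dict(labels)
    for group in groups:
        known = Counter(labels[n] for n in group if n in labels)
        if not known:
            continue  # no labeled node to propagate from
        label, count = known.most_common(1)[0]
        if count / sum(known.values()) >= min_share:
            for node in group:
                refined.setdefault(node, label)
    return refined
```

Nodes in mixed or fully unlabeled groups stay uncertain, which matches the idea of keeping an UNCERTAIN class rather than forcing a binary decision.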
Campaign types
Campaign Categories
Snowshoe spammers?
The owners' websites underline the fact that “they are not spammers” and that they provide other companies with a way to send marketing emails within the boundaries of the current legislation.
Gray Email Campaign Categories
- Commercial campaigns (42% of total)
  - Use wide IP address ranges to run the campaigns
  - Provide a pre-compiled list of categorized email addresses
  - Distributed, but consistent campaign sending patterns
- Newsletters and notifications
- Botnet-generated campaigns
- Scam and phishing campaigns
  - Behavior similar to commercial campaigns
  - Hide behind webmail accounts
User Behavior
Users are proactive towards newsletters
But they are also curious to check on malicious/illegal content
- 20% of the users have opened botnet-generated emails
- Each user viewed 5 messages on average
Summary
- Presented a first empirical study of gray emails and of commercial and newsletter campaigns
- Classified 50% of the gray emails (15% of all incoming email) and grouped them into 4 categories
- Lessons learned:
  - Email classification can no longer stay binary
  - By neglecting gray emails and placing them in the spam folder, we raise the security threat to users instead of lowering it
  - Scam campaigns, especially those sent from webmail accounts, were the most challenging to deal with
Questions