Shades of Grey: A Closer Look at Emails in the Gray Area
Jelena Isacenkova, Davide Balzarotti

Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Nov 01, 2014

Every day, millions of users spend a considerable amount of time browsing through the messages in their spam folders. In this work we look into an often overlooked area of email: gray emails, i.e., those messages that cannot be clearly categorized one way or the other by automated spam filters. We analyze real-world emails by clustering them into bulk email campaigns of 4 categories; some contain potentially harmful content and some do not, so they pose different levels of security risk to users.
Transcript
Page 1: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Shades of Grey: A Closer Look at Emails in the Gray Area

Jelena Isacenkova, Davide Balzarotti

Page 2: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

June 23, 2014 Eurecom 2

Evolution of Spam

[Timeline figure: spam rate (0% to 100%) plotted over the years 1994, 1997, 1998]

- Abuse of dynamic dial-up IP addresses
- Lawyers Canter and Siegel commercial spam scandal
- Message classifiers (Bayesian)
- RBLs

Page 3: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Evolution of Spam

[Timeline figure, continued: years 2002, 2003; spam rate values shown: 9%, 40%]

- Release of “Ratware” spamming tools: DarkMailer, SenderSafe
- Open-relay for proxying spam
- Appearance of viruses automatically downloading email lists
- Directive 2002/58 on Privacy and Electronic Communications
- CAN-SPAM Act of 2003

Page 4: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Evolution of Spam

[Timeline figure, continued: years 2004, 2007, 2008, 2009-2012; spam rate values shown: 72%, 85%, 68%]

- First botnets: Bagle, Bobax
- Distributed spamming tool: Reactor Mailer
- Spammers got sentenced
- Srizbi takedown
- 7 botnet takedowns

Page 5: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Email Categories

SPAM:
- Botnet spam
- 419 scam
- Phishing
- Targeted Email Attacks
- Spear Phishing
- Blackhole Spam
- Snowshoe Spam

GRAY: the area between SPAM and HAM

HAM:
- Personal User Emails

Page 6: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Email Categories

GRAY:
- Newsletters
- Notifications
- Customer Prospecting
- Commercial ads

Page 8: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Gmail Spam folder

- Within our study, users checked 5-6 messages per day
- 1.5% of harmful spam emails had a malicious attachment

Page 9: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

How significant is the gray category?

Page 10: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Gray Category in 2007

“Most misclassified ham messages are advertising, news digests, … [that] represent a small fraction of incoming mail, ... [which] filters find more difficult to classify.”
- Cormack & Lynam, “Online Supervised Spam Filter Evaluation”, 2007

Page 11: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Gray Category in 2012

“49% of consumers subscribe to 1-10 brands”
- Direct Marketing Association

“70% of 'this is spam' are actually legitimate newsletters, offers or notifications”
- 2012, ReturnPath

“Graymail emails represent 50% of all inbox traffic”
- 2012, Hotmail

“Graymail – the source of 75% of all spam complaints”
- 2012, Hotmail

Page 12: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Selecting a gray email dataset

Page 15: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Challenge-Response (CR) filtering

[Diagram of CR filtering; labels shown: Ham, Spam]
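
As a rough illustration of how a challenge-response filter sorts incoming mail, the sketch below assumes a per-user whitelist and a quarantine. The names `CRFilter`, `receive`, and `solve_challenge` are hypothetical and do not come from the system used in the study. Mail from whitelisted senders is delivered; mail from unknown senders is held while a challenge goes back to the sender; solving the challenge whitelists the sender and releases the held mail. Messages whose challenge is never solved stay in the quarantine, which is presumably where a gray email dataset can be collected.

```python
from dataclasses import dataclass, field


@dataclass
class CRFilter:
    """Toy challenge-response filter for one mailbox (illustrative only)."""
    whitelist: set = field(default_factory=set)
    quarantine: dict = field(default_factory=dict)  # sender -> held messages

    def receive(self, sender: str, message: str) -> str:
        """Return 'delivered' for known senders, else hold and challenge."""
        if sender in self.whitelist:
            return "delivered"
        self.quarantine.setdefault(sender, []).append(message)
        return "challenged"

    def solve_challenge(self, sender: str) -> list:
        """Sender solved the challenge: whitelist them, release held mail."""
        self.whitelist.add(sender)
        return self.quarantine.pop(sender, [])


cr = CRFilter(whitelist={"friend@example.org"})
print(cr.receive("friend@example.org", "lunch?"))       # known sender
print(cr.receive("news@shop.example", "weekly deals"))  # unknown sender
print(cr.solve_challenge("news@shop.example"))          # released messages
```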

Page 16: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Gray email analysis

Page 17: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Identification and classification of campaigns

Grouping emails into campaigns:
- N-grams
- Exact string matching

Evaluation of email headers similarity per campaign:
- Campaign sender consistency and geo-distribution
- Delivery statistics
- CAPTCHAs solved
- Bulk headers

Classification: LEGITIMATE or SPAM

Limitation: only email header information was used
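
One way to read the grouping step is sketched below: a header string is broken into word n-grams, and two emails land in the same campaign when they share at least one identical n-gram (exact string matching). The n-gram length, the choice of the Subject header, and the helper names `word_ngrams` and `group_into_campaigns` are assumptions made for illustration; the slides do not give these details.

```python
from collections import defaultdict


def word_ngrams(text: str, n: int = 3) -> set:
    """Return the set of lowercase word n-grams of a header string."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def group_into_campaigns(subjects: list, n: int = 3) -> list:
    """Group subject indices that share at least one identical n-gram."""
    parent = list(range(len(subjects)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # n-gram -> index of the first subject containing it
    for idx, subject in enumerate(subjects):
        for gram in word_ngrams(subject, n):
            if gram in first_seen:
                parent[find(idx)] = find(first_seen[gram])
            else:
                first_seen[gram] = idx

    groups = defaultdict(list)
    for idx in range(len(subjects)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


subjects = [
    "Save 20% on summer shoes today",
    "Hurry: save 20% on summer shoes",
    "Your invoice for March is ready",
]
print(group_into_campaigns(subjects))
```

The union-find structure keeps grouping transitive: if email A shares an n-gram with B, and B with C, all three end up in one campaign even when A and C share nothing directly.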

Page 18: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Identification and classification of campaigns

Features evaluated per campaign:
- Campaign sender consistency and geo-distribution
- Delivery rejections
- CAPTCHAs solved
- Bulk headers

Classification results:
- False Positives: 0.9%
- False Negatives: 8.6%
- Classifier uncertainty zone: 6.4%
- Class split: 18% / 82%
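
The sketch below illustrates two of the per-campaign signals named on this slide (sender consistency and bulk headers) and the idea of a classifier uncertainty zone. It assumes each email is a dict of header fields, treats `List-Unsubscribe` and `Precedence: bulk` as bulk-mail markers, and uses made-up score thresholds of 0.4 and 0.6. None of these specifics come from the slides, and the averaging rule is a placeholder rather than the classifier used in the study.

```python
from collections import Counter


def sender_consistency(emails: list) -> float:
    """Share of a campaign's emails sent by its most frequent sender."""
    senders = Counter(e.get("From", "") for e in emails)
    return senders.most_common(1)[0][1] / len(emails)


def bulk_header_ratio(emails: list) -> float:
    """Share of emails carrying a typical bulk-mail header."""
    def is_bulk(e: dict) -> bool:
        return ("List-Unsubscribe" in e
                or e.get("Precedence", "").lower() == "bulk")
    return sum(is_bulk(e) for e in emails) / len(emails)


def label_campaign(score: float, low: float = 0.4, high: float = 0.6) -> str:
    """Map a legitimacy score in [0, 1] to a label, leaving a middle zone."""
    if score >= high:
        return "LEGITIMATE"
    if score <= low:
        return "SPAM"
    return "UNCERTAIN"


campaign = [
    {"From": "news@shop.example", "List-Unsubscribe": "<mailto:u@shop.example>"},
    {"From": "news@shop.example", "Precedence": "bulk"},
    {"From": "promo@shop.example"},
]
# Hypothetical score: plain average of the two signals.
score = (sender_consistency(campaign) + bulk_header_ratio(campaign)) / 2
print(round(score, 2), label_campaign(score))
```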

Page 19: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Refinement with Graph Analysis

SPAM: 16%, UNCERTAIN: 7%, LEGITIMATE: 77%

Page 20: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Refinement with Graph Analysis

- Decompose into groups with a community finding algorithm
- Propagate labels in homogeneous groups
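
The two steps above can be sketched as follows. This is a simplified illustration, not the authors' method: connected components stand in for a real community finding algorithm, and the edges between campaigns (here imagined as shared sending infrastructure) are an assumption. A label is propagated only when every already-labeled node in a group agrees.

```python
from collections import defaultdict


def connected_groups(edges: list, nodes: list) -> list:
    """Split nodes into connected components (a stand-in for communities)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in adj[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        groups.append(sorted(group))
    return groups


def propagate_labels(groups: list, labels: dict) -> dict:
    """Give unlabeled nodes the group's label if all known labels agree."""
    result = dict(labels)
    for group in groups:
        known = {labels[n] for n in group if n in labels}
        if len(known) == 1:  # homogeneous group
            label = known.pop()
            for n in group:
                result.setdefault(n, label)
    return result


nodes = ["c1", "c2", "c3", "c4", "c5"]
edges = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]  # assumed campaign links
labels = {"c1": "SPAM", "c2": "SPAM", "c4": "LEGITIMATE"}
print(propagate_labels(connected_groups(edges, nodes), labels))
```

Groups with conflicting labels are left untouched, so uncertain campaigns in mixed neighborhoods stay uncertain.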

Page 21: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Refinement with Graph Analysis

- Extract graph metrics
- Compare them with known clusters
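
A minimal sketch of this comparison, under stated assumptions: the slides do not say which graph metrics were extracted or how they were compared, so density and average degree are used here as examples, the reference profiles are invented, and a nearest-profile rule (Euclidean distance) is a stand-in for whatever comparison the authors used.

```python
import math


def graph_metrics(nodes: list, edges: list) -> tuple:
    """Return (density, average degree) of an undirected simple graph."""
    n, m = len(nodes), len(edges)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    avg_degree = 2 * m / n if n else 0.0
    return density, avg_degree


def closest_known_cluster(metrics: tuple, known: dict) -> str:
    """Pick the known cluster type whose metric profile is nearest."""
    return min(known, key=lambda name: math.dist(metrics, known[name]))


# Hypothetical reference profiles: (density, average degree).
known_profiles = {"SPAM": (0.2, 1.5), "LEGITIMATE": (0.9, 3.0)}

nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("c", "d")]
metrics = graph_metrics(nodes, edges)
print(metrics, closest_known_cluster(metrics, known_profiles))
```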

Page 22: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Refinement with Graph Analysis

False positives drop from 0.9% to 0.2%

Page 23: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Campaign types

Page 24: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Campaign Categories

Page 25: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Campaign Categories

Snowshoe spammers?

Page 27: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Campaign Categories

The owners' websites underline the fact that “they are not spammers”, and that they provide other companies with a way to send marketing emails within the boundaries of the current legislation

Page 28: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Gray Email Campaign Categories

- Commercial campaigns (42% of total)
  - Use wide IP address ranges to run the campaigns
  - Provide a pre-compiled list of categorized email addresses
  - Distributed, but consistent campaign sending patterns
- Newsletters and notifications
- Botnet-generated campaigns
- Scam and phishing campaigns
  - Behavior similar to commercial campaigns
  - Hide behind webmail accounts

Page 30: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

User Behavior

Users are proactive towards newsletters

Page 32: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

User Behavior

But users are also curious to check malicious/illegal content:
- 20% of the users have opened botnet-generated emails
- Each user viewed 5 messages on average

Page 35: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Summary

- Presented a first empirical study of gray emails and of commercial and newsletter campaigns
- Classified 50% of the gray emails (15% of all incoming email) and grouped them into 4 categories
- Lessons learned:
  - Email classification can no longer remain binary
  - By neglecting gray emails and placing them in the spam folder, we raise the security threat to users instead of helping to lower it
  - Scam campaigns, especially those sent from webmail accounts, were the most challenging to deal with

Page 36: Unveiling the gray emails: A Closer Look at Emails in the Gray Area

Questions