Real-Time Detection of Malware Downloads via Large-Scale URL→File→Machine Graph Mining
Babak Rahbarinia, Marco Balduzzi, Roberto Perdisci
AsiaCCS 2016, June 02, Xi’an, China

Detection of Malware Downloads via Graph Mining (AsiaCCS '16)

Apr 11, 2017



Introduction

Traditional AV is dead?
● Signature-based vs. statistical-based

Traditional AVs are ineffective (they don’t work!)
● Polymorphism, code obfuscation, packers, ...

URL blacklisting
● Static, lags behind
● Time-consuming analysis of individual URLs

Local vs. global
● Local: looks at one potential malware sample at a time
● Global: leverages global situational awareness


Large-scale analysis of behavioral patterns
● “Who - where - what” relationship
● Global situational awareness
● Graph-based machine learning
● Combination of system- and network-level information

Mastino: real-time and concurrent detection of download events
● Real-world deployment on millions of machines (Internet scale)

Approach

[Figure: Approach]

(Quick) Related Work

Each comparison lists the related work first and Mastino second.

Static + dynamic detection [many works]

Graph-mining detection: Polonium [KDD10]
● Offline approach vs. real-time
● File classification only vs. files + URLs (download events)
● Bipartite vs. tripartite graph
● Proprietary reputation function vs. open

AMICO [ESORICS13]
● HTTP-centric vs. protocol-independent
● Only works in LANs vs. “move across networks”

Google’s CAMP [NDSS13]
● Browser-centric vs. system-centric


Download Graph

[Figure: tripartite download graph linking URLs, files, and machines]
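The download graph can be sketched as adjacency sets keyed by node, assuming each download event d = (u, f, m) arrives as a (URL, SHA1, machine ID) triple. The class and field names below are hypothetical and only illustrate the URL→File→Machine structure; they are not Mastino's implementation.

```python
from collections import defaultdict
from typing import DefaultDict, NamedTuple, Set


class DownloadEvent(NamedTuple):
    """d = (u, f, m): machine m downloads file f (identified by SHA1) from URL u."""
    url: str
    sha1: str
    machine: str


class DownloadGraph:
    """Hypothetical tripartite URL -> File -> Machine graph stored as adjacency sets."""

    def __init__(self) -> None:
        self.files_of_url: DefaultDict[str, Set[str]] = defaultdict(set)
        self.machines_of_url: DefaultDict[str, Set[str]] = defaultdict(set)
        self.urls_of_file: DefaultDict[str, Set[str]] = defaultdict(set)
        self.machines_of_file: DefaultDict[str, Set[str]] = defaultdict(set)
        self.files_of_machine: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, d: DownloadEvent) -> None:
        # Each event updates the neighbor sets later used for feature extraction.
        self.files_of_url[d.url].add(d.sha1)
        self.machines_of_url[d.url].add(d.machine)
        self.urls_of_file[d.sha1].add(d.url)
        self.machines_of_file[d.sha1].add(d.machine)
        self.files_of_machine[d.machine].add(d.sha1)


events = [
    DownloadEvent("http://a.example/x.exe", "sha1-F1", "m1"),
    DownloadEvent("http://b.example/y.exe", "sha1-F1", "m2"),
    DownloadEvent("http://a.example/x.exe", "sha1-F2", "m1"),
]
g = DownloadGraph()
for d in events:
    g.add(d)

print(sorted(g.urls_of_file["sha1-F1"]))      # ['http://a.example/x.exe', 'http://b.example/y.exe']
print(sorted(g.machines_of_file["sha1-F1"]))  # ['m1', 'm2']
```

Repeated events only re-add existing set members, so the graph stays deduplicated as the stream grows.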


Annotations

● URLs: age of URL, domain, path, IP
● Files: size; lifetime, prevalence; packed, signed
● Machines: download behavior; client processes


Labeling

● URLs: benign (B) from Alexa (excluding hosting domains); malicious (M) from Google Safe Browsing (GSB) + WRS
● Files: benign from GRID + VirusTotal (VT); malicious from VT
● Machines: reputations based on their download/activity history
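The slides state only that machine reputations are derived from their download/activity history. One simple way to turn file labels into a machine score is sketched below; the formula (share of a machine's labeled downloads that are malicious, so values near 1 lean malicious, consistent with the 0.72 path-pattern score in Case Study #1) and all names are assumptions, not necessarily Mastino's definition.

```python
from typing import Dict, Iterable, Optional

BENIGN, MALICIOUS = "benign", "malicious"


def machine_reputation(downloaded: Iterable[str],
                       file_labels: Dict[str, str]) -> Optional[float]:
    """Hypothetical reputation: fraction of a machine's *labeled* downloads that are malicious.

    Unlabeled files are ignored; None means no labeled download, i.e. unknown reputation.
    """
    labels = [file_labels[f] for f in downloaded if f in file_labels]
    if not labels:
        return None
    return sum(1 for lab in labels if lab == MALICIOUS) / len(labels)


# Hypothetical ground truth, e.g. assembled from VT / GRID verdicts.
file_labels = {"F1": MALICIOUS, "F2": BENIGN, "F3": BENIGN}
print(machine_reputation(["F1", "F2", "F3", "F9"], file_labels))  # 0.3333333333333333
print(machine_reputation(["F9"], file_labels))                    # None
```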


Features and classifier

Files Features (file f)
● Behavior-based features = {URL stats, machine stats}: over the URLs and machines connected to f, compute min, max, median, average, and std of
  ● the URL’s reputation R + the R of its [FQD, e2LD, path, path pattern, query string, query pattern]
  ● the machine’s R
● Intrinsic features = {file size, prevalence, packed, signed, ...}

URLs Features (URL u)
● Neighborhood: u + {all URLs sharing a component with u}
● Behavior-based features = {file stats, machine stats}: over the files and machines connected to this neighborhood, compute min, max, median, average, and std of the file’s R and the machine’s R
● Intrinsic features = {URL, FQD, and e2LD recency}

Each classifier combines its behavior-based and intrinsic features.
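For a file f, the slide summarizes the reputations of its neighboring URLs (and of their components) and of its neighboring machines with five statistics, then appends intrinsic features. A minimal sketch follows; the dictionary layout, the zero-filling of empty neighbor lists, the use of population standard deviation, and the function names are assumptions, not details from the slides.

```python
import statistics
from typing import Dict, List


def summarize(values: List[float]) -> List[float]:
    """min, max, median, average, and std of neighbor reputations."""
    if not values:
        return [0.0] * 5  # assumption: no known neighbors -> all-zero statistics
    return [
        min(values),
        max(values),
        statistics.median(values),
        statistics.mean(values),
        statistics.pstdev(values),  # population std (assumed choice)
    ]


def file_features(url_reps: Dict[str, List[float]],
                  machine_reps: List[float],
                  intrinsic: Dict[str, float]) -> List[float]:
    """Behavior-based features (URL stats + machine stats) followed by intrinsic features.

    url_reps maps a reputation kind (e.g. "url", "fqd", "e2ld", "path", ...) to the
    reputations of that kind across the URLs the file was downloaded from.
    """
    features: List[float] = []
    # Sorted keys keep vectors aligned, assuming every file supplies the same kinds.
    for kind in sorted(url_reps):
        features += summarize(url_reps[kind])
    features += summarize(machine_reps)
    features += [float(intrinsic[k]) for k in sorted(intrinsic)]
    return features


vec = file_features(
    url_reps={"url": [0.9, 0.7], "e2ld": [0.8, 0.8]},
    machine_reps=[0.1, 0.5, 0.3],
    intrinsic={"size_kb": 512.0, "prevalence": 3.0, "packed": 1.0, "signed": 0.0},
)
print(len(vec))  # 2 kinds x 5 + 5 machine stats + 4 intrinsic = 19
```

A URL feature vector could be assembled the same way, summarizing file and machine reputations over the URL's neighborhood and appending the recency features.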


Example #1

[Figure: download graph with URLs U1 and U2, files F1, F2, and F3, and nodes G1 and G2]

What could be said about F1 and F2?


Example #2

[Figure: download graph with URL u, file F1, and machines]

What could be said about F1? All neighbors are unknown.


[Figure: URL u and its components (FQD, path), linked to all URLs that share the same components as u; file F1; machines]

All URL components:
* FQD
* e2LD
* Path
* Path pattern
* Query string
* Query string pattern
* IP
* IP/24
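A sketch of how a URL might be split into some of these components, and how the set u + {all URLs sharing a component with u} could be gathered through an inverted index. Assumptions: the server IP is supplied alongside each URL, the e2LD is approximated by the last two host labels instead of a Public Suffix List lookup, the two pattern components are omitted, and all function names are hypothetical.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, Set, Tuple
from urllib.parse import urlsplit

Components = Dict[str, str]


def url_components(url: str, ip: str) -> Components:
    """Extract a subset of the URL components (no path/query patterns)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "fqd": host,
        "e2ld": ".".join(host.split(".")[-2:]),  # naive e2LD (assumption)
        "path": parts.path,
        "query": parts.query,
        "ip": ip,
        "ip24": str(ipaddress.ip_network(f"{ip}/24", strict=False)),
    }


def build_index(catalog: Dict[str, Components]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (component name, value) pair to the URLs that carry it."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for url, comps in catalog.items():
        for name, value in comps.items():
            if value:  # ignore empty components, e.g. a missing query string
                index[(name, value)].add(url)
    return index


def sharing_neighborhood(u: str, catalog: Dict[str, Components],
                         index: Dict[Tuple[str, str], Set[str]]) -> Set[str]:
    """u plus all URLs that share at least one component value with u."""
    result = {u}
    for name, value in catalog[u].items():
        if value:
            result |= index.get((name, value), set())
    return result


urls = {
    "http://dl.a.example/f/1/setup.exe": "198.51.100.7",
    "http://cdn.a.example/g/2/run.exe": "203.0.113.9",   # same e2LD as the first URL
    "http://b.example/h/3/x.exe": "198.51.100.200",      # same IP/24 as the first URL
    "http://c.example/other.exe": "192.0.2.1",           # shares nothing
}
catalog = {url: url_components(url, ip) for url, ip in urls.items()}
index = build_index(catalog)
print(sorted(sharing_neighborhood("http://dl.a.example/f/1/setup.exe", catalog, index)))
```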


Deployment

[Figure: timeline Day 1, Day 2, ..., Yesterday, Today; time window of 10 days]

● Trained classifiers: URL classifier and SHA1 classifier, trained on the download events in the time window
● Real-time classification of URLs & SHA1s
● Detection of malicious download events
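The daily cycle can be sketched as a sliding split: train on the events from the previous 10 days (Day 1 to Yesterday), then classify today's events as they arrive. The `Event` record, the function name, and the example dates are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, List, NamedTuple, Tuple


class Event(NamedTuple):
    day: date
    url: str
    sha1: str
    machine: str


def split_window(events: Iterable[Event], today: date,
                 window_days: int = 10) -> Tuple[List[Event], List[Event]]:
    """Training set: events in the `window_days` days before `today` (Day 1 .. Yesterday).
    Test set: events observed `today`, to be classified in real time."""
    events = list(events)  # allow a single-pass iterable to be scanned twice
    start = today - timedelta(days=window_days)
    train = [e for e in events if start <= e.day < today]
    test = [e for e in events if e.day == today]
    return train, test


events = [
    Event(date(2014, 2, 28), "http://a.example/1", "s1", "m1"),  # outside the window
    Event(date(2014, 3, 1), "http://a.example/2", "s2", "m1"),   # Day 1
    Event(date(2014, 3, 10), "http://a.example/3", "s3", "m2"),  # Yesterday
    Event(date(2014, 3, 11), "http://a.example/4", "s4", "m3"),  # Today
]
train, test = split_window(events, today=date(2014, 3, 11))
print(len(train), len(test))  # 2 1
```

Advancing `today` by one day and retraining slides the window forward, which matches the day-by-day deployment on the slide.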


Data Collection

● 7 months of data (Jan to Aug 2014)
● Download events d = (u, f, m)
● Hundreds of thousands of machines, files, and URLs; millions of nodes

Labeling:
● Files: VirusTotal, GRID [Trend]
● URLs: Alexa, Google Safe Browsing, WRS [Trend]

Annotations:
● File census and GUID census [Trend]
● VirusTotal (signed, ...)


Train & test for new download events

Detection results on new download events over 7 periods of 5 days (35 days total)

[Figure: detection results for files and URLs]
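The evaluation layout (7 consecutive periods of 5 days, 35 days in total) and the usual TP/FP rate definitions can be sketched as follows; the start date and function names are hypothetical, and no results from the paper are reproduced here.

```python
from datetime import date, timedelta
from typing import List, Tuple


def evaluation_periods(first_day: date, n_periods: int = 7,
                       days_per_period: int = 5) -> List[Tuple[date, date]]:
    """Consecutive, non-overlapping (start, end) day ranges, end inclusive."""
    return [
        (first_day + timedelta(days=i * days_per_period),
         first_day + timedelta(days=(i + 1) * days_per_period - 1))
        for i in range(n_periods)
    ]


def tp_fp_rates(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """TP rate over malicious samples, FP rate over benign samples."""
    return tp / (tp + fn), fp / (fp + tn)


periods = evaluation_periods(date(2014, 1, 1))  # hypothetical start day
for start, end in periods:
    print(start, "to", end)  # first line: 2014-01-01 to 2014-01-05
```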


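Combined detection of download events

● (u = m) ∨ (f = m) → d = m, where "= m" means "is malicious": a download event is malicious if its URL or its file is classified as malicious
● 1-day experiment (5 months)
● Efficiency: requests are served in ~0.16 s
● 84% of detections are 0-days (unknown)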
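The combination rule can be sketched as a logical OR over the two classifiers' verdicts. The score fields, the function name, and the 0.5 thresholds are hypothetical placeholders; the slides do not specify how each classifier's output is thresholded.

```python
from dataclasses import dataclass


@dataclass
class EventScores:
    url_score: float   # hypothetical: URL classifier's maliciousness score in [0, 1]
    file_score: float  # hypothetical: SHA1 classifier's maliciousness score in [0, 1]


def download_is_malicious(s: EventScores, url_threshold: float = 0.5,
                          file_threshold: float = 0.5) -> bool:
    """(u = m) v (f = m) -> d = m: flag the event if either node is classified malicious."""
    url_malicious = s.url_score >= url_threshold
    file_malicious = s.file_score >= file_threshold
    return url_malicious or file_malicious


print(download_is_malicious(EventScores(url_score=0.9, file_score=0.1)))  # True
print(download_is_malicious(EventScores(url_score=0.2, file_score=0.3)))  # False
```

Because either verdict suffices, the OR flags every event that either classifier alone would flag; by the same token, the false positives of both classifiers accumulate.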


Case Study #1

Wuachos.A Dropper
● Filename: file_saw.exe
● URLs with _no_ reputation
● Low prevalence
● Invalid signature
● Path pattern with R of 0.72 (malicious) [*]
● 1,445 URLs serving 182 polymorphic malware samples

[*] /f/1392240240/1255385580/2, /f/1392240120/4165299987/2 -> /H1/I10/I10/I1
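The footnote suggests each path segment is replaced by a type code plus its length (f becomes H1, a ten-digit number becomes I10). A minimal sketch under that reading: I for decimal digits and H for hexadecimal strings reproduce both example paths, while the S class for any other string is a hypothetical addition.

```python
import re

_INT = re.compile(r"[0-9]+")
_HEX = re.compile(r"[0-9a-fA-F]+")


def segment_type(segment: str) -> str:
    """Assumed typing: I = decimal digits, H = hex digits, S = anything else."""
    if _INT.fullmatch(segment):
        return "I"
    if _HEX.fullmatch(segment):
        return "H"
    return "S"


def path_pattern(path: str) -> str:
    """Replace each non-empty path segment by its type letter plus its length."""
    segments = [s for s in path.split("/") if s]
    return "/" + "/".join(f"{segment_type(s)}{len(s)}" for s in segments)


for p in ("/f/1392240240/1255385580/2", "/f/1392240120/4165299987/2"):
    print(p, "->", path_pattern(p))  # both map to /H1/I10/I10/I1
```

Grouping URLs by such a pattern lets a reputation score attach to many polymorphic URLs at once, which is how one pattern can cover hundreds of distinct paths.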


Case Study #2

Somoto Adware
● Filename: FreeZipSetup-[\d].exe
● Packed, short lifetime, prevalence = 0
● 1 related machine downloaded 1 known sample during our time window T = 10 days
● Detected a campaign of 695 samples, 616 of which were unknown to VirusTotal
● 61 remained unknown for 6+ months


Case Study #3

TTAWinCDM Spyware
● Machine and URL with _no_ reputation
● Low lifetime, prevalence, and number of countries
● Mismatch on downloading process: Acrobat process vs. unauthoritative domain
● Flash 0-day (+2 months)

Bonus #1: Analysis of Window T

[Figure: analysis of window T]

Bonus #2: Features Analysis

[Figure: files analysis and URLs analysis]

Conclusions

● Mastino: real-time detection of malware downloads by passive monitoring of clients
● Content-agnostic, behavioral analysis
● Real-world, large-scale deployment
● Over 95% TP rate at 0.5% FP rate
● Detects 0-days

Thank you!

@embyte, http://www.madlab.it

Babak Rahbarinia, Marco Balduzzi, Roberto Perdisci

Questions?