Automated Classification and Analysis of Internet Malware M. Bailey J. Oberheide J. Andersen Z. M. Mao F. Jahanian J. Nazario RAID 2007 Presented by Mike Hsiao 20081107 University of Michigan, Arbor Network

Jan 02, 2016


Judith Carr

Automated Classification and Analysis of Internet Malware

M. Bailey, J. Oberheide, J. Andersen, Z. M. Mao, F. Jahanian, J. Nazario

RAID 2007

Presented by Mike Hsiao 20081107

University of Michigan, Arbor Networks


Outline

– Introduction
– Anti-Virus Clustering of Malware
– Behavior-Based Malware Clustering
– Evaluation
– Related Work
– Conclusion


Introduction

Current anti-virus products characterize malware in ways that are inconsistent across products, incomplete across malware, and fail to be concise in their semantics.

The authors propose a new classification technique that describes malware behavior in terms of system state changes.



Introduction (cont’d)

Spam, phishing, denial of service attacks, botnets, and worms largely depend on some form of malicious code, commonly referred to as malware.

– Exploiting software vulnerabilities

– Tricking users into running malicious code


Introduction (challenges)

Agobot (name of a malware) has been observed to have more than 580 variants.

– Agobot variants have the ability to perform DoS attacks, steal bank passwords and account details, propagate over the network using a diverse set of remote exploits, use polymorphism to evade detection and disassembly, and even patch vulnerabilities and remove competing malware.

A recent Microsoft survey found more than 43,000 new variants of backdoor trojans and bots during the first half of 2006.

multi-vector


Introduction

The authors developed a dynamic analysis approach, based on the execution of malware and the causal tracing of the OS objects created due to malware execution.

The reduced collection of these user-visible system state changes is used to create a fingerprint of the malware’s behavior.

– These fingerprints are more invariant and useful than abstract code sequences representing program behaviors.


Introduction

These fingerprints can be used directly in assessing the potential damage incurred, enabling detection and classification of new threats, and assisting in the risk assessment of these threats in mitigation and clean up.

The authors provide a method for automatically categorizing these malware profiles into groups that reflect similar classes of behaviors.


Outline

– Introduction
– Anti-Virus Clustering of Malware
– Behavior-Based Malware Clustering
– Evaluation
– Related Work
– Conclusion


Understanding Anti-Virus Malware Labeling

In order to accurately characterize the ability of AV to provide meaningful labels for malware, …

Note: AV systems rarely use the exact same labels for a threat, and users of these systems have come to expect simple naming differences across vendors.

e.g., WORM_MSBLAST.A


A pool of malware classified by AVs as SDBot families

The classification of SDBot is ambiguous.


Properties of a Labeling System

Consistency

– Identical items must, and similar items should, be assigned the same label.

Completeness

– A label should be generated for as many items as possible.

Conciseness

– The labels should be sufficient in number to reflect the unique properties of interest, while avoiding superfluous labels.
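
As a rough illustration of how two of these properties could be quantified, the sketch below computes completeness as the fraction of samples that received any label, and counts distinct labels as a crude proxy for conciseness. The sample names and labels are made up, and the metrics actually used in the paper may be defined differently.

```python
from typing import Dict, Optional


def completeness(labels: Dict[str, Optional[str]]) -> float:
    """Fraction of samples that received some label (None means unlabeled)."""
    if not labels:
        return 0.0
    labeled = sum(1 for label in labels.values() if label is not None)
    return labeled / len(labels)


def distinct_labels(labels: Dict[str, Optional[str]]) -> int:
    """Number of different labels in use, ignoring unlabeled samples."""
    return len({label for label in labels.values() if label is not None})


# Hypothetical AV output: sample name -> label (None = not detected).
av_labels = {
    "sample_a": "W32.Sdbot",
    "sample_b": "W32.Sdbot",
    "sample_c": None,
    "sample_d": "W32.Agobot",
}

print(completeness(av_labels))
print(distinct_labels(av_labels))
```

A labeling that assigns one label per sample would score well on completeness while being far from concise, which is why the properties are considered together.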


Outline

– Introduction
– Anti-Virus Clustering of Malware
– Behavior-Based Malware Clustering
– Evaluation
– Related Work
– Conclusion


Defining and Generating Malware Behaviors

Individual system calls may be at a level that is too low for abstracting semantically meaningful information

– a higher abstraction level is needed to effectively describe the behavior of malware.

The authors define the behavior of malware in terms of non-transient state changes that the malware causes on the system.

– spawned processes, modified registry keys, modified files, network connection attempts.
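
One way to picture such a behavior description is as a small record of state changes that is serialized into a canonical byte string. The sketch below is only an assumption about how this could look: the class name, the field names, the `kind|event` line format, and the example values are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BehaviorProfile:
    """Non-transient state changes observed for one malware sample."""
    processes: List[str] = field(default_factory=list)
    files: List[str] = field(default_factory=list)
    registry_keys: List[str] = field(default_factory=list)
    network: List[str] = field(default_factory=list)

    def fingerprint(self) -> bytes:
        """Serialize the profile as de-duplicated, sorted 'kind|event' lines."""
        sections = (
            ("process", self.processes),
            ("file", self.files),
            ("registry", self.registry_keys),
            ("network", self.network),
        )
        lines = []
        for kind, events in sections:
            for event in sorted(set(events)):
                lines.append(f"{kind}|{event}")
        return "\n".join(lines).encode("utf-8")


# Made-up example values.
profile = BehaviorProfile(
    processes=["b.exe", "a.exe", "a.exe"],
    network=["tcp:6667"],
)
print(profile.fingerprint().decode("utf-8"))
```

Sorting and de-duplicating the events makes the serialized form independent of the order in which the changes were observed.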


Clustering of Malware

Ten unique malware samples (P: processes, F: files, R: registry, N: network)

Our approach to generating meaningful labels is achieved through clustering of the behavioral fingerprints.


Comparing Individual Malware Behaviors - NCD

Intuitively, Normalized Compression Distance (NCD) represents the overlap in information between two samples.

C(x) is the zlib-compressed length of x.
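
The formula itself did not survive in the transcript. The standard definition of NCD is NCD(x, y) = (C(x + y) - min(C(x), C(y))) / max(C(x), C(y)), where x + y denotes the concatenation of x and y. A minimal sketch with Python's `zlib` module follows; the example fingerprints are made up for illustration.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """C(x): length of the zlib-compressed input."""
    return len(zlib.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + y)  # C(x + y): compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical fingerprints: newline-separated state changes.
fp_a = b"process|svchost32.exe\nfile|C:\\WINDOWS\\system32\\svchost32.exe\nnetwork|tcp:6667"
fp_b = b"process|svchost32.exe\nfile|C:\\WINDOWS\\system32\\svchost32.exe\nnetwork|tcp:6668"
fp_c = b"registry|HKLM\\Software\\Example\\Run\nfile|C:\\temp\\dropper.tmp"

print(ncd(fp_a, fp_b))  # near-identical pair
print(ncd(fp_a, fp_c))  # unrelated pair
```

Values near 0 indicate that one sample adds little information beyond the other; values near 1 indicate little overlap. Because of compressor overhead, the distance of a sample to itself is small but usually not exactly 0.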


Constructing Relationships Between Malware

(Figure: tree relating the malware samples; axis label: distance)


Extracting Meaningful Groups

(Figure: tree with clusters c1, c2, c3, c4 marked; axis label: distance)

Clusters are constructed from the tree by first calculating the inconsistency coefficient of each cluster, and then thresholding based on the coefficient.
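
As a simplified sketch of turning a tree into flat clusters, the code below cuts a single-linkage tree at a fixed distance threshold. This is a deliberate substitution: it does not compute the inconsistency coefficient described above, and both the choice of single linkage and the distance values are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def single_linkage_clusters(
    dist: Sequence[Sequence[float]], threshold: float
) -> List[List[int]]:
    """Flat clusters obtained by cutting a single-linkage tree at `threshold`.

    This equals the connected components of the graph that links
    samples i and j whenever dist[i][j] <= threshold.
    """
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical pairwise distances between four samples.
distances = [
    [0.00, 0.10, 0.80, 0.90],
    [0.10, 0.00, 0.85, 0.90],
    [0.80, 0.85, 0.00, 0.20],
    [0.90, 0.90, 0.20, 0.00],
]
print(single_linkage_clusters(distances, threshold=0.5))
```

A fixed cut applies the same distance everywhere in the tree, whereas an inconsistency-based cut compares each link with the links beneath it, so the two can produce different groupings.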


Outline

– Introduction
– Anti-Virus Clustering of Malware
– Behavior-Based Malware Clustering
– Evaluation
– Related Work
– Conclusion


Comparing AV Groupings and Behavioral Clustering

The proposed method created 403 clusters from 3,698 individual malware samples.

– http://www.eecs.umich.edu/~mibailey/malware/

The authors expect that a behavior-based approach would separate the more general AV classes if their behavior differs, and aggregate across the more specific classes if behaviors are shared.


Comparing AV Groupings and Behavioral Clustering (example)

Symantec, which adopts a more general approach, has two binaries identified as “back-door.sdbot”.

They were divided into separate clusters in our evaluation based on

– differing processes created,

– differing back-door ports,

– differing methods of process invocation or reboot,

– and the presence of AV avoidance in one of the samples.


Comparing AV Groupings and Behavioral Clustering (example)

FProt, which has a high propensity to label each malware sample individually, had 47 samples that were identified as belonging to the sdbot family.

FProt provided 46 unique labels for these samples, nearly one unique label per sample.

In our clustering, these 46 unique labels were collapsed into 15 unique clusters reflecting their overlap in behaviors.


Measuring the Completeness, Conciseness and Consistency

No such behavior - P: number of process - F: file - R: registry - N: network

In the large sample, roughly 2,200 binaries shared exactly identical behavior with another sample. When grouped, these 2,200 binaries created 267 groups in which each sample in the group had exactly the same behavior.
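
The grouping step described here can be sketched by bucketing samples on their exact behavior set. The consistency function shows one possible reading of the metric (the fraction of identical-behavior groups whose members all received the same label); it is an assumption rather than the paper's exact definition, and all sample names, behaviors, and labels are made up.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional


def group_identical(behaviors: Dict[str, FrozenSet[str]]) -> List[List[str]]:
    """Groups of two or more samples whose behavior sets are exactly equal."""
    by_behavior = defaultdict(list)
    for sample, events in behaviors.items():
        by_behavior[events].append(sample)
    return sorted(sorted(group) for group in by_behavior.values() if len(group) > 1)


def consistency(groups: List[List[str]], labels: Dict[str, Optional[str]]) -> float:
    """Fraction of identical-behavior groups in which every member has the same label."""
    if not groups:
        return 0.0
    agreeing = sum(1 for group in groups if len({labels.get(s) for s in group}) == 1)
    return agreeing / len(groups)


# Hypothetical behavior sets and labels.
behaviors = {
    "s1": frozenset({"file|a.exe", "network|tcp:6667"}),
    "s2": frozenset({"file|a.exe", "network|tcp:6667"}),
    "s3": frozenset({"registry|Run"}),
    "s4": frozenset({"registry|Run"}),
    "s5": frozenset({"process|x.exe"}),
}
labels = {"s1": "X", "s2": "X", "s3": "Y", "s4": "Z", "s5": "X"}

identical_groups = group_identical(behaviors)
print(identical_groups)
print(consistency(identical_groups, labels))
```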


Application of Clustering and Behavior Signatures (1/2)

Classifying Emerging Threats

– For example, cluster c156 consists of three malware samples that exhibit malicious bot-related behavior, including IRC command and control activities.

– Each of the 75 behaviors observed in the cluster is shared with other samples of the group—96.92% on average, meaning the malware samples within the cluster have almost identical behavior.

– It is clear that our behavioral classification would assist in identifying these samples as emerging threats through their extensive malicious behavioral profile.
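
One plausible reading of the "shared on average" figure is: for each behavior seen anywhere in the cluster, take the percentage of samples that exhibit it, then average over all behaviors. The sketch below implements that reading only; the exact definition used in the paper is not given in these slides, and the cluster contents are made up.

```python
from typing import Dict, Set


def average_sharing(cluster: Dict[str, Set[str]]) -> float:
    """Mean, over all behaviors in the cluster, of the percent of samples showing it."""
    all_behaviors = set().union(*cluster.values())
    if not all_behaviors:
        return 0.0
    n_samples = len(cluster)
    shares = [
        sum(1 for events in cluster.values() if behavior in events) / n_samples
        for behavior in all_behaviors
    ]
    return 100.0 * sum(shares) / len(shares)


# Hypothetical three-sample cluster: two behaviors shared by all, one by two samples.
example_cluster = {
    "s1": {"irc|connect", "file|bot.exe", "registry|Run"},
    "s2": {"irc|connect", "file|bot.exe", "registry|Run"},
    "s3": {"irc|connect", "file|bot.exe"},
}
print(round(average_sharing(example_cluster), 2))
```

A value close to 100 means nearly every behavior appears in nearly every sample of the cluster.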


Application of Clustering and Behavior Signatures (2/2)

Resisting Binary Polymorphism

Examining the Malware Behaviors


Outline

– Introduction
– Anti-Virus Clustering of Malware
– Behavior-Based Malware Clustering
– Evaluation
– Related Work
– Conclusion


Related Work

Content-based signatures

– insufficient to cope with emerging threats due to intentional evasion

Lower-layer behavioral profiles

– individual system calls, instruction-based code templates, shellcode, network connection and session behavior

– do not provide semantic value in explaining behaviors exhibited by a malware variant or family

Ellis

– similar data being sent from one to the next


Outline

– Introduction
– Anti-Virus Clustering of Malware
– Behavior-Based Malware Clustering
– Evaluation
– Related Work
– Conclusion


Conclusion

They showed that AV systems are incomplete in that they fail to detect or provide labels.

– not consistent

– vary widely in their conciseness

Create a behavioral fingerprint of the malware’s activity

– the state changes that are a causal result of the infection.


Comments

Host-based observation

– vs. network observation

Classifies collected malware

– vs. detecting malicious behavior

Closely examines what happens while the malware is executed

– vs. the revealed behaviors that reflect the abnormalities of a compromised service