Transcript

1

MALWISE

Malwise—An Effective and Efficient Classification System for Packed and Polymorphic Malware

GUIDED BY,

Mrs. ASHITHA S.S, Asst. Professor, IT Department, LMCST

PRESENTED BY,

FEBIN JOY KAVIYIL, S7 CS, LMCST, febinjoykaviyil@gmail.com

3

• Significant threat

• Prominent in last few years

• Malware detection – a field with challenging research opportunities

• Anti-malware systems
  o Right from the beginning
  o Rapid advancement

Introduction: MALWARE

4

• Initial techniques involved the use of controlled environments

• Next or current phase involves the use of malware databases

Introduction: EARLIER AND EXISTING APPROACHES

5

• Predominant technique to detect a malware instance is using malware signatures

• Database comprises identified signatures

• Efficient but not effective against malware variants

• Malwise proposes a new technique for signature generation

Introduction: PRESENT TECHNIQUE

6

• Database creation
  o Challenging
  o Needs access to a set of known malware
  o Needs constant updating

• Packing
  o Additional code packing to hinder analysis
  o 86% of malware is packed

• Signature generation

• Classification
  o By comparing signatures

Introduction: ISSUES IN MALWARE DETECTION

7

• Database creation
  o Flow graph based signatures are stored

• Unpacking
  o Using entropy analysis

• Signature generation
  o Control flow graph based

• Classification
  o Using string edit distances

Introduction: MALWISE

8

Basic block diagram

WORKING

9

• Using entropy analysis

• Entropy is the amount of information contained in a block

• Entropy of a block is given by H = -Σ p(i) log2 p(i), where p(i) is the relative frequency of byte value i in the block

• Compressed and encrypted data have high entropy

• In earlier systems, controlled emulators were used to find OEPs (Original Entry Points)

• This was efficient but ineffective

UNPACKING
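The entropy measure on this slide can be sketched in Python. This is a minimal illustration that assumes the measure is the Shannon entropy over byte values; the function name `block_entropy` is made up for the example and does not come from Malwise.

```python
import math
from collections import Counter


def block_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    # Sum p * log2(1/p) over the byte values that occur in the block.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Illustrative inputs (assumptions, not data from the slides):
uniform = bytes(range(256))   # every byte value once, like well-encrypted data
constant = b"\x00" * 256      # one repeated value, like zero padding
print(block_entropy(uniform), block_entropy(constant))
```

A block that uses all 256 byte values equally often reaches the maximum of 8 bits per byte, which is why compressed or encrypted regions stand out.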

10

• In Malwise, the concept is extended by checking the entropy from time to time

• If entropy of the analyzed data is low we can assume that no more encrypted or compressed data is present and hence stop unpacking

Unpacking

[Flowchart: SAMPLE -> ENTROPY HIGH? -> YES: UNPACK, then check entropy again; NO: FINISH UNPACKING]
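The loop in the flowchart can be sketched as follows. This is a simplified illustration, not the Malwise implementation: `unpack_step` is a hypothetical callback standing in for emulating one slice of the sample and returning the updated memory image, and the threshold of 6.5 bits per byte is an assumed value that the slides do not state.

```python
import math
from collections import Counter
from typing import Callable


def entropy(data: bytes) -> float:
    """Shannon entropy of the data in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def unpack_until_low_entropy(
    image: bytes,
    unpack_step: Callable[[bytes], bytes],  # hypothetical emulation step
    threshold: float = 6.5,                 # assumed cut-off, not from the slides
    max_rounds: int = 1000,                 # safety limit so the loop terminates
) -> bytes:
    """Keep unpacking while the image still looks compressed or encrypted."""
    for _ in range(max_rounds):
        if entropy(image) <= threshold:
            break  # entropy is low: assume no packed data remains
        image = unpack_step(image)
    return image


# Toy demonstration with made-up data: one "layer" of packing.
packed = bytes(range(256)) * 4   # high-entropy stand-in for a packed image
unpacked = b"\x90" * 1024        # low-entropy stand-in for the real code
result = unpack_until_low_entropy(packed, lambda image: unpacked)
```

The `max_rounds` limit is a design choice for the sketch: without it, a sample whose entropy never drops would keep the loop running forever.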

12

• Using speculative disassembly

• Procedures are identified

• Incorrectly identified procedures are eliminated

• Intermediate representation is formed

• A weight is assigned to each signature

Disassembly -> Intermediate representation -> Control flow graph -> Signature

SIGNATURE GENERATION

13

Now signatures can be generated for the two flowgraph matching methods available.

Exact flowgraph matching

• Only exact replicas or isomorphisms are identified

• Signatures are created by ordering the nodes of the control flow graph in depth-first order

• Signature will consist of a list of graph edges for ordered nodes

• Efficient

• Matching done using dictionary lookup

• Weight is found by (formula not preserved in this transcript), where Bi is the number of basic blocks in the binary

[Figure: Depth-first ordered flowgraph and its signature]

Signature generation
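The exact-matching signature described on this slide can be sketched as below. This is an illustration under stated assumptions, not Malwise's actual encoding: the flowgraph is assumed to be a dict mapping each node to its ordered successor list, and the `index:targets` text format, the function name `exact_signature`, and the family name in the lookup table are all made up for the example.

```python
from typing import Dict, Hashable, List


def exact_signature(cfg: Dict[Hashable, List[Hashable]], entry: Hashable) -> str:
    """Number nodes in depth-first order, then list each node's edges."""
    order: Dict[Hashable, int] = {}
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in order:
            continue
        order[node] = len(order)
        # Push successors in reverse so the first successor is visited first.
        for succ in reversed(cfg.get(node, [])):
            if succ not in order:
                stack.append(succ)
    parts = []
    for node in sorted(order, key=order.get):
        targets = ",".join(str(order[s]) for s in cfg.get(node, []))
        parts.append(f"{order[node]}:{targets}")
    return ";".join(parts)


# Two if/else diamonds with different node names (illustrative data).
cfg_a = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
cfg_b = {0x401000: [0x401010, 0x401020], 0x401010: [0x401030],
         0x401020: [0x401030], 0x401030: []}
sig_a = exact_signature(cfg_a, "entry")
sig_b = exact_signature(cfg_b, 0x401000)

# Matching by dictionary lookup: the signature string is the key.
database = {sig_a: "hypothetical-family"}
match = database.get(sig_b)
```

Because node names are replaced by their depth-first position, two procedures with the same shape produce the same string, so a plain dictionary lookup is enough to find an exact match.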

14

Approximate flowgraph matching

• Approximate matches of control flow graphs are considered

• Enables detection of variants

• Structuring is used to generate signatures

• The output will be a string of character tokens representing high-level structured constructs

• Weight is found by (formula not preserved in this transcript), where Si is the signature of S in binary

Control flowgraph -> High level structured graph -> SIGNATURE

Signature generation
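Since the approximate signatures are strings, two of them can be compared with a string edit distance, which the overview slide names as the classification technique. The sketch below uses the Levenshtein distance and normalises it by the longer string's length; that normalisation, the function names, and the token strings are assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]


def signature_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1.0 for identical strings, 0.0 for nothing shared."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Made-up token strings standing in for structured-flowgraph signatures.
original = "W{IE}R"
variant = "W{IEE}R"
score = signature_similarity(original, variant)
```

A variant that adds or changes a few constructs only changes a few tokens, so its score stays high even though an exact lookup would miss it.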

15

Now to obtain the final signature the obtained string is converted to binary

Signature generation

16

• Done using set similarity

• Database will comprise signatures of known malware

• The input will be a binary

• A similarity is constructed between the binary’s flowgraph strings and each set of flowgraphs associated with malware in the database

• Complex mechanism

• Considers the weights associated with the signatures as well

CLASSIFICATION

[Diagram: New sample -> Non malicious / Malicious]

17

Basic principle for classification

• The process results in a similarity value for each set of signatures in the malware database

• Value ranges between 0 and 1

• Value > 0.95 => Isomorphs

• Value < 0.6 => No similarity

• 0.6 <= Value <= 0.95 => Variant

• The threshold values were fixed after a thorough pilot study

Classification

[Flowchart: SAMPLE + DATABASE -> SIMILARITY CHECK -> > 0.95: ISOMORPHIC; > 0.6: VARIANT; otherwise: NON MALICIOUS]

18

Classification

[Flowchart: SAMPLE + DATABASE -> SIMILARITY CHECK -> SIMILARITY > 0.95? YES: EXACT MATCH OF EXISTING MALWARE; NO: SIMILARITY > 0.6? YES: VARIANT; NO: NON MALICIOUS]
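The decision in the flowchart can be written as a small function. The 0.95 and 0.6 thresholds come from the slides; the function name, the returned labels, and the choice to treat a value of exactly 0.6 as a variant are assumptions for this sketch, since the slides leave that boundary case open.

```python
ISOMORPH_THRESHOLD = 0.95  # from the slides
VARIANT_THRESHOLD = 0.6    # from the slides


def classify(similarity: float) -> str:
    """Map a similarity value in [0, 1] to a classification label."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must lie between 0 and 1")
    if similarity > ISOMORPH_THRESHOLD:
        return "isomorph"       # exact match of existing malware
    if similarity >= VARIANT_THRESHOLD:
        return "variant"        # boundary handling is an assumption
    return "non-malicious"


labels = [classify(s) for s in (0.99, 0.80, 0.30)]
```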

19

OEP

• More efficient and effective than any incorporated technique

• The table shows Malwise’s performance with some common software

EVALUATION

20

Classification

• Detection rate was about 57.8%

• Earlier approaches achieved at most 39.6%

• Resilience to false positives

• Less than 0.61% of the samples were incorrectly identified as malware

• At least 10 procedures should be present in the flowgraph for performing approximate flowgraph matching

• For exact flowgraph matching, at least 15 procedures should be present

Evaluation

21

CONCLUSION

ISSUE | EARLIER APPROACH | MALWISE
UNPACKING | USING CONTROLLED ENVIRONMENTS | USING ENTROPY ANALYSIS
SIGNATURE GENERATION | BASED ON BYTE LEVEL REPRESENTATION | BASED ON CONTROL FLOW GRAPH
DATABASE | SOURCE CODE DEPENDENT SIGNATURES | CONTROL FLOW DEPENDENT SIGNATURES
CLASSIFICATION | EXACT MATCHING ONLY | EXACT MATCHING AND APPROXIMATE MATCHING

22

• Malware and malware variants can be identified using similarity in control flow graphs

• Unpacking using entropy analysis proved more efficient

• MALWISE proves to be a more efficient and effective substitute for the existing anti-malware systems in internet gateways, or the so-called anti-viruses on our desktops

• Not yet implemented as an anti-malware system

• However, SIMSEER (http://www.simseer.com) and BUGWISE (http://www.bugwise.com) use the same technique

CONCLUSION

23

QUERIES ?

24

• Malwise—An Effective and Efficient Classification System for Packed and Polymorphic Malware(IEEE PRESENTATION)

• http://www.experthacker.com

• http://www.bugwise.com

• http://www.simseer.com

• http://www.gensign.com/flowgraph_malwise

REFERENCES

SILVIO CESARE

25