MALWISE
Nov 18, 2014
Malwise—An Effective and Efficient Classification System for Packed and Polymorphic Malware
GUIDED BY,
Mrs. ASHITHA.S.S, Asst. Professor, IT Department, LMCST
PRESENTED BY,
FEBIN JOY KAVIYIL, S7
3
• Significant threat
• Prominent in last few years
• Malware detection – a field with challenging research opportunities
• Anti-malware systems
o Right from the beginning
o Rapid advancement
Introduction: MALWARE
4
• Initial techniques involved the use of controlled environments
• Next or current phase involves the use of malware databases
Introduction: EARLIER AND EXISTING APPROACHES
5
• The predominant technique for detecting a malware instance is matching malware signatures
• The database comprises identified signatures
• Efficient, but not effective against malware variants
• Malwise proposes a new technique for signature generation
Introduction: PRESENT TECHNIQUE
6
• Database creation
o Challenging
o Needs access to a set of known malware
o Needs constant updating
• Packing
o Additional code packing to hinder analysis
o 86% of malware is packed
• Signature generation
• Classification
o By comparing signatures
Introduction: ISSUES IN MALWARE DETECTION
7
• Database creation
o Flow graph based signatures are stored
• Unpacking
o Using entropy analysis
• Signature generation
o Control flow graph based
• Classification
o Using string edit distances
Introduction: MALWISE
8
Basic block diagram
WORKING
9
• Using entropy analysis
• Entropy is the amount of information contained in a block
• Entropy of a block is given by the Shannon entropy $H(x) = -\sum_{i=1}^{N} p(i)\log_2 p(i)$, where $p(i)$ is the probability of the $i$-th byte value in the block
• Compressed and encrypted data have high entropy
• In earlier systems, controlled emulators were used to find OEPs (Original Entry Points)
• This was efficient but ineffective
UNPACKING
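The entropy measure described above can be sketched as follows. This is a minimal illustration that assumes the unit of information is a byte and uses the standard Shannon entropy; the function name `block_entropy` is chosen for this example and is not taken from the Malwise system.

```python
import math
from collections import Counter


def block_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block, in bits per byte (0.0 to 8.0)."""
    total = len(block)
    if total == 0:
        return 0.0
    counts = Counter(block)
    # p * log2(1/p) summed over every byte value that occurs in the block
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    print(block_entropy(b"\x00" * 1024))     # one repeated byte value
    print(block_entropy(bytes(range(256))))  # every byte value exactly once
```

A block of one repeated byte carries no information, while a block in which all 256 byte values are equally likely reaches the 8 bits per byte maximum, which is the range where compressed or encrypted data tends to sit.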
10
• In Malwise, the concept is extended by checking the entropy periodically during unpacking
• If the entropy of the analyzed data is low, we can assume that no more encrypted or compressed data is present and hence stop unpacking
Unpacking
Flowchart: SAMPLE -> ENTROPY HIGH? -> YES: UNPACK, then check entropy again; NO: FINISH UNPACKING
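The stopping rule in the flowchart can be sketched as a loop. This is only an illustration under stated assumptions: nested zlib compression stands in for packing layers, `zlib.decompress` stands in for one round of emulator-driven unpacking, and the 7.0 bits-per-byte threshold is an assumed value that does not come from the slides.

```python
import math
import random
import zlib
from collections import Counter

ENTROPY_THRESHOLD = 7.0  # assumed cut-off in bits per byte, not from the slides


def block_entropy(data: bytes) -> float:
    """Shannon entropy of the data in bits per byte."""
    total = len(data)
    if total == 0:
        return 0.0
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def unpack(sample: bytes, peel_layer=zlib.decompress, max_layers: int = 10):
    """Peel layers while entropy stays high; return (data, layers_peeled)."""
    data, layers = sample, 0
    while layers < max_layers and block_entropy(data) > ENTROPY_THRESHOLD:
        data = peel_layer(data)  # stand-in for one round of emulated unpacking
        layers += 1
    return data, layers


def make_demo_payload(n_words: int = 20000) -> bytes:
    """Low-entropy stand-in for unpacked code: words from a tiny vocabulary."""
    rng = random.Random(0)
    vocab = [b"mov", b"push", b"pop", b"call", b"jmp", b"eax", b"ebx", b"ret"]
    return b" ".join(rng.choice(vocab) for _ in range(n_words))


if __name__ == "__main__":
    payload = make_demo_payload()
    packed = zlib.compress(zlib.compress(payload))  # two simulated packing layers
    data, layers = unpack(packed)
    print(layers, data == payload)
```

The `max_layers` guard is a design choice for the sketch so that a sample whose entropy never drops cannot keep the loop running forever.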
12
• Using speculative disassembly
• Procedures are identified
• Incorrectly identified procedures are eliminated
• An intermediate representation is formed
• A weight is assigned to each signature
Disassembly -> Intermediate representation -> Control flow graph -> Signature
SIGNATURE GENERATION
13
Now signatures can be generated for the two flowgraph matching methods available.
Exact flowgraph matching
• Only exact replicas or isomorphisms are identified
• Signatures are created by ordering the nodes of the control flow graph in depth-first order
• The signature consists of a list of graph edges for the ordered nodes
• Efficient
• Matching is done using dictionary lookup
• Weight is found by $weight_x = B_x / \sum_i B_i$, where $B_i$ is the number of basic blocks in procedure $i$ of the binary
Figure: Depth-first ordered flowgraph and its signature
Signature generation
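The depth-first ordering and dictionary lookup described on this slide can be sketched as below. The graph encoding (a dict mapping each node to its ordered successor list), the textual signature layout, and the family name `Example.A` are all assumptions made for illustration; the sketch also assumes successors are listed in a consistent order for every node, for example fall-through before branch target.

```python
def depth_first_order(cfg, entry):
    """Number the nodes reachable from entry in depth-first (preorder) order."""
    order = {}
    stack = [entry]
    while stack:
        node = stack.pop()
        if node in order:
            continue
        order[node] = len(order)
        # Reverse so the first listed successor is visited first.
        stack.extend(reversed(cfg.get(node, [])))
    return order


def exact_signature(cfg, entry):
    """List each ordered node with the order numbers of its edge targets."""
    order = depth_first_order(cfg, entry)
    parts = []
    for node in sorted(order, key=order.get):
        targets = " ".join(str(order[succ]) for succ in cfg.get(node, []))
        parts.append(f"{order[node]}:{targets}")
    return "|".join(parts)


if __name__ == "__main__":
    # Two procedures with the same shape but different node labels.
    known = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
    sample = {0x10: [0x20, 0x30], 0x20: [0x40], 0x30: [0x40], 0x40: []}

    database = {exact_signature(known, "A"): "Example.A"}  # hypothetical entry
    print(exact_signature(sample, 0x10))
    print(database.get(exact_signature(sample, 0x10), "no match"))
```

Because node labels are replaced by their visiting order, two flowgraphs with the same structure produce the same string, so a lookup in a dictionary keyed by signature is enough to find exact matches.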
14
Approximate flowgraph matching
• Approximate matches of control flow graphs are considered
• Enables detection of variants
• Structuring is used to generate signatures
• The output is a string of character tokens representing high-level structured constructs
• Weight is found by $weight_x = len(s_x) / \sum_i len(s_i)$, where $s_i$ is the signature of procedure $i$ in the binary
Control flowgraph -> High-level structured graph -> SIGNATURE
Signature generation
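Since the approximate signatures are strings, closeness between two of them can be measured with a string edit distance, as mentioned earlier in the deck. The sketch below uses the standard Levenshtein distance and normalises it by the longer string's length; that normalisation and the example token strings are assumptions for illustration, not the system's confirmed formula or output.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the two signature strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    # Hypothetical structured signatures: a variant with one extra construct.
    print(similarity("W{I{E}R}", "W{I{EE}R}"))
```

A small change to a procedure changes only a few tokens of its structured signature, so the similarity stays high, which is what allows variants to be recognised.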
15
Now to obtain the final signature the obtained string is converted to binary
Signature generation
16
• Done using set similarity
• The database comprises signatures of known malware
• The input is a binary
• A similarity is computed between the binary's flowgraph strings and each set of flowgraphs associated with malware in the database
• Complex mechanism
• Also considers the weights associated with the signatures
CLASSIFICATION
New sample -> Non malicious / Malicious
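A heavily simplified sketch of a weighted set similarity is given below. It is not the Malwise algorithm, which the slide describes as a complex mechanism: this version is one-directional, assumes the sample's weights sum to 1, and uses an assumed `match_threshold` of 0.9. The `string_similarity` parameter is a hypothetical hook where an edit-distance ratio could be plugged in; exact matching is used as the default to keep the example short.

```python
def exact_match(a: str, b: str) -> float:
    """Simplest possible string similarity: 1.0 if equal, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def set_similarity(sample, known, string_similarity=exact_match,
                   match_threshold=0.9):
    """Sum the weights of the sample's signatures that have a match in `known`.

    sample: list of (signature, weight) pairs, weights assumed to sum to 1
    known:  iterable of signatures stored for one malware in the database
    """
    known = list(known)
    score = 0.0
    for signature, weight in sample:
        best = max((string_similarity(signature, k) for k in known), default=0.0)
        if best >= match_threshold:
            score += weight
    return score


if __name__ == "__main__":
    sample = [("aaaa", 0.5), ("bb", 0.25), ("cc", 0.25)]
    known = {"aaaa", "bb", "zz"}
    print(set_similarity(sample, known))
```

Weighting each match means a large procedure shared with known malware contributes more to the result than a trivial one.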
17
Basic principle for classification
• The process results in a similarity value for each set of signatures in the malware database
• Value ranges between 0 and 1
• Value > 0.95 => Isomorphs
• Value < 0.6 => No similarity
• 0.6 < Value < 0.95 => Variant
• The threshold values were fixed after a thorough pilot study
Classification
Flowchart: SAMPLE and DATABASE -> SIMILARITY CHECK -> > 0.95: ISOMORPHIC; > 0.6: VARIANT; otherwise: NON MALICIOUS
18
Classification
Flowchart: SAMPLE and DATABASE -> SIMILARITY CHECK -> SIMILARITY > 0.95? YES: EXACT MATCH OF EXISTING MALWARE; NO -> SIMILARITY > 0.6? YES: VARIANT; NO: NON MALICIOUS
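The two-threshold decision in the flowchart can be sketched as a small function. The thresholds 0.95 and 0.6 come from the slides; the input shape (a dict from a hypothetical family name to its similarity value), the returned labels, and the handling of values exactly on a threshold (which follows the strict "greater than" tests of the flowchart) are assumptions of this sketch.

```python
ISOMORPH_THRESHOLD = 0.95  # from the slides
VARIANT_THRESHOLD = 0.6    # from the slides


def classify(similarities):
    """Return (label, family) for the best-matching database entry.

    similarities: dict mapping a malware family name to a value in [0, 1].
    """
    if not similarities:
        return ("non malicious", None)
    family = max(similarities, key=similarities.get)
    value = similarities[family]
    if value > ISOMORPH_THRESHOLD:
        return ("isomorphic", family)   # exact match of existing malware
    if value > VARIANT_THRESHOLD:
        return ("variant", family)
    return ("non malicious", None)


if __name__ == "__main__":
    print(classify({"Example.A": 0.97, "Example.B": 0.70}))
    print(classify({"Example.A": 0.80}))
    print(classify({"Example.A": 0.30}))
```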
19
OEP
• More efficient and effective than any incorporated technique
• The table shows Malwise's performance with some common software
EVALUATION
20
Classification
• Detection rate was about 57.8%
• Earlier approaches achieved a maximum of 39.6%
• Resilient to false positives
• Less than 0.61% of the samples were incorrectly identified as malware
• At least 10 procedures should be present in the flowgraph for performing approximate flowgraph matching
• For exact flowgraph matching, at least 15 procedures should be present
Evaluation
21
CONCLUSION
ISSUE | EARLIER APPROACH | MALWISE
UNPACKING | USING CONTROLLED ENVIRONMENTS | USING ENTROPY ANALYSIS
SIGNATURE GENERATION | BASED ON BYTE LEVEL REPRESENTATION | BASED ON CONTROL FLOW GRAPH
DATABASE | SOURCE CODE DEPENDENT SIGNATURES | CONTROL FLOW DEPENDENT SIGNATURES
CLASSIFICATION | EXACT MATCHING ONLY | EXACT MATCHING AND APPROXIMATE MATCHING
22
• Malware and malware variants can be identified using similarity in control flow graphs
• Unpacking using entropy analysis proved more efficient
• MALWISE proves to be a more efficient and effective substitute for the existing anti-malware systems in internet gateways, or the so-called anti-viruses on our desktops
• Not yet implemented as an anti-malware system
• However, SIMSEER (http://www.simseer.com) and BUGWISE (http://www.bugwise.com) use the same technique
CONCLUSION
23
QUERIES ?
24
• Malwise—An Effective and Efficient Classification System for Packed and Polymorphic Malware(IEEE PRESENTATION)
• http://www.experthacker.com
• http://www.bugwise.com
• http://www.simseer.com
• http://www.gensign.com/flowgraph_malwise
REFERENCES
SILVIO CESARE
25