Behavior-Based Malware Detection
Somesh Jha
University of Wisconsin, Madison
The Malware Problem
Host-based malicious-code detection:
• A new program arrives at an end-host system.
• We need to determine whether the program is malicious.
Viruses, trojans, backdoors, bots, adware, spyware, ...
June 2011 Somesh Jha: Behavior-Based Malware Detection 2
Malware: A Threat Assessment
[Chart: Win32 viruses and other malware. Total viruses and worms (and total families) per half-year, Jan.-June 2002 through Jan.-June 2005; the total rises from 445 to 10,866.]
Source: Symantec Research
New Win32 virus and worm variants 2002-2005

Period            Total viruses and worms   Total families
Jan.-June 2002    445                       141
July-Dec. 2002    687                       184
Jan.-June 2003    994                       164
July-Dec. 2003    1,702                     171
Jan.-June 2004    4,496                     170
July-Dec. 2004    7,360                     N/A
Jan.-June 2005    10,866                    N/A

Source: Symantec Research
Symantec Threat Report 2010
• Highlights from the report
• See http://www.symantec.com/en/uk/business/theme.jsp?themeid=threatreport
Demographics
• Where do attacks originate?
• The US is still at the top of the list
  – 19% in 2009 (23% in 2008)
• Other countries have emerged in the top 10 list
  – Brazil and India
  – Their emergence is related to increased Internet connectivity in these countries
Attack Targets
• Who are the attackers targeting?
• Old news
  – Spam, identity theft, …
  – Still important factors
• New trend
  – Attackers now appear to be targeting enterprises and government organizations
  – The goal seems to be theft of sensitive data or espionage
  – Stuxnet is the most sophisticated example of this kind of attack
Vulnerabilities Exploited
• What vulnerabilities are attackers exploiting?
• Web-based attacks appear to be the most popular
  – Mozilla Firefox seems to be the most vulnerable
• The most common Web-based attack in 2009 was related to malicious PDF activity
  – It exploits vulnerabilities in plug-ins that read the attached PDF file
Malware Trends
• What types of malware were most prevalent?
• Trojans rule!
  – Out of 10 malware families detected, 6 were Trojans (2 worms, 1 back door, and 1 virus)
• Toolkits for creating malware and variants have matured
  – Popular kits: SpyEye, Fragus, Zeus, …
  – In 2009 Symantec encountered 90,000 malware variants created by the Zeus toolkit
Take Aways
• The demographics of attack origins are expanding
• The Web is the major attack vector
• Trojans are the most prevalent form of malware
• Creating malware variants is easy because the toolkits have matured
• Enterprises and organizations are going to be increasingly targeted
Market Trends
• The security market will grow rapidly in other countries (e.g., Brazil and India)
  – Reason: demographics of attack origin
• The enterprise market will expand
  – Reason: enterprises are being targeted by attackers
• Other technologies for detection and remediation will become important
Defenses
• Simple measures
  – Having policies in an enterprise can go a long way
  – For example, don’t open a PDF attachment if you don’t recognize the sender
• Signature-based detection is not enough
  – In 2009 Symantec created 2,895,000 signatures
  – In 2008 they created 1,691,323 signatures
  – These detectors need to be complemented with other types of detection
Defenses (Contd)
• Complementing technologies
  – Behavior-based and reputation-based detection can complement signature-based detection
  – These complementing defenses can keep the number of signatures in check
  – These two technologies are mentioned throughout the report
• Data breaches
  – Keep confidential data secure even if an enterprise gets compromised
  – There are several solutions in the market
  – Remediation solutions will also gain traction
Key Definitions
Variants: New strains of viruses that borrow code, to varying degrees, directly from other known viruses.
Source: Symantec Security Response Glossary
Family: a set of variants with a common code base.
The Beagle family has 197 variants (as of Nov. 30). The Warezov family has 218 variants (as of Nov. 27).
The Malware Problem
• Malware writers use any and all techniques to evade detection.
  – Obfuscation / packing / encryption
  – Remote code updates
  – Rootkit-based hiding
• Detectors use technology from 15 years ago: signature-based detection.
Signature-Based Detection
    lea  eax, [ebp+Data]
    push offset aServices_exe
    push eax
    call _strcat
    pop  ecx
    lea  eax, [ebp+Data]
    pop  ecx
    push edi
    push eax
    lea  eax, [ebp+ExistingFileName]
    push eax
    call ds:CopyFileA
Signature:
    8D 85 D8 FE FF FF 68 78 8E 40 00 50 E8 69 06 00 00 59 8D 85 D8 FE FF FF 59 57 50 8D 85 D4 FD FF FF 50 FF 15 C0 60 40 00
• Signatures (aka scan-strings) are the most common malware detection mechanism.
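A minimal sketch of scan-string matching, assuming a signature is a fixed byte sequence and a file is flagged if that sequence occurs anywhere in it. The function name `matches_signature` and the sample file are hypothetical; real scanners add hashing, offsets, and wildcards.

```python
# Signature (scan-string) from the slide: the bytes of the malicious code fragment,
# one string literal per instruction.
SIGNATURE = bytes.fromhex(
    "8D85D8FEFFFF" "68788E4000" "50" "E869060000" "59"
    "8D85D8FEFFFF" "59" "57" "50" "8D85D4FDFFFF" "50" "FF15C0604000"
)

def matches_signature(data: bytes, signature: bytes = SIGNATURE) -> bool:
    """Return True if the exact byte sequence occurs anywhere in data."""
    return signature in data

# A hypothetical file that embeds the fragment between unrelated bytes.
sample = b"\x00" * 16 + SIGNATURE + b"\xcc" * 8
print(matches_signature(sample))        # True
print(matches_signature(b"\x90" * 64))  # False
```

One signature matches one byte pattern, which is why each new instance needs its own signature.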
Signature Detection Does Not Scale
One signature for one malware instance.
Current Signature Management
McAfee: releases daily updates
  – Trying to move to hourly “beta” updates

DAT file #   Date      Threats detected   New threats added   Threats updated
4578         Sep. 09   147,382            22                  188
4579         Sep. 12   147,828            27                  231
4580         Sep. 13   148,000            11                  236
4581         Sep. 14   148,368            42                  140
4582         Sep. 15   148,721            16                  203
4583         Sep. 16   149,050            18                  117

Source: McAfee DAT Readme
Huge Signature Databases
• Recently, McAfee announced the addition of the 200,000th signature.
  – More signatures than files on a standard Windows machine (approx. 100k).
• McAfee notes that: “Good family detection becomes crucial for a less worrisome experience on the Internet.”
Source: McAfee Avert Labs
Roadmap to Better Detection
• Make the malware writer’s job as hard as possible.
• Detect malware families, not individual malware instances.
• Catch behavior, not syntactic artifacts.
Outline
• Introduction
• Threat Model
• Evaluation of Current Detectors
• Behavior-Based Detection
• Future Directions
Threat Model
• Malware writers craft their programs so as to avoid detection.
Two common evasion techniques:
  – Program obfuscation (preserves malicious behavior)
  – Program evolution (enhances malicious behavior)
Obfuscations for Evasion
• Nop insertion
• Register renaming
• Junk insertion
• Instruction reordering
• Encryption
• Compression
• Reversing of branch conditions
• Equivalent instruction substitution
• Basic block reordering
• ...
Evasion Through Junk Insertion
Original code:
    lea  eax, [ebp+Data]
    push offset aServices_exe
    push eax
    call _strcat
    pop  ecx
    lea  eax, [ebp+Data]
    pop  ecx
    push edi
    push eax
    lea  eax, [ebp+ExistingFileName]
    push eax
    call ds:CopyFileA
With junk (nop) inserted:
    lea  eax, [ebp+Data]
    nop
    push offset aServices_exe
    nop
    nop
    push eax
    call _strcat
    nop
    nop
    nop
    pop  ecx
    lea  eax, [ebp+Data]
    pop  ecx
    push edi
    push eax
    nop
    lea  eax, [ebp+ExistingFileName]
    push eax
    call ds:CopyFileA
Signature:
    8D 85 D8 FE FF FF 68 78 8E 40 00 50 E8 69 06 00 00 59 8D 85 D8 FE FF FF 59 57 50 8D 85 D4 FD FF FF 50 FF 15 C0 60 40 00
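To see why junk insertion defeats an exact scan-string, the sketch below inserts single NOP bytes (0x90) after a few arbitrarily chosen instructions and re-checks the signature. Splitting the fragment into per-instruction chunks and the `assemble` helper are illustrative assumptions, not part of any detector.

```python
# Instruction encodings of the original fragment, one entry per instruction.
INSTRUCTIONS = [
    "8D85D8FEFFFF", "68788E4000", "50", "E869060000", "59",
    "8D85D8FEFFFF", "59", "57", "50", "8D85D4FDFFFF", "50", "FF15C0604000",
]
NOP = b"\x90"

def assemble(chunks, nop_after=frozenset()):
    """Concatenate instruction bytes, inserting one NOP after each index in nop_after."""
    out = bytearray()
    for i, hex_chunk in enumerate(chunks):
        out += bytes.fromhex(hex_chunk)
        if i in nop_after:
            out += NOP
    return bytes(out)

original = assemble(INSTRUCTIONS)
signature = original  # the scan-string is the original fragment itself
obfuscated = assemble(INSTRUCTIONS, nop_after={0, 1, 3, 8})

print(signature in original)    # True
print(signature in obfuscated)  # False: same behavior, different bytes
```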
Evasion Through Reordering
The NOP-padded code, with its blocks reordered and connected by jumps:
    lea  eax, [ebp+Data]
    jmp  label_one
label_two:
    lea  eax, [ebp+Data]
    ...
    push eax
    call ds:CopyFileA
    jmp  label_three
label_one:
    ...
    call _strcat
    ...
    jmp  label_two
label_three:
    ...
Regex signature (90* matches any number of nop bytes):
    8D 85 D8 FE FF FF 90* 68 78 8E 40 00 90* 50 90* E8 69 06 00 00 90* 59 90* ... 90* 50 90* FF 15 C0 60 40 00
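A regex signature like the one above tolerates NOP padding. The sketch builds it with Python's `re` on bytes, allowing any run of 0x90 between instruction encodings, and shows that it matches a padded variant but not a reordered one. The reordering here simply moves the second half of the instructions in front of the first; the jumps that would restore execution order are omitted, which is an assumption made for brevity.

```python
import re

INSTRUCTIONS = [
    "8D85D8FEFFFF", "68788E4000", "50", "E869060000", "59",
    "8D85D8FEFFFF", "59", "57", "50", "8D85D4FDFFFF", "50", "FF15C0604000",
]

def regex_signature(chunks):
    """Instruction encodings in order, with any number of NOP bytes between them."""
    parts = [re.escape(bytes.fromhex(c)) for c in chunks]
    return re.compile(b"(?:\x90)*".join(parts))

def layout(chunks, nops_between=0):
    """Lay out the encodings back to back, optionally padded with NOPs."""
    return (b"\x90" * nops_between).join(bytes.fromhex(c) for c in chunks)

sig = regex_signature(INSTRUCTIONS)
padded = layout(INSTRUCTIONS, nops_between=3)
reordered = layout(INSTRUCTIONS[6:] + INSTRUCTIONS[:6])

print(bool(sig.search(padded)))     # True: NOP insertion is tolerated
print(bool(sig.search(reordered)))  # False: reordering breaks the regex
```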
Evasion Through Encryption
Regex signature:
    8D 85 D8 FE FF FF 90* 68 78 8E 40 00 90* 50 90* E8 69 06 00 00 90* 59 90* ... 90* 50 90* FF 15 C0 60 40 00
Encrypted version (the code is stored XORed with 0x01 and decrypted at run time):
    lea  esi, data_area
    mov  ecx, 37
again:
    xor  byte ptr [esi+ecx], 0x01
    loop again
    jmp  data_area
    . . .
data_area:
    db 8C 84 D9 FF ...
    . . .
    db FE 14 C1 61 ...
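The decryption loop above XORs stored bytes with the key 0x01, so the bytes on disk no longer contain the signature. A minimal sketch of that transformation, simplified to XOR a whole byte string at once (the slide's loop works in place, one byte per iteration); `xor_bytes` is a hypothetical helper name.

```python
KEY = 0x01  # single-byte key used by the decryption loop on the slide

def xor_bytes(data: bytes, key: int = KEY) -> bytes:
    """XOR every byte with key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)

plain = bytes.fromhex("8D85D8FE")   # first bytes of the original code
stored = xor_bytes(plain)           # what appears on disk in data_area
print(stored.hex(" ").upper())      # 8C 84 D9 FF
print(xor_bytes(stored) == plain)   # True: decrypting recovers the code
```

Changing the key changes every stored byte, so each re-encryption would need a new scan-string, while the decryptor stub itself stays small.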
Evasion Through Evolution
• Malware writers are good at software engineering:
  – Modular designs
  – High-level languages
  – Sharing of exploits, payloads, and evasion techniques
Example: the Beagle e-mail virus gained additional functionality with each version.
Beagle Evolution
Source: J. Gordon, infectionvectors.com
• More than 100 variants, not counting associated components.
  Beagle: mass mailer
  Mitglieder: spam relay
  Tooso: weakens security
  Lodear: update engine
  Monikey: propagation manager
  LDPinch: password theft
  Tarno: password theft
  Formglieder: bank info theft
Outline
• Introduction
• Threat Model
• Behavior-Based Detection
• Mining Malicious Behaviors
Empirical Study [Christodorescu & Jha, ISSTA 2004]
• Start with a set of known viruses.
• Create obfuscated versions:
  – Reordering
  – Register/variable renaming
  – Encryption
• Measure resilience to obfuscation (detection rate of obfuscated versions).
Evaluation Goal: Resilience
Question 1:
• How resistant is a virus scanner to obfuscations or variants of known worms?
Question 2:
• Using the limitations of a virus scanner, can a blackhat determine its detection algorithm?
Outline
• Introduction
• Threat Model
• Evaluation of Current Detectors
• Behavior-Based Detection
• Future Directions
Describing Malicious Behavior [Christodorescu et al., Oakland 2005]
• Informal description: “Mass-mailing virus”
• A more precise description: “A program that sends messages containing copies of itself, using the SMTP protocol, in a large number over a short period of time.”
Malspec
• A specification of behavior.
Malware instance (Netsky.B):
    push 10h
    push eax
    push edi
    call connect
    ...              ; compose SMTP
                     ; command "HELO ..."
    push eax
    push ecx
    push edi
    call send
Malspec = syntactic info + semantic info
    Syntactic info:  connect(Y); send(Z,T);
    Semantic info:   “HELO”, Y, Z, T
Obfuscation Preserves Behavior
• Junk insertion + code reordering.
Original:
    push 10h
    push eax
    push edi
    call connect
    ...              ; compose SMTP
                     ; command "HELO ..."
    push eax
    push ecx
    push edi
    call send
Junk inserted:
    push 10h
    nop
    push eax
    xor  eax, ebx
    xor  eax, ebx
    push edi
    call connect
    ...              ; compose SMTP
                     ; command "HELO ..."
    push eax
    push eax
    pop  eax
    push ecx
    push edi
    call send
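A toy illustration, assuming straight-line code listed as strings: the junk instructions change the listing but not the sequence of calls. Projecting a listing onto its call targets is only a crude stand-in for a behavior-level view; because it ignores control flow, it covers straight-line junk insertion and not jump-based reordering. The `call_sequence` helper is hypothetical.

```python
ORIGINAL = [
    "push 10h", "push eax", "push edi", "call connect",
    "push eax", "push ecx", "push edi", "call send",
]
OBFUSCATED = [
    "push 10h", "nop", "push eax", "xor eax, ebx", "xor eax, ebx",
    "push edi", "call connect",
    "push eax", "push eax", "pop eax", "push ecx", "push edi", "call send",
]

def call_sequence(instructions):
    """Keep only the call targets, in program order."""
    return [ins.split(None, 1)[1] for ins in instructions if ins.startswith("call ")]

print(ORIGINAL == OBFUSCATED)                                # False: the listing changes
print(call_sequence(ORIGINAL) == call_sequence(OBFUSCATED))  # True: the calls do not
```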
Obfuscation Preserves Behavior
• Junk insertion + code reordering.
Original:
    push 10h
    push eax
    push edi
    call connect
    ...              ; compose SMTP
                     ; command "HELO ..."
    push eax
    push ecx
    push edi
    call send
Junk inserted and reordered:
    push 10h
    nop
    push eax
    jmp  L1
L4: push ecx
    push edi
    jmp  L5
L2: xor  eax, ebx
    push edi
    call connect
    ...              ; compose SMTP
                     ; command "HELO ..."
    push eax
    push eax
    jmp  L3
L1: xor  eax, ebx
    jmp  L2
L3: pop  eax
    jmp  L4
L5: call send
Evolution Preserves Behavior
• Add error handling.
Original:
    push 10h
    push eax
    push edi
    call connect
    ...              ; compose SMTP
                     ; command "HELO ..."
    push eax
    push ecx
    push edi
    call send
Evolved:
    push 10h
    push eax
    push edi
    call connect
    ...              ; check return code
    jnz  error_handler
    ...              ; compose SMTP
                     ; command "HELO ..."
    push eax
    push ecx
    push edi
    call send
    ...              ; check return code
    jnz  error_handler
    ...
error_handler:
    ...
Detection Using Malspecs
Static detection: given an executable binary, check whether it satisfies the malspec φ.
Just like model checking, but...
• Malicious code allows no assumptions to be made.
• Real-time constraints.
A Behavior-Based Detector
• Match the syntactic constructs, then check the semantic information.
Malspec:
    Syntactic info:  connect(Y); send(Z,T);
    Semantic info:   “HELO”, Y, Z, T
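A minimal sketch of the two-phase check on an event trace, assuming each event is a call name with concrete argument values. Phase 1 binds the malspec variables Y, Z, T by matching connect(Y) followed by send(Z, T); phase 2 evaluates semantic predicates on the bindings. The predicates (the socket passed to send equals the one passed to connect, and the data starts with "HELO") are illustrative assumptions; the detector in the talk analyzes binaries statically and consults an oracle for such predicates.

```python
from itertools import combinations

# Syntactic part: call templates in order; each parameter is a malspec variable.
TEMPLATES = [("connect", ["Y"]), ("send", ["Z", "T"])]

# Semantic part: predicates over the variable bindings (hypothetical choices).
PREDICATES = [
    lambda b: b["Z"] == b["Y"],                # same socket
    lambda b: str(b["T"]).startswith("HELO"),  # SMTP greeting
]

def bind(template, event, bindings):
    """Extend bindings so that template matches event; return None on failure."""
    name, params = template
    ev_name, args = event
    if name != ev_name or len(params) != len(args):
        return None
    new = dict(bindings)
    for p, a in zip(params, args):
        if p in new and new[p] != a:
            return None
        new[p] = a
    return new

def detect(trace, templates=TEMPLATES, predicates=PREDICATES):
    """Phase 1: match templates to events in order. Phase 2: check the predicates."""
    for idxs in combinations(range(len(trace)), len(templates)):
        b = {}
        for t, i in zip(templates, idxs):
            b = bind(t, trace[i], b)
            if b is None:
                break
        if b is not None and all(p(b) for p in predicates):
            return b
    return None

trace = [("connect", [3]), ("send", [7, "NOOP"]), ("send", [3, "HELO example"])]
print(detect(trace))  # {'Y': 3, 'Z': 3, 'T': 'HELO example'}
```

The brute-force search over event combinations is exponential in the number of templates; it is meant only to make the "syntax first, semantics second" split concrete.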
Check the Semantic Info
Program (Netsky.O):
    push 10h
    push eax
    push [ebp+s]
    call connect
    ...
    push ebx
    lea  eax, [ebp+s]
    push eax
    call send_email
send_email():
    ...              ; compose SMTP command "HELO ..."
    lea  eax, [ebp+arg1]
    push eax
    lea  eax, [ebp+buffer]
    push eax
    call SMTP_send_and_rcv
SMTP_send_and_rcv():
    push eax
    push [ebp+arg1]
    mov  eax, [ebp+arg2]
    push [eax]
    call send
Malspec:
    Syntactic info:  connect(Y); send(Z,T);
    Semantic info:   “HELO”, Y, Z, T
Check with the Oracle
• Assume we have an oracle that can validate value predicates.
Does “eax before == ebx after” hold for the code sequence
    push eax
    call foo
    mov  ebx, [ebp+4]
?
Yes.
Query the Oracle
Program (Netsky.O) and malspec as on the previous slide, with two program points marked A and B.
Does memory[ebp@A+4] == memory[ebp@B+4] hold for the code sequence between A and B?
Yes.
A Recipe for an Oracle
• An instance of the program verification problem: does program P respect property φ?
[Figure: inputs are a code fragment P and expressions e1, …, ek. Candidate oracles, from less to more powerful and at increasing cost: pattern matching, random execution, Simplify theorem prover, UCLID model checker.]
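Of the oracles listed, random execution is the easiest to sketch: run the code fragment from many random initial states and answer "yes" only if the predicate held every time. The tiny interpreter below, which supports just `mov`, `xor`, `add`, and `nop` on four registers, is a hypothetical stand-in. Random testing can refute a predicate but cannot prove it, which is where the more powerful and more costly theorem-prover and model-checker oracles come in.

```python
import random

REGS = ["eax", "ebx", "ecx", "edx"]
MASK = 0xFFFFFFFF  # 32-bit registers

def run(fragment, state):
    """Execute a list of (op, dst, src) tuples on a copy of the register state."""
    s = dict(state)
    for op, dst, src in fragment:
        val = s[src] if src in s else src  # register operand or immediate
        if op == "mov":
            s[dst] = val
        elif op == "xor":
            s[dst] ^= val
        elif op == "add":
            s[dst] = (s[dst] + val) & MASK
        elif op != "nop":
            raise ValueError(f"unsupported op {op}")
    return s

def random_oracle(fragment, predicate, trials=1000, seed=0):
    """Return True if predicate(before, after) held on every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        before = {r: rng.getrandbits(32) for r in REGS}
        if not predicate(before, run(fragment, before)):
            return False
    return True

# Junk from the obfuscation slides: does eax keep its value?
junk = [("nop", None, None), ("xor", "eax", "ebx"), ("xor", "eax", "ebx")]
print(random_oracle(junk, lambda b, a: a["eax"] == b["eax"]))                # True
print(random_oracle([("add", "eax", 1)], lambda b, a: a["eax"] == b["eax"]))  # False
```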
A Behavior-Based Prototype
• Developed malspecs for several families of worms.
• No false positives.
• Improved resilience to common obfuscations.
Evaluation of Malspecs
• McAfee uses individual signatures for each worm.
• Malspecs provide forward detection (detection of later, previously unseen variants).
[Table: detection of Netsky.B, Netsky.C, Netsky.D, Netsky.O, Netsky.P, Netsky.T, and Netsky.W by McAfee's decryption signature, McAfee's mass-mailing signature, and the prototype detector]
Performance
• The prototype is slower than commercial anti-virus tools.
• Plenty of room for improvement, e.g., the disassembler takes 25% of the time.

Malware family   Avg. running time   Std. deviation
Netsky           99.57 s             41.01 s
Beagle           56.41 s             40.72 s
Evaluation: False Positive Rate
• Tested the malspecs on 2,000 benign Windows binaries.
• False positive rate: 0%
[Chart: disassembly rate (0% to 100%) vs. program size, grouped in 5 kB increments, from 0 B to 143,360 B]
Evaluation: Obfuscation Resilience
• Different types of garbage insertion were applied to Beagle.Y to obtain more variants.

                      Behavior-based detection        McAfee
Obfuscation type      Avg. time    Detection rate     Detection rate
Nop insertion         74.81 s      100%               75%
Stack op. insertion   159.10 s     100%               25%
Math op. insertion    186.50 s     95%                5%
Formally Assessing Resilience [POPL 2007]
• Soundness (no false positives)
• Completeness (no false negatives)
[Figure: an obfuscation is applied to a program; does the detector still match the obfuscated program against the malspec (“HELO”, Y, Z, T)?]
Approach to Assessing Resilience
• The detector “filters out” irrelevant aspects of the program (described in terms of trace semantics).
[Figure: the detector, viewed as a program abstraction, compares the malspec with the obfuscated program]
Dynamic Behavior-Based Detection
• Threatfire
• Sana Security
• Novashield
NovaShield Behavior Engine Architecture
[Figure: File Monitor, Registry Monitor, Process Monitor, and Network Monitor in the OS kernel; Behavior Engine; user processes; security policies]
Additional Information
• Papers
  – M. Christodorescu and S. Jha, Testing Malware Detectors, International Symposium on Software Testing and Analysis (ISSTA), 2004.
  – M. Christodorescu, S. Seshia, S. Jha, D. Song, and R. Bryant, Semantics-Aware Malware Detection, IEEE Symposium on Security and Privacy (Oakland), 2005.
  – M. Dalla Preda, M. Christodorescu, S. Debray, and S. Jha, A Semantics-Based Approach to Malware Detection, Symposium on Principles of Programming Languages (POPL), January 2007.
• Website: http://www.cs.wisc.edu/~jha/
Behavior-Based Detection
• The old way – match syntactic signatures: one-to-one (< 50% detection)
• The new way – examine underlying behavior: one-to-many
Specifying Behaviors
[Figure: behavior graph with an edge from NtOpenKey(“…\CurrentVersion\Run”) to NtDeleteValueKey(“McAfee Firewall”)]
Specifying Behaviors
• Behavior-graph representation
  – Nodes represent events & arguments
    • System calls, library calls, high-level events
  – Edges represent data dependencies
    • Data substring equality, resource generation/use
  – Argument values are crucial!
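A minimal sketch of the behavior-graph representation described above, assuming nodes are system-call events with their arguments and an edge records that a value produced by one event (here, a registry key handle) is consumed by a later one. Only the resource generation/use kind of edge is modeled, not substring equality; the `Event` fields, `build_graph`, and the handle value 501 are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    name: str                                    # system call, library call, or high-level event
    args: dict                                   # argument values, which the spec relies on
    returns: dict = field(default_factory=dict)  # values this event produces

def build_graph(events):
    """Add an edge i -> j when a value returned by event i is passed as an argument to event j."""
    edges = []
    for i, src in enumerate(events):
        for j in range(i + 1, len(events)):
            for out_val in src.returns.values():
                if out_val in events[j].args.values():
                    edges.append((i, j))
    return edges

trace = [
    Event("NtOpenKey", {"path": r"…\CurrentVersion\Run"}, {"handle": 501}),
    Event("NtDeleteValueKey", {"handle": 501, "value": "McAfee Firewall"}),
]
print(build_graph(trace))  # [(0, 1)]
```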
Finding the Needle in the Haystack
[Figure: the subgraph NtOpenKey(“…\CurrentVersion\Run”) → NtDeleteValueKey(“McAfee Firewall”) among other events such as NtOpenKey(“…\InternetSettings\...”) → NtSetValueKey(“ProxyBypass”)]
Large, Complex Problem
• Behavior graphs are large
  – From tens of thousands to millions of nodes
• New malware is ever-present
  – At least 7,933 new samples per day in 2009
• Large, diverse benign application pool
  – Windows 7 is backward-compatible with NT/95
• Manual analysis and brute-force search are not feasible
62 Synthesizing Optimal Malware Specifications June 2011
Our Contributions
• New specification-synthesis algorithm
  – Performs efficient, large-scale data mining first to uncover suspicious behaviors
  – Probabilistically refines and optimizes specifications
• Key algorithms scale to real problem sizes
  – Reduces the window of vulnerability
• Tunable true positive/false positive rate
  – 86% TP for low FP, 100% TP for higher FP
Holmes: Our Approach to Specification Synthesis
• Roadmap:
  – Workflow
    1. Mine significant behaviors
    2. Synthesize specification
  – Results
  – Conclusion
Significant Behaviors
• Significant behaviors discriminate between labeled malicious and benign sets.
• Significance is measured statistically by counting subgraph frequencies
  – Can use information gain, cross entropy, G-test, …
Key Requirement
• A significant behavior appears in many malware graphs and in few benign graphs.
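One of the significance measures named above, information gain, can be computed from frequency counts alone: how many malware graphs and how many benign graphs contain the behavior. The sketch assumes binary labels and a behavior that is either present or absent in each graph; the counts in the usage lines are hypothetical.

```python
from math import log2

def entropy(pos, neg):
    """Binary entropy (in bits) of a set with pos malicious and neg benign items."""
    total = pos + neg
    h = 0.0
    for n in (pos, neg):
        if n:
            p = n / total
            h -= p * log2(p)
    return h

def information_gain(mal_with, mal_total, ben_with, ben_total):
    """Reduction in label entropy from splitting graphs on whether they contain the behavior."""
    total = mal_total + ben_total
    with_b = mal_with + ben_with
    without_b = total - with_b
    h_split = (
        (with_b / total) * entropy(mal_with, ben_with)
        + (without_b / total) * entropy(mal_total - mal_with, ben_total - ben_with)
    )
    return entropy(mal_total, ben_total) - h_split

# Hypothetical counts: 10 malware graphs and 10 benign graphs.
print(round(information_gain(10, 10, 0, 10), 3))  # 1.0: in all malware, no benign
print(round(information_gain(5, 10, 5, 10), 3))   # 0.0: equally common in both sets
```

A behavior scores highest when it separates the two sets perfectly, which is exactly the key requirement stated above.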
Leap Mining: Extracting Significant Behaviors
• Goal: find the subgraph that optimizes the significance measure.
• Problem: the number of candidate subgraphs is factorial in # nodes + # edges.
Leap Mining (Contd)
• Insight: use the correlation between structural similarity and significance-score similarity to guide the search [Yan et al., SIGMOD ‘08]
  – “Leap” over branches in the search tree with similar structure
• Future: probabilistically compress source graphs to mine behaviors more efficiently [Chen et al., VLDB ‘09]
Leap Mining: Example
[Figure: search tree of candidate subgraphs annotated with significance scores (0, 0.1, 0.2, 0.8, 0.1, 0.2). Callouts: “Significance score similar to parent! This means we can prune siblings” and “Most significant pattern!”]
Holmes: Our Approach to Specification Synthesis
• Roadmap:
  – Workflow
    1. Mine significant behaviors
    2. Synthesize specification
  – Results
  – Conclusion
Naïve Synthesis: Just Significant Behaviors
• Use all significant behaviors exhibited by a specific sample
• Pros:
  – Not path-dependent
  – The significance metric is likely to select behaviors that give low false positives
• Cons:
  – Some significant behaviors may be variant-specific → false negatives!
  – Some samples may not exhibit many mined suspicious behaviors → false positives!
Searching for the Optimal Specification
• Insight: significant behaviors are suspicious behaviors
• A good specification is the right combination of suspicious behaviors
• Given a malware set, search using concept analysis
  – A concept is a pair: ({malware samples}, {suspicious behaviors})
  – Find the set of concepts with optimal true/false positive characteristics
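In formal concept analysis, a concept is a pair (samples, behaviors) in which the behaviors are exactly those shared by all the samples and the samples are exactly those exhibiting all the behaviors. A minimal sketch of the two derivation operators and the closure that turns a set of samples into a concept, assuming a small hypothetical table of which samples exhibit which mined behaviors (the behavior names are made up for illustration).

```python
# Hypothetical context: malware sample -> set of suspicious behaviors it exhibits.
EXHIBITS = {
    "m1": {"disable_av", "autorun", "code_inject"},
    "m2": {"disable_av", "autorun"},
    "m3": {"autorun", "proxy_bypass"},
}
ALL_BEHAVIORS = set().union(*EXHIBITS.values())

def common_behaviors(samples):
    """Behaviors shared by every sample in the set (all behaviors for the empty set)."""
    result = set(ALL_BEHAVIORS)
    for s in samples:
        result &= EXHIBITS[s]
    return result

def samples_with(behaviors):
    """Samples that exhibit every behavior in the set."""
    return {s for s, bs in EXHIBITS.items() if behaviors <= bs}

def concept_from(samples):
    """Close a set of samples into a concept (extent, intent)."""
    intent = common_behaviors(samples)
    return samples_with(intent), intent

extent, intent = concept_from({"m1", "m2"})
print(sorted(extent), sorted(intent))  # ['m1', 'm2'] ['autorun', 'disable_av']
```

One way to read this in the setting above: a concept's behaviors form a candidate specification and its samples are the malware it covers; the search then looks for a set of concepts with the best true/false positive balance.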
Simulated Annealing
• The concept space is enormous: factorial in the number of suspicious behaviors
• Simulated annealing: probabilistic search over localized portions of the solution space
  – Derive new solutions greedily most of the time
  – With some probability, move to a sub-optimal solution to avoid local minima
  – Known sampling methods and cooling schedules guarantee convergence to an optimal solution
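A minimal simulated-annealing sketch over subsets of concepts, assuming a hypothetical score (detection rate minus a weighted false-positive rate) and a geometric cooling schedule; neither the score nor the schedule is taken from the talk. A geometric schedule is common in practice but does not carry the convergence guarantee mentioned above, which requires much slower (logarithmic) cooling.

```python
import math
import random

def anneal(items, score, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search subsets of items for a high score, sometimes accepting worse moves."""
    rng = random.Random(seed)
    current = frozenset()
    best = current
    temp = t0
    for _ in range(steps):
        # Neighbor: toggle one randomly chosen item in or out of the subset.
        candidate = current ^ {rng.choice(items)}
        delta = score(candidate) - score(current)
        # Always accept improvements; accept a worse move with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate
            if score(current) > score(best):
                best = current
        temp *= cooling
    return best

# Hypothetical concepts: name -> (malware samples detected, benign apps falsely flagged).
CONCEPTS = {
    "c1": ({"m1", "m2"}, set()),
    "c2": ({"m3"}, set()),
    "c3": ({"m1", "m2", "m3", "m4"}, {"b1", "b2"}),
    "c4": ({"m4"}, set()),
}
N_MALWARE, N_BENIGN, FP_WEIGHT = 4, 10, 5.0

def score(selection):
    """Detection rate minus weighted false-positive rate of the selected concepts."""
    detected = set().union(*(CONCEPTS[c][0] for c in selection))
    flagged = set().union(*(CONCEPTS[c][1] for c in selection))
    return len(detected) / N_MALWARE - FP_WEIGHT * len(flagged) / N_BENIGN

best = anneal(sorted(CONCEPTS), score)
print(sorted(best))  # ['c1', 'c2', 'c4']
```

Here the single broad concept c3 detects everything but brings false positives, so the search settles on the combination of narrower concepts that reaches full detection with none.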
Simulated Annealing: Example
[Figure: successive candidate solutions annotated with detection rate and false positives; callout: “Probabilistically take sub-optimal solution!”]
Workflow
[Figure: known malware and benign apps → behavior mining → significant behaviors; significant behaviors, recent malware, and benign apps → specification synthesis → discriminative specification]
Holmes: Our Approach to Specification Synthesis
• Roadmap:
  – Workflow
    1. Mine significant behaviors
    2. Synthesize specification
  – Results
  – Conclusion
Evaluation Workflow
[Figure: behavior mining on known malware (492 samples) and benign apps (11 apps) yields significant behaviors (166 behaviors); specification synthesis on recent malware (378 samples) and benign apps (28 apps) yields a discriminative specification (1 specification, with 10-fold cross-validation); behavior-based malware detection on new malware (42 samples) and benign apps (28 apps) yields the detection results]
Corpus Details
• 912 malware samples
  – 18 AV-labeled families
    • Spyware, worms, bots, filesystem viruses, …
  – 492 samples in 6 families for mining
  – 420 samples in 12 families for synthesis & evaluation
• 49 benign applications
  – Behaviorally diverse set: browsers, system administration, media, …
Corpus Details (Contd)
• Trace collection covers a single execution path
  – 120 seconds for malware
  – Typical usage patterns for benign applications
Behavior Mining Results
• Mined 109 unique behaviors
  – 18.1 per family, on average
  – 77 manually deemed malicious
    • Non-malicious behaviors are attributed to sample size
• Most behaviors correspond to those in AV databases
  – Mined some behaviors unreported by AV, e.g., code injection and browser reconfiguration in worms and viruses
  – Some behaviors are missing, likely due to single-path collection
Specification Synthesis Results
• 0 FP on the test corpus at an 86.5% detection rate
• The TP/FP tradeoff is configurable
• Better than commercial AV on our corpus: Sana (42.61%), Threatfire (61.70%)
Performance and Scalability
• Behavior-mining runtime varies between families
  – Worst-case exponential; accuracy can be traded for runtime
  – Similarity between malicious and benign graphs affects runtime
  – Easily parallelized for linear speedup
• Specification synthesis is fast
  – Most specifications are found in under one minute (near-optimal solutions)
  – The optimal solution can be found in exponential time using the same algorithm
Conclusions
• Synthesizing specifications is hard!
• Holmes uses large-scale data mining to extract suspicious behaviors
• Holmes probabilistically searches for near-optimal specifications using suspicious behaviors
• Detection results beat those of commercial AV on our corpus
• Algorithms scale to real problem sizes
Additional Information
• Matt Fredrikson, Somesh Jha, Mihai Christodorescu, Reiner Sailer, Xifeng Yan
  – Synthesizing Near-Optimal Malware Specifications from Suspicious Behaviors.
  – IEEE Symposium on Security and Privacy, 2010.
Outline
• Introduction
• Threat Model
• Evaluation of Current Detectors
• Behavior-Based Detection
• Future Directions
Take Aways
• Malware detection is a $5-6 billion industry
• There is no well-defined threat model
• We need to formally define a threat model and design detection techniques based on it
• Behavior-based malware detection is a move towards that vision
On the Theoretical Side
• Can we prove oracle completeness results?
  – For example, if the oracle can give me a perfect control-flow graph, I can handle reordering perfectly
• How about bounding the adversary?
  – Computational power (as in cryptography)
  – Limit the class of obfuscations
Questions?
Naïve Synthesis: Full Specification
• Use the entire behavior graph of a malware sample
• Pros:
  – Fits the malware very tightly
  – Low false positives
• Cons:
  – Path-specific: e.g., some looping/branching behavior and non-determinism are not critical for the specification
  – Impossible to build the full graph: behaviors not exercised in the training run are not accounted for
  – Likely to miss variants
Specifying Behaviors
• Behavior graph representation
  – Nodes represent events & arguments
    • System calls, library calls, high-level events
  – Edges represent data dependencies
    • Data substring equality, resource generation/use
  – Argument values are crucial!
[Figure: three candidate graphs for NtOpenKey → NtDeleteValueKey joined by a DefUse(1, 1) edge: one with no argument values, one with concrete arguments NtOpenKey(501, ACC_WRITE, “Run”, ) and NtDeleteValueKey(501, “… Firewall”, ), and one labeled with the key “…\CurrentVersion\Run” and the value “McAfee Firewall”; the candidates are annotated “Too general!”, “Too specific!”, and “Just right”]
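A minimal sketch of how such a graph could be built from a trace, assuming a simplified trace format of our own: each event carries its argument values and the positions it writes, and a `DefUse(i, j)` edge is added when argument j of a later event equals output position i of an earlier one. The `Event`/`build_graph` names and the shortened argument lists are hypothetical; Holmes's actual dependence tracking (e.g., data substring equality) is richer than the exact value equality used here.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    name: str                               # system call, library call, or high-level event
    args: tuple                             # argument values; positions are 1-indexed in edge labels
    out_positions: frozenset = frozenset()  # positions the call writes (e.g. a returned handle)

@dataclass
class BehaviorGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src index, dst index, ("DefUse", def_pos, use_pos))

def build_graph(trace):
    """Add a DefUse edge whenever a value written by an earlier event is read by a later one."""
    g = BehaviorGraph(nodes=list(trace))
    for j, use in enumerate(trace):
        for i in range(j):
            d = trace[i]
            for dp in sorted(d.out_positions):
                for up, val in enumerate(use.args, start=1):
                    if up not in use.out_positions and val == d.args[dp - 1]:
                        g.edges.append((i, j, ("DefUse", dp, up)))
    return g

# Hypothetical, simplified version of the registry example above.
trace = [
    Event("NtOpenKey", (501, "ACC_WRITE", r"…\CurrentVersion\Run"), frozenset({1})),
    Event("NtDeleteValueKey", (501, "McAfee Firewall")),
]
print(build_graph(trace).edges)  # [(0, 1, ('DefUse', 1, 1))]
```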
Multi-Faceted Problem
• Detailed behavior information yields a large, data-rich raw source
• Complete behavior information is difficult to extract
  – See the multi-path problem [Cadar et al., CCS ‘06], [Moser et al., Oakland ‘07]
• Malicious and benign behaviors look similar
  – Benign application update vs. malicious dropping
  – Benign network activity vs. malicious C&C
  – Benign software patching vs. malicious code injection
Startup
• A startup, Securitas Technologies Inc., is commercializing some of the ideas presented in this talk
  – See www.securitastech.com
Here be Dragons!
Disclaimer
Virus detection is undecidable [Cohen 1984].
The best approximation so far: byte signatures.
My Proposal for a Solution
• Make the malware writer’s job as hard as possible.
• Stop malware based on behavior:
  – Employ the semantics of instructions
  – Use enforceable interfaces
  – Combine static and dynamic techniques
Current AV Detection Methods
• Scan strings (byte sequences from a malicious executable)
  – Enhanced using regular expressions
• Heuristics
  – Binary file structure
  – APIs used
  – Byte (n-gram) distribution
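To make these two classic techniques concrete, here is a minimal sketch: a scan string written as a regular expression over bytes (with a bounded wildcard gap), and a byte n-gram histogram of the kind used as a heuristic feature. The signature bytes and sample buffers are made up for illustration and are not real malware signatures.

```python
import re
from collections import Counter

# Hypothetical scan string: bytes 8B 45 08, then 1-4 arbitrary bytes, then FF D0.
SCAN_STRING = re.compile(rb"\x8b\x45\x08.{1,4}\xff\xd0", re.DOTALL)

def matches_scan_string(data: bytes) -> bool:
    return SCAN_STRING.search(data) is not None

def ngram_distribution(data: bytes, n: int = 2) -> dict:
    """Relative frequency of every n-byte sequence in `data`."""
    counts = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}

sample = b"\x90\x8b\x45\x08\x50\xff\xd0\xc3"
variant = b"\x8b\x45\x08" + b"\x90" * 8 + b"\xff\xd0"   # same bytes, more junk in the gap
print(matches_scan_string(sample), matches_scan_string(variant))  # True False
```

The variant is missed only because its junk exceeds the 4-byte gap the pattern tolerates, which illustrates how brittle purely syntactic matching is.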
Previous Research
• Different structures over bytes
  – N-gram distributions [Li, Wang, & Stolfo, SMC 2005]
  – Neural networks, Bayes [Arnold & Tesauro, VB2000]
  – Additional features: DLL imports, syscalls [Schultz, Eskin, Zadok, & Stolfo, Oakland 2001]
• Different information about the program
  – Slices from syscalls [Lo, Levitt, & Olsson, 1995]
  – Recovery of high-level constructs [Bergeron, Debbabi, Erhioui, & Ktari, SREIS 2001]
  – Model checking [Kinder, Katzenbeisser, Schallhart, & Veith, DIMVA 2005]
Key Observations
Variants: new strains of viruses that borrow code, to varying degrees, directly from other known viruses.
(Source: Symantec Security Response Glossary)
• Syntactic signatures cannot capture variants.
• Syntactic signature methods do not scale.
Need to focus on behavior.
[Slide annotations: My Previous Research; Proposed Research]
Behavior-Based Detection
• How to describe malicious behavior?
• How to identify malicious behavior?
  – Static techniques
  – Static + dynamic techniques
• How to automatically learn malicious behavior?
• How effective are these techniques?
A Language to Describe Malicious Behaviors (Previous Research)
Establishing a Threat Model
A threat model has three components:
• Attack model: How is the attack performed? Malicious behavior.
• Defensive goal: What is the system designed to protect? The trusted computing base (TCB).
• Time: How long is the protection operational? Forever?
Choosing a TCB
• The interface to the TCB has to be enforceable.
[Figure: layers Program → Libraries (API calls) → OS kernel (system calls) → Processor (instructions), with three candidate TCBs: Libraries/Interpreter + OS + Processor; OS + Processor; Processor]
For this talk: TCB = OS + Processor.
Formal Definition of Malspec
• $\Sigma = \{\sigma_k\}_{k \ge 1}$ is the set of system calls.
• $V = \{v_i\}_{i \ge 1}$ is the set of uninterpreted variables.
• $A$ is a logic of formulas over $V$.
• $G = (N, E)$ is a graph:
  – Vertices are labeled with system calls from $\Sigma$ instantiated with variables from $V$.
  – Edges are labeled with predicates in $A$.
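The definition can be transcribed into a small data structure. This is only a sketch under our own encoding choices: a variable is a string, a node is a system-call name with a tuple of variable names, and an edge predicate is modeled as a Python function over a binding of variables to concrete values. A real implementation would need predicates as formulas that a decision procedure can reason about; `Node`, `Malspec`, and the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Binding = Dict[str, object]   # concrete values assigned to variables in V

@dataclass(frozen=True)
class Node:
    syscall: str              # an element of Sigma
    params: Tuple[str, ...]   # variables from V that instantiate the call

@dataclass
class Malspec:
    nodes: List[Node]
    # (i, j, phi): edge from nodes[i] to nodes[j]; phi stands in for a formula of A
    edges: List[Tuple[int, int, Callable[[Binding], bool]]]

    def variables(self) -> set:
        return {v for node in self.nodes for v in node.params}

spec = Malspec(
    nodes=[Node("NtOpenKey", ("h1", "key")), Node("NtDeleteValueKey", ("h2", "value"))],
    edges=[(0, 1, lambda b: b["h1"] == b["h2"])],   # the key handle flows from node 0 to node 1
)
print(sorted(spec.variables()))  # ['h1', 'h2', 'key', 'value']
```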
Malspec Benefits
• Representation-independent
  – Depends only on the interface to the TCB
  – Ignores function boundaries
  – Ignores specific data structures
  – Ignores process boundaries
• Order-independent
  – Allows any order of operations, as long as the dependence predicates are satisfied.
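Order independence can be illustrated with a brute-force matcher: try every injective assignment of malspec nodes to trace events, ignoring trace order, and accept the first assignment whose variable bindings are consistent and whose edge predicates hold. The tuple encoding and `find_match` are hypothetical, and enumerating permutations is exponential; this is a sketch of the matching condition, not of the detector's actual algorithm.

```python
from itertools import permutations

def find_match(spec_nodes, spec_edges, trace):
    """spec_nodes: [(syscall, params)], spec_edges: [(i, j, predicate)],
    trace: [(syscall, args)]. Return the first consistent binding whose edge
    predicates all hold, regardless of event order in the trace, else None."""
    for chosen in permutations(range(len(trace)), len(spec_nodes)):
        binding, ok = {}, True
        for (call, params), t in zip(spec_nodes, chosen):
            name, args = trace[t]
            if name != call or len(args) != len(params):
                ok = False
                break
            for p, a in zip(params, args):
                if binding.setdefault(p, a) != a:   # each variable binds to one value
                    ok = False
                    break
            if not ok:
                break
        if ok and all(pred(binding) for _, _, pred in spec_edges):
            return binding
    return None

spec_nodes = [("connect", ("s1", "addr")), ("write", ("s2", "data"))]
spec_edges = [(0, 1, lambda b: b["s1"] == b["s2"])]   # same socket
trace = [("open", (3, "log.txt")), ("connect", (5, "mx.example.com")),
         ("write", (3, "noise")), ("write", (5, "EHLO"))]
print(find_match(spec_nodes, spec_edges, trace))
# {'s1': 5, 'addr': 'mx.example.com', 's2': 5, 'data': 'EHLO'}
```

The write to socket 3 is rejected because the dependence predicate fails, while the unrelated `open` call is simply skipped.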
Static Detection of Malicious Behavior (Previous Research)
Step 1: Matching Nodes
Straightforward… except for encrypted code!
• Encryption & compression effectively hide the system calls (i.e., the TCB operations).
• Solution: malware normalization
A Malware Normalizer
• Dynamic analysis technique:
  – Run the program in a contained environment
  – Stop as soon as control flow reaches a previously written address
  – Reconstruct the program from the current memory snapshot
[Figure: Packed executable → Normalizer (running on QEMU, a system emulator) → Unpacked executable]
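The stopping rule can be sketched on a toy machine. Everything here is hypothetical (the instruction set, the `normalize` function, the packed program): memory maps addresses to instruction tuples, `store` stands in for an unpacking write, and emulation stops the moment the program counter reaches an address that was written during emulation. The real normalizer does this for x86 binaries inside QEMU.

```python
def normalize(memory, entry, max_steps=10_000):
    """Toy normalizer over a hypothetical instruction set.
    memory: {address: instruction tuple}. Emulates from `entry` and stops as soon as
    control reaches an address written during emulation; returns (snapshot, new_entry),
    or None if the program halts or the step budget runs out first."""
    mem = dict(memory)   # emulate on a copy: the "contained environment"
    written = set()
    pc = entry
    for _ in range(max_steps):
        if pc in written:
            return dict(mem), pc          # about to execute freshly written code
        ins = mem.get(pc, ("halt",))
        if ins[0] == "store":             # ("store", addr, value): e.g. one decrypted instruction
            _, addr, value = ins
            mem[addr] = value
            written.add(addr)
            pc += 1
        elif ins[0] == "jmp":             # ("jmp", addr)
            pc = ins[1]
        elif ins[0] == "halt":
            return None
        else:                             # ("nop",) and other non-control instructions
            pc += 1
    return None

packed = {0: ("store", 100, ("nop",)), 1: ("store", 101, ("halt",)), 2: ("jmp", 100)}
snapshot, new_entry = normalize(packed, 0)
print(new_entry, snapshot[100], snapshot[101])  # 100 ('nop',) ('halt',)
```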
Detector Characteristics
• Intraprocedural:
  – Flow-sensitive → handles many syntactic obfuscations
• Interprocedural:
  – Context-sensitive OR context-insensitive → handles changes through evolution
Step 2: Predicate Verification
Check whether a program path satisfies the corresponding malspec predicate.
Requirements for the predicate logic:
• Addition, comparison, multiplication
• Bit-vector arithmetic
• Arrays
• On 32-bit values (and soon 64-bit values)
A Simple Verifier
For predicates that express preservation of values: $\varphi(A): A_1 = A_2$
• Syntactic check: compare the code sequence with a known set of obfuscations
  – Nops, pushes & pops
  – Operations on non-live variables
[Figure: the code between two malspec nodes is checked against the malspec predicate φ]
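A minimal sketch of such a syntactic check, assuming a hypothetical instruction encoding (tuples such as `("push", "eax")`): skip nops, skip writes to registers that are not live, and cancel a `push r` against a later `pop r`. If nothing else remains, every live register keeps its value, so $\varphi(A)$ holds; any other instruction makes the check give up, which means "not proven" rather than "not preserved".

```python
JUNK_WRITES = {"mov", "add", "sub", "xor"}   # hypothetical opcode set

def syntactically_preserved(code, live):
    """Return True if `code` consists only of known obfuscation patterns, so every
    register in `live` keeps its value (phi(A): A1 = A2 holds for A in `live`).
    False means "not proven", not "not preserved"."""
    pending = []                              # pushes still waiting for a matching pop
    for ins in code:
        op = ins[0]
        if op == "nop":
            continue
        if op in JUNK_WRITES and ins[1] not in live:
            continue                          # writes a non-live register only
        if op == "push":
            pending.append(ins[1])
            continue
        if op == "pop" and pending and pending[-1] == ins[1]:
            pending.pop()                     # push r ... pop r restores r
            continue
        return False                          # outside the known patterns
    return not pending

junk = [("nop",), ("push", "eax"), ("mov", "ecx", 5), ("pop", "eax"), ("nop",)]
print(syntactically_preserved(junk, live={"eax"}))                 # True
print(syntactically_preserved([("add", "eax", 1)], live={"eax"}))  # False
```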
Preliminary Results [Christodorescu & Jha, USENIX Security 2003]
Detection succeeds in the presence of:
  – Code reordering
  – Simple junk insertion
  – Register renaming
Zero missed detections (compared to very high missed-detection rates for a commercial virus scanner)
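A Value-Preservation Verifier
• Express the program path as a state transformer
  – Use instruction semantics
• Use decision procedures to check whether the state transformer satisfies the malspec predicate φ
[Figure: code fragment → state transformer, checked against predicate φ from the malspec]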
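As a toy stand-in for the decision procedures, here is a sketch restricted to `mov`/`add`/`sub` on 32-bit registers (a hypothetical encoding). In this fragment every register value is an affine function of the initial register values modulo $2^{32}$, so composing the instruction semantics yields a normal form, and $\varphi(A): A_1 = A_2$ holds exactly when $A$'s final form is the identity. This is our own illustration, not the theorem prover or model checker used in the original work.

```python
MASK = 2**32 - 1   # registers are 32-bit bit-vectors

# A register value is an affine form over the initial register values:
# (coefficients {register: c}, constant k), meaning sum(c * r0) + k mod 2**32.

def _combine(f, g, sign=1):
    coeffs = dict(f[0])
    for r, c in g[0].items():
        coeffs[r] = (coeffs.get(r, 0) + sign * c) & MASK
    return {r: c for r, c in coeffs.items() if c}, (f[1] + sign * g[1]) & MASK

def _operand(state, x):
    return state[x] if isinstance(x, str) else ({}, x & MASK)

def transformer(path, regs):
    """Compose the semantics of each instruction into one state transformer."""
    state = {r: ({r: 1}, 0) for r in regs}   # each register starts as its own initial value
    for op, dst, src in path:
        if op == "mov":
            state[dst] = _operand(state, src)
        elif op == "add":
            state[dst] = _combine(state[dst], _operand(state, src))
        elif op == "sub":
            state[dst] = _combine(state[dst], _operand(state, src), sign=-1)
        else:
            raise ValueError(f"no affine semantics for {op!r}")
    return state

def preserves(path, reg, regs):
    """Decide phi(reg): reg_1 = reg_2 by comparing normal forms (exact for this fragment)."""
    return transformer(path, regs)[reg] == ({reg: 1}, 0)

path = [("mov", "ebx", "eax"), ("add", "eax", "ebx"), ("sub", "eax", "ebx"),
        ("add", "eax", 0xFFFFFFFF), ("add", "eax", 1)]
print(preserves(path, "eax", ["eax", "ebx"]))  # True
```

The last two instructions only cancel because of 32-bit wrap-around, which is why the slides require bit-vector arithmetic in the predicate logic.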
Verification Tools
• An instance of the program verification problem: does program P respect property φ?
[Figure: a code fragment and predicate φ are passed to four tools, ordered by increasing power and cost: pattern matching, random execution (random abstract interpretation), the Simplify theorem prover, and the UCLID model checker; the answers shown are Yes, No, Yes, Yes]
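The random-execution option can be sketched as testing: run the fragment on random 32-bit initial states and compare the register before and after. A mismatch is a definite "No"; passing every trial only means no counterexample was found, so this check is cheap but unsound. The instruction encoding, `run`, and `random_execution` are hypothetical.

```python
import random

MASK = 2**32 - 1

def run(path, state):
    """Concrete 32-bit semantics for a few instructions (hypothetical encoding)."""
    s = dict(state)
    for op, dst, src in path:
        v = s[src] if isinstance(src, str) else src & MASK
        if op == "mov":
            s[dst] = v
        elif op == "add":
            s[dst] = (s[dst] + v) & MASK
        elif op == "sub":
            s[dst] = (s[dst] - v) & MASK
        elif op == "xor":
            s[dst] ^= v
        else:
            raise ValueError(f"no semantics for {op!r}")
    return s

def random_execution(path, reg, regs, trials=100, seed=0):
    """Test phi(reg): reg_1 = reg_2 on random initial states.
    False is a definite "No" (a counterexample was found); True only means none was found."""
    rng = random.Random(seed)
    for _ in range(trials):
        start = {r: rng.getrandbits(32) for r in regs}
        if run(path, start)[reg] != start[reg]:
            return False
    return True

double_xor = [("xor", "eax", "ebx"), ("xor", "eax", "ebx")]
print(random_execution(double_xor, "eax", ["eax", "ebx"]))              # True
print(random_execution([("xor", "eax", "eax")], "eax", ["eax", "ebx"]))  # False
```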
Evaluation of Value-Preservation [Christodorescu & Jha, Oakland 2005]
McAfee uses individual signatures for each worm. Semantic malspecs provide forward detection.
[Figure: a prototype detector with a decryption malspec and a mass-mailing malspec, applied to Netsky.B, Netsky.C, Netsky.D, Netsky.O, Netsky.P, Netsky.T, and Netsky.W]
Architecture (up to now)
[Figure: Executable → Malware Normalizer → Normalized Executable → Semantics-Aware Malware Detector, which uses the Malspec Library and a Semantic Query Engine built on Decision Procedures, Static Analyses, and Instruction/Syscall Semantics]
Hybrid Detection of Malicious Behavior (Proposed Research)
Static Analysis is Not Perfect
• Safety at the cost of precision
  – Good for strict security, bad for usable security.
[Figure: malspec predicate φ checked against a Perl interpreter]
Imprecision of Static Analysis
• Many sources of imprecision:
  – Disassembly
  – Control-flow reconstruction
  – Loops, recursion
  – Malspec predicate verification (decision procedures)
• Leads to false positives
Dynamic Analysis
• As precise as possible for a particular execution
  – Can retrieve any part of the program state
  – Adds a time dimension
• But... adds runtime overhead
  – Emulators are orders of magnitude slower
A Hybrid Malware Detector
Combine static + dynamic:
  – Identify where static analysis loses precision
  – Have the dynamic analyzer check those locations
Detection goal: check only whether malicious behavior appears in the current execution.
Small (<10%) runtime overhead needed.
Example
[Figure: static stage checks predicate φ against a Perl interpreter; dynamic stage runs the Perl interpreter under runtime monitoring]
The runtime monitor determines whether a portion of the trace satisfies the predicate.
Hybrid Detector Operation
1. Determine path validity
   Static analysis identifies a certain path as possibly malicious.
   Dynamic analysis confirms that the current execution trace follows that path.
2. Check that the trace satisfies the predicate
   At the end of the trace segment that matches the path, verify the malspec predicate.
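The two steps above can be sketched as a runtime monitor, assuming (hypothetically) that the trace records one `(location, state)` pair per executed location of interest and that the flagged path must appear as a contiguous segment. Step 1 compares the segment's locations with the path; step 2 evaluates the malspec predicate on the states at the segment's start and end.

```python
def monitor(trace, path, predicate):
    """trace: (location, state) pairs observed at runtime.
    path: locations that static analysis flagged as possibly malicious.
    predicate(start_state, end_state) -> bool: the malspec predicate.
    Returns True if some contiguous trace segment follows `path` and satisfies it."""
    n = len(path)
    for i in range(len(trace) - n + 1):
        segment = trace[i:i + n]
        # 1. Path validity: does this stretch of the execution follow the flagged path?
        if [loc for loc, _ in segment] != path:
            continue
        # 2. At the end of the matching segment, verify the predicate.
        if predicate(segment[0][1], segment[-1][1]):
            return True
    return False

def eax_preserved(start, end):
    return start["eax"] == end["eax"]

trace = [(0x10, {"eax": 7}), (0x14, {"eax": 7}), (0x18, {"eax": 9}),
         (0x10, {"eax": 3}), (0x14, {"eax": 3}), (0x18, {"eax": 3})]
print(monitor(trace, [0x10, 0x14, 0x18], eax_preserved))  # True (second pass)
```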
Architecture [hybrid]
[Figure: Executable → Malware Normalizer → Normalized Executable → Semantics-Aware Malware Detector, which uses the Malspec Library and a Semantic Query Engine built on Decision Procedures, Static + Dynamic Analyses, and Instruction/Syscall Semantics]
Automatic Extraction of Malicious Behavior (Proposed Research)
Deriving Malspecs
Goal: extract a malspec from a sample program labeled as malicious.
• Requirements
  – Capture behavior, not implementation
  – Low to no false positives
Two options: learn from multiple samples, or from one sample
Malspec from Multiple Samples
Learning a malspec from multiple samples:
1. Identify common sequences of system calls.
   – Subgraph isomorphism
2. For each pair of system calls, construct a predicate describing the actual code paths.
   – Symbolic execution, human expert
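As a much simpler stand-in for subgraph isomorphism, step 1 can be approximated on call sequences with a longest common subsequence over system-call names. This ignores arguments and data dependencies entirely, and the traces below are only loosely modeled on the Beagle.B/Beagle.C example, assuming `foo` expands to a `write` followed by a `read`; `common_calls` is a hypothetical helper.

```python
def common_calls(trace_a, trace_b):
    """Longest common subsequence of system-call names in two traces
    (a sequence-based stand-in for subgraph isomorphism)."""
    m, n = len(trace_a), len(trace_b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]   # dp[i][j] = LCS length of suffixes
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if trace_a[i] == trace_b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < m and j < n:                        # walk the table to recover one LCS
        if trace_a[i] == trace_b[j]:
            out.append(trace_a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out

beagle_b = ["socket", "connect", "write", "write", "write", "write", "close"]
beagle_c = ["socket", "connect"] + ["write", "read"] * 4 + ["close"]
print(common_calls(beagle_b, beagle_c))
# ['socket', 'connect', 'write', 'write', 'write', 'write', 'close']
```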
Example
Beagle.B:
  X = socket()
  connect( Y )
  write( Z, “EHLO ...” )
  write( A, “TO ” + address )
  write( B, “DATA” )
  write( C, body )
  close( D )
Beagle.C:
  X = socket()
  connect( Y )
  foo( Z, “EHLO ...” )
  foo( A, “TO ” + address )
  foo( B, “DATA” )
  foo( C, body )
  close( D )
where foo( A, B ) performs write( A, B ) followed by read( C )
Malspec from One Sample
Additional semantic information is needed:
• System call API usage rules
  – Provide sequencing information and some data-flow information
• Network protocol semantics
  – Provide sequencing information and additional data-flow information
Example: Beagle.B
System call rules:
  socket connect (write|read)* close
SMTP protocol:
  write(“EHLO”)
  write(“MAILTO”+addr)
  write(“DATA”)
  write(body)
Beagle.B trace:
  X = socket()
  connect( Y )
  write( Z, “EHLO ...” )
  write( A, “TO ” + address )
  write( B, “DATA” )
  write( C, body )
  close( D )
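Both kinds of knowledge can be checked mechanically. In this sketch the system-call rule becomes a regular expression over space-separated call names, and the SMTP rule is simplified to an ordered list of markers that must appear in successive written messages. The marker list, the message strings, and the function names are our own hypothetical simplifications, not a full SMTP model.

```python
import re

# System-call usage rule from the slide, as a regex over space-separated call names.
API_RULE = re.compile(r"socket connect( (write|read))* close")

# Protocol markers in the order the slide's SMTP rule requires them
# (simplified: we only look for these substrings in the written messages).
SMTP_ORDER = ["EHLO", "TO", "DATA"]

def follows_api_rule(calls):
    return API_RULE.fullmatch(" ".join(calls)) is not None

def follows_smtp(messages):
    """True if the markers appear, in order, in successive written messages."""
    remaining = iter(messages)
    return all(any(marker in m for m in remaining) for marker in SMTP_ORDER)

calls = ["socket", "connect", "write", "write", "write", "write", "close"]
messages = ["EHLO example.com", "TO victim@example.com", "DATA", "body"]
print(follows_api_rule(calls), follows_smtp(messages))  # True True
```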
Complete Architecture
[Figure: Executable → Malware Normalizer → Normalized Executable → Semantics-Aware Malware Detector, which uses the Malspec Library, a Malspec Generator, and a Semantic Query Engine built on Decision Procedures, Static + Dynamic Analyses, and Instruction/Syscall Semantics]
Theoretical Limits of Behavior-Based Detection (Proposed Research)
What Does This Buy Us?
• How strong (theoretically) is this system? Or: how much harder does the malware writer have to work to evade my system?
Goal: “design” a computationally bounded adversary, and assess the behavior-based detector against this adversary.
Timeline
[Figure: timeline from June 2005 through 2006 to June 2007 covering malspec extraction from many samples; malspec extraction from one sample; hybrid detection (runtime monitor, path checking, predicate checking); theoretical work; thesis writing; interview season]
Behavior-Based Malware Detection
Somesh Jha
Joint work with Mihai Christodorescu
Step 2: Unification
• One-way unification associates program expressions with the uninterpreted variables in the malspec.
• Result: one binding map for each matched pair (malspec node, program location).
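One-way unification differs from ordinary unification in that only the malspec side contains variables; program expressions are ground terms. A minimal sketch, assuming a hypothetical term encoding (nested tuples for program expressions, a `Var` class for malspec variables) and a program given as a map from locations to call expressions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str          # an uninterpreted malspec variable from V

def unify(pattern, expr, binding=None):
    """One-way unification: bind malspec variables in `pattern` to subterms of the
    ground program expression `expr`. Returns the binding map, or None on failure."""
    binding = dict(binding or {})
    if isinstance(pattern, Var):
        if pattern.name in binding:
            return binding if binding[pattern.name] == expr else None
        binding[pattern.name] = expr
        return binding
    if isinstance(pattern, tuple) and isinstance(expr, tuple):
        if len(pattern) != len(expr):
            return None
        for p, e in zip(pattern, expr):
            binding = unify(p, e, binding)
            if binding is None:
                return None
        return binding
    return binding if pattern == expr else None

def bindings_for(pattern, program):
    """One binding map per program location whose expression unifies with the node."""
    return {loc: b for loc, expr in program.items()
            if (b := unify(pattern, expr)) is not None}

program = {
    0x401000: ("call", "connect", ("mem", "ebp", -8), "eax"),
    0x401010: ("call", "write", "eax", "ebx"),
}
node = ("call", "connect", Var("S"), Var("Addr"))
print(bindings_for(node, program))  # {4198400: {'S': ('mem', 'ebp', -8), 'Addr': 'eax'}}
```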
Evaluation: Obfuscation Resilience
• Different types of junk insertion were applied to Beagle.Y to obtain more variants.
Obfuscation type    | Semantics-aware: average time | Semantics-aware: detection rate | McAfee: detection rate
Nop insertion       | 74.81 s                       | 100%                            | 75%
Stack op. insertion | 159.10 s                      | 100%                            | 25%
Math op. insertion  | 186.50 s                      | 95%                             | 5%
Problems with Dynamic Analysis
• Execution may have affected the host machine in a malicious way.
Goal: stop execution as soon as it enters a path that is certainly malicious.
• Static analysis can help identify these points of no return.
[Figure: Perl interpreter]