Abusing File Processing in Malware Detectors for Fun and Profit
Suman Jana and Vitaly Shmatikov
The University of Texas at Austin
Feb 24, 2016
Modern malware research
• All about sophisticated detection and evasion techniques
o Polymorphism, metamorphism, obfuscation…
Topic of this talk
• The world’s simplest virus (no changes to virus content) against the world’s best malware detector
Malware researcher’s view of malware detection
[Diagram: malware, honeypot, malware detector]
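How malware detectors work in practice
[Diagram: traffic from the internet passes through an internet gateway running a malware detector, which infers the file type and then parses the file, before reaching users on the intranet]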
Why must malware detectors infer file types?
• Correctness
o Detection algorithms are type-specific
o Parsing depends on file type
• Efficiency
o Detectors may skip less vulnerable types like MPEG
Parsing in malware detectors
• Archive files: extract files
• HTML files: remove whitespace characters, e.g. <a> foo </a> becomes <a>foo</a>
• Word documents: find macros
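The HTML normalization step can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not the preprocessing any particular detector uses; the function name `strip_tag_whitespace` is hypothetical, and the sketch assumes that only whitespace directly next to a tag boundary is removed.

```python
import re


def strip_tag_whitespace(html: str) -> str:
    """Remove whitespace that directly follows '>' or directly precedes '<'."""
    html = re.sub(r">\s+", ">", html)
    return re.sub(r"\s+<", "<", html)


# The example from the slide, written with a standard closing tag.
print(strip_tag_whitespace("<a> foo </a>"))
```

Whitespace inside the text itself, such as the space in `<b>x y</b>`, is left alone.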
Parsing in malware detectors
• Formats shown: MS CAB, MS CHM, JavaScript, PDF, MS EXE, ELF, Adobe Flash, RTF, MS PPT(X), COFF, bzip, COM, MS DLL, 7-zip, JAR, GIF, MP3, JPEG
Why must malware detectors parse before detection?
• Identify executable content
o Macros in Word files
o Code segments in PE, ELF
o JavaScript in CHM
• Normalize input to a form suitable for detection
o Decompress
o Preprocess HTML
• Separate metadata from content
Detectors must parse lots of file formats
File-type inference and parsing take place in two different places
[Diagram: a file from the internet goes to the malware detector (infer file type, parse) and, if uninfected, on to the user application/OS (infer file type, parse)]
Any difference between the two = potential evasion
Exhibit A (CVE-2012-1419)
• TAR files: ustar at offset 257
• mirc.ini files: [aliases] at offset 0
• Original TAR archive: the initial 100 bytes of the header contain the name of the first file, eicar.com\0, with ustar at offset 257; the archived file is eicar.com
• Modified TAR archive: the name in the header becomes [aliases].com\0; the filename changes but the content of eicar.com is unmodified
Vulnerable detectors
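The type confusion in Exhibit A can be reproduced with Python's standard `tarfile` module. The sketch below is a minimal model, assuming a detector that checks the mirc.ini signature before the TAR signature; `naive_infer_type` and `build_tar` are hypothetical helpers, and the payload is a harmless placeholder rather than the EICAR test file.

```python
import io
import tarfile


def build_tar(member_name: str, payload: bytes) -> bytes:
    """Create an in-memory ustar archive holding one file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def naive_infer_type(data: bytes) -> str:
    """Hypothetical detector logic: first matching signature wins."""
    if data.startswith(b"[aliases]"):   # mirc.ini signature at offset 0
        return "mirc.ini"
    if data[257:262] == b"ustar":       # TAR signature at offset 257
        return "tar"
    return "unknown"


PAYLOAD = b"harmless placeholder standing in for the test file"
plain = build_tar("eicar.com", PAYLOAD)
renamed = build_tar("[aliases].com", PAYLOAD)

print(naive_infer_type(plain))    # matched as a TAR archive
print(naive_infer_type(renamed))  # matched as mirc.ini, although it is still a TAR archive

# A real TAR reader still opens the renamed archive and finds the same content.
with tarfile.open(fileobj=io.BytesIO(renamed)) as tar:
    member = tar.getmembers()[0]
    extracted = tar.extractfile(member).read()
```

Only the member name differs between the two archives; the stored content is byte-for-byte the same.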
Exhibit B (CVE-2012-1463)
Executable and Linkable Format (ELF)
• Header byte at offset 5 encodes the byte order: 1 = little-endian, 2 = big-endian
• Original file: the byte is 1
• Modified file: the byte is changed to 2
• Linux ELF loader does not use this byte, but most malware detectors do
Vulnerable detectors
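The one-byte change in Exhibit B can be modeled as follows. This is a sketch under stated assumptions: `lenient_accepts` stands in for a loader that ignores the byte at offset 5 (as the slide says of the Linux ELF loader), and `strict_accepts` stands in for a detector that expects little-endian input. Both are hypothetical models, not code from any real loader or detector.

```python
ELF_MAGIC = b"\x7fELF"
EI_DATA = 5  # offset of the byte-order field in the ELF header


def make_ident(ei_data: int) -> bytes:
    """Build a 16-byte ELF identification block with the given byte-order value."""
    # magic, class = 1 (32-bit), byte order, version = 1, then zero padding
    return ELF_MAGIC + bytes([1, ei_data, 1]) + bytes(9)


def lenient_accepts(header: bytes) -> bool:
    """Model of a loader that checks only the magic and ignores EI_DATA."""
    return header[:4] == ELF_MAGIC


def strict_accepts(header: bytes) -> bool:
    """Model of a detector that only parses files marked little-endian."""
    return header[:4] == ELF_MAGIC and header[EI_DATA] == 1


def flip_byte_order(header: bytes) -> bytes:
    """Return a copy of the header with the byte at offset 5 set to 2."""
    patched = bytearray(header)
    patched[EI_DATA] = 2
    return bytes(patched)


original = make_ident(1)
modified = flip_byte_order(original)
print(lenient_accepts(modified), strict_accepts(modified))
```

The modified header differs from the original in exactly one byte, yet the two models disagree on whether to process it.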
Exhibit C (CVE-2012-1461)
[Diagram: eicar.tar is split into eicar.tar.1 and eicar.tar.2, and gzip is used to produce eicar.tar.gz]
• Most detectors cannot parse such files correctly, but gunzip does
Vulnerable detectors
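One way to see how Exhibit C can arise is with a gzip file made of two concatenated compressed streams. The sketch below assumes that construction; the slide does not spell it out. Python's `gzip.decompress` reads all members, while a single `zlib` decompression object stops after the first one. `first_member_only` is a hypothetical model of a parser that stops early, and the payload is a placeholder.

```python
import gzip
import zlib
from typing import Tuple

ORIGINAL = b"placeholder bytes standing in for eicar.tar " * 8
part1, part2 = ORIGINAL[:100], ORIGINAL[100:]

# Compress each part separately and concatenate the two gzip members.
member1 = gzip.compress(part1)
member2 = gzip.compress(part2)
combined = member1 + member2


def first_member_only(blob: bytes) -> Tuple[bytes, bytes]:
    """Decompress just the first gzip member; return (output, unread bytes)."""
    d = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + MAX_WBITS selects gzip framing
    out = d.decompress(blob) + d.flush()
    return out, d.unused_data


full = gzip.decompress(combined)          # handles multi-member input
partial, leftover = first_member_only(combined)
print(len(full), len(partial), len(leftover))
```

A parser that behaves like `first_member_only` sees only the first part of the archive, while a tool that reads every member recovers all of it.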
Exhibit D (CVE-2012-1459)
TAR archive layout: header 1 (with length and checksum fields), uninfected file, header 2
• Header 1 is given a wrong checksum
• Most detectors ignore the checksum field
• GNU tar ignores the header with the wrong checksum and extracts the malware
Vulnerable detectors
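The checksum that Exhibit D relies on is simple to verify. In the ustar format the checksum field occupies bytes 148 to 155 of the 512-byte header and holds, in octal, the sum of all header bytes with the checksum field itself counted as eight spaces. The sketch below builds a header with Python's `tarfile` module and shows the check a strict parser would apply; the helper names are hypothetical.

```python
import tarfile

CHKSUM_START, CHKSUM_END = 148, 156  # checksum field within the 512-byte header


def computed_checksum(header: bytes) -> int:
    """Sum all header bytes, counting the checksum field as eight spaces."""
    block = header[:512]
    return sum(block[:CHKSUM_START]) + 8 * ord(" ") + sum(block[CHKSUM_END:])


def stored_checksum(header: bytes) -> int:
    """Read the octal value stored in the checksum field."""
    field = header[CHKSUM_START:CHKSUM_END].strip(b"\0 ")
    return int(field.decode("ascii"), 8)


def checksum_ok(header: bytes) -> bool:
    return stored_checksum(header) == computed_checksum(header)


def corrupt_checksum(header: bytes) -> bytes:
    """Return a copy of the header whose stored checksum is off by one."""
    patched = bytearray(header)
    patched[CHKSUM_START:CHKSUM_END] = b"%06o\0 " % (stored_checksum(header) + 1)
    return bytes(patched)


good = tarfile.TarInfo("eicar.com").tobuf(format=tarfile.USTAR_FORMAT)
bad = corrupt_checksum(good)
print(checksum_ok(good), checksum_ok(bad))
```

A parser that skips this check treats both headers as valid, which is the gap between detectors and GNU tar that the exhibit describes.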
Many more attacks
• 45 different CVE reports for previously unknown evasion exploits
• 9 file formats
• 13 applications
36 tested detectors – ALL vulnerable
You might be thinking…
Aren’t these well-known bugs?
Response from AV vendors
• OMG! These exploits completely bypass our detection engines
o Patches are being pushed out
You might be thinking…
Aren’t these the same as browser content-sniffing bugs?
No
Content-sniffing bugs in browsers
• MIME content sniffing in Web browsers can be exploited for XSS attacks
o First reported by Palant (2007) and Nazario (2009)
• Defense for browsers [Barth et al.]: prefix-disjoint signatures… does not work for malware detectors
o Signatures for many formats that detectors must deal with are not prefix-disjoint
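A rough way to see the problem is to ask whether two fixed-offset signatures can both match the same file. The sketch below is not the formal definition used by Barth et al.; it only checks whether two (offset, magic) pairs disagree on any shared byte position. The TAR and mirc.ini entries come from Exhibit A, and the last two signatures are made up for contrast.

```python
from typing import Tuple

Signature = Tuple[int, bytes]  # (offset, magic bytes)


def can_coexist(a: Signature, b: Signature) -> bool:
    """True if one file can satisfy both signatures at the same time."""
    off_a, magic_a = a
    off_b, magic_b = b
    start = max(off_a, off_b)
    end = min(off_a + len(magic_a), off_b + len(magic_b))
    # Compare only the byte positions that both signatures constrain.
    for pos in range(start, end):
        if magic_a[pos - off_a] != magic_b[pos - off_b]:
            return False
    return True


TAR = (257, b"ustar")          # from Exhibit A
MIRC_INI = (0, b"[aliases]")   # from Exhibit A
FAKE_X = (0, b"AAAA")          # hypothetical signature
FAKE_Y = (0, b"BBBB")          # hypothetical signature

print(can_coexist(TAR, MIRC_INI))  # the two never constrain the same byte
print(can_coexist(FAKE_X, FAKE_Y))
```

Signatures that sit at different offsets, like the two from Exhibit A, never conflict, so a single file can carry both.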
You might be thinking…
These attacks affect only archive formats
Does this affect only archive formats?
• No, we have attacks against ELF, PE, MS CHM, MS Word, etc.
• What is an archive format anyway?
o Many modern formats (e.g. PDF, MS Word) allow embedding different types of content
You might be thinking…
Behavioral detection will save us
No, behavioral detection will not save you
[Diagram: malware detector (infer file type, parse) next to user application/OS (infer file type, parse)]
• For behavioral detection to work, file-type inference and parsing must be exactly the same in the detector and in the user application/OS
Possible solutions
• Write better parsers
o Nonstarter
• On-access scanning
o Does not work in network/cloud detectors
o Only works for archive files
• Better integration of malware detectors with applications
o Applications can share intermediate state after parsing with cloud/network detectors