Abusing File Processing in Malware Detectors for Fun and Profit

Suman Jana and Vitaly Shmatikov

The University of Texas at Austin

Modern malware research

• All about sophisticated detection and evasion techniques
  o Polymorphism, metamorphism, obfuscation…

Topic of this talk

• The world’s simplest virus against the world’s best malware detector
• No changes to virus content

Malware researcher’s view of malware detection

[Figure: diagram labeled with malware, honeypot, malware detector, internet gateway, and users]

How malware detectors work in practice

[Figure: between the internet and the intranet, the detector infers the file type and then parses the file]

Why must malware detectors infer file types?

• Correctness
  o Detection algorithms are type-specific
  o Parsing depends on file type
• Efficiency
  o Detectors may skip less vulnerable types like MPEG

Parsing in malware detectors

• Word documents: find macros
• Archive files: extract files
• HTML files: remove whitespace characters, e.g. <a> foo </a> becomes <a>foo</a>

Parsing in malware detectors

[Figure: formats that detectors parse: MS CAB, MS CHM, JavaScript, PDF, MS EXE, ELF, Adobe Flash, RTF, MS PPT(X), COFF, bzip, COM, MS DLL, 7-zip, JAR, GIF, MP3, JPEG]

Why must malware detectors parse before detection?

• Identify executable content
  o Macros in Word files
  o Code segments in PE, ELF
  o JavaScript in CHM
• Normalize input to a form suitable for detection
  o Decompress
  o Preprocess HTML
• Separate metadata from content

Detectors must parse lots of file formats.

File-type inference and parsing take place in two different places

[Figure: a file from the internet goes to the malware detector (infer file type, parse) and, if uninfected, on to the user application/OS (infer file type, parse). Parse difference = potential evasion.]

Exhibit A (CVE-2012-1419)

• TAR files: ustar at offset 257
• mirc.ini files: [aliases] at offset 0

[Figure: TAR archive. The initial 100 bytes of the header contain the name of the first file (eicar.com\0), the header also contains ustar, and the content is eicar.com.]

Exhibit A (CVE-2012-1419)

• TAR files: ustar at offset 257
• mirc.ini files: [aliases] at offset 0

[Figure: the same TAR archive with the first file named [aliases].com\0. The filename changes but the content (eicar.com) is unmodified, and ustar is still in the header.]

[Figure: list of vulnerable detectors]
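The signature clash in Exhibit A can be reproduced with a minimal Python sketch. The function `naive_infer_type` is a hypothetical stand-in for a detector's type inference (the slides do not show any vendor's actual logic), and the payload is a placeholder rather than the real test virus. The only facts taken from the slides are the two signatures and their offsets.

```python
import io
import tarfile

def build_tar(member_name, payload):
    """Return the bytes of a ustar archive holding a single member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()

def naive_infer_type(data):
    """Hypothetical detector logic: the first matching signature wins."""
    if data.startswith(b"[aliases]"):   # mirc.ini signature at offset 0
        return "mirc.ini"
    if data[257:262] == b"ustar":       # TAR signature at offset 257
        return "tar"
    return "unknown"

# Placeholder content; only the member name differs between the two archives.
payload = b"placeholder standing in for the test virus"
plain = build_tar("eicar.com", payload)
renamed = build_tar("[aliases].com", payload)

print(naive_infer_type(plain))
print(naive_infer_type(renamed))

# A real TAR reader still opens the renamed archive and sees the member.
with tarfile.open(fileobj=io.BytesIO(renamed)) as tar:
    print(tar.getnames())
```

Because the member name occupies the first bytes of the TAR header, renaming the file is enough to put the mirc.ini signature at offset 0 while the ustar signature stays at offset 257.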

Exhibit B (CVE-2012-1463)

Executable and Linkable Format (ELF)

• Header byte at offset 5: 1 = little-endian, 2 = big-endian

[Figure: ELF header with the byte at offset 5 set to 1]

Exhibit B (CVE-2012-1463)

Executable and Linkable Format (ELF)

• Header byte at offset 5: 1 = little-endian, 2 = big-endian

[Figure: ELF header with the byte at offset 5 changed to 2]

The Linux ELF loader does not use this byte, but most malware detectors do.
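The effect of the byte at offset 5 can be illustrated with a small Python sketch. The two reader functions are hypothetical: one models a parser that trusts the byte, the other a loader that ignores it and always reads little-endian. The header is a truncated toy (identification bytes plus the type and machine fields), not a complete ELF file.

```python
import struct

EI_DATA = 5  # offset of the data-encoding byte in the ELF identification

def make_header(ei_data):
    """Toy ELF header prefix: 16 identification bytes, then e_type and e_machine."""
    e_ident = b"\x7fELF" + bytes([1, ei_data, 1]) + b"\x00" * 9
    # e_type = 2 and e_machine = 3 are always stored little-endian here.
    return e_ident + struct.pack("<HH", 2, 3)

def machine_trusting_byte(header):
    """Hypothetical detector: picks the byte order from the byte at offset 5."""
    order = "<" if header[EI_DATA] == 1 else ">"
    return struct.unpack_from(order + "H", header, 18)[0]

def machine_ignoring_byte(header):
    """Hypothetical loader: ignores the byte and reads little-endian."""
    return struct.unpack_from("<H", header, 18)[0]

original = make_header(1)
modified = make_header(2)   # only the byte at offset 5 differs

print(machine_trusting_byte(original), machine_ignoring_byte(original))
print(machine_trusting_byte(modified), machine_ignoring_byte(modified))
```

After the one-byte change, the reader that trusts the field decodes a different machine value than the reader that ignores it, which is the kind of disagreement the exhibit relies on.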


Exhibit C (CVE-2012-1461)

[Figure: eicar.tar is split into eicar.tar.1 and eicar.tar.2, which gzip turns into a single eicar.tar.gz]

Most detectors cannot parse such files correctly, but gunzip does.
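Assuming the exhibit refers to a gzip file made of several concatenated compressed streams, the difference between a complete reader and a first-stream-only reader can be sketched in Python. The payload is a placeholder for the archive bytes, and the single-stream reader is a hypothetical model of a limited parser, not any specific detector.

```python
import gzip
import zlib

# Placeholder standing in for the bytes of the archive.
payload = b"stand-in for the bytes of eicar.tar " * 8
half = len(payload) // 2

# Compress each half separately and concatenate the two gzip members.
blob = gzip.compress(payload[:half]) + gzip.compress(payload[half:])

# gzip.decompress reads every member, as gunzip does.
everything = gzip.decompress(blob)

# A single zlib decompressor in gzip mode stops at the end of the first member.
first_only = zlib.decompressobj(16 + zlib.MAX_WBITS)
head = first_only.decompress(blob)

print(len(everything), len(head), len(first_only.unused_data))
```

The bytes left in `unused_data` are the second member, which a reader that stops after the first stream never decompresses.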


Exhibit D (CVE-2012-1459)

[Figure: TAR archive layout with labels length, checksum, header 1, header 2, and uninfected file. Most detectors ignore the checksum field.]

Exhibit D (CVE-2012-1459)

[Figure: TAR archive layout with labels length, header 1, wrong checksum, header 2, and uninfected file. Most detectors ignore the checksum field.]

GNU tar ignores the header with the wrong checksum and extracts the malware.
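The checksum field that the exhibit turns on can be checked with a short Python sketch. It follows the ustar convention that the checksum is the sum of all 512 header bytes with the 8-byte checksum field (offset 148) counted as spaces. The helper name `checksum_ok` and the member name are illustrative assumptions, and the sketch shows only the validation step, not the full attack archive.

```python
import tarfile

CHKSUM_START, CHKSUM_END = 148, 156   # checksum field inside the 512-byte header

def checksum_ok(header):
    """Return True if the stored checksum matches the header contents."""
    stored_text = header[CHKSUM_START:CHKSUM_END].split(b"\0")[0].strip()
    stored = int(stored_text.decode("ascii"), 8)      # field holds octal digits
    computed = (
        sum(header[:CHKSUM_START])
        + 8 * ord(" ")                                # field counted as spaces
        + sum(header[CHKSUM_END:512])
    )
    return stored == computed

# Build a valid 512-byte header for a hypothetical member.
good = tarfile.TarInfo(name="clean.txt").tobuf(format=tarfile.USTAR_FORMAT)

# Change one byte of the name without updating the checksum field.
bad = b"d" + good[1:]

print(checksum_ok(good), checksum_ok(bad))
```

A parser that skips this check accepts both headers, while one that performs it rejects the second, so the two can disagree about which entries the archive contains.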


Many more attacks

• 45 different CVE reports for previously unknown evasion exploits

• 9 file formats
• 13 applications

36 tested detectors – ALL vulnerable

You might be thinking…

Aren’t these well-known bugs?

Response from AV vendors

• OMG! These exploits completely bypass our detection engines
  o Patches are being pushed out


Aren’t these the same as browser content-sniffing bugs?

No

Content-sniffing bugs in browsers

• MIME content sniffing in Web browsers can be exploited for XSS attacks
  o First reported by Palant (2007) and Nazario (2009)

• Defense for browsers [Barth et al.]: prefix-disjoint signatures… does not work for malware detectors
  o Signatures for many formats that detectors must deal with are not prefix-disjoint
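Why such signatures fail to be disjoint can be sketched in Python under a simplified reading of the idea: two signatures are treated as compatible when a single file can satisfy both at once. The `(offset, bytes)` representation and the helper `can_coexist` are assumptions made for illustration and are not the definition from Barth et al. The TAR and mirc.ini signatures come from Exhibit A; the PDF and GIF magic bytes are added as a contrasting pair.

```python
def can_coexist(sig_a, sig_b):
    """Return True if one file could match both (offset, pattern) signatures."""
    (off_a, pat_a), (off_b, pat_b) = sig_a, sig_b
    start = max(off_a, off_b)
    end = min(off_a + len(pat_a), off_b + len(pat_b))
    # Only positions covered by both patterns can conflict.
    for pos in range(start, end):
        if pat_a[pos - off_a] != pat_b[pos - off_b]:
            return False
    return True

TAR = (257, b"ustar")
MIRC_INI = (0, b"[aliases]")
PDF = (0, b"%PDF")
GIF = (0, b"GIF8")

print(can_coexist(TAR, MIRC_INI))   # byte ranges never overlap
print(can_coexist(PDF, GIF))        # both claim offset 0 with different bytes
```

Signatures that sit at different offsets, like the TAR and mirc.ini pair, can both appear in one file, so the order in which a detector checks them decides the inferred type.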


These attacks affect only archive formats

Does this affect only archive formats?

• No, we have attacks against ELF, PE, MS CHM, MS Word, etc.

• What is an archive format anyway?
  o Many modern formats (e.g. PDF, MS Word) allow embedding different types of content


Behavioral detection will save us

No, behavioral detection will not save you

[Figure: the malware detector and the user application/OS each infer the file type and parse. For behavioral detection to work, these steps must be exactly the same in both.]

Possible solutions

• Write better parsers
• On-access scanning
  o Does not work in network/cloud detectors
• Better integration of malware detectors with applications
  o Applications can share intermediate state after parsing with cloud/network detectors

[Slide annotations: "nonstarter"; "only works for archive files"]
