Eureka: A Framework for Enabling Static Analysis on Malware MARS.MTC.SRI.COM
Feb 25, 2016

sezja

Transcript
Page 1: Eureka: A Framework for Enabling Static Analysis on Malware

Eureka: A Framework for Enabling Static Analysis on Malware

MARS.MTC.SRI.COM

Page 2: Eureka: A Framework for Enabling Static Analysis on Malware

Motivation
Malware landscape is diverse and constantly evolving
  Large botnets
  Diverse propagation vectors, exploits, C&C
  Capabilities: backdoor, keylogging, rootkits, logic bombs, time-bombs

Malware is not about script-kiddies anymore, it’s real business.

Manual reverse-engineering is close to impossible
  Need automated techniques to extract system logic, interactions and side-effects

Page 3: Eureka: A Framework for Enabling Static Analysis on Malware

Dynamic vs Static Malware Analysis
Dynamic Analysis
  Techniques that profile actions of binary at runtime
  Better track record to date
    CWSandbox, TTAnalyze
  Only provides partial "effects-oriented profile" of malware potential
Static Analysis
  Can provide complementary insights
  Potential for more comprehensive assessment

Page 4: Eureka: A Framework for Enabling Static Analysis on Malware

Malware Evasions and Obfuscations
To defeat signature-based detection schemes
  Polymorphism, metamorphism: started appearing in viruses of the 90's, primarily to defeat AV tools
To defeat dynamic malware analysis
  Anti-debugging, anti-tracing, anti-memory dumping
  VMM detection, emulator detection
To defeat static malware analysis
  Encryption (packing)
  API and control-flow obfuscations
  Anti-disassembly

Page 5: Eureka: A Framework for Enabling Static Analysis on Malware

System Goals

Desiderata for a Static Analysis Framework
  Unpack over 90% of contemporary malware
  Handle most if not all packers
  Deobfuscate API references
  Automate identification of capabilities
  Provide feedback on unpacking success
  Simplify and annotate call graphs to illustrate interactions between key logical blocks

Page 6: Eureka: A Framework for Enabling Static Analysis on Malware

The Eureka Framework
  Novel unpacking technique based on coarse-grained execution tracing
  Heuristic-based and statistics-based unpacking
  Implements several techniques to handle obfuscated API references
  Multiple metrics to evaluate unpack success
  Annotated call graphs provide bird's eye view of system interaction

Page 7: Eureka: A Framework for Enabling Static Analysis on Malware

The Eureka Workflow

[Figure: workflow diagram. Box labels: Packed Binary; Trace Malware syscalls in VM; Syscall trace; Heuristic-based offline analysis; Favorable execution point; Eureka's Unpacker; Un-packed Binary; Disassembly (IDA Pro); Packed .ASM; Un-packed .ASM; Statistics-based Evaluator; Unpack Evaluation; Eureka's API Resolver (Control and Data-flow Analysis); Un-obfuscated .ASM; Detailed call-graph; Annotated Call-Graphs]

Page 8: Eureka: A Framework for Enabling Static Analysis on Malware

Coarse-grained Execution Monitoring
Generalized unpacking principle
  Execute binary till it has sufficiently revealed itself
  Dump the process execution image for static analysis
Monitoring execution progress
  Eureka employs a Windows driver that hooks the SSDT (System Service Dispatch Table)
  Callback invoked on each NTDLL system call
  Filtering based on malware process pid

Page 9: Eureka: A Framework for Enabling Static Analysis on Malware

Related Work
PolyUnpack (Royal et al., ACSAC 2006)
  Static model using program analysis
  Fine-grained execution tracking detects execution steps outside the model
Renovo (Kang et al., WORM 2007)
  Fine-grained execution tracking using QEMU
  Dumping trigger: execution of newly written code
OmniUnpack (Martignoni et al., ACSAC 2007)
  Coarse-grained monitoring using page-level protection mechanisms

Page 10: Eureka: A Framework for Enabling Static Analysis on Malware

Design Space

System | Environment | Granularity | Trigger | Child process monitoring | Output layers | Speed | Evasions
PolyUnpack | Inside VM | Instruction | Model | No | 1 | Slow | 1,2,3
Renovo | Outside VM | Instruction | Heuristic | Yes | Many | Slow | 2,4
OmniUnpack | Inside VM | Page | Heuristic | No | Many | Fast | 2,3
Eureka | Inside VM | System call | Heuristic, Statistic | Yes | 1, Many | Fast | 2,3

Evasions: (1) multiple packing (2) partial code revealing packers (3) VM detection (4) emulator detection

Page 11: Eureka: A Framework for Enabling Static Analysis on Malware

Heuristic-based Unpacking
How do you determine when to dump?
  Heuristic #1: Dump as late as possible. NtTerminateProcess
  Heuristic #2: Dump when your program generates errors. NtRaiseHardError
  Heuristic #3: Dump when program forks a child process. NtCreateProcess
Issues
  Weak adversarial model, too simple to evade…
  Doesn't work well for packed non-malware programs
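
The three heuristics amount to scanning the syscall trace for a trigger call made by the monitored process. The following is a minimal sketch of that idea, not Eureka's implementation: the trace format (a list of `(pid, syscall_name)` tuples) and the function name `find_dump_point` are assumptions made for illustration.

```python
# Hypothetical trace format: a list of (pid, syscall_name) tuples in call order.
DUMP_TRIGGERS = {
    "NtTerminateProcess",  # heuristic 1: dump as late as possible
    "NtRaiseHardError",    # heuristic 2: dump when the program generates an error
    "NtCreateProcess",     # heuristic 3: dump when a child process is created
}

def find_dump_point(trace, malware_pid):
    """Return the index of the first trigger syscall made by malware_pid, or None."""
    for index, (pid, syscall) in enumerate(trace):
        # Calls from other processes are filtered out by pid.
        if pid == malware_pid and syscall in DUMP_TRIGGERS:
            return index
    return None

trace = [
    (1200, "NtOpenFile"),
    (1337, "NtAllocateVirtualMemory"),
    (1200, "NtTerminateProcess"),   # unrelated process, ignored
    (1337, "NtWriteVirtualMemory"),
    (1337, "NtCreateProcess"),      # first trigger for pid 1337
    (1337, "NtTerminateProcess"),
]
dump_index = find_dump_point(trace, malware_pid=1337)
print(dump_index)
```

Because the trigger set is fixed and small, a packer that never reaches any of these calls while unpacked code is resident would slip past it, which is the weakness the slide notes.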

Page 12: Eureka: A Framework for Enabling Static Analysis on Malware

Statistics-based Unpacking
Observations
  Statistical properties of packed executables differ from those of unpacked executables
  As malware executes, the code-to-data ratio increases
Complications
  Code and data sections are interleaved in PE executables
  Data directories (import tables) look similar to data but are often found in code sections
  Properties of data sections vary with packers

Page 13: Eureka: A Framework for Enabling Static Analysis on Malware

Statistics-based Unpacking (2)
Our approach
  Model statistical properties of unpacked code
  Volume of unpacked code must strictly increase
Estimating unpacked code
  N-gram analysis to look for frequent instructions
  We use bi-grams (2-grams) because x86 opcodes are 1 or 2 bytes
  Extract subroutine code from 9 benign executables
  FF 15 (call), FF 75 (push), E8 _ _ _ FF (call), E8 _ _ _ 00 (call)
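
Counting these four byte patterns in a memory image can be sketched in a few lines. This is an illustrative sketch only: the function name `count_code_bigrams` and the sample bytes are assumptions, and the counter scans raw bytes without disassembling, so it is an estimate of code volume rather than an exact instruction count.

```python
def count_code_bigrams(data: bytes) -> dict:
    """Count the four patterns from the slide in a raw byte buffer."""
    counts = {"FF 15": 0, "FF 75": 0, "E8 _ _ _ FF": 0, "E8 _ _ _ 00": 0}
    for i in range(len(data) - 1):
        if data[i] == 0xFF and data[i + 1] == 0x15:
            counts["FF 15"] += 1          # call through a memory pointer
        elif data[i] == 0xFF and data[i + 1] == 0x75:
            counts["FF 75"] += 1          # push of a stack-frame slot
        if data[i] == 0xE8 and i + 4 < len(data):
            # Relative call: look at the high byte of the 32-bit displacement.
            if data[i + 4] == 0xFF:
                counts["E8 _ _ _ FF"] += 1  # short backward call
            elif data[i + 4] == 0x00:
                counts["E8 _ _ _ 00"] += 1  # short forward call
    return counts

# Hand-assembled sample: call [0x401000]; push [ebp+8]; call +0x10; call -0x10; ret
sample = bytes.fromhex("ff1500104000 ff7508 e810000000 e8f0ffffff c3")
counts = count_code_bigrams(sample)
print(counts)
```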

Page 14: Eureka: A Framework for Enabling Static Analysis on Malware

Statistics-based Unpacking (3)

Bigram | Calc (117 KB) | Explorer (1010 KB) | Ipconfig (59 KB) | lpr (11 KB) | Mshearts (131 KB) | Notepad (72 KB) | Ping (21 KB) | Shutdown (23 KB) | Taskman (19 KB)
FF 15 (call) | 246 | 3045 | 184 | 24 | 192 | 415 | 58 | 132 | 126
FF 75 (push) | 235 | 2494 | 272 | 33 | 274 | 254 | 41 | 63 | 85
E8 _ _ _ 0xff (call) | 1583 | 2201 | 181 | 19 | 369 | 180 | 87 | 49 | 41
E8 _ _ _ 0x00 (call) | 746 | 1091 | 152 | 62 | 641 | 108 | 57 | 66 | 50

Page 15: Eureka: A Framework for Enabling Static Analysis on Malware

Statistics-based Unpacking (4)
Feasibility test
  Corpus of (pre- and post-unpacked) executables unpacked with heuristic unpacking
  1090 executables: 125 originally unpacked, 965 unpacked
  Simple bi-gram counting was able to distinguish 922 out of 965 unpacked executables (95% success rate)

Page 16: Eureka: A Framework for Enabling Static Analysis on Malware

STOP Algorithm
STOP: Statistical Test for Online unPacking
  Online algorithm for determining dumping trigger
  Simple hypothesis test for change in mean
    Null hypothesis: mean bigram count has not increased
    Assumption: bigram counts are normally distributed with prior mean μ0
    If (μ1 - μ0) / σ1 > 1.645, we reject the null hypothesis with confidence level of 0.95
  Test is repeated to determine beginning of unpacking and end of unpacking
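
The test statistic on this slide can be sketched directly. This is a minimal sketch under stated assumptions: μ1 and σ1 are taken to be the sample mean and sample standard deviation of recently observed bigram counts, the function name `mean_has_increased` is hypothetical, and the statistic follows the slide's formula as written rather than any further detail of Eureka's implementation.

```python
from statistics import mean, stdev

Z_CRITICAL = 1.645  # one-sided critical value of the standard normal at the 0.95 level

def mean_has_increased(prior_mean, samples):
    """Reject the null hypothesis 'mean bigram count has not increased'?

    Needs at least two samples, because a sample standard deviation is used.
    """
    mu1 = mean(samples)
    sigma1 = stdev(samples)
    if sigma1 == 0:
        # Degenerate case: identical observations, compare the means directly.
        return mu1 > prior_mean
    return (mu1 - prior_mean) / sigma1 > Z_CRITICAL

# Bigram counts observed before unpacking (prior mean 10) and at two later points.
still_packed = mean_has_increased(10, [10, 12, 8, 11, 9])
unpacking_started = mean_has_increased(10, [50, 52, 48, 51, 49])
print(still_packed, unpacking_started)
```

Repeating the test as new dumps arrive, with the prior mean updated after each rejection, is one way to read the slide's "beginning of unpacking and end of unpacking"; that update rule is an assumption here.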

Page 17: Eureka: A Framework for Enabling Static Analysis on Malware

API Resolution
  User-level malware programs require system calls to perform malicious actions
  Use Win32 API to access user-level libraries
  Obfuscations impede malware analysis using IDA Pro or OllyDbg
  Packers use non-standard linking and loading of DLLs
  Obfuscated API resolution

Page 18: Eureka: A Framework for Enabling Static Analysis on Malware

Standard API Resolution
API calls
  Calls to various user-level DLLs linked by the Windows linker/loader
  Legitimate executables have an import table
  Import table is used to fill up the IAT with virtual addresses at run-time

[Figure: standard API resolution by dynamic linking. Labels:
  Code: CALL F ; call by thunk … CALL [X] ; indirect call … F: JMP [X] ; thunk
  CALL X
  Imports: X KERNEL32.OpenFile
  IAT (Import Address Table): X: B+R
  KERNEL32.DLL: B: (start of DLL); Exports: OpenFile R; R: Entrypoint to OpenFile]

Page 19: Eureka: A Framework for Enabling Static Analysis on Malware

Standard API Resolution
Imports in IAT identified by IDA by looking at Import Table

Page 20: Eureka: A Framework for Enabling Static Analysis on Malware

API Obfuscation by Packers
  Import table is removed
  IAT is not filled in by the linker and loader
  Unpacker fills in IAT or similar data structure by itself
  Hard to identify corresponding API call in executable

[Figure: API resolution with the import table removed. Labels:
  Code: … CALL F … F: JMP [X] ; thunk
  CALL X
  Imports: X KERNEL32.OpenFile
  IAT (Import Address Table): X: B+R
  KERNEL32.DLL: B: (start of DLL); Exports: OpenFile R; R: Entrypoint to OpenFile]

Page 21: Eureka: A Framework for Enabling Static Analysis on Malware

Identifying APIs by Address
For each DLL build relative and absolute address database
  Default "Image address" is the base address
  Calculate corresponding virtual address for each exported API
Match addresses used in calls with the database

[Figure: resolving an API by absolute address. Labels:
  Code: … CALL [X]
  CALL X
  Imports: X KERNEL32.OpenFile
  IAT (Import Address Table): X: 7c810332 (filled in by dynamic linking)
  KERNEL32.DLL: 7c800000: (start of DLL); Exports: OpenFile R; 7c810332: Entrypoint to OpenFile]
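
The address database can be pictured as a map from virtual address to API name, built by adding each export's relative address to the DLL's base address. The sketch below uses the addresses shown on the slide (base 7c800000, OpenFile at 7c810332); the dictionary layout and the function names are assumptions for illustration.

```python
def build_address_db(dll_name, base_address, exports):
    """Map absolute virtual addresses to 'DLL.Function' names.

    exports: {function_name: relative_virtual_address}
    """
    return {base_address + rva: f"{dll_name}.{name}" for name, rva in exports.items()}

def resolve_call_target(address_db, address):
    """Return the API name for an address used in a call, or None if unknown."""
    return address_db.get(address)

# RVA of OpenFile derived from the slide: 0x7c810332 - 0x7c800000 = 0x10332
kernel32_exports = {"OpenFile": 0x10332}
address_db = build_address_db("KERNEL32", 0x7C800000, kernel32_exports)

resolved = resolve_call_target(address_db, 0x7C810332)
print(resolved)
```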

Page 22: Eureka: A Framework for Enabling Static Analysis on Malware

Handling DLL Load Obfuscations
Intercept dynamic loading at arbitrary addresses
  Look for "NtOpenSection" and "NtMapViewOfSection" in trace
  Search for DLL headers in memory during dumping
Can even identify DLL code that is copied to an arbitrary location

[Figure: resolving an API in a DLL loaded at a non-default address. Labels:
  Code: … CALL F
  CALL X
  Imports: X KERNEL32.OpenFile
  IAT (Import Address Table): X: 21810332 (filled in by dynamic linking)
  KERNEL32.DLL: RVA:00000: (start of DLL); Exports: OpenFile R; RVA:10332: Entrypoint to OpenFile]
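
Searching a dump for DLL headers can be sketched with the standard PE layout: an image starts with the bytes `MZ`, and the 32-bit field at offset 0x3C (`e_lfanew`) points to the `PE\0\0` signature. The sketch below is an illustration of that check, not Eureka's code; the function name `find_pe_headers` and the synthetic dump are assumptions. Once a base is found, a call target such as 21810332 minus the base 21800000 gives the RVA 10332 shown on the slide.

```python
import struct

def find_pe_headers(dump: bytes):
    """Return offsets in the dump where a plausible PE image header starts."""
    offsets = []
    start = 0
    while True:
        pos = dump.find(b"MZ", start)
        if pos == -1:
            break
        if pos + 0x40 <= len(dump):
            # e_lfanew: offset of the PE signature, relative to the image start.
            (e_lfanew,) = struct.unpack_from("<I", dump, pos + 0x3C)
            pe_pos = pos + e_lfanew
            if dump[pe_pos:pe_pos + 4] == b"PE\x00\x00":
                offsets.append(pos)
        start = pos + 1
    return offsets

# Synthetic dump: a decoy "MZ" at 0x200 and a minimal fake image header at 0x300.
image = bytearray(0x100)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0x80)
image[0x80:0x84] = b"PE\x00\x00"
dump = bytes(0x200) + b"MZ" + bytes(0xFE) + bytes(image)

headers = find_pe_headers(dump)
print([hex(offset) for offset in headers])
```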

Page 23: Eureka: A Framework for Enabling Static Analysis on Malware

Handling Thunks
  Identify subroutines with a JMP instruction only
  Treat any calls to these subs as an API call

[Figure: disassembly of a thunk subroutine; API shown: IsDebuggerPresent]
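
One way to recognize such a subroutine is to check that its body is exactly one indirect jump through a memory pointer, which on 32-bit x86 is encoded as `FF 25` followed by a 4-byte address. The sketch below assumes the subroutine bytes have already been extracted by a disassembler; the function name and the example address are hypothetical.

```python
import struct

def thunk_target_slot(sub_bytes: bytes):
    """If the subroutine is a single `JMP [addr32]`, return addr32, else None."""
    if len(sub_bytes) == 6 and sub_bytes[:2] == b"\xff\x25":
        # The operand is the address of the pointer slot (for example an IAT entry).
        return struct.unpack_from("<I", sub_bytes, 2)[0]
    return None

thunk = bytes.fromhex("ff25 00204000")       # jmp dword ptr [0x402000]
normal_sub = bytes.fromhex("55 8bec 5d c3")  # push ebp; mov ebp, esp; pop ebp; ret

slot = thunk_target_slot(thunk)
print(hex(slot), thunk_target_slot(normal_sub))
```

The returned slot address can then be looked up like any other call target, so a call to the thunk is labelled with the API the slot points to.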

Page 24: Eureka: A Framework for Enabling Static Analysis on Malware

Using Dataflow Analysis
  Identify register-based indirect calls

[Figure: disassembly with a "def" of a register and its later "use" in an indirect call; API shown: GetEnvironmentStringW]
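
The def/use idea can be sketched on a toy instruction list: starting from a `call <register>`, walk backwards to the nearest instruction that defines that register and, if it is a load from a known pointer slot, report the API behind the slot. The instruction tuples `(mnemonic, defined_register, source)`, the slot address, and the function name are all assumptions for illustration, and the walk only covers straight-line code within one basic block.

```python
def resolve_register_call(instructions, call_index, iat_names):
    """Resolve `call <reg>` by walking back to the nearest definition of <reg>."""
    _, _, target_reg = instructions[call_index]
    for mnemonic, defined_reg, source in reversed(instructions[:call_index]):
        if defined_reg == target_reg:
            # Only a plain load from a known pointer slot is resolved statically.
            return iat_names.get(source) if mnemonic == "mov" else None
    return None

instructions = [
    ("mov", "esi", "[0x402010]"),  # def: esi loaded from a pointer slot
    ("push", None, "eax"),
    ("call", None, "esi"),         # use: register-based indirect call
]
iat_names = {"[0x402010]": "KERNEL32.IsDebuggerPresent"}

api = resolve_register_call(instructions, 2, iat_names)
print(api)
```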

Page 25: Eureka: A Framework for Enabling Static Analysis on Malware

Handling Dynamic Pointer Updates
  Identify register-based indirect calls

[Figure: disassembly with "def" and "use" annotations. Annotations:
  dword_41e304 has no static value to look up API
  A def to dword_41e308 is found; look for probable call to GetProcAddress earlier
  Call to GetProcAddress]
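
The search described in the annotations can be sketched as follows: find the store that defines the pointer, check that it saves the return value of a `GetProcAddress` call, and read the function name from the arguments pushed before that call. `GetProcAddress(hModule, lpProcName)` pushes its arguments right to left, so the name is the earlier of the two pushes. The instruction tuples, the pointer name `dword_403000`, and the string table are hypothetical.

```python
def resolve_dynamic_pointer(instructions, call_index, strings):
    """Resolve `call <pointer>` when the pointer is filled in at run time."""
    pointer = instructions[call_index][1]
    # Step 1: find the nearest earlier def of the pointer (a store from eax).
    def_index = None
    for i in range(call_index - 1, -1, -1):
        if instructions[i] == ("mov", pointer, "eax"):
            def_index = i
            break
    if def_index is None or def_index == 0:
        return None
    # Step 2: the stored value must be the result of a GetProcAddress call.
    if instructions[def_index - 1] != ("call", "GetProcAddress"):
        return None
    # Step 3: of the two pushes before the call, the earlier one is lpProcName.
    pushes = [ins[1] for ins in instructions[:def_index - 1] if ins[0] == "push"][-2:]
    if len(pushes) < 2:
        return None
    return strings.get(pushes[0])

instructions = [
    ("push", "aOpenFile"),            # lpProcName
    ("push", "edi"),                  # hModule
    ("call", "GetProcAddress"),
    ("mov", "dword_403000", "eax"),   # def of the pointer
    ("call", "dword_403000"),         # use: call through the pointer
]
strings = {"aOpenFile": "OpenFile"}

name = resolve_dynamic_pointer(instructions, 4, strings)
print(name)
```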

Page 26: Eureka: A Framework for Enabling Static Analysis on Malware

Evaluation Metrics
Measuring analyzability
  Code-to-data ratio
    Use disassembler to separate code and data
    Most successfully unpacked malware have a code-to-data ratio over 50%
  API resolution success
    Percentage of API calls that have been resolved from the set of all call sites
    A higher percentage implies the malware is more amenable to static analysis
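
Both metrics are simple ratios. The sketch below assumes one reading of "code-to-data ratio": the fraction of bytes that the disassembler classifies as code out of all code and data bytes, which is what makes a 50% threshold meaningful. The function names and sample numbers are illustrative.

```python
def code_to_data_ratio(code_bytes: int, data_bytes: int) -> float:
    """Fraction of the image that the disassembler classified as code (assumed reading)."""
    total = code_bytes + data_bytes
    return code_bytes / total if total else 0.0

def api_resolution_rate(resolved_call_sites: int, total_call_sites: int) -> float:
    """Fraction of call sites whose API target was resolved."""
    return resolved_call_sites / total_call_sites if total_call_sites else 0.0

ratio = code_to_data_ratio(code_bytes=300, data_bytes=100)
rate = api_resolution_rate(resolved_call_sites=97, total_call_sites=100)
print(ratio, rate)
```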

Page 27: Eureka: A Framework for Enabling Static Analysis on Malware

Graph Generation
Call graph simplification
  Most malware contain hundreds of functions
  Remove nodes without API calls, connecting their inbound and outbound edges
Micro-ontology labeling
  Bird's eye view of malware instance
  Translate API functions into categories based on functionality
  Categories based on Microsoft's classifications
  Common Filesystem, Random, Time, Registry, Socket, File Management
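
The simplification step can be sketched on a small adjacency map: every function that makes no API call is removed, and each of its callers is connected directly to each of its callees. The graph representation (a dict of successor sets in which every node appears as a key), the `keep` set, and the node names are assumptions for illustration.

```python
def simplify_call_graph(graph, keep):
    """Remove nodes not in `keep`, reconnecting their callers to their callees."""
    graph = {node: set(succs) for node, succs in graph.items()}  # work on a copy
    for node in [n for n in graph if n not in keep]:
        succs = graph.pop(node)
        succs.discard(node)  # drop a self-loop on the removed node
        for other, targets in graph.items():
            if node in targets:
                targets.discard(node)
                targets |= succs - {other}  # bypass the removed node
    return graph

call_graph = {
    "start": {"sub_1"},
    "sub_1": {"sub_2", "sub_3"},  # makes no API calls itself
    "sub_2": set(),               # calls a registry API
    "sub_3": set(),               # calls a socket API
}
# Functions to keep: the entry point and those that call APIs.
simplified = simplify_call_graph(call_graph, keep={"start", "sub_2", "sub_3"})
print(simplified)
```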

Page 28: Eureka: A Framework for Enabling Static Analysis on Malware

Storm Worm Case Study

Storm Worm: Bird's Eye View (semi-manually generated)

Page 29: Eureka: A Framework for Enabling Static Analysis on Malware

Storm Worm Case Study (2)

Control Flow Graph: eDonkey Handler

Page 30: Eureka: A Framework for Enabling Static Analysis on Malware

Eureka Ontology Graph

Page 31: Eureka: A Framework for Enabling Static Analysis on Malware

Experimental Evaluation
Evaluation using three different datasets
  Goat (packed benign executable) dataset
    15 common packers
    Provides ground truth for what packer is used and what is expected after unpacking
  Spam malware corpus
  Honeynet malware corpus

Page 32: Eureka: A Framework for Enabling Static Analysis on Malware

Goat Dataset

Packer | PolyUnpack | Renovo | Eureka | Eureka-API
Armadillo | No | Partial | Yes | 64%
ASPack | Partial | Yes | Yes | 99%
ASProtect | Partial | Yes | No | -
ExeCryptor | Yes | Partial | Yes | 2%
ExeStealth | No | Yes | Yes | 97%
FSG | Yes | Yes | Yes | 0%
MEW | Yes | Yes | Yes | 97%

Page 33: Eureka: A Framework for Enabling Static Analysis on Malware

Goat Dataset

Packer | PolyUnpack | Renovo | Eureka | Eureka-API
MoleBox | No | Yes | Yes | 98%
Morphine | Yes | Partial | Yes | 0%
Obsidium | No | No | Yes | 99%
PeCompact | No | Yes | Yes | 99%
Themida | No | Partial | Partial | -
UPX | Yes | Yes | Yes | 99%
WinUPack | Partial | Yes | Yes | 99%
Yoda | Partial | Partial | Yes | 97%

Page 34: Eureka: A Framework for Enabling Static Analysis on Malware

Evaluation (ASPack)

Page 35: Eureka: A Framework for Enabling Static Analysis on Malware

Evaluation (MoleBox)

Page 36: Eureka: A Framework for Enabling Static Analysis on Malware

Evaluation (Armadillo)

Page 37: Eureka: A Framework for Enabling Static Analysis on Malware

Spam Corpus Evaluation
Evaluation of a corpus of 481 executables
  Binaries collected at spam traps
  470 executables successfully unpacked (over 97% success)
  401 unpacked simply using heuristic unpacker
  Rest unpacked using statistical hypothesis test
  Most API references were successfully deobfuscated

Page 38: Eureka: A Framework for Enabling Static Analysis on Malware

Spam Corpus Evaluation (2)

Packer | Count | Eureka | Eureka-API
Unknown | 186 | 184 | 85%
UPX | 134 | 132 | 78%
Virus | 79 | 79 | 79%
PEX | 18 | 18 | 58%
MEW | 12 | 11 | 70%
Rest (10) | 52 | 46 | 83%

Page 39: Eureka: A Framework for Enabling Static Analysis on Malware

Spam Corpus Evaluation (3)

Virus Family | Count | Eureka | Eureka-API
TRSmall | 98 | 98 | 93%
TRDldr | 63 | 61 | 48%
Bagle | 67 | 67 | 84%
Mydoom | 45 | 44 | 99%
Klez | 77 | 77 | 78%
Rest (39) | 131 | 123 | 78%

Page 40: Eureka: A Framework for Enabling Static Analysis on Malware

Honeynet Corpus Evaluation
Evaluation of a corpus of 435 executables
  Binaries collected at SRI honeynet
  178 out of 435 packed with Themida (only partially analyzable)
Analysis of the 257 non-Themida binaries
  20 did not execute on Win XP
  Eureka unpacks 228 / 237 remaining binaries
  High API resolution rates on unpacked binaries

Page 41: Eureka: A Framework for Enabling Static Analysis on Malware

Honeynet Corpus Evaluation (2)

Packer | Count* | Eureka | Eureka-API
PolyEne | 109 | 109 | 97%
FSG | 36 | 35 | 94%
Unknown | 33 | 29 | 67%
ASPack | 23 | 22 | 93%
tELock | 9 | 9 | 91%
Rest (9) | 27 | 24 | 62%

*Includes all binaries except those packed with Themida

Page 42: Eureka: A Framework for Enabling Static Analysis on Malware

Honeynet Corpus Evaluation (3)

Virus Family | Count* | Eureka | Eureka-API
Korgo | 70 | 70 | 86%
Virut | 24 | 24 | 90%
Padobot | 21 | 21 | 82%
Sality | 17 | 17 | 96%
Parite | 15 | 15 | 96%
Rest (19) | 90 | 81 | 90%

*Includes all binaries except those packed with Themida

Page 43: Eureka: A Framework for Enabling Static Analysis on Malware

Runtime Performance
Evaluation of a corpus of 435 executables
  Binaries collected at SRI honeynet
  178 out of 435 packed with Themida (only partially analyzable)
Analysis of the 257 non-Themida binaries
  20 did not execute on Win XP
  Eureka unpacks 228 / 237 remaining binaries
  High API resolution rates on unpacked binaries