2/9/2009 1 Binary Analysis and Rewriting Arvind Ayyangar Niranjan Hasabnis Alireza Saberi Tung Tran R. Sekar Stony Brook University Min Gyung Kang Stephen McCamant Pongsin Poosankam Dawn Song UC, Berkeley

Dec 31, 2015


2/9/2009 1

Binary Analysis and Rewriting

Arvind Ayyangar, Niranjan Hasabnis, Alireza Saberi, Tung Tran, R. Sekar
Stony Brook University

Min Gyung Kang, Stephen McCamant, Pongsin Poosankam, Dawn Song
UC Berkeley


Binary Rewriting for Protecting Applications
Basic approach: instrument the OS and application to enforce policies that protect an application from a hostile OS

Why binary rewriting?
• Versatile: enforces a wide range of properties
– Low-level: memory pages, instructions/operands, …
– Higher-level: fine-grained (data-structure-level) memory isolation, policies on callable functions and parameters, …
– Global: information flow, control-flow integrity, …
• Wide applicability:
– COTS and legacy applications available only in binary form
– Application and all library code can be analyzed/rewritten
– Works across programs written in many high-level languages
– Ability to handle low-level code written in assembly


Binary Rewriting Today
• Relies on dynamic rewriting
– Each basic block is rewritten just before its first execution
– Benefit: sidesteps the challenges of static rewriting, e.g., accurate disassembly
• Drawbacks
– High overheads for problems of our interest: 400% to 4000% for taint tracking
– Difficulty in reasoning about higher-level properties: limited visibility (a single basic block) constrains the classes of properties that can be reasoned about
– Targets a single instruction set (usually x86)


Our Approach
Develop novel static-analysis-based methods to overcome the drawbacks of today’s techniques
Many research challenges:
• Robust and scalable static analysis of low-level code produced by different compilers (or hand-written assembly)
• Accurate disassembly of binary code
– Indirect control-flow transfers, non-standard call/return conventions, mingling of data and code, …
• Accurate reasoning about key properties
– Dynamic taint analysis


Robust and scalable static analysis of low-level code


Static analysis of low-level code
• Scalability relies on modularity
– Analyze functions individually, then compose the results
– Avoids repeated analysis of the same code (esp. libraries)
• Strength comes from accurate treatment of local variables
• Challenges in low-level binary code
– Difficult to identify parameter passing in optimized code: missing pushes, parameter passing via registers, …
– Difficult to distinguish local variables from other accesses: caller/callee-saved registers, stack pointer conventions, …


Static analysis of low-level code
To solve these challenges, previous approaches
• make optimistic assumptions or rely on compiler idioms
• often fail on optimized code and/or large programs
• don’t work for other compilers or hand-written assembly
Our solution: develop a new static analysis that
• uses systematic analysis instead of assumptions/heuristics about parameters, passing conventions, caller/callee-saved registers, …
• verifies the assumptions it needs to make: preservation of the stack pointer across calls, whether a return goes back to the caller, etc.
• accurately tracks local variables by analyzing values held in registers and on the stack


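Stack Analysis
• Identify well-formed functions
• Associate with each function a scope and an activation record
• No assumptions about
– Parameters and return values
– Caller- and callee-saved registers
– Use of base pointers
[Figure: activation record of ƒ; on entry, ESP points to the return address]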


Abstract Interpretation for Stack Analysis
<ƒ>: push %ebp; mov %esp, %ebp; sub $16, %esp
[Figure: lattice values of the registers and the activation record of ƒ, step by step]
At entry: EBP = Base_BP + [0,0]; ESP = ESP0 = Base_SP + [0,0]


After push %ebp: ESP = Base_SP + [-4,-4]; EBP = Base_BP + [0,0]; the activation-record slot at offset -4 (relative to Base_SP) holds the saved EBP, Base_BP + [0,0]


After mov %esp, %ebp: EBP = ESP = Base_SP + [-4,-4]


After sub $16, %esp: ESP = Base_SP + [-20,-20]; EBP = Base_SP + [-4,-4]


Stack Analysis (contd)

```asm
<f>:
    push %ebp
    mov  %esp, %ebp
    sub  $16, %esp
    mov  8(%ebp), %eax
    add  $3, %eax
    mov  %eax, 8(%ebp)
    mov  $7, -12(%ebp)
    mov  12(%ebp), %edx
    mov  %edx, -8(%ebp)
    leave
    ret
```

[Figure: abstract state of <f> after its body executes. Registers: EBP = Base_SP + [-4,-4], EAX = arg1 + [3,3], EDX = arg2 + [0,0], ESP = Base_SP + [-20,-20] (Base_SP + [0,0] after leave). Caller frame (args), offsets relative to Base_SP: arg2 at +8, arg1 + [3,3] at +4, return address at 0. Callee frame (locals): saved Base_BP + [0,0] at -4, arg2 + [0,0] at -12, 7 at -16.]


Function summaries from stack analysis
A summary records:
• the change in ESP as a result of executing the function
• the number of incoming parameters
• the changes in registers and parameters as a result of executing the function
For function <f>:
• ESP unchanged
• 2 incoming arguments
• EAX, EDX, and the first parameter changed (EAX = arg1 + 3, EDX = arg2, first parameter = arg1 + 3); other registers and parameters unchanged


Analysis time

[Figure: analysis time (seconds, 0 to 300) versus program size (K instructions, 0 to 600); data points for XMMS and Apache are labeled]


Static disassembly of binary code


Background: Disassembly Techniques
• Linear sweep algorithm
– Start with the program entry point and disassemble instructions sequentially
– Key assumption: all instructions appear one after the next, without any gaps
– Violated in most code (presence of data or padding)
• Recursive traversal algorithm
– After a control-flow transfer instruction (CTI), proceed to disassemble the target address
– For conditional CTIs and non-CTIs, proceed to disassemble the next instruction
– Key problems: code reached only through indirect CTIs; functions that don’t return in the usual way


Our Approach for Disassembly
• Assumption: no code obfuscation
• Non-assumptions (not relied upon):
– Function prologue and epilogue patterns
– Compiler idioms or (lack of) optimizations
• Approach
– Use recursive traversal
– Use stack analysis to compute/verify return targets
– Develop new analysis techniques to determine targets of indirect control-flow transfers


Our Approach: Type Inference
• Key insight: code pointer values don’t undergo arithmetic or other transformations
– Implication: values assigned to code pointers must represent indirect CTI targets
• Achieves much better results than data-flow analysis
– Avoids the global def-use problem, which is very hard in low-level languages
• Compute a set of possible code addresses and a set of definite code addresses
– Code at definite code addresses can be safely disassembled
– Code at addresses outside the possible set can be safely relocated


Static Disassembly: Preliminary Results
Results of the disassembler on the 'ls' binary

Analysis                         Disassembled code   Reachable code not disassembled
Recursive traversal              2.7%                85%
Compiler idioms and heuristics   87%                 1%
Function pointer analysis        88%                 0%


Static Disassembly: Preliminary Results

Application   Size (KB)   Disassembled code   Reachable code not disassembled
pdftops       14          97%                 0%
chroot        26          85%                 0%
chmod         39          87%                 0%
cat           43          92%                 0%
ls            96          88%                 0%
dhclient      411         81%                 4%

The gap for dhclient is due to an incomplete implementation of the handling of global arrays.


DTA++: Improving accuracy of Dynamic Taint Analysis


Under-tainting and Over-tainting
Results vary based on which values are considered to depend on others:


• Too few dependencies lead to under-tainting


• Too many dependencies lead to over-tainting


Basic Idea
• Data dependencies
– Taint propagates from the operands to the output of an operation
• Control dependencies
– Variables assigned within a conditional branch receive taint from the operands of the condition
– Commonly omitted in DTA, leading to under-tainting
• Key idea in DTA++: propagate taint only for control dependencies that would otherwise cause under-tainting (culprit implicit flows)


Intuition: Information Flow
Under-tainting occurs when the control-flow state represents (almost) all of the information in the inputs


```c
char output[256];
char input = next_in();
long len = 0;
if (input == '{') {
    output[0] = '\\';
    output[1] = '{';
    len = 2;
}
```


Offline Rule Generation
• Hypothesis: under-tainting occurs at just a few locations in a program (culprit branches)
• Approach: find these locations in advance and construct new taint propagation rules from them
• Assumption: we are given test inputs that demonstrate the under-tainting


Architecture Overview

[Figure: architecture overview. Offline analysis: a sample tainted input is executed under conventional DTA; under-tainting diagnosis of the execution trace finds the implicit-flow branches, and rule generation turns them into DTA++ propagation rules. These rules add extra propagation to conventional DTA when it processes general tainted input, yielding correct propagation information for a trace (or other analysis).]


Under-tainting Detection Predicate
• Given a (partial) execution trace t, φ(t) holds if t contains a culprit implicit flow
• Implementation: count how many other inputs could take the same execution path as t (using symbolic execution)
– Few or none → φ(t) = true


Search for Culprit Branches
• Search through the prefixes of a trace to find the shortest one satisfying φ: the last instruction in that prefix is the culprit
• To minimize calls to φ, use binary search
• After finding one culprit, remove it and repeat the search to find others


Experiment Setup
• Subject programs are 8 Windows word-processing applications in binary form
• Input tainted plain text from a virtual keyboard
• Convert and save the text in RTF or HTML
– RTF: “Taint it: {” → “Taint it: \{”
– HTML: “Taint it: <” → “Taint it: &lt;”


Results: Performance

Program, format      # of culprit implicit flows detected & fixed   Time for diagnosis
WordPad, RTF         1                                              0.26 s
MS Word 2003, RTF    24                                             31 min 5.26 s
AbiWord, HTML        1                                              14.29 s
AngelWriter, HTML    3                                              0.63 s
AurelEdit, RTF       1                                              0.76 s
VNU Editor, RTF      1                                              0.34 s
IntelliEdit, RTF     1                                              0.40 s
CryptEdit, RTF       1                                              0.23 s


Measuring Over-tainting
• After saving the file, count the number of tainted bytes in system memory
– Also counted tainted branches (in paper)
• Four levels of propagation:
– Original: vanilla DTA (has under-tainting)
– Optimal: fix a single instruction manually
– DTA++: targeted control-flow propagation
– DYTAN*: indiscriminate control-flow propagation (similar to Clause et al.)


Over-tainting Measurements


Questions?


Related Work
IDA Pro, VSA, NaCl, TIE, BIRD