Binary Analysis and Rewriting

Arvind Ayyangar, Niranjan Hasabnis, Alireza Saberi, Tung Tran, R. Sekar
Stony Brook University

Min Gyung Kang, Stephen McCamant, Pongsin Poosankam, Dawn Song
UC Berkeley

Motivation
• A popular approach for protecting applications from an untrusted OS is to rely on a trusted VMM
• Binary translation is one of the commonly used implementation technologies in VMMs
  • QEMU, earlier versions of VMware, …
  • Benefits: no need for hardware support, applicable to COTS binaries, whole system can be instrumented
• Unfortunately, existing binary translators are unsuited for enforcing higher-level properties
  • Information flow, control-flow integrity, object-granularity memory safety, …
  • They incur very high overheads (4x to 10x slowdown), or are simply unable to express certain properties

Our Approach
• Develop novel static-analysis-based methods to overcome the drawbacks of today’s techniques
  • Robust, scalable static analysis of low-level code
    • From different compilers, or hand-coded assembly
  • Accurate disassembly of binary code
    • Indirect control-flow transfers, non-standard call/return conventions, mixing of data and code, …
  • Accurate reasoning about key properties
    • Dynamic taint analysis

Robust and scalable Static analysis of low-level code

Static analysis of low-level code
• Scalability: requires modular analysis
  • Analyze functions individually, compose results
  • Avoids repeated analysis of the same code (esp. libraries)
• Strength: requires accurate reasoning about variables (esp. local variables)
• Challenges in low-level binary code
  • Difficult to identify parameter passing in optimized code
    • Missing pushes, parameter passing via registers, …
  • Difficult to distinguish local variables from other accesses
    • Caller/callee-saved registers, stack pointer conventions, …

Static analysis of low-level code
• To solve these challenges, previous approaches make optimistic assumptions, or rely on compiler idioms
  • These often fail on optimized code and/or large programs
  • They don’t work for other compilers, or hand-written assembly
• Our solution: develop a new approach that
  • Uses systematic analysis to reduce assumptions/heuristics
  • Accurately tracks local variables by analyzing values held in registers and on the stack

Stack Analysis
• Analyzes one function at a time
• Examines the use of the stack to
  • Determine parameters: number of them, whether in registers or on the stack
  • Determine caller- and callee-saved registers
  • Summarize effect on parameters
    • Preservation of SP, return to caller, changes in parameter or register contents, …

Abstract Interpretation for Stack Analysis
[Figure, shown twice with different annotations: the activation record of <f> labeled with lattice values for the prologue below. First version: ESP at the return address, Base_BP + [0,0], Base_SP + [0,0]. Second version: Base_BP + [0,0], Base_SP + [-4,-4].]
    <f>:
    push %ebp
    mov %esp, %ebp
    sub $16, %esp

Stack Analysis (contd)
    <f>:
    push %ebp
    mov %esp, %ebp
    sub $16, %esp
    mov 8(%ebp), %eax
    add $3, %eax
    mov %eax, 8(%ebp)
    movl $7, -12(%ebp)
    mov 12(%ebp), %edx
    mov %edx, -8(%ebp)
    leave
    ret
Summary for f:
• No change to ESP
• Two input parameters on stack
• EAX, EDX, arg1 changed as shown
• Others unchanged
[Figure: caller and callee stack frames annotated with lattice values. Labels: Caller frame; arg2 + [0,0]; arg1 + [3,3]; Ret Addr; Base_BP + [0,0]; Callee frame; locals; Base_SP; Base_SP + [0,0]; Base_SP + [-4,-4]; Base_SP + [-20,-20].]
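The walk-through above can be reproduced with a small abstract interpreter. The following is only a minimal sketch, not the authors' implementation: it assumes straight-line code, 4-byte stack slots, and a hand-written tuple encoding of the instructions of f. The names AbsVal, analyze, init_* and mem* are hypothetical, and the final ret (which pops the return address) is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AbsVal:
    """A symbolic base plus an integer interval, e.g. Base_SP + [-4, -4]."""
    base: Optional[str]  # None means a plain constant
    lo: int
    hi: int

    def shift(self, k: int) -> "AbsVal":
        return AbsVal(self.base, self.lo + k, self.hi + k)

    def __str__(self) -> str:
        return f"{self.base or 'const'} + [{self.lo}, {self.hi}]"


def analyze(instrs):
    """Abstractly execute straight-line code; return (regs, stack, params)."""
    regs = {"esp": AbsVal("Base_SP", 0, 0), "ebp": AbsVal("Base_BP", 0, 0)}
    stack = {}      # byte offset from Base_SP -> AbsVal stored there
    params = set()  # indices of stack parameters that were read

    def reg(name):
        # Registers never written hold an unknown initial value.
        return regs.setdefault(name, AbsVal(f"init_{name}", 0, 0))

    def slot(base_reg, disp):
        v = reg(base_reg)
        if v.base != "Base_SP" or v.lo != v.hi:
            raise ValueError("address is not a known stack offset")
        return v.lo + disp

    def load(off):
        if off not in stack and off > 0:
            # Above the return address: an incoming parameter (4-byte slots).
            params.add(off // 4)
            stack[off] = AbsVal(f"arg{off // 4}", 0, 0)
        return stack.get(off, AbsVal(f"mem{off}", 0, 0))

    for op, *args in instrs:
        if op == "push":
            regs["esp"] = reg("esp").shift(-4)
            stack[slot("esp", 0)] = reg(args[0])
        elif op == "mov":      # mov %src, %dst
            regs[args[1]] = reg(args[0])
        elif op == "add":      # add $imm, %dst (sub is encoded as a negative imm)
            regs[args[1]] = reg(args[1]).shift(args[0])
        elif op == "load":     # mov disp(%base), %dst
            regs[args[2]] = load(slot(args[1], args[0]))
        elif op == "store":    # mov %src, disp(%base)
            stack[slot(args[2], args[1])] = reg(args[0])
        elif op == "storei":   # mov $imm, disp(%base)
            stack[slot(args[2], args[1])] = AbsVal(None, args[0], args[0])
        elif op == "leave":    # mov %ebp, %esp ; pop %ebp
            regs["esp"] = reg("ebp")
            regs["ebp"] = load(slot("esp", 0))
            regs["esp"] = regs["esp"].shift(4)
        else:
            raise ValueError(f"unmodeled instruction: {op}")
    return regs, stack, params


# The body of f from the slide, up to (but not including) ret.
F = [
    ("push", "ebp"), ("mov", "esp", "ebp"), ("add", -16, "esp"),
    ("load", 8, "ebp", "eax"), ("add", 3, "eax"), ("store", "eax", 8, "ebp"),
    ("storei", 7, -12, "ebp"), ("load", 12, "ebp", "edx"),
    ("store", "edx", -8, "ebp"), ("leave",),
]
regs, stack, params = analyze(F)
for name in ("esp", "ebp", "eax", "edx"):
    print(f"{name}: {regs[name]}")
print("stack parameters read:", sorted(params))
```

Under these assumptions the sketch arrives at the same summary as the slide: ESP and EBP are restored, two stack parameters are read, EAX ends as arg1 + [3, 3], and EDX as arg2 + [0, 0]. Because the code is straight-line, every interval stays a single point; joins at control-flow merges would be needed for real functions.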

Stack Analysis: Preliminary results
[Figure: chart covering pdftops, XMMS, and Apache; axis: Size (K instructions), ticks from 0 to 600.]

Static disassembly of binary code

Background: Disassembly Techniques
• Linear sweep algorithm
  • Start with the program entry point, proceed to disassemble instructions sequentially
  • Key assumption: all instructions appear one after the next, without any gaps
    • Violated in most code (presence of data or padding)
• Recursive traversal algorithm
  • After a control-flow transfer instruction (CTI), proceed to disassemble the target address
  • For conditional CTIs and non-CTIs, proceed to disassemble the next instruction
  • Key problems
    • Code reached only through indirect CTIs
    • Functions that don’t return in the usual way
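The two algorithms can be compared on a toy program. This is an illustrative sketch only: the "binary" is a hypothetical dictionary from addresses to already-decoded instructions (Insn, TOY, and the instruction kinds are invented for the example), addresses missing from the dictionary stand for data or padding, and calls are assumed to return to the next instruction.

```python
from typing import NamedTuple, Optional


class Insn(NamedTuple):
    kind: str                      # "op", "jcc", "call", "jmp_ind", or "ret"
    size: int
    target: Optional[int] = None   # statically known CTI target, if any


# Hypothetical toy image. Addresses 7..9 hold data, and the code at
# 11..12 is reached only through the indirect jump at address 4.
TOY = {
    0: Insn("call", 2, 10),
    2: Insn("jcc", 2, 6),
    4: Insn("jmp_ind", 2),
    6: Insn("ret", 1),
    10: Insn("ret", 1),
    11: Insn("op", 1),
    12: Insn("ret", 1),
}


def linear_sweep(image, start, end):
    """Decode sequentially from start; stop at the first undecodable address."""
    found, addr = [], start
    while addr < end and addr in image:
        found.append(addr)
        addr += image[addr].size
    return found


def recursive_traversal(image, entry):
    """Follow known CTI targets, plus fall-through where control can continue."""
    seen, work = set(), [entry]
    while work:
        addr = work.pop()
        if addr in seen or addr not in image:
            continue
        seen.add(addr)
        insn = image[addr]
        if insn.target is not None:
            work.append(insn.target)
        if insn.kind in ("op", "jcc", "call"):  # assumes calls return normally
            work.append(addr + insn.size)
    return sorted(seen)


print("linear sweep:       ", linear_sweep(TOY, 0, 13))
print("recursive traversal:", recursive_traversal(TOY, 0))
```

In this toy, the sweep halts at the data gap (a real sweep might instead mis-decode the data as instructions), while the traversal steps over the gap by following the call but never reaches the code behind the indirect jump. That missing code is the gap the later analyses aim to close.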

Our Approach for Disassembly
• Assumption
  • No code obfuscation
• Non-assumptions
  • Function prologue and epilogue patterns
  • Compiler idioms or (lack of) optimizations
• Approach
  • Use recursive traversal
  • Use stack analysis to compute/verify return targets
  • Develop new analysis to determine targets of indirect control-flow transfers

Our Approach: Type inference
• Key insight: code pointer values don’t undergo arithmetic or other transformations
• Implication: values assigned to code pointers must represent indirect CTI targets
• Achieves much better results than data flow analysis
  • Avoids the global def-use problem, which is very hard in low-level languages
• Compute a set of possible code addresses and a set of definite code addresses
  • Code at definite code addresses can be safely disassembled
  • Code at addresses outside the possible set can be safely relocated
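One way to picture the key insight is as a flow-insensitive propagation of a "code pointer" type backward through copy assignments. The sketch below is a deliberate simplification and not the analysis from the talk: the assignment list, the location names, and the function infer_code_pointer_targets are all hypothetical.

```python
def infer_code_pointer_targets(assignments, indirect_cti_operands):
    """Return constants that flow, unmodified, into indirect CTI operands.

    assignments: list of (dst, src) pairs; src is a location name (str)
    or an integer constant. Only plain copies are modeled, matching the
    insight that code pointers do not undergo arithmetic.
    """
    code_ptr_locs = set(indirect_cti_operands)
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for dst, src in assignments:
            if (dst in code_ptr_locs and isinstance(src, str)
                    and src not in code_ptr_locs):
                code_ptr_locs.add(src)  # src is copied into a code pointer
                changed = True
    return {src for dst, src in assignments
            if dst in code_ptr_locs and isinstance(src, int)}


# Hypothetical program facts: two table entries are copied into %eax,
# which is then used by an indirect call; %ecx and %edx are ordinary data.
ASSIGNMENTS = [
    ("table0", 0x8048400),
    ("table1", 0x8048450),
    ("eax", "table0"),
    ("eax", "table1"),
    ("ecx", 0x8048460),
    ("edx", 16),
]
targets = infer_code_pointer_targets(ASSIGNMENTS, {"eax"})
print(sorted(hex(t) for t in targets))
```

The constant assigned to ecx looks like a code address but never reaches a code-pointer-typed location, so it is not reported as a target.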

Static Disassembly: Preliminary Results

Analysis of disassembler on 'ls' binary

Analysis                         Disassembled code   Reachable code not disassembled
Recursive traversal              2.7%                85%
Compiler idioms and heuristics   87%                 1%
Function pointer analysis        88%                 0%

Static Disassembly: Preliminary Results

Application   Size (KB)   Disassembled code   Reachable code not disassembled
pdftops       14          97%                 0%
chroot        26          85%                 0%
chmod         39          87%                 0%
cat           43          92%                 0%
ls            96          88%                 0%
dhclient      411         81%                 4%

Gap in dhclient is due to incomplete implementation (dealing with global arrays).

DTA++: Improving accuracy of Dynamic Taint Analysis [NDSS 2011]

Under-tainting and Over-tainting

Results vary based on which values are considered to depend on others:

• Too few dependencies lead to under-tainting

• Too many dependencies lead to over-tainting


Key Idea
• Under-tainting occurs when control flow state represents (almost) all of the information in inputs
• Key idea: propagate taint only for control dependencies that would cause under-tainting (culprit implicit flows)

    char output[256];
    char input = next_in();
    long len = 0;
    if (input == '{') {
        output[0] = '\\';
        output[1] = '{';
        len = 2;
    }
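The effect of the branch in the fragment above can be simulated with a one-bit taint tag per value. This is a minimal sketch rather than the DTA++ implementation: translate and its culprit_rule flag are hypothetical names, and the flag simply switches between pure data-flow propagation and an extra rule for this one branch.

```python
def translate(inp, culprit_rule):
    """Mimic the C fragment, tracking a taint bit alongside every value."""
    inp_tainted = True        # the input byte comes from an untrusted source
    output = {}               # index -> (byte, tainted)
    length = (0, False)
    if inp == "{":
        # Constants are untainted under pure data-flow propagation. With the
        # culprit rule, writes under this branch inherit the taint of `inp`.
        tag = inp_tainted and culprit_rule
        output[0] = ("\\", tag)
        output[1] = ("{", tag)
        length = (2, tag)
    return output, length


for rule in (False, True):
    out, length = translate("{", rule)
    print("culprit rule:", rule, "-> output taint:", [t for _, t in out.values()])
```

Without the rule, the output bytes are untainted even though they are fully determined by the tainted input, which is the under-tainting case. With the rule, only writes guarded by this branch pick up taint.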

DTA++ Approach Overview
• Hypothesis: under-tainting occurs at just a few locations in a program (culprit branches)
• Approach: find these locations in advance, and construct new taint propagation rules for them
• Assumption: we are given test inputs that demonstrate the under-tainting

Approach Details
• Under-tainting detection predicate
  • Given a (partial) execution trace t, φ(t) holds if t contains a culprit implicit flow
  • Implementation
    • Use symbolic execution to count how many other inputs could take the same execution path as t
    • Few or none → φ(t) = true
• Search for culprit branches
  • Find the shortest prefix of t that satisfies φ; the last instruction in the prefix is the culprit
  • Remove culprit, repeat the search to find others
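The prefix search can be sketched on a one-byte input. Note the substitution: instead of symbolic execution, this sketch counts matching inputs by brute-force enumeration, which is feasible only because the domain has 256 values. The trace encoding (pairs of a predicate and its observed outcome), the max_others threshold, and the names phi and find_culprit are assumptions made for illustration.

```python
def phi(prefix, domain, max_others=0):
    """True if at most max_others other inputs follow the same branch outcomes."""
    same_path = [x for x in domain
                 if all(pred(x) == taken for pred, taken in prefix)]
    # same_path includes the input that produced the trace, hence the -1.
    return len(same_path) - 1 <= max_others


def find_culprit(trace, domain, max_others=0):
    """Index of the last branch in the shortest prefix satisfying phi, or None."""
    for n in range(1, len(trace) + 1):
        if phi(trace[:n], domain, max_others):
            return n - 1
    return None


# Hypothetical trace recorded for the input byte '{' (value 123):
# each entry is (branch predicate over the input, outcome observed).
TRACE = [
    (lambda x: x >= 32, True),        # printable-range check
    (lambda x: x < 128, True),        # ASCII check
    (lambda x: x == ord("{"), True),  # the equality test from the fragment
]
DOMAIN = range(256)
print("culprit branch index:", find_culprit(TRACE, DOMAIN))
```

The first two branches leave many inputs on the same path, so they are not flagged. The equality test leaves only the original input, making it the culprit. Repeating the search after removing a found culprit, as the slide describes, is left out of this sketch.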

DTA++ Results: Diagnosis Time

Program description   # of culprit implicit flows detected & fixed   Time for diagnosis
WordPad, RTF          1                                              0.26s
MS Word 2003, RTF     24                                             31m 5.26s
AbiWord, HTML         1                                              14.29s
AngelWriter, HTML     3                                              0.63s
AurelEdit, RTF        1                                              0.76s
VNU Editor, RTF       1                                              0.34s
IntelliEdit, RTF      1                                              0.40s
CryptEdit, RTF        1                                              0.23s

DTA++ Results: Over-tainting

Summary and Future Work
• Develop novel static-analysis-based methods to overcome the drawbacks of today’s techniques
  • Robust, scalable static analysis of low-level code
  • Accurate disassembly of binary code
  • Accurate reasoning about key properties
    • Dynamic taint analysis
• Future work
  • Experimentation and evaluation of stack analysis and disassembly
  • Robust and efficient binary instrumentation for information flow and related properties
  • Application to hostile OS defense
