Unleashing Mayhem on Binary Code Sang Kil Cha Thanassis Avgerinos Alexandre Rebert David Brumley Carnegie Mellon University

Dec 18, 2015

Page 1:

Unleashing Mayhem on Binary Code

Sang Kil Cha, Thanassis Avgerinos, Alexandre Rebert, David Brumley

Carnegie Mellon University

Page 2:

Automatic Exploit Generation Challenge

Automatically Find Bugs & Generate Exploits

Program → AEG → Exploits

I = input();
if (I < 42) vuln();
else safe();

Page 3:

Automatic Exploit Generation Challenge

Automatically Find Bugs & Generate Exploits

Explore Program

Page 4:

Ghostscript v8.62 Bug

int outprintf( const char *fmt, ... )
{
  int count;
  char buf[1024];
  va_list args;
  va_start( args, fmt );
  count = vsprintf( buf, fmt, args );
  outwrite( buf, count ); // print out
}

int main( int argc, char* argv[] )
{
  const char *arg;
  while( (arg = *argv++) != 0 ) {
    switch ( arg[0] ) {
    case '-': {
      switch ( arg[1] ) {
      case 0:
        …
      default:
        outprintf( "unknown switch %s\n", arg[1] );
      }
    }
    default:
      …
    }
    …

Reading user input from command line

Buffer overflow

CVE-2009-4270

Page 5:

Multiple Paths

(Ghostscript code from page 4)

Many Branches!

Page 6:

Automatic Exploit Generation Challenge

Automatically Find Bugs & Generate Exploits

Transfer Control to Attacker Code

(exec “/bin/sh”)

Page 7:

Generating Exploits

(Ghostscript code from page 4)

[Stack diagram: frames of main and outprintf; the outprintf frame holds fmt, ret addr, count, args, and buf; buf receives the user input; esp marks the stack pointer.]

Page 8:

Generating Exploits

(Ghostscript code from page 4)

Read Return Address from Stack Pointer (esp)

[Stack diagram: frames of main and outprintf; the outprintf frame holds fmt, ret addr, count, args, and buf; buf receives the user input; esp marks the stack pointer.]

Control Hijack Possible

Page 9:

Source

int main( int argc, char* argv[] )
{
  const char *arg;
  while( (arg = *argv++) != 0 ) {
  …

Executables (Binary)

01010010101010100101010010101010100101010101010101000100001000101001001001001000000010100010010101010010101001001010101001010101001010000110010101010111011001010101010101010100101010111110100101010101010101001010101010101010101010

Unleashing Mayhem

Automatically Find Bugs & Generate Exploits for Executables

Page 10:

Demo

Page 11:

How Mayhem Works: Symbolic Execution

[Execution tree; every branch has a false (f) and a true (t) edge]

x = input()              x can be anything
if x > 42                t: x > 42
if x*x = 0xffffffff      t: vuln()
                         f: (x > 42) ∧ (x*x != 0xffffffff)
if x < 100               f: (x > 42) ∧ (x*x != 0xffffffff) ∧ (x >= 100)

Page 12:

How Mayhem Works: Symbolic Execution

[Execution tree; every branch has a false (f) and a true (t) edge]

x = input()              x can be anything
if x > 42                t: x > 42
if x*x = 0xffffffff      t: (x > 42) ∧ (x*x == 0xffffffff)
  t: vuln()
  f: if x < 100

Page 13:

Path Predicate = Π

[Execution tree; every branch has a false (f) and a true (t) edge]

x = input()              x can be anything
if x > 42                t: x > 42
if x*x = 0xffffffff      t: Π = (x > 42) ∧ (x*x == 0xffffffff)
  t: vuln()
  f: if x < 100

Page 14:

How Mayhem Works: Symbolic Execution

[Execution tree; every branch has a false (f) and a true (t) edge]

x = input()              x can be anything
if x > 42                t: x > 42
if x*x = 0xffffffff      t: (x > 42) ∧ (x*x == 0xffffffff)
  t: vuln()              Violates Safety Policy
  f: if x < 100
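The way a path predicate accumulates one condition per branch can be sketched in C. This is only an illustration: it traces a single concrete input through the toy program and records the conditions as text, whereas a symbolic executor keeps the input symbolic and hands the conditions to a solver. The names trace_path and add_constraint are hypothetical, and x is assumed to be a 32-bit unsigned value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Append one branch condition to the textual path predicate pi. */
static void add_constraint(char *pi, size_t cap, const char *cond) {
    if (pi[0] != '\0')
        strncat(pi, " && ", cap - strlen(pi) - 1);
    strncat(pi, cond, cap - strlen(pi) - 1);
}

/* Run the toy program on a concrete x and record the condition of every
 * branch taken. Returns 1 if the path reaches vuln(), 0 otherwise. */
int trace_path(uint32_t x, char *pi, size_t cap) {
    assert(pi != NULL && cap > 0);
    pi[0] = '\0';
    if (!(x > 42)) {
        add_constraint(pi, cap, "(x <= 42)");
        return 0;
    }
    add_constraint(pi, cap, "(x > 42)");
    if ((uint32_t)(x * x) == 0xffffffffu) {   /* 32-bit wraparound multiply */
        add_constraint(pi, cap, "(x*x == 0xffffffff)");
        return 1;                             /* vuln() would run here */
    }
    add_constraint(pi, cap, "(x*x != 0xffffffff)");
    add_constraint(pi, cap, x < 100 ? "(x < 100)" : "(x >= 100)");
    return 0;
}
```

Calling trace_path(150, pi, sizeof pi) fills pi with the three-condition predicate of the rightmost path in the tree.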

Page 15:

Safety Policy in Mayhem

int outprintf( const char *fmt, ... )
{
  int count;
  char buf[1024];
  va_list args;
  va_start( args, fmt );
  count = vsprintf( buf, fmt, args );
  outwrite( buf, count ); // print out
}

[Stack diagram: frames of main and outprintf; the outprintf frame holds fmt, ret addr, count, args, and buf; buf receives the user input; esp marks the stack pointer.]

Return to user-controlled address

Instruction Pointer (EIP) level: EIP not affected by user input

Page 16:

Exploit Generation

Exploit is an input that satisfies the predicate:

Π ∧ input[0-31] = attack code ∧ input[1038-1042] = attack code address

Exploit Predicate

input[0-31] = attack code: Can position attack code?

input[1038-1042] = attack code address: Can transfer control to attack code?
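The two byte-level constraints added to Π can be read as a simple check on a candidate input. The C sketch below is hypothetical and covers only those two constraints, not Π itself. It assumes the attack code occupies bytes 0 to 31, that the address is 4 bytes stored little-endian starting at offset 1038, and the function name meets_exploit_constraints is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CODE_OFF = 0, CODE_LEN = 32, ADDR_OFF = 1038, ADDR_LEN = 4 };

/* Returns 1 if input holds the given code bytes at offset 0 and the
 * little-endian address of that code at offset 1038, 0 otherwise.
 * The path predicate itself is not checked here. */
int meets_exploit_constraints(const uint8_t *input, size_t len,
                              const uint8_t *code, uint32_t code_addr) {
    uint8_t addr[ADDR_LEN];
    assert(input != NULL && code != NULL);
    if (len < (size_t)ADDR_OFF + ADDR_LEN)
        return 0;                                  /* input too short */
    for (int i = 0; i < ADDR_LEN; i++)
        addr[i] = (uint8_t)(code_addr >> (8 * i)); /* x86 is little-endian */
    return memcmp(input + CODE_OFF, code, CODE_LEN) == 0 &&
           memcmp(input + ADDR_OFF, addr, ADDR_LEN) == 0;
}
```

In the setting of the slide, a constraint solver searches for an input that satisfies Π and these constraints at once; the function above only verifies the layout part for a given candidate.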

Page 17:

Challenges

Symbolic Execution, Exploit Generation

Efficient Resource Management: Hybrid Execution

Symbolic Index Challenge: Index-based Memory Model

Page 18:

Challenge 1: Resource Management in Symbolic Execution

Page 19:

Current Resource Management in Symbolic Execution

Online Symbolic Execution

Offline Symbolic Execution

(a.k.a. Concolic)

Page 20:

Offline Execution

One path at a time

Method 1: Re-run from scratch ⟹ Inefficient

Re-executed every time

Page 21:

Online Execution

Fork at branches

Hit Resource Cap

Method 2: Stop forking ⟹ Miss paths

Method 3: Snapshot process ⟹ Huge disk image

Page 22:

Mayhem: Hybrid Execution

Fork at branches

Hit Resource Cap

Our Method: Don't snapshot state; use path predicate to recreate state

"Checkpoint"

Ghostscript 8.62: 9.4M vs. 500K

Page 23:

Hybrid Execution

✓ Manage #executors in memory within resource cap

✓ Minimize duplicated work

✓ Lightweight checkpoints
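A toy model of this scheme, in C, may make the three properties concrete. It is a sketch under strong simplifications, not Mayhem's implementation: the "program" is a full binary tree of branches, a single number stands in for an executor's memory state, and a checkpoint stores only the branch decisions of its path. All names (Checkpoint, Executor, explore, restore) are hypothetical.

```c
#include <assert.h>

enum { MAX_DEPTH = 8, MAX_LIVE = 4, MAX_SAVED = 1 << MAX_DEPTH };

/* Lightweight checkpoint: only the branch decisions that define the path. */
typedef struct {
    unsigned char taken[MAX_DEPTH];
    int depth;
} Checkpoint;

/* In-memory executor: the path plus its state (one number stands in for it). */
typedef struct {
    Checkpoint path;
    long state;
} Executor;

typedef struct {
    int paths;       /* complete paths explored */
    int max_live;    /* peak number of in-memory executors */
    long state_sum;  /* sum of final states, to check restored executors */
} Stats;

/* Stand-in for executing the program up to the next branch. */
static long step(long state, int taken) { return state * 2 + taken; }

/* Rebuild an executor by re-executing along the recorded decisions. */
static Executor restore(const Checkpoint *c) {
    Executor e;
    e.path = *c;
    e.state = 0;
    for (int i = 0; i < c->depth; i++)
        e.state = step(e.state, c->taken[i]);
    return e;
}

/* Explore every path of a branch tree of the given depth while keeping
 * at most cap executors in memory. */
Stats explore(int depth, int cap) {
    Executor live[MAX_LIVE];
    Checkpoint saved[MAX_SAVED];
    int nlive = 0, nsaved = 0;
    Stats st = {0, 1, 0};
    Executor root = {{{0}, 0}, 0};

    assert(depth >= 0 && depth <= MAX_DEPTH && cap >= 1 && cap <= MAX_LIVE);
    live[nlive++] = root;
    while (nlive > 0 || nsaved > 0) {
        if (nlive == 0)                     /* memory is free: resume a checkpoint */
            live[nlive++] = restore(&saved[--nsaved]);
        Executor e = live[--nlive];
        if (e.path.depth == depth) {        /* path finished */
            st.paths++;
            st.state_sum += e.state;
            continue;
        }
        for (int t = 0; t < 2; t++) {       /* fork at the branch */
            Executor child = e;
            child.path.taken[child.path.depth++] = (unsigned char)t;
            child.state = step(e.state, t);
            if (nlive < cap) {
                live[nlive++] = child;
            } else {                        /* cap hit: keep only the path */
                assert(nsaved < MAX_SAVED);
                saved[nsaved++] = child.path;
            }
        }
        if (nlive > st.max_live)
            st.max_live = nlive;
    }
    return st;
}
```

With explore(3, 2), all 8 paths are still covered while never more than 2 executors are live; the paths that overflowed the cap are replayed from their recorded decisions instead of being restored from a full snapshot.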

Page 24:

Challenge 2: Symbolic Indices

Page 25:

Symbolic Indices

x = user_input();
y = mem[x];
assert (y == 42);

x can be anything

Which memory cell contains 42?

2^32 cells to check

Memory: 0 … 2^32 - 1

Page 26:

One Cause: Overwritten Pointers

… assert(*ptr==42); return;

[Stack diagram: arg, ret addr, ptr, buf; user input fills buf]

ptr = 0x11223344: *ptr reads mem[0x11223344], which holds 42

ptr overwritten with user input: *ptr reads mem[input]

Page 27:

Another Cause: Table Lookups

Table lookups in standard APIs:
• Parsing: sscanf, vfprintf, etc.
• Character test: isspace, isalpha, etc.
• Conversion: toupper, tolower, mbtowc, etc.
• …
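Why such routines produce symbolic indices can be seen in a table-driven conversion function. The C sketch below is a hypothetical stand-in, not the code of any particular C library; it assumes an ASCII-compatible character set, and the names table_toupper and upper_table are made up.

```c
#include <assert.h>

enum { TABLE_SIZE = 256 };

static unsigned char upper_table[TABLE_SIZE];
static int table_ready = 0;

/* Fill the table once; assumes an ASCII-compatible character set. */
static void init_upper_table(void) {
    for (int c = 0; c < TABLE_SIZE; c++)
        upper_table[c] = (unsigned char)((c >= 'a' && c <= 'z') ? c - 'a' + 'A' : c);
    table_ready = 1;
}

/* Table-driven conversion: the input byte is used directly as an index. */
int table_toupper(int c) {
    assert(c >= 0 && c < TABLE_SIZE);
    if (!table_ready)
        init_upper_table();
    return upper_table[c];   /* if c is symbolic, so is the address read here */
}
```

Because the byte read from the input selects the memory cell, every call on symbolic input is a memory access with a symbolic index.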

Page 28:

Method 1: Concretization

Π ∧ mem[x] = 42 ∧ Π'

Π ∧ x = 17 ∧ mem[x] = 42 ∧ Π'

✓ Solvable
✗ Exploits

Over-constrained
• Misses 40% of exploits in our experiments

Page 29:

Method 2: Fully Symbolic

Π ∧ mem[x] = 42 ∧ Π'

Π ∧ mem[x] = 42 ∧ mem[0] = v_0 ∧ … ∧ mem[2^32 - 1] = v_(2^32 - 1) ∧ Π'

✗ Solvable
✓ Exploits

Page 30:

Our Observation

Path predicate (Π) constrains range of symbolic memory accesses

x can be anything
if x <= 42       (f)
if x >= 50       (f)
y = mem[x]       Π: 42 < x < 50

Use symbolic execution state to:
Step 1: Bound memory addresses referenced
Step 2: Make search tree for memory address values
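The effect of the observation can be illustrated by contrasting a search pinned to one index, as in concretization, with a search over the range that Π allows. The C sketch below uses a plain linear scan in place of a solver, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Search mem[lo..hi] (inclusive) for target. On success store the index
 * in *out and return 1; return 0 if no cell in the range matches. */
int find_index_in_range(const int *mem, size_t lo, size_t hi,
                        int target, size_t *out) {
    assert(mem != NULL && out != NULL && lo <= hi);
    for (size_t x = lo; x <= hi; x++) {
        if (mem[x] == target) {
            *out = x;
            return 1;
        }
    }
    return 0;
}

/* Concretization pins x to a single value before asking the question. */
int find_index_concretized(const int *mem, size_t pinned,
                           int target, size_t *out) {
    return find_index_in_range(mem, pinned, pinned, target, out);
}
```

If 42 sits at index 45, pinning x to 17 finds nothing, while the range 43 to 49 implied by 42 < x < 50 needs only seven cells to be examined instead of 2^32.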

Page 31:

Step 1: Find Bounds

mem[ x & 0xff ]

1. Value Set Analysis [1] provides initial bounds
   • Over-approximation
2. Query solver to refine bounds

Lowerbound = 0, Upperbound = 0xff

[1] Balakrishnan and Reps, Analyzing Memory Accesses in x86 Executables, CC 2004
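One plausible way to refine an over-approximated interval with solver queries is binary search. The C sketch below assumes a hypothetical oracle, feasible(lo, hi), that answers whether some index in [lo, hi] is consistent with the path predicate; a real solver call would take its place. The example oracle between_42_and_50 encodes the predicate 42 < x < 50 and is likewise made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical solver interface: nonzero if some index value in [lo, hi]
 * is consistent with the path predicate. */
typedef int (*feasible_fn)(uint32_t lo, uint32_t hi);

/* Tighten [*lo, *hi] to the smallest and largest feasible index.
 * Returns 0 if no index in the initial interval is feasible. */
int refine_bounds(feasible_fn feasible, uint32_t *lo, uint32_t *hi) {
    uint32_t a, b, low;
    assert(feasible != NULL && lo != NULL && hi != NULL && *lo <= *hi);
    if (!feasible(*lo, *hi))
        return 0;

    a = *lo; b = *hi;                  /* invariant: minimum lies in [a, b] */
    while (a < b) {
        uint32_t mid = a + (b - a) / 2;
        if (feasible(a, mid)) b = mid; else a = mid + 1;
    }
    low = a;

    a = low; b = *hi;                  /* invariant: maximum lies in [a, b] */
    while (a < b) {
        uint32_t mid = b - (b - a) / 2;
        if (feasible(mid, b)) a = mid; else b = mid - 1;
    }
    *lo = low;
    *hi = a;
    return 1;
}

/* Example oracle for the path predicate 42 < x < 50. */
int between_42_and_50(uint32_t lo, uint32_t hi) {
    return lo <= 49 && hi >= 43;
}
```

Starting from the initial bounds 0 and 0xff, the search narrows the interval to 43 and 49 with a number of queries logarithmic in the interval width.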

Page 32:

Step 2: Index Search Tree Construction

y = mem[x]

if x = 1 then y = 10
if x = 2 then y = 12
if x = 3 then y = 22
if x = 4 then y = 20

[Plot: Memory Value against Index]

ite( x < 3, left, right )
ite( x < 2, left, right )
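The balanced tree of ite nodes can be mimicked by a loop that halves the index range at each level. The C sketch below evaluates such a tree for a concrete x; in the symbolic setting the same structure would be emitted as a nested ite expression for the solver. The function name ite_lookup and the choice of split point are assumptions, chosen so that the range 1 to 4 splits at x < 3 and then x < 2, as on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluate the balanced ite tree over table[lo..hi] for index x.
 * Each loop iteration corresponds to one ite(x < mid, left, right) node. */
int ite_lookup(const int *table, size_t lo, size_t hi, size_t x) {
    assert(table != NULL && lo <= x && x <= hi);
    while (lo < hi) {
        size_t mid = lo + (hi - lo + 1) / 2;
        if (x < mid)
            hi = mid - 1;   /* left subtree */
        else
            lo = mid;       /* right subtree */
    }
    return table[lo];
}
```

For n possible indices the tree has depth about log2(n), so a lookup costs a logarithmic number of comparisons rather than one equality test per cell.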

Page 33:

Fully Symbolic vs. Index-based Memory Modeling

[Bar chart: Time (axis 0 to 10000) for Fully Symbolic, Index-based, and Piecewise Opt. on atphttpd v0.4b; annotation: Timeout]

Page 34:

Index Search Tree Optimization: Piecewise Linear Approximation

y = 2*x + 10

y = -2*x + 28

[Plot: Memory Value against Index]
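The idea of replacing runs of table entries by linear functions can be sketched as a greedy pass over the table. The C code below is an illustration only: the table contents are hypothetical values chosen to lie on the two lines shown on the slide, and the greedy splitting strategy and the names Segment, fit_segments, and eval_segments are assumptions rather than a description of Mayhem's algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* One linear piece: table[x] == slope * x + intercept for first <= x <= last. */
typedef struct {
    size_t first, last;
    long slope, intercept;
} Segment;

/* Greedily split table[0..n-1] into maximal runs that lie on one line.
 * Returns the number of segments written to out (at most max_out). */
size_t fit_segments(const long *table, size_t n, Segment *out, size_t max_out) {
    size_t count = 0, i = 0;
    assert(table != NULL && out != NULL);
    while (i < n) {
        size_t j = i;
        long slope = 0;
        if (i + 1 < n) {
            slope = table[i + 1] - table[i];
            j = i + 1;
            while (j + 1 < n && table[j + 1] - table[j] == slope)
                j++;
        }
        assert(count < max_out);
        out[count].first = i;
        out[count].last = j;
        out[count].slope = slope;
        out[count].intercept = table[i] - slope * (long)i;
        count++;
        i = j + 1;
    }
    return count;
}

/* Look up index x through the segments instead of the table. */
long eval_segments(const Segment *segs, size_t count, size_t x) {
    for (size_t k = 0; k < count; k++)
        if (x >= segs[k].first && x <= segs[k].last)
            return segs[k].slope * (long)x + segs[k].intercept;
    assert(0 && "index outside all segments");
    return 0;
}
```

For the hypothetical table {10, 12, 24, 22, 20}, indexed from 0, the pass yields two segments, y = 2*x + 10 on indices 0 to 1 and y = -2*x + 28 on indices 2 to 4, so the search tree needs two leaves instead of five.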

Page 35:

Piecewise Linear Approximation

[Bar chart: Time (axis 0 to 10000) for Fully Symbolic, Index-based, and Piecewise Opt. on atphttpd v0.4b; annotation: 2x faster]

Page 36:

Exploit Generation

Page 37:

[Bar chart, one bar per program; logarithmic axis with ticks 1, 10, 100, 1000, 10000, 100000]

Linux (22): a2ps, aeon, aspell, atphttpd, freeradius, ghostscript, glftpd, gnugol, htget, htpasswd, iwconfig, mbse-bbs, nCompress, orzHttpd, psUtils, rsync, sharutils, socat, squirrel mail, tipxd, xgalaga, xtokkaetama

Windows (7): coolplayer, destiny, dizzy, galan, gsplayer, muse, soritong

Page 38:

[Same bar chart as on page 37, over the same 29 programs]

2 Unknown Bugs: FreeRadius, GnuGol

Page 39:

Limitations

• We do not claim to find all exploitable bugs
• Given an exploitable bug, we do not guarantee we will always find an exploit
• Lots of room for improving symbolic execution, generating other types of exploits (e.g., info leaks), etc.
• We do not consider defenses, which may defend against otherwise exploitable bugs
  – Q [Schwartz et al., USENIX 2011]

But Every Report is Actionable

Page 40:

Related Work

• APEG [Brumley et al., IEEE S&P 2008]
  – Uses patch to locate bug, no shellcode executed
• Automatic Generation of Control Flow Hijacking Exploits for Software Vulnerabilities [Heelan, MS Thesis, U. of Oxford 2009]
  – Creates control flow hijack from crashing input
• AEG [Avgerinos et al., NDSS 2011]
  – Find and generate exploits from source code
• BitBlaze, KLEE, Sage, S2E, etc.
  – Symbolic execution frameworks

Page 41:

Conclusion

• Mayhem automatically generated 29 exploits against Windows and Linux programs
• Hybrid Execution
  – Efficient resource management for symbolic execution
• Index-based Memory Modeling
  – Handle symbolic memory in real-world applications

Page 42:

Thank You

• Our shepherd: Cristian Cadar
• Anonymous reviewers
• Maverick Woo, Spencer Whitman

Page 43:

Q&A

Sang Kil Cha ([email protected])
http://www.ece.cmu.edu/~sangkilc/