
Security Seminar, Fall 2003

On the (Im)possibility of Obfuscating Programs

Boaz Barak, Oded Goldreich, Russell Impagliazzo, Steven Rudich, Amit Sahai, Salil Vadhan and Ke Yang

Presented by Shai Rubin

Theory/Practice “Gap”

In practice:
• Hackers successfully obfuscate viruses
• Researchers successfully obfuscate programs [2,4]
• Companies sell obfuscation products [3]

In theory [1]:
• There is no good algorithm for obfuscating programs

Which side are you on?

Why Am I Giving This Talk?

• To understand the theory/practice gap
• An example of a good paper
• An example of interesting research:
– shows how to model a practical problem in terms of complexity theory
– illustrates techniques used by theoreticians
• I did not understand the paper; I thought that explaining it to others would help me understand it
• To hear your opinion (free consulting)
• To learn how to pronounce ‘obfuscation’

Disclaimer

• This paper is mostly about complexity theory
• I’m not a complexity theory expert
• I present and discuss only the main result of the paper
• The paper describes extensions to the main result, which I did not fully explore
• Hence, some of my interpretations/conclusions/suggestions may be wrong or incomplete
• You are welcome to catch me

Talk Structure

• Motivation (Theory/Practice Gap)
• Theoretician Track: Obfuscation Model, Impossibility Proof
• Practitioner Track: Obfuscation Model Analysis, Other Obfuscation Models
• Summary

Obfuscation Concept

A good obfuscator: a virtual black box

“Anything an adversary can compute from an obfuscated program O(P), it can compute given just oracle access to P”

The weakest notion of “compute”: computing a predicate, i.e., a property of P.

[Figure: from O(Prog.c), an adversary uses code analysis plus input/output queries to compute p(Prog.c); from Prog.c as a black box, it can use input/output queries only.]

Turing Machine Obfuscator

A Turing machine O is a Turing machine (TM) obfuscator if, for any Turing machine M:

1. [Functionality property] O(M) computes the same function as M.
2. [Efficiency property] The running time¹ of O(M) is the same as that of M.
3. [Black box property] For any efficient algorithm² A (analysis) that computes a predicate p(M) from O(M), there is an efficient (oracle access) algorithm² $R^M$ that for all M computes p(M):

$$\Pr[A(O(M)) = p(M)] \approx \Pr[R^M(1^{|M|}) = p(M)]$$

¹Polynomial slowdown is permitted.
²Probabilistic polynomial-time Turing machine.

In words: for every M, there is no predicate that can be (efficiently) computed from the obfuscated version of M but cannot be computed by merely observing the input-output behavior of M.

Talk Structure

• Motivation (Theory/Practice Gap)
• Theoretician Track: Obfuscation Model, Impossibility Proof
• Practitioner Track: Obfuscation Model Analysis, Other Obfuscation Models
• Summary

Proof Outline

1. You say: “I have an obfuscator: for any machine M and any (analysis) algorithm A that computes a predicate p(M), there is an oracle access algorithm $R^M$ that for all M computes p(M).”
2. Really? Please provide O.
3. Given O and a Turing machine E of my choice, I compute O(E).
4. I show you a predicate p and an (analysis) algorithm A such that $A(O(E)) = p(E)$. You must provide $R^M$ with $\Pr[R^E(1^{|E|}) = p(E)] \approx \Pr[A(O(E)) = p(E)]$.
5. I choose another machine Z and obfuscate it using O. I show you that $\Pr[R^Z(1^{|Z|}) = p(Z)] \ll \Pr[A(O(Z)) = p(Z)]$.
6. Conclusion: please try another obfuscator (i.e., you do not have a good obfuscator).

Building E (1)

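• Combination machine. For any M, N:
• $\mathrm{COMB}_{M,N}(1,x) = M(x)$ and $\mathrm{COMB}_{M,N}(0,x) = N(x)$.
• Hence, $\mathrm{COMB}_{M,N}$ can be used to compute M(N).

$$\mathrm{COMB}_{M,N}(b,x) = \begin{cases} M(x) & b = 1 \\ N(x) & b = 0 \end{cases}$$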
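A minimal Python sketch of the combination machine, assuming machines are modeled as Python callables (an assumption; the slides speak of Turing machines). The name `comb` is hypothetical. The last lines show how a single combined machine lets M run on N.

```python
from typing import Any, Callable

Machine = Callable[[Any], Any]

def comb(m: Machine, n: Machine) -> Callable[[int, Any], Any]:
    """Build COMB_{M,N}: selector bit b = 1 runs M, b = 0 runs N."""
    def combined(b: int, x: Any) -> Any:
        if b == 1:
            return m(x)
        if b == 0:
            return n(x)
        raise ValueError("selector bit b must be 0 or 1")
    return combined

# M(N): hand the M half a program that runs the N half of the same machine.
run_on_two = comb(lambda prog: prog(2), lambda x: x + 1)  # M runs its input on 2
print(run_on_two(1, lambda x: run_on_two(0, x)))  # M(N) = N(2) = 3
```

Whoever holds a runnable copy of $\mathrm{COMB}_{M,N}$ can pass its N half to its M half; the analysis algorithm later relies on exactly this.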

Building E (2)

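• Let $\alpha, \beta \in \{0,1\}^k$
• Let

$$C_{\alpha,\beta}(x) = \begin{cases} \beta & x = \alpha \\ 0^k & \text{otherwise} \end{cases} \qquad D_{\alpha,\beta}(C) = \begin{cases} 1 & C(\alpha) = \beta \\ 0 & \text{otherwise} \end{cases}$$

• Note: $D_{\alpha,\beta}$ can distinguish between $C_{\alpha,\beta}$ and $C_{\alpha',\beta'}$ when $(\alpha,\beta) \neq (\alpha',\beta')$
• $E_{\alpha,\beta} = \mathrm{COMB}_{D_{\alpha,\beta},C_{\alpha,\beta}}$
• Remember: $E_{\alpha,\beta}$ can be used to compute $D_{\alpha,\beta}(C_{\alpha,\beta})$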
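A sketch of $C_{\alpha,\beta}$ and $D_{\alpha,\beta}$, assuming k-bit strings are modeled as Python `str` values over "0"/"1" and programs as callables. The helper names `random_bits`, `make_c` and `make_d` are hypothetical. The slides do not bound how long $D_{\alpha,\beta}$ runs its input program, and neither does this sketch.

```python
import secrets
from typing import Callable

Program = Callable[[str], str]  # k-bit strings in, k-bit strings out

def random_bits(k: int) -> str:
    """Sample a uniformly random k-bit string, e.g. alpha or beta."""
    return "".join(secrets.choice("01") for _ in range(k))

def make_c(alpha: str, beta: str) -> Program:
    """C_{alpha,beta}: returns beta on input alpha and 0^k on every other input."""
    zero = "0" * len(alpha)
    def c(x: str) -> str:
        return beta if x == alpha else zero
    return c

def make_d(alpha: str, beta: str) -> Callable[[Program], int]:
    """D_{alpha,beta}: given a program C, outputs 1 iff C(alpha) == beta."""
    def d(prog: Program) -> int:
        return 1 if prog(alpha) == beta else 0
    return d
```

For example, `make_d(a, b)(make_c(a, b))` is 1, while feeding it a point function for a different α (and nonzero β) gives 0.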

Proof Outline

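3. Given O and a Turing machine $E_{\alpha,\beta}$ of my choice, I compute $O(E_{\alpha,\beta})$.

4. I show you a predicate p and an (analysis) algorithm A such that $A(O(E_{\alpha,\beta})) = p(E_{\alpha,\beta})$. You must provide $R^M$: $\Pr[R^{E_{\alpha,\beta}}(1^{|E_{\alpha,\beta}|}) = p(E_{\alpha,\beta})] \approx \Pr[A(O(E_{\alpha,\beta})) = p(E_{\alpha,\beta})]$.

5. I choose another machine Z and obfuscate it using O. I show you that $\Pr[R^Z(1^{|Z|}) = p(Z)] \ll \Pr[A(O(Z)) = p(Z)]$.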

The Analysis Algorithm

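Input: a combination machine $\mathrm{COMB}_{M,N}(b,x)$.

Algorithm:
1. Decompose $\mathrm{COMB}_{M,N}$ into M and N:
   a. $\mathrm{COMB}_{M,N}(1,x) = M(x)$
   b. $\mathrm{COMB}_{M,N}(0,x) = N(x)$
2. Return M(N).

Note: $A(O(E_{\alpha,\beta}))$ is a predicate that is always (i.e., with probability 1) true:

$$A(O(E_{\alpha,\beta})) = A(O(\mathrm{COMB}_{D_{\alpha,\beta},C_{\alpha,\beta}})) = D_{\alpha,\beta}(C_{\alpha,\beta}) = 1$$

You must provide an oracle access algorithm $R^M$ such that $\Pr[R^{E_{\alpha,\beta}}(1^{|E_{\alpha,\beta}|}) = 1] \approx 1$.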
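A sketch of the analysis algorithm A run on $E_{\alpha,\beta}$, under the same modeling assumptions (callables for machines, `str` for bit strings). `wrap` is a hypothetical stand-in for O: any functionality-preserving transformation gives the same answer, because A never needs α or β, only a runnable copy of the combined machine. Python closures blur the code-versus-oracle distinction; the point is that an oracle-only algorithm would have no program for the N half to hand to D.

```python
def make_c(alpha, beta):
    # C_{alpha,beta}: beta on alpha, 0^k elsewhere
    return lambda x: beta if x == alpha else "0" * len(alpha)

def make_d(alpha, beta):
    # D_{alpha,beta}: 1 iff the given program maps alpha to beta
    return lambda prog: 1 if prog(alpha) == beta else 0

def comb(m, n):
    # COMB_{M,N}(b, x): b = 1 selects M, b = 0 selects N
    return lambda b, x: m(x) if b == 1 else n(x)

def wrap(p):
    # Hypothetical stand-in for O: preserves functionality, nothing more
    return lambda b, x: p(b, x)

def analysis(p):
    """A: split the combination machine p and return M(N)."""
    n_half = lambda x: p(0, x)  # a runnable copy of N, built from p itself
    return p(1, n_half)         # run M on that copy of N

alpha, beta = "1011", "0110"
e = comb(make_d(alpha, beta), make_c(alpha, beta))  # E_{alpha,beta}
print(analysis(wrap(e)))  # 1
```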

Proof Outline

4. I show you a predicate p and an (analysis) algorithm A such that $A(O(E_{\alpha,\beta})) = 1$.

You must provide $R^M$: $\Pr[R^{E_{\alpha,\beta}}(1^{|E_{\alpha,\beta}|}) = 1] \approx \Pr[A(O(E_{\alpha,\beta})) = 1] = 1$.

5. I choose another machine Z and obfuscate it using O. I show you that $\Pr[R^Z(1^{|Z|}) = p(Z)] \ll \Pr[A(O(Z)) = p(Z)]$.

The Z machine

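• Let $Z_k$ be a machine that always returns $0^k$.
• Z is similar to $E_{\alpha,\beta}$ ($= \mathrm{COMB}_{D_{\alpha,\beta},C_{\alpha,\beta}}$): replace $C_{\alpha,\beta}$ with $Z_k$.

$$Z = \mathrm{COMB}_{D_{\alpha,\beta},Z_k}$$

• Note: A(O(Z)) is a predicate that is always (i.e., with probability 1) false, since $Z_k(\alpha) = 0^k \neq \beta$ (assuming $\beta \neq 0^k$):

$$A(O(Z)) = A(O(\mathrm{COMB}_{D_{\alpha,\beta},Z_k})) = D_{\alpha,\beta}(Z_k) = 0$$

• Is $\Pr[R^Z(1^{|Z|}) = 0] \approx 1$? If we show that $\Pr[R^Z(1^{|Z|}) = 0] \ll 1$, we are done.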
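A sketch contrasting $E_{\alpha,\beta}$ and Z, assuming the same toy modeling as before; `make_z` and `differing_inputs` are hypothetical helpers. It shows that A separates the two machines perfectly, even though, as functions, they differ on a single input of the N half (the M halves are the same $D_{\alpha,\beta}$).

```python
from itertools import product

def make_c(alpha, beta):
    return lambda x: beta if x == alpha else "0" * len(alpha)  # C_{alpha,beta}

def make_d(alpha, beta):
    return lambda prog: 1 if prog(alpha) == beta else 0  # D_{alpha,beta}

def make_z(k):
    return lambda x: "0" * k  # Z_k: always 0^k

def comb(m, n):
    return lambda b, x: m(x) if b == 1 else n(x)  # COMB_{M,N}

def analysis(p):
    return p(1, lambda x: p(0, x))  # A: return M(N)

def differing_inputs(p, q, k):
    """All k-bit x on which the N halves of p and q disagree."""
    inputs = ("".join(bits) for bits in product("01", repeat=k))
    return [x for x in inputs if p(0, x) != q(0, x)]

alpha, beta = "1011", "0110"  # beta != 0^k, as the argument needs
e = comb(make_d(alpha, beta), make_c(alpha, beta))  # E_{alpha,beta}
z = comb(make_d(alpha, beta), make_z(len(alpha)))   # Z
print(analysis(e), analysis(z))   # 1 0
print(differing_inputs(e, z, 4))  # ['1011']
```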

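Why is $\Pr[R^Z(1^{|Z|}) = 0] \ll 1$?

Let us look at the execution of $R^{E_{\alpha,\beta}}$:

[Figure: the execution of $R^{E_{\alpha,\beta}}$ from start to end, making oracle queries to $D_{\alpha,\beta}$ and $C_{\alpha,\beta}$; the same execution with the $C_{\alpha,\beta}$ oracle replaced by $Z_k$ ends with output out’.]

When we replace the oracle to $C_{\alpha,\beta}$ with an oracle to $Z_k$, we get $R^Z$.

What will change in the execution? The two runs can diverge only when a query to $C_{\alpha,\beta}$ returns a nonzero value, i.e., only when the query equals α:

$\Pr[\text{out}' = 0] = \Pr[\text{a query to } C_{\alpha,\beta} \text{ returns non-zero}] = \Pr[\text{query} = \alpha] = 2^{-k}$ ³

³Inaccurate; see the paper.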
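The divergence argument can be illustrated with a small Monte Carlo experiment. This is a toy sketch under stated assumptions: `guessing_simulator` is a hypothetical oracle-only algorithm that queries only the N half on random inputs (the $D_{\alpha,\beta}$ half is identical in $E_{\alpha,\beta}$ and Z, so its answers cannot tell them apart), and the runs against $C_{\alpha,\beta}$ and $Z_k$ share the same random coins. By a union bound, the fraction of diverging runs should be at most $q \cdot 2^{-k}$ for q queries.

```python
import random

def make_c(alpha: str, beta: str):
    # C_{alpha,beta}: beta on alpha, 0^k everywhere else
    return lambda x: beta if x == alpha else "0" * len(alpha)

def make_z(k: int):
    # Z_k: 0^k on every input
    return lambda x: "0" * k

def random_bits(rng: random.Random, k: int) -> str:
    return "".join(rng.choice("01") for _ in range(k))

def guessing_simulator(oracle, k: int, queries: int, rng: random.Random) -> int:
    """Hypothetical oracle-only R: make random queries, output 1 iff an answer is nonzero."""
    for _ in range(queries):
        if oracle(random_bits(rng, k)) != "0" * k:
            return 1
    return 0

def divergence_rate(k: int, queries: int, trials: int, seed: int = 0) -> float:
    """Fraction of trials where the run against C_{alpha,beta} and the run
    against Z_k (same random coins) end with different outputs."""
    master = random.Random(seed)
    beta = "1" * k  # any nonzero beta works
    differ = 0
    for _ in range(trials):
        alpha = random_bits(master, k)
        coins = master.getrandbits(64)  # shared coins for both runs
        out_e = guessing_simulator(make_c(alpha, beta), k, queries, random.Random(coins))
        out_z = guessing_simulator(make_z(k), k, queries, random.Random(coins))
        if out_e != out_z:
            differ += 1
    return differ / trials
```

With k = 16 and 10 queries the two runs almost never diverge, while with k = 2 they diverge in most trials; the proof takes k large so that every polynomial-time R is in the first situation.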

Proof Outline

4. I show you a predicate p and an (analysis) algorithm A such that $A(O(E_{\alpha,\beta})) = 1$. You must provide $R^M$: $\Pr[R^{E_{\alpha,\beta}}(1^{|E_{\alpha,\beta}|}) = 1] \approx \Pr[A(O(E_{\alpha,\beta})) = 1] = 1$.

5. I choose another machine Z and obfuscate it using O. I show you that $\Pr[R^Z(1^{|Z|}) = 0] = 2^{-k} \ll \Pr[A(O(Z)) = 0] = 1$.

Talk Structure

• Motivation (Theory/Practice Gap)
• Theoretician Track: Obfuscation Model, Impossibility Proof
• Practitioner Track: Obfuscation Model Analysis, Other Obfuscation Models
• Summary

Modeling Obfuscation

A good obfuscator: a virtual black box

“Anything that an adversary can compute from an obfuscation O(P), it can also compute given just oracle access to P”

[Figure: the knowledge an adversary gains from O(Prog.c) through code analysis plus input/output queries, compared with the knowledge it gains from Prog.c through input/output queries alone.]

• Barak et al. show: there are properties that cannot be efficiently learned from I/O queries but can be learned from the code
• However, we already knew this informally: for example, whether a program is written in C or Pascal, or which data structure a program uses

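Obfuscation Model Space

[Figure: a two-dimensional space of obfuscation models. One axis is the difficulty of gaining information from O(P), from efficient to inefficient; the other is the information hidden by the obfuscator, from a specific predicate to all predicates. Barak’s model is marked in this space.]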

TM Obfuscator

A Turing machine O is a TM obfuscator if for any Turing machine M:

1. O(M) computes the same function as M.
2. The running time¹ of O(M) is the same as that of M.
3. For any efficient algorithm² A (analysis) that computes a predicate p(M), there is an efficient (oracle) algorithm² $R^M$ that for all M computes p(M):

$$\Pr[A(O(M)) = p(M)] \approx \Pr[R^M(1^{|M|}) = p(M)]$$

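Obfuscation Model Space

[Figure: the model space with a third axis, the programs the obfuscator must handle (up to all programs). The other axes are difficulty (efficient to inefficient) and information gained from O(P) (specific predicate to all predicates). Barak’s model is marked in this space.]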


Talk Structure

• Motivation (Theory/Practice Gap)
• Theoretician Track: Obfuscation Model, Impossibility Proof
• Practitioner Track: Obfuscation Model Analysis, Other Obfuscation Models
• Summary

Other Obfuscation Models

[Figure: the obfuscation model space (difficulty: efficient to inefficient; programs: up to all programs; information gained from O(P): specific predicate to all predicates), with Barak’s model, signature obfuscation, and static disassembly [2] placed in it.]

Signature obfuscation:
1. Not all properties
2. Not virtual black box?

Static disassembly [2]:
1. Not all properties
2. Not difficult
3. Not virtual black box?

Limitations of Barak’s Model

• Virtual black box:
– Not surprising in some sense (but still excellent work)
– Does not correspond to what attackers/researchers are doing: “the virtual black box paradigm for obfuscation is inherently flawed”

• Too general:
– The obfuscator must work for all programs
– and for any property (Barak et al. address this in the extensions)

• Too restrictive: it does not allow the oracle algorithm to be tailored to each Turing machine (does it matter?).

Alternative Models

“Property Hiding Model”: for a given property q, (i) q can be computed from P, and (ii) q cannot be computed (or is more difficult to compute?) from O(P).

Given an algorithm A and a Turing machine M such that A(M) = q(M), obfuscate M such that:

1. [Property hiding] for every algorithm A, $A(O(M)) \neq q(M)$
2. [Functionality] M and O(M) compute the same function

Virus Signature Obfuscation
• A(M) = q(M) = a substring of instructions inside M
• O(M) does not contain this substring
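A toy instance of virus signature obfuscation, assuming a hypothetical stack-machine instruction set ("push n", "add", "mul", "nop") chosen only for illustration. `contains_signature` plays the role of A, and `insert_nops` is a deliberately naive obfuscator that breaks the contiguous signature while preserving functionality.

```python
from typing import List

Program = List[str]

def run(program: Program) -> int:
    """Interpret the toy stack machine and return the top of the stack."""
    stack: List[int] = []
    for instr in program:
        op, *arg = instr.split()
        if op == "push":
            stack.append(int(arg[0]))
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "add" else a * b)
        elif op != "nop":
            raise ValueError(f"unknown instruction {instr!r}")
    return stack[-1]

def contains_signature(program: Program, signature: Program) -> bool:
    """A(M): does the signature occur as a contiguous run of instructions?"""
    n = len(signature)
    return any(program[i:i + n] == signature for i in range(len(program) - n + 1))

def insert_nops(program: Program) -> Program:
    """Naive O(M): place a no-op after every instruction."""
    out: Program = []
    for instr in program:
        out += [instr, "nop"]
    return out

virus = ["push 2", "push 3", "add", "push 4", "mul"]
sig = ["push 3", "add"]
obf = insert_nops(virus)
print(contains_signature(obf, sig), run(obf))  # False 20
```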

Static Disassembly
• A = a (particular) disassembler
• q(M) = A(M)
• 90% of the instructions in A(M) are different from the instructions in A(O(M))
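The slide does not say how “different” is measured. One possible reading, sketched here as an assumption, aligns the two listings in order with `difflib.SequenceMatcher` and counts the instructions of A(M) left unmatched in A(O(M)). Listings are lists of instruction strings; `fraction_changed` and `hides_disassembly` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Sequence

def fraction_changed(listing: Sequence[str], obfuscated_listing: Sequence[str]) -> float:
    """Fraction of instructions in A(M) not matched, in order, by A(O(M))."""
    if not listing:
        return 0.0
    matcher = SequenceMatcher(None, listing, obfuscated_listing, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 1 - matched / len(listing)

def hides_disassembly(listing: Sequence[str], obfuscated_listing: Sequence[str],
                      threshold: float = 0.9) -> bool:
    """The slide's goal: at least 90% of A(M)'s instructions differ."""
    return fraction_changed(listing, obfuscated_listing) >= threshold
```

Other metrics (e.g., comparing instruction addresses) would be equally plausible readings of the slide.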

Alternative Models (2)

Backdoor Model: hide functionality for a single input, change functionality for most other inputs.

Given a Turing machine M and an input x:
1. [Obfuscated back door] there exists y such that $M(x) = O(M)(y)$
2. [Non-functionality] for every $z \neq y$, $\Pr[M(z) \neq O(M)(z)]$ is high
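A toy construction that makes the two backdoor conditions concrete, assuming machines are Python functions on integers; `backdoor_obfuscate` and `satisfies_backdoor_model` are hypothetical. It checks only the functional requirements: the secret y sits in plain sight in the closure, so nothing is actually hidden, and non-functionality holds with probability 1 rather than just “high”.

```python
import random
from typing import Callable, Iterable, Tuple

Machine = Callable[[int], int]

def backdoor_obfuscate(m: Machine, x: int, bits: int, rng: random.Random) -> Tuple[Machine, int]:
    """Toy O(M): a secret y reproduces M(x); every other input gets a changed answer."""
    y = rng.getrandbits(bits)  # the back-door input
    target = m(x)
    def obfuscated(z: int) -> int:
        return target if z == y else m(z) + 1  # +1 guarantees O(M)(z) != M(z)
    return obfuscated, y

def satisfies_backdoor_model(m: Machine, o: Machine, x: int, y: int,
                             domain: Iterable[int]) -> bool:
    """Check [obfuscated back door] and [non-functionality] on a finite domain."""
    back_door = o(y) == m(x)
    non_functional = all(o(z) != m(z) for z in domain if z != y)
    return back_door and non_functional

square = lambda n: n * n
o, y = backdoor_obfuscate(square, 7, 8, random.Random(1))
print(o(y))  # 49
```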

Summary

What to take home:
• The gap is possible because:
– The virtual black box paradigm is different from real-world obfuscation.
– The obfuscation model space.
• Nice research: concept, formalism, properties
• A lot remains to be done

Bibliography

1. B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, S. Vadhan and K. Yang, "On the (Im)possibility of Obfuscating Programs", CRYPTO, Aug. 2001, Santa Barbara, CA.

2. C. Linn and S. Debray, "Obfuscation of Executable Code to Improve Resistance to Static Disassembly", CCS, Oct. 2003, Washington, DC.

3. www.cloakware.com.

4. C. S. Collberg, C. D. Thomborson and D. Low, "Manufacturing Cheap, Resilient, and Stealthy Opaque Constructs", POPL, 1998.
