Catch and Release: A New Look at Detecting and Mitigating Highly Obfuscated Exploit Kits

BY MOHAMED SAHER AND AHMED GARHY

AGENDA Our Intent

Rethinking Evasions

Domain of the Problem

Current Problem

Problem with Current Solutions

Solution #1 First Method

Solution #2 Second Method

OUR INTENT Is this function malicious?

function Translate(objects, offset, size) {
    var length = 4;
    for (var i = 0; i < size; i++) {
        var r = rc.substr(0, length);
        if (offset > 0) {
            r = r.substr(offset) + r.substr(0, offset);
        }
        objects[i] = r.substr(0, r.length);
    }
}

Without understanding the context in which a function is used, it is very difficult to determine whether it is malicious

OUR INTENT What about this script?

<script>

var a = '%25%33%43%69%66%72%61%6d%65 ...';

var b = unescape(unescape(a));

var spray = new Function(unescape(b));

</script>


<script>

var a = '%25%33%43%69%66%72%61%6d%65 ...';



</script>

An “expert’s eye” can probably determine that it looks suspicious.

The two scripts are actually equivalent to each other



Our intent is to enable an attack that uses the first example script without depending on obfuscation like the second example script, and to propose a superior method for detecting both

RETHINKING EVASIONS Designing a new architecture

Use a message-oriented architecture (MOA) to split the attack into disparate, self-contained messages; we refer to these as “units of work”

This is a variation of the “script splitting” technique, except that a message exists within a local scope and is destroyed after it serves its purpose

Does not require DOM manipulation to hide “magic strings”

Avoids the “magic redirect IFRAME” that can be a trigger for some analyzers


Avoiding HTTP

An artifact that can be parsed or scanned for patterns, characteristics, and definitions does not exist

An alternative to loading JavaScript in “clear text”

Load one message at a time, forcing each message to be analyzed independently (remember “units of work”)

Web Sockets are a perfect candidate for both MOA and bypassing HTTP from a web environment (only the initial upgrade handshake uses HTTP)


Avoiding client-side state

Two components are involved, client and server

Client: Listen, Invoke

Server: State, Send

For each accepted connection from a client, the server maintains a state machine

Messages are essentially commands and do not depend on each other (remember “units of work”)

The client evaluates a message, invokes it, and destroys it
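The listen, invoke, destroy cycle described above can be sketched without any network at all. The following is a minimal in-memory simulation, not the authors' implementation: `createServerSession`, `runClient`, and the demo messages are hypothetical names and data chosen for illustration, and every message is deliberately harmless.

```javascript
// Hypothetical in-memory stand-in for the server: one state machine per connection.
function createServerSession(messages) {
  let state = 0; // index of the next message to send
  return {
    // Returns the next message, or null once the session is finished.
    next() {
      return state < messages.length ? messages[state++] : null;
    },
  };
}

// Client side: listen, invoke, destroy. No message is kept after it has run.
function runClient(session, sink) {
  let handled = 0;
  for (let msg = session.next(); msg !== null; msg = session.next()) {
    // Each message is a self-contained unit of work evaluated in its own local scope.
    const unit = new Function("sink", msg);
    unit(sink);
    handled++;
    // `unit` goes out of scope here; nothing accumulates on the client.
  }
  return handled;
}

// Benign demo messages; the last one does nothing at all.
const demoMessages = [
  "sink.push(1 + 1);",
  "sink.push('unit of work');",
  "/* does nothing */",
];
const sink = [];
const handledCount = runClient(createServerSession(demoMessages), sink);
console.log(handledCount, sink); // 3 [ 2, 'unit of work' ]
```

An analyzer that sees only one of these strings at a time has no surrounding script to relate it to, which is the property the slides rely on.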


Limit control flow and function call hierarchy

The only client control flow is that of the client listening for and invoking a message

The order of messages is not guaranteed by the server. The server may send NOP messages as part of an attack to trick certain analyzers

“Monkey patch” functions dynamically evaluated in messages to trick certain analyzers


Getting creative in transport format

Web Sockets behave like simple TCP pipes, so data can be represented on the wire in an application-specific way

No longer restricted to sending JavaScript in clear text

Create a custom binary format

Send the message in binary on the wire

0100100001100101011011000110110001101111001000000100100001100001011011010110001001110101011100100110011100100001

Simply looking at a binary message won't give hints about what its contents are: is it an audio file, an image, even text?

To even begin to understand a binary message, its format specification needs to be known beforehand; otherwise, it is a very challenging problem in its own right
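To make the point about format specifications concrete, the bit string shown on the slide can be read under two different assumptions. The sketch below assumes the bits are 8-bit ASCII character codes (which appears to be the case here) and contrasts that with a wrong guess of 16-bit integers. The helper names are hypothetical.

```javascript
// The bit string from the slide, split into 8-bit groups for readability.
const bits = [
  "01001000", "01100101", "01101100", "01101100", "01101111", "00100000",
  "01001000", "01100001", "01101101", "01100010", "01110101", "01110010",
  "01100111", "00100001",
].join("");

// Interpret a string of 0/1 characters as consecutive 8-bit character codes.
function decodeAsAscii(bitString) {
  let out = "";
  for (let i = 0; i + 8 <= bitString.length; i += 8) {
    out += String.fromCharCode(parseInt(bitString.slice(i, i + 8), 2));
  }
  return out;
}

// The same bits read under a different (wrong) assumption: 16-bit unsigned integers.
function decodeAsUint16(bitString) {
  const out = [];
  for (let i = 0; i + 16 <= bitString.length; i += 16) {
    out.push(parseInt(bitString.slice(i, i + 16), 2));
  }
  return out;
}

console.log(decodeAsAscii(bits));  // "Hello Hamburg!"
console.log(decodeAsUint16(bits)); // seven numbers with no obvious meaning
```

With a custom format, an analyzer does not know which of these readings (or any other) is the intended one.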


Confusing the Context

Remember this function?

function Translate(objects, offset, size) {
    var length = 4;
    for (var i = 0; i < size; i++) {
        var r = rc.substr(0, length);
        if (offset > 0) {
            r = r.substr(offset) + r.substr(0, offset);
        }
        objects[i] = r.substr(0, r.length);
    }
}

Now that we receive this from our binary format, we again ask the question: how do you determine whether it is malicious?

DOMAIN OF THE PROBLEM How can we define a malicious website?


How can we detect a malicious website?



How can we detect obfuscation?




How can we identify obfuscation used for malicious purposes?




How can we categorize what is malicious and what is not?

CURRENT PROBLEM Exploit delivery at some point relies on JavaScript

JavaScript is continuously being obfuscated with greater complexity

Current solutions lag well behind in technology

PROBLEMS WITH CURRENT SOLUTIONS They rely heavily on invocative functions (fromCharCode, eval, unescape, etc.) that are not a concrete basis for calling code malicious and that have plenty of legitimate use cases

DOM and CSS selectors

Client-side proxies for client-server interaction

Client-side template engines

Limited sets of characteristics

The quality of probabilistic decisions is directly proportional to the characteristics extracted
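The weakness of keying on invocative functions can be illustrated with a toy detector. This is a deliberately naive sketch, not a reproduction of any real product: `RISKY_NAMES`, `naiveFlags`, and both sample snippets are made up for the example.

```javascript
// Hypothetical keyword-based detector of the kind criticized above.
const RISKY_NAMES = ["eval", "unescape", "fromCharCode"];

// Returns the risky function names that appear as direct calls in the source text.
function naiveFlags(source) {
  return RISKY_NAMES.filter((name) =>
    new RegExp("\\b" + name + "\\s*\\(").test(source)
  );
}

// A legitimate snippet: decoding a URL-style value before rendering it.
const legitimate = "var title = unescape('Hello%20World'); render(title);";
// A snippet that assembles the name it calls, so the keyword list never matches.
const evasive = "var f = window['ev' + 'al']; f(payload);";

console.log(naiveFlags(legitimate)); // [ 'unescape' ]
console.log(naiveFlags(evasive));    // []
```

The legitimate snippet is flagged while the evasive one passes, which is the mismatch the slide describes.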

TYPES OF APPROACHES Dynamic analysis of embedded JS


Static analysis of extracted JS (Method #1)




DYNAMIC ANALYSIS AdHoc Forwarding

Create a middle layer between the browser and the JS engine

Analyze the control flow graph (CFG) of the scripts being executed

Analyze the call hierarchy and order of functions

Analyze certain combinations of functions used, including known highly risky ones

Browser Automation

Attach to the IE process

Use shdocvw.dll to automate COM callbacks

Capture events as they trigger and manipulate them

Analyze in the same manner as AdHoc Forwarding

Browser In-Memory Injection

Inject JS into the DOM to monitor events

Use a JS debugger (Firebug or other)
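One way to picture the monitoring side of these approaches is to wrap the functions of interest and record the order in which they are called. The sketch below is an assumption-laden illustration: `instrument` is a hypothetical helper, and `env` is a plain object standing in for the script environment, whereas a real monitor would hook the browser's own globals.

```javascript
// Wrap selected functions on a target object and record the order in which they are called.
function instrument(target, names, log) {
  const originals = {};
  for (const name of names) {
    const original = target[name];
    originals[name] = original;
    target[name] = function (...args) {
      log.push({ name, args });          // record the call before forwarding it
      return original.apply(this, args); // behave exactly like the original
    };
  }
  // Returns a function that restores the original implementations.
  return function restore() {
    for (const name of names) target[name] = originals[name];
  };
}

// Stand-in for the script environment (an assumption made for this sketch).
const env = {
  unescape: (s) => decodeURIComponent(s),
  fromCharCode: (...codes) => String.fromCharCode(...codes),
};

const callLog = [];
const restore = instrument(env, ["unescape", "fromCharCode"], callLog);
const decoded = env.unescape("%3Ciframe");
const letter = env.fromCharCode(65);
restore();

console.log(callLog.map((c) => c.name)); // [ 'unescape', 'fromCharCode' ]
```

The recorded sequence is the raw material for the call hierarchy and function-combination analysis listed above.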

STATIC ANALYSIS (METHOD 1) Extract local scripts

Extract remote scripts

Analyze the scripts and categorize them based on certain criteria

Web page encoding

Detecting the language used and extracting features

Check the WHOIS record for the web page

Determine probabilistically to which category it belongs
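The first two steps, extracting local and remote scripts, might look roughly like the following. This is a sketch under simplifying assumptions: `extractScripts` is a hypothetical helper, the sample page and URL are placeholders, and a regular expression stands in for the HTML parser a real extractor would use.

```javascript
// Split the <script> elements of an HTML string into inline ("local") and remote ones.
function extractScripts(html) {
  const local = [];
  const remote = [];
  const scriptTag = /<script\b([^>]*)>([\s\S]*?)<\/script>/gi;
  let match;
  while ((match = scriptTag.exec(html)) !== null) {
    const attrs = match[1];
    const body = match[2].trim();
    const src = /\bsrc\s*=\s*["']([^"']+)["']/i.exec(attrs);
    if (src) remote.push(src[1]);               // remote script: keep its URL
    else if (body.length > 0) local.push(body); // local script: keep its source text
  }
  return { local, remote };
}

const page =
  '<html><script src="https://example.com/lib.js"></script>' +
  "<script>var a = 1;</script></html>";
const extracted = extractScripts(page);
console.log(extracted);
// { local: [ 'var a = 1;' ], remote: [ 'https://example.com/lib.js' ] }
```

The extracted sources are then what the later feature-extraction and categorization steps operate on.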

SHANNON’S ENTROPY Formula

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i)$$

where p(x_i) is the relative frequency of symbol x_i in the file

We use Shannon’s entropy to determine the entropy of the file only as a side effect, not as a main criterion for deciding whether it is malicious
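As a small illustration, Shannon entropy over the characters of a string can be computed as follows. This is a generic sketch of the standard definition (base-2 logarithm, so the result is in bits per character), not the authors' code, and `shannonEntropy` is a name chosen for the example.

```javascript
// Shannon entropy, in bits per character, of a string.
function shannonEntropy(text) {
  if (text.length === 0) return 0;
  // Count how often each character occurs.
  const counts = new Map();
  for (const ch of text) counts.set(ch, (counts.get(ch) || 0) + 1);
  let total = 0;
  for (const n of counts.values()) total += n;
  // Sum -p * log2(p) over the observed characters.
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / total;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

console.log(shannonEntropy("aaaa")); // 0
console.log(shannonEntropy("abab")); // 1
console.log(shannonEntropy("abcd")); // 2
```

Heavily encoded or packed scripts tend to score higher than hand-written code, but legitimate minified code can too, which fits the slide's choice to treat entropy as a secondary signal.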

NAÏVE BAYESIAN A machine-learning technique that can be used to predict to which category a particular data case belongs

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

Given the above formula: an event A is INDEPENDENT of event B if the conditional probability is the same as the marginal probability, that is, P(A | B) = P(A)

LAPLACIAN SMOOTHING To avoid a zero joint probability caused by any partial probability being zero, we use the add-one smoothing technique.

Given an observation x = (x1, …, xd) from a multinomial distribution with N trials and parameter vector θ = (θ1, …, θd), a "smoothed" version of the data gives the estimator

$$\hat{\theta}_i = \frac{x_i + \alpha}{N + \alpha d}, \qquad i = 1, \ldots, d$$

where α > 0 is the smoothing parameter (α = 0 corresponds to no smoothing); add-one smoothing is the case α = 1
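A compact way to see naive Bayes and add-one smoothing working together is a toy token classifier. The sketch below is illustrative only: the function names, the feature tokens, and the four training samples are invented, and the slides do not state which features the authors actually used.

```javascript
// Train a naive Bayes model over token counts; alpha is the smoothing parameter.
function train(samples, alpha = 1) {
  const model = { alpha, vocab: new Set(), classes: {}, docs: samples.length };
  for (const { label, tokens } of samples) {
    if (!model.classes[label]) model.classes[label] = { docs: 0, total: 0, counts: {} };
    const c = model.classes[label];
    c.docs++;
    for (const t of tokens) {
      c.counts[t] = (c.counts[t] || 0) + 1;
      c.total++;
      model.vocab.add(t);
    }
  }
  return model;
}

// Log of prior times the product of smoothed per-token likelihoods.
function logScore(model, label, tokens) {
  const c = model.classes[label];
  const d = model.vocab.size;
  let score = Math.log(c.docs / model.docs);
  for (const t of tokens) {
    const x = c.counts[t] || 0;
    // Smoothed estimate: (x_i + alpha) / (N + alpha * d), never zero for alpha > 0.
    score += Math.log((x + model.alpha) / (c.total + model.alpha * d));
  }
  return score;
}

// Pick the class with the highest score.
function classify(model, tokens) {
  let best = null;
  let bestScore = -Infinity;
  for (const label of Object.keys(model.classes)) {
    const s = logScore(model, label, tokens);
    if (s > bestScore) { best = label; bestScore = s; }
  }
  return best;
}

// Toy, made-up training data.
const samples = [
  { label: "malicious", tokens: ["eval", "unescape", "iframe"] },
  { label: "malicious", tokens: ["unescape", "fromCharCode", "iframe"] },
  { label: "benign", tokens: ["querySelector", "addEventListener"] },
  { label: "benign", tokens: ["querySelector", "template", "eval"] },
];
const model = train(samples);
console.log(classify(model, ["unescape", "iframe"]));        // "malicious"
console.log(classify(model, ["querySelector", "template"])); // "benign"
```

Without smoothing, a token never seen in a class would contribute a probability of zero and wipe out the whole product; with α = 1 it contributes a small but non-zero factor instead.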

STATIC ANALYSIS (METHOD 2) How is JS executed/handled?

1. The code in a script block is scanned for all function declarations. Each declaration is processed by creating a function object and a named reference to it, so that the function can be called from within a statement.

2. The statements are then evaluated and executed in the order in which they appear on the page.

JS EXAMPLE #1

<script>
    DoNothing();
    function DoNothing() {
        return;
    }
</script>

This works

JS EXAMPLE #2

<script>
    DoNothing();
</script>

<script>
    function DoNothing() {
        return;
    }
</script>

This does not work

JS EXAMPLE #3

<script>
    function DoNothing() {
        return;
    }
</script>

<script>
    DoNothing();
</script>

This works

JS EXAMPLE #4

<script>
    // assuming that DoNothing is not defined
    DoNothing();
    alert(1);
</script>

This does not work

JS EXAMPLE #5

<script>
    // assuming that DoNothing is not defined
    DoNothing();
</script>

<script>
    alert(1);
</script>

This works
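The behaviour in the examples above can be approximated outside a browser by treating each string as its own script block that shares one global scope. This is only a simulation under that assumption: `runScriptBlocks` and the `demoA`, `demoB`, and `demoC` function names are hypothetical, and indirect eval is used as a stand-in for the browser's per-block evaluation.

```javascript
// Run each string as if it were its own <script> block sharing one global scope.
// Indirect eval, (0, eval)(src), evaluates the code in the global scope.
function runScriptBlocks(blocks) {
  return blocks.map((src) => {
    try {
      (0, eval)(src);
      return "ok";
    } catch (err) {
      return err.name; // e.g. "ReferenceError"
    }
  });
}

// Same block: the declaration is processed first, so the call succeeds.
console.log(runScriptBlocks(["demoA(); function demoA() { return; }"]));
// [ 'ok' ]

// Declaration in a later block: the call fails, but the next block still runs.
console.log(runScriptBlocks(["demoB();", "function demoB() { return; }"]));
// [ 'ReferenceError', 'ok' ]

// Declaration in an earlier block: the call succeeds.
console.log(runScriptBlocks(["function demoC() { return; }", "demoC();"]));
// [ 'ok', 'ok' ]
```

A failure inside one block does not prevent later blocks from running, which is why splitting code across blocks changes what an analyzer observes.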

STATIC ANALYSIS (METHOD 2) Semantic analysis to focus on “what does this mean”

Optimizer-compiler for JS that focuses on structure rather than on extracted invocative functions

OPTIMIZER-COMPILER The following describes the architecture of any ordinary compiler and of the current compiler as well

Lexer -> Tokens -> Parser -> AST -> Translator -> IR -> Optimizer

OPTIMIZER-COMPILER At this phase, after the AST has been generated and converted into an IR, the optimizer tries to optimize the JS input using established optimization techniques

Optimizer

Hidden Classes

Type Inference

Inline Caches

Function Synthesis

Inline Expansion

Loop Invariant Code Motion

Constant Folding

Copy Propagation

Common Sub-Expression Elimination

Dead Code Elimination
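Of the passes listed, constant folding is perhaps the easiest to picture in the context of obfuscated code: string fragments that are concatenated at run time collapse into the literal they were hiding. The sketch below works on a tiny, made-up expression tree; the node shapes and the helpers `fold`, `lit`, `id`, and `bin` are assumptions for illustration and do not correspond to any real engine's IR.

```javascript
// Minimal expression tree (a hypothetical shape, not any real engine's IR):
//   { type: "lit", value }            literal
//   { type: "id", name }              identifier
//   { type: "bin", op, left, right }  binary expression ("+" or "*")
function fold(node) {
  if (node.type !== "bin") return node;
  const left = fold(node.left);
  const right = fold(node.right);
  if (left.type === "lit" && right.type === "lit") {
    // Both operands are known at analysis time, so compute the result now.
    if (node.op === "+") return { type: "lit", value: left.value + right.value };
    if (node.op === "*") return { type: "lit", value: left.value * right.value };
  }
  return { type: "bin", op: node.op, left, right };
}

// Small constructors to keep the examples readable.
const lit = (value) => ({ type: "lit", value });
const id = (name) => ({ type: "id", name });
const bin = (op, left, right) => ({ type: "bin", op, left, right });

// 'une' + 'sc' + 'ape' folds to a single literal.
const foldedName = fold(bin("+", bin("+", lit("une"), lit("sc")), lit("ape")));
console.log(foldedName); // { type: 'lit', value: 'unescape' }

// x + (2 * 3) folds only on the right-hand side, because x is unknown.
const foldedSum = fold(bin("+", id("x"), bin("*", lit(2), lit(3))));
console.log(foldedSum.right); // { type: 'lit', value: 6 }
```

After folding, the structure of the program exposes the name being assembled without the analyzer having to execute anything.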
