Zarathustra: Extracting WebInject Signatures from Banking Trojans


Fabio Bosatelli, Claudio Criscione, Stefano Zanero, Federico Maggi

Politecnico di Milano, Italy

PST 2014

Why do we care?

Effective: direct profit (i.e., wire transfer) to the operator
Not well detected: 27.94–39.79% detection rate
Widespread: on average 200,000 infected PCs (April 2014) and millions of dollars in revenue for the cybercriminals


Banking Trojans Based on WebInject

Goals:
  information stealing (i.e., credentials)
  automatic banking transactions

ZeuS (2007), SpyEye (2011) and Carberp (2012) are the most notorious.

Functionalities:
  network traffic tapping and modification
  file harvesting and stealing
  use of the victim as a proxy
  remote control



Core: WebInject


WebInject: Peculiarities

The URL is unchanged (including https://)
The padlock icon is still there (happy user)
Modifications are very subtle, surgical and legitimate-looking
Changes happen only on the client side


WebInject Internals: Wininet.dll API hooking

[Figure: on the infected client, the browser receives the original page from the server over HTTPS through the network APIs (Wininet.dll). The trojan hooks these APIs in user space, so WebInject adds extra <input /> elements to the <html> ... </html> page before the browser renders it. The bytes shown at the hooking point are: 90 [NOP] (five times), 8bff [MOV EDI, EDI] (function entry), 55 [PUSH EBP], 8bec [MOV EBP, ESP].]
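The prologue bytes shown in the figure can be used as a reference when checking whether an API entry point has been patched. The following is only a minimal sketch: the function name `classify_prologue`, the labels it returns, and the idea of flagging a jump opcode at the function entry are illustrative assumptions, not the method used by Zarathustra (which explicitly avoids memory forensics).

```python
# Clean bytes as shown on the slide: five NOPs, then MOV EDI,EDI / PUSH EBP / MOV EBP,ESP.
CLEAN_PROLOGUE = bytes.fromhex("9090909090" "8bff" "55" "8bec")
ENTRY_OFFSET = 5  # the function entry starts after the five padding bytes

# Assumption: an inline hook typically replaces the entry with a jump (E9 = near, EB = short).
JUMP_OPCODES = {0xE9, 0xEB}


def classify_prologue(code: bytes) -> str:
    """Return 'clean', 'jump' or 'modified' for the bytes around a function entry."""
    if len(code) < len(CLEAN_PROLOGUE):
        raise ValueError("need at least %d bytes" % len(CLEAN_PROLOGUE))
    window = code[:len(CLEAN_PROLOGUE)]
    if window == CLEAN_PROLOGUE:
        return "clean"
    if window[ENTRY_OFFSET] in JUMP_OPCODES:
        return "jump"      # control flow is redirected right at the entry
    return "modified"      # bytes differ, but not in the pattern assumed above
```

For instance, a buffer whose entry was overwritten with `eb f9` (a short jump back into the padding) would be reported as `jump`, while the untouched bytes from the slide are reported as `clean`.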

Example configuration file

set_url https://extranet.banesto.es/npage/OtrosLogin/LoginIBanesto.htm GP
data_before
name=usuario*</td>
data_end
data_inject
</tr><tr></TR> <TR> <TD align=left><FONT size=+0><B>Clave de Firma:</B></FONT></TD> <TD align=left colSpan=3><INPUT type=password maxLength=8 align=center size=8 value="" name=ESpass></TD>
data_end

Note: configuration files embody the real value of a WebInject-based trojan. They are sold on underground forums.
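To make the structure of such a configuration file concrete, here is a minimal Python sketch that parses the three directives visible in the example (`set_url`, `data_before`, `data_inject`, each block closed by `data_end`) and shows where the injected markup would land in a page. The names `InjectRule`, `parse_webinjects` and `apply_rule` are hypothetical, the sample below abbreviates the injected HTML, and treating `*` in `data_before` as a wildcard is an assumption.

```python
import re
from dataclasses import dataclass

SAMPLE = """
set_url https://extranet.banesto.es/npage/OtrosLogin/LoginIBanesto.htm GP
data_before
name=usuario*</td>
data_end
data_inject
</tr><tr><TD><INPUT type=password name=ESpass></TD>
data_end
"""


@dataclass
class InjectRule:
    url: str
    flags: str
    data_before: str = ""
    data_inject: str = ""


def parse_webinjects(text):
    """Parse set_url / data_before / data_inject blocks into InjectRule objects."""
    rules, section, buffer = [], None, []
    for line in text.splitlines():
        keyword = line.strip()
        if section is None and keyword.startswith("set_url "):
            fields = keyword.split()
            rules.append(InjectRule(fields[1], fields[2] if len(fields) > 2 else ""))
        elif section is None and keyword in ("data_before", "data_inject"):
            section, buffer = keyword, []
        elif section is not None and keyword == "data_end":
            if rules:
                setattr(rules[-1], section, "\n".join(buffer))
            section = None
        elif section is not None:
            buffer.append(line)
    return rules


def apply_rule(rule, html):
    """Insert data_inject right after the first match of data_before."""
    # Assumption: '*' in data_before matches any run of characters.
    pattern = ".*?".join(re.escape(part) for part in rule.data_before.split("*"))
    match = re.search(pattern, html, flags=re.DOTALL)
    if match is None:
        return html  # page not targeted by this rule
    return html[:match.end()] + rule.data_inject + html[match.end():]
```

Calling `apply_rule(parse_webinjects(SAMPLE)[0], page)` on a page that contains the `usuario` field returns the page with the extra password field spliced in; any other page is returned unchanged.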


Why detection is challenging

The configuration file is encrypted
Extraction can be automated, but a slight change requires manual analysis
Different families and variants:
  different browsers
  different API-hooking methods
  different operating systems

Bottom line: it is hard to devise future-proof, implementation-agnostic analysis techniques.


Zarathustra


Goals (given a URL and a binary sample):
  automatically tell whether the website is targeted by that sample
  automatically extract the injected code
  (application) generate detection signatures

Requirements:
  no reverse engineering required
  no memory forensics (future-proof)
  browser and OS independent (future-proof)
  scalable to millions of URLs and thousands of samples




Key intuition

The same URL, https://bank.example.com/loginpage, is loaded on a clean machine and on an infected machine.

[Figure: the clean page is a plain <html> ... </html> document; the infected page is the same document with additional <input /> elements.]

Observation: these differences are an unavoidable consequence of an infection.

Web page diffing

Generally a hard problem, but it can be solved in specific cases.

We want to catch:

Inserted DOM nodes: <td>foo</td> → <td><input />foo</td>

Inserted node attributes: <a href="..."> → <a href="..." onclick="inject();">

Modified attribute values: onclick="load();" → onclick="load(); inject();"

Modified text nodes: <script>foo();</script> → <script>foo(); inject();</script>
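The four kinds of differences listed above can be illustrated with a small Python sketch built on the standard-library `html.parser`. It is not the Zarathustra implementation: the class `DomCollector`, the function `diff_dom`, the finding labels and the choice of keying nodes by positional XPath are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Elements that never have children, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DomCollector(HTMLParser):
    """Records attributes and text of every element, keyed by positional XPath."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = []        # open elements as (xpath, per-tag child counters)
        self.root_counts = {}  # counters for top-level elements
        self.nodes = {}        # xpath -> {attribute: value}
        self.texts = {}        # xpath -> concatenated non-blank text

    def _register(self, tag, attrs):
        parent, counts = self.stack[-1] if self.stack else ("", self.root_counts)
        counts[tag] = counts.get(tag, 0) + 1
        xpath = f"{parent}/{tag}[{counts[tag]}]"
        self.nodes[xpath] = dict(attrs)
        return xpath

    def handle_starttag(self, tag, attrs):
        xpath = self._register(tag, attrs)
        if tag not in VOID_TAGS:
            self.stack.append((xpath, {}))

    def handle_startendtag(self, tag, attrs):
        self._register(tag, attrs)  # self-closing element, e.g. <input />

    def handle_endtag(self, tag):
        # Close the innermost open element with this name (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0].rsplit("/", 1)[1].split("[")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip() and self.stack:
            xpath = self.stack[-1][0]
            self.texts[xpath] = self.texts.get(xpath, "") + data.strip()


def collect(html):
    collector = DomCollector()
    collector.feed(html)
    collector.close()
    return collector


def diff_dom(clean_html, infected_html):
    """Return (kind, xpath) findings for content present only in the infected page."""
    clean, infected = collect(clean_html), collect(infected_html)
    findings = []
    for xpath, attrs in infected.nodes.items():
        if xpath not in clean.nodes:
            findings.append(("inserted_node", xpath))
            continue
        for name, value in attrs.items():
            if name not in clean.nodes[xpath]:
                findings.append(("inserted_attribute", f"{xpath}/@{name}"))
            elif clean.nodes[xpath][name] != value:
                findings.append(("modified_attribute", f"{xpath}/@{name}"))
    for xpath, text in infected.texts.items():
        if xpath in clean.texts and clean.texts[xpath] != text:
            findings.append(("modified_text", xpath))
    return findings
```

Counting siblings per tag name keeps the paths of existing nodes stable when an unrelated element is inserted next to them, which is why the sketch reports only the new `<input />` in the first example rather than flagging the surrounding cell as well.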







Benign vs. Malicious differences

Challenge: not all differences are malicious! How do we tell benign and malicious differences apart?

whitelist attributes that are not bound to events: style="...", width="...", value="...", ...

whitelist dynamically inserted nodes
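The attribute whitelist can be sketched as a simple filter over reported differences. This is a minimal illustration, assuming each difference is a `(kind, attribute_name)` pair; the function names are hypothetical, and the whitelist contains only the attributes named on the slides (`style`, `width`, `alt`, `value`).

```python
# Attributes named on the slides as not bound to events.
WHITELISTED_ATTRIBUTES = {"style", "width", "alt", "value"}


def is_benign_attribute(name):
    """True if a change to this attribute should be ignored."""
    name = name.lower()
    if name.startswith("on"):
        # Event handlers such as onclick can run code, so they are never whitelisted.
        return False
    return name in WHITELISTED_ATTRIBUTES


def filter_attribute_differences(differences):
    """Keep only the (kind, attribute_name) pairs that are not whitelisted."""
    return [d for d in differences if not is_benign_attribute(d[1])]
```

With this filter, a changed `style` or `width` is dropped, while an inserted `onclick` is kept for further analysis.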





Dynamically-inserted nodes

<script> <-- first injection
  var inject = document.createElement("input");
  ...
  var hook = document.getElementById("login_form");
  hook.insertBefore(inject, hook.childNodes[0]);
</script>
...
<form ...>
  <input name="inject" /> <-- dynamic injection
  <input name="username" />
  <input name="password" />
</form>
...




Static injections
  (+) the trojan unavoidably injects at least one static node
  (-) they can be very coarse-grained

Dynamic injections
  (+) dynamic injections are more fine-grained
  (-) a very common difference in rich pages (advertising iframes, different versions of JS libraries, ...)
  → main source of false positives

Approximate solution: we disable the JavaScript interpreter to consider only static injections.



To tell benign and malicious differences apart, we therefore:

whitelist attributes that are not bound to events: style="...", width="...", alt="...", value="...", ...

whitelist dynamically inserted nodes: disable the JavaScript interpreter and consider only static injections

cache server responses

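[Figure: system overview in three phases: DOM collection, DOM comparison, fingerprint generation. From the VM images, clean VMs 1 to n and one infected VM (running the trojan sample) load the target URL (http://www.banking.site). The clean VMs yield DOM1, DOM2, ..., DOMn; the infected VM yields the DOM with the injection. Heuristics separate the benign DOM variants from the malicious DOM differences.]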
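One plausible reading of the comparison step in the figure is that any difference also observed among clean machines is a benign variant, and only what remains is attributed to the trojan. The sketch below illustrates that idea under strong simplifying assumptions: a DOM is modeled as a dictionary from XPath to a node description, and the names `dom_differences` and `malicious_differences` are hypothetical. The actual heuristics used by Zarathustra are not detailed on the slides.

```python
def dom_differences(reference, other):
    """Nodes of `other` that are missing from, or differ from, `reference`."""
    return {(xpath, node) for xpath, node in other.items()
            if reference.get(xpath) != node}


def malicious_differences(reference_dom, clean_doms, infected_dom):
    """Differences seen on the infected VM but on none of the clean VMs."""
    benign = set()
    for dom in clean_doms:
        # Anything that varies between clean machines is a benign DOM variant.
        benign |= dom_differences(reference_dom, dom)
    return dom_differences(reference_dom, infected_dom) - benign


# Toy data: an advertising iframe varies across clean VMs, the extra input does not.
dom1 = {"/html[1]/body[1]/form[1]/input[1]": "input name=username"}
dom2 = dict(dom1, **{"/html[1]/body[1]/iframe[1]": "iframe ad"})
dom_infected = dict(dom2, **{"/html[1]/body[1]/form[1]/input[2]": "input name=inject"})

suspicious = malicious_differences(dom1, [dom2], dom_infected)
```

In this toy run only the injected `input[2]` survives the filtering, because the iframe difference also shows up on a clean machine.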



Experimental results

Dataset

56 distinct ZeuS samples, manually confirmed to be active
213 URLs extracted from real-world configuration files
35 clean machines + 1 infected machine

Correct signatures, with whitelisting

Technique   Avg. correct (± var.)   Covered URLs
3, 1        39.58 ± 11.53%          52.17%
2, 1        74.98 ± 15.42%          23.48%
2, 3        97.97 ± 0.069%          22.61%
All         100.0%                  23.48%

Approximation techniques:
  1. whitelist attributes that are not bound to events
  2. whitelist dynamically inserted nodes (no JavaScript)
  3. cache server responses

Limitations

Injections that overlap perfectly with existing nodes
  hard to produce without disrupting the page
  we have not seen any so far

Dynamic injections via existing JavaScript
  unreliable from the attacker's viewpoint
  easy to catch (but not to model) with simple fuzzy hashing

As of now, we detect whether the code is modified, but not how.

Conclusions


Simple yet very effective technique
We implemented it in a web application (just catch me after the talk)
Implementation, language, platform and family agnostic
Future-proof

http://maggi.cc
@phretor (https://twitter.com/phretor)


Extra Slides

Signature example

{"test_node_parent": "form",
 "test_node_value": "input",
 "test_node_xpath": "/html[1]/body[1]/center[3]/table[1]/tbody[1]/tr[1]/form[1]/input[13]"}
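A signature of this shape could be checked against a page roughly as follows. This is a hedged sketch, not the matching logic of the authors' web application: the function `signature_matches` is hypothetical, it assumes the page is well-formed XHTML so that the standard-library `xml.etree.ElementTree` can parse it, and it assumes the XPath has at least two steps.

```python
import json
import xml.etree.ElementTree as ET


def signature_matches(signature_json, xhtml):
    """True if the node named by the signature exists with the expected tag and parent."""
    sig = json.loads(signature_json)
    steps = sig["test_node_xpath"].strip("/").split("/")
    root = ET.fromstring(xhtml)
    # The first step names the root element itself, e.g. html[1].
    if steps[0].split("[")[0] != root.tag:
        return False
    # ElementTree paths are relative to the root, so drop the first step.
    parent_path = "./" + "/".join(steps[1:-1]) if len(steps) > 2 else "."
    parent = root.find(parent_path)
    node = root.find("./" + "/".join(steps[1:]))
    if parent is None or node is None:
        return False
    return parent.tag == sig["test_node_parent"] and node.tag == sig["test_node_value"]


SIGNATURE = json.dumps({
    "test_node_parent": "form",
    "test_node_value": "input",
    "test_node_xpath": "/html[1]/body[1]/form[1]/input[3]",
})
CLEAN = ('<html><body><form><input name="username"/>'
         '<input name="password"/></form></body></html>')
INFECTED = CLEAN.replace("</form>", '<input name="inject"/></form>')
```

Here `signature_matches(SIGNATURE, INFECTED)` holds because a third `input` exists under the form, while the clean page has only two. Real banking pages are rarely valid XML, so a tolerant HTML parser would be needed in practice.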

False positives

[Figure: false positive rate, % FPR(n), as a function of n = number of distinct virtual machines (0 to 35), plotted for the clean machine and for the infected machine.]

Processing time per 213 URLs and 76 samples

[Figure: processing time, Time(n) in seconds (axis from 40,000 to 110,000), as a function of n = number of virtual machines running in parallel (4 to 10). Data labels on the plot: 5.702, 4.164, 3.277, 2.836.]
