Zarathustra: Extracting WebInject Signatures from Banking Trojans
Fabio Bosatelli, Claudio Criscione, Stefano Zanero, Federico Maggi
Politecnico di Milano, Italy
PST 2014
Why do we care?
Effective: direct profit (e.g., wire transfers) for the operator
Not well detected: 27.94–39.79% detection rate
Widespread: an average of 200,000 infected PCs (April 2014), and millions of dollars in revenue for the cybercriminals
Banking Trojans Based on WebInject
Goals:
information stealing (e.g., credentials)
automatic banking transactions
ZeuS (2007), SpyEye (2011) and Carberp (2012) are the most notorious
Functionalities:
network traffic tapping and modification
file harvesting and stealing
using the victim as a proxy
remote control
WebInject: Peculiarities
URL is unchanged (including https://)
lock icon is still there (happy user)
very subtle, surgical and legitimate-looking modifications
changes happen only on the client side
WebInject Internals: Wininet.dll API hooking
[Figure: the SERVER sends the original page over HTTPS to the INFECTED CLIENT. In user space, the WebInject component hooks the network APIs used by the browser and inserts extra <input /> elements into the HTML before the browser renders it.]
Prologue of a hooked API function:
90 [NOP] (five times)
8bff [MOV EDI, EDI] (FUNCTION ENTRY)
55 [PUSH EBP]
8bec [MOV EBP, ESP]
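The prologue above is the hot-patchable entry of many Windows API functions: five bytes of padding, then a two-byte MOV EDI, EDI. A common way to hook such a function is to write a near JMP into the padding and overwrite MOV EDI, EDI with a short jump back to it. The Python sketch below only recognises that pattern in a byte buffer and decodes the hook address. The buffers, the addresses and the name hook_target are hypothetical; this is not necessarily how a given trojan hooks Wininet.dll, and Zarathustra itself deliberately avoids this kind of memory inspection.

```python
import struct
from typing import Optional

PAD_NOPS = b"\x90" * 5          # five NOPs before the function entry
MOV_EDI_EDI = b"\x8b\xff"       # two-byte no-op at the function entry
JMP_SHORT_BACK_7 = b"\xeb\xf9"  # JMP rel8 = -7: from the entry back into the padding
JMP_NEAR = 0xE9                 # opcode of JMP rel32


def hook_target(code: bytes, entry_offset: int, entry_addr: int) -> Optional[int]:
    """Inspect a hot-patchable 32-bit x86 prologue.

    Returns None if the prologue is pristine, the absolute address the hook
    jumps to if it has been hot-patched, and raises ValueError otherwise.
    """
    if entry_offset < 5 or len(code) < entry_offset + 2:
        raise ValueError("buffer does not contain padding and entry point")
    pad = code[entry_offset - 5:entry_offset]
    entry = code[entry_offset:entry_offset + 2]
    if pad == PAD_NOPS and entry == MOV_EDI_EDI:
        return None
    if entry == JMP_SHORT_BACK_7 and pad[0] == JMP_NEAR:
        # rel32 is relative to the end of the 5-byte JMP, i.e. to the entry point.
        (rel32,) = struct.unpack("<i", pad[1:5])
        return (entry_addr + rel32) & 0xFFFFFFFF
    raise ValueError("unrecognised prologue")


clean = bytes.fromhex("9090909090" "8bff" "55" "8bec")
hooked = bytes.fromhex("e900008093" "ebf9" "55" "8bec")
print(hook_target(clean, 5, 0x7C801000))        # None
print(hex(hook_target(hooked, 5, 0x7C801000)))  # 0x10001000
```

The short jump is needed because MOV EDI, EDI is only two bytes, too small for a JMP rel32; the five padding bytes are exactly large enough for one.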
Example configuration file
set_url https://extranet.banesto.es/npage/OtrosLogin/LoginIBanesto.htm GP
data_before
name=usuario*</td>
data_end
data_inject
</tr><tr></TR> <TR> <TD align=left><FONT size=+0><B>Clave de Firma:</B></FONT></TD> <TD align=left colSpan=3><INPUT type=password maxLength=8 align=center size=8 value="" name=ESpass></TD>
data_end
Note: configuration files embody the real value of a WebInject-based trojan. They are sold on underground forums.
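To make the format concrete, the Python sketch below parses set_url / data_before / data_inject / data_end blocks like the one above and applies an injection to a page. The grammar is inferred from this single example: real configurations have more directives (e.g. data_after), which the sketch ignores, and treating `*` in data_before as a non-greedy wildcard with the payload appended right after the match is an assumption. Inject, parse_config and apply_inject are hypothetical names; the injected row in the usage example is made up.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Inject:
    url: str
    flags: str          # e.g. "GP"; kept verbatim, not interpreted
    before: str = ""    # body of data_before ... data_end
    inject: str = ""    # body of data_inject ... data_end


def parse_config(text: str) -> List[Inject]:
    """Parse set_url blocks; only data_before and data_inject are modelled."""
    injects: List[Inject] = []
    section, buf = None, []
    for line in text.splitlines():
        tok = line.strip()
        if tok.startswith("set_url"):
            _, url, *flags = tok.split()
            injects.append(Inject(url, "".join(flags)))
        elif tok in ("data_before", "data_inject"):
            section, buf = tok[len("data_"):], []
        elif tok == "data_end":
            if section is not None and injects:
                setattr(injects[-1], section, "\n".join(buf))
            section = None
        elif section is not None:
            buf.append(line)
    return injects


def apply_inject(inj: Inject, html: str) -> str:
    """Insert inj.inject right after the first match of inj.before."""
    pattern = ".*?".join(re.escape(part) for part in inj.before.split("*"))
    match = re.search(pattern, html, re.DOTALL)
    if match is None:
        return html  # anchor not found: page left untouched
    return html[:match.end()] + inj.inject + html[match.end():]


config = "\n".join([
    "set_url https://extranet.banesto.es/npage/OtrosLogin/LoginIBanesto.htm GP",
    "data_before",
    "name=usuario*</td>",
    "data_end",
    "data_inject",
    "<tr><td>EXTRA FIELD</td></tr>",
    "data_end",
])
(inj,) = parse_config(config)
page = "<td><input name=usuario size=8></td></tr>"
print(apply_inject(inj, page))
```

The page URL and the surrounding markup never change; only the bytes after the anchor do, which is why the result looks legitimate to the user.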
Why detection is challenging
The configuration file is encrypted
Extraction can be automated, but a slight change implies manual analysis
Different families and variants:
different browsers
different API-hooking methods
different operating systems
Bottom line: it is hard to devise future-proof, implementation-agnostic analysis techniques.
Zarathustra
Goals (given a URL and a binary sample):
automatically tell whether the website is targeted by that sample
automatically extract the injected code
(application) generate detection signatures
Requirements:
no reverse engineering required
no memory forensics (future proof)
browser and OS independent (future proof)
scalable to millions of URLs and thousands of samples
Key intuition
[Figure: the same page, https://bank.example.com/loginpage, is loaded on a clean machine and on an infected machine. The clean page is plain <html> ... </html>; the infected page contains an extra <input /> element.]
Observation: these differences are an unavoidable consequence of an infection.
Web page diffing
A hard problem in general, but it can be solved in specific cases.
We want to catch:
inserted DOM nodes: <td>foo</td> → <td><input />foo</td>
inserted node attributes: <a href="..."> → <a href="..." onclick="inject();">
modified attribute values: onclick="load();" → onclick="load(); inject();"
modified text nodes: <script>foo();</script> → <script>foo(); inject();</script>
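As a rough illustration of these four kinds of differences, the Python sketch below parses two versions of a page with the standard-library html.parser (no JavaScript is executed), aligns children by tag name with difflib, and reports inserted nodes, inserted attributes, modified attribute values and modified text nodes. The tree representation, the alignment heuristic and the names Node, parse and diff are assumptions for illustration, not Zarathustra's actual diffing algorithm; removed nodes are deliberately not reported.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Elements that never have an end tag, so they never become a parent.
VOID = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


@dataclass
class Node:
    tag: str                                   # element name, or "#text"
    attrs: Dict[str, Optional[str]] = field(default_factory=dict)
    text: str = ""
    children: List["Node"] = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Build a static tree from HTML; script bodies are kept as text, never run."""

    def __init__(self) -> None:
        super().__init__()
        self.tree_root = Node("#document")
        self.open_nodes = [self.tree_root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.open_nodes[-1].children.append(node)
        if tag not in VOID:
            self.open_nodes.append(node)

    def handle_startendtag(self, tag, attrs):  # <input />, <br />, ...
        self.open_nodes[-1].children.append(Node(tag, dict(attrs)))

    def handle_endtag(self, tag):
        # close the innermost open element with this tag; ignore stray end tags
        for i in range(len(self.open_nodes) - 1, 0, -1):
            if self.open_nodes[i].tag == tag:
                del self.open_nodes[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.open_nodes[-1].children.append(Node("#text", text=data.strip()))


def parse(html: str) -> Node:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.tree_root


def diff(clean: Node, infected: Node, path: str = "") -> List[Tuple[str, str, str]]:
    """Return (kind, path, detail) for each difference found in `infected`."""
    out: List[Tuple[str, str, str]] = []
    for name, value in infected.attrs.items():
        if name not in clean.attrs:
            out.append(("inserted_attribute", path, f"{name}={value!r}"))
        elif clean.attrs[name] != value:
            out.append(("modified_attribute", path, f"{name}={value!r}"))
    if infected.tag == "#text" and clean.text != infected.text:
        out.append(("modified_text", path, infected.text))
    # Align children by tag name; infected children without a counterpart
    # are insertions.
    matcher = SequenceMatcher(None, [c.tag for c in clean.children],
                              [c.tag for c in infected.children], autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            for c, d in zip(clean.children[i1:i2], infected.children[j1:j2]):
                out += diff(c, d, f"{path}/{d.tag}")
        elif op in ("insert", "replace"):
            for d in infected.children[j1:j2]:
                out.append(("inserted_node", f"{path}/{d.tag}", d.text or d.tag))
    return out


clean = parse('<table><tr><td>foo</td></tr></table><a href="/x">go</a>'
              '<button onclick="load();">ok</button><script>foo();</script>')
infected = parse('<table><tr><td><input />foo</td></tr></table>'
                 '<a href="/x" onclick="inject();">go</a>'
                 '<button onclick="load(); inject();">ok</button>'
                 '<script>foo(); inject();</script>')
for difference in diff(clean, infected):
    print(difference)
```

On this input the sketch reports exactly one difference of each kind, in document order.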
Benign vs. Malicious differences
Challenge: not all differences are malicious! How can we tell benign and malicious differences apart?
whitelist attributes that are not bound to events: style="...", width="...", value="...", ...
whitelist dynamically inserted nodes
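The first heuristic can be sketched as a filter over a list of differences. The Python code below assumes each difference records its kind, a node path and, for attribute changes, the attribute name. It treats any on* attribute (onclick, onload, ...) as bound to an event and whitelists every other attribute; that test for "bound to events" is an assumption. Difference, bound_to_event and suspicious are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Difference(NamedTuple):
    kind: str                  # e.g. "inserted_node", "modified_attribute"
    path: str                  # where the difference is in the DOM
    attribute: Optional[str]   # attribute name for attribute changes, else None


def bound_to_event(attribute: str) -> bool:
    # Assumption: "bound to an event" means an on* handler attribute.
    return attribute.lower().startswith("on")


def suspicious(diffs: List[Difference]) -> List[Difference]:
    """Drop attribute differences on whitelisted (non-event) attributes."""
    return [d for d in diffs if d.attribute is None or bound_to_event(d.attribute)]


diffs = [
    Difference("modified_attribute", "/html/body/div", "style"),
    Difference("modified_attribute", "/html/body/form/input", "value"),
    Difference("inserted_attribute", "/html/body/a", "onclick"),
    Difference("inserted_node", "/html/body/form/input", None),
]
for d in suspicious(diffs):
    print(d.kind, d.path)
```

Node insertions always pass the filter; only attribute changes that cannot run code are discarded.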
Dynamically-inserted nodes
<script>  <-- first injection
var inject = document.createElement("input");
...
var hook = document.getElementById("login_form");
hook.insertBefore(inject, hook.childNodes[0]);
</script>
...
<form ...>
<input name="inject" />  <-- dynamic injection
<input name="username" />
<input name="password" />
</form>
...
Dynamically-inserted nodes
Static injections
(+) the trojan unavoidably injects at least one static node
(−) they can be very coarse grained
Dynamic injections
(+) dynamic injections are more fine grained
(−) they are a very common difference in rich pages: advertising iframes, different versions of JS libraries, ...
→ main source of false positives
Approximate solution: we disable the JavaScript interpreter to consider only static injections.
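With JavaScript disabled, a page is compared as the parser sees it, before any script runs. The Python sketch below approximates that with html.parser on the example above: the infected page's static markup differs from the clean one only by the injected <script> (the static node the trojan cannot avoid), while the input the script would create at run time never appears. StaticTags and static_tags are hypothetical names, the login form is made up, and counting (tag, name) pairs is a deliberately crude stand-in for a real DOM comparison.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import List, Tuple


class StaticTags(HTMLParser):
    """Collect (tag, name attribute) for every start tag; no script is executed."""

    def __init__(self) -> None:
        super().__init__()
        self.seen: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, dict(attrs).get("name") or ""))


def static_tags(html: str) -> Counter:
    parser = StaticTags()
    parser.feed(html)
    parser.close()
    return Counter(parser.seen)


clean = ('<form id="login_form"><input name="username" />'
         '<input name="password" /></form>')
first_injection = ('<script>var inject = document.createElement("input");'
                   'var hook = document.getElementById("login_form");'
                   'hook.insertBefore(inject, hook.childNodes[0]);</script>')
infected = first_injection + clean

print(static_tags(infected) - static_tags(clean))    # Counter({('script', ''): 1})
print(("input", "inject") in static_tags(infected))  # False
```

The trade-off is the one on the slide: the noisy dynamic differences vanish, but so does the fine-grained view of what the script eventually inserts.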
Benign vs. Malicious differences
whitelist attributes that are not bound to events: style="...", width="...", alt="...", value="...", ...
whitelist dynamically inserted nodes
cache server responses
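[Figure: Zarathustra pipeline in three stages: DOM collection, DOM comparison, fingerprint generation. The banking site (http://www.banking.site) is loaded in n clean VMs (Clean VM 1, Clean VM 2, ..., Clean VM n) and in one infected VM running the trojan sample, all created from the same VM images. This yields DOM1, DOM2, ..., DOMn and DOMinjection. Heuristics separate the benign DOM VARIANTS from the MALICIOUS DOM differences, from which fingerprints are generated.]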
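The comparison stage can be approximated as follows: whatever appears in at least one clean VM's DOM is treated as a legitimate variant, and whatever the infected VM shows beyond that is a candidate injection. The Python sketch below encodes each element as its tag path plus sorted attributes and uses Counter union and difference. This feature encoding, the union rule and the names Features, features and malicious are assumptions; the heuristics in the actual pipeline may differ, and the example pages are made up.

```python
from collections import Counter
from html.parser import HTMLParser
from typing import Iterable, List

VOID = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class Features(HTMLParser):
    """Describe each element by its tag path plus its sorted attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.open_tags: List[str] = []
        self.found: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        attributes = tuple(sorted((k, v or "") for k, v in attrs))
        self.found[("/".join(self.open_tags + [tag]), attributes)] += 1
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # close the innermost open element with this tag
            last = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[last:]


def features(html: str) -> Counter:
    parser = Features()
    parser.feed(html)
    parser.close()
    return parser.found


def malicious(infected_html: str, clean_htmls: Iterable[str]) -> Counter:
    """Elements of the infected DOM that no clean VM produced."""
    legitimate: Counter = Counter()
    for html in clean_htmls:
        legitimate |= features(html)  # union: per-feature max over clean VMs
    return features(infected_html) - legitimate


clean_vms = [
    '<form><input name="user" /><input name="pin" /></form><div class="ad-1"></div>',
    '<form><input name="user" /><input name="pin" /></form><div class="ad-2"></div>',
]
infected_vm = ('<form><input name="user" /><input name="pin" />'
               '<input name="ESpass" type="password" /></form><div class="ad-2"></div>')
print(malicious(infected_vm, clean_vms))
```

Only the ESpass input is reported: the ad-1/ad-2 variation between the clean VMs is absorbed as a DOM variant, which is why more clean VMs should mean fewer false positives.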
Dataset
56 distinct ZeuS samples, manually confirmed to be active
213 URLs extracted from real-world configuration files
35 clean machines + 1 infected machine
Correct signatures, with whitelisting
Technique   Avg. correct (± var.)   Covered URLs
3,1         39.58 ± 11.53%          52.17%
2,1         74.98 ± 15.42%          23.48%
2,3         97.97 ± 0.069%          22.61%
All         100.0%                  23.48%
Approximation techniques:
1 whitelist attributes that are not bound to events
2 whitelist dynamically inserted nodes (no JavaScript)
3 cache server responses
Limitations
injections that overlap perfectly with existing nodes: hard to produce without disrupting the page; we have not seen any so far
dynamic injections via existing JavaScript: unreliable from the attacker's viewpoint; easy to catch (but not to model) with simple fuzzy hashing
as of now, we detect whether the code is modified, but not how
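The fuzzy-hashing idea can be illustrated with the standard library alone. Below, difflib's similarity ratio stands in for a real fuzzy hash such as ssdeep (a swapped-in technique, chosen only so the sketch has no dependencies): a script whose infected copy is not nearly identical to any clean copy is flagged. The 0.99 threshold, the function names similarity and looks_modified, and the example scripts are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable


def similarity(a: str, b: str) -> float:
    """Stand-in for comparing two fuzzy hashes: 1.0 means identical text."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def looks_modified(infected_js: str, clean_copies: Iterable[str],
                   threshold: float = 0.99) -> bool:
    """True if the infected copy of a script is not (almost) identical to any
    clean copy. As on the slide, this tells *that* it changed, not *how*."""
    return all(similarity(infected_js, c) < threshold for c in clean_copies)


clean_copies = ["function login(){ check(); submit(); }"]
infected_copy = "function login(){ check(); inject(); submit(); }"
print(looks_modified(infected_copy, clean_copies))    # True
print(looks_modified(clean_copies[0], clean_copies))  # False
```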
Conclusions
Simple yet very effective technique
We implemented it in a web application (just catch me after the talk)
Implementation-, language-, platform- and family-agnostic
Future proof
http://maggi.cc  @phretor
Signature example
{
  "test_node_parent": "form",
  "test_node_value": "input",
  "test_node_xpath": "/html[1]/body[1]/center[3]/table[1]/tbody[1]/tr[1]/form[1]/input[13]"
}
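One way to read such a signature is as a test on a page's DOM: is there a node at test_node_xpath whose tag is test_node_value and whose parent is test_node_parent? The Python sketch below checks that on raw HTML, under that assumed reading of the three fields. It indexes elements by absolute XPath with 1-based positions among same-tag siblings; since html.parser does not add elements a browser would imply (such as tbody), it only matches markup that contains them literally. XPathIndex, tag_of and matches are hypothetical names, and the short XPath and pages in the example are made up.

```python
import json
from collections import defaultdict
from html.parser import HTMLParser
from typing import DefaultDict, Dict, List

VOID = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


def tag_of(xpath: str) -> str:
    """'/html[1]/body[1]/form[1]' -> 'form' ('' for the document root)."""
    return xpath.rsplit("/", 1)[-1].split("[", 1)[0]


class XPathIndex(HTMLParser):
    """Record the parent tag of every element, keyed by its absolute XPath."""

    def __init__(self) -> None:
        super().__init__()
        self.open_paths: List[str] = [""]  # "" stands for the document root
        self.counts: List[DefaultDict[str, int]] = [defaultdict(int)]
        self.parent_of: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        self.counts[-1][tag] += 1
        xpath = f"{self.open_paths[-1]}/{tag}[{self.counts[-1][tag]}]"
        self.parent_of[xpath] = tag_of(self.open_paths[-1])
        if tag not in VOID:
            self.open_paths.append(xpath)
            self.counts.append(defaultdict(int))

    def handle_endtag(self, tag):
        # close the innermost open element with this tag; ignore stray end tags
        for i in range(len(self.open_paths) - 1, 0, -1):
            if tag_of(self.open_paths[i]) == tag:
                del self.open_paths[i:]
                del self.counts[i:]
                break


def matches(signature: dict, html: str) -> bool:
    index = XPathIndex()
    index.feed(html)
    index.close()
    xpath = signature["test_node_xpath"]
    return (xpath in index.parent_of
            and tag_of(xpath) == signature["test_node_value"]
            and index.parent_of[xpath] == signature["test_node_parent"])


signature = json.loads('{"test_node_parent": "form", "test_node_value": "input", '
                       '"test_node_xpath": "/html[1]/body[1]/form[1]/input[3]"}')
clean = "<html><body><form><input name=u><input name=p></form></body></html>"
infected = ("<html><body><form><input name=u><input name=p>"
            "<input name=ESpass type=password></form></body></html>")
print(matches(signature, clean), matches(signature, infected))  # False True
```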
False positives
[Figure: false-positive rate, % FPR(n), as a function of n = number of distinct virtual machines (0 to 35), for a clean machine and an infected machine.]