Towards Scalable Cluster Auditing through Grammatical Inference over Provenance Graphs
Wajih Ul Hassan, Mark Lemay, Nuraini Aguse, Adam Bates, Thomas Moyer
NDSS Symposium 2018, Feb 20, 2018
Notable Data Breach in 2017
Equifax Data Breach Timeline 2017 (axis: Apr, May, Jun, Jul, Aug, Sep, Oct)
Timeline labels:
• Breach detected
• Hackers in Equifax servers
• Patched
• Breach announced
• 3 months of crucial attack audit logs
Are current auditing systems scalable?
Data Provenance aka Audit Log
• Lineage of system activities
• Represented as a Directed Acyclic Graph (DAG)
• Used for forensic analysis
Code execution: Bash: exec("./NGINX"); NGINX: recv(…, "abc.com"); fread("index.html");
Audit log:
1. Bash, spawns NGINX
2. NGINX, receives from abc.com
3. NGINX, reads file index.html
4. …
Provenance graph vertices: Bash, NGINX, abc.com, index.html
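The step from audit log to provenance graph can be sketched as below. This is a minimal illustration, not the audit module used in the talk: the record format `(subject, action, object)`, the action names and the choice to orient edges along the direction of causality are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical audit records mirroring the slide: (subject, action, object).
AUDIT_LOG = [
    ("Bash", "spawns", "NGINX"),
    ("NGINX", "receives_from", "abc.com"),
    ("NGINX", "reads", "index.html"),
]

# Actions where data flows from the object into the subject (assumed set).
INBOUND_ACTIONS = {"receives_from", "reads"}


def build_provenance_graph(records):
    """Return adjacency lists whose edges follow the direction of causality."""
    graph = defaultdict(list)
    for subject, action, obj in records:
        if action in INBOUND_ACTIONS:
            src, dst = obj, subject   # e.g. index.html -> NGINX
        else:
            src, dst = subject, obj   # e.g. Bash -> NGINX
        graph[src].append((dst, action))
        graph.setdefault(dst, [])     # make sure sink vertices are present
    return dict(graph)


def ancestors(graph, vertex):
    """All vertices that can causally influence `vertex` (backward trace)."""
    parents = defaultdict(set)
    for src, edges in graph.items():
        for dst, _ in edges:
            parents[dst].add(src)
    seen, stack = set(), [vertex]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

A forensic query such as "what influenced NGINX?" then becomes `ancestors(build_provenance_graph(AUDIT_LOG), "NGINX")`, a backward traversal of the graph.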
Data Provenance in a Cluster
[Figure: cluster with worker nodes and a master node]
Centralized auditing is not practical due to two limitations.
Limitation #1: Graph Complexity
• NGINX and MySQL running for 5 minutes on a single machine
• Finding a needle in a haystack
Limitation #2: Storage Overhead
• Leads to network overhead as logs are transferred to the master node
• 2.54 GB/day on a single machine
• With compression, > 1 GB/day
[Figure: Audit log size growth for a single NGINX server. x-axis: Day 1 to Day 5; y-axis: log size (GB), 0 to 14; series: Uncompressed, Compressed.]
Winnower
• Cluster applications are replicated in accordance with the microservice architecture principle
• Replicated apps produce highly homogeneous provenance graphs: their core execution behaviour is similar
Key Idea: Remove redundancy from provenance graphs across the cluster before sending them to the master node
[Figure: master node view with Winnower, before and after. Vertex labels: Bash, NGINX, mysql, mysqld, /up/*, /db/*, *. Legend: other library file vertices.]
Winnower
• Builds a consensus model across the cluster using graph grammars
• Like string grammars, graph grammars provide rule-based mechanisms for generating, manipulating and analyzing graphs
• Induction: produce a grammar from a given graph
• Parsing: test whether a given graph is a member of a grammar
[Figure: an example graph with vertices labeled a, t and b, alongside a graph grammar. Legible productions: S ≔ A T, A ≔ a B, S ≔ e.]
Architecture
Worker node:
• The Audit Module records the provenance graph
• The Winnower Agent fetches the graph at each epoch
• Graph Abstraction turns the fine-grained graph into an abstracted graph
• Graph Induction turns the abstracted graph into a model graph
Master node:
• The Model Aggregator receives model graphs/grammars from the cluster and maintains the aggregated model
At the next epoch:
• The Winnower Agent fetches the next graph and only sends model updates
On demand:
• The master node queries part of the provenance graph and the worker node returns the high-fidelity provenance graph
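The "only send model updates" step can be illustrated with a small sketch. It is a simplification under stated assumptions: a model is represented as a plain set of edges and an update is a set difference, whereas the talk describes graph grammars and membership tests. `ModelAggregator.receive`, `model_update` and `run_epochs` are hypothetical names introduced for this example.

```python
def model_update(previous_model, epoch_graph):
    """Return only the edges of this epoch that the model does not hold yet."""
    return set(epoch_graph) - set(previous_model)


class ModelAggregator:
    """Hypothetical master-side store of the aggregated model."""

    def __init__(self):
        self.model = set()

    def receive(self, update):
        self.model |= update


def run_epochs(epoch_graphs, aggregator):
    """Send one update per epoch and return the size of each update."""
    sent_sizes = []
    local_model = set()
    for graph in epoch_graphs:
        update = model_update(local_model, graph)
        local_model |= update
        aggregator.receive(update)
        sent_sizes.append(len(update))
    return sent_sizes


# Three epochs of already-abstracted edges from one worker (made-up data).
EPOCHS = [
    {("Bash", "NGINX"), ("NGINX", "/up/*")},
    {("Bash", "NGINX"), ("NGINX", "/up/*")},
    {("Bash", "NGINX"), ("NGINX", "mysql")},
]
```

With the made-up `EPOCHS` above, `run_epochs(EPOCHS, ModelAggregator())` sends two edges in the first epoch, nothing in the second and one edge in the third, which is the behaviour that keeps network and storage overhead low when the workload is repetitive.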
Provenance Graph Abstraction
• The Graph Induction process builds a model/grammar that concisely describes the whole graph
• However, instance-specific fields frustrate any attempt to build a generic application behaviour model
[Figure: Graph Induction over the ftp provenance graphs of Node 1 and Node 2. Vertex labels include ftp pid:2788, ftp pid:2780, ftp listener pid:2789, ftp listener pid:2791, ftp worker pid:2795, ftp worker pid:2797, 192.168.0.1, 192.168.0.2, /up/File1 Inode:3 and /up/File2 Inode:5. No general model results, as instance-specific information such as the PID differs among graphs.]
Provenance Graph Abstraction
• Provenance graph vertices have well-defined fields, e.g. pid:1234, FilePath:/etc/ld.so
• We manually defined rules that remove or generalize these fields
[Figure: Graph Abstraction applied to Node 1 and Node 2. Before: instance-specific labels such as ftp pid:2788, ftp listener pid:2789, ftp worker pid:2797, 192.168.0.1, 192.168.0.2, /up/File1 Inode:3, /up/File2 Inode:5. After: both nodes show ftp, ftp listener, ftp worker, 192.168.0.0/24 and /up/*.]
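The abstraction rules can be sketched as a function over vertex fields. The three rules below (drop `pid` and `inode`, widen an IP address to its /24 subnet, widen a file path to its directory) are assumptions chosen to reproduce the before/after labels on the slide; the field names are hypothetical and the actual rule set used by Winnower may differ.

```python
import ipaddress
import posixpath


def abstract_vertex(fields):
    """Apply hypothetical abstraction rules to one vertex's fields (a dict)."""
    out = {}
    for key, value in fields.items():
        if key in ("pid", "inode"):
            continue  # instance-specific: drop entirely
        if key == "ip":
            # Generalize a host address to its /24 subnet.
            net = ipaddress.ip_network(f"{value}/24", strict=False)
            out[key] = str(net)
        elif key == "path":
            # Generalize a file to "any file in the same directory".
            out[key] = posixpath.dirname(value).rstrip("/") + "/*"
        else:
            out[key] = value
    return out
```

For example, `abstract_vertex({"name": "ftp worker", "pid": 2797})` keeps only the name, so the ftp worker vertices of Node 1 and Node 2 become identical and can later be merged.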
Provenance Graph Induction
• Deterministic Finite Automata (DFA) learning is used to generate the grammar
• It encodes the causality in the generated models
• In DFA learning, the present state of a vertex includes the path taken to reach the vertex (provenance ancestry)
• Winnower extends it to also remember descendants (provenance progeny)
• The state of each vertex consists of three items:
1. Label
2. Provenance ancestry
3. Provenance progeny
[Figure: graph with vertices Bash, gzip and File1.txt, marking the ancestry and the progeny of the gzip vertex.]
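The three-part vertex state can be sketched as follows. This is a simplification: ancestry and progeny are summarized as the sorted labels of all reachable ancestors and descendants, rather than the exact paths a DFA learner would track. The vertex ids and the edge list in the example are hypothetical, loosely modeled on the gzip figure.

```python
def vertex_state(labels, edges, vertex):
    """Return (label, ancestry, progeny) for `vertex`.

    labels: dict mapping vertex id -> label
    edges:  iterable of (src, dst) pairs forming a DAG
    Ancestry and progeny are sorted tuples of labels.
    """
    children, parents = {}, {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
        parents.setdefault(dst, []).append(src)

    def reach(start, neighbours):
        # Depth-first walk collecting everything reachable from `start`.
        seen, stack = set(), [start]
        while stack:
            for nxt in neighbours.get(stack.pop(), []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return tuple(sorted(labels[v] for v in seen))

    return labels[vertex], reach(vertex, parents), reach(vertex, children)


# Hypothetical graph: Bash and an input file feed gzip, which writes a file.
LABELS = {1: "Bash", 2: "File1.txt", 3: "gzip", 4: "File1.txt"}
EDGES = [(1, 3), (2, 3), (3, 4)]
```

Two vertices whose states compare equal under this function are candidates for merging, which is the property the state merging steps rely on.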
Provenance Graph Induction
• Finds repetitive patterns using standard implicit and explicit state merging algorithms
• Implicit state merging combines two subgraphs if the states of each vertex are the same in both subgraphs
[Figure: Graph Induction merges the abstracted ftp graphs of Node 1 and Node 2 (ftp, ftp listener, ftp worker, 192.168.0.0/24, /up/*) into a single model graph. Legend: confidence level 2.]
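A rough sketch of how identical abstracted subgraphs from several nodes collapse into one model with a confidence level. Two assumptions are made here: graphs are compared edge by edge on abstracted labels instead of by full vertex state, and the confidence level is taken to be the number of nodes that exhibit an edge, which matches the legend on the slide but is not stated explicitly in the deck. The edge directions in the example data are illustrative.

```python
from collections import Counter


def merge_models(node_graphs):
    """Merge per-node abstracted edge sets into one model.

    Returns a Counter mapping each edge to its confidence level, taken here
    to be the number of nodes whose graph contains that edge.
    """
    confidence = Counter()
    for edges in node_graphs:
        confidence.update(set(edges))  # count each edge once per node
    return confidence


# Abstracted ftp graph, identical on both nodes after abstraction.
FTP_EDGES = {
    ("ftp", "ftp listener"),
    ("192.168.0.0/24", "ftp listener"),
    ("ftp listener", "ftp worker"),
    ("ftp worker", "/up/*"),
}
```

Calling `merge_models([FTP_EDGES, FTP_EDGES])` yields four edges, each with confidence 2. An edge that appears on a single node only, for instance one introduced by an attacker, would keep a confidence of 1 and stand out in the model.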
Explicit State Merging
• At a high level, explicit state merging picks two nodes and makes their states the same, then checks whether the subgraphs can be merged implicitly
• Consider a chained map reduce job
[Figure: Node 1 runs a chained job whose vertices are java, java mapper, data and java reducer, repeated twice; annotation: merge two nodes. The merged model uses states A, B, C and D.]
Graph grammar (as extracted; the left-hand side of the first production is not legible):
:= S
S := A
S := T | V
T := A -> X | A -> Y
X := B -> W
Y := C -> W
W := D | D -> S
A := data
B := java mapper
C := java reducer
D := java
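The parsing (membership test) side of such a grammar can be illustrated on a linear chain of vertex labels. The sketch assumes that S expands to T and ignores the alternatives that are not fully legible in the extracted grammar; it also treats the provenance graph as a simple path, which a real graph grammar parser would not require. `matches_chain` is a hypothetical helper.

```python
MAPPER_OR_REDUCER = {"java mapper", "java reducer"}


def matches_chain(labels):
    """True if `labels` is one or more (data, mapper|reducer, java) stages."""

    def parse_s(i):
        # T := A -> X | A -> Y, with A := data
        if i + 2 >= len(labels) or labels[i] != "data":
            return False
        # X := B -> W and Y := C -> W, with B, C := java mapper, java reducer
        if labels[i + 1] not in MAPPER_OR_REDUCER:
            return False
        # W := D | D -> S, with D := java
        if labels[i + 2] != "java":
            return False
        rest = i + 3
        # Either the chain ends here (W := D) or another stage follows (D -> S).
        return rest == len(labels) or parse_s(rest)

    return parse_s(0)
```

The recursive rule `W := D | D -> S` is what lets a fixed-size grammar describe a chained job of any length: `matches_chain` accepts one stage, two stages, or more without the model growing.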
Provenance Graph Induction
• Consider a graph with malicious activity
• Malicious behavior is visible in the final model
[Figure: Graph Induction over the graphs of Node 1, Node 2 and Node 3 produces the model at the master node. All graphs contain ftp, ftp listener, ftp worker and /up/*; the vertices bash, wget, x.x.x.x and a malicious file appear in the input and remain in the model. Legend: confidence level from 1 to 3.]
Evaluation Setup
• Setup
  • 1 VM as master node, 4 VMs as worker nodes
  • SPADE and Docker Swarm
  • Epoch size 50 sec
• Metrics
  • Storage overhead
  • Computational cost
  • Effectiveness
Storage Overhead on Master Node
[Figure: bar chart of log size in MB for HTTPD, ProFTPD and MySQL, comparing Raw and Winnower. Data labels in extraction order: 485, 630, 130 and 0.11, 0.12, 0.17. Annotation: 98.7% decrease.]
Storage Reduction on Master Node
• Apache Webserver with moderate workload
• Note the log scale on the y-axis
• 7z compression is not suitable: it has no global view of the cluster and is oblivious to the previous batch
[Figure: log size (MB, log scale from 1 to 100000) against time (sec, 50 to 600) for Raw (Uncompressed), Raw (Compressed) and Winnower.]
Evaluation: Computation Cost
• Average time spent in induction and membership test at each epoch
[Figure: average time (sec, 0 to 35) against elapsed time (sec, 50 to 600) for Apache, MySQL and ProFTPD. Annotations: generate model for first time; membership check in existing model; heterogeneous workload -> updates model.]
Case Study: Ransomware Attack
• The attacker exploits a vulnerability in Redis database server versions < 3.2
• The vulnerability allows the attacker to change the SSH key and log in as root
• The attacker deletes the database and leaves a note, written with vim, demanding bitcoins to get the database back
Traditional Graph of Attack
• 10 instances of Redis running in the cluster
• ~80K vertices and ~83K edges, 161 MB in size
• Part of the provenance graph shown below
[Figure: part of the traditional provenance graph of the attack]
Winnower Generated Provenance Graph
• 54 vertices and 68 edges, 0.7 MB in size
• Part of the graph is shown below:
[Figure: Winnower model of the attack. Vertex labels: Worker, Nginx, redis-server, bash, sshd, vim, x.x.x.x, 172.17.0.0/24, /uploads/*, /root/.ssh/authorized_keys, /var/lib/redis/dump.rdb, /proc/12743/stat, /var/log/redis/redis.log, /root/ransomware.note, /dev/tty and * (other library files). The attack provenance is marked. Legend: confidence level from 1 to 10.]
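For a sense of scale, the reduction implied by the two case-study slides can be worked out directly from their figures (161 MB versus 0.7 MB, and roughly 80K versus 54 vertices). The helper name below is hypothetical and the vertex count is approximate in the source, so the ratio is only indicative.

```python
def reduction_percent(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (1 - after / before) * 100


# Figures taken from the case-study slides.
RAW_MB, WINNOWER_MB = 161, 0.7
RAW_VERTICES, WINNOWER_VERTICES = 80_000, 54  # raw count is approximate (~80K)

size_reduction = reduction_percent(RAW_MB, WINNOWER_MB)
vertex_ratio = RAW_VERTICES / WINNOWER_VERTICES
```

Here `size_reduction` comes to about 99.57 percent, and the investigator inspects a graph that is roughly 1,500 times smaller in vertex count.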
Winnower Generated Provenance Graph
• What happens if we attack all the nodes in the cluster?
[Figure: the same model graph and attack provenance as on the previous slide. Legend: confidence level 10.]
Conclusion
• Winnower is the first practical system for provenance-based auditing of clusters at scale with low overhead
• Winnower significantly improves attack identification and investigation in a large cluster
Backup Slides
Threat Model
• Assumptions
  • Winnower only tracks user-space attacks, i.e. it trusts the OS
  • Log integrity is maintained
• Attack surface
  • Distributed application replicated on worker nodes
• Attacker's motive
  • Gain control over a worker node by exploiting a software vulnerability in the distributed application