

Towards Scalable Cluster Auditing through Grammatical Inference over Provenance Graphs

Wajih Ul Hassan, Mark Lemay, Nuraini Aguse, Adam Bates, Thomas Moyer

NDSS Symposium 2018, Feb 20, 2018

Notable Data Breach in 2017

Equifax Data Breach Timeline 2017

[Figure: timeline with months Apr, May, Jun, Jul, Aug, Sep, Oct; event labels: Breach Detected, Hackers in Equifax Servers, Patched, Breach Announced]

3 months of crucial attack audit logs

Are current auditing systems scalable?

Data Provenance aka Audit Log
• Lineage of system activities
• Represented as a Directed Acyclic Graph (DAG)
• Used for forensic analysis

Code Execution:
Bash: exec("./NGINX"); NGINX: recv(…, "abc.com"); fread("index.html");

Audit Log:
1. Bash, Spawns NGINX
2. NGINX, Receives from abc.com
3. NGINX, Reads File index.html
4. …

[Figure: Provenance Graph with vertices Bash, NGINX, abc.com and index.html]
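As a rough illustration of how audit log entries like the ones on this slide map onto a provenance graph, the sketch below stores each record as a labeled edge. The record layout (subject, action, object), the subject-to-object edge direction, and the function name `build_provenance_graph` are assumptions made for this example, not the format of any particular auditing tool.

```python
from collections import defaultdict


def build_provenance_graph(audit_log):
    """Turn (subject, action, object) audit records into an adjacency list.

    Assumption: each edge points from the acting subject to the object.
    """
    graph = defaultdict(list)
    for subject, action, obj in audit_log:
        graph[subject].append((action, obj))
        graph.setdefault(obj, [])  # make sure leaf vertices also appear
    return dict(graph)


# The three entries listed on the slide.
audit_log = [
    ("Bash", "Spawns", "NGINX"),
    ("NGINX", "Receives from", "abc.com"),
    ("NGINX", "Reads File", "index.html"),
]

graph = build_provenance_graph(audit_log)
for vertex, out_edges in graph.items():
    print(vertex, "->", out_edges)
```

A forensic query then becomes a graph traversal, for example following edges out of NGINX to see what it touched.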

Data Provenance in a Cluster

[Figure: Worker Nodes sending audit data to a Master Node]

Centralized auditing is not practical due to two limitations

Limitation #1: Graph Complexity
• NGINX and MySQL running for 5 mins on a single machine

[Figure: the resulting provenance graph]

Finding a needle in a haystack problem

Limitation #2: Storage Overhead
• Leads to network overhead as logs are transferred to the master node

[Figure: Audit Log Size Growth for a Single NGINX Server; x-axis: Day 1 to Day 5; y-axis: Log Size (GB), 0 to 14; series: Uncompressed, Compressed]

2.54 GB/Day on a single machine

With compression > 1 GB/Day

Winnower
• Cluster applications are replicated in accordance with microservice architecture principles
• Replicated apps produce highly homogeneous provenance graphs
  ◦ Core execution behaviour is similar

Key Idea: Remove redundancy from provenance graphs across the cluster before sending them to the master node

[Figure: Master Node View with Winnower, Before and After. Vertex labels in the After view: Bash, NGINX, mysql, mysqld, /up/*, /db/*, *; other library file vertices]

Winnower
• Build a consensus model across the cluster using graph grammars
  ◦ Like string grammars, graph grammars provide rule-based mechanisms for generating, manipulating and analyzing graphs
  ◦ Induction: produce a grammar from a given graph
  ◦ Parsing: test whether a given graph is a member of a grammar

[Figure: a Graph with vertices labeled a, t and b, shown next to its Graph Grammar, with production rules such as S ≔ A T and A ≔ a B]

Architecture

[Figure: Worker Nodes and Master Node. On a Worker Node, the Audit Module produces the Prov. Graph and the Winnower Agent fetches the graph at each epoch. Inside the Winnower Agent: Fine-grained Graph → Graph Abstraction → Abstracted Graph → Graph Induction → Model Graph. On the Master Node, the Model Aggregator receives model graphs/grammars from the cluster]

Architecture

[Figure: same components as before. The Winnower Agent fetches the graph at the next epoch and again runs Fine-grained Graph → Graph Abstraction → Abstracted Graph → Graph Induction → Model Graph. The Model Aggregator on the Master Node holds the Aggregated Model; Worker Nodes only send model updates]
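The "only send model updates" step can be pictured with a small sketch. It assumes, purely for illustration, that a model is represented as a set of labeled edges and that an update is the set of edges the master has not seen yet; the function name `run_epochs` and the edge tuples are hypothetical and are not taken from Winnower's implementation.

```python
def run_epochs(epoch_models):
    """Return the (epoch, update) pairs a worker would send to the master.

    Assumption: each epoch's model is a set of (source, target) edges, and an
    update contains only the edges not already sent in earlier epochs.
    """
    sent_updates = []
    already_sent = set()
    for epoch, model in enumerate(epoch_models):
        update = set(model) - already_sent
        if update:  # nothing new in this epoch means nothing is transferred
            sent_updates.append((epoch, update))
            already_sent |= update
    return sent_updates


# Hypothetical abstracted models observed over three epochs.
e1 = ("ftp", "ftp listener")
e2 = ("ftp listener", "ftp worker")
e3 = ("ftp worker", "/up/*")

updates = run_epochs([{e1, e2}, {e1, e2}, {e1, e2, e3}])
print(updates)
```

In this toy run the second epoch repeats known behaviour, so the worker transfers nothing for it.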

Architecture

[Figure: the Model Aggregator on the Master Node queries part of the provenance graph; the Winnower Agent and Audit Module on the Worker Node return the high-fidelity provenance graph]

Provenance Graph Abstraction
• The Graph Induction process builds a model/grammar that concisely describes the whole graph
• However, instance-specific fields frustrate any attempt to build a generic application behaviour model

[Figure: Graph Induction applied to the ftp provenance graphs of Node 1 and Node 2. Vertex labels: ftp pid:2788, ftp listener pid:2789, ftp worker pid:2797, ftp pid:2780, ftp listener pid:2791, ftp worker pid:2795, 192.168.0.1, 192.168.0.2, /up/File1 Inode:3, /up/File2 Inode:5]

No general model, as instance-specific information such as the PID differs among graphs

Provenance Graph Abstraction
• Provenance graph vertices have well-defined fields
  ◦ E.g. pid:1234, FilePath:/etc/ld.so
• Manually defined rules remove or generalize these fields

[Figure: Graph Abstraction of the Node 1 and Node 2 ftp graphs. Before: ftp pid:2788, ftp listener pid:2789, ftp worker pid:2797, ftp pid:2780, ftp listener pid:2791, ftp worker pid:2795, 192.168.0.1, 192.168.0.2, /up/File1 Inode:3, /up/File2 Inode:5. After, on both nodes: ftp, ftp listener, ftp worker, 192.168.0.0/24, /up/*]
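The kind of rule described on this slide can be sketched in a few lines. The field names (`name`, `pid`, `inode`, `ip`, `path`), the choice of a /24 subnet, and the "replace the file name with *" rule are assumptions chosen to reproduce the before/after labels in the figure; the actual rule set is defined manually per deployment.

```python
import ipaddress
import posixpath


def abstract_vertex(vertex):
    """Return a copy of a vertex with instance-specific fields removed or generalized."""
    abstracted = {}
    for field, value in vertex.items():
        if field in ("pid", "inode"):
            continue  # remove fields that differ on every instance
        if field == "ip":
            # generalize a host address to its /24 subnet
            network = ipaddress.ip_network(f"{value}/24", strict=False)
            abstracted[field] = str(network)
        elif field == "path":
            # generalize a file path to its directory plus a wildcard
            directory = posixpath.dirname(value).rstrip("/")
            abstracted[field] = directory + "/*"
        else:
            abstracted[field] = value
    return abstracted


print(abstract_vertex({"name": "ftp worker", "pid": 2797}))
print(abstract_vertex({"ip": "192.168.0.2"}))
print(abstract_vertex({"path": "/up/File1", "inode": 3}))
```

After abstraction, the corresponding vertices on Node 1 and Node 2 carry identical labels, which is what lets induction treat them as the same behaviour.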

Provenance Graph Induction
• Deterministic Finite Automata (DFA) learning is used to generate the grammar
  ◦ Encodes the causality in the generated models
  ◦ In DFA learning, the present state of a vertex includes the path taken to reach the vertex (provenance ancestry)
  ◦ Winnower extends it to remember descendants (provenance progeny)
• The state of each vertex consists of three items:
  1. Label
  2. Provenance ancestry
  3. Provenance progeny

[Figure: a graph with vertices Bash, File1.txt, gzip and File1.txt, marking the ancestry of the gzip vertex and the progeny of the gzip vertex]
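To make the three-part vertex state concrete, here is a simplified sketch. It summarizes ancestry and progeny as the sets of labels reachable backwards and forwards from a vertex, which is a coarser stand-in for the path-based state used in DFA learning. The vertex ids, the edge directions in the gzip example, and the function names are assumptions for illustration.

```python
from collections import defaultdict


def reachable(start, adjacency):
    """Collect every vertex reachable from start by following adjacency."""
    seen, stack = set(), [start]
    while stack:
        for nxt in adjacency.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def vertex_state(vertex, labels, edges):
    """Return (label, ancestry labels, progeny labels) for one vertex."""
    children, parents = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        parents[dst].append(src)
    ancestry = frozenset(labels[v] for v in reachable(vertex, parents))
    progeny = frozenset(labels[v] for v in reachable(vertex, children))
    return (labels[vertex], ancestry, progeny)


# Assumed reading of the figure: Bash and an input file lead to gzip,
# and gzip leads to an output file vertex.
labels = {1: "Bash", 2: "File1.txt", 3: "gzip", 4: "File1.txt"}
edges = [(1, 3), (2, 3), (3, 4)]

state = vertex_state(3, labels, edges)
print(state)
```

Two vertices with the same label but different ancestry or progeny get different states, so they are not confused during merging.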

Provenance Graph Induction
• Finds repetitive patterns using standard implicit and explicit state merging algorithms
  ◦ Implicit state merging combines two subgraphs if the state of each vertex is the same in both subgraphs

[Figure: Graph Induction merges the abstracted ftp graphs of Node 1 and Node 2 (vertices ftp, ftp listener, ftp worker, 192.168.0.0/24, /up/*) into a single model graph with the same vertices. Legend: confidence level 2]
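The merge in the figure can be approximated as follows. This sketch treats two vertices as mergeable when their simplified states (label, ancestor labels, descendant labels) are identical, and records as confidence the number of node graphs in which a state occurs. The edge layout of the ftp graph and all function names are hypothetical, and the state definition is a simplification of the one used in DFA learning.

```python
from collections import defaultdict


def _reachable(start, adjacency):
    seen, stack = set(), [start]
    while stack:
        for nxt in adjacency.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def vertex_states(labels, edges):
    """Map each vertex id to its (label, ancestry, progeny) state."""
    children, parents = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        parents[dst].append(src)
    return {
        v: (
            labels[v],
            frozenset(labels[a] for a in _reachable(v, parents)),
            frozenset(labels[d] for d in _reachable(v, children)),
        )
        for v in labels
    }


def merge_implicitly(graphs):
    """Merge vertices with identical states across graphs.

    Returns {state: confidence}, where confidence is the number of
    graphs that contain a vertex in that state.
    """
    seen_in = defaultdict(set)
    for graph_id, (labels, edges) in enumerate(graphs):
        for state in vertex_states(labels, edges).values():
            seen_in[state].add(graph_id)
    return {state: len(ids) for state, ids in seen_in.items()}


# Assumed edge layout for the abstracted ftp graph on each node.
ftp_labels = {1: "ftp", 2: "ftp listener", 3: "ftp worker",
              4: "192.168.0.0/24", 5: "/up/*"}
ftp_edges = [(1, 2), (2, 3), (4, 3), (3, 5)]

model = merge_implicitly([(ftp_labels, ftp_edges), (ftp_labels, ftp_edges)])
for (label, _, _), confidence in model.items():
    print(label, confidence)
```

With two identical node graphs the model keeps five states, each with confidence 2, matching the legend on the slide.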

Explicit State Merging
• At a high level, explicit state merging:
  ◦ Picks two nodes and makes their states the same
  ◦ Checks whether the subgraphs can then be merged implicitly
• Consider a chained map reduce job

[Figure: Node 1 provenance graph of a chained map reduce job with the repeating vertices java, java mapper, data and java reducer; annotation: Merge two nodes; resulting model with vertices A, B, C and D]

Graph Grammar:
:= S
S := A
S := T | V
T := A->X | A->Y
X := B->W
Y := C->W
W := D | D->S
A := data
B := java mapper
C := java reducer
D := java

Provenance Graph Induction
• Consider a graph with a malicious activity
• Malicious behavior is visible in the final model

[Figure: Graph Induction over the graphs of Node 1, Node 2 and Node 3. All nodes contain ftp, ftp listener, ftp worker and /up/*; one node additionally contains bash, wget, Malicious file and x.x.x.x. The model on the Master Node contains ftp, ftp listener, ftp worker, /up/*, bash, wget, Malicious file and x.x.x.x. Legend: confidence level from 1 to 3]

Evaluation Setup
• Setup
  ◦ 1 VM as master node, 4 VMs as worker nodes
  ◦ SPADE and Docker Swarm
  ◦ Epoch size 50 sec
• Metrics
  ◦ Storage Overhead
  ◦ Computational Cost
  ◦ Effectiveness

Storage Overhead on Master Node

[Figure: bar chart of log size in MB (axis 0 to 700) for HTTPD, ProFTPD and MySQL; series: Raw and Winnower. Raw bar labels: 485, 630, 130. Winnower bar labels: 0.11, 0.12, 0.17. Annotation: 98.7% decrease]

Storage Reduction on Master Node
• Apache Webserver with moderate workload
• Note the log scale on the y-axis

[Figure: Log Size (MB), log scale from 1 to 100000, against Time (sec) from 50 to 600; series: Raw (Uncompressed), Raw (Compressed), Winnower]

7z compression is not suitable:
• No global view of the cluster
• Oblivious to the previous batch

Evaluation: Computation Cost
• Average time spent in induction and membership test at each epoch

[Figure: Average Time (sec), 0 to 35, against Elapsed Time (sec), 50 to 600; series: Apache, MySQL, ProFTPD. Annotations: Generate model for the first time; Membership check in existing model; Heterogeneous workload -> updates model]

Case Study: Ransomware Attack

•  Attacker exploits a vulnerability in Redis database server versions < 3.2

•  The vulnerability allows the attacker to change the SSH key and log in as root

•  Attacker deletes the database and leaves a note, written using vim, asking for bitcoins to get the database back

Traditional Graph of Attack
• 10 instances of redis running in the cluster
• ~80K vertices and ~83K edges with 161 MB size

[Figure: part of the provenance graph]

Winnower Generated Provenance Graph
• 54 vertices and 68 edges with 0.7 MB size

[Figure: part of the graph, with regions labeled Worker and Attack Provenance. Vertex labels: Nginx, bash, redis-server, sshd, vim, *, /uploads/*, 172.17.0.0/24, x.x.x.x, /root/.ssh/authorized_keys, /var/lib/redis/dump.rdb, /proc/12743/stat, /var/log/redis/redis.log, /root/ransomware.note, /dev/tty, Other library files. Legend: confidence level from 1 to 10]
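For a sense of scale, the figures quoted for the traditional graph (about 80K vertices, 161 MB) and the Winnower graph (54 vertices, 0.7 MB) can be compared directly. The sketch below is plain arithmetic on those numbers; the vertex count is approximate because the slide gives it as ~80K.

```python
raw_mb, winnower_mb = 161.0, 0.7
raw_vertices, winnower_vertices = 80_000, 54  # ~80K is approximate

size_reduction_pct = 100.0 * (raw_mb - winnower_mb) / raw_mb
size_factor = raw_mb / winnower_mb
vertex_factor = raw_vertices / winnower_vertices

print(f"size reduction: {size_reduction_pct:.2f}%")   # 99.57%
print(f"size factor: about {size_factor:.0f}x smaller")  # about 230x
print(f"vertex factor: about {vertex_factor:.0f}x fewer vertices")
```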


Winnower Generated Provenance Graph
• What happens if we attack all the nodes in the cluster?

[Figure: the same graph, with regions labeled Worker and Attack Provenance. Vertex labels: Nginx, bash, redis-server, sshd, vim, *, /uploads/*, 172.17.0.0/24, x.x.x.x, /root/.ssh/authorized_keys, /var/lib/redis/dump.rdb, /proc/12743/stat, /var/log/redis/redis.log, /root/ransomware.note, /dev/tty, Other library files. Legend: confidence level 10]

Conclusion
• Winnower is the first practical system for provenance-based auditing of clusters at scale with low overhead
• Winnower significantly improves attack identification and investigation in a large cluster

Thank you for your time. whassan3@illinois.edu

Questions

Backup Slides

Threat model
• Assumptions
  ◦ Winnower only tracks user-space attacks, i.e. it trusts the OS
  ◦ Log integrity is maintained
• Attack surface
  ◦ Distributed application replicated on worker nodes
• Attacker's motive
  ◦ Gain control over a worker node by exploiting a software vulnerability in the distributed application
