Page 1: Summer 2011 Company Meeting Narus


Summer 2011 Company Meeting

CyberEagle: Automated Discovery, Attribution, Analysis and Risk Assessment of Information Security Threats

Saby Saha, Narus
Lei Liu, Michigan State University
Prakash Mandayam, Michigan State University

Page 2: Summer 2011 Company Meeting Narus


CyberEagle

• Motivation and Challenges

• Project Layout

• Architecture

• Statistical Machine Learning / Data Mining

• Results

• Conclusion & Future Work

Page 3: Summer 2011 Company Meeting Narus


Increasing Security Threats

• Continuous and increasing attacks on infrastructure

• Threats to business and national security

• Huge financial stakes (Conficker: 10 million machines, $9.1 billion in losses)

• Attacks are becoming more advanced and sophisticated

• Honeypots, IDS/IPS, and email/IP reputation systems are inadequate

Zeus: 3.6 million machines [HTML injection]
Koobface: 2.9 million machines [social networking sites]
TidServ: 1.5 million machines [email spam attachment]

Page 4: Summer 2011 Company Meeting Narus


More Sophisticated Attacks

Page 5: Summer 2011 Company Meeting Narus


Host Based Security

• Complete monitoring of end-host behavior and the state of the system

• Often analyzes a malware program in a controlled environment to build a model of its behavior

• Pros
 – Information-rich view: high detection rate with low false positives
 – Can reverse engineer the properties of the threat

• Cons
 – After-the-fact approach
   • Requires malicious code for analysis
 – Fails to identify evolved threats
 – Not effective at identifying zero-day threats

Page 6: Summer 2011 Company Meeting Narus


Network Security

• Firewall systems

• IDS/IPS

• Network behavior anomaly detection (NBAD)

• Pros
 – Complete macro view of the network
 – With knowledge of good traffic, it can identify anomalies
 – Able to identify new threats as anomalies

• Cons
 – Generates a large number of false positives
 – Unsupervised approach, lacks ground truth

Page 7: Summer 2011 Company Meeting Narus


Bringing Them Together

• Leverage the advantages of both approaches

• Host security tags flows with threat signatures
 – Generates ground truth associated with the flows

• Network security can then learn rich statistical models for all threats using the flow data tagged with ground truth

• Develop a comprehensive end-to-end data security system for real-time discovery, analysis, and risk assessment of security threats

Page 8: Summer 2011 Company Meeting Narus


Enhanced Comprehensive Security System

• Discover common and persistent behavioral patterns for all security threats
 – Even when sessions are encrypted (where IDS/IPS fails)

• Generate precise threat alerts in real time
 – Reduce the false positive rate

• Identify new threats that share similarities with previous ones
 – Newly evolved version of a threat
 – New threat with a similar behavioral pattern

• Inform the host security system about newly identified threats

Page 9: Summer 2011 Company Meeting Narus


System Overview

Model Generation
• Extract a set of transport-layer features
• Generate statistical models

Classification
• Flush out the model to the streaming classification path
• Redirect packets matching the model to the binary analysis module

Validation / Assessment
• Extract the executable and execute it
• Analyze the information it touches
• Assess the risk
• Increase confidence and alert
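The three stages above can be pictured as a small pipeline. Below is a minimal, illustrative sketch in Python; it is not the actual CyberEagle implementation, and the Flow type, the logistic-regression stand-in for the statistical model, and the toy risk check are all assumptions made for illustration.

```python
# Illustrative three-stage pipeline (assumed structure, not Narus code).
from dataclasses import dataclass
from typing import List
from sklearn.linear_model import LogisticRegression

@dataclass
class Flow:
    features: List[float]          # transport-layer features
    payload: bytes = b""

def model_generation(tagged_flows: List[Flow], labels: List[int]):
    """Stage 1: fit a statistical model on flows tagged by host security (1 = bad)."""
    model = LogisticRegression(max_iter=1000)
    model.fit([f.features for f in tagged_flows], labels)
    return model

def classification(model, stream):
    """Stage 2: apply the model on the streaming path; yield flows that match it."""
    for flow in stream:
        if model.predict([flow.features])[0] == 1:
            yield flow                   # redirected to the binary analysis module

def validation_assessment(flow: Flow) -> float:
    """Stage 3 (stub): extract/execute the carried binary and assess the risk."""
    return 0.9 if flow.payload.startswith(b"MZ") else 0.1   # toy placeholder check
```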

Page 10: Summer 2011 Company Meeting Narus


Information Flow

Page 11: Summer 2011 Company Meeting Narus


Supervised Threat Classification

• Data
 – Network flow features

• Kernel
 – Defines similarity between different flows

• Classifier
 – Binary: separate good flows from bad
 – Multiclass: further separate the bad flows

• Scalability issues
 – Hierarchy

Page 12: Summer 2011 Company Meeting Narus


Challenges

• Irregular data
 – Missing values
 – Imbalanced data
 – Heterogeneous
 – Non-applicable features

• Large number of classes (the number of threats reaches hundreds of thousands)

• New classes

• Noise in the data

• All threat classes may not be captured

• Minimize false positives

Page 13: Summer 2011 Company Meeting Narus


Preprocessing

• Normalization

• Dealing with missing values
 – Case deletion method
 – Mean imputation
   • Over all classes
   • Per individual class
 – Median imputation
   • Over all classes
   • Per individual class
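As a rough illustration of the options above, here is a sketch of per-class mean/median imputation with pandas; the column names, class labels, and values are made up for the example, not the project's data.

```python
import numpy as np
import pandas as pd

# Toy flow-feature table with missing values; "label" is the threat class.
df = pd.DataFrame({
    "label":    ["A", "A", "B", "B", "B"],
    "pkt_size": [100.0, np.nan, 300.0, 320.0, np.nan],
    "duration": [1.2, 0.9, np.nan, 4.0, 3.5],
})
cols = ["pkt_size", "duration"]

# Case deletion: drop any row with a missing value.
deleted = df.dropna(subset=cols)

# Mean imputation over all classes.
overall = df.copy()
overall[cols] = overall[cols].fillna(overall[cols].mean())

# Mean imputation within each individual class.
per_class = df.copy()
per_class[cols] = per_class.groupby("label")[cols].transform(
    lambda col: col.fillna(col.mean()))

# Median imputation: same pattern with .median() in place of .mean().
```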

Page 14: Summer 2011 Company Meeting Narus


Classifier Framework

(Diagram: two-level supervised classifier framework)

• Input: flows
• SNORT-tagged bad flows: 76 different classes, 13,935 flows
• Unknown flows: 44,427 flows
• Example threat classes: Shellcode, Spambot_Proxy_Control_Channel, Exploit_Suspected_PHP_Injection_Attack
• Macro-level classifier (learning/training): separates Unknown from Bad flows
• Micro-level classifier (learning/training): assigns Bad flows to classes CL_A, CL_B, …, CL_N
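A compressed sketch of the macro/micro hierarchy with scikit-learn; the random data, the five pretend threat classes, and the random forests are stand-ins chosen for the example, not the classifiers used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stand-in data: rows are flow feature vectors. y_bad marks SNORT-tagged bad
# flows (1) vs. unknown flows (0); y_class is the threat class of a bad flow.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y_bad = rng.integers(0, 2, size=500)
y_class = rng.integers(0, 5, size=500)        # pretend 5 threat classes

# Macro-level classifier: unknown vs. bad.
macro = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y_bad)

# Micro-level classifier: which threat class, trained only on the bad flows.
micro = RandomForestClassifier(n_estimators=100, random_state=0).fit(
    X[y_bad == 1], y_class[y_bad == 1])

def classify_flow(x):
    """Route a flow through the two-level hierarchy."""
    if macro.predict([x])[0] == 0:
        return "unknown"
    return f"threat class {micro.predict([x])[0]}"

print(classify_flow(X[0]))
```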

Page 15: Summer 2011 Company Meeting Narus


Binary Classifier Results

• Biased SVM performance comparison with different kernels

                  Linear Kernel   RBF Kernel   Poly Kernel
Precision (good)        79.75        87.46         78.70
Recall (good)           87.07        90.42         97.79
F1 (good)               83.25        88.9347       87.2126
Precision (bad)         79.75        69.33         79.78
Recall (bad)            37.17        62.55         24.81
F1 (bad)                42.74        65.7657       34.8495
Accuracy                74.08        83.26         78.79
G-mean                  56.89        75.21         49.25

Kernel Learning
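The kind of comparison in the table above can be reproduced with a class-weighted ("biased") SVM. The sketch below uses scikit-learn on synthetic imbalanced data; the feature distribution and the weight of 4 on the bad class are assumptions, not the settings behind the reported numbers.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Synthetic imbalanced data: 0 = good flow, 1 = bad flow.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (800, 8)), rng.normal(1.5, 1.0, (200, 8))])
y = np.array([0] * 800 + [1] * 200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for kernel in ("linear", "rbf", "poly"):
    # class_weight biases the misclassification penalty toward the bad class.
    clf = SVC(kernel=kernel, class_weight={0: 1.0, 1: 4.0}).fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    prec, rec, f1, _ = precision_recall_fscore_support(y_te, pred, labels=[0, 1])
    gmean = np.sqrt(rec[0] * rec[1])          # G-mean of the per-class recalls
    print(f"{kernel:6s} acc={accuracy_score(y_te, pred):.3f} "
          f"F1_bad={f1[1]:.3f} G-mean={gmean:.3f}")
```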

Page 16: Summer 2011 Company Meeting Narus


Binary Classifier Results

• Parameter selection for Biased SVM with RBF Kernel

When gamma = 10 and C+/C- = 0.5, the best F1 (bad) is 0.6494.

When gamma = 10 and C+/C- = 0.55, the best F1 (bad) is 0.657657.
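A small grid search in the same spirit, with gamma swept together with the weight on the bad class standing in for the C+/C- ratio; the grid values and the cross-validation setup are assumptions for illustration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score

# X, y as in the previous sketch: 0 = good flow, 1 = bad flow.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (800, 8)), rng.normal(1.5, 1.0, (200, 8))])
y = np.array([0] * 800 + [1] * 200)

best_params, best_f1 = None, -1.0
for gamma in (0.1, 1.0, 10.0):
    for w_bad in (1.0, 2.0, 4.0, 8.0):        # stand-in for the C+/C- ratio
        clf = SVC(kernel="rbf", gamma=gamma, class_weight={0: 1.0, 1: w_bad})
        pred = cross_val_predict(clf, X, y, cv=3)
        score = f1_score(y, pred, pos_label=1)
        if score > best_f1:
            best_params, best_f1 = (gamma, w_bad), score
print("best (gamma, bad-class weight):", best_params, "F1_bad = %.4f" % best_f1)
```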

Page 17: Summer 2011 Company Meeting Narus


Binary Classifier Results

• F1 bad comparison of the methods for Binary classifier

F1 (bad) best performance with/without noise: 79.07% / 88.7%

[Bar chart: F1 (bad) comparison with noise — methods include Bagging SMO, Adaboost SMO, KNN, Biased SVM, Decision Tree, and Bagging Decision Tree; plotted values: 45.57, 45.57, 46.41, 63.74, 65.7657, 76.01, 79.07]

[Bar chart: F1 (bad) comparison without noise — same methods; plotted values: 51.7, 51.7, 53.4, 79.43, 67.55, 86.8, 88.7]

Page 18: Summer 2011 Company Meeting Narus


Preprocessing (Multiclass)

• Tree-based generated features
 – For each class k, do:
   • Repeat c times:
     – Collect samples from class k and label them +1
     – Collect samples from the remaining classes and label them -1
     – Build a regression tree on this binary data
     – Store the tree as T_ik
   • End
 – End

• Example:

Original features:

Home owner | Marital status | Annual income | Number of children | Age
-          | married        | 125K          | -                  | 41
No         | Not married    | 70K           | N/A                | 22
No         | -              | 59K           | 1                  | 55
yes        | Not married    | -             | N/A                | 23
yes        | married        | 100K          | 1                  | -

Tree-based features (after transformation):

Tree 1 | Tree 2   | Tree 3 | Tree 4   | Tree 5
-0.25  | -1       | -0.5   | -1       | -0.14286
-0.25  | -1       | -0.5   | -0.33333 | -0.14286
-1     | 0.2      | 1      | 1        | 0.142857
0.5    | 0.714286 | 0.5    | 0.25     | -1
-0.25  | -0.33333 | -0.5   | 0.777778 | -0.14286
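A sketch of the feature-generation loop above using scikit-learn regression trees; treating the negatives as a resampled draw from the other classes, and the tree depth and number of rounds, are assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def tree_based_features(X, y, c=5, max_depth=3, seed=0):
    """For each class k, fit c regression trees on (class k = +1 vs. other
    classes = -1) data and use the trees' outputs as new features (T_ik)."""
    rng = np.random.default_rng(seed)
    trees = []
    for k in np.unique(y):
        pos = np.where(y == k)[0]
        for _ in range(c):
            neg = rng.choice(np.where(y != k)[0], size=len(pos), replace=True)
            idx = np.concatenate([pos, neg])
            target = np.where(y[idx] == k, 1.0, -1.0)
            trees.append(DecisionTreeRegressor(max_depth=max_depth,
                                               random_state=0).fit(X[idx], target))
    # Transformed representation: one column per stored tree.
    return np.column_stack([t.predict(X) for t in trees])

# Toy usage with random data standing in for flow features.
X = np.random.default_rng(1).normal(size=(120, 6))
y = np.random.default_rng(2).integers(0, 3, size=120)
print(tree_based_features(X, y).shape)   # (120, number_of_classes * c)
```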

Page 19: Summer 2011 Company Meeting Narus


Preprocessing

• Multiclass results comparison with
 – Original features
 – Tree-based generated features

Class ID | Original features (Precision / Recall / F1) | Tree-based features (Precision / Recall / F1)
24       | 77.65 / 78.30 / 77.97                       | 86.12 / 88 / 87.05
25       | 63.62 / 70.02 / 66.67                       | 79.3 / 82 / 80.63
28       | 99.36 / 99.70 / 99.53                       | 100 / 100 / 100
48       | 82.16 / 73.95 / 77.84                       | 79.68 / 77.9 / 78.78
68       | 69.05 / 71.38 / 70.20                       | 67.7 / 76 / 71.61
76       | 66.58 / 71.23 / 68.83                       | 68.45 / 66.6 / 67.51

Average performance over the 6 majority classes (original features → tree-based features):
Precision 76.40 → 80.21, Recall 77.43 → 81.75, F1 76.84 → 80.93

Page 20: Summer 2011 Company Meeting Narus


Multi-class Classification

• Identify individual threats

• Identify new classes and provide their properties

• Classifiers
 – K-Nearest Neighbor
   • No training involved
   • Computationally intensive at test time
 – Ensemble methods
   • Fail to scale up to a huge number of classes
 – Sphere-based SVM
   • Encapsulate each class in a hypersphere
   • Transform the data into an appropriate space so that each class clusters into a single cohesive unit
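A very reduced sketch of the per-class sphere idea; this is not the sphere-based SVM formulation itself, just each class summarized by a centroid and a covering radius, so the scaling argument made on the later results slide is visible.

```python
import numpy as np

def fit_spheres(X, y, coverage=0.95):
    """One (center, radius) pair per class; the radius covers most of the class."""
    spheres = {}
    for k in np.unique(y):
        pts = X[y == k]
        center = pts.mean(axis=0)
        radius = np.quantile(np.linalg.norm(pts - center, axis=1), coverage)
        spheres[k] = (center, radius)
    return spheres

def predict_sphere(spheres, x):
    """Assign x to the nearest sphere that contains it; None if no sphere does."""
    best, best_dist = None, np.inf
    for k, (center, radius) in spheres.items():
        d = np.linalg.norm(x - center)
        if d <= radius and d < best_dist:
            best, best_dist = k, d
    return best

# Only one distance check per class at test time (K comparisons),
# versus one comparison per training sample for KNN (N comparisons).
```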

Page 21: Summer 2011 Company Meeting Narus


Building Kernel

• Let (X_i, Y_i) be the data points, where Y_i ∈ {+1, -1}

• Construct the ground-truth kernel K
 – K_ij = Y_i Y_j

• Now learn a parametric kernel as follows
 – K_ij = f_θ(X_i, X_j)

Example data:

Home owner | Marital status | Annual income | Number of children | Age | Y
-          | married        | 125K          | -                  | 41  | +1
No         | Not married    | 70K           | N/A                | 22  | +1
No         | -              | 59K           | 1                  | 55  | +1
yes        | Not married    | -             | N/A                | 23  | -1
yes        | married        | 100K          | 1                  | -   | -1
-          | Married        | -             | 2                  | 32  | -1

K_ij ≈ f_θ(X_i, X_j). Once θ is learned, it can be applied to the test set.

Ground-truth kernel K_ij = Y_i Y_j (rows and columns indexed by sample):

class |  1 |  2 |  3 |  4 |  5 |  6
  1   | +1 | +1 | +1 | -1 | -1 | -1
  2   | +1 | +1 | +1 | -1 | -1 | -1
  3   | +1 | +1 | +1 | -1 | -1 | -1
  4   | -1 | -1 | -1 | +1 | +1 | +1
  5   | -1 | -1 | -1 | +1 | +1 | +1
  6   | -1 | -1 | -1 | +1 | +1 | +1
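The ground-truth kernel for the toy labels above is just the outer product of the label vector. The sketch below also builds the per-class one-vs-rest targets used on the next slide; the three-class labels there are an invented example.

```python
import numpy as np

# Binary labels for the six example rows above.
y = np.array([+1, +1, +1, -1, -1, -1])

# Ground-truth kernel: K_ij = Y_i * Y_j (the block structure in the table).
K_truth = np.outer(y, y)
print(K_truth)

# Multi-class case (next slide): one one-vs-rest target kernel per class.
y_multi = np.array([0, 0, 1, 1, 2, 2])          # invented class labels
per_class_targets = {
    k: np.outer(np.where(y_multi == k, 1, -1), np.where(y_multi == k, 1, -1))
    for k in np.unique(y_multi)
}
```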

Page 22: Summer 2011 Company Meeting Narus


Kernel for Multi Class

• For each class, do the following:
 – Collect samples belonging to the class and label them +1
 – Collect samples from the rest of the data and label them -1
 – Build a separate kernel for each class

Per-class ground-truth kernel, K_ij ≈ f_θ(X_i, X_j):

class |  1 |  2 |  3 |  4 |  5 |  6
  1   | +1 | +1 | +1 | -1 | -1 | -1
  2   | +1 | +1 | +1 | -1 | -1 | -1
  3   | +1 | +1 | +1 | -1 | -1 | -1
  4   | -1 | -1 | -1 | +1 | +1 | +1
  5   | -1 | -1 | -1 | +1 | +1 | +1
  6   | -1 | -1 | -1 | +1 | +1 | +1

Page 23: Summer 2011 Company Meeting Narus


Boosted Trees for Kernel Learning

(Diagram: each boosted tree outputs a ±1 value per sample; the kernel matrix for tree k pairs these outputs, its (i, j) entry being the product of the tree's outputs on samples i and j.)

Output of tree 1 → kernel matrix for tree 1:

  |  1 |  2 |  3 |  4 |  5 |  6
1 | +1 | -1 | -1 | +1 | -1 | +1
2 | -1 | +1 | +1 | -1 | +1 | -1
3 | -1 | +1 | +1 | -1 | +1 | -1
4 | +1 | -1 | -1 | +1 | -1 | +1
5 | -1 | +1 | +1 | -1 | +1 | -1
6 | +1 | -1 | -1 | +1 | -1 | +1

Target kernel shown alongside (identical to the ground-truth kernel on the Building Kernel slide):

  |  1 |  2 |  3 |  4 |  5 |  6
1 | +1 | +1 | +1 | -1 | -1 | -1
2 | +1 | +1 | +1 | -1 | -1 | -1
3 | +1 | +1 | +1 | -1 | -1 | -1
4 | -1 | -1 | -1 | +1 | +1 | +1
5 | -1 | -1 | -1 | +1 | +1 | +1
6 | -1 | -1 | -1 | +1 | +1 | +1

Output of tree 2 → kernel matrix for tree 2:

  |  1 |  2 |  3 |  4 |  5 |  6
1 | +1 | -1 | +1 | +1 | -1 | +1
2 | -1 | +1 | -1 | +1 | +1 | -1
3 | -1 | +1 | +1 | -1 | +1 | -1
4 | +1 | -1 | -1 | +1 | -1 | +1
5 | -1 | +1 | +1 | -1 | +1 | -1
6 | +1 | -1 | -1 | +1 | -1 | +1
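A sketch of turning per-tree ±1 outputs into kernel matrices and combining them to approximate the ground-truth kernel; the shallow resampled trees and the unweighted average below are simplifications standing in for the actual boosting scheme.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
y = np.array([+1, +1, +1, -1, -1, -1])
K_truth = np.outer(y, y)                       # target (ground-truth) kernel

# One shallow tree per round (stand-in for boosting); each tree's +/-1 outputs
# induce a kernel matrix K(t)_ij = t(x_i) * t(x_j).
kernels = []
for seed in range(3):
    idx = rng.choice(len(X), size=len(X), replace=True)     # resampled round
    tree = DecisionTreeClassifier(max_depth=1, random_state=seed).fit(X[idx], y[idx])
    out = tree.predict(X)
    kernels.append(np.outer(out, out))

# Simple unweighted combination, then agreement with the target kernel.
K_learned = np.mean(kernels, axis=0)
print("sign agreement with K_truth:", np.mean(np.sign(K_learned) == K_truth))
```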

Page 24: Summer 2011 Company Meeting Narus


Multi class Results

Spheres require only K = 6 comparisons (one per class), whereas KNN requires N comparisons (one per training sample).

Page 25: Summer 2011 Company Meeting Narus


Classification + New Class Detection

Build a separate kernel for each class:

• Find a transformation to separate class + from the rest of the data
• Find a transformation to separate class x from the rest of the data
• Find a transformation to separate class -- from the rest of the data
• Find a transformation to separate class ^ from the rest of the data
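Combining the per-class models gives a simple new-class rule: a sample that no class's model claims is flagged as a potential new threat class. A toy sketch reusing the sphere representation from the earlier slide (the centers and radii here are made up, and this is not the actual CyberEagle logic):

```python
import numpy as np

def detect(spheres, x):
    """spheres: {class: (center, radius)} as in the earlier sphere sketch."""
    for k, (center, radius) in spheres.items():
        if np.linalg.norm(x - center) <= radius:
            return ("known class", k)
    return ("possible new class", None)

# Toy usage: two known classes around 0 and 5; a far-away point gets flagged.
spheres = {0: (np.zeros(3), 1.5), 1: (np.full(3, 5.0), 1.5)}
print(detect(spheres, np.array([0.2, -0.1, 0.3])))    # ('known class', 0)
print(detect(spheres, np.array([10.0, 10.0, 10.0])))  # ('possible new class', None)
```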

Page 26: Summer 2011 Company Meeting Narus


New Class Generation

Page 27: Summer 2011 Company Meeting Narus


Conclusion

• CyberEagle: an enhanced, comprehensive security system
 – Brings host and network security together to fight security threats

• Identifies threats that IDS/IPS fails to detect (encrypted, evolved)

• Identifies new threats at the earliest stage

• Generates signatures for the new threats and alerts the host security system in an automated way

Page 28: Summer 2011 Company Meeting Narus


Future Work

• Improve classification accuracy

• Scale up to a huge number of classes

• Reduce computation during classification
 – Learn a class hierarchy
 – Increase speed without sacrificing accuracy

• Validate with diverse data

• Reputation analysis of IP addresses

• Online updates of the classifier

• MapReduce implementations

Page 29: Summer 2011 Company Meeting Narus


Summer 2011 Company Meeting

Thank You
Prakash, Lei, Saby