Deep Learning for Unsupervised Insider Threat Detection in Structured Cybersecurity Data Streams Aaron Tuor w/ Brian Hutchinson, Sam Kaplan, Nicole Nichols, Sean Robinson

Jan 03, 2021


Deep Learning for Unsupervised Insider Threat Detection in Structured Cybersecurity Data Streams

Aaron Tuor

w/ Brian Hutchinson, Sam Kaplan, Nicole Nichols, Sean Robinson


Pacific Northwest National Laboratory

• Locations in Richland and Seattle, Washington

• Applications for internships are now open

http://jobs.pnnl.gov/

• Spend a whole summer doing Computer Science research

• Collaborate with scientists from other fields

• Fun enrichment events:

  • Hanford museum

  • Lake Washington cruise

  • Virtual reality lab tour

  • Tour Laser Interferometer Gravitational-Wave Observatory (LIGO)


Anomaly Detection / Outlier Analysis


Motivation

• Insider Threat: Actions taken by an employee that are harmful to an organization

  • Unsanctioned data transfer

  • Sabotage of resources

  • Misuse of network that disrupts organization

• Analysis of network activity can detect Insider Threat

• Automated filtering can help reduce the analyst workload


Approach Constraints

Approach should provide:

• Real time evaluation

• Upper bound on storage requirements

• Analysis of structured multivariate data

• Adaptation to shifting distribution of activities

• Interpretable assessments


System at a Glance

Raw “Events” (file, email, http, http, email, …) → Feature Extractor → Feature Vectors → Batcher / Dispatcher → Neural Network → Anomaly Scores


Outline

• Data processing and feature extraction

• Deep learning architectures

• Experiments and results

• Takeaways


Data Sources

• CERT Insider Threat (version 6.2)

  • Synthetic data generated according to a sophisticated user model

  • 516 days, 135 million events total

  • Email, web, logon, file and device usage events

  • 5 bad actors produce 470 threat events

  • Accompanying user metadata (role, project, team, …)


CERT: Example Log Line

Email Event

Event ID: I102-B4EB49RW-7379WSQW
Date: 1/2/2010 6:36:41
User: HDB1666
PC: PC-6793
To: [email protected]
CC: [email protected]
BCC: [email protected]
From: [email protected]
Activity: Send
Size: 45659
Attachments: <none>
Content: Now Sylvia, the object …
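A log line like the one on this slide could be turned into a typed record roughly as follows. This is a minimal sketch, not the authors' parser: it assumes the event arrives as one comma-separated line with the fields in the order shown on the slide, that the date is month/day/year, and it uses placeholder `example.com` addresses because the real ones are redacted in the transcript. The names `EMAIL_FIELDS` and `parse_email_event` are hypothetical.

```python
import csv
import io
from datetime import datetime

# Assumed column order for an email event (taken from the slide's field
# order; the on-disk format is not confirmed by the slides).
EMAIL_FIELDS = ["id", "date", "user", "pc", "to", "cc", "bcc",
                "from", "activity", "size", "attachments", "content"]


def parse_email_event(line):
    """Parse one comma-separated email log line into a dict."""
    row = next(csv.reader(io.StringIO(line)))
    event = dict(zip(EMAIL_FIELDS, row))
    # Assumption: dates are month/day/year with a 24-hour clock.
    event["date"] = datetime.strptime(event["date"], "%m/%d/%Y %H:%M:%S")
    event["size"] = int(event["size"])
    return event


# Placeholder addresses stand in for the redacted ones on the slide.
sample = ('I102-B4EB49RW-7379WSQW,1/2/2010 6:36:41,HDB1666,PC-6793,'
          'to@example.com,cc@example.com,bcc@example.com,from@example.com,'
          'Send,45659,<none>,"Now Sylvia, the object"')
event = parse_email_event(sample)
```

Quoting the content field lets the `csv` module keep the embedded comma inside a single field.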


Aggregate Feature Vector

• For each user, aggregate their events over a window of time (e.g. one day)

• Example feature vector:

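$x_t^u = [\,19,\ 3,\ 281,\ \dots,\ 8 \mid 23,\ 47,\ 29,\ 35,\ \dots,\ 8\,]$ (categorical | counts)

Categorical entries, e.g.:

  • Role=“ComputerScientist”

  • Project=“3”

  • Team=“Systems Engineering 20”

  • Supervisor=“Carter Wells”

Count entries, e.g.:

  • # Emails Sent w/ Att.

  • # Webpages Visited

  • # Logons

  • # File Writes

  • # Web Downloads

Figure: a user's raw event stream (file, email, http, …, logon, file, email, device) aggregated into the feature vector.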
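The per-user, per-window aggregation described on this slide can be sketched as a simple counting step. The feature names below are hypothetical stand-ins (the experiments use 408 count features), and events are assumed to be already reduced to `(user, day, feature_name)` tuples; this is an illustration, not the authors' feature extractor.

```python
from collections import Counter, defaultdict

# Hypothetical count features, loosely named after the slide's examples.
COUNT_FEATURES = ["email_sent_with_attachment", "web_visit", "logon",
                  "file_write", "web_download"]


def aggregate(events):
    """Group events by (user, day) and count each feature type.

    `events` is an iterable of (user, day, feature_name) tuples.
    Returns {(user, day): [count for each entry of COUNT_FEATURES]}.
    """
    counters = defaultdict(Counter)
    for user, day, feature in events:
        counters[(user, day)][feature] += 1
    # A Counter returns 0 for features that never occurred in the window.
    return {key: [c[f] for f in COUNT_FEATURES]
            for key, c in counters.items()}


events = [
    ("HDB1666", "2010-01-02", "logon"),
    ("HDB1666", "2010-01-02", "web_visit"),
    ("HDB1666", "2010-01-02", "web_visit"),
    ("HDB1666", "2010-01-02", "email_sent_with_attachment"),
    ("HDB1666", "2010-01-03", "file_write"),
]
vectors = aggregate(events)
```

Each key of `vectors` corresponds to one user-day, so the count part of $x_t^u$ is simply the list stored under `(u, t)`.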


Outline

• Data processing and feature extraction

• Deep learning architectures

• Experiments and results

• Takeaways and ongoing work


Deep Neural Network Autoencoder

Figure: input → hidden layer ($W_1, a_1$) → … → hidden layer → output ($U, b$)

$h_1 = g(W_1^T x + a_1)$

$h_i = g(W_i^T h_{i-1} + a_i)$

$y = f(U^T h_L + b)$

• Parametric function

• Trained to reproduce input as output

• Complexity is constrained to prevent learning identity function

• Anomaly is detected when a poor reconstruction of an input is made by the model
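The layer equations on this slide can be written out as a short forward pass, with reconstruction error serving as the anomaly score. This is a sketch under stated assumptions: `g = tanh`, `f = identity`, squared error, and the layer sizes are all illustrative choices that the slides do not specify, and the weights here are random because training is omitted.

```python
import numpy as np


def init_params(sizes, rng):
    """Random (weights, bias) pairs for layer sizes such as [8, 3, 8]."""
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(x, params):
    """h_i = g(W_i^T h_{i-1} + a_i) per hidden layer, y = f(U^T h_L + b)."""
    h = x
    for W, a in params[:-1]:
        h = np.tanh(W.T @ h + a)   # g = tanh (an assumption)
    U, b = params[-1]
    return U.T @ h + b             # f = identity (an assumption)


def reconstruction_error(x, params):
    """Squared reconstruction error, used here as the anomaly score."""
    y = forward(x, params)
    return float(np.sum((x - y) ** 2))


rng = np.random.default_rng(0)
# The 3-unit bottleneck is what keeps the network from copying its input.
params = init_params([8, 3, 8], rng)
x = rng.normal(size=8)
score = reconstruction_error(x, params)
```

After training on normal activity, inputs that resemble the training data should reconstruct well, so a high `score` marks a candidate anomaly.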


Predicting Structured Events

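• First, we decompose the joint probability, assuming the features are conditionally independent given $h_t$:

$P(R, P, T, S, C \mid h_t) \approx P(\mathrm{Role} \mid h_t)\,P(\mathrm{Project} \mid h_t)\,P(\mathrm{Team} \mid h_t) \cdots P(\mathrm{Counts} \mid h_t)$

Each output vector y is either a probability distribution over categories (for a categorical input) or the parameters of a multivariate normal distribution (for the vector of continuous input features).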


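Training and Anomaly Detection

Input: $x_t^u = [\,r,\ t,\ \dots \mid 23,\ 47,\ 29,\ 35,\ \dots,\ 8\,]$ (categorical | counts)

Network outputs: $P_R,\ P_T,\ \dots,\ \mu_C,\ \Sigma_C$

Multivariate loss (anomaly score): $-\left[\log P_R(r) + \log P_T(t) + \cdots + \log P_C(c)\right]$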
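The loss on this slide sums one negative log probability per feature group. The sketch below computes such a score in plain Python. It assumes a softmax over logits for each categorical feature and a diagonal covariance for the count vector (one possible reading of the "diag" model variant, not confirmed here); all function and variable names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_nll(logits, observed_index):
    """-log P(category), with P given by a softmax over the logits."""
    return -math.log(softmax(logits)[observed_index])


def diag_gaussian_nll(x, mean, var):
    """-log N(x; mean, diag(var)) for a vector of count features."""
    return sum(0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def anomaly_score(cat_logits, cat_observed, counts, mean, var):
    """Return (total, per-feature terms) of negative log probabilities."""
    terms = {name: categorical_nll(cat_logits[name], cat_observed[name])
             for name in cat_logits}
    terms["counts"] = diag_gaussian_nll(counts, mean, var)
    return sum(terms.values()), terms


# Uniform predictions over 2 roles and 4 teams; one count at its mean.
total, terms = anomaly_score(
    cat_logits={"role": [0.0, 0.0], "team": [0.0, 0.0, 0.0, 0.0]},
    cat_observed={"role": 0, "team": 1},
    counts=[1.0], mean=[1.0], var=[1.0],
)
```

Keeping the per-feature `terms` alongside the total is what later allows the score to be broken down by feature.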


Outline

• Data processing and feature extraction

• Neural networks and our model

• Experiments and results

• Takeaways


Experiment Setup

• Split data set into train/dev and test

• Aggregate Features: 408 count features, 6 categorical features

• Test model configurations over a range of hyperparameters

• Test best model configurations against standard anomaly detection techniques


Evaluation Criteria

• CR-k (cumulative recall): Sum of the recalls at all budgets up to and including k
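As a rough illustration of the CR-k metric, the sketch below treats a budget as the number of highest-scoring items an analyst reviews and sums the recall at each budget up to k. The budget list, the single pooled ranking, and the toy scores are assumptions made for the example; the slides do not state how budgets are chosen or applied.

```python
def recall_at_budget(scores, labels, budget):
    """Fraction of true threats found among the `budget` highest scores."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    found = sum(label for _, label in ranked[:budget])
    total = sum(labels)
    return found / total if total else 0.0


def cumulative_recall(scores, labels, budgets, k):
    """CR-k: sum of recalls over all budgets up to and including k."""
    return sum(recall_at_budget(scores, labels, b)
               for b in budgets if b <= k)


# Toy data: label 1 marks a true threat, 0 marks normal activity.
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
labels = [1, 0, 0, 0, 1, 0]
budgets = [1, 2, 3, 4]  # hypothetical budget grid
cr3 = cumulative_recall(scores, labels, budgets, k=3)
```

Because recall is summed rather than averaged, CR-k rewards models that surface threats at small budgets, since early hits are counted again at every larger budget.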


Baseline Models

• PCA Reconstruction

• Isolation Forest

• One-Class Support Vector Machine
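Of the three baselines, PCA reconstruction is the easiest to sketch directly: project each vector onto the leading principal directions and score it by how much is lost. The NumPy example below is illustrative only; the component count and the synthetic data are assumptions, and the other two baselines are not shown.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return the training mean and the top principal directions."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_anomaly_scores(X, mean, components):
    """Squared error between each row and its low-rank reconstruction."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)


rng = np.random.default_rng(0)
# Synthetic training data lying close to a line in 3-D.
t = rng.normal(size=(200, 1))
X_train = t @ np.array([[1.0, 2.0, 3.0]]) + 0.01 * rng.normal(size=(200, 3))
mean, components = fit_pca(X_train, n_components=1)

on_line = np.array([[1.0, 2.0, 3.0]])    # follows the training pattern
off_line = np.array([[3.0, -2.0, 1.0]])  # departs from it
scores = pca_anomaly_scores(np.vstack([on_line, off_line]), mean, components)
```

The point that follows the training pattern reconstructs almost exactly, while the other leaves a large residual and therefore receives the higher score.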


DNN vs. RNN vs. Baselines


Best results: DNN-diag


Interpretable Assessments

Decomposed negative log probabilities provide insight into anomaly score

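Figure: pie chart of contributions to the anomaly score (Day 418, true positive). Slice labels:

• Write to a globally uncommon file locally, 12pm-6pm

• Copy to a globally uncommon file from removable media, 6am-12pm

• Write globally uncommon file locally, 6pm-12am

• Other

Slice values (in extraction order): 77.0%, 8.2%, 7.4%, 7.4%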
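A breakdown like the one in this figure can be produced by expressing each feature's negative log probability as a share of the total score. The sketch below assumes the per-feature terms are non-negative and uses made-up feature names and values; it is not the data behind the slide's chart.

```python
def contribution_shares(terms, top_n=3):
    """Rank per-feature negative log probabilities by share of the total.

    Assumes every value in `terms` is non-negative and the total is > 0.
    Returns the top_n (name, percent) pairs plus an "Other" remainder.
    """
    total = sum(terms.values())
    ranked = sorted(terms.items(), key=lambda kv: kv[1], reverse=True)
    top = [(name, 100.0 * value / total) for name, value in ranked[:top_n]]
    other = 100.0 - sum(share for _, share in top)
    return top + [("Other", other)]


# Hypothetical per-feature scores for one user-day (not from the slides).
terms = {"file_write_12pm_6pm": 6.0, "file_copy_6am_12pm": 2.0,
         "logon": 1.0, "web_visit": 0.5, "email": 0.5}
shares = contribution_shares(terms, top_n=2)
```

An analyst reading `shares` sees which few behaviours drove the alert, with everything else folded into a single remainder.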


Outline

• Data processing and feature extraction

• Neural networks and our model

• Experiments and results

• Takeaways


Takeaways

• Online unsupervised deep learning architecture

• Interpretable assessments

• System meets constraints of the online scenario

  • Assessments in real time

  • Bounded memory requirements

• System outperforms standard anomaly detection techniques

• Approach is applicable to a more general class of problems


Thank you

• Questions?
