Applying AI to automate threat detection and empower threat hunting Darren Gale, Senior Director, High Growth Markets, Vectra.ai
I propose to consider the question, 'Can machines think?'
The Turing test…
“if a computer could fool people into thinking that they were
interacting with another person, rather than a machine, then it
could be classified as possessing artificial intelligence”
Essentially: can a machine perform part of a role
normally occupied by a human?
Alan Mathison Turing OBE FRS
What is AI?
Artificial Intelligence: technologies that
approximate the human mind
Machine Learning: algorithms learn to make
predictions from data
Representation Learning: learning abstract
representations of data
Deep Learning: learning a series of
hierarchical abstractions to make predictions
from data
Adapted from “Deep Learning,” Goodfellow, Bengio & Courville (2016)
[Figure: nested sets, Deep Learning within Representation Learning within Machine Learning within Artificial Intelligence]
From Artificial Intelligence to Deep Learning
Supervised
Labeled Data Are Available
Learn to Predict Label from Data
Unsupervised
No Labeled Data Are Available
Discover Structure in the Data
Types of Learning: Supervised vs. Unsupervised
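The difference between the two types of learning can be sketched on a toy dataset. This is a minimal illustration, not a production detector: the sample values, the labels, and the helper names (`fit_centroids`, `predict`, `two_means`) are all assumptions made for the example.

```python
from statistics import mean

# Toy 1-D data: a hypothetical per-session traffic measurement.
samples = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]

# Supervised: labels are available, so we learn to predict the label.
labels = ["benign", "benign", "benign", "malicious", "malicious", "malicious"]

def fit_centroids(xs, ys):
    """Return the mean value per label (a nearest-centroid classifier)."""
    return {label: mean(x for x, y in zip(xs, ys) if y == label)
            for label in set(ys)}

def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

centroids = fit_centroids(samples, labels)
print(predict(centroids, 4.5))  # → malicious

# Unsupervised: no labels, so we discover structure (two clusters).
def two_means(xs, iterations=10):
    """Split 1-D points into two clusters with Lloyd's algorithm."""
    lo, hi = min(xs), max(xs)  # initial cluster centres
    for _ in range(iterations):
        left = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        right = [x for x in xs if abs(x - lo) > abs(x - hi)]
        lo, hi = mean(left), mean(right)
    return left, right

clusters = two_means(samples)
print(clusters)
```

The supervised half needs someone to have labelled the data first; the unsupervised half finds the same two groups but cannot say which one is malicious.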
[Figure: learning algorithms arranged along two axes, SUPERVISED vs. UNSUPERVISED and SHALLOW vs. DEEP]
Algorithms shown: K-Means, DBSCAN, Logistic Regression, KNN, PCA, SVM, One-Class SVM, GMM, Naïve Bayes, HMM, RBE, MDN, Decision Tree, Random Forest, Isolation Forest, Deep Autoencoder, Deep Neural Network, Network Embeddings, ARTMAP, ART, RBM, Perceptron, DBN
Group label in the figure: Neural Networks
A Broad spectrum of learning approaches
Telephone central office
Automotive manufacturing
Electronics manufacturing
1930s to 1950s
Industries that underwent radical transformation
Would this have felt like AI?
Big data sets and looking for a needle in a haystack…
Exoplanets, subatomic particles
…what about cyber?
What are the best use cases for AI?
Naive Bayes has been studied extensively since the 1950s.
Quadratic classifier: 1965 paper by T. M. Cover
Deep learning: LeCun et al. (1998). "Gradient-based learning applied to document recognition"
Random decision forests: first algorithm by Tin Kam Ho (1995)
Decision trees: Quinlan, J. R. (1986). "Induction of decision trees"
Temporal-difference learning: Sutton, R. (1988). "Learning to predict by the methods of temporal differences"
K-means: Lloyd, S. P. (1957). "Least squares quantization in PCM"
Affinity propagation: Frey & Dueck (2007). "Clustering by passing messages between data points"
Why now for AI in Information Security?
Mathematics developments + Compute power = 4th industrial revolution (AI)
Source: Wong 2011
Cybersecurity skill gap → Manual hunting alone won’t work.
>350,000 unfilled European cybersecurity jobs by 2022*
Signatures cannot detect skilled adversaries.
[Chart values] 2013: 10%, 2014: 13%, 2015: 18%, 2016: 23%, 2017: 30%, 2018: 39%, 2019: 50%, 2020: 60%
2013-2020 CAGR + 29.2%
Source: Gartner, CS Communications Infrastructure Team,
Credit Suisse Research
% of IT security budget dedicated to detection and response
Enterprise security budget has been shifting
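The quoted growth rate can be checked from the chart's endpoints. The short calculation below takes the first and last chart values (10% for 2013, 60% for 2020) and applies the standard compound annual growth rate formula; the variable names are only illustrative.

```python
# Share of IT security budget dedicated to detection and response.
share_2013 = 0.10
share_2020 = 0.60
years = 2020 - 2013  # 7 compounding periods

# CAGR = (end / start) ** (1 / periods) - 1
cagr = (share_2020 / share_2013) ** (1 / years) - 1
print(f"CAGR 2013-2020: {cagr:.1%}")  # prints "CAGR 2013-2020: 29.2%"
```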
Traditional signatures: how the threat looks
  Find threats that you’ve seen before
  Snapshot in time
  No local context
  Exact match of previously seen threat
AI: what the threat does
  Find what all threats have in common
  Learning over time
  Local learning and context
  Finds methods, even if IoCs are unknown
AI detects what signatures cannot. Efficiently.
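The contrast between matching how a threat looks and detecting what it does can be sketched as follows. This is only an illustration under stated assumptions: the indicator domains are placeholders, and the beaconing heuristic with its `max_jitter` and `min_events` thresholds is a simple hand-written stand-in for a learned behavior model, not any vendor's method.

```python
from statistics import mean, pstdev

# Signature approach: exact match against previously seen indicators.
# The indicator values are hypothetical placeholders.
KNOWN_BAD_DOMAINS = {"bad.example", "evil.example"}

def signature_match(domain):
    return domain in KNOWN_BAD_DOMAINS

# Behavioral approach: flag near-periodic "beaconing", whatever the domain.
def looks_like_beacon(timestamps, max_jitter=0.1, min_events=5):
    """True if the gaps between connections are suspiciously regular.

    max_jitter is an assumed threshold on the coefficient of variation
    (standard deviation / mean) of the inter-arrival gaps.
    """
    if len(timestamps) < min_events:
        return False
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    avg = mean(gaps)
    return avg > 0 and pstdev(gaps) / avg <= max_jitter

beacon_times = [0, 60, 121, 180, 241, 300]  # roughly every 60 seconds
human_times = [0, 5, 7, 90, 95, 400]        # bursty, irregular

print(signature_match("new-c2.example"))  # → False (unknown IoC)
print(looks_like_beacon(beacon_times))    # → True
print(looks_like_beacon(human_times))     # → False
```

The signature check misses the new domain because it has never been seen; the behavioral check does not need the domain at all.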
Prevent: Stop Initial Compromise
  Firewalls and IPS
  Endpoint AV
  Sandboxes
  Web & Email Security
  Never 100% effective. Early detection is key.
Detect: Find Post-Compromise Activity
  SOC Analysts
  Threat hunting teams
  Legacy technologies deeper in the network
  Security Gap: 100+ days to find attackers (Source: M-Trends 2018)
  EMPOWER THREAT HUNTERS WITH AI
Clean-up: Contain and Remediate
  Logs
  Forensic tools
  IR Consultants
AI is Required to Reduce Attack Dwell Time
Endpoint, SIEM, Network
Where can you apply AI in the context of Cyber?
Security Research: Characterize fundamental attacker behaviors
Data Science: ML models to accurately detect behaviors
Attacker Behavior models: High-fidelity, signatureless detection
Command and Control: Advanced C2 (human control), Botnet C2
Reconnaissance: Network sweeps and scans; Advanced: AD, RPC, shares
Lateral Movement: Stolen accounts, Exploits, Backdoors
Exfiltration: Data movement; Methods, e.g. tunnels
10 Patents Awarded
20+ Patents Pending
Steps:
  Collect advanced attack samples
  Come up with advanced attacks
  Abstract the behavior and form a theory
  Collect positive and negative samples
  Extract features out of the samples
  Work the theory on offline data
  Refine into detection model
  Deploy and test on live data
  Review results
  Design UI
  Develop UI
  Put detection into production
  Check efficacy; improve where necessary
  Improve and redeploy
Roles involved: Security Researchers; Security Researchers + Data Scientists; Product Designer; Developers
What it takes to build and maintain an algorithm
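The "work the theory on offline data" and "check efficacy" steps usually come down to comparing a detector's output with labelled positive and negative samples. A minimal sketch, assuming boolean predictions and labels; the `evaluate` helper and the sample data are hypothetical.

```python
def evaluate(predictions, labels):
    """Return (precision, recall) for boolean predictions vs. ground truth."""
    tp = sum(p and y for p, y in zip(predictions, labels))      # true positives
    fp = sum(p and not y for p, y in zip(predictions, labels))  # false alarms
    fn = sum(not p and y for p, y in zip(predictions, labels))  # missed attacks
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical offline data: True = attack sample, False = benign sample.
labels      = [True, True, True, False, False, False, False, False]
predictions = [True, True, False, True, False, False, False, False]

precision, recall = evaluate(predictions, labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision means analysts drown in false alarms; low recall means attacks slip through. Either result sends the model back to the "improve and redeploy" step.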
Command and Control: External Remote Access, Hidden DNS Tunnel, Hidden HTTP/S Tunnel, Suspicious Relay, Suspect Domain Activity, Malware Update, Peer-to-Peer, Pulling Instructions, Suspicious HTTP, Stealth HTTP Post, TOR Activity, Threat Intel Match
Reconnaissance: Internal Darknet Scan, Port Scan, Port Sweep, SMB Account Scan, Kerberos Account Scan, File Share Enumeration, Suspicious LDAP Query, RDP Recon, RPC Recon
Lateral Movement: Suspicious Remote Exec, Suspicious Remote Desktop, Suspicious Admin, Shell Knocker, Automated Replication, Brute-Force Attack, SMB Brute-Force, Kerberos Brute Force, Suspicious Kerberos Client, Suspicious Kerberos Account, Kerberos Server Activity, Ransomware File Activity, SQL Injection Activity
Exfiltration: Data Smuggler, Smash and Grab, Hidden DNS Tunnel, Hidden HTTP/S Tunnel
Botnet Monetization: Abnormal Web or Ad Activity, Cryptocurrency Mining, Brute-Force Attack, Outbound DoS, Outbound Port Sweep, Outbound Spam
Silent tripwires across the KillChain
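Two of the reconnaissance entries, Port Scan and Port Sweep, differ only in shape: a scan probes many ports on one host, while a sweep probes one port across many hosts. The sketch below shows one way to tell them apart from flow records. The `classify_recon` function, the `(src, dst, dst_port)` record layout, and the fixed `threshold` are assumptions for illustration; a learned detector would model what is normal for each host rather than use a constant.

```python
from collections import defaultdict

def classify_recon(flows, threshold=20):
    """Label each source as 'port scan', 'port sweep', both, or neither.

    flows is an iterable of (src, dst, dst_port) tuples.
    threshold is an assumed cut-off on distinct ports or hosts.
    """
    ports_per_target = defaultdict(set)  # (src, dst) -> ports probed
    hosts_per_port = defaultdict(set)    # (src, port) -> hosts probed
    for src, dst, port in flows:
        ports_per_target[(src, dst)].add(port)
        hosts_per_port[(src, port)].add(dst)

    findings = defaultdict(set)
    for (src, _dst), ports in ports_per_target.items():
        if len(ports) >= threshold:
            findings[src].add("port scan")   # many ports, one host
    for (src, _port), hosts in hosts_per_port.items():
        if len(hosts) >= threshold:
            findings[src].add("port sweep")  # one port, many hosts
    return dict(findings)

# Synthetic traffic: one scanner and one sweeper.
scan = [("10.0.0.5", "10.0.0.9", p) for p in range(1, 101)]
sweep = [("10.0.0.7", f"10.0.1.{h}", 445) for h in range(1, 51)]
result = classify_recon(scan + sweep)
print(result)
```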
Data: Samples of DNS Command and Control
channel traffic and normal DNS traffic.
Features and Separability: Traffic is decomposed into a
set of features in which the two classes are linearly separable.
Model Choice: Linearly separable labeled
dataset? Good candidate for straightforward,
highly interpretable Logistic Regression.
Example 1: Supervised learning
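A logistic regression of the kind Example 1 describes can be sketched in plain Python. The two features (mean query-name length and query-name character entropy, both scaled to [0, 1]), the sample values, and the training settings are all assumptions chosen to make a small, linearly separable toy set; the source does not say which features were actually used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights with batch gradient descent."""
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias

def predict_proba(weights, bias, x):
    """Probability that x belongs to the positive (C2) class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical features per DNS session:
# (mean query-name length, query-name character entropy).
X = [(0.10, 0.20), (0.15, 0.25), (0.20, 0.15), (0.25, 0.30),  # normal DNS
     (0.80, 0.85), (0.90, 0.75), (0.85, 0.95), (0.70, 0.80)]  # DNS C2
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, bias = train(X, y)
print(weights, bias)  # one weight per feature, which aids interpretation
print(predict_proba(weights, bias, (0.9, 0.9)))
```

The interpretability mentioned in the example comes from the weights: each one says how strongly its feature pushes a session toward the C2 class.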
Data: Samples of Remote Access Tool traffic
and normal traffic.
Features and Separability: Timeseries with
traffic statistics at each moment in time; not
even close to linearly separable.
Model Choice: Not linearly separable? Inputs
are timeseries rather than static vectors?
This calls for a recurrent deep neural network.
Example 2: Supervised learning
Data: DCE/RPC data for UUIDs performing
remote code execution across the Vectra installed
base.
Features and Constraints: Timeseries of [uuid,
src, dst, account] tuples on DCE/RPC.
Model Choice: Custom novelty detector
anchored on UUIDs to detect unexpected remote
execution.
Example 3: Unsupervised learning
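The idea of a novelty detector anchored on UUIDs can be sketched in a few lines. This is a deliberately simple stand-in, not the custom detector the example refers to: it remembers which (src, dst, account) contexts have been seen for each UUID and flags anything new. The class name, the method names, and the placeholder UUID strings are assumptions.

```python
from collections import defaultdict

class UuidNoveltyDetector:
    """Flag DCE/RPC calls whose context was never seen for that UUID."""

    def __init__(self):
        # uuid -> set of (src, dst, account) contexts seen while learning
        self.seen = defaultdict(set)

    def learn(self, uuid, src, dst, account):
        self.seen[uuid].add((src, dst, account))

    def is_novel(self, uuid, src, dst, account):
        # .get avoids creating an entry for a UUID we have never learned
        return (src, dst, account) not in self.seen.get(uuid, set())

detector = UuidNoveltyDetector()

# Hypothetical baseline period: routine remote execution by a service account.
baseline = [
    ("uuid-remote-exec", "10.0.0.2", "10.0.0.10", "svc_backup"),
    ("uuid-remote-exec", "10.0.0.2", "10.0.0.11", "svc_backup"),
]
for event in baseline:
    detector.learn(*event)

print(detector.is_novel("uuid-remote-exec", "10.0.0.2", "10.0.0.10", "svc_backup"))  # → False
print(detector.is_novel("uuid-remote-exec", "10.0.0.77", "10.0.0.10", "alice"))      # → True
```

No labels are needed: the detector only learns what is normal, which is why this counts as unsupervised learning.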
Cognito Detect
Host Scoring: Scores host risk based on a combination of behaviors over time
Detection Scoring: Scores risk of a single attacker behavior based on similar activity over time
Attacker Behavior Detection
Attack Campaigns: Identifies related behaviors across hosts
Data sources: Packets (primary data source), IoCs, Syslog (from authentication)
Automation Stack
Automated Hunting for attacker behaviours
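Scoring a host on "a combination of behaviors over time" can be sketched as follows. The formula here is an assumption made for illustration and is not the product's scoring method: each detection score decays with age, and the decayed scores are combined so that several medium behaviors outrank any one of them alone. The `host_score` function and the `half_life_hours` parameter are hypothetical.

```python
def host_score(detections, now, half_life_hours=24.0):
    """Combine detection scores into one host score in [0, 100].

    detections: list of (timestamp_hours, score) with score in [0, 100].
    Older detections count less (exponential decay with the given half-life).
    """
    remaining = 1.0  # probability-style "no risk" mass left
    for ts, score in detections:
        age = max(0.0, now - ts)
        decayed = (score / 100.0) * 0.5 ** (age / half_life_hours)
        remaining *= 1.0 - decayed
    return 100.0 * (1.0 - remaining)

print(host_score([(48.0, 50)], now=48.0))              # → 50.0
print(host_score([(48.0, 50), (48.0, 50)], now=48.0))  # → 75.0
print(host_score([(24.0, 50)], now=48.0))              # → 25.0
```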
Vectra is the only visionary in the Gartner 2018 Magic Quadrant for Intrusion Detection and Prevention Systems
Source: Gartner Magic Quadrant for Intrusion Detection and Prevention Systems,
January 2018. ID Number: G00324914
All statements in this report attributable to Gartner represent
Vectra’s interpretation of data, research opinion or viewpoints
published as part of a syndicated subscription service by Gartner,
Inc., and have not been reviewed by Gartner. Each Gartner
publication speaks as of its original publication date (and not as of
the date of this presentation). The opinions expressed in Gartner
publications are not representations of fact, and are subject to
change without notice.
We chose Vectra as the winner because it prioritizes threats and reduces attacker dwell time with a lightweight solution.
– Tim Wilson, Editor in Chief, Dark Reading, 2016
200K hosts monitored
Red team detected: hosts → Critical
Only 2 hosts per day overall → Critical
Workload reduction of 44X
Cognito separates high-fidelity signal from the daily noise
30-day analysis of Vectra’s value at a large customer
REDUCE DWELL TIME
AUTOMATE FOR EFFICIENCY
Empowering Threat Hunters with AI
Thank you