Improving Intrusion Detectors by Crook-Sourcing
Frederico Araujo, IBM Research
The 35th Computer Security Applications Conference
Gbadebo Ayoade, Khaled Al-Naami, Yang Gao, Kevin Hamlen, and Latifur Khan, The University of Texas at Dallas
The research reported herein was supported in part by ONR award N00014-17-1-2995; NSA award H98230-15-1-0271; AFOSR award FA9550-14-1-0173; NSF FAIN awards DGE-1931800, OAC-1828467, and DGE-1723602; NSF awards DMS-1737978 and MRI-1828467; an IBM faculty award (Research); and an HP grant. Any opinions, recommendations, or conclusions expressed are those of the authors and not necessarily of the aforementioned supporters.
Information Asymmetry (Kasparov vs. Deep Blue, 1997)
1997: IBM Deep Blue becomes the first machine to beat a chess grandmaster (Garry Kasparov) under tournament conditions.
After the match, Kasparov complained that the match was unfair:
“It was difficult to prepare for an opponent with no games. … I couldn’t prepare myself properly for such an event. … You have to know your opponent!” –Garry Kasparov
In contrast, Deep Blue had trained using every match Kasparov had ever played.
IBM Research / June 29, 2018 / © 2018 IBM Corporation
Information asymmetry in cyber defense
Attackers have months or years to study vulnerabilities and defenses
Defenders have seconds to react to never-before-seen attacks
1 in 13 web requests lead to malware
19.2% of all web application vulnerabilities are high or critical (24.9% for internal networks)
Sources: ISTR, vol. 23, 2018; Edgescan, 2019; Ponemon, 2019
IBM Research / June 12, 2019 / © 2019 IBM Corporation
Information asymmetry & ML for intrusion detection
§ ML offers great promise for powerful, fast intrusion detection
– It has succeeded in face and speech recognition, recommendation systems, natural language translation, …
§ Yet most deployed IDS solutions are still based on human-written rules with weak AI support. Why?
(1) Unbalanced data: It is hard to get enough malicious data to properly train ML-based IDSes.
(2) Huge feature space: Security-relevant features within the data are not known in advance.
(3) Encryption opacity: Encrypted traffic is commonplace and hides much of the best data.
(4) False alarms: High false alarm rates lead to very low base detection rates.
The task of identifying attacks is fundamentally different from other application domains where machine learning is applied.
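The false-alarm point (4) can be made concrete with Bayes' rule. The sketch below is only an illustration: the function name `alert_precision` and the numbers used (1 attack per 10,000 events, 99% detection rate, 1% false alarm rate) are assumptions chosen for the example, not figures from this talk.

```python
def alert_precision(base_rate: float, tpr: float, fpr: float) -> float:
    """Return P(attack | alert) by Bayes' rule.

    base_rate: fraction of events that are attacks (assumed value)
    tpr:       detection rate, P(alert | attack)
    fpr:       false alarm rate, P(alert | benign)
    """
    true_alerts = base_rate * tpr
    false_alerts = (1.0 - base_rate) * fpr
    return true_alerts / (true_alerts + false_alerts)


# Hypothetical numbers: attacks are rare, the detector looks accurate on paper.
p = alert_precision(base_rate=1e-4, tpr=0.99, fpr=0.01)
print(f"P(attack | alert) = {p:.4f}")  # roughly 0.0098, i.e. under 1% of alerts are real
```

With attacks this rare, even a 1% false alarm rate means that almost every alert is a false alarm, which is why reducing false positives matters so much for an IDS.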
Main idea:
When an attack is detected, don’t disconnect it!
Keep the attacker talking to harvest threat data.
Apply automated data mining for IDS training.
IDS learns over time with no data collection burden.
Research Question: Does such an IDS actually learn concepts useful for thwarting real attacks?
(Spoiler alert: Yes, with surprising effectiveness!)
crook-sourcing (noun): the conscription and manipulation of attackers into performing free penetration testing for improved IDS model training and adaptation.
Detected attacks are missed IDS training opportunities
Attack kill chain: a vicious cycle
[Figure: two diagram panels. Panel 1 labels: attack, secrets. Panel 2 labels: attack, reject.]
Conventional software security patches advertise themselves to attackers, which:
§ facilitates low-risk reconnaissance
§ accentuates the information and time asymmetry that favors attackers
§ amplifies the impact of n-day exploits
Enhancing IDSes through crook-sourcing
[Figure: diagram labels: attack, fake secrets.]
software security patches repurposed as feature extractors
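The contrast between a conventional patch and a patch repurposed as a feature extractor can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Decoy`, `is_exploit`, `conventional_patch`, and `honey_patch` are hypothetical names, and the path-traversal check merely stands in for whatever condition a real security patch tests.

```python
from dataclasses import dataclass, field


@dataclass
class Decoy:
    """Hypothetical stand-in for a decoy environment that serves fake secrets."""
    trace: list = field(default_factory=list)

    def handle(self, request: str) -> str:
        self.trace.append(request)  # everything logged here is attack-labeled
        return "fake-secret"


def is_exploit(request: str) -> bool:
    # Hypothetical check standing in for the condition a security patch tests.
    return "../" in request


def conventional_patch(request: str) -> str:
    if is_exploit(request):
        return "reject"  # reveals to the attacker that the hole is closed
    return "ok"


def honey_patch(request: str, decoy: Decoy) -> str:
    if is_exploit(request):
        return decoy.handle(request)  # keep the attacker talking, record the session
    return "ok"


decoy = Decoy()
print(conventional_patch("GET /../etc/passwd"))  # reject
print(honey_patch("GET /../etc/passwd", decoy))  # fake-secret
print(decoy.trace)                               # ['GET /../etc/passwd']
```

The same check drives both variants; only the response differs. The decoy's trace is what later becomes labeled attack data.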
Crook-sourcing advantages
§ Deceive attackers into performing free penetration testing for IDS model training and adaptation
– attackers contribute their TTP patterns to the data streams processed by the embedded deceptions
– automatically labels malicious attacker behavior
§ Enables (semi-)supervised learning for intrusion detection
– improves base detection rates
– enables multi-class detection and contextually richer predictions
§ Overcomes issues related to concept differences between honeypot attacks and those against genuine assets
– deceptions are embedded into the actual target of attacks
System architecture
[Figure: system architecture. Components: user, attacker, honey-patched servers, anomaly detector. Streams: monitoring stream, audit stream, attack traces. Attack modeling: data queueing, feature extraction, attack data, audit data, model update. Attack detection: feature extraction, monitoring data, attack model classifier, alerts.]
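The attack-modeling half of the architecture can be sketched as a small labeling queue. This is a hedged sketch only: it assumes the audit stream is treated as benign and the attack traces as malicious, and the names `label_sessions` and `drain` and the callbacks `extract` and `update` are hypothetical placeholders for the feature extraction and model update stages.

```python
from collections import deque


def label_sessions(audit_stream, attack_traces):
    """Queue sessions with labels: 0 for audit data, 1 for attack traces."""
    queue = deque()
    for session in audit_stream:
        queue.append((session, 0))  # assumption: audit stream is benign
    for session in attack_traces:
        queue.append((session, 1))  # sessions captured by embedded deceptions
    return queue


def drain(queue, extract, update):
    """Feed every queued session through feature extraction into a model update."""
    processed = 0
    while queue:
        session, label = queue.popleft()
        update(extract(session), label)
        processed += 1
    return processed


# Toy usage: "features" are just session lengths, the "model" is a list.
training_set = []
q = label_sessions(["ab", "cd"], ["xyz"])
count = drain(q, len, lambda features, label: training_set.append((features, label)))
print(count, training_set)  # 3 [(2, 0), (2, 0), (3, 1)]
```

No manual labeling step appears anywhere in the loop, which is the point of harvesting attack traces from the deceptions.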
Feature set models
§ Network features (Bi-Di)
− Packet length
− Uni-burst size, time, count
− Bi-burst size, time
§ System features (N-Gram)
− System calls: enter or exit
− Bi-, tri-, and quad-events
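A rough sketch of both feature families follows. It assumes a burst is a run of consecutive packets in the same direction and that system events are strings such as "enter:open"; the function names, tuple layouts, and sample values are illustrative assumptions, not the feature extractor used in this work.

```python
from collections import Counter
from itertools import groupby


def ngrams(events, n):
    """Count n-grams (n=2, 3, 4 for bi-, tri-, quad-events) over an event sequence."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


def uni_bursts(packets):
    """Group packets into uni-directional bursts.

    packets: list of (timestamp_ms, direction, length), direction is +1 or -1.
    Returns a list of (direction, burst_size, burst_time_ms, packet_count).
    """
    bursts = []
    for direction, group in groupby(packets, key=lambda p: p[1]):
        g = list(group)
        size = sum(p[2] for p in g)
        duration = g[-1][0] - g[0][0]
        bursts.append((direction, size, duration, len(g)))
    return bursts


events = ["enter:open", "exit:open", "enter:read", "exit:read"]
print(ngrams(events, 2))

packets = [(0, 1, 100), (100, 1, 200), (300, -1, 1500), (400, 1, 60)]
print(uni_bursts(packets))  # [(1, 300, 100, 2), (-1, 1500, 0, 1), (1, 60, 0, 1)]
```

Bi-burst features could then be derived from adjacent pairs of opposite-direction bursts in the returned list.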
Attack detection (model 1)
§ Bi-Di-SVM: Network features + SVM
§ N-Gram-SVM: System features + SVM
§ Ens-SVM: ensemble
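The slide does not say how Ens-SVM combines the two classifiers. One simple possibility, shown here purely as an assumption, is to average the signed decision scores of the network and system models. The linear scorers below are hypothetical stand-ins for trained SVMs, with made-up weights.

```python
def linear_score(weights, bias, x):
    """Signed decision score of a linear classifier (stand-in for a trained SVM)."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def ensemble_predict(scores):
    """Average the decision scores; a positive mean is classified as attack (1)."""
    mean = sum(scores) / len(scores)
    return 1 if mean > 0 else 0


# Hypothetical models and feature vectors for one session.
net_score = linear_score([0.5, -1.0], 0.1, [2.0, 0.5])  # network (Bi-Di) view
sys_score = linear_score([1.0], -2.0, [1.5])            # system (N-Gram) view
print(ensemble_predict([net_score, sys_score]))
```

Score averaging lets a confident model outvote an uncertain one, which plain majority voting over two classifiers cannot do.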
Attack detection (model 2)
§ Bi-Di-OML: Network features + OAML + k-NN
§ N-Gram-OML: System features + OAML + k-NN
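The second model pairs a learned metric with k-NN. The sketch below shows only the k-NN step in an embedded space. As a loudly labeled simplification, a fixed linear map `L` stands in for the embedding that OAML would learn online; the matrix, training points, and function names are hypothetical.

```python
import math
from collections import Counter


def embed(L, x):
    """Project x with matrix L (a fixed stand-in for a learned metric embedding)."""
    return [sum(l * v for l, v in zip(row, x)) for row in L]


def knn_predict(L, train, x, k=3):
    """Majority vote among the k nearest training points in the embedded space."""
    q = embed(L, x)
    nearest = sorted((math.dist(embed(L, xi), q), yi) for xi, yi in train)
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]


# Hypothetical metric that down-weights a noisy second feature.
L = [[1.0, 0.0],
     [0.0, 0.1]]
train = [([0.0, 0.0], 0), ([0.2, 9.0], 0), ([3.0, 1.0], 1), ([3.2, 8.0], 1)]
print(knn_predict(L, train, [2.9, 9.0], k=3))  # 1
```

A metric learner would adjust the embedding as new labeled sessions arrive, so that attack and benign instances move apart before the k-NN vote.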
Online Adaptive Metric Learning
Experimental framework
[Figure: experimental framework. Replicated deployments, each with user, attacker, honey-patched servers, anomaly detector, monitoring stream, audit stream, and attack traces; red teaming.]
Vulnerability Classes
Dataset
§ Raw data: 42 GB of (uncompressed) network packets and system events over a period of three weeks
§ Training data: after feature extraction, the training data comprised 1800 normal instances and 1600 attack instances
§ Testing data: 3400 normal and attack instances gathered from monitors deployed at unpatched servers, where the distribution of normal and attack instances varies per experiment
§ Red teaming data: collected over three days from 10 graduate students with basic to advanced offensive security skills, in sessions averaging 45 minutes
Detection accuracies on simulated environment
Bi-Di: network features; N-Gram: system features
Red teaming validation
Bi-Di: network features; N-Gram: system features
False positive rate reduction
Crook-sourcing advantage
[Figure: accuracy (%) versus number of attack classes (1 to 16), no deception versus deceptive defense. Experiments on synthetic data approximating numerous attackers.]
Human subject evaluation: a cautionary tale
[Figure: two panels plotting accuracy (%) versus number of attack classes, no deception versus deceptive defense. Panel 1: experiments on synthetic data approximating numerous attackers. Panel 2: experiments with 10 actual human attackers (students).]
Monitoring performance
Host: 16 cores, 24 GB RAM, 64-bit Ubuntu 16.04 LTS
Benchmark profile: c=10, 500 req/s
Conclusions
§ Crook-sourcing yields higher-accuracy detection models
– no additional developer effort apart from routine patching activities
– effortless labeling of the data
§ Deceive attackers into disclosing their TTP patterns for IDS model evolution
– embedded deceptions extract relevant features from attack sessions
§ Enables semi-supervised learning for intrusion detection
– improves base detection rates
– enables multi-class detection and contextually richer predictions
Thank you
Frederico Araujo—[email protected]/faraujo
© D. Kirat