Cyber-Attack Forecasting: A Proactive Approach to Defensive Cyberwarfare
Malachi Jones, PhD, Cyber Security Technologist
Aug 07, 2015
About Me (Cyber-Security Background)
• Georgia Tech (2007-2013)
– Security research collaboration between Georgia Tech (GT) and University of California Santa Barbara (UCSB)
– PhD thesis topic: “Cyber-Attack Forecasting” [1]
• Harris Corporation (2013 – Present)
– (2014) Crypto-system software development and security consultant
– (2015) Cyber Security Vulnerability Researcher
Giovanni Vigna, PhD (Security Researcher)
Joao Hespanha, PhD (Game Theorist)
Jeff Shamma, PhD (Game Theorist)
Georgios Kotsalis, PhD (Game Theorist)
Malachi Jones, PhD (Security Researcher)
Outline
• Motivation: Reactive vs. Proactive
• Background
– Game Theory
– Machine Learning
• Cyber-Attack Forecasting
– Modeling a Cyber System
– Analyzing the Model
• Conclusion
• Questions
• Additional Resources
Motivation: Reactive vs Proactive
• Reactive Security
– Backward looking: addressing yesterday’s security threats today
– Status quo in the cyber-security community
– Effective against novice hackers
– Inadequate against
• Advanced Persistent Threats (APTs)
• Sophisticated cyberweapons
[Figure: Teen hacker in basement vs. state-sponsored hacking]
Motivation: Reactive vs Proactive
• Reactive Cyber-Security Process
– Hacker develops new technique
– Technique tested against security systems
– Technique adopted by other hackers
– Security community eventually responds
Motivation: Proactive Approach (Healthcare)
• Forecasting Infections/Diseases
– Reliably predict the next outbreak of an infection or disease
– Learn/estimate the capabilities of the disease (e.g., how contagious it is)
– Proactive countermeasures
• Provide vaccinations
• Quarantine infected individuals
• Set up medical facilities near areas where the outbreak is likely to be worst
Motivation: Proactive Approach (Cyber Security)
• Forecasting a Cyber Attack
– Reliably predict a cyber-attack
– Learn/estimate attacker and/or malware capabilities
– Launch proactive countermeasures
• Take infected systems offline
• Scrub and reinstall systems
• Restrictive actions (e.g., sandbox databases/datastores)
• Perform more invasive “checkups” on systems likely to be infected
Motivation: Cyber Attack Forecasting
• Forecasting Challenges
– Modeling the attacker and cyber system in an analytical framework
– Computational complexity of analyzing the model to predict future attacks
Background: Game Theory
• Cyber Security
– At least two decision makers (e.g., cyber defender and attacker)
– Want to predict the likely behavior of the attacker
– Objective: make “good” decisions to defend against cyber-attacks
• Game Theory
– Mathematical decision framework
– Provides methods to analyze interactions among decision makers
– Can allow us to predict the likely actions of an adversary and recommend appropriate actions for the defender
Background: Game Theory
• Prisoner's Dilemma
– Police arrest two suspects
– Suspects are interrogated in separate rooms
– Each suspect can choose an action:
• Cooperate: Stay silent (Not Guilty)
• Defect: Confess and “rat out” the other suspect (Guilty)
• Analysis of the likely behavior of the decision makers
– Best outcome for the group is for both to Cooperate
– Best outcome for each individual is to Defect and rat out the other suspect
– Predicted outcome: both decision makers Defect
Payoffs (row suspect, column suspect), read as prison sentences (lower is better):
        C      D
C      2,2    5,1
D      1,5    3,3
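The analysis above can be checked mechanically. The Python sketch below reads each matrix entry as (row suspect's sentence, column suspect's sentence) with lower being better, then tests every action pair for a profitable unilateral deviation; the pairs that survive are the pure-strategy Nash equilibria. The function names are illustrative, not part of the original work.

```python
from itertools import product

# Payoff matrix from the slide, read as (row cost, column cost).
# Assumption: entries are prison sentences, so lower is better.
ACTIONS = ("C", "D")
COSTS = {
    ("C", "C"): (2, 2),
    ("C", "D"): (5, 1),
    ("D", "C"): (1, 5),
    ("D", "D"): (3, 3),
}


def pure_nash_equilibria(costs, actions):
    """Return action pairs where neither player can lower its own cost
    by unilaterally switching actions."""
    equilibria = []
    for row, col in product(actions, repeat=2):
        row_cost, col_cost = costs[(row, col)]
        # Row player: no alternative row (column held fixed) is cheaper.
        row_ok = all(costs[(r, col)][0] >= row_cost for r in actions)
        # Column player: no alternative column (row held fixed) is cheaper.
        col_ok = all(costs[(row, c)][1] >= col_cost for c in actions)
        if row_ok and col_ok:
            equilibria.append((row, col))
    return equilibria


def group_optimum(costs):
    """Return the action pair with the lowest combined cost."""
    return min(costs, key=lambda pair: sum(costs[pair]))


if __name__ == "__main__":
    print("Nash equilibria:", pure_nash_equilibria(COSTS, ACTIONS))
    print("Group optimum:", group_optimum(COSTS))
```

Running it reports (D, D) as the only equilibrium and (C, C) as the group optimum, matching the conclusion on the slide.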
Background: Machine Learning
• Machine Learning:
– Discovering/learning from patterns in collected data
– Can be useful to group 'like' objects
• Hierarchical Clustering
– Clusters are groups of 'like' objects
– Builds a hierarchy of clusters
• Agglomerative Clustering
– Bottom-up approach to building clusters
– Initially, each object is its own cluster
– Pairs of clusters are merged based on 'likeness'
– Performance: O(n^2) time for efficient implementations (e.g., SLINK for single linkage); the naive algorithm is O(n^3)
Example of Agglomerative Clustering
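A minimal sketch of agglomerative clustering on one-dimensional points, assuming single linkage (the distance between the closest members of two clusters) as the measure of 'likeness'; the slides do not fix a linkage rule. This naive version rescans all cluster pairs on every merge, so it is slower than optimized implementations.

```python
def single_linkage(a, b):
    """Distance between two clusters = distance between their closest members."""
    return min(abs(x - y) for x in a for y in b)


def agglomerative(points):
    """Bottom-up clustering: start with singletons, repeatedly merge the
    closest pair of clusters, and record each merge (the hierarchy)."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = sorted(clusters[i] + clusters[j])
        merges.append(merged)
        clusters[i] = merged
        del clusters[j]  # j > i, so index i is unaffected
    return merges


if __name__ == "__main__":
    for step, cluster in enumerate(agglomerative([1.0, 1.2, 5.0, 5.3, 9.0]), 1):
        print(step, cluster)
```

On the points 1.0, 1.2, 5.0, 5.3, 9.0 it first merges 1.0 with 1.2, then 5.0 with 5.3, then adds 9.0 to the second group, and finally joins everything; the list of merges is the hierarchy.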
Actionable Cyber-Attack Forecasting
• Two components of forecasting we will focus on:
– Modeling a Cyber System
– Analyzing the Model Using Game-Theoretic Methods
Modeling a Cyber System: A Simple Model
• Decision makers: Defender and Attacker
• Actions
– Defender: rate (x_i) at which to check up on the cyber-health of host h_i
– Attacker: rate (y_i) at which to attack host h_i (e.g., exfiltrate information from it)
• Utility function for host h_i:
where is the cyber-health of h_i
• Global utility:
• Defender objective: maximize the global utility function
• Zero-sum assumption: the attacker's objective is the inverse of the defender's (it seeks to minimize the global utility)
Modeling a Cyber System: A Simple Model
• A closer inspection of the local utility function of host h_i:
• Feasible constraints on the parameters:
• How do we obtain the following information to input into the utility function?
– Cyber-health of a node
– Parameters c_info, r_detect, and c_probe, where
• c_info: information leakage cost
• r_detect: reward for detecting malware and/or a cyber-attack
• c_probe: cost of probing, including bandwidth and processing
Estimating Cyber Health: High Level Overview
• Machine Learning:
– Use an agglomerative clustering algorithm to cluster hosts based on the similarity of their top 10 active processes with respect to CPU time
– Caution: we need to guard against malicious clusters forming; we don’t want a subset of bad nodes to form their own cluster
– Example stopping criterion to help prevent malicious clusters:
– Since we are using hierarchical clustering, the algorithm terminates once every cluster has at least the minimum cluster size
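The host-clustering step above can be sketched as follows, under several assumptions: each host is profiled by the set of its top 10 processes by CPU time, similarity is the Jaccard index of those sets, clusters are merged by average linkage, and merging stops once every cluster has at least `min_size` hosts. The linkage rule, `min_size`, and the host profiles are hypothetical, and the exact stopping formula from the original slide is not assumed here.

```python
def top_processes(cpu_times, k=10):
    """Names of the k processes with the most CPU time (ties broken by name)."""
    ranked = sorted(cpu_times, key=lambda name: (-cpu_times[name], name))
    return frozenset(ranked[:k])


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def average_similarity(c1, c2, profiles):
    """Average linkage: mean pairwise Jaccard similarity between two clusters."""
    total = sum(jaccard(profiles[h1], profiles[h2]) for h1 in c1 for h2 in c2)
    return total / (len(c1) * len(c2))


def cluster_hosts(profiles, min_size):
    """Merge the most similar pair of clusters until every cluster has at
    least `min_size` hosts (or a single cluster remains)."""
    clusters = [[host] for host in profiles]
    while len(clusters) > 1 and any(len(c) < min_size for c in clusters):
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = max(
            pairs,
            key=lambda ij: average_similarity(clusters[ij[0]], clusters[ij[1]], profiles),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical top-process profiles; web3 and db3 run an unexpected "xmrig".
EXAMPLE_PROFILES = {
    "web1": frozenset({"nginx", "php-fpm", "redis", "sshd"}),
    "web2": frozenset({"nginx", "php-fpm", "redis", "sshd"}),
    "web3": frozenset({"nginx", "php-fpm", "redis", "xmrig"}),
    "db1": frozenset({"postgres", "pgbouncer", "backup", "sshd"}),
    "db2": frozenset({"postgres", "pgbouncer", "backup", "sshd"}),
    "db3": frozenset({"postgres", "pgbouncer", "backup", "xmrig"}),
}

if __name__ == "__main__":
    print(cluster_hosts(EXAMPLE_PROFILES, min_size=3))
```

With `min_size=3` the web and database hosts end up in separate clusters. Because merging continues while any cluster is below the minimum size, a pair of look-alike hosts (such as two compromised machines) cannot survive as a cluster of their own.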
Estimating Cyber Health: High Level Overview
• Anomaly Detection:
– Let the health of a node be a function of how far it is from the center of mass of its assigned cluster
– Example:
• Let P_i be the set of processes running on host h_i
• We measure the similarity of nodes i and j using the Jaccard index: J(P_i, P_j) = |P_i ∩ P_j| / |P_i ∪ P_j|
• Let be the set of processes running on at least 75% of the machines in host h_i's cluster
• Then
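One plausible reading of these steps, written as a hedged Python sketch: the "core" set holds the processes running on at least 75% of the hosts in a cluster, and a host's cyber-health is taken to be the Jaccard index between its own process set and that core set (1.0 means it matches the cluster's norm). The final health formula on the slide is not available, so this scoring rule and the names `core_processes` and `cyber_health` are assumptions.

```python
from collections import Counter

# Hypothetical cluster of four hosts and the processes running on each.
EXAMPLE_CLUSTER = {
    "web1": {"nginx", "php-fpm", "sshd", "cron"},
    "web2": {"nginx", "php-fpm", "sshd", "cron"},
    "web3": {"nginx", "php-fpm", "sshd", "cron"},
    "web4": {"nginx", "php-fpm", "sshd", "xmrig"},
}


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def core_processes(cluster, threshold=0.75):
    """Processes that run on at least `threshold` of the hosts in the cluster."""
    counts = Counter(p for procs in cluster.values() for p in procs)
    return {p for p, n in counts.items() if n >= threshold * len(cluster)}


def cyber_health(cluster, threshold=0.75):
    """Assumed health score per host: Jaccard similarity to the core set."""
    core = core_processes(cluster, threshold)
    return {host: jaccard(procs, core) for host, procs in cluster.items()}


if __name__ == "__main__":
    for host, score in sorted(cyber_health(EXAMPLE_CLUSTER).items()):
        print(f"{host}: {score:.2f}")
```

For the example cluster the core set is {nginx, php-fpm, sshd, cron}, so web1 to web3 score 1.00, while web4, which runs the unexpected `xmrig` process and lacks `cron`, scores 0.60.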
Estimating Utility Function Parameters
• Information leakage cost for host h_i
– We can borrow an idea from sophisticated cyberweapons like Regin
– Assign higher costs to hosts accessed by people with higher privileges in an organization (IT admins, CEO, CTO, etc.)
• Probing cost for host h_i
– Another idea borrowed from sophisticated malware
– Self-monitor the probing process's CPU/memory/bandwidth usage at different probe rates to derive costs for each host
• Reward for detecting malware
– Determine the organization's appetite for attribution risk from unknowingly hosting botnets/zombies
– The reward can be proportional to the resources on a host that are available for use by a botmaster and/or hacker
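A small sketch of the first heuristic in the list above, assuming a hypothetical privilege-weight table: a host's information leakage cost c_info is the highest weight among the roles of the users who access it. The roles, the numeric weights, and the default weight are illustrative placeholders, not values from the original work.

```python
# Hypothetical privilege weights; real values would come from the organization.
PRIVILEGE_WEIGHTS = {"employee": 1.0, "it_admin": 5.0, "cto": 8.0, "ceo": 10.0}
DEFAULT_WEIGHT = 0.5  # assumed cost for hosts with no users or unknown users

# Hypothetical mapping from user account to organizational role.
EXAMPLE_ROLES = {"alice": "ceo", "bob": "employee", "carol": "it_admin"}


def info_leakage_cost(host_users, user_roles, weights=PRIVILEGE_WEIGHTS):
    """c_info for one host: the largest privilege weight among its users."""
    costs = [weights.get(user_roles.get(user), DEFAULT_WEIGHT) for user in host_users]
    return max(costs, default=DEFAULT_WEIGHT)


if __name__ == "__main__":
    hosts = {"exec-laptop": ["alice"], "build-server": ["bob", "carol"], "kiosk": []}
    for host, users in hosts.items():
        print(host, info_leakage_cost(users, EXAMPLE_ROLES))
```

Taking the maximum rather than a sum reflects the idea that a single high-privilege user is enough to make a host a valuable target; other aggregation rules are equally plausible.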
Analysis with Game Theory
• Suppose the following:
– Defender: actions are always probe or never probe (i.e., x_i = 1 or x_i = 0)
– Attacker: actions are always attack or never attack (i.e., y_i = 1 or y_i = 0)
• The zero-sum 2×2 matrix game representation for host h_i:
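[Figure: 2×2 zero-sum matrix game for host h_i, with defender rows P (probe) and NP (no probe) and attacker columns A (attack) and NA (no attack)]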
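A 2×2 zero-sum game of this shape can be solved in closed form. The Python sketch below uses hypothetical payoff values (the real entries depend on the host's utility function) and assumes the defender is the row player who maximizes, with rows P, NP, and the attacker is the column player who minimizes, with columns A, NA. It first checks for a saddle point and otherwise computes the mixed strategies that make each player indifferent. The function name and numbers are illustrative.

```python
from fractions import Fraction


def solve_2x2_zero_sum(a, b, c, d):
    """Solve the zero-sum game [[a, b], [c, d]] (payoffs to the row player).

    Returns (p, q, value): p = probability the defender plays row 0
    (always probe), q = probability the attacker plays column 0
    (always attack), value = the game's value to the defender.
    """
    m = [[Fraction(a), Fraction(b)], [Fraction(c), Fraction(d)]]
    maximin = max(min(row) for row in m)
    minimax = min(max(m[0][j], m[1][j]) for j in range(2))
    if maximin == minimax:
        # Saddle point: both players have an optimal pure strategy.
        i = 0 if min(m[0]) == maximin else 1
        j = 0 if max(m[0][0], m[1][0]) == minimax else 1
        return Fraction(1 - i), Fraction(1 - j), maximin
    # No saddle point: each player mixes so the opponent is indifferent.
    denom = m[0][0] - m[0][1] - m[1][0] + m[1][1]
    p = (m[1][1] - m[1][0]) / denom
    q = (m[1][1] - m[0][1]) / denom
    value = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) / denom
    return p, q, value


if __name__ == "__main__":
    # Hypothetical payoffs to the defender: rows P, NP; columns A, NA.
    p, q, value = solve_2x2_zero_sum(2, -1, -3, 0)
    print(f"P(probe) = {p}, P(attack) = {q}, value = {value}")
```

With these hypothetical payoffs the defender probes with probability 1/2, the attacker attacks with probability 1/6, and the value of the game to the defender is -1/2. Exact fractions avoid rounding when comparing the saddle-point condition.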
Analysis with Game Theory
• Formulation of the game as a general optimization problem:
where s* is the optimal mixed strategy for the defender
• Note: s* is the probability with which the defender should always probe
• Key point: this problem can be formulated as a linear program, which is computationally more efficient to solve
• Linear Programming Formulation:
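For reference, the textbook max-min problem and linear program for a zero-sum matrix game are written out below. This is the standard formulation, stated for a defender payoff matrix A with m defender actions and n attacker actions; the notation is an assumption and may differ from the original slides. The LP follows because, for a fixed s, the inner minimum is attained at a pure attacker action. For host h_i, m = n = 2 and the first component of s* is the probability of always probing.

```latex
% Max-min form of the defender's problem (A: defender payoff matrix)
s^* \in \arg\max_{s \in \Delta_m} \, \min_{y \in \Delta_n} \, s^\top A \, y,
\qquad \Delta_k = \{ z \in \mathbb{R}^k : z \ge 0,\ \mathbf{1}^\top z = 1 \}

% Equivalent linear program (v is the value of the game)
\begin{aligned}
\max_{s \in \mathbb{R}^m,\; v \in \mathbb{R}} \quad & v \\
\text{s.t.} \quad & \sum_{k=1}^{m} s_k A_{kj} \ge v, \quad j = 1, \dots, n, \\
& \mathbf{1}^\top s = 1, \quad s \ge 0.
\end{aligned}
```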
Conclusion: Q&A
• Can you really forecast a cyber attack in a real, non-trivial system?
– Yes. Forecasting isn’t necessarily binary (i.e., either it will happen or it won’t)
– The prediction can concern the intensity/frequency/distribution of an attack in a system (e.g., Will it get worse? How often will it occur? Where will it spread next?)
– Example: I have a cough. Will it turn into the flu? Can it spread to others?
– All models are wrong, but some models can be useful
• How far in advance could you predict an attack (lead time)?
– You don’t have to predict an event days or weeks in advance for the prediction to be useful
– Even a 20-minute warning could be the difference between 1,000 users’ sensitive information being exfiltrated and 1,000,000 users’
Conclusion: Q&A
• If you can forecast, what approaches/methodologies will you use to predict cyber attacks in a real-world system?
– Machine Learning: hierarchical clustering of hosts in a system based on the similarity of the processes/services running on each host
– Anomaly Detection: among hosts in a cluster, determining which hosts’ behaviors are significantly different and deriving the cyber-health of each host
– Game Theory: a mathematical decision framework that can allow us to predict the likely actions of an adversary and recommend appropriate actions for the defender
• What are examples of 'actionable' decisions in the context of a defender of a cyber system?
– Probing frequency/intensity: How often should we 'check up' on a host, and how invasive should the checkup be?
– Should a host stay online, be taken offline, or be wiped and reinstalled?
Conclusion: Q&A
• Are there any connections with healthcare (e.g., modeling/forecasting infectious diseases like malaria and Ebola)?
– There may be many ideas from the medical field that we can borrow that are relevant and useful in predicting/detecting/treating cyber infections
– Example: When you go to the doctor for a checkup, they compare your vitals (e.g., blood pressure, pulse, and body temperature) to what is 'normal' for someone in your demographic
– We explicitly borrow this concept by deriving the cyber-health of a node based on what is 'normal' for its cluster
Additional Resources
1. Jones, M., Kotsalis, G., & Shamma, J. (2013). Cyber-attack forecast modeling and complexity reduction using a game-theoretic framework. In D. C. Tarraf (Ed.), Control of Cyber-Physical Systems (Lecture Notes in Control and Information Sciences, vol. 449, pp. 65–84). Springer International Publishing.
2. Singer, P. W., & Friedman, A. (2014). Cybersecurity and Cyberwar: What Everyone Needs to Know. Oxford University Press.
3. Zetter, K. (2014). Countdown to Zero Day: Stuxnet and the Launch of the World's First Digital Weapon. Crown Publishing Group.
4. Jacobs, J., & Rudis, B. (2014). Data-Driven Security: Analysis, Visualization and Dashboards. Wiley Publishing.
5. Kleidermacher, D., & Kleidermacher, M. (2012). Embedded Systems Security: Practical Methods for Safe and Secure Software and Systems Development.
6. Ferguson, N., Schneier, B., & Kohno, T. (2010). Cryptography Engineering: Design Principles and Practical Applications. Wiley Publishing.
7. Gebotys, C. H. (2009). Security in Embedded Devices. Springer.
8. Anderson, R. (2001). Why information security is hard: An economic perspective. In Proceedings of the 17th Annual Computer Security Applications Conference (ACSAC 2001), pp. 358–365, 10–14 Dec. 2001.