Source: seminaire-dga.gforge.inria.fr/2013/20140523_RolandGroz.pdf
Model inference to support detection of vulnerabilities
Application to Web apps & services
Roland GROZ LIG, Université de Grenoble Alpes Savoie, France
DGA Seminar on Security & Formal Methods, Rennes, 23 May 2014
Acknowledgments
Joint work (for inference) with:
• Team: C. Oriat, J-L. Richier, K. Hossen, F. Duchène
• Muzammil Shahbaz (U. Sheffield)
• Keqin Li
• Alexandre Petrenko (CRIM, Canada)
• SPaCIoS partners, esp. M. Minea, PF Mihancea, I. Soara (IeAT Timisoara)
Inspiration & discussion with: Doron Peled, David Lee, Kirill Bogdanov, Neil Walkinshaw, + U. Dortmund (& Uppsala)
Motivational examples
• Reverse-engineer Internet-Box from supplier (Game)
• Identify hidden behaviours of 3rd party s/w to put on Orange (trusted Apps)
• Learning interactions of home appliances
• Get security model of 3rd party component (Retailer, Customer, Couponing)
Key issues: BB, 3rd party, trust, integration...
Outline
• Why should we INFER models?
• Basic techniques used
• SPaCIoS project: security in Internet of Services (Web apps)
• Inference of security models for Web apps: SIMPA tool (Simpa Infers Models Pretty Automatically)
• Inference of state + data flow for smart fuzzing: KameleonFuzz
Benefits of models
• Tool-supported automatic analysis
• Thorough analysis
• Spotting tricky, unexpected behaviours
• Confronting expectations <-> reality, specification <-> implementation
and of course: bread & butter for attendees of this seminar on security and formal methods
Who wants to write models of 3rd party or legacy components? How many s/w engineers actually use models?
How do I perform system behavioural analysis? How do I identify integration problems?
Example: 3rd party components (behaviours, interactions, validation)
Understanding a system of black box components is a challenge
Our goal
• Reverse engineer behavioural MODELS (not code or design)
• from BLACK BOX systems (3rd party, remote access or too complex, not designed with models, etc.)
• just by TESTING at interfaces
• in order to feed into model-based tools, esp. for VULNERABILITY detection
Various types of Machine Learning
Artificial Intelligence (& data mining)
• Ability to infer rules, recognize patterns
• Learning from samples
• E.g. neural networks
Two major techniques (among others)
• Statistical (Bayesian) inference from a collection of data -> e.g. Weka tool in testing
• Grammatical inference of a language, from theoretical computer science
Learning languages from samples
"Learning from given positive/negative samples"
• Finding a minimum DFA (Deterministic Finite Automaton) consistent with the given samples is NP-hard
– Complexity of automaton identification from given data. [E. Gold 78]
• Even finding a consistent DFA whose number of states is polynomially larger than that of the minimum one remains NP-hard
– The minimum consistent DFA problem cannot be approximated within any polynomial. [Pitt & Warmuth 93]
• Probably Approximately Correct (PAC)
– A theory of the learnable. [L.G. Valiant 84]
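To make the setting concrete, here is a minimal Python sketch of learning from given samples. It builds a prefix tree acceptor (PTA), the trivially consistent but non-minimal DFA that state-merging methods usually start from; the hard part quoted above is shrinking it. The sample words and all function names are illustrative assumptions, not taken from the talk.

```python
from typing import Dict, Iterable, Set, Tuple

Transitions = Dict[Tuple[int, str], int]


def build_pta(positives: Iterable[str]) -> Tuple[Transitions, Set[int]]:
    """Build a prefix tree acceptor: one state per distinct prefix of a positive sample."""
    transitions: Transitions = {}
    accepting: Set[int] = set()
    next_state = 1  # state 0 is the root (empty prefix)
    for word in positives:
        state = 0
        for symbol in word:
            if (state, symbol) not in transitions:
                transitions[(state, symbol)] = next_state
                next_state += 1
            state = transitions[(state, symbol)]
        accepting.add(state)
    return transitions, accepting


def accepts(transitions: Transitions, accepting: Set[int], word: str) -> bool:
    """Run the word from the root; a missing transition means rejection."""
    state = 0
    for symbol in word:
        if (state, symbol) not in transitions:
            return False
        state = transitions[(state, symbol)]
    return state in accepting


def is_consistent(transitions, accepting, positives, negatives) -> bool:
    """A DFA is consistent if it accepts every positive and rejects every negative sample."""
    return (all(accepts(transitions, accepting, w) for w in positives)
            and not any(accepts(transitions, accepting, w) for w in negatives))


# Hypothetical samples of the language (ab)+
positives = ["ab", "abab"]
negatives = ["a", "ba", "aba"]
transitions, accepting = build_pta(positives)
```

The PTA here has one state per prefix (five in total), whereas a much smaller DFA also fits these samples; finding the smallest such DFA is the problem the complexity results refer to.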
Active Learning
"Learning from Queries": the inference algorithm can query an oracle of the language
Angluin's Algorithm L* [Angluin 87] (Dana Angluin, Yale University)
• Reference algorithm
• Two types of queries: membership, equivalence
• Learns a Deterministic Finite Automaton (DFA) in polynomial time
• Key assumption: Minimally Adequate Teacher (MAT)
Applied in formal Software Engineering, and a very active research field in model-based techniques
• Black Box Checking [Peled 99]
• Learning and Testing Telecom Systems [Steffen 03]
• Protocol Testing [Shu & Lee 08]
• …
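The query-based loop can be sketched as follows. This is a simplified, Mealy-style observation table in the spirit of L*, not Angluin's full algorithm: it only uses output (membership-like) queries and the closedness check, and it omits equivalence queries and counterexample handling. The black box `ToggleBox` and every name in the code are hypothetical.

```python
from typing import Dict, List, Tuple

Word = Tuple[str, ...]


class ToggleBox:
    """Hypothetical black box: 'a' reveals and flips a hidden bit, 'b' always answers 'z'."""

    def run(self, word: Word) -> Tuple[str, ...]:
        state, outputs = 0, []  # every query starts from a reset system
        for symbol in word:
            if symbol == "a":
                outputs.append("x" if state == 0 else "y")
                state = 1 - state
            else:
                outputs.append("z")
        return tuple(outputs)


def row(box, prefix: Word, suffixes: List[Word]) -> Tuple:
    """Observation-table row: outputs produced by each suffix after the prefix."""
    return tuple(box.run(prefix + suffix)[len(prefix):] for suffix in suffixes)


def learn(box, alphabet: List[str]) -> Dict[Tuple[int, str], Tuple[int, str]]:
    suffixes: List[Word] = [(a,) for a in alphabet]  # distinguishing suffixes (columns)
    access: List[Word] = [()]                        # access sequences (one per state)
    while True:
        known = {row(box, s, suffixes) for s in access}
        # Closedness: every one-step extension must look like an already known state.
        missing = next((s + (a,) for s in access for a in alphabet
                        if row(box, s + (a,), suffixes) not in known), None)
        if missing is None:
            break
        access.append(missing)
    # Build the conjecture: (state, input) -> (next state, output).
    state_of = {row(box, s, suffixes): i for i, s in enumerate(access)}
    conjecture = {}
    for i, s in enumerate(access):
        for a in alphabet:
            target = state_of[row(box, s + (a,), suffixes)]
            conjecture[(i, a)] = (target, box.run(s + (a,))[-1])
    return conjecture


conjecture = learn(ToggleBox(), ["a", "b"])
```

With single-symbol suffixes the table separates the two hidden states of this toy box; a real learner would extend the suffix set whenever an equivalence query (or a test campaign standing in for it) returns a counterexample.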
Our Context of Inference (testing s/w)
[Figure: the algorithm L* takes the input alphabet Σ, queries an oracle (the black box machine) and produces a final DFA conjecture; our target is a system of communicating black box components.]
• Components having I/O behaviours
• I/O are structurally complex (parameters)
• Formidable size of input sets
• Enhanced State Machine Models: Mealy Machines, Parameterized Machines
• Test strategies and heuristics
• Learned models can be used to generate tests to find discrepancies
• Poor (cheap) oracles
Our techniques (Grenoble-LIG)
• Extensions to Angluin-style learning: parameterized, symbolic automata; arbitrary (infinite) data types (esp. strings); security-oriented models & features
• Combining Control-Flow + Data inference: machine learning, statistical algos
• New algorithms based on quotients: tailorable + adaptive abstraction; targeting specific parts of behaviour
SPaCIoS objectives and approach
• SPaCIoS Tool: penetration testing, security testing, model checking, automatic learning.
• Assess on security testing problem cases (industrial/open-source IoS application scenarios).
• Migrate to industry (SAP, SIEMENS), standardisation bodies, open-source communities.
[Figure: architecture of the SPaCIoS Tool. Components: User Interface (used by the Security Analyst to provide the model of the SUV, security goals and user guidance); Model inference and adjustment; Property-driven and vulnerability-driven test case generation; Test Execution Engine; Libraries (vulnerabilities, attack patterns, security goals, attacker models); source-based inference from the SUV source code; trace-driven fault localization; the SUV itself. Labels on the flows: model of the SUV, model of the attacker, abstract execution trace, test case, test results, fault location.]
• ASLan++ models for cryptographic protocols & security of services
• Model checking + model-based testing
• Some Web services may be black boxes: their models are inferred
• Models are used to identify potential attacks and test them
• Security testing problem cases
The User Interface (UI)
The UI mediates interaction between the Security Analyst and the core components of the SPaCIoS Tool.
It supports the definition of:
1. Ma: abstract model of the system
2. Mc: concrete model of the system
3. the correspondence between Ma and Mc
4. the correspondence between Mc and the SUV
Moreover, the UI lets the Security Analyst:
1. inspect the status of the SPaCIoS Tool,
2. execute, manage, and debug tests.
Example: SAML Authentication Protocol (participants: C, IdP, SP)
S1. GET URI
A1. HTTP302 IdP?SAMLRequest=AuthReq(ID, SP)&RelayState=URI
A2. GET IdP?SAMLRequest=AuthReq(ID, SP)&RelayState=URI
IdP builds an authentication assertion AA = AuthAssert(ID, C, IdP, SP)
A3. HTTP200 Form(...)
A4. POST SP, Response(ID, SP, IdP, {AA}_{K^-1_IdP}), RelayState(URI)
S2. HTTP200 Resource(URI)
[Slide from A. Armando (U. of Genova), SPaCIoS Mock-up, SPaCIoS Meeting, Sep 13, 2011]
WP 2 Models
Objectives:
• Model writing, learning, extracting
• Security goals
• Attacker behavior & vulnerability models
• Attack combination
Results:
• Modelling language (ASLan++) and libraries of security aspects & goals
• Attack patterns, vulnerabilities, attacker models
• Chained attacks
[Figure: which tool applies depending on the available model (ASLan++ model, partial model, no model). Model + property -> model checker -> attack trace -> real system. Model inference: Simpa (black-box model inference through interactions with the system). Model extraction: jModex. KameleonFuzz: inferred tainted model + attack input grammar -> genetic fuzzing -> real system -> vulnerabilities?]
Black Box inference of Security Models
Extend inference methods to deal with SPaCIoS security protocols and Web applications
• Non-Deterministic Values (NDV, e.g. nonces)
• Parameters recorded in local variables (session IDs, cookies, etc.)
Combining state and data inference
• Angluin-style observation tables
• Data tables to record I/O parameters
• New inference algorithms
• Combined with Weka statistical tool for parameter associations
• Implemented in SIMPA (open source)
Needham-Schroeder model
Extended FSM model of NSPK Responder (states s0, s1, s2), transition labels:
• m1(p1) / p2 = p1, p3 = ndv, m2(p2, p3), v3 := p3 (appears on two transitions)
• m3(p4), p4 = v3 / OK
• m3(p4), p4 ≠ v3 / KO
Inferred EFSM model (states [ε] and [m1(5)]), transition labels:
• m1(p1) / p2 = p1, p3 = ndv, m2(p2, p3), v3 := p3
• m3(p4), p4 = v3 / OK
• m3(p4), p4 ≠ v3 / KO
• m3(p4) / Ω
• m1(p1) / Ω
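As an illustration of what the learner faces, the responder can be simulated as a small black box in Python. This is a sketch under assumptions: the transition labels come from the slide, but the target state of each transition (in particular, returning to the initial state after m3) and the nonce range are guesses, and the class and constant names are hypothetical.

```python
import random

OMEGA = "Ω"  # output used on the slides for an input that is not accepted


class NspkResponder:
    """Two-state simulation of the NSPK responder seen as a black box."""

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.state = "idle"   # corresponds to [ε] in the inferred model
        self.v3 = None        # local variable storing the nonce

    def m1(self, p1):
        if self.state != "idle":
            return OMEGA
        p3 = self._rng.randint(100, 999)  # non-deterministic value (ndv); range is an assumption
        self.v3 = p3                      # v3 := p3
        self.state = "waiting"            # corresponds to [m1(5)]
        return ("m2", p1, p3)             # p2 = p1, p3 = ndv

    def m3(self, p4):
        if self.state != "waiting":
            return OMEGA
        self.state = "idle"               # assumption: both verdicts end the session
        return "OK" if p4 == self.v3 else "KO"


responder = NspkResponder(seed=1)
first = responder.m3(5)            # m3 before m1 is refused
challenge = responder.m1(5)        # ("m2", 5, <nonce>)
repeated = responder.m1(5)         # a second m1 is refused
verdict = responder.m3(challenge[2])
```

Because the nonce changes at every run, repeating the same input sequence yields different concrete outputs, which is why plain output comparison is not enough and the data tables are needed.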
Inference algorithm (step by step on the NSPK responder)
Inputs: m1, m3. Outputs: m2, OK, KO.
1. Start from the initial state S0.
2. From S0: M1(5)/m2(5,423) and M3(5)/Ω, each leading to a state not yet identified (S?).
3. Further queries from these states: M1(5)/Ω, M3(5)/KO, M1(5)/m2(5,867), M3(5)/Ω.
4. A new state S1 is identified; from S1: M1(5)/Ω and M3(5)/KO lead to states not yet identified (S?).
Table-based inference algorithm
[Figures: control table and data table, with the annotations "Nonce!" and "New behavior!"]
Inferring guards of transitions
Data mining
For S1: m3/KO
((10), [(init)(init)(5)(5, 882)(init)] -> (0))
((11), [(init)(init)(5)(5, 332)(init)] -> (0))
((10), [(init)(init)(5)(5, 258)(init)] -> (0))
For S1: m3/OK
((887), [(init)(init)(5)(5, 887)(init)] -> (1))
((529), [(init)(init)(5)(5, 529)(init)] -> (1))
((175), [(init)(init)(5)(5, 175)(init)] -> (1))
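The samples above can be explored with a few lines of Python. Note that this sketch swaps in a brute-force search for equalities between data positions; it is not the Weka decision-tree learning used by the tool. It assumes each sample is flattened as the input parameter followed by the recorded history, and all names are illustrative.

```python
from typing import List, Sequence, Tuple


def flatten(param, history: Sequence) -> list:
    """Data[0] is the current input parameter, Data[1..] the recorded history."""
    return [param] + list(history)


# Samples copied from the slide: (input parameter, history) for each verdict.
ko_rows = [
    flatten(10, ["init", "init", 5, 5, 882, "init"]),
    flatten(11, ["init", "init", 5, 5, 332, "init"]),
    flatten(10, ["init", "init", 5, 5, 258, "init"]),
]
ok_rows = [
    flatten(887, ["init", "init", 5, 5, 887, "init"]),
    flatten(529, ["init", "init", 5, 5, 529, "init"]),
    flatten(175, ["init", "init", 5, 5, 175, "init"]),
]


def separating_equalities(ok: List[list], ko: List[list]) -> List[Tuple[int, int]]:
    """Pairs (i, j) with Data[i] == Data[j] in every OK row and Data[i] != Data[j] in every KO row."""
    width = len(ok[0])
    return [
        (i, j)
        for i in range(width)
        for j in range(i + 1, width)
        if all(r[i] == r[j] for r in ok) and all(r[i] != r[j] for r in ko)
    ]


guards = separating_equalities(ok_rows, ko_rows)
```

On this data the only separating pair is (0, 5): the input of m3 must equal the nonce previously sent in m2.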
Data mining algorithms and decision trees (J48: nominal/string, M5P: integer)
Decision tree for S1, m3: Data[0] == Data[5] -> OK; Data[0] != Data[5] -> KO
Abstraction extraction
Connecting to the real System (SUV)
• Model inference on BB requires a driver to interact with the system (abstraction/concretization)
• Writing such drivers is time-consuming
• Automating driver construction for HTTP interactions based on crawling techniques: page models, automatic identification of inputs, and abstracting outputs! (Magic!)
• SIMPA includes automatic generation of drivers for Web applications: it recognizes relevant input and output abstractions
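A hand-written driver of the kind that the automatic generation aims to replace could look like the following Python sketch. It only shows the two mappings (concretization of abstract inputs into HTTP requests, abstraction of pages into output symbols); no request is actually sent. The paths, parameter names, page markers and all identifiers are hypothetical and do not describe SIMPA's generated drivers.

```python
import re
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class HttpRequest:
    method: str
    path: str
    params: Dict[str, str]


# Concretization table: abstract input symbol -> (method, path, parameter names). Hypothetical.
INPUTS = {
    "Login": ("POST", "/login", ("user", "pass")),
    "viewProfile": ("GET", "/profile", ("profileID",)),
}

# Abstraction table: output symbol <- marker found in the returned page. Hypothetical.
OUTPUTS = [
    ("listing", re.compile(r"<h1>\s*Staff listing\s*</h1>")),
    ("Profile", re.compile(r"<h1>\s*Profile\s*</h1>")),
    ("home", re.compile(r'<form[^>]*id="login"')),
]


def concretize(symbol: str, args: Tuple[str, ...]) -> HttpRequest:
    """Turn an abstract input such as Login(user, pass) into a concrete request."""
    method, path, names = INPUTS[symbol]
    if len(args) != len(names):
        raise ValueError(f"{symbol} expects {len(names)} argument(s)")
    return HttpRequest(method, path, dict(zip(names, args)))


def abstract(html: str) -> str:
    """Map a concrete page to the abstract output symbol seen by the learner."""
    for symbol, pattern in OUTPUTS:
        if pattern.search(html):
            return symbol
    return "unknown"


request = concretize("Login", ("alice", "secret"))
page_symbol = abstract("<html><h1>Profile</h1></html>")
```

The learner then works only with symbols such as Login and Profile, while the driver hides URLs, form fields and HTML details.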
Experiments: WebGoat (Stored XSS lesson)
Automatic abstraction (& driver generation)
• 11 inputs found / 11
• 6 outputs found / 6
• Output parameters: 16 (1 in the main page, 15 in the profile page)
[Figure: parameter for the main page]
[Figure: parameters for the profile page (which contains the XSS)]
• The name and the fields of the profile are detected by the crawler
• XSS attack is detected by the SPaCIoS tool
Inferred model (states s0, s1), transition labels:
• LoginV(user,pass) / {valid credentials} listing(status)
• Logout(profileID) / home(status)
• LoginI(user,pass) / {invalid credentials} home(status)
• viewProfile(profileID) / Profile(status)
• updateProfile(profileID) / Profile(status)
• editProfile(profileID) / editionPage(status)
• Login(user,pass) / {valid credentials} listing(status)
KameleonFuzz: precise & automatic detection of XSS
Black Box XSS detection
Fabien Duchène (PhD 2/6/2014) with Roland Groz, Sanjay Rawat & Jean-Luc Richier
23/05/14 KameleonFuzz - XSS fuzzing
Approach Overview
A. Inferring SUT state model
B. Approximate Taint Dataflow: taint inputs, infer taint in outputs, annotate model
C. Evolutionary Algorithm
C.1. Malicious Inputs Generation: generate inputs
C.2. Precise Taint Dataflow: attack successful? (otherwise evolve inputs)
D. If new page or state discovered
A. Model Inference
Form, Page model, Page clustering, Navigation
B. Model Annotation (taint inference)
B. SUT Model annotation
We annotate the model for type-1 & type-2 REFLECTIONS
Example of reflection annotation
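The approximate taint step (taint inputs, infer taint in outputs) can be sketched in Python as marker injection followed by a search for the markers in the response. This is an illustration under assumptions, not KameleonFuzz's implementation: the marker format, the fake page and the function names are all hypothetical, and exact substring matching stands in for whatever approximate matching a real tool would need.

```python
from typing import Dict, List


def make_markers(param_names: List[str]) -> Dict[str, str]:
    """One unique, harmless marker string per input parameter."""
    return {name: f"TAINT{index:03d}X" for index, name in enumerate(param_names)}


def offsets(haystack: str, needle: str) -> List[int]:
    """All positions at which the needle occurs in the haystack."""
    found, start = [], haystack.find(needle)
    while start != -1:
        found.append(start)
        start = haystack.find(needle, start + 1)
    return found


def find_reflections(markers: Dict[str, str], output: str) -> Dict[str, List[int]]:
    """Parameters whose marker shows up in the output, with the reflection offsets."""
    hits = {name: offsets(output, marker) for name, marker in markers.items()}
    return {name: positions for name, positions in hits.items() if positions}


def fake_profile_page(params: Dict[str, str]) -> str:
    """Hypothetical SUT page: reflects name and comment, never the password."""
    return f"<h1>{params['name']}</h1><p>{params['comment']}</p>"


markers = make_markers(["name", "password", "comment"])
reflections = find_reflections(markers, fake_profile_page(markers))
```

Each reflected parameter would then become an annotation on the corresponding transition of the inferred model, telling the fuzzer where malicious inputs are worth trying.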
Example of annotated SUT model: type-1 XSS
C. Evolutionary Fuzzing
• Attack Input Grammar
• Mutation & Crossover
• Fitness & Test Verdict
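The three ingredients listed above fit together in a standard genetic loop, sketched below in Python. Everything here is a toy stand-in chosen for illustration: the token set, the fake SUT filter, the fitness function (a count of special characters surviving in the output, whereas the approach relies on precise taint dataflow) and the parameter values are all assumptions.

```python
import random
from typing import List, Tuple

TOKENS = ["<", ">", '"', "a", "="]  # hypothetical building blocks of an input


def fake_sut(payload: str) -> str:
    """Hypothetical reflecting page that escapes angle brackets but forgets double quotes."""
    return payload.replace("<", "&lt;").replace(">", "&gt;")


def fitness(individual: List[str]) -> int:
    """Toy fitness: number of special characters that survive in the output."""
    output = fake_sut("".join(individual))
    return sum(output.count(ch) for ch in '<>"')


def mutate(individual: List[str], rng: random.Random) -> List[str]:
    child = list(individual)
    child[rng.randrange(len(child))] = rng.choice(TOKENS)  # replace one token
    return child


def crossover(a: List[str], b: List[str], rng: random.Random) -> List[str]:
    cut = rng.randrange(1, len(a))  # single-point crossover
    return a[:cut] + b[cut:]


def evolve(generations=20, size=12, length=8, seed=0) -> Tuple[List[str], List[int]]:
    rng = random.Random(seed)
    population = [[rng.choice(TOKENS) for _ in range(length)] for _ in range(size)]
    history = []  # best fitness seen at each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: size // 2]  # elitism: the best half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness), history


best, history = evolve()
```

Because the best individuals are carried over unchanged, the best fitness never decreases from one generation to the next; in this toy setting the population drifts toward the character the filter forgets to escape.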
Attack Input Grammar
[Figure: the attack input grammar combines payload production rules (several realistic payloads), anti-filter production rules (considered SUT filters) and browser-specific string transformations. It is built from sets of attack vectors (possibly structured) taken from hacker sources (Shazzer ...).]
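As a rough illustration of generating inputs from such a grammar, here is a minimal Python sketch with a tiny context-free grammar. The non-terminals and productions are assumptions made up for this example (textbook probe strings only) and are far simpler than a grammar covering anti-filter rules and browser-specific transformations.

```python
import random
from typing import Dict, List

# Hypothetical grammar: non-terminal -> list of alternative productions.
GRAMMAR: Dict[str, List[List[str]]] = {
    "ATTACK": [["CONTEXT_BREAK", "PAYLOAD"]],
    "CONTEXT_BREAK": [['">'], ["'>"], [""]],
    "PAYLOAD": [
        ["<script>", "JS", "</script>"],
        ["<img src=x onerror=", "JS", ">"],
    ],
    "JS": [["alert(1)"]],
}


def generate(symbol: str, rng: random.Random) -> str:
    """Expand a symbol by picking one production at random; terminals are returned as-is."""
    if symbol not in GRAMMAR:
        return symbol
    production = rng.choice(GRAMMAR[symbol])
    return "".join(generate(part, rng) for part in production)


samples = [generate("ATTACK", random.Random(seed)) for seed in range(20)]
```

Keeping the derivation structure (rather than flat strings) is what lets mutation and crossover operate on meaningful parts of an input, such as swapping one payload for another while keeping the same context break.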
Scanner comparisons
Conclusion
Model inference to detect vulnerabilities:
• it works!
• ready to become a key ingredient in combination with other techniques
Black box inference is powerful enough
• White box suffers from many limitations (aliasing, external libraries, scaling…)
• But can be complementary (e.g. guard inference vs constraint solving)
Some perspectives
• Model inference can help reverse engineering (& understanding) code. Possibly malware?
• Many ways to combine smart fuzzing & inference: "feedback" loop fuzzing -> model
• Combining WB & BB inference: binary code analysis enhanced with behavioural model