Page 1
Aram Hovsepyan (1), Riccardo Scandariato (1), James Walden (2), Viet Hung Nguyen (3), Wouter Joosen (1)
(1) iMinds-DistriNet, Katholieke Universiteit Leuven; (2) Northern Kentucky University; (3) University of Trento
Vulnerability Prediction in Android Apps
Monday 10 June 13 week
Page 2
Android apps are an attractive target
Android has 75% market share as of Q1 2013 [IDC]
Google Play has over 775K apps and over 48B total installs [IDC, Google I/O keynote]
App security is not guaranteed by the platform provider
  - Apps that are well intended, but not exploit free
A single vulnerability could affect a massive number of users
Not yet much explored
  - Existing work focused on Mozilla Firefox / RHEL
Page 9
How to find vulnerabilities?
Code inspection
  - Manual verification is not feasible
  - Not all apps can afford security experts
  - Even security experts cannot analyze every line of code
Penetration testing / security testing
Static code analysis
Magic
  - Vulnerability prediction models
Page 13
Vulnerability prediction model
[Diagram: a collection of source code files, with a machine learning step]
Page 14
Our research
Predict vulnerable Java files in Android apps!
Predict vulnerable C++ components in Chrome/Firefox
  - ongoing
Predict vulnerable PHP files
  - summer work
Page 15
Outline
Existing tools and techniques
  - Vulnerability prediction models
Our approach
Results
Conclusions and future research
Page 22
Vulnerability prediction models
Start from a hunch = feature
  - e.g., larger components are more likely to be vulnerable
Fetch the features from the components
  - e.g., calculate the size for each component
Determine the vulnerabilities
  - e.g., National Vulnerability Database, MFSA
Investigate the correlation
  - Use machine learning techniques
Page 25
Vulnerability prediction models
Typical "hunches"
  - Use size and complexity metrics
  - Leverage developer activity metrics
  - Leverage code churn metrics
  - Leverage design churn metrics
  - Number of import statements
Inspired by the defect prediction work
  - Vulnerabilities are actually defects, but much scarcer ("needle in a haystack")
Page 26
Vulnerability prediction models
The existing models are fairly complex
  - Typically several versions are necessary to collect all metrics
  - Developer activity metrics are required
  - Code evolution metrics are required
Biased toward the underlying "hunch" of the researcher
Page 27
Outline
Existing tools and techniques
  - Static code analysis
  - Vulnerability prediction using metrics
Our approach
Results
Conclusions and future research
Page 28
Our approach
Use the source code itself in a tokenized form
Use the token frequency as features
  - Simplicity
  - No explicit assumptions regarding the code characteristics
#machine learning
#SPAM filtering
#text analysis
Page 29
Our approach
[Diagram: pipeline with Source code (Java files), Tokenizer, Feature vectors, Fortify static code analyzer, Vulnerabilities, and Machine learning]
Page 32
Tokenizer
Transform each source code file into a feature vector
  - each token ("monogram") is a feature
  - tokenize by delimiters and by mathematical and logical operators: . , ! < > [ ] = + - ^ * / etc.
  - each feature has a count assigned to it
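The tokenization step can be sketched roughly as follows. This is a minimal illustration in Python, not the tool used in the study: the delimiter set in `DELIMITERS` and the helper name `token_counts` are assumptions, and comments or string literals in the Java source are not treated specially.

```python
import re
from collections import Counter

# Assumed delimiter set: whitespace plus the punctuation and operators
# named on the slide, extended with a few common Java separators.
DELIMITERS = re.compile(r"[\s.,;:!<>\[\](){}=+\-^*/&|%]+")

def token_counts(java_source: str) -> Counter:
    """Return a bag-of-tokens feature vector: token -> number of occurrences."""
    tokens = [t for t in DELIMITERS.split(java_source) if t]
    return Counter(tokens)

sample = "package com.fsck.k9;\nimport android.text.util.Rfc822Tokenizer;\n"
vector = token_counts(sample)
print(dict(vector))
```

Each distinct token becomes one feature, so the vector grows with the vocabulary of the code base rather than with a fixed list of hand-picked metrics.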
Page 37
Feature vector
package: 1, com: 1, fsck: 1, k9: 1, import: 2, android: 2, text: 2, util: 1, Rfc822Tokenizer: 2, widget: 1, AutoCompleteTextView: 1, Validator: 2, public: 3, class: 1, EmailAddressValidator: 1, implements: 1, CharSequence: 2, fixText: 1, invalidText: 1, return: 2, tokenize: 1, length: 1
Page 42
Vulnerability assignment
Assign vulnerability to each Java file
  - use Fortify (static code analyzer) for this task
  - each file is either vulnerable or clean
package: 1, com: 1, fsck: 1, k9: 1, import: 2, android: 2, text: 2, util: 1, Rfc822Tokenizer: 2, widget: 1, AutoCompleteTextView: 1, Validator: 2, public: 3, class: 1, EmailAddressValidator: 1, implements: 1, CharSequence: 2, fixText: 1, invalidText: 1, return: 2, tokenize: 1, length: 1, vulnerability: 0
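A small sketch of how the binary label might be attached to each feature vector. The input format is hypothetical: `warnings` is assumed to be a list of `(file path, issue)` pairs exported from a static analyzer, which is not Fortify's actual report format, and `label_vectors` is an illustrative name.

```python
def label_vectors(vectors, warnings):
    """Attach a 0/1 'vulnerability' entry to every per-file token-count vector.

    vectors:  {file path: {token: count}}
    warnings: iterable of (file path, issue description) pairs (assumed format)
    """
    flagged = {path for path, _issue in warnings}
    return {
        path: dict(counts, vulnerability=1 if path in flagged else 0)
        for path, counts in vectors.items()
    }

vectors = {"A.java": {"if": 2}, "B.java": {"try": 1, "catch": 1}}
warnings = [("B.java", "example issue")]
labeled = label_vectors(vectors, warnings)
print(labeled)
```

A file counts as vulnerable as soon as it has at least one warning; a real setup would also need to keep the label separate from any source token that happens to be named `vulnerability`.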
Page 45
Machine learning
Leverage machine learning techniques to build a prediction model
  - Training set: the data used to train the model
  - Testing set: the data used to validate the model
Various techniques available (SVM, Naive Bayes, Random Forest, CART, kNN)
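To illustrate the train-then-test idea, here is a small multinomial Naive Bayes classifier over token counts, written from scratch. Naive Bayes is one of the techniques listed above; the function names, the Laplace smoothing, and the toy data are assumptions made for this sketch and do not reproduce the setup used in the experiments.

```python
import math
from collections import Counter

def train_naive_bayes(samples):
    """samples: list of (token_counts, label), label 1 = vulnerable, 0 = clean."""
    class_docs = Counter()                        # number of files per class
    class_tokens = {0: Counter(), 1: Counter()}   # token totals per class
    vocab = set()
    for counts, label in samples:
        class_docs[label] += 1
        class_tokens[label].update(counts)
        vocab.update(counts)
    return class_docs, class_tokens, vocab

def predict(model, counts):
    """Return the label with the highest log-probability for one file."""
    class_docs, class_tokens, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        total = sum(class_tokens[label].values())
        score = math.log(class_docs[label] / total_docs)
        for token, n in counts.items():
            if token not in vocab:
                continue  # tokens never seen in training carry no evidence
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs
            p = (class_tokens[label][token] + 1) / (total + len(vocab))
            score += n * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training_set = [
    ({"try": 3, "catch": 3, "null": 2}, 1),
    ({"if": 1, "return": 1}, 0),
    ({"try": 2, "catch": 2}, 1),
    ({"return": 2, "int": 1}, 0),
]
model = train_naive_bayes(training_set)
print(predict(model, {"try": 1, "catch": 1}))
print(predict(model, {"return": 1}))
```

The testing set is then simply a second list of labeled vectors that the model has not seen, on which the predictions are compared with the labels.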
Page 52
Experiment 1
Can we predict vulnerable files in future versions of an app based on its first version?
  - Training set: the first version (v0) of an app
  - Testing set: all subsequent versions of that app
  - Repeat for all apps
Page 54
Experiment 2
Can we build a generalized predictor that works on all apps?
  - Training set: the first version (v0) of an app
  - Testing set: first versions of all other apps
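The two experimental designs differ only in how the training and testing sets are chosen. A minimal sketch, assuming each app is represented as a chronologically ordered list of versions (the names `within_app_splits` and `cross_app_splits` and the data layout are illustrative):

```python
def within_app_splits(apps):
    """Experiment 1: train on v0 of an app, test on all its later versions."""
    for name, versions in apps.items():
        yield name, versions[0], versions[1:]

def cross_app_splits(apps):
    """Experiment 2: train on v0 of one app, test on v0 of every other app."""
    for name, versions in apps.items():
        others = [v[0] for other, v in apps.items() if other != name]
        yield name, versions[0], others

# Placeholder version identifiers; real entries would be labeled feature vectors.
apps = {"AppA": ["A-v0", "A-v1", "A-v2"], "AppB": ["B-v0", "B-v1"]}
print(list(within_app_splits(apps)))
print(list(cross_app_splits(apps)))
```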
Page 55
Applications (data from early 2012)
  - F-Droid repository: 01/01/2010 to 31/12/2011
  - Selection criteria: open source, size, number of versions
Page 56
Applications: descriptive statistics
Page 68
Performance indicators
Accuracy: percentage of correctly classified files
  - imagine 90% of the files are clean
  - saying all files are clean will achieve 90% accuracy
Prediction vs. reality
  - True positive (TP)
  - True negative (TN)
  - False positive (FP)
  - False negative (FN)
Precision: P = TP/(TP+FP)
Recall: R = TP/(TP+FN)
[Figure: example grid of 12 files with 3 TP, 2 FP, 3 FN, and 4 TN, giving P = 3/5 and R = 3/6]
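The indicators can be checked with a few lines of Python. The first call uses the counts from the example grid on the slide (3 TP, 2 FP, 3 FN, 4 TN); the second reproduces the "all files are clean" case for a hypothetical set of 100 files of which 90 are clean. The function name `indicators` is an assumption.

```python
def indicators(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision is undefined when no file is predicted vulnerable.
    precision = tp / (tp + fp) if tp + fp else None
    # Recall is undefined when no file is actually vulnerable.
    recall = tp / (tp + fn) if tp + fn else None
    return accuracy, precision, recall

example = indicators(tp=3, fp=2, fn=3, tn=4)
all_clean = indicators(tp=0, fp=0, fn=10, tn=90)
print(example)
print(all_clean)
```

The second result shows why accuracy alone is misleading here: the trivial predictor reaches 90% accuracy while its recall is zero.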
Page 70
Performance indicators: K9
[Chart: K9Mail (Random Forest), precision and recall on a 0.00 to 1.00 scale, monthly from Mar 2010 to Dec 2011]
Page 73
Experiment 1: future predictions
When do we need to build a new model?
  - Retrain when performance indicators drop by 10%

Application       Retrain (months)
AnkiDroid         --
BoardGameGeek     9
ConnectBot        --
CoolReader        10
Crosswords        2
FBReader          --
K9Mail            12
KeePassAndroid    --
MileageTracker    1
Mustard           --
-- no retraining is required
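One possible reading of the retraining rule, sketched in Python. The slide does not say how the 10% drop is measured, so this sketch assumes an absolute drop of more than 0.10 in either precision or recall relative to the first month after training; the function name and the sample scores are made up for illustration.

```python
def months_until_retrain(monthly_scores, max_drop=0.10):
    """Return the 1-based month at which retraining is triggered, or None.

    monthly_scores: list of (precision, recall), index 0 = first month after
    training, used as the baseline (an assumption of this sketch).
    """
    base_precision, base_recall = monthly_scores[0]
    for month, (precision, recall) in enumerate(monthly_scores, start=1):
        if base_precision - precision > max_drop or base_recall - recall > max_drop:
            return month
    return None

# Hypothetical scores, not taken from the study.
scores = [(0.90, 0.80), (0.88, 0.80), (0.85, 0.65)]
print(months_until_retrain(scores))
```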
Page 81
Results: most influential features
Most influential features
  - e, Exception, try, catch (error handling)
  - if (branching)
  - null (pointer algebra)
  - java, org (import statements)
  - new, Log (others)
Produced by InfoGain
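Information gain ranks a feature by how much knowing it reduces the uncertainty about the label. A minimal sketch, assuming each token is reduced to a binary "occurs in the file or not" attribute (the ranking in the study may have discretized the counts differently); the function names and toy data are illustrative.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * math.log2(p)
    return result

def info_gain(samples, token):
    """Gain from splitting the files on whether `token` occurs in them.

    samples: list of (token_counts, label) pairs.
    """
    labels = [label for _, label in samples]
    present = [label for counts, label in samples if counts.get(token, 0) > 0]
    absent = [label for counts, label in samples if counts.get(token, 0) == 0]
    n = len(labels)
    remainder = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - remainder

toy = [({"try": 1}, 1), ({"try": 2}, 1), ({"if": 1}, 0), ({"if": 1}, 0)]
ranking = sorted(["try", "if", "new"], key=lambda t: info_gain(toy, t), reverse=True)
print(ranking)
```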
Page 82
Validity threats
Use of the Fortify tool for vulnerability extraction
  - Some research results have shown strong correlations between static analysis metrics and the quality of reported vulnerabilities
  - Manual validation seems to confirm our findings (work in progress)!
  - We are currently validating the same technique on Mozilla Firefox, and the results are slightly better than the existing work
Page 83
Conclusions and future research
We have presented a novel technique for predicting vulnerable Java files in Android applications
  - The obtained results are very promising
We are working in parallel on 2 additional tracks
  - Vulnerability prediction for Firefox/Chrome in C++
  - Vulnerability prediction for PHP
Page 84
Bring your own data
We are looking to validate our technique further
If you have data you are willing to share with us, we would be glad to collaborate