Text Analytics for Mobile App Security and Beyond. Tao Xie, University of Illinois at Urbana-Champaign, taoxie@illinois.edu
Page 1: Tao Xie University of Illinois at Urbana-Champaign 0 taoxie@illinois.edu.

Text Analytics for Mobile App Security and Beyond

Tao Xie, University of Illinois at Urbana-Champaign

taoxie@illinois.edu

Page 2:

Mobile App Markets

Apple App Store, Google Play, Microsoft Windows Phone

Page 3:

App Store beyond Mobile Apps!

Page 4:

What If Formal Specs Are Written?!

APP DEVELOPERS: App Functional Requirements, App Security Requirements

APP USERS: User Functional Requirements, User Security Requirements

Informal: app description, etc. (functional); permission list, etc. (security)

Page 5:

Informal App Functional Requirements: App Description

App Code

App Permissions

Page 6:

App Security Requirements: Permission List

Page 8:

Example Android App: Angry Birds

Page 9:

What If Formal Specs Are Written?!

APP DEVELOPERS: App Functional Requirements, App Security Requirements

APP USERS: User Functional Requirements, User Security Requirements

Informal: app description, etc. (functional); permission list, etc. (security)

In reality, few of these requirements are (formally) specified!!

Hope?! Bring humans into the loop: user perception + judgment

Page 10:

Our Yin-Yang View on Mobile App Security

User-Perceived Information [functional]: App Description; App UIs, App categories, App metadata, User forums, …

App Security Behavior [security]: App Code, App Permissions

o Reason about user-perceived info, e.g., WHYPER
o Push app security behavior across the boundary
o Check consistency across the boundary
o Reduce user judgment effort

Page 11:

Assuring Market Security/Privacy

o Apple (Market’s Responsibility)
  o Apple performs manual inspection
o Google (User’s Responsibility)
  o Users approve permissions for security/privacy
  o Bouncer (static/dynamic malware analysis)
o Windows Phone (Hybrid)
  o Permissions / manual inspection

Page 12:

Need More Than Program Analysis

o Previous approaches look at permissions and code (runtime behaviors)
o What do users expect?
  o GPS Tracker: record and send location
  o Phone-Call Recorder: record audio during phone call

App Description

App Code

App Permissions

Page 13:

Vision: “Bridging the gap between user expectation and app behaviors”

o User expectations
  o user perception + user judgment
o Focus on permissions and app descriptions
  o permissions (protecting user-understandable resources) should be discussed

Linkage: App Description Sentence and Permission

Page 14:

WHYPER Overview

Diagram: Application Market, WHYPER, DEVELOPERS, USERS

• Enhance user experience while installing apps
• Enforce functionality disclosure on developers
• Complement program analysis to ensure justifications

Pandita et al. WHYPER: Towards Automating Risk Assessment of Mobile Applications. USENIX Security 2013. http://web.engr.illinois.edu/~taoxie/publications/usenixsec13-whyper.pdf

Page 15:

Example Sentence in App Desc.

• E.g., “Also you can share the yoga exercise to your friends via Email and SMS.”
  – Implication of using the contact permission
  – Permission sentences

Keyword-based search on application descriptions

Page 16:

Problems with Ctrl + F

• Confounding effects:
  – Certain keywords such as “contact” have a confounding meaning
  – E.g., “... displays user contacts, ...” vs “... contact me at [email protected]”.
• Semantic inference:
  – Sentences often describe a sensitive operation without actually referring to keywords
  – E.g., “share yoga exercises with your friends via Email and SMS”
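
The two failure modes above can be sketched with a small keyword search. This is only an illustration, not the baseline used in the evaluation: the single-keyword list, the helper name keyword_flags, and the three sample sentences are assumptions chosen to mirror the slide's examples.

```python
import re
from typing import Iterable

def keyword_flags(sentence: str, keywords: Iterable[str]) -> bool:
    """Return True if any keyword starts a word in the sentence (case-insensitive)."""
    return any(
        re.search(r"\b" + re.escape(keyword), sentence, re.IGNORECASE)
        for keyword in keywords
    )

# Hypothetical single-keyword list for the contacts permission.
KEYWORDS = ["contact"]

# Sample sentences modeled on the slide's examples (wording is illustrative).
sentences = {
    "reads contacts": "The app displays user contacts in a list.",
    "support address": "For bug reports, contact me at the address below.",
    "implicit use": "Share yoga exercises with your friends via Email and SMS.",
}

flags = {label: keyword_flags(text, KEYWORDS) for label, text in sentences.items()}
```

In this sketch the support-address sentence is flagged although it has nothing to do with the user's contacts (confounding effect), while the sharing sentence is missed because it never uses the keyword (semantic inference).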

Page 17:

Natural Language Processing

• Natural Language Processing (NLP) techniques help computers understand NL artifacts
• In general, NLP is still difficult
• NLP on domain-specific sentences with specific styles is feasible
  – Text2Policy: extraction of security policies from use cases [FSE 12]
  – APIInfer: inferring contracts from API docs [ICSE 12]
  – WHYPER: domain knowledge from API docs [USENIX Security 13]

Page 18:

Overview of WHYPER

[Diagram labels: APP Description; APP Permission; API Docs; Preprocessor; NLP Parser; Intermediate Representation Generator; FOL Representation; Semantic Graph Generator; Semantic Graphs; Domain Knowledge; Semantic Engine; Annotated Description; WHYPER]

Page 19:

Preprocessor

• Period Handling
  – Decimals, ellipsis, shorthand notations (Mr., Dr.)
• Sentence Boundaries
  – Tabs, bullet points, delimiters (:)
  – Symbols (*,-) and enumeration sentence
• Named Entity Handling
  – E.g., “Pandora internet radio”
• Abbreviation Handling
  – E.g., “Instant Message (IM)”
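
A minimal sketch of the period-handling and sentence-boundary steps, assuming a regex-based splitter. It is not WHYPER's preprocessor: the function name split_sentences, the placeholder token, and the two-entry abbreviation list are illustrative assumptions, and named-entity and abbreviation handling are left out.

```python
import re

ABBREVIATIONS = ("Mr.", "Dr.")  # shorthand notations named on the slide
DOT = "@@DOT@@"  # hypothetical placeholder for periods that do not end a sentence

def split_sentences(description: str) -> list:
    text = description
    # Period handling: protect ellipses, decimals, and shorthand notations.
    text = text.replace("...", DOT * 3)
    text = re.sub(r"(?<=\d)\.(?=\d)", DOT, text)
    for abbreviation in ABBREVIATIONS:
        text = text.replace(abbreviation, abbreviation.replace(".", DOT))

    sentences = []
    # Sentence boundaries: newlines and tabs separate items.
    for line in re.split(r"[\n\t]+", text):
        # Drop a leading bullet symbol such as "*", "-", or "•".
        line = re.sub(r"^\s*[*\-•]\s*", "", line)
        # Remaining periods, question marks, and exclamation marks end sentences.
        for part in re.split(r"(?<=[.!?])\s+", line):
            part = part.strip().replace(DOT, ".")
            if part:
                sentences.append(part)
    return sentences

result = split_sentences(
    "Version 2.5 is here... Dr. Smith approves.\n* Record audio\n- Share with friends. Enjoy!"
)
```

Protecting the non-terminal periods first keeps the final split rule simple; a fuller version would need a much larger abbreviation list and a rule for ":" delimiters.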

Page 20:

Intermediate-Representation Generator

[Figure: the sentence “Also you can share the yoga exercise to your friends via Email and SMS” annotated with POS tags (RB, PRP, MD, VB, DT, NN, NNS, NNP), then with typed dependencies (advmod, nsubj, aux, dobj, det, nn, prep_to, poss, prep_via, conj_and), then as a tree over “share”, “to”, “you”, “yoga exercise”, “owned”, “friends”, “via”, “and”, “Email”, “SMS”. Legend: Predicate, Governing Entity, Dependent Entity.]

Page 21:

Semantic Engine

[Figure: the intermediate-representation tree (“share”, “to”, “you”, “yoga exercise”, “owned”, “friends”, “via”, “and”, “Email”, “SMS”) with “Email” and “share” highlighted and the annotations WordNet Similarity and Inferred from API Docs. Legend: Predicate, Governing Entity, Dependent Entity.]

Page 22:

Semantic-Graph Generator

Systematic approach to infer graphs
o Identify resource associated with the permissions from the API class name
  o ContactsContract.Contacts
o Inspect the member variables and member methods to identify actions and subordinate resources
  o ContactsContract.CommonDataKinds.Email
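
One way to picture the resulting semantic graph is as a resource with its actions plus subordinate resources with their own actions. The sketch below is a simplified stand-in, not WHYPER's representation: the class name SemanticGraph and every action word in the example graph are hypothetical, since the slides do not list the actions mined from the API docs.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticGraph:
    resource: str                                      # e.g. derived from an API class name
    actions: set = field(default_factory=set)          # actions on the main resource
    subordinates: dict = field(default_factory=dict)   # subordinate resource -> actions

    def matches(self, action: str, resource: str) -> bool:
        """True if the (action, resource) pair is covered by this graph."""
        if resource == self.resource:
            return action in self.actions
        return action in self.subordinates.get(resource, set())

# Hypothetical graph for READ_CONTACTS; the action sets are invented for illustration.
contacts = SemanticGraph(
    resource="contact",
    actions={"read", "display", "select"},
    subordinates={"email": {"share", "send"}, "number": {"call", "dial"}},
)

yoga_sentence_matches = contacts.matches("share", "email")
```

Under these assumed action sets, the pair (share, email) taken from the yoga example matches through the subordinate resource even though the word "contact" never appears.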

Page 23:

Evaluation

• Subjects
  – Permissions:
    • READ_CONTACTS
    • READ_CALENDAR
    • RECORD_AUDIO
  – 581 application descriptions
  – 9,953 sentences
• Evaluation setup
  – Manual annotation of the sentences
  – WHYPER for identifying permission sentences
  – Comparison to keyword-based searching

Page 24:

Evaluation Results

• Precision and recall of WHYPER
  – Average precision (82.8%) and recall (81.5%)
• Comparison to keyword-based searching
  – Improving precision (41.6%) and recall (-1.2%)
  – E.g., microphone-blow into and call-record

Permission      Keywords
READ_CONTACTS   contact, data, number, name, email
READ_CALENDAR   calendar, event, date, month, day, year
RECORD_AUDIO    record, audio, voice, capture, microphone

Page 25:

Access Control Policies (ACP) in Requirements Document

• Access control is often governed by security policies called Access Control Policies (ACP)
  – Includes rules to control which principals have access to which resources
• A policy rule includes four elements
  – Subject: HCP
  – Action: edit
  – Resource: patient's account
  – Effect: deny

ex. “The Health Care Personnel (HCP) does not have the ability to edit the patient's account.”
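
The four-element rule can be written down as a small record. This is a sketch only: the class name AcpRule, the helper decide, and the "permit" effect value are assumptions (the slide shows only a deny rule), and real policy languages define matching far more richly.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AcpRule:
    subject: str
    action: str
    resource: str
    effect: str  # "deny" on the slide; "permit" is an assumed counterpart

# The rule extracted from the example sentence on the slide.
rule = AcpRule(subject="HCP", action="edit", resource="patient's account", effect="deny")

def decide(rule: AcpRule, subject: str, action: str, resource: str) -> Optional[str]:
    """Return the rule's effect if the request matches it, else None (not applicable)."""
    if (subject, action, resource) == (rule.subject, rule.action, rule.resource):
        return rule.effect
    return None

decision = decide(rule, "HCP", "edit", "patient's account")
```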

Page 26:

Overview of Text2Policy

Stages: Linguistic Analysis, Model-Instance Construction, Transformation

An HCP should not change patient’s account.

An [subject: HCP] should not [action: change] [resource: patient’s account].

ACP Rule
  Subject: HCP
  Action: UPDATE - change
  Resource: patient’s account
  Effect: deny

Xiao et al. Automated Extraction of Security Policies from Natural-Language Software Documents. FSE 2012. http://web.engr.illinois.edu/~taoxie/publications/fse12-nlp.pdf

Page 27:

Example Technical Challenges in ACP Extraction

• Semantic Structure Variance
  – different ways to specify the same rule
• Negative Meaning Implicitness
  – verb could have negative meaning

ACP 1: An HCP cannot change patient’s account.
ACP 2: An HCP is disallowed to change patient’s account.

Page 28:

Road Ahead: Yin-Yang View

User-Perceived Information [functional]: App Description; App UIs, App categories, App metadata, User forums, …

App Security Behavior [security]: App Code, App Permissions

o Reason about user-perceived info, e.g., WHYPER
o Push app security behavior across the boundary
o Check consistency across the boundary
o Reduce user judgment effort

Page 29:

Text Analytics for Mobile App Security and Beyond

App Description

App Code

App Permissions

taoxie@illinois.edu

App UIs, App categories, App metadata, User forums, …

Acknowledgments: Supported in part by NSA Science of Security (SoS) Lablet, NSF SaTC, NSF SHF, NSF CAREER

Page 31:

Problems with Ctrl + F

o Confounding effects:
  o Certain keywords such as “contact” have a confounding meaning.
  o For instance, “... displays user contacts, ...” vs “... contact me at [email protected]”.
o Semantic Inference:
  o Sentences often describe a sensitive operation such as reading contacts without actually referring to the keyword “contact”.
  o For instance, “share yoga exercises with your friends via email, sms”.

Page 32:

Natural Language Processing (NLP)

• NLP techniques help computers understand NL artifacts
• NLP is still difficult
• NLP on domain-specific sentences with specific styles is feasible

Page 33:

RQ1 Results: Effectiveness of WHYPER

• Low FPs and FNs
  • out of 9,061 sentences, only 129 are flagged as FPs
  • among 581 applications, 109 applications (18.8%) contain at least one FP
  • among 581 applications, 86 applications (14.8%) contain at least one FN

Permission      SI   TP   FP   FN   TN     Prec.  Recall  F-Score  Acc
READ_CONTACTS   204  186  18   49   2,930  91.2   79.2    84.8     97.9
READ_CALENDAR   288  241  47   42   2,422  83.7   85.2    84.5     96.8
RECORD_AUDIO    259  195  64   50   3,470  75.3   79.6    77.4     97.0
TOTAL           751  622  129  141  9,061  82.8   81.5    82.2     97.3
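
The derived columns of the table follow from the four counts. The sketch below assumes the standard definitions of precision, recall, F-score, and accuracy (the slides do not spell them out) and applies them to the TOTAL row; the function name metrics is illustrative. In every row, SI equals TP + FP.

```python
def metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute percentage metrics (one decimal place) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    values = {
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "accuracy": accuracy,
    }
    return {name: round(100 * value, 1) for name, value in values.items()}

# Counts from the TOTAL row of the table.
total = metrics(tp=622, fp=129, fn=141, tn=9061)
```

With these definitions the TOTAL row comes out as 82.8, 81.5, 82.2, and 97.3, matching the table, and the four counts sum to the 9,953 sentences in the evaluation.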

Page 34:

Result Analysis (False Positives)

• Incorrect parsing
  • “MyLink Advanced provides full synchronization of all Microsoft Outlook emails (inbox, sent, outbox and drafts), contacts, calendar, tasks and notes with all Android phones via USB”
• Synonym analysis
  • “You can now turn recordings into ringtones.”

Page 35:

Result Analysis (False Negatives)

• Incorrect parsing
  • Incorrect identification of sentence boundaries and limitations of underlying NLP infrastructure
• Limitations of Semantic Graphs
  • Manual Augmentation
    • microphone-blow into and call-record
    • significant improvement of Delta Recalls: -6.6% to 0.6%
  • Automatic mining from user comments and forums

Page 37:

Linguistic Analysis

• Incorporate syntactic and semantic analysis
  – syntactic structure -> noun group, verb group, etc.
  – semantic meaning -> subject, action, resource, negative meaning, etc.
• Provide new techniques for model extraction
  – Identify ACP and AS (action-step) sentences
  – Infer semantic meaning

Page 38:

Common Techniques

• Shallow parsing
• Domain dictionary
• Anaphora resolution

ex. An HCP can view patient’s account. He is disallowed to change the patient’s account.

[Figure labels over the example: Subject, Main Verb Group, Object; NP, VG, PNP; UPDATE; HCP]

Page 39:

Technical Challenges (TC) in ACP Extraction

• TC1: Semantic Structure Variance
  – different ways to specify the same rule
• TC2: Negative Meaning Implicitness
  – verb could have negative meaning

ACP 1: An HCP cannot change patient’s account.
ACP 2: An HCP is disallowed to change patient’s account.

Page 40:

Semantic-Pattern Matching

• Address TC1 Semantic Structure Variance
• Compose pattern based on grammatical function

ex. An HCP is disallowed to change the patient’s account.

Pattern: passive voice followed by to-infinitive phrase
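
The pattern named on this slide can be approximated with a regular expression. This is a rough sketch under strong assumptions: Text2Policy composes patterns over grammatical functions from shallow parsing, whereas the regex, its group names, and the helper match_passive_to_infinitive below are hypothetical and only handle a one-word subject.

```python
import re

# Hypothetical surface pattern: article + subject + "is/are" + past participle
# + "to" + infinitive verb + object.
PASSIVE_TO_INFINITIVE = re.compile(
    r"^(?:An?|The)\s+(?P<subject>\w+)\s+(?:is|are)\s+(?P<verb>\w+ed)\s+to\s+"
    r"(?P<action>\w+)\s+(?P<resource>.+?)\.?$",
    re.IGNORECASE,
)

def match_passive_to_infinitive(sentence: str):
    """Return the matched elements as a dict, or None if the pattern does not apply."""
    match = PASSIVE_TO_INFINITIVE.match(sentence.strip())
    return match.groupdict() if match else None

parsed = match_passive_to_infinitive(
    "An HCP is disallowed to change the patient's account."
)
```

A sentence with a different structure, such as "An HCP cannot change the patient's account.", does not match this pattern and would need its own pattern, which is the variance that TC1 refers to.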

Page 41:

Negative-Expression Identification

• Address TC2 Negative Meaning Implicitness
• Negative expression
  – “not” in subject:
    ex. No HCP can edit patient’s account.
  – “not” in verb group:
    ex. HCP can not edit patient’s account.
    ex. HCP can never edit patient’s account.
• Negative meaning words in main verb group
  ex. An HCP is disallowed to change the patient’s account.
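
The three cases above can be sketched as a token-level check. This is a simplification, not Text2Policy's analysis: it looks at plain tokens instead of a parsed subject and verb group, the word lists are hypothetical, and "permit" as the default effect is an assumption.

```python
NEGATIVE_SUBJECT_WORDS = {"no"}                           # negation in the subject
NEGATIVE_VERB_GROUP_WORDS = {"not", "never", "cannot"}    # negation in the verb group
NEGATIVE_VERBS = {"disallowed", "prohibited", "forbidden"}  # hypothetical dictionary

def infer_effect(sentence: str) -> str:
    """Infer 'deny' if the sentence carries a negative expression, else 'permit'."""
    tokens = sentence.lower().rstrip(".").split()
    if tokens and tokens[0] in NEGATIVE_SUBJECT_WORDS:
        return "deny"
    if any(t in NEGATIVE_VERB_GROUP_WORDS or t in NEGATIVE_VERBS for t in tokens):
        return "deny"
    return "permit"

examples = [
    "No HCP can edit patient's account.",
    "HCP can not edit patient's account.",
    "HCP can never edit patient's account.",
    "An HCP is disallowed to change the patient's account.",
]
effects = [infer_effect(sentence) for sentence in examples]
```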

Page 42:

AS: Syntactic-Pattern Matching

• Syntactic elements
  – Subject, Main verb, Object
• Subject and Object Checking
  – subject is not a user or object is not a resource
• Filtering negative-meaning sentences
  – Negative sentences tend not to describe ASs

ex. The prescription list should include medication, the name of the doctor. . .

Page 44:

ACP Model-Instance Construction

ex. An HCP is disallowed to change the patient’s account.

• Identify subject, action, and resource:
  – Subject: HCP
  – Action: change
  – Resource: patient’s account
• Infer effect:
  – Negative Expression: none
  – Negative Verb: disallow
  – Inferred Effect: deny

ACP Rule
  Subject: HCP
  Action: UPDATE - change
  Resource: patient’s account
  Effect: deny

Page 45:

AS Model-Instance Construction

• Use case patterns
  – industry use cases [DSN’09]
  – public use cases
• Model-Instance Construction

ex. The patient views access log.

Action Step
  Actor: patient
  Action: OUTPUT – view
  Resource: access log

Page 46:

Technical Challenges in Action-Step Extraction

• TC4: Transitive Subject
• TC5: Perspective Variance

AS 1: He edits the account. (He: HCP)
AS 2: The system updates the account.
AS 3: The system displays the updated account. (HCP views the updated account.)

Page 47:

Subject Flow Tracking

• Address TC4 Transitive Subject
• Apply data flow to track non-system subject:

AS 1: The HCP edits the account.
AS 2: The system updates the account.

Tracking: only system as subject, replaced with HCP as subject

Page 48:

Perspective Conversion

• Address TC5 Perspective Variance
• Apply data flow to track non-system subject:

AS 1: The HCP edits the account.
AS 2: The system shows the updated account.

Tracking: only system as subject and action is output

Converting to “HCP views the updated account”
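
Subject flow tracking and perspective conversion can be sketched together as one pass over the action steps. This is an illustration of the idea on the slides, not Text2Policy's implementation: the ActionStep record, the function track_and_convert, and the set of verbs treated as output actions are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ActionStep:
    subject: str
    action: str
    resource: str

OUTPUT_ACTIONS = {"shows", "displays"}  # hypothetical verbs classified as output

def track_and_convert(steps):
    """Replace system subjects with the most recent non-system subject."""
    result, last_actor = [], None
    for step in steps:
        if step.subject.lower() != "system":
            last_actor = step.subject          # remember the latest non-system subject
            result.append(step)
        elif last_actor is None:
            result.append(step)                # nothing to propagate yet
        elif step.action in OUTPUT_ACTIONS:
            # Perspective conversion: system output becomes the actor viewing it.
            result.append(ActionStep(last_actor, "views", step.resource))
        else:
            # Subject flow tracking: the system step is attributed to the actor.
            result.append(ActionStep(last_actor, step.action, step.resource))
    return result

steps = [
    ActionStep("HCP", "edits", "account"),
    ActionStep("system", "updates", "account"),
    ActionStep("system", "shows", "updated account"),
]
converted = track_and_convert(steps)
```

After the pass, the second step has HCP as its subject and the third reads as "HCP views updated account", so both can be checked against ACP rules that mention the HCP.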

Page 49:

Evaluation – RQs

• RQ1: How effectively does Text2Policy identify ACP sentences in NL documents?

• RQ2: How effectively does Text2Policy extract ACP rules from ACP sentences?

• RQ3: How effectively does Text2Policy extract action steps from action-step sentences?

Page 50:

Evaluation – Subject

• iTrust open source project
  – http://agile.csc.ncsu.edu/iTrust/wiki/
  – 448 use-case sentences (37 use cases)
  – preprocessed use cases
• Collected ACP sentences
  – 100 ACP sentences
  – From 17 sources (published papers and websites)
• A module of an IBMApp (financial domain)
  – 25 use cases

Page 51:

RQ1 ACP Sentence Identification

• Apply Text2Policy to identify ACP sentences in iTrust use cases and IBMApp use cases
• Text2Policy effectively identifies ACP sentences with precision and recall more than 88%
• Precision on IBMApp use cases is better
  – proprietary use cases are often of higher quality compared to open-source use cases

Page 52:

Evaluation – RQ2 Accuracy of Policy Extraction

• Apply Text2Policy to extract ACP rules from ACP sentences

• Text2Policy effectively extracts ACP model instances with accuracy above 86%

Page 53:

Evaluation – RQ3 Accuracy of Action-Step Extraction

• Apply Text2Policy to extract action steps from iTrust and IBMApp use cases
• Text2Policy effectively extracts AS model instances with accuracy above 81%
• Limitations:
  – Subordinate conjunction or else and long phrases

Page 54:

Detected Inconsistencies

• No violation of the extracted ACPs by the ASs
• Inconsistent names used for referring to the same entity (e.g., user) across different use cases

ex. editor used in UC 4 of iTrust use cases actually refers to HCP, admin, and all users in UCs 1, 2, and 4

Page 55:

Summary

• Natural Language Processing (NLP) for domain-specific purposes is feasible
  – Challenging for general documents
  – Feasible for domain-specific sentences with specific styles
• New techniques are required
  – Addressing unique challenges in software engineering

http://research.csc.ncsu.edu/ase/projects/text2policy/