Semantics for Big Data (,) Security and Privacy Tim Finin and Anupam Joshi University of Maryland, Baltimore County Baltimore MD NSF Workshop on Big Data.

Semantics for Big Data(,) Security and Privacy

Tim Finin and Anupam Joshi

University of Maryland, Baltimore County

Baltimore MD

NSF Workshop on Big Data Security and Privacy

2014-09-16, University of Texas at Dallas

http://ebiq.org/r/363

The plot outline

• Big data
→ Variety
→ Need for integration & fusion
→ Must understand data semantics
→ Use semantic languages & tools (reasoners, ML)
→ Have shared ontologies & background knowledge

• Relevance to security and privacy
– Protect personal information, especially in mobile/IoT scenarios
– Better intrusion detection systems

Use Case Examples

We’ve used semantic technologies in support of assured information tasks including
– Representing & enforcing information sharing policies
– Negotiating for cloud services respecting organizational constraints (e.g., data privacy, location, …)
– Modeling context for mobile users and using this to manage information sharing
– Acquiring, using and sharing knowledge for situationally-aware intrusion detection systems

Key technologies include Semantic Web languages (OWL, RDF) and tools, and information extraction from text

Context-Aware Privacy and Security

• Smart mobile devices know a great deal about their users, including their current context

• Acquiring and using this knowledge helps them provide better services

• Sharing the information with other users, organizations and service providers can also be beneficial (Mobile Ad-Hoc Knowledge Networks)

• Context-aware policies can be used to limit information sharing as well as to control the actions and information access of mobile apps

We’re in a two-hour budget meeting at X with A, B and C

We’re in an important meeting

We’re busy

http://ebiq.org/p/589
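The three messages above describe the same context at decreasing levels of detail. A minimal Python sketch of that idea follows. The `Context` record, the three-level `trust` scale and the function names are assumptions made for illustration; the slides do not say how the levels are chosen.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Context:
    duration: str          # e.g. "two-hour"
    activity: str          # e.g. "budget meeting"
    place: str             # e.g. "X"
    companions: List[str]  # e.g. ["A", "B", "C"]


def join_names(names: List[str]) -> str:
    """Join names as 'A, B and C'."""
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]


def describe(ctx: Context, trust: int) -> str:
    """Describe ctx with a level of detail that depends on trust.

    trust is a hypothetical scale: 2 = full detail, 1 = coarse, 0 = minimal.
    """
    if trust >= 2:
        return (f"We're in a {ctx.duration} {ctx.activity} at {ctx.place} "
                f"with {join_names(ctx.companions)}")
    if trust == 1:
        # Keep only the general kind of activity; "important" is hard-coded here.
        kind = ctx.activity.split()[-1]
        return f"We're in an important {kind}"
    return "We're busy"


ctx = Context("two-hour", "budget meeting", "X", ["A", "B", "C"])
for level in (2, 1, 0):
    print(describe(ctx, level))
```

In a real system the level would presumably come from a context-aware policy rather than a fixed integer.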

Context-aware power management

• Maintaining the context model uses power
• We empirically measured the power usage of a phone’s sensors, developed accurate power models from the measurements, and use them for optimization

When updating the context model

1. Only enable sensors required by policy, and reuse recent sensor readings whenever appropriate
e.g., disable the GPS sensor when at home in the evening

2. Prefer sensors with a lower energy footprint, or ones already in use, when several are available
e.g., choose Wi-Fi over GPS for location at the office during the day

3. Reorder rule conditions to reduce energy use
e.g., check conditions requiring no sensor access first
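The three heuristics above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the sensor names, the relative energy costs in `ENERGY_COST`, and the representation of a rule as a list of (sensor, predicate) pairs are all invented here.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical relative energy cost per reading; the slides give no figures.
ENERGY_COST: Dict[str, float] = {"gps": 10.0, "wifi": 3.0}

# A rule condition: the sensor it needs (None if it needs no sensor) and a
# predicate over the readings collected so far.
Condition = Tuple[Optional[str], Callable[[Dict[str, str]], bool]]


def condition_cost(cond: Condition, free: Set[str]) -> float:
    """Energy needed to check cond; sensors in free cost nothing extra."""
    sensor = cond[0]
    if sensor is None or sensor in free:
        return 0.0
    return ENERGY_COST[sensor]


def order_conditions(conds: Iterable[Condition], free: Set[str]) -> List[Condition]:
    """Heuristic 3: check the cheapest conditions first."""
    return sorted(conds, key=lambda c: condition_cost(c, free))


def pick_sensor(candidates: Iterable[str], active: Set[str]) -> str:
    """Heuristic 2: prefer a sensor already in use, otherwise the cheapest."""
    return min(candidates, key=lambda s: (s not in active, ENERGY_COST[s]))


def evaluate(conds: Iterable[Condition], active: Set[str],
             read: Callable[[str], str],
             cache: Optional[Dict[str, str]] = None) -> Tuple[bool, List[str]]:
    """Evaluate a conjunctive rule, reading a sensor only when needed.

    Heuristic 1: cached (recent) readings are reused, and evaluation stops at
    the first failed condition so later sensors are never enabled.
    Returns (rule result, sensors actually read).
    """
    readings = dict(cache or {})
    used: List[str] = []
    free = set(active) | set(readings)
    for sensor, pred in order_conditions(conds, free):
        if sensor is not None and sensor not in readings:
            readings[sensor] = read(sensor)
            used.append(sensor)
        if not pred(readings):
            return False, used
    return True, used
```

For example, a rule with one GPS condition and one sensor-free condition that fails is rejected without ever reading the GPS, because the sensor-free condition is checked first.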

Intrusion Detection Systems

• Current intrusion detection systems are poor at detecting zero-day and “low and slow” attacks, and APTs

• Sharing information from heterogeneous data sources can provide useful information even when an attack signature is unavailable

• Implemented prototypes that integrate and reason over data from IDSs, host and network scanners, and text at the knowledge level

• We’ve established the feasibility of the approach in simple evaluation experiments

[Figure: From dashboards & watchstanding ((simple) analysis) … to situational awareness. Diagram labels: Non-traditional “sensors”, Traditional sensors, Facts / Information, Context / Situation, Policies, Analytics, Alerts]

Use-after-free vulnerability in Microsoft Internet Explorer 6 through 8 ….

[ a IDPS:text_entity;
  IDPS:has_vulnerability_term "true";
  IDPS:has_security_exploit "true";
  IDPS:has_text "Internet Explorer";
  IDPS:has_text "arbitrary code";
  IDPS:has_text "remote attackers". ]

[ a IDPS:system;
  IDPS:host_IP "130.85.93.105". ]

[ a IDPS:scannerLog;
  IDPS:scannerLogIP "130.85.93.105"; … ]

[ a IDPS:gatewayLog;
  IDPS:gatewayLogIP "130.85.93.105"; … ]

[ IDPS:scannerLog IDPS:hasBrowser ?Browser
  IDPS:gatewayLog IDPS:hasURL ?URL
  ?URL IDPS:hasSymantecRating "unsafe"
  IDPS:scannerLog IDPS:hasOutboundConnection "true"
  IDPS:WiresharkLog IDPS:isConnectedTo ?IPAddress
  ?IPAddress IDPS:isZombieAddress "true" ]
=>
[ IDPS:system IDPS:isUnderAttack "use-after-free vulnerability"
  IDPS:attack IDPS:hasMeans "Backdoor"
  IDPS:attack IDPS:hasConsequence "UnauthorizedRemoteAccess" ]
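To make the rule above concrete, here is a small Python sketch that treats facts as (subject, predicate, object) triples and fires the rule when every condition can be matched with consistent variable bindings. It is a toy matcher written for this illustration, not the reasoner used in the prototypes. The sample browser, URL and IP address values are invented, and the spellings of `IDPS:isZombieAddress`, "use-after-free" and "UnauthorizedRemoteAccess" are normalized.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]

# Rule body and head, transcribed from the slide (terms starting with "?" are variables).
BODY: List[Triple] = [
    ("IDPS:scannerLog", "IDPS:hasBrowser", "?Browser"),
    ("IDPS:gatewayLog", "IDPS:hasURL", "?URL"),
    ("?URL", "IDPS:hasSymantecRating", "unsafe"),
    ("IDPS:scannerLog", "IDPS:hasOutboundConnection", "true"),
    ("IDPS:WiresharkLog", "IDPS:isConnectedTo", "?IPAddress"),
    ("?IPAddress", "IDPS:isZombieAddress", "true"),
]
HEAD: List[Triple] = [
    ("IDPS:system", "IDPS:isUnderAttack", "use-after-free vulnerability"),
    ("IDPS:attack", "IDPS:hasMeans", "Backdoor"),
    ("IDPS:attack", "IDPS:hasConsequence", "UnauthorizedRemoteAccess"),
]


def unify(pattern: Triple, fact: Triple, env: Bindings) -> Optional[Bindings]:
    """Match one pattern against one fact, extending env; None on mismatch."""
    new = dict(env)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new


def match(patterns: List[Triple], facts: Set[Triple], env: Bindings) -> Iterator[Bindings]:
    """Yield every binding that satisfies all patterns (depth-first join)."""
    if not patterns:
        yield env
        return
    for fact in facts:
        extended = unify(patterns[0], fact, env)
        if extended is not None:
            yield from match(patterns[1:], facts, extended)


def apply_rule(facts: Set[Triple]) -> Set[Triple]:
    """Return facts plus the rule head if the body matches at least once."""
    if next(match(BODY, facts, {}), None) is not None:
        return facts | set(HEAD)
    return set(facts)


# Invented sample facts; 192.0.2.1 and example.test are reserved for documentation.
FACTS: Set[Triple] = {
    ("IDPS:scannerLog", "IDPS:hasBrowser", "Internet Explorer 8"),
    ("IDPS:gatewayLog", "IDPS:hasURL", "http://example.test/page"),
    ("http://example.test/page", "IDPS:hasSymantecRating", "unsafe"),
    ("IDPS:scannerLog", "IDPS:hasOutboundConnection", "true"),
    ("IDPS:WiresharkLog", "IDPS:isConnectedTo", "192.0.2.1"),
    ("192.0.2.1", "IDPS:isZombieAddress", "true"),
}

derived = apply_rule(FACTS)
print(("IDPS:system", "IDPS:isUnderAttack", "use-after-free vulnerability") in derived)
```

Removing any one of the sample facts leaves the fact set unchanged, since the conjunction no longer matches.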

Maintaining the vulnerability KB

• Our approach requires us to keep the KB of software products and known or suspected vulnerabilities and attacks up to date

• Resources like NVD are great, but tapping into text can enrich their info and give earlier warnings of problems

[Figure: timeline of a vulnerability’s life cycle. Labels and dates in the order extracted: CVE disclosed (01/14/13); Vendor deploys software; Attacker finds vuln. & exploits it (01/10/13); Exploit reported in mailing list; (01/10/13) Vuln. reported in NVD RSS feed; Analysis; Vuln. analyzed & included in NVD feed (02/16/2013); Vendor analysis; Threat disclosed in vendor bulletin (03/04/2013); Patch development; Patch released (Critical Patch Update) (06/18/2013); Resolution; System update]

Information extraction from text

CVE-2012-0150

Buffer overflow in msvcrt.dll in Microsoft Windows Vista SP2, Windows Server 2008 SP2, R2, and R2 SP1, and Windows 7 Gold and SP1 allows remote attackers to execute arbitrary code via a crafted media file, aka “Msvcrt.dll Buffer Overflow Vulnerability.”

[Figure annotations on the CVE text]

Identify relationships: ebqids:hasMeans, ebqids:affectsProduct

Link concepts to entities: http://dbpedia.org/resource/Buffer_overflow, http://dbpedia.org/resource/Windows_7, http://dbpedia.org/resource/Arbitrary_code_execution

• We use information extraction techniques to identify entities, relations and concepts in security related text

• These are mapped to terms in our ontology and the DBpedia LOD KB (based on Wikipedia)

• Google’s slogan: “Things, not strings”
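As a rough illustration of the spotting and linking steps, the sketch below uses a tiny phrase dictionary to find concepts in the CVE summary and map them to the three DBpedia URIs shown on the slide. The phrase list, the function names, and the pairing of `ebqids:hasMeans` with Buffer_overflow and `ebqids:affectsProduct` with Windows_7 are assumptions for this example; the actual system uses information extraction techniques, not a hand-written lookup table.

```python
from typing import Dict, List, Tuple

# Surface phrase -> DBpedia URI. The URIs come from the slide; the phrases are assumed.
GAZETTEER: Dict[str, str] = {
    "buffer overflow": "http://dbpedia.org/resource/Buffer_overflow",
    "windows 7": "http://dbpedia.org/resource/Windows_7",
    "execute arbitrary code": "http://dbpedia.org/resource/Arbitrary_code_execution",
}

# Assumed pairing of relations with linked entities.
RELATION: Dict[str, str] = {
    "http://dbpedia.org/resource/Buffer_overflow": "ebqids:hasMeans",
    "http://dbpedia.org/resource/Windows_7": "ebqids:affectsProduct",
}

SUMMARY = (
    "Buffer overflow in msvcrt.dll in Microsoft Windows Vista SP2, Windows Server "
    "2008 SP2, R2, and R2 SP1, and Windows 7 Gold and SP1 allows remote attackers "
    "to execute arbitrary code via a crafted media file"
)


def spot(text: str) -> List[Tuple[str, str]]:
    """Return (phrase, URI) pairs for known phrases, in order of appearance."""
    lowered = text.lower()
    found = []
    for phrase, uri in GAZETTEER.items():
        pos = lowered.find(phrase)
        if pos >= 0:
            found.append((pos, phrase, uri))
    return [(phrase, uri) for _, phrase, uri in sorted(found)]


def to_triples(cve_id: str, text: str) -> List[Tuple[str, str, str]]:
    """Emit (CVE id, relation, URI) for each spotted entity with a known relation."""
    return [(cve_id, RELATION[uri], uri)
            for _, uri in spot(text) if uri in RELATION]


for triple in to_triples("CVE-2012-0150", SUMMARY):
    print(triple)
```

Linking to URIs rather than keeping the matched strings is what lets the extracted facts join with other data about the same products and attack types.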

Maintaining the vulnerability KB

[Figure: architecture diagram. Labels: Web text (security bulletins, blogs); Unstructured data (vuln. summaries); Entity & Concept Spotter; Extracted concepts <Concept, Class>; NVD dataset; Structured data (XML); IDS Ontology; Linking & Mapping Entities; RDF Generation; Triple Store; Linked Cybersecurity Data; Consumers]

Faceblock


Faceblock Ontology

Faceblock’s (OWL) ontology lets one write context policy rules using predefined activity and place types

Faceblock Protocols

The user’s device maintains context, reasons with policy rules and informs glass devices of the Faceblock property: True or False
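The decision step of this protocol can be sketched as a lookup over policy rules keyed on activity and place types. Everything concrete below is hypothetical: the activity and place names, the first-match semantics, and the default value are assumptions for this example, and the real system expresses these rules against an OWL ontology, not Python classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Context:
    activity: str  # one of the predefined activity types, e.g. "Meeting"
    place: str     # one of the predefined place types, e.g. "Office"


@dataclass(frozen=True)
class PolicyRule:
    activity: Optional[str]  # None matches any activity
    place: Optional[str]     # None matches any place
    faceblock: bool          # value to announce when the rule matches

    def matches(self, ctx: Context) -> bool:
        return (self.activity in (None, ctx.activity)
                and self.place in (None, ctx.place))


def faceblock_property(ctx: Context, rules: List[PolicyRule],
                       default: bool = False) -> bool:
    """Return the value of the first matching rule, or default if none match."""
    for rule in rules:
        if rule.matches(ctx):
            return rule.faceblock
    return default


# Hypothetical policy: block face capture in meetings anywhere and at the gym,
# allow it at parties.
RULES = [
    PolicyRule(activity="Meeting", place=None, faceblock=True),
    PolicyRule(activity=None, place="Gym", faceblock=True),
    PolicyRule(activity="Party", place=None, faceblock=False),
]

print(faceblock_property(Context("Meeting", "Office"), RULES))
```

The resulting boolean is the only thing the glass device needs to receive, so the user's detailed context never has to leave the user's own device.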

Taming Wild Big Data

• WBD is structured or semi-structured data for which we lack schema-level understanding
– e.g., raw tables, graphs, XML, logs

• Developed tools to generate semantic data from background ontologies & KBs, e.g. for clinical trial tables

• It’s harder when the domain is not even known. We’re developing systems that use large background KBs (e.g., Google’s Freebase) to predict types/subtypes of data instances

http://ebiq.org/p/672
http://ebiq.org/p/661
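One simple way to picture type prediction from a background KB is a vote: each value in a column proposes the types the KB knows for it, and the most frequent type wins. The sketch below is a deliberately small stand-in for that idea. `TOY_KB`, its entries and the alphabetical tie-breaking are invented for illustration and say nothing about how the systems referenced above actually work.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Set

# Hypothetical background KB: surface string -> candidate types.
TOY_KB: Dict[str, Set[str]] = {
    "baltimore": {"City"},
    "dallas": {"City", "TVShow"},
    "paris": {"City", "Person"},
    "aspirin": {"Drug"},
    "ibuprofen": {"Drug"},
}


def predict_type(values: Iterable[str],
                 kb: Dict[str, Set[str]] = TOY_KB) -> Optional[str]:
    """Predict one type for a column of values by majority vote.

    Ambiguous values vote for all of their candidate types; the other values
    in the column then settle which reading is intended. Ties are broken
    alphabetically so the result is deterministic.
    """
    votes: Counter = Counter()
    for value in values:
        for candidate in kb.get(value.strip().lower(), set()):
            votes[candidate] += 1
    if not votes:
        return None
    best = max(votes.values())
    return min(t for t, n in votes.items() if n == best)


print(predict_type(["Baltimore", "Dallas", "Paris"]))
```

Here "Dallas" and "Paris" are ambiguous on their own, but the column as a whole supports only one shared type.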

Conclusion

• Google’s new slogan: things, not strings
• We also need: measurements, not numbers
• Common ontologies in semantic representations enable big data integration at a “knowledge level”
– data, meta-data, provenance, certainty, rules

• Many advantages:
– Enhancing discovery, integration and interoperability
– Enabling inference and knowledge-level analytics
– Expressing policy constraints in common semantic terms

http://ebiq.org/r/363
