Page 1: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Semantics-empowered Big Data Processing for PCS Applications

Krishnaprasad Thirunarayan (T. K. Prasad) and Amit Sheth

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing

Wright State University, Dayton, OH-45435

Page 2: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Outline

• 5 V’s of Big Data Research

• Semantic Perception for Scalability

• Lightweight semantics to manage heterogeneity – Cost-benefit trade-off and continuum

• Hybrid Knowledge Representation and Reasoning – Anomaly, Correlation, Causation

11/15/2013

Page 3: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

5 V’s of Big Data Research

Volume

Velocity

Variety

Veracity

Value

Big Data => Smart Data

Page 4: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Volume : Assorted Examples

• 25+ billion sensors have been deployed.

• About 250TB of sensor data are generated for a NY-LA flight on Boeing 737.

• A Parkinson’s disease dataset that tracked 16 patients with mobile phones using 7 sensors over 8 weeks is 12 GB.

Check engine light analogy

Page 5: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Volume : Semantic Perception

• Abstracting machine-sensed data
– E.g., fine-grained to coarse-grained
– E.g., average, peak, rate of change

• Extracting human-comprehensible features/entities

• Machine perception
– Derive conclusions using domain models and hybrid abductive/deductive reasoning

Goal: Human accessible situational awareness and actionable intelligence for decision making
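
The abstraction step named above (average, peak, rate of change) can be illustrated with a minimal sketch. The function name `summarize_window` and the sample layout (timestamp in seconds, value) are assumptions made for illustration; the slides do not specify an implementation.

```python
def summarize_window(samples):
    """Reduce a time-ordered window of (timestamp_seconds, value) pairs
    to three coarse-grained features. Hypothetical helper, not from the talk."""
    values = [value for _, value in samples]
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    duration = t_last - t_first
    return {
        "average": sum(values) / len(values),
        "peak": max(values),
        # Net change per second across the window; 0.0 for a single sample.
        "rate_of_change": (v_last - v_first) / duration if duration else 0.0,
    }


# Example: four heart-rate-like readings taken 10 seconds apart.
window = [(0, 70.0), (10, 74.0), (20, 78.0), (30, 76.0)]
summary = summarize_window(window)
```

Many fine-grained readings collapse into a handful of values that are easier to reason over. The sketch assumes a non-empty window.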

Page 6: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Weather Use Case

• Machine-sensed phenomenon
– temperature, precipitation, humidity, wind speed, etc.

• Human perceived features
– blizzard, flurry, rain storm, clear, etc.
– categories of hurricanes (SSHWS)

• Machine perception
– Using domain models from NOAA

• Ultimately, generate weather alerts …
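
As a small illustration of turning a machine-sensed value into a human-perceived feature, the sketch below maps sustained wind speed to a hurricane category. The thresholds are those of the published Saffir-Simpson Hurricane Wind Scale (1-minute sustained winds, in mph); the function name `sshws_category` is hypothetical, and this is not the domain model used in the talk.

```python
# Lower bounds (mph) of each SSHWS category, highest first.
SSHWS_LOWER_BOUNDS = [(157, 5), (130, 4), (111, 3), (96, 2), (74, 1)]


def sshws_category(sustained_wind_mph):
    """Return the SSHWS category (1-5), or 0 if below hurricane strength."""
    for lower_bound, category in SSHWS_LOWER_BOUNDS:
        if sustained_wind_mph >= lower_bound:
            return category
    return 0


categories = [sshws_category(mph) for mph in (50, 80, 100, 120, 140, 160)]
```

A full weather-alert model would combine several sensed phenomena; this covers only the single-variable case.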

Page 7: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Parkinson’s Disease Use Case

• Data from machine-sensors
– accelerometer, GPS, compass, microphone, etc.

• Human perceived features
– tremors, walking style, balance, slurred speech, etc.

• Machine perception
– Using domain models to be created to diagnose and monitor disease progression

• Ultimately, recommend options to control chronic conditions …

Page 8: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Heart Failure Use Case

• Machine-sensed data
– weight, heart rate, blood pressure, oxygen level, etc.

• Human perceived features
– Risk-level for hospital readmission of CHF/ADHR patient

• Machine perception
– Using domain models to be created to monitor heart condition of a cardiac patient post hospital discharge

• Ultimately, recommend treatments to reduce preventable hospital readmissions …

Page 9: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Asthma Use Case

• Data from machine-sensors
– Environmental sensors, physiological sensors, etc.

• Human perceived features
– Asthma severity gleaned from frequency of asthma attacks, wheezing, coughing, sleeplessness, etc.

• Machine perception
– Using domain models to be created to monitor asthma patients and their surroundings

• Ultimately, recommend prevention and control options …

Page 10: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Traffic Use Case

• Data from machine-sensors, social media stream, and planned event schedules
– Traffic flow sensors: link speed, link volume, Event-specific tweets, etc.

• Human perceived features
– traffic delays and congestion, etc.

• Machine perception
– Using domain models to be created to understand traffic patterns in response to events

• Ultimately, recommend traffic management options …

Page 11: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Heterogeneity in a Physical-Cyber-Social System

[Figure: data from 511.org, with panels labeled Slow moving traffic, Link Description, Scheduled Event, Schedule Information, and Traffic Monitoring]

Page 12: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Volume with a Twist

Resource-constrained reasoning on mobile devices

Goal: Boolean encodings to ensure feasibility, efficiency, and economy

Page 13: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Perception Cycle* that exploits background knowledge / domain models

[Figure: cycle between “Observe Property” and “Perceive Feature”, informed by Prior Knowledge]

1. Explanation: Abstracting raw data for human comprehension

2. Discrimination: Focus generation for disambiguation and action (incl. human in the loop)

* based on Neisser’s cognitive model of perception
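
The two steps of the cycle, explanation and discrimination, can be sketched with Boolean (bit-vector) encodings of a toy knowledge base. This is a simplified illustration under stated assumptions, not the authors' algorithm: the knowledge base, the property names, and the single-feature restriction (one feature must account for all observations) are all hypothetical.

```python
# Hypothetical prior knowledge: each feature and the properties it can explain.
KNOWLEDGE_BASE = {
    "blizzard": {"freezing", "snowfall", "high wind"},
    "flurry": {"freezing", "snowfall"},
    "rain storm": {"rainfall", "high wind"},
}

PROPERTIES = sorted({p for props in KNOWLEDGE_BASE.values() for p in props})
BIT = {prop: 1 << i for i, prop in enumerate(PROPERTIES)}  # one bit per property


def encode(props):
    """Encode a set of property names as an integer bit vector."""
    mask = 0
    for prop in props:
        mask |= BIT[prop]
    return mask


FEATURE_MASK = {f: encode(props) for f, props in KNOWLEDGE_BASE.items()}


def explain(observed):
    """Features that account for every observed property (single-feature assumption)."""
    obs = encode(observed)
    return sorted(f for f, mask in FEATURE_MASK.items() if obs & mask == obs)


def discriminate(observed):
    """Unobserved properties that would separate the current candidate features."""
    obs = encode(observed)
    candidates = explain(observed)
    useful = []
    for prop in PROPERTIES:
        if obs & BIT[prop]:
            continue  # already observed
        holders = sum(1 for f in candidates if FEATURE_MASK[f] & BIT[prop])
        if 0 < holders < len(candidates):  # expected by some candidates, not all
            useful.append(prop)
    return useful


candidates = explain({"snowfall"})      # ['blizzard', 'flurry']
next_focus = discriminate({"snowfall"})  # ['high wind']
```

Both steps reduce to bitwise AND and comparison on integers, which is one reason such encodings can suit resource-constrained devices.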

Page 14: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Virtues of Our Approach to Semantic Perception

Blends simplicity, effectiveness, and scalability.

• Declarative specification of explanation and discrimination;

• With applications (e.g., to healthcare) that are of contemporary relevance and interdisciplinary;

• Using encodings/algorithms that are significant (asymptotic order of magnitude gain) and necessary (“tractable” due to time/memory reduction for typical problem sizes); and

• Prototyped using extant PCs and mobile devices.

Page 15: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Efficiency Improvement

• Problem size increased from 10’s to 1000’s of nodes

• Time reduced from minutes to milliseconds

• Complexity growth reduced from polynomial (between O(n^3) and O(n^4)) to linear (O(n))

Evaluation on a mobile device

Page 16: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Volume and Velocity

• Lightweight semantics-based Adaptive/Continuous Filtering

E.g., track the evolution of crowd-sourced and verified Wikipedia event pages to rank Twitter hashtags by relevance in a disaster response use case

• Building domain models dynamically
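
A minimal sketch of the filtering idea, assuming the domain model is simply the bag of terms in the current event-page text: each hashtag is scored by how much the tweets carrying it overlap with that vocabulary. The function name `rank_hashtags`, the scoring rule, and the sample texts are illustrative assumptions, not the method used in the reported work.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")
HASHTAG = re.compile(r"#(\w+)")


def rank_hashtags(page_text, tweets):
    """Rank hashtags by mean overlap between their tweets and the page vocabulary."""
    model_terms = set(WORD.findall(page_text.lower()))
    overlaps = defaultdict(list)
    for tweet in tweets:
        lowered = tweet.lower()
        tags = set(HASHTAG.findall(lowered))
        # Score only the non-hashtag words of the tweet.
        words = set(WORD.findall(HASHTAG.sub(" ", lowered)))
        if not tags or not words:
            continue
        overlap = len(words & model_terms) / len(words)
        for tag in tags:
            overlaps[tag].append(overlap)
    scored = [(tag, sum(v) / len(v)) for tag, v in overlaps.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical page snapshot and tweets.
page = "Flooding closed roads and shelters opened downtown"
tweets = [
    "Roads closed by flooding downtown #flood2013",
    "Best pizza ever tonight #foodie",
    "Shelters opened near downtown #flood2013",
]
ranking = rank_hashtags(page, tweets)
```

Calling `rank_hashtags` again with a newer snapshot of the page re-ranks the hashtags as the event description evolves.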

Page 17: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Dynamic Model Creation

Continuous Semantics

Heliopolis is a suburb of Cairo.

Page 18: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Variety

Syntactic and semantic heterogeneity

• in textual and sensor data,

• in (legacy) materials data

• in (long tail) geosciences data

Idea: Semantics-empowered integration

Page 19: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Variety (What?): Materials/Geosciences Use Case

• Structured Data (e.g., relational)

• Semi-structured, Heterogeneous Documents (e.g., Publications and technical specs, which usually include text, numerics, maps and images)

• Tabular data (e.g., ad hoc spreadsheets and complex tables incorporating “irregular” entries)

Page 20: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Variety (How?/Why?): Granularity of Semantics & Applications

• Lightweight semantics: File and document-level annotation to enable discovery and sharing

• Richer semantics: Data-level annotation and extraction for semantic search and summarization

• Fine-grained semantics: Data integration, interoperability and reasoning in Linked Open Data

Cost-benefit trade-off and continuum

Page 21: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Challenges Associated with Typical Spreadsheet/Table

• Meant for human consumption

• Irregular
– Not simple rectangular grid

• Heterogeneous
– All rows not interpreted similarly

• Complex
– Meaning of each row and each column context dependent

• Footnotes modify meaning of entries (esp. in materials and process specifications)

Page 22: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Page 23: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Practical Semi-Automatic Content Extraction

• DESIGN: Develop regular data structures that can be used to formalize tabular information.
– Provide a natural expression of data
– Provide semantics to data, thereby removing potential ambiguities
– Enable automatic translation

• USE: Manual population of regular tables and automatic translation into LOD
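
The automatic translation step can be sketched as a function that turns each row of a regular table into N-Triples lines. The namespace `http://example.org/materials/`, the column names, and the sample row are hypothetical, and the sketch assumes key values and column names are already URI-safe; a real pipeline would map columns to an agreed vocabulary.

```python
BASE = "http://example.org/materials/"  # hypothetical namespace


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(rows, key_column):
    """Emit one triple per non-key cell: <row> <column> "value" ."""
    lines = []
    for row in rows:
        subject = f"<{BASE}{row[key_column]}>"
        for column, value in row.items():
            if column == key_column:
                continue
            predicate = f"<{BASE}prop/{column}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


# Hypothetical manually populated regular table.
table = [{"id": "alloy-1", "tensile_strength_mpa": "950", "note": 'see "A"'}]
triples = rows_to_ntriples(table, key_column="id")
```

Each cell becomes a self-describing statement, so the output can be loaded alongside other Linked Open Data without carrying the original spreadsheet layout.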

Page 24: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Variety (What?) : Sensor Data Use Case

Develop/learn domain models to exploit complementary and corroborative information

• To relate patterns in multimodal data to “situation”

• To integrate machine sensed and human sensed data

Page 25: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Variety: Hybrid KRR

Blending data-driven models with declarative knowledge
– Data-driven: Bottom-up, correlation-based, statistical
– Declarative: Top-down, causal/taxonomical, logical
– Refine structure to better estimate parameters

E.g., Traffic Analytics using PGMs + KBs

Page 26: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Variety (Why?): Hybrid KRR

Data can help compensate for our overconfidence in our own intuitions and reduce the extent to which our desires distort our perceptions.

-- David Brooks of New York Times

However, inferred correlations require clear justification that they are not coincidental, to inspire confidence.

Page 27: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

• Correlations due to common cause or origin

• Coincidental due to data skew or misrepresentation

• Coincidental new discovery

• Strong correlation vs causation

• Anomalous and accidental

• Correlation turning into causation

Correlations vs Causation vs Anomalies

Page 28: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

• Correlations due to common cause or origin
– E.g., Planets: Copernicus > Kepler > Newton > Einstein

• Coincidental due to data skew or misrepresentation
– E.g., Tall policy claims made by politicians!

• Coincidental new discovery
– E.g., Hurricanes and Strawberry Pop-Tarts sales

• Strong correlation vs causation
– E.g., Spicy foods vs Helicobacter pylori: stomach ulcers

• Anomalous and accidental
– E.g., CO2 levels and obesity

• Correlation turning into causation
– E.g., Pavlovian learning: conditional reflex

Correlations vs Causation vs Anomalies
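
The first case, correlation due to a common cause, can be illustrated with a small simulation. Here a hidden variable `z` drives both `x` and `y`, which have no direct link; all names, the noise level, and the seed are assumptions chosen for illustration. Because the simulated coefficients are known to be 1, subtracting `z` is enough to remove its influence; real data would need a fitted model.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate_common_cause(n=5000, seed=0):
    """Return (raw correlation of x and y, correlation after removing z)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]       # hidden common cause
    x = [zi + rng.gauss(0, 0.5) for zi in z]      # x depends only on z
    y = [zi + rng.gauss(0, 0.5) for zi in z]      # y depends only on z
    raw = pearson(x, y)
    adjusted = pearson(
        [xi - zi for xi, zi in zip(x, z)],
        [yi - zi for yi, zi in zip(y, z)],
    )
    return raw, adjusted


raw_corr, adjusted_corr = simulate_common_cause()
```

The raw correlation is strong, yet it largely vanishes once the common cause is accounted for, which is why declarative knowledge about causes helps justify or reject a data-driven correlation.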

Page 29: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Paradoxes: The Seeds of Progress

Correlations vs Causation vs Anomalies

Page 30: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Veracity

Much existing work on trust ontologies, metrics, and models, and on provenance tracking

• Homogeneous data: Statistical techniques

• Heterogeneous data: Semantic models

Open Problem: Develop semantics of trust using expressive frameworks that are both declarative and computational

• To make explicit all aspects that go into trust formation, to inspire confidence in inferences

Page 31: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Veracity

Machine sensing: objective, quantitative, but prone to environmental effects, battery life, …

Human sensing: subjective, qualitative, but prone to bias, perceptual errors, rumors, …

Open problem: Improving trustworthiness by combining machine sensing and human sensing
– E.g., 2002 Überlingen mid-air collision: pilot incorrectly followed the traffic controller’s advice over the electronic TCAS recommendation

Page 32: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

(More on) Value

Learning domain models from “big data” for prediction

E.g., Harnessing Twitter "Big Data" for Automatic Emotion Identification

Idea: Exploit “emotion-hashtagged” tweets as training dataset
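
A minimal sketch of how emotion-hashtagged tweets could be turned into labeled training examples: the hashtag supplies the label and is then removed from the text so a classifier cannot simply memorize it. The hashtag-to-emotion map and the function name `label_tweet` are hypothetical; the actual study's hashtag list and filtering rules are not given in the slides.

```python
import re

# Hypothetical mapping from emotion hashtags to labels.
EMOTION_HASHTAGS = {"#happy": "joy", "#joy": "joy", "#sad": "sadness", "#angry": "anger"}
HASHTAG = re.compile(r"#\w+")


def label_tweet(text):
    """Return (cleaned_text, emotion), or None if the label is absent or ambiguous."""
    tags = [tag.lower() for tag in HASHTAG.findall(text)]
    emotions = {EMOTION_HASHTAGS[tag] for tag in tags if tag in EMOTION_HASHTAGS}
    if len(emotions) != 1:
        return None  # no emotion hashtag, or conflicting ones
    # Drop only the emotion hashtags; keep other hashtags as ordinary tokens.
    cleaned = HASHTAG.sub(
        lambda m: "" if m.group().lower() in EMOTION_HASHTAGS else m.group(), text
    )
    return " ".join(cleaned.split()), emotions.pop()


example = label_tweet("Got the job today! #happy")
```

Tweets with conflicting emotion hashtags are discarded here; that is a design choice of this sketch to keep labels unambiguous.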

Page 33: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

(More on) Value

Discovering gaps and enriching domain models using data

E.g., Data-driven knowledge acquisition method for domain knowledge enrichment in healthcare

Idea: Use associations between diseases, symptoms and medications in EMR documents
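
One simple way to surface candidate gaps, sketched under stated assumptions: count how often a disease and a symptom co-occur in the same record, and report frequent pairs that the existing domain model does not contain. The record layout, the placeholder names (`D1`, `S1`, ...), the function name, and the support threshold are all hypothetical; the slides do not describe the actual method at this level.

```python
from collections import Counter
from itertools import product


def candidate_links(records, known_links, min_support=2):
    """Disease-symptom pairs seen in at least min_support records but absent from the model."""
    counts = Counter()
    for record in records:
        diseases = sorted(set(record["diseases"]))
        symptoms = sorted(set(record["symptoms"]))
        for pair in product(diseases, symptoms):
            counts[pair] += 1
    return sorted(
        pair for pair, n in counts.items()
        if n >= min_support and pair not in known_links
    )


# Placeholder records standing in for annotated EMR documents.
records = [
    {"diseases": ["D1"], "symptoms": ["S1", "S2"]},
    {"diseases": ["D1"], "symptoms": ["S2"]},
    {"diseases": ["D2"], "symptoms": ["S2"]},
]
known = {("D1", "S1")}
gaps = candidate_links(records, known)
```

The output is only a list of candidates; a domain expert would still need to confirm each suggested link before it enters the model.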

Page 34: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Conclusions

• Glimpse of our research organized around the 5 V’s of Big Data

• Discussed role in harnessing Value
– Semantic Perception (Volume)
– Continuum of Semantic models to manage Heterogeneity (Variety)
– Hybrid KRR: Probabilistic + Logical (Variety)
– Continuous Semantics (Velocity)
– Trust Models (Veracity)

Page 35: Semantics-empowered Approaches to Big Data Processing for Physical-Cyber-Social Applications

Thank you, and please visit us at

http://knoesis.org/

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing

Wright State University, Dayton, Ohio, USA

Special Thanks to: Pramod Anantharam and Cory Henson

