T. K. Prasad (Krishnaprasad Thirunarayan)
Professor of Computer Science and Engineering
Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, OH-45435
Big Data and Smart Healthcare
Honors Institute Symposium on Visions of the Future
Big Data Processing and Smart Healthcare
Krishnaprasad Thirunarayan (T. K. Prasad)
Prasad 3
Outline
• Extent and Economics of Healthcare Problem
• Nature of Health-related Big Data
• Cognitive Computing Goals
• Five V’s of Big Data Research
• Our Research
  – Semantic Perception for Scalability
  – Lightweight Semantics to Manage Heterogeneity
  – Hybrid Knowledge Representation and Reasoning
• Anomaly, Correlation, Causation
03/20/2014
Acute Decompensated Heart Failure (ADHF) Statistics
• Heart failure affects > 5 million people in the US.
• > 550,000 new cases are diagnosed each year.
• The estimated cost of heart failure in the US for 2008 was $34.8 billion.
• Approximately 25% of patients are re-hospitalized within 30 days of discharge.
• Approximately 50% of patients are re-hospitalized within 6 months of discharge.
Asthma Statistics
• Asthma affects > 25 million people in the US.
• > 7 million are children.
• The current cost of reactive care is > $56 billion.
• Among children under the age of 15, asthma is the third leading cause of hospitalization, with 800,000 emergency room visits.
Obesity Statistics
• The number of severely obese (BMI ≥ 40) patients quadrupled between 1986 and 2000, from one in 200 to one in 50.
• Obesity-related medical treatment costs > $150 billion a year.
• Hospitalizations of children and youths with obesity doubled from 1999 to 2005.
Parkinson’s Disease (PD) Statistics
• In 2010, 630,000 people in the US had a diagnosis of PD.
• The number of people with PD is projected to double by 2040.
• Medical costs alone for people with PD total $8.1 billion.
The Patient of the Future (MIT Technology Review, 2012)
http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/
Healthcare Related Big Data for Potential Exploitation: Assorted Examples
• Semi-structured: The electronic medical records (EMR) market was valued at $20 billion in 2012.
• User-generated content / informal text: Social media posts / microblogs discussing depression, drug abuse and drug liberalization policies, side effects, etc.
• Sensor data: The M. J. Fox Foundation Parkinson’s disease challenge dataset, which tracked 16 people (9 patients + 7 controls) with 7 mobile phone sensors over 8 weeks, is 12 GB.
• Other applications: The healthcare industry loses roughly $250 billion per year to fraud.
Structured vs Unstructured Data
Patient    Disorder                 ICD-9 Code
Patient1   Hypertension             401
Patient2   Atrial fibrillation      427.31
Patient1   Pulmonary hypertension   416
Patient3   Edema                    782.3
Patient4   Hyperthyroidism          242.9
Coronary artery disease, status post four-vessel coronary artery bypass graft surgery on , by Dr. X with a left internal mammary artery to the left anterior descending artery, sequential vein graft to the ramus and first diagonal, and a vein graft to the posterior descending artery. He had normal left ventricular function. He is having some symptoms that are unclear if they are angina or not. I am therefore going to get him scheduled for an exercise Cardiolite stress test.
VS
[Figure: Patient Data Distribution across structured data and unstructured data. Nature of Processing: Search, Mining, Decision Support, Knowledge Discovery, Prediction; NLP + Semantics.]
An Example
He is off both Diovan and Lotrel. I am unsure if it is due to underlying renal insufficiency. He has actually been on atenolol alone for his hypertension.
[Figure: Raw Text → Concepts → Knowledge → Inference
Concepts: diovan, lotrel, renal insufficiency, atenolol, hypertension
Knowledge (relations "is a" and "is used to treat"; node types drug and disorder):
  Drugs: diovan / valtuna (valsartan), an antihypertensive agent; atenolol (tenormin, atenix)
  Disorders: kidney failure / renal insufficiency, a kidney disease; hypertension, a blood pressure disorder; related: systolic hypertension, pulmonary hypertension
Inference: Patient taking diovan for hypertension; Patient has kidney disease; Patient is on antihypertensive drugs]
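The Raw Text → Concepts → Knowledge → Inference steps above can be sketched as a toy pipeline. This is a minimal illustration under stated assumptions: the lexicon (LEXICON) and the "is a" graph (IS_A) are hypothetical, hand-built from nodes in the figure, and concept extraction is plain dictionary lookup with no negation handling (so "He is off both Diovan and Lotrel" is not understood as a discontinuation). A real system would use a medical ontology and a full NLP pipeline.

```python
import re

# Hypothetical lexicon: surface string -> canonical concept (not a real ontology)
LEXICON = {
    "diovan": "valsartan",
    "valsartan": "valsartan",
    "atenolol": "atenolol",
    "renal insufficiency": "renal insufficiency",
    "kidney failure": "renal insufficiency",
    "hypertension": "hypertension",
}

# Hypothetical background knowledge: concept -> parents via "is a"
IS_A = {
    "valsartan": {"antihypertensive agent"},
    "atenolol": {"antihypertensive agent"},
    "antihypertensive agent": {"drug"},
    "renal insufficiency": {"kidney disease"},
    "kidney disease": {"disorder"},
    "hypertension": {"blood pressure disorder"},
    "blood pressure disorder": {"disorder"},
}


def extract_concepts(text):
    """Raw Text -> Concepts: whole-word dictionary lookup on lower-cased text."""
    lowered = text.lower()
    return {
        concept
        for surface, concept in LEXICON.items()
        if re.search(r"\b" + re.escape(surface) + r"\b", lowered)
    }


def ancestors(concept):
    """Knowledge: every concept reachable through "is a" edges (transitive closure)."""
    seen, stack = set(), [concept]
    while stack:
        for parent in IS_A.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer(concepts):
    """Inference: each mentioned concept plus everything it "is a"."""
    facts = set()
    for c in concepts:
        facts |= {c} | ancestors(c)
    return facts


note = ("He is off both Diovan and Lotrel. I am unsure if it is due to underlying "
        "renal insufficiency. He has actually been on atenolol alone for his hypertension.")
concepts = extract_concepts(note)
facts = infer(concepts)
print(sorted(concepts))
```

Because "renal insufficiency" is linked to "kidney disease", and atenolol to "antihypertensive agent", the sketch reaches conclusions similar to the figure's "Patient has kidney disease" and "Patient is on antihypertensive drugs".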
Purpose of Big Data Analytics Vetted by Domain Experts
Data can help compensate for our overconfidence in our own intuitions and reduce the extent to which our desires distort our perceptions.
-- David Brooks, The New York Times
However, to inspire confidence, inferred correlations must be clearly justified as not coincidental.
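One common way to check that an observed correlation is not merely coincidental is a permutation test: shuffle one variable many times and measure how often a correlation at least as strong arises by chance. The sketch below is a generic statistical illustration, not a method from the talk, and the data are made up. A small p-value argues against coincidence, but it does not by itself establish causation.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def permutation_p_value(xs, ys, n_perm=10_000, seed=0):
    """Two-sided p-value: share of shuffles with |r| >= the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real pairing between xs and ys
        if abs(pearson(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)


# Made-up data: a strong linear trend with small deterministic noise
xs = list(range(20))
ys = [2 * x + ((7 * x) % 5 - 2) for x in xs]
p = permutation_p_value(xs, ys, n_perm=2000)
```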
Cognitive Computing Systems
• Leverage Big Data, together with human experts, to enable better decisions.
  – Process natural language and unstructured data.
  – Use Artificial Intelligence (e.g., Machine Learning algorithms) to sense, infer, predict, abduce, and, in some ways, think.
Check engine light analogy
Research Challenges: 5 V’s of Big Data
Volume, Velocity, Variety, Veracity
Value
Big Data => Smart Data
Volume: (1) Semantic Perception
Semantic Perception: Volume => Value
Distill voluminous machine-sensed data into human-comprehensible nuggets necessary for decision-making, using background knowledge.
Parkinson’s Disease Use Case
• Data from machine sensors
  – accelerometer, GPS, compass, microphone, etc.
• Human-perceived features
  – tremors, walking style, balance, slurred speech, etc.
• Machine perception
  – Using domain models (to be created) to diagnose and monitor disease progression
• Ultimately, recommend options to control chronic conditions …
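To make "machine-sensed data → human-perceived feature" concrete, the sketch below turns a window of accelerometer magnitudes into a "possible tremor" flag by finding the dominant frequency with a plain discrete Fourier transform. The 50 Hz sampling rate, the 4 to 6 Hz band (a commonly cited range for parkinsonian rest tremor), and the power threshold are illustrative assumptions, not parameters from the talk or a validated clinical detector.

```python
import math


def dominant_frequency(samples, rate_hz):
    """Return (frequency_hz, power) of the strongest non-DC DFT bin."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove gravity / DC offset
    best_f, best_p = 0.0, 0.0
    for k in range(1, n // 2 + 1):
        re_part = sum(c * math.cos(2 * math.pi * k * t / n) for t, c in enumerate(centered))
        im_part = -sum(c * math.sin(2 * math.pi * k * t / n) for t, c in enumerate(centered))
        power = (re_part ** 2 + im_part ** 2) / n
        if power > best_p:
            best_f, best_p = k * rate_hz / n, power
    return best_f, best_p


def tremor_feature(samples, rate_hz=50.0, band=(4.0, 6.0), min_power=1.0):
    """Hypothetical rule: flag 'possible tremor' if the dominant oscillation is in band."""
    f, p = dominant_frequency(samples, rate_hz)
    return band[0] <= f <= band[1] and p >= min_power


RATE = 50.0  # assumed sampling rate (Hz); 100 samples = a 2-second window
tremor = [9.8 + math.sin(2 * math.pi * 5 * t / RATE) for t in range(100)]
slow_sway = [9.8 + math.sin(2 * math.pi * 1 * t / RATE) for t in range(100)]
```

A 5 Hz oscillation is flagged, while a 1 Hz sway is not. In practice an FFT library would replace the O(n²) loop.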
Heart Failure Use Case
• Machine-sensed data
  – Weight change, heart rate, blood pressure, oxygen level, etc.
• Human-perceived features
  – Risk level for hospital readmission of a CHF/ADHF patient
• Machine perception
  – Using domain models (to be created) to monitor the heart condition of a cardiac patient after hospital discharge
• Ultimately, recommend treatments to reduce preventable hospital readmissions …
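A minimal sketch of how daily weight readings could be abstracted into a coarse readmission-risk label. The thresholds (a gain of more than 2 lb in a day or 5 lb in a week) echo a rule of thumb often given to heart failure patients, but here they are hypothetical parameters, and the alert-to-risk mapping is invented for illustration, not a validated model from the talk.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    day: int          # days since discharge
    weight_lb: float


def weight_gain_alerts(readings, daily_limit=2.0, weekly_limit=5.0):
    """Return alerts for 1-day or 7-day weight gains above the (hypothetical) limits."""
    by_day = {r.day: r.weight_lb for r in readings}
    alerts = []
    for day in sorted(by_day):
        w = by_day[day]
        if day - 1 in by_day and w - by_day[day - 1] > daily_limit:
            alerts.append(f"day {day}: +{w - by_day[day - 1]:.1f} lb in 1 day")
        if day - 7 in by_day and w - by_day[day - 7] > weekly_limit:
            alerts.append(f"day {day}: +{w - by_day[day - 7]:.1f} lb in 7 days")
    return alerts


def risk_level(alerts):
    """Hypothetical mapping from number of alerts to a human-level risk label."""
    if not alerts:
        return "low"
    return "high" if len(alerts) > 1 else "elevated"


weights = [180.0, 180.5, 181.0, 181.0, 181.5, 182.0, 182.5, 185.5]
readings = [Reading(day, w) for day, w in enumerate(weights)]
alerts = weight_gain_alerts(readings)
```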
Asthma Use Case
• Data from machine sensors
  – Environmental sensors, physiological sensors, etc.
• Human-perceived features
  – Asthma severity gleaned from frequency of asthma attacks, wheezing, coughing, sleeplessness, etc.
• Machine perception
  – Using domain models (to be created) to monitor asthma patients and their surroundings
• Ultimately, recommend prevention, treatment, and control options … [EVIDENCE-BASED APPROACH]
Volume: (2) Exploiting Embarrassing Parallelism
• Cloud Computing
  – Hardware: Networked commodity (stock) PCs
  – Middleware: Replicated storage and restarted computations for fault tolerance
    • E.g., Hadoop Distributed File System, Google File System
  – Application Programming: Models / languages for distributed computation
    • E.g., MapReduce, Pig, Hive
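The MapReduce programming model can be illustrated with the classic word-count example, simulated here in a single Python process. This is only a sketch of the model's map, shuffle, and reduce phases; it does not use Hadoop or any distributed runtime, and the sample records are made up.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit (term, 1) for every token in one record."""
    return [(token.lower(), 1) for token in record.split()]


def shuffle(pairs):
    """Shuffle: group values by key, as the framework would do across nodes."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


def map_reduce(records):
    # Each map call is independent (embarrassingly parallel); here they run sequentially.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


records = ["wheezing cough", "cough at night", "Wheezing after exercise"]
counts = map_reduce(records)
```

Because map calls share no state and each reduce call sees only one key's values, a framework can spread both phases across many machines and simply rerun failed tasks.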
Volume with a Twist
Resource-constrained reasoning on mobile devices
Goal: Boolean encodings to ensure feasibility, efficiency, and economy
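One way to read "Boolean encodings" is to store, for each feature (e.g., a condition), the observable properties it can cause as a bit vector, so that abductive explanation reduces to bitwise AND and comparison, operations that are cheap on a phone. The sketch below is a hypothetical illustration in that spirit; the property and feature names, and the "covers every observation" criterion, are assumptions rather than the exact encoding used in the thesis.

```python
# Hypothetical observable properties, each assigned one bit position
PROPERTIES = ["high_heart_rate", "low_oxygen", "weight_gain", "wheezing"]
BIT = {p: 1 << i for i, p in enumerate(PROPERTIES)}


def encode(props):
    """Pack a set of property names into an int bit vector."""
    mask = 0
    for p in props:
        mask |= BIT[p]
    return mask


# Hypothetical background knowledge: feature -> bit vector of properties it explains
FEATURES = {
    "fluid_overload": encode({"weight_gain", "low_oxygen", "high_heart_rate"}),
    "asthma_attack": encode({"wheezing", "low_oxygen", "high_heart_rate"}),
    "exercise": encode({"high_heart_rate"}),
}


def explain(observed_props):
    """Explanation: features whose vector covers every observed property bit."""
    obs = encode(observed_props)
    return sorted(f for f, mask in FEATURES.items() if (mask & obs) == obs)
```

Each check is a single AND plus an equality test per feature, instead of set or graph reasoning.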
Cory Henson’s Thesis Statement
Machine perception can be formalized using semantic web technologies to derive abstractions from sensor data using background knowledge on the Web, and efficiently executed on resource-constrained devices.
[Figure: Perception Cycle* that exploits background knowledge / domain models (Prior Knowledge)
Observe Property → (1) Explanation → Perceive Feature → (2) Discrimination → Observe Property
(1) Explanation: abstracting raw data for human comprehension
(2) Discrimination: focus generation for disambiguation and action (incl. human in the loop)]
* Based on Neisser’s cognitive model of perception.
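In the perception cycle, the discrimination step chooses which property to observe next so that the remaining candidate explanations can be told apart. A minimal sketch, assuming hypothetical feature-to-property sets and a simple "most even split" heuristic; the heuristic is a stand-in choice, not necessarily the criterion used in the thesis.

```python
# Hypothetical background knowledge: feature -> observable properties it causes
KNOWLEDGE = {
    "fluid_overload": {"weight_gain", "low_oxygen", "high_heart_rate"},
    "asthma_attack": {"wheezing", "low_oxygen", "high_heart_rate"},
    "exercise": {"high_heart_rate"},
}


def explanations(observed):
    """Explanation: features that account for every observed property."""
    return {f for f, props in KNOWLEDGE.items() if observed <= props}


def discriminate(candidates, observed):
    """Discrimination: the unobserved property that splits candidates most evenly."""
    unobserved = set().union(*(KNOWLEDGE[f] for f in candidates)) - observed

    def imbalance(prop):
        with_prop = sum(prop in KNOWLEDGE[f] for f in candidates)
        return abs(2 * with_prop - len(candidates))

    # Sorting first makes tie-breaking deterministic (alphabetical)
    return min(sorted(unobserved), key=imbalance, default=None)


observed = {"high_heart_rate"}
candidates = explanations(observed)
next_prop = discriminate(candidates, observed)
```

Observing the chosen property (or asking a human for it) and re-running explanation closes the loop.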
Efficiency Improvement
• Problem size increased from 10’s to 1000’s of nodes
• Time reduced from minutes to milliseconds
• Complexity reduced from polynomial, between O(n^3) and O(n^4), to linear, O(n)
Evaluation on a mobile device
kHealth: Health Signal Processing Architecture
[Figure: kHealth health signal processing architecture. Personal-level signals, public-level signals, population-level signals, and events from social streams go through data acquisition & aggregation and analysis against a risk model built from domain knowledge supplied by domain experts, yielding personalized actionable information, e.g., "Take medication before going to work", "Avoid going out in the evening due to high pollen levels", "Contact doctor".]
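A toy version of the kHealth flow: combine personal, public, and population-level signals through a rule-based risk model and emit personalized actionable messages like those in the figure. The signal fields, weights, and thresholds are hypothetical placeholders standing in for domain-expert knowledge; they are not kHealth's actual risk model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Signals:
    rescue_inhaler_uses_today: int      # personal level (hypothetical field)
    pollen_level: str                   # public level: "low" | "medium" | "high"
    evening_pollen_peak: bool           # public level forecast
    local_asthma_reports_rising: bool   # population level / social streams


def risk_score(s: Signals) -> int:
    """Hypothetical additive risk model; weights stand in for expert knowledge."""
    score = 2 if s.rescue_inhaler_uses_today >= 3 else 0
    score += {"low": 0, "medium": 1, "high": 2}[s.pollen_level]
    score += 1 if s.local_asthma_reports_rising else 0
    return score


def recommendations(s: Signals) -> List[str]:
    """Turn the score and individual signals into actionable messages."""
    score = risk_score(s)
    actions = []
    if score >= 2:
        actions.append("Take medication before going to work")
    if s.pollen_level == "high" and s.evening_pollen_peak:
        actions.append("Avoid going out in the evening due to high pollen levels")
    if score >= 4:
        actions.append("Contact doctor")
    return actions


pollen_day = Signals(1, "high", True, False)
bad_day = Signals(4, "high", True, True)
calm_day = Signals(0, "low", False, False)
```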
kHealth Demo
• kHealth: http://www.youtube.com/watch?v=btnRi64hJp4
Variety
Syntactic and semantic heterogeneity
• in textual and sensor data,
• in social media and Web forum data, and
• in Electronic Medical Records.
Idea: Semantics-empowered integration
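Semantics-empowered integration can be illustrated by mapping heterogeneous source vocabularies onto shared concept identifiers before merging. The sketch assumes two hypothetical sources (an EMR problem list and informal forum terms) and a small hand-made mapping onto the ICD-9 codes used in the structured-data example; a real system would rely on standard terminologies and ontologies rather than a dictionary.

```python
from collections import defaultdict

# Hypothetical mapping from source-specific terms to shared concept IDs (ICD-9 codes)
TERM_TO_CONCEPT = {
    "hypertension": "401",
    "high blood pressure": "401",
    "htn": "401",
    "atrial fibrillation": "427.31",
    "afib": "427.31",
    "edema": "782.3",
    "swollen ankles": "782.3",
}


def normalize(term):
    """Map a raw term to a shared concept ID, or None if unmapped."""
    return TERM_TO_CONCEPT.get(term.strip().lower())


def integrate(*sources):
    """Merge (source_name, terms) pairs into concept ID -> set of source names."""
    merged = defaultdict(set)
    for source_name, terms in sources:
        for term in terms:
            concept = normalize(term)
            if concept is not None:  # unmapped terms are skipped in this sketch
                merged[concept].add(source_name)
    return dict(merged)


emr = ("emr", ["Hypertension", "Atrial fibrillation"])
forum = ("forum", ["high blood pressure", "swollen ankles", "feeling tired"])
combined = integrate(emr, forum)
```

"Hypertension" in the EMR and "high blood pressure" in the forum end up under the same concept, which is what makes cross-source queries possible.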
Variety (How?): (1) Granularity of Semantics & Applications
• Lightweight semantics: File and document-level annotation to enable discovery and sharing
• Richer semantics: Data-level annotation and extraction for semantic search and summarization
• Fine-grained semantics: Data integration, interoperability and reasoning in Linked Open Data
Cost-benefit trade-off and continuum
Variety (How?): (2) Hybrid KRR (Knowledge Representation and Reasoning)
Blending data-driven models with declarative knowledge
– Data-gleaned models: Bottom-up, correlation-based, statistical
– Expert-given KBs: Top-down, causal/taxonomical, logical
– Refine structure to better estimate parameters
E.g., Medical Data Analytics using probabilistic graphical models (PGMs) + KBs
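A minimal sketch of "expert-given structure, data-estimated parameters": a hypothetical knowledge base declares which symptoms each disease can cause, and only those conditional probabilities are estimated from (made-up) patient records, with Laplace smoothing. This simplified naive-Bayes-style model is chosen for illustration; it is not the specific PGM + KB method referenced in the talk.

```python
from collections import Counter

# Hypothetical expert KB: disease -> symptoms it can cause (defines the model structure)
KB = {
    "heart_failure": ["edema", "dyspnea"],
    "asthma": ["wheezing", "dyspnea"],
}


def estimate_cpts(records, alpha=1.0):
    """Estimate P(symptom present | disease) only for KB-sanctioned edges.

    records: list of (disease, set_of_symptoms) pairs.
    alpha: Laplace smoothing pseudo-count.
    """
    disease_counts = Counter(d for d, _ in records)
    cpts = {}
    for disease, symptoms in KB.items():
        n = disease_counts[disease]
        for s in symptoms:
            k = sum(1 for d, obs in records if d == disease and s in obs)
            cpts[(disease, s)] = (k + alpha) / (n + 2 * alpha)
    return cpts


records = [
    ("heart_failure", {"edema", "dyspnea"}),
    ("heart_failure", {"edema"}),
    ("heart_failure", {"edema", "dyspnea", "wheezing"}),
    ("asthma", {"wheezing"}),
]
cpts = estimate_cpts(records)
```

The co-occurrence of wheezing with heart failure in one record is ignored because the KB does not sanction that edge; with so little data, the structure keeps the estimates focused on plausible dependencies.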
Veracity
Scalable and Agile Big Data Analytics cannot deliver value unless we have confidence and trust in our data.
Open Problem: Develop expressive frameworks for trust to make explicit all aspects that go into trust formation and inferences.
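As one concrete, well-known example of making a trust computation explicit (a swapped-in technique, not the framework this open problem calls for), the beta reputation model scores a source by the expected value of a Beta distribution over its past confirmed and contradicted reports, (r + 1) / (r + s + 2). The sensor readings and counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SourceRecord:
    positive: int = 0   # r: reports later confirmed
    negative: int = 0   # s: reports later contradicted

    def update(self, confirmed: bool) -> None:
        if confirmed:
            self.positive += 1
        else:
            self.negative += 1

    def trust(self) -> float:
        """Expected value of Beta(r + 1, s + 1), i.e. (r + 1) / (r + s + 2)."""
        return (self.positive + 1) / (self.positive + self.negative + 2)


def weighted_reading(readings):
    """Combine (value, SourceRecord) pairs, weighting each value by its source's trust."""
    total_weight = sum(src.trust() for _, src in readings)
    return sum(value * src.trust() for value, src in readings) / total_weight


reliable = SourceRecord(positive=8, negative=0)
flaky = SourceRecord(positive=1, negative=7)
combined = weighted_reading([(98.0, reliable), (80.0, flaky)])
```

A source with no history starts at 0.5, and every confirmed or contradicted report moves its score, so the basis for each trust value stays inspectable.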
Veracity: Confession of sorts!
Trust is well known, but not well understood.
The utility of a notion testifies not to its clarity but rather to the philosophical importance of clarifying it.
-- Nelson Goodman (Fact, Fiction and Forecast, 1955)
(More on) Value
Discovering gaps and enriching domain models using data
E.g., Semantics-Driven Approach for Knowledge Acquisition from EMRs
Idea: Use known associations between diseases, symptoms, and medications, implicit in real-world scenarios (EMRs), to acquire unknown associations and bridge the gaps in the knowledge base.
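A minimal sketch of the gap-finding idea: count how often disease and symptom concepts co-occur across EMR notes, and propose frequently co-occurring pairs that the knowledge base does not yet link as candidate associations for expert review. The notes, KB contents, and support threshold are hypothetical, and this co-occurrence heuristic is a simplification, not the semantics-driven method itself.

```python
from collections import Counter
from itertools import product

# Hypothetical existing KB links (disease, symptom) and concept vocabularies
KNOWN = {("heart_failure", "edema"), ("heart_failure", "dyspnea")}
DISEASES = {"heart_failure", "asthma"}
SYMPTOMS = {"edema", "dyspnea", "fatigue", "wheezing"}


def candidate_associations(notes, min_support=2):
    """notes: list of concept sets already extracted from EMR text."""
    pair_counts = Counter()
    for concepts in notes:
        for d, s in product(concepts & DISEASES, concepts & SYMPTOMS):
            pair_counts[(d, s)] += 1
    return sorted(pair for pair, n in pair_counts.items()
                  if n >= min_support and pair not in KNOWN)


notes = [
    {"heart_failure", "edema", "fatigue"},
    {"heart_failure", "fatigue"},
    {"asthma", "wheezing"},
    {"heart_failure", "dyspnea"},
]
```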
(More on) Value
Discovering drug-drug interactions by analyzing search query logs
• E.g., the antidepressant paroxetine and the cholesterol-lowering drug pravastatin were shown to interact, causing high blood sugar, because searches for both drugs were correlated with searches for "hyperglycemia", "high blood sugar", or "blurry vision".
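The search-log idea can be sketched as a comparison of rates: among users who searched for both drugs, how often do they also search for the symptom terms, compared with users who searched for only one of them? The log format, the users, and the simple rate comparison below are hypothetical simplifications; a real analysis would also need to account for timing and confounding.

```python
# Symptom terms from the example above
SYMPTOM_TERMS = {"hyperglycemia", "high blood sugar", "blurry vision"}


def symptom_rate(users):
    """Fraction of users (each a set of queries) with at least one symptom query."""
    if not users:
        return 0.0
    return sum(1 for q in users if q & SYMPTOM_TERMS) / len(users)


def interaction_signal(logs, drug_a, drug_b):
    """Compare symptom-search rates for users who searched both drugs vs exactly one.

    logs: dict user_id -> set of lower-cased queries (hypothetical format).
    Returns (rate_both, rate_single).
    """
    both, single = [], []
    for queries in logs.values():
        has_a, has_b = drug_a in queries, drug_b in queries
        if has_a and has_b:
            both.append(queries)
        elif has_a or has_b:
            single.append(queries)
    return symptom_rate(both), symptom_rate(single)


logs = {
    "u1": {"paroxetine", "pravastatin", "high blood sugar"},
    "u2": {"paroxetine", "pravastatin", "blurry vision"},
    "u3": {"paroxetine", "pravastatin"},
    "u4": {"paroxetine"},
    "u5": {"pravastatin", "hyperglycemia"},
    "u6": {"pravastatin"},
    "u7": {"paroxetine", "insomnia"},
    "u8": {"weather"},
}
rate_both, rate_single = interaction_signal(logs, "paroxetine", "pravastatin")
```

A markedly higher rate among users of both drugs is a signal worth investigating, not proof of an interaction.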
Conclusions
• Glimpse of our research organized around the 5 V’s of Big Data
• Discussed role in harnessing Value
  – Semantic Perception (Volume)
  – Continuum of Semantic models to manage Heterogeneity (Variety)
  – Hybrid KRR: Probabilistic + Logical (Variety)
  – Trust Models (Veracity)
Thank you, and please visit us at http://knoesis.org/
Department of Computer Science and Engineering
Wright State University, Dayton, Ohio, USA
Kno.e.sis: Ohio Center of Excellence in Knowledge-enabled Computing
Special Thanks to: Pramod Anantharam, Sujan Perera, Dr. Cory Henson, Professor Amit Sheth