T. K. Prasad (Krishnaprasad Thirunarayan), Professor of Computer Science and Engineering, Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing, Wright State University, Dayton, OH-45435. Big Data and Smart Healthcare, Honors Institute Symposium on Visions of the Future

Big data healthcare

Aug 23, 2014


Healthcare

With the rapid proliferation of mobile phones, social media, and sensors, it is critical to collect the big data they generate and convert it into actionable information that is relevant for decision making. In this session, we explore challenges and approaches for synthesizing relevant background knowledge and inferences that can enable smart healthcare and ultimately benefit the community at large.
Transcript
Page 1: Big data healthcare


T. K. Prasad (Krishnaprasad Thirunarayan)
Professor of Computer Science and Engineering

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, OH-45435

Big Data and Smart Healthcare
Honors Institute Symposium on Visions of the Future

Page 2: Big data healthcare

Big Data Processing and Smart Healthcare
Krishnaprasad Thirunarayan (T. K. Prasad)

Kno.e.sis – Ohio Center of Excellence in Knowledge-enabled Computing
Wright State University, Dayton, OH-45435

Page 3: Big data healthcare

Prasad 3

Outline

• Extent and Economics of Healthcare Problem
• Nature of Health-related Big Data
• Cognitive Computing Goals
• Five V’s of Big Data Research
• Our Research
  – Semantic Perception for Scalability
  – Lightweight Semantics to Manage Heterogeneity
  – Hybrid Knowledge Representation and Reasoning
• Anomaly, Correlation, Causation

03/20/2014

Page 4: Big data healthcare

Prasad 4

Acute Decompensated Heart Failure (ADHF) Statistics

• Heart failure affects > 5 million people in the US.

• > 550,000 new cases are diagnosed each year.

• The estimated cost of heart failure in the US for 2008 was $34.8 billion.

• Approximately 25% of patients are re-hospitalized within 30 days of discharge.

• Approximately 50% of patients are re-hospitalized within 6 months of discharge.


Page 5: Big data healthcare

Prasad 5

Asthma Statistics

• Asthma affects > 25 million people in the US.

• > 7 million of them are children.

• The current cost of reactive care is > $56 billion.

• Asthma is the third leading cause of hospitalization with 800,000 emergency room visits among children under the age of 15. 


Page 6: Big data healthcare

Prasad 6

Obesity Statistics

• The proportion of severely obese (BMI ≥ 40) patients quadrupled between 1986 and 2000, from one in 200 to one in 50.

• Obesity-related medical treatment costs > $150 billion a year.

• Hospitalizations of children and youths with obesity doubled from 1999 to 2005.

Page 7: Big data healthcare

Prasad 7

Parkinson’s Disease (PD) Statistics


• In 2010, 630,000 people in the US had a diagnosis of PD.

• The number of people with PD will double by 2040.

• Medical costs alone for people with PD total $8.1 billion.

Page 8: Big data healthcare

The Patient of the Future
MIT Technology Review, 2012

http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/

Page 9: Big data healthcare

Prasad 9

Healthcare Related Big Data for Potential Exploitation: Assorted Examples

• Semi-structured: The electronic medical records (EMR) market was valued at $20 billion in 2012.

• User-generated content / informal text: Social media posts / microblogs discussing depression, drug abuse/liberalization policies, side-effects, etc.

• Sensor data: The M. J. Fox Foundation Parkinson’s disease challenge dataset, which tracked 16 people (9 patients + 7 controls) with 7 mobile phone sensors over 8 weeks, is 12 GB.

• Other Applications: The healthcare industry loses roughly $250 billion per year to fraud.


Page 10: Big data healthcare

Structured vs Unstructured Data

Patient     Disorders                 ICD-9 Code
Patient1    Hypertension              401
Patient2    Atrial fibrillation       427.31
Patient1    Pulmonary hypertension    416
Patient3    Edema                     782.3
Patient4    Hyperthyroidism           242.9

VS

Coronary artery disease, status post four-vessel coronary artery bypass graft surgery on , by Dr. X with a left internal mammary artery to the left anterior descending artery, sequential vein graft to the ramus and first diagonal, and a vein graft to the posterior descending artery. He had normal left ventricular function. He is having some symptoms that are unclear if they are angina or not. I am therefore going to get him scheduled for an exercise Cardiolite stress test.

Page 11: Big data healthcare

Patient Data Distribution

Structured data

Unstructured data

Page 12: Big data healthcare

Nature of Processing

NLP + Semantics

Search, Mining, Decision Support, Knowledge Discovery, Prediction

Page 13: Big data healthcare

An Example

Raw Text:
He is off both Diovan and Lotrel. I am unsure if it is due to underlying renal insufficiency. He has actually been on atenolol alone for his hypertension.

Concepts:
diovan, lotrel, renal insufficiency, atenolol, hypertension

Knowledge (legend: edges "is a" and "is used to treat"; node types "drug" and "disorder"):
– Drugs: diovan, valturna, valsartan, tenormin, atenix, atenolol, antihypertensive agent
– Disorders: kidney failure, renal insufficiency, kidney disease, disorder, blood pressure disorder, hypertension, systolic hypertension, pulmonary hypertension

Inference:
– Patient taking diovan for hypertension
– Patient has kidney disease
– Patient is on antihypertensive drugs
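The raw text, concepts, knowledge, inference progression on this slide can be sketched in a few lines of Python. This is only an illustration: the "is a" links in `IS_A` are one plausible reading of the slide's diagram, and the function names (`extract_concepts`, `ancestors`, `infer`) are hypothetical. Matching is plain substring lookup, so negation ("off both Diovan and Lotrel") is deliberately not handled.

```python
# Toy "is a" links: an assumed reading of the slide's diagram, not a real terminology.
IS_A = {
    "diovan": "valsartan",
    "valsartan": "antihypertensive agent",
    "atenolol": "antihypertensive agent",
    "antihypertensive agent": "drug",
    "renal insufficiency": "kidney disease",
    "kidney disease": "disorder",
    "hypertension": "blood pressure disorder",
    "blood pressure disorder": "disorder",
}

# Concepts listed on the slide.
VOCABULARY = ["diovan", "lotrel", "renal insufficiency", "atenolol", "hypertension"]

NOTE = ("He is off both Diovan and Lotrel. I am unsure if it is due to "
        "underlying renal insufficiency. He has actually been on atenolol "
        "alone for his hypertension.")


def extract_concepts(text, vocabulary=VOCABULARY):
    """Raw text -> concepts: naive, case-insensitive substring matching."""
    lowered = text.lower()
    return [term for term in vocabulary if term in lowered]


def ancestors(concept):
    """Concept -> knowledge: follow "is a" links up to the root."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def is_a(concept, category):
    return concept == category or category in ancestors(concept)


def infer(concepts):
    """Knowledge -> inference: statements supported by at least one concept."""
    facts = []
    if any(is_a(c, "kidney disease") for c in concepts):
        facts.append("Patient has kidney disease")
    if any(is_a(c, "antihypertensive agent") for c in concepts):
        facts.append("Patient is on antihypertensive drugs")
    return facts


if __name__ == "__main__":
    found = extract_concepts(NOTE)
    print(found)
    print(infer(found))
```

A production pipeline would replace the dictionary with a curated terminology and add negation and context detection before drawing any inference.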

Page 14: Big data healthcare

Purpose of Big Data Analytics Vetted by Domain Experts

Data can help compensate for our overconfidence in our own intuitions and reduce the extent to which our desires distort our perceptions.

-- David Brooks of New York Times

However, inferred correlations require clear justification that they are not coincidental, to inspire confidence.

Prasad 14

Page 15: Big data healthcare

Prasad 15

Cognitive Computing Systems

• Leverage Big Data using human experts to enable better decisions.
  – Process natural language and unstructured data.
  – Use of Artificial Intelligence (e.g., Machine Learning algorithms) to sense, infer, predict, abduce, and, in some ways, think.

Check engine light analogy

Page 16: Big data healthcare

Prasad 16

Research Challenges: 5 V’s of Big Data

• Volume
• Velocity
• Variety
• Veracity
• Value

Big Data => Smart Data

Page 17: Big data healthcare

Prasad 17

Volume : (1) Semantic Perception

Semantic Perception : Volume => Value

Distill voluminous machine-sensed data into human comprehensible nuggets necessary for decision-making using background knowledge


Page 18: Big data healthcare

Prasad 20

Parkinson’s Disease Use Case

• Data from machine sensors
  – accelerometer, GPS, compass, microphone, etc.

• Human-perceived features
  – tremors, walking style, balance, slurred speech, etc.

• Machine perception
  – Using domain models (to be created) to diagnose and monitor disease progression

• Ultimately, recommend options to control chronic conditions …


Page 19: Big data healthcare

Prasad 22

Heart Failure Use Case

• Machine-sensed data
  – Weight change, heart rate, blood pressure, oxygen level, etc.

• Human-perceived features
  – Risk level for hospital readmission of CHF/ADHF patient

• Machine perception
  – Using domain models (to be created) to monitor the heart condition of a cardiac patient after hospital discharge

• Ultimately, recommend treatments to reduce preventable hospital readmissions …


Page 20: Big data healthcare

Prasad 23

Asthma Use Case

• Data from machine sensors
  – Environmental sensors, physiological sensors, etc.

• Human-perceived features
  – Asthma severity gleaned from frequency of asthma attacks, wheezing, coughing, sleeplessness, etc.

• Machine perception
  – Using domain models (to be created) to monitor asthma patients and their surroundings

• Ultimately, recommend prevention, treatment, and control options … [EVIDENCE-BASED APPROACH]


Page 21: Big data healthcare

Prasad 24

Volume : (2) Exploiting Embarrassing Parallelism

• Cloud Computing
  – Hardware: Networked stock PCs
  – Middleware: Replicated storage and restarted computations for fault tolerance
    • E.g., Hadoop file system, Google file system
  – Application Programming: Models / languages for distributed computation
    • E.g., Map-Reduce, Pig, Hive
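As a rough illustration of the Map-Reduce programming model named above, the sketch below counts disorders across partitions of patient records in plain Python. It runs sequentially in one process; a framework such as Hadoop would distribute the map and reduce calls across machines and handle restarts. The function names and the sample partitions (including "Patient5") are made up for the example.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit (disorder, 1) for one (patient, disorder) record."""
    _patient, disorder = record
    yield disorder, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single count."""
    return key, sum(values)


def map_reduce(partitions):
    # Each partition could be mapped on a different machine; here it is a loop.
    pairs = [pair
             for partition in partitions
             for record in partition
             for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical partitions of a patient/disorder table.
PARTITIONS = [
    [("Patient1", "Hypertension"), ("Patient2", "Atrial fibrillation")],
    [("Patient1", "Pulmonary hypertension"), ("Patient3", "Edema"),
     ("Patient5", "Hypertension")],
]

if __name__ == "__main__":
    print(map_reduce(PARTITIONS))
```

Because each map call touches one record and each reduce call touches one key, the work splits cleanly across machines, which is what makes such problems "embarrassingly parallel".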


Page 22: Big data healthcare

Prasad 25

Volume with a Twist

Resource-constrained reasoning on mobile-devices

Goal: Boolean encodings to ensure feasibility, efficiency, and economy


Page 23: Big data healthcare

Prasad 26

Cory Henson’s Thesis Statement

Machine perception can be formalized using semantic web technologies to derive abstractions from sensor data using background knowledge on the Web, and efficiently executed on resource-constrained devices.


Page 24: Big data healthcare

Prasad 27

Perception Cycle* that exploits background knowledge / domain models
(* based on Neisser’s cognitive model of perception)

[Figure: perception cycle between Observe Property and Perceive Feature, informed by Prior Knowledge]

1. Explanation: Abstracting raw data for human comprehension

2. Discrimination: Focus generation for disambiguation and action (incl. human in the loop)
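One way to picture how Boolean encodings make the two steps of the perception cycle cheap enough for a mobile device is to store, for each observable property, a bit vector of the features it may indicate. The sketch below is an assumption-laden toy, not the algorithm from the thesis: the three features, three properties, and their associations are illustrative and not clinically vetted, and the function names are hypothetical.

```python
# Each feature gets one bit position.
FEATURES = ["hypertension", "hyperthyroidism", "pulmonary edema"]
BIT = {name: 1 << i for i, name in enumerate(FEATURES)}


def mask(*names):
    """Pack a set of feature names into one integer bit vector."""
    packed = 0
    for name in names:
        packed |= BIT[name]
    return packed


# Prior knowledge (toy): which features each observable property may indicate.
PROPERTY_MASK = {
    "elevated blood pressure": mask("hypertension", "hyperthyroidism", "pulmonary edema"),
    "palpitations": mask("hypertension", "hyperthyroidism"),
    "clammy skin": mask("hyperthyroidism"),
}


def explain(observed):
    """Explanation: features consistent with every observed property."""
    candidates = (1 << len(FEATURES)) - 1  # start with all features
    for prop in observed:
        candidates &= PROPERTY_MASK[prop]  # one bitwise AND per observation
    return candidates


def discriminate(candidates, observed):
    """Discrimination: unobserved properties that would split the candidates."""
    return [prop for prop, bits in PROPERTY_MASK.items()
            if prop not in observed and (bits & candidates) not in (0, candidates)]


def decode(candidates):
    return [name for name in FEATURES if candidates & BIT[name]]


if __name__ == "__main__":
    seen = ["elevated blood pressure"]
    print(decode(explain(seen)), discriminate(explain(seen), seen))
```

Explanation here costs one bitwise AND per observed property, which gives a feel for why a bit-vector encoding can scale roughly linearly with the number of observations.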


Page 25: Big data healthcare

O(n^3) < x < O(n^4)  =>  O(n)

Efficiency Improvement

• Problem size increased from 10’s to 1000’s of nodes

• Time reduced from minutes to milliseconds

• Complexity reduced from polynomial to linear

Evaluation on a mobile device

Prasad 35

Page 26: Big data healthcare

36

kHealth: Health Signal Processing Architecture

[Figure: kHealth architecture. Labels recovered from the diagram:]

• Signals: Personal level Signals, Public level Signals, Population level Signals, Events from Social Streams
• Knowledge: Domain Experts, Domain Knowledge, Risk Model
• Processing: Data Acquisition & aggregation, Analysis, Personalized Actionable Information
• Example outputs: "Take Medication before going to work", "Avoid going out in the evening due to high pollen levels", "Contact doctor"

Page 27: Big data healthcare

kHealth Demo

• kHealth: http://www.youtube.com/watch?v=btnRi64hJp4

38

Page 28: Big data healthcare

Prasad 39

Variety

Syntactic and semantic heterogeneity
• in textual and sensor data,
• in social media and Web forums data,
• in Electronic Medical Records

Idea: Semantics-empowered integration

Page 29: Big data healthcare

Prasad 40

Variety (How?): (1) Granularity of Semantics & Applications

• Lightweight semantics: File and document-level annotation to enable discovery and sharing

• Richer semantics: Data-level annotation and extraction for semantic search and summarization

• Fine-grained semantics: Data integration, interoperability and reasoning in Linked Open Data

Cost-benefit trade-off and continuum

Page 30: Big data healthcare

Prasad 42

Variety (How?): (2) Hybrid KRR

Blending data-driven models with declarative knowledge
– Data-gleaned models: Bottom-up, correlation-based, statistical
– Expert-given KBs: Top-down, causal/taxonomical, logical
– Refine structure to better estimate parameters

E.g., Medical Data Analytics using PGMs + KBs
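A very small sketch of the blending idea: an expert-given knowledge base fixes which disease-to-symptom links exist (structure), and patient records supply the numbers on those links (parameters). This is not a full probabilistic graphical model; the KB edges, the records, the Laplace smoothing constant `alpha`, and the function name `estimate_cpts` are all assumptions made for illustration.

```python
from collections import Counter

# Top-down structure: disease -> symptom links an expert has declared (toy).
KB_EDGES = {("heart failure", "edema"), ("asthma", "wheezing")}

# Bottom-up evidence: hypothetical patient records.
RECORDS = [
    {"diseases": {"heart failure"}, "symptoms": {"edema", "fatigue"}},
    {"diseases": {"heart failure"}, "symptoms": {"edema"}},
    {"diseases": {"heart failure"}, "symptoms": {"cough"}},
    {"diseases": {"asthma"}, "symptoms": {"wheezing", "cough"}},
]


def estimate_cpts(records, kb_edges, alpha=1.0):
    """Estimate P(symptom | disease) only for links the KB allows.

    Co-occurrences outside the KB (e.g. heart failure with cough) are
    ignored, so the data never adds structure, it only sets parameters.
    """
    disease_count = Counter()
    joint_count = Counter()
    for record in records:
        for disease in record["diseases"]:
            disease_count[disease] += 1
            for symptom in record["symptoms"]:
                if (disease, symptom) in kb_edges:
                    joint_count[(disease, symptom)] += 1
    # Laplace smoothing over the two outcomes: symptom present or absent.
    return {
        (disease, symptom):
            (joint_count[(disease, symptom)] + alpha)
            / (disease_count[disease] + 2 * alpha)
        for disease, symptom in kb_edges
    }


if __name__ == "__main__":
    for edge, probability in sorted(estimate_cpts(RECORDS, KB_EDGES).items()):
        print(edge, round(probability, 3))
```

Restricting estimation to KB-approved links keeps the number of parameters small, which is one reading of "refine structure to better estimate parameters".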


Page 31: Big data healthcare

Prasad 45

Veracity

Scalable and Agile Big Data Analytics cannot deliver value unless we have confidence and trust in our data.

Open Problem: Develop expressive frameworks for trust to make explicit all aspects that go into trust formation and inferences.


Page 32: Big data healthcare

Prasad 46

Veracity: Confession of sorts!

Trust is well-known, but is not well-understood.

The utility of a notion testifies not to its clarity but rather to the philosophical importance of clarifying it.

-- Nelson Goodman (Fact, Fiction and Forecast, 1955)


Page 33: Big data healthcare

Prasad 47

(More on) Value

Discovering gaps and enriching domain models using data

E.g., Semantics Driven Approach for Knowledge Acquisition from EMRs

Idea: Use known associations between diseases, symptoms, and medications that are implicit in real-world scenarios (EMRs) to acquire unknown associations and bridge the gaps in the knowledge base
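The gap-finding idea can be sketched as follows: any symptom in a record that none of the patient's diagnosed diseases explains, according to the current knowledge base, becomes a candidate new association to show a domain expert. The knowledge base, the records, and the function names below are hypothetical examples, not data from the study.

```python
from collections import Counter

# Known disease -> symptom associations (toy knowledge base with a gap).
KB = {
    "hypertension": {"headache"},
    "heart failure": {"edema", "fatigue"},
}

# Hypothetical EMR extracts.
RECORDS = [
    {"diseases": ["heart failure"], "symptoms": ["edema", "shortness of breath"]},
    {"diseases": ["heart failure", "hypertension"],
     "symptoms": ["shortness of breath", "headache"]},
    {"diseases": ["hypertension"], "symptoms": ["headache"]},
]


def unexplained_symptoms(record, kb):
    """Symptoms that no diagnosed disease accounts for in the KB."""
    explained = set().union(*(kb.get(d, set()) for d in record["diseases"]))
    return set(record["symptoms"]) - explained


def candidate_associations(records, kb):
    """Rank (disease, symptom) pairs by how often they co-occur unexplained."""
    counts = Counter()
    for record in records:
        for symptom in unexplained_symptoms(record, kb):
            for disease in record["diseases"]:
                counts[(disease, symptom)] += 1
    return counts.most_common()


if __name__ == "__main__":
    for pair, support in candidate_associations(RECORDS, KB):
        print(pair, support)
```

The ranked pairs are only suggestions; the slides stress that domain experts vet such findings before they enter the knowledge base.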


Page 34: Big data healthcare

Prasad 48

(More on) Value

Discovering drug-drug interaction by analyzing search query logs

• E.g., The antidepressant paroxetine and the cholesterol-lowering drug pravastatin were shown to interact, causing high blood sugar, based on searches for the two drugs that correlated with searches for “hyperglycemia”, “high blood sugar”, or “blurry vision”.
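The underlying signal can be illustrated by comparing how often users who searched for both drugs also searched for a symptom term, against users who searched for only one drug. The query logs below are fabricated toy data, and `cohort_rate` is a hypothetical helper; a real analysis would need far larger logs and statistical testing.

```python
SYMPTOM_TERMS = ("hyperglycemia", "high blood sugar", "blurry vision")

# Toy per-user query logs, invented for this sketch.
LOGS = {
    "u1": ["paroxetine side effects", "pravastatin dosage", "high blood sugar symptoms"],
    "u2": ["paroxetine", "pravastatin", "blurry vision"],
    "u3": ["paroxetine withdrawal", "weather"],
    "u4": ["pravastatin cost", "hyperglycemia"],
    "u5": ["pravastatin", "recipes"],
    "u6": ["paroxetine and pravastatin together", "news"],
}


def mentions(queries, term):
    return any(term in query.lower() for query in queries)


def cohort_rate(logs, include, exclude=()):
    """Share of users in a cohort who also searched for a symptom term.

    The cohort is every user who searched for all drugs in `include`
    and for none of the drugs in `exclude`.
    """
    cohort = [queries for queries in logs.values()
              if all(mentions(queries, drug) for drug in include)
              and not any(mentions(queries, drug) for drug in exclude)]
    if not cohort:
        return 0.0
    hits = sum(any(mentions(queries, term) for term in SYMPTOM_TERMS)
               for queries in cohort)
    return hits / len(cohort)


if __name__ == "__main__":
    print("both:", cohort_rate(LOGS, ["paroxetine", "pravastatin"]))
    print("paroxetine only:", cohort_rate(LOGS, ["paroxetine"], ["pravastatin"]))
    print("pravastatin only:", cohort_rate(LOGS, ["pravastatin"], ["paroxetine"]))
```

A markedly higher rate in the "both" cohort than in either single-drug cohort is the kind of correlation that would then need clinical justification before being treated as an interaction.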


Page 35: Big data healthcare

Prasad 49

Conclusions

• Glimpse of our research organized around the 5 V’s of Big Data
• Discussed role in harnessing Value
  – Semantic Perception (Volume)
  – Continuum of Semantic models to manage Heterogeneity (Variety)
  – Hybrid KRR: Probabilistic + Logical (Variety)
  – Trust Models (Veracity)

Page 36: Big data healthcare

Prasad 50

Thank you, and please visit us at http://knoesis.org/

Department of Computer Science and Engineering
Wright State University, Dayton, Ohio, USA

Kno.e.sis: Ohio Center of Excellence in Knowledge-enabled Computing

Special Thanks to: Pramod Anantharam, Sujan Perera, Dr. Cory Henson, Professor Amit Sheth
