Smart Data enabling Personalized Digital Health: Deriving Value via harnessing Volume, Variety and Velocity using semantics and Semantic Web. Pramod Anantharam, Amit P. Sheth, Cory Henson, Dr. T.K. Prasad, Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), Wright State University, USA. Contributions by many, but Special Thanks to: Sujan Perera, Delroy Cameron
Talk at PARC, Oct. 30, 2013. Abstract at: http://j.mp/PARCabs
Also see related talks on Smart Data for Smart Energy and other applications: http://wiki.knoesis.org/index.php/Smart_Data
The proliferation of smartphones and sensors, the continuous monitoring of physiology and environment (personal health signals), notifications from public health sources (public health signals), and increasing digital access to clinical data are producing massive multisensory and multimodal observational data. This technology has significant potential to improve health and well-being through early detection, better diagnosis, and effective prevention and treatment of disease, and so to improve quality of life. However, to make personalized digital medicine a reality, it is crucial to derive actionable insights from data that include heterogeneous and fine-grained observations.
At Kno.e.sis, we collaborate with clinicians in a growing number of specializations (Cardiovascular, Pulmonology, Gastroenterology) to study personalized health decision making that involves the use of real-world patient data, deep background knowledge, and well-targeted clinical applications. For example:
* For a patient discharged from hospital with Acute Decompensated Heart Failure, can we compute post-discharge risk factors to reduce 30-day readmissions?
* For children with Asthma, can we predict an impending attack and enable preventive actions, reducing the need for post-attack symptomatic relief?
* For Parkinson’s Disease, can we characterize disease progression to guide medication adjustments and therapeutic changes?
The above provides the context for a research agenda around what I call Smart Data, which (a) provides value by harnessing the challenges posed by the volume, velocity, variety and veracity of Big Data, in turn providing actionable information and improving decision making, and/or (b) is focused on the actionable value achieved by human involvement in the data creation, processing and consumption phases to improve the human experience. In describing the Smart Data approach to the above health applications, I will cover the following technical capabilities, which add semantics to enhance or complement traditional NLP- and ML-centric solutions:
* Semantic Sensor Web – including semantic computation infrastructure, the ability to semi-automatically create domain-specific background knowledge (ontology) from unstructured data (e.g., EMR), and automatic semantic annotation of multimodal and multisensory data
* Semantic perception – converting low-level signals into higher-level abstractions using the IntellegO framework, which utilizes domain knowledge and hybrid abductive/deductive reasoning
* Intelligence at the Edge – performing scalable and efficient semantic computation on resource-constrained devices
Transcript
Smart Data enabling Personalized Digital Health: Deriving Value via harnessing Volume, Variety and Velocity
using semantics and Semantic Web
Pramod Anantharam
Amit P. Sheth
Cory Henson
Dr. T.K. Prasad
Contributions by many, but Special Thanks to:
Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) Wright State University, USA
• Prediction of the spread of flu in real time during H1N1 2009
– Google tested a mammoth 450 million different mathematical models to test the search terms, comparing their predictions against actual flu cases; 45 important parameters were found
– The model was tested when the H1N1 crisis struck in 2009 and gave more meaningful and valuable real-time information than any official public health system [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]
• FareCast: predict the direction of air fares over different routes [Big Data, Viktor Mayer-Schonberger and Kenneth Cukier, 2013]
• NY city manholes problem [ICML Discussion, 2012]
• The current focus is mainly on serving business intelligence and targeted analytics needs, not complex individual and collective human needs (e.g., empowering humans in health, fitness and well-being; better disaster coordination; smart energy consumption) that are highly personalized/individualized/contextualized
– Incorporate real-world complexity: the multi-modal and multi-sensory nature of the real world and human perception
– Need a deeper understanding of data and its role in information (e.g., skew,
• Human involvement and guidance: leading to actionable information, understanding and insight right in the context of human activities
– Bottom-up & top-down processing: infusion of models and background knowledge (data + knowledge + reasoning)
What is missing?
[Figure: Smart Data: Contextual Information; Makes Sense; Actionable or helps decision support/making]
[Figure: Improved Analytics (Descriptive, Exploratory, Inferential, Predictive, Causal); Human Centric Computing (Creation, Processing, Experience & Decision Making)]
Smart Data
Smart data makes sense out of Big Data
It provides value by harnessing the challenges posed by the volume, velocity, variety and veracity of Big Data, in turn providing actionable information and improving decision making.
“OF human, BY human and FOR human”
Smart data is focused on the actionable value achieved by human
involvement in data creation, processing and consumption phases
for improving the human experience.
Another perspective on Smart Data
• Focus on verticals: advertising, social media, retail, financial services, telecom, and healthcare
– Aggregate data, focused on transactions, limited integration (limited complexity), analytics to find (simple) patterns
– Emphasis on technologies to handle volume/scale, and to a lesser extent velocity: Hadoop, NoSQL, MPP warehouses, …
– Full faith in the power of data (no hypothesis), bottom-up analysis
Current Focus on Big Data
Petabytes of Physical (sensory)-Cyber-Social Data every day!
More on PCS Computing: http://wiki.knoesis.org/index.php/PCS
‘OF human’: Relevant Real-time Data Streams for Human Experience
Mr. Yocabet was a disabled former truck driver with type 1 diabetes. Treatment for his liver may harm his kidney and even cause organ failure and death!
“Because he’s on anti-rejection drugs, the hepatitis C will be a lot worse in him,” -- Ms. Christina Mecannic
“Health professionals are required to make decisions with multiple foci (e.g. diagnosis, intervention, interaction and evaluation), in dynamic contexts, using a diverse knowledge base (including an increasing body of evidence-based literature), with multiple variables and individuals involved.”
Semantic Perception and risk assessment algorithms can transform raw data (hard to comprehend) into abstractions (e.g., Patient Health is 3 on a scale of 5) that are intuitively understandable and valuable for decision makers.
Having a health score for each patient allows efficient use of a decision maker’s scarce attention.
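The following is a minimal sketch of turning raw readings into the kind of 1-to-5 abstraction described above. It assumes a hypothetical set of vital signs and illustrative reference ranges, which are not clinical guidance. The scoring rule (5 minus the number of out-of-range readings) is our own simplification and is not the risk assessment model from the talk.

```python
# Hypothetical reference ranges, for illustration only (not clinical guidance)
NORMAL_RANGES = {
    "heart_rate": (60, 100),   # beats per minute
    "spo2": (95, 100),         # percent oxygen saturation
    "systolic_bp": (90, 140),  # mmHg
    "resp_rate": (12, 20),     # breaths per minute
}


def health_score(vitals):
    """Abstract raw vitals into a 1-5 score: 5 minus the count of out-of-range readings.

    Vitals that are missing from `vitals` are simply not counted as abnormal.
    """
    abnormal = sum(
        1
        for name, (lo, hi) in NORMAL_RANGES.items()
        if name in vitals and not lo <= vitals[name] <= hi
    )
    return max(1, 5 - abnormal)


def triage(patients):
    """Order patient ids so the lowest (worst) health scores come first."""
    return sorted(patients, key=lambda pid: health_score(patients[pid]))


normal = {"heart_rate": 72, "spo2": 98, "systolic_bp": 120, "resp_rate": 16}
sick = {"heart_rate": 115, "spo2": 91, "systolic_bp": 120, "resp_rate": 16}
print(health_score(normal), health_score(sick), triage({"a": normal, "b": sick}))
```

Sorting by the score puts the patients who most need attention first. This is the point of condensing many raw signals into one number per patient.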
[Figure: Population health record; Personal health record; Expert opinion; Clinical research; Semantic Perception; Risk assessment model; Clinical decision support]
Patient Vulnerability Score (prognostic)
Clinical Decision Support systems such as EMR alert systems currently follow a high-recall philosophy, reporting every possible alert!
Doctors need actionable information, not a deluge of alerts, to make timely and important decisions. Providing a vulnerability score would help doctors direct their time to investigating vulnerabilities further.
Value: Patient Context
How could Smart Data help?
3.4 billion people will have smartphones or tablets by 2017 -- Research2Guidance
How are machines supposed to integrate and interpret sensor data?
Semantic Sensor Networks (SSN)
W3C Semantic Sensor Network Ontology
Lefort, L., Henson, C., Taylor, K., Barnaghi, P., Compton, M., Corcho, O., Garcia-Castro, R., Graybeal, J., Herzog, A., Janowicz, K., Neuhaus, H., Nikolov, A., and Page, K.: Semantic Sensor Network XG Final Report, W3C Incubator Group Report (2011).
Semantic Annotation of SWE (Sensor Web Enablement)
Lefort, L., Henson, C., Taylor, K., Barnaghi, P., Compton, M., Corcho, O., Garcia-Castro, R., Graybeal, J., Herzog, A., Janowicz, K., Neuhaus, H., Nikolov, A., and Page, K.: Semantic Sensor Network XG Final Report, W3C Incubator Group Report (2011).
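To make semantic annotation concrete, here is a minimal sketch that emits N-Triples for one sensor observation. It uses terms we believe belong to the 2011 SSN XG ontology namespace: ssn:Observation, ssn:observedBy, ssn:observedProperty, ssn:featureOfInterest, ssn:observationResult, ssn:SensorOutput, ssn:hasValue and ssn:ObservationValue. Check these against the report before relying on them. Several elements are hypothetical:
- the example.org namespace;
- the sensor, property and feature names;
- the `numericValue` property, which stands in because SSN leaves the encoding of the raw value to other vocabularies.

```python
# Namespace of the 2011 SSN XG ontology (verify term names against the report)
SSN = "http://purl.oclc.org/NET/ssnx/ssn#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
EX = "http://example.org/khealth/"  # hypothetical namespace for this example


def annotate(obs_id, sensor, prop, feature, value):
    """Return N-Triples describing one sensor reading as an SSN Observation."""
    obs = f"<{EX}observation/{obs_id}>"
    out = f"<{EX}output/{obs_id}>"
    val = f"<{EX}value/{obs_id}>"
    triples = [
        (obs, f"<{RDF_TYPE}>", f"<{SSN}Observation>"),
        (obs, f"<{SSN}observedBy>", f"<{EX}sensor/{sensor}>"),
        (obs, f"<{SSN}observedProperty>", f"<{EX}property/{prop}>"),
        (obs, f"<{SSN}featureOfInterest>", f"<{EX}feature/{feature}>"),
        (obs, f"<{SSN}observationResult>", out),
        (out, f"<{RDF_TYPE}>", f"<{SSN}SensorOutput>"),
        (out, f"<{SSN}hasValue>", val),
        (val, f"<{RDF_TYPE}>", f"<{SSN}ObservationValue>"),
        # hypothetical datatype property carrying the raw number
        (val, f"<{EX}numericValue>", f'"{value}"^^<{XSD_DECIMAL}>'),
    ]
    return "\n".join(" ".join(t) + " ." for t in triples)


print(annotate("o1", "wheezometer1", "WheezingLevel", "patient42", 0.7))
```

Once annotated this way, readings from different sensors share one vocabulary. A machine can then query them together regardless of the device that produced them.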
To gain new insight into patient care & early indications of disease
Smart Data in Healthcare
… and do it efficiently and at scale
What if we could automate this sense-making ability?
Making sense of sensor data with
People are good at making sense of sensory input
What can we learn from cognitive models of perception?
• The key ingredient is prior knowledge
[Figure: Perception Cycle*: ObserveProperty; PerceiveFeature; (1) Explanation; (2) Discrimination]
(1) Explanation: translating low-level signals into high-level knowledge
(2) Discrimination: focusing attention on those aspects of the environment that provide useful information
* Based on Neisser’s cognitive model of perception
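The two steps of the cycle can be sketched with plain sets. This is a simplified illustration in the spirit of explanation and discrimination, not the IntellegO implementation. The knowledge base below is hypothetical. An explanation is any feature whose known properties include every observed property. A discriminating property is an unobserved one that some, but not all, current explanations predict.

```python
# Hypothetical background knowledge: feature -> properties it manifests
KB = {
    "flu": {"fever", "cough", "fatigue"},
    "cold": {"cough", "sneezing"},
    "allergy": {"sneezing", "itchy_eyes"},
}


def explain(observed):
    """Explanation: features whose known properties account for every observation."""
    return {f for f, props in KB.items() if observed <= props}


def discriminate(explanations, observed):
    """Discrimination: unobserved properties predicted by some but not all explanations."""
    if not explanations:
        return set()
    expected = [KB[f] for f in explanations]
    union = set().union(*expected)
    common = set.intersection(*expected)
    return union - common - observed


observed = {"cough"}
candidates = explain(observed)
print(candidates, discriminate(candidates, observed))
```

With only cough observed, both flu and cold remain as explanations. The discrimination step suggests checking fever, fatigue or sneezing next, which is the "focusing attention" half of the cycle.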
Orders-of-magnitude resource savings from generating and storing relevant abstractions instead of raw observations.
[Figure: Relevant abstractions vs. raw observations]
Decisions Are Only as Good as the Underlying Coded Knowledge
• How do we know whether we have all possible relationships?
• How do we know which relationships are missing?
• How can we efficiently fill in the missing relationships?
Sujan Perera, Cory Henson, Krishnaprasad Thirunarayan, Amit Sheth, Suhas Nair, 'Semantics Driven Approach for Knowledge Acquisition from EMRs', Special Issue on Data Mining in Bioinformatics, Biomedicine and Healthcare Informatics, Journal of Biomedical and Health Informatics (To Appear)
Knowledge is built by abstracting real-world facts; once built, it should be able to explain the real world.
Semantics Driven Approach for Knowledge Acquisition from EMRs
[Figure: Patient Notes; UMLS; Explanation Module (Explained? Yes / No); Hypothesis Generation; Hypothesis Filtering; Hypotheses with High Confidence]
The Algorithm
1. Annotate the EMR documents with the given knowledge base
2. Find unexplained symptoms
3. Generate hypotheses for unexplained symptoms
   1. All disorders in the document become candidates
4. Filter the candidate disorders, keeping those with high confidence
   1. Get the disorders that have a relationship with the unexplained symptom in the given knowledge base
   2. Collect the “neighborhood” of these disorders
   3. Get the intersection of the “neighborhood” and the candidate disorders
[Figure: Candidate Filtering Step. Symptom S1 and disorders D1 to D13; edges: "is symptom of" and rdfs:subClassOf; highlighted: candidate disease]
Intuition: “similar disorders manifest similar symptoms”
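The steps of the algorithm can be sketched on a toy graph. Everything here is hypothetical:
- a tiny hand-made knowledge base stands in for UMLS;
- the document is assumed to be annotated already (step 1);
- the "neighborhood" of a disorder is defined as the disorder itself, its rdfs:subClassOf parent, and that parent's other children.

The paper may define the neighborhood differently.

```python
# Hypothetical toy knowledge base standing in for UMLS.
# "is symptom of" edges, stored as disorder -> symptoms it manifests
SYMPTOMS_OF = {"D2": {"S1"}, "D3": {"S2"}, "D5": {"S3"}}
# rdfs:subClassOf edges, stored as child disorder -> parent disorder
SUBCLASS_OF = {"D2": "D1", "D5": "D1", "D3": "D4"}


def neighborhood(disorder):
    """Assumed definition: the disorder, its parent, and its siblings."""
    hood = {disorder}
    parent = SUBCLASS_OF.get(disorder)
    if parent is not None:
        hood.add(parent)
        hood |= {d for d, p in SUBCLASS_OF.items() if p == parent}
    return hood


def unexplained(symptoms, disorders):
    """Step 2: symptoms not linked to any disorder annotated in the document."""
    explained = set()
    for d in disorders:
        explained |= SYMPTOMS_OF.get(d, set())
    return symptoms - explained


def hypotheses(symptoms, disorders):
    """Steps 3-4: propose (symptom, disorder) relationships missing from the KB."""
    results = set()
    candidates = set(disorders)  # step 3.1: every disorder in the document
    for s in unexplained(symptoms, disorders):
        related = {d for d, syms in SYMPTOMS_OF.items() if s in syms}  # step 4.1
        hood = set()
        for d in related:
            hood |= neighborhood(d)  # step 4.2
        results |= {(s, d) for d in hood & candidates}  # step 4.3
    return results


# Step 1 assumed done: the note mentions symptoms S1, S2 and disorders D3, D5
print(hypotheses({"S1", "S2"}, {"D3", "D5"}))
```

Here S1 is unexplained, and the knowledge base links it only to D2. D5 is a sibling of D2 under D1 and also appears in the note, so "S1 is symptom of D5" becomes a hypothesis for an expert to validate. D3 is unrelated to D2 and is filtered out.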
Evaluation
$$\text{Precision} = \frac{\text{number of suggested correct relationships}}{\text{total number of suggested relationships}} = 73.09\%$$
$$\text{Recall} = \frac{\text{number of correct relationships found}}{\text{number of all correct relationships} - \text{number of known correct relationships}} = 66.67\%$$
Without the semantic filtering step, precision would be 30%. High precision is important because it is hard to find domain experts to validate the generated hypotheses.
Through physical monitoring and analysis, our cellphones could act as an early warning system to detect serious health conditions, and provide actionable information
kHealth: knowledge-enabled healthcare, a “canary in a coal mine”
kHealth to Manage ADHF (Acute Decompensated Heart Failure)
[Figure: Historical observations of each patient; Model Creation (validate correlations); Current Observations (physical, physiological, history); Risk Score (actionable information)]
Risk Score: from Data to Abstraction and Actionable Information
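The Model Creation step can be illustrated with a logistic regression fitted to historical observations and then applied to current ones. This technique is swapped in for brevity; it is not the kHealth model. The two features are 3-day weight gain in kg and missed medication doses per week. Both features and the eight training rows are synthetic and hypothetical.

```python
import math

# Synthetic, hypothetical history:
# ((3-day weight gain in kg, missed medication doses per week), readmitted within 30 days)
HISTORY = [
    ((0.2, 0), 0), ((0.5, 1), 0), ((0.1, 0), 0), ((0.8, 1), 0),
    ((2.0, 3), 1), ((1.5, 4), 1), ((2.5, 2), 1), ((1.8, 5), 1),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit(data, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean logistic (log) loss."""
    n_features = len(data[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in data:
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def risk_score(model, x):
    """Estimated readmission probability for current observations x."""
    w, b = model
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


model = fit(HISTORY)
print(round(risk_score(model, (2.2, 4)), 2), round(risk_score(model, (0.3, 0)), 2))
```

The probability is the abstraction. Mapping it to a small set of risk categories would then give the actionable information shown on the slide.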
1 http://www.nhlbi.nih.gov/health/health-topics/topics/asthma/
2 http://www.lung.org/lung-disease/asthma/resources/facts-and-figures/asthma-in-adults.html
3 Akinbami et al. (2009). Status of childhood asthma in the United States, 1980–2007. Pediatrics, 123(Supplement 3), S131-S145.
25 million people in the U.S. are diagnosed with asthma (7 million are children)1.
[Figure: 300 million; $50 billion; 155,000; 593,000]
Asthma is a multifactorial disease with health signals spanning personal, public health, and population levels.
Real-time health signals from the personal level (e.g., Wheezometer, NO in breath, accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and population level (e.g., pollen level, CO2) arrive continuously as fine-grained samples, potentially with missing information and uneven sampling frequencies.
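One common way to cope with uneven sampling and gaps before any joint analysis is to align every stream onto a shared time grid. The last observation is carried forward, but readings older than a maximum age are treated as missing. The stream names, timestamps (in minutes), values and 30-minute cutoff below are hypothetical.

```python
from bisect import bisect_right


def align(streams, grid, max_age):
    """Align unevenly sampled streams onto `grid` (last observation carried forward).

    streams: name -> list of (timestamp, value). A grid point gets None when the
    stream has no earlier sample or its latest sample is older than max_age.
    """
    aligned = {}
    for name, samples in streams.items():
        samples = sorted(samples)
        times = [t for t, _ in samples]
        row = []
        for g in grid:
            i = bisect_right(times, g)  # number of samples at or before g
            if i == 0 or g - times[i - 1] > max_age:
                row.append(None)  # missing: nothing recent enough to trust
            else:
                row.append(samples[i - 1][1])
        aligned[name] = row
    return aligned


streams = {
    "pollen": [(0, 40), (60, 55)],                # hourly public feed
    "wheeze": [(10, 0.2), (20, 0.7), (45, 0.9)],  # irregular personal sensor
}
print(align(streams, grid=[0, 15, 30, 45, 60], max_age=30))
```

Marking stale values as None, rather than silently reusing them, keeps the veracity concern visible to whatever model consumes the aligned rows.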
Variety, Volume, Velocity, Veracity, Value
Can we detect the asthma severity level?
Can we characterize the asthma control level?
What risk factors influence asthma control?
What is the contribution of each risk factor?
Semantics: understanding relationships between health signals and asthma attacks for providing actionable information
WHY Big Data to Smart Data: Healthcare example
[Figure: Population Level; Personal; Public Health]
Variety: Health signals span heterogeneous sources
Volume: Health signals are fine grained
Velocity: Real-time change in situations
Veracity: Reliability of health signals may be compromised
Value: Can I reduce my asthma attacks at night?
Decision support to doctors by providing them with deeper insights into patient asthma care
Asthma: Demonstration of Value
Sensordrone – for monitoring environmental air quality
Wheezometer – for monitoring wheezing sounds
Can I reduce my asthma attacks at night?
What are the triggers?
What is the wheezing level?
What is the propensity toward asthma?
What is the exposure level over a day?
What is the air quality indoors?
[Figure: Commute to Work; Personal; Public Health; Population Level]
Closing the window at home in the morning and taking an alternate route to the office may lead to reduced asthma attacks
Actionable Information
Asthma: Actionable Information for Asthma Patients
Personal, Public Health, and Population Level Signals for Monitoring Asthma
ICS = inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA = inhaled short-acting beta2-agonist; *consider referral to specialist
Asthma Control and Actionable Information
Sensors and their observations for understanding asthma
[Figure: Health Signal Extraction to Understanding. Personal Level Sensors (kHealth**) and Personal Level Signals; Societal Level Signals (EventShop*); Personalized Societal Level Signals relevant to the personal level; Qualify, Quantify, Enrich; Action Recommendation. Example signals: tweets reporting pollution level and asthma attacks; acceleration readings from on-phone sensors; sensor and personal observations; signals from personal, personal spaces, and community spaces; outdoor pollen and pollution; public health; risk category assigned by doctors]
What are the features influencing my asthma?
What is the contribution of each of these features?
How controlled is my asthma? (risk score)
What will be my action plan to manage asthma?
Well Controlled – continue
Not Well Controlled – contact nurse
Poorly Controlled – contact doctor
Personal Health Score and Vulnerability Score
Risk factors: non-compliance, poor economic status, no living assistance
Health Score (at discharge) | Vulnerability Score
Well Controlled | Low
Well Controlled | Very Low
Not Well Controlled | High
Not Well Controlled | Medium
Poorly Controlled | Very High
Poorly Controlled | High
Estimation of readmission vulnerability based on the personal health score
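The table can be read as a lookup. The health score at discharge, plus the presence of readmission risk factors, gives a vulnerability level, which is paired with the action listed for each control level. This is a minimal sketch, and it rests on an assumption the table does not state explicitly: the higher level in each pair of rows applies when at least one of the three risk factors is present.

```python
# The three readmission risk factors named in the table
RISK_FACTORS = {"non-compliance", "poor economic status", "no living assistance"}

# Assumption: in each pair of rows, the higher level applies when at least one
# risk factor is present and the lower level when none is.
VULNERABILITY = {
    ("Well Controlled", False): "Very Low",
    ("Well Controlled", True): "Low",
    ("Not Well Controlled", False): "Medium",
    ("Not Well Controlled", True): "High",
    ("Poorly Controlled", False): "High",
    ("Poorly Controlled", True): "Very High",
}

# Action plan per control level, as listed with the control levels above
ACTIONS = {
    "Well Controlled": "continue",
    "Not Well Controlled": "contact nurse",
    "Poorly Controlled": "contact doctor",
}


def assess(control_level, patient_factors):
    """Return (vulnerability level, recommended action) for one patient."""
    has_risk_factor = bool(RISK_FACTORS & set(patient_factors))
    return VULNERABILITY[(control_level, has_risk_factor)], ACTIONS[control_level]


print(assess("Not Well Controlled", ["poor economic status"]))  # ('High', 'contact nurse')
```

A patient who is well controlled at discharge but lives alone thus still scores above "Very Low". This is the kind of distinction a single health score would miss.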
Health Signal Extraction Challenges
Social streams have been used to extract many near real-time events
Twitter provides access to rich signals, but tweets are noisy, informal, inconsistently capitalized, redundant, and lack context
We formalize event extraction from tweets as a sequence labeling problem
How do we know the event phrases, and who creates the training set? (Manual creation is ruled out.)
Now you know why you’re miserable! Very High Alert for B-ALLERGEN Ragweed I-ALLERGEN pollen. B-FACILITY Oklahoma I-FACILITY Allergy I-FACILITY Clinic says it’s an extreme exposure situation
Idea: Background knowledge is used to create the training set, e.g., typing information becomes the label for a concept
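Here is a minimal sketch of the idea: background knowledge labels tweets automatically in the B-/I-/O scheme shown above, by greedy longest match against typed concept phrases. The two-entry dictionary stands in for an ontology and is hypothetical. The labeled output would then serve as training data for a sequence labeler, which is not shown.

```python
import re

# Hypothetical background knowledge: concept phrase (lowercased tokens) -> type.
# In practice this role would be played by an ontology or knowledge base.
CONCEPT_TYPES = {
    ("ragweed", "pollen"): "ALLERGEN",
    ("oklahoma", "allergy", "clinic"): "FACILITY",
}
MAX_LEN = max(len(phrase) for phrase in CONCEPT_TYPES)


def tokenize(text):
    """Split into word tokens (keeping apostrophes) and single punctuation marks."""
    return re.findall(r"[A-Za-z0-9’']+|[^\sA-Za-z0-9’']", text)


def bio_label(tokens):
    """Greedy longest match against known concepts; unmatched tokens get 'O'."""
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            phrase = tuple(t.lower() for t in tokens[i:i + n])
            if phrase in CONCEPT_TYPES:
                kind = CONCEPT_TYPES[phrase]
                labels[i:i + n] = ["B-" + kind] + ["I-" + kind] * (n - 1)
                i += n
                break
        else:  # no known concept starts at position i
            i += 1
    return labels


tweet = ("Very High Alert for Ragweed pollen. "
         "Oklahoma Allergy Clinic says it’s an extreme exposure situation")
tokens = tokenize(tweet)
print(list(zip(tokens, bio_label(tokens))))
```

Because the labels come from the knowledge base rather than from annotators, the training set can grow with the stream. Its quality, however, is bounded by the coverage of the background knowledge.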
Health Signal Understanding Challenges
Formalized as a problem of extracting the structure of a Bayesian Network
Find the structure that maximizes the scoring function, where k indexes over all
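The talk's scoring function is not preserved here. As an assumption, the sketch below uses the BIC score, a common decomposable choice in which the total is a sum of per-node terms (k indexes over all nodes). It compares two candidate structures on synthetic discrete data. The variable names and counts are hypothetical, and a real structure search would explore many more candidates.

```python
import math
from collections import Counter


def family_bic(data, child, parents):
    """BIC term for one node given its parent set (discrete data)."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    # maximum-likelihood log-likelihood: sum of N_kjl * log(N_kjl / N_kj)
    loglik = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    r = len({row[child] for row in data})  # number of child states
    q = math.prod(len({row[p] for row in data}) for p in parents)  # parent configs
    return loglik - 0.5 * math.log(n) * q * (r - 1)


def bic_score(data, structure):
    """Decomposable score: sum of family terms over every node k."""
    return sum(family_bic(data, k, parents) for k, parents in structure.items())


# Synthetic observations (hypothetical): high pollen mostly co-occurs with attacks
data = (
    [{"pollen": "high", "attack": "yes"}] * 40
    + [{"pollen": "low", "attack": "no"}] * 40
    + [{"pollen": "high", "attack": "no"}] * 5
    + [{"pollen": "low", "attack": "yes"}] * 5
)
empty = {"pollen": (), "attack": ()}
edge = {"pollen": (), "attack": ("pollen",)}
best = max([empty, edge], key=lambda s: bic_score(data, s))
print(best)  # the structure with the pollen -> attack edge wins on this data
```

Decomposability is what makes search tractable: changing one node's parents changes only that node's term.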
• Problem size increased from 10’s to 1000’s of nodes
• Time reduced from minutes to milliseconds
• Complexity growth reduced from polynomial to linear
Evaluation on a mobile device
Semantic Perception for smarter analytics: 3 ideas to take away
1 Translate low-level data to high-level knowledge: Machine perception can be used to convert low-level sensory signals into high-level knowledge useful for decision making
2 Prior knowledge is the key to perception: Using SW technologies, machine perception can be formalized and integrated with prior knowledge on the Web
3 Intelligence at the edge: By downscaling semantic inference, machine perception can execute efficiently on resource-constrained devices
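"Downscaling" semantic inference for a phone can be illustrated by encoding the explanation step with bit vectors, so that checking one feature costs a single AND and compare. This follows the general idea of bit-vector encodings for machine perception on resource-constrained devices. The property list and feature masks are hypothetical, and this is not the implementation evaluated in the talk.

```python
# Hypothetical observable properties, one bit each
PROPERTIES = ["fever", "cough", "fatigue", "sneezing", "itchy_eyes"]
BIT = {p: 1 << i for i, p in enumerate(PROPERTIES)}


def encode(props):
    """Pack a set of property names into an integer bit vector."""
    mask = 0
    for p in props:
        mask |= BIT[p]
    return mask


def decode(mask):
    """Unpack a bit vector into property names, in PROPERTIES order."""
    return [p for p in PROPERTIES if mask & BIT[p]]


# Hypothetical background knowledge: feature -> bit vector of its properties
FEATURES = {
    "flu": encode({"fever", "cough", "fatigue"}),
    "cold": encode({"cough", "sneezing"}),
    "allergy": encode({"sneezing", "itchy_eyes"}),
}


def explain(observed):
    """Features whose property vector covers every observed bit."""
    return [f for f, m in FEATURES.items() if (m & observed) == observed]


def discriminating(explanations, observed):
    """Unobserved bits set in some but not all explaining features."""
    union, common = 0, ~0
    for f in explanations:
        union |= FEATURES[f]
        common &= FEATURES[f]
    return union & ~common & ~observed


obs = encode({"cough"})
print(explain(obs), decode(discriminating(explain(obs), obs)))
```

Each check is a constant-time word operation. For this reason, the cost of this encoding grows linearly with the number of features.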
D. Cameron, G. A. Smith, R. Daniulaityte, A. P. Sheth, D. Dave, L. Chen, G. Anand, R. Carlson, K. Z. Watkins, R. Falck. PREDOSE: A Semantic Web Platform for Drug Abuse Epidemiology using Social Media. Journal of Biomedical Informatics. July 2013 (in press)
Kno.e.sis - Ohio Center of Excellence in Knowledge-enabled Computing
CITAR - Center for Interventions, Treatment and Addictions Research
http://wiki.knoesis.org/index.php/PREDOSE
PREDOSE: Prescription Drug abuse Online-Surveillance and Epidemiology
Bridging the gap between researcher and policy makers
Early identification of emerging patterns and trends in abuse
PREDOSE: Prescription Drug abuse Online-Surveillance and Epidemiology
• Drug Overdose Problem in US
• 100 people die every day from drug overdoses
• 36,000 drug overdose deaths in 2008
• Close to half were due to prescription drugs
Gil Kerlikowske, Director, ONDCP
Launched May 2011
PREDOSE: Bringing Epidemiologists and Computer Scientists together
Early Identification and Detection of Trends
Access hard-to-reach Populations
Large Data Sample Sizes
Group Therapy: http://www.thefix.com/content/treatment-options-prison90683
Interviews
Online Surveys
Automatic Data Collection
Not Scalable
Manual Effort
Sample Biases
Epidemiologist
Qualitative Coding
Problems
Computer Scientist
Automate Information Extraction & Content Analysis
I was sent home with 5 x 2 mg Suboxones. I also got a bunch of phenobarbital (I took all 180 mg and it didn't do shit except make me a walking zombie for 2 days). I waited 24 hours after my last 2 mg dose of Suboxone and tried injecting 4 mg of the bupe. It gave me a bad headache, for hours, and I almost vomited. I could feel the bupe working but overall the experience sucked.
Of course, junkie that I am, I decided to repeat the experiment. Today, after waiting 48 hours after my last bunk 4 mg injection, I injected 2 mg. There wasn't really any rush to speak of, but after 5 minutes I started to feel pretty damn good. So I injected another 1 mg. That was about half an hour ago. I feel great now.
Codes | Triples (subject-predicate-object)
Suboxone used by injection, negative experience | Suboxone injection-causes-Cephalalgia
Suboxone used by injection, amount | Suboxone injection-dosage amount-2mg
Suboxone used by injection, positive experience | Suboxone injection-has_side_effect-Euphoria
Sentiment Extraction
+ve: feel pretty damn good; feel great
-ve: experience sucked; didn’t do shit; bad headache
[Figure: Diverse data types: entities, triples, dosage, pronoun, interval, route of admin., relationships, sentiments]
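Here is a deliberately simple regex-and-lexicon sketch showing how some of these data types can be pulled out of a post like the one above. It turns a sentence into dosage and interval triples and computes a crude sentiment score. PREDOSE's actual extraction pipeline is described in the paper cited above and is not reproduced here. The patterns, route stems and polarity words below are hypothetical.

```python
import re

# Illustrative patterns only (hypothetical, not PREDOSE's extraction rules)
DOSAGE = re.compile(r"(\d+(?:\.\d+)?)\s*mg\b", re.IGNORECASE)
INTERVAL = re.compile(r"(\d+)\s*(hours?|days?|minutes?)\b", re.IGNORECASE)
ROUTES = {"inject": "injection", "snort": "intranasal", "swallow": "oral"}  # stem -> route
POLARITY = {"great": 1, "good": 1, "sucked": -1, "bad": -1, "headache": -1}


def route_of(sentence):
    """First route of administration whose stem appears in the sentence, if any."""
    text = sentence.lower()
    for stem, route in ROUTES.items():
        if stem in text:
            return route
    return None


def sentiment(sentence):
    """Crude lexicon score: positive words add 1, negative words subtract 1."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(POLARITY.get(w, 0) for w in words)


def to_triples(drug, sentence):
    """(subject, predicate, object) triples for dosages and intervals in one sentence."""
    route = route_of(sentence)
    subject = f"{drug} {route}" if route else drug
    triples = [(subject, "dosage_amount", f"{amt}mg") for amt in DOSAGE.findall(sentence)]
    triples += [(subject, "interval", f"{n} {unit.lower()}")
                for n, unit in INTERVAL.findall(sentence)]
    return triples


s = "Today, after waiting 48 hours after my last bunk 4 mg injection, I injected 2 mg."
print(to_triples("Suboxone", s))
print(sentiment("It gave me a bad headache, for hours, and I almost vomited."))  # -2
```

Rules this shallow miss slang ("bupe" for buprenorphine) and context ("bunk" dose), which is why an ontology of drug names, slang terms and relationships matters for this kind of data.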
• Data processing for personalized healthcare is a lot more than a Big Data processing problem
• It is all about the human – not computing, not devices: help people make better decisions, give them actionable information
– Computing for human experience
• Whatever we do in Smart Data, focus on human-in-the-loop (empowering machine computing!):
– Of Human, By Human, For Human
– But in serving human needs, there is a lot more than what current big data analytics handle – variety, contextual, personalized, subjective, spanning data and knowledge across P-C-S dimensions
• Note:
• For images and sources, if not on slides, please see slide notes
• Some images were taken from Web search results; all such images belong to their respective owners, and we are grateful to the owners for the usefulness of these images in our context.