Life sciences is fast becoming a data problem. In this presentation we explore the challenges faced by scientists wishing to leverage life science and healthcare big data. We demonstrate Qiagram, a collaborative, visual, ad hoc query tool for exploring these large, complex data sets. Using examples from the Adverse Event Reporting Database, MedDRA and SNOMED, we illustrate how scientists with little IT knowledge can mine these data sets and unlock their potential.
• Multiple disparate data sources
• Lack of integration: patient-molecular-clinical-assay-payer
• “Swiss cheese” problem
• Data cleansing/verification/credibility
• Standards for data interchange
• Privacy concerns
• Lack of good tools for cross-domain analytics
• Occurs before in-depth statistics/analytics
• Explore and probe the data
• Determine “what’s interesting”, “what’s relevant”
• Generate hypotheses
• Ensure data is there to support hypotheses
• Lack of a shared language to support collaboration
  – Multidisciplinary data requires domain experts
• Meaningful access to data
  – Sensitive to regulatory/compliance
“The hands-on analytics time to write the SAS code and specify clearly what you need for each hypothesis is very time-consuming.” Felix Frueh, CEO, Medco*
*Miller, K. Big Data Analytics, Biomedical Computation Review, Winter 2011/2012
A single researcher in a silo can often go deep into the data, but may be limited by their domain expertise.
Small groups of researchers may be able to collaborate on asking questions, but can’t go very deep with the tools they have today.
QIAGRAM
Deep collaboration is when multiple groups of researchers can collaborate in asking questions deeper into the layers of data. Shared domain knowledge allows deeper insights.
low level term | pref term | hlt pref term | hlgt pref term | soc term
abdominal migraine | migraine | migraine headaches | headaches | nervous system disorders
acute migraine | migraine | migraine headaches | headaches | nervous system disorders
band-like headache | tension headache | headaches nec | headaches | nervous system disorders
basilar migraine | basilar migraine | migraine headaches | headaches | nervous system disorders
cephalalgia | headache | headaches nec | headaches | nervous system disorders
cephalalgia or cephalgia | headache | headaches nec | headaches | nervous system disorders
cephalgia | headache | headaches nec | headaches | nervous system disorders
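The hierarchy in the rows above can be used to roll reported low level terms up to a coarser level before counting or comparing them. The following is a minimal Python sketch of that idea, not how Qiagram does it: the rows are copied from the slide, while the short level names (`llt`, `pt`, `hlt`, `hlgt`, `soc`) and the `roll_up` helper are assumptions made for illustration.

```python
from collections import defaultdict

# Rows copied from the slide:
# (low level term, pref term, hlt pref term, hlgt pref term, soc term)
MEDDRA_ROWS = [
    ("abdominal migraine", "migraine", "migraine headaches", "headaches", "nervous system disorders"),
    ("acute migraine", "migraine", "migraine headaches", "headaches", "nervous system disorders"),
    ("band-like headache", "tension headache", "headaches nec", "headaches", "nervous system disorders"),
    ("basilar migraine", "basilar migraine", "migraine headaches", "headaches", "nervous system disorders"),
    ("cephalalgia", "headache", "headaches nec", "headaches", "nervous system disorders"),
    ("cephalalgia or cephalgia", "headache", "headaches nec", "headaches", "nervous system disorders"),
    ("cephalgia", "headache", "headaches nec", "headaches", "nervous system disorders"),
]

# Hypothetical short names for the five columns, ordered from finest to coarsest.
LEVELS = ("llt", "pt", "hlt", "hlgt", "soc")


def roll_up(rows, level):
    """Group low level terms under the term found at the requested level."""
    idx = LEVELS.index(level)
    groups = defaultdict(list)
    for row in rows:
        groups[row[idx]].append(row[0])
    return dict(groups)


by_hlt = roll_up(MEDDRA_ROWS, "hlt")
by_pt = roll_up(MEDDRA_ROWS, "pt")
by_soc = roll_up(MEDDRA_ROWS, "soc")

for term, low_level_terms in by_hlt.items():
    print(term, "<-", ", ".join(low_level_terms))
```

Grouping at the `hlt` level, for instance, puts the three migraine variants together, so an analyst searching for migraine-related events does not need to know every spelling used at the lowest level.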
• The Clinical Practice Research Datalink (CPRD) is the new English NHS observational data and interventional research service, jointly funded by the NHS National Institute for Health Research (NIHR) and the Medicines and Healthcare products Regulatory Agency (MHRA).
• 6 large fact tables with 1 B to 2 B rows
• Example query
  – Identify patients with coronary artery disease who
• Several characteristics set MarketScan databases apart from other research databases. The core databases, Commercial, Medicare Supplemental, and Medicaid, are huge – over 170 million patients since 1995.
• Over 25 fact tables, from 100 M up to 1.5 B rows
• Example
  – Identify cancer patients, looking at opiate treatment, and study the duration of the escalation
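To make the cancer and opiate example concrete, here is a small Python sketch using an in-memory SQLite database. Everything in it is an assumption for illustration: the `diagnosis` and `drug_claim` tables, their columns, and the sample rows are hypothetical and are not the MarketScan schema, and "escalation" is taken to mean the number of days from a patient's first opiate fill to the first fill at a higher daily dose.

```python
import sqlite3
from datetime import date

# Hypothetical toy schema and data; the real databases have far more tables and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE diagnosis (patient_id INTEGER, category TEXT);
CREATE TABLE drug_claim (patient_id INTEGER, fill_date TEXT,
                         drug_class TEXT, daily_dose_mg REAL);
INSERT INTO diagnosis VALUES (1, 'cancer'), (2, 'cancer'), (3, 'diabetes');
INSERT INTO drug_claim VALUES
  (1, '2010-01-01', 'opiate', 20),
  (1, '2010-02-01', 'opiate', 20),
  (1, '2010-03-02', 'opiate', 40),
  (2, '2010-05-01', 'opiate', 30),
  (3, '2010-01-01', 'opiate', 10),
  (3, '2010-01-15', 'opiate', 50);
""")


def days_to_first_escalation(conn):
    """Per cancer patient: days from first opiate fill to first higher-dose fill.

    Patients whose dose never rises above the starting dose map to None.
    """
    rows = conn.execute("""
        SELECT c.patient_id, c.fill_date, c.daily_dose_mg
        FROM drug_claim AS c
        WHERE c.drug_class = 'opiate'
          AND c.patient_id IN (SELECT patient_id FROM diagnosis
                               WHERE category = 'cancer')
        ORDER BY c.patient_id, c.fill_date
    """).fetchall()

    first = {}   # patient_id -> (date of first fill, starting dose)
    result = {}  # patient_id -> days until the dose first exceeds the starting dose
    for patient_id, fill_date, dose in rows:
        day = date.fromisoformat(fill_date)
        if patient_id not in first:
            first[patient_id] = (day, dose)
            result[patient_id] = None
        elif result[patient_id] is None and dose > first[patient_id][1]:
            result[patient_id] = (day - first[patient_id][0]).days
    return result


escalation = days_to_first_escalation(conn)
print(escalation)
```

Even this toy version needs a join condition, a filter, an ordering and a definition of escalation to be spelled out in code, which is the kind of effort a visual query tool aims to take off the researcher.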