Data-driven visualization of drug interactions
Adverse Drug Events
• Almost 1 million deaths/injuries each year in the US[1]
• Some fraction of ADEs are caused by previously unknown drug-drug interactions
  • Clinical trials aren’t large enough to detect many potential interactions
• FDA, WHO, pharmaceutical companies maintain databases of reported[2] ADEs
  • You can download a sample of the FDA data from the Adverse Event Reporting System website[3]
• We can analyze the reported data to identify suspicious drug interactions
Copyright 2011 Cloudera Inc. All rights reserved
Challenges in Analyzing Adverse Drug Events
• Biased Sample
  • Adverse event reporting is voluntary
  • We don’t see reports for patients who took the drugs and had no adverse event
• Correlation != Causation
  • No controlled trials; some correlations are coincidences
• Requires Advanced Statistical Modeling Skills
  • Multi-item Gamma Poisson Shrinkage Estimator is used to score the significance of a drug interaction
  • The model is too complex to solve directly, so we use Expectation Maximization (EM) to estimate its parameters
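To illustrate the shrinkage idea behind the estimator named above, here is a minimal Python sketch. It is a deliberate simplification and not the model used for these slides: it assumes a single gamma prior with hand-picked hyperparameters (`alpha`, `beta`), whereas the multi-item estimator fits its prior parameters from the data (the step the slides assign to EM). The function name and the example counts are hypothetical.

```python
def shrunken_ratio(observed, expected, alpha, beta):
    """Posterior mean of the reporting ratio lambda.

    Assumes observed ~ Poisson(lambda * expected) and a single
    lambda ~ Gamma(shape=alpha, rate=beta) prior (a simplification).
    The posterior is Gamma(alpha + observed, beta + expected).
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (observed + alpha) / (expected + beta)


# Hypothetical prior with mean alpha / beta = 1, i.e. "no interaction".
ALPHA, BETA = 2.0, 2.0

# Two buckets with the same raw observed/expected ratio of 20.
rare = shrunken_ratio(observed=2, expected=0.1, alpha=ALPHA, beta=BETA)
common = shrunken_ratio(observed=200, expected=10.0, alpha=ALPHA, beta=BETA)

print(f"rare bucket:   {rare:.2f}")    # about 1.90, pulled strongly toward 1
print(f"common bucket: {common:.2f}")  # about 16.83, backed by more reports
```

The point of the sketch is that two buckets with the same raw ratio end up with very different scores: the one resting on two reports is pulled back toward the prior, while the one resting on two hundred reports keeps most of its signal.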
The Hard Problem: Counting
• It is a “small” data problem…
  • 250,000+ events reported to the FDA annually
• …that explodes when we consider:
  • Multi-drug, multi-symptom interactions
  • Analyzed by strata (e.g., month of report, patient age, patient gender, etc.)
  • ~1 million reports => ~360 million buckets
• Analysts typically filter the data to only consider a few adverse reactions at a time…
• …but that is not the way of the data scientist
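The bucket explosion described above can be sketched in a few lines of Python. This is only an illustration on toy data: the record layout, field names, and the choice of strata (sex and age band) are assumptions, not the FDA file format or the pipeline behind these slides.

```python
from collections import Counter
from itertools import combinations

# Hypothetical reports; field names are illustrative assumptions.
reports = [
    {"drugs": ["A", "B", "C"], "reactions": ["nausea", "rash"],
     "sex": "F", "age_band": "40-59"},
    {"drugs": ["A", "B"], "reactions": ["nausea"],
     "sex": "F", "age_band": "40-59"},
    {"drugs": ["B", "C"], "reactions": ["rash"],
     "sex": "M", "age_band": "60+"},
]


def bucket_keys(report):
    """Yield one (drug, drug, reaction, stratum) key per combination."""
    stratum = (report["sex"], report["age_band"])
    # Sort so that (A, B) and (B, A) map to the same bucket.
    for d1, d2 in combinations(sorted(set(report["drugs"])), 2):
        for reaction in sorted(set(report["reactions"])):
            yield (d1, d2, reaction, stratum)


counts = Counter(key for report in reports for key in bucket_keys(report))

print(len(reports), "reports ->", len(counts), "distinct buckets")
for key, n in sorted(counts.items()):
    print(key, n)
```

Even here, a single report with three drugs and two reactions emits six keys, and the same drug pair and reaction land in different buckets when the stratum differs. Scaling that fan-out to the full report volume is what makes the counting step the expensive one.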
Solving the Hard Problem
• MapReduce on Hadoop
  • 20 MapReduce jobs
  • Filter, aggregate, join, aggregate again
  • Model the resulting data in R
  • Use MapReduce to apply the model parameters to the data, score each drug-drug interaction, and then filter the data to obtain the highest scoring interactions
• Visualizing the Results
  • Even applying a restrictive filter on the scores, we end up with 20,000+ statistically significant drug-drug-reaction triples
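One plausible way to get from scored drug-drug-reaction triples to something that can be drawn as a graph is to filter by score and collapse the triples into drug-drug edges. The Python sketch below shows that idea on toy data; the tuple layout, the drug and reaction names, and the `min_score` threshold are all hypothetical, and the slides do not say how the graph was actually built.

```python
from collections import defaultdict

# Hypothetical scored triples: (drug_1, drug_2, reaction, score).
triples = [
    ("A", "B", "nausea", 4.2),
    ("B", "A", "rash", 2.5),
    ("B", "C", "rash", 3.1),
    ("C", "D", "fatigue", 1.2),
]


def build_graph(scored_triples, min_score):
    """Map each undirected drug-drug edge to its high-scoring reactions."""
    edges = defaultdict(list)
    for d1, d2, reaction, score in scored_triples:
        if score >= min_score:
            # Treat the pair as unordered so (A, B) and (B, A) share an edge.
            edge = tuple(sorted((d1, d2)))
            edges[edge].append(reaction)
    return dict(edges)


graph = build_graph(triples, min_score=2.0)

for (d1, d2), reactions in sorted(graph.items()):
    print(f"{d1} -- {d2}: {', '.join(reactions)}")
```

Drugs become nodes, each surviving pair becomes one edge, and the reactions ride along as edge labels, so many triples for the same pair collapse into a single line on the plot.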
The Drug-Drug Interaction Graph
HIV Medications
Cancer Medications
Exploring the Graph
Bridges Between Dense Clusters
Acknowledgments and References
• Thanks to Josh Wills, Director of Data Science at Cloudera, for the data collection and analysis shown here.
• References:
  • [1] ADE instances/year: http://www.ahrq.gov/qual/aderia/aderia.htm
  • [2] AERS reporting site: http://www.ahrq.gov/qual/aderia/aderia.htm
  • [3] Download ADE instance data: http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/ucm082193.htm
• Other resources:
  • http://www.cloudera.com/blog
  • http://wiki.cloudera.com/