Improving Transparency and Reproducibility of Biomedical Research Using Semantic Technologies Mark Wilkinson World Research & Innovation Congress, Brussels, 2013 Isaac Peral Senior Researcher in Biological Informatics Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain Adjunct Professor of Medical Genetics, University of British Columbia Vancouver, BC, Canada.
Improving Transparency and Reproducibility
of Biomedical Research Using Semantic Technologies
Mark Wilkinson
World Research & Innovation Congress, Brussels, 2013
Isaac Peral Senior Researcher in Biological Informatics, Centro de Biotecnología y Genómica de Plantas, UPM, Madrid, Spain
Adjunct Professor of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
Making the Web a biomedical research platform
from hypothesis through to publication
Publication
Discourse
Hypothesis
Experiment
Interpretation
Motivation:
3 intersecting trends in the Life Sciences
that are now, or soon will be, extremely problematic
NON-REPRODUCIBLE SCIENCE & THE FAILURE OF PEER REVIEW
TREND #1
Trend #1
Multiple recent surveys of high-throughput biology
- Huang & Gottardo, Briefings in Bioinformatics, 2012
Trend #1
“the most common errors are simple, the most simple errors are common”
At least partially because the analytical methodology was inappropriate
and/or not sufficiently described
- Baggerly, 2009
Trend #1
These errors pass peer review
The researcher is (sometimes) unaware of the error
The process that led to the error is not recorded
Therefore it cannot be detected during peer-review
Agencies have Noticed!
In March 2012, the US Institute of Medicine said, in effect:
“Enough is enough!”
Agencies have Noticed!
Institute of Medicine Recommendations for Conduct of High-Throughput Research:
Evolution of Translational Omics: Lessons Learned and the Path Forward. The Institute of Medicine of the National Academies, Report Brief, March 2012.
1. Rigorously-described, -annotated, and -followed data management and manipulation procedures
2. “Lock down” the computational analysis pipeline once it has been selected
3. Publish the analytical workflow in a formal manner, together with the full starting and result datasets
BIGGER, CHEAPER DATA
TREND #2
Trend #2
High-throughput technologies are becoming cheaper and easier to use
But there are still very few experts trained in statistical analysis of high-throughput data
Trend #2
Therefore
Even small, moderately-funded laboratories can now afford to produce more data
than they can manage or interpret
“THE SINGULARITY”
TREND #3
The Healthcare Singularity and the Age of Semantic Medicine, Michael Gillam et al., in The Fourth Paradigm: Data-Intensive Scientific Discovery, Tony Hey (Editor), 2009
Slide adapted with permission from Joanne Luciano, presentation at Health Web Science Workshop 2012, Evanston, IL, USA, June 22, 2012.
“The Singularity”
The X-intercept is where, the moment a discovery is made, it is immediately put into practice
We have not over-trivialized the problem of interpreting clinical data...
Measurement Units
One example of the “little ways” that Semantics will help clinical researchers
day-by-day
Units must be harmonized
Don’t leave this up to the researcher (it’s fiddly, time-consuming, and error-prone)
NASA Mars Climate Orbiter
Oops!
ID   HEIGHT  WEIGHT  SBP   CHOL  HDL  BMI GR  SBP GR  CHOL GR  HDL GR
pt1  1.82    177     128   227   55   0       0       1        0
pt2  179     196     13.4  5.9   1.7  1       0       1        0
The Chaos of Real-world Clinical Datasets (this is a snapshot of an actual dataset we worked on)
Height in m and cm
Chol in mmol/l and mg/dl
...and other delicious weirdness
GOAL: get the clinical researcher “out of the loop” once the data is collected
(as per the Institute of Medicine Recommendations)
Semantically defining clinical phenotypes; building on the expertise of others
SystolicBloodPressure =
GALEN:SystolicBloodPressure and ("sio:has measurement value" some ("sio:measurement" and ("sio:has unit" some "om:unit of measure") and ("om:dimension" value "om:pressure or stress dimension") and ("sio:has value" some rdfs:Literal)))
Very general definition: “some kind of pressure unit”
(so that others can build on this as they wish!)
HighRiskSystolicBloodPressure (as defined by Framingham)
SystolicBloodPressure and (sio:hasMeasurement some (sio:Measurement and ("sio:has unit" value om:kilopascal) and (sio:hasValue some double[>= "18.7"^^double])))
Now we are specific to our clinical study: the value MUST be in kilopascal and must be >= 18.7
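The class definition above amounts to two restrictions that must both hold: the unit is kilopascal and the value is at least 18.7. A minimal Python sketch of that logic follows. The `Measurement` record type and the function name are hypothetical stand-ins, not part of SIO or OM; only the unit and the threshold come from the slide.

```python
from dataclasses import dataclass

KILOPASCAL = "om:kilopascal"   # unit named in the class definition
HIGH_RISK_SBP_KPA = 18.7       # threshold from the class definition


@dataclass(frozen=True)
class Measurement:
    """Hypothetical stand-in for an sio:Measurement carrying a value and a unit."""
    value: float
    unit: str


def is_high_risk_sbp(m: Measurement) -> bool:
    # Both restrictions must hold: the unit restriction and the value restriction.
    # A reading in any other unit does not qualify until it has been converted.
    return m.unit == KILOPASCAL and m.value >= HIGH_RISK_SBP_KPA
```

Unlike this closed yes/no check, an OWL reasoner works under the open-world assumption: a reading recorded in mmHg is simply not classified into the class, rather than being declared low-risk.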
Semantically defining clinical phenotypes; building on the expertise of others
SELECT ?record ?convertedvalue ?convertedunit
FROM <./patient.rdf>
WHERE {
RecordID  Start Val  Start Unit  Pressure  End Unit
Pt1       15         cmHg        19.998    KiloPascal
Pt2       14.6       cmHg        19.465    KiloPascal
Pt1       148        mmHg        19.731    KiloPascal
Pt2       146        mmHg        19.465    KiloPascal
Running the Clinical Analysis
All measurements have now been automatically harmonized to KiloPascal, because we encoded the semantics in the model
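The arithmetic behind that harmonization can be sketched in a few lines of Python. This is only an illustration of the unit conversion, not the semantic machinery used in the study: the lookup table and function name are hypothetical, and the factor 1 mmHg = 0.133322 kPa is the standard conversion rather than something stated on the slide.

```python
# Conversion factors to kilopascal. The keys are illustrative unit labels,
# not identifiers from the OM ontology.
KPA_PER_UNIT = {
    "mmHg": 0.133322,
    "cmHg": 1.33322,
    "KiloPascal": 1.0,
}


def to_kilopascal(value: float, unit: str) -> float:
    """Convert a pressure reading to kilopascal, failing loudly on unknown units."""
    try:
        return value * KPA_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"no conversion defined for unit {unit!r}") from None


# The four readings shown in the results table.
records = [("Pt1", 15, "cmHg"), ("Pt2", 14.6, "cmHg"),
           ("Pt1", 148, "mmHg"), ("Pt2", 146, "mmHg")]

for record_id, value, unit in records:
    print(record_id, value, unit, round(to_kilopascal(value, unit), 3), "KiloPascal")
```

The converted values agree with the table up to a difference in the last decimal place. Raising an error for an unknown unit, instead of passing the number through unchanged, is a deliberate choice: a silent pass-through is how mixed-unit errors slip into an analysis.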
Visual inspection of our output data and the AHA guidelines
showed that in many cases the clinician
“tweaked” the guidelines when doing their own analysis
------------------
AHA BMI risk threshold: BMI = 25
In our dataset the clinical researcher used: BMI = 26
------------------
AHA HDL guideline: HDL <= 1.03 mmol/l
In our dataset the clinical researcher used: HDL <= 0.89 mmol/l
------------------
These Alterations Were Not Recorded in Their Study Notes!
Adjusting our Semantic definitions and re-running the analysis resulted in nearly 100% correspondence with the clinical researcher
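One way to picture this comparison is as an agreement score between the model's classification and the researcher's own labels, computed once per candidate threshold. The sketch below is a rough illustration only. The BMI values and labels are invented toy data, not taken from the study, and treating "BMI = 25" as "at risk when BMI >= 25" is an assumption. Only the two thresholds (25 and 26) come from the slides.

```python
def agreement(bmis, researcher_labels, threshold):
    """Fraction of records where `bmi >= threshold` matches the researcher's label."""
    model_labels = [bmi >= threshold for bmi in bmis]
    matches = sum(m == r for m, r in zip(model_labels, researcher_labels))
    return matches / len(bmis)


# Toy values invented for illustration; they are NOT from the study.
bmis = [22.0, 25.4, 25.8, 26.3, 31.0]
# Labels as a researcher applying an unrecorded BMI >= 26 cut-off would assign them.
researcher_labels = [False, False, False, True, True]

for name, threshold in [("AHA guideline", 25.0), ("researcher's cut-off", 26.0)]:
    print(f"{name}: {agreement(bmis, researcher_labels, threshold):.0%} agreement")
```

A low score under the published guideline and a high score under the adjusted threshold is the kind of signal that exposes an unrecorded change to the guideline.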
HighRiskCholesterolRecord=
PatientRecord and (sio:hasAttribute some (cardio:SerumCholesterolConcentration and sio:hasMeasurement some ( sio:Measurement and (sio:hasUnit value cardio:mili-mole-per-liter) and (sio:hasValue some double[>= 5.0]))))
HighRiskCholesterolRecord=
PatientRecord and (sio:hasAttribute some (cardio:SerumCholesterolConcentration and sio:hasMeasurement some ( sio:Measurement and (sio:hasUnit value cardio:mili-mole-per-liter) and (sio:hasValue some double[>= 5.2]))))
Reflect on this for a second... Because this is important!
1. We semantically encoded clinical guidelines
2. We found that clinical researchers did not follow the official guidelines
3. Their “personalization” of the guidelines was unreported
4. Nevertheless, we were able to create “personalized” Semantic Models
5. These reflect the opinion of an individual domain-expert
6. These models are shared on the Web
7. Can be automatically re-used by others to interpret their own data using that clinical expert’s viewpoint
AHA:HighRiskCholesterolRecord
PatientRecord and (sio:hasAttribute some (cardio:SerumCholesterolConcentration and sio:hasMeasurement some ( sio:Measurement and (sio:hasUnit value cardio:mili-mole-per-liter) and (sio:hasValue some double[>= 5.0]))))
McManus:HighRiskCholesterolRecord
PatientRecord and (sio:hasAttribute some (cardio:SerumCholesterolConcentration and sio:hasMeasurement some ( sio:Measurement and (sio:hasUnit value cardio:mili-mole-per-liter) and (sio:hasValue some double[>= 5.2]))))
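The two definitions differ only in their threshold, so the idea of choosing an expert's viewpoint can be sketched as a lookup by name. This is a simplified illustration: the dictionary is a hypothetical stand-in for OWL models published on the Web, and the function name is invented. The thresholds 5.0 and 5.2 mmol/l come from the definitions above.

```python
# Thresholds in mmol/l taken from the two class definitions above.
HIGH_RISK_CHOLESTEROL_MMOL_L = {
    "AHA": 5.0,
    "McManus": 5.2,
}


def high_risk_cholesterol(value_mmol_l: float, viewpoint: str) -> bool:
    """Classify one serum cholesterol value under a named expert's definition."""
    return value_mmol_l >= HIGH_RISK_CHOLESTEROL_MMOL_L[viewpoint]


# A value between the two thresholds is where the viewpoints disagree.
borderline = 5.1
for viewpoint in HIGH_RISK_CHOLESTEROL_MMOL_L:
    print(viewpoint, high_risk_cholesterol(borderline, viewpoint))
```

Because the viewpoint is an explicit, named parameter, anyone re-running the analysis can state exactly whose definition produced a given classification.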
PREFIX AHA: <http://americanheart.org/measurements/>