Towards Evidence-Based Discovery Informatics Tools for Synthesis Guest Speaker : Tim Cary Catherine Blake School of Information and Library Science University of North Carolina at Chapel Hill http://www.ils.unc.edu/~cablake [email protected]
Jan 03, 2016
Systematic Review Process
– Formulate the problem
– Locate and select studies
– Assess quality of studies
– Collect data
– Analyze and present results
– Interpret results
– Improve and update review
28 months from initial idea to publication
Increased demand due to evidence-based medicine
[Diagram labels: Iteration, Collaboration, Analysis, Extraction, Context Information, Hypothesis Projection, Retrieval, Corpus, MEDLINE, Embase, Verification, Facts]
Manual Synthesis
Select, Extract, Analyze, Verify
Guesswork guided by scientifically trained intuition
Rescher (1978)
Cochrane - RevMan
• Review Manager (RevMan) is the software used for preparing and maintaining Cochrane reviews.
• You can use RevMan for protocols and full reviews. It is most useful when you have formulated the question for the review, and allows you to prepare the text, build the tables showing the characteristics of studies and the comparisons in the review, and add study data. It can perform meta-analyses and present the results graphically.
• Source: http://www.cc-ims.net/RevMan
Cochrane - GRADEpro
• GRADEpro (GRADEprofiler) is the software used to create Summary of Findings (SoF) tables in Cochrane systematic reviews. It can retrieve data of the systematic review and meta-analyses from a Review Manager 5 file, combine these data with user-entered data, and then export a Summary of Findings table ready for import into Review Manager 5. It performs many of the calculations necessary to present the key results of systematic reviews in a table format and guides users through the process of grading the quality of the evidence using the GRADE approach.
• Source: http://www.cc-ims.net/gradepro
Reporting Guidelines
• CONSORT - reporting of RCTs
• PRISMA (formerly QUOROM) - preferred reporting items for systematic reviews and meta-analyses
• STROBE - reporting of observational studies in epidemiology
• EQUATOR Network - collection of reporting guidelines
• Source: http://www.cochrane.org/index_authors_researchers.htm
Selection Step
• Typical information retrieval framing
– Input: MEDLINE
– Output: Articles included in previous studies
– Goal: find weighting schemes that retrieve only the articles included in a traditional analysis
• Examples
– Cohen AM, Hersh WR, Peterson K, Yen PY. Reducing Workload in Systematic Review Preparation Using Automated Citation Classification. JAMIA 2006;13(2):206-219.
– Demner-Fushman D, Seckman C, Fisher C, Hauser S, Clayton J, Thoma G. Prototype System To Support Evidence-based Practice. AMIA Annu Symp Proc. November 2008:151-5.
Context Information
• Study Information
– e.g. date, location, ...
• Population Information
– e.g. gender, age, ...
• Risk Factor or Intervention
– e.g. duration of exposure, confounders
• Disease
– e.g. stage, confounders
Loosely coupled to review focus
Tightly coupled to review focus
[Diagram labels: Iteration, Collaboration, External Data, Analysis, Extraction, Context Information, Hypothesis Projection, Retrieval, Corpus, MEDLINE, Embase, Verification, Facts]
Collaborative Information Synthesis
Key: Estimate Missing Information
What are people with Breast Cancer exposed to?
What are people in a similar population exposed to?
Are these rates significantly different?
Studies with Breast Cancer patients
Database of risk factors: BRFSS
Facts for each study
• number of patients
• age of patients
• geographic location
• risk-factor exposure …
Codebook
• question asked
• age, gender
• % responses
T. Tengs & N. D. Osgood (2001) “The link between smoking and Impotence: Two Decades of Evidence”, Preventive Medicine, 32:447-52
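The third question above, whether the exposure rate in the study population differs significantly from the rate in a comparable survey population, can be illustrated with a standard two-proportion z-test. This is a minimal sketch, not the method used in the talk: the function name and all counts below are hypothetical, and the slides do not say which test was applied.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for H0: p1 == p2, using the pooled proportion.

    x1 of n1 subjects are exposed in group 1 (e.g. study patients);
    x2 of n2 are exposed in group 2 (e.g. survey respondents).
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 45 of 150 patients report smoking,
    # versus 2300 of 10000 respondents in a matched survey stratum.
    z, p = two_proportion_z_test(45, 150, 2300, 10000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

The normal approximation is reasonable only when each group has enough exposed and unexposed subjects; for small studies an exact test would be a safer choice.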
More than Automated Meta-Analysis
[Diagram: Systematic Review and External database. Key: Entire study, Main topic, Secondary Information]
Information Synthesis
• Traditional analysis
– same study design
– medicine = RCT
– epidemiology = cohort
• Information Synthesis
– any study that includes required information
– augment missing information
[Diagram labels: Education, Discovery Science, Evidence-based Practice, Natural Language Processing, Human Discovery and Synthesis, Human-assisted Discovery and Synthesis, Heterogeneous Literature, Core, Chemistry, Breast Cancer, Genomics, Synthesis and Discovery Work Practices, News, DocSouth]
METIS Information Extractor
• Semantic Grammar
• Features: words, numbers, and semantic types in the Unified Medical Language System (UMLS)
• Information extracted:
– risk factor exposure (tobacco and alcohol)
– gender
– age (min, max, mean)
– start and end dates
– number of subjects with medical condition
– geographical location
{term;'age'} {term;'of'} {number;10<n2<110} {term;'to'} {number;10<n2<110}
The age of breast cancer subjects ranged between 20 to 64 years old.
{semantic type: neoplastic process, or disease}
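The age-range rule above can be approximated with a regular expression. This is a rough sketch only: METIS is described as using a semantic grammar over UMLS semantic types, which a regex does not reproduce, and the names `AGE_RANGE` and `extract_age_range` are hypothetical. The only detail taken from the slide is the constraint that each number lies strictly between 10 and 110.

```python
import re

# The word "age", then any text, then two numbers joined by "to", "-" or "and".
AGE_RANGE = re.compile(
    r"\bage\b.*?\b(\d{1,3})\s*(?:to|-|and)\s*(\d{1,3})\b",
    re.IGNORECASE,
)


def extract_age_range(sentence):
    """Return (min_age, max_age) if the sentence states a plausible range, else None."""
    match = AGE_RANGE.search(sentence)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    # Same bounds as the slide's pattern: 10 < n < 110 for both numbers.
    if 10 < low < 110 and 10 < high < 110 and low <= high:
        return low, high
    return None


if __name__ == "__main__":
    text = "The age of breast cancer subjects ranged between 20 to 64 years old."
    print(extract_age_range(text))  # (20, 64)
```

A purely lexical rule like this cannot tell whose age is being reported; the semantic-type constraint on the slide (neoplastic process or disease) is what ties the range to the patient group.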
METIS Info Extractor – Evaluation
• Diverse text corpus
– epidemiology, surgery, biology, ...
– cohort studies, case-control trials, ...
• Evaluation
– Metrics (precision, recall)
– Annotators (developer, domain expert, expert annotator, novice)
– Primary topic (breast cancer, impotence)
– Secondary information (tobacco and alcohol consumption)
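For reference, the two metrics listed above can be computed by comparing the facts an extractor produced against an annotator's gold-standard facts. The sketch below assumes facts are represented as (field, value) pairs and that a fact counts as correct only on an exact match; both are assumptions for illustration, and the data is made up.

```python
def precision_recall(extracted, gold):
    """Compute (precision, recall) of extracted facts against gold facts.

    precision = correct extracted facts / all extracted facts
    recall    = correct extracted facts / all gold facts
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical facts for a single article.
    extracted = [("age_min", 20), ("age_max", 64), ("gender", "female"), ("n_subjects", 150)]
    gold = [("age_min", 20), ("age_max", 64), ("gender", "female"),
            ("location", "USA"), ("n_subjects", 120)]
    print(precision_recall(extracted, gold))  # (0.75, 0.6)
```

Three of the four extracted facts are correct (precision 0.75), and three of the five gold facts were found (recall 0.6).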
METIS Info Extractor – Recall
[Figure: Recall (0.0 to 1.0) by Rank (1 to 5) for four annotator groups: Development, Domain Expert, Expert Annotator, Novice Annotator]
METIS Info Extractor – Precision
[Figure: Precision (0.0 to 1.0) by Rank (1 to 5) for four annotator groups: Development, Domain Expert, Expert Annotator, Novice Annotator]
Verify information extracted
Electronic version of article
Converted Article
METIS Verifier
METIS Analyzer
• Meta-Analysis
– Developed for agricultural application
– Requires empirical studies with a quantitative outcome
– Unit of study is an article, not a person
– Result: a unitless metric called an effect size
• Two common meta-analysis techniques
– Fixed-effects model
– Random-effects model
Evaluation: compared generated effect sizes with examples in textbooks and published articles.
Result: same effect size
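The two pooling techniques named above can be sketched as follows. This is a generic illustration, not the METIS Analyzer's code: it assumes each article contributes one effect estimate with a known variance, uses inverse-variance weighting for the fixed-effects model, and uses the DerSimonian-Laird estimator of between-study variance for the random-effects model. All function names and numbers are hypothetical.

```python
def fixed_effect(effects, variances):
    """Inverse-variance pooled effect. Returns (pooled_effect, pooled_variance)."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, 1.0 / total


def random_effects(effects, variances):
    """DerSimonian-Laird pooled effect. Returns (pooled_effect, pooled_variance, tau2)."""
    weights = [1.0 / v for v in variances]
    fixed, _ = fixed_effect(effects, variances)
    # Cochran's Q measures how far the studies scatter around the fixed estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(star, effects)) / sum(star)
    return pooled, 1.0 / sum(star), tau2


if __name__ == "__main__":
    # Hypothetical effect sizes and variances from three articles.
    effects, variances = [0.2, 0.5, 0.8], [0.04, 0.04, 0.04]
    print(fixed_effect(effects, variances))
    print(random_effects(effects, variances))
```

When the studies agree closely, the estimated between-study variance is zero and the two models give the same pooled effect; when they disagree, the random-effects model reports a wider variance for the pooled estimate.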
Synthetic Estimate Evaluation
[Figure: two panels, Tobacco Consumption and Alcohol Consumption. x-axis: Article Identifier (1 to 4, plus Average); y-axis: Control Rate (0 to 1); series: Actual, Estimated]