Content-Mining for Clinical Trials
Peter Murray-Rust
contentmine.org
Cochrane UK, Oxford, 2015-03-16
• OPEN Platform for Machines+humans to automatically “read” the trials literature
• Grow communities and give everyone the tools and know-how to mine trials
• 09:30 - Introductions
• 10:00 - Overview of ContentMine
• 10:30 - Discussion: why might content mining clinical trials be useful?
• 11:00 - Tea/coffee break
• 11:15 - Discussion: current tools and what is needed
• 12:00 - Discussion: imagining the clinical trials mining pipeline
• 12:30 - Lunch
• 13:30 - Demo and introduction to software
• 14:30 - Technical session 1 (hands-on content mining)
• 15:30 - Tea/coffee break
• 15:45 - Technical session 2 (hands-on content mining)
• 17:00 - Event close
Background for Today
• ContentMine aims to make large areas of scientific fact OPEN (100 million facts/year)
• We’re working with Wellcome Trust, Europe PubMed Central, etc.
• A politically “hot” area (Hargreaves legislation, EU activity)
• A week ago Wellcome Trust workshop on TDM and Neuroscience; “rough consensus” on what was needed.
• In the last few days we’ve prototyped what we think is a good starting point…
• NOTE: The software is very “bleeding edge”! Please treat in a spirit of adventure!!
• Vision/enthusiasm from Amy Price, Anna Noel-Storr, Emily Sena (E’burgh) and yourselves!
Questions we could tackle
• How do we find (mentions of) clinical trials?
• Is a document a (clinical) trial?
• What is the subject of the trial?
• What is the methodology used?
• Does the design and practice conform to CONSORT?
• What are the outcomes?
• Can we extract specific re-usable information?
• Who is involved? (researchers, sponsors, patients?)
• Has a proposed trial been completed and reported?
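One way to start on the first question (finding mentions of trials) is to look for registry identifiers in the text. The sketch below is only an illustration, not part of the ContentMine software. It assumes two identifier shapes: ClinicalTrials.gov ("NCT" plus 8 digits) and ISRCTN ("ISRCTN" plus 8 digits). The function name and the sample identifiers are made up.

```python
import re

# Assumed identifier shapes: "NCT" + 8 digits (ClinicalTrials.gov)
# and "ISRCTN" + 8 digits (ISRCTN registry).
TRIAL_ID = re.compile(r"\b(NCT\d{8}|ISRCTN\d{8})\b")


def find_trial_ids(text):
    """Return registry identifiers found in text, in order, without duplicates."""
    seen = []
    for match in TRIAL_ID.finditer(text):
        if match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


# Made-up identifiers, for illustration only.
sample = (
    "Trial registration: ISRCTN12345678. "
    "The protocol was also registered as NCT01234567 (see NCT01234567)."
)
print(find_trial_ids(sample))
```

A document that mentions a registry identifier is not necessarily a trial report, so this only gives candidates for the "is a document a trial?" question.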
Afternoon session
• Work in groups; mixture of skills and experience
• Take different sections of CONSORT
• Scrape articles from trialsjournal.com
• Explore word frequency – create your own lists of frequent words
• Design regexes to extract CONSORT 8a->11
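The word-frequency and regex exercises can be tried in a few lines of Python. This is a rough sketch for the hands-on session, not the workshop software: the stop-word list, the phrase list for CONSORT item 8a (random sequence generation) and the sample sentences are all assumptions to be replaced with your own.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; extend it for real use.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in",
              "was", "were", "by", "using", "with"}


def word_frequencies(text):
    """Count lower-cased words (hyphenated words kept whole), skipping stop words."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


# Assumed phrasings that may signal CONSORT item 8a (sequence generation).
SEQUENCE_GENERATION = re.compile(
    r"(computer[- ]generated|random[- ]number table|block randomi[sz]ation)",
    re.IGNORECASE,
)

# Made-up methods text, for illustration only.
methods = (
    "Participants were randomised using a computer-generated list. "
    "Block randomisation with blocks of four was used."
)
frequencies = word_frequencies(methods)
matches = [m.group(1) for m in SEQUENCE_GENERATION.finditer(methods)]
print(frequencies.most_common(5))
print(matches)
```

The frequent words from real articles are a good source of new phrases to add to the regex.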
2014-May->Nov
• Budapest/Shuttleworth
• Leicester Univ
• Electronic Theses and Dissertations
• Austrian Science Fund AT
• OKFest DE
• Eur. Bioinformatics Institute
• Open Science Rio de Janeiro BR
• SciDataCon, Delhi IN
• Univ of Chicago US
• OpenCon 2014, Wash DC. US
• JISC, London
Upcoming
• LIBER
• Cochrane
• BL
• Wellcome Trust (April)
• WHO
Collaborators
• Wikimedia/Wikidata
• Mozilla
• Open Knowledge
• LIBER (European Research Libraries)
• British Library
• Wellcome Trust
• EBI (Eur. Bioinf. Inst.)
• JISC
• Open Access Button
• SPARC
• Creative Commons
• CORE
• Europe PubMed Central
• CRAWL the web for scientific documents (articles, grey literature, repositories)
• quickSCRAPE pages (text, graphics, images, data)
• NORMA-lize page to semantic form
• MINE pages with your methods and tools (AMI)
• CAT-alogue results in searchable index
• Automate daily process (CANARY)
…Open semantic science …
Facts Marked by “non-scientists” in ContentMine workshops
With Wikipedia everyone can be a scientist
“Nuggets” in a scientific paper (figure labels: quantity, units, value ranges, chemical, project, places)
Humans aren’t designed to mine this …
Advanced Plugins
http://chemicaltagger.ch.cam.ac.uk/
Typical chemical synthesis
Open Content Mining of FACTs
Machines can interpret chemical reactions
We have done 500,000 patents. There are > 3,000,000 reactions/year. Added value > 1B Eur.
Automatic extraction from a plot (figure labels): UNITS, TICKS, QUANTITY, SCALE, TITLES, DATA!! (2000+ points)
VECTOR PDF, Dumb PDF, CSV, Semantic Spectrum
Smoothing (Gaussian filter), 2nd derivative
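Once the data points are out of the PDF, the two processing steps named on the slide (Gaussian smoothing, then a second derivative) can be sketched in plain Python. This is not AMI's implementation. The function names, the kernel width, the edge handling (end points repeated) and the synthetic spectrum are all assumptions.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalised 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values, sigma=1.0, radius=3):
    """Gaussian-smooth values; edges are handled by repeating the end points."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in zip(range(-radius, radius + 1), kernel):
            j = min(max(i + k, 0), n - 1)  # clamp the index at both edges
            acc += w * values[j]
        out.append(acc)
    return out


def second_derivative(values, step=1.0):
    """Central-difference second derivative at the interior points."""
    return [
        (values[i - 1] - 2.0 * values[i] + values[i + 1]) / (step * step)
        for i in range(1, len(values) - 1)
    ]


# Made-up spectrum: a single peak centred on x = 10.
spectrum = [math.exp(-((x - 10) ** 2) / 8.0) for x in range(21)]
curvature = second_derivative(smooth(spectrum))
# The most negative curvature marks the peak; +1 because interior points start at index 1.
peak_index = curvature.index(min(curvature)) + 1
print(peak_index)
```

Smoothing first matters because differentiating twice amplifies noise in the extracted points.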
AMI https://bitbucket.org/petermr/xhtml2stm/wiki/Home
Example reaction scheme, taken from MDPI Metabolites 2012, 2, 100-133; page 8, CC-BY:
AMI reads the complete diagram, recognizes the paths and generates the molecules. Then she creates a stop-frame animation showing how the 12 reactions lead into each other.