Applying Semantic Web Standards to Drug Discovery and Development Eric Neumann W3C HCLS co-chair
Mar 27, 2015
2
Knowledge
“--is the human acquired capacity (both potential and actual) to take effective action in varied and uncertain situations.”
How does this translate into using Information Systems better in support of Innovation?
3
Knowledge Predictiveness
• Knowledge of Target Mechanisms
• Knowledge of Toxicity
• Knowledge of Patient-Drug Profiles
4
Where Information Advances are Most Needed
• Supporting Innovative Applications in R&D
  – Mol Diagnostics (Biomarkers)
  – Molecular Mechanisms (Systems)
  – Data Provenance, Rich Annotation
• Clinical Information
  – eHealth Records + EDC
  – Clinical Submission Documents
  – Safety Information, Pharmacovigilance, Adverse Events
  – Handling Biomarker evidence
• Standards
  – Central Data Sources
    • Genomics, Diseases, Chemistry, Toxicology
  – MetaData
    • Ontologies
    • Vocabularies
5
Decision Support
Translational Research
Tox
New Applications
Safety
Target Validation
Biomarker Qualification
GO
BioPAX
ICH
Raw Data
MAGE-ML
ASN.1
XLS
PSI XML
CSV
SAS Tables
CDISC
Semantic Bridge
6
Losing Connectedness in Tables
Genes
Tissues
?
Fast Uptake and ease of use, but loose binding to entities and terms
7
Data Integration?
• Querying Databases is not sufficient
• Data needs to include the Context of Local Scientists
• Concepts and Vocabulary need to be associated
• More about Sociology than Technology
Information Knowledge
8
Data Integration: Biology Requirements
Disease Proteins Genes Papers
Retention Policy
Audit Trail
Curation Tools Ontology Experiment
Samples
Compounds
9
Standards: Why Not?
• Good when there’s a majority of agreement
• By vendors, for vendors?
• Mainly about Data Packing, but should be more about Semantics (user-defined)
• Ease and Expressivity
• Too often they’re Brittle and Slow to develop
• “They’re great, that’s why there are so many of them”
10
Data Integration Enables Business Integration: Efficiency and Innovation
• Searching
• Visualization
• Analysis
• Reporting
• Notification
• Navigation
11
Searching…
#1 way for finding information in companies…
13
Semantic Web Data Integration
R&D Scientist
Bioinformatics, Cheminformatics, LIMS, Public Data Sources
Dynamic, Linked, Searchable
14
The Current Web
What the computer sees: “Dumb” links
No semantics - <a href> treated just like <bold>
Minimal machine-processable information
15
The Semantic Web
Machine-processable semantic information
Semantic context published – making the data more informative to both humans and machines
16
The Web of Data
• URIs are universal IDs
• Distributed data references
• Non-locality of data
• Named Graphs can help segment external references
• New meaning for Annotation
target target
gene
pathway
17
Case Study: Omics
ApoA1 …
… is produced by the Liver
… is expressed less in Atherosclerotic Liver
… is correlated with DKK1
… is cited regarding Tangier’s disease
… has Tx Reg elements like HNFR1
Subject Verb Object
18
Courtesy of BG-Medicine
Example: Knowledge Aggregation
20
Tim Berners-Lee’s App View
21
Semantic Web Drug Discovery and Development Application Space
Genomics
Therapeutics
Biology
HTS
NDA
Compound Opt
safety
eADME
DMPK
informatics
manufacturing
genes
Clinical Studies
Patent
Chem Lib
Production
22
W3C Launches Semantic Web for HealthCare and Life Sciences Interest Group
• Interest Group formally launched Nov 2005: http://www.w3.org/2001/sw/hcls
• First Domain Group for W3C - “…take SW through its paces”
• An Open Scientific Forum for Discussing, Capturing, and Showcasing Best Practices
• Recent life science members: Pfizer, Merck, Partners HealthCare, Teranode, Cerebra, NIST, U Manchester, Stanford U, AlzForum
• SW Supporting Vendors: Oracle, IBM, HP, Siemens, AGFA
• Co-chairs: Dr. Tonya Hongsermeier (Partners HealthCare); Eric Neumann (Teranode)
23
HCLS Objectives
• Share use cases, applications, demonstrations, experiences
• Exposing collections
• Developing vocabularies
• Building / extending (where appropriate) core vocabularies for data integration
24
HCLS Activities
• BioRDF - data as RDF
• BioNLP - unstructured data
• BioONT - ontology coordination
• Clinical Trials - CDISC/HL7
• Scientific Publishing - evidence management
• Adaptive Healthcare Protocols
25
Reporting on Progression
Notify Others of Decisions
Progression Manager
Found Determinations
Noted Alternatives
Scientist
Toxicogenomicist
Shared Annotations
Notified of Alternatives
Semantic Web in R&D
A Single Compound
Open Data Format and Flexible Linking Enabled Data Integration and Collaboration
26
Progression Manager: Project Dashboard
Scientist: R&D Commons
Toxicogenomicist: Experiment Manager
A Single Compound
R&D Applications in the Semantic Web
27
Other Benefits of Semantic Web
• Enterprise Distributed Connectivity
  – Uniform Resource Identifiers (URI)
• Authenticity
  – Auditability (Sarbanes-Oxley)
  – Authorship Non-repudiation
• Privacy
  – Encryption and Trust Networks
• Security
  – At any level of granularity
28
What is the Semantic Web ?
• http://www.w3.org/2006/Talks/0125-hclsig-em/
It’s AI
It’s Web 2.0
It’s Ontologies
It’s Data Tracking
It’s a Global Conspiracy
It’s Semantic Webs
It’s Text Extraction
29
W3C Roadmap
• Semantic Web foundation specifications
  – RDF, RDF Schema and OWL are W3C Recommendations as of Feb 2004
• Standardization work is underway in Query, Best Practices and Rules
• Goal of moving from a Web of Documents to a Web of Data
The Only Open and Web-based Data Integration Model Game in Town
30
Leveraging with Semantic Web
• Free Data from Applications…
  – Data uniquely defined by URIs, even across multiple databases
  – Mapped through a common graph semantic model
  – Data can be distributed (not in one location)
  – New relations and attributes dynamically added
• As easy as spreadsheets, but with semantics and web locations
Benefit #1
31
Leveraging with Semantic Web
• All things on the Web can have semantics added to them
  – Ability to define and link in ontologies
  – Document Management through Links
  – Changed data and semantics can be managed as versions
  – Semantics can be used to define and apply policies
  – No Need for complex Middleware
Benefit #2
32
Leveraging with Semantic Web
• Supporting the Management of Knowledge
  – All data nodes and doc resources can be linked
  – Ability to represent Assertions and Hypotheses
    • Include authorship and assumptions
    • Use of KD45 logic
  – Both Local and Global Knowledge
    • Scientists can upload partially validated facts
  – View Data and Interpretations through Points-of-View (Semantic Lenses)
    • Share views with others
Benefit #3
33
The Technologies: RDF
• Resource Description Framework
• Think: "Relational Data Format"
• W3C standard for making statements of fact or belief about data or concepts
• Descriptive statements are expressed as triples: (Subject, Verb, Object)
  – We call the verb a “predicate” or a “property”
Subject Property Object
<Patient HB2122> <shows_sign> <Disease Pneumococcal_Meningitis>
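The statement above can be sketched in Python as a plain tuple, with a small pattern matcher standing in for a real query language. This is only an illustration: the `ex:` identifiers, the two `rdf:type` statements, and the `match` helper are assumptions added for the example, not part of any RDF library.

```python
# A graph is just a set of (subject, property, object) statements.
graph = {
    ("ex:Patient_HB2122", "ex:shows_sign", "ex:Disease_Pneumococcal_Meningitis"),
    # The two typing statements below are assumed, for illustration only.
    ("ex:Patient_HB2122", "rdf:type", "ex:Patient"),
    ("ex:Disease_Pneumococcal_Meningitis", "rdf:type", "ex:Disease"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# "Which signs does patient HB2122 show?"
signs = match(graph, s="ex:Patient_HB2122", p="ex:shows_sign")
for subject, prop, obj in signs:
    print(subject, prop, obj)
```

Leaving any position as `None` turns the same helper into "everything about this subject" or "everything with this property", which is the basic shape of a graph query.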
34
Universal, semantic connectivity supports the construction of elaborate structures.
What RDF Gets You
35
What does RDF get you?
• Structure is not format-rigid (e.g. a tree)
  – Semantics not implicit in Syntax
  – No new parsers need to be defined for new data
• Entities can be anywhere on the web (URI)
• Define semantics into graph structures (ontologies)
  – Use rules to test data consistency and extract important relations
• Data can be merged into complete graphs
• Multiple ontologies supported
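The point about merging can be sketched as follows: when two sources name the same entity with the same URI, combining them is a set union. The ApoA1 facts come from the Omics case study earlier in the deck; the URI, the `ex:` property names, and the two "databases" are hypothetical.

```python
# Hypothetical shared identifier for ApoA1; both sources use it.
APOA1 = "http://example.org/gene/APOA1"

# Source 1: statements from an (assumed) expression database.
expression_db = {
    (APOA1, "ex:producedBy", "ex:Liver"),
    (APOA1, "ex:correlatedWith", "ex:DKK1"),
}

# Source 2: statements from an (assumed) literature database.
literature_db = {
    (APOA1, "ex:citedRegarding", "ex:Tangier_disease"),
    (APOA1, "ex:producedBy", "ex:Liver"),  # duplicate, collapses on merge
}

# Merging graphs is set union: no schema alignment step is required.
merged = expression_db | literature_db


def describe(graph, subject):
    """Collect property -> sorted list of objects for one subject."""
    out = {}
    for s, p, o in graph:
        if s == subject:
            out.setdefault(p, []).append(o)
    return {p: sorted(objs) for p, objs in out.items()}


apoa1_view = describe(merged, APOA1)
```

Neither source needed to know about the other's properties; the shared URI alone is what connects them.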
36
RDF vs. XML example
Wang et al., Nature Biotechnology, Sept 2005
AGML
HUPML
37
RDF Stripe Mode
Node>Edge>Node>Edge….
38
RDF Graph
40
gsk:KENPAL rdf:type :Compound ;
    dc:source <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14698171> ;
    :chemID "3820" ;
    :clogP "2.4" ;
    :kA "e-8" ;
    :mw "327.17" ;
    :ic50 [ rdf:type :IC50 ; :value "23" ; :units :nM ; :forTarget gsk:GSK3beta ] ;
    :chemStructure "C16H11BrN2O" ;
    rdfs:label "kenpaullone" ;
    :synonym "bromo-paullone" ;
    :smiles "C1C2=C(C3=CC=CC=C3NC1=O)NC4=C2C=C(C=C4)Br" ;
    :inChI "1/C16H11BrN2O/c17-9-5-6-14-11(7-9)12-8-15(20)18-13-4-2-1-3-10(13)16(12)19-14/h1-7,19H,8H2,(H,18,20)/f/h18H" ;
    :xref <http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=3820> .
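A rough Python sketch of how such a record behaves as a graph: the nested IC50 measurement becomes its own anonymous node, so a query follows two hops (compound to measurement, measurement to value). Property names are adapted from the record above; the `new_bnode`, `objects`, and `ic50_for` helpers are hypothetical and only a subset of the properties is included.

```python
import itertools

_bnode_ids = itertools.count(1)


def new_bnode():
    """Mint a local label for an anonymous node (hypothetical helper)."""
    return f"_:b{next(_bnode_ids)}"


compound = "gsk:KENPAL"
ic50_node = new_bnode()

graph = {
    (compound, "rdf:type", ":Compound"),
    (compound, "rdfs:label", "kenpaullone"),
    (compound, ":mw", "327.17"),
    (compound, ":ic50", ic50_node),
    # The measurement is a node in its own right, with its own properties.
    (ic50_node, "rdf:type", ":IC50"),
    (ic50_node, ":value", "23"),
    (ic50_node, ":units", ":nM"),
    (ic50_node, ":forTarget", "gsk:GSK3beta"),
}


def objects(graph, subject, prop):
    """All objects of (subject, prop, ?) in sorted order."""
    return sorted(o for s, p, o in graph if s == subject and p == prop)


def ic50_for(graph, compound, target):
    """Follow compound -> IC50 node -> value/units for one target."""
    results = []
    for node in objects(graph, compound, ":ic50"):
        if target in objects(graph, node, ":forTarget"):
            value = objects(graph, node, ":value")[0]
            units = objects(graph, node, ":units")[0]
            results.append((float(value), units))
    return results


potency = ic50_for(graph, "gsk:KENPAL", "gsk:GSK3beta")
```

A second IC50 against another target would simply be a second anonymous node attached with the same `:ic50` property.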
41
DB
Mapping from Current Formats
42
Excel => RDF
ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:CASP2 ; ls:GE_Expected_Ratio "0.2726" ; ls:conditionHub gl:BREAST_MALIGNANT } ;
ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:TNFRS ; ls:GE_Expected_Ratio "0.0138" ; ls:conditionHub gl:BREAST_MALIGNANT } ;
ls:indivCell ${ rdf:type ls:GE_Cell; ls:probeHub gl:CASP2 ; ls:GE_Expected_Ratio "0.1275" ; ls:conditionHub gl:BREAST_NORMAL } ;
Casp2
TNFRS
BreastMalig
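One way to picture the spreadsheet-to-RDF mapping is the sketch below: every non-empty cell becomes a node linked to its row (probe) and column (condition). The ratios and the `ls:`/`gl:` names are taken from the slide; the CSV layout, the cell-naming scheme, and the `cells_to_triples` function are assumptions made for the example.

```python
import csv
import io

# Hypothetical spreadsheet export: one row per probe, one column per condition.
sheet = io.StringIO(
    "probe,BREAST_MALIGNANT,BREAST_NORMAL\n"
    "CASP2,0.2726,0.1275\n"
    "TNFRS,0.0138,\n"
)


def cells_to_triples(handle):
    """Turn each non-empty cell into a node tied to its probe and condition."""
    triples = []
    for row in csv.DictReader(handle):
        probe = row["probe"]
        for condition, value in row.items():
            if condition == "probe" or not value:
                continue  # skip the row label and empty cells
            cell = f"_:cell_{probe}_{condition}"  # assumed naming scheme
            triples += [
                (cell, "rdf:type", "ls:GE_Cell"),
                (cell, "ls:probeHub", f"gl:{probe}"),
                (cell, "ls:conditionHub", f"gl:{condition}"),
                (cell, "ls:GE_Expected_Ratio", value),
            ]
    return triples


triples = cells_to_triples(sheet)
for t in triples:
    print(t)
```

The row and column headers stop being positions in a grid and become explicit links, which is what allows the cells to be joined to other data about the same probe or condition.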
43
W3C Launches Semantic Web for HealthCare and Life Sciences Interest Group
• Interest Group formally launched Nov 2005: http://www.w3.org/2001/sw/hcls
• First Domain Group for W3C - “…take SW through its paces”
– Not a standards group, but a group to identify the best implementations of current SW Standards!
• An Open Scientific Forum for Discussing, Capturing, and Showcasing Best Practices
• Co-chairs: Dr. Tonya Hongsermeier (Partners HealthCare); Eric Neumann (Teranode)
44
W3C Launches Semantic Web for HealthCare and Life Sciences Interest Group
• First formal meeting: Jan 25-26, 2006 Cambridge, MA
• SW Supporting Vendors: Oracle, IBM, HP, Siemens, Agfa
• Recent life science members: Pfizer, Merck, Partners HealthCare, Teranode, Cerebra, NIST, U Manchester, Stanford U, U Bolzano, AlzForum
• Joining W3C gets you in as a group member
– Early access to technology and discussions
– Interaction with potential partners and clients
45
Multiple Ontologies Used Together
Drug target ontology
FOAF
Patent ontology
OMIM
Person
Group
Chemical entity
Disease
SNP
BioPAX
UniProt
Extant ontologies
Protein
Under development
Bridge concept
UMLS
Disease Polymorphisms
PubChem
46
Potential Linked Clinical Ontologies
Clinical Trials ontology
RCRIM (HL7)
Genomics
CDISC
IRB
Applications
Molecules
Clinical Obs
ICD10
Pathways (BioPAX)
Disease Models
Extant ontologies
Mechanisms
Under development
Bridge concept
SNOMED
Disease Descriptions
Tox
47
Case Studies
48
Case Study: NeuroCommons.org
• Public Data & Knowledge for CNS
• R&D Forum
• Available for industry and academia
• All based on Semantic Web Standards
49
NeuroCommons
The Recontribution of Knowledge
Publications are usually copyrighted… Knowledge of Nature should be openly shareable!
50
NeuroCommons.org
The Neurocommons project, a collaboration between Science Commons and the Teranode Corporation, is creating a free, public Semantic Web for neurological research. The project has three distinct goals:
1. To demonstrate that scientific impact and innovation are directly related to the freedom to legally reuse and technically transform scientific information.
2. To establish a legal and technical framework that increases the impact of investment in neurological research in a public and clearly measurable manner.
3. To develop an open community of neuroscientists, funders of neurological research, technologists, physicians, and patients to extend the Neurocommons work in an open, collaborative, distributed manner.
52
NeuroCommons First Steps
The first stage is underway:
• Using NLP and other automated technologies, extract machine-readable representations of neuroscience-related knowledge as contained in free text and databases
• Assemble those representations into a graph
• Publish the graph with no intellectual property rights or contractual restrictions on reuse
53
HCLS Neuro Tasks
• Aggregate facts and models around Parkinson’s Disease
• SWAN: scientific annotations and evidence
• Use RDF and OWL to describe
  – Brain scans in The Whole Brain Atlas
  – Neural entries in NCBI’s Entrez Gene Database
  – 'Brain Connectivity'
  – Neuronal data in SenseLab
  – Neurological Disease entries in OMIM
54
<bp:PATHWAYSTEP rdf:ID="xDshToXGSK3bPathwayStep">
  <bp:next-step rdf:resource="#xGSK3bToBetaCateninPathwayStep"/>
  <bp:step-interactions>
    <bp:MODULATION rdf:ID="xDshToXGSK3b">
      <bp:left rdf:resource="#xDsh"/>
      <bp:right rdf:resource="#xGSK-3beta"/>
      <bp:participants rdf:resource="#xGSK-3beta"/>
      <bp:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dishevelled to GSK3beta</bp:name>
      <bp:direction rdf:datatype="http://www.w3.org/2001/XMLSchema#string">IRREVERSIBLE-LEFT-TO-RIGHT</bp:direction>
      <bp:control-type rdf:datatype="http://www.w3.org/2001/XMLSchema#string">INHIBITION</bp:control-type>
      <bp:participants rdf:resource="#xDsh"/>
    </bp:MODULATION>
  </bp:step-interactions>
</bp:PATHWAYSTEP>
Case Study: BioPAX (Pathways)
55
<bp:PATHWAYSTEP rdf:ID="xDshToXGSK3bPathwayStep">
  <bp:next-step rdf:resource="#xGSK3bToBetaCateninPathwayStep"/>
  <bp:step-interactions>
    <bp:MODULATION rdf:ID="xDshToXGSK3b">
      <bp:left rdf:resource="#xDsh"/>
      <bp:right rdf:resource="#xGSK-3beta"/>
      <bp:participants rdf:resource="#xGSK-3beta"/>
      <bp:name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Dishevelled to GSK3beta</bp:name>
      <bp:direction rdf:datatype="http://www.w3.org/2001/XMLSchema#string">IRREVERSIBLE-LEFT-TO-RIGHT</bp:direction>
      <bp:control-type rdf:datatype="http://www.w3.org/2001/XMLSchema#string">INHIBITION</bp:control-type>
      <drug:affectedBy rdf:resource="http://pharma.com/cmpd/CHIR99102"/>
      <bp:participants rdf:resource="#xDsh"/>
    </bp:MODULATION>
  </bp:step-interactions>
</bp:PATHWAYSTEP>
Case Study: BioPAX (Pathways)
Modulation
CHIR99102
affectedBy
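The step from the first BioPAX slide to the second (adding an `affectedBy` link from another vocabulary) can be sketched with Python's standard XML library. The `rdf` namespace URI is the real one; the `bp` and `drug` namespace URIs below are placeholders chosen for the example, and the fragment is trimmed to a few properties.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BP = "http://example.org/biopax#"   # placeholder, not the real BioPAX namespace
DRUG = "http://example.org/drug#"   # placeholder vocabulary for drug annotations

doc = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:bp="{BP}">
  <bp:MODULATION rdf:ID="xDshToXGSK3b">
    <bp:control-type>INHIBITION</bp:control-type>
    <bp:participants rdf:resource="#xDsh"/>
    <bp:participants rdf:resource="#xGSK-3beta"/>
  </bp:MODULATION>
</rdf:RDF>
"""

root = ET.fromstring(doc)
modulation = root.find(f"{{{BP}}}MODULATION")

# Annotate the modulation with a property from a different vocabulary.
ET.SubElement(
    modulation,
    f"{{{DRUG}}}affectedBy",
    {f"{{{RDF}}}resource": "http://pharma.com/cmpd/CHIR99102"},
)

# A consumer that only knows the bp: vocabulary still reads its own properties.
participants = [
    el.get(f"{{{RDF}}}resource")
    for el in modulation.findall(f"{{{BP}}}participants")
]
affected_by = modulation.find(f"{{{DRUG}}}affectedBy").get(f"{{{RDF}}}resource")
```

Because the new property lives in its own namespace, existing pathway tools can ignore it while drug-aware tools pick it up.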
56
Case Study: Drug Discovery Dashboards
• Dashboards and Project Reports
• Next generation browsers for semantic information via Semantic Lenses
• Renders OWL-RDF, XML, and HTML documents
• Lenses act as information aggregators and logic style-sheets
add { ls:TheraTopic hs:classView:TopicView}
58
Drug Discovery Dashboard
http://www.w3.org/2005/04/swls/BioDash
Topic: GSK3beta Topic
Target: GSK3beta
Disease: DiabetesT2
Alt Dis: Alzheimers
Cmpd: SB44121
CE: DBP
Team: GSK3 Team
Person: John
Related Set
Path: WNT
59
Bridging Chemistry and Molecular Biology
urn:lsid:uniprot.org:uniprot:P49841
Semantic Lenses: Different Views of the same data
Apply Correspondence Rule:
if ?target.xref.lsid == ?bpx:prot.xref.lsid
then ?target.correspondsTo.?bpx:prot
BioPAX Components
Target Model
60
• Lenses can aggregate, accentuate, or even analyze new result sets
• Behind the lens, the data can be persistently stored as RDF-OWL
• Correspondence does not need to mean “same descriptive object”, but may mean objects with identical references
Bridging Chemistry and Molecular Biology
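The correspondence rule (same LSID cross-reference implies `correspondsTo`) can be sketched as a small inference over triples. The UniProt LSID is the one shown on the slide; the node names, the `ex:`/`bpx:` property names, the second LSID, and the `apply_correspondence` function are assumptions for illustration.

```python
UNIPROT_GSK3B = "urn:lsid:uniprot.org:uniprot:P49841"

graph = {
    # Target model (chemistry / project side).
    ("ex:target/GSK3beta", "rdf:type", "ex:Target"),
    ("ex:target/GSK3beta", "ex:xref", UNIPROT_GSK3B),
    # BioPAX components (pathway side).
    ("bpx:protein/xGSK-3beta", "rdf:type", "bpx:Protein"),
    ("bpx:protein/xGSK-3beta", "ex:xref", UNIPROT_GSK3B),
    ("bpx:protein/xDsh", "rdf:type", "bpx:Protein"),
    ("bpx:protein/xDsh", "ex:xref", "urn:lsid:example.org:protein:DSH"),  # made-up LSID
}


def apply_correspondence(graph):
    """Infer correspondsTo links between targets and proteins sharing an xref."""

    def instances(cls):
        return {s for s, p, o in graph if p == "rdf:type" and o == cls}

    def xrefs(node):
        return {o for s, p, o in graph if s == node and p == "ex:xref"}

    inferred = set()
    for target in instances("ex:Target"):
        for prot in instances("bpx:Protein"):
            if xrefs(target) & xrefs(prot):  # at least one shared reference
                inferred.add((target, "ex:correspondsTo", prot))
    return inferred


inferred = apply_correspondence(graph)
```

The rule links the two nodes without claiming they are the same descriptive object; each keeps its own properties, and a lens can follow the inferred link in either model.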
61
Case Study: Drug Safety ‘Safety Lenses’
• Lenses can ‘focus’ data in specific ways
  – Hepatotoxicity, genotoxicity, hERG, metabolites
• Can be “wrapped” around statistical tools
• Aggregate other papers and findings (knowledge) in context with a particular project
• Align animal studies with clinical results
• Support special “Alert-channels” by regulators for each different toxicity issue
• Integrate JIT information on newly published mechanisms of action
62
GeneLogic GeneExpress Data
• Additional relations and aspects can be defined as needed
Diseased Tissue
Links to OMIM (RDF)
63
ClinDash: Clinical Trials Browser
Clinical Obs
Expression Data
Subjects
• Values can be normalized across all measurables (rows)
• Samples can be aligned to their subjects using RDF rules
• Clustering can now be done over all measurables (rows)
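A minimal sketch of the alignment and normalization steps, assuming a `takenFrom` link from each sample to its subject and z-score normalization per row. The identifiers and all numeric values are made up for the example; the slide does not specify the normalization method.

```python
from statistics import mean, pstdev

# Hypothetical triples linking each sample to the subject it came from.
graph = {
    ("ex:sample1", "ex:takenFrom", "ex:subjectA"),
    ("ex:sample2", "ex:takenFrom", "ex:subjectB"),
    ("ex:sample3", "ex:takenFrom", "ex:subjectC"),
}

# Made-up measurements: expression is keyed by sample, clinical obs by subject.
expression = {"ex:sample1": 2.0, "ex:sample2": 4.0, "ex:sample3": 6.0}
clinical_obs = {"ex:subjectA": 110.0, "ex:subjectB": 120.0, "ex:subjectC": 130.0}


def zscores(values):
    """Normalize one row to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Rule: a sample's expression value belongs in the column of its subject.
sample_to_subject = {s: o for s, p, o in graph if p == "ex:takenFrom"}
subject_to_expr = {sample_to_subject[s]: v for s, v in expression.items()}

subjects = sorted(clinical_obs)  # shared column order for every row
rows = {
    "expression": zscores([subject_to_expr[s] for s in subjects]),
    "clinical_obs": zscores([clinical_obs[s] for s in subjects]),
}
```

Once every row shares the same subject columns and the same scale, expression and clinical measurables can be fed to a single clustering step.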
64
Case Study: Nokia
• Developer’s Forum Portal
65
Case Study: TERANODE Design Suite Supports Laboratory Data and Workflow
• Protocol Modeler
  – Accelerates workflow development
  – Eliminates database programming
• Protocol Player
  – Guides users through workflow
  – Automates data capture
  – Automates complex data flow plates
  – Integrates lab data with project and enterprise data
66
Conclusions: Key Semantic Web Principles
• Plan for change
• Free data from the application that created it
• Lower reliance on overly complex Middleware
• The value in "as needed" data integration
• Big wins come from many little ones
• The power of links - network effect
• Open-world, open solutions are cost effective
• Importance of "Partial Understanding"
Efficiency and Innovation: Semantic Web Applications Roadmap