2012-10-08 Practical Semantics In The Pharmaceutical Industry - The Open PHACTS Project

Post on 10-May-2015


DESCRIPTION

Keynote presentation given by Lee Harland at EKAW 2012 http://rd.springer.com/chapter/10.1007/978-3-642-33876-2_1

Transcript

http://openphacts.org pmu@openphacts.org

@Open_PHACTS

Source: Nature Reviews Drug Discovery 11, 191-200 (March 2012) | doi:10.1038/nrd3681 Jack W. Scannell, Alex Blanckley, Helen Boldon & Brian Warrington

Source: Nature Reviews Drug Discovery 3, 711-716 (August 2004) | doi:10.1038/nrd1470 Ismail Kola & John Landis

harmful

harmful useless

http://www.medicalprogresstoday.com/spotlight/spotlight_indarchive.php?id=1039

Derek Lowe

http://www.ebi.ac.uk/Information/Brochures/pdf/EMBL-EBI%20Annual%20Report%202011.pdf

297,650

http://www.forbes.com/sites/matthewherper/2011/04/13/a-decade-in-drug-industry-layoffs/

¤ Built to primary use-case
¤ Tailored indexes
¤ Tailored GUIs
¤ Unique language & metadata
¤ Poor interoperability/integration

Literature HR Synthesis Portfolio SAR Docs Safety In vivo Etc

Information Tombs…

The Outside World

Precompetitive Informatics

Public Domain Drug Discovery Data: Pharma are accessing, processing, storing & re-processing

Literature PubChem Genbank Patents Databases

Downloads

Data Integration Data Analysis Firewalled Databases

Repeat @ each company x

Lowering industry firewalls: pre-competitive informatics in drug discovery Nature Reviews Drug Discovery (2009) 8, 701-708 doi:10.1038/nrd2944

The Innovative Medicines Initiative

•  EC funded public-private partnership for pharmaceutical research
•  Focus on key problems
   –  Efficacy, Safety, Education & Training, Knowledge Management

The Open PHACTS Project

•  Create a semantic integration hub (“Open Pharmacological Space”)…
•  Runs 2011-2014
•  Deliver services to support on-going drug discovery programs in pharma and public domain
•  Leading academics in semantics, pharmacology and informatics, driven by solid industry business requirements
•  23 academic partners, 8 pharmaceutical companies, 3 software SMEs
•  Work split into clusters:
   •  Technical Build
   •  Scientific Drive
   •  Community & Sustainability


Pathways

Pharmacological Activities

Biological Processes

Transcripts

Pathological Processes

Diseases

Genes

Proteins

Interactions

Clinical Drug Applications

Indications

Drugs

Compounds

Chemicals

Optimised To Business Questions

Number | Sum | Nr of 1 | Question
15 | 12 | 9 | All oxido-reductase inhibitors active <100nM in both human and mouse
18 | 14 | 8 | Given compound X, what is its predicted secondary pharmacology? What are the on and off-target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound?
24 | 13 | 8 | Given a target find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives.
32 | 13 | 8 | For a given interaction profile, give me compounds similar to it.
37 | 13 | 8 | The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X.
38 | 13 | 8 | Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not).
41 | 13 | 8 | A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? i.e. return all cmpds active in assays where the resolution is at least at the level of the target family (i.e. PKC) both from structured assay databases and the literature.
44 | 13 | 8 | Give me all active compounds on a given target with the relevant assay data
46 | 13 | 8 | Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease)
59 | 14 | 8 | Identify all known protein-protein interaction inhibitors

Goals

Platform GUI

Standards

Apps

API

A Precompetitive Knowledge Framework

Integration

Pharma Needs

Inputs

Sustainability Stability Security

Management / Governance

Data Mining Services/Algorithms

Mapping & Populating Architecture Interfaces

& Services

Content Structured & Unstructured

Vocabularies & Identifiers

(URIs)

Community KD Innovation

Data Cache (Virtuoso Triple Store)

Linked Data API (RDF/XML, TTL, JSON) Domain Specific Services

Open PHACTS Explorer 1st Gen Apps

Identity Resolution

Service (ConceptWiki)

Chemistry Normalisation & Q/C ChemSpider

Identifier Management

Service (BridgeDb+)

Partner Apps

Data Import

P12374 EC2.43.4

CS4532

“Adenosine receptor 2a”

Oct. 2012

Public Content Commercial

Public Ontologies

User Annotations

P12047 X31045

GB:29384

Issues

¤ Provenance

¤ Conflicting Authorities

¤ Management

¤ Transitivity

What’s “equal” anyway?

Gleevec® = Imatinib Mesylate

Imatinib Mesylate YLMAHDNUQAMNNX-UHFFFAOYSA-N

Search “Gleevec”

PubChem Drugbank ChemSpider

Imatinib

Mesylate

Consequences…..

Ignore Salts?

NCX-911 Viagra ®

The 18th International Conference on Knowledge Engineering and Knowledge Management is concerned with all aspects of eliciting, acquiring, modeling and managing knowledge, and its role in the construction of knowledge-intensive systems and services for the semantic web, knowledge management, e-business, natural language processing, intelligent information integration, etc. The focus of the 18th edition of EKAW will be on "Knowledge Engineering and Knowledge Management that matters".

Dynamic Equality

§  Tuneable (same data, different questions)
§  Domain specific
§  User driven
§  Traceable

Strict Relaxed

Analysing Browsing

LinkSet#1 {
  chemspider:gleevec hasParent imatinib ...
  drugbank:gleevec exactMatch imatinib ...
}

linkSet1 { chemspider:aspirin exactMatch chembl:aspirin …. }
linkSet2 { imatinib_mesylate hasParent imatinib …. }
linkSet3 { (+)Staurosporine enantiomer (-)Staurosporine …. }
linkSet4 { vanillaEssence hasPart Vanillin …. }

Profile P1 “Broad”

Profile P2 “Parents”

Profile P2 “Strict”
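The idea of profiles selecting which link sets count as "equal" can be illustrated with a small Python sketch. The link sets are the ones shown on the slides; which predicates each profile accepts is not stated in the deck, so the `PROFILES` table is an assumption made for illustration. Treating accepted links as symmetric and transitive is also an assumption (the slides list transitivity as an open issue).

```python
from collections import defaultdict

# Link sets as (subject, predicate, object) triples, copied from the slide examples.
LINKSETS = {
    "linkSet1": [("chemspider:aspirin", "exactMatch", "chembl:aspirin")],
    "linkSet2": [("imatinib_mesylate", "hasParent", "imatinib")],
    "linkSet3": [("(+)Staurosporine", "enantiomer", "(-)Staurosporine")],
    "linkSet4": [("vanillaEssence", "hasPart", "Vanillin")],
}

# ASSUMPTION: the predicates accepted by each profile are illustrative only;
# the slides name the profiles but do not define their contents.
PROFILES = {
    "Strict": {"exactMatch"},
    "Parents": {"exactMatch", "hasParent"},
    "Broad": {"exactMatch", "hasParent", "enantiomer", "hasPart"},
}


def equivalents(identifier, profile):
    """Return every identifier treated as equal to `identifier` under `profile`."""
    allowed = PROFILES[profile]
    # Build an undirected graph from the links the profile accepts.
    graph = defaultdict(set)
    for links in LINKSETS.values():
        for subject, predicate, obj in links:
            if predicate in allowed:
                graph[subject].add(obj)
                graph[obj].add(subject)
    # Walk the graph to collect everything reachable from the identifier.
    seen = {identifier}
    stack = [identifier]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


print(sorted(equivalents("imatinib_mesylate", "Strict")))
print(sorted(equivalents("imatinib_mesylate", "Parents")))
```

Under the assumed profiles, the same data answers differently depending on the question: a strict analysis keeps the salt and the parent apart, while a browsing-oriented profile merges them.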

The Identifier Mapping Service

Identity Mapping Service

(BridgeDB)

Query Expander

Service

cw:979b545d-f9a9 cheminf:logd ?logd

cw:979b545d-f9a9

?iri cheminf:logd ?logd .
FILTER (?iri = cw:979b545d-f9a9 || ?iri = cs:2157 || ?iri = chembl:1280 || ?iri = db:db00945 || …)
… }

For each line of SPARQL:

[cs:2157, chembl:1280,db:db00945]

parse

recognise

expand

transform

Profiles

Mappings

Q, P1 context GRAPH <http://rdf.chemspider.com> {

Q’
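The parse / recognise / expand / transform steps can be sketched in Python as follows. This is a minimal illustration, not the project's Query Expander: the whitespace tokenising, the function name `expand_pattern` and the in-memory `MAPPINGS` table are assumptions, and the mapped identifiers are the ones shown on the slide.

```python
# ASSUMPTION: a tiny in-memory stand-in for the identity mapping service,
# using the identifiers shown on the slide.
MAPPINGS = {
    "cw:979b545d-f9a9": ["cs:2157", "chembl:1280", "db:db00945"],
}


def expand_pattern(line, mappings, var="?iri"):
    """Rewrite one triple pattern so a known subject matches all mapped IRIs."""
    # parse: naive whitespace split (real SPARQL needs a proper parser)
    tokens = line.split()
    if len(tokens) < 3:
        return line
    # recognise: is the subject an identifier we hold mappings for?
    subject = tokens[0]
    if subject not in mappings:
        return line
    # expand: the original identifier plus everything it maps to
    alternatives = [subject] + mappings[subject]
    condition = " || ".join(f"{var} = {iri}" for iri in alternatives)
    # transform: replace the subject with a variable and add a FILTER
    rest = " ".join(tokens[1:]).rstrip(" .")
    return f"{var} {rest} . FILTER ({condition})"


print(expand_pattern("cw:979b545d-f9a9 cheminf:logd ?logd", MAPPINGS))
```

Patterns whose subject has no mappings pass through unchanged, so the rewrite can be applied line by line to a whole query.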

Based on ve2 editor http://lab.linkeddata.deri.ie/ve2/

Shouldn’t an integration system be able to tell you exactly what it’s integrating?

## your dataset description
:myDS rdf:type void:Dataset ;
    foaf:homepage <http://example.org/> ;
    dcterms:title "Example Dataset"^^xsd:string ;
    dcterms:description """A simple dataset in RDF."""^^xsd:string ;
    pav:license <http://creativecommons.org/licenses/by-sa/3.0/> ;
    void:uriSpace "http://example.org/"^^xsd:string ;
    pav:retrievedFrom <http://exampledownload.com> ;
    pav:retrievedOn "2012-09-19"^^xsd:date ;
    pav:retrievedBy <http://some_web_id> ;
    pav:version "15.5"^^xsd:string ;
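A dataset description like this can be checked mechanically before a source is loaded. The Python sketch below represents the description as a plain dictionary and reports which properties are absent. Treating every property from the slide as mandatory is an assumption, as are the names `REQUIRED` and `missing_properties`; no RDF library is used.

```python
# Properties that appear in the slide's example description.
# ASSUMPTION: requiring all of them is an illustrative policy, not a stated rule.
REQUIRED = [
    "dcterms:title",
    "dcterms:description",
    "pav:license",
    "void:uriSpace",
    "pav:retrievedFrom",
    "pav:retrievedOn",
    "pav:retrievedBy",
    "pav:version",
]


def missing_properties(description):
    """List required properties that are absent or empty in the description."""
    return [prop for prop in REQUIRED if not description.get(prop)]


# The example values from the slide, as a property -> value mapping.
example = {
    "dcterms:title": "Example Dataset",
    "dcterms:description": "A simple dataset in RDF.",
    "pav:license": "http://creativecommons.org/licenses/by-sa/3.0/",
    "void:uriSpace": "http://example.org/",
    "pav:retrievedFrom": "http://exampledownload.com",
    "pav:retrievedOn": "2012-09-19",
    "pav:retrievedBy": "http://some_web_id",
    "pav:version": "15.5",
}

print(missing_properties(example))
print(missing_properties({"dcterms:title": "Untitled"}))
```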

Provenance Everywhere

<inDataset href="http://rdf.chemspider.com/void.rdf#chemSpiderDataset" />

Nanopublications


Credit For Curation

Quality Assertions

ChemSpider Validation & Standardization Platform http://bit.ly/NZF5VB

QUDT (http://www.qudt.org/)

STANDARD_TYPE     UNIT_COUNT
----------------  ----------
AC50                       7
Activity                 421
EC50                      39
IC50                      46
ID50                      42
Ki                        23
Log IC50                   4
Log Ki                     7
Potency                   11
log IC50                   0

STANDARD_TYPE  STANDARD_UNITS   COUNT(*)
-------------  ---------------  --------
IC50           nM                 829448
IC50           ug.mL-1             41000
IC50           (blank)             38521
IC50           ug/ml                2038
IC50           ug ml-1               509
IC50           mg kg-1               295
IC50           molar ratio           178
IC50           ug                    117
IC50           %                     113
IC50           uM well-1              52
IC50           p.p.m.                 51
IC50           ppm                    36
IC50           uM-1                   25
IC50           nM kg-1                25
IC50           milliequivalent        22
IC50           kJ m-2                 20

~ 100 units

>5000 types
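Several of the unit strings in the IC50 listing are spelling variants of the same unit. A minimal Python sketch of merging them is shown below. The counts are taken from the listing; the `SYNONYMS` table and the canonical labels are assumptions chosen for illustration and are not QUDT identifiers.

```python
from collections import Counter

# A subset of (unit string, count) rows from the IC50 listing.
ROWS = [
    ("nM", 829448),
    ("ug.mL-1", 41000),
    ("ug/ml", 2038),
    ("ug ml-1", 509),
    ("p.p.m.", 51),
    ("ppm", 36),
]

# ASSUMPTION: hand-written synonym table with illustrative canonical labels.
# Matching is exact because letter case is significant in unit symbols.
SYNONYMS = {
    "ug.mL-1": "ug/mL",
    "ug/ml": "ug/mL",
    "ug ml-1": "ug/mL",
    "p.p.m.": "ppm",
}


def canonical_unit(raw):
    """Map a raw unit string to its canonical label, or keep it unchanged."""
    unit = raw.strip()
    return SYNONYMS.get(unit, unit)


def merge_counts(rows):
    """Sum counts over unit strings that normalise to the same label."""
    merged = Counter()
    for unit, count in rows:
        merged[canonical_unit(unit)] += count
    return dict(merged)


print(merge_counts(ROWS))
```

Unit strings that are not in the table are left as they are, so unknown variants stay visible instead of being silently merged.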

Licensing

Linked Closed Data

Kick-Starting Sustainability

Apps

API

•  Chem-Bio Navigator
•  Target Dossier
•  Polypharmacology Browser
•  Utopia Documents
•  Disease Maps
•  … more

Conclusions

¤ Project designed for the new drug discovery environment

¤ Timing with RDF/SW is good

¤ Companies eager to see whether it can really make a difference

¤ Challenge: Got to be better than state of the art (in 3 years!)

¤ Funding challenges are formidable

Acknowledgements

¤  Many members of the consortium who have contributed to data, use cases, funding, support, documentation, management

¤  EBI: John Overington, Anna Gaulton, Mark Davies

¤  Lundbeck: Sune Askjær

¤  Maastricht: Chris Evelo, Andra Waagmeester, Egon Willighagen

¤  Manchester: Carole Goble, Alasdair Gray, Christian Brenninkmeijer, Steve Pettifer, Ian Dunlop, Rishi Ramgolam, James Eales

¤  NBIC: Barend Mons, Kees Burger

¤  RSC: Antony Williams, Valery Tkachenko

¤  SIB: Christine Chichester

¤  VU: Frank van Harmelen, Paul Groth, Antonis Loizou

¤  OpenLink: Orri Erling, Yrjana Rankka, Hugh Williams

¤  Chem2Bio2RDF: David Wild, Bin Chen

More Info

pmu@openphacts.org

http://openphacts.org

@Open_PHACTS

lee@connecteddiscovery.com

@Scibitely

backup

Find me the off-target activities of known cancer drugs whose primary target is a cell cycle regulatory kinase

ChEMBL DrugBank Gene Ontology Wikipathways

Uniprot

ChemSpider

UMLS

ConceptWiki

ChEBI

Connected Using Semantic Technology

Are these Interleukin 1A?

http://bio2rdf.org/uniprot:P01583

http://identifiers.org/uniprot/P01583

Human Interleukin 1A Protein

Human Interleukin 1A Protein

Entrez Gene: 3552, Ensembl:ENSG00000115008

1ITA (3D) 2ILA (3D) 2KKI (3D) 2L5X (3D) IL1A PDB Structures

Uniprot:P01582 Mouse Interleukin 1A

Human Interleukin 1A Gene

1076_at, 210118_s_at, 208200_at, 208200_at Affymetrix probes hIL1A

….etc

“There is lots of data we all use every day, and it’s not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar?

No. Why not? Because we don’t have a web of data. Because data is controlled by applications and each application keeps it to itself.”

Sir Tim Berners-Lee

Are These Vanilla?

Multiple Namespaces

Uniprot database ID: P26838

http://identifiers.org/uniprot/P26838 http://bio2rdf.org/uniprot:P26838 http://uniprot.bio2rdf.org/uniprot:P26838 http://chem2bio2rdf.org/uniprot/resource/P26838 http://purl.uniprot.org/uniprot/P26838 ……
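All of these URIs end in the same database accession, which a simple pattern can recover. The Python sketch below is an illustration only: it assumes the local identifier is the final alphanumeric run after a `/` or `:`, which holds for the URIs on this slide but is not a general rule for every namespace.

```python
import re

# The URIs listed on the slide for Uniprot database ID P26838.
URIS = [
    "http://identifiers.org/uniprot/P26838",
    "http://bio2rdf.org/uniprot:P26838",
    "http://uniprot.bio2rdf.org/uniprot:P26838",
    "http://chem2bio2rdf.org/uniprot/resource/P26838",
    "http://purl.uniprot.org/uniprot/P26838",
]


def local_id(uri):
    """Return the trailing alphanumeric identifier of a URI, or None."""
    # ASSUMPTION: the identifier is whatever follows the last '/' or ':'.
    match = re.search(r"[/:]([A-Za-z0-9]+)$", uri)
    return match.group(1) if match else None


for uri in URIS:
    print(local_id(uri), uri)
```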

What’s this?

http://www.drugbank.ca/drugs/DB00203

/Viagra

Data sets
