www.guidetopharmacology.org
Will the real drugs and targets please stand up? Evolving consensus-based curatorial strategies
Chris Southan, IUPHAR/BPS Guide to PHARMACOLOGY Web portal Group, Centre for Integrative Physiology, School of Biomedical Sciences, University of Edinburgh, Hugh Robson Building, Edinburgh, EH8 9XD, UK. [email protected]
Presented to the Gloriam/GPCRDB Team and the Dept. of Pharmaceutical Sciences, University of Copenhagen, 6th May 2014
reviews on various topics in pharmacological journals and through the IUPHAR database.
• Subcommittees update their database pages annually
• Continuously expanding to incorporate new data types, new targets and ligands and new domain committees
• Public database releases every 3-4 months
Content
Detailed annotation
Pharmacological and clinical data
Wellcome Trust Grant 099156/Z/12/Z
• Key objective: “encompass all the human targets of current prescription medicines and the likely targets of future medicines”
• Conceptually familiar from our established receptor/channel-centric database
• But - needed to re-define curatorial approaches, caveats and end-points
• Balance between theoretical rigour and pragmatic utility
• Four foci - grant fulfilment, user value, data mining, data consumption
• Discuss and document changes in curatorial strategies with practical guidelines
• Add enhancements, new relationships and features
• Control activity-mapping stringencies and relationship distributions
• QC legacy content, harmonise and remediate where necessary
• Aim for small, but perfectly-formed, data content vs. complete coverage
Technical implementation
• Restrict relationships to citable/provenanced quantitative mappings (typically IC50, Ki, Kd)
• Formally tag data-supported “primary targets”
• Only data-supported polypharmacology
• Mask nutraceuticals, metabolites and endogenous hormones so that they do not bloat the drug > target relationship space
• Limit drug > multiple subunit mappings to direct interactions
• Normalise targets to UniProt IDs and Swiss-Prot for human
• Normalise drugs and ligands to PubChem compound records (CIDs)
• Extend useful relationships e.g. drug > prodrug, drug > active metabolite, ligand = target (antibody > cytokine)
• Flexibility to handle edge cases (e.g. heparinoids)
• Options for selective expansion (e.g. kinases, proteases and Alzheimer’s)
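The filtering rules on this slide can be sketched as a small record type plus a predicate. This is a minimal illustration only: the `Interaction` fields, the class labels and the example values are hypothetical placeholders, not the Guide to PHARMACOLOGY schema or real data, and the accepted measures are fixed here to IC50, Ki and Kd although the slide says "typically".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record layout; field names are illustrative assumptions.
@dataclass(frozen=True)
class Interaction:
    ligand_cid: int              # PubChem compound identifier (CID)
    target_uniprot: str          # UniProt accession of the human target
    measure: Optional[str]       # "IC50", "Ki", "Kd", or None if not quantitative
    pmid: Optional[int]          # supporting citation, or None if unprovenanced
    ligand_class: str = "drug"   # e.g. "drug", "nutraceutical", "metabolite"
    direct: bool = True          # False for subunits that are not directly bound

# Assumed vocabularies for this sketch.
QUANTITATIVE = {"IC50", "Ki", "Kd"}
MASKED_CLASSES = {"nutraceutical", "metabolite", "endogenous hormone"}

def keep(rel: Interaction) -> bool:
    """True if the relationship passes all four restrictions from the slide."""
    return (
        rel.measure in QUANTITATIVE                 # quantitative mapping
        and rel.pmid is not None                    # citable provenance
        and rel.ligand_class not in MASKED_CLASSES  # masked ligand classes
        and rel.direct                              # direct interactions only
    )

def curate(rels):
    """Return the relationships that survive the filter, in input order."""
    return [r for r in rels if keep(r)]

# Placeholder rows: all identifiers and PMIDs below are made up.
examples = [
    Interaction(1001, "P00001", "Ki", 11111111),                        # kept
    Interaction(1002, "P00002", None, 11111112),                        # not quantitative
    Interaction(1003, "P00003", "IC50", None),                          # no citation
    Interaction(1004, "P00004", "Kd", 11111113, "endogenous hormone"),  # masked class
    Interaction(1005, "P00005", "IC50", 11111114, "drug", False),       # indirect subunit
]
kept = curate(examples)
print([r.ligand_cid for r in kept])  # [1001]
```

Keeping each rule as a separate clause in `keep` makes it straightforward to relax or tighten one stringency without touching the others.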
Defining limits for curation
• The good news: capture of targets and drugs in databases and literature reports is continuously expanding
• The bad news: no one agrees on numbers, relationship definitions, curatorial rules, identifiers, exact molecular structures, choices of primary sources or provenance attribution
• More bad news: source proliferation < “circular” annotation
• Human target range: 186 approved drugs in 2006 (PMID:17139284) < 3,044 in ChEMBL_18
• Approved drug ranges: 1,216 FDA Maximum Daily Dose (PubChem Assay ID 1195) < 2,750 for the NCGC Pharmaceutical Collection (PMID:21525397)
• Outer bioactivity ranges: 8,057 INNs < 928,875 actives in PubChem BioAssays < 6.3 million from GVKBIO with SAR from papers and patents
Evolution of our consensus strategy
Based on many collective years of curatorial engagement and deep source knowledge, we now pursue a consensus approach for the following reasons:
1. Concordant sources are generally more likely to be right than wrong
2. Curatorial efficiency of starting with solid consensus sets
3. Multiple sources are informatically synergistic (if truly independent)
4. Approach is flexible via source updates and testing different filters
5. We control total numbers for matching to curatorial capacity
6. The concept can easily be explained to users
7. The exercise of comparing sources is very informative
8. It forces entity identifier normalisation (via cross-mapping if necessary)
9. Consensus lists per se have value for users (e.g. hosting on website)
Will the real targets please stand up?
• Compared as human Swiss-Prot IDs for 2013 database releases
• The intersect is 351 and the union is 3,046 (i.e. 15% of the 20,265 human proteome)
• Lists included approved, clinical and research targets
Figure 7d from: “Comparing the chemical structure and protein content of ChEMBL, DrugBank, Human Metabolome Database and the Therapeutic Target Database” PMID: 24533037
• Mathias Rask-Andersen et al., July 2013, 481 approved
• Southan et al., 2013, 3-way human DrugBank/ChEMBL/TTD 352
• 3-way or 2-way, 19 + 40 + 143 = 202 Targets Of Approved Drugs (TOADS) set selected for GToP upload
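Selecting targets by "3-way or 2-way" agreement amounts to counting how many sources list each identifier and keeping those at or above a threshold. The sketch below shows that idea on toy data. The identifiers `T1` to `T6` are placeholders, and the function names `concordance` and `consensus` are illustrative, not part of any published pipeline.

```python
from collections import Counter

def concordance(sources):
    """Count, for each identifier, how many sources list it."""
    counts = Counter()
    for ids in sources.values():
        counts.update(set(ids))  # set() so a source counts at most once per ID
    return counts

def consensus(sources, min_sources=2):
    """Identifiers listed by at least `min_sources` of the sources."""
    return {i for i, n in concordance(sources).items() if n >= min_sources}

# Toy identifier sets standing in for human Swiss-Prot accessions.
sources = {
    "DrugBank": {"T1", "T2", "T3", "T4"},
    "ChEMBL": {"T1", "T2", "T5"},
    "TTD": {"T1", "T3", "T6"},
}

three_way = consensus(sources, min_sources=3)   # listed by all three sources
two_or_more = consensus(sources, min_sources=2) # 3-way plus every 2-way overlap
union = set().union(*sources.values())          # listed by any source

print(sorted(three_way))    # ['T1']
print(sorted(two_or_more))  # ['T1', 'T2', 'T3']
print(len(union))           # 6
```

Raising or lowering `min_sources` is one way to match the size of the consensus set to curatorial capacity.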
Will the real drugs please stand up?
• Work up the following CID triage inside PubChem
• Select DrugBank 1504 “approved” drug structures
• Select two additional sources TTD and ChEMBL
• Filter to remove salts and mixtures
• Select synonym INN (WHO International Non-proprietary Name)
• The final step was the Boolean intersect between all five
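The triage above is, in effect, a running Boolean intersect over five CID sets. The following is a minimal offline sketch of that logic. It does not call PubChem, the function name `triage` is illustrative, and the small integer CIDs are toy values rather than real query results.

```python
def triage(filters):
    """Intersect CID sets in order and record the surviving count after each step.

    `filters` is a list of (label, set_of_cids) pairs.
    Returns the final set and a list of (label, remaining_count) pairs.
    """
    current = None
    steps = []
    for label, cids in filters:
        current = set(cids) if current is None else current & set(cids)
        steps.append((label, len(current)))
    return current, steps

# Toy CID sets; each stands in for one of the five selections on the slide.
filters = [
    ("DrugBank approved", {1, 2, 3, 4, 5, 6}),
    ("TTD", {1, 2, 3, 4, 7}),
    ("ChEMBL", {1, 2, 3, 5, 8}),
    ("no salts or mixtures", {1, 2, 4, 5, 6, 7, 8}),
    ("has INN synonym", {1, 3, 4, 5}),
]

final, steps = triage(filters)
for label, remaining in steps:
    print(f"{label}: {remaining}")
print(sorted(final))  # [1]
```

Recording the count after each step shows which filter removes the most entries, which helps when judging the effect of dropping a source.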
Observations and caveats
• This set of 923 drugs can be accessed via the MyNCBI open URL http://www.ncbi.nlm.nih.gov/sites/myncbi/collections/public/1Fo7u3apR1bzS_UWr1YhHOTkZ/
• TTD last submitted in Feb 2012, so drug content is capped to before that date (dropping TTD gives 1117 CIDs)
• Some metabolites (e.g. amino acids) come through the filters
• Older drugs have no INN (e.g. aspirin)
• Some peptide drug CIDs are missing (suggesting low concordance)
• Approved fixed-mixtures are excluded (they do not get an INN)
• The computed CID identity is actually a hash-code match, rather than via InChIKey (but this should give similar numbers)
• Each of the 923 had 76 submissions (SIDs)
• Applying “same (bond) connectivity” gives 18749 but removing the virtual deuterated entries reduces this to 6919 (i.e. the 923 have, on average, 7.5 alternative stereo CIDs)