IUPHAR Database Meeting
Will the Correct Drugs Please Stand up?
Chris Southan, Elena Faccenda, Simon J. Harding, Joanna L. Sharman, Adam J. Pawson, and Jamie A. Davies
IUPHAR/BPS Guide to Pharmacology (GtoPdb), University of Edinburgh, Centre for Integrated Physiology, EH8 9XD, UK.
Presentation for the 12th GCC, Fulda, November 2016
Declarations
Since circa 2005, team members working on IUPHAR-DB, which became GtoPdb in 2012, have been curating the structures of approved drugs for human diseases
Partly as a consequence of the work presented here, we claim neither a definitive, nor an error-free, nor a complete approved set
We have encountered most of the problems associated with this exercise first hand, so we empathise with teams grappling with the same issues
We are grateful to all the sources in PubChem used in this comparison study
The highlighting of inter-source discordances and particular examples in this presentation should not be misinterpreted as criticism of those sources
Surfaced totals: take your pick
Intersects between three sources: 2006-2009
[Figure panels: 2009 and 2006]
Context of the current work
Since ~2013 the GtoPdb team had noticed that the structure space around approved drugs was becoming increasingly multiplexed and fuzzy
Curatorial choices were consequently getting more difficult from a pharmacological angle
We thus needed a molecular perspective on the causes and consequences of this fuzz, with a view to reappraising our drug curation strategy
Updating PMID 20298516 was a logical approach, but the methods used then had been largely superseded
We had increased our PubChem exploitation by:
  paying close attention to our regular substance submissions and refreshes
  using it for curatorial selection
  exploring fuzz via relationship navigation in PubChem
  finding more approved drug sources that we could compare directly inside PubChem
It thus became feasible to explore approved drug comparisons entirely within PubChem
Methods outline
Identify submitters in PubChem that were expected to encompass FDA and other approved drug structures represented by CIDs (i.e. excluding large biologicals)
In some cases coverage was SID-tagged (e.g. DrugBank), in others explicit (e.g. INN/USAN), and in others implicit (e.g. FDA UNIIs)
Select and/or convert each source to a PubChem CID list (or extrinsically for ChEMBL approved)
Compare these sources at the CID level
Look at overlaps between four curated sets (using the Venny tool)
Analyse intersects and diffs by PubChem relationships (next slide)
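The CID-level comparison in the steps above can be pictured with a minimal sketch in plain Python. It is not the workflow used for this presentation, which relied on PubChem itself and the Venny tool. The source names and CID values below are toy placeholders, and `venn_regions` is a hypothetical helper that splits a few CID lists into the exclusive regions of a Venn diagram, including the all-source consensus and the source-unique entries (orphans).

```python
from itertools import combinations


def venn_regions(sources):
    """Map each combination of source names to the CIDs found in exactly those sources."""
    regions = {}
    names = sorted(sources)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # CIDs present in every source of this combination
            inside = set.intersection(*(sources[n] for n in combo))
            # CIDs present in any source outside the combination
            outside = set().union(*(sources[n] for n in names if n not in combo))
            exclusive = inside - outside
            if exclusive:
                regions[frozenset(combo)] = exclusive
    return regions


# Toy CID lists (assumed values, not real PubChem data)
toy_sources = {
    "source_a": {1, 2, 3, 4},
    "source_b": {2, 3, 4, 5},
    "source_c": {3, 4, 6},
    "source_d": {4, 7},
}

regions = venn_regions(toy_sources)

# Consensus: CIDs shared by all four sources
consensus = regions.get(frozenset(toy_sources), set())

# Orphans: CIDs unique to a single source
orphans = {name: regions.get(frozenset([name]), set()) for name in toy_sources}

print("consensus:", sorted(consensus))
for name, cids in orphans.items():
    print(name, "orphans:", sorted(cids))
```

Because every CID falls into exactly one exclusive region, the region sizes sum to the size of the union of all sources, which is a quick sanity check on any such split.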
Drug relationship interrogation using the PubChem rules
Eight Sources for comparison
Sequential overlaps for approved drugs
ChEMBL 1900 approved CIDs as a starting point > intersecting these 8 sources left only 183 CIDs in common
Doing seven intersects without any approved sets > 373
Adding FDA/MDD (1216) > 198
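A sequential overlap of this kind can be sketched as a running set intersection. The example below is only an illustration under stated assumptions: the source names and CIDs are toy values, not the sets behind the counts above, and `sequential_overlap` is a hypothetical helper name.

```python
from itertools import accumulate


def sequential_overlap(ordered_sources):
    """Return (source name, CIDs remaining) after intersecting each source in turn."""
    names = [name for name, _ in ordered_sources]
    # Running intersection: each step keeps only CIDs also present in the next source
    running = accumulate(
        (cids for _, cids in ordered_sources),
        lambda kept, nxt: kept & nxt,
    )
    return [(name, len(kept)) for name, kept in zip(names, running)]


# Toy CID sets in the order they are intersected (assumed values)
toy_order = [
    ("start_set", {1, 2, 3, 4, 5, 6}),
    ("source_2", {2, 3, 4, 5, 6, 7}),
    ("source_3", {3, 4, 5, 9}),
    ("source_4", {4, 5, 10}),
]

for name, remaining in sequential_overlap(toy_order):
    print(f"{name}: {remaining} CIDs in common so far")
```

The final count does not depend on the order of the sources, but the intermediate counts do, so the choice of starting set shapes how steep the drop-off looks.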
Four-source intersects at the CID level
Comparative parameters for the splits
Consensus drug submissions: popular but old
Nothing more recent than 2011 (CID 54677470 meloxicam)
Niacin tops the pops
Source-unique entries (orphans)
DrugCentral orphan (from 29)
Cross-checking Taltirelin
DrugBank approved orphan (from 6)
NPC orphan (from 14)
ChEMBL approved orphan (from 25)
Consensus drug multiplexing in PubChem
Thomson Pharma, SCRIPDB, SureChEMBL
But some mixtures can be correct drugs
Conclusions
Discordances between sources and counts of approved drug structures mapped into PubChem give cause for concern
As a testimony to the challenge, this is certainly no one's fault
Clean selects for approved structures from different sources should be easier (e.g. direct InChIKey downloads)
The PubChem selection functionality and relationship navigation facilitate exploring the causes of discordance
It's no surprise that confounding factors include chiral complexity and mixtures
It's not clear if different extrinsic comparison methods (e.g. InChIKey and/or the CACTVS toolkit) would give more optimistic results
Issues around structural multiplexing extend to all bioactive database entries, not just approved drugs
Patent extractions and vendors in PubChem are valuable but contribute to extensive multiplexing of drug structures (e.g. virtual deuteration)
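One way an extrinsic InChIKey comparison might be made more tolerant is to match on the first block of the key only. In a standard InChIKey the first 14 characters hash the connectivity, while stereochemistry and isotopic substitution are encoded in the second block, so stereo and deuterated variants of one skeleton share a first block. The sketch below is an assumption-laden illustration, not a method used in this work: the keys are placeholder strings shaped like InChIKeys, the CIDs are invented, and `skeleton` and `group_by_skeleton` are hypothetical helpers.

```python
from collections import defaultdict


def skeleton(inchikey):
    """Return the first (14-character) block of a standard InChIKey."""
    first_block = inchikey.split("-")[0]
    if len(first_block) != 14:
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return first_block


def group_by_skeleton(keys_by_cid):
    """Group CIDs whose InChIKeys share the same first block."""
    groups = defaultdict(set)
    for cid, key in keys_by_cid.items():
        groups[skeleton(key)].add(cid)
    return dict(groups)


# Placeholder keys (not real compounds): three variants of one skeleton plus one other
toy_keys = {
    101: "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
    102: "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    103: "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
    201: "DDDDDDDDDDDDDD-UHFFFAOYSA-N",
}

groups = group_by_skeleton(toy_keys)
for block, cids in groups.items():
    print(block, sorted(cids))
```

Matching on the first block alone would not reconcile mixtures or salt forms, since extra components change the connectivity and therefore the first block itself.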
Consequences
There is neither a definitive set of approved drug structures nor any consensus on totals, FDA or global
Considering these are the Crown Jewels of many decades of global R&D for human medicines, they could be better looked after
There is no data to suggest commercial collations are significantly less affected than public ones
The broader ramifications of the problem are unclear but certainly present pitfalls that have affected, or will affect, published work
For QSAR, the edict "Trust but verify" remains particularly apposite
Big Data and the mega-portals will simply transitively subsume and recycle the discordances
GtoPdb now takes a more pragmatic and parsimonious approach to approved drug annotation, including adding more curators' notes
Note that this work only addresses struc-to-struc issues. These are compounded by additional multiplexing problems of name-to-struc and struc-to-activity
Amelioration ideas
The single biggest step forward would be for drug development organisations globally (mainly pharma) to provenance and submit their own Gold Standard structures that are under regulatory consideration, directly from their internal registration systems into PubChem
This avoids the mapping spaghetti in the system of IND, FDA, INN, USAN and CAS, which mainly shuffle around PDF images and IUPAC names
The countless US Government-affiliated systems and initiatives could improve their cross-normalisation
Will the Global Ingredient Archival System (GinAS) rule them all?
We should explore inter-source collaborative cross-curation
Crowdsourcing efforts, such as Wikidata, might improve the situation
Given what look to be unsurmountable challenges, should we consider ensemble/cluster-based solutions? (e.g. OpenPhacts semantic lenses)
Thank you, questions welcome now and/or later over a glass of something perhaps
Poster 34
See me for a flyer
http://www.slideshare.net/cdsouthan/
http://www.slideshare.net/cdsouthan/will-the-correct-drugs-please-stand-up-68239021
You can pick up the 459 in this MyNCBI link
https://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/51348141/public/
Let me know if you need any of the other sets