Will the Correct Drugs Please Stand Up?
Chris Southan, Elena Faccenda, Simon J. Harding, Joanna L. Sharman, Adam J. Pawson, and Jamie A. Davies
IUPHAR/BPS Guide to Pharmacology (GtoPdb), University of Edinburgh, Centre for Integrated Physiology, EH8 9XD, UK
Presentation for the 12th GCC, Fulda, November 2016


Jan 27, 2017



Chris Southan

IUPHAR Database Meeting



Since circa 2005, team members working on IUPHAR-DB (which became GtoPdb in 2012) have been curating the structures of approved drugs for human diseases.
Partly as a consequence of the work presented here, we do not claim a definitive, error-free, or complete approved set.
We have encountered most of the problems associated with this exercise first hand, so we empathise with teams grappling with the same issues.
We are grateful to all the sources in PubChem used in this comparison study.
The highlighting of inter-source discordances and particular examples in this presentation should not be misinterpreted as criticism of those sources.

Surfaced totals: take your pick


Intersects between three sources: 2006-2009



Context of the current work
Since ~2013 the GtoPdb team had noticed that the structure space around approved drugs was becoming increasingly multiplexed and fuzzy.
Curatorial choices were consequently getting more difficult from a pharmacological angle.
We thus needed a molecular perspective on the causes and consequences of this fuzz, with a view to reappraising our drug curation strategy.
Updating PMID 20298516 was a logical approach, but the methods used then had been largely superseded.
We had increased our PubChem exploitation by:
  paying close attention to our regular substance submissions and refreshes
  using it for curatorial selection
  exploring fuzz via relationship navigation in PubChem
  finding more approved drug sources that we could compare directly inside PubChem
It thus became feasible to explore approved drug comparisons entirely within PubChem.


Methods outline
Identify submitters in PubChem that were expected to encompass FDA and other approved drug structures represented by CIDs (i.e. excluding large biologicals).
In some cases coverage was SID-tagged (e.g. DrugBank), in others explicit (e.g. INN/USAN), and in others implicit (e.g. FDA UNIIs).
Select and/or convert each source to a PubChem CID list (or extrinsically for ChEMBL approved).
Compare these sources at the CID level.
Look at overlaps between four curated sets (using the Venny tool).
Analyse intersects and diffs by PubChem relationships (next slide).
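The CID-level comparison step can be illustrated with plain set operations. The following is a minimal sketch, not the workflow used for this presentation (that was done inside PubChem and with the Venny tool). The function name, the source labels and the CID values are all made up for illustration.

```python
from collections import Counter


def compare_sources(sources):
    """Compare CID sets from several sources.

    `sources` maps a source name to an iterable of PubChem CIDs.
    Returns (consensus, orphans): the CIDs present in every source,
    and for each source the CIDs found in that source only.
    """
    cid_sets = {name: set(cids) for name, cids in sources.items()}
    # Count in how many sources each CID appears.
    counts = Counter(cid for cids in cid_sets.values() for cid in cids)
    consensus = set.intersection(*cid_sets.values())
    orphans = {
        name: {cid for cid in cids if counts[cid] == 1}
        for name, cids in cid_sets.items()
    }
    return consensus, orphans


# Toy data: hypothetical source names and made-up CIDs.
toy_sources = {
    "source_A": [1, 2, 3, 4],
    "source_B": [2, 3, 4, 5],
    "source_C": [3, 4, 6],
    "source_D": [3, 4, 5, 7],
}

consensus, orphans = compare_sources(toy_sources)
print("consensus:", sorted(consensus))
for name, cids in orphans.items():
    print(name, "orphans:", sorted(cids))
```

With real data, each list would be the CID export for one source. The consensus corresponds to the central region of a four-way Venn diagram, and the orphans to the source-unique regions.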

Drug relationship interrogation using the PubChem rules

Eight Sources for comparison

Sequential overlaps for approved drugs

Taking the 1900 ChEMBL approved CIDs as a starting point, intersecting these 8 sources left only 183 CIDs in common.
Doing seven intersects without any approved sets gave 373.
Adding FDA/MDD (1216) gave 198.
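The idea of a sequential overlap, where the in-common set can only shrink as each source is added, can be sketched as follows. This is an illustrative sketch under assumed inputs; the function name, source labels and CID values are hypothetical and do not reproduce the counts above.

```python
def sequential_overlaps(start, others):
    """Intersect `start` with each (name, cids) pair in turn.

    Returns the final in-common set and a trail of (name, running size),
    so the drop at each step can be inspected.
    """
    running = set(start)
    trail = [("start", len(running))]
    for name, cids in others:
        running &= set(cids)  # keep only CIDs also present in this source
        trail.append((name, len(running)))
    return running, trail


# Toy data: made-up CIDs standing in for real source lists.
start_set = set(range(1, 11))
other_sources = [
    ("source_B", {1, 2, 3, 4, 5, 6, 7, 8}),
    ("source_C", {2, 3, 4, 5, 6, 20}),
    ("source_D", {4, 5, 6, 30}),
]

common, trail = sequential_overlaps(start_set, other_sources)
for name, size in trail:
    print(name, size)
```

The final set does not depend on the order of the sources, but the trail does, so the order chosen affects which source appears to cause the largest drop.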

Four-source intersects at the CID level

Comparative parameters for the splits

Consensus drug submissions: popular but old

Nothing more recent than 2011 (CID 54677470 meloxicam)

Niacin tops the pops

Source-unique entries (orphans)

DrugCentral orphan (from 29)

Cross-checking Taltirelin

DrugBank approved orphan (from 6)

NPC orphan (from 14)

ChEMBL approved orphan (from 25)

Consensus drug multiplexing in PubChem

Thomson Pharma, SCRIPDB, SureChEMBL

But some mixtures can be correct drugs

Conclusions
Discordances between sources and counts of approved drug structures mapped into PubChem give cause for concern.
As a testimony to the challenge, this is certainly no one's fault.
Clean selects for approved structures from different sources should be easier (e.g. direct InChIKey downloads).
The PubChem selection functionality and relationship navigation facilitate exploring the causes of discordance.
It's no surprise that confounding factors include chiral complexity and mixtures.
It's not clear if different extrinsic comparison methods (e.g. InChIKey and/or the CACTVS toolkit) would give more optimistic results.
Issues around structural multiplexing extend to all bioactive database entries, not just approved drugs.
Patent extractions and vendors in PubChem are valuable but contribute to extensive multiplexing of drug structures (e.g. virtual deuteration).
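One form an extrinsic InChIKey comparison could take is to match sources both on the full key and on the first 14-character block alone. That block encodes the connectivity skeleton, so stereo and isotopic variants share it. The sketch below is an assumption-laden illustration, not a method from this study. The function name is hypothetical and the keys are made-up placeholders with the right shape, not real compounds.

```python
def inchikey_overlaps(keys_a, keys_b):
    """Return (exact, skeleton) overlaps between two InChIKey lists.

    exact    : full 27-character keys present in both lists
    skeleton : first blocks (connectivity layer) present in both lists
    """
    exact = set(keys_a) & set(keys_b)
    skeleton = {k.split("-")[0] for k in keys_a} & {k.split("-")[0] for k in keys_b}
    return exact, skeleton


# Placeholder keys (NOT real structures): 14-char block, 10-char block, 1 char.
flat = f"{'A' * 14}-UHFFFAOYSA-N"
stereo_1 = f"{'B' * 14}-{'C' * 8}SA-N"
stereo_2 = f"{'B' * 14}-{'D' * 8}SA-N"  # same skeleton, different second block
other = f"{'E' * 14}-UHFFFAOYSA-N"

source_a = [flat, stereo_1]
source_b = [flat, stereo_2, other]

exact, skeleton = inchikey_overlaps(source_a, source_b)
print(len(exact), "exact matches;", len(skeleton), "skeleton matches")
```

A gap between the two counts flags pairs of entries that differ only in stereochemistry or isotopic labelling. A shared first block does not distinguish salts or mixtures from a parent structure, since their connectivity differs, so those need separate handling.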


Consequences
There is neither a definitive set of approved drug structures nor any consensus on totals, FDA or global.
Considering these are the Crown Jewels of many decades of global R&D for human medicines, they could be better looked after.
There is no data to suggest commercial collations are significantly less affected than public ones.
The broader ramifications of the problem are unclear, but they certainly present pitfalls that have affected, or will affect, published work.
For QSAR, the edict "Trust but verify" remains particularly apposite.
Big Data and the mega-portals will simply transitively subsume and recycle the discordances.
GtoPdb now takes a more pragmatic and parsimonious approach to approved drug annotation, including adding more curators' notes.
Note this work only addresses struc-to-struc issues. These are compounded by additional multiplexing problems of name-to-struc and struc-to-activity.


Amelioration ideas
The single biggest step forward would be for drug development organisations globally (mainly pharma) to provenance and submit their own Gold Standard structures that are under regulatory consideration, directly from their internal registration systems into PubChem.
This would avoid the mapping spaghetti in the system of IND, FDA, INN, USAN, and CAS, which mainly shuffles around PDF images and IUPAC names.
The countless US Government-affiliated systems and initiatives could improve their cross-normalisation.
Will the Global Ingredient Archival System (GinAS) rule them all?
We should explore inter-source collaborative cross-curation.
Crowdsourcing efforts, such as Wikidata, might improve the situation.
Given what look to be insurmountable challenges, should we consider ensemble/cluster-based solutions (e.g. OpenPhacts semantic lenses)?


Thank you, questions welcome now and/or later, over a glass of something perhaps

Poster 34. See me for a flyer.

You can pick up the 459 in this MyNCBI link

Let me know if you need any of the other sets