Will the correct drugs please stand up?

Chris Southan, Elena Faccenda, Simon J. Harding, Joanna L. Sharman, Adam J. Pawson, and Jamie A. Davies

IUPHAR/BPS Guide to Pharmacology (GtoPdb), University of Edinburgh, Centre for Integrated Physiology, EH8 9XD, UK.

Presentation for the 12th GCC Fulda, November 2016

Declarations

• Since circa 2005, team members working on IUPHAR-DB (which became GtoPdb in 2012) have been curating the structures of approved drugs for human diseases

• Partly as a consequence of the work presented here, we claim neither a definitive, nor an error-free, nor a complete approved set

• We have encountered most of the problems associated with this exercise first hand, so we empathise with teams grappling with the same issues

• We are grateful to all the sources in PubChem used in this comparison study

• The highlighting of inter-source discordances and particular examples in this presentation should not be misinterpreted as criticism of those sources

Surfaced totals: take your pick

Intersects between three sources: 2006-2009

(Figure panels: 2009, 2006)

Context of the current work

• From ~2013 the GtoPdb team noticed that the structure space around approved drugs was becoming increasingly multiplexed and “fuzzy”

• Curatorial choices were consequently getting more difficult from a pharmacological angle

• We thus needed a molecular perspective on causes and consequences of this “fuzz” with a view to reappraising our drug curation strategy

• Updating PMID 20298516 was a logical approach but the methods used then had been largely superseded

• We had increased our PubChem exploitation by:
– paying close attention to our regular substance submissions and refreshes
– using it for curatorial selection
– exploring “fuzz” via relationship navigation in PubChem
– finding more approved drug sources that we could compare directly inside PubChem

• It thus became feasible to explore approved drug comparisons entirely within PubChem

Methods outline

• Identify submitters in PubChem that were expected to encompass FDA and other approved drug structures represented by CIDs (i.e. excluding large biologicals)

• In some cases coverage was SID-tagged (e.g. DrugBank), in others explicit (e.g. INN/USAN), and in others implicit (e.g. FDA UNIIs)

• Select and/or convert each source to a PubChem CID list (or extrinsically for ChEMBL approved)

• Compare these sources at the CID level

• Look at overlaps between four curated sets (using the Venny tool)

• Analyse intersects and diffs by PubChem relationships (next slide)
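The CID-level comparison in this outline amounts to set operations on lists of integers. The following is a minimal sketch only, not the workflow used in the study (which relied on PubChem selections and the Venny tool). The source labels are borrowed from later slides for illustration, and the CID values are made-up placeholders.

```python
from itertools import combinations

# Hypothetical CID lists; real lists would come from PubChem source selections.
sources = {
    "ChEMBL": {1, 2, 3, 4, 5},
    "DrugBank": {2, 3, 4, 6},
    "DrugCentral": {3, 4, 5, 7},
    "NPC": {3, 4, 8},
}


def consensus(sets):
    """Return the CIDs present in every source."""
    return set.intersection(*sets.values())


def orphans(sets):
    """Return, for each source, the CIDs found in no other source."""
    result = {}
    for name, cids in sets.items():
        others = set().union(*(s for n, s in sets.items() if n != name))
        result[name] = cids - others
    return result


def pairwise_overlaps(sets):
    """Return the overlap count for every pair of sources."""
    return {
        (a, b): len(sets[a] & sets[b])
        for a, b in combinations(sorted(sets), 2)
    }


print("consensus:", sorted(consensus(sources)))
print("orphans:", orphans(sources))
print("pairwise:", pairwise_overlaps(sources))
```

The consensus and orphan sets correspond to the centre and the outermost segments of a four-way Venn diagram. Their members are the ones worth following up through PubChem relationships.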

Drug relationship interrogation using the PubChem rules

Eight Sources for comparison

Sequential overlaps for approved drugs

• ChEMBL 1900 approved CIDs as a starting point > intersecting these 8 sources

• Left only 183 CIDs in common

• Doing seven intersects without any approved sets > 373

• Adding FDA/MDD (1216) > 198
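A running intersection of this kind can be sketched as below. The function name, the source labels and the toy CID values are assumptions for illustration and do not reproduce the study data.

```python
def sequential_overlap(start, others):
    """Intersect `start` with each set in turn and record the surviving count.

    `others` is a list of (label, cid_set) pairs, applied in the given order.
    """
    current = set(start)  # copy, so the caller's set is left untouched
    trail = [("start", len(current))]
    for label, cids in others:
        current &= cids
        trail.append((label, len(current)))
    return current, trail


# Hypothetical starting set and sources (placeholder CIDs).
start = set(range(1, 11))
others = [
    ("source_A", {1, 2, 3, 4, 5, 6, 7, 8}),
    ("source_B", {2, 3, 4, 5, 6, 9}),
    ("source_C", {3, 4, 5, 10}),
]

remaining, trail = sequential_overlap(start, others)
for label, count in trail:
    print(label, count)
```

The order of the sources changes the intermediate counts but not the final in-common set, because intersection is commutative and associative.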

Four-source intersects at the CID level

Comparative parameters for the splits

Consensus drug submissions: popular but old

• Nothing more recent than 2011 (CID 54677470 meloxicam)

Niacin tops the pops

Source-unique entries (orphans)

DrugCentral orphan (from 29)

Cross-checking Taltirelin

DrugBank approved orphan (from 6)

NPC orphan (from 14)

ChEMBL approved orphan (from 25)

Consensus drug multiplexing in PubChem

(Figure labels: Thomson Pharma, SCRIPDB, SureChEMBL)

But some mixtures can be “correct” drugs

Conclusions

• Discordances between sources and counts of approved drug structures mapped into PubChem give cause for concern

• As a testimony to the challenge, this is certainly no one's “fault”

• Clean selects for approved structures from different sources should be easier (e.g. direct InChIKey downloads)

• The PubChem selection functionality and relationship navigation facilitate exploring the causes of discordance

• It's no surprise that confounding factors include chiral complexity and mixtures

• It's not clear if different extrinsic comparison methods (e.g. InChIKey and/or CACTVS toolkit) would give more optimistic results

• Issues around structural multiplexing extend to all bioactive database entries, not just approved drugs

• Patent extractions and vendors in PubChem are valuable but contribute to extensive multiplexing of drug structures (e.g. virtual deuteration)
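One extrinsic comparison mentioned above is matching on InChIKeys. A standard InChIKey has three hyphen-separated blocks. To the best of our understanding, the first 14-character block hashes the molecular skeleton (formula and connectivity), while stereochemical and isotopic layers are hashed in the second block. Records that differ only in stereo or isotopic labelling (e.g. virtual deuteration) should therefore share a first block. The sketch below groups records on that block. The record identifiers and keys are made-up placeholders, and whether such collapsing is pharmacologically appropriate remains a curatorial judgement.

```python
from collections import defaultdict


def skeleton_block(inchikey):
    """Return the first (14-character) block of an InChIKey."""
    block = inchikey.split("-")[0]
    if len(block) != 14:
        raise ValueError(f"unexpected InChIKey: {inchikey!r}")
    return block


def group_by_skeleton(records):
    """Group record identifiers by the first InChIKey block."""
    groups = defaultdict(list)
    for record_id, key in records:
        groups[skeleton_block(key)].append(record_id)
    return dict(groups)


# Placeholder records: (hypothetical identifier, made-up InChIKey).
records = [
    ("rec1", "AAAAAAAAAAAAAA-UHFFFAOYSA-N"),
    ("rec2", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),  # same skeleton, different second block
    ("rec3", "CCCCCCCCCCCCCC-UHFFFAOYSA-N"),
]

groups = group_by_skeleton(records)
multiplexed = {k: v for k, v in groups.items() if len(v) > 1}
print(multiplexed)
```

Comparing sources on full keys and then on first blocks gives a rough measure of how much discordance comes from stereo or isotopic variants rather than from genuinely different skeletons. Mixtures and salts are not collapsed this way, since their formula differs.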

Consequences

• There is neither a definitive set of approved drug structures nor any consensus on totals, FDA or global

• Considering these are the “Crown Jewels” of many decades of global R&D for human medicines, they could be better looked after

• There is no data to suggest commercial collations are significantly less affected than public ones

• The broader ramifications of the problem are unclear, but they certainly present pitfalls that have affected, or will affect, published work

• For QSAR, the edict “Trust but verify” remains particularly apposite

• Big Data and the mega-portals will simply transitively subsume and recycle the discordances

• GtoPdb now takes a more pragmatic and parsimonious approach to approved drug annotation, including adding more curators' notes

• Note this work only addresses struc-to-struc issues. These are compounded by additional multiplexing problems of name-to-struc and struc-to-activity

Amelioration ideas

• The single biggest step forward would be for drug development organisations globally (mainly pharma) to provenance and submit their own “Gold Standard” structures that are under regulatory consideration, directly from their internal registration systems into PubChem

• This avoids the mapping spaghetti in the “system” of IND, FDA, INN, USAN and CAS, which mainly shuffle around PDF images and IUPAC names

• The countless US Government-affiliated systems and initiatives could improve their cross-normalisation

• Will the Global Ingredient Archival System (GinAS) rule them all?

• We should explore inter-source collaborative cross-curation

• Crowdsourcing efforts, such as Wikidata, might improve the situation

• Given what look to be unsurmountable challenges, should we consider ensemble/cluster based solutions? (e.g. OpenPhacts semantic lenses)

Thank you, questions welcome now and/or later over a glass of something perhaps…

Poster 34

See me for a flyer

http://www.slideshare.net/cdsouthan/

http://www.slideshare.net/cdsouthan/will-the-correct-drugs-please-stand-up-68239021

You can pick up the 459 in this MyNCBI link: https://www.ncbi.nlm.nih.gov/sites/myncbi/christopher.southan.1/collections/51348141/public/

Let me know if you need any of the other sets




