
Large-scale curation of bioactive chemistry from patents and papers: Excelra GOSTAR

Jan 22, 2018

Chris Southan

Large-scale curation of bioactive chemistry from patents and papers:

A snapshot of the ‘Excelra GOSTAR’ statistics

www.excelra.com

Figure: target rankings (three bar charts).
- All-time Target Ranking, MCD + TCD (axis 0 to 100,000). Targets shown: MAPK14, PIK3CA, PDGFR BETA, CANNABINOID RECEPTOR 1, JANUS KINASE 2, C-SRC TYROSINE KINASE, THROMBIN, COAGULATION FACTOR X, EGFR, VEGFR2. Recovered data labels range from 53,277 to 64,263.
- 2015-16 Target Ranking MCD (papers) (axis 0 to 3,000). Targets shown: CANNABINOID RECEPTOR 1, MONOAMINE OXIDASE B, VEGFR2, CARBONIC ANHYDRASE I, CARBONIC ANHYDRASE II, CYP3A4, EGFR, POTASSIUM CHANNEL KV11.1, BUTYRYLCHOLINESTERASE, ACETYLCHOLINESTERASE. Recovered data labels range from 1,043 to 1,364.
- 2015-16 Target Ranking TCD (patents) (axis 0 to 9,000). Targets shown: INTERLEUKIN-1 RECEPTOR-ASSOCIATED …, NUCLEAR RECEPTOR ROR-GAMMA T, BRUTON TYROSINE KINASE, JANUS KINASE 2, PIK3CB, SODIUM CHANNEL NAV1.7, PIK3CG, PIK3CD, NUCLEAR RECEPTOR ROR-GAMMA, PIK3CA. Recovered data labels range from 5,590 to 9,079.

Christopher Southan (1), Anil Kumar Manchala (2) and Sreeni Devidas (2)

(1) IUPHAR/BPS Guide to PHARMACOLOGY, Centre for Integrative Physiology, University of Edinburgh, EH8 9XD, UK.
(2) Excelra Knowledge Solutions (formerly GVK Informatics) Pvt. Ltd., Hyderabad-500039, India.

http://www.excelra.com/index.php http://www.slideshare.net/cdsouthan

Target Ranking:

- "Slicing and dicing" target-to-compound outputs over time offers unique insights (PMID 24204758).
- As expected, the cumulative ranking (top chart) is not very different from that of 2011 (PMID 21569515).
- However, the most recent papers (centre chart) show quite different rankings from recent patents (lower chart).
- Data mining facilitates high-resolution competitive intelligence, including the tracking of patent disclosures through to later paper disclosures.
- It also provides scientific SAR insights (e.g. shifts in affinity, scaffolds and chemical properties arising from institutional and collective target-validation endeavours).
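As a rough illustration of "slicing and dicing", the sketch below ranks targets by the number of distinct compounds linked to them, optionally restricted to one source (papers or patents) and a year window. It is a minimal sketch under stated assumptions: the `Activity` record, its fields, the ranking metric (distinct compounds per target) and the toy records are all hypothetical and are not the GOSTAR schema or data.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, NamedTuple, Optional, Set, Tuple


class Activity(NamedTuple):
    """A hypothetical compound-to-target link extracted from one document."""
    compound_id: str  # illustrative structure identifier
    target: str       # protein target name
    source: str       # "paper" (MCD-like) or "patent" (TCD-like)
    year: int         # publication year of the source document


def rank_targets(
    records: Iterable[Activity],
    source: Optional[str] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
    top: int = 10,
) -> List[Tuple[str, int]]:
    """Rank targets by distinct compounds, optionally sliced by source and year window."""
    compounds: DefaultDict[str, Set[str]] = defaultdict(set)
    for r in records:
        if source is not None and r.source != source:
            continue  # keep only the requested document type
        if (start is not None and r.year < start) or (end is not None and r.year > end):
            continue  # outside the inclusive year window
        compounds[r.target].add(r.compound_id)  # a set counts each compound once
    # Sort by descending compound count, then alphabetically to break ties.
    ranked = sorted(compounds.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(target, len(cpds)) for target, cpds in ranked[:top]]


# Invented toy records (not GOSTAR data).
records = [
    Activity("c1", "EGFR", "paper", 2010),
    Activity("c2", "EGFR", "paper", 2015),
    Activity("c3", "PIK3CA", "patent", 2015),
    Activity("c4", "PIK3CA", "patent", 2016),
    Activity("c4", "PIK3CA", "patent", 2016),  # repeated link, counted once
    Activity("c5", "EGFR", "patent", 2016),
]

print(rank_targets(records))                                         # [('EGFR', 3), ('PIK3CA', 2)]
print(rank_targets(records, source="patent", start=2015, end=2016))  # [('PIK3CA', 2), ('EGFR', 1)]
```

Comparing the all-time slice with a recent patent-only slice is what lets rankings diverge, as they do between the cumulative and 2015-16 charts.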

Database Content:

- By inspecting a document "D", Excelra expert curators identify a bioactivity assay "A" (e.g. for an enzyme) with a quantitative result "R" (e.g. an IC50) for a compound "C" (a defined chemical structure) acting as an activity modulator (typically an inhibitor) of a protein target "P", or in a cell-based assay.
- A useful shorthand for this mapping is "D-A-R-C-P".
- Assays are classified into different types.
- The location of each structure in its document is specified (e.g. "cpd 5a" in a paper or "Example 102" in a patent).
- Starting from 1945, there are 1.34 million compounds from 112K papers and 3.35 million from 71K patents.

- The form of these plots up to 2012 is discussed in PMID 24204758.
- The fall in patent medicinal chemistry SAR continues.
- Since 2012, literature SAR has also been falling, and its numbers are converging with those from patents.
- The declines are probably dominated by pharma M&A activity and the shift to biologicals, but this does not preclude improvements in compound quality.
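The D-A-R-C-P mapping can be pictured as one record per curated result. The sketch below is a hypothetical container, assuming a SMILES string for the compound and `None` as the target for a cell-based assay; the class name, field names, assay-type label and example values are invented for illustration and do not describe the actual GOSTAR data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DARCPRecord:
    """Hypothetical container for one D-A-R-C-P curation record.

    Field names are illustrative and do not reflect the GOSTAR schema.
    """
    document: str          # D: document identifier (placeholder values below)
    assay: str             # A: assay description
    assay_type: str        # assay classification label (illustrative only)
    result_type: str       # R: measurement type, e.g. "IC50"
    value: float           # R: numeric result
    unit: str              # R: unit of the result, e.g. "nM"
    compound: str          # C: defined chemical structure, here as a SMILES string
    location: str          # where the structure appears, e.g. "cpd 5a" or "Example 102"
    target: Optional[str]  # P: protein target, or None for a cell-based assay

    def shorthand(self) -> str:
        """Render the record in D | A | R | C | P order."""
        result = f"{self.result_type}={self.value:g} {self.unit}"
        compound = f"{self.compound} ({self.location})"
        target = self.target if self.target is not None else "cell-based"
        return " | ".join([self.document, self.assay, result, compound, target])


# Invented example values, not curated data.
enzyme = DARCPRecord("DOC-0001", "enzyme inhibition assay", "biochemical",
                     "IC50", 12.0, "nM", "c1ccccc1O", "Example 102", "EGFR")
cellular = DARCPRecord("DOC-0002", "cell proliferation assay", "cell-based",
                       "IC50", 0.5, "uM", "c1ccccc1O", "cpd 5a", None)

print(enzyme.shorthand())    # DOC-0001 | enzyme inhibition assay | IC50=12 nM | c1ccccc1O (Example 102) | EGFR
print(cellular.shorthand())  # DOC-0002 | cell proliferation assay | IC50=0.5 uM | c1ccccc1O (cpd 5a) | cell-based
```

Keeping the in-document location alongside the structure is what allows a curated value to be traced back to, for example, "Example 102" of a specific patent.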

Introduction:

- Excelra has developed a suite of five unified database products covering global drug discovery R&D outputs.
- These are termed Global Online Structure Activity Relationships (GOSTAR).
- This work provides an update, mainly on the two largest components, the Medicinal Chemistry (MCD) and Target (TCD) databases, which are extracted from papers and patents, respectively.
- Content is derived from expert curation of structure-activity relationships (SAR) from documents.
- Details are described on the Excelra website, and historical statistics in "Tracking 20 Years of Compound-to-Target Output from Literature and Patents", Southan et al., 2013, PLoS One, PMID 24204758.


Conclusions:

- The human protein target totals have increased to 3,383 in MCD, 2,431 in TCD, 3,882 combined and 546 patent-only.
- This exceeds the combined total of 3,272 Swiss-Prot cross-references for the activity-mapped public sources Guide to PHARMACOLOGY, BindingDB and ChEMBL.
- Taking the current human Swiss-Prot total of 20,201, MCD and TCD provide chemical modulation starting points for a druggable proteome of 19%.
- Compound capture has increased by 27% since 2012.
- In addition to MCD and TCD, GOSTAR includes 33,620 compounds in development or approved drugs, with Mechanism Based Toxicity data for 28,305 of these.
- The GOSTAR subscription resource offers one of the largest available D-A-R-C-P compilations.
- It covers chemical biology as well as drug R&D.
- It provides advanced online data-mining features and options for internal integration and exploitation.
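The 19% figure is the combined human target count divided by the human Swiss-Prot total. The short calculation below reproduces it from the numbers quoted in the conclusions; it assumes each of the 3,882 targets corresponds to one human Swiss-Prot entry, and the public-source percentage is simply the same ratio applied to the 3,272 cross-references.

```python
HUMAN_SWISSPROT = 20_201   # human Swiss-Prot total quoted in the conclusions
GOSTAR_COMBINED = 3_882    # combined MCD + TCD human protein targets
PUBLIC_COMBINED = 3_272    # GtoPdb + BindingDB + ChEMBL Swiss-Prot cross-references


def percent(part: int, whole: int) -> float:
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


gostar_coverage = percent(GOSTAR_COMBINED, HUMAN_SWISSPROT)
public_coverage = percent(PUBLIC_COMBINED, HUMAN_SWISSPROT)

print(f"GOSTAR coverage: {gostar_coverage:.1f}%")                          # GOSTAR coverage: 19.2%
print(f"Public-source coverage: {public_coverage:.1f}%")                   # Public-source coverage: 16.2%
print(f"Excess over public sources: {GOSTAR_COMBINED - PUBLIC_COMBINED}")  # Excess over public sources: 610
```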

Figure: Compounds vs year (1970 to 2015) for TCD from patents and MCD from papers (recovered data labels: 91,676; 78,884; 2,383; 1,761; 1,641).
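One plausible way to derive a compounds-per-year series like the one plotted is to count each compound once per source, in the year it first appears. This is a minimal sketch under that assumption; the poster does not state how its yearly counts were computed, and the function name, tuple layout and toy records are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def compounds_per_year(records: Iterable[Tuple[str, str, int]], source: str) -> Dict[int, int]:
    """Count compounds by year of first appearance within one source.

    records: (compound_id, source, year) tuples. A compound disclosed in several
    documents of the same source is counted once, in its earliest year.
    """
    first_seen: Dict[str, int] = {}
    for compound_id, src, year in records:
        if src != source:
            continue  # papers and patents are counted separately
        if compound_id not in first_seen or year < first_seen[compound_id]:
            first_seen[compound_id] = year
    # Tally first-appearance years and return them in chronological order.
    return dict(sorted(Counter(first_seen.values()).items()))


# Invented toy records (not GOSTAR data).
records = [
    ("c1", "patent", 2011),
    ("c2", "patent", 2011),
    ("c1", "patent", 2013),  # re-disclosed later; still counted in 2011
    ("c3", "patent", 2014),
    ("c1", "paper", 2014),   # same structure in a paper counts towards papers
    ("c4", "paper", 2014),
]

print(compounds_per_year(records, "patent"))  # {2011: 2, 2014: 1}
print(compounds_per_year(records, "paper"))   # {2014: 2}
```

Counting by first appearance avoids inflating later years with re-disclosed structures; counting every document-year occurrence instead would be an equally valid, different design choice.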