© Strand Life Sciences 2006; Confidential 03/22/22 1 Combining Natural Language Processing with Substructure Search for efficient mining of Scientific literature Shaillay Dogra, Ramesh Hariharan and Kalyanasundaram Subramanian Strand Life Sciences Pvt. Ltd

Dec 30, 2015


William Mason

Transcript
Page 1

Combining Natural Language Processing with Substructure Search for Efficient Mining of Scientific Literature

Shaillay Dogra, Ramesh Hariharan and Kalyanasundaram Subramanian

Strand Life Sciences Pvt. Ltd

Page 2

Background

• During the lead design/optimization phase, only the interaction between the lead and its target is investigated.
  – Interactions with other targets that could be undesirable are not studied.

• Undesirable interactions usually become apparent only at later stages of the discovery process (in vivo).

Page 3

Solutions that Currently Exist

• Run experimental assays to determine undesirable interactions
  – A prospective panel of “side-effect”-related assays, e.g. for kinases
  – The need for an assay may arise from side-effects observed in animal studies or from liabilities known for the target class
  – Synthesis and assay costs for these experiments are considerable
  – Anything that has not been checked may be missed

• Run a search engine such as “QueryChem” with a structure and a keyword
  – The keyword must be predefined, so results are limited by what you define
  – The display of results is not intuitive or user-friendly
  – Further refinement or exploration of these results is unwieldy

Page 4

Justin Klekota, Frederick P. Roth and Stuart L. Schreiber, Bioinformatics 2006, 22(13):1670-1673

• Structures are first searched against public databases

• The text names of the resulting hits are then combined with user-defined keywords and used again to search the internet for information.
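The co-occurrence search described here can be sketched in Python. This is a minimal sketch, not QueryChem's code: a local list of abstracts stands in for the web search, matching is a case-insensitive substring test, and the function name `cooccurrence_hits` and the sample sentences are hypothetical.

```python
def cooccurrence_hits(hit_names, keyword, abstracts):
    """Return (name, abstract) pairs in which a hit name and the keyword co-occur.

    Hypothetical helper: a case-insensitive substring test stands in for a
    keyword query submitted to a search engine.
    """
    keyword = keyword.lower()
    results = []
    for abstract in abstracts:
        text = abstract.lower()
        if keyword not in text:
            continue  # the keyword is mandatory, so this abstract is skipped
        for name in hit_names:
            if name.lower() in text:
                results.append((name, abstract))
    return results


# Invented abstracts about a hypothetical "Compound A", for illustration only.
abstracts = [
    "Compound A was screened for hERG binding in a patch-clamp assay.",
    "Compound A blocked Kv11.1 currents in transfected cells.",
]
print(cooccurrence_hits(["Compound A"], "hERG", abstracts))
```

Only the first abstract is returned. The second is missed because it names the channel as Kv11.1 (the channel protein encoded by hERG) instead of using the predefined keyword, which is the kind of interaction a pure co-occurrence search can overlook.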

Page 5

Results of valproate & hERG binding

Page 6

Issues with this approach

• Only looks for co-occurrences of the compound and the keyword
  – Hence, potentially misses many interactions

• The result of a search is a (long) text list
  – Not easy to examine
  – No real analysis is possible

• What could an alternative approach offer?
  – Cover as many of the biological interactions currently available in the literature as possible
  – Show results in a user-friendly and intuitive manner
  – Allow further refinement of the search and dynamic exploration of the results

Page 7

The Workflow

• Draw the structure of a query compound

• Run a similarity or substructure search against target compounds in an interactions database
  – Identify hit compounds similar to the query compound
  – Retrieve the interactions of these hit compounds

• One or more interaction networks for the given compound are obtained

• The networks can be analyzed, providing a means of understanding the potential liabilities of the scaffold under consideration
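The search-and-network steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: fingerprints are modeled as plain Python sets of feature keys, similarity is the Tanimoto coefficient (a common choice; the deck does not name its measure) with a hypothetical 0.7 default threshold, and the interactions database is a list of (source, relation, target) triples. A true substructure search needs subgraph matching from a cheminformatics toolkit such as RDKit, which this sketch does not attempt.

```python
from collections import defaultdict


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints represented as sets of feature keys."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_search(query_fp, library, threshold=0.7):
    """Names of library compounds whose Tanimoto similarity to the query meets the threshold."""
    return sorted(
        name for name, fp in library.items() if tanimoto(query_fp, fp) >= threshold
    )


def build_network(hits, interactions):
    """Collect the interactions of the hit compounds into an adjacency map.

    interactions: iterable of (source, relation, target) triples.
    """
    hit_set = set(hits)
    network = defaultdict(list)
    for source, relation, target in interactions:
        if source in hit_set:
            network[source].append((relation, target))
    return dict(network)


# Hypothetical fingerprints and interactions, for illustration only.
library = {
    "hit_1": {"f1", "f2", "f3", "f4"},
    "hit_2": {"f1", "f2", "f3", "f5"},
    "unrelated": {"f7", "f8"},
}
interactions = [
    ("hit_1", "inhibits", "protein_X"),
    ("hit_2", "binds", "protein_Y"),
    ("unrelated", "binds", "protein_Z"),
]
query = {"f1", "f2", "f3", "f4"}
hits = similarity_search(query, library, threshold=0.5)
print(hits)
print(build_network(hits, interactions))
```

With a threshold of 0.5, `hit_1` (Tanimoto 1.0) and `hit_2` (3 shared out of 5 distinct features, 0.6) are kept, and only their interactions enter the network.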

Page 8

Basic Assumptions

• Similarity principle
  – Similar compounds will most likely have similar biological interactions

• The presence of a pre-mapped interactions database that remains current with the latest literature

• The presence of small molecules in the database, along with their structures, to allow substructure and similarity searching

Page 9

"TLR-2 expression on monocytes was enhanced by macrophage colony-stimulating factor (M-CSF) and interleukin-10 (IL-10), but was reduced by transforming growth factor beta1."

Interaction Database Creation

• Database created using NLP

• Interactions among proteins, genes and small molecules are captured

Page 10

NLP Schemata

• Input sentence: Glucose-6-phosphatase was found to play a role in the regulation of insulin.

• Entity Recognition Phase, tagged sentence: A was found to play a role in the regulation of B.

• Information Extraction Phase, interactions: Glucose-6-phosphatase → insulin (regulation)
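The two-phase schema on this slide can be sketched with a dictionary lookup followed by a template match. This is a minimal illustration, assuming a hand-written entity dictionary (`ENTITIES`) and a single regular-expression template (`SCHEMATA`), both hypothetical; the deck describes full syntactic and semantic analysis, which one fixed template does not capture.

```python
import re

# Hypothetical entity dictionary (surface name -> entity type).
ENTITIES = {
    "Glucose-6-phosphatase": "protein",
    "insulin": "protein",
}

# Hypothetical schema: a template over the tagged sentence and the relation it yields.
SCHEMATA = [
    (re.compile(r"A was found to play a role in the regulation of B\.?"), "regulation"),
]


def tag_sentence(sentence, entities):
    """Entity recognition phase: replace the first two dictionary entities with A and B.

    Returns the tagged sentence and a mapping such as {"A": name, "B": name}.
    Overlapping entity names are not handled in this sketch.
    """
    matches = sorted((sentence.find(name), name) for name in entities if name in sentence)
    tagged, mapping = sentence, {}
    # Replace right to left so earlier character positions stay valid.
    for label, (pos, name) in reversed(list(zip("AB", matches))):
        tagged = tagged[:pos] + label + tagged[pos + len(name):]
        mapping[label] = name
    return tagged, mapping


def extract_interactions(sentence, entities, schemata):
    """Information extraction phase: match the tagged sentence against each schema."""
    tagged, mapping = tag_sentence(sentence, entities)
    if "A" not in mapping or "B" not in mapping:
        return []
    return [
        (mapping["A"], relation, mapping["B"])
        for pattern, relation in schemata
        if pattern.fullmatch(tagged)
    ]


sentence = "Glucose-6-phosphatase was found to play a role in the regulation of insulin."
print(tag_sentence(sentence, ENTITIES)[0])
print(extract_interactions(sentence, ENTITIES, SCHEMATA))
```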

Page 11

Entity Recognition

• Create dictionaries of protein names, small-molecule names, etc.

• Identify alternative names/synonyms/symbols

• Resolve ambiguities
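Dictionary lookup with synonym normalization can be sketched as below. This is an illustrative sketch, assuming whitespace tokenization, a greedy longest-match strategy and a small hand-made dictionary; `DICTIONARY`, `build_synonym_index` and `recognize` are hypothetical names, and the sketch only flags ambiguous names rather than resolving them.

```python
from collections import defaultdict

# Hypothetical dictionary: canonical identifier -> names, synonyms and symbols.
DICTIONARY = {
    "IL10": ["IL-10", "IL10", "interleukin-10"],
    "TGFB1": ["TGF-beta1", "transforming growth factor beta1"],
    # Invented ambiguous abbreviation shared by two entries.
    "GENE_A": ["GA"],
    "GENE_B": ["GA"],
}


def build_synonym_index(dictionary):
    """Map each lower-cased name to the set of canonical identifiers it may denote."""
    index = defaultdict(set)
    for canonical, names in dictionary.items():
        for name in names:
            index[name.lower()].add(canonical)
    return index


def recognize(tokens, index, max_len=4):
    """Greedy longest-match lookup of token spans in the synonym index.

    Returns (span, candidates) pairs; more than one candidate flags an
    ambiguity that a separate disambiguation step must resolve.
    """
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span.lower() in index:
                found.append((span, sorted(index[span.lower()])))
                i += n
                break
        else:
            i += 1  # no dictionary entry starts at this token
    return found


index = build_synonym_index(DICTIONARY)
print(recognize("transforming growth factor beta1 reduced IL-10 and GA expression".split(), index))
```

Longest match ensures that "transforming growth factor beta1" is recognized as one entity rather than as a string of unrelated words.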

Page 12

Information Extraction

• First, understand sentence structure: syntax analysis

• Then, understand meaning: semantic analysis

• Finally, extract interactions: inferencing
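The last two steps can be illustrated on the TLR-2 example sentence. In this sketch, which makes loud assumptions, the clause structures are written by hand in place of a real parser (standing in for syntax analysis), the verb table `VERB_SEMANTICS` is a hypothetical stand-in for semantic analysis, and the inferencing step only handles active versus passive voice.

```python
from typing import NamedTuple, Tuple


class Clause(NamedTuple):
    """Hypothetical output of the syntax-analysis step for one clause."""
    subject: str
    verb: str                 # verb lemma, e.g. "enhance"
    objects: Tuple[str, ...]  # direct objects (active) or "by" agents (passive)
    passive: bool


# Semantic analysis: map verb lemmas to interaction types (hypothetical table).
VERB_SEMANTICS = {
    "enhance": "upregulates",
    "increase": "upregulates",
    "reduce": "downregulates",
    "inhibit": "inhibits",
}


def infer_interactions(clauses):
    """Inferencing: turn analyzed clauses into (actor, interaction, target) triples."""
    triples = []
    for clause in clauses:
        interaction = VERB_SEMANTICS.get(clause.verb)
        if interaction is None:
            continue  # verb carries no known interaction meaning
        for other in clause.objects:
            if clause.passive:
                # "X was enhanced by Y": Y acts on X.
                triples.append((other, interaction, clause.subject))
            else:
                # "X enhances Y": the subject acts on the object.
                triples.append((clause.subject, interaction, other))
    return triples


# The TLR-2 example sentence, hand-parsed into two passive clauses.
clauses = [
    Clause("TLR-2 expression", "enhance", ("M-CSF", "IL-10"), True),
    Clause("TLR-2 expression", "reduce", ("transforming growth factor beta1",), True),
]
for triple in infer_interactions(clauses):
    print(triple)
```

Handling the passive voice matters here: the grammatical subject (TLR-2 expression) is the target, while the agents in the "by" phrase are the regulators.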

Page 13

Mammal Interaction Database

Mammal [human, mouse, rat]

Page 14

Step 1 – Draw (sub)structure

Page 15

Step 2 – Perform similarity or substructure search

Page 16

Step 3 - Build network with hits

Page 17

Example of Interaction Network

Page 18

Analyze Network

• Relevance interactions: binding, transcription, post-translational, small molecules, metabolism or transport regulation, etc.

• Interaction networks: shortest-path network, network regulators, network targets, etc.

• Advanced analysis: relevance list, custom relevance interactions, custom interaction network, etc.

• Enrichment analysis: GO group enrichment, similar pathways, etc.

• Numerical data analysis: if numerical data are present
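Two of these analyses can be sketched with standard algorithms. This assumes, hypothetically, that the network is stored as a directed adjacency dictionary (so a shortest path is found by breadth-first search) and that GO group enrichment is scored with a one-sided hypergeometric test, a common choice; the deck does not state which statistic the tool actually uses, and the network below is invented.

```python
from collections import deque
from math import comb


def shortest_path(network, start, goal):
    """Breadth-first search for a shortest directed path in an unweighted network.

    network: {node: [neighbor, ...]}. Returns the list of nodes, or None if unreachable.
    """
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in network.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def enrichment_p_value(total, group_size, selected, overlap):
    """One-sided hypergeometric P(X >= overlap), a common GO enrichment statistic.

    total: annotated genes in the background; group_size: genes in the GO group;
    selected: genes in the network; overlap: network genes that fall in the GO group.
    """
    upper = min(group_size, selected)
    count = sum(
        comb(group_size, k) * comb(total - group_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return count / comb(total, selected)


# Hypothetical network: compound -> proteins -> process.
network = {"drug": ["P1", "P2"], "P1": ["P3"], "P2": ["P3"], "P3": ["steatosis"]}
print(shortest_path(network, "drug", "steatosis"))
print(enrichment_p_value(total=10, group_size=5, selected=5, overlap=5))
```

Breadth-first search suits an unweighted interaction network because the first time it reaches the goal, it has done so through the fewest edges.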

Page 19

Case Study

Potential hepatotoxic side-effects of lead molecules

Page 20

1 – Draw Structure and perform search

Page 21

2 - Gather Hits

Page 22

3 - Generate Network

Cholestatic Role of Chlorpromazine

Page 23

4 - Analyze Biological Processes

Chlorpromazine

Processes: hypersensitivity, membrane fluidity, portal tract inflammation, SAM protection, hepatocyte damage, mitochondrial damage, bile acid independent flow, bile duct proliferation, bile salt inspissation, biliary permeability, pericanalicular microfibrils, canalicular dilation, microvilli reduction

Proteins: microsomal enzymes, canalicular membrane ATPase, diacylcholine phosphotransferase, adenylate cyclase, phospholipase A2, cytochrome p oxidase, sodium-potassium exchanging ATPase, ATP synthase, IL2, Cyp23IP, TNF, leucine aminopeptidase3

Page 24

Case Study 2

Page 25

Hits matching

• Multiple matches found, including amiodarone

• Network created with a focus on the liver

• Analysis performed on the results
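Focusing a network on one organ can be sketched as a filter over context-annotated interactions. This sketch assumes, hypothetically, that each extracted interaction carries a set of tissue or organ terms recorded alongside it (for example, terms found in the same abstract); the deck does not say how the liver focus is implemented, and `focus_network` and the records below are invented for illustration.

```python
def focus_network(interactions, compound, tissue):
    """Keep a compound's interactions whose recorded contexts include the given tissue.

    interactions: list of dicts with "source", "relation", "target" and
    "contexts" (tissue or organ terms attached during extraction).
    """
    tissue = tissue.lower()
    return [
        (rec["source"], rec["relation"], rec["target"])
        for rec in interactions
        if rec["source"] == compound
        and tissue in {context.lower() for context in rec["contexts"]}
    ]


# Invented records for a hypothetical compound, for illustration only.
records = [
    {"source": "Compound A", "relation": "binds", "target": "P1", "contexts": {"liver"}},
    {"source": "Compound A", "relation": "inhibits", "target": "P2", "contexts": {"heart"}},
    {"source": "Compound B", "relation": "binds", "target": "P3", "contexts": {"liver"}},
]
print(focus_network(records, "Compound A", "Liver"))
```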

Page 26

Interactions of amiodarone in the liver

Page 27

Steatosis Network

Page 28

Amiodarone

Processes: toxic hepatitis, portal tract inflammation, fibrosis, lipidosis, hepatocyte damage, phospholipidosis, steatosis

Proteins: phospholipase A2, phospholipase C, Cyp2E1, Cyp3A, CYP3A4, voltage-gated potassium channel, SAM domain protein

Table view of interactions and proteins
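A table view like this one can be derived from a network by grouping a node's neighbors by type. A minimal sketch, assuming edges are stored as (source, target) pairs and each node carries a type label; the function name `neighbor_table` is hypothetical, and the edges below are illustrative, built from a few node names on this slide rather than taken from the underlying database.

```python
def neighbor_table(edges, node_types, center):
    """Group the neighbors of `center` by node type, ignoring edge direction.

    edges: iterable of (source, target) pairs; node_types: {node: type label}.
    """
    table = {}
    for source, target in edges:
        if source == center:
            neighbor = target
        elif target == center:
            neighbor = source
        else:
            continue  # edge does not touch the center node
        kind = node_types.get(neighbor, "other")
        table.setdefault(kind, set()).add(neighbor)
    return {kind: sorted(names) for kind, names in table.items()}


# Illustrative edges and types using a few node names from this slide.
edges = [
    ("Amiodarone", "steatosis"),
    ("Amiodarone", "phospholipidosis"),
    ("Amiodarone", "CYP3A4"),
    ("Amiodarone", "phospholipase A2"),
]
node_types = {
    "steatosis": "process",
    "phospholipidosis": "process",
    "CYP3A4": "protein",
    "phospholipase A2": "protein",
}
for kind, names in sorted(neighbor_table(edges, node_types, "Amiodarone").items()):
    print(f"{kind:<10}{', '.join(names)}")
```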

Page 29

Conclusions

• Combining structure-based searches with an interaction database allows in silico assessment of the potential liabilities of a lead molecule

• We have performed text mining using Natural Language Processing (NLP). The approach uses both syntactic and semantic analysis of sentences, along with inferencing.

• We have applied NLP to PubMed abstracts to create a database of interactions involving proteins, small molecules and genes

• We can perform similarity and substructure searches against this database to generate a network from the hits

• We have demonstrated this approach in two case studies that highlight scaffold liabilities for hepatotoxicity

Page 30

Acknowledgements

• Pathway Architect™ Team
• Sarchitect™ Team
• Vaijayanti Gupta
• R. Nalini

Page 31

Thank You