Data Mining & Molecular Modelingmedapps.med.harvard.edu/kassislab/Pavel_web/Course_JPNM2007.… · Data Mining & Molecular Modeling Pavel Pospisil Department of Radiology ... molecules

Jun 18, 2020

Data Mining & Molecular Modeling

Pavel Pospisil

Department of Radiology Experimental Radionuclide Therapy

Harvard Medical School

To download this talk, see my personal website: type "Pavel Pospisil" into Google; it is the first hit.

Outlook

• Data Mining – why data mining?
• Bioinformatics
• Cheminformatics

• Molecular Modeling – why do we do models?
• Macromolecules
• Small molecules

• Examples of studies
• Let’s try it now!

[Diagram: overview connecting Data Mining and Molecular Modeling. Topics shown: Bioinformatics, Knowledge bases, Databases, Literature, Microarray, Chemical microarray, HTS, Cheminformatics, Chemical databases, QSAR, Pharmacophore, Ligand-based drug design, Structure-based drug design, Homology modeling, Docking]

Bioinformatics

“Bioinformatics derives knowledge from computer analysis of biological data.”
Biological data: information stored in the genetic code, but also experimental results from various sources, patient statistics, medical imaging, and scientific literature.

Bioinformatics finds genes or proteins involved in a particular disease and identifies novel therapeutic targets.
Bioinformatics creates data mining tools.

Cheminformatics

Informatics of chemical databases
Calculation of chemical properties
QSAR – quantitative structure-activity relationship
Chemical descriptors – describe molecules in 2-D or 3-D
Ligand-based drug design

Example from Accelrys web page

Text Mining

PubMed
16 M articles
4000 full-access articles
5000 journals
52% non-US journals

ISI Web of Knowledge
Access, analyze, and manage research literature
More than PubMed: all of science

Cross-product searching
Links to full text
Personal journal lists
Personal bibliographic management

Cool: your personal impact factor, based on where YOU are cited.

Visit the web pages through Countway library and Harvard PIN

NCBI GenBank and Entrez Gene

Entrez Gene
“~all genes”

Conflict with American Chemical Society (ACS)

GenBank
“~all genes”
NIH genetic sequence database
130 B nucleotides
66 M sequences
248 K species
Billions of inquiries made yearly

www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene

www.ncbi.nlm.nih.gov/gquery

Entrez www.ncbi.nlm.nih.gov/gquery/gquery.fcgi
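
The Entrez databases can also be queried from a script. The sketch below only builds a search URL for the Gene database; it assumes NCBI's public E-utilities ESearch endpoint, which is not mentioned on the slide, and uses a hypothetical example query (TP53 in human). It does not send the request.

```python
from urllib.parse import urlencode

# Assumption: NCBI's public E-utilities ESearch endpoint (not named on the slide).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_search_url(term: str, retmax: int = 5) -> str:
    """Return an ESearch URL that looks up `term` in the Entrez Gene database."""
    params = {"db": "gene", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical query: gene symbol TP53 restricted to human.
    print(build_gene_search_url("TP53[sym] AND human[orgn]"))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the matching Gene IDs; the field tags `[sym]` and `[orgn]` restrict the search to gene symbol and organism.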

Protein Databases

UniProt
All proteins
UniProt is a centralized resource for protein sequences and functional information.
UniProt was created by joining together the information from Swiss-Prot, TrEMBL and PIR.
Over 3 M proteins.

PDB
All proteins with resolved 3-dimensional structure
Over 36 K proteins (and other macromolecules).
Also modeled proteins.

www.rcsb.org
www.uniprot.org

PubChem or SciFinder

PubChem
“~all open-access chemicals”
Small molecules with biological activities.
Started with 500 K molecules of NCI (National Cancer Institute) and 350 K toxicology molecules of NLM (National Library of Medicine).
Today 8 M molecules contributed by more than 20 commercial and scientific organizations.

SciFinder
World's largest collection of biochemical, chemical, medical, and other related information.
Scientific information in journals and patent literature from around the world.
12 M single- and multi-step reactions
1.5 B predicted and experimental properties
Original source and final authority for CAS Registry Numbers®
All patent records

pubchem.ncbi.nlm.nih.gov

Free access.
Download through www.chem.harvard.edu/library/databases.php

ChemBank and DiscoveryGate

www.discoverygate.com

Harvard access free!

DiscoveryGate
All chemical compounds; also contains:
Beilstein – since 18th century
Gmelin – inorganic and organometallics
ACD – commercially available chemicals
Access compounds and related data, reactions, original journal articles and patents

ChemBank
A public, web-based informatics environment created by the Broad Institute and funded in large part by NCI.
Stores cell measurements derived from cell lines treated with small molecules.
Small molecule screens

http://chembank.broad.harvard.edu

Harvard access free.

The Gene Ontology

All descriptions
A controlled vocabulary to describe gene and gene product attributes in any organism, giving consistent descriptions of gene products in different databases.

The building blocks of the Gene Ontology are the terms: e.g. cell, fibroblast, growth factor receptor binding, or signal transduction.

The three organizing principles of GO are:
cellular component
biological process
molecular function

www.geneontology.org

Microarrays

Oncomine
All cancer microarrays
1125 cancer microarray studies
Over 18 K microarrays
Over 580 M data points
39 cancer types

Stanford Microarray Database
All cancer microarrays
Over 65 K microarray experiments
Over 11 K publicly available
50 organisms

Partly free access.

www.oncomine.org
Free access.

http://genome-www5.stanford.edu/index.shtml

Knowledge Bases

Ingenuity
• world’s largest ontology
• 20000 genes
• 1 million pathway interactions retrieved from 36 full text curated, peer-reviewed journals

LSGraph
• PubMed-based search based on combined keywords and retrieval of most cited proteins
• Functional neighbors
• Link to gene/protein databases and Gene Ontology

http://www.it-omics.com
http://www.ingenuity.com

Example of the Ingenuity Network

[Figure: network laid out by cellular compartment: Extracellular Space, Plasma Membrane, Cytoplasm, Nucleus]

iHOP – Information Hyperlinked over Proteins

A good way to find everything about a gene or protein

www.ihop-net.org

Example: data mining in the most serious cancers

[Flow chart labels: LSGraph – abstracts with proteins related to extracellular OR membrane environment; all cited proteins retrieved. Ingenuity – proteins in cancer and extracell. space.]

                                          Prostate  Breast  Lung  Colon  Ovarian  Pancreas
Proteins in cancer and extracell. space      104      147    185    124     159      140
Enzymes                                       23       54     71     41      38       47

Phosphatases: 3 4 8 3 5

Other counts shown in the flow chart: 8097, 1602, 5457, 10628, 4222, 4, 12226, 2105, 14680, 1068, 1956, 2771, 1974

Legend: nr. of abstracts with protein relations; normal: nr. of genes or proteins (entities)
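
As a small worked example, the share of enzymes among the proteins found in cancer and extracellular space can be computed from the counts on this slide. The sketch assumes that the six values in each row follow the order of the cancer types as listed (Prostate, Breast, Lung, Colon, Ovarian, Pancreas) and that the enzymes are a subset of those proteins; neither assumption is stated explicitly in the transcript.

```python
# Counts copied from the slide; the column order is an assumption (see lead-in).
CANCERS = ["Prostate", "Breast", "Lung", "Colon", "Ovarian", "Pancreas"]
EXTRACELLULAR_PROTEINS = [104, 147, 185, 124, 159, 140]
ENZYMES = [23, 54, 71, 41, 38, 47]


def enzyme_share_percent(enzymes, proteins, labels):
    """Return {label: percentage of proteins that are enzymes}, rounded to 0.1."""
    return {
        label: round(100.0 * e / p, 1)
        for label, e, p in zip(labels, enzymes, proteins)
    }


if __name__ == "__main__":
    shares = enzyme_share_percent(ENZYMES, EXTRACELLULAR_PROTEINS, CANCERS)
    for cancer, share in shares.items():
        print(f"{cancer:9s} {share:5.1f} % enzymes")
```

Under these assumptions, enzymes make up roughly a fifth to two fifths of the retrieved extracellular proteins, depending on the cancer type.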

From bioinformatics to modeling

Once data mining identifies a suitable therapeutic target, e.g. a protein:

The target structure is studied or modeled – macromolecule modeling

Small organic molecules (ligands) are designed to bind to the protein – drug design, small molecule modeling

[Figure: chemical structure drawing (atom labels NH, P, O, OH)]

Rational drug design

Molecular Modeling

• Visualization of molecules – molecular models
Small molecules (Mw ≤ 500 g/mol)
Large molecules – polynucleotides (genes), polypeptides (proteins)

• Calculation of molecular physicochemical properties
• Minimization and dynamic simulation = optimization of conformations
• Interactions between molecules = docking
• Virtual screening = automatic docking of chemical libraries

What do we need?
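
One of the simplest physicochemical properties is the molecular weight used above to separate small molecules (Mw ≤ 500 g/mol) from large ones. The sketch below is a minimal calculator for flat formulas such as C9H8O4; the function names, the short table of standard atomic masses, and the example formulas (water and aspirin) are illustrative choices, not taken from the talk. Formulas with parentheses or charges are not handled.

```python
import re

# Standard atomic masses (g/mol) for a few common elements; extend as needed.
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "P": 30.974, "S": 32.06, "I": 126.904,
}


def molecular_weight(formula: str) -> float:
    """Sum atomic masses for a flat formula such as 'C9H8O4' (no parentheses)."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


def is_small_molecule(formula: str, limit: float = 500.0) -> bool:
    """Apply the Mw <= 500 g/mol threshold quoted on the slide."""
    return molecular_weight(formula) <= limit


if __name__ == "__main__":
    for name, formula in [("water", "H2O"), ("aspirin", "C9H8O4")]:
        print(f"{name}: {molecular_weight(formula):.3f} g/mol, "
              f"small molecule: {is_small_molecule(formula)}")
```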

Visualization and simple modeling

Creating molecules, calculation of basic properties

• ChemDraw (2D), Chem3D – www.cambridgesoft.com
• ViewerPro Lite – accelrys.com
• Java web applications – www.molinspiration.com

[Figure: two drawings of a water molecule]

Charges
O -0.366 [O(2)]
H  0.183 [H(1)]
H  0.183 [H(3)]

Example of the file format: .pdb (PDB format)

HETATM 1 O * 1  0.003 -0.007 0.005 0.00 0.00 O
HETATM 2 H * 1  0.325  0.449 0.794 0.00 0.00 H
HETATM 3 H * 1 -0.964 -0.007 0.005 0.00 0.00 H
CONECT 1 3 2
END
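
To show what can be done with such a file, the sketch below reads the three HETATM records of the water example and computes the O–H bond lengths and the H–O–H angle. It assumes whitespace-separated fields exactly as they appear in this transcript; real PDB files use fixed columns, so a production parser would slice by column instead. The function names are illustrative.

```python
import math

# Records copied from the slide (whitespace-separated, as in the transcript).
PDB_TEXT = """\
HETATM 1 O * 1 0.003 -0.007 0.005 0.00 0.00 O
HETATM 2 H * 1 0.325 0.449 0.794 0.00 0.00 H
HETATM 3 H * 1 -0.964 -0.007 0.005 0.00 0.00 H
CONECT 1 3 2
END
"""


def parse_atoms(text):
    """Return {serial: (element, (x, y, z))} for every HETATM record."""
    atoms = {}
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "HETATM":
            serial = int(fields[1])
            x, y, z = (float(v) for v in fields[5:8])
            atoms[serial] = (fields[2], (x, y, z))
    return atoms


def angle_deg(a, center, b):
    """Angle a-center-b in degrees."""
    v1 = [p - c for p, c in zip(a, center)]
    v2 = [p - c for p, c in zip(b, center)]
    dot = sum(i * j for i, j in zip(v1, v2))
    cos_angle = dot / (math.dist(a, center) * math.dist(b, center))
    return math.degrees(math.acos(cos_angle))


if __name__ == "__main__":
    atoms = parse_atoms(PDB_TEXT)
    oxygen, h_a, h_b = atoms[1][1], atoms[2][1], atoms[3][1]
    print(f"O-H bond lengths: {math.dist(oxygen, h_a):.3f} and "
          f"{math.dist(oxygen, h_b):.3f} angstrom")
    print(f"H-O-H angle: {angle_deg(h_a, oxygen, h_b):.1f} degrees")
```

The CONECT record (atom 1 bonded to atoms 3 and 2) is what tells a viewer to draw the two O–H bonds; the sketch relies on the same connectivity when it picks atom 1 as the angle's vertex.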

Homology modeling and Docking program examples

Homology modeling and sequence alignment
Swiss-Prot – retrieves sequences
Swiss-Model – creates models based on sequence – http://swissmodel.expasy.org
Modeller – builds comparative (homology) models – salilab.org/modeller
GPCRdb – contains GPCR models – gpcr.org

Free macromolecular modeling and visualization programs
Swiss DeepView – http://ca.expasy.org/spdbv/
Chimera – http://www.cgl.ucsf.edu/chimera/

Docking of small molecules to macromolecules
AutoDock 3.0 – free for academia – autodock.scripps.edu
DOCK 6.0 (Kuntz lab) – free for academia – http://dock.compbio.ucsf.edu/DOCK_6/index.htm
ArgusLab 4.0 docking tool – free for academia – www.planaria-software.com/arguslab40.htm

Large modeling software

• MOE - chemcomp.com

• Tripos – tripos.com

• Accelrys – accelrys.com

• Schrödinger – schrodinger.com

• OpenEye - eyesopen.com

Rational Drug Design

Receptor-based design

Molecular complementarity

Ligand-based design

Molecular overlapping

Pharmacophore-based design

Molecular mimicry

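Predicting binding affinities

[Figure: receptor-ligand binding in water. Labels: Ligand in solution; free rotation; Receptor; bound water; loosely associated water molecules; free water; Receptor-Ligand complex. Thermodynamic terms: ∆S_rt, ∆S_int, ∆S_vib, ∆S_W, ∆H_LW, ∆H_RW, ∆H_LR]

∆G_binding = f(Interactions)

$$\Delta G_{\mathrm{binding}} = RT \ln K_D$$

∆G_binding: binding free energy; R: gas constant; T: temperature; K_D: equilibrium dissociation constant

$$\Delta G = \Delta H - T\,\Delta S$$

∆G: free energy; ∆H: enthalpy; ∆S: entropy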
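
The relation between binding free energy, gas constant, temperature, and equilibrium dissociation constant can be tried out numerically. The sketch below assumes the natural logarithm, a 1 M standard state, and a temperature of 298.15 K; the function names and the example dissociation constant of 1 nM are illustrative and do not come from the talk.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    return R_KCAL * temperature_k * math.log(kd_molar)


def kd_from_delta_g(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Dissociation constant (mol/L) from a binding free energy in kcal/mol."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


if __name__ == "__main__":
    dg = delta_g_from_kd(1e-9)  # hypothetical 1 nM binder
    print(f"K_D = 1 nM  ->  dG = {dg:.2f} kcal/mol")
    print(f"dG = {dg:.2f} kcal/mol  ->  K_D = {kd_from_delta_g(dg):.2e} M")
```

A tighter binder has a smaller K_D and therefore a more negative ∆G; at room temperature each factor of ten in K_D corresponds to about 1.4 kcal/mol.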

Case Study

Overexpression of phosphatase by cancer cells

[Figure: scheme comparing a cancer cell and a healthy cell, with chemical structures of *IQ2-P and *IQ2-OH]

*IQ2-P: water-soluble, non-fluorescent prodrug
*IQ2-OH: water-insoluble, fluorescent drug
*: 123I/125I/127I/131I

• Data mining identified extracellular hydrolases overexpressed by tumor cells

EMCIT concept: Enzyme Mediated Cancer Imaging and Therapy

Results of PAP-IQ2-P Docking

[Figure: two docking panels showing the PAP active site. Labeled residues: Arg11, His12, Arg15, Arg79, His257, Asp258]

Docking using AutoDock 3.0: docking of a flexible ligand into the rigid active site of the target; genetic algorithm

PAP-IQ2-P: ∆G = -13.39 kcal/mol

PAP-BABPA: ∆G = -12.35 kcal/mol

[Figure: chemical structures of the docked ligands]

Pospisil et al., Cancer Research, in press for March 2007
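
To put the two docking energies in perspective, their difference can be converted into an approximate ratio of dissociation constants using K_D = exp(∆G/RT). This is only a back-of-the-envelope sketch: docking energies are estimates, the temperature of 298.15 K is an assumption, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

# Docking energies reported on the slide (kcal/mol).
DOCKING_DG = {"PAP-IQ2-P": -13.39, "PAP-BABPA": -12.35}


def kd_ratio(dg_a: float, dg_b: float, temperature_k: float = 298.15) -> float:
    """Return K_D(b) / K_D(a); a value above 1 means complex a binds tighter."""
    return math.exp((dg_b - dg_a) / (R_KCAL * temperature_k))


if __name__ == "__main__":
    ratio = kd_ratio(DOCKING_DG["PAP-IQ2-P"], DOCKING_DG["PAP-BABPA"])
    print(f"Predicted K_D ratio (BABPA / IQ2-P): {ratio:.1f}")
```

Taken at face value, the 1.04 kcal/mol difference would correspond to a several-fold tighter predicted binding of IQ2-P, which is within the typical uncertainty of docking scores and should be read as a qualitative ranking only.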

IQ2-P Hydrolysis In Vitro by PAP

[Figure: two plots of signal (cps) versus time (min, 0:00 to 16:00). Labels: 125IQ2-P, 125IQ2-OH, PAP; PAP concentrations 0.0001 Unit/µl and 1 Unit/µl; further labels: PAP, LNCaP, 22Rv1, HMEC]

Pospisil et al., Cancer Research, in press for March 2007

Conclusion

• Bioinformatics can link resources and reveal known/unknown information about genes and proteins, their relationships to biological functions and diseases.

• Data mining can identify the therapeutically interesting targets present in the huge corpus of knowledge.

• Molecular modeling allows exploration of target candidates.

• Docking places ligands in the target active site.
• QSAR relates compounds' activities to their structures.

Let’s exercise! Let’s try it.

Think about a protein or its gene, think about its function, cellular process...
Say its name…
Can we find it?

Where is the protein expressed?
Is the 3D structure known?

Does the protein bind a ligand, inhibitor or substrate?
Can we draw the ligand?

Docking or QSAR?