YMIB. Maze in biology: the pathway problem. Ueng-Cheng Yang (楊永正), Institute of Bioinformatics, National Yang-Ming University. Nov. 14, 2003


YMIB

Maze in biology: the pathway problem

Ueng-Cheng Yang (楊永正)

Institute of Bioinformatics

National Yang-Ming University

Nov. 14, 2003

http://www.flint.umich.edu/Departments/ITS/crac/mazeorig.form.html

YMIB

Embryonic development: oogenesis (mRNA localization) → fertilization (sperm + oocyte) → 1st cleavage (2 identical cells) → 2nd cleavage (4 identical cells) → 3rd cleavage (8 cells with 2 different cell types)

The genome is the complete set of genetic material, similar to the programs stored in ROM

YMIB

Gene expression of eukaryotes

Picture taken from Lehninger’s “Principles of Biochemistry”

YMIB

A microarray (gene chip) is a high-throughput technique that can measure the expression of thousands of genes at a time

Black box

Changes in gene expression

Perturbation

YMIB

Presentation of life and knowledge management

Sequence information

decompress

Expression level

Tissue (spatial)

Development (temporal)

Genes

YMIB

Transform or out of the game?

http://www.sciencemag.org/cgi/content/full/291/5507/1221/F1

Global: high-throughput analysis

Local: individual analysis

YMIB

Bioinformatics should provide the direction for future biology

Bioinformatics research

Genome, transcriptome and proteome research

Collect data

Interpret data

tatttctctactgatttgaacaagattgtcgagaaattcccaaaacaagccgaaaaattg

Data  => Information => Knowledge => Technique => Economy

YMIB

Are there rules in biology?

* Picture made from screenshot of http://www.shef.ac.uk/~chem/web-elements/

YMIB

Should there be rules in biology?

Gene duplication

Variation (mutation)

Gene duplication

Recombination

+

YMIB

Pathway study is one of the most fundamental problems for biological research at the molecular level

• Metabolism
• Signal transduction
• Biosynthesis of macromolecules (mechanism study)
  – Replication
  – Transcription
  – RNA processing
  – Translation

YMIB

Similar chemistry can be re-used in different enzymes

HOOC-CH2-CH2-CO-COOH + NAD+ + CoASH → HOOC-CH2-CH2-CO-S-CoA + CO2 + NADH + H+

α-ketoglutarate → succinyl CoA

YMIB

Paralogous genes may have similar functions

Linear molecules

pyruvate (3) → acetyl CoA (2) + CO2

α-ketobutyrate (4) → propionyl CoA (3) + CO2

α-ketoglutarate (5) → succinyl CoA (4) + CO2

α-ketoadipic acid (6) → glutaryl CoA (5) + CO2

Branched molecules

α-ketoisovalerate (5) → isobutyryl CoA (4) + CO2

α-ketoisocaproic acid (6) → isovaleryl CoA (5) + CO2

α-keto-β-methylvalerate (6) → α-methylbutyryl CoA (5) + CO2

YMIB

Observation (III): “Dehydrogenation, hydration, dehydrogenation” is a pathway module

[Figure: TCA cycle. Acetyl CoA + OAA → citrate → isocitrate → α-ketoglutarate → succinyl CoA → succinate → fumarate → malate → OAA. Step labels in the figure: -2H and -CO2 (twice, "release CO2"), CoA, CoA + GTP ("reforming the carrier"), -2H, H2O, -2H.]

YMIB

A set of reactions can be “re-used” together

[Figure: one round of fatty acyl CoA oxidation. RCH2CH2CH2CO-S-CoA → (-2H) → RCH2CH=CHCO-S-CoA → (+H2O) → RCH2CH(OH)CH2CO-S-CoA → (-2H) → RCH2COCH2CO-S-CoA. Repeating the round shortens the chain by one acetyl CoA each time: R(CH2)6CO-S-CoA → R(CH2)4CO-S-CoA → R(CH2)2CO-S-CoA.]

YMIB

A single reaction may create a new pathway

[Figure: carbon-number diagrams for photosynthesis and the pentose phosphate cycle. Sugars are labeled by carbon count (3, 5 + 5, 3 + 7, 4 + 6, 6) and are interconverted by transketolase and transaldolase.]

YMIB

The pathway problems that might be obvious to physicists

Pathway simulation => hypothetical cell
  – Flux balance analysis
  – S-system
  – … etc.
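The S-system named above models each concentration with one power-law production term and one power-law degradation term, dX_i/dt = alpha_i * prod_j X_j^g_ij - beta_i * prod_j X_j^h_ij. The following is a minimal sketch under stated assumptions: the two-metabolite system, all parameter values, and the forward-Euler integrator are hypothetical illustrations, not taken from the talk.

```python
import math


def s_system_step(x, alpha, beta, g, h, dt):
    """Advance an S-system by one forward-Euler step.

    dX_i/dt = alpha_i * prod_j X_j**g[i][j] - beta_i * prod_j X_j**h[i][j]
    """
    new_x = []
    for i in range(len(x)):
        production = alpha[i] * math.prod(xj ** g[i][j] for j, xj in enumerate(x))
        degradation = beta[i] * math.prod(xj ** h[i][j] for j, xj in enumerate(x))
        new_x.append(x[i] + dt * (production - degradation))
    return new_x


def simulate(x0, alpha, beta, g, h, dt=0.01, steps=5000):
    """Integrate from x0 for a fixed number of Euler steps and return the final state."""
    x = list(x0)
    for _ in range(steps):
        x = s_system_step(x, alpha, beta, g, h, dt)
    return x


# Hypothetical example: X1 is made at a rate inhibited by the end product X2
# (negative kinetic order), and X1 is converted into X2.
alpha = [2.0, 1.0]
beta = [1.0, 1.0]
g = [[0.0, -0.5],  # production of X1: inhibited by X2
     [1.0, 0.0]]   # production of X2: proportional to X1
h = [[1.0, 0.0],   # loss of X1: proportional to X1
     [0.0, 1.0]]   # loss of X2: proportional to X2

steady = simulate([1.0, 1.0], alpha, beta, g, h)
print(steady)
```

For these assumed parameters the steady state can be checked by hand: X1 = X2 and 2 * X2^-0.5 = X1 give X1 = X2 = 2^(2/3), about 1.587.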

YMIB

Complicated feedback regulation

A B C D

W X Y Z(-)

(-)

"X" (such as ADP) will accumulate if this reaction is inhibited.

YMIB

Cell cycle and simulation of complex biological events

M → G1 → S → G2 → M (G1, S and G2 together form interphase)

YMIB

Other types of pathway problems

• Pathway discovery
  – From protein-protein interaction and microarray
• Pathway reconstruction
  – Genome annotation and interpretation
• Pathway simulation => hypothetical cell
  – Flux balance analysis
  – S-system

YMIB

Information integration is the first step for data mining

Modification, expression, interaction, structure

DNA

RNA

transcription

translation

protein

Genomic seq.

EST, SAGE, gene chips

Annotation, comparison

YMIB

Different cells have the same genome, but they express different sets of genes after differentiation

Gene     Colon  Kidney  Lung  Ovary  Small intestine  Testis  Thyroid  …  Total
EGF          0      15     1      0                0       0        0  …     19
EGFR         3       4    19      9                0       0        0  …    103
PLCG1        1       3     7      1                2       1        0  …     68
SHC1         4      10    22      1                0       3        1  …    249
GRB2         1       1     3      2                0       0        2  …     77
SOS1         4       3     0      2                0       0        0  …     36
HRAS         1       7    10      0                2       1        0  …     58
RAF1         4       6    28      1                3       4        0  …    197
MAP3K1       2       8     2      2                0       0        0  …     44
MAP2K4       5       6     1      3                1       4        0  …     81
MAP2K1       4      10     3      2                0       2        0  …     82
MAPK8        1       2     2      0                0       1        0  …     33
STAT1       13      32    14      6                4       6        3  …    260
STAT3        3       7    17      7                0       1        0  …    135
MAPK3        9      10     9      4                1       1        0  …    181
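One way to read the table above is to ask, for each gene, which listed tissue contributes the largest share of its total count. The sketch below assumes the numbers are per-tissue counts and that the last column is the total over all tissues (including those elided by "…"); the function name `top_tissue` is a hypothetical helper, and only three rows are copied in for brevity.

```python
TISSUES = ["Colon", "Kidney", "Lung", "Ovary", "Small intestine", "Testis", "Thyroid"]

# Rows copied from the table: per-tissue counts, then the overall total.
COUNTS = {
    "EGF": ([0, 15, 1, 0, 0, 0, 0], 19),
    "EGFR": ([3, 4, 19, 9, 0, 0, 0], 103),
    "STAT1": ([13, 32, 14, 6, 4, 6, 3], 260),
}


def top_tissue(gene):
    """Return (tissue, share of total) for the listed tissue with the highest count."""
    per_tissue, total = COUNTS[gene]
    best = max(range(len(TISSUES)), key=lambda i: per_tissue[i])
    return TISSUES[best], per_tissue[best] / total


for gene in COUNTS:
    tissue, share = top_tissue(gene)
    print(f"{gene}: {tissue} {share:.2f}")
```

Under this reading, EGF looks strongly kidney-enriched (15 of 19 counts), whereas STAT1 is spread across many tissues.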

YMIB

Organizing the known information: Integrating different types of pathways

Signal transduction

Gene regulatory network

Metabolic pathway

CDK E2F PFK

F6P

F1,6P

EGF

Glycolysis

YMIB

Steps in pathway discovery

Factors involved => Components

Molecular interaction => Events

Order of events => Pathways

Pathway interaction => Circuits

YMIB

The dream of molecular biologists

?

Cell., 100(1):57–70 Review, 2000.

PNAS, Vol. 95, 14863-14868

Science. Vol 292. May,2001

YMIB

Appropriate presentation format is essential for computation

[EGFR]+[EGF] <-> [EGF-EGFR]

[EGF-EGFR]+[EGF-EGFR] <->[(EGF-EGFR)2]

[(EGF-EGFR)2]<->[(EGF-EGFR*)2]

[(EGF-EGFR*)2]+[GAP]<->[(EGF-EGFR*)2-GAP]

[(EGF-EGFR*)2-GAP]+[Grb2]<->[(EGF-EGFR*)2-GAP-Grb2]

[(EGF-EGFR*)2-GAP-Grb2]+[Sos]<->[(EGF-EGFR*)2-GAP-Grb2-Sos]

[(EGF-EGFR*)2-GAP-Grb2-Sos]+[Ras-GDP]<->[(EGF-EGFR*)2-GAP-Grb2-Sos-Ras-GDP]

[(EGF-EGFR*)2-GAP-Grb2-Sos-Ras-GDP]<->[(EGF-EGFR*)2-GAP-Grb2-Sos]+[Ras-GTP]

[Raf]+[Ras-GTP]<->[Raf-Ras-GTP]

[Raf-Ras-GTP]<->[Raf*]+[Ras-GTP*]

Nature biotechnology 20, 370-375
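To illustrate why a regular text format helps computation, the reaction lines above can be parsed mechanically into reactant and product lists. This is a minimal sketch, assuming every species is wrapped in square brackets and every reaction uses `<->`; the `Reaction` type and `parse_reaction` function are hypothetical names, not part of any cited tool.

```python
import re
from typing import NamedTuple


class Reaction(NamedTuple):
    reactants: tuple
    products: tuple


# A species is whatever sits between one "[" and the next "]".
SPECIES = re.compile(r"\[([^\]]+)\]")


def parse_reaction(line):
    """Split a reversible reaction on '<->' and collect the bracketed species."""
    left, right = line.split("<->")
    return Reaction(tuple(SPECIES.findall(left)), tuple(SPECIES.findall(right)))


lines = [
    "[EGFR]+[EGF] <-> [EGF-EGFR]",
    "[EGF-EGFR]+[EGF-EGFR] <->[(EGF-EGFR)2]",
    "[Raf]+[Ras-GTP]<->[Raf-Ras-GTP]",
]
reactions = [parse_reaction(line) for line in lines]
species = sorted({s for r in reactions for s in r.reactants + r.products})
print(species)
```

Once the reactions are in this form, building a species list or a stoichiometry table is a matter of iterating over `reactions`.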

YMIB

Strategy

Nucleus

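[Figure: reconstruction strategy between the cell membrane and the nucleus. Inward reconstruction proceeds from the cell membrane (receptor, adaptor, ?, ?); outward reconstruction proceeds from the nucleus (Z, Y, X, ?); a connector joins the two.]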

YMIB

Reconstructing pathways based on protein-protein interaction

Receptor

adaptor

… etc.

inward reconstruction

YMIB

Identifying new receptors is the starting point for inward reconstruction

YMIB

The distribution of death-domain-containing genes in the human genome

[Figure: chromosome map with gene positions numbered 1 to 30]

YMIB

[Figure: phylogenetic tree with clusters A to F (scale bar 0.1). Labeled members: UNC5A, UNC5B, UNC5C, UNC5D, NFKB1, NFKB2, DAPK1, NY-REN-64, MALT1, IRAK1, IRAK2, IRAK-M, EDAR, NGFR, CRADD, FADD, TRADD, RIPK1, TNFRSF21, LRDD, TNFRSF12, TNFRSF1A, TNFRSF10A, TNFRSF10B, TNFRSF11B, TNFRSF6, P84, MYD88, ANK1, ANK2, ANK3.]

Phylogenetic clusters correlate with protein functions

YMIB

Functional correlation: Tissue specificity of gene expression

brain tissues

Paralogous genes

YMIB

Specificity of protein-protein interaction

[Figure: the phylogenetic tree from the slide "Phylogenetic clusters correlate with protein functions" (clusters A to F), annotated with the interactions below]

TNFRSF1A, TNFRSF12 --- TRADD --- FADD

TNFRSF6, TNFRSF10A, TNFRSF10B --- FADD
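Inward reconstruction from protein-protein interactions can be sketched as a graph search that starts at a receptor and follows interaction edges. The example below uses only the interactions listed on this slide; treating them as an undirected graph and using breadth-first search are assumptions made for illustration, and `build_graph` and `inward_path` are hypothetical helper names.

```python
from collections import deque

# Interactions listed on the slide, stored as undirected pairs.
INTERACTIONS = [
    ("TNFRSF1A", "TRADD"), ("TNFRSF12", "TRADD"), ("TRADD", "FADD"),
    ("TNFRSF6", "FADD"), ("TNFRSF10A", "FADD"), ("TNFRSF10B", "FADD"),
]


def build_graph(pairs):
    """Build an adjacency map from a list of interacting protein pairs."""
    graph = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def inward_path(graph, receptor, target):
    """Breadth-first search: shortest chain of interactions from receptor to target."""
    queue = deque([[receptor]])
    seen = {receptor}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no chain of interactions connects the two proteins


graph = build_graph(INTERACTIONS)
print(inward_path(graph, "TNFRSF1A", "FADD"))
print(inward_path(graph, "TNFRSF6", "FADD"))
```

The search reproduces the slide's point: TNFRSF1A reaches FADD through the adaptor TRADD, while TNFRSF6 binds FADD directly.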

YMIB

Reconstructing pathways based on gene expression and pathway information

[Figure: outward reconstruction from the nucleus toward the cell membrane, with the labels Jun, MAPK8-P*, MAP2K4-P* and an unknown upstream component (?)]

YMIB

Related pathways in heart

YMIB

Related pathways can be discovered by looking for shared components among pathways

[Figure: numbers of shared components between pathway pairs (values between 13 and 25)]

Pathway1      Pathway2         Index
pdgfPathway   egfPathway       1.96e-40
pdgfPathway   tpoPathway       9.89e-27
pdgfPathway   igf1Pathway      2.26e-22
pdgfPathway   insulinPathway   2.26e-22
egfPathway    igf1Pathway      2.26e-22
egfPathway    insulinPathway   2.26e-22
pdgfPathway   ngfPathway       2.20e-22
…             …                …
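The slide does not say how the Index column is computed. One plausible way to score the overlap of two pathways, shown here purely as an assumption, is a hypergeometric tail probability: the chance of seeing at least the observed number of shared components if the second pathway were drawn at random from a universe of genes. The function name and all numbers in the example are hypothetical.

```python
from math import comb


def shared_component_pvalue(universe, size1, size2, shared):
    """P(overlap >= shared) when size2 genes are drawn at random from `universe`
    genes, of which size1 belong to the first pathway (hypergeometric tail)."""
    total = comb(universe, size2)
    upper = min(size1, size2)
    tail = sum(
        comb(size1, k) * comb(universe - size1, size2 - k)
        for k in range(shared, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10,000 genes, two pathways of 30 genes sharing 25.
print(shared_component_pvalue(10_000, 30, 30, 25))
```

A smaller value means the overlap is less likely to arise by chance, which is the direction in which the table appears to be sorted.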

YMIB

To die, or not to die? It’s a signaling problem

YMIB

If the PDGF receptor does not exist in colon, why do we need the downstream components of the PDGF signaling pathway?

YMIB

“MAP2K4, MAPK8, Jun” is a pathway module shared by at least 3 pathways

PDGF 11

EGF 11

TNF 21

EGF/PDGF 16

ALL 4

YMIB

Pathway modules

MAP3K1 (MEKK1) module

RAF1 (RAF) module

MAP3K7 (TAK) module

Death signal

Growth signal

Stress signal

HRAS

TRAF2

FOS JUN ATF2 SP1

Gene expression regulation, (including transcription, splicing), translation and protein modification…

RPS6KA5

YMIB

Connector

Factors involved => Components

Molecular interaction => Events

Order of events => Pathways

YMIB

Inducible gene sets are co-regulated.

Picture taken from http://genomics.stanford.edu/yeast/additional_figures_link.html

YMIB

Most constitutively expressed genes are not regulated

Pyruvate kinase

Rate-limiting step is usually the target for regulation

YMIB

Microarray experiments are nature’s way to classify genes

Collect sections from different angles

Image reconstruction

http://www.npcc.gov.tw/npcc/chn/imaging/imaging.htm

Tomography (斷層掃瞄)

YMIB

In extreme environments, the whole pathway can be turned on/off

ALPHA = alpha factor arrest 18
ELU = centrifugal elutriation 14
CDC15 = cdc15 ts 15
SPO = sporulation 7
HT = shock by high temp 6
D = reducing agent 4
C = low temp 4
DX = diauxic shift 7

Clustering is driven by these features

ALPHA ELU CDC15 SPO HT D C DX

Conflicts?

YMIB

Unrelated sequences of similar function cluster together

Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863-14868.
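Expression clustering of this kind rests on a similarity score between the expression profiles of two genes across conditions. A correlation coefficient is a common choice; the sketch below computes the Pearson correlation in plain Python and picks the most similar pair. The gene names and profile values are made up for illustration and do not come from the cited data set.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical log-ratio profiles over four conditions.
profiles = {
    "geneA": [0.1, 1.2, 2.3, 1.1],
    "geneB": [0.2, 1.0, 2.5, 0.9],
    "geneC": [2.0, 0.5, -1.0, 0.4],
}

# The pair with the highest correlation would be merged first in
# an agglomerative (bottom-up) clustering.
best_pair = max(
    combinations(sorted(profiles), 2),
    key=lambda pair: pearson(profiles[pair[0]], profiles[pair[1]]),
)
print(best_pair)
```

Because the score depends only on the shape of the profile, two genes with unrelated sequences can still land in the same cluster if they respond to the same conditions.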

YMIB

How good is the classification?

• In microarray clustering
  – hexokinase II
  – phosphofructokinase
  – aldolase
  – triose phosphate isomerase
  – GAPDH 1, 2, 3
  – phosphoglycerate kinase
  – phosphoglycerate mutase
  – Enolase II
  – pyruvate kinase
• Glycolysis involves 10 enzymes in total
• The microarray experiment missed only phosphoglucose isomerase
• Pyruvate (de)carboxylase and transaldolase are misplaced

Pretty good

YMIB

Pathway is a subset of components in a regulatory network

How can we reconstruct the network from partial pathways?

YMIB

Tri-component relation is better than bi-component relation

YMIB

Distinguishing branch and linear structures is sufficient

YMIB

Distinguish the branch and linear structures

YMIB

Exact order within a subset is not essential to reconstruct the pathway

3 4 5 6 7

Ordered subsets: {3,4,5} {4,5,6} {5,6,7}

Unordered subsets: {4,5,3} {5,4,6} {7,5,6}

Either way: 3=>4=>5=>6=>7
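The idea on this slide can be sketched as follows: overlapping three-member subsets are chained by their two shared members, and each component is then placed by the first and last subset in which it appears. This is an illustrative sketch for a purely linear pathway, not the speaker's algorithm; the function name is hypothetical, and the direction must be supplied by naming one end component, since unordered subsets cannot tell upstream from downstream.

```python
def order_from_triples(triples, start):
    """Recover a linear order from unordered, overlapping 3-member subsets.

    `start` must be an end component, i.e. one that occurs in a single subset.
    """
    windows = [frozenset(t) for t in triples]
    first_window = next(w for w in windows if start in w)
    remaining = [w for w in windows if w is not first_window]
    chain = [first_window]
    while remaining:
        # The next subset along the pathway shares exactly two members.
        nxt = next((w for w in remaining if len(w & chain[-1]) == 2), None)
        if nxt is None:
            raise ValueError("subsets do not form a single linear chain")
        chain.append(nxt)
        remaining.remove(nxt)
    first_seen, last_seen = {}, {}
    for index, window in enumerate(chain):
        for component in window:
            first_seen.setdefault(component, index)
            last_seen[component] = index
    # Earlier components enter the chain sooner and leave it sooner.
    return sorted(first_seen, key=lambda c: (first_seen[c], last_seen[c]))


subsets = [{5, 4, 6}, {7, 5, 6}, {4, 5, 3}]
print(order_from_triples(subsets, start=3))
```

With the slide's subsets and component 3 as the starting end, the result is 3, 4, 5, 6, 7; starting from 7 gives the same chain reversed.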

YMIB

Integrating discontinuous tri-component relation

YMIB

Summary

• Inward reconstruction
  – Look for novel receptors by protein domain search
  – Look for possible pathways by protein-protein interaction information
• Connector
  – Look for tri-component relations by learning a Bayesian network
• Outward reconstruction
  – Look for pathway modules
  – Establish the transcription regulation network

Need a user-centric environment for information-driven biomedical research

YMIB

Acknowledgements

• Yuh-Fan Liu: Genome wide motif scanning

• Yung-Wen Deng: Death domain resource and cross talks among pathways

• Yu-Tai Wang: Pathway knowledge management system

• Kai-Lung Tang: Pathway visualization

• Shih-Te Yang: Pathway prediction

• Collaborator: Dr. Der-Ming Liou

YMIB

Complications in regulation

Alternative pathways caused by alternative splicing events

YMIB

Differential Processing of The Calcitonin Gene Transcript in Rats

Picture taken from Lehninger’s “Principles of Biochemistry”

YMIB

A tumor necrosis factor receptor that lacks a transmembrane region

YMIB

A FADD protein that lacks the DED domain

YMIB

Information-driven biomedical research

Make observations and working hypotheses by comparing information
