1 Integrated Genomics Steven Jones Genome Sciences Centre Vancouver


Jan 02, 2016


Miles Hood
Transcript
Page 1

Integrated Genomics

Steven Jones, Genome Sciences Centre

Vancouver

Page 2

Integrated Genomics

• How are biological relationships represented?
• What is the underlying topology of such networks?
• How do we visualize such networks?
• How can we exploit such networks?
• How do we go about building such networks?

Page 3

Biological Networks

Peter Yodzis, University of Guelph

Page 4

How to make a biological network

• Protein interaction: Y2H, TAP tagging
• ChIP-chip: protein/DNA regulatory networks
• RNAi screens
• Synthetic lethality: epistatic relationships
• Gene expression: SAGE, microarray
• Metabolic pathways
• Signalling: protein/protein, protein/small molecule

Page 5

Ho et al. Nature Jan 10 2002; 415:180-3

Page 6

Page 7

Studying networks

• How do we start to model or understand biology in terms of networks?

• What do they look like? How do they behave? How do they evolve?

Page 8

Erdös-Rényi Model

Pál Erdös (1913-1996)

Alfréd Rényi (1921-1970)

Erdös-Rényi Model:
• Classical random network theory

Page 9

Who was Paul Erdös?

• He managed to think about more problems than any other mathematician in history and could recite the details of all 1,475 of the papers he had written or co-authored. Fortified by espresso and amphetamines, Erdös did mathematics 19 hours a day, seven days a week.

• For the last 25 years of his life, Erdös raced against the specter of old age to prove as many mathematical theorems as possible. "The first sign of senility," Erdös often said, "is when a man forgets his theorems. The second sign is when he forgets to zip up. The third sign is when he forgets to zip down."

Page 10

Erdös-Rényi Model

Pál Erdös (1913-1996)

Alfréd Rényi (1921-1970)

Erdös-Rényi Model:
• Node connectivity (the number of links per node) follows a Poisson distribution
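The Poisson claim can be checked numerically. The following is a minimal Python sketch, assuming the standard G(n, p) formulation of the Erdös-Rényi model, in which every possible edge is present independently with probability p; the function names and the values n = 1000, p = 0.004 are illustrative choices, not taken from the slides. A node's degree is then Binomial(n - 1, p), which for large n and small p is close to Poisson with mean λ = (n - 1)p, so the mean and variance of the degrees should be roughly equal.

```python
import math
import random
from collections import Counter


def erdos_renyi(n, p, seed=0):
    """G(n, p): each of the n*(n-1)/2 possible edges is present
    independently with probability p. Returns adjacency sets."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def poisson_pmf(k, lam):
    """Probability of exactly k links under a Poisson(lam) distribution."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


n, p = 1000, 0.004
adj = erdos_renyi(n, p)
degrees = [len(nbrs) for nbrs in adj.values()]
lam = (n - 1) * p  # expected degree, about 4
counts = Counter(degrees)

mean = sum(degrees) / n
var = sum((d - mean) ** 2 for d in degrees) / n
print(f"mean degree {mean:.2f}, variance {var:.2f}")  # roughly equal for a Poisson
for k in range(11):
    print(f"k={k:2d}  observed={counts.get(k, 0) / n:.3f}  poisson={poisson_pmf(k, lam):.3f}")
```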

Page 11

Pál Erdös (1913-1996)

Alfréd Rényi (1921-1970)

Scale-Free Network:
• Deviation from complete randomness

Scale-Free Model
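One commonly used way to generate a scale-free network is Barabási-Albert preferential attachment: the network grows one node at a time, and each new node links preferentially to nodes that already have many links. The slides do not name a generative model, so this is a swapped-in illustration rather than the method behind the figure; `preferential_attachment`, the fully connected starting core and the parameters (5,000 nodes, 2 links per new node) are hypothetical choices for this minimal Python sketch.

```python
import random


def preferential_attachment(n, m, seed=0):
    """Grow a network node by node; each new node links to m distinct
    existing nodes chosen with probability proportional to their degree."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    adj = {v: set() for v in range(m + 1)}
    for u in range(m + 1):
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
    # Each node appears here once per link it has, so a uniform draw from
    # this list picks nodes in proportion to their degree.
    endpoints = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        adj[new] = set(targets)
        for t in targets:
            adj[t].add(new)
            endpoints.extend([new, t])
    return adj


adj = preferential_attachment(5000, 2)
degrees = sorted((len(nbrs) for nbrs in adj.values()), reverse=True)
print("mean degree:", sum(degrees) / len(degrees))  # 3.9988
print("five largest hubs:", degrees[:5])
print("nodes with the minimum of 2 links:", degrees.count(2))
```

With a similar mean degree, an Erdös-Rényi network rarely has nodes with more than a few times the average number of links, whereas preferential attachment typically produces a handful of hubs with dozens of links alongside many nodes with the minimum of two.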

Page 12

Power Law Distribution

• Research has shown that the distribution of links to all sites on the web approximates a “power law”.

• A small number of sites receive the majority of links and most sites receive very few links.

DM Pennock, GW Flake, S. Lawrence, EJ Glover and CL Giles, “Winners don’t take all: Characterizing the competition for links on the web”, (2002) PNAS 99(8):5207-5211.

Page 13

Power Law Distribution

• Power law distributions are usually plotted using log scales for both the x and y axes.

• A pure power law becomes a straight line.

• Displays the distribution of inlinks to 100,000 random web pages.

• Plotted using regular linear scales on each axis.

A small number of sites receive the majority of links and most sites receive very few links.
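The straight-line behaviour on log-log axes follows from taking logarithms of a power law P(k) = C k^(-γ): log P(k) = log C - γ log k, which is a line with slope -γ. The minimal Python sketch below uses hypothetical counts generated from an exact power law with γ = 2.1 (not the web data on the slide) and recovers the exponent as the least-squares slope of the log-log points.

```python
import math


def loglog_slope(ks, counts):
    """Least-squares slope of log10(count) against log10(k).
    For a pure power law count = C * k**(-gamma) this equals -gamma."""
    xs = [math.log10(k) for k in ks]
    ys = [math.log10(c) for c in counts]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical in-link counts following an exact power law with gamma = 2.1.
ks = list(range(1, 101))
counts = [1e6 * k ** -2.1 for k in ks]
print(round(loglog_slope(ks, counts), 3))  # -2.1
```

On real, noisy data, least-squares fits to log-log histograms can be biased, and maximum-likelihood estimators of the exponent are usually preferred.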

Page 14

Two Network Models

(Figure: Classical and Scale-Free networks; axes: Genes, Interactions)

Page 15

Inherent Robustness (and Fallibility)

• Can functionality be maintained despite errors or failures?
• Yes. Lethality correlates with the number of interactions the node (protein) has.
• Random attacks on the network will most likely hit poorly connected nodes.
• Targeted attacks on highly connected nodes will have a much greater effect.

Non-Lethal

Lethal
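The contrast between random and targeted attacks can be illustrated on a toy network. This minimal Python sketch uses a hypothetical network, not the yeast data behind the slide: five hubs joined in a chain, each carrying 20 leaf nodes, with network integrity measured as the size of the largest connected component after nodes are removed. The function names are illustrative.

```python
import random


def hub_network(n_hubs=5, leaves_per_hub=20):
    """Hypothetical toy network: hubs joined in a chain, each hub carrying
    many poorly connected 'leaf' nodes with a single link."""
    adj = {}

    def link(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for h in range(n_hubs - 1):
        link(f"hub{h}", f"hub{h + 1}")
    for h in range(n_hubs):
        for i in range(leaves_per_hub):
            link(f"hub{h}", f"leaf{h}_{i}")
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        best = max(best, size)
    return best


def random_attack(adj, k, seed=0):
    """Remove k nodes chosen uniformly at random."""
    return set(random.Random(seed).sample(sorted(adj), k))


def targeted_attack(adj, k):
    """Repeatedly remove the node with the most remaining links."""
    removed = set()
    for _ in range(k):
        remaining = (v for v in adj if v not in removed)
        removed.add(max(remaining, key=lambda v: len(adj[v] - removed)))
    return removed


adj = hub_network()
print("intact:  ", largest_component(adj, set()))                    # 105
print("random:  ", largest_component(adj, random_attack(adj, 3)))
print("targeted:", largest_component(adj, targeted_attack(adj, 3)))  # 21
```

Because 100 of the 105 nodes are leaves, a random removal usually deletes leaves and leaves the network largely intact, whereas removing the three best-connected nodes breaks it into pieces of at most 21 nodes.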

Page 16

Exploiting Biological Networks

• The topology of biological networks is similar to that of other self-organising networks such as the World Wide Web and the Internet

• Much of this work was pioneered by Albert-László Barabási, University of Notre Dame

Page 17

• H. Jeong, S.P. Mason, A.-L. Barabási, Z.N. Oltvai, “Lethality and centrality in protein networks”, Nature 411, 41-42 (2001).

Predicting perturbation of a network

Page 18

Page 19

Modeling Network Behaviours

Ideker T, Ozier O, Schwikowski B, Siegel A. Discovering regulatory and signaling circuits in molecular interaction networks. Bioinformatics 18, S233 (2002).

Cytoscape allows protein interaction data to be correlated with expression data

The hypothesis is that changes in gene expression should correlate with specific regions of the network (“Hot-Spots”)

http://www.cytoscape.org/
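The hot-spot hypothesis can be made concrete with the aggregate z-score used in the Ideker et al. (2002) paper cited above: each gene's differential-expression p-value p_i is converted to a z-score z_i = Φ^(-1)(1 - p_i), and a subnetwork A of k genes is scored as z_A = (1/√k) Σ z_i. This minimal Python sketch shows only that scoring step; the published method also adjusts the score against randomly sampled gene sets and searches the interaction network for high-scoring connected subnetworks, both omitted here. The p-values and function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_from_p(p):
    """Convert a differential-expression p-value into a z-score,
    z = inverse normal CDF of (1 - p); small p gives a large z."""
    return NormalDist().inv_cdf(1 - p)


def subnetwork_score(pvalues):
    """Aggregate z-score of a k-gene subnetwork: sum of z-scores / sqrt(k)."""
    zs = [z_from_p(p) for p in pvalues]
    return sum(zs) / math.sqrt(len(zs))


# Hypothetical p-values for two connected groups of four genes each.
hot_spot = [0.001, 0.004, 0.01, 0.02]
background = [0.4, 0.6, 0.5, 0.7]
print(f"hot spot:   {subnetwork_score(hot_spot):.2f}")
print(f"background: {subnetwork_score(background):.2f}")
```

Dividing by √k keeps scores comparable across subnetworks of different sizes: if the z_i were independent standard normals, z_A would also be standard normal whatever k is.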

Page 20

Cytoscape can identify active subnetworks under different conditions in yeast

Proteins can be linked through physical protein-protein interactions (blue) or through protein->DNA interactions (yellow)

Page 21

The Cytoscape Application

Page 22

Also Osprey, a Canadian equivalent. Breitkreutz, BJ., Stark, C., Tyers M. "Osprey: A Network Visualization System." Genome Biology 2003 4(3):R22

Page 23

Are Humans Capable of Understanding or Interpreting Networks?

Page 24

How can we use LSF networks to study Cancer?

• LSF networks are robust. Randomly knocking out a node is unlikely to perturb the network.

• Targeting highly-connected nodes is likely to perturb the network. But these are also more likely to have a severe effect on normal cells.

Page 25

• This can explain how chemotherapeutics have been historically chosen, why they are toxic and why they don’t work.

• Therefore, we need to determine a combination of drugs that will preferentially damage the network of cancer cells and not the network of normal cells.

Page 26

Can we compare network topologies from tumorous and normal tissues?

• Are there real differences in the networks that we can exploit?

• If differences exist, then is there a combination of drugs which can selectively perturb the cancerous network and not the normal network?

Page 27

Comparative Topologies

Page 28

(Figure: the number of possible permutations, on a log scale from 1E+07 to 1E+41, against the number of nodes targeted, from 2 to 10.)

Perturbing a 10,000-node network by targeting between 2 and 10 random nodes (for comparison, atoms on Earth: 1E+49).
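The counts in the figure come from elementary combinatorics. Choosing an ordered sequence of k distinct targets from N nodes gives N!/(N - k)! possibilities; if the order of targeting does not matter, dividing by k! gives the binomial coefficient C(N, k). A short Python check using the standard-library `math.perm` and `math.comb` (Python 3.8+), with N = 10,000 as on the slide:

```python
import math

N = 10_000  # network size used on the slide

for k in range(2, 11):
    ordered = math.perm(N, k)    # ordered sequences of k distinct targets
    unordered = math.comb(N, k)  # distinct sets of k targets (order ignored)
    print(f"k={k:2d}  ordered={ordered:.2e}  unordered={unordered:.2e}")
```

For k = 10 the ordered count is about 1E+40, consistent with the top of the plotted range, while the number of distinct 10-node target sets is roughly 2.7E+33. Either way, exhaustively testing combinations is out of reach.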

Page 29

Network Perturbation Results

Page 30

Integrative Genomics

• We are a long way from being able to simulate the entire biological network of a cell, although groups are working on this, e.g. the CyberCell project, Alberta.

• However, we can already combine genetic, expression, interaction, pathway, medical and physiological data into networks that allow us to answer biologically relevant questions.

Page 31

• Show all the genes that are significantly up-regulated in a tumor and which are known cancer genes or are more than 50% identical to a known human cancer gene.

• Show all the proteins that are known to play a role in the process of apoptosis in human or are 70% identical to proteins known to be involved in apoptosis in any other organism.

• Show all the genes which are significantly down-regulated in the tumor and for which mutants are known in either the mouse, Drosophila or C. elegans.

Example Integrative Questions
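Each integrative question amounts to a filter over joined data sources. A minimal Python sketch of the first question, assuming each gene has already been annotated with a fold change, a significance value and its best percent identity to a known human cancer gene; the `Gene` record, the gene names and the significance thresholds (log2 fold change of at least 1, q-value of at most 0.05) are hypothetical, and only the 50% identity cut-off comes from the slide.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    log2_fold_change: float              # tumour vs normal
    q_value: float                       # significance of the change
    known_cancer_gene: bool
    best_identity_to_cancer_gene: float  # percent identity, 0-100


def upregulated_cancer_candidates(genes, min_fc=1.0, max_q=0.05, min_identity=50.0):
    """Genes significantly up-regulated in the tumour that are known cancer
    genes or more than `min_identity` percent identical to one."""
    return [
        g.name
        for g in genes
        if g.log2_fold_change >= min_fc
        and g.q_value <= max_q
        and (g.known_cancer_gene or g.best_identity_to_cancer_gene > min_identity)
    ]


genes = [
    Gene("GENE1", 2.3, 0.001, True, 100.0),   # known cancer gene, up-regulated
    Gene("GENE2", 1.8, 0.010, False, 62.0),   # similar to a cancer gene
    Gene("GENE3", 2.0, 0.020, False, 35.0),   # not similar enough
    Gene("GENE4", 0.2, 0.600, True, 100.0),   # not up-regulated
]
print(upregulated_cancer_candidates(genes))  # ['GENE1', 'GENE2']
```

The hard part in practice is building the joined records (expression, curated cancer-gene lists, sequence similarity); once they exist, each question on the slide is a short predicate like this one.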

Page 32

• Show me all the genes that have been implicated in a human disease, or genes that are known to be part of a biological pathway for which a disease gene has been determined

• Show me all the Zn-finger proteins that are up-regulated, or any proteins known to bind to these Zn-finger proteins

• Using literature data, show all the genes that are up-regulated and which are thought to bind or interact with telomeres in any organism.

• etc…

Page 33

An Example

(c) CGDN

Page 34

• COX (cytochrome c oxidase) deficiency, which can present as Leigh syndrome

• COX functions in the mitochondria and consists of 13 subunits; many other proteins are required for its proper assembly and coordination with cofactors.

• An integrative genomics approach, using bioinformatics, consolidated mitochondrial proteomics, gene expression data and the genetic map to pinpoint the causative gene within a 2-megabase interval

Page 35

Building Biological relationships and networks

Page 36

Page 37

Assigning Orthology

• A key step in integrative genomics is being able to transfer information between species

• This is important if we are using the relationship to infer function.

• We need to consider “in-paralogs”, which are also bona fide orthologs

• Software such as Inparanoid can detect both orthologs and in-paralogs

Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Remm M, Storm CE, Sonnhammer EL. J Mol Biol. 2001

Page 38

Assigning Orthology

(Figure: Organism 1 has genes A, B and C; Organism 2 has genes A, B1, B2 and C.)

B1 and B2 are “in-paralogues”
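The in-paralog idea can be sketched in a simplified form of the Inparanoid approach: seed orthologs are reciprocal best hits between the two organisms, and a gene joins a seed's group as an in-paralogue if it is more similar to that seed than the two seeds are to each other. This is a minimal Python sketch, assuming symmetric pairwise similarity scores (real BLAST scores are not exactly symmetric); the gene names (primed in organism 2 to keep them distinct) and scores are hypothetical and mirror the A, B1/B2, C example above, and the published program adds further filtering and confidence scoring not shown here.

```python
def make_score(pairs):
    """Symmetric similarity lookup built from {(x, y): score}; unlisted pairs score 0."""
    table = {}
    for (x, y), s in pairs.items():
        table[(x, y)] = table[(y, x)] = s
    return lambda x, y: table.get((x, y), 0.0)


def best_hit(gene, targets, score):
    """Target with the highest similarity to `gene`."""
    return max(targets, key=lambda t: score(gene, t))


def ortholog_groups(genes_1, genes_2, score):
    """Seed orthologs are reciprocal best hits between organism 1 and 2.
    A gene from the same organism joins a seed's group as an in-paralogue
    if it scores higher against that seed than the two seeds score together."""
    groups = []
    for a in genes_1:
        b = best_hit(a, genes_2, score)
        if score(a, b) == 0 or best_hit(b, genes_1, score) != a:
            continue  # no hit, or not a reciprocal best hit
        seed = score(a, b)
        group_1 = {a} | {x for x in genes_1 if x != a and score(a, x) > seed}
        group_2 = {b} | {y for y in genes_2 if y != b and score(b, y) > seed}
        groups.append((group_1, group_2))
    return groups


# Hypothetical scores: B1 and B2 arose by duplication in organism 2 after
# the species split, so they are more similar to each other than to B.
score = make_score({
    ("A", "A'"): 500, ("B", "B1"): 400, ("B", "B2"): 380,
    ("C", "C'"): 450, ("B1", "B2"): 600,
})
organism_1 = ["A", "B", "C"]
organism_2 = ["A'", "B1", "B2", "C'"]
for group_1, group_2 in ortholog_groups(organism_1, organism_2, score):
    print(sorted(group_1), sorted(group_2))
```

A plain reciprocal-best-hit approach would pair B only with B1 and miss B2; the in-paralogue rule recovers both as orthologs of B.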

Page 39

Assigning Function

• Pfam provides a large collection of hidden Markov models built from multiple sequence alignments of protein families.

• Version 6.6 contains 3,071 protein domain families covering 69% of proteins in the SwissProt database

The Pfam Protein Families Database, Bateman et al. Nucleic Acids Research, 2002, Vol. 30, No. 1 276-280.

Through Computational Means

Page 40

Page 41

Using Sequence similarity to assign function

• Exploits the standard hypothesis that the more similar two proteins are, the more likely they are to have the same function.

• But how do we know the annotations of the proteins are correct? Many annotations are inferred from other incorrect annotations.

• Is the annotation relevant to the species?
• Annotations are inconsistent in their wording and specificity.
• Therefore, we need a way to link the computational output of a BLAST search with curated annotation.
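One way to link BLAST output with curated annotation is to transfer Gene Ontology terms only from sufficiently similar hits, and to skip annotations that were themselves inferred electronically (GO evidence code IEA), which limits chains of inferred-from-inferred annotations. A minimal Python sketch; the hit list, the 40% identity cut-off, the function name and the annotations are hypothetical, and the input is assumed to be already parsed into simple pairs rather than read from any particular BLAST output format.

```python
def transfer_annotations(hits, go_annotations, min_identity=40.0,
                         excluded_evidence=frozenset({"IEA"})):
    """Return (subject, GO terms) from the most similar hit that clears the
    identity cut-off and carries at least one non-excluded annotation.

    hits: list of (subject_id, percent_identity) pairs.
    go_annotations: {subject_id: [(go_term, evidence_code), ...]}.
    """
    for subject, identity in sorted(hits, key=lambda h: h[1], reverse=True):
        if identity < min_identity:
            break  # remaining hits are even less similar
        terms = {term for term, evidence in go_annotations.get(subject, [])
                 if evidence not in excluded_evidence}
        if terms:
            return subject, terms
    return None, set()


# Hypothetical hits and annotations.
hits = [("P1", 92.0), ("P2", 75.0), ("P3", 30.0)]
go_annotations = {
    "P1": [("kinase activity", "IEA")],  # electronic only: skipped
    "P2": [("protein kinase activity", "IDA"), ("ATP binding", "IEA")],
}
print(transfer_annotations(hits, go_annotations))
# ('P2', {'protein kinase activity'})
```

Because annotations go through a controlled vocabulary, the transferred terms can be compared and queried consistently, which free-text descriptions from BLAST hits do not allow.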

Page 42

The Gene Ontology

• Provides a controlled vocabulary to describe the roles of genes and proteins.

• GO describes each gene using three ontologies: molecular function, biological process and cellular component.

• GO has now been adopted by almost all model organism databases

Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium (2000) Nature Genet. 25: 25-29.

Page 43

But what is an ontology?

The hierarchical structuring of knowledge about things by subcategorising them according to their essential (or at least relevant and/or cognitive) qualities.
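In GO the terms are arranged in a directed acyclic graph (a term may have several parents), and annotations are usually queried together with their ancestor terms: a gene annotated to a specific term is implicitly annotated to every more general term above it (the GO "true path rule"). The minimal Python sketch below applies this to a small toy graph; the term names, links and gene names are a hypothetical simplification, not an extract of the real ontology.

```python
def ancestors(term, parents):
    """Every term reachable from `term` by following parent links."""
    found, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def propagate(annotations, parents):
    """True-path rule: a gene annotated to a term is also annotated to
    all of that term's ancestors."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }


def genes_annotated_to(term, propagated):
    """Genes whose (propagated) annotations include `term`."""
    return sorted(g for g, terms in propagated.items() if term in terms)


# Hypothetical toy fragment: child term -> list of parent terms.
parents = {
    "apoptosis": ["programmed cell death"],
    "autophagic cell death": ["programmed cell death", "autophagy"],
    "programmed cell death": ["cell death"],
    "autophagy": ["catabolic process"],
    "cell death": ["biological process"],
    "catabolic process": ["biological process"],
}
annotations = {
    "geneX": ["autophagic cell death"],
    "geneY": ["apoptosis"],
    "geneZ": ["catabolic process"],
}
full = propagate(annotations, parents)
print(genes_annotated_to("programmed cell death", full))  # ['geneX', 'geneY']
```

Propagation is what lets a query such as "show all the proteins that play a role in programmed cell death" return genes that were only annotated to more specific terms.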

Page 44

Representing an Ontology

Page 45

The Gene Ontology Consortium

Page 46

The Gene Ontology Consortium

Page 47

Mapping Protein Interactions

• The BIND database aims to provide full descriptions of interactions, molecular complexes and pathways

• Data are freely available in XML, ASN.1 and text formats for easy manipulation

• Over 16,000 protein interactions are currently in the database.

Bader GD, Betel D, Hogue CW. (2003) BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res. 31(1):248-50

Page 48

Page 49

Page 50

How do we computerize biological knowledge?

• Most biological facts and inferences are present in the literature and not in accessible databases.

• Literature is represented by free-form text.
• Gene names are inconsistent and ambiguous.
• The PreBIND database has been generated through a literature mining approach.

Donaldson et al (2003) PreBIND and Textomy - mining the biomedical literature for protein-protein interactions using a support vector machine. BMC Bioinformatics. 4(1):11.

Page 51

Natural Language Processing

Information overload

Filtering:
• important concepts
• relevant facts

Relating Collected Facts

noun-verb-noun pattern

Literature Sources

• PubMed: 12M abstracts
• 400K added per year

Most scientific information is in literature

cancer-related

• The XYZ gene is expressed in cancer.

• Gene A interacts with XYZ

XYZ gene

Gene A
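The noun-verb-noun idea in the diagram can be illustrated with simple regular expressions that pull out gene-verb-gene triples and expression statements, and then relate the two kinds of fact. This is a deliberately naive Python sketch: the patterns, the verb list and the function names are hypothetical, and it is not the support-vector-machine method used by PreBIND.

```python
import re

# Hypothetical, deliberately tiny patterns.
INTERACTION = re.compile(
    r"\b(?:[Gg]ene\s+)?([A-Z][A-Za-z0-9-]*)\s+"
    r"(interacts with|binds to|binds|phosphorylates|activates|inhibits)\s+"
    r"(?:the\s+)?([A-Z][A-Za-z0-9-]*)"
)
EXPRESSION = re.compile(r"\b([A-Z][A-Za-z0-9-]*) gene is expressed in (\w+)")


def extract_facts(sentences):
    """Collect (gene, verb, gene) triples and (gene, context) expression facts."""
    interactions, expression = [], []
    for s in sentences:
        interactions += INTERACTION.findall(s)
        expression += EXPRESSION.findall(s)
    return interactions, expression


def related_genes(interactions, expression, context):
    """Genes that interact with a gene reported as expressed in `context`."""
    expressed = {gene for gene, where in expression if where == context}
    related = set()
    for a, _verb, b in interactions:
        if b in expressed:
            related.add(a)
        if a in expressed:
            related.add(b)
    return related


sentences = ["The XYZ gene is expressed in cancer.", "Gene A interacts with XYZ."]
interactions, expression = extract_facts(sentences)
print(interactions)  # [('A', 'interacts with', 'XYZ')]
print(expression)    # [('XYZ', 'cancer')]
print(related_genes(interactions, expression, "cancer"))  # {'A'}
```

Real sentences defeat patterns like these quickly (negation, hedging, ambiguous gene names), which is why the slides point to statistical approaches such as PreBIND's.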

Page 52

Data Federation Approaches

Page 53

Further Reading

H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai, and A.-L. Barabási. The large-scale organization of metabolic networks. Nature 407, 651-654 (2000).

Schwikowski, B., et al. 2000. A network of protein-protein interactions in yeast. Nature Biotechnology. 18:1257-1261.

Lenhard B, Hayes WS, Wasserman WW. GeneLynx: a gene-centric portal to the human genome. Genome Res. 2001 Dec;11(12):2151-7.