Network Biology: from lists to underpinnings of molecular behaviour
Michel Dumontier, Ph.D., Associate Professor of Bioinformatics, Carleton University
BIOL5502B|CHEM5900 Methods in Proteomics [17/05/2010:Dumontier]

May 10, 2015

Transcript
Page 1

Network Biology: from lists to underpinnings of molecular behaviour

Michel Dumontier, Ph.D., Associate Professor of Bioinformatics

Carleton University

Page 3

Provenance

• This talk was prepared in part with input from the “Interpreting Gene Lists” workshop offered by the Canadian Bioinformatics Workshops (bioinformatics.ca)

• http://bioinformatics.ca/workshops/2009/course-content

Page 4

So you did some mass spectrometry?

Protein Identification

Page 5

Database search vs de novo

[Figure: MS/MS spectrum, S#: 1708, RT: 54.47, Full ms2 638.00 [165.00-1925.00]; x-axis: m/z (200-2000), y-axis: relative abundance (0-100); labelled peaks at m/z 226.9, 326.0, 397.1, 425.0, 489.1, 524.9, 588.1, 589.2, 629.0, 687.3, 850.3, 851.4, 949.4, 1048.6 and 1049.6, annotated with residue labels]

de novo: AVGELTK

Database Search

Database of known peptides:

MDERHILNM, KLQWVCSDL, PTYWASDL, ENQIKRSACVM, TLACHGGEM, NGALPQWRT,

HLLERTKMNVV, GGPASSDA, GGLITGMQSD, MQPLMNWE,

ALKIIMNVRT, AVGELTK, HEWAILF, GHNLWAMNAC,

GVFGSVLRA, EKLNKAATYIN..
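The database-search idea can be sketched in a few lines of Python. This is a minimal illustration, not a real search engine. It assumes singly charged b and y ions, monoisotopic residue masses, unmodified peptides and a fixed 0.5 Da tolerance, and it scores each candidate only by counting matched peaks. The function names and the tolerance are hypothetical choices. The "observed" spectrum is simulated from AVGELTK's own fragments; it is not the spectrum on the slide.

```python
# Monoisotopic residue masses (Da) for the standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_ions(peptide):
    """m/z of singly charged b and y ions (assumption: charge 1+ only)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        b_ion = sum(masses[:i]) + PROTON          # N-terminal fragment
        y_ion = sum(masses[i:]) + WATER + PROTON  # C-terminal fragment
        ions.extend([b_ion, y_ion])
    return ions


def score(peptide, observed_peaks, tol=0.5):
    """Number of theoretical ions lying within `tol` Da of an observed peak."""
    return sum(
        any(abs(ion - peak) <= tol for peak in observed_peaks)
        for ion in fragment_ions(peptide)
    )


def database_search(observed_peaks, database, tol=0.5):
    """Rank candidate peptides by their number of matched fragment ions."""
    return sorted(database, key=lambda p: score(p, observed_peaks, tol), reverse=True)


database = ["MDERHILNM", "KLQWVCSDL", "PTYWASDL", "AVGELTK", "HEWAILF", "GVFGSVLRA"]
simulated_spectrum = fragment_ions("AVGELTK")  # hypothetical observed peaks
print(database_search(simulated_spectrum, database)[0])
```

Real search engines also use precursor mass filtering, multiple charge states and probabilistic scores. The peak-counting score here only shows the core idea of comparing theoretical and observed fragments.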

Page 7

My experiment worked and I have dozens, hundreds, or thousands of hits… now what?

[Figure: protein identification from an MS/MS spectrum (S#: 1708), followed by a question mark]

Page 8

Use the list to explore biology

• Determine significant shared attributes
• Explore putative mechanisms of action
• Test hypotheses

[Figure: protein identification (MS/MS spectrum, S#: 1708) leads, through network biology, to "Eureka! Hypothesis on the molecular basis of disease/process"]

Page 9

[Figure: attribute enrichment plot; axes: # in list having attribute, # in list sharing these attributes; highlighted groups: Oxidative Metabolism, Detoxification]

Enriched in smokers = UP-regulated in smokers

Page 10

Outline

1. Explore identified proteins

2. Attribute enrichment

3. Networks

4. Pathways

5. Lab

Page 11

A hypothesis underlies the list of identified proteins

• An initial question was posed, an experiment performed and a list of candidates obtained.

• The question is: what roles do these entities play in the biological process being investigated?
  – Normal vs pathological
  – Response to stimulus
  – Interactions and complexes

Page 12

Biological Answers

• Computational systems biology
  – Information retrieval and summary
  – Interaction network analysis
  – Pathway analysis
  – Function prediction

Page 13

Molecular Attributes

• An attribute provides information about the entity in question (e.g. shape, function, process)

• Sequence and structure provide information about
  – Motifs, domains, interaction/binding sites, post-translational modifications, conformational changes, molecular complexes, mutations, conservation/evolution
  – Functions, localization, biological / pathological processes

Page 14

Gene Ontology

• Captures terminology related to three aspects
  – biological processes
  – molecular functions
  – cellular components

• Relationships between terms are largely defined with “is a” and “part of” relations

[Figure: example GO terms: Cell division (biological process), Isomerase activity (molecular function)]

Page 15

GO Structure

[Figure: GO terms cell, membrane, chloroplast, mitochondrial membrane and chloroplast membrane, linked by is-a and part-of relations]

Species independent: some lower-level terms are specific to a group, but higher-level terms are not.
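GO terms form a directed acyclic graph, so a gene annotated to a term is implicitly annotated to all of that term's ancestors (GO's "true path rule"). The sketch below walks is_a and part_of links upward to collect those ancestors. The edges are a toy reconstruction suggested by the terms on this slide. They are not copied from the official GO files.

```python
# Toy GO-like DAG: child term -> list of (relation, parent term).
# These edges are illustrative assumptions, not the real ontology.
GO_EDGES = {
    "membrane": [("part_of", "cell")],
    "chloroplast": [("part_of", "cell")],
    "mitochondrial membrane": [("is_a", "membrane")],
    "chloroplast membrane": [("is_a", "membrane"), ("part_of", "chloroplast")],
}


def ancestors(term, edges=GO_EDGES):
    """All terms reachable from `term` by following is_a or part_of links."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in edges.get(current, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def propagate(annotations, edges=GO_EDGES):
    """Expand direct annotations {gene: {terms}} with every ancestor term."""
    return {
        gene: set(terms).union(*(ancestors(t, edges) for t in terms))
        for gene, terms in annotations.items()
    }


print(sorted(ancestors("chloroplast membrane")))
```

Propagating annotations this way is why enrichment tools find many overlapping significant terms along one branch of the ontology.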

Page 16

Gene Ontology

• 30,393 terms, 99.2% with definitions
  – 18,939 biological processes
  – 2,735 cellular components
  – 8,719 molecular functions

• GO Slim is an official reduced set of GO terms
  – Generic, plant, yeast
  – Good for making pie charts

Page 17

Annotation

• Manual annotation
  – Created by scientific curators
    • High quality
    • Small number (time-consuming to create)

• Electronic annotation
  – Annotation derived without human validation
    • Computational predictions (accuracy varies)
    • Lower ‘quality’ than manual codes

• Key point: be aware of annotation origin

Page 18

Evidence Types (provenance of facts)

• ISS: Inferred from Sequence/Structural Similarity

• IDA: Inferred from Direct Assay
• IPI: Inferred from Physical Interaction
• IMP: Inferred from Mutant Phenotype
• IGI: Inferred from Genetic Interaction
• IEP: Inferred from Expression Pattern
• TAS: Traceable Author Statement
• NAS: Non-traceable Author Statement
• IC: Inferred by Curator
• ND: No Data available

• IEA: Inferred from electronic annotation
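Annotation origin matters, so a common first step is to filter annotations by evidence code before enrichment testing. The sketch below drops IEA and ND annotations by default and rejects codes that are not on this slide. The records, gene-to-term pairings and placeholder term IDs are hypothetical. Which codes to exclude is an analysis decision, not a fixed rule.

```python
from collections import namedtuple

# One gene-to-GO association plus its evidence code.
Annotation = namedtuple("Annotation", ["gene", "go_id", "evidence"])

# Evidence codes listed on the slide.
EVIDENCE_CODES = {"ISS", "IDA", "IPI", "IMP", "IGI", "IEP", "TAS", "NAS", "IC", "ND", "IEA"}


def filter_by_evidence(annotations, exclude=("IEA", "ND")):
    """Keep annotations whose evidence code is not in `exclude`."""
    unknown = {a.evidence for a in annotations} - EVIDENCE_CODES
    if unknown:
        raise ValueError(f"unrecognised evidence codes: {sorted(unknown)}")
    return [a for a in annotations if a.evidence not in exclude]


# Hypothetical annotations; "term-A" and "term-B" stand in for real GO IDs.
annotations = [
    Annotation("RRP6", "term-A", "IDA"),
    Annotation("RRP7", "term-A", "IEA"),
    Annotation("MRD1", "term-B", "ND"),
]
print([a.gene for a in filter_by_evidence(annotations)])
```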

Page 19

Variable Coverage

Lomax J. Get ready to GO! A biologist's guide to the Gene Ontology. Brief Bioinform. 2005 Sep;6(3):298-304.

Page 20

GO Software Tools

• GO resources are freely available to anyone without restriction
  – Includes the ontologies, gene associations and tools developed by GO

• Other groups have used GO to create tools for many purposes: http://www.geneontology.org/GO.tools

Page 21

Accessing GO: QuickGO

http://www.ebi.ac.uk/ego/

Page 22

Explore Ontologies

http://www.ebi.ac.uk/ontology-lookup

Page 23

Databases of Molecular Annotation

• NCBI
  – GenBank / RefSeq
  – Entrez Gene

• EBI
  – UniProt
  – Ensembl BioMart (eukaryotes)

Model Organism Databases
• Berkeley Drosophila Genome Project (BDGP)
• dictyBase (Dictyostelium discoideum)
• FlyBase (Drosophila melanogaster)
• GeneDB (Schizosaccharomyces pombe, Plasmodium falciparum, Leishmania major and Trypanosoma brucei)
• UniProt Knowledgebase (Swiss-Prot/TrEMBL/PIR-PSD) and InterPro databases
• Gramene (grains, including rice, Oryza)
• Mouse Genome Database (MGD) and Gene Expression Database (GXD) (Mus musculus)
• Rat Genome Database (RGD) (Rattus norvegicus)
• Reactome
• Saccharomyces Genome Database (SGD) (Saccharomyces cerevisiae)
• The Arabidopsis Information Resource (TAIR) (Arabidopsis thaliana)
• The Institute for Genomic Research (TIGR): databases on several bacterial species
• WormBase (Caenorhabditis elegans)
• Zebrafish Information Network (ZFIN) (Danio rerio)

Page 25

Identifiers

• Identifiers (IDs) are ideally unique, stable names or numbers that help track database records
  – E.g. Social Insurance Number, Entrez Gene ID 41232

• Gene and protein information is stored in many databases
  – Genes have many IDs

• Records exist for: gene, DNA, RNA, protein
  – Important to recognize the correct record type
  – E.g. Entrez Gene records don’t store sequence. They link to DNA regions, RNA transcripts and proteins.

Page 26

NCBI Database Links

http://www.ncbi.nlm.nih.gov/Database/datamodel/data_nodes.swf

NCBI: U.S. National Center for Biotechnology Information, part of the National Library of Medicine (NLM)

Page 27

Common Identifiers

Species-specific
  HUGO HGNC: BRCA2
  MGI: MGI:109337
  RGD: 2219
  ZFIN: ZDB-GENE-060510-3
  FlyBase: CG9097
  WormBase: WBGene00002299 or ZK1067.1
  SGD: S000002187 or YDL029W

Annotations
  InterPro: IPR015252
  OMIM: 600185
  Pfam: PF09104
  Gene Ontology: GO:0000724
  SNPs: rs28897757

Experimental Platform
  Affymetrix: 208368_3p_s_at
  Agilent: A_23_P99452
  CodeLink: GE60169
  Illumina: GI_4502450-S

Gene
  Ensembl: ENSG00000139618
  Entrez Gene: 675
  UniGene: Hs.34012

RNA transcript
  GenBank: BC026160.1
  RefSeq: NM_000059
  Ensembl: ENST00000380152

Protein
  Ensembl: ENSP00000369497
  RefSeq: NP_000050.2
  UniProt: BRCA2_HUMAN or A1YBP1_HUMAN
  IPI: IPI00412408.1
  EMBL: AF309413
  PDB: 1MIU

Red = Recommended

Page 28

Identifier Mapping

• So many IDs!
  – Mapping (conversion) is a headache

• Four main uses
  – Disambiguate similarly named entities
  – Reference related information
  – Biological and informational provenance
    • E.g. genes to proteins, Entrez Gene to Affy
  – Unification during dataset merging
    • Equivalent entities
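Identifier mapping often comes down to joining tables on a shared key. The sketch below builds a hand-made mapping table from the BRCA2 identifiers on the "Common Identifiers" slide. It then merges two hypothetical datasets keyed by different ID types, Entrez Gene and UniProt entry name. BRCA2 has two UniProt entry names, so the mapping is one-to-many. In practice the mapping table would come from a service such as BioMart or UniProt. The measurement values are invented for illustration.

```python
# Mapping rows for BRCA2, using identifiers from the "Common Identifiers" slide.
ID_MAP = [
    {"entrez": "675", "ensembl_gene": "ENSG00000139618", "uniprot": "BRCA2_HUMAN"},
    {"entrez": "675", "ensembl_gene": "ENSG00000139618", "uniprot": "A1YBP1_HUMAN"},
]


def build_index(rows, from_key, to_key):
    """Map each `from_key` ID to the set of `to_key` IDs it converts to."""
    index = {}
    for row in rows:
        index.setdefault(row[from_key], set()).add(row[to_key])
    return index


def merge(dataset_a, key_a, dataset_b, key_b, rows):
    """Join two {id: value} datasets that are keyed by different ID types."""
    a_to_b = build_index(rows, key_a, key_b)
    merged = {}
    for id_a, value_a in dataset_a.items():
        for id_b in sorted(a_to_b.get(id_a, ())):
            if id_b in dataset_b:
                merged[(id_a, id_b)] = (value_a, dataset_b[id_b])
    return merged


# Hypothetical measurements keyed by two different ID types.
expression = {"675": 1.8}              # microarray log-ratio by Entrez Gene ID
spectral_counts = {"BRCA2_HUMAN": 12}  # MS spectral count by UniProt entry name
print(merge(expression, "entrez", spectral_counts, "uniprot", ID_MAP))
```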

Page 29

ID Mapping Services

• Synergizer
  – http://llama.med.harvard.edu/synergizer/translate/

• Ensembl BioMart
  – http://www.ensembl.org

• UniProt
  – http://www.uniprot.org/

Page 30

Outline

1. Explore identified proteins

2. Attribute enrichment

3. Networks

4. Pathways

Page 31

Attribute Enrichment (AE)

Given:
1. a list, e.g. RRP6, MRD1, RRP7, RRP43, RRP42
2. attributes, e.g. function, process, localization, interactions

AE question: Are any of the attributes surprisingly enriched in the list?

• Details:
  – How to assess “surprisingly” (statistics)
  – How to correct for repeating the tests

Page 32

What is a P-value?

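• The P-value is the probability, computed assuming the “null hypothesis” is true, of obtaining a result at least as extreme as the one observed.

• It is calculated by computing a statistic from the data and finding the probability of observing that statistic, or one more extreme, in a sample of the same size distributed according to the null hypothesis.

• Intuitively: the P-value measures how often chance alone would produce such a result. If you call results significant whenever P < α, then at most a fraction α of the tests in which the null hypothesis is true will be false positives (aka “Type I errors”).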

Page 33

How likely is it that the observed differences between the two distributions are due to chance?

[Figure: histograms of two value distributions; x-axis: value]

Page 34

AE using the T-test

Answer: Two-tailed T-test

Black: N1 = 500, mean m1 = 1.1, std s1 = 0.9

Red: N2 = 4500, mean m2 = 4.9, std s2 = 1.0

$$T = \frac{m_1 - m_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}} = -88.5$$

Formal Question: What is the probability of observing the T-statistic or one more extreme if the means of the two distributions were the same?

Page 35

AE using the T-test

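$$T = \frac{m_1 - m_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}} = -88.5$$

[Figure: T-distribution; x-axis: T-statistic (centred at 0, observed value -88.5 marked); y-axis: probability density; the tail beyond the observed value is shaded]

P-value = shaded area × 2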

Formal Question: What is the probability of observing the T-statistic or one more extreme if the means of the two distributions were the same?
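The T-statistic on these slides can be reproduced from the summary statistics alone. The sketch below computes the unequal-variance (Welch) form, which matches the formula on these slides, along with the Welch-Satterthwaite degrees of freedom. It approximates the two-tailed P-value with the normal distribution. That approximation is reasonable here because the degrees of freedom are in the hundreds. For raw data, scipy.stats.ttest_ind(a, b, equal_var=False) gives the exact t-distribution P-value.

```python
import math


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def two_tailed_p_normal(t):
    """Two-tailed P-value from a normal approximation (adequate for large df)."""
    return math.erfc(abs(t) / math.sqrt(2))


# Summary statistics from the slide: black (N1) vs red (N2).
t, df = welch_t(1.1, 0.9, 500, 4.9, 1.0, 4500)
print(f"T = {t:.1f}, df = {df:.0f}, P ~ {two_tailed_p_normal(t):.1e}")
```

This prints T = -88.5. With a statistic that extreme, the P-value underflows to zero in double precision.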

Page 36

T-test limitations

1. Assumes distributions are both approximately Gaussian (i.e. normal)
  – Score distribution assumption is often true for:
    • Log ratios from microarrays
  – Score distribution assumption is rarely true for:
    • Peptide counts, sequence tags (SAGE or NextGen sequencing), transcription factor binding site hits

2. Tests for a significant difference in the means of two distributions, but does not test for other differences between distributions.

[Figure: three probability density sketches (x-axis: score) illustrating non-Gaussian cases]
• Values are positive and have increasing density near zero, e.g. sequence counts
• Distributions with outliers, or “heavy-tailed” distributions
• Bimodal “two-bumped” distributions

Page 37

Kolmogorov-Smirnov (K-S) test

[Figure: probability density of the red and black distributions; x-axis: score]

Question: Are the red and black distributions significantly different?

Calculate cumulative distributions of red and black

[Figure: cumulative probability (0 to 1.0) vs. score for both distributions; the largest gap between the two curves has length 0.4]

Formal question: Is the length of the largest difference between the “empirical distribution functions” statistically significant?
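The K-S statistic D is the largest vertical gap between the two empirical cumulative distribution functions. The sketch below computes D with the standard library. It does not compute the P-value; scipy.stats.ks_2samp(data1, data2) is one way to get both. The sample values in the usage line are hypothetical.

```python
from bisect import bisect_right


def ecdf(sorted_sample, x):
    """Fraction of the (sorted) sample that is <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic D = max over x of |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    # Both ECDFs are step functions that only change at observed values,
    # so evaluating at every observed value finds the maximum gap.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


print(ks_statistic([0.2, 0.5, 1.1, 1.3], [0.9, 1.4, 2.2, 3.0]))
```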

Page 38

What is the probability of finding 4 or more proteins with feature X in a random sample of 5 proteins?

List: RRP6, MRD1, RRP7, RRP43, RRP42

Background population: 5000 proteins, 500 of which are X proteins

Page 39

Fisher’s exact test

Background population: 5000 proteins, 500 of which are X proteins

List: RRP6, MRD1, RRP7, RRP43, RRP42

[Figure: null distribution of the number of X proteins in a random list, with the P-value region marked]

Answer = 4.6 × 10^-4

The P-value for Fisher’s exact test is “the probability that a random draw of the same size as the list from the background population would produce the observed number (or more) of attributes in the list.” It depends on the size of the list, the number of proteins with the feature (in the list and in the background), and the size of the background population.
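The one-tailed Fisher's exact P-value, which equals the hypergeometric upper tail, can be checked directly. The sketch below sums the probabilities of drawing 4 or 5 feature-X proteins in a list of 5. It assumes a background of 5000 proteins, 500 of which have feature X. The result is about 4.55 × 10^-4, which rounds to the slide's 4.6 × 10^-4. scipy.stats.hypergeom and scipy.stats.fisher_exact provide the same calculation.

```python
from math import comb


def hypergeom_upper_tail(k, list_size, n_feature, population):
    """P(X >= k) when drawing `list_size` items without replacement from
    `population` items, `n_feature` of which carry the attribute."""
    total = comb(population, list_size)
    upper = min(list_size, n_feature)
    favourable = sum(
        comb(n_feature, i) * comb(population - n_feature, list_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Slide example (assumed background): 4 of 5 listed proteins have feature X.
p = hypergeom_upper_tail(4, 5, 500, 5000)
print(f"P = {p:.2e}")
```

The sum is done on exact integers before the single division, so rounding error does not accumulate.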

Page 40

Important details

• To test for under-enrichment of “black”, test for over-enrichment of “red”.

• Need to choose the “background population” appropriately, e.g., if only a portion of the total complement is queried (or has annotation), use only that population as the background.

• To test for enrichment of more than one independent type of annotation (red vs black and circle vs square), apply Fisher’s exact test separately for each type.

• The hypergeometric test is equivalent to a one-tailed Fisher’s exact test.

Page 41

How to win the P-value lottery, part 1

Background population: 500 X, 5000 Y

Random draws

… 7,834 draws later …

Expect a random draw with observed enrichment once every 1 / P-value draws
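The lottery analogy can be checked by simulation. Repeatedly draw random 5-protein lists from the background and count how often 4 or more carry feature X. The sketch assumes the same background as the Fisher's exact test example: 5000 proteins, 500 with X. The number of draws and the seed are arbitrary. The observed frequency should land near the exact P-value of roughly 4.6 × 10^-4, i.e. about one "enriched" list per 1/P ≈ 2,200 draws.

```python
import random


def simulate_enrichment(n_draws, list_size=5, n_feature=500, population=5000,
                        threshold=4, seed=1):
    """Fraction of random lists containing at least `threshold` X proteins.

    Proteins are numbered 0..population-1; the first `n_feature` carry X.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        draw = rng.sample(range(population), list_size)
        if sum(protein < n_feature for protein in draw) >= threshold:
            hits += 1
    return hits / n_draws


freq = simulate_enrichment(200_000)
print(f"empirical frequency of lists with >= 4 X proteins: {freq:.1e}")
```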

Page 42

How to win the P-value lottery, part 2

Keep the list the same, evaluate different annotations

Observed draw: RRP6, MRD1, RRP7, RRP43, RRP42

[Figure: different annotations applied to the same list (RRP6, MRD1, RRP7, RRP43, RRP42)]

Page 43

Correcting for multiple tests

• The Bonferroni correction controls the probability that any one of the tests is a false positive due to random chance, aka the Family-Wise Error Rate (FWER). If M = # of annotations tested: Corrected P-value = M × original P-value (capped at 1)

• The Benjamini-Hochberg (B-H) procedure controls the proportion of positive tests (i.e. rejections of the null hypothesis) that are false positives, aka the False Discovery Rate (FDR)
  – FDR is the expected proportion of the observed enrichments that are due to random chance.
  – Less stringent than the Bonferroni correction
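Both corrections are short enough to write out. The sketch below returns adjusted P-values. Bonferroni multiplies each P-value by the number of tests M and caps the result at 1. Benjamini-Hochberg uses the step-up rule: it scales the i-th smallest P-value by M/i, then takes running minima from the largest P-value down so the adjusted values stay monotone. The input P-values are hypothetical. statsmodels' multipletests function offers tested implementations of both methods.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted P-values: min(1, M * p)."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (controls the FDR)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]  # hypothetical P-values for four annotations
print(bonferroni(raw))
print(benjamini_hochberg(raw))
```

On the same input, B-H gives smaller adjusted values than Bonferroni, which reflects its lower stringency.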

Page 44

Reducing multiple test correction stringency

• The correction to the P-value threshold α depends on the number of tests you do: no matter what, the more tests you do, the more stringent the corrected threshold becomes

• Can control the stringency by reducing the number of tests:
  – e.g. use GO slim or restrict testing to the appropriate GO annotations.

Page 45

AE tools

• Web-based tools
  – Funspec:
    • Easy tool for yeast, not maintained, uses GO annotations and some other annotations (e.g. protein complexes)
  – YeastFeatures:
    • Similar to Funspec, with different datasets and presentation
  – GoMiner:
    • Uses GO annotations, covers many organisms, needs a background set of genes

• Cytoscape-based tools
  – BiNGO:
    • Tests GO annotations, displays enrichment results graphically and visually organizes related categories

Page 46

Funspec: simple ORA (over-representation analysis) for yeast, http://funspec.med.utoronto.ca/

Paste list here. Bonferroni correct? YES!

Choose sources of annotation

Caveats:
• yeast only
• last updated 2002

Page 47

http://software.dumontierlab.com/yeastfeatures

Page 49

GoMiner, part 1: http://discover.nci.nih.gov/gominer

1. Click “web interface”

2. Upload background

3. Upload list

4. Choose organism

5. Choose evidence code (All or Level 1)

Page 50

GoMiner, part 2

6. Restrict # of tests via category size

7. Restrict # of tests via GO hierarchy

8. Results emailed to this address, in a few minutes

Page 51

DAVID, part 1 http://david.abcc.ncifcrf.gov/

Paste list here

Choose ID type

List type: list or background?

DAVID automatically detects organism

Page 52

DAVID, part 2: http://david.abcc.ncifcrf.gov/

Page 53

BiNGO, an ORA Cytoscape plugin: http://www.psb.ugent.be/cbd/papers/BiNGO/index.htm

Links represent parent-child relationships in the GO

Colours represent significance of enrichment

Nodes represent GO categories

Page 55

Outline

1. Explore identified proteins

2. Attribute enrichment

3. Networks
  • Physical networks
  • Genetic networks
  • Functional networks

4. Pathways

Page 56

Why Network and Pathway Analysis?

• Intuitive to biologists
• Provide a biological context for results
• More efficient than searching databases gene-by-gene
• Intuitive display for sharing data

• Computation on pathway content
  • Visualize multiple data types on a pathway or network
  • Find active pathways
  • Identify potential regulators

Page 57

Network

In biology, a network is a graph composed of nodes that correspond to entities (genes, proteins, small molecules) and edges that correspond to physical/agentive or associative relations between entities.

[Figure: example graph labelling a vertex (node), an edge, a cycle, a directed edge (arc) and weighted edges (weights -5, 7, 10)]
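In code, a network like this is often stored as an adjacency map. The sketch below stores a small directed, weighted graph as a dict of dicts, reports each node's out- and in-degree, and checks for a directed cycle with a depth-first search. Node names are hypothetical, and the weights only echo the -5, 7 and 10 labels on the slide. Libraries such as NetworkX provide the same operations.

```python
def add_edge(graph, u, v, weight=1.0):
    """Add a directed, weighted edge u -> v to an adjacency map."""
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})


def degrees(graph):
    """Return {node: (out_degree, in_degree)}."""
    indeg = {node: 0 for node in graph}
    for u in graph:
        for v in graph[u]:
            indeg[v] += 1
    return {node: (len(graph[node]), indeg[node]) for node in graph}


def has_cycle(graph):
    """Detect a directed cycle with a recursive DFS using three-colour marking."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(u):
        colour[u] = GREY
        for v in graph[u]:
            if colour[v] == GREY:  # back edge: we returned to a node on the stack
                return True
            if colour[v] == WHITE and visit(v):
                return True
        colour[u] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in graph)


g = {}
add_edge(g, "A", "B", 7)
add_edge(g, "B", "C", 10)
add_edge(g, "C", "A", -5)  # closes the cycle A -> B -> C -> A
add_edge(g, "C", "D", 1)
print(degrees(g), has_cycle(g))
```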

Page 58

Integration in a Network Context

Page 59

Expression data mapped to node colours

Integration in a Network Context

Page 60

Mapping Biology to a Network

• A simple mapping: protein-protein interactions
  – one protein per node, one interaction per edge

• Edges can represent other relationships
  – Physical, e.g. protein-protein interaction
  – Regulatory, e.g. kinase activates target
  – Genetic, e.g. epistasis
  – Similarity, e.g. protein sequence similarity

• Critical: understand the mapping for network analysis

Page 61

Protein Sequence Similarity Network

http://apropos.icmb.utexas.edu/lgl/

Page 62

Literature Network

• Computationally extract gene relationships from text, usually PubMed abstracts

• Useful if network is not in a database
  – Literature search tool

• BUT not perfect
  – Problems recognizing gene names
  – Natural language processing is difficult

• Agilent Literature Search Cytoscape plugin
• iHOP (www.ihop-net.org/UniPub/iHOP/)
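A very simple way to extract relationships from text is sentence-level co-occurrence. Link two genes whenever they are named in the same sentence, and keep the sentences as evidence for the edge. The sketch below does this with a hand-made synonym dictionary and exact word matching. It is not how any particular tool works. It does show the caveats above: names missing from the dictionary are silently lost, and co-mention does not imply interaction. The text and synonym table are hypothetical.

```python
import re
from itertools import combinations


def cooccurrence_network(text, synonyms):
    """Link genes named in the same sentence; return {(gene_a, gene_b): [sentences]}."""
    edges = {}
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = re.findall(r"[A-Za-z0-9-]+", sentence)
        genes = sorted({synonyms[w.upper()] for w in words if w.upper() in synonyms})
        for a, b in combinations(genes, 2):
            edges.setdefault((a, b), []).append(sentence)
    return edges


# Hypothetical synonym table (upper-case keys) and toy abstract text.
synonyms = {"HRAS": "RAS", "RAS": "RAS", "RALGDS": "RALGDS", "RAL": "RAL", "BRCA2": "BRCA2"}
abstract = "HRAS binds RALGDS at the membrane. RALGDS activates RAL. BRCA2 was not detected."
for edge, evidence in cooccurrence_network(abstract, synonyms).items():
    print(edge, evidence)
```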

Page 63

Agilent Literature Search

Page 64

Cytoscape Network produced by Literature Search.

Abstract from the scientific literature

Sentences for an edge

Page 65

Enrichment Map

[Figure: two overlapping gene sets, A and B]

$$\text{Overlap} = \frac{|A \cap B|}{\min(|A|, |B|)}$$
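An enrichment map links two gene sets when they share enough genes. The sketch below computes the overlap coefficient from this slide and keeps an edge when it reaches a cutoff. The 0.5 cutoff, the set names and the placeholder gene names are hypothetical choices for illustration.

```python
from itertools import combinations


def overlap_coefficient(a, b):
    """|A & B| / min(|A|, |B|) for two gene sets."""
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b))


def enrichment_map_edges(gene_sets, cutoff=0.5):
    """Edges between gene sets whose overlap coefficient is >= cutoff."""
    edges = []
    for (name_a, genes_a), (name_b, genes_b) in combinations(gene_sets.items(), 2):
        score = overlap_coefficient(genes_a, genes_b)
        if score >= cutoff:
            edges.append((name_a, name_b, score))
    return edges


gene_sets = {  # hypothetical enriched gene sets
    "DNA repair": {"GENE1", "GENE2", "GENE3", "GENE4"},
    "DNA replication": {"GENE2", "GENE3", "GENE5"},
    "Translation": {"GENE6", "GENE7", "GENE8"},
}
print(enrichment_map_edges(gene_sets))
```

Dividing by the smaller set makes a small set nested inside a large one score 1.0. This keeps such nested categories visibly connected in the map.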

Page 66

Nodes represent gene-sets

Page 67

[Figure: enrichment map with clusters of gene sets labelled: Olfactory Receptor; Muscle Contraction; Ectodermal Dev. & Keratinocyte Diff.; Ubiquitin Processes; DNA Processes; Mitotic Cell Cycle; DNA Repair; DNA Replication; Ras GTPase; Serine Endopeptidase; Chromatin Remodeling; Chromosome; Ubiquitin-dependent Proteolysis; Ubiquitin Ligase; Microtubule Cytoskeleton; Intermediate Filament; Cytoskeleton; Ion Channel; Calcium; Potassium; Sodium; Mitochondrial Oxidative Metabolism; Fatty Acid Metabolism; mRNA Transport; RNA Splicing; RNA Processes; Transcription; rRNA Processing; Ribonucleotide Metabolism; Translation]

Page 68

Physical Networks

• Between two molecular objects
  – DNA, RNA, gene, protein, complex, small molecule, photon
  – Requires a site of interaction / binding

• Biologically relevant:
  – Present/expressed at the same time
  – Share a cellular location
  – Leads to some biologically relevant outcome

Page 69

Molecular Interactions

RAS interacting with RALGDS

(PDB: 1LFD)

Synthetic protein interacting with ATP and Zinc

(PDB: 2P0X)

Page 70

Experimental Interaction Discovery

[Figure: experimental methods for interaction discovery (microarray, two-hybrid, mass spectrometry, genetics, X-ray, NMR), grouped by evidence type: direct, physical; indirect, physical; indirect, genetic]

Page 71

Experimental Considerations

• How do you know if the interaction really exists?

• Each method has its advantages and disadvantages.
  – Be aware of systematic errors
  – Be aware of contaminants

• Each method observes interactions from a slightly different experimental condition.

• Support from many different sources is certainly better (necessary) than just one.

Page 72

Some affinity purification caveats

[Figure: bait A with co-purified protein B]

First and most importantly, this is only a representation of the observation.

You can only tell what proteins are in the eluate; you can’t tell how they are connected to one another.

If only one other protein (B) is present, then it’s likely that A and B are directly interacting.

But what if I told you that two other proteins (B and C) were present along with A…

[Figure: bait A with co-purified proteins B and C]

Page 73

Complexes with unknown topology

[Figure: alternative arrangements of proteins A, B and C]

Which of these models is correct? The complex described by this experimental result is said to have an unknown topology.

Page 74

Complexes with unknown stoichiometry

[Figure: an alternative arrangement involving A and two copies of B]

Here’s another possibility. The complex described by this experimental result is also said to have an unknown stoichiometry.

Page 75

Interaction Models

• Spoke: simple model, useful for data navigation; more accurate
• Matrix: theoretical max. number of interactions

[Figure: spoke and matrix models compared with the actual topology]
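The difference between the two interaction models is easy to state in code. For a purification with bait A and its co-purified preys, the spoke model links the bait to each prey. The matrix model links every pair of proteins, which gives the theoretical maximum number of interactions. The protein names are placeholders.

```python
from itertools import combinations


def spoke_model(bait, preys):
    """Bait-prey edges only: n - 1 edges for a complex of n proteins."""
    return [(bait, prey) for prey in preys]


def matrix_model(bait, preys):
    """Every pairwise edge among bait and preys: n(n - 1)/2 edges."""
    return list(combinations([bait, *preys], 2))


preys = ["B", "C", "D"]
print(len(spoke_model("A", preys)), len(matrix_model("A", preys)))  # 3 vs 6
```

The gap grows quadratically with complex size. This is why the matrix model can flood a network with unverified edges for large complexes.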

Page 76

High-throughput Mass Spectrometric Protein Complex Identification (HMS-PCI)

Ste12

Ho et al. Nature. 2002 Jan 10;415(6868):180-3

Mike Tyers, SLRI

Page 78

k-core analysis

• A k-core is a maximal part of a graph in which every node is connected to at least k other nodes of that part (k = 0, 1, 2, 3...)

• The highest k-core is the central, most densely connected region of a graph

• Regions of dense connectivity may represent molecular complexes

• Therefore, high k-cores may be molecular complexes
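A k-core can be found by repeatedly removing nodes with fewer than k neighbours until none remain. The sketch below applies this peeling procedure to an undirected graph stored as a map of neighbour sets, then raises k until the core is empty to find the highest k-core. The example graph is hypothetical. NetworkX's k_core function is an established implementation.

```python
def from_edges(edges):
    """Build a symmetric adjacency map {node: set(neighbours)} from undirected edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def k_core(adjacency, k):
    """Nodes of the k-core: peel nodes with degree < k until none are left."""
    graph = {node: set(nbrs) for node, nbrs in adjacency.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for node in [n for n, nbrs in graph.items() if len(nbrs) < k]:
            for nbr in graph.pop(node):
                graph[nbr].discard(node)
            changed = True
    return set(graph)


def highest_k_core(adjacency):
    """Largest k with a non-empty k-core, and that core's nodes."""
    k = 0
    while k_core(adjacency, k + 1):
        k += 1
    return k, k_core(adjacency, k)


# Hypothetical network: a fully connected A-B-C-D cluster plus a short tail E-F.
g = from_edges([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
                ("C", "D"), ("A", "E"), ("B", "E"), ("E", "F")])
print(highest_k_core(g))
```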

Page 79

[Figure: k-cores of interaction data sets, panels labelled Pre MS, Ho, Gavin and Union, with cores labelled 6-core, 6-core, 6-core and 9-core]

Interaction can define function

MCODE plugin for Cytoscape

Page 80

http://pathguide.org

Page 81

Interaction Databases

• Experiment (E)
• Structure detail (S)
• Predicted
  – Physical (P)
  – Functional (F)
• Curated (C)
• Homology modeling (H)
• *IMEx consortium

Page 82

Network Classification of Disease
• Traditional: gene association
• Limitation: too many genes reduce statistical power
• New: active cell map-based approaches combining network and molecular profiles

Chuang HY, Lee E, Liu YT, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007;3:140. Epub 2007 Oct 16.

Liu M, Liberzon A, Kong SW, Lai WR, Park PJ, Kohane IS, Kasif S. Network-based analysis of affected biological processes in type 2 diabetes models. PLoS Genet. 2007 Jun;3(6):e96.

Efroni S, Schaefer CF, Buetow KH. Identification of key processes underlying cancer phenotypes using biologic pathway analysis. PLoS ONE. 2007 May 9;2(5):e425.

Page 83: Network Biology: from lists to underpinnings of molecular behaviour

Network-Based Breast Cancer Classification
• 57k interactions from Y2H, orthology, co-citation, HPRD, BIND, Reactome
• 2 breast cancer cohorts, different expression platforms

Chuang HY, Lee E, Liu YT, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007;3:140. Epub 2007 Oct 16.
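This approach scores candidate subnetworks instead of single genes. The sketch below follows our reading of Chuang et al. It z-normalizes each gene across samples, averages the member genes to get a per-sample subnetwork "activity", discretizes that activity, and scores it by mutual information with the class label (for example, metastatic vs. non-metastatic). The equal-width binning and the bin count are assumptions. The paper's greedy subnetwork search is omitted, and the gene names and values are hypothetical:

```python
import math
from collections import Counter
from statistics import mean, pstdev


def zscore_rows(expression):
    """Z-normalize each gene's expression values across samples."""
    normalized = {}
    for gene, values in expression.items():
        mu, sd = mean(values), pstdev(values)
        normalized[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return normalized


def subnetwork_activity(z, genes):
    """Per-sample activity: mean z-score of the subnetwork's member genes."""
    n_samples = len(next(iter(z.values())))
    return [mean(z[g][s] for g in genes) for s in range(n_samples)]


def mutual_information(xs, ys):
    """Mutual information (bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def discriminative_score(z, genes, labels, bins=3):
    """Discretize activity into equal-width bins (an assumption) and score vs labels."""
    activity = subnetwork_activity(z, genes)
    lo, hi = min(activity), max(activity)
    width = (hi - lo) / bins or 1.0  # avoid dividing by zero for flat activity
    discrete = [min(int((a - lo) / width), bins - 1) for a in activity]
    return mutual_information(discrete, labels)


labels = [0, 0, 1, 1]  # hypothetical: 0 = non-metastatic, 1 = metastatic
expr = {"G1": [1, 1, 5, 5], "G2": [2, 2, 6, 6], "G3": [1, 5, 1, 5]}
z = zscore_rows(expr)
print(discriminative_score(z, ["G1", "G2"], labels))  # 1.0
print(discriminative_score(z, ["G3"], labels))        # 0.0
```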

Page 84: Network Biology: from lists to underpinnings of molecular behaviour

• Similar network markers found in both data sets (greater overlap than between the original gene markers)

• Increased classification accuracy

• Better coverage of known cancer risk genes (*)

Page 85: Network Biology: from lists to underpinnings of molecular behaviour

PIPE

• Predicts yeast PPIs from sequence
  – Uses interaction databases to find similar interacting proteins
  – Estimates the site of interaction
  – 75% accuracy (61% sensitivity, 89% specificity)
  – Finds new interactions among complexes
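When a test set contains equal numbers of interacting and non-interacting pairs, accuracy is the mean of sensitivity and specificity. This is consistent with 61% and 89% combining to 75%. The counts below are hypothetical and were chosen only to reproduce those percentages. The composition of PIPE's actual test set is not given here:

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true interactions recovered
    specificity = tn / (tn + fp)  # fraction of non-interactions correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical balanced test set: 100 interacting and 100 non-interacting pairs.
sens, spec, acc = classification_metrics(tp=61, fn=39, tn=89, fp=11)
print(f"{sens:.2f} {spec:.2f} {acc:.2f}")  # 0.61 0.89 0.75
```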

Page 86: Network Biology: from lists to underpinnings of molecular behaviour

Page 87: Network Biology: from lists to underpinnings of molecular behaviour

Page 88: Network Biology: from lists to underpinnings of molecular behaviour

PIPE2

• First all-to-all sequence-based computational screen of PPIs in yeast
  – 29,589 high-confidence interactions out of ~2 × 10⁷ possible pairs
  – 16,000× faster than PIPE
  – 99.95% specificity
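The figure of roughly 2 × 10⁷ pairs comes from counting unordered pairs in an all-to-all screen, n(n - 1)/2. Here is a back-of-the-envelope sketch. It assumes about 6,000 yeast proteins, which is an approximation and not a number from the slide:

```python
def unordered_pairs(n_proteins):
    """Number of distinct protein pairs (excluding self-pairs) in an all-to-all screen."""
    return n_proteins * (n_proteins - 1) // 2


n = 6000  # assumed approximate number of yeast proteins
pairs = unordered_pairs(n)
print(pairs)                   # 17997000, i.e. ~2 x 10^7
print(f"{29589 / pairs:.2%}")  # share of all pairs called high-confidence (~0.16%)
```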

Page 89: Network Biology: from lists to underpinnings of molecular behaviour

Synthetic Genetic Interactions

• Synthetic genetic interactions (lethal, slow growth)
• Mate two mutants without phenotypes to get a daughter cell with a phenotype
• Synthetic lethal (SL), slow growth
• Robotic mating using the yeast deletion library
• Genetic interactions provide functional data on protein interactions or redundant genes
• About 23% of known SLs (1,295 pairs from YPD + MIPS) were known protein interactions in yeast

Tong et al. Science. 2001 Dec 14;294(5550):2364-8
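The figure of about 23% is an overlap between two sets of gene pairs. Here is a minimal sketch of that calculation. It assumes both data sets are available as lists of gene-name pairs. The toy pairs are hypothetical and are not real results:

```python
def overlap_fraction(sl_pairs, ppi_pairs):
    """Fraction of synthetic-lethal pairs that are also known protein interactions.

    Pairs are unordered, so (A, B) and (B, A) count as the same pair.
    """
    sl = {frozenset(p) for p in sl_pairs}
    ppi = {frozenset(p) for p in ppi_pairs}
    return len(sl & ppi) / len(sl) if sl else 0.0


# Hypothetical toy data (placeholder gene names).
synthetic_lethal = [("G1", "G2"), ("G3", "G4"), ("G5", "G6"), ("G7", "G8")]
physical = [("G2", "G1"), ("G9", "G10")]
print(overlap_fraction(synthetic_lethal, physical))  # 0.25
```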

Page 90: Network Biology: from lists to underpinnings of molecular behaviour

Legend (functional categories): Cell Polarity, Cell Wall Maintenance, Cell Structure, Mitosis, Chromosome Structure, DNA Synthesis, DNA Repair, Unknown, Others

Synthetic Genetic Interactions in Yeast

Tong, Boone

Page 91: Network Biology: from lists to underpinnings of molecular behaviour

Validation: Protein Localization

Legend: A–A3: Y2H; B: physical methods; C: genetic; E: immunological

True positives:
- are localized in the same cellular compartment
- have a common cellular role

Sprinzak, Sattath, Margalit, J Mol Biol, 2003
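The localization criterion can be turned into a simple score for an interaction data set. The higher the fraction of pairs that share a compartment, the more the set resembles true positives. Here is a minimal sketch. It assumes compartment annotations are available as a set per protein, and the names are hypothetical. It shows one simple version of the idea, not the exact procedure of Sprinzak et al.:

```python
def colocalized_fraction(interactions, localization):
    """Fraction of interactions whose two proteins share a cellular compartment.

    localization maps each protein to a set of compartments. Pairs involving a
    protein without localization data are skipped, not counted as negatives.
    """
    annotated = [
        (a, b) for a, b in interactions if localization.get(a) and localization.get(b)
    ]
    if not annotated:
        return 0.0
    shared = sum(1 for a, b in annotated if localization[a] & localization[b])
    return shared / len(annotated)


# Hypothetical data: P4 has no localization annotation.
interactions = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
localization = {"P1": {"nucleus"}, "P2": {"nucleus", "cytoplasm"}, "P3": {"mitochondrion"}}
print(colocalized_fraction(interactions, localization))  # 0.5
```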

Page 92: Network Biology: from lists to underpinnings of molecular behaviour

Comparisons
• All methods except Y2H and the synthetic lethality technique are biased toward abundant proteins.
• PPI data are biased toward certain cellular localizations.
• Evolutionarily conserved proteins have much better coverage in Y2H than proteins restricted to a certain organism.

C. von Mering et al., Nature, 2002

Page 93: Network Biology: from lists to underpinnings of molecular behaviour

Functional Associations
• Molecular Interactions
• Regulatory Interactions
• Genetic Interactions
• Similarity relationships
  – Co-expression
  – Protein sequence
  – Domain architecture
  – Phylogenetic profiles
  – Gene neighborhood
  – Gene fusion
  – …

Page 94: Network Biology: from lists to underpinnings of molecular behaviour

http://string.embl.de/ (von Mering et al., Nucleic Acids Res., 2005)

Page 95: Network Biology: from lists to underpinnings of molecular behaviour

Page 96: Network Biology: from lists to underpinnings of molecular behaviour

Page 97: Network Biology: from lists to underpinnings of molecular behaviour

Gene Function Prediction using a Multiple Association Network Integration Algorithm

Query-specific weights for multifaceted function queries

Composite network = weighted sum (weights w1, w2, w3) of co-expression, genetic (Tong et al. 2001) and co-complexed (Durrett 2006) networks

Figure: example query genes CDC27, APC11, CDC23 (cell cycle) and XRS2, RAD54, MRE11 (DNA repair), with unknown genes UNK1 and UNK2

Pavlidis et al., 2002; Lanckriet et al., 2004; Mostafavi et al., 2008
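The integration shown on this slide combines the association networks as a weighted sum, and genes are then scored by their links to the query. Here is a minimal Python sketch. It assumes each network is a dict of weighted gene pairs. The weights are fixed by hand here, whereas GeneMANIA fits query-specific weights. The one-step neighbor score is a simple guilt-by-association stand-in for GeneMANIA's label propagation, not its actual algorithm. All edge values are hypothetical:

```python
from collections import defaultdict


def composite_network(networks, weights):
    """Weighted sum of undirected association networks.

    networks: dict name -> {(gene_a, gene_b): edge value}
    weights:  dict name -> network weight (w1, w2, ...); hand-set in this sketch
    """
    combined = defaultdict(float)
    for name, edges in networks.items():
        w = weights.get(name, 0.0)
        for (a, b), value in edges.items():
            combined[frozenset((a, b))] += w * value
    return combined


def guilt_by_association(combined, query):
    """Score each non-query gene by its summed edge weight to the query genes."""
    scores = defaultdict(float)
    for pair, value in combined.items():
        if len(pair) != 2:  # skip self-edges
            continue
        a, b = tuple(pair)
        if a in query and b not in query:
            scores[b] += value
        elif b in query and a not in query:
            scores[a] += value
    return dict(scores)


# Hypothetical networks and weights (values are illustrative only).
networks = {
    "co-expression": {("CDC27", "UNK1"): 0.8, ("MRE11", "UNK2"): 0.3},
    "genetic": {("CDC23", "UNK1"): 1.0, ("XRS2", "UNK2"): 1.0},
    "co-complexed": {("APC11", "CDC27"): 1.0},
}
weights = {"co-expression": 0.5, "genetic": 0.3, "co-complexed": 0.2}
scores = guilt_by_association(composite_network(networks, weights), {"CDC27", "APC11", "CDC23"})
print({g: round(s, 2) for g, s in scores.items()})  # {'UNK1': 0.7}
```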

Page 98: Network Biology: from lists to underpinnings of molecular behaviour

GeneMANIA Cytoscape Plugin

Page 99: Network Biology: from lists to underpinnings of molecular behaviour

Outline

1. Explore identified proteins

2. Attribute enrichment

3. Networks

4. Pathways

5. Lab

Page 100: Network Biology: from lists to underpinnings of molecular behaviour

pathway

In biology, a pathway is a network which consists of inputs (physical entities), outputs (physical entities, biological outcomes), and the molecular machinery and chemical transformations required/expected to realize the end-directed activity.

Page 101: Network Biology: from lists to underpinnings of molecular behaviour

Using Pathway Information

Pathway information (databases, literature, expert knowledge) + experimental data → pathway analysis → find active processes underlying a phenotype

Page 102: Network Biology: from lists to underpinnings of molecular behaviour

http://pathguide.org

Vuk Pavlovic, Sylva Donaldson

>290 pathway databases!

• Varied formats, representation, coverage

• Pathway data are extremely difficult to combine and use

Page 103: Network Biology: from lists to underpinnings of molecular behaviour

Aim: Convenient Access to Pathway Information

• Facilitate creation and communication of pathway data
• Aggregate pathway data in the public domain
• Provide easy access for pathway analysis

http://www.pathwaycommons.org

Page 104: Network Biology: from lists to underpinnings of molecular behaviour

Access From Cytoscape

Page 105: Network Biology: from lists to underpinnings of molecular behaviour

Fatty Acid Degradation? Other pathways/processes?

GenMAPP.org

cardiomyopathy: downregulated genes

Page 106: Network Biology: from lists to underpinnings of molecular behaviour

Fatty Acid Degradation Pathway

Page 107: Network Biology: from lists to underpinnings of molecular behaviour

Cardiomyopathy Data on Fatty Acid Degradation Pathway

Page 108: Network Biology: from lists to underpinnings of molecular behaviour

Visualizing Time Course Data on Pathways: Multiple Comparison View

Page 109: Network Biology: from lists to underpinnings of molecular behaviour

Outline

1. Explore identified proteins

2. Attribute enrichment

3. Networks

4. Pathways

5. Lab

Page 110: Network Biology: from lists to underpinnings of molecular behaviour

Network Analysis

• Cytoscape
  – Visualize molecular interaction networks and integrate interactions with gene expression profiles and other state data. Data filters & custom plug-in architecture.
  – http://www.cytoscape.org

• BioLayout Express 3D
  – Large networks
  – Gene expression
  – www.sanger.ac.uk/Teams/Team101/biolayout/b3d.html

Page 111: Network Biology: from lists to underpinnings of molecular behaviour

Network Analysis using Cytoscape

Network information (databases, literature, expert knowledge) + experimental data → network analysis → find biological processes underlying a phenotype

Page 112: Network Biology: from lists to underpinnings of molecular behaviour

Network visualization and analysis

UCSD, ISB, Agilent, MSKCC, Pasteur, UCSF, Unilever, UToronto, U Texas

http://cytoscape.org

• Pathway comparison
• Literature mining
• Gene Ontology analysis
• Active modules
• Complex detection
• Network motif search

Page 113: Network Biology: from lists to underpinnings of molecular behaviour

Manipulate Networks

• Filter/Query
• Automatic Layout
• Interaction Database Search

Page 114: Network Biology: from lists to underpinnings of molecular behaviour

Figure (Focus, Overview and Zoom views): PKC Cell Wall Integrity

Page 115: Network Biology: from lists to underpinnings of molecular behaviour

Active Community

• Help
  – 8 tutorials, >10 case studies
  – Mailing lists for discussion
  – Documentation, data sets

• Annual Conference: Houston, Nov 6-9, 2009

• Tens of thousands of users, 2,500 downloads/month
• >40 plugins extend functionality
  – Build your own (requires programming)

http://www.cytoscape.org

Cline MS et al. Integration of biological networks and gene expression data using Cytoscape. Nat Protoc. 2007;2(10):2366-82.

Page 116: Network Biology: from lists to underpinnings of molecular behaviour

LAB

Objective
• Create a map of the functional enrichments from the 14 input proteins

Methods
• Use HGNC to obtain the gene symbols from the protein names
• Submit the gene symbols to a tool that already has datasets loaded
• Get attributes and do analysis on the network

Page 117: Network Biology: from lists to underpinnings of molecular behaviour

14 Proteins
• ISOFORM of APOPTOSIS-INDUCING FACTOR 1, MITOCHONDRIAL
• QUINONE OXIDOREDUCTASE.; 26 KDA PROTEIN.; 22 KDA PROTEIN.; 32 KDA PROTEIN.
• 14-3-3 PROTEIN EPSILON.
• ELONGATION FACTOR 1-GAMMA.; 50 KDA PROTEIN.
• AFG3-LIKE PROTEIN 2.
• 3-KETOACYL-COA THIOLASE, MITOCHONDRIAL
• IMPORTIN BETA-1 SUBUNIT.
• FH1/FH2 DOMAIN-CONTAINING PROTEIN
• ANNEXIN VI ISOFORM 2.; ANNEXIN A6.
• 2,4-DIENOYL-COA REDUCTASE, MITOCHONDRIAL
• HYDROXYACYL GLUTATHIONE HYDROLASE ISOFORM 1.; HYDROXYACYLGLUTATHIONE HYDROLASE.
• ISOFORM 1 OF ELECTRON TRANSFER FLAVOPROTEIN SUBUNIT BETA.; ISOFORM 2 OF ELECTRON TRANSFER FLAVOPROTEIN SUBUNIT BETA
• ISOFORM 1 OF LONG-CHAIN-FATTY-ACID--COA LIGASE 1
• PHOSPHOLIPASE C DELTA 4.

Page 118: Network Biology: from lists to underpinnings of molecular behaviour

Get their gene symbols/identifiers
HGNC - http://www.genenames.org

• Provide a table of mappings
• What challenges did you face when trying to identify the symbols from textual descriptions?

Page 119: Network Biology: from lists to underpinnings of molecular behaviour

Identify functional enrichments

Discuss and provide a plot for the enrichment of Gene Ontology categories
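A common way to test one GO category for enrichment is the one-sided hypergeometric test, which is one of the tests FUNC offers. Here is a minimal sketch using only the standard library (Python 3.8+ for `math.comb`). The counts in the example are hypothetical. Testing many GO categories requires multiple-testing correction, and this sketch computes only the per-category p-value:

```python
from math import comb


def hypergeometric_pvalue(k, n, K, N):
    """One-sided enrichment p-value, P(X >= k), for X ~ Hypergeometric(N, K, n).

    N: annotated genes in the background (e.g. all human genes)
    K: background genes annotated to the GO category
    n: genes in the input list
    k: input genes annotated to the GO category
    """
    upper = min(n, K)  # X cannot exceed the list size or the category size
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 5 of 14 input genes fall in a category of 60 genes
# out of 18,000 annotated background genes.
print(hypergeometric_pvalue(k=5, n=14, K=60, N=18000))
```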

Page 120: Network Biology: from lists to underpinnings of molecular behaviour

Build an attribute enrichment network

• Which new proteins are functionally linked?
• What datasets were used in the network construction?

Page 121: Network Biology: from lists to underpinnings of molecular behaviour

Attribute Enrichment with a custom data set

• Use BioMart to
  – convert HGNC identifiers to Ensembl identifiers
  – obtain the Gene Ontology categories for the target proteins and the background proteins
• Use FUNC to do the enrichment analysis

Page 122: Network Biology: from lists to underpinnings of molecular behaviour

Page 123: Network Biology: from lists to underpinnings of molecular behaviour

Page 124: Network Biology: from lists to underpinnings of molecular behaviour

Page 125: Network Biology: from lists to underpinnings of molecular behaviour

Page 126: Network Biology: from lists to underpinnings of molecular behaviour

Collect the Gene Ontology attributes for the list, then for all the human genes

Page 127: Network Biology: from lists to underpinnings of molecular behaviour

Next steps are harder…

To use FUNC, you need to convert the BioMart output to the file format above. This is easy to do in Excel for the protein list, but Excel can't handle the results for all the human proteins, so you will need to write a small script. Take BIOC3008 and become competent in simple data manipulation.

http://func.eva.mpg.de/
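Here is one possible shape for that "small script". The target FUNC format appears only on the slide image, so the output layout below is an assumption and should be adapted to FUNC's documentation. That layout has one row per gene and GO term, plus a 1/0 flag marking input-list genes versus background. The sketch also assumes a tab-separated BioMart export with a header row, gene IDs in column 1 and GO accessions in column 2:

```python
import csv


def biomart_to_func(biomart_path, target_ids, out_path):
    """Flatten a BioMart gene-to-GO export into a tab-separated annotation table.

    Assumptions (adapt to your export and to FUNC's documented input format):
    - the BioMart file is tab-separated with one header row;
    - column 1 holds the gene identifier and column 2 the GO accession;
    - each output row is: gene ID, GO ID, 1 if the gene is in target_ids else 0.
    Returns the number of distinct (gene, GO ID) rows written.
    """
    seen = set()
    with open(biomart_path, newline="") as src:
        with open(out_path, "w", newline="") as dst:
            reader = csv.reader(src, delimiter="\t")
            writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
            next(reader, None)  # skip the header row
            for row in reader:  # streamed line by line, so whole-genome files fit
                if len(row) < 2:
                    continue
                gene, go_id = row[0].strip(), row[1].strip()
                if not go_id.startswith("GO:"):
                    continue  # gene exported without a GO annotation
                if (gene, go_id) in seen:
                    continue  # drop duplicate rows
                seen.add((gene, go_id))
                writer.writerow([gene, go_id, 1 if gene in target_ids else 0])
    return len(seen)
```

For example, you could call `biomart_to_func("mart_export.txt", target_ids, "func_input.tsv")`, where `target_ids` is the set of Ensembl IDs for the 14 lab proteins (both file names are hypothetical). The script reads the file one line at a time, so it also handles the all-human-genes export that is too large for Excel.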
