
Functional Analysis - Outline

May 08, 2022

Page 1: Functional Analysis - Outline

29.09.2009 – Stefan Götz, Valencia

Functional Analysis - Outline

● Test for enriched functions
  – Fisher's Exact Test (FatiGO)
  – Gene Set Enrichment (GSEA, FatiScan)
● KEGG Pathway Analysis with B2G
● B2G-FAR

Page 2: Functional Analysis - Outline

Fisher's Exact Test

One gene list (A): Biosynthesis 54%, Sporulation 18%
The other list (B): Biosynthesis 18%, Sporulation 18%

Are these two groups of genes carrying out different biological roles?

Is this statistically significant? In other words, is it unlikely to have occurred by chance?

Page 3: Functional Analysis - Outline

One gene list (A): Biosynthesis 54%, Sporulation 18%
The other list (B): Biosynthesis 18%, Sporulation 18%

Are these two groups of genes carrying out different biological roles?

Fisher's Exact Test: contingency table

[Contingency table with rows Biosynthesis / No biosynthesis and columns A / B; only the values 26 (Biosynthesis) and 95 (No biosynthesis) survive in the transcript]

p-value for biosynthesis = 0.0913

Genes in group A are not significantly associated with either biosynthesis or sporulation.
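As a minimal sketch of the test itself, the two-sided Fisher's exact p-value for a 2×2 table can be computed from the hypergeometric distribution with the Python standard library alone. The table layout (rows: annotated / not annotated; columns: test list / reference list) and the function name are our own choices for illustration; in practice a library routine such as SciPy's `scipy.stats.fisher_exact` would normally be used.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: genes annotated with the term / not annotated.
    Columns: test list / reference list.
    """
    n_annot = a + b            # genes carrying the term
    n_test = a + c             # genes in the test list
    n = a + b + c + d          # all genes
    denom = comb(n, n_test)

    def pmf(x: int) -> float:
        # Hypergeometric probability that the test list holds x annotated genes
        return comb(n_annot, x) * comb(n - n_annot, n_test - x) / denom

    lo = max(0, n_test - (n - n_annot))
    hi = min(n_annot, n_test)
    p_obs = pmf(a)
    # Sum over all tables with the same margins that are no more probable
    # than the observed one; the tolerance absorbs floating-point ties.
    total = sum(p for p in (pmf(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Classic "lady tasting tea" table: 3 of 4 cups classified correctly
print(fisher_exact_two_sided(3, 1, 1, 3))  # 34/70, about 0.486
```

Because every table with the observed margins is enumerated, the p-value is exact rather than a chi-square approximation, which matters for the small counts typical of individual GO terms.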

Page 4: Functional Analysis - Outline

Multiple testing correction

This test is performed for every GO term in the dataset.

Many tests => many false positives => a correction is needed.

FDR control: controlling the false discovery rate is a way to correct for multiple comparisons in multiple hypothesis testing. Among the rejected hypotheses, FDR control bounds the expected proportion of incorrectly rejected null hypotheses.

FWER control (more conservative): the familywise error rate is the probability of making one or more false discoveries among all the hypotheses tested.
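The two kinds of correction can be sketched in a few lines: Benjamini-Hochberg for FDR control and Bonferroni for FWER control. These are standard textbook procedures; whether Blast2GO uses exactly these variants is an assumption here, and the p-values in the example are hypothetical.

```python
def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """BH step-up adjusted p-values (controls the FDR), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so the adjusted values
    # stay monotone: adj(rank r) = min over ranks j >= r of p(j) * m / j.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni(pvalues: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values (controls the FWER; more conservative)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


raw = [0.01, 0.04, 0.03, 0.005]   # hypothetical p-values for four GO terms
print(benjamini_hochberg(raw))    # approximately [0.02, 0.04, 0.04, 0.02]
print(bonferroni(raw))            # approximately [0.04, 0.16, 0.12, 0.02]
```

With a 0.05 cut-off all four terms survive the FDR correction, but only two survive Bonferroni, which illustrates why FWER control is called more conservative.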

Page 5: Functional Analysis - Outline

Different types of comparisons

● Compare one condition against another
  – Remove common IDs
  – Test-Set and Ref-Set are interchangeable
  [Figure: Set 1 and Set 2 overlapping in their common IDs]

● Compare a subset against the total
  – GOSSIP default setting
  – Test-Set and Ref-Set are NOT interchangeable
  [Figure: Test-Set inside Ref-Set, sharing common IDs]
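A minimal sketch of the set bookkeeping for the first mode, with hypothetical gene IDs and helper names: after common IDs are removed, swapping Test-Set and Ref-Set only swaps the columns of the 2×2 table, which is why the two sets are interchangeable in that mode. How Blast2GO handles the overlap in the subset-vs-total mode is not modelled here.

```python
def remove_common_ids(set1: set[str], set2: set[str]) -> tuple[set[str], set[str]]:
    """Condition-vs-condition mode: drop IDs present in both sets."""
    common = set1 & set2
    return set1 - common, set2 - common


def contingency_counts(test: set[str], ref: set[str],
                       annotated: set[str]) -> tuple[int, int, int, int]:
    """2x2 counts (a, b, c, d) for one term.

    a: test & annotated,     b: ref & annotated,
    c: test & not annotated, d: ref & not annotated.
    """
    a = len(test & annotated)
    b = len(ref & annotated)
    return a, b, len(test) - a, len(ref) - b


# Hypothetical gene IDs: g3 and g4 appear in both conditions
set1 = {"g1", "g2", "g3", "g4"}
set2 = {"g3", "g4", "g5", "g6", "g7"}
term_genes = {"g1", "g2", "g5"}

s1, s2 = remove_common_ids(set1, set2)
print(contingency_counts(s1, s2, term_genes))  # (2, 1, 0, 2)
print(contingency_counts(s2, s1, term_genes))  # (1, 2, 2, 0): same table, columns swapped
```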

Page 6: Functional Analysis - Outline

FatiGO in Blast2GO

● A two-tailed test identifies not only over-represented but also under-represented functions.

● If no Ref-Set is chosen, all annotations are used as the reference.
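The difference between the tails can be sketched with the same hypergeometric model (a hypothetical helper, not Blast2GO code). A one-tailed test for over-representation only looks at P(X ≥ a), the chance of seeing at least the observed number of annotated genes in the test set; the opposite tail P(X ≤ a) captures under-representation, and a two-tailed test is sensitive to both.

```python
from math import comb


def one_tailed_pvalues(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """One-tailed Fisher p-values for [[a, b], [c, d]].

    Rows: annotated / not annotated; columns: test / reference.
    Returns (p_over, p_under) = (P(X >= a), P(X <= a)), where X is the
    hypergeometric count of annotated genes in the test set.
    """
    n_annot, n_test, n = a + b, a + c, a + b + c + d
    denom = comb(n, n_test)

    def pmf(x: int) -> float:
        return comb(n_annot, x) * comb(n - n_annot, n_test - x) / denom

    lo = max(0, n_test - (n - n_annot))
    hi = min(n_annot, n_test)
    p_over = sum(pmf(x) for x in range(a, hi + 1))    # over-representation tail
    p_under = sum(pmf(x) for x in range(lo, a + 1))   # under-representation tail
    return p_over, p_under


print(one_tailed_pvalues(3, 1, 1, 3))  # (17/70, 69/70), about (0.243, 0.986)
```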

Page 7: Functional Analysis - Outline

FatiGO in Blast2GO

● Result table with links out to the corresponding sequence lists

Page 8: Functional Analysis - Outline

FatiGO in Blast2GO

Retains only the lowest, most specific enriched term per GO branch
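One way to implement "keep only the most specific enriched term per branch" is to drop every enriched term that is an ancestor of another enriched term. The sketch below assumes the GO DAG is given as a child-to-parents dictionary; the term IDs are hypothetical, and this is our illustration rather than Blast2GO's own code.

```python
def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """All terms reachable by following parent links upward in the DAG."""
    seen: set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen


def most_specific(enriched: set[str], parents: dict[str, list[str]]) -> set[str]:
    """Keep only enriched terms that have no enriched descendant."""
    covered: set[str] = set()
    for t in enriched:
        covered |= ancestors(t, parents)
    return {t for t in enriched if t not in covered}


# Toy DAG with hypothetical term IDs: A is the root, D is a child of B
parents = {"B": ["A"], "C": ["A"], "D": ["B"]}
print(sorted(most_specific({"A", "B", "C", "D"}, parents)))  # ['C', 'D']
```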

Page 9: Functional Analysis - Outline

FatiGO in Blast2GO

● Export enriched-term data as DAG graphs

=> To draw all nodes, set the filter to 1

Page 10: Functional Analysis - Outline

FatiGO in Blast2GO

=> Filter results

The chart compares the % of sequences in the Test group with the % of sequences in the Ref group:
If Test > Ref: over-represented
If Ref > Test: under-represented

● Export enriched terms as a chart
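The comparison behind the chart can be sketched as follows, using hypothetical counts and a hypothetical helper name. Note that this only gives the direction of the difference; whether it is significant still comes from the (adjusted) p-value.

```python
def direction(a: int, n_test: int, b: int, n_ref: int) -> tuple[float, float, str]:
    """Percent of Test and Ref sequences annotated with a term, plus direction."""
    pct_test = 100.0 * a / n_test
    pct_ref = 100.0 * b / n_ref
    if pct_test > pct_ref:
        label = "over-represented"
    elif pct_ref > pct_test:
        label = "under-represented"
    else:
        label = "no difference"
    return pct_test, pct_ref, label


# Hypothetical counts: 12 of 40 test sequences vs. 50 of 1000 reference sequences
print(direction(12, 40, 50, 1000))  # (30.0, 5.0, 'over-represented')
```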

Page 11: Functional Analysis - Outline

FatiScan Features

● Interprets a ranked list of genes
● No need to choose a cut-off (all information is used)
● One statistical test for each functional block of annotation
  – Multiple-testing context (hundreds of annotations)
  – Filtering the annotations is convenient (the fewer the tests, the milder the correction)

Page 12: Functional Analysis - Outline

FatiScan: testing along an ordered list

• An index ranking genes according to some biological aspect under study.

• A database that stores gene-class membership information.

• FatiScan searches the whole ordered list, trying to find runs of functionally related genes.

[Figure: a list of genes ranked from + to −, marked with annotation labels A, B and C. One block of genes is enriched in annotation A and another block in annotation B, while annotation C is homogeneously distributed along the list.]
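To make the idea concrete, here is a minimal sketch of a partition-based scan in the spirit of FatiScan as we understand it: the ranked list is cut at several evenly spaced positions and, at each cut, Fisher's exact test compares how often a term occurs above versus below the cut. The number of partitions, the function names and the "best cut" summary are assumptions for illustration; this is not the Babelomics implementation.

```python
from math import comb
from typing import Optional


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test; rows annotated/not, columns above/below the cut."""
    n_annot, n_top, n = a + b, a + c, a + b + c + d
    denom = comb(n, n_top)

    def pmf(x: int) -> float:
        return comb(n_annot, x) * comb(n - n_annot, n_top - x) / denom

    p_obs = pmf(a)
    lo, hi = max(0, n_top - (n - n_annot)), min(n_annot, n_top)
    total = sum(p for p in (pmf(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def scan_ranked_list(ranked_genes: list[str], annotated: set[str],
                     n_partitions: int = 10) -> tuple[float, Optional[int]]:
    """Return (smallest p-value, cut position) over evenly spaced cuts.

    n_partitions is a hypothetical tuning parameter of this sketch.
    """
    n = len(ranked_genes)
    flags = [gene in annotated for gene in ranked_genes]
    best_p, best_cut = 1.0, None
    for k in range(1, n_partitions):
        cut = k * n // n_partitions
        a = sum(flags[:cut])        # annotated genes above the cut
        b = sum(flags[cut:])        # annotated genes below the cut
        c = cut - a                 # non-annotated genes above the cut
        d = (n - cut) - b           # non-annotated genes below the cut
        p = fisher_two_sided(a, b, c, d)
        if p < best_p:
            best_p, best_cut = p, cut
    # The minimum over many cuts (and many terms) must still be corrected
    # for multiple testing before it is interpreted.
    return best_p, best_cut


ranked = [f"g{i}" for i in range(20)]     # hypothetical genes, top-ranked first
annotated = {f"g{i}" for i in range(5)}   # a term carried by the top five genes
print(scan_ranked_list(ranked, annotated, n_partitions=4))  # best cut at position 5
```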

Page 13: Functional Analysis - Outline

FatiScan: gene set enrichment analysis (case-control fold change)

[Figure: a ranked gene list (e.g. fold changes) running from + to −, with functional labels GO1, GO2, GO10, …, GO11 marked along it and regions A to D indicated]

A: Functional labels (GO, KEGG, etc.) over-represented among over-expressed genes

B: Functional labels under-represented among over-expressed genes

C: Functional labels over-represented among under-expressed genes

D: Functional labels under-represented among under-expressed genes

Page 14: Functional Analysis - Outline

Functional Analysis - Outline

● Test for enriched functions
  – Fisher's Exact Test (FatiGO)
  – Gene Set Enrichment (GSEA, FatiScan)
● KEGG Pathway Analysis
● B2G-FAR

Export annotations to other tools:

C04018C10 GO:0004707
C04018C10 GO:0006468
C04018C10 GO:0005524
C04018C10 EC:2.7.11.24
C04018E10 GO:0005739
C04018E10 GO:0009536
C04018A12 GO:0009056
C04018C12 GO:0004869
C04018C12 ...
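Before the exported annotations can be used by another tool, the one-pair-per-line listing has to be grouped per sequence. A minimal sketch, assuming one whitespace-separated "sequence ID, term" pair per line (our assumption about the export format; the helper name is hypothetical):

```python
from collections import defaultdict


def parse_annotations(lines: list[str]) -> dict[str, set[str]]:
    """Group 'sequence-ID <whitespace> term' lines into ID -> set of terms.

    Blank lines and lines with fewer than two fields are skipped.
    """
    annotations: dict[str, set[str]] = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            annotations[fields[0]].add(fields[1])
    return dict(annotations)


example = [
    "C04018C10\tGO:0004707",
    "C04018C10\tGO:0006468",
    "C04018C10\tEC:2.7.11.24",
    "C04018E10\tGO:0005739",
]
ann = parse_annotations(example)
print(sorted(ann["C04018C10"]))  # ['EC:2.7.11.24', 'GO:0004707', 'GO:0006468']
```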

Page 15: Functional Analysis - Outline

Babelomics Web Tools Suite

● A complete suite of web tools for the functional analysis of groups of genes in high-throughput experiments

Page 16: Functional Analysis - Outline

Babelomics Databases

Integrated biological database of functional annotation for more than 10 species:
InterPro, Gene Ontology, KEGG, Ensembl, SwissProt, transcription factors, microRNA, cisRED, bioentities from literature

● Import your own annotations

Page 17: Functional Analysis - Outline

Babelomics Tools

FatiGO: finds differential distributions of functional terms between two groups of genes. These terms can be Gene Ontology terms, InterPro motifs, SwissProt keywords, transcription factors (TF), gene expression in tissues, bioentities from scientific literature, and cisRED cis-regulatory elements.

Tissues Mining Tool: compares reference values of gene expression in tissues to your results.

MARMITE: finds differential distributions of bioentities extracted from PubMed between two groups of genes.

FatiScan: detects significant functions using Gene Ontology, InterPro motifs, SwissProt keywords and KEGG pathways in lists of genes ordered according to different characteristics.

MarmiteScan: uses chemical and disease-related information to detect related blocks of genes in a gene list with associated values.

Input types: two lists (e.g. FatiGO, MARMITE) or a ranked list (e.g. FatiScan, MarmiteScan).

Page 18: Functional Analysis - Outline

FatiScan Web Tool

List of genes (with values):

C04018C12 2.31
C04018C13 2.23
C04018C14 1.87
C04018C16 1.62
C04018E18 0.87
C04018E19 0.12
C04018A21 -0.01
C04018C33 -0.18
C04018C65 ....

List of annotations:

C04018C13 GO:0004707
C04018C14 GO:0006468
C04018C15 GO:0005524
C04018C16 EC:2.7.11.24
C04018E17 GO:0005739
C04018E18 GO:0009536
C04018A19 GO:0009056
C04018C22 GO:0004869
C04018C32 ...
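The gene list is just "ID value" pairs that define the ranking. A minimal sketch of reading such a list and ordering it from the highest to the lowest value (the helper name is hypothetical; skipping non-numeric entries such as a trailing "...." is our choice):

```python
def parse_ranked_list(lines: list[str]) -> list[tuple[str, float]]:
    """Parse 'ID value' lines and return them sorted by value, highest first.

    Lines whose second field is not a number (e.g. '....') are skipped.
    """
    ranked = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            ranked.append((fields[0], float(fields[1])))
        except ValueError:
            continue
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


example = ["C04018C14 1.87", "C04018C12 2.31", "C04018A21 -0.01", "C04018C65 ...."]
print(parse_ranked_list(example))
# [('C04018C12', 2.31), ('C04018C14', 1.87), ('C04018A21', -0.01)]
```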

Page 19: Functional Analysis - Outline

FatiScan Results

[Screenshot: table of significantly enriched GO terms, with columns for percentages (Test/Ref), p-values and adjusted p-values]

Page 20: Functional Analysis - Outline

Exercise

● A list of genes differentially expressed between two tumour classes
● To identify functionally enriched terms for blocks of genes, we will perform a threshold-free FatiScan
● Genes are ranked by their fold change
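The exercise supplies genes already ranked by fold change. Purely to illustrate where such a ranking can come from, here is a sketch that ranks genes by the log2 ratio of the two class means. The log2-of-means definition, the positive untransformed expression values and the gene names are all assumptions of this example.

```python
import math


def rank_by_log2_fold_change(expr_a: dict[str, list[float]],
                             expr_b: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank genes by log2(mean of class A / mean of class B), highest first.

    Assumes positive, non-log-transformed expression values.
    """
    scores = []
    for gene in expr_a.keys() & expr_b.keys():
        mean_a = sum(expr_a[gene]) / len(expr_a[gene])
        mean_b = sum(expr_b[gene]) / len(expr_b[gene])
        scores.append((gene, math.log2(mean_a / mean_b)))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores


# Hypothetical expression values for two tumour classes
class_a = {"geneX": [8.0, 8.0], "geneY": [2.0, 2.0], "geneZ": [4.0, 4.0]}
class_b = {"geneX": [2.0, 2.0], "geneY": [4.0, 4.0], "geneZ": [4.0, 4.0]}
print(rank_by_log2_fold_change(class_a, class_b))
# [('geneX', 2.0), ('geneZ', 0.0), ('geneY', -1.0)]
```

Using the log scale makes up- and down-regulation symmetric around zero, so the top and bottom of the list correspond to the over- and under-expressed ends scanned by FatiScan.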

Page 21: Functional Analysis - Outline

Functional Analysis - Outline

● Test for enriched functions
  – Fisher's Exact Test (FatiGO)
  – Gene Set Enrichment (GSEA, FatiScan)
● KEGG Pathway Analysis with B2G
● B2G-FAR

Page 22: Functional Analysis - Outline

KEGG: Kyoto Encyclopedia of Genes and Genomes

GO term → Enzyme Code → KEGG pathway
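The chain from GO term to enzyme code to KEGG pathway can be sketched as two dictionary lookups per sequence. The GO-to-EC pair below is taken from the annotation example in this deck; the pathway identifier and all helper names are hypothetical placeholders, not real KEGG data or a Blast2GO API.

```python
def map_to_pathways(seq_go: dict[str, set[str]],
                    go_to_ec: dict[str, set[str]],
                    ec_to_pathways: dict[str, set[str]]) -> dict[str, set[str]]:
    """Follow sequence -> GO term -> enzyme code -> KEGG pathway."""
    result: dict[str, set[str]] = {}
    for seq, go_terms in seq_go.items():
        ecs = set().union(*(go_to_ec.get(go, set()) for go in go_terms))
        pathways = set().union(*(ec_to_pathways.get(ec, set()) for ec in ecs))
        if pathways:                       # keep only sequences that reach a pathway
            result[seq] = pathways
    return result


seq_go = {"seq1": {"GO:0004707", "GO:0005524"}, "seq2": {"GO:0005739"}}
go_to_ec = {"GO:0004707": {"EC:2.7.11.24"}}          # pair from the example annotations
ec_to_pathways = {"EC:2.7.11.24": {"pathway_P1"}}    # hypothetical pathway ID
print(map_to_pathways(seq_go, go_to_ec, ec_to_pathways))  # {'seq1': {'pathway_P1'}}
```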

Page 23: Functional Analysis - Outline

Obtain Enzyme Codes

Page 24: Functional Analysis - Outline

KEGG Pathways

● First, choose a folder in which to save the KEGG maps
● Maps are retrieved online
● Export as a tab-separated text file

Page 25: Functional Analysis - Outline

KEGG Pathways

--> Each enzyme is shown in a different color
--> Pathways are ordered by abundance (ordered list)
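Ordering pathways by abundance amounts to counting and sorting. In this sketch, abundance is taken to be the number of sequences mapped to each pathway; that definition is an assumption (B2G might count enzymes instead), and the data are hypothetical.

```python
from collections import Counter


def pathways_by_abundance(seq_pathways: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Count sequences per pathway and order pathways from most to least abundant."""
    counts = Counter(p for pathways in seq_pathways.values() for p in pathways)
    # Ties are broken alphabetically so the output order is deterministic
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


example = {"s1": {"P1", "P2"}, "s2": {"P1"}, "s3": {"P1", "P3"}, "s4": {"P3"}}
print(pathways_by_abundance(example))  # [('P1', 3), ('P3', 2), ('P2', 1)]
```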

Page 26: Functional Analysis - Outline

Functional Analysis - Outline

● Test for enriched functions
  – Fisher's Exact Test (FatiGO)
  – Gene Set Enrichment (GSEA, FatiScan)
● KEGG Pathway Analysis with B2G
● Functional Annotation Repository: B2G-FAR

Page 27: Functional Analysis - Outline

Annotation Repository: B2G-Far

● Many publicly available sequences are uncharacterised → reduce the number of unannotated sequences

● Generate high-quality functional annotations, especially for the non-model species community

Comprehensive and high-throughput: apply the Blast2GO methodology to probably the largest protein sequence resource, SIMAP → pre-calculated sequence alignments of 29 million non-redundant proteins, covering the content of all major public sequence databases and including InterPro domain annotations

Functional and widely used: annotate Affymetrix microarrays for non-model species

Page 28: Functional Analysis - Outline

Analysis Pipeline

● SIMAP contains 29 million proteins
● Annotation source: manually curated GO-lite database
● GeneChips for 17 non-model species

Page 29: Functional Analysis - Outline

Hundreds of species now have Blast2GO protein annotations

Page 30: Functional Analysis - Outline

Results

Data source                                                Unique sequences
Whole SIMAP                                                      29,375,919
SIMAP without metagenomes                                        24,394,532
SIMAP protein sequences annotated by Blast2GO                    13,263,568
Sequences which do not surpass the annotation threshold           2,269,564
Sequences without sequence alignments to GO                       8,861,400

Processing time: 3 days!
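The table's numbers can be cross-checked with a few lines of arithmetic. Reading the last three rows as a partition of "SIMAP without metagenomes" is our interpretation, but it is supported by the exact sum; on that reading, roughly 54% of those sequences received a Blast2GO annotation, and the difference between the first two rows is the number of metagenome sequences excluded.

```python
whole_simap = 29_375_919
without_metagenomes = 24_394_532
annotated = 13_263_568
below_threshold = 2_269_564
no_go_alignments = 8_861_400

# The three outcome categories add up exactly to SIMAP without metagenomes
assert annotated + below_threshold + no_go_alignments == without_metagenomes

print(whole_simap - without_metagenomes)                 # 4981387 metagenome sequences
print(round(100 * annotated / without_metagenomes, 1))   # 54.4 (% annotated)
```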

Page 31: Functional Analysis - Outline

Online Repository

● High-quality functional annotations: 2000 species, 17 non-model species microarrays
● B2G-FAR content
  – Taxonomy
  – General information
  – Data download
  – Statistics: annotation distribution, GO-level distribution, top-50 functions

Page 32: Functional Analysis - Outline

Search your sequences in B2G-FAR

● Download annotations of your species

● Load annotations into B2G

● Deselect all sequences

Page 33: Functional Analysis - Outline

Exercise

● Go to the web site → course material
● Perform an enrichment analysis with Blast2GO and with FatiScan in Babelomics
● Once completed, check the comments/solution for the exercise