Dmitry Grapov, PhD Gene Ontology Network Enrichment Analysis

Gene Ontology Network Enrichment Analysis

Jul 01, 2015


From the UC Davis Proteomics 2014 Summer Workshop
www.proteomics.ucdavis.edu

by Dmitry Grapov, PhD
Page 1: Gene Ontology Network Enrichment Analysis

Dmitry Grapov, PhD

Gene Ontology Network Enrichment Analysis

Page 2

Download all material for the tutorial using the full URL:

https://sourceforge.net/projects/teachingdemos/files/2014%20UC%20Davis%20Proteomics%20Workshop/Summer%202014%20Proteomics%20Workshop.zip/download

or go to

https://sourceforge.net/projects/teachingdemos/files/

and choose ‘2014 UC Davis Proteomics Workshop’.

Page 3

[Figure legend: • decrease • increase]

Use functional analysis to determine whether the changed variables are enriched (over-represented relative to random chance) in a particular biological pathway, domain, or ontological category.

Page 4

Enrichment or Over-representation Analysis

[Figure panels: Biochemical Pathway | Biochemical Ontology]

Page 5

Major Tasks

Using the proteins listed in the Excel workbook ‘proteomic data for analysis.xlsx’, worksheet ‘protein IDs’:

1. Conduct Gene Ontology (GO) enrichment analysis using DAVID Bioinformatics Resources http://david.abcc.ncifcrf.gov/home.jsp

2. Investigate enriched terms using QuickGO http://www.ebi.ac.uk/QuickGO/

3. Summarize and visualize the results using REVIGO http://revigo.irb.hr/

4. Create and modify a GO network using Cytoscape http://www.cytoscape.org/

Page 6

Protein IDs

Common protein identifier: UniProt/Swiss-Prot accession (the default in Scaffold) http://www.uniprot.org/

Use BioMart to translate these to other database IDs (e.g., gene symbols)

http://www.biomart.org/

Page 7

DAVID Bioinformatics Resources http://david.abcc.ncifcrf.gov/home.jsp

Page 8

DAVID Bioinformatics Resources

1. Upload list

2. Choose ID type

3. Select list type

4. Submit

Page 9

DAVID Bioinformatics Resources

Check the organism, and make sure all IDs were recognized

List of biochemical databases tested for enrichment

Page 10

DAVID Bioinformatics Resources

List of biochemical databases tested for enrichment

1. Choose GO

Page 11

DAVID Bioinformatics Resources

http://david.abcc.ncifcrf.gov/helps/functional_annotation.html#E3

Page 12

DAVID Bioinformatics Resources

List of biochemical databases tested for enrichment

1. Review the BP (Biological Process) terms

2. Select

Page 13

DAVID Bioinformatics Resources

http://david.abcc.ncifcrf.gov/helps/functional_annotation.html#E3

Page 14

DAVID Bioinformatics Resources

1. Review the most enriched term

Page 15

QuickGO http://www.ebi.ac.uk/QuickGO/

1. View the children (more specific, lower-level terms in the hierarchy) of this term

Page 16

DAVID Bioinformatics Resources / QuickGO

1. Can you identify any enriched children of this term in our DAVID output?

2. Download results

Page 17

Review and Format Results in Excel

1. Save results

2. Open in MS Excel
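Optionally, the saved chart can also be filtered in R instead of Excel. A minimal sketch, assuming the DAVID chart was saved as a tab-delimited file with `Category`, `Term`, `Count`, `PValue` and `Genes` columns (an assumed subset of DAVID's output; check the header of your own file). The two rows below are hypothetical placeholders, not workshop results, and the file name in the comment is arbitrary.

```r
# Hypothetical two-row excerpt of a DAVID functional annotation chart
# (tab-delimited; column names assumed, values are placeholders).
chart_text <- paste(
  "Category\tTerm\tCount\tPValue\tGenes",
  "GOTERM_BP_FAT\tGO:0006955~immune response\t2\t0.001\tPROT_A, PROT_B",
  "GOTERM_CC_FAT\tGO:0005576~extracellular region\t2\t0.02\tPROT_A, PROT_C",
  sep = "\n"
)

# With a real download: chart <- read.delim("david_chart.txt", check.names = FALSE)
chart <- read.delim(text = chart_text, check.names = FALSE,
                    stringsAsFactors = FALSE)

# Keep Biological Process terms below a p-value cutoff, most significant first
bp <- chart[grepl("^GOTERM_BP", chart$Category) & chart$PValue < 0.05, ]
bp <- bp[order(bp$PValue), ]
print(bp[, c("Term", "Count", "PValue")])
```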

Page 18

Results Overview

Modified Fisher’s Exact Test p-value

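Optionally, check in R:

x <- data.frame(user = c(13, 47), genome = c(690, 13528))  # row 1: hits annotated to the term; row 2: list/genome totals

fisher.test(x)  # p-value = 5.41e-06

(13/47) / (690/13528)  # fold enrichment, about 5.42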
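DAVID's documentation describes its "modified" Fisher exact test (the EASE score) as an ordinary Fisher exact test on a table of term members versus non-members, with one gene removed from the list hits to make the test more conservative. A minimal sketch using the counts above (13 of 47 list proteins, 690 of 13528 background genes); running the test one-sided is an assumption of this sketch, and the value is not guaranteed to match DAVID's reported p-value exactly.

```r
# Counts taken from the slide
list_hits  <- 13     # list proteins annotated to the term
list_total <- 47     # proteins in the uploaded list
pop_hits   <- 690    # background genes annotated to the term
pop_total  <- 13528  # genes in the background population

# EASE-style table: subtract 1 from the list hits, use non-members in row 2
ease_table <- matrix(
  c(list_hits - 1, list_total - list_hits,   # list: in term, not in term
    pop_hits,      pop_total - pop_hits),    # population: in term, not in term
  nrow = 2,
  dimnames = list(c("in term", "not in term"), c("list", "population"))
)

# Same table without the EASE adjustment, for comparison
plain_table <- ease_table
plain_table["in term", "list"] <- list_hits

ease_p  <- fisher.test(ease_table,  alternative = "greater")$p.value
plain_p <- fisher.test(plain_table, alternative = "greater")$p.value
fold_enrichment <- (list_hits / list_total) / (pop_hits / pop_total)

cat("EASE score:", signif(ease_p, 3), "\n")
cat("Unmodified one-sided Fisher p:", signif(plain_p, 3), "\n")
cat("Fold enrichment:", round(fold_enrichment, 2), "\n")
```

Removing one hit always makes the EASE score larger (less significant) than the unmodified p-value, which penalizes terms supported by very few proteins.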

Page 19

Alternative to Fisher's Exact Test:

Hypergeometric Test

How do we calculate the statistics used to determine enrichment?

hit.num = 51 # number of significantly changed pathway variables

set.num = 1455 # number of variables in pathway

full = 3358 # all possible variables in organism

q.size = 72 # number of significantly changed variables

phyper(hit.num - 1, set.num, full - set.num, q.size, lower.tail = FALSE)  # upper tail: P(X >= hit.num)

enrichment p-value = 1.717553e-06
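Fisher's exact test is itself built on the hypergeometric distribution, so a one-sided Fisher test on a 2x2 table arranged from the same counts returns the same p-value as the `phyper()` call above. A short check reusing the slide's numbers; the table layout (changed / not changed by in / not in pathway) is an assumption chosen to line up with `phyper()`'s arguments.

```r
# Slide values
hit.num <- 51    # significantly changed variables in the pathway
set.num <- 1455  # variables in the pathway
full    <- 3358  # all possible variables in the organism
q.size  <- 72    # significantly changed variables

# Upper-tail hypergeometric probability P(X >= hit.num)
p_hyper <- phyper(hit.num - 1, set.num, full - set.num, q.size, lower.tail = FALSE)

# Same counts as a 2x2 table (filled column by column)
tab <- matrix(
  c(hit.num,          set.num - hit.num,                   # in pathway
    q.size - hit.num, full - set.num - q.size + hit.num),  # not in pathway
  nrow = 2,
  dimnames = list(c("changed", "not changed"), c("in pathway", "not in pathway"))
)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

all.equal(p_hyper, p_fisher)  # TRUE
```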

Page 20

Visualization Options

Challenges:

• Removing redundant information

• Visualizing term relationships (term-term, term-protein)

Page 21

Use REVIGO to filter redundant terms

http://revigo.irb.hr/

Prepare input (term, p-value)

1. Upload to REVIGO

Supek F, Bošnjak M, Škunca N, Šmuc T. "REVIGO summarizes and visualizes long lists of Gene Ontology terms" PLoS ONE 2011. doi:10.1371/journal.pone.0021800

2. Run
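The "(term, p-value)" input is a plain list with one GO ID and its value per line. A minimal sketch for building it in R, assuming DAVID-style `Term` labels of the form `GO:xxxxxxx~name`; the terms and p-values are hypothetical placeholders, and the output file name is arbitrary.

```r
# Hypothetical DAVID-style results: Term = "GO ID~term name"
david <- data.frame(
  Term   = c("GO:0006955~immune response", "GO:0006952~defense response"),
  PValue = c(0.001, 0.02),
  stringsAsFactors = FALSE
)

# Strip the "~name" suffix, keeping only the GO identifier
go_id <- sub("~.*$", "", david$Term)

# One "GO-ID p-value" pair per line, to paste into or upload to REVIGO
revigo_lines <- paste(go_id, david$PValue)
out_file <- file.path(tempdir(), "revigo_input.txt")  # any path works
writeLines(revigo_lines, out_file)
writeLines(revigo_lines)
```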

Page 22

REVIGO: overview scatterplot

Positions are defined by term similarity (multidimensional scaling, MDS)
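To illustrate how similarities become 2D positions, here is a generic classical MDS sketch using base R's `cmdscale()`. The 4x4 similarity matrix is hypothetical, and converting similarity to distance as `1 - similarity` is an assumption; this is not REVIGO's exact procedure.

```r
# Hypothetical pairwise similarities between four GO terms (1 = identical)
terms <- c("term_A", "term_B", "term_C", "term_D")
sim <- matrix(c(1.0, 0.8, 0.2, 0.1,
                0.8, 1.0, 0.3, 0.2,
                0.2, 0.3, 1.0, 0.7,
                0.1, 0.2, 0.7, 1.0),
              nrow = 4, dimnames = list(terms, terms))

# Turn similarity into distance, then embed the terms in two dimensions
d <- as.dist(1 - sim)
coords <- cmdscale(d, k = 2)
colnames(coords) <- c("x", "y")

# Similar terms (A/B and C/D) end up close together
print(round(coords, 3))
```

`plot(coords); text(coords, labels = rownames(coords))` draws the corresponding scatterplot.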

Page 23

REVIGO: overview table

Cluster leaders are prioritized by enrichment p-value
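A simplified sketch of picking a leader per cluster, assuming cluster memberships are already known and the leader is simply the term with the lowest p-value. This ignores any other criteria REVIGO may apply, and the terms, clusters and p-values are hypothetical.

```r
# Hypothetical terms with precomputed cluster assignments and p-values
terms <- data.frame(
  term    = c("term_A", "term_B", "term_C", "term_D", "term_E"),
  cluster = c(1, 1, 2, 2, 2),
  p       = c(0.004, 0.0002, 0.03, 0.001, 0.01),
  stringsAsFactors = FALSE
)

# Within each cluster, keep the row with the smallest p-value
leaders <- do.call(rbind, lapply(split(terms, terms$cluster),
                                 function(d) d[which.min(d$p), ]))
print(leaders)
```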

Page 24

REVIGO: network

• Edges: the strongest 3% of pairwise GO term similarities

• Node size: term generality (small = specific)

• Node color: p-value

Download network
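The edge rule above (keep the strongest 3% of pairwise similarities) can be sketched in R. Random numbers stand in for real similarity scores, the term labels are hypothetical, and the output file name is arbitrary; a source/target table like this one can be imported into Cytoscape as a network from a file.

```r
set.seed(42)
n_terms <- 40
terms <- sprintf("term%02d", seq_len(n_terms))  # hypothetical term labels

# Hypothetical symmetric similarity matrix (random stand-in values)
sim <- matrix(runif(n_terms^2), nrow = n_terms, dimnames = list(terms, terms))
sim <- (sim + t(sim)) / 2

# List each unordered pair once (upper triangle, no diagonal)
idx <- which(upper.tri(sim), arr.ind = TRUE)
pairs <- data.frame(source = terms[idx[, 1]], target = terms[idx[, 2]],
                    similarity = sim[idx], stringsAsFactors = FALSE)

# Keep the strongest 3% of pairs as edges
k <- max(1, round(0.03 * nrow(pairs)))
edges <- pairs[order(pairs$similarity, decreasing = TRUE)[seq_len(k)], ]

# Save as a plain source/target table
write.csv(edges, file.path(tempdir(), "go_edges.csv"), row.names = FALSE)
```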

Page 25

Cytoscape

1. Open Cytoscape

Import the REVIGO network into Cytoscape (screenshot steps 2, 3, 4)

Page 26

Cytoscape: set layout and defaults

1. Set layout

3. Set network defaults

[Screenshot callouts: 2, 4, 5]

Page 27

Cytoscape: map data to network properties

1. Set edge width and color

2. Set node labels, size, and color
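Mapping data to width or color means interpolating a numeric attribute onto a visual range. The sketch below imitates that idea in base R for edge width and node color; the attribute values, the 1 to 8 width range and the red-to-yellow palette are hypothetical choices, and this is not Cytoscape code.

```r
# Hypothetical attribute values
edge_similarity <- c(0.91, 0.75, 0.62, 0.88, 0.70)
node_log10_p    <- c(-6.2, -4.1, -3.0, -5.5, -2.1)  # log10(p-value)

# Linear (continuous) mapping of a numeric attribute onto a target range
rescale <- function(x, to = c(0, 1)) {
  to[1] + (x - min(x)) / (max(x) - min(x)) * (to[2] - to[1])
}

edge_width <- rescale(edge_similarity, to = c(1, 8))  # thin to thick

# Node color: most significant (most negative log10 p) -> red, least -> yellow
palette_fn <- colorRampPalette(c("red", "yellow"))
node_color <- palette_fn(100)[round(rescale(node_log10_p, to = c(1, 100)))]

print(data.frame(edge_similarity, edge_width))
print(data.frame(node_log10_p, node_color))
```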

Page 28

Cytoscape: overview network components

Download edge information

[Screenshot steps 1 and 2]

3. View in Excel

Download node information

[Screenshot steps 1 and 2]

3. View in Excel

Page 29

Bonus: Modify edge and node attributes to show term-to-protein connections

See the files ‘test edge.xlsx’ and ‘test node.xlsx’ for examples of upload formats

See detailed instructions at http://www.slideshare.net/dgrapov/demonstration-of-network-mapping
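For the bonus task, term-to-protein connections can be derived from the protein list DAVID reports for each term. A minimal sketch, assuming a DAVID-style `Genes` column with comma-separated identifiers; the terms, protein IDs and the `source`/`target`/`interaction`/`type` column names are hypothetical and not necessarily the layout of ‘test edge.xlsx’ or ‘test node.xlsx’.

```r
# Hypothetical DAVID-style rows: each term lists its annotated proteins
david <- data.frame(
  Term  = c("GO:0006955~immune response", "GO:0006952~defense response"),
  Genes = c("PROT_A, PROT_B, PROT_C", "PROT_B, PROT_D"),
  stringsAsFactors = FALSE
)

# Split the protein strings and expand to one row per term-protein pair
proteins <- strsplit(david$Genes, ",\\s*")
edge_table <- data.frame(
  source      = rep(sub("~.*$", "", david$Term), lengths(proteins)),
  target      = unlist(proteins),
  interaction = "annotated_to",
  stringsAsFactors = FALSE
)

# Node table: label each node as a GO term or a protein for styling
go_nodes   <- unique(edge_table$source)
prot_nodes <- unique(edge_table$target)
node_table <- data.frame(
  id   = c(go_nodes, prot_nodes),
  type = c(rep("GO term", length(go_nodes)), rep("protein", length(prot_nodes))),
  stringsAsFactors = FALSE
)
print(edge_table)
print(node_table)
```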

Page 30

See more statistical and multivariate analysis examples at http://imdevsoftware.wordpress.com/tutorials/

Questions?


This research was supported in part by NIH 1 U24 DK097154