Approaches for Integration of multiple ‘Omic’ Data Dmitry Grapov, PhD
May 10, 2015
Examples
Nature Reviews Genetics 15, 107–120 (2014) doi:10.1038/nrg3643
FBA = flux-balance analysis
• Topological enrichment can give a broad overview of impacted genes, proteins and metabolites
• Changes in biochemical domains corroborated by multi-Omic data sets can be used to identify robust candidates responsible for phenotypic variation between comparisons
• Gene-gene, protein-protein or gene-protein interaction networks can be used to deconvolute ambiguous metabolic pathways
Common Approaches
Nature Reviews Genetics 15, 107–120 (2014) doi:10.1038/nrg3643
Biochemical Domain Enrichment Analysis
• Genes/Proteins: DAVID, AmiGO, etc. (GO terms)
• Genes/Proteins + Metabolites: IMPaLA, Integrated Molecular Pathway Level Analysis (http://impala.molgen.mpg.de/) (pathways)
1. Classify all species into domains (e.g. biological process, pathway, etc.)
2. Calculate probability of observing changes in species by chance
IMPaLA: Gene + Metabolite pathway enrichment
Challenges:
• Removal of redundant information
• Preference of specific vs. generic pathways
• Visualization of gene + metabolite + pathway relationships
Determining significance of the enrichment: Hypergeometric Test
How to calculate statistics to determine enrichment?
hit.num = 51    # number of significantly changed pathway metabolites
set.num = 1455  # number of metabolites in pathway
full = 3358     # all possible metabolites in organism
q.size = 72     # number of significantly changed metabolites
phyper(hit.num-1, set.num, full-set.num, q.size, lower.tail=FALSE)
# = 1.717553e-06
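The same test can be repeated over many pathways and corrected for multiple testing. The following is a minimal sketch in base R, not the IMPaLA implementation: the pathway membership, background list, changed-metabolite list and the helper `enrich_one` are all hypothetical, and Benjamini-Hochberg adjustment is an assumed choice.

```r
# Hypothetical inputs (illustration only, not real pathway data)
pathways <- list(
  pathway_A = c("m1", "m2", "m3", "m4"),
  pathway_B = c("m3", "m5", "m6"),
  pathway_C = c("m7", "m8", "m9", "m10")
)
background <- paste0("m", 1:20)        # all possible metabolites (assumed)
changed <- c("m1", "m2", "m3", "m7")   # significantly changed metabolites (assumed)

# Hypergeometric test for one pathway, using the same argument layout as above
enrich_one <- function(members, changed, background) {
  members <- intersect(members, background)
  hit.num <- length(intersect(members, changed))    # changed metabolites in pathway
  set.num <- length(members)                        # metabolites in pathway
  full <- length(background)                        # all possible metabolites
  q.size <- length(intersect(changed, background))  # all changed metabolites
  p <- phyper(hit.num - 1, set.num, full - set.num, q.size, lower.tail = FALSE)
  c(hits = hit.num, size = set.num, p.value = p)
}

# One row per pathway, then adjust p-values for multiple testing (BH is an assumption)
res <- as.data.frame(t(sapply(pathways, enrich_one,
                              changed = changed, background = background)))
res$p.adjusted <- p.adjust(res$p.value, method = "BH")
res <- res[order(res$p.value), ]
print(res)
```

Subtracting 1 from `hit.num` together with `lower.tail = FALSE` gives the probability of observing at least `hit.num` hits, which is the quantity of interest for overrepresentation.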
GO Enrichment analysis: Hierarchy of Redundancy (parents)
• GO is an ontology wherein enrichment is often shared by children and parents.
• Difficult to co-visualize term hierarchy and gene to term mapping
Enrichment networks: Removing the Hierarchy of Redundancy
Workflow:
1. If two nodes share all genes, drop the less enriched one (higher p-value)
2. Filter terms based on enrichment
3. Display term to gene/protein relationships as edges in a network
4. Map direction of change in genes/proteins to network node attributes
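The four workflow steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the author's implementation: the term-to-gene mapping, p-values, fold changes and the 0.05 cutoff are all hypothetical.

```r
# Hypothetical enrichment results (illustration only)
term.genes <- list(
  "GO:parent" = c("geneA", "geneB"),
  "GO:child"  = c("geneA", "geneB"),
  "GO:other"  = c("geneB", "geneC"),
  "GO:weak"   = c("geneD")
)
term.p <- c("GO:parent" = 0.01, "GO:child" = 0.001,
            "GO:other" = 0.02, "GO:weak" = 0.5)
gene.fc <- c(geneA = 1.8, geneB = -0.7, geneC = 0.4, geneD = 0.1)  # assumed log2 fold changes

# Step 1: among terms with identical gene sets, keep only the lowest p-value
gene.key <- sapply(term.genes, function(g) paste(sort(unique(g)), collapse = ";"))
ordered.terms <- names(sort(term.p))
keep <- ordered.terms[!duplicated(gene.key[ordered.terms])]

# Step 2: filter terms on enrichment (cutoff is an assumption)
keep <- keep[term.p[keep] <= 0.05]

# Step 3: term-to-gene relationships as a network edge list
edges <- do.call(rbind, lapply(keep, function(term) {
  data.frame(source = term, target = term.genes[[term]], stringsAsFactors = FALSE)
}))

# Step 4: node table with p-value (terms) and direction of change (genes)
genes <- unique(edges$target)
nodes <- data.frame(
  id = c(keep, genes),
  type = c(rep("term", length(keep)), rep("gene", length(genes))),
  stringsAsFactors = FALSE
)
nodes$p.value <- unname(term.p[nodes$id])  # NA for gene nodes
nodes$direction <- unname(ifelse(gene.fc[nodes$id] > 0, "up", "down"))  # NA for term nodes

print(edges)
print(nodes)
```

The `edges` and `nodes` tables can be written to file and loaded into a network viewer, with `direction` and `p.value` mapped to node color or size.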
Enrichment Network: Mapping of parents through children
GO enrichment network displays:
• Gene names associated with each overrepresented term
• Fold change in protein expression between two groups (can be extended to k > 2 groups)
• Can display enrichment p-value for each term
• Can incorporate metabolites as children of genes
Empirical Networks
• Correlation-based networks (CN) (simple, tendency to hairball)
• GGM or partial-correlation-based networks (advanced, preference of direct over indirect relationships)
• *Robustness increases with sample size
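The difference between the two network types can be illustrated with a small simulation. This is a minimal sketch in base R: the simulated chain m1 -> m2 -> m3, the 0.3 cutoff and the helper `to_edges` are assumptions for illustration, and partial correlations are taken from the inverse covariance matrix, which requires more samples than variables.

```r
set.seed(1)
n <- 500
# Simulated chain m1 -> m2 -> m3: m1 and m3 are related only through m2
m1 <- rnorm(n)
m2 <- m1 + rnorm(n, sd = 0.5)
m3 <- m2 + rnorm(n, sd = 0.5)
x <- cbind(m1 = m1, m2 = m2, m3 = m3)

# Correlation network: pairwise Pearson correlations
cor.mat <- cor(x)

# GGM: partial correlations from the scaled, sign-flipped inverse covariance matrix
prec <- solve(cov(x))
pcor.mat <- -cov2cor(prec)
diag(pcor.mat) <- 1

# Convert a symmetric matrix to an edge list, keeping |weight| >= cutoff
to_edges <- function(m, cutoff) {
  idx <- which(upper.tri(m) & abs(m) >= cutoff, arr.ind = TRUE)
  data.frame(source = rownames(m)[idx[, "row"]],
             target = colnames(m)[idx[, "col"]],
             weight = m[idx], stringsAsFactors = FALSE)
}

cn.edges <- to_edges(cor.mat, 0.3)    # includes the indirect m1-m3 edge
ggm.edges <- to_edges(pcor.mat, 0.3)  # keeps only the direct m1-m2 and m2-m3 edges
print(cn.edges)
print(ggm.edges)
```

In this toy case the correlation network connects every pair, while the partial correlation network drops the m1-m3 edge because that association is explained by m2.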
doi:10.1007/978-1-4614-1689-0_17
Topological Enrichment Networks
http://pubchem.ncbi.nlm.nih.gov//score_matrix/score_matrix.cgi
http://www.genome.jp/dbget-bin/www_bget?rn:R00975
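The two links above point to a structural similarity service and a biochemical reaction entry. One way such information could be combined into a single metabolite network is sketched below in base R. Everything here is hypothetical: the binary fingerprints, the reaction pair, and the 0.7 Tanimoto cutoff are assumptions for illustration, and no PubChem or KEGG service is queried.

```r
# Hypothetical binary substructure fingerprints (rows = metabolites)
fp <- rbind(
  met1 = c(1, 1, 1, 0, 0, 0),
  met2 = c(1, 1, 1, 1, 0, 0),
  met3 = c(0, 0, 0, 0, 1, 1)
)

# Tanimoto similarity: shared bits / (bits in A + bits in B - shared bits)
shared <- fp %*% t(fp)
bits <- rowSums(fp)
tanimoto <- shared / (outer(bits, bits, "+") - shared)

# Structural similarity edges above an assumed cutoff
idx <- which(upper.tri(tanimoto) & tanimoto >= 0.7, arr.ind = TRUE)
structural <- data.frame(source = rownames(tanimoto)[idx[, "row"]],
                         target = colnames(tanimoto)[idx[, "col"]],
                         type = rep("structural", nrow(idx)),
                         stringsAsFactors = FALSE)

# Hypothetical substrate-product pair, as would be taken from a reaction database
biochemical <- data.frame(source = "met2", target = "met3",
                          type = "reaction", stringsAsFactors = FALSE)

# Combined network edge list with the edge type kept as an attribute
net <- rbind(biochemical, structural)
print(net)
```

Keeping the edge `type` column allows reaction and similarity edges to be drawn differently in the final network.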
Topological Enrichment Networks: genes + proteins + metabolites
MetaMapR: Biological network generator
https://github.com/dgrapov/MetaMapR
metabolomics.ucdavis.edu
This research was supported in part by NIH 1 U24 DK097154