From Metagenomic Sample to Useful Visual Anna Shcherbina 01/10/2013 1 Anna Shcherbina Bioinformatics Challenge Day 02/02/2013 From Metagenomic Sample to Useful Visual This work is sponsored by the Defense Threat Reduction Agency under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations, recommendations and conclusions are those of the authors and are not necessarily endorsed by the United States Government. Distribution Statement A: Approved for public release; distribution is unlimited.

Dec 28, 2015

Page 1

Anna Shcherbina

Bioinformatics Challenge Day

02/02/2013

From Metagenomic Sample to Useful Visual

This work is sponsored by the Defense Threat Reduction Agency under Air Force Contract #FA8721-05-C-0002. Opinions, interpretations, recommendations and conclusions are those of the authors and are not necessarily endorsed by the United States Government.

Distribution Statement A: Approved for public release; distribution is unlimited.

Page 2

The Opportunity

•NGS instruments have recently given us the ability to characterize the microbiomes that we live in and that live in us.

•We can get a step closer to this goal by creating a visualization program that facilitates manual data curation by a human.

Page 3

Your Mission

Invent novel visualization approaches to represent metagenomic data.

Subgoals:

• Pick out anomalies within a given dataset.

• Generate time series representation of multiple datasets.

• Compress data efficiently to allow visualization of huge datasets.

Page 4

The Data (I)

Metagenomic datasets (FASTQ format) from clinical and environmental samples.

• Metagenome of the human oral cavity under healthy and diseased conditions, with a focus on supragingival dental plaque and cavities.
– “oral_healthy” and “oral_diseased” datasets
– Roche 454

• Nose/throat swab from Nicaraguan child with acute respiratory illness
– “nicaragua” dataset
– Illumina

Page 5

The Data (II)

• Skin surface from the palm of a human hand
– “palm” dataset
– Roche 454

• Human abscess sample of unknown etiology
– “abscess” dataset
– Illumina

• Cultivated corn soil metagenome
– “soil” dataset
– Illumina
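The datasets are described as FASTQ, while the processing pipeline starts from FASTA reads. A minimal sketch of that conversion, assuming plain four-line FASTQ records with no wrapped sequences. The function name `fastq_to_fasta` and the two example reads are hypothetical and are not taken from the challenge data.

```python
import io

def fastq_to_fasta(fastq_handle, fasta_handle):
    """Convert four-line FASTQ records to FASTA; return the number of reads written.

    Assumption: every record is exactly four lines
    (header, sequence, '+' separator, quality string).
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of input
        if not header.strip():
            continue  # tolerate stray blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"Malformed FASTQ header: {header!r}")
        sequence = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality string, not kept in FASTA
        fasta_handle.write(">" + header[1:].rstrip("\n") + "\n" + sequence + "\n")
        count += 1
    return count

# Hypothetical two-read example.
example_fastq = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nIIII\n"
out = io.StringIO()
n_reads = fastq_to_fasta(io.StringIO(example_fastq), out)
fasta_text = out.getvalue()
```

Quality scores are discarded here; a real workflow may want to filter on them before conversion.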

Page 6

Our Processing Pipeline

Raw FASTA reads

→ BLAST against virus, bacteria, and archaea databases (from GenBank)

→ Data Processing

• Parsed CSV summary of BLAST hits

• BLAST hits sorted by species, FASTA format

Other BLAST parsers

Data is available from each stage of the processing pipeline
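As one way to work with the parsed CSV summary, the sketch below counts BLAST hits per organism. The column name "Target Organism" is an assumption borrowed from the field labels of the parsed hit example; the real CSV header may differ, and the three rows are made up for illustration.

```python
import csv
import io
from collections import Counter

def count_hits_by_organism(csv_handle, organism_column="Target Organism"):
    """Count BLAST hits per organism from a CSV summary that has a header row."""
    reader = csv.DictReader(csv_handle)
    counts = Counter()
    for row in reader:
        counts[row[organism_column].strip()] += 1
    return counts

# Hypothetical three-hit summary; real files will carry more columns.
example_csv = io.StringIO(
    "Query Name,Target Organism\n"
    "read_a,Neisseria subflava\n"
    "read_b,Neisseria subflava\n"
    "read_c,Streptococcus mitis\n"
)
hit_counts = count_hits_by_organism(example_csv)
top = hit_counts.most_common(1)  # most frequently hit organism
```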

Page 7

Parsed BLAST File Example for a Single Hit

Query Name: S62.141238_159200
Query Strand: +
Query Start: 1
Query End: 232
Query Organism: Neisseria meningitidis
Query Taxonomy: Bacteria; Proteobacteria; Betaproteobacteria;
Identities: 232
Percent: 100
Number Gaps: 0
Number Characters: 0
Target Name: GU561418
Target Strand: -
Target Start: 47
Target End: 278
Target Organism: Neisseria subflava
Target Taxonomy: Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;Neisseriaceae;Neisseria.
Query Sequence: CTGGGCCGTGTCTCAGTCCCAGTGTGGC
Target Sequence: CTGGGCCGTGTCTCAGTCCCAGTGTGGC
Analysis Program: BLASTN
Database: bacteria.gdna
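The fields of the example hit can be checked against each other. The sketch below holds a subset of them in a small record type and recomputes the aligned span and percent identity. The class `BlastHit` and its method names are hypothetical. Computing percent identity over the query span is an assumption that is only reasonable here because the hit reports zero gaps.

```python
from dataclasses import dataclass

@dataclass
class BlastHit:
    """Subset of the parsed BLAST fields shown in the example hit."""
    query_name: str
    query_start: int
    query_end: int
    identities: int
    target_name: str
    target_start: int
    target_end: int
    target_taxonomy: str

    def query_span(self):
        # Inclusive coordinates: 1..232 covers 232 bases.
        return abs(self.query_end - self.query_start) + 1

    def target_span(self):
        return abs(self.target_end - self.target_start) + 1

    def percent_identity(self):
        # Assumption: ungapped alignment, so alignment length equals query span.
        return 100.0 * self.identities / self.query_span()

    def target_lineage(self):
        # Split "A;B;C." into ["A", "B", "C"], dropping the trailing period.
        parts = self.target_taxonomy.rstrip(".").split(";")
        return [p.strip() for p in parts if p.strip()]

# Values copied from the example hit.
hit = BlastHit(
    query_name="S62.141238_159200",
    query_start=1,
    query_end=232,
    identities=232,
    target_name="GU561418",
    target_start=47,
    target_end=278,
    target_taxonomy="Bacteria;Proteobacteria;Betaproteobacteria;"
                    "Neisseriales;Neisseriaceae;Neisseria.",
)
lineage = hit.target_lineage()
```

With these values the query and target spans are both 232 bases, which agrees with the reported 232 identities and 100 percent.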

Page 8

Your Open-Source Toolkit

•MEGAN4

•IMG/M

•KRONA (included with PhymmBL)

•MG-RAST

•METAREP

•Mothur

•Feel free to use any additional tools you think are useful.

Page 9

MEGAN4: MEtaGenome ANalyzer

•A simple lowest common ancestor algorithm assigns reads to taxa.

•Taxonomic level reflects the degree of conservation of a sequence.

•Dissects large datasets without assembly or the targeting of specific phylogenetic markers.

•Graphical and statistical output for comparing different datasets.
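The lowest common ancestor idea can be sketched as follows: a read that hits several taxa is assigned to the deepest taxon shared by all of the hit lineages. This is a simplified illustration only; MEGAN's own implementation applies additional score filtering that is not modeled here. The function names and the three example lineages are hypothetical.

```python
def parse_lineage(text):
    """Split a semicolon-separated taxonomy string into a list of taxa."""
    return [t.strip() for t in text.rstrip(".").split(";") if t.strip()]

def lowest_common_ancestor(lineages):
    """Return the longest root-to-leaf prefix shared by all lineages."""
    if not lineages:
        return []
    shared = list(lineages[0])
    for lineage in lineages[1:]:
        depth = 0
        for a, b in zip(shared, lineage):
            if a != b:
                break
            depth += 1
        shared = shared[:depth]  # keep only the part still common to all
    return shared

neisseria_a = parse_lineage(
    "Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;"
    "Neisseriaceae;Neisseria;Neisseria subflava"
)
neisseria_b = parse_lineage(
    "Bacteria;Proteobacteria;Betaproteobacteria;Neisseriales;"
    "Neisseriaceae;Neisseria;Neisseria meningitidis"
)
burkholderia = parse_lineage(
    "Bacteria;Proteobacteria;Betaproteobacteria;Burkholderiales;"
    "Burkholderiaceae;Burkholderia"
)

# Hits within one genus resolve to that genus.
genus_level = lowest_common_ancestor([neisseria_a, neisseria_b])
# Adding a hit from another order pushes the assignment up to the class.
class_level = lowest_common_ancestor([neisseria_a, neisseria_b, burkholderia])
```

The second call shows why the assigned taxonomic level reflects sequence conservation: a read matching many distant taxa ends up at a higher rank.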

Page 10

MEGAN4: MEtaGenome ANalyzer

[Figure panels: Oral Diseased Bacteria; Oral Healthy Bacteria; Oral Diseased Virus; Oral Healthy Virus]

Page 11

MEGAN4: MEtaGenome ANalyzer

[Figure panels: Oral healthy vs. oral diseased, bacteria; Oral healthy vs. oral diseased, virus]

Page 12

IMG/M: Integrated Microbial Genomes with Microbiome Samples

• Web interface: http://img.jgi.doe.gov/cgi-bin/m/main.cgi

source: http://img.jgi.doe.gov/m/doc/about_index.html

Page 13

IMG/M Phylogenetic Distribution of Genes Based on Distribution of BLAST Hits

source: http://img.jgi.doe.gov/m/doc/about_index.html

Page 14

IMG/M Abundance Profile Overview

source: http://img.jgi.doe.gov/m/doc/about_index.html

Page 15

KRONA

• KRONA allows hierarchical data to be explored with zoomable pie charts.
– Excel template or KRONA tools.
– Support for several bioinformatics tools and raw data formats.

source: http://sourceforge.net/p/krona/home/krona/
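One of the simpler raw inputs for hierarchical charting is a tab-delimited text file in which each line holds a count followed by the lineage, from the highest level down. To my understanding this is the layout accepted by the `ktImportText` script in KronaTools, but check the Krona documentation before relying on it. The function name `to_krona_text` and the counts below are hypothetical.

```python
from collections import Counter

def to_krona_text(lineage_counts):
    """Render {lineage tuple: count} as tab-delimited 'count<TAB>level1<TAB>...' lines."""
    lines = []
    for lineage, count in sorted(lineage_counts.items()):
        lines.append("\t".join([str(count), *lineage]))
    return "".join(line + "\n" for line in lines)

# Made-up counts for illustration only.
counts = Counter({
    ("Bacteria", "Proteobacteria", "Betaproteobacteria"): 12,
    ("Bacteria", "Firmicutes", "Bacilli"): 5,
})
krona_text = to_krona_text(counts)
```

Writing `krona_text` to a file gives something that can be handed to a charting tool; lineages are sorted only to make the output deterministic.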

Page 16

MG-RAST

Oral Diseased

source: http://blog.metagenomics.anl.gov/

Page 17

MG-RAST

Oral Healthy

source: http://blog.metagenomics.anl.gov/

Page 18

MG-RAST

Oral Diseased Oral Healthy

source: http://blog.metagenomics.anl.gov/

Page 19

JCVI Metagenomics Reports (METAREP)

• A Web 2.0 application to analyze and compare annotated metagenomic datasets.

• Compare absolute and relative counts of multiple datasets at various functional and taxonomic levels.

• Statistical tests, multidimensional scaling, heatmap and hierarchical clustering plots.

source: http://blogs.jcvi.org/tag/metarep/

Heatmap Plot

Hierarchical Clustering Plot

METASTAT Results
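Comparing relative counts of two datasets at a chosen taxonomic level can be sketched as below. This is a generic illustration and not METAREP's implementation. The function names are hypothetical, and the counts for `sample_a` and `sample_b` are invented and say nothing about the challenge datasets.

```python
from collections import Counter

def relative_abundance(lineage_counts, depth):
    """Collapse lineage counts to the first `depth` levels and normalise to fractions."""
    collapsed = Counter()
    for lineage, count in lineage_counts.items():
        collapsed[tuple(lineage[:depth])] += count
    total = sum(collapsed.values())
    if total == 0:
        return {}
    return {taxon: c / total for taxon, c in collapsed.items()}

def abundance_shift(counts_a, counts_b, depth):
    """Per-taxon change in relative abundance going from dataset A to dataset B."""
    rel_a = relative_abundance(counts_a, depth)
    rel_b = relative_abundance(counts_b, depth)
    taxa = set(rel_a) | set(rel_b)
    return {t: rel_b.get(t, 0.0) - rel_a.get(t, 0.0) for t in taxa}

# Invented counts for illustration only.
sample_a = {
    ("Bacteria", "Firmicutes", "Bacilli"): 30,
    ("Bacteria", "Firmicutes", "Clostridia"): 10,
    ("Bacteria", "Proteobacteria", "Betaproteobacteria"): 60,
}
sample_b = {
    ("Bacteria", "Firmicutes", "Bacilli"): 70,
    ("Bacteria", "Proteobacteria", "Betaproteobacteria"): 30,
}

# depth=2 compares at the second level of the lineage (phylum in this example).
shift = abundance_shift(sample_a, sample_b, depth=2)
```

Normalising before subtracting keeps datasets with different read totals comparable; the absolute counts remain available in the inputs.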

Page 20

Mothur: 16S rRNA Sequence Analysis

• A single platform for sequence alignment, pairwise distance calculation, distance matrix analysis.

• Venn diagrams, community trees, heat maps, sample-based rarefaction curves.
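To make the pairwise distance and distance matrix steps concrete, the sketch below computes an uncorrected distance (fraction of mismatching positions) between already aligned sequences. It is a simplified stand-in and not mothur's calculator, which offers its own options for handling gaps. The function names and the three short sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing positions between two aligned sequences.

    Assumption: positions where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("Sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("No comparable positions")
    return mismatches / compared

def distance_matrix(aligned):
    """Return {(name_i, name_j): distance} for every ordered pair of sequences."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}

# Hypothetical aligned fragments.
aligned = {
    "a": "ACGTACGT",
    "b": "ACGTACGA",
    "c": "ACG-ACGA",
}
matrix = distance_matrix(aligned)
```

A matrix like this is the usual input for clustering sequences into groups, which downstream plots such as community trees build on.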