Attribution-NonCommercial-ShareAlike CC BY-NC-SA The Ocean Sampling Day's Metagenome Analysis: Standards, Pipelines and First Results Microbial Genomics and Bioinformatics Research Group Renzo Kottmann [email protected] Hinxton, 2015-11-18

The Ocean Sampling Day's Metagenome Analysis: Standards, Pipelines and First Results

Feb 19, 2017

Renzo Kottmann
Page 1: The Ocean Sampling Day's Metagenome Analysis: Standards, Pipelines and First Results

Attribution-NonCommercial-ShareAlike CC BY-NC-SA

The Ocean Sampling Day's Metagenome Analysis:

Standards, Pipelines and First Results

Microbial Genomics and Bioinformatics Research Group

Renzo Kottmann

[email protected]

Hinxton, 2015-11-18

MAX PLANCK INSTITUTE FOR MARINE MICROBIOLOGY

Investigation of the diversity, structure and distribution of microbial populations through approaches based on nucleic acid analyses

Junior Group of Molecular Ecology

Dr. Rudolf Amann

Page 2

Common Bioinformatics View on Metagenomics:

Give me big metagenomic sequence data, and I will assign gene functions

Page 3

Data-Centric View on Metagenomics

Sketch from Martin Fowler: http://martinfowler.com/articles/bigData/

Page 4

OSD Sampling

(Diagram labels: Scientists; MASAME/GUSTA ME)

Page 5

A global mega-sequencing campaign

June Solstice 2014/15

Page 6

Page 7

MyOSD Citizen Science Project

www.my-osd.org

Page 8

Page 9

Key facts

• More than 200 sites participated in OSD
• 4 OSD pilot events (Jun/Dec 2012 + Jun/Dec 2013)
• 2 OSD main events (June 2014 and June 2015)

Participating OSD sites ranged from subtropical waters in Hawaii to extreme environments such as the Fram Strait in the Arctic Ocean.

Each main year:
• ~150 metagenomes
• ~200 amplicon (16S/18S) samples

2 citizen science campaigns:
• MyOSD 2014 and 2015
• ~190 amplicon (16S/18S) samples

Page 10

Standards & Data Harvesting

(Diagram labels: Scientists; MASAME/GUSTA ME)

Page 11

Standard Sampling Protocols: OSD Handbook

http://www.microb3.eu/sites/default/files/osd/OSD_Handbook_v2.0.pdf

(P. ten Hoopen et al.)

Page 12

Data Standards: M2B3

Core:
• Absolute minimum for Micro B3 data sets: time and place, event/sample ID, temperature, salinity, molecular sampling protocol

Recommended:
• Remainder of mandatory fields for existing standards and others

Optional:
• Useful fields from existing standards

P. ten Hoopen et al., Marine microbial biodiversity, bioinformatics and biotechnology (M2B3) data reporting and service standards
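The three M2B3 tiers above lend themselves to an automated completeness check when contextual data are harvested. The minimal Python sketch below checks a sample record against the core tier only. The field names (`sampling_time`, `latitude`, `longitude`, `event_sample_id`, and so on) are hypothetical stand-ins, not the identifiers defined in the M2B3 specification, and "place" is assumed to be recorded as a latitude/longitude pair.

```python
# Hypothetical field names standing in for the M2B3 "core" tier:
# time and place, event/sample ID, temperature, salinity, molecular sampling protocol.
# "Place" is assumed to be stored as latitude + longitude.
M2B3_CORE_FIELDS = (
    "sampling_time",
    "latitude",
    "longitude",
    "event_sample_id",
    "temperature",
    "salinity",
    "molecular_sampling_protocol",
)


def missing_core_fields(record: dict) -> list:
    """Return the core fields that are absent or empty in one sample record."""
    return [field for field in M2B3_CORE_FIELDS if record.get(field) in (None, "")]


if __name__ == "__main__":
    # Made-up example record: the molecular sampling protocol was not reported.
    sample = {
        "sampling_time": "2014-06-21T09:30:00Z",
        "latitude": 54.18,
        "longitude": 7.90,
        "event_sample_id": "EXAMPLE_001",
        "temperature": 16.5,
        "salinity": 31.2,
    }
    print(missing_core_fields(sample))  # prints ['molecular_sampling_protocol']
```

Recommended and optional fields could be checked the same way with their own tuples, reported as warnings rather than errors; that split is a design suggestion, not something M2B3 prescribes.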

Page 13

Environmental Data

• Air temperature
• Water temperature
• Salinity
• Phosphate
• Secchi depth
• …

Page 14

Information System

Curation

Contextual Data

Page 15

MicroB3 Summer School, May-June 2014

Logsheets

Page 16

OSD Citizen App

https://itunes.apple.com/us/app/osd-citizen/id834353532?mt=8

https://play.google.com/store/apps/details?id=com.iw.esa

Early, consistent, digital acquisition of environmental data

Page 17

Sample Registration

Page 18

Logistics

Page 19

More than 1000 filters (OSD2014)

High-throughput Visualization of Big Diversity

Page 20

Ocean Sampling Day: Data Flow Overview

(Diagram labels: sampling; samples with barcodes; metadata; contextual data; sequencing centre; sequencing data; contextual data & metadata; sequencing data & metadata)

http://www.oceansamplingday.org
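In this data flow, the sample barcode is what ties the contextual data and metadata recorded at sampling to the sequencing data returned by the sequencing centre. A minimal Python sketch of that join, assuming each contextual record is keyed by barcode and each sequencing record carries a `barcode` field (both structures are hypothetical, not the schema of the OSD information system):

```python
def link_by_barcode(contextual: dict, sequencing: list):
    """Attach contextual data/metadata to each sequencing record via its sample barcode.

    contextual: barcode -> dict of contextual data and metadata (from sampling)
    sequencing: list of dicts, each with a "barcode" key (from the sequencing centre)
    Returns (linked, orphans): merged records, and sequencing records whose
    barcode has no contextual entry (these need curation before archiving).
    """
    linked, orphans = [], []
    for run in sequencing:
        context = contextual.get(run["barcode"])
        if context is None:
            orphans.append(run)
        else:
            # Sequencing fields win if the same key appears in both records.
            linked.append({**context, **run})
    return linked, orphans


if __name__ == "__main__":
    ctx = {"BC001": {"site": "example site", "water_temperature": 15.0}}
    runs = [{"barcode": "BC001", "run_id": "R1"}, {"barcode": "BC999", "run_id": "R2"}]
    linked, orphans = link_by_barcode(ctx, runs)
    print(len(linked), len(orphans))  # prints 1 1
```

Returning the orphans instead of silently dropping them reflects the curation step in the overview: unmatched barcodes are exactly the records a curator needs to look at.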

Page 21

Bioinformatics Pipelines

(Diagram labels: Scientists; MASAME/GUSTA ME)

Page 22

Bioinformatics Pipelines

Page 23

Bioinformatics Pipelines

Pre-Processing

Page 24

Sequence Data Pre-Processing

Definition: Filter original raw sequence data
• based on well-defined sequence quality criteria

Goal:
• provide all OSD participants (and the whole scientific community) with a single, quality-controlled dataset
• ensure comparability and repeatability of analysis results

Scope:
• covers both amplicon (i.e. 16S/18S rDNA) and shotgun (i.e. metagenome) data sequenced with …

Documentation: http://tinyurl.com/osd-pre-processing
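The general shape of filtering raw reads by well-defined quality criteria can be sketched in a few lines of Python. The actual OSD criteria are in the pre-processing documentation; the thresholds here (Phred 20, minimum length 50 bp) and the simple 3'-end trimming rule are illustrative assumptions, not the OSD settings.

```python
from typing import List, Optional, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (Sanger / Phred+33 encoding by default)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, qual: str, min_q: int) -> Tuple[str, str]:
    """Remove bases from the 3' end while their Phred score is below min_q."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def filter_read(
    seq: str, qual: str, min_q: int = 20, min_len: int = 50
) -> Optional[Tuple[str, str]]:
    """Trim a read and discard it if it is shorter than min_len afterwards.

    min_q=20 and min_len=50 are illustrative defaults, not the OSD criteria.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    seq, qual = trim_3prime(seq, qual, min_q)
    return (seq, qual) if len(seq) >= min_len else None
```

Applying `filter_read` to every record of a run and keeping the non-`None` results yields one filtered dataset; using the same fixed thresholds for every sample is what makes downstream results comparable across sites.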

Page 25

Pipeline View

(Diagram stages: raw reads; quality evaluation; pre-processing; read-level analysis; assembly-level analysis)

Page 26

Bioinformatics Pipelines

Metagenomics

Amplicons 16S/18S

Page 27

Bioinformatics Pipelines

(Diagram labels: Scientists; MASAME/GUSTA ME)

Page 28

Open Collaboration

• All data publicly available before analysis
• Open Analysis Group working on more than 40 scientific questions/topics

Analysis grouped into 3 topic areas:

1. Diversity (using OTU-based metrics and alternatives such as MED, Unipept, etc.)

2. Insights from metagenomes into metabolic functions (with a focus on human impact) and their role in ecosystems

3. Towards an understanding of broad-scale ecological patterns
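For the diversity topic area, the simplest OTU-based metrics are observed richness S and the Shannon index H' = -sum(p_i ln p_i), where p_i is the relative abundance of OTU i, with Pielou's evenness J = H' / ln S derived from both. A minimal Python sketch, assuming per-sample OTU read counts are already available as a mapping (the OTU names in the example are made up); it does not cover alternatives such as MED.

```python
import math
from typing import Mapping


def observed_richness(otu_counts: Mapping[str, int]) -> int:
    """Number of OTUs observed with at least one read."""
    return sum(1 for count in otu_counts.values() if count > 0)


def shannon_index(otu_counts: Mapping[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i), skipping zero counts."""
    total = sum(otu_counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for count in otu_counts.values():
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h


def pielou_evenness(otu_counts: Mapping[str, int]) -> float:
    """Pielou's evenness J = H' / ln(S); defined here as 0 when S <= 1."""
    s = observed_richness(otu_counts)
    return shannon_index(otu_counts) / math.log(s) if s > 1 else 0.0


if __name__ == "__main__":
    # Made-up OTU table for one sample.
    sample = {"OTU_1": 120, "OTU_2": 40, "OTU_3": 40, "OTU_4": 0}
    print(observed_richness(sample), round(shannon_index(sample), 3))
```

Both metrics are sensitive to sequencing depth, so in practice samples are usually rarefied or otherwise normalised before they are compared.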

Page 29

Some Preliminary Results

By Antonio Fernandez Guerra (MPI Bremen, Oxford University)

Page 30

OSD in the context of other mega-sequencing campaigns

(Figure labels: Malaspina, GOS, TARA, OSD)

Page 31

Most complete view of the Ocean Microbiome (Sunagawa et al., 2015)

• First sampling 2013
• 243 samples from 68 locations
• 7.2 Tbp

Page 32

Ocean Microbial Reference Gene Catalog (OM-RGC)
• Assembled reads
• Annotated
• Clustered to generate a non-redundant set of reference genes

Sunagawa et al., http://ocean-microbiome.embl.de/companion.html

Page 33

Adding OSD to the Ocean Microbial Reference Gene Catalog

(Figure labels: OM-RGC incremental clustering (cd-hit-2d): ~7 M; OSD genes present in OM-RGC; after clustering at 95%: ~3.6 M; OSD genes NOT present in OM-RGC; ~4.5 M)
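cd-hit-2d compares two sequence sets and reports the sequences of the second set that have no hit in the first above a given identity threshold; the remaining novel sequences can then be clustered among themselves, here at 95%. The Python sketch below mimics that two-step logic on toy sequences. Its `identity` function is a crude stand-in (residues in `difflib.SequenceMatcher` matching blocks divided by the length of the shorter sequence), not CD-HIT's short-word filter and alignment, so it only illustrates the control flow and would be far too slow for millions of genes.

```python
from difflib import SequenceMatcher


def identity(a: str, b: str) -> float:
    """Crude identity proxy: matched residues / length of the shorter sequence.

    NOT CD-HIT's alignment-based identity; a stand-in for illustration only.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def novel_vs_reference(query, reference, threshold=0.95):
    """cd-hit-2d-like step: keep query sequences with no reference hit >= threshold."""
    return [q for q in query if not any(identity(q, r) >= threshold for r in reference)]


def greedy_cluster(seqs, threshold=0.95):
    """CD-HIT-like greedy clustering: longer sequences become representatives first."""
    representatives = []
    clusters = {}
    for s in sorted(seqs, key=len, reverse=True):
        for rep in representatives:
            if identity(s, rep) >= threshold:
                clusters[rep].append(s)
                break
        else:  # no representative matched: s starts a new cluster
            representatives.append(s)
            clusters[s] = [s]
    return clusters


if __name__ == "__main__":
    ref = ["ACDEFGHIKLMNPQRSTVWY"]  # toy "catalog"
    osd = ["ACDEFGHIKLMNPQRSTVWY", "W" * 20 + "H" * 20, "W" * 20 + "H" * 19 + "C", "P" * 40]
    novel = novel_vs_reference(osd, ref)
    print(len(novel), len(greedy_cluster(novel)))
```

Processing longer sequences first, as CD-HIT does, means each cluster is represented by its longest member.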

Page 34

(Figure labels: Malaspina, GOS, TARA, OSD)

OSD is 75% "Coastal Sampling Day"

• ~50% of OSD genes NOT present in OM-RGC
• Under-sampling of coasts, where 30-40% of the human population lives

Page 35

The Ocean Resistome

(slides left out; please contact the author for further information)

Page 36

The dark side of the metagenomes

Unravelling the unknown in the metagenomic protein universe

Page 37

Knowns and unknowns in the metagenomic protein universe

• Known: sequences with significant similarity to known protein domains (Pfam)
• Unknown: all the sequences without an assigned function

Page 38

Knowns and unknowns in the metagenomic protein universe

• Known: sequences with significant similarity to known protein domains (Pfam)
• Unknown: all the sequences without an assigned function, with the sub-categories environmental unknown and genomic unknown
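One way to operationalise these categories, assuming "genomic unknowns" are unknown sequences that have homologs in sequenced reference genomes and "environmental unknowns" are those found only in environmental data (an interpretation of the slide labels, not a definition given here). The E-value cut-off and the input structures are hypothetical; in practice the Pfam hits would come from an HMM search such as HMMER.

```python
from typing import Dict, Iterable, Set


def classify(
    seq_ids: Iterable[str],
    pfam_evalue: Dict[str, float],
    genome_hits: Set[str],
    max_evalue: float = 1e-5,  # hypothetical significance cut-off
) -> Dict[str, str]:
    """Label sequences as 'known', 'genomic unknown' or 'environmental unknown'.

    pfam_evalue: seq_id -> best Pfam domain E-value (missing key = no hit)
    genome_hits: seq_ids with a homolog in a sequenced reference genome
    """
    labels = {}
    for sid in seq_ids:
        evalue = pfam_evalue.get(sid)
        if evalue is not None and evalue <= max_evalue:
            labels[sid] = "known"  # significant similarity to a Pfam domain
        elif sid in genome_hits:
            labels[sid] = "genomic unknown"  # no function, but seen in genomes
        else:
            labels[sid] = "environmental unknown"  # only seen in metagenomes
    return labels
```

Checking the Pfam criterion first means a sequence with a significant domain hit is always "known", regardless of where else it occurs.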

Page 39

Metagenomic Network Analysis

Discover the unknowns (Global Ocean Sampling)

• Cluster1800572: environmental unknown
• SAR11_0487: tryptophan synthase
• SAR11_1266: hypothetical protein
• SAR11_0686: hypothetical protein
• SAR11_1277: aspartate racemase

Page 40

Bioinformatics Pipelines

(Diagram labels: Scientists; MASAME/GUSTA ME)

Page 41

Ecological Analysis Tools: GUSTA ME

http://mb3is.megx.net/gustame

Page 42

Buttigieg & Ramette, 2014

Page 43

Multivariate AnalysiS Applications for Microbial Ecology (MASAME)

http://mb3is.megx.net/masame/
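Multivariate analyses of the kind these tools guide, such as ordinations of sample-by-taxon tables, typically start from a dissimilarity matrix, and Bray-Curtis is a common choice in microbial ecology. As an illustration only (not code from GUSTA ME or MASAME), a pure-Python Bray-Curtis dissimilarity between abundance profiles:

```python
from typing import Dict, Sequence, Tuple


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity sum|u_i - v_i| / sum(u_i + v_i); in [0, 1] for non-negative data."""
    if len(u) != len(v):
        raise ValueError("profiles must cover the same taxa in the same order")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def dissimilarity_matrix(profiles: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise Bray-Curtis dissimilarities between named samples."""
    names = list(profiles)
    return {(a, b): bray_curtis(profiles[a], profiles[b]) for a in names for b in names}
```

Such a matrix is the usual input to ordinations like NMDS or PCoA; for real data, established implementations (for example `vegdist` in the R package vegan) are preferable.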

Page 44

Summary Ocean Sampling Day

• A great open-collaboration, open-science community
• Sharing data is a prerequisite!

Built on existing infrastructure(s)
• Only added missing components
• Covers all aspects of studying marine microbiomes: data gathering, bioinformatics analysis, archiving/dissemination, ecological analysis

Comparative metagenomics that takes the whole data lifecycle into account
• From sampling to ecological and biotechnological analysis

Page 45

Page 46

Thanks for your attention

1st Marine Board Forum: Marine Data Challenges: from Observation to Information

http://twitter.com/Micro_B3
http://www.oceansamplingday.org

Page 47

Ecological Traits of Metagenomes

Established database of calculated traits
• Available at http://mb3is.megx.net/mg-traits

PCA of codon usage (see http://mb3is.megx.net/mg-traits/traits-summary)
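Codon usage is one of the metagenome traits listed here. A minimal sketch of the per-metagenome input to such a PCA, assuming in-frame ORF nucleotide sequences are already available; this is not the MG-Traits implementation, which may count and normalise differently.

```python
from collections import Counter
from itertools import product
from typing import Iterable, List

# All 64 codons in a fixed order, so profiles from different metagenomes line up.
CODONS: List[str] = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_usage(orfs: Iterable[str]) -> List[float]:
    """Relative frequency of each codon across in-frame ORF nucleotide sequences.

    Codons containing characters other than A/C/G/T (e.g. N) are skipped,
    as is any trailing partial codon.
    """
    counts: Counter = Counter()
    for orf in orfs:
        orf = orf.upper()
        for i in range(0, len(orf) - len(orf) % 3, 3):
            codon = orf[i:i + 3]
            if all(base in "ACGT" for base in codon):
                counts[codon] += 1
    total = sum(counts.values())
    return [counts[c] / total if total else 0.0 for c in CODONS]
```

Stacking one 64-value profile per metagenome gives a matrix that could then be passed to a PCA routine, for example scikit-learn's `sklearn.decomposition.PCA`.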

Page 48