CAMERA: A Metagenomics Resource for Marine Microbial Ecology. July 27, 2007. Paul Gilna, UCSD/Calit2; Saul A. Kravitz, J. Craig Venter Institute.

Dec 21, 2015

Page 1: CAMERA: A Metagenomics Resource for Marine Microbial Ecology

July 27, 2007

Paul Gilna, UCSD/Calit2

Saul A. Kravitz, J. Craig Venter Institute

Page 2: Acknowledgements

• UCSD/Calit2
  - Larry Smarr, PI; Paul Gilna, Executive Director
  - Phil Papadopoulos, Technical Lead
  - Weizhong Li
• JCVI
  - Marv Frazier, co-PI
  - Leonid Kagan, Architect; Jennifer Wortman, Bioinformatics
  - Rekha Seshadri, Outreach and Training
  - Doug Rusch, Shibu Yooseph, Aaron Halpern, Granger Sutton
• UC Davis
  - Jonathan Eisen, co-investigator
• Gordon and Betty Moore Foundation
  - David Kingsbury and Mary Maxon

Page 3: Outline

• New Discipline of Metagenomics

• Global Ocean Sampling Expedition

• Challenges of Metagenomic Data

• CAMERA Features

• CAMERA Usage to Date

• Cyberinfrastructure

Page 4: Genomics vs Metagenomics

• Genomics – ‘Old School’
  - Study of an organism's genome
  - Genome sequence determined using shotgun sequencing and assembly
  - ~1300 microbes sequenced, first in 1995
  - DNA usually obtained from pure cultures
• Metagenomics
  - Application of genome sequencing methods to environmental samples (no culturing)
  - Environmental shotgun sequencing is the most widely used approach

Page 5: Metagenomic Questions

• Within an environment
  - What biological functions are present (absent)?
  - What organisms are present (absent)?
• Compare data from (dis)similar environments
  - What are the fundamental rules of microbial ecology?
• Search for novel proteins and protein families

Page 6: Metagenomics Applications

• Marine Ecology and Microbiology
• Alternative Energy and Industrial
  - Hypersaline ponds, Oceans
  - Termite Metabolism
• Medical Applications
  - Microbial Ecology of Human body cavities and fluids
• Agricultural
  - Disease Vector Metabolism (Glassy-winged Sharpshooter)
  - Soil Ecology
• Environmental Remediation
  - DOE: Acid Mine Drainage, Chemical and Radioactive Waste

Page 7: Metadata

• Metagenomics
  - Genomics + Metadata
• Environmental Metadata
  - Time and location (lat, long, depth) of sample collection
  - Correlate w/remote sensing data
  - Physico-chemical properties (e.g. temperature, salinity)

[Figure: MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003]
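The environmental metadata listed on this slide (collection time, latitude, longitude, depth, temperature, salinity) could be attached to each sample as a small record. The sketch below is only an illustration: the `SampleMetadata` class, its field names and units, the `select_samples` helper, and all values are hypothetical and are not taken from CAMERA or the GOS dataset.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class SampleMetadata:
    """Environmental context for one sample (hypothetical schema)."""
    sample_id: str
    collected_at: datetime
    latitude_deg: float
    longitude_deg: float
    depth_m: float
    temperature_c: Optional[float] = None  # may be missing
    salinity_psu: Optional[float] = None   # may be missing


def select_samples(samples: List[SampleMetadata],
                   max_depth_m: float,
                   min_temperature_c: float) -> List[SampleMetadata]:
    """Keep samples at or above max_depth_m and at least min_temperature_c warm.

    Samples with no recorded temperature are skipped rather than guessed.
    """
    return [
        s for s in samples
        if s.depth_m <= max_depth_m
        and s.temperature_c is not None
        and s.temperature_c >= min_temperature_c
    ]


# Made-up values for illustration only; not real sampling data.
samples = [
    SampleMetadata("S1", datetime(2003, 2, 22), 31.5, -64.0, 5.0, 20.0, 36.6),
    SampleMetadata("S2", datetime(2003, 2, 23), 31.6, -64.1, 50.0, 19.0, 36.5),
    SampleMetadata("S3", datetime(2003, 2, 24), 31.7, -64.2, 2.0),
    SampleMetadata("S4", datetime(2003, 2, 25), 44.0, -63.0, 1.0, 10.0, 31.0),
]

warm_surface = select_samples(samples, max_depth_m=10.0, min_temperature_c=18.0)
print([s.sample_id for s in warm_surface])
```

Keeping the physico-chemical fields optional reflects the point made later in the talk that metadata is often absent from existing databases.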

Page 8: JCVI Global Ocean Sampling Expedition

Largest Metagenomic Study to Date

Page 9: Global Ocean Sampling (GOS)

178 Total Sampling Locations

Phase 1: 41 samples, 7.7M reads, >6M proteins

Diverse Environments: open ocean, estuary, embayment, upwelling, fringing reef, atoll, warm seep, mangrove, fresh water, biofilms, sediments, soils

Page 10: GOS Protein Analysis, Yooseph et al (PLoS 2007)

• Novel clustering process
  - Sequence similarity based
  - Predict proteins and group into related clusters
  - Include GOS and all known proteins
• Findings
  - GOS proteins cover ~all existing prokaryotic families
  - GOS expands diversity of known protein families
  - 1700 large novel clusters with no homology to known protein families
  - Higher than expected proportion of novel clusters are viral
  - No saturation in the rate of novel protein family discovery
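To make "sequence similarity based" clustering concrete, here is a toy sketch of one generic approach: greedy clustering against cluster representatives, with similarity measured as Jaccard overlap of k-mers. This is not the procedure used by Yooseph et al; the function names, the k-mer similarity measure, the threshold, and the example sequences are all assumptions chosen for illustration.

```python
from typing import List, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared k-mers divided by all k-mers seen in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_cluster(seqs: List[str], threshold: float = 0.5,
                   k: int = 3) -> List[List[str]]:
    """Put each sequence in the first cluster whose representative is
    similar enough; otherwise start a new cluster with it as representative."""
    representatives: List[Set[str]] = []
    clusters: List[List[str]] = []
    for seq in seqs:
        seq_kmers = kmers(seq, k)
        for idx, rep_kmers in enumerate(representatives):
            if jaccard(seq_kmers, rep_kmers) >= threshold:
                clusters[idx].append(seq)
                break
        else:  # no representative was similar enough
            representatives.append(seq_kmers)
            clusters.append([seq])
    return clusters


# Made-up peptide strings, not real GOS proteins.
example = ["MKTAYIAKQR", "MKTAYIAKQK", "GGGSSSGGGS"]
print(greedy_cluster(example))
```

A production pipeline would use alignment-based similarity rather than raw k-mer overlap; the sketch only shows the grouping logic.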

Page 11: Added Diversity

[Figure with two panels: UVDE homologs and Rubisco homologs. Labeled organisms: H. marismortui, B. halodurans, T. thermophilus, B. anthracis, D. psychrophila, D. radiodurans. Legend: GOS prokaryotes, GOS eukaryotes, GOS viral, known prokaryotes, known eukaryotes, known viral.]

Page 12: Rate of Protein Discovery

[Figure: "Rate of discovery". X-axis: number of sequences (millions), 0 to 7. Y-axis: number of clusters (thousands), 0 to 250. Series: cluster size >=3, >=5, >=10, >=20.]
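A curve like the one on this slide can be computed by walking through the sequences in sampling order and counting how many clusters have reached a minimum size so far. The sketch below assumes each sequence already carries a cluster label; the function name `discovery_curve`, its parameters, and the tiny example input are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def discovery_curve(cluster_ids: Iterable[Hashable], min_size: int,
                    step: int) -> List[Tuple[int, int]]:
    """Return (sequences seen, clusters with >= min_size members)
    recorded after every `step` sequences."""
    sizes: Counter = Counter()
    n_large = 0
    points = []
    for n, cid in enumerate(cluster_ids, start=1):
        sizes[cid] += 1
        if sizes[cid] == min_size:  # cluster just crossed the size cutoff
            n_large += 1
        if n % step == 0:
            points.append((n, n_large))
    return points


# Made-up cluster labels, one per sequence, in sampling order.
labels = ["a", "b", "a", "c", "b", "a", "d", "c"]
for cutoff in (2, 3):
    print(cutoff, discovery_curve(labels, min_size=cutoff, step=2))
```

If the count keeps rising as more sequences are added, as the slide reports for the GOS data, new protein families are still being discovered at the current sampling depth.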

Page 13: Fragment Recruitment Viewer, Rusch et al, PLoS 3/2007

[Figure: fragment recruitment plots. X-axis: reference genome coordinates. Y-axis: percent identity (tick labels include 50%, 55%, 100%). Annotations: ribosomal operon; "core" genome, ~75% identical; sequence absent from most strains (phage/other lateral transfer?).]
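The plot on this slide places each metagenomic read at its position on a reference genome (x-axis) and at its percent identity to that reference (y-axis). The sketch below shows how such points could be derived from pairwise alignments. It is not the viewer's code: the function names, the input format (reference start plus two gapped strings of equal length), the 50% cutoff, and the definition of identity as matching columns over alignment length are all assumptions.

```python
from typing import List, Tuple


def percent_identity(aligned_read: str, aligned_ref: str) -> float:
    """Matching, non-gap columns as a percentage of alignment length."""
    if len(aligned_read) != len(aligned_ref) or not aligned_read:
        raise ValueError("aligned strings must be non-empty and equal length")
    matches = sum(1 for a, b in zip(aligned_read, aligned_ref)
                  if a == b and a != "-")
    return 100.0 * matches / len(aligned_read)


def recruitment_points(alignments: List[Tuple[int, str, str]],
                       min_identity: float = 50.0) -> List[Tuple[int, float]]:
    """Turn (ref_start, aligned_read, aligned_ref) records into
    (ref_start, percent_identity) points, sorted along the reference."""
    points = []
    for ref_start, read, ref in alignments:
        identity = percent_identity(read, ref)
        if identity >= min_identity:  # drop reads too dissimilar to recruit
            points.append((ref_start, identity))
    return sorted(points)


# Made-up alignments for illustration.
alignments = [
    (100, "ACGT", "ACGT"),
    (40, "ACGT", "ACGA"),
    (70, "AAAA", "CCCC"),
]
print(recruitment_points(alignments))
```

Plotted as a scatter, regions of the reference with no points correspond to sequence absent from the sampled population, which is the pattern the slide annotates.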

Page 14: Why CAMERA?

• Public repositories not focused on environmental metagenomics
  - Sargasso Sea data underutilized by community
• Millions of dollars invested in sequencing and analysis, but results only accessible to bioinformatics elite
• Release of GOS dataset in March 2007
• Comply with Convention on Biodiversity

Page 15: CAMERA – http://camera.calit2.net

• “Convenient acronym for cumbersome name…” - Henry Nichols, PLoS Biology
• Mission
  - Enable Research in Marine Microbiology
• CAMERA Partners:

Page 16: Challenges

• Enormous datasets with high gene density
  - large compute resources required
  - 2 orders of magnitude jump
• Fragmentary data
  - inadequate bioinformatics tools for assembly, annotation, analysis, visualization
• Metadata standards non-existent
  - metadata absent from databases
  - Lack of standards impedes collection of datasets
• Diversity of User Sophistication and Needs

Page 17: CAMERA Services

• Maintain searchable sequence collections
  - ALL metagenomic sequence reads, assemblies
  - Non-identical amino acid collection (extended NRAA)
  - Viral, Fungal, pico-Eukaryotes, Microbial
  - CAMERA protein clusters
• Metagenomics data easily downloadable
• Interactive and Batch Search Facility
  - Scalable parallel implementations of BLAST
  - Integrated with associated metadata

Page 18: Distinctive Features Set in Progress

• Graphical Tools for Visualizing Diversity
  - Based on Rusch et al
  - Fragment recruitment viewer
• CAMERA Protein Clusters
  - Based on Yooseph et al
  - Incremental version implemented in 2007
• Annotation
  - Break through quadratic complexity via clusters
  - Phyletic Classification
• Overviews of sequence collections
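One way to read "break through quadratic complexity via clusters" is that comparing every protein against every other grows with the square of the collection size, while comparing each protein only against one representative per cluster grows linearly for a fixed number of clusters. The slide does not spell this out, so the sketch below is an interpretation: the two cost functions and the cluster count of 100,000 are assumptions, and only the figure of roughly 6 million proteins comes from the talk.

```python
def all_vs_all_pairs(n_sequences: int) -> int:
    """Number of unordered pairs in an all-against-all comparison."""
    return n_sequences * (n_sequences - 1) // 2


def vs_representatives(n_sequences: int, n_clusters: int) -> int:
    """Comparisons needed if each sequence is checked against one
    representative per cluster."""
    return n_sequences * n_clusters


n_proteins = 6_000_000   # order of magnitude quoted for GOS Phase 1
n_clusters = 100_000     # hypothetical cluster count, for illustration only

pairwise = all_vs_all_pairs(n_proteins)
clustered = vs_representatives(n_proteins, n_clusters)
print(f"all-vs-all comparisons:      {pairwise:.3e}")
print(f"vs. cluster representatives: {clustered:.3e}")
print(f"reduction factor:            {pairwise / clustered:.1f}x")
```

The saving depends entirely on how many clusters there are relative to the number of sequences; with very few sequences the representative-based count can even exceed the pairwise count.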

Page 19: Fragment Recruitment Viewer

Metagenomic Sequence vs Reference Sequence

• Highlight and Select with Associated Metadata
• View large datasets
• AJAX I/F

Based on Doug Rusch’s Viewer