Genovo: De Novo Assembly for Metagenomes Gao Song 2010/07/14

Gao Song, 2010/07/14. Outline: Overview of Metagenomics, Current Assemblers, Genovo Assembly.

Dec 31, 2015

Transcript
Page 1:

Genovo: De Novo Assembly for Metagenomes

Gao Song
2010/07/14

Page 2:

Outline
- Overview of Metagenomics
- Current Assemblers
- Genovo Assembly

Page 3:

Overview of Metagenomics

Page 4:

Motivation
- Metagenomics is:
- Why do we need metagenomics?
  - Snapshot of bacterial community
  - Cannot be cultivated
- <1%

Page 5:

Applications
- Monitoring the impact of pollutants on ecosystems
- Discovery of new genes, enzymes…
  - Global Ocean Sampling Expedition
- Human Microbiome Project
- JGI sequenced Acid Mine Drainage sample

Page 6:

Two Paradigms
- Marker Gene Sequencing
  - 16S rRNA:
    - Two ways
  - Other marker genes: RuBisCO, NifH
  - Only composition
- Whole Genome Sequencing (WGS)
  - Detailed picture of community

Page 7:

Complex Communities
[Figure labels; context lost in extraction: >1000, X, 5000, 200 L, 1 million]

Page 8:

Current Assemblers

Page 9:

Current Status
- Why not assemble reads?
- ORFome assembler*
  - Three steps:
    1. The putative ORFs are annotated for each read
    2. ORFs are assembled using EULER
    3. ORF homologs are searched for in the Integrated Microbial Genomes (IMG) database
- Existing WGS assemblers
  - Sanger reads: Phrap, Celera, Arachne, JAZZ…
  - Short reads: Velvet, Newbler…

* Y. Ye and H. Tang, "An ORFome assembly approach to metagenomics sequences analysis," Journal of Bioinformatics and Computational Biology, vol. 7, no. 3, pp. 455-471, June 2009.

Page 10:

Genovo: De Novo Assembly for Metagenomes

Jonathan Laserson, Vladimir Jojic and Daphne Koller. RECOMB 2010, LNBI 6044, pp. 341-356, 2010

Page 11:

Main Idea
- Propose a generative model for metagenome data
- Using iterated conditional modes (ICM)
- Using hill-climbing steps iteratively
- Design a score for evaluation

Page 12:

Model
- Initialize contigs:
  - Infinite contigs with infinite length
- Partition the reads
  - Using Chinese Restaurant Process
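
To make the partition step concrete, here is a minimal sketch of a Chinese Restaurant Process that assigns reads to contigs. The function name `crp_partition` and the concentration parameter `alpha` are illustrative assumptions, not names from the slides; reads play the role of customers and contigs the role of tables.

```python
import random


def crp_partition(num_reads, alpha, seed=0):
    """Assign reads to contigs with a Chinese Restaurant Process.

    Each read joins an existing contig with probability proportional to
    the number of reads already in it, or opens a new contig with
    probability proportional to alpha (hypothetical parameter name).
    """
    rng = random.Random(seed)
    assignments = []  # assignments[i] = contig index of read i
    counts = []       # counts[c] = number of reads in contig c
    for _ in range(num_reads):
        # One weight per existing contig, plus one for a brand-new contig.
        weights = counts + [alpha]
        contig = rng.choices(range(len(weights)), weights=weights)[0]
        if contig == len(counts):
            counts.append(1)      # open a new contig
        else:
            counts[contig] += 1   # join an existing contig
        assignments.append(contig)
    return assignments, counts


assignments, counts = crp_partition(num_reads=1000, alpha=2.0)
print(len(counts), "contigs; largest holds", max(counts), "reads")
```

Because the number of contigs is not fixed in advance, the process can keep opening new contigs as more reads arrive, which is one way to read "infinite contigs" on the slide.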

Page 13:

Model
- Generate the starting point o_i
- Generate the length of read
- Quality of assembly of each read
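
The formulas for these three steps did not survive extraction, so the following is only a sketch under stated assumptions: the starting point is drawn from a symmetric geometric distribution (suggested by the "geometric variable" mentioned later in the talk), the read length is uniform in a hypothetical range, and the read is a copy of the contig with substitution errors only (no indels). All function and parameter names are illustrative.

```python
import random

BASES = "ACGT"


def sample_symmetric_geometric(rng, rho):
    """Integer offset with P(0) = rho and P(o) = 0.5 * rho * (1 - rho)**|o| otherwise."""
    magnitude = 0
    while rng.random() > rho:
        magnitude += 1
    if magnitude == 0:
        return 0
    return magnitude if rng.random() < 0.5 else -magnitude


def base_at(contig, pos, rng):
    """Lazily draw contig letters so the contig is unbounded in both directions."""
    if pos not in contig:
        contig[pos] = rng.choice(BASES)
    return contig[pos]


def generate_read(contig, rng, rho=0.1, min_len=20, max_len=40, error_rate=0.01):
    """Sample (start, read): a noisy copy of the contig beginning at a random offset."""
    start = sample_symmetric_geometric(rng, rho)   # starting point o_i (assumed distribution)
    length = rng.randint(min_len, max_len)         # read length (assumed distribution)
    letters = []
    for pos in range(start, start + length):
        base = base_at(contig, pos, rng)
        if rng.random() < error_rate:              # substitution error
            base = rng.choice([b for b in BASES if b != base])
        letters.append(base)
    return start, "".join(letters)


rng = random.Random(0)
contig = {}  # position -> base, filled in on demand
for _ in range(3):
    print(generate_read(contig, rng))
```

Under this reading, the "quality of assembly of each read" corresponds to how probable the observed read is given the contig letters it was copied from.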

Page 14:

Algorithm
- Using ICM
- Starting from the initial condition, hill-climbing moves are performed iteratively
- Move 1: Consensus sequence:
  - Select the most frequent base
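
As an illustration of Move 1, the sketch below picks the most frequent base in each column of the reads currently mapped to one contig. It assumes gap-free alignments given as `(offset, sequence)` pairs; the function name `consensus` and the `N` placeholder for uncovered positions are assumptions made for this example.

```python
from collections import Counter


def consensus(aligned_reads):
    """Return (start, sequence): the majority base at every covered position.

    aligned_reads is a list of (offset, sequence) pairs; each read is
    assumed to align to the contig without gaps.
    """
    columns = {}  # contig position -> Counter of bases observed there
    for offset, seq in aligned_reads:
        for k, base in enumerate(seq):
            columns.setdefault(offset + k, Counter())[base] += 1
    if not columns:
        return 0, ""
    lo, hi = min(columns), max(columns)
    letters = []
    for pos in range(lo, hi + 1):
        counts = columns.get(pos)
        # Positions that no read covers are reported as 'N'.
        letters.append(counts.most_common(1)[0][0] if counts else "N")
    return lo, "".join(letters)


reads = [(0, "ACGT"), (2, "GTTA"), (1, "CGTT"), (0, "ACCT")]
print(consensus(reads))  # the mismatch at position 2 of the last read is outvoted
```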

Page 15:

Algorithm
- Move 2: Read Mapping
  - For read i, first remove it, then recalculate its contig and alignment
  - First, for each potential location, compute the alignment
  - Then, select the location according to its probability
  - Filtering: using common 10-mers
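
The 10-mer filter can be sketched as a simple k-mer index: only locations where the read and the contig share at least one exact 10-mer are kept as candidates for the more expensive alignment step. The helper names `build_kmer_index` and `candidate_offsets` are hypothetical, and the real implementation may differ.

```python
from collections import defaultdict


def build_kmer_index(contig, k=10):
    """Map every k-mer in the contig to the list of positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(contig) - k + 1):
        index[contig[pos:pos + k]].append(pos)
    return index


def candidate_offsets(read, index, k=10):
    """Contig offsets at which the read shares at least one exact k-mer."""
    offsets = set()
    for r in range(len(read) - k + 1):
        for c in index.get(read[r:r + k], ()):
            offsets.add(c - r)  # where the read would start on the contig
    return sorted(offsets)


contig = "TTGACCAGTACGGATCCATGCAAGTCTAGG"
index = build_kmer_index(contig)
read = contig[7:22]
print(candidate_offsets(read, index))
```

Only the offsets returned here would then be aligned and scored, which keeps the per-read work small compared with trying every position in every contig.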

Page 16:

Algorithm
- Move 3: update geometric variable
- Global moves:
  - Propose indels
  - Center
  - Merge contigs
  - Chimeric reads
  - Disassemble the dangling contigs

Page 17:

Evaluation
- BLAST
- PFAM
- Designed score
  - 1st term: quality of assembly
  - 2nd term: penalty for total length
  - 3rd term: prefer to merge when V > V0
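
The exact formula is not in the transcript, so the sketch below is one possible form that is consistent with the three terms listed: a sum of per-read quality scores, a penalty proportional to total contig length, and a per-contig bonus scaled by V0. The function name, the log-alphabet weighting, and the default `v0=20` (taken from the question raised later in the talk) are assumptions.

```python
import math


def denovo_score(read_scores, contig_lengths, v0=20, alphabet_size=4):
    """Illustrative three-term assembly score (assumed form, not the paper's)."""
    log_b = math.log(alphabet_size)
    quality = sum(read_scores)                        # 1st term: quality of assembly
    length_penalty = log_b * sum(contig_lengths)      # 2nd term: penalty for total length
    contig_bonus = log_b * v0 * len(contig_lengths)   # 3rd term: controls merging
    return quality - length_penalty + contig_bonus


# Merging two contigs that overlap by V bases shortens the total length by V
# and removes one contig, so the score changes by log_b * (V - v0).
before = denovo_score([0.0] * 4, [100, 100])
merged_v25 = denovo_score([0.0] * 4, [175])  # overlap V = 25
merged_v15 = denovo_score([0.0] * 4, [185])  # overlap V = 15
print(merged_v25 > before, merged_v15 > before)
```

With this form, a merge raises the score exactly when the overlap V exceeds V0, which matches the behavior described for the third term.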

Page 18:

Results
- Using 454 reads
- Compare with Newbler, Velvet and EULER-SR
- Single genome

Page 19:

Results
- Metagenome data
  - Score
  - PFAM

Page 20:

Discussion
- New idea
- Apply a mature algorithm to the assembly domain
- Systematically describe and analyze the problem and algorithm
- Results are better

Page 21:

Discussion
- Slow: minutes vs. hours for 300k 454 reads
- Main idea: try to extend as long as possible, so they will have more hits for BLAST
- Why choose 20 for V0?
- How to deal with branching? Repeats?
- Model:
  - Why can it capture the properties of metagenomic data?
  - How to argue the correctness of that model?
  - The distribution of starting points

Page 22:

Thank you