The best of both worlds: Combining PacBio with short read technology for improved de novo genome assembly
Lex Nederbragt, NSC and CEES
May 10, 2015

Lex Nederbragt
Page 1: Combining PacBio with short read technology for improved de novo genome assembly

The best of both worlds

Combining PacBio with short read technology for improved de novo genome assembly

Lex Nederbragt, NSC and CEES

Page 2: Combining PacBio with short read technology for improved de novo genome assembly

This talk

Page 3: Combining PacBio with short read technology for improved de novo genome assembly

Why does everybody want longer reads?

… for genome assemblies

Page 4: Combining PacBio with short read technology for improved de novo genome assembly

What is a genome assembly

reads

contigs

scaffolds

Hierarchical structure

Page 5: Combining PacBio with short read technology for improved de novo genome assembly

Sequence data

Reads

http://www.cbcb.umd.edu/research/assembly_primer.shtml

reads

contigs

scaffolds

original DNA

fragments

original DNA

fragments

Sequenced ends

Page 6: Combining PacBio with short read technology for improved de novo genome assembly

Contigs

Building contigs

reads

contigs

scaffolds
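
As a rough illustration of how contigs are built from overlapping reads, here is a minimal greedy overlap-merge sketch. It assumes error-free reads from a single strand, and the function names (`overlap`, `greedy_assemble`) are made up for this example. Real assemblers use overlap graphs or de Bruijn graphs and have to cope with sequencing errors.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 when no overlap of at least min_len bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            # No remaining overlaps: what is left are the contigs.
            return contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

With these three toy reads the sketch ends with a single contig. Reads that share no sufficiently long overlap stay as separate contigs.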

Page 7: Combining PacBio with short read technology for improved de novo genome assembly

Contigs

Building contigs

reads

contigs

scaffolds

Repeat copy 1 Repeat copy 2

Collapsed repeat consensus

Contig orientation? Contig order?

http://www.cbcb.umd.edu/research/assembly_primer.shtml

Page 8: Combining PacBio with short read technology for improved de novo genome assembly

Mate pairs

Other read type

Repeat copy 1 Repeat copy 2

reads

contigs

scaffolds

mate pair reads

(much) longer fragments

Page 9: Combining PacBio with short read technology for improved de novo genome assembly

Scaffolds

Ordered, oriented contigs

reads

contigs

scaffolds

contigs

mate pairs

gap size estimate
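
The gap size estimate can be sketched with simple arithmetic. This assumes a known mean mate-pair insert size, read 1 mapping near the end of contig A, and read 2 mapping near the start of contig B. The function name and the coordinate convention are assumptions for illustration, not taken from any particular scaffolder.

```python
from statistics import mean


def estimate_gap(insert_size, len_a, links):
    """Estimate the gap between contig A and contig B from mate pairs.

    links: list of (start of read 1 on contig A, end of read 2 on contig B),
    both measured from the start of their own contig.
    Each pair suggests: gap = insert size - part on A - part on B.
    """
    return mean(insert_size - (len_a - start_a) - end_b
                for start_a, end_b in links)


# Three hypothetical mate pairs from a 3000 bp library linking two contigs.
links = [(8000, 720), (8500, 1200), (9000, 1680)]
print(estimate_gap(3000, 10000, links))
```

Averaging over many linking pairs smooths out the variation in actual fragment lengths around the nominal insert size.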

Page 10: Combining PacBio with short read technology for improved de novo genome assembly

What is a genome assembly

reads

contigs

scaffolds

Hierarchical structure

Page 11: Combining PacBio with short read technology for improved de novo genome assembly

Genome assembly

So, what’s so hard about it?

Page 12: Combining PacBio with short read technology for improved de novo genome assembly

1) Repeats

reads

contigs

scaffolds

Repeat copy 1 Repeat copy 2

Collapsed repeat consensus

Repeats break up contigs

http://www.cbcb.umd.edu/research/assembly_primer.shtml
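
A small sketch of why repeats are a problem: a read that lies entirely inside a repeat matches both copies, while a read that reaches into unique flanking sequence matches only once. The toy genome and the helper name `find_all` are invented for this example.

```python
def find_all(genome: str, read: str):
    """Return every start position where read matches genome exactly."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


REPEAT = "TTGACCGA"
# Unique flank + repeat copy 1 + unique spacer + repeat copy 2 + unique flank
GENOME = "AAC" + REPEAT + "GGCT" + REPEAT + "CAT"

short_read = "TGACCG"     # lies entirely inside the repeat
long_read = GENOME[0:13]  # spans copy 1 and reaches into both flanks

print(find_all(GENOME, short_read))  # two placements: ambiguous
print(find_all(GENOME, long_read))   # one placement: unambiguous
```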

Page 13: Combining PacBio with short read technology for improved de novo genome assembly

2) Heterozygosity

http://commons.wikimedia.org/wiki/File:Chromosome_1.svg

Differences between sister chromosomes

Page 14: Combining PacBio with short read technology for improved de novo genome assembly

2) Heterozygosity

Contig 1

Polymorphic contig 2

Polymorphic contig 3

Contig 4

Page 15: Combining PacBio with short read technology for improved de novo genome assembly

2) Heterozygosity

http://www.astraean.com/borderwars/wp-content/uploads/2012/04/heterozygoats.jpg and many other sites

Page 16: Combining PacBio with short read technology for improved de novo genome assembly

3) Many programs to choose from

Zhang et al. (2011) doi:10.1371/journal.pone.0017915.g001

Page 17: Combining PacBio with short read technology for improved de novo genome assembly

Assembly: challenges

Repeat copy 1 Repeat copy 2

Contig 1

Polymorphic contig 2

Polymorphic contig 3

Contig 4

Knowing how to use the programs

Heterozygosity

Page 18: Combining PacBio with short read technology for improved de novo genome assembly

So, why does everybody want longer reads?

http://www.autobizz.com.my/forum/forum/General-Chat/944-The-worlds-longest-car.html

Page 19: Combining PacBio with short read technology for improved de novo genome assembly

Longer reads?

Repeat copy 1 Repeat copy 2

Long reads can span repeats

Contig 1

Polymorphic contig 2

Polymorphic contig 3

Contig 4

and heterozygous regions

Page 20: Combining PacBio with short read technology for improved de novo genome assembly

PacBio to the rescue?

Page 21: Combining PacBio with short read technology for improved de novo genome assembly

High-throughput sequencing

Library preparation

Large Insert Sizes Single pass

Small Insert Sizes

Multiple passes

Continued generations of reads

Page 22: Combining PacBio with short read technology for improved de novo genome assembly

High-throughput sequencing

Raw read length

Page 23: Combining PacBio with short read technology for improved de novo genome assembly

Large Insert Sizes Single pass

Small Insert Sizes

Multiple passes

High-throughput sequencing

Raw reads and subreads

‘Subreads’
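
A toy sketch of how a multi-pass raw read breaks into subreads: the raw read is cut wherever the adapter sequence occurs. The adapter string below is a made-up placeholder, and exact matching is a simplification, since in real data the adapter has to be found with error-tolerant alignment.

```python
def split_subreads(raw_read: str, adapter: str):
    """Cut a raw read at every exact adapter occurrence; drop empty pieces."""
    return [piece for piece in raw_read.split(adapter) if piece]


ADAPTER = "CCCGGG"  # hypothetical placeholder, not a real adapter sequence
raw = "ACGT" + ADAPTER + "TTACGTAAG" + ADAPTER + "ACG"

subreads = split_subreads(raw, ADAPTER)
longest = max(subreads, key=len)
print(subreads)
print(longest)
```

Keeping only the longest subread per raw read is one way to avoid counting the same insert several times.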

Page 24: Combining PacBio with short read technology for improved de novo genome assembly

Large Insert Sizes Single pass

PacBio: uses

Long reads, low quality

Useful for assembly?

85-87% accuracy
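
To get a feel for what 85% accuracy means in practice, the sketch below computes the expected number of errors in a read and the chance that a short stretch is error-free. It assumes errors are independent and uniformly spread along the read, which is a simplification.

```python
def expected_errors(read_length: int, accuracy: float) -> float:
    """Expected number of erroneous bases in a read."""
    return read_length * (1 - accuracy)


def prob_error_free(k: int, accuracy: float) -> float:
    """Probability that k consecutive bases contain no error."""
    return accuracy ** k


print(expected_errors(3000, 0.85))  # about 450 errors in a 3000 bp read
print(prob_error_free(20, 0.85))    # roughly 4% of 20-base stretches are clean
```

Under these assumptions, exact overlaps between raw reads are rare, which is why error correction comes before assembly.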

Page 25: Combining PacBio with short read technology for improved de novo genome assembly

Solutions for assembly

Page 26: Combining PacBio with short read technology for improved de novo genome assembly

Solutions for assembly (1)

Designed by Pacific Biosciences

http://www.clker.com/clipart-4245.html

Page 27: Combining PacBio with short read technology for improved de novo genome assembly

Solutions for assembly (2)

Broad Institute

Need a special recipe for sequencing

Page 28: Combining PacBio with short read technology for improved de novo genome assembly

Solutions for assembly (3)

PacBioToCA

Error correct with short reads

http://schatzlab.cshl.edu/presentations/2012-01-17.PAG.SMRTassembly.pdf

Celera assembler
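
The idea of correcting a long read with short reads can be sketched as a per-position majority vote. This is a toy version, not the PacBioToCA implementation: it assumes the short reads have already been placed on the long read at known offsets and handles only substitution errors, whereas real PacBio errors are mostly insertions and deletions.

```python
from collections import Counter


def correct_read(long_read: str, placed_short_reads):
    """Replace each base of long_read by the majority base among the
    short reads covering it; keep the original base where nothing covers it.

    placed_short_reads: list of (offset on long read, short read sequence).
    """
    votes = [Counter() for _ in long_read]
    for offset, short in placed_short_reads:
        for i, base in enumerate(short):
            if 0 <= offset + i < len(long_read):
                votes[offset + i][base] += 1
    return "".join(
        v.most_common(1)[0][0] if v else original
        for v, original in zip(votes, long_read)
    )


noisy = "ACGATGCTAT"  # two substitution errors relative to ACGTTGCAAT
short_reads = [(0, "ACGTTG"), (1, "CGTTGC"), (2, "GTTGCA"),
               (3, "TTGCAA"), (4, "TGCAAT")]
print(correct_read(noisy, short_reads))
```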

Page 29: Combining PacBio with short read technology for improved de novo genome assembly

PacBioToCA

Koren et al., 2012

Page 30: Combining PacBio with short read technology for improved de novo genome assembly

Shameless self-promotion

flxlexblog.wordpress.com

Page 31: Combining PacBio with short read technology for improved de novo genome assembly

Shameless self-promotion

@lexnederbragt

Page 32: Combining PacBio with short read technology for improved de novo genome assembly

The Atlantic cod genome project

Page 33: Combining PacBio with short read technology for improved de novo genome assembly

First draft

Fragmented assembly
- short contigs
- many gap bases

http://en.wikipedia.org

Page 34: Combining PacBio with short read technology for improved de novo genome assembly

First draft

6467 scaffolds

35% gap bases
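
The percentage of gap bases is simply the share of `N` characters in the scaffold sequences. A minimal sketch, assuming gaps are written as runs of `N` as is common in scaffold FASTA files. The toy scaffold is invented.

```python
def gap_fraction(scaffolds) -> float:
    """Fraction of all scaffold bases that are gap characters (N)."""
    total = sum(len(s) for s in scaffolds)
    gaps = sum(s.upper().count("N") for s in scaffolds)
    return gaps / total if total else 0.0


toy_scaffold = "ACGT" + "N" * 7 + "ACGTACGTA"  # 20 bases, 7 of them gap
print(gap_fraction([toy_scaffold]))
```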

Page 35: Combining PacBio with short read technology for improved de novo genome assembly

The causes

Short Tandem Repeats (>20% of gaps)
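
Short tandem repeats such as ACACAC or TGTGTG can be located with a regular expression. The sketch below looks only for dinucleotide repeats. The minimum of four units and the function name are arbitrary choices for illustration.

```python
import re


def find_dinucleotide_repeats(seq: str, min_units: int = 4):
    """Return (start, end, unit) for each run of a 2-base unit repeated
    at least min_units times; homopolymer runs such as AAAA are skipped."""
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1))
    return [
        (m.start(), m.end(), m.group(1))
        for m in pattern.finditer(seq)
        if m.group(1)[0] != m.group(1)[1]
    ]


seq = "GGCT" + "AC" * 5 + "TTGA" + "TG" * 4 + "CA"
print(find_dinucleotide_repeats(seq))
```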

Page 36: Combining PacBio with short read technology for improved de novo genome assembly

The causes

Contig 1

Polymorphic contig 2

Polymorphic contig 3

Contig 4

Heterozygosity?

Page 37: Combining PacBio with short read technology for improved de novo genome assembly

23 pseudochromosomes

Below 5% gap bases

Longer contigs

The goal

PacBio to the rescue?

Page 38: Combining PacBio with short read technology for improved de novo genome assembly

Large Insert Sizes

The approach

Libraries

Aim for looooong insert sizes

Page 39: Combining PacBio with short read technology for improved de novo genome assembly

Large Insert Sizes Single pass

The approach

Sequencing

Sequence with 90 minute movies

10x coverage in reads of at least 3000 bp

No, we don’t throw this away…

Page 40: Combining PacBio with short read technology for improved de novo genome assembly

The approach

Error-correction

Page 41: Combining PacBio with short read technology for improved de novo genome assembly

PacBio results

Fraction of bases at minimum length

4kb insert

10kb insert 1

10kb insert 2

Large library insert size important!
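
The "fraction of bases at minimum length" curve can be computed directly from a list of read lengths: for each cutoff, sum the bases in reads at least that long and divide by the total. The read lengths below are invented.

```python
def fraction_of_bases_at_min_length(read_lengths, min_length: int) -> float:
    """Share of all sequenced bases that sit in reads >= min_length."""
    total = sum(read_lengths)
    kept = sum(n for n in read_lengths if n >= min_length)
    return kept / total if total else 0.0


lengths = [500, 1500, 3000, 5000]  # hypothetical read lengths in bp
for cutoff in (0, 1000, 3000, 5000):
    print(cutoff, fraction_of_bases_at_min_length(lengths, cutoff))
```

A library with larger inserts shifts this curve to the right, so more of the data ends up in long reads.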

Page 42: Combining PacBio with short read technology for improved de novo genome assembly

PacBio results

64 SMRT Cells

Large Insert Sizes

2.2 gigabases in longest subreads

Largest 15 kbp

3.2 gigabases in raw reads of at least 3 kb

3.8x coverage
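
Coverage is total sequenced bases divided by genome size. The sketch below reads the 3.2 figure on this slide as gigabases (an assumption) and works backwards from the stated 3.8x to the genome size that would imply, then to the amount of data a 10x target would require. The genome size is derived from these two numbers only, not taken from the slides.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


def implied_genome_size(total_bases: float, cov: float) -> float:
    """Genome size consistent with a given yield and coverage."""
    return total_bases / cov


size = implied_genome_size(3.2e9, 3.8)  # genome size implied by the slide
bases_for_10x = 10 * size               # yield needed to reach 10x

print(round(size / 1e6), "Mbp implied genome size")
print(round(bases_for_10x / 1e9, 1), "Gbp needed for 10x")
```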

Page 43: Combining PacBio with short read technology for improved de novo genome assembly

PacBio results

Mapping to the cod genome

11.4 kbp subread

10.9 kbp subread

10.6 kbp subread

Page 44: Combining PacBio with short read technology for improved de novo genome assembly

Example 1

232 bp Gap

ACACAC repeat

TGTGTG repeat

Page 45: Combining PacBio with short read technology for improved de novo genome assembly

Example 1

Page 46: Combining PacBio with short read technology for improved de novo genome assembly

Example 1

Page 47: Combining PacBio with short read technology for improved de novo genome assembly

Example 1

PacBio reads

Scaffold

Unplaced contig

...ACACAC TGTGTG...

Page 48: Combining PacBio with short read technology for improved de novo genome assembly

Example 2

TGTGTG repeat

344 bp Gap

Page 49: Combining PacBio with short read technology for improved de novo genome assembly

Example 2

Page 50: Combining PacBio with short read technology for improved de novo genome assembly

Example 2

PacBio reads

Scaffold

Heterozygosity?

...TGTGTG

Page 51: Combining PacBio with short read technology for improved de novo genome assembly

Example 3

PacBio reads

Scaffold

300 bp misassembly?

Page 52: Combining PacBio with short read technology for improved de novo genome assembly

Error-correction

http://openclipart.org/

Page 53: Combining PacBio with short read technology for improved de novo genome assembly

Outlook

Will PacBio solve our problems?

Page 54: Combining PacBio with short read technology for improved de novo genome assembly

Outlook

Or

Page 55: Combining PacBio with short read technology for improved de novo genome assembly

Outlook

Will we find the heterozygous regions?

Contig 1

Polymorphic contig 2

Polymorphic contig 3

Contig 4

Page 56: Combining PacBio with short read technology for improved de novo genome assembly

Outlook

http://www.pasteur.fr/recherche/unites/Bbi/, en.wikipedia.org and Martin Malmstrøm