Genome Assembly at JGI Alicia Clum Genomic Technologies Workshop JGI User Meeting March 22, 2016

Genome Assembly at JGI

Oct 29, 2021

Page 1: Genome Assembly at JGI

Genome Assembly at JGI

Alicia Clum Genomic Technologies Workshop JGI User Meeting March 22, 2016

Page 2: Genome Assembly at JGI

Outline

• Overview
• Improving assemblies with long read technology
• Future improvements

3/23/16 2

Page 3: Genome Assembly at JGI

Outline

• Overview
• Improving assemblies with long read technology
• Future improvements


Page 4: Genome Assembly at JGI

Genome assembly review


Genomic DNA → Fragmentation → Library creation → Sequencing → Assemble reads

Page 5: Genome Assembly at JGI

Overview of assembly at JGI

Program      Size (MB)    Libraries  Assembler                     Target assemblies / year
Microbe      5            1          SPAdes/HGAP                   1,330
Fungi        10s          1          ALLPATHS-LG/Falcon            160
Plant        100-10,000   3+         Arachne/ALLPATHS-LG/Falcon    20
Metagenome   10-10,000    1          MEGAHIT                       825

Page 6: Genome Assembly at JGI

Challenges in genome assembly

• Repeat content
• Genome size
• GC content
• DNA quality and quantity
• Ploidy

[Figure: Fungal Repeat Content vs Genome Size (MB). x-axis: Genome Size (MB); y-axis: Repeat Content.]

• 37 MB median genome size
• 9% median repeat content

Page 7: Genome Assembly at JGI

Making assemblies better

Page 8: Genome Assembly at JGI

Outline

• Overview
• Improving assemblies with long read technology
• Future improvements


Page 9: Genome Assembly at JGI

Microbial drafts- number of contigs by data type

[Figure: Number of contigs (y-axis) by data type (x-axis).]

• Illumina fragment: median = 43, N = 1203
• PacBio 10kb: median = 2, N = 216

Page 10: Genome Assembly at JGI

Overview of Assembly at JGI

Program      Size (MB)    Libraries  Assembler                     Target genomes / year
Microbe      5            1          SPAdes/HGAP                   1,330
Fungi        10s          1          ALLPATHS-LG/Falcon            160
Plant        100-10,000   3+         Arachne/ALLPATHS-LG/Falcon    20
Metagenome   10-10,000    1          MEGAHIT                       825

Page 11: Genome Assembly at JGI

Timeline - PacBio for fungal genomes

Feb. - First Illumina/PacBio hybrid release (APLG)

2012 2013

May - First PacBio only release (HBAR-DTK)

2014

July – Falcon development begins

summer – JGI Falcon testing begins, first good diploid assemblies

July – daligner work begins

2015

Jan. – Falcon incorporates daligner

Oct. – First Falcon assembly to annotation

Summer – Validated switch to PacBio for fungal assemblies for FY 2016

2016

Page 12: Genome Assembly at JGI

Can a single PacBio library approach produce better fungal assemblies?

Genome                    Size (MB)  Repeat Content (%)  Ploidy
Clavicorona pyxidata      43         14                  diploid
Byssothecium circinans    48         15                  haploid
Clathrospora elynae       45         47                  haploid
Lindgomyces ingoldianus   66         20                  diploid

4 fungal genomes (~5 µg DNA each), each assembled two ways:

• Illumina: 1 fragment library + 1 4kb mate-pair library, assembled with ALLPATHS-LG
• PacBio: 10 kb AMPure library, assembled with Falcon

Image Credit: Laszlo Nagy, Manfred Binder, Pedro Crous, David Culley

Page 13: Genome Assembly at JGI

PacBio assemblies have fewer contigs

[Figure: Number of Contigs. Bar chart of contigs (N), axis 0 to 2500, by genome (Clavicorona pyxidata, Byssothecium circinans, Clathrospora elynae, Lindgomyces ingoldianus), PacBio vs. Illumina.]

Page 14: Genome Assembly at JGI

PacBio assemblies produce longer contigs

[Figure: Contig L50. Bar chart of contig L50 (kb), axis 0 to 800, by genome (Clavicorona pyxidata, Byssothecium circinans, Clathrospora elynae, Lindgomyces ingoldianus), PacBio vs. Illumina.]
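The contig statistics compared in these charts can be computed from a list of contig lengths. The sketch below is illustrative only and is not the JGI pipeline. It assumes "Contig L50" on the slide means the contig length at which half of the assembled bases sit in contigs of that length or longer (a quantity often called N50 elsewhere); that reading is an assumption. The function name `contig_stats` and its dictionary keys are hypothetical.

```python
from typing import Dict, Iterable


def contig_stats(lengths: Iterable[int]) -> Dict[str, int]:
    """Summarize an assembly from its contig lengths (in bases).

    Assumption: the "half point" is the first contig, walking from longest
    to shortest, at which the running total reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)  # longest contig first
    total = sum(ordered)

    running = 0
    length_at_half = 0   # contig length at the half point
    contigs_to_half = 0  # how many contigs were needed to get there
    for count, length in enumerate(ordered, start=1):
        running += length
        if running * 2 >= total:
            length_at_half = length
            contigs_to_half = count
            break

    return {
        "contigs": len(ordered),
        "total_bp": total,
        "length_at_half": length_at_half,
        "contigs_to_half": contigs_to_half,
    }
```

Fewer contigs together with a larger `length_at_half` indicates a more contiguous assembly, which is the pattern the PacBio bars show relative to Illumina.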

Page 15: Genome Assembly at JGI

PacBio assemblies are larger

• Larger assembled genome sizes reflect additional repeat content being assembled

[Figure: Assembled Genome Size. Bar chart of assembled size (MB), axis 0 to 80, by genome (Clavicorona pyxidata, Byssothecium circinans, Clathrospora elynae, Lindgomyces ingoldianus), PacBio vs. Illumina.]

Page 16: Genome Assembly at JGI

PacBio assembles more repeat content

[Figure: Percent of Assembled Genome Repeat Masked. Bar chart of masked sequence (%), axis 0 to 60, by genome (Basme, Boled, Hesve, Lacbi, Lizem, Pirfi), PacBio vs. Illumina.]

Median difference of 7% in the amount of sequence masked between Illumina and PacBio assemblies

Data courtesy of the fungal annotation team
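A rough sketch of how a masked-sequence percentage and a median difference between two sets of assemblies could be computed. The slides do not say how the annotation team produced these numbers, so everything here is an assumption: the sketch assumes soft-masked sequences (repeats in lowercase) and assumes "median difference" means the median of per-genome differences. The function names `percent_masked` and `median_paired_difference` are hypothetical.

```python
from statistics import median
from typing import Iterable, Sequence


def percent_masked(sequences: Iterable[str]) -> float:
    """Percent of bases that are soft-masked (lowercase) across all sequences."""
    total = 0
    masked = 0
    for seq in sequences:
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return 100.0 * masked / total if total else 0.0


def median_paired_difference(pacbio: Sequence[float],
                             illumina: Sequence[float]) -> float:
    """Median of per-genome differences (PacBio minus Illumina).

    Both inputs must list the same genomes in the same order.
    """
    if len(pacbio) != len(illumina):
        raise ValueError("inputs must have one value per genome, in the same order")
    return median(p - i for p, i in zip(pacbio, illumina))
```

Taking the median of paired differences keeps each genome compared against itself, so a single repeat-rich genome does not dominate the summary.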

Page 17: Genome Assembly at JGI

PacBio-only assembly now implemented for fungal assembly

Genomic DNA → Random fragmentation → Library creation → Sequencing → Assemble reads

• Illumina: short insert fragment (270bp); paired-end short insert reads (millions)
• PacBio: long fragment (10kb); long reads (~100,000)

Page 18: Genome Assembly at JGI

Outline

• Overview
• Improving assemblies with long read technology
• Future improvements


Page 19: Genome Assembly at JGI

Courtesy: Jason Chin

Page 20: Genome Assembly at JGI

Courtesy: Jason Chin

(Clavicorona pyxidata HHB10654)

Managed to phase >50% of the genome. JGI data with current Falcon is at < 25%.

Page 21: Genome Assembly at JGI

Conclusions

•  Assembly pipelines vary by program and input data

•  Long read technology and assembly algorithm development have improved assembly results

•  Continued efforts for further improvements

Page 22: Genome Assembly at JGI

Acknowledgments


JGI
• Alex Copeland
• Igor Grigoriev & Fungal Annotation Group
• Chris Daum & Sequencing Technologies Group
• Genome Assembly & QA/QC Groups

Pacific Biosciences
• Jason Chin
• Paul Peluso
• David Rank
• Kristi Spittle

Page 23: Genome Assembly at JGI

Supplement


Page 24: Genome Assembly at JGI

Long Reads Span Common Repetitive Elements


[Figure: PacBio Read Length Distribution. Example for the input data: length distribution of the pre-assembled reads for assembly (accuracy > 99%), annotated with common repeat element lengths: transposons, 45S rDNAs, retrotransposons.]

Methods for pre-assembly consensus: S. Koren, et al., Genome Biology 2013, 14:R101; C.-S. Chin, et al., Nature Methods 10, 563–569 (2013).

Page 25: Genome Assembly at JGI

>10kb AMPure Subread Lengths

L50 subread lengths range from 3.3 kb-6.5 kb

Page 26: Genome Assembly at JGI

Evaluating Assemblers
