Frédéric CHOULET A pseudomolecule of 774 Mb: the 3B experience INRA GDEC – Clermont-Ferrand, France

Dec 14, 2015

Page 1:

Frédéric CHOULET

A pseudomolecule of 774 Mb: the 3B experience

INRA GDEC – Clermont-Ferrand, France

Page 2:

3B

Sequenced physical map

#MTP BACs 8452

3B MTP-BAC sequencing

#BAC pools 922

#Roche 8 kb MP lib. 922

bp coverage (Roche/454) 36x

BAC-ends (Sanger) 42,551

Whole Genome Prof. tags 327,282

Whole 3B shotgun (Illumina) 82x

Page 3:

3B physical map

3B

900 Mb

Physical map

#BACs 132,000 (19x)

#BAC-contigs 1282

#MTP-BACs 8452

Page 4:

ACGTAGACTACA

3B

Assembly and scaffolding

3B-v1

16,136 scaff

1,040 Mb

Page 5:

o Curation of the scaffolding

V. Barbe, S. Mangenot (Genoscope)

Assembly and scaffolding

Integration of BAC-end match positions

Parsing of MP read positions

[Figure: scaffolds scaff00001, scaff00013, scaff00024, scaff00008, scaff00011, scaff00007, scaff00005]

3B-v1

16,136 scaff

1,040 Mb

18% Ns

3B-v3

4,999 scaff

992 Mb

13% Ns

Page 6:

Assembly and scaffolding

o Curation of the scaffolding

V. Barbe, S. Mangenot (Genoscope)

3B-v1

16,136 scaff

1,040 Mb

18% Ns

3B-v3

4,999 scaff

992 Mb

13% Ns

o Gap filling

o Seq. error corrections

JM. Aury, A. Couloux (Genoscope)

Illumina reads: whole 3B shotgun

109,914 gaps filled

126,290 bases corrected (error rate: 0.1%)

3B-v4

-

-

8% Ns
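
The "% Ns" figure tracked across 3B-v1, 3B-v3 and 3B-v4 is the share of unknown bases in the scaffolds. The slides do not show how it was computed; the following is a minimal sketch assuming it is simply the count of N characters divided by total scaffold length. The function name `n_percent` and the toy sequences are illustrative and are not 3B data.

```python
def n_percent(sequences):
    """Return the percentage of N (unknown) bases across all sequences."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        return 0.0
    # Count both upper- and lower-case Ns (soft-masked input is an assumption).
    n_count = sum(seq.upper().count("N") for seq in sequences)
    return 100.0 * n_count / total


# Toy scaffolds (hypothetical, for illustration only).
toy_scaffolds = ["ACGTNNNNAC", "NNACGTACGT"]
print(f"{n_percent(toy_scaffolds):.0f}% Ns")
```

Gap filling lowers this value because each filled gap replaces a run of Ns with called bases.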

Page 7:

Assembly and scaffolding

o Curation of the scaffolding

V. Barbe, S. Mangenot (Genoscope)

o Gap filling

o Seq. error corrections

JM. Aury, A. Couloux (Genoscope)

o Redundancy removal and scaffold merging

S. Theil (INRA GDEC)

3B-v1

16,136 scaff

1,040 Mb

3B-v4

4,999 scaff

992 Mb

redundancy: 160 Mb

scaffAssembler.pl

[Figure: Pool_A, Pool_B, ctg1, ctg2]

3B-v443

2,808 scaff

833 Mb

Page 8:

Search for shared TE-junctions

Page 9:

Ordering scaffolds

3B-v443

2,808 scaff

833 Mb

o SNP discovery

SureSelect® seq. capture (E. Paux, N. Cubizolles, E. Rey)

52,265 baits

isbpProbeDesign.pl

DNA captured from 10 genotypes

39,077 SNPs

[Figure: bait, TE, gene]

Page 10:

Ordering scaffolds

o SNP discovery

o Genotyping mapping pop

3,075 SNPs

Genetic mapping (P. Sourdille)

• Anchor map: 384 indiv Cs x Renan

+ Neighbor map: 3865 markers

LD mapping (F. Balfourier)

• 367 lines from a core-collection

Page 11:

3B

Ordering scaffolds

genetic map

44.8 cM

152 scaffolds

LD map

19 LD blocks

366 bins

0 cM

133 cM

554 bins

Page 12:

64 markers at the same genetic position

Linkage Disequilibrium

Page 13:

Ordering scaffolds

o SNP discovery

o Genotyping mapping pop

o Integration of phys. map info

pseudomolBuilder.pl

pseudomolecule

1358 scaff

774 Mb

93%

unlocalized

1450 scaff

59 Mb

7%

N N N N N N N N
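
The 93% / 7% split follows directly from the scaffold sizes on this slide. As a quick check, using only the figures quoted on the slides (variable names are illustrative):

```python
# Figures quoted on the slides.
pseudomolecule = {"scaffolds": 1358, "mb": 774}
unlocalized = {"scaffolds": 1450, "mb": 59}

# Totals should match the 3B-v443 assembly (2,808 scaffolds, 833 Mb).
total_scaffolds = pseudomolecule["scaffolds"] + unlocalized["scaffolds"]
total_mb = pseudomolecule["mb"] + unlocalized["mb"]

# Share of the assembled sequence in each category.
placed_pct = 100 * pseudomolecule["mb"] / total_mb
unlocalized_pct = 100 * unlocalized["mb"] / total_mb

print(total_scaffolds, total_mb)
print(f"placed: {placed_pct:.0f}%, unlocalized: {unlocalized_pct:.0f}%")
```

Note that the split is by sequence length: the unlocalized set holds slightly more than half of the scaffolds but only 59 Mb, so these scaffolds are much shorter on average.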

Page 14:

3cM 0 1 2 3 3 4 5 6

A B C D E

o orientation unknown: 48% of the seq.

o micro-order unknown: 554 bins / 1358 scaff

Future Improvements

o RH map

o Optical map

o Long reads
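
The two limitations noted on this slide (unknown orientation, unknown micro-order within a bin) follow from ordering scaffolds by genetic position alone. The sketch below is a toy illustration of that idea, not a reconstruction of pseudomolBuilder.pl, whose internals the slides do not describe. The function `build_pseudomolecule`, the spacer length and the example scaffolds are all assumptions made for illustration.

```python
from itertools import groupby

GAP = "N" * 10  # hypothetical spacer between scaffolds; real length not given


def build_pseudomolecule(scaffolds, gap=GAP):
    """Order scaffolds by genetic position and join them with N spacers.

    scaffolds: list of (name, position_cM, sequence) tuples.
    Returns (sequence, bins), where bins is a list of
    (position_cM, [scaffold names]) in map order.
    """
    # Stable sort: scaffolds at the same position keep their input order,
    # which is arbitrary. This is the "micro-order unknown" problem.
    ordered = sorted(scaffolds, key=lambda s: s[1])
    bins = [
        (pos, [name for name, _, _ in group])
        for pos, group in groupby(ordered, key=lambda s: s[1])
    ]
    # Sequences are joined as given: genetic position alone says nothing
    # about strand, which is the "orientation unknown" problem.
    sequence = gap.join(seq for _, _, seq in ordered)
    return sequence, bins


# Toy input (hypothetical names, positions and sequences).
toy = [("scaffC", 2.0, "GGGG"), ("scaffA", 0.0, "AAAA"), ("scaffB", 2.0, "CCCC")]
seq, bins = build_pseudomolecule(toy, gap="NN")
print(seq)
print(bins)
```

Here scaffC and scaffB share a bin, so their relative order in the output only reflects input order. Resolving it needs extra evidence of the kind listed under Future Improvements.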

Page 15:

Annotation

774 Mb

TRIANNOT

• 7264 protein coding genes

CLARI-TE

• 234,606 TEs
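
For scale, the annotation counts can be turned into average densities over the 774 Mb pseudomolecule. This is plain arithmetic on the slide figures; the averages say nothing about how genes and TEs are distributed along the chromosome.

```python
# Figures quoted on the slide.
pseudomolecule_mb = 774
protein_coding_genes = 7264
transposable_elements = 234_606

# Average number of annotated features per megabase.
genes_per_mb = protein_coding_genes / pseudomolecule_mb
tes_per_mb = transposable_elements / pseudomolecule_mb

print(f"{genes_per_mb:.1f} genes per Mb")
print(f"{tes_per_mb:.1f} TEs per Mb")
```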

Page 16:

Bioinformatics

Assembly

Newbler

Scaffolding/pseudomolecule construction

scaffAssembler.pl

gapCloser

ssrFinishing

pseudomolBuilder.pl

isbpProbeDesign.pl

Annotation

triAnnot (new modules: filtering, pseudogenes, transfer annotation)

clari-TE & clari-TE-lib

Data management

gowDB (Bio::DB::SeqFeature::Store)

Gbrowse @ URGI

Page 17:

Acknowledgments

Sébastien Theil, Natasha Glover, Josquin Daron, Lise Pingault

Pierre Sourdille, Etienne Paux

Philippe Leroy

Jacques Le Gouis, Nicolas Guilhot

Aurélien Bernard

Nelly Cubizolles

Catherine Feuillet

François Balfourier

Hélène Rimbert

M. Alaux, L. Couderc, V. Jamilloux, H. Quenesville

URGI

H. Berges, A. Bellec

CNRGV

C. Gaspin

BIA

A. Alberti, V. Barbe, J. Poulain, C. Durand, S. Mangenot, JM. Aury, A. Couloux, P. Wincker

Genoscope

J. Dolezel, J. Safar

IEB

K. Vandepoele

VIB

K. Mayer et al.

MIPS

P. Schnable, S. Rounsley, D. Ware

SAB

J. Rogers, M. Caccamo et al.

TGAC

J. Rogers, K. Eversole