Transcript
Page 1: Tomato Genome SL2.50 and Beyond…

Tomato Genome SL2.50 and Beyond…

Surya Saha, Jeremy Edwards and Lukas Mueller

Sol Genomics Network (SGN)

Boyce Thompson Institute, Ithaca, NY

[email protected] @SahaSurya

Slides: http://bit.ly/PAGbld230

https://fanart.tv/movie/196/back-to-the-future-part-iii/

Page 2: Tomato Genome SL2.50 and Beyond…

CHROMOSOMES

SCAFFOLDS

CONTIGS

Gene to Genome – The BIG picture

SCAFFOLD GAPS

CHROMOSOME GAPS

SGN Workshop, PAG 2015

GENES

TM2 (Chr 9)

L2 (Chr 10)

Page 3: Tomato Genome SL2.50 and Beyond…

Tomato Build SL2.40 to SL2.50

Lindsay Shearer

Stephen Stack

Page 4: Tomato Genome SL2.50 and Beyond…

Genome Assembly @NCBI

Contigs

• Components

Tiling Path file (TPF)

• Accession numbers

• Can have nested components

Accession

Golden Path files (AGP)

• Scaffold IDs

• Orientation

• Chromosome from contig AGP

• Chromosome from scaffold AGP

• Scaffold from contig AGP

NCBI
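
As a rough illustration of what an AGP file carries, the sketch below reads tab-delimited AGP rows into component and gap records. It assumes the column layout of the NCBI AGP specification (object, object start, object end, part number, component type, then either component fields or gap fields). The class names, the `parse_agp` function and the example rows are hypothetical and are not taken from the slides or from Bio-GenomeUpdate.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass
class AgpComponent:
    obj: str              # object being built (scaffold or chromosome)
    obj_beg: int
    obj_end: int
    part_number: int
    component_type: str   # e.g. W (WGS contig) or F (finished)
    component_id: str     # accession of the component
    component_beg: int
    component_end: int
    orientation: str      # "+", "-" or an unknown-orientation code


@dataclass
class AgpGap:
    obj: str
    obj_beg: int
    obj_end: int
    part_number: int
    component_type: str   # N (known size) or U (unknown size)
    gap_length: int
    gap_type: str
    linkage: str


def parse_agp(lines: Iterable[str]) -> Iterator[Union[AgpComponent, AgpGap]]:
    """Yield one record per non-comment AGP row."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        head = (f[0], int(f[1]), int(f[2]), int(f[3]), f[4])
        if f[4] in ("N", "U"):
            yield AgpGap(*head, int(f[5]), f[6], f[7])
        else:
            yield AgpComponent(*head, f[5], int(f[6]), int(f[7]), f[8])


# Hypothetical example: contig, gap, contig placed on one scaffold.
EXAMPLE_AGP = [
    "\t".join(["scaffold_1", "1", "500", "1", "W", "contig_a", "1", "500", "+"]),
    "\t".join(["scaffold_1", "501", "600", "2", "N", "100", "scaffold", "yes", "paired-ends"]),
    "\t".join(["scaffold_1", "601", "900", "3", "W", "contig_b", "1", "300", "-"]),
]

for row in parse_agp(EXAMPLE_AGP):
    print(row)
```

The same reader would apply to each AGP level listed on the slide (scaffold from contig, chromosome from scaffold, chromosome from contig), since only the object and component identifiers differ.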

Page 5: Tomato Genome SL2.50 and Beyond…

Jeremy Edwards

https://github.com/solgenomics/Bio-GenomeUpdate

FISH

• Order

• Orientation

• Gap sizes

Tiling Path file (TPF)

Accession

Golden Path files (AGP)

NCBI

Gap extension

Scaffold flip
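
One plausible reading of "Scaffold flip" is reversing a scaffold whose orientation on the chromosome was corrected by FISH. Under that assumption, a flip reverses the order of the scaffold's parts and swaps each component's strand. The sketch below uses a hypothetical `(name, orientation)` tuple representation, not the data structures of Bio-GenomeUpdate.

```python
from typing import List, Tuple

# (component id or gap label, orientation); gaps carry an empty orientation.
Part = Tuple[str, str]

# Only "+" and "-" are swapped; any other code is left untouched.
FLIP = {"+": "-", "-": "+"}


def flip_scaffold(parts: List[Part]) -> List[Part]:
    """Return the parts in reverse order with strands swapped."""
    return [(name, FLIP.get(orient, orient)) for name, orient in reversed(parts)]


# Hypothetical scaffold: two contigs separated by one gap.
scaffold = [("contig_a", "+"), ("gap_1", ""), ("contig_b", "-")]
print(flip_scaffold(scaffold))
```

Flipping twice returns the original layout, which is a quick sanity check for any implementation of this step.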

Page 6: Tomato Genome SL2.50 and Beyond…

Jeremy Edwards

https://github.com/solgenomics/Bio-GenomeUpdate

SL2.40 Annotation

• SL2.40 AGP

• SL2.50 AGP

• SL2.40 GFF3

SL2.50 Annotation

• SL2.50 GFF3

• Validated via Fasta

Errors corrected

• Start/end coordinates in different scaffolds

• Start > end coordinates for UTRs

• Start or end coordinates in gap region

• Dropped Solyc03g053140.1 and Solyc12g032910.1
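
The three error classes listed above can be expressed as simple coordinate checks. The sketch below is a minimal illustration, assuming 1-based inclusive coordinates and sorted, non-overlapping scaffold and gap intervals on one chromosome. The function names, interval labels and example coordinates are hypothetical and do not describe the actual validation code used for the SL2.50 GFF3.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (start, end, label), 1-based inclusive, sorted by start, non-overlapping.
Interval = Tuple[int, int, str]


def find_interval(pos: int, intervals: List[Interval]) -> Optional[str]:
    """Return the label of the interval containing pos, or None."""
    starts = [start for start, _, _ in intervals]
    i = bisect_right(starts, pos) - 1
    if i >= 0 and intervals[i][0] <= pos <= intervals[i][1]:
        return intervals[i][2]
    return None


def check_feature(start: int, end: int,
                  scaffolds: List[Interval],
                  gaps: List[Interval]) -> List[str]:
    """List the problems found for one feature's coordinates."""
    problems = []
    if start > end:
        problems.append("start > end")
    if find_interval(start, gaps) is not None or find_interval(end, gaps) is not None:
        problems.append("coordinate in gap")
    s_scaf = find_interval(start, scaffolds)
    e_scaf = find_interval(end, scaffolds)
    if s_scaf is not None and e_scaf is not None and s_scaf != e_scaf:
        problems.append("start and end in different scaffolds")
    return problems


# Hypothetical layout: two scaffolds separated by a 100 bp gap.
scaffolds = [(1, 1000, "scaffold_A"), (1101, 2000, "scaffold_B")]
gaps = [(1001, 1100, "gap_1")]

print(check_feature(900, 1200, scaffolds, gaps))
```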

Page 7: Tomato Genome SL2.50 and Beyond…

SL2.50 Availability

JBrowse

FTP Site

SGN Locus/Gene Pages

NCBI

Page 8: Tomato Genome SL2.50 and Beyond…

SL2.50 Genome Release

Genome build 2.5 Fasta + ITAG 2.4 GFFs

CHADO

FTP site

Website

JBrowse

Blast DBs

Page 9: Tomato Genome SL2.50 and Beyond…

State of the SL2.50 Build

[Figure: bar chart for chromosomes 0 to 12; vertical axis from 0 to 120,000,000]

Page 10: Tomato Genome SL2.50 and Beyond…

State of the SL2.50 Build

[Figure: stacked bar chart for chromosomes 0 to 12; vertical axis from 0% to 100%. Legend: Sequence, Scaffold gap length, Component gap length]

Page 11: Tomato Genome SL2.50 and Beyond…

State of the SL2.50 Build

[Figure: stacked bar chart for chromosomes 0 to 12; vertical axis from 0% to 100%. Legend: Sequence, Scaffold gap length, Component gap length]

Length 823Mb

Sequence 737Mb

Component gaps 43Mb (5.30%)

Scaffold gaps 42Mb (5.17%)

Total gaps 86Mb (10.47%)
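
The totals above fit together arithmetically: length minus sequence gives the total gap length. The sketch below recomputes the gap fraction from the rounded megabase figures on the slide. Because the inputs are rounded, the result only approximates the slide's percentages, which were presumably derived from exact base counts. The helper name `percent_of` is hypothetical.

```python
def percent_of(part: float, total: float) -> float:
    """Return part as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Rounded values from the slide, in Mb.
total_mb = 823.0
sequence_mb = 737.0

# Length minus sequence gives the total gap length (86 Mb, as on the slide).
total_gap_mb = total_mb - sequence_mb

print(f"Total gaps: {total_gap_mb:.0f} Mb, "
      f"about {percent_of(total_gap_mb, total_mb):.2f}% of the build")
```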

Page 12: Tomato Genome SL2.50 and Beyond…

https://fanart.tv/movie/196/back-to-the-future-part-iii/

Page 13: Tomato Genome SL2.50 and Beyond…

BAC Resources

Page 14: Tomato Genome SL2.50 and Beyond…

BAC Resources

Bruce Roe

HTGS Phase 1: 332

HTGS Phase 2: 520

HTGS Phase 3: 2751

http://www.ncbi.nlm.nih.gov/genbank/htgs/faq

Page 15: Tomato Genome SL2.50 and Beyond…

HTGS Phase 3 BACs

Chr 0 53

Chr 1 589

Chr 2 248

Chr 3 137

Chr 4 147

Chr 5 117

Chr 6 104

Chr 7 111

Chr 8 249

Chr 9 119

Chr 10 620

Chr 11 100

Chr 12 86

Unknown 84

Page 16: Tomato Genome SL2.50 and Beyond…

Jeremy Edwards

https://github.com/solgenomics/Bio-GenomeUpdate

BAC assemblies

• Phrap

• ACE files

BAC sets

• Assembled BACs

• Singleton BACs

Align to SL2.50

• Nucmer

• 100bp word size

• 500bp minimum alignment

• 99% identity

Novel sequences

• Extensions

• Gap coverage
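
The alignment thresholds on this slide (500bp minimum alignment, 99% identity) amount to a filter over alignment records. The sketch below applies that filter to already-parsed alignments. The `Alignment` record, its field names and the example values are hypothetical; the code does not run Nucmer or parse its output.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Alignment:
    query_id: str     # BAC or BAC contig
    ref_id: str       # reference chromosome
    length: int       # alignment length in bp
    identity: float   # percent identity, 0 to 100


def filter_alignments(alignments: Iterable[Alignment],
                      min_length: int = 500,
                      min_identity: float = 99.0) -> List[Alignment]:
    """Keep alignments that meet both thresholds."""
    return [a for a in alignments
            if a.length >= min_length and a.identity >= min_identity]


# Hypothetical alignments of BAC sequences to the reference.
candidates = [
    Alignment("bac_1", "chr01", 12000, 99.8),   # kept
    Alignment("bac_2", "chr01", 300, 100.0),    # too short
    Alignment("bac_3", "chr02", 8000, 97.5),    # identity too low
]

for aln in filter_alignments(candidates):
    print(aln.query_id, aln.ref_id, aln.length, aln.identity)
```

BAC regions left without a passing alignment would then be the candidates for the novel sequences named on the slide (extensions and gap coverage).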

Page 17: Tomato Genome SL2.50 and Beyond…

HTGS Phase 3 BACs

[Figure: bar chart for chromosomes 1 to 12; vertical axis from 0 to 700]

Page 18: Tomato Genome SL2.50 and Beyond…

Phrap Assembly (HTGS Phase 3 BACs)

[Figure: stacked bar chart for chromosomes 1 to 12; vertical axis from 0% to 100%. Legend: Assembled BACs, Singleton BACs]

Page 19: Tomato Genome SL2.50 and Beyond…

Phrap Assembly (HTGS Phase 3 BACs)

Chr10 Contig68 10 BACs (242Kb!!)

Chr2 Contig185 7 BACs (566Kb!!)

Page 20: Tomato Genome SL2.50 and Beyond…

Future Work

• Manually examine assembled BAC contigs with < 99% identity

• Evaluate HTGS phase 2 BACs

• Use PCR walking to close gaps

• Create TPF files for SL3.0

• Annotate SL3.0 and lift over annotations from SL2.50

Page 21: Tomato Genome SL2.50 and Beyond…

Acknowledgements

Page 22: Tomato Genome SL2.50 and Beyond…
