www.454.com
Xuemin Liu, PhD
GS FLX+ System
Delivering sequencing reads up to 1,000 bp
IMPORTANT NOTICE
Intended Use
Unless explicitly stated otherwise, all Roche Applied Science
and 454 Life Sciences products and services referenced in
this presentation/document are intended for the following use:
For Life Science Research Only.
Not for Use in Diagnostic Procedures.
Overview
GS FLX+ System Information
GS FLX+ Performance
GS FLX+ Application Data
GS Junior Introduction
454 Sequencing Read Length Evolution
Raising the bar in next-gen sequencing
[Figure: read length (bp) by system and launch year; y-axis 250 to 1,000 bp. GS 20 (2005): 100 bp; GS FLX Standard (2007): 250 bp; GS FLX Titanium (2008); GS FLX+ (2011, new): up to 1,000 bp; GS Junior: 400 bp.]
Introducing the GS FLX+ System
Longer reads, more complete genome coverage
True capillary sequencing read lengths: Longest reads over 1 kb; greater
than 80% of total bases from reads longer than 500 bp
More complete genome coverage: Generate more accurate de novo
assemblies with fewer gaps and better coverage of repeat regions
Long GS FLX+ reads provide accurate scaffolds: Co-assemble with
short reads for more cost-effective de novo assemblies of complex genomes
More comprehensive transcript coverage: Single reads span more exons
for more complete transcript reconstruction
GS FLX+ System
Overview
- Latest generation of the GS FLX System
- Available as new instrument or on-site upgrade to existing instrument
- Choose from new GS FLX Titanium Sequencing Kit XL+ for extra-long read sequencing, or existing GS FLX Titanium Sequencing Kit XLR70
- Uses existing Rapid Library Prep and emPCR kits, with slight protocol modifications for extra-long read XL+ sequencing
- New GS FLX+ Computing Station: off-the-shelf workstation optimized for XL+ long read processing
GS FLX+ Upgrade
How it works
New v2.6 instrument software:
- Sequencing cycles increased from 200 to 400 per run for XL+
Minor instrument modifications:
- New reagent door to accommodate increased buffer volumes
- New chiller adapted for either XL+ or XLR70 reagent cassettes
- Waste ported into carboy outside instrument for easy disposal
New Waste Disposal
GS FLX+ System Typical Performance
Overview
Sequencing Kit        XL+ (new)                                 XLR70
Read Lengths          Up to 1,000 bp                            Up to 600 bp
Modal Read Length     700 bp                                    450 bp
Throughput Profile    85% of total bases from reads >500 bp     85% of total bases from reads >300 bp
                      45% of total bases from reads >700 bp     20% of total bases from reads >500 bp
Typical Throughput    700 Mb                                    450 Mb
Reads per Run         ~1,000,000 (both kits)
Consensus Accuracy*   99.997%                                   99.995%
Run Time              23 hours                                  10 hours
Typical performance. Actual results depend on specific sample and genomic characteristics.
*Consensus accuracy at 15x coverage, E. coli.
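The throughput profile and modal read length above are summary statistics of a run's read-length distribution. A minimal Python sketch of how such figures could be derived from a list of read lengths follows. The function names, the 25 bp bin size, and the toy data are assumptions for illustration; they are not taken from the GS FLX+ software.

```python
from collections import Counter


def fraction_of_bases_above(read_lengths, threshold_bp):
    """Fraction of all sequenced bases contributed by reads longer than threshold_bp."""
    total = sum(read_lengths)
    if total == 0:
        return 0.0
    long_bases = sum(n for n in read_lengths if n > threshold_bp)
    return long_bases / total


def modal_read_length(read_lengths, bin_size=25):
    """Start (in bp) of the most populated length bin; bin_size is an illustrative choice."""
    bins = Counter((n // bin_size) * bin_size for n in read_lengths)
    return max(bins, key=bins.get)


# Hypothetical toy data, not taken from any run described in this document.
lengths = [120, 480, 510, 690, 700, 710, 720, 950]
print(f"{fraction_of_bases_above(lengths, 500):.1%} of bases from reads >500 bp")
print(f"modal read length bin starts at {modal_read_length(lengths)} bp")
```

Note that the profile is weighted by bases, not by read count, so a few very long reads can shift it noticeably.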
GS FLX+ Read Length Distribution
Example Runs: Plant and animal
Budgie Bird (Parakeet)
Mode Read Length = 801 bp
Throughput = 860 Mb
Rice
Mode Read Length = 775 bp
Throughput = 753 Mb
GS FLX+ Read Length Distribution
Example Runs: Human BACs, bacteria
Human BACs
Mode Read Length = 725 bp
Throughput = 772 Mb
Bordetella pertussis
Mode Read Length = 685 bp
Throughput = 790 Mb
GS FLX+ Improvements
Significantly more bases from Sanger-like reads
- Example E. coli sequencing runs on the GS FLX+ System: one run using the XLR70 kit (backward compatibility) and one using the XL+ sequencing kit
- New GS FLX+ generates significantly more bases from Sanger-like reads
(>500 bp in length)
[Figure: Base Count Distribution. Base count (0 to 3,000,000) versus read length (0 to 1,000 bp) for GS FLX+ and GS FLX.]
GS FLX+ System
High accuracy over the long read
Source: Mapped E. coli reads from a GS FLX+ run and Illumina TruSeq run
Illumina TruSeq E. coli data set from http://www.illumina.com/truseq/tru_resources/datasets.ilmn
[Figure: average single-read accuracy (70% to 100%) versus read length (0 to 900 bp) for GS FLX+ and Illumina TruSeq.]
GS FLX+ Applications
De novo Complex Genome Assembly
- Animal – Budgie Bird
- Assembly comparison using XL+ and XLR70
- Hybrid assembly using 454 Systems and Illumina
cDNA/Transcriptome Assembly
Sequence Capture
Shotgun Metagenomics
Application Data
- De novo Genome Sequencing
- Transcriptome Sequencing
- Sequence Capture
- Shotgun Metagenomics
GS FLX+ Applications
Transcriptome Sequencing
Sample: Drosophila larva (gene count: 16,600; 21,784 transcripts)
Question: Do longer read lengths improve transcriptome assembly?
Experimental Design:
- Random-primed Drosophila larva cDNA libraries (from mRNA)
- GS FLX cDNA Rapid Library protocol
- Sequenced using XL+ and XLR70
Analysis: Compared mapping and assembly metrics
cDNA Rapid Library Protocol Overview
[Figure: workflow diagram for the Drosophila larva sample.]
- Fragmentation: 200 ng mRNA fragmented to 50 to 2000 nt
- First strand cDNA synthesis: fragmented mRNA primed with P-N6-primer
- Second strand synthesis: dscDNA with a 3' overhang
- End repair: blunt end / phosphorylation and the addition of overhanging A
- Adaptor ligation: final library with an adaptor at each end
- Enters the Rapid Library Prep Protocol; GS FLX Titanium emPCR amplification and sequencing
Transcriptome Sequencing
Example Read Length Distribution Comparison
[Figure: read count (0 to 14,000) versus read length (0 to 1,000) for GS FLX+ and GS FLX.]
Transcriptome Sequencing
Transcript Coverage Analysis
Transcripts are grouped according to their lengths:
- Small transcripts: ≤ 1.5 kb
- Medium transcripts: 1.5 to 4.5 kb
- Large transcripts: ≥ 4.5 kb
Each transcript is divided into bins of 5% of its length; each bin is mapped back to the reference, and the percentage of mappable hits is reported
The same number of reads from XLR70 and XL+ is compared
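The binning described above can be sketched in Python. This is a minimal illustration, not the analysis pipeline actually used. It assumes alignments are given as half-open (start, end) intervals in transcript coordinates, and that a 5% bin counts as covered when at least one read overlaps it. The function names and toy data are hypothetical.

```python
def size_class(length_bp):
    """Assign a transcript to the size classes used on this slide."""
    if length_bp <= 1500:
        return "small"
    if length_bp < 4500:
        return "medium"
    return "large"


def covered_bins(length_bp, alignments, n_bins=20):
    """Return n_bins booleans ordered 5' to 3'; True if any alignment overlaps the bin.

    alignments: list of half-open (start, end) intervals in transcript coordinates
    (an assumed input format).
    """
    flags = []
    for i in range(n_bins):
        lo = i * length_bp / n_bins
        hi = (i + 1) * length_bp / n_bins
        flags.append(any(start < hi and end > lo for start, end in alignments))
    return flags


def coverage_profile(transcripts, n_bins=20):
    """Percent of transcripts covered in each bin, grouped by size class.

    transcripts: list of (length_bp, alignments) pairs.
    """
    by_class = {}
    for length_bp, alignments in transcripts:
        rows = by_class.setdefault(size_class(length_bp), [])
        rows.append(covered_bins(length_bp, alignments, n_bins))
    return {
        cls: [100.0 * sum(column) / len(rows) for column in zip(*rows)]
        for cls, rows in by_class.items()
    }


# Hypothetical toy data: one large transcript covered only in its 5' half,
# one large transcript covered end to end.
toy = [(5000, [(0, 2500)]), (6000, [(0, 6000)])]
print(coverage_profile(toy)["large"])
```

With 20 bins, each list entry corresponds to one 5% step along the transcript, matching the x-axis of the coverage plots.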
Transcript Coverage
Extra long reads provide much better coverage of long transcripts.
[Figure: three panels comparing GS FLX+ and GS FLX. X-axis: 5' to 3' transcript coverage (5% to 100% of transcript length). Y-axis: % of transcripts with EST evidence (0 to 100). Panels: Small Transcript Coverage (≤ 1.5 kb), Medium Transcript Coverage (1.5 to 4.5 kb), Large Transcript Coverage (≥ 4.5 kb).]
Transcriptome Sequencing
Mapping Results
Mapping to the Chromosome           GS FLX         GS FLX+        Benefit
Number of Reads                     1,557,790 (same for both)
% Reads Mapped                      94%            92%
Number of Bases Mapped              547,105,393    732,429,546    +34%
Gene/Transcript/Exon Detection
% Genes                             86%            86%            -
% Transcripts                       89%            89%            -
% Exons                             83%            85%            +2%
% Avg. Transcript Length Covered    60%            74%            +14%
% Chromosome Coverage
chr2L                               24%            26%            +2%
chr2R                               29%            31%            +2%
chr3L                               25%            25%            +2%
chr3R                               26%            29%            +3%
chr4                                39%            40%            +1%
chrX                                25%            28%            +3%
Extra long reads improve transcriptome mappability
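The Benefit column appears to mix two kinds of difference. The count row (bases mapped) matches a relative change, while the percentage rows match a simple difference in percentage points. The following Python sketch reproduces two of the reported values under that reading. The function names are illustrative.

```python
def relative_change_pct(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


def point_difference(before_pct, after_pct):
    """Difference between two percentages, in percentage points."""
    return after_pct - before_pct


# Number of Bases Mapped, GS FLX versus GS FLX+ (values from the table above).
print(round(relative_change_pct(547_105_393, 732_429_546)))  # 34

# % Avg. Transcript Length Covered, GS FLX versus GS FLX+.
print(point_difference(60, 74))  # 14
```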
Transcriptome Sequencing
Assembly Results: Decrease Ambiguity, Improve Assembly
[Figure: Transcriptome Assembly. Labels: predicted gene locus or transcriptional unit; predicted transcripts; a set of reads assembled into different combinations.]
The fewer isotigs and isogroups, the better the de novo assembly of the transcriptome.
                         GS FLX        GS FLX+       Benefit
Number of Reads          1,557,790 (same for both)
Number of Isogroups      21,323        19,229        -10% (fewer is better)
Number of Isotigs        25,019        24,545        -2% (fewer is better)
Number of Bases          31,702,745    41,016,364    +29%
Average Isotig Size      1,267         1,671         +32%
N50 Isotig Size          1,922         2,565         +33%
Largest Isotig Size      26,377        31,533        +20%
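For readers unfamiliar with the N50 metric in the table: it is the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of the standard definition is shown below. It is not the assembler's own code, and the example lengths are made up.

```python
def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("N50 is undefined for an empty list")


# Hypothetical isotig lengths: total is 100, and 40 + 30 = 70 first reaches half.
print(n50([10, 20, 30, 40]))  # 30
```

Because N50 is weighted by bases, it rises when short isotigs merge into longer ones even if the number of isotigs changes only slightly.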
Transcriptome Sequencing
Transcript Assembly Example
The reference TepII transcript is 4,535 bp in length
The putative transcript assembled from GS FLX+ EST data is an accurate reconstruction of the full-length reference transcript, including the splice junctions
[Figure: Reference, GS FLX, and GS FLX+ transcript tracks.]
Reference Source: Flybase Version 5.18
Transcriptome de novo Assembly
GS FLX+ Advancements
Extra-long reads:
• Increase coverage of full-length transcripts, particularly transcripts with length ≥ 4.5 kb
• Span more exons and thus provide more information on splicing patterns and splice junctions
• Generate more accurate and complete assemblies by reducing ambiguity
• Increase the total bases or genes that can be mapped uniquely to the genome
GS FLX+ Applications
Sequence Capture
Genome Info: Maize (plant); highly repetitive, diploid, and known to have a very complicated sequence. Assembly can be tested on exonic capture by looking for known intronic bases in the reads.
Questions: Can the larger GS FLX+ fragment sizes be captured? What is the value of extra long reads?
Experimental Design: Library prep, NimbleGen array-based capture, XL+ sequencing
Analysis: Reference mapping and comparison to the capture design
NimbleGen Sequence Capture Workflow
[Figure: workflow for Sequence Capture (Array Capture) and SeqCap EZ (Solution Capture).]
- Genomic DNA
- Library Preparation: prepare with next-gen sequencing adaptors
- Hybridization: array probes and target regions
- Capture and Washing
- Amplification and QC: amplify DNA and enrichment QC
- Sequencing: sequencing with GS FLX+ System; sequencing reads
Sequence Capture
Results
GS FLX+ reads span exons and complete intronic regions from one exon to another.
[Figure: depth of coverage across the target region.]
GS FLX+ Advancements
Sequence Capture
Extra-long reads:
• Span exons and complete intronic regions from one exon to another
• Improve mappability, leading to better SNP/trait linkage
• Better resolution of homologs and paralogs
• Improve allele resolution of high ploidy genomes
Shotgun Metagenomics
Gut Microbes
Human discordant twins: obese vs. slim
Gut microbes isolated from both and sequenced
Goal: to study whether gut microbes play a role in human obesity
Long reads improve identification of microbes and provide more species information
GS FLX+ Sequencing Run Example
Metagenomics - Human Gut Microbes
GS FLX+ Demonstrated Applications
De novo Genome Assembly
- Resolve more repeats: Ideal for complex genome assemblies including animal, plant, and bacteria
- Improve assembly quality with fewer scaffolds/contigs at lower coverage
Transcriptome Sequencing
- Cover more exons / splice junctions with single reads
- Improve coverage of long transcripts
- Accurately reconstruct gene models
Sequence Capture
- Extend coverage beyond target regions into intergenic/intronic sequence
Metagenomics (Shotgun)
- Improve sensitivity and specificity of taxonomic assignment
GS FLX+ System
Summary
Longer capillary sequencing-like read lengths
Increased throughput
More complete genome and transcriptome coverage
Better results with improved economics!
GS Junior System
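- An integrated solution from sample to analysis
Day 1: Sample preparation
Day 2: Sequencing
Day 3: Data analysis
Rapid library preparation and emPCR preparation
- Requires minimal laboratory equipment
• emPCR enrichment
• Ready-to-use sequencing reagents
• Overnight sequencing: 10-hour run time
• Data analysis completed in 2 hours
• User-friendly software supports assembly, mapping, and amplicon variant analysis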
Performance Summary
GS Junior System
Throughput          35 million bases shotgun, 24 million bases amplicon (approx.)
Avg. Read Length    400 bases (approx.)
HQ Reads per Run    100,000 shotgun, 70,000 amplicon (approx.)
Accuracy            Q20 read length at 400 bases (99% accuracy at 400 bases)
Run Time            10 hours sequencing, 2 hours data processing
Sample Input        Purified gDNA, amplicons, cDNA, depending on application
*Per-run specifications are for shotgun libraries and can vary based on the organism and genomic content.
Reference organism is E. coli.
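The accuracy row equates Q20 with 99% accuracy. This follows from the standard Phred scale, where Q = -10 · log10(P) and P is the probability that a base call is wrong. A short Python sketch of the conversion follows. The function names are illustrative.

```python
import math


def phred_to_accuracy(q):
    """Per-base accuracy implied by a Phred quality score q."""
    return 1.0 - 10 ** (-q / 10.0)


def error_probability_to_phred(p):
    """Phred quality score for a per-base error probability p (0 < p <= 1)."""
    return -10.0 * math.log10(p)


print(f"Q20 accuracy: {phred_to_accuracy(20):.2%}")
print(f"Phred score for a 1% error rate: {error_probability_to_phred(0.01):.0f}")
```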
GS Junior System
- High Quality Data and Reliable Performance
[Figure: number of reads versus read length (bases).]
One GS Junior System run produces reads from 50 to 600 or more bases in length
Average is in the 330-400 base range
Most reads are in the 450-550 base range
Q20 read length of 400 bases (99% accuracy at 400 bases)
GS Junior System in Your Lab
Next-gen sequencing & analysis in a corner of your lab
[Figure: GS Junior Instrument, GS Junior Titanium Kits, and GS Junior Attendant PC, with 7 ft and 8 ft dimension labels.]
Sequencing Based Assays
            Full 454 Product                              Co-Development
Released    HLA LSR                                       ADME - RainDance
Future      TET2/CBL/KRAS (2011) (Cancer/Leukemia)        Cancer Exons - NimbleGen
            RUNX1 (2011) (Cancer)                         CFTR - Multiplicom
            HLA Registry (2012) (Bone Marrow Transfer)
            HIV resistance CE (2012)
            HCV resistance CE (2012)
Where In the World is GS Junior?
In hundreds of labs worldwide!
Beltsville, Maryland
New York, New York
Atlanta, Georgia
Houston, Texas
Denver, Colorado
La Jolla, California
Corvallis, Oregon
Queensland, Australia
Perth, Australia
Ontario, Canada
Wenzhou, China
Copenhagen, Denmark
Oulu, Finland
Graz, Austria
Marcy L’etoile, France
Munich, Germany
Seoul, Korea
Tokyo, Japan
Auckland, New Zealand
Barcelona, Spain
London, England
Sao Paulo, Brazil
Cape Town, South Africa
Taipei, Taiwan
Quezon City, Philippines
Riyadh, Saudi Arabia
Osaka, Japan
Warsaw, Poland
Tallinn, Estonia
Mountain View, Cali
Branford, Connecticut
Nashville, Tennessee
Geneva, Switzerland
Hong Kong, China
Note: Placements shown are a representative sample.
Disclaimer & Trademarks
For life science research only. Not for use in diagnostic procedures.
454, 454 SEQUENCING, 454 LIFE SCIENCES, EMPCR, GS FLX,
GS FLX TITANIUM, GS JUNIOR, NEWBLER, NIMBLEGEN, and
SEQCAP are trademarks of Roche.
Other brands or product names are trademarks of their respective holders.