Ngs intro_v6_public
Post on 10-May-2015
An Introduction to NGS (Next Generation Sequencing)
François Paillier - 22/02/2011
Plan
• [ Reminder about Sanger Sequencing ]
• NGS Definition
• Overview of NGS technologies
• NGS Applications & examples
• Conclusion
NOT discussed here: sequence accuracy, assembly and sampling; NGS data analysis & bioinformatics tools
A word about Sanger Sequencing (first generation sequencing machine; video)
Still a gold standard, but capillary sequencing has reached its technical limits (costs and performance will remain unchanged)
[Figures: 3730xl sequencer; principle of the method (only the G + dideoxyG tube shown); from gel to capillary]
Short Reminder about « Classical » Assembly projects
[Diagram labels: Target genome; Cloning; Sample libraries; Clone selection & sequencing; SubTargets (BACs, cosmids, ...); n sequencing sub-projects; Assembly; Finishing: Draft (Q40); Annotation; Annotated genome. Other strategy: WGS]
Sequencing, what for? Assembly projects, for example
In bioinformatics, sequence assembly refers to aligning and merging fragments of
a much longer DNA sequence in order to reconstruct the original sequence. This
is needed because DNA sequencing technology cannot read whole genomes in one go,
but only small pieces of between 20 and 1000 bases, depending on the technology
used. Typically the short fragments, called reads, result from shotgun sequencing
of genomic DNA or gene transcripts (ESTs).
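To make the "aligning and merging" idea concrete, here is a minimal sketch of a greedy overlap assembler. It is a toy under stated assumptions, not how production assemblers work: reads are error-free, all come from the same strand, and no read is contained inside another. The function names and the `min_len` parameter are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences sharing the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:  # nothing left to merge: what remains are the contigs
            return contigs
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]


reads = ["AGCTTAGG", "TAGGCATC", "CATCCGA"]
print(greedy_assemble(reads))
```

Reads that share no overlap of at least `min_len` bases are returned as separate contigs, which is the situation a scaffold with gaps describes.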
[Diagram labels: Target genome; Sequencing; reads; Assembly; Assembled reads (4X local coverage); Consensus; scaffold with gaps]
Vocabulary that should be kept in mind in the sequencing field
• Assembly: the result of clustering sequences based on their local similarity
• Contig: a set of overlapping DNA segments
• Coverage (in sequencing): the mean number of times a nucleotide is sequenced in a genome (example: 10X coverage)
• Scaffold: a series of contigs that are in the right order but not necessarily connected in one contiguous stretch
• Mate pairs: sequences known to be at the 3′ and 5′ ends of a contig from a single clone
• WGS = Whole Genome Shotgun sequencing strategy
• ESS = Environmental Shotgun Sequencing
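The coverage definition above can be turned into a small calculation: mean coverage is the total number of sequenced bases divided by the genome size. The sketch below assumes all reads have the same length; the 5 Mb genome size is a hypothetical figure chosen for illustration, and the function names are not from the slides.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Mean number of times each base is sequenced (e.g. 10.0 means 10X)."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


GENOME_SIZE = 5_000_000  # hypothetical 5 Mb genome
print(mean_coverage(125_000, 400, GENOME_SIZE))
print(reads_needed(10, 400, GENOME_SIZE))
```

This is a mean only: local coverage varies along the genome, so some regions end up well below the average.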
NGS = Next Generation Sequencing
After PCR, THE new revolution in Biology?
First Generation: Sanger sequencing
Second Generation: NGS = Massively Parallel Sequencing
Third Generation: NGS = HTS, Single Molecule Sequencing
A synonym for NGS is High-throughput Sequencing (HTS)
Overview of current NGS technologies (second generation sequencing machines)
Year: 2005*, 2006, 2007
• Roche, 454 GS-FLX Titanium (Protocol a must)
• Illumina, GA1 then GA2
• Applied Bio., SOLiD v3
Each machine has different:
- Throughput
- Sequence accuracy
- Data formats (and programs)
*NGS “proof of principle” was done in 2000 by Lynx Therapeutics, which published and marketed "MPSS", a parallelized, adapter/ligation-mediated, bead-based sequencing technology, launching "next-generation" sequencing.
[Figure: Throughput per Illumina channel]
HOW is it possible?
NGS Principle
Building sequencing devices at nanoscale
Polony: discrete clonal amplifications of a single DNA molecule, grown in a gel matrix. The clusters can then be individually sequenced, producing short reads. Polony-based sequencing is the basis of most second generation sequencers.
A typical NGS workflow is:
1) Library construction
2) Template CLONAL amplification
3) Massively PARALLEL sequencing
High parallelism is achieved in polony sequencing
[Figure: Polony vs. Sanger]
Generation of Polony array: DNA Beads (454, SOLiD)
• DNA beads are generated using emulsion PCR
• DNA beads are placed in wells
Sequencing: Pyrosequencing (454)
[Figure: DNA polymerase; « pyrogram » / « flowgram »]
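As an illustration of how a flowgram is read, the sketch below converts a list of light-signal intensities into bases. It assumes nucleotides are flowed in a repeating order (`"TACG"` is used as the default here and is an assumption of this example) and that the signal is roughly proportional to the number of identical bases incorporated in that flow. The function name and the simple rounding rule are hypothetical simplifications.

```python
def decode_flowgram(signals: list[float], flow_order: str = "TACG") -> str:
    """Turn per-flow signal intensities into a base sequence."""
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        count = int(signal + 0.5)  # round to the nearest homopolymer length
        bases.append(nucleotide * count)
    return "".join(bases)


signals = [1.02, 0.05, 2.10, 0.97, 0.03, 1.05, 0.0, 3.4]
print(decode_flowgram(signals))

# A small change in signal on a long homopolymer changes the called length:
print(decode_flowgram([3.4]), decode_flowgram([3.6]))
```

Because the length of a homopolymer is inferred from signal strength, a noisy signal on a long run of identical bases yields one base too many or too few, which appears as an insertion or deletion in the read.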
454 Process: Emulsion PCR & Pyrosequencing
Titanium =
• Read lengths approx. 400 nt
• 1 million reads / run
• 400 Mb / day
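The Titanium figures above are linked by simple arithmetic: yield is the number of reads multiplied by the read length. A minimal sketch, assuming every read reaches the quoted approximate length (the function name is hypothetical):

```python
def run_yield_mb(n_reads: int, read_length_nt: int) -> float:
    """Total sequenced bases for one run, in megabases (Mb)."""
    return n_reads * read_length_nt / 1_000_000


# 1 million reads of approx. 400 nt
print(run_yield_mb(1_000_000, 400))
```

The same function can be applied to the read counts and read lengths quoted for other platforms to check whether the stated throughput figures are consistent with each other.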
VIDEOs
About Pyrosequencing (1’53’’): <here>
Summary about GS FLX (4’34’’): <click here>
454 GS FLX Titanium
Pros:
- No more cloning step
- From purified DNA to sequencing
- Fits on the laboratory bench top / small
- LONG sequences (400 nt)
- GS Junior system not so expensive
- Capabilities: multiplexing & paired-ends
- Well fitted to: prokaryotic genome sequencing, RNA-seq
Cons:
- Sequence accuracy not so high (especially in homopolymers); main error type is indel
- Cost: approx. 20K€ / Gb. Cost per base is cheaper than Sanger but still high compared with other NextGen machines
Illumina*: Bridge PCR
GA2x version =
• Read lengths approx. 100 nt
• 240 million reads
• 1500 Mb / day
• 30000 Mb / run
Generation of Polony array: Bridge-PCR (Solexa)
DNA fragments are attached to the array and used as PCR templates
<Watch VIDEO: Related Links Video: Genome Analyzer workflow Panel technology>
A flow cell: 8 lanes
Illumina chemistry: 4-color DNA sequencing-by-synthesis using reversible terminators with removable fluorescent dyes
[Figures: Illumina seq. accuracy; Illumina throughput]
Illumina
Pros:
- No more cloning step
- From purified DNA to sequencing
- Fits on the laboratory bench top / small
- Good sequence accuracy
- Capabilities: multiplexing & paired-ends
- Cost: approx. 2K€ / Gb; cost per base is cheaper than 454
- Well fitted to: prokaryotic genome sequencing, RNA-seq, ChIP-Seq, Methyl-Seq
- Most widely used NGS platform
- Requires least DNA
Cons:
- Machine is very expensive
- Main error type is mismatch
- Read lengths are still too short: not fitted to big genomes (repeats)
- Poor coverage of AT-rich regions
SOLiD system: 4-color DNA Sequencing by Ligation
SOLiD v3 =
• Read lengths approx. 50 nt
• 400 million reads
• 1500 Mb / day
• 20000 Mb / run
• 1500€ / Gb
<Watch Video> (4’46’’)
Sequencing by ligation reaction: fluorescently labeled nucleotides (ABI SOLiD)
Complementary strand elongation: DNA ligase
Sequencing by ligation (ABI SOLiD): 5 reading frames, each position is read twice
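Reading each position twice means every base contributes to two overlapping dinucleotide measurements, each reported as one of 4 colors. The sketch below uses the commonly described two-base encoding, in which the color of a dinucleotide is the XOR of 2-bit codes assigned to its bases; this mapping and the function names are assumptions of the example, not taken from the slides.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode_colors(seq: str) -> list[int]:
    """One color (0-3) per pair of adjacent bases."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base: str, colors: list[int]) -> str:
    """Rebuild the bases from a known first base and the color calls."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
print(colors)
print(decode_colors("A", colors))

# A true single-base change (G -> A at the third base) alters two adjacent colors.
print(encode_colors("ATAGC"))
```

A real SNP changes two adjacent colors, while an isolated measurement error changes only one, so in this encoding the two cases can be told apart when reads are compared with a reference in color space.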
Sequencing: Fluorescently Labeled Nucleotides (ABI SOLiD)
SOLiD
Pros:
- No more cloning step
- From purified DNA or RNA to sequencing
- Fits on the laboratory bench top / small
- Good sequence accuracy
- Capabilities: multiplexing & paired-ends
- Cost: approx. 1.5K€ / Gb; cost per base is cheaper than Illumina
- Well fitted to: resequencing, RNA-seq, ChIP-Seq, Methyl-Seq
Cons:
- This technology is NOT intuitive
- Machine is VERY expensive
- HUGE amount of data produced (1500 Gb!!)
- Long run times
- It has been demonstrated that certain reads don’t match the reference!
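The per-gigabase costs quoted on the 454, Illumina and SOLiD slides can be compared for a given project by multiplying genome size, target coverage and price per Gb. This is a rough sketch only: the 5 Mb genome and 30X coverage are hypothetical inputs, and the calculation ignores instrument purchase, library preparation and minimum run size.

```python
# Approximate costs quoted on the slides, in euros per gigabase
COST_EUR_PER_GB = {"454": 20_000, "Illumina": 2_000, "SOLiD": 1_500}


def project_cost_eur(genome_size_bp: int, coverage: int, platform: str) -> float:
    """Sequencing cost for one genome at the given mean coverage."""
    total_bases = genome_size_bp * coverage
    return total_bases * COST_EUR_PER_GB[platform] / 1e9


# Hypothetical project: 5 Mb prokaryotic genome at 30X
for platform in COST_EUR_PER_GB:
    print(platform, project_cost_eur(5_000_000, 30, platform))
```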
Focusing NGS effort on predefined targets: « Target Enrichment » Technology (Capture Array)
Focusing NGS effort on predefined targets: « Target Enrichment » Technology (Capture Beads)
Summary: NGS Workflows
Source: BCG
+/- Target Enrichment Strategy
Prokaryotic Genome Sequencing Project as a mix of NGS technologies
Conclusion:
- High quality drafts can be produced for small genomes without any Sanger data input.
- We found that 454 GS FLX and Solexa/Illumina show great complementarity in producing large contigs and supercontigs with a low error rate.
NGS Applications
• In different fields…
– Metagenomics
– Genomics
– Transcriptomics
– Proteomics
DEEPER insight into biological processes
BROADER sampling of populations (cells, viruses, ecosystems…)
Genome
* De Novo Sequencing
* Targeted Resequencing (SNP, Indel, CNV)
* Whole Genome Resequencing
* Metagenome analyses
Transcriptome
* Gene Expression Profiling
* Small RNA Analysis
* Whole Transcriptome Analysis
Epigenome
* Chromatin Immunoprecipitation Sequencing (ChIP-Seq)
* Methylation Analysis
…for different purposes…
- Towards personalized medicine
- Biodiversity assessment
- De novo sequencing of prokaryotic or eukaryotic genomes (or re-sequencing)
- RNA-Seq: annotation of eukaryotic genomes
- SNP calling: identification of mutations
- ChIP-Seq: identification of DNA/protein interactions
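To show what SNP calling means in practice, here is a toy sketch that piles up aligned reads over a reference and reports positions where most reads disagree with it. It assumes reads are already aligned without gaps and given as (start position, sequence) pairs; the function name and the `min_depth` and `min_fraction` thresholds are hypothetical, and real callers also model base quality and sequencing error.

```python
from collections import Counter


def call_snps(reference, aligned_reads, min_depth=4, min_fraction=0.8):
    """Return (position, reference base, observed base, depth) for each SNP."""
    pileup = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pileup[start + offset][base] += 1

    snps = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:  # too few reads to make a call
            continue
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            snps.append((pos, reference[pos], base, depth))
    return snps


reference = "ACGTACGTAC"
reads = [(0, "ACGTTCGT"), (1, "CGTTCGTA"), (2, "GTTCGTAC"),
         (0, "ACGTTC"), (3, "TTCGTAC")]
print(call_snps(reference, reads))
```

The depth threshold is where coverage matters: the deeper the pileup at a position, the easier it is to separate a real variant from an isolated sequencing error.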
What is the current impact of NGS on Biology?
• Both transcriptomics and genomics can now be addressed using one technology, with higher accuracy and robustness (instead of, e.g., Sanger sequencing + µarrays) (example: RNA-Seq)
• SNP calling can rely on ultra-deep assemblies
• Whole-genome overview of transcription factor binding sites
• Biodiversity assessment (metagenomics projects)
• And so much more…
About whole-exome sequencing:
« For the First Time, DNA Sequencing Technology Saves A Child's Life »
« Proponents of genetic medicine say DNA sequencing is the future of
medicine and that soon every truly sick person will have his or her genome
sequenced. Critics cite privacy concerns and note that genetic mutations and
variations don’t necessarily lead to medical outcomes. Whatever the
position, it’s hard to argue that this isn’t good news: the first child – plagued
by undiagnosable illness – has been saved by DNA sequencing.
That may be a bit of a strong statement – six-year-old Nicholas Volker is
doing well, though complications could soon arise. But it’s highly likely that
the sequencing of young Nicholas’s genome saved his life. »
<Link> <Article>
Mayer et al., Genetics in Medicine, Volume xx, Number xx, 01 2011
What’s Next?
Second Generation: NGS = Massively Parallel Sequencing (polony sequencing)
Third Generation:
- Single molecule sequencing (no bias)
- Faster
- Cheaper (or not)
- 1000€ human genome?
Machines:
• Roche, 454 GS-FLX Titanium
• Illumina, GA2
• Applied BioSys, SOLiD v3
• PacBio
• Ion Torrent
Conclusion: impact of NGS
Global shift to sequencing-based technologies
Great improvements ongoing: higher throughput, longer reads
Is it the end of µarrays? A sub-part of NGS workflows restricted to target enrichment?
Is it the end of forward genetics? Reverse genetics only?
Biologists' education should integrate NGS knowledge
Is it the end of « big sequencing centers »? A change in their mission?
Next bottleneck: BioInformatics
- Storing data is a problem (SRA soon down?), AND IT network speeds are FAR too low. NGS data are very difficult to share. Fridges instead of disks!?
- Analyzing data is a problem: great improvements, but a lot of work still remains to be done
Thanks for your attention!
Technology Summary
Platform          Read length   Sequencing technology   Throughput (per run)   Cost (1 Mbp)*
Sanger            ~800 bp       Sanger                  400 kbp                500$
454               ~400 bp       Polony                  500 Mbp                60$
Solexa/Illumina   75 bp         Polony                  20 Gbp                 2$
SOLiD             75 bp         Polony                  60 Gbp                 2$
Helicos           30-35 bp      Single molecule         25 Gbp                 1$
*Source: Shendure & Ji, Nat Biotech, 2008
NGS Technology Comparison
ABI SOLiD
- Cost: SOLiD 4: $495k; SOLiD PI: $240k
- Quantity of data per run: SOLiD 4: 100 Gb; SOLiD PI: 50 Gb
- Run time: 7 days
- Pros: Low error rate due to dibase probes
- Cons: Long run times. It has been demonstrated that certain reads don’t match the reference
Illumina GA
- Cost: IIe: $470k; IIx: $250k; HiSeq: $690k
- Quantity of data per run: IIe: 20 - 38 Gb; IIx: 50 - 95 Gb; HiSeq: 200 Gb +
- Run time: 4 days
- Pros: Most widely used NGS platform. Requires least DNA
- Cons: Least multiplexing capability of the 3. Poor coverage of AT-rich regions
454 Roche FLX
- Cost: Titanium: $500k
- Quantity of data per run: 450 Mb
- Run time: 9 hours
- Pros: Short run time. Long reads, better for de novo sequencing
- Cons: Expensive reagent cost. Difficulty reading homopolymer regions
Source: The University of Western Ontario