Next Generation Sequencing Miluše Hroudová Laboratory of Genomics and Bioinformatics Institute of Molecular Genetics of the ASCR, v.v.i. The presentation is supported from the project OP EC CZ.1.07/2.3.00/30.0027 “Founding the Centre of Transgenic Technologies”
Next Generation Sequencing
Miluše Hroudová
Laboratory of Genomics and Bioinformatics
Institute of Molecular Genetics of the ASCR, v.v.i.
The presentation is supported from the project OP EC CZ.1.07/2.3.00/30.0027 “Founding the Centre of Transgenic Technologies”
Outline
• Introduction to Next Generation Sequencing (NGS)
• Material
  - DNA / RNA (types, characteristics, applications)
  - genomics vs. transcriptomics
• Technologies
  - Principles
  - Workflow
  - Parameters
• Data analysis (basic pipeline)
• Project example (IMG)
• Technology progression
Basic Terms
• Base-pair - basic building block of double-stranded DNA, unit of DNA segment length (bp)
• Read - continuous sequence produced by sequencer
• Coverage - the number of short reads that overlap each other within a specific genomic region (how many times the particular base or region is read)
• Consensus sequence - idealized sequence in which each position represents the base most often found when many sequences are compared
• Contig - set of overlapping segments (reads) of DNA sequences forming continuous consensus sequence
• Assembly - aligning and merging fragments of DNA sequence (reads, contigs) in order to reconstruct the original sequence
• Scaffold - set of linked non-contiguous series of genomic sequences, consisting of contigs separated by gaps of known length
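The coverage and consensus terms above can be illustrated with a small sketch. It assumes the reads are already placed on the region as hypothetical (start, sequence) pairs, a toy stand-in for a real alignment; the function names and the example reads are made up for illustration.

```python
from collections import Counter


def pileup(reads, length):
    """Collect the bases covering each position of a region.

    `reads` is a list of (start, sequence) pairs with 0-based start
    positions relative to the region (toy input, not a real alignment).
    """
    columns = [[] for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < length:
                columns[pos].append(base)
    return columns


def coverage(columns):
    """Coverage: how many reads cover each position."""
    return [len(col) for col in columns]


def consensus(columns, gap="N"):
    """Consensus: the most frequent base at each position (gap if uncovered)."""
    return "".join(
        Counter(col).most_common(1)[0][0] if col else gap for col in columns
    )


# Four made-up reads; the last one carries a mismatch at position 5.
reads = [(0, "ACGTAC"), (2, "GTACGG"), (4, "ACGGTT"), (4, "ATGGTT")]
columns = pileup(reads, 10)
depth = coverage(columns)
cons = consensus(columns)
print(depth)  # [1, 1, 2, 2, 4, 4, 3, 3, 2, 2]
print(cons)   # ACGTACGGTT
```

At position 5 three reads say C and one says T, so the consensus keeps C. Real pipelines also weigh base qualities, which this sketch ignores.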
Next Generation Sequencing Introduction
• Modern high-throughput DNA sequencing technologies
• Massive, parallel, rapid ...
• Decreasing price, time, workflow complexity, error rate
• Increasing data quantity and quality, read length (data storage capacity), repertoire of bioinformatics tools
• Wide range of applications
• Third Generation Sequencing (single molecule, real time, in situ ...)
Input Material, Target Sequence
DNA
• De novo genome seq
• Resequencing (ChIP-Seq)
• Amplicon seq (16S)
• Sequence capture
• Base modification detection
• Genomic variations
=> Genomics
(Figure labels: chromosomal, eukaryotic, viral, prokaryotic)
Genomics
• Area of genetics that concerns the sequencing and analysis of an organism’s genetic information
• DNA sequencing + bioinformatics => sequence, assemble and analyze the function and structure of genomes (the complete set of DNA within a single cell of an organism)
Bacterial genome Human genome
Input Material, Target Sequence
RNA
• RNA Seq (Whole Transcriptome Shotgun Seq – WTSS, normalized)
• SNPs detection
• RNA species other than mRNA
• Quantitative seq (without normalization)
Total RNA (figure)
• Coding RNA (4 % of total): pre-mRNA (hnRNA), mRNA
• Functional RNA (96 % of total): pre-rRNA, pre-tRNA, snRNA, snoRNA, miRNA, siRNA, rRNA, tRNA
• Figure labels: "All organisms", "Eukaryotes only"
=> Transcriptomics
Transcriptomics
• Study of the transcriptome - the complete set of RNA transcripts produced from the genome under specific circumstances, at a particular place and time
• Methods: RT PCR, Microarrays, mRNA seq
mRNA sequencing procedure (workflow figure)
• Stages: Total RNA - mRNA - Fragmented mRNA - cDNA - Normalized cDNA (optional) - cDNA library - Raw data (reads)
• Steps: polyA mRNA selection / rRNA depletion; temperature-based fragmentation; reverse transcription; normalization (optional); library preparation (adapter ligation, size selection); sequencing run
• Figure label: "DNA sequencing procedure"
RNA Quality
• Quality of the starting total RNA - RNA integrity number (RIN)
• RIN < 7 => unequal read distribution along 5’ and 3’ ends => bad sequencing results
(Figures: Agilent Bioanalyzer traces and 454 read distribution (number of reads) for RIN < 7 and RIN > 9; cDNA synthesis, total RNA (µg))
SMARTer II A Oligo: 5’-AAGCAGTGGTATCAACGCAGAGTACGCGGG-3’
Paired-end vs. Mate-pair
• Paired-end – sequencing from both fragment ends (< 1 kb)
• Mate-pair – longer (3-20 kb) molecules circularized via internal adapter
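A minimal sketch of the paired-end idea: one read is taken from each end of a fragment, the second from the opposite strand. The fragment sequence, read length and function names are made-up assumptions; mate-pair library preparation (circularization) is not modeled here.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def paired_end_reads(fragment, read_length):
    """Return (read1, read2) sequenced inward from both fragment ends.

    read1 is the start of the forward strand; read2 is the start of the
    reverse strand, i.e. the reverse complement of the fragment's end.
    """
    read1 = fragment[:read_length]
    read2 = reverse_complement(fragment)[:read_length]
    return read1, read2


fragment = "ATGGCGTACCTTGAGCAT"  # made-up 18 bp fragment
r1, r2 = paired_end_reads(fragment, 5)
print(r1, r2)  # ATGGC ATGCT
```

The known distance between the two reads of a pair is what helps an assembler place them relative to each other.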
Parameters Comparison
Liu et al. 2012. Comparison of Next-Generation Sequencing Systems. Journal of Biomedicine and Biotechnology. 251364.
PacBio RSII
• Sequencing mechanism: sequencing by synthesis
• Read length: > 4000 bp
• Accuracy: 99.999 %
• Run time: 30 min – 3 hours
• Output: 1.6 GB
• Reads: 0.06 M
• Advantages: read length, fast, no amplification, real-time record
• Disadvantages: low throughput, low accuracy
Parameters Comparison of Benchtop Variants
Junior: 700 bp; 70 Mb; 18 hours; 2 days; pyrosequencing; minimize hands-on time, increase emPCR reproducibility; on/off instrument; µg
Liu et al. 2012. Comparison of Next-Generation Sequencing Systems. Journal of Biomedicine and Biotechnology. 251364.
Applications and Suitable Seq Type
• de novo DNA/RNA seq – Illumina, Roche/454 (PE), PacBio
• Resequencing – SOLiD, Illumina
• SNPs detection – Roche/454, PacBio (vs. InDel variation – Illumina, SOLiD)
• Sequence capture – Illumina
• Sanger – low-coverage sequencing of individual positions and regions (e.g., diagnostic genotyping) or the sequencing of virus- and phage-sized genomes
• Ion Torrent – short amplicons
• SOLiD – quantitative applications, small RNAs, epigenomics
• HeliScope – quantitative applications
• Combination of methods
Data Analysis, Assembly, Annotation
• technology-compatible software (user friendly, ineffective)
• general, free-access software (search for optimal tool)
• user-developed software (lack of qualified bioinformaticians)
• combination of data from different platforms vs. problems with assemblers
• platform-specific errors, incompatible software parameters
• multiple data filtering procedures
Machine/Service Availability
• IMG – Roche/454 GS FLX+ (full run including library prep 5500 €/0.7 GB)
  - Illumina NextSeq (next year?)
• Illumina MiSeq – IEM AS CR, GeneCore EMBL (1150 €/10 GB)
• Illumina – GeneCore EMBL (HiSeq lane 100 bp PE 2500 €/200 GB)
• Ion Torrent – GeneCore EMBL, TU Liberec
• PacBio – Netherlands (Macrogen), Germany, Switzerland
• Beijing Genomics Institute (BGI, China)
  - Illumina HiSeq 2000
  - Roche GS FLX+
  - SOLiD 4
  - Ion Torrent
  - Sanger 3730xl DNA Analyzer
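The quoted prices are easier to compare as cost per gigabase. The sketch below only divides the prices listed on this slide by the stated data yields; the dictionary keys are shortened labels chosen for illustration.

```python
# (price in EUR, data yield in GB) as quoted on the slide
offers = {
    "Roche/454 GS FLX+ full run (IMG)": (5500, 0.7),
    "Illumina MiSeq": (1150, 10),
    "Illumina HiSeq lane, 100 bp PE": (2500, 200),
}

# Cost per GB = price / yield
cost_per_gb = {name: price / gb for name, (price, gb) in offers.items()}

for name, cost in cost_per_gb.items():
    print(f"{name}: {cost:.2f} EUR/GB")
```

This gives roughly 7857 €/GB for the 454 run, 115 €/GB for MiSeq and 12.5 €/GB for the HiSeq lane. Price per GB is only one criterion; read length and application fit matter as well.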
Our Sequencing Projects
De novo genome sequencing (bacteria, protozoa, platyhelminthes, plants ...)
Metagenomics (simple bacterial consortia vs. complex environmental samples)
Transcriptomics (protozoa, cnidarians, insects, human cancer research ...)
Sequence capture (human cancer research, animal population genetics ...)
Instruments: GS FLX+ (Roche 454); HiSeq2000/MiSeq (Illumina); Beckman CEQ 2000XL - minor sequencing analyses
Transcriptomics (Evo-Devo Studies)
Hroudova et al. 2012. PLoS ONE, 7(4): e36420
Craspedacusta sowerbyi: Six and Pou genes
early evolution
De Novo Genome Seq
Achromobacter xylosoxidans
• isolated from biphenyl-contaminated soil
• 2-chlorobenzoate and 2,5-dichlorobenzoate degrader
Strnad et al. 2011. J Bacteriol 193: 791-792
Metagenomics
(Workflow figure: ecosystem - total DNA - DNA fragments - sequencing - analysis; organisms identified: F. myxofaciens, At. ferrooxidans, others)
Metagenomic Research Examples
Lean vs. obese phenotype
microbiome transplantation
Functional profiling and comparison of nine biomes
Cow rumen and biotechnology: fishing out genes for cellulose biodegradation
Amplicon Sequencing
• 16S rDNA genes
• bacterial consortia actively degrading biphenyl, benzoate, and naphthalene in a long-term contaminated soil
Uhlik et al. 2012. PLoS ONE, 7(7): e40653
Sequencing Hot Today and Near Future
• Single-Molecule Real-Time seq – SMRT, PacBio (no amplification needed for signal detection)
• Single cell DNA/RNA seq based on micro/nanofluidics technology (without WGA based on MDA - Φ29 DNA polymerase)
• Nanopores – Oxford Nanopore Technologies (fewer enzymatic steps, detection based on electric current)
• Silicon based nanopores - IBM
• Human genome (30x) under 1000 $ already announced by Illumina (HiSeq X Ten)
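To get a feel for what 30x coverage of a human genome means in data volume, the usual relation coverage = (number of reads × read length) / genome size can be rearranged. The genome size (about 3.1 Gb) and the 150 bp read length below are illustrative assumptions, not figures from the slides.

```python
def reads_for_coverage(coverage, genome_size_bp, read_length_bp):
    """Reads needed so that reads * read_length / genome_size >= coverage."""
    total_bases = coverage * genome_size_bp
    # Integer ceiling division avoids floating-point rounding.
    return (total_bases + read_length_bp - 1) // read_length_bp


HUMAN_GENOME_BP = 3_100_000_000  # assumption: approximate haploid genome size
READ_LENGTH_BP = 150             # assumption: illustrative read length

n_reads = reads_for_coverage(30, HUMAN_GENOME_BP, READ_LENGTH_BP)
total_gb = 30 * HUMAN_GENOME_BP / 1e9

print(n_reads)   # 620000000
print(total_gb)  # 93.0
```

Under these assumptions a single 30x genome needs about 620 million reads, or roughly 93 Gb of raw sequence, before any quality filtering.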
Before You Start Planning Seq Experiment
• sufficient sample source
• targeted application/platform
• computational capacity (storage, backup, operations)
• bioinformatics support
Take-away message
• NGS - high-throughput, massive, parallel, rapid DNA sequencing
• Third generation – single molecule, real time, reduced chemistry
• Basic NGS principles – synthesis, ligation
• Basic workflow: sample - fragmentation - library prep - seq run - data analysis
• Applications – de novo seq, reseq, amplicons, SeqCap, RNA seq (quantitative expression analysis vs. normalized cDNA seq)
• Choose the right application and prepare the sample appropriately
• Basic data analysis pipeline