Next Generation Sequencing. Miluše Hroudová, Laboratory of Genomics and Bioinformatics, Institute of Molecular Genetics of the ASCR, v.v.i.

Post on 18-Dec-2015

Next Generation Sequencing

Miluše Hroudová, Laboratory of Genomics and Bioinformatics

Institute of Molecular Genetics of the ASCR, v.v.i.

The presentation is supported from the project OP EC CZ.1.07/2.3.00/30.0027 “Founding the Centre of Transgenic Technologies”

Outline

• Introduction to Next Generation Sequencing (NGS)
• Material - DNA / RNA (types, characteristics, applications) - genomics vs. transcriptomics
• Technologies
  - Principles
  - Workflow
  - Parameters
• Data analysis (basic pipeline)
• Project example (IMG)
• Technology progression

Basic Terms

• Base-pair - basic building block of double-stranded DNA, unit of DNA segment length (bp)

• Read - continuous sequence produced by sequencer

• Coverage - the number of short reads that overlap each other within a specific genomic region (how many times the particular base or region is read)

• Consensus sequence - idealized sequence in which each position represents the base most often found when many sequences are compared

• Contig - set of overlapping segments (reads) of DNA sequences forming continuous consensus sequence

• Assembly - aligning and merging fragments of DNA sequence (reads, contigs) in order to reconstruct the original sequence

• Scaffold - set of linked non-contiguous series of genomic sequences, consisting of contigs separated by gaps of known length
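The terms coverage and consensus sequence can be illustrated with a small sketch. It assumes the reads are already placed on the reference, so each read is given as a hypothetical (start, sequence) pair; the reads and the function names are invented for illustration and are not taken from the slides.

```python
from collections import Counter


def pileup(reads, length):
    """Collect, for every reference position, the bases of all reads covering it.

    `reads` is a list of (start, sequence) pairs, where `start` is the
    0-based position of the first base of the read (assumed to be known).
    """
    columns = [[] for _ in range(length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < length:
                columns[pos].append(base)
    return columns


def coverage(columns):
    """Coverage = how many reads cover each position."""
    return [len(col) for col in columns]


def consensus(columns, gap="N"):
    """Consensus = most frequent base per position; `gap` where nothing was read."""
    return "".join(
        Counter(col).most_common(1)[0][0] if col else gap for col in columns
    )


# Toy data: four overlapping reads, the last one carries a single error (T instead of A).
reads = [(0, "ACGTAC"), (2, "GTACGG"), (4, "ACGGTT"), (3, "TTCGG")]
columns = pileup(reads, length=10)

print(coverage(columns))   # [1, 1, 2, 3, 4, 4, 3, 3, 1, 1]
print(consensus(columns))  # ACGTACGGTT
```

The erroneous base is outvoted at the position where four reads overlap, which is why higher coverage gives a more reliable consensus.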

Next Generation Sequencing Introduction

• Modern high-throughput DNA sequencing technologies
• Massive, parallel, rapid ...
• Decreasing price, time, workflow complexity, error rate
• Increasing data quantity and quality, read length (data storage capacity), repertoire of bioinformatics tools
• Wide range of applications
• Third Generation Sequencing (single molecule, real time, in situ ...)

Input Material, Target Sequence

DNA

• De novo genome seq
• Resequencing (ChIP-Seq)
• Amplicon seq (16S)
• Sequence capture
• Base modification detection
• Genomic variations

=> Genomics

[Figure: chromosomal, eukaryotic, viral and prokaryotic DNA]

Genomics

• Area of genetics that concerns the sequencing and analysis of an organism’s genetic information
• DNA sequencing + bioinformatics => sequence, assemble and analyze the function and structure of genomes (the complete set of DNA within a single cell of an organism)

[Figure: bacterial genome and human genome]

Input Material, Target Sequence

RNA

• RNA Seq (Whole Transcriptome Shotgun Seq – WTSS, normalized)
• SNPs detection
• RNA species other than mRNA
• Quantitative seq (without normalization)

=> Transcriptomics

Total RNA

• Coding RNA (4 % of total): pre-mRNA (hnRNA) => mRNA
• Functional RNA (96 % of total):
  - all organisms: pre-rRNA => rRNA, pre-tRNA => tRNA
  - eukaryotes only: snRNA, snoRNA, miRNA, siRNA

Transcriptomics

• Study of the transcriptome - the complete set of RNA transcripts produced from the genome, under specific circumstances at a particular place and time
• Methods: RT PCR, Microarrays, mRNA seq

mRNA sequencing procedure

Total RNA => (polyA mRNA selection, rRNA depletion) => mRNA => (temperature-based fragmentation) => fragmented mRNA => (reverse transcription) => cDNA => (optional: normalization => normalized cDNA) => (library preparation: adapter ligation, size selection) => cDNA library => sequencing run => raw data (reads)

The later steps (library preparation, sequencing run) are shared with the DNA sequencing procedure.

RNA Quality

• Quality of the starting total RNA - RNA integrity number (RIN)
• RIN < 7 => unequal read distribution along 5’ and 3’ ends => bad sequencing results

[Figure: Agilent Bioanalyzer traces and 454 read distribution (number of reads) for RIN < 7 and RIN > 9]

cDNA synthesis

Starting material: total RNA (µg)

SMARTer II A Oligo: 5’-AAGCAGTGGTATCAACGCAGAGTACGCGGG-3’

Modified CDS Primer: 5’-AAGCAGTGGTATCAACGCAGAGTTTTTGTTTTTTTCTTTTTTTTTTVN-3’

Template: mRNA with polyA 3’ end
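Both oligos on this slide start with the same 5’ stretch, and the CDS primer ends in the degenerate bases VN. The sketch below extracts the shared prefix and lists the concrete 3’ ends encoded by VN, using the standard IUPAC codes (V = A, C or G; N = any base). The slide does not explain the design; a common reading, stated here as an assumption, is that VN anchors the primer at the start of the polyA tail. The variable and function names are illustrative.

```python
import os
from itertools import product

# Oligo sequences as printed on the slide (5' -> 3').
SMARTER_IIA = "AAGCAGTGGTATCAACGCAGAGTACGCGGG"
CDS_PRIMER = "AAGCAGTGGTATCAACGCAGAGTTTTTGTTTTTTTCTTTTTTTTTTVN"

# Subset of the IUPAC nucleotide code needed for these two oligos.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "V": "ACG", "N": "ACGT"}


def expand(seq):
    """Return every concrete sequence encoded by a degenerate sequence."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in seq))]


# Longest common 5' prefix of the two oligos.
shared = os.path.commonprefix([SMARTER_IIA, CDS_PRIMER])

# Concrete variants of the degenerate 3' end "VN".
anchors = expand(CDS_PRIMER[-2:])

print(shared, len(shared))
print(anchors)
```

Because V excludes T, none of the anchor variants begins with T, so the last two bases cannot pair inside a run of A's.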

cDNA normalization

[Figure: abundant transcripts vs. rare transcripts]

TRIMMER cDNA normalization kit (Evrogen), DSN = duplex-specific nuclease

Sequencing Principles

• Sequencing by Synthesis
  - Sanger/Dideoxy chain termination (Life Technologies, Applied Biosystems)
  - Pyrosequencing (Roche/454)
  - Reversible terminator (Illumina)
  - Ion proton semiconductor (Life Technologies)
  - Zero Mode Waveguide (Pacific Biosciences)
• Sequencing by Oligo Ligation Detection
  - SOLiD (Applied Biosystems)
• Other
  - Asynchronous virtual terminator chemistry - HeliScope (Helicos)

Actual Sequencing Platforms

• Roche/454 (GS FLX+/GS Junior)
• Illumina Genome Analyzer (HiSeq/MiSeq/NextSeq)
• Life Technologies (3500 Genetic Analyzer, Ion Torrent Proton/PGM)
• Pacific Biosciences (PacBio RS II)
• Applied Biosystems (SOLiD, 3730xl DNA Analyzer)

Sanger (3500 GA, 3730xl DNA Analyzer)

Sequencing by synthesis

Oligo Ligation Detection (SOLiD)

Sequencing by ligation

Reversible Terminator (HiSeq, MiSeq, NextSeq)

Cluster generation on a flow-cell surface

Reversible Terminator (HiSeq, MiSeq, NextSeq)

Sequencing by synthesis

Pyrosequencing (GS FLX, GS Junior)

Sequencing by synthesis

Sequencing Matrices

• Sanger, 96-well, 8 capillaries: 96 x 600 bp / 24 h, 1400 €
• Pyrosequencing, 2 regions: 1,000,000 x 600 bp / 20 h, 5500 €
• Reversible terminator, MiSeq: 10,000,000 x 250 bp / 40 h, 1150 €
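The three run formats can be compared on a common scale with simple arithmetic. The sketch below takes the read counts, read lengths, run times and prices from the slide at face value and derives yield per run, yield per hour and price per megabase. Real runs rarely reach nominal read counts and lengths, so the results are upper bounds; the names `RUNS` and `run_summary` are illustrative.

```python
# (reads per run, read length in bp, run time in hours, price in EUR), as listed on the slide.
RUNS = {
    "Sanger (96-well, 8 capillaries)": (96, 600, 24, 1400),
    "Pyrosequencing (2 regions)": (1_000_000, 600, 20, 5500),
    "Reversible terminator (MiSeq)": (10_000_000, 250, 40, 1150),
}


def run_summary(reads, read_len_bp, hours, price_eur):
    """Derive nominal yield and cost figures for one sequencing run."""
    yield_mb = reads * read_len_bp / 1e6  # megabases per run
    return {
        "yield_mb": yield_mb,
        "mb_per_hour": yield_mb / hours,
        "eur_per_mb": price_eur / yield_mb,
    }


summaries = {name: run_summary(*params) for name, params in RUNS.items()}

for name, s in summaries.items():
    print(
        f"{name}: {s['yield_mb']:.4g} Mb/run, "
        f"{s['mb_per_hour']:.3g} Mb/h, {s['eur_per_mb']:.3g} EUR/Mb"
    )
```

Under these nominal figures the price per megabase drops by several orders of magnitude from Sanger to MiSeq, while the read length is shortest on the MiSeq.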

General Workflow

• Nucleic acid isolation/purification
• RNA – selection of particular RNA species, cDNA synthesis
• DNA – fragmentation, size selection (shotgun vs. paired end)
• Seq library preparation (ligation of platform-specific adaptors, indexes)
• Amplification of seq library (DNA-binding beads and other carriers)
• Sequencing run set up
• Image processing (images => sequence + quality information)
• Data analysis (assembly, mapping, annotation ...)
• Special tricks for amplicons, SeqCap, ChIP-Seq, small RNAs ...
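The "sequence + quality information" produced by image processing is commonly stored as FASTQ records, and a first data-analysis step is usually a quality filter. The sketch below is a minimal example under stated assumptions: four-line FASTQ records, Phred+33 quality encoding, and arbitrary illustrative thresholds. It is not the pipeline used in the presentation, and all names are hypothetical.

```python
def parse_fastq(text):
    """Yield (header, sequence, quality) tuples from 4-line FASTQ records."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header, seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_reads(records, min_mean_q=20, min_len=5):
    """Keep reads that are long enough and have a sufficient mean quality."""
    return [
        (h, s, q)
        for h, s, q in records
        if len(s) >= min_len and mean_phred(q) >= min_mean_q
    ]


# Toy input: read1 is good, read2 has very low quality, read3 is too short.
FASTQ = """@read1
ACGTACGT
+
IIIIIIII
@read2
ACGTACGT
+
!!!!!!!!
@read3
ACG
+
III
"""

kept = filter_reads(parse_fastq(FASTQ))
print([header for header, _, _ in kept])  # ['@read1']
```

Each platform has its own error profile, so real thresholds and trimming rules are platform specific.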

Pyrosequencing workflow

• Library preparation: fragmentation, adaptor ligation
• Emulsion PCR amplification
• Bead deposition onto PicoTiter Plate (PTP)

Paired-end x Mate-pair

• Paired-end – sequencing from both fragment ends (< 1 kb)
• Mate-pair – longer (3-20 kb) molecules circularized via internal adapter

Mate-pair types

• Mate-pair – longer (3-20 kb) molecules circularized via internal adapter

Parameters Comparison

Liu et al. 2012. Comparison of Next-Generation Sequencing Systems. Journal of Biomedicine and Biotechnology. 251364.

PacBio RS II

• Sequencing mechanism: sequencing by synthesis
• Read length: > 4000 bp
• Accuracy: 99.999%
• Time/run: 30 min – 3 hours
• Output data/run: 1.6 GB
• Reads: 0.06 M
• Advantages: read length, fast, no amplification, real-time record
• Disadvantages: low throughput, low accuracy

Parameters Comparison of Benchtop Variants

Liu et al. 2012. Comparison of Next-Generation Sequencing Systems. Journal of Biomedicine and Biotechnology. 251364.

[Table residue, GS Junior row: pyrosequencing; 700 bp; 70 Mb; 18 hours; 2 days; minimize hands-on time, increase emPCR reproducibility; on/off instrument; µg]

Applications and Suitable Seq Type

• de novo DNA/RNA seq – Illumina, Roche/454 (PE), PacBio
• Resequencing – SOLiD, Illumina
• SNPs detection – Roche/454, PacBio (vs. InDels variation – Illumina, SOLiD)
• Sequence capture - Illumina
• Sanger - low-coverage sequencing of individual positions and regions (e.g., diagnostic genotyping) or the sequencing of virus- and phage-sized genomes
• Ion Torrent – short amplicons
• SOLiD - quantitative applications, small RNAs, epigenomics
• HeliScope – quantitative applications
• Combination of methods

Data Analysis, Assembly, Annotation

• technology-compatible software (user friendly, ineffective)
• general, free-access software (search for optimal tool)
• user developed (lack of qualified bioinformaticians)
• combination of data from different platforms vs. problems with assemblers
• platform-specific errors, incompatible software parameters
• multiple data filtering procedures

Machine/Service Availability

• IMG – Roche/454 GS FLX+ (full run including library prep 5500 €/0.7 GB), Illumina NextSeq (next year?)
• Illumina MiSeq – IEM AS CR, GeneCore EMBL (1150 €/10 GB)
• Illumina – GeneCore EMBL (HiSeq lane 100 bp PE 2500 €/200 GB)
• Ion Torrent - GeneCore EMBL, TU Liberec
• PacBio – Netherlands (Macrogen), Germany, Switzerland
• Beijing Genomics Institute (BGI, China) – Illumina HiSeq 2000, Roche GS FLX+, SOLiD 4, Ion Torrent, Sanger 3730xl DNA Analyzer

Our Sequencing Projects

• De novo genome sequencing (bacteria, protozoa, platyhelminthes, plants ...)
• Metagenomics (simple bacterial consortia vs. complex environmental samples)
• Transcriptomics (protozoa, cnidarians, insects, human cancer research ...)
• Amplicon seq (environmental samples, 16S rDNA genes)
• Sequence capture (human cancer research, animal population genetics ...)

Instruments: GS FLX+ (Roche 454), HiSeq2000/MiSeq (Illumina), Beckman CEQ 2000XL (minor sequencing analyses)

Transcriptomics (Evo-Devo Studies)

Craspedacusta sowerbyi: Six and Pou genes, early evolution

Hroudova et al. 2012. PLoS ONE, 7(4): e36420

De Novo Genome Seq

Achromobacter xylosoxidans

• isolated from biphenyl contaminated soil
• 2-chlorobenzoate and 2,5-dichlorobenzoate degrader

Strnad et al. 2011. J Bacteriol 193: 791-792

Metagenomics

ecosystem => total DNA => DNA fragments => sequencing => analysis (F. myxofaciens, At. ferrooxidans, others)

Metagenomic Research Examples

Lean vs. obese phenotype

microbiome transplantation

Functional profiling and comparison of nine biomes

Cow rumen and biotechnology: fishing out genes for cellulose biodegradation

Amplicon Sequencing

• 16S rDNA genes
• bacterial consortia actively degrading biphenyl, benzoate, and naphthalene in a long-term contaminated soil

Uhlik et al. 2012. PLoS ONE, 7(7): e40653

Sequencing Hot Today and Near Future

• Single-Molecule Real-Time seq – SMRT, PacBio (no amplification needed for signal detection)

• Single cell DNA/RNA seq based on micro/nanofluidics technology (without WGA based on MDA - Φ29 DNA polymerase)

• Nanopores – Oxford Nanopore Technologies (reduced enzymatic steps, electric current based detection)

• Silicon based nanopores - IBM

• Human genome (30x) under 1000 $ already announced by Illumina (HiSeq X Ten)
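What "30x" means in terms of data volume follows from the usual mean-coverage relation: coverage = number of reads x read length / genome size. The sketch below applies it in both directions. The human genome size of about 3.1 Gb, the 150 bp read length and the 5 Mb bacterial genome are assumptions chosen for illustration; they are not stated on the slides.

```python
def reads_needed(coverage, genome_size_bp, read_len_bp):
    """Number of reads required to reach a given mean coverage (rounded up)."""
    total_bases = coverage * genome_size_bp
    return (total_bases + read_len_bp - 1) // read_len_bp  # integer ceiling


def mean_coverage(n_reads, read_len_bp, genome_size_bp):
    """Mean coverage obtained from n_reads reads of a given length."""
    return n_reads * read_len_bp / genome_size_bp


HUMAN_GENOME_BP = 3_100_000_000  # assumed haploid genome size, ~3.1 Gb
READ_LEN_BP = 150                # assumed read length

n = reads_needed(30, HUMAN_GENOME_BP, READ_LEN_BP)
print(f"{n:,} reads, {n * READ_LEN_BP / 1e9:.0f} Gb of sequence for 30x")

# Reverse direction: a nominal MiSeq run (10,000,000 x 250 bp)
# on an assumed 5 Mb bacterial genome.
print(f"{mean_coverage(10_000_000, 250, 5_000_000):.0f}x")
```

This is a mean value only; actual coverage varies along the genome, so projects are usually planned with a margin.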

Before You Start Planning Seq Experiment

• sufficient sample source
• targeted application/platform
• computational capacity (storage, back up, operations)
• bioinformatics support

Take-away message

• NGS - high-throughput, massive, parallel, rapid DNA sequencing
• Third generation – single molecule, real time, reduced chemistry
• Basic NGS principles – synthesis, ligation
• Basic workflow: sample - fragmentation - library prep - seq run - data analysis
• Applications – de novo seq, reseq, amplicons, SeqCap, RNA seq (quantitative expression analysis vs. normalized cDNA seq)
• Choose the right application and prepare the sample appropriately
• Basic data analysis pipeline: image acquisition, quality metrics - filtering - contig building - annotation

Acknowledgement

• Laboratory of Transcriptional Regulation, IMG (Dr. Zbyněk Kozmik)
• Core facility of Genomics and Bioinformatics, IMG (Mgr. Šárka Kocourková, Mgr. Marcela Vedralová)
• GeneCore, EMBL, Heidelberg (Dr. Vladimír Beneš)
• Roche CR (Diagnostic Division), Genetica CR (Illumina Division)

Laboratory of Genomics and Bioinformatics, IMG AS CR, Prague: Čestmír Vlček, Václav Pačes, Jan Pačes, Hynek Strnad, Michal Kolář, Jakub Rídl, Šárka Pinkasová

Thank you for your attention!

Miluše Hroudová, Institute of Molecular Genetics of the ASCR, v.v.i.

hroudova@img.cas.cz
