Post on 15-Dec-2015

Next-generation DNA sequencing technologies – theory & practice

Outline

◦ Next-generation sequencing (NGS) technologies – overview
◦ NGS targeted re-sequencing – fishing out the regions of interest
◦ NGS workflow: data collection and processing – the exome sequencing pipeline

PART I: NGS technologies

Next-generation sequencing (NGS) technologies – overview

DNA Sequencing – the next generation

The automated Sanger method is considered a ‘first-generation’ technology; newer methods are referred to as next-generation sequencing (NGS).

Landmarks in DNA sequencing

◦ 1953: Discovery of DNA double helix structure
◦ 1977: A. Maxam and W. Gilbert, "DNA seq by chemical degradation"; F. Sanger, "DNA sequencing with chain-terminating inhibitors"
◦ 1984: DNA sequence of the Epstein-Barr virus, 170 kb
◦ 1987: Applied Biosystems, first automated sequencer
◦ 1991: Sequencing of human genome in Venter's lab
◦ 1996: P. Nyrén and M. Ronaghi, pyrosequencing
◦ 2001: A draft sequence of the human genome
◦ 2003: Human genome completed
◦ 2004: 454 Life Sciences markets first NGS machine

Random genome sequencing
• 25 Mb
• 300k reads
• 110 bp

Sanger sequencing
• Targeted
• 700-1000 bp

The newer technologies constitute various strategies that rely on a combination of:
◦ Library/template preparation
◦ Sequencing and imaging

Commercially available technologies
◦ Roche – 454: GS FLX Titanium, GS Junior
◦ Illumina: HiSeq 2000, MiSeq
◦ Life Technologies: SOLiD 5500xl, Ion Torrent
◦ Helicos BioSciences – HeliScope
◦ Pacific Biosciences – PacBio RS

Template preparation: STEP 1

Produce a non-biased source of nucleic acid material from the genome.

Current methods:
◦ Randomly break genomic DNA into smaller sizes
◦ Ligate adaptors
◦ Attach or immobilize the template to a solid surface or support
◦ The spatially separated template sites allow thousands to billions of sequencing reactions to be performed simultaneously
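The first two template preparation steps (random fragmentation, then adaptor ligation) can be illustrated with a toy simulation. This is only a sketch on a random string: the adaptor sequences, the function names `shear` and `ligate`, and the fragment size are all made up for illustration and do not correspond to any vendor's chemistry.

```python
import random

# Hypothetical adaptor sequences, for illustration only.
ADAPTOR_5 = "ACACGACGCT"
ADAPTOR_3 = "AGATCGGAAG"

def shear(genome, mean_size, rng):
    """Cut the sequence at random positions so fragments average about mean_size."""
    n_cuts = max(len(genome) // mean_size - 1, 0)
    cuts = sorted(rng.sample(range(1, len(genome)), n_cuts))
    bounds = [0] + cuts + [len(genome)]
    return [genome[a:b] for a, b in zip(bounds, bounds[1:])]

def ligate(fragments):
    """Attach the same adaptor pair to both ends of every fragment."""
    return [ADAPTOR_5 + f + ADAPTOR_3 for f in fragments]

rng = random.Random(0)  # fixed seed so the run is repeatable
genome = "".join(rng.choice("ACGT") for _ in range(1000))
fragments = shear(genome, 100, rng)
library = ligate(fragments)
print(len(fragments), "fragments; first library molecule length:", len(library[0]))
```

Because every molecule carries the same known ends, a single primer pair can amplify or sequence the whole library regardless of the insert sequence.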

Template preparation

Clonal amplification
◦ Roche – 454
◦ Illumina – HiSeq
◦ Life – SOLiD

Single molecule sequencing
◦ Helicos BioSciences – HeliScope
◦ Pacific Biosciences – PacBio RS

Template preparation: Clonal amplification

In solution – emulsion PCR (emPCR)
◦ Roche – 454
◦ Life – SOLiD

Solid phase – Bridge PCR
◦ Illumina – HiSeq

Template preparation: Clonal amplification - emPCR

Sequencing

SOLiD 454

Pyrosequencing

Picotitre plate Pyrosequencing

Sequencing by ligation

Template preparation: Clonal amplification – Bridge PCR

Template preparation: Single molecule templates

HeliScope PacBio

HiSeq Heliscope

The major advance offered by NGS is the ability to cheaply produce an enormous volume of data

The arrival of NGS technologies in the marketplace has changed the way we think about scientific approaches in basic, applied and clinical research

PART II: NGS targeted resequencing – fishing out the regions of interest

The beginning
◦ Random genome sequencing
◦ ???
◦ Sanger sequencing: targeted, 700-1000 bp

Steps with target enrichment:
◦ Library/template preparation
◦ Library enrichment for target
◦ Sequencing and imaging

Target enrichment strategies
◦ Random genome sequencing
◦ Hybrid capture
◦ PCR based
◦ Sanger sequencing

Target enrichment strategies: MIP

Hybrid Capture

In solution
• Agilent
• Nimblegen
• ...

Solid phase
• Agilent
• Nimblegen
• Febit
• ...

Hybrid Capture

In solution
• Relatively cheap
• High throughput is possible
• Small amounts of DNA sufficient

Solid phase
• Straightforward method
• Flexible
• Higher amounts of DNA

PCR based approaches

• Uniplex
• Multiplex
• Fluidigm
• Raindance
• Multiplicon

• Longrange PCR products
• Raindance

PCR based approaches: Raindance

PCR based approaches: Fluidigm
• 48.48 Access Array

PART III: NGS workflow – data collection and processing – the exome sequencing pipeline

Whole Exome Sequencing

The human genome
◦ Genome = 3 Gb
◦ Exome = 30 Mb
◦ 180 000 exons

Protein coding genes
◦ Constitute only approximately 1% of the human genome
◦ It is estimated that 85% of the mutations with large effects on disease-related traits can be found in exons or splice sites
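The "approximately 1%" figure follows directly from the sizes quoted on the slide. The short calculation below is a sketch that uses only those slide values; the mean exon length is a derived average, not a number stated in the source.

```python
# Values as quoted on the slide.
GENOME_BP = 3_000_000_000   # 3 Gb
EXOME_BP = 30_000_000       # 30 Mb
N_EXONS = 180_000

exome_fraction = EXOME_BP / GENOME_BP   # share of the genome that is exome
mean_exon_bp = EXOME_BP / N_EXONS       # derived average exon length

print(f"Exome share of genome: {exome_fraction:.1%}")
print(f"Mean exon length: {mean_exon_bp:.0f} bp")
```

Sequencing only this fraction is what makes exome sequencing so much cheaper per sample than whole-genome sequencing at the same depth.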

Exome sequencing

gDNA (3 Gb) → Exome (38 Mb) → NGS

The past, present & future

Date         exome capture   Seq - 2.5 Gbases   total cost
1/01/2010    1100            5900               7000
1/08/2010    860             2600               3460
1/01/2011    300             1000               1300

Exome sequencing capacity

HiSeq specifications:
◦ 2 flow cells
◦ 16 lanes (8 per flow cell)
◦ 200-300 Gbases per flow cell
◦ 10 days for a single run

Exome throughput
◦ 96 @ 60x coverage per run
◦ 3000 @ 60x coverage per year
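The capacity figures can be cross-checked with some simple arithmetic. The sketch below assumes that the run yield is split evenly across the 96 exomes and that "60x" means mean depth over the 38 Mb capture target mentioned earlier. Neither assumption is stated on the slide.

```python
import math

# Figures quoted on the slides.
FLOW_CELLS_PER_RUN = 2
GBASES_PER_FLOW_CELL = (200, 300)   # low and high estimate
EXOMES_PER_RUN = 96
RUN_DAYS = 10
EXOMES_PER_YEAR = 3000

# Assumptions (not stated on this slide): 38 Mb target, 60x mean depth.
TARGET_BP = 38_000_000
COVERAGE = 60

def gbases_per_exome(gbases_per_flow_cell):
    """Raw yield available per exome if a run is split evenly."""
    return FLOW_CELLS_PER_RUN * gbases_per_flow_cell / EXOMES_PER_RUN

low, high = (gbases_per_exome(g) for g in GBASES_PER_FLOW_CELL)
on_target_gbases = TARGET_BP * COVERAGE / 1e9          # bases needed on target
runs_per_year = math.ceil(EXOMES_PER_YEAR / EXOMES_PER_RUN)

print(f"Raw yield per exome: {low:.2f}-{high:.2f} Gbases")
print(f"On-target bases needed for {COVERAGE}x: {on_target_gbases:.2f} Gbases")
print(f"Runs needed per year: {runs_per_year} ({runs_per_year * RUN_DAYS} instrument days)")
```

Under these assumptions, the raw yield per exome is roughly two to three times the on-target requirement, which leaves room for off-target reads, duplicates and uneven coverage.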

Data processing workflow

1. Data formatting & QC
2. Mapping & QC
3. Variant calling
4. Variant annotation
5. Variant filtering/comparison

◦ DATA GENERATION
◦ DATA PROCESSING
◦ DATA STORAGE
◦ REPORTING & VALIDATION
◦ RESULTS INTERPRETATION

DATA GENERATION
◦ Prepare sample library
◦ Perform exome capture
◦ Perform sequencing

DATA GENERATION / DATA PROCESSING
◦ Image processing
◦ Base calling

DATA STORAGE
◦ Sequence data: 10-15 Gb / exome

NGS data processing: overview

1. Mapping
2. Duplicate marking
3. Local realignment
4. Base quality recalibration
5. Analysis-ready mapped reads
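As an illustration of step 2, duplicate marking can be reduced to a simple idea: reads that map to the same chromosome, start position and strand are treated as copies of one original molecule, and all but the best one are flagged. The sketch below is a simplified stand-in, not the algorithm of any specific tool. The `Read` record and the `mark_duplicates` function are hypothetical, and production tools apply more detailed criteria (for example, mate positions in paired-end data).

```python
from collections import namedtuple

# Hypothetical minimal read record, for illustration only.
Read = namedtuple("Read", "name chrom pos strand mean_qual")

def mark_duplicates(reads):
    """Return the names of reads flagged as duplicates.

    Reads sharing (chrom, pos, strand) form one group; the read with the
    highest mean base quality is kept and the others are flagged.
    """
    best = {}
    for r in reads:
        key = (r.chrom, r.pos, r.strand)
        if key not in best or r.mean_qual > best[key].mean_qual:
            best[key] = r
    keep = {r.name for r in best.values()}
    return {r.name for r in reads if r.name not in keep}

reads = [
    Read("r1", "chr1", 100, "+", 30.0),
    Read("r2", "chr1", 100, "+", 35.0),   # same position and strand as r1
    Read("r3", "chr1", 100, "-", 20.0),   # opposite strand, so a separate group
    Read("r4", "chr2", 100, "+", 25.0),
]
duplicates = mark_duplicates(reads)
print(sorted(duplicates))
```

Flagging rather than deleting duplicates keeps the mapped data complete while letting the variant caller ignore PCR copies.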

DATA GENERATION / DATA PROCESSING
◦ Image processing
◦ Base calling
◦ QC sequencing
◦ Mapping sequences
◦ QC capture experiment

DATA STORAGE
◦ Sequence data: 10-15 Gb / exome

DATA PROCESSING
◦ QC NGS
◦ Mapping
◦ QC HC

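One common way to assess a hybrid capture experiment ("QC HC") is the on-target rate: the fraction of mapped reads that overlap a capture target. The sketch below assumes half-open intervals and uses a plain linear scan. The function names and coordinates are illustrative, and the slides do not say which QC metrics were actually used.

```python
def overlaps(read_start, read_end, target_start, target_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return read_start < target_end and target_start < read_end

def on_target_fraction(reads, targets):
    """Fraction of reads overlapping any target on the same chromosome."""
    if not reads:
        return 0.0
    hits = 0
    for chrom, start, end in reads:
        if any(chrom == t_chrom and overlaps(start, end, t_start, t_end)
               for t_chrom, t_start, t_end in targets):
            hits += 1
    return hits / len(reads)

# Made-up coordinates for illustration.
targets = [("chr1", 1000, 1200), ("chr2", 500, 650)]
reads = [
    ("chr1", 950, 1050),    # overlaps the chr1 target
    ("chr1", 1200, 1300),   # starts exactly where the target ends: off target
    ("chr2", 600, 700),     # overlaps the chr2 target
    ("chr3", 10, 110),      # no target on this chromosome
]
fraction = on_target_fraction(reads, targets)
print(f"On-target fraction: {fraction:.0%}")
```

For real exome data with hundreds of thousands of targets, an interval index would replace the linear scan, but the metric itself is unchanged.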

DATA GENERATION / DATA PROCESSING
◦ Image processing
◦ Base calling
◦ QC sequencing
◦ Mapping sequences
◦ QC capture experiment
◦ Variant calling
◦ Variant annotation

DATA STORAGE
◦ Sequence data: 10-15 Gb / exome
◦ Mapping results: 5 Gb / exome
◦ Variant calls: 100 Mb / exome
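The per-exome storage figures add up quickly at the quoted throughput of 3000 exomes per year. The estimate below is a rough sketch that assumes "Gb" and "Mb" on the slide mean gigabytes and megabytes of disk space, uses decimal units, and keeps all three data types for every exome.

```python
# Per-exome sizes from the slide, in gigabytes (assumed unit).
SEQUENCE_DATA_GB = (10, 15)   # low and high estimate
MAPPING_RESULTS_GB = 5
VARIANT_CALLS_GB = 0.1        # 100 Mb
EXOMES_PER_YEAR = 3000        # throughput quoted earlier in the deck

def storage_per_exome_gb(sequence_gb):
    """Total disk space for one exome if all three data types are kept."""
    return sequence_gb + MAPPING_RESULTS_GB + VARIANT_CALLS_GB

per_exome = tuple(storage_per_exome_gb(s) for s in SEQUENCE_DATA_GB)
per_year_tb = tuple(gb * EXOMES_PER_YEAR / 1000 for gb in per_exome)

print(f"Per exome: {per_exome[0]:.1f}-{per_exome[1]:.1f} GB")
print(f"Per year:  {per_year_tb[0]:.1f}-{per_year_tb[1]:.1f} TB")
```

The raw sequence data dominates the total, while the variant calls that drive interpretation account for well under 1% of it.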

[Figure: SNPs vs Indels. Axis 0 to 1200000. Series: INDEL, SNP.]

[Figure: exonic vs non-exonic. Axis 0 to 1000000. Series: stopgain SNV, nonsynonymous SNV, nonframeshift insertion, nonframeshift deletion, non-coding, frameshift insertion, frameshift deletion.]

[Figure: Exonic. Axis 0 to 20000. Series: synonymous SNV, stoploss SNV, stopgain SNV, nonsynonymous SNV, nonframeshift insertion, nonframeshift deletion, frameshift insertion, frameshift deletion.]

[Figure: Exonic. Axis 0 to 500. Series: stoploss SNV, stopgain SNV, nonframeshift insertion, nonframeshift deletion, frameshift insertion, frameshift deletion.]
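Part of the classification used in these charts can be derived from the reference and alternate alleles alone. A single-base change is an SNV. Otherwise the length difference decides between insertion and deletion, and whether that difference is a multiple of three decides frameshift versus nonframeshift. The sketch below covers only that part. The function name `classify` is illustrative, the frame labels only make sense for coding variants, and labels such as synonymous or stopgain need gene annotation that is not modelled here.

```python
from collections import Counter

def classify(ref, alt):
    """Classify a variant from its alleles (coding context assumed)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    diff = len(alt) - len(ref)
    if diff == 0:
        return "block substitution"
    kind = "insertion" if diff > 0 else "deletion"
    frame = "nonframeshift" if diff % 3 == 0 else "frameshift"
    return f"{frame} {kind}"

# Made-up (ref, alt) allele pairs for illustration.
calls = [
    ("A", "G"),
    ("C", "T"),
    ("A", "AT"),      # +1 base
    ("A", "ACGT"),    # +3 bases
    ("AT", "A"),      # -1 base
    ("ACGT", "A"),    # -3 bases
]
counts = Counter(classify(ref, alt) for ref, alt in calls)
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

Tallying these labels across all calls of an exome gives the kind of per-class counts summarised in the charts.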

DATA GENERATION / DATA PROCESSING
◦ Image processing
◦ Base calling
◦ QC sequencing
◦ Mapping sequences
◦ QC capture experiment
◦ Variant calling
◦ Variant annotation
◦ Variant filtering

DATA STORAGE
◦ Sequence data: 10-15 Gb / exome
◦ Mapping results: 5 Gb / exome
◦ Variant calls: 100 Mb / exome
◦ Database of known variants (public & private)

REPORTING & VALIDATION

RESULTS INTERPRETATION
◦ Validated variants in candidate genes
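The filtering step against known variants can be sketched as a set lookup. Each call is keyed by chromosome, position, reference allele and alternate allele, and calls already present in a public or private collection are set aside. The variant tuples and database contents below are invented for illustration, and real pipelines usually also filter on allele frequency and predicted effect, which this sketch leaves out.

```python
def filter_known(calls, known):
    """Split calls into those absent from and present in the known set."""
    novel = [v for v in calls if v not in known]
    seen = [v for v in calls if v in known]
    return novel, seen

# Invented example data: (chrom, pos, ref, alt).
calls = [
    ("chr1", 12345, "A", "G"),
    ("chr2", 67890, "C", "T"),
    ("chr7", 55555, "G", "GA"),
]
known_public = {("chr1", 12345, "A", "G")}     # stand-in for a public database
known_private = {("chr7", 55555, "G", "GA")}   # stand-in for an in-house database

novel, seen = filter_known(calls, known_public | known_private)
print("Novel candidates:", novel)
print("Already known:", seen)
```

The remaining novel calls are the candidates that move on to reporting and validation.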
