Next–generation DNA sequencing technologies – theory & practice

Next-Generation sequencing (NGS) technologies – overview NGS targeted re-sequencing – fishing out the regions of interest NGS workflow: data collection.

Dec 15, 2015


Malik Noyce
Next–generation DNA sequencing technologies – theory & practice


Next-Generation sequencing (NGS) technologies – overview

NGS targeted re-sequencing – fishing out the regions of interest

NGS workflow: data collection and processing – the exome sequencing pipeline

Outline

PART I: NGS technologies

Next-Generation sequencing (NGS) technologies – overview

The automated Sanger method is considered a ‘first-generation’ technology, and newer methods are referred to as next-generation sequencing (NGS).

DNA Sequencing – the next generation

1953 Discovery of DNA double helix structure
1977
◦ A Maxam and W Gilbert "DNA seq by chemical degradation"
◦ F Sanger "DNA sequencing with chain-terminating inhibitors"
1984 DNA sequence of the Epstein-Barr virus, 170 kb
1987 Applied Biosystems - first automated sequencer
1991 Sequencing of human genome in Venter's lab
1996 P. Nyrén and M Ronaghi - pyrosequencing
2001 A draft sequence of the human genome
2003 human genome completed
2004 454 Life Sciences markets first NGS machine

Landmarks in DNA sequencing

Random genome sequencing
• 25 Mb
• 300k reads
• 110 bp

Sanger sequencing
• Targeted
• 700-1000 bp

DNA Sequencing – the next generation

The newer technologies constitute various strategies that rely on a combination of
◦ Library/template preparation
◦ Sequencing and imaging

DNA Sequencing – the next generation

Commercially available technologies
◦ Roche – 454: GS FLX Titanium, GS Junior
◦ Illumina: HiSeq 2000, MiSeq
◦ Life: SOLiD 5500xl, Ion Torrent
◦ Helicos BioSciences – HeliScope
◦ Pacific Biosciences – PacBio RS

DNA Sequencing – the next generation


DNA Sequencing – the next generation


Produce a non-biased source of nucleic acid material from the genome

Template preparation: STEP1


Current methods:
◦ Randomly break genomic DNA into smaller sizes
◦ Ligate adaptors
◦ Attach or immobilize the template to a solid surface or support
◦ The spatially separated template sites allow thousands to billions of sequencing reactions to be performed simultaneously

Template preparation
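The first two template-preparation steps (random fragmentation, then adaptor ligation) can be illustrated with a toy simulation. This is only a sketch: the function names, the target fragment length and the adaptor strings are hypothetical placeholders, not sequences or parameters from any vendor protocol.

```python
import random


def shear(genome: str, mean_len: int, rng: random.Random) -> list[str]:
    """Cut `genome` at random positions so fragments average about `mean_len` bases."""
    n_cuts = max(len(genome) // mean_len - 1, 0)
    cuts = sorted(rng.sample(range(1, len(genome)), n_cuts))
    bounds = [0, *cuts, len(genome)]
    return [genome[a:b] for a, b in zip(bounds, bounds[1:])]


def ligate_adaptors(fragment: str, left: str = "[adaptorA]", right: str = "[adaptorB]") -> str:
    """Flank a fragment with placeholder adaptor tags (hypothetical, not real sequences)."""
    return left + fragment + right


rng = random.Random(0)           # fixed seed so the sketch is reproducible
genome = "ACGT" * 250            # 1000-base toy "genome"
fragments = shear(genome, mean_len=100, rng=rng)
library = [ligate_adaptors(f) for f in fragments]
```

Each element of `library` stands for one template that would then be immobilized at its own site on the solid support.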

Clonal amplification
◦ Roche – 454
◦ Illumina – HiSeq
◦ Life – SOLiD

Single molecule sequencing
◦ Helicos BioSciences – HeliScope
◦ Pacific Biosciences – PacBio RS

Template preparation

In solution – emulsion PCR (emPCR)
◦ Roche – 454
◦ Life – SOLiD

Solid phase – Bridge PCR
◦ Illumina – HiSeq

Template preparation: Clonal amplification


Template preparation: Clonal amplification - emPCR


Sequencing

SOLiD 454


Pyrosequencing

Picotitre plate Pyrosequencing


Sequencing by ligation


Template preparation: Clonal amplification – Bridge PCR


Template preparation: Single molecule templates

HeliScope PacBio

HiSeq HeliScope


The major advance offered by NGS is the ability to cheaply produce an enormous volume of data

The arrival of NGS technologies in the marketplace has changed the way we think about scientific approaches in basic, applied and clinical research

DNA Sequencing – the next generation


PART II: NGS targeted resequencing

fishing out the regions of interest

The beginning

Random genome sequencing
??? ???
Sanger sequencing
• Targeted
• 700-1000 bp

◦ Library/template preparation
◦ Library enrichment for target
◦ Sequencing and imaging

DNA Sequencing – the next generation

Target enrichment strategies

Random genome sequencing
Hybrid Capture
PCR based
Sanger sequencing


Target enrichment strategies: MIP

Hybrid Capture

In solution
• Agilent
• Nimblegen
• ...

Solid phase
• Agilent
• Nimblegen
• Febit
• ...

Hybrid Capture

In solution
• Relatively cheap
• High throughput is possible
• Small amounts of DNA sufficient

Solid phase
• Straightforward method
• Flexible
• Higher amounts of DNA


PCR based approaches

• Uniplex
• Multiplex
• Fluidigm
• Raindance
• Multiplicon

• Long-range PCR products
• Raindance


PCR based approaches: Raindance

PCR based approaches: Fluidigm
• 48.48 Access Array


PART III: NGS workflow

data collection and processing – the exome sequencing pipeline

The human genome
◦ Genome = 3 Gb
◦ Exome = 30 Mb
◦ 180 000 exons

Protein coding genes
◦ constitute only approximately 1% of the human genome
◦ It is estimated that 85% of the mutations with large effects on disease-related traits can be found in exons or splice sites

Whole Exome Sequencing
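The "approximately 1%" figure follows directly from the sizes quoted on this slide. A quick check, using only those numbers (the variable names are illustrative):

```python
GENOME_BP = 3_000_000_000   # 3 Gb, from the slide
EXOME_BP = 30_000_000       # 30 Mb, from the slide
N_EXONS = 180_000           # from the slide

# Share of the genome covered by the exome.
exome_fraction = EXOME_BP / GENOME_BP

# Average exon length implied by these figures, in bases.
mean_exon_bp = EXOME_BP / N_EXONS

print(f"exome fraction: {exome_fraction:.1%}")
print(f"mean exon length: {mean_exon_bp:.0f} bp")
```

The implied average exon length is a by-product of the slide's round numbers, not a measured value.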

gDNA 3 Gb

Exome 38 Mb

NGS

Exome sequencing

Date         exome capture   Seq - 2.5 Gbases   total cost
1/01/2010    1100            5900               7000
1/08/2010    860             2600               3460
1/01/2011    300             1000               1300

The past, present & future

HiSeq specifications:
◦ 2 flow cells
◦ 16 lanes (8 per flow cell)
◦ 200-300 Gbases per flow cell
◦ 10 days for a single run

Exome throughput
◦ 96 @ 60x coverage per run
◦ 3000 @ 60x coverage per year

Exome sequencing capacity
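The capacity figures can be sanity-checked with a back-of-the-envelope calculation. The sketch below takes the run output and run time from the HiSeq specifications and the 38 Mb target from the exome sequencing slide; the on-target fraction of 0.5 is an assumption introduced here for illustration and is not stated in the slides.

```python
import math


def exomes_per_run(run_output_gb: float, target_mb: float,
                   coverage: float, on_target_fraction: float) -> int:
    """Number of exomes one run can hold at the requested mean coverage."""
    # Raw sequence needed per exome = target size x coverage / share of reads on target.
    gb_per_exome = target_mb / 1000 * coverage / on_target_fraction
    return math.floor(run_output_gb / gb_per_exome)


ON_TARGET = 0.5  # ASSUMPTION: half of the sequenced bases fall on the capture target

low = exomes_per_run(2 * 200, target_mb=38, coverage=60, on_target_fraction=ON_TARGET)
high = exomes_per_run(2 * 300, target_mb=38, coverage=60, on_target_fraction=ON_TARGET)

runs_per_year = 365 // 10             # 10 days per run, no downtime
upper_bound_per_year = runs_per_year * 96

print(low, high, upper_bound_per_year)
```

Under that assumption the slide's 96 exomes per run falls between the two estimates, and the 3000 per year figure sits below the no-downtime upper bound.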


Data processing workflow

Data formatting & QC

Mapping & QC

Variant calling

Variant annotation

Variant filtering/comparison


Data processing


DATA STORAGE
DATA GENERATION
DATA PROCESSING
REPORTING & VALIDATION
RESULTS INTERPRETATION

Prepare sample library

Perform exome capture

Perform sequencing

DATA GENERATION


Sequence data: 10-15 Gb / exome

DATA STORAGE
DATA GENERATION
DATA PROCESSING

Image processing
Base calling

NGS data processing: overview

1. Mapping
2. Duplicate marking
3. Local realignment
4. Base quality recalibration
5. Analysis-ready mapped reads
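As an illustration of the duplicate-marking step, the sketch below flags reads that map to the same chromosome, position and strand, keeping the one with the highest mean base quality unflagged. It is a simplified toy model: the `Read` record and the choice of key are assumptions made here, and production tools take more information into account (for example mate positions in paired-end data).

```python
from dataclasses import dataclass


@dataclass
class Read:
    name: str
    chrom: str
    pos: int           # leftmost mapped position
    strand: str        # "+" or "-"
    mean_qual: float   # mean base quality of the read
    duplicate: bool = False


def mark_duplicates(reads: list) -> list:
    """Flag all but the best-quality read at each (chrom, pos, strand)."""
    best = {}
    for r in reads:
        key = (r.chrom, r.pos, r.strand)
        if key not in best or r.mean_qual > best[key].mean_qual:
            best[key] = r
    for r in reads:
        r.duplicate = best[(r.chrom, r.pos, r.strand)] is not r
    return reads


reads = mark_duplicates([
    Read("r1", "chr1", 100, "+", 30.0),
    Read("r2", "chr1", 100, "+", 35.0),   # same start and strand as r1, higher quality
    Read("r3", "chr1", 100, "-", 20.0),   # opposite strand, so not a duplicate of r1/r2
    Read("r4", "chr2", 5, "+", 25.0),
])
flags = [r.duplicate for r in reads]
```

Duplicates are flagged rather than deleted, so downstream steps can decide whether to ignore them.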

Sequence data: 10-15 Gb / exome

DATA STORAGE
DATA GENERATION
DATA PROCESSING

Image processing
Base calling

QC sequencing
Mapping sequences
QC capture exp


QC NGS

Mapping

QC HC

DATA PROCESSING
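A QC step for the hybrid capture (HC) experiment typically asks how much of the sequence landed on the target and how evenly the target is covered. The sketch below computes three such summary numbers from per-base depths; the function name, the metric names and the 20x threshold are illustrative assumptions, not values taken from the slides.

```python
def capture_qc(target_depths, on_target_reads, mapped_reads, min_depth=20):
    """Summarise a capture experiment from per-base read depths over the target."""
    covered = sum(1 for d in target_depths if d >= min_depth)
    return {
        # Share of mapped reads that overlap the capture target.
        "on_target_rate": on_target_reads / mapped_reads,
        # Average read depth across target bases.
        "mean_depth": sum(target_depths) / len(target_depths),
        # Share of target bases covered at least `min_depth` times.
        "fraction_at_min_depth": covered / len(target_depths),
    }


# Toy input: five target bases with very uneven coverage.
qc = capture_qc([0, 10, 30, 60, 100], on_target_reads=600, mapped_reads=1000)
```

A mean depth that looks adequate can hide poorly covered bases, which is why the fraction above a minimum depth is reported alongside it.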


Sequence data: 10-15 Gb / exome
Mapping results: 5 Gb / exome

DATA STORAGE
DATA GENERATION
DATA PROCESSING

Image processing
Base calling

QC sequencing
Mapping sequences
QC capture exp

Variant Calling
Variant Annotation

Sequence data: 10-15 Gb / exome
Mapping results: 5 Gb / exome
Variant calls: 100 Mb / exome

DATA STORAGE
DATA GENERATION
DATA PROCESSING

Image processing
Base calling

QC sequencing
Mapping sequences
QC capture exp

Variant Calling
Variant Annotation

SNPs vs Indels

[Bar chart: axis range 0 to 1200000; series: INDEL, SNP]

exonic vs non-exonic

[Bar chart: axis range 0 to 1000000; series: stopgain SNV, nonsynonymous SNV, nonframeshift insertion, nonframeshift deletion, non-coding, frameshift insertion, frameshift deletion]

Exonic

[Bar chart: axis range 0 to 20000; series: synonymous SNV, stoploss SNV, stopgain SNV, nonsynonymous SNV, nonframeshift insertion, nonframeshift deletion, frameshift insertion, frameshift deletion]
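The exonic categories named in the chart legend can be derived from the codon change (for single-nucleotide variants) or from the length change (for insertions and deletions). The following is a minimal sketch using the standard genetic code; the function names are illustrative, and a real annotation tool also has to handle transcripts, strands and splice sites.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(ref_codon: str, alt_codon: str) -> str:
    """Classify a coding SNV by its effect on the encoded amino acid."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous SNV"
    if alt_aa == "*":
        return "stopgain SNV"
    if ref_aa == "*":
        return "stoploss SNV"
    return "nonsynonymous SNV"


def classify_indel(ref: str, alt: str) -> str:
    """Classify a coding indel by whether it preserves the reading frame."""
    kind = "insertion" if len(alt) > len(ref) else "deletion"
    frame = "nonframeshift" if abs(len(alt) - len(ref)) % 3 == 0 else "frameshift"
    return f"{frame} {kind}"
```

For example, `classify_snv("TGG", "TGA")` reports a stopgain because the tryptophan codon becomes a stop codon, and `classify_indel("A", "ACGT")` reports a nonframeshift insertion because three bases are added.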

Exonic

[Bar chart: axis range 0 to 500; series: stoploss SNV, stopgain SNV, nonframeshift insertion, nonframeshift deletion, frameshift insertion, frameshift deletion]

Sequence data: 10-15 Gb / exome
Mapping results: 5 Gb / exome
Variant calls: 100 Mb / exome

DATA STORAGE
DATA GENERATION
DATA PROCESSING

Image processing
Base calling

QC sequencing
Mapping sequences
QC capture exp

Variant Calling
Variant Annotation

Database known variants (public & private)
Variant Filtering

Sequence data: 10-15 Gb / exome
Mapping results: 5 Gb / exome
Variant calls: 100 Mb / exome

DATA STORAGE
DATA GENERATION
DATA PROCESSING

Image processing
Base calling

QC sequencing
Mapping sequences
QC capture exp

Variant Calling
Variant Annotation

Database known variants (public & private)
Variant Filtering

REPORTING & VALIDATION

RESULTS INTERPRETATION
Validated variants in candidate genes
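The filtering step at the end of the workflow can be pictured as two set lookups: drop calls already present in a database of known variants, then keep those that fall in candidate genes. The sketch below assumes a simple dictionary record per call; the field names, gene names and example variants are hypothetical.

```python
def filter_variants(calls, known_variants, candidate_genes):
    """Keep calls absent from the known-variant set and located in a candidate gene."""
    return [
        c for c in calls
        if (c["chrom"], c["pos"], c["ref"], c["alt"]) not in known_variants
        and c["gene"] in candidate_genes
    ]


# Hypothetical example data, for illustration only.
calls = [
    {"chrom": "chr1", "pos": 1000, "ref": "A", "alt": "G", "gene": "GENE_A"},
    {"chrom": "chr1", "pos": 2000, "ref": "C", "alt": "T", "gene": "GENE_A"},
    {"chrom": "chr2", "pos": 3000, "ref": "G", "alt": "A", "gene": "GENE_B"},
]
known = {("chr1", 1000, "A", "G")}       # e.g. merged public and private databases
candidates = {"GENE_A"}

remaining = filter_variants(calls, known, candidates)
```

The variants that survive this filter are the ones that would go forward to validation.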