NGS Data Generation Dr Laura Emery

NGS Data Generation Dr Laura Emery. Overview The NGS data explosion Sequencing technologies An example of a sequencing workflow Bioinformatics challenges.

Dec 23, 2015

Letitia Briggs
Transcript
Page 1: NGS Data Generation Dr Laura Emery. Overview The NGS data explosion Sequencing technologies An example of a sequencing workflow Bioinformatics challenges.

NGS Data Generation

Dr Laura Emery

Page 2

Overview

• The NGS data explosion

• Sequencing technologies

• An example of a sequencing workflow

• Bioinformatics challenges

Page 3

The NGS data explosion

Page 4

EBI biological data

Figure: TB of data by year, 1996 to 2012 (vertical axis: 0 to 18000 TB).

Page 5

Bottlenecks to biological research

Source: Qiagen

Page 6

NGS Technologies

• A variety of platforms available

• Differ in:

    • Library preparation

    • Sequencing chemistry

Page 7

Comparison of NGS Technologies

Platform | Library preparation | Sequencing chemistry | Features

Roche 454 | Emulsion PCR | Pyrosequencing | Longer read length, only available until 2016

Illumina HiSeq | Solid phase amplification | Reversible terminator | Best output to cost ratio, low error rates

Applied Biosystems SOLiD | Emulsion PCR | Sequencing by ligation | Highest accuracy

Pacific Biosciences RS II | Single molecule | Real time | Very long read lengths, highest error rates

Page 8

Example: Illumina NGS workflow

1. Library Preparation

2. Hybridisation and Amplification

3. Sequencing

4. Data Analyses

Page 9

1. Library preparation

• RNA extraction (RNA only)

• Fragmentation and size selection

• cDNA synthesis (RNA only)

• Adapter ligation

Page 10

1. Library preparation

• Alternative library preparation methods:

    • Mate pair

    • Targeted

    • Strand specific

Page 11

1. Library preparation

• Multiplexing (optional)

Page 12

Example: Illumina NGS workflow

1. Library Preparation

2. Hybridisation and Amplification

3. Sequencing

4. Data Analyses

Page 13

2. Hybridisation and Amplification

Page 14

Example: Illumina NGS workflow

1. Library Preparation

2. Hybridisation and Amplification

3. Sequencing

4. Data Analyses

Page 15

3. Sequencing

Errors!

Page 16

3. Sequencing (Paired-end)

Page 17

Example: Illumina NGS workflow

1. Library Preparation

2. Hybridisation and Amplification

3. Sequencing

4. Data Analyses

Page 18

4. Data analyses: generalised pipeline

• Inputs: FASTQ, other data

• QC

• Filtering

• Alignment and/or assembly

• Downstream analyses

• Data submission to public repository
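
To make the FASTQ, QC and Filtering steps of this pipeline concrete, here is a minimal Python sketch. It is not the pipeline used in the course. It assumes four-line FASTQ records and Phred+33 quality encoding, and the function names (`read_fastq`, `phred_scores`, `filter_reads`) and the thresholds are hypothetical, chosen only for illustration.

```python
import io
from statistics import mean


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def phred_scores(qual, offset=33):
    """Convert a quality string to integer Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in qual]


def filter_reads(records, min_mean_quality=20, min_length=5):
    """Keep reads that are long enough and whose mean quality passes the threshold."""
    for name, seq, qual in records:
        if len(seq) >= min_length and mean(phred_scores(qual)) >= min_mean_quality:
            yield name, seq, qual


# Two toy reads: 'I' encodes Phred 40, '!' encodes Phred 0.
example = io.StringIO(
    "@read1\nACGTACGT\n+\nIIIIIIII\n"
    "@read2\nACGTACGT\n+\n!!!!!!!!\n"
)
kept = [name for name, _, _ in filter_reads(read_fastq(example))]
print(kept)  # ['read1']
```

Dedicated QC and trimming tools do far more than this (adapter removal, per-base trimming, reports); the sketch only shows the shape of the step.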

Page 19

Bioinformatics challenges

• Library preparation biases

    • Random hexamer priming

    • GC content

• Data storage

• Data analysis

• Errors

• Mapping/assembly uncertainty

Page 20

Bioinformatics challenges

• Library preparation biases

    • Random hexamer priming

    • GC content

• Data storage

• Data analysis

• Errors

• Mapping/assembly uncertainty

Sequence bias in the first 13 nucleotides

Methods for correction: Cufflinks, mmseq
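
One way to see this kind of bias is to tabulate base frequencies at each of the first 13 read positions and compare them with later positions. The Python sketch below does only that tabulation. It is not the correction method used by Cufflinks or mmseq, and the function name `positional_base_frequencies` and the toy reads are hypothetical.

```python
from collections import Counter


def positional_base_frequencies(reads, n_positions=13):
    """Return, for each of the first n_positions, the fraction of A, C, G and T seen there."""
    counts = [Counter() for _ in range(n_positions)]
    for read in reads:
        for i, base in enumerate(read[:n_positions]):
            counts[i][base] += 1
    freqs = []
    for c in counts:
        total = sum(c.values())  # includes any non-ACGT calls such as N
        freqs.append({b: c[b] / total for b in "ACGT"} if total else {})
    return freqs


# Toy reads, invented for illustration only.
reads = [
    "ACGTACGTACGTACGT",
    "AAGTACGTACGTACGA",
    "ACGTTCGTACGTACGT",
    "AGGTACGTACGTACGT",
]
freqs = positional_base_frequencies(reads)
print(freqs[0])  # {'A': 1.0, 'C': 0.0, 'G': 0.0, 'T': 0.0}
print(freqs[1])  # {'A': 0.25, 'C': 0.5, 'G': 0.25, 'T': 0.0}
```

In unbiased data the four fractions would be roughly the same at every position; a strong position-specific skew near the read start is the pattern the slide refers to.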

Page 21

Bioinformatics challenges

• Library preparation biases

    • Random hexamer priming

    • GC content

• Data storage

• Data analysis

• Errors

• Mapping/assembly uncertainty

GC-rich or AT-rich fragments have been found to be over/underrepresented

Methods for correction: EDASeq, CG correct
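
A first diagnostic for GC bias is to compute the GC fraction of each fragment and count fragments per GC bin; bins that are over- or under-populated relative to expectation hint at the bias. The Python sketch below shows only this diagnostic. It is not the correction performed by the tools named above, and the function names (`gc_fraction`, `bin_by_gc`) and example fragments are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of called bases (A, C, G, T) that are G or C; 0.0 if none are called."""
    seq = seq.upper()
    called = sum(seq.count(b) for b in "ACGT")
    if called == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / called


def bin_by_gc(fragments, n_bins=4):
    """Count fragments in n_bins equal-width GC-fraction bins covering 0.0 to 1.0."""
    bins = [0] * n_bins
    for frag in fragments:
        # A GC fraction of exactly 1.0 is placed in the last bin.
        idx = min(int(gc_fraction(frag) * n_bins), n_bins - 1)
        bins[idx] += 1
    return bins


# Toy fragments, invented for illustration only.
fragments = ["ATATATAT", "ATGCATGC", "GCGCGCGC", "GGGCCCAT"]
print([gc_fraction(f) for f in fragments])  # [0.0, 0.5, 1.0, 0.75]
print(bin_by_gc(fragments))                 # [1, 0, 1, 2]
```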

Page 22

Bioinformatics challenges

• Library preparation biases

    • Random hexamer priming

    • GC content

• Data storage

• Data analysis

• Errors

• Mapping/assembly uncertainty

Page 23

Conclusions

• NGS technologies provide us with new opportunities but also new challenges

• You will learn more about overcoming these challenges during this course

• Furthermore, other omics technologies will be introduced

Page 24

So over to Bernardo…