DM Church. Last Updated: 7 May 2012. Intro to Next Generation Sequencing.

Jan 03, 2016

Transcript
Page 1

DM Church Last Updated: 7 May 2012

Intro to Next Generation Sequencing

Page 2

http://omicsmaps.com/ Nick Loman and James Hadfield

Page 3

Page 4

Koboldt et al., 2010 (Figure 3)

Page 5

Page 6

Bench work to build libraries and sequence

Clean up and QA reads

Alignments to Genome or Transcriptome

Analysis of Alignments

Page 7

Koboldt et al., 2010

Sample Contamination

Library chimeras

Sample mix-ups

Tumor-normal switches

Run quality

Page 8

Koboldt et al., 2010 (Fig. 4A)

Page 9

Page 10

Chor et al., 2009

Page 11

CCL Bio

Page 12

GCTACGGCATTCAGGCATCAGGCATTAGCAG
GGCATTCAGGGATCAGGCATTAGC->
<-CATGGCATTCAGGGATCAGGCATT
<-GCCATGGCATTCAGGGATCAGGC
CATTCAGGGATCAGGCATTAGCAG->
GGCATTCAGGGATCAGGCATTAGC->
CATTCAGGGATCAGGCATTAGCAG->
GGCATTCAGGGATCAGGCATT->
<-GGATCAGGCATTAGCAG
<-GATCAGGCATTAGCAG
<-GGATCAGGCATTAGCAG

Page 13

High Coverage: qualities may not be needed

Page 14

Low Coverage: qualities are important
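
A minimal sketch of why qualities matter at low coverage. It uses the standard Phred definition (Q = -10 log10 of the error probability) and lets each base call vote with weight (1 - error probability). The function names and this simple weighting scheme are assumptions made for illustration, not a method taken from the slides.

```python
from collections import defaultdict

def phred_to_error_prob(q):
    """Phred definition: Q = -10 * log10(P_error), so P_error = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def weighted_consensus(calls):
    """Pick a consensus base from (base, phred_quality) calls at one position.

    Hypothetical scheme: each call votes with weight (1 - error probability).
    """
    weights = defaultdict(float)
    for base, q in calls:
        weights[base] += 1.0 - phred_to_error_prob(q)
    return max(weights, key=weights.get)

# Low coverage: two very low-quality 'C' calls against one high-quality 'G'.
# A plain majority vote would pick 'C'; the quality-weighted vote does not.
low_coverage = [("C", 2), ("C", 2), ("G", 40)]
print(weighted_consensus(low_coverage))

# High coverage: the majority is so large that qualities barely matter.
high_coverage = [("G", 20)] * 30 + [("C", 20)] * 2
print(weighted_consensus(high_coverage))
```

With only three reads the answer depends on the qualities; with thirty-two reads the count alone decides it.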

Page 15

Custodia-Lora et al., 2003

Page 16

FASTQ Example

FASTQ example from: Cock et al. (2009). Nuc Acids Res 38:1767-1771.

For analysis, it may be necessary to convert to the Sanger form of FASTQ. For example, Illumina stores quality scores ranging from 0-62; Sanger quality scores range from 0-93.

Solexa quality scores have to be converted to PHRED quality scores.
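
The conversions mentioned above can be sketched as follows. The ASCII offsets (33 for Sanger, 64 for Illumina 1.3+ and Solexa) and the Solexa-to-PHRED formula are those described in Cock et al.; the function names are hypothetical, and the sketch handles quality strings only, not whole FASTQ records.

```python
import math

def illumina13_to_sanger(qual):
    """Re-encode an Illumina 1.3+ quality string (PHRED+64) as Sanger (PHRED+33)."""
    return "".join(chr(ord(c) - 64 + 33) for c in qual)

def solexa_to_phred(q_solexa):
    """Convert one Solexa score to a PHRED score:
    Q_phred = 10 * log10(10^(Q_solexa / 10) + 1)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)

def solexa_to_sanger(qual):
    """Re-encode a Solexa quality string (Solexa+64) as Sanger (PHRED+33),
    rounding each converted score to the nearest integer."""
    return "".join(chr(round(solexa_to_phred(ord(c) - 64)) + 33) for c in qual)

# 'h' is score 40 at offset 64; in Sanger encoding score 40 is 'I'.
print(illumina13_to_sanger("@h"))
print(round(solexa_to_phred(0), 2))
```

The two scales agree closely at high quality and diverge at the low end, which is why a plain offset shift is not enough for Solexa data.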

Page 17

SAM (Sequence Alignment/Map)

• It may not be necessary to align reads from scratch…you can instead use existing alignments in SAM format
  – SAM is the output of aligners that map reads to a reference genome
  – Tab delimited w/ header section and alignment section
    • Header sections begin with @ (are optional)
    • Alignment section has 11 mandatory fields
  – BAM is the binary format of SAM

http://samtools.sourceforge.net/
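
As a rough illustration of the layout described above, the sketch below splits one SAM line into its 11 mandatory fields. The field names follow the SAM specification; the parser itself and the example record are made up for illustration and skip validation that real tools perform.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}

def parse_sam_line(line):
    """Return None for header lines (starting with '@'), else a dict of fields."""
    if line.startswith("@"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError("expected at least 11 tab-separated fields")
    record = {name: (int(val) if name in INT_FIELDS else val)
              for name, val in zip(SAM_FIELDS, cols)}
    record["TAGS"] = cols[11:]  # optional fields, left unparsed here
    return record

# Made-up alignment record, for illustration only.
example = "read1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\tNM:i:0"
rec = parse_sam_line(example)
print(rec["RNAME"], rec["POS"], rec["CIGAR"])
```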

Page 18

http://samtools.sourceforge.net/SAM1.pdf

Mandatory Alignment Fields

Page 19

http://samtools.sourceforge.net/SAM1.pdf

Alignment Examples

Alignments in SAM format

Page 20

chr1 86114265 86116346 nsv433165
chr2 1841774 1846089 nsv433166
chr16 2950446 2955264 nsv433167
chr17 14350387 14351933 nsv433168
chr17 32831694 32832761 nsv433169
chr17 32831694 32832761 nsv433170
chr18 61880550 61881930 nsv433171

chr1 16759829 16778548 chr1:21667704 270866 -
chr1 16763194 16784844 chr1:146691804 407277 +
chr1 16763194 16784844 chr1:144004664 408925 -
chr1 16763194 16779513 chr1:142857141 291416 -
chr1 16763194 16779513 chr1:143522082 293473 -
chr1 16763194 16778548 chr1:146844175 284555 -
chr1 16763194 16778548 chr1:147006260 284948 -
chr1 16763411 16784844 chr1:144747517 405362 +

Valid BED files
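
A small sketch of reading BED lines like those above. It assumes the usual BED convention of a 0-based start and an exclusive end, so the interval length is end minus start; the function name and returned keys are hypothetical.

```python
def parse_bed_line(line):
    """Parse one BED line with 3 to 6 whitespace-separated columns."""
    cols = line.split()
    if len(cols) < 3:
        raise ValueError("BED needs at least chrom, start, end")
    rec = {"chrom": cols[0], "start": int(cols[1]), "end": int(cols[2])}
    # Optional columns 4-6, kept as strings in this sketch.
    for key, val in zip(["name", "score", "strand"], cols[3:6]):
        rec[key] = val
    rec["length"] = rec["end"] - rec["start"]  # 0-based, half-open interval
    return rec

four_col = parse_bed_line("chr1 86114265 86116346 nsv433165")
six_col = parse_bed_line("chr1 16759829 16778548 chr1:21667704 270866 -")
print(four_col["name"], four_col["length"])
print(six_col["strand"], six_col["length"])
```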

Page 21

GTF

Page 22

##gff-version 3
##gvf-version 1.02
##species http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=10090
##genome-build NCBI MGSCv36
##assembly-name MGSCv36
##assembly-accession GCF_000001635.15
##file-date 2011-11-18
# Study_accession: Combined studies on MGSCv36
# Display_name: Combined studies on MGSCv36
# Study_description: Combined studies on MGSCv36
chr1 dbVar copy_number_variation 90044442 90114410 . . . ID=nsv433533;Name=nsv433533;Start_range=.,90044442;End_range=90114410,.
chr4 dbVar copy_number_variation 121483931 121646639 . . . ID=nsv433534;Name=nsv433534;Start_range=.,121483931;End_range=121646639,.
chr9 dbVar copy_number_variation 109128634 109146964 . . . ID=nsv433535;Name=nsv433535;Start_range=.,109128634;End_range=109146964,.
chr17 dbVar copy_number_variation 30240627 30614866 . . . ID=nsv433536;Name=nsv433536;Start_range=.,30240627;End_range=30614866,.
chr17 dbVar copy_number_variation 30983722 31036099 . . . ID=nsv433537;Name=nsv433537;Start_range=.,30983722;End_range=31036099,.
chr17 dbVar copy_number_variation 34907088 34962504 . . . ID=nsv433538;Name=nsv433538;Start_range=.,34907088;End_range=34962504,.

GVF format
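
A hedged sketch of parsing one GVF feature line such as those above. GVF builds on GFF3, so the sketch assumes nine tab-separated columns, 1-based inclusive coordinates, and a ninth column of `key=value` pairs separated by semicolons. The function name and returned keys are hypothetical.

```python
GVF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes"]

def parse_gvf_line(line):
    """Return None for pragma/comment lines, else a dict for one feature."""
    if line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns")
    rec = dict(zip(GVF_COLUMNS, cols))
    rec["start"], rec["end"] = int(rec["start"]), int(rec["end"])
    rec["attributes"] = dict(
        item.split("=", 1) for item in rec["attributes"].split(";") if item
    )
    rec["length"] = rec["end"] - rec["start"] + 1  # 1-based, inclusive
    return rec

line = "\t".join([
    "chr1", "dbVar", "copy_number_variation", "90044442", "90114410",
    ".", ".", ".",
    "ID=nsv433533;Name=nsv433533;Start_range=.,90044442;End_range=90114410,.",
])
feature = parse_gvf_line(line)
print(feature["attributes"]["ID"], feature["length"])
```

Note the contrast with BED: here the length is end minus start plus one, because both endpoints are included.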

Page 23

http://www.ncbi.nlm.nih.gov/dbvar

http://www.ebi.ac.uk/dgva

http://www.ncbi.nlm.nih.gov/snp

Derived data

Page 24

Derived data

Page 25

Actual data

Page 26

Trace and SRA Holdings

[Chart: TraceArchive Bases, SRA Bases, and SRA Bytes over time; x-axis from Oct-00 to Sep-11; log-scale y-axis from 100000000 to 1000000000000000]

Getting exponential growth under control

Page 27

Trace Organization

seq1

seq2

FASTA, Quality, Chromatogram, Experimental info, Sample

FASTA, Quality, Chromatogram, Experimental info, Sample

SRA Organization

Experiments

Samples

Sequences and Qualities

Page 28

Bytes per base in SRA

[Chart: Cumulative, Incremental, and Moving Average bytes per base; x-axis from Feb-08 to Dec-11; y-axis from 0 to 10]

Era of NGS Explosion, FASTQ Era, Bits/Base Era

As of April 10, 2012 SRA contains fewer bytes than bases
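
One way to see how storage can drop below one byte per base: with a four-letter alphabet, each base fits in 2 bits, so four bases fit in a byte. The sketch below is a toy illustration under that assumption. It is not the SRA's actual storage scheme, and it ignores qualities and ambiguous bases such as N.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"

def pack_bases(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)

def unpack_bases(data, n_bases):
    """Reverse pack_bases; n_bases trims the padding in the last byte."""
    seq = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            seq.append(BASES[(byte >> shift) & 3])
    return "".join(seq[:n_bases])

read = "GGATCAGGCATTAGCAG"
packed = pack_bases(read)
print(len(read), "bases in", len(packed), "bytes")
```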

Page 29

New Cycle

Decision Circle

What data series to store

Redundancy removal

Normalization

Lossy vs Lossless

Compression tuning

Practical Application

BAM and similar formats containing both raw reads and alignments become primary output of raw sequencing

Increases the number of data series

Compression By Reference reduces sizes of other data series

New sets of tradeoffs

New compression algorithms
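
The idea behind compression by reference can be sketched as follows: an aligned read is stored as its position, its length, and only the bases that differ from the reference. This is a toy illustration under strong assumptions (ungapped alignments, no qualities, made-up sequences) and is not the algorithm used by the SRA or any particular tool; the function names are hypothetical.

```python
def encode_read(reference, read, pos):
    """Store an ungapped aligned read as (pos, length, mismatches)."""
    window = reference[pos:pos + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends beyond the reference")
    mismatches = [(i, b) for i, (r, b) in enumerate(zip(window, read)) if r != b]
    return pos, len(read), mismatches

def decode_read(reference, encoded):
    """Rebuild the read from the reference plus the stored differences."""
    pos, length, mismatches = encoded
    bases = list(reference[pos:pos + length])
    for offset, base in mismatches:
        bases[offset] = base
    return "".join(bases)

# Toy sequences, for illustration only.
reference = "ACGTACGTTAGC"
read = "GTACCTTA"  # aligns at position 2 with one mismatch
encoded = encode_read(reference, read, 2)
print(encoded)
print(decode_read(reference, encoded) == read)
```

A read that matches the reference exactly needs only a position and a length, which is where the size reduction comes from. The tradeoff is that decoding requires the same reference to be available.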

Page 30

Analyzing New Compression Method

Data from 1000 Genomes Project

BAMs from 1000 Genomes Project
• All available combinations of samples, platforms, and aligners
• 3114 files
• 27 Tb of disk space after compression

BAM treatment
• Names are dropped after restoring mates
• Only sequencing quality score is saved
• None of the non-redundant optional tags are preserved

Correction of BAM inconsistencies
• Occasional alignments to stretches of Ns on the reference and beyond the reference were converted to unaligned
• Different PCR duplicate flags for mates

Page 31

Changes To SRA Run Browser

Page 32

http://aws.amazon.com/datasets/4383

Page 33

https://main.g2.bx.psu.edu/

Page 34

http://www.genomespace.org/

Page 35

Science 1 July 2011: Vol. 333 no. 6038 pp. 53-58. DOI: 10.1126/science.1207018

Page 36

Li et al., 2011, Figure 1

Page 37

Li et al., 2011, Fig. 2

Page 38

Kleinman et al., 2012, Fig 1

Page 39

Kleinman et al., 2012, Table 1

Page 40

Lin et al., 2012, Fig 1

Page 41

Lin et al., 2012, Fig 2

Page 42

Pickrell et al., 2012, Fig 1

Page 43

Li et al., 2012, Fig 1

Page 44

Li et al., 2012, Fig 2

Page 45

Li et al., 2012, Fig 3

Page 46

Li et al., 2012, Fig 4