Next-Generation Sequencing (NGS) Technologies and Data Analysis Christopher E. Mason TA: Paul Zumbo Spring 2010

Next-Generation Sequencing (NGS) Technologies and Data Analysis · 2010-05-04

Aug 02, 2020

Page 1

Next-Generation Sequencing (NGS) Technologies and Data Analysis

Christopher E. Mason

TA: Paul Zumbo

Spring 2010

Page 2

Class #2: Alignments, QC, and data processing

Page 3

A BWT Human Genome

Copy the version of BWA into the 1KG directory:

$ cp BWA ../1KG/

Page 4

Get Some Data (1KG)

Get one paired-end lane

ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/NA06985/sequence_read/ERR001014_1.filt.fastq.gz

ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/NA06985/sequence_read/ERR001014_2.filt.fastq.gz

Gunzip the files:

$ gunzip *.gz

Look at the data:

$ ls -lh

Count the lines:

$ wc ERR001014_2.filt.fastq

Do some maths:

$ expr 23484300 / 4
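A FASTQ record spans four lines, which is why the line count reported by wc is divided by 4. The same arithmetic as a minimal Python sketch; the function name is illustrative and not part of any tool used in class:

```python
def reads_from_line_count(line_count, lines_per_read=4):
    """Return the number of FASTQ records implied by a total line count."""
    if line_count % lines_per_read != 0:
        # A well-formed FASTQ file should have a line count divisible by 4.
        raise ValueError("line count is not a multiple of lines per read")
    return line_count // lines_per_read


# Line count taken from the expr command on the slide.
print(reads_from_line_count(23484300))  # prints 5871075
```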

http://www.1000genomes.org

Page 5

Performing Alignments

Perform the alignment (aln):

$ ./bwa aln ../genomes/hg18.fa ERR001014_1.filt.fastq > ERR001014_1.sai

$ ./bwa aln ../genomes/hg18.fa ERR001014_2.filt.fastq > ERR001014_2.sai

Other options listed at: http://bio-bwa.sourceforge.net/bwa.shtml

Page 6

Convert Suffix Arrays to Positions

Generate Alignments in SAM format (Single End Reads)

$ ./bwa-0.5.7/bwa samse hg18.fa ERR001014_1.sai ERR001014_1.filt.fastq >ERR001014_1.sam

Generate Alignments in SAM format (Paired End Reads)

$ ./bwa-0.5.7/bwa sampe hg18.fa ERR001014_1.sai ERR001014_2.sai ERR001014_1.filt.fastq ERR001014_2.filt.fastq >ERR001014_PE.sam

Page 7

Also, your I/O system can make a difference

Get a lustre filesystem if you can! (short for Linux Cluster)

(NFS) Network File System
(PVFS) Parallel Virtual File System
(TerraFS) TerraScale Tech. File System
(GPFS) IBM General Parallel File System

Sun Microsystems

Page 8

Threads, Errors, and Indels strongly affect the alignments' speed and accuracy

$ time ./bwa aln ../genomes/hg18.fa ERR001014_1.filt.fastq

$ $!"#$%&'(')*

$ time ./bwa aln -t 9 ../genomes/hg18.fa ERR001014_1.filt.fastq

$ +#"&'"")*

$ time ./bwa aln -t 9 -e 10 ../genomes/hg18.fa ERR001014_1.filt.fastq

4m10.794s

Examples with a file with known variants:

http://physiology.med.cornell.edu/faculty/mason/lab/data/NGS/

$ ./bwa aln ../genomes/hg18.fa Indels.fastq >Indels.sai

$ ./bwa samse ../genomes/hg18.fa Indels.sai Indels.fastq >Indels.sam

Now find sequences with larger indels

$ ./bwa aln -e 10 ../genomes/hg18.fa Indels.fastq >Indels_e10.sai

$ ./bwa samse ../genomes/hg18.fa Indels_e10.sai Indels.fastq >Indels_e10.sam

Page 9

Kurtosis and Skewness Estimate PE QC

Skewness: As close to zero as possible

Kurtosis: As high as possible (at least >0.6)
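The slide does not say which formulas or which measurements are behind these two numbers, so the following is only a sketch under stated assumptions: it uses the population moment definitions (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 - 3) and applies them to a made-up list of paired-end insert sizes. The function names and the data are hypothetical.

```python
def central_moments(values):
    """Return the 2nd, 3rd and 4th central moments of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m2, m3, m4


def skewness(values):
    """Population skewness: zero for a perfectly symmetric distribution."""
    m2, m3, _ = central_moments(values)
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Population excess kurtosis: zero for a normal distribution."""
    m2, _, m4 = central_moments(values)
    return m4 / m2 ** 2 - 3.0


# Hypothetical insert sizes (bp) from a paired-end library.
insert_sizes = [198, 201, 200, 199, 202, 200, 205, 196, 200, 199]
print(skewness(insert_sizes), excess_kurtosis(insert_sizes))
```

Whether the thresholds on the slide refer to raw or excess kurtosis is not stated, so compare values only against numbers computed the same way.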

Page 10

What is SAM?

SAM is a rapidly developing data specification and format for the storage of sequence alignments and their mapping coordinates.

Sequence Alignment/Map (SAM) also has a binary version of the format, called BAM.

SAMtools is a set of tools for manipulating and controlling SAM/BAM files

Bam-Bam of the Flintstones is currently unrelated to Heng Li and Richard Durbin’s work with SAM/BAM

Page 11

SAM Output

@HD = Header
@SQ = Sequence Dictionary
  LN = length of sequence
@RG = Read Group
  ID = unique read group identifier
  PU = Platform Unit
  LB = Library
  SM = Sample

Page 12

SAM Output

QNAME = name of read
FLAG = Bitwise FLAG (2^16-1)
RNAME = Reference sequence name
POS = Position (1-based)
MAPQ = Mapping Quality (Phred-based)
CIGAR = CIGAR string
MRNM = Mate Reference Sequence
MPOS = 1-based Mate Position of the other seq
ISIZE = Inferred Insert Size
SEQ = Sequence reported on the + strand
QUAL = Quality scores (ASCII-33 = Phred)
TAG = TAG
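To make the field list concrete, here is a minimal sketch that splits one tab-delimited SAM alignment line into the eleven mandatory fields named above and decodes QUAL with the ASCII-33 rule. The record is invented for illustration, and the function names are hypothetical rather than part of SAMtools.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "MRNM", "MPOS", "ISIZE", "SEQ", "QUAL"]
INTEGER_FIELDS = ("FLAG", "POS", "MAPQ", "MPOS", "ISIZE")


def parse_sam_line(line):
    """Split one alignment line into named fields; optional tags go in TAGS."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) < len(SAM_FIELDS):
        raise ValueError("expected at least 11 tab-delimited columns")
    record = dict(zip(SAM_FIELDS, columns[:11]))
    for name in INTEGER_FIELDS:
        record[name] = int(record[name])
    record["TAGS"] = columns[11:]
    return record


def phred_scores(qual):
    """Decode a QUAL string: Phred score = ASCII code - 33."""
    return [ord(char) - 33 for char in qual]


# Invented example record, not taken from the class data.
example = "read1\t99\tchr1\t100\t60\t5M\t=\t300\t205\tACGTA\tIIII5\tNM:i:0"
record = parse_sam_line(example)
print(record["QNAME"], record["FLAG"], record["POS"], record["CIGAR"])
print(phred_scores(record["QUAL"]))
```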

Page 13

Bitwise Flags are Combined Bits

0100101001010

Bit 0 = The read was part of a pair during sequencing
Bit 1 = The read is mapped in a pair
Bit 2 = The query sequence is unmapped
Bit 3 = The mate is unmapped
Bit 4 = Strand of query (0=forward 1=reverse)

The flag value is the sum of the individual flags. If a flag is false, add nothing to the total. If it is true, add 2 raised to the power of the bit position.

For example:

Bit 0 - false - add nothing
Bit 1 - true - add 2**1 = 2
Bit 2 - false - add nothing
Bit 3 - true - add 2**3 = 8
Bit 4 - true - add 2**4 = 16

Bit pattern = 11010 = 16+8+2 = 26
So the flag value would be 26.

Other Examples:

0 = 0000000
99 = 01100011
147 = 10010011

0 = Not paired, mapped, forward strand.

99 = Paired, Proper Pair, Mapped, Mate Mapped, Forward, Mate Reverse, First in pair, Not second in pair

147 = Paired, Proper Pair, Mapped, Mate Mapped, Reverse, Mate Forward, Not first in pair, Second in pair
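The additive rule above can be sketched in a few lines of Python. The bit names are short labels chosen for this example (bits 5 to 7 follow the flag table on the next slide), and the function names are hypothetical.

```python
FLAG_BITS = [
    (0, "paired"),
    (1, "proper pair"),
    (2, "query unmapped"),
    (3, "mate unmapped"),
    (4, "query reverse strand"),
    (5, "mate reverse strand"),
    (6, "first in pair"),
    (7, "second in pair"),
]


def encode_flag(true_bits):
    """Add 2**bit for every bit position that is true."""
    return sum(1 << bit for bit in true_bits)


def decode_flag(flag):
    """Return the labels of all bits that are set in the flag value."""
    return [name for bit, name in FLAG_BITS if flag & (1 << bit)]


print(encode_flag([1, 3, 4]))  # prints 26, as in the worked example
print(decode_flag(99))
print(decode_flag(147))
```

Testing a bit with `flag & (1 << bit)` is the inverse of the addition: it asks whether 2 raised to that bit position was part of the sum.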

Page 14

Bitwise Flag Explanation

Index  2^Index  MethodName                     Flag
0      1        isReadPartOfAPairedAlignment   0x0001
1      2        isReadAProperPairedAlignment   0x0002
2      4        isQueryUnmapped                0x0004
3      8        isMateUnMapped                 0x0008
4      16       isQueryReverseStrand           0x0010  (false +; true -)
5      32       isMateReverseStrand            0x0020
6      64       isReadFirstPair                0x0040
7      128      isReadSecondPair               0x0080
8      256      isAlignmentNotPrimary          0x0100
9      512      doesReadFailVendorQC           0x0200
10     1024     isReadADuplicate               0x0400

Page 15

SAM Specifications

SAM can store various alignments in CIGAR format:

1. Standard
2. Clipped
   a. Soft-clipped = non-matched sequence present in alignment
   b. Hard-clipped = non-matched sequence missing from alignment
3. Spliced (Intron (N))
4. Multi-part
5. Padded (Insertions (I) and Deletions (D))
6. Color-space
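As an illustration of how a CIGAR string encodes these alignment types, the sketch below splits a CIGAR into (length, operation) pairs and adds up how many reference and read bases it covers. The operation letters M, I, D, N, S, H and P are taken from the SAM format as I understand it (the slide itself names only I, D and N), the example strings are invented, and the function names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP])")


def parse_cigar(cigar):
    """Return a list of (length, operation) pairs, e.g. [(8, 'M'), (1, 'D')]."""
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Rebuild the string to catch anything the pattern silently skipped.
    if "".join(f"{length}{op}" for length, op in ops) != cigar:
        raise ValueError(f"unrecognised CIGAR string: {cigar!r}")
    return ops


def reference_span(cigar):
    """Reference bases covered: matches (M), deletions (D), introns (N)."""
    return sum(length for length, op in parse_cigar(cigar) if op in "MDN")


def read_length(cigar):
    """Read bases present in SEQ: matches (M), insertions (I), soft clips (S)."""
    return sum(length for length, op in parse_cigar(cigar) if op in "MIS")


print(parse_cigar("3S8M1D6M4S"))
print(reference_span("3S8M1D6M4S"), read_length("3S8M1D6M4S"))
print(reference_span("9M32N8M"))  # a spliced alignment across a 32 bp intron
```

Hard-clipped (H) and padding (P) operations count toward neither total in this sketch, since hard-clipped bases are missing from the stored sequence.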

Page 16

SAM Specifications

Li et al, 2010

Page 17

Some ambiguities remain

CIGAR format is a short way of storing mis-aligned bases relative to a reference genome.

In certain cases, CIGAR will need pileup-based padding, though this is currently not supported.

Page 18

What if I don't have fastq files, or I already have alignments? Some tools already exist to change formats:

BWA, SAMtools

Converters, Tools

samtools.pl
wgsim_eval.pl
blast2sam.pl
bowtie2sam.pl
export2sam.pl
novo2sam.pl
sam2vcf.pl
soap2sam.pl
zoom2sam.pl
qualfa2fq.pl
solid2fastq.pl (not recommended!)

Page 19

Predicting Genetic Variation with SAMtools

First, you will need to get a toolkit, and we will use SAMtools:

http://sourceforge.net/projects/samtools/files/

http://samtools.sourceforge.net/

Download the source code:

http://sourceforge.net/projects/samtools/files/samtools/0.1.7/samtools-0.1.7a.tar.bz2/download

Unzip the tarball (or double-click if on Desktop):

$ bzip2 -cd samtools-0.1.7a.tar.bz2 | tar xvf -

Change into the new directory

$ cd samtools-0.1.7a

Compile the program

$ make

Move the executable into main directory

$ cp samtools ../

Page 20

SAMtools main options

Page 21

Pileup the Reads to call variants

Import the reads into BAM (binary SAM) format

$ ./samtools import ../genomes/hg18.fa Indels_e10.sam Indels_e10.bam

Sort the BAM file (for faster processing later)

$ ./samtools sort Indels_e10.bam Indels_e10.sorted

Perform a Pileup (layer the reads on top of each other)

$ ./samtools pileup -vcf ../genomes/hg18.fa Indels_e10.sorted.bam >Indels_e10.pileup_raw

If you want to clean up the pileup by depth of coverage:

$ perl samtools-0.1.7/samtools.pl varFilter -d 10 file.pileup.raw >Indels_e10.pileup_10x

If you want to clean up the pileup by quality scores of q20 (column 6):

Thanks to Alfred Aho, Peter Weinberger, and Brian Kernighan (AWK)!

$ awk '$6>=20' file.pileup.raw >file.pileup_RMS20.out

To look closer at your list of variants:

$ less ERR.pileup_10x
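For readers less familiar with awk, the column-6 filter can be approximated in Python. This is a sketch, not a replacement for the awk one-liner: the two pileup lines are invented, the function name is hypothetical, and lines whose sixth column is not numeric are simply skipped here.

```python
def filter_by_column(lines, column=6, minimum=20.0):
    """Keep whitespace-delimited lines whose 1-based column is >= minimum."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) < column:
            continue
        try:
            value = float(fields[column - 1])
        except ValueError:
            continue  # non-numeric field: skipped in this sketch
        if value >= minimum:
            kept.append(line)
    return kept


# Invented consensus pileup lines: column 6 holds the SNP quality.
pileup_lines = [
    "chr1\t1000\tA\tR\t30\t45\t60\t12\t.,.G,G\tIIIIII",
    "chr1\t2000\tC\tY\t10\t12\t60\t5\t..T,.\tIIIII",
]
for line in filter_by_column(pileup_lines):
    print(line)
```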

Page 22

What is in the output?

1. Chromosome: reference sequence name

2. Position: reference coordinate (1-based)

3. Reference Base: base of the genome, or `*' for an indel line

4. Genotype: where heterozygotes are encoded in the IUPAC/IUB code: M=A/C, R=A/G, W=A/T, S=C/G, Y=C/T and K=G/T; indels are indicated by, for example, */+A, -A/* or +CC/-C. There is no difference between */+A or +A/*.

5. Consensus Quality: Phred-scaled likelihood that the genotype is wrong

6. SNP Quality: Phred-scaled likelihood that the genotype is identical to the reference, which is also called `SNP quality'. Suppose the reference base is A and in alignment we see 17 G and 3 A. We will get a low consensus quality because it is difficult to distinguish an A/G heterozygote from a G/G homozygote. We will get a high SNP quality, though, because the evidence of a SNP is very strong.

7. RMS: root mean square (RMS) mapping quality, a measure of the variance of quality scores

8. Coverage: # reads covering the position

9. Bases with Support/Indel#1: Bases used for SNP line, “^” from CIGAR N/S/H break, “$” end of read segment; the 1st indel allele otherwise

10. Quality of bases/Indel#2: base quality at a SNP line; the 2nd indel allele otherwise

11. INDEL#1: # reads directly supporting the 1st indel allele

12. INDEL#2: # reads directly supporting the 2nd indel allele

13. INDEL#3: # reads supporting a third indel allele

14. Blank
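Columns 5 and 6 are Phred-scaled, meaning a probability P is reported as Q = -10 * log10(P). A small sketch of the conversion in both directions; the function names are illustrative only:

```python
import math


def phred_to_probability(quality):
    """Convert a Phred-scaled score Q into the probability P = 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def probability_to_phred(probability):
    """Convert a probability P into the Phred-scaled score Q = -10*log10(P)."""
    return -10 * math.log10(probability)


for q in (10, 20, 30):
    print(q, phred_to_probability(q))
```

Read this way, a consensus quality of 20 corresponds to a 1-in-100 chance that the called genotype is wrong, and 30 to 1 in 1000.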

Page 23

SAMTools pileup uses the SNP model from MAQ, and has a few options

Options:

-c Create the consensus base at each position

-v Show positions that do not agree with the reference genome (hg18.fa)

-f The reference genome is in FASTA format

Page 24

What are some other filters?

Parameter INTEGER [Default Value]

-Q INT minimum RMS mapping quality for SNPs [25]

-q INT minimum RMS mapping quality for gaps [10]

-d INT minimum read depth [3]

-D INT maximum read depth [100]

-G INT min indel score for nearby SNP filtering [25]

-w INT SNP within X bp around a gap to filter [10]

-W INT window size for filtering dense SNPs [10]

-N INT max number of SNPs in a window [2]

-l INT window size for filtering adjacent gaps [30]

Page 25

International Union of Biochemistry (IUB) / International Union of Pure and Applied Chemistry (IUPAC) Codes

Code  Definition  Meaning
A     A           Adenine
C     C           Cytosine
G     G           Guanine
T     T           Thymine
R     AG          puRine
Y     CT          pYrimidine
K     GT          Keto
M     AC          aMino
S     GC          Strong
W     AT          Weak
B     CGT         Not A
D     AGT         Not C
H     ACT         Not G
V     ACG         Not T
N     AGCT        any
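The code table maps naturally onto a lookup dictionary. The sketch below expands a code into its bases and goes the other way to find the code for a pair of alleles, which is how a heterozygous genotype such as A/G ends up written as R in the pileup output. The function names are hypothetical.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "AGCT",
}

# Reverse lookup: an unordered set of bases back to its one-letter code.
CODE_FOR_BASES = {frozenset(bases): code for code, bases in IUPAC.items()}


def expand(code):
    """Return the set of bases represented by an IUPAC code."""
    return set(IUPAC[code.upper()])


def code_for(bases):
    """Return the IUPAC code for a collection of bases, e.g. 'GA' -> 'R'."""
    return CODE_FOR_BASES[frozenset(bases.upper())]


def is_heterozygous(code):
    """True when the code stands for exactly two different bases."""
    return len(expand(code)) == 2


print(sorted(expand("R")), code_for("GA"), is_heterozygous("M"))
```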

Page 26

Tview

First, make your BAM Index

$ ./samtools index Indels_e10.sorted.bam

Now you can look at your alignments

$ ./samtools tview Indels_e10.sorted.bam ../genomes/hg18.fa

$ ./samtools tview ERR001014_PE.sorted.bam ../genomes/hg18.fa

Go to a specific interval: g

Then type: chr7:57546416

Page 27

Open Factors of a Variant's Fidelity

How do we know the quality is good?

(N) Number of reads supporting that site,
(Pv) Probability of that platform-specific variant change,
(QVD) The average deviation of the quality values,
(T) The set of alignments with unique start sites,
(D) PCR Duplicates,
(S) Strand representation (half on one, half on the other),
(Z) Zygosity change (CNV regions)
(C) Cellular heterogeneity

Page 28

Genomic Relativity can create interesting compound heterozygotes

chromo  location  Patient1D  reads  Patient1R  reads  Patient2D  reads  Patient2R  reads
chr11   61652197  NA         0      -G/+GTCG   12     NA         0      -G/-G      8

-G     Paternal   ACATTGGG
       Reference  ACATGTGGG
+GTCG  Maternal   ACATGGTCGTGGG

-GGTCG/+GGTCG

Page 29

8-10x coverage sufficient for high-quality SNP calls

Page 30

Variant Call Format

http://1000genomes.org/wiki/doku.php?id=1000_genomes:analysis:vcfv3.2

Columns:

1. #CHROM
2. POS
3. ID
4. REF
5. ALT
6. QUAL
7. FILTER
8. INFO
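A minimal sketch of reading one data line with these eight columns. It assumes tab-delimited fields and an INFO column made of semicolon-separated key=value pairs, which should be checked against the specification linked above; the example line and the function name are invented for illustration.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_vcf_line(line):
    """Return a dict for a data line, or None for a '#' header line."""
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < len(VCF_COLUMNS):
        raise ValueError("expected at least 8 tab-delimited columns")
    record = dict(zip(VCF_COLUMNS, fields[:8]))
    record["POS"] = int(record["POS"])
    info = {}
    for item in record["INFO"].split(";"):
        key, _, value = item.partition("=")
        info[key] = value if value else True  # keys without a value act as flags
    record["INFO"] = info
    return record


# Invented example line, not taken from 1000 Genomes data.
example = "chr1\t100\t.\tA\tG\t50\t0\tDP=14;NS=3"
record = parse_vcf_line(example)
print(record["CHROM"], record["POS"], record["REF"], record["ALT"], record["INFO"])
```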

Page 31

1Kg

Page 32

Other Analysis Options

The Genome Analysis Toolkit (GATK)

http://www.broadinstitute.org/gsa/wiki/index.php/The_Genome_Analysis_Toolkit

Picard

http://picard.sourceforge.net/index.shtml

BioPerl:

Bio::DB::Sam

Page 33

Analyze Features of Errors for each base with GATK

• Reported quality score
• The position within the read
• The preceding and current nucleotide (sequencing chemistry effect) observed by the sequencing machine
• Probability of mismatching the reference genome
• Re-calculate Qscores

Page 34
Page 35

GATK Examples (extra)

First, let's make sure you have java

$ java -version

Now, let's get GATK

$ wget ftp://ftp.broadinstitute.org/pub/gsa/GenomeAnalysisTK/GenomeAnalysisTK-latest.tar.bz2

Bunzip2, and untar the file. Cd into that directory.

Let's get some example data to work with:

$ wget ftp://ftp.broadinstitute.org/pub/gsa/exampleFiles/exampleFiles.tar.bz2

How many reads do you have?

$ java -jar GenomeAnalysisTK.jar -R exampleFASTA.fasta -I exampleBAM.bam -T CountReads

How many loci do you have?

$ java -jar GenomeAnalysisTK.jar -R exampleFASTA.fasta -I exampleBAM.bam -T CountLoci

Page 36

GATK Genome Processing

/usr/local/cluster/software/jre1.6-amd64/jre1.6.0_01/bin/java -jar GenomeAnalysisTK.jar -R hg19.fasta -I ../751351R.bam.sorted.bam -T CountReads

Page 37

GATK Options

Some Available analyses:

VariantAnnotator: Annotates variant calls with context information.

DepthOfCoverage: Computes the depth of coverage at all loci in the specified region of the reference.

VariantFiltration: Filters variant calls using a number of user-selectable, parameterizable criteria.

UnifiedGenotyper: A variant caller which unifies the approaches of several disparate callers.

IndelGenotyperV2: This is a simple, counts-and-cutoffs based tool for calling indels from aligned sequencing data.

IndelRealigner: Performs local realignment of reads based on misalignments due to the presence of indels.

RealignerTargetCreator: Emits intervals for the Local Indel Realigner to target for cleaning.

CountLoci: Walks over the input data set, calculating the total # of covered loci for diagnostic purposes.

CountReads: Walks over the input data set, calculating the # of reads seen for diagnostic purposes.

ValidatingPileup: At every locus in the input set, compares the pileup data (reference base, aligned base

VariantEval: A robust and general purpose tool for characterizing the quality of SNPs, Indels, and other variants that includes basic counting, ti/tv, dbSNP% (if -D is provided), concordance to chip or validation data, and will show interesting sites (-V) that are FNs, FP, etc.

Page 38

If sharing a cluster, be a good netizen

Use these tips to be a good portmanteau:

1. Check your disk usage (df -h)
2. Submit jobs to the queue, if available
3. Share genome indices in one place
4. Leave the camp site better than you found it. Clean up old files!
5. Never assume a backup is there; make your own
   1. LiveSync for synchronization
   2. Time Capsule for Backup