CZ5225: Modeling and Simulation in Biology
Lecture 9: Next Generation Sequencing
Prof. Chen Yu Zong
Tel: 6516-6877
Email: phacyz@nus.edu.sg
http://bidd.nus.edu.sg
Room 08-14, level 8, S16, NUS

Jan 01, 2016



Outline

• First generation sequencing

• Next generation sequencing

• Third generation sequencing

• Analysis challenges

Sanger Sequencing

• DNA is fragmented.
• Fragments are cloned into a plasmid vector.
• Cyclic sequencing reaction.
• Separation by electrophoresis.
• Readout with fluorescent tags.

Steps to Assemble a Genome

1. Find overlapping reads.
2. Merge some "good" pairs of reads into longer contigs.
3. Link contigs to form supercontigs.
4. Derive the consensus sequence, e.g. ..ACGATTACAATAGGTT..

Some Terminology

read: a 500-900-base-long string that comes out of the sequencer
mate pair: a pair of reads from the two ends of the same insert fragment
contig: a contiguous sequence formed by several overlapping reads, with no gaps
supercontig (scaffold): an ordered and oriented set of contigs, usually linked by mate pairs
consensus: the sequence derived from the multiple alignment of the reads in a contig

Sequencing Types and Applications

Cyclic-Array Methods

• DNA is fragmented.
• Adaptors are ligated to the fragments.
• Several possible protocols yield an array of PCR colonies.
• Enzymatic extension with fluorescently tagged nucleotides.
• Cyclic readout by imaging the array.

Emulsion PCR

• Fragments, with adaptors, are PCR amplified within a water drop in oil.
• One primer is attached to the surface of a bead.
• Used by 454, Polonator and SOLiD.

Bridge PCR

• DNA fragments are flanked with adaptors.
• A flat surface is coated with two types of primers, corresponding to the adaptors.
• Amplification proceeds in cycles, with one end of each bridge tethered to the surface.
• Used by Solexa.

Comparison of Existing Methods

Genome Assembly: Find Overlapping Reads

Reads:
aaactgcagtacggatct
gggcccaaactgcagtac
gtacggatctactacaca

Every word of length 9 is listed as (read, pos., word, orient.):
aaactgcag aactgcagt actgcagta … gtacggatc tacggatct
gggcccaaa ggcccaaac gcccaaact … actgcagta ctgcagtac
gtacggatc tacggatct acggatcta … ctactacac tactacaca

The list is then sorted as (word, read, orient., pos.), so that reads sharing a word end up next to each other:
aaactgcag aactgcagt acggatcta actgcagta actgcagta cccaaactg cggatctac ctactacac ctgcagtac ctgcagtac gcccaaact ggcccaaac gggcccaaa gtacggatc gtacggatc tacggatct tacggatct tactacaca
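The sort-based lookup in the table above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: exact matches only, forward strand only (the orientation column is ignored), and words of length 9 as in the example; `shared_kmers` is a hypothetical helper name, not part of any assembler.

```python
from collections import defaultdict
from itertools import combinations, groupby


def shared_kmers(reads, k):
    """Map each read pair (i, j) to the k-mers they share, as (word, pos_in_i, pos_in_j).

    Forward strand only: the orientation field of the slide's table is not modelled.
    """
    # List every word as (word, read, pos): the slide's table with the word moved first.
    entries = [
        (read[pos:pos + k], r, pos)
        for r, read in enumerate(reads)
        for pos in range(len(read) - k + 1)
    ]
    # Sorting brings identical words next to each other.
    entries.sort()
    hits = defaultdict(list)
    for word, group in groupby(entries, key=lambda e: e[0]):
        # Two occurrences of the same word in different reads form a candidate overlap.
        for (_, r1, p1), (_, r2, p2) in combinations(list(group), 2):
            if r1 != r2:
                hits[(r1, r2)].append((word, p1, p2))
    return dict(hits)


reads = ["aaactgcagtacggatct", "gggcccaaactgcagtac", "gtacggatctactacaca"]
for pair, words in shared_kmers(reads, 9).items():
    print(pair, words)
```

Reads 0 and 2 share two words, both with pos_in_i - pos_in_j = 8, so read 2 would start 8 bases into read 0; reads 0 and 1 share four words, all at a shift of -6. Checking that the shifts agree is a cheap filter before extending a candidate to a full alignment.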

Genome Assembly: Find Overlapping Reads

• Find pairs of reads sharing a k-mer, k ~ 24.
• Extend to a full alignment; throw the pair away if it is not >98% similar.

TAGATTACACAGATTAC
|||||||||||||||||
TAGATTACACAGATTAC

[Figure: further short example alignments (TAGA, TACA, TAGT) with only partial matches]

• Caveat: repeats. A k-mer that occurs N times causes O(N²) read/read comparisons; ALU k-mers could cause up to 1,000,000² comparisons.
• Solution: discard all k-mers that occur "too often".
• Set the cutoff to balance the sensitivity/speed tradeoff, according to the genome at hand and the computing resources available.
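The cost of repeats and the effect of a frequency cutoff can be checked with a small Python sketch. A k-mer seen N times produces N(N-1)/2 pairs of occurrences, which is the O(N²) on the slide; for a word with about 10^6 copies that is roughly 5 × 10^11 pairs. The toy reads, k = 4 and the cutoff of 3 are illustrative assumptions, not recommended settings.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count how often every k-mer occurs across all reads."""
    return Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))


def comparisons(count):
    """Pairs of occurrences for a k-mer seen `count` times: count*(count-1)/2, i.e. O(N^2)."""
    return count * (count - 1) // 2


def drop_frequent(counts, max_count):
    """Keep only k-mers seen at most `max_count` times; the rest are treated as repeats."""
    return {word: c for word, c in counts.items() if c <= max_count}


reads = ["ACGTACGTACGT", "TTACGTACGA", "GGGCCCAAAT"]
counts = kmer_counts(reads, 4)
kept = drop_frequent(counts, max_count=3)
print(sum(comparisons(c) for c in counts.values()))  # 18 before filtering
print(sum(comparisons(c) for c in kept.values()))    # 6 after dropping ACGT and TACG
print(comparisons(1_000_000))                        # 499999500000 for ~10^6 copies
```

Lowering `max_count` removes more work but also discards words that might have revealed true overlaps, which is the sensitivity/speed tradeoff described above.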

Genome Assembly: Find Overlapping Reads

Create local multiple alignments from the overlapping reads:

TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAG-TTACACAGATTATTGA
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAG-TTACACAGATTATTGA
TAGATTACACAGATTACTGA

Genome Assembly: Find Overlapping Reads

• Correct errors using the multiple alignment.

Isolated errors:
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAGATTACACAGATTATTGA   replace T with C
TAGATTACACAGATTACTGA
TAG-TTACACAGATTACTGA   insert A

Correlated errors, probably caused by repeats:
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAG-TTACACAGATTATTGA
TAGATTACACAGATTACTGA
TAG-TTACACAGATTATTGA

Disentangle the overlaps:
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA
TAGATTACACAGATTACTGA

TAG-TTACACAGATTATTGA
TAG-TTACACAGATTATTGA

In practice, error correction removes up to 98% of the errors.
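The two correction cases can be illustrated with a minimal Python sketch that takes a column-wise majority vote over the aligned reads and lists every read that disagrees with it. It assumes the reads are already in a gapped multiple alignment of equal length and ignores base-quality values, which real error correction would also use; the function names are hypothetical.

```python
from collections import Counter


def consensus(rows):
    """Column-wise majority vote over equal-length aligned reads ('-' marks a gap)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*rows))


def disagreements(rows, cons):
    """(read index, column, read symbol, consensus symbol) wherever a read departs from the consensus."""
    return [
        (r, c, symbol, cons[c])
        for r, row in enumerate(rows)
        for c, symbol in enumerate(row)
        if symbol != cons[c]
    ]


isolated = ["TAGATTACACAGATTACTGA", "TAGATTACACAGATTACTGA", "TAGATTACACAGATTATTGA",
            "TAGATTACACAGATTACTGA", "TAG-TTACACAGATTACTGA"]
correlated = ["TAGATTACACAGATTACTGA", "TAGATTACACAGATTACTGA", "TAG-TTACACAGATTATTGA",
              "TAGATTACACAGATTACTGA", "TAG-TTACACAGATTATTGA"]

for name, rows in (("isolated", isolated), ("correlated", correlated)):
    cons = consensus(rows)
    print(name, cons, disagreements(rows, cons))
```

In the isolated case each disagreement is one read at one column, and replacing it with the consensus symbol is exactly the "insert A" / "replace T with C" correction. In the correlated case the same two reads disagree at the same two columns; a majority vote would wrongly "correct" them, so those reads should instead be split into their own alignment.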

Genome Assembly: Merge Reads into Contigs

• Overlap graph:
  – Nodes: reads r1 … rn
  – Edges: overlaps (ri, rj, shift, orientation, score)

[Figure: reads that come from two regions of the genome (blue and red) that contain the same repeat]

Note: of course, we don't know the "color" of these nodes.
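An overlap graph over error-free reads can be sketched in Python as follows. Assumptions: overlaps are exact suffix/prefix matches on one strand (the orientation field is dropped), the score is simply the overlap length, and `min_len = 5` is an arbitrary threshold; the reads are the three from the k-mer example, and the helper names are hypothetical.

```python
def suffix_prefix_overlap(x, y, min_len):
    """Length of the longest proper suffix of x equal to a prefix of y; 0 if below min_len (min_len >= 1)."""
    for length in range(min(len(x), len(y)) - 1, min_len - 1, -1):
        if x[-length:] == y[:length]:
            return length
    return 0


def overlap_graph(reads, min_len=5):
    """Edges (i, j, shift, score): read j starts `shift` bases into read i; score = overlap length."""
    edges = []
    for i, x in enumerate(reads):
        for j, y in enumerate(reads):
            if i != j:
                olen = suffix_prefix_overlap(x, y, min_len)
                if olen:
                    edges.append((i, j, len(x) - olen, olen))
    return edges


reads = ["aaactgcagtacggatct", "gggcccaaactgcagtac", "gtacggatctactacaca"]
print(overlap_graph(reads))  # [(0, 2, 8, 10), (1, 0, 6, 12)]
```

Following the edges 1 → 0 → 2 lays the three reads out as one contig. Reads from the blue and red copies of a repeat would overlap each other just as well, which is how repeats create misleading edges between different genomic regions.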

Genome Assembly: Merge Reads into Contigs

We want to merge reads up to potential repeat boundaries.

[Figure: a repeat region, with a unique contig and an overcollapsed contig]

Genome Assembly: Merge Reads into Contigs

• Ignore non-maximal reads.
• Merge only maximal reads into contigs.

[Figure: repeat region]
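A minimal Python sketch of the two steps, assuming that a "non-maximal" read is one wholly contained in another read and that the layout and overlap lengths of the maximal reads are already known (here they come from exact suffix/prefix matches between the example reads). Detecting where a repeat boundary should stop the merge is not modeled; the helper names are hypothetical.

```python
def maximal_reads(reads):
    """Drop exact duplicates and reads contained in another read (our reading of 'non-maximal')."""
    unique = list(dict.fromkeys(reads))  # de-duplicate, keep first-seen order
    return [r for r in unique if not any(r != other and r in other for other in unique)]


def merge_chain(reads, order, overlaps):
    """Concatenate reads laid out in `order`; overlaps[k] is the overlap between consecutive reads."""
    contig = reads[order[0]]
    for idx, olen in zip(order[1:], overlaps):
        contig += reads[idx][olen:]  # append only the part not already in the contig
    return contig


reads = ["aaactgcagtacggatct", "gggcccaaactgcagtac", "gtacggatctactacaca", "ctgcagtacgg"]
kept = maximal_reads(reads)  # the last read lies inside the first one and is dropped
# Layout 1 -> 0 -> 2 with overlaps of 12 and 10 bases (exact suffix/prefix matches).
print(merge_chain(kept, [1, 0, 2], [12, 10]))
```

The contained read adds no sequence to the contig, which is why it can be ignored when merging.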

Read Length and Pairing

• Short reads are problematic, because short sequences do not map uniquely to the genome.
• Solution #1: Get longer reads.
• Solution #2: Get paired reads.

[Figure: a read pair, ACTTAAGGCTGACTAGC and TCGTACCGATATGCTG]
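The uniqueness problem can be made concrete by counting how many length-k substrings of a sequence occur exactly once. This Python sketch uses a random 5,000-base sequence as a stand-in genome (an assumption) and looks at one strand only; a random sequence contains no real repeats, so a genuine genome would give lower fractions at every length.

```python
import random
from collections import Counter


def fraction_unique(genome, k):
    """Fraction of start positions whose k-long substring occurs exactly once in the genome."""
    n = len(genome) - k + 1
    counts = Counter(genome[i:i + k] for i in range(n))
    return sum(counts[genome[i:i + k]] == 1 for i in range(n)) / n


rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(5000))  # toy random "genome"
for k in (4, 8, 12, 20):
    print(k, round(fraction_unique(genome, k), 3))
```

Very short substrings almost never occur exactly once, while 20-mers are essentially all unique in this toy sequence, which is the intuition behind Solution #1.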

Third Generation Sequencing

• Nanopore sequencing
  – Nucleic acids are driven through a nanopore.
  – Differences in the conductance of the pore provide the readout.
• Real-time monitoring of polymerase activity
  – Readout by fluorescence resonance energy transfer between polymerase and nucleotides, or
  – Waveguides allow direct observation of polymerase and fluorescently labeled nucleotides.

Nanopore Sequencing

Deamer, D. W., and Akeson, M. "Nanopores and Nucleic Acids: Prospects for Ultrarapid Sequencing". Tibtech.
Meller, A. J. Phys.: Condens. Matter 15 (2003) R581–R607.

Earlier findings
– A transmembrane voltage drives RNA through the protein nanopore α-hemolysin.
– Passage of RNA through the pore reduces the ionic current.
– The blockage current is modulated by base identity:
  • PolyC: I_block = 5 pA
  • PolyA: I_block = 20 pA
– The translocation rate depends on base identity:
  • PolyC: v = 3 µs/base
  • PolyA: v = 20 µs/base
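Read together, the two findings suggest that both the current level and the dwell time carry base information. The Python sketch below only does arithmetic with the slide's numbers: a hypothetical nearest-level rule for labeling a homopolymer blockage, and the passage time of a homopolymer assuming a constant per-base rate. It is not a base caller for mixed sequences.

```python
# Values quoted on the slide for homopolymer RNA in alpha-hemolysin.
BLOCKADE_PA = {"C": 5.0, "A": 20.0}       # blockage current I_block, pA
TIME_US_PER_BASE = {"C": 3.0, "A": 20.0}  # translocation time, microseconds per base


def nearest_level(current_pa):
    """Hypothetical rule: assign a blockage current to the closest reference level."""
    return min(BLOCKADE_PA, key=lambda base: abs(BLOCKADE_PA[base] - current_pa))


def passage_time_us(base, n_bases):
    """Passage time of an n-base homopolymer, assuming a constant per-base rate."""
    return TIME_US_PER_BASE[base] * n_bases


print(nearest_level(17.0), nearest_level(6.5))               # A C
print(passage_time_us("A", 100), passage_time_us("C", 100))  # 2000.0 300.0
```

A 100-base polyA strand takes about 2 ms to pass, while polyC takes about 0.3 ms, so even this crude reading separates the two homopolymers on both measurements.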

Automated Rapid DNA Sequencing with Nanopores

Church, George M. "Genomes for All". Scientific American, Jan 2006, pp. 47-54.

Sequencing will require a better understanding of the physics of the interaction between DNA and the protein pore during translocation.

Modeling of ssDNA Translocation

• $F = zeV/a$
  – $ze$ = effective charge per base
  – $V$ = applied voltage
  – $a$ = base-to-base distance
• $F = (1)(1.6\times10^{-19}\,\mathrm{C})(0.125\,\mathrm{V})/(0.4\times10^{-9}\,\mathrm{m}) \approx 5\times10^{-11}\,\mathrm{N} \approx 50\,\mathrm{pN}$, i.e. $F \approx 5\,k_BT/a$
• Basis for modeling
  – $P_{\mathrm{forward}}/P_{\mathrm{backward}} \sim \exp(Fa/k_BT)$
  – Averaged over all monomers
• Model assumptions:
  – Length of polymer $L \gg$ pore length
  – For short polymers, the membrane is taken to have zero thickness

D. K. Lubensky and D. R. Nelson, Biophys. J. 77, 1824 (1999).
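The force estimate and the hopping picture can be combined in a small Python simulation. This is a minimal sketch, not the Lubensky-Nelson model itself: it treats translocation as a discrete biased random walk on the monomer index, reads exp(Fa/kBT) as the ratio of forward to backward step probabilities, makes the entrance reflecting, and uses T = 2 °C (the temperature of the experiment on the next slide). Constants and function names are assumptions for illustration.

```python
import math
import random

E_CHARGE = 1.602e-19  # elementary charge, C
K_B = 1.380649e-23    # Boltzmann constant, J/K


def driving_force(z, voltage, spacing):
    """F = z e V / a, in newtons."""
    return z * E_CHARGE * voltage / spacing


def forward_probability(fa_over_kt):
    """Forward-step probability p, chosen so that p / (1 - p) = exp(F a / k_B T)."""
    return 1.0 / (1.0 + math.exp(-fa_over_kt))


def steps_to_translocate(n_bases, fa_over_kt, rng):
    """Hops of a biased walk on the monomer index until all n_bases have passed the pore.

    The cis entrance (index 0) is treated as reflecting.
    """
    p = forward_probability(fa_over_kt)
    x = steps = 0
    while x < n_bases:
        steps += 1
        if rng.random() < p:
            x += 1
        elif x > 0:
            x -= 1
    return steps


force = driving_force(z=1, voltage=0.125, spacing=0.4e-9)  # about 5e-11 N = 50 pN
fa_over_kt = force * 0.4e-9 / (K_B * 275.15)               # Fa = zeV, with T = 2 °C
rng = random.Random(0)
mean_steps = sum(steps_to_translocate(50, fa_over_kt, rng) for _ in range(2000)) / 2000
print(f"F = {force * 1e12:.1f} pN, Fa/kBT = {fa_over_kt:.2f}, mean hops for 50 bases = {mean_steps:.2f}")
```

With Fa/kBT a little above 5, backward hops are rare, and the mean number of hops stays close to the drift estimate n/(2p - 1), about 50.5 for 50 bases.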

Experiment of ssDNA Translocation

Conditions
– Temperature: 2 °C
– Electrolyte solution: 1 M KCl, 1 mM Tris-EDTA buffer, pH 8.5
– Polymer: polydeoxyadenylic acid (poly(dA)), length 4-100 bases
– Driving voltage: 70-300 mV

Meller, A., L. Nivon, D. Branton, 2001. Voltage-Driven DNA Translocations Through a Nanopore. Phys. Rev. Lett., 86, 3435-39.

Sequence Alignment as a Mathematical Problem:

Example:
Sequence a: ATTCTTGC
Sequence b: ATCCTATTCTAGC

[Figure: the best alignment places a against b with a single gap; a bad alignment breaks a into AT, TCTT and GC, introducing two gaps]

What is a good alignment?

How to rate an alignment?

• Match: +8 (w(x, y) = 8, if x = y)
• Mismatch: -5 (w(x, y) = -5, if x ≠ y)
• Each gap symbol: -3 (w(-, x) = w(x, -) = -3)
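The scoring scheme translates directly into a column-by-column sum. A minimal Python sketch with hypothetical helper names; a column with a gap in both rows never occurs in a pairwise alignment, and this sketch would score it -3.

```python
MATCH, MISMATCH, GAP = 8, -5, -3


def w(x, y):
    """Column score from the slide: +8 match, -5 mismatch, -3 if either symbol is a gap."""
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH


def alignment_score(row_a, row_b):
    """Sum of w over the columns of a pairwise alignment written as two gapped rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(w(x, y) for x, y in zip(row_a, row_b))


print(alignment_score("C---TTAACT", "CGGATCA--T"))  # 12
print(alignment_score("CTTAAC-T", "CGGATCAT"))      # 14
```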

Pairwise Alignment

Sequence a: CTTAACT
Sequence b: CGGATCAT

An alignment of a and b:

C---TTAACT
CGGATCA--T

The --- in a is an insertion gap and the -- in b is a deletion gap; the remaining columns are matches (e.g. C/C) or mismatches (e.g. T/C).

Alignment Graph

Sequence a: CTTAACT
Sequence b: CGGATCAT

[Figure: a grid with b (C G G A T C A T) along the top and a (C T T A A C T) down the side, on which the alignment C---TTAACT / CGGATCA--T, with its insertion and deletion gaps, is drawn]

Graphic representation of an alignment

Sequence a: CTTAACT
Sequence b: CGGATCAT

[Figure, built up column by column over several slides: the alignment C---TTAACT / CGGATCA--T traced through the grid. C/C is a diagonal step, the gaps -/G, -/G, -/A run along row C, T/T is a diagonal step, and so on until the last column T/T reaches the bottom-right corner]

Pathway of an alignment

Sequence a: CTTAACT
Sequence b: CGGATCAT

[Figure: the complete path of C---TTAACT / CGGATCA--T from the top-left to the bottom-right corner of the grid]

Alignment Score

Sequence a: CTTAACT
Sequence b: CGGATCAT

C---TTAACT
CGGATCA--T

Running score along the path, one column at a time:
C/C: 8
-/G, -/G, -/A: 8 - 3 = 5, 5 - 3 = 2, 2 - 3 = -1
T/T: -1 + 8 = 7
T/C: 7 - 5 = 2
A/A: 2 + 8 = 10
A/-, C/-: 10 - 3 = 7, 7 - 3 = 4
T/T: 4 + 8 = 12

Alignment score = 12
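The path picture can be generated from the two alignment rows: a letter of a moves the path down a row, a letter of b moves it right a column, and both together give a diagonal step. This Python sketch (the helper name is hypothetical) records each cell visited and the running score under the slide's scoring scheme.

```python
def alignment_path(row_a, row_b, match=8, mismatch=-5, gap=-3):
    """Walk an alignment through the (i, j) grid, returning each cell with the running score."""
    i = j = score = 0
    path = [((0, 0), 0)]
    for x, y in zip(row_a, row_b):
        if x != "-":
            i += 1  # a letter of a is consumed: move down one row
        if y != "-":
            j += 1  # a letter of b is consumed: move right one column
        if x == "-" or y == "-":
            score += gap
        else:
            score += match if x == y else mismatch
        path.append(((i, j), score))
    return path


for cell, running in alignment_path("C---TTAACT", "CGGATCA--T"):
    print(cell, running)
```

The running values at cells (2, 5), (3, 6) and (4, 7) are 7, 2 and 10. Because this is just one path, each running value is a lower bound on the optimal score Si,j of the same cell, which is what dynamic programming computes.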

An optimal alignment: the alignment of maximum score

• Let A = a1a2…am and B = b1b2…bn.
• Si,j: the score of an optimal alignment between a1a2…ai and b1b2…bj.
• With proper initializations, Si,j can be computed as follows:

$$
S_{i,j} = \max
\begin{cases}
S_{i-1,j-1} + w(a_i, b_j) \\
S_{i,j-1} + w(-, b_j) \\
S_{i-1,j} + w(a_i, -)
\end{cases}
$$
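The recurrence, together with the initialization S0,j = -3j and Si,0 = -3i, can be filled in row by row. A minimal Python sketch of this standard global-alignment (Needleman-Wunsch) scheme with a linear gap cost; the function name is illustrative.

```python
def nw_matrix(a, b, match=8, mismatch=-5, gap=-3):
    """Fill the global-alignment score matrix S, of size (m+1) x (n+1), row by row."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = i * gap  # a1..ai aligned entirely against gaps
    for j in range(1, n + 1):
        S[0][j] = j * gap  # b1..bj aligned entirely against gaps
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            diag = S[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            left = S[i][j - 1] + gap  # w(-, bj)
            up = S[i - 1][j] + gap    # w(ai, -)
            S[i][j] = max(diag, left, up)
    return S


S = nw_matrix("CTTAACT", "CGGATCAT")
for row in S:
    print(" ".join(f"{v:4d}" for v in row))
```

Printing S gives the completed score table for a = CTTAACT and b = CGGATCAT, ending in S[7][8] = 14.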

Computing Si,j

[Figure: cell (i, j) is reached from (i-1, j-1) with w(ai, bj), from (i-1, j) with w(ai, -), and from (i, j-1) with w(-, bj); the table is filled until Sm,n, the optimal score]

Initializations

S0,0 = 0
S0,1 = -3, S0,2 = -6, S0,3 = -9, S0,4 = -12, S0,5 = -15, S0,6 = -18, S0,7 = -21, S0,8 = -24
S1,0 = -3, S2,0 = -6, S3,0 = -9, S4,0 = -12, S5,0 = -15, S6,0 = -18, S7,0 = -21

       -    C    G    G    A    T    C    A    T
  -    0   -3   -6   -9  -12  -15  -18  -21  -24
  C   -3
  T   -6
  T   -9
  A  -12
  A  -15
  C  -18
  T  -21

Gap symbol: -3

S1,1 = ?

Option 1: S1,1 = S0,0 + w(a1, b1) = 0 + 8 = 8
Option 2: S1,1 = S0,1 + w(a1, -) = -3 - 3 = -6
Option 3: S1,1 = S1,0 + w(-, b1) = -3 - 3 = -6
Optimal: S1,1 = 8

       -    C    G    G    A    T    C    A    T
  -    0   -3   -6   -9  -12  -15  -18  -21  -24
  C   -3    ?
  T   -6
  T   -9
  A  -12
  A  -15
  C  -18
  T  -21

Match: 8, Mismatch: -5, Gap symbol: -3

S1,2 = ?

Option 1: S1,2 = S0,1 + w(a1, b2) = -3 - 5 = -8
Option 2: S1,2 = S0,2 + w(a1, -) = -6 - 3 = -9
Option 3: S1,2 = S1,1 + w(-, b2) = 8 - 3 = 5
Optimal: S1,2 = 5

       -    C    G    G    A    T    C    A    T
  -    0   -3   -6   -9  -12  -15  -18  -21  -24
  C   -3    8    ?
  T   -6
  T   -9
  A  -12
  A  -15
  C  -18
  T  -21

Match: 8, Mismatch: -5, Gap symbol: -3

S2,1 = ?

Option 1: S2,1 = S1,0 + w(a2, b1) = -3 - 5 = -8
Option 2: S2,1 = S1,1 + w(a2, -) = 8 - 3 = 5
Option 3: S2,1 = S2,0 + w(-, b1) = -6 - 3 = -9
Optimal: S2,1 = 5

       -    C    G    G    A    T    C    A    T
  -    0   -3   -6   -9  -12  -15  -18  -21  -24
  C   -3    8    5
  T   -6    ?
  T   -9
  A  -12
  A  -15
  C  -18
  T  -21

Match: 8, Mismatch: -5, Gap symbol: -3

S2,2 = ?

Option 1: S2,2 = S1,1 + w(a2, b2) = 8 - 5 = 3
Option 2: S2,2 = S1,2 + w(a2, -) = 5 - 3 = 2
Option 3: S2,2 = S2,1 + w(-, b2) = 5 - 3 = 2
Optimal: S2,2 = 3

       -    C    G    G    A    T    C    A    T
  -    0   -3   -6   -9  -12  -15  -18  -21  -24
  C   -3    8    5
  T   -6    5    ?
  T   -9
  A  -12
  A  -15
  C  -18
  T  -21

Match: 8, Mismatch: -5, Gap symbol: -3

S3,5 = ?

       -    C    G    G    A    T    C    A    T
  -    0   -3   -6   -9  -12  -15  -18  -21  -24
  C   -3    8    5    2   -1   -4   -7  -10  -13
  T   -6    5    3    0   -3    7    4    1   -2
  T   -9    2    0   -2   -5    ?
  A  -12
  A  -15
  C  -18
  T  -21

S3,5 = ?

       -    C    G    G    A    T    C    A    T
  -    0   -3   -6   -9  -12  -15  -18  -21  -24
  C   -3    8    5    2   -1   -4   -7  -10  -13
  T   -6    5    3    0   -3    7    4    1   -2
  T   -9    2    0   -2   -5    5    2   -1    9
  A  -12   -1   -3   -5    6    3    0   10    7
  A  -15   -4   -6   -8    3    1   -2    8    5
  C  -18   -7   -9  -11    0   -2    9    6    3
  T  -21  -10  -12  -14   -3    8    6    4   14

Optimal score: S7,8 = 14

CTTAAC-T
CGGATCAT

       -    C    G    G    A    T    C    A    T
  -    0   -3   -6   -9  -12  -15  -18  -21  -24
  C   -3    8    5    2   -1   -4   -7  -10  -13
  T   -6    5    3    0   -3    7    4    1   -2
  T   -9    2    0   -2   -5    5    2   -1    9
  A  -12   -1   -3   -5    6    3    0   10    7
  A  -15   -4   -6   -8    3    1   -2    8    5
  C  -18   -7   -9  -11    0   -2    9    6    3
  T  -21  -10  -12  -14   -3    8    6    4   14

8 - 5 - 5 + 8 - 5 + 8 - 3 + 8 = 14
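The optimal alignment is recovered by walking back from the bottom-right cell and, at each cell, re-checking which of the three options produced its value. A minimal Python sketch (hypothetical function name) that refills the table and then traces back; when several options tie, the order of the checks decides which of the equally good alignments is returned.

```python
def nw_align(a, b, match=8, mismatch=-5, gap=-3):
    """Global alignment: fill S with the recurrence, then trace back from S[m][n]."""

    def w(x, y):
        return match if x == y else mismatch

    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = i * gap
    for j in range(1, n + 1):
        S[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = max(S[i - 1][j - 1] + w(a[i - 1], b[j - 1]),
                          S[i][j - 1] + gap,
                          S[i - 1][j] + gap)

    # Walk back from the bottom-right cell, re-deriving which option produced each value.
    # Ties are broken in the order: diagonal, up (gap in b), left (gap in a).
    row_a, row_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + w(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return S[m][n], "".join(reversed(row_a)), "".join(reversed(row_b))


score, top, bottom = nw_align("CTTAACT", "CGGATCAT")
print(score)
print(top)
print(bottom)
```

For this pair the traceback returns the alignment CTTAAC-T / CGGATCAT with score 14.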

Multiple Sequence Alignment (MSA)


How to score an MSA?

• Sum-of-Pairs (SP-score)

GC-TC
A---C
G-ATC

SP-score = Score(GC-TC, A---C) + Score(GC-TC, G-ATC) + Score(A---C, G-ATC)

GC-TC / A---C: -5 - 3 + 8 - 3 + 8 = 5
GC-TC / G-ATC: 8 - 3 - 3 + 8 + 8 = 18
A---C / G-ATC: -5 + 8 - 3 - 3 + 8 = 5

SP-score = 5 + 18 + 5 = 28

Here a column in which both rows have a gap is scored like a match (+8). Under the more common convention w(-, -) = 0, those columns contribute nothing and the SP-score would be -3 + 18 - 3 = 12.
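The sum-of-pairs score is a double loop over pairs of rows and over columns. This Python sketch makes the gap/gap column score an explicit parameter, since that choice changes the result for this MSA; the function names are illustrative.

```python
from itertools import combinations


def pair_score(x, y, match=8, mismatch=-5, gap=-3, gap_gap=0):
    """Score one column of one pair of rows; gap_gap is the score when both symbols are gaps."""
    if x == "-" and y == "-":
        return gap_gap
    if x == "-" or y == "-":
        return gap
    return match if x == y else mismatch


def sp_score(msa, gap_gap=0):
    """Sum-of-pairs: add up the column scores of every pair of rows."""
    return sum(
        sum(pair_score(x, y, gap_gap=gap_gap) for x, y in zip(r1, r2))
        for r1, r2 in combinations(msa, 2)
    )


msa = ["GC-TC", "A---C", "G-ATC"]
print(sp_score(msa, gap_gap=8))  # 28: gap/gap columns scored like a match, as above
print(sp_score(msa, gap_gap=0))  # 12: gap/gap columns ignored
```

The number of pairs grows as k(k-1)/2 for k sequences, so the SP-score of a large MSA is dominated by how many rows it contains.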