Algorithmic Problems in Peptide Sequencing

De Novo Sequencing for Peptide Identification

Jan 02, 2016


Helen Russell

Outline

• Basics of Proteomics
– Roles and Anatomy of Proteins
– Tandem Mass Spectrometry
• Algorithms for Peptide Identification
– De Novo Sequencing
– An Algorithm for Perfect Spectra
• Peptide Identification in the Real World
• Discussions


Briefings

• We mainly focus on the following result:

– Ting Chen, Ming-Yang Kao, Matthew Tepel, John Rush and George Church, A Dynamic Programming Approach to De Novo Peptide Sequencing via Tandem Mass Spectrometry, Journal of Computational Biology, 8(3): 325-337, 2001.

– A preliminary version also appeared in the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2000), pages 389-398, 2000.

• One of the most-cited algorithm articles in the computational proteomics community.


Anatomy of Protein Molecules

• Neutral peptide: H–[NH–CH(Rx)–CO]–OH, the stable state in nature
• Residue (of the peptide): –NH–CH(Rx)–CO–, the basic building block

[Figure: structural formulas of a neutral peptide and of a residue]


Proteins and Peptides

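• Trypsin (with H2O) cleaves a protein after arginine (R) or lysine (K), splitting it into peptides.

[Figure: a chain of residues R1–R5 cut by trypsin + H2O after K or R, leaving a free carboxyl group (COOH) on one fragment and a free amino group (H2N) on the next]

• Masses (Da): K is 146.19 as a free amino acid and 128.17 as a residue; R is 174.13 as a free amino acid and 156.11 as a residue.

Rectangles stand for amino acid residues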
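The cleavage rule can be sketched in Python as a quick illustration. This is a minimal sketch, assuming the simplified rule "cut after every K or R"; it ignores missed cleavages and the usual exception that trypsin rarely cuts before proline. The function name `trypsin_digest` and the input sequence are hypothetical. The second loop checks that each amino-acid/residue mass pair above differs by one water (about 18.02 Da).

```python
import re
from typing import List

# Masses quoted above (Da): free amino acid vs. residue inside a chain.
AMINO_ACID = {"K": 146.19, "R": 174.13}
RESIDUE = {"K": 128.17, "R": 156.11}


def trypsin_digest(protein: str) -> List[str]:
    """Split a protein after every K or R (simplified rule, no proline exception)."""
    # The zero-width lookbehind splits *after* K/R, so each peptide keeps its C-terminal K/R.
    return [p for p in re.split(r"(?<=[KR])", protein) if p]


if __name__ == "__main__":
    print(trypsin_digest("LGERMAKWHR"))  # ['LGER', 'MAK', 'WHR']
    for aa in "KR":
        # A residue is the free amino acid minus one water (H2O, about 18.02 Da).
        print(aa, round(AMINO_ACID[aa] - RESIDUE[aa], 2))
```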


Amino Acid Molecules

• Please visit http://www.ionsource.com/ for more information.


Tandem Mass Spectrometry

• Mass spectrometers measure the mass-to-charge ratio (m/z) of charged ions.

– A mass spectrometer has three major components: an ionizer, a mass analyzer, and a detector.

[Figure: Sample → Ionizer (+/−) → Mass Analyzer → Detector]

Adapted from Nathan Edwards’ slides


Proteomics via Mass Spectrometers

[Figure: Enzymatic digest and fractionation → First-stage MS → Precursor selection and dissociation → MS/MS]

Adapted from Nathan Edwards’ slides


Peptide Identification

• Given:
– An MS/MS spectrum (m/z and intensity values, possibly along with its retention time)
– The precursor mass

• Output:
– The amino-acid sequence of the peptide

• Analogy: imagine a deck of cards that you can cut many times, where each cut reveals only the sum of the upper or the lower half.


Peptide Fragmentation Mechanism

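[Figure: fragmentation of L–G–E–R along the m/z axis. Breaking a peptide bond yields an N-terminal b-ion and a C-terminal y-ion; the b-ions grow from the N-terminus (L, LG, LGE) and the y-ions from the C-terminus (R, ER, GER).]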


Peaks in a Spectrum

• Peptide: L – G – E – R

Weight   Ion   Prefix   Suffix   Ion   Weight
114.2    b1    L        GER      y3    361.3
171.2    b2    LG       ER       y2    304.3
300.3    b3    LGE      R        y1    175.2
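The table can be reproduced with a short sketch, assuming singly charged ions and average residue masses (the mass constants and helper names below are our own choices, not from the slides): a b-ion is its prefix residue sum plus one proton, and a y-ion is its suffix residue sum plus one water and one proton. The last line checks that every complementary pair sums to the neutral peptide mass plus two protons (about 2 Da).

```python
from typing import List

# Average residue masses (Da) of the residues in the example peptide.
RESIDUE_AVG = {"L": 113.1594, "G": 57.0519, "E": 129.1155, "R": 156.1875}
WATER = 18.01528   # average mass of H2O
PROTON = 1.00728   # mass of a proton


def b_ions(peptide: str) -> List[float]:
    """Singly charged b1..b(n-1): prefix residue sum plus one proton."""
    ions, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_AVG[aa]
        ions.append(total + PROTON)
    return ions


def y_ions(peptide: str) -> List[float]:
    """Singly charged y1..y(n-1): suffix residue sum plus water plus one proton."""
    ions, total = [], 0.0
    for aa in reversed(peptide[1:]):
        total += RESIDUE_AVG[aa]
        ions.append(total + WATER + PROTON)
    return ions


def neutral_mass(peptide: str) -> float:
    """Neutral peptide mass: all residues plus one water."""
    return sum(RESIDUE_AVG[aa] for aa in peptide) + WATER


if __name__ == "__main__":
    pep = "LGER"
    b, y = b_ions(pep), y_ions(pep)
    print([round(m, 1) for m in b])   # [114.2, 171.2, 300.3]
    print([round(m, 1) for m in y])   # [175.2, 304.3, 361.4]
    # Complementary pairs (b1, y3), (b2, y2), (b3, y1) each sum to M + 2 protons.
    print([round(b[i] + y[len(y) - 1 - i], 2) for i in range(len(b))])  # [475.54, 475.54, 475.54]
```

With these constants y3 comes out as 361.4 instead of 361.3; differences of about 0.1 Da are expected from the choice and rounding of the mass table.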


Manual De Novo Sequencing

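• Adjacent peaks differ by one residue: 667.27 − 536.24 = 131.03, the residue mass of methionine (M).

• 147.11 − 19 = 128.11 ≈ 128.09, the residue mass of lysine (K); the 19 accounts for the water and proton carried by a y-ion.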
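The same reasoning can be automated by looking up mass differences in a residue-mass table. A minimal sketch, assuming monoisotopic residue masses, a tolerance of 0.02 Da (our choice), and that 147.11 is a singly charged y1 ion, so the slide's 19 is taken as water (18.01056) plus a proton (1.00728). `residues_matching` is a hypothetical helper name.

```python
from typing import List

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.01056  # monoisotopic H2O
PROTON = 1.00728


def residues_matching(delta: float, tol: float = 0.02) -> List[str]:
    """Amino acids whose residue mass lies within tol Da of a mass difference."""
    return sorted(aa for aa, m in MONO.items() if abs(m - delta) <= tol)


if __name__ == "__main__":
    print(residues_matching(667.27 - 536.24))                 # ['M']
    print(residues_matching(147.11 - (WATER_MONO + PROTON)))  # ['K']
```

The tolerance matters: at about 0.04 Da, lysine (128.095) and glutamine (128.059) become indistinguishable, and leucine and isoleucine always share the same mass.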


De Novo Sequencing

• De novo: Latin for "from the beginning".
– By contrast, database search tools match spectra against known peptides.

• Problem definition: given a spectrum (a set of real intervals) and a mass value M, compute a sequence P (an ordered list of real numbers, the residue masses) such that m(P) = M and the matching score is maximized, where m(P) is the sum of the residue masses.


De Novo Sequencing: An Ideal Case

• An ideal tandem mass spectrum is noise-free, contains only b- and y-ions, and every mass peak has the same height.

The task is to find paths connecting two endpoints in a directed acyclic graph.

The problem is: how do we construct the ion ladder?


Ion Ladders in an Ideal Case

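Based on an ideal ion ladder, we can determine the sequence by concatenating prefixes (or suffixes) in order.

However, we cannot determine the ion type of a peak before identifying it.

[Figure: the ion ladders of L–G–E–R on the m/z axis, with the y-ions y1, y2, y3 marked]

Given only the peaks L+, ER+, LGE+, and R+, we do not know which are b-ions and which are y-ions.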
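In the ideal case, reading a sequence off a complete b-ion ladder is just a matter of taking successive differences. A minimal sketch, assuming every listed peak is a singly charged b-ion, average residue masses, and a 0.1 Da tolerance. The table is restricted to G, L, E, and R to keep it short (with a full table, L and I could not be told apart), and the neutral precursor mass 473.5 is inferred from the LGER table via b3 + y1 = 300.3 + 175.2 = 475.5 = M + 2. `read_b_ladder` is a hypothetical name.

```python
from typing import List

# Average residue masses (Da), restricted to the residues of the running example.
RESIDUE_AVG = {"G": 57.0519, "L": 113.1594, "E": 129.1155, "R": 156.1875}
WATER, PROTON = 18.01528, 1.00728


def read_b_ladder(b_ions: List[float], precursor_mass: float, tol: float = 0.1) -> str:
    """Read a sequence off a complete, ideal b-ion ladder.

    Prefix coordinates are 0, each b-ion minus a proton, and the total residue
    mass (neutral precursor mass minus one water); consecutive gaps are residues.
    """
    coords = [0.0] + sorted(b - PROTON for b in b_ions) + [precursor_mass - WATER]
    seq = []
    for lo, hi in zip(coords, coords[1:]):
        gap = hi - lo
        hits = [aa for aa, m in RESIDUE_AVG.items() if abs(m - gap) <= tol]
        if len(hits) != 1:
            raise ValueError(f"gap of {gap:.2f} Da matches {hits}")
        seq.append(hits[0])
    return "".join(seq)


if __name__ == "__main__":
    print(read_b_ladder([114.2, 171.2, 300.3], 473.5))  # LGER
```

The sketch only works because we told it which peaks are b-ions. Fed the mixed list from this slide (114.2, 175.2, 300.3, 304.3) as if all were b-ions, it stops with an error at the 61 Da gap between L+ and R+.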


NC-Spectrum Model

• We generate a superset of the ion ladder.

– A trick: even if we cannot determine the ion types, we know that each ion is either a b-ion or a y-ion.

1. Assume that we want to generate the b-ion ladder.

2. If a peak is a b-ion, add the peak value to the list.

3. If a peak is a y-ion, add the complementary b-ion value to the list.

• Since each peak is added both ways, this phase doubles the number of peaks.


NC-Spectrum Model

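• For the peptide sequence LGER, we construct all possible b-ions with respect to the current spectrum.

• {P1, Q3, P4} and {P2, P3, Q1} are both complete ladders: the first reads L, LG, LGE and the second reads R, ER, GER.

[Figure: a line from 0 to m with midpoint m/2, showing the observed peaks P1 (L), P2 (R), P3 (ER), P4 (LGE) and the artificial peaks Q1 (GER), Q2, Q3 (LG), Q4]

Pi: observed peaks

Qi: artificial peaks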


NC-Spectrum Model

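• Given a peak list {P1, P2, P3, …, Pk} and the precursor mass M, the coordinates of all points along the line are:

1. Pi − 1, reading Pi as a b-ion (the −1 removes the mass of one hydrogen)

2. Qi = M − (Pi − 1) = M − Pi + 1, reading Pi as a y-ion: its complementary b-ion has mass M + 2 − Pi, and removing one hydrogen gives M − Pi + 1

• We still have to add two endpoints:

1. 0

2. M − 18, the total residue mass (the precursor mass minus one water)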
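The coordinate rules above translate directly into code. A minimal sketch, assuming singly charged peaks, the neutral precursor mass M, and the integer offsets 1 and 18 used on the slide; `nc_coordinates` is a hypothetical name. The example uses the four LGER peaks from the NC-spectrum figure (L, R, ER, LGE) and M = 473.5, which is inferred from b3 + y1 = M + 2 rather than given on the slides.

```python
from typing import List


def nc_coordinates(peaks: List[float], M: float) -> List[float]:
    """Candidate prefix-mass coordinates of the NC-spectrum graph.

    peaks: singly charged fragment peaks P1..Pk; M: neutral precursor mass.
    Every peak is read both ways, so k peaks give 2k + 2 coordinates.
    """
    coords = [0.0, M - 18.0]            # endpoints: empty prefix, total residue mass
    for p in peaks:
        coords.append(p - 1.0)          # p read as a b-ion
        coords.append(M - (p - 1.0))    # p read as a y-ion: Q = M - (P - 1)
    return sorted(coords)


if __name__ == "__main__":
    # L (b1), R (y1), ER (y2), LGE (b3) of LGER.
    for c in nc_coordinates([114.2, 175.2, 304.3, 300.3], 473.5):
        print(round(c, 1))
```

The ten coordinates include the true prefix masses 113.2 (L), 170.2 (LG), and 299.3 (LGE), together with decoys such as 303.3 and 360.3 produced by reading a peak the wrong way; 174.2 and 299.3 each appear twice because R+ and LGE+ are complementary.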


NC Spectrum Model: A Summary

• We are given k peaks.

– We now have at most 2k + 2 vertices.

• Two vertices are adjacent if their coordinates differ by the residue mass of some amino acid.

– The spectrum graph can be constructed in O(n²) time, where n ≤ 2k + 2 is the number of vertices. (Why?)

• De novo sequencing then searches for a good path (or paths) from coordinate 0 to M − 18.

– Such a path is not necessarily an ion ladder, though.
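An O(n²) construction simply compares every pair of vertices. A minimal sketch, assuming average residue masses, a 0.1 Da tolerance, and arcs directed toward the larger coordinate (which makes the graph acyclic); `spectrum_graph` is a hypothetical name. The input is the distinct coordinates of the LGER example rounded to 0.1 Da, with the duplicated values merged by hand to keep the output short.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Average residue masses (Da) of the 20 standard amino acids.
AVG = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}


def spectrum_graph(coords: List[float], tol: float = 0.1
                   ) -> Tuple[List[float], Dict[Tuple[int, int], List[str]]]:
    """Build the spectrum graph on sorted coordinates.

    An arc (i, j) with i < j exists when pts[j] - pts[i] matches some residue
    mass within tol; arcs always point toward larger mass, so the graph is a DAG.
    """
    pts = sorted(coords)
    edges = {}
    for i, j in combinations(range(len(pts)), 2):   # all O(n^2) vertex pairs
        gap = pts[j] - pts[i]
        hits = sorted(aa for aa, m in AVG.items() if abs(m - gap) <= tol)
        if hits:
            edges[(i, j)] = hits
    return pts, edges


if __name__ == "__main__":
    pts, edges = spectrum_graph([0.0, 113.2, 170.2, 174.2, 299.3, 360.3, 455.5])
    for (i, j), aas in sorted(edges.items()):
        print(f"{pts[i]:.1f} -> {pts[j]:.1f}: {'/'.join(aas)}")
```

At this tolerance, the only path from 0 to 455.5 spells L (or I), G, E, R; the decoy vertices 174.2 and 360.3 receive no arcs at all. A looser tolerance would start to add spurious arcs: the 186.1 Da gaps lie only about 0.11 Da from tryptophan.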


Dynamic Programming Strategy

• Dynamic programming can solve this problem efficiently.

– Uni-directional (forward) DP does not work, since it could produce a solution containing both candidates (Pi and Qi) generated from the same peak.

[Figure: observed peaks P1–P4 and artificial peaks Q1–Q4 on the line from 0 to m, with midpoint m/2]


Dynamic Programming Strategy (Cont’d)

• Dynamic programming can solve this problem efficiently using a different encoding scheme.

– We approach the middle part from both ends.

[Figure: the same peaks P1–P4 and Q1–Q4, approached from 0 and from m toward m/2]


Dynamic Programming Strategy (Cont’d)

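• Mass(b-ion) + Mass(y-ion) = PrecursorMass + 2 for each complementary pair of singly charged ions.

– Hence the two b-ion candidates generated from each peak form nested pairs around the midpoint m/2 of the spectrum graph.

[Figure: nested pairs on the line from 0 to m]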


Relabeling the Vertices

• To encode the spectrum graph by nested pairs, we relabel the vertices.

1. The vertices are ordered as {0 = x0, x1, x2, …, xk, yk, …, y2, y1, y0 = m}.

2. xi and yi are both generated from the same peak.

3. We go one level further inward in each iteration.

[Figure: x0 … xk to the left of m/2 and yk … y0 to the right, on the line from 0 to m]
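The relabeling can be sketched as follows, assuming the coordinate rules of the NC-spectrum model (each peak P gives P − 1 and M − (P − 1)); `relabel` is a hypothetical name. Because the two members of every pair sum to M, sorting the pairs by their smaller member automatically sorts the larger members in reverse, giving the nested order x0 < x1 < … < xk ≤ M/2 ≤ yk < … < y1 (for realistic peaks, y1 also stays below the endpoint y0 = M − 18).

```python
from typing import List, Tuple


def relabel(peaks: List[float], M: float) -> Tuple[List[float], List[float]]:
    """Order NC-spectrum vertices as nested pairs.

    Peak P gives the pair {P - 1, M - (P - 1)}. The smaller member is x_i and
    the larger is y_i, so x_i + y_i = M for every i >= 1. Sorting the pairs by
    x_i (ascending) therefore sorts the y_i in descending order.
    """
    pairs = sorted(
        (min(p - 1.0, M - (p - 1.0)), max(p - 1.0, M - (p - 1.0))) for p in peaks
    )
    xs = [0.0] + [x for x, _ in pairs]          # x0 = 0
    ys = [M - 18.0] + [y for _, y in pairs]     # y0 = M - 18, the other endpoint
    return xs, ys


if __name__ == "__main__":
    xs, ys = relabel([114.2, 175.2, 304.3, 300.3], 473.5)
    for i, (x, y) in enumerate(zip(xs, ys)):
        print(f"x{i} = {x:.1f}   y{i} = {y:.1f}")
```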


How Dynamic Programming Works

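• We design a (k+1)×(k+1) matrix M (rows x0, …, xk; columns y0, …, yk) for representing partial path candidates.

1. M(i, j) = 1 iff the partial paths [x0, xi] and [yj, y0] can occur simultaneously in a legal path.

2. That is, for every s with 1 ≤ s ≤ max(i, j), exactly one of xs and ys occurs in the two partial paths.

[Figure: partial paths from x0 to xi and from yj to y0; the part between xi and yj is still undetermined]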


How Dynamic Programming Works (Cont’d)

[Figure: vertices x0 x1 x2 x3 x4 y4 y3 y2 y1 y0 on the line from 0 to m, with midpoint m/2]

• x0 and y0 only: M(0,0) = 1

• x0 and y1 → y0: M(0,1) = 1

• x0 → x1 and y0: M(1,0) = 1


How Dynamic Programming Works (Cont’d)

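[Figure: vertices x0 x1 x2 x3 x4 y4 y3 y2 y1 y0 on the line from 0 to m]

Starting from M(0,1) = 1 and M(1,0) = 1, can we extend the left partial path to x2?

• M(1,0) = 1, but we cannot reach x2 from x1 (x1 must stay on the path because y1 is not used), so M(2,0) = 0.

• M(0,1) = 1, and we can reach x2 from x0, so M(2,1) = 1.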


How Dynamic Programming Works (Cont’d)

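[Figure: vertices x0 x1 x2 x3 x4 y4 y3 y2 y1 y0 on the line from 0 to m]

Starting from M(0,1) = 1 and M(1,0) = 1, can we extend the right partial path to y2?

• M(0,1) = 1, but we cannot reach y2 from y1 (y1 must stay on the path because x1 is not used), so M(0,2) = 0.

• M(1,0) = 1, and we can reach y2 from y0, so M(1,2) = 1.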


Dynamic Programming: Preview

• In the i-th iteration, we determine and record all possible partial paths in [0, xi] and [yi, m].

[Figure: partial paths x0 … xi−1 and yt … y0 with t < i − 1; should the next vertex be xi or yi?]


Dynamic Programming: Preview (Cont’d)

Path extension

• How can we reach yi?

• To calculate M(xj, yi) for all j < i:

– For every j < i, check whether yi is adjacent to yt and M(xj, yt) = 1 for some t < i.

– If so, M(xj, yi) = 1. Otherwise, it is 0.

[Figure: the partial paths x0 … xj and yt … y0 are extended to yi]


Dynamic Programming: Preview (Cont’d)

Path extension

• Similarly, how can we reach xi?

• To calculate M(xi, yj) for all j < i:

– For every j < i, check whether xi is adjacent to xt and M(xt, yj) = 1 for some t < i.

– If so, M(xi, yj) = 1. Otherwise, it is 0.

[Figure: the partial paths x0 … xt and yj … y0 are extended to xi]


Dynamic Programming

[Table: the matrix M, with rows x0–x4 and columns y0–y4, for the vertices x0 x1 x2 x3 x4 y4 y3 y2 y1 y0 ordered along the line from 0 to m]


Dynamic Programming: Initialization

[Table: M after initialization: M(0,0) = 1, and the entries in rows x1–x4 are 0]


Dynamic Programming: 1st iteration

We then compute M(1,0) and M(0,1) by checking the arcs (x0, x1) and (y1, y0).

[Table: M after the 1st iteration: M(0,0) = 1, M(0,1) = 1, M(1,0) = 1]


Dynamic Programming: Recursion (a)

For j = 2 to k
  For i = 0 to j − 2
    (a) If M(i, j−1) = 1 and edge(xi, xj) = 1, then M(j, j−1) = 1.

Can we extend the left partial path to xj? In the example, M(0,1) = 1 and x2 is adjacent to x0, so M(2,1) = 1.


Dynamic Programming: Recursion (b)

For j = 2 to k
  For i = 0 to j − 2
    (b) If M(i, j−1) = 1 and edge(yj, yj−1) = 1, then M(i, j) = 1.

Can we extend the right partial path to yj? In the example, y2 is not adjacent to y1, so M(0,2) = 0.


Dynamic Programming: Recursion (c)

For j = 2 to k
  For i = 0 to j − 2
    (c) If M(j−1, i) = 1 and edge(xj−1, xj) = 1, then M(j, i) = 1.

Can we extend the left partial path to xj? In the example, x2 is not adjacent to x1, so M(2,0) = 0.


Dynamic Programming: Recursion (d)

For j = 2 to k
  For i = 0 to j − 2
    (d) If M(j−1, i) = 1 and edge(yi, yj) = 1, then M(j−1, j) = 1.

Can we extend the right partial path to yj? In the example, M(1,0) = 1 and y2 is adjacent to y0, so M(1,2) = 1.
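Recursions (a)-(d) fit in a short loop. A minimal Python sketch, assuming adjacency is supplied as two functions over vertex indices; `fill_table`, `adjacent`, and the edge sets are hypothetical. The edges were chosen by hand so that the output reproduces the final matrix on the "Now for j = 4" slide; they are not derived from a real spectrum.

```python
from typing import Callable, List

Adj = Callable[[int, int], bool]


def fill_table(k: int, edge_x: Adj, edge_y: Adj) -> List[List[int]]:
    """Fill the (k+1) x (k+1) table M with recursions (a)-(d).

    M[i][j] == 1 iff partial paths x0 -> ... -> x_i and y_j -> ... -> y0
    together use exactly one vertex of every pair 1..max(i, j).
    """
    M = [[0] * (k + 1) for _ in range(k + 1)]
    M[0][0] = 1                                   # initialization
    if k >= 1:                                    # 1st iteration: arcs (x0, x1), (y1, y0)
        M[1][0] = int(edge_x(0, 1))
        M[0][1] = int(edge_y(0, 1))
    for j in range(2, k + 1):
        for i in range(j - 1):                    # i = 0 .. j-2
            if M[i][j - 1]:
                if edge_x(i, j):
                    M[j][j - 1] = 1               # (a) extend the left path to x_j
                if edge_y(j, j - 1):
                    M[i][j] = 1                   # (b) extend the right path to y_j
            if M[j - 1][i]:
                if edge_x(j - 1, j):
                    M[j][i] = 1                   # (c) extend the left path to x_j
                if edge_y(i, j):
                    M[j - 1][j] = 1               # (d) extend the right path to y_j
    return M


# Hypothetical adjacencies (pairs of vertex indices), not taken from a real spectrum.
X_EDGES = {(0, 1), (0, 2), (1, 3), (1, 4)}
Y_EDGES = {(0, 1), (0, 2), (1, 3), (2, 3)}


def adjacent(edges):
    """Turn a set of index pairs into a symmetric adjacency test."""
    return lambda a, b: (min(a, b), max(a, b)) in edges


if __name__ == "__main__":
    M = fill_table(4, adjacent(X_EDGES), adjacent(Y_EDGES))
    for i, row in enumerate(M):
        print(f"x{i}", row)
```

The double loop does a constant amount of work per pair (i, j), which is where the O(|V|²) bound for building M comes from.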


Dynamic Programming (Cont’d)

Now for j = 3 (entries with indices 0–3 computed so far):

M     y0  y1  y2  y3
x0    1   1   0   0
x1    1   0   1   1
x2    0   1   0   1
x3    0   0   1   0


Dynamic Programming (Cont’d)

Now for j = 4

M     y0  y1  y2  y3  y4
x0    1   1   0   0   0
x1    1   0   1   1   0
x2    0   1   0   1   0
x3    0   0   1   0   0
x4    0   0   0   1   0


Dynamic Programming: Constructing the Answer

• Legal path: start the search from the outermost level (the last row or column of M), at an entry M(i, j) = 1 whose endpoints xi and yj are adjacent, so that the two partial paths can be joined. In the example, the only 1 in the last row or column is M(4,3).

– Then backtrack level by level, [x4, y4] → [x3, y3] → [x2, y2] → [x1, y1], picking exactly one vertex from each pair.

– Backtracking through M recovers each edge of the feasible solution.
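Backtracking can be sketched by asking, for each entry, which recursion produced it. This block repeats `fill_table` so it runs on its own, and it assumes (our reading of this slide) that a complete path also needs an arc joining the inner endpoints xi and yj of the chosen entry. All names, the edge sets, and the joining arc x4–y3 are hypothetical, chosen to match the example matrix.

```python
from typing import Callable, List, Optional, Tuple

Adj = Callable[[int, int], bool]

# Hypothetical adjacency (a, b) with a < b, chosen to reproduce the example matrix.
X_EDGES = {(0, 1), (0, 2), (1, 3), (1, 4)}
Y_EDGES = {(0, 1), (0, 2), (1, 3), (2, 3)}


def ex(a: int, b: int) -> bool:
    return (min(a, b), max(a, b)) in X_EDGES


def ey(a: int, b: int) -> bool:
    return (min(a, b), max(a, b)) in Y_EDGES


def join(i: int, j: int) -> bool:
    return (i, j) == (4, 3)          # hypothetical arc between x4 and y3


def fill_table(k: int, edge_x: Adj, edge_y: Adj) -> List[List[int]]:
    """Recursions (a)-(d) from the slides."""
    M = [[0] * (k + 1) for _ in range(k + 1)]
    M[0][0] = 1
    if k >= 1:
        M[1][0], M[0][1] = int(edge_x(0, 1)), int(edge_y(0, 1))
    for j in range(2, k + 1):
        for i in range(j - 1):
            if M[i][j - 1] and edge_x(i, j):
                M[j][j - 1] = 1                       # (a)
            if M[i][j - 1] and edge_y(j, j - 1):
                M[i][j] = 1                           # (b)
            if M[j - 1][i] and edge_x(j - 1, j):
                M[j][i] = 1                           # (c)
            if M[j - 1][i] and edge_y(i, j):
                M[j - 1][j] = 1                       # (d)
    return M


def predecessor(M, i: int, j: int, edge_x: Adj, edge_y: Adj) -> Tuple[int, int]:
    """Table entry from which M[i][j] = 1 was derived."""
    if i > j:                                  # the x side was extended last
        if i - 1 > j:
            return i - 1, j                    # forced: x_(j+1)..x_i are consecutive
        for t in range(i):                     # i == j + 1: jumped from some x_t
            if M[t][j] and edge_x(t, i):
                return t, j
    else:                                      # the y side was extended last
        if j - 1 > i:
            return i, j - 1
        for t in range(j):
            if M[i][t] and edge_y(t, j):
                return i, t
    raise ValueError(f"M[{i}][{j}] has no predecessor")


def best_path(k: int, edge_x: Adj, edge_y: Adj, close: Adj) -> Optional[List[str]]:
    """One legal path x0 -> ... -> xi -> yj -> ... -> y0, or None."""
    M = fill_table(k, edge_x, edge_y)
    ends = [(i, j) for i in range(k + 1) for j in range(k + 1)
            if max(i, j) == k and M[i][j] and close(i, j)]
    if not ends:
        return None
    i, j = ends[0]
    prefix, suffix = [f"x{i}"], [f"y{j}"]
    while (i, j) != (0, 0):
        ni, nj = predecessor(M, i, j, edge_x, edge_y)
        if ni != i:
            prefix.append(f"x{ni}")
        if nj != j:
            suffix.append(f"y{nj}")
        i, j = ni, nj
    return prefix[::-1] + suffix


if __name__ == "__main__":
    print(best_path(4, ex, ey, join))   # ['x0', 'x1', 'x4', 'y3', 'y2', 'y0']
```

The result uses x1, y2, y3, x4: exactly one vertex from each pair, as the matrix definition requires.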


Dynamic Programming: Review

• Chen et al. create a new NC-spectrum graph G = (V, E), where |V| = 2k + 2 and k is the number of mass peaks (ions).

• Given the NC-spectrum graph, we can solve the ideal de novo peptide sequencing problem in O(|V|²) time and O(|V|²) space.

– Constructing M: O(|V|²) time

– Constructing a feasible solution from M: O(|V|) time

• Therefore we find a feasible solution in O(|V|²) time and O(|V|²) space.


Noise in Real Spectra

• The de novo strategy is too fragile to handle frequent errors.

1. False negative peaks
• Missing ions break the path; the algorithm may then find wrong paths by concatenating two partial paths.

2. False positive peaks
• The main critique of the de novo strategy

3. Peak value is not the ion mass
• Peak values represent the mass-to-charge ratio (m/z) of ions.

• This depends on the vendor (e.g., Applied Biosystems).
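Point 3 can be made concrete with the usual conversion between m/z and neutral mass. A minimal sketch, assuming the ions are charged by protonation ([M + zH]z+), as is typical for peptides in positive-ion mode, and that the charge state z is already known (determining z is a separate problem). The function names are hypothetical; 473.53 is the neutral mass of LGER computed from average residue masses.

```python
PROTON = 1.00728  # mass of a proton (Da)


def mz_from_mass(mass: float, z: int) -> float:
    """m/z of a neutral molecule of the given mass carrying z extra protons."""
    return (mass + z * PROTON) / z


def mass_from_mz(mz: float, z: int) -> float:
    """Neutral mass recovered from an observed m/z, assuming charge z by protonation."""
    return z * mz - z * PROTON


if __name__ == "__main__":
    for z in (1, 2, 3):
        mz = mz_from_mass(473.53, z)
        print(z, round(mz, 2), round(mass_from_mz(mz, z), 2))
```

The same peptide therefore shows up at very different positions on the m/z axis depending on its charge, which is why a peak value cannot be used as a mass until z is known.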


False Positives in Real Spectra

• Different types of ions
– a/x, b/y, and c/z ions
– Internal fragments and immonium ions

• Neutral losses
– Loss of water (~18 Da)
– Loss of ammonia (~17 Da)

• PTMs (post-translational modifications, like adding new letters to the alphabet)
– Phosphorylation, glycopeptides

• Isotopes

• Unpurified samples


Database Search Tools

• MASCOT: http://www.matrixscience.com/

• The de facto identification tool


Database Search Tools (Cont’d)

• Brian Searle of Proteome Software informs us:


Peptide and Protein Identification

• A brief comparison of popular tools

Scoring strategy: representatives
– Correlation, Z-score, posterior probabilities: SEQUEST, MS-Tag, Scope, CIDentify, Popitam, ProbID, and PepSearch
– Statistical significance (E-values or P-values): Mascot, Sonar, InsPecT, OMSSA, and X!Tandem

De novo sequencing
– Pseudo-peaks: PEAKS
– Spectrum graphs: Lutefisk, PepNovo, AUDENS
– Statistical models: NovoHMM


Wrap Up

• MS/MS measures the masses (m/z) of fragment ions.
– A single-stage MS measures a collection of peptides.

• We generate ion ladders to reconstruct the peptide sequence.
– Masses are more reliable than intensities.

• De novo sequencing is an elegant strategy, but it is not robust.
– We need signal preprocessing strategies.

• Database search tools cannot handle novel proteins, and results from different tools are often inconsistent.
– Integrating the two methods may be a possible way forward.


Some Guys You May Wish to Know

Affiliation: principal investigators (topics)

• ETH Zurich: Ruedi Aebersold (PeptideAtlas, statistical significance estimation)
• UCSD: Pavel Pevzner, Vineet Bafna (de novo sequencing: multi-spectra alignment)
• Waterloo: Bin Ma (de novo sequencing: SPIDER, PEAKS)
• NIH: Yi-Kuo Yu (signal calibration, statistical significance estimation)
• Xerox: Andrew Goldberg, Marshall Bern (PTMs)
• Georgetown: Nathan Edwards (peptide identification)
• USC: Tim Chen (de novo sequencing)