PEAKS: De Novo Sequencing using MS/MS spectra Bin Ma, U. Western Ontario, Canada Kaizhong Zhang, U. Western Ontario, Canada Chengzhi Liang, Bioinformatics Solutions Inc. Canada

Jan 15, 2016


PEAKS: De Novo Sequencing using MS/MS spectra

Bin Ma, U. Western Ontario, Canada

Kaizhong Zhang, U. Western Ontario, Canada

Chengzhi Liang, Bioinformatics Solutions Inc. Canada


Outline

• Background – Tandem Mass Spectrometry

• De novo sequencing – Problem Definition and Algorithm

• Software implementation – PEAKS

• Future work


Background

• Humans have an estimated 100,000 different proteins. Because of post-translational modifications, each protein can exist in many different versions.

• Diseases are closely related to abnormal proteins or to abnormal expression levels of proteins.

• Given a tissue, identifying the proteins (and their modified versions) in it is a fundamental problem for drug design.


Proteins and Peptides

• A protein is a sequence of 20 different types of amino acids.

– A protein is a string over an alphabet of size 20.

• A peptide is a substring of the protein.

• The 20 amino acids have 19 distinct masses.

– I and L have the same mass and cannot (or only with difficulty) be distinguished by MS/MS.

– Regard them as the same letter.
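The mass claim above can be checked directly. The sketch below uses standard monoisotopic residue masses in Da; the table name `RESIDUE_MASS` and the helper `distinct_masses` are illustrative and not part of PEAKS.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def distinct_masses(table):
    """Return the sorted list of distinct mass values in the table."""
    return sorted(set(table.values()))

print(len(RESIDUE_MASS))                       # number of amino acids
print(len(distinct_masses(RESIDUE_MASS)))      # number of distinct masses
print(RESIDUE_MASS["I"] == RESIDUE_MASS["L"])  # I and L are isobaric
```

Note that K and Q differ by only about 0.036 Da, so they are distinct in the table but can still be hard to separate on a low-resolution instrument.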


Tandem Mass Spectrometry

• MS/MS is the only reliable way for protein identification.

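[Figure: sample preparation workflow, tissue → fraction → gel → protein → peptide. The protein sequence …VITK | GTDIMNEMR | SMW… is shown split into peptides at the marked positions.]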


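[Figure: a peptide (sequence LGSSEVEQVQLVVDGVK) enters the tandem mass spectrometer, which produces an MS/MS spectrum. The sequence LGSSEVEQVQLVVDGVK is recovered from the spectrum by de novo sequencing; a database is shown as the other route.]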


How Does a Peptide Fragment?

m(y1) = 19 + m(A4)
m(y2) = 19 + m(A4) + m(A3)
m(y3) = 19 + m(A4) + m(A3) + m(A2)

m(b1) = 1 + m(A1)
m(b2) = 1 + m(A1) + m(A2)
m(b3) = 1 + m(A1) + m(A2) + m(A3)
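As a small illustration of these formulas, the sketch below lists the b- and y-ion masses of a four-residue peptide. It keeps the slide's nominal offsets of 1 and 19; the residue masses are standard monoisotopic values, and the function names `b_ions` and `y_ions` are made up for this example.

```python
# Monoisotopic residue masses (Da) for the residues used in the example.
RESIDUE_MASS = {"G": 57.02146, "V": 99.06841, "L": 113.08406, "R": 156.10111}

def b_ions(peptide):
    """m(b_i) = 1 + mass of the first i residues, for i = 1..n-1."""
    masses, total = [], 0.0
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        masses.append(1 + total)
    return masses

def y_ions(peptide):
    """m(y_i) = 19 + mass of the last i residues, for i = 1..n-1."""
    masses, total = [], 0.0
    for aa in reversed(peptide[1:]):
        total += RESIDUE_MASS[aa]
        masses.append(19 + total)
    return masses

# Peptide A1 A2 A3 A4 = G V L R
print([round(x, 3) for x in b_ions("GVLR")])  # b1, b2, b3
print([round(x, 3) for x in y_ions("GVLR")])  # y1, y2, y3
```

With these offsets, m(b_i) + m(y_{n-i}) is always the total residue mass plus 20, which is why a b-ion and its complementary y-ion carry the same information about the cut position.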


Matching Sequence with Spectrum


De Novo Sequencing

• For any peptide P = a1…an, m(P) = Σi m(ai), the sum of the masses of its residues.

• De novo sequencing problem

– Given a spectrum and a mass value m, compute a sequence P such that m(P) = m and the matching score score(P) is maximized.
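To make the problem statement concrete, here is a brute-force sketch that enumerates every sequence with the required mass and keeps the best-scoring one. Everything specific in it is an assumption for illustration: the reduced five-letter alphabet with nominal integer masses, the toy `score` (total intensity of peaks hit by a b- or y-ion), and all function names. It is exponential in the peptide length and is not the PEAKS algorithm.

```python
# Hypothetical reduced alphabet with nominal (integer) residue masses.
MASS = {"A": 71, "S": 87, "V": 99, "L": 113, "R": 156}

def peptide_mass(peptide):
    """m(P): sum of the residue masses."""
    return sum(MASS[aa] for aa in peptide)

def score(peptide, spectrum):
    """Toy score (an assumption): total intensity of peaks hit by b- or y-ions."""
    ions = set()
    for i in range(1, len(peptide)):
        ions.add(1 + peptide_mass(peptide[:i]))    # b-ion of the length-i prefix
        ions.add(19 + peptide_mass(peptide[i:]))   # y-ion of the complementary suffix
    return sum(spectrum.get(mz, 0.0) for mz in ions)

def sequences_with_mass(m):
    """Yield every string over the alphabet whose residue masses sum to m."""
    if m == 0:
        yield ""
        return
    for aa, w in MASS.items():
        if w <= m:
            for rest in sequences_with_mass(m - w):
                yield aa + rest

def de_novo_brute_force(spectrum, m):
    """Return the best-scoring sequence with m(P) = m, or None if none exists."""
    return max(sequences_with_mass(m),
               key=lambda p: score(p, spectrum), default=None)

# Peaks placed at b1, b2, y1, y2 of the peptide LVR (m = 113 + 99 + 156 = 368).
spectrum = {114: 1.0, 213: 1.0, 175: 1.0, 274: 1.0}
print(de_novo_brute_force(spectrum, 368))
```

The number of candidate sequences grows exponentially with m, which is the motivation for a dynamic programming formulation.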


A Simpler Case – Only Y-ions


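Y-ions Determined By a Suffix

[Figure: a peptide with its y-ions y1, y2, y3 marked; the offset 19 is labeled.]

• score(Q) can be defined for a suffix Q.

$$DP(u) = \max_{m(Q) = u} \text{score}(Q)$$

$$\text{score}(LVR) = \text{score}(VR) + f(u)$$

$$DP(u) = \max_{a} \bigl[ DP(u - a) + f(u) \bigr]$$

Here a ranges over the amino acid masses.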
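The y-ion-only recurrence DP(u) = max over a of DP(u − a) + f(u) can be sketched as follows. The assumptions are loud ones: masses are nominal integers over a hypothetical five-letter alphabet, and f(u) is taken to be the intensity of the peak at u + 19 (zero if there is none). The slides do not spell out f, so this is only one plausible choice.

```python
# Hypothetical reduced alphabet with nominal (integer) residue masses.
MASS = {"A": 71, "S": 87, "V": 99, "L": 113, "R": 156}

def sequence_y_only(spectrum, m):
    """Best sequence of total residue mass m when only y-ions are scored.

    dp[u] is the best score of any suffix of mass u; first[u] remembers the
    first letter of that suffix so the sequence can be rebuilt by backtracking.
    """
    neg_inf = float("-inf")
    dp = [neg_inf] * (m + 1)
    first = [None] * (m + 1)
    dp[0] = 0.0
    for u in range(1, m + 1):
        f_u = spectrum.get(u + 19, 0.0)   # assumed f(u): y-ion peak at u + 19
        for aa, w in MASS.items():
            if w <= u and dp[u - w] != neg_inf and dp[u - w] + f_u > dp[u]:
                dp[u] = dp[u - w] + f_u
                first[u] = aa
    if dp[m] == neg_inf:
        return None, neg_inf
    # Backtracking: peel the first letter off the suffix until nothing is left.
    peptide, u = [], m
    while u > 0:
        peptide.append(first[u])
        u -= MASS[first[u]]
    return "".join(peptide), dp[m]

# Peaks at the y-ions of LVR: suffix masses 156 (R), 255 (VR), 368 (LVR), each + 19.
spectrum = {175: 1.0, 274: 1.0, 387: 1.0}
print(sequence_y_only(spectrum, 368))
```

The table has m + 1 entries and each entry tries every letter, so the running time is proportional to m times the alphabet size.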


Counting Both y and b ions


Strategies

• Consider a prefix R and a suffix Q simultaneously.

• Consider only those pairs (R,Q) that satisfy a nice property, which we call “chummy”.

• Chummy pairs have two useful properties:

– The score of a chummy pair can be computed recursively from a smaller chummy pair.

– There is a series of chummy pairs that grows to the optimal solution.


Dynamic Programming

• Combining Lemmas A and B, we can compute

$$DP(u,v) = \max_{\substack{(R,Q)\ \text{chummy} \\ m(R)=u,\ m(Q)=v}} \text{score}(R,Q)$$

• Suppose (R,Q) is the pair maximizing DP(u,v) under the condition m(R) + m(Q) + m(a) = m for some letter a. Then RaQ is the optimal peptide.


PEAKS – The Software


Comparison of PEAKS and Lutefisk

Legend in the original slide: red = correct.

m/z     z  Correct sequence  PEAKS (de novo)     Comments   Lutefisk (de novo)

MALDI MS/MS BSA
927.4   1  YLYEIAR           YLYEIAR             correct    [276.14]EY[184.08]R
1439.7  1  RHPEYAVSVLLR      GVLMVDVPPADNGR      Wrong (?)  No results
1479.8  1  LGEYGFQNALIVR     LWYGFQNALIVR        correct    No results
1639.8  1  KVPQVSTPTLVEVSR   RAPKVPQVSTPTLVEVSR  correct    No results

ESI MS/MS Cyt-c
482.7   2  EDLIAYLK          EDLIAYLK            correct    [357.15]LAYLK
584.8   2  TGPNLHGLFGR       TGPNLHGLFGR         correct    TGPNLHGLFGR
589.3   1  GDVEK             VDVEK               V = Ac-G   VDVEK
634.4   1  IFVQK             IFVQK               correct    IFVQK
678.3   1  YIPGTK            YIPGTK              correct    YIPGTK
728.8   2  TGQAPGFSYTDANK    TGQAPGFSYTDANK      correct    [199.10]SAPGF[250.09]TWNK
779.4   1  MIFAGIK           MIFAGIK             correct    [244.12]FAGLK
792.9   2  KTGQAPGFSYTDAMK   KTGAGAPGFSYTDAMK    almost     [229.15]QGAPGAYQNHANK
817.3   2  IFVQKCAQCHTVEK    QFVTHMACCHTVEK      partial    [257.08][218.08][GP][260.08][HM]TVEK

Apo-Myoglobin
662.3   1  ASEDLK            ASEDLK              correct    [244.07]SALK
689.9   2  HGTVVLTALGGILK    HGTVVLTALGGILK      correct    HGTVVLTALG[170.1]LK
748.4   1  ALELFR            ALELFR              correct    [184.12]ELFR
803.9   2  VEADIAGHGQEVLIR   LDADIAGHGQEVLIR     almost     No results
908.4   2  GLSDGEWQQVLNVWGK  GLSDGEWQQVLNVWGK    correct    [170.11]SG[244.07]WQQVLNVWGK
943.2   2  YLEFISDAIIHVLHSK  YLEFISDAIIHVLHSK    correct    [276.1]EFLSD[184.12]LHVLHSK
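Two rows of the comparison differ from the reference sequence only by substitutions of nearly equal mass: GE versus W in the m/z 1479.8 row, and the comment "V = Ac-G" in the m/z 589.3 row. The sketch below computes how close these masses are, using standard monoisotopic values. Reading the "correct" label as "mass-equivalent" is an interpretation, not something the slide states.

```python
# Monoisotopic residue masses (Da) for the residues involved.
RESIDUE_MASS = {"G": 57.02146, "E": 129.04259, "W": 186.07931, "V": 99.06841}
ACETYL = 42.01057  # mass added by acetylation (C2H2O), standard value

def mass_gap(x, y):
    """Absolute mass difference in Da."""
    return abs(x - y)

# G + E versus W
gap_ge_w = mass_gap(RESIDUE_MASS["G"] + RESIDUE_MASS["E"], RESIDUE_MASS["W"])
# acetylated G versus V
gap_acg_v = mass_gap(ACETYL + RESIDUE_MASS["G"], RESIDUE_MASS["V"])

print(f"GE vs W:   {gap_ge_w:.4f} Da")
print(f"Ac-G vs V: {gap_acg_v:.4f} Da")
```

Both gaps are a few hundredths of a dalton, so a spectrum with a mass tolerance wider than that cannot tell the alternatives apart from fragment masses alone.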


Users


Implementation Particulars

• More accurate scoring:

– sum of the logarithmic intensities

– many other ion types

– coexisting ions, e.g., x2, y2, z2

• Deconvolution

– converting multiply-charged peaks to singly-charged ones

• Recalibration

– compress/stretch the spectrum to correct for calibration error

• Noise reduction
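The deconvolution step can be illustrated with the textbook charge-state conversion. This is a minimal sketch assuming the charge z of each peak is already known; how PEAKS determines charge states and handles isotope patterns is not described in the slides, and `to_singly_charged` is a hypothetical helper name.

```python
PROTON = 1.00728  # mass of a proton in Da

def to_singly_charged(mz, z):
    """Map a peak observed at m/z with charge z to its m/z at charge 1.

    An ion of neutral mass M and charge z appears at (M + z * PROTON) / z,
    so the singly charged equivalent M + PROTON equals mz * z - (z - 1) * PROTON.
    """
    if z < 1:
        raise ValueError("charge must be a positive integer")
    return mz * z - (z - 1) * PROTON

# A doubly charged peak at m/z 482.7 and a singly charged peak at m/z 634.4.
print(round(to_singly_charged(482.7, 2), 4))
print(round(to_singly_charged(634.4, 1), 4))
```

After this conversion every peak lives on the same singly charged mass scale, so the b- and y-ion formulas can be applied without tracking charge.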


Acknowledgement

• Bin Ma and Kaizhong Zhang were supported by NSERC.

• Chengzhi Liang was supported by BSI.

• Thanks to the development team at BSI for the software development.


Tandem Mass Spectrometer

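[Figure: schematic of a tandem mass spectrometer. Peptides such as MPSER, SG… and PAK enter as precursor ions; a mass analyzer passes the precursor ion PAK+, which is fragmented into fragment ions (P+, PA+, K+, AK+); a second mass analyzer and an ion detector record them. The recorded spectrum is the input to de novo sequencing.]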


Algorithm Sandwich

• DP(0,0) = 0; DP(u,v) = −∞ for (u,v) ≠ (0,0);

• for u from 1 to m/2 do
    for v from u − max(a) to u + max(a) do
      for a in Σ do
        if u < v then
          DP(u+a, v) = max{ DP(u+a, v), DP(u,v) + f(u,v,a) }
        else
          DP(u, v+a) = max{ DP(u, v+a), DP(u,v) + g(u,v,a) }

• find u, v, a s.t. u + v + a = m and DP(u,v) is maximized;

• backtracking;

Here a denotes both a letter of Σ and its mass.


Dynamic Programming

1. for u from 0 to m
     DP(u) = max_a [ DP(u − a) + f(u) ]

2. backtracking


Dynamic Programming

$$DP(u,v) = \max_{\substack{R\ \text{is a prefix},\ Q\ \text{is a suffix} \\ m(R)=u,\ m(Q)=v}} \text{score}(R,Q)$$

• We hope DP(u,v) for u+v=m gives the optimal prefix and suffix.

• The optimal solution can be obtained by concatenating the prefix and the suffix.


Chummy Pairs

• Two strings Ra and bQ are called a chummy pair iff either of the following two is true:

(C1) $$m(R) + 1 < m(bQ) + 19 \le m(Ra) + 1$$

(C2) $$m(Q) + 19 \le m(Ra) + 1 < m(bQ) + 19$$

• Examples:

– (LGE, LVR) satisfies (C2)

– (LGE, VR) satisfies (C1)

– (LGE, R) satisfies (C1)

– (LG, VR) is not chummy
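The definition can be checked mechanically on the slide's four examples. The sketch below assumes standard monoisotopic residue masses and assumes the conditions read m(R)+1 < m(bQ)+19 ≤ m(Ra)+1 for (C1) and m(Q)+19 ≤ m(Ra)+1 < m(bQ)+19 for (C2). The ordering of the three terms is recoverable from the slide, but the strictness of each inequality is a guess, and `chummy_case` is a hypothetical helper name.

```python
# Monoisotopic residue masses (Da) for the residues used in the examples.
RESIDUE_MASS = {"L": 113.08406, "G": 57.02146, "E": 129.04259,
                "V": 99.06841, "R": 156.10111}

def mass(s):
    """Sum of residue masses; the empty string has mass 0."""
    return sum(RESIDUE_MASS[c] for c in s)

def chummy_case(prefix, suffix):
    """Classify (Ra, bQ): return "C1", "C2", or None if the pair is not chummy.

    prefix is Ra (a is its last letter); suffix is bQ (b is its first letter).
    """
    if not prefix or not suffix:
        return None
    r, q = prefix[:-1], suffix[1:]
    b_r, b_ra = mass(r) + 1, mass(prefix) + 1      # b-ion masses of R and Ra
    y_q, y_bq = mass(q) + 19, mass(suffix) + 19    # y-ion masses of Q and bQ
    if b_r < y_bq <= b_ra:      # assumed form of (C1)
        return "C1"
    if y_q <= b_ra < y_bq:      # assumed form of (C2)
        return "C2"
    return None

for pair in [("LGE", "LVR"), ("LGE", "VR"), ("LGE", "R"), ("LG", "VR")]:
    print(pair, chummy_case(*pair))
```

Intuitively, a pair is chummy when the most recently added b-ion and y-ion interleave, so neither the prefix nor the suffix has run far ahead of the other.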


Chummy Pairs

• Lemma A – Suppose Ra and bQ are a chummy pair, and let u = m(Ra), v = m(bQ).

If (C1) is true,

$$\text{score}(Ra, bQ) = \text{score}(Ra, Q) + f(u, v)$$

If (C2) is true,

$$\text{score}(Ra, bQ) = \text{score}(R, bQ) + g(u, v)$$


Chummy Pairs

• Lemma B – Let P be the optimal solution. Then there is a chummy pair (R,Q) and a letter a such that P = RaQ. Also, there is a series of chummy pairs such that

$$(R_1, Q_1),\ \ldots,\ (R_n, Q_n) = (R, Q)$$