Mass Spectrometry in a drug discovery setting Claus Andersen Senior Scientist Sienabiotech Spa
Jan 24, 2016
Bioinformatics and statistics in a drug discovery company, Claus Andersen
Overview
• From genes to phenotype
• Proteins: an introduction
• Mass Spec for protein identification, quantification, and characterization
• Mass Spec data
• Mass Spec data analysis
• Mass Spec database searching
• Recent advances
From genes to phenotype
genes → proteins → functions → pathways → metabolites → phenotypes
[Diagram labels: mRNA expression, Regulation, Degradation, Activation/inactivation, Interactions, Kinematics, Protein abundance, Metabolite levels, ADME/Tox, Structure, Pharmacophore, Genome comparison]
Proteins as functional units
[Figure labels: Glucose, ATP, Myosin. Image credits: D.S. Goodsell, pdb.org; Vale and Milligan, Science 2000]
What affects the proteome
Cellular proteome, affected by:
• Genome
• mRNA
• Ribosome (protein production)
• Proteasome (protein degradation)
• Interactions
• Temperature
• Stress
• Environment
• Physiological role
• Pharmaceutical substances
Mass Spec on proteins
• Samples: Treated/Sick and Control/Healthy
• Protein extraction and digestion
• Protein peptides
• HPLC
• Mass Spectrometer
• MS spectra and MS/MS spectra
• Identification, quantification, characterization
• Characterization includes post translational modifications (PTM): Phosphorylation, Oxidation, …
Example peptides: KKYAAELHLV, KAVQQPDGLA, QFHFHWGSLDQPDGLA
Mass Spec data
5 g
• 3000 MS spectra: 500 MB
• 400 MS/MS spectra: 200 MB
• Total: 700 MB
Gygi et al. Mol. Cell Bio. (1999)
Mass Spec data analysis
• Fourier transformation (noise filtering)
• Gaussian peak fitting (peak detection)
• Generation of theoretical spectra (sequence → spectra)
• Large scale spectral comparison (DB searching)
• Spectral deconvolution (de-novo sequencing)
• Large scale sequence searching (DB searching)
• Data fitting (quantitation)
• Statistics and probability theory (reliability estimation)
• Linear discriminant analysis (quality assessment)
• … and lots more
Large scale spectral comparison (DB searching)
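The "generation of theoretical spectra" step in the list above can be sketched as follows. This is a minimal illustration, not the method of any particular search engine: it assumes singly charged b- and y-type fragment ions (a common convention that the slides do not name), unmodified residues, and standard monoisotopic residue masses rounded to five decimals. The function name `fragment_ions` is hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, rounded).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # charge carrier


def fragment_ions(peptide):
    """Return (b_ions, y_ions) m/z lists for a singly charged peptide.

    b_i covers the first i residues, y_i the last i residues,
    for i = 1 .. len(peptide) - 1.
    """
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("KAVQQPDGLA")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:.4f}   y{i} = {yi:.4f}")
```

A peptide of n residues yields 2(n - 1) such fragments, which is the in-silico spectrum that gets compared against the measured MS/MS peaks.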
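Large scale spectral comparison
Mass spec data
• MS spectrum
• MS/MS Spectrum
In-silico data
• Protein sequence DB: ~2 mil
• Protein peptides: ~60 mil
• Peptide fragments: ~2000 mil
Example protein sequence: FLIDSSRFSYPERPIIFLSMCYNIYSIAYIVRLTVGRERISCDFEEAAEPVLIQEGLKNT
Example peptide: ERPIIFLSMCYNIYSIAYIV
Fragments from the N-terminal side: ERPIIFLSMCYNIYSIAYIV, ERPIIFLSMCYNIYSIAYI, ERPIIFLSMCYNIYSIAY, ERPIIFLSMCYNIYSIA, ERPIIFLSMCYNIYSI, ERPIIFLSMCYNIYS, ERPIIFLSMCYNIY, ERPIIFLSMCYNI, ERPIIFLSMCYN, ERPIIFLSMCY, ERPIIFLSMC, ERPIIFLSM…
Fragments from the C-terminal side: V, IV, YIV, AYIV, IAYIV, SIAYIV, YSIAYIV, IYSIAYIV, NIYSIAYIV…
etc.
Candidate peptides are selected by mass, (M_peptide+H)+ ± Δ; for each candidate peptide i the fragment counts N_i and K_i are recorded.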
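The candidate selection by precursor mass, (M_peptide+H)+ ± Δ, can be sketched as below. Several things here are assumptions made for illustration only: the slides mention digestion but not the enzyme, so trypsin rules are assumed (cleave after K or R unless followed by P), masses are monoisotopic, and the names `tryptic_peptides`, `mh_plus` and `candidates` are hypothetical. The protein sequence is the example sequence from the slide.

```python
import re

# Monoisotopic residue masses in Da (standard values, rounded).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_peptides(protein):
    """In-silico digest: split after K/R, but not before P (assumed rule)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def mh_plus(peptide):
    """Singly protonated peptide mass, (M+H)+."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def candidates(proteins, observed_mh, delta):
    """All digested peptides whose (M+H)+ lies within observed_mh ± delta."""
    return [
        pep
        for protein in proteins
        for pep in tryptic_peptides(protein)
        if abs(mh_plus(pep) - observed_mh) <= delta
    ]


if __name__ == "__main__":
    db = ["FLIDSSRFSYPERPIIFLSMCYNIYSIAYIVRLTVGRERISCDFEEAAEPVLIQEGLKNT"]
    print(tryptic_peptides(db[0]))
    print(candidates(db, observed_mh=545.34, delta=0.01))
```

Only the peptides that pass this mass filter need their fragments generated and compared with the MS/MS spectrum, which keeps the comparison tractable for a database of this size.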
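Large scale spectral comparison
PEP_PROBE by Sadygov and Yates, Anal. Chem. 75, 2003

Hypergeometric probability model

$$P_{K,N}(K_i, N_i) = \frac{\binom{K}{K_i}\,\binom{N-K}{N_i-K_i}}{\binom{N}{N_i}}$$

where

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

is the binomial coefficient, and

$$K = \sum_i K_i, \qquad N = \sum_i N_i$$

Here N_i is the number of fragments of candidate peptide i and K_i is the number of those fragments that match peaks in the MS/MS spectrum.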
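As a small worked illustration of the hypergeometric model, the probability of seeing exactly K_i matches among N_i fragments, when K of all N fragments match, can be evaluated with exact integer arithmetic. This is a sketch under the reading that K and N are the totals over all candidate peptides; the function name `hypergeom_pmf` and the toy numbers are hypothetical.

```python
from fractions import Fraction
from math import comb  # Python 3.8+


def hypergeom_pmf(K, N, K_i, N_i):
    """C(K, K_i) * C(N - K, N_i - K_i) / C(N, N_i) as an exact fraction."""
    if not (0 <= K <= N and 0 <= N_i <= N):
        raise ValueError("need 0 <= K <= N and 0 <= N_i <= N")
    if K_i < 0 or K_i > N_i:
        return Fraction(0)
    return Fraction(comb(K, K_i) * comb(N - K, N_i - K_i), comb(N, N_i))


if __name__ == "__main__":
    # Toy numbers: 50 fragments in total, 10 of them match the spectrum,
    # and one candidate peptide contributes 8 fragments.
    K, N, N_i = 10, 50, 8
    for K_i in range(N_i + 1):
        print(K_i, float(hypergeom_pmf(K, N, K_i, N_i)))
```

Summing the values over all possible K_i gives exactly 1, which is a quick sanity check that the three binomial coefficients are combined correctly.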
Large scale spectral comparison
Expectation value (E-value)
Sadygov and Yates Anal. Chem. 2003

$$E = N_{(M+H)} \cdot P_{HM}(P \le P_0)$$

where $P_{HM}(P \le P_0)$ is the cumulative distribution function given by the hypergeometric model, evaluated at the probability $P_0$ of the match being scored, and $N_{(M+H)}$ is the number of all peptides in the database matching the (M+H)+ mass value.

The E-value tells you how many peptides from the database are expected to have the same or better matches to the experimental spectrum by chance alone.
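The idea behind the E-value can be sketched numerically. The version below is one plausible reading and not necessarily the exact PEP_PROBE procedure: it takes "the same or better match by chance" to mean the hypergeometric upper tail, P(X ≥ K_i), and multiplies it by the number of database peptides that share the precursor mass. The function names and the example numbers are hypothetical.

```python
from fractions import Fraction
from math import comb  # Python 3.8+


def tail_probability(K, N, K_i, N_i):
    """P(X >= K_i) for N_i fragments drawn from N, of which K match.

    Assumes 0 <= K <= N, 0 <= N_i <= N and K_i >= 0.
    """
    upper = min(K, N_i)
    favourable = sum(
        comb(K, k) * comb(N - K, N_i - k) for k in range(K_i, upper + 1)
    )
    return Fraction(favourable, comb(N, N_i))


def e_value(n_candidates, K, N, K_i, N_i):
    """Expected number of candidates matching at least this well by chance."""
    return n_candidates * tail_probability(K, N, K_i, N_i)


if __name__ == "__main__":
    # Hypothetical search: 25 peptides share the precursor mass.
    print(float(e_value(25, K=300, N=2000, K_i=12, N_i=30)))
```

An E-value well below 1 means that a match this good is unlikely to arise by chance among the candidates; an E-value near or above 1 means the match is not distinguishable from random.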
Large scale spectral comparison
Sadygov and Yates Anal. Chem. 2003
An example from yeast (Saccharomyces cerevisiae)

MS/MS spectrum: (M+H)+ = 2076.010 ± 0.002 AMU
Yeast proteins: 6 200
Yeast peptides: ~200 000
Peptide fragments: ~5 mil
N = 569 160
K = 84 150

Top candidate peptides
Peptide                 N1   K1   E-value     Protein name
ATHILDFGPGGASGLGVLTHR   40   34   10^-26.62   FAS1
LTPPQLPPQLENVILNKY      34   15   10^-5.25    SIP2
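The yeast numbers above can be plugged into the hypergeometric formula directly. With N in the hundreds of thousands it is convenient to work with logarithms of factorials instead of the factorials themselves. The sketch below does this with `math.lgamma`. It assumes that 40 and 34 are N1 and K1 for the FAS1 peptide and 34 and 15 for the SIP2 peptide, which is consistent with a peptide of n residues having 2(n - 1) fragments. The function names are hypothetical. The values printed are point probabilities from the model, not E-values, so they are not expected to reproduce the E-values on the slide. They do rank the two peptides the same way.

```python
from math import lgamma, log


def log10_choose(n, k):
    """log10 of the binomial coefficient C(n, k), via log-gamma."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(10)


def log10_hypergeom(K, N, K_i, N_i):
    """log10 of C(K, K_i) * C(N - K, N_i - K_i) / C(N, N_i)."""
    return (
        log10_choose(K, K_i)
        + log10_choose(N - K, N_i - K_i)
        - log10_choose(N, N_i)
    )


if __name__ == "__main__":
    N, K = 569_160, 84_150
    top_candidates = {
        "ATHILDFGPGGASGLGVLTHR (FAS1)": (40, 34),  # (N1, K1)
        "LTPPQLPPQLENVILNKY (SIP2)": (34, 15),
    }
    for peptide, (n1, k1) in top_candidates.items():
        print(peptide, f"log10 P = {log10_hypergeom(K, N, k1, n1):.2f}")
```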
Large scale spectral comparison
The protein FAS1 is part of the fatty acid biosynthesis of yeast. Its enzyme classification number is EC 2.3.1.86.
[Figure: pathway diagram with FAS1 marked. Source: www.kegg.org]
Protein identification
In general, several peptides (3-10) are found for each protein.
Large scale spectral comparison
Most recent advances
• Inverted sequence DB used for background distribution estimation (PRISM): Emili’s group, Mol. Cell Proteomics 2(2), p96-106, 2003
• Number of sibling peptides (ProteinProphet): Aebersold’s group, Anal. Chem. 74, p5383-5392, 2004
• Suffix tree searching: Lu and Chen, Bioinformatics 19(2), p ii113-ii121, 2003
• Bayesian approach: Chen, Biosilico, in press 2004
Other approaches
• An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database: Yates’ group, J. Am. Soc. Mass Spec. 5(11), 1994
• ProbID: a probabilistic algorithm to identify peptides through sequence database searching using tandem mass spectral data: Aebersold’s group, Proteomics 2(10), 2002
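The inverted sequence DB idea listed under the recent advances can be illustrated with a generic sketch. It is not the PRISM implementation. It assumes that every protein sequence is simply reversed, that the reversed database is searched in the same way as the real one, and that the resulting scores (higher meaning better) serve as an empirical background distribution. All names and scores below are hypothetical.

```python
def inverted_database(proteins, prefix="INV_"):
    """Return a copy of the database with every sequence reversed."""
    return {prefix + name: seq[::-1] for name, seq in proteins.items()}


def background_tail(background_scores, score):
    """Fraction of inverted-DB scores that are at least as good as `score`."""
    if not background_scores:
        raise ValueError("need at least one background score")
    hits = sum(1 for s in background_scores if s >= score)
    return hits / len(background_scores)


if __name__ == "__main__":
    db = {"example_protein": "FLIDSSRFSYPERPIIFLSMCYNIYSIAYIVR"}
    print(inverted_database(db))

    # Made-up scores from searching spectra against the inverted DB.
    background = [1.2, 0.8, 2.5, 1.9, 0.4, 3.1, 1.1, 0.9]
    print(background_tail(background, score=2.0))
```

Matches to reversed sequences are wrong by construction, so the share of them scoring above a threshold gives a rough estimate of how often a real-database match at that score could be a chance hit.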