RENSSELAER
PLS: PARTIAL-LEAST SQUARES
• PLS:
  - Partial-Least Squares
  - Projection to Latent Structures
  - "Please listen to Svante Wold"
• Error Metrics
• Cross-Validation:
  - LOO
  - n-fold X-Validation
  - Bootstrap X-Validation
• Examples:
  - 19 Amino-Acid QSAR
  - Cherkassky's nonlinear function: y = sin|x| / |x|
• Comparison with SVMs
IMPORTANT EQUATIONS FOR PLS

$$X = T P^T + E$$
$$\hat{y} = X\, b_{PLS}, \qquad b_{PLS} = W (P^T W)^{-1} (T^T T)^{-1} T^T y$$
$$\text{compare OLS: } b_{OLS} = (X^T X)^{-1} X^T y$$
• t's are scores or latent variables
• p's are loadings
• w1 is the dominant eigenvector of $X^T Y Y^T X$
• t1 is the dominant eigenvector of $X X^T Y Y^T$
• w's and t's of deflations:
  - w's are orthonormal
  - t's are orthogonal
  - p's are not orthogonal
  - p's are orthogonal to earlier w's
$$Z_1 = Z - t_1 p_1^T \quad \text{(deflation of the data matrix, starting from } Z = X\text{)}$$
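For a single response y, $X^T Y Y^T X$ collapses to the rank-one matrix $(X^T y)(X^T y)^T$, so the eigenvector characterization of w1 can be checked directly. A minimal numpy sketch (the toy data and variable names are my own, not from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))   # 20 samples, 5 descriptors (toy data)
y = rng.standard_normal(20)        # single response

# First PLS weight vector: w1 = X^T y / ||X^T y||
w1 = X.T @ y
w1 /= np.linalg.norm(w1)

# With one response, M = X^T Y Y^T X = (X^T y)(X^T y)^T is rank one,
# so w1 is its only eigenvector with a nonzero eigenvalue.
M = np.outer(X.T @ y, X.T @ y)
lam = w1 @ M @ w1                  # Rayleigh quotient gives the eigenvalue
assert np.allclose(M @ w1, lam * w1)

t1 = X @ w1                        # first score t1 = X w1
```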
IMPORTANT EQUATIONS FOR PLS
$$W^* = W (P^T W)^{-1}, \qquad P^T W^* = I$$
$$T = X W^*$$
$$X = T P^T + E = X W^* P^T + E$$
$$b_{PLS} = W (P^T W)^{-1} (T^T T)^{-1} T^T y = W^* c, \qquad c = (T^T T)^{-1} T^T y$$
$$\hat{y} = X b_{PLS} = X W^* (T^T T)^{-1} T^T y = T (T^T T)^{-1} T^T y$$
NIPALS ALGORITHM FOR PLS (with just one response variable y)
• Start for a PLS component: $w = X^T y \,/\, \|X^T y\|$
• Calculate the score t: $t = X w$
• Calculate c': $c' = y^T t \,/\, (t^T t)$
• Calculate the loading p: $p = X^T t \,/\, (t^T t)$
• Store t in T, store p in P, store w in W
• Deflate the data matrix and the response variable: $X \leftarrow X - t\, p^T$, $\; y \leftarrow y - t\, c'$
• Do for h latent variables
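The NIPALS steps above can be sketched in a few lines of numpy. This is a minimal PLS1 sketch assuming the deflation scheme on the slide (function and variable names are my own, not from the course software); it also checks that the stored W, P, T reproduce the closed-form regression coefficients $b = W(P^T W)^{-1} c$:

```python
import numpy as np

def nipals_pls1(X, y, h):
    """NIPALS PLS with one response y and h latent variables.

    Per component: w = X^T y (normalized), t = X w,
    c' = y^T t / t^T t, p = X^T t / t^T t, then deflate
    X <- X - t p^T and y <- y - t c'.
    """
    X = X.astype(float).copy()
    y = y.astype(float).copy()
    T, P, W, C = [], [], [], []
    for _ in range(h):
        w = X.T @ y
        w /= np.linalg.norm(w)      # start: normalized weight vector
        t = X @ w                   # score
        tt = t @ t
        c = (y @ t) / tt            # y-loading c'
        p = (X.T @ t) / tt          # x-loading
        X -= np.outer(t, p)         # deflate data matrix
        y -= t * c                  # deflate response
        T.append(t); P.append(p); W.append(w); C.append(c)
    return (np.column_stack(T), np.column_stack(P),
            np.column_stack(W), np.array(C))

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 6))
y = X @ rng.standard_normal(6) + 0.1 * rng.standard_normal(30)

T, P, W, c = nipals_pls1(X, y, h=3)
# Regression coefficients from the stored matrices: b = W (P^T W)^{-1} c,
# so that y_hat = X b equals T c (two routes to the same prediction).
b = W @ np.linalg.solve(P.T @ W, c)
assert np.allclose(X @ b, T @ c)
```

Because $P^T W$ is triangular with unit diagonal, the solve is always well posed, and the orthogonality of the t's can be checked on $T^T T$.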
The geometric representation of PLSR. The X-matrix can be represented as N points in the K-dimensional space where each column of X (x_k) defines one coordinate axis. The PLSR model defines an A-dimensional hyper-plane, which, in turn, is defined by one line, one direction, per component. The direction coefficients of these lines are p_ak. The coordinates of each object, i, when its data (row i in X) are projected down on this plane are t_ia. These positions are related to the values of Y.
From Svante Wold, Michael Sjöström, Lennart Eriksson, "PLS-regression: a basic tool of chemometrics," Chemometrics and Intelligent Laboratory Systems, Vol. 58, pp. 109-130 (2001)
PIE     Lipophilicity constant of the AA side chain
PIF     Lipophilicity constant of the AA side chain
DGR     Free energy of transfer of the AA side chain from protein to H2O
SAC     Water-accessible surface of the AA
MR      Molecular refractivity
Lam     Polarity parameter
Vol     Molecular volume
DDGTS   Free energy of unfolding of a protein
INXIGHT VISUALIZATION PLOT
QSAR.BAT: SCRIPT FOR BOOTSTRAP VALIDATION FOR AA's

REM RECOVER FILES
copy svante.txt a.txt
copy svante_label.txt sel_lbls.txt

REM MAHALANOBIS SCALING
analyze a.txt 3
copy a.txt.txt a.txt

REM PLS BOOTSTRAP
analyze a.txt 33

REM DESCALE RESULTS
analyze resultss.ttt 4

REM SCATTERPLOT WITH dos_mbotw
dos_mbotw results.ttt
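The `analyze` and `dos_mbotw` tools in the script are course-specific; the same bootstrap-validation loop can be sketched in numpy on synthetic data (autoscaling stands in for the script's Mahalanobis scaling step, and all names below are my own assumptions):

```python
import numpy as np

def pls1_coef(X, y, h):
    # Compact NIPALS PLS1: returns regression coefficients b (y_hat = X b).
    X, y = X.copy(), y.copy()
    W, P, C = [], [], []
    for _ in range(h):
        w = X.T @ y; w /= np.linalg.norm(w)
        t = X @ w; tt = t @ t
        c = (y @ t) / tt; p = (X.T @ t) / tt
        X -= np.outer(t, p); y -= t * c
        W.append(w); P.append(p); C.append(c)
    W, P, C = np.column_stack(W), np.column_stack(P), np.array(C)
    return W @ np.linalg.solve(P.T @ W, C)

rng = np.random.default_rng(2)
X = rng.standard_normal((19, 7))              # e.g. 19 amino acids x 7 descriptors
y = X @ rng.standard_normal(7) + 0.2 * rng.standard_normal(19)

# Autoscale: zero mean, unit variance (the script's scaling step)
X = (X - X.mean(0)) / X.std(0)
y = (y - y.mean()) / y.std()

press, tss, n_boot = 0.0, 0.0, 200
for _ in range(n_boot):
    idx = rng.integers(0, len(y), len(y))     # bootstrap resample
    oob = np.setdiff1d(np.arange(len(y)), idx)  # out-of-bag points
    if oob.size == 0:
        continue
    b = pls1_coef(X[idx], y[idx], h=2)
    press += np.sum((y[oob] - X[oob] @ b) ** 2)
    tss += np.sum(y[oob] ** 2)

q2 = 1.0 - press / tss                        # bootstrap Q^2
```

Each bootstrap model is fit on the resampled rows and scored only on the rows it never saw, so Q^2 estimates predictive rather than fitted performance.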
Linear PLS:
• w1 is the dominant eigenvector of $X^T Y Y^T X$
• t1 is the dominant eigenvector of $X X^T Y Y^T$
• w's and t's of deflations:
  - w's are orthonormal
  - t's are orthogonal
  - p's are not orthogonal
  - p's are orthogonal to earlier w's

Kernel PLS:
• the trick is a different normalization
• now t's rather than w's are normalized
• t1 is the dominant eigenvector of $K(X X^T) Y Y^T$
• w's and t's of deflations of $X X^T$
KERNEL PLS HIGHLIGHTS
• Invented by Rosipal and Trejo (Journal of Machine Learning Research, December 2001)
• They first altered linear PLS by dealing with eigenvectors of $X X^T$
• They also made the NIPALS PLS formulation resemble PCA more
• Now the nonlinear correlation matrix $K(X X^T)$ rather than $X X^T$ is used
• The nonlinear correlation matrix contains nonlinear similarities of data points rather than linear inner products
• An example is the Gaussian kernel similarity measure: $K_{ij} = \exp(-\|x_i - x_j\|^2 / (2\sigma^2))$
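The Gaussian kernel matrix and the eigenvector characterization of t1 are easy to check numerically. A sketch with toy data (σ, the data, and all names are my own choices); for a single response, $K y$ itself is an eigenvector of $K y y^T$:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((15, 4))
y = rng.standard_normal(15)

# Gaussian kernel matrix: K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
sigma = 1.5
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq / (2 * sigma ** 2))

# With one response, (K y y^T)(K y) = (y^T K y) K y,
# so t1 is proportional to K y with eigenvalue y^T K y.
t1 = K @ y
t1 /= np.linalg.norm(t1)
lam = y @ K @ y
assert np.allclose(K @ np.outer(y, y) @ t1, lam * t1)
```

Note that K is symmetric with unit diagonal: each point is maximally similar to itself, and similarity decays smoothly with squared distance, which is what replaces the linear inner products of $X X^T$.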