LS-SVMlab & Large scale modeling
Kristiaan Pelckmans, J.A.K. Suykens, B. De Moor
ESAT-SCD/SISTA, K.U.Leuven
Feb 07, 2016
Content
• I. Overview
• II. Classification
• III. Regression
• IV. Unsupervised Learning
• V. Time-series
• VI. Conclusions and Outlooks
People
Contributors to LS-SVMlab:
•Kristiaan Pelckmans
•Johan Suykens
•Tony Van Gestel
•Jos De Brabanter
•Lukas Lukas
•Bart Hamers
•Emmanuel Lambert
Supervisors:
•Bart De Moor
•Johan Suykens
•Joos Vandewalle
Acknowledgements
Our research is supported by grants from several funding agencies and sources: Research Council K.U.Leuven: Concerted Research Action GOA-Mefisto 666 (Mathematical Engineering), IDO (IOTA Oncology, Genetic networks), several PhD/postdoc & fellow grants; Flemish Government: Fund for Scientific Research FWO Flanders (several PhD/postdoc grants, projects G.0407.02 (support vector machines), G.0080.01 (collective intelligence), G.0256.97 (subspace), G.0115.01 (bio-i and microarrays), G.0240.99 (multilinear algebra), G.0197.02 (power islands), research communities ICCoS, ANMMM), AWI (bilateral international collaboration with South Africa, Hungary and Poland), IWT (Soft4s (soft sensors), STWW-Genprom (gene promoter prediction), GBOU-McKnow (knowledge management algorithms), Eureka-Impact (MPC control), Eureka-FLiTE (flutter modeling), several PhD grants); Belgian Federal Government: DWTC (IUAP IV-02 (1996-2001) and IUAP V-10-29 (2002-2006): Dynamical Systems and Control: Computation, Identification & Modelling), Program Sustainable Development PODO-II (CP-TR-18: Sustainability effects of Traffic Management Systems); direct contract research: Verhaert, Electrabel, Elia, Data4s, IPCOS. JS is a professor at K.U.Leuven, Belgium, and a postdoctoral researcher with FWO Flanders. BDM and JVDW are full professors at K.U.Leuven, Belgium.
I. Overview
• Goal of the presentation:
1. Overview & intuition
2. Demonstration of LS-SVMlab
3. Pinpoint research challenges
4. Preparation for NIPS 2002
• Research results and challenges
• Towards applications
• Overview LS-SVMlab
I.2 Overview of research
“Learning, generalization, extrapolation, identification, smoothing, modeling”
• Prediction (black box modeling)
• Point of view: Statistical Learning, Machine Learning, Neural Networks, Optimization, SVM
I.3 Towards applications
• System identification
• Financial engineering
• Biomedical signal processing
• Data mining
• Bio-informatics
• Text mining
• Adaptive signal processing
I.4 LS-SVMlab (2)
• Starting points:
– Modularity
– Object Oriented & Functional Interface
– Basic bricks for advanced research
• Website and tutorial
• Reproducibility (preprocessing)
II. Classification
“Learn the decision function associated with a set of labeled data points to predict the values of unseen data”
• Least Squares Support Vector Machines (LS-SVM)
• Bayesian Framework
• Different norms
• Coding schemes
II.1 Least Squares Support Vector Machines (LS-SVM(γ, σ))
1. Least Squares cost-function + regularization & equality constraints
2. Non-linearity by Mercer kernels
3. Primal-Dual Interpretation (Lagrange multipliers)
Primal parametric model:
$y_i = w^T \varphi(x_i) + b + e_i$
Dual non-parametric model:
$y_i = \sum_{j=1}^{N} \alpha_j K(x_i, x_j) + b + e_i$
with $K(\cdot,\cdot)$ a Mercer kernel.
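To make the dual model above concrete: training an LS-SVM amounts to solving a single linear system in (b, α). The sketch below is plain Python/NumPy, not the LS-SVMlab API; the function names (rbf_kernel, lssvm_train, lssvm_predict) and the choice of an RBF kernel are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(X1, X2, sig2):
    # RBF Mercer kernel K(x, y) = exp(-||x - y||^2 / (2 * sig2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sig2))

def lssvm_train(X, y, gam, sig2):
    # The dual of the regularized least-squares problem is one linear system:
    # [ 0      1^T          ] [ b     ]   [ 0 ]
    # [ 1   Omega + I/gam   ] [ alpha ] = [ y ]
    # with Omega_ij = K(x_i, x_j) and gam the regularization parameter.
    N = len(y)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sig2) + np.eye(N) / gam
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]  # b, alpha

def lssvm_predict(Xt, X, alpha, b, sig2):
    # Dual non-parametric model: y(x) = sum_j alpha_j K(x, x_j) + b
    return rbf_kernel(Xt, X, sig2) @ alpha + b
```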
II.1 LS-SVM(γ, σ)
“Learning representations from relations”
The model is built from the $N \times N$ table of pairwise relations $a_{i,j}$ between data points:
$$\begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,N} \\ \vdots & \ddots & & \vdots \\ a_{N,1} & a_{N,2} & \cdots & a_{N,N} \end{pmatrix}$$
II.2 Bayesian Inference
• Bayes rule (MAP):
$P(\theta \mid X) = \dfrac{P(X \mid \theta)\, P(\theta)}{P(X)}$
• Closed-form formulas
• Approximations:
– Hessian in the optimum
– Gaussian distribution
• Three levels of posteriors:
Level 1 (model parameters): $P(w, b \mid \gamma, K, X)$
Level 2 (regularization): $P(\gamma \mid K, X)$
Level 3 (model comparison): $P(K \mid X)$
II.3 SVM formulations & norms
• 1-norm + inequality constraints: SVM
– extensions to any convex cost function
• 2-norm + equality constraints: LS-SVM
– weighted versions
II.4 Coding schemes
Multi-class classification task → (multiple) binary classifiers.
Encoding assigns each class label a codeword of ±1 targets, one entry per binary classifier; decoding maps the classifier outputs back to labels.
(Diagram: the example label stream … 1 2 4 6 2 1 3 … encoded into three rows of ±1 targets and decoded back to the same stream.)
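The encode/decode steps can be sketched as follows, again in plain Python rather than LS-SVMlab; the codebook entries are a hypothetical example, not the exact codewords from the slide's figure.

```python
import numpy as np

# Hypothetical +/-1 codebook: one column per class, one row per binary classifier.
classes  = np.array([1, 2, 4, 6])
codebook = np.array([[ 1, -1,  1,  1],
                     [-1, -1, -1,  1],
                     [ 1, -1, -1, -1]])

def encode(labels):
    # Encoding: map each class label to its codeword of binary targets.
    return codebook[:, np.searchsorted(classes, labels)]

def decode(outputs):
    # Decoding: assign each sample to the class whose codeword is
    # closest in Hamming distance to the signs of the binary outputs.
    s = np.sign(outputs)                                   # (n_classifiers, n_samples)
    dist = (s[:, :, None] != codebook[:, None, :]).sum(0)  # (n_samples, n_classes)
    return classes[dist.argmin(axis=1)]

print(decode(encode(np.array([1, 4, 6, 2]))))  # -> [1 4 6 2]
```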
III. Regression
“Learn the underlying function from a set of data points and their corresponding noisy targets in order to predict the values of unseen data”
• LS-SVM(γ, σ)
• Cross-validation (CV)
• Bayesian Inference
• Robustness
III.1 LS-SVM(γ, σ)
• Least Squares cost-function + Regularization & Equality constraints
• Mercer kernels
• Lagrange multipliers link the primal parametric model and the dual non-parametric model
III.1 LS-SVM(γ, σ) (2)
• Regularization parameter γ:
– do not fit the noise (overfitting)!
– trade-off between noise and information (illustrated below)
Toy example: $f(x) = \mathrm{sinc}(x)\,\sin(10x)\,e^{-5x}$
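A small experiment makes the trade-off concrete, reusing the hypothetical lssvm_train/lssvm_predict from the II.1 sketch; the noise level and the grid of γ values are arbitrary choices for illustration.

```python
import numpy as np

# Noisy sinc-type toy data, in the spirit of the slide's example.
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100)[:, None]
y = np.sinc(X[:, 0]) + 0.1 * rng.standard_normal(100)

# Small gamma: strong regularization (smooth fit, may underfit).
# Large gamma: weak regularization (the model starts fitting the noise).
for gam in (0.1, 10.0, 1e4):
    b, alpha = lssvm_train(X, y, gam, sig2=0.5)
    yhat = lssvm_predict(X, X, alpha, b, sig2=0.5)
    print(f"gamma = {gam:g}, training MSE = {np.mean((y - yhat) ** 2):.4f}")
```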
III.2 Cross-validation (CV)
“How to estimate generalization power of model?”
• Division into a training set and a test set
• Repeated division: Leave-one-out CV (fast implementation)
• L-fold cross-validation
• Generalized Cross-validation (GCV):
• Complexity criteria: AIC, BIC, …
• The smoother matrix $S$ relates fitted values to targets:
$(\hat{y}_1, \ldots, \hat{y}_N)^T = S(\gamma \mid K, X)\,(y_1, \ldots, y_N)^T$
(Diagram: the sample indices $1, 2, 3, \ldots, n$ with a block of points around $t$ left out for validation, for block sizes ranging from the single point $t$ up to $t-l, \ldots, t+l$.)
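A generic L-fold cross-validation loop, sketched under the assumption of i.i.d. data (the blocked splits in the diagram above would be needed for dependent data). The train/predict callables are placeholders for any model, e.g. the LS-SVM sketch from II.1.

```python
import numpy as np

def l_fold_cv(X, y, train, predict, L=10, seed=0):
    # Average validation MSE over L folds.
    # train(X, y) -> model;  predict(model, X) -> predictions.
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, L)
    mse = 0.0
    for k in range(L):
        val = folds[k]
        trn = np.concatenate([folds[j] for j in range(L) if j != k])
        model = train(X[trn], y[trn])
        mse += np.mean((y[val] - predict(model, X[val])) ** 2)
    return mse / L
```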
III.2 Cross-validation Procedure (CVP)
“How to optimize the model for optimal generalization performance?”
• Trade-off between fitting and model complexity
• Kernel parameters
• Optimization routine?
III.1 LS-SVM(γ, σ) (3)
• Kernel type and parameter
“Zoölogy as elephantism and non-elephantism”
• Model comparison by cross-validation or Bayesian inference
III.3 Applications
“OK, but does it work?”
• Soft4s
– together with O. Barrero, L. Hoegaerts, IPCOS (ISMC), BASF, B. De Moor
– soft sensor
• ELIA
– together with O. Barrero, I. Goethals, L. Hoegaerts, I. Markovsky, T. Van Gestel, ELIA, B. De Moor
– prediction of short- and long-term electricity consumption
III.2 Bayesian Inference
• Bayes rule (MAP):
$P(\theta \mid X) = \dfrac{P(X \mid \theta)\, P(\theta)}{P(X)}$
• Closed-form formulas
• Three levels of posteriors:
Level 1 (model parameters): $P(w, b \mid \gamma, K, X)$
Level 2 (regularization): $P(\gamma \mid K, X)$
Level 3 (model comparison): $P(K \mid X)$
III.4 Robustness
“How to build good models in the case of non-Gaussian noise or outliers”
• Influence function
• Breakdown point
• How (see the sketch after this list):
– down-weighting the influence of large residuals
– mean → trimmed mean → median
• Robust CV, GCV, AIC,…
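One way to down-weight large residuals is an iteratively reweighted scheme. The sketch below shows only the weighting step, with Huber-type weights and a MAD scale estimate; the constants are conventional robust-statistics choices, not values taken from the slides.

```python
import numpy as np

def robust_weights(residuals, c=1.345):
    # Robust scale estimate via the median absolute deviation (MAD).
    s = np.median(np.abs(residuals)) / 0.6745
    r = np.abs(residuals) / max(s, 1e-12)
    # Huber-type weights: 1 for small residuals, decaying for outliers.
    # Re-solving the LS-SVM system with I/gam replaced by
    # diag(1 / (gam * v_i)) gives a weighted LS-SVM.
    return np.where(r <= c, 1.0, c / r)
```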
IV. Unsupervised Learning
“Extract important features from the unlabeled data”
• Kernel PCA and related methods
• Nyström approximation
– from dual to primal
– fixed-size LS-SVM
IV.2 Kernel PCA (2)
• Primal-dual LS-SVM style formulations (see the sketch below)
• For Kernel PCA, CCA, PLS
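A minimal kernel PCA sketch in the same Python style: center the kernel matrix and read the component scores off its eigendecomposition. The function name and interface are illustrative assumptions, not the LS-SVMlab API.

```python
import numpy as np

def kernel_pca_scores(K, n_comp):
    # Center the kernel matrix: Kc = H K H with H = I - (1/N) 11^T.
    N = K.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    lam, U = np.linalg.eigh(H @ K @ H)
    order = np.argsort(lam)[::-1][:n_comp]   # leading eigenpairs
    lam, U = lam[order], U[:, order]
    # Scores of the training points along the principal axes.
    return U * np.sqrt(np.maximum(lam, 0.0))
```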
IV.2 Nyström approximation
• Sampling of the integral equation:
$\int K(x, y)\, \varphi_i(x)\, p(x)\, dx = \lambda_i \varphi_i(y)$
$\frac{1}{N} \sum_{j=1}^{N} K(x_j, y)\, \varphi_i(x_j) \approx \lambda_i \varphi_i(y)$, and likewise over a subsample of size $n \ll N$
• Approximating the feature map $\varphi(\cdot)$ for a Mercer kernel:
$K(x, y) = \varphi(x)^T \varphi(y)$
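The sampled eigenproblem translates directly into code: eigendecompose the kernel matrix on a subsample and use it to build an approximate feature map (names and tolerances below are illustrative assumptions). With such a map in hand, the dual model can be traded for a primal one in the approximate feature space, which is the idea behind fixed-size LS-SVM.

```python
import numpy as np

def rbf(X1, X2, sig2):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sig2))

def nystrom_feature_map(Xsub, sig2):
    # Eigendecompose the n x n kernel matrix on the subsample.
    lam, U = np.linalg.eigh(rbf(Xsub, Xsub, sig2))
    keep = lam > 1e-10                       # drop numerically zero eigenvalues
    lam, U = lam[keep], U[:, keep]
    def phi(X):
        # Approximate feature map, so that phi(x) @ phi(y).T ~ K(x, y).
        return rbf(X, Xsub, sig2) @ U / np.sqrt(lam)
    return phi
```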
V. Time-series
“Learn to predict future values given a sequence of past values”
• NARX
• Recurrent vs. feedforward
V.1 NARX
• Reducible to static regression
• CV and complexity criteria
• Predicting in recurrent mode
• Fixed-size LS-SVM (sparse representation)
$\hat{y}_t = f(y_{t-1}, y_{t-2}, \ldots, y_{t-l})$
The past values $\ldots, y_{t-5}, y_{t-4}, y_{t-3}, y_{t-2}, y_{t-1}, \ldots$ are fed through the static function $f$.
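Reducing NARX to static regression is just a matter of building lagged input vectors. A small helper (hypothetical name), after which any static regressor, such as the LS-SVM sketch from II.1, applies:

```python
import numpy as np

def make_narx_data(y, l):
    # Inputs X[t] = (y[t-1], ..., y[t-l]), most recent value first;
    # target Y[t] = y[t].
    X = np.column_stack([y[l - k - 1 : len(y) - k - 1] for k in range(l)])
    return X, y[l:]
```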
V.2 Recurrent models?
“How to learn recurrent dynamical models?”
• Training cost = Prediction cost?
• Non-parametric model class?
• Convex or non-convex?
• Hyper-parameters?
$\hat{y}_t = f(\hat{y}_{t-1}, \hat{y}_{t-2}, \ldots, \hat{y}_{t-l})$
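In recurrent mode the one-step model is iterated on its own predictions. A sketch, assuming a model f that takes the last l values, most recent first, as produced by the hypothetical make_narx_data above:

```python
import numpy as np

def recurrent_predict(f, y_last, steps):
    # y_last: the l most recent observed values, oldest first.
    window = list(y_last)
    preds = []
    for _ in range(steps):
        y_next = f(np.array(window[::-1]))  # feed predictions back in
        preds.append(y_next)
        window = window[1:] + [y_next]
    return np.array(preds)
```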
VI.0 References
• J.A.K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor & J. Vandewalle (2002), Least Squares Support Vector Machines, World Scientific.
• V. Vapnik (1995), The Nature of Statistical Learning Theory, Springer-Verlag.
• B. Schölkopf & A. Smola (2002), Learning with Kernels, MIT Press.
• T. Poggio & F. Girosi (1990), “Networks for approximation and learning”, Proceedings of the IEEE, 78(9), 1481-1497.
• N. Cristianini & J. Shawe-Taylor (2000), An Introduction to Support Vector Machines, Cambridge University Press.