Stacking for supervised learning
UKKDD 2007
Niall Rooney, NIKEL, University of Ulster

Ensemble learning

- Postulate multiple hypotheses to explain the data
- Shortcomings of single-model learning algorithms (Dietterich, 2002):
  - Statistical problem
  - Computational problem
  - Representational problem

Ensemble learning

- Generalization error = bias + variance
  - Bias: how close the algorithm's average prediction is to the target
  - Variance: how much the algorithm's predictions bounce around for different training sets
  - A model which is too simple, or too inflexible, will have a large bias
  - A model which has too much flexibility will have high variance
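
For squared loss this is the standard decomposition (a reminder added here, not on the original slide), with \hat{f} the learned model, f the target function, and \sigma^2 the irreducible noise:

E[(y - \hat{f}(x))^2] = (f(x) - E[\hat{f}(x)])^2 + E[(\hat{f}(x) - E[\hat{f}(x)])^2] + \sigma^2
                      = bias^2 + variance + noise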

Ensemble learning

- Generalization error: ensembles reduce bias and/or variance
- To be effective, ensembles need diverse and accurate base models
- Diversity is measured by the level of variability in the base members' predictions (for regression); see the sketch below
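
A minimal sketch of that variability measure, assuming numpy; the function name and array layout are illustrative, not from the talk:

import numpy as np

def regression_diversity(base_predictions):
    """Diversity as the average squared deviation of each base member's
    prediction from the ensemble mean, per instance.

    base_predictions: array of shape (n_models, n_instances).
    """
    ensemble_mean = base_predictions.mean(axis=0)      # ensemble prediction per instance
    spread = (base_predictions - ensemble_mean) ** 2   # each member's deviation from the mean
    return spread.mean()                               # averaged over members and instances

# Example: three base models, four test instances
preds = np.array([[1.0, 2.0, 3.0, 4.0],
                  [1.1, 1.9, 3.2, 3.8],
                  [0.9, 2.1, 2.8, 4.2]])
print(regression_diversity(preds))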

Ensemble learning

- Homogeneous learning: one learning algorithm, varied by data sampling, feature sampling, randomization, or parameter settings
- Heterogeneous learning: same data, different learning algorithms
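
A sketch of the contrast, assuming scikit-learn (the talk names the strategies, not this library):

from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Homogeneous: one algorithm, diversity from bootstrap data sampling
homogeneous = BaggingRegressor(DecisionTreeRegressor(), n_estimators=10)

# Heterogeneous: different learning algorithms, all trained on the same data
heterogeneous = [DecisionTreeRegressor(), LinearRegression(), KNeighborsRegressor()]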

Ensemble Learning

[Diagram: input features feed Classifier 1 ... Classifier N; their class predictions are passed to a combiner, which outputs the final class prediction.]

Ensemble learning

- Methods of combination: voting, weighting, selection
- Mixture of experts
- Error-correcting output codes
- Bagging
- Boosting
- Stacking
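
For the simplest of these, a minimal unweighted voting combiner, assuming numpy (illustrative, not from the talk):

import numpy as np

def majority_vote(class_predictions):
    """Combine integer class predictions (n_classifiers x n_instances)
    by unweighted voting; ties resolve toward the smallest label."""
    votes = np.asarray(class_predictions)
    return np.array([np.bincount(col).argmax() for col in votes.T])

print(majority_vote([[0, 1, 2], [0, 1, 1], [1, 1, 2]]))  # -> [0 1 2]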

Ensemble Learning: Stacking

[Diagram: an instance is passed to Base Model 1 ... Base Model n; their predictions feed a meta-model, which produces the final prediction.]

Meta Technique: SR (Stacked Regression)

- CV meta-training set: {(f_1(x_j), ..., f_m(x_j), y_j)}, built from the cross-validated predictions of the base models M_1, ..., M_m (base model M_i learns f_i)
- For a new instance x*, the base predictions f_1(x*), ..., f_m(x*) are passed to the combining (meta-level) model Meta-M
- Final prediction: Meta-M(f_1(x*), ..., f_m(x*))
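
A runnable sketch of this pipeline, assuming scikit-learn and numpy; the base and meta learners chosen here are illustrative:

import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
base_models = [LinearRegression(), Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=4)]

# Build the CV meta-training set {(f_1(x_j), ..., f_m(x_j), y_j)}:
# each column holds one base model's cross-validated predictions.
meta_X = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in base_models])

# Fit the meta-model on the base predictions; refit base models on all data.
meta_model = LinearRegression().fit(meta_X, y)
for m in base_models:
    m.fit(X, y)

# Final prediction for a new instance x*: Meta-M(f_1(x*), ..., f_m(x*))
x_star = X[:1]
base_preds = np.column_stack([m.predict(x_star) for m in base_models])
print(meta_model.predict(base_preds))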

Stacking for classification

- Use class distributions from the base classifiers rather than class predictions:
  {(P_1(C_1|x), ..., P_1(C_k|x), ..., P_m(C_1|x), ..., P_m(C_k|x), y)}
- Choice of meta-classifier: multi-response linear regression
  - For a classification problem with k class values, solve k regression problems
  - Only use the probabilities related to class C_j to predict class C_j (see the sketch below)
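
A sketch of that meta-classifier, assuming scikit-learn and the model-major column layout above; function names are illustrative:

import numpy as np
from sklearn.linear_model import LinearRegression

def fit_mlr_meta(meta_probs, y, n_classes):
    """Multi-response linear regression: one regression problem per class.
    meta_probs: (n_instances, n_models * n_classes) stacked P_i(C_j | x).
    Class C_j is predicted using only the probabilities for C_j."""
    models = []
    for j in range(n_classes):
        cols = meta_probs[:, j::n_classes]   # P_1(C_j|x), ..., P_m(C_j|x)
        target = (y == j).astype(float)      # one-vs-rest 0/1 response
        models.append(LinearRegression().fit(cols, target))
    return models

def predict_mlr(models, meta_probs, n_classes):
    scores = np.column_stack([m.predict(meta_probs[:, j::n_classes])
                              for j, m in enumerate(models)])
    return scores.argmax(axis=1)             # class with the largest response

# Tiny demo: two base models, two classes
probs = np.array([[0.9, 0.1, 0.8, 0.2],
                  [0.2, 0.8, 0.3, 0.7]])
y = np.array([0, 1])
print(predict_mlr(fit_mlr_meta(probs, y, 2), probs, 2))  # -> [0 1]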

Stacking for classification

- Works with different types of base classifiers
- Multi-response model trees as the meta-learner shown to perform better than selecting the best classifier (Dzeroski & Zenko, 2004)

Stacking for regression

- Linear regression as the meta-learner requires non-negative weights (Breiman, 1996); see the sketch below
- Model trees as the meta-learner
- Homogeneous stacking using random feature sub-sets
- Feature sub-sets can be improved upon using hill-climbing or GA techniques
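
A sketch of the non-negative weights constraint, assuming scipy's non-negative least squares; the synthetic data is illustrative:

import numpy as np
from scipy.optimize import nnls

# meta_X: cross-validated base-model predictions (n_instances x n_models)
rng = np.random.default_rng(0)
meta_X = rng.normal(size=(100, 3))
y = meta_X @ np.array([0.7, 0.3, 0.0]) + 0.1 * rng.normal(size=100)

weights, residual = nnls(meta_X, y)  # least squares subject to w >= 0
print(weights)                       # combining weights for the base models
final_prediction = meta_X @ weights  # stacked regression output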

Related techniques: Multiple meta-levels

- Cascade Generalization
  [Diagram: Classifier 1 -> Classifier 2 -> Classifier 3, chained in sequence.]
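
In cascade generalization each level trains on the original attributes extended with the previous level's class-probability outputs. A sketch assuming scikit-learn (the learners chosen are illustrative):

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Level 1: train on the original attributes
level1 = GaussianNB().fit(X, y)

# Level 2: original attributes extended with level 1's class probabilities
X2 = np.hstack([X, level1.predict_proba(X)])
level2 = DecisionTreeClassifier(max_depth=4).fit(X2, y)

# Prediction for new instances follows the same chain
X_new = X[:5]
print(level2.predict(np.hstack([X_new, level1.predict_proba(X_new)])))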

Related techniques: Multiple meta-levels

- Combiner Trees
  [Diagram: Classifier 1 ... Classifier 4 trained on disjoint training sets; their outputs feed Combiner 1 and Combiner 2, whose outputs in turn feed Combiner 3.]

Related techniques: Dynamic Integration

- Meta-level training set: {(x_j, Err_1(x_j), ..., Err_m(x_j), y_j)}, where Err_i(x_j) = |f_i(x_j) - y_j|
- For a new instance x*, the base predictions f_1(x*), ..., f_m(x*) from models M_1, ..., M_m are passed, together with the stored base errors, to the combining (meta-level) model Meta-M
- Final prediction: Meta-M(f_1(x*), ..., f_m(x*))

Dynamic Integration

- Meta-M meta-model: distance-weighted k-NN
- NN: the set of k nearest meta-instances
- For each member of the NN set, find the cumulative error of each model

Dynamic Integration

- Dynamic Selection (DS): choose the model with the lowest cumulative error
- Dynamic Weighting (DW): combine the models with weights based on their cumulative error
- Dynamic Weighting with Selection (DWS): combine the models as in DW, but exclude models with larger-than-median cumulative error
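
A sketch of all three schemes, assuming numpy; the function name, distance weighting, and inverse-error model weights are illustrative choices:

import numpy as np

def dynamic_integration(x_star, X_meta, errors, preds_star, k=5, scheme="DS"):
    """X_meta: (n, d) meta-level training instances
    errors: (n, m) stored base errors Err_i(x_j) = |f_i(x_j) - y_j|
    preds_star: (m,) base predictions f_1(x*), ..., f_m(x*)"""
    dist = np.linalg.norm(X_meta - x_star, axis=1)
    nn = np.argsort(dist)[:k]                        # k nearest meta-instances
    w = 1.0 / (dist[nn] + 1e-12)                     # distance weights
    cum_err = (w[:, None] * errors[nn]).sum(axis=0)  # cumulative error per model

    if scheme == "DS":                               # pick the lowest-error model
        return preds_star[np.argmin(cum_err)]
    model_w = 1.0 / (cum_err + 1e-12)                # weight inversely to error
    if scheme == "DWS":                              # drop models above the median error
        model_w[cum_err > np.median(cum_err)] = 0.0
    return np.average(preds_star, weights=model_w)   # DW / DWS combination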

Applications

- Distributed data mining
- Intrusion detection
- Concept drift

Key papers

- Wolpert, D. H.: Stacked Generalization. Neural Networks, 5 (1992) 241-259
- Breiman, L.: Stacked Regressions. Machine Learning, 24 (1996) 49-64
- Dietterich, T. G.: Ensemble Methods in Machine Learning. Lecture Notes in Computer Science, 1857 (2000) 1-15
- Dzeroski, S., & Zenko, B.: Is Combining Classifiers with Stacking Better than Selecting the Best One? Machine Learning, 54 (2004) 255-273
- Ting, K. M., & Witten, I. H.: Issues in Stacked Generalization. Journal of Artificial Intelligence Research, 10 (1999) 271-289