Stacking for supervised learning
UKKDD 2007
Niall Rooney, NIKEL, University of Ulster

Ensemble learning

- Postulate multiple hypotheses to explain the data
- Shortcomings of single-model learning algorithms (Dietterich, 2002):
  - Statistical problem
  - Computational problem
  - Representational problem

Ensemble learning

- Generalization error = bias + variance
  - Bias: how close the algorithm's average prediction is to the target
  - Variance: how much the algorithm's predictions bounce around for different training sets
  - A model which is too simple, or too inflexible, will have a large bias
  - A model which has too much flexibility will have high variance
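
For squared loss this is the standard decomposition (a reminder added here, not on the original slide), with \hat{f} the learned model, f the target function, and \sigma^2 the irreducible noise:

E[(y - \hat{f}(x))^2] = (f(x) - E[\hat{f}(x)])^2 + E[(\hat{f}(x) - E[\hat{f}(x)])^2] + \sigma^2
                      = bias^2 + variance + noise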

Ensemble learning

- Generalization error: ensembles reduce bias and/or variance
- To be effective, ensembles need diverse and accurate base models
- Diversity is measured by the level of variability in the base members' predictions (for regression); see the sketch below
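
A minimal sketch of that variability measure, assuming numpy; the function name and array layout are illustrative, not from the talk:

import numpy as np

def regression_diversity(base_predictions):
    """Diversity as the average squared deviation of each base member's
    prediction from the ensemble mean, per instance.

    base_predictions: array of shape (n_models, n_instances).
    """
    ensemble_mean = base_predictions.mean(axis=0)      # ensemble prediction per instance
    spread = (base_predictions - ensemble_mean) ** 2   # each member's deviation from the mean
    return spread.mean()                               # averaged over members and instances

# Example: three base models, four test instances
preds = np.array([[1.0, 2.0, 3.0, 4.0],
                  [1.1, 1.9, 3.2, 3.8],
                  [0.9, 2.1, 2.8, 4.2]])
print(regression_diversity(preds))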

Ensemble learning

- Homogeneous learning: one learning algorithm, varied by data sampling, feature sampling, randomization, or parameter settings
- Heterogeneous learning: same data, different learning algorithms
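
A sketch of the contrast, assuming scikit-learn (the talk names the strategies, not this library):

from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Homogeneous: one algorithm, diversity from bootstrap data sampling
homogeneous = BaggingRegressor(DecisionTreeRegressor(), n_estimators=10)

# Heterogeneous: different learning algorithms, all trained on the same data
heterogeneous = [DecisionTreeRegressor(), LinearRegression(), KNeighborsRegressor()]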

Ensemble Learning

[Diagram: input features feed Classifier 1 ... Classifier N; their class predictions are passed to a combiner, which outputs the final class prediction.]

Ensemble learning

- Methods of combination: voting, weighting, selection
- Mixture of experts
- Error-correcting output codes
- Bagging
- Boosting
- Stacking
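
For the simplest of these, a minimal unweighted voting combiner, assuming numpy (illustrative, not from the talk):

import numpy as np

def majority_vote(class_predictions):
    """Combine integer class predictions (n_classifiers x n_instances)
    by unweighted voting; ties resolve toward the smallest label."""
    votes = np.asarray(class_predictions)
    return np.array([np.bincount(col).argmax() for col in votes.T])

print(majority_vote([[0, 1, 2], [0, 1, 1], [1, 1, 2]]))  # -> [0 1 2]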

Ensemble Learning: Stacking

[Diagram: an instance is passed to Base Model 1 ... Base Model n; their predictions feed a meta-model, which produces the final prediction.]

Meta Technique: SR (Stacked Regression)

- CV meta-training set: {(f_1(x_j), ..., f_m(x_j), y_j)}, built from the cross-validated predictions of the base models M_1, ..., M_m (base model M_i learns f_i)
- For a new instance x*, the base predictions f_1(x*), ..., f_m(x*) are passed to the combining (meta-level) model Meta-M
- Final prediction: Meta-M(f_1(x*), ..., f_m(x*))
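
A runnable sketch of this pipeline, assuming scikit-learn and numpy; the base and meta learners chosen here are illustrative:

import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
base_models = [LinearRegression(), Ridge(alpha=1.0), DecisionTreeRegressor(max_depth=4)]

# Build the CV meta-training set {(f_1(x_j), ..., f_m(x_j), y_j)}:
# each column holds one base model's cross-validated predictions.
meta_X = np.column_stack([cross_val_predict(m, X, y, cv=5) for m in base_models])

# Fit the meta-model on the base predictions; refit base models on all data.
meta_model = LinearRegression().fit(meta_X, y)
for m in base_models:
    m.fit(X, y)

# Final prediction for a new instance x*: Meta-M(f_1(x*), ..., f_m(x*))
x_star = X[:1]
base_preds = np.column_stack([m.predict(x_star) for m in base_models])
print(meta_model.predict(base_preds))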

Stacking for classification

- Use class distributions from the base classifiers rather than class predictions:
  {(P_1(C_1|x), ..., P_1(C_k|x), ..., P_m(C_1|x), ..., P_m(C_k|x), y)}
- Choice of meta-classifier: multi-response linear regression
  - For a classification problem with k class values, solve k regression problems
  - Only use the probabilities related to class C_j to predict class C_j (see the sketch below)
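
A sketch of that meta-classifier, assuming scikit-learn and the model-major column layout above; function names are illustrative:

import numpy as np
from sklearn.linear_model import LinearRegression

def fit_mlr_meta(meta_probs, y, n_classes):
    """Multi-response linear regression: one regression problem per class.
    meta_probs: (n_instances, n_models * n_classes) stacked P_i(C_j | x).
    Class C_j is predicted using only the probabilities for C_j."""
    models = []
    for j in range(n_classes):
        cols = meta_probs[:, j::n_classes]   # P_1(C_j|x), ..., P_m(C_j|x)
        target = (y == j).astype(float)      # one-vs-rest 0/1 response
        models.append(LinearRegression().fit(cols, target))
    return models

def predict_mlr(models, meta_probs, n_classes):
    scores = np.column_stack([m.predict(meta_probs[:, j::n_classes])
                              for j, m in enumerate(models)])
    return scores.argmax(axis=1)             # class with the largest response

# Tiny demo: two base models, two classes
probs = np.array([[0.9, 0.1, 0.8, 0.2],
                  [0.2, 0.8, 0.3, 0.7]])
y = np.array([0, 1])
print(predict_mlr(fit_mlr_meta(probs, y, 2), probs, 2))  # -> [0 1]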

Stacking for classification

- Works with different types of base classifiers
- Multi-response model trees as the meta-learner shown to perform better than selecting the best classifier (Dzeroski & Zenko, 2004)

Stacking for regression

- Linear regression as the meta-learner requires non-negative weights (Breiman, 1996); see the sketch below
- Model trees as the meta-learner
- Homogeneous stacking using random feature sub-sets
- Feature sub-sets can be improved upon using hill-climbing or GA techniques
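
A sketch of the non-negative weights constraint, assuming scipy's non-negative least squares; the synthetic data is illustrative:

import numpy as np
from scipy.optimize import nnls

# meta_X: cross-validated base-model predictions (n_instances x n_models)
rng = np.random.default_rng(0)
meta_X = rng.normal(size=(100, 3))
y = meta_X @ np.array([0.7, 0.3, 0.0]) + 0.1 * rng.normal(size=100)

weights, residual = nnls(meta_X, y)  # least squares subject to w >= 0
print(weights)                       # combining weights for the base models
final_prediction = meta_X @ weights  # stacked regression output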

Related techniques: Multiple meta-levels

- Cascade Generalization
  [Diagram: Classifier 1 -> Classifier 2 -> Classifier 3, chained in sequence.]
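
In cascade generalization each level trains on the original attributes extended with the previous level's class-probability outputs. A sketch assuming scikit-learn (the learners chosen are illustrative):

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Level 1: train on the original attributes
level1 = GaussianNB().fit(X, y)

# Level 2: original attributes extended with level 1's class probabilities
X2 = np.hstack([X, level1.predict_proba(X)])
level2 = DecisionTreeClassifier(max_depth=4).fit(X2, y)

# Prediction for new instances follows the same chain
X_new = X[:5]
print(level2.predict(np.hstack([X_new, level1.predict_proba(X_new)])))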

Related techniques: Multiple meta-levels

- Combiner Trees
  [Diagram: Classifier 1 ... Classifier 4 trained on disjoint training sets; their outputs feed Combiner 1 and Combiner 2, whose outputs in turn feed Combiner 3.]

Related techniques: Dynamic Integration

- Meta-level training set: {(x_j, Err_1(x_j), ..., Err_m(x_j), y_j)}, where Err_i(x_j) = |f_i(x_j) - y_j|
- For a new instance x*, the base predictions f_1(x*), ..., f_m(x*) from models M_1, ..., M_m are passed, together with the stored base errors, to the combining (meta-level) model Meta-M
- Final prediction: Meta-M(f_1(x*), ..., f_m(x*))

Dynamic Integration

- Meta-M meta-model: distance-weighted k-NN
- NN: the set of k nearest meta-instances
- For each member of the NN set, find the cumulative error of each model

Dynamic Integration

- Dynamic Selection (DS): choose the model with the lowest cumulative error
- Dynamic Weighting (DW): combine the models with weights based on their cumulative error
- Dynamic Weighting with Selection (DWS): combine the models as in DW, but exclude models with larger-than-median cumulative error
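
A sketch of all three schemes, assuming numpy; the function name, distance weighting, and inverse-error model weights are illustrative choices:

import numpy as np

def dynamic_integration(x_star, X_meta, errors, preds_star, k=5, scheme="DS"):
    """X_meta: (n, d) meta-level training instances
    errors: (n, m) stored base errors Err_i(x_j) = |f_i(x_j) - y_j|
    preds_star: (m,) base predictions f_1(x*), ..., f_m(x*)"""
    dist = np.linalg.norm(X_meta - x_star, axis=1)
    nn = np.argsort(dist)[:k]                        # k nearest meta-instances
    w = 1.0 / (dist[nn] + 1e-12)                     # distance weights
    cum_err = (w[:, None] * errors[nn]).sum(axis=0)  # cumulative error per model

    if scheme == "DS":                               # pick the lowest-error model
        return preds_star[np.argmin(cum_err)]
    model_w = 1.0 / (cum_err + 1e-12)                # weight inversely to error
    if scheme == "DWS":                              # drop models above the median error
        model_w[cum_err > np.median(cum_err)] = 0.0
    return np.average(preds_star, weights=model_w)   # DW / DWS combination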

Applications

- Distributed data mining
- Intrusion detection
- Concept drift

Key papers

- Wolpert, D. H.: Stacked Generalization. Neural Networks, 5 (1992) 241-259
- Breiman, L.: Stacked Regressions. Machine Learning, 24 (1996) 49-64
- Dietterich, T. G.: Ensemble Methods in Machine Learning. Lecture Notes in Computer Science, 1857 (2000) 1-15
- Dzeroski, S., & Zenko, B.: Is Combining Classifiers with Stacking Better than Selecting the Best One? Machine Learning, 54 (2004) 255-273
- Ting, K. M., & Witten, I. H.: Issues in Stacked Generalization. Journal of Artificial Intelligence Research, 10 (1999) 271-289