Boosting
Rong Jin
Feb 23, 2016
Inefficiency with Bagging
[Figure: the bagging pipeline. Bootstrap sampling draws datasets D1, D2, …, Dk from D; each Di trains a classifier hi, and the classifiers' predictions Pr(c | hi, xi) are combined.]
Inefficient bootstrap sampling:
• Every example has an equal chance of being sampled
• No distinction between “easy” examples and “difficult” examples

Inefficient model combination:
• A constant weight for each classifier
• No distinction between accurate classifiers and inaccurate classifiers
Improve the Efficiency of Bagging
Better sampling strategy
• Focus on the examples that are difficult to classify

Better combination strategy
• Accurate models should be assigned larger weights
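The contrast between the two sampling strategies can be sketched in a few lines. This is an illustration, not from the slides; the example names and weights are hypothetical.

```python
import random

# Bagging samples every example with equal probability; boosting-style
# sampling draws difficult examples more often. Here we pretend x1 and
# x3 are the "hard" examples and give them larger weights.
random.seed(0)

examples = ["x1", "x2", "x3", "x4", "x5"]
weights  = [0.4, 0.05, 0.4, 0.1, 0.05]   # hypothetical difficulty weights

bagging_draw  = random.choices(examples, k=8)                    # uniform
boosting_draw = random.choices(examples, weights=weights, k=8)   # weighted

print(bagging_draw)
print(boosting_draw)
```

In the weighted draw, x1 and x3 dominate the sample, so the next classifier concentrates on them.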
Intuition
[Figure: four training examples (x1, y1), …, (x4, y4). Classifier 1 misclassifies (x1, y1) and (x3, y3); adding Classifier 2, the combination misclassifies only (x1, y1); adding Classifier 3 leaves no training mistakes, but the ensemble may overfit.]

Each new classifier focuses on the examples the current combination still gets wrong.
AdaBoost Algorithm
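The algorithm can be sketched compactly. This is a minimal reconstruction of standard AdaBoost with decision stumps as the weak learners; the function names are ours, but the reweighting rule and the classifier weight 0.5·ln((1−ε)/ε) are the standard recipe.

```python
import numpy as np

def stump_predict(X, feat, thresh, sign):
    """Decision stump: predict `sign` where X[:, feat] > thresh, else -sign."""
    return np.where(X[:, feat] > thresh, sign, -sign)

def best_stump(X, y, D):
    """Pick the stump with the lowest weighted error under distribution D."""
    best, best_err = None, np.inf
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for sign in (1, -1):
                err = D[stump_predict(X, feat, thresh, sign) != y].sum()
                if err < best_err:
                    best_err, best = err, (feat, thresh, sign)
    return best, best_err

def adaboost(X, y, T=10):
    n = len(y)
    D = np.full(n, 1.0 / n)              # start from the uniform distribution
    ensemble = []                        # list of (alpha_t, stump_t) pairs
    for _ in range(T):
        stump, err = best_stump(X, y, D)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # guard against a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, *stump)
        D = D * np.exp(-alpha * y * pred)      # up-weight mistakes
        D = D / D.sum()                        # renormalize to a distribution
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    """Weighted vote: sign of the alpha-weighted sum of stump outputs."""
    return np.sign(sum(a * stump_predict(X, *s) for a, s in ensemble))

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1, 1, -1, -1])
model = adaboost(X, y, T=5)
print(predict(model, X))   # matches y on this separable toy set
```

Note that training here uses the weighted error directly rather than resampling from D_t; both variants appear in the literature, and the slides' sampling view is equivalent in expectation.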
AdaBoost Example (α_t = ln 2)

Start with the uniform distribution D0 = (1/5, 1/5, 1/5, 1/5, 1/5) over the five training examples (x1, y1), …, (x5, y5).

1. Sample a training set (e.g. x5, x3, x1) from D0 and train h1.
2. h1 misclassifies x1 and x3; doubling their weights (e^α = 2) and renormalizing gives D1 = (2/7, 1/7, 2/7, 1/7, 1/7).
3. Sample a training set (e.g. x3, x1) from D1 and train h2.
4. h2 misclassifies x3; updating the weights again gives D2 = (2/9, 1/9, 4/9, 1/9, 1/9), and so on.
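The weight sequences on this slide can be reproduced exactly: with α = ln 2, each misclassified example's weight is multiplied by e^α = 2 before renormalizing. Assuming, as the updates indicate, that h1 errs on x1 and x3 and h2 errs on x3:

```python
from fractions import Fraction

def update(D, mistakes):
    """Double the weight of each misclassified example (e^alpha = 2 for
    alpha = ln 2), then renormalize so the weights sum to 1."""
    new = [w * (2 if i in mistakes else 1) for i, w in enumerate(D)]
    total = sum(new)
    return [w / total for w in new]

D0 = [Fraction(1, 5)] * 5
D1 = update(D0, mistakes={0, 2})   # h1 errs on x1, x3
D2 = update(D1, mistakes={2})      # h2 errs on x3
print([str(w) for w in D1])        # ['2/7', '1/7', '2/7', '1/7', '1/7']
print([str(w) for w in D2])        # ['2/9', '1/9', '4/9', '1/9', '1/9']
```

Using `Fraction` keeps the arithmetic exact, so the output matches the slide's fractions verbatim.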
How To Choose α_t in AdaBoost?

How to construct the best distribution D_{t+1}(i):
1. D_{t+1}(i) should be significantly different from D_t(i)
2. D_{t+1}(i) should create a situation in which classifier h_t performs poorly
Optimization View for Choosing α_t

h_t(x): X → {+1, −1}, a base (weak) classifier
H_T(x): a linear combination of the base classifiers

Goal: minimize the training error.

Approximate the 0-1 error with an exponential function.

AdaBoost as greedy optimization: fix H_{T−1}(x), then solve for h_T(x) and α_T.
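The greedy step can be written out explicitly. This reconstruction follows the standard AdaBoost derivation, filling in the symbols (ε_T, D_T) the slides use elsewhere:

```latex
% The 0-1 training error is upper-bounded by the exponential loss:
\operatorname{err}(H_T)
  = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\!\left[y_i \neq \operatorname{sign}(H_T(x_i))\right]
  \le \frac{1}{n}\sum_{i=1}^{n} \exp\!\left(-y_i H_T(x_i)\right),
\qquad H_T(x) = \sum_{t=1}^{T} \alpha_t h_t(x).

% Greedy step: write H_T = H_{T-1} + \alpha_T h_T, fix H_{T-1},
% and minimize the bound over \alpha_T for the chosen h_T.
% With the weighted error
%   \varepsilon_T = \sum_{i} D_T(i)\,\mathbf{1}\!\left[h_T(x_i) \neq y_i\right],
% the minimizer is
\alpha_T = \frac{1}{2}\,\ln\frac{1-\varepsilon_T}{\varepsilon_T}.
```

A more accurate h_T (smaller ε_T) thus receives a larger weight α_T, which is exactly the "better combination strategy" motivated earlier.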
Empirical Study of AdaBoost
AdaBoosting decision trees:
• Generate 50 decision trees by AdaBoost
• Linearly combine the decision trees using the weights of AdaBoost

In general:
• AdaBoost = Bagging > C4.5
• AdaBoost usually needs fewer classifiers than Bagging
Bias-Variance Tradeoff for AdaBoost
• AdaBoost can reduce both bias and variance simultaneously
[Figure: bias and variance of a single decision tree vs. bagged decision trees vs. AdaBoosted decision trees]