Boosting
Rong Jin
Feb 23, 2016
Inefficiency with Bagging
[Figure: the bagging pipeline. Bootstrap sampling draws datasets D1, D2, …, Dk from D; each Di trains a classifier hi, and the classifiers' predictions Pr(c | hi, xi) are combined.]
Inefficient bootstrap sampling:
• Every example has an equal chance of being sampled
• No distinction between “easy” examples and “difficult” examples

Inefficient model combination:
• A constant weight for each classifier
• No distinction between accurate classifiers and inaccurate classifiers
Improve the Efficiency of Bagging
Better sampling strategy
• Focus on the examples that are difficult to classify

Better combination strategy
• Accurate models should be assigned larger weights
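The contrast between the two sampling strategies can be sketched in a few lines. This is an illustration, not from the slides; the example names and weights are hypothetical.

```python
import random

# Bagging samples every example with equal probability; boosting-style
# sampling draws difficult examples more often. Here we pretend x1 and
# x3 are the "hard" examples and give them larger weights.
random.seed(0)

examples = ["x1", "x2", "x3", "x4", "x5"]
weights  = [0.4, 0.05, 0.4, 0.1, 0.05]   # hypothetical difficulty weights

bagging_draw  = random.choices(examples, k=8)                    # uniform
boosting_draw = random.choices(examples, weights=weights, k=8)   # weighted

print(bagging_draw)
print(boosting_draw)
```

In the weighted draw, x1 and x3 dominate the sample, so the next classifier concentrates on them.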
Intuition
[Figure: four training examples (x1, y1), …, (x4, y4). Classifier 1 misclassifies (x1, y1) and (x3, y3); adding Classifier 2, the combination misclassifies only (x1, y1); adding Classifier 3 leaves no training mistakes, but the ensemble may overfit.]

Each new classifier focuses on the examples the current combination still gets wrong.
AdaBoost Algorithm
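The algorithm can be sketched compactly. This is a minimal reconstruction of standard AdaBoost with decision stumps as the weak learners; the function names are ours, but the reweighting rule and the classifier weight 0.5·ln((1−ε)/ε) are the standard recipe.

```python
import numpy as np

def stump_predict(X, feat, thresh, sign):
    """Decision stump: predict `sign` where X[:, feat] > thresh, else -sign."""
    return np.where(X[:, feat] > thresh, sign, -sign)

def best_stump(X, y, D):
    """Pick the stump with the lowest weighted error under distribution D."""
    best, best_err = None, np.inf
    for feat in range(X.shape[1]):
        for thresh in np.unique(X[:, feat]):
            for sign in (1, -1):
                err = D[stump_predict(X, feat, thresh, sign) != y].sum()
                if err < best_err:
                    best_err, best = err, (feat, thresh, sign)
    return best, best_err

def adaboost(X, y, T=10):
    n = len(y)
    D = np.full(n, 1.0 / n)              # start from the uniform distribution
    ensemble = []                        # list of (alpha_t, stump_t) pairs
    for _ in range(T):
        stump, err = best_stump(X, y, D)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # guard against a perfect stump
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(X, *stump)
        D = D * np.exp(-alpha * y * pred)      # up-weight mistakes
        D = D / D.sum()                        # renormalize to a distribution
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    """Weighted vote: sign of the alpha-weighted sum of stump outputs."""
    return np.sign(sum(a * stump_predict(X, *s) for a, s in ensemble))

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1, 1, -1, -1])
model = adaboost(X, y, T=5)
print(predict(model, X))   # matches y on this separable toy set
```

Note that training here uses the weighted error directly rather than resampling from D_t; both variants appear in the literature, and the slides' sampling view is equivalent in expectation.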
AdaBoost Example (α_t = ln 2)

Start with the uniform distribution D0 = (1/5, 1/5, 1/5, 1/5, 1/5) over the five training examples (x1, y1), …, (x5, y5).

1. Sample a training set (e.g. x5, x3, x1) from D0 and train h1.
2. h1 misclassifies x1 and x3; doubling their weights (e^α = 2) and renormalizing gives D1 = (2/7, 1/7, 2/7, 1/7, 1/7).
3. Sample a training set (e.g. x3, x1) from D1 and train h2.
4. h2 misclassifies x3; updating the weights again gives D2 = (2/9, 1/9, 4/9, 1/9, 1/9), and so on.
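The weight sequences on this slide can be reproduced exactly: with α = ln 2, each misclassified example's weight is multiplied by e^α = 2 before renormalizing. Assuming, as the updates indicate, that h1 errs on x1 and x3 and h2 errs on x3:

```python
from fractions import Fraction

def update(D, mistakes):
    """Double the weight of each misclassified example (e^alpha = 2 for
    alpha = ln 2), then renormalize so the weights sum to 1."""
    new = [w * (2 if i in mistakes else 1) for i, w in enumerate(D)]
    total = sum(new)
    return [w / total for w in new]

D0 = [Fraction(1, 5)] * 5
D1 = update(D0, mistakes={0, 2})   # h1 errs on x1, x3
D2 = update(D1, mistakes={2})      # h2 errs on x3
print([str(w) for w in D1])        # ['2/7', '1/7', '2/7', '1/7', '1/7']
print([str(w) for w in D2])        # ['2/9', '1/9', '4/9', '1/9', '1/9']
```

Using `Fraction` keeps the arithmetic exact, so the output matches the slide's fractions verbatim.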
How To Choose α_t in AdaBoost?

How to construct the best distribution D_{t+1}(i):
1. D_{t+1}(i) should be significantly different from D_t(i)
2. D_{t+1}(i) should create a situation in which classifier h_t performs poorly
Optimization View for Choosing α_t

h_t(x): X → {+1, −1}, a base (weak) classifier
H_T(x): a linear combination of the base classifiers

Goal: minimize the training error.

Approximate the 0-1 error with an exponential function.

AdaBoost as greedy optimization: fix H_{T−1}(x), then solve for h_T(x) and α_T.
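The greedy step can be written out explicitly. This reconstruction follows the standard AdaBoost derivation, filling in the symbols (ε_T, D_T) the slides use elsewhere:

```latex
% The 0-1 training error is upper-bounded by the exponential loss:
\operatorname{err}(H_T)
  = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\!\left[y_i \neq \operatorname{sign}(H_T(x_i))\right]
  \le \frac{1}{n}\sum_{i=1}^{n} \exp\!\left(-y_i H_T(x_i)\right),
\qquad H_T(x) = \sum_{t=1}^{T} \alpha_t h_t(x).

% Greedy step: write H_T = H_{T-1} + \alpha_T h_T, fix H_{T-1},
% and minimize the bound over \alpha_T for the chosen h_T.
% With the weighted error
%   \varepsilon_T = \sum_{i} D_T(i)\,\mathbf{1}\!\left[h_T(x_i) \neq y_i\right],
% the minimizer is
\alpha_T = \frac{1}{2}\,\ln\frac{1-\varepsilon_T}{\varepsilon_T}.
```

A more accurate h_T (smaller ε_T) thus receives a larger weight α_T, which is exactly the "better combination strategy" motivated earlier.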
Empirical Study of AdaBoost
AdaBoosting decision trees:
• Generate 50 decision trees by AdaBoost
• Linearly combine the decision trees using the weights of AdaBoost

In general:
• AdaBoost = Bagging > C4.5
• AdaBoost usually needs fewer classifiers than Bagging
Bias-Variance Tradeoff for AdaBoost
• AdaBoost can reduce both bias and variance simultaneously
[Figure: bias and variance of a single decision tree vs. bagged decision trees vs. AdaBoosted decision trees]