Boosting of classifiers – Ata Kaban
Transcript
Page 1:

Boosting of classifiers

Ata Kaban

Page 2:

Motivation & beginnings

• Suppose we have a learning algorithm that is guaranteed with high probability to be slightly better than random guessing – we call this a weak learner
  – E.g. if an email contains the word “money” then classify it as spam, otherwise as non-spam (a minimal sketch of such a rule follows this list)
• Is it possible to use this weak learning algorithm to create a strong classifier with error rate close to 0?
• Ensemble learning – the wisdom of crowds
  – More heads are better than one
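As a concrete illustration, here is a minimal sketch of the “money” rule as a weak classifier in Python. It is not from the original slides; the function name and the ±1 label encoding (spam = +1, non-spam = -1) are illustrative assumptions.

# A single hand-crafted rule: predict spam (+1) if the email mentions "money",
# non-spam (-1) otherwise. On its own it is only slightly better than guessing.
def money_rule(email_text: str) -> int:
    return +1 if "money" in email_text.lower() else -1

print(money_rule("Claim your free money now!"))    # +1 (classified as spam)
print(money_rule("Minutes of today's meeting"))    # -1 (classified as non-spam)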

Page 3:

Motivation & beginnings

• The answer is YES
• Rob Schapire and Yoav Freund developed the Adaboost algorithm
• Given:
  1. Examples (x_1, y_1), …, (x_N, y_N), where each label y_i ∈ {-1, +1}
  2. A weak learning algorithm A that produces weak classifiers h_t
• Goal: Produce a new classifier H with error close to 0. Note, H is not required to be one of the weak classifiers that A can produce.

Page 4:

Idea

• Use the weak learning algorithm to produce a collection of weak classifiers
  – Modify the input each time when asking for a new weak classifier
• Weight the training points differently
• Find a good way to combine them

Page 5:

Main idea behind Adaboost

• Iterative algorithm
• Maintains a distribution of weights over the training examples
• Initially weights are equal
• At successive iterations the weight of misclassified examples is increased
• This forces the algorithm to focus on the examples that have not been classified correctly in previous rounds
• Take a linear combination of the predictions of the weak learners, with coefficients that reflect the performance of each weak learner.

Page 6:

Demo – first two rounds

Page 7:

Demo (cont’d) – third round …

Page 8:

Demo (cont’d) – the final classifier

Page 9:

Pseudo-code

• For t = 1, …, T
  – Construct a discrete probability distribution over the indices of the training points {1, 2, …, N}; denote it D_t
  – Run algorithm A on D_t to produce a weak classifier h_t
  – Calculate the weighted error ε_t = Σ_{i : h_t(x_i) ≠ y_i} D_t(i), which by the weak learning assumption is slightly smaller than ½ (random guessing)
• Output H(x) = sign( Σ_{t=1}^{T} α_t h_t(x) ), where the coefficients α_t are determined from the errors ε_t (next slides)

Page 10:

Details for pseudo-code

• How to construct the distribution D_t
• How to determine the coefficients α_t

Adaboost does these in the following way:

Page 11:

The weights of training points

• Initially all weights are equal: D_1(i) = 1/N for every training point i.

• Weights of examples go up or down depending on how easy the example was to classify. After round t the weights are updated as
  D_{t+1}(i) = D_t(i) · exp( -α_t y_i h_t(x_i) ) / Z_t,
  where Z_t is the normalisation constant that makes D_{t+1} sum to 1. If an example is easy (classified correctly) it will get a small weight; hard (misclassified) ones get large weights.

Page 12:

The combination coefficients

• Weighted vote, where the coefficient α_t for weak learner h_t is related to how well the weak classifier performed on the weighted training set:
  α_t = ½ ln( (1 - ε_t) / ε_t ).
  The smaller the weighted error ε_t, the larger α_t, so more accurate weak classifiers get a bigger say in the final vote (a full sketch of the algorithm follows below).
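Putting the pseudo-code and these two details together, the following is a minimal NumPy sketch of the Adaboost loop, using decision stumps as the weak learner. It is an illustrative implementation of the scheme above rather than code from the original slides; the stump learner, function names and synthetic data are assumptions.

import numpy as np

def train_stump(X, y, D):
    # Weak learning algorithm A: pick the single-feature threshold rule
    # with the lowest weighted error under the distribution D.
    best = None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = np.sum(D[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    err, j, thr, sign = best
    h = lambda Z, j=j, thr=thr, sign=sign: np.where(Z[:, j] <= thr, sign, -sign)
    return h, err

def adaboost(X, y, T=20):
    # y must be in {-1, +1}; returns the list of (alpha_t, h_t) pairs.
    N = len(y)
    D = np.full(N, 1.0 / N)                      # D_1(i) = 1/N: all weights equal
    ensemble = []
    for t in range(T):
        h, eps = train_stump(X, y, D)            # run A on D_t -> weak classifier h_t, error eps_t
        eps = max(eps, 1e-10)                    # guard against division by zero
        alpha = 0.5 * np.log((1 - eps) / eps)    # alpha_t = 1/2 ln((1 - eps_t) / eps_t)
        D = D * np.exp(-alpha * y * h(X))        # up-weight misclassified, down-weight correct
        D = D / D.sum()                          # divide by Z_t so D_{t+1} sums to 1
        ensemble.append((alpha, h))
    return ensemble

def predict(ensemble, X):
    # Final classifier H(x) = sign( sum_t alpha_t h_t(x) )
    return np.sign(sum(alpha * h(X) for alpha, h in ensemble))

# Usage on a small synthetic data set
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
model = adaboost(X, y, T=20)
print("training error:", np.mean(predict(model, X) != y))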

Page 13:

Comments

• One can show that the training error of Adaboost drops exponentially fast as the rounds progress (see the bound after this list)

• The more rounds the more complex the final classifier is, so overfitting can happen

• In practice overfitting is rarely observed and Adaboost tends to have excellent generalisation performance
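For reference, the standard result behind the first claim (it is not shown on the original slides, but it is the well-known Adaboost training-error bound) is

\[
\frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\{H(x_i) \neq y_i\}
\;\le\; \prod_{t=1}^{T} Z_t
\;=\; \prod_{t=1}^{T} 2\sqrt{\varepsilon_t (1-\varepsilon_t)}
\;\le\; \exp\!\Big(-2 \sum_{t=1}^{T} \big(\tfrac{1}{2}-\varepsilon_t\big)^{2}\Big),
\]

so if every weak classifier beats random guessing by a margin γ (i.e. ε_t ≤ ½ - γ), the training error is at most exp(-2Tγ²), which indeed drops exponentially fast in the number of rounds T.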

Page 14:

Typical behaviour

Page 15:

Practical advantages of Adaboost

• Can construct arbitrarily complex decision regions

• Generic: can use any classifier as a weak learner; we only need it to be slightly better than random guessing

• Simple to implement (see the usage sketch after this list)
• Fast to run
• Adaboost is one of the ‘top 10’ algorithms in data mining
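As a practical illustration of how simple Adaboost is to use, here is a minimal sketch with scikit-learn’s off-the-shelf implementation; this assumes scikit-learn is installed and is not part of the original slides. By default it boosts depth-1 decision trees (decision stumps) as weak learners.

# Minimal usage sketch with scikit-learn (assumed installed), on synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = AdaBoostClassifier(n_estimators=100, random_state=0)  # stumps by default
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))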

Page 16:

Caveats

• Adaboost can fail if there is noise in the class labels (wrong labels)

• It can fail if the weak learners are too complex
• It can fail if the weak learners are no better than random guessing

Page 17:

Topics not covered

• Other combination schemes for classifiers
  – E.g. Bagging

• Combinations for unsupervised learning

Page 18:

Further readings

• Robert E. Schapire. The boosting approach to machine learning. In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, B. Yu, editors, Nonlinear Estimation and Classification. Springer, 2003.

• Robi Polikar. Ensemble Based Systems in Decision Making. IEEE Circuits and Systems Magazine, 6(3), pp. 21-45, 2006. http://users.rowan.edu/~polikar/RESEARCH/PUBLICATIONS/csm06.pdf

• Thomas G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Machine Learning, 40(2): 139-158, 2000.

• Collection of papers: http://www.cs.gmu.edu/~carlotta/teaching/CS-795-s09/info.html