Classification
Derek Hoiem, CS 598, Spring 2009
Jan 27, 2009
Outline
• Principles of generalization
• Survey of classifiers
• Project discussion
• Discussion of Rosch
Pipeline for Prediction
Imagery → Representation → Classifier → Predictions
No Free Lunch Theorem
• No single classifier is best across all possible problems; gains on some problems are paid for on others
Bias and Variance
[Figure: error vs. model complexity; low-complexity models have high bias and low variance, high-complexity models have low bias and high variance]
Overfitting
• Need validation set
• Validation set not same as test set
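A minimal sketch of this protocol (scikit-learn on synthetic data; the split ratios are illustrative assumptions): hyperparameters are tuned on the validation set, and the test set is touched only once.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hold out the test set first, then split the rest into train and validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Tune hyperparameters on (X_val, y_val); evaluate on (X_test, y_test) only once.
```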
Bias-Variance View of Features
• More compact = lower variance, potentially higher bias
• More features = higher variance, lower bias
• More independence among features = simpler classifier, lower variance
How to reduce variance
• Parameterize model
– E.g., linear vs. piecewise
How to measure complexity?
• VC dimension
• With probability $1 - \eta$, the test error is bounded by the training error plus a capacity term (Vapnik's bound):
$$\mathrm{err}_{\mathrm{test}} \;\le\; \mathrm{err}_{\mathrm{train}} + \sqrt{\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\eta}{4}}{N}}$$
– N: size of training set; h: VC dimension; $\eta$: 1 − probability that the bound holds
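A quick numeric sketch of this bound (assuming the Vapnik form above; exact constants differ across textbook statements):

```python
import math

def vc_bound(train_err, N, h, eta=0.05):
    """Test-error upper bound holding with probability 1 - eta."""
    capacity = math.sqrt((h * (math.log(2 * N / h) + 1) - math.log(eta / 4)) / N)
    return train_err + capacity

# The bound tightens as the training set grows relative to the VC dimension h.
print(vc_bound(train_err=0.05, N=10000, h=50))  # ~0.24
print(vc_bound(train_err=0.05, N=500, h=50))    # ~0.69
```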
How to reduce variance
• Parameterize model
• Regularize
• Increase number of training examples
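A small illustration of the second bullet (scikit-learn; the alpha values are illustrative assumptions): an L2 (ridge) penalty shrinks weights toward zero, giving a lower-variance model.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Few samples, many features: a setting where variance dominates.
X, y = make_regression(n_samples=50, n_features=30, noise=10.0, random_state=0)

for alpha in [0.01, 1.0, 100.0]:
    w = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>6}: mean |w| = {abs(w).mean():.2f}")
# Larger alpha shrinks the weights, giving a smoother, lower-variance predictor.
```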
Effect of Training Size
[Figure: error vs. number of training examples]
Risk Minimization
• Margins
[Figure: two-class data (x's and o's) in feature space (x1, x2)]
Classifiers
• Generative methods
– Naïve Bayes
– Bayesian Networks
• Discriminative methods
– Logistic Regression
– Linear SVM
– Kernelized SVM
• Ensemble methods
– Randomized Forests
– Boosted Decision Trees
• Instance-based
– K-nearest neighbor
• Unsupervised
– K-means
Components of classification methods
• Objective function
• Parameterization
• Regularization
• Training
• Inference
Classifiers: Naïve Bayes
• Objective
• Parameterization
• Regularization
• Training
• Inference
[Figure: graphical model with class y generating independent features x1, x2, x3]
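A hedged sketch of these components for naïve Bayes (scikit-learn's Gaussian variant; dataset is illustrative): training estimates per-class feature statistics under the independence assumption in the figure, and inference applies Bayes' rule.

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
model = GaussianNB().fit(X, y)      # training: per-class feature mean/variance
print(model.predict_proba(X[:2]))   # inference: P(y | x1, x2, x3) via Bayes' rule
```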
Classifiers: Logistic Regression
• Objective
• Parameterization
• Regularization
• Training
• Inference
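A comparable sketch for logistic regression (scikit-learn; the C value is illustrative): linear parameterization w·x + b, log-loss objective, L2 regularization controlled by C, probabilistic inference via the sigmoid.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(C=1.0).fit(X, y)   # smaller C = stronger L2 penalty
print(model.predict_proba(X[:2]))             # P(y|x) = sigmoid(w.x + b)
```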
Classifiers: Linear SVM
• Objective
• Parameterization
• Regularization
• Training
• Inference
[Figure: linearly separable two-class data (x's and o's) in (x1, x2)]
Classifiers: Linear SVM
• Objective
• Parameterization
• Regularization
• Training
• Inference
[Figure: the same data with an o among the x's; no longer linearly separable]
• Needs slack
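A soft-margin sketch of the slack idea (scikit-learn; the C values are illustrative assumptions): C trades off margin width against slack on misclassified points.

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, class_sep=0.8, random_state=0)
for C in [0.01, 1.0, 100.0]:
    svm = LinearSVC(C=C, max_iter=10000).fit(X, y)  # small C: wide margin, more slack
    print(C, svm.score(X, y))
```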
Classifiers: Kernelized SVM
• Objective
• Parameterization
• Regularization
• Training
• Inference
[Figure: 1-D data (x's and o's) on the x1 axis that is not linearly separable; after mapping to (x1, x1²) it is separable by a line]
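A sketch of the lifting in the figure (data values are illustrative): 1-D points that are not linearly separable become separable after the explicit map x1 → (x1, x1²); a kernelized SVM achieves the same effect implicitly.

```python
import numpy as np
from sklearn.svm import SVC

# 1-D data: two outer clusters (x) around one inner cluster (o).
x = np.array([-3.0, -2.5, -2.0, 2.0, 2.5, 3.0, -0.5, 0.0, 0.5]).reshape(-1, 1)
y = np.array([1, 1, 1, 1, 1, 1, 0, 0, 0])

phi = np.hstack([x, x ** 2])                   # explicit map x1 -> (x1, x1^2)
print(SVC(kernel="linear").fit(phi, y).score(phi, y))      # separable now

print(SVC(kernel="poly", degree=2).fit(x, y).score(x, y))  # implicit kernel
```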
Classifiers: Decision Trees
• Objective
• Parameterization
• Regularization
• Training
• Inference
[Figure: two-class data (x's and o's) in (x1, x2)]
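A minimal decision-tree sketch (scikit-learn; the depth is an illustrative choice): training greedily picks axis-aligned splits on x1 or x2 by impurity reduction, and max_depth acts as the regularizer.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
tree = DecisionTreeClassifier(max_depth=3).fit(X, y)  # depth limits complexity
print(tree.score(X, y))
```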
Ensemble Methods: Boosting
[Figure from Friedman et al. 2000]
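A hedged sketch of the boosting loop (scikit-learn; settings are illustrative): each round fits a weak learner to data reweighted toward the previous rounds' mistakes, and the ensemble is their weighted vote.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, random_state=0)
# Default weak learner is a depth-1 decision stump; each round reweights
# the training data toward examples the current ensemble gets wrong.
boost = AdaBoostClassifier(n_estimators=100).fit(X, y)
print(boost.score(X, y))
```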
Boosted Decision Trees
[Figure: two example decision trees with yes/no tests such as "Gray?", "High in image?", "Many long lines?", "Very high vanishing point?", "Smooth?", "Green?", "Blue?", predicting the labels Ground, Vertical, and Sky; the ensemble models P(label | good segment, data)]
[Collins et al. 2002]
Boosted Decision Trees
• How to control bias/variance trade-off
– Size of trees
– Number of trees
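An illustrative sketch of these two knobs (using scikit-learn's gradient-boosted trees as a stand-in; the values are assumptions): max_depth sets the size of each tree, n_estimators the number of trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)
for depth, n_trees in [(1, 50), (1, 500), (3, 500)]:
    model = GradientBoostingClassifier(max_depth=depth,
                                       n_estimators=n_trees).fit(X, y)
    print(f"depth={depth}, trees={n_trees}: train acc = {model.score(X, y):.3f}")
```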
K-nearest neighbor
• Objective
• Parameterization
• Regularization
• Training
• Inference
[Figure: two-class data (x's and o's) in (x1, x2)]
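A k-NN sketch (scikit-learn; k is an illustrative choice): "training" just stores the data, inference votes among the k nearest points, and k itself acts as the regularizer.

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)  # "training" = store data
print(knn.predict(X[:2]))                            # vote of 5 nearest points
```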
Clustering
[Figure: the same points shown first with class labels (x's and o's), then unlabeled (+'s); clustering must group the unlabeled points]
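A K-means sketch (scikit-learn; k = 2 is an assumption about the data): training alternates between assigning points to the nearest centroid and re-estimating centroids; no labels are used.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=2, random_state=0)  # labels discarded
km = KMeans(n_clusters=2, n_init=10).fit(X)  # alternate: assign, re-center
print(km.cluster_centers_)
```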
References
• General
– Tom Mitchell, Machine Learning, McGraw-Hill, 1997
– Christopher Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995
• Adaboost
– Friedman, Hastie, and Tibshirani, "Additive logistic regression: a statistical view of boosting", Annals of Statistics, 2000
• SVMs
– http://www.support-vector.net/icml-tutorial.pdf
Project ideas?
Discussion of Rosch