Classification using intersection kernel SVMs is efficient

Joint work with Subhransu Maji and Alex Berg (CVPR'08)

Jitendra Malik, UC Berkeley

Detection: Is this an X?

Ask this question over and over again, varying position, scale, multiple categories…





Boosted decision trees
+ Very fast evaluation
- Slow training (esp. multi-class)

Linear SVM
+ Fast evaluation
+ Fast training
- Low accuracy unless very good features

Non-linear kernelized SVM
+ Better accuracy than linear
. Medium training
- Slow evaluation

This work

Support Vector Machines

Linear Separators (aka. Perceptrons)


Other possible solutions


Which one is better? B1 or B2? How do you define better?


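Find the hyperplane that maximizes the margin => B1 is better than B2

[Figure: separators B1 and B2 with their margin boundaries (b11, b12 for B1; b21, b22 for B2); the margin is the gap between a separator's two boundaries]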
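The "which separator is better" question can be made concrete with a small sketch. The data points and the two separators below are hypothetical, chosen only so that both lines classify every point correctly. The function computes the distance from the separator to its closest point; the margin drawn in the figure would be twice that value.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical toy data: two linearly separable classes in 2-D.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 3.0), (4.0, 3.0)]
labels = [-1, -1, +1, +1]

# Two hypothetical separators (w, b); both classify every point correctly.
B1 = ((1.0, 1.0), -3.5)   # the line x + y = 3.5
B2 = ((0.0, 1.0), -0.5)   # the line y = 0.5

margin_B1 = geometric_margin(B1[0], B1[1], points, labels)
margin_B2 = geometric_margin(B2[0], B2[1], points, labels)

# A positive value means all points are on the correct side.
print("B1:", margin_B1, "B2:", margin_B2)
```

On this toy data B1 keeps every point farther away than B2 does, which is the sense in which the slide calls B1 better.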

Kernel Support Vector Machines

Kernel:

• Inner product in Hilbert space

• Can learn non-linear boundaries

$K(x, z) = \exp\left( -\frac{\|x - z\|^2}{2\sigma^2} \right)$

$K(x, z) = \Phi(x)^T \Phi(z)$
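A minimal sketch of the two statements on this slide. The Gaussian kernel is written with a $2\sigma^2$ denominator, which is an assumption about the slide's exact scaling. The quadratic kernel is not from the slides; it is used here only because its feature map $\Phi$ is small enough to write out, so the identity $K(x, z) = \Phi(x)^T \Phi(z)$ can be checked numerically.

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def quadratic_kernel(x, z):
    """(x . z)^2, chosen for illustration because its feature map is finite."""
    return sum(xi * zi for xi, zi in zip(x, z)) ** 2

def quadratic_feature_map(x):
    """Explicit Phi for the quadratic kernel on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, -1.0)

# Inner product computed in the explicit feature space.
phi_dot = sum(p * q for p, q in zip(quadratic_feature_map(x),
                                    quadratic_feature_map(z)))

print(quadratic_kernel(x, z), phi_dot)   # the two values agree up to rounding
print(rbf_kernel(x, z, sigma=1.0))       # strictly between 0 and 1 for x != z
```

The kernel evaluates the inner product without ever building $\Phi(x)$, which is what makes non-linear boundaries affordable at training time.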

Feature Representation

Discriminative Classifier

(+ examples) (- examples)

Training Stage

Our Multiscale HOG-like feature

Concatenate orientation histograms for each orange region.

Differences from HOG:
-- Hierarchy of regions
-- Only performing L1 normalization once (at 16x16)

Comparison to HOG (Dalal & Triggs)

Smaller Dimensional (1360 vs. 3780)

Simple Implementation (Convolutions)

Faster to compute

+ No non-local Normalization

+ No gaussian weighting

+ No color normalization

What is the Intersection Kernel?

Histogram intersection kernel between histograms a, b: $K(a, b) = \sum_{i=1}^{n} \min(a_i, b_i)$, with $a_i \ge 0$, $b_i \ge 0$

K small -> a, b are different

K large -> a, b are similar
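A short sketch of the histogram intersection kernel, taken as the sum of coordinate-wise minima. The three histograms are made-up, L1-normalized examples.

```python
def intersection_kernel(a, b):
    """Histogram intersection: sum of coordinate-wise minima."""
    return sum(min(ai, bi) for ai, bi in zip(a, b))

# Hypothetical L1-normalized histograms with four bins each.
a = [0.5, 0.3, 0.2, 0.0]
b = [0.4, 0.4, 0.1, 0.1]   # similar shape to a
c = [0.0, 0.0, 0.1, 0.9]   # mass concentrated in a different bin

k_ab = intersection_kernel(a, b)   # about 0.8: a and b are similar
k_ac = intersection_kernel(a, c)   # about 0.1: a and c are different
k_aa = intersection_kernel(a, a)   # about 1.0: the L1 mass of a
print(k_ab, k_ac, k_aa)
```

For L1-normalized histograms the kernel value lies between 0 and 1, so it reads directly as an overlap fraction.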

Introduced by Swain and Ballard (1991) to compare color histograms.

Odone et al. (2005) proved positive definiteness.

Can be used directly as a kernel for an SVM.

Compare to

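Linear SVM, Kernelized SVM, IKSVM

Decision function is $\mathrm{sign}(h(x))$, where:

Linear: $h(x) = w^T x + b$

Non-linear using kernel: $h(x) = \sum_{l=1}^{m} \alpha_l y_l K(x, x_l) + b$

Histogram intersection kernel: $h(x) = \sum_{l=1}^{m} \alpha_l y_l \left( \sum_{i=1}^{n} \min(x_i, x_{l,i}) \right) + b$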

Kernelized SVMs slow to evaluate

Arbitrary Kernel

Histogram Intersection Kernel

Feature corresponding to a support vector l

Feature vector to evaluate

Kernel Evaluation

Sum over all support vectors

SVM with Kernel Cost: # Support Vectors x Cost of kernel comp.

IKSVM Cost: # Support Vectors x # feature dimensions
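The cost statement above corresponds to the straightforward evaluation loop, sketched here. The support vectors, coefficients and bias are hypothetical; `coeffs[l]` stands for the product $\alpha_l y_l$.

```python
def intersection_kernel(x, z):
    """Costs one min() per feature dimension."""
    return sum(min(xi, zi) for xi, zi in zip(x, z))

def decision_value(x, support_vectors, coeffs, bias, kernel):
    """h(x) = bias + sum over support vectors of coeffs[l] * K(x, x_l)."""
    return bias + sum(c * kernel(x, sv)
                      for sv, c in zip(support_vectors, coeffs))

# Hypothetical toy model with 3 support vectors in 2 dimensions.
support_vectors = [[0.2, 0.7], [0.6, 0.1], [0.4, 0.4]]
coeffs = [1.0, -0.5, 0.25]
bias = -0.1

h = decision_value([0.5, 0.3], support_vectors, coeffs, bias,
                   intersection_kernel)

# One kernel evaluation per support vector, one min() per dimension each.
min_ops = len(support_vectors) * len(support_vectors[0])
print(h, min_ops)
```

With thousands of support vectors and a feature vector evaluated at every window position and scale, this product is what makes kernelized detection slow.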


The Trick


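The decision function splits into one function per coordinate: $h(x) = \sum_i h_i(x_i) + b$.

Just sort the support vector values in each coordinate, and pre-compute the two running sums that $h_i$ needs:

$h_i(s) = \sum_{l} \alpha_l y_l \min(s, x_{l,i}) = \sum_{l:\, x_{l,i} < s} \alpha_l y_l x_{l,i} + s \sum_{l:\, x_{l,i} \ge s} \alpha_l y_l$

To evaluate, find the position of the query value $s = x_i$ in the sorted support vector values (cost: log #sv), look up the pre-computed sums, multiply & add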

#support vectors x #dimensions

log( #support vectors ) x #dimensions
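A sketch of the sort-and-look-up idea, under the assumption that the pre-computed quantities are a prefix sum of coefficient times value and a suffix sum of coefficients for each coordinate. The function names and the toy model are illustrative, not taken from the authors' software. The naive evaluator is included only to check that both give the same answer.

```python
from bisect import bisect_left

def naive_decision_value(x, support_vectors, coeffs, bias):
    """O(#sv x #dims): evaluate every kernel explicitly."""
    return bias + sum(
        c * sum(min(xi, vi) for xi, vi in zip(x, sv))
        for sv, c in zip(support_vectors, coeffs)
    )

def precompute_tables(support_vectors, coeffs):
    """Per coordinate: sorted values plus the two running sums h_i needs."""
    tables = []
    for i in range(len(support_vectors[0])):
        pairs = sorted((sv[i], c) for sv, c in zip(support_vectors, coeffs))
        values = [v for v, _ in pairs]
        # prefix[r] = sum of c * v over the r smallest values
        prefix = [0.0]
        for v, c in pairs:
            prefix.append(prefix[-1] + c * v)
        # suffix[r] = sum of c over the values ranked r and above
        suffix = [0.0] * (len(pairs) + 1)
        for r in range(len(pairs) - 1, -1, -1):
            suffix[r] = suffix[r + 1] + pairs[r][1]
        tables.append((values, prefix, suffix))
    return tables

def fast_decision_value(x, tables, bias):
    """O(#dims x log #sv): one binary search per coordinate."""
    total = bias
    for s, (values, prefix, suffix) in zip(x, tables):
        r = bisect_left(values, s)   # how many support vector values are < s
        total += prefix[r] + s * suffix[r]
    return total

# Hypothetical toy model; coeffs[l] stands for alpha_l * y_l.
support_vectors = [[0.2, 0.7], [0.6, 0.1], [0.4, 0.4]]
coeffs = [1.0, -0.5, 0.25]
bias = -0.1

tables = precompute_tables(support_vectors, coeffs)
x = [0.5, 0.3]
fast = fast_decision_value(x, tables, bias)
naive = naive_decision_value(x, support_vectors, coeffs, bias)
print(fast, naive)   # equal up to floating-point rounding
```

The tables are built once after training, so the sorting cost is not paid at evaluation time.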

The Trick 2

For IK, $h_i$ is piecewise linear and quite smooth (blue plot). We can approximate it with fewer, uniformly spaced segments (red plot). Saves time & space!

#support vectors x #dimensions

log( #support vectors ) x #dimensions

constant x #dimensions
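A sketch of the uniformly spaced approximation for a single coordinate. It assumes feature values lie in [0, 1] (for example after L1 normalization); the table size, the support vector values and the coefficients are hypothetical. Because the sample points are uniformly spaced, the segment is found by a division rather than a search, so the per-coordinate cost does not depend on the number of support vectors.

```python
def exact_h(s, values, coeffs):
    """Exact h_i(s) = sum over support vectors of coeffs[l] * min(s, values[l])."""
    return sum(c * min(s, v) for v, c in zip(values, coeffs))

def build_table(values, coeffs, num_segments, upper=1.0):
    """Sample h_i at num_segments + 1 uniformly spaced points on [0, upper]."""
    step = upper / num_segments
    table = [exact_h(k * step, values, coeffs) for k in range(num_segments + 1)]
    return table, step

def approx_h(s, table, step):
    """Linear interpolation in the table; inputs outside the range are clamped."""
    pos = min(max(s / step, 0.0), float(len(table) - 1))
    k = min(int(pos), len(table) - 2)   # index of the segment's left end
    frac = pos - k
    return table[k] + frac * (table[k + 1] - table[k])

# Hypothetical values of one coordinate across three support vectors.
values = [0.2, 0.4, 0.6]
coeffs = [1.0, 0.25, -0.5]   # coeffs[l] stands for alpha_l * y_l

table, step = build_table(values, coeffs, num_segments=4)
worst_error = max(
    abs(approx_h(q / 100.0, table, step) - exact_h(q / 100.0, values, coeffs))
    for q in range(101)
)
print(len(table), worst_error)
```

The table stores one number per segment end instead of one per support vector, which is where the memory saving comes from.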


Timing Results

Time to evaluate 10,000 feature vectors

IKSVM with our multi-scale version of HOG features beats Dalal & Triggs. Also for Daimler Chrysler data. Current best on these datasets.

Linear SVM with our multi-scale version of HOG features has worse classification performance than Dalal & Triggs.

reduced memory!

Distribution of support vector values and $h_i$

Distribution of

Best Performance on Pedestrian Detection, Improve on Linear for Many Tasks

INRIA Pedestrians

Daimler Chrysler Pedestrians

Caltech 101 with “simple features”

Linear SVM 40% correct

IKSVM 52% correct

Classification Errors

Results – ETHZ Dataset

Dataset: Ferrari et al., ECCV 2006

255 images, over 5 classes

training = half of positive images for a class + same number from the other classes (1/4 from each)

testing = all other images

large scale changes; extensive clutter

Method   Applelogo   Bottle   Giraffe   Mug    Swan   Avg
PAS*     65.0        89.3     72.3      80.6   64.7   76.7
Our      86.1        81.0     62.1      78.0   100    81.4

Beats many current techniques without any changes to our features/classification framework.

Recall at 0.3 False Positive per Image

Shape is an important cue (use Pb instead of OE)

*Ferrari et al., IEEE PAMI 2008

Other kernels allow similar trick


IKSVM SVM

$h_i$ not piece-wise linear, but we can still use an approximation for fast evaluation.

$h_i$ are piece-wise linear; a uniformly spaced piece-wise linear approx. is fast.
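The same table idea can be sketched for a kernel whose per-coordinate function is smooth rather than piecewise linear. The chi-squared form $2sv/(s+v)$ used below is one common additive variant and is an assumption here; the slides do not state which form they use. Values, coefficients and table size are hypothetical.

```python
def chi2_term(s, v):
    """One coordinate of an additive chi-squared kernel: 2sv / (s + v)."""
    return 2.0 * s * v / (s + v) if s + v > 0 else 0.0

def exact_h(s, values, coeffs, term):
    """h_i(s) for any kernel that is a sum of per-coordinate terms."""
    return sum(c * term(s, v) for v, c in zip(values, coeffs))

def build_table(values, coeffs, term, num_segments, upper=1.0):
    """Sample h_i at uniformly spaced points on [0, upper]."""
    step = upper / num_segments
    table = [exact_h(k * step, values, coeffs, term)
             for k in range(num_segments + 1)]
    return table, step

def approx_h(s, table, step):
    """Linear interpolation between neighboring table entries."""
    pos = min(max(s / step, 0.0), float(len(table) - 1))
    k = min(int(pos), len(table) - 2)
    frac = pos - k
    return table[k] + frac * (table[k + 1] - table[k])

# Hypothetical values of one coordinate across three support vectors.
values = [0.2, 0.4, 0.6]
coeffs = [1.0, 0.25, -0.5]   # coeffs[l] stands for alpha_l * y_l

table, step = build_table(values, coeffs, chi2_term, num_segments=100)
worst_error = max(
    abs(approx_h(q / 1000.0, table, step)
        - exact_h(q / 1000.0, values, coeffs, chi2_term))
    for q in range(1001)
)
print(worst_error)
```

Unlike the intersection kernel case there is no exact logarithmic-time evaluation to fall back on here, so the table is an approximation whose accuracy is set by the number of segments.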

Conclusions

• Exactly evaluate IKSVM in O(n log m) as opposed to O(nm)

• Makes SV cascade or other ordering schemes irrelevant for intersection kernel

• Verified that IKSVM offers classification performance advantages over linear

• Approximate decision functions that decompose to a sum of functions for each coordinate (including Chi squared)

• Directly learn such classification functions (no SVM machinery)

• Generalized linear SVM beats linear SVM in some applications, often as good as more expensive RBF kernels

• Showed that relatively simple features with IKSVM beat Dalal & Triggs (linear SVM), leading to the state of the art in pedestrian detection

• Applies to best Caltech 256, Pascal VOC 2007 methods

Classification Using Intersection Kernel Support Vector Machines is efficient. Subhransu Maji, Alexander C. Berg and Jitendra Malik. Proceedings of CVPR 2008, Anchorage, Alaska, June 2008.

Software and more results available at

http://www.cs.berkeley.edu/~smaji/projects/fiksvm/
