SVMs for (x) Recognition (From Moghaddam / Yang’s “Gender Classification with SVMs”) Brian Whitman.

Post on 14-Jan-2016



“Commodity Intelligence”
  ‘Wow factor’ important
  Collaborative filtering
  ‘Simple’ tasks sometimes the most useful
  An SVM embedded evaluator…
  Cameras with ‘common sense’

Why SVM for feature detection?
  Quick evaluation model
  Machines (SVs) are easily stored and small
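To illustrate why evaluation is quick and the stored machine is small, here is a minimal sketch of how a trained kernel SVM scores one input: a weighted sum of kernel values against the stored support vectors, plus a bias. The slides do not name the kernel; the Gaussian (RBF) kernel is an assumption suggested by the later mention of "kernel width". The toy support vectors, coefficients, bias and `gamma` are made up for illustration.

```python
import math

def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)

def decision_value(x, support_vectors, coeffs, bias, gamma):
    """SVM output f(x) = sum_i coeffs[i] * K(sv_i, x) + bias.

    coeffs[i] stands for alpha_i * y_i of support vector i.
    """
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, coeffs)) + bias

# Hypothetical toy model: two 2-D support vectors, one per class.
toy_svs = [[0.9, 0.8], [0.1, 0.2]]
toy_coeffs = [1.0, -1.0]
toy_bias = 0.0
toy_gamma = 2.0  # assumed kernel width parameter

score = decision_value([0.8, 0.9], toy_svs, toy_coeffs, toy_bias, toy_gamma)
```

Only the support vectors, their coefficients, the bias and the kernel parameter need to be stored, and the cost of one evaluation grows with the number of support vectors rather than with the size of the training set.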

Experiment: Gender ID
  Using MITFaces dataset
  ~7500 faces with varying genders, races, ages, expressions, ‘extras’
  All aligned 160x160 with left eye at 80,80
  Face content is usually only 80x40

MITFaces examples

Representation?
  Simple pixel values
  Why?
  Sample size
  Maintain ‘ground rule’ of ML: Dimensions < Examples*2
  At 3200 dims (80x40), this is hard
  Training parameters (maximum Lagrangians, kernel width) help
  We use 80x40 and 40x20 in our examples
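The dimension counts above can be checked with a short sketch: an 80x40 window flattens to 3200 values, and halving each side gives 40x20, or 800 values. The 2x2 block averaging used here is an assumption, since the slides do not say how the smaller window was produced. The window is read as 80 columns by 40 rows, which is also an assumption, and the checkerboard image is a synthetic stand-in for a face.

```python
def flatten(window):
    """Row-major flattening of a 2-D list of pixel values into one vector."""
    return [p for row in window for p in row]

def downsample_2x(window):
    """Halve both dimensions by averaging each 2x2 block of pixels."""
    rows, cols = len(window), len(window[0])
    return [[(window[r][c] + window[r][c + 1] +
              window[r + 1][c] + window[r + 1][c + 1]) / 4.0
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]

# Synthetic stand-in for one face window, pixels already scaled to 0..1:
# 40 rows of 80 columns in a checkerboard pattern.
window = [[((r + c) % 2) * 1.0 for c in range(80)] for r in range(40)]

full_vec = flatten(window)                   # 80x40 representation
small_vec = flatten(downsample_2x(window))   # 40x20 representation
```

Going from 80x40 to 40x20 cuts the input dimension by a factor of four while the number of training examples stays the same.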

Training stage
  Choose 3200 random adult faces for training and 3200 random faces for testing
  Extract 80x40 ‘face window’ from each face and treat the 3200 doubles (0..1) as a training example
  Train SVM on pixel values of the train set (dual P4 Xeon, Linux, 2 GHz: 30 minutes)
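The random selection of training and test faces can be sketched as below. This is only an illustration under stated assumptions: the dataset size of 7500 is the approximate figure from the slides, the two sets are assumed to be disjoint, the seed is arbitrary, and the restriction to adult faces is left out because the label format is not given. `split_indices` is a hypothetical helper name.

```python
import random

def split_indices(n_faces, n_train, n_test, seed=0):
    """Pick disjoint random train and test index lists from range(n_faces)."""
    if n_train + n_test > n_faces:
        raise ValueError("not enough faces for disjoint train and test sets")
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    chosen = rng.sample(range(n_faces), n_train + n_test)
    return chosen[:n_train], chosen[n_train:]

# Assumed sizes: ~7500 faces in total, 3200 for training, 3200 for testing.
train_idx, test_idx = split_indices(n_faces=7500, n_train=3200, n_test=3200)
```

Drawing both sets in a single `sample` call guarantees that no face appears in both.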

Testing Stage
  Take the other 3200 face vectors and present them to the learned SVM
  If class > 0, male; if < 0, female
  Confidence: some linear combination of # of support vectors and magnitude of result
  Had no problem doing this at 10 Hz on a PIII800 with tons running
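The sign rule and the per-class scoring used in the results that follow can be sketched as below. The scores and labels are made up for illustration, and the handling of an output of exactly 0 is an assumption, since the slides only cover > 0 and < 0. The confidence measure is not sketched because its weights are not given.

```python
def predict_gender(score):
    """Map an SVM output to a label: positive means male, negative female."""
    if score > 0:
        return "male"
    if score < 0:
        return "female"
    return "undecided"  # exactly 0 is not covered by the slides

def per_class_accuracy(scores, true_labels):
    """Fraction of correct predictions for each true label."""
    correct, total = {}, {}
    for score, truth in zip(scores, true_labels):
        total[truth] = total.get(truth, 0) + 1
        if predict_gender(score) == truth:
            correct[truth] = correct.get(truth, 0) + 1
    return {label: correct.get(label, 0) / total[label] for label in total}

# Made-up SVM outputs and ground-truth labels, for illustration only.
scores = [1.7, 0.2, -0.4, -2.1, 0.3, -0.9]
labels = ["male", "male", "male", "female", "female", "female"]
accuracy = per_class_accuracy(scores, labels)
```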

In-class face gender results
  80x40; C=100, aux=100: 93% of faces classified correctly (95% male, 90% female)
  40x20; C=100, aux=10: 97% of faces classified correctly (98% male, 95% female)

Next step: Realtime
  Media Lab is where webcams go to die
  Webcam at 160x120, ‘face region’ to 80x40, downsampled to 40x20
  Webcam gets frames at 10 Hz; we greyscale each frame and present it to the previously trained SVM
  Results… mixed
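The per-frame preprocessing described above (greyscale, take the face region, downsample) might look like the sketch below. Several details are assumptions, not taken from the slides: the Rec. 601 luma weights for greyscale conversion, a face region fixed at the centre of the frame, sizes read as width x height, and 2x2 block averaging for the downsampling. The uniform grey frame is synthetic.

```python
def to_grey(frame_rgb):
    """Convert rows of (r, g, b) tuples in 0..255 to grey values in 0..1."""
    return [[(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for (r, g, b) in row]
            for row in frame_rgb]

def crop(image, top, left, height, width):
    """Cut a height x width rectangle whose corner is at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

def downsample_2x(image):
    """Halve both dimensions by averaging each 2x2 block of pixels."""
    rows, cols = len(image), len(image[0])
    return [[(image[r][c] + image[r][c + 1] +
              image[r + 1][c] + image[r + 1][c + 1]) / 4.0
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]

def frame_to_vector(frame_rgb, top=40, left=40, height=40, width=80):
    """160x120 colour frame -> 80x40 face region -> 40x20 -> 800 values.

    The default offsets centre the region in the frame (an assumption).
    """
    face = crop(to_grey(frame_rgb), top, left, height, width)
    return [p for row in downsample_2x(face) for p in row]

# Synthetic uniform mid-grey frame: 120 rows of 160 pixels.
frame = [[(128, 128, 128)] * 160 for _ in range(120)]
vec = frame_to_vector(frame)
```

A fixed crop only works if the subject happens to sit in the right place, which is one plausible reason realtime results would be weaker than on the aligned dataset.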

Realtime examples (If the demo crashes)

‘Creepybot’
  With better control over alignment
  Monitors Windows clipboard
  Same architecture as the Creepycam

Creepybot Examples (If the demo crashes)

Other parameters
  MITFaces has a great data label set
  Train an SVM for appearance of each descriptor:
  Race
  Age
  Gender
  Expression
  Moustache

Per-class results (40x20, etc…)
  “Adult or not”
  Overall: 94%
  Not adult: 403/516 (78%)
  Adult: 2605/2684 (97%)

Per-class results…
  “Smiling or not”
  Overall: 88%
  Not smiling: 1354/1520 (89%)
  Smiling: 1450/1672 (87%)

Per-class results
  “Serious or not”
  Overall: 88%
  Not serious: 1517/1712 (89%)
  Serious: 1311/1484 (88%)
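The overall figures follow from the per-class counts: add up the correct counts and divide by the total number of test faces. The sketch below redoes that arithmetic with the counts from the slides; the dictionary layout and function names are illustrative.

```python
# (correct, total) counts copied from the three per-class result slides.
results = {
    "adult": {"negative": (403, 516), "positive": (2605, 2684)},
    "smiling": {"negative": (1354, 1520), "positive": (1450, 1672)},
    "serious": {"negative": (1517, 1712), "positive": (1311, 1484)},
}

def percent(correct, total):
    """Accuracy as a percentage."""
    return 100.0 * correct / total

def overall(counts):
    """Pooled accuracy over both classes of one descriptor."""
    correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    return percent(correct, total)

summary = {name: round(overall(counts)) for name, counts in results.items()}
```

For "Adult or not" this is (403 + 2605) / (516 + 2684) = 3008/3200 = 94%. The totals for the other two descriptors come to 3192 and 3196 rather than 3200.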

Could we do better?
  Representation is lacking
  But results are surprisingly good
  For realtime, need auto-alignment / rescaling, or a better representation
  Could this lead to an invasion of cheap intelligent cameras, each with tacky switches for feature detection and marketing?
