Unsupervised Joint Alignment of Complex Images. Gary B Huang, Vidit Jain, Erik Learned-Miller.
Post on 28-Mar-2015
1/12
Unsupervised Joint Alignment of Complex Images
Gary B Huang, Vidit Jain, Erik Learned-Miller
2/12
Joint Face Alignment
- The Recognition Pipeline
  - Most systems ignore the middle stage (alignment), relying on the initial detector to do a rough alignment
  - Alignment reduces variability and allows for conditioning on spatial position and analysis of structure
- Two major drawbacks to current alignment methods
  - Designed for a single class
  - Require manual labeling of either specific features or pose
    - More involved than simple discrete labels for detection and recognition
    - AAM: ~80 landmarks for >100 training images
- Unsupervised method with congealing
  - No manually selected landmarks or hand-selected parts
  - No image explicitly labeled as canonical pose
  - End result entirely determined by data
3/12
Congealing Intuition
- Intra-class images have similar structure and shape
  - Thus, low variability of pixel values at a specific location
- Distribution Field
  - Distribution over alphabet ({0,1} for binary images) at each pixel
  - Set of images defines an empirical distribution field
- Congealing alternates between two steps
  - Update distribution field from transformed images
  - Increase likelihood of image with respect to distribution field
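The two alternating steps can be sketched on toy binary images. This is a minimal illustration under stated assumptions, not the authors' implementation: the only transformation allowed here is a cyclic horizontal shift (congealing is normally described with richer transformations such as affine warps), and the function names, the smoothing constant `eps`, and the candidate step set are all hypothetical.

```python
import math

def shift(img, dx):
    """Cyclically shift every row of a binary image right by dx pixels."""
    w = len(img[0])
    return [[row[(x - dx) % w] for x in range(w)] for row in img]

def distribution_field(images, eps=0.5):
    """Per-pixel probability of value 1, smoothed so it is never exactly 0 or 1."""
    n = len(images)
    h, w = len(images[0]), len(images[0][0])
    return [[(sum(im[y][x] for im in images) + eps) / (n + 2 * eps)
             for x in range(w)] for y in range(h)]

def log_likelihood(img, field):
    """Log-probability of a binary image under independent per-pixel distributions."""
    return sum(math.log(p if v else 1.0 - p)
               for row, prow in zip(img, field) for v, p in zip(row, prow))

def field_entropy(field):
    """Sum of the per-pixel binary entropies, in bits."""
    return -sum(p * math.log2(p) + (1 - p) * math.log2(1 - p)
                for row in field for p in row)

def congeal(images, iterations=5, steps=(0, -1, 1)):
    """Alternate between re-estimating the field and nudging each image.

    Returns the final shift of each image and the field saved at every iteration.
    Listing 0 first in `steps` means an image stays put when candidates tie.
    """
    shifts = [0] * len(images)
    fields = []
    for _ in range(iterations):
        # Step 1: update the distribution field from the transformed images.
        field = distribution_field([shift(im, s) for im, s in zip(images, shifts)])
        fields.append(field)
        # Step 2: move each image one step toward higher likelihood under the field.
        for i, im in enumerate(images):
            best = max(steps,
                       key=lambda d: log_likelihood(shift(im, shifts[i] + d), field))
            shifts[i] += best
    return shifts, fields
```

On a handful of images that each contain a one-pixel-wide vertical bar in slightly different columns, the shifts pull the bars onto a common column and the summed entropy of the field drops from the first iteration to the last.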
4/12
Congealing
- How to align a new image after congealing?
  - Insert into training set, re-run algorithm
  - More efficient to save sequence of distribution fields from congealing
    - High entropy to low entropy sequence
    - “Image Funnel”
- Funneling: increase likelihood of new image at each iteration according to corresponding distribution field
[Figure labels: New Image, Image Funnel, Aligned Image]
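Funneling can be sketched as replaying the saved distribution fields, one per congealing iteration, against the new image. The sketch below is an illustration only: it assumes binary images and cyclic horizontal shifts as the sole transformation, and the names `funnel`, `shift`, `log_likelihood`, and the hand-written field values in the usage lines are hypothetical.

```python
import math

def shift(img, dx):
    """Cyclically shift every row of a binary image right by dx pixels."""
    w = len(img[0])
    return [[row[(x - dx) % w] for x in range(w)] for row in img]

def log_likelihood(img, field):
    """Log-probability of a binary image under independent per-pixel distributions."""
    return sum(math.log(p if v else 1.0 - p)
               for row, prow in zip(img, field) for v, p in zip(row, prow))

def funnel(img, fields, steps=(0, -1, 1)):
    """Align a new image by visiting the saved fields in their original order.

    At each field the image takes the single step that most increases its
    likelihood; 0 is listed first so the image stays put on ties.
    """
    total = 0
    for field in fields:
        total += max(steps,
                     key=lambda d: log_likelihood(shift(img, total + d), field))
    return total, shift(img, total)

# A made-up funnel for 1x7 images: broad (high entropy) to sharp (low entropy).
fields = [
    [[0.05, 0.20, 0.40, 0.60, 0.40, 0.20, 0.05]],
    [[0.02, 0.05, 0.30, 0.90, 0.30, 0.05, 0.02]],
    [[0.01, 0.01, 0.05, 0.95, 0.05, 0.01, 0.01]],
]
new_image = [[0, 1, 0, 0, 0, 0, 0]]
total_shift, aligned = funnel(new_image, fields)
```

No field is re-estimated here, which is why funneling is cheaper than inserting the image into the training set and re-running congealing.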
5/12
Congealing Complex Images
- Congealing has proven to work well on certain object classes
  - Traditionally applied directly to pixel values
  - Applied successfully to binary handwritten digits and MRI volumes
- Our goal: Extend congealing to deal with noise in real world images
  - Complex and variable lighting effects
  - Occlusions
  - Highly varied foreground objects (hair, hats, glasses…)
  - Highly varied backgrounds
6/12
Congealing Complex Images
- Extending Congealing to Complex Images
  - Traditionally congealing is done on pixel intensities
    - High variation due to lighting and variable foreground
    - High entropy even when correctly aligned
  - Congealing on edge values
    - No “basin of attraction”, plateaus in optimization landscape
  - Integrate over window
    - SIFT descriptor at each pixel
    - Each descriptor is 32 dimensional vector, too large to estimate entropy
7/12
Congealing Complex Images
- Extending Congealing to Face Images (cont)
  - Cluster SIFT descriptors using k-means
  - Congealing on hard assignments forces pixels to take a relatively small number of values
    - Similar local minima problems as with edge values
    - Initial experiments with hard assignments led to congealing terminating early with no significant changes from initial alignment
  - Use soft assignment of pixels to clusters
    - Each pixel is a multinomial distribution, with probabilities equal to probability of belonging to each cluster
    - Does not change nature of distribution field
      - Distribution field is still a set of distributions, one at each pixel, over the possible clusters
    - Analogy with grayscale using binary alphabet
      - Gray pixels are treated as mixtures of underlying black and white “subpixels”
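The soft assignment step can be sketched as follows. The slides do not say how the cluster probabilities are computed, so this example assumes equal-weight isotropic Gaussians with a shared width `sigma` centred on the cluster centres; that model and the names `soft_assign`, `pixel_distribution`, and `entropy` are all hypothetical. In the grayscale analogy, a gray pixel of brightness 0.3 plays the same role as the distribution [0.7, 0.3] over {black, white}.

```python
import math

def soft_assign(descriptor, centers, sigma=1.0):
    """Posterior over clusters for one descriptor.

    Assumes equal-weight isotropic Gaussians of width sigma (an assumption,
    not taken from the slides). The smallest squared distance is subtracted
    before exponentiating to avoid underflow.
    """
    sq = [sum((d - c) ** 2 for d, c in zip(descriptor, ctr)) for ctr in centers]
    m = min(sq)
    weights = [math.exp(-(s - m) / (2.0 * sigma ** 2)) for s in sq]
    z = sum(weights)
    return [w / z for w in weights]

def pixel_distribution(posteriors):
    """One entry of the distribution field: the average of the soft
    assignments observed at the same pixel across all images."""
    n = len(posteriors)
    return [sum(p[k] for p in posteriors) / n for k in range(len(posteriors[0]))]

def entropy(dist):
    """Entropy in bits of a distribution over clusters."""
    return -sum(p * math.log2(p) for p in dist if p > 0)
```

With soft assignments a small change in alignment changes the per-pixel distribution smoothly, whereas a hard assignment can only jump between cluster labels.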
8/12
Congealing Complex Images
[Figure labels: Window around pixel, SIFT vector and clusters, Posterior distribution]
9/12
Results (faces)
- Congealed with 300 images from “Faces in the Wild”
  - Realistic data set of news photos with different people, complex backgrounds, variable illumination and foreground appearance
10/12
Results (cars)
- Congealed with 125 rear car images (variable background/lighting)
- Achieved with no labeling and no changes to code
11/12
Results on Recognition
- Tested effect on recognition
  - Used trained hyper-feature based recognizer (Jain et al)
  - Tested using outputs of Viola-Jones, Zhou (supervised), and funneling
- Congealing improves recognition with no added supervision
Alignment       AUC
Unaligned       0.6870
Zhou aligned    0.7312
Congealing      0.7549
12/12
Future Work
- Two-tier alignment process
  - Score alignment results based on likelihood under final distribution field, align low scoring images in separate stage
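The scoring step of the proposed two-tier process could look roughly like the sketch below. Since this is future work, everything here is an assumption: a binary distribution field is used for simplicity, the cut-off is expressed as a hypothetical `fraction` of the lowest-scoring images, and the function names are illustrative. How the selected images would then be realigned is not specified in the slides and is not shown.

```python
import math

def log_likelihood(img, field):
    """Log-probability of a binary image under independent per-pixel distributions."""
    return sum(math.log(p if v else 1.0 - p)
               for row, prow in zip(img, field) for v, p in zip(row, prow))

def low_scoring(images, final_field, fraction=0.25):
    """Score every aligned image under the final distribution field and
    return the indices of the lowest-scoring fraction, plus all scores."""
    scores = [log_likelihood(im, final_field) for im in images]
    k = max(1, int(len(images) * fraction))
    order = sorted(range(len(images)), key=lambda i: scores[i])
    return order[:k], scores

# Made-up 1x3 example: the final field expects a bright first pixel.
final_field = [[0.9, 0.1, 0.1]]
aligned = [[[1, 0, 0]], [[0, 1, 0]], [[1, 0, 0]], [[1, 0, 0]]]
second_stage, scores = low_scoring(aligned, final_field)
```

Here the second image disagrees with the field, so it is the one that would be handed to a separate alignment stage.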