Learning Local Affine Representations for Texture and Object Recognition
Svetlana Lazebnik
Beckman Institute, University of Illinois at Urbana-Champaign
(joint work with Cordelia Schmid, Jean Ponce)
Overview
• Goal:
– Recognition of 3D textured surfaces, object classes
• Our contribution:
 – Texture and object representations based on local affine regions
• Advantages of proposed approach:
 – Distinctive, repeatable primitives
 – Robustness to clutter and occlusion
 – Ability to approximate 3D geometric transformations
The Scope
1. Recognition of single-texture images (CVPR 2003)
2. Recognition of individual texture regions in multi-texture images (ICCV 2003)
3. Recognition of object classes (BMVC 2004, work in progress)
1. Recognition of Single-Texture Images
Affine Region Detectors
• Harris detector (H)
• Laplacian detector (L)
Rotation-Invariant Descriptors 1: Spin Images
• Based on range spin images (Johnson & Hebert 1998)
• Two-dimensional histogram: distance from center × intensity value
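The spin-image idea above can be sketched in a few lines of NumPy. This is a simplified illustration only, assuming a square grayscale patch with intensities already normalized to [0, 1]; the actual descriptor also operates on normalized affine-adapted regions and uses soft binning:

```python
import numpy as np

def spin_image(patch, n_dist_bins=10, n_int_bins=10):
    """Intensity-domain spin image: a 2D histogram over
    (distance from the patch center, intensity value)."""
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(ys - cy, xs - cx)
    radius = min(cy, cx)
    inside = dist <= radius  # use only the inscribed circle
    hist, _, _ = np.histogram2d(
        dist[inside] / radius,        # normalized distance in [0, 1]
        patch[inside].astype(float),  # intensity values in [0, 1]
        bins=(n_dist_bins, n_int_bins),
        range=((0, 1), (0, 1)))
    return hist / hist.sum()          # normalize to a distribution
```

Because the bins depend only on radial distance and intensity, the descriptor is invariant to rotation of the patch.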
Rotation-Invariant Descriptors 2: RIFT
• Based on SIFT (Lowe 1999)
• Two-dimensional histogram: distance from center × gradient orientation
• Gradient orientation is measured w.r.t. the direction pointing outward from the center of the patch
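A RIFT-style descriptor can be sketched similarly. The exact binning and weighting conventions here (gradient-magnitude weighting, 4 distance × 8 orientation bins) are illustrative assumptions, not the authors' precise settings:

```python
import numpy as np

def rift(patch, n_dist_bins=4, n_orient_bins=8):
    """RIFT-style descriptor: a 2D histogram of (distance from center,
    gradient orientation relative to the outward radial direction)."""
    gy, gx = np.gradient(patch.astype(float))
    h, w = patch.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - cy, xs - cx
    dist = np.hypot(dy, dx)
    radius = min(cy, cx)
    inside = (dist <= radius) & (dist > 0)
    # measuring orientation relative to the outward radial direction
    # makes the descriptor invariant to rotation of the patch
    rel_angle = (np.arctan2(gy, gx) - np.arctan2(dy, dx)) % (2 * np.pi)
    mag = np.hypot(gx, gy)
    hist, _, _ = np.histogram2d(
        dist[inside] / radius, rel_angle[inside],
        bins=(n_dist_bins, n_orient_bins),
        range=((0, 1), (0, 2 * np.pi)),
        weights=mag[inside])
    s = hist.sum()
    return hist / s if s > 0 else hist
```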
Signatures and EMD
• Signatures: S = {(m1, w1), …, (mk, wk)}
 – mi: cluster center
 – wi: relative weight
• Earth Mover’s Distance (Rubner et al. 1998)
 – Computed from ground distances d(mi, m'j)
 – Can compare signatures of different sizes
 – Insensitive to the number of clusters
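The EMD between two signatures is the solution of a small transportation problem, so it can be sketched as a linear program. A minimal illustration using SciPy, assuming Euclidean ground distance between cluster centers and weights normalized so each signature sums to one:

```python
import numpy as np
from scipy.optimize import linprog

def emd(centers1, weights1, centers2, weights2):
    """Earth Mover's Distance between two signatures via linear programming."""
    w1 = np.asarray(weights1, float); w1 = w1 / w1.sum()
    w2 = np.asarray(weights2, float); w2 = w2 / w2.sum()
    c1, c2 = np.atleast_2d(centers1), np.atleast_2d(centers2)
    k, l = len(w1), len(w2)
    # ground distances d(mi, m'j): Euclidean between cluster centers
    D = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=2)
    # flow variables f_ij, flattened row-major; marginal constraints
    A_eq = np.zeros((k + l, k * l))
    for i in range(k):
        A_eq[i, i * l:(i + 1) * l] = 1   # total flow out of cluster i
    for j in range(l):
        A_eq[k + j, j::l] = 1            # total flow into cluster j
    b_eq = np.concatenate([w1, w2])
    res = linprog(D.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return float(res.fun)
```

Note that the two signatures may have different numbers of clusters (k ≠ l), which is exactly the flexibility the slide highlights.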
Comparison with the VZ approach:
• Affine adaptation: local affine regions (ours) vs. none, with fixed descriptor support (VZ)
• Descriptors: spin images and RIFT (ours) vs. raw pixel values (VZ)
• Textons: a separate set of textons for each image (ours) vs. a universal texton dictionary (VZ)
• Representing/comparing texton distributions: signatures with EMD (ours) vs. histograms with the chi-squared distance (VZ)
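For reference, the chi-squared histogram distance used in the histogram-based alternative is a one-liner; the small `eps` guarding against empty bins is an implementation convenience, not part of the definition:

```python
import numpy as np

def chi_squared_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two normalized histograms."""
    h1 = np.asarray(h1, float)
    h2 = np.asarray(h2, float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))
```

Unlike EMD, this distance requires both histograms to share the same binning, which is why it pairs naturally with a universal texton dictionary.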
Results of Evaluation: Classification Rate vs. Number of Training Samples
• Conclusion: an intrinsically invariant representation is necessary to deal with intra-class variations when they are not adequately represented in the training set
Methods compared: (H+L)(S+R), VZ-Joint, VZ-MRF
Summary
• A sparse texture representation based on local affine regions
• Two novel descriptors (spin images, RIFT)
• Successful recognition in the presence of viewpoint changes, non-rigidity, non-homogeneity
• A flexible approach to invariance
2. Recognition of Individual Regions in Multi-Texture Images
• A two-layer architecture: local appearance + neighborhood relations
• Learning:
 – Represent the local appearance of each texture class using a mixture-of-Gaussians model
 – Compute co-occurrence statistics of sub-class labels over affinely adapted neighborhoods
• Recognition:
 – Obtain initial class membership probabilities from the generative model
 – Use relaxation to refine these probabilities
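The recognition initialization above, obtaining p(c | x) from the generative model, can be illustrated with Bayes' rule in plain NumPy. This toy sketch assumes isotropic Gaussian components; the `class_models` structure and parameter values are hypothetical, for illustration only:

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    # isotropic Gaussian density in d dimensions
    d = x.shape[-1]
    diff = x - mean
    return np.exp(-0.5 * np.sum(diff ** 2, -1) / var) / ((2 * np.pi * var) ** (d / 2))

def class_posteriors(x, class_models, priors):
    """p(c | x) for each texture class, where each class is modeled as a
    mixture of Gaussians: class_models[c] = list of (weight, mean, var)."""
    likelihoods = np.array([
        sum(w * gaussian_pdf(x, m, v) for w, m, v in comps)
        for comps in class_models])
    joint = likelihoods * np.asarray(priors)
    return joint / joint.sum()   # Bayes' rule
```

These posteriors are what the relaxation step then refines using the neighborhood statistics.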
Two Learning Scenarios
• Fully supervised: every region in the training image is labeled with its texture class (e.g., “brick”)
• Weakly supervised: each training image is labeled only with the set of classes occurring in it (e.g., “brick, marble, carpet”)
Neighborhood Statistics
• Neighborhood definition: affinely adapted neighborhood of each region
• Estimate:
 – co-occurrence probability p(c, c')
 – correlation r(c, c')
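One simple way to estimate these statistics is to count label co-occurrences over neighbor pairs and normalize. The indicator-variable correlation below is an assumed concrete form for r(c, c'), chosen so that values lie in [-1, 1]; the authors' exact estimator may differ:

```python
import numpy as np

def label_correlations(labels, neighbors, n_labels):
    """Estimate co-occurrence probabilities p(c, c') over neighboring regions
    and derive correlations r(c, c') in [-1, 1]."""
    counts = np.zeros((n_labels, n_labels))
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            counts[labels[i], labels[j]] += 1
    p_joint = counts / counts.sum()
    p_row = p_joint.sum(axis=1)   # marginal over the first label
    p_col = p_joint.sum(axis=0)   # marginal over the second label
    # correlation of label indicators: covariance over product of stds
    cov = p_joint - np.outer(p_row, p_col)
    std = np.sqrt(np.outer(p_row * (1 - p_row), p_col * (1 - p_col)))
    return p_joint, np.clip(cov / np.maximum(std, 1e-12), -1.0, 1.0)
```

Positive r(c, c') means labels c and c' tend to co-occur in a neighborhood more often than chance; negative values mean they repel.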
Relaxation (Rosenfeld et al. 1976)
• Iterative process:
 – Initialized with posterior probabilities p(c|xi) obtained from the generative model
 – For each region i and each sub-class label c, update the probability pi(c) based on neighbor probabilities pj(c') and correlations r(c,c')
• Shortcomings:
 – No formal guarantee of convergence
 – After the initialization, the updates to the probability values do not depend on the image data
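The iterative update can be sketched with the classic Rosenfeld-style rule, pi(c) ← pi(c)(1 + qi(c)) followed by renormalization, where qi(c) averages neighbor support Σc' r(c,c') pj(c'). The uniform neighbor weighting here is a simplifying assumption:

```python
import numpy as np

def relaxation_labeling(p, r, neighbors, n_iters=10):
    """Relaxation labeling: p is (n_regions, n_labels) initial probabilities,
    r is an (n_labels, n_labels) correlation matrix with entries in [-1, 1],
    neighbors[i] lists the neighbor indices of region i."""
    p = p.copy()
    for _ in range(n_iters):
        q = np.zeros_like(p)
        for i, nbrs in enumerate(neighbors):
            if nbrs:
                # average support from neighbors' current label probabilities
                q[i] = (p[list(nbrs)] @ r.T).mean(axis=0)
        p = p * (1 + q)                       # q in [-1, 1] keeps p nonnegative
        p /= p.sum(axis=1, keepdims=True)     # renormalize per region
    return p
```

Consistent with the shortcomings listed above, note that after initialization the updates use only p, r, and the neighborhood graph, never the image data itself.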
Experiment 1: 3D Textured Surfaces
Single-texture images
Objects without Characteristic Texture
(LeCun’04)
Summary of Talk
1. Recognition of single-texture images
 • Distribution of local appearance descriptors
2. Recognition of individual regions in multi-texture images
 • Local appearance + loose statistical neighborhood relations
3. Recognition of object categories
 • Local appearance + strong geometric relations
For more information: http://www-cvr.ai.uiuc.edu/ponce_grp
Issues, Extensions
• Weakly supervised learning
 – Evaluation methods?
 – Learning from contaminated data?
• Probabilistic vs. geometric approaches to invariance
• EM vs. direct correspondence search
• Training set size
• Background modeling
• Strengthening the representation
 – Heterogeneous local features
 – Automatic feature selection
 – Inter-part relations