Page 1:

Feature Selection for Image Retrieval and Object Recognition

Nuno Vasconcelos et al.
Statistical Visual Computing Lab

ECE, UCSD

Presented by Dashan Gao

Page 2:

Scalable Discriminant Feature Selection for Image Retrieval and Recognition. N. Vasconcelos and M. Vasconcelos. To appear in IEEE CVPR, 2004.

Feature Selection by Maximum Marginal Diversity: Optimality and Implications for Visual Recognition. N. Vasconcelos. Proceedings of IEEE CVPR, 2003.

Feature Selection by Maximum Marginal Diversity. N. Vasconcelos. Proceedings of Neural Information Processing Systems, 2002.

Page 3:

Overview (1)

Image retrieval is a large-scale classification problem:

A large number of classes, large amounts of data per class.

A discriminant feature space (of small dimensionality) is a prerequisite for success.

Feature Selection (FS) makes learning easier and tractable in a lower-dimensional feature space X.

Goal: find a transformation T, constrained to be a subset projection.

Find the projection matrix T that optimizes a criterion for “feature goodness”.
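Concretely (the notation here is illustrative, not from the slides), a subset projection keeps d of the n original coordinates, so each row of T is a canonical basis vector:

\[
z = T x, \qquad
T = \begin{pmatrix} e_{i_1}^{\top} \\ \vdots \\ e_{i_d}^{\top} \end{pmatrix} \in \{0,1\}^{d \times n}, \qquad d \ll n,
\]

so that z = (x_{i_1}, ..., x_{i_d}) simply selects a subset of the features.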

Page 4:

Overview (2)

Weaknesses of traditional methods:

- Based on sub-optimal criteria: variance maximization (principal component analysis, PCA).
- Lack of scalability: they take infeasible time to compute.
- Difficult to extend to multi-class problems (boosting).

Ultimate goal: minimize probability of error (MPE), i.e. search for the Bayes-error-optimal space of a given classification problem.

Achievable goal (in the discriminant sense): maximize the separation between the different classes to be recognized.

Page 5:

Information-theoretic feature selection (ITFS)

Infomax goal: maximize the mutual information between the selected features and the class labels.

Outline:

- Optimality properties (in the MPE and discriminant senses) (Contribution 1)
- Trade-off between optimality and complexity (Contribution 2)
- Algorithmic implementation with low complexity
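In symbols (with Z = TX the selected feature vector; notation assumed), the infomax criterion reads:

\[
T^{*} = \arg\max_{T} I(Y; Z), \qquad Z = T X .
\]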

Page 6:

Bayes Error (BE)

Advantage: BE depends only on the feature space, and is thus the ultimate discriminant measure for FS.

Disadvantage: the nonlinearity of the max(·) operation makes it hard to optimize.
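For reference, the textbook definition of the Bayes error, which makes the max(·) nonlinearity explicit:

\[
L^{*} = 1 - \mathbb{E}_{x}\!\left[\,\max_{i}\, P_{Y|X}(i \mid x)\right].
\]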

Page 7:

Infomax principle

[Figure: Venn diagram relating H(X), H(Y), H(X|Y), H(Y|X), and I(X;Y).]

H(·) is entropy; H(Y|X) is the conditional entropy of the class label given the features (class posterior entropy, CPE).

Since I(Y;X) = H(Y) - H(Y|X) and H(Y) does not depend on the features,

max I(Y;X) = min H(Y|X)

Page 8:

Infomax example

Two classes (M = 2), two features x1 and x2.

Note: Variance-based criteria (e.g. PCA) fail in this case!!

[Figure: class-conditional densities along x1 and x2.]

Page 9:

Infomax vs. BE

To show: Bayes error >= infomax.

Page 10:

Example

Important observations:

- The gradients of the two curves have the same sign everywhere they are defined.
- The extrema of the two sides are co-located.

Hence the LHS and RHS have the same optimization solution.


Page 11:

Infomax vs. BE

Bayes error >= infomax.

Page 12:

Infomax is optimal in the MPE sense! Infomax is a good approximation of BE, so the infomax solutions will be very similar to those of BE.

Example: M = 2.

[Figure: BE and CPE (H(Y|X)) as functions of µ.]

Page 13:

Discriminant form of infomax

The infomax goal is equivalent to maximizing the separation between the different classes.

Noting that I(X;Y) = sum_i p_Y(i) KL( p_{X|Y}(x|i) || p_X(x) ), i.e. mutual information is the expected KL divergence between the class-conditional densities and the marginal,

Theorem 3: the infomax solution maximizes the average distance, in the KL sense, between the class-conditional densities and the overall density, i.e. the separation between the classes.

Page 14:

Feature Selection (FS)

Forward sequential search for FS:

At each step, a set of features is added to the current best subset, with the goal of optimizing a cost function.

Denote the current subset by X, the added features by X', and the new subset by {X, X'}. By the chain rule of mutual information we can prove

I({X, X'}; Y) = I(X; Y) + I(X'; Y | X),

or

I({X, X'}; Y) - I(X; Y) = I(X'; Y | X).

Maximizing mutual information (infomax) is simpler than minimizing BE
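A minimal sketch of the forward sequential search loop (structure assumed, not the paper's code; `gain` is a placeholder for whichever criterion is optimized at each step, e.g. an estimate of I(X'; Y | X)):

```python
from typing import Callable, List

import numpy as np

def forward_search(X: np.ndarray, y: np.ndarray, d: int,
                   gain: Callable[[np.ndarray, np.ndarray, List[int], int], float]) -> List[int]:
    """Greedy forward sequential search: starting from the empty set,
    repeatedly add the single feature whose inclusion maximizes `gain`."""
    selected: List[int] = []
    remaining = list(range(X.shape[1]))
    for _ in range(d):
        # Evaluate the cost function for every candidate and pick the best.
        best = max(remaining, key=lambda j: gain(X, y, selected, j))
        selected.append(best)
        remaining.remove(best)
    return selected
```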

Page 15:

Proof: the decomposition follows from the chain rule of mutual information.

Page 16:

Feature Selection (cont’d)

There is a trade-off between the maximization of discriminant power and the minimization of redundancy.

Problem: infomax requires high-dimensional density estimates. We must find a trade-off between optimality and complexity.

I(X'; Y | X) = I(X'; Y) - I(X'; X) + I(X'; X | Y):

- the first term favors discriminant features;
- the second penalizes features that are redundant with previous ones;
- the third cancels the penalty when the redundancy provides information about Y.

Page 17:

Maximum Marginal Diversity (MMD)

Marginal diversity of a feature X_k: md(X_k) = I(X_k; Y), the average KL divergence between the class-conditional marginal p_{X_k|Y}(x|i) and the marginal p_{X_k}(x).

MMD-based FS, a naïve infomax: select the subset of features that leads to the set of maximally diverse marginal densities.

Optimality condition. Lemma: MMD is optimal if the average mutual information between the features is not affected by knowledge of the class label, i.e.

sum_k I(X_k; X_{1:k-1}) = sum_k I(X_k; X_{1:k-1} | Y).
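To illustrate why MMD is computationally light, a minimal sketch (assuming discrete class labels and histogram estimates; not the paper's code) that ranks features by a histogram estimate of md(X_k):

```python
import numpy as np

def marginal_diversity(x, y, bins=16):
    """Histogram estimate of md(X_k) = I(X_k; Y) = sum_c p(c) KL(p(x|c) || p(x))
    for one feature column x and integer class labels y."""
    edges = np.histogram_bin_edges(x, bins=bins)
    p_x, _ = np.histogram(x, bins=edges)
    p_x = p_x / p_x.sum()
    md = 0.0
    for c in np.unique(y):
        p_c = np.mean(y == c)                        # class prior p(c)
        p_xc, _ = np.histogram(x[y == c], bins=edges)
        p_xc = p_xc / p_xc.sum()
        nz = (p_xc > 0) & (p_x > 0)                  # avoid log(0)
        md += p_c * np.sum(p_xc[nz] * np.log(p_xc[nz] / p_x[nz]))
    return md

def mmd_select(X, y, d, bins=16):
    """Rank all features by marginal diversity and keep the top d.
    Only 1-D histograms are needed, so the cost is linear in the
    number of features."""
    scores = [marginal_diversity(X[:, k], y, bins) for k in range(X.shape[1])]
    return np.argsort(scores)[::-1][:d]
```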

Page 18:

The Naïve Bayes Classifier

Assumption: the features are conditionally independent given the class label (but not necessarily independent marginally).

However, the optimality condition for MMD does not hold under this assumption: conditional independence makes the class-conditional terms I(X_k; X_{1:k-1} | Y) vanish, while the marginal dependencies I(X_k; X_{1:k-1}) need not.

Features selected by MMD are not good for the naïve Bayes classifier!

Page 19:

MMD (continued)

Fortunately, recent studies show that, for image recognition problems, MMD is very close to the optimal solution for biologically plausible features, e.g. wavelet coefficients.

Advantage: computation is simple; only the marginal distribution of each feature is considered.

Disadvantages: the optimality condition can hardly be verified in practice, and there is no guarantee of optimality when it does not hold.

Page 20:

Image statistics

Feature dependencies tend to be localized across both space and image scale, e.g. for a standard wavelet decomposition:

- co-located coefficients of equal orientation can be arbitrarily dependent on the class;
- the average dependence between such sets of coefficients does not depend on the image class (strong vertical frequencies => weak horizontal frequencies).

This property is captured by a case more general than MMD, l-decomposability (formalized below):

- the feature set is decomposable into mutually exclusive subsets of l-th order;
- features within a subset may be arbitrarily dependent, with no constraints;
- the dependence between subsets does not depend on the image class.
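A hedged formalization of the three conditions above (notation assumed, not the paper's exact statement): the feature set X is l-decomposable if it can be partitioned into mutually exclusive subsets C_1, ..., C_m of order l such that

\[
I\big(C_i;\ \{C_j\}_{j \neq i}\big) = I\big(C_i;\ \{C_j\}_{j \neq i} \mid Y\big) \quad \text{for all } i,
\]

i.e. features within a subset may be arbitrarily dependent, while the dependence between subsets carries no information about the class.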

Page 21:

More general case

- All the features are grouped into a collection of disjoint subsets.
- The features within each subset are allowed to have arbitrary dependencies.
- The dependencies between the subsets are constrained to be non-informative.

Page 22:

A family of FS algorithms

l-decomposability

Page 23:

A family of FS Algorithms (cont'd)

Theorem: for an l-decomposable feature set, the optimal infomax FS solution only requires density estimates of dimension l + 1.

Page 24:

A family of FS Algorithms (cont’d)

Parameter l is a trade-off between optimality and complexity:

- l = 0: the MMD case; all the features depend in a non-informative way; sub-optimal but computationally efficient.
- l = n: all features may depend in informative ways; optimal but computationally unscalable.

Page 25:

Infomax-based FS Algorithm

Page 26:

Algorithm Complexity

Suppose C classes, F feature vectors per class, and histograms with b bins along each axis.
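As a rough accounting (an assumption, not the authors' exact formula): each order-(l+1) density estimate is a histogram with b^{l+1} cells filled from the C·F feature vectors, so evaluating one candidate feature costs roughly

\[
O\big(C F + C\, b^{\,l+1}\big),
\]

exponential in l but independent of the full dimension n, which is what makes small l scalable.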

Page 27:

Experiments on MMD (1)

A simple example (the optimal feature subsets are known): two Gaussian classes of identity covariance and known means, n = 20. Compare the “average feature selection quality” with Jain & Zongker's result (Mahalanobis distance).

[Figure: average feature selection quality vs. number of training samples, for MMD, branch and bound, and SFS; higher is better.]

“Feature selection quality”: the ratio between the number of correctly selected features and n.

In this example, the optimality condition of MMD is satisfied.

Page 28:

Experiments on MMD (2)

Brodatz texture-based classification: 112 texture classes, a 64-dimensional (8×8) feature space, and classifiers based on Gaussian mixtures.

[Figure: cumulative MD and classification accuracy vs. number of features.]

Page 29:

Experiments on MMD (3)

Image retrieval on the Brodatz texture database.

[Figure: MD and PRA vs. number of features. PRA: area under the precision/recall curve.]

Page 30:

Experiments on MMD (4)

Features as filters: projection of the textures onto the five most informative basis functions, which act as detectors of lines, corners, T-junctions, and so forth.

Page 31:

Experiments on infomax (1)

Image retrieval on the Corel image database (15 classes, 1500 images); different sizes of the feature clusters (l = 0, 1, 2).

[Figure: PRA vs. number of features for l = 0, 1, 2 and for the variance criterion.]

Main observations:

- ITFS can significantly outperform variance-based methods (10 vs. 30 features for equivalent PRA).
- For ITFS there is no noticeable gain for l > 1!

Page 32:

Experiments on infomax (2)

Different numbers of histogram bins.

[Figure: PRA vs. number of features for different numbers of histogram bins.]

Main observations:

- Infomax-based FS is quite insensitive to the quality of the density estimates (no noticeable variation above 8 bins per axis, small degradation at 4).
- It is always significantly better than variance.

Page 33:

Experiments on infomax (3)

Image retrieval results on Corel.

Page 34:

Conclusion

- Infomax-based feature selection is optimal in the MPE sense.
- An explicit understanding of the trade-off between optimality and complexity, and of the corresponding optimality condition implied by infomax (the most important contribution).
- A scalable infomax-based FS algorithm for image retrieval and recognition.

Future work:

- Evaluation of the optimality and efficiency of this infomax-based algorithm on other features (such as the rectangular features in Viola & Jones' face detector) and other classification problems.