SVM Active Learning with Application to Image Retrieval Simon Tong, Edward Chang, Proceedings of the ninth ACM international conference on Multimedia,

SVM Active Learning with Application to Image Retrieval

Simon Tong, Edward Chang

Proceedings of the ninth ACM international conference on Multimedia,

September 30-October 05, 2001

Outline

• Introduction

• SVM

• Version Space

• Active Learning

• Experiments

• Conclusion

Introduction- what is image retrieval?

• An image retrieval system is a computer system for browsing, searching and retrieving images from a large database of digital images.

• Most traditional methods of image retrieval utilize some method of adding metadata such as captioning, keywords, or descriptions to the images so that retrieval can be performed over the annotation words.

But as you know, users are lazy. So here is the question:

How can we automatically find the correct images for the user?

Introduction- relevance feedback

• Hand-labeling each image with descriptive words is time-consuming and costly.

• Thus, there is a need for a way to allow a user to implicitly inform a database of his or her desired output or query concept.

• Relevance feedback can be used as a query refinement scheme to derive or learn a user’s query concept.

• To solicit feedback, the refinement scheme displays a few image instances and the user labels each image as ‘relevant’ or ‘not relevant’.

Introduction- Active Learning

• Based on the answers, another set of images from the database is brought up to the user for labeling.

• The scheme described above is often called pool-based active learning.

[Figure: a pool of images; the user labels the displayed images as relevant or not]

Introduction- active learning

• The main issue with active learning is finding a way to choose informative images within the pool to ask the user to label.

• In general, and for the image retrieval task in particular, such a learner must meet two critical design goals.

– The learner must learn target concepts accurately.

– The learner must grasp a concept quickly, with only a small number of labeled instances, since most users do not wait around to provide a great deal of feedback.

The key idea with active learning is that it should choose its next pool-query based on the answers to previous pool-queries.

Introduction- proposed learner

• In this study, we propose using a support vector machine active learner (SVMact for short) to achieve our goals.

• The support vector machine active learner follows the three ideas below:

– SVMact regards the task of learning a target concept as one of learning an SVM binary classifier.

– SVMact learns the classifier quickly via active learning. The active part of SVMact selects the most informative instances with which to train the SVM classifier.

– Once the classifier is trained, SVMact returns the top-k most relevant images.

SVM- Linear Classifier

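[Figure: 2-D scatter plot of training points; one marker denotes +1, the other denotes -1]

f(x, w, b) = sign(w · x + b)

How would you classify this data?

[Figure labels: the line w · x + b = 0 is the boundary, with w · x + b > 0 on one side and w · x + b < 0 on the other]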
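As a minimal sketch of the decision rule f(x, w, b) = sign(w · x + b), the snippet below classifies two points with a hand-picked boundary. The names `dot` and `classify`, the example weights, and the choice to send points lying exactly on the boundary to -1 are assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def classify(x, w, b):
    """Return +1 if w.x + b > 0, otherwise -1.

    Points exactly on the boundary are sent to -1 here (an arbitrary choice).
    """
    return 1 if dot(w, x) + b > 0 else -1


# Hypothetical 2-D example: the boundary is the line x1 + x2 - 1 = 0.
w, b = [1.0, 1.0], -1.0
labels = [classify(x, w, b) for x in ([2.0, 2.0], [0.0, 0.0])]
```

Any (w, b) that puts every training point on its correct side is a valid linear classifier for the data, which is why the choice among them needs a further criterion.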




Any of these would be fine...

...but which is best?


[Figure: a candidate boundary with one point misclassified into the +1 class]


Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.



The maximum margin linear classifier is the linear classifier with the maximum margin.

This is the simplest kind of SVM, called a linear SVM (LSVM).

Support vectors are those datapoints that the margin pushes up against.

1. Maximizing the margin is good.

2. It implies that only support vectors are important; other training examples are ignorable.

3. Empirically it works very well.

SVM- SVM Mathematically

Distance from a data point x to the classifier:

$d(w, b; x) = \frac{|w \cdot x + b|}{\|w\|}$

Classifier hyperplane:

$f(x) = \frac{w \cdot \Phi(x)}{\|w\|}$
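The distance from a point to the hyperplane can be checked numerically. This is a small sketch, assuming the usual formula |w · x + b| / ||w||; the function name `distance_to_hyperplane` and the example numbers are made up for illustration.

```python
import math


def distance_to_hyperplane(x, w, b):
    """Euclidean distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


# Hypothetical example: w = (3, 4) has norm 5, and w.x + b = 25 - 5 = 20.
d = distance_to_hyperplane([3.0, 4.0], [3.0, 4.0], -5.0)
```

Dividing by ||w|| makes the value independent of how w and b are scaled, so (w, b) and (2w, 2b) give the same distance.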

SVM- SVM Mathematically

Margin:

$M(w, b) = \min_{x_i : y_i = -1} d(w, b; x_i) + \min_{x_i : y_i = 1} d(w, b; x_i) = \frac{2}{\|w\|}$

The SVM wants to maximize the margin, so this is the objective function.

Subject to:

$y_i (w \cdot x_i + b) \ge 1, \quad \forall i$

How to solve? Lagrange multipliers.
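For readers who want the Lagrange multiplier step spelled out, here is the standard textbook derivation for the hard-margin case. It is a sketch and is not taken from the slides; the multipliers $\alpha_i$ are notation introduced here.

```latex
% Equivalent primal problem: maximizing 2/||w|| is the same as minimizing ||w||^2 / 2
\min_{w, b} \; \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1, \;\; \forall i

% Lagrangian with multipliers \alpha_i \ge 0
L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \left[ y_i (w \cdot x_i + b) - 1 \right]

% Stationarity conditions
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0

% Dual problem
\max_{\alpha} \; \sum_i \alpha_i - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, (x_i \cdot x_j)
\quad \text{s.t.} \quad \alpha_i \ge 0, \;\; \sum_i \alpha_i y_i = 0
```

Only the training points with $\alpha_i > 0$ contribute to w; these are the support vectors. The data enters the dual only through inner products $x_i \cdot x_j$, which is what later allows a kernel to be substituted.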

SVM- Nonlinear dataset

• If the data is not linearly separable, what should we do?

That the data is not linearly separable in this space does not mean it is not linearly separable in another space.

$x \rightarrow \Phi(x)$
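A toy illustration of the idea, assuming a hand-picked feature map Φ(x) = (x, x²) on 1-D data: the points cannot be split by a single threshold on x, but a hyperplane separates them after the mapping. The data, the map `phi`, and the helper `is_separable_1d` are all hypothetical.

```python
def is_separable_1d(xs, ys):
    """1-D points (distinct x values assumed) are linearly separable iff,
    sorted by x, the label changes at most once."""
    ordered = [y for _, y in sorted(zip(xs, ys))]
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a != b)
    return changes <= 1


def phi(x):
    """Hypothetical feature map from 1-D input space to a 2-D feature space."""
    return (x, x * x)


xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
ys = [1, 1, -1, -1, -1, 1, 1]

# In feature space the hyperplane x^2 - 2 = 0, i.e. w = (0, 1), b = -2, splits the classes.
w, b = (0.0, 1.0), -2.0
separated = all(
    y * (w[0] * phi(x)[0] + w[1] * phi(x)[1] + b) > 0 for x, y in zip(xs, ys)
)
```

Here `is_separable_1d(xs, ys)` is false while `separated` is true, which is the point of the slide: separability depends on the space in which the data is represented.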

Version Space- Notations

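• F: feature space

• H: the set of all hypotheses (hyperplanes)

• W: parameter space

• f(x): classifier (hyperplane)

• V: version space

$H = \{ f \mid f(x) = \frac{w \cdot \Phi(x)}{\|w\|},\ w \in W \}$

$V = \{ f \in H \mid \forall i \in \{1, \dots, n\},\ y_i f(x_i) > 0 \}$

Equivalently, in terms of the parameter vector w:

$V = \{ w \in W \mid \|w\| = 1,\ \forall i \in \{1, \dots, n\},\ y_i f(x_i) > 0 \}$

The version space is the set of all classifiers that are consistent with the labeled training data.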
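To make the definition concrete, the sketch below tests whether a weight vector belongs to the version space of a small labeled set. It assumes the identity feature map unless another `phi` is passed in; the function name `in_version_space` and the data are hypothetical.

```python
import math


def in_version_space(w, labeled, phi=lambda x: x):
    """True if the unit vector in the direction of w classifies every
    labeled pair (x, y) correctly, i.e. y * (w . phi(x)) > 0 for all pairs.

    w is rescaled to unit length first, so only its direction matters.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0:
        return False
    unit = [wi / norm for wi in w]
    return all(
        y * sum(u * p for u, p in zip(unit, phi(x))) > 0 for x, y in labeled
    )


# Hypothetical labeled set with one relevant and one irrelevant instance.
labeled = [((1.0, 2.0), 1), ((-1.0, -1.0), -1)]
```

With this data both w = (1, 0) and w = (0, 1) lie in the version space while w = (1, -1) does not, so the version space generally contains many hypotheses until more labels are added.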

Active Learning- concept

• Given an unlabeled pool U, an active learner l has three components: l(f, q, X).

– f: the classifier (trained on the current set of labeled data X)

– q: the query component, which, given the current labeled set X, decides which instance in U to query next

– X: the labeled dataset

• DEFINITION 4.1 Area(V) is the surface area that version space V occupies on the hypersphere, where ||w|| = 1

We want the classifier to become more precise as the number of query rounds increases, so we need to reduce the version space as much as possible with each query.


• LEMMA (Tong & Koller, 2000): Suppose we have an input space X, a finite-dimensional feature space F (induced via a kernel K), and a parameter space W. Suppose active learner l* always queries instances whose corresponding hyperplanes in parameter space W halve the area of the current version space. Let l be any other active learner. Denote the version spaces of l* and l after i queries as $V_i^*$ and $V_i$ respectively. Let $\mathcal{P}$ denote the set of all conditional distributions of y given x. Then

$\forall i, \quad \sup_{P \in \mathcal{P}} E_P[\mathrm{Area}(V_i^*)] \le \sup_{P \in \mathcal{P}} E_P[\mathrm{Area}(V_i)]$


• This discussion motivates an approach in which we query instances that split the current version space into two parts that are as close to equal as possible.

• Given an unlabeled instance x from the pool, it is not practical to explicitly compute the sizes of the new version spaces V- and V+ (the version spaces obtained when x is labeled -1 and +1 respectively).

• Hence, we need a way of approximating this procedure.

– Simple method: learn an SVM on the existing labeled data and choose, as the next instance to query, the pool instance that comes closest to the hyperplane.

Active Learning- why the simple method works

• The SVM unit vector w obtained from the labeled data is the center of the largest hypersphere that can fit inside the current version space V.

• The position of w is often approximately in the center of the version space.

• So, we can test each unlabeled instance x in the pool to see how close its corresponding hyperplane in W comes to the centrally placed w.

• The distance calculation is straightforward:

$|w \cdot \Phi(x_i)|$
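The selection rule can be sketched in a few lines: score every pool instance by |w · Φ(x)| and take the smallest values. The function name `closest_to_boundary`, the identity default for `phi`, and the example pool are assumptions for illustration.

```python
def closest_to_boundary(w, pool, n=1, phi=lambda x: x):
    """Return the indices of the n pool instances with the smallest |w . phi(x)|."""

    def dist(i):
        return abs(sum(wi * pi for wi, pi in zip(w, phi(pool[i]))))

    return sorted(range(len(pool)), key=dist)[:n]


# Hypothetical pool of four unlabeled 2-D instances and a unit vector w.
pool = [(2.0, 1.0), (0.1, -3.0), (-1.5, 0.5), (0.4, 0.0)]
w = (1.0, 0.0)
picked = closest_to_boundary(w, pool, n=2)
```

With this w the distances are 2.0, 0.1, 1.5 and 0.4, so the instances at indices 1 and 3 are the ones the learner would ask the user to label.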

Active Learning- why the simple method works

[Figure: labeled instances, unlabeled instances, and w; one unlabeled instance is marked "Choose this for labeling"]

Active Learning- SVMact summary

• To summarize, our SVMact system performs the following for each round of relevance feedback:

– Learn an SVM on the current labeled data.

– If this is the first feedback round, ask the user to label twenty randomly selected images. Otherwise, ask the user to label the twenty pool images closest to the SVM boundary.

• After the relevance feedback rounds have been performed, SVMact retrieves the top-k most relevant images:

– Learn a final SVM on the labeled data.

– The final SVM boundary separates “relevant” images from irrelevant ones. Display the k relevant images that are farthest from the SVM boundary.
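The round structure summarized above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `train_linear_svm` is a stand-in trainer (full-batch subgradient descent on the soft-margin hinge loss, linear kernel only), and `svm_act`, `ask_user`, and all default parameter values are hypothetical names and settings. The loop retrains after each batch of labels, which yields the same sequence of classifiers as training at the start of the next round.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Stand-in trainer (assumption): subgradient descent on the
    regularized hinge loss. Returns (w, b)."""
    w, b = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        gw, gb = [lam * wi for wi in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # margin violated
                gw = [g - yi * a / n for g, a in zip(gw, xi)]
                gb -= yi / n
        w = [wi - lr * g for wi, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def svm_act(pool, ask_user, train, rounds=3, batch=20, k=10, seed=0):
    """pool: list of feature vectors; ask_user(i) returns +1 or -1 for pool[i];
    train(X, y) returns (w, b). Returns the indices of the top-k images.
    Assumes rounds >= 1 and a non-empty pool."""
    rng = random.Random(seed)
    labeled = {}  # pool index -> user label
    w, b = None, 0.0
    for r in range(rounds):
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        if r == 0:
            # First round: random images.
            query = rng.sample(unlabeled, min(batch, len(unlabeled)))
        else:
            # Later rounds: images closest to the current boundary.
            query = sorted(unlabeled, key=lambda i: abs(dot(w, pool[i]) + b))[:batch]
        for i in query:
            labeled[i] = ask_user(i)
        idx = sorted(labeled)
        w, b = train([pool[i] for i in idx], [labeled[i] for i in idx])
    # Retrieval: rank by signed score, largest (most relevant) first.
    ranked = sorted(range(len(pool)), key=lambda i: -(dot(w, pool[i]) + b))
    return ranked[:k]
```

Passing the trainer in as a parameter keeps the feedback loop separate from the SVM solver, so a kernel SVM from an existing library could be substituted as long as it exposes a signed decision score.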

Experiments- Image Characterization

• Our image retrieval system employs a multi-resolution image representation scheme.

• We characterize images by two main features:

– Color

– Texture

• Given an image, we can extract the above color and texture information to produce a 144 dimensional vector of numbers.

• Thus, the space X for our SVMs is a 144 dimensional space, and each image in our database corresponds to a point in this space.

Experiment- Image Databases

• We used three real-world image datasets:

– Four-category

– Ten-category

– Fifteen-category

• Each category consisted of 100 to 150 images.

• Four-category

– Architecture, flowers, landscape, people

• Ten-category

– Architecture, bears, clouds, flowers, landscape, people, objectionable, tigers, tools, and waves

• Fifteen-category

– In addition to the ten above: elephants, fabrics, fireworks, food, and texture

Experiments- method

• The goal of SVMact is to learn a given concept through a relevance feedback process.

• At each feedback round SVMact selects twenty images to ask the user to label as ‘relevant’ or ‘not relevant’ with respect to the query concept.

• It then uses the labeled instances to successively refine the concept boundary.

• After the relevance feedback rounds have finished SVMact then retrieves the top-k most relevant images from the dataset based on the final concept it has learned.

Experiments- average top-k accuracy

Experiments- compared with passive

Conclusion

• Active learning with SVM can provide a powerful tool for searching image databases, outperforming a number of traditional query refinement schemes.

• SVMact not only achieves consistently high accuracy on a wide variety of desired returned results, but also does so quickly and maintains high precision when asked for large quantities of images.
