Active reranking for web image search

IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 19, NO. 3, MARCH 2010 805

Xinmei Tian, Dacheng Tao, Member, IEEE, Xian-Sheng Hua, Member, IEEE, and Xiuqing Wu

Abstract: Image search reranking methods usually fail to capture the user’s intention when the query term is ambiguous. Therefore, reranking with user interactions, or active reranking, is in high demand to effectively improve search performance. The essential problem in active reranking is how to target the user’s intention. To this end, this paper presents a structural information based sample selection strategy to reduce the user’s labeling effort. Furthermore, to localize the user’s intention in the visual feature space, a novel local-global discriminative dimension reduction algorithm is proposed. In this algorithm, a submanifold is learned by transferring the local geometry and the discriminative information from the labelled images to the whole (global) image database. Experiments on both synthetic datasets and a real Web image search dataset demonstrate the effectiveness of the proposed active reranking scheme, including both the structural information based active sample selection strategy and the local-global discriminative dimension reduction algorithm.

Index Terms: Active reranking, local-global discriminative (LGD) dimension reduction, structural information (SInfo) based active sample selection, web image search reranking.

I. INTRODUCTION

Currently, most of the popular commercial Web image search engines, e.g., Microsoft’s Live Image Search and Google Image Search, are built for the “query by keywords” scenario. That is, a user provides a keyword, e.g., “panda”, and the search engine returns corresponding images by processing the associated textual information, e.g., file name, surrounding text, URL, etc.

Although text-based search techniques have shown their effectiveness in document search, they are problematic when applied to image search. There are two main problems. One is the mismatch between images and their associated textual information, which results in irrelevant images appearing in the search results. For example, an image that is irrelevant to “panda” will be mistaken for a relevant image if the word “panda” appears in its surrounding text. The other problem is that the textual information is insufficient to represent the semantic content of the images. The same query words may refer to images that are semantically different, e.g., we cannot differentiate an animal panda image from an image of a person whose name is Panda using only the text word “panda”.

Manuscript received March 04, 2009; revised October 05, 2009. First published November 03, 2009; current version published February 18, 2010. This work was supported by the Nanyang Technological University Nanyang SUG Grant (M58020010), the Microsoft Operations PTE LTD-NTU Joint R&D (M48020065), and the K. C. Wong Education Foundation Award. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Sharathchandra Pankanti.

X. Tian and X. Wu are with the Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China (e-mail: [email protected]; [email protected]).

D. Tao is with the School of Computer Engineering, The Nanyang Technological University, 50 Nanyang Avenue, Blk N4, Singapore, 639798 (e-mail: [email protected]).

X.-S. Hua is with Microsoft Research Asia, Beijing 100190, China (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2009.2035866

Because the textual information is insufficient for semantic image retrieval, a natural recourse is the visual information. Recently, a number of image/video reranking methods [6], [14], [15], [17], [34] have been proposed to exploit the visual information for refining the text-based search result. Most of these reranking methods utilize the visual information in an unsupervised and passive manner. The only exception is IntentSearch [6], which reorders the text-based search result by using query by example (QBE), with the query image specified by the user from the initial text-based search result.

Unsupervised reranking methods, e.g., the clustering based algorithm [14], the random walk [15], VisualRank [17] and Bayesian reranking [34], can only achieve limited performance improvements. This is because the visual information is insufficient to infer the user’s intention, especially when the query term is ambiguous. For example, “panda” can be either an animal or a person whose name is Panda. Without user interactions, we have no idea which kind of panda images the user prefers. However, if user interactions are available, we can learn the user’s intention and then rerank the initial search results to achieve a significant performance improvement. For instance, for the query “panda”, if the user labels the animal pandas as relevant and other images as irrelevant, different kinds of animal pandas will be returned to the user. In this paper, reranking with user interactions is named active reranking. IntentSearch [6] can be regarded as a simplified active reranking method with only one relevant image labelled by the user.

In active reranking, the essential problem is how to capture the user’s intention, i.e., to distinguish query relevant images from irrelevant ones. Unlike conventional learning problems, in which each sample has only one fixed label, an image may be relevant for one user but irrelevant for another. In other words, the semantic space is user-driven: users may have different intentions while issuing identical query keywords. Therefore, we propose to target the user-driven intention from two aspects: collecting labeling information from users to obtain the specified semantic space, and localizing the visual characteristics of the user’s intention in this specific semantic space, as detailed in Sections I-A and I-B, respectively.

Although IntentSearch [6] can be deemed a simplified version of active reranking, i.e., the user’s intention is defined by only one query image, it cannot work well when the user’s intention is too complex to be represented by one image. As shown in Fig. 3, the query relevant images for “Animal” vary largely both in visual appearance and features, thus we cannot represent “Animal” with only one image. In contrast, our proposed active reranking method can learn the user’s intention more extensively and completely.

1057-7149/$26.00 © 2010 IEEE

A. Active User’s Labeling Information Collection

To collect the labeling information from users efficiently, a new structural information (SInfo) based strategy is proposed to actively select the most informative query images.

It is tedious and unacceptable to keep asking a user to label a lot of images in the interaction stage. Thus, it is essential to get the necessary information by labeling as few images as possible. Active learning is well known for reducing the labeling effort by labeling the most informative samples [4], [20]. Conventional active learning strategies can be divided into two categories: the error reduction strategy [12], [25], [43] and the most uncertain (close-to-boundary) strategy [4], [36]. Both of them suffer from the small sample size problem, i.e., the unreliable estimation of the expected error risk and the uncertainty caused by the insufficient labelled samples.

In active reranking, however, only a few images will be labelled by a user. To avoid or alleviate the influence of the small sample size problem, our proposed SInfo sample selection strategy considers two aspects simultaneously: the ambiguity and the representativeness.

The ambiguity denotes the uncertainty about whether an image is relevant to the user’s intention. Chang et al. [4] and Wang et al. [36] have demonstrated the effectiveness of the ambiguity in active learning for image retrieval. However, their methods are not designed for the reranking problem. In this paper, the ambiguity is considered in a more natural way for reranking; it is derived from the ranking scores, which denote the images’ relevance degrees. Besides the ambiguity, the representativeness, another important aspect, is also considered. An image is more representative if it is located in a dense area with many images around it. Labeling a representative sample brings more information than labeling an isolated one. In active reranking, the representativeness is derived in a totally unsupervised fashion, independently of the learning algorithm, to alleviate the influence of the aforementioned small sample size problem. Experiments on both synthetic data and a real Web image search dataset show that SInfo is much more effective than other strategies, e.g., the most uncertain strategy and the error reduction strategy, in active reranking for Web image search.

B. Visual Characteristic Localization

To localize the visual characteristics of the user’s intention, we propose a novel local-global discriminative (LGD) dimension reduction algorithm. Basically, we assume that the query relevant images, which represent the user’s intention, lie on a low-dimensional submanifold of the original ambient (visual feature) space. LGD learns this submanifold by transferring both the local geometry and the discriminative information from labelled images to unlabelled ones. The learned submanifold preserves both the local geometry of labelled relevant images and the discriminative information that separates relevant from irrelevant images. As a consequence, we can reduce the well-known semantic gap between low-level visual features and high-level semantics to further enhance the reranking performance on this submanifold.

In the past decades, many dimension reduction algorithms have been proposed, e.g., principal component analysis (PCA) [13], transductive component analysis [23], locally linear embedding (LLE) [27], Discriminant LLE [21], ISOMAP [31], nonparametric discriminant analysis [29], semi-supervised discriminant analysis (SDA) [3], biased marginal Fisher’s analysis (BMFA) [37], locality preserving projections (LPP) [11], supervised LPP (SLPP) [2], geometric mean for subspace selection [28], local discriminant embedding (LDE) [5], semantic manifold learning (SML) [22], orthogonal Laplacianface [1], maximum margin projection (MMP) [10] and the recently developed correlation metric based methods [8], [9]. However, they are problematic for active reranking in Web image search for the following reasons. Unsupervised methods, e.g., PCA and LLE, exploit a subspace or submanifold of the whole image space but ignore the user’s labeling information. As a consequence, these algorithms fail to capture the user-driven intentions. Supervised linear algorithms, e.g., LDA [7] and biased discriminant analysis (BDA) [41], learn a subspace on the labelled set, so they ignore the submanifold of all relevant images. Supervised manifold learning algorithms, e.g., SLPP and BMFA, cannot transfer the learned submanifold from labelled images to unlabelled images. Although some semi-supervised algorithms, e.g., SML and SDA, have been developed to model both labelled and unlabelled images, they are not designed specifically for active reranking in Web image search. They assume both relevant and irrelevant unlabelled images are drawn from a nonlinear manifold. In Web image search, however, irrelevant images scatter over the whole space, i.e., they may be distributed uniformly, and thus popular manifold regularizations [3], [22] will over-fit to unlabelled images. As a consequence, the performance obtained by popular semi-supervised learning algorithms is poor. This paper presents a new algorithm to target the user’s intention. Preliminary experimental results on both synthetic data and a real Web image search dataset demonstrate the effectiveness of the proposed LGD.

The rest of the paper is organised as follows. First, we introduce the overall framework for active reranking in Section II. The SInfo active sample selection strategy is detailed in Section III and the LGD dimension reduction algorithm is presented in Section IV. In Section V, the basic Bayesian reranking algorithm is briefly introduced and the overall procedure of active reranking based on it is given. Experimental results on synthetic datasets and a real Web image search dataset are reported in Section VI and Section VII, respectively. In Section VIII, we analyze the important parameters in SInfo and LGD, followed by the conclusion in Section IX.

II. ACTIVE RERANKING FOR WEB IMAGE SEARCH

Fig. 1 shows the proposed general framework for active reranking in Web image search. Take the query term “panda” as an example. When “panda” is submitted to the Web image search engine, an initial text-based search result is returned to the user, as shown in Fig. 1(a) (only the top nine images are given for illustration). This result is unsatisfactory because both person and animal images are retrieved as top results. This is caused by the ambiguity of the query term. Without user interactions, it is impossible to eliminate this ambiguity: we cannot tell which kind of images, animal pandas or persons named Panda, the user intends. Therefore, traditional reranking methods, which improve the initial search results by utilizing only the visual properties of images, cannot achieve good performance.

Fig. 1. Framework for active reranking illustrated with the query “panda”. When the query is submitted, the text-based image search engine returns a coarse result (a). Then the active reranking process is adopted to obtain a more satisfactory result (b), by learning the user’s intention.

To solve this problem, active reranking, i.e., reranking with user interactions, is proposed. As shown in Fig. 1, four images are first selected according to an active sample selection strategy, and then the user is required to label them. If the user labels the animal pandas as query relevant (as marked in Fig. 1) and the other two images (person, car) as query irrelevant, then we can learn that the animal panda is the user’s intention. To represent this intention, i.e., the animal panda, a discriminative submanifold should be exploited to separate query relevant images from irrelevant ones. A dimension reduction step is thus introduced to localize the visual characteristics of the user’s intention.

With the knowledge of the user’s intention, including both the labeling information and the learned discriminative submanifold, the reranking process is conducted and different kinds of animal pandas are returned, as shown in Fig. 1(b). Sometimes, several interaction rounds are preferred to achieve a more satisfactory performance.

In summary, there are two key steps in learning the user’s intention, i.e., the active sample selection strategy and the dimension reduction algorithm. This paper implements these two steps via a new SInfo sample selection strategy and a novel LGD dimension reduction algorithm, as will be discussed in Sections III and IV, respectively.

III. SINFO ACTIVE SAMPLE SELECTION

An SInfo active sample selection strategy is presented to learn the user’s intention efficiently. It selects images by considering not only the ambiguity but also the representativeness in the whole image database. Ambiguity and representativeness are two important aspects in active sample selection. Labeling a sample that is more ambiguous will bring more information. On the other hand, the information provided by an individual sample can be shared by its neighbors. Therefore, the more representative samples are preferred for labeling. In SInfo, the ambiguity of an image is measured by the entropy of the relevance probability distribution, while the representativeness is measured by the density.

A. Ambiguity

The ambiguity denotes the uncertainty about whether an image is relevant or not. It can be estimated via various sophisticated learning methods, e.g., support vector machine (SVM) [35], transductive SVM (TSVM) [18] and the harmonic Gaussian field method [42], by conducting a binary classification task.

However, in active reranking, it is direct and reasonable to measure the ambiguity with the ranking scores obtained in the reranking process. There are two reasons. One reason is that the reranking problem is essentially different from classification [34], thus the ambiguity estimated via a classification task may not be as accurate as that directly derived in the reranking process. The other reason is that additional cost will be introduced if the ambiguity is estimated via other learning methods. In contrast, measuring ambiguity through the ranking scores avoids this additional cost.

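For an image $x_i$, $r_i \in [0, 1]$ is its ranking score, where $r_i = 1$ means $x_i$ is definitely query relevant, while $r_i = 0$ means $x_i$ is totally irrelevant. $r_i$ and $1 - r_i$ can be regarded as the probabilities of $x_i$ being relevant and irrelevant, respectively. The ambiguity can then be measured via the information entropy, which is a widely used measure in information theory. The ambiguity of $x_i$ is

$$H_r(x_i) = -r_i \log r_i - (1 - r_i) \log (1 - r_i). \qquad (1)$$

Because the reranking is conducted based on the initial text-based search result [34], the ambiguity in the initial text-based search result should also be taken into account, i.e.,

$$H_{\bar{r}}(x_i) = -\bar{r}_i \log \bar{r}_i - (1 - \bar{r}_i) \log (1 - \bar{r}_i), \qquad (2)$$

where $\bar{r}_i \in [0, 1]$ is the initial text-based search ranking score for $x_i$.

By combining (1) and (2), the total ambiguity for $x_i$ is

$$H(x_i) = \alpha H_r(x_i) + (1 - \alpha) H_{\bar{r}}(x_i), \qquad (3)$$

where $\alpha \in [0, 1]$ is a trade-off parameter to control the influence of the two ambiguity terms.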
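As an illustration, the entropy-based ambiguity described by (1)–(3) can be sketched in a few lines of Python. This is a minimal sketch rather than the authors’ implementation: the function names, the use of the natural logarithm, and the default trade-off value of 0.5 are assumptions made here for illustration.

```python
import math


def binary_entropy(p):
    """Entropy of a two-outcome (relevant / irrelevant) distribution.

    p is a ranking score in [0, 1]; the entropy is 0 when p is 0 or 1.
    The natural logarithm is an assumption of this sketch.
    """
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def total_ambiguity(r, r_init, alpha=0.5):
    """Weighted sum of the reranking and text-based ambiguity terms.

    r      : current reranking score of the image
    r_init : initial text-based ranking score of the image
    alpha  : trade-off in [0, 1] (the default value is hypothetical)
    """
    return alpha * binary_entropy(r) + (1.0 - alpha) * binary_entropy(r_init)
```

An image whose scores are both 0.5 is maximally ambiguous, while an image scored exactly 0 or 1 by both rankings contributes no ambiguity.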



Fig. 2. Because “A” and “B” have the same distance to the hyper-plane (dashed line), they have an identical ambiguity. However, the more representative sample “A” is preferable to “B”.

B. Representativeness

Besides the ambiguity, representativeness, an important property that has not been well studied before, is also taken into account. Apart from its unreliable estimation caused by insufficient labelled images, the ambiguity measures only the importance of the image itself. Once the Web image search system gets the labeling information of an image, it is very important to consider how many other images can share the labeling information with the labelled one. For example, given two unlabelled samples with identical ambiguity, labeling the more representative one, i.e., the one with many samples distributed around it, will bring more information and achieve a better reranking performance.

To explain this, a simple synthetic dataset is shown in Fig. 2. There are two labelled samples (a big “*” for the query relevant sample and a big “o” for the query irrelevant one) and several unlabelled ones (marked with big black dots “.”). These six samples are distributed along a line and the coordinates on the horizontal axis denote their positions. By using SVM [35], the classification hyper-plane, which separates the two labelled samples with the largest margin, crosses position 0, as shown by the dashed line in Fig. 2. According to the most uncertain criterion, i.e., the samples closest to the hyper-plane have the maximum ambiguity, “A” and “B” have the maximum and identical ambiguity because they have the same distance, i.e., 0.4 for both, to the hyper-plane. However, if we can choose only one sample for labeling, it is better to label “A” than “B” because more unlabelled samples will share the labeling information of “A”.

To avoid the small sample size problem in active sample selection, the representativeness can be estimated in an unsupervised manner. Intuitively, labeling an image in a dense area will be more helpful than labeling an isolated one because the labeling information of the image can be shared with other surrounding images. As a consequence, we can measure the representativeness of image $x_i$ via the probability density $p(x_i)$, which can be estimated by using kernel density estimation (KDE) [26]:

$$p(x_i) = \frac{1}{|N_i|} \sum_{x_j \in N_i} K(\mathbf{x}_i - \mathbf{x}_j), \qquad (4)$$

where $N_i$ is the set of neighbors of $x_i$ and $\mathbf{x}_i$ is the visual feature for image $x_i$. $K(\mathbf{x})$ is a kernel function that satisfies both $K(\mathbf{x}) > 0$ and $\int K(\mathbf{x})\, d\mathbf{x} = 1$. The Gaussian kernel is adopted in this paper. For the synthetic dataset in Fig. 2, the estimated representativeness is given by the plotted curve.
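The density-based representativeness can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors’ code: the neighbor set is taken to be the k nearest images in Euclidean distance, the Gaussian kernel uses a single isotropic bandwidth, and the names `representativeness`, `k` and `bandwidth` are hypothetical.

```python
import math


def gaussian_kernel(diff, bandwidth):
    """Isotropic Gaussian kernel evaluated at the difference vector diff."""
    dim = len(diff)
    sq_norm = sum(c * c for c in diff)
    norm_const = (2.0 * math.pi * bandwidth ** 2) ** (dim / 2.0)
    return math.exp(-sq_norm / (2.0 * bandwidth ** 2)) / norm_const


def representativeness(i, features, k=3, bandwidth=1.0):
    """Kernel density estimate at image i over its k nearest neighbors.

    features is a list of equal-length feature vectors; k and bandwidth
    are assumed tuning parameters, not values taken from the paper.
    """
    x = features[i]
    by_distance = sorted(
        (math.dist(x, f), j) for j, f in enumerate(features) if j != i
    )
    neighbors = [j for _, j in by_distance[:k]]
    if not neighbors:
        return 0.0
    total = sum(
        gaussian_kernel([a - b for a, b in zip(x, features[j])], bandwidth)
        for j in neighbors
    )
    return total / len(neighbors)


# Hypothetical 1-D layout in the spirit of Fig. 2: a sample at -0.4 sits
# next to a dense group, while a sample at 0.4 is isolated.
points = [[-0.4], [-0.5], [-0.6], [-0.7], [0.4], [3.0]]
dense = representativeness(0, points, k=2)
isolated = representativeness(4, points, k=2)
```

With this layout the sample next to the dense group receives the higher density estimate, which matches the intuition that labeling it shares information with more neighbors.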

C. Active Sample Selection

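Since the most informative images should exhibit both ambiguity and representativeness, the structural information of image $x_i$, $SI(x_i)$, can be measured by the product of the two terms, i.e., the product of its ambiguity $H(x_i)$ and its representativeness $p(x_i)$:

$$SI(x_i) = H(x_i)\, p(x_i).$$

Then the most informative image $x^*$ is selected from the unlabelled image set $\mathcal{U}$ according to

$$x^* = \arg\max_{x_i \in \mathcal{U}} SI(x_i). \qquad (5)$$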
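The selection rule, a product of ambiguity and representativeness followed by an arg max over the unlabelled set, is short enough to sketch directly. The function names and the dictionary-based inputs below are assumptions for illustration, and the scores are made-up numbers, not results from the paper.

```python
def structural_information(ambiguity, density):
    """Structural information of one image: ambiguity times density."""
    return ambiguity * density


def select_most_informative(unlabelled, ambiguity, density):
    """Return the unlabelled image id with the largest structural information.

    ambiguity and density map image ids to precomputed scores.
    """
    return max(
        unlabelled,
        key=lambda i: structural_information(ambiguity[i], density[i]),
    )


# Hypothetical scores: "A" and "B" are equally ambiguous, but "A" lies in
# a denser region, so it is chosen.
ambiguity = {"A": 0.69, "B": 0.69, "C": 0.10}
density = {"A": 0.39, "B": 0.28, "C": 0.40}
chosen = select_most_informative(["A", "B", "C"], ambiguity, density)
```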

In practical applications, to provide a good user experience, it is better to ask users to label a small number of images in each round rather than only one. This is because users will lose their patience after a few rounds. Thus, the batch mode is utilized to select several images in each round. A simple method is to select the top few most informative images. The disadvantage of this method is that the selected images may be redundant and cluster in a small area of the high-dimensional feature space. Thus, we seek to select a batch of most informative images and maintain their diversity at the same time.

The angle-diversity criterion [4] is a good choice for this purpose. This criterion iteratively selects images that are most informative and also diverse with respect to the already selected image set. For an unlabelled image, its diversity with respect to the selected set is measured by the minimal angle between it and each image in the selected set. Then, the images are selected iteratively according to

(6)

where a trade-off parameter is introduced to balance the effects of the two components: the structural information and the angle-diversity.
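A greedy batch selection in the spirit of the angle-diversity criterion can be sketched as below. The exact form of (6) is not reproduced here, so the scoring rule is an assumption: a convex combination of the structural information (taken to be scaled to [0, 1]) and the minimal angle to the already selected images, normalized by π. The names `select_batch` and `rho` are hypothetical.

```python
import math


def angle(u, v):
    """Angle in radians between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def select_batch(features, sinfo, batch_size, rho=0.5):
    """Greedily pick images that are informative and mutually diverse.

    features   : list of non-zero feature vectors
    sinfo      : structural information per image, assumed in [0, 1]
    rho        : assumed trade-off between informativeness and diversity
    """
    selected = []
    candidates = set(range(len(features)))
    while candidates and len(selected) < batch_size:

        def score(i):
            if selected:
                min_angle = min(
                    angle(features[i], features[j]) for j in selected
                )
                diversity = min_angle / math.pi
            else:
                diversity = 1.0  # nothing selected yet: no redundancy
            return rho * sinfo[i] + (1.0 - rho) * diversity

        # Sorting makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

For example, with features `[[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]` and scores `[1.0, 0.9, 0.5]`, setting `rho=1.0` picks the two highest-scoring but nearly parallel images, whereas `rho=0.5` replaces the second pick with the orthogonal image.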

IV. LGD DIMENSION REDUCTION

In reranking, the images returned for a certain query term are represented by low-level visual features, i.e., each image is described by a high-dimensional visual feature vector. The performance of reranking is usually poor because of the gap between the low-level visual features and high-level semantics.

With user interactions, this semantic gap can be reduced significantly. By mining the user’s labeling information, we can learn a submanifold to encode the user’s intention. This submanifold is embedded in the ambient space, i.e., the high-dimensional visual feature space. In this paper, a linear subspace is used to approximate this submanifold, so that each image can be represented by a low-dimensional projection of its visual feature. By using these low-dimensional representations, an improved reranking performance can be obtained.

This paper presents an LGD dimension reduction algorithm to learn such a projection. LGD considers both the local information contained in the labelled images and the global information of the whole image database simultaneously. In detail, LGD transfers the local information, including both the local geometry of the labelled relevant images and the discriminative information in the labelled images, to the global domain (the whole image database). This cross domain transfer process is completed by building different local and global patches for each image, and then aligning those patches together to learn a consistent coordinate. A patch is a local area formed by a set of neighboring images. We have three types of images: labelled relevant, labelled irrelevant, and unlabelled. Therefore, we build three types of patches: 1) local patches for labelled relevant images, to represent their local geometry and the discriminative information that separates relevant images from irrelevant ones; 2) local patches for labelled irrelevant images, to represent the discriminative information that separates irrelevant images from relevant ones; and 3) global patches for both labelled and unlabelled images, to transfer both the local geometry and the discriminative information from all labelled images to the unlabelled ones.

For convenience, we use superscript “+” to denote the labelled relevant images and “−” to denote the labelled irrelevant ones. If there is no superscript, it refers to an arbitrary image which may be labelled relevant, labelled irrelevant or unlabelled.

A. Local Patches for Labelled Relevant Images

BDA, a popular dimension reduction algorithm for image retrieval, assumes that all query relevant samples are alike while each irrelevant sample is irrelevant in its own way [41]. Thus, the relevant samples are required to be close to each other in the projected subspace. However, this assumption is usually unreliable in Web image search.

The query relevant samples may vary in appearance and in the corresponding visual features. For example, for the query “animal”, query relevant images are different from each other, as shown in Fig. 3. For this reason, instead of requiring relevant images to be close to each other in the projected subspace, it is more appropriate to preserve the local geometry of the relevant images while separating relevant images from all irrelevant ones. Therefore, the local patch for a labelled relevant image should preserve both the local geometry of relevant images and the discriminative information between the relevant images and all irrelevant images. This paper models the local patch for the low-dimensional representation of the labelled relevant image as

(7)

The first set of neighbors in (7) consists of the image’s nearest neighbors in the labelled relevant image set, and the second set consists of its nearest neighbors in the labelled irrelevant image set. The combination coefficient is a trade-off factor between the two parts.

The first part in (7) is used to preserve the local geometry of labelled relevant images before and after projection, thus the linear combination coefficient vector is required to reconstruct the labelled relevant image from its neighboring relevant images with minimal error

(8)

Fig. 3. For query “animal”, the query relevant images vary largely in both appearance (a) and visual features (b). In (b), the utilized 428-D visual features include 225-D color moment, 128-D wavelet texture and 75-D edge distribution histogram.

Solving problem (8) gives the coefficient vector in closed form in terms of the local Gram matrix of the neighboring relevant images.
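Reconstruction coefficients of this kind are commonly computed as in locally linear embedding: build the local Gram matrix of the differences between the image and its neighbors, solve a small linear system, and normalize the weights to sum to one. The sketch below follows that standard recipe. The sum-to-one constraint, the trace-scaled regularization term `reg`, and all function names are assumptions of this sketch rather than details confirmed by the text above.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - partial) / m[r][r]
    return x


def reconstruction_weights(x, neighbors, reg=1e-3):
    """Weights that reconstruct x from its neighbors with minimal error.

    Assumes the weights sum to one (standard LLE convention). A small
    multiple of the trace is added to the diagonal so that the Gram
    matrix stays invertible; reg is a hypothetical default.
    """
    diffs = [[xi - ni for xi, ni in zip(x, nb)] for nb in neighbors]
    k = len(diffs)
    gram = [
        [sum(p * q for p, q in zip(diffs[j], diffs[t])) for t in range(k)]
        for j in range(k)
    ]
    trace = sum(gram[j][j] for j in range(k))
    eps = reg * trace if trace > 0 else reg
    for j in range(k):
        gram[j][j] += eps
    raw = solve_linear(gram, [1.0] * k)
    total = sum(raw)
    return [w / total for w in raw]
```

For a point halfway between two neighbors, e.g. `reconstruction_weights([1.0, 0.0], [[0.0, 0.0], [2.0, 0.0]])`, the two weights come out equal, as expected by symmetry.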

To rewrite (7) in a more compact form, we consider its two parts separately. For the first part, which models the local geometry of relevant images,

(9)

where and

with .

The second part models the discriminative information for separating the relevant image from all irrelevant ones, i.e.,

(10)

where and

.



By combining (9) and (10) together into (7), we have

B. Local Patches for Labelled Irrelevant Images

Discriminative information is also partially encoded in all irrelevant images, so we construct local patches for labelled irrelevant images by separating each irrelevant image from all relevant images. Because each irrelevant image is irrelevant in its own way, it would be unreasonable to preserve the local geometry of the irrelevant images. In this paper, we model the local patch for the low-dimensional representation of a labelled irrelevant image as

(11)

The is ’s nearest neighbors in the labelledrelevant image set “ ”. The matrix can be calculated in theway similar to that of computing in (10) by settingand .

C. Global Patches for All Images

In active reranking, users would like to label only a small number of images, so it is wasteful and unreasonable to discard the large number of unlabelled images. With only the labelled images, the learned subspace will be biased toward that spanned by these labelled images and cannot generalize well to the large amount of unlabelled data. Therefore, some semi-supervised methods have been proposed which also make use of the unlabelled images. However, because only relevant images lie on an unknown manifold and the distribution of irrelevant images is nearly flat, conventional manifold regularizations, which assume that both relevant and irrelevant samples are drawn from unknown manifolds, are prone to over-fit to unlabelled samples. As a consequence, another method is considered in this paper to model unlabelled images in active reranking.

To make use of both the labelled and unlabelled images, the most important thing is to exploit the information contained in them. Inspired by the main ideas of cross domain learning [16] and transfer learning [32], in this paper we introduce global patches for both labelled and unlabelled images. The global patches transfer the local geometry and the discriminative information, which is exploited in the domain of labelled images, to the domain of unlabelled images. With the global patches, we aim to preserve the principal subspace to keep the submanifold of relevant images. The noise information contained in the ambient space should be eliminated. Principal component analysis (PCA) is a suitable choice, which maximizes the mutual information between the ambient space and the corresponding projected subspace [13]. Another reason for us to use PCA here is the rule of Occam’s razor [24], i.e., the utilization of PCA helps to avoid the over-fitting caused by using conventional manifold regularizations.

To illustrate the advantage of global patches for dimension reduction, a synthetic example is shown in Fig. 4. Fig. 4(a) shows the synthetic 3-D dataset and its projections on the 2-D planes for a clear view. In this dataset, there are 8 labelled samples, 4 relevant and 4 irrelevant, accompanied by abundant unlabelled samples. The relevant samples are all marked by “*”, with big red “*” for the 4 labelled relevant samples and small black “*” for the unlabelled relevant ones. The irrelevant samples are marked by “o”, where big blue “o” and small green “o” denote the labelled and unlabelled irrelevant samples, respectively. The irrelevant samples scatter in the space and the relevant samples are distributed approximately on a manifold.

We have tried many different dimension reduction algorithms and the results are illustrated in Fig. 4(b)–(k). For each dimension reduction algorithm, we have computed the projection plane (the upper part of the subfigure) and the projected 2-D data (the lower part of the subfigure). With the conventional algorithms, the relevant and irrelevant samples overlap in the projected subspace and the submanifold of the relevant samples is not well preserved, as illustrated in the figure. This is caused by the problems of these algorithms discussed above.

To avoid these problems, the proposed LGD learns the submanifold by transferring both the local geometry and the discriminative information from labelled samples to all unlabelled samples. Global patches are built for each sample (including both labelled and unlabelled) to complete the cross domain knowledge transfer process. According to the alignment scheme in [40], the global patch for the low-dimensional representation of an image is modeled in a similar way to local patches

(12)

where the reference point is the centroid of the projected low-dimensional features. Here we use a variant of the original definition of PCA to achieve formula-level consistency between local and global patches.

We rewrite (12) as

where withare the rest images beyond , vector

and

.

By combining both local and global patches, LGD approximates the intrinsic submanifold of relevant samples, as shown in Fig. 4(b). Relevant samples can be separated from irrelevant ones in the projected 2-D subspace. Besides, we show the results of using only local patches and only global patches for dimension reduction in Fig. 4(c) and (d), respectively. Neither of them performs well.

Fig. 4. Three-dimensional synthetic dataset for dimension reduction illustration. In this dataset, big red “*” and big blue “o” denote labelled relevant and irrelevant samples, respectively. Small black “*” and small green “o” are unlabelled relevant and irrelevant samples, respectively. As given in (b), LGD reveals the submanifold of the relevant samples and separates the relevant samples from the irrelevant ones in the projected 2-D subspace. When other dimension reduction algorithms are adopted, the relevant and irrelevant samples overlap in the projected subspace, as shown in (c)–(k). (a) The 3-D synthetic data and its 2-D projections on the three planes, i.e., XY, XZ, and YZ, (b) LGD, (c) Local patches, (d) Global patches, (e) LGD-LPP, (f) BDA, (g) BMFA, (h) LDE, (i) SLPP, (j) SDA, (k) SML.

To investigate the effectiveness of the PCA based global patches, we replace them with LPP based patches, which are built in a similar way for each sample. We name this LPP based LGD as LGD-LPP and show its performance in Fig. 4(e). This result is unsatisfactory because LPP assumes there is a manifold for both labelled and unlabelled samples, which violates the true distribution of irrelevant samples. On the other hand, by using PCA based global patches, the subspace with maximum variance is preserved, so the manifold structure of relevant samples can also be preserved. By integrating global patches and local patches, we can discover the intrinsic submanifold of relevant samples, and separate relevant samples from irrelevant samples.

D. Patch Coordinate Alignment

Each patch has its own coordinate system. With the calculated local and global patches, we can align them together into a consistent coordinate system. For each image, can be rewritten as , where and

is the selection matrix. The is defined according to [38]–[40] as

where is the index vector for samples in .
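As a rough illustration of what a selection matrix does in patch alignment, the sketch below builds an N × K 0/1 matrix from a patch's index vector. It assumes the convention commonly used in patch alignment, namely that entry (p, q) is 1 exactly when global sample p is the q-th member of the patch; the definition the paper relies on is the one in [38]–[40]. The function names and the toy index vector are hypothetical.

```python
def selection_matrix(num_samples, patch_indices):
    """Return an N x K 0/1 matrix S with S[p][q] = 1 iff p == patch_indices[q].

    Assumed convention for illustration; see [38]-[40] for the exact definition.
    """
    S = [[0] * len(patch_indices) for _ in range(num_samples)]
    for q, p in enumerate(patch_indices):
        S[p][q] = 1
    return S


def select_patch(Y, S):
    """Compute Y @ S for a d x N matrix Y, i.e. pick the patch's columns of Y."""
    n, k = len(S), len(S[0])
    return [[sum(row[p] * S[p][q] for p in range(n)) for q in range(k)]
            for row in Y]


# Toy example: 5 samples in 2-D; the patch holds samples 3, 0 and 4 (hypothetical).
Y = [[10, 11, 12, 13, 14],
     [20, 21, 22, 23, 24]]
S = selection_matrix(5, [3, 0, 4])
Y_patch = select_patch(Y, S)
```

Multiplying the full low-dimensional coordinate matrix by the selection matrix extracts the patch's coordinates, which is what allows every patch objective to be written in terms of one global coordinate matrix.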

Then, we can combine all the patches defined in (7), (11), and (12) together

(13)

where

and is a control parameter.



By imposing , the projection matrix can be obtained by solving the standard eigendecomposition problem

(14)

where consists of the eigenvectors corresponding to the largest eigenvalues.
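The last step, keeping the eigenvectors with the largest eigenvalues, can be sketched numerically as follows. This is only an illustration under stated assumptions: the matrix `M` is a hypothetical symmetric stand-in for the matrix in (14), and the constraint is assumed to be an orthonormality constraint on the projection matrix, which is what keeps the problem a standard rather than a generalized eigenproblem.

```python
import numpy as np


def top_eigenvectors(M, d):
    """Return the d eigenvectors (as columns) of symmetric M with the largest
    eigenvalues, together with those eigenvalues in descending order."""
    M = np.asarray(M, dtype=float)
    M = (M + M.T) / 2.0                      # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(M)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:d]    # indices of the d largest
    return eigvecs[:, order], eigvals[order]


# Hypothetical 3 x 3 matrix standing in for the matrix of (14).
M = np.diag([1.0, 5.0, 3.0])
U, lam = top_eigenvectors(M, 2)
```

The columns of `U` are orthonormal, so projecting a feature vector `x` is simply `U.T @ x`.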

V. BAYESIAN RERANKING

To verify the effectiveness of the proposed active reranking method, we apply the SInfo active sample selection strategy and the LGD dimension reduction algorithm to reranking. Both SInfo and LGD are general and can be directly applied to various reranking algorithms, e.g., VisualRank [17]. In this paper, we take Bayesian reranking [34] as the basic reranking algorithm for illustration.

We first give a brief introduction to Bayesian reranking. In this method, reranking is explicitly formulated as a global optimization problem. The optimal reranked score list is obtained by minimizing the following energy function:

(15)

where is the initial text search score list, is a trade-off parameter and is a graph which is constructed with nodes being the images and the weights being their visual similarities, and is the regularizer, which will be detailed below.

The two terms on the right-hand side of (15) correspond to two assumptions, i.e., the visual consistency and the ranking consistency, respectively. The first term, i.e., the regularization term, penalizes ranking score inconsistency among visually similar samples. The second term is the ranking distance term, which penalizes the deviation of the reranked results from the initial text-based search results.

For the regularization term, the local kernel is adopted

(16)

where is the local kernel matrix [33]. A point-wise distance is adopted for the ranking distance

(17)

With (16) and (17), we obtain

where with . Then, a closed-form solution for is given by

(18)
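To make the route from an energy function of this kind to a closed-form solution concrete, the following is a sketch under a simplifying assumption, not the paper's exact formulas. Suppose the regularizer is a quadratic form $\mathbf{r}^{T}\mathbf{L}\mathbf{r}$ for some symmetric positive semidefinite matrix $\mathbf{L}$ built from the graph (a generic stand-in for the local-kernel regularizer of (16)), and the point-wise distance is a weighted squared error with per-image weights $c_i > 0$. The symbols $\mathbf{r}$, $\bar{\mathbf{r}}$, $\mathbf{L}$, $\mathbf{C}$, and $c_i$ are notation chosen for this illustration.

```latex
\begin{align}
E(\mathbf{r})
  &= \mathbf{r}^{T}\mathbf{L}\mathbf{r} + \sum_{i=1}^{N} c_i\,(r_i - \bar{r}_i)^2
   = \mathbf{r}^{T}\mathbf{L}\mathbf{r}
     + (\mathbf{r}-\bar{\mathbf{r}})^{T}\mathbf{C}\,(\mathbf{r}-\bar{\mathbf{r}}),
  \qquad \mathbf{C} = \operatorname{diag}(c_1,\dots,c_N), \\
\nabla E(\mathbf{r})
  &= 2\,\mathbf{L}\mathbf{r} + 2\,\mathbf{C}\,(\mathbf{r}-\bar{\mathbf{r}}) = \mathbf{0}
  \;\Longrightarrow\;
  \mathbf{r}^{*} = (\mathbf{L}+\mathbf{C})^{-1}\,\mathbf{C}\,\bar{\mathbf{r}}.
\end{align}
```

Because $\mathbf{C}$ is positive definite and $\mathbf{L}$ is positive semidefinite, $\mathbf{L}+\mathbf{C}$ is invertible and the minimizer is unique. Under this assumed form, a very large $c_i$ pins $r_i^{*}$ close to $\bar{r}_i$, while a small $c_i$ lets the visual-consistency term dominate.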

When applying Bayesian reranking for active reranking, modifications are made to incorporate the newly obtained information, i.e., the images’ labels obtained from SInfo and the effective feature learned via LGD. For a labelled image, its is set as its ground truth label (“1” for relevant and “0” for irrelevant) and a large (set as 100 in this paper) is adopted to ensure equal or very close to its ground truth label. The graph is built with the learned to model the visual consistency precisely.

In active reranking, at the very beginning, Bayesian reranking is performed in the original feature space without labelled images. Then, with the derived , SInfo is conducted to select informative images for labeling. By interacting with the user, the labels of these images are obtained, with which the effective feature is learned via LGD. With the latest labelled image set as well as , Bayesian reranking is performed to derive a new . The final reranking result is obtained by sorting the images according to in descending order.

Usually, several interaction rounds are performed to achieve a satisfactory performance. Therefore, in the next interaction round, SInfo and LGD are performed with the new obtained in the last round. The overall procedure of our active reranking is summarized as follows:

1: Initialization: the image set , the number of interaction rounds T, labelled image set and .

2: /* Perform Bayesian reranking to get */ Bayesian reranking .

3: For to T do

1) /* Perform SInfo to select a set of images */ SInfo

/* Update */

2) /* Perform LGD to learn a new */

3) /* Perform Bayesian reranking to derive a new */ Bayesian reranking

4: End for

5: Return
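The control flow of the procedure above can be sketched as a short loop. This is a minimal skeleton under stated assumptions, not the authors' implementation: the four callables (`rerank`, `select_samples`, `ask_user`, `learn_projection`) and their signatures are hypothetical placeholders for Bayesian reranking, SInfo, the user interaction, and LGD, and the toy stand-ins below exist only to exercise the loop.

```python
def active_rerank(images, initial_scores, num_rounds,
                  rerank, select_samples, ask_user, learn_projection):
    """Run the rerank / select / label / learn loop for num_rounds rounds."""
    labels = {}          # image index -> 1 (relevant) or 0 (irrelevant)
    projection = None    # None means "use the original feature space"
    scores = rerank(images, initial_scores, labels, projection)
    for _ in range(num_rounds):
        queried = select_samples(images, scores, initial_scores, labels)
        labels.update(ask_user(queried))                 # user labels the queries
        projection = learn_projection(images, labels)    # e.g. a learned subspace
        scores = rerank(images, initial_scores, labels, projection)
    ranking = sorted(range(len(images)), key=lambda i: scores[i], reverse=True)
    return ranking, scores, labels


# Toy stand-ins (hypothetical), only to exercise the control flow.
truth = [1, 0, 1, 0, 1, 0]

def toy_rerank(images, init, labels, projection):
    # Labelled images take their label as score; others keep the initial score.
    return [float(labels[i]) if i in labels else init[i]
            for i in range(len(images))]

def toy_select(images, scores, init, labels):
    # Query the unlabelled image whose score is closest to 0.5.
    unlabelled = [i for i in range(len(images)) if i not in labels]
    return [min(unlabelled, key=lambda i: abs(scores[i] - 0.5))]

def toy_ask(queried):
    return {i: truth[i] for i in queried}

def toy_learn(images, labels):
    return None

ranking, scores, labels = active_rerank(
    list(range(6)), [0.9, 0.8, 0.6, 0.5, 0.3, 0.2], 2,
    toy_rerank, toy_select, toy_ask, toy_learn)
```

Passing the components as callables keeps the loop independent of any particular reranking, selection, or dimension reduction method, in line with the remark that SInfo and LGD can be combined with various reranking algorithms.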

VI. EXPERIMENTS ON SYNTHETIC DATASETS

In this section, we use three synthetic datasets to illustrate the effectiveness of the SInfo sample selection strategy, as shown in Fig. 5 (top). In each dataset, the relevant samples are marked with red stars (“*”) while the irrelevant ones are marked with blue circles (“o”).

The initial ranking score list was set randomly since we had no textual information to simulate the text-based search process. At the beginning stage, one relevant and one irrelevant sample were randomly selected as the labelled set and the rest were taken as the unlabelled set. The initial reranked results [“RerankInitial” curve in Fig. 5 (bottom)] were obtained by reranking without user interactions. Parameters in each method were determined empirically to achieve its best performance.

Fig. 5. Active reranking on synthetic datasets.

In each interaction round, only one sample was selected for labeling. For each dataset, we give the reranked results after 4 interaction rounds with different active sample selection strategies. We performed 100 random trials and report the averaged performance, measured by the widely used noninterpolated Average Precision (AP) [30]. The AP averages the precision values obtained at the rank of each relevant image.
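The AP measure described above can be sketched in a few lines. This is an illustrative implementation, assuming a ranked list of 0/1 relevance judgments and normalizing by the number of relevant images that appear in the list; the function name is hypothetical.

```python
def average_precision(relevance):
    """Non-interpolated AP of a ranked 0/1 relevance list (1 = relevant).

    Precision is evaluated at the rank of every relevant item and the
    values are averaged over the relevant items found in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank   # precision at this relevant item
    return precision_sum / hits if hits else 0.0


# Relevant images at ranks 1 and 3: AP = (1/1 + 2/3) / 2
ap = average_precision([1, 0, 1, 0])
```

A ranking that places all relevant images first gives AP = 1, and pushing a relevant image further down the list lowers the precision term it contributes.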

We compared SInfo with three other sample selection strategies, i.e., “Error Reduction” [43], “Most Uncertain” [4], and “Random”. In “Most Uncertain”, the most ambiguous samples are selected for interaction according to (3), while in “Random” the query samples are selected randomly. The comparison results, as shown in Fig. 5 (bottom), demonstrate that the proposed strategy outperforms the rival methods consistently on all three datasets. This is because “Error Reduction” and “Most Uncertain” suffer from the small sample size problem. SInfo is more robust because it takes both ambiguity and representativeness into consideration, and thus alleviates the influence of the small sample size problem.

VII. EXPERIMENTS ON WEB IMAGE SEARCH DATASET

We also conducted experiments on a real Web image search dataset. In this dataset, there are 105 queries carefully selected from a commercial image search engine query log as well as from popular Flickr tags. These queries cover a large range of topics, including named persons, named objects, general objects, and scenes. For each query, a maximum of 1000 images returned by commercial image search engines, i.e., Google, Live, and Yahoo, were collected as the initial text-based search results. This dataset contains 94 341 images in total. For each query, three participants were asked to judge whether the returned images are query relevant or irrelevant. An image is labelled as query relevant if at least two of the three participants judged it as relevant, and as irrelevant otherwise.
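The ground-truth rule stated above is a two-out-of-three majority vote. A minimal sketch, assuming each participant's judgment is encoded as 1 (relevant) or 0 (irrelevant); the function name and the sample judgments are hypothetical:

```python
def majority_label(judgments):
    """Return 1 (relevant) if at least two of the three 0/1 judgments are 1."""
    if len(judgments) != 3:
        raise ValueError("expected exactly three judgments")
    return 1 if sum(judgments) >= 2 else 0


# Four images, each judged by three participants (made-up judgments).
labels = [majority_label(j)
          for j in [(1, 1, 0), (0, 1, 0), (1, 1, 1), (0, 0, 0)]]
```

With an odd number of binary judgments a tie cannot occur, so every image receives a definite label.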

Images are represented by 428-D low-level visual features, including a 225-D color moment in LAB color space, a 128-D wavelet texture, and a 75-D edge distribution histogram. For the initial text search score list , because the images are all downloaded from Web search engines (e.g., Google, Live, and Yahoo), we only know the ranks of the images in the text-based search; their scores are not available. According to [14], the normalized rank is adopted as the pseudo score, for the th ranked image, where and is the number of images returned by the Web search engine for a query term.
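To illustrate how ranks can be turned into pseudo scores, the sketch below uses one common normalized-rank form, 1 - i/N for the i-th of N returned images. This specific formula is an assumption made for the example and should be checked against [14]; the function name is hypothetical.

```python
def normalized_rank_scores(num_images):
    """Pseudo scores 1 - i/N for ranks i = 1..N.

    Assumed form of the normalized rank: higher-ranked images get larger
    scores, and scores decrease linearly with the rank.
    """
    return [1.0 - i / num_images for i in range(1, num_images + 1)]


scores = normalized_rank_scores(4)
```

Any monotonically decreasing function of the rank would preserve the initial ordering; the linear form simply spreads the scores evenly over the interval.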

For active sample selection, five images were selected to interact with the user in each interaction round and four rounds were considered. Therefore, for each query, 20 images in total were labelled by the user. The performance is also measured by average precision (AP) [30]. We calculated the APs at different positions from top-1 to top-100 to obtain the AP curve. We averaged the APs over all 105 queries to get the mean average precision (MAP) for overall performance evaluation.

A. Active Reranking With SInfo

In this section, we investigate the effectiveness of the SInfo sample selection strategy and compare it with three other methods: “Error Reduction” [43], “Most Uncertain” [4], and “Random.” Note that here both the reranking and the active sample selection were conducted in the original feature space. The effectiveness of the LGD dimension reduction algorithm will be discussed in Section VII-B, in comparison with other representative algorithms.

Fig. 6 summarizes the comparison results. The “Baseline” curve gives the performance of the text-based search results and the “RerankInitial” curve is the performance of the unsupervised reranking without user interactions. The “SInfo”, “Error Reduction”, “Most Uncertain”, and “Random” curves denote the performances of the reranked results with query images selected according to these four strategies, respectively.

Fig. 6 shows the effectiveness of the proposed active reranking framework as well as the superiority of the proposed SInfo sample selection strategy. The curves in this figure show that the user’s labeling information helps enhance the reranking performance. User interactions improve the average performance, no matter which sample selection strategy is adopted.



Fig. 6. MAP over all queries with different sample selection strategies.

Moreover, among these four strategies, SInfo performs best and achieves a significant performance improvement. This is because SInfo considers both ambiguity and representativeness, while “Most Uncertain” and “Random” take only one of them into account. “Error Reduction” and “Most Uncertain” both suffer from the small sample size problem, while our method alleviates this influence by taking representativeness into account in an unsupervised manner.

B. Active Reranking With LGD

To test the effectiveness of LGD discussed in Section IV, we conducted the active reranking in the projected subspace by using different dimension reduction algorithms. The SInfo sample selection strategy was adopted in this experiment.

We compared LGD with several representative algorithms, including an unsupervised algorithm, i.e., PCA [13], supervised ones, i.e., BDA [41], LDE [5], and SLPP [2], as well as semi-supervised ones, i.e., SML [22], SDA [3], and LGD-LPP. The subspace dimension was empirically set to 100 for all algorithms. Fig. 7 shows the results. The “SInfo” curve denotes the reranked results of active reranking conducted in the original feature space without dimension reduction, with the samples selected via SInfo. This curve is identical to the “SInfo” curve in Fig. 6. The performance of reranking via different dimension reduction algorithms is denoted as SInfo+DR algorithm name, e.g., “SInfo+LGD” for the performance of LGD.

Fig. 7 shows that LGD performs best among these algorithms and achieves a more satisfactory performance than “SInfo”. It reflects the effectiveness of LGD in localizing the visual characteristics of the user intention. For the other dimension reduction algorithms, the reranking performance is either slightly improved or dramatically decreased. PCA fails to capture the user-driven intention since it ignores the labeling information. BDA, LDE, and SLPP, which are all supervised dimension reduction algorithms, utilize only a few labelled images. Thus, the subspace learned by them is biased toward that spanned by the few labelled images and cannot generalize well to the large number of unlabelled ones.

For semi-supervised algorithms, SDA is unsuitable for the reranking task because it assumes that images in an identical class are sampled from a Gaussian. However, in Web image search, each irrelevant image is irrelevant in its own way and thus images in the irrelevant class are not similar to each other, i.e., it is inappropriate to assume that irrelevant images come from an identical Gaussian. Therefore, SDA performed poorly. SML assumes that all images are sampled from a nonlinear manifold. In image search, irrelevant images usually scatter over the whole space, i.e., they may be distributed uniformly. SML is prone to over-fit to unlabelled images because of the improper manifold regularization assumption. To justify this point, we replaced the Laplacian regularization in SML with the global patches in LGD. This method is denoted as SML-PCA. The experimental results of SML-PCA with a varying trade-off parameter (which controls the influence of global patches) are given in Fig. 8. The figure shows that SML-PCA performs much better than SML, but not as well as LGD. The result of LGD-LPP further confirms that improper manifold regularization is harmful. In contrast, the proposed LGD learns the submanifold of the relevant images and overcomes the difficulties discussed above by preserving the local geometry of the labelled relevant images through local patches and the global structure of the whole image set via global patches.

In Figs. 19 and 20, we further illustrate the active reranked results on the queries “George W. Bush” and “zebra”. For each query, the top-20 ranked images are shown for both the text-based search result and the active reranked result. For clarity, we mark the query-irrelevant images appearing in the results with a cross. These figures show that the proposed active reranking method is effective in targeting the user’s intention.

Fig. 7. MAP over all queries with different dimension reduction algorithms.

Fig. 8. Performance of SML-PCA.

Fig. 9. Performance of LGD with samples selected via random and SInfo respectively.

C. LGD With Random Sample Selection

In Section VII-B, we have shown that, when samples are selected via SInfo, the performance of reranking conducted in the original feature space, i.e., the “SInfo” curve in Fig. 7, is consistently improved when LGD is utilized. As illustrated in Fig. 7, “SInfo+LGD” performed better than “SInfo”. To verify the sensitivity of LGD to the sample selection strategy, we further conducted experiments for LGD when samples were randomly selected. The experimental results are given in Fig. 9, in which the result of LGD with SInfo is also given for comparison. From this figure, we can see that “Random+LGD” outperforms “Random” and “SInfo+LGD” outperforms “SInfo”. This demonstrates the robustness of LGD to varying sample selection strategies. Further comparing the performance of LGD with “Random” and “SInfo”, we can see that “SInfo+LGD” achieves better performance than “Random+LGD”. This is because more informative samples are selected in “SInfo”, with which LGD can learn the user intentions more effectively. In other words, a better active sample selection algorithm brings more benefits to LGD. This phenomenon shows that both sample selection and dimension reduction are important for active reranking and thus should be carefully developed.

VIII. PARAMETER SENSITIVITY

In this section, we analyse the sensitivity of important parameters in SInfo and LGD for active reranking. The analyses are based on the experiments conducted on the Web image search dataset. The experiments are conducted with SInfo active sample selection and LGD dimension reduction, if not explicitly stated otherwise. We first analyse some important factors: in (3) for SInfo, in (7) and in (13) for LGD. We then investigate the influence of the interaction rounds of active sample selection and the dimension of the projected feature in LGD. The mean AP averaged over AP@1 to AP@100 is utilized for overall performance evaluation.

Fig. 10. Performance of SInfo with different values of the ambiguity trade-off parameter in (3). The solid line indicates the performance of “RerankInitial”, i.e., reranking without user interactions.

A. Evaluation on Ambiguity Trade-Off Parameter

The in (3) plays an important role in balancing the ambiguity estimation, which is one of the two critical aspects in SInfo. With close to 1, the ambiguity is derived entirely from the reranked result and the ambiguity contained in the text search prior is ignored. Fig. 10 shows the performance of SInfo subject to different . In this experiment, the reranking is conducted in the original feature space. The “RerankInitial”, i.e., reranking without user interactions, is also given for comparison, denoted by the solid line in Fig. 10.

Fig. 10 shows that the performance of SInfo increases when growing and arrives at the peak with . This value is close to the best setup for the text search prior that has been reported in other applications, which is around 0.85 [15], [17]. By further comparing with “RerankInitial”, we can see that SInfo outperforms it consistently no matter which is adopted. It illustrates the effectiveness of SInfo for reranking.

B. Evaluation on Local Patch Trade-Off Parameter

We also investigated the influence of the trade-off parameter in (7) for LGD when building the local patch for labelled relevant images. A large reflects the importance of separating irrelevant samples from relevant ones, i.e., the discriminative information, with less attention given to the local geometry of relevant images. Fig. 11 shows the performance of LGD with different , from which we make the following observations.

• When is small, e.g., less than 0.3, the performance is unsatisfactory and even worse than “SInfo” (solid line in Fig. 11). This is because in this situation the local geometry within labelled relevant images is mainly preserved while important discriminative information is less considered. This phenomenon reveals the importance of the discriminative information contained in the labelled relevant and irrelevant images.

• The performance of LGD increases when growing and reaches the optimal value at . However, the AP decreases when larger than this best setup and gives a steady performance when in which case the discriminative information dominates the local patch and the local geometry is ignored.

Fig. 11. Performance of LGD with different values of the local patch trade-off parameter in (7). The solid line indicates the performance of “SInfo”, i.e., active reranking in the original feature space without dimension reduction.

Fig. 12. Performance of LGD with different values of the local-global patch trade-off parameter in (13). The solid line indicates the performance of “SInfo”, i.e., active reranking in the original feature space without dimension reduction.

Therefore, the local geometry and the discriminative information reflect the information contained in local patches from different aspects and are complementary. A suitable combination of them is essential to achieve a good performance.

C. Evaluation on Local-Global Patch Trade-Off Parameter

Both the local and global patches reflect data information from different aspects. To investigate the contributions of these two parts, we have tested the performance of LGD with different trade-offs . When , only local patches are utilized. When , only global patches are involved and LGD degrades to PCA in this case. A proper is demanded to balance them. According to our empirical comparisons, the best setup for is 0.03, as shown in Fig. 12. The solid line in this figure indicates the performance of “SInfo”, i.e., reranking in the original feature space without dimension reduction. Fig. 12 shows that LGD outperforms it consistently with various and that LGD is robust.

Fig. 13. Average AP over the first three pages of results.

Fig. 14. Comparison of the average number of irrelevant images per query.

Fig. 15. Performance of LGD with different interaction rounds.

Fig. 16. Performance curves of the local patch trade-off parameter in (7) with different numbers of labelled images, panels (a)–(d).

As shown in Fig. 12, the improvement of LGD over PCA occurs in a range of [0.01, 0.05] for . This range seems to be a little narrow. However, it is worth emphasizing that only a small part of the images (around % in experiments, half for relevant and half for irrelevant images) are labelled. As a consequence, the number of global patches is much larger than that of the labelled relevant and irrelevant patches. After eliminating the patch number imbalance, the range for is moderate, i.e., it is around [1.0, 5.0].

For the comparison between LGD and PCA, in Fig. 12, we only give the overall performance of the mean AP averaged over the top-1 to top-100 ranked images. We refer the reader back to Fig. 7 for more details. Fig. 7 shows that LGD outperforms PCA consistently on the top-1 to top-100 ranked images. It is worth emphasizing that it is very difficult to improve the baselines for Web data based applications and a 1% improvement is usually acknowledged, e.g., TRECVID [19]. The top-20 images are important in Web search because they are displayed on the first page and dominate the user’s evaluation of the search results. Compared with PCA, much more improvement is obtained by LGD, i.e., LGD finds at least one more relevant image among the top-20 ranked images every five runs. This is practically significant. Fig. 13 shows the average performance of LGD versus PCA over the top-1 to top-20, top-21 to top-40, top-41 to top-60, and top-1 to top-60 ranked images, which correspond to the first 3 pages of results (assuming 20 images are displayed on each page). LGD improves on PCA consistently.

Besides the AP, another evaluation criterion [17] is also introduced for performance evaluation. It is the average number of irrelevant images per query among the top-k ranked results. Fig. 14 illustrates the statistical results. Among the top-20 ranked images, LGD gives an average of 2.26 irrelevant results, which represents about a 10% drop compared with the 2.51 obtained by SInfo. However, PCA gives 2.50 irrelevant results, which is very close to that given by SInfo. For the overall evaluation, compared with SInfo, LGD shows about a 10% drop consistently while PCA gives less than 5%.
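The relative drops quoted for the top-20 results follow directly from the three averages reported above. A short check, with a hypothetical helper name:

```python
def relative_drop(reference, value):
    """Fractional reduction of `value` relative to `reference`."""
    return (reference - value) / reference


# Average number of irrelevant images among the top-20, as reported in the text.
sinfo, lgd, pca = 2.51, 2.26, 2.50

lgd_drop = relative_drop(sinfo, lgd)   # (2.51 - 2.26) / 2.51, about 0.0996
pca_drop = relative_drop(sinfo, pca)   # (2.51 - 2.50) / 2.51, about 0.0040
```

So at top-20 LGD removes roughly 10% of the irrelevant images left by SInfo, while PCA removes about 0.4%.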

Finally, considering the complexity of Web images (collected from varying sources, taken from different viewpoints, with different sizes, qualities/resolutions, and complex backgrounds, and high diversity), this improvement is practically acceptable.

D. Evaluation on Number of Interaction Rounds for Active Sample Selection

More labelled images bring more information and thus a better performance can be achieved. However, users usually lose their patience after a few interaction rounds. Therefore, it is important to find a good trade-off between the reranking performance and the number of interaction rounds. In this experiment, we investigated the performance of reranking with interaction rounds varying from 1 to 20. In each round, 5 images are selected via SInfo for labeling. LGD is adopted to learn the effective subspace for reranking.

The experimental results are illustrated in Fig. 15. Zero interaction rounds means that the reranking is conducted without user interactions, i.e., “RerankInitial”. When the number of interaction rounds increases from 0 to 4, the performance improves dramatically and steadily. However, when more interactions are performed, the performance increases slowly and even decreases slightly at certain rounds. As a consequence, reranking with 4 interaction rounds is a good choice considering both the reranking performance and user tolerance.

E. Influence of Labelled Image Size on Model Parameters

In Sections VIII-B and C, we have discussed the influence of parameters and in LGD on the reranking performance when 20 images (4 interaction rounds with 5 images labelled per round) are labelled. In this section, we turn to investigate the influence of the number of labelled images on these model parameters. Fig. 16 shows the performance curves of with different numbers of labelled images while Fig. 17 illustrates that of .

The in (7) is utilized to balance the influence of the local geometry and the discriminative information in the labelled relevant patch. A larger indicates that more emphasis is assigned to separating the labelled relevant images from irrelevant ones, while a smaller reflects that more attention is assigned to the local geometry of relevant images. In Fig. 16(a), we can see that when only 5 images are labelled, a smaller (less than 0.3) gives better performance, which indicates that the local geometry is more important. Because the irrelevant images are much more diverse than the relevant ones, over-fitting may occur if more emphasis is assigned to the discriminative information with only a few labelled images. When more images are labelled, the discriminative information is more reliable and thus a larger is preferred. Fig. 16(c) and (d) show that the best performance is achieved when is around 1.0, which means the local geometry and the discriminative information are equally important.

Fig. 17. Performance curves of the local-global patch trade-off parameter in (13) with different numbers of labelled images, panels (a)–(d).

Fig. 18. Performance of LGD with features projected onto the subspaces with different dimensions.

The in (13) is utilized to control the influence of the global patches. Fig. 17(a) shows that a larger is preferred when fewer images are labelled. With few labelled images, little information is contained in them and thus the global patches play the main role. Fig. 17(d) shows that when the number of labelled images is increased, the discriminative information and the local geometry become robust and thus a smaller provides better performance.

F. Evaluation on Dimension of the Projected Subspace

LGD aims to learn a submanifold from the ambient visual feature space to express the user’s intention. To find a proper dimension for the projected feature, we conducted the following experiment to investigate the influence of the dimension. Fig. 18 shows the performance of LGD with features projected onto subspaces with different dimensions. When the dimension is too low, e.g., less than 50, the learned subspace is insufficient to encode the intention, so the reranking performance is poor. When the dimension equals or is close to that of the ambient feature space, i.e., 428 in this paper, little or no benefit can be obtained from LGD. In our experiments, the active reranking achieved its best performance with a dimension of 100, which gave a good trade-off. Besides, a lower dimension leads to a lower computational cost for active reranking.

Fig. 19. Query “George W. Bush”.

Fig. 20. Query “zebra”.

IX. CONCLUSION

This paper has presented a novel active reranking framework for Web image search by using user interactions. To target the user’s intention effectively and efficiently, we have proposed an active sample selection strategy and a dimension reduction algorithm, to reduce labeling efforts and to learn the visual characteristics of the intention, respectively. To select the most informative query images, the structural information based active sample selection strategy takes both the ambiguity and the representativeness into consideration. To learn the visual characteristics, a new local-global discriminative dimension reduction algorithm transfers the local information in the domain of the labelled images to the whole image database. The experiments on both synthetic datasets and a real Web image search dataset have demonstrated the effectiveness of the proposed active reranking scheme, including both the sample selection strategy and the dimension reduction algorithm.

REFERENCES

[1] D. Cai, X. He, J. Han, and H.-J. Zhang, “Orthogonal laplacianfacesfor face recognition,” IEEE Trans. Image Process., pp. 3608–3614,2006.

[2] D. Cai, X. He, and J. Han, Using Graph Model for Face Analysis, Tech.Rep., 2005, Comput. Sci. Dept., Univ. Illinois, Urbana-Champaign.

[3] D. Cai, X. He, and J. Han, “Semi-supervised discriminant analysis,” inProc. IEEE Int. Conf. Computer Vision, 2007, pp. 1–8.

[4] E. Y. Chang, S. Tong, K. Goh, and C.-W. Chang, “Support vectormachine concept-dependent active learning for image retrieval,” IEEETrans. Multimedia, 2005.

[5] H.-T. Chen, H.-W. Chang, and T. L. Liu, “Local discriminant embed-ding and its variants,” in IEEE Int. Conf. Computer Vision and PatternRecognition, 2005, pp. 846–853.

[6] J. Cui, F. Wen, and X. Tang, “Real time google and live image searchre-ranking,” presented at the ACM Int. Conf. Multimedia, 2008.

[7] R. A. Fisher, “The use of multiple measurements in taxonomic prob-lems,” Ann. Eugen., pp. 179–188, 1936.

[8] Y. Fu and T. Huang, “Image classification using correlation tensoranalysis,” IEEE Trans. Image Process., pp. 226–234, 2008.

[9] Y. Fu, S. Yan, and T. Huang, “Correlation metric for generalizedfeature extraction,” IEEE Trans. Pattern Anal. Mach. Intell., pp.2229–2235, 2008.

[10] X. He, D. Cai, and J. Han, “Learning a maximum margin subspace forimage retrieval,” IEEE Trans. Knowl. Data Eng., pp. 189–201, 2008.

[11] X. He and P. Niyogi, “Locality preserving projections,” Adv. NeuralInf. Process. Syst., 2003.

[12] S. C. H. Hoi and M. R. Lyu, “A semi-supervised active learning frame-work for image retrieval,” in Proc. IEEE Int. Conf. Computer Visionand Pattern Recognition, 2005, pp. 302–309.

[13] H. Hotteling, “Analysis of a complex of statistical variables into prin-cipal components,” J. Ed. Psych., pp. 417–441, 1933.

[14] W. H. Hsu, L. S. Kennedy, and S.-F. Chang, “Video search rerankingvia information bottleneck principle,” in Proc. ACM Int. Conf. Multi-media, 2006, pp. 35–44.

[15] W. H. Hsu, L. S. Kennedy, and S.-F. Chang, “Video search rerankingthrough random walk over document-level context graph,” in Proc.ACM Int. Conf. Multimedia, 2007, pp. 971–980.

[16] H. D. III and D. Marcu, “Domain adaptation for statistical classifiers,”J. Artif. Intell. Res., pp. 101–126, 2006.

[17] Y. Jing and S. Baluja, “Pagerank for product image search,” in Proc.Int. Conf. World Wide Web, 2008, pp. 307–316.

[18] T. Joachims, “Transductive inference for text classification using sup-port vector machines,” in Proc. Int. Conf. Machine Learning, 1999, pp.200–209.

[19] L. S. Kennedy and S.-F. Chang, “A reranking approach for context-based concept fusion in video indexing and retreival,” in Proc. ACMInt. Conf. Image and Video Retrieval, 2007, pp. 333–340.

[20] D. D. Lewis and W. A. Gale, “A sequential algorithm for training textclassifiers,” in Proc. ACM Int. Conf. Research and Development in In-formation Retrieval, 1994, pp. 3–12.

[21] X. Li, S. Lin, S. Yan, and D. Xu, “Discriminant locally linear embed-ding with high-order tensor data,” IEEE Trans. Syst., Man, Cybern. B,Cybern., pp. 342–352, 2008.

[22] Y.-Y. Lin, T.-L. Liu, and H.-T. Chen, “Semantic manifold learning forimage retrieval,” in Proc. ACM Int. Conf. Multimedia, 2005, pp. 06–11.

[23] W. Liu, D. Tao, and J. Liu, “Transductive component analysis,” in Proc.IEEE Int. Conf. Data Mining Series, 2008, pp. 433–442.

[24] I. J. Myung and M. A. Pitt, “Applying occam’s razor in modeling cog-nition: A bayesian approach,” Psych. Bull. Rev., 1997.

[25] H. T. Nguyen and A. Smeulders, “Active learning using pre-clustering,”in Proc. Int. Conf. Machine Learning, 2004, pp. 623–630.

[26] E. Parzen, “The annals of mathematical statistics,” On Estimation of aProbability Density Function and Mode, pp. 1065–1076, 1962.

[27] S. T. Roweis and L. K. Saul, “Nonlinear dimensionality reduction bylocally linear embedding,” Science, pp. 2323–2326, 2000.

[28] D. Tao, X. Li, X. Wu, and S.-J. Maybank, “Geometric mean for sub-space selection,” IEEE Trans. Pattern Anal. Mach. Intell., pp. 260–274,2009.

[29] D. Tao and X. Tang, “Nonparametric discriminant analysis in relevance feedback for content-based image retrieval,” in Proc. IEEE Int. Conf. Pattern Recognition, 2004, pp. 1013–1016.

[30] TREC-10 Proceedings Appendix on Common Evaluation Measures [Online]. Available: http://trec.nist.gov/pubs/trec10/appendices/measures.pdf

[31] J. B. Tenenbaum, V. de Silva, and J. C. Langford, “A global geometric framework for nonlinear dimensionality reduction,” Science, pp. 2319–2323, Dec. 2000.

[32] S. Thrun and T. M. Mitchell, “Learning one more thing,” in Proc. Int. Joint Conf. Artificial Intelligence, 1995, pp. 1217–1225.

[33] X. Tian, L. Yang, J. Wang, X. Wu, and X.-S. Hua, “Transductive video annotation via local learnable kernel classifier,” in Proc. IEEE Int. Conf. Multimedia & Expo, 2008, pp. 1509–1512.

[34] X. Tian, L. Yang, J. Wang, Y. Yang, X. Wu, and X.-S. Hua, “Bayesian video search reranking,” in Proc. ACM Int. Conf. Multimedia, 2008, pp. 131–140.

[35] V. N. Vapnik, Statistical Learning Theory. New York: Wiley, 1998.

[36] L. Wang, K. L. Chan, and Z. Zhang, “Bootstrapping SVM active learning by incorporating unlabelled images for image retrieval,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2003, pp. 629–634.

[37] D. Xu, S. Yan, D. Tao, and H.-J. Zhang, “Marginal fisher analysis and its variants for human gait recognition and content-based image retrieval,” IEEE Trans. Image Process., pp. 2811–2821, 2007.

[38] T. Zhang, D. Tao, X. Li, and J. Yang, “Patch alignment for dimensionality reduction,” IEEE Trans. Knowl. Data Eng., pp. 1299–1313, 2009.

[39] T. Zhang, D. Tao, and J. Yang, “Discriminative locality alignment,” in Proc. European Conf. Computer Vision, 2008, pp. 725–738.

[40] Z. Zhang and H. Zha, “Principal manifolds and nonlinear dimensionality reduction via tangent space alignment,” SIAM J. Sci. Comput., pp. 313–338, 2004.

[41] X. S. Zhou and T. S. Huang, “Small sample learning during multimedia retrieval using biasmap,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, 2001, pp. 11–17.

[42] X. Zhu, Z. Ghahramani, and J. Lafferty, “Semi-supervised learning using Gaussian fields and harmonic functions,” in Proc. Int. Conf. Machine Learning, 2003, pp. 912–919.

[43] X. Zhu, J. Lafferty, and Z. Ghahramani, “Combining active learning and semi-supervised learning using Gaussian fields and harmonic functions,” in Proc. Int. Conf. Machine Learning, 2003, pp. 58–65.

Xinmei Tian received the B.S. degree in 2005 from the University of Science and Technology of China, Hefei, where she is currently pursuing the Ph.D. degree in the Department of Electronic Engineering and Information Science.

From December 2007 to July 2008, she was a Research Intern with the Internet Media Group at Microsoft Research Asia, Beijing. From August 2008 to December 2008, she was a Research Assistant with the School of Computing, the Hong Kong Polytechnic University. Her current research interests include computer vision, content-based video analysis, and image/video search reranking.



Dacheng Tao (M’07) received the B.Eng. degree from the University of Science and Technology of China, the M.Phil. degree from the Chinese University of Hong Kong, and the Ph.D. degree from the University of London, London, U.K.

Currently, he is a Nanyang Assistant Professor with the School of Computer Engineering, Nanyang Technological University. His research mainly applies statistics and mathematics to data analysis problems in computer vision, data mining, machine learning, multimedia, and video surveillance. He has published more than 100 scientific articles in venues including the IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, the IEEE TRANSACTIONS ON IMAGE PROCESSING, the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, CVPR, ECCV, ICDM, ACM TKDD, Multimedia, KDD, etc.

Dr. Tao has received best paper awards and best paper finalist honors. He holds the K. C. Wong Education Foundation Award of the Chinese Academy of Sciences.

Xian-Sheng Hua (M’04) received the B.S. and Ph.D. degrees from Peking University, Beijing, China, in 1996 and 2001, respectively, both in applied mathematics.

When he was at Peking University, his major research interests were in the areas of image processing and multimedia watermarking. Since 2001, he has been with Microsoft Research Asia, Beijing, where he is currently a Lead Researcher with the Internet Media Group. He is also an Adjunct Professor at the University of Science and Technology of China. His current interests are in the areas of video content analysis, multimedia search, management, authoring, sharing, and advertising. He has authored more than 100 publications at prestigious international conferences and journals including ACM MM, CVPR, and the IEEE TRANSACTIONS ON MULTIMEDIA. He also has 30 filed or issued patents.

Dr. Hua received the Best Paper Award and the Best Demonstration Award from the ACM International Conference on Multimedia in 2007. He is a member of the Association for Computing Machinery and serves as an Associate Editor of the IEEE TRANSACTIONS ON MULTIMEDIA and Editorial Board Member of Multimedia Tools and Applications.

Xiuqing Wu received the B.S. degree from the University of Science and Technology of China, Hefei, in 1965.

She is a Professor in the Department of Electronic Engineering and Information Science, University of Science and Technology of China. From 1985 to 1986, she was a Visiting Scientist in the Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA. Her research interests include intelligent information processing, multiresource data fusion, and digital image analysis.

