Transductive Regression Piloted by Inter-Manifold Relations

Huan Wang [email protected]

IE, The Chinese University of Hong Kong, Hong Kong

Shuicheng Yan [email protected]

Thomas Huang [email protected]

ECE, University of Illinois at Urbana Champaign, USA

Jianzhuang Liu [email protected]

Xiaoou Tang [email protected]

IE, The Chinese University of Hong Kong, Hong Kong

Abstract

In this paper, we present a novel semisupervised regression algorithm working on multi-class data that may lie on multiple manifolds. Unlike conventional manifold regression algorithms that do not consider the class distinction of samples, our method introduces the class information into the regression process and tries to exploit the similar configurations shared by the label distributions of multi-class data. To utilize the correlations among data from different classes, we develop a cross-manifold label propagation process and employ labels from different classes to enhance the regression performance. The inter-class relations are coded by a set of inter-manifold graphs, and a regularization item is introduced to impose inter-class smoothness on the possible solutions. In addition, the algorithm is further extended with the kernel trick for predicting labels of out-of-sample data even without class information. Experiments on both synthesized data and real world problems validate the effectiveness of the proposed framework for semisupervised regression.

1. Introduction

Large scale and high dimensional data are ubiquitous in real-world applications, yet the processing and analysis of these data are often difficult due to the curse of dimensionality as well as the high computational cost involved. Usually, these high dimensional data lie approximately on an underlying compact low dimensional manifold, which may make the problem tractable. Substantial work has been devoted to unveiling the intrinsic structure of manifold data; popular methods include ISOMAP (Tenenbaum et al., 2000), LLE (Roweis & Saul, 2000), Laplacian Eigenmap (Belkin & Niyogi, 2003), and MVU (Weinberger et al., 2004).

Appearing in Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, 2007. Copyright 2007 by the author(s)/owner(s).

Besides these unsupervised algorithms that purely explore the manifold structure of the data, researchers have also utilized the latent manifold structure information from both labeled and unlabeled samples to enhance learning algorithms when only a limited number of labeled samples is available. Great success has been achieved in various areas, such as classification (Krishnapuram et al., 2005; Belkin et al., 2005), manifold alignment (Ham et al., 2005) and regression (Belkin et al., 2004; Zhu & Goldberg, 2005). Recently, increasing attention has been drawn to the semisupervised regression problem by considering the manifold structure. Chapelle et al. (1999) proposed a transductive algorithm minimizing the leave-one-out error of ridge regression on the joint set composed of both labeled and unlabeled data. To exploit the manifold structure, Belkin et al. (2004) add a graph Laplacian regularization item to the regression objective, which imposes an extra smoothness condition along the data manifold and has proved quite useful in applications. Cortes & Mohri (2007) first roughly transduce the function values from the labeled data to the unlabeled ones utilizing local neighborhood relations, and then optimize a global objective that best fits the labels of the training points as well as the estimated labels provided by the first step.

None of these state-of-the-art regression algorithms makes use of the class information to guide the regression. In real-world applications, however, multi-class samples are ubiquitous, and samples from different classes can be regarded as lying on multiple manifolds. For example, in age estimation, the personal aging process differs between genders, but there still exists consistency between the aging processes of males and females. In pose estimation from images of multiple persons, each person can be regarded as one class, and the images vary similarly with pose across persons. Moreover, we may also encounter multi-modality samples whose representations may not be consistent due to the lack of correspondence, while the inner manifold configuration of the label distribution across different modalities may still be similar. Besides, in the function learning stage of the semisupervised regression framework, the class or modality information is usually easy to obtain, so it is desirable to utilize this information to further boost the regression accuracy. To fully exploit the relations among data manifolds, we develop in this paper a semisupervised regression algorithm called TRIM (Transductive Regression piloted by Inter-Manifold Relations). TRIM is based on the assumption that the data of different classes share a similar configuration of label distributions. In our proposed algorithm, manifolds of different classes are aligned according to landmark connections constructed from both the sample label distance and manifold structural similarity, and then the labels are transduced across different manifolds based on the alignment output. Besides the intra-manifold smoothness condition within each class, our method introduces an inter-manifold regularization item and employs the transduced labels from various data manifolds to pilot the trend of the function values over all the samples. In addition, the function to be learnt can be approximated by functionals lying in a Reproducing Kernel Hilbert Space (RKHS), and the regression on the induced RKHS directly leads to an efficient solution for label prediction on out-of-sample data even without corresponding class information.

2. Background

2.1. Intuitive Motivations

Consider the problem of adult height estimation based on the parents' heights. A dramatic difference exists between male and female adults, while the general configuration of the height distribution within each gender may still be similar: the taller the parents are, the taller the children are expected to be. Once we mix up the data from both genders, however, the height distribution may become complicated. Figure 1 displays a toy example of the obstacles encountered in the multi-class regression problem. We have two classes of samples, and the function values within each class manifold share similar configurations, while the number of samples and labels may vary across classes. As shown in the upper part of Figure 1, it is difficult to get a satisfactory regression if we put all the samples together, due to the complex structures produced by the manifold intersections. On the other hand, if the regression is carried out separately on different classes, the accuracy could still be low due to the lack of sufficient labeled samples for each class. Also, for incoming out-of-sample data, the prediction cannot be done without class information. Moreover, we may face some regression problems with multi-modality samples, e.g., the estimation of human ages from both photos and sketch images. It is meaningless to compose data of different modalities into one manifold since the semantics of the features from different modalities are essentially different, while it is still possible that the intra-modality label configurations are similar. In this paper we focus on the multi-class regression problem, but our framework can also be easily extended to give predictions for multi-modality data.

Figure 1. Demonstration of the inter-class relations. Labeled samples are marked by '+' and the inter-manifold relations are indicated by the dashed lines. The manifolds are aligned and the sample labels are propagated across manifolds to pilot the regression.

2.2. Related Work

The multi-view learning (Blum & Mitchell, 1998) technique can also be applied to the regression problem with multi-class data. Denote the labeled and unlabeled instances as $x_l \in \mathcal{X}^l \subseteq \mathcal{X}$ and $x_u \in \mathcal{X}^u \subseteq \mathcal{X}$ respectively, where $\mathcal{X}^l, \mathcal{X}^u, \mathcal{X}$ are the labeled, unlabeled and whole sample sets. State-of-the-art semisupervised multi-view learning frameworks employ multiple regression functions from the Hilbert space $\mathcal{H}_v$ so that the estimation error on the training set and the disagreement among the functions on the unlabeled data are minimized (Brefeld et al., 2006), i.e.,

$$\tilde{f}_v\big|_{v=1}^{M_l} = \arg\min_{f_v \in \mathcal{H}_v} \sum_{v=1}^{M_l} \Big( \sum_{x_l \in \mathcal{X}^l} c\big(f_v(x_l), y(x_l)\big) + \gamma \|f_v\|^2 \Big) + \sum_{v_1, v_2=1}^{M_l} \sum_{x_u \in \mathcal{X}^u} c\big(f_{v_1}(x_u) - f_{v_2}(x_u)\big), \tag{1}$$

where $y(x)$ is the sample label, $f_v|_{v=1}^{M_l}$ are the $M_l$ multiple learners and $c(\cdot)$ is a cost function.

We would like to highlight beforehand some properties of our framework compared to multi-view learning algorithms:

1. There exists a clear correspondence between the multi-view data from the same instance in the multi-view learning framework, while our algorithm does not require correspondence among different manifolds. The data of different modalities or classes may be obtained from different instances in our configuration, thus it is much more challenging to give an accurate regression.

2. The class information is utilized in two ways: a) sample relations within each class are coded by intra-manifold graphs, and a corresponding regularization item is introduced to ensure within-class smoothness separately; b) a set of inter-manifold graphs is constructed from the cross-manifold label propagation on the aligned manifolds, and an inter-manifold regularization item is proposed to fully exploit the information conveyed among different classes.

3. The class information is used in the function learning phase, but no class attributes are required for the out-of-sample data in the prediction stage.

3. Problem Formulation

3.1. Notations

Assume that the whole sample set $\mathcal{X}$ consists of $N$ samples from $M$ classes, denoted as $\mathcal{X}^1, \mathcal{X}^2, \ldots, \mathcal{X}^M$. For each class $\mathcal{X}^k$, $N^k = l^k + u^k$ samples are given, i.e.,

$$\mathcal{X}^k = \{(x^k_1, y^k_1), \ldots, (x^k_{l^k}, y^k_{l^k}), (x^k_{l^k+1}, y^{*k}_{l^k+1}), \ldots, (x^k_{l^k+u^k}, y^{*k}_{l^k+u^k})\},$$

where $Y^k_l = [y^k_1, y^k_2, \ldots, y^k_{l^k}]^T$ represents the function values of the given labeled samples and $Y^k_u = [y^{*k}_{l^k+1}, \ldots, y^{*k}_{l^k+u^k}]^T$ corresponds to the function values of the remaining unlabeled data to be estimated in the $k$th class. In this paper, the 'label' refers to a real value to be regressed.

Let $G^k = (V^k, E^k)$ denote the intra-class graph with vertex set $V^k$ and edge set $E^k$ constructed within the data of the $k$th class. Here we focus on undirected graphs; it is easy to generalize our algorithm to directed graphs. The edges in $E^k$ reflect the neighborhood relations along the manifold, which can be defined in terms of $k$-nearest neighbors or an $\epsilon$-ball distance criterion in the sample feature space $\mathcal{F}$. One choice for the non-negative weights on the corresponding edges is to use the heat kernel (Belkin & Niyogi, 2003) or the inverse of feature distances (Cortes & Mohri, 2007), i.e.,

$$w_{ij} = e^{-\frac{\|\Phi^k(x_i)-\Phi^k(x_j)\|^2}{t}} \quad \text{or} \quad w_{ij} = \|\Phi^k(x_i) - \Phi^k(x_j)\|^{-1},$$

where $t \in \mathbb{R}$ is the parameter for the heat kernel and $\Phi^k(\cdot)$ is a feature mapping from $\mathcal{X}$ to the normed feature vector space $\mathcal{F}$ for the samples of the $k$th class. For samples from different modalities/manifolds, the feature mappings may be different. Another choice is to solve a least-squares problem that minimizes the reconstruction error (Roweis & Saul, 2000):

$$w_{ij} = \arg\min_{w_{ij}} \Big\|x_i - \sum_j w_{ij} x_j\Big\|^2, \quad \text{s.t.} \ \sum_j w_{ij} = 1, \ w_{ij} \geq 0.$$
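To make the graph construction concrete, the sketch below builds a symmetric k-nearest-neighbor graph with heat-kernel weights, following the first weighting choice above. This is a minimal NumPy rendering under our own naming and defaults (the experiments in Section 5 use 10 nearest neighbors), not the authors' implementation.

```python
import numpy as np

def heat_kernel_graph(X, n_neighbors=10, t=1.0):
    """Symmetric kNN graph with heat-kernel weights w_ij = exp(-||xi-xj||^2 / t).

    A minimal sketch of the intra-manifold graph construction; `t` and
    `n_neighbors` are illustrative defaults, not values fixed by the paper.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all samples.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        # Skip the sample itself, keep its n_neighbors closest points.
        neighbors = np.argsort(sq_dists[i])[1:n_neighbors + 1]
        W[i, neighbors] = np.exp(-sq_dists[i, neighbors] / t)
    # Symmetrize, since the text focuses on undirected graphs.
    return np.maximum(W, W.T)
```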

To encode the mutual relations among different sample classes, we also introduce the inter-manifold graph, denoted as a triplet $G^{k_i k_j} = (V^{k_i}, V^{k_j}, E^{k_i k_j})$. The inter-manifold graph $G^{k_i k_j}$ is a bipartite graph with one vertex set from the $k_i$th class and the other from the $k_j$th. The construction of $G^{k_i k_j}$ will be discussed in the following subsections.

3.2. Regularization along Data Manifold

Now we are given the sample data of multiple classes, denoted as $\mathcal{X}$, including both labeled and unlabeled samples $\mathcal{X}^l$ and $\mathcal{X}^u$. The manifold structure within each class is encoded by the intra-class graph. (Belkin et al., 2004) introduced a manifold regularization item, the graph Laplacian, into the semisupervised regression framework; it is expected to impose smoothness conditions along the manifold on the possible solutions. The final formulation seeks a balance between the fitting item and the smoothness regularization, i.e.,

$$\tilde{f} = \arg\min_f \frac{1}{l} \sum_i (f_i - y_i)^2 + \gamma f^T L^p f, \tag{2}$$

where $p \in \mathbb{N}$. When $p = 1$, the regularization item is the graph Laplacian, and for $p = 2$ the regularization turns out to be the 2-norm of the reconstruction error when the weights $w_{ij}$ are normalized, i.e.,

$$f^T L^T L f = \sum_i \Big\|f_i - \sum_j w_{ij} f_j\Big\|^2 \quad \text{w.r.t.} \ \sum_j w_{ij} = 1, \ w_{ij} \geq 0.$$

3.3. Cross Manifold Label Propagation

In our configuration, there does not exist a clear correspondence among the manifolds of different classes or modalities, and even the representations of different modalities can be distinct. Thus it is rather difficult to construct sample relations directly from similarity in the sample space $\mathcal{X}$. Alternatively, the function labels still convey some correspondence information, which may be utilized to guide the inter-manifold relations. Moreover, the manifold structure also contains some indications about the correspondence. Thus we first seek a point-to-point correspondence for the labeled data, combining the indications provided by both the sample labels and the manifold structures. This is done under two assumptions:

1. Samples with similar labels generally lie in similar relative positions on the corresponding manifolds.

2. Corresponding sample sets tend to share similar graph structures on their respective manifolds.

3.3.1. Reinforced Landmark Correspondence

First we search for a set of stable landmarks to guide the manifold alignment. Specifically, we use the $\epsilon$-ball distance criterion on the sample labels to initialize the inter-manifold graphs. To give a robust correspondence, we reinforce the inter-manifold connections by iteratively applying

$$W^{k_i k_j} \Leftarrow W^{k_i} \times W^{k_i k_j} \times W^{k_j}. \tag{3}$$

Similar to the similarity propagation process on directed graphs (Blondel et al., 2004), (3) reinforces the similarity score of a sample pair by the similarity of its neighbor pairs, i.e.,

$$w^{k_i k_j}_{ij} \Leftarrow \sum_{m,n} w^{k_i}_{im} w^{k_i k_j}_{mn} w^{k_j}_{nj}. \tag{4}$$
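A minimal sketch of the reinforcement iteration (3)–(4) is given below. The normalization after each multiplication keeps the scores bounded; it is a common stabilization in similarity propagation (Blondel et al., 2004) and our assumption here, since the text does not specify one.

```python
import numpy as np

def reinforce_correspondence(W_a, W_ab, W_b, n_iter=10):
    """Iterate Eq. (3): neighbors of similar pairs become similar.

    W_a, W_b : intra-manifold weight matrices of the two classes
    W_ab     : initial inter-manifold graph from the label epsilon-ball
    """
    for _ in range(n_iter):
        W_ab = W_a @ W_ab @ W_b                        # Eq. (4), in matrix form
        W_ab = W_ab / (np.linalg.norm(W_ab) + 1e-12)   # assumed normalization
    return W_ab
```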

Figure 2. Demonstration of reinforced landmark correspondence. The similarity score of $a_1$ and $b_2$ is reinforced by the scores of the neighbor pairs.

The assumption here is that two nodes tend to be similar if they have similar neighbors. (4) utilizes the intra-manifold structure information to reinforce the inter-manifold similarity and thus can generate a more robust correspondence. The accuracy of the landmark correspondence is critical for our algorithm, so to ensure robust performance, only the correspondences with the top 20% largest similarity scores are selected as landmarks. Since it is common for some classes to miss certain sample labels, plenty of labeled samples remain unmatched.

3.3.2. Manifold Alignment

To propagate the sample labels to the unlabeled points across manifolds, we 'stretch' all the manifolds with respect to the landmark points obtained in the previous step; this can be realized by semisupervised manifold alignment (Ham et al., 2005).

In the manifold alignment process, we seek an embedding that minimizes the correspondence error on the landmark points and at the same time keeps the intra-manifold structures, i.e.,

$$\tilde{f}^{k_i}\big|_{k_i=1}^{M} = \arg\min \frac{C(f^{k_i}|_{k_i=1}^{M})}{\sum_{k_i} f^{k_i T} D^{k_i} f^{k_i}}, \tag{5}$$

with

$$C(f^{k_i}|_{k_i=1}^{M}) = \sum_{k_i k_j} \sum_{i,j} w^{k_i k_j}_{ij} \Big\|f^{k_i}_{x^{k_i}_i} - f^{k_j}_{x^{k_j}_j}\Big\|^2 + \gamma \sum_{k_i=1}^{M} f^{k_i T} L^p_{k_i} f^{k_i} + \beta f^T L^a f, \tag{6}$$

where $D^{k_i}$ is the diagonal degree matrix, $L_{k_i}$ is the graph Laplacian matrix for the $k_i$th class and $p \in \mathbb{N}$. To ensure the inter-class adjacency, we add a global compactness regularization item $\beta f^T L^a f$ to the cost function $C$, where $L^a$ is the Laplacian matrix of $W^a$ with

$$w^a_{ij} = \begin{cases} 1 & \text{if } x_i \text{ and } x_j \text{ are of different classes} \\ 0 & \text{otherwise.} \end{cases}$$
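One standard way to realize Eq. (5) is as a generalized eigenproblem: once the correspondence, smoothness and compactness terms of Eq. (6) are collected into a single joint quadratic form $f^T L_{joint} f$ over all samples, the quotient is minimized by the bottom generalized eigenvectors of $(L_{joint}, D)$. The sketch below assumes the caller has assembled $L_{joint}$ and the block-diagonal degree matrix $D$; it illustrates the mechanism and is not necessarily the authors' exact solver.

```python
import numpy as np
from scipy.linalg import eigh

def align_manifolds(L_joint, D_joint, n_dims=2):
    """Embed all samples so that the alignment cost of Eq. (6) is minimized
    subject to the degree normalization of Eq. (5) (a sketch).

    L_joint : joint quadratic form combining correspondence, intra-manifold
              smoothness, and the global compactness term (assumed prebuilt)
    D_joint : block-diagonal matrix of the per-class degree matrices D^{k_i}
    """
    # Generalized symmetric eigenproblem; eigenvalues are returned ascending.
    vals, vecs = eigh(L_joint, D_joint)
    # Drop the trivial near-constant eigenvector at eigenvalue ~ 0.
    return vecs[:, 1:n_dims + 1]
```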

Algorithm 1 Procedure to construct inter-manifold connections

1: Inter-manifold graph initialization:
$$w^{k_i k_j}_{ij} = \begin{cases} 1 & \text{if } \|y^{k_i}_i - y^{k_j}_j\|^2 < \epsilon \\ 0 & \text{otherwise} \end{cases}$$
2: Correspondence reinforcement:
for Iter = 1 : N_Iter
    $W^{k_i k_j} = W^{k_i} \times W^{k_i k_j} \times W^{k_j}$
end
3: Landmark selection: select the sample pairs with the top 20% largest similarity scores as correspondences; set the corresponding elements in $W^{k_i k_j}$ to 1 and the others to 0.
4: Manifold alignment using the inter-manifold graphs $W^{k_i k_j}$.
5: Find the corresponding points from different classes for the unmatched labeled samples using the nearest-neighbor approach in the aligned space and update the inter-manifold graphs $W^{k_i k_j}$.
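Step 3 of Algorithm 1 (landmark selection) can be sketched as below; restricting the threshold to non-zero scores and the tie-breaking rule are our assumptions.

```python
import numpy as np

def select_landmarks(W_ab, keep_ratio=0.2):
    """Keep the pairs with the top 20% largest reinforced similarity scores
    as landmark correspondences and binarize the graph (a sketch)."""
    scores = W_ab.ravel()
    nonzero = scores[scores > 0]
    if nonzero.size == 0:
        return np.zeros_like(W_ab)
    k = max(1, int(keep_ratio * nonzero.size))
    thresh = np.sort(nonzero)[-k]          # score of the k-th largest pair
    return (W_ab >= thresh).astype(float)
```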

The labels are propagated across manifolds on the derived aligned manifolds using the nearest-neighbor approach, i.e., we connect the labeled samples with the nearest points from other classes in the aligned space. The derived inter-manifold graphs are concatenated to form

$$W^r = \begin{pmatrix} O & W^{12} & \ldots & W^{1M} \\ W^{21} & O & \ldots & W^{2M} \\ \ldots & \ldots & \ldots & \ldots \\ W^{M1} & W^{M2} & \ldots & O \end{pmatrix}, \tag{7}$$

which is then symmetrized and employed in the following inter-manifold regularization item.
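The block assembly of Eq. (7) is pure bookkeeping over class offsets. A sketch, assuming every class appears in at least one block (so its size can be read off) and that `blocks` maps ordered class pairs $(i, j)$, $i < j$, to the corresponding $W^{ij}$:

```python
import numpy as np

def assemble_inter_manifold_graph(blocks):
    """Concatenate pairwise inter-manifold graphs into W^r (Eq. (7))
    and symmetrize it. `blocks[(i, j)]` is the N_i x N_j matrix W^{ij}."""
    M = max(max(pair) for pair in blocks) + 1
    sizes = [0] * M
    for (i, j), B in blocks.items():
        sizes[i], sizes[j] = B.shape[0], B.shape[1]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    Wr = np.zeros((offsets[-1], offsets[-1]))
    for (i, j), B in blocks.items():
        Wr[offsets[i]:offsets[i + 1], offsets[j]:offsets[j + 1]] = B
    return np.maximum(Wr, Wr.T)            # the symmetrization used below
```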

3.4. Inter-Manifold Regularization

We rearrange the samples to place the data from the same class together and to put the labeled points first within each class, that is,

$$\mathcal{X} = \{x^1_1, x^1_2, \ldots, x^1_{l^1}, x^1_{l^1+1}, \ldots, x^1_{l^1+u^1}, \ x^2_1, \ldots, x^2_{l^2}, x^2_{l^2+1}, \ldots, x^2_{l^2+u^2}, \ \ldots, \ x^M_1, \ldots, x^M_{l^M}, x^M_{l^M+1}, \ldots, x^M_{l^M+u^M}\}.$$

Denote the corresponding function values as $f$, which is a concatenation of $f^k = [f^k_{x^k_1}, \ldots, f^k_{x^k_{l^k+u^k}}]^T$ from all the classes. Our regression objective is defined as:

$$\tilde{f} = \arg\min_f \sum_k \frac{1}{l^k} \sum_{x^k_i \in \mathcal{X}^l} \big\|f^k_{x^k_i} - y^k_i\big\|^2 + \beta \sum_k \frac{1}{(N^k)^2} (f^k)^T L^p_k f^k + \frac{\lambda}{N^2} f^T L^r f, \tag{8}$$

where $L^r$ is the Laplacian matrix of the symmetrized $W^r$.

The minimization of the objective is achieved when

$$\tilde{f} = R^{-1} \sum_k \frac{1}{l^k} (S^k_{l^k} S^k)^T Y^k_l, \tag{9}$$

where

$$S^k_{l^k} = \begin{pmatrix} I_{l^k \times l^k} & O_{l^k \times u^k} \end{pmatrix}, \qquad S^k = \begin{pmatrix} O_{N^k \times \sum_{k_i=1}^{k-1} N^{k_i}} & I_{N^k \times N^k} & O_{N^k \times \sum_{k_i=k+1}^{M} N^{k_i}} \end{pmatrix}$$

are label and class selection matrices respectively, and

$$R = \sum_k \frac{1}{l^k} (S^k_{l^k} S^k)^T (S^k_{l^k} S^k) + \beta \sum_k \frac{1}{(N^k)^2} S^{kT} L^p_k S^k + \frac{\lambda}{N^2} L^r.$$
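The selection matrices $S^k_{l^k}$ and $S^k$ merely scatter per-class quantities into global coordinates, so Eq. (9) can be computed without forming them explicitly. The sketch below accumulates $R$ and the right-hand side class by class; the argument layout and names are ours, and samples are assumed ordered class by class with labeled points first, as above.

```python
import numpy as np

def trim_closed_form(Lk_list, Lr, Yl_list, labeled_idx_list,
                     beta=1e-3, lam=0.1, p=1):
    """Closed-form TRIM solution of Eq. (9) (a sketch).

    Lk_list          : intra-class Laplacians, one N^k x N^k matrix per class
    Lr               : N x N Laplacian of the symmetrized W^r
    Yl_list          : labels per class
    labeled_idx_list : global indices of each class's labeled samples
    """
    Nk = [L.shape[0] for L in Lk_list]
    offsets = np.concatenate(([0], np.cumsum(Nk)))
    N = offsets[-1]
    R = lam / N**2 * Lr                       # inter-manifold regularizer
    b = np.zeros(N)
    for k, (Lk, yl, idx) in enumerate(zip(Lk_list, Yl_list, labeled_idx_list)):
        idx = np.asarray(idx)
        lk = len(yl)
        # Fitting term: (1/l^k) on the labeled diagonal entries of class k.
        R[idx, idx] += 1.0 / lk
        b[idx] += np.asarray(yl) / lk
        # Intra-manifold smoothness: (beta / (N^k)^2) S^kT L_k^p S^k.
        sl = slice(offsets[k], offsets[k + 1])
        R[sl, sl] += beta / Nk[k]**2 * np.linalg.matrix_power(Lk, p)
    return np.linalg.solve(R, b)
```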

4. Regression on Reproducing Kernel Hilbert Space (RKHS)

A series of algorithms such as SVM, ridge regression and LapRLS (Belkin et al., 2005) apply different regularization items and empirical cost measures to the objective and solve the optimization problem in an appropriately chosen Reproducing Kernel Hilbert Space (RKHS). One merit of regression on an RKHS is the ability to predict out-of-sample labels.

Let $K$ denote a Mercer kernel $K: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ and $\mathcal{H}_K$ the induced RKHS of functions $\mathcal{X} \to \mathbb{R}$ with norm $\|\cdot\|_K$. The regression with inter-manifold regularization on the RKHS is defined as

$$\tilde{f} = \arg\min_{f \in \mathcal{H}_K} \sum_k \frac{1}{l^k} \sum_{x^k_i \in \mathcal{X}^l} \big\|f^k_{x^k_i} - y^k_i\big\|^2 + \gamma \|f\|^2_K + \beta \sum_k \frac{1}{(N^k)^2} (f^k)^T L^p_k f^k + \frac{\lambda}{N^2} f^T L^r f. \tag{10}$$

Similar to Tikhonov regularization, we add an RKHS norm penalization item to the TRIM objective as a smoothness condition.


Figure 3. Regression on the nonlinear Two Moons dataset. (a) Original function value distribution. (b) Traditional graph Laplacian regularized regression with separate regressors for different classes (MAE: 586.46). (c) Two-class TRIM (MAE: 231.87). (d) Two-class TRIM on RKHS (MAE: 393.78). Note the difference in the area indicated by the rectangle.

For the multi-class data under the same modality, we have the following theorem:

Theorem 1. The solution to the minimization problem (10) admits an expansion

$$\tilde{f}(x) = \sum_{i=1}^{N} \alpha_i K(x_i, x), \qquad N = \sum_k (l^k + u^k). \tag{11}$$

Theorem 1 is a special version of the Generalized Representer Theorem (Schölkopf et al., 2001) and the proof is omitted here. It says that the minimizer of (10) can be expressed as a linear expansion of $K(x_i, x)$ over both labeled and unlabeled data from all the sample classes. Thus the minimization over the Hilbert space boils down to minimizing over the coefficient vector $\alpha = [\alpha^1_1, \ldots, \alpha^1_{l^1}, \ldots, \alpha^1_{l^1+u^1}, \ldots, \alpha^M_1, \ldots, \alpha^M_{l^M}, \ldots, \alpha^M_{l^M+u^M}]^T \in \mathbb{R}^N$, and the minimizer is given by:

$$\tilde{\alpha} = J^{-1} \sum_k \frac{1}{l^k} (S^k_{l^k} S^k)^T Y^k_l, \tag{12}$$

where

$$J = \sum_k \frac{1}{l^k} (S^k_{l^k} S^k)^T (S^k_{l^k} S^k) K + \gamma I + \beta \sum_k \frac{1}{(N^k)^2} S^{kT} L^p_k S^k K + \frac{\lambda}{N^2} L^r K,$$

and $K$ is the $N \times N$ Gram matrix of the labeled and unlabeled points over all the sample classes.

For out-of-sample data, the estimated labels can be obtained using

$$y_{new} = \sum_{i=1}^{N} \tilde{\alpha}_i K(x_i, x_{new}). \tag{13}$$

Note that in this framework the class information of the incoming sample is not required in the prediction stage.
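In code, Eqs. (12)–(13) reduce to one linear solve against the Gram matrix: since the matrix $R$ of Eq. (9) collects exactly the data-dependent terms, $J = RK + \gamma I$ and $\tilde{\alpha} = (RK + \gamma I)^{-1} b$ with the same right-hand side $b$ as in the linear case. A sketch reusing the accumulation from the previous snippet (names are ours):

```python
import numpy as np

def kernelized_trim(K, R, b, gamma=1e-3):
    """Expansion coefficients of Eq. (12) and the out-of-sample predictor
    of Eq. (13) (a sketch).

    K : N x N Gram matrix over all labeled and unlabeled samples
    R : data-dependent matrix accumulated as in `trim_closed_form`
    b : right-hand side sum_k (1/l^k) (S^k_{l^k} S^k)^T Y^k_l
    """
    alpha = np.linalg.solve(R @ K + gamma * np.eye(K.shape[0]), b)

    def predict(k_new):
        # k_new[i] = K(x_i, x_new); no class information is needed here.
        return k_new @ alpha

    return alpha, predict
```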

5. Experiments

We performed experiments on two synthetic datasets and one real-world regression problem, human age estimation. Comparisons are made with the traditional graph Laplacian regularized semisupervised regression (Belkin & Niyogi, 2003). We also evaluate the generalization performance of the multi-class regression on RKHS. In the experiments, the intra-manifold graphs are constructed using 10 nearest neighbors and the inter-manifold graphs are constructed by following the procedure described in Section 3. For all the configurations, the parameter $\beta$ for the intra-manifold graphs is empirically set to 0.001 and the $\lambda$ for the inter-manifold regularization is set to 0.1. For the kernelized algorithms, the coefficient for the RKHS norm is $\gamma = 0.001$ and the Gaussian kernel $K(x, y) = \exp\{-\|x - y\|^2/\delta_o^2\}$ with parameter $\delta_o = 2^{1/2.5}\delta$ is applied, where $\delta$ is the standard deviation of the sample data. We use the Mean Absolute Error (MAE) criterion to measure the regression accuracy; it is defined as the average of the absolute errors between the estimated labels and the ground truth labels.
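For reference, the experimental kernel and error criterion are straightforward to reproduce; the helper names below are ours.

```python
import numpy as np

def gaussian_kernel(X, Y=None, delta_o=None):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / delta_o^2); delta_o
    defaults to 2^(1/2.5) times the standard deviation of the data,
    matching the setting quoted above."""
    Y = X if Y is None else Y
    if delta_o is None:
        delta_o = 2 ** (1 / 2.5) * X.std()
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / delta_o**2)

def mean_absolute_error(y_est, y_true):
    """MAE: average absolute error between estimates and ground truth."""
    return np.mean(np.abs(np.asarray(y_est) - np.asarray(y_true)))
```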

5.1. Synthetic Data: Nonlinear Two Moons

The nonlinear two moons dataset is shown in Figure 3. The colors in the figure are associated with the function values, and the variation of the sample labels along the manifold is not uniform. The labeled samples are marked by '+' and their distributions are quite different across classes. As we can see, the sample labels for the class lying in the upper part of the figure are not enough to give accurate guidance for the regression on the nonlinear label distribution. The traditional graph Laplacian regularized regression algorithm does not make use of the information conveyed by the inter-class similarity and its prediction result is not satisfactory, while in our algorithm the sample labels from different classes can be utilized to guide the regression and thus the estimation accuracy is much higher.


Figure 4. Regression on the Cyclone dataset. (a) Original function values. (b) Traditional graph Laplacian regularized regression with separate regressors for different classes (MAE: 248.50). (c) Three-class TRIM (MAE: 144.11). (d) Three-class TRIM on RKHS (MAE: 107.64).

5.2. Synthetic Data: Three-class Cyclones

One merit of our algorithm is that the regression accuracy may be boosted as the class number increases, because the cross-manifold information that can be utilized grows rapidly with the number of classes. The Cyclone dataset consists of three classes of samples, and the class distribution is demonstrated in Figure 5. The label distributions among different classes are quite similar, while the labeled samples scatter differently from class to class. As we may observe from Figure 4, without the inter-class regularization the regression for a certain class may fail due to the lack of sufficient labeled samples, while our algorithm still gives a satisfying performance.

Figure 5. Class distribution of the Cyclone dataset.

5.3. Human Age Estimation

In this experiment we consider age estimation from facial images as a regression problem. In real applications, we often cannot obtain enough age labels even though we may easily get plenty of facial images; those unlabeled data can be used to guide the regression in the semisupervised framework. One challenge for this estimation is caused by the gender difference. Although males and females may share some similar configurations in the age distribution, just as demonstrated in Figure 1, mixing the two may complicate the regression problem. On the other hand, the gender information is usually easy to obtain, and it is thus desirable to use both the age labels and the gender information to guide the regression.

The aging dataset used in this experiment is the Yamaha database, which contains 8000 Japanese facial images of 1600 persons with ages ranging from 0 to 93. Each person has 5 images, and the Yamaha database is divided into two parts, with 4000 images from 800 males and another 4000 images from 800 females. The ages are distributed evenly from 0 to 69 for 3500 male images and 3500 female images, and the rest fall in the age gap from 71 to 93. We randomly sampled 1000 photos from the male and female subsets respectively, so altogether 2000 images are used in our experiments. Before the regression, the input image data is preprocessed with Principal Component Analysis and the first 20 dimensions are used for data projection. To fully evaluate the regression performance on both close set samples and out-of-sample data, we design two experimental configurations for this dataset.

Configuration 1: Close Set Evaluation. In this configuration there is no out-of-sample data, and the close set performance of TRIM is evaluated against the traditional graph Laplacian regularized regression (Belkin et al., 2004). The close set contains altogether 2000 samples, with 1000 images from males and 1000 images from females. We vary the number of randomly selected labeled samples and examine the performance of the different regression algorithms. The comparison results between TRIM and the traditional single-class Laplacian regularized regression are shown in Figure 6. We can observe that our algorithm generally gives a higher regression accuracy, and the performance improvement is remarkable especially when the number of labeled samples is small. As the number of sample labels increases, the difference between the two algorithms becomes smaller. This may be explained as follows: when the labels are sparse, the label guidance is far from sufficient, so the class information and inter-manifold relations have a great influence on the regression accuracy; when the labels are abundant enough to guide the regression, the class information as well as the inter-manifold connections becomes less important.

Figure 6. TRIM vs. traditional graph Laplacian regularized regression for the close set evaluation on the Yamaha database.

Configuration 2: Open Set Evaluation. Now we examine the out-of-sample prediction performance of the kernelized TRIM compared with the kernelized version of the traditional graph Laplacian regularized regression (Belkin et al., 2005). In this configuration, the sample set is divided into two subsets: one for training the regression function and the other for evaluating the age estimation performance on out-of-sample data. The training set contains 800 randomly selected male and 800 female images, and the remaining ones are used for the out-of-sample evaluation. In the testing phase we do not input any gender information. As demonstrated in Figure 7, our algorithm achieves a lower MAE on both the training and testing sets. Another observation, similar to the close set configuration, is that the regression accuracy improvement grows as the sample labels become sparser.

6. Conclusions

In this paper, we have presented a novel algorithm dedicated to the regression problem on multi-class data. Manifolds constructed from different classes are regularized separately, and to utilize the inter-manifold relations, we developed an efficient cross-manifold label propagation method so that the labels from different classes can be employed to pilot the regression. Moreover, the regression function is further extended by the kernel trick to predict the labels of out-of-sample data without class information. Both synthetic experiments and real-world applications demonstrate the superiority of the proposed framework over state-of-the-art semisupervised regression algorithms. To the best of our knowledge, this is the first work to discuss the semisupervised regression problem on multi-class data.

7. Acknowledgement

The work described in this paper was supported by the Research Grants Council of the Hong Kong SAR (Project No. CUHK 414306) and DTO Contract NBCHC060160 of the USA.

Figure 7. Open set evaluation of the kernelized regression on the Yamaha database. (a) Regression on the training set. (b) Regression on out-of-sample data.

References

Belkin, M., Matveeva, I., & Niyogi, P. (2004). Regularization and semi-supervised learning on large graphs. COLT.

Belkin, M., & Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation.

Belkin, M., Niyogi, P., & Sindhwani, V. (2005). On manifold regularization. AISTATS.

Blondel, V. D., Gajardo, A., Heymans, M., Senellart, P., & Van Dooren, P. (2004). A measure of similarity between graph vertices: Applications to synonym extraction and web searching. SIAM Review.

Blum, A., & Mitchell, T. (1998). Combining labeled and unlabeled data with co-training. COLT.

Brefeld, U., Gärtner, T., Scheffer, T., & Wrobel, S. (2006). Efficient co-regularised least squares regression. ICML.

Chapelle, O., Vapnik, V., & Weston, J. (1999). Transductive inference for estimating values of functions. NIPS.

Cortes, C., & Mohri, M. (2007). On transductive regression. NIPS.

Ham, J., Lee, D., & Saul, L. (2005). Semisupervised alignment of manifolds. AISTATS.

Krishnapuram, B., Williams, D., Xue, Y., Hartemink, A., & Carin, L. (2005). On semi-supervised classification. NIPS.

Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science.

Schölkopf, B., Herbrich, R., & Smola, A. J. (2001). A generalized representer theorem. COLT/EuroCOLT.

Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science.

Weinberger, K. Q., Sha, F., & Saul, L. K. (2004). Learning a kernel matrix for nonlinear dimensionality reduction. ICML.

Zhu, X., & Goldberg, A. B. (2005). Semi-supervised regression with order preferences. Computer Sciences TR.