Research Article

Unsupervised Domain Adaptation Using Exemplar-SVMs with Adaptation Regularization

Yiwei He,1 Yingjie Tian,2,3,4 Jingjing Tang,5 and Yue Ma5

1 School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
2 Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing 100190, China
3 Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China
4 School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190, China
5 School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China

Correspondence should be addressed to Yingjie Tian; [email protected]

Received 28 August 2017; Accepted 18 February 2018; Published 22 April 2018

Academic Editor: Shirui Pan

Copyright © 2018 Yiwei He et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Domain adaptation has recently attracted attention in visual recognition. It assumes that source and target domain data are drawn from the same feature space but different marginal distributions, and its motivation is to utilize source domain instances to assist in training a robust classifier for target domain tasks. Previous studies have focused on reducing the distribution mismatch across domains. However, in many real-world applications there are also problems of sample selection bias among instances within a domain, which reduce the generalization performance of learners. To address this issue, we propose a novel model named Domain Adaptation Exemplar Support Vector Machines (DAESVMs), based on exemplar support vector machines (exemplar-SVMs). Our approach aims to address the problems of sample selection bias and domain adaptation simultaneously. Compared with usual domain adaptation problems, we go a step further in relaxing the i.i.d. assumption. First, we formulate DAESVMs to train classifiers that reduce the Maximum Mean Discrepancy (MMD) among domains by mapping data into a latent space while preserving properties of the original data; then, we integrate the classifiers to make predictions for target domain instances. Experiments conducted on the Office and Caltech10 datasets verify the effectiveness of the proposed model.

1. Introduction

Over the past decades, machine learning technologies have achieved significant success in various areas, such as computer vision [1], natural language processing [2], and video detection [3]. However, traditional machine learning methods assume that training and testing data come from the same domain, meaning that they are drawn from the same distribution and represented in the same feature space. This assumption is often violated in the real world, as collecting suitable and sufficient labeled data is time consuming and requires expensive manual effort. Lacking labeled data, most traditional machine learning methods lose their generalization performance in practice. Therefore, it is desirable to utilize data from a related domain to help train a robust learner for the target domain. Driven by this requirement, transfer learning has developed rapidly in recent years [4].

Transfer learning relaxes the assumption of traditional machine learning that data and labels are drawn from the same distribution and represented in the same feature space. In transfer learning settings, domains are only assumed to be similar or related, or may even have no relationship, in place of the i.i.d. assumption. Thus, transfer learning is strongly motivated both for extending classical machine learning methods and for applying them to real-world applications, and it can be regarded as a supplement to classical machine learning. One motivation is the problem of covariate shift, or sample selection bias. Another motivation is to train a universal or general model as a predictor for all tasks, viewed as parameter or learner sharing; this is also considered a goal of Artificial General Intelligence. Transfer learning aims to utilize source or related domains to help target domain tasks.
It has achieved significant success in various practical applications, such as face recognition [5], natural language processing [6], cross-language text classification [7], WiFi localization [8], and medical imaging [9].

Domain adaptation is a subproblem of transfer learning which assumes that source and target domain data are generated from the same feature and label spaces but different marginal probability distributions. It aims to solve problems where there is little or no labeled data in the target domain, usually using labeled data in the source domain to assist the training of target domain tasks. Many works focus on domain adaptation problems, and they extend to applications such as WiFi localization, text sentiment analysis, and image classification across multiple domains. Since distribution mismatch generally exists in real-world applications, other research areas are also concerned with domain adaptation. For example, the extreme learning machine (ELM) is an efficient model for training single-hidden-layer networks [10], and there are ELM works in a domain adaptation setting [11, 12]. Most previous domain adaptation classifiers add a constraint term based on instance reweighting to minimize the Maximum Mean Discrepancy (MMD) [13]. However, these methods need to assume that the difference between the source and target domains is not too large; namely, this idea requires that the different domains be similar.

Most pattern recognition problems can be transformed into several basic classification tasks. Generally speaking, classification tasks assume that a category can be represented by a hyperplane [14, 15], and most machine learning algorithms aim to learn hyperplanes to predict unseen instances. Meanwhile, to improve the representational ability of a hyperplane, some works first cluster the samples and then solve the classification tasks on the clusters. In contrast to category-level classification, a cluster-level classifier can include more information about the positive category, but with a greater risk of overfitting. Motivated by object detection, [16] proposed an extreme classification model, named exemplar support vector machines (exemplar-SVMs), that trains a classifier for every positive instance against all the negative instances. In fact, exemplar-SVMs can be viewed as an extreme case of cluster-level SVMs in which every positive sample is regarded as a cluster. There are two viewpoints on why exemplar-SVMs achieve such surprising generalization performance. One viewpoint takes the exemplar-SVMs as a representation with complete details of the positive instances: every classifier captures details of its positive instance, such as background, corner, color, or orientation, and together the classifiers describe the category more intrinsically. From the transfer learning viewpoint, the training data cannot satisfy the underlying i.i.d. assumption, as every instance in the training set may differ from the others, namely, sample selection bias [17]. Each exemplar-SVM classifier is trained on one highly weighted positive sample and the negative samples, so it can represent the positive sample well within the same distribution. Recently, [18] extended exemplar-SVMs into a transfer learning form that uses loss function reweighting and adds a low-rank regularization term for the classifiers.

In this work, we propose a novel model to address unsupervised domain adaptation problems, in which target domain data are unlabeled; furthermore, it permits distribution mismatch among instances. In our model, we train kernel exemplar classifiers for every positive instance and then integrate the classifiers to make predictions for target domain data. To align the distribution mismatch, we embed a regularization term based on TCA in our classifiers. In our view, the model constructs a bridge to transfer knowledge: we use the information in the kernel matrix, which encodes the instance representations in the high-dimensional space, to assist classifier training across domains. For the problem of sample selection bias, we integrate the classifiers to make predictions; essentially, the integration step expands the representation of the hyperplanes so as to take full advantage of the details learned before.

Our contributions are as follows. (1) We propose a novel unsupervised domain adaptation model based on exemplar-SVMs, named Domain Adaptation Exemplar Support Vector Machines (DAESVMs), which improves standard domain adaptation prediction accuracy by transferring knowledge across domains. (2) Every DAESVM classifier constructs a bridge that transmits knowledge from the source domain to the target domain. Compared with the traditional two-step method, this strategy thoroughly searches for the optimization point of the model, which makes the classification hyperplane more precise across domains. (3) To solve the problem of sample selection bias, we use ensemble methods to integrate the classifiers. The ensemble process is similar to slackening the classification hyperplane: it drops unreliable classification results and uses the reliable parts to make a prediction. (4) We bring the pseudo-label method into DAESVMs, inspired by [19], to supplement the information of the target domain, and our experiments verify the effectiveness of the pseudo labels. (5) We push a step further and extend DAESVMs to multidomain adaptation.

The rest of this paper is organized as follows. In Section 2, we introduce the notation of the problem and review the related works on domain adaptation, exemplar-SVMs, and Transfer Component Analysis (TCA). In Section 3, we introduce the derivation of DAESVM and formulate the model. In Section 4, we propose the optimization algorithm for our model. In Section 5, we integrate all the DAESVM classifiers to make predictions. In Section 6, we analyze experiments on transfer learning datasets to verify the effectiveness of DAESVMs. In Section 7, we conclude our work and discuss future directions.

2. Notation and Related Works

This section introduces the notation and related works for this paper.

2.1. Notation. In this paper, we use the notation of [4] for transfer learning; the definition considers the condition of one source domain and one target domain. First, we define the Domain and the Task. A domain $\mathcal{D}$ is composed of a feature space $\mathcal{X}$ and a marginal probability distribution $P(x)$, namely $\mathcal{D} = \{\mathcal{X}, P(x)\}$, $x \in \mathcal{X}$. A task $\mathcal{T}$ is composed of a label space $\mathcal{Y}$ and a prediction model $f(x)$, namely $\mathcal{T} = \{\mathcal{Y}, f(x)\}$, $y \in \mathcal{Y}$; from the view of probability, $f(x) = P(y \mid x)$. Frequently used notations are summarized in the Notations and Descriptions section. The definition of transfer learning is as follows. Given source domain data $\mathcal{D}_S = \{(x_{S_1}, y_{S_1}), \ldots, (x_{S_{n_S}}, y_{S_{n_S}})\}$ with a source task $\mathcal{T}_S$, and unlabeled target domain data $\mathcal{D}_T = \{(x_{T_1}), \ldots, (x_{T_{n_T}})\}$ with a target task $\mathcal{T}_T$, transfer learning aims to utilize $\mathcal{D}_S$ and $\mathcal{D}_T$ to help train a robust prediction model $f_T(x)$ under the condition $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$.

2.2. Domain Adaptation. As a subproblem of transfer learning, domain adaptation has achieved great success and is utilized in many applications. It assumes that source and target domain data have the same feature space, label space, and prediction function, which from the view of probability means equal conditional probability distributions, namely $f_S(x) = f_T(x)$ or $P_S(y \mid x) = P_T(y \mid x)$. It is generally agreed that approaches to domain adaptation can be divided into three groups: reweighting approaches, feature transfer approaches, and parameter-sharing approaches.

(1) Reweighting Approaches. In transfer learning tasks, the basic idea of utilizing the source data to help train a target predictor is to reduce the discrepancy between the source and target data as far as possible. Under the assumption that the source and target domains have many overlapping features, a conventional method is to reweight or select source domain instances to correct the marginal probability distribution mismatch. Based on the Maximum Mean Discrepancy (MMD), a metric of distance between distributions, [20] proposed a technique called Kernel Mean Matching (KMM), revising the weight of every instance to minimize the MMD between the source and target domains. Similarly to KMM, [21] used the same idea but a different metric to adjust the discrepancy between domains. Reference [22] used the AdaBoost strategy to update the weights of source domain data, increasing the weight of instances in favor of the classification task; it also introduced generalization error bounds of the model based on PAC learning theory. More recently, [23] used a two-step approach: first sample the instances that are similar to other domains as landmarks, and then use these landmarks to map the data into a high-dimensional space in which the domains overlap more. Reference [24] solved the same problem but slackened the similarity assumption, assuming that there are no relationships between the source and target domains. The model named Selective Transfer Machine (STM) reweights instances of personal faces to train a generic classifier. Most instance-based transfer learning techniques use KMM to measure the difference between distributions, and these methods are applied in many areas, such as facial action unit detection [25] and prostate cancer mapping [26].

(2) Feature Transfer Approaches. Compared with instance-based approaches, feature-based approaches slacken the similarity assumption. They assume that the source and target domains share some features, named shared features, while each domain also has its own features, named specific features [27]. For example, consider using movie reviews to help a sentiment classification task on sofa reviews: the word "comfortable" is usually nonzero in the sofa domain features but usually zero in the movie domain features, so it is a specific feature of the sofa domain. Feature transfer approaches aim to find a shared latent subspace where the distance between the source and target domains is minimized. Reference [28] proposed an unsupervised domain adaptation approach named Geodesic Flow Kernel (GFK) based on the kernel method: GFK maps data into Grassmann manifolds and constructs geodesic flows to reduce the mismatch among domains, effectively exploiting intrinsic low-dimensional structures of the data. To solve problems of cross-domain natural language processing (NLP), [29] proposed a general method, structural correspondence learning (SCL), to learn a discriminative predictor by identifying correspondences among features in domains; SCL first finds the pivot features and then links the shared features with each other. Reference [7] learned a predictor by mapping the target kernel matrix to a submatrix of the source kernel matrix. Deep neural networks are used not only for learning essential features but also for domain adaptation: [30] proposed a neural network architecture for domain adaptation named Deep Adaptation Network (DAN) and extended it to joint adaptation networks (JAN) [31], and [32] discussed transferable domain features in deep neural networks.

(3) Parameter-Based Approaches. The core idea of parameter-based approaches is to transfer parameters from source to target domain tasks, assuming that different domains share some parameters that can be utilized across domains. Reference [33] proposed the Adaptive Support Vector Machine (A-SVM) as a general method to adapt to new domains: A-SVM first trains an auxiliary classifier and then learns the target predictor based on the original parameters. Reference [34] reweighted the predictions of the source classifier on the target domain using a signed distance between domains.

2.3. Exemplar Support Vector Machines. Exemplar-SVMs [16] were proposed for object detection and achieve high performance. They train a classifier for every positive instance against all the negative instances; every positive instance is an exemplar, and its corresponding classifier can be viewed as a representation of that positive instance. In the prediction process, every classifier predicts a value for the test instance, a calibration function is applied to the values, and the class of the highest-scoring classifiers gives the prediction. Exemplar-SVMs address the difficulty that a single hyperplane can hardly represent a category of instances, using an extreme strategy to train the predictor. In [35], the training processes are gathered into one model with nuclear norm regularization for the scenario of domain generalization, which assumes the target domain is unseen; the model is also extended to the problems of domain generalization and multiview learning [36, 37]. In [38], the two hyperparameters are reduced to one and exemplar-SVMs are extended to a kernel form.


2.4. Transfer Component Analysis. Reference [39] proposed a dimension reduction method called maximum mean discrepancy embedding (MMDE). By minimizing the distance between the source and target domain data distributions in a shared latent space, the source domain data are utilized to assist training a classifier for the target domain. MMDE not only minimizes the distance between the domains in the latent space but also preserves the properties of the data by maximizing the data variance. Based on MMDE, [40] extended the method to handle unseen instances and to reduce the computational complexity of MMDE. Essentially, TCA simplifies the process of learning the kernel matrix by transforming the initial kernel matrix instead; the optimization problem is solved by the $m$ leading eigenvectors of the objective matrix.

3. Domain Adaptation Exemplar Support Vector Machine

In this section, we present the formulation of the Domain Adaptation Exemplar Support Vector Machine (DAESVM). In the remainder of this paper, we use a lowercase letter in boldface to represent a column vector and an uppercase letter in boldface to represent a matrix. The notation of Section 2 is extended: we use $\mathbf{x}^+_i$, $i \in \{1, \ldots, n^+_S\}$, where $n^+_S$ is the number of positive instances, to represent a positive instance, and $\mathbf{x}^-_j$, $j \in \{1, \ldots, n^-_S\}$, where $n^-_S$ is the number of negative instances, to represent a negative instance. The set of negative samples is written as $N^-$. This section introduces the formulation of one exemplar classifier; in fact, we train as many exemplar classifiers as there are source domain instances, and the method that integrates these classifiers is proposed in Section 5.

3.1. Exemplar-SVM. The exemplar-SVM is built on the extreme idea of training one classifier for each positive instance against all the negative instances and then calibrating the outputs of the classifiers into a probability distribution to separate the samples; the model trains as many classifiers as there are positive instances. Learning a classifier that separates a positive instance from all the negative instances can be modeled as

$$f(\mathbf{w}, b) = \|\mathbf{w}\|^2 + C_1 h\left(\mathbf{w}^\top \mathbf{x}^+ + b\right) + C_2 \sum_{\mathbf{x}^-_i \in N^-} h\left(-\mathbf{w}^\top \mathbf{x}^-_i - b\right), \tag{1}$$

where $\|\cdot\|$ is the 2-norm of a vector, $C_1$ and $C_2$ are tradeoff parameters, corresponding to $C$ in SVM, that balance the positive and negative error costs, and $h(x) = \max(0, 1 - x)$ is the hinge loss function.

Formulation (1) is the primal problem of the exemplar-SVM; to utilize the kernel method, we turn to the dual problem, which can be written as follows [38]:

$$\begin{aligned} \min_{\boldsymbol{\alpha}} \quad & \boldsymbol{\alpha}^\top \mathbf{K} \boldsymbol{\alpha} - \mathbf{e}^\top \boldsymbol{\alpha} \\ \text{s.t.} \quad & \alpha_0 - \sum_{i=1}^{n^-_S} \alpha_i = 0, \\ & 0 \le \alpha_0 \le C_1, \qquad 0 \le \alpha_i \le C_2, \; \forall i \ge 1, \end{aligned} \tag{2}$$

where $\boldsymbol{\alpha} = (\alpha_0, \alpha_1, \ldots, \alpha_{n^-_S}) \in \mathbb{R}^{n^-_S + 1}$ are the Lagrangian multipliers and $\mathbf{e}$ is the all-ones vector. We take this model as an exemplar learner. The matrix $\mathbf{K} \in \mathbb{R}^{(n^-_S+1)\times(n^-_S+1)}$ is composed of

$$\mathbf{K} = \begin{bmatrix} k(\mathbf{x}^+, \mathbf{x}^+) & -\mathbf{k}^\top \\ -\mathbf{k} & \bar{\mathbf{K}} \end{bmatrix}, \qquad \mathbf{k} \in \mathbb{R}^{n^-_S}, \quad k_i = k(\mathbf{x}^+, \mathbf{x}^-_i), \quad \bar{K}_{ij} = k(\mathbf{x}^-_i, \mathbf{x}^-_j). \tag{3}$$
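For concreteness, the following is a minimal sketch (not the authors' released code) of the basic exemplar-SVM training loop using scikit-learn, where per-class weights play the role of the tradeoffs $C_1$ and $C_2$ in (1); the RBF bandwidth `gamma` is our assumption.

```python
import numpy as np
from sklearn.svm import SVC

def train_exemplar_svms(X_pos, X_neg, C1=1.0, C2=0.01, gamma=1.0):
    """Train one RBF exemplar-SVM per positive instance, cf. Eq. (1):
    the single positive sample is cost-weighted by C1, negatives by C2."""
    classifiers = []
    y = np.hstack([[1], -np.ones(len(X_neg), dtype=int)])
    for x in X_pos:
        X = np.vstack([x[None, :], X_neg])  # exemplar stacked over all negatives
        clf = SVC(kernel="rbf", gamma=gamma, class_weight={1: C1, -1: C2})
        clf.fit(X, y)
        classifiers.append(clf)
    return classifiers
```

Each returned classifier is the exemplar learner for one positive instance; the domain adaptation terms of Section 3.3 are layered on top of this basic learner.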

3.2. Pseudo Label for the Kernel Matrix. To make the best use of samples in both source and target domains, we construct the kernel matrix on data from both domains. However, in the dual problem of SVM, the kernel matrix $\mathbf{K}$ needs to be supplied with labeled data, whereas our model addresses the unsupervised domain adaptation problem, in which only source domain data are labeled. Motivated by [19], we use pseudo labels to help train the model. Pseudo labels are predicted by classical classifiers (SVMs in our model) trained on the labeled source data. Due to the distribution mismatch between the source and target domains, many of these labels may be incorrect; following [19], we assume that the pseudo class centroids computed from them may not reside far from the true class centroids. Thus, we use data from both domains to supplement the kernel matrix $\mathbf{K}$ with label information. Our experiments verify that this method is effective.
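A minimal sketch of this step, assuming a plain RBF-kernel SVM as the base classifier:

```python
from sklearn.svm import SVC

def pseudo_labels(Xs, ys, Xt):
    """Predict pseudo labels for the unlabeled target data with a
    classifier trained on the labeled source data (Section 3.2)."""
    base = SVC(kernel="rbf")
    base.fit(Xs, ys)
    return base.predict(Xt)
```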

3.3. Exemplar Learner in Domain Adaptation Form. In fact, each exemplar learner is an SVM in kernel form trained on one positive instance and all the negative instances. In the view of [16], a discriminative exemplar classifier can be taken as a representation of a positive instance. In tasks such as object detection or image classification, this parametric representation is effective because some characteristics of samples, such as angle, color, orientation, and background, are hard to represent explicitly; the instance-based parametric discriminative classifier can capture more information about the positive sample. Similarly, with the motivation of transfer learning, we can view each positive instance as a domain, with some mismatch among domains. Our model aims to correct this mismatch and reduce the distance to the target domain. We construct an exemplar-learner distance metric between domains from MMD; it can be written as

$$\operatorname{dist}(\mathbf{x}_S, \mathbf{x}_T) = \left\| \phi(\mathbf{x}^+_S) + \frac{1}{n^-_S} \sum_{i=1}^{n^-_S} \phi(\mathbf{x}^-_{S_i}) - \frac{2}{n_T} \sum_{i=1}^{n_T} \phi(\mathbf{x}_{T_i}) \right\|^2_{\mathcal{H}}. \tag{4}$$
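As an illustration, the squared MMD underlying (4) can be estimated directly from Gram matrices. The sketch below computes the standard two-sample quantity, of which (4) is the per-exemplar specialization; the RBF kernel and its bandwidth are our assumptions:

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Pairwise RBF kernel between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(Xs, Xt, gamma=1.0):
    """Squared empirical MMD between two samples in the RKHS."""
    return (rbf(Xs, Xs, gamma).mean() + rbf(Xt, Xt, gamma).mean()
            - 2.0 * rbf(Xs, Xt, gamma).mean())
```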

However, (4) is only a distance metric; our requirement is to minimize this distance through some transformation. Motivated by Transfer Component Analysis (TCA), we want to map the instances into a latent space in which the instances from the source and target domains are more similar; denote this mapping by $P(x)$. Namely, we aim to minimize the MMD distance between domains by mapping the instances into another space. We extend the distance function as follows:

$$\operatorname{dist}(\mathbf{x}_S, \mathbf{x}_T) = \left\| \phi(P(\mathbf{x}^+_S)) + \frac{1}{n^-_S} \sum_{i=1}^{n^-_S} \phi(P(\mathbf{x}^-_{S_i})) - \frac{2}{n_T} \sum_{i=1}^{n_T} \phi(P(\mathbf{x}_{T_i})) \right\|^2_{\mathcal{H}}. \tag{5}$$

Following the general approach, we reformulate (5) in kernel matrix form. We define the Gram matrices on the source positive domain, source negative domain, and target domain. The kernel matrix $\mathbf{K}$ is composed of nine submatrices, $\mathbf{K}_{T+}$, $\mathbf{K}_{T-}$, $\mathbf{K}_{TT}$, $\mathbf{K}_{++}$, $\mathbf{K}_{--}$, $\mathbf{K}_{+-}$, $\mathbf{K}_{+T}$, $\mathbf{K}_{-T}$, $\mathbf{K}_{-+}$, where each entry is $\phi(\mathbf{x}_i)^\top \phi(\mathbf{x}_j)$:

$$\mathbf{K} = \begin{bmatrix} \mathbf{K}_{++} & \mathbf{K}_{+-} & \mathbf{K}_{+T} \\ \mathbf{K}_{-+} & \mathbf{K}_{--} & \mathbf{K}_{-T} \\ \mathbf{K}_{T+} & \mathbf{K}_{T-} & \mathbf{K}_{TT} \end{bmatrix} \in \mathbb{R}^{(1+n^-_S+n_T)\times(1+n^-_S+n_T)}, \tag{6}$$

and it is paired with the coefficient matrix $\mathbf{L}$:

$$L_{ij} = \begin{cases} 1, & \mathbf{x}_i, \mathbf{x}_j \in \mathcal{X}^+_S, \\ \dfrac{1}{n^-_S}, & \mathbf{x}_i \in \mathcal{X}^+_S, \ \mathbf{x}_j \in \mathcal{X}^-_S, \\ -\dfrac{2}{n_T}, & \mathbf{x}_i \in \mathcal{X}^+_S, \ \mathbf{x}_j \in \mathcal{X}_T, \\ -\dfrac{2}{n^-_S n_T}, & \mathbf{x}_i \in \mathcal{X}_T, \ \mathbf{x}_j \in \mathcal{X}^-_S, \\ \dfrac{1}{(n^-_S)^2}, & \mathbf{x}_i, \mathbf{x}_j \in \mathcal{X}^-_S, \\ \dfrac{4}{n_T^2}, & \mathbf{x}_i, \mathbf{x}_j \in \mathcal{X}_T. \end{cases} \tag{7}$$
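A direct construction of (7) might look as follows, assuming the block ordering [exemplar | source negatives | target samples] used throughout this section:

```python
import numpy as np

def coefficient_matrix(n_neg, n_t):
    """MMD coefficient matrix L of Eq. (7), of size (1 + n_neg + n_t)^2,
    over the block ordering [exemplar | source negatives | target]."""
    n = 1 + n_neg + n_t
    L = np.zeros((n, n))
    s = slice(1, 1 + n_neg)   # source negative block
    t = slice(1 + n_neg, n)   # target block
    L[0, 0] = 1.0
    L[0, s] = L[s, 0] = 1.0 / n_neg
    L[0, t] = L[t, 0] = -2.0 / n_t
    L[s, s] = 1.0 / n_neg ** 2
    L[t, t] = 4.0 / n_t ** 2
    L[s, t] = L[t, s] = -2.0 / (n_neg * n_t)
    return L
```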

Thus, the primal distance function is represented by $\mathbf{K}\mathbf{L}$. Motivated by TCA [40], the mapping of the primal data is equivalent to a transformation of the kernel matrix generated by the source and target domain data. A low-dimensional transformation matrix $\mathbf{M} \in \mathbb{R}^{(1+n^-_S+n_T)\times m}$ reduces the dimension of the primal kernel matrix: it maps the empirical kernel map $\tilde{\mathbf{K}} = (\mathbf{K}\mathbf{K}^{-1/2})(\mathbf{K}^{-1/2}\mathbf{K})$ into an $m$-dimensional shared space. Accordingly, we replace the distance function $\mathbf{K}\mathbf{L}$ by $\mathbf{K}\mathbf{M}\mathbf{M}^\top\mathbf{K}\mathbf{L}$. Following [40], we minimize the trace of the distance:

$$\operatorname{dist}(\mathbf{x}^+_S, \mathbf{x}^-_S, \mathbf{x}_T) = \operatorname{tr}\left(\mathbf{M}^\top \mathbf{K} \mathbf{L} \mathbf{K} \mathbf{M}\right). \tag{8}$$

To control the complexity of $\mathbf{M}$ and preserve the data characteristics, we add a regularization term and a constraint. The domain adaptation term follows TCA and is written as

$$\begin{aligned} \Omega(\mathbf{x}^+_S, \mathbf{x}^-_S, \mathbf{x}_T) &= \operatorname{tr}\left(\mathbf{M}^\top \mathbf{K} \mathbf{L} \mathbf{K} \mathbf{M}\right) + \mu \operatorname{tr}\left(\mathbf{M}^\top \mathbf{M}\right) \\ \text{s.t.} \quad & \mathbf{M}^\top \mathbf{K} \mathbf{H} \mathbf{K} \mathbf{M} = \mathbf{I}_m, \end{aligned} \tag{9}$$

where $\mu > 0$ is a tradeoff parameter, $\mathbf{I}_m \in \mathbb{R}^{m\times m}$ is an identity matrix, and $\mathbf{H} = \mathbf{I}_{n^-_S+n_T+1} - \frac{1}{n^-_S+n_T+1}\mathbf{e}\mathbf{e}^\top$ is the centering matrix.

Furthermore, the objective function of the dual SVM needs the training label information, and the same holds in our model. Thus, we construct the training label matrix $\mathbf{U}$:

$$\mathbf{U} = \operatorname{diag}\left(\mathbf{y}^+_S, \mathbf{y}^-_S, \mathbf{y}_T\right), \tag{10}$$

where $\mathbf{y}^+_S$ is the label of the positive instance, $\mathbf{y}^-_S$ is the label vector of the negative source instances, and $\mathbf{y}_T$ contains the pseudo labels of the target instances, predicted beforehand by the SVM. It can be rewritten in another form:

$$\mathbf{U} = \operatorname{diag}\Big(1, \underbrace{-1, \ldots, -1}_{n^-_S}, \underbrace{y_{T_1}, \ldots, y_{T_{n_T}}}_{n_T}\Big). \tag{11}$$

The label matrix $\mathbf{U}$ provides the information of the source domain labels and the target domain pseudo labels. The matrix $\mathbf{K}$ in the dual problem of the exemplar-SVM (2) is the kernel matrix of the primal data; we replace it by the kernel matrix mapped into the latent subspace, that is, we replace $\mathbf{K}$ by $\tilde{\mathbf{K}}$. The final objective function of each DAESVM is formulated as follows:

$$\begin{aligned} \min_{\boldsymbol{\alpha}, \mathbf{M}} \quad & \boldsymbol{\alpha}^\top \tilde{\mathbf{K}} \boldsymbol{\alpha} - \mathbf{e}^\top \boldsymbol{\alpha} + \lambda \operatorname{tr}\left(\mathbf{M}^\top \mathbf{K} \mathbf{L} \mathbf{K} \mathbf{M}\right) + \mu \operatorname{tr}\left(\mathbf{M}^\top \mathbf{M}\right) \\ \text{s.t.} \quad & \alpha_0 - \sum_{i=1}^{n^-_S + n_T} \alpha_i = 0, \\ & 0 \le \alpha_0 \le C_1, \qquad 0 \le \alpha_i \le C_2, \; \forall i \ge 1, \\ & \mathbf{M}^\top \mathbf{K} \mathbf{H} \mathbf{K} \mathbf{M} = \mathbf{I}_m, \\ & \tilde{\mathbf{K}} = \mathbf{U} \mathbf{K} \mathbf{M} \mathbf{M}^\top \mathbf{K} \mathbf{U}. \end{aligned} \tag{12}$$

4. Optimization Algorithm

To minimize problem (12), we adopt an alternating optimization method, which alternates between solving two subproblems: one over the parameter $\boldsymbol{\alpha}$ and one over the mapping matrix $\mathbf{M}$. This alternating approach is guaranteed to decrease the objective function. Algorithm 1 summarizes the optimization procedure for problem (12).

Algorithm 1: Domain Adaptation Exemplar Support Vector Machine.

Input: $\mathbf{X}_{tr}$, $\mathbf{X}_{te}$; parameters $\lambda$, $\mu$, $m$, $C_1$, and $C_2$.
Output: optimal $\boldsymbol{\alpha}$ and $\mathbf{M}$.
(1) Initialize $\boldsymbol{\alpha} = \mathbf{0}$.
(2) Construct the kernel matrix $\mathbf{K}$ from $\mathbf{X}_{tr}$ and $\mathbf{X}_{te}$ based on (6), the coefficient matrix $\mathbf{L}$ based on (7), the centering matrix $\mathbf{H}$, and the label matrix $\mathbf{U}$ based on (11).
(3) repeat
(4) Update the transformation matrix $\mathbf{M}$ with $\boldsymbol{\alpha}$ fixed:
(5) eigendecompose $(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^\top\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_m)^{-1}\mathbf{K}\mathbf{H}\mathbf{K}$ and select the $m$ leading eigenvectors to construct $\mathbf{M}$.
(6) Solve the convex optimization problem over $\boldsymbol{\alpha}$ with $\mathbf{M}$ fixed.
(7) until convergence
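In code, the alternation of Algorithm 1 can be sketched as below; `update_M` and `solve_alpha` are illustrative names for the two subproblem solvers sketched later in this section, and a fixed iteration count stands in for the convergence test:

```python
import numpy as np

def fit_daesvm(K, L, H, U, C1, C2, lam=1.0, mu=1.0, m=40, iters=10):
    """Alternating optimization of problem (12) for one exemplar."""
    alpha = np.zeros(K.shape[0])                     # step (1)
    for _ in range(iters):                           # step (3)
        M = update_M(K, L, H, U, alpha, lam, mu, m)  # steps (4)-(5)
        K_t = U @ K @ M @ M.T @ K @ U                # transformed kernel of (12)
        alpha = solve_alpha(K_t, C1, C2)             # step (6)
    return alpha, M
```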

Minimizing over $\mathbf{M}$. With $\boldsymbol{\alpha}$ fixed, the optimization over $\mathbf{M}$ can be rewritten in the following form:

$$\begin{aligned} \min_{\mathbf{M}} \quad & \boldsymbol{\alpha}^\top \mathbf{U} \mathbf{K} \mathbf{M} \mathbf{M}^\top \mathbf{K} \mathbf{U} \boldsymbol{\alpha} - \mathbf{e}^\top \boldsymbol{\alpha} + \lambda \operatorname{tr}\left(\mathbf{M}^\top \mathbf{K} \mathbf{L} \mathbf{K} \mathbf{M}\right) + \mu \operatorname{tr}\left(\mathbf{M}^\top \mathbf{M}\right) \\ \text{s.t.} \quad & \mathbf{M}^\top \mathbf{K} \mathbf{H} \mathbf{K} \mathbf{M} = \mathbf{I}_m. \end{aligned} \tag{13}$$

Similarly to TCA, this formulation contains a nonconvex norm constraint, and we transform the optimization problem by reformulating it as

$$\max_{\mathbf{M}} \operatorname{tr}\left( \left( \mathbf{M}^\top \left(\mathbf{K} \mathbf{U} \boldsymbol{\alpha} \boldsymbol{\alpha}^\top \mathbf{U} \mathbf{K} + \lambda \mathbf{K} \mathbf{L} \mathbf{K} - \mu \mathbf{I}_m\right) \mathbf{M} \right)^{-1} \mathbf{M}^\top \mathbf{K} \mathbf{H} \mathbf{K} \mathbf{M} \right). \tag{14}$$

Proof. The Lagrangian of (13) is

$$\begin{aligned} L(\mathbf{M}, \mathbf{Z}) = {} & \boldsymbol{\alpha}^\top \mathbf{U} \mathbf{K} \mathbf{M} \mathbf{M}^\top \mathbf{K} \mathbf{U} \boldsymbol{\alpha} - \mathbf{e}^\top \boldsymbol{\alpha} + \lambda \operatorname{tr}\left(\mathbf{M}^\top \mathbf{K} \mathbf{L} \mathbf{K} \mathbf{M}\right) \\ & - \mu \operatorname{tr}\left(\mathbf{M}^\top \mathbf{M}\right) - \operatorname{tr}\left( \left(\mathbf{M}^\top \mathbf{K} \mathbf{H} \mathbf{K} \mathbf{M} - \mathbf{I}_m\right) \mathbf{Z} \right). \end{aligned} \tag{15}$$

Because the initial kernel matrix $\mathbf{K}$ is symmetric, we can rewrite the first term of (15):

$$\begin{aligned} \boldsymbol{\alpha}^\top \mathbf{U} \mathbf{K} \mathbf{M} \mathbf{M}^\top \mathbf{K} \mathbf{U} \boldsymbol{\alpha} &= \operatorname{tr}\left(\boldsymbol{\alpha}^\top \mathbf{U} \mathbf{K} \mathbf{M} \mathbf{M}^\top \mathbf{K} \mathbf{U} \boldsymbol{\alpha}\right) \\ &= \operatorname{tr}\left[ \left(\mathbf{M}^\top \mathbf{K}^\top \mathbf{U} \boldsymbol{\alpha}\right)^\top \left(\mathbf{M}^\top \mathbf{K}^\top \mathbf{U} \boldsymbol{\alpha}\right) \right] \\ &= \operatorname{tr}\left[ \left(\mathbf{M}^\top \mathbf{K}^\top \mathbf{U} \boldsymbol{\alpha}\right) \left(\mathbf{M}^\top \mathbf{K}^\top \mathbf{U} \boldsymbol{\alpha}\right)^\top \right] \\ &= \operatorname{tr}\left(\mathbf{M}^\top \mathbf{K} \mathbf{U} \boldsymbol{\alpha} \boldsymbol{\alpha}^\top \mathbf{U} \mathbf{K} \mathbf{M}\right). \end{aligned} \tag{16}$$

The original Lagrangian can then be written (up to terms independent of $\mathbf{M}$) as

$$\operatorname{tr}\left( \mathbf{M}^\top \left(\mathbf{K} \mathbf{U} \boldsymbol{\alpha} \boldsymbol{\alpha}^\top \mathbf{U} \mathbf{K} + \lambda \mathbf{K} \mathbf{L} \mathbf{K} - \mu \mathbf{I}_m\right) \mathbf{M} \right) - \operatorname{tr}\left( \left(\mathbf{M}^\top \mathbf{K} \mathbf{H} \mathbf{K} \mathbf{M} - \mathbf{I}_m\right) \mathbf{Z} \right). \tag{17}$$

The derivative of (17) with respect to $\mathbf{M}$ is

$$\left(\mathbf{K} \mathbf{U} \boldsymbol{\alpha} \boldsymbol{\alpha}^\top \mathbf{U} \mathbf{K} + \lambda \mathbf{K} \mathbf{L} \mathbf{K} - \mu \mathbf{I}_m\right) \mathbf{M} - \mathbf{K} \mathbf{H} \mathbf{K} \mathbf{M} \mathbf{Z}. \tag{18}$$

Setting the derivative to zero, we obtain $\mathbf{Z}$:

$$\mathbf{Z} = \left(\mathbf{M}^\top \mathbf{K} \mathbf{H} \mathbf{K} \mathbf{M}\right)^\dagger \mathbf{M}^\top \left(\mathbf{K} \mathbf{U} \boldsymbol{\alpha} \boldsymbol{\alpha}^\top \mathbf{U} \mathbf{K} + \lambda \mathbf{K} \mathbf{L} \mathbf{K} - \mu \mathbf{I}_m\right) \mathbf{M}. \tag{19}$$

Substituting $\mathbf{Z}$ into (17), we obtain

$$\min_{\mathbf{M}} \operatorname{tr}\left( \left(\mathbf{M}^\top \mathbf{K} \mathbf{H} \mathbf{K} \mathbf{M}\right)^\dagger \mathbf{M}^\top \left(\mathbf{K} \mathbf{U} \boldsymbol{\alpha} \boldsymbol{\alpha}^\top \mathbf{U} \mathbf{K} + \lambda \mathbf{K} \mathbf{L} \mathbf{K} - \mu \mathbf{I}_m\right) \mathbf{M} \right). \tag{20}$$

Finally, we obtain the equivalent maximization problem (14).

As in TCA, the solution is given by the $m$ leading eigenvectors of $(\mathbf{K} \mathbf{U} \boldsymbol{\alpha} \boldsymbol{\alpha}^\top \mathbf{U} \mathbf{K} + \lambda \mathbf{K} \mathbf{L} \mathbf{K} - \mu \mathbf{I}_m)^{-1} \mathbf{K} \mathbf{H} \mathbf{K}$.
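A sketch of this eigendecomposition step is given below; the small ridge added for numerical stability of the dense solve is our assumption, not part of the paper's formulation:

```python
import numpy as np
from scipy.linalg import eig

def update_M(K, L, H, U, alpha, lam, mu, m, ridge=1e-8):
    """M-step: the m leading eigenvectors of
    (K U a a^T U K + lam K L K - mu I)^{-1} K H K."""
    n = K.shape[0]
    v = K @ U @ alpha
    A = np.outer(v, v) + lam * (K @ L @ K) - mu * np.eye(n)
    B = K @ H @ K
    w, V = eig(np.linalg.solve(A + ridge * np.eye(n), B))
    top = np.argsort(-w.real)[:m]   # indices of the m leading eigenvalues
    return np.real(V[:, top])
```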

Minimizing over $\boldsymbol{\alpha}$. With $\mathbf{M}$ fixed, the optimization over $\boldsymbol{\alpha}$ can be rewritten in the following QP form:

$$\begin{aligned} \min_{\boldsymbol{\alpha}} \quad & \boldsymbol{\alpha}^\top \tilde{\mathbf{K}} \boldsymbol{\alpha} - \mathbf{e}^\top \boldsymbol{\alpha} \\ \text{s.t.} \quad & \alpha_0 - \sum_{i=1}^{n^-_S + n_T} \alpha_i = 0, \\ & 0 \le \alpha_0 \le C_1, \qquad 0 \le \alpha_i \le C_2, \; \forall i \ge 1, \end{aligned} \tag{21}$$

where $\tilde{\mathbf{K}} = \mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^\top\mathbf{K}\mathbf{U}$ is the kernel matrix transformed by the transformation matrix $\mathbf{M}$. This problem is clearly a QP, and it can be solved efficiently using interior-point methods or other successive optimization procedures such as the Alternating Direction Method of Multipliers (ADMM).
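For example, (21) can be handed to an off-the-shelf interior-point QP solver such as cvxopt; the ridge keeping the quadratic term positive semidefinite is our assumption:

```python
import numpy as np
from cvxopt import matrix, solvers

def solve_alpha(K_t, C1, C2, ridge=1e-8):
    """Solve the QP (21): min a^T K_t a - e^T a subject to the
    equality and box constraints, via cvxopt's interior-point solver."""
    n = K_t.shape[0]
    P = matrix(2.0 * (K_t + ridge * np.eye(n)))          # quadratic term
    q = matrix(-np.ones(n))                              # -e^T a
    A = matrix(np.concatenate([[1.0], -np.ones(n - 1)]).reshape(1, n))
    b = matrix(0.0)                                      # a_0 - sum a_i = 0
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))       # box constraints
    h = matrix(np.concatenate([np.zeros(n), [C1], C2 * np.ones(n - 1)]))
    sol = solvers.qp(P, q, G, h, A, b)
    return np.asarray(sol["x"]).ravel()
```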

5. Ensemble Domain Adaptation Exemplar Classifiers

In this section, we introduce the method for integrating the exemplar classifiers. As mentioned before, we obtain as many classifiers as there are source domain instances, and this section aims to predict labels for the target domain instances. In our view, the classification hyperplane of an exemplar classifier is a representation of one source domain positive instance; however, most hyperplanes contain information that comes from various aspects of the samples, such as images with different backgrounds or sources. In fact, we aim to find the exemplar classifiers trained on instances similar to the testing sample; thus, we use the integration method to filter out classifiers whose captured details differ from the testing sample. Another view of the integration method is that it slackens part of the hyperplanes; namely, it removes exemplar classifiers that were trained under large instance distribution mismatch.

In our method, we first construct the classifiers from the Lagrange multipliers $\boldsymbol{\alpha}$. The classifier construction equations are

$$\mathbf{w} = \alpha_0 \mathbf{x}^+ - \sum_{i=1}^{n^-_S + n_T} \alpha_i \mathbf{x}^-_i, \tag{22}$$

where $\mathbf{w}$ is the weight of the classifier, and

$$b = y_j - \alpha_0 K_{0j} - \sum_{i=1}^{n^-_S + n_T} y_i \alpha_i K_{ij}, \tag{23}$$

where $b$ is the bias of the classifier. The classifier is given by

$$s = \mathbf{w}^\top \mathbf{x} + b. \tag{24}$$
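Recovering one classifier from its multipliers via (22) and (23) might look like the following sketch; the choice of the index `j`, which should correspond to a support vector, is an assumption:

```python
import numpy as np

def build_classifier(alpha, x_pos, X_rest, K, y, j=0):
    """Recover the weight (22) and bias (23) of one exemplar classifier.
    X_rest stacks the source negatives and the (pseudo-labeled) target data;
    y holds the labels of all samples in the same order as K."""
    w = alpha[0] * x_pos - alpha[1:] @ X_rest
    b = y[j] - alpha[0] * K[0, j] - (y[1:] * alpha[1:]) @ K[1:, j]
    return w, b
```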

We then compute the score of each classifier on the testing instance. Second, we find the top $P$ scores among the classifiers of each class and compute the sum of those scores. Finally, we obtain a score for each class, and the highest-scoring category is the prediction. The prediction method is described in Algorithm 2.

Algorithm 2: Ensemble Domain Adaptation Exemplar Classifiers.

Input: $\mathbf{y}_S$, $\boldsymbol{\alpha}$, $\mathbf{X}_{te}$; parameter $P$.
Output: prediction labels $\mathbf{y}$.
(1) Compute the weights $\mathbf{w}$ of the classifiers.
(2) Construct the weight matrix $\mathbf{W}$ and the biases $\mathbf{b}$ of the predictors based on $\boldsymbol{\alpha}$.
(3) repeat
(4) Compute the scores of each classifier in this category.
(5) Find the top $P$ scores.
(6) Compute the sum of these top scores.
(7) until all categories are processed
(8) Choose the category with the maximum score as the prediction label $\mathbf{y}$.
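A sketch of the top-$P$ voting of Algorithm 2 for a single test sample, where `scores` holds one decision value per exemplar classifier and `exemplar_labels` holds the class of each exemplar:

```python
import numpy as np

def ensemble_predict(scores, exemplar_labels, P=5):
    """Sum the top-P exemplar scores per class (Algorithm 2) and
    return the class with the largest sum."""
    best_class, best_sum = None, -np.inf
    for c in np.unique(exemplar_labels):
        top = np.sort(scores[exemplar_labels == c])[::-1][:P]
        if top.sum() > best_sum:
            best_class, best_sum = c, top.sum()
    return best_class
```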

6. Experiments

In this section, we conduct experiments on four domains, Amazon, DSLR, Caltech, and Webcam, to evaluate the performance of the proposed Domain Adaptation Exemplar Support Vector Machines. We first compare our method with baselines and other domain adaptation methods; next, we analyze the effectiveness of our approach; finally, we discuss parameter sensitivity.

6.1. Data Preparation. We run the experiments on the Office and Office-Caltech datasets. The Office dataset contains three domains, Amazon, Webcam, and DSLR; each includes images from amazon.com or office environment images taken with varying lighting and pose changes using a webcam or a DSLR camera. The Office-Caltech dataset contains the ten categories that overlap between the Office dataset and the Caltech-256 dataset. Following the standard transfer learning experimental method, we merge the two datasets, which together include the four domains Amazon, DSLR, Caltech, and Webcam studied in [41]. The Amazon dataset consists of images downloaded from Amazon merchants. The images in Webcam also come from online web pages, but they are of low quality, as they are taken by a web camera. The DSLR domain is photographed with a digital SLR camera, so the images are of high quality. Caltech is often included in domain adaptation experiments and was collected for object detection tasks. Each domain has its own characteristics. Compared to the other domains, the image quality in DSLR is higher, and influencing factors such as detection difficulty and background are fewer than in images downloaded from the web. Amazon and Webcam come from the web, and images in these domains are of lower quality and greater complexity, yet they differ in detail: instances in Webcam show the object alone, whereas the composition of samples in Amazon is more complex, including background and other goods. Figure 1 shows examples of the backpack category from the four domains. From the transfer learning view, the datasets come from different domains, with different marginal probabilities over the images; our model aims to solve this problem and obtain a robust cross-domain classifier.

We chose ten categories common to all four datasets: backpack, bike, bike helmet, bookcase, bottle, calculator, desk chair, desk lamp, desktop computer, and file cabinet. There are 8 to 151 samples per category in a domain, with 958 images in Amazon, 295 images in Webcam, 157 images in DSLR, and 1123 images in Caltech, for 2533 images in total. Figure 1 shows examples from the datasets.

We use both SURF and DeCAF feature extraction in the experiments. First, we use SURF features to encode the images into 800-bin histograms. Next, we use DeCAF features, extracted from layer 7 of AlexNet [42], as 4096-bin histograms. Finally, we normalize the histograms and z-score them to have zero mean and unit standard deviation in each dimension.
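The normalization step might be implemented as below; the L1 histogram normalization is an assumption, since the paper only states that the histograms are normalized before z-scoring:

```python
import numpy as np

def normalize_features(H, eps=1e-12):
    """L1-normalize each histogram (assumption), then z-score every
    dimension to zero mean and unit standard deviation (Section 6.1)."""
    H = H / (np.abs(H).sum(axis=1, keepdims=True) + eps)
    return (H - H.mean(axis=0)) / (H.std(axis=0) + eps)
```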

We run our experiments in the standard way for visual domain adaptation: one of the four datasets is used as the source domain and another as the target domain. Each dataset provides the same ten categories and uses the same image representation, which is considered a homogeneous domain adaptation problem. For example, we may choose images taken from DSLR (denoted by $D$) as the source domain data and images in Amazon (denoted by $A$) as the target domain data; this problem is denoted as $D \rightarrow A$. In this way, we can compose 12 domain adaptation subproblems from the four domains.

6.2. Experiment Setup

(1) Baseline Methods. We compare our DAESVM method with three kinds of approaches: classifiers without transfer learning regularization, conventional transfer learning methods, and the foundation model, the low-rank exemplar support vector machine. The methods are listed as follows:

(1) Transfer Component Analysis (TCA) [40]
(2) Support Vector Machine (SVM) [43]
(3) Geodesic Flow Kernel (GFK) [28]
(4) Landmarks Selection-based Subspace Alignment (LSSA) [23]
(5) Kernel Mean Matching (KMM) [20]
(6) Subspace Alignment (SA) [44]
(7) Transfer Joint Matching (TJM) [45]
(8) Low-Rank Exemplar-SVMs (LRESVMs) [18]

TCA, GFK, and KMM are classical transfer learning methods against which we compare our model; we also show that our method is more robust than models without domain adaptation terms in the transfer learning scenario. TCA is the foundation of our model and is similar to GFK and SA, which are based on the idea of feature transfer. KMM transfers knowledge by instance reweighting. TJM is a popular model for the problem of unsupervised domain adaptation. SA and LSSA are models that use landmarks to transfer knowledge.

(2) Implementation Details. For the baseline method, SVM is trained on the source data and tested on the target data [46]. TCA, SA, LSSA, TJM, and GFK are first applied as a dimension reduction process; then a classifier is trained on the source data and makes predictions for the target domain [19]. Similarly, KMM first computes the weight of each instance and then trains the predictor on the reweighted source data.

Under the assumption of unsupervised domain adaptation, it is impossible to tune optimal parameters for the target domain task by cross validation, since there is distribution mismatch between the domains. Therefore, in the experiments we adopt a grid search strategy to obtain the best parameters and report the best results. Our method involves five tunable parameters: the ESVM tradeoffs $C_1$ and $C_2$, the regularization tradeoffs $\lambda$ and $\mu$, and the dimension reduction parameter $m$. The tradeoffs $C_1$ and $C_2$ are selected over $\{10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}\}$. We fix $\lambda = 1$, $\mu = 1$, and $m = 40$ empirically and select the radial basis function (RBF) as the kernel function; in fact, our model is relatively stable under a wide range of parameter values. We train one classifier for every positive instance in the source domain data and then calibrate the outputs into a probability distribution. We handle multiclass classification in a one-versus-the-rest way. To measure the performance of our method, we use the average accuracy and the standard deviation over ten repetitions. The average testing accuracies and standard errors for all 12 tasks of our method are reported in Table 1; most of the baseline results are cited from previously published papers.
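The grid search might be driven as in the following sketch, where `evaluate_daesvm` is a hypothetical driver returning the mean accuracy over the ten repetitions:

```python
import itertools

# Grid search over the ESVM tradeoffs C1 and C2 (Section 6.2);
# evaluate_daesvm is an assumed driver, not part of the paper's code.
C_grid = [10.0 ** k for k in range(-3, 4)]
best_C1, best_C2 = max(
    itertools.product(C_grid, C_grid),
    key=lambda c: evaluate_daesvm(C1=c[0], C2=c[1], lam=1.0, mu=1.0, m=40),
)
```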

6.3. Experiment Results. In this section, we compare our DAESVM with the baseline methods in terms of classification accuracy.

Table 1 summarizes the classification accuracies obtained over all 10 categories and the 12 tasks generated from the 4 domains; for each task, the highest accuracy marks the best-performing method. First, we evaluate the traditional classifiers without domain adaptation terms: we train the predictors on the source domain data and make predictions for the target domain dataset. Second, we compare our DAESVM with unsupervised domain adaptation methods such as TCA and GFK, implemented with the same reduced dimension $m$ as in our model. Finally, we also compare DAESVM with recent transfer learning models such as the low-rank ESVMs [18].


Figure 1: Example images from the backpack category in Amazon and DSLR ((a), from left to right) and Webcam and Caltech-256 ((b), from left to right). Images from different domains vary in style, background, and source.

Overall, in the usual transfer learning way, we run the datasets across different pairs of source and target domains. The accuracy of DAESVM for the adaptation from DSLR to Webcam reaches 92.1%, an improvement of about 1.2% over LRESVM. Compared with TCA, DAESVMs take into account the distribution mismatch among instances as well as between domains. The adaptation from Webcam to DSLR achieves an accuracy of 91.8%. For the domains Amazon and Caltech, which are larger than DSLR and Webcam, DAESVM achieves an accuracy of 77.5%, a relative improvement of about 36.2% over TJM. For transferring knowledge from a large dataset to a small one, from Amazon to DSLR, we obtain an accuracy of 76.8%; conversely, from DSLR to Amazon, the prediction accuracy is 83.4%. In summary, our DAESVM trained on one domain performs well and also performs robustly across multiple domains.

Table 1: Classification accuracies of different methods for the domain adaptation tasks. We conduct the experiments on conventional transfer learning methods; compared with traditional methods, DAESVMs gain a large improvement in prediction accuracy, and they also improve on the recently proposed LRESVM approach [average ± standard error of accuracy (%)].

Task | SVM | KMM | TCA | TJM | SA | GFK | LSSA | LRESVM | DAESVMs
A→C | 45.4 | 42.2 | 45.3 | 56.9 | 51.8 | 49.6 | 54.8 | 79.8 | 77.5 ± 0.79
A→D | 50.7 | 42.7 | 60.3 | 56.4 | 56.4 | 55.7 | 57.3 | 74.9 | 76.8 ± 0.76
A→W | 47.4 | 42.4 | 61.3 | 51.0 | 54.7 | 56.9 | 56.7 | 75.4 | 73.2 ± 1.08
C→A | 50.7 | 48.3 | 54.7 | 58.6 | 57.1 | 51.2 | 58.4 | 77.2 | 80.2 ± 0.39
C→D | 53.2 | 53.5 | 56.4 | 57.4 | 59.0 | 57.1 | 59.1 | 87.1 | 89.0 ± 0.23
C→W | 44.2 | 45.8 | 50.4 | 58.8 | 62.7 | 57.1 | 58.1 | 74.1 | 74.7 ± 0.38
D→A | 40.8 | 42.2 | 53.8 | 46.1 | 58.9 | 59.2 | 58.4 | 80.4 | 83.4 ± 1.41
D→C | 48.3 | 41.6 | 43.9 | 49.6 | 54.3 | 59.4 | 57.7 | 79.0 | 73.0 ± 1.04
D→W | 67.8 | 72.9 | 82.4 | 82.0 | 83.4 | 80.2 | 87.1 | 91.0 | 92.1 ± 0.25
W→A | 42.4 | 41.9 | 53.0 | 50.8 | 57.0 | 66.2 | 59.7 | 74.3 | 77.8 ± 0.33
W→C | 41.2 | 39.0 | 53.7 | 54.8 | 34.7 | 52.4 | 54.2 | 70.6 | 66.5 ± 0.54
W→D | 80.2 | 82.0 | 87.9 | 83.4 | 78.9 | 81.2 | 87.2 | 89.2 | 91.8 ± 0.59
Average | 51.0 | 49.5 | 58.6 | 58.8 | 59.1 | 60.5 | 62.4 | 79.4 | 80.0 ± 0.67

Table 2: Results for multidomain tasks, which improve on previously proposed methods. The experiments adopt the same strategy as single-domain adaptation: we treat multiple domains as one source or target to find the shared features in a latent space. However, the complexity of the multidomain shared features limits the accuracy of the tasks [average ± standard error of accuracy (%)].

Task | SVM | KMM | TCA | TJM | SA | GFK | LSSA | LRESVM | DAESVM
D,W→A | 45.7 | 37.4 | 40.5 | 57.1 | 59.4 | 47.3 | 61.7 | 80.1 | 77.2 ± 1.27
A,D→C,W | 37.1 | 31.6 | 43.0 | 60.2 | 48.7 | 47.6 | 74.2 | 86.9 | 84.7 ± 0.65
D→A,C,W | 41.4 | 43.8 | 57.2 | 63.9 | 51.9 | 51.4 | 77.0 | 82.9 | 88.4 ± 0.21
A,D,W→C | 43.9 | 50.6 | 54.9 | 69.0 | 60.2 | 60.4 | 63.7 | 87.7 | 90.1 ± 0.34
A,D→W | 71.0 | 61.0 | 54.0 | 61.3 | 54.0 | 47.0 | 71.9 | 80.8 | 83.8 ± 0.78
A,C→D,W | 81.4 | 53.9 | 77.4 | 71.8 | 57.4 | 64.1 | 80.7 | 89.3 | 92.4 ± 0.25
Average | 53.4 | 46.4 | 54.5 | 63.9 | 55.2 | 53.0 | 71.5 | 84.6 | 86.1 ± 0.58

We also conduct tasks of multidomain adaptation, which use one or more domains as the source domain data and adapt to the other domains; the results are shown in Table 2. The accuracy of DAESVM for the adaptation from Amazon, DSLR, and Webcam to Caltech reaches 90.1%, an improvement over LRESVM. For the task of adaptation from Amazon and Caltech to Webcam and DSLR, we obtain an accuracy of 92.4%. The experiments show that our models are effective not only for single-domain adaptation but also for multidomain adaptation.

Two key factors may contribute to the superiority of our method. First, the feature transfer regularization term is utilized to slacken the similarity assumption: it only assumes that there are some shared features across domains, instead of assuming that the different domains are similar to each other, which makes the model more robust than models with reweighting terms. Second, the exemplar-SVMs are motivated from a transfer learning perspective, taking into account that instances exhibit distribution mismatch with respect to one another. Our model combines these two factors to resist the distribution mismatch among domains and the sample selection bias among instances.

6.4. Pseudo Label Effectiveness. Following [19], we use pseudo labels to supplement model training. In our experiments, we test how the prediction results are influenced by the accuracy of the pseudo labels. As shown in Figure 2, the prediction accuracy improves as the pseudo-label accuracy increases. This demonstrates that the pseudo-label method is effective, and we can iterate by using the labels predicted by the DAESVM as new pseudo labels; this iteration step can efficiently enhance the performance of the classifiers.

6.5. Parameter Sensitivity. There are five parameters in our model. We conduct a parameter sensitivity analysis, which shows that optimal performance can be achieved under a wide range of parameter values, and discuss the results.

Figure 2: The accuracy of DAESVMs improves as the pseudo-label accuracy improves; the results verify the effectiveness of the pseudo-label method.

(1) Tradeoff $\lambda$. $\lambda$ controls the weight of the MMD term, which minimizes the distribution mismatch between the source and target domains. Theoretically, we want this term to equal zero. However, if we set this parameter to infinity ($\lambda \rightarrow \infty$), the data properties may be lost when transforming the source and target domain data into the high-dimensional space. Conversely, if we set $\lambda$ to zero, the model loses the ability to correct the distribution mismatch.

(2) Tradeoff $\mu$. $\mu$ controls the weight of the data variance term, which aims to preserve the data properties. Theoretically, we want this term to equal zero. However, if we set this parameter to infinity ($\mu \rightarrow \infty$), the data distribution mismatch among domains may grow, so that the transformation matrix $\mathbf{M}$ cannot utilize the source data to assist the target task. Conversely, if we set $\mu$ to zero, the model cannot preserve the properties of the original data.

(3) Dimension Reduction $m$. $m$ is the dimension of the transformation matrix, namely the dimension of the subspace into which we map the samples. If $m$ is too small, the properties of the data may be lost, which may make the classifier fail; if $m$ is too large, the effectiveness of correcting the distribution mismatch may be lost. We examine the classification results as a function of $m$; the results are displayed in Figure 3.

(4) Tradeoffs $C_1$ and $C_2$ in ESVM. Parameters $C_1$ and $C_2$ are the upper bounds of the Lagrangian variables. In the standard SVM, positive and negative instances share the same value of these parameters; in our model, we expect the weights of the positive samples to be higher than those of the negative samples. In our experiments, setting $C_1$ to one hundred times $C_2$ yields a high-performance predictor. A visual analysis of these two parameters is shown in Figure 4.

Figure 3: When the dimension $m$ is 20 or 40, the prediction accuracy is higher than for other values.

Figure 4: We fix $\lambda = 1$, $m = 20$, and $\mu = 1$ in these experiments; $C_1$ is searched over $\{0.1, 0.5, 1, 5, 10, 50, 100\}$ and $C_2$ over $\{0.001, 0.005, 0.01, 0.1, 0.5, 1, 10\}$.

7. Conclusion

In this paper, we have proposed an effective method for domain adaptation problems with a regularization term that reduces the data distribution mismatch between domains while preserving the properties of the original data. Furthermore, the method of integrating the classifiers can predict target domain data with high accuracy. The proposed method mainly aims to solve problems in which distribution mismatch occurs between domains or among instances. Meanwhile, we extend DAESVMs to multiple source or target domains. Experiments were conducted on transfer learning datasets, transferring knowledge from image to image.


Our future work is as follows. First, we will integrate the training process of all the classifiers in an ensemble way; it would be better to accelerate the training process by rewriting all the weights in matrix form, a strategy that can omit the matrix inversion in the optimization. Second, we want to impose a constraint on $\boldsymbol{\alpha}$ that maintains sparsity. Finally, we will extend DAESVMs to the problem of transferring knowledge among domains that have few relationships, such as transferring knowledge from images to video or text.

Notations and Descriptions

$\mathcal{D}_S$, $\mathcal{D}_T$: Source/target domain
$\mathcal{T}_S$, $\mathcal{T}_T$: Source/target task
$d$: Dimension of the features
$\mathbf{X}_S$, $\mathbf{X}_T$: Source/target sample matrix
$\mathbf{y}_S$, $\mathbf{y}_T$: Source/target sample label vector
$\mathbf{K}$: Kernel matrix without label information
$\boldsymbol{\alpha}$: Lagrange multiplier vector
$n_S$, $n_T$: Number of source/target domain instances
$\mathbf{e}$: All-ones vector
$\mathbf{I}$: Identity matrix

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work has been partially supported by grants from the National Natural Science Foundation of China (nos. 61472390, 71731009, 91546201, and 11771038) and the Beijing Natural Science Foundation (no. 1162005).

References

[1] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.

[2] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning, pp. 160–167, ACM, July 2008.

[3] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 580–587, Columbus, Ohio, USA, June 2014.

[4] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.

[5] W.-S. Chu, F. De La Torre, and J. F. Cohn, "Selective transfer machine for personalized facial action unit detection," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), pp. 3515–3522, USA, June 2013.

[6] A. Kumar, A. Saha, and H. Daumé, "Co-regularization based semi-supervised domain adaptation," in Advances in Neural Information Processing Systems 23, pp. 478–486, 2010.

[7] M. Xiao and Y. Guo, "Feature space independent semi-supervised domain adaptation via kernel matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 1, pp. 54–66, 2015.

[8] S. J. Pan, J. T. Kwok, Q. Yang, and J. J. Pan, "Adaptive localization in a dynamic WiFi environment through multi-view learning," in Proceedings of the 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference, pp. 1108–1113, Canada, July 2007.

[9] A. van Engelen, A. C. van Dijk, M. T. B. Truijman et al., "Multi-center MRI carotid plaque component segmentation using feature normalization and transfer learning," IEEE Transactions on Medical Imaging, vol. 34, no. 6, pp. 1294–1305, 2015.

[10] Y. Zhang, J. Wu, Z. Cai, P. Zhang, and L. Chen, "Memetic extreme learning machine," Pattern Recognition, vol. 58, pp. 135–148, 2016.

[11] M. Uzair and A. Mian, "Blind domain adaptation with augmented extreme learning machine features," IEEE Transactions on Cybernetics, vol. 47, no. 3, pp. 651–660, 2017.

[12] L. Zhang and D. Zhang, "Domain adaptation extreme learning machines for drift compensation in E-nose systems," IEEE Transactions on Instrumentation and Measurement, vol. 64, no. 7, pp. 1790–1801, 2015.

[13] A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola, "A kernel method for the two-sample-problem," in Advances in Neural Information Processing Systems, B. Schölkopf, J. Platt, and T. Hofmann, Eds., pp. 513–520, 2008.

[14] J. Wu, S. Pan, X. Zhu, C. Zhang, and X. Wu, "Multi-instance learning with discriminative bag mapping," IEEE Transactions on Knowledge and Data Engineering, pp. 1-1.

[15] J. Wu, S. Pan, X. Zhu, C. Zhang, and P. S. Yu, "Multiple structure-view learning for graph classification," IEEE Transactions on Neural Networks and Learning Systems, vol. PP, no. 99, pp. 1–16, 2017.

[16] T. Malisiewicz, A. Gupta, and A. A. Efros, "Ensemble of exemplar-SVMs for object detection and beyond," in Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV 2011), pp. 89–96, Spain, November 2011.

[17] B. Zadrozny, "Learning and evaluating classifiers under sample selection bias," in Proceedings of the Twenty-First International Conference on Machine Learning, p. 114, Banff, Alberta, Canada, July 2004.

[18] W. Li, Z. Xu, D. Xu, D. Dai, and L. Van Gool, "Domain generalization and adaptation using low rank exemplar SVMs," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1-1.

[19] M. Long, J. Wang, G. Ding, S. J. Pan, and P. S. Yu, "Adaptation regularization: a general framework for transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 5, pp. 1076–1089, 2014.

[20] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf, "Correcting sample selection bias by unlabeled data," in Proceedings of the 20th Annual Conference on Neural Information Processing Systems (NIPS 2006), pp. 601–608, Canada, December 2006.

[21] M. Sugiyama and M. Kawanabe, Machine Learning in Non-Stationary Environments, The MIT Press, 2012.

[22] W. Dai, Q. Yang, G. Xue, and Y. Yu, "Boosting for transfer learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 193–200, New York, NY, USA, June 2007.

[23] R. Aljundi, R. Emonet, D. Muselet, and M. Sebban, "Landmarks-based kernelized subspace alignment for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 56–63, USA, June 2015.

[24] B. Tan, Y. Zhang, S. J. Pan, and Q. Yang, "Distant domain transfer learning," in Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), pp. 2604–2610, USA, February 2017.

[25] W.-S. Chu, F. De La Torre, and J. F. Cohn, "Selective transfer machine for personalized facial expression analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 3, pp. 529–545, 2017.

[26] R. Aljundi, J. Lehaire, F. Prost-Boucle, O. Rouvière, and C. Lartizien, "Transfer learning for prostate cancer mapping based on multicentric MR imaging databases," Lecture Notes in Computer Science, vol. 9487, pp. 74–82, 2015.

[27] M. Long, Transfer Learning: Problems and Methods, Ph.D. thesis, Tsinghua University, 2014.

[28] B. Gong, Y. Shi, F. Sha, and K. Grauman, "Geodesic flow kernel for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2066–2073, June 2012.

[29] J. Blitzer, R. McDonald, and F. Pereira, "Domain adaptation with structural correspondence learning," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '06), pp. 120–128, Association for Computational Linguistics, July 2006.

[30] M. Long, Y. Cao, J. Wang, and M. I. Jordan, "Learning transferable features with deep adaptation networks," in Proceedings of the 32nd International Conference on Machine Learning, pp. 97–105, 2015.

[31] M. Long, J. Wang, and M. I. Jordan, Deep Transfer Learning with Joint Adaptation Networks, 2016.

[32] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS 2014), pp. 3320–3328, Canada, December 2014.

[33] J. Yang, R. Yan, and A. G. Hauptmann, "Cross-domain video concept detection using adaptive SVMs," in Proceedings of the 15th ACM International Conference on Multimedia (MM '07), pp. 188–197, September 2007.

[34] S. Li, S. Song, and G. Huang, "Prediction reweighting for domain adaption," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 7, pp. 1682–1695, 2017.

[35] Z. Xu, W. Li, L. Niu, and D. Xu, "Exploiting low-rank structure from latent domains for domain generalization," Lecture Notes in Computer Science, vol. 8691, no. 3, pp. 628–643, 2014.

[36] L NiuW Li D Xu and J Cai ldquoAn Exemplar-BasedMulti-ViewDomain Generalization Framework for Visual RecognitionrdquoIEEE Transactions on Neural Networks and Learning Systems2016

[37] L NiuW Li and D Xu ldquoMulti-view domain generalization forvisual recognitionrdquo in Proceedings of the 15th IEEE InternationalConference on Computer Vision ICCV 2015 pp 4193ndash4201Chile December 2015

[38] T Kobayashi ldquoThree viewpoints toward exemplar SVMrdquo inProceedings of the IEEE Conference on Computer Vision andPatternRecognition CVPR2015 pp 2765ndash2773USA June 2015

[39] S J Pan J T Kwok and Q Yang ldquoTransfer learning via dimen-sionality reductionrdquo in In Proceedings of the AAAI Conferenceon Artificial Intelligence pp 677ndash682 2008

[40] S J Pan I W Tsang J T Kwok and Q Yang ldquoDomainadaptation via transfer component analysisrdquo IEEE TransactionsonNeural Networks and Learning Systems vol 22 no 2 pp 199ndash210 2011

[41] K Saenko B Kulis M Fritz and T Darrell ldquoAdapting visualcategory models to new domainsrdquo in Computer VisionmdashECCV2010 vol 6314 ofLectureNotes inComputer Science pp 213ndash226Springer Berlin Germany 2010

[42] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the 26th Annual Conference on Neural Information ProcessingSystems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe Nev USADecember 2012

[43] VNVapnik Statistical LearningTheory Adaptive and LearningSystems for Signal Processing Communications and ControlWiley- Interscience New York NY USA 1998

[44] B Fernando A Habrard M Sebban and T Tuytelaars ldquoUnsu-pervised visual domain adaptation using subspace alignmentrdquoin Proceedings of the 2013 14th IEEE International Conferenceon Computer Vision ICCV 2013 pp 2960ndash2967 AustraliaDecember 2013

[45] M Long J Wang G Ding J Sun and P S Yu ldquoTransfer jointmatching for unsupervised domain adaptationrdquo in Proceedingsof the 27th IEEE Conference on Computer Vision and PatternRecognition CVPR 2014 pp 1410ndash1417 USA June 2014

[46] C Chang and C Lin ldquoLIBSVM a Library for support vectormachinesrdquo ACM Transactions on Intelligent Systems and Tech-nology vol 2 no 3 article 27 2011


in various practical applications such as face recognition [5], natural language processing [6], cross-language text classification [7], WiFi localization [8], and medical imaging [9].

Domain adaptation is a subproblem of transfer learning which assumes that source and target domain data are generated from the same feature and label space but different marginal probability distributions. It aims to solve problems in which there is no or little labeled data in the target domain, usually by using labeled data in the source domain to assist the training of target domain tasks. Many works focus on domain adaptation problems, and they also extend to applications such as WiFi localization, text sentiment analysis, and image classification for multiple domains. Since distribution mismatch generally exists in real-world applications, other research areas are also concerned with domain adaptation. For example, the extreme learning machine (ELM) is an efficient model for training single-hidden-layer networks [10], and there are ELM works in a domain adaptation setting [11, 12]. Like most previous domain adaptation classifiers, they add a constraint term based on instance reweighting to minimize the Maximum Mean Discrepancy (MMD) [13]. However, these methods need to assume that the difference between the source and target domain is not too large; namely, this idea requires that different domains are similar.

Most pattern recognition problems can be transformed into several basic classification tasks. Generally speaking, classification tasks assume that a category can be represented by a hyperplane [14, 15], and most machine learning algorithms aim to learn hyperplanes to predict for unseen instances. Meanwhile, to improve the ability of representation by a hyperplane, some works cluster the samples first and then solve the classification tasks on the clusters. In contrast to category-level classification, a cluster classifier can include more information about the positive category, but with more risk of overfitting. Motivated by object detection, [16] proposed an extreme classification model, named exemplar support vector machines (exemplar-SVMs), which trains a classifier for every positive instance against all the negative instances. In fact, exemplar-SVMs can be viewed as an extreme case of cluster-level SVM in which every positive sample is regarded as a cluster. There are two viewpoints on why the exemplar-SVM achieves a surprising generalization performance. One viewpoint takes the exemplar-SVMs as a representation with complete details of the positive instances. In other words, every classifier captures details of its positive instance, such as background, corner, color, or orientation, and together the classifiers can describe the category more intrinsically. From a transfer learning viewpoint, training data cannot satisfy the underlying i.i.d. assumption, as every instance in the training set may differ from the others, namely, sample selection bias [17]. Each exemplar-SVM classifier is trained on one highly weighted positive sample and the negative samples, so it can represent the positive sample well in the same distribution. Recently, [18] extended exemplar-SVMs into a transfer learning form, which uses loss function reweighting and adds a low-rank regularization item for the classifiers.

In this work, we propose a novel model to address unsupervised domain adaptation problems in which there are no labels on the target domain data. Furthermore, it permits distribution mismatch among instances. In our model, we train kernel exemplar classifiers for every positive instance and then integrate the classifiers to make a prediction for the target domain data. To align the distribution mismatch, we embed a regularization item based on TCA in our classifiers. In our opinion, the model constructs a bridge to transfer the knowledge, and we use the information in the kernel matrix, which includes the instance representations in the high-dimensional space, to assist classifier training across domains. For the problem of sample selection bias, we integrate the classifiers to make a prediction. Basically, the integration step expands the representation of the hyperplanes and takes full advantage of the details learned before.

Our contributions are as follows. (1) We propose a novel unsupervised domain adaptation model based on exemplar-SVMs, named Domain Adaptation Exemplar Support Vector Machines (DAESVMs), and it improves standard domain adaptation prediction accuracy by transferring knowledge across domains. (2) Every DAESVM classifier constructs a bridge that transmits knowledge from the source domain to the target domain. Compared with the traditional two-step method, this strategy thoroughly searches for the optimal point of the model, which makes the classification hyperplane more precise across domains. (3) To solve the problem of sample selection bias, we use ensemble methods to integrate the classifiers. The ensemble process is similar to slacking the classification hyperplane: it drops off some unreliable classification results and uses the reliable parts to make a prediction. (4) We bring the pseudo-label method into DAESVMs, inspired by [19], to supplement the information of the target domain, and the experiments verify the effectiveness of the pseudo labels. (5) We push a step further and extend DAESVMs to multidomain adaptation. The rest of this paper is organized as follows. In Section 2, we introduce the notation of the problem; meanwhile, we review the related works on domain adaptation, exemplar-SVM, and Transfer Component Analysis (TCA). In Section 3, we introduce the deduction process of DAESVM and formulate the model. In Section 4, we propose the optimization algorithm for our model. In Section 5, we integrate all the DAESVM classifiers to make a prediction. In Section 6, we analyze the experiments on some transfer learning datasets to verify the effectiveness of DAESVMs. In Section 7, we conclude our work and give an outlook.

2. Notation and Related Works

This section introduces the notation and the related works of this paper.

2.1. Notation. In this paper, we use the notation defined in [4] for transfer learning, and the definition considers only the condition of one source domain and one target domain. First, we need to define the Domain and the Task. A domain $\mathcal{D}$ is composed of a feature space $\mathcal{X}$ and a marginal probability distribution $P(x)$; namely, $\mathcal{D} = \{\mathcal{X}, P(x)\}$, $x \in \mathcal{X}$. A task $\mathcal{T}$ is composed of a label space $\mathcal{Y}$ and a prediction model $f(x)$; namely, $\mathcal{T} = \{\mathcal{Y}, f(x)\}$, $y \in \mathcal{Y}$. From the view of probability, $f(x) = P(y \mid x)$. Notations frequently used in this paper are summarized in the Notations and Descriptions section. The definition of transfer learning is as follows. Given source domain data $\mathcal{D}_S = \{(x_{S_1}, y_{S_1}), \ldots, (x_{S_{n_S}}, y_{S_{n_S}})\}$ and a source task $\mathcal{T}_S$, and unlabeled target domain data $\mathcal{D}_T = \{(x_{T_1}), \ldots, (x_{T_{n_T}})\}$ and a target task $\mathcal{T}_T$, transfer learning aims to utilize $\mathcal{D}_S$ and $\mathcal{D}_T$ to help train a robust prediction model $f_T(x)$ under the condition $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$.

2.2. Domain Adaptation. As a subproblem of transfer learning, domain adaptation has achieved great success and is utilized in many applications. It assumes that source and target domain data have the same feature space, label space, and prediction function, which from the view of probability means equal conditional probability distributions; namely, $f_S(x) = f_T(x)$ or $P_S(y \mid x) = P_T(y \mid x)$. It is agreed that the approaches of domain adaptation can be divided into three parts: reweighting approaches, feature transfer approaches, and parameter-shared approaches.

(1) Reweighting Approaches. In transfer learning tasks, the basic idea of utilizing the source data to help train a target predictor is to reduce the discrepancy between the source and target data as far as possible. Under the assumption that source and target domains have many overlapping features, a conventional method is reweighting or selecting the source domain instances to correct the marginal probability distribution mismatch. Based on the metric of distance between distributions named Maximum Mean Discrepancy (MMD), [20] proposed a technique called Kernel Mean Matching (KMM), which revises the weight of every instance to minimize the MMD between the source and target domain. Similarly to KMM, [21] used the same idea but a different metric method to adjust the discrepancy of domains. Reference [22] used the strategy of AdaBoost to update the weights of source domain data, which improves the weight of instances in favor of the classification task; it also introduced generalization error bounds of the model based on PAC learning theory. In recent years, [23] used a two-step approach: first sample the instances that are similar to the other domain as landmarks, and then use these landmarks to map the data into a high-dimensional space in which the domains overlap more. Reference [24] solved the same problem but slacked the similarity assumption: it assumes that there are no relationships between the source and target domain. The model named Selective Transfer Machine (STM) reweights the instances of personal faces to train a generic classifier. Most instance-based transfer learning techniques use KMM to measure the difference of the distributions, and these methods are applied in many areas such as facial action unit detection [25] and prostate cancer mapping [26].

(2) Feature Transfer Approaches. Compared with instance-based approaches, feature-based approaches slack the similarity assumption. They assume that source and target domains share some features, named shared features, and that domains have their own features, named spec-features [27]. For example, consider training a task that uses movie reviews to help a sofa-review sentiment classification task: the word "comfortable" is usually nonzero in the sofa domain features but usually zero in the movie domain features, so this word is a spec-feature of the sofa domain. Feature transfer approaches aim to find a shared latent subspace where the distance between the source and target domain is minimized. Reference [28] proposed an unsupervised domain adaptation approach named Geodesic Flow Kernel (GFK) based on the kernel method. GFK maps data into Grassmann manifolds and constructs geodesic flows to reduce the mismatch among domains; it effectively exploits intrinsic low-dimensional structures of the data in the domains. To solve problems of cross-domain natural language processing (NLP), [29] proposed a general method, structural correspondence learning (SCL), to learn a discriminative predictor by identifying correspondences among features in domains: SCL first finds the pivot features and then links the shared features with each other. Reference [7] learned a predictor by mapping the target kernel matrix to a submatrix of the source kernel matrix. The deep neural network is used not only for learning essential features but also for domain adaptation. Reference [30] proposed a neural network architecture for domain adaptation named Deep Adaptation Network (DAN) and extended it to joint adaptation networks (JAN) [31]. Reference [32] discussed transferable domain features in deep neural networks.

(3) Parameter-Based Approaches. The core idea of parameter-based approaches is to transfer parameters from source to target domain tasks. They assume that different domains share some parameters and that these parameters can be utilized across domains. Reference [33] proposed the Adaptive Support Vector Machine (A-SVM) as a general method to adopt new domains: A-SVM first trains an auxiliary classifier and then learns the target predictor based on the original parameters. Reference [34] reweighted the predictions of the source classifier on the target domain according to the distance between domains.

2.3. Exemplar Support Vector Machines. Exemplar-SVMs [16] were proposed for object detection and achieve high performance. The method trains a classifier for every positive instance against all negative instances: every positive instance is an exemplar, and the classifier corresponding to it can be viewed as a representation of that positive instance. In the prediction process, every classifier predicts a value for the test instance, a function calibrates the values, and the class of the highest-scoring classifiers is taken as the prediction. The exemplar-SVMs address the problem that a single hyperplane can hardly represent a category of instances, and they utilize an extreme strategy to train the predictor. In [35], the authors gathered the training process into one model and introduced nuclear norm regularization in the scenario of domain generalization, which assumes the target domain is unseen. They also extended the model to the problems of domain generalization and multiview learning [36, 37]. In [38], the author reduced two hyperparameters into one and extended exemplar-SVMs to a kernel form.


2.4. Transfer Component Analysis. Reference [39] proposed a dimension reduction method called maximum mean discrepancy embedding (MMDE). By minimizing the distance between the source and target data distributions in a shared latent space, the source domain data is utilized to assist training a classifier on the target domain. MMDE not only minimizes the distance between the domains in the latent space but also preserves the properties of the data by maximizing the variance of the data. Based on MMDE, [40] extended it to handle unseen instances and to reduce the computational complexity of MMDE. Substantially, TCA simplifies the process of learning the kernel matrix by transforming an initial kernel matrix instead. The optimization of this problem reduces to finding the $m$ leading eigenvectors of an objective matrix.

3. Domain Adaptation Exemplar Support Vector Machine

In this section, we present the formulation of the Domain Adaptation Exemplar Support Vector Machine (DAESVM). In the remainder of this paper, we use a lowercase letter in boldface to represent a column vector and an uppercase letter in boldface to represent a matrix. The notation mentioned in Section 2 is extended. We use $\mathbf{x}_i^+$, $i \in \{1, \ldots, n_S^+\}$, where $n_S^+$ is the number of positive instances, to represent a positive instance and $\mathbf{x}_j^-$, $j \in \{1, \ldots, n_S^-\}$, where $n_S^-$ is the number of negative instances, to represent a negative instance. The set of negative samples is written as $N^-$. This section introduces the formulation of one exemplar classifier; in fact, we need to train as many exemplar classifiers as there are source domain instances, and the method that integrates these classifiers is proposed in Section 5.

3.1. Exemplar-SVM. The exemplar-SVM is built on the extreme idea of training a classifier from one positive instance and all the negative instances and then calibrating the outputs of the classifiers into a probability distribution to separate the samples; the model trains as many classifiers as there are positive instances. Learning a classifier that aims to separate a positive instance from all the negative instances can be modeled as
$$f(\mathbf{w}, b) = \|\mathbf{w}\|^2 + C_1 h(\mathbf{w}^T \mathbf{x}^+ + b) + C_2 \sum_{\mathbf{x}_i^- \in N^-} h(-\mathbf{w}^T \mathbf{x}_i^- - b), \quad (1)$$
where $\|\cdot\|$ is the 2-norm of a vector, $C_1$ and $C_2$ are the tradeoff parameters, corresponding to $C$ in SVM, for balancing the positive and negative error costs, and $h(x) = \max(0, 1 - x)$ is the hinge loss function.

Formulation (1) is the primal problem of the exemplar-SVM, and we can derive the dual problem to utilize the kernel method. The dual formulation can be written as follows [38]:
$$\begin{aligned} \min_{\boldsymbol{\alpha}} \quad & \boldsymbol{\alpha}^T \mathbf{K} \boldsymbol{\alpha} - \mathbf{e}^T \boldsymbol{\alpha} \\ \text{s.t.} \quad & \alpha_0 - \sum_{i=1}^{n_S^-} \alpha_i = 0, \\ & 0 \le \alpha_0 \le C_1, \quad 0 \le \alpha_i \le C_2 \quad \forall i \ge 1, \end{aligned} \quad (2)$$
where $\boldsymbol{\alpha} = (\alpha_0, \alpha_1, \ldots, \alpha_{n_S^-}) \in \mathbb{R}^{n_S^- + 1}$ are the Lagrangian multipliers and $\mathbf{e}$ is the all-ones vector. We take this model as an exemplar learner. The matrix $\mathbf{K} \in \mathbb{R}^{(n_S^- + 1) \times (n_S^- + 1)}$ is composed of
$$\mathbf{K} = \begin{bmatrix} k(\mathbf{x}^+, \mathbf{x}^+) & -\mathbf{k}^T \\ -\mathbf{k} & \mathbf{K}^- \end{bmatrix} \in \mathbb{R}^{(n_S^- + 1) \times (n_S^- + 1)}, \qquad \mathbf{k} \in \mathbb{R}^{n_S^-}, \quad k_i = k(\mathbf{x}^+, \mathbf{x}_i^-), \quad K_{ij}^- = k(\mathbf{x}_i^-, \mathbf{x}_j^-), \quad (3)$$
where $\mathbf{K}^-$ denotes the Gram matrix of the negative instances.
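As a concrete illustration, the following is a minimal Python sketch of training one exemplar classifier per positive source instance with scikit-learn; the function name is ours, and using per-class weights to emulate the asymmetric costs $C_1$ and $C_2$ of (1) is an implementation choice, not a prescription from the paper.

```python
import numpy as np
from sklearn.svm import SVC

def train_exemplar_svms(X_pos, X_neg, C1=10.0, C2=0.1, gamma=1.0):
    """Train one RBF-kernel exemplar classifier per positive instance, cf. (1)."""
    classifiers = []
    for x_plus in X_pos:
        X = np.vstack([x_plus[None, :], X_neg])        # one positive plus all negatives
        y = np.hstack([[1], -np.ones(len(X_neg))])
        # class_weight rescales C per class, emulating the C1/C2 asymmetry
        clf = SVC(kernel="rbf", gamma=gamma, C=1.0, class_weight={1: C1, -1: C2})
        clf.fit(X, y)
        classifiers.append(clf)
    return classifiers
```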

3.2. Pseudo Label for Kernel Matrix. To make the best use of the samples in the source and the target, we construct the kernel matrix on the data of both domains. However, in the dual problem of SVM, the kernel matrix $\mathbf{K}$ needs to be supplied with labeled data, while our model addresses the unsupervised domain adaptation problem in which only source domain data are labeled. Motivated by [19], we use pseudo labels to help model training. The pseudo labels are predicted by a classical classifier, an SVM in our model, trained on the labeled source data. Due to the distribution mismatch between the source and target domain, many of these labels may be incorrect. Following [19], we assume that the pseudo class centroids computed from them may reside not far from the true class centroids. Thus, we use the data of both domains to supplement the kernel matrix $\mathbf{K}$ with label information. In our experiments, we verify that this method is effective.
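A minimal sketch of this pseudo-labeling step, assuming scikit-learn; the kernel and parameter values are illustrative.

```python
from sklearn.svm import SVC

def pseudo_labels(X_source, y_source, X_target):
    """Train a plain SVM on labeled source data and predict pseudo labels
    for the unlabeled target data; these only fill the label matrix U."""
    base = SVC(kernel="rbf", gamma=1.0, C=1.0)
    base.fit(X_source, y_source)
    return base.predict(X_target)
```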

3.3. Exemplar Learner in Domain Adaptation Form. In fact, each exemplar learner is an SVM in kernel form, trained on one positive instance and all the negative instances. In the opinion of [16], a discriminative exemplar classifier can be taken as a representation of a positive instance. In the tasks of object detection or image classification, this parametric form of representation is feasible because some characteristics of the samples, such as angle, color, orientation, and background, are hard to represent explicitly; the instance-based parametric discriminative classifier can include more information about the positive sample. Similarly, with the motivation of transfer learning, we can view a positive instance as a domain with some mismatch among domains. Our model aims to correct this mismatch and reduce the distance to the target domain. We construct an exemplar-learner distance metric of domains from MMD, which can be written as
$$\operatorname{dist}(\mathbf{x}_S, \mathbf{x}_T) = \left\| \phi(\mathbf{x}_S^+) + \frac{1}{n_S^-} \sum_{i=1}^{n_S^-} \phi(\mathbf{x}_{S_i}^-) - \frac{2}{n_T} \sum_{i=1}^{n_T} \phi(\mathbf{x}_{T_i}) \right\|_{\mathcal{H}}^2. \quad (4)$$
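Expanding the squared norm in (4) with the kernel trick yields an empirical estimate computable from Gram matrices alone. The sketch below assumes an RBF kernel via scikit-learn; the function name is illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def exemplar_mmd2(x_pos, X_neg, X_tgt, gamma=1.0):
    """Squared distance of (4), expanded as inner products in feature space:
    ||phi(x+) + (1/n-) sum phi(x-) - (2/nT) sum phi(xT)||^2."""
    k = lambda A, B: rbf_kernel(A, B, gamma=gamma)
    xp = x_pos[None, :]
    n_neg, n_tgt = len(X_neg), len(X_tgt)
    return (k(xp, xp).sum()
            + k(X_neg, X_neg).sum() / n_neg**2
            + 4.0 * k(X_tgt, X_tgt).sum() / n_tgt**2
            + 2.0 * k(xp, X_neg).sum() / n_neg
            - 4.0 * k(xp, X_tgt).sum() / n_tgt
            - 4.0 * k(X_neg, X_tgt).sum() / (n_neg * n_tgt))
```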


However, this is just a distance metric; our requirement is to minimize this distance by some transformation. Motivated by Transfer Component Analysis (TCA), we want to map the instances into a latent space in which the instances from the source and target domain are more similar, and we denote this mapping by $P(x)$. Namely, we aim to minimize the MMD distance between domains by mapping instances into another space. We extend the distance function as follows:
$$\operatorname{dist}(\mathbf{x}_S, \mathbf{x}_T) = \left\| \phi(P(\mathbf{x}_S^+)) + \frac{1}{n_S^-} \sum_{i=1}^{n_S^-} \phi(P(\mathbf{x}_{S_i}^-)) - \frac{2}{n_T} \sum_{i=1}^{n_T} \phi(P(\mathbf{x}_{T_i})) \right\|_{\mathcal{H}}^2. \quad (5)$$

Following the general approach, we reformulate (5) in a kernel matrix form. We define the Gram matrices on the source positive domain, source negative domain, and target domain. The kernel matrix $\mathbf{K}$ is composed of nine submatrices, $\mathbf{K}_{++}$, $\mathbf{K}_{+-}$, $\mathbf{K}_{+T}$, $\mathbf{K}_{-+}$, $\mathbf{K}_{--}$, $\mathbf{K}_{-T}$, $\mathbf{K}_{T+}$, $\mathbf{K}_{T-}$, $\mathbf{K}_{TT}$, where $K_{ij} = \phi(x_i)^T \phi(x_j)$:
$$\mathbf{K} = \begin{bmatrix} \mathbf{K}_{++} & \mathbf{K}_{+-} & \mathbf{K}_{+T} \\ \mathbf{K}_{-+} & \mathbf{K}_{--} & \mathbf{K}_{-T} \\ \mathbf{K}_{T+} & \mathbf{K}_{T-} & \mathbf{K}_{TT} \end{bmatrix} \in \mathbb{R}^{(1 + n_S^- + n_T) \times (1 + n_S^- + n_T)}, \quad (6)$$
and it yields the coefficient matrix $\mathbf{L}$:
$$L_{ij} = \begin{cases} 1 & \text{if } x_i, x_j \in \mathcal{X}_S^+, \\ \dfrac{1}{n_S^-} & \text{if } x_i \in \mathcal{X}_S^+,\; x_j \in \mathcal{X}_S^-, \\ -\dfrac{2}{n_T} & \text{if } x_i \in \mathcal{X}_S^+,\; x_j \in \mathcal{X}_T, \\ -\dfrac{2}{n_S^- n_T} & \text{if } x_i \in \mathcal{X}_T,\; x_j \in \mathcal{X}_S^-, \\ \dfrac{1}{(n_S^-)^2} & \text{if } x_i, x_j \in \mathcal{X}_S^-, \\ \dfrac{4}{n_T^2} & \text{if } x_i, x_j \in \mathcal{X}_T. \end{cases} \quad (7)$$
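Under our reading of (7), with index 0 for the positive exemplar followed by the $n_S^-$ source negatives and then the $n_T$ target instances, the coefficient matrix can be assembled as follows; the block ordering is the assumption here.

```python
import numpy as np

def mmd_coefficient_matrix(n_neg, n_tgt):
    """Coefficient matrix L of (7); index 0 is the positive exemplar,
    then n_neg source negatives, then n_tgt target instances."""
    n = 1 + n_neg + n_tgt
    L = np.zeros((n, n))
    pos = slice(0, 1)
    neg = slice(1, 1 + n_neg)
    tgt = slice(1 + n_neg, n)
    L[pos, pos] = 1.0
    L[pos, neg] = L[neg, pos] = 1.0 / n_neg
    L[pos, tgt] = L[tgt, pos] = -2.0 / n_tgt
    L[neg, tgt] = L[tgt, neg] = -2.0 / (n_neg * n_tgt)
    L[neg, neg] = 1.0 / n_neg**2
    L[tgt, tgt] = 4.0 / n_tgt**2
    return L
```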

Thus, the primal distance function is represented by $\operatorname{tr}(\mathbf{K}\mathbf{L})$. Motivated by TCA [40], the mapping of the primal data is equivalent to a transformation of the kernel matrix generated from the source and target domain data. Utilizing a low-dimensional transformation matrix $\mathbf{M} \in \mathbb{R}^{(1 + n_S^- + n_T) \times m}$ reduces the dimension of the primal kernel matrix: it maps the empirical kernel map $\widetilde{\mathbf{K}} = (\mathbf{K}\mathbf{K}^{-1/2})(\mathbf{K}^{-1/2}\mathbf{K})$ into an $m$-dimensional shared space, and we replace the distance function $\operatorname{tr}(\mathbf{K}\mathbf{L})$ by $\operatorname{tr}(\mathbf{K}\mathbf{M}\mathbf{M}^T\mathbf{K}\mathbf{L})$. In our case, we follow [40] and minimize the trace form of the distance:
$$\operatorname{dist}(\mathbf{x}_S^+, \mathbf{x}_S^-, \mathbf{x}_T) = \operatorname{tr}(\mathbf{M}^T \mathbf{K} \mathbf{L} \mathbf{K} \mathbf{M}). \quad (8)$$
For controlling the complexity of $\mathbf{M}$ and preserving the data characteristics, we add regularization and constraint items. The domain adaptation item is formulated following TCA and written as
$$\Omega(\mathbf{x}_S^+, \mathbf{x}_S^-, \mathbf{x}_T) = \operatorname{tr}(\mathbf{M}^T \mathbf{K} \mathbf{L} \mathbf{K} \mathbf{M}) + \mu \operatorname{tr}(\mathbf{M}^T \mathbf{M}), \quad \text{s.t. } \mathbf{M}^T \mathbf{K} \mathbf{H} \mathbf{K} \mathbf{M} = \mathbf{I}_m, \quad (9)$$
where $\mu > 0$ is a tradeoff parameter, $\mathbf{I}_m \in \mathbb{R}^{m \times m}$ is an identity matrix, and $\mathbf{H} = \mathbf{I}_{n_S^- + n_T + 1} - \frac{1}{n_S^- + n_T + 1}\, \mathbf{e}\mathbf{e}^T$ is a centering matrix.

Furthermore, the objective function of the dual SVM needs the training label information added, similarly to our model. Thus, we construct the training label matrix $\mathbf{U}$:
$$\mathbf{U} = \operatorname{diag}(\mathbf{y}_S^+, \mathbf{y}_S^-, \mathbf{y}_T), \quad (10)$$
where $\mathbf{y}_S^+$ is the label of the positive instance, $\mathbf{y}_S^-$ is the label vector of the negative source instances, and $\mathbf{y}_T$ contains the pseudo labels of the target instances, predicted by the SVM as described before. It can be rewritten in another form:
$$\mathbf{U} = \operatorname{diag}(1, \underbrace{-1, \ldots, -1}_{n_S^-}, \underbrace{y_{T_1}, \ldots, y_{T_{n_T}}}_{n_T}). \quad (11)$$
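For completeness, a small sketch of the centering matrix $\mathbf{H}$ of (9) and the label matrix $\mathbf{U}$ of (11); we assume the pseudo labels are encoded in $\{-1, +1\}$ relative to the exemplar's class.

```python
import numpy as np

def centering_matrix(n):
    """H = I - (1/n) e e^T, cf. (9)."""
    return np.eye(n) - np.ones((n, n)) / n

def label_matrix(y_target_pseudo, n_neg):
    """U = diag(1, -1, ..., -1, pseudo target labels), cf. (11)."""
    return np.diag(np.r_[1.0, -np.ones(n_neg), y_target_pseudo])
```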

The label matrix $\mathbf{U}$ provides the information of the source domain labels and the target domain pseudo labels. The matrix $\mathbf{K}$ in the dual problem of the exemplar-SVM (2) is the primal data kernel matrix; we want to replace it by the kernel matrix mapped into the latent subspace. Namely, we replace $\mathbf{K}$ by $\widetilde{\mathbf{K}}$, and the final objective function of each DAESVM model is formulated as follows:
$$\begin{aligned} \min_{\boldsymbol{\alpha}, \mathbf{M}} \quad & \boldsymbol{\alpha}^T \widetilde{\mathbf{K}} \boldsymbol{\alpha} - \mathbf{e}^T \boldsymbol{\alpha} + \lambda \operatorname{tr}(\mathbf{M}^T \mathbf{K} \mathbf{L} \mathbf{K} \mathbf{M}) + \mu \operatorname{tr}(\mathbf{M}^T \mathbf{M}) \\ \text{s.t.} \quad & \alpha_0 - \sum_{i=1}^{n_S^- + n_T} \alpha_i = 0, \\ & 0 \le \alpha_0 \le C_1, \quad 0 \le \alpha_i \le C_2 \quad \forall i \ge 1, \\ & \mathbf{M}^T \mathbf{K} \mathbf{H} \mathbf{K} \mathbf{M} = \mathbf{I}_m, \\ & \widetilde{\mathbf{K}} = \mathbf{U} \mathbf{K} \mathbf{M} \mathbf{M}^T \mathbf{K} \mathbf{U}. \end{aligned} \quad (12)$$

4. Optimization Algorithm

To minimize problem (12), we adopt an alternating optimization method, which alternates between solving two subproblems: one over the parameter $\boldsymbol{\alpha}$ and one over the mapping matrix $\mathbf{M}$. Under this scheme, the alternating optimization approach is guaranteed to decrease the objective function. Algorithm 1 summarizes the optimization procedure for problem (12).


Input: $\mathbf{X}_{tr}$, $\mathbf{X}_{te}$; parameters $\lambda$, $\mu$, $m$, $C_1$, and $C_2$.
Output: optimal $\boldsymbol{\alpha}$ and $\mathbf{M}$.
(1) Initialize $\boldsymbol{\alpha} = \mathbf{0}$.
(2) Construct the kernel matrix $\mathbf{K}$ from $\mathbf{X}_{tr}$ and $\mathbf{X}_{te}$ based on (6), the coefficient matrix $\mathbf{L}$ based on (7), the centering matrix $\mathbf{H}$, and the label matrix $\mathbf{U}$ based on (11).
(3) repeat
(4) Update the transformation matrix $\mathbf{M}$ with $\boldsymbol{\alpha}$ fixed:
(5) eigendecompose $(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_m)^{-1}\mathbf{K}\mathbf{H}\mathbf{K}$ and select the $m$ leading eigenvectors to construct $\mathbf{M}$.
(6) Solve the convex optimization problem for fixed $\mathbf{M}$ to optimize $\boldsymbol{\alpha}$.
(7) until convergence

Algorithm 1: Domain Adaptation Exemplar Support Vector Machine.
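Step (5) of Algorithm 1 reduces to an eigendecomposition. Below is a numpy sketch under the assumption that $\mathbf{K}$, $\mathbf{L}$, $\mathbf{U}$, and $\mathbf{H}$ are built as above; since the matrix is not symmetric in general, we sort the eigenvalues by their real part.

```python
import numpy as np

def update_M(K, L, U, alpha, H, lam=1.0, mu=1.0, m=40):
    """Select the m leading eigenvectors of
    (K U a a^T U K + lam*K L K - mu*I)^{-1} K H K, cf. step (5) of Algorithm 1."""
    Ka = K @ (U @ alpha)                               # K U alpha as a vector
    A = np.outer(Ka, Ka) + lam * (K @ L @ K) - mu * np.eye(K.shape[0])
    B = K @ H @ K
    vals, vecs = np.linalg.eig(np.linalg.solve(A, B))  # A^{-1} B without explicit inverse
    lead = np.argsort(-vals.real)[:m]                  # m leading eigenvectors
    return vecs[:, lead].real
```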

Minimizing over $\mathbf{M}$. The optimization over $\mathbf{M}$ can be rewritten in the following form:
$$\begin{aligned} \min_{\mathbf{M}} \quad & \boldsymbol{\alpha}^T \mathbf{U} \mathbf{K} \mathbf{M} \mathbf{M}^T \mathbf{K} \mathbf{U} \boldsymbol{\alpha} - \mathbf{e}^T \boldsymbol{\alpha} + \lambda \operatorname{tr}(\mathbf{M}^T \mathbf{K} \mathbf{L} \mathbf{K} \mathbf{M}) + \mu \operatorname{tr}(\mathbf{M}^T \mathbf{M}) \\ \text{s.t.} \quad & \mathbf{M}^T \mathbf{K} \mathbf{H} \mathbf{K} \mathbf{M} = \mathbf{I}_m. \end{aligned} \quad (13)$$
Similarly to TCA, the formulation contains a nonconvex norm constraint, and we transform this optimization problem by reformulating it as
$$\max_{\mathbf{M}} \operatorname{tr}\left( \left( \mathbf{M}^T (\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_m) \mathbf{M} \right)^{-1} \mathbf{M}^T \mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M} \right). \quad (14)$$

Proof. The Lagrangian of (13) is
$$\mathcal{L}(\mathbf{M}, \mathbf{Z}) = \boldsymbol{\alpha}^T \mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^T\mathbf{K}\mathbf{U}\boldsymbol{\alpha} - \mathbf{e}^T\boldsymbol{\alpha} + \lambda \operatorname{tr}(\mathbf{M}^T\mathbf{K}\mathbf{L}\mathbf{K}\mathbf{M}) - \mu \operatorname{tr}(\mathbf{M}^T\mathbf{M}) - \operatorname{tr}\left( (\mathbf{M}^T\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M} - \mathbf{I}_m)\mathbf{Z} \right). \quad (15)$$
Because the initial kernel matrix $\mathbf{K}$ is symmetric, we can rewrite the first term of (15):
$$\begin{aligned} \boldsymbol{\alpha}^T \mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^T\mathbf{K}\mathbf{U}\boldsymbol{\alpha} &= \operatorname{tr}(\boldsymbol{\alpha}^T \mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^T\mathbf{K}\mathbf{U}\boldsymbol{\alpha}) \\ &= \operatorname{tr}\left[ (\mathbf{M}^T\mathbf{K}^T\mathbf{U}\boldsymbol{\alpha})^T (\mathbf{M}^T\mathbf{K}^T\mathbf{U}\boldsymbol{\alpha}) \right] \\ &= \operatorname{tr}\left[ (\mathbf{M}^T\mathbf{K}^T\mathbf{U}\boldsymbol{\alpha}) (\mathbf{M}^T\mathbf{K}^T\mathbf{U}\boldsymbol{\alpha})^T \right] \\ &= \operatorname{tr}(\mathbf{M}^T\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K}\mathbf{M}). \end{aligned} \quad (16)$$
The Lagrangian, up to terms independent of $\mathbf{M}$, is then written as
$$\operatorname{tr}\left( \mathbf{M}^T (\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_m) \mathbf{M} \right) - \operatorname{tr}\left( (\mathbf{M}^T\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M} - \mathbf{I}_m)\mathbf{Z} \right). \quad (17)$$
The derivative of (17) with respect to $\mathbf{M}$ is
$$(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_m)\mathbf{M} - \mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M}\mathbf{Z}. \quad (18)$$
Setting the derivative above to zero, we get $\mathbf{Z}$ as
$$\mathbf{Z} = (\mathbf{M}^T\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M})^{\dagger}\, \mathbf{M}^T (\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_m)\mathbf{M}. \quad (19)$$
Substituting $\mathbf{Z}$ into (17), we obtain
$$\min_{\mathbf{M}} \operatorname{tr}\left( (\mathbf{M}^T\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M})^{\dagger}\, \mathbf{M}^T (\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_m)\mathbf{M} \right). \quad (20)$$
Finally, we obtain the equivalent maximization problem (14).

Similarly to TCA, the solution is given by the $m$ leading eigenvectors of $(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_m)^{-1}\mathbf{K}\mathbf{H}\mathbf{K}$.

Minimizing over $\boldsymbol{\alpha}$. The optimization over $\boldsymbol{\alpha}$ can be rewritten in the following QP form:

$$\begin{aligned} \min_{\boldsymbol{\alpha}} \quad & \boldsymbol{\alpha}^T \widetilde{\mathbf{K}} \boldsymbol{\alpha} - \mathbf{e}^T \boldsymbol{\alpha} \\ \text{s.t.} \quad & \alpha_0 - \sum_{i=1}^{n_S^- + n_T} \alpha_i = 0, \\ & 0 \le \alpha_0 \le C_1, \quad 0 \le \alpha_i \le C_2 \quad \forall i \ge 1, \end{aligned} \quad (21)$$
where $\widetilde{\mathbf{K}} = \mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^T\mathbf{K}\mathbf{U}$ is the kernel matrix transformed by the transformation matrix $\mathbf{M}$. This is clearly a QP problem, and it can be solved efficiently using interior point methods or other successive optimization procedures such as the Alternating Direction Method of Multipliers (ADMM).
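The QP (21) can be handed to an off-the-shelf solver. The sketch below uses cvxopt, which solves $\min_x \tfrac{1}{2}x^TPx + q^Tx$ subject to $Gx \le h$ and $Ax = b$; the mapping of (21) onto this form is spelled out in the comments, and the helper name is ours.

```python
import numpy as np
from cvxopt import matrix, solvers

def solve_alpha(K_tilde, C1, C2):
    """Solve (21): min a' K~ a - e' a with the equality and box constraints."""
    n = K_tilde.shape[0]
    P = matrix(2.0 * K_tilde)                        # cvxopt minimizes (1/2) x'Px, hence the 2
    q = matrix(-np.ones(n))
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))   # -a <= 0 and a <= upper bounds
    upper = np.r_[C1, C2 * np.ones(n - 1)]           # a0 <= C1, ai <= C2 for i >= 1
    h = matrix(np.r_[np.zeros(n), upper])
    A = matrix(np.r_[1.0, -np.ones(n - 1)].reshape(1, n))  # a0 - sum_i ai = 0
    b = matrix(0.0)
    sol = solvers.qp(P, q, G, h, A, b)
    return np.array(sol["x"]).ravel()
```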

5. Ensemble Domain Adaptation Exemplar Classifiers

In this section, we introduce the method for integrating the exemplar classifiers. As mentioned before, we obtain as many classifiers as there are source domain instances, and this section aims to predict labels for the target domain instances. In our opinion, the classification hyperplane of an exemplar classifier is a representation of one source domain positive instance. However, most of the hyperplanes contain information that comes from various aspects of the samples, such as images with different backgrounds or sources. In fact, we aim to find the exemplar classifiers trained on instances similar to the testing sample. Thus, we use the integration method to filter out classifiers whose captured details differ from the testing sample. Another view of the integration method is that it slacks part of the hyperplanes; namely, it removes exemplar classifiers that were trained on instances with a large distribution mismatch.

In our method, we first construct the classifiers from the Lagrange multipliers $\boldsymbol{\alpha}$. The classifier construction equation is
$$\mathbf{w} = \alpha_0 \mathbf{x}^+ - \sum_{i=1}^{n_S^- + n_T} \alpha_i \mathbf{x}_i^-, \quad (22)$$
where $\mathbf{w}$ is the weight of the classifier, and
$$b = y_j - \alpha_0 K_{0j} - \sum_{i=1}^{n_S^- + n_T} y_i \alpha_i K_{ij}, \quad (23)$$
where $b$ is the bias of the classifier. The classifier is given by
$$s = \mathbf{w}^T \mathbf{x} + b. \quad (24)$$
We then compute the score of every classifier on the testing instance. Second, we find the top $\mathcal{P}$ scores among the classifiers of each class and compute the sum of those scores. At last, we obtain a score for each class, and the class with the highest score is the category that we predict. The prediction method is described in Algorithm 2.

Input: $\mathbf{y}_S$, $\boldsymbol{\alpha}$, $\mathbf{X}_{te}$; parameter $\mathcal{P}$.
Output: prediction labels $\mathbf{y}$.
(1) Compute the weights $\mathbf{w}$ of the classifiers.
(2) Construct the weight matrix $\mathbf{W}$ and bias $\mathbf{b}$ of the predictors based on $\boldsymbol{\alpha}$.
(3) repeat
(4) Compute the scores of each classifier in this category.
(5) Find the top $\mathcal{P}$ scores.
(6) Compute the sum of these top scores.
(7) until all categories have been processed
(8) Choose the category with the maximum score as the prediction label $\mathbf{y}$.

Algorithm 2: Ensemble Domain Adaptation Exemplar Classifiers.
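A compact sketch of the top-$\mathcal{P}$ aggregation of Algorithm 2, assuming a precomputed score matrix with one row per exemplar classifier and the class label of each exemplar; the names are illustrative.

```python
import numpy as np

def predict_ensemble(scores, exemplar_labels, P=5):
    """scores: (n_exemplars, n_test) classifier responses on the test set.
    Sum the top-P scores per category and pick the best category."""
    exemplar_labels = np.asarray(exemplar_labels)
    classes = np.unique(exemplar_labels)
    votes = np.zeros((len(classes), scores.shape[1]))
    for idx, c in enumerate(classes):
        s = np.sort(scores[exemplar_labels == c], axis=0)  # this category's exemplars
        votes[idx] = s[-P:].sum(axis=0)                    # sum of the top-P responses
    return classes[np.argmax(votes, axis=0)]
```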

6. Experiments

In this section, we conduct experiments on the four domains Amazon, DSLR, Caltech, and Webcam to evaluate the performance of the proposed Domain Adaptation Exemplar Support Vector Machines. We first compare our method to baselines and other domain adaptation methods. Next, we analyze the effectiveness of our approach. At last, we discuss parameter sensitivity.

6.1. Data Preparation. We run the experiments on the Office and Office-Caltech datasets. The Office dataset contains three domains, Amazon, Webcam, and DSLR; each of them includes images from amazon.com or office-environment images taken with varying lighting and pose changes using a webcam or a DSLR camera. The Office-Caltech dataset contains the ten categories shared between the Office dataset and the Caltech-256 dataset. Following the standard transfer learning experimental method, we merge the two datasets; they jointly include the four domains Amazon, DSLR, Caltech, and Webcam, which are studied in [41]. The Amazon dataset consists of images downloaded from Amazon merchants. The images in Webcam also come from online web pages, but they are of low quality, as they were taken by a web camera. The DSLR domain was photographed with a digital SLR camera, so the images are of high quality. Caltech is often added to domain adaptation experiments and was collected for object detection tasks. Each domain has its own characteristics. Compared to the other domains, the quality of images in DSLR is higher, and influence factors such as object detection and background matter less than in images downloaded from the web. Amazon and Webcam come from the web, and the images in these domains are of lower quality and higher complexity. However, there are some differences in the details: instances in Webcam show the object alone, while the composition of samples in Amazon is more complex, including background and other goods. Figure 1 shows examples of the backpack from the four domains. In the view of transfer learning, the datasets come from different domains with different marginal probabilities of the images. In our model, we aim to solve this problem and obtain a robust classifier across domains.

We chose ten common categories among all four datasets: backpack, bike, bike helmet, bookcase, bottle, calculator, desk chair, desk lamp, desktop computer, and file cabinet. There are 8 to 151 samples per category in a domain: 958 images in Amazon, 295 images in Webcam, 157 images in DSLR, and 1123 images in Caltech, for 2533 images in total. Figure 1 shows examples from the datasets.

We use both SURF and DeCAF feature extraction in the experiments. First, we use SURF features to encode the images into 800-bin histograms. Next, we use DeCAF features, extracted from the seventh layer of AlexNet [42], as 4096-bin histograms. At last, we normalize the histograms and then z-score them to have zero mean and unit standard deviation in each dimension.
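One plausible numpy reading of this preprocessing; normalizing each histogram to unit sum before z-scoring is our assumption.

```python
import numpy as np

def normalize_features(H):
    """Normalize each histogram, then z-score every dimension across images."""
    H = H / H.sum(axis=1, keepdims=True)          # per-image histogram normalization
    return (H - H.mean(axis=0)) / H.std(axis=0)   # zero mean, unit std per dimension
```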

We run our experiments in the standard way for visual domain adaptation: one of the four datasets is used as the source domain and another as the target domain. Each dataset provides the same ten categories and uses the same representation of images, which is considered a problem of homogeneous domain adaptation. For example, we choose images taken from DSLR (denoted by D) as source domain data and images in Amazon (denoted by A) as target domain data; this problem is denoted as D → A. Using this method, we can compose 12 domain adaptation subproblems from the four domains.
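The 12 subproblems are simply the ordered pairs of distinct domains; for instance:

```python
from itertools import permutations

domains = ["A", "C", "D", "W"]        # Amazon, Caltech, DSLR, Webcam
tasks = [f"{src} -> {tgt}" for src, tgt in permutations(domains, 2)]
assert len(tasks) == 12               # the 12 single-source adaptation tasks
```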

6.2. Experiment Setup

(1) Baseline Methods. We compare our DAESVM method with three kinds of classical approaches: classifiers without transfer learning regularization, conventional transfer learning methods, and the foundational model, the low-rank exemplar support vector machine. The methods are listed as follows:

(1) Transfer Component Analysis (TCA) [40]
(2) Support Vector Machine (SVM) [43]
(3) Geodesic Flow Kernel (GFK) [28]
(4) Landmarks Selection-based Subspace Alignment (LSSA) [23]
(5) Kernel Mean Matching (KMM) [20]
(6) Subspace Alignment (SA) [44]
(7) Transfer Joint Matching (TJM) [45]
(8) Low-Rank Exemplar-SVMs (LRESVMs) [18]

TCA, GFK, and KMM are classical transfer learning methods against which we compare our model. Besides, we show that our method is more robust than models without domain adaptation items in the transfer learning scenario. TCA is the foundation of our model; it is similar to GFK and SA, which are based on the idea of feature transfer. KMM transfers knowledge by instance reweighting. TJM is a popular model for the problem of unsupervised domain adaptation. SA and LSSA transfer knowledge by aligning subspaces, with LSSA additionally selecting landmarks.

(2) Implementation Details. As a baseline, SVM is trained on the source data and tested on the target data [46]. TCA, SA, LSSA, TJM, and GFK are first used as dimension reduction processes; a classifier is then trained on the source data and makes predictions for the target domain [19]. Similarly, KMM first computes the weight of each instance and then trains the predictor on the reweighted source data.

Under the assumption of unsupervised domain adaptation, it is impossible to tune the optimal parameters for the target domain task by cross validation, since there exists a distribution mismatch between domains. Therefore, in the experiments we adopt the strategy of grid search to obtain the best parameters and report the best results. Our method involves five tunable parameters: the tradeoffs $C_1$ and $C_2$ in the ESVM, the tradeoffs $\lambda$ and $\mu$ in the regularization items, and the dimension reduction parameter $m$. The tradeoff parameters $C_1$ and $C_2$ are selected over $\{10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}\}$. We fix $\lambda = 1$, $\mu = 1$, and $m = 40$ empirically and select the radial basis function (RBF) as the kernel function. In fact, our model is relatively stable under a wide range of parameter values. We train a classifier for every positive instance in the source domain data, and then we calibrate the outputs into a probability distribution. We handle the multiclass classifier in a one-versus-the-others way. To measure the performance of our method, we use the average accuracy and the standard deviation over ten repetitions. The average testing accuracies and standard errors for all 12 tasks of our method are reported in Table 1. Most of the baseline results are cited from previously published papers.
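A sketch of the grid search over $(C_1, C_2)$; the `evaluate` argument is a hypothetical hook that trains DAESVMs with a given pair and returns target accuracy.

```python
from itertools import product

def grid_search(evaluate, grid=tuple(10.0 ** k for k in range(-3, 4))):
    """Exhaustively search (C1, C2) over {1e-3, ..., 1e3}; `evaluate` is a
    user-supplied (hypothetical) function returning accuracy for a pair."""
    return max(product(grid, grid), key=lambda pair: evaluate(*pair))
```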

6.3. Experimental Results. In this section, we compare our DAESVM with the baseline methods regarding classification accuracy.

Table 1 summarizes the classification accuracies obtained on all 10 categories over the 12 tasks generated from the 4 domains. The highest accuracy is in bold font, which indicates that the performance on that task is better than the others. First, we implement the traditional classifiers without domain adaptation items: we train the predictors on the source domain data and make predictions for the target domain dataset. Second, we compare our DAESVM with unsupervised domain adaptation methods such as TCA and GFK, implemented with the same dimension reduction parameter $m$ as in our model. At last, we also compare DAESVM with recently proposed transfer learning models such as low-rank exemplar-SVMs [18].


Figure 1: Example images from the backpack category in Amazon and DSLR ((a), from left to right) and Webcam and Caltech-256 ((b), from left to right). The images from different domains vary in style, background, and source.

Overall, in the usual transfer learning way, we run the datasets across different pairs of source and target domains. The accuracy of DAESVM for the adaptation from DSLR to Webcam reaches 92.1%, an improvement over LRESVM of about 1.1%. Compared with TCA, DAESVMs take into consideration the distribution mismatch among instances as well as among domains. The adaptation from Webcam to DSLR attains an accuracy of 91.8%. For the domain datasets Amazon and Caltech, which are larger than DSLR and Webcam, DAESVM attains an accuracy of 77.5%, a relative improvement of about 36.2% over TJM. For transferring knowledge from a large dataset to a small one, from Amazon to DSLR, we obtain an accuracy of 76.8%; conversely, from DSLR to Amazon, the prediction accuracy is 83.4%. Generally speaking, our DAESVM trained on one domain performs well and is also robust in the multidomain setting.

Table 1: Classification accuracies of different methods for different tasks of domain adaptation. We conduct the experiments on conventional transfer learning methods. Compared with traditional methods, DAESVMs gain a big improvement in prediction accuracy, and they also improve on the recently proposed LRESVM approach [average ± standard error of accuracy (%)].

Task | SVM | KMM | TCA | TJM | SA | GFK | LSSA | LRESVM | DAESVMs
A → C | 45.4 | 42.2 | 45.3 | 56.9 | 51.8 | 49.6 | 54.8 | 79.8 | 77.5 ± 0.79
A → D | 50.7 | 42.7 | 60.3 | 56.4 | 56.4 | 55.7 | 57.3 | 74.9 | 76.8 ± 0.76
A → W | 47.4 | 42.4 | 61.3 | 51.0 | 54.7 | 56.9 | 56.7 | 75.4 | 73.2 ± 1.08
C → A | 50.7 | 48.3 | 54.7 | 58.6 | 57.1 | 51.2 | 58.4 | 77.2 | 80.2 ± 0.39
C → D | 53.2 | 53.5 | 56.4 | 57.4 | 59.0 | 57.1 | 59.1 | 87.1 | 89.0 ± 0.23
C → W | 44.2 | 45.8 | 50.4 | 58.8 | 62.7 | 57.1 | 58.1 | 74.1 | 74.7 ± 0.38
D → A | 40.8 | 42.2 | 53.8 | 46.1 | 58.9 | 59.2 | 58.4 | 80.4 | 83.4 ± 1.41
D → C | 48.3 | 41.6 | 43.9 | 49.6 | 54.3 | 59.4 | 57.7 | 79.0 | 73.0 ± 1.04
D → W | 67.8 | 72.9 | 82.4 | 82.0 | 83.4 | 80.2 | 87.1 | 91.0 | 92.1 ± 0.25
W → A | 42.4 | 41.9 | 53.0 | 50.8 | 57.0 | 66.2 | 59.7 | 74.3 | 77.8 ± 0.33
W → C | 41.2 | 39.0 | 53.7 | 54.8 | 34.7 | 52.4 | 54.2 | 70.6 | 66.5 ± 0.54
W → D | 80.2 | 82.0 | 87.9 | 83.4 | 78.9 | 81.2 | 87.2 | 89.2 | 91.8 ± 0.59
Average | 51.0 | 49.5 | 58.6 | 58.8 | 59.1 | 60.5 | 62.4 | 79.4 | 80.0 ± 0.67

Table 2: We also conduct our experiments on multidomain tasks and gain an improvement compared with previously proposed methods. The experiments adopt the same strategy as single-domain adaptation: we treat multiple domains as one source or target to find the shared features in a latent space. However, the complexity of the multidomain shared features limits the accuracy of the tasks [average ± standard error of accuracy (%)].

Task | SVM | KMM | TCA | TJM | SA | GFK | LSSA | LRESVM | DAESVM
D,W → A | 45.7 | 37.4 | 40.5 | 57.1 | 59.4 | 47.3 | 61.7 | 80.1 | 77.2 ± 1.27
A,D → C,W | 37.1 | 31.6 | 43.0 | 60.2 | 48.7 | 47.6 | 74.2 | 86.9 | 84.7 ± 0.65
D → A,C,W | 41.4 | 43.8 | 57.2 | 63.9 | 51.9 | 51.4 | 77.0 | 82.9 | 88.4 ± 0.21
A,D,W → C | 43.9 | 50.6 | 54.9 | 69.0 | 60.2 | 60.4 | 63.7 | 87.7 | 90.1 ± 0.34
A,D → W | 71.0 | 61.0 | 54.0 | 61.3 | 54.0 | 47.0 | 71.9 | 80.8 | 83.8 ± 0.78
A,C → D,W | 81.4 | 53.9 | 77.4 | 71.8 | 57.4 | 64.1 | 80.7 | 89.3 | 92.4 ± 0.25
Average | 53.4 | 46.4 | 54.5 | 63.9 | 55.2 | 53.0 | 71.5 | 84.6 | 86.1 ± 0.58

We also conduct tasks of multidomain adaptation, which use one or more domains as the source domain data and adapt to the other domains. The results are shown in Table 2. The accuracy of DAESVM for the adaptation from Amazon, DSLR, and Webcam to Caltech achieves 90.1%, an improvement over LRESVM. For the task of adaptation from Amazon and Caltech to Webcam and DSLR, it attains an accuracy of 92.4%. The experiments prove that our models are effective not only for single-domain adaptation but also for multidomain adaptation.

Two key factors may contribute to the superiority of our method. First, the feature transfer regularization item is utilized to slack the similarity assumption: it only assumes that there are some shared features in different domains, instead of assuming that different domains are similar to each other. This factor makes the model more robust than models with a reweighting item. The second factor is the exemplar-SVMs, which were proposed with a transfer learning motivation and take into consideration that instances have distribution mismatch with each other. Our model combines these two factors to resist the problem of distribution mismatch among domains and sample selection bias among instances.

6.4. Pseudo Label Effectiveness. Following [19], we use pseudo labels to supplement model training. In our experiments, we test how the prediction results are influenced by the accuracy rate of the pseudo labels. As described by Figure 2, the prediction accuracy improves as the accuracy of the pseudo labels increases. This proves that the pseudo-label method is effective, and we can iterate by using the labels predicted by the DAESVM as the pseudo labels; the iteration step can efficiently enhance the performance of the classifiers.

Figure 2: The accuracy of DAESVMs improves with the improvement of the pseudo-label accuracy (x-axis: pseudo-label accuracy; y-axis: prediction accuracy). The results verify the effectiveness of the pseudo-label method.

6.5. Parameter Sensitivity. There are five parameters in our model; we conduct a parameter sensitivity analysis, which shows that optimal performance can be achieved under a wide range of parameter values, and discuss the results.

(1) Tradeoff $\lambda$. $\lambda$ is a tradeoff that controls the weight of the MMD item, which aims to minimize the distribution mismatch between the source and target domain. Theoretically, we want this term to be equal to zero. However, if we set this parameter to infinity, $\lambda \rightarrow \infty$, the model may lose the data properties when transforming the source and target domain data into the high-dimensional space. Conversely, if we set $\lambda$ to zero, the model would lose the function of correcting the distribution mismatch.

(2) Tradeoff $\mu$. $\mu$ is a tradeoff that controls the weight of the data variance item, which aims to preserve the data properties. Theoretically, we want this item to be equal to zero. However, if we set this parameter to infinity, $\mu \rightarrow \infty$, it may augment the data distribution mismatch among different domains; namely, the transformation matrix $\mathbf{M}$ cannot utilize the source data to assist the target task. Conversely, if we set $\mu$ to zero, the model cannot preserve the properties of the original data.

(3) Dimension Reduction $m$. $m$ is the dimension of the transformation matrix, namely, the dimension of the subspace into which we want to map the samples. If $m$ is too small, the properties of the data may be lost, which may cause the classifier to fail; if $m$ is too large, the effectiveness of correcting the distribution mismatch may be lost. We examine the classification results influenced by the dimension $m$, and the results are displayed in Figure 3.

(4) Tradeoffs $C_1$ and $C_2$ in ESVM. Parameters $C_1$ and $C_2$ are the upper bounds of the Lagrangian variables. In the standard SVM, positive and negative instances share the same value of these two parameters. In our models, we expect the weights of the positive samples to be higher than those of the negative samples. In our experiments, the value of $C_1$ is one hundred times $C_2$, which yields a high-performance predictor. The visual analysis of these two parameters is in Figure 4.

Figure 3: Prediction accuracy as a function of the dimension $m$ (x-axis: dimension $m$, from 20 to 100; y-axis: prediction accuracy). When the dimension is 20 or 40, the prediction accuracy is higher than for the other values.

Figure 4: Prediction accuracy over the $(C_1, C_2)$ grid. We fix $\lambda = 1$, $m = 20$, and $\mu = 1$ in these experiments; $C_1$ is searched in $\{0.1, 0.5, 1, 5, 10, 50, 100\}$ and $C_2$ is searched in $\{0.001, 0.005, 0.01, 0.1, 0.5, 1, 10\}$.

7. Conclusion

In this paper, we have proposed an effective method for domain adaptation problems with a regularization item that reduces the data distribution mismatch between domains and preserves the properties of the original data. Furthermore, the method of integrating the classifiers allows predicting target domain data with high accuracy. The proposed method mainly aims to solve problems in which distribution mismatch occurs among domains or instances. Meanwhile, we extend DAESVMs to multiple source or target domains. Experiments were conducted on transfer learning datasets, transferring knowledge from image to image.


Our future works are as follows. First, we will integrate the training process of all the classifiers in an ensemble way; it is better to accelerate the training process by rewriting all the weights in a matrix form, a strategy that can omit the matrix inversion step of the optimization. Second, we want to impose a constraint on $\boldsymbol{\alpha}$ that preserves sparsity. At last, we will extend DAESVMs to the problem of transferring knowledge among domains that have few relationships, such as transferring knowledge from image to video or text.

Notations and Descriptions

$\mathcal{D}_S$/$\mathcal{D}_T$: Source/target domain
$\mathcal{T}_S$/$\mathcal{T}_T$: Source/target task
$d$: Dimension of the features
$\mathbf{X}_S$/$\mathbf{X}_T$: Source/target sample matrix
$\mathbf{y}_S$/$\mathbf{y}_T$: Source/target sample label vector
$\mathbf{K}$: Kernel matrix without label information
$\boldsymbol{\alpha}$: Lagrange multiplier vector
$n_S$/$n_T$: Number of source/target domain instances
$\mathbf{e}$: All-ones vector
$\mathbf{I}$: Identity matrix

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work has been partially supported by grants from the National Natural Science Foundation of China (nos. 61472390, 71731009, 91546201, and 11771038) and the Beijing Natural Science Foundation (no. 1162005).

References

[1] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
[2] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning, pp. 160–167, ACM, July 2008.
[3] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 580–587, Columbus, Ohio, USA, June 2014.
[4] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[5] W.-S. Chu, F. D. L. Torre, and J. F. Cohn, "Selective transfer machine for personalized facial action unit detection," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), pp. 3515–3522, USA, June 2013.
[6] A. Kumar, A. Saha, and H. Daume, "Co-regularization based semi-supervised domain adaptation," in Advances in Neural Information Processing Systems 23, pp. 478–486, 2010.
[7] M. Xiao and Y. Guo, "Feature space independent semi-supervised domain adaptation via kernel matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 1, pp. 54–66, 2015.
[8] S. J. Pan, J. T. Kwok, Q. Yang, and J. J. Pan, "Adaptive localization in a dynamic WiFi environment through multi-view learning," in Proceedings of the 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference (AAAI-07/IAAI-07), pp. 1108–1113, Canada, July 2007.
[9] A. Van Engelen, A. C. Van Dijk, M. T. B. Truijman et al., "Multi-center MRI carotid plaque component segmentation using feature normalization and transfer learning," IEEE Transactions on Medical Imaging, vol. 34, no. 6, pp. 1294–1305, 2015.
[10] Y. Zhang, J. Wu, Z. Cai, P. Zhang, and L. Chen, "Memetic extreme learning machine," Pattern Recognition, vol. 58, pp. 135–148, 2016.
[11] M. Uzair and A. Mian, "Blind domain adaptation with augmented extreme learning machine features," IEEE Transactions on Cybernetics, vol. 47, no. 3, pp. 651–660, 2017.
[12] L. Zhang and D. Zhang, "Domain adaptation extreme learning machines for drift compensation in E-nose systems," IEEE Transactions on Instrumentation and Measurement, vol. 64, no. 7, pp. 1790–1801, 2015.
[13] A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola, "A kernel method for the two-sample-problem," in Advances in Neural Information Processing Systems, pp. 513–520, 2008.
[14] J. Wu, S. Pan, X. Zhu, C. Zhang, and X. Wu, "Multi-instance learning with discriminative bag mapping," IEEE Transactions on Knowledge and Data Engineering, pp. 1–1.
[15] J. Wu, S. Pan, X. Zhu, C. Zhang, and P. S. Yu, "Multiple structure-view learning for graph classification," IEEE Transactions on Neural Networks and Learning Systems, vol. PP, no. 99, pp. 1–16, 2017.
[16] T. Malisiewicz, A. Gupta, and A. A. Efros, "Ensemble of exemplar-SVMs for object detection and beyond," in Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV 2011), pp. 89–96, Spain, November 2011.
[17] B. Zadrozny, "Learning and evaluating classifiers under sample selection bias," in Proceedings of the Twenty-First International Conference on Machine Learning, p. 114, Banff, Alberta, Canada, July 2004.
[18] W. Li, Z. Xu, D. Xu, D. Dai, and L. Van Gool, "Domain generalization and adaptation using low rank exemplar SVMs," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1.
[19] M. Long, J. Wang, G. Ding, S. J. Pan, and P. S. Yu, "Adaptation regularization: A general framework for transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 5, pp. 1076–1089, 2014.
[20] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf, "Correcting sample selection bias by unlabeled data," in Proceedings of the 20th Annual Conference on Neural Information Processing Systems (NIPS 2006), pp. 601–608, Canada, December 2006.
[21] M. Sugiyama and M. Kawanabe, Machine Learning in Non-Stationary Environments, The MIT Press, 2012.
[22] W. Dai, Q. Yang, G. Xue, and Y. Yu, "Boosting for transfer learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 193–200, New York, NY, USA, June 2007.
[23] R. Aljundi, R. Emonet, D. Muselet, and M. Sebban, "Landmarks-based kernelized subspace alignment for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 56–63, USA, June 2015.
[24] B. Tan, Y. Zhang, S. J. Pan, and Q. Yang, "Distant domain transfer learning," in Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), pp. 2604–2610, USA, February 2017.
[25] W.-S. Chu, F. De La Torre, and J. F. Cohn, "Selective transfer machine for personalized facial expression analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 3, pp. 529–545, 2017.
[26] R. Aljundi, J. Lehaire, F. Prost-Boucle, O. Rouvière, and C. Lartizien, "Transfer learning for prostate cancer mapping based on multicentric MR imaging databases," Lecture Notes in Computer Science, vol. 9487, pp. 74–82, 2015.
[27] M. Long, Transfer learning: problems and methods [Ph.D. thesis], Tsinghua University, 2014.
[28] B. Gong, Y. Shi, F. Sha, and K. Grauman, "Geodesic flow kernel for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2066–2073, June 2012.
[29] J. Blitzer, R. McDonald, and F. Pereira, "Domain adaptation with structural correspondence learning," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '06), pp. 120–128, Association for Computational Linguistics, July 2006.
[30] M. Long, Y. Cao, J. Wang, and M. I. Jordan, "Learning transferable features with deep adaptation networks," in Proceedings of the 32nd International Conference on Machine Learning, pp. 97–105, 2015.
[31] M. Long, J. Wang, and M. I. Jordan, Deep transfer learning with joint adaptation networks, 2016.
[32] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS 2014), pp. 3320–3328, Canada, December 2014.
[33] J. Yang, R. Yan, and A. G. Hauptmann, "Cross-domain video concept detection using adaptive SVMs," in Proceedings of the 15th ACM International Conference on Multimedia (MM '07), pp. 188–197, September 2007.
[34] S. Li, S. Song, and G. Huang, "Prediction reweighting for domain adaptation," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 7, pp. 1682–1695, 2017.
[35] Z. Xu, W. Li, L. Niu, and D. Xu, "Exploiting low-rank structure from latent domains for domain generalization," Lecture Notes in Computer Science, vol. 8691, no. 3, pp. 628–643, 2014.
[36] L. Niu, W. Li, D. Xu, and J. Cai, "An exemplar-based multi-view domain generalization framework for visual recognition," IEEE Transactions on Neural Networks and Learning Systems, 2016.
[37] L. Niu, W. Li, and D. Xu, "Multi-view domain generalization for visual recognition," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV 2015), pp. 4193–4201, Chile, December 2015.
[38] T. Kobayashi, "Three viewpoints toward exemplar SVM," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 2765–2773, USA, June 2015.
[39] S. J. Pan, J. T. Kwok, and Q. Yang, "Transfer learning via dimensionality reduction," in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 677–682, 2008.
[40] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, "Domain adaptation via transfer component analysis," IEEE Transactions on Neural Networks and Learning Systems, vol. 22, no. 2, pp. 199–210, 2011.
[41] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting visual category models to new domains," in Computer Vision—ECCV 2010, vol. 6314 of Lecture Notes in Computer Science, pp. 213–226, Springer, Berlin, Germany, 2010.
[42] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS '12), pp. 1097–1105, Lake Tahoe, Nev, USA, December 2012.
[43] V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, Wiley-Interscience, New York, NY, USA, 1998.
[44] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, "Unsupervised visual domain adaptation using subspace alignment," in Proceedings of the 2013 14th IEEE International Conference on Computer Vision (ICCV 2013), pp. 2960–2967, Australia, December 2013.

[45] M Long J Wang G Ding J Sun and P S Yu ldquoTransfer jointmatching for unsupervised domain adaptationrdquo in Proceedingsof the 27th IEEE Conference on Computer Vision and PatternRecognition CVPR 2014 pp 1410ndash1417 USA June 2014

[46] C Chang and C Lin ldquoLIBSVM a Library for support vectormachinesrdquo ACM Transactions on Intelligent Systems and Tech-nology vol 2 no 3 article 27 2011


is composed of a label space $\mathcal{Y}$ and a prediction model $f(x)$; namely, $\mathcal{T} = \{\mathcal{Y}, f(x)\}$, $y \in \mathcal{Y}$. From the view of probability, $f(x) = P(y \mid x)$. The notations frequently used in this paper are summarized in the Notations and Descriptions section. The definition of transfer learning is as follows. Given source domain data $\mathcal{D}_S = \{(x_{S_1}, y_{S_1}), \ldots, (x_{S_{n_S}}, y_{S_{n_S}})\}$ with a source task $\mathcal{T}_S$, and unlabeled target domain data $\mathcal{D}_T = \{x_{T_1}, \ldots, x_{T_{n_T}}\}$ with a target task $\mathcal{T}_T$, transfer learning aims to utilize $\mathcal{D}_S$ and $\mathcal{D}_T$ to help train a robust prediction model $f_T(x)$ under the condition $\mathcal{D}_S \neq \mathcal{D}_T$ or $\mathcal{T}_S \neq \mathcal{T}_T$.

2.2. Domain Adaptation. As a subproblem of transfer learning, domain adaptation has achieved great success and is used in many applications. It assumes that the source and target domain data have the same feature space, label space, and prediction function; from the view of probability, the conditional probability distributions are equal, namely $f_S(x) = f_T(x)$ or $P_S(y \mid x) = P_T(y \mid x)$. It is generally agreed that domain adaptation approaches can be divided into three groups: reweighting approaches, feature transfer approaches, and parameter-shared approaches.

(1) Reweighting Approaches. In transfer learning tasks, the basic idea of using the source data to help train the target predictor is to reduce the discrepancy between the source and target data as far as possible. Under the assumption that the source and target domains have many overlapping features, a conventional method is to reweight or select the source domain instances so as to correct the marginal probability distribution mismatch. Based on the Maximum Mean Discrepancy (MMD) metric between distributions, [20] proposed a technique called Kernel Mean Matching (KMM), which revises the weight of every instance to minimize the MMD between the source and target domains. Similar to KMM, [21] used the same idea but a different metric to adjust the discrepancy between domains. Reference [22] used the AdaBoost strategy to update the weights of the source domain data, increasing the weights of instances that favor the classification task; it also introduced generalization error bounds based on PAC learning theory. More recently, [23] used a two-step approach: first, sample the instances that are similar to the other domain as landmarks, and then use these landmarks to map the data into a high-dimensional space in which the domains overlap more. Reference [24] solved the same problem but relaxed the similarity assumption, allowing the source and target domains to share no direct relationship. The model named Selective Transfer Machine (STM) reweights the instances of personal faces to train a generic classifier. Most instance-based transfer learning techniques use KMM to measure the difference between the distributions, and these methods are applied in many areas such as facial action unit detection [25] and prostate cancer mapping [26]; a numerical sketch of the MMD quantity they manipulate is given below.
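To make the quantity these reweighting methods manipulate concrete, the following minimal sketch (our illustration, not code from the paper) computes a biased empirical estimate of the squared MMD between a source and a target sample with an RBF kernel; the bandwidth `gamma` is an arbitrary assumption. KMM-style reweighting chooses source instance weights that shrink exactly this quantity.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    # pairwise squared Euclidean distances -> RBF (Gaussian) kernel values
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(Xs, Xt, gamma=1.0):
    """Biased empirical estimate of the squared MMD between Xs and Xt."""
    return (rbf(Xs, Xs, gamma).mean()
            - 2.0 * rbf(Xs, Xt, gamma).mean()
            + rbf(Xt, Xt, gamma).mean())
```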

(2) Feature Transfer Approaches. Compared with instance-based approaches, feature-based approaches relax the similarity assumption. They assume that the source and target domains share some features, named shared features, while each domain also has its own features, named specific features [27]. For example, consider using movie reviews to help a sentiment classification task on sofa reviews: the word "comfortable" is often nonzero in the sofa domain features but almost always zero in the movie domain features, so it is a specific feature of the sofa domain. Feature transfer approaches aim to find a shared latent subspace in which the distance between the source and target domains is minimized. Reference [28] proposed an unsupervised domain adaptation approach named Geodesic Flow Kernel (GFK) based on the kernel method. GFK maps data onto Grassmann manifolds and constructs geodesic flows to reduce the mismatch among domains; it effectively exploits the intrinsic low-dimensional structures of the data. To solve cross-domain natural language processing (NLP) problems, [29] proposed a general method, structural correspondence learning (SCL), which learns a discriminative predictor by identifying correspondences among features in domains: SCL first finds the pivot features and then links the shared features with one another. Reference [7] learned a predictor by mapping the target kernel matrix to a submatrix of the source kernel matrix. Deep neural networks are used not only for learning essential features but also for domain adaptation: [30] proposed a neural network architecture for domain adaptation named Deep Adaptation Network (DAN) and extended it to joint adaptation networks (JAN) [31], and [32] discussed the transferability of features in deep neural networks.

(3) Parameter-Based Approaches. The core idea of parameter-based approaches is to transfer parameters from the source to the target domain task. They assume that different domains share some parameters and that these parameters can be reused across domains. Reference [33] proposed the Adaptive Support Vector Machine (A-SVM) as a general method for adapting to new domains: A-SVM first trains an auxiliary classifier and then learns the target predictor based on the original parameters. Reference [34] reweighted the predictions of the source classifier on the target domain using a signed distance between domains.

2.3. Exemplar Support Vector Machines. The ensemble of exemplar-SVMs [16] was proposed for object detection, where it achieves high performance. It trains a classifier for every positive instance against all the negative instances. Every positive instance is an exemplar, and the classifier corresponding to it can be viewed as a representation of that positive instance. In the prediction process, every classifier predicts a value for the test instance, a calibration function is applied to that value, and the result of the highest-scoring classifiers gives the predicted class. Exemplar-SVMs address the problem that a single hyperplane can hardly represent a whole category of instances, and they use this extreme strategy to train the predictor. In [35], the training processes are gathered into one model with nuclear norm regularization, in the scenario of domain generalization, which assumes the target domain is unseen; the model is also extended to the problems of domain generalization and multiview learning [36, 37]. In [38], the two hyperparameters are reduced to one and exemplar-SVMs are extended to a kernel form.


2.4. Transfer Component Analysis. Reference [39] proposed a dimension reduction method called maximum mean discrepancy embedding (MMDE). By minimizing the distance between the source and target data distributions in a shared latent space, the source domain data are used to assist in training a classifier on the target domain. MMDE not only minimizes the distance between the domains in the latent space but also preserves the properties of the data by maximizing the variance of the data. Based on MMDE, [40] extended it to handle unseen instances and to reduce the computational complexity of MMDE. Substantially, TCA simplifies the process of learning the kernel matrix by transforming the initial kernel matrix instead; the optimization reduces to taking the $m$ leading eigenvectors of the objective matrix.
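For reference, the standard TCA solution just described can be sketched as follows (our illustration under stated assumptions: RBF kernel, illustrative parameter values). The transformation is taken from the $m$ leading eigenvectors of $(\mathbf{K}\mathbf{L}\mathbf{K} + \mu\mathbf{I})^{-1}\mathbf{K}\mathbf{H}\mathbf{K}$, and the rows of $\mathbf{K}\mathbf{W}$ are the $m$-dimensional embeddings of the source and target instances.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def tca(Xs, Xt, m=40, mu=1.0, gamma=1.0):
    """Minimal TCA sketch: m leading eigenvectors of (KLK + mu*I)^(-1) KHK."""
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    X = np.vstack([Xs, Xt])
    K = rbf(X, X, gamma)
    e = np.vstack([np.full((ns, 1), 1.0 / ns), np.full((nt, 1), -1.0 / nt)])
    L = e @ e.T                              # MMD coefficient matrix
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    obj = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    vals, vecs = np.linalg.eig(obj)
    W = np.real(vecs[:, np.argsort(-np.real(vals))[:m]])
    return K @ W  # embedded source rows first, then target rows
```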

3. Domain Adaptation Exemplar Support Vector Machine

In this section we present the formulation of the Domain Adaptation Exemplar Support Vector Machine (DAESVM). In the remainder of this paper we use a lowercase boldface letter for a column vector and an uppercase boldface letter for a matrix; the notation introduced in Section 2 is extended here. We use $\mathbf{x}^{+}_{i}$, $i \in \{1, \ldots, n^{+}_{S}\}$, where $n^{+}_{S}$ is the number of positive instances, to denote a positive instance, and $\mathbf{x}^{-}_{j}$, $j \in \{1, \ldots, n^{-}_{S}\}$, where $n^{-}_{S}$ is the number of negative instances, to denote a negative instance. The set of negative samples is written as $N^{-}$. This section introduces the formulation of a single exemplar classifier; in fact, we need to train as many exemplar classifiers as there are source domain positive instances, and the method that integrates these classifiers is proposed in Section 5.

3.1. Exemplar-SVM. The exemplar-SVM is built on the extreme idea of training a classifier for each positive instance against all the negative instances and then calibrating the outputs of the classifiers into a probability distribution to separate the samples; the model thus trains as many classifiers as there are positive instances. Learning a classifier that separates a positive instance from all the negative instances can be modeled as

$$f(\mathbf{w}, b) = \|\mathbf{w}\|^{2} + C_{1}\, h(\mathbf{w}^{T}\mathbf{x}^{+} + b) + C_{2} \sum_{\mathbf{x}^{-}_{i} \in N^{-}_{S}} h(-\mathbf{w}^{T}\mathbf{x}^{-}_{i} - b) \quad (1)$$

where $\|\cdot\|$ is the 2-norm of a vector, $C_{1}$ and $C_{2}$ are tradeoff parameters (corresponding to $C$ in the standard SVM) that balance the positive and negative error costs, and $h(x) = \max(0, 1-x)$ is the hinge loss.
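A minimal sketch of this per-exemplar training loop, assuming scikit-learn is available: `class_weight` rescales the shared `C` per class, which reproduces the separate costs $C_1$ and $C_2$ of (1) (the values shown are illustrative, not those of the paper).

```python
import numpy as np
from sklearn.svm import SVC

def train_exemplar_svms(X_pos, X_neg, C1=0.5, C2=0.01):
    """Train one linear SVM per positive exemplar against all negatives."""
    classifiers = []
    for x_pos in X_pos:
        X = np.vstack([x_pos[None, :], X_neg])
        y = np.r_[1, -np.ones(len(X_neg), dtype=int)]
        # class_weight multiplies C per class: C1 for the exemplar, C2 for negatives
        clf = SVC(kernel="linear", C=1.0, class_weight={1: C1, -1: C2})
        clf.fit(X, y)
        classifiers.append(clf)
    return classifiers
```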

Formulation (1) is the primal problem of the exemplar-SVM, and we can derive its dual problem in order to apply the kernel method. The dual formulation can be written as follows [38]:

$$\min_{\boldsymbol{\alpha}} \; \boldsymbol{\alpha}^{T}\mathbf{K}\boldsymbol{\alpha} - \mathbf{e}^{T}\boldsymbol{\alpha} \quad \text{s.t.} \;\; \alpha_{0} - \sum_{i=1}^{n^{-}_{S}} \alpha_{i} = 0, \;\; 0 \le \alpha_{0} \le C_{1}, \;\; 0 \le \alpha_{i} \le C_{2} \;\, \forall i \ge 1 \quad (2)$$

where $\boldsymbol{\alpha} = (\alpha_{0}, \alpha_{1}, \ldots, \alpha_{n^{-}_{S}}) \in \mathbb{R}^{n^{-}_{S}+1}$ are the Lagrangian multipliers and $\mathbf{e}$ is the all-ones vector. We take this model as an exemplar learner. The matrix $\mathbf{K} \in \mathbb{R}^{(n^{-}_{S}+1)\times(n^{-}_{S}+1)}$ is composed of

$$\mathbf{K} = \begin{bmatrix} k(\mathbf{x}^{+}, \mathbf{x}^{+}) & -\mathbf{k}^{T} \\ -\mathbf{k} & \bar{\mathbf{K}} \end{bmatrix}, \qquad \mathbf{k} \in \mathbb{R}^{n^{-}_{S}}, \; k_{i} = k(\mathbf{x}^{+}, \mathbf{x}^{-}_{i}), \; \bar{K}_{ij} = k(\mathbf{x}^{-}_{i}, \mathbf{x}^{-}_{j}) \quad (3)$$

3.2. Pseudo Labels for the Kernel Matrix. To make the best use of the samples in both the source and the target, we construct the kernel matrix on the data of both domains. However, in the dual problem of the SVM, the kernel matrix $\mathbf{K}$ needs to be supplied with labeled data, while our model addresses the unsupervised domain adaptation problem, in which only the source domain data are labeled. Motivated by [19], we use pseudo labels to help model training. The pseudo labels are predicted by a classical classifier, an SVM in our model, trained on the labeled source data. Because of the distribution mismatch between the source and target domains, many of these labels may be incorrect. Following [19], we assume that the pseudo class centroids computed from them do not reside far from the true class centroids. Thus, we use the data of both domains to supplement the kernel matrix $\mathbf{K}$ with label information; our experiments verify that this method is effective.
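A minimal sketch of this pseudo-labeling step, assuming scikit-learn (the kernel and C value are illustrative assumptions): a plain SVM is trained on the labeled source data, and its predictions on the target data serve as the pseudo labels.

```python
from sklearn.svm import SVC

def pseudo_labels(Xs, ys, Xt):
    """Train a classical SVM on the source domain and pseudo-label the target."""
    base = SVC(kernel="rbf", C=1.0)
    base.fit(Xs, ys)
    return base.predict(Xt)
```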

3.3. Exemplar Learner in Domain Adaptation Form. Each exemplar learner is in fact an SVM in kernel form, trained on one positive instance and all the negative instances. In the view of [16], a discriminative exemplar classifier can be taken as a representation of a positive instance. In tasks such as object detection or image classification, this parametric representation is useful because characteristics of the samples such as angle, color, orientation, and background are hard to represent explicitly, and the instance-based parametric discriminative classifier can capture more information about the positive sample. Similarly, with the motivation of transfer learning, we can view each positive instance as a domain of its own, with some mismatch among these domains. Our model aims to correct this mismatch and reduce the distance to the target domain. We construct an exemplar learner distance metric over domains from the MMD, and it can be written as

$$\operatorname{dist}(\mathbf{x}_{S}, \mathbf{x}_{T}) = \left\| \phi(\mathbf{x}^{+}_{S}) + \frac{1}{n^{-}_{S}} \sum_{i=1}^{n^{-}_{S}} \phi(\mathbf{x}^{-}_{S_{i}}) - \frac{2}{n_{T}} \sum_{i=1}^{n_{T}} \phi(\mathbf{x}_{T_{i}}) \right\|_{\mathcal{H}}^{2} \quad (4)$$

However, this is only a distance metric; our requirement is to minimize this distance by some transformation. Motivated by Transfer Component Analysis (TCA), we want to map the instances into a latent space in which the instances from the source and target domains are more similar, and we denote this mapping by $P(x)$. Namely, we aim to minimize the MMD distance between domains by mapping the instances into another space. We extend the distance function as follows:

$$\operatorname{dist}(\mathbf{x}_{S}, \mathbf{x}_{T}) = \left\| \phi(P(\mathbf{x}^{+}_{S})) + \frac{1}{n^{-}_{S}} \sum_{i=1}^{n^{-}_{S}} \phi(P(\mathbf{x}^{-}_{S_{i}})) - \frac{2}{n_{T}} \sum_{i=1}^{n_{T}} \phi(P(\mathbf{x}_{T_{i}})) \right\|_{\mathcal{H}}^{2} \quad (5)$$

Following the general approach, we reformulate (4) in kernel matrix form. We define the Gram matrices on the source positive exemplar, the source negative instances, and the target domain. The kernel matrix $\mathbf{K}$, with entries $K_{ij} = \phi(\mathbf{x}_{i})^{T}\phi(\mathbf{x}_{j})$, is composed of nine submatrices $\mathbf{K}_{++}$, $\mathbf{K}_{+-}$, $\mathbf{K}_{+T}$, $\mathbf{K}_{-+}$, $\mathbf{K}_{--}$, $\mathbf{K}_{-T}$, $\mathbf{K}_{T+}$, $\mathbf{K}_{T-}$, and $\mathbf{K}_{TT}$:

$$\mathbf{K} = \begin{bmatrix} \mathbf{K}_{++} & \mathbf{K}_{+-} & \mathbf{K}_{+T} \\ \mathbf{K}_{-+} & \mathbf{K}_{--} & \mathbf{K}_{-T} \\ \mathbf{K}_{T+} & \mathbf{K}_{T-} & \mathbf{K}_{TT} \end{bmatrix} \in \mathbb{R}^{(1+n^{-}_{S}+n_{T})\times(1+n^{-}_{S}+n_{T})} \quad (6)$$

and it determines the coefficient matrix $\mathbf{L}$:

$$L_{ij} = \begin{cases} 1 & \mathbf{x}_{i}, \mathbf{x}_{j} \in \mathbf{X}^{+}_{S} \\ \frac{1}{n^{-}_{S}} & \mathbf{x}_{i} \in \mathbf{X}^{+}_{S},\ \mathbf{x}_{j} \in \mathbf{X}^{-}_{S} \\ -\frac{2}{n_{T}} & \mathbf{x}_{i} \in \mathbf{X}^{+}_{S},\ \mathbf{x}_{j} \in \mathbf{X}_{T} \\ -\frac{2}{n^{-}_{S} n_{T}} & \mathbf{x}_{i} \in \mathbf{X}_{T},\ \mathbf{x}_{j} \in \mathbf{X}^{-}_{S} \\ \frac{1}{(n^{-}_{S})^{2}} & \mathbf{x}_{i}, \mathbf{x}_{j} \in \mathbf{X}^{-}_{S} \\ \frac{4}{n_{T}^{2}} & \mathbf{x}_{i}, \mathbf{x}_{j} \in \mathbf{X}_{T} \end{cases} \quad (7)$$
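The following sketch (our illustration) builds the coefficient matrix of (7) for one exemplar. Equation (7) lists one case per unordered pair, so filling the mirrored blocks symmetrically is our assumption, consistent with how $\mathbf{L}$ enters the trace term.

```python
import numpy as np

def coefficient_matrix(n_neg, n_t):
    """Coefficient matrix L of Eq. (7): index 0 is the positive exemplar,
    then n_neg source negatives, then n_t target instances."""
    n = 1 + n_neg + n_t
    L = np.zeros((n, n))
    pos, neg = slice(0, 1), slice(1, 1 + n_neg)
    tgt = slice(1 + n_neg, n)
    L[pos, pos] = 1.0
    L[pos, neg] = L[neg, pos] = 1.0 / n_neg            # assumed symmetric
    L[pos, tgt] = L[tgt, pos] = -2.0 / n_t
    L[neg, tgt] = L[tgt, neg] = -2.0 / (n_neg * n_t)
    L[neg, neg] = 1.0 / n_neg ** 2
    L[tgt, tgt] = 4.0 / n_t ** 2
    return L
```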

Thus the primal distance function is represented by $\operatorname{tr}(\mathbf{K}\mathbf{L})$. Motivated by TCA [40], mapping the primal data is equivalent to transforming the kernel matrix generated by the source and target domain data. A low-dimensional transformation matrix $\mathbf{M} \in \mathbb{R}^{(1+n^{-}_{S}+n_{T})\times m}$ reduces the dimension of the primal kernel matrix: it maps the empirical kernel map $\tilde{\mathbf{K}} = (\mathbf{K}\mathbf{K}^{-1/2})(\mathbf{K}^{-1/2}\mathbf{K})$ into an $m$-dimensional shared space. Accordingly, we replace the distance function $\operatorname{tr}(\mathbf{K}\mathbf{L})$ by $\operatorname{tr}(\mathbf{K}\mathbf{M}\mathbf{M}^{T}\mathbf{K}\mathbf{L})$. In our case we follow [40] and minimize the trace form of the distance:

$$\operatorname{dist}(\mathbf{x}^{+}_{S}, \mathbf{x}^{-}_{S}, \mathbf{x}_{T}) = \operatorname{tr}(\mathbf{M}^{T}\mathbf{K}\mathbf{L}\mathbf{K}\mathbf{M}) \quad (8)$$

To control the complexity of $\mathbf{M}$ and preserve the data characteristics, we add a regularization term and a constraint. The domain adaptation term, formulated following TCA, is written as

$$\Omega(\mathbf{x}^{+}_{S}, \mathbf{x}^{-}_{S}, \mathbf{x}_{T}) = \operatorname{tr}(\mathbf{M}^{T}\mathbf{K}\mathbf{L}\mathbf{K}\mathbf{M}) + \mu \operatorname{tr}(\mathbf{M}^{T}\mathbf{M}) \quad \text{s.t.} \;\; \mathbf{M}^{T}\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M} = \mathbf{I}_{m} \quad (9)$$

where $\mu > 0$ is a tradeoff parameter, $\mathbf{I}_{m} \in \mathbb{R}^{m \times m}$ is the identity matrix, and $\mathbf{H} = \mathbf{I}_{n^{-}_{S}+n_{T}+1} - \frac{1}{n^{-}_{S}+n_{T}+1}\mathbf{e}\mathbf{e}^{T}$ is the centering matrix.

Furthermore, the objective function of the dual SVM needs the training label information to be added, which also holds for our model. Thus we construct the training label matrix $\mathbf{U}$:

$$\mathbf{U} = \operatorname{diag}(y^{+}_{S}, \mathbf{y}^{-}_{S}, \mathbf{y}_{T}) \quad (10)$$

where $y^{+}_{S}$ is the label of the positive instance, $\mathbf{y}^{-}_{S}$ is the label vector of the negative source instances, and $\mathbf{y}_{T}$ contains the pseudo labels of the target instances predicted by the SVM as described above. It can be rewritten as

$$\mathbf{U} = \operatorname{diag}(1, \underbrace{-1, \ldots, -1}_{n^{-}_{S}}, \underbrace{y_{T_{1}}, \ldots, y_{T_{n_{T}}}}_{n_{T}}) \quad (11)$$

The label matrix $\mathbf{U}$ provides the source domain labels and the target domain pseudo labels. The matrix $\mathbf{K}$ in the dual problem (2) of the exemplar-SVM is the kernel matrix of the primal data; we replace it by the kernel matrix mapped into the latent subspace, namely $\tilde{\mathbf{K}}$, and the final objective function of each DAESVM is formulated as follows:

$$\begin{aligned} \min_{\boldsymbol{\alpha}, \mathbf{M}} \quad & \boldsymbol{\alpha}^{T}\tilde{\mathbf{K}}\boldsymbol{\alpha} - \mathbf{e}^{T}\boldsymbol{\alpha} + \lambda \operatorname{tr}(\mathbf{M}^{T}\mathbf{K}\mathbf{L}\mathbf{K}\mathbf{M}) + \mu \operatorname{tr}(\mathbf{M}^{T}\mathbf{M}) \\ \text{s.t.} \quad & \alpha_{0} - \sum_{i=1}^{n^{-}_{S}+n_{T}} \alpha_{i} = 0, \quad 0 \le \alpha_{0} \le C_{1}, \quad 0 \le \alpha_{i} \le C_{2} \;\, \forall i \ge 1, \\ & \mathbf{M}^{T}\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M} = \mathbf{I}_{m}, \qquad \tilde{\mathbf{K}} = \mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^{T}\mathbf{K}\mathbf{U} \end{aligned} \quad (12)$$

4. Optimization Algorithm

To minimize problem (12), we adopt an alternating optimization method that alternates between solving the two subproblems over the parameter $\boldsymbol{\alpha}$ and the mapping matrix $\mathbf{M}$. The alternating optimization approach is guaranteed to decrease the objective function. Algorithm 1 summarizes the optimization procedure for problem (12).

Input: $\mathbf{X}_{tr}$, $\mathbf{X}_{te}$; parameters $\lambda$, $\mu$, $m$, $C_{1}$, and $C_{2}$.
Output: optimal $\boldsymbol{\alpha}$ and $\mathbf{M}$.
(1) Initialize $\boldsymbol{\alpha} = \mathbf{0}$.
(2) Construct the kernel matrix $\mathbf{K}$ from $\mathbf{X}_{tr}$ and $\mathbf{X}_{te}$ based on (6), the coefficient matrix $\mathbf{L}$ based on (7), the centering matrix $\mathbf{H}$, and the label matrix $\mathbf{U}$ based on (11).
(3) repeat
(4) Update the transformation matrix $\mathbf{M}$ with $\boldsymbol{\alpha}$ fixed:
(5) eigendecompose $(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_{m})^{-1}\mathbf{K}\mathbf{H}\mathbf{K}$ and select the $m$ leading eigenvectors to construct $\mathbf{M}$.
(6) Solve the convex optimization problem with $\mathbf{M}$ fixed to optimize $\boldsymbol{\alpha}$.
(7) until convergence.

Algorithm 1: Domain Adaptation Exemplar Support Vector Machine.
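A skeleton of Algorithm 1 under stated assumptions (our sketch, not the authors' code): `solve_alpha_qp` is a hypothetical helper standing in for the QP subproblem of (21), for which a cvxpy sketch is shown after (21) below, and `np.eye(n)` is used where the paper writes $\mathbf{I}_{m}$, since the matrix in the eigenproblem is $n \times n$.

```python
import numpy as np

def daesvm_alternate(K, L, H, U, lam=1.0, mu=1.0, m=40, n_iter=10):
    """Alternate between the M-step (eigenproblem) and the alpha-step (QP)."""
    n = K.shape[0]
    alpha = np.zeros(n)
    M = None
    for _ in range(n_iter):
        # M-step: m leading eigenvectors of (K U a a^T U K + lam*KLK - mu*I)^(-1) KHK
        A = K @ U @ np.outer(alpha, alpha) @ U @ K + lam * (K @ L @ K) - mu * np.eye(n)
        vals, vecs = np.linalg.eig(np.linalg.solve(A, K @ H @ K))
        M = np.real(vecs[:, np.argsort(-np.real(vals))[:m]])
        # alpha-step: QP of Eq. (21) on the transformed kernel
        K_tilde = U @ K @ M @ M.T @ K @ U
        alpha = solve_alpha_qp(K_tilde)  # hypothetical helper; see the sketch after (21)
    return alpha, M
```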

Minimizing over M. With $\boldsymbol{\alpha}$ fixed, the optimization over $\mathbf{M}$ can be rewritten in the following form:

$$\begin{aligned} \min_{\mathbf{M}} \quad & \boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^{T}\mathbf{K}\mathbf{U}\boldsymbol{\alpha} - \mathbf{e}^{T}\boldsymbol{\alpha} + \lambda \operatorname{tr}(\mathbf{M}^{T}\mathbf{K}\mathbf{L}\mathbf{K}\mathbf{M}) + \mu \operatorname{tr}(\mathbf{M}^{T}\mathbf{M}) \\ \text{s.t.} \quad & \mathbf{M}^{T}\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M} = \mathbf{I}_{m} \end{aligned} \quad (13)$$

Similar to TCA, this formulation contains a nonconvex norm constraint, and we transform the optimization problem by reformulating it as

$$\max_{\mathbf{M}} \; \operatorname{tr}\left( \left( \mathbf{M}^{T}(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_{m})\mathbf{M} \right)^{-1} \mathbf{M}^{T}\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M} \right) \quad (14)$$

Proof. The Lagrangian of (13) is

$$\mathcal{L}(\mathbf{M}, \mathbf{Z}) = \boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^{T}\mathbf{K}\mathbf{U}\boldsymbol{\alpha} - \mathbf{e}^{T}\boldsymbol{\alpha} + \lambda\operatorname{tr}(\mathbf{M}^{T}\mathbf{K}\mathbf{L}\mathbf{K}\mathbf{M}) - \mu\operatorname{tr}(\mathbf{M}^{T}\mathbf{M}) - \operatorname{tr}((\mathbf{M}^{T}\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M} - \mathbf{I}_{m})\mathbf{Z}) \quad (15)$$

Because the initial kernel matrix $\mathbf{K}$ is symmetric, we can rewrite the first term of (15):

$$\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^{T}\mathbf{K}\mathbf{U}\boldsymbol{\alpha} = \operatorname{tr}(\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^{T}\mathbf{K}\mathbf{U}\boldsymbol{\alpha}) = \operatorname{tr}[(\mathbf{M}^{T}\mathbf{K}^{T}\mathbf{U}\boldsymbol{\alpha})^{T}(\mathbf{M}^{T}\mathbf{K}^{T}\mathbf{U}\boldsymbol{\alpha})] = \operatorname{tr}[(\mathbf{M}^{T}\mathbf{K}^{T}\mathbf{U}\boldsymbol{\alpha})(\mathbf{M}^{T}\mathbf{K}^{T}\mathbf{U}\boldsymbol{\alpha})^{T}] = \operatorname{tr}(\mathbf{M}^{T}\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K}\mathbf{M}) \quad (16)$$

The original Lagrangian can then be written as

$$\operatorname{tr}(\mathbf{M}^{T}(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_{m})\mathbf{M}) - \operatorname{tr}((\mathbf{M}^{T}\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M} - \mathbf{I}_{m})\mathbf{Z}) \quad (17)$$

The derivative of (17) with respect to $\mathbf{M}$ is

$$(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_{m})\mathbf{M} - \mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M}\mathbf{Z} \quad (18)$$

Setting this derivative to zero, we obtain $\mathbf{Z}$ as

$$\mathbf{Z} = (\mathbf{M}^{T}\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M})^{\dagger}\mathbf{M}^{T}(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_{m})\mathbf{M} \quad (19)$$

Substituting $\mathbf{Z}$ into (17), we obtain

$$\min_{\mathbf{M}} \; \operatorname{tr}((\mathbf{M}^{T}\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M})^{\dagger}\mathbf{M}^{T}(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_{m})\mathbf{M}) \quad (20)$$

Finally, we obtain the equivalent maximization problem (14).

Similar to TCA, the solution is given by the $m$ leading eigenvectors of $(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_{m})^{-1}\mathbf{K}\mathbf{H}\mathbf{K}$.

Minimizing over α. With $\mathbf{M}$ fixed, the optimization over $\boldsymbol{\alpha}$ can be rewritten in the following QP form:

$$\begin{aligned} \min_{\boldsymbol{\alpha}} \quad & \boldsymbol{\alpha}^{T}\tilde{\mathbf{K}}\boldsymbol{\alpha} - \mathbf{e}^{T}\boldsymbol{\alpha} \\ \text{s.t.} \quad & \alpha_{0} - \sum_{i=1}^{n^{-}_{S}+n_{T}} \alpha_{i} = 0, \quad 0 \le \alpha_{0} \le C_{1}, \quad 0 \le \alpha_{i} \le C_{2} \;\, \forall i \ge 1 \end{aligned} \quad (21)$$

Here $\tilde{\mathbf{K}} = \mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^{T}\mathbf{K}\mathbf{U}$ is the kernel matrix transformed by the transformation matrix $\mathbf{M}$. This problem is clearly a QP, and it can be solved efficiently using interior point methods or other successive optimization procedures such as the Alternating Direction Method of Multipliers (ADMM), as sketched below.
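One way to solve (21) in practice is with a generic convex solver; the sketch below uses cvxpy purely as an illustration (any interior-point or ADMM-based QP solver would do). `psd_wrap` asserts that $\tilde{\mathbf{K}}$ is positive semidefinite, which holds by construction up to numerical error; the values of $C_1$ and $C_2$ are illustrative.

```python
import cvxpy as cp

def solve_alpha_qp(K_tilde, C1=0.5, C2=0.01):
    """Solve the QP of Eq. (21): min a^T K~ a - e^T a under box/equality constraints."""
    n = K_tilde.shape[0]
    a = cp.Variable(n)
    objective = cp.Minimize(cp.quad_form(a, cp.psd_wrap(K_tilde)) - cp.sum(a))
    constraints = [a[0] - cp.sum(a[1:]) == 0,
                   a[0] >= 0, a[0] <= C1,
                   a[1:] >= 0, a[1:] <= C2]
    cp.Problem(objective, constraints).solve()
    return a.value
```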

5. Ensemble Domain Adaptation Exemplar Classifiers

In this section we introduce the method for integrating the exemplar classifiers. As mentioned before, we obtain as many classifiers as there are source domain instances, and this section aims to predict labels for the target domain instances. In our view, the classification hyperplane of an exemplar classifier is a representation of one source domain positive instance. However, most hyperplanes carry information from various aspects of the samples, such as images with different backgrounds or sources. In fact, we aim to find the exemplar classifiers trained on instances similar to the testing sample. Thus, we use an integration method to filter out the classifiers that encode details differing from the testing sample. Another view of the integration method is that it relaxes part of the hyperplanes; namely, it removes the exemplar classifiers trained under large instance distribution mismatch.

In our method, we first construct the classifiers from the Lagrange multipliers $\boldsymbol{\alpha}$. The classifier weights are given by

$$\mathbf{w} = \alpha_{0}\mathbf{x}^{+} - \sum_{i=1}^{n^{-}_{S}+n_{T}} \alpha_{i}\mathbf{x}^{-}_{i} \quad (22)$$

where $\mathbf{w}$ is the weight of the classifier, and the bias is

$$b = y_{j} - \alpha_{0}K_{0j} - \sum_{i=1}^{n^{-}_{S}+n_{T}} y_{i}\alpha_{i}K_{ij} \quad (23)$$

The classifier is then given by

$$s = \mathbf{w}^{T}\mathbf{x} + b \quad (24)$$

We then compute the score of every classifier on the testing instance. Second, we find the top $P$ scores among the classifiers of each class and compute the sum of those scores. Finally, we obtain one score per class, and the class with the highest score is the predicted category. The prediction method is described in Algorithm 2.

Input: $\mathbf{y}_{S}$, $\boldsymbol{\alpha}$, $\mathbf{X}_{te}$; parameter $P$.
Output: prediction labels $\mathbf{y}$.
(1) Compute the weights $\mathbf{w}$ of the classifiers.
(2) Construct the weight matrix $\mathbf{W}$ and bias $\mathbf{b}$ of the predictors based on $\boldsymbol{\alpha}$.
(3) repeat
(4) Compute the scores of each classifier in this category.
(5) Find the top $P$ scores.
(6) Compute the sum of these top scores.
(7) until all categories are processed.
(8) Choose the category with the maximum score as the prediction label $\mathbf{y}$.

Algorithm 2: Ensemble Domain Adaptation Exemplar Classifiers.
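A minimal sketch of the score aggregation in Algorithm 2 (our illustration): `scores` holds one value $s = \mathbf{w}^{T}\mathbf{x} + b$ per exemplar classifier for a single test instance, and `exemplar_labels` gives the class of the positive exemplar behind each classifier.

```python
import numpy as np

def ensemble_predict(scores, exemplar_labels, top_p=5):
    """Sum the top-P exemplar scores per class and predict the best class."""
    best_class, best_score = None, -np.inf
    for c in np.unique(exemplar_labels):
        s = np.sort(scores[exemplar_labels == c])[::-1]  # descending
        total = s[:top_p].sum()                          # sum of top-P scores
        if total > best_score:
            best_class, best_score = c, total
    return best_class
```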

6. Experiments

In this section we conduct experiments on four domains, Amazon, DSLR, Caltech, and Webcam, to evaluate the performance of the proposed Domain Adaptation Exemplar Support Vector Machines. We first compare our method with baselines and other domain adaptation methods; next, we analyze the effectiveness of our approach; finally, we discuss parameter sensitivity.

6.1. Data Preparation. We run the experiments on the Office and Office-Caltech datasets. The Office dataset contains three domains, Amazon, Webcam, and DSLR; each includes images from amazon.com or office-environment images taken under varying lighting and pose changes with a webcam or a DSLR camera. The Office-Caltech dataset contains the ten categories shared between the Office dataset and the Caltech-256 dataset. Following the standard transfer learning experimental method, we merge the two datasets, which together include the four domains Amazon, DSLR, Caltech, and Webcam studied in [41]. The Amazon domain consists of images downloaded from Amazon merchants. The images in Webcam also come from online web pages but are of low quality, as they are taken by a web camera. The DSLR domain is photographed by a digital SLR camera, so its images are of high quality. Caltech is commonly included in domain adaptation experiments and was collected for object detection tasks. Each domain has its own characteristics. Compared with the other domains, the image quality of DSLR is higher, and influencing factors such as object placement and background matter less than for images downloaded from the web. Amazon and Webcam come from the web, and images in these domains are of lower quality and higher complexity; still, the two differ in detail: instances in Webcam show the object alone, whereas the composition of samples in Amazon is more complex, including background and other goods. Figure 1 shows example backpack images from the four domains. From the view of transfer learning, the datasets come from different domains with different marginal probabilities over the images. Our model aims to solve this problem and obtain a robust cross-domain classifier.

We chose the ten categories common to all four datasets: backpack, bike, bike helmet, bookcase, bottle, calculator, desk chair, desk lamp, desktop computer, and file cabinet. There are 8 to 151 samples per category in a domain: 958 images in Amazon, 295 in Webcam, 157 in DSLR, and 1123 in Caltech, for 2533 images in total. Figure 1 shows examples from the datasets.

We use both SURF and DeCAF feature extraction in the experiments. First, we use SURF features to encode the images into 800-bin histograms. Next, we use DeCAF features, extracted from the 7th layer of AlexNet [42], as 4096-dimensional representations. Finally, we normalize the histograms and z-score them to have zero mean and unit standard deviation in each dimension, as sketched below.
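For concreteness, the preprocessing just described can be sketched as follows (our illustration; the small epsilon guarding against zero-variance dimensions is an added assumption).

```python
import numpy as np

def normalize_features(H, eps=1e-12):
    """Histogram-normalize each image, then z-score each feature dimension."""
    H = H / (H.sum(axis=1, keepdims=True) + eps)          # per-image normalization
    return (H - H.mean(axis=0)) / (H.std(axis=0) + eps)   # zero mean, unit std
```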

We run our experiments in the standard way for visual domain adaptation: one of the four datasets is used as the source domain and another as the target domain. Each dataset provides the same ten categories and uses the same image representation, which is the setting of homogeneous domain adaptation. For example, we may choose the images taken from DSLR (denoted by D) as source domain data and the images in Amazon (denoted by A) as target domain data; this problem is denoted D → A. In this way we can compose 12 domain adaptation subproblems from the four domains.

6.2. Experiment Setup

(1) Baseline Methods. We compare our DAESVM method with three kinds of approaches: a classifier without transfer learning regularization, conventional transfer learning methods, and the foundation model, the low-rank exemplar support vector machine. The methods are listed as follows:

(1) Transfer Component Analysis (TCA) [40]
(2) Support Vector Machine (SVM) [43]
(3) Geodesic Flow Kernel (GFK) [28]
(4) Landmarks Selection-based Subspace Alignment (LSSA) [23]
(5) Kernel Mean Matching (KMM) [20]
(6) Subspace Alignment (SA) [44]
(7) Transfer Joint Matching (TJM) [45]
(8) Low-Rank Exemplar-SVMs (LRESVMs) [18]

TCA, GFK, and KMM are classical transfer learning methods, against which we compare our model; we also show that our method is more robust than models without domain adaptation terms in the transfer learning scenario. TCA is the foundation of our model and, like GFK and SA, is based on the idea of feature transfer. KMM transfers knowledge by instance reweighting. TJM is a popular model for unsupervised domain adaptation. SA and LSSA are models that use landmarks to transfer knowledge.

(2) Implementation Details. For the baseline method, SVM is trained on the source data and tested on the target data [46]. TCA, SA, LSSA, TJM, and GFK are first applied as a dimension reduction step, after which a classifier is trained on the source data and makes predictions for the target domain [19]. Similarly, KMM first computes the weight of each instance and then trains the predictor on the reweighted source data.

Under the assumption of unsupervised domain adaptation, it is impossible to tune the optimal parameters for the target domain task by cross validation, since there is a distribution mismatch between domains. Therefore, in the experiments we adopt a grid search strategy to obtain the best parameters and report the best results. Our method involves five tunable parameters: the ESVM tradeoffs $C_{1}$ and $C_{2}$, the regularization tradeoffs $\lambda$ and $\mu$, and the dimension reduction parameter $m$. The tradeoff parameters $C_{1}$ and $C_{2}$ are selected over $\{10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}\}$. We fix $\lambda = 1$, $\mu = 1$, and $m = 40$ empirically and select the radial basis function (RBF) as the kernel function. In fact, our model is relatively stable under a wide range of parameter values. We train a classifier for every positive instance in the source domain data and then calibrate the outputs into a probability distribution. We handle multiclass classification in a one-versus-the-others way. To measure the performance of our method, we use the average accuracy and the standard deviation over ten repetitions. The average testing accuracies and standard errors for all 12 tasks of our method are reported in Table 1. Most of the baseline results are cited from previously published papers.

6.3. Experiment Results. In this section we compare our DAESVM with the baseline methods in terms of classification accuracy.

Table 1 summarizes the classification accuracy obtained over all 10 categories, giving 12 tasks across the 4 domains. The highest accuracy is shown in bold, indicating the method that performs best on that task. First, we implement traditional classifiers without domain adaptation terms: we train the predictors on the source domain data and make predictions for the target domain dataset. Second, we compare our DAESVM with unsupervised domain adaptation methods such as TCA and GFK, implemented with the same dimension reduction parameter $m$ as in our model. Finally, we also compare DAESVM with recently proposed transfer learning models such as low-rank ESVMs [18].

Figure 1: Example images from the backpack category in Amazon and DSLR ((a), from left to right) and Webcam and Caltech-256 ((b), from left to right). Images from different domains vary considerably in style, background, and source.

Table 1: Classification accuracies of different methods on the domain adaptation tasks. We conduct the experiments against conventional transfer learning methods; compared with the traditional methods, DAESVMs gain a large improvement in prediction accuracy, and they also improve on the recently proposed LRESVM approach [average ± standard error of accuracy (%)].

| Task | SVM | KMM | TCA | TJM | SA | GFK | LSSA | LRESVM | DAESVMs |
|------|-----|-----|-----|-----|----|-----|------|--------|---------|
| A → C | 45.4 | 42.2 | 45.3 | 56.9 | 51.8 | 49.6 | 54.8 | 79.8 | 77.5 ± 0.79 |
| A → D | 50.7 | 42.7 | 60.3 | 56.4 | 56.4 | 55.7 | 57.3 | 74.9 | 76.8 ± 0.76 |
| A → W | 47.4 | 42.4 | 61.3 | 51.0 | 54.7 | 56.9 | 56.7 | 75.4 | 73.2 ± 1.08 |
| C → A | 50.7 | 48.3 | 54.7 | 58.6 | 57.1 | 51.2 | 58.4 | 77.2 | 80.2 ± 0.39 |
| C → D | 53.2 | 53.5 | 56.4 | 57.4 | 59.0 | 57.1 | 59.1 | 87.1 | 89.0 ± 0.23 |
| C → W | 44.2 | 45.8 | 50.4 | 58.8 | 62.7 | 57.1 | 58.1 | 74.1 | 74.7 ± 0.38 |
| D → A | 40.8 | 42.2 | 53.8 | 46.1 | 58.9 | 59.2 | 58.4 | 80.4 | 83.4 ± 1.41 |
| D → C | 48.3 | 41.6 | 43.9 | 49.6 | 54.3 | 59.4 | 57.7 | 79.0 | 73.0 ± 1.04 |
| D → W | 67.8 | 72.9 | 82.4 | 82.0 | 83.4 | 80.2 | 87.1 | 91.0 | 92.1 ± 0.25 |
| W → A | 42.4 | 41.9 | 53.0 | 50.8 | 57.0 | 66.2 | 59.7 | 74.3 | 77.8 ± 0.33 |
| W → C | 41.2 | 39.0 | 53.7 | 54.8 | 34.7 | 52.4 | 54.2 | 70.6 | 66.5 ± 0.54 |
| W → D | 80.2 | 82.0 | 87.9 | 83.4 | 78.9 | 81.2 | 87.2 | 89.2 | 91.8 ± 0.59 |
| Average | 51.0 | 49.5 | 58.6 | 58.8 | 59.1 | 60.5 | 62.4 | 79.4 | 80.0 ± 0.67 |

Table 2: We also conduct our experiments on multidomain tasks and gain an improvement compared with previously proposed methods. The experiments adopt the same strategy as single domain adaptation: we treat multiple domains as one source or target to find the shared features in a latent space. However, the complexity of the multidomain shared features limits the accuracy of the tasks [average ± standard error of accuracy (%)].

| Task | SVM | KMM | TCA | TJM | SA | GFK | LSSA | LRESVM | DAESVM |
|------|-----|-----|-----|-----|----|-----|------|--------|--------|
| DW → A | 45.7 | 37.4 | 40.5 | 57.1 | 59.4 | 47.3 | 61.7 | 80.1 | 77.2 ± 1.27 |
| AD → CW | 37.1 | 31.6 | 43.0 | 60.2 | 48.7 | 47.6 | 74.2 | 86.9 | 84.7 ± 0.65 |
| D → ACW | 41.4 | 43.8 | 57.2 | 63.9 | 51.9 | 51.4 | 77.0 | 82.9 | 88.4 ± 0.21 |
| ADW → C | 43.9 | 50.6 | 54.9 | 69.0 | 60.2 | 60.4 | 63.7 | 87.7 | 90.1 ± 0.34 |
| AD → W | 71.0 | 61.0 | 54.0 | 61.3 | 54.0 | 47.0 | 71.9 | 80.8 | 83.8 ± 0.78 |
| AC → DW | 81.4 | 53.9 | 77.4 | 71.8 | 57.4 | 64.1 | 80.7 | 89.3 | 92.4 ± 0.25 |
| Average | 53.4 | 46.4 | 54.5 | 63.9 | 55.2 | 53.0 | 71.5 | 84.6 | 86.1 ± 0.58 |

Overall, in the usual transfer learning way, we run the datasets across different pairs of source and target domains. The accuracy of DAESVM for the adaptation from DSLR to Webcam reaches 92.1%, an improvement of 1.2% over LRESVM. Compared with TCA, DAESVMs take the distribution mismatch among instances as well as among domains into consideration. The adaptation from Webcam to DSLR attains an accuracy of 91.8%. For the domain datasets Amazon and Caltech, which are larger than DSLR and Webcam, DAESVM attains an accuracy of 77.5%, an improvement of about 36.2% relative to TJM. Regarding the ability to transfer knowledge from a large dataset to a small domain dataset, from Amazon to DSLR we obtain an accuracy of 76.8%; conversely, from DSLR to Amazon, the prediction accuracy is 83.4%. Generally speaking, our DAESVM trained on one domain performs well and also performs robustly across multiple domains.

We also conduct tasks of multidomain adaptation, which use one or more domains as source domain data and adapt to the other domains. The results are shown in Table 2. The accuracy of DAESVM for the adaptation from Amazon, DSLR, and Webcam to Caltech reaches 90.1%, an improvement over LRESVM. For the task of adaptation from Amazon and Caltech to Webcam and DSLR, we obtain an accuracy of 92.4%. The experiments show that our models are effective not only for single domain adaptation but also for multidomain adaptation.

Two key factors may contribute to the superiority of our method. First, the feature transfer regularization term relaxes the similarity assumption: it only assumes that there are some shared features across domains, instead of assuming that the domains are similar to each other, which makes the model more robust than models based on a reweighting term. The second factor is the exemplar-SVMs, which are motivated by transfer learning and account for the distribution mismatch between individual instances. Our model combines these two factors to resist both the distribution mismatch among domains and the sample selection bias among instances.

6.4. Pseudo Label Effectiveness. Following [19], we use pseudo labels to supplement model training. In our experiments, we test how the prediction results are influenced by the accuracy of the pseudo labels. As Figure 2 shows, the prediction accuracy improves as the pseudo label accuracy increases. This demonstrates that the pseudo label method is effective, and we can iterate by using the labels predicted by the DAESVM as new pseudo labels; this iteration step can efficiently enhance the performance of the classifiers.

Figure 2: The accuracy of DAESVMs improves with the accuracy of the pseudo labels. The results verify the effectiveness of the pseudo label method.

6.5. Parameter Sensitivity. There are five parameters in our model. We conduct a parameter sensitivity analysis, which shows that the model achieves near-optimal performance under a wide range of parameter values, and discuss the results.

(1) Tradeoff $\lambda$. $\lambda$ controls the weight of the MMD term, which minimizes the distribution mismatch between the source and target domains. Theoretically, we want this term to equal zero. However, if we set this parameter to infinity ($\lambda \to \infty$), the transformation may lose the data properties when mapping the source and target domain data into the high-dimensional space. Conversely, if we set $\lambda$ to zero, the model loses the ability to correct the distribution mismatch.

(2) Tradeoff $\mu$. $\mu$ controls the weight of the data variance term, which preserves the data properties. Theoretically, we want this term to equal zero. However, if we set this parameter to infinity ($\mu \to \infty$), it may aggravate the data distribution mismatch among the domains; namely, the transformation matrix $\mathbf{M}$ cannot use the source data to assist the target task. Conversely, if we set $\mu$ to zero, the model cannot preserve the properties of the original data.

(3) Dimension Reduction $m$. $m$ is the dimension of the transformation matrix, namely the dimension of the subspace into which we map the samples. If $m$ is too small, the properties of the data may be lost, which may cause the classifier to fail; if $m$ is too large, the effectiveness of correcting the distribution mismatch may be lost. We study how the classification results are influenced by $m$; the results are displayed in Figure 3.

(4) Tradeoff in ESVM: $C_{1}$ and $C_{2}$. Parameters $C_{1}$ and $C_{2}$ are the upper bounds of the Lagrangian variables. In the standard SVM, positive and negative instances share the same value of this parameter. In our models, we expect the weights of the positive samples to be higher than those of the negative samples; in our experiments, the value of $C_{1}$ is one hundred times $C_{2}$, which yields a high-performance predictor. A visual analysis of these two parameters is given in Figure 4.

Figure 3: Prediction accuracy versus the subspace dimension $m$. When the dimension is 20 or 40, the prediction accuracy is higher than for other values.

Figure 4: We fix $\lambda = 1$, $m = 20$, and $\mu = 1$ in these experiments; $C_{1}$ is searched in $\{0.1, 0.5, 1, 5, 10, 50, 100\}$ and $C_{2}$ is searched in $\{0.001, 0.005, 0.01, 0.1, 0.5, 1, 10\}$.

7. Conclusion

In this paper, we have proposed an effective method for domain adaptation problems with a regularization term that reduces the data distribution mismatch between domains while preserving the properties of the original data. Furthermore, the method of integrating the classifiers can predict target domain data with high accuracy. The proposed method mainly targets problems in which distribution mismatch occurs among domains or among instances. Meanwhile, we extend DAESVMs to multiple source or target domains. The experiments, conducted on transfer learning datasets, transfer knowledge from image to image.

Our future work is as follows. First, we will integrate the training processes of all the classifiers in an ensemble way; training can be accelerated by rewriting all the weights into one matrix form, a strategy that avoids repeated matrix inversions in the optimization. Second, we want to impose a constraint on $\boldsymbol{\alpha}$ that enforces sparsity. Finally, we will extend DAESVMs to transferring knowledge among domains that have few relationships, such as from image to video or text.

Notations and Descriptions

$\mathcal{D}_S$/$\mathcal{D}_T$: Source/target domain
$\mathcal{T}_S$/$\mathcal{T}_T$: Source/target task
$d$: Dimension of the feature space
$\mathbf{X}_S$/$\mathbf{X}_T$: Source/target sample matrix
$\mathbf{y}_S$/$\mathbf{y}_T$: Source/target sample label matrix
$\mathbf{K}$: Kernel matrix without label information
$\boldsymbol{\alpha}$: Lagrange multipliers vector
$n_S$/$n_T$: Number of source/target domain instances
$\mathbf{e}$: All-ones vector
$\mathbf{I}$: Identity matrix

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work has been partially supported by grants from the National Natural Science Foundation of China (nos. 61472390, 71731009, 91546201, and 11771038) and the Beijing Natural Science Foundation (no. 1162005).

References

[1] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.

[2] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning, pp. 160–167, ACM, July 2008.

[3] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 580–587, Columbus, Ohio, USA, June 2014.

[4] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.

[5] W.-S. Chu, F. D. L. Torre, and J. F. Cohn, "Selective transfer machine for personalized facial action unit detection," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), pp. 3515–3522, USA, June 2013.

[6] A. Kumar, A. Saha, and H. Daume, "Co-regularization based semi-supervised domain adaptation," in Advances in Neural Information Processing Systems 23, pp. 478–486, 2010.

[7] M. Xiao and Y. Guo, "Feature space independent semi-supervised domain adaptation via kernel matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 1, pp. 54–66, 2015.

[8] S. J. Pan, J. T. Kwok, Q. Yang, and J. J. Pan, "Adaptive localization in a dynamic WiFi environment through multi-view learning," in Proceedings of the 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference (AAAI-07/IAAI-07), pp. 1108–1113, Canada, July 2007.

[9] A. Van Engelen, A. C. Van Dijk, M. T. B. Truijman et al., "Multi-center MRI carotid plaque component segmentation using feature normalization and transfer learning," IEEE Transactions on Medical Imaging, vol. 34, no. 6, pp. 1294–1305, 2015.

[10] Y. Zhang, J. Wu, Z. Cai, P. Zhang, and L. Chen, "Memetic extreme learning machine," Pattern Recognition, vol. 58, pp. 135–148, 2016.

[11] M. Uzair and A. Mian, "Blind domain adaptation with augmented extreme learning machine features," IEEE Transactions on Cybernetics, vol. 47, no. 3, pp. 651–660, 2017.

[12] L. Zhang and D. Zhang, "Domain adaptation extreme learning machines for drift compensation in E-nose systems," IEEE Transactions on Instrumentation and Measurement, vol. 64, no. 7, pp. 1790–1801, 2015.

[13] B. Schölkopf, J. Platt, and T. Hofmann, "A kernel method for the two-sample-problem," pp. 513–520, 2008.

[14] J. Wu, S. Pan, X. Zhu, C. Zhang, and X. Wu, "Multi-instance learning with discriminative bag mapping," IEEE Transactions on Knowledge and Data Engineering, pp. 1-1.

[15] J. Wu, S. Pan, X. Zhu, C. Zhang, and P. S. Yu, "Multiple structure-view learning for graph classification," IEEE Transactions on Neural Networks and Learning Systems, vol. PP, no. 99, pp. 1–16, 2017.

[16] T. Malisiewicz, A. Gupta, and A. A. Efros, "Ensemble of exemplar-SVMs for object detection and beyond," in Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV 2011), pp. 89–96, Spain, November 2011.

[17] B. Zadrozny, "Learning and evaluating classifiers under sample selection bias," in Proceedings of the Twenty-First International Conference on Machine Learning, p. 114, Banff, Alberta, Canada, July 2004.

[18] W. Li, Z. Xu, D. Xu, D. Dai, and L. Van Gool, "Domain generalization and adaptation using low rank exemplar SVMs," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1-1.

[19] M. Long, J. Wang, G. Ding, S. J. Pan, and P. S. Yu, "Adaptation regularization: a general framework for transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 5, pp. 1076–1089, 2014.

[20] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf, "Correcting sample selection bias by unlabeled data," in Proceedings of the 20th Annual Conference on Neural Information Processing Systems (NIPS 2006), pp. 601–608, Canada, December 2006.

[21] M. Sugiyama and M. Kawanabe, Machine Learning in Non-Stationary Environments, The MIT Press, 2012.

[22] W. Dai, Q. Yang, G. Xue, and Y. Yu, "Boosting for transfer learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 193–200, New York, NY, USA, June 2007.

[23] R. Aljundi, R. Emonet, D. Muselet, and M. Sebban, "Landmarks-based kernelized subspace alignment for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 56–63, USA, June 2015.

[24] B. Tan, Y. Zhang, S. J. Pan, and Q. Yang, "Distant domain transfer learning," in Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), pp. 2604–2610, USA, February 2017.

[25] W.-S. Chu, F. De La Torre, and J. F. Cohn, "Selective transfer machine for personalized facial expression analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 3, pp. 529–545, 2017.

[26] R. Aljundi, J. Lehaire, F. Prost-Boucle, O. Rouviere, and C. Lartizien, "Transfer learning for prostate cancer mapping based on multicentric MR imaging databases," Lecture Notes in Computer Science, vol. 9487, pp. 74–82, 2015.

[27] M. Long, Transfer learning: problems and methods [Ph.D. thesis], Tsinghua University, 2014.

[28] B. Gong, Y. Shi, F. Sha, and K. Grauman, "Geodesic flow kernel for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2066–2073, June 2012.

[29] J. Blitzer, R. McDonald, and F. Pereira, "Domain adaptation with structural correspondence learning," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '06), pp. 120–128, Association for Computational Linguistics, July 2006.

[30] M. Long, Y. Cao, J. Wang, and M. I. Jordan, "Learning transferable features with deep adaptation networks," pp. 97–105, 2015.

[31] M. Long, J. Wang, and M. I. Jordan, Deep Transfer Learning with Joint Adaptation Networks, 2016.

[32] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS 2014), pp. 3320–3328, Canada, December 2014.

[33] J. Yang, R. Yan, and A. G. Hauptmann, "Cross-domain video concept detection using adaptive SVMs," in Proceedings of the 15th ACM International Conference on Multimedia (MM '07), pp. 188–197, September 2007.

[34] S. Li, S. Song, and G. Huang, "Prediction reweighting for domain adaption," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 7, pp. 1682–1695, 2017.

[35] Z. Xu, W. Li, L. Niu, and D. Xu, "Exploiting low-rank structure from latent domains for domain generalization," Lecture Notes in Computer Science, vol. 8691, no. 3, pp. 628–643, 2014.

[36] L. Niu, W. Li, D. Xu, and J. Cai, "An exemplar-based multi-view domain generalization framework for visual recognition," IEEE Transactions on Neural Networks and Learning Systems, 2016.

[37] L. Niu, W. Li, and D. Xu, "Multi-view domain generalization for visual recognition," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV 2015), pp. 4193–4201, Chile, December 2015.

[38] T. Kobayashi, "Three viewpoints toward exemplar SVM," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 2765–2773, USA, June 2015.

[39] S. J. Pan, J. T. Kwok, and Q. Yang, "Transfer learning via dimensionality reduction," in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 677–682, 2008.

[40] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, "Domain adaptation via transfer component analysis," IEEE Transactions on Neural Networks, vol. 22, no. 2, pp. 199–210, 2011.

[41] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting visual category models to new domains," in Computer Vision—ECCV 2010, vol. 6314 of Lecture Notes in Computer Science, pp. 213–226, Springer, Berlin, Germany, 2010.

[42] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS '12), pp. 1097–1105, Lake Tahoe, Nev, USA, December 2012.

[43] V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, Wiley-Interscience, New York, NY, USA, 1998.

[44] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, "Unsupervised visual domain adaptation using subspace alignment," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV 2013), pp. 2960–2967, Australia, December 2013.

[45] M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, "Transfer joint matching for unsupervised domain adaptation," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), pp. 1410–1417, USA, June 2014.

[46] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 4: Unsupervised Domain Adaptation Using Exemplar-SVMs with ...downloads.hindawi.com/journals/complexity/2018/8425821.pdf · ResearchArticle Unsupervised Domain Adaptation Using Exemplar-SVMs

4 Complexity

24 Transfer Component Analysis Reference [39] proposeda dimension reduction method called maximum mean dis-crepancy embedding (MMDE) By minimizing the distanceof source and target domain data distribution in a sharedlatent space the source domain data is utilized to assisttraining classifier on the target domain MMDE is not onlyto minimize the distance between the domains in the latentspace but also preserve the properties of data by maximumof the variance of data Based on the MMDE [40] extendedit to have the ability of deal with the unseen instance andreduce the computation complexity of MMDE SubstantiallyTCA simplifies the process of learning kernel matrix insteadby transforming init kernel matrix The optimization of thisproblem is equal to a solution in 119898 leading eigenvectors ofobject matrix

3 Domain Adaptation Exemplar SupportVector Machine

In this section we present the formulation of DomainAdaptation Exemplar Support Vector Machine (DAESVM)In the remainder of this paper we use a lowercase letter inboldface to represent a column vector and an uppercase inboldface to represent a matrix The notation mentioned inSection is extended We use x+푖 119894 isin 1 119899+푆 where 119899+푆is the number of positive instances to represent a positiveinstance and xminus푗 119895 isin 1 119899minus푆 where 119899minus푆 is the number ofnegative instances to represent a negative instanceThe set ofnegative samples are written as 119873minus This section introducesthe formulation procession of an exemplar classifier In factwe need to train exemplar classifiers in the number of sourcedomain instances and the method which integrates theseclassifiers is proposed in Section

3.1. Exemplar-SVM. The exemplar-SVM is built on the extreme idea of training a classifier that separates a single positive instance from all the negative instances and then calibrating the outputs of the classifiers into a probability distribution to separate the samples; the model thus trains as many classifiers as there are positive instances. Learning a classifier which aims to separate a positive instance from all the negative instances can be modeled as

$$ f(\mathbf{w}, b) = \|\mathbf{w}\|^2 + C_1\, h(\mathbf{w}^T \mathbf{x}^+ + b) + C_2 \sum_{\mathbf{x}_i^- \in N_S^-} h(-\mathbf{w}^T \mathbf{x}_i^- - b), \quad (1) $$

where $\|\cdot\|$ is the 2-norm of a vector, $C_1$ and $C_2$ are the tradeoff parameters, corresponding to $C$ in SVM, that balance the positive and negative error costs, and $h(x) = \max(0, 1 - x)$ is the hinge loss function.
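As a concrete illustration of (1), the following is a minimal sketch of training one exemplar classifier, assuming scikit-learn is available; the asymmetric costs $C_1$ and $C_2$ are emulated through per-sample weights, and the function name train_exemplar_svm is ours, not from the paper.

```python
# Minimal exemplar-SVM sketch for Eq. (1): one positive instance versus
# all negatives, with asymmetric costs C1 (positive) and C2 (negatives).
import numpy as np
from sklearn.svm import LinearSVC

def train_exemplar_svm(x_pos, X_neg, C1=100.0, C2=0.01):
    X = np.vstack([x_pos[None, :], X_neg])
    y = np.hstack([[1.0], -np.ones(len(X_neg))])
    # LinearSVC exposes a single C; per-sample costs go through sample_weight.
    weights = np.hstack([[C1], C2 * np.ones(len(X_neg))])
    clf = LinearSVC(C=1.0)
    clf.fit(X, y, sample_weight=weights)
    return clf
```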

Formulation (1) is the primal problem of the exemplar-SVM, and we can derive the dual problem in order to utilize the kernel method. The dual formulation can be written as follows [38]:

$$ \min_{\boldsymbol{\alpha}}\ \boldsymbol{\alpha}^T \bar{\mathbf{K}} \boldsymbol{\alpha} - \mathbf{e}^T \boldsymbol{\alpha} $$
$$ \text{s.t.}\ \alpha_0 - \sum_{i=1}^{n_S^-} \alpha_i = 0, \quad 0 \le \alpha_0 \le C_1, \quad 0 \le \alpha_i \le C_2\ \ \forall i \ge 1, \quad (2) $$

where $\boldsymbol{\alpha} = (\alpha_0, \alpha_1, \ldots, \alpha_{n_S^-}) \in \mathbb{R}^{n_S^- + 1}$ are the Lagrangian multipliers and $\mathbf{e}$ is the identity (all-ones) vector. We take this model as an exemplar learner. The matrix $\bar{\mathbf{K}} \in \mathbb{R}^{(n_S^-+1) \times (n_S^-+1)}$ is composed of

$$ \bar{\mathbf{K}} = \begin{bmatrix} k(\mathbf{x}^+, \mathbf{x}^+) & -\mathbf{k}^T \\ -\mathbf{k} & \mathbf{K} \end{bmatrix} \in \mathbb{R}^{(n_S^-+1) \times (n_S^-+1)}, \qquad \mathbf{k} \in \mathbb{R}^{n_S^-},\ k_i = k(\mathbf{x}^+, \mathbf{x}_i^-),\ \mathbf{K}_{ij} = k(\mathbf{x}_i^-, \mathbf{x}_j^-). \quad (3) $$

3.2. Pseudo Labels for the Kernel Matrix. To make the best use of samples in both the source and target domains, we construct the kernel matrix on data from both domains. However, in the dual problem of SVM, the kernel matrix K must be supplied with labeled data, whereas our model addresses the unsupervised domain adaptation problem, in which only source domain data are labeled. Motivated by [19], we use pseudo labels to help train the model. Pseudo labels are predicted by classical classifiers (SVMs in our model) trained on the labeled source data. Due to the distribution mismatch between the source and target domains, many of these labels may be incorrect. Following [19], we assume that the pseudo class centroids computed from them do not reside far from the true class centroids. Thus we use data from both domains to supplement the kernel matrix K with label information. Our experiments verify that this method is effective.
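A minimal sketch of this pseudo-labeling step, assuming scikit-learn and an RBF-kernel SVM as the base classifier; the variable names are ours.

```python
# Pseudo labels (Sec. 3.2): a classifier trained on the labeled source
# domain predicts (noisy) labels for the unlabeled target domain.
from sklearn.svm import SVC

def pseudo_labels(X_src, y_src, X_tgt):
    base = SVC(kernel="rbf").fit(X_src, y_src)
    return base.predict(X_tgt)
```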

3.3. Exemplar Learner in Domain Adaptation Form. In fact, each exemplar learner is an SVM in kernel form, trained on one positive instance and all the negative instances. In the view of [16], a discriminative exemplar classifier can be taken as a representation of a positive instance. In tasks such as object detection or image classification, this parametric representation is useful because samples have characteristics, such as angle, color, orientation, and background, that are hard to represent otherwise; the instance-based parametric discriminative classifier can encode more information about the positive sample. Similarly, following the motivation of transfer learning, we can view each positive instance as a domain, with some mismatch among these domains. Our model aims to correct this mismatch and reduce the distance to the target domain. We construct an exemplar-learner distance metric between domains from MMD, and it can be written as

$$ \operatorname{dist}(\mathbf{x}_S, \mathbf{x}_T) = \left\| \phi(\mathbf{x}_S^+) + \frac{1}{n_S^-} \sum_{i=1}^{n_S^-} \phi(\mathbf{x}_{S,i}^-) - \frac{2}{n_T} \sum_{i=1}^{n_T} \phi(\mathbf{x}_{T,i}) \right\|_{\mathcal{H}}^2. \quad (4) $$
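To make (4) concrete, here is a hedged numpy sketch that evaluates the distance with an explicit feature map $\phi$ (in the kernelized model, the inner products are supplied by K instead); the function name is ours.

```python
# Squared MMD-style distance of Eq. (4) with an explicit feature map:
# phi_pos is phi(x+), Phi_neg stacks phi(x-_i), Phi_tgt stacks phi(x_T,i).
import numpy as np

def exemplar_mmd(phi_pos, Phi_neg, Phi_tgt):
    mean_emb = phi_pos + Phi_neg.mean(axis=0) - 2.0 * Phi_tgt.mean(axis=0)
    return float(mean_emb @ mean_emb)  # squared norm in the embedding space
```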


However, (4) is only a distance metric, whereas our requirement is to minimize this distance through some transformation. Motivated by Transfer Component Analysis (TCA), we want to map the instances into a latent space in which the instances from the source and target domains are more similar; we denote this mapping by $P(x)$. Namely, we aim to minimize the MMD distance between domains by mapping instances into another space. We extend the distance function as follows:

$$ \operatorname{dist}(\mathbf{x}_S, \mathbf{x}_T) = \left\| \phi(P(\mathbf{x}_S^+)) + \frac{1}{n_S^-} \sum_{i=1}^{n_S^-} \phi(P(\mathbf{x}_{S,i}^-)) - \frac{2}{n_T} \sum_{i=1}^{n_T} \phi(P(\mathbf{x}_{T,i})) \right\|_{\mathcal{H}}^2. \quad (5) $$

Following the general approach, we reformulate (5) into a kernel matrix form. We define the Gram matrices on the source positive domain, the source negative domain, and the target domain. The kernel matrix $\mathbf{K}$ is composed of nine submatrices $\mathbf{K}_{T+}$, $\mathbf{K}_{T-}$, $\mathbf{K}_{TT}$, $\mathbf{K}_{++}$, $\mathbf{K}_{--}$, $\mathbf{K}_{+-}$, $\mathbf{K}_{+T}$, $\mathbf{K}_{-T}$, $\mathbf{K}_{-+}$, where $\mathbf{K}_{ij} = \phi(x_i)^T \phi(x_j)$:

$$ \mathbf{K} = \begin{bmatrix} \mathbf{K}_{++} & \mathbf{K}_{+-} & \mathbf{K}_{+T} \\ \mathbf{K}_{-+} & \mathbf{K}_{--} & \mathbf{K}_{-T} \\ \mathbf{K}_{T+} & \mathbf{K}_{T-} & \mathbf{K}_{TT} \end{bmatrix} \in \mathbb{R}^{(1+n_S^-+n_T) \times (1+n_S^-+n_T)}, \quad (6) $$

and the coefficient matrix $\mathbf{L}$ is constructed as

$$ \mathbf{L}_{ij} = \begin{cases} 1, & \text{if } \mathbf{x}_i, \mathbf{x}_j \in X_S^+, \\ \dfrac{1}{n_S^-}, & \text{if } \mathbf{x}_i \in X_S^+,\ \mathbf{x}_j \in X_S^-, \\ -\dfrac{2}{n_T}, & \text{if } \mathbf{x}_i \in X_S^+,\ \mathbf{x}_j \in X_T, \\ -\dfrac{2}{n_S^- n_T}, & \text{if } \mathbf{x}_i \in X_T,\ \mathbf{x}_j \in X_S^-, \\ \dfrac{1}{(n_S^-)^2}, & \text{if } \mathbf{x}_i, \mathbf{x}_j \in X_S^-, \\ \dfrac{4}{n_T^2}, & \text{if } \mathbf{x}_i, \mathbf{x}_j \in X_T. \end{cases} \quad (7) $$
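A hedged numpy sketch of building the coefficient matrix in (7), assuming the index ordering (positive exemplar, source negatives, target instances) used for K in (6); symmetrizing the off-diagonal blocks is our assumption.

```python
# Coefficient matrix L of Eq. (7); index 0 = positive exemplar, then the
# n_neg source negatives, then the n_tgt target instances.
import numpy as np

def build_L(n_neg, n_tgt):
    n = 1 + n_neg + n_tgt
    L = np.zeros((n, n))
    pos, neg, tgt = slice(0, 1), slice(1, 1 + n_neg), slice(1 + n_neg, n)
    L[pos, pos] = 1.0
    L[pos, neg] = L[neg, pos] = 1.0 / n_neg             # positive vs. negatives
    L[pos, tgt] = L[tgt, pos] = -2.0 / n_tgt            # positive vs. target
    L[neg, tgt] = L[tgt, neg] = -2.0 / (n_neg * n_tgt)  # negatives vs. target
    L[neg, neg] = 1.0 / n_neg**2
    L[tgt, tgt] = 4.0 / n_tgt**2
    return L
```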

Thus the primal distance function is represented by $\mathbf{K}\mathbf{L}$. Motivated by TCA [40], the mapping of the primal data is equivalent to a transformation of the kernel matrix generated from the source and target domain data. A low-dimensional transformation matrix $\mathbf{M} \in \mathbb{R}^{(1+n_S^-+n_T) \times m}$ reduces the dimension of the primal kernel matrix: it maps the empirical kernel map $\tilde{\mathbf{K}} = (\mathbf{K}\mathbf{K}^{-1/2})(\mathbf{K}^{-1/2}\mathbf{K})$ into an $m$-dimensional shared space. Accordingly, we replace the distance function $\mathbf{K}\mathbf{L}$ by $\mathbf{K}\mathbf{M}\mathbf{M}^T\mathbf{K}\mathbf{L}$. In our case, we follow [40] and minimize the trace of the distance:

$$ \operatorname{dist}(\mathbf{x}_S^+, \mathbf{x}_S^-, \mathbf{x}_T) = \operatorname{tr}(\mathbf{M}^T \mathbf{K} \mathbf{L} \mathbf{K} \mathbf{M}). \quad (8) $$

To control the complexity of $\mathbf{M}$ and preserve the data characteristics, we add a regularization term and a constraint. The domain adaptation term follows from TCA and is written as

$$ \Omega(\mathbf{x}_S^+, \mathbf{x}_S^-, \mathbf{x}_T) = \operatorname{tr}(\mathbf{M}^T \mathbf{K} \mathbf{L} \mathbf{K} \mathbf{M}) + \mu \operatorname{tr}(\mathbf{M}^T \mathbf{M}), \quad \text{s.t. } \mathbf{M}^T \mathbf{K} \mathbf{H} \mathbf{K} \mathbf{M} = \mathbf{I}_m, \quad (9) $$

where $\mu > 0$ is a tradeoff parameter, $\mathbf{I}_m \in \mathbb{R}^{m \times m}$ is an identity matrix, and $\mathbf{H} = \mathbf{I}_{n_S^-+n_T+1} - (1/(n_S^- + n_T + 1))\,\mathbf{e}\mathbf{e}^T$ is the centering matrix.

Furthermore, the objective function of the dual SVM needs to incorporate the training label information, which is likewise the case in our model. Thus we construct the training label matrix $\mathbf{U}$:

$$ \mathbf{U} = \operatorname{diag}(y_S^+, \mathbf{y}_S^-, \mathbf{y}_T). \quad (10) $$

Here $y_S^+$ is the label of the positive instance, $\mathbf{y}_S^-$ is the label vector of the negative source instances, and $\mathbf{y}_T$ contains the pseudo labels of the target instances predicted by the SVM described above. It can be rewritten in another form:

$$ \mathbf{U} = \operatorname{diag}\bigl(1, \underbrace{-1, \ldots, -1}_{n_S^-}, \underbrace{y_{1}^{T}, \ldots, y_{n_T}^{T}}_{n_T}\bigr). \quad (11) $$

The label matrix $\mathbf{U}$ provides the label information of the source domain data and the pseudo labels of the target domain. The matrix $\bar{\mathbf{K}}$ in the dual problem of the exemplar-SVM (2) is the kernel matrix of the primal data; we replace it by mapping the kernel matrix into a latent subspace, namely, replacing $\bar{\mathbf{K}}$ by $\tilde{\mathbf{K}}$, and the final objective function of each DAESVM model is formulated as follows:

$$ \min_{\boldsymbol{\alpha}, \mathbf{M}}\ \boldsymbol{\alpha}^T \tilde{\mathbf{K}} \boldsymbol{\alpha} - \mathbf{e}^T \boldsymbol{\alpha} + \lambda \operatorname{tr}(\mathbf{M}^T \mathbf{K} \mathbf{L} \mathbf{K} \mathbf{M}) + \mu \operatorname{tr}(\mathbf{M}^T \mathbf{M}) $$
$$ \text{s.t.}\ \alpha_0 - \sum_{i=1}^{n_S^-+n_T} \alpha_i = 0, \quad 0 \le \alpha_0 \le C_1, \quad 0 \le \alpha_i \le C_2\ \ \forall i \ge 1, $$
$$ \mathbf{M}^T \mathbf{K} \mathbf{H} \mathbf{K} \mathbf{M} = \mathbf{I}_m, \qquad \tilde{\mathbf{K}} = \mathbf{U} \mathbf{K} \mathbf{M} \mathbf{M}^T \mathbf{K} \mathbf{U}. \quad (12) $$

4. Optimization Algorithm

To minimize problem (12), we adopt an alternating optimization method that alternates between two subproblems, one over the parameter $\boldsymbol{\alpha}$ and one over the mapping matrix $\mathbf{M}$. Under this scheme, the alternating optimization approach is guaranteed to decrease the objective function. Algorithm 1 summarizes the optimization procedure for problem (12).


Algorithm 1: Domain Adaptation Exemplar Support Vector Machine.
Input: $\mathbf{X}_{tr}$, $\mathbf{X}_{te}$; parameters $\lambda$, $\mu$, $m$, $C_1$, and $C_2$.
Output: optimal $\boldsymbol{\alpha}$ and $\mathbf{M}$.
(1) Initialize $\boldsymbol{\alpha} = \mathbf{0}$.
(2) Construct the kernel matrix $\mathbf{K}$ from $\mathbf{X}_{tr}$ and $\mathbf{X}_{te}$ based on (6), the coefficient matrix $\mathbf{L}$ based on (7), the centering matrix $\mathbf{H}$, and the label matrix $\mathbf{U}$ based on (11).
(3) repeat
(4) Update the transformation matrix $\mathbf{M}$ with $\boldsymbol{\alpha}$ fixed:
(5) eigendecompose $(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K} + \lambda \mathbf{K}\mathbf{L}\mathbf{K} - \mu \mathbf{I}_m)^{-1}\mathbf{K}\mathbf{H}\mathbf{K}$ and select the $m$ leading eigenvectors to construct $\mathbf{M}$.
(6) Solve the convex optimization problem with $\mathbf{M}$ fixed to optimize $\boldsymbol{\alpha}$.
(7) until convergence.
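A hedged numpy sketch of step (5) of Algorithm 1, the M-update; it forms the matrix from the text and extracts the $m$ leading eigenvectors, assuming all matrices are dense numpy arrays.

```python
# M-update of Algorithm 1: m leading eigenvectors of
# (K U a a^T U K + lam*K L K - mu*I)^(-1) K H K.
import numpy as np

def update_M(K, L, H, U, alpha, lam, mu, m):
    # Note: the identity must match K's size for the sum to be well defined.
    A = K @ U @ np.outer(alpha, alpha) @ U @ K + lam * (K @ L @ K) - mu * np.eye(len(K))
    vals, vecs = np.linalg.eig(np.linalg.solve(A, K @ H @ K))
    lead = np.argsort(vals.real)[::-1][:m]  # indices of the m largest eigenvalues
    return vecs[:, lead].real
```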

Minimizing over $\mathbf{M}$. With $\boldsymbol{\alpha}$ fixed, the optimization over $\mathbf{M}$ can be rewritten in the following form:

$$ \min_{\mathbf{M}}\ \boldsymbol{\alpha}^T \mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^T\mathbf{K}\mathbf{U} \boldsymbol{\alpha} - \mathbf{e}^T \boldsymbol{\alpha} + \lambda \operatorname{tr}(\mathbf{M}^T \mathbf{K}\mathbf{L}\mathbf{K} \mathbf{M}) + \mu \operatorname{tr}(\mathbf{M}^T \mathbf{M}) $$
$$ \text{s.t. } \mathbf{M}^T \mathbf{K}\mathbf{H}\mathbf{K} \mathbf{M} = \mathbf{I}_m. \quad (13) $$

As in TCA, the formulation contains a nonconvex norm constraint, and we transform this optimization problem by reformulating it as

$$ \max_{\mathbf{M}}\ \operatorname{tr}\left( \left( \mathbf{M}^T (\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K} + \lambda \mathbf{K}\mathbf{L}\mathbf{K} - \mu \mathbf{I}_m) \mathbf{M} \right)^{-1} \mathbf{M}^T \mathbf{K}\mathbf{H}\mathbf{K} \mathbf{M} \right). \quad (14) $$

Proof. The Lagrangian of (13) is

$$ \mathcal{L}(\mathbf{M}, \mathbf{Z}) = \boldsymbol{\alpha}^T \mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^T\mathbf{K}\mathbf{U} \boldsymbol{\alpha} - \mathbf{e}^T \boldsymbol{\alpha} + \lambda \operatorname{tr}(\mathbf{M}^T \mathbf{K}\mathbf{L}\mathbf{K} \mathbf{M}) - \mu \operatorname{tr}(\mathbf{M}^T \mathbf{M}) - \operatorname{tr}\left( (\mathbf{M}^T \mathbf{K}\mathbf{H}\mathbf{K} \mathbf{M} - \mathbf{I}_m) \mathbf{Z} \right). \quad (15) $$

Because the initial kernel matrix $\mathbf{K}$ is symmetric, we can rewrite the first term of (15):

$$ \boldsymbol{\alpha}^T \mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^T\mathbf{K}\mathbf{U} \boldsymbol{\alpha} = \operatorname{tr}(\boldsymbol{\alpha}^T \mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^T\mathbf{K}\mathbf{U} \boldsymbol{\alpha}) = \operatorname{tr}\left[ (\mathbf{M}^T \mathbf{K}^T \mathbf{U} \boldsymbol{\alpha})^T (\mathbf{M}^T \mathbf{K}^T \mathbf{U} \boldsymbol{\alpha}) \right] = \operatorname{tr}\left[ (\mathbf{M}^T \mathbf{K}^T \mathbf{U} \boldsymbol{\alpha}) (\mathbf{M}^T \mathbf{K}^T \mathbf{U} \boldsymbol{\alpha})^T \right] = \operatorname{tr}(\mathbf{M}^T \mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K} \mathbf{M}). \quad (16) $$

The Lagrangian can then be written as

$$ \operatorname{tr}\left( \mathbf{M}^T (\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K} + \lambda \mathbf{K}\mathbf{L}\mathbf{K} - \mu \mathbf{I}_m) \mathbf{M} \right) - \operatorname{tr}\left( (\mathbf{M}^T \mathbf{K}\mathbf{H}\mathbf{K} \mathbf{M} - \mathbf{I}_m) \mathbf{Z} \right). \quad (17) $$

The derivative of (17) with respect to $\mathbf{M}$ is

$$ (\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K} + \lambda \mathbf{K}\mathbf{L}\mathbf{K} - \mu \mathbf{I}_m) \mathbf{M} - \mathbf{K}\mathbf{H}\mathbf{K} \mathbf{M} \mathbf{Z}. \quad (18) $$

Setting this derivative to zero, we obtain $\mathbf{Z}$ as

$$ \mathbf{Z} = (\mathbf{M}^T \mathbf{K}\mathbf{H}\mathbf{K} \mathbf{M})^{\dagger}\, \mathbf{M}^T (\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K} + \lambda \mathbf{K}\mathbf{L}\mathbf{K} - \mu \mathbf{I}_m) \mathbf{M}. \quad (19) $$

Substituting $\mathbf{Z}$ into (17), we obtain

$$ \min_{\mathbf{M}}\ \operatorname{tr}\left( (\mathbf{M}^T \mathbf{K}\mathbf{H}\mathbf{K} \mathbf{M})^{\dagger}\, \mathbf{M}^T (\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K} + \lambda \mathbf{K}\mathbf{L}\mathbf{K} - \mu \mathbf{I}_m) \mathbf{M} \right). \quad (20) $$

Finally, we obtain the equivalent maximization problem (14).

As in TCA, the solution is given by the $m$ leading eigenvectors of $(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^T\mathbf{U}\mathbf{K} + \lambda \mathbf{K}\mathbf{L}\mathbf{K} - \mu \mathbf{I}_m)^{-1} \mathbf{K}\mathbf{H}\mathbf{K}$.

Minimizing over $\boldsymbol{\alpha}$. With $\mathbf{M}$ fixed, the optimization over $\boldsymbol{\alpha}$ can be rewritten in the following QP form:

$$ \min_{\boldsymbol{\alpha}}\ \boldsymbol{\alpha}^T \tilde{\mathbf{K}} \boldsymbol{\alpha} - \mathbf{e}^T \boldsymbol{\alpha} $$
$$ \text{s.t.}\ \alpha_0 - \sum_{i=1}^{n_S^-+n_T} \alpha_i = 0, \quad 0 \le \alpha_0 \le C_1, \quad 0 \le \alpha_i \le C_2\ \ \forall i \ge 1. \quad (21) $$

Here $\tilde{\mathbf{K}} = \mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^T\mathbf{K}\mathbf{U}$ represents the kernel matrix transformed by the transformation matrix $\mathbf{M}$. This problem is clearly a QP and can be solved efficiently using interior point methods or other successive optimization procedures such as the Alternating Direction Method of Multipliers (ADMM).

Algorithm 2: Ensemble Domain Adaptation Exemplar Classifiers.
Input: $\mathbf{y}_S$, $\boldsymbol{\alpha}$, $\mathbf{X}_{te}$; parameter $P$.
Output: prediction labels $\mathbf{y}$.
(1) Compute the weights $\mathbf{w}$ of the classifiers.
(2) Construct the weight matrix $\mathbf{W}$ and biases $\mathbf{b}$ of the predictors based on $\boldsymbol{\alpha}$.
(3) repeat
(4) Compute the scores of each classifier in this category.
(5) Find the top $P$ scores.
(6) Compute the sum of these top scores.
(7) until all categories are processed.
(8) Choose the category with the maximum score as the prediction label $\mathbf{y}$.
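A hedged sketch of the $\boldsymbol{\alpha}$-subproblem (21) as a QP, using cvxpy as one possible off-the-shelf solver (the paper only requires some interior point or ADMM solver); psd_wrap tells the solver that the transformed kernel is positive semidefinite by construction.

```python
# QP of Eq. (21) over alpha, with box constraints and the balance constraint.
import cvxpy as cp

def solve_alpha_qp(K_tilde, C1, C2):
    n = K_tilde.shape[0]  # n = 1 + n_neg + n_tgt
    a = cp.Variable(n)
    obj = cp.Minimize(cp.quad_form(a, cp.psd_wrap(K_tilde)) - cp.sum(a))
    cons = [a[0] - cp.sum(a[1:]) == 0,
            a[0] >= 0, a[0] <= C1,
            a[1:] >= 0, a[1:] <= C2]
    cp.Problem(obj, cons).solve()
    return a.value
```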

5. Ensemble Domain Adaptation Exemplar Classifiers

In this section, we introduce the method for integrating the exemplar classifiers. As mentioned before, we obtain as many classifiers as there are source domain instances, and this section aims to predict labels for the target domain instances. In our view, the classification hyperplane of an exemplar classifier is a representation of a source domain positive instance. However, most of the hyperplanes contain information coming from various samples, such as images with different backgrounds or sources. In fact, we aim to select the exemplar classifiers trained on instances similar to the testing sample; thus we use the integration method to filter out classifiers whose details differ from the testing sample. Another view of the integration method is that it relaxes part of the hyperplanes; namely, it removes exemplar classifiers trained on instances with a large distribution mismatch.

In our method, we first construct the classifiers from the Lagrange multipliers $\boldsymbol{\alpha}$. The classifier construction equations are

$$ \mathbf{w} = \alpha_0 \mathbf{x}^+ - \sum_{i=1}^{n_S^-+n_T} \alpha_i \mathbf{x}_i^-, \quad (22) $$

where $\mathbf{w}$ is the weight of the classifier, and

$$ b = y_j - \alpha_0 \mathbf{K}_{0j} - \sum_{i=1}^{n_S^-+n_T} y_i \alpha_i \mathbf{K}_{ij}, \quad (23) $$

where $b$ is the bias of the classifier. The classifier score is given by

$$ s = \mathbf{w}^\top \mathbf{x} + b. \quad (24) $$

We then compute the score of every classifier on the testing instance. Second, we find the top $P$ scores among each class's classifiers and compute the sum of those scores. Finally, we obtain a score for each class, and the class with the highest score is the predicted category. The prediction method is described in Algorithm 2.
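A hedged numpy sketch of the top-$P$ scoring rule in Algorithm 2; W, b, and labels are assumed to stack the per-exemplar weights, biases, and the class of each exemplar's positive instance.

```python
# Top-P ensemble rule (Algorithm 2): sum the P best exemplar scores per
# class and predict the class with the largest sum.
import numpy as np

def predict_label(x, W, b, labels, P=5):
    scores = W @ x + b  # Eq. (24), evaluated for every exemplar classifier
    class_sums = {c: np.sort(scores[labels == c])[::-1][:P].sum()
                  for c in np.unique(labels)}
    return max(class_sums, key=class_sums.get)
```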

6. Experiments

In this section, we conduct experiments on the four domains Amazon, DSLR, Caltech, and Webcam to evaluate the performance of the proposed Domain Adaptation Exemplar Support Vector Machines. We first compare our method to baselines and other domain adaptation methods; next, we analyze the effectiveness of our approach; finally, we examine parameter sensitivity.

6.1. Data Preparation. We run the experiments on the Office and Office-Caltech datasets. The Office dataset contains three domains, Amazon, Webcam, and DSLR; each includes images from amazon.com or office-environment images taken, with varying lighting and pose changes, using a webcam or a DSLR camera. The Office-Caltech dataset contains the ten categories shared between the Office dataset and the Caltech-256 dataset. Following the standard transfer learning experimental protocol, we merge the two datasets; in total they include the four domains Amazon, DSLR, Caltech, and Webcam, which are studied in [41]. The Amazon domain consists of images downloaded from Amazon merchants. The images in Webcam also come from the web, but they are of low quality, as they are taken by a web camera. The DSLR domain is photographed with a digital SLR camera, so its images are of high quality. Caltech is often added to domain adaptation experiments and was collected for object detection tasks. Each domain has its own characteristics. Compared to the other domains, the quality of images in DSLR is higher, and influencing factors such as object detection and background are weaker than for images downloaded from the web. Amazon and Webcam come from the web, and images in these domains are of lower quality and greater complexity. However, there are some differences between them: instances in Webcam show the object alone, whereas the composition of samples in Amazon is more complex, including background and other goods. Figure 1 shows examples of the backpack category from the four domains. From the transfer learning viewpoint, the datasets come from different domains with different marginal probabilities over the images. Our model aims to solve this problem and obtain a robust cross-domain classifier.

We chose ten common categories across all four datasets: backpack, bike, bike helmet, bookcase, bottle, calculator, desk chair, desk lamp, desktop computer, and file cabinet. There are 8 to 151 samples per category in a domain: 958 images in Amazon, 295 in Webcam, 157 in DSLR, and 1123 in Caltech, for 2533 images in total. Figure 1 shows examples from the datasets.

We use both SURF and DeCAF feature extraction in the experiments. First, we use SURF features to encode the images into 800-bin histograms. Next, we use DeCAF features, extracted from layer 7 of AlexNet [42], giving 4096-bin histograms. Finally, we normalize the histograms and then z-score them to have zero mean and unit standard deviation in each dimension.
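A small sketch of this normalization step, assuming L1 histogram normalization (the specific histogram norm is our assumption) followed by per-dimension z-scoring.

```python
# Normalize histograms, then z-score each dimension to zero mean / unit std.
import numpy as np

def normalize_features(X):
    X = X / np.clip(X.sum(axis=1, keepdims=True), 1e-12, None)  # histogram norm (assumed L1)
    return (X - X.mean(axis=0)) / np.clip(X.std(axis=0), 1e-12, None)
```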

We run our experiments in the standard way for visual domain adaptation: one of the four datasets is used as the source domain and another as the target domain. Each dataset provides the same ten categories and uses the same image representation, so this is a homogeneous domain adaptation problem. For example, we choose images taken from DSLR (denoted by $D$) as source domain data and images in Amazon (denoted by $A$) as target domain data; this problem is denoted as D → A. In this way, we can compose 12 domain adaptation subproblems from the four domains, as enumerated below.
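The 12 subproblems are simply the ordered pairs of distinct domains:

```python
# All ordered (source, target) pairs of the four domains: 4 * 3 = 12 tasks.
from itertools import permutations

tasks = list(permutations(["A", "C", "D", "W"], 2))  # [('A', 'C'), ..., ('W', 'D')]
```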

6.2. Experiment Setup

(1) Baseline Methods. We compare our DAESVM method with three kinds of classical approaches: classifiers without transfer learning regularization, conventional transfer learning methods, and the foundation model, the low-rank exemplar support vector machine. The methods are listed as follows:

(1) Transfer Component Analysis (TCA) [40]
(2) Support Vector Machine (SVM) [43]
(3) Geodesic Flow Kernel (GFK) [28]
(4) Landmarks Selection-based Subspace Alignment (LSSA) [23]
(5) Kernel Mean Matching (KMM) [20]
(6) Subspace Alignment (SA) [44]
(7) Transfer Joint Matching (TJM) [45]
(8) Low-Rank Exemplar-SVMs (LRESVMs) [18]

TCA, GFK, and KMM are classical transfer learning methods, and we compare our model against them. Besides, we show that our method is more robust than models without domain adaptation terms in the transfer learning setting. TCA is the foundation of our model and is similar to GFK and SA, which are based on the idea of feature transfer. KMM transfers knowledge by instance reweighting. TJM is a popular model for the unsupervised domain adaptation problem. SA and LSSA are models that use landmarks to transfer knowledge.

(2) Implementation Details. For the baseline, SVM is trained on the source data and tested on the target data [46]. TCA, SA, LSSA, TJM, and GFK are first applied as dimension reduction processes; a classifier is then trained on the source data and makes predictions for the target domain [19]. Similarly, KMM first computes the weight of each instance and then trains the predictor on the reweighted source data.

Under the assumption of unsupervised domain adaptation, it is impossible to tune the optimal parameters for the target domain task by cross validation, since there is distribution mismatch between domains. Therefore, in the experiments we adopt a grid search strategy to obtain the best parameters and report the best results. Our method involves five tunable parameters: the ESVM tradeoffs $C_1$ and $C_2$, the regularization tradeoffs $\lambda$ and $\mu$, and the dimension reduction parameter $m$. The tradeoff parameters $C_1$ and $C_2$ are selected over $\{10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}\}$. We fix $\lambda = 1$, $\mu = 1$, and $m = 40$ empirically and select the radial basis function (RBF) as the kernel function. In fact, our model is relatively stable under a wide range of parameter values. We train a classifier for every positive instance in the source domain data and then calibrate them into a probability distribution. We handle multiclass classification in a one-versus-the-rest way. To measure the performance of our method, we use the average accuracy and the standard deviation over ten repetitions. The average testing accuracies and standard errors for all 12 tasks of our method are reported in Table 1. Most of the remaining baseline results are cited from previously published papers.
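A hedged sketch of the grid search over $C_1$ and $C_2$ with the other parameters fixed as in the text; train_and_score is a hypothetical stand-in for training DAESVM and returning target accuracy.

```python
# Grid search over the ESVM tradeoffs C1, C2 (lambda=1, mu=1, m=40 fixed).
import itertools

grid = [10.0 ** k for k in range(-3, 4)]
best_C1, best_C2 = max(
    itertools.product(grid, grid),
    key=lambda c: train_and_score(C1=c[0], C2=c[1], lam=1.0, mu=1.0, m=40),  # hypothetical
)
```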

6.3. Experimental Results. In this section, we compare our DAESVM with the baseline methods in terms of classification accuracy.

Table 1 summarizes the classification accuracy obtained over all 10 categories across the 12 tasks in the 4 domains. The highest accuracy per task indicates the best-performing method. First, we implement the traditional classifiers without domain adaptation terms; that is, we train the predictors on the source domain data and make predictions for the target domain dataset. Second, we compare our DAESVM with unsupervised domain adaptation methods such as TCA and GFK, implemented with the same dimension reduction parameter $m$ as in our model. Finally, we also compare DAESVM with recent transfer learning models such as low-rank ESVMs [18].


Figure 1: Example images from the backpack category in Amazon and DSLR ((a), from left to right) and in Webcam and Caltech-256 ((b), from left to right). The images from different domains vary considerably, with different styles, backgrounds, and sources.

Overall, following the usual transfer learning protocol, we run the datasets across different pairs of source and target domains. The accuracy of DAESVM for the adaptation from DSLR to Webcam reaches 92.1%, a relative improvement of 1.2% over LRESVM. Compared with TCA, DAESVMs take into account the distribution mismatch among instances as well as across domains. The adaptation from Webcam to DSLR achieves an accuracy of 91.8%. For the domain datasets Amazon and Caltech, which are larger than DSLR and Webcam, DAESVM achieves an accuracy of 77.5%, a relative improvement of about 36.2% over TJM. Regarding the ability to transfer knowledge from a large dataset to a small one, from Amazon to DSLR we obtain an accuracy of 76.8%; conversely, from DSLR to Amazon, the prediction accuracy is 83.4%. Overall, our DAESVM trained on one domain performs well and is also robust in the multidomain setting.

Table 1: Classification accuracies of different methods on the single-domain adaptation tasks. Compared with conventional transfer learning methods, DAESVMs gain a large improvement in prediction accuracy; they also improve on the recently proposed LRESVM approach [average ± standard error of accuracy (%)].

Task | SVM | KMM | TCA | TJM | SA | GFK | LSSA | LRESVM | DAESVMs
A → C | 45.4 | 42.2 | 45.3 | 56.9 | 51.8 | 49.6 | 54.8 | 79.8 | 77.5 ± 0.79
A → D | 50.7 | 42.7 | 60.3 | 56.4 | 56.4 | 55.7 | 57.3 | 74.9 | 76.8 ± 0.76
A → W | 47.4 | 42.4 | 61.3 | 51.0 | 54.7 | 56.9 | 56.7 | 75.4 | 73.2 ± 1.08
C → A | 50.7 | 48.3 | 54.7 | 58.6 | 57.1 | 51.2 | 58.4 | 77.2 | 80.2 ± 0.39
C → D | 53.2 | 53.5 | 56.4 | 57.4 | 59.0 | 57.1 | 59.1 | 87.1 | 89.0 ± 0.23
C → W | 44.2 | 45.8 | 50.4 | 58.8 | 62.7 | 57.1 | 58.1 | 74.1 | 74.7 ± 0.38
D → A | 40.8 | 42.2 | 53.8 | 46.1 | 58.9 | 59.2 | 58.4 | 80.4 | 83.4 ± 1.41
D → C | 48.3 | 41.6 | 43.9 | 49.6 | 54.3 | 59.4 | 57.7 | 79.0 | 73.0 ± 1.04
D → W | 67.8 | 72.9 | 82.4 | 82.0 | 83.4 | 80.2 | 87.1 | 91.0 | 92.1 ± 0.25
W → A | 42.4 | 41.9 | 53.0 | 50.8 | 57.0 | 66.2 | 59.7 | 74.3 | 77.8 ± 0.33
W → C | 41.2 | 39.0 | 53.7 | 54.8 | 34.7 | 52.4 | 54.2 | 70.6 | 66.5 ± 0.54
W → D | 80.2 | 82.0 | 87.9 | 83.4 | 78.9 | 81.2 | 87.2 | 89.2 | 91.8 ± 0.59
Average | 51.0 | 49.5 | 58.6 | 58.8 | 59.1 | 60.5 | 62.4 | 79.4 | 80.0 ± 0.67

Table 2: Results on the multidomain tasks, which gain an improvement over previously proposed methods. The experiments adopt the same strategy as single-domain adaptation: multiple domains are treated as one source or target to find the shared features in a latent space. However, the complexity of the multidomain shared features limits the accuracy of the tasks [average ± standard error of accuracy (%)].

Task | SVM | KMM | TCA | TJM | SA | GFK | LSSA | LRESVM | DAESVM
D,W → A | 45.7 | 37.4 | 40.5 | 57.1 | 59.4 | 47.3 | 61.7 | 80.1 | 77.2 ± 1.27
A,D → C,W | 37.1 | 31.6 | 43.0 | 60.2 | 48.7 | 47.6 | 74.2 | 86.9 | 84.7 ± 0.65
D → A,C,W | 41.4 | 43.8 | 57.2 | 63.9 | 51.9 | 51.4 | 77.0 | 82.9 | 88.4 ± 0.21
A,D,W → C | 43.9 | 50.6 | 54.9 | 69.0 | 60.2 | 60.4 | 63.7 | 87.7 | 90.1 ± 0.34
A,D → W | 71.0 | 61.0 | 54.0 | 61.3 | 54.0 | 47.0 | 71.9 | 80.8 | 83.8 ± 0.78
A,C → D,W | 81.4 | 53.9 | 77.4 | 71.8 | 57.4 | 64.1 | 80.7 | 89.3 | 92.4 ± 0.25
Average | 53.4 | 46.4 | 54.5 | 63.9 | 55.2 | 53.0 | 71.5 | 84.6 | 86.1 ± 0.58

We also conduct multidomain adaptation tasks, which use one or more domains as the source domain data and adapt to the other domains. The results are shown in Table 2. The accuracy of DAESVM for the adaptation from Amazon, DSLR, and Webcam to Caltech reaches 90.1%, an improvement over LRESVM. The task of adaptation from Amazon and Caltech to Webcam and DSLR achieves an accuracy of 92.4%. The experiments show that our models are effective not only for single-domain adaptation but also for multidomain adaptation.

Two key factors may contribute to the superiority of our method. First, the feature transfer regularization term relaxes the similarity assumption: it only assumes that there are some shared features across domains, instead of assuming that the domains are similar to each other. This makes the model more robust than models with a reweighting term. The second factor is the exemplar-SVMs, which are proposed from a transfer learning motivation and explicitly account for the distribution mismatch between individual instances. Our model combines these two factors to resist the distribution mismatch among domains and the sample selection bias among instances.

6.4. Pseudo Label Effectiveness. Following [19], we use pseudo labels to supplement model training. In our experiments, we test how the prediction results are influenced by the accuracy of the pseudo labels. As shown in Figure 2, the prediction accuracy improves as the pseudo label accuracy increases. This demonstrates that the pseudo label method is effective, and we can iterate by using the labels predicted by the DAESVM as the new pseudo labels. This iteration step can efficiently enhance the performance of the classifiers.
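A hedged sketch of this pseudo-label iteration; train_daesvm is a hypothetical stand-in for fitting the full model with the current pseudo labels.

```python
# Self-training loop (Sec. 6.4): each round's predictions become the next
# round's pseudo labels, reusing pseudo_labels() from the Sec. 3.2 sketch.
def self_training(X_src, y_src, X_tgt, rounds=3):
    y_pseudo = pseudo_labels(X_src, y_src, X_tgt)
    for _ in range(rounds):
        model = train_daesvm(X_src, y_src, X_tgt, y_pseudo)  # hypothetical trainer
        y_pseudo = model.predict(X_tgt)
    return y_pseudo
```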

Figure 2: The accuracy of DAESVMs improves as the accuracy of the pseudo labels increases (pseudo label accuracy versus prediction accuracy). The results verify the effectiveness of the pseudo label method.

6.5. Parameter Sensitivity. There are five parameters in our model. We conduct a parameter sensitivity analysis, which shows that near-optimal performance is achieved under a wide range of parameter values, and we discuss the results below.

(1) Tradeoff $\lambda$. $\lambda$ controls the weight of the MMD term, which aims to minimize the distribution mismatch between the source and target domains. Theoretically, we want this term to equal zero. However, if we set this parameter to infinity ($\lambda \to \infty$), the model may lose the data properties when transforming the source and target domain data into the high-dimensional space. Conversely, if we set $\lambda$ to zero, the model loses the ability to correct the distribution mismatch.

(2) Tradeoff $\mu$. $\mu$ controls the weight of the data variance term, which aims to preserve the data properties. Theoretically, we want this term to equal zero. However, if we set this parameter to infinity ($\mu \to \infty$), it may amplify the data distribution mismatch among different domains; namely, the transformation matrix $\mathbf{M}$ cannot utilize the source data to assist the target task. Conversely, if we set $\mu$ to zero, the model cannot preserve the properties of the original data.

(3) Dimension Reduction $m$. $m$ is the dimension of the transformation matrix, namely, the dimension of the subspace into which we map the samples. Setting $m$ too small may lose data properties, which can cause the classifier to fail; if $m$ is too large, the ability to correct the distribution mismatch may be lost. We examine the classification results as a function of $m$; the results are displayed in Figure 3.

(4) Tradeoffs $C_1$ and $C_2$ in ESVM. Parameters $C_1$ and $C_2$ are the upper bounds of the Lagrangian variables. In standard SVM, positive and negative instances share the same value for these two parameters. In our model, we expect the weights of the positive samples to be higher than those of the negative samples. In our experiments, the value of $C_1$ is one hundred times $C_2$, which yields a high-performance predictor. A visual analysis of these two parameters is shown in Figure 4.

Figure 3: Prediction accuracy as a function of the subspace dimension $m$ (searched over 20, 40, 60, 80, 100). When the dimension is 20 or 40, the prediction accuracy is higher than for other values.

Figure 4: Prediction accuracy under different values of $C_1$ and $C_2$. We fix $\lambda = 1$, $m = 20$, and $\mu = 1$ in these experiments; $C_1$ is searched over {0.1, 0.5, 1, 5, 10, 50, 100} and $C_2$ over {0.001, 0.005, 0.01, 0.1, 0.5, 1, 10}.

7. Conclusion

In this paper, we have proposed an effective method for domain adaptation problems with a regularization term that reduces the data distribution mismatch between domains and preserves the properties of the original data. Furthermore, the method of integrating classifiers can predict target domain data with high accuracy. The proposed method mainly aims to solve problems in which domain or instance distribution mismatch occurs. Meanwhile, we extend DAESVMs to multiple source or target domains. The experiments conducted on the transfer learning datasets transfer knowledge from image to image.


Our future work is as follows. First, we will integrate the training process of all the classifiers in an ensemble way; it would be better to accelerate training by rewriting all the weights in matrix form, a strategy that can omit the matrix inversion in the optimization. Second, we want to impose a constraint on $\boldsymbol{\alpha}$ that enforces sparsity. Finally, we will extend DAESVMs to problems that transfer knowledge among weakly related domains, such as transferring knowledge from image to video or text.

Notations and Descriptions

$\mathcal{D}_S$, $\mathcal{D}_T$: Source/target domain
$\mathcal{T}_S$, $\mathcal{T}_T$: Source/target task
$d$: Dimension of the features
$\mathbf{X}_S$, $\mathbf{X}_T$: Source/target sample matrix
$\mathbf{y}_S$, $\mathbf{y}_T$: Source/target sample label matrix
$\mathbf{K}$: Kernel matrix without label information
$\boldsymbol{\alpha}$: Lagrange multiplier vector
$n_S$, $n_T$: Number of source/target domain instances
$\mathbf{e}$: Identity (all-ones) vector
$\mathbf{I}$: Identity matrix

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work has been partially supported by grants from the National Natural Science Foundation of China (nos. 61472390, 71731009, 91546201, and 11771038) and the Beijing Natural Science Foundation (no. 1162005).

References

[1] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
[2] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning, pp. 160–167, ACM, July 2008.
[3] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 580–587, Columbus, Ohio, USA, June 2014.
[4] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[5] W.-S. Chu, F. D. L. Torre, and J. F. Cohn, "Selective transfer machine for personalized facial action unit detection," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), pp. 3515–3522, USA, June 2013.
[6] A. Kumar, A. Saha, and H. Daume, "Co-regularization based semi-supervised domain adaptation," in Advances in Neural Information Processing Systems 23, pp. 478–486, 2010.
[7] M. Xiao and Y. Guo, "Feature space independent semi-supervised domain adaptation via kernel matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 1, pp. 54–66, 2015.
[8] S. J. Pan, J. T. Kwok, Q. Yang, and J. J. Pan, "Adaptive localization in a dynamic WiFi environment through multi-view learning," in Proceedings of the 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference (AAAI-07/IAAI-07), pp. 1108–1113, Canada, July 2007.
[9] A. Van Engelen, A. C. Van Dijk, M. T. B. Truijman et al., "Multi-center MRI carotid plaque component segmentation using feature normalization and transfer learning," IEEE Transactions on Medical Imaging, vol. 34, no. 6, pp. 1294–1305, 2015.
[10] Y. Zhang, J. Wu, Z. Cai, P. Zhang, and L. Chen, "Memetic extreme learning machine," Pattern Recognition, vol. 58, pp. 135–148, 2016.
[11] M. Uzair and A. Mian, "Blind domain adaptation with augmented extreme learning machine features," IEEE Transactions on Cybernetics, vol. 47, no. 3, pp. 651–660, 2017.
[12] L. Zhang and D. Zhang, "Domain adaptation extreme learning machines for drift compensation in E-nose systems," IEEE Transactions on Instrumentation and Measurement, vol. 64, no. 7, pp. 1790–1801, 2015.
[13] B. Schölkopf, J. Platt, and T. Hofmann, in A Kernel Method for the Two-Sample-Problem, pp. 513–520, 2008.
[14] J. Wu, S. Pan, X. Zhu, C. Zhang, and X. Wu, "Multi-instance learning with discriminative bag mapping," IEEE Transactions on Knowledge and Data Engineering, pp. 1-1.
[15] J. Wu, S. Pan, X. Zhu, C. Zhang, and P. S. Yu, "Multiple structure-view learning for graph classification," IEEE Transactions on Neural Networks and Learning Systems, vol. PP, no. 99, pp. 1–16, 2017.
[16] T. Malisiewicz, A. Gupta, and A. A. Efros, "Ensemble of exemplar-SVMs for object detection and beyond," in Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV 2011), pp. 89–96, Spain, November 2011.
[17] B. Zadrozny, "Learning and evaluating classifiers under sample selection bias," in Proceedings of the Twenty-First International Conference on Machine Learning, p. 114, Banff, Alberta, Canada, July 2004.
[18] W. Li, Z. Xu, D. Xu, D. Dai, and L. Van Gool, "Domain generalization and adaptation using low rank exemplar SVMs," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1-1.
[19] M. Long, J. Wang, G. Ding, S. J. Pan, and P. S. Yu, "Adaptation regularization: a general framework for transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 5, pp. 1076–1089, 2014.
[20] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf, "Correcting sample selection bias by unlabeled data," in Proceedings of the 20th Annual Conference on Neural Information Processing Systems (NIPS 2006), pp. 601–608, Canada, December 2006.
[21] M. Sugiyama and M. Kawanabe, Machine Learning in Non-Stationary Environments, The MIT Press, 2012.
[22] W. Dai, Q. Yang, G. Xue, and Y. Yu, "Boosting for transfer learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 193–200, New York, NY, USA, June 2007.
[23] R. Aljundi, R. Emonet, D. Muselet, and M. Sebban, "Landmarks-based kernelized subspace alignment for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 56–63, USA, June 2015.
[24] B. Tan, Y. Zhang, S. J. Pan, and Q. Yang, "Distant domain transfer learning," in Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), pp. 2604–2610, USA, February 2017.
[25] W.-S. Chu, F. De La Torre, and J. F. Cohn, "Selective transfer machine for personalized facial expression analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 3, pp. 529–545, 2017.
[26] R. Aljundi, J. Lehaire, F. Prost-Boucle, O. Rouvière, and C. Lartizien, "Transfer learning for prostate cancer mapping based on multicentric MR imaging databases," Lecture Notes in Computer Science, vol. 9487, pp. 74–82, 2015.
[27] M. Long, Transfer Learning: Problems and Methods [Ph.D. thesis], Tsinghua University, 2014.
[28] B. Gong, Y. Shi, F. Sha, and K. Grauman, "Geodesic flow kernel for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2066–2073, June 2012.
[29] J. Blitzer, R. McDonald, and F. Pereira, "Domain adaptation with structural correspondence learning," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '06), pp. 120–128, Association for Computational Linguistics, July 2006.
[30] M. Long, Y. Cao, J. Wang, and M. I. Jordan, "Learning transferable features with deep adaptation networks," pp. 97–105, 2015.
[31] M. Long, J. Wang, and M. I. Jordan, Deep Transfer Learning with Joint Adaptation Networks, 2016.
[32] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS 2014), pp. 3320–3328, Canada, December 2014.
[33] J. Yang, R. Yan, and A. G. Hauptmann, "Cross-domain video concept detection using adaptive SVMs," in Proceedings of the 15th ACM International Conference on Multimedia (MM '07), pp. 188–197, September 2007.
[34] S. Li, S. Song, and G. Huang, "Prediction reweighting for domain adaption," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 7, pp. 1682–1695, 2017.
[35] Z. Xu, W. Li, L. Niu, and D. Xu, "Exploiting low-rank structure from latent domains for domain generalization," Lecture Notes in Computer Science, vol. 8691, no. 3, pp. 628–643, 2014.
[36] L. Niu, W. Li, D. Xu, and J. Cai, "An exemplar-based multi-view domain generalization framework for visual recognition," IEEE Transactions on Neural Networks and Learning Systems, 2016.
[37] L. Niu, W. Li, and D. Xu, "Multi-view domain generalization for visual recognition," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV 2015), pp. 4193–4201, Chile, December 2015.
[38] T. Kobayashi, "Three viewpoints toward exemplar SVM," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 2765–2773, USA, June 2015.
[39] S. J. Pan, J. T. Kwok, and Q. Yang, "Transfer learning via dimensionality reduction," in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 677–682, 2008.
[40] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, "Domain adaptation via transfer component analysis," IEEE Transactions on Neural Networks, vol. 22, no. 2, pp. 199–210, 2011.
[41] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting visual category models to new domains," in Computer Vision—ECCV 2010, vol. 6314 of Lecture Notes in Computer Science, pp. 213–226, Springer, Berlin, Germany, 2010.
[42] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS '12), pp. 1097–1105, Lake Tahoe, Nev, USA, December 2012.
[43] V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, Wiley-Interscience, New York, NY, USA, 1998.
[44] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, "Unsupervised visual domain adaptation using subspace alignment," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV 2013), pp. 2960–2967, Australia, December 2013.
[45] M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, "Transfer joint matching for unsupervised domain adaptation," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), pp. 1410–1417, USA, June 2014.
[46] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.


65 Parameter Sensitivity There are five parameters in ourmodel and we conduct the parameter sensitivity analysiswhich can achieve optimal performance under a wide rangeof parameter values and discuss the results

(1) Tradeoff 120582 120582 is a tradeoff to control the weight of MMDitem which aims to minimize the distribution mismatchbetween source and target domain Theoretically we wantthis term to be equal to zero However if we set this

Complexity 11

0 02 04 06 08 102

04

06

08

1

Pseudo label accuracy

Pred

ictio

n ac

cura

cy

rarr

rarr

rarr

Figure 2The accuracy of DAESVMs is improvedwith the improve-ment of the pseudo label accuracyThe results verify the effectivenessof the pseudo label method

parameter to infinite 120582 rarr infin it may lose the data propertieswhen we transform source and target domain data intohigh-dimension space Contrarily if we set 120582 to zero themodel would lose the function of correcting the distributionmismatch

(2) Tradeoff 120583 120583 is a tradeoff to control the weight ofdata variance item which aims to preserve data propertiesTheoretically we want this item to be equal to zero Howeverif we set this parameter to infinite 120583 rarr infin it may augmentthe data distribution mismatch among different domainsnamely transformation matrix M cannot utilize source datato assist the target task Contrarily if we set 120583 to zero themodel cannot preserve the properties of original data

(3) Dimension Reduction119898119898 is the dimension of the trans-formation matrix namely the dimension of the subspacewhich we want to map samples into Similarly minimizing119898too less may lead to losing the properties of data which maylead to the classifier failure If119898 is too large the effectivenessof correct distributionmismatchmay be lostWe conduct theclassification results influenced by the dimension of 119898 andthe results are displayed in Figure 3

(4) Tradeoff in ESVM 1198621 and 1198622 Parameters 1198621 and 1198622are the upper bound of the Lagrangian variables In thestandard SVM positive and negative instances share thesame standard of these two parameters In our models weexpect the weights of the positive samples to be higher thannegative samples In our experiments the value of 1198621 isone hundred times 1198622 which could gain a high-performancepredictor The visual analysis of these two parameters is inFigure 4

20 40 60 80 100

1

Dimension m

04

06

08

Pred

ictio

n ac

cura

cy

rarr

rarr

rarr

Figure 3 When the dimension is 20 or 40 the prediction accuracyis higher than others

108 1006 80

604

07

075

08

085

09

095

402 200 0

Figure 4 We fix 120582 = 1 119898 = 20 and 120583 = 1 in these experimentsand 1198621 is searched in 01 05 1 5 10 50 100 and 1198622 is searched in0001 0005 001 01 05 1 107 Conclusion

In this paper we have proposed an effective method fordomain adaptation problems with regularization item whichreduces the data distribution mismatch between domainsand preserves properties of the original data Furthermoreutilizing the method of integrating classifiers can predicttarget domain data with high accuracyThe proposedmethodmainly aims to solve the problem in which domains orinstances distributions mismatch occurs Meanwhile weextend DAESVMs to the multiple source or target domainsExperiments conducted on the transfer learning datasetstransfer knowledge from image to image

12 Complexity

Our future works are as follows First we will integratethe training procession of all the classifiers in an ensembleway It is better to accelerate training process by rewritingall the weight into a matrix form This strategy can omitthe process of matrix inversion optimization Second wewant to make a constraint for 120572 that can hold the sparsityAt last we will extend DAESVMs on the problem transferknowledge among domains which have few relationshipssuch as transfer knowledge from image to video or text

Notations and Descriptions

D푆D푇 Sourcetarget domainT푆T푇 Sourcetarget task119889 Dimension of featureX푆X푇 Sourcetarget sample matrixy푆 y푇 Sourcetarget sample label matrixK Kernel matrix without label information120572 Lagrange multipliers vector119899푆 119899푇 The number of sourcetarget domain

instancese Identity vectorI Identity matrix

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work has been partially supported by grants fromNational Natural Science Foundation of China (nos61472390 71731009 91546201 and 11771038) and the BeijingNatural Science Foundation (no 1162005)

References

[1] S Ren K He R Girshick and J Sun ldquoFaster R-CNN TowardsReal-Time Object Detection with Region Proposal NetworksrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 39 no 6 pp 1137ndash1149 2017

[2] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of the 25th International Conference onMachine Learning pp 160ndash167 ACM July 2008

[3] R Girshick J Donahue T Darrell and J Malik ldquoRich fea-ture hierarchies for accurate object detection and semanticsegmentationrdquo in Proceedings of the 27th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 Columbus Ohio USA June 2014

[4] S J Pan and Q Yang ldquoA survey on transfer learningrdquo IEEETransactions on Knowledge and Data Engineering vol 22 no10 pp 1345ndash1359 2010

[5] W-S Chu F D L Torre and J F Cohn ldquoSelective transfermachine for personalized facial action unit detectionrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition CVPR 2013 pp 3515ndash3522 USA June 2013

[6] A Kumar A Saha and H Daume ldquoCo-regularization basedsemi-supervised domain adaptationrdquo In Advances in NeuralInformation Processing Systems 23 pp 478ndash486 2010

[7] M Xiao and Y Guo ldquoFeature space independent semi-supervised domain adaptation via kernel matchingrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol37 no 1 pp 54ndash66 2015

[8] S J Pan J T Kwok Q Yang and J J Pan ldquoAdaptive localizationin a dynamic WiFi environment through multi-view learningrdquoin Proceedings of the AAAI-07IAAI-07 Proceedings 22nd AAAIConference on Artificial Intelligence and the 19th InnovativeApplications of Artificial Intelligence Conference pp 1108ndash1113can July 2007

[9] A Van Engelen A C Van Dijk M T B Truijman et alldquoMulti-Center MRI Carotid Plaque Component SegmentationUsing Feature Normalization and Transfer Learningrdquo IEEETransactions on Medical Imaging vol 34 no 6 pp 1294ndash13052015

[10] Y Zhang J Wu Z Cai P Zhang and L Chen ldquoMemeticExtreme Learning Machinerdquo Pattern Recognition vol 58 pp135ndash148 2016

[11] M Uzair and A Mian ldquoBlind domain adaptation with aug-mented extreme learning machine featuresrdquo IEEE Transactionson Cybernetics vol 47 no 3 pp 651ndash660 2017

[12] L Zhang andD Zhang ldquoDomainAdaptation ExtremeLearningMachines for Drift Compensation in E-Nose Systemsrdquo IEEETransactions on Instrumentation and Measurement vol 64 no7 pp 1790ndash1801 2015

[13] B Scholkopf J Platt and T Hofmann in A kernel method forthe two-sample-problem pp 513ndash520 2008

[14] J Wu S Pan X Zhu C Zhang and X Wu ldquoMulti-instanceLearning withDiscriminative BagMappingrdquo IEEE Transactionson Knowledge and Data Engineering pp 1-1

[15] J Wu S Pan X Zhu C Zhang and P S Yu ldquoMultipleStructure-View Learning for Graph Classificationrdquo IEEE Trans-actions on Neural Networks and Learning Systems vol PP no99 pp 1ndash16 2017

[16] T Malisiewicz A Gupta and A A Efros ldquoEnsemble ofexemplar-SVMs for object detection and beyondrdquo in Proceed-ings of the 2011 IEEE International Conference on ComputerVision ICCV 2011 pp 89ndash96 Spain November 2011

[17] B Zadrozny ldquoLearning and evaluating classifiers under sampleselection biasrdquo in Proceedings of the Twenty-first internationalconference p 114 Banff Alberta Canada July 2004

[18] W Li Z Xu D Xu D Dai and L Van Gool ldquoDomainGeneralization and Adaptation using Low Rank ExemplarSVMsrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence pp 1-1

[19] M Long J Wang G Ding S J Pan and P S Yu ldquoAdaptationregularizationA general framework for transfer learningrdquo IEEETransactions on Knowledge and Data Engineering vol 26 no 5pp 1076ndash1089 2014

[20] J Huang A J Smola A Gretton K M Borgwardt andB Scholkopf ldquoCorrecting sample selection bias by unlabeleddatardquo in Proceedings of the 20th Annual Conference on NeuralInformation Processing Systems NIPS 2006 pp 601ndash608 canDecember 2006

[21] M Sugiyama and M Kawanabe Machine Learning in Non-Stationary Environments The MIT Press 2012

[22] W Dai Q Yang G Xue and Y Yu ldquoBoosting for transferlearningrdquo in Proceedings of the 24th International Conference on

Complexity 13

Machine Learning (ICML rsquo07) pp 193ndash200NewYorkNYUSAJune 2007

[23] R Aljundi R Emonet D Muselet and M Sebban ldquoLand-marks-based kernelized subspace alignment for unsuperviseddomain adaptationrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition CVPR 2015 pp 56ndash63 USA June 2015

[24] B Tan Y Zhang S J Pan and Q Yang ldquoDistant domaintransfer learningrdquo in Proceedings of the 31st AAAI Conference onArtificial Intelligence AAAI 2017 pp 2604ndash2610 usa February2017

[25] W-S Chu F De La Torre and J F Cohn ldquoSelective transfermachine for personalized facial expression analysisrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol39 no 3 pp 529ndash545 2017

[26] R Aljundi J Lehaire F Prost-Boucle O Rouviere and C Lar-tizien ldquoTransfer learning for prostate cancer mapping based onmulticentricMR imaging databasesrdquo Lecture Notes in ComputerScience (including subseries LectureNotes inArtificial Intelligenceand Lecture Notes in Bioinformatics) Preface vol 9487 pp 74ndash82 2015

[27] M LongTransfer learning problems andmethods [PhD thesis]Tsinghua University problems and methods PhD thesis 2014

[28] B Gong Y Shi F Sha and K Grauman ldquoGeodesic flow kernelfor unsupervised domain adaptationrdquo inProceedings of the IEEEConference on Computer Vision and Pattern Recognition (CVPRrsquo12) pp 2066ndash2073 June 2012

[29] J Blitzer R McDonald and F Pereira ldquoDomain adaptationwith structural correspondence learningrdquo in Proceedings of theConference on Empirical Methods in Natural Language Process-ing (EMNLP rsquo06) pp 120ndash128 Association for ComputationalLinguistics July 2006

[30] M Long Y Cao J Wang and M I Jordan ldquoLearning transfer-able features with deep adaptation networksrdquo and M I JordanLearning transferable features with deep adaptation networkspages 97ndash105 pp 97ndash105 2015

[31] M Long J Wang and M I Jordan Deep transfer learning withjoint adaptation networks 2016

[32] J Yosinski J Clune Y Bengio andH Lipson ldquoHow transferableare features in deep neural networksrdquo in Proceedings of the 28thAnnual Conference on Neural Information Processing Systems2014 NIPS 2014 pp 3320ndash3328 can December 2014

[33] J Yang R Yan and A G Hauptmann ldquoCross-domain videoconcept detection using adaptive SVMsrdquo in Proceedings of the15th ACM International Conference on Multimedia (MM rsquo07)pp 188ndash197 September 2007

[34] S Li S Song and G Huang ldquoPrediction reweighting fordomain adaptionrdquo IEEE Transactions on Neural Networks andLearning Systems vol 28 no 7 pp 1682ndash1695 2017

[35] Z Xu W Li L Niu and D Xu ldquoExploiting low-rank structurefrom latent domains for domain generalizationrdquo Lecture Notesin Computer Science (including subseries Lecture Notes in Arti-ficial Intelligence and Lecture Notes in Bioinformatics) Prefacevol 8691 no 3 pp 628ndash643 2014

[36] L NiuW Li D Xu and J Cai ldquoAn Exemplar-BasedMulti-ViewDomain Generalization Framework for Visual RecognitionrdquoIEEE Transactions on Neural Networks and Learning Systems2016

[37] L NiuW Li and D Xu ldquoMulti-view domain generalization forvisual recognitionrdquo in Proceedings of the 15th IEEE InternationalConference on Computer Vision ICCV 2015 pp 4193ndash4201Chile December 2015

[38] T Kobayashi ldquoThree viewpoints toward exemplar SVMrdquo inProceedings of the IEEE Conference on Computer Vision andPatternRecognition CVPR2015 pp 2765ndash2773USA June 2015

[39] S J Pan J T Kwok and Q Yang ldquoTransfer learning via dimen-sionality reductionrdquo in In Proceedings of the AAAI Conferenceon Artificial Intelligence pp 677ndash682 2008

[40] S J Pan I W Tsang J T Kwok and Q Yang ldquoDomainadaptation via transfer component analysisrdquo IEEE TransactionsonNeural Networks and Learning Systems vol 22 no 2 pp 199ndash210 2011

[41] K Saenko B Kulis M Fritz and T Darrell ldquoAdapting visualcategory models to new domainsrdquo in Computer VisionmdashECCV2010 vol 6314 ofLectureNotes inComputer Science pp 213ndash226Springer Berlin Germany 2010

[42] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the 26th Annual Conference on Neural Information ProcessingSystems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe Nev USADecember 2012

[43] VNVapnik Statistical LearningTheory Adaptive and LearningSystems for Signal Processing Communications and ControlWiley- Interscience New York NY USA 1998

[44] B Fernando A Habrard M Sebban and T Tuytelaars ldquoUnsu-pervised visual domain adaptation using subspace alignmentrdquoin Proceedings of the 2013 14th IEEE International Conferenceon Computer Vision ICCV 2013 pp 2960ndash2967 AustraliaDecember 2013

[45] M Long J Wang G Ding J Sun and P S Yu ldquoTransfer jointmatching for unsupervised domain adaptationrdquo in Proceedingsof the 27th IEEE Conference on Computer Vision and PatternRecognition CVPR 2014 pp 1410ndash1417 USA June 2014

[46] C Chang and C Lin ldquoLIBSVM a Library for support vectormachinesrdquo ACM Transactions on Intelligent Systems and Tech-nology vol 2 no 3 article 27 2011

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom



Input: X_tr, X_te; parameters λ, μ, m, C_1, and C_2.
Output: optimal α and M.
(1) Initialize α = 0.
(2) Construct the kernel matrix K from X_tr and X_te based on (6), the coefficient matrix L based on (7), the centering matrix H, and the label matrix U based on (11).
(3) repeat
(4)   Update the transformation matrix M with α fixed:
(5)   eigendecompose $(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_{m})^{-1}\mathbf{K}\mathbf{H}\mathbf{K}$ and select the m leading eigenvectors to construct M.
(6)   Solve the convex optimization problem with M fixed to optimize α.
(7) until convergence

Algorithm 1: Domain Adaptation Exemplar Support Vector Machine.
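For concreteness, the alternating scheme of Algorithm 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' released code: build_kernel_and_matrices is a hypothetical helper that forms K, L, H, and U as in (6), (7), and (11), while update_M and solve_alpha_qp correspond to the two subproblems sketched later in this section.

import numpy as np

def daesvm_train(X_tr, X_te, lam=1.0, mu=1.0, m=40, C1=1.0, C2=0.01,
                 n_iter=10, tol=1e-4):
    """Alternating optimization of Algorithm 1 (illustrative sketch)."""
    K, L, H, U = build_kernel_and_matrices(X_tr, X_te)  # hypothetical helper
    alpha = np.zeros(K.shape[0])             # step (1): initialize alpha = 0
    M = None
    for _ in range(n_iter):                  # steps (3)-(7): repeat until convergence
        M = update_M(K, L, H, U, alpha, lam, mu, m)   # steps (4)-(5): eigendecomposition
        alpha_new = solve_alpha_qp(K, U, M, C1, C2)   # step (6): convex QP for fixed M
        if np.linalg.norm(alpha_new - alpha) < tol:   # simple convergence test
            alpha = alpha_new
            break
        alpha = alpha_new
    return alpha, M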

Minimizing over M. The optimization over M can be rewritten into the following form:

$$\min_{\mathbf{M}} \; \boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^{T}\mathbf{K}\mathbf{U}\boldsymbol{\alpha} - \mathbf{e}^{T}\boldsymbol{\alpha} + \lambda \operatorname{tr}\left(\mathbf{M}^{T}\mathbf{K}\mathbf{L}\mathbf{K}\mathbf{M}\right) + \mu \operatorname{tr}\left(\mathbf{M}^{T}\mathbf{M}\right)$$
$$\text{s.t.} \quad \mathbf{M}^{T}\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M} = \mathbf{I}_{m} \tag{13}$$

Similar to TCA, the formulation contains a nonconvex norm constraint, and we transform this optimization problem by reformulating it as

$$\max_{\mathbf{M}} \; \operatorname{tr}\left(\left(\mathbf{M}^{T}\left(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_{m}\right)\mathbf{M}\right)^{-1}\mathbf{M}^{T}\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M}\right) \tag{14}$$

Proof. The Lagrangian of (12) is

$$\mathcal{L}(\mathbf{M},\mathbf{Z}) = \boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^{T}\mathbf{K}\mathbf{U}\boldsymbol{\alpha} - \mathbf{e}^{T}\boldsymbol{\alpha} + \lambda\operatorname{tr}\left(\mathbf{M}^{T}\mathbf{K}\mathbf{L}\mathbf{K}\mathbf{M}\right) - \mu\operatorname{tr}\left(\mathbf{M}^{T}\mathbf{M}\right) - \operatorname{tr}\left(\left(\mathbf{M}^{T}\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M} - \mathbf{I}_{m}\right)\mathbf{Z}\right) \tag{15}$$

Because the kernel matrix K is symmetric, we can rewrite the first term of (15):

$$\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^{T}\mathbf{K}\mathbf{U}\boldsymbol{\alpha} = \operatorname{tr}\left(\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^{T}\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\right) = \operatorname{tr}\left[\left(\mathbf{M}^{T}\mathbf{K}^{T}\mathbf{U}\boldsymbol{\alpha}\right)^{T}\left(\mathbf{M}^{T}\mathbf{K}^{T}\mathbf{U}\boldsymbol{\alpha}\right)\right] = \operatorname{tr}\left[\left(\mathbf{M}^{T}\mathbf{K}^{T}\mathbf{U}\boldsymbol{\alpha}\right)\left(\mathbf{M}^{T}\mathbf{K}^{T}\mathbf{U}\boldsymbol{\alpha}\right)^{T}\right] = \operatorname{tr}\left(\mathbf{M}^{T}\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K}\mathbf{M}\right) \tag{16}$$

The M-dependent part of the Lagrangian can then be written as

$$\operatorname{tr}\left(\mathbf{M}^{T}\left(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_{m}\right)\mathbf{M}\right) - \operatorname{tr}\left(\left(\mathbf{M}^{T}\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M} - \mathbf{I}_{m}\right)\mathbf{Z}\right) \tag{17}$$

The derivative of (17) with respect to M is

$$\left(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_{m}\right)\mathbf{M} - \mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M}\mathbf{Z} \tag{18}$$

Setting the derivative above to zero, we get Z as

$$\mathbf{Z} = \left(\mathbf{M}^{T}\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M}\right)^{\dagger}\mathbf{M}^{T}\left(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_{m}\right)\mathbf{M} \tag{19}$$

Substituting Z into (17), we obtain

$$\min_{\mathbf{M}} \; \operatorname{tr}\left(\left(\mathbf{M}^{T}\mathbf{K}\mathbf{H}\mathbf{K}\mathbf{M}\right)^{\dagger}\mathbf{M}^{T}\left(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_{m}\right)\mathbf{M}\right) \tag{20}$$

Finally, we obtain the equivalent maximization problem (14).

Similar to TCA, the solution is found by taking the m leading eigenvectors of $(\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\boldsymbol{\alpha}^{T}\mathbf{U}\mathbf{K} + \lambda\mathbf{K}\mathbf{L}\mathbf{K} - \mu\mathbf{I}_{m})^{-1}\mathbf{K}\mathbf{H}\mathbf{K}$.
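A minimal sketch of this M-update follows, assuming K, L, H, U, and α are already formed as dense NumPy arrays. It solves the equivalent generalized eigenproblem with SciPy; treating the paper's identity term as an n × n identity is our assumption (the text writes I_m), made so the dimensions agree with KLK.

import numpy as np
from scipy.linalg import eig

def update_M(K, L, H, U, alpha, lam, mu, m):
    """M-step: m leading eigenvectors of (K U a a^T U K + lam K L K - mu I)^{-1} K H K."""
    n = K.shape[0]
    v = K @ (U @ alpha)                       # the vector K U alpha
    A = np.outer(v, v) + lam * K @ L @ K - mu * np.eye(n)   # identity assumed n x n
    B = K @ H @ K
    # A^{-1} B w = gamma w is equivalent to the generalized problem B w = gamma A w
    vals, vecs = eig(B, A)
    order = np.argsort(-vals.real)            # sort eigenvalues in decreasing order
    return vecs[:, order[:m]].real            # n x m transformation matrix M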

Minimizing over α. The optimization over α can be rewritten into the following QP form:


Input: y_S, α, X_te; parameter P.
Output: prediction labels y.
(1) Compute the weights w of the classifiers.
(2) Construct the weight matrix W and the biases b of the predictors based on α.
(3) repeat
(4)   Compute the scores of each classifier in the current category.
(5)   Find the top P scores.
(6)   Compute the sum of these top scores.
(7) until all categories have been processed
(8) Choose the category with the maximum score as the prediction label y.

Algorithm 2: Ensemble Domain Adaptation Exemplar Classifiers.

$$\min_{\boldsymbol{\alpha}} \; \boldsymbol{\alpha}^{T}\widetilde{\mathbf{K}}\boldsymbol{\alpha} - \mathbf{e}^{T}\boldsymbol{\alpha}$$
$$\text{s.t.} \quad \alpha_{0} - \sum_{i=1}^{n_{S}^{-}+n_{T}} \alpha_{i} = 0, \quad 0 \le \alpha_{0} \le C_{1}, \quad 0 \le \alpha_{i} \le C_{2} \;\; \forall i \ge 1 \tag{21}$$

Here $\widetilde{\mathbf{K}} = \mathbf{U}\mathbf{K}\mathbf{M}\mathbf{M}^{T}\mathbf{K}\mathbf{U}$ is the kernel matrix after transformation by the matrix M. This is clearly a QP problem, and it can be solved efficiently using interior point methods or other successive optimization procedures such as the Alternating Direction Method of Multipliers (ADMM).
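A sketch of this α-subproblem using the CVXPY modeling library is given below. The quadratic term is written as a sum of squares, $\boldsymbol{\alpha}^{T}\widetilde{\mathbf{K}}\boldsymbol{\alpha} = \|\mathbf{M}^{T}\mathbf{K}\mathbf{U}\boldsymbol{\alpha}\|^{2}$, so the objective is convex by construction; the solver choice is illustrative, not prescribed by the paper.

import cvxpy as cp

def solve_alpha_qp(K, U, M, C1, C2):
    """alpha-step: the QP in (21) with Ktilde = U K M M^T K U."""
    n = K.shape[0]
    G = M.T @ K @ U                              # alpha^T Ktilde alpha = ||G alpha||^2
    a = cp.Variable(n)
    objective = cp.Minimize(cp.sum_squares(G @ a) - cp.sum(a))
    constraints = [a[0] - cp.sum(a[1:]) == 0,    # equality constraint of (21)
                   a[0] >= 0, a[0] <= C1,        # box constraint on the positive exemplar
                   a[1:] >= 0, a[1:] <= C2]      # box constraints on the negatives
    cp.Problem(objective, constraints).solve()
    return a.value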

5. Ensemble Domain Adaptation Exemplar Classifiers

In this section we introduce the method of integrating exemplar classifiers. As mentioned before, we obtain as many classifiers as there are source domain instances, and this section aims to predict labels for target domain instances. In our view, the classification hyperplane of an exemplar classifier is a representation of one positive source domain instance. However, most of the hyperplanes contain information from various aspects of the samples, such as differing image backgrounds or sources. In fact, we aim to find the exemplar classifiers trained on instances similar to the testing sample. Thus, we use an integration method to filter out classifiers that encode details different from the testing sample. Another view of the integration method is that it slacks part of the hyperplanes; namely, it removes exemplar classifiers trained on instances with large distribution mismatch.

In our method, we first construct the classifiers from the Lagrange multipliers α. The classifier weight is constructed as

$$\mathbf{w} = \alpha_{0}\mathbf{x}^{+} - \sum_{i=1}^{n_{S}^{-}+n_{T}} \alpha_{i}\mathbf{x}_{i}^{-} \tag{22}$$

where w is the weight of the classifier, and the bias is

$$b = y_{j} - \alpha_{0}\mathbf{K}_{0j} - \sum_{i=1}^{n_{S}^{-}+n_{T}} y_{i}\alpha_{i}\mathbf{K}_{ij} \tag{23}$$

where b is the bias of the classifier. The classifier score is then given by

$$s = \mathbf{w}^{\top}\mathbf{x} + b \tag{24}$$

We then compute the score of every classifier on the testing instance. Second, we find the top P scores among each class's classifiers and compute the sum of those scores. Finally, we obtain a score for each class, and the class with the highest score is the predicted category. The prediction method is described in Algorithm 2.
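The top-P scoring of Algorithm 2 can be sketched as follows, assuming scores is an array of s = w^T x + b values for one test image (one entry per exemplar classifier) and labels gives each exemplar classifier's category; the array names are illustrative.

import numpy as np

def predict_label(scores, labels, P=5):
    """Top-P ensemble vote: sum the P best exemplar scores per category."""
    classes = np.unique(labels)
    class_scores = []
    for c in classes:
        s = np.sort(scores[labels == c])[::-1]   # this category's exemplar scores, descending
        class_scores.append(s[:P].sum())         # sum of the top-P scores
    return classes[int(np.argmax(class_scores))] # highest-scoring category wins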

6. Experiments

In this section we conduct experiments on the four domains Amazon, DSLR, Caltech, and Webcam to evaluate the performance of the proposed Domain Adaptation Exemplar Support Vector Machines. We first compare our method to baselines and other domain adaptation methods. Next, we analyze the effectiveness of our approach. Finally, we discuss parameter sensitivity.

6.1. Data Preparation. We run the experiments on the Office and Office+Caltech datasets. The Office dataset contains three domains, Amazon, Webcam, and DSLR; each includes images from amazon.com or office environment images taken with varying lighting and pose changes using a webcam or a DSLR camera. The Office+Caltech dataset contains the ten categories shared between the Office dataset and the Caltech-256 dataset. Following the standard transfer learning experimental method, we merge the two datasets, so that they include four domains in total, Amazon, DSLR, Caltech, and Webcam, as studied in [41]. The Amazon dataset consists of images downloaded from Amazon merchants. The images in Webcam also come from online web pages, but they are of low quality, as they were taken by a web camera. The DSLR domain was photographed with a digital SLR camera, so its images are of high quality. Caltech is often included in domain adaptation experiments and was collected for object detection tasks. Each domain has its own characteristics. Compared to the other domains, the image quality of DSLR is higher, and influencing factors such as background clutter are weaker than in images downloaded from the web. Amazon and Webcam come from the web, and their images are of lower quality and greater complexity, although with different details: instances in Webcam show the object alone, while the composition of samples in Amazon is more complex, including background and other goods. Figure 1 shows examples of the backpack category from the four domains. From the transfer learning point of view, the datasets come from different domains with different marginal probabilities over images. In our model, we aim to solve this problem and obtain a robust classifier across domains.

We chose ten common categories among all four datasets: backpack, bike, bike helmet, bookcase, bottle, calculator, desk chair, desk lamp, desktop computer, and file cabinet. There are 8 to 151 samples per category in a domain: 958 images in Amazon, 295 in Webcam, 157 in DSLR, and 1123 in Caltech, for 2533 images in total. Figure 1 shows examples from the datasets.

We use both SURF and DeCAF feature extraction in the experiments. First, we use SURF features to encode the images into 800-bin histograms. Next, we use DeCAF features, extracted from layer 7 of AlexNet [42], as 4096-bin histograms. Finally, we normalize the histograms and then z-score them to have zero mean and unit standard deviation in each dimension.
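A minimal sketch of this preprocessing step is shown below, assuming features is an (n_samples, n_bins) array of SURF or DeCAF histograms; the text does not specify the histogram normalization, so the L1 normalization here is an assumption.

import numpy as np

def preprocess(features):
    """Normalize each histogram (L1, assumed), then z-score each dimension."""
    X = features / np.maximum(features.sum(axis=1, keepdims=True), 1e-12)
    mean, std = X.mean(axis=0), X.std(axis=0) + 1e-12
    return (X - mean) / std   # zero mean, unit standard deviation per dimension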

We run our experiments in the standard way for visual domain adaptation: one of the four datasets is used as the source domain and another as the target domain. Each dataset provides the same ten categories and uses the same representation of images, which is considered the problem of homogeneous domain adaptation. For example, we choose images from the DSLR set (denoted by D) as source domain data and images in Amazon (denoted by A) as target domain data; this problem is denoted as D → A. In this way, we can compose 12 domain adaptation subproblems from the four domains.
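The 12 subproblems are simply the ordered pairs of distinct domains, which can be enumerated as follows:

from itertools import permutations

domains = ["A", "C", "D", "W"]   # Amazon, Caltech, DSLR, Webcam
tasks = [f"{s} -> {t}" for s, t in permutations(domains, 2)]
print(len(tasks), tasks)          # 12 tasks: 'A -> C', 'A -> D', ...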

6.2. Experiment Setup

(1) Baseline Methods. We compare our DAESVM method with three kinds of classical approaches: classifiers without transfer learning regularization, conventional transfer learning methods, and the foundation model, the low-rank exemplar support vector machine. The methods are listed as follows:

(1) Transfer Component Analysis (TCA) [40]
(2) Support Vector Machine (SVM) [43]
(3) Geodesic Flow Kernel (GFK) [28]
(4) Landmarks Selection-based Subspace Alignment (LSSA) [23]
(5) Kernel Mean Matching (KMM) [20]
(6) Subspace Alignment (SA) [44]
(7) Transfer Joint Matching (TJM) [45]
(8) Low-Rank Exemplar-SVMs (LRESVMs) [18]

TCA, GFK, and KMM are classical transfer learning methods, and we compare our model against them. Besides, we show that our method is more robust than models without domain adaptation terms in the transfer learning setting. TCA is the foundation of our model; it is similar to GFK and SA, which are based on the idea of feature transfer. KMM transfers knowledge by instance reweighting. TJM is a popular model for unsupervised domain adaptation. SA and LSSA are models that use landmarks to transfer knowledge.

(2) Implementation Details. For the baseline method, SVM is trained on the source data and tested on the target data [46]. TCA, SA, LSSA, TJM, and GFK are first treated as dimension reduction processes; a classifier is then trained on the source data and used to make predictions for the target domain [19]. Similarly, KMM first computes the weight of each instance and then trains the predictor on the reweighted source data.

Under the assumption of unsupervised domain adaptation, it is impossible to tune the optimal parameters for the target domain task by cross validation, since there exists distribution mismatch between domains. Therefore, in the experiments we adopt a grid search strategy to obtain the best parameters and report the best results. Our method involves five tunable parameters: the tradeoffs C_1 and C_2 in the ESVM, the tradeoffs λ and μ in the regularization terms, and the dimension reduction parameter m. The tradeoff parameters C_1 and C_2 are selected over {10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}}. We fix λ = 1, μ = 1, and m = 40 empirically and select the radial basis function (RBF) as the kernel function. In fact, our model is relatively stable under a wide range of parameter values. We train a classifier for every positive instance in the source domain data, and then we map their outputs into a probability distribution. We handle multiclass classification in a one-versus-the-rest way. To measure the performance of our method, we use the average accuracy and the standard deviation over ten repetitions. The average testing accuracies and standard errors for all 12 tasks are reported in Table 1. Most of the remaining baseline results are cited from previously published papers.
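A sketch of the described protocol follows: grid search over C_1 and C_2 with λ = 1, μ = 1, and m = 40 fixed, using an RBF kernel. Here train_and_eval is a hypothetical wrapper around Algorithm 1 plus the ensemble prediction of Algorithm 2; only the grid and kernel follow the text.

import numpy as np
from itertools import product

grid = [10.0 ** p for p in range(-3, 4)]   # {1e-3, ..., 1e3}

def rbf_kernel(X, Y, gamma=1.0):
    """K(x, y) = exp(-gamma ||x - y||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def grid_search(X_src, y_src, X_tgt, y_tgt):
    best = (-np.inf, None)
    for C1, C2 in product(grid, grid):
        acc = train_and_eval(X_src, y_src, X_tgt, y_tgt,   # hypothetical wrapper
                             lam=1.0, mu=1.0, m=40, C1=C1, C2=C2)
        best = max(best, (acc, (C1, C2)))
    return best   # (best accuracy, best (C1, C2))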

6.3. Experiment Results. In this section we compare our DAESVM with the baseline methods regarding classification accuracy.

Table 1 summarizes the classification accuracies obtained over all 10 categories on the 12 tasks generated from the 4 domains. The highest accuracy per task is in bold font, indicating the best-performing method on that task. First, we implement the traditional classifiers without domain adaptation terms; that is, we train the predictors on the source domain data and make predictions for the target domain dataset. Second, we compare our DAESVM with unsupervised domain adaptation methods such as TCA and GFK, implemented with the same dimension reduction parameter m as in our model. Finally, we also compare DAESVM with recent transfer learning models such as the low-rank ESVMs [18].


Figure 1: Example images from the backpack category in Amazon and DSLR ((a), from left to right) and Webcam and Caltech-256 ((b), from left to right). The images from different domains vary in style, background, and source.

Table 1: Classification accuracies of different methods for different domain adaptation tasks. We conduct the experiments with conventional transfer learning methods. Compared with traditional methods, DAESVMs gain a large improvement in prediction accuracy, and they also improve on the recently proposed LRESVM approach [average ± standard error of accuracy (%)].

Task     SVM   KMM   TCA   TJM   SA    GFK   LSSA  LRESVM  DAESVMs
A → C    45.4  42.2  45.3  56.9  51.8  49.6  54.8  79.8    77.5 ± 0.79
A → D    50.7  42.7  60.3  56.4  56.4  55.7  57.3  74.9    76.8 ± 0.76
A → W    47.4  42.4  61.3  51.0  54.7  56.9  56.7  75.4    73.2 ± 1.08
C → A    50.7  48.3  54.7  58.6  57.1  51.2  58.4  77.2    80.2 ± 0.39
C → D    53.2  53.5  56.4  57.4  59.0  57.1  59.1  87.1    89.0 ± 0.23
C → W    44.2  45.8  50.4  58.8  62.7  57.1  58.1  74.1    74.7 ± 0.38
D → A    40.8  42.2  53.8  46.1  58.9  59.2  58.4  80.4    83.4 ± 1.41
D → C    48.3  41.6  43.9  49.6  54.3  59.4  57.7  79.0    73.0 ± 1.04
D → W    67.8  72.9  82.4  82.0  83.4  80.2  87.1  91.0    92.1 ± 0.25
W → A    42.4  41.9  53.0  50.8  57.0  66.2  59.7  74.3    77.8 ± 0.33
W → C    41.2  39.0  53.7  54.8  34.7  52.4  54.2  70.6    66.5 ± 0.54
W → D    80.2  82.0  87.9  83.4  78.9  81.2  87.2  89.2    91.8 ± 0.59
Average  51.0  49.5  58.6  58.8  59.1  60.5  62.4  79.4    80.0 ± 0.67

Table 2: We also conduct experiments on multidomain tasks and gain an improvement over previously proposed methods. The experiments adopt the same strategy as single domain adaptation: we treat the multidomain side as one source or target to find the shared features in a latent space. However, the complexity of the multidomain shared features limits the accuracy of the tasks [average ± standard error of accuracy (%)].

Task      SVM   KMM   TCA   TJM   SA    GFK   LSSA  LRESVM  DAESVM
DW → A    45.7  37.4  40.5  57.1  59.4  47.3  61.7  80.1    77.2 ± 1.27
AD → CW   37.1  31.6  43.0  60.2  48.7  47.6  74.2  86.9    84.7 ± 0.65
D → ACW   41.4  43.8  57.2  63.9  51.9  51.4  77.0  82.9    88.4 ± 0.21
ADW → C   43.9  50.6  54.9  69.0  60.2  60.4  63.7  87.7    90.1 ± 0.34
AD → W    71.0  61.0  54.0  61.3  54.0  47.0  71.9  80.8    83.8 ± 0.78
AC → DW   81.4  53.9  77.4  71.8  57.4  64.1  80.7  89.3    92.4 ± 0.25
Average   53.4  46.4  54.5  63.9  55.2  53.0  71.5  84.6    86.1 ± 0.58

Overall, in the usual transfer learning way, we run the datasets across different pairs of source and target domains. The accuracy of DAESVM for the adaptation from DSLR to Webcam reaches 92.1%, an improvement of about 1.2% over LRESVM. Compared with TCA, DAESVMs account for the distribution mismatch among instances as well as among domains. The adaptation from Webcam to DSLR attains an accuracy of 91.8%. For the domain datasets Amazon and Caltech, which are larger than DSLR and Webcam, DAESVM achieves an accuracy of 77.5%, improving by about 36.2% over TJM. Regarding the ability to transfer knowledge from a large dataset to a small one, from Amazon to DSLR we get an accuracy of 76.8%; conversely, from DSLR to Amazon, the prediction accuracy is 83.4%. Generally speaking, our DAESVM trained on one domain performs well and also behaves robustly on multidomain tasks.

We also conduct multidomain adaptation tasks, which use one or more domains as source domain data and adapt to the remaining domains. The results are shown in Table 2. The accuracy of DAESVM for the adaptation from Amazon, DSLR, and Webcam to Caltech achieves 90.1%, an improvement over LRESVM. The task of adaptation from Amazon and Caltech to Webcam and DSLR reaches an accuracy of 92.4%. The experiments show that our models are effective not only for single domain adaptation but also for multidomain adaptation.

Two key factors may contribute to the superiority of our method. First, the feature transfer regularization term is used to slack the similarity assumption: it only assumes that there are some shared features across domains, instead of assuming that different domains are similar to each other. This makes the model more robust than models with a reweighting term. The second factor is the exemplar-SVMs, which were proposed with a transfer learning motivation and account for the distribution mismatch between individual instances. Our model combines these two factors to resist the distribution mismatch among domains and the sample selection bias among instances.

6.4. Pseudo Label Effectiveness. Following [19], we use pseudo labels to supplement model training. In our experiments, we test how the prediction results are influenced by the accuracy of the pseudo labels. As shown in Figure 2, the prediction accuracy improves as the accuracy of the pseudo labels increases. This shows that the pseudo label method is effective, and we can iterate by using the labels predicted by the DAESVM as the next round's pseudo labels, as sketched below. The iteration step can efficiently enhance the performance of the classifiers.
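The iteration described above amounts to a simple self-training loop. This is an illustrative sketch only: train_daesvm and predict_all are hypothetical wrappers around Algorithm 1 and Algorithm 2, and the fixed round count is our assumption.

def pseudo_label_iteration(X_src, y_src, X_tgt, n_rounds=3):
    """Self-training: feed DAESVM predictions back in as pseudo labels."""
    pseudo = None
    model = None
    for _ in range(n_rounds):
        model = train_daesvm(X_src, y_src, X_tgt, pseudo)  # hypothetical wrapper around Algorithm 1
        pseudo = predict_all(model, X_tgt)                 # predictions become next round's pseudo labels
    return model, pseudo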

6.5. Parameter Sensitivity. There are five parameters in our model. We conduct a parameter sensitivity analysis, which shows that near-optimal performance is achieved under a wide range of parameter values, and discuss the results.

Figure 2: The accuracy of DAESVMs improves with the accuracy of the pseudo labels (prediction accuracy plotted against pseudo label accuracy). The results verify the effectiveness of the pseudo label method.

(1) Tradeoff λ. λ is a tradeoff controlling the weight of the MMD term, which aims to minimize the distribution mismatch between the source and target domains. Theoretically, we want this term to equal zero. However, if we set this parameter to infinity (λ → ∞), the data may lose its original properties when the source and target domain data are transformed into the high-dimensional space. Conversely, if we set λ to zero, the model loses its ability to correct the distribution mismatch.

(2) Tradeoff μ. μ is a tradeoff controlling the weight of the data variance term, which aims to preserve data properties. Theoretically, we want this term to equal zero. However, if we set this parameter to infinity (μ → ∞), it may augment the data distribution mismatch among different domains; namely, the transformation matrix M cannot utilize the source data to assist the target task. Conversely, if we set μ to zero, the model cannot preserve the properties of the original data.

(3) Dimension Reduction m. m is the dimension of the transformation matrix, namely, the dimension of the subspace into which we map the samples. If m is too small, the properties of the data may be lost, which may cause the classifier to fail. If m is too large, the effectiveness of correcting the distribution mismatch may be lost. We examine the classification results as m varies; the results are displayed in Figure 3.

(4) Tradeoffs C_1 and C_2 in ESVM. Parameters C_1 and C_2 are the upper bounds of the Lagrangian variables. In the standard SVM, positive and negative instances share the same value of these two parameters. In our models, we expect the weights of the positive samples to be higher than those of the negative samples. In our experiments, setting C_1 to one hundred times C_2 yields a high-performance predictor. A visual analysis of these two parameters is given in Figure 4.

Figure 3: Prediction accuracy as a function of the dimension m (from 20 to 100). When the dimension is 20 or 40, the prediction accuracy is higher than for other values.

Figure 4: Prediction accuracy over the (C_1, C_2) grid. We fix λ = 1, m = 20, and μ = 1 in these experiments; C_1 is searched in {0.1, 0.5, 1, 5, 10, 50, 100} and C_2 is searched in {0.001, 0.005, 0.01, 0.1, 0.5, 1, 10}.

7. Conclusion

In this paper, we have proposed an effective method for domain adaptation problems with regularization terms that reduce the data distribution mismatch between domains and preserve the properties of the original data. Furthermore, the method of integrating classifiers can predict target domain data with high accuracy. The proposed method mainly aims to solve problems in which domain or instance distribution mismatch occurs. Meanwhile, we extend DAESVMs to multiple source or target domains. Experiments conducted on the transfer learning datasets transfer knowledge from image to image.


Our future work is as follows. First, we will integrate the training process of all the classifiers in an ensemble way; it would be better to accelerate training by rewriting all the weights in matrix form, a strategy that can omit the matrix inversion in the optimization. Second, we want to impose a constraint on α that preserves sparsity. Finally, we will extend DAESVMs to the problem of transferring knowledge among domains that have few relationships, such as transferring knowledge from image to video or text.

Notations and Descriptions

D_S, D_T: Source/target domain
T_S, T_T: Source/target task
d: Dimension of the features
X_S, X_T: Source/target sample matrix
y_S, y_T: Source/target sample label matrix
K: Kernel matrix without label information
α: Lagrange multipliers vector
n_S, n_T: The numbers of source/target domain instances
e: Vector of ones
I: Identity matrix

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work has been partially supported by grants from the National Natural Science Foundation of China (nos. 61472390, 71731009, 91546201, and 11771038) and the Beijing Natural Science Foundation (no. 1162005).

References

[1] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
[2] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning, pp. 160–167, ACM, July 2008.
[3] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 580–587, Columbus, Ohio, USA, June 2014.
[4] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[5] W.-S. Chu, F. D. L. Torre, and J. F. Cohn, "Selective transfer machine for personalized facial action unit detection," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), pp. 3515–3522, USA, June 2013.
[6] A. Kumar, A. Saha, and H. Daume, "Co-regularization based semi-supervised domain adaptation," in Advances in Neural Information Processing Systems 23, pp. 478–486, 2010.
[7] M. Xiao and Y. Guo, "Feature space independent semi-supervised domain adaptation via kernel matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 1, pp. 54–66, 2015.
[8] S. J. Pan, J. T. Kwok, Q. Yang, and J. J. Pan, "Adaptive localization in a dynamic WiFi environment through multi-view learning," in Proceedings of the 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference (AAAI-07/IAAI-07), pp. 1108–1113, Canada, July 2007.
[9] A. Van Engelen, A. C. Van Dijk, M. T. B. Truijman et al., "Multi-center MRI carotid plaque component segmentation using feature normalization and transfer learning," IEEE Transactions on Medical Imaging, vol. 34, no. 6, pp. 1294–1305, 2015.
[10] Y. Zhang, J. Wu, Z. Cai, P. Zhang, and L. Chen, "Memetic extreme learning machine," Pattern Recognition, vol. 58, pp. 135–148, 2016.
[11] M. Uzair and A. Mian, "Blind domain adaptation with augmented extreme learning machine features," IEEE Transactions on Cybernetics, vol. 47, no. 3, pp. 651–660, 2017.
[12] L. Zhang and D. Zhang, "Domain adaptation extreme learning machines for drift compensation in E-nose systems," IEEE Transactions on Instrumentation and Measurement, vol. 64, no. 7, pp. 1790–1801, 2015.
[13] B. Schölkopf, J. Platt, and T. Hofmann, "A kernel method for the two-sample-problem," pp. 513–520, 2008.
[14] J. Wu, S. Pan, X. Zhu, C. Zhang, and X. Wu, "Multi-instance learning with discriminative bag mapping," IEEE Transactions on Knowledge and Data Engineering, pp. 1-1.
[15] J. Wu, S. Pan, X. Zhu, C. Zhang, and P. S. Yu, "Multiple structure-view learning for graph classification," IEEE Transactions on Neural Networks and Learning Systems, vol. PP, no. 99, pp. 1–16, 2017.
[16] T. Malisiewicz, A. Gupta, and A. A. Efros, "Ensemble of exemplar-SVMs for object detection and beyond," in Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV 2011), pp. 89–96, Spain, November 2011.
[17] B. Zadrozny, "Learning and evaluating classifiers under sample selection bias," in Proceedings of the Twenty-First International Conference on Machine Learning, p. 114, Banff, Alberta, Canada, July 2004.
[18] W. Li, Z. Xu, D. Xu, D. Dai, and L. Van Gool, "Domain generalization and adaptation using low rank exemplar SVMs," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1-1.
[19] M. Long, J. Wang, G. Ding, S. J. Pan, and P. S. Yu, "Adaptation regularization: a general framework for transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 5, pp. 1076–1089, 2014.
[20] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf, "Correcting sample selection bias by unlabeled data," in Proceedings of the 20th Annual Conference on Neural Information Processing Systems (NIPS 2006), pp. 601–608, Canada, December 2006.
[21] M. Sugiyama and M. Kawanabe, Machine Learning in Non-Stationary Environments, The MIT Press, 2012.
[22] W. Dai, Q. Yang, G. Xue, and Y. Yu, "Boosting for transfer learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 193–200, New York, NY, USA, June 2007.
[23] R. Aljundi, R. Emonet, D. Muselet, and M. Sebban, "Landmarks-based kernelized subspace alignment for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 56–63, USA, June 2015.
[24] B. Tan, Y. Zhang, S. J. Pan, and Q. Yang, "Distant domain transfer learning," in Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), pp. 2604–2610, USA, February 2017.
[25] W.-S. Chu, F. De La Torre, and J. F. Cohn, "Selective transfer machine for personalized facial expression analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 3, pp. 529–545, 2017.
[26] R. Aljundi, J. Lehaire, F. Prost-Boucle, O. Rouviere, and C. Lartizien, "Transfer learning for prostate cancer mapping based on multicentric MR imaging databases," Lecture Notes in Computer Science, vol. 9487, pp. 74–82, 2015.
[27] M. Long, Transfer Learning: Problems and Methods, Ph.D. thesis, Tsinghua University, 2014.
[28] B. Gong, Y. Shi, F. Sha, and K. Grauman, "Geodesic flow kernel for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2066–2073, June 2012.
[29] J. Blitzer, R. McDonald, and F. Pereira, "Domain adaptation with structural correspondence learning," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '06), pp. 120–128, Association for Computational Linguistics, July 2006.
[30] M. Long, Y. Cao, J. Wang, and M. I. Jordan, "Learning transferable features with deep adaptation networks," pp. 97–105, 2015.
[31] M. Long, J. Wang, and M. I. Jordan, Deep Transfer Learning with Joint Adaptation Networks, 2016.
[32] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS 2014), pp. 3320–3328, Canada, December 2014.
[33] J. Yang, R. Yan, and A. G. Hauptmann, "Cross-domain video concept detection using adaptive SVMs," in Proceedings of the 15th ACM International Conference on Multimedia (MM '07), pp. 188–197, September 2007.
[34] S. Li, S. Song, and G. Huang, "Prediction reweighting for domain adaption," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 7, pp. 1682–1695, 2017.
[35] Z. Xu, W. Li, L. Niu, and D. Xu, "Exploiting low-rank structure from latent domains for domain generalization," Lecture Notes in Computer Science, vol. 8691, no. 3, pp. 628–643, 2014.
[36] L. Niu, W. Li, D. Xu, and J. Cai, "An exemplar-based multi-view domain generalization framework for visual recognition," IEEE Transactions on Neural Networks and Learning Systems, 2016.
[37] L. Niu, W. Li, and D. Xu, "Multi-view domain generalization for visual recognition," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV 2015), pp. 4193–4201, Chile, December 2015.
[38] T. Kobayashi, "Three viewpoints toward exemplar SVM," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 2765–2773, USA, June 2015.
[39] S. J. Pan, J. T. Kwok, and Q. Yang, "Transfer learning via dimensionality reduction," in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 677–682, 2008.
[40] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, "Domain adaptation via transfer component analysis," IEEE Transactions on Neural Networks and Learning Systems, vol. 22, no. 2, pp. 199–210, 2011.
[41] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting visual category models to new domains," in Computer Vision—ECCV 2010, vol. 6314 of Lecture Notes in Computer Science, pp. 213–226, Springer, Berlin, Germany, 2010.
[42] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS '12), pp. 1097–1105, Lake Tahoe, Nev, USA, December 2012.
[43] V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, Wiley-Interscience, New York, NY, USA, 1998.
[44] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, "Unsupervised visual domain adaptation using subspace alignment," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV 2013), pp. 2960–2967, Australia, December 2013.
[45] M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, "Transfer joint matching for unsupervised domain adaptation," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), pp. 1410–1417, USA, June 2014.
[46] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.


Page 7: Unsupervised Domain Adaptation Using Exemplar-SVMs with ...downloads.hindawi.com/journals/complexity/2018/8425821.pdf · ResearchArticle Unsupervised Domain Adaptation Using Exemplar-SVMs

Complexity 7

Input y푆 120572 X푡푒 parameterPOutput prediction labels y(1) Compute the weights w of the classifiers(2) Construct weight matrixW and bias b of predictors

based on 120572(3) repeat(4) Compute scores of each classifier in this category(5) Find topP scores(6) Compute the sum of these top scores(7) untilThe number of categories(8) Choose the max score owned category as the prediction

label y

Algorithm 2 Ensemble Domain Adaptation Exemplar Classifiers

min120572

120572푇K120572 minus e푇120572

st 1205720 minus 푛minus119878+푛119879sum푖=1

120572푖 = 00 le 1205720 le 11986210 le 120572푖 le 1198622forall119894 ge 1

(21)

K = UKMM푇KU which represents the kernel matrix hasbeen transformed by transformation matrix M It is obviousthat this problem is a QP problem and it could be solvedefficiently using interior point methods or other succes-sive optimization procedures such as Alternating DirectionMethod of Multipliers (ADMM)

5 Ensemble Domain AdaptationExemplar Classifiers

In this section we introduce the method of integrationexemplar classifiers As mentioned before we get the numberof source domain instances classifiers and this section aimsto predict labels for target domain instances In our opinionsthe classification hyperplane of an exemplar classifier is rep-resentation for a source domain positive instance Howevermost of the hyperplanes contain information which comesfrom various samples such as images of different backgroundor source In fact we aim to search the exemplar classifierswhich are from instances similar to the testing sample Thuswe utilize integrating method to filter out classifiers whichinclude details different with the testing sample Anotherview for the integration method is that it slacks the part ofhyperplanes Namely it removes some exemplar classifierswhich are trained by large instances distribution mismatch

In our method we first construct the classifiers fromLagrange multipliers 120572 The classifier construction equationis

w = 1205720x+ minus 푛minus119878+푛119879sum푖=1

120572푖xminus푖 (22)

where w is the weight of classifier

119887 = 119910푗 minus 1205720K0푗 minus 푛minus119878+푛119879sum푖=1

119910푖120572푖K푖푗 (23)

where 119887 is the bias of classifier The classifier is given by119904 = 120572⊤x + 119887 (24)

And then we compute the scores by every classifier andthe testing instance Second we find the top P numbers ofscores for each class classifier and compute the sum of thosescores At last we get a score for each class and the highestscore is the category that we predict The prediction methodis described in Algorithm 2

6 Experiments

In this section we conduct experiments onto the fourdomains Amazon DSLR Caltech and Webcam to evaluatethe performance of proposed Domain Adaptation ExemplarSupport Vector Machines We first compare our methodto baselines and other domain adaptation methods Nextwe analyze the effectiveness of our approach At last weintroduce the problem of parameter sensitivity

6.1. Data Preparation. We run the experiments on the Office and Office+Caltech datasets. The Office dataset contains three domains: Amazon, Webcam, and DSLR. They include images from amazon.com and office-environment images taken, with varying lighting and pose changes, using a webcam or a DSLR camera. The Office+Caltech dataset contains the ten categories that overlap between the Office dataset and the Caltech-256 dataset. Following the standard transfer learning experimental protocol, we merge the two datasets, which yields four domains in total, Amazon, DSLR, Caltech, and Webcam, as studied in [41]. The Amazon domain consists of images downloaded from Amazon merchants. The images in Webcam also come from online web pages, but they are of low quality, as they are taken with a web camera. The DSLR domain is photographed with a digital SLR camera, so its images are of high quality. Caltech is often included in domain adaptation experiments and was collected for object detection tasks. Each domain has its own characteristics. Compared to the other domains, the quality of images in DSLR is higher, and influence factors such as background clutter are weaker than in images downloaded from the web. Amazon and Webcam come from the web, and images in these domains are of lower quality and greater complexity. However, there are some differences between them: instances in Webcam show the object alone, whereas the composition of samples in Amazon is more complex, including background and other goods. Figure 1 shows example backpack images from the four domains. From the transfer learning point of view, the datasets come from different domains and have different marginal probabilities over the images. Our model aims to solve this problem and obtain a robust cross-domain classifier.

We chose ten common categories among all four datasets: backpack, bike, bike helmet, bookcase, bottle, calculator, desk chair, desk lamp, desktop computer, and file cabinet. There are 8 to 151 samples per category in each domain: 958 images in Amazon, 295 in Webcam, 157 in DSLR, and 1123 in Caltech, 2533 images in total. Figure 1 shows examples from the datasets.

We use both SURF and DeCAF feature extraction in the experiments. First, SURF features encode the images into 800-bin histograms. Next, DeCAF features, extracted from the 7th layer of AlexNet [42], yield 4096-dimensional representations. Finally, we normalize the histograms and then z-score them to have zero mean and unit standard deviation in each dimension.
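A minimal sketch of this preprocessing, assuming per-image histogram normalization followed by per-dimension z-scoring:

```python
import numpy as np

def normalize_features(X):
    """X: (n_samples, n_bins) SURF or DeCAF feature histograms."""
    X = X / (X.sum(axis=1, keepdims=True) + 1e-12)   # normalize each histogram
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / (sigma + 1e-12)                # zero mean, unit std per dim
```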

We run our experiments in the standard way for visual domain adaptation: one of the four datasets is used as the source domain and another as the target domain. Each dataset provides the same ten categories and uses the same image representation, so this is a homogeneous domain adaptation problem. For example, we choose images taken by DSLR (denoted by D) as source domain data and images in Amazon (denoted by A) as target domain data; this problem is denoted D → A. In this way, we can compose 12 domain adaptation subproblems from the four domains.
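For instance, the 12 subproblems are simply the ordered pairs of distinct domains:

```python
import itertools

domains = ['A', 'C', 'D', 'W']   # Amazon, Caltech, DSLR, Webcam
tasks = list(itertools.permutations(domains, 2))
assert len(tasks) == 12          # e.g. ('D', 'A') denotes the task D -> A
```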

6.2. Experiment Setup

(1) Baseline Methods. We compare our DAESVM method with three kinds of classical approaches: classifiers without transfer learning regularization, conventional transfer learning methods, and the foundation model, the low-rank exemplar support vector machine. The methods are listed as follows:

(1) Transfer Component Analysis (TCA) [40]
(2) Support Vector Machine (SVM) [43]
(3) Geodesic Flow Kernel (GFK) [28]
(4) Landmarks Selection-based Subspace Alignment (LSSA) [23]
(5) Kernel Mean Matching (KMM) [20]
(6) Subspace Alignment (SA) [44]
(7) Transfer Joint Matching (TJM) [45]
(8) Low-Rank Exemplar-SVMs (LRESVMs) [18]

TCA, GFK, and KMM are classical transfer learning methods, and we compare our model against them. Besides, we show that our method is more robust than models without domain adaptation terms in the transfer learning setting. TCA is the foundation of our model, and it is similar to GFK and SA in being based on the idea of feature transfer. KMM transfers knowledge by instance reweighting. TJM is a popular model for the problem of unsupervised domain adaptation. SA and LSSA transfer knowledge by aligning subspaces, with LSSA additionally selecting landmarks.

(2) Implementation Details. For the baseline, SVM is trained on the source data and tested on the target data [46]. TCA, SA, LSSA, TJM, and GFK are first applied as dimension reduction processes; a classifier is then trained on the source data and used to predict labels for the target domain [19]. Similarly, KMM first computes a weight for each instance and then trains the predictor on the reweighted source data.

Under the assumption of unsupervised domain adaptation, it is impossible to tune the optimal parameters for the target domain task by cross validation, since there is a distribution mismatch between domains. Therefore, in the experiments, we adopt a grid search strategy to obtain the best parameters and report the best results. Our method involves five tunable parameters: the tradeoffs in ESVM, $C_1$ and $C_2$; the tradeoffs in the regularization terms, $\lambda$ and $\mu$; and the dimension reduction parameter $m$. The tradeoff parameters $C_1$ and $C_2$ are selected over $\{10^{-3}, 10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}, 10^{3}\}$. We fix $\lambda = 1$, $\mu = 1$, and $m = 40$ empirically and select the radial basis function (RBF) as the kernel function. In fact, our model is relatively stable under a wide range of parameter values. We train a classifier for every positive instance in the source domain data and then map their outputs into a probability distribution. We handle multiclass classification in a one-versus-the-rest way. To measure the performance of our method, we use the average accuracy and the standard deviation over ten repetitions. The average testing accuracies and standard errors for all 12 tasks of our method are reported in Table 1. Most of the remaining baseline results are cited from previously published papers.
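A sketch of the grid search under these settings follows; `train_eval` is a hypothetical callable wrapping DAESVM training and target-domain evaluation, which the paper does not name.

```python
import itertools

def grid_search_C(train_eval):
    """Grid search over the ESVM tradeoffs C1 and C2, with lambda = 1,
    mu = 1, and m = 40 held fixed as in the text. `train_eval(C1, C2)`
    is assumed to return target-domain accuracy."""
    grid = [10.0 ** k for k in range(-3, 4)]       # 1e-3 ... 1e3
    best_params, best_acc = None, float('-inf')
    for C1, C2 in itertools.product(grid, grid):
        acc = train_eval(C1, C2)
        if acc > best_acc:
            best_params, best_acc = (C1, C2), acc
    return best_params, best_acc                   # best result is reported
```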

6.3. Experiment Results. In this section, we compare our DAESVM with the baseline methods in terms of classification accuracy.

Table 1 summarizes the classification accuracies obtained over all 10 categories across the 12 tasks in 4 domains. The highest accuracy is in bold font, indicating the best performance on that task. First, we evaluate traditional classifiers without domain adaptation terms: we train the predictors on the source domain data and predict on the target domain dataset. Second, we compare our DAESVM with unsupervised domain adaptation methods such as TCA and GFK, implemented with the same dimension reduction parameter $m$ as in our model. Finally, we also compare DAESVM with recently proposed transfer learning models such as low-rank ESVMs [18].


Figure 1: Example images from the backpack category in Amazon and DSLR ((a), from left to right) and Webcam and Caltech-256 ((b), from left to right). The images vary across domains: they have different styles, backgrounds, and sources.

Overall, following the usual transfer learning protocol, we run experiments across different pairs of source and target domains. The accuracy of DAESVM for the adaptation from DSLR to Webcam reaches 92.1%, improving on LRESVM by 1.2%. Compared with TCA, DAESVMs take into consideration the distribution mismatch among instances as well as across domains. The adaptation from Webcam to DSLR reaches an accuracy of 91.8%. For the domains Amazon and Caltech, which are larger than DSLR and Webcam, DAESVM achieves an accuracy of 77.5% (on A → C), a relative improvement of about 36.2% over TJM. Regarding the ability to transfer knowledge from a large dataset to a small one, from Amazon to DSLR we obtain an accuracy of 76.8%; conversely, from DSLR to Amazon, the prediction accuracy is 83.4%. Overall, our DAESVM trained on one domain performs well and also performs robustly in the multidomain setting.

Table 1: Classification accuracies of different methods on the 12 domain adaptation tasks. We conduct the experiments against conventional transfer learning methods. Compared with traditional methods, DAESVMs gain a large improvement in prediction accuracy, and they also improve on the recently proposed LRESVM approach [average ± standard error of accuracy (%)].

Task     SVM   KMM   TCA   TJM   SA    GFK   LSSA  LRESVM  DAESVMs
A → C    45.4  42.2  45.3  56.9  51.8  49.6  54.8  79.8    77.5 ± 0.79
A → D    50.7  42.7  60.3  56.4  56.4  55.7  57.3  74.9    76.8 ± 0.76
A → W    47.4  42.4  61.3  51.0  54.7  56.9  56.7  75.4    73.2 ± 1.08
C → A    50.7  48.3  54.7  58.6  57.1  51.2  58.4  77.2    80.2 ± 0.39
C → D    53.2  53.5  56.4  57.4  59.0  57.1  59.1  87.1    89.0 ± 0.23
C → W    44.2  45.8  50.4  58.8  62.7  57.1  58.1  74.1    74.7 ± 0.38
D → A    40.8  42.2  53.8  46.1  58.9  59.2  58.4  80.4    83.4 ± 1.41
D → C    48.3  41.6  43.9  49.6  54.3  59.4  57.7  79.0    73.0 ± 1.04
D → W    67.8  72.9  82.4  82.0  83.4  80.2  87.1  91.0    92.1 ± 0.25
W → A    42.4  41.9  53.0  50.8  57.0  66.2  59.7  74.3    77.8 ± 0.33
W → C    41.2  39.0  53.7  54.8  34.7  52.4  54.2  70.6    66.5 ± 0.54
W → D    80.2  82.0  87.9  83.4  78.9  81.2  87.2  89.2    91.8 ± 0.59
Average  51.0  49.5  58.6  58.8  59.1  60.5  62.4  79.4    80.0 ± 0.67

Table 2: We also conduct experiments on multidomain tasks and obtain an improvement over previously proposed methods. The experiments adopt the same strategy as single domain adaptation: we treat multiple domains as one source or target to find the shared features in a latent space. However, the complexity of the multidomain shared features limits the accuracy of the tasks [average ± standard error of accuracy (%)].

Task      SVM   KMM   TCA   TJM   SA    GFK   LSSA  LRESVM  DAESVM
DW → A    45.7  37.4  40.5  57.1  59.4  47.3  61.7  80.1    77.2 ± 1.27
AD → CW   37.1  31.6  43.0  60.2  48.7  47.6  74.2  86.9    84.7 ± 0.65
D → ACW   41.4  43.8  57.2  63.9  51.9  51.4  77.0  82.9    88.4 ± 0.21
ADW → C   43.9  50.6  54.9  69.0  60.2  60.4  63.7  87.7    90.1 ± 0.34
AD → W    71.0  61.0  54.0  61.3  54.0  47.0  71.9  80.8    83.8 ± 0.78
AC → DW   81.4  53.9  77.4  71.8  57.4  64.1  80.7  89.3    92.4 ± 0.25
Average   53.4  46.4  54.5  63.9  55.2  53.0  71.5  84.6    86.1 ± 0.58

We also conduct multidomain adaptation tasks, which use one or more domains as the source domain and adapt to the remaining domains. The results are shown in Table 2. The accuracy of DAESVM for the adaptation from Amazon, DSLR, and Webcam to Caltech reaches 90.1%, an improvement over LRESVM. The task of adaptation from Amazon and Caltech to Webcam and DSLR reaches an accuracy of 92.4%. The experiments show that our models are effective not only for single domain adaptation but also for multidomain adaptation.

Two key factors may contribute to the superiority of our method. First, the feature transfer regularization term is used to slack the similarity assumption: it assumes only that some features are shared across domains, instead of assuming that the domains themselves are similar. This makes the model more robust than models with a reweighting term. The second factor is the exemplar-SVMs, which are motivated by transfer learning and account for the distribution mismatch between individual instances. Our model combines these two factors to resist the distribution mismatch among domains and the sample selection bias among instances.

6.4. Pseudo Label Effectiveness. Following [19], we use pseudo labels to supplement model training. In our experiments, we test how the prediction results are influenced by the accuracy of the pseudo labels. As shown in Figure 2, the prediction accuracy improves as the pseudo label accuracy increases. This shows that the pseudo label method is effective: we can iterate by using the labels predicted by the DAESVM as the next round's pseudo labels, and this iteration step can efficiently enhance the performance of the classifiers.

Figure 2: The accuracy of DAESVMs improves as the pseudo label accuracy improves. The results verify the effectiveness of the pseudo label method.
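The iteration can be sketched as a self-training loop; `train_fn` and `predict_fn` are hypothetical stand-ins for DAESVM training and ensemble prediction.

```python
def pseudo_label_iteration(train_fn, predict_fn, Xs, ys, Xt, n_iter=3):
    """Iterative pseudo-labeling: each round's target-domain predictions
    become the pseudo labels used to train the next round's model."""
    model, pseudo = None, None
    for _ in range(n_iter):
        model = train_fn(Xs, ys, Xt, pseudo)   # pseudo may be None initially
        pseudo = predict_fn(model, Xt)         # refresh target pseudo labels
    return model, pseudo
```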

6.5. Parameter Sensitivity. There are five parameters in our model. We conduct a parameter sensitivity analysis, which shows that the model achieves near-optimal performance under a wide range of parameter values, and we discuss the results below.

(1) Tradeoff $\lambda$. $\lambda$ is a tradeoff controlling the weight of the MMD term, which aims to minimize the distribution mismatch between the source and target domains. Theoretically, we want this term to equal zero. However, if we set this parameter to infinity, $\lambda \to \infty$, the data may lose their original properties when the source and target domain data are transformed into the high-dimensional space. Conversely, if we set $\lambda$ to zero, the model loses its ability to correct the distribution mismatch.

(2) Tradeoff $\mu$. $\mu$ is a tradeoff controlling the weight of the data variance term, which aims to preserve data properties. If we set this parameter to infinity, $\mu \to \infty$, it may augment the data distribution mismatch among different domains; namely, the transformation matrix $\mathbf{M}$ can no longer exploit the source data to assist the target task. Conversely, if we set $\mu$ to zero, the model cannot preserve the properties of the original data.

(3) Dimension Reduction $m$. $m$ is the dimension of the transformation matrix, namely, the dimension of the subspace into which we map the samples. Setting $m$ too small may lose the properties of the data, which can cause the classifier to fail; if $m$ is too large, the effectiveness of correcting the distribution mismatch may be lost. We evaluate how the classification results are influenced by the dimension $m$; the results are displayed in Figure 3.

(4) Tradeoffs in ESVM, $C_1$ and $C_2$. Parameters $C_1$ and $C_2$ are the upper bounds of the Lagrange multipliers. In the standard SVM, positive and negative instances share the same value for these two parameters. In our models, we expect the weights of the positive samples to be higher than those of the negative samples. In our experiments, the value of $C_1$ is one hundred times $C_2$, which yields a high-performance predictor. The visual analysis of these two parameters is in Figure 4.

Figure 3: When the dimension $m$ is 20 or 40, the prediction accuracy is higher than at other values.

Figure 4: We fix $\lambda = 1$, $m = 20$, and $\mu = 1$ in these experiments; $C_1$ is searched over $\{0.1, 0.5, 1, 5, 10, 50, 100\}$ and $C_2$ over $\{0.001, 0.005, 0.01, 0.1, 0.5, 1, 10\}$.

7. Conclusion

In this paper, we have proposed an effective method for domain adaptation problems with regularization terms that reduce the data distribution mismatch between domains while preserving the properties of the original data. Furthermore, the method of integrating classifiers can predict labels for target domain data with high accuracy. The proposed method mainly aims at problems in which distribution mismatch occurs between domains or between instances. Meanwhile, we extend DAESVMs to multiple source or target domains. Experiments were conducted on transfer learning datasets, transferring knowledge from image to image.


Our future work is as follows. First, we will integrate the training of all the classifiers in an ensemble way; rewriting all the weights in matrix form would accelerate training by omitting the matrix inversion in the optimization. Second, we want to impose a constraint on $\boldsymbol{\alpha}$ that preserves sparsity. Finally, we will extend DAESVMs to transferring knowledge among domains that have few relationships, such as transferring knowledge from images to video or text.

Notations and Descriptions

$\mathcal{D}_S$, $\mathcal{D}_T$: Source/target domain
$\mathcal{T}_S$, $\mathcal{T}_T$: Source/target task
$d$: Dimension of the features
$\mathbf{X}_S$, $\mathbf{X}_T$: Source/target sample matrix
$\mathbf{y}_S$, $\mathbf{y}_T$: Source/target sample label vector
$\mathbf{K}$: Kernel matrix without label information
$\boldsymbol{\alpha}$: Lagrange multiplier vector
$n_S$, $n_T$: Number of source/target domain instances
$\mathbf{e}$: Vector of all ones
$\mathbf{I}$: Identity matrix

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work has been partially supported by grants from the National Natural Science Foundation of China (nos. 61472390, 71731009, 91546201, and 11771038) and the Beijing Natural Science Foundation (no. 1162005).

References

[1] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
[2] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning, pp. 160–167, ACM, July 2008.
[3] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 580–587, Columbus, Ohio, USA, June 2014.
[4] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
[5] W.-S. Chu, F. D. L. Torre, and J. F. Cohn, "Selective transfer machine for personalized facial action unit detection," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), pp. 3515–3522, USA, June 2013.
[6] A. Kumar, A. Saha, and H. Daume, "Co-regularization based semi-supervised domain adaptation," in Advances in Neural Information Processing Systems 23, pp. 478–486, 2010.
[7] M. Xiao and Y. Guo, "Feature space independent semi-supervised domain adaptation via kernel matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 1, pp. 54–66, 2015.
[8] S. J. Pan, J. T. Kwok, Q. Yang, and J. J. Pan, "Adaptive localization in a dynamic WiFi environment through multi-view learning," in Proceedings of the 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference (AAAI-07/IAAI-07), pp. 1108–1113, Canada, July 2007.
[9] A. Van Engelen, A. C. Van Dijk, M. T. B. Truijman et al., "Multi-center MRI carotid plaque component segmentation using feature normalization and transfer learning," IEEE Transactions on Medical Imaging, vol. 34, no. 6, pp. 1294–1305, 2015.
[10] Y. Zhang, J. Wu, Z. Cai, P. Zhang, and L. Chen, "Memetic extreme learning machine," Pattern Recognition, vol. 58, pp. 135–148, 2016.
[11] M. Uzair and A. Mian, "Blind domain adaptation with augmented extreme learning machine features," IEEE Transactions on Cybernetics, vol. 47, no. 3, pp. 651–660, 2017.
[12] L. Zhang and D. Zhang, "Domain adaptation extreme learning machines for drift compensation in E-nose systems," IEEE Transactions on Instrumentation and Measurement, vol. 64, no. 7, pp. 1790–1801, 2015.
[13] B. Schölkopf, J. Platt, and T. Hofmann, "A kernel method for the two-sample-problem," pp. 513–520, 2008.
[14] J. Wu, S. Pan, X. Zhu, C. Zhang, and X. Wu, "Multi-instance learning with discriminative bag mapping," IEEE Transactions on Knowledge and Data Engineering, pp. 1–1.
[15] J. Wu, S. Pan, X. Zhu, C. Zhang, and P. S. Yu, "Multiple structure-view learning for graph classification," IEEE Transactions on Neural Networks and Learning Systems, vol. PP, no. 99, pp. 1–16, 2017.
[16] T. Malisiewicz, A. Gupta, and A. A. Efros, "Ensemble of exemplar-SVMs for object detection and beyond," in Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV 2011), pp. 89–96, Spain, November 2011.
[17] B. Zadrozny, "Learning and evaluating classifiers under sample selection bias," in Proceedings of the Twenty-First International Conference on Machine Learning, p. 114, Banff, Alberta, Canada, July 2004.
[18] W. Li, Z. Xu, D. Xu, D. Dai, and L. Van Gool, "Domain generalization and adaptation using low rank exemplar SVMs," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1.
[19] M. Long, J. Wang, G. Ding, S. J. Pan, and P. S. Yu, "Adaptation regularization: a general framework for transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 5, pp. 1076–1089, 2014.
[20] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf, "Correcting sample selection bias by unlabeled data," in Proceedings of the 20th Annual Conference on Neural Information Processing Systems (NIPS 2006), pp. 601–608, Canada, December 2006.
[21] M. Sugiyama and M. Kawanabe, Machine Learning in Non-Stationary Environments, The MIT Press, 2012.
[22] W. Dai, Q. Yang, G. Xue, and Y. Yu, "Boosting for transfer learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 193–200, New York, NY, USA, June 2007.
[23] R. Aljundi, R. Emonet, D. Muselet, and M. Sebban, "Landmarks-based kernelized subspace alignment for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 56–63, USA, June 2015.
[24] B. Tan, Y. Zhang, S. J. Pan, and Q. Yang, "Distant domain transfer learning," in Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), pp. 2604–2610, USA, February 2017.
[25] W.-S. Chu, F. De La Torre, and J. F. Cohn, "Selective transfer machine for personalized facial expression analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 3, pp. 529–545, 2017.
[26] R. Aljundi, J. Lehaire, F. Prost-Boucle, O. Rouviere, and C. Lartizien, "Transfer learning for prostate cancer mapping based on multicentric MR imaging databases," Lecture Notes in Computer Science, vol. 9487, pp. 74–82, 2015.
[27] M. Long, Transfer Learning: Problems and Methods [Ph.D. thesis], Tsinghua University, 2014.
[28] B. Gong, Y. Shi, F. Sha, and K. Grauman, "Geodesic flow kernel for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2066–2073, June 2012.
[29] J. Blitzer, R. McDonald, and F. Pereira, "Domain adaptation with structural correspondence learning," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '06), pp. 120–128, Association for Computational Linguistics, July 2006.
[30] M. Long, Y. Cao, J. Wang, and M. I. Jordan, "Learning transferable features with deep adaptation networks," pp. 97–105, 2015.
[31] M. Long, J. Wang, and M. I. Jordan, Deep transfer learning with joint adaptation networks, 2016.
[32] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS 2014), pp. 3320–3328, Canada, December 2014.
[33] J. Yang, R. Yan, and A. G. Hauptmann, "Cross-domain video concept detection using adaptive SVMs," in Proceedings of the 15th ACM International Conference on Multimedia (MM '07), pp. 188–197, September 2007.
[34] S. Li, S. Song, and G. Huang, "Prediction reweighting for domain adaption," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 7, pp. 1682–1695, 2017.
[35] Z. Xu, W. Li, L. Niu, and D. Xu, "Exploiting low-rank structure from latent domains for domain generalization," Lecture Notes in Computer Science, vol. 8691, no. 3, pp. 628–643, 2014.
[36] L. Niu, W. Li, D. Xu, and J. Cai, "An exemplar-based multi-view domain generalization framework for visual recognition," IEEE Transactions on Neural Networks and Learning Systems, 2016.
[37] L. Niu, W. Li, and D. Xu, "Multi-view domain generalization for visual recognition," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV 2015), pp. 4193–4201, Chile, December 2015.
[38] T. Kobayashi, "Three viewpoints toward exemplar SVM," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 2765–2773, USA, June 2015.
[39] S. J. Pan, J. T. Kwok, and Q. Yang, "Transfer learning via dimensionality reduction," in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 677–682, 2008.
[40] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, "Domain adaptation via transfer component analysis," IEEE Transactions on Neural Networks, vol. 22, no. 2, pp. 199–210, 2011.
[41] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting visual category models to new domains," in Computer Vision – ECCV 2010, vol. 6314 of Lecture Notes in Computer Science, pp. 213–226, Springer, Berlin, Germany, 2010.
[42] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS '12), pp. 1097–1105, Lake Tahoe, Nev, USA, December 2012.
[43] V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, Wiley-Interscience, New York, NY, USA, 1998.
[44] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, "Unsupervised visual domain adaptation using subspace alignment," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV 2013), pp. 2960–2967, Australia, December 2013.
[45] M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, "Transfer joint matching for unsupervised domain adaptation," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), pp. 1410–1417, USA, June 2014.
[46] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.


Page 8: Unsupervised Domain Adaptation Using Exemplar-SVMs with ...downloads.hindawi.com/journals/complexity/2018/8425821.pdf · ResearchArticle Unsupervised Domain Adaptation Using Exemplar-SVMs

8 Complexity

the DSLR is higher than others and the influence factorssuch as object detection and background are less than imagesdownloaded from the web Amazon andWebcam come fromthe web and images in the domains are of low quality andmore complexity However there are some different detailson each of them Instances in the Webcam are object alonebut the composition of samples in Amazon is more complexincluding background and other goods Figure 1 shows theexample of the backpack from four domain samples In theview of transfer learning the datasets come from differentdomains and the differentmargin probabilities for the imagesIn our model we aim to solve this problem and get a robustclassifier for the cross-domain

We chose ten common categories among all four datasetsbackpack bike bike helmet bookcase bottle calculator deskchair desk lamp desktop computer and file cabinet Thereare 8 to 151 samples per category in a domain 958 imagesin Amazon 295 images in Webcam 157 images in DSLR1123 images in Caltech and 2533 images total in the datasetFigure 1 shows examples for datasets

We follow both SURF and DeCAF features extraction inthe experiments First we use SURF features encoding theimages into 800-bin histograms Next we use DeCAF featurewhich is extracted by 7 layers of Alex-net [42] into 4096-binhistograms At last we normalized the histograms and then119911-scored to have zero mean and unit standard deviation ineach dimension

We run our experiments on a standard way for visualdomain adaptation It always uses one of four datasets assource domain and another one as target domain Eachdataset provides same ten categories and uses the samerepresentation of images which is considered as the problemof homogeneous domain adaptation For example we chooseimages taken by the set of DSLR (denoted by 119863) as sourcedomain data and use images in Amazon (denoted by 119860) astarget domain data This problem is denoted as D rarr AUsing this method we can compose 12 domain adaptationsubproblems from four domains

62 Experiment Setup

(1) Baseline Method We compare our DAESVMmethod withthree kinds of classical approaches one is classified withoutregularization of transfer learning the second is conventionaltransfer learning methods and the last one is the foundationmodel which is low-rank exemplar support vector machineThe methods are listed as follows

(1) Transfer Component Analysis (TCA) [40](2) Support Vector Machine (SVM) [43](3) Geodesic Flow Kernel (GFK) [28](4) Landmarks Selection-based Subspace Alignment

(LSSA) [23](5) Kernel Mean Maximum (KMM) [20](6) Subspace Alignment (SA) [44](7) Joint Matching Transfer (TJM) [45](8) Low-Rank Exemplar-SVMs (LRESVMs) [18]

TCA GFK and KMM are the classical transfer learningmethods We compare our model with these methodsBesides we prove our method is more robust than modelswithout domain adaptation items in the transfer learningscenery TCA is the foundation of our model and it is similarto GFK and SFA which are based on the idea of featuretransfer KMM transfer knowledge by instance reweightingTJM is a popularmodel utilizing the problemof unsuperviseddomain adaptation SA and LSSA are the models usinglandmarks to transfer knowledge

(2) Implementation Details For baseline method SVM istrained on the source data and tested on the target data [46]TCA SA LSSA TJM and GFK are first viewed as dimensionreduction process and then train a classifier on the sourcedata and make a prediction for the target domain [19] Beingsimilar to dimension reduction KMM is first to computethe weight of each instance and then train predictor on thereweighting source data

Under the assumption of unsupervised domain adap-tation it is impossible to tune the optimal parameters forthe target domain task by cross validation since thereexists distribution mismatch between domains Thereforein the experiments we adopt the strategy of Grid Searchto obtain the best parameters and report the best resultsOur method involves five tunable parameters tradeoff inESVM 1198621 and 1198622 tradeoff in regularization items 120582 and120583 and parameter of dimension reduction 119898 The param-eters of tradeoff in ESVM 1198621 and 1198622 are selected over10minus3 10minus2 10minus1 10minus0 101 102 103 We fix 120582 = 1 120583 = 1119898 = 40 empirically and select radial basic function (RBF)as the kernel function In fact our model is relatively stableunder a wide range of parameter values We train a classifierfor every positive instance in the source domain data andthenwe put them into a probability distributionWe deal withthe multiclass classifier in a one versus the others way Tomeasure the performance of our method we use the averageaccuracy and the standard deviation over ten repetitionsTheaverage testing accuracies and standard errors for all 12 tasksof ourmethods are reported in Table 1 For the rest of baselineexperiments most of them are cited by the papers which arepublished before

63 Experiments Results In this section we compare ourDAESVM with baseline methods regarding classificationaccuracy

Table 1 summarizes the classification accuracy obtainedby all the 10 categories and generates 12 tasks in 4 domainsThe highest accuracy is in a bold font which indicates thatthe performance of this task is better than others First weimplement the traditional classifiers without domain adapta-tion items that we train the predictors on the source domaindata andmake a prediction for target domain dataset Secondwe compared our DAESVM with unsupervised domainadaptation methods such as TCA or GFK implementedto use the same dimension reduction with the parameter119898 in our model At last we also compared DAESVMwith newly transfer learning models like low-rank ESVMs[18]

Complexity 9

(a)

(b)

Figure 1 Example images from the backpack category in Amazon DLSR ((a) from left to right) Webcam and Caltech-256 ((b) from left toright) The different domain images are various The images have different style background or sources

Overall in a usual transfer learning way we run datasetsacross different pairs of source and target domain Theaccuracy of DAESVM for the adaptation fromDSLR toWeb-cam can achieve 921 which make the improvement overLRESVM by 12 Compared with TCA DAESVMs makea consideration about the distribution mismatch among

instances or different domains For the adaptation fromWebcam to DSLR this task can get the accuracy of 918For the domain datasets Amazon and Caltech which aremore significant than DSLR andWebcam DAESVM gets theaccuracy of 775 which improves about 362 compared tothemethod of TJM For the ability which transfers knowledge

10 Complexity

Table 1 Classification accuracies of different methods for different tasks of domain adaptationWe conduct the experiments on conventionaltransfer learning methods Comparing with traditional methods DAESVMs gain a big improvement in the prediction accuracy And theyalso improve confronted with the approach of LRESVM which is proposed recently [average plusmn standard error of accuracy ()]

Task SVM KMM TCA TJM SA GFK LSSA LRESVM DAESVMsA rarr C 454 422 453 569 518 496 548 798 775 plusmn 079A rarr D 507 427 603 564 564 557 573 749 768 plusmn 076A rarr W 474 424 613 510 547 569 567 754 732 plusmn 108C rarr A 507 483 547 586 571 512 584 772 802 plusmn 039C rarr D 532 535 564 574 590 571 591 871 890 plusmn 023C rarr W 442 458 504 588 627 571 581 741 747 plusmn 038D rarr A 408 422 538 461 589 592 584 804 834 plusmn 141D rarr C 483 416 439 496 543 594 577 790 730 plusmn 104D rarr W 678 729 824 820 834 802 871 910 921 plusmn 025W rarr A 424 419 530 508 570 662 597 743 778 plusmn 033W rarr C 412 390 537 548 347 524 542 706 665 plusmn 054W rarr D 802 820 879 834 789 812 872 892 918 plusmn 059Average 510 495 586 588 591 605 624 794 800 plusmn 067Table 2We also conduct our experiments for the tasks of multidomain and gain an improvement comparing withmethods proposed beforeThe experiments adopt the same strategy as the single domain adaptation We treat multidomain as one source or target to find the sharedfeatures in a latent space However the complexity of the multidomain shared features limits the accuracy of tasks [average plusmn standard errorof accuracy ()]

Task SVM KMM TCA TJM SA GFK LSSA LRESVM DAESVMDW rarr A 457 374 405 571 594 473 617 801 772 plusmn 127AD rarr CW 371 316 430 602 487 476 742 869 847 plusmn 065D rarr ACW 414 438 572 639 519 514 770 829 884 plusmn 021ADW rarr C 439 506 549 690 602 604 637 877 901 plusmn 034AD rarr W 710 610 540 613 540 470 719 808 838 plusmn 078AC rarr DW 814 539 774 718 574 641 807 893 924 plusmn 025Average 534 464 545 639 552 530 715 846 861 plusmn 058from large dataset to small domain dataset from Amazon toDSLRwe get the accuracy of 768Contrarily fromDSLR toAmazon the prediction accuracy is 834 Totally speakingour DAESVM trained on one domain has good performanceand will also have robust performance on multidomain

We also complement tasks of multidomains adaptationwhich utilized one or more domains as source domain dataand made an adaptation to other domains The results areshown in Table 2The accuracy of DAEVM for the adaptationfromAmazon DSLR andWebcam to Caltech achieves 901which get the improvement over LERSVM For the task ofadaptation from Amazon and Caltech toWebcam DSLR canget the accuracy of 924 The experiments prove that ourmodels are effective not only for single domain adaptation butalso for multidomain adaptation

Two key factors may contribute to the superiority of ourmethod The feature transfer regularization item is utilizedto slack the similarity assumption It just assumes that thereare some shared features in different domains instead of theassumption that different domains are similar to each otherThis factor makes the model more robust than models withreweighting item The second factor is the exemplar-SVMswhich are proposed from a motivation of transfer learningwhich makes a consideration that instances are distribution

mismatch from each other Our model combines these twofactors to resist the problem of distribution mismatch amongdomains and sample selection bias among instances

64 Pseudo Label Effectiveness Following [19] we use pseudolabels to supplement training model In our experimentswe test the prediction results which are influenced by theaccuracy rate of pseudo labels As a result described byFigure 2 the prediction accuracy is improved following theincreasing accuracy of pseudo labels It is proved that themethod of the pseudo label is effective and we can do theiteration by using the labels predicted by the DAESVM as thepseudo labels The iteration step can efficiently enhance theperformance of the classifiers

65 Parameter Sensitivity There are five parameters in ourmodel and we conduct the parameter sensitivity analysiswhich can achieve optimal performance under a wide rangeof parameter values and discuss the results

(1) Tradeoff 120582 120582 is a tradeoff to control the weight of MMDitem which aims to minimize the distribution mismatchbetween source and target domain Theoretically we wantthis term to be equal to zero However if we set this

Complexity 11

0 02 04 06 08 102

04

06

08

1

Pseudo label accuracy

Pred

ictio

n ac

cura

cy

rarr

rarr

rarr

Figure 2The accuracy of DAESVMs is improvedwith the improve-ment of the pseudo label accuracyThe results verify the effectivenessof the pseudo label method

parameter to infinite 120582 rarr infin it may lose the data propertieswhen we transform source and target domain data intohigh-dimension space Contrarily if we set 120582 to zero themodel would lose the function of correcting the distributionmismatch

(2) Tradeoff 120583 120583 is a tradeoff to control the weight ofdata variance item which aims to preserve data propertiesTheoretically we want this item to be equal to zero Howeverif we set this parameter to infinite 120583 rarr infin it may augmentthe data distribution mismatch among different domainsnamely transformation matrix M cannot utilize source datato assist the target task Contrarily if we set 120583 to zero themodel cannot preserve the properties of original data

(3) Dimension Reduction119898119898 is the dimension of the trans-formation matrix namely the dimension of the subspacewhich we want to map samples into Similarly minimizing119898too less may lead to losing the properties of data which maylead to the classifier failure If119898 is too large the effectivenessof correct distributionmismatchmay be lostWe conduct theclassification results influenced by the dimension of 119898 andthe results are displayed in Figure 3

(4) Tradeoff in ESVM 1198621 and 1198622 Parameters 1198621 and 1198622are the upper bound of the Lagrangian variables In thestandard SVM positive and negative instances share thesame standard of these two parameters In our models weexpect the weights of the positive samples to be higher thannegative samples In our experiments the value of 1198621 isone hundred times 1198622 which could gain a high-performancepredictor The visual analysis of these two parameters is inFigure 4

20 40 60 80 100

1

Dimension m

04

06

08

Pred

ictio

n ac

cura

cy

rarr

rarr

rarr

Figure 3 When the dimension is 20 or 40 the prediction accuracyis higher than others

108 1006 80

604

07

075

08

085

09

095

402 200 0

Figure 4 We fix 120582 = 1 119898 = 20 and 120583 = 1 in these experimentsand 1198621 is searched in 01 05 1 5 10 50 100 and 1198622 is searched in0001 0005 001 01 05 1 107 Conclusion

In this paper we have proposed an effective method fordomain adaptation problems with regularization item whichreduces the data distribution mismatch between domainsand preserves properties of the original data Furthermoreutilizing the method of integrating classifiers can predicttarget domain data with high accuracyThe proposedmethodmainly aims to solve the problem in which domains orinstances distributions mismatch occurs Meanwhile weextend DAESVMs to the multiple source or target domainsExperiments conducted on the transfer learning datasetstransfer knowledge from image to image

12 Complexity

Our future works are as follows First we will integratethe training procession of all the classifiers in an ensembleway It is better to accelerate training process by rewritingall the weight into a matrix form This strategy can omitthe process of matrix inversion optimization Second wewant to make a constraint for 120572 that can hold the sparsityAt last we will extend DAESVMs on the problem transferknowledge among domains which have few relationshipssuch as transfer knowledge from image to video or text

Notations and Descriptions

D푆D푇 Sourcetarget domainT푆T푇 Sourcetarget task119889 Dimension of featureX푆X푇 Sourcetarget sample matrixy푆 y푇 Sourcetarget sample label matrixK Kernel matrix without label information120572 Lagrange multipliers vector119899푆 119899푇 The number of sourcetarget domain

instancese Identity vectorI Identity matrix

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work has been partially supported by grants fromNational Natural Science Foundation of China (nos61472390 71731009 91546201 and 11771038) and the BeijingNatural Science Foundation (no 1162005)

References

[1] S Ren K He R Girshick and J Sun ldquoFaster R-CNN TowardsReal-Time Object Detection with Region Proposal NetworksrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 39 no 6 pp 1137ndash1149 2017

[2] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of the 25th International Conference onMachine Learning pp 160ndash167 ACM July 2008

[3] R Girshick J Donahue T Darrell and J Malik ldquoRich fea-ture hierarchies for accurate object detection and semanticsegmentationrdquo in Proceedings of the 27th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 Columbus Ohio USA June 2014

[4] S J Pan and Q Yang ldquoA survey on transfer learningrdquo IEEETransactions on Knowledge and Data Engineering vol 22 no10 pp 1345ndash1359 2010

[5] W-S Chu F D L Torre and J F Cohn ldquoSelective transfermachine for personalized facial action unit detectionrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition CVPR 2013 pp 3515ndash3522 USA June 2013

[6] A Kumar A Saha and H Daume ldquoCo-regularization basedsemi-supervised domain adaptationrdquo In Advances in NeuralInformation Processing Systems 23 pp 478ndash486 2010

[7] M Xiao and Y Guo ldquoFeature space independent semi-supervised domain adaptation via kernel matchingrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol37 no 1 pp 54ndash66 2015

[8] S J Pan J T Kwok Q Yang and J J Pan ldquoAdaptive localizationin a dynamic WiFi environment through multi-view learningrdquoin Proceedings of the AAAI-07IAAI-07 Proceedings 22nd AAAIConference on Artificial Intelligence and the 19th InnovativeApplications of Artificial Intelligence Conference pp 1108ndash1113can July 2007

[9] A Van Engelen A C Van Dijk M T B Truijman et alldquoMulti-Center MRI Carotid Plaque Component SegmentationUsing Feature Normalization and Transfer Learningrdquo IEEETransactions on Medical Imaging vol 34 no 6 pp 1294ndash13052015

[10] Y Zhang J Wu Z Cai P Zhang and L Chen ldquoMemeticExtreme Learning Machinerdquo Pattern Recognition vol 58 pp135ndash148 2016

[11] M Uzair and A Mian ldquoBlind domain adaptation with aug-mented extreme learning machine featuresrdquo IEEE Transactionson Cybernetics vol 47 no 3 pp 651ndash660 2017

[12] L Zhang andD Zhang ldquoDomainAdaptation ExtremeLearningMachines for Drift Compensation in E-Nose Systemsrdquo IEEETransactions on Instrumentation and Measurement vol 64 no7 pp 1790ndash1801 2015

[13] B Scholkopf J Platt and T Hofmann in A kernel method forthe two-sample-problem pp 513ndash520 2008

[14] J Wu S Pan X Zhu C Zhang and X Wu ldquoMulti-instanceLearning withDiscriminative BagMappingrdquo IEEE Transactionson Knowledge and Data Engineering pp 1-1

[15] J Wu S Pan X Zhu C Zhang and P S Yu ldquoMultipleStructure-View Learning for Graph Classificationrdquo IEEE Trans-actions on Neural Networks and Learning Systems vol PP no99 pp 1ndash16 2017

[16] T Malisiewicz A Gupta and A A Efros ldquoEnsemble ofexemplar-SVMs for object detection and beyondrdquo in Proceed-ings of the 2011 IEEE International Conference on ComputerVision ICCV 2011 pp 89ndash96 Spain November 2011

[17] B Zadrozny ldquoLearning and evaluating classifiers under sampleselection biasrdquo in Proceedings of the Twenty-first internationalconference p 114 Banff Alberta Canada July 2004

[18] W Li Z Xu D Xu D Dai and L Van Gool ldquoDomainGeneralization and Adaptation using Low Rank ExemplarSVMsrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence pp 1-1

[19] M Long J Wang G Ding S J Pan and P S Yu ldquoAdaptationregularizationA general framework for transfer learningrdquo IEEETransactions on Knowledge and Data Engineering vol 26 no 5pp 1076ndash1089 2014

[20] J Huang A J Smola A Gretton K M Borgwardt andB Scholkopf ldquoCorrecting sample selection bias by unlabeleddatardquo in Proceedings of the 20th Annual Conference on NeuralInformation Processing Systems NIPS 2006 pp 601ndash608 canDecember 2006

[21] M Sugiyama and M Kawanabe Machine Learning in Non-Stationary Environments The MIT Press 2012

[22] W Dai Q Yang G Xue and Y Yu ldquoBoosting for transferlearningrdquo in Proceedings of the 24th International Conference on

Complexity 13

Machine Learning (ICML rsquo07) pp 193ndash200NewYorkNYUSAJune 2007

[23] R Aljundi R Emonet D Muselet and M Sebban ldquoLand-marks-based kernelized subspace alignment for unsuperviseddomain adaptationrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition CVPR 2015 pp 56ndash63 USA June 2015

[24] B Tan Y Zhang S J Pan and Q Yang ldquoDistant domaintransfer learningrdquo in Proceedings of the 31st AAAI Conference onArtificial Intelligence AAAI 2017 pp 2604ndash2610 usa February2017

[25] W-S Chu F De La Torre and J F Cohn ldquoSelective transfermachine for personalized facial expression analysisrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol39 no 3 pp 529ndash545 2017

[26] R Aljundi J Lehaire F Prost-Boucle O Rouviere and C Lar-tizien ldquoTransfer learning for prostate cancer mapping based onmulticentricMR imaging databasesrdquo Lecture Notes in ComputerScience (including subseries LectureNotes inArtificial Intelligenceand Lecture Notes in Bioinformatics) Preface vol 9487 pp 74ndash82 2015

[27] M LongTransfer learning problems andmethods [PhD thesis]Tsinghua University problems and methods PhD thesis 2014

[28] B Gong Y Shi F Sha and K Grauman ldquoGeodesic flow kernelfor unsupervised domain adaptationrdquo inProceedings of the IEEEConference on Computer Vision and Pattern Recognition (CVPRrsquo12) pp 2066ndash2073 June 2012

[29] J Blitzer R McDonald and F Pereira ldquoDomain adaptationwith structural correspondence learningrdquo in Proceedings of theConference on Empirical Methods in Natural Language Process-ing (EMNLP rsquo06) pp 120ndash128 Association for ComputationalLinguistics July 2006

[30] M Long Y Cao J Wang and M I Jordan ldquoLearning transfer-able features with deep adaptation networksrdquo and M I JordanLearning transferable features with deep adaptation networkspages 97ndash105 pp 97ndash105 2015

[31] M Long J Wang and M I Jordan Deep transfer learning withjoint adaptation networks 2016

[32] J Yosinski J Clune Y Bengio andH Lipson ldquoHow transferableare features in deep neural networksrdquo in Proceedings of the 28thAnnual Conference on Neural Information Processing Systems2014 NIPS 2014 pp 3320ndash3328 can December 2014

[33] J Yang R Yan and A G Hauptmann ldquoCross-domain videoconcept detection using adaptive SVMsrdquo in Proceedings of the15th ACM International Conference on Multimedia (MM rsquo07)pp 188ndash197 September 2007

[34] S Li S Song and G Huang ldquoPrediction reweighting fordomain adaptionrdquo IEEE Transactions on Neural Networks andLearning Systems vol 28 no 7 pp 1682ndash1695 2017

[35] Z Xu W Li L Niu and D Xu ldquoExploiting low-rank structurefrom latent domains for domain generalizationrdquo Lecture Notesin Computer Science (including subseries Lecture Notes in Arti-ficial Intelligence and Lecture Notes in Bioinformatics) Prefacevol 8691 no 3 pp 628ndash643 2014

[36] L NiuW Li D Xu and J Cai ldquoAn Exemplar-BasedMulti-ViewDomain Generalization Framework for Visual RecognitionrdquoIEEE Transactions on Neural Networks and Learning Systems2016

[37] L NiuW Li and D Xu ldquoMulti-view domain generalization forvisual recognitionrdquo in Proceedings of the 15th IEEE InternationalConference on Computer Vision ICCV 2015 pp 4193ndash4201Chile December 2015

[38] T Kobayashi ldquoThree viewpoints toward exemplar SVMrdquo inProceedings of the IEEE Conference on Computer Vision andPatternRecognition CVPR2015 pp 2765ndash2773USA June 2015

[39] S J Pan J T Kwok and Q Yang ldquoTransfer learning via dimen-sionality reductionrdquo in In Proceedings of the AAAI Conferenceon Artificial Intelligence pp 677ndash682 2008

[40] S J Pan I W Tsang J T Kwok and Q Yang ldquoDomainadaptation via transfer component analysisrdquo IEEE TransactionsonNeural Networks and Learning Systems vol 22 no 2 pp 199ndash210 2011

[41] K Saenko B Kulis M Fritz and T Darrell ldquoAdapting visualcategory models to new domainsrdquo in Computer VisionmdashECCV2010 vol 6314 ofLectureNotes inComputer Science pp 213ndash226Springer Berlin Germany 2010

[42] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the 26th Annual Conference on Neural Information ProcessingSystems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe Nev USADecember 2012

[43] VNVapnik Statistical LearningTheory Adaptive and LearningSystems for Signal Processing Communications and ControlWiley- Interscience New York NY USA 1998

[44] B Fernando A Habrard M Sebban and T Tuytelaars ldquoUnsu-pervised visual domain adaptation using subspace alignmentrdquoin Proceedings of the 2013 14th IEEE International Conferenceon Computer Vision ICCV 2013 pp 2960ndash2967 AustraliaDecember 2013

[45] M Long J Wang G Ding J Sun and P S Yu ldquoTransfer jointmatching for unsupervised domain adaptationrdquo in Proceedingsof the 27th IEEE Conference on Computer Vision and PatternRecognition CVPR 2014 pp 1410ndash1417 USA June 2014

[46] C Chang and C Lin ldquoLIBSVM a Library for support vectormachinesrdquo ACM Transactions on Intelligent Systems and Tech-nology vol 2 no 3 article 27 2011

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 9: Unsupervised Domain Adaptation Using Exemplar-SVMs with ...downloads.hindawi.com/journals/complexity/2018/8425821.pdf · ResearchArticle Unsupervised Domain Adaptation Using Exemplar-SVMs

Complexity 9

(a)

(b)

Figure 1 Example images from the backpack category in Amazon DLSR ((a) from left to right) Webcam and Caltech-256 ((b) from left toright) The different domain images are various The images have different style background or sources

Overall in a usual transfer learning way we run datasetsacross different pairs of source and target domain Theaccuracy of DAESVM for the adaptation fromDSLR toWeb-cam can achieve 921 which make the improvement overLRESVM by 12 Compared with TCA DAESVMs makea consideration about the distribution mismatch among

instances or different domains For the adaptation fromWebcam to DSLR this task can get the accuracy of 918For the domain datasets Amazon and Caltech which aremore significant than DSLR andWebcam DAESVM gets theaccuracy of 775 which improves about 362 compared tothemethod of TJM For the ability which transfers knowledge

10 Complexity

Table 1 Classification accuracies of different methods for different tasks of domain adaptationWe conduct the experiments on conventionaltransfer learning methods Comparing with traditional methods DAESVMs gain a big improvement in the prediction accuracy And theyalso improve confronted with the approach of LRESVM which is proposed recently [average plusmn standard error of accuracy ()]

Task SVM KMM TCA TJM SA GFK LSSA LRESVM DAESVMsA rarr C 454 422 453 569 518 496 548 798 775 plusmn 079A rarr D 507 427 603 564 564 557 573 749 768 plusmn 076A rarr W 474 424 613 510 547 569 567 754 732 plusmn 108C rarr A 507 483 547 586 571 512 584 772 802 plusmn 039C rarr D 532 535 564 574 590 571 591 871 890 plusmn 023C rarr W 442 458 504 588 627 571 581 741 747 plusmn 038D rarr A 408 422 538 461 589 592 584 804 834 plusmn 141D rarr C 483 416 439 496 543 594 577 790 730 plusmn 104D rarr W 678 729 824 820 834 802 871 910 921 plusmn 025W rarr A 424 419 530 508 570 662 597 743 778 plusmn 033W rarr C 412 390 537 548 347 524 542 706 665 plusmn 054W rarr D 802 820 879 834 789 812 872 892 918 plusmn 059Average 510 495 586 588 591 605 624 794 800 plusmn 067Table 2We also conduct our experiments for the tasks of multidomain and gain an improvement comparing withmethods proposed beforeThe experiments adopt the same strategy as the single domain adaptation We treat multidomain as one source or target to find the sharedfeatures in a latent space However the complexity of the multidomain shared features limits the accuracy of tasks [average plusmn standard errorof accuracy ()]

Task SVM KMM TCA TJM SA GFK LSSA LRESVM DAESVMDW rarr A 457 374 405 571 594 473 617 801 772 plusmn 127AD rarr CW 371 316 430 602 487 476 742 869 847 plusmn 065D rarr ACW 414 438 572 639 519 514 770 829 884 plusmn 021ADW rarr C 439 506 549 690 602 604 637 877 901 plusmn 034AD rarr W 710 610 540 613 540 470 719 808 838 plusmn 078AC rarr DW 814 539 774 718 574 641 807 893 924 plusmn 025Average 534 464 545 639 552 530 715 846 861 plusmn 058from large dataset to small domain dataset from Amazon toDSLRwe get the accuracy of 768Contrarily fromDSLR toAmazon the prediction accuracy is 834 Totally speakingour DAESVM trained on one domain has good performanceand will also have robust performance on multidomain

We also complement tasks of multidomains adaptationwhich utilized one or more domains as source domain dataand made an adaptation to other domains The results areshown in Table 2The accuracy of DAEVM for the adaptationfromAmazon DSLR andWebcam to Caltech achieves 901which get the improvement over LERSVM For the task ofadaptation from Amazon and Caltech toWebcam DSLR canget the accuracy of 924 The experiments prove that ourmodels are effective not only for single domain adaptation butalso for multidomain adaptation

Two key factors may contribute to the superiority of our method. First, the feature transfer regularization term relaxes the similarity assumption: it only assumes that different domains share some features, instead of assuming that the domains are similar to each other. This makes the model more robust than models built on a reweighting term. The second factor is the exemplar-SVMs, which were proposed with transfer learning in mind and which account for the distribution mismatch among individual instances. Our model combines these two factors to resist both the distribution mismatch among domains and the sample selection bias among instances.

6.4. Pseudo Label Effectiveness. Following [19], we use pseudo labels to supplement the training of the model. In our experiments, we test how the prediction results are influenced by the accuracy of the pseudo labels. As shown in Figure 2, the prediction accuracy improves as the pseudo label accuracy increases. This verifies that the pseudo label method is effective, and we can iterate by feeding the labels predicted by the DAESVMs back in as pseudo labels. This iteration step efficiently enhances the performance of the classifiers.

Figure 2: The accuracy of DAESVMs improves with the accuracy of the pseudo labels (axes: pseudo label accuracy versus prediction accuracy, plotted for three adaptation tasks). The results verify the effectiveness of the pseudo label method.
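To make the iteration concrete, here is a minimal self-training sketch. It is not our implementation: a plain logistic regression stands in for the DAESVM ensemble, and the synthetic data, the iteration count, and the helper name self_train are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(Xs, ys, Xt, n_iters=5):
    """Iteratively refine pseudo labels on the unlabeled target domain.

    A plain logistic regression stands in for the DAESVM classifiers;
    the point here is the loop structure, not the specific model.
    """
    clf = LogisticRegression(max_iter=1000).fit(Xs, ys)
    for _ in range(n_iters):
        pseudo = clf.predict(Xt)                 # current pseudo labels
        X_all = np.vstack([Xs, Xt])              # source + pseudo-labeled target
        y_all = np.concatenate([ys, pseudo])
        clf = LogisticRegression(max_iter=1000).fit(X_all, y_all)
    return clf

# Toy usage with synthetic data.
rng = np.random.default_rng(0)
Xs = rng.normal(size=(80, 10))
ys = (Xs[:, 0] > 0).astype(int)
Xt = rng.normal(0.3, 1.0, size=(60, 10))         # slightly shifted target
model = self_train(Xs, ys, Xt)
```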

6.5. Parameter Sensitivity. There are five parameters in our model. We conduct a parameter sensitivity analysis, which shows that the model achieves near-optimal performance under a wide range of parameter values, and discuss the results below.

(1) Tradeoff λ. λ is a tradeoff parameter that controls the weight of the MMD term, whose role is to minimize the distribution mismatch between the source and target domains. Theoretically, we would like this term to equal zero. However, if we set the parameter to infinity (λ → ∞), the data may lose its original properties when the source and target domain data are transformed into the high-dimensional space. Conversely, if we set λ to zero, the model loses its ability to correct the distribution mismatch.
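For readers unfamiliar with the MMD term, the following sketch computes a squared empirical MMD between two samples with an RBF kernel; the bandwidth sigma and the toy data are assumptions of this illustration, not values from our experiments.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # Pairwise RBF kernel values between the rows of A and the rows of B.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2(Xs, Xt, sigma=1.0):
    # Squared empirical Maximum Mean Discrepancy between source and target.
    Kss = rbf_kernel(Xs, Xs, sigma)
    Ktt = rbf_kernel(Xt, Xt, sigma)
    Kst = rbf_kernel(Xs, Xt, sigma)
    return Kss.mean() + Ktt.mean() - 2.0 * Kst.mean()

rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(100, 20))   # source sample
Xt = rng.normal(0.5, 1.0, size=(120, 20))   # mean-shifted target sample
print(mmd2(Xs, Xt))  # larger value => larger distribution mismatch
```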

(2) Tradeoff μ. μ is a tradeoff parameter that controls the weight of the data variance term, which aims to preserve the properties of the data. Theoretically, we would like this term to equal zero. However, if we set the parameter to infinity (μ → ∞), it may amplify the data distribution mismatch among the different domains; that is, the transformation matrix M can no longer exploit source data to assist the target task. Conversely, if we set μ to zero, the model cannot preserve the properties of the original data.

(3) Dimension Reduction m. m is the dimension of the transformation matrix, namely the dimension of the subspace into which we map the samples. Setting m too small may discard properties of the data, which can make the classifiers fail; if m is too large, the ability to correct the distribution mismatch may be lost. We examine how the classification results are influenced by the choice of m; the results are displayed in Figure 3.
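A quick way to reproduce this kind of sweep is sketched below, with PCA standing in for the learned transformation matrix (an assumption of the sketch, not the method of this paper) and a synthetic labeled dataset:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 100))
y = (X[:, :5].sum(axis=1) > 0).astype(int)      # synthetic labels

for m in (20, 40, 60, 80, 100):
    Z = PCA(n_components=m).fit_transform(X)    # project to an m-dim subspace
    acc = cross_val_score(LogisticRegression(max_iter=1000), Z, y, cv=3).mean()
    print(f"m={m}: accuracy={acc:.3f}")
```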

(4) Tradeoffs in ESVM: C1 and C2. Parameters C1 and C2 are the upper bounds of the Lagrangian variables. In the standard SVM, positive and negative instances share the same value of this parameter. In our models, we expect the weights of the positive samples to be higher than those of the negative samples. In our experiments, the value of C1 is one hundred times C2, which yields a high-performance predictor. A visual analysis of these two parameters is given in Figure 4.
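To illustrate the asymmetric penalty, here is a hedged sketch that trains a single exemplar classifier with scikit-learn, using per-sample weights as a stand-in for our formulation; the 100:1 ratio mirrors the setting above, while the synthetic data and the helper name train_exemplar_svm are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def train_exemplar_svm(exemplar, negatives, C1=100.0, C2=1.0):
    # One binary SVM per exemplar: a single positive against all negatives.
    X = np.vstack([exemplar[None, :], negatives])
    y = np.array([1] + [-1] * len(negatives))
    # Per-sample weights scale the soft-margin penalty, so the lone
    # positive is penalized with C1 and each negative with C2.
    w = np.array([C1] + [C2] * len(negatives))
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X, y, sample_weight=w)
    return clf

rng = np.random.default_rng(1)
negatives = rng.normal(0.0, 1.0, size=(200, 20))
exemplar = rng.normal(2.0, 1.0, size=20)
esvm = train_exemplar_svm(exemplar, negatives)
print(esvm.decision_function(exemplar[None, :]))  # high score on its exemplar
```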

Figure 3: Prediction accuracy as a function of the subspace dimension m (x-axis: m from 20 to 100; y-axis: prediction accuracy, plotted for three adaptation tasks). When the dimension is 20 or 40, the prediction accuracy is higher than for the other values.

Figure 4: Prediction accuracy as a function of C1 and C2. We fix λ = 1, m = 20, and μ = 1 in these experiments; C1 is searched over {0.1, 0.5, 1, 5, 10, 50, 100} and C2 over {0.001, 0.005, 0.01, 0.1, 0.5, 1, 10}.

7. Conclusion

In this paper, we have proposed an effective method for domain adaptation problems with regularization terms that reduce the data distribution mismatch between domains while preserving the properties of the original data. Furthermore, integrating the resulting classifiers allows target domain data to be predicted with high accuracy. The proposed method mainly aims at the setting where the distributions of domains or instances are mismatched. Meanwhile, we extend DAESVMs to multiple source or target domains. Experiments were conducted on transfer learning datasets, transferring knowledge from image to image.


Our future work is as follows. First, we will integrate the training of all the classifiers in an ensemble manner; rewriting all the weights in matrix form should accelerate training by omitting the matrix inversion step in the optimization. Second, we want to impose a constraint on α that preserves sparsity. Finally, we will extend DAESVMs to the problem of transferring knowledge between domains that have few relationships, such as from images to video or text.

Notations and Descriptions

D_S, D_T: Source/target domain
T_S, T_T: Source/target task
d: Dimension of the feature space
X_S, X_T: Source/target sample matrix
y_S, y_T: Source/target sample label matrix
K: Kernel matrix without label information
α: Lagrange multipliers vector
n_S, n_T: Number of source/target domain instances
e: Identity vector
I: Identity matrix

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work has been partially supported by grants from the National Natural Science Foundation of China (nos. 61472390, 71731009, 91546201, and 11771038) and the Beijing Natural Science Foundation (no. 1162005).

References

[1] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.

[2] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning, pp. 160–167, ACM, July 2008.

[3] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 580–587, Columbus, Ohio, USA, June 2014.

[4] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.

[5] W.-S. Chu, F. D. L. Torre, and J. F. Cohn, "Selective transfer machine for personalized facial action unit detection," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), pp. 3515–3522, USA, June 2013.

[6] A. Kumar, A. Saha, and H. Daume, "Co-regularization based semi-supervised domain adaptation," in Advances in Neural Information Processing Systems 23, pp. 478–486, 2010.

[7] M. Xiao and Y. Guo, "Feature space independent semi-supervised domain adaptation via kernel matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 1, pp. 54–66, 2015.

[8] S. J. Pan, J. T. Kwok, Q. Yang, and J. J. Pan, "Adaptive localization in a dynamic WiFi environment through multi-view learning," in Proceedings of the 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference (AAAI-07/IAAI-07), pp. 1108–1113, Canada, July 2007.

[9] A. Van Engelen, A. C. Van Dijk, M. T. B. Truijman et al., "Multi-Center MRI Carotid Plaque Component Segmentation Using Feature Normalization and Transfer Learning," IEEE Transactions on Medical Imaging, vol. 34, no. 6, pp. 1294–1305, 2015.

[10] Y. Zhang, J. Wu, Z. Cai, P. Zhang, and L. Chen, "Memetic Extreme Learning Machine," Pattern Recognition, vol. 58, pp. 135–148, 2016.

[11] M. Uzair and A. Mian, "Blind domain adaptation with augmented extreme learning machine features," IEEE Transactions on Cybernetics, vol. 47, no. 3, pp. 651–660, 2017.

[12] L. Zhang and D. Zhang, "Domain Adaptation Extreme Learning Machines for Drift Compensation in E-Nose Systems," IEEE Transactions on Instrumentation and Measurement, vol. 64, no. 7, pp. 1790–1801, 2015.

[13] A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola, "A kernel method for the two-sample-problem," in Advances in Neural Information Processing Systems 19, B. Schölkopf, J. Platt, and T. Hofmann, Eds., pp. 513–520, 2008.

[14] J. Wu, S. Pan, X. Zhu, C. Zhang, and X. Wu, "Multi-instance Learning with Discriminative Bag Mapping," IEEE Transactions on Knowledge and Data Engineering, pp. 1-1.

[15] J. Wu, S. Pan, X. Zhu, C. Zhang, and P. S. Yu, "Multiple Structure-View Learning for Graph Classification," IEEE Transactions on Neural Networks and Learning Systems, vol. PP, no. 99, pp. 1–16, 2017.

[16] T. Malisiewicz, A. Gupta, and A. A. Efros, "Ensemble of exemplar-SVMs for object detection and beyond," in Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV 2011), pp. 89–96, Spain, November 2011.

[17] B. Zadrozny, "Learning and evaluating classifiers under sample selection bias," in Proceedings of the Twenty-First International Conference on Machine Learning, p. 114, Banff, Alberta, Canada, July 2004.

[18] W. Li, Z. Xu, D. Xu, D. Dai, and L. Van Gool, "Domain Generalization and Adaptation using Low Rank Exemplar SVMs," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1-1.

[19] M. Long, J. Wang, G. Ding, S. J. Pan, and P. S. Yu, "Adaptation regularization: A general framework for transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 5, pp. 1076–1089, 2014.

[20] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf, "Correcting sample selection bias by unlabeled data," in Proceedings of the 20th Annual Conference on Neural Information Processing Systems (NIPS 2006), pp. 601–608, Canada, December 2006.

[21] M. Sugiyama and M. Kawanabe, Machine Learning in Non-Stationary Environments, The MIT Press, 2012.

[22] W. Dai, Q. Yang, G. Xue, and Y. Yu, "Boosting for transfer learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 193–200, New York, NY, USA, June 2007.

[23] R. Aljundi, R. Emonet, D. Muselet, and M. Sebban, "Landmarks-based kernelized subspace alignment for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 56–63, USA, June 2015.

[24] B. Tan, Y. Zhang, S. J. Pan, and Q. Yang, "Distant domain transfer learning," in Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), pp. 2604–2610, USA, February 2017.

[25] W.-S. Chu, F. De La Torre, and J. F. Cohn, "Selective transfer machine for personalized facial expression analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 3, pp. 529–545, 2017.

[26] R. Aljundi, J. Lehaire, F. Prost-Boucle, O. Rouvière, and C. Lartizien, "Transfer learning for prostate cancer mapping based on multicentric MR imaging databases," Lecture Notes in Computer Science, vol. 9487, pp. 74–82, 2015.

[27] M. Long, Transfer Learning: Problems and Methods, Ph.D. thesis, Tsinghua University, 2014.

[28] B. Gong, Y. Shi, F. Sha, and K. Grauman, "Geodesic flow kernel for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2066–2073, June 2012.

[29] J. Blitzer, R. McDonald, and F. Pereira, "Domain adaptation with structural correspondence learning," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '06), pp. 120–128, Association for Computational Linguistics, July 2006.

[30] M. Long, Y. Cao, J. Wang, and M. I. Jordan, "Learning transferable features with deep adaptation networks," in Proceedings of the 32nd International Conference on Machine Learning, pp. 97–105, 2015.

[31] M. Long, J. Wang, and M. I. Jordan, Deep transfer learning with joint adaptation networks, 2016.

[32] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS 2014), pp. 3320–3328, Canada, December 2014.

[33] J. Yang, R. Yan, and A. G. Hauptmann, "Cross-domain video concept detection using adaptive SVMs," in Proceedings of the 15th ACM International Conference on Multimedia (MM '07), pp. 188–197, September 2007.

[34] S. Li, S. Song, and G. Huang, "Prediction reweighting for domain adaptation," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 7, pp. 1682–1695, 2017.

[35] Z. Xu, W. Li, L. Niu, and D. Xu, "Exploiting low-rank structure from latent domains for domain generalization," Lecture Notes in Computer Science, vol. 8691, no. 3, pp. 628–643, 2014.

[36] L. Niu, W. Li, D. Xu, and J. Cai, "An Exemplar-Based Multi-View Domain Generalization Framework for Visual Recognition," IEEE Transactions on Neural Networks and Learning Systems, 2016.

[37] L. Niu, W. Li, and D. Xu, "Multi-view domain generalization for visual recognition," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV 2015), pp. 4193–4201, Chile, December 2015.

[38] T. Kobayashi, "Three viewpoints toward exemplar SVM," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 2765–2773, USA, June 2015.

[39] S. J. Pan, J. T. Kwok, and Q. Yang, "Transfer learning via dimensionality reduction," in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 677–682, 2008.

[40] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, "Domain adaptation via transfer component analysis," IEEE Transactions on Neural Networks and Learning Systems, vol. 22, no. 2, pp. 199–210, 2011.

[41] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting visual category models to new domains," in Computer Vision—ECCV 2010, vol. 6314 of Lecture Notes in Computer Science, pp. 213–226, Springer, Berlin, Germany, 2010.

[42] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS '12), pp. 1097–1105, Lake Tahoe, Nev, USA, December 2012.

[43] V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications and Control, Wiley-Interscience, New York, NY, USA, 1998.

[44] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, "Unsupervised visual domain adaptation using subspace alignment," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV 2013), pp. 2960–2967, Australia, December 2013.

[45] M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, "Transfer joint matching for unsupervised domain adaptation," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), pp. 1410–1417, USA, June 2014.

[46] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.


Page 10: Unsupervised Domain Adaptation Using Exemplar-SVMs with ...downloads.hindawi.com/journals/complexity/2018/8425821.pdf · ResearchArticle Unsupervised Domain Adaptation Using Exemplar-SVMs

10 Complexity

Table 1 Classification accuracies of different methods for different tasks of domain adaptationWe conduct the experiments on conventionaltransfer learning methods Comparing with traditional methods DAESVMs gain a big improvement in the prediction accuracy And theyalso improve confronted with the approach of LRESVM which is proposed recently [average plusmn standard error of accuracy ()]

Task SVM KMM TCA TJM SA GFK LSSA LRESVM DAESVMsA rarr C 454 422 453 569 518 496 548 798 775 plusmn 079A rarr D 507 427 603 564 564 557 573 749 768 plusmn 076A rarr W 474 424 613 510 547 569 567 754 732 plusmn 108C rarr A 507 483 547 586 571 512 584 772 802 plusmn 039C rarr D 532 535 564 574 590 571 591 871 890 plusmn 023C rarr W 442 458 504 588 627 571 581 741 747 plusmn 038D rarr A 408 422 538 461 589 592 584 804 834 plusmn 141D rarr C 483 416 439 496 543 594 577 790 730 plusmn 104D rarr W 678 729 824 820 834 802 871 910 921 plusmn 025W rarr A 424 419 530 508 570 662 597 743 778 plusmn 033W rarr C 412 390 537 548 347 524 542 706 665 plusmn 054W rarr D 802 820 879 834 789 812 872 892 918 plusmn 059Average 510 495 586 588 591 605 624 794 800 plusmn 067Table 2We also conduct our experiments for the tasks of multidomain and gain an improvement comparing withmethods proposed beforeThe experiments adopt the same strategy as the single domain adaptation We treat multidomain as one source or target to find the sharedfeatures in a latent space However the complexity of the multidomain shared features limits the accuracy of tasks [average plusmn standard errorof accuracy ()]

Task SVM KMM TCA TJM SA GFK LSSA LRESVM DAESVMDW rarr A 457 374 405 571 594 473 617 801 772 plusmn 127AD rarr CW 371 316 430 602 487 476 742 869 847 plusmn 065D rarr ACW 414 438 572 639 519 514 770 829 884 plusmn 021ADW rarr C 439 506 549 690 602 604 637 877 901 plusmn 034AD rarr W 710 610 540 613 540 470 719 808 838 plusmn 078AC rarr DW 814 539 774 718 574 641 807 893 924 plusmn 025Average 534 464 545 639 552 530 715 846 861 plusmn 058from large dataset to small domain dataset from Amazon toDSLRwe get the accuracy of 768Contrarily fromDSLR toAmazon the prediction accuracy is 834 Totally speakingour DAESVM trained on one domain has good performanceand will also have robust performance on multidomain

We also complement tasks of multidomains adaptationwhich utilized one or more domains as source domain dataand made an adaptation to other domains The results areshown in Table 2The accuracy of DAEVM for the adaptationfromAmazon DSLR andWebcam to Caltech achieves 901which get the improvement over LERSVM For the task ofadaptation from Amazon and Caltech toWebcam DSLR canget the accuracy of 924 The experiments prove that ourmodels are effective not only for single domain adaptation butalso for multidomain adaptation

Two key factors may contribute to the superiority of ourmethod The feature transfer regularization item is utilizedto slack the similarity assumption It just assumes that thereare some shared features in different domains instead of theassumption that different domains are similar to each otherThis factor makes the model more robust than models withreweighting item The second factor is the exemplar-SVMswhich are proposed from a motivation of transfer learningwhich makes a consideration that instances are distribution

mismatch from each other Our model combines these twofactors to resist the problem of distribution mismatch amongdomains and sample selection bias among instances

64 Pseudo Label Effectiveness Following [19] we use pseudolabels to supplement training model In our experimentswe test the prediction results which are influenced by theaccuracy rate of pseudo labels As a result described byFigure 2 the prediction accuracy is improved following theincreasing accuracy of pseudo labels It is proved that themethod of the pseudo label is effective and we can do theiteration by using the labels predicted by the DAESVM as thepseudo labels The iteration step can efficiently enhance theperformance of the classifiers

65 Parameter Sensitivity There are five parameters in ourmodel and we conduct the parameter sensitivity analysiswhich can achieve optimal performance under a wide rangeof parameter values and discuss the results

(1) Tradeoff 120582 120582 is a tradeoff to control the weight of MMDitem which aims to minimize the distribution mismatchbetween source and target domain Theoretically we wantthis term to be equal to zero However if we set this

Complexity 11

0 02 04 06 08 102

04

06

08

1

Pseudo label accuracy

Pred

ictio

n ac

cura

cy

rarr

rarr

rarr

Figure 2The accuracy of DAESVMs is improvedwith the improve-ment of the pseudo label accuracyThe results verify the effectivenessof the pseudo label method

parameter to infinite 120582 rarr infin it may lose the data propertieswhen we transform source and target domain data intohigh-dimension space Contrarily if we set 120582 to zero themodel would lose the function of correcting the distributionmismatch

(2) Tradeoff 120583 120583 is a tradeoff to control the weight ofdata variance item which aims to preserve data propertiesTheoretically we want this item to be equal to zero Howeverif we set this parameter to infinite 120583 rarr infin it may augmentthe data distribution mismatch among different domainsnamely transformation matrix M cannot utilize source datato assist the target task Contrarily if we set 120583 to zero themodel cannot preserve the properties of original data

(3) Dimension Reduction119898119898 is the dimension of the trans-formation matrix namely the dimension of the subspacewhich we want to map samples into Similarly minimizing119898too less may lead to losing the properties of data which maylead to the classifier failure If119898 is too large the effectivenessof correct distributionmismatchmay be lostWe conduct theclassification results influenced by the dimension of 119898 andthe results are displayed in Figure 3

(4) Tradeoff in ESVM 1198621 and 1198622 Parameters 1198621 and 1198622are the upper bound of the Lagrangian variables In thestandard SVM positive and negative instances share thesame standard of these two parameters In our models weexpect the weights of the positive samples to be higher thannegative samples In our experiments the value of 1198621 isone hundred times 1198622 which could gain a high-performancepredictor The visual analysis of these two parameters is inFigure 4

20 40 60 80 100

1

Dimension m

04

06

08

Pred

ictio

n ac

cura

cy

rarr

rarr

rarr

Figure 3 When the dimension is 20 or 40 the prediction accuracyis higher than others

108 1006 80

604

07

075

08

085

09

095

402 200 0

Figure 4 We fix 120582 = 1 119898 = 20 and 120583 = 1 in these experimentsand 1198621 is searched in 01 05 1 5 10 50 100 and 1198622 is searched in0001 0005 001 01 05 1 107 Conclusion

In this paper we have proposed an effective method fordomain adaptation problems with regularization item whichreduces the data distribution mismatch between domainsand preserves properties of the original data Furthermoreutilizing the method of integrating classifiers can predicttarget domain data with high accuracyThe proposedmethodmainly aims to solve the problem in which domains orinstances distributions mismatch occurs Meanwhile weextend DAESVMs to the multiple source or target domainsExperiments conducted on the transfer learning datasetstransfer knowledge from image to image

12 Complexity

Our future works are as follows First we will integratethe training procession of all the classifiers in an ensembleway It is better to accelerate training process by rewritingall the weight into a matrix form This strategy can omitthe process of matrix inversion optimization Second wewant to make a constraint for 120572 that can hold the sparsityAt last we will extend DAESVMs on the problem transferknowledge among domains which have few relationshipssuch as transfer knowledge from image to video or text

Notations and Descriptions

D푆D푇 Sourcetarget domainT푆T푇 Sourcetarget task119889 Dimension of featureX푆X푇 Sourcetarget sample matrixy푆 y푇 Sourcetarget sample label matrixK Kernel matrix without label information120572 Lagrange multipliers vector119899푆 119899푇 The number of sourcetarget domain

instancese Identity vectorI Identity matrix

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work has been partially supported by grants fromNational Natural Science Foundation of China (nos61472390 71731009 91546201 and 11771038) and the BeijingNatural Science Foundation (no 1162005)

References

[1] S Ren K He R Girshick and J Sun ldquoFaster R-CNN TowardsReal-Time Object Detection with Region Proposal NetworksrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 39 no 6 pp 1137ndash1149 2017

[2] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of the 25th International Conference onMachine Learning pp 160ndash167 ACM July 2008

[3] R Girshick J Donahue T Darrell and J Malik ldquoRich fea-ture hierarchies for accurate object detection and semanticsegmentationrdquo in Proceedings of the 27th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 Columbus Ohio USA June 2014

[4] S J Pan and Q Yang ldquoA survey on transfer learningrdquo IEEETransactions on Knowledge and Data Engineering vol 22 no10 pp 1345ndash1359 2010

[5] W-S Chu F D L Torre and J F Cohn ldquoSelective transfermachine for personalized facial action unit detectionrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition CVPR 2013 pp 3515ndash3522 USA June 2013

[6] A Kumar A Saha and H Daume ldquoCo-regularization basedsemi-supervised domain adaptationrdquo In Advances in NeuralInformation Processing Systems 23 pp 478ndash486 2010

[7] M Xiao and Y Guo ldquoFeature space independent semi-supervised domain adaptation via kernel matchingrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol37 no 1 pp 54ndash66 2015

[8] S J Pan J T Kwok Q Yang and J J Pan ldquoAdaptive localizationin a dynamic WiFi environment through multi-view learningrdquoin Proceedings of the AAAI-07IAAI-07 Proceedings 22nd AAAIConference on Artificial Intelligence and the 19th InnovativeApplications of Artificial Intelligence Conference pp 1108ndash1113can July 2007

[9] A Van Engelen A C Van Dijk M T B Truijman et alldquoMulti-Center MRI Carotid Plaque Component SegmentationUsing Feature Normalization and Transfer Learningrdquo IEEETransactions on Medical Imaging vol 34 no 6 pp 1294ndash13052015

[10] Y Zhang J Wu Z Cai P Zhang and L Chen ldquoMemeticExtreme Learning Machinerdquo Pattern Recognition vol 58 pp135ndash148 2016

[11] M Uzair and A Mian ldquoBlind domain adaptation with aug-mented extreme learning machine featuresrdquo IEEE Transactionson Cybernetics vol 47 no 3 pp 651ndash660 2017

[12] L Zhang andD Zhang ldquoDomainAdaptation ExtremeLearningMachines for Drift Compensation in E-Nose Systemsrdquo IEEETransactions on Instrumentation and Measurement vol 64 no7 pp 1790ndash1801 2015

[13] B Scholkopf J Platt and T Hofmann in A kernel method forthe two-sample-problem pp 513ndash520 2008

[14] J Wu S Pan X Zhu C Zhang and X Wu ldquoMulti-instanceLearning withDiscriminative BagMappingrdquo IEEE Transactionson Knowledge and Data Engineering pp 1-1

[15] J Wu S Pan X Zhu C Zhang and P S Yu ldquoMultipleStructure-View Learning for Graph Classificationrdquo IEEE Trans-actions on Neural Networks and Learning Systems vol PP no99 pp 1ndash16 2017

[16] T Malisiewicz A Gupta and A A Efros ldquoEnsemble ofexemplar-SVMs for object detection and beyondrdquo in Proceed-ings of the 2011 IEEE International Conference on ComputerVision ICCV 2011 pp 89ndash96 Spain November 2011

[17] B Zadrozny ldquoLearning and evaluating classifiers under sampleselection biasrdquo in Proceedings of the Twenty-first internationalconference p 114 Banff Alberta Canada July 2004

[18] W Li Z Xu D Xu D Dai and L Van Gool ldquoDomainGeneralization and Adaptation using Low Rank ExemplarSVMsrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence pp 1-1

[19] M Long J Wang G Ding S J Pan and P S Yu ldquoAdaptationregularizationA general framework for transfer learningrdquo IEEETransactions on Knowledge and Data Engineering vol 26 no 5pp 1076ndash1089 2014

[20] J Huang A J Smola A Gretton K M Borgwardt andB Scholkopf ldquoCorrecting sample selection bias by unlabeleddatardquo in Proceedings of the 20th Annual Conference on NeuralInformation Processing Systems NIPS 2006 pp 601ndash608 canDecember 2006

[21] M Sugiyama and M Kawanabe Machine Learning in Non-Stationary Environments The MIT Press 2012

[22] W Dai Q Yang G Xue and Y Yu ldquoBoosting for transferlearningrdquo in Proceedings of the 24th International Conference on

Complexity 13

Machine Learning (ICML rsquo07) pp 193ndash200NewYorkNYUSAJune 2007

[23] R Aljundi R Emonet D Muselet and M Sebban ldquoLand-marks-based kernelized subspace alignment for unsuperviseddomain adaptationrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition CVPR 2015 pp 56ndash63 USA June 2015

[24] B Tan Y Zhang S J Pan and Q Yang ldquoDistant domaintransfer learningrdquo in Proceedings of the 31st AAAI Conference onArtificial Intelligence AAAI 2017 pp 2604ndash2610 usa February2017

[25] W-S Chu F De La Torre and J F Cohn ldquoSelective transfermachine for personalized facial expression analysisrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol39 no 3 pp 529ndash545 2017

[26] R Aljundi J Lehaire F Prost-Boucle O Rouviere and C Lar-tizien ldquoTransfer learning for prostate cancer mapping based onmulticentricMR imaging databasesrdquo Lecture Notes in ComputerScience (including subseries LectureNotes inArtificial Intelligenceand Lecture Notes in Bioinformatics) Preface vol 9487 pp 74ndash82 2015

[27] M LongTransfer learning problems andmethods [PhD thesis]Tsinghua University problems and methods PhD thesis 2014

[28] B Gong Y Shi F Sha and K Grauman ldquoGeodesic flow kernelfor unsupervised domain adaptationrdquo inProceedings of the IEEEConference on Computer Vision and Pattern Recognition (CVPRrsquo12) pp 2066ndash2073 June 2012

[29] J Blitzer R McDonald and F Pereira ldquoDomain adaptationwith structural correspondence learningrdquo in Proceedings of theConference on Empirical Methods in Natural Language Process-ing (EMNLP rsquo06) pp 120ndash128 Association for ComputationalLinguistics July 2006

[30] M Long Y Cao J Wang and M I Jordan ldquoLearning transfer-able features with deep adaptation networksrdquo and M I JordanLearning transferable features with deep adaptation networkspages 97ndash105 pp 97ndash105 2015

[31] M Long J Wang and M I Jordan Deep transfer learning withjoint adaptation networks 2016

[32] J Yosinski J Clune Y Bengio andH Lipson ldquoHow transferableare features in deep neural networksrdquo in Proceedings of the 28thAnnual Conference on Neural Information Processing Systems2014 NIPS 2014 pp 3320ndash3328 can December 2014

[33] J Yang R Yan and A G Hauptmann ldquoCross-domain videoconcept detection using adaptive SVMsrdquo in Proceedings of the15th ACM International Conference on Multimedia (MM rsquo07)pp 188ndash197 September 2007

[34] S Li S Song and G Huang ldquoPrediction reweighting fordomain adaptionrdquo IEEE Transactions on Neural Networks andLearning Systems vol 28 no 7 pp 1682ndash1695 2017

[35] Z Xu W Li L Niu and D Xu ldquoExploiting low-rank structurefrom latent domains for domain generalizationrdquo Lecture Notesin Computer Science (including subseries Lecture Notes in Arti-ficial Intelligence and Lecture Notes in Bioinformatics) Prefacevol 8691 no 3 pp 628ndash643 2014

[36] L NiuW Li D Xu and J Cai ldquoAn Exemplar-BasedMulti-ViewDomain Generalization Framework for Visual RecognitionrdquoIEEE Transactions on Neural Networks and Learning Systems2016

[37] L NiuW Li and D Xu ldquoMulti-view domain generalization forvisual recognitionrdquo in Proceedings of the 15th IEEE InternationalConference on Computer Vision ICCV 2015 pp 4193ndash4201Chile December 2015

[38] T Kobayashi ldquoThree viewpoints toward exemplar SVMrdquo inProceedings of the IEEE Conference on Computer Vision andPatternRecognition CVPR2015 pp 2765ndash2773USA June 2015

[39] S J Pan J T Kwok and Q Yang ldquoTransfer learning via dimen-sionality reductionrdquo in In Proceedings of the AAAI Conferenceon Artificial Intelligence pp 677ndash682 2008

[40] S J Pan I W Tsang J T Kwok and Q Yang ldquoDomainadaptation via transfer component analysisrdquo IEEE TransactionsonNeural Networks and Learning Systems vol 22 no 2 pp 199ndash210 2011

[41] K Saenko B Kulis M Fritz and T Darrell ldquoAdapting visualcategory models to new domainsrdquo in Computer VisionmdashECCV2010 vol 6314 ofLectureNotes inComputer Science pp 213ndash226Springer Berlin Germany 2010

[42] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the 26th Annual Conference on Neural Information ProcessingSystems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe Nev USADecember 2012

[43] VNVapnik Statistical LearningTheory Adaptive and LearningSystems for Signal Processing Communications and ControlWiley- Interscience New York NY USA 1998

[44] B Fernando A Habrard M Sebban and T Tuytelaars ldquoUnsu-pervised visual domain adaptation using subspace alignmentrdquoin Proceedings of the 2013 14th IEEE International Conferenceon Computer Vision ICCV 2013 pp 2960ndash2967 AustraliaDecember 2013

[45] M Long J Wang G Ding J Sun and P S Yu ldquoTransfer jointmatching for unsupervised domain adaptationrdquo in Proceedingsof the 27th IEEE Conference on Computer Vision and PatternRecognition CVPR 2014 pp 1410ndash1417 USA June 2014

[46] C Chang and C Lin ldquoLIBSVM a Library for support vectormachinesrdquo ACM Transactions on Intelligent Systems and Tech-nology vol 2 no 3 article 27 2011

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 11: Unsupervised Domain Adaptation Using Exemplar-SVMs with ...downloads.hindawi.com/journals/complexity/2018/8425821.pdf · ResearchArticle Unsupervised Domain Adaptation Using Exemplar-SVMs

Complexity 11

0 02 04 06 08 102

04

06

08

1

Pseudo label accuracy

Pred

ictio

n ac

cura

cy

rarr

rarr

rarr

Figure 2The accuracy of DAESVMs is improvedwith the improve-ment of the pseudo label accuracyThe results verify the effectivenessof the pseudo label method

parameter to infinite 120582 rarr infin it may lose the data propertieswhen we transform source and target domain data intohigh-dimension space Contrarily if we set 120582 to zero themodel would lose the function of correcting the distributionmismatch

(2) Tradeoff 120583 120583 is a tradeoff to control the weight ofdata variance item which aims to preserve data propertiesTheoretically we want this item to be equal to zero Howeverif we set this parameter to infinite 120583 rarr infin it may augmentthe data distribution mismatch among different domainsnamely transformation matrix M cannot utilize source datato assist the target task Contrarily if we set 120583 to zero themodel cannot preserve the properties of original data

(3) Dimension Reduction119898119898 is the dimension of the trans-formation matrix namely the dimension of the subspacewhich we want to map samples into Similarly minimizing119898too less may lead to losing the properties of data which maylead to the classifier failure If119898 is too large the effectivenessof correct distributionmismatchmay be lostWe conduct theclassification results influenced by the dimension of 119898 andthe results are displayed in Figure 3

(4) Tradeoff in ESVM 1198621 and 1198622 Parameters 1198621 and 1198622are the upper bound of the Lagrangian variables In thestandard SVM positive and negative instances share thesame standard of these two parameters In our models weexpect the weights of the positive samples to be higher thannegative samples In our experiments the value of 1198621 isone hundred times 1198622 which could gain a high-performancepredictor The visual analysis of these two parameters is inFigure 4

20 40 60 80 100

1

Dimension m

04

06

08

Pred

ictio

n ac

cura

cy

rarr

rarr

rarr

Figure 3 When the dimension is 20 or 40 the prediction accuracyis higher than others

108 1006 80

604

07

075

08

085

09

095

402 200 0

Figure 4 We fix 120582 = 1 119898 = 20 and 120583 = 1 in these experimentsand 1198621 is searched in 01 05 1 5 10 50 100 and 1198622 is searched in0001 0005 001 01 05 1 107 Conclusion

In this paper we have proposed an effective method fordomain adaptation problems with regularization item whichreduces the data distribution mismatch between domainsand preserves properties of the original data Furthermoreutilizing the method of integrating classifiers can predicttarget domain data with high accuracyThe proposedmethodmainly aims to solve the problem in which domains orinstances distributions mismatch occurs Meanwhile weextend DAESVMs to the multiple source or target domainsExperiments conducted on the transfer learning datasetstransfer knowledge from image to image

12 Complexity

Our future works are as follows First we will integratethe training procession of all the classifiers in an ensembleway It is better to accelerate training process by rewritingall the weight into a matrix form This strategy can omitthe process of matrix inversion optimization Second wewant to make a constraint for 120572 that can hold the sparsityAt last we will extend DAESVMs on the problem transferknowledge among domains which have few relationshipssuch as transfer knowledge from image to video or text

Notations and Descriptions

D푆D푇 Sourcetarget domainT푆T푇 Sourcetarget task119889 Dimension of featureX푆X푇 Sourcetarget sample matrixy푆 y푇 Sourcetarget sample label matrixK Kernel matrix without label information120572 Lagrange multipliers vector119899푆 119899푇 The number of sourcetarget domain

instancese Identity vectorI Identity matrix

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work has been partially supported by grants fromNational Natural Science Foundation of China (nos61472390 71731009 91546201 and 11771038) and the BeijingNatural Science Foundation (no 1162005)

References

[1] S Ren K He R Girshick and J Sun ldquoFaster R-CNN TowardsReal-Time Object Detection with Region Proposal NetworksrdquoIEEE Transactions on Pattern Analysis andMachine Intelligencevol 39 no 6 pp 1137ndash1149 2017

[2] R Collobert and J Weston ldquoA unified architecture for naturallanguage processing deep neural networks with multitasklearningrdquo in Proceedings of the 25th International Conference onMachine Learning pp 160ndash167 ACM July 2008

[3] R Girshick J Donahue T Darrell and J Malik ldquoRich fea-ture hierarchies for accurate object detection and semanticsegmentationrdquo in Proceedings of the 27th IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo14) pp 580ndash587 Columbus Ohio USA June 2014

[4] S J Pan and Q Yang ldquoA survey on transfer learningrdquo IEEETransactions on Knowledge and Data Engineering vol 22 no10 pp 1345ndash1359 2010

[5] W-S Chu F D L Torre and J F Cohn ldquoSelective transfermachine for personalized facial action unit detectionrdquo inProceedings of the 26th IEEEConference onComputer Vision andPattern Recognition CVPR 2013 pp 3515ndash3522 USA June 2013

[6] A Kumar A Saha and H Daume ldquoCo-regularization basedsemi-supervised domain adaptationrdquo In Advances in NeuralInformation Processing Systems 23 pp 478ndash486 2010

[7] M Xiao and Y Guo ldquoFeature space independent semi-supervised domain adaptation via kernel matchingrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol37 no 1 pp 54ndash66 2015

[8] S J Pan J T Kwok Q Yang and J J Pan ldquoAdaptive localizationin a dynamic WiFi environment through multi-view learningrdquoin Proceedings of the AAAI-07IAAI-07 Proceedings 22nd AAAIConference on Artificial Intelligence and the 19th InnovativeApplications of Artificial Intelligence Conference pp 1108ndash1113can July 2007

[9] A Van Engelen A C Van Dijk M T B Truijman et alldquoMulti-Center MRI Carotid Plaque Component SegmentationUsing Feature Normalization and Transfer Learningrdquo IEEETransactions on Medical Imaging vol 34 no 6 pp 1294ndash13052015

[10] Y Zhang J Wu Z Cai P Zhang and L Chen ldquoMemeticExtreme Learning Machinerdquo Pattern Recognition vol 58 pp135ndash148 2016

[11] M Uzair and A Mian ldquoBlind domain adaptation with aug-mented extreme learning machine featuresrdquo IEEE Transactionson Cybernetics vol 47 no 3 pp 651ndash660 2017

[12] L Zhang andD Zhang ldquoDomainAdaptation ExtremeLearningMachines for Drift Compensation in E-Nose Systemsrdquo IEEETransactions on Instrumentation and Measurement vol 64 no7 pp 1790ndash1801 2015

[13] B Scholkopf J Platt and T Hofmann in A kernel method forthe two-sample-problem pp 513ndash520 2008

[14] J Wu S Pan X Zhu C Zhang and X Wu ldquoMulti-instanceLearning withDiscriminative BagMappingrdquo IEEE Transactionson Knowledge and Data Engineering pp 1-1

[15] J Wu S Pan X Zhu C Zhang and P S Yu ldquoMultipleStructure-View Learning for Graph Classificationrdquo IEEE Trans-actions on Neural Networks and Learning Systems vol PP no99 pp 1ndash16 2017

[16] T Malisiewicz A Gupta and A A Efros ldquoEnsemble ofexemplar-SVMs for object detection and beyondrdquo in Proceed-ings of the 2011 IEEE International Conference on ComputerVision ICCV 2011 pp 89ndash96 Spain November 2011

[17] B Zadrozny ldquoLearning and evaluating classifiers under sampleselection biasrdquo in Proceedings of the Twenty-first internationalconference p 114 Banff Alberta Canada July 2004

[18] W Li Z Xu D Xu D Dai and L Van Gool ldquoDomainGeneralization and Adaptation using Low Rank ExemplarSVMsrdquo IEEE Transactions on Pattern Analysis and MachineIntelligence pp 1-1

[19] M Long J Wang G Ding S J Pan and P S Yu ldquoAdaptationregularizationA general framework for transfer learningrdquo IEEETransactions on Knowledge and Data Engineering vol 26 no 5pp 1076ndash1089 2014

[20] J Huang A J Smola A Gretton K M Borgwardt andB Scholkopf ldquoCorrecting sample selection bias by unlabeleddatardquo in Proceedings of the 20th Annual Conference on NeuralInformation Processing Systems NIPS 2006 pp 601ndash608 canDecember 2006

[21] M Sugiyama and M Kawanabe Machine Learning in Non-Stationary Environments The MIT Press 2012

[22] W Dai Q Yang G Xue and Y Yu ldquoBoosting for transferlearningrdquo in Proceedings of the 24th International Conference on

Complexity 13

Machine Learning (ICML rsquo07) pp 193ndash200NewYorkNYUSAJune 2007

[23] R Aljundi R Emonet D Muselet and M Sebban ldquoLand-marks-based kernelized subspace alignment for unsuperviseddomain adaptationrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition CVPR 2015 pp 56ndash63 USA June 2015

[24] B Tan Y Zhang S J Pan and Q Yang ldquoDistant domaintransfer learningrdquo in Proceedings of the 31st AAAI Conference onArtificial Intelligence AAAI 2017 pp 2604ndash2610 usa February2017

[25] W-S Chu F De La Torre and J F Cohn ldquoSelective transfermachine for personalized facial expression analysisrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol39 no 3 pp 529ndash545 2017

[26] R Aljundi J Lehaire F Prost-Boucle O Rouviere and C Lar-tizien ldquoTransfer learning for prostate cancer mapping based onmulticentricMR imaging databasesrdquo Lecture Notes in ComputerScience (including subseries LectureNotes inArtificial Intelligenceand Lecture Notes in Bioinformatics) Preface vol 9487 pp 74ndash82 2015

[27] M LongTransfer learning problems andmethods [PhD thesis]Tsinghua University problems and methods PhD thesis 2014

[28] B Gong Y Shi F Sha and K Grauman ldquoGeodesic flow kernelfor unsupervised domain adaptationrdquo inProceedings of the IEEEConference on Computer Vision and Pattern Recognition (CVPRrsquo12) pp 2066ndash2073 June 2012

[29] J Blitzer R McDonald and F Pereira ldquoDomain adaptationwith structural correspondence learningrdquo in Proceedings of theConference on Empirical Methods in Natural Language Process-ing (EMNLP rsquo06) pp 120ndash128 Association for ComputationalLinguistics July 2006

[30] M Long Y Cao J Wang and M I Jordan ldquoLearning transfer-able features with deep adaptation networksrdquo and M I JordanLearning transferable features with deep adaptation networkspages 97ndash105 pp 97ndash105 2015

[31] M Long J Wang and M I Jordan Deep transfer learning withjoint adaptation networks 2016

[32] J Yosinski J Clune Y Bengio andH Lipson ldquoHow transferableare features in deep neural networksrdquo in Proceedings of the 28thAnnual Conference on Neural Information Processing Systems2014 NIPS 2014 pp 3320ndash3328 can December 2014

[33] J Yang R Yan and A G Hauptmann ldquoCross-domain videoconcept detection using adaptive SVMsrdquo in Proceedings of the15th ACM International Conference on Multimedia (MM rsquo07)pp 188ndash197 September 2007

[34] S Li S Song and G Huang ldquoPrediction reweighting fordomain adaptionrdquo IEEE Transactions on Neural Networks andLearning Systems vol 28 no 7 pp 1682ndash1695 2017

[35] Z Xu W Li L Niu and D Xu ldquoExploiting low-rank structurefrom latent domains for domain generalizationrdquo Lecture Notesin Computer Science (including subseries Lecture Notes in Arti-ficial Intelligence and Lecture Notes in Bioinformatics) Prefacevol 8691 no 3 pp 628ndash643 2014

[36] L NiuW Li D Xu and J Cai ldquoAn Exemplar-BasedMulti-ViewDomain Generalization Framework for Visual RecognitionrdquoIEEE Transactions on Neural Networks and Learning Systems2016

[37] L NiuW Li and D Xu ldquoMulti-view domain generalization forvisual recognitionrdquo in Proceedings of the 15th IEEE InternationalConference on Computer Vision ICCV 2015 pp 4193ndash4201Chile December 2015

[38] T Kobayashi ldquoThree viewpoints toward exemplar SVMrdquo inProceedings of the IEEE Conference on Computer Vision andPatternRecognition CVPR2015 pp 2765ndash2773USA June 2015

[39] S J Pan J T Kwok and Q Yang ldquoTransfer learning via dimen-sionality reductionrdquo in In Proceedings of the AAAI Conferenceon Artificial Intelligence pp 677ndash682 2008

[40] S J Pan I W Tsang J T Kwok and Q Yang ldquoDomainadaptation via transfer component analysisrdquo IEEE TransactionsonNeural Networks and Learning Systems vol 22 no 2 pp 199ndash210 2011

[41] K Saenko B Kulis M Fritz and T Darrell ldquoAdapting visualcategory models to new domainsrdquo in Computer VisionmdashECCV2010 vol 6314 ofLectureNotes inComputer Science pp 213ndash226Springer Berlin Germany 2010

[42] A Krizhevsky I Sutskever andG EHinton ldquoImagenet classifi-cation with deep convolutional neural networksrdquo in Proceedingsof the 26th Annual Conference on Neural Information ProcessingSystems (NIPS rsquo12) pp 1097ndash1105 Lake Tahoe Nev USADecember 2012

[43] VNVapnik Statistical LearningTheory Adaptive and LearningSystems for Signal Processing Communications and ControlWiley- Interscience New York NY USA 1998

[44] B Fernando A Habrard M Sebban and T Tuytelaars ldquoUnsu-pervised visual domain adaptation using subspace alignmentrdquoin Proceedings of the 2013 14th IEEE International Conferenceon Computer Vision ICCV 2013 pp 2960ndash2967 AustraliaDecember 2013

[45] M Long J Wang G Ding J Sun and P S Yu ldquoTransfer jointmatching for unsupervised domain adaptationrdquo in Proceedingsof the 27th IEEE Conference on Computer Vision and PatternRecognition CVPR 2014 pp 1410ndash1417 USA June 2014

[46] C Chang and C Lin ldquoLIBSVM a Library for support vectormachinesrdquo ACM Transactions on Intelligent Systems and Tech-nology vol 2 no 3 article 27 2011

Hindawiwwwhindawicom Volume 2018

MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Mathematical Problems in Engineering

Applied MathematicsJournal of

Hindawiwwwhindawicom Volume 2018

Probability and StatisticsHindawiwwwhindawicom Volume 2018

Journal of

Hindawiwwwhindawicom Volume 2018

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawiwwwhindawicom Volume 2018

OptimizationJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Engineering Mathematics

International Journal of

Hindawiwwwhindawicom Volume 2018

Operations ResearchAdvances in

Journal of

Hindawiwwwhindawicom Volume 2018

Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018

International Journal of Mathematics and Mathematical Sciences

Hindawiwwwhindawicom Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Hindawiwwwhindawicom Volume 2018Volume 2018

Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in

Nature and SocietyHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Dierential EquationsInternational Journal of

Volume 2018

Hindawiwwwhindawicom Volume 2018

Decision SciencesAdvances in

Hindawiwwwhindawicom Volume 2018

AnalysisInternational Journal of

Hindawiwwwhindawicom Volume 2018

Stochastic AnalysisInternational Journal of

Submit your manuscripts atwwwhindawicom

Page 12: Unsupervised Domain Adaptation Using Exemplar-SVMs with ...downloads.hindawi.com/journals/complexity/2018/8425821.pdf · ResearchArticle Unsupervised Domain Adaptation Using Exemplar-SVMs

12 Complexity

Our future works are as follows First we will integratethe training procession of all the classifiers in an ensembleway It is better to accelerate training process by rewritingall the weight into a matrix form This strategy can omitthe process of matrix inversion optimization Second wewant to make a constraint for 120572 that can hold the sparsityAt last we will extend DAESVMs on the problem transferknowledge among domains which have few relationshipssuch as transfer knowledge from image to video or text

Notations and Descriptions

D푆D푇 Sourcetarget domainT푆T푇 Sourcetarget task119889 Dimension of featureX푆X푇 Sourcetarget sample matrixy푆 y푇 Sourcetarget sample label matrixK Kernel matrix without label information120572 Lagrange multipliers vector119899푆 119899푇 The number of sourcetarget domain

instancese Identity vectorI Identity matrix

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This work has been partially supported by grants fromNational Natural Science Foundation of China (nos61472390 71731009 91546201 and 11771038) and the BeijingNatural Science Foundation (no 1162005)

References

[1] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.

[2] R. Collobert and J. Weston, "A unified architecture for natural language processing: deep neural networks with multitask learning," in Proceedings of the 25th International Conference on Machine Learning, pp. 160–167, ACM, July 2008.

[3] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 580–587, Columbus, Ohio, USA, June 2014.

[4] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.

[5] W.-S. Chu, F. De La Torre, and J. F. Cohn, "Selective transfer machine for personalized facial action unit detection," in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2013), pp. 3515–3522, USA, June 2013.

[6] A. Kumar, A. Saha, and H. Daume, "Co-regularization based semi-supervised domain adaptation," in Advances in Neural Information Processing Systems 23, pp. 478–486, 2010.

[7] M. Xiao and Y. Guo, "Feature space independent semi-supervised domain adaptation via kernel matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 1, pp. 54–66, 2015.

[8] S. J. Pan, J. T. Kwok, Q. Yang, and J. J. Pan, "Adaptive localization in a dynamic WiFi environment through multi-view learning," in Proceedings of the 22nd AAAI Conference on Artificial Intelligence and the 19th Innovative Applications of Artificial Intelligence Conference (AAAI-07/IAAI-07), pp. 1108–1113, Canada, July 2007.

[9] A. Van Engelen, A. C. Van Dijk, M. T. B. Truijman et al., "Multi-Center MRI Carotid Plaque Component Segmentation Using Feature Normalization and Transfer Learning," IEEE Transactions on Medical Imaging, vol. 34, no. 6, pp. 1294–1305, 2015.

[10] Y. Zhang, J. Wu, Z. Cai, P. Zhang, and L. Chen, "Memetic Extreme Learning Machine," Pattern Recognition, vol. 58, pp. 135–148, 2016.

[11] M. Uzair and A. Mian, "Blind domain adaptation with augmented extreme learning machine features," IEEE Transactions on Cybernetics, vol. 47, no. 3, pp. 651–660, 2017.

[12] L. Zhang and D. Zhang, "Domain Adaptation Extreme Learning Machines for Drift Compensation in E-Nose Systems," IEEE Transactions on Instrumentation and Measurement, vol. 64, no. 7, pp. 1790–1801, 2015.

[13] A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola, "A kernel method for the two-sample-problem," in Advances in Neural Information Processing Systems 19, B. Schölkopf, J. Platt, and T. Hofmann, Eds., pp. 513–520, 2007.

[14] J. Wu, S. Pan, X. Zhu, C. Zhang, and X. Wu, "Multi-instance Learning with Discriminative Bag Mapping," IEEE Transactions on Knowledge and Data Engineering, pp. 1–1.

[15] J. Wu, S. Pan, X. Zhu, C. Zhang, and P. S. Yu, "Multiple Structure-View Learning for Graph Classification," IEEE Transactions on Neural Networks and Learning Systems, vol. PP, no. 99, pp. 1–16, 2017.

[16] T. Malisiewicz, A. Gupta, and A. A. Efros, "Ensemble of exemplar-SVMs for object detection and beyond," in Proceedings of the 2011 IEEE International Conference on Computer Vision (ICCV 2011), pp. 89–96, Spain, November 2011.

[17] B. Zadrozny, "Learning and evaluating classifiers under sample selection bias," in Proceedings of the Twenty-First International Conference on Machine Learning, p. 114, Banff, Alberta, Canada, July 2004.

[18] W. Li, Z. Xu, D. Xu, D. Dai, and L. Van Gool, "Domain Generalization and Adaptation using Low Rank Exemplar SVMs," IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1.

[19] M. Long, J. Wang, G. Ding, S. J. Pan, and P. S. Yu, "Adaptation regularization: A general framework for transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 5, pp. 1076–1089, 2014.

[20] J. Huang, A. J. Smola, A. Gretton, K. M. Borgwardt, and B. Schölkopf, "Correcting sample selection bias by unlabeled data," in Proceedings of the 20th Annual Conference on Neural Information Processing Systems (NIPS 2006), pp. 601–608, Canada, December 2006.

[21] M. Sugiyama and M. Kawanabe, Machine Learning in Non-Stationary Environments, The MIT Press, 2012.

[22] W. Dai, Q. Yang, G. Xue, and Y. Yu, "Boosting for transfer learning," in Proceedings of the 24th International Conference on Machine Learning (ICML '07), pp. 193–200, New York, NY, USA, June 2007.

[23] R. Aljundi, R. Emonet, D. Muselet, and M. Sebban, "Landmarks-based kernelized subspace alignment for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 56–63, USA, June 2015.

[24] B. Tan, Y. Zhang, S. J. Pan, and Q. Yang, "Distant domain transfer learning," in Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), pp. 2604–2610, USA, February 2017.

[25] W.-S. Chu, F. De La Torre, and J. F. Cohn, "Selective transfer machine for personalized facial expression analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 3, pp. 529–545, 2017.

[26] R. Aljundi, J. Lehaire, F. Prost-Boucle, O. Rouvière, and C. Lartizien, "Transfer learning for prostate cancer mapping based on multicentric MR imaging databases," Lecture Notes in Computer Science, vol. 9487, pp. 74–82, 2015.

[27] M. Long, Transfer Learning: Problems and Methods, Ph.D. thesis, Tsinghua University, 2014.

[28] B. Gong, Y. Shi, F. Sha, and K. Grauman, "Geodesic flow kernel for unsupervised domain adaptation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '12), pp. 2066–2073, June 2012.

[29] J. Blitzer, R. McDonald, and F. Pereira, "Domain adaptation with structural correspondence learning," in Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '06), pp. 120–128, Association for Computational Linguistics, July 2006.

[30] M. Long, Y. Cao, J. Wang, and M. I. Jordan, "Learning transferable features with deep adaptation networks," in Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), pp. 97–105, 2015.

[31] M. Long, J. Wang, and M. I. Jordan, "Deep transfer learning with joint adaptation networks," 2016.

[32] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?" in Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS 2014), pp. 3320–3328, Canada, December 2014.

[33] J. Yang, R. Yan, and A. G. Hauptmann, "Cross-domain video concept detection using adaptive SVMs," in Proceedings of the 15th ACM International Conference on Multimedia (MM '07), pp. 188–197, September 2007.

[34] S. Li, S. Song, and G. Huang, "Prediction reweighting for domain adaption," IEEE Transactions on Neural Networks and Learning Systems, vol. 28, no. 7, pp. 1682–1695, 2017.

[35] Z. Xu, W. Li, L. Niu, and D. Xu, "Exploiting low-rank structure from latent domains for domain generalization," Lecture Notes in Computer Science, vol. 8691, no. 3, pp. 628–643, 2014.

[36] L. Niu, W. Li, D. Xu, and J. Cai, "An Exemplar-Based Multi-View Domain Generalization Framework for Visual Recognition," IEEE Transactions on Neural Networks and Learning Systems, 2016.

[37] L. Niu, W. Li, and D. Xu, "Multi-view domain generalization for visual recognition," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV 2015), pp. 4193–4201, Chile, December 2015.

[38] T. Kobayashi, "Three viewpoints toward exemplar SVM," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 2765–2773, USA, June 2015.

[39] S. J. Pan, J. T. Kwok, and Q. Yang, "Transfer learning via dimensionality reduction," in Proceedings of the AAAI Conference on Artificial Intelligence, pp. 677–682, 2008.

[40] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, "Domain adaptation via transfer component analysis," IEEE Transactions on Neural Networks and Learning Systems, vol. 22, no. 2, pp. 199–210, 2011.

[41] K. Saenko, B. Kulis, M. Fritz, and T. Darrell, "Adapting visual category models to new domains," in Computer Vision—ECCV 2010, vol. 6314 of Lecture Notes in Computer Science, pp. 213–226, Springer, Berlin, Germany, 2010.

[42] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS '12), pp. 1097–1105, Lake Tahoe, Nev, USA, December 2012.

[43] V. N. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, Wiley-Interscience, New York, NY, USA, 1998.

[44] B. Fernando, A. Habrard, M. Sebban, and T. Tuytelaars, "Unsupervised visual domain adaptation using subspace alignment," in Proceedings of the 14th IEEE International Conference on Computer Vision (ICCV 2013), pp. 2960–2967, Australia, December 2013.

[45] M. Long, J. Wang, G. Ding, J. Sun, and P. S. Yu, "Transfer joint matching for unsupervised domain adaptation," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), pp. 1410–1417, USA, June 2014.

[46] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
