HAL Id: hal-00835810
https://hal.inria.fr/hal-00835810
Submitted on 19 Jun 2013

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Good Practice in Large-Scale Learning for Image Classification

To cite this version: Zeynep Akata, Florent Perronnin, Zaid Harchaoui, Cordelia Schmid. Good Practice in Large-Scale Learning for Image Classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers, 2014, 36 (3), pp. 507-520. 10.1109/TPAMI.2013.146. hal-00835810
…accuracy (in %) on Fungus, Ungulate, Vehicle with 10 and 25 training images per class (repeated 100 times) using 4,096-dim BOV.
ranking framework of Usunier et al. [75] (see also our section
3.2) the MUL, RNK and WAR objective functions can be
written as sums of losses over triplets (xi, yi, y). Since the
number of triplets is C times larger than the number of
samples, one could expect the latter objective functions to
have an edge over OVR in the small sample regime. However,
while the RNK objective function is an unweighted sum over
triplets, the MUL and WAR objective functions correspond to
weighted sums that give more importance to the top of the list.
In other words, because of the weighting, the effective number
of triplets is much smaller than the actual number of triplets
in the objective function. For instance, in the case of MUL,
only the top triplet has a non-zero weight which means that
the effective number of triplets equals the number of samples.
This may explain why MUL ultimately does not have an edge
over OVR in the small sample regime. Similarly, the WAR
weighting can be viewed as a smooth interpolation between
the MUL weighting and the RNK weighting. Although the
WAR weighting – which is inversely proportional to the triplet
rank – makes sense for our classification goal, it is by no
means guaranteed to be optimal. While Usunier et al. propose
alternative weight profiles (see section 6 in [75]), we believe
that such weights should be learned from the data in order
to give full strength to the WAR objective function. However,
it is not straightforward to devise a method to learn these
weights from the data. Furthermore, from our experiments,
the optimal weighting profile seems to be dependent on the
number of training samples.
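To make the weighting schemes concrete, here is one possible per-sample reading of the three losses, with the WAR weights taken as 1/j on the j-th ranked violated triplet (an illustrative sketch under our own simplifying assumptions; the exact weight profile of [75] may differ):

```python
import numpy as np

def triplet_losses(scores, y, margin=1.0):
    """Illustrative (non-optimized) per-sample RNK, MUL and WAR losses.

    scores : (C,) array of classifier scores; y : true class index.
    Hinge terms over triplets (x, y, c) for all c != y.
    """
    # hinge violation of each triplet (x, y, c), c != y
    viol = np.maximum(0.0, margin + np.delete(scores, y) - scores[y])
    rnk = float(viol.sum())                          # unweighted sum over all triplets
    mul = float(viol.max()) if viol.size else 0.0    # only the top triplet counts
    # WAR: rank-dependent weights 1, 1/2, 1/3, ... on the sorted violations,
    # interpolating between MUL (top only) and RNK (all equal)
    nz = np.sort(viol[viol > 0])[::-1]
    war = float(np.sum(nz / np.arange(1, nz.size + 1))) if nz.size else 0.0
    return rnk, mul, war
```

Under this reading, the "effective number of triplets" observation is immediate: MUL discards all but one violation, while WAR down-weights all but the top few.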
Influence of Class Density.
We observed in Fig 5 that MUL can perform poorly,
especially on the very challenging Fungus dataset. Note that
this is the densest of the three fine-grained subsets².
To better understand the influence of data density, we first
experimented on synthetic data. The data is composed of 2D
2. The density of a dataset [20] is defined as the mean distance between all pairs of categories, where the distance between two categories in a hierarchy is the height of their lowest common ancestor. Small values imply dense datasets and indicate a more challenging recognition problem.
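For illustration, the density measure of [20] described in the footnote can be sketched for a balanced hierarchy by representing each category as its root-to-leaf path (a toy sketch with hypothetical category names, not the authors' code):

```python
from itertools import combinations

def lca_height(p, q):
    """Height of the lowest common ancestor of two leaf paths
    (root-to-leaf tuples of equal length in a balanced hierarchy)."""
    common = 0
    for a, b in zip(p, q):
        if a != b:
            break
        common += 1
    return len(p) - common

def density(paths):
    """Mean pairwise LCA height over all category pairs [20].
    Small values = dense dataset = harder recognition problem."""
    pairs = list(combinations(paths, 2))
    return sum(lca_height(p, q) for p, q in pairs) / len(pairs)
```

For example, with leaves ("animal", "dog"), ("animal", "cat"), ("vehicle", "car"), the dog/cat pair has LCA height 1 while the cross-branch pairs have height 2, so the density is 5/3.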
[Figure 4 plots: top-1 accuracy (in %) versus passes through the data (1, 2, 5, 10, 20, 50) for (a) Fungus134, (b) Ungulate183 and (c) Vehicle262, each comparing three settings: λ = 1e−4 (1e−5 for Vehicle262) with ηt = 1/(λ(t+t0)); the same λ with fixed step ηt = η = 0.1; and λ = 0 with ηt = η = 0.1.]
Fig. 4. Impact of regularization on w-OVR. Results on Fungus, Ungulate, Vehicle using 130K-dim FVs. One pass here means seeing all positives for a given class plus (on average) β times more negatives.
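Fig. 4 contrasts the decreasing step size ηt = 1/(λ(t+t0)) with a fixed step size, with and without explicit regularization. A minimal sketch of such an SGD loop for one binary hinge-loss classifier follows (the function name and defaults are ours, for illustration only):

```python
import numpy as np

def sgd_ovr(X, y, lam=1e-4, eta=None, t0=10, epochs=5, seed=0):
    """SGD for one binary (one-vs-rest) hinge-loss classifier.

    If eta is None, use the decreasing schedule eta_t = 1/(lam*(t+t0))
    (requires lam > 0); otherwise use the fixed step size eta, where lam
    may be 0 (implicit regularization via early stopping). Sketch only.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):                    # one epoch = one pass
        for i in rng.permutation(len(X)):
            t += 1
            step = eta if eta is not None else 1.0 / (lam * (t + t0))
            if y[i] * X[i].dot(w) < 1.0:       # hinge violation
                w += step * (y[i] * X[i] - lam * w)
            else:
                w -= step * lam * w            # regularization shrinkage only
    return w
```

With λ = 0 and a fixed step, the number of passes (the x-axis of Fig. 4) acts as the effective regularizer.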
[Figure 5 plots: top-1 accuracy (in %) versus number of training images N ∈ {10, 25, 50, 100} for w-OVR, MUL, RNK and WAR on (a) Fungus, (b) Ungulate and (c) Vehicle.]
Fig. 5. Comparison of four objective functions (w-OVR, MUL, RNK and WAR) as a function of the number of training images on Fungus, Ungulate, Vehicle using 4,096-dim BOV.
Fig. 9. ImageNet10K results (top-1 accuracy in %) obtained with w-OVR and 130K-dim Fisher vectors. (a-d) Sample classes ranging from the best-performing ones down to those with accuracy around 50%. (e-h) Sample classes ranging from around 25% accuracy down to the worst-performing ones.
[Figure 10 plots: (a) histogram of the number of classes versus top-1 accuracy (%); (b) percentage of classes versus top-1 accuracy (%).]
Fig. 10. Top-1 accuracies on ImageNet10K. Left: histogram of the top-1 accuracies of all the 10K classes. Right: percentage of classes whose top-1 accuracy is above a threshold.
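The right-hand curve of Fig. 10 is simply the empirical fraction of classes whose accuracy exceeds each threshold, e.g. in NumPy (a trivial illustrative helper, not from the paper):

```python
import numpy as np

def frac_above(accs, thresholds):
    """Fraction of classes whose top-1 accuracy (in %) reaches each
    threshold -- the curve in Fig. 10, right."""
    accs = np.asarray(accs)
    return np.array([(accs >= t).mean() for t in thresholds])
```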
Weston et al., who used a different ImageNet subset in
their experiments, show that WAR outperforms OVR [82] on
BOV descriptors; their OVR baseline did not do any
reweighting of the positives/negatives, i.e., it is similar to our
u-OVR. We also observed that WAR significantly outperforms
u-OVR. However, we show that w-OVR performs significantly
better than u-OVR and slightly better than WAR.
As for [66], Sanchez and Perronnin use w-OVR with the
natural rebalancing (β = 1). We show that selecting β by
cross-validation can have a significant impact on accuracy:
using the same features, we improve their baseline by an
absolute 2.4%, from 16.7% to 19.1%. It is interesting to note
that while rebalancing the data has little impact on the 130K-
dim FV on ILSVRC 2010, it has a significant impact on
ImageNet10K. This is not in contradiction with our previous
statement that different objective functions perform similarly
on high-dimensional features. We believe this is because there
is no such thing as “high-dimensional” features. Features are
only high-dimensional with respect to the complexity of the
problem and especially the number of classes. While 130K-
dim is high-dimensional with respect to the 1K categories
of ILSVRC 2010, it is not high-dimensional anymore with
respect to the 10K categories of ImageNet10K.
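The rebalancing discussed above amounts to controlling how many negatives are sampled per positive during the stochastic training of each one-vs-rest classifier. A hypothetical sketch of such a sampling scheme (names and details ours):

```python
import numpy as np

def sample_epoch(y, cls, beta, rng):
    """Indices for one w-OVR 'pass' for class `cls`: all its positives
    plus (on average) beta times as many negatives, sampled uniformly.
    beta is cross-validated; beta = 1 is the natural rebalancing of [66]."""
    pos = np.flatnonzero(y == cls)
    neg = np.flatnonzero(y != cls)
    n_neg = int(round(beta * len(pos)))
    picked = rng.choice(neg, size=n_neg, replace=True)
    idx = np.concatenate([pos, picked])
    rng.shuffle(idx)  # interleave positives and negatives
    return idx
```

Note that the larger the number of classes, the more imbalanced the raw data becomes, which is consistent with rebalancing mattering more on ImageNet10K than on ILSVRC 2010.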
Le et al. [42] and Krizhevsky et al. [40] also report results
on the same subset of 10K classes. They report respectively
a top-1 per-image accuracy of 19.2% and 32.6%³. Our per-image accuracy is 21.0%. However, we did not optimize for this metric; doing so might lead to better results. Note that our
study is not about comparing FVs to features learned with
deep architectures. We could have used the learned features
of [42], [40] in our study if they had been available, i.e., we
could use as features the output of any of the intermediate layers.

3. While it is standard practice to report per-class accuracy on this dataset (see [20], [66]), [42], [40] report a per-image accuracy. This results in a more optimistic number since those classes which are over-represented in the test data also have more training samples and therefore have (on average) a higher accuracy than those classes which are under-represented. This was clarified through a personal correspondence with the first authors of [42], [40].
Timing for ImageNet10K for 130K-dim FVs. For the com-
putation we used a small cluster of machines with 16 CPUs
and 32GB of RAM. The feature extraction step (including
SIFT description and FV computation) took approx. 250 CPU
days, the learning of the w-OVR SVM approx. 400 CPU days
and the learning of the WAR SVM approx. 500 CPU days.
Note that w-OVR performs slightly better than WAR and is
much easier to parallelize since the classifiers can be learned
independently.
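The parallelization argument is that one-vs-rest classifiers share no state, so each per-class job can be farmed out independently to a machine or core. A schematic sketch with a placeholder training function (both names are ours; real jobs would be CPU-bound and dispatched to separate processes or machines):

```python
from concurrent.futures import ThreadPoolExecutor

def train_one_class(cls):
    """Placeholder for training the binary SVM of one class; each such
    job is independent of all others."""
    return cls, f"model-{cls}"  # hypothetical trained model handle

def train_all(classes, workers=4):
    # OVR classifiers share no state, so they parallelize trivially.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return dict(ex.map(train_one_class, classes))
```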
6 CONCLUSION
In this work, we have studied visual classification at a large
scale, i.e., when we have to deal with a large number of classes,
a large number of images and high-dimensional features.
Two main conclusions have emerged from our work. The
first one is that, despite its theoretical suboptimality, one-vs-
rest is a very competitive training strategy to learn SVMs.
Furthermore, one-vs-rest SVMs are easy to implement and to
parallelize, e.g. by training the different classifiers on multiple
machines/cores. However, to obtain state-of-the-art results,
properly cross-validating the imbalance between positive and
negative samples is a must. The second major conclusion
is that stochastic training is very well suited to our large-scale
setting. Moreover, simple strategies such as implicit
regularization with early stopping and fixed-step-size updates
work well in practice. Following these good practices, we were
able to improve the state-of-the-art on a large subset of 10K
classes and 9M images from 16.7% Top-1 accuracy to 19.1%.
Acknowledgements. The INRIA LEAR team acknowledges
financial support from the QUAERO project supported by
OSEO, the French State agency for innovation, the European
integrated project AXES, the ERC grant ALLEGRO and the
GARGANTUA project funded by the Mastodons program of
CNRS. The authors wish to warmly thank Matthijs Douze
and Mattis Paulin for providing a public implementation of
the online learning code described in this article.
REFERENCES
[1] E. L. Allwein, R. E. Schapire, and Y. Singer. Reducing multiclass to binary: A unifying approach for margin classifiers. In ICML, 2000.
[2] B. Bai, J. Weston, D. Grangier, R. Collobert, O. Chapelle, and K. Weinberger. Supervised semantic indexing. In CIKM, 2009.
[3] P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. In NIPS, 2003.
[4] S. Bengio, J. Weston, and D. Grangier. Label embedding trees for large multi-class tasks. In NIPS, 2010.
[5] Y. Bengio, A. Courville, and P. Vincent. Representation learning: a review and new perspectives.
[6] A. Berg, J. Deng, and L. Fei-Fei. ILSVRC 2010. http://www.image-net.org/challenges/LSVRC/2010/index.
[7] A. Bergamo, L. Torresani, and A. Fitzgibbon. PICODES: Learning a compact code for novel-category recognition. In NIPS, 2011.
[8] A. Beygelzimer, V. Dani, T. P. Hayes, J. Langford, and B. Zadrozny. Error limiting reductions between classification tasks. In ICML, 2005.
[9] A. Bordes, L. Bottou, P. Gallinari, and J. Weston. Solving multiclass support vector machines with LaRank. In ICML, 2007.
[10] L. Bottou. SGD. http://leon.bottou.org/projects/sgd.
[11] L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In NIPS, 2007.
[12] Y.-L. Boureau, F. Bach, Y. LeCun, and J. Ponce. Learning mid-level features for recognition. In CVPR, 2010.
[13] G. Burghouts and J.-M. Geusebroek. Performance evaluation of local colour invariants. CVIU, 2009.
[14] P. K. Chan and S. J. Stolfo. On the accuracy of meta-learning for scalable data mining. JIIS, 1997.
[15] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM TIST, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
[16] K. Chatfield, V. Lempitsky, A. Vedaldi, and A. Zisserman. The devil is in the details: an evaluation of recent feature encoding methods. In BMVC, 2011.
[17] K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2002.
[18] G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In ECCV SLCV workshop, 2004.
[19] J. Dean, G. Corrado, R. Monga, M. Devin, Q. Le, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Ng. Large scale distributed deep networks. In NIPS, 2012.
[20] J. Deng, A. Berg, K. Li, and L. Fei-Fei. What does classifying more than 10,000 image categories tell us? In ECCV, 2010.
[21] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
[22] J. Deng, S. Satheesh, A. Berg, and L. Fei-Fei. Fast and balanced: efficient label tree learning for large scale object recognition. In NIPS, 2011.
[23] T. Deselaers and V. Ferrari. Visual and semantic similarity in ImageNet. In CVPR, 2011.
[24] T. G. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. JAIR, 1995.
[25] M. Everingham, L. V. Gool, C. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. IJCV, 2010.
[26] R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. JMLR, 2008.
[27] J. Farquhar, S. Szedmak, H. Meng, and J. Shawe-Taylor. Improving bag-of-keypoints image categorisation. Technical report, University of Southampton, 2005.
[28] V. Franc and S. Sonnenburg. Optimized cutting plane algorithm for support vector machines. In ICML, 2008.
[29] T. Gao and D. Koller. Discriminative learning of relaxed hierarchy for large-scale visual recognition. In ICCV, 2011.
[30] J. Gehrke, R. Ramakrishnan, and V. Ganti. RainForest - a framework for fast decision tree construction of large datasets. DMKD, 2000.
[31] Y. Gong and S. Lazebnik. Comparing data-dependent and data-independent embeddings for classification and ranking of internet images. In CVPR, 2011.
[32] D. Grangier, F. Monay, and S. Bengio. A discriminative approach for the retrieval of images from text queries. In ECML, 2006.
[33] C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In ICML, 2008.
[34] K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun. What is the best multi-stage architecture for object recognition? In ICCV, 2009.
[35] H. Jegou, M. Douze, and C. Schmid. Product quantization for nearest neighbor search. IEEE TPAMI, 2011.
[36] H. Jegou, F. Perronnin, M. Douze, J. Sanchez, P. Perez, and C. Schmid. Aggregating local image descriptors into compact codes. IEEE TPAMI, accepted.
[37] T. Joachims. Making large-scale support vector machine learning practical. In Advances in kernel methods, 1999.
[38] T. Joachims. Optimizing search engines using clickthrough data. In ACM SIGKDD, pages 133-142. ACM, 2002.
[39] T. Joachims. Training linear SVMs in linear time. In ACM SIGKDD, 2006.
[40] A. Krizhevsky, I. Sutskever, and G. Hinton. Image classification with deep convolutional neural networks. In NIPS, 2012.
[41] S. Lazebnik, C. Schmid, and J. Ponce. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In CVPR, 2006.
[42] Q. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. Corrado, J. Dean, and A. Ng. Building high-level features using large scale unsupervised learning. In ICML, 2012.
[43] Y. LeCun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, and L. Jackel. Handwritten digit recognition with a back-propagation network. In NIPS, 1989.
[44] Y. LeCun, L. Bottou, G. Orr, and K. Muller. Efficient backprop. In Neural Networks: Tricks of the trade. Springer, 1998.
[45] Y. LeCun, F. Huang, and L. Bottou. Learning methods for generic object recognition with invariance to pose and lighting. In CVPR, 2004.
[46] T. Lee, Y. Lin, and G. Wahba. Multicategory support vector machines: Theory and application to the classification of microarray data and satellite radiance data. JASA, 2004.
[47] L. Li, H. Su, E. Xing, and L. Fei-Fei. Object bank: A high-level image representation for scene classification and semantic feature sparsification. In NIPS, 2010.
[48] Y. Lin, F. Lv, S. Zhu, M. Yang, T. Cour, K. Yu, L. Cao, and T. Huang. Large-scale image classification: Fast feature extraction and SVM training. In CVPR, 2011.
[49] Y. Lin, F. Lv, S. Zhu, K. Yu, M. Yang, and T. Cour. Large-scale image classification: fast feature extraction and SVM training. In CVPR, 2011.
[50] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.
[51] S. Maji and A. Berg. Max-margin additive classifiers for detection. In ICCV, 2009.
[52] M. Marszalek and C. Schmid. Constructing category hierarchies for visual recognition. In ECCV, 2008.
[53] M. Mehta, R. Agrawal, and J. Rissanen. SLIQ: A fast scalable classifier for data mining. In EDBT, 1996.
[54] S. Nowozin and C. Lampert. Structured Learning and Prediction in Computer Vision. Foundations and Trends in Computer Graphics and Vision, 2011.
[55] F. Perronnin, Z. Akata, Z. Harchaoui, and C. Schmid. Towards good practice in large-scale learning for image classification. In CVPR, 2012.
[56] F. Perronnin and C. Dance. Fisher kernels on visual vocabularies for image categorization. In CVPR, 2007.
[57] F. Perronnin, J. Sanchez, and Y. Liu. Large-scale image categorization with explicit data embedding. In CVPR, 2010.
[58] F. Perronnin, J. Sanchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In ECCV, 2010.
[59] J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in kernel methods, 1999.
[60] M. Rastegari, C. Fang, and L. Torresani. Scalable object-class retrieval with approximate and top-k ranking. In ICCV, 2011.
[61] R. Rifkin and A. Klautau. In defense of one-vs-all classification. JMLR, 2004.
[62] M. Rohrbach, M. Stark, and B. Schiele. Evaluating knowledge transfer and zero-shot learning in a large-scale setting. In CVPR, 2011.
[63] L. Rokach and O. Maimon. Top-down induction of decision trees classifiers - a survey. IEEE TSMC, 2005.
[64] B. C. Russell, A. Torralba, K. P. Murphy, and W. T. Freeman. LabelMe: A database and web-based tool for image annotation. IJCV, 2008.
[65] S. L. Salzberg. On comparing classifiers: Pitfalls to avoid and a recommended approach. DMKD.
[66] J. Sanchez and F. Perronnin. High-dimensional signature compression for large-scale image classification. In CVPR, 2011.
[67] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In ICML, 2007.
[68] D. J. Sheskin. Handbook of Parametric and Nonparametric Statistical Procedures. Chapman & Hall/CRC, 4th edition, 2007.
[69] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, 2003.
[70] D. Tao, X. Tang, X. Li, and X. Wu. Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval. IEEE TPAMI, 28(7), 2006.
[71] A. Tewari and P. L. Bartlett. On the consistency of multiclass classification methods. JMLR, 2007.
[72] A. Torralba, R. Fergus, and W. Freeman. 80 million tiny images: a large dataset for non-parametric object and scene recognition. IEEE TPAMI, 2008.
[73] L. Torresani, M. Szummer, and A. Fitzgibbon. Efficient object category recognition using classemes. In ECCV, 2010.
[74] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. JMLR, 2005.
[75] N. Usunier, D. Buffoni, and P. Gallinari. Ranking with ordered weighted pairwise classification. In ICML, 2009.
[76] J. C. van Gemert, C. J. Veenman, A. W. M. Smeulders, and J. M. Geusebroek. Visual word ambiguity. IEEE TPAMI, 2010.
[77] A. Vedaldi and A. Zisserman. Efficient additive kernels via explicit feature maps. In CVPR, 2010.
[78] A. Vedaldi and A. Zisserman. Sparse kernel approximations for efficient classification and detection. In CVPR, 2012.
[79] V. Vural and J. G. Dy. A hierarchical method for multi-class support vector machines. In ICML, 2004.
[80] G. Wang, D. Hoiem, and D. Forsyth. Learning image similarity from Flickr groups using stochastic intersection kernel machines. In ICCV, 2009.
[81] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong. Locality-constrained linear coding for image classification. In CVPR, 2010.
[82] J. Weston, S. Bengio, and N. Usunier. Large scale image annotation: Learning to rank with joint word-image embeddings. In ECML, 2010.
[83] J. Weston, S. Bengio, and N. Usunier. WSABIE: Scaling up to large vocabulary image annotation. In IJCAI, 2011.
[84] J. Weston and C. Watkins. Multi-class support vector machines. Technical report, Department of Computer Science, Royal Holloway, University of London, 1998.
[85] J. Weston and C. Watkins. Support vector machines for multi-class pattern recognition. In ESANN, 1999.
[86] J. Xu, T. Liu, M. Lu, H. Li, and W. Ma. Directly optimizing evaluation measures in learning to rank. In SIGIR, 2008.
[87] J. Yang, K. Yu, Y. Gong, and T. S. Huang. Linear spatial pyramid matching using sparse coding for image classification. In CVPR, 2009.
[88] Y. Yue, T. Finley, F. Radlinski, and T. Joachims. A support vector method for optimizing average precision. In SIGIR, 2007.
[89] B. Zhao, L. Fei-Fei, and E. Xing. Large-scale category structure aware image categorization. In NIPS, 2011.
[90] Z. Zhou, K. Yu, T. Zhang, and T. Huang. Image classification using super-vector coding of local image descriptors. In ECCV, 2010.
Zeynep Akata received her MSc degree in 2010 from the Media Informatics department of RWTH Aachen University, Germany. She is currently working on her PhD in the CV team of the Xerox Research Centre Europe and the LEAR team at INRIA Grenoble. She is working on image classification for large scale datasets and is interested in learning methods for classification. She is a student member of IEEE.

Florent Perronnin holds an Engineering degree from the Ecole Nationale Superieure des Telecommunications and a Ph.D. degree from the Ecole Polytechnique Federale de Lausanne. In 2005, he joined the Xerox Research Centre Europe in Grenoble where he currently manages the CV team. His main interests are in the application of machine learning to computer vision tasks such as image classification, retrieval or segmentation.

Zaid Harchaoui graduated from the Ecole Nationale Superieure des Mines, France, in 2004, and obtained his Ph.D. degree from ParisTech, Paris, France. Since 2010, he has been a permanent researcher in the LEAR team, INRIA Grenoble, France. His research interests include statistical machine learning, kernel-based methods, signal processing, and computer vision.

Cordelia Schmid holds an M.S. degree in computer science from the University of Karlsruhe and a doctorate from the Institut National Polytechnique de Grenoble. She is a research director at INRIA Grenoble where she directs the LEAR team. In 2006, she was awarded the Longuet-Higgins prize for fundamental contributions in computer vision that have withstood the test of time. In 2012, she obtained an ERC advanced grant for "Active large-scale learning for visual recognition". She is a fellow of IEEE.