Pattern Recognition 75 (2018) 302–314
Contents lists available at ScienceDirect
Pattern Recognition
journal homepage: www.elsevier.com/locate/patcog

OPML: A one-pass closed-form solution for online metric learning

Wenbin Li a, Yang Gao a,∗, Lei Wang b, Luping Zhou b, Jing Huo a, Yinghuan Shi a

a National Key Laboratory for Novel Software Technology, Nanjing University, China
b School of Computing and Information Technology, University of Wollongong, Australia
Article info
Article history:
Received 28 September 2016
Revised 12 January 2017
Accepted 8 March 2017
Available online 9 March 2017
Keywords:
One-pass
Online metric learning
Triplet construction
Face verification
Abnormal event detection
Abstract
To achieve a low computational cost when performing online metric learning for large-scale data, we present a one-pass closed-form solution, namely OPML, in this paper. The proposed OPML first adopts a one-pass triplet construction strategy, which aims to use only a very small number of triplets to approximate the representation ability of the full set of triplets obtained by batch-manner methods. Then, OPML employs a closed-form solution to update the metric for each newly arriving sample, which leads to low space (i.e., O(d)) and time (i.e., O(d^2)) complexity, where d is the feature dimensionality. In addition, an extension of OPML (namely COPML) is further proposed to enhance robustness when, in practice, the first several samples come from the same class (i.e., the cold start problem). In the experiments, we systematically evaluate our methods (OPML and COPML) on three typical tasks, including UCI data classification, face verification, and abnormal event detection in videos, so as to fully evaluate the proposed methods under different sample numbers, different feature dimensionalities and different feature extraction schemes (i.e., hand-crafted and deeply-learned). The results show that OPML and COPML obtain promising performance at a very low computational cost. Also, the effectiveness of COPML under the cold start setting is experimentally verified.
The rest of the methods were implemented by ourselves. The parameters of these methods were selected by cross-validation, except LMNN and ITML, which use the default settings. Moreover, a parameter sensitivity experiment is also conducted to verify the influence on OPML/COPML of changing the value of parameter γ (for OPML) and γ1, γ2 (for COPML) (see Fig. 3). For OPML, setting γ to 10^−3 seems to be a good choice, and this value is indeed chosen on all twelve UCI datasets in our experiment. For COPML, setting γ1 to 10^−4 or 10^−3 seems to be a good choice when γ2 = 10^−3 (the same setting as OPML, for simplicity). Since the pairwise or triplet constraints of POLA, LEGO and SOML need to be constructed in advance, we randomly sample 10,000 constraints for these three methods (the same setting as LEGO [10]). The error rates of the proposed methods and the competing methods are presented in Table 2.
Moreover, the p-values of Student's t-test were calculated to check statistical significance. Also, the win/tie/loss statistics are reported according to the obtained p-values (see Table 2). It is observed that (1) the performance of our methods is comparable to LEGO, and slightly better than the other OML methods; (2) the performance of our methods is close to that of batch metric learning methods, e.g., LMNN and ITML, and better than Euclidean and Mahalanobis; (3) our methods are faster than the other OML methods and comparable with RDML, since instead of constructing triplets, RDML only requires a pairwise constraint, receiving a pair of samples at each time step.
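The win/tie/loss bookkeeping described above can be sketched as follows. The per-fold error rates below are illustrative placeholders (not the paper's numbers), and the hard-coded 5% critical value assumes 5 folds (4 degrees of freedom):

```python
import numpy as np

# Hypothetical per-fold error rates of two methods (placeholder values,
# not taken from Table 2).
errors_ours = np.array([0.12, 0.10, 0.11, 0.13, 0.12])
errors_base = np.array([0.14, 0.13, 0.12, 0.15, 0.14])

# Paired (Student's) t-statistic on the fold-wise differences
diff = errors_ours - errors_base
t_stat = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))

# Two-sided 5% critical value of the t-distribution with 4 dof
T_CRIT = 2.776

if abs(t_stat) < T_CRIT:
    outcome = "tie"        # difference not statistically significant
elif errors_ours.mean() < errors_base.mean():
    outcome = "win"        # significantly lower error rate
else:
    outcome = "loss"
```

Summing these outcomes over all datasets yields the win/tie/loss counts reported in Table 2.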
To illustrate the effect of different numbers of triplet constraints on the learning of the metric, we vary the number of triplet constraints over (100, 1000, 2000, 5000, 10000, 15000, 20000) for OASIS and SOML (see Fig. 2). Since the number of triplet constraints in OPML is a constant under the one-pass triplet construction, we can find that OPML achieves better performance with far fewer triplet constraints (except on UCI datasets 1, 3 and 6).
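The constant triplet count of OPML comes from its one-pass construction. A minimal sketch of one plausible one-pass scheme follows: each arriving sample is paired with the most recently seen sample of its own class (positive) and a stored sample of a different class (negative), so at most one triplet is built per sample. Whether this matches OPML's exact selection rule is an assumption; the function name is illustrative.

```python
def one_pass_triplets(stream):
    """Build at most one triplet per arriving (sample, label) pair,
    in a single pass over the stream."""
    latest = {}    # class label -> most recently seen sample of that class
    triplets = []
    for x, y in stream:
        pos = latest.get(y)  # most recent same-class sample
        # any stored sample of a different class serves as the negative
        neg = next((s for c, s in latest.items() if c != y), None)
        if pos is not None and neg is not None:
            triplets.append((x, pos, neg))
        latest[y] = x
    return triplets

# A tiny two-class stream: only samples 3 and 4 can form triplets
trips = one_pass_triplets([(1, "a"), (2, "b"), (3, "a"), (4, "b")])
```

With n samples this yields at most n triplets, in contrast to the O(n^2) or more constraints a batch construction can produce.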
5.2. Face verification: PubFig
Face verification has become a popular application for evaluating the performance of metric learning methods. Many metric learning methods [24–26] have been proposed for this practical application, including deep metric learning methods [27–29]. For face verification, we first evaluate our methods on the Public Figures Face Database (PubFig) [30]. The PubFig dataset consists of two subsets: a Development Set (7650 images of 60 individuals) and an Evaluation Set (28,954 images of 140 individuals). Following [30],
[Fig. 2 panels: error rate (%) vs. number of triplets for OPML, OASIS and SOML on UCI datasets 1–12; the vertical marks x = 59, 72, 87, 103, 132, 172, 310, 340, 380, 1148, 2498 and 2801 indicate the constant number of triplets used by OPML on each dataset.]
Fig. 2. Error rates of different methods with different numbers of triplets on twelve UCI datasets (the number of triplets in OPML is a constant).
Fig. 3. Error rates of OPML/COPML on the UCI datasets when changing the value of parameter γ (for OPML) and γ1, γ2 (for COPML); the last three datasets are specially constructed for the cold start issue in Section 5.5.
we use the development set to develop all these methods, including parameter tuning, while the evaluation set is used for performance evaluation. The goal of face verification in PubFig is to determine whether a pair of face images belongs to the same person. Please note that images coming from the same person are regarded as belonging to the same class. For all subsets, 10-fold cross-validation is adopted to conduct the experiments, and each fold is disjoint by identity (i.e., one person will not appear in both the training and testing sets). For testing each fold (with the remaining 9 folds used for training), we randomly sample 10,000 pairs (5000 intra- and 5000 extra-personal pairs) for testing. Thus, the total number of pairs is 10^5. In each training phase, we also randomly select 10,000 pairwise or triplet constraints for LEGO, POLA and SOML, with the same settings as on the UCI datasets.
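The per-fold pair sampling described above can be sketched as follows; `sample_pairs` and its arguments are illustrative names, not from the paper:

```python
import random

def sample_pairs(images_by_person, n_pairs, seed=0):
    """Sample n_pairs/2 intra- and n_pairs/2 extra-personal pairs
    from a dict mapping person id -> list of image ids."""
    rng = random.Random(seed)
    # only people with at least two images can form an intra pair
    people = [p for p, imgs in images_by_person.items() if len(imgs) >= 2]
    intra, extra = [], []
    while len(intra) < n_pairs // 2:
        p = rng.choice(people)
        a, b = rng.sample(images_by_person[p], 2)
        intra.append((a, b, 1))   # same person -> positive pair
    while len(extra) < n_pairs // 2:
        p, q = rng.sample(list(images_by_person), 2)
        a = rng.choice(images_by_person[p])
        b = rng.choice(images_by_person[q])
        extra.append((a, b, 0))   # different people -> negative pair
    return intra + extra

imgs = {p: [f"{p}_{i}" for i in range(3)] for p in "ABC"}
pairs = sample_pairs(imgs, 10)
```

Because folds are disjoint by identity, running this on one fold's people guarantees no person leaks between training and testing pairs.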
For sufficient and fair comparison, we use two forms of features (i.e., attribute features and deep features) to evaluate the performance of all algorithms, respectively. The attribute features (73-dimensional) provided by Kumar et al. [30] are 'high-level' features describing nameable attributes of a face image, such as gender, race, age and hair. For deep features, we use a VGG-Face model
Fig. 4. ROC curves on the development set (left column) and evaluation set (right column) of the PubFig dataset. First row: attribute features; second row: deep features. The AUC value of each method is presented in brackets.
[31] to extract a 4096-dimensional feature for each face image, which has been aligned and cropped. For easier handling, the 4096-dimensional feature is reduced to a 54-dimensional feature by the Principal Component Analysis (PCA) algorithm.
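The PCA reduction step can be sketched with a plain SVD, assuming the usual centre-then-project recipe (the 4096 → 54 setting is from the text; the smaller dimensions below are just for a quick run):

```python
import numpy as np

def pca_reduce(X, k):
    """Project n x d data onto its top-k principal components."""
    Xc = X - X.mean(axis=0)                       # center the data
    # right singular vectors of the centered data are the principal axes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                          # n x k projection

# Stand-in for the VGG-Face features (200 images, 128-d instead of 4096-d)
X = np.random.default_rng(0).standard_normal((200, 128))
Z = pca_reduce(X, 54)
```

In practice the projection matrix would be fit on the training fold only and then applied unchanged to test images.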
For each testing pair, we first calculate the distance (similarity) between the two images using the learned metric obtained from the respective methods. Then, all the distances (similarities) are normalized into the range [0, 1]. Receiver Operating Characteristic (ROC) curves are provided in Fig. 4, with the corresponding AUC (Area Under ROC) values calculated. It can be observed that OPML and COPML obtain superior results compared with the state-of-the-art online/batch metric learning methods. Moreover, although the deep features already have strong representation ability, our proposed methods can still slightly improve the performance.
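The AUC values reported in Fig. 4 can be computed without tracing the full ROC curve, via the Mann–Whitney rank statistic (a standard equivalence; this sketch ignores tied scores):

```python
import numpy as np

def auc_score(labels, scores):
    """AUC from the rank-sum of positive-pair scores.

    labels: 1 for intra-personal pairs, 0 for extra-personal pairs.
    scores: higher = more similar (e.g., 1 - normalized distance).
    """
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)   # ranks 1..n, ascending
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    # fraction of (positive, negative) pairs ranked in the right order
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

labels = np.array([1, 1, 0, 0])
perfect = auc_score(labels, np.array([0.9, 0.8, 0.2, 0.1]))
inverted = auc_score(labels, np.array([0.1, 0.2, 0.8, 0.9]))
```

A perfect ranking gives 1.0 and a fully inverted one gives 0.0, matching the chance level of 0.5 in between.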
5.3. Face verification: LFW
For face verification, we also evaluate our methods on the Labeled Faces in the Wild Database (LFW) [32]. LFW is a widely used face verification benchmark with unconstrained images, containing 13,233 images of 5749 individuals. This dataset has two views: View 1 is used for development purposes (containing a training set and a test set), and View 2 is taken as the evaluation benchmark for comparison (i.e., a 10-fold cross-validation set). There are two configurations in both views, namely the image restricted configuration and the image unrestricted configuration. In the first configuration, the training information is restricted to the provided image pairs, and additional information such as actual name information cannot be used. In other words, we can only use the image pairs for training, without any label information. In the second configuration, the actual name information (i.e., label information) can be used, and as many pairs or triplets can be formed as one desires. No matter which configuration is chosen, the test procedure is the same (i.e., using image pairs for testing). In order to simulate a real online environment, and because our methods and some competing methods (e.g., OASIS [6], SOML [7]) are triplet-based, we adopt the image unrestricted configuration for this experiment. We use View 1 for parameter tuning and then evaluate the performance of all the algorithms on each fold (300 intra- and 300 extra-personal pairs) in View 2. Other settings are similar to those on the PubFig dataset.
In this experiment, we adopt two types of features (i.e., SIFT features and attribute features) to represent each face image, respectively. The SIFT features are provided by Guillaumin et al. [33], who extract SIFT descriptors [34] at 9 fixed facial landmarks detected on a face, over three scales. We then perform the PCA algorithm to reduce the original 3456-dimensional feature to a 100-dimensional feature. As with PubFig, the attribute features of LFW are 73-dimensional 'high-level' features describing the nameable attributes of a face image [30]. To evaluate our methods and the competing methods, we report the ROC curves and AUC values of the corresponding methods (see Fig. 5). The results of ITML [12] are not displayed due to its difficulty of convergence on the training data. We can see that the proposed COPML method achieves state-of-the-art performance compared with the competing metric learning
Fig. 5. ROC curves of our methods and competing methods on the LFW dataset. Left: SIFT features; right: attribute features. The AUC value of each method is presented in brackets.
methods. In particular, when using SIFT features, our methods significantly improve the AUC value over the Euclidean distance by 13% (5.8% with attribute features), showing the validity of the proposed methods. It is worth noting that some metric learning methods cannot even improve over the Euclidean distance, which also happened on the PubFig dataset. The reason why LMNN cannot achieve the best performance may be over-fitting due to the lack of regularization.
5.4. Abnormal event detection in videos
The performance of the proposed methods is also evaluated on the UMN dataset for abnormal event detection. The UMN dataset contains 3 different scenes with 7739 frames in total: Scene1 (1453 frames), Scene2 (4144 frames) and Scene3 (2142 frames). In the UMN dataset, people walking around is considered normal, while people running away is regarded as abnormal. The resolution of the video is 320 × 240. We divide each frame into 5 × 4 non-overlapping 64 × 60 patches. For each patch, the MHOF (Multi-scale Histogram of Optical Flow) feature [35] is extracted from every two successive frames. The MHOF is a 16-dimensional feature, which can capture both motion direction and motion energy. To integrate the multi-patch features, we combine the features from all patches in each frame, forming a 320-dimensional feature. For each scene, we perform 2-fold cross-validation for evaluation. The distance metric is learnt from the training data in an online way, and then we use an SVM classifier to classify the testing frames after feature transformation using the learned metric L.
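The patch-wise feature extraction above can be sketched as follows, using a plain magnitude-weighted orientation histogram as a simplified stand-in for the 16-d MHOF of [35] (the real MHOF additionally separates motion-energy scales, so treat the histogram details as an assumption):

```python
import numpy as np

def frame_feature(flow_angle, flow_mag, bins=16):
    """Concatenate per-patch orientation histograms (weighted by flow
    magnitude) over a 240 x 320 frame split into 4 x 5 patches of 60 x 64."""
    H, W = flow_angle.shape              # expected 240 x 320
    feats = []
    for top in range(0, H, 60):          # 4 rows of patches
        for left in range(0, W, 64):     # 5 columns of patches
            ang = flow_angle[top:top + 60, left:left + 64]
            mag = flow_mag[top:top + 60, left:left + 64]
            hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi),
                                   weights=mag)
            feats.append(hist)
    return np.concatenate(feats)         # 20 patches x 16 bins = 320 dims

rng = np.random.default_rng(1)
ang = rng.uniform(0, 2 * np.pi, (240, 320))   # stand-in flow orientations
mag = np.ones((240, 320))                     # stand-in flow magnitudes
f = frame_feature(ang, mag)
```

The 320-dimensional frame vectors are then what the learned metric L transforms before the SVM stage.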
Table 3 reports the AUC of all the methods. We can see that our methods are very effective and competitive when compared with the other methods. Fig. 6 exhibits sample frames of normal and abnormal events in the 3 scenes respectively (top row), and shows the abnormal event detection results of our method (COPML) in the indication bars (green/red indicates a normal/abnormal event). It is worth mentioning that in this experiment COPML performs better than OPML, because the video data exhibits the cold start issue, especially at the beginning.
5.5. COPML for cold start
We can observe that in cases free of the cold start issue (e.g., UCI data classification, face verification), OPML and COPML obtain comparable results, while in cases with the cold start issue (e.g., abnormal event detection in videos), COPML is better than OPML. To further test the performance of COPML in an extreme case with the cold start issue, we construct several datasets with a specified structure to compare the performance of COPML and OPML. Three datasets were picked from the UCI repository: (1) Image Segmentation (seg for short), with 7 classes, 19 features and 2310 samples; (2) EEG Eye State (eeg for short), with 2 classes, 15 features and 14,980 samples; (3) Sensorless (sen for short), with 11 classes, 49 features and 58,509 samples.
For each dataset, the samples from different classes are divided into disjoint 10/5/2 parts, and the different parts of different classes are then crosswise put together to construct a new dataset. Afterwards, the new dataset is divided into 2 folds. The first fold is used for training and the second fold for testing. As in the previous classification setting, we take a k-NN (k = 5) classifier to obtain the final test results, shown in Table 4. The results show that COPML
[Fig. 6 panels: normal and abnormal sample frames for Scene1–Scene3 (top row), with ground-truth and detected indication bars below.]
Fig. 6. Global abnormal event detection results of our method COPML and the ground truth on the UMN dataset. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
performs better than OPML when the data has the cold start issue, since when a cold start occurs, COPML incorporates both the pair and triplet information, instead of using only triplets as in OPML.
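The crosswise construction described above can be sketched as follows; the function and variable names are illustrative, and the chunking rule (equal-sized parts, laid out class-by-class within each part) is an assumption consistent with the text:

```python
def crosswise_dataset(samples_by_class, n_parts):
    """Divide each class into n_parts disjoint chunks, then lay the chunks
    out crosswise (A1, B1, ..., A2, B2, ...), so the stream opens with a
    long single-class run -- the cold-start condition."""
    chunked = {}
    for label, samples in samples_by_class.items():
        step = -(-len(samples) // n_parts)   # ceil division
        chunked[label] = [samples[i:i + step]
                          for i in range(0, len(samples), step)]
    stream = []
    for part in range(n_parts):
        for label, chunks in chunked.items():
            if part < len(chunks):
                stream.extend((x, label) for x in chunks[part])
    return stream

# Two classes, two parts: the stream starts with a run of class 'a'
stream = crosswise_dataset({"a": [0, 1, 2, 3], "b": [10, 11, 12, 13]}, 2)
```

The opening single-class run is exactly the situation where OPML cannot form any triplet, while COPML can still exploit pairwise information.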
6. Conclusion
We propose a one-pass closed-form solution for OML, namely OPML. It employs one-pass triplet construction for fast triplet generation, together with a closed-form solution to update the metric with the newly arriving sample at each time step. Also, for the cold start issue, COPML, an extended version of OPML, is developed. The major advantages of our methods are: OPML and COPML are easy to implement, and they are very scalable, with low space (i.e., O(d)) and time (i.e., O(d^2)) complexity. In the experiments, we show that our methods obtain superior performance on three typical tasks, compared with the state-of-the-art methods.
Acknowledgments

This work is supported by the National NSF of China (Nos. 61432008, 61673203, U1435214), the Australian Research Council (ARC) (No. DE160100241), the Primary R&D Plan of Jiangsu Province, China (No. BE2015213), and the Collaborative Innovation Center of Novel Software Technology and Industrialization.
Appendix A. Proof of Theorem 2

Proof. Recall that the metric update formula of OPML is

L_t = \begin{cases} L_{t-1}(I + \gamma A_t)^{-1} & [z]_+ > 0 \\ L_{t-1} & [z]_+ = 0. \end{cases} \quad (A.1)

According to Theorem 1, we can obtain that

(I + \gamma A_t)^{-1} = I - \frac{1}{\eta + \beta}\big[\eta\gamma A_t - (\gamma A_t)^2\big], \quad (A.2)

where \eta = 1 + \mathrm{tr}(\gamma A_t) and \beta = \frac{1}{2}\big[(\mathrm{tr}(\gamma A_t))^2 - \mathrm{tr}((\gamma A_t)^2)\big]. Here, we only consider the case that [z]_+ > 0. Then at the t-th time step, the learned metric L_t of the one-pass strategy can be expressed as below:

L_t = L_0 (I + \gamma A_1)^{-1} (I + \gamma A_2)^{-1} \cdots (I + \gamma A_t)^{-1}. \quad (A.3)
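The closed-form inverse in (A.2) is exact because A_t has rank at most 2 (so η + β = (1 + λ1)(1 + λ2) for its two possibly nonzero eigenvalues). This can be checked numerically on a random triplet; the variable names below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, gamma = 10, 1e-3
x_t, x_p, x_q = rng.standard_normal((3, d))

# Rank-2 update matrix A_t built from one triplet <x_t, x_p, x_q>
u, v = x_t - x_p, x_t - x_q
A = np.outer(u, u) - np.outer(v, v)

M = gamma * A
eta = 1.0 + np.trace(M)
beta = 0.5 * (np.trace(M) ** 2 - np.trace(M @ M))

# Closed-form inverse from Theorem 1 (valid because rank(A) <= 2)
inv_closed = np.eye(d) - (eta * M - M @ M) / (eta + beta)
inv_direct = np.linalg.inv(np.eye(d) + M)

assert np.allclose(inv_closed, inv_direct)
```

Avoiding the explicit matrix inversion is what keeps the per-step update at O(d^2) time.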
Note that the batch triplet construction strategy here is considered in an online manner; that is to say, for each sample x_t at the t-th time step, all past samples are stored to construct triplets with x_t (i.e., each triplet contains this x_t). Similar to L_t, the learned metric L^*_t of the batch strategy (at the t-th time step, \sum_{i=1}^{t} C_i triplets can be constructed) can be denoted as follows:

L^*_t = L^*_0 \prod_{i=1}^{C_1}(I + \gamma A_{1i})^{-1} \prod_{i=1}^{C_2}(I + \gamma A_{2i})^{-1} \cdots \prod_{i=1}^{C_t}(I + \gamma A_{ti})^{-1}. \quad (A.4)
Let 〈x_1, x_{p_1}, x_{q_1}〉, …, 〈x_t, x_{p_t}, x_{q_t}〉 be the sequence of triplets constructed by the proposed one-pass strategy, which is contained in the sequence of triplets constructed by the batch strategy. If we let L^* learn on the sequence of triplets constructed by the one-pass strategy first, Eq. (A.4) can be reorganized as below:

L^*_t = L^*_0 (I + \gamma A_1)^{-1} \cdots (I + \gamma A_t)^{-1} \cdot \prod_{i=1}^{C_1 + \cdots + C_t - t} (I + \gamma A_i)^{-1} \quad (L^*_t learns on the one-pass triplet sequence first)
      = L_t \cdot \prod_{i=1}^{C_1 + \cdots + C_t - t} (I + \gamma A_i)^{-1} \quad (L_0 and L^*_0 are both initialized as identity matrices)
      = L_t \cdot \prod_{i=1}^{C_1 + \cdots + C_t - t} (I + B_i) \quad (by Theorem 1, where B_i = \frac{1}{\eta + \beta}\big[(\gamma A_i)^2 - \eta\gamma A_i\big])
      = L_t \Big[ I + \sum_{i=1}^{C_N} B_i + \sum_{i,j=1,\, i<j}^{C_N} B_i B_j + \cdots + \prod_{i=1}^{C_N} B_i \Big] \quad (where C_N = C_1 + \cdots + C_t - t). \quad (A.5)
Then we can calculate that

\|L_t - L^*_t\|_F = \Big\| L_t \Big[ \sum_{i=1}^{C_N} B_i + \sum_{i,j=1,\, i<j}^{C_N} B_i B_j + \cdots + \prod_{i=1}^{C_N} B_i \Big] \Big\|_F
\le \|L_t\|_F \cdot \Big\| \sum_{i=1}^{C_N} B_i + \sum_{i,j=1,\, i<j}^{C_N} B_i B_j + \cdots + \prod_{i=1}^{C_N} B_i \Big\|_F. \quad (A.6)
Recall that A_t = M_1 - M_2 = (x_t - x_p)(x_t - x_p)^T - (x_t - x_q)(x_t - x_q)^T \in \mathbb{R}^{d \times d}, which is a symmetric square matrix. According to the definition of the Frobenius norm,

\|A_t\|_F = \sqrt{\sum_{i=1}^{d} \sum_{j=1}^{d} |a_{ij}|^2} = \sqrt{\sum_{i=1}^{d} \sigma_i^2}, \quad (A.7)

where \sigma_i are the singular values of A_t, which equal the absolute values of the eigenvalues of A_t. According to Lemma 1, -\lambda_{\max}(M_2) \le \lambda(A_t) \le \lambda_{\max}(M_1), where \lambda(A_t) denotes an eigenvalue of A_t, and \lambda_{\max}(M) indicates the maximum eigenvalue of M. Assuming \|x_t\|_2 \le 1, \lambda_{\max}(M_1) and \lambda_{\max}(M_2) belong to the range [0, 4]. Since the rank of A_t is 2 (which has been proved in Section 3.2), there are at most two nonzero eigenvalues. Thus we can easily obtain that
\cdots + 2\alpha \|L^* x_t\| \cdot \|L^* x_{p_i}\| + 2\xi \|L^* x_t\| \cdot \|L^* x_{q_i}\|. \quad (B.8)

According to the property of compatible norms, that is,

\|Ax\|_2 \le \|A\|_F \cdot \|x\|_2, \quad (B.9)

and assuming that \|x\|_2 \le R (for all samples), \|L\|_F \le U and \|L^*\|_F \le U, we can obtain that

\Delta_1 \le 2(\alpha + \xi + 1)R^2 U^2, \qquad \Delta_1 - \Delta_2 \le 2(\alpha + \xi + 1)R^2 U^2. \quad (B.10)

Thus, this theorem has been proved. □
Appendix C. Proof of Theorem 4

Proof. The regret can be defined (according to the definition of Chapter 3 in [41]) as below:

R(L^*, T) = \sum_{t=1}^{T} G_t(L_t) - \sum_{t=1}^{T} G_t(L^*), \quad (C.1)
where G_t(L) = \big[1 + \|L(x_t - x_p)\|_2^2 - \|L(x_t - x_q)\|_2^2\big]_+. Here, we also only consider the case that the loss is positive, which is exactly the case that affects the updating of the metric L. However, a triplet which generates a positive loss with L may incur a negative loss with L^*. Thus, after expanding,

R(L^*, T) \le \sum_{t=1}^{T} \big[ \|L_t(x_t - x_p)\|_2^2 - \|L_t(x_t - x_q)\|_2^2 - \|L^*(x_t - x_p)\|_2^2 + \|L^*(x_t - x_q)\|_2^2 \big]. \quad (C.2)

In a similar way to the proof of Theorem 3, we can easily prove this theorem. □
References

[1] K.Q. Weinberger, J. Blitzer, L.K. Saul, Distance metric learning for large margin nearest neighbor classification, in: Advances in Neural Information Processing Systems (NIPS), 2005, pp. 1473–1480.
[2] S. Xiang, F. Nie, C. Zhang, Learning a Mahalanobis distance metric for data clustering and classification, Pattern Recognit. 41 (12) (2008) 3600–3612.
[3] C.-C. Chang, A boosting approach for supervised Mahalanobis distance metric learning, Pattern Recognit. 45 (2) (2012) 844–862.
[4] Y. Mu, W. Ding, D. Tao, Local discriminative distance metrics ensemble learning, Pattern Recognit. 46 (8) (2013) 2337–2349.
[5] T. Mensink, J. Verbeek, F. Perronnin, G. Csurka, Distance-based image classification: generalizing to new classes at near-zero cost, IEEE Trans. Pattern Anal. Mach. Intell. 35 (11) (2013) 2624–2637.
[6] G. Chechik, V. Sharma, U. Shalit, S. Bengio, Large scale online learning of image similarity through ranking, J. Mach. Learn. Res. 11 (2010) 1109–1135.
[7] X. Gao, S.C. Hoi, Y. Zhang, J. Wan, J. Li, SOML: sparse online metric learning with application to image retrieval, in: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (AAAI), 2014.
[8] H. Xia, S.C. Hoi, R. Jin, P. Zhao, Online multiple kernel similarity learning for visual search, IEEE Trans. Pattern Anal. Mach. Intell. 36 (3) (2014) 536–549.
[9] S. Shalev-Shwartz, Y. Singer, A.Y. Ng, Online and batch learning of pseudo-metrics, in: Proceedings of the Twenty-First International Conference on Machine Learning (ICML), ACM, 2004, p. 94.
[10] P. Jain, B. Kulis, I.S. Dhillon, K. Grauman, Online metric learning and fast similarity search, in: Advances in Neural Information Processing Systems (NIPS), 2009, pp. 761–768.
[11] R. Jin, S. Wang, Y. Zhou, Regularized distance metric learning: theory and algorithm, in: Advances in Neural Information Processing Systems (NIPS), 2009, pp. 862–870.
[12] J.V. Davis, B. Kulis, P. Jain, S. Sra, I.S. Dhillon, Information-theoretic metric learning, in: Proceedings of the 24th International Conference on Machine Learning (ICML), ACM, 2007, pp. 209–216.
[13] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, Y. Singer, Online passive-aggressive algorithms, J. Mach. Learn. Res. 7 (2006) 551–585.
[14] G. Kunapuli, J. Shavlik, Mirror descent for metric learning: a unified approach, in: Machine Learning and Knowledge Discovery in Databases, Springer, 2012, pp. 859–874.
[15] K.Q. Weinberger, L.K. Saul, Distance metric learning for large margin nearest neighbor classification, J. Mach. Learn. Res. 10 (2009) 207–244.
[16] B. Shaw, B. Huang, T. Jebara, Learning a distance metric from a network, in: Advances in Neural Information Processing Systems (NIPS), 2011, pp. 1899–1907.
[17] Q. Qian, R. Jin, J. Yi, L. Zhang, S. Zhu, Efficient distance metric learning by adaptive sampling and mini-batch stochastic gradient descent (SGD), Mach. Learn. 99 (3) (2015) 353–372.
[18] J. Wang, Y. Song, T. Leung, C. Rosenberg, J. Wang, J. Philbin, B. Chen, Y. Wu, Learning fine-grained image similarity with deep ranking, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1386–1393.
[19] C. Stauffer, W.E.L. Grimson, Learning patterns of activity using real-time tracking, IEEE Trans. Pattern Anal. Mach. Intell. 22 (8) (2000) 747–757.
[20] Y. Shi, Y. Gao, R. Wang, Real-time abnormal event detection in complicated scenes, in: 20th International Conference on Pattern Recognition (ICPR), IEEE, 2010, pp. 3653–3656.
[21] W. Gao, R. Jin, S. Zhu, Z.-H. Zhou, One-pass AUC optimization, in: Proceedings of the 30th International Conference on Machine Learning (ICML), ACM, 2013, pp. 906–914.
[22] Y. Zhu, W. Gao, Z.-H. Zhou, One-pass multi-view learning, in: Proceedings of the 7th Asian Conference on Machine Learning, 2015, pp. 407–422.
[23] K.S. Miller, On the inverse of the sum of matrices, Math. Mag. 54 (1981) 67–72.
[24] J. Lu, X. Zhou, Y.-P. Tan, Y. Shang, J. Zhou, Neighborhood repulsed metric learning for kinship verification, IEEE Trans. Pattern Anal. Mach. Intell. 36 (2) (2014) 331–345.
[25] J. Lu, G. Wang, W. Deng, K. Jia, Reconstruction-based metric learning for unconstrained face verification, IEEE Trans. Inf. Forensics Secur. 10 (1) (2015) 79–89.
[26] Z. Huang, R. Wang, S. Shan, X. Chen, Face recognition on large-scale video in the wild with hybrid Euclidean-and-Riemannian metric learning, Pattern Recognit. 48 (10) (2015) 3113–3124.
[27] J. Hu, J. Lu, Y.-P. Tan, Discriminative deep metric learning for face verification in the wild, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1875–1882.
[28] J. Hu, J. Lu, Y.-P. Tan, Deep transfer metric learning, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 325–333.
[29] F. Schroff, D. Kalenichenko, J. Philbin, FaceNet: a unified embedding for face recognition and clustering, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 815–823.
[30] N. Kumar, A.C. Berg, P.N. Belhumeur, S.K. Nayar, Attribute and simile classifiers for face verification, in: IEEE 12th International Conference on Computer Vision (ICCV), IEEE, 2009, pp. 365–372.
[31] O.M. Parkhi, A. Vedaldi, A. Zisserman, Deep face recognition, in: British Machine Vision Conference (BMVC), 1, 2015, p. 6.
[32] G.B. Huang, M. Ramesh, T. Berg, E. Learned-Miller, Labeled Faces in the Wild: a database for studying face recognition in unconstrained environments, Technical Report 07-49, University of Massachusetts, Amherst, 2007.
[33] M. Guillaumin, J. Verbeek, C. Schmid, Is that you? Metric learning approaches for face identification, in: IEEE 12th International Conference on Computer Vision (ICCV), IEEE, 2009, pp. 498–505.
[34] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision 60 (2) (2004) 91–110.
[35] Y. Cong, J. Yuan, J. Liu, Sparse reconstruction cost for abnormal event detection, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2011, pp. 3449–3456.
[36] R. Mehran, A. Oyama, M. Shah, Abnormal crowd behavior detection using social force model, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2009, pp. 935–942.
[37] S. Wu, B.E. Moore, M. Shah, Chaotic invariants of Lagrangian particle trajectories for anomaly detection in crowded scenes, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2010, pp. 2054–2060.
[38] V. Saligrama, Z. Chen, Video anomaly detection based on local statistical aggregates, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2012, pp. 2112–2119.
[39] J. Huo, Y. Gao, W. Yang, H. Yin, Multi-instance dictionary learning for detecting abnormal events in surveillance videos, Int. J. Neural Syst. 24 (03) (2014) 1430010.
[40] Y. Zhang, H. Lu, L. Zhang, X. Ruan, Combining motion and appearance cues for anomaly detection, Pattern Recognit. 51 (2016) 443–452.
[41] S. Shalev-Shwartz, Y. Singer, Online learning: theory, algorithms, and applications, 2007.
Wenbin Li is a Ph.D. student in the Department of Computer Science and Technology, Nanjing University. His current research interests include machine learning and computer vision.

Yang Gao received the Ph.D. degree from the Department of Computer Science and Technology, Nanjing University, China, in 2000. Currently, he is a Professor, and also the Deputy Director of the Department of Computer Science and Technology, Nanjing University. He is currently directing the Reasoning and Learning Research Group at Nanjing University. He has published more than 100 papers in top-tier conferences and journals. His current research interests include artificial intelligence and machine learning. He also serves as Program Chair and Area Chair for many international conferences.

Lei Wang received the Ph.D. degree from Nanyang Technological University, Singapore, in 2004. He is currently with the Faculty of Engineering and Information Sciences, University of Wollongong, Wollongong, NSW, Australia, as an Associate Professor. His current research interests include machine learning and pattern recognition, such as feature selection, model selection, and kernel-based learning methods, and computer vision, such as categorization and content-based image retrieval.

Luping Zhou received the Ph.D. degree from Australian National University, Canberra, ACT, Australia. She was a Research Fellow with the University of North Carolina at Chapel Hill, Chapel Hill, NC, USA, and a Staff Research Scientist with the Commonwealth Scientific and Industrial Research Organisation, Clayton, VIC, Australia. She is currently a Senior Lecturer with the School of Computing and Information Technology, University of Wollongong, Wollongong, NSW, Australia. Her current research interests include machine learning, medical image analysis, and computer vision. Dr. Zhou received the Discovery Early Career Researcher Award from the Australian Research Council in 2015.

Jing Huo received her B.Eng. degree in computer science and technology from Nanjing Normal University, China, in 2011. She is currently working towards her Ph.D. degree in the Department of Computer Science, Nanjing University. Her research interests lie in machine learning and computer vision. Her work currently focuses on metric learning, subspace learning and their applications to heterogeneous face recognition.

Yinghuan Shi is currently an Assistant Researcher in the Department of Computer Science and Technology of Nanjing University, China. He received his Ph.D. and B.Sc. degrees from the Department of Computer Science of Nanjing University in 2013 and 2007, respectively. He was a visiting scholar at the University of Technology, Sydney and the University of North Carolina at Chapel Hill. His research interests include computer vision and medical image analysis. He has published more than 40 research papers in related journals and conferences such as TPAMI, TBME, TNNLS, TCYB, CVPR, AAAI, ACMMM and IPMI. He serves as a program committee member for several conferences, and also as a referee for several journals.