Neurocomputing 275 (2018) 916–923
Bagging–boosting-based semi-supervised multi-hashing with query-adaptive re-ranking

Wing W.Y. Ng a, Xiancheng Zhou a, Xing Tian a,∗, Xizhao Wang b, Daniel S. Yeung a

a School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
b College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
Article info
Article history:
Received 19 January 2017
Revised 8 July 2017
Accepted 13 September 2017
Available online 21 September 2017
Communicated by Yongdong Zhang
Keywords:
Semi-supervised information retrieval
Multi-hashing
Bagging
Boosting
Abstract
Hashing-based methods have been widely applied to large scale image retrieval problems due to their high efficiency. In real world applications it is difficult to require that all images in a large database be labeled, while unsupervised methods waste the information in labeled images. Therefore, semi-supervised hashing methods have been proposed to train hash functions on a partially labeled database using both the semantic and the unsupervised information. Multi-hashing methods achieve better precision-recall performance in comparison to single hashing methods. However, current boosting-based multi-hashing methods do not improve performance after a small number of hash tables have been created. Therefore, a bagging–boosting-based semi-supervised multi-hashing with query-adaptive re-ranking (BBSHR) is proposed in this paper. In the proposed method, each individual hash table of the multi-hashing is trained using the boosting-based BSPLH, such that each hash bit corrects errors made by previous bits. Moreover, we propose a new semi-supervised weighting scheme for the query-adaptive re-ranking. Experimental results show that the proposed method yields better precision and recall rates for given numbers of hash tables and bits.
Databases and code lengths are reported in a combined form, e.g. MNIST-16 denotes the MNIST database using 16-bit hash codes. The t-test is performed for the BBSHR with respect to each of the other methods. In Table 1, '∗', '#', '&' and '$' denote that the BBSHR outperforms a particular method with a statistical significance of 99.9%, 99%, 95% and less than 50%, respectively. Table 1 shows that the BBSHR outperforms all existing methods in the experiments, except the BIQH in the CIFAR10-16 experiment, where the significance is less than 50%. Overall, the proposed BBSHR is significantly better than the state-of-the-art hashing methods in the comparisons.
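As an illustration of how such significance markers can be produced, the following is a minimal sketch, assuming per-query retrieval scores for two methods are available as arrays; the invented arrays, the use of scipy.stats.ttest_rel, and the mapping from p-value to markers are our assumptions, not details given in the paper:

```python
# Hedged sketch: paired t-test over hypothetical per-query scores,
# mapped onto the significance markers used in Table 1.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(0)
bbshr = rng.uniform(0.5, 0.9, size=1000)              # invented BBSHR per-query scores
baseline = bbshr - rng.uniform(0.0, 0.1, size=1000)   # invented weaker baseline

t_stat, p_value = ttest_rel(bbshr, baseline)  # paired two-sided t-test
confidence = 1.0 - p_value                    # simplified confidence level
for marker, level in (("*", 0.999), ("#", 0.99), ("&", 0.95)):
    if confidence >= level:
        print(f"BBSHR outperforms the baseline with marker '{marker}' ({level:.1%})")
        break
else:
    print("below 95% significance; '$' marks a difference below 50%")
```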
Both unsupervised methods, i.e. the LSH and the CH, perform the worst among all methods in the comparisons. The CH performs slightly better than the LSH because the CH is a multi-hashing method which uses more bits in total. Single table-based methods, i.e. the SPLH and the BSPLH, perform worse than the multi-hashing methods except the unsupervised CH. The major reason is that multi-hashing methods use more hash bits than single table-based methods, which shows the benefit of multi-hashing. The DCH outperforms the CH in all experiments because the DCH is a semi-supervised hashing method and uses a boosting-based method for individual hash table training. Although the BIQH uses a fully labeled training database, it does not outperform the semi-supervised multi-hashing DCH without query-adaptive re-ranking in 7 out of 15 experiments. This shows that the major deficiency of the BIQH is the use of an unsupervised hashing method in combination with a fully supervised re-ranking: the unsupervised hashing method wastes label information, while the supervised re-ranking imposes a strong constraint on the BIQH by forcing it to use a fully labeled training database.
The BBSH outperforms the DCH and the BIQH in 12 and 11, respectively, of the 15 experiments. This shows that the bagging–boosting-based multi-hashing methods both with and without re-ranking, i.e. the BBSHR and the BBSH respectively, are effective. However, without re-ranking it is difficult to outperform the BIQH, which uses a fully supervised re-ranking. In contrast, the re-ranking of the BBSHR is designed for semi-supervised databases and is more practical for real-world large scale problems. Overall, the BBSHR outperforms the BBSH by 3.93% on average over all experiments.
Another observation is that all hashing methods except the DCH and the CH achieve better performance on the same database when more hash bits are used. This may be caused by the nature of boosting in the DCH and the CH, which prevents them from improving after a number of hash tables have been created because they run out of useful training samples. This is particularly significant when more bits per table are used, because the first few hash tables learn well with more bits and many training samples are then discarded by the boosting method in both the CH and the DCH.
The training of the BBSHR consists of two loops. The computational complexity of creating one hash function is O(nd^2 + n_l^2 d), and the total computational complexity of the BBSHR is O(mK(nd^2 + n_l^2 d)), where n_l denotes the number of labeled images.
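Written out as a formula, with n the database size, d the feature dimensionality, m the number of hash tables and K the number of bits per table (the roles of n, d and K are inferred from context, as this excerpt does not redefine them):

```latex
% Cost of training one hash function (inner loop), and of the full BBSHR:
% m tables, each with K hash functions.
\[
\mathcal{O}\!\left(nd^{2} + n_{l}^{2}d\right)
\quad\Longrightarrow\quad
\mathcal{O}\!\left(mK\left(nd^{2} + n_{l}^{2}d\right)\right)
\]
```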
4.2. Parameter selection

There are two major parameters to be selected for the proposed BBSHR, i.e. the bagging ratio (p) and the number of hash tables (m). Fig. 7 shows the AUC performance of the BBSHR with different p values for MNIST-32 using 5 hash tables. The value of p controls the sampling ratio of unlabeled samples in the construction of the training sets for individual hash table learning. Fig. 7 shows that increasing the p value has no obvious effect on the performance of the BBSHR. In our experiments, p = 0.4 is used. This keeps a relatively large portion of the unlabeled samples while providing a good trade-off with the computational cost.
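The sampling step that p controls can be sketched as follows. This is a minimal illustration of the semi-supervised bagging described above, assuming each table's training set keeps all labeled samples and draws a random fraction p of the unlabeled ones; the function name and exact sampling scheme are our assumptions, not the authors' code:

```python
# Minimal sketch (not the authors' code) of the semi-supervised bagging step.
import numpy as np

def build_bagging_sets(X_labeled, X_unlabeled, m=5, p=0.4, seed=0):
    """Return m training sets, one per hash table: all labeled samples
    plus a random fraction p of the unlabeled samples."""
    rng = np.random.default_rng(seed)
    n_u = X_unlabeled.shape[0]
    sets = []
    for _ in range(m):
        idx = rng.choice(n_u, size=int(p * n_u), replace=False)
        sets.append(np.vstack([X_labeled, X_unlabeled[idx]]))
    return sets

# Example: 100 labeled and 1000 unlabeled 64-d samples, m = 5 tables, p = 0.4.
tables = build_bagging_sets(np.random.rand(100, 64), np.random.rand(1000, 64))
print([t.shape for t in tables])  # each set has 100 + 400 samples
```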
Fig. 7. AUC of the BBSHR method with different p values.
Fig. 8. AUC of the BBSHR method with different m values.
Fig. 8 shows the AUC values of both the BBSHR and the DCH with different numbers of hash tables (m) using MNIST-32 with p = 0.4. Fig. 8 shows that the performance of the BBSHR improves as m increases. However, increasing m has no obvious influence on the performance of the BBSHR when m > 5. Therefore, m = 5 is used in our experiments. In contrast, the performance of the DCH decreases when m > 3. Again, the boosting of the DCH discards training samples once they are correctly classified by the current hash table, i.e. once similar samples are hashed to the same side of a hash function. This leaves the later hash tables (e.g. m > 3) with no useful training samples to learn from. In the extreme case in which all labeled samples are well learned and discarded, hashing in the later tables reduces to unsupervised hashing.
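The sample-starvation effect described above can be made concrete with a toy model; the discard rate below is invented purely for illustration and this is not the DCH algorithm itself:

```python
# Toy model: each hash table handles a fraction of the remaining labeled
# samples correctly, and boosting then discards them, starving later tables.
pool = 10_000        # labeled training samples (hypothetical)
learn_rate = 0.7     # fraction handled correctly per table (invented)
for table in range(1, 7):
    print(f"table {table}: {pool} labeled samples available")
    pool = int(pool * (1.0 - learn_rate))
# After a few tables the pool is nearly empty, so later tables effectively
# fall back to unsupervised training, matching the behavior reported for DCH.
```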
5. Conclusion

A bagging–boosting-based semi-supervised multi-hashing method with query-adaptive re-ranking (BBSHR) is proposed in this paper. The BBSHR uses semi-supervised bagging to construct multiple hash tables, and each individual hash table is trained using a boosting-based method. A semi-supervised query-adaptive re-ranking is proposed to further improve retrieval performance. Experimental results show that the BBSHR outperforms state-of-the-art hashing methods with statistical significance. The current assignment method of pseudo-labels for unlabeled samples may not be optimal: in cases where the nearest neighboring samples of an unlabeled sample are evenly distributed over several classes, the pseudo-label may require a random assignment among multiple majority classes. Further research on a better pseudo-label assignment method may improve the performance of the BBSHR.
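The tie-breaking situation described above can be illustrated with a short sketch; the k-nearest-neighbor voting and all names below are illustrative assumptions, not the paper's actual assignment procedure:

```python
# Hedged sketch: assign a pseudo-label by majority vote of the k nearest
# labeled neighbors, breaking ties randomly among the majority classes.
import numpy as np
from collections import Counter

def pseudo_label(x, X_labeled, y_labeled, k=5, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    dists = np.linalg.norm(X_labeled - x, axis=1)     # Euclidean distances
    votes = Counter(y_labeled[np.argsort(dists)[:k]])  # labels of k nearest
    top = max(votes.values())
    majority = [c for c, v in votes.items() if v == top]
    return rng.choice(majority)  # random assignment among tied majority classes

X = np.random.rand(50, 8)              # toy labeled data
y = np.random.randint(0, 3, size=50)   # toy labels in {0, 1, 2}
print(pseudo_label(np.random.rand(8), X, y))
```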
Acknowledgment

This work is supported by the National Natural Science Foundation of China under Grants 61272201 and 61572201, and by the Fundamental Research Funds for the Central Universities (2017ZD052).
Wing W. Y. Ng (S'02–M'05–SM'15) received his B.Sc. and Ph.D. degrees from Hong Kong Polytechnic University in 2001 and 2006, respectively. He is now a Professor in the School of Computer Science and Engineering, South China University of Technology, China. His major research directions include machine learning and information retrieval. He is currently an associate editor of the International Journal of Machine Learning and Cybernetics. He is the principal investigator of three China National Nature Science Foundation projects and a Program for New Century Excellent Talents in University from the China Ministry of Education. He served on the Board of Governors of the IEEE Systems, Man and Cybernetics Society in 2011–2013.
Xiancheng Zhou received the B.Sc. and M.Sc. degrees in computer science from the South China University of Technology. His research interests include machine learning and information retrieval.
Xing Tian received his B.Sc. degree in Computer Science from the South China University of Technology, Guangzhou, China, and is currently a Ph.D. candidate in the School of Computer Science and Engineering, South China University of Technology. His current research interests focus on image retrieval and machine learning in non-stationary big data environments.
Professor Xizhao Wang received the Ph.D. degree in computer science from the Harbin Institute of Technology, Harbin, China, in 1998. He is currently a Professor with the Big Data Institute, Shenzhen University, Shenzhen, China. His current research interests include uncertainty modeling and machine learning for big data. He has edited more than ten special issues and published three monographs, two textbooks, and more than 200 peer-reviewed research papers. According to Google Scholar, his total number of citations is over 5000. He is on the list of Elsevier 2015/2016 most cited Chinese authors. He is the Chair of the IEEE SMC Technical Committee on Computational Intelligence, the Editor-in-Chief of the Machine Learning and Cybernetics Journal, and an Associate Editor for a couple of journals in the related areas. He was a recipient of the IEEE SMCS Outstanding Contribution Award in 2004 and a recipient of the IEEE SMCS Best Associate Editor Award in 2006.
Professor Daniel S. Yeung (M'89–SM'99–F'04) is a past President of the IEEE SMC Society. He was Head and Chair Professor of the Computing Department of Hong Kong Polytechnic University, Hong Kong, and a faculty member of the Rochester Institute of Technology, USA. He has also worked for TRW Inc., General Electric Corporation R&D Centre, and Computer Consoles Inc. in the USA. He is a Fellow of the IEEE.