Learning to Hash with its Application to Big Data Retrieval and Mining
Wu-Jun Li
Department of Computer Science and Engineering
Shanghai Jiao Tong University
Shanghai, China
Joint work with collaborators
Dec 21, 2013
Projection Stage
Project with real-valued projection functions: given a point x, each projected dimension i is associated with a real-valued projection function f_i(x) (e.g., f_i(x) = w_i^T x).
Quantization Stage
Turn the real-valued projections into binary bits.
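As a concrete illustration of the two stages, here is a minimal NumPy sketch; the function names, the matrix W, and the sign-based quantizer (an SBQ-style zero threshold) are assumptions for illustration, since specific methods differ in how they learn W and how they quantize.

```python
import numpy as np

def project(X, W):
    # Projection stage: each column w_i of W defines f_i(x) = w_i^T x,
    # so an n x d data matrix X is mapped to n x m real values.
    return X @ W

def quantize(Y):
    # Quantization stage: turn real values into binary bits,
    # here with a simple sign threshold (1 if >= 0, else 0).
    return (Y >= 0).astype(np.uint8)

# Toy usage: 5 points in 4 dimensions, 3-bit codes from a random W.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))
W = rng.normal(size=(4, 3))
codes = quantize(project(X, W))
print(codes)
```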
The hashing function family is defined independently of the training dataset:
Locality-sensitive hashing (LSH) (Gionis et al., 1999; Andoni and Indyk, 2008) and its extensions (Datar et al., 2004; Kulis and Grauman, 2009; Kulis et al., 2009).
SIKH: shift-invariant kernel hashing (Raginsky and Lazebnik, 2009).
Hashing function: random projections.
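A minimal sketch of the random-projection idea, assuming the common sign-of-random-projection form with a Gaussian W; the p-stable scheme of Datar et al. (2004) instead uses a shifted, bucketed quantization.

```python
import numpy as np

def lsh_hash(X, W):
    # Data-independent hashing: W is drawn at random, not learned from data.
    # Each bit is the sign of a random projection, h_i(x) = sgn(w_i^T x).
    return (X @ W >= 0).astype(np.uint8)

rng = np.random.default_rng(1)
d, m = 16, 8                  # input dimension, code length
W = rng.normal(size=(d, m))   # random projection directions
X = rng.normal(size=(10, d))  # 10 database/query points
print(lsh_hash(X, W))         # 10 binary codes of m bits each
```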
To generate a code of m bits, PCAH performs PCA on X and then uses the top m eigenvectors of the matrix XX^T as the columns of the projection matrix W ∈ R^{d×m}. Here, the top m eigenvectors are those corresponding to the m largest eigenvalues {λ_k}_{k=1}^m, arranged in non-increasing order λ_1 ≥ λ_2 ≥ · · · ≥ λ_m. Let λ = [λ_1, λ_2, · · · , λ_m]^T. Then
Λ = W^T X X^T W = diag(λ)
Define the hash function: h(x) = sgn(W^T x)
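A minimal PCAH sketch, assuming X is a mean-centered d × n data matrix as in the eigendecomposition above; the function names are made up for this illustration.

```python
import numpy as np

def pcah_train(X, m):
    # X: d x n data matrix (assumed mean-centered).
    # Use the top-m eigenvectors of XX^T (largest eigenvalues first)
    # as the columns of the projection matrix W (d x m).
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:m]       # indices of the m largest
    return eigvecs[:, order]

def pcah_hash(W, x):
    # h(x) = sgn(W^T x), mapped to {0, 1} bits for storage.
    return (W.T @ x >= 0).astype(np.uint8)

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 100))        # d = 8, n = 100
X -= X.mean(axis=1, keepdims=True)   # center the data
W = pcah_train(X, m=4)
print(pcah_hash(W, X[:, 0]))         # 4-bit code for the first point
```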
[Schur-Horn Lemma (Horn, 1954)] Let c = {c_i} ∈ R^m and b = {b_i} ∈ R^m be real vectors, each in non-increasing order, i.e., c_1 ≥ c_2 ≥ · · · ≥ c_m and b_1 ≥ b_2 ≥ · · · ≥ b_m. There exists a Hermitian matrix H with eigenvalues c and diagonal values b if and only if
Σ_{i=1}^{k} b_i ≤ Σ_{i=1}^{k} c_i, for any k = 1, 2, ..., m,
Σ_{i=1}^{m} b_i = Σ_{i=1}^{m} c_i.
So we can prove: there exists a solution to the IsoHash problem, and this solution lies in the intersection of T(a) and M(Λ).
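The majorization conditions of the lemma are easy to check numerically; the helper below is only an illustrative sketch (the function name and example vectors are made up), with the equal-diagonal case corresponding to the isotropic variances sought by IsoHash.

```python
import numpy as np

def schur_horn_feasible(b, c, tol=1e-9):
    # b: desired diagonal values, c: desired eigenvalues,
    # both assumed sorted in non-increasing order.
    # Feasible iff every partial sum of b is <= that of c,
    # and the total sums are equal (majorization condition).
    pb, pc = np.cumsum(b), np.cumsum(c)
    return bool(np.all(pb <= pc + tol) and abs(pb[-1] - pc[-1]) <= tol)

# Example: eigenvalues c can be redistributed onto an all-equal diagonal b,
# which is the situation exploited by IsoHash.
c = np.array([3.0, 2.0, 1.0])
b = np.full(3, c.mean())            # equal diagonal entries a = sum(c)/m
print(schur_horn_feasible(b, c))    # True
```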
The popular coding strategy SBQ, which adopts zero as the threshold, is shown in Figure (a). Due to the thresholding, the intrinsic neighboring structure in the original space is destroyed.
The HH strategy (Liu et al., 2011) is shown in Figure (b). If we use d(A, B) to denote the Hamming distance between A and B, we find that d(A, D) < d(A, C) for HH, which is clearly unreasonable.
With our DBQ code, d(A, D) = 2, d(A, B) = d(C, D) = 1, and d(B, C) = 0, which preserves the similarity relationships in the original space.
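To make these distances concrete, here is a small check assuming the usual DBQ assignment of the 2-bit codes 01 / 00 / 10 to the left / middle / right regions, and the placement of A, B, C, D suggested by the figure.

```python
def hamming(u, v):
    # Hamming distance between two equal-length bit strings.
    return sum(a != b for a, b in zip(u, v))

# DBQ splits each projected dimension into three regions and assigns
# 2-bit codes so that adjacent regions differ in one bit and the two
# extreme regions differ in two bits.
code = {"A": "01",   # A falls in the left region
        "B": "00",   # B and C fall in the middle region
        "C": "00",
        "D": "10"}   # D falls in the right region

print(hamming(code["A"], code["D"]))  # 2
print(hamming(code["A"], code["B"]))  # 1
print(hamming(code["C"], code["D"]))  # 1
print(hamming(code["B"], code["C"]))  # 0
```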
A. Andoni and P. Indyk. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM, 51(1):117–122, 2008.
M. Chu. Constructing a Hermitian matrix from its diagonal entries and eigenvalues. SIAM Journal on Matrix Analysis and Applications, 16(1):207–217, 1995.
M. Datar, N. Immorlica, P. Indyk, and V. S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the ACM Symposium on Computational Geometry, 2004.
A. Gionis, P. Indyk, and R. Motwani. Similarity search in high dimensions via hashing. In Proceedings of International Conference on Very Large Data Bases, 1999.
Y. Gong and S. Lazebnik. Iterative quantization: A procrustean approach to learning binary codes. In Proceedings of Computer Vision and Pattern Recognition, 2011.
A. Horn. Doubly stochastic matrices and the diagonal of a rotation matrix. American Journal of Mathematics, 76(3):620–630, 1954.
W. Kong and W.-J. Li. Double-bit quantization for hashing. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI), 2012a.
W. Kong and W.-J. Li. Isotropic hashing. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS), 2012b.
W. Kong, W.-J. Li, and M. Guo. Manhattan hashing for large-scale image retrieval. In The 35th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2012.
B. Kulis and K. Grauman. Kernelized locality-sensitive hashing for scalable image search. In Proceedings of International Conference on Computer Vision, 2009.
B. Kulis, P. Jain, and K. Grauman. Fast similarity search for learned metrics. IEEE Trans. Pattern Anal. Mach. Intell., 31(12):2143–2157, 2009.
S. Kumar and R. Udupa. Learning hash functions for cross-view similaritysearch. In IJCAI, pages 1360–1365, 2011.
X. Li, G. Lin, C. Shen, A. van den Hengel, and A. R. Dick. Learning hash functions using column generation. In ICML, 2013.
W. Liu, J. Wang, S. Kumar, and S. Chang. Hashing with graphs. In Proceedings of International Conference on Machine Learning, 2011.
W. Liu, J. Wang, R. Ji, Y.-G. Jiang, and S.-F. Chang. Supervised hashing with kernels. In CVPR, pages 2074–2081, 2012.
M. Norouzi and D. J. Fleet. Minimal loss hashing for compact binary codes. In Proceedings of International Conference on Machine Learning, 2011.
M. Norouzi, D. J. Fleet, and R. Salakhutdinov. Hamming distance metric learning. In NIPS, pages 1070–1078, 2012.
M. Ou, P. Cui, F. Wang, J. Wang, W. Zhu, and S. Yang. Comparing apples to oranges: a scalable solution with heterogeneous hashing. In KDD, pages 230–238, 2013.
M. Raginsky and S. Lazebnik. Locality-sensitive binary codes from shift-invariant kernels. In Proceedings of Neural Information Processing Systems, 2009.
R. Salakhutdinov and G. Hinton. Semantic hashing. In SIGIR Workshop on Information Retrieval and Applications of Graphical Models, 2007.
R. Salakhutdinov and G. E. Hinton. Semantic hashing. Int. J. Approx. Reasoning, 50(7):969–978, 2009.
J. Song, Y. Yang, Z. Huang, H. T. Shen, and R. Hong. Multiple feature hashing for real-time large scale near-duplicate video retrieval. In ACM Multimedia, pages 423–432, 2011.
J. Song, Y. Yang, Y. Yang, Z. Huang, and H. T. Shen. Inter-media hashing for large-scale retrieval from heterogeneous data sources. In SIGMOD Conference, pages 785–796, 2013.
C. Strecha, A. A. Bronstein, M. M. Bronstein, and P. Fua. LDAHash: Improved matching with smaller descriptors. IEEE Trans. Pattern Anal. Mach. Intell., 34(1):66–78, 2012.
A. Torralba, R. Fergus, and Y. Weiss. Small codes and large image databases for recognition. In Proceedings of Computer Vision and Pattern Recognition, 2008.
J. Wang, S. Kumar, and S.-F. Chang. Sequential projection learning for hashing with compact codes. In Proceedings of International Conference on Machine Learning, 2010a.
J. Wang, S. Kumar, and S.-F. Chang. Semi-supervised hashing for large-scale image retrieval. In Proceedings of Computer Vision and Pattern Recognition, 2010b.
Y. Weiss, A. Torralba, and R. Fergus. Spectral hashing. In Proceedings of Neural Information Processing Systems, 2008.
D. Zhang, F. Wang, and L. Si. Composite hashing with multiple information sources. In Proceedings of International ACM SIGIR Conference on Research and Development in Information Retrieval, 2011.
Y. Zhen and D.-Y. Yeung. A probabilistic model for multimodal hash function learning. In KDD, pages 940–948, 2012a.
Y. Zhen and D.-Y. Yeung. Co-regularized hashing for multimodal data. In NIPS, pages 1385–1393, 2012b.