Relative Distribution Entropy Loss Function in CNN Image Retrieval
Abstract
1. Introduction
2. Related Work
2.1. Deep Metric Learning
2.1.1. Contrastive Loss
2.1.2. Triplet Loss
2.1.3. N-pair Loss
2.1.4. Lifted Structured Loss
2.2. Application of Spatial Information in Image Retrieval
2.3. Pooling and Normalization
2.4. Whitening
3. Method Overview
3.1. Calculation of Relative Distribution Entropy
3.2. Relative Distribution Entropy Loss Function
3.3. Network Architecture for Relative Distribution Entropy Loss Function
3.3.1. CNN Network Architecture
3.3.2. Architecture of Training
4. Experiments and Evaluation
4.1. Training Datasets
4.2. Training Configurations
4.3. Datasets and Evaluation of Image Retrieval
4.4. Results and Analysis
4.4.1. The Adjustment Process of Hyperparameter
4.4.2. Comparison of MAC, SPoC, and GeM
4.4.3. Comparison of Relative Distribution Entropy Triplet Loss and Triplet Loss
4.4.4. Comparison with State-of-Art
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
- Yue-Hei Ng, J.; Yang, F.; Davis, L.S. Exploiting local features from deep networks for image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 53–61.
- Arandjelovic, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN architecture for weakly supervised place recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 5297–5307.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Alzu’bi, A.; Amira, A.; Ramzan, N. Content-based image retrieval with compact deep convolutional features. Neurocomputing 2017, 249, 95–105.
- Howard, A.G. Some improvements on deep convolutional neural network based image classification. arXiv 2013, arXiv:1312.5402.
- Cireşan, D.; Meier, U.; Masci, J.; Schmidhuber, J. A committee of neural networks for traffic sign classification. In Proceedings of the 2011 International Joint Conference on Neural Networks (IJCNN), San Jose, CA, USA, 31 July–5 August 2011; pp. 1918–1921.
- Ciresan, D.C.; Meier, U.; Masci, J.; Gambardella, L.M.; Schmidhuber, J. Flexible, high performance convolutional neural networks for image classification. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Barcelona, Spain, 16–22 July 2011.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
- Noh, H.; Araujo, A.; Sim, J.; Weyand, T.; Han, B. Large-scale image retrieval with attentive deep local features. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3456–3465.
- Lowe, D.G. Similarity metric learning for a variable-kernel classifier. Neural Comput. 1995, 7, 72–85.
- Mika, S.; Ratsch, G.; Weston, J.; Scholkopf, B.; Mullers, K.-R. Fisher discriminant analysis with kernels. In Proceedings of the Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No. 98TH8468), Madison, WI, USA, 25 August 1999; pp. 41–48.
- Xing, E.P.; Jordan, M.I.; Russell, S.J.; Ng, A.Y. Distance metric learning with application to clustering with side-information. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–13 December 2003; pp. 521–528.
- Hadsell, R.; Chopra, S.; LeCun, Y. Dimensionality reduction by learning an invariant mapping. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17 June 2006; Volume 2, pp. 1735–1742.
- Hoffer, E.; Ailon, N. Deep metric learning using triplet network. In Proceedings of the International Workshop on Similarity-Based Pattern Recognition, Copenhagen, Denmark, 12–14 October 2015; pp. 84–92.
- Law, M.T.; Thome, N.; Cord, M. Quadruplet-wise image similarity learning. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 3–6 December 2013; pp. 249–256.
- Oh Song, H.; Xiang, Y.; Jegelka, S.; Savarese, S. Deep metric learning via lifted structured feature embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4004–4012.
- Sohn, K. Improved deep metric learning with multi-class n-pair loss objective. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 1857–1865.
- Yi, D.; Lei, Z.; Li, S. Deep metric learning for person re-identification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 24–27 June 2014; pp. 34–39.
- Ustinova, E.; Lempitsky, V. Learning deep embeddings with histogram loss. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 4170–4178.
- Wang, J.; Zhou, F.; Wen, S.; Liu, X.; Lin, Y. Deep metric learning with angular loss. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2593–2601.
- Wu, C.-Y.; Manmatha, R.; Smola, A.J.; Krahenbuhl, P. Sampling matters in deep embedding learning. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2840–2848.
- Ge, W. Deep metric learning with hierarchical triplet loss. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 269–285.
- Rao, A.; Srihari, R.K.; Zhang, Z. Spatial color histograms for content-based image retrieval. In Proceedings of the 11th International Conference on Tools with Artificial Intelligence, Chicago, IL, USA, 9–11 November 1999; pp. 183–186.
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
- Kullback, S.; Leibler, R. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Volume 1.
- Hamza, A.B. Jensen-Rényi Divergence Measure: Theoretical and Computational Perspectives. IEEE Int. Symp. Inf. Theory 2003.
- Lehmann, T.M.; Güld, M.O.; Deselaers, T.; Keysers, D.; Schubert, H.; Spitzer, K.; Ney, H.; Wein, B.B. Automatic categorization of medical images for content-based retrieval and data mining. Comput. Med Imag. Graph. 2005, 29, 143–155.
- Radenovic, F.; Tolias, G.; Chum, O. Fine-tuning CNN Image Retrieval with No Human Annotation. IEEE Trans. Pattern Anal. Mach. Intell. 2018.
- Mikolajczyk, K.; Matas, J. Improving Descriptors for Fast Tree Matching by Optimal Linear Projection. In Proceedings of the IEEE 11th International Conference on Computer Vision, 2007. ICCV 2007, Rio de Janeiro, Brazil, 14–21 October 2007; pp. 1–8.
- Huang, R.; Jiang, X. Off-Feature Information Incorporated Metric Learning for Face Recognition. IEEE Signal Process. Lett. 2018, 25, 541–545.
- Feng, G.; Liu, W.; Tao, D.; Zhou, Y. Hessian Regularized Distance Metric Learning for People Re-Identification. Neural Process. Lett. 2019, 50, 2087–2100.
- Tan, M.; Yu, J.; Yu, Z.; Gao, F.; Rui, Y.; Tao, D. User-Click-Data-Based Fine-Grained Image Recognition via Weakly Supervised Metric Learning. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2018, 14, 1–23.
- Cao, R.; Zhang, Q.; Zhu, J.; Li, Q.; Qiu, G. Enhancing remote sensing image retrieval with triplet deep metric learning network. arXiv 2019, arXiv:1902.05818.
- Xiang, J.; Zhang, G.; Hou, J.; Sang, N.; Huang, R. Multiple target tracking by learning feature representation and distance metric jointly. arXiv 2018, arXiv:1802.03252.
- Yang, J.; She, D.; Lai, Y.-K.; Yang, M.-H. Retrieving and classifying affective images via deep metric learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
- Mehmood, Z.; Anwar, S.M.; Ali, N.; Habib, H.A.; Rashid, M. A Novel Image Retrieval Based on a Combination of Local and Global Histograms of Visual Words. Math. Probl. Eng. 2016, 2016, 8217250.
- Krapac, J.; Verbeek, J.; Jurie, F. Modeling Spatial Layout with Fisher Vectors for Image Categorization. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 1487–1494.
- Koniusz, P.; Mikolajczyk, K. Spatial Coordinate Coding to Reduce Histogram Representations, Dominant Angle and Colour Pyramid Match. In Proceedings of the 2011 18th IEEE International Conference on Image Processing (ICIP), Brussels, Belgium, 11–14 September 2011; pp. 661–664.
- Sanchez, J.; Perronnin, F.; de Campos, T. Modeling the spatial layout of images beyond spatial pyramids. Pattern Recogn. Lett. 2012, 33, 2216–2223.
- Liu, P.; Miao, Z.; Guo, H.; Wang, Y.; Ai, N. Adding spatial distribution clue to aggregated vector in image retrieval. EURASIP J. Image Video Process. 2018, 2018, 9.
- Babenko, A.; Lempitsky, V. Aggregating deep convolutional features for image retrieval. arXiv 2015, arXiv:1510.07493.
- Razavian, A.S.; Sullivan, J.; Carlsson, S.; Maki, K. Particular object retrieval with integral max-pooling of CNN activations. arXiv 2015, arXiv:1511.05879.
- Jegou, H.; Chum, O. Negative Evidences and Co-occurences in Image Retrieval: The Benefit of PCA and Whitening. In Proceedings of the European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Part II, Volume 7573, pp. 774–787.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- He, K.M.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.H.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
- Chum, O.; Matas, J. Large-scale discovery of spatially related images. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 32, 371–377.
- Radenovic, F.; Schonberger, J.L.; Ji, D.; Frahm, J.-M.; Chum, O.; Matas, J. From dusk till dawn: Modeling in the dark. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 5488–5496.
- Philbin, J.; Chum, O.; Isard, M.; Sivic, J.; Zisserman, A. Object retrieval with large vocabularies and fast spatial matching. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8.
- Philbin, J.; Chum, O.; Isard, M.; Sivic, J.; Zisserman, A. Lost in quantization: Improving particular object retrieval in large scale image databases. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 24–26 June 2008; pp. 1–8.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
- Gordo, A.; Almazán, J.; Revaud, J.; Larlus, D. Deep image retrieval: Learning global representations for image search. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 241–257.
- Kalantidis, Y.; Mellina, C.; Osindero, S. Cross-dimensional weighting for aggregated deep convolutional features. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 685–701.
- Mohedano, E.; McGuinness, K.; O’Connor, N.E.; Salvador, A.; Marques, F.; Giro-i-Nieto, X. Bags of local convolutional features for scalable instance search. In Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, New York, NY, USA, 6–9 June 2016; pp. 327–331.
- Arandjelovic, R.; Gronat, P.; Torii, A.; Pajdla, T.; Sivic, J. NetVLAD: CNN architecture for weakly supervised place recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 5297–5307.
- Ong, E.-J.; Husain, S.; Bober, M. Siamese network of deep fisher-vector descriptors for image retrieval. arXiv 2017, arXiv:1702.00338.
- Gordo, A.; Almazan, J.; Revaud, J.; Larlus, D. End-to-end learning of deep visual representations for image retrieval. Int. J. Comput. Vis. 2017, 124, 237–254.

Network | | | Oxford5k | Oxford5k(W) | Paris6k | Paris6k(W) |
---|---|---|---|---|---|---|
AlexNet | 0.50 | 10 | 58.10 | 67.60 | 71.64 | 79.60 |
AlexNet | 0.75 | 20 | 60.87 | 67.19 | 75.33 | 79.43 |
AlexNet | 0.85 | 25 | 60.79 | 67.93 | 75.60 | 79.59 |
AlexNet | 0.90 | 30 | 61.32 | 68.22 | 75.29 | 80.07 |
AlexNet | 1.00 | 50 | 60.27 | 67.72 | 74.88 | 80.10 |
VGG | 0.85 | 30 | 84.62 | 87.83 | 82.40 | 88.01 |
VGG | 0.85 | 100 | 76.18 | 83.04 | 81.71 | 87.11 |
VGG | 0.90 | 30 | 85.09 | 88.00 | 82.69 | 88.12 |

Net | Oxford5k | Oxford5k(W) | Paris6k | Paris6k(W) |
---|---|---|---|---|
SPoC [43] | 41.83 | 55.34 | 55.49 | 68.61 |
MAC [44] | 47.50 | 55.95 | 62.16 | 71.30 |
GeM [30] | 60.79 | 68.22 | 75.29 | 80.07 |
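
The three rows above compare global pooling layers. As a rough illustration, MAC takes the per-channel maximum, SPoC the per-channel sum (written below as a mean, which gives the same descriptor after L2 normalization), and GeM a generalized mean with exponent p that falls between the two. The following is a minimal sketch in plain Python, assuming non-negative activations and a fixed p; the function names and the toy feature map are illustrative and not taken from the paper.

```python
import math

def mac(channel):
    """MAC: maximum activation over all spatial positions of one channel."""
    return max(channel)

def spoc(channel):
    """SPoC-style pooling, written here as the mean over spatial positions."""
    return sum(channel) / len(channel)

def gem(channel, p=3.0, eps=1e-6):
    """GeM: generalized mean with exponent p; activations are clamped to eps."""
    clamped = [max(x, eps) for x in channel]
    return (sum(x ** p for x in clamped) / len(clamped)) ** (1.0 / p)

def l2_normalize(vec):
    """Scale a vector to unit Euclidean norm (left unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def describe(feature_map, pool):
    """Pool each channel to one value, then L2-normalize the descriptor."""
    return l2_normalize([pool(ch) for ch in feature_map])

# Toy feature map (assumed data): 2 channels, each flattened to 4 positions.
fmap = [[0.0, 1.0, 2.0, 3.0], [4.0, 0.5, 0.5, 1.0]]
descriptor = describe(fmap, gem)
```

With p = 1 the GeM value reduces to the mean, and as p grows it approaches the maximum, which is why GeM is usually described as generalizing both SPoC and MAC.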

Loss | Network | Oxford5k | Oxford5k(W) | Paris6k | Paris6k(W) |
---|---|---|---|---|---|
Triplet loss [30] | VGG | 81.48 | 82.80 | 82.79 | 84.78 |
Ours | VGG | 82.39 | 83.07 | 83.61 | 85.45 |
Triplet loss [30] | ResNet | 81.49 | 85.33 | 87.70 | 91.11 |
Ours | ResNet | 82.88 | 86.54 | 89.33 | 91.97 |
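
For orientation, the triplet loss baseline in the table can be written, in one common form, as a hinge on the difference between the anchor-positive and anchor-negative distances. The sketch below assumes Euclidean distance and an arbitrary margin of 0.5. It covers only the plain triplet loss, not the relative distribution entropy term of the proposed loss, and its names are illustrative.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Zero once the negative is farther than the positive by at least `margin`."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)

# Assumed toy descriptors: an easy triplet (no loss) and a hard one (positive loss).
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.2])
```

Only triplets that violate the margin contribute a nonzero value, so training effort goes to negatives that sit too close to the anchor.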

Net | Method | F-tuned | Oxford5k | Oxford105k | Paris6k | Paris106k |
---|---|---|---|---|---|---|
VGG | MAC [44] | no | 56.4 | 47.8 | 72.3 | 58.0 |
VGG | SPoC [43] | no | 68.1 | 61.1 | 78.2 | 68.4 |
VGG | Crow [56] | no | 70.8 | 65.3 | 79.7 | 72.2 |
VGG | R-MAC [52] | no | 66.9 | 61.6 | 83.0 | 75.7 |
VGG | BoW-CNN [57] | yes | 73.9 | 59.3 | 82.0 | 64.8 |
VGG | NetVLAD [58] | yes | 71.6 | - | 79.7 | - |
VGG | Fisher [59] | yes | 81.5 | 76.6 | 82.4 | - |
VGG | R-MAC [55] | yes | 83.1 | 78.6 | 87.1 | 79.7 |
VGG | GeM [30] | yes | 87.9 | 83.3 | 87.7 | 81.3 |
VGG | ours | yes | 88.0 | 83.5 | 88.1 | 79.9 |
Res | R-MAC [52] | no | 69.4 | 63.7 | 85.2 | 77.8 |
Res | GeM [30] | yes | 87.8 | 84.6 | 92.7 | 86.9 |
Res | ours | yes | 88.4 | 84.9 | 92.7 | 86.3 |
Re-ranking (R) and Query Expansion (QE) | | | | | | |
VGG | Crow + QE [56] | no | 74.9 | 70.6 | 84.8 | 79.4 |
VGG | R-MAC+R+QE [52] | no | 77.3 | 73.2 | 86.5 | 79.8 |
VGG | BoW-CNN+R+QE [57] | no | 78.8 | 65.1 | 84.8 | 64.1 |
VGG | R-MAC+QE [55] | yes | 89.1 | 87.3 | 91.2 | 86.8 |
VGG | GeM+QE [30] | yes | 91.9 | 89.6 | 91.9 | 87.6 |
VGG | ours | yes | 92.0 | 89.6 | 92.4 | 87.3 |
Res | R-MAC+QE [52] | no | 78.9 | 75.5 | 89.7 | 85.3 |
Res | R-MAC+QE [60] | yes | 90.6 | 89.4 | 96.0 | 93.2 |
Res | GeM+QE [30] | yes | 91.0 | 89.5 | 95.5 | 91.9 |
Res | ours | yes | 91.7 | 89.7 | 96.0 | 92.1 |

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Liu, P.; Shi, L.; Miao, Z.; Jin, B.; Zhou, Q. Relative Distribution Entropy Loss Function in CNN Image Retrieval. Entropy 2020, 22, 321. https://doi.org/10.3390/e22030321