Deep Feature Pyramid Hashing for Efficient Image Retrieval
Abstract
1. Introduction
- We propose Deep Feature Pyramid Hashing (DFPH), which fully exploits the multi-level visual and semantic information of images. Our architecture applies a new feature pyramid network, designed for deep hashing, to the VGG-19 model. The model can therefore learn binary hash codes from several feature scales and fuse them to obtain the final hash codes (a rough architectural sketch in this spirit follows this list).
- To the best of our knowledge, the proposed DFPH is the first feature pyramid network-based method which generates hash codes from multiple feature scales for image retrieval.
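The following is a minimal PyTorch sketch of a DFPH-style network, not the authors' exact implementation: a VGG-19 backbone, a small FPN-like top-down pathway over its last two stages, and per-scale hash branches fused into a single K-bit code. The choice of stages (C4 and C5 only), the 256-channel lateral width, the fusion by concatenation, and the use of torchvision are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg19

class DFPHSketch(nn.Module):
    """Illustrative DFPH-style network: multi-scale hash codes fused into one code."""
    def __init__(self, num_bits: int = 48):
        super().__init__()
        backbone = vgg19(weights=None).features
        self.stage4 = backbone[:27]    # conv1_1 ... relu4_4 -> C4 (512 channels)
        self.stage5 = backbone[27:36]  # pool4, conv5_1 ... relu5_4 -> C5 (512 channels)
        self.lateral4 = nn.Conv2d(512, 256, kernel_size=1)   # 1x1 lateral connections
        self.lateral5 = nn.Conv2d(512, 256, kernel_size=1)
        self.smooth4 = nn.Conv2d(256, 256, kernel_size=3, padding=1)  # 3x3 smoothing conv
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.hash4 = nn.Linear(256, num_bits)   # per-scale hash branches
        self.hash5 = nn.Linear(256, num_bits)
        self.fuse = nn.Linear(2 * num_bits, num_bits)  # fusion of the per-scale codes

    def forward(self, x):
        c4 = self.stage4(x)
        c5 = self.stage5(c4)
        p5 = self.lateral5(c5)
        # Top-down pathway: upsample P5 and merge it with the lateral projection of C4.
        p4 = self.smooth4(self.lateral4(c4) + F.interpolate(p5, size=c4.shape[-2:]))
        h4 = torch.tanh(self.hash4(self.pool(p4).flatten(1)))
        h5 = torch.tanh(self.hash5(self.pool(p5).flatten(1)))
        u = torch.tanh(self.fuse(torch.cat([h4, h5], dim=1)))  # continuous fused codes
        return u  # binary codes at retrieval time: torch.sign(u)
```

At retrieval time, the sign of the fused output would give the binary codes, and images would be ranked by Hamming distance to the query code.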
2. Related Works
3. Proposed Method
3.1. Problem Definition
3.2. Model Architecture
3.3. Objective Function
4. Experiments
4.1. Datasets
4.2. Experimental Settings
4.3. Evaluation Metrics and Baselines
4.4. Results
4.5. Ablation Studies
- (1) Different backbone feature extractors: We replace VGG19 with VGG13 and VGG16, and Table 4 details their performance on the CIFAR-10 dataset. The table shows that employing deeper networks enhances retrieval performance; we therefore adopt VGG19 as our backbone feature extractor.
- (2) Efficiency of the pyramid representations: We compare DFPH with its variants to further explore the effect of features at different scales on performance. We use the single-scale maps of C4 and C5 (Conv 4; Conv 5) and remove the top-down pathway; with this modification, the 3 × 3 convolutions followed by 1 × 1 convolutions are attached directly to the bottom-up pyramid. Table 5 shows the retrieval performance of single-level and multi-level features on CIFAR-10. Using the feature from ‘Conv 5’ alone achieves the best single-level average mAP score of 76.5%, while ‘Conv 4’ alone only reaches 58%. Moreover, DFPH achieves an average mAP score of 82.75%, 6.25% higher than the best single-level result. In Figure 5, we display the precision–recall curves of DFPH for the different scale features. DFPH retains over 80% precision and nearly identical precision–recall curves at 12, 24, 32, and 48 hash bits, and achieves superior precision and recall at the same hash code length compared with single-level features. The binary hash codes perform best when all feature scales are used. This shows that high-level features are more effective at carrying information when creating hash codes; low-level features contribute supplementary information but cannot entirely replace high-level features. The information contained in the features at every scale is essential, which further demonstrates how well DFPH exploits features at all scales. A short sketch of the mAP metric used throughout these comparisons is given after this list.
- (3) Ablation studies of the objective function: We evaluate the impact of the pairwise quantization loss and the classification loss constraints, which reflect the effects of hash coding and classification on CIFAR-10. The experimental configuration is based on the proposed DFPH method, in which two hyperparameters weight the quantization term and the classification term of the objective, respectively. The model is designated DFPH-J3 when one of these weights is set to 0 and DFPH-J2 when the other is set to 0. As seen in Table 6, when neither weight is 0, every term of the proposed loss function constrains the creation of the hash codes, and our method achieves a clear improvement. The two constraints yield nearly identical enhancements. The primary reason is that, within the whole model, they play complementary roles in reducing quantization error and preserving semantics, respectively; eliminating either of them may decrease the performance of the model as a whole. A generic sketch of such a three-term objective is also given after this list.
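Since the ablations above, and Tables 2–6, report mean Average Precision under Hamming ranking, here is a minimal NumPy sketch of that metric. The function and variable names are ours, and the exact evaluation protocol (e.g., any mAP@top-N cut-off) may differ from the paper's.

```python
import numpy as np

def mean_average_precision(query_codes, query_labels, db_codes, db_labels):
    """query_codes: (Q, K), db_codes: (N, K), entries in {-1, +1}; labels are class ids."""
    K = query_codes.shape[1]
    # Hamming distance via inner product of +/-1 codes: d = (K - <q, b>) / 2
    hamming = (K - query_codes @ db_codes.T) / 2          # (Q, N)
    ap_sum = 0.0
    for i in range(len(query_codes)):
        order = np.argsort(hamming[i])                    # rank database by distance
        relevant = (db_labels[order] == query_labels[i]).astype(np.float64)
        if relevant.sum() == 0:
            continue
        precision_at_k = np.cumsum(relevant) / np.arange(1, len(relevant) + 1)
        ap_sum += (precision_at_k * relevant).sum() / relevant.sum()
    return ap_sum / len(query_codes)

# The averages quoted in the ablation follow directly from the tables, e.g.
# np.mean([0.800, 0.827, 0.838, 0.845]) -> 0.8275 (82.75%) for the full DFPH.
```

The loss ablation refers to a pairwise similarity term plus quantization and classification constraints. The sketch below shows a generic three-term deep-hashing objective of that kind, not the exact DFPH formulation from Section 3.3; the weight names `lambda_q` and `lambda_c` and the specific form of each term are assumptions for illustration. Setting either weight to zero mimics the spirit of the DFPH-J2/DFPH-J3 variants.

```python
import torch
import torch.nn.functional as F

def hashing_loss(u, logits, labels, pairwise_sim, lambda_q=0.1, lambda_c=0.1):
    """u: (B, K) continuous codes in (-1, 1); logits: (B, C) classifier outputs;
    labels: (B,) class ids; pairwise_sim: (B, B) with 1 for similar pairs, 0 otherwise."""
    # J1: pairwise negative log-likelihood on inner products of code pairs
    theta = 0.5 * (u @ u.T)
    j1 = (F.softplus(theta) - pairwise_sim * theta).mean()
    # J2: quantization term pulling continuous codes toward binary values
    j2 = (u - torch.sign(u)).pow(2).mean()
    # J3: classification term preserving label semantics
    j3 = F.cross_entropy(logits, labels)
    return j1 + lambda_q * j2 + lambda_c * j3
```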
4.6. Parameter Sensitivity Analysis
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
DFPH | Deep Feature Pyramid Hashing |
CBIR | Content-Based Image Retrieval |
FPN | Feature Pyramid Network |
CNN | Convolutional Neural Network |
DCNN | Deep Convolutional Neural Network |
References
- Belloulata, K.; Belhallouche, L.; Belalia, A.; Kpalma, K. Region Based Image Retrieval using Shape-Adaptive DCT. In Proceedings of the ChinaSIP-14 (2nd IEEE China Summit and International Conference on Signal and Information Processing), Xi’an, China, 9–13 July 2014; pp. 470–474.
- Belalia, A.; Belloulata, K.; Kpalma, K. Region-based image retrieval in the compressed domain using shape-adaptive DCT. Multimed. Tools Appl. 2016, 75, 10175–10199.
- Gionis, A.; Indyk, P.; Motwani, R. Similarity search in high dimensions via hashing. In Proceedings of the VLDB, Edinburgh, UK, 7–10 September 1999; Volume 99, pp. 518–529.
- Wang, J.; Zhang, T.; Song, J.; Sebe, N.; Shen, H.T. A survey on learning to hash. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 769–790.
- Erin Liong, V.; Lu, J.; Wang, G.; Moulin, P.; Zhou, J. Deep hashing for compact binary codes learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2475–2483.
- Zhu, H.; Long, M.; Wang, J.; Cao, Y. Deep hashing network for efficient similarity retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
- Lai, H.; Pan, Y.; Liu, Y.; Yan, S. Simultaneous feature learning and hash coding with deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3270–3278.
- Cakir, F.; He, K.; Bargal, S.A.; Sclaroff, S. Hashing with mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 2424–2437.
- Cao, Z.; Long, M.; Wang, J.; Yu, P.S. Hashnet: Deep learning to hash by continuation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5608–5617.
- Li, W.J.; Wang, S.; Kang, W.C. Feature learning based deep supervised hashing with pairwise labels. arXiv 2015, arXiv:1511.03855.
- Liu, H.; Wang, R.; Shan, S.; Chen, X. Deep supervised hashing for fast image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2064–2072.
- Gordo, A.; Almazán, J.; Revaud, J.; Larlus, D. Deep image retrieval: Learning global representations for image search. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 241–257.
- Jiang, Q.Y.; Li, W.J. Asymmetric deep supervised hashing. In Proceedings of the AAAI Conference on Artificial Intelligence, Edmonton, AB, Canada, 13–17 November 2018; Volume 32.
- Shen, F.; Gao, X.; Liu, L.; Yang, Y.; Shen, H.T. Deep asymmetric pairwise hashing. In Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 1522–1530.
- Yang, W.; Wang, L.; Cheng, S.; Li, Y.; Du, A. Deep Hash with Improved Dual Attention for Image Retrieval. Information 2021, 12, 285.
- Monowar, M.; Hamid, M.; Ohi, A.; Alassafi, M.; Mridha, M. AutoRet: A Self-Supervised Spatial Recurrent Network for Content-Based Image Retrieval. Sensors 2022, 22, 2188.
- Jardim, S.; António, J.; Mora, C.; Almeida, A. A Novel Trademark Image Retrieval System Based on Multi-Feature Extraction and Deep Networks. J. Imaging 2022, 8, 238.
- Chen, Y.; Wang, Z.; Peng, Y.; Zhang, Z.; Yu, G.; Sun, J. Cascaded pyramid network for multi-person pose estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7103–7112.
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
- Wang, W.; Zhao, S.; Shen, J.; Hoi, S.C.; Borji, A. Salient object detection with pyramid attention and salient edges. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 1448–1457.
- Ye, L.; Rochan, M.; Liu, Z.; Wang, Y. Cross-modal self-attention network for referring image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 10502–10511.
- Jin, Z.; Li, C.; Lin, Y.; Cai, D. Density sensitive hashing. IEEE Trans. Cybern. 2013, 44, 1362–1371.
- Andoni, A.; Indyk, P. Near-optimal hashing algorithms for near neighbor problem in high dimension. Commun. ACM 2008, 51, 117–122.
- Kulis, B.; Darrell, T. Learning to hash with binary reconstructive embeddings. Adv. Neural Inf. Process. Syst. 2009, 22, 1.
- Liu, H.; Ji, R.; Wu, Y.; Liu, W. Towards optimal binary code learning via ordinal embedding. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016.
- Wang, J.; Wang, J.; Yu, N.; Li, S. Order preserving hashing for approximate nearest neighbor search. In Proceedings of the 21st ACM International Conference on Multimedia, Barcelona, Spain, 21–25 October 2013; pp. 133–142.
- Shen, F.; Shen, C.; Liu, W.; Tao Shen, H. Supervised discrete hashing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 37–45.
- Salakhutdinov, R.; Hinton, G. Semantic hashing. Int. J. Approx. Reason. 2009, 50, 969–978.
- Zhang, S.; Li, J.; Jiang, M.; Yuan, P.; Zhang, B. Scalable discrete supervised multimedia hash learning with clustering. IEEE Trans. Circuits Syst. Video Technol. 2017, 28, 2716–2729.
- Lin, M.; Ji, R.; Liu, H.; Sun, X.; Wu, Y.; Wu, Y. Towards optimal discrete online hashing with balanced similarity. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27–28 January 2019; Volume 33, pp. 8722–8729.
- Jegou, H.; Douze, M.; Schmid, C. Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 117–128.
- Weiss, Y.; Torralba, A.; Fergus, R. Spectral hashing. Adv. Neural Inf. Process. Syst. 2008, 21, 1.
- Gong, Y.; Lazebnik, S.; Gordo, A.; Perronnin, F. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 2916–2929.
- Liu, W.; Wang, J.; Kumar, S.; Chang, S.F. Hashing with graphs. In Proceedings of the ICML, Bellevue, WA, USA, 28 June–2 July 2011.
- Datar, M.; Immorlica, N.; Indyk, P.; Mirrokni, V.S. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the Twentieth Annual Symposium on Computational Geometry, Brooklyn, NY, USA, 8–11 June 2004; pp. 253–262.
- Liu, W.; Wang, J.; Ji, R.; Jiang, Y.G.; Chang, S.F. Supervised hashing with kernels. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2074–2081.
- Norouzi, M.; Fleet, D.J. Minimal loss hashing for compact binary codes. In Proceedings of the ICML, Bellevue, WA, USA, 28 June–2 July 2011.
- Xia, R.; Pan, Y.; Lai, H.; Liu, C.; Yan, S. Supervised hashing for image retrieval via image representation learning. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Quebec City, QC, Canada, 27–31 July 2014.
- Cao, Y.; Liu, B.; Long, M.; Wang, J. Hashgan: Deep learning to hash with pair conditional wasserstein gan. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1287–1296.
- Zhuang, B.; Lin, G.; Shen, C.; Reid, I. Fast training of triplet-based deep binary embedding networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 5955–5964.
- Liu, B.; Cao, Y.; Long, M.; Wang, J.; Wang, J. Deep triplet quantization. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 22–26 October 2018; pp. 755–763.
- Yang, H.F.; Lin, K.; Chen, C.S. Supervised learning of semantics-preserving hash via deep convolutional neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 437–451.
- Wang, M.; Zhou, W.; Tian, Q.; Li, H. A general framework for linear distance preserving hashing. IEEE Trans. Image Process. 2017, 27, 907–922.
- Shen, F.; Xu, Y.; Liu, L.; Yang, Y.; Huang, Z.; Shen, H.T. Unsupervised deep hashing with similarity-adaptive and discrete optimization. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 3034–3044.
- Fang, Y.; Li, P.; Zhang, J.; Ren, P. Cohesion Intensive Hash Code Book Co-construction for Efficiently Localizing Sketch Depicted Scenes. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–16.
- Jiang, Q.Y.; Li, W.J. Discrete latent factor model for cross-modal hashing. IEEE Trans. Image Process. 2019, 28, 3490–3501.
- Lin, J.; Li, Z.; Tang, J. Discriminative Deep Hashing for Scalable Face Image Retrieval. In Proceedings of the IJCAI, Melbourne, Australia, 19–25 August 2017; pp. 2266–2272.
- Yang, Y.; Geng, L.; Lai, H.; Pan, Y.; Yin, J. Feature pyramid hashing. In Proceedings of the 2019 on International Conference on Multimedia Retrieval, Ottawa, ON, Canada, 10–13 June 2019; pp. 114–122.
- Ng, W.W.; Li, J.; Tian, X.; Wang, H.; Kwong, S.; Wallace, J. Multi-level supervised hashing with deep features for efficient image retrieval. Neurocomputing 2020, 399, 171–182.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. 2009. Available online: https://www.cs.utoronto.ca/~kriz/learning-features-2009-TR.pdf (accessed on 2 August 2022).
- Bai, J.; Li, Z.; Ni, B.; Wang, M.; Yang, X.; Hu, C.; Gao, W. Loopy residual hashing: Filling the quantization gap for image retrieval. IEEE Trans. Multimed. 2019, 22, 215–228.
- Chua, T.S.; Tang, J.; Hong, R.; Li, H.; Luo, Z.; Zheng, Y. NUS-WIDE: A real-world web image database from National University of Singapore. In Proceedings of the ACM International Conference on Image and Video Retrieval, Santorini Island, Greece, 8–10 July 2009; pp. 1–9.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
- Jiang, Q.Y.; Li, W.J. Scalable graph hashing with feature transformation. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015.
- Wang, J.; Kumar, S.; Chang, S.F. Semi-supervised hashing for large-scale search. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2393–2406.
- Bai, J.; Ni, B.; Wang, M.; Li, Z.; Cheng, S.; Yang, X.; Hu, C.; Gao, W. Deep progressive hashing for image retrieval. IEEE Trans. Multimed. 2019, 21, 3178–3193.
- Cao, Y.; Long, M.; Liu, B.; Wang, J. Deep cauchy hashing for hamming space retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1229–1237.
- Sun, Y.; Yu, S. Deep Supervised Hashing with Dynamic Weighting Scheme. In Proceedings of the 2020 5th IEEE International Conference on Big Data Analytics (ICBDA), Xiamen, China, 8–11 May 2020; pp. 57–62.
Conv Block | Layers | Kernel Size | Feature Size
---|---|---|---
1 | Conv2D | 3 × 3 |
 | Conv2D # | 3 × 3 |
 | MaxPooling | 2 × 2 |
2 | Conv2D | 3 × 3 |
 | Conv2D # | 3 × 3 |
 | MaxPooling | 2 × 2 |
3 | Conv2D | 3 × 3 |
 | Conv2D | 3 × 3 |
 | Conv2D | 3 × 3 |
 | Conv2D # | 3 × 3 |
 | MaxPooling | 2 × 2 |
4 | Conv2D | 3 × 3 |
 | Conv2D | 3 × 3 |
 | Conv2D | 3 × 3 |
 | Conv2D # | 3 × 3 |
 | MaxPooling | 2 × 2 |
5 | Conv2D | 3 × 3 |
 | Conv2D | 3 × 3 |
 | Conv2D | 3 × 3 |
 | Conv2D # | 3 × 3 |
 | MaxPooling | 2 × 2 |
CIFAR-10 (mAP)

Method | 12 Bits | 24 Bits | 32 Bits | 48 Bits
---|---|---|---|---
DFPH | 0.800 | 0.823 | 0.838 | 0.840 |
DPH [57] | 0.698 | 0.729 | 0.749 | 0.755 |
LRH [52] | 0.684 | 0.700 | 0.727 | 0.730 |
HashNet [9] | 0.609 | 0.644 | 0.632 | 0.646 |
DHN [6] | 0.555 | 0.594 | 0.603 | 0.621 |
DNNH [7] | 0.552 | 0.566 | 0.558 | 0.581 |
CNNH [38] | 0.439 | 0.511 | 0.509 | 0.522 |
SDH [27] | 0.285 | 0.329 | 0.341 | 0.356 |
KSH [36] | 0.303 | 0.337 | 0.346 | 0.356 |
ITQ [33] | 0.162 | 0.169 | 0.172 | 0.175 |
SH [32] | 0.127 | 0.128 | 0.126 | 0.129 |
NUS-WIDE (mAP)

Method | 12 Bits | 24 Bits | 32 Bits | 48 Bits
---|---|---|---|---
DFPH | 0.826 | 0.850 | 0.853 | 0.859 |
DPH [57] | 0.770 | 0.784 | 0.790 | 0.786 |
LRH [52] | 0.726 | 0.775 | 0.774 | 0.780 |
DHN [6] | 0.708 | 0.735 | 0.748 | 0.758 |
HashNet [9] | 0.643 | 0.694 | 0.737 | 0.750 |
DNNH [7] | 0.674 | 0.697 | 0.713 | 0.715 |
CNNH [38] | 0.611 | 0.618 | 0.625 | 0.608 |
SDH [27] | 0.568 | 0.600 | 0.608 | 0.637 |
KSH [36] | 0.556 | 0.572 | 0.581 | 0.588 |
ITQ [33] | 0.452 | 0.468 | 0.472 | 0.477 |
SH [32] | 0.454 | 0.406 | 0.405 | 0.400 |
Method | 12 Bits | 24 Bits | 32 Bits | 48 Bits | Number of Conv Layers
---|---|---|---|---|---
VGG13 | 0.759 | 0.798 | 0.782 | 0.796 | 10 |
VGG16 | 0.763 | 0.824 | 0.824 | 0.821 | 13 |
VGG19 | 0.800 | 0.823 | 0.838 | 0.840 | 16 |
CIFAR-10 (mAP)

Method | 12 Bits | 24 Bits | 32 Bits | 48 Bits
---|---|---|---|---
Conv 5 | 0.634 | 0.811 | 0.801 | 0.814
Conv 4 | 0.540 | 0.530 | 0.634 | 0.619
DFPH | 0.800 | 0.827 | 0.838 | 0.845 |
CIFAR-10 (mAP)

Method | 12 Bits | 24 Bits | 32 Bits | 48 Bits
---|---|---|---|---
DFPH-J2 | 0.721 | 0.780 | 0.809 | 0.771
DFPH-J3 | 0.771 | 0.766 | 0.783 | 0.809
DFPH | 0.800 | 0.827 | 0.838 | 0.845 |