
5 March 2024

Deep Supervised Hashing by Fusing Multiscale Deep Features for Image Retrieval †

1 RCAM Laboratory, Telecommunications Department, Sidi Bel Abbes University, Sidi Bel Abbes 22000, Algeria
2 School of Computer Sciences, Sidi Bel Abbes 22000, Algeria
* Author to whom correspondence should be addressed.
This paper is an extended version of our paper published in the 1st National Conference on New Educational Technologies and Informatics, NCNETi’23, Guelma, Algeria, 3–4 October 2023.
This article belongs to the Special Issue Application of Machine Learning and Deep Learning in Pattern Recognition and Biometrics

Abstract

Deep network-based hashing has gained significant popularity in recent years, particularly in the field of image retrieval. However, most existing methods extract semantic information only from the final layer, disregarding the structural information in earlier layers that carries semantic detail crucial for effective hash learning. On the one hand, structural information is important for capturing the spatial relationships between objects in an image. On the other hand, image retrieval tasks often require a more holistic representation of the image, which can be achieved by focusing on its semantic content. Balancing these two aspects is therefore a central consideration in image hashing: it is essential for obtaining both accurate retrieval results and a meaningful representation of the underlying image structure. To address this limitation and improve image retrieval accuracy, we propose a novel deep hashing method called Deep Supervised Hashing by Fusing Multiscale Deep Features (DSHFMDF). Our approach extracts multiscale features from multiple convolutional layers and fuses them to generate more robust representations for efficient image retrieval. The experimental results demonstrate that our method surpasses the performance of state-of-the-art hashing techniques, with absolute increases of 11.1% and 8.3% in Mean Average Precision (MAP) on the CIFAR-10 and NUS-WIDE datasets, respectively.

1. Introduction

Advances in the internet and communication technologies have led to an overwhelming influx of images on the web [,,], making accurate and efficient large-scale image retrieval a challenge. To address this, hash-based image retrieval techniques [] have gained attention due to their ability to generate compact binary codes, offering computational efficiency and storage advantages.
These techniques can be classified as data-independent and data-dependent approaches. Data-independent methods, such as locality-sensitive hashing (LSH) [], use random projections as hash functions but suffer from limitations. They do not utilize auxiliary information, leading to poor retrieval accuracy, and require longer codes, consuming more storage [,,]. In contrast, data-dependent methods leverage training information to learn hashing functions, resulting in shorter codes with improved performance. They can be further categorized as unsupervised [,,,] or supervised hashing methods [,,,,,,].
Deep hashing techniques [,,,] have emerged following the success of deep neural networks in computer vision tasks. Unlike traditional hashing methods, they can effectively extract high-level semantic features and support end-to-end frameworks for generating binary codes. Nevertheless, a drawback of many existing deep hashing techniques [,,] is their reliance on features from the penultimate layer of fully connected networks, which serve as global image descriptors but fail to capture local characteristics.
To overcome these challenges, this paper proposes a novel deep hashing method, called Deep Supervised Hashing by Fusing Multiscale Deep Features (DSHFMDF), which effectively captures multiscale object information. Specifically, it extracts features from different network stages, fuses them at the fusion layer, and encodes them into robust hash codes. The network uses various hashing results based on different scale features, enhancing retrieval recall without sacrificing precision. The key contributions of this paper are as follows:
1. Integration of Structural Information: Multiscale deep hashing allows for the integration of structural information from images at multiple levels of granularity. Different scales capture varying levels of structural detail in the image. By considering these multiple scales, the model can learn to encode both low-level details and high-level structural features, resulting in more informative hash codes.
2. Feature Hierarchy: Deep neural networks used in multiscale hashing often have multiple layers, each capturing features at a different abstraction level. Lower layers capture finer details, while higher layers capture more abstract features or structures. This hierarchical feature representation helps strike a balance by encoding both fine-grained structural details and higher-level abstract information, addressing the trade-off.
3. Adaptability to Task-Specific Needs: Multiscale deep hashing can be tailored to the specific requirements of the retrieval task. For tasks where preserving structural information is crucial, appropriate design choices can be made to prioritize encoding such information in the hash codes.
4. Quantization and Bit Allocation: Hashing involves quantization of continuous features into binary codes. The bit allocation strategy can be optimized to balance capturing structural information against achieving high retrieval accuracy. Techniques such as adaptive bit allocation can be employed to allocate more bits to preserving crucial structural details while still maintaining retrieval accuracy.
5. Learning Objectives and Loss Functions: The design of learning objectives and loss functions can be customized to emphasize the preservation of structural information. Including terms in the loss function that encourage the preservation of certain structural features can guide the learning process.

3. Proposed Method

In this section, we provide a comprehensive explanation of our proposed approach, Deep Supervised Hashing by Fusing Multiscale Deep Features (DSHFMDF). We start by defining the problem of learning hash codes and subsequently introduce the architecture of our model. Finally, we describe the objective function of our proposed DSHFMDF method.

3.1. Problem Definition

Let $X = \{x_i\}_{i=1}^{N} \in \mathbb{R}^{d \times N}$ represent a training dataset consisting of $N$ images. Here, $Y = \{y_i\}_{i=1}^{N} \in \mathbb{R}^{K \times N}$ denotes the ground-truth labels for the samples $x_i$, where $K$ represents the number of classes. The pairwise label matrix $S = \{s_{ij}\}$ indicates the semantic similarity between training image samples, with $s_{ij} \in \{0, 1\}$. If $s_{ij} = 1$, samples $x_i$ and $x_j$ are semantically similar, whereas $s_{ij} = 0$ indicates that they are not. The objective of deep hashing methods is to learn a deep hash function $f : x \rightarrow b \in \{-1, 1\}^{L}$ that encodes each input $x_i$ into a binary code $b_i \in \{-1, 1\}^{L}$. Here, $L$ represents the length of the binary codes.
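For concreteness, the following minimal sketch (our own illustration, not code from the paper) shows one standard way to build the pairwise similarity matrix $S$ from label vectors, marking two images as similar when they share at least one class label:

```python
import torch

def pairwise_similarity(labels: torch.Tensor) -> torch.Tensor:
    """Build S = {s_ij} from a label matrix of shape (N, K).

    s_ij = 1 if images i and j share at least one class label, 0 otherwise.
    Works for one-hot (single-label) and multi-hot (multi-label) annotations.
    """
    overlap = labels.float() @ labels.float().t()   # (N, N); > 0 means a shared label
    return (overlap > 0).float()

# Example: three images, four classes, one-hot labels
y = torch.tensor([[1, 0, 0, 0],
                  [1, 0, 0, 0],
                  [0, 0, 1, 0]])
print(pairwise_similarity(y))
# tensor([[1., 1., 0.],
#         [1., 1., 0.],
#         [0., 0., 1.]])
```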

3.2. Model Architecture

Figure 1 shows the network architecture of the proposed deep hashing method for image retrieval. Our model architecture is designed to be comprehensive and effective, consisting of five main components: (1) feature extraction; (2) feature reduction; (3) feature fusion; (4) hash coding; and (5) classification. By describing each component in detail, we can better understand how each contributes to the overall functionality of the model.
Figure 1. Deep Supervised Hashing by Fusing Multiscale Deep Features (DSHFMDF).
To begin with, the feature extraction stage plays a crucial role in capturing relevant information from the input image. In our approach, we leverage the VGG-19 network as the backbone, which is a deep convolutional neural network known for its ability to extract rich and discriminative features. The VGG-19 architecture comprises several layers, each responsible for extracting features at different levels of abstraction.
During the feature extraction process, we extract features from multiple levels of the VGG-19 network. This includes low-level features that capture structural information at a local level, such as edges and corners, as well as high-level features that capture more abstract, semantic information about the image. By considering features from multiple levels, we aim to capture both fine-grained details and high-level semantic concepts in the image representation.
Specifically, we use the feature output of the last convolutional layer within each convolutional block (‘conv3’, ‘conv4’, and ‘conv5’), as well as the fully connected layer ‘fc1’. We deliberately exclude ‘conv1’ and ‘conv2’ from this process because of their substantial memory footprint and relatively low semantic information content.
Table 1 provides a detailed overview of the convolutional layers’ parameters and the corresponding feature sizes of different convolutional blocks. This multi-level feature-extraction strategy enables us to capture a wide spectrum of image characteristics, ranging from low-level structural details, such as edges and corners, to high-level semantic concepts. By considering features from multiple levels of abstraction, our goal is to create a comprehensive image representation that encompasses fine-grained details and high-level semantic information. This approach allows us to effectively balance local and global information, leading to a more robust and discriminative feature representation for our specific task.
Table 1. Specifics of the feature extraction network are as follows. It is important to note that we utilize the features from layers marked with ‘#’. For the sake of simplicity, we have omitted the ReLU and Batch Normalization layers.
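To make this extraction step concrete, the sketch below (our own illustration, using the torchvision VGG-19 implementation; the layer indices are tied to that implementation and are not taken from the paper) registers forward hooks on the last convolutional layer of blocks conv3–conv5 and on the first fully connected layer, which plays the role of ‘fc1’:

```python
import torch
from torchvision import models

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)  # recent torchvision API
vgg.eval()

# Last conv layer of blocks conv3/conv4/conv5 and the first FC layer ('fc1')
# in torchvision's VGG-19 (indices assume the non-batch-norm variant).
taps = {
    "conv3": vgg.features[16],
    "conv4": vgg.features[25],
    "conv5": vgg.features[34],
    "fc1":   vgg.classifier[0],
}

features = {}

def make_hook(name):
    def hook(module, inputs, output):
        features[name] = output.detach()
    return hook

for name, layer in taps.items():
    layer.register_forward_hook(make_hook(name))

x = torch.randn(1, 3, 224, 224)        # dummy ImageNet-sized input
with torch.no_grad():
    vgg(x)

for name, feat in features.items():
    print(name, tuple(feat.shape))
# conv3 (1, 256, 56, 56), conv4 (1, 512, 28, 28),
# conv5 (1, 512, 14, 14), fc1 (1, 4096)
```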
After feature extraction, we move on to the feature reduction stage, where we aim to reduce the dimensionality of the extracted features while preserving their discriminative power. To achieve this, we employ a 1 × 1 convolutional kernel, which acts as a linear combination of the features from different levels. This process helps enhance the depth and robustness of the extracted features while reducing redundancy.
Next, we proceed to the feature fusion layer, which consists of 1024 nodes. At this layer, we connect and combine the different feature levels, allowing for the integration of both low-level and high-level information. This fusion of features from multiple levels helps capture a comprehensive representation of the image, combining both local structural details and global semantic information.
In order to approximate hash codes, we perform a nonlinear mapping of the features from the fusion layer and the fully connected layer (FC). This mapping is accomplished using hash layers, which consist of L nodes representing the desired length of the hash codes. The nonlinear mapping ensures that the generated hash codes capture the essential characteristics of the image representation in a compact and efficient manner.
Moving forward, we concatenate the two hashing layers, resulting in a consolidated representation of the hash codes. This concatenated layer is then connected to the final hashing layer, which further refines the representation and prepares it for classification. By arranging the architecture in this manner, we aim to enhance the preservation of semantic information in the generated hash codes, ensuring that they possess meaningful and discriminative properties.
The classification layer is the last component of our model architecture. It contains neurons equal to the number of classes in the dataset, allowing the network to classify the images based on the learned representations. The classification layer takes advantage of the discriminative power of the hash codes to accurately assign images to their respective classes.
Through the comprehensive approach outlined above, our model utilizes different hashing outcomes based on the various feature levels. This leads to improved image retrieval performance, as the hash codes capture both local and global information. Furthermore, the learning process of the hash codes ensures the preservation of pairwise similarity and the maintenance of semantic information. This results in more meaningful and effective image retrieval based on the learned representations.
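The sketch below summarizes how these components could be wired together in PyTorch. It is our reading of the description above rather than the authors' implementation: the 256-channel reduction width, the global average pooling of the reduced maps, and the tanh activations on the hash layers are assumptions, while the 1024-node fusion layer, the L-node hash layers, their concatenation, the final hash layer, and the classifier on top of the codes follow the text.

```python
import torch
import torch.nn as nn

class DSHFMDFHead(nn.Module):
    """Illustrative sketch of the hashing head described above (our reading of the
    architecture; pooling and the 256-channel reduction are assumptions)."""

    def __init__(self, code_len=48, num_classes=10):
        super().__init__()
        # (2) feature reduction: 1x1 convolutions on the conv3/conv4/conv5 maps
        self.reduce3 = nn.Conv2d(256, 256, kernel_size=1)
        self.reduce4 = nn.Conv2d(512, 256, kernel_size=1)
        self.reduce5 = nn.Conv2d(512, 256, kernel_size=1)
        self.pool = nn.AdaptiveAvgPool2d(1)          # assumed spatial pooling
        # (3) feature fusion layer with 1024 nodes
        self.fusion = nn.Linear(3 * 256, 1024)
        # (4) hash layers: one on the fused conv features, one on 'fc1' (4096-d)
        self.hash_conv = nn.Linear(1024, code_len)
        self.hash_fc = nn.Linear(4096, code_len)
        # final hash layer on the concatenated codes
        self.hash_final = nn.Linear(2 * code_len, code_len)
        # (5) classification layer on top of the hash codes
        self.classifier = nn.Linear(code_len, num_classes)

    def forward(self, c3, c4, c5, fc1):
        reduced = [self.pool(red(c)).flatten(1)
                   for red, c in ((self.reduce3, c3),
                                  (self.reduce4, c4),
                                  (self.reduce5, c5))]
        fused = torch.relu(self.fusion(torch.cat(reduced, dim=1)))
        u = torch.tanh(self.hash_final(torch.cat(
            [torch.tanh(self.hash_conv(fused)),
             torch.tanh(self.hash_fc(fc1))], dim=1)))
        return u, self.classifier(u)                 # relaxed codes u and class logits
```

At retrieval time the binary codes are obtained from the relaxed outputs as $b_i = \mathrm{sgn}(u_i)$, as described in Section 3.3.2.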

3.3. Objective Function

To ensure the learning of similarity-preserving hash codes, our DSHFMDF approach employs three loss functions: pairwise similarity loss, quantization loss, and classification loss. These losses are combined to train our model effectively.

3.3.1. Pairwise Similarity Loss

Our DSHFMDF strives to preserve the similarity between pairs of inputs in the Hamming space. We measure pairwise similarity with the inner product: the Hamming distance between hash codes $b_i$ and $b_j$ relates to their inner product $\langle b_i, b_j \rangle$ through $dist_H(b_i, b_j) = \frac{1}{2}\left(L - b_i^{T} b_j\right)$.
Given the binary codes $B = \{b_i\}_{i=1}^{N}$ and the pairwise labels $S = \{s_{ij}\}$, the likelihood of the pairwise labels is formulated as
$$
p(s_{ij} \mid B) =
\begin{cases}
\sigma(w_{ij}), & s_{ij} = 1 \\
1 - \sigma(w_{ij}), & s_{ij} = 0
\end{cases}
$$
where $\sigma(w_{ij}) = \frac{1}{1 + e^{-w_{ij}}}$ and $w_{ij} = \frac{1}{2} b_i^{T} b_j$.
This formulation implies that a larger inner product $\langle b_i, b_j \rangle$ corresponds to a smaller $dist_H(b_i, b_j)$ and a higher value of $p(1 \mid b_i, b_j)$. Thus, when $s_{ij} = 1$, the binary codes $b_i$ and $b_j$ are considered similar.
Taking the negative log-likelihood of the observed labels in $S$, we obtain the following optimization problem:
$$
J_1 = -\log p(S \mid B) = -\sum_{s_{ij} \in S} \left( s_{ij} w_{ij} - \log\left(1 + e^{w_{ij}}\right) \right)
$$
The optimization problem described above seeks to minimize Hamming distance between similar samples while maximizing the distance between dissimilar points. This objective is in line with the goals of pairwise similarity-based hashing techniques.
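In practice this loss is computed on the relaxed hash outputs $u_i$ introduced in the next subsection. A minimal sketch (our own, written with the identity $\log(1 + e^{w}) = \mathrm{softplus}(w)$ for numerical stability) is:

```python
import torch
import torch.nn.functional as F

def pairwise_similarity_loss(u: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
    """J1 for a mini-batch: u is (Q, L) relaxed hash outputs,
    s is the (Q, Q) pairwise similarity matrix with entries in {0, 1}."""
    w = 0.5 * u @ u.t()                 # w_ij = (1/2) u_i^T u_j
    # -(s_ij * w_ij - log(1 + exp(w_ij)))  ==  softplus(w_ij) - s_ij * w_ij
    return (F.softplus(w) - s * w).sum()
```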

3.3.2. Quantization Loss

In practical applications, binary hash codes are commonly used to measure similarity. However, optimizing discrete hash codes within a CNN presents challenges. To overcome this, we propose a continuous form of hash coding. The output of the hash layer is defined as $u_i$ and we set $b_i = \mathrm{sgn}(u_i)$.
To minimize the discrepancy between continuous and discrete hash codes, we introduce the quantization loss as the second objective:
$$
J_2 = \sum_{i=1}^{Q} \left\| b_i - u_i \right\|_2^{2}
$$
Here, Q represents the mini-batch size.
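A corresponding sketch (again our own illustration; detaching the sign targets so that gradients flow only through $u_i$ is a common implementation choice, not something specified in the paper):

```python
import torch

def quantization_loss(u: torch.Tensor) -> torch.Tensor:
    """J2: squared L2 distance between relaxed outputs u_i and b_i = sgn(u_i)."""
    b = torch.sign(u).detach()          # binary targets, treated as constants
    return ((b - u) ** 2).sum()
```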

3.3.3. Classification Loss

To ensure robust learning of multiscale features throughout the deep network, we employ the cross-entropy (classification) loss to predict the class labels. The classification loss is formulated as
$$
J_3 = -\sum_{i=1}^{Q} \sum_{k=1}^{K} y_{i,k} \log\left(p_{i,k}\right)
$$
In this context, $y_{i,k}$ denotes the true label and $p_{i,k}$ represents the softmax output of the $i$-th training sample belonging to the $k$-th class.
To summarize, the overall loss function is obtained by combining the losses from pairwise similarity, pairwise quantization, and classification:
$$
J = J_1 + \beta J_2 + \gamma J_3
$$
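Putting the three terms together, a hedged end-to-end sketch of the mini-batch objective might look as follows; the default weights mirror the values reported in Section 4.2 under our assumption that they correspond to $\beta$ and $\gamma$ in that order:

```python
import torch
import torch.nn.functional as F

def total_loss(u, logits, s, labels, beta=0.01, gamma=0.1):
    """J = J1 + beta * J2 + gamma * J3 for one mini-batch (illustrative sketch).

    u:      (Q, L) relaxed hash outputs
    logits: (Q, K) classification outputs
    s:      (Q, Q) pairwise similarity matrix
    labels: (Q,)   integer class labels (single-label case)
    """
    w = 0.5 * u @ u.t()
    j1 = (F.softplus(w) - s * w).sum()                      # pairwise similarity loss
    j2 = ((torch.sign(u).detach() - u) ** 2).sum()          # quantization loss
    j3 = F.cross_entropy(logits, labels, reduction="sum")   # classification loss
    return j1 + beta * j2 + gamma * j3
```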

4. Experiments

We validate the effectiveness of our approach using two publicly available datasets: NUS-WIDE and CIFAR-10. Firstly, we provide a concise overview of these datasets, followed by an exploration of our experimental configurations. Section 4.3 presents the evaluation metrics and baseline methods. Finally, in the concluding section, we present the results of our method, including validations and comparisons with several state-of-the-art hashing techniques.

4.1. Datasets

The CIFAR-10 [] database, as described in the work by Krizhevsky et al. (2009), comprises a collection of 60,000 images categorized into 10 classes. Each image has a dimension of 32 × 32 pixels. Following the approach outlined in [], we randomly choose 100 images per class to serve as queries, resulting in a total of 1000 test instances. The remaining images form the database set. Additionally, we randomly select 500 images per category (totaling 5000) from the database to create the training set.
The NUS-WIDE [] database, introduced by Chua et al. (2009), is a comprehensive collection of approximately 270,000 images sourced from Flickr. It consists of 81 different labels or concepts. For our experiment, we randomly choose 2100 images from 21 classes to form the query database set, while the remaining images serve as the database. Furthermore, we randomly select 10,000 images from the database set to construct the training dataset.
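As an illustration of this protocol, the sketch below builds the CIFAR-10 query/training/database index split from the class labels (our own code, not the authors'; the NUS-WIDE split follows the same pattern with 21 query classes, 100 queries per class, and 10,000 training images drawn from the database):

```python
import numpy as np

def split_cifar10(labels, n_query=100, n_train=500, seed=0):
    """Split CIFAR-10 indices into query / training / database sets:
    100 queries and 500 training images per class; all non-query images
    (including the training images) form the retrieval database."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    query, train = [], []
    for c in range(10):
        idx = rng.permutation(np.flatnonzero(labels == c))
        query.extend(idx[:n_query])
        train.extend(idx[n_query:n_query + n_train])
    database = np.setdiff1d(np.arange(len(labels)), np.array(query))
    return np.array(query), np.array(train), database
```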

4.2. Experimental Settings

To implement DSHFMDF, we utilize PyTorch as our framework. As a base network, we employ a VGG-19 convolutional network that has been pre-trained on the ImageNet dataset []. Throughout our experiments, we train our network using the Adam algorithm [] with a learning rate of 1 × 10⁻⁵. As for the hyperparameters of the cost function, we set $\beta$ to 0.01 and $\gamma$ to 0.1.
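A minimal training-setup sketch matching these settings (the pre-trained backbone, optimizer, and learning rate follow the text; joint optimization of the backbone and the hashing head is our assumption):

```python
import torch
from torchvision import models

# VGG-19 pre-trained on ImageNet as the base network (recent torchvision API)
backbone = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)

# The multiscale hashing head sketched in Section 3.2 would be attached here;
# its parameters and the backbone's are then optimized jointly with Adam, lr = 1e-5.
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-5)
```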

4.3. Evaluation Metrics

In order to assess the performance of the various approaches, we employ four evaluation metrics: Mean Average Precision (MAP), Precision–Recall (PR) curves, Precision within Hamming radius 2 (P@H = 2), and Precision curves for the top N returned results (P@N).
We compare our proposed DSHFMDF method against several classical and state-of-the-art methods, encompassing five unsupervised shallow methods, two traditional supervised hashing techniques, and eight deep supervised hashing techniques. For the CIFAR-10 and multi-label NUS-WIDE datasets, samples are considered similar if they share at least one semantic label; otherwise, they are considered dissimilar.
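For reference, a self-contained sketch of the MAP computation over Hamming ranking (our own implementation of the standard protocol; top_k=5000 reproduces the MAP@5000 setting used for NUS-WIDE in Table 2):

```python
import numpy as np

def mean_average_precision(query_codes, db_codes, query_labels, db_labels, top_k=None):
    """MAP over Hamming ranking.

    Codes are {-1, +1} arrays of shape (N, L); labels are multi-hot arrays of
    shape (N, K). Two items are relevant to each other when they share a label.
    """
    aps = []
    for q_code, q_label in zip(query_codes, query_labels):
        hamming = 0.5 * (db_codes.shape[1] - db_codes @ q_code)   # Hamming distances
        order = np.argsort(hamming)
        if top_k is not None:
            order = order[:top_k]
        relevant = (db_labels[order] @ q_label) > 0               # shared label?
        if relevant.sum() == 0:
            continue
        ranks = np.arange(1, len(order) + 1)
        precision_at_k = np.cumsum(relevant) / ranks
        aps.append((precision_at_k * relevant).sum() / relevant.sum())
    return float(np.mean(aps))
```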

4.4. Results Discussion

The results obtained from our experiments on the CIFAR-10 and NUS-WIDE datasets, evaluating the performance for various hash code lengths, are presented in Table 2. These results were partially presented in our preliminary paper []. The table clearly shows that our proposed DSHFMDF (Deep Supervised Hashing by Fusing Multiscale Deep Features) method outperforms all the compared methods by a significant margin. Specifically, when compared to SDH (Supervised Discrete Hashing), which is considered one of the top shallow hashing methods, DSHFMDF demonstrates substantial improvements, with absolute increases of 49.4% and 24% in average Mean Average Precision (MAP) on the CIFAR-10 and NUS-WIDE datasets, respectively. This highlights the effectiveness of our approach in generating high-quality hash codes for image retrieval tasks.
Table 2. The Mean Average Precision (MAP) scores for Hamming ranking on CIFAR-10 and NUS-WIDE datasets with different numbers of bits. The MAP values are computed based on the top 5000 retrieved images for the NUS-WIDE dataset.
Furthermore, our results indicate that deep hashing methods, including DSHFMDF, outperform traditional hashing techniques. This can be attributed to the ability of deep hashing methods to generate more robust feature representations, leveraging the power of deep neural networks. Among the deep hashing techniques, DSHFMDF surpasses the second-best method, DMUH (Deep Momentum Uncertainty Hashing), achieving average MAP increases of 1.275% and 3.1% on the CIFAR-10 and NUS-WIDE datasets, respectively. These findings highlight the superiority of DSHFMDF in capturing and preserving semantic information in the hash codes, leading to improved retrieval performance.
In order to provide a more detailed analysis of our results, we present Precision curves (P@H = 2) in Figure 2a and Figure 3a, which illustrate the retrieval performance of different methods. Notably, the Precision curves clearly demonstrate that our proposed DSHFMDF model consistently outperforms the other methods as the code length increases. This indicates that our method maintains the highest precision rate, even when longer hash codes are used, showcasing its effectiveness in producing accurate and reliable retrieval results.
Figure 2. The results obtained by comparing various methods on the CIFAR-10 dataset using three evaluation metrics.
Figure 3. The results of comparing different approaches on the NUS-WIDE dataset using three evaluation metrics.
To further evaluate the performance of DSHFMDF, we analyze its Precision–Recall (PR) performance and Precision at N (P@N) measures compared to other approaches. Figure 2b,c and Figure 3b,c present the PR performance and P@N results for the CIFAR-10 and NUS-WIDE datasets, respectively. In Figure 2c and Figure 3c, it is evident that the proposed DSHFMDF method achieves the highest precision when using 48-bit hash codes. This demonstrates its effectiveness in generating precise retrieval results. Additionally, Figure 2b and Figure 3b show consistently high precision levels at low recall, which is of great importance in precision-oriented retrieval tasks and has practical application in various systems.
In conclusion, our DSHFMDF method surpasses the compared methods across multiple evaluation aspects, highlighting its superiority in image retrieval tasks. To provide visual evidence of its effectiveness in removing irrelevant images, we present Figure 4, which showcases the retrieval accuracy of various image categories in the CIFAR-10 dataset using DSHFMDF with 48-bit binary codes. The figure includes the query images in the first column, while the subsequent columns display the retrieved images using DSHFMDF. This example further emphasizes the ability of our approach to accurately retrieve relevant images and demonstrates its practical utility.
Figure 4. The top 20 retrieved results from the CIFAR-10 dataset using DSHFMDF with 48-bit hash codes are presented. The query images are displayed in the first column, while the subsequent columns showcase the retrieval results obtained by DSHFMDF.
Overall, our extensive experiments and detailed analysis validate the superiority of the proposed DSHFMDF method in generating high-quality hash codes, improving retrieval performance, and achieving accurate and precise image retrieval results.

4.5. Ablation Studies

(1)
In our ablation studies, we explored different fundamental feature extractors. Specifically, we replaced VGG19 with VGG13 and VGG16, and the performance of these models on the CIFAR-10 dataset is presented in Table 3. The table illustrates that using deeper neural networks can improve image retrieval performance, leading us to choose VGG19 as our primary feature extractor.
Table 3. Mean Average Precision (MAP) of Hamming ranking for different numbers of bits using VGG13, VGG16, and VGG19 as the fundamental feature extractors on the CIFAR-10 dataset. The best results are highlighted in bold.
(2)
Ablation studies on multi-level image representations for improved hash learning: In our ablation studies, we conducted a thorough investigation into the impact of different levels of image representations on the performance of our DSHFMDF method. Unlike many existing approaches that primarily focus on extracting semantic information from the penultimate layer in fully connected networks, we recognized the importance of structural information that contains crucial semantic details for effective hash learning. To address this, we considered using feature maps from various layers, ranging from low-level to high-level features.
Table 4 presents the results of these experiments, focusing on the retrieval performance obtained with different feature maps on CIFAR-10. Notably, using features from ‘fc1’ alone led to the highest single-scale average mAP score of 69%, emphasizing the importance of high-level features in capturing semantic information. In contrast, using features from ‘conv3–5’ alone yielded an average mAP score of only 59.4%, showing that low-level structural details carry useful but less discriminative information on their own. Furthermore, our proposed DSHFMDF method outperformed both variants, achieving an average mAP score of 82.15%. This result demonstrates that combining features from different scales, including both low-level and high-level features, significantly enhances the performance of our method.
Table 4. Mean Average Precision (MAP) of Hamming ranking of different scales for different numbers of bits on CIFAR-10.
In Figure 5, we display the Precision–Recall curves of DSHFMDF for the various scale features. Our DSHFMDF retains over 80% Precision and nearly identical Precision–Recall curves at 12, 24, 32, and 48 hash bits. DSHFMDF achieves superior Precision and Recall at the same hash code length compared to the single-scale variants, and the binary hash codes perform best when all feature scales are used. This indicates that high-level features are more effective at carrying information when creating hash codes; while low-level features can contribute supplementary information to the high-level features, they cannot entirely replace them. The information contained in each scale's features is essential, which further demonstrates how well DSHFMDF makes use of features from all scales.
Figure 5. Precision–Recall curve of different scale features on CIFAR-10 with 12, 24, 32, and 48 hash bits.
(3)
Ablation studies of the objective function: We conducted ablation studies on our objective function to assess the influence of the pairwise quantization loss and the classification loss constraints, which examine the impact of hash coding and classification on CIFAR-10. We based our experiments on the proposed DSHFMDF method, where $\beta$ and $\gamma$ are the key parameters for $J_2$ and $J_3$. When $\beta = 0$, the model is referred to as DSHFMDF-J3, and when $\gamma = 0$, it is DSHFMDF-J2. As shown in Table 5, when neither $\beta$ nor $\gamma$ is set to zero, each component in the proposed loss function contributes to hash code creation, resulting in a 6.55% performance improvement. Both $J_2$ and $J_3$ yield similar enhancements because they reduce quantization errors and preserve semantics in the overall model. Eliminating either $J_2$ or $J_3$ may reduce the model's performance.
Table 5. mAP values of different variants of our objective function for CIFAR-10 dataset.

4.6. Parameter Sensitivity Analysis

To examine how hyper-parameters β and γ impact retrieval performance, we conducted experiments spanning the values of β and γ from 0.0001 to 0.1, with a tenfold increase, using a 48-bit code on the CIFAR-10 dataset. In Figure 6, we present the changes in mAP curves for DSHFMDF as β and γ vary. Initially, mAP improves as β and γ increase, but performance levels off after reaching their peaks. Notably, our method maintains satisfactory performance over a wide range of β and γ values, demonstrating its robustness to these parameters.
Figure 6. Parameter sensitivity analysis, on CIFAR-10 dataset with 48 bits, of the proposed DSHFMDF.

5. Conclusions and Future Work

This paper presents an end-to-end approach called Deep Supervised Hashing by Fusing Multiscale Deep Features (DSHFMDF) for image retrieval. Our proposed method focuses on generating robust binary codes by optimizing the similarity loss, quantization loss, and semantic loss. Moreover, the network leverages various hashing results based on multiscale features, thereby enhancing retrieval recall while preserving precision. In summary, multiscale deep hashing offers a framework to efficiently incorporate structural information into hash representations, aiding in the balance between structural information and image retrieval accuracy. The careful design of architectures, loss functions, and bit allocation strategies is crucial in achieving this balance based on the specific requirements of the application. Through extensive experiments on two image retrieval datasets, we demonstrate the superiority of our method over other state-of-the-art hashing techniques. As a natural extension of this research, future work will explore the application of our framework to medical image datasets. Medical images often contain objects at various scales, with many being very small. We believe that the use of our method will significantly enhance retrieval performance in the context of medical imaging.

Author Contributions

Conceptualization, A.R. and K.B.; methodology, K.B.; software, A.R.; validation, K.B. and A.R.; formal analysis, A.R., A.B. and K.B.; investigation, K.B. and A.B.; writing—original draft preparation, A.R. and K.B.; writing—review and editing, K.B. and A.R.; visualization, A.R. and A.B.; supervision, K.B. and A.B.; project administration, K.B.; funding acquisition, K.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: http://www.cs.toronto.edu/~kriz/cifar.html, https://paperswithcode.com/datasets (all accessed on 20 July 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
DSHFMDF: Deep Supervised Hashing by Fusing Multiscale Deep Features
CBIR: Content-Based Image Retrieval
FPN: Feature Pyramid Network
CNN: Convolutional Neural Network
DCNN: Deep Convolutional Neural Network

References

  1. Yan, C.; Shao, B.; Zhao, H.; Ning, R.; Zhang, Y.; Xu, F. 3D room layout estimation from a single RGB image. IEEE Trans. Multimed. 2020, 22, 3014–3024. [Google Scholar] [CrossRef]
  2. Yan, C.; Li, Z.; Zhang, Y.; Liu, Y.; Ji, X.; Zhang, Y. Depth image denoising using nuclear norm and learning graph model. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2020, 16, 1–17. [Google Scholar] [CrossRef]
  3. Li, S.; Chen, Z.; Li, X.; Lu, J.; Zhou, J. Unsupervised variational video hashing with 1d-cnn-lstm networks. IEEE Trans. Multimed. 2019, 22, 1542–1554. [Google Scholar] [CrossRef]
  4. Wang, J.; Zhang, T.; Song, J.; Sebe, N.; Shen, H.T. A survey on learning to hash. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 769–790. [Google Scholar] [CrossRef]
  5. Gionis, A.; Indyk, P.; Motwani, R. Similarity search in high dimensions via hashing. In Proceedings of the VLDB, Edinburgh, UK, 7–10 September 1999; Volume 99, pp. 518–529. [Google Scholar]
  6. Andoni, A.; Indyk, P. Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 2008, 51, 117–122. [Google Scholar] [CrossRef]
  7. Indyk, P.; Motwani, R. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the Thirtieth Annual ACM Symposium on Theory of Computing, Dallas, TX, USA, 24–26 May 1998; pp. 604–613. [Google Scholar]
  8. Gong, Y.; Lazebnik, S.; Gordo, A.; Perronnin, F. Iterative quantization: A procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 2916–2929. [Google Scholar] [CrossRef]
  9. Liu, W.; Wang, J.; Mu, Y.; Kumar, S.; Chang, S.F. Compact hyperplane hashing with bilinear functions. arXiv 2012, arXiv:1206.4618. [Google Scholar]
  10. Gong, Y.; Kumar, S.; Rowley, H.A.; Lazebnik, S. Learning binary codes for high-dimensional data using bilinear projections. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 484–491. [Google Scholar]
  11. Lin, G.; Shen, C.; Wu, J. Optimizing ranking measures for compact binary code learning. In Proceedings of the European Conference on Computer Vision, Zurich, Switzerland, 6–12 September 2014; Springer: Cham, Switzerland, 2014; pp. 613–627. [Google Scholar]
  12. Kulis, B.; Darrell, T. Learning to hash with binary reconstructive embeddings. Adv. Neural Inf. Process. Syst. 2009, 22, 1042–1050. [Google Scholar]
  13. Strecha, C.; Bronstein, A.; Bronstein, M.; Fua, P. LDAHash: Improved matching with smaller descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 34, 66–78. [Google Scholar] [CrossRef] [PubMed]
  14. Lin, G.; Shen, C.; Suter, D.; Van Den Hengel, A. A general two-step approach to learning-based hashing. In Proceedings of the IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2552–2559. [Google Scholar]
  15. Lin, G.; Shen, C.; Shi, Q.; Van den Hengel, A.; Suter, D. Fast supervised hashing with decision trees for high-dimensional data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1963–1970. [Google Scholar]
  16. Chen, Z.; Zhou, J. Collaborative multiview hashing. Pattern Recognit. 2018, 75, 149–160. [Google Scholar] [CrossRef]
  17. Cui, Y.; Jiang, J.; Lai, Z.; Hu, Z.; Wong, W. Supervised discrete discriminant hashing for image retrieval. Pattern Recognit. 2018, 78, 79–90. [Google Scholar] [CrossRef]
  18. Song, J.; Gao, L.; Liu, L.; Zhu, X.; Sebe, N. Quantization-based hashing: A general framework for scalable image and video retrieval. Pattern Recognit. 2018, 75, 175–187. [Google Scholar] [CrossRef]
  19. Erin Liong, V.; Lu, J.; Wang, G.; Moulin, P.; Zhou, J. Deep hashing for compact binary codes learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2475–2483. [Google Scholar]
  20. Zhu, H.; Long, M.; Wang, J.; Cao, Y. Deep hashing network for efficient similarity retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30. [Google Scholar]
  21. Lai, H.; Pan, Y.; Liu, Y.; Yan, S. Simultaneous feature learning and hash coding with deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3270–3278. [Google Scholar]
  22. Cakir, F.; He, K.; Bargal, S.A.; Sclaroff, S. Hashing with mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 2424–2437. [Google Scholar] [CrossRef]
  23. Gordo, A.; Almazán, J.; Revaud, J.; Larlus, D. Deep image retrieval: Learning global representations for image search. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Cham, Switzerland, 2016; pp. 241–257. [Google Scholar]
  24. Jiang, Q.Y.; Li, W.J. Asymmetric deep supervised hashing. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar]
  25. Shen, F.; Gao, X.; Liu, L.; Yang, Y.; Shen, H.T. Deep asymmetric pairwise hashing. In Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 1522–1530. [Google Scholar]
  26. Jin, Z.; Li, C.; Lin, Y.; Cai, D. Density sensitive hashing. IEEE Trans. Cybern. 2013, 44, 1362–1371. [Google Scholar] [CrossRef]
  27. Li, X.; Wang, H. Adaptive Principal Component Analysis. In Proceedings of the 2022 SIAM International Conference on Data Mining (SDM), Virtual Conference (originally scheduled in Alexandria, VA, USA), 28–30 April 2022; pp. 486–494. [Google Scholar] [CrossRef]
  28. Li, X.; Wang, H. On Mean-Optimal Robust Linear Discriminant Analysis. In Proceedings of the 2022 IEEE International Conference on Data Mining (ICDM), Orlando, FL, USA, 28 November–1 December 2022; pp. 1047–1052. [Google Scholar] [CrossRef]
  29. Datar, M.; Immorlica, N.; Indyk, P.; Mirrokni, V.S. Locality-sensitive hashing scheme based on p-stable distributions. In Proceedings of the Twentieth Annual Symposium on Computational Geometry, Brooklyn, NY, USA, 8–11 June 2004; pp. 253–262. [Google Scholar]
  30. Weiss, Y.; Torralba, A.; Fergus, R. Spectral hashing. Adv. Neural Inf. Process. Syst. 2008, 21, 1753–1760. [Google Scholar]
  31. Liu, W.; Wang, J.; Ji, R.; Jiang, Y.G.; Chang, S.F. Supervised hashing with kernels. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 2074–2081. [Google Scholar]
  32. Norouzi, M.; Fleet, D.J. Minimal loss hashing for compact binary codes. In Proceedings of the ICML, Bellevue, WA, USA, 28 June–2 July 2011. [Google Scholar]
  33. Shen, F.; Shen, C.; Liu, W.; Tao Shen, H. Supervised discrete hashing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 37–45. [Google Scholar]
  34. Li, W.J.; Wang, S.; Kang, W.C. Feature learning based deep supervised hashing with pairwise labels. arXiv 2015, arXiv:1511.03855. [Google Scholar]
  35. Cao, Y.; Liu, B.; Long, M.; Wang, J. Hashgan: Deep learning to hash with pair conditional wasserstein gan. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1287–1296. [Google Scholar]
  36. Zhuang, B.; Lin, G.; Shen, C.; Reid, I. Fast training of triplet-based deep binary embedding networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 5955–5964. [Google Scholar]
  37. Liu, B.; Cao, Y.; Long, M.; Wang, J.; Wang, J. Deep triplet quantization. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 22–26 October 2018; pp. 755–763. [Google Scholar]
  38. Yang, H.F.; Lin, K.; Chen, C.S. Supervised learning of semantics-preserving hash via deep convolutional neural networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 437–451. [Google Scholar] [CrossRef]
  39. Wang, M.; Zhou, W.; Tian, Q.; Li, H. A general framework for linear distance preserving hashing. IEEE Trans. Image Process. 2017, 27, 907–922. [Google Scholar] [CrossRef] [PubMed]
  40. Shen, F.; Xu, Y.; Liu, L.; Yang, Y.; Huang, Z.; Shen, H.T. Unsupervised deep hashing with similarity-adaptive and discrete optimization. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 3034–3044. [Google Scholar] [CrossRef]
  41. Zhao, X.; Liu, J. Leveraging Deep Features Enhance and Semantic-Preserving Hashing for Image Retrieval. Electronics 2022, 11, 2391. [Google Scholar] [CrossRef]
  42. Ma, Z.; Guo, Y.; Luo, X.; Chen, C.; Deng, M.; Cheng, W.; Lu, G. Dhwp: Learning high-quality short hash codes via weight pruning. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; pp. 4783–4787. [Google Scholar]
  43. Fu, C.; Wang, G.; Wu, X.; Zhang, Q.; He, R. Deep momentum uncertainty hashing. Pattern Recognit. 2022, 122, 108264. [Google Scholar] [CrossRef]
  44. Lin, J.; Li, Z.; Tang, J. Discriminative Deep Hashing for Scalable Face Image Retrieval. In Proceedings of the IJCAI, Melbourne, Australia, 19–25 August 2017; pp. 2266–2272. [Google Scholar]
  45. Yang, Y.; Geng, L.; Lai, H.; Pan, Y.; Yin, J. Feature pyramid hashing. In Proceedings of the 2019 on International Conference on Multimedia Retrieval, Ottawa, ON, Canada, 10–13 June 2019; pp. 114–122. [Google Scholar]
  46. Redaoui, A.; Belloulata, K. Deep Feature Pyramid Hashing for Efficient Image Retrieval. Information 2023, 14, 6. [Google Scholar] [CrossRef]
  47. Ng, W.W.; Li, J.; Tian, X.; Wang, H.; Kwong, S.; Wallace, J. Multi-level supervised hashing with deep features for efficient image retrieval. Neurocomputing 2020, 399, 171–182. [Google Scholar] [CrossRef]
  48. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images. 2009. Available online: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (accessed on 31 July 2022).
  49. Bai, J.; Li, Z.; Ni, B.; Wang, M.; Yang, X.; Hu, C.; Gao, W. Loopy residual hashing: Filling the quantization gap for image retrieval. IEEE Trans. Multimed. 2019, 22, 215–228. [Google Scholar] [CrossRef]
  50. Chua, T.S.; Tang, J.; Hong, R.; Li, H.; Luo, Z.; Zheng, Y. Nus-wide: A real-world web image database from national university of singapore. In Proceedings of the ACM International Conference on Image and Video Retrieval, Santorini Island, Greece, 8–10 July 2009; pp. 1–9. [Google Scholar]
  51. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
  52. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  53. Redaoui, A.; Belloulata, K. Deep Supervised Hashing with Multiscale Feature Fusion (DSHMFF). In Proceedings of the NCNETi’23, The 1st National Conference on New Educational Technologies and Informatics, Guelma, Algeria, 3–4 October 2023. [Google Scholar]
  54. Xia, R.; Pan, Y.; Lai, H.; Liu, C.; Yan, S. Supervised hashing for image retrieval via image representation learning. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014. [Google Scholar]
  55. Cao, Z.; Long, M.; Wang, J.; Yu, P.S. Hashnet: Deep learning to hash by continuation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5608–5617. [Google Scholar]
  56. Bai, J.; Ni, B.; Wang, M.; Li, Z.; Cheng, S.; Yang, X.; Hu, C.; Gao, W. Deep progressive hashing for image retrieval. IEEE Trans. Multimed. 2019, 21, 3178–3193. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
