Top Position Sensitive Ordinal Relation Preserving Bitwise Weight for Image Retrieval

In recent years, binary coding methods have become increasingly popular for approximate nearest neighbor (ANN) search. High-dimensional data can be quantized into binary codes to give an efficient similarity approximation via the Hamming distance. However, most existing schemes consider every binary bit equally important and treat training samples at different positions equally, which causes many data pairs to share the same Hamming distance and leads to a larger retrieval loss at the top position. To handle these problems, we propose a novel method dubbed the top-position-sensitive ordinal-relation-preserving bitwise weight (TORBW) method. The core idea is to penalize data points that fail to preserve an ordinal relation at the top position of a ranking list more than those at the bottom, and to assign different weight values to their binary bits according to the distribution of query data. Specifically, we design an iterative optimization mechanism to simultaneously learn binary codes and bitwise weights, which makes their learning processes related to each other. When the iterative procedure converges, the binary codes and bitwise weights are effectively adapted to each other. To reduce the training complexity, we relax the discrete constraints of both the binary codes and the indicator function. Furthermore, we pretrain a tensor ordinal graph to decrease the time spent computing the relative similarity relationship among data points. Experimental results on three large-scale ANN search benchmark datasets, i.e., SIFT1M, GIST1M, and Cifar10, show that the proposed TORBW method can achieve superior performance over state-of-the-art approaches.


Introduction
With the rapid development of massive image collections, it has been challenging to search for visually relevant images effectively and efficiently [1,2]. In contrast to commonly used methods that exhaustively search for the most similar images in one high-dimensional space, hashing methods map floating point data into binary codes and achieve tasks of searching approximate nearest neighbors (ANNs) using Hamming distances. Therefore, hashing methods can accelerate ANN search procedures and save on storage. Recently, hashing methods have been applied in the area of computer vision and machine learning [3][4][5][6].
The pioneering method, locality-sensitive hashing (LSH) [7], randomly generates linear hashing functions and computes binary codes based on projection signs. The learning process is independent of training samples, and the performance does not obviously improve as the number of binary bits increases [8]. To fix the above problem, many data-dependent hashing algorithms have been proposed to preserve training samples' similarity relationships in Hamming spaces.
Algorithms 2020, 13, x FOR PEER REVIEW 3 of 12
(1) We define a top-position-sensitive ordinal-relation-preserving restriction that penalizes data points failing to preserve their ordinal relation at the top position of a ranking list more than those at the bottom. This measure helps to reduce the probability of chaotic rankings occurring at the top position. (2) Unlike a general two-step mechanism, we simultaneously learn binary codes and bitwise weights by an iterative mechanism, so that they can feed back into each other. When the mechanism converges, the binary codes and bitwise weights are effectively adapted to each other. (3) A tensor ordinal graph (TOG) is precomputed to represent the relative similarity relationship of any two data pairs, which can effectively reduce the training complexity.
The rest of this paper is organized as follows: Section 2 describes the proposed TORBW method and the adopted iterative optimization mechanism. In Section 3, we show and discuss experimental results. Finally, we conclude this paper in Section 4.

Top-Position-Sensitive Ordinal-Relation-Preserving Restriction
We define $X \in \mathbb{R}^{d \times N}$ as a training dataset, which includes $N$ samples, and $x_i \in \mathbb{R}^{d \times 1}$ is the $i$-th data point. Our aim is to simultaneously learn a set of hash functions and bitwise weight functions. Thus, we can map $x_i$ into an $r$-bit binary code $b_i = \{h_1(x_i), \cdots, h_r(x_i)\}$ and generate its bitwise weights $[w_1(x_i), \cdots, w_r(x_i)]$. Here, $h_c(x_i) = \mathrm{sgn}(u_c^T x_i + b)$ represents the $c$-th hash function, and $\mathrm{sgn}(\cdot)$ is the sign function. If $u_c \leftarrow [u_c\ b]$ and $x_i \leftarrow [x_i\ 1]$, we can rewrite the $c$-th hash function as $h_c(x_i) = \mathrm{sgn}(u_c^T x_i)$. Similarly, we define the $c$-th bit weight function as $w_c(x_i) = v_c^T x_i$, which is sensitive to the query sample $x_i$.
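The hash and weight functions defined above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `augment`, `encode`, and `bit_weights` are hypothetical, the matrices `U` and `V` stack the vectors $u_c$ and $v_c$ as columns, and mapping $\mathrm{sgn}(0)$ to $+1$ is an assumption.

```python
import numpy as np

def augment(X):
    """Append a constant 1 to every column of X (shape d x N) so the bias folds into u_c."""
    return np.vstack([X, np.ones((1, X.shape[1]))])

def encode(U, X_aug):
    """Binary codes in {-1, +1}; row c holds h_c(x_i) = sgn(u_c^T x_i) for every sample.
    U has shape (d + 1, r). sgn(0) is mapped to +1 here (an assumption)."""
    return np.where(U.T @ X_aug >= 0, 1, -1)

def bit_weights(V, X_aug):
    """Query-sensitive bitwise weights; row c holds w_c(x_i) = v_c^T x_i. V has shape (d + 1, r)."""
    return V.T @ X_aug
```

Folding the bias into the last row of `U` keeps both functions as single matrix products, which mirrors the rewriting $u_c \leftarrow [u_c\ b]$, $x_i \leftarrow [x_i\ 1]$ in the text.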
Figure 1. Flow chart of the proposed top-position-sensitive ordinal-relation-preserving bitwise weight (TORBW) method. A tensor ordinal graph is constructed to approximate the similarity relationship of any two data pairs, and we establish a top-position-sensitive ordinal-relation-preserving restriction to improve the performance. We adopt an iterative optimization mechanism to simultaneously learn binary codes and bitwise weights. During the retrieval of approximate nearest neighbors, we utilize weighted Hamming distances to re-sort a chaotic ranking list in a Hamming space.
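The retrieval stage, in which candidates are first ranked by Hamming distance and ties are then re-sorted by the weighted Hamming distance, can be sketched as follows. The exact form of Equation (1) is not reproduced in this text, so the sketch assumes the common form $d_{wh}(q, x) = \sum_c w_c(q)\,[h_c(q) \neq h_c(x)]$; the function name `two_step_search` is hypothetical.

```python
import numpy as np

def two_step_search(q_code, q_weights, db_codes):
    """Rank database items for one query.

    q_code:    (r,) query code in {-1, +1}
    q_weights: (r,) query-sensitive bitwise weights
    db_codes:  (N, r) database codes in {-1, +1}
    Returns database indices sorted by Hamming distance, with ties
    broken by the (assumed) weighted Hamming distance.
    """
    mismatch = db_codes != q_code      # (N, r) boolean mask of differing bits
    d_h = mismatch.sum(axis=1)         # plain Hamming distance
    d_wh = mismatch @ q_weights        # assumed weighted Hamming distance
    # np.lexsort treats the LAST key as the primary one.
    return np.lexsort((d_wh, d_h))
```

For example, two database codes that each differ from the query in one bit share the same Hamming distance, and the one whose differing bit carries the smaller weight is ranked first.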
In this paper, we achieve an ANN search task in two steps [20][21][22][23]. First, we retrieve nearest neighbors by Hamming distances, as in most commonly used hash algorithms [1,9,11,12,13]. Then, for data points that share the same Hamming distance to a query sample, the weighted Hamming distance $d_{wh}(\cdot,\cdot)$ is utilized to re-sort their ranking orders, as shown in Equation (1).
Therefore, to guarantee the performance of an ANN search task, the data points' ordinal relation should be well preserved in both the Hamming space and the weighted Hamming space. Generally, existing ordinal-relation-preserving restrictions treat data points at different ranking positions equally [1,13]. In contrast, evaluation standards of ANN search tasks, such as mean average precision (mAP), pay more attention to samples at the top position [25]. To meet this requirement, we define a top-position-sensitive ordinal-relation-preserving restriction in Equation (2), which can effectively optimize the precision at the top position. In Equation (2), $R_E$ and $R_H$ represent the position of $x_j$ in the ranking lists of $x_i$'s nearest neighbors in the Euclidean and Hamming spaces, respectively. For the loss function defined in Equation (2), the first-order derivative is larger than zero, and the second-order derivative is smaller than zero. Hence, $L(x_i, x_j)$ is a monotonically nondecreasing function whose growth rate drastically decreases as the ranking order increases. Therefore, the loss function $L(x_i, x_j)$ penalizes samples that fail to preserve their original ranking orders at the top position more than those at the bottom.
As defined in Equation (3), $R_E(x_i, x_j)$ represents the ranking order of $x_j$ to $x_i$ in a Euclidean space, where $d(\cdot,\cdot)$ represents the Euclidean distance.
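To illustrate why a loss with a positive first derivative and a negative second derivative is top-position-sensitive, the sketch below uses $\log(1 + R)$ as a stand-in. This choice is an assumption made only for illustration; it is not claimed to be Equation (2), and the names `top_sensitive_loss` and `marginal_penalty` are hypothetical.

```python
import math

def top_sensitive_loss(rank):
    """Stand-in concave loss log(1 + R): increasing in R with a shrinking slope.
    An illustrative assumption, not the paper's Equation (2)."""
    return math.log1p(rank)

def marginal_penalty(rank):
    """Extra loss incurred when the rank value grows from `rank` to `rank + 1`."""
    return top_sensitive_loss(rank + 1) - top_sensitive_loss(rank)

# One misordered sample costs more near the top of the list than near the bottom.
top_cost = marginal_penalty(0)       # log(2), about 0.693
bottom_cost = marginal_penalty(100)  # log(102/101), about 0.0099
```

Any function with the same derivative signs behaves similarly: the same ranking mistake contributes a larger loss increment at small rank values than at large ones.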
As shown in Equation (4), $R_H(x_i, x_j)$ counts the number of dissimilar samples with closer positions than $x_j$ in a Hamming space, where $d_{wh}(\cdot,\cdot)$ returns the weighted Hamming distance and $R_H$ is computed on the basis of $R_E$.
The first question is how to compute the ranking list $R_E$. The naive method of forming triplet data $(x_i, x_j, x_k)$ is to select a similar data pair $(x_i, x_k)$ and a dissimilar data pair $(x_i, x_j)$ by Euclidean distance. However, this kind of representation needs to randomly select triplets and compare the similarity degree among all data points one by one, which is time-consuming and memory-intensive. To avoid this situation, we learn a TOG in advance to represent the relative similarity relationship of any two data pairs [1]. Given a dataset $X$, we first construct an affinity graph $S$, as shown in Equation (5), where each value in $S$ represents the similarity degree of a data pair and is computed according to a Euclidean distance. Next, we define a dissimilar graph $DS$ with $DS(i, j) = 1/S(i, j)$. We can then formulate a TOG $G$ as shown in Equation (6), where each entry is defined as $G(ij, kl) = S(i, j) \cdot DS(k, l)$. Therefore, each entry of $G$ relates to two data pairs, and the relative similarity relationship of any two data pairs $(i, j, i, k)$ can be represented through the TOG, as shown in Equation (7), where $\delta_{ij}$ represents the similarity degree of data pair $(x_i, x_j)$. Thus, we can form $R_E(x_i, x_j)$ by directly counting the elements whose values are smaller than 1 in the column $ij$. $R_H(x_i, x_j)$ can be generated by selecting triplet elements with $d_{wh}$(
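The counting rule for $R_E$ can be checked with a small NumPy sketch. Equation (5) is not reproduced in this text, so a Gaussian affinity $S(i,j) = \exp(-\|x_i - x_j\|^2 / \sigma^2)$ is assumed; the function name `euclidean_ranks_from_tog`, the parameter `sigma`, and the row-wise data layout `(N, d)` are also assumptions. Only the slice $G(ij, ik)$ of the TOG that shares the anchor $i$ is built.

```python
import numpy as np

def euclidean_ranks_from_tog(X, sigma=None):
    """X: (N, d) array of samples (one per row).
    Returns R with R[i, j] = number of samples k (k != i, k != j) that are
    closer to x_i than x_j is, obtained by counting TOG entries below 1."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # squared distances
    if sigma is None:
        sigma = np.sqrt(sq.mean())                       # assumed bandwidth
    S = np.exp(-sq / sigma**2)   # assumed Gaussian affinity graph
    DS = 1.0 / S                 # dissimilar graph, DS(i, j) = 1 / S(i, j)
    # G[i, j, k] = S(i, j) * DS(i, k); a value below 1 means k is more
    # similar to i than j is.
    G = S[:, :, None] * DS[:, None, :]
    closer = G < 1.0
    idx = np.arange(X.shape[0])
    closer[idx, :, idx] = False  # drop k == i (the anchor itself)
    closer[:, idx, idx] = False  # drop k == j (guards against rounding)
    return closer.sum(-1)
```

Because the affinity decreases monotonically with distance, an entry $G(ij, ik) < 1$ is equivalent to $d(x_i, x_k) < d(x_i, x_j)$, so the count reproduces the Euclidean rank without forming explicit triplets.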

Relaxation and Iterative Optimization
In this section, we describe the process of learning hash and bitwise weight functions by minimizing loss function L(·,·). Different from the two-step mechanism, we design an iterative optimization mechanism that simultaneously learns parameters of hash and bitwise weight functions [9,10].
As binary codes and indicator functions are discrete integer values, directly optimizing loss function L(·,·) becomes difficult. To fix this problem, we adopt a relaxation mechanism.
First, we utilize $\tanh(\cdot)$ instead of the encoding function, as shown in Equation (8), and the weighted Hamming distance is rewritten as in Equation (9). Secondly, we employ the sigmoid function to approximate the indicator function, as shown in Equation (10).
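The two relaxations can be illustrated with a short sketch. It assumes the indicator being relaxed is $\mathbb{1}[z > 0]$ and adds a hypothetical `scale` parameter to show how sharply each smooth function approaches its discrete counterpart; neither the parameter nor the function names come from the paper.

```python
import numpy as np

def relaxed_code(u, x, scale=1.0):
    """Smooth surrogate for sgn(u^T x): tanh is differentiable and lies in (-1, 1)."""
    return np.tanh(scale * (u @ x))

def relaxed_indicator(z, scale=1.0):
    """Smooth surrogate for the indicator 1[z > 0] (assumed form): the logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-scale * z))
```

Both surrogates have nonzero gradients everywhere, which is what allows the stochastic gradient descent updates described next.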
The definition of $z$ is shown in Equation (11). After the above relaxation procedure, we utilize a stochastic gradient descent algorithm to optimize the objective function in Equation (12). For the parameter $u_c$ of the $c$-th hash function, the partial derivative of the objective function $O$ is shown in Equation (13). The partial derivative of the weighted Hamming distance is shown in Equation (14), where $\circ$ represents the element-wise product. As a result, we can update the value of the parameter $u_c$ as shown in Equation (15). For the parameter $v_c$ of the $c$-th bitwise weight function, the partial derivative of the objective function is shown in Equation (16). As a result, we can update the parameter $v_c$ as shown in Equation (17). The details of the proposed TORBW method are shown in Algorithm 1.
Algorithm 1. The proposed TORBW method.
Output: The coefficients of hash functions $(u_1, \cdots, u_r)$ and bitwise weight functions $(v_1, \cdots, v_r)$.
1: Choose training samples from X by the k-means algorithm;
2: Compute an affinity graph S and a dissimilar graph DS by Equation (5);
3: Construct a tensor ordinal graph G by Equation (6);
4: Generate an ordinal relations set of triplet elements $(x_i, x_j, x_k)$ by G;
5: for c = 1:r
6:   repeat
7:     Compute the partial derivative by Equation (13);
8:     Update the value of $u_c$ by Equation (15);
9:     Compute the partial derivative by Equation (16);
10:    Update the value of $v_c$ by Equation (17);
11:  until convergence or reaching the maximum iteration number
12: end for

Experimental Results and Discussion
In this section, we first introduce three publicly available datasets for ANN search experiments. Then, we describe the compared methods and the ANN search performance evaluation metrics. Finally, we compare the proposed TORBW method against several state-of-the-art hashing algorithms and bitwise weight methods.

Database and Setup
We conduct ANN search experiments on three datasets, i.e., SIFT1M [26], GIST1M [27], and Cifar10 [28]. SIFT1M [26] consists of one million SIFT [29] descriptors with 128 dimensions. In SIFT1M, we randomly select 100,000 samples as a training dataset, and 10,000 data points are used as query samples. GIST1M [27] includes one million GIST [30] descriptors that have 320 dimensions, and we respectively select 100,000 data points as training data and query samples. In Cifar10 [28], there are sixty thousand tiny images, which are described as 320-dimensional GIST descriptors [30]. We utilize all image descriptors in Cifar10 as training samples, and one thousand samples are used as query data points.

Compared Methods and Evaluation Metrics
To show that the proposed TORBW method can achieve excellent ANN search performance, we compare the TORBW method against five hashing algorithms and two bitwise weight methods. Among them, LSH [7], AGH [8], KMH [9], Top-RSBC [25], and OCH [1] aim to generate excellent binary codes. WhRank [23] and QRank [24] learn bitwise weights to re-sort the tied ranking orders, and we use them to further boost the ANN search performances of LSH [7], AGH [8], and KMH [9].
In this experiment, we use the criteria recall [31,32] and mAP to evaluate the ANN search performances. Recall represents the fraction of positive data that are successfully returned, as defined in Equation (18):
$$\mathrm{recall} = \frac{N_{positive}}{N_{all}} \quad (18)$$
where $N_{positive}$ is the number of positive data that are retrieved and $N_{all}$ is the number of the true nearest neighbors. We further use mAP to express exactly at which position each positive data point is located, and the definition is shown in Equation (19), where $|Q|$ represents the number of query samples, $K_i$ is the number of the $i$-th query sample's ground truth, and $\mathrm{rank}(j)$ is the ranking position of the $j$-th true positive sample in the retrieval results.
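The two criteria can be sketched in plain Python. Recall follows the ratio of retrieved positives to true nearest neighbors described above. Equation (19) is not reproduced in this text, so the sketch assumes the standard average-precision form $\frac{1}{K_i}\sum_{j} \frac{j}{\mathrm{rank}(j)}$ averaged over the $|Q|$ queries; the function names are hypothetical, and retrieved lists are assumed to contain no duplicates.

```python
def recall(retrieved, ground_truth):
    """Fraction of the true nearest neighbors that appear in the retrieved list."""
    gt = set(ground_truth)
    return len(gt.intersection(retrieved)) / len(gt)

def average_precision(ranked, ground_truth):
    """Assumed standard form: mean over true positives of j / rank(j),
    where rank(j) is the 1-based position of the j-th true positive."""
    gt = set(ground_truth)
    hits, total = 0, 0.0
    for position, item in enumerate(ranked, start=1):
        if item in gt:
            hits += 1
            total += hits / position
    return total / len(gt)

def mean_average_precision(ranked_lists, ground_truths):
    """Average of the per-query average precision over all queries."""
    aps = [average_precision(r, g) for r, g in zip(ranked_lists, ground_truths)]
    return sum(aps) / len(aps)
```

Under this form, a true neighbor returned at position 1 contributes more than one returned at position 10, which is why mAP rewards correct ordering at the top of the list.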

Experimental Results
In this section, we first encode floating-point data into 32-, 64-, and 128-bit binary codes by the hash methods (TORBW, LSH [7], AGH [8], KMH [9], Top-RSBC [25], and OCH [1]) and carry out ANN search tasks according to Hamming distances. Then, TORBW, WhRank [23], and QRank [24] assign different weight values to each binary bit and utilize the weighted Hamming distances to re-sort the tied ranking orders. The ANN search experimental results are shown in Tables 1-3 and Figures 2-4. We separately define the ground truth as the 10 and 100 nearest neighbors in the Euclidean space.
We perform ANOVA tests with the MATLAB function "anovan1()" and show the experimental results in Table 4. The results indicate that the improvements made by our algorithm are significant.
In LSH [7], the learning process is independent of training samples, and the ANN search performance does not evidently improve as the number of binary bits increases [8]. Thus, to obtain a satisfactory ANN search performance, LSH needs a long binary code or multiple hash tables. In contrast, data-dependent algorithms, such as TORBW, OCH [1], Top-RSBC [25], KMH [9], and AGH [8], utilize a machine learning mechanism to learn hashing functions, which can minimize the ANN search performance loss on a training database. The data-dependent hashing algorithms can therefore achieve a better ANN search performance with compact binary codes. In this paper, the maximum number of binary bits is only 128, and the data-dependent hashing algorithms are superior to LSH.
AGH [8] generates centers with the k-means algorithm and learns binary codes with the spectral graph cut mechanism. However, AGH requires the distribution of training samples to be uniform [10]. In practice, the databases used in this paper do not obey a uniform distribution. KMH [9] learns encoding centers, which can minimize both quantization loss and similarity loss. As a result, the performance of KMH is better than that of AGH. AGH and KMH aim to preserve a pointwise similarity relationship. Top-RSBC defines a pairwise similarity relationship restriction for learning hashing functions, which can further enhance the power of preserving a similarity relationship among samples. The ANN search task emphasizes preserving an ordinal relation. To fulfill this task, TORBW and OCH [1] generate binary codes that preserve the original ordinal relation among data points in a Hamming space. Thus, TORBW and OCH have a superior performance over AGH, KMH, and LSH.
As described above, the binary coding methods OCH [1], Top-RSBC [25], and KMH [9] have shown efficacy for ANN search tasks, but they consider each binary bit equally important. Therefore, many data points with different binary codes share the same Hamming distance to a query sample, and the tied ranking orders in their retrieval results lead to inferior performance. To avoid this ambiguous situation, TORBW, WhRank [23], and QRank [24] assign different weight values to each binary bit and use the weighted Hamming distance to re-sort the retrieval results. As a result, bitwise weight methods can effectively improve the ANN search performances of binary coding methods. The bitwise weights in TORBW and QRank [24] are sensitive to query data points, and these two methods perform better.
The aforementioned methods, including WhRank [23], QRank [24], OCH [1], KMH [9], ITQ [10], and LSH [7], treat all data points equally. However, evaluation criteria of ANN search tasks pay more attention to the top position of the retrieval results. To satisfy this requirement, TORBW and Top-RSBC [25] penalize mistakes at the top position more than those at the bottom. Therefore, TORBW and Top-RSBC can well preserve a similarity relationship among data points at the top position. In summary, TORBW effectively preserves ordinal relations and utilizes query-sensitive bitwise weights to reduce the tied ranking orders occurring at the top position, and it therefore achieves an excellent ANN search performance.

Conclusions
In this paper, we propose a novel method dubbed the TORBW method, which simultaneously generates hash functions and bitwise weight functions. To guarantee the performance of an ANN search task, we require that the original ordinal relation among data points be preserved in both a Hamming space and a weighted Hamming space. We utilize a TOG to represent the relative similarity relationship between any two data pairs, which can reduce the complexity of constructing a training dataset. Different from a two-step mechanism, which separately learns binary codes and bitwise weights, TORBW adopts an iterative mechanism to simultaneously optimize them. When the algorithm converges, binary codes and bitwise weights are effectively adapted to each other. Generally, training samples are considered to be equal. In contrast, we assign a large weight value to training samples at the top position during the learning process. As a result, the method can effectively reduce the probability of tied ranking orders occurring at the top position of a ranking list. Extensive experiments on three benchmark datasets demonstrate that TORBW has a better ANN search performance than many existing state-of-the-art binary coding methods and bitwise weight methods.
In this paper, we adopt linear hashing functions to map data into binary codes. However, a practical dataset may be linearly inseparable. In future work, we will employ a kernel formulation for the target hashing functions to further enhance the performance of preserving ordinal relations.