Crime Scene Shoeprint Retrieval Using Hybrid Features and Neighboring Images

Abstract: Given a query shoeprint image, shoeprint retrieval aims to retrieve the most similar shoeprints available from a large set of shoeprint images. Most of the existing approaches focus on designing single low-level features to highlight the most similar aspects of shoeprints, but their retrieval precision may vary dramatically with the quality and the content of the images. Therefore, in this paper, we propose a shoeprint retrieval method to enhance the retrieval precision from two perspectives: (i) integrate the strengths of three kinds of low-level features to yield more satisfactory retrieval results; and (ii) enhance the traditional distance-based similarity by leveraging the information embedded in the neighboring shoeprints. The experiments were conducted on a crime scene shoeprint image dataset, the MUES-SR10KS2S dataset. The proposed method achieved competitive performance, and its cumulative match score exceeded 92.5% in the top 2% of the dataset, which is composed of 10,096 crime scene shoeprints.


Introduction
Shoeprint retrieval aims at retrieving the most similar shoeprints that were collected at different crime scenes, to help investigators reveal clues about a particular case. In past decades, large numbers of crime scene shoeprint images have been collected and recorded for analysis. When there is a new case, investigators can manually compare shoeprints derived from the crime scene with those collected from other crime scenes to reveal clues. However, it is difficult and tedious to conduct this work for a huge number of degraded shoeprints. Therefore, it is necessary to propose a more efficient automatic shoeprint retrieval method.
The retrieval precision of existing methods may vary dramatically with the quality and the content of the images, for which there may be two reasons. The first reason may be that the descriptive capability of a single low-level feature has its own deficiencies. Figure 1 shows illustrative cases of a failure by either approach. The two shoeprints in each pair in Figure 1 do not have the same shoe pattern. Local features (e.g., the Gabor feature) cannot distinguish between the visual patterns in Figure 1a, while they accurately handle the visual patterns in Figure 1b. On the other hand, holistic features (e.g., the Fourier-Mellin feature) fail to make distinctions in Figure 1b, while they successfully handle the shoeprints in Figure 1a, because they consider the overall layout of the images. Therefore, the complementary descriptive capability of the local and holistic features naturally inspires us to integrate their strengths to yield more satisfactory retrieval results.
The second reason may be that the shoeprints derived from crime scenes are usually misaligned, incomplete and degraded, affected by debris, shadows and other artifacts. Thus, it is difficult to retrieve crime scene shoeprints using pair-wise similarity computed from just two images. Figure 2 shows an example where the similarity estimation can potentially benefit from the neighbors. As shown in Figure 2, three samples A, B, and C are represented as filled circles, and the distance between A and B is equal to the distance between A and C. The feature similarity between A and B is equal to that between A and C, that is, S(A, B) = S(A, C). The neighbors of the three samples are represented as circles. From the distribution of their neighbors in Figure 2, it is more reasonable to intuitively let S(A, B) < S(A, C), because the neighbors around A and C are much more similar than those around A and B. The descriptive capability of the neighbors inspires us to use the neighborhoods to yield more satisfactory retrieval results.
Wang et al. [29] proposed a manifold ranking-based shoeprint retrieval method, which used opinion scores of shoeprint examples labeled by forensic experts to achieve a good performance. However, the method did not enhance the retrieval precision based on the two aspects stated above. Therefore, in this paper, we propose a shoeprint retrieval method to enhance the retrieval precision considering the following two aspects: (i) integrate the strengths of three kinds of low-level features to yield more satisfactory retrieval results; and (ii) utilize the information contained in the neighboring images to improve the performance of the shoeprint retrieval method.
The main contributions of the proposed method are as follows: (1) We propose a hybrid feature constructed in a coarse-to-fine manner from holistic, region and local views. The proposed method integrates the strengths of three kinds of low-level features to yield more satisfactory retrieval results; (2) We propose a neighborhood-based similarity estimation (NSE) method, which utilizes the information contained in neighbors to improve the performance of a shoeprint retrieval method. The greatest difference, compared to the other existing shoeprint retrieval methods, is that it considers not only the relationship between every two shoeprints, but also the relationship between their neighbors; (3) We propose a generic manifold-based reranking framework, which can narrow the well-known gap between high-level semantic concepts and low-level features; (4) The proposed method works well for real crime scene shoeprint image retrieval. The cumulative match score is more than 92.5% in the top 2% of the database, which is composed of 10,096 real crime scene shoeprint images. The evaluation shows our method consistently improves the retrieval precision and compares favorably with the state-of-the-art.
The rest of the paper is organized as follows. Section 2 reviews related works on shoeprint retrieval. Section 3 presents the proposed method. Section 4 provides the experimental results and the analysis, followed by the conclusions in Section 5.

Related Works
According to the scope of representation, features roughly fall into two categories: holistic features and local features.
Methods in the holistic feature category usually take the whole image into consideration when extracting features. Bouridane et al. [1] used a fractal-based feature to retrieve shoeprint images. It can handle high-quality shoeprint images; however, the method is sensitive to variations in rotation and translation. Moment invariant features were used for shoeprint retrieval [2,3]; they work well for complete shoeprints, but partial shoeprints are not considered. Chazal et al. [4] and Gueham et al. [5,6] used the Fourier transform to analyze the frequency spectra of shoeprint images, but the methods are sensitive to partial shoeprints. Cervelli et al. [7][8][9] utilized the Fourier transform on cropped shoeprint images to extract features in the frequency domain. However, these methods are sensitive to geometric transformations. Alizadeh et al. [10] retrieved shoeprints by using a sparse representation method. They reported good performance, but their method is sensitive to variations in rotation and translation. Richetelli et al. [11] implemented and tested several shoeprint retrieval methods on a scene-like shoeprint database, namely the phase-only correlation (POC) method, the Fourier-Mellin transformation and the scale-invariant feature transform (SIFT) method. Results show that the POC method has better performance than the Fourier-Mellin transformation and the SIFT method; however, the performance of these methods may drop considerably when applied to degraded crime scene shoeprints. Kong et al. [12,13] applied a convolutional neural network to extract multi-channel features and computed the similarity score using the normalized cross-correlation method. They achieved a good performance; however, their algorithm requires a large amount of computation.
Methods in the local feature category always divide the shoeprint into different regions, and then extract features from these regions. Patil et al. [14] convolved shoeprint images with Gabor filters, and then divided the filtered images into non-overlapping blocks to extract local features for shoeprint retrieval. The method shows good performance for partial shoeprints generated from full shoeprints. Tang et al. [15,16] used an attributed relational graph (ARG) to represent the shoeprint. In the graph, nodes represent fundamental shapes in shoes, such as lines, circles, ellipses, and so on. They reported good performance on distortions and partial shoeprints. However, it is a challenge to handle crime scene shoeprints with random breaks and extrusions, which cannot be represented by the above fundamental geometric shapes. Pavlou et al. [17,18] applied the maximally stable extremal regions (MSER) feature to represent shoeprints. However, the performance may drop considerably when dealing with shoeprint images with noise and distortion. Kortylewski et al. [19] presented a periodic pattern-based shoeprint image retrieval method. The method firstly detects periodic patterns of the shoeprint, and then evaluates the similarity through comparing the Fourier features of the periodic patterns. The algorithm can deal with shoeprints with periodic patterns. However, it is difficult for it to handle shoeprints without such patterns. Local interest point-based methods have also been applied to shoeprint retrieval [20][21][22][23][24][25], but their performance may vary dramatically among crime scene shoeprints. The possible reasons may be that the crime scene shoeprints are highly degraded and randomly occluded, and there are many random extrusions, intrusions or breaks on the shoe patterns. The local interest point-based methods cannot work well in distinguishing the useful information from these interferences. Kortylewski et al. [26,27] learned a compositional active basis model for each reference shoeprint, which was used to evaluate against query images at testing time.
The model can be learned well on high-quality reference shoeprints. However, how to represent degraded crime scene shoeprint images remains a problem. Wang et al. [28] divided a shoeprint into a top region and a bottom region, and then extracted Wavelet-Fourier-Mellin transform-based features of the two regions for shoeprint retrieval. The method performs well owing to its invariant features and matching score estimation method. Wang et al. [29] proposed a manifold ranking shoeprint retrieval method that considers not only the holistic and region features but also the relationship between every two shoeprints. The method achieves a good performance on crime scene shoeprint images, but it neglects the effect of local features and the contribution of the neighboring shoeprints.

Notations and Formulations
Let U = {u_1, u_2, . . . , u_K} denote the set of shoeprint images to be ranked, in which q denotes the query shoeprint. We focus on finding a function f : U → R + that assigns to each shoeprint u i ∈ U a ranking score f(u_i). Our motivation is to enhance the shoeprint retrieval precision from the following two perspectives: (i) integrate the strengths of three kinds of low-level features to yield more satisfactory retrieval results; and (ii) utilize the information contained in the neighboring images to improve the performance of the shoeprint retrieval method. Therefore, we have two constraints on the ranking score: (i) closer shoeprint images in multiple feature spaces should share similar ranking scores; and (ii) shoeprint images with similar neighboring shoeprints should share similar ranking scores.
We construct the cost function by employing the above two constraints on f. The shoeprint retrieval problem can be defined as an optimal solution of minimizing the following cost function.
where β 1 , β 2 and γ are the regularization parameters. The first term weighted by β 1 is the neighborhood correlation term: shoeprint images with similar neighbors should share similar ranking scores. S ij denotes the neighborhood-based similarity between u i and u j , and A is a diagonal matrix, in which A ii = ∑ K j=1 S ij . Intuitively, similarity is usually defined as the feature relevance between two images. However, it is difficult to describe crime scene shoeprints clearly with low-level features, because the shoeprints are usually highly degraded and randomly partial. Moreover, the traditional image-to-image similarity measure is sensitive to noise. One feasible way to deal with this problem is to use the neighborhoods to provide more information.
To this end, we propose a neighborhood-based similarity estimation (NSE) method, which regards the neighbors of the images as their features: the more similar the neighbors of two images are, the higher the similarity value they should share. Formally, for shoeprint images u i and u j , the neighborhood-based similarity between the two images is defined as a weighted combination of two neighborhood terms, where a S and a C are the weighting parameters, a S + a C = 1, and W mn denotes the hybrid feature similarity between shoeprint images u m and u n .
In the definition above, the neighbor set of u i in the first term is acquired based on the hybrid feature similarity W ij , and |•| represents the cardinality of a set. N k (u i ) denotes the k nearest neighbors of u i , acquired based on the region feature similarity S r (u i , u j ), which can be calculated according to Equations (7)–(19) in [28].
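As a concrete illustration of the NSE idea (treating an image's neighbors as its features), the sketch below scores a pair of images by a weighted combination of the overlap of their k-NN sets and the mean hybrid similarity W between their neighbors. The function names, the Jaccard-style overlap and all parameter values are our own illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def knn_sets(sim, k):
    """k-nearest-neighbour index sets for every image under a similarity matrix."""
    n = sim.shape[0]
    neigh = []
    for i in range(n):
        order = np.argsort(-sim[i])
        order = order[order != i][:k]  # exclude the image itself
        neigh.append(set(order.tolist()))
    return neigh

def nse_similarity(W, k=3, a_s=0.5, a_c=0.5):
    """Hypothetical NSE instantiation: a_s weighs the overlap of the two k-NN
    sets (Jaccard), a_c weighs the mean hybrid similarity W_mn between the
    neighbours themselves. S_ij = 1 for i == j, as in the paper."""
    n = W.shape[0]
    neigh = knn_sets(W, k)
    S = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            inter = neigh[i] & neigh[j]
            union = neigh[i] | neigh[j]
            overlap = len(inter) / len(union) if union else 0.0
            cross = np.mean([W[m, p] for m in neigh[i] for p in neigh[j]])
            S[i, j] = S[j, i] = a_s * overlap + a_c * cross
    return S
```

In this sketch, two images that share the same nearest neighbours (and whose neighbours are mutually similar) receive a higher S than an equally W-similar pair with disjoint neighbourhoods.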
Here we define S ij = 1 for u i = u j . The second term weighted by β 2 is the smoothness term: shoeprint images nearby in the feature space should share similar ranking scores. W ij denotes the hybrid feature similarity between u i and u j , and B is a diagonal matrix, in which B ii = ∑ K j=1 W ij . The third term weighted by γ is the fitting term. y = [y 1 , y 2 , . . . , y K ] T is a vector, in which y i = 1 if u i is the query, and y i = 0 otherwise.
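For reference, a standard graph-regularization cost that is consistent, term by term, with the three components described above (the neighborhood term with S and A, the smoothness term with W and B, and the fitting term with y) can be written as follows; this is a plausible reconstruction, and the paper's exact normalization may differ:

```latex
Q(f) \;=\; \beta_1 \sum_{i,j=1}^{K} S_{ij}\,(f_i - f_j)^2
      \;+\; \beta_2 \sum_{i,j=1}^{K} W_{ij}\,(f_i - f_j)^2
      \;+\; \gamma \sum_{i=1}^{K} (f_i - y_i)^2
      \;=\; 2\beta_1\, f^{\top}(A - S)\,f \;+\; 2\beta_2\, f^{\top}(B - W)\,f \;+\; \gamma\,\| f - y \|^2 ,
```

where the matrix identities follow directly from A ii = ∑ j S ij and B ii = ∑ j W ij with S and W symmetric.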
Equation (1) can also be generalized as a multiple-similarity-measure manifold ranking framework, which can be formulated as follows: where W (p) denotes the adjacency matrix calculated using the pth similarity measure, P denotes the number of similarity measures, and C (p) is a diagonal matrix, in which C (p) ii = ∑ K j=1 W (p) ij .

Solution
We solve the optimization problem in Equation (3) by constructing a Lagrange function. To obtain an optimal regularization parameter β, we replace β p with β q p , where q > 1. Setting the derivative of the Lagrange function with respect to β p to zero, β p can be acquired in closed form; we then update f by using the new β p . When β p is fixed, setting the derivative with respect to f to zero yields a matrix-vector equation, from which the ranking score f can be obtained in closed form. The two updates are alternated until convergence. The algorithm is summarized in Algorithm 1.
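The alternating scheme described above can be sketched as follows. This is a minimal illustration assuming the standard derivation for this family of costs (closed-form f-step; Lagrange-multiplier β-step enabled by the β_p^q trick); the function name and all defaults are ours, and the paper's exact update formulas may differ.

```python
import numpy as np

def multi_graph_ranking(Ws, y, gamma=1.0, q=2.0, n_iter=20):
    """Alternating optimisation for a multi-similarity manifold ranking cost
    Q(f, beta) = sum_p beta_p^q * f' L_p f + gamma * ||f - y||^2,
    subject to sum_p beta_p = 1, with L_p = C_p - W_p the graph Laplacian
    of the p-th affinity matrix (a sketch, not the paper's exact algorithm)."""
    K = len(y)
    Ls = [np.diag(W.sum(axis=1)) - W for W in Ws]
    P = len(Ls)
    beta = np.full(P, 1.0 / P)
    f = y.astype(float).copy()
    for _ in range(n_iter):
        # f-step: dQ/df = 0 with beta fixed gives a linear system
        M = gamma * np.eye(K) + sum(b ** q * L for b, L in zip(beta, Ls))
        f = gamma * np.linalg.solve(M, y)
        # beta-step: the Lagrange multiplier for sum(beta) = 1 gives
        # beta_p proportional to (1 / (f' L_p f)) ** (1 / (q - 1))
        a = np.array([max(f @ L @ f, 1e-12) for L in Ls])
        beta = (1.0 / a) ** (1.0 / (q - 1.0))
        beta /= beta.sum()
    return f, beta
```

With q > 1, a similarity measure whose graph explains the current ranking well (small f'L_p f) automatically receives a larger weight, which is the purpose of replacing β_p with β_p^q.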

The Affinity Matrix Computation Method
In [28], a Wavelet-Fourier-Mellin transform and Similarity Estimation (WFSE) based method is proposed to compute the matching score. The WFSE method has been applied successfully in forensic practice when retrieving crime scene shoeprint images, but it does not take the local patterns of the shoeprint into consideration. Generally, our observation of objects is a continuously refining process, from the whole to the parts and then to the details. Inspired by this, we propose hybrid holistic, region and local features to compute the matching score, following the way we observe objects. We define the hybrid feature similarity as W ij = b h S h (u i , u j ) + b r S r (u i , u j ) + b l S l (u i , u j ), where b r , b h and b l denote the weighting parameters, and b r + b h + b l = 1. For a shoeprint image u i , the extraction process for its hybrid holistic, region and local features has the following main steps.
Step 1: Acquire and normalize the shoeprint image.
Step 1.1: Acquire the binarized shoeprint image. The shoeprint image is first split into a grid of cells, and then a thresholding method (e.g., Otsu's method) is applied to each cell to extract sub-shoeprints. Finally, morphological operations are utilized to eliminate small holes and smooth edges.
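Step 1.1 can be sketched as follows; the cell size is an illustrative assumption and the morphological clean-up is omitted.

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: choose the threshold maximising between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0 = sum0 = 0.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize_by_cells(gray, cell=64):
    """Split the image into a grid of cells and threshold each cell with Otsu."""
    out = np.zeros_like(gray, dtype=bool)
    h, w = gray.shape
    for r in range(0, h, cell):
        for c in range(0, w, cell):
            block = gray[r:r + cell, c:c + cell]
            out[r:r + cell, c:c + cell] = block > otsu_threshold(block)
    return out
```

Per-cell thresholding lets the binarization adapt to uneven illumination across the print, which a single global threshold cannot do.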
Step 1.2: Resolution and orientation normalization. The shoeprint images are rescaled to a predefined resolution measured in dots per inch (DPI). Then, we normalize the orientation of the shoeprint image by using the Shoeprint Contour Model (SPCM) proposed in [28].
Step 2: The normalized shoeprint image u i is divided into the top region and the bottom region, and they are denoted as S top (i) and S bottom (i), respectively.
Step 3: The shoeprint image u i and its two regions S top (i) and S bottom (i) are decomposed at a specified number of levels by using the Haar wavelet, yielding one approximation and three detail subbands at each level. Let L be the maximum decomposition level. To avoid merging useful neighboring patterns, L should meet the criterion 2 L−1 ≤ D min , where D min represents the minimum distance between two neighboring patterns, which can be specified interactively.
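The level criterion and one Haar analysis step can be sketched as follows (a minimal numpy illustration; the method presumably applies the full multi-level decomposition, e.g. via a wavelet library):

```python
import numpy as np

def max_decomposition_level(d_min):
    """Largest L satisfying the Step 3 criterion 2**(L-1) <= D_min."""
    L = 1
    while 2 ** L <= d_min:
        L += 1
    return L

def haar_step(x):
    """One 2-D Haar analysis step on an even-sized array:
    approximation plus three detail bands, each half the size."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # approximation
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return ll, lh, hl, hh
```

The criterion caps the level so that the coarsest approximation cell (2^(L-1) pixels wide) never spans two distinct neighboring patterns.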
Step 4: The Fourier-Mellin transform is applied to the wavelet coefficients to extract features.
Step 4.1: Calculate the Fourier magnitude of the pre-processed image by using the fast Fourier transform (FFT).
Step 4.2: Use the band-pass filter proposed in [30] to weaken the effect of noise, such as small holes, intrusions, extrusions and broken patterns.
Step 4.3: Perform the log-polar mapping of the filtered Fourier magnitude acquired in Step 4.2.
Step 4.4: Calculate the Fourier magnitude of the log-polar mapping calculated in Step 4.3 by using the FFT.
Step 4.5: The filtered Fourier-Mellin domain coefficients of FW(u i ) are used as holistic features, and those of FW(S top (i)) and FW(S bottom (i)) are used as region features. Here, we use FMW(u i ), FMW(S top (i)) and FMW(S bottom (i)) to denote the holistic and region features of the shoeprint u i .
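Steps 4.1–4.4 can be sketched as follows. The annular band-pass is a crude stand-in for the filter of [30], and the sampling resolutions and band limits are illustrative assumptions.

```python
import numpy as np

def log_polar(mag, n_rho=64, n_theta=64):
    """Nearest-neighbour log-polar resampling of a centred magnitude spectrum."""
    h, w = mag.shape
    cy, cx = h // 2, w // 2
    rho = np.exp(np.linspace(0, np.log(min(cy, cx)), n_rho))
    theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    r, t = np.meshgrid(rho, theta, indexing="ij")
    ys = np.clip(np.round(cy + r * np.sin(t)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + r * np.cos(t)).astype(int), 0, w - 1)
    return mag[ys, xs]

def fourier_mellin_feature(img, band=(2, 20)):
    """Steps 4.1-4.4: |FFT| -> annular band-pass -> log-polar -> |FFT| again."""
    f = np.fft.fftshift(np.abs(np.fft.fft2(img)))          # Step 4.1
    h, w = f.shape
    yy, xx = np.mgrid[0:h, 0:w]
    rad = np.hypot(yy - h // 2, xx - w // 2)
    f = f * ((rad >= band[0]) & (rad <= band[1]))           # Step 4.2 (stand-in)
    lp = log_polar(f)                                       # Step 4.3
    return np.abs(np.fft.fft2(lp))                          # Step 4.4
```

The log-polar mapping turns rotation and scaling of the image into translations of the spectrum, so taking the FFT magnitude a second time yields a rotation- and scale-insensitive descriptor.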
Step 6: Extraction of local features. A shoeprint image is convolved with Gabor filters in eight orientations. Each filtered shoeprint image is then divided into non-overlapping blocks, and a feature is extracted from each block, where θ ∈ {θ 1 , θ 2 , . . . , θ 8 }, m = 1, 2, . . . , 8 and n = 1, 2, . . . , 16. For two shoeprint images u i and u j , the holistic feature similarity S h (u i , u j ) between them is computed by Equation (16). The regional feature similarity S r (u i , u j ) between them is a weighted sum of the correlation coefficients of both FMW(S top (i)) and FMW(S bottom (i)); please refer to Equations (7)–(19) in [28] for details about how to set the weights adaptively. The local feature similarity S l (u i , u j ) between two images is computed by Equation (17).
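A minimal sketch of Step 6, assuming the mean absolute block response as the per-block feature and cosine similarity as a stand-in for S_l; both are our assumptions, and the Gabor parameters are illustrative.

```python
import numpy as np

def gabor_kernel(theta, ksize=15, sigma=3.0, lam=6.0):
    """Real Gabor kernel at orientation theta (parameter values illustrative)."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def gabor_block_features(img, n_orient=8, grid=(4, 4)):
    """Filter with Gabor kernels in 8 orientations, split each response into a
    4x4 grid of non-overlapping blocks (16 blocks), and keep the mean absolute
    response of every block: 8 x 16 = 128 values."""
    h, w = img.shape
    bh, bw = h // grid[0], w // grid[1]
    feats = []
    for m in range(n_orient):
        k = gabor_kernel(np.pi * m / n_orient)
        # frequency-domain (circular) convolution, numpy only
        resp = np.real(np.fft.ifft2(np.fft.fft2(img) *
                                    np.fft.fft2(k, s=img.shape)))
        for r in range(grid[0]):
            for c in range(grid[1]):
                block = resp[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
                feats.append(np.mean(np.abs(block)))
    return np.asarray(feats)

def local_similarity(fi, fj):
    """Cosine similarity between two local feature vectors (stand-in for S_l)."""
    return float(fi @ fj / (np.linalg.norm(fi) * np.linalg.norm(fj) + 1e-12))
```

Because the block features are non-negative, the cosine score stays in [0, 1], which makes it easy to combine with the holistic and region similarities in the weighted hybrid score.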

Dataset
The experiments were conducted on two shoeprint datasets. One is the MUES-SR10KS2S dataset [28], whose shoeprints were collected from real crime scenes. The other is a publicly available dataset, named the FID-300 dataset [26].
The MUES-SR10KS2S dataset contains one probe set and one gallery set. The gallery set consists of 72 probe images, 432 synthetic versions of the probe images and 9592 crime scene shoeprints. Examples of crime scene shoeprint images in the dataset are shown in Figure 3. It can be seen that shoeprint images with the same patterns differ greatly, due to the varying imaging conditions. The FID-300 dataset consists of 300 probe shoeprints and 1175 gallery shoeprints. The probe shoeprint images were collected from crime scenes by investigators. The gallery shoeprints were generated by using a gelatine lifter on the outsole of the reference shoe, and then by scanning the lifters. The gallery shoeprints are of very high quality. Examples of shoeprints in the FID-300 dataset are shown in Figure 4. Figure 4a shows one group of probe shoeprints, and Figure 4b shows their corresponding shoeprints in the gallery set.


Evaluation Metrics
The cumulative match score used in [31] is applied to evaluate the performance of the method, and it is defined as follows: CMS(n) = 100 · R n / |P| (18), where |P| and R n denote the number of probe images and the number of gallery images which match the probe images in the top n ranks, respectively.
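The metric is straightforward to implement; given, for each probe, the rank position of its true match in the gallery list:

```python
def cumulative_match_score(rank_positions, n, num_probes):
    """CMS(n) = 100 * R_n / |P|: the percentage of probes whose matching
    gallery image appears within the top n of the ranked list."""
    r_n = sum(1 for pos in rank_positions if pos <= n)
    return 100.0 * r_n / num_probes
```

Plotting CMS(n) against n gives the cumulative match characteristic curve commonly used to compare retrieval methods.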

Performance Evaluation of the Proposed Hybrid Features and the Proposed NSE Method
To test the performance of the hybrid holistic, region and local features, we used four kinds of features to retrieve images in the dataset and evaluated the performance of these features according to their cumulative match scores. We also compared the performance of the proposed NSE method with that of the features. The first kind of feature is the holistic feature, whose matching score was computed according to Equation (16). The second kind is the region feature; in our method, the Wavelet-Fourier-Mellin feature proposed in [28] was used as the region feature, and its matching score was computed according to Equation (8) in [28]. The third kind is the local feature, whose matching score was computed according to Equation (17). The fourth kind is the proposed hybrid feature, whose matching score was computed according to Equation (12). The results are listed in Table 1. Among the single features, the region feature performs better than both the holistic and local features. The cumulative match score of the proposed hybrid feature is improved by 6.6% on average over that of [28], owing to the combined strengths of the three kinds of low-level features. The results also illustrate that the cumulative match score of the proposed NSE method is improved by 10.4% on average over that of [28]. Figure 5 provides a visual illustration of the top 10 shoeprint images in the ranking lists of our method and the compared method. The results show that the proposed hybrid feature and NSE method outperform the work of [28] on the MUES-SR10KS2S database.

Comparison with the Traditional Manifold Ranking Method
Zhou et al. [32] provided a manifold-based ranking (MR) algorithm, and the manifold regularization term of our proposed ranking cost function is based on their ideas. To evaluate the effectiveness of our method, we compared our method with the traditional manifold ranking method [32]. To ensure a fair comparison, the affinity matrices used in both methods were the same ones, calculated according to Equation (2). The cumulative match scores of the algorithms are listed in Table 2. The results show that the performance of our method is approximately 0.7% above that of Zhou et al. [32] on average.
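For context, the manifold ranking baseline of Zhou et al. [32] has the well-known closed form f* = (I − αS̄)⁻¹y with S̄ = D^(−1/2) W D^(−1/2); a minimal sketch (the small-ε guard against isolated nodes is our addition):

```python
import numpy as np

def manifold_ranking(W, y, alpha=0.5):
    """Closed-form manifold ranking of Zhou et al.:
    f* = (I - alpha * D^{-1/2} W D^{-1/2})^{-1} y."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    n = len(y)
    return np.linalg.solve(np.eye(n) - alpha * S, y)
```

Scores diffuse from the query along the graph, so gallery images close to the query on the data manifold rank higher than equally feature-distant but poorly connected ones.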


Comparison with the Manifold Ranking Based Shoeprint Retrieval Method
We compared the proposed method with the work of Wang et al. [29], which provided a manifold-based shoeprint retrieval algorithm. The method asks forensic experts to assign each shoeprint example an opinion score according to the similarity between the example and the query, where the example denotes the shoeprint acquired at the same crime scene as the query shoeprint. To ensure a fair comparison, the vector y = [y 1 , y 2 , . . . , y K ] T used in our proposed method is the same as that of [29]: if u i is the query or a shoeprint sample that has a similar shoe pattern to the query, y i = 1; otherwise y i = 0. The cumulative match scores of the algorithms are listed in Table 3. The experimental results show that the performance of our method surpasses that of [29]; our cumulative match score is approximately 3.0% above that of [29] on average. We also compared the proposed method with the state-of-the-art shoeprint retrieval methods on the MUES-SR10KS2S dataset. The results are shown in Table 4. Because some state-of-the-art methods do not release their code, the results listed in Table 4 were obtained by running our own implementations. In this section, to ensure a fair comparison, the vector y = [y 1 , y 2 , . . . , y K ] T used in [29] is the one detailed in Section 3.1. The experimental results show that the cumulative match score of the proposed algorithm is more than 92.5% in the top 2%. It can also be found that the cumulative match score of the proposed method is improved by 5% compared with the work of Wang et al. [28] in the top 2% of the ranked list. The results show that some methods do not achieve performance similar to that reported in the literature on the MUES-SR10KS2S dataset. We think the possible reasons may be that: (i) the quality and quantity of shoeprints in the datasets are greatly different; and (ii) the codes and experimental settings may not be the optimal ones.
Our hardware configuration consists of a 3.33-GHz central processing unit (CPU) with 8 GB of random access memory (RAM). All of the methods are implemented in MATLAB. The mean running time over all queries is used to evaluate efficiency:

T_mean = (1/n_q) * Σ_{i=1}^{n_q} T(i),

where i denotes the ith query image, T(i) represents the running time of the ith retrieval, n_q denotes the number of query images, and K is the number of gallery images.

To further verify the effectiveness of the proposed method, we compared our method with the works of Kortylewski et al. [26], Wang et al. [29], Kong et al. [12], Kortylewski [27] and Kong et al. [13] on a publicly available database named FID-300 [26]. The results are listed in Table 5. The results of Kortylewski et al. [26] and Kong et al. [12] are taken from Figure 5 in Kong et al. [12]. The results show that our method achieves good performance on the FID-300 dataset: its cumulative match score exceeds 95.3% in the top 20% of the dataset, an improvement of approximately 1.3% over the work of Kong et al. [13] at the same rank. However, the cumulative match scores of Kong et al. [13] surpass ours in the top 1%, 5% and 15% of the ranking list. The main reason may be as follows: most of the probe shoeprint images in the FID-300 database are small, and some of them provide only a small periodic pattern patch with which to retrieve shoeprint images. Kong et al. [13] used a template matching method that searches over both translations (with a stride of 2) and rotations, which works well when the query image is a small patch; our method does not consider this case.

Table 5. Comparisons with the state-of-the-art shoeprint retrieval algorithms on the FID-300 dataset.


Analysis and Discussion
In this section, we further analyze the effect of different components of the cost function on the ranking result, including the roles of the proposed NSE method and the hybrid feature similarity. The influence of each kind of low-level feature is also discussed.

Effectiveness of the Proposed NSE Method and the Hybrid Feature Similarity
In this section, we conducted experiments to verify the effectiveness of the proposed NSE method and the hybrid feature similarity. We ran the MR method with two different affinity matrices: the first consists of the hybrid feature similarities calculated according to Equation (12), and the second is obtained by using the proposed NSE method. The cumulative match scores of the algorithms are listed in Table 6, and the cumulative match characteristic curves are shown in Figure 6. The scores show that, on average, our method is approximately 4.1% above Zhou et al. [32] when using our proposed hybrid feature similarity, and approximately 0.7% above Zhou et al. [32] when using the affinity matrix computed by our proposed NSE method. The results also show that the proposed method using both the hybrid feature similarity and the NSE method achieves higher performance than using either one alone.
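Since Equation (12) and the full NSE procedure are defined earlier in the paper and not reproduced here, the sketch below only illustrates the general idea: fuse per-feature similarity matrices into one hybrid affinity, then reweight each pairwise affinity by how much the two items' neighborhoods agree. The Jaccard-based reweighting is a simplified stand-in, not the paper's exact NSE formulation:

```python
import numpy as np

def hybrid_affinity(sims, weights):
    """Convex combination of per-feature similarity matrices
    (each assumed to be scaled to [0, 1])."""
    return sum(w * S for w, S in zip(weights, sims)) / sum(weights)

def neighbor_enhanced_affinity(W, k=2):
    """Simplified neighbor-based enhancement: scale each pairwise
    affinity by the Jaccard overlap of the two items' top-k neighbors."""
    n = W.shape[0]
    knn = [set(np.argsort(-W[i])[:k]) for i in range(n)]
    jac = np.array([[len(knn[i] & knn[j]) / len(knn[i] | knn[j])
                     for j in range(n)] for i in range(n)])
    return W * jac

# Two toy per-feature similarity matrices for a gallery of 3 items.
S1 = np.array([[1.0, 0.8, 0.2], [0.8, 1.0, 0.1], [0.2, 0.1, 1.0]])
S2 = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.2], [0.3, 0.2, 1.0]])
W = hybrid_affinity([S1, S2], [0.5, 0.5])
E = neighbor_enhanced_affinity(W)
```

The intuition is that two shoeprints with the same pattern should not only be similar to each other but should also rank the same gallery items among their neighbors.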

Effectiveness of Each Kind of Low Level Feature
We conducted experiments to verify the effectiveness of each kind of low-level feature. In the experiments, we replaced the proposed hybrid feature with three alternative features and evaluated their performance by cumulative match score. The first was the hybrid feature of the holistic and region features, used to verify the effectiveness of the local features; the second was the hybrid feature of the holistic and local features, used to verify the effectiveness of the region features; and the third was the hybrid feature of the region and local features, used to verify the effectiveness of the holistic features. The results are listed in Table 7. They show that the method with the proposed hybrid features achieves higher performance than the one using only two of them. The proposed hybrid features thus integrate the strengths of the three kinds of low-level features, and each kind helps the proposed method yield more satisfactory retrieval results.
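The leave-one-out feature combinations above can be expressed compactly as follows; the feature names, toy similarity values, and equal-weight fusion are assumptions made for illustration:

```python
import numpy as np

# Per-feature similarity matrices for a toy gallery of 3 items
# (values are made up for illustration).
sims = {
    "holistic": np.array([[1.0, 0.6, 0.2], [0.6, 1.0, 0.3], [0.2, 0.3, 1.0]]),
    "region":   np.array([[1.0, 0.5, 0.4], [0.5, 1.0, 0.2], [0.4, 0.2, 1.0]]),
    "local":    np.array([[1.0, 0.7, 0.1], [0.7, 1.0, 0.4], [0.1, 0.4, 1.0]]),
}

def fuse_without(sims, dropped):
    """Average the similarity matrices of all features except `dropped`,
    mirroring the two-feature hybrids evaluated in Table 7."""
    kept = [S for name, S in sims.items() if name != dropped]
    return sum(kept) / len(kept)

# One ablated affinity per dropped feature; each would then be fed to
# the same ranking pipeline and scored by cumulative match score.
ablation = {name: fuse_without(sims, name) for name in sims}
```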

Conclusions
In this paper, we proposed an effective shoeprint image retrieval method. In the proposed method, we enhance the retrieval precision from two perspectives: (i) integrate the strengths of three kinds of low-level features to yield more satisfactory retrieval results; and (ii) enhance the traditional distance-based similarity by leveraging the information embedded in the neighboring shoeprints. The greatest difference between the proposed method and other existing shoeprint retrieval methods is that we consider not only the relationship between every two shoeprints, but also the relationships between their neighbors. Our proposed method can also be generalized as a generic reranking framework that utilizes the information contained in the neighbors to improve the effectiveness of manifold-based methods. Experiments on real crime scene datasets have shown that the proposed algorithm outperforms not only the traditional manifold ranking method, but also the state-of-the-art shoeprint retrieval algorithms.