A Weighted Fourier and Wavelet-Like Shape Descriptor Based on IDSC for Object Recognition

Abstract: This article presents an effective shape descriptor with the property of fast matching. This descriptor, called IDSC-wFW (a weighted Fourier and wavelet-like descriptor based on inner distance shape context), first rewrites the shape histograms of IDSC descriptors, changing the histogram belonging to a point into a histogram belonging to a field, and treats the histogram of a field as a one-dimensional signal. It then transforms this one-dimensional signal with a Fourier transform and a transform similar to the Haar wavelet. Finally, the two transform results are linearly combined to form a new descriptor. This new descriptor requires only a distance-based measure during the matching stage. Experimental results on three well-known databases show that this new descriptor not only obtains accurate retrieval results but also runs fast.


Introduction
Shape is a very important feature of an object in computer vision. Human beings can judge the categories of objects even with this feature alone. For example, human eyes can distinguish between cats and dogs very quickly. Human vision thus has two advantages: strong discriminability and high speed. Many scholars are interested in these two advantages. In the last ten years, contour-based descriptors have attracted a lot of attention. Contour-based descriptors usually sample the contour points uniformly and then describe these sampling points.
Research on shape description developed slowly before 2000 [1]. From 2002, the study of shape descriptors flourished, and a large number of outstanding descriptors [2][3][4][5][6][7][8][9] emerged. The Shape Context (SC) method [5] creates a shape histogram for each contour sampling point, which records the spatial distribution of the remaining points relative to that point. The original version of the Shape Context uses TPS (Thin Plate Spline) to estimate the deformation cost, which is taken as the dissimilarity between two shapes during the matching process.
Many later algorithms are variations of the Shape Context, for example, IDSC. IDSC (Inner-Distance Shape Context) [3] replaces the Euclidean distance between two points with the inner distance, which gives this descriptor the ability to resist articulation. IDSC is a classic example of a descriptor that uses DP (dynamic programming) in the matching process. DP is a good tool in shape matching. Bai et al. [10] use DP in the matching process of Shape Contexts to greatly improve its retrieval rate (bulls-eye test) on the MPEG-7 CE-1 Part B Shape Database. DP can typically raise the retrieval rate of a descriptor above 80%. However, the DP matching method has a huge online computational cost.
Some description methods are known for speed. Fourier descriptors (FD) are very commonly used fast descriptors. In general, a one-dimensional signal first needs to be extracted from a shape [11], such as the distance between the contour points and the center point [12]. This one-dimensional signal is then transformed by the Fourier transform to obtain the Fourier descriptor. The Fourier descriptor is a global descriptor [13] and does not require dynamic programming. The dissimilarity between two shapes in the FD method can be measured by distance metrics. Other descriptors using the spectral method behave similarly on the MPEG-7 CE-1 Part B Shape Database [11,12,14,15]; their retrieval rates have a hard time exceeding 70%.
Hu et al. [13] proposed an MDM (multiscale distance matrix) descriptor; however, this descriptor is not discriminative enough. Kaothanthong et al. [16] put forward the DIR (distance interior ratio) descriptor and its corresponding matching method. This method achieved a retrieval rate of more than 77% on MPEG-7 CE-1 Part B with distance-based measurement only, which makes fast shape retrieval look promising. Motivated by this line of work, this paper proposes a new fast shape descriptor, IDSC-wFW (a weighted Fourier and wavelet-like descriptor based on IDSC), which is also insensitive to articulation.
In Section 2, IDSC-wFW is introduced in detail. The time complexity of IDSC-wFW is analyzed in Section 3. Experimental results and discussion are provided in Sections 4 and 5. Finally, conclusions are drawn.

The Weighted Fourier and Wavelet-Like Descriptor Based on IDSC
IDSC-wFW is a global descriptor that is transformed from IDSC through the Fourier transform and a wavelet-like transform. Only a simple matching method based on distance measurement is needed in IDSC-wFW. In this section, three descriptors (IDSC-wF, IDSC-wW and IDSC-wFW) are introduced.

Inner-Distance Shape Context
In the IDSC method [3], the contour is represented by $N$ histograms (there are $N$ sampling points $x_1, x_2, \ldots, x_N$ on the contour), each of which describes the relationship between a reference point and the remaining points. The histogram $h_i(k)$ of the point $x_i$ [17] is calculated by

$$h_i(k) = \#\{x_j : j \neq i,\ x_j - x_i \in \mathrm{bin}(k)\}, \tag{1}$$

where the bins divide the log-polar plane uniformly. If the Fourier transform is performed directly on each shape histogram, the matching algorithm based on dynamic programming (DP) still has to be used, and the matching speed does not change. However, speeding up matching is an important goal of our study. Therefore, we rewrite the shape histograms of IDSC with Equations (2) and (3):

$$H = \{h^1, h^2, \ldots, h^K\}, \tag{2}$$

$$h^k = \{h_1(k), h_2(k), \ldots, h_N(k)\}, \tag{3}$$

where $N$ is the number of sampling points, $K$ is the number of bins, and $i$ and $k$ are the indices of the sampling points and bins, respectively. Thus, $h^k$ is the $k$th field of $H$. $H$ consists of $K$ fields, and each field of $H$ is a sequence. The rewritten histograms still form an IDSC feature. Each field corresponds uniquely to the field with the same bin index in another IDSC feature, because the bins of the original shape histograms correspond uniquely. Therefore, when matching rewritten IDSC features, finding the optimal correspondence of shape histograms with DP is no longer needed.
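The rewriting from point histograms to fields amounts to a transpose of the $N \times K$ histogram array. The following is a minimal sketch under that reading; the function name and the toy values are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def point_histograms_to_fields(point_hists):
    """Rewrite N per-point histograms (N x K) as K per-field signals (K x N)."""
    point_hists = np.asarray(point_hists, dtype=float)
    if point_hists.ndim != 2:
        raise ValueError("expected an N x K array of per-point histograms")
    # Row k of the result is h^k = (h_1(k), ..., h_N(k)): bin k followed along the contour.
    return point_hists.T.copy()

# Toy input: N = 4 contour points, K = 3 bins (values are made up for illustration).
toy = np.array([[2, 1, 0],
                [1, 1, 1],
                [0, 2, 1],
                [1, 0, 2]])
fields = point_histograms_to_fields(toy)
```

Each row of `fields` can then be treated as a one-dimensional periodic signal indexed by the contour position.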
The rewritten histogram is one period of a periodic signal with an indeterminate starting point. After performing the Fourier transform on the rewritten histogram, the matching process of the two transformed results only requires a distance metric. Distance metrics can greatly improve the matching efficiency of the descriptors.

Weighted Fourier Descriptor IDSC-wF
The Fourier transform can be applied to each field of $H$. Equation (4) shows the form of the Fourier transform used here:

$$f^k(m) = \frac{1}{N}\left|\sum_{i=1}^{N} h_i(k)\,\exp\!\left(-\frac{2\pi\sqrt{-1}\,m\,i}{N}\right)\right|, \quad m = 0, 1, \ldots, N-1, \tag{4}$$

where the phase information is discarded [12]. For a periodic signal, a different start location changes only the phase of its Fourier transform; the amplitudes are unchanged. Therefore, the Fourier transform without phase information is used to obtain invariance to the start location. To gain robustness to noise, only the coefficients of $f^k$ of order lower than $M$ are used in the descriptor, where $M$ is much smaller than $N$. These $M$ coefficients differ in importance. For example, the lowest order coefficient is the most stable and important. Therefore, weights are needed for these Fourier coefficients. Equations (5) and (6) show the weighted Fourier descriptor:

$$F^k = \{\theta_f^m\, f^k(m)\}, \quad m = 0, 1, \ldots, M-1, \tag{5}$$

$$F = \{F^1, F^2, \ldots, F^K\}, \tag{6}$$

where $\theta_f^m$ is the weight of $f^k(m)$, and $F^k$ is the $k$th field of the weighted Fourier descriptor $F$.
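The start-point invariance of the phase-free Fourier coefficients can be checked numerically. The sketch below assumes a $1/N$ scaling so that the zeroth coefficient equals the mean of the field; the function names and the random test signal are illustrative assumptions.

```python
import numpy as np

def fourier_magnitudes(field, num_coeffs):
    """Return the first `num_coeffs` DFT magnitudes of one field, scaled by 1/N."""
    field = np.asarray(field, dtype=float)
    n = field.size
    spectrum = np.abs(np.fft.fft(field)) / n  # taking the magnitude discards the phase
    return spectrum[:num_coeffs]

def weighted_fourier_field(field, weights):
    """Apply per-coefficient weights (theta_f^m) to the low-order magnitudes."""
    weights = np.asarray(weights, dtype=float)
    return weights * fourier_magnitudes(field, weights.size)

rng = np.random.default_rng(0)
signal = rng.random(64)            # stand-in for one field h^k with N = 64
shifted = np.roll(signal, 17)      # same contour, different start point
a = fourier_magnitudes(signal, 8)
b = fourier_magnitudes(shifted, 8)
```

Here `a` and `b` agree up to floating-point error, which is the property that lets two shapes be compared without searching for a common start point.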

Weighted Wavelet-Like Descriptor IDSC-wW
The wavelet transform is another applicable tool for extracting the features of a signal. Inspired by the application of the Haar wavelet transform [18], a novel wavelet-like descriptor is proposed in this article. Equations (7) to (10), obtained by analyzing the Haar wavelet, define the transform. In these equations, $d^r$ is a sequence containing $N$ elements (to simplify the calculation, $N$ must be a power of 2; $N = 512$ in this article), and $d^{r+1}$ is the result of transforming $d^r$ by Equations (7) and (8). $o^r(i)$ is the order number of the $i$th element $d^r(i)$. $d^{\log_2 N}$ is obtained from $d^0$ by iterating Equations (7) to (10) $\log_2 N$ times.
It is important to note that these wavelet-like coefficients are not invariant to the start location of a circular sequence. Therefore, the input sequence must be preprocessed. In this article, each field of the IDSC shape context is sorted in ascending order, and the sorting result is expressed as $d_k^0$. $w^k$ is then obtained by Equation (11). It can be seen that $w^k$ consists of $\log_2 N$ elements; $w^k(j)$ describes the sum of the amplitudes of $d_k^{\log_2 N}$ in the $j$th order. To make the wavelet-like descriptor robust to noise, only the first $J$ elements of $w^k$ are used in the descriptor, where $J$ is smaller than $\log_2 N$. Weights are needed for the elements of $w^k$ for the same reason as in IDSC-wF:

$$W^k = \{\theta_w^j\, w^k(j)\}, \quad j = 0, 1, \ldots, J-1, \tag{12}$$

$$W = \{W^1, W^2, \ldots, W^K\}, \tag{13}$$

where $\theta_w^j$ is the weight of $w^k(j)$, and $W$ is the weighted wavelet-like descriptor. Just like the weighted Fourier descriptor, $W^k$, derived from $h^k$, is the $k$th field of this weighted wavelet-like descriptor.
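To illustrate the idea of sorting a field and then summarizing it level by level, the sketch below uses the standard Haar averaging and differencing steps as a stand-in. It is not the transform of Equations (7) to (11), whose exact form is not reproduced here; the function name and the per-level summary are assumptions, and this version returns $\log_2 N + 1$ values with the overall mean in position 0.

```python
import numpy as np

def haar_like_profile(field):
    """Sort a field, run Haar averaging/differencing, and sum |details| per level."""
    values = np.sort(np.asarray(field, dtype=float))  # ascending sort removes the start point
    n = values.size
    if n < 2 or n & (n - 1):
        raise ValueError("field length must be a power of 2")
    level_sums = []
    approx = values
    while approx.size > 1:
        pairs = approx.reshape(-1, 2)
        detail = (pairs[:, 1] - pairs[:, 0]) / 2.0   # Haar-style difference
        approx = pairs.mean(axis=1)                  # Haar-style average
        level_sums.append(np.abs(detail).sum())
    # Element 0 is the overall mean; the rest run from the coarsest to the finest level.
    return np.array([approx[0]] + level_sums[::-1])

field = np.array([3.0, 0.0, 1.0, 2.0, 5.0, 4.0, 7.0, 6.0])
profile = haar_like_profile(field)
same = haar_like_profile(np.roll(field, 3))
```

Because the field is sorted first, `profile` and `same` coincide, which mirrors the role of the sorting step as a way to remove the dependence on the start location.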

Weighted Fourier and Wavelet-Like Descriptor
Generally, neither IDSC-wF nor IDSC-wW alone is sufficient for shape retrieval; therefore, a more effective descriptor is necessary. Combining IDSC-wF with IDSC-wW yields a new descriptor, named IDSC-wFW (the weighted Fourier and wavelet-like descriptor), which is given by Equation (14):

$$\text{IDSC-wFW} = \{F, W\}, \tag{14}$$

where IDSC-wFW is the descriptor containing IDSC-wF and IDSC-wW. The experiments in Section 4.1 show that IDSC-wFW is more effective than either IDSC-wF or IDSC-wW.
Not every pair of descriptors is suitable for combination. However, IDSC-wW is suitable to be combined with IDSC-wF because they extract complementary information from one contour. IDSC-wW extracts the features of each field of the IDSC after sorting the field, instead of from the original field. That is to say, IDSC-wW extracts the structural information of the contour in a coding mode different from that of IDSC-wF. This structural information supplements the discriminability of IDSC-wF, so IDSC-wF and IDSC-wW are suitable for combining. In addition, the calculation of IDSC-wW is extremely inexpensive, so using IDSC-wF and IDSC-wW together does not affect the efficiency much.

Invariance of IDSC-wFW
Both SC and IDSC are invariant to translation and scale. In addition, IDSC obtains invariance to rotation through the inner-angle. The IDSC-wFW descriptor consists of two parts, both of which are transformed from IDSC. Neither the Fourier transform nor the wavelet-like transform changes the invariance of IDSC to rotation, translation and scaling. Thus, the IDSC-wFW descriptor has the same invariance as IDSC to rotation, translation and scaling.

Shape Dissimilarity Measure
Given two IDSC-wFW descriptors, IDSC-wFW(A) = {F(A), W(A)} and IDSC-wFW(B) = {F(B), W(B)}, extracted from shapes A and B, respectively, the dissimilarity between shapes A and B is

$$D(A, B) = \theta_F\, D_F(A, B) + \theta_W\, D_W(A, B), \tag{15}$$

where the dissimilarity consists of two parts, which belong to the IDSC-wF descriptor and the IDSC-wW descriptor, respectively. $\theta_F$ is the weight of the distance measured by IDSC-wF, and $\theta_W$ is the weight of the distance measured by IDSC-wW. Equations (16) and (17) show how to compute $D_F(A, B)$ and $D_W(A, B)$ with a distance of order $t$. In fact, when $t = 1.5$, the precision is higher, but $t$ is set to 1 to achieve higher efficiency. In other words, the city block distance is used in this approach.
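For the case $t = 1$ used in this approach, the dissimilarity reduces to a weighted sum of two city block distances. The sketch below assumes each descriptor is stored as a pair of arrays (a $K \times M$ Fourier part and a $K \times J$ wavelet-like part); this storage layout, the function names and the toy values are illustrative assumptions, while the default weights follow the values $\theta_F = 1$ and $\theta_W = 7$ reported in the experiments.

```python
import numpy as np

def city_block(x, y):
    """Sum of absolute differences over all fields and coefficients (t = 1)."""
    return float(np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float)).sum())

def dissimilarity(desc_a, desc_b, theta_f=1.0, theta_w=7.0):
    """Weighted sum of the Fourier-part and wavelet-like-part distances."""
    f_a, w_a = desc_a
    f_b, w_b = desc_b
    return theta_f * city_block(f_a, f_b) + theta_w * city_block(w_a, w_b)

# Toy descriptors with K = 2 fields, M = 3 Fourier and J = 2 wavelet-like values each.
shape_a = (np.array([[0.0, 0.4, 0.1], [0.0, 0.2, 0.3]]), np.array([[1.0, 0.5], [2.0, 0.1]]))
shape_b = (np.array([[0.0, 0.1, 0.1], [0.0, 0.2, 0.5]]), np.array([[1.5, 0.5], [2.0, 0.3]]))
d_ab = dissimilarity(shape_a, shape_b)
```

No correspondence search is involved: the cost of one comparison is proportional to the descriptor length $(M + J)K$.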

Computational Complexity
The computational complexity of IDSC-wFW consists of four parts: $O(N^3)$ for computing the inner distance with the shortest path algorithm [3], $O(N^2)$ for constructing the histograms, $O((M + J)KN)$ for the Fourier transform and the wavelet-like transform, and $O((M + J)K)$ for the matching cost. Usually, $M \le 2\log_2 N$ and $J \le \log_2 N$, so $O((M + J)K) = O(K \log N)$. The fourth cost, $O(K \log N)$, is the most important for online retrieval in a large database. The work of extracting shape features can be done offline; however, matching must be done online, so the matching complexity is more important. Table 1 shows the computational complexity of the matching cost of different methods.
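As a rough worked comparison, consider the $O(KN^2)$ matching cost that this article reports for DP-based matching in Section 4.1 against the $O((M + J)K)$ cost above. Assuming both methods use the same $N$ and $K$ and ignoring constant factors (an idealization, since implementations may sample different numbers of points), the ratio of operation counts does not depend on $K$. With $N = 512$, $M = 12$ and $J = 3$ as set in this article:

```latex
\frac{K N^{2}}{(M + J)\,K} \;=\; \frac{N^{2}}{M + J}
\;=\; \frac{512^{2}}{12 + 3} \;=\; \frac{262144}{15} \;\approx\; 1.7 \times 10^{4}
```

This counts operations only; measured speed-ups depend on the implementation and on the actual sampling density used by each method.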

Experiments
In the experiments of this article, the compared methods are implemented in Matlab. They are run on a personal computer with an Intel(R) Core(TM) i5-4570 3.20 GHz CPU and 8 GB DDR2 RAM under Windows 10. Because the dynamic programming routine in IDSC [3] and SC [5] is very time consuming, it is written in C so that its running time is comparable with that of the other methods.

Experimental Results on the MPEG-7 CE-1 Part B Shape Database
To show the effectiveness of IDSC-wF, IDSC-wW and IDSC-wFW, a series of experiments is conducted. The MPEG-7 CE-1 Part B shape database contains 1400 shape images in 70 classes of various shapes, with 20 images in each class. It is widely used for evaluating similarity-based shape retrieval methods and is used in this article with the same evaluation method (bulls-eye test) as in [3,5,19]. Some example shapes are shown in Figure 1. The retrieval rates (bulls-eye test) of these three descriptors on the MPEG-7 CE-1 Part B shape database are shown in Tables 2-4. The results show that IDSC-wF obtains its highest retrieval rate of 81.64% when $M$ (the number of Fourier coefficients) is 12, and IDSC-wW obtains its highest retrieval rate of 71.87% when $J$ (the number of wavelet-like coefficients) is 2. Table 4 shows that IDSC-wFW obtains a higher retrieval rate than either IDSC-wF or IDSC-wW when $M$ and $J$ are 12 and 3, respectively. Therefore, IDSC-wFW is more effective than IDSC-wF and IDSC-wW. In these experiments, $\theta_F = 1$ and $\theta_W = 7$, which are determined empirically. Using Equations (18) and (19), we linearly reduce the weights of the higher frequency coefficients, as the contributions of the higher frequency components to the shape description are generally smaller than those of the low frequency components. The parameters $d_f^l = 0.008$ and $d_w^l = 0.21$ are also set empirically, and $M = 12$ and $J = 3$ are determined by the experimental results shown in Table 4. The six adjustable parameters are used with these settings in all experiments without tuning for any dataset.
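The bulls-eye test counts, for every query, how many shapes of the query's class appear among the $2c$ most similar shapes, where $c$ is the class size (40 candidates for classes of 20 on MPEG-7), and divides the total by the best possible count. A minimal sketch is given below; the function name, the convention that the query itself is among the candidates, and the toy data are assumptions made for illustration.

```python
import numpy as np

def bulls_eye_score(dist, labels, class_size):
    """Fraction of same-class shapes found among the 2 * class_size nearest matches."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    top = 2 * class_size
    hits = 0
    for q in range(dist.shape[0]):
        nearest = np.argsort(dist[q], kind="stable")[:top]  # the query itself is included
        hits += int(np.sum(labels[nearest] == labels[q]))
    return hits / (dist.shape[0] * class_size)

# Toy check: 6 one-dimensional "descriptors", 3 classes of 2 shapes each.
feats = np.array([0.0, 0.1, 5.0, 5.2, 9.0, 9.3])
labels = np.array([0, 0, 1, 1, 2, 2])
dist = np.abs(feats[:, None] - feats[None, :])
score = bulls_eye_score(dist, labels, class_size=2)
```

For a real evaluation, `dist` would hold the pairwise dissimilarities of all 1400 shapes and `class_size` would be 20.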
In Equations (4) and (11), both $f^k(0)$ and $w^k(0)$ are the average of $h^k$. Because only one of them is needed for describing the shape, $\theta_f^0$ is set to 0.
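One possible reading of the linear weight reduction is that each weight starts at 1 and drops by a fixed step per order. The starting value of 1 and the exact form of the schedule are assumptions here, since Equations (18) and (19) are not reproduced in this text; only the step sizes, $M$, $J$ and the choice $\theta_f^0 = 0$ come from the article.

```python
import numpy as np

def linear_weights(count, step, start=1.0):
    """Weights start, start - step, start - 2*step, ... (assumed linear schedule)."""
    return start - step * np.arange(count)

theta_f = linear_weights(12, 0.008)   # M = 12, step d_f^l = 0.008
theta_f[0] = 0.0                      # drop f^k(0): the mean is already carried by w^k(0)
theta_w = linear_weights(3, 0.21)     # J = 3, step d_w^l = 0.21
```

Under this assumption the Fourier weights stay close to 1 across all 12 orders, while the wavelet-like weights fall off much faster over their 3 orders.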
The performance of the IDSC-wFW is also compared with the state-of-the-art approaches in terms of accuracy and speed.The parameters of other methods, with which all the methods can obtain their reported high retrieval rate on the MPEG-7 CE-1 Part B shape database, are also fixed without tuning.
The retrieval speed is measured by matching time, which is the time to compute the dissimilarities between the descriptor of the query shape and the descriptors of all the shapes in the database. Table 5 shows the comparative results of the proposed IDSC-wFW and state-of-the-art benchmarks including DIR [16], MDM [13] and IDSC+DP [3]. IDSC-wFW obtains a high retrieval rate (82.34%), which is a little lower than that of IDSC+DP (85.40%) but higher than those of DIR (77.69%) and MDM (70.46%). Among the fast descriptors, which do not use dynamic programming, IDSC-wFW outperforms all other methods on accuracy.

| Algorithm | Retrieval rate (%) | Matching time |
|---|---|---|
| IDSC-wFW | 82.34 | 11.4 |
| IDSC+DP [3] | 85.40 | 6120.2 |
| DIR [16] | 77.69 | 6.7 |
| MDM [13] | 70.46 | 44.2 |
| ASD&CCD [20] | 76.85 | 27037.9 |
| FD [12] | 68.14 | 4.1 |
| FPD [11] | 65.52 | 3.3 |

More importantly, this high retrieval rate of IDSC-wFW is achieved at a speed more than 530 times faster (6120.2/11.4, see Table 5) than IDSC+DP, which uses DP for matching. This higher efficiency relative to IDSC+DP comes from a matching approach based on a distance metric with a time complexity of $O(K \log N)$, whereas the DP-based matching method used in IDSC+DP has a time complexity of $O(KN^2)$. The question is why IDSC-wFW can achieve the same level of retrieval rate with a matching approach of such low time complexity. The specific reasons are discussed in detail in Section 5.

Experimental Results on the Articulated Database
It is well known that some descriptors are strongly insensitive to articulation. IDSC-wFW inherits this property from IDSC. To show this, the proposed descriptor is applied to the Articulated shape dataset [3], which contains 40 images from eight categories (see Figure 2). To evaluate the retrieval accuracy of different methods, the three most similar candidates to each query are collected. Each image is used as a query and matched to the other images of the database. The retrieval accuracy of the first, second and third most similar matches is the ratio of correct matches to the number of queries (40). The comparative experimental results are shown in Table 6. The results demonstrate that IDSC-wFW has the same insensitivity to articulation as IDSC [3]. In fact, in terms of the average accuracy over the top three matches, IDSC-wFW (85.00%) is a little better than IDSC (84.17%). As the parameters of IDSC are fixed to best suit MPEG-7 CE-1 Part B, its retrieval result is a little different from that reported in [3].

| Algorithm | Top 1 (%) | Top 2 (%) | Top 3 (%) | Average (%) | Matching time |
|---|---|---|---|---|---|
| ASD&CCD [20] | 80.0 | 50.0 | 32.5 | 54.16 | 735390.9 |
| FD [12] | 77.5 | 37.5 | 42.5 | 52.50 | 105.6 |
| FPD [11] | 70.0 | 50.0 | 32.5 | 50.83 | 75.7 |
| MDM [13] | 62.5 | 30.0 | 27.5 | 40.00 | 1187.0 |
| SC+DP [10] | 50.0 | 25.0 | 27.5 | 34.17 | 146203.5 |

In terms of efficiency, IDSC-wFW is still more than 530 times faster (149774.1/279.3, see Table 6) than IDSC+DP. The speed of the other algorithms relative to IDSC-wFW is almost the same as in Section 4.1.
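The top-three evaluation described above can be sketched as follows: every shape is used as a query, it is excluded from its own candidate list, and the accuracy at each rank is the share of queries whose match at that rank belongs to the query's class. The function name and the toy data are illustrative assumptions.

```python
import numpy as np

def top_rank_accuracy(dist, labels, ranks=3):
    """Per-rank accuracy: share of queries whose r-th nearest other shape shares its class."""
    dist = np.asarray(dist, dtype=float).copy()
    labels = np.asarray(labels)
    np.fill_diagonal(dist, np.inf)                 # a query is not matched to itself
    order = np.argsort(dist, axis=1, kind="stable")[:, :ranks]
    correct = labels[order] == labels[:, None]
    per_rank = correct.mean(axis=0)
    return per_rank, per_rank.mean()

# Toy check: two classes of three 1-D "descriptors" each.
feats = np.array([0.0, 0.2, 0.5, 10.0, 10.1, 10.4])
labels = np.array([0, 0, 0, 1, 1, 1])
dist = np.abs(feats[:, None] - feats[None, :])
per_rank, average = top_rank_accuracy(dist, labels)
```

In the toy data each shape has only two classmates, so the third-ranked match is necessarily wrong; on the Articulated dataset each class has five shapes, so all three ranks can be correct.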

Experimental Results on the 270 Plant Leaf Database
To show the effectiveness of IDSC-wFW, a series of experiments is conducted on a plant leaf database. This plant leaf image database was collected by [21]. It contains many classes, but only 27 classes consist of more than 10 images. The first 10 images of each of these 27 classes are used to build a tidy database named "270 Plant Leaf" (see Figure 3). Leaf retrieval is widely used for evaluating the performance of shape retrieval methods. The large similarity between different classes and the variance between shapes in the same class are the main challenges in this experiment. The bulls-eye test is used for measuring the performance of the different methods on the 270 Plant Leaf database. Table 7 shows the retrieval rates of the different methods. IDSC-wFW achieves the highest score (80.7%), compared with DIR [16] (76.37%), IDSC+DP [3] (79.07%), SC+DP [10] (79.93%) and MDM [13] (64.41%). To show the performance of the different methods better, their PR (Precision-Recall) curves are drawn in Figure 4. It is hard to judge from Figure 4 alone whether IDSC-wFW performs the best. Therefore, a measure of the performance of the PR curves is used: the area enclosed by the PR curve and the axes. IDSC-wFW achieves the largest area of 0.471127, compared with ASD&CCD (0.407479), DIR (0.469016), FD (0.404098), FPD (0.354920), IDSC+DP (0.444379) and MDM (0.407438).
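The area enclosed by a sampled PR curve and the axes can be approximated with the trapezoidal rule. The article does not state which integration rule was used, so the rule, the function name and the sample curve below are assumptions for illustration.

```python
import numpy as np

def pr_area(recall, precision):
    """Trapezoidal area under a precision-recall curve given as sampled points."""
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    order = np.argsort(recall, kind="stable")      # integrate along increasing recall
    r, p = recall[order], precision[order]
    return float(np.sum((r[1:] - r[:-1]) * (p[1:] + p[:-1]) / 2.0))

# Made-up curve, not data from the article.
recall = [0.0, 0.5, 1.0]
precision = [1.0, 0.8, 0.2]
area = pr_area(recall, precision)
```

A single scalar like this makes curves that cross each other easier to rank, at the cost of hiding where along the recall axis one method is better.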

Experimental Results on Sensitivity to Noise
To show the robustness of the proposed IDSC-wFW to noise, some experiments are conducted in this section. The same noise perturbation scheme as in [11,19] is adopted: for each shape image in the MPEG-7 CE-1 Part B dataset, random Gaussian noise with different variances is added to the coordinates (x, y) of the boundary points. The noise corruption of a shape is measured by the signal-to-noise ratio (SNR) as

$$\mathrm{SNR} = 10 \log_{10} \frac{\sigma_s^2}{\sigma_n^2}\ \mathrm{dB}, \tag{20}$$

where $\sigma_s^2$ and $\sigma_n^2$ are the variances of the signal and noise sequences, respectively. In these experiments, the measure is still the bulls-eye test. The Gaussian noise is added to the query. The bulls-eye test on MPEG-7 CE-1 Part B [3] is run 10 times, once for each noise level SNR = 10 dB, 20 dB, ..., 100 dB. Figure 5 summarizes the ability of the proposed IDSC-wFW to resist noise. The performance of IDSC-wFW is not affected by noise from 20 dB to 100 dB. When SNR = 10 dB, the retrieval rate is still maintained above 80%.
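A coordinate sequence can be perturbed to a target SNR by solving the SNR definition for the noise variance, $\sigma_n^2 = \sigma_s^2 / 10^{\mathrm{SNR}/10}$. The sketch below treats each coordinate sequence separately, which is an assumption about how the scheme is applied; the function names and the toy contour are also illustrative.

```python
import numpy as np

def add_noise_at_snr(signal, snr_db, rng):
    """Add zero-mean Gaussian noise whose variance gives the requested SNR in dB."""
    signal = np.asarray(signal, dtype=float)
    noise_var = signal.var() / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(noise_var), size=signal.shape)

def measured_snr_db(clean, noisy):
    """SNR = 10 * log10(var(signal) / var(noise)) for a known clean signal."""
    noise = np.asarray(noisy, dtype=float) - np.asarray(clean, dtype=float)
    return 10.0 * np.log10(np.var(clean) / np.var(noise))

rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 4096, endpoint=False)
x = 100.0 * np.cos(t)                      # x-coordinates of a toy circular contour
x_noisy = add_noise_at_snr(x, 20.0, rng)
snr = measured_snr_db(x, x_noisy)
```

The measured value in `snr` lands close to the requested 20 dB, with a small deviation that comes from estimating the noise variance from a finite sample.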

Discussion
For the experiments on the MPEG-7 CE-1 Part B shape database, Figure 6 gives the retrieval results of four typical shapes obtained by IDSC+DP and the proposed IDSC-wFW. The results are listed in ascending order of dissimilarity (the top 10 ranked matches are shown). The odd columns are the results of IDSC-wFW, and the even columns are the results of IDSC+DP. In these results, IDSC+DP performs better than IDSC-wFW for simple shapes. However, for complex shapes, for example, flies and lizards, IDSC-wFW is much better than IDSC+DP. The experimental results on the Articulated database show that the proposed IDSC-wFW is as insensitive to articulation as IDSC, and perhaps a little better. The experimental results on the 270 Plant Leaf database show that IDSC-wFW performs better than the other methods, including IDSC+DP [3], which supports the superiority of the proposed IDSC-wFW in plant leaf retrieval. The experimental results on sensitivity to noise show that IDSC-wFW has strong anti-noise ability.
The high efficiency and effectiveness of IDSC-wFW have been confirmed by the experiments in Section 4. However, why does IDSC-wFW run both fast and accurately? The reason is explained next. The shape context used by IDSC+DP is based on the primitive (contour sampling point) representation, which creates a shape histogram for each primitive. A histogram contains several fields. Since the relative order between the primitives is uncertain, IDSC+DP has to assume that the primitives are completely independent when matching. Then, DP finds the best correspondence between the two sets of primitives. In fact, however, neighboring primitives keep their relative order with high probability. In other words, IDSC+DP does not exploit this likely ordering between primitives.
In the proposed approach, the shape context is reconstructed into a new feature based on the field representation. Each field representation is a vector whose elements are determined by the corresponding primitives (contour points). Then, the Fourier transform and the wavelet-like transform are applied to the vector (sequence) of each field. These two transforms can ignore some large deviations in a sequence and extract its strong features. Therefore, the discriminability of IDSC-wFW is not worse than that of the original IDSC. In addition, the frequency domain descriptor is well suited to the distance metric method used in the matching process. The Fourier transform and the wavelet-like transform are performed offline. Thus, IDSC-wFW runs fast and accurately.
Overall, all the above advantages and disadvantages of IDSC-wFW are related to the special processing in the frequency domain. In terms of effectiveness, IDSC-wFW reduces high frequency noise, which often causes large dissimilarity between shapes in the same class. In terms of efficiency, the processing in the frequency domain makes IDSC-wFW work well with an efficient distance metric method in the matching stage.

Figure 1. One example from each of the 70 categories in the MPEG-7 CE-1 Part B shape database is shown on the left side of this figure. All of the shapes of the class "Chicken" are shown on the right side of this figure.

Figure 2. Shapes in the articulated shape database. Each category has five shapes in this database.

Figure 3. The 270 Plant Leaf Database, which contains 27 classes with 10 images in each class.

Figure 4. Precision-Recall curves of different methods on the 270 Plant Leaf Database.

In terms of efficiency, IDSC-wFW is still much more efficient than IDSC+DP: it is more than 590 times faster (1065.3/1.8) and spends only about 0.17% of the matching time used by DP-based methods. The speed of the other algorithms relative to IDSC-wFW is again almost the same as in Section 4.1.

Figure 5. Retrieval rates of IDSC-wFW on MPEG-7 in the presence of noise.

Figure 6. The retrieval results of IDSC and IDSC-wFW on four queries. The odd columns are the results of IDSC-wFW, and the even columns are the results of IDSC. The first row shows the queries, and the second to eleventh rows show the second to eleventh most similar shapes to the queries.

Table 1. Computational complexities of some well-known algorithms during feature matching.

Table 2. Retrieval results on MPEG-7 CE-1 Part B shape database of IDSC-wF with different M values.

Table 3. Retrieval results on MPEG-7 CE-1 Part B shape database of IDSC-wW with different J values.

Table 4. Retrieval results on MPEG-7 CE-1 Part B shape database of IDSC-wFW with different M and J values.

Table 5. Retrieval results of different algorithms on MPEG-7 CE-1 Part B shape database.

Table 6. Retrieval results of different algorithms on the Articulated shape database.

Table 7. Retrieval results on the 270 Plant Leaf database of different algorithms.