A Fourier Descriptor of 2D Shapes Based on Multiscale Centroid Contour Distances Used in Object Recognition in Remote Sensing Images

A shape descriptor is an effective tool for describing the shape features of objects in remote sensing images, and researchers have proposed many excellent descriptors. Some of them show very strong discriminability in experiments, but their computational cost is usually high, which makes them unsuitable for practical applications. This paper proposes a new descriptor, FMSCCD (Fourier descriptor based on multiscale centroid contour distance). It is a frequency domain descriptor based on the CCD (centroid contour distance) method, multiscale description, and the Fourier transform. The principle of FMSCCD is simple and its computational cost is very low, yet its discriminability remains strong and it combines well with other features. Experiments on three databases demonstrate its strong discriminability and operational efficiency.


Introduction
The objects in remote sensing images are more blurred than those in common images, so it is hard to recognize them with texture and point features, as Figure 1 shows. Without texture features and feature points, only shape features remain for identifying objects, and a shape descriptor is well suited to this task. In theory, deep learning could also perform object recognition in remote sensing images, but establishing a training dataset is very difficult because such images are not easy to obtain. Therefore, a shape descriptor is a suitable tool for object recognition in remote sensing images.
A shape descriptor is used to extract the shape features of an object in an image. Research on shape descriptors has attracted scholars for more than 20 years, and in the past two decades many effective descriptors [1-4] and post-processing methods [5-7] based on machine learning have appeared. Among them, IDSC + DP (inner-distance shape context and dynamic programming) [8,9], SC + DP (shape context and dynamic programming) [10], Shape Tree [4], TAR (triangle-area representation) [1], FD (Fourier descriptor) [11], and WD (wavelet descriptor) [12] are classical descriptors. Co-transduction [13], LCDP (locally constrained diffusion process) [14], and GMM (modified mutual graph) [15] are post-processing methods that have become popular recently; however, their high accuracy still depends on the performance of the classical descriptors.
FD [11] is a frequency domain descriptor with high practical value because of its excellent balance between speed and precision. FD usually first computes a feature vector in the spatial domain, for example CCD (centroid contour distance), and then transforms it into a frequency domain feature vector. In [11], FD-CCD (FD based on CCD) obtained the best experimental results on the MPEG-7 CE1 Part B shape database among many combinations. MDM (multiscale distance matrix) [16] is another descriptor known for its speed: it uses a multiscale description method to compute a feature matrix for a shape, and in the matching process the dissimilarity is the city block distance between two feature matrices. DIR (distance interior ratio) [17] is a relatively new descriptor that is as fast as FD-CCD but more accurate.
ASD&CCD [18] combines the CCD method with the ASD (angle scale descriptor) method. It is more accurate than FD-CCD and MDM, but it runs slowly because it uses an optimization algorithm to find the best correspondence between starting points. In this paper, the Fourier transform and multiscale description are used to improve ASD and CCD.
The ASD [18] feature consists of several angle sequences computed at different scales. Each element of a sequence is the angle formed at a contour point by two other contour points, one before it and one after it. These two points are separated from the middle point by the same interval, measured as a number of contour points, and this interval defines the scale. To improve ASD, each angle sequence is transformed to the frequency domain to form FASD (Fourier descriptor based on ASD).
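To make the ASD description concrete, here is a minimal Python sketch of one angle sequence and of an FASD-style frequency feature. It assumes the angle is the unsigned angle between the vectors from the middle point to its rear and front points, that indices wrap around the closed contour, and that FASD keeps the magnitudes of the first few DFT coefficients divided by the sequence length; `asd_sequence` and `fasd` are hypothetical names, and the exact definitions in [18] may differ.

```python
import math
import cmath


def asd_sequence(points, scale):
    """Angle at each contour point formed with the points `scale` steps
    before and after it; indices wrap because the contour is closed."""
    n = len(points)
    angles = []
    for i in range(n):
        x0, y0 = points[i]
        xa, ya = points[(i - scale) % n]  # rear point
        xb, yb = points[(i + scale) % n]  # front point
        ux, uy = xa - x0, ya - y0
        vx, vy = xb - x0, yb - y0
        # atan2(|cross|, dot) is the unsigned angle between u and v, in [0, pi]
        angles.append(math.atan2(abs(ux * vy - uy * vx), ux * vx + uy * vy))
    return angles


def fasd(seq, k):
    """ASSUMED FASD form: |DFT| of the angle sequence, first k coefficients,
    divided by the sequence length (naive DFT, no external libraries)."""
    n = len(seq)
    return [abs(sum(seq[t] * cmath.exp(-2j * math.pi * f * t / n)
                    for t in range(n))) / n
            for f in range(k)]


# Regular octagon: at scale 1 every angle equals the interior angle 3*pi/4
octagon = [(math.cos(2 * math.pi * i / 8), math.sin(2 * math.pi * i / 8))
           for i in range(8)]
print([round(a, 4) for a in asd_sequence(octagon, 1)])  # eight values of 2.3562
```

Keeping only magnitudes is what makes such a feature insensitive to the starting point: a cyclic shift of the sequence changes only the phases of its DFT coefficients.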
The CCD [18] feature is a distance sequence in which each element is the distance between a contour point and the centroid of the contour. Because the CCD method is so simple, improving it is the central work of this article: both the Fourier transform and multiscale description are used to form FMSCCD (Fourier descriptor based on multiscale CCD).
IDSC + DP [8] is undoubtedly a strong descriptor and has achieved very high accuracy on several databases, but because it uses dynamic programming in the matching process, its matching efficiency is extremely low, which makes it impractical for engineering use. Like IDSC + DP, Shape Tree [4] and TAR [1] are accurate but slow. Some researchers also use skeletons to describe shapes [19,20], but skeleton-based methods are less popular than contour-based methods.
Some descriptors based on machine learning [21,22] have appeared in recent years, and such methods are sometimes used in medical image analysis [23]. However, because training datasets are not easy to obtain, these methods are not universally applicable.
In the remainder of this paper, Section 2 describes the calculation process of the proposed method, Section 3 evaluates its performance on several databases, and Section 4 discusses the results. Finally, conclusions are drawn.

Methods
CCD [18] is a commonly used spatial domain feature. For an ordered sequence $p_1, p_2, \ldots, p_{N_p}$ of uniformly sampled contour points, where $N_p$ is the number of sampling points on the contour, the centroid of these contour points is first calculated using Equation (1):
$$x_m = \frac{1}{N_p}\sum_{i=1}^{N_p} x_i, \qquad y_m = \frac{1}{N_p}\sum_{i=1}^{N_p} y_i \qquad (1)$$
where $p_i = (x_i, y_i)$ is the $i$th contour point of a shape. Then, the Euclidean distance between each contour point and the centroid $p_m = (x_m, y_m)$ is calculated using Equation (2):
$$d^{i}_{uc} = \sqrt{(x_i - x_m)^2 + (y_i - y_m)^2} \qquad (2)$$
where $d^{i}_{uc}$ is the unnormalized Euclidean distance between the $i$th point and the centroid $p_m$. The normalized distance $d^{i}_{ccd}$, the centroid contour distance, is calculated with Equation (3). The purpose of normalization is to make the feature scale invariant and to reduce the disturbance caused by changes in the number of sampling points.
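The following Python sketch computes Equations (1) and (2) on a contour that is assumed to be already resampled into $N_p$ uniformly spaced points. Equation (3) is not reproduced in the text, so the sketch assumes that normalization divides every distance by the sum of all distances, which yields scale invariance; `centroid` and `ccd` are hypothetical names, not the authors' MATLAB code.

```python
import math


def centroid(points):
    """Equation (1): mean of the contour point coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def ccd(points):
    """CCD sketch: Equation (2) distances to the centroid, then an ASSUMED
    form of Equation (3) that divides by the sum of all distances."""
    xm, ym = centroid(points)
    d_uc = [math.hypot(x - xm, y - ym) for x, y in points]  # d^i_uc
    total = sum(d_uc)
    if total == 0:
        raise ValueError("degenerate contour: all points coincide")
    return [d / total for d in d_uc]  # d^i_ccd, sums to 1


# Translating and scaling a contour leaves its CCD feature unchanged
contour = [(0, 0), (4, 0), (4, 2), (1, 3), (0, 2)]
moved = [(3 * x + 5, 3 * y - 1) for x, y in contour]
print(max(abs(a - b) for a, b in zip(ccd(contour), ccd(moved))))
```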
The sequence $ccd = (d^{1}_{ccd}, d^{2}_{ccd}, \ldots, d^{N_p}_{ccd})$ is the CCD feature of a shape and can be used directly to describe it. However, the CCD method has a problem: when the starting point of sampling on the closed contour changes, the CCD feature vector shifts cyclically. Therefore, in the CCD feature space, the distance between two shapes $s_1$ and $s_2$ is computed with Equation (4).
where $dis_{ccd}(s_1, s_2)$ is the distance (dissimilarity) between shapes $s_1$ and $s_2$ in the CCD feature space, and $d^{n+i}_{ccd} = d^{n+i-N_p}_{ccd}$ holds because the contour is closed. The optimization problem in Equation (4) is non-convex, so general convex optimization methods are not applicable. Evolutionary algorithms can be used to solve it, but with lower efficiency, and an exhaustive search over all $N_p$ starting points requires on the order of $N_p^2$ operations for each pair of shapes. Therefore, the CCD method has no advantage in efficiency. Some scholars use the Fourier transform [7] to convert CCD into a frequency domain feature, FD-CCD (Fourier descriptor based on CCD), obtained with Equation (5).
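Equation (4) itself is not reproduced in the text. The hedged sketch below assumes its inner term is the city block (sum of absolute differences) distance and implements the exhaustive search over all $N_p$ starting points; `dis_ccd` is a hypothetical name. The nested loop makes the quadratic cost per pair of shapes explicit.

```python
def dis_ccd(d1, d2):
    """Exhaustive sketch of Equation (4): the smallest distance between d1 and
    every cyclic shift of d2. The inner city-block (L1) distance is an
    ASSUMPTION; the text only states that a minimum over shifts is taken."""
    n = len(d1)
    if len(d2) != n:
        raise ValueError("CCD vectors must have the same length N_p")
    best = float("inf")
    for shift in range(n):  # every possible starting point
        # (shift + i) % n realizes d^{n+i} = d^{n+i-N_p} on the closed contour
        dist = sum(abs(d1[i] - d2[(shift + i) % n]) for i in range(n))
        best = min(best, dist)
    return best


a = [0.1, 0.2, 0.3, 0.4]
print(dis_ccd(a, a[1:] + a[:1]))  # a cyclically shifted copy matches: 0.0
```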
The matching method of $F_{ccd}$ is based on the city block distance, as shown in Equation (6).
where $s_1$ and $s_2$ are the index numbers of two shapes and $K$ is the number of frequency domain coefficients used in the matching process; in this article, $K = 50$. The ASD [18] feature can also be transformed into the frequency domain feature FASD through the Fourier transform in the same way, which likewise improves efficiency in the matching stage. The FASD feature is used in the experimental part. The matching efficiency of FCCD (Fourier descriptor based on centroid contour distance) is greatly improved, but its accuracy improves little.
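The following sketch, under stated assumptions, shows why moving to the frequency domain removes the starting-point search. It assumes the FD-CCD feature consists of the magnitudes of DFT coefficients 1 to $K$ of the CCD sequence, skipping the DC term; Equation (5) may use a different index range or normalization. `fd_ccd` and `dis_fccd` are hypothetical names.

```python
import math
import cmath


def fd_ccd(d, k=50):
    """ASSUMED FD-CCD form: magnitudes of DFT coefficients 1..k of a CCD
    sequence. The DC term (the sum of the sequence) is skipped because it
    carries no shape information when the CCD is normalized to unit sum.
    A naive DFT is used to stay dependency-free."""
    n = len(d)
    if not 0 < k < n:
        raise ValueError("need 0 < k < len(d)")
    return [abs(sum(d[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)))
            for f in range(1, k + 1)]


def dis_fccd(f1, f2):
    """Equation (6): city-block distance over the K retained coefficients."""
    return sum(abs(a - b) for a, b in zip(f1, f2))


d = [0.05, 0.20, 0.10, 0.15, 0.08, 0.12, 0.18, 0.12]
d_shifted = d[3:] + d[:3]  # same contour, different starting point
print(dis_fccd(fd_ccd(d, k=3), fd_ccd(d_shifted, k=3)))  # ~0 up to rounding
```

A cyclic shift changes only the phase of each DFT coefficient, so the magnitudes, and hence the distance, do not depend on where sampling starts. Matching then costs $O(K)$ per pair instead of $O(N_p^2)$; in practice an FFT routine would replace the naive DFT used here.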
To demonstrate the discriminability of the CCD method, an exhaustive search over all starting points is temporarily used in its matching process. Using the CCD method in shape matching in the MPEG-7 (7).
where the database contains $N_d$ shapes, and $sign(s_1, s_2)$ indicates whether two shapes belong to the same class, as shown in Equation (8).
where $label(\cdot)$ is the class label of a shape. The average distance between shapes of the same class, $dis^{s}_{ccd}$, is 0.1322, as calculated with Equation (9).
Generally, a threshold $t_{ccd}$ ($dis^{s}_{ccd} < t_{ccd} < dis^{d}_{ccd}$) is set for shape matching. When the distance between two shapes is larger than $t_{ccd}$, they are judged to belong to different classes; when it is smaller than $t_{ccd}$, they are judged to belong to the same class. This decision rule is rudimentary, but it is of high value in engineering practice.
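The sketch below illustrates the statistics behind Equations (7) to (9) and the threshold rule. Because those equations are not reproduced in the text, it assumes the within-class average $dis^{s}_{ccd}$ and the between-class average $dis^{d}_{ccd}$ are plain means over all unordered pairs of distinct shapes; `class_average_distances` and `judged_same_class` are hypothetical names, and `dist` can be any pairwise distance, such as Equation (4).

```python
from itertools import combinations


def class_average_distances(features, labels, dist):
    """ASSUMED forms of Equations (7)-(9): mean distance over all unordered
    pairs of distinct shapes with the same label (dis_s) and with different
    labels (dis_d)."""
    same, diff = [], []
    for i, j in combinations(range(len(features)), 2):
        d = dist(features[i], features[j])
        (same if labels[i] == labels[j] else diff).append(d)
    if not same or not diff:
        raise ValueError("need at least one same-class and one cross-class pair")
    return sum(same) / len(same), sum(diff) / len(diff)


def judged_same_class(distance, t):
    """Threshold rule: same class if the distance is below t, else different."""
    return distance < t


feats = [0.0, 0.1, 1.0, 1.2]          # toy 1-D "features"
labs = ["a", "a", "b", "b"]
dis_s, dis_d = class_average_distances(feats, labs, lambda u, v: abs(u - v))
t = (dis_s + dis_d) / 2               # one choice with dis_s < t < dis_d
print(judged_same_class(0.3, t))      # True
```

The midpoint threshold is only an illustrative choice; any value strictly between the two averages satisfies the condition stated above.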
In terms of discriminability, CCD and FCCD have the same weakness, because FCCD is derived from CCD. Human eyes can easily distinguish the two shapes in Figure 2, but the CCD method cannot. In the CCD feature space, the distance between them is 0.1007, which is significantly less than $dis^{s}_{ccd}$ (0.1322), so they are very likely to be judged as belonging to the same class. Figure 3 shows their CCD feature vector curves, which are very similar, and Figure 4 shows their FCCD feature vector curves, which remain very similar. What causes this error? $d^{i}_{ccd}$ is a distance scalar without direction: the direction of each contour point relative to the centroid is lost when $d^{i}_{ccd}$ is calculated, so different shapes can have similar CCD feature vectors. Because the FCCD feature is derived from the CCD feature, it inherits this limitation.
In addition, the CCD method has an even more serious problem: it describes contour details poorly. In Figure 5, the distance between the two shapes of every pair lies between 0.0277 and 0.0907, which is always smaller than $dis^{s}_{ccd}$, so the CCD method cannot identify the difference within any pair in Figure 5. Why can the CCD method not distinguish them? The differences between them are local, whereas the CCD method mainly captures global features. Figure 6 shows the difference between the CCD feature vectors of each pair: the two vectors are globally similar even though the shapes belong to different classes. Such small local differences are enough to place two shapes in completely different classes, but unfortunately they do not noticeably increase the distance between the shapes in the CCD feature space.
The difference between the FCCD features of each pair of shapes is shown in Figure 7, and the situation is similar to that of CCD. Combining the CCD method with the Fourier transform does not substantially improve discriminability, although FCCD is much more efficient than CCD.
The FMSCCD method (Fourier descriptor based on multiscale CCD) is proposed to address the failure of CCD to capture local differences and direction information. To facilitate calculation, the number of sampling points on the contour in FMSCCD must satisfy $N_p = 2^{t_0} + 1$, $t_0 \in \mathbb{Z}^{+}$. The CCD method always uses a constant global centroid, whereas the FMSCCD method uses a novel dynamic centroid. Before the distance from a contour point to its dynamic centroid is calculated, the dynamic centroid itself is calculated with Equation (10).
where $h$ indicates the level of the scale, from global to local; the larger the value of $h$ in Equations (10) to (12), the finer the obtained feature. Then, with the dynamic centroid $p^{h,i}_{dc} = (x^{h,i}_{dc}, y^{h,i}_{dc})$, the unnormalized distance from $p_i$ to the dynamic centroid is calculated with Equation (11).
Next, the same normalization method is used.
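Equations (10) to (12) are not reproduced in the text, so the following is only a hypothetical reading, not the authors' definition. It assumes the dynamic centroid of $p_i$ at level $h$ is the mean of the $2r + 1$ contour points centred on $p_i$, with half-width $r = (N_p - 1)/2^h$. Under this assumption, $h = 1$ covers the whole contour and reproduces the global centroid of CCD, each further level halves the window (moving from global to local), and reading the sampling constraint as $N_p = 2^{t_0} + 1$ keeps $r$ an integer for $h \le t_0$. The sum normalization and the name `dynamic_centroid_distances` are likewise assumptions.

```python
import math


def dynamic_centroid_distances(points, h):
    """HYPOTHETICAL sketch of Equations (10)-(12): the dynamic centroid of
    point i at level h is ASSUMED to be the mean of the 2r + 1 points centred
    on i, with r = (N_p - 1) / 2**h. Distances are then divided by their sum
    (also an assumption about the normalization)."""
    n = len(points)
    if h < 1 or (n - 1) % (2 ** h) != 0:
        raise ValueError("N_p - 1 must be divisible by 2**h (e.g. N_p = 2**t0 + 1, h <= t0)")
    r = (n - 1) // 2 ** h  # half-width of the local window at level h
    dists = []
    for i in range(n):
        window = [points[(i + k) % n] for k in range(-r, r + 1)]
        cx = sum(x for x, _ in window) / len(window)
        cy = sum(y for _, y in window) / len(window)
        x, y = points[i]
        dists.append(math.hypot(x - cx, y - cy))
    total = sum(dists)
    if total == 0:
        raise ValueError("every point coincides with its dynamic centroid")
    return [d / total for d in dists]


# 9 = 2**3 + 1 points, so levels h = 1, 2, 3 are available
ring = [(math.cos(2 * math.pi * i / 9), math.sin(2 * math.pi * i / 9)) for i in range(9)]
for h in (1, 2, 3):
    print(h, [round(v, 4) for v in dynamic_centroid_distances(ring, h)])
```

Smaller windows at larger $h$ also illustrate why fine levels are more sensitive to contour noise, as the next paragraph notes.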
Combining $F^{h}_{dc}$ for different values of $h$ forms the FMSCCD feature. When $h$ is large, the relative location of the dynamic centroid is easily disturbed by noise on the contour, so the robustness of $D^{h}_{dc}$ and $F^{h}_{dc}$ decreases as $h$ increases. Therefore, a weighted summation is used when more than one scale is selected to form the multiscale features MSCCD and FMSCCD.
When an MSCCD feature is matched to another one, Equation (14) is used; when an FMSCCD feature is matched to another one, Equation (15) is used.
where $w_h$ ($0 < w_h \le 1$) is the weight of the feature at scale level $h$. The MSCCD and FMSCCD methods are then applied to the two shapes in Figure 2. The difference between their MSCCD feature vectors at each scale is shown in Figure 8, and the difference between their FMSCCD feature vectors at each scale is shown in Figure 9. At some scales, the difference between the features is larger than in Figures 3 and 4.
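Equations (14) and (15) are described only as a weighted summation over the selected scales. A minimal sketch, assuming each level contributes the city block distance between its feature vectors multiplied by $w_h$, follows; the function name and the example weights are hypothetical, and the parameters $e_w$ and $H$ used in Section 3 may define the weights differently.

```python
def weighted_multiscale_distance(feats1, feats2, weights):
    """Sketch of Equations (14)/(15), ASSUMED to be sum_h w_h * L1(level h).
    feats1/feats2 hold one feature vector per selected level h."""
    if not (len(feats1) == len(feats2) == len(weights)):
        raise ValueError("need one feature vector per level and one weight per level")
    total = 0.0
    for f1, f2, w in zip(feats1, feats2, weights):
        if not 0 < w <= 1:
            raise ValueError("weights must satisfy 0 < w_h <= 1")
        total += w * sum(abs(a - b) for a, b in zip(f1, f2))
    return total


# Two levels; the finer level gets a smaller (hypothetical) weight
print(weighted_multiscale_distance([[0, 0], [1, 1]], [[1, 0], [1, 3]], [1.0, 0.5]))  # 2.0
```

The same function serves MSCCD (spatial vectors per level) and FMSCCD (retained Fourier magnitudes per level); only the inputs differ.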
The difference between the MSCCD feature vectors of each pair of shapes in Figure 5 when $h = 3$ is shown in Figure 10, and the difference between their FMSCCD feature vectors when $h = 3$ is shown in Figure 11. The difference within each pair in Figures 10 and 11 is clearly larger than in Figures 6 and 7, respectively. This consistent improvement across all pairs supports the robustness of MSCCD and FMSCCD, and the experimental results on three databases in Section 3 provide further evidence of the robustness of FMSCCD.
FMSCCD (improved CCD) is generally used in combination with FASD (improved ASD), because CCD and ASD are complementary [18]. FMSCCD is also easy to combine with other features, because it is easy to implement and has a low computational cost in both feature extraction and matching.
Figure 6. The difference between the two CCD feature vectors of each pair of shapes in Figure 5. The subfigures are arranged in the same order as in Figure 5. The two curves of each pair are very similar and even overlap.
Figure 10. The difference between the MSCCD feature vectors of each pair of shapes in Figure 5 when h = 3. The difference within each pair is larger than in the corresponding subfigure of Figure 6.
Figure 11. The difference between the FMSCCD feature vectors of each pair of shapes in Figure 5 when h = 3. The difference within each pair is larger than in the corresponding subfigure of Figure 7.

Results
To evaluate the performance of FMSCCD, CCD [18], FD-CCD [11], DIR [17], ASD&CCD [18], FPD (farthest point distance) [24], and MDM [16] were used for comparison. Since FMSCCD is a shape descriptor, the evaluation experiments were conducted on the well-known shape databases MPEG-7 CE1 Part B, Swedish Plant Leaf, and Kimia 99, on which the performance of the other descriptors has been reported. All algorithms were implemented in MATLAB and run on a PC with an Intel Core i7 CPU and 16 GB of RAM under Windows 10. In all experiments, $N_p = 513$.
In the experiments, when FMSCCD was combined with another descriptor, the weighted distance given by Equation (16) was used as the dissimilarity between two shapes.
$$dis(s_1, s_2) = w_{fmc}\, dis_{fmc}(s_1, s_2) + (1 - w_{fmc})\, dis_{afs}(s_1, s_2), \quad 0 < w_{fmc} < 1 \qquad (16)$$
where $dis_{fmc}$ is the distance between two shapes in the FMSCCD feature space and $dis_{afs}$ is the distance between two shapes in another feature space (for example, FASD, DIR, or MDM).
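Reading Equation (16) as a convex combination (weight $w_{fmc}$ on the FMSCCD distance and $1 - w_{fmc}$ on the other descriptor's distance, which is an assumption about the garbled original), combining two descriptors and ranking a database for one query could look like this sketch; `combined_distance` and `rank_by_combined` are hypothetical names.

```python
def combined_distance(dis_fmc, dis_afs, w_fmc):
    """Equation (16), ASSUMING the second weight is (1 - w_fmc):
    a convex combination of the FMSCCD distance and another descriptor's."""
    if not 0 < w_fmc < 1:
        raise ValueError("w_fmc must satisfy 0 < w_fmc < 1")
    return w_fmc * dis_fmc + (1 - w_fmc) * dis_afs


def rank_by_combined(dis_fmc_row, dis_afs_row, w_fmc):
    """Indices of database shapes sorted by combined distance to one query."""
    scores = [combined_distance(a, b, w_fmc) for a, b in zip(dis_fmc_row, dis_afs_row)]
    return sorted(range(len(scores)), key=scores.__getitem__)


print(rank_by_combined([0.5, 0.1, 0.3], [0.5, 0.2, 0.1], 0.5))  # [1, 2, 0]
```

Because the two distances may lie on different numeric ranges, one might rescale them before combining; the text does not say whether this is done.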

On MPEG-7 CE1 Part B
MPEG-7 CE1 Part B is a common shape database that has been used to evaluate many shape descriptors [8,10,11,16-18]. It contains 70 classes of 20 shapes each, for a total of 1400 shapes. Some examples from the database are shown in Figure 12.
The "bulls-eye test" is a commonly used evaluation method on this database [8,10,11,16-18] and measures the retrieval performance of a descriptor. Each shape in the database is used as a query in turn, and for each query the number of correct hits (retrieved shapes that belong to the same class as the query) among the 40 most similar shapes is counted. The total number of correct hits divided by 28,000 (the maximum possible number of correct hits, 1400 × 20 = 28,000) is the bulls-eye-test score.
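The bulls-eye score can be sketched from a full pairwise distance matrix as follows. The query itself is counted among its own nearest shapes, which is consistent with the maximum of 20 hits per query stated above; `bulls_eye_score` is a hypothetical name, and ties are broken by database index.

```python
from collections import Counter


def bulls_eye_score(dist, labels, top=40):
    """Bulls-eye test: each shape is a query; count same-class shapes among its
    `top` nearest shapes (query included) and divide by the maximum possible,
    which is min(class size, top) per query (20 per query on MPEG-7)."""
    n = len(labels)
    class_size = Counter(labels)
    hits = max_hits = 0
    for q in range(n):
        ranked = sorted(range(n), key=lambda j: dist[q][j])  # stable for ties
        hits += sum(1 for j in ranked[:top] if labels[j] == labels[q])
        max_hits += min(class_size[labels[q]], top)
    return hits / max_hits


# Toy check with 1-D points; top is reduced to 2 for this tiny set
pts = [0.0, 1.0, 10.0, 11.0]
labs = ["a", "a", "b", "b"]
dmat = [[abs(u - v) for v in pts] for u in pts]
print(bulls_eye_score(dmat, labs, top=2))  # 1.0
```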
Matching time is the time taken to calculate the dissimilarity between the feature of a query and the features of all shapes in the database, and it is used to evaluate the efficiency of a descriptor. Table 1 shows the bulls-eye-test scores of FMSCCD as $e_w$ and $H$ vary. When $e_w = 5$ and $H = 6$, 7, or 8, FMSCCD obtains its highest score of 75.73%. Table 2 shows the scores of FMSCCD+FASD as $w_{fms}$ varies, with $H = 6$ and $e_w = 5$. When $w_{fms} = 4/6$, FMSCCD+FASD obtains its highest score of 78.18%. In the remaining experiments, $H = 6$, $e_w = 5$, and $w_{fms} = 4/6$ are used without further tuning, to show the robustness of the proposed method.

On Swedish Plant Leaf
Because plant leaf retrieval is a common application of shape descriptors, it is necessary to evaluate FMSCCD on this task. Swedish Plant Leaf is a database of plant leaf images containing 15 classes of 75 shapes each, for a total of 1125 shapes. Some shapes from the database are shown in Figure 13.
Each shape in the database is used as a query in turn, and similar shapes are retrieved from the database. In the retrieval results, precision is calculated at the points where 10 (recall 13.3%), 20 (recall 26.7%), 30 (recall 40.0%), 40 (recall 53.3%), 50 (recall 66.7%), 60 (recall 80.0%), 70 (recall 93.3%), and 75 (recall 100%) shapes have been retrieved correctly [21]. The average precision is used to compare the proposed FMSCCD+FASD with several state-of-the-art methods. In terms of efficiency, the relative performance of each descriptor does not depend on the specific database, so matching time, which follows the same trend as on MPEG-7 CE1 Part B, is not measured again.
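The precision measure above can be sketched as follows: for each level $k$, precision is $k$ divided by the rank at which the $k$th correct shape appears, and recall is $k/75$. The averaging over queries and levels in `average_precision` is an assumption, since the text does not give the exact formula; all names here are hypothetical.

```python
def precision_at_levels(ranked_labels, query_label, levels):
    """Precision when the k-th relevant shape is retrieved, for each k in levels:
    k divided by the rank at which that k-th relevant shape appears."""
    ranks = [r for r, lab in enumerate(ranked_labels, start=1) if lab == query_label]
    if max(levels) > len(ranks):
        raise ValueError("fewer relevant shapes than the largest level")
    return [k / ranks[k - 1] for k in levels]


def average_precision(per_query_precisions):
    """ASSUMED averaging: mean over all queries and all recall levels."""
    values = [p for row in per_query_precisions for p in row]
    return sum(values) / len(values)


SWEDISH_LEVELS = (10, 20, 30, 40, 50, 60, 70, 75)  # recall = k / 75

print(precision_at_levels(["a", "b", "a", "a"], "a", [1, 2, 3]))  # [1.0, 0.6666666666666666, 0.75]
```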
In this experiment, DALR (deep autoencoder learning representation) [21], a descriptor based on an autoencoder, is also included for comparison. The results of several state-of-the-art descriptors and the proposed FMSCCD are shown in Table 4. FMSCCD+FASD (68.3%) performs best, ahead of DIR (67.6%), ASD&CCD (57.3%), MDM (54.6%), DALR (54.2%), and FD-CCD (49.0%). This also indicates that, in some scenarios, descriptors based on machine learning have no obvious advantage.

On Kimia 99
Kimia 99 is another common shape database [2] that many shape descriptors use as a test database. It contains 9 classes of 11 shapes each, for a total of 99 shapes, all of which are shown in Figure 14.
Each shape in the database is used as a query in turn, and similar shapes are retrieved from the remaining shapes. For each rank from the first to the tenth most similar shape, the number of correct hits over all queries is counted, and these totals are used to evaluate the descriptors. The results of several state-of-the-art descriptors and the proposed FMSCCD are shown in Table 5. The combination of FMSCCD with another descriptor always performs better than that descriptor alone, which suggests that FMSCCD is flexible and works well in combination with multiple descriptors.
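The Kimia 99 protocol described above can be sketched from a pairwise distance matrix: the query is excluded, and for each rank position the hits are summed over all queries, so each count is at most 99. `kimia_top_hits` is a hypothetical name, and ties are broken by database index.

```python
def kimia_top_hits(dist, labels, top=10):
    """For rank positions 1..top, count over all queries how often the shape
    at that rank (query excluded) shares the query's class."""
    n = len(labels)
    counts = [0] * top
    for q in range(n):
        others = sorted((j for j in range(n) if j != q), key=lambda j: dist[q][j])
        for pos, j in enumerate(others[:top]):
            if labels[j] == labels[q]:
                counts[pos] += 1
    return counts


# Toy check: two classes of two 1-D points, top reduced to 2
pts = [0.0, 1.0, 10.0, 11.0]
labs = ["a", "a", "b", "b"]
dmat = [[abs(u - v) for v in pts] for u in pts]
print(kimia_top_hits(dmat, labs, top=2))  # [4, 0]
```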

Discussion
FMSCCD is a shape feature that is very simple in principle and structure and easy to implement. Most importantly, it offers strong discriminability at a low computational cost in both feature extraction and matching, which benefits engineering applications. Because of this low cost, it can also be easily combined with other features. In the experiments on MPEG-7 CE1 Part B, FMSCCD combined with FASD performs better than the other descriptors. On the Swedish Plant Leaf database, FMSCCD+FASD achieves the best average precision. On the Kimia 99 database, combining FMSCCD with each of several shape features performed better than using that feature alone.

Conclusions
FMSCCD is a simple, efficient, and compatible frequency domain descriptor. Multiscale description and the Fourier transform are two useful tools for descriptors that do not rely on dynamic programming: with them, a frequency domain descriptor can maintain high discriminability at low computational cost. High discriminability combined with high efficiency is what object recognition in remote sensing images requires. The experiments also show that FMSCCD is suitable for plant leaf retrieval.
