1. Introduction
Because of data acquisition errors, map generalization, differing application purposes, and differences in human interpretation, the same real-world objects are represented inconsistently across geospatial datasets [1,2,3]. Fast and accurate matching of identical entities in multi-source spatial data is one of the key technologies for geospatial data integration, conflation, updating, and quality assessment [4,5,6,7,8]. Entity matching draws on many kinds of features, and matching methods can be divided into geometric matching, topological matching, and semantic matching. Small differences in topological relations between two datasets may lead to different matching results, while semantic matching relies heavily on the consistency and completeness of data properties [9]. As a result, geometric matching is the most widely used. Commonly used geometric matching features include location, size, orientation, and shape [10,11]. Compared with location, size, and orientation, shape depends less on positional accuracy and distinguishes objects more intuitively. Therefore, it is important to study methods of shape similarity measurement.
Image processing offers many methods of shape similarity measurement [12]. However, only some of them suit vector data, because vector polygons represent shape differently from images. Simply converting vector polygons into images introduces bias and requires additional computation. Scholars have therefore paid increasing attention to similarity measurement of areal entities in geographical vector data. Commonly used shape description methods for vector polygons include the tangent space shape description method [5,13], the multi-level chord length shape description function [14,15], the shape context descriptor [16], and the Fourier shape description method [17,18]. The tangent space method describes shape through the tangent values of vertex angles on polygon outlines. It supports not only shape matching between polygons but also matching between vertices on the polygons; however, it relies heavily on the starting point. The multi-level chord length method divides polygon contours by arc length. One dividing point and the vertices on the polygon contour compose a chord length function, and several chord length functions together form the multi-level chord length function that describes the shape. This method can measure overall or local similarity between shapes in geospatial data at different scales, but it is time-consuming. The shape context descriptor extracts a global feature for each point on the polygon boundary. It is often used to find corresponding points on two polygons, but it requires many feature points. In the Fourier shape description method, the distances from the polygon contour to the center of the polygon form a discrete sequence, and the Fourier shape descriptor is obtained by applying the fast Fourier transform to that sequence. This method is simple and widely used, but it is easily affected by noise.
Geographical vector data contain not only simple polygons but also complex polygons, such as holed polygons (polygons with holes) and multipolygons (composed of several disjoint polygons) [19]. The above methods are mainly applied to simple polygons, because they cannot describe relations between holes or between disjoint polygons. Xu et al. [18] proposed using Fourier descriptors to describe the exterior contour of holed polygons and a position graph to describe the structure of the holes; shape similarity between holed polygons is then measured by the Fourier descriptors and the position graph. Chen et al. [20] proposed a hierarchical model to measure shape similarity between complex holed-region entities. They divided a complex entity into three layers (a complex scene, a micro-spatial-scene, and a simple entity) and calculated the similarity of each layer. However, these two methods can only measure shape similarity between holed polygons and cannot be applied to simple polygons or multipolygons.
Areal entities have multiple representations in different geographical vector datasets: the same areal entity can be represented by a simple polygon, a holed polygon, or a multipolygon. Thus, a similarity measurement that applies to all of these polygon types is needed. Vector polygons are represented by points on their contours, so most shape description methods used in geographic information systems are contour-based. However, contour-based shape description may not be suitable for complex shapes that consist of several disjoint regions, such as multipolygons [21]. In traditional image processing, the moment description method is widely applied in image matching, retrieval, and identification [22]. This method extracts shape features that are invariant under translation, rotation, and scaling, and it describes global shape features of the whole area, which means it can describe all kinds of polygons [23]. Therefore, this region-based shape description method is more suitable for areal entities than contour-based methods. Although vector data represent shape differently from images, they can also express the shape features of entities. In this paper, we introduce the moment description method from image processing to extract moment invariants of areal entities in vector data. Considering inconsistencies caused by map generalization, we construct convex hull moment invariant curves to make the shape description more robust. Shape descriptors are then obtained by applying the fast Fourier transform to the curves to realize the similarity measurement of areal entities. Specifically, the contributions of this paper are summarized below.
We extend the calculation of geometric moments from images to vector polygons.
Based on convex hull and moment invariants, we construct convex hull moment invariant curves to describe shapes and extract shape descriptors from the curves.
We validate our method by experiments of invariance, similarity, and matching. Experimental results show that our shape descriptor is invariant to translation, rotation, and scaling. In addition, it takes advantage of moment invariants and convex hull and can be used for vector data matching.
The rest of this paper is organized as follows. Section 2 introduces multiple representations of areal entities in geographical vector data. Section 3 introduces the moment-based method of describing shape and discusses the calculation of moments in geographical vector data. In Section 4, we propose a shape similarity measurement model that considers map representation differences in geospatial datasets. Next, we apply the model to different experiments, then discuss and analyze the results in Section 5. Finally, we conclude the study in Section 6.
2. Areal Entities in Geographical Vector Data
In geographical vector data, polygons are usually used to represent areal entities. However, areal entities in the real world are complex in most cases. For example, the territory of the United States includes several disjoint regions: the mainland, Alaska, and the Hawaiian Islands. The territory of South Africa has a hole because it completely contains Lesotho. Thus, simple polygons cannot fully describe complex areal entities [24]. Clementini et al. [25] divided polygons in geospatial data into simple polygons, holed polygons, and multipolygons, and gave detailed mathematical definitions of the three types. Based on their definitions, we define the polygons used in this paper as follows.
Definition 1. Simple polygon: A simple polygon is a counterclockwise closed point set with connected interior and connected exterior. It can be expressed as
Definition 2. Holed polygon: A holed polygon contains an outer contour and one or more inner contours. The outer contour is represented by a counterclockwise closed point set, and the contours of internal holes are represented by clockwise closed point sets. Thus, the holed polygon is expressed as, where O is the outer contour and I is the inner contour.
Definition 3. Multipolygon: A multipolygon is composed of more than two disjoint regions, each of which consists of a simple polygon or a holed polygon, that is, M is a multipolygon, S is a simple polygon, and H is a holed polygon.
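The three definitions map naturally onto a small data structure. The following Python sketch is illustrative only: the names `Polygon`, `MultiPolygon`, and `signed_area` are hypothetical and do not come from the paper. It assumes rings are stored as vertex lists that close implicitly, and it enforces the orientation convention of Definitions 1 and 2 (counterclockwise outer contour, clockwise holes) using the sign of the shoelace area.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]
Ring = List[Point]  # closed implicitly: the last vertex connects back to the first


def signed_area(ring: Ring) -> float:
    """Shoelace formula: positive for counterclockwise rings, negative for clockwise."""
    total = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        total += x0 * y1 - x1 * y0
    return total / 2.0


@dataclass
class Polygon:
    """A simple polygon (no holes) or a holed polygon (one or more holes)."""
    outer: Ring
    holes: List[Ring] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the convention: outer contour counterclockwise, holes clockwise.
        if signed_area(self.outer) < 0:
            self.outer = self.outer[::-1]
        self.holes = [h[::-1] if signed_area(h) > 0 else h for h in self.holes]

    def area(self) -> float:
        # Clockwise holes have negative signed area, so a plain sum subtracts them.
        return signed_area(self.outer) + sum(signed_area(h) for h in self.holes)


@dataclass
class MultiPolygon:
    """Several disjoint regions, each a simple or holed polygon."""
    parts: List[Polygon]

    def area(self) -> float:
        return sum(p.area() for p in self.parts)
```

With this convention, hole contributions are negative, so the area of a holed polygon or a multipolygon is a plain sum over its rings. The same property is what allows contour-based moment formulas to be summed ring by ring.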
The same areal entity can be represented by different types of polygons (Figure 1). Thus, shape similarity between areal entities can be divided into six cases: between two simple polygons, between a simple polygon and a holed polygon, between a simple polygon and a multipolygon, between two holed polygons, between a holed polygon and a multipolygon, and between two multipolygons. Contour-based shape descriptors can only describe simple polygons, because they cannot describe holes in holed polygons or disjoint regions in multipolygons. Region-based shape descriptors, however, can describe simple polygons, holed polygons, and multipolygons, and therefore cover all six cases. Thus, region-based shape descriptors are more suitable for areal entities.
Affected by map generalization, shapes of the same areal entity may differ in multiscale data, while their convex hulls change only slightly (Figure 2). In addition, convex hulls combine disjoint regions into a single region, which is convenient for comparing multipolygons with simple polygons or holed polygons. The convex hull is also a basic method of multi-scale representation of vector data [26]. Thus, we can use the convex hull to extract boundary features of vector polygons. On the other hand, the convex hull can only describe the rough shape of a polygon, and areal entities with totally different shapes may have similar convex hulls (Figure 3). It is therefore necessary to find a shape similarity measurement that combines a region-based shape descriptor with the convex hull, so that the same entities receive a higher similarity and different entities a lower one. Region-based shape description has been investigated extensively [27,28,29], and Hu moment invariants [30] are a simple and effective region-based method. They can be calculated from the contour of a shape [31], which suits vector polygons. In addition, they extract global features of different types of polygons and can be easily combined with the convex hull. By combining Hu moment invariants with the convex hull, we can measure shape similarity between areal entities.
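The two properties of convex hulls noted above (disjoint regions merge into one hull, and different shapes can share a hull) can be illustrated with a short Python sketch. It uses Andrew's monotone chain algorithm, which is one common choice and not necessarily the hull algorithm used in the paper; the function names and test shapes are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Two disjoint squares standing in for the parts of a multipolygon.
part_a = [(0, 0), (1, 0), (1, 1), (0, 1)]
part_b = [(3, 0), (4, 0), (4, 1), (3, 1)]
hull = convex_hull(part_a + part_b)

# A concave simple polygon with a deep notch yields the same hull.
notched = [(0, 0), (4, 0), (4, 1), (2, 0.2), (0, 1)]
notched_hull = convex_hull(notched)
```

Both inputs produce the same rectangular hull, which is why the hull alone cannot separate them and a region-based descriptor is needed alongside it.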
4. Shape Description Model for Areal Entities
Moment invariants of polygons describe features of the entire shape that are invariant to translation, rotation, and scaling. However, they are easily affected when the same entity is described at different scales, because map generalization alters its outline from one scale to another. To obtain shape features that are robust under map generalization, we introduce the convex hulls of polygons to construct the moment invariant feature description model.
The convex hull of a set of points is the smallest convex polygon that contains the points. It is a basic structure that describes the shape of a spatial object, and it usually changes little in the process of map generalization. Therefore, it has a wide range of applications in map generalization [26,32]. In this paper, we use the local moment invariant proposed by Zhao et al. [33] so that the convex hull vertices participate in the calculation of a polygon's moment invariants. Thus, we obtain shape features that combine the convex hull and moment invariants and that are stable under map generalization.
4.1. Local Moment Invariants of Complex Polygons
According to the definition of central moments, we can obtain moments that are invariant to translation by moving the origin of coordinates to the centroid of a polygon (Figure 4). In the same way, we can move the origin of coordinates to another point $P(x_i, y_i)$ to obtain translation-invariant moments. The point $P$ should lie on the boundary of the polygon or on the boundary of its convex hull. Zhao et al. [33] define the moment calculated in this way as the local moment and the point $P$ as the reference point of the local moment. The $(p + q)$th order local moment of a polygon relative to point $P$ can be expressed as follows:
We can prove that the local moment is also invariant to translation. If $(x, y)$ belongs to the polygon and $(x', y')$ is the corresponding point of the polygon after a translation of coordinates, then the relation between $(x, y)$ and $(x', y')$ is:
The reference point $P$ also satisfies Equation (9). Thus, the $(p + q)$th order local moment of the polygon after translation is given by Equation (10):
Equation (10) proves the translation invariance of local moments. As with the Hu moment invariants of polygons, we can obtain seven local moment invariants by normalizing and combining the low-order local moments. They are likewise invariant under translation, rotation, and scaling.
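The translation argument can be sketched in a few lines. The notation here is assumed for illustration and may differ from the paper's own equations: $M_{pq}^{P}$ denotes the local moment about $P(x_i, y_i)$, $D$ the polygon region, and $(\Delta x, \Delta y)$ the translation vector. Because the polygon and its reference point shift together, the differences inside the integrand are unchanged.

```latex
\begin{aligned}
M_{pq}^{P} &= \iint_{D} (x - x_i)^{p}\,(y - y_i)^{q}\,\mathrm{d}x\,\mathrm{d}y, \\
x' &= x + \Delta x,\quad y' = y + \Delta y,\quad
x_i' = x_i + \Delta x,\quad y_i' = y_i + \Delta y, \\
M_{pq}^{P'} &= \iint_{D'} (x' - x_i')^{p}\,(y' - y_i')^{q}\,\mathrm{d}x'\,\mathrm{d}y'
             = \iint_{D} (x - x_i)^{p}\,(y - y_i)^{q}\,\mathrm{d}x\,\mathrm{d}y
             = M_{pq}^{P}.
\end{aligned}
```

The last step uses $x' - x_i' = x - x_i$, $y' - y_i' = y - y_i$, and the fact that a translation has unit Jacobian.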
According to the definition, we can calculate the local moments relative to arbitrary points from the geometric moments. The binomial decomposition of the integrand $(x - x_i)^p (y - y_i)^q$ in Equation (8) is as follows:
The relationship between local moment and geometric moment can be obtained by substituting Equations (7) and (11) into Equation (8) respectively. Then, relationships between low order local moments and geometric moments can be easily deduced:
Once the geometric moments are known, Equation (12) lets us quickly calculate the local moment invariants of a polygon relative to arbitrary points.
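As a numerical illustration of this relationship, the Python sketch below computes geometric moments up to second order from oriented contours and then derives local moments about a reference point by binomial expansion, for example $M_{20} = m_{20} - 2 x_i m_{10} + x_i^2 m_{00}$ in the notation assumed here. The function names are hypothetical. The contour formulas are the standard Green's theorem expressions for polygons, assumed here to be equivalent to the paper's moment calculation rather than copied from it.

```python
from math import comb
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Ring = List[Point]
Moments = Dict[Tuple[int, int], float]


def geometric_moments(rings: List[Ring]) -> Moments:
    """Geometric moments m_pq (p + q <= 2) of a region bounded by oriented rings.

    Outer rings are counterclockwise and holes clockwise, so the signed
    contribution of every ring can simply be summed.
    """
    m: Moments = {(0, 0): 0.0, (1, 0): 0.0, (0, 1): 0.0,
                  (2, 0): 0.0, (1, 1): 0.0, (0, 2): 0.0}
    for ring in rings:
        for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
            a = x0 * y1 - x1 * y0  # twice the signed area of triangle (origin, v0, v1)
            m[0, 0] += a / 2
            m[1, 0] += a * (x0 + x1) / 6
            m[0, 1] += a * (y0 + y1) / 6
            m[2, 0] += a * (x0 * x0 + x0 * x1 + x1 * x1) / 12
            m[0, 2] += a * (y0 * y0 + y0 * y1 + y1 * y1) / 12
            m[1, 1] += a * (2 * x0 * y0 + x0 * y1 + x1 * y0 + 2 * x1 * y1) / 24
    return m


def local_moments(m: Moments, ref: Point) -> Moments:
    """Local moments about `ref`, from the binomial expansion of (x - xi)^p (y - yi)^q."""
    xi, yi = ref
    out: Moments = {}
    for (p, q) in m:
        out[p, q] = sum(
            comb(p, k) * comb(q, l) * (-xi) ** (p - k) * (-yi) ** (q - l) * m[k, l]
            for k in range(p + 1)
            for l in range(q + 1)
        )
    return out
```

Because the geometric moments are computed once, local moments about many reference points cost only a handful of multiplications each. The result can be checked against the direct route of translating the vertices so that the reference point becomes the origin and recomputing the geometric moments.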
4.2. Convex Hull Moment Invariant Curves
We can take every convex hull vertex as the reference point of local moment invariants. Each local moment invariant then captures a global feature of the polygon, and the local moment invariants referenced to all convex hull vertices together represent the structure of the polygon's convex hull.
If $X$ is the convex hull of polygon $C$, it can be represented as a set of vertices. Taking one convex hull vertex as the reference point, we can calculate seven local moment invariants. If we take each vertex in $X$ as the reference point in turn, we obtain seven sequences of local moment invariants. Each sequence forms a curve, which we call a convex hull moment invariant curve. Figure 5a shows the convex hull moment invariant curves of the polygon in Figure 4; every point on a curve is a local moment invariant. Figure 5b shows the centroid distance of the convex hull vertices, that is, the distance between the centroid of the polygon and each convex hull vertex. The shape of every convex hull moment invariant curve is similar to that of the centroid distance curve. In addition, the convex hull moment invariant curves are invariant to translation, rotation, and scaling, because every local moment invariant on them is.
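A minimal Python sketch of how one such curve could be built follows. Several simplifications are assumptions made for brevity: only the first invariant is computed, taken as $(M_{20} + M_{02}) / M_{00}^2$ by analogy with the first Hu invariant; the input is a single counterclockwise ring given as a list of coordinate tuples; the hull vertices are used directly without resampling; and all function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices in counterclockwise order."""
    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    chains: List[List[Point]] = []
    for seq in (pts, pts[::-1]):  # lower chain, then upper chain
        chain: List[Point] = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        chains.append(chain[:-1])
    return chains[0] + chains[1]


def second_order_moments(ring: List[Point], ref: Point) -> Tuple[float, float, float]:
    """m00, m20, m02 of a ring, taken about `ref` by shifting the origin there."""
    xr, yr = ref
    shifted = [(x - xr, y - yr) for x, y in ring]
    m00 = m20 = m02 = 0.0
    for (x0, y0), (x1, y1) in zip(shifted, shifted[1:] + shifted[:1]):
        a = x0 * y1 - x1 * y0
        m00 += a / 2
        m20 += a * (x0 * x0 + x0 * x1 + x1 * x1) / 12
        m02 += a * (y0 * y0 + y0 * y1 + y1 * y1) / 12
    return m00, m20, m02


def first_invariant_curve(ring: List[Point]) -> List[float]:
    """First local moment invariant evaluated at every convex hull vertex."""
    curve = []
    for vertex in convex_hull(ring):
        m00, m20, m02 = second_order_moments(ring, vertex)
        curve.append((m20 + m02) / (m00 * m00))
    return curve
```

Applying a rotation, uniform scaling, and translation to the ring leaves the set of curve values unchanged. Only the position at which the curve starts may differ, because the first hull vertex depends on the coordinates.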
For the same lakes at different scales (Figure 6), we obtain two series of convex hull moment invariant curves (Figure 7 and Figure 8). To make the curves more comparable, we resample the convex hull by arc length and take the logarithm of each local moment invariant. The two series of curves are similar overall but vary in detail, a characteristic that results from combining moment invariants with the convex hull and that can be used to measure the similarity between areal entities. However, the two series of curves are displaced relative to each other, because the starting points on the convex hulls differ. Moreover, comparing the curves qualitatively hardly yields a similarity value for the two lakes, so the similarity between areal entities must be measured quantitatively.
4.3. Feature Similarity Calculation
To eliminate the dependence on the starting point of the convex hull and to obtain shape descriptors, we apply the fast Fourier transform to extract features from the convex hull moment invariant curves. First, the convex hull of a polygon is resampled by arc length into $m$ vertices, where $m$ is a power of 2. Then, we calculate seven local moment invariant sequences by taking each resampled vertex as the reference point. Applying the fast Fourier transform to each sequence and selecting the top $k$ Fourier coefficients of each transformed sequence, we obtain seven feature vectors. The feature matrix composed of these seven feature vectors is the shape descriptor of the polygon.
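The resampling and transform steps can be sketched with NumPy as follows. Two points are assumptions rather than statements from the paper: the "top $k$" coefficients are taken to be the first $k$ (lowest-frequency) ones, and their magnitudes are used as features, since the magnitude of a discrete Fourier coefficient does not change under a circular shift of the sequence. The function names are hypothetical.

```python
import numpy as np


def resample_closed(vertices, m):
    """Return m points equally spaced by arc length along a closed polyline."""
    pts = np.asarray(vertices, dtype=float)
    closed = np.vstack([pts, pts[:1]])             # repeat first vertex to close the ring
    seg = np.hypot(*np.diff(closed, axis=0).T)     # length of every edge
    cum = np.concatenate([[0.0], np.cumsum(seg)])  # arc length at every vertex
    targets = np.linspace(0.0, cum[-1], m, endpoint=False)
    x = np.interp(targets, cum, closed[:, 0])
    y = np.interp(targets, cum, closed[:, 1])
    return np.column_stack([x, y])


def fourier_features(sequence, k):
    """Magnitudes of the first k FFT coefficients of a periodic sequence."""
    spectrum = np.fft.fft(np.asarray(sequence, dtype=float))
    return np.abs(spectrum)[:k]


def descriptor(invariant_sequences, k):
    """Stack one feature vector per invariant sequence into a feature matrix."""
    return np.vstack([fourier_features(seq, k) for seq in invariant_sequences])
```

With seven invariant sequences of length $m$, `descriptor` returns a $7 \times k$ matrix. Changing the starting vertex on the hull circularly shifts every sequence and therefore leaves this matrix unchanged under the magnitude assumption.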
Given the descriptor matrices of two polygons, we compare the elements in the $i$th row and $j$th column of each matrix and define a similarity between the matrices, which gives the similarity between the polygons (Equation (13)).
For the lakes in Figure 6, we resample the convex hull to 256 vertices and select the top seven coefficients of each sequence after the fast Fourier transform. Thus, we obtain two shape descriptors for the two polygons (Figure 9). For visual comparison, we map the matrix values to a color band: higher values are colored red and lower values green. In addition, we index each element in the matrix. For example, the element C1F1 is calculated from the first Hu moment invariant and is the first Fourier coefficient after the fast Fourier transform. The two shape descriptors are extremely similar. We then obtain the similarity between the two areal entities by Equation (13): the similarity between the two lakes in Figure 6 is 0.8.
6. Conclusions
Matching of areal entities is a key problem in geospatial data processing, and shape is one of the basic features used for it. Building on the idea of describing shape features by moment invariants, this paper proposes an approach that combines the convex hull with moment invariants. The method uses the convex hull vertices to calculate local moment invariants of a polygon and then constructs convex hull moment invariant curves to describe the shape of the polygon. By applying the fast Fourier transform, the shape descriptor of the polygon is extracted from the curves and used to measure shape similarity between different polygons. The experiments indicate that our shape descriptor is invariant under translation, rotation, and scaling. In addition, our shape similarity measurement takes full advantage of both moment invariants and the convex hull. It can distinguish different areal entities even when they are represented by different kinds of polygons and vary in shape. Moreover, the method is applicable to the matching of areal entities at multiple scales.
Although the method is believed to be useful, some issues remain for further investigation. The proposed method requires several parameters, and finding their optimal values means testing a range of values, which is time-consuming. In addition, multi-representation of areal entities is a complicated problem. Areal entities represented by different polygons are similar not only in shape but also in direction, position, size, semantics, and topological relationships. In future research, we will investigate a comprehensive similarity measurement model that combines geometric, topological, and semantic features of areal entities.