1. Introduction
The centroid is often treated as the point at which the mass of a body is concentrated, and is also called the center of mass or the center of gravity. For a body of homogeneous mass, the centroid coincides with the geometric center; it is therefore commonly accepted that the centroid coincides with the center of mass [1,2]. For example, the center of mass of a homogeneous sphere (a geometric solid) coincides with its exact geometric center.
The classic centroid, which relates to the concept of central tendency, mainly includes the following three types of averages [1,3]: (1) The mean centroid. This type of centroid can be subdivided into arithmetic mean, root mean square mean, harmonic mean, geometric mean, and minimum bounding rectangle. (2) The median centroid. The median centroid is the arithmetic mean of the middle pair of values ordered from smallest to largest. The mean and median centroids are easy to calculate but are not sensitive to the order of the vertices. (3) The mode centroid. This type of centroid is the value that occurs most frequently in a set of numbers. It is used in statistics to indicate the value of a substantial part of a dataset. As a measure of a central point of a geometric figure, the mode has little or no practical use.
In addition, there are other forms of centroids, for example, the minimum distance centroid, the negative buffer centroid, and weighted forms of the above methods. The minimum distance centroid has a connection with spatial analysis, where a gravity function may be used to model relationships between points or regions. The negative buffer centroid is not sensitive to large poorly conditioned portions of a polygon, but its computation is difficult to automate, particularly with highly irregular polygons. The weighted centroid reflects the center of the distribution within the region, whereas the other centroids describe only the absolute geometric center. The weighted centroid mainly includes the weighted mean centroid, the weighted median centroid, and so on [4,5,6,7].
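The weighted mean centroid mentioned above can be illustrated with a short Python sketch. It assumes NumPy; the function name `weighted_mean_centroid` and the sample points and weights are hypothetical and chosen only for illustration.

```python
import numpy as np

def weighted_mean_centroid(points, weights):
    """Weighted arithmetic mean of point coordinates."""
    pts = np.asarray(points, dtype=float)
    w = np.asarray(weights, dtype=float)
    if w.sum() == 0:
        raise ValueError("weights must not sum to zero")
    # Each coordinate is scaled by its weight, then normalised by the total weight.
    return (pts * w[:, None]).sum(axis=0) / w.sum()

# Corners of a unit square; the two right-hand corners are three times heavier.
corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
centroid = weighted_mean_centroid(corners, [1, 3, 3, 1])
print(centroid)  # x is pulled toward the heavier right-hand side
```

With equal weights the same function returns the ordinary mean centroid of the points.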
Since the concept of a centroid was proposed, it has been extensively used to determine the average location of a body or a set of objects in many fields, especially computational geometry, applied physics, and spatial information science [8,9,10,11,12,13,14]. City planners continually use centroids to discover fundamental relationships that help explain the structure of urban and metropolitan areas [15]. K-means clustering is a partitioning clustering technique for data mining in which clusters are formed with the help of centroids [16]. Some scholars in the communication field have modified the traditional centroid method for WLAN indoor positioning systems [17], and centroids are also used to evaluate the stability of objects, for example through the center of mass of vehicles [18,19]. In astronomy, the centroid is used to calculate the spatial position of celestial bodies, their state of motion, and their relationships with other bodies [20,21]. Eteje and Oluyori [13] presented the impact of different methods of centroid mean computation on the accuracy of orthometric height modeling using the geometric geoid method. Farmer et al. [22] proposed an alternative resilient centroid for natural resource management applications.
Some means are simple coordinate averages whilst others are more advanced. Different calculation methods have different application values. Moment centroids are considered the best measure of the centroid of a complex polygon. The coordinates of the weighted centroid can be calculated using the weighted arithmetic mean of the coordinates of the points. In recent years, owing to the rapid development of Geographic Information Systems (GIS) tools, several techniques have become available for rapid calculations, even for large spatial databases.
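To illustrate the difference between a simple coordinate average and a moment-based centroid, the following Python sketch computes the area (first-moment) centroid of a simple homogeneous polygon with the shoelace formula. It assumes that "moment centroid" refers to this area-weighted centroid; the function name and the L-shaped test polygon are hypothetical.

```python
def polygon_area_centroid(vertices):
    """Area (first-moment) centroid of a simple polygon via the shoelace formula."""
    n = len(vertices)
    twice_area = 0.0
    cx = cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon (zero area)")
    return cx / (3 * twice_area), cy / (3 * twice_area)

# L-shaped polygon made of three unit squares.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
area_centroid = polygon_area_centroid(l_shape)
vertex_mean = (sum(x for x, _ in l_shape) / len(l_shape),
               sum(y for _, y in l_shape) / len(l_shape))
print(area_centroid, vertex_mean)
```

For this polygon the vertex average and the area centroid differ, because the vertex average ignores how the area is distributed between the vertices.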
However, in nature, geographical spaces are heterogeneous and complex. If a boundary is set for such a heterogeneous space, the polygon defined is called a non-homogeneous polygon. Centroids calculated by GIS tools implicitly assume that the mass density of objects is homogeneous and hence cannot be accurate for non-homogeneous objects. Here, we propose a new method for calculating the centroids of non-homogeneous polygons by combining the suspension method in physics with the method of distance transformation in map algebra. The proposed method was applied to estimate the population gravity of Beijing. The resulting centers are then compared with the original administrative district centers.
2. Method
2.1. Suspension Line Theory
Based on suspension theory, the centroid of an object can be obtained as follows: Hang the object by a piece of string and obtain a vertical line from the point of suspension down across the object. Thereafter, remove the string, rotate the object a few degrees, hang it again, and repeat the procedure. The intersection of the dividing lines (or balance lines) defines a point called the object centroid.
For a stationary suspended object, there are only two forces: gravity and the tension in the rope. To maintain balance, the two forces must be equal in magnitude and opposite in direction. Gravity therefore acts along the line of the rope; in other words, the straight line of the rope passes through the center of gravity. For the two suspensions of the object, both rope lines pass through the center of gravity; therefore, their intersection point must be the center of gravity. This method was first proposed to calculate the center of gravity of an object. However, when it is used to obtain the centroid of a polygon or map, it is usually assumed that the polygon or map is homogeneous, and the result therefore does not correctly reflect the tendency for conditions to vary over space, i.e., spatial heterogeneity [23].
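The same reasoning can be written for a set of point masses. The notation below (masses $m_i$ at positions $\mathbf{r}_i$, a line through a point $\mathbf{a}$ with unit normal $\mathbf{n}$, and center of mass $\mathbf{c}$) is introduced here only for illustration and does not come from the original text.

```latex
\begin{align}
d_i &= \mathbf{n}\cdot(\mathbf{r}_i - \mathbf{a}), \qquad
\mathbf{c} = \frac{\sum_i m_i \mathbf{r}_i}{\sum_i m_i}, \\
\sum_i m_i d_i
  &= \mathbf{n}\cdot\Big(\sum_i m_i \mathbf{r}_i - \mathbf{a}\sum_i m_i\Big)
   = \Big(\sum_i m_i\Big)\,\mathbf{n}\cdot(\mathbf{c} - \mathbf{a}).
\end{align}
```

Here $d_i$ is the signed distance of mass $i$ from the line. The net moment about the line vanishes exactly when $\mathbf{n}\cdot(\mathbf{c}-\mathbf{a}) = 0$, that is, when the line passes through $\mathbf{c}$. Two non-parallel lines that each balance the masses must therefore intersect at the center of mass, whether or not the mass distribution is homogeneous.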
2.2. Method to Determine the Centroid
Here, we propose a new method to calculate the centroid for a nonhomogeneous polygon using the above suspension line theory. Assume that the calculated polygon is located in a real-number plane, P, which may be divided into two equal portions, P1 and P2, about a dividing line. If the plane were symmetrical about this line according to some objective value, it would balance if placed on a knife-edge along this line. If another line is also selected, which divides the plane equally in some value, then the intersection of the dividing lines (or balance lines) would define a point called the proposed centroid. The method is implemented in four steps:
Step 1: Select any two points on the boundary of the polygon as the source and destination points, and the two-point connecting line is considered as the candidate centroid balance line.
Step 2: The candidate centroid balance line is taken as the center line, and the real-number plane is transformed to the Euclidean distance plane. Calculate the sum of gravity moments for the left and right sides of the candidate centroid balance line. The gravity moment of a cell can take the following form:
$$G = M \times g$$
where G is the gravity moment, M is the cell value in the Euclidean plane, and g is the Euclidean distance between the cell point and the candidate centroid balance line. If the polygon is already located within a Euclidean distance plane, the plane conversion in this step can be skipped.
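As a rough illustration of Step 2, the following Python sketch sums cell value times perpendicular distance on each side of a candidate line. It assumes NumPy, treats cell indices (column, row) as x and y coordinates, and computes the distance to the line analytically rather than through a raster distance transformation. The function name `side_moments` is hypothetical, and "left" and "right" are only labels for the two sides.

```python
import numpy as np

def side_moments(values, a, b):
    """Sum of (cell value * distance to line a-b) on each side of the line.

    values: 2D array of cell values (0 for cells outside the polygon).
    a, b:   (x, y) points defining the candidate balance line.
    """
    values = np.asarray(values, dtype=float)
    rows, cols = np.indices(values.shape)
    (ax, ay), (bx, by) = a, b
    length = np.hypot(bx - ax, by - ay)
    # Signed perpendicular distance of every cell from the line.
    signed = ((bx - ax) * (rows - ay) - (by - ay) * (cols - ax)) / length
    left = (values * signed)[signed > 0].sum()
    right = (values * -signed)[signed < 0].sum()
    return left, right

grid = np.ones((3, 4))  # homogeneous 3 x 4 raster
balanced = side_moments(grid, (1.5, 0.0), (1.5, 2.0))  # line through the middle
unbalanced = side_moments(grid, (1.0, 0.0), (1.0, 2.0))  # line shifted to one side
print(balanced, unbalanced)
```

A line through the middle of a homogeneous raster gives equal moments on both sides, whereas a shifted line gives a larger moment on the side that contains more cells.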
Step 3: Calculate the difference between the gravity moments on the two sides and test whether it is smaller than the set precision value, as follows:
$$\mathrm{abs}(G_1 - G_2) < \varepsilon$$
where abs is the absolute value function, $G_1$ and $G_2$ are the gravity moments on the two sides of the candidate line, and $\varepsilon$ is the set precision value.
If the difference is smaller than the precision value, the current candidate centroid balance line is the final centroid balance line. Conversely, if the difference is larger than the precision value, the destination point is moved along the boundary toward the side with the larger moment. The distance between the new and previous destination points is a fixed length (called one step length). Step 3 is repeated until the difference is smaller than the precision value.
Step 4: Similarly, calculate another centroid balance line for this polygon. The intersection point of the two centroid balance lines is the final centroid of the polygon.
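The four steps can be sketched in Python as follows. This is a minimal illustration under several assumptions, not the ArcPy implementation used in the paper: cell indices (column, row) serve as coordinates, distances to the line are computed analytically, the destination point slides along a straight edge given by `direction`, and the step is halved whenever the moment difference changes sign so that the loop terminates. All function names are hypothetical.

```python
import numpy as np

def moment_difference(values, a, b):
    """Moment on one side minus moment on the other side of the line a -> b."""
    rows, cols = np.indices(values.shape)
    (ax, ay), (bx, by) = a, b
    signed = ((bx - ax) * (rows - ay) - (by - ay) * (cols - ax)) / np.hypot(bx - ax, by - ay)
    return float((values * signed).sum())

def balance_line(values, source, dest, direction, step=1.0, eps=1e-6, max_iter=10_000):
    """Steps 1-3: slide `dest` along `direction` until both sides balance."""
    source = np.asarray(source, dtype=float)
    dest = np.asarray(dest, dtype=float)
    direction = np.asarray(direction, dtype=float)
    for _ in range(max_iter):
        diff = moment_difference(values, source, dest)
        if abs(diff) < eps:
            break
        d = dest - source
        normal = np.array([-d[1], d[0]])  # points to the positive-moment side
        toward_heavy = np.sign(diff) * np.sign(direction @ normal)
        new_dest = dest + toward_heavy * step * direction
        if np.sign(moment_difference(values, source, new_dest)) != np.sign(diff):
            step /= 2.0  # overshoot: refine the step (an assumption of this sketch)
        dest = new_dest
    return source, dest

def intersect(p1, p2, p3, p4):
    """Step 4: intersection of line p1-p2 with line p3-p4 (assumed non-parallel)."""
    d1, d2 = p2 - p1, p4 - p3
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    t = ((p3[0] - p1[0]) * d2[1] - (p3[1] - p1[1]) * d2[0]) / denom
    return p1 + t * d1

# Non-homogeneous 6 x 8 raster: the three eastern columns are four times heavier.
values = np.ones((6, 8))
values[:, 5:] = 4.0
line1 = balance_line(values, (4.0, 0.0), (4.0, 5.0), direction=(1.0, 0.0))
line2 = balance_line(values, (0.0, 1.0), (7.0, 1.0), direction=(0.0, 1.0))
centroid = intersect(*line1, *line2)
print(centroid)
```

In this sketch the intersection lies east of the geometric center of the raster, because the heavier cells pull both balance lines toward them.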
3. Centroid-Computing Procedures for Different Types of Polygons
There are many different methods of classifying polygons, according to different criteria. In general, all polygons can be divided into convex and concave polygons [24,25]. However, there are some irregular polygons, for example, a polygon with holes and a self-intersecting polygon. A self-intersecting polygon is one type of complex polygon (not a simple polygon) and is often considered a topology error in GIS. Hence, this type of polygon will not be discussed in this paper.
Thus, to demonstrate the variation between different types of polygons while using the proposed method, four types of polygons were used as examples, as shown in Figure 1: a homogeneous convex polygon P, a homogeneous concave polygon Q, a homogeneous convex polygon with a hole R, and a non-homogeneous polygon S (consisting of four small polygons A, B, C, and D). The centroids of these polygons were calculated by using the ArcPy package in ArcGIS 10.6.
3.1. Centroid of Homogeneous Convex and Concave Polygons
Taking the simple homogeneous polygon P as the first study object, the main steps of the centroid-computing process are shown in Figure 2. Figure 2a shows the original polygon and the first candidate centroid balance line, Line AB, whilst Figure 2b shows the result of the Euclidean distance transformation of Figure 2a. After calculating the difference in the gravity moments on both sides, it was found not to be smaller than the precision value. Thereafter, the subsequent updated lines and the corresponding results of the Euclidean distance transformation were calculated, with the results shown in Figure 2c–f. After comparison, Line AB4 was determined to be the first final centroid balance line. Figure 2g shows the main steps of the computing process of the second candidate centroid balance line, with Line CD4 representing the second final centroid balance line. The intersection point C of the two centroid balance lines is the final centroid of the polygon.
For the concave polygon Q, the procedure for computing the candidate centroid balance lines is the same as for the convex polygon P (Figure 3) and is not discussed in detail here.
3.2. Centroid of Homogeneous Polygons with a Hole
One type of irregular polygon that is commonly encountered when calculating centroids is a polygon with one or more holes. When performing the distance transformation, the influence of the hole(s) must be excluded from the polygon. Taking polygon R as the study object, the first and second centroid balance lines of polygon R are shown in Figure 4a,b, respectively, and the results of the centroid balance line of the polygon are shown in Figure 4c,d. For comparison, the centroid of polygon R with no such hole was also calculated. The centroid of polygon R is point p, and the centroid of the polygon without the hole is point q, as shown in Figure 4e. It is apparent that point q tended to be located at the center of the polygon, whereas point p was shifted to the upper left. This is because the hole was in the bottom right of the polygon, which weakened the influence of this part on the entire polygon. This shift is what would be expected physically.
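The effect of a hole can be reproduced with a small raster example in Python. The sketch below assumes NumPy and represents the hole simply by setting the corresponding cells to zero, so that they contribute no moment; the grid size, the hole position, and the function name `raster_centroid` are hypothetical. Row indices increase downward, so "bottom right" corresponds to large row and column indices.

```python
import numpy as np

def raster_centroid(values):
    """Value-weighted mean of cell coordinates, returned as (x, y) = (col, row)."""
    values = np.asarray(values, dtype=float)
    rows, cols = np.indices(values.shape)
    total = values.sum()
    return np.array([(values * cols).sum() / total, (values * rows).sum() / total])

solid = np.ones((10, 10))       # homogeneous polygon without a hole
holed = solid.copy()
holed[6:9, 6:9] = 0.0           # 3 x 3 hole in the bottom right: cells excluded

without_hole = raster_centroid(solid)
with_hole = raster_centroid(holed)
print(without_hole, with_hole)
```

The centroid of the holed raster moves toward smaller row and column indices, that is, away from the hole and toward the upper left.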
3.3. Centroid of Non-Homogeneous Polygons
When calculating the centroid of a non-homogeneous polygon, the proposed method considers not only the distance parameter but also the attribute values of the grid cells inside. Consider the non-homogeneous polygon S as the study object. To obtain grid values, the non-homogeneous polygon was first converted to a raster dataset according to the field values, as shown in Figure 5a. The numbers of grid cells in the four small polygons were 1135, 32,383, 4263, and 13,515, respectively (see Figure 5b).
Figure 5c,d show the computing results of the first and second final centroid balance lines, and in Figure 5e, point q is the final centroid of the entire polygon using the proposed method. Point p is the centroid of the corresponding homogeneous polygon with the same boundary as polygon S, and it is noted that point q is east of point p. This is because most of polygon B and polygon D, which has the largest value, are located in the east. The result is therefore consistent with the distribution of values inside the polygon.
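The eastward shift can be illustrated with a toy zone raster in Python. The sketch assumes NumPy; the 4 x 4 layout of the four zones and their field values are hypothetical and are not the values of polygon S.

```python
import numpy as np

# Zone labels 0..3 stand for four sub-polygons; zones 1 and 3 lie in the east.
zones = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [2, 2, 3, 3],
                  [2, 2, 3, 3]])
zone_value = np.array([1.0, 5.0, 1.0, 9.0])  # hypothetical field values per zone
values = zone_value[zones]                   # rasterised field values

rows, cols = np.indices(zones.shape)

def centroid(weights):
    """Weighted mean of cell coordinates as (x, y) = (col, row)."""
    return np.array([(weights * cols).sum(), (weights * rows).sum()]) / weights.sum()

homogeneous = centroid(np.ones_like(values))  # same boundary, equal cell values
non_homogeneous = centroid(values)
print(homogeneous, non_homogeneous)
```

Because the eastern zones carry the larger values, the non-homogeneous centroid has a larger x coordinate than the homogeneous one.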
3.4. Parameter Analysis
As mentioned above, the selection of the original points, the step length, and the precision value are the three most important parameters when using the proposed method, and their settings vary according to the specific application instance. Here, the simple homogeneous polygon P of Figure 1 was taken as an example to explain the effect of the three parameters on the calculated centroid results and to suggest some tips for applying them.
3.4.1. Selection of the Original Point
For polygon P, five different centroid balance lines were selected for comparison, as shown in Figure 6a–e. Their starting points were set at five different locations, whereas the step length and precision values were set to be the same. Figure 6f shows the overlapping results of the five centroid balance lines. It is apparent that the lines did not intersect at a single point, as one would expect them to. However, it was determined that this phenomenon does not occur if the starting and ending points are all set at positions separated by an integral multiple of the step length. As shown in Figure 7, starting from point O, the lengths of OA, OB, OC, and OD are integral multiples of one step length. If point O is selected as the starting point, and the ending point is kept an integer multiple of the step length away from the starting point at every update, such as at points A, B, C, and D, then the intersection points of the different candidate centroid balance lines gather at one unique point.
3.4.2. Precision Value
The smaller the precision value, the smaller the gravity moment difference between the two sides, and the more accurate the calculation result. In contrast, a larger precision value gives a less accurate result. However, in terms of computational efficiency, a smaller precision value inevitably requires a longer running time, which makes it difficult to determine a single optimal precision value. Therefore, a suitable precision value should be set for each specific application to balance the accuracy requirement against time consumption.
3.4.3. Step Length
In the process of calculating the centroid balance line, the location of the ending point is continually updated according to the step length. Determining an appropriate step length, so that accuracy and computational efficiency are both kept acceptable, is therefore also a key step in calculating the centroid. In Figure 6, if the ending point is selected between AC or CB, there must be other ending points that meet the conditions. However, it is difficult to identify such a point.
As with the precision value, the smaller the step length, the more accurate the calculation result. To take both accuracy and efficiency into account, the step length can be set dynamically. A larger step can be used first to obtain a roughly accurate centroid balance line. Thereafter, based on this line, the step is reduced, the balance line is calculated again to obtain a more accurate result, and the procedure is repeated until the minimum acceptable conditions are met.
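The benefit of a dynamic step length can be sketched in Python with a one-dimensional stand-in for the search. Here `f` plays the role of the moment difference as a function of the ending-point position and is assumed to increase monotonically and change sign once; the function names, the target value, and the step sequence are hypothetical.

```python
def walk_to_sign_change(f, x, step):
    """Walk from x in fixed steps until f changes sign.

    Returns the last position before the sign change and the number of
    evaluations of f. Assumes f is increasing in x.
    """
    fx = f(x)
    evals = 1
    direction = -1.0 if fx > 0 else 1.0
    while True:
        nxt = x + direction * step
        evals += 1
        if f(nxt) * fx <= 0:
            return x, evals
        x = nxt

def coarse_to_fine(f, x, steps):
    """Repeat the walk with successively smaller step lengths."""
    total = 0
    for step in steps:
        x, evals = walk_to_sign_change(f, x, step)
        total += evals
    return x, total

target = 3.14159                      # position where the two sides balance
f = lambda x: x - target              # stand-in for the moment difference

x_fixed, n_fixed = walk_to_sign_change(f, 0.0, 0.001)
x_multi, n_multi = coarse_to_fine(f, 0.0, [1.0, 0.1, 0.01, 0.001])
print(n_fixed, n_multi)
```

Both searches end within one fine step of the balance position, but the coarse-to-fine search needs far fewer evaluations, which matters when each evaluation involves a distance transformation over the whole raster.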
4. Gravity of Population
To further explain how the proposed method can be applied in practice, it was used to calculate the population gravity of Beijing by calculating the centroid of the population. In the field of geography, population gravity is closely related to the spatial distribution of populations. Generally, the population density of a city decreases with distance from the city center. To evaluate the proposed method, the resulting centroids were compared with the administrative centers (called Administrative Center), weighted median centroids (called Weighted Centroid), and mean centroids (homogeneous, called Mean Centroid).
Beijing, the capital of China, is a municipality located in North China at the northern tip of the North China Plain. It consists of six city districts (Dongcheng, Xicheng, Chaoyang, Haidian, Fengtai, and Shijingshan) and 10 suburban districts (Shunyi, Tongzhou, Daxing, Fangshan, Mentougou, Changping, Pinggu, Miyun, Huairou, and Yanqing). It covers a total area of 16,807.8 km², with a population of 21.14 million measured in 2015. In this study, the population dataset was provided by the Data Center for Resources and Environmental Sciences, Chinese Academy of Sciences (RESDC), aggregated in 1000 m × 1000 m cells regarding the terrain and population density, as shown in Figure 8. There are 9838 population grid points, according to recent statistics.
Using the proposed method, the centroids of the 16 administrative districts were calculated, and the parameters were set according to Section 3.4. We set the precision value to 0: if the difference was greater than 0, the end point coordinate was adjusted to find the true balance line; otherwise, the current line was taken as the balance line. For comparison, the locations of the Mean Centroid, Weighted Centroid, and Administrative Center were also calculated in each region. A spatial distribution diagram is shown in Figure 8, where the green triangle represents the Mean Centroid, the yellow circle represents the Administrative Center of the district, the black circle represents the Weighted Centroid of the district, and the blue circle represents the centroid of the district obtained using the proposed method (called the New Centroid).
It should be noted that most of the centroids calculated by the proposed method tended to be located near their own administrative centers, which are generally located in densely populated areas to serve as many people as possible. Current administrative centers are mostly located at the population density centers of many years ago. However, their locations, which rarely change over many years for various reasons, can differ considerably from the real current population center. As shown in Figure 9, the Weighted Centroids are closer to the centroids of the new method than the Mean Centroids and Administrative Centers are. This may be because both the Weighted Centroid and the new method take into account point coordinates and point values.
Table 1 shows a comparison of the statistics of the distance from each grid point to the New Centroid (NC), Administrative Center (AC), and Weighted Centroid (WC). As shown in Table 1, the average distance from the New Centroid to each grid point is smaller than that from the Administrative Center, by between 100 m and approximately 3 km. For the distance to the New Centroid, the sum ranges from 13.61 to 31,619.4 km with a mean of 10,664.14 km, the maximum ranges from 6.04 to 76.46 km with a mean of 33.21 km, and the STD.E ranges from 1.34 to 22.15 with a mean of 7.88. Compared with the distance to NC, the mean sum, mean maximum, and mean STD.E of the distance to WC are generally larger, and those of the distance to AC are generally larger than those of the distance to WC. The experimental results are as expected. Compared with the other two methods, the overall results of the new method are closer to the true center of all points because both the boundary vertices and the inside point values are considered. Compared with the early administrative centers, the spatial distribution of the existing population has shifted significantly. Similarly, these indicators for the Weighted Centroids are also smaller than those for the Administrative Centers. In the suburbs, especially Huairou, Fangshan, Shunyi, and Changping, the difference in the average distance tended to be greater than that in urban areas. This may be because suburban areas tend to be relatively large, and the population distribution is relatively scattered.
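Distance statistics of this kind can be computed with a few lines of Python. The sketch below assumes NumPy and reports the sum, maximum, mean, and standard deviation of the distances from a set of grid points to a candidate center. Whether STD.E in Table 1 denotes a standard deviation or a standard error is not stated, so the standard deviation is used here as an assumption; the function name and the sample points are hypothetical.

```python
import numpy as np

def distance_stats(points, center):
    """Summary statistics of Euclidean distances from points to a center."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts - np.asarray(center, dtype=float), axis=1)
    return {"sum": d.sum(), "max": d.max(), "mean": d.mean(), "std": d.std()}

grid_points = [(0, 0), (3, 4), (6, 8)]          # hypothetical grid coordinates (km)
stats_a = distance_stats(grid_points, (0, 0))   # candidate center at one end
stats_b = distance_stats(grid_points, (3, 4))   # candidate center in the middle
print(stats_a, stats_b)
```

A center that lies closer to the bulk of the points gives a smaller distance sum and a smaller maximum, which is the basis on which the centers are compared.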
While having the benefit of being easily calculated, Mean Centroids tended to be far away from the Administrative Centers, Weighted Centroids, and New Centroids (see Figure 9) because the calculation of Mean Centroids only requires the vertex coordinates of the boundary, not information about the areas of the polygons. They are usually considered to be useful descriptors of point datasets but not of entire polygons. In contrast, the proposed method is sensitive not only to the boundary of the polygon but also to the coordinates and values of the points inside. Therefore, Mean Centroids should only be viewed as geometric centers and are not suitable for the calculation of urban centers. The proposed method can be applied to aid in solving specific problems such as location analysis of shopping, allocation of resources, spatial optimization of service facilities, and other uses.
5. Discussion and Conclusions
Building on suspension theory, this paper proposes a new method for calculating the centroid of non-homogeneous polygons by combining it with the method of distance transformation in map algebra. In addition, the variation in the computing procedure across different types of polygons is discussed. The method is applied to estimate the population gravity of Beijing, and the results are compared with the original administrative district centers.
The results show that considering both grid distance and grid value is logical and consistent with the calculation of the centroid of a non-homogeneous polygon. The proposed method is sensitive not only to the boundary of the polygon but also to the coordinates and values of the points inside, and can therefore correctly reflect spatial heterogeneity in geography. Furthermore, the proposed method is easy to implement using the ArcPy package. However, while using this method, suitable values for three key parameters (selection of the starting and ending points of the candidate centroid balance line, step length, and precision value) need to be established according to the specific application instance. The starting and ending points should be separated by an integral multiple of the step length to ensure that the balance lines intersect at a unique point. The step length and precision value can be set to large values first and then gradually decreased to balance the accuracy requirement against time consumption.
The proposed method can be applied to aid in solving specific problems such as location assessment, allocation of resources, spatial optimization, and other related applications. In future work, the automated computation of the centroids for non-homogeneous polygons will be further studied. In addition, extending the proposed method to calculate the centroids of 3D objects is another important issue to be examined.