An Improved Identification Code for City Components Based on Discrete Global Grid System

City components are important elements of a city, and their identification plays a key role in digital city management. Various identification codes have been proposed by different departments and systems over the years; however, their application has been partly hindered by the lack of a unified coding framework. Using a single code to identify a city component for unified management and geospatial computation across systems is still problematic. In this paper, we put forward an improved identification code for city components based on the discrete global grid system (DGGS). According to their spatial location, city components were identified with one-dimensional integer codes. The results illustrated that this identification code could express the location information of city components explicitly, as well as indicate the spatial distance relationship and the spatial direction relationship between different components. The experiment showed that this code performed better than traditional codes in data query and geospatial computation. Therefore, we concluded that this improved identification code was conducive to the more efficient management of city components, and hence might be used to improve digital city management.


Introduction
Urbanization and urban modernization are the main driving forces of socio-economic development [1,2]. It is predicted that by 2050, about 64% of the developing world and 86% of the developed world will be urbanized, with more than half of the world's population living in urban areas [3]. These changes pose a significant challenge to urban infrastructure and services, and increase the demand for the orderly management of cities [4,5]. In this regard, a digital city management system can be used to manage a city's assets in an advanced way. Such a system integrates geographic information systems (GIS), information and communication technology (ICT), and internet of things technology (IoT) to drive the informatization, intellectualization, and automation of city management [6,7].
One of the main objects of a digital city management system is the city component itself, which incorporates the urban environment and citizen activities [8]. The China National Standard defines the city component as a combination of urban public facilities, transport facilities, urban appearance and environmental facilities, landscaping facilities, and other elements [9]. To effectively incorporate these into a digital city management system, object identification technology is used to label every component with a digital code [10,11]. Identification codes, which are assigned according to a unified framework, help to improve the standardization of city component management and enhance the construction of a digital city management system. It is therefore important to examine the identification code for city components, which is the basis of city component digitalization.
Currently, identification systems usually have various application requirements and business parameters, so city components are encoded separately within different frameworks. Mostly, these codes work well within the system boundaries, but do not guarantee a consistent code for the same component across different systems. Take a fire hydrant, for example: the water department is responsible for its water supply, the fire department uses it for fire control, and the municipal department is also involved in managing its maintenance. Each department holds an identification code for this fire hydrant, which is usually different from that of the others. As a result, when sharing and operating data across systems, it takes extra effort to use different data dictionaries to match common elements, thereby reducing management efficiency and increasing costs. A popular solution to this problem is to integrate the administrative division code, category code, and sequence code to form a unified coding framework [12]. This unified code has been used in a city components census [13] and urban grid management [14]. However, the administrative division code only contains spatial information at the urban scale, which is too coarse for accurate positioning, and thus is limited in its application. Li proposed a method to identify city components by integrating a spatial information grid code and an object code [15,16]. This contained administrative division codes at four different scales and could represent more detailed location information. It was used in the city management and service system in Wuhan, China [6]; however, it did not contribute to spatial analysis and geospatial computation in a geographic information system, as the administrative regions are irregularly shaped and are coded by sequential numbers.
In this paper, we propose an improved identification code for city components, using the discrete global grid system (DGGS) for geo-referencing [17,18]. DGGS divides the Earth's surface into grids with multiple levels, thereby forming a hierarchy of multi-resolution grids [19,20]. Each grid is indexed by a one-dimensional integer code, with its location being represented explicitly [21,22]. The grid code can also be used in spatial analysis and geospatial computation [23-25]. Using the grid code to identify and manage city components can build a logical association between the location information and identification codes. Therefore, it may enhance the efficient management of city components and improve the overall service that is provided by a digital city management system.

GeoSOT Grid Code
Among the different DGGSs that are available, this study adopted the Geographic coordinate Subdivision grid with One-dimensional integral coding on 2^n-Tree (GeoSOT) as its geo-referencing and coding framework [26,27]. GeoSOT is based on a latitude and longitude coordinate system and discretizes the Earth's surface with a recursive quad-tree structure. Each grid is indexed by a nested hierarchical code in a single string [28,29]. Furthermore, hierarchical coding is handled by appending digits to a grid's code to access its children, or by truncating a part of its code to access its parents [30,31]. The benefits of this hierarchy-based indexing method include the efficient hierarchical traversal of the grids and a spatial resolution that is explicit in the length of the code [32].
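The append-and-truncate behaviour described above can be illustrated with plain string operations. This is a minimal sketch rather than the GeoSOT reference implementation: it assumes one base-4 digit per quad-tree level, and the function names are hypothetical.

```python
QUAD_DIGITS = "0123"  # assumption: one base-4 digit per quad-tree level


def parent(code: str, levels: int = 1) -> str:
    """Return the ancestor `levels` levels up by truncating the code."""
    if levels < 0 or levels > len(code):
        raise ValueError("cannot truncate more digits than the code has")
    return code[: len(code) - levels]


def children(code: str) -> list[str]:
    """Return the four child codes by appending one digit each."""
    return [code + digit for digit in QUAD_DIGITS]


def level(code: str) -> int:
    """The level, and hence the resolution, follows from the code length."""
    return len(code)


def contains(ancestor: str, code: str) -> bool:
    """A cell contains another exactly when its code is a prefix of the other."""
    return code.startswith(ancestor)
```

Because containment reduces to a prefix test, a whole subtree of cells can be selected with a single string comparison, which is the kind of hierarchical traversal the paragraph refers to.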
The subdivision and coding method of GeoSOT is shown in Figure 1, and the scale of the grids at different levels is shown in Table 1. The innovative aspect of GeoSOT is its three extensions in the subdivision process: the Earth's surface is expanded from 180° × 360° to 512° × 512°, 1° is expanded from 60′ to 64′, and 1′ is expanded from 60″ to 64″. The result is a one-dimensional integral grid code on the 2^n-tree [33].
At the same time, the GeoSOT grid is compatible with (1) latitude and longitude grids with whole degrees, minutes, and seconds; and (2) digital Earth grids (e.g., WorldWind of the National Aeronautics and Space Administration (NASA)) [34]. Consequently, GeoSOT grids and codes can be adapted to existing geospatial data management systems. Taking into account what is required to apply the identification code of the city component, we adopted only the 4th, 8th, 12th, 16th, 20th, 24th, 28th, and 32nd level grids of GeoSOT for location reference. The ratio of the grid sizes of adjacent adopted levels was 1:16, so we could use hexadecimal numbers to encode the grids with a shorter length and higher efficiency.
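The 1:16 ratio between adopted levels can be checked numerically. The sketch below assumes the commonly published GeoSOT level scheme (level 9 is 1°, level 15 is 1′, level 21 is 1″), since Table 1 is not reproduced in this text. The helper names and the code-length rule of two hexadecimal characters per adopted level are likewise assumptions, inferred from the 10-character codes reported for grids of about 64 m.

```python
from fractions import Fraction

ADOPTED_LEVELS = (4, 8, 12, 16, 20, 24, 28, 32)


def grid_size(level: int) -> tuple[Fraction, str]:
    """Nominal GeoSOT cell size at `level`, returned as (value, unit)."""
    if not 1 <= level <= 32:
        raise ValueError("GeoSOT levels run from 1 to 32")
    if level <= 9:    # degree levels: 256 degrees down to 1 degree
        return Fraction(2) ** (9 - level), "degree"
    if level <= 15:   # minute levels: 32 minutes down to 1 minute
        return Fraction(2) ** (15 - level), "minute"
    # second levels: 32 seconds down to 1 second (level 21), then fractions
    return Fraction(2) ** (21 - level), "second"


def code_length(level: int) -> int:
    """Assumed length of the hexadecimal grid code at an adopted level:
    one character per axis for every four quad-tree levels."""
    if level not in ADOPTED_LEVELS:
        raise ValueError("only every fourth level is adopted")
    return 2 * (level // 4)


for lv in ADOPTED_LEVELS:
    value, unit = grid_size(lv)
    print(lv, value, unit, code_length(lv))
```

Under these assumptions, level 20 corresponds to a 2″ cell and a 10-character code. Each step between adopted levels divides the cell edge by 16, once the 64-minute and 64-second extensions are taken into account.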

Identification Code of the City Component
The identification code of the city component included a positioning grid code C_0 and a span code MN. It was calculated in the following three steps:
(1) The optimum grid level: The grid size at the nth level is Size_n, which is calculated by Equation (1), where n ∈ {4, 8, 12, 16, 20, 24, 28, 32}, and L_MBR and W_MBR are the meridional length and zonal length of the component's minimum bounding rectangle (MBR) (see Figure 2).
(2) C_0: Let P(L, B) be the point closest to the Origin (0°, 0°) among the apexes of the component's MBR; then the grid code of P at the optimum grid level is C_0. Specifically, the meridional code can be converted from the longitude value L by Equation (2), and the zonal code can be converted from the latitude value B through similar equations. Then, the meridional code and the zonal code are cross-integrated consecutively to form C_0.
(3) MN: For the two points closest to and farthest from the Origin (0°, 0°) among the apexes of the component's MBR, the grids at the optimum grid level are encoded with tag ends of m_i n_i and m_j n_j (m_i, n_i, m_j, n_j ∈ [0, F]). The meridional span code M is calculated by Equation (3), and the zonal span code N can be calculated by similar equations. For the identification code of point objects, MN may be omitted; for the identification code of polyline and polygon objects, MN is required.
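The cross-integration of the meridional and zonal codes in step (2) can be sketched as a character-wise interleave. This assumes that both codes are hexadecimal strings of equal length and that the meridional character comes first in each pair, as the tag ends m_i n_i suggest. The function names are hypothetical, and Equations (1) to (3) themselves are not reproduced here.

```python
def cross_integrate(m_code: str, n_code: str) -> str:
    """Interleave meridional and zonal hex codes into one positioning code."""
    if len(m_code) != len(n_code):
        raise ValueError("both codes must refer to the same grid level")
    return "".join(m + n for m, n in zip(m_code, n_code))


def split_code(c0: str) -> tuple[str, str]:
    """Recover the meridional and zonal codes from a positioning code."""
    if len(c0) % 2:
        raise ValueError("a positioning code holds character pairs")
    return c0[0::2], c0[1::2]


print(cross_integrate("BA3D7", "93FEB"))  # -> B9A33FDE7B
```

Keeping the two axes in alternating positions means that truncating the code by one pair moves both axes up by one adopted level at once.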

Results of Encoding
We took 3127 commercial buildings and 7103 parking lots in Beijing as examples and encoded each according to its location based on GeoSOT. The grid size was about 64 m, and the grid code was a one-dimensional integer array with a length of 10 characters. Figure 3 shows the distribution of the commercial buildings on a global scale (on the left) and a local scale (on the right); detailed information on the latter is shown in Table 2.

Analysis of the Code
Traditionally, a city component is identified with a 16-digit code as per the China National Standard. It is made up of a six-digit administrative division code, a four-digit category code, and a six-digit sequence code [4,9,13]. In contrast, the identification code in this paper was based on the component's spatial location in the DGGS framework and thus had the following advantages:
(1) Universal utility across systems: The sequential code in the traditional method is a series of characters that carries no attributes of the component itself. As different users may specify different sequential codes for the same component, it is not conducive to accurate identification. In contrast, the grid code is derived from the location of the component, so different users obtain the same identification code for the same city component. It is of universal utility across systems and can facilitate operations between different departments in the unified management of multi-source city components.
(2) Explicit expression of accurate location: The administrative division code in the traditional code usually represents a region with a large area and an irregular shape. In contrast, discrete global grids at the same level share a consistent shape and size. Furthermore, the grid code can express more accurate spatial location information than the administrative division code. This helps in identifying city components effectively by their grid codes as required, and thereby might contribute to an improvement in the efficiency of a digital city management system.

(3) Implicit expression of spatial relationships: Traditional codes offer hardly any useful information about spatial relationships. However, the discrete global grid system uses a unified subdivision and coding framework, so the grid code can indicate a simple spatial relationship between components [35]. Taking Microsoft China (Point A in Figure 3) and the Daimler Tower (Point B in Figure 3) as examples, the codes of these two buildings were B9A33FDE7B and B9A33FDE4A, respectively. They were of the same length, but ended with 7B and 4A, respectively. We made a simple inference of the spatial relationship between them according to their codes.
For the spatial direction relationship, it was found that Microsoft China is located on the north-east side of the Daimler Tower, since 7 is greater than 4, and B is greater than A. For the spatial distance relationship, since the code length was 10 characters, the grid size of this level was inferred to be approximately 48 m × 64 m. Their zonal distance is about three grids (144 m), and their meridional distance is about one grid (64 m), which is broadly consistent with the actual distances (125 m and 49 m).
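The inference above can be reproduced with a few lines of integer arithmetic on the two codes. The sketch assumes that the first character of each pair grows eastward and the second grows northward, which matches the reading of 7 > 4 and B > A in the text, and it uses the 48 m × 64 m cell size quoted for 10-character codes. The function names are hypothetical.

```python
CELL_EW_M = 48  # east-west cell extent in metres, as quoted in the text
CELL_NS_M = 64  # north-south cell extent in metres, as quoted in the text


def components(code: str) -> tuple[int, int]:
    """Split a positioning code into its two per-axis integers."""
    return int(code[0::2], 16), int(code[1::2], 16)


def relation(code_a: str, code_b: str) -> tuple[str, int, int]:
    """Direction of A relative to B, plus grid-based distances in metres."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must be at the same grid level")
    m_a, n_a = components(code_a)
    m_b, n_b = components(code_b)
    dm, dn = m_a - m_b, n_a - n_b
    ns = "north" if dn > 0 else "south" if dn < 0 else ""
    ew = "east" if dm > 0 else "west" if dm < 0 else ""
    direction = "-".join(part for part in (ns, ew) if part) or "same cell"
    return direction, abs(dm) * CELL_EW_M, abs(dn) * CELL_NS_M


# Microsoft China (A) relative to the Daimler Tower (B)
print(relation("B9A33FDE7B", "B9A33FDE4A"))  # -> ('north-east', 144, 64)
```

The distances are counts of cells multiplied by the nominal cell size, so they are only as accurate as one cell. Differences that cross a minute or degree boundary would also need to account for the 60-to-64 extension, which this sketch ignores.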

Comparison in Data Query
To compare the query efficiency of the grid code with that of the traditional code, an application scenario was set up in which a user entered an identification code to search for the corresponding city component in the digital city management system. When the user entered the first character of the code, all records that started with it formed a preselected data set. Then, the second character was entered, and all of the matching records formed a new preselected data set on the basis of the former one. This process was repeated several times, and the last preselected data set was traversed to find the right city component. Figure 4 shows the query time of the grid code and the traditional code from no pre-selection to five pre-selections. The main indicator used to compare and evaluate their performance was the time spent on the test: the less time it took, the higher the efficiency.
As seen in Figure 4, as the number of pre-selections increased, the query time of both codes was relatively stable at first and then dropped. This was possibly because the first two characters of all of the grid codes were the same, as were the first four characters of all of the traditional codes. Moreover, the query based on the grid code took less time than that based on the traditional code, owing to the structure of the codes. On the one hand, the grid code had fewer characters than the traditional code. A shorter code is generally more advantageous in traversal and performs better in queries. On the other hand, the grid code uses hexadecimal numbers. Theoretically, it has a higher resolution, as each character has more possible values. This is why three characters of the grid code were able to distinguish between the different city components, whereas the traditional code required at least five characters. In summary, the grid code took less time and achieved higher query efficiency than the traditional code.

Performance in Geospatial Computation
To illustrate the performance of this grid code in geospatial computation, we set up an application scenario that searched for all of the parking lots within a certain distance of a commercial building. Specifically, with the location of the commercial building as the center, a buffer was built with a specific width (distance), and then all of the parking lots within this buffer were selected. The experiment was conducted with a database in PostGIS 2.4 for PostgreSQL 9.6. We varied the buffer width, that is, the distance between the commercial building and the parking lots sought, from 100 m to 2000 m in steps of 100 m. In each case, the test was repeated 10 times to obtain the average running time. The grid codes were used to express the location information, which was the foundation of the entire computation. In contrast, the traditional code could not be applied to this scenario, since it only contained location information at the urban scale, which is not accurate enough for geospatial computation. The results are shown in Figure 5.
As seen in Figure 5, the average test time changed with distance. From 100 to 500 m, the test time decreased, partly owing to the cache mechanism, whereby an earlier calculation provided some preprocessing for a later one. From 500 to 2000 m, the test time gradually increased, most probably because the amount of calculation grew with the distance. When the grid code was used to construct the buffer around a commercial building, the adjacent grids were obtained simply by left-shift or right-shift operations, since the spatial relationship was directly reflected in the grid codes. Consequently, the buffer was expressed by a set of grids. When searching for nearby parking lots, we only needed to select the common codes from the code sets of the parking lots and the buffer, which almost entirely avoided complex spatial distance calculations. Furthermore, the grid code was an integer array and could be easily transformed to binary code, which is well suited to machine computation. This experiment demonstrated a simple case of using the grid code for distance-based searching. The results revealed that the improved identification code for the city component was computable. This may benefit both spatial relationship inference in outdoor work and geospatial computation in the digital city management system, which is a significant advantage over the traditional code.
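The grid-based buffer search can be sketched as a set intersection. This is not the PostGIS procedure used in the experiment. It assumes the pair-interleaved hexadecimal codes and the 48 m × 64 m cell size discussed earlier, and it uses a rectangular block of cells in place of a circular buffer. Neighbours are obtained by adding to or subtracting from the per-axis integers rather than by the shift operations mentioned in the text, and the function names are hypothetical.

```python
import math


def split(code: str) -> tuple[int, int]:
    """Per-axis integers of a pair-interleaved hexadecimal code."""
    return int(code[0::2], 16), int(code[1::2], 16)


def join(m: int, n: int, digits: int) -> str:
    """Rebuild the interleaved code from two per-axis integers."""
    m_hex, n_hex = format(m, f"0{digits}X"), format(n, f"0{digits}X")
    return "".join(a + b for a, b in zip(m_hex, n_hex))


def grid_buffer(center: str, width_m: float,
                cell_ew_m: float = 48.0, cell_ns_m: float = 64.0) -> set[str]:
    """Codes of all cells within `width_m` of the center cell, per axis."""
    digits = len(center) // 2
    m0, n0 = split(center)
    km = math.ceil(width_m / cell_ew_m)  # cells to cover east-west
    kn = math.ceil(width_m / cell_ns_m)  # cells to cover north-south
    limit = 16 ** digits
    return {
        join(m, n, digits)
        for m in range(m0 - km, m0 + km + 1)
        for n in range(n0 - kn, n0 + kn + 1)
        if 0 <= m < limit and 0 <= n < limit
    }


def nearby(center: str, parking_codes: list[str], width_m: float) -> set[str]:
    """Parking-lot codes that fall inside the grid buffer."""
    return set(parking_codes) & grid_buffer(center, width_m)


print(nearby("B9A33FDE7B", ["B9A33FDE4A", "B9A33F0000"], 100))  # -> {'B9A33FDE4A'}
```

The search itself involves no distance computation, only membership tests on codes. A production version would also have to skip the padding cells introduced by the 60-to-64 extension when a buffer crosses a minute or degree boundary, which this sketch does not handle.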

Limitations and Prospects
This paper adopted the two-dimensional GeoSOT grid to identify city components on the Earth's surface. However, an increasing number of city components will exist in three-dimensional space as urbanization proceeds, and further research on the unified coding of such components is needed. One possible solution is to refer to the three-dimensional expression of GeoSOT, and to use it for identification as well as for geospatial computation [36,37]. However, it should be pointed out that an increase in dimension is accompanied by an increase in the length of the code. This may create additional storage overhead and, moreover, is not conducive to manual recognition. Furthermore, the method proposed in this paper only guarantees that one city component has one identification code; it does not guarantee that one identification code belongs to a single city component. One possible solution is to add auxiliary information to the identification code to ensure its uniqueness. Some studies have recommended a category attribute as the preferred supplement on account of its inherency and stability [38,39]. Nonetheless, automatic coding for categories is another problem, and the efficiency and accuracy of category encoding have yet to be confirmed. Finally, the discrete global grid system offers an efficient unified method for location representation and geospatial computation [40,41]. Only two simple application scenarios were examined in this study, and further studies on the indexing and computation mechanisms of the grid are needed to support more complex applications across wider fields.
An identification code based on a discrete global grid system is expected to be applied to the spatial information exchange standard [42-44]. Spatial information infrastructure (SII) has become an essential part of national information infrastructure construction [45]. However, the lack of an exchange standard has hindered the development of SII [46]. The identification code presented in this paper is of universal utility across systems, explicitly expresses location, and is computable. Therefore, it has the potential to be used for a spatial information exchange standard, and it also contributes to the unified management of geospatial data [47].

Conclusions
With rapid urbanization, city management has become digitized. The city component is a key element, and its identification plays an important role in digital city management. This paper proposes an improved method for identifying city components based on the discrete global grid system. According to their spatial location, city components were identified with one-dimensional integer codes. These codes were of universal utility across systems, explicitly expressing accurate location and implicitly expressing spatial relationships. On this basis, they were used for data queries and geospatial computation, where they performed better than the traditional codes. Therefore, we conclude that the improved identification code is conducive to the more efficient management of city components, and hence may be used to improve digital city management.

Figure 1. Geographic coordinate Subdivision grid with One-dimensional integral coding on 2^n-Tree (GeoSOT) subdivision and coding method.

Figure 2. Identification code based on the discrete global grid system.

Figure 3. Distribution of commercial buildings at the global and local scales, respectively.

Table 1. GeoSOT grid scale of different levels.

Table 2. Detailed information of the commercial buildings at the local scale.