Article

A GloVe Model for Urban Functional Area Identification Considering Nonlinear Spatial Relationships between Points of Interest

Institute of Geospatial Information, Information Engineering University, Zhengzhou 450001, China
*
Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2022, 11(10), 498; https://doi.org/10.3390/ijgi11100498
Submission received: 9 July 2022 / Revised: 16 September 2022 / Accepted: 23 September 2022 / Published: 24 September 2022

Abstract

As cities continue to grow, the functions of urban areas change and problems arise from previously constructed urban planning schemes. Hence, the actual distribution of urban functional areas needs to be confirmed. Point of Interest (POI) data, as a representation of urban facilities, can be used to mine spatial correlations within a city and have therefore been widely used for urban functional area extraction. Previous studies are mostly devoted to mining linear location relationships between POIs and do not comprehensively mine POI spatial information, such as spatial interaction information. This results in less accurate modeling of the relationship between POI types and urban function types. In addition, they all use Euclidean distance for proximity assessment, which is not realistic. This paper proposes an urban functional area identification method that considers the nonlinear spatial relationships between POIs. First, POI adjacency is determined according to road network constraints, which forms the basis of a co-occurrence matrix. Then, a Global Vectors (GloVe) model is used to train POI category vectors, and the feature vector for each basic research unit is obtained using a weighted average. This is followed by clustering analysis using the K-Means++ algorithm. Lastly, the functional areas are labeled according to the POI category ratio, enrichment factors, and mobile phone signal heat data. The model was tested experimentally, using core areas of Zhengzhou City in China as an example. Comparison of the results with a Baidu map confirmed that making full use of nonlinear spatial relationships between POIs delivers high identification accuracy for urban functional areas.

1. Introduction

The city is the product of human society at a certain stage of development. It is an open, complex and giant system [1]. The various components of a city interact in different forms and ways to form various urban structures, mainly economic, social and spatial structures. Spatial structure refers to the configuration and distribution of facilities in different areas of a city [2]. As cities continue to grow, their spatial structures are becoming more diverse and complex [3]. The planning schemes developed in the early days of a city are no longer reasonable, which has led to issues such as traffic congestion, environmental degradation, housing shortages and emergency lag [4]. In order to solve the problems of urban development, optimize the layout of urban space and make full use of space resources, it is necessary to formulate a new, reasonable urban planning scheme. In formulating such a scheme, the primary task is to determine the functional zoning of the city.
Originally, the identification of urban functional areas relied primarily on surveys and statistics. However, this approach is potentially subjective, limited by the available statistics, and inefficient [5]. Thus, some studies moved on to using high-resolution remote sensing images, which can accurately capture physical surface characteristics such as shape, texture, and spectrum [6,7,8]. However, this method suffers from low accuracy and poor data currency. In addition, functional areas are affected by human social and economic activities, whereas only natural attributes can be extracted from remote sensing images. To solve these issues, the concepts of “social perception” [9] and “urban computing” [10] have been applied to the measurement of social and economic activities in urban space.
With the development of sensors and data acquisition technology, an enormous quantity of data have become available to study the socio-economic environment and to uncover patterns of interactions in urban functions. More current methods for identification of urban functional areas mainly utilize Points of Interest (POIs) [11,12,13,14,15], social media [16,17,18,19,20], and vehicle trajectories [21,22,23,24]. Thus, various types of Big Data have made it possible to discover the distribution of urban functional areas in ways that were not previously feasible. Hélder Tiago [19] used photo data from social media to confirm the correlation between tourism activities and the functional distribution of cities. Xue [20] used Tencent Street View photos to analyze pedestrian trajectories and advance research on the identification of functional urban areas. Chen [21] used taxi GPS data to identify functional areas of the city by analyzing their movement patterns. However, the identification of functional areas using social media data and vehicle track data only uses the frequency of their data points as a criterion for determining the type of functional area and does not take into account the spatial correlation within urban areas.
Among the rich data sources above, POI data describe the facilities, institutions and geographical places across a city that are closely related to people’s lives, abstracted as point geographic entities with both location and semantic information. POI data can be used to study spatial correlation and spatial interaction within cities because the configuration of POIs varies between functional areas. For example, industrial areas have many companies and factories, together with some infrastructure such as parks, restaurants and shops. POI data are also easy to access and to present. Therefore, POI data are widely used in the identification of urban functional areas, and we also used them in this study.
In the early days, POI-based urban functional area identification focused on analyzing the number and density of various types of POIs in basic urban units. For instance, Hu et al. [11] used 250 m² POI density information to identify the functional areas in Guangzhou. Yuan et al. [12] mined POI data using a probabilistic topic model to enable functional area identification in Beijing. Yi et al. [13] analyzed POI data quantitatively and used Fisher’s exact test method to identify urban functional areas. Gao et al. [14] used Latent Dirichlet Allocation (LDA) topic modeling technology to mine POI and social media data to achieve the same objective. These methods are limited by the fact that the functions were determined only according to POI frequency, which wastes much of the spatial information carried by the POIs. They fail to account for the fact that points within a space are defined by their surroundings, which violates the first law of geography [25]. Their accuracy therefore needs to be improved further.
To solve the above problems, Yao et al. [26] proposed a method that combines the Word2vec language model with POI data. The shortest-distance sequence between all POIs in each independent basic research unit was used to construct a training data set. Regional feature vectors were then obtained that made it possible to identify the urban functional areas. Wu et al. [27] applied a Global Vectors (GloVe) deep learning model that considered global co-occurrence information when undertaking functional area identification. In these studies, the POI spatial distribution was described by natural language sequences, and language models were used to quantify the relationship between the POIs and different categories of urban functional area. These methods can mine latent semantic information and geographical information related to the spatial distribution of POIs. However, linearly distributed [28] natural language sequences were used to organize the POI data in the parcels (Figure 1a). This method of POI organization does not fully exploit the spatial information of POIs. For example, co-occurrence information is ignored for POIs that are far apart in the sequence but spatially close to each other. Additionally, in urban spaces, POIs have a scattered distribution and a strong correlation with surrounding roads. There are also interactive connections between adjacent parcels, which have no counterpart in natural language sequences. This makes using the linear relationship between POIs in parcels potentially problematic and can result in nonsensical outcomes, such as the splitting of adjacent POIs on two sides of a street. The mining of spatial interaction features between parcels can therefore be undermined. Finally, and importantly, this type of approach measures the proximity of POIs in terms of Euclidean distance. In reality, buildings act as obstacles, so this measurement is not realistic.
In this paper, we propose an urban functional area identification method that takes into account the nonlinear spatial relationships between POIs. This method considers not only geographical adjacency, but also urban structural integrity. It can comprehensively mine POI proximity information and spatial interaction information in line with realistic scenarios. The spatial adjacency of POIs is first established on the basis of the road network (Figure 1b): taking each POI in turn as the center, all neighboring points within a road network distance threshold are counted. An improved co-occurrence matrix can then be constructed and the global POI co-occurrence information can be mined. After this, a GloVe model is trained to obtain POI category vectors, which form the basis of the feature vectors. K-Means++ is then used to cluster the feature vectors, and the regional clusters are evaluated by three indicators to determine the category of each urban functional area.

2. Study Area and Data

The study area includes the main urban areas of Zhengzhou City in China, namely the districts of Zhongyuan, Jinshui, Erqi, and Guancheng Huizu (Figure 2). These cover a total area of 801.1 km². With its excellent economic development, large residential population and dense road network, Zhengzhou is a city of great dynamism. It is the political, economic, and cultural center of Henan Province. In recent years, rapid urban development has rendered the functional structure of its main urban areas highly heterogeneous. Thus, the study area incorporates a variety of different urban functional types, such as residential areas, science, educational and cultural facilities, and business and entertainment zones.
The data used in the study came from the 2020 urban administrative division map of Zhengzhou, POI data for Zhengzhou on 28 December 2021, the road network data from 2020, and the mobile phone signal heat data for Zhengzhou residents on 3 June and 5 June 2020. The POI data were obtained through an application programming interface (API) provided by the AutoNavi open platform. There was a total of 216,413 data points in the study area. The road network data were downloaded from the OpenStreetMap website. The mobile phone signal data were provided by China Unicom. Specific data descriptions are given in Table 1.

3. Methodology

Because considering the spatial distribution of urban POIs as a whole can assist with mining urban functional areas, we used a GloVe model to train POI category vectors. The POI co-occurrence relationships need to be based on the actual urban structure, so the road network was used to construct the co-occurrence matrix. An overall flowchart for the proposed method is shown in Figure 3. It covers five steps: (1) construct the parcels (basic research units) according to the urban road network; (2) from a city-wide perspective, establish POI adjacency relationships on the basis of the road network and construct a co-occurrence matrix; (3) use a GloVe model to train POI category vectors; (4) construct feature vectors for each parcel and perform clustering; and (5) identify and label the functional areas. The rest of this section describes this process in detail.

3.1. Constructing the Parcels (Basic Research Units)

In existing studies, the research areas are mainly divided according to grids [11], cadastral data [26], and road networks [29]. In grid-based methods, irregular urban blocks are converted into uniform shapes, which does not represent the macro-morphology of the urban structure. Division on the basis of cadastral data risks breaking the connectivity between regions. Division according to urban road networks, however, more naturally reflects how cities are actually organized. The proposed method therefore uses highways, primary roads, and secondary roads as basic urban division units. The areas formed by these divisions are known as Parcels.
A morphological approach [30] was used to construct the Parcels. As shown in Figure 4a, road networks are usually represented by multiline roads, which affect the organization of the POIs. It is therefore better to generate single-line road networks. This involves first of all rasterizing the original road network and thickening the roads using a morphological expansion function, which fills in the gaps and voids between roads (Figure 4b). The expanded roads were then refined and, in order to maintain the overall topology of the original road network, the central axis of each road was extracted. This was used to represent the road (Figure 4c). Finally, meshes were constructed for the single-line road network to form Parcels (Figure 4d).

3.2. Co-Occurrence Matrix Construction

The number of POIs per type follows a power-law distribution, as does the frequency of words, so deep learning natural language models can be applied to POI data. One such model is GloVe, a word embedding model that makes use of global (here, city-wide) statistical information. It embeds a high-dimensional space, whose dimension equals the total number of words, into a continuous vector space of much lower dimension, so that words are mapped to numerical vectors [31]. The statistical information it utilizes, co-occurrence frequency, carries important information about the spatial characteristics of urban areas and helps in modelling the relationship between POI types and urban area functions. The GloVe model was therefore chosen for the identification of urban functional areas.
The GloVe model stores statistical information in a co-occurrence matrix [32]. Each element of a co-occurrence matrix records how often a pair of words occurs together, for every pair of distinct words. The matrix can be used to measure semantic similarity between words, based on the principle that words in a language are semantically closer to their neighbors. In geography, a similar law states that the shorter the distance between two objects, the closer their relationship (the first law of geography) [25]. Co-occurrence matrices are therefore also suitable for mining POI semantic information in urban space.
Constructing a co-occurrence matrix is at the heart of the GloVe model. The original GloVe model constructs the co-occurrence matrix by sliding a fixed window over each sentence in the corpus (i.e., a set of texts in a certain format [33]) and counting the number of times that different words appear in the window at the same time. When the model has been applied to geographical studies, Euclidean distances have commonly been used to calculate the proximity between objects. Such studies first use a greedy algorithm to arrange all POIs in a basic research unit into a sequence that satisfies a shortest-distance criterion. The frequency of co-occurrence with neighboring objects is then obtained by sliding a fixed window over each POI sequence, which yields the final co-occurrence matrix. This rationale for transforming POIs into sequential documents is not theoretically convincing or rigorous. The transformation of spatial data into sequential document data has limitations in terms of mining the spatial characteristics of POIs. This is because natural language sequences are inherently linear in structure, whereas POIs are distributed in a two-dimensional geographic space and are therefore non-linear in structure. As a result, co-occurrence information is ignored for POI pairs that are far apart in the POI sequence but close in space. As an example, take the POI sequence: scientific research institutions, fitness centers, parking lots, colleges and universities, Chinese restaurants, dormitories, entrances and exits. Let the size of the window be 5. The central term is colleges and universities, and the terms within a step length of 2 are fitness centers, parking lots, Chinese restaurants and dormitories. In reality, however, colleges and universities are most strongly correlated with scientific research institutions, which fall outside the window. This makes it clear that adjacency relationships obtained in this way are incomplete and unrealistic.
Increasing the size of the window only results in additional redundant adjacency relationships. In addition, this approach is quite sensitive to the construction of Parcels (basic research units). It assumes that POIs are only relevant to each other within the same Parcel, ignoring the fact that POIs in different Parcels are also potentially relevant. For example, a public security bureau and a fire station on either side of a road, which are geographically close to each other but belong to two adjacent Parcels, would never be counted as co-occurring if this method was used.
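The window example above can be reproduced with a short sketch. The function name `window_cooccurrence` and the counting convention (a centered window that reaches `window_size // 2` positions to each side) are illustrative assumptions, not the implementation used in the cited studies. The sketch only shows that, with a window of 5, scientific research institutions never enter the context of colleges and universities.

```python
from collections import Counter

def window_cooccurrence(sequence, window_size):
    """Count ordered (center, neighbor) pairs inside a centered sliding window."""
    half = window_size // 2  # step length on each side of the center
    counts = Counter()
    for center, item in enumerate(sequence):
        lo = max(0, center - half)
        hi = min(len(sequence), center + half + 1)
        for other in range(lo, hi):
            if other != center:
                counts[(item, sequence[other])] += 1
    return counts

# POI sequence taken from the example in the text.
poi_sequence = [
    "scientific research institutions",
    "fitness centers",
    "parking lots",
    "colleges and universities",
    "Chinese restaurants",
    "dormitories",
    "entrances and exits",
]

counts = window_cooccurrence(poi_sequence, window_size=5)

# Context categories that the window associates with the central term.
context = sorted(b for (a, b) in counts if a == "colleges and universities")
print(context)
```

The pair (colleges and universities, scientific research institutions) receives no count, even though the two facility types are strongly related in practice.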
In this paper, because POIs are distributed in geographic space rather than sequentially like words in documents, the original method of constructing a co-occurrence matrix is not used. In addition, in order to explore the non-linear location relationships and spatial interaction information of POIs in geographic space, we no longer rely on Parcels, but take a city-wide perspective. Urban facilities are usually arranged along the sides of roads in the real world, and physical movement in urban space is typically constrained by the road network (Figure 5). For example, when walking from point A to point B in Figure 5, the buildings on both sides of the road create barriers, so we cannot walk along the red line (the straight line between the two points) and can only walk along the blue line (the road). The road network can therefore be used to mine the neighborhood relationships between POIs and construct a co-occurrence matrix. The proximity relationships mined by this method are not only closer to reality, but also more comprehensive. The method avoids the following three situations: (1) information from POI pairs that are far apart in the real world is incorrectly counted because the POIs are in the same parcel and the POIs in that parcel are sparse; (2) information about POI pairs that are actually close is ignored because of parcel boundaries; (3) POI pairs that are in the same parcel and close to each other in the real world are not counted because of the window size that has been set.
To obtain sufficient information, small POI categories are best used to construct a training dataset. The co-occurrence matrix can be constructed as follows:
  • Abstract the road network into a graph, $G = (N, E)$, where $E$ is the set of all roads (edge set), and $N$ represents their intersections (node set);
  • Use a nearest neighbor search method to map geographic entities to the edges of the road network and represent each entity in the form of a tuple, $\langle e, (pos_1, pos_2) \rangle$, where: $e$ is the edge nearest to the geographic entity, identified by its start point and end point; $pos_1$ denotes the distance between the projected point of the geographic entity and the starting point of $e$; and $pos_2$ represents the distance between the projected point of the geographic entity and the end point of $e$. So, the geographic entity $D_1$ in Figure 6 can be represented as $\langle (n_1, n_2), 4.2, 2.1 \rangle$;
  • Having set a distance threshold, perform a nearest neighbor search under the road network constraints for each geographic entity to obtain its neighboring point set. For the geographic entity $D_1$ in Figure 6, the neighboring point set is $(B_1, C_1, C_2, A_1)$;
  • Build an $n \times n$ co-occurrence matrix, $X$, where $n$ is the total number of POI categories. The value of each element, $X_{ij}$ ($i$ and $j$ are the categories), in the matrix is updated according to the neighboring point set. So, on the basis of the neighboring point set of $D_1$, it is necessary to add 1, 1, and 2 to $X_{DA}$, $X_{DB}$ and $X_{DC}$, respectively. The results are shown in Table 2.
The neighboring point distance threshold can be obtained by calculating the average value of the network distance between each geographic entity and the nearest neighboring geographic entity [34,35]. The calculated value was 150 m in this study.
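The steps above can be sketched on a toy network. Only the tuple for $D_1$ and its neighboring point set come from the text; because Figure 6 is not reproduced here, the edge lengths, the positions of the other POIs, the extra POI `E1`, and the threshold of 5.0 (in arbitrary units, instead of the 150 m used in the study) are hypothetical values chosen so that the neighbor set of `D1` matches the one given. The function names are likewise illustrative, and the nearest-edge projection step is assumed to have been performed already.

```python
import heapq
from collections import defaultdict

def shortest_paths(graph, source):
    """Dijkstra: shortest road distance from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor, length in graph[node].items():
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

def network_distance(graph, p, q):
    """Road-network distance between two POIs stored as <e, (pos1, pos2)>."""
    best = float("inf")
    if p["edge"] == q["edge"]:
        best = abs(p["pos1"] - q["pos1"])  # both POIs lie on the same edge
    # Otherwise leave p through either end of its edge and enter q's edge
    # through either of its ends. Dijkstra is rerun per pair for brevity.
    for p_node, p_offset in zip(p["edge"], (p["pos1"], p["pos2"])):
        dist = shortest_paths(graph, p_node)
        for q_node, q_offset in zip(q["edge"], (q["pos1"], q["pos2"])):
            if q_node in dist:
                best = min(best, p_offset + dist[q_node] + q_offset)
    return best

def build_cooccurrence(graph, pois, threshold):
    """Add 1 to X[i][j] for each category-j POI within `threshold` of a category-i POI."""
    X = defaultdict(lambda: defaultdict(int))
    for name, p in pois.items():
        for other, q in pois.items():
            if other != name and network_distance(graph, p, q) <= threshold:
                X[p["category"]][q["category"]] += 1
    return X

def poi(category, edge, pos1, pos2):
    return {"category": category, "edge": edge, "pos1": pos1, "pos2": pos2}

# Hypothetical toy road network: node -> {neighbor: edge length}.
graph = {
    "n1": {"n2": 6.3},
    "n2": {"n1": 6.3, "n3": 5.0},
    "n3": {"n2": 5.0},
}

pois = {
    "D1": poi("D", ("n1", "n2"), 4.2, 2.1),  # tuple given in the text
    "B1": poi("B", ("n1", "n2"), 5.3, 1.0),  # hypothetical positions from here on
    "C1": poi("C", ("n1", "n2"), 2.0, 4.3),
    "C2": poi("C", ("n2", "n3"), 1.5, 3.5),
    "A1": poi("A", ("n2", "n3"), 2.5, 2.5),
    "E1": poi("E", ("n2", "n3"), 4.8, 0.2),  # farther than the threshold from D1
}

X = build_cooccurrence(graph, pois, threshold=5.0)
print(dict(X["D"]))
```

In this toy setting the row for category D receives 1, 1 and 2 for categories A, B and C, which mirrors the update described in the last step, while `E1` is excluded because its network distance from `D1` exceeds the threshold.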

3.3. Construction of the POI Category Vector

The representation of the regional features depends upon training the POI category vectors using the co-occurrence matrix. The POI co-occurrence matrix is $X$, with the element $X_{ij}$ being the frequency with which POIs of category $i$ and category $j$ co-occur in their spatial context. $X_i = \sum_k X_{ik}$ is the frequency with which any POI category co-occurs with category $i$. $P_{i,j} = P(j \mid i) = X_{ij} / X_i$ is the probability that category $j$ will appear in the spatial context of category $i$. When training the POI category vectors, the co-occurrence probability ratio is better able to measure the correlation between the POI categories than the co-occurrence probability itself. This helps to distinguish the different POI categories and accurately express the semantic characteristics of the various POIs. As an example, let the small POI categories include colleges and universities, libraries, supermarkets, Chinese restaurants, metallurgical chemical units, industrial parks, etc., and let $i$ = colleges and universities, $j$ = metallurgical chemical units, and $k$ = any other POI category. Libraries in urban spaces are usually close to colleges and universities, while the adjacency relationship between libraries and metallurgical chemical units is weak. When $k$ = libraries, the value of $P_{i,k} / P_{j,k}$ will be large. Industrial parks and universities possess a weaker proximity relationship, while industrial parks have a stronger proximity relationship with metallurgical chemical units. When $k$ = industrial parks, the value of $P_{i,k} / P_{j,k}$ will be small. In addition, there are supermarkets distributed near both metallurgical chemical units and universities. So, when $k$ = supermarkets, the value of $P_{i,k} / P_{j,k}$ will be close to 1.
If the POIs were organized using the linear approach described in Section 3.2, the co-occurrence probability ratios would misrepresent the correlation of real-world urban facilities. For example, the following two situations can occur: (1) Metallurgical chemical units and industrial parks belong to different parcels, so their co-occurrences are not counted, resulting in a low co-occurrence probability; (2) In suburban areas with larger parcels, colleges and universities may be in the same parcel as an industrial park, which erroneously makes them co-occur more frequently and leads to a higher co-occurrence probability. Both cases make the value of $P_{i,k} / P_{j,k}$ larger when $k$ = industrial parks. The ratio then suggests a stronger correlation between industrial parks and higher education institutions, contrary to reality.
With the non-linear organization of POIs used in this paper, these two situations do not occur. Although metallurgical chemical units and industrial parks are in different parcels, their co-occurrences are counted because their road network distance is less than the threshold. Universities and industrial parks may be in the same parcel, but their road network distance is larger than the threshold, so their co-occurrences are not counted. The non-linear organization of POIs therefore enables the co-occurrence probability ratios to correctly reflect the correlation between POI types.
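The behavior of the co-occurrence probability ratio can be illustrated numerically. The counts below are hypothetical and are not taken from the Zhengzhou data; they are chosen only so that libraries co-occur mostly with universities, industrial parks mostly with metallurgical chemical units, and supermarkets equally with both. The helper names are illustrative as well.

```python
def conditional_probabilities(X):
    """P[i][k] = X[i][k] / X_i, where X_i is the row sum for category i."""
    P = {}
    for i, row in X.items():
        row_sum = sum(row.values())
        P[i] = {k: count / row_sum for k, count in row.items()}
    return P

def probability_ratio(P, i, j, k):
    """Ratio P(k | i) / P(k | j); assumes P[j][k] is non-zero."""
    return P[i][k] / P[j][k]

# Hypothetical co-occurrence counts (rows: category i, columns: context k).
X = {
    "colleges and universities": {
        "libraries": 40, "industrial parks": 2, "supermarkets": 30,
    },
    "metallurgical chemical units": {
        "libraries": 2, "industrial parks": 40, "supermarkets": 30,
    },
}

P = conditional_probabilities(X)
i, j = "colleges and universities", "metallurgical chemical units"
ratios = {k: probability_ratio(P, i, j, k) for k in X[i]}
for k, value in ratios.items():
    print(k, round(value, 2))
```

With these counts the ratio is far above 1 for libraries, far below 1 for industrial parks, and equal to 1 for supermarkets, which is the pattern described in the text.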
From the example above, a generalized model underlying the loss function of the GloVe model can be expressed as follows:
$$g\left((v_i - v_j)^T v_k\right) = \frac{P_{i,k}}{P_{j,k}} = \frac{g(v_i^T v_k)}{g(v_j^T v_k)}$$
where $v_i$, $v_j$, and $v_k$ represent the feature vectors for the POI categories $i$, $j$, and $k$, respectively. If $g$ is taken to be the exponential function applied to $(v_i - v_j)^T v_k$, the model becomes:
$$\frac{P_{i,k}}{P_{j,k}} = \frac{\exp(v_i^T v_k)}{\exp(v_j^T v_k)}$$
To keep the model as simple as possible, this equation needs to be simplified. Equating the numerators and the denominators on both sides gives $v_i^T v_k = \log P_{i,k} = \log X_{i,k} - \log X_i$. The term $\log X_i$ does not depend on $k$ and can be absorbed into a bias term, and a second bias term is added to maintain the symmetry between the two categories. So, for a pair of categories $i$ and $j$, the bias terms $b_i$ and $b_j$ are added to the left-hand side of the equation:
$$v_i^T v_j + b_i + b_j = \log(X_{i,j})$$
Based on the principle that the higher the co-occurrence frequency of the POI categories, the greater the weight, a weight term should be added to the cost function:
$$J = \sum_{i,j}^{N} f(X_{i,j}) \left(v_i^T v_j + b_i + b_j - \log(X_{i,j})\right)^2$$
Since the co-occurrence matrix is sparse, the weight must vanish when $X_{i,j} = 0$, where $\log(X_{i,j})$ is undefined. The weight should also not increase or decrease significantly when the co-occurrence frequency is very large or very small. The weight function therefore needs to be:
$$f(x) = \begin{cases} (x / x_{max})^{0.75}, & x < x_{max} \\ 1, & x \geq x_{max} \end{cases}$$
Because the number of POI types is smaller than the number of words in a semantic space, the co-occurrence frequencies of POI types are greater than those of words. The $x_{max}$ value in this paper therefore needs to be set higher than the 100 used in the original model; it was finally set to 400 after multiple experiments.
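The weight function and cost function above can be written out directly. The following is a minimal sketch that evaluates the cost for given vectors and biases; it does not perform training. It follows the single-vector form of the equations in this section, uses the $x_{max}$ = 400 reported in the text, and takes the exponent 0.75 from the weight function. The function names and the toy inputs are illustrative assumptions.

```python
import math

X_MAX = 400   # cut-off reported in the text
ALPHA = 0.75  # exponent of the weight function

def weight(x, x_max=X_MAX, alpha=ALPHA):
    """f(x): grows as (x / x_max)^alpha below the cut-off, then saturates at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_cost(X, vectors, biases):
    """J = sum over pairs of f(X_ij) * (v_i . v_j + b_i + b_j - log X_ij)^2."""
    cost = 0.0
    for i, row in X.items():
        for j, x_ij in row.items():
            if x_ij == 0:
                continue  # f(0) = 0, and log(0) is undefined
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            error = dot + biases[i] + biases[j] - math.log(x_ij)
            cost += weight(x_ij) * error ** 2
    return cost

# Toy inputs, hypothetical: two categories with 2-dimensional vectors.
X = {"libraries": {"universities": 400}, "universities": {"libraries": 400}}
vectors = {"libraries": [0.0, 0.0], "universities": [0.0, 0.0]}
biases = {"libraries": 0.0, "universities": 0.0}

print(weight(100), weight(400), glove_cost(X, vectors, biases))
```

Pairs with zero co-occurrence contribute nothing, frequent pairs are capped at weight 1, and the cost falls to zero when $v_i^T v_j + b_i + b_j$ equals $\log(X_{i,j})$ for every counted pair.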
The dimension of the POI category vectors is a key parameter in the GloVe model. As the number of POI categories is not as high as the number of words in a semantic space, the dimension of the POI category vectors was set to 70 (Table 3). To analyze the spatial distribution of the 70-dimensional POI category vectors, a dimensionality reduction technique, t-SNE, was used to mine the information from the higher-dimensional data. t-SNE maps all the POI category vectors into a 3D semantic space, as shown in Figure 7. After dimensionality reduction, POI categories with similar or related spatial semantic information are closer to one another. The example in Figure 7a contains various POI categories, including government agencies, medical and healthcare facilities, and sports and leisure facilities. These kinds of resources are usually located in the public service and management areas of a city. In particular, as China Telecom and China Mobile are telecommunications companies with the same role in an urban space, their branches almost overlap in the 3D semantic space. The large POI categories in Figure 7b include catering services, shopping services, and public facilities. At the level of the smaller categories, restrooms, casual dining places, Suning stores, and Gome stores all appear together in shopping malls, where they play a similar role in meeting people’s needs for leisure and entertainment. It is these kinds of precise and nuanced distinctions that justify the use of smaller POI categories.

3.4. Regional Feature Expression and Clustering

To accurately measure the similarity of different urban regions, it is necessary to construct regional feature vectors. The regional features in this study were constructed at a Parcel scale, so all of the POIs in each Parcel needed to be obtained. Then, different functional clusters were obtained through clustering.
Previous studies have used the weighted average of all the word vectors in a text to measure similarity [36,37]. Here, the POI category vectors obtained via the GloVe model were initially treated as word vectors, with the POI categories in each Parcel being treated as the text. The weighted average of the word vectors was then calculated to obtain the feature vector for each Parcel:
$$\mathrm{Parcel\_vec}_i = \frac{\sum_{j=1}^{N} \mathrm{type}(p_{i,j})}{N}$$
where $\mathrm{Parcel\_vec}_i$ represents the feature vector of the $i$-th Parcel; $N$ is the total number of POIs in the Parcel; and $\mathrm{type}(p_{i,j})$ is the category vector of the $j$-th POI, $p_{i,j}$, in the $i$-th Parcel.
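The averaging step can be sketched as follows, under the assumption that every POI in a Parcel contributes the vector of its category, so that categories occurring more often in the Parcel carry more weight. The 2-dimensional vectors and category names are hypothetical (the study uses 70 dimensions), and `parcel_vector` is an illustrative name.

```python
def parcel_vector(poi_categories, category_vectors):
    """Average the category vectors of all POIs found in one Parcel."""
    if not poi_categories:
        raise ValueError("a Parcel without POIs has no feature vector")
    dim = len(next(iter(category_vectors.values())))
    total = [0.0] * dim
    for category in poi_categories:
        for d, value in enumerate(category_vectors[category]):
            total[d] += value
    return [component / len(poi_categories) for component in total]

# Hypothetical trained category vectors (2-D for readability).
category_vectors = {
    "libraries": [1.0, 0.0],
    "supermarkets": [0.0, 1.0],
}

# One Parcel containing three libraries and one supermarket.
parcel_pois = ["libraries", "libraries", "libraries", "supermarkets"]
print(parcel_vector(parcel_pois, category_vectors))
```

Because the sum runs over POIs rather than over distinct categories, the three libraries pull the Parcel vector three times as strongly as the single supermarket.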
Having obtained the Parcel feature vectors, we clustered them using K-means++, a modified K-means algorithm. When the cluster centers are initialized, they are placed as far away from one another as possible, which reduces the risk of converging to a poor local optimum. The prior parameter, K, of the K-means++ algorithm determines the clustering effect. The elbow and silhouette coefficient methods were used to select the value of K objectively. The key index in the elbow method is the Sum of Squared Errors (SSE). As K increased, the SSE first decreased sharply, then flattened out (Figure 8a). The relationship between the K and SSE values takes the form of an elbow, with the elbow being at K = 5. In the silhouette coefficient method, the larger the value, the better the clustering effect. As shown in Figure 8b, the value was at its largest at K = 5. So, in this study, K = 5 was chosen as the optimal number of clusters.
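The seeding rule and the SSE used by the elbow method can be sketched with the standard library alone. This is a generic illustration of K-means++ on hypothetical 2-D points, not the configuration used in the study. The seeding draws each new center with probability proportional to its squared distance from the nearest existing center, which favors far-apart centers without guaranteeing them. The silhouette coefficient is omitted for brevity.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(points, k, rng):
    """K-means++ seeding: sample new centers proportionally to squared distance."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        total = sum(d2)
        if total == 0:  # every point already coincides with a center
            centers.append(rng.choice(points))
            continue
        r = rng.uniform(0, total)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if w > 0 and acc >= r:
                centers.append(p)
                break
    return centers

def kmeans(points, k, seed=0, iterations=50):
    """Lloyd iterations after K-means++ seeding; returns (centers, SSE)."""
    rng = random.Random(seed)
    centers = kmeanspp_init(points, k, rng)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse

# Hypothetical feature vectors forming three well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

sse_by_k = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

Plotting `sse_by_k` against K gives the kind of curve from which the elbow is read; in the study this curve is the one shown in Figure 8a.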

3.5. Identification of the Urban Functional Areas

After the regional clusters with similar functions have been obtained by clustering, an actual spatial meaning needs to be assigned to each of them so as to identify the functional areas. There are many ways of classifying urban functional areas. Here, we took the perspective of people’s everyday lives and social activities. The functional areas therefore included categories such as business areas, residential areas, public service areas, industrial areas and scenic spots [38,39]. Existing data were used to label each regional cluster according to the following three indicators:
(1) The category ratio. The frequency density (FD) and category ratio (CR) of each large POI category in each regional cluster were calculated to obtain the distribution of each POI category within the different regional clusters, using the following:
$$FD_i = \frac{n_i}{N_i}$$
$$CR_i = \frac{FD_i}{\sum_{i=1}^{n} FD_i} \times 100\%$$
where $i$ is the POI category; $n_i$ is the number of POIs of category $i$ in the regional cluster; $N_i$ is the total number of POIs of category $i$; $FD_i$ is the frequency density of POI category $i$ in the regional cluster; and $CR_i$ represents the ratio of the frequency density of POI category $i$ in the regional cluster to that of all the POIs;
(2) Enrichment factor. As certain POI categories occur very frequently in urban spaces, the FD and CR of the POIs cannot fully reflect the attributes of the regional clusters. It is therefore necessary to add in an enrichment factor (EF) for each category of POI. This can be calculated as follows:
$$EF_{ij} = \frac{N_{ij} / N_i}{N_j / N}$$
where $EF_{ij}$ represents the EF of the category $j$ POIs in the $i$-th regional cluster; $N_{ij}$ represents the number of category $j$ POIs in the $i$-th regional cluster; $N_i$ represents the number of all POIs in the $i$-th regional cluster; $N_j$ is the number of category $j$ POIs in the whole study area; and $N$ denotes the total number of POIs in the study area;
(3) Population heat value. Human activities are closely related to the spatial structure of cities, and changes in individual activity can reflect the urban functions that an area performs. The population heat value represents the number of individuals active in an area and is the basis for analyzing human behavior. Studying how it changes over time helps to label the regional clusters obtained by clustering. It compensates for errors that arise when functional areas are labelled using only the EF and CR values of the POIs, and makes the labelling more rigorous. Within the area covered by each regional cluster, the population heat values were analyzed statistically for 24 time periods on weekdays and rest days. The results reflect how the population aggregates and disperses over different time periods: the higher the value, the more the local population aggregates in that period, and vice versa. For instance, in residential areas the heat value during the morning and evening peak hours on workdays was very high, exceeding the values for other time periods.
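The CR and EF indicators defined in (1) and (2) can be computed directly from POI counts. The sketch below is illustrative only: the category names and counts are invented toy values, and the function names are assumptions rather than part of the original workflow.

```python
def category_ratio(cluster_counts, total_counts):
    """CR_i in percent: FD_i = n_i / N_i, normalized by the sum of all FD_i."""
    fd = {c: cluster_counts.get(c, 0) / total_counts[c] for c in total_counts}
    fd_sum = sum(fd.values())
    return {c: 100.0 * value / fd_sum for c, value in fd.items()}


def enrichment_factor(cluster_counts, total_counts):
    """EF_ij = (N_ij / N_i) / (N_j / N) for one regional cluster i."""
    n_cluster = sum(cluster_counts.values())  # N_i: all POIs in the cluster
    n_all = sum(total_counts.values())        # N: all POIs in the study area
    return {c: (cluster_counts.get(c, 0) / n_cluster) / (total_counts[c] / n_all)
            for c in total_counts}


# Toy counts (assumption): POIs per category in the whole study area
# and in one regional cluster.
total_counts = {"restaurant": 1000, "park": 50, "school": 200}
cluster_counts = {"restaurant": 100, "park": 25, "school": 20}

cr = category_ratio(cluster_counts, total_counts)
ef = enrichment_factor(cluster_counts, total_counts)
for category in total_counts:
    print(category, round(cr[category], 2), round(ef[category], 3))
```

In this toy cluster, parks account for only 25 of 145 POIs, yet both indicators single them out because half of all parks in the study area fall inside the cluster. This is the effect the EF is meant to capture for categories that are rare overall.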

4. Results

4.1. Functional Area Identification

The proposed method was used to obtain functional clustering results for 1017 parcels in Zhengzhou City (Figure 9). The CR and EF values for each POI category in each regional cluster were also acquired (Table 4), together with the population heat values for a workday and a weekend (Figure 10 and Figure 11). The regional clusters were then labeled and the identification results analyzed:
(1) C0: Residential areas
Comparing the CR and EF values of the POIs horizontally with those of the other regional clusters shows that the CR value of the commercial residence POIs (Table 4) is highest in this regional cluster. A longitudinal comparison shows that the different types of POIs are evenly distributed. This is in line with the spatial distribution characteristics of residential areas, which are surrounded by medical facilities, sports and leisure venues, accommodation, restaurants, shops, schools and other infrastructure that provides services to residents. As residential areas are the principal areas in which people live, their overall population heat value was higher than in other areas. The heat value was especially high during commuting hours on weekdays, between 7:00 and 9:00 and between 17:00 and 19:00. At weekends, the heat value was stable during the day, with a peak at around 19:00;
(2) C1: Science, Education and Public Service areas
The CR value of the POIs shows that there are numerous science, education, and culture-related venues in this regional cluster, such as Zhengzhou University, Henan University of Technology, and Henan University of Traditional Chinese Medicine. Transportation facilities, sports and leisure, restaurant and shopping POIs also have high values. This is consistent with the infrastructure that typically surrounds schools to meet the shopping, dining, sports and transport needs of students and faculty. The heat value in this area remained high during the daytime from 8:00 to 18:00 on weekdays, with a peak at 12:00, which agrees with what is known about students' daily behavior. At weekends, the heat value was significantly lower;
(3) C2: Commercial areas
The CR and EF values for the shopping service and restaurant POIs in this regional cluster were high. These areas are dominated by shopping malls and business (office) buildings, such as Zhenghong City, Xidi Port, Wanda Plaza, and Zhongke Building. On weekdays, the heat value for this area was especially high from 17:00 to 20:00, which agrees with what is known about people using these facilities after work for dining, shopping and entertainment. The overall heat values were higher on rest days than on weekdays, confirming that people concentrate in commercial areas at weekends;
(4) C3: Natural scenic spots
Tourist-site POIs in this regional cluster had the highest EF and CR values in both the horizontal and the longitudinal comparison. These areas included numerous tourist attractions and parks, such as Longhu Park, Xiliu Lake Park, Zhengzhou Forest, and People's Park. They also contained a range of public facilities (e.g., public restrooms), restaurants, convenience stores, hotels and other basic service facilities. The population heat value was distributed relatively uniformly across time periods on working days, but was generally higher at weekends. The peak value occurred between 10:00 and 16:00 at weekends, which agrees with people's known travelling behavior at weekends;
(5) C4: Industrial areas
In terms of EF and CR values, corporation-type POIs have the highest values in this regional cluster, while scenic-type and government-type POIs also have relatively high values. This is because, in addition to corporations, factories, and agricultural, forestry, animal husbandry and fishery sites, industrial areas also include some basic service facilities, such as parks, restaurants, and convenience stores. There are also government agencies nearby to regulate and control the industries, so the number of government agencies is relatively large as well. Geographically, this regional cluster is mainly distributed on the periphery of the city, which is in line with the actual distribution of industrial zones. The population heat value was at its highest from 9:00 to 18:00 on workdays and lower at weekends, matching the daily working behavior of residents;
(6) C5: Unidentified areas
These areas were not analyzed in this paper because they have insufficient POI data, low overall heat values, and a small overall area with no specific function.
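The temporal cues used in the labelling above (peak hours and the contrast between weekdays and weekends) can be summarized from the 24 hourly heat values of a regional cluster. The following sketch is illustrative only: the hourly values are invented, and the summary features and function name are assumptions rather than the procedure used in this study.

```python
def heat_profile_summary(weekday, weekend):
    """Summarize two 24-value hourly heat profiles for one regional cluster."""
    if len(weekday) != 24 or len(weekend) != 24:
        raise ValueError("each profile needs exactly 24 hourly values")
    return {
        "weekday_peak_hour": max(range(24), key=weekday.__getitem__),
        "weekend_peak_hour": max(range(24), key=weekend.__getitem__),
        # A ratio above 1 means the area is busier at weekends overall.
        "weekend_to_weekday_ratio": sum(weekend) / sum(weekday),
    }


# Toy profiles (assumption): flat baseline with one peak, loosely
# imitating the pattern described for commercial areas.
weekday = [10] * 24
weekday[18] = 50
weekend = [15] * 24
weekend[15] = 60

summary = heat_profile_summary(weekday, weekend)
print(summary)
```

A weekday peak in the early evening combined with a ratio above 1 is the kind of pattern the text associates with commercial areas, whereas a ratio well below 1 with a daytime plateau matches the description of industrial areas.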

4.2. Validation

To assess the accuracy of the results we acquired, the functional area identification results were compared with Baidu online maps and Baidu Street View. Table 5 shows a comparison of some typical areas.
Area A in Table 5 gives the map and street view of Longhu Wetland Park, which is a scenic spot. The identification result also categorized this as a natural scenic spot (C3). Area B is Zhengzhou University, which belongs to the science, education, and culture-related category. Again, this is consistent with the identification result (C1). Area C contains Grand View International Trade, World trade shopping center and Silverbase Plaza, which are business service-related areas. As the identification result is a business area (C2), this, too, is accurate. Area D has multiple residential areas, such as Zhengshang Goldfield Family and Wu Jian Xin Neighborhood. The identification result was also a residential area (C0). Area E incorporates many different companies and even has an Enterprise Park. Here, the identification result is an industrial area (C4), which is consistent with reality.
Together these results suggest that the proposed method was able to accurately classify the urban functional areas of Zhengzhou City.

4.3. Comparative Analysis

To assess the relative merits of our proposed approach in relation to other possibilities, the outcomes of using the popular LDA model (Figure 12a), the Word2Vec model (Figure 12b) and a traditional GloVe model (Figure 12c) were compared with the results of using the proposed method. Experts with a geographical background were invited to label each parcel by function. The functional area types included residential areas; scientific research, education, and public service areas; scenic spots; business areas; and industrial areas. The results of the four methods were then compared with the manual classifications (Figure 12e). The confusion matrix (Figure 13), overall accuracy and kappa coefficient (Table 6) were calculated using the manual classification results as the reference.
(1) The identification results of the LDA model deviate considerably in both the urban center and suburban areas (Figure 14). In the urban center, multiple Science, Education and Public Service areas are classified as residential areas, while multiple industrial sites in suburban areas are not correctly classified. The confusion matrix shows that the probability of correct classification is low for both of these functional area types (Figure 13a). This is because LDA is an unsupervised probabilistic topic model that uses a bag-of-words method to generate topics, which is independent of the order in which words are distributed in the document. Hence, the model essentially judges the type of functional area by the frequency of POIs, and this criterion has limitations. For example, a public service area may contain only one school but multiple dormitories and family homes. The probability of residential POIs occurring is then greater than that of scientific and educational POIs, so the area is wrongly classified as residential. Similarly, there are many services and infrastructure facilities near factories on industrial land, and these facilities outnumber the factories, so industrial land is not classified correctly;
(2) Whereas the LDA model considers only POI frequency, the Word2Vec model also takes into account the spatial correlation of POIs. The per-class accuracy in the confusion matrix (Figure 13b) shows that this method improves functional area identification to some extent. However, since the model is trained only on separate local contexts and not on global co-occurrence counts, important co-occurrence statistics of POIs are ignored, resulting in the inaccurate identification of some regions. For instance, area A (Figure 15a) in the Word2Vec model identification results is the location of commercial shopping centers such as Zhenghong City and Jianye Kaixuan Plaza, while area B (Figure 15b) is the Xidi Port shopping center. Both are commercial areas but are not classified as such. Areas C (Figure 15c) and D (Figure 15d), which both contain multiple steel-casting and auto parts businesses, have been assigned to different categories. Area D is correct, but area C is not;
(3) The traditional GloVe model utilizes global co-occurrence information to train the POI-type vectors, and its overall accuracy and Kappa coefficient (Table 6) are higher than those of the first two methods. However, when organizing POI relationships, it considers only linear location relationships between POIs, which introduces errors into the global co-occurrence statistics and limits the quality of functional area identification. Areas A (Figure 16a) and B (Figure 16b) in the traditional GloVe model identification results are the locations of North China University of Water Conservancy and Hydropower and Henan University of Finance and Economics and Government, respectively, both of which are scientific research, education, and public service areas. However, they have been given different categories. Area C (Figure 16c) is Zhengzhou Xiliu Lake Park and area D (Figure 16d) is Longhu Wetland Park. Both of these areas are typical scenic spots, yet they have not been assigned to the same category.
The identification results for the proposed method are shown in Figure 12d. As can be seen, the areas incorrectly classified by the above two methods were all correctly classified, and the results are much more consistent with the actual functional areas. The confusion matrix (Figure 13d) shows that the accuracy of the proposed method was higher across all categories than that of a traditional GloVe model based on the POIs' linear spatial relationships. The identification accuracy for the science, education, culture and public service areas, industrial areas, and scenic spots was especially high, though the accuracy for business areas was slightly lower. Table 6 shows that the Kappa coefficient and overall accuracy of the proposed method were 0.74 and 0.80, respectively, which is better in both cases than the results produced by the other methods. This shows that the relationship between POI types and functional areas can be modeled more accurately by using road network constraints to mine POI proximity information. Taking the nonlinear location relationship of POIs into account therefore improves functional area identification.
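For reference, the overall accuracy and Kappa coefficient reported above are standard quantities derived from a confusion matrix. The sketch below shows the calculation on invented toy labels; it does not reproduce the data behind Figure 13 or Table 6, and the function names are assumptions.

```python
from collections import Counter


def confusion_matrix(truth, predicted, labels):
    """Rows are reference (manual) labels, columns are predicted labels."""
    counts = Counter(zip(truth, predicted))
    return [[counts[(t, p)] for p in labels] for t in labels]


def overall_accuracy(matrix):
    """Share of parcels on the diagonal of the confusion matrix."""
    total = sum(map(sum, matrix))
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def cohen_kappa(matrix):
    """Kappa = (p_o - p_e) / (1 - p_e), with p_e the chance agreement."""
    total = sum(map(sum, matrix))
    p_o = overall_accuracy(matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


# Toy labels (assumption): ten parcels, three functional area types.
labels = ["residential", "commercial", "industrial"]
truth = ["residential"] * 4 + ["commercial"] * 3 + ["industrial"] * 3
predicted = ["residential", "residential", "residential", "commercial",
             "commercial", "commercial", "residential",
             "industrial", "industrial", "residential"]

matrix = confusion_matrix(truth, predicted, labels)
oa = overall_accuracy(matrix)
kappa = cohen_kappa(matrix)
print(matrix, round(oa, 3), round(kappa, 3))
```

Because Kappa discounts the agreement expected by chance, it is always lower than the overall accuracy whenever the classification is imperfect, which is consistent with the pair of values reported in Table 6.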

4.4. Discussion

The proposed GloVe model identification method, which takes into account the nonlinear location relationship of POIs, can effectively identify urban functional areas. Our method not only mines POI co-occurrence information from the city as a whole, but also takes into account the interaction information of POIs between different parcels. It therefore captures both geographical proximity and structural continuity. In addition, whereas previous studies used Euclidean distance to mine proximity relationships, the method in this paper uses road network distance, which makes the POI proximity information more realistic. In summary, it mines more comprehensive POI semantic information and provides a more effective functional area identification result.
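The difference between road network distance and straight-line distance can be made concrete with a small sketch. It assumes, purely for illustration, that POIs sit on the nodes of a weighted road graph and that two POIs count as adjacent when their shortest-path distance is within a threshold. The graph, the threshold, and the function names are hypothetical and do not reproduce the adjacency rule used in this paper.

```python
import heapq
import math
from collections import defaultdict


def build_graph(edges):
    """Undirected road graph: node -> list of (neighbor, length)."""
    graph = defaultdict(list)
    for u, v, length in edges:
        graph[u].append((v, length))
        graph[v].append((u, length))
    return graph


def network_distances(graph, source):
    """Dijkstra shortest-path distances from source along the road graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, length in graph.get(u, ()):
            nd = d + length
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def cooccurrence(graph, poi_category, max_dist):
    """Count category pairs whose POIs lie within max_dist along the network."""
    counts = defaultdict(int)
    pois = sorted(poi_category)
    for i, a in enumerate(pois):
        dist = network_distances(graph, a)
        for b in pois[i + 1:]:
            if dist.get(b, math.inf) <= max_dist:
                pair = tuple(sorted((poi_category[a], poi_category[b])))
                counts[pair] += 1
    return dict(counts)


# Toy road graph (assumption): edge lengths in meters.
graph = build_graph([("p1", "p2", 100), ("p2", "p3", 100), ("p3", "p4", 400)])
poi_category = {"p1": "school", "p2": "restaurant", "p3": "school", "p4": "park"}

counts = cooccurrence(graph, poi_category, max_dist=250)
print(counts)
```

Under this rule the park at `p4` never co-occurs with the schools, however close it may be in a straight line, because the only road connecting them is 400 m long. Counts of this kind, accumulated over the whole city, would populate a co-occurrence matrix like the one in Table 2.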
Nonetheless, the proposed method is not without limitations. As central urban areas tend to have a rich mixture of different functions, there is some deviation between the identification results for the central area (Figure 17a) and the actual functional categories (Figure 17b). The identification accuracy for business districts is particularly low (Figure 13d). This is mainly because business areas are heavily concentrated in central areas, where they are often mixed with other functional areas such as residential areas and government agencies.
Another limitation of the approach in this paper is that both the big data and the model are only simplifications of the city. The identification of urban functional areas is a very complex process involving many factors, such as population growth and economic activities. This paper uses only the population heat value derived from mobile phone signaling data to reflect the aggregation and travel characteristics of the population. This reflects residents' behavior patterns to a certain extent but is not comprehensive, so the results may be biased. Further research could fuse spatiotemporal data such as floating vehicle trajectory data, subway data, and social media data.

5. Conclusions

This paper proposed a GloVe-based model for urban functional area identification that considers the nonlinear spatial relationship between POIs. POI proximity information based on the road network was mined to construct a co-occurrence matrix, and a GloVe model was used to train vectors for the small POI categories. Parcel feature vectors were then constructed to measure the similarity between the different areas. After this, a clustering algorithm was used to split the parcels into six categories. CR, EF, and population heat values were used to assign different functions to the six regional clusters. The method was applied to urban areas in Zhengzhou City and the city's different functional areas were obtained. The identification results were then compared with Baidu maps to verify the accuracy of the proposed method. The performance of the proposed method was also compared with LDA, Word2Vec and a traditional GloVe model. The proposed method had a higher identification accuracy for urban functional areas. This suggests that global co-occurrence information mined along the road network is closer to the real relationships between POIs, which allows the relationship between POI types and urban functions to be modeled more accurately and improves functional area identification.
The proposed method can be a useful tool for assessing changes in the function of built-up areas that result from human economic activities. It can also be a good supplement where conventional cadastral data are difficult to obtain. The findings of this study offer a source of reference for identifying and understanding complex urban spatial structures and their functional configuration. The proposed method can also assist in the selection of urban sites for different functions, urban master planning, and the construction of smart cities.
In the future, we will fuse multiple spatial and temporal data with traditional data as a means of identifying functional changes in regions. In addition, this study assigned only a single function to each region, without taking into account that city center areas often have several functional attributes. Quantifying the intensity of the different functions within a region is therefore a further direction for subsequent work.

Author Contributions

Conceptualization, Yue Chen and Haizhong Qian; methodology, Yue Chen; formal analysis, Xiao Wang; investigation, Lijian Han; writing (original draft preparation), Yue Chen; writing (review and editing), Haizhong Qian and Xiao Wang; visualization, Yue Chen; supervision, Xiao Wang and Di Wang; funding acquisition, Haizhong Qian. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by The Natural Science Foundation for Distinguished Young Scholars of Henan Province, grant number 212300410014.

Data Availability Statement

The original POI data and mobile phone signal heat data were provided by Gaode Map and China Unicom Ltd., respectively. Restrictions apply to the availability of these data, which were used under license for this study.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Batty, M. Cities as Complex Systems: Scaling, Interaction, Networks, Dynamics and Urban Morphologies; Springer: New York, NY, USA, 2009.
  2. Anas, A.; Arnott, R.; Small, K.A. Urban Spatial Structure. J. Econ. Lit. 1998, 36, 1426–1464.
  3. Yuan, N.J.; Zheng, Y.; Xie, X.; Wang, Y.; Zheng, K.; Xiong, H. Discovering Urban Functional Zones Using Latent Activity Trajectories. IEEE Trans. Knowl. Data Eng. 2015, 27, 712–725.
  4. Couch, C. Urban Planning: An introduction; Bloomsbury Publishing: London, UK, 2017; p. 45.
  5. Cai, J.; Huang, B.; Song, Y. Using multi-source geospatial big data to identify the structure of polycentric cities. Remote Sens. Environ. 2017, 202, 210–221.
  6. Hu, S.; Wang, L. Automated urban land-use classification with remote sensing. Int. J. Remote Sens. 2013, 34, 790–803.
  7. Shao, H.; Li, Y.; Ding, Y.; Zhuang, Q.; Chen, Y. Land Use Classification Using High-Resolution Remote Sensing Images Based on Structural Topic Model. IEEE Access 2020, 8, 215943–215955.
  8. Castelluccio, M.; Poggi, G.; Sansone, C.; Verdoliva, L. Land use classification in remote sensing images by convolutional neural networks. arXiv 2015, arXiv:1508.00092.
  9. Liu, Y.; Liu, X.; Gao, S.; Gong, L.; Kang, C.; Zhi, Y.; Chi, G.; Shi, L. Social Sensing: A New Approach to Understanding Our Socioeconomic Environments. Ann. Assoc. Am. Geogr. 2015, 105, 512–530.
  10. Zheng, Y.; Capra, L.; Wolfson, O.; Yang, H. Urban Computing: Concepts, Methodologies, and Applications. ACM Trans. Intell. Syst. Technol. 2014, 5, 38.
  11. Hu, Y.; Han, Y. Identification of Urban Functional Areas Based on POI Data: A Case Study of the Guangzhou Economic and Technological Development Zone. Sustainability 2019, 11, 1385.
  12. Yuan, J.; Zheng, Y.; Xie, X. Discovering regions of different functions in a city using human mobility and POIs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 186–194.
  13. Yi, D.; Yang, J.; Liu, J.; Liu, Y.; Zhang, J. Quantitative Identification of Urban Functions with Fishers’ Exact Test and POI Data Applied in Classifying Urban Districts: A Case Study within the Sixth Ring Road in Beijing. ISPRS Int. J. Geo-Inf. 2019, 8, 555.
  14. Gao, S.; Janowicz, K.; Couclelis, H. Extracting urban functional regions from points of interest and human activities on location-based social networks. Trans. GIS 2017, 21, 446–467.
  15. Wang, Z.; Ma, D.; Sun, D.; Zhang, J. Identification and analysis of urban functional area in Hangzhou based on OSM and POI data. PLoS ONE 2021, 16, e0251988.
  16. Liu, X.; He, J.; Yao, Y.; Zhang, J.; Liang, H.; Wang, H.; Hong, Y. Classifying urban land use by integrating remote sensing and social media data. Int. J. Geogr. Inf. Sci. 2017, 31, 1675–1696.
  17. Tu, W.; Cao, J.; Yue, Y.; Shaw, S.-L.; Zhou, M.; Wang, Z.; Chang, X.; Xu, Y.; Li, Q. Coupling mobile phone and social media data: A new approach to understanding urban functions and diurnal patterns. Int. J. Geogr. Inf. Sci. 2017, 31, 2331–2358.
  18. Chen, Y.; Liu, X.; Li, X.; Liu, X.; Yao, Y.; Hu, G.; Xu, X.; Pei, F. Delineating urban functional areas with building-level social media data: A dynamic time warping (DTW) distance based k-medoids method. Landsc. Urban Plan. 2017, 160, 48–60.
  19. da Silva Lopes, H.T.; Remoaldo, P.C.A.C.; Ribeiro, V. The use of photos of the social networks in shaping a new tourist destination: Analysis of clusters in a GIS environment. In Spatial Analysis, Modelling and Planning; IntechOpen: London, UK, 2018.
  20. Xue, F.; Li, X.; Lu, W.; Webster, C.J.; Chen, Z.; Lin, L. Big Data-Driven Pedestrian Analytics: Unsupervised Clustering and Relational Query Based on Tencent Street View Photographs. ISPRS Int. J. Geo-Inf. 2021, 10, 561.
  21. Chen, Z.; Gong, X.; Xie, Z. An analysis of movement patterns between zones using taxi GPS data. Trans. GIS 2017, 21, 1341–1363.
  22. Liu, X.; Gong, L.; Gong, Y.; Liu, Y. Revealing travel patterns and city structure with taxi trip data. J. Transp. Geogr. 2015, 43, 78–90.
  23. Liu, X.; Kang, C.; Gong, L.; Liu, Y. Incorporating spatial interaction patterns in classifying and understanding urban land use. Int. J. Geogr. Inf. Sci. 2016, 30, 334–350.
  24. Liu, X.; Tian, Y.; Zhang, X.; Wan, Z. Identification of Urban Functional Regions in Chengdu Based on Taxi Trajectory Time Series Data. ISPRS Int. J. Geo-Inf. 2020, 9, 158.
  25. Tobler, W.R. A computer movie simulating urban growth in the Detroit region. Econ. Geogr. 1970, 46, 234–240.
  26. Yao, Y.; Li, X.; Liu, X.; Liu, P.; Liang, Z.; Zhang, J.; Mai, K. Sensing spatial distribution of urban land use by integrating points-of-interest and Google Word2Vec model. Int. J. Geogr. Inf. Sci. 2017, 31, 825–848.
  27. Zhang, C.; Xu, L.; Yan, Z.; Wu, S. A GloVe-Based POI Type Embedding Model for Extracting and Identifying Urban Functional Regions. ISPRS Int. J. Geo-Inf. 2021, 10, 372.
  28. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; Volume 26, pp. 3111–3119.
  29. Zhang, X.; Du, S.; Wang, Q. Hierarchical semantic cognition for urban functional zones with VHR satellite images and POI data. ISPRS J. Photogramm. Remote Sens. 2017, 132, 170–184.
  30. Burger, W.; Burge, M.J. Principles of Digital Image Processing; Springer: Berlin/Heidelberg, Germany, 2009; Volume 111.
  31. Socher, R.; Perelygin, A.; Wu, J.Y.; Chuang, J.; Manning, C.D.; Ng, A.Y.; Potts, C. Recursive deep models for semantic compositionality over a sentiment treebank. EMNLP 2013, 1631, 1631–1642.
  32. Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
  33. Ng, H.T.; Zelle, J. Corpus-based approaches to semantic interpretation in NLP. AI Mag. 1997, 18, 45.
  34. Thurstain-Goodwin, M.; Unwin, D. Defining and Delineating the Central Areas of Towns for Statistical Monitoring Using Continuous Surface Representations. Trans. GIS 2000, 4, 305–317.
  35. Yu, W. Spatial co-location pattern mining for location-based services in road networks. Expert Syst. Appl. 2016, 46, 324–335.
  36. Zhang, D.; Xu, H.; Su, Z.; Xu, Y. Chinese comments sentiment classification based on word2vec and SVMperf. Expert Syst. Appl. 2015, 42, 1857–1863.
  37. Islam, A.; Inkpen, D. Semantic text similarity using corpus-based word similarity and string similarity. ACM Trans. Knowl. Discov. Data (TKDD) 2008, 2, 10.
  38. Ya, L.; Yalan, L.; Chang, Q.; Yuhuan, R.; Zhihao, W. Semantic information mining and remote sensing classification of urban functional areas. J. Univ. Chin. Acad. Sci. 2019, 36, 56–63.
  39. Yanyan, G.; Limin, J.; Ting, D.; Yandong, W.; Gang, X. Spatial Distribution and Interaction Analysis of Urban Functional Areas Based on Multi-source Data. Geomat. Inf. Sci. Wuhan Univ. 2018, 43, 1113.
Figure 1. Comparison of organizational relationships between POIs: (a) Existing studies use the linear relationship between POIs; (b) In this study, we make use of the non-linear relationship between POIs.
Figure 2. Core areas of Zhengzhou City.
Figure 3. Flowchart of urban functional area identification.
Figure 4. Construction steps of Parcels: (a) original road network; (b) roads after expansion operation; (c) roads by refined operation; (d) Parcel schematic diagram.
Figure 5. Distribution of POIs in the road network.
Figure 6. Graph representation of the road network.
Figure 7. The POI type vectors visualized in a three-dimensional coordinate system, with enlarged views of selected parts shown in (a,b). Note: Because the training was carried out with Chinese labels, the labels are displayed in Chinese, with an English definition below each label.
Figure 8. Determination of K value in clustering: (a) elbow method; (b) silhouette coefficient method.
Figure 9. Identification results of urban functional areas in the main areas of Zhengzhou. Note: Public Service in the legend is Science, Education and Public Service areas.
Figure 10. Population heat value of different areas in different time periods on weekdays.
Figure 11. Population heat value of different areas in different time periods on weekends.
Figure 12. Identification results of different methods. Note: Public Service in the legend is Science, Education and Public Service areas.
Figure 13. Confusion matrix of identification result based on different methods: (a) LDA; (b) Word2Vec; (c) GloVe; (d) proposed method.
Figure 14. Comparison of LDA model recognition results (a,c) and actual functional areas (b,d) in the urban center area and suburban area. Note: The urban centers are (a,b), while suburbs are (c,d).
Figure 15. Comparison of partial identification results of Word2Vec with the actual function types and Baidu map: (a) area A of Figure 12b; (b) area B of Figure 12b; (c) area C of Figure 12b; (d) area D of Figure 12b.
Figure 16. Comparison of partial identification results of GloVe with the actual function types and Baidu map: (a) area A of Figure 12c; (b) area B of Figure 12c; (c) area C of Figure 12c; (d) area D of Figure 12c.
Figure 17. Comparison of the identification result of the central area between the proposed method (a) and the actual function categories (b).
Table 1. Data used in this study.

| Data Type | Quantity | Description |
|---|---|---|
| Urban administrative division map of Zhengzhou | 1 | The core areas of Zhengzhou were selected for this study |
| Road network data | 12,149 | Highways, primary roads, secondary roads, and tertiary roads were selected to construct Parcels |
| POI data | 216,413 | 13 large categories, 128 medium categories, and 434 small categories |
| Mobile phone signaling heat data | 8,922,272 | Population heat data in the 500 m grid area in 24 time periods on a workday and a weekend day |
Table 2. The co-occurrence matrix of Figure 6.

| Quantity | A | B | C | D |
|---|---|---|---|---|
| A | 0 | 3 | 4 | 3 |
| B | 3 | 0 | 3 | 2 |
| C | 4 | 3 | 0 | 3 |
| D | 3 | 2 | 3 | 0 |
Table 3. Partial POI category vectors.
| POI Category | POI Category Vectors (Dimension = 70) |
| --- | --- |
| Outdoor fitness center | (−0.00319916, 0.00888043, −0.02442472, ……, −0.02165624) |
| Parks | (0.02811326, 0.02575135, −0.0115155, ……, −0.0042832) |
| Middle Schools | (0.04371936, 0.03249595, −0.02067991, ……, −0.05321765) |
| Factories | (0.01310952, −0.03528632, 0.01673809, ……, 0.08923801) |
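The abstract states that the feature vector of each basic research unit is obtained as a weighted average of POI category vectors. The sketch below illustrates that step in Python under loud assumptions: the three-dimensional toy vectors, the weights, and the function name `weighted_average_vector` are all hypothetical, and the weighting scheme actually used in the paper is not reproduced here.

```python
def weighted_average_vector(vectors, weights):
    """Weighted average of category vectors for one research unit."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("weights must not sum to zero")
    dim = len(next(iter(vectors.values())))
    result = [0.0] * dim
    for category, weight in weights.items():
        for i, value in enumerate(vectors[category]):
            result[i] += weight * value / total
    return result


# Toy 3-dimensional vectors (the vectors in Table 3 have 70 dimensions).
toy_vectors = {"Parks": [1.0, 0.0, 2.0], "Factories": [0.0, 4.0, 2.0]}
# Hypothetical weights, e.g. derived from the POIs present in the unit.
toy_weights = {"Parks": 3.0, "Factories": 1.0}
unit_vector = weighted_average_vector(toy_vectors, toy_weights)
print(unit_vector)
```

The resulting unit vectors are what a clustering step such as K-Means++ would then group into candidate functional areas.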
Table 4. CR and EF values of POIs.
| Category | C0 EF¹ | C0 CR² | C1 EF | C1 CR | C2 EF | C2 CR | C3 EF | C3 CR | C4 EF | C4 CR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Accommodation | 1.033 | 8.529 | 0.284 | 6.115 | 0.719 | 7.098 | 1.100 | 7.000 | 1.149 | 7.734 |
| Government agency | 0.893 | 7.374 | 1.911 | 7.990 | 0.463 | 4.572 | 1.021 | 2.461 | 1.355 | 9.120 |
| Medical and healthcare | 1.042 | 8.596 | 1.902 | 7.735 | 0.664 | 6.560 | 1.030 | 2.900 | 1.035 | 6.967 |
| Sports and leisure | 0.935 | 7.719 | 2.056 | 8.069 | 0.586 | 5.783 | 2.736 | 11.263 | 1.153 | 7.763 |
| Commercial residence | 1.014 | 8.371 | 0.815 | 6.376 | 0.667 | 6.591 | 0.429 | 0.624 | 1.051 | 7.073 |
| Public utilities | 0.963 | 7.946 | 1.578 | 7.269 | 0.144 | 2.169 | 1.850 | 8.946 | 0.783 | 5.270 |
| Science, education and culture | 1.005 | 8.294 | 2.225 | 9.731 | 0.782 | 7.724 | 1.015 | 1.641 | 0.928 | 6.250 |
| Financial and insurance | 0.882 | 7.279 | 1.021 | 7.005 | 0.362 | 3.572 | 1.014 | 1.651 | 1.483 | 9.988 |
| Transportation Facilities | 0.909 | 7.503 | 2.202 | 8.953 | 0.763 | 7.533 | 1.068 | 4.758 | 1.181 | 7.952 |
| Shopping service | 0.997 | 8.232 | 1.935 | 8.100 | 1.963 | 19.513 | 5.076 | 15.111 | 1.055 | 7.103 |
| Restaurants | 1.092 | 9.014 | 1.927 | 8.068 | 2.143 | 20.289 | 2.320 | 9.099 | 0.769 | 5.175 |
| Tourist sites | 0.650 | 5.367 | 1.033 | 7.597 | 0.439 | 4.339 | 10.547 | 33.029 | 1.387 | 9.338 |
| Corporations | 0.700 | 5.775 | 0.410 | 6.991 | 0.431 | 4.255 | 1.012 | 1.517 | 1.525 | 10.267 |
¹ EF is the enrichment factor of a POI category; the higher the value, the more enriched that category is in the region.
² CR is the category ratio; the higher the value, the larger the share of that category's density in the summed density of all POI categories in the region.
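To make the two indicators concrete, the following Python sketch computes an enrichment factor and a category ratio from raw POI counts. The formulas are assumptions based on a formulation common in POI-based studies, not formulas confirmed by this section: EF_i = (n_i / n) / (N_i / N), and CR_i = FD_i / ΣFD × 100 with frequency density FD_i = n_i / N_i, where n_i and N_i are the counts of category i in the region and in the whole study area. The function name `ef_and_cr` and all counts are hypothetical.

```python
def ef_and_cr(region_counts, city_counts):
    """Return (EF, CR) dictionaries keyed by POI category.

    Assumed definitions (not confirmed by the source):
      EF_i = (n_i / n) / (N_i / N)
      CR_i = FD_i / sum(FD) * 100, with FD_i = n_i / N_i
    """
    n = sum(region_counts.values())   # POIs in the region
    N = sum(city_counts.values())     # POIs in the whole study area
    ef, fd = {}, {}
    for category, N_i in city_counts.items():
        n_i = region_counts.get(category, 0)
        ef[category] = (n_i / n) / (N_i / N)
        fd[category] = n_i / N_i
    fd_total = sum(fd.values())
    cr = {category: 100.0 * value / fd_total for category, value in fd.items()}
    return ef, cr


# Hypothetical counts, chosen only to keep the arithmetic easy to follow.
region = {"Restaurants": 30, "Tourist sites": 5, "Corporations": 15}
city = {"Restaurants": 300, "Tourist sites": 100, "Corporations": 600}
ef, cr = ef_and_cr(region, city)
print(ef)
print(cr)
```

Under these assumed definitions, an EF above 1 means the category is over-represented in the region relative to the study area as a whole, and the CR values of one region sum to 100.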
Table 5. Verification of functional area identification results. The Baidu Map, Baidu Street View, and Identification Results columns are image panels; the locations they show are listed below.
| Function | Map color | Locations shown |
| --- | --- | --- |
| Natural Scenic | brown | Longhu Wetland Park |
| Public Service * | yellow | Zhengzhou University |
| Commercial | blue | Grand View International Trade, World trade shopping center, Silverbase Plaza |
| Residential | pink | Zhengshang Goldfield Family, Wu Jian Xin Neighborhood |
| Industry | purple | Xingda Medical Equipment Co., Enterprise Park |
* Public Service refers to science, education, and public service areas.
Table 6. Kappa coefficient and overall accuracy.
| Method | Kappa Coefficient | Overall Accuracy |
| --- | --- | --- |
| LDA | 0.330029 | 0.499472 |
| Word2Vec | 0.389353 | 0.543822 |
| GloVe | 0.635169 | 0.721987 |
| This study | 0.740416 | 0.801268 |
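For readers who want to see how the two metrics in Table 6 relate, the Python sketch below computes overall accuracy and Cohen's kappa from a confusion matrix using the standard definitions: overall accuracy is the observed agreement p_o, and kappa = (p_o − p_e) / (1 − p_e), where p_e is the agreement expected by chance. The function name and the 2 × 2 confusion matrix are hypothetical and are not the paper's validation data.

```python
def overall_accuracy_and_kappa(confusion):
    """Compute overall accuracy and Cohen's kappa from a square matrix.

    confusion[i][j] = number of units with actual class i that were
    identified as class j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of units on the diagonal.
    p_o = sum(confusion[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement from the row and column marginals.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (p_o - p_e) / (1 - p_e)
    return p_o, kappa


# Hypothetical two-class confusion matrix.
confusion = [[40, 10],
             [5, 45]]
oa, kappa = overall_accuracy_and_kappa(confusion)
print(oa, kappa)
```

Because kappa discounts chance agreement, it is lower than overall accuracy whenever p_e is positive and p_o is below 1, which is the pattern seen in every row of Table 6.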
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.


MDPI and ACS Style

Chen, Y.; Qian, H.; Wang, X.; Wang, D.; Han, L. A GloVe Model for Urban Functional Area Identification Considering Nonlinear Spatial Relationships between Points of Interest. ISPRS Int. J. Geo-Inf. 2022, 11, 498. https://doi.org/10.3390/ijgi11100498

