Block2vec: An Approach for Identifying Urban Functional Regions by Integrating Sentence Embedding Model and Points of Interest

Abstract: Urban functional regions are essential information for parsing urban spatial structure. The rapid and accurate identification of urban functional regions is important for improving urban planning and management. Thanks to its low cost and fast updates, Point of Interest (POI) data is one of the most common types of open-access data. Urban functional regions are mainly identified from it by analyzing the potential correlation between POI data and the regions. Although the spatial correlation between regions is an important manifestation of regional function, it has rarely been considered in previous studies. To extract the spatial semantic information among regions, a new model, called Block2vec, is proposed based on the Skip-gram framework. The Block2vec model maps the spatial correlation between the POIs, as well as between the regions, to a high-dimensional vector, with which urban functional regions can be classified more effectively. The results of the cluster analysis showed that the extracted high-dimensional vector can clearly distinguish regions with different functions. The random forest classification result (overall accuracy = 0.7186, Kappa = 0.6429) illustrated the effectiveness of the proposed method. This study also verified the potential of the sentence embedding model for extracting semantic information from POIs.


Introduction
Cities are composed of various functions that describe human social activities and their employment of land [1,2], and can be divided into various functional regions, such as commercial, residential, industrial and open space. Urban functional regions are closely related to many urban structure studies, such as neighborhood vibrancy [3,4], travel distribution [5], urban mass transit [6] and urban energy consumption [7]. With the rapid urbanization in recent years, the urban function structure has become increasingly diverse and sophisticated. In addition, the evolution of the actual function of the region may be inconsistent with the planning intention of the land [8][9][10]. Thus, the fast and accurate identification of urban functional regions has become essential for improving urban planning and management [11][12][13].
Cadastral maps and census data are valuable sources of land use data because they explicitly reflect land use and contribute to land use management. However, they are subject to extremely strict requirements on update speed and update frequency, which hinders a real-time understanding of the urban land use structure. Remote sensing images [14][15][16][17] and radar/Lidar [18][19][20] have been used effectively for land use and land cover classification because they can capture both spectral and textural properties of the land. However, it may be difficult for them to distinguish categories closely related to human social activities, because these data can neither capture functional interaction patterns nor reflect socioeconomic environments [21][22][23][24][25]. As a result, commercial, residential and industrial land are usually all grouped under the impervious-surface land cover category.
ISPRS Int. J. Geo-Inf. 2021, 10, 339
To monitor and understand human social activities, multi-source geographic data have been investigated to perceive these activities and further infer the functions of regions, such as mobile phone data [23,26,27,28,29,30], GPS trajectory data [31][32][33], smart card data [1,34], social media data [25,35,36] and Point of Interest (POI) data [2,9,10,33,34,36,37,38]. Among these, POI data, being inexpensive and fast to acquire from the internet, has a significant advantage in representing reliable information about the location and type of urban activities (e.g., Shopping, Entertainment and Restaurant) [9,39]. Moreover, POI data can explicitly express semantic information about the urban built environment.
There is a growing body of literature using POI data to identify urban functional regions. Considering the strong relationships between functional regions and socioeconomic activities, Liu and Long [38] attempted to map functional patterns at the parcel level by generating various indicators based on the frequencies of POI data. Yuan et al. [40] identified urban functional zones by using POI data and taxi trajectory data in Beijing. However, due to the complexity of urban spatial structure, analyzing functional patterns using POI frequencies alone is not enough. Several natural language processing (NLP) methods have therefore been deployed to infer urban functional regions. In Gao et al. [2], the Latent Dirichlet Allocation (LDA) topic model was used to infer urban functional regions from POI and user check-in activity data. Chen et al. [41] compared the spatial organization of 25 cities based on a co-location pattern mining method. The Word2vec algorithm was used to infer the spatial relationship of POIs for urban functional region classification [10]. The Place2vec algorithm, which considers spatial context based on the first law of geography [42], was used to identify the urban functional regions in Wuxi, China [8]. However, spatial relationships exist not only between facility points (such as POIs) but also between parcels. In other words, just as a POI is more related to the POIs that are geographically close to it according to the first law of geography [8,43], a parcel is also more related to the parcels that are geographically closer to it. The above methods are word embedding methods that only consider the spatial relationship between individual POIs, and few studies have explicitly addressed the spatial interactions between parcels.
To address the gaps, a parcel-based approach, called Block2vec, was proposed to extract spatial information between parcels inspired by sentence embedding methods [44,45]. Based on the nearest neighbor method, the POI sequence and further sequence group were constructed for each parcel in Block2vec. The latent semantic feature extraction model was then built by using the skip-gram framework. Here, the Long Short Term Memory (LSTM) network [46] was deployed to build the Block2vec model, which was a one-to-many (central parcel to background parcels) and hierarchical model. Finally, the Block2vec model was tested and verified by a case study in Wuhan, China.
The remainder of the paper is structured as follows: Section 2 presents the study area and dataset. Section 3 introduces the method of the Block2vec model. Section 4 describes the experimental results and comparisons. Section 5 discusses the advantages and limitations of the proposed method. Finally, Section 6 presents the conclusion and future work.

Study Area
The study area of this research is the main urban area of Wuhan, which is the capital of Hubei Province, China. The region consists of the area within the third ring road, Zhuankou, Wugang and Miaoshan, covering an area of 678 km². Divided by the Yangtze, the city is known as the 'Three Towns of Wuhan' with Hankou and Hanyang on the west bank, and Wuchang on the east. For this study, the region was divided into 2385 parcels according to the road network data. Figure 1 shows the main urban area and the POI distribution in Wuhan.

Dataset
The POI data used in this study were obtained through the AutoNavi Development Platform (ADP) (https://lbs.amap.com/api/webservice/guide/api/search, accessed on 29 December 2016). For the study area, 537,375 POI records were collected in December 2016. Each POI record contains the latitude and longitude of the POI and a multi-level classification category, with 20 top-level categories and over 500 third-level subcategories. For example, for a primary school, the primary category is science and education cultural service, the secondary category is school and the tertiary category is primary school. The detailed POI categories can be obtained from the website (https://lbs.amap.com/api/webservice/download, accessed on 14 February 2017). Among all the categories, Address/Location was excluded from the later analysis because it does not explicitly express human social activities. As shown in Table 1, excluding Address/Location, the categories with the largest numbers of POIs are Shopping Mall, Catering Service and Living Service.

Methodology
The overall workflow of the proposed approach is shown in Figure 2. The main goal of the proposed approach is to extract the semantic information from the POIs in a parcel, in order to better identify the function of the regions. Firstly, the POI data and parcels were used to produce the POI semantic sequence for each parcel. Secondly, the POI semantic sequences were grouped by parcel using the nearest neighbor method. Thirdly, the latent semantic feature extraction model was established using an LSTM network. The model was trained using the POI semantic sequence groups and then mapped each semantic sequence to a high-dimensional latent semantic vector. Then, the K-Means algorithm was used to verify the discriminability and validity of the latent semantic features, and the Random Forest Algorithm (RFA) was adopted to classify urban functional regions. Finally, the performance of the urban functional region classification was estimated based on its overall accuracy (OA) and Kappa score.

Constructing Semantic Sequence for Each Parcel
The function of a region is related to the integration of all types of activities there [2]. Generally speaking, one parcel contains multiple service facilities, and different locations in the parcel have different spatial contact opportunities. According to their locations, the POIs can be divided into two parts: those located close to the road and those located in the interior of the parcel. The former serve the population of the adjacent parcels, while the latter mainly serve the population of the parcel itself.
In this study, a semantic sequence of POIs with a specific order was constructed to express the different spatial contact opportunities in one parcel. Considering the spatial difference of the POIs described above, the POIs in a parcel were sorted by the spatial distance from each POI to the center of the parcel. For example, in Figure 3a, there are six POIs in the i-th parcel, where $p_1$ is the closest to the center point and $p_6$ is the closest to the road. Based on the distance to the center point, the semantic sequence $S_i$ was constructed as $\{p_1, p_2, p_3, p_5, p_4, p_6\}$. In practice, parcels can have different numbers of POIs, which means that their POI sequences can have different lengths. In the subsequent step, the LSTM layer requires a fixed number of input neurons. Therefore, POI sequences of various lengths need to be processed into fixed-length sequences. In this paper, the fixed length is set to the sequence length at which the cumulative percentage of parcels reaches 90%. Namely, if the POI sequence of a parcel exceeds the fixed length, the excess POIs are removed, while if it is shorter than the fixed length, it is filled with specific padding characters.
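The ordering and length normalization described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `build_poi_sequence`, the `"<pad>"` token and the tuple layout `(category, x, y)` are hypothetical, and it assumes that the POIs removed on truncation are those farthest from the parcel center, which the text does not state explicitly.

```python
import math

PAD = "<pad>"  # hypothetical padding token; the paper only says "specific characters"


def build_poi_sequence(pois, center, fixed_length, pad=PAD):
    """Build the fixed-length POI category sequence of one parcel.

    pois: list of (category, x, y) tuples in a projected coordinate system.
    center: (x, y) of the parcel center.
    """
    # Sort POIs from the parcel center outwards (towards the road).
    ordered = sorted(pois, key=lambda p: math.dist((p[1], p[2]), center))
    # Assumption: truncation drops the POIs farthest from the center.
    categories = [p[0] for p in ordered][:fixed_length]
    # Pad short sequences up to the fixed length.
    return categories + [pad] * (fixed_length - len(categories))


if __name__ == "__main__":
    pois = [("school", 3.0, 0.0), ("restaurant", 1.0, 0.0), ("mall", 2.0, 0.0)]
    print(build_poi_sequence(pois, (0.0, 0.0), 5))
    print(build_poi_sequence(pois, (0.0, 0.0), 2))
```

Sorting before truncation keeps the interior POIs, which the text associates with the parcel's own population; a different truncation rule would only change the slicing line.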
To fully mine the spatial semantic relationships in POI data, it is necessary to consider not only the spatial relationship between POIs but also that between adjacent parcels. In natural language processing, a word or a sentence has two contextual relationships, forward and backward. In geographic space, however, there are several different directional contexts. To simplify this problem, a central parcel together with its four adjacent parcels was considered as a block, which could be regarded as one contextual relationship. The typical spatial distribution of a block is shown in Figure 3b, where the four nearest parcels ($C_1$, $C_2$, $C_3$, $C_4$) around the central parcel $i$ were regarded as context parcels. Therefore, the semantic sequence group for parcel $i$ was defined as $[S_i, (S_{i,c_1}, S_{i,c_2}, S_{i,c_3}, S_{i,c_4})]$.
(a) Construction of POIs sequence for the i-th parcel.
(b) Typical spatial distribution and relationship between center parcel and context parcel.
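The construction of a sequence group from a central parcel and its four nearest parcels might look like the sketch below. The helper names `nearest_parcels` and `build_sequence_group` are hypothetical, and measuring proximity by the Euclidean distance between parcel centroids is an assumption; the text only states that a nearest neighbor method was used.

```python
import math


def nearest_parcels(centroids, i, k=4):
    """Return the ids of the k parcels whose centroids are nearest to parcel i.

    centroids: dict mapping parcel id -> (x, y).
    Assumption: proximity is the Euclidean distance between centroids.
    """
    others = [j for j in centroids if j != i]
    # Ties are broken by parcel id so the result is deterministic.
    others.sort(key=lambda j: (math.dist(centroids[i], centroids[j]), j))
    return others[:k]


def build_sequence_group(sequences, centroids, i, k=4):
    """Return (S_i, (S_{i,c_1}, ..., S_{i,c_k})) for the central parcel i."""
    context = nearest_parcels(centroids, i, k)
    return sequences[i], tuple(sequences[c] for c in context)


if __name__ == "__main__":
    centroids = {0: (0, 0), 1: (1, 0), 2: (0, 2), 3: (-3, 0), 4: (0, -4), 5: (10, 10)}
    sequences = {j: [f"poi{j}"] for j in centroids}
    print(build_sequence_group(sequences, centroids, 0))
```

For a few thousand parcels this brute-force search is adequate; a spatial index would be the natural replacement for larger study areas.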

Latent Semantic Feature Extraction Model
Previous studies have shown that the seq2seq model can effectively extract the latent features of a sentence by using its context information [44,45,47,48]. Unlike word embedding methods, sentence embedding methods represented by seq2seq models perform the sentence embedding task better, because they capture the relevant characteristics of different words comprehensively at the level of the sentence rather than at the level of individual words. Inspired by these models, the POI sequence in a parcel can be regarded as a sentence, with the k nearest parcels in geographic space as its context parcels. As illustrated in Figure 3b, k was set to 4.
In this study, the Skip-gram framework, as used in the skip-thought vectors model [44], was applied to establish the latent semantic feature extraction model, which can be described in three parts: the encoder, the decoder and the objective function. As shown in Figure 4, an encoder was used to map the POI sequence of the central parcel to a latent semantic feature, and multiple decoders were used to generate the POI sequences of the context parcels.
LSTM layers were used to build the encoder of the model. For the i-th parcel, let $\{x_1, x_2, \ldots, x_n\}$ be the POI sequence $S_i$, where $n$ is the number of POIs in $S_i$. At each calculation step $t$, the encoder computes a hidden feature $h_t$, which can be regarded as a hidden representation of the subsequence $\{x_1, x_2, \ldots, x_t\}$. The hidden state $h_n$ thus represents the entire sequence, namely, the latent semantic feature vector of $S_i$. To encode the sequence $S_i$, the LSTM update is iterated from the first POI in the sequence, where $i_t$ is the input gate, $f_t$ is the forget gate, $g_t$ is the update gate, $o_t$ is the output gate, $c_t$ is the cell state and $h_t$ is the hidden state of the encoder at step $t$.
Four LSTM layers were adopted to establish the decoders of the model, one for each context parcel. The network structure of each decoder is similar to that of the encoder. Conditioned on the state $h_n$, the four decoders then generate the POI sequences of the context parcels.
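The encoder equations did not survive in this copy of the text. For reference, the standard LSTM update written with the gate names used above is sketched below; the weight matrices $W_\ast$, $U_\ast$ and biases $b_\ast$ are assumed notation and may differ from the symbols in the original equations.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
g_t &= \tanh\left(W_g x_t + U_g h_{t-1} + b_g\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

Here $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication; iterating from $t = 1$ to $t = n$ yields the final state $h_n$ that serves as the latent semantic feature of $S_i$.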
Given a POI sequence group $[S_i, (S_{i,c_1}, S_{i,c_2}, S_{i,c_3}, S_{i,c_4})]$, the optimization objective function is the sum of the log-probabilities of the context semantic sequences conditioned on the encoder representation, where $h_i$ denotes the hidden state of the sequence $S_i$ and $x_c^t$ denotes the predicted value for parcel $c$ at step $t$.
In the model training process, the total loss to be minimized was the negative of the above objective function, summed over all sequence groups.
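The objective formula itself is missing from this copy. Assuming it follows the form used in the skip-thought vectors model [44], on which the framework is based, the objective for one sequence group can be written as below; the notation $x_c^{<t}$ for the POIs of context parcel $c$ preceding step $t$ is an assumption.

```latex
\sum_{c \in \{c_1, c_2, c_3, c_4\}} \; \sum_{t} \log P\left(x_c^{t} \mid x_c^{<t},\, h_i\right)
```

Each of the four inner sums is produced by one decoder, and all four are conditioned on the same encoder state $h_i$, which is what forces that state to encode information about the surrounding parcels.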

Identification of Urban Functional Regions Based on Latent Semantics
After training the above model, the encoder with the learned weights was used as a feature extractor to map the POI sequence of each parcel to a latent semantic feature $h_n$. Theoretically, the more similar parcels are in terms of the semantic function of their POIs and their surrounding environment, the closer they lie in the latent semantic space. Therefore, classifiers can be trained on these features to distinguish the functions of different regions. As shown in Figure 5, K-Means and the Random Forest Algorithm (RFA) were adopted to group and classify the parcels according to their latent semantic features.

K-Means-Based Parcel Aggregation
To verify the discriminability and validity of the latent semantic features, the K-Means algorithm was used to aggregate the research parcels according to these features. Similarity in vector space can be measured by various distance metrics, such as the Euclidean distance and the cosine distance. Since the dimension of the latent semantic space obtained in this paper is high, the cosine distance was adopted to measure the distance between latent semantic feature vectors. Consequently, the cosine-distance-based K-Means clustering algorithm was applied to aggregate those parcels.
The silhouette score [49] was then used to evaluate how well each sample fits within its cluster. For the sample $P_i$, let $a$ be the average distance between $P_i$ and the other samples within the same cluster, and let $b$ be the average distance between $P_i$ and the samples within other clusters. The silhouette score is then calculated as follows:
$$s(P_i) = \frac{b - a}{\max(a, b)}$$
It can be seen from the above formula that the silhouette score lies in the range [−1, 1], and the closer it is to 1, the better the clustering performance. Therefore, the average silhouette score of all samples was calculated to evaluate the K-Means clustering.
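A small, dependency-free sketch of the silhouette computation with the cosine distance is given below. It is illustrative only: the function names are hypothetical, and it follows the usual definition from [49], in which $b$ is the mean distance to the nearest other cluster, which is an assumption about how "other clusters" was handled in the study.

```python
import math


def cosine_distance(u, v):
    """Cosine distance 1 - cos(u, v) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def silhouette_scores(vectors, labels):
    """Per-sample silhouette scores; requires at least two clusters."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    def mean_dist(i, members):
        return sum(cosine_distance(vectors[i], vectors[j]) for j in members) / len(members)

    scores = []
    for i, label in enumerate(labels):
        same = [j for j in clusters[label] if j != i]
        if not same:
            scores.append(0.0)  # convention for single-sample clusters
            continue
        a = mean_dist(i, same)
        # Assumption: b is taken over the nearest other cluster.
        b = min(mean_dist(i, m) for lab, m in clusters.items() if lab != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


if __name__ == "__main__":
    vectors = [(1.0, 0.0), (1.0, 0.1), (0.0, 1.0), (0.1, 1.0)]
    good = silhouette_scores(vectors, [0, 0, 1, 1])
    print(sum(good) / len(good))
```

Averaging the returned list gives the single score used to compare different numbers of clusters.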

RFA-Based Parcel Classification
The unsupervised clustering analysis, however, only groups parcels by the differences between their POI latent semantic features. Because the extracted latent semantics are not directly interpretable, it is difficult to assign and define the categories produced by the cluster analysis. Therefore, a supervised classification method based on existing training samples is also required.
Among supervised classifiers, the RFA is widely used because of its good adaptability to high-dimensional features, its resistance to over-fitting and its strong anti-noise ability [50,51]. Let $H_{ij}$ ($i \in [1, M]$, $j \in [1, N]$) and $Y_k$ ($k \in [1, K]$) be the latent features and land use types of parcel $i$, where $M$ is the total number of parcels, $N$ is the dimension of the features and $K$ is the total number of region function types. Using the bagging method, samples with $n$ ($n \le N$) features randomly selected from the $N$ features were used to build a decision tree. By repeating this random selection of features, $C$ decision trees were built without pruning. Each decision tree predicted the result separately and then all the results were integrated. Even though a single decision tree may over-fit, this risk can be reduced by integrating the results of all decision trees. In this paper, the RFA implementation combines all results by averaging their probabilistic predictions, instead of letting each decision tree vote for a single class.
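The difference between averaging probabilistic predictions and majority voting can be illustrated with a few lines of Python. The functions `soft_vote` and `hard_vote` and the probability values are purely hypothetical and are not taken from the study; they only show that the two combination rules can disagree.

```python
from collections import Counter


def soft_vote(tree_probas):
    """Average per-tree class probabilities and return (class index, mean probabilities)."""
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    mean = [sum(p[c] for p in tree_probas) / n_trees for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__), mean


def hard_vote(tree_probas):
    """Let each tree vote for its most probable class and return the majority class."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in tree_probas]
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical outputs of three trees for one parcel and two classes.
    probas = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
    print(soft_vote(probas))  # one confident tree outweighs two uncertain ones
    print(hard_vote(probas))
```

With these made-up numbers the averaged prediction selects class 0 while the majority vote selects class 1, because averaging takes the confidence of each tree into account.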
As mentioned above, the actual function of a region may not be consistent with the planners' intention. In this study, the samples were selected using prior information from multiple sources, including urban land use planning maps, remote sensing images and online maps. The urban land use planning maps can be obtained through the website (http://zrzyhgh.wuhan.gov.cn/zwgk_18/fdzdgk/ghjh/zzqgh/202001/t20200107_602858.shtml, accessed on 12 May 2017). The samples include five types of functions: residential regions, commercial regions, business regions, open green spaces and industrial regions. The samples were randomly divided into two equal-sized subsets, one used for training and the other for testing. The model was then trained using the training samples, and the testing samples were used to evaluate the accuracy of the trained model. To ensure the robustness of the classification, the above-mentioned random forest classification was repeated 100 times, and the average accuracy was used as the final evaluation result. Additionally, several state-of-the-art POI semantic mining methods, such as term frequency-inverse document frequency (TF-IDF) [9], Latent Dirichlet Allocation (LDA) [52] and Word2vec [10], were used for comparison with the proposed method.
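The two evaluation measures and the random half split can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the labels in the example are made up, and the classifier itself is left out so that the block stays dependency-free.

```python
import random
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for chance agreement.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(y_true)
    p_observed = overall_accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_observed - p_expected) / (1.0 - p_expected)


def split_in_half(n_samples, rng):
    """Randomly split sample indices into two equal-sized halves (train, test)."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    half = n_samples // 2
    return indices[:half], indices[half:]


if __name__ == "__main__":
    # Made-up labels for six test parcels.
    y_true = ["res", "res", "com", "com", "ind", "ind"]
    y_pred = ["res", "res", "com", "ind", "ind", "ind"]
    print(overall_accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))
    train_idx, test_idx = split_in_half(6, random.Random(0))
    print(sorted(train_idx + test_idx))
```

Repeating the split, training and scoring 100 times with different random seeds and averaging the resulting accuracies would reproduce the protocol described in the text.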

Results
In this study, the 2315 research parcels contained a total of 537,375 POIs, after Tianxingzhou and a few parcels without POIs were removed. Then, the third-level classification of POI types (496 types in total) was used to construct the POI sequences for the parcels, which could have different lengths. Figure 6 shows the distribution of the POI sequence length of the parcels. It can be seen that the POI sequences of most parcels are short. At a length of 500, the cumulative percentage reaches 91.69%. Therefore, this study sets the fixed length of the sequence to 500. Finally, sequence groups for each parcel were constructed as described in Section 3.1.
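The choice of the fixed sequence length from the cumulative distribution can be sketched as below. The helper names and the example lengths are hypothetical, and "cumulative percentage" is assumed here to mean the share of parcels whose POI sequence is no longer than the given length.

```python
def coverage_at(lengths, fixed_length):
    """Percentage of parcels whose POI sequence length is <= fixed_length."""
    return 100.0 * sum(n <= fixed_length for n in lengths) / len(lengths)


def smallest_length_for(lengths, target_pct=90.0):
    """Smallest observed length whose cumulative coverage reaches target_pct."""
    for candidate in sorted(set(lengths)):
        if coverage_at(lengths, candidate) >= target_pct:
            return candidate
    return max(lengths)


if __name__ == "__main__":
    # Made-up sequence lengths for ten parcels, one of them very long.
    lengths = [10, 20, 30, 40, 50, 60, 70, 80, 90, 1000]
    print(smallest_length_for(lengths, 90.0))
    print(coverage_at(lengths, 500))
```

In practice a round value slightly above the threshold may be preferred, as with the length of 500 reported in the text, which covers 91.69% of the parcels.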
Among them, the RFA is widely used in supervised classification because of its good adaptability to high-dimensional features and difficulty in over-fitting, and strong antinoise ability [50,51]. Let the (i ∈ [1, M], j ∈ [1, N]) and (k ∈ [1, K]) be the latent features and land use types of parcel i, where M is the total number of parcels and N is the dimensions of the features and K is the total number of the types of regions' functions. Using the bagging method, samples with n (n ≤ N) features were randomly selected from the N features, and then were used to build a decision tree. By the random combination of k features, C decision trees were repeatedly built without pruning operations. Each decision tree predicted the result separately and then all the results were integrated. Even though a single decision tree may be over-fitting, this risk can be reduced by integrating the results of all decision trees. In this paper, RFA model implementation combines all results by averaging their probabilistic prediction, instead of letting each decision tree vote for a single class.
As mentioned above, the actual function of the region may not be consistent with the planers' intention. In this study, the samples were selected by using a prior information from multiple sources, including urban land use planning maps, remote sensing images and online maps. The urban land use planning maps can be obtained through the website (http://zrzyhgh.wuhan.gov.cn/zwgk_18/fdzdgk/ghjh/zzqgh/202001/t20200107_602858.shtml, accessed on 12 May 2017). The samples including the five types of functions: residential regions, commercial regions, business regions, open green spaces and industrial regions. The training samples were randomly divided into two equal-sized subsets, one used for the training and another one used for the testing. Then the model was trained using the training samples and the testing samples were used for the accuracy evaluation of the trained models. To ensure the robustness of the classification, the above-mentioned random forest classification was repeated 100 times, and then the average accuracy was used as the final evaluation result. Additionally, several state-of-the-art POIs semantic mining methods, such as term frequency-inverse document frequency (TF-IDF) [9], Latent Dirichlet Allocation (LDA) [52] and Word2vec [10] were used for comparison with our proposed method.

Results
In this study, the 2315 research parcels contained a total of 537,375 POIs; Tianxingzhou and a few parcels without POIs were removed. Then, the three-level classification of POI types (496 types in total) was used to construct the POI sequences for the parcels, which can have different lengths. Figure 6 shows the distribution of the POI sequence length of the parcels. Most parcels have short POI sequences: at a length of 500, the cumulative percentage reaches 91.69%. Therefore, this study sets the fixed length of the sequence to 500. Finally, sequence groups for each parcel were constructed as described in Section 3.1.
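The choice of a fixed sequence length can be illustrated with a short sketch. It assumes that shorter sequences are right-padded and longer ones truncated, which the text does not state explicitly; the padding token, the helper names and the toy values are hypothetical.

```python
from typing import List, Sequence

PAD = "<pad>"  # hypothetical padding token, not taken from the paper


def coverage(lengths: Sequence[int], max_len: int) -> float:
    """Share of parcels whose POI sequence has at most `max_len` items."""
    return sum(n <= max_len for n in lengths) / len(lengths)


def fix_length(seq: Sequence[str], max_len: int) -> List[str]:
    """Truncate a POI-type sequence to `max_len`, or right-pad it with PAD."""
    clipped = list(seq[:max_len])
    return clipped + [PAD] * (max_len - len(clipped))


# Toy sequence lengths (illustrative only).
lengths = [3, 5, 8, 12, 40]
share_covered = coverage(lengths, 10)

padded = fix_length(["cafe", "bank"], 4)
truncated = fix_length(["cafe", "bank", "school", "park", "hotel"], 4)
```

With the real length distribution, `coverage(lengths, 500)` would correspond to the cumulative percentage read from Figure 6.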

Several modules, such as the scikit-learn module (an open-source machine learning tool, https://scikit-learn.org/stable/, accessed on 10 May 2019), the PyTorch module (an open-source machine learning and deep learning framework, https://pytorch.org/, accessed on 2 August 2018) and the Gensim module (an open-source topic modeling framework, https://radimrehurek.com/gensim/, accessed on 23 September 2019), were adopted to construct and train the regional latent semantic extraction model described in Section 3.2. An LSTM structure was adopted for the latent semantic feature extraction model. In this model, the number of LSTM layers was set to 1, the latent semantic feature dimension was set to 200, the mini-batch size was set to 64 and the number of iterations was 100.

Urban Functional Regions Aggregation by K-Means Algorithm
As illustrated above, owing to the similar latent semantics of their POI spatial sequences, parcels with the same functional semantics will be closer to each other in the latent semantic space than parcels with other functions. The cosine-distance-based K-Means clustering algorithm was then performed to verify the discriminability and validity of the latent semantic features. As shown in Figure 7, the silhouette score is highest with two clusters and then decreases gradually as the number of clusters increases. As a result, k = 2, 3 and 4 give the three highest silhouette scores. Moreover, local maxima are obtained when the number of clusters k is 6, 8 and 12.
When k = 2, comparison with the remote sensing map and the comprehensive planning land use map of Wuhan reveals an obvious circle structure in Figure 8a. Moreover, the clustering results divide the urban spatial function into a central area and an edge area, which may be related to the function and development level of the urban area in the center and suburbs.
When k = 3, the clusters are further divided compared with k = 2, and the circle structure still exists in Figure 8b. In addition, class 2 is now more concentrated in the city center, while class 3 is more concentrated in the peripheral area of the city. Comparison with the remote sensing maps and the comprehensive planning land use maps of Wuhan shows that the distribution of class 3 is consistent with the actual layout of the various industrial areas in Wuhan.
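The silhouette score used above to compare cluster numbers can be computed with cosine distance as in the following minimal sketch. It evaluates a given label assignment rather than running K-Means itself; the vectors and labels are toy values, and non-zero feature vectors and at least two clusters are assumed.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def silhouette_score(X, labels) -> float:
    """Mean silhouette coefficient of all samples under cosine distance."""
    clusters = sorted(set(labels))
    scores = []
    for i, xi in enumerate(X):
        # Mean distance from sample i to every cluster (excluding itself).
        mean_dist = {}
        for c in clusters:
            members = [X[j] for j in range(len(X)) if labels[j] == c and j != i]
            if members:
                mean_dist[c] = sum(cosine_distance(xi, m) for m in members) / len(members)
        own = labels[i]
        if own not in mean_dist:  # singleton cluster: coefficient defined as 0
            scores.append(0.0)
            continue
        a = mean_dist[own]  # cohesion
        b = min(d for c, d in mean_dist.items() if c != own)  # separation
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy vectors: two directions in a 2-D latent space.
X = [(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 3.0)]
good = silhouette_score(X, [0, 0, 1, 1])  # clusters follow the directions
bad = silhouette_score(X, [0, 1, 0, 1])   # clusters mix the directions
```

Repeating this computation for the labels produced at each k gives a curve of the kind shown in Figure 7.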
When k = 4, the clustering map in Figure 8c mainly reclassifies class 2 and class 3 of the k = 3 result, which produces class 1, class 3 and class 4. Additionally, class 2 in Figure 8c is basically consistent with class 1 in Figure 8b. Among them, class 1 is more concentrated, showing a partially patchy and point-like distribution. Comparison with the remote sensing maps and the comprehensive planning urban land use maps shows that the distribution of class 1 is consistent with the distribution of commercial areas in Wuhan.

Identification of Urban Functional Regions Based on Random Forest Algorithm
The unsupervised K-Means clustering shows that the proposed method can effectively extract the latent semantic features of POI sequences. However, an unsupervised method cannot give an explicit definition of the resulting categories, so a supervised classification method based on existing training samples is necessary.
Based on the latent semantic feature vectors extracted from the above model, this paper uses the random forest algorithm to classify urban functional regions. A total of 96 samples were randomly selected. Additionally, some state-of-the-art methods, including TF-IDF [9], LDA [2] and Word2vec [43], are used for comparison with our method.
The RFA model provided by the scikit-learn module library (https://scikit-learn.org/stable/, accessed on 10 May 2019) was adopted to classify urban functional regions, where the number C of decision trees is set to 200. The Word2vec, LDA and TF-IDF models for the comparison experiments were implemented using the Gensim module library (https://radimrehurek.com/gensim/index.html, accessed on 23 September 2019), with the parameter settings of each method kept consistent with the previous literature.
To ensure the stability of the results, each method was repeated 100 times. Table 2 provides the accuracy assessment of urban functional region classification using different methods, and Figure 9 shows the corresponding classification maps.

Table 2. Accuracy assessment of urban functional region classification via different methods (columns: Methods, Overall Accuracy, Kappa Score).

In contrast to previous studies, although the TF-IDF method only considers the quantitative features of POIs in a region, it still achieves relatively good classification accuracy compared with the LDA model. The Word2vec model has a higher classification accuracy than TF-IDF and LDA because it considers both the frequency characteristics of the POIs and the spatial relationships between them. Compared with these results, the proposed Block2vec achieved the highest classification accuracy and Kappa score. Figure 10 shows the confusion matrices of urban functional region classification via the different methods. Compared with the other methods, the proposed method (Figure 10d) has the highest accuracy in the classification of Residential, Commercial and Industrial regions, and the second-highest accuracy in the classification of Business. The Word2vec method (Figure 10a) has the highest accuracy in the classification of Business and Open Space, but lower accuracy in the classification of Residential and Industrial. The results show that a feature extraction model that considers the spatial relationships among parcels can effectively improve the classification accuracy of Residential, Commercial and Industrial regions, but cannot improve that of Open Space.
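For reference, overall accuracy and the Kappa score can both be derived from a confusion matrix, as in the sketch below. The 2 × 2 matrix is a made-up example rather than one of the matrices in Figure 10, and rows are assumed to hold the reference classes and columns the predicted classes.

```python
from typing import Sequence


def overall_accuracy(cm: Sequence[Sequence[int]]) -> float:
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa: agreement corrected for chance agreement."""
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(col) for col in zip(*cm)]
    # Agreement expected if reference and prediction were independent.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical confusion matrix for two classes (100 samples).
cm = [
    [40, 10],
    [5, 45],
]
oa = overall_accuracy(cm)  # (40 + 45) / 100
kappa = cohen_kappa(cm)    # (0.85 - 0.5) / (1 - 0.5)
```

Because Kappa discounts agreement expected by chance, it is lower than the overall accuracy for the same matrix, which matches the pattern of the values reported for the proposed method.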
To further verify the classification results of the proposed model, three local regions were compared with Google Maps and the comprehensive planning land use map of Wuhan. Figure 11a is the central area of the city, whose actual function type is mainly business and commercial. The results show that the distribution of the proposed model's classification is consistent with that of the planning map. Figure 11b is another central area of the city, whose commercial scale is smaller than that in Figure 11a. The business regions in this area's planning map run from north to south, while those in the classification results of the proposed model run from east to west. Figure 11c is a business and industrial area in the southeastern part of the city. Here, the classification results of the proposed model are completely inconsistent with the planning map. Comparison with online maps shows that the distribution of the proposed model's classification is more realistic. Even though most of the regions in Figure 11c

The Influence of the Size of Latent Semantic Features
The latent semantic feature model used in this paper maps the POI sequence in a block to a latent high-dimensional semantic feature space, so the dimension of the latent semantic feature directly determines its semantic richness. If the dimension is too low, it is difficult to capture rich POI sequence semantics, which leads to loss of information; if it is too high, it may lead to information redundancy. Therefore, in this section, we analyze the ability of latent semantic features of different sizes to identify and distinguish urban functional regions. Figure 12 shows how the classification accuracy of urban functional regions changes when the model is set to different dimensions. When the hidden layer size is between 10 and 100, the latent semantics acquired become richer and the classification accuracy rises as the size increases. When the hidden layer size is between 100 and 250, the classification accuracy does not change noticeably as the size increases. When the hidden layer size is further increased to 300, the classification accuracy decreases because the dimension of the latent features is too high. Therefore, it is appropriate to set the latent semantic feature size of the hidden layer to 200 in this paper.



Discussion
A timely and accurate urban functional regions map is conducive to urban management and urban planning. This study proposed an effective approach for the identification of urban functional regions by extracting latent semantic features of POIs in parcels. The proposed approach considers the following spatial relationships of POIs: (1) The spatial relationship of POIs in a parcel: There is an interdependent and competitive relationship between adjacent POIs in geographic space [53]. (2) The spatial accessibility varies in different areas of a parcel: Generally speaking, there are more public facilities in the areas near the streets due to the high accessibility for external contact, while the areas near the interior of the parcel have some unique facilities. (3) The relationship between parcels: Parcels with different functions are often close to or distant from each other due to their interdependent or competing relationships.
This study agrees with previous studies [2,9,10,42] that natural language processing (NLP) is well suited to extracting the semantic features of POIs. However, few studies have explicitly addressed the spatial correlations among parcels. In this study, the POI semantic sequence was built in a specific order that reflects the relationships between POIs. Then, the sequence group was constructed by considering the relationships between parcels (center parcel and context parcel). The LSTM network was used to extract the former, while the Encoder-Decoder structure was used to extract the latter. Consequently, the results achieved the highest accuracy (OA = 0.7186, Kappa = 0.6429), which indicates that our model can effectively extract the latent features for more accurate classification of urban functional regions. Moreover, the confusion matrix indicates that the proposed method could effectively improve the classification accuracy of the Residential, Commercial and Industrial regions. This suggests that those types of parcels have close spatial correlations, while they are less spatially connected to Open Space land. Furthermore, the comparison of the local classification results verified that the evolution of the actual function of a region may not be consistent with the planners' intention.
Classification accuracy for the four methods using POI data ranged from 0.5972 to 0.7186, which is close to the accuracy reported in relevant studies [8,30,35]. However, this is lower than that of remote sensing land use classification. The main reason is that remote sensing classifies land mainly on the basis of its physical features, and a large number of labeled datasets have been accumulated for it. In contrast, the POI-based classification was trained on samples that we selected ourselves, with a small sample size and subject to subjective influence. In addition, the functions of some regions are affected by multiple human social activities.
It should be noted that this approach has been examined only in an urban area with abundant service facilities. Thus, it may be difficult to transfer it to areas with fewer POIs, such as suburbs and rural areas. Additionally, mixed-function types were not considered, as it is hard to define them manually [41,43]. Nevertheless, this paper proposed a novel parcel-based semantic extraction method that outperformed the other state-of-the-art methods compared in this paper in its ability to extract POI semantics.
In addition, this paper classified the parcels into five types of functional regions, which may conflict with the standard urban land use classification. Some standard land use types are a mixture of various human social activities and may not correspond exactly to these functional types. There is also no unified definition standard in relevant studies [8,32,36]. However, this paper does not attempt to define types that replace the standard urban land use types. Rather, this research aims to provide a better data-driven method to quickly and accurately identify regional functions from POI data. Planners and government managers can thus use this method to continuously and effectively monitor changes in regional functions in the city.
Moreover, some functional regions can be subdivided. For instance, residential regions could be divided into low-density and high-density residential land, which can be identified with remote sensing data [14]. However, it is difficult to distinguish them using POIs alone. Incorporating other data, such as high-resolution remote sensing images and social media data, can effectively improve the ability to distinguish among more types of urban functional regions.

Conclusions and Future Work
With rapid urbanization, the spatial structure of urban functional regions has become increasingly diverse and sophisticated. Therefore, it is necessary to produce a timely and accurate urban functional region map for urban management and urban planning. This study proposed an effective approach, called Block2vec, for the identification of urban functional regions by extracting latent semantic features of POIs in parcels. First, a POI sequence and then a sequence group were constructed for each parcel. Then, the POI sequence was mapped to a high-dimensional space by building a Block2vec model. Furthermore, K-Means clustering and RFA classification were adopted to reveal the urban structures and to identify the functional types. Compared with other state-of-the-art methods (TF-IDF, LDA and Word2vec), the Block2vec method obtained the highest accuracy (OA = 0.7186, Kappa = 0.6429). Furthermore, the proposed method significantly improves the classification accuracy of residential, commercial and industrial land. The proposed method can help urban managers and planners to understand the distribution of urban functional regions in a timely and accurate manner. At the same time, this study also verified the potential of neural language models in the semantic information extraction of POIs.
For future work, accumulating more study areas will help us to obtain more training samples of functional regions. In addition, incorporating other data, such as high-resolution remote sensing images and social media data, can effectively improve the ability to distinguish among more types of urban functional regions.