1. Introduction
China has been experiencing urban expansion at an unprecedented scale and rate as a result of rapid urbanization [1]. Consequently, problems such as urban population expansion, unreasonable economic and industrial structures, and the inefficient utilization of land resources are becoming more prominent. Urban land use reflects human activities and the ways in which people use land [2], and it is closely related to urban planning, economic development, and environmental protection [3,4,5]. In China, urban industrial land (UIL) can be interpreted as urban construction land that is classified as serving an industrial purpose, and it usually refers to secondary and tertiary industrial land in the city [6,7]. With rapid economic development and urban expansion, the contradiction between the increasing demand for UIL and limited land resources has been gradually intensifying. Therefore, it is important to acquire up-to-date UIL information for rational planning.
Land use and land cover mapping usually relies on manual interpretation of aerial photos or satellite images, combined with field surveys and auxiliary materials (e.g., statistical data) [8]. In large city regions undergoing rapid urbanization, urban environments can change faster than the intermittent efforts made to update existing land use databases; thus, urban land use information that can be updated in a timely manner is very important [2,9]. Many researchers have applied object-oriented classification to satellite remote sensing images to obtain urban land use types, focusing on the physical properties (e.g., spectrum, texture, and shape information) of ground objects derived from the images [10,11]; however, knowledge regarding socioeconomic functions has not been fully utilized in these approaches [12,13,14]. Some land use types, such as commercial land and industrial land, cannot be distinguished by simply using remote sensing information and its derived features because urban land use types are strongly correlated with internal socioeconomic activities [13].
Open social data that reflect socioeconomic activities are a particularly promising data resource, as they can capture spatiotemporal patterns of human activities and thus uncover associations with different land use types [12]. Among these, Point of Interest (POI) data possess rich semantic and location information. A POI record documents a location point and consists of attributes such as name, category (multilevel), and coordinates [15]. Moreover, POIs can denote buildings, stores, universities, or geographical entities of a certain size, and this level of detail strengthens the ability of the data to describe an entity’s location. POIs have been utilized to estimate fine-resolution population [16], explore commercial pattern changes [17], and classify urban green space from a social function perspective [18]. Textual information from POIs has been mined using a topic model to infer land cover types [19]. However, that study considered only the frequency and not the spatial distribution of POIs. Yao et al. [14] considered the relationship between the spatial distribution of POIs and regional functions. Based on these studies, the Word2Vec model can be used to quantify the relationship between POIs and UIL types [14]. The Word2Vec model maps words into high-dimensional vector spaces via their contextual relationships [20].
Researchers have extensively studied urban land use efficiency in China [21], with a focus on urban agglomerations [22] or specific cities [23], as well as on land use types such as construction land [24] or industrial land [25]. Efficiency is a crucial criterion for measuring the level of urban land use, and urban land use efficiency can be understood as the well-being produced per unit of urban land [23]. Researchers have developed specific indicators to measure land use efficiency in China [26], which cover economic aspects (e.g., total asset investment) and environmental aspects (e.g., air quality and green rate) [23,27]. On this basis, UIL use efficiency was used in our study to measure the degree of UIL utilization by accounting for economic and environmental factors: the higher the UIL use efficiency, the better the realization of UIL value. We used accessible remote sensing data, ground observation data, and necessary statistical data to evaluate UIL use efficiency with the entropy-weight Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) method. TOPSIS can generate reasonable evaluations of different problems with multiple criteria [28]. It has been applied in urban planning and management studies such as hazard assessment and sustainable and ecological development evaluation [29,30].
Considering the abovementioned issues, our study has two objectives: first, to extract UIL types using timely and updatable POI data and a Word2Vec model; second, to evaluate UIL use efficiency using the entropy-weight TOPSIS method, with evaluation indicators generated from remote sensing data, ground observation data, and statistical data. This paper is structured as follows: the study area, data sources, and implementation methods are described in Section 2 and Section 3, and the results, discussion, and conclusions are presented in Section 4, Section 5, and Section 6.
3. Methodology
The identification and use efficiency evaluation of UIL were conducted using the following methodological framework (Figure 2).
We first generated parcels from the OSM road network (including the deletion, trimming, and buffering of roads and the removal of road space) based on the method of Long and Liu [35] and used them as the basic analytical units. Parcels are polygons bounded by road networks [9] and are relatively homogeneous in terms of land use functions [36]. The corpus (i.e., a large and organized collection of well-sampled and processed texts in the field of natural language processing [37]) was constructed based on parcels and POIs. POI vectors were extracted using a Continuous Bag-of-Words (CBoW)-based Word2Vec model, and parcel vectors were calculated by averaging the POI vectors within each parcel. A random forest model was adopted to extract the UIL types of parcels based on the parcel vectors.
Subsequently, at the parcel level, remote sensing data, ground observation data, and statistical data were utilized to extract the following: the green space coverage index; the index of number of days with good air quality; and indices connected with industrial development, including total assets, total profits, number of employees, energy consumption, and net demand of water. The entropy-weight TOPSIS method was employed to evaluate UIL parcel use efficiency based on these indicators.
3.1. Classification System of UIL
UIL types in this study were classified according to the Chinese urban land use classification criteria (GB 50137–2011), as shown in Table 1. Because there are only a few warehouses within Beijing’s fifth ring road, we classified warehousing as industrial land. UIL contained two primary categories, commercial land and industrial land; commercial land consisted of commercial services, accommodation, entertainment, and business.
3.2. Construction of the UIL Identification Model
UIL types are difficult to acquire solely from remote sensing images because they are related to socioeconomic activities and human use, whereas the socioeconomic knowledge that can be extracted from remote sensing images is limited [12,13]. In this study, we mined the semantic information and spatial correlations of POI data inside cities, quantified POIs as high-dimensional vectors using a CBoW-based Word2Vec model, and applied the POI vectors to determine UIL types at the parcel scale.
3.2.1. Parcel Generation
Long and Liu [35] suggested that road networks serve as natural segmentation boundaries of parcels within urban areas. In this study, the OSM road network was used to generate parcels based on this assumption. First, the OSM roads were preprocessed: redundant roads (e.g., roads within residential districts and university districts) were deleted, and dangling roads were trimmed. Based on the importance of their original levels (http://wiki.openstreetmap.org/wiki/Key:highway) [8], we reclassified the OSM roads into three levels, wherein Level 1 includes trunk, primary, and motorway roads; Level 2 includes secondary roads; and Level 3 includes tertiary, unclassified, residential, and service roads. Next, according to the Chinese road construction standards and the actual road widths within Beijing’s fifth ring road, road spaces were generated by creating buffers with widths of 40, 20, and 10 m for the three levels, respectively [8]. Finally, less meaningful parcels were removed: those with an area below 0.005 km² were removed because they lacked actual socioeconomic functions or were located inside overpasses or roundabouts. In this manner, 3005 parcels were obtained (Figure 3).
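The reclassification and filtering rules above can be sketched as a small lookup, which may help when reproducing the parcel generation step. This is a minimal sketch, not the authors' implementation: the names `LEVEL_BY_HIGHWAY`, `BUFFER_WIDTH_M`, `road_buffer_width`, and `keep_parcel` are hypothetical, the pairing of 40, 20, and 10 m with Levels 1, 2, and 3 is assumed from the order in which they are listed, and the geometric buffering itself (which would require a GIS library) is omitted.

```python
# Hypothetical lookup tables; the level grouping follows the text above.
LEVEL_BY_HIGHWAY = {
    "trunk": 1, "primary": 1, "motorway": 1,
    "secondary": 2,
    "tertiary": 3, "unclassified": 3, "residential": 3, "service": 3,
}

# Assumed pairing of buffer widths (metres) with road levels.
BUFFER_WIDTH_M = {1: 40, 2: 20, 3: 10}

# Parcels with an area below this threshold are discarded.
MIN_PARCEL_AREA_KM2 = 0.005


def road_buffer_width(highway_tag):
    """Return the buffer width in metres for an OSM highway tag, or None if unused."""
    level = LEVEL_BY_HIGHWAY.get(highway_tag)
    return None if level is None else BUFFER_WIDTH_M[level]


def keep_parcel(area_km2):
    """Keep a parcel only if its area is not below the minimum area."""
    return area_km2 >= MIN_PARCEL_AREA_KM2
```

For example, `road_buffer_width("primary")` yields the Level 1 width, while a tag outside the three levels (such as `"footway"`) yields `None` and would be dropped in preprocessing.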
3.2.2. Building Parcel-POI Corpus and Obtaining Vectors of POI Categories
In the field of natural language processing (NLP), a corpus usually refers to a large and organized collection of well-sampled and processed texts [37]. Specifically, a corpus comprises many documents, and each document contains several words. In a corpus, the sequential order of the documents and of the words therein represents the contextual relationships [14]. Applying these concepts to UIL identification, the study area, each parcel therein, and the POIs within each parcel were considered as the corpus, a document, and the words within a document, respectively. Level-3 categories of POIs were used as the words composing the documents in order to guarantee a sufficient number of effective words.
In our study, the Parcel-POI corpus was constructed using the shortest path method proposed by Yao et al. [14]. With this method, the POIs in a parcel are connected to each other through their spatial relations. Before building the Parcel-POI corpus, we first selected the POIs within each parcel according to the parcel’s location. The corpus was then constructed as follows. First, for each parcel, the shortest path passing through all POIs was calculated and the sequential order of the POIs was recorded. Next, parcel-based documents were constructed using words (i.e., level-3 categories of POIs) in the same order as that in which the POIs were visited. Similarly, the sequential order of the parcel-based documents in the corpus was determined by calculating the shortest path passing through all parcels. Through this organization of the corpus, the spatial distribution attributes and location relationships of the POIs were captured by the contextual relationships of the words (i.e., level-3 categories of POIs) [14].
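To illustrate how a spatial ordering of POIs becomes a document, the following sketch orders the POIs of one parcel and emits their category labels in that order. It is only an approximation under stated assumptions: a greedy nearest-neighbor walk starting from the first POI is used as a simple stand-in for the shortest path through all POIs, and the function names and sample coordinates are hypothetical.

```python
import math


def order_by_nearest_neighbor(points):
    """Greedily order (x, y, label) tuples, starting from the first point.

    This heuristic is a stand-in for a true shortest path through all points.
    """
    if not points:
        return []
    remaining = list(points[1:])
    path = [points[0]]
    while remaining:
        last = path[-1]
        # Pick the unvisited point closest to the most recently visited one.
        nxt = min(
            remaining,
            key=lambda p: math.hypot(p[0] - last[0], p[1] - last[1]),
        )
        remaining.remove(nxt)
        path.append(nxt)
    return path


def parcel_document(pois):
    """Return the POI category labels of a parcel in visiting order."""
    return [label for _, _, label in order_by_nearest_neighbor(pois)]


# Made-up POIs: (x, y, level-3 category).
pois = [(0.0, 0.0, "bank"), (5.0, 0.0, "hotel"),
        (1.0, 0.0, "cafe"), (2.0, 1.0, "cinema")]
document = parcel_document(pois)
```

The same ordering idea could be applied to parcel centroids to fix the order of documents in the corpus.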
The CBoW model in the Word2Vec tool was selected to train the vectors of POI categories in our study. Word2Vec, released by Google, is an open source tool for word embedding in natural language processing [20]. By building a neural network language model on a given corpus, the Word2Vec model can map words into high-dimensional vectors via their contextual content [20].
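The notion of contextual content can be made concrete by listing the training examples that a CBoW model sees: each word is predicted from the words around it within a fixed window. The sketch below only generates these (context, target) pairs for one document; the function name `cbow_pairs` is hypothetical, and the network training itself, along with details of real implementations such as subsampling, is left out.

```python
def cbow_pairs(document, window):
    """Return (context_words, target_word) pairs for one document.

    The context of a word is the up-to-`window` words on each side of it.
    """
    pairs = []
    for i, target in enumerate(document):
        left = document[max(0, i - window):i]
        right = document[i + 1:i + 1 + window]
        context = left + right
        if context:  # A single-word document has no context to learn from.
            pairs.append((context, target))
    return pairs


# A toy parcel document of POI categories, with a window of one word.
pairs = cbow_pairs(["bank", "cafe", "cinema", "hotel"], window=1)
```

Because the documents are ordered spatially, POI categories that tend to be close together on the ground appear in each other's contexts and end up with similar vectors.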
3.2.3. Extracting UIL Types for Each Parcel
Parcel vectors were calculated by averaging all POI vectors inside a parcel, following the study of Zhuang et al. [38]. Assuming that there are $M$ POIs in the $i$th parcel, the parcel vector can be expressed as follows:

$$V_i = \frac{1}{M} \sum_{k=1}^{M} v_{ik}$$

where $v_{ik}$ represents the vector of the $k$th POI category in the $i$th parcel, and $V_i$ represents the vector of the $i$th parcel.
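The averaging of POI vectors can be written in a few lines. This is a minimal sketch with a hypothetical function name and toy two-dimensional vectors; in the study the vectors have many more dimensions.

```python
def parcel_vector(poi_vectors):
    """Return the element-wise mean of the POI vectors inside one parcel."""
    if not poi_vectors:
        raise ValueError("parcel contains no POIs")
    count = len(poi_vectors)
    dim = len(poi_vectors[0])
    return [sum(vec[d] for vec in poi_vectors) / count for d in range(dim)]


# Three toy POI vectors in one parcel.
vec = parcel_vector([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

A parcel without POIs has no defined vector, which is consistent with such parcels being excluded from the analysis.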
The actual UIL types of the parcels that participated in the analysis were identified using the online digital map, street view data, and field surveys. Because some parcels contained multiple types, the main UIL type of these mixed parcels was used. Half of the parcels that participated in the analysis were randomly selected as training samples, and the remaining half were used as testing samples. The training samples were used to train the model parameters, with parcel vectors as the input and their actual UIL types as labels. The trained model was then used for precision assessment on the testing samples.
In our study, the random forest model [39,40] was selected to extract UIL types for parcels. The Out-Of-Bag (OOB) [8] accuracy, which provides an approximately unbiased estimate of generalization performance, was used to evaluate the prediction precision of the random forest model. The classification process was repeated 50 times, and the overall accuracy was used as the evaluation criterion.
3.3. UIL Use Efficiency Evaluation
Economic and environmental benefits were accounted for during the selection of urban land use efficiency indicators [27]. Referring to previous studies on indicator selection and considering the availability of data [21,23], we selected total assets, total profits, and number of employees as economic benefits; environmental benefits included green space coverage, number of days with good air quality, energy consumption, and net demand of water. After obtaining these indicators for the UIL parcels, entropy-weight TOPSIS was used to evaluate the use efficiency of each UIL parcel.
3.3.1. Use Efficiency Indicators Acquisition
In our study, green space coverage for each UIL parcel refers to the proportion of vegetation within the parcel, as shown in Equation (4).

$$\mathrm{GSC} = \frac{A_{\mathrm{veg}}}{A_{\mathrm{parcel}}} \qquad (4)$$

where GSC represents green space coverage, $A_{\mathrm{veg}}$ is the vegetation area within the parcel, and $A_{\mathrm{parcel}}$ is the parcel area.
The number of days with good air quality for UIL parcels was calculated as follows. Based on the distribution of 35 monitoring stations, 366 daily AQI surfaces on a 10 m grid were created using Kriging interpolation, together with gridded surfaces of the standard error to describe the prediction error. The daily AQI of each UIL parcel was obtained by spatially averaging all the gridded daily AQI values inside the parcel. The number of days with good air quality for each UIL parcel was obtained by counting the days in the year with an AQI below 50.
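The last two steps of this procedure (spatial averaging per day, then counting days under the threshold) can be sketched as follows. The interpolation step is not shown; the input is assumed to be, for each day, the list of interpolated grid values that fall inside one parcel, and the function names and sample values are hypothetical.

```python
def parcel_daily_aqi(grid_values_by_day):
    """Average the gridded AQI values inside a parcel, one mean per day."""
    return [sum(cells) / len(cells) for cells in grid_values_by_day]


def count_good_air_days(daily_aqi, threshold=50):
    """Count the days whose parcel-level AQI is below the threshold."""
    return sum(1 for aqi in daily_aqi if aqi < threshold)


# Made-up grid values inside one parcel for four days.
grid_values_by_day = [[40, 44], [55, 65], [48, 50], [50, 50]]
daily = parcel_daily_aqi(grid_values_by_day)
good_days = count_good_air_days(daily)
```

Note that the comparison is strict, so a day with a mean AQI of exactly 50 is not counted; this follows the wording "below 50" and would need adjusting if the intended rule is "50 or below".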
Indices connected with industrial development for UIL parcels, including total profits, total assets, number of employees, energy consumption, and net demand of water, were calculated as follows. Street-level statistical values were assigned to street units, and 200 m × 200 m gridded data were created within the street units’ boundaries covering the study area. The value of each grid cell was determined using the areal weighting method. Each of these indices was then calculated for a UIL parcel by summing all the gridded values of that index inside the parcel.
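A minimal sketch of the areal weighting idea is given below, assuming that a street-level total is split across the grid cells of the street unit in proportion to the cell area lying inside the unit, and that a parcel's value is the sum over the cells assigned to it. The function names, cell identifiers, and numbers are hypothetical, and the geometric overlay that produces the areas is omitted.

```python
def allocate_to_cells(street_value, cell_areas):
    """Split a street-level total across cells in proportion to their area.

    `cell_areas` maps a cell id to the cell area (m^2) inside the street unit.
    """
    total_area = sum(cell_areas.values())
    return {
        cell: street_value * area / total_area
        for cell, area in cell_areas.items()
    }


def parcel_total(cell_values, cells_in_parcel):
    """Sum the values of the grid cells that fall inside a parcel."""
    return sum(cell_values[cell] for cell in cells_in_parcel)


# Two full 200 m x 200 m cells and one cell clipped by the street boundary.
cell_areas = {"c1": 40000.0, "c2": 40000.0, "c3": 20000.0}
cell_values = allocate_to_cells(1000.0, cell_areas)
total = parcel_total(cell_values, ["c1", "c3"])
```

By construction the cell values add up to the street-level total, so the disaggregation preserves the official statistics.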
3.3.2. Use Efficiency Evaluation Method
Entropy-weight TOPSIS, a multi-criteria decision analysis technique, was adopted to calculate UIL use efficiency scores. This model introduces the entropy-weight method [41] into TOPSIS: first, the entropy-weight method uses information entropy to determine the weights of the indicators, wherein the smaller the information entropy, the greater the weight of the indicator [42]; then, TOPSIS calculates the proximity of each evaluated object to the positive and negative ideal solutions and determines comprehensive evaluation scores accordingly [28,43]. The process of calculating UIL use efficiency scores was as follows:
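The first step was data standardization.

Consider the indicator matrix $X = (x_{ij})_{m \times n}$, where $m$ denotes the number of UIL parcels, $n$ denotes the number of indexes, and $x_{ij}$ denotes the value of index $j$ of the $i$th UIL parcel ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$).

Vector normalization was applied to convert the original index value $x_{ij}$ into the normalized index value $y_{ij}$:

$$y_{ij} = \frac{x_{ij}}{\sqrt{\sum_{k=1}^{m} x_{kj}^{2}}}$$

The second step was index weight determination by the entropy-weight method.

We calculated the proportion of index $j$ in the $i$th UIL parcel:

$$p_{ij} = \frac{y_{ij}}{\sum_{k=1}^{m} y_{kj}}$$

We then determined the information entropy value of index $j$:

$$e_j = -\frac{1}{\ln m} \sum_{i=1}^{m} p_{ij} \ln p_{ij}$$

We calculated the weight value of index $j$:

$$w_j = \frac{1 - e_j}{\sum_{l=1}^{n} (1 - e_l)}$$

The third step was the determination of UIL evaluation scores by TOPSIS.

We determined a set of weights $W = (w_1, w_2, \ldots, w_n)$ for $n$ indexes and calculated the weighted normalized indicator matrix ($Z$):

$$Z = (z_{ij})_{m \times n}, \qquad z_{ij} = w_j \, y_{ij}$$

We then calculated the positive and negative ideal solutions:

$$Z^{+} = (z_1^{+}, \ldots, z_n^{+}), \quad z_j^{+} = \max_{i} z_{ij}; \qquad Z^{-} = (z_1^{-}, \ldots, z_n^{-}), \quad z_j^{-} = \min_{i} z_{ij}$$

Then, we determined $D_i^{+}$ and $D_i^{-}$ by calculating the Euclidean distances between each UIL parcel’s evaluated value and the positive and negative ideal solutions:

$$D_i^{+} = \sqrt{\sum_{j=1}^{n} \left(z_{ij} - z_j^{+}\right)^{2}}, \qquad D_i^{-} = \sqrt{\sum_{j=1}^{n} \left(z_{ij} - z_j^{-}\right)^{2}}$$

Finally, we calculated the evaluation score ($C_i$) for each UIL parcel; a higher value of $C_i$, which ranges from 0 to 1, reflected a greater degree of utilization of the UIL parcel:

$$C_i = \frac{D_i^{-}}{D_i^{+} + D_i^{-}}$$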
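The three steps can be put together in a short routine, shown here as a sketch under stated assumptions rather than the implementation used in the study. The function name `entropy_weight_topsis` is hypothetical; all index values are assumed to be positive, and every index is treated as benefit-type (larger is better). How cost-type indexes such as energy consumption are handled is not specified in the text, so they would need to be transformed before being passed in.

```python
import math


def entropy_weight_topsis(matrix):
    """Return (weights, scores) for an m x n matrix of positive index values.

    Rows are parcels and columns are indexes; all indexes are assumed to be
    benefit-type. Requires m > 1 and at least one non-constant column.
    """
    m, n = len(matrix), len(matrix[0])

    # Step 1: vector normalization of each column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    y = [[row[j] / norms[j] for j in range(n)] for row in matrix]

    # Step 2: entropy weights (lower entropy gives a higher weight).
    divergences = []
    for j in range(n):
        col_sum = sum(y[i][j] for i in range(m))
        p = [y[i][j] / col_sum for i in range(m)]
        entropy = -sum(pij * math.log(pij) for pij in p if pij > 0) / math.log(m)
        divergences.append(1 - entropy)
    total = sum(divergences)
    weights = [d / total for d in divergences]

    # Step 3: TOPSIS on the weighted normalized matrix.
    z = [[weights[j] * y[i][j] for j in range(n)] for i in range(m)]
    z_pos = [max(z[i][j] for i in range(m)) for j in range(n)]
    z_neg = [min(z[i][j] for i in range(m)) for j in range(n)]
    scores = []
    for i in range(m):
        d_pos = math.sqrt(sum((z[i][j] - z_pos[j]) ** 2 for j in range(n)))
        d_neg = math.sqrt(sum((z[i][j] - z_neg[j]) ** 2 for j in range(n)))
        scores.append(d_neg / (d_pos + d_neg))
    return weights, scores


# Toy example: three parcels, two indexes, second index is twice the first.
weights, scores = entropy_weight_topsis([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
```

In this toy example the two indexes carry the same information, so they receive equal weights, and the parcel with the lowest and highest values on both indexes obtain the minimum and maximum scores, respectively.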
4. Analysis and Results
4.1. Identification Results and Spatial Distribution of UIL
The study area included a total of 613,973 POIs, which were distributed into 2888 parcels. Due to lack of POIs, 117 parcels were removed and excluded from the analysis. A total of 1548 level-3 categories of POIs were included in building the Parcel-POI corpus. When using the CBoW model to construct word vectors, the sampling window size, output vector dimensions, and number of iterations were set to 5, 200, and 20, respectively. The default settings were maintained for other parameters. The spatial distributions of POIs were represented by the contextual relationships of attribute categories via the conversion of the CBoW model. Thus, 1548 characteristic vectors of level-3 categories of POI were obtained.
Characteristic vectors of 1444 parcels were used as training samples to construct the random forest model. The characteristic vectors of the same number of parcels were used as testing samples to evaluate the UIL identification precision of the random forest model. The OOB and testing precisions were 83.19% and 92.24%, respectively. The confusion matrix for identifying UIL types using the testing samples is shown in Table 2.
Other types were well differentiated from commercial and industrial parcels because the producer’s and user’s accuracies were more than 95%. Some of the other types were misclassified as commercial parcels because they contained many POIs that were involved in commercial content; however, the main function of these parcels was not commercial. Commercial parcels were classified slightly better than industrial parcels. Misclassified commercial parcels were principally the large parcels, which were likely to contain other types.
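For readers less familiar with these measures, the sketch below shows how overall, producer's, and user's accuracies are derived from a confusion matrix. The counts are made up for illustration and are not the values of Table 2; the function name is hypothetical, and rows are assumed to hold the reference (actual) classes while columns hold the predicted classes.

```python
def accuracy_metrics(confusion, labels):
    """Compute overall, producer's, and user's accuracies.

    confusion[i][j] is the number of samples of reference class i
    that were predicted as class j.
    """
    k = len(labels)
    total = sum(sum(row) for row in confusion)
    overall = sum(confusion[i][i] for i in range(k)) / total
    # Producer's accuracy: correct / all reference samples of the class.
    producers = {
        labels[i]: confusion[i][i] / sum(confusion[i]) for i in range(k)
    }
    # User's accuracy: correct / all samples predicted as the class.
    users = {
        labels[j]: confusion[j][j] / sum(row[j] for row in confusion)
        for j in range(k)
    }
    return overall, producers, users


labels = ["commercial", "industrial", "other"]
confusion = [[40, 2, 8], [3, 15, 2], [5, 0, 125]]  # Made-up counts.
overall, producers, users = accuracy_metrics(confusion, labels)
```

A low producer's accuracy for a class means that many of its parcels were missed, whereas a low user's accuracy means that many parcels assigned to the class actually belong elsewhere.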
There were a total of 602 UIL parcels within the study region (Table 3). The region between the 4th and 5th ring roads possessed the largest number of commercial parcels, which possibly contributed to reducing congestion in the core area. The distribution of UIL presented a ring structure, developing outwards along the ring roads (Figure 4a), and contained concentrated commercial areas, such as commercial shopping centers (e.g., Xidan, Wangfujing, and Zhongguancun commercial districts) and business office buildings (e.g., Jinrong Street and Guomao business district) (Figure 4b). At the street level, UIL parcels were mainly distributed in Sanlitun Street, Chaowai Street, Dongzhimen Street, Jinrong Street, Zhongguancun Street, and Wangjing Street.
Commercial parcels in Xicheng and Dongcheng districts were mainly used for commercial services (shopping centers) and businesses (finance services). In Haidian, Fengtai, and Chaoyang districts, some were also used for businesses (information services) and entertainment. Industrial parcels were mainly concentrated in the western and southern areas between the 4th and 5th ring roads (Figure 5a), which was likely connected to land prices and the convenience of transportation conditions, as well as urban planning policies.
Ripley’s L(d) value was used to measure the multi-distance spatial agglomeration pattern of UIL within Beijing’s fifth ring road. The L(d) value of UIL was greater than 0, indicating that UIL exhibited spatial agglomeration (Figure 5b). Meanwhile, as the distance increased from 0 to 9 km, the agglomeration intensity first increased and then slowly decreased.
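The interpretation of the sign of L(d) can be illustrated with a naive estimator. The sketch assumes the common form L(d) = sqrt(K(d)/π) − d, for which values above 0 indicate clustering relative to complete spatial randomness, and it applies no edge correction, which a GIS implementation would normally include. The function name, point coordinates, and study area are hypothetical.

```python
import math


def ripley_l(points, d, area):
    """Naive Ripley's L(d) for 2-D points, without edge correction."""
    n = len(points)
    # Count ordered pairs of distinct points no farther apart than d.
    pairs = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if math.hypot(dx, dy) <= d:
                pairs += 1
    k = area * pairs / (n * (n - 1))
    return math.sqrt(k / math.pi) - d


# Four tightly clustered points versus four dispersed points in a 10 x 10 area.
clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
dispersed = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
l_clustered = ripley_l(clustered, d=0.5, area=100.0)
l_dispersed = ripley_l(dispersed, d=0.5, area=100.0)
```

Evaluating the function over a range of distances gives the multi-distance curve, whose peak marks the scale at which clustering is strongest.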
4.2. Criteria, Evaluation Results, and Spatial Distribution of UIL Use Efficiency
Using the methods in Section 3.3.1, seven criteria were calculated. The weights of the seven criteria were determined by the entropy-weight method described in Step 2 of Section 3.3.2, and four criteria (i.e., total assets, total profits, number of employees, and green space coverage) had more impact on UIL use efficiency.
There was no obvious distribution pattern of the green space coverage of UIL. In particular, the green space coverage of commercial parcels that were mainly used for entertainment was up to 90%, and commercial parcels mainly used for business had higher green space coverage than those used for accommodation or commercial services. This was related to UIL types and specific greening planning for construction projects.
On the whole, the number of days with good air quality was marginally higher in commercial parcels than in industrial parcels. Obvious north‒south differences existed, and the value of the southern area was relatively lower, which was related to industry distribution and government controls of air quality.
There was a south–north trend in total assets, total profits, number of employees, energy consumption, and net demand of water: high values of parcels comprising these five indicators aggregated inside the 3rd ring road, which was connected to the highly concentrated financial services and business service industries.
The use efficiency evaluation scores ranged from 0.10 to 0.71 and were reclassified into four classes: I (0.1–0.2), II (0.2–0.3), III (0.3–0.4), and IV (above 0.4). Figure 6a presents the proportions of the four score classes between each pair of ring roads. Use efficiency scores showed a north‒south gradient, and scores inside the 3rd ring road were better than those in other regions (Figure 6b). The evaluation scores of commercial parcels were clearly better than those of industrial parcels, and scores above 0.4 mostly appeared in commercial parcels whose main function was accommodation and business (e.g., commercial office buildings, financial services, and hotels).
Global spatial autocorrelation measures the overall degree of correlation and the spatial distribution pattern of spatial objects in a region, together with their significance. The overall correlation of the UIL utilization degree was obtained from the UIL use efficiency scores using global Moran’s I. The global Moran’s I value of 0.647 (p = 0.001) passed the significance test at the 1% level (Figure 7a), indicating that the UIL utilization degree showed positive spatial correlation. The Local Indicators of Spatial Association (LISA) cluster map was drawn to describe local spatial heterogeneity characteristics. High–high regions contained 88 commercial parcels and were located in areas roughly consistent with the concentrated commercial areas (Figure 7b).
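The global Moran's I statistic itself is straightforward to compute once a spatial weights matrix has been chosen. The sketch below uses the standard definition with a toy binary adjacency matrix; the function name and data are hypothetical, and neither the weights scheme nor the significance test used in the study is reproduced here.

```python
def morans_i(values, weights):
    """Global Moran's I for values with an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # Sum of all spatial weights.
    cross = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    variance_sum = sum(d * d for d in dev)
    return (n / s0) * cross / variance_sum


# Four parcels in a row; each parcel neighbors the adjacent ones.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
i_gradient = morans_i([1.0, 2.0, 3.0, 4.0], chain)     # Similar neighbors.
i_alternating = morans_i([1.0, 4.0, 1.0, 4.0], chain)  # Dissimilar neighbors.
```

A smoothly varying pattern yields a positive value, whereas an alternating pattern yields a negative one, which is the sense in which a value of 0.647 indicates that parcels with similar efficiency scores tend to be neighbors.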
5. Discussion
5.1. Industrial Development in Beijing
Since 2004, the change in the industrial structure has accelerated in Beijing. The central city area principally consists of finance and business offices, and traditional manufacturing industries have gradually been moved outwards, with the aim of alleviating the excessive agglomeration of industrial functions in central urban areas. Industries in key functional areas are more concentrated, such as the financial industry in Jinrong Street, business services in Guomao business district, and information services in Zhongguancun. This is consistent with the results of UIL identification with POIs. Certain problems with industrial land have been reported, such as relatively scattered distribution, smaller surface areas, and a lower degree of development and utilization. Our study reflects this and shows that industrial land is principally scattered between the 4th and 5th ring roads, and that its degree of utilization, as calculated by the entropy-weight TOPSIS method, is low.
5.2. Implication of UIL Identification and Use Efficiency Analysis
With easily accessible and timely updated POI data, the current situation of UIL can be obtained and thus provide support for urban land use planning. UIL supports the social and economic activities of cities, and UIL identification can intuitively provide insight into the layout, distribution pattern, and density of UIL. The UIL layout is the comprehensive result of the joint action of economic, technological, and policy factors, and the evolution of the UIL layout reflects changes in the industrial structure and economic development. The spatial distribution and combination of urban land can affect urban functional operation, residents’ quality of life, and the level of urban development. Because UIL is an important urban land use type, its rational arrangement is a key issue of urban development. For instance, the rational arrangement of UIL and residential land can contribute to the easing of urban commuting pressure and thereby reduce environmental pollution [44]. Meanwhile, the causes of the present situation of UIL can be analyzed by considering economic, social, and environmental factors.
UIL use efficiency reflects the realization degree of UIL value in the process of urban economic development, and is related to factors of population size, economic level, traffic location, industrial structure, and input of production factors. By analyzing the spatial distribution of UIL use efficiency, measures can be taken to improve UIL utilization degree.
5.3. Incorporating Multi-Source Data
POI categories are cognized and conceptualized by individuals, which can help us better understand interactions between the urban environment and UIL types. Compared with traditional urban land use mapping methods or aerial photo interpretation, the utilization of POI data provides a new perspective for UIL identification. POIs used for constructing characteristic vectors are continually updated on the Internet, thus providing a quick method for acquiring the newest UIL types and enhancing the convenience of decision makers for understanding UIL layouts.
We constructed use efficiency evaluation indicators based on economic benefits and environmental impacts. Combined with multi-source data that included remote sensing data, ground observation data and statistical data, a better understanding of UIL use efficiency can be achieved in comparison to solely depending on statistical data. Similar to POI data, remote sensing data and ground observation data can also be updated on a regular basis. However, data related to industrial development are only available through official statistics at present. This suggests that up-to-date UIL use efficiency evaluation results can be maintained through a specific time-scale update.
5.4. Issues Associated with OSM-Segmented Parcels
There were some intrinsic problems with parcels generated from road networks because the quality of the parcels depended on the accuracy of OSM. The parcels generated for the study region were satisfactory in the central area but not in marginal areas where the road networks were sparse. Thus, owing to the quality of OSM, parcels with mixed UIL types were distributed particularly between the 4th and 5th ring roads. Within some parcels, commercial buildings lacked clear boundaries with residential districts, university areas, or government offices, which prevented them from being correctly classified. For example, “Beijing Ningxia Hotel”, shown in Figure 8a, was located among residences, but no specific separation could be demarcated between the two. In addition, some large parcels were a mix of UIL types and other types, such as green spaces and residences, because few roads divided them into distinct units. For example, “Beijing Xinfadi Logistics Centers” were located in a large parcel mixed with other types without specific roads separating them (Figure 8b).