Identification and Portrait of Urban Functional Zones Based on Multisource Heterogeneous Data and Ensemble Learning

Urban functional zones are important space carriers for urban economic and social function. The accurate and rapid identification of urban functional zones is of great significance to urban planning and resource allocation. However, the factors considered in the existing functional zone identification methods are not comprehensive enough, and the recognition of functional zones stops at their categories. This paper proposes a framework that combines multisource heterogeneous data to identify the categories of functional zones and draw the portraits of functional zones. The framework comprehensively describes the features of functional zones from four aspects: building-level metrics, landscape metrics, semantic metrics, and human activity metrics, and uses a combination of ensemble learning and active learning to balance the identification accuracy of functional zones and the labeling cost during large-scale generalization. Furthermore, sentiment analysis, word cloud analysis, and land cover proportion maps are added to the portraits of typical functional zones to make the image of functional zones vivid. The experiment carried out within the Fifth Ring Road, Haidian District, Beijing, shows that the overall accuracy of the method reached 82.37% and the portraits of the four typical functional zones are clear. The method in this paper has good repeatability and generalization, which is helpful to carry out quantitative and objective research on urban functional zones.


Introduction
According to people's different social and economic activities, cities are divided into different functional zones, which are the basic units of urban planning, management, and resource allocation [1][2][3][4][5][6]. The accurate mapping of functional zones is of great significance for the quantitative analysis of urban traffic, the balance of workplaces and residences, and residents' relocation [7], which are also helpful to economic and demographic research [8,9]. In traditional urban planning, urban functional zones are first planned and then constructed, and the distribution of functional zones must follow rules. However, it is difficult to replan and construct built-up areas in many large and medium-sized cities. At the same time, the division of urban functional zones formulated by the government usually takes administrative divisions as the unit, which only indicates the functional distribution on a macro level. In this case, the accurate division of fine-scale urban functional zones in existing built-up areas is of great significance to the understanding and management of cities.
Remote Sens. 2021, 13, 373
In previous studies that have examined the research unit of fine-scale functional zones, the partitioning of functional zones involves the problem of modifiable area units [10], so it is necessary to determine the division scale of functional zones before describing their categories. In the literature [11], the methods for dividing functional zone units are summarized into three types: grid-based methods, block-based methods, and cadastral data-based methods. Grid-based methods usually create fishnets that cover the study area according to a certain space interval and then determine their functional attributes [12,13]. This segmentation method is simple and easy to operate, but the boundary obtained is not accurate. A grid may contain multiple functional zones, or a functional zone may be segmented by multiple grids. Block-based methods divide functional zones by road networks and then analyze the functional attributes of these traffic analysis zones (TAZs) [2,7,14,15,16]. Road networks usually come from the popular collaborative mapping project OpenStreetMap (OSM) or the navigation data of Internet maps. These TAZs are more in line with the urban planning units, but the simple qualitative analysis of functional categories in previous studies cannot help urban planners obtain a good understanding of functional zones. The cadastral data-based methods use the land parcel in cadastral surveys as the unit of functional zone identification [17]. Cadastral data are detailed and accurate, but they are slow to update and sometimes difficult to obtain [18]. In recent years, there have been studies on functional zones based on segmentation units. These studies usually use object-oriented segmentation methods to obtain functional objects and study functional zones on a more precise scale [11,19]. However, the concept of functional zones implies highly heterogeneous and complex scenes [1,20,21].
Functional zones are a combination of various land cover types with complex visual characteristics. Image segmentation produces objects with similar spectral and texture features rather than areas with homogeneous social functions, so it is better suited to land cover and land use classification than to functional zone delineation [14,22]. A functional zone should not correspond to a single land use type; rather, it is a combination of land use and land cover types that together provide functional services for people's daily lives. Therefore, the research unit of this paper is the TAZ, which is relatively fine and consistent with the units used in urban planning.
From the perspective of data and methods, image data and location-based social perception data are the main data sources suitable for urban functional area research. Very high-resolution (VHR) remote sensing images provide a wide and detailed view for the recognition of urban functional zones [1,23,24]. The buildings in the image are clearly visible, and the land cover is immediately apparent. By using high-resolution remote sensing data as a single data source, Zhou et al. proposed a super object convolutional neural network based on super objects from image segmentation. The functional attributes of super objects were determined by voting on the functional categories of random points in them [11]. However, most remote sensing images are focused on the natural characteristics of land cover, with abundant spectral and texture information and a lack of information on human and social activities [7]. Therefore, in functional zone studies, remote sensing images are usually combined with other types of data to conduct research. With the development of information technology and the construction of smart cities, new urban social perception big data and corresponding data mining methods are emerging. Point of interest (POI) data are one of the most common types of data in the study of functional zone identification. POI data are widely used because of their accessible and rich semantic and land use information [2,25]. Zhang et al. proposed using public bicycle rental records and POI data to identify urban functional zones. The topic model and the unsupervised classification method are used to cluster the functional zones [15]. Song et al. used VHR remote sensing images and POI data to calculate the function value of a TAZ by using building rooftop functional segments and the corresponding segmental weight coefficient [7]. Chao et al. used social media check-in records and street view imagery to identify functional zones. 
Verbs in the check-in records are used to represent human activities, and street view data are used to represent the natural environment. The categories of urban functional zones are predicted based on deep learning [26]. Mobile phone signaling data, a new type of human activity data, have also appeared in the study of urban functional zone recognition. Tu et al. studied the feasibility of hierarchical clustering by combining mobile phone positioning data and VHR remote sensing data to identify functional zones [12]. In addition, real-time population heat map data, floating car trajectory data, and bus smart card data appear as human activity data in the study of functional area recognition [13,16,27].
Generally, these studies on functional zones involve buildings, the natural environment, functional semantics, and human activities. Previous studies often relate features to functional zones on the basis of only one or a few of these aspects and rarely consider all four together. In fact, all four aspects reflect the functions of zones. Functional zones are closely related to human activities. Buildings are the main space of human activities in the city, so their shape must also conform to their function. In addition, landscape design is not only based on aesthetic or ecological considerations but is also closely related to function. POIs are intuitive reflections of the functional zones. These four aspects are therefore directly related to the recognition of functional zones. The purpose of this paper is to comprehensively consider these four metrics of building, landscape, semantics, and human activities to complete the identification of urban functional zones. In addition, this paper aims to address two other aspects ignored in previous studies of functional zones. On the one hand, large-scale functional zone recognition differs from small-scale experiments, and one of the important differences is sample labeling. In small-scale experiments, all the training samples can be labeled in order to obtain better classification accuracy. However, large-scale application scenarios need to fully consider the cost of sample labeling. How to balance identification accuracy against labeling cost when extending a small-scale experimental method to large-scale application scenarios is one of the problems that this paper attempts to solve. On the other hand, this paper tries to make portraits of urban functional zones.
Previous studies stop at the recognition of functional zones' categories, but categories are only the most basic information of functional zones. People's emotions in the process of interaction with urban functional space, various types of POI, and the percentage of vegetation coverage can reflect the characteristics of functional zones from different aspects, which together form the portraits of functional zones.
Based on the above three objectives, a series of experiments are designed in this paper. This study contributes in three ways. First, this paper comprehensively describes the characteristics of urban functional zones in four different aspects, building-level metrics, landscape metrics, semantic metrics, and human activity metrics, and effectively identifies the categories of functional zones. Second, active learning is introduced in the process of sample labeling to balance the accuracy and the cost of sample labeling in large-scale urban functional zone recognition. Third, portraits of functional zones create a multiperspective description mode of functional zones.

Study Area
In this study, we selected the central urban area of Haidian District, Beijing, China as the study area. Specifically, the study area is part of Haidian District within the scope of the Fifth Ring Road (Figure 1), covering an area of approximately 145 square kilometers. The study area belongs to the center of Beijing, and its functional categories are complex and comprehensive. There are government agencies, campuses, technology companies, business centers, famous attractions, residential areas, and so on. The complexity of the functional categories of the study area is conducive to extending the method of this research to other regions.

Materials and Preprocessing
This paper involves multisource heterogeneous data. Table 1 is an overview of the data sources. Because these heterogeneous data differ in their characteristics and data structures, each type of data requires its own preprocessing. Sections 2.2.1-2.2.4 introduce the basic information and preprocessing methods of each data type in more detail.

Remote Sensing Images
Two kinds of remote sensing images are used in this study: Google Maps images and GaoFen-2 (GF-2) images. The resolution of the selected Google Maps image is 0.6 m, and the image acquisition times are concentrated in 2018-2019. GF-2 is a high-resolution remote sensing satellite launched in 2014 and equipped with panchromatic and multispectral sensors with resolutions of 1 m and 4 m, respectively [14]. A GF-2 multispectral image acquired on May 9, 2019 was used in this study. Its acquisition time was similar to that of the Google Maps image. These two image sources were chosen because they complement each other: Google Maps images have high spatial resolution, which ensures the accurate extraction of buildings, but only the visible light bands can be used, whereas the multispectral bands of the GF-2 image provide more information for subsequent land cover extraction. The obtained Google Maps image can be used directly, but the GF-2 image requires a series of preprocessing steps. First, the parameters provided by the China Center for Resources Satellite Data and Application were used for radiometric calibration. Second, atmospheric correction was performed to eliminate errors caused by atmospheric scattering, absorption, and reflection. Third, orthorectification was performed by using the rational polynomial model. Finally, Google Maps images were used as a reference for georeferencing, so that the two images were aligned with each other.

Urban Road Network
Urban planning and development often take road-based parcels as the basic unit, and previous research on urban functional zones based on road networks has also achieved good results [7,28]. OSM is a crowdsourcing project to create a free and editable map service. In this study, the road networks in OSM were reclassified according to the importance level [29], and the results were consistent with the current "highway engineering technical standard" (JTG B01-2014). After removing the corresponding buffer zones of highways and level 1-3 roads, TAZs were obtained. Finally, after removing the TAZs surrounded by the ring-shaped overpass that were too small and had no socioeconomic functions, a total of 694 TAZs were obtained for subsequent analysis.

Semantic Data
In recent years, an increasing amount of semantic data with geographical markers have appeared in people's lives, such as locations on Internet maps, comments on consumer review sites (e.g., Yelp), and real-time information shared on social network platforms (e.g., Twitter). All of these text data are geotagged. In this paper, two kinds of social sensing semantic data with geographical markers are used: POI data and Weibo (known as Chinese Twitter) check-in data.
The POI data come from Amap (https://www.amap.com) and Baidu map (https://map.baidu.com), which are two of the most popular Internet map service providers in China. A total of 223,088 POI data points were collected in the study area in June 2019. POI data have different levels: Baidu POI data contain two levels, while Amap POI data contain three levels. In this study, to fully extract the semantic information of land use in POI data and avoid data redundancy, we selected the second-level classification of all POI data. The preprocessing of POI data therefore consisted mainly of data cleaning: discarding duplicate values, handling missing values, processing outliers, and allocating POI data to the corresponding TAZs.
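The cleaning and allocation steps can be illustrated with a minimal sketch. It assumes that POIs are dictionaries with hypothetical `name`, `category`, `x`, and `y` fields, that TAZs are simple polygons given as vertex lists in the same projected coordinate system, and that a POI falling outside every TAZ is treated as an outlier and dropped. None of these details are specified in the paper.

```python
from collections import Counter

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon (list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def clean_pois(pois):
    """Drop records with missing fields and exact duplicates."""
    seen = set()
    cleaned = []
    for poi in pois:
        key = (poi.get("name"), poi.get("category"), poi.get("x"), poi.get("y"))
        if None in key or key in seen:
            continue
        seen.add(key)
        cleaned.append(poi)
    return cleaned

def allocate_to_tazs(pois, tazs):
    """Return {taz_id: Counter of second-level categories} for POIs inside each TAZ."""
    counts = {taz_id: Counter() for taz_id in tazs}
    for poi in pois:
        for taz_id, polygon in tazs.items():
            if point_in_polygon(poi["x"], poi["y"], polygon):
                counts[taz_id][poi["category"]] += 1
                break  # TAZs are assumed not to overlap
    return counts

# Hypothetical toy data: two square TAZs and five raw POI records.
tazs = {
    "taz_1": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "taz_2": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
pois = [
    {"name": "Cafe A", "category": "coffee_shop", "x": 2.0, "y": 3.0},
    {"name": "Cafe A", "category": "coffee_shop", "x": 2.0, "y": 3.0},   # duplicate
    {"name": "School B", "category": None, "x": 5.0, "y": 5.0},           # missing value
    {"name": "Mall C", "category": "shopping_mall", "x": 15.0, "y": 4.0},
    {"name": "Kiosk D", "category": "convenience_store", "x": 50.0, "y": 50.0},  # outside all TAZs
]
poi_counts = allocate_to_tazs(clean_pois(pois), tazs)
```

For real data, a spatial join in a GIS package would normally replace the brute-force loop over TAZs.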
Weibo is the largest microblogging social media platform in China. People often use Weibo to share immediate events or emotions. Sentiment analysis based on Weibo check-in data therefore describes the functional zones from an additional perspective. In this study, we collected a total of 506,165 Weibo check-in data points in the study area from July 2017 to July 2018. The preprocessing of Weibo check-in data mainly included data cleaning (deleting web page URLs, @ user names, emoji, and topic symbols contained in the text) [26], merging the check-in records according to the check-in points, and assigning the check-in points to the corresponding TAZs.
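The text cleaning and merging steps might look roughly like the sketch below. The regular expressions are assumptions rather than the authors' rules: emoji are taken to be either bracketed shortcodes such as `[happy]` or characters in common Unicode emoji ranges, and "topic symbol" is interpreted as the `#` delimiters around a topic, whose text is kept.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@[\w\-]+")
# Bracketed shortcodes (e.g. [happy]) or characters in common emoji ranges.
EMOJI_RE = re.compile(r"\[[^\[\]]{1,8}\]|[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def clean_weibo_text(text):
    """Remove URLs, @mentions, emoji, and topic symbols, then normalize whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = EMOJI_RE.sub(" ", text)
    text = text.replace("#", " ")  # drop the topic delimiters, keep the topic text
    return re.sub(r"\s+", " ", text).strip()

def merge_by_checkin_point(records):
    """Group cleaned texts by check-in point; records are (point_id, text) pairs."""
    merged = {}
    for point_id, text in records:
        cleaned = clean_weibo_text(text)
        if cleaned:  # skip posts that are empty after cleaning
            merged.setdefault(point_id, []).append(cleaned)
    return merged

# Hypothetical records for illustration only.
records = [
    ("poi_1", "Great day at the park! #Haidian# @friend_01 http://t.cn/abc [happy] 😀"),
    ("poi_1", "Lunch with colleagues"),
    ("poi_2", "@someone http://t.cn/xyz"),
]
merged = merge_by_checkin_point(records)
```

Assigning each check-in point to a TAZ is then a point-in-polygon lookup on the point coordinates.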

Human Activity Data
The change in population density in a period is one of the external manifestations of different functional zones. Therefore, the change in human activity density is also considered in this study. In past research on population spatialization, static population data have usually been derived from statistical yearbooks over the years. Population data are distributed in a certain size grid by gravity models, area weight models, inverse distance weighted models, or land use population models [30][31][32][33][34][35]. However, it is difficult for static data to support the study of urban functional zones. In recent years, an increasing number of mobile phone applications have provided more accurate services based on users' location information. The data from Real-time Tencent user (RTU) used in this study are high-resolution (25 m) user density grade products of Tencent, one of the Internet enterprises with the largest number of users in China [13,36,37]. This study collected RTU data from September 2 to September 8, 2019 (one week). The change in human activity density over time is conducive to the identification of urban functional zones.

Methodology
The flow chart of this method framework is shown in Figure 2. Image data, POI data, RTU data, and Weibo check-in data are used to describe the features of functional zones from different perspectives. These features can be summarized into four metrics in general: building-level metrics, landscape metrics, semantic metrics, and human activity metrics. In the TAZs obtained by taking road networks as the segmentation unit, the balance of accurately identifying functional zones and reducing the burden of sample labeling is achieved by combining the sampling method of active learning and the classification method of supervised ensemble learning. Subsequently, the results of category identification and sentiment analysis are combined to draw the portraits of urban functional zones.

Building-Level Metrics
Building-level metrics are important characteristics of functional zones [38]. Buildings are an important part of the city. The same types of buildings have similar requirements for area, location, and function, resulting in the gathering of the same types of buildings in the urban space [19]. For example, industrial areas or public service areas usually have large independent buildings with low aggregation density, while residential areas usually have continuous strip buildings with high aggregation density [39]. Previous studies have achieved a good distinction of functional zones by using building-based indicators [40]. Accurate and rapid large-scale building extraction is the key to obtaining building-level metrics. Traditional building extraction methods often have the disadvantage of ignoring the context information in the image [41,42]. In this study, we used a deep learning-based semantic segmentation method to extract urban buildings from 0.6 m resolution Google images. Submeter resolution images can ensure that buildings are more accurately extracted.
Specifically, D-LinkNet, a deep semantic segmentation network, was used for building extraction in this paper [43]. This network has the typical encoder-decoder structure of semantic segmentation networks. Encoder ResNet34 [44] pretrained on ImageNet [45] ensured the effectiveness and stability of feature extraction. In the center part of the network, the dilated convolution layer [46] can enlarge the receptive field and capture multiresolution context information. For the decoder part, transposed convolution layers [47] are used for upsampling to restore the original image resolution. In this study, 26 sample patches with a size of 1000 × 1000 pixels were selected for network training. As far as possible, the samples should include various types of buildings in the study area. As shown in Figure 3, buildings are marked as targets, others are marked as backgrounds, and building extraction is a binary semantic segmentation task.
The accurate extraction of buildings based on high-resolution images and deep learning can provide convenience for the subsequent calculation of building-level metrics. The results of building extraction from semantic segmentation need to be vectorized in order to calculate the perimeter and area attributes of each building vector patch. Then, taking TAZ as the statistical unit, six indexes (Table 2) of the area and edge of buildings were calculated as the building-level metrics of each TAZ.
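As an illustration of how area and perimeter attributes of vectorized building footprints can be aggregated per TAZ, consider the sketch below. It assumes each footprint is a simple polygon given as a vertex list in a projected coordinate system (metres). The summary statistics are placeholders: the six indexes of Table 2 are not reproduced in this text, so the function does not claim to compute them.

```python
import math

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def polygon_perimeter(vertices):
    """Sum of edge lengths, closing the ring back to the first vertex."""
    n = len(vertices)
    return sum(
        math.hypot(vertices[(i + 1) % n][0] - vertices[i][0],
                   vertices[(i + 1) % n][1] - vertices[i][1])
        for i in range(n)
    )

def summarize_taz_buildings(buildings):
    """Placeholder area and edge statistics for the buildings of one TAZ."""
    areas = [polygon_area(b) for b in buildings]
    perimeters = [polygon_perimeter(b) for b in buildings]
    count = len(buildings)
    return {
        "count": count,
        "total_area": sum(areas),
        "mean_area": sum(areas) / count if count else 0.0,
        "total_perimeter": sum(perimeters),
        "mean_perimeter": sum(perimeters) / count if count else 0.0,
    }

# Two hypothetical rectangular footprints: 10 m x 20 m and 5 m x 5 m.
buildings = [
    [(0, 0), (10, 0), (10, 20), (0, 20)],
    [(30, 30), (35, 30), (35, 35), (30, 35)],
]
taz_summary = summarize_taz_buildings(buildings)
```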

Landscape Metrics
[...] living needs. Their composition structure and spatial distribution pattern are different from those of other ecosystems and landscapes [48]. In landscape ecology, the landscape index is used to describe all aspects of landscape structure. The landscape index refers to some simple quantitative indicators that can highly concentrate landscape pattern information and reflect aspects of landscape structure composition and spatial configuration. The characteristics of landscape patterns can be reflected in three levels: individual patch-level metrics, class-level metrics, and landscape mosaic-level metrics [49]. Since individual patch-level metrics are not of high explanatory value to understand the structure of the whole landscape, they are often used only as the basis for calculating other landscape metrics. Therefore, this study mainly focuses on class-level metrics and landscape mosaic-level metrics.
In this study, the professional landscape index software FRAGSTATS [50] was used to analyze the landscape pattern. Since landscape measurement is usually based on classified raster data, the calculation of the landscape index starts from the classification of land cover. With the improvement of the spatial resolution of remote sensing images, the phenomenon in which the same object has different spectra and different objects have the same spectrum has emerged. Compared with the pixel-based remote sensing interpretation method, geographic object-based image analysis (GEOBIA) has become a more reasonable land cover interpretation method [51,52]. The fractal net evolution approach (FNEA) is one of the common methods in GEOBIA [53,54]. By considering both spectrum heterogeneity and shape heterogeneity, the algorithm determines whether the merging between adjacent objects is terminated to obtain the final image segmentation result. In this paper, FNEA was used to classify the land cover types (impervious, water, vegetation, and barren soil) of GF-2 images. Due to the high correlation between some landscape indexes, it is necessary to analyze the correlation in the process of index selection [14]. After taking into account the correlation between indexes and the information content, 20 landscape indexes were selected to quantify the landscape metrics based on the results of land cover classification, as shown in Table 3. Among them, there are 8 landscape mosaic-level indexes (beginning with "L_") and 12 class-level indexes (beginning with "C_").

Table 3. Landscape metrics of TAZs.
C_AI_W: The aggregation index (AI) shows the frequency of different pairs of patch types appearing side by side on the map. AI_W is the aggregation index of the water patches.
C_AI_I: AI_I is the aggregation index of the impervious surface patches.
C_AI_B: AI_B is the aggregation index of the barren soil patches.
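To make the class-level aggregation index concrete, the sketch below computes it for one land cover class on a small classified grid. It follows the definition as we understand it from the FRAGSTATS documentation: the number of like adjacencies (single count, four-neighbour) divided by the maximum possible number for a class of that area, times 100. The single-letter class codes and the handling of a one-cell class are assumptions of this sketch.

```python
from math import isqrt

def aggregation_index(grid, cls):
    """Class-level aggregation index (0-100) on a rectangular grid of class codes."""
    rows, cols = len(grid), len(grid[0])
    area = 0  # number of cells of this class
    like = 0  # like adjacencies, each neighbouring pair counted once
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != cls:
                continue
            area += 1
            if c + 1 < cols and grid[r][c + 1] == cls:
                like += 1
            if r + 1 < rows and grid[r + 1][c] == cls:
                like += 1
    if area == 0:
        return None  # class absent from this unit
    n = isqrt(area)      # side of the largest square that fits in `area` cells
    m = area - n * n     # cells left over after filling that square
    if m == 0:
        max_like = 2 * n * (n - 1)
    elif m <= n:
        max_like = 2 * n * (n - 1) + 2 * m - 1
    else:
        max_like = 2 * n * (n - 1) + 2 * m - 2
    if max_like == 0:
        return 0.0  # a single cell has no adjacencies; choice made for this sketch
    return 100.0 * like / max_like

# Hypothetical 4 x 4 units: W = water, I = impervious, V = vegetation.
compact = [list("WWII"), list("WWII"), list("VVII"), list("VVII")]
checker = [list("WIWI"), list("IWIW"), list("WIWI"), list("IWIW")]
ai_compact_water = aggregation_index(compact, "W")
ai_checker_water = aggregation_index(checker, "W")
```

A fully clumped class yields the maximum value, whereas a checkerboard arrangement, in which no two cells of the class touch along an edge, yields zero.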

Semantic Metrics
Landscape metrics describe more of the visual characteristics of urban functional zones, but the social and economic characteristics of functional zones are also worth exploring. POIs have become a common semantic information source to represent land use patterns. The topic model is a type of probability and statistics model that uses unsupervised learning to mine the hidden semantic structure of the corpus for clustering analysis [55]. Topic models are mainly used in semantic analysis and text mining in natural language processing (NLP), such as text representation, dimensionality reduction, and classification based on text topics. Latent Dirichlet allocation (LDA) is a common topic model [56]. In recent years, some studies have introduced this NLP technology into urban research to promote the study of urban functional zones [16,28,57]. In this paper, we use LDA to represent the semantic metrics of urban functional zones. As shown in Figure 4a,b, analogically speaking, the set of all POIs in the study area corresponds to the corpus, the functional zone unit corresponds to the document, and the POI in the functional unit corresponds to the word in the document. The process of semantic metric calculation corresponds to the process of document topic inference.
Figure 4. (a,b) represent the analogy between the semantic calculation of the functional zone and the process of text topic inference; (c) is the flow chart of the topic model. In subgraph (c), "w" represents "word", "fq" represents "word frequency", and "θ" represents the distribution of words in a topic. "ID" corresponds to the unique identification of each word. "LDA" is short for "Latent Dirichlet allocation".
The topic inference process is shown in Figure 4c. Specifically, gensim, a natural language processing library, was used to calculate semantic measures in this paper [58]. All the secondary classifications of POIs were aggregated into the corpus according to TAZs. Each word in the corpus was assigned an ID, and each TAZ was converted into a bag-of-words vector with gensim's doc2bow method. These vectors were used to train the LDA topic model and infer the functional topic vector of each TAZ. According to the topic coherence and the actual situation of the study area, we set the number of topics in the LDA model to 4.
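The bag-of-words encoding that precedes topic inference can be sketched in plain Python. The snippet mimics the (word ID, frequency) output format of doc2bow without depending on gensim; the TAZ identifiers and POI categories are hypothetical, and assigning IDs in sorted order is a choice made here for reproducibility.

```python
from collections import Counter

def build_dictionary(documents):
    """Assign an integer ID to every distinct POI category, in sorted order."""
    vocab = sorted({word for doc in documents.values() for word in doc})
    return {word: idx for idx, word in enumerate(vocab)}

def to_bow(doc, word2id):
    """Encode one TAZ 'document' as a sorted list of (word ID, frequency) pairs."""
    counts = Counter(doc)
    return sorted((word2id[word], fq) for word, fq in counts.items())

# Each TAZ is a document; each second-level POI category occurrence is a word.
taz_docs = {
    "taz_1": ["restaurant", "restaurant", "office"],
    "taz_2": ["school", "restaurant"],
}
word2id = build_dictionary(taz_docs)
bow_corpus = {taz_id: to_bow(doc, word2id) for taz_id, doc in taz_docs.items()}
```

In gensim, `corpora.Dictionary` and its `doc2bow` method play these two roles, and the resulting vectors can be passed to `models.LdaModel` with `num_topics=4` to obtain a topic vector for each TAZ.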

Human Activity Metrics
Human activity is one of the most important characteristics of functional zones. The population density of different functional zones changes over time in a day. The RTU data used in this paper come from the location-based service data of users on all platforms of Tencent, including QQ instant messaging, WeChat, and Tencent Map [14]. The abundant user data ensure the validity and accuracy of human activity metrics. RTU data are point vector data in which each point represents a level of human activity intensity. The space interval between adjacent points is about 25 m. In order to directly reflect the distribution of discrete human activity data in the continuous space of the study area and quantify the changes of human activity intensity over time in the functional zones, RTU data of 10:00, 15:00, and 22:00 (three typical time intervals) on weekdays and weekends were selected for kernel density analysis, and TAZs were used as the basic units to conduct zonal statistics on the results of kernel density analysis. In this paper, the maximum and average of human activity density in each TAZ unit were calculated to quantify the human activity metrics. Kernel density analysis and zonal statistics can both be completed in ArcGIS 10.2. The visualization results of the mean value of human activity density are shown in Figure 5. Rather than specific values, Figure 5 emphasizes the relative highs and lows of human activity density in the spatial and temporal dimensions. To show these differences more clearly, we unified the legend value range of the subfigures, so that the relative change of human activity intensity with space and time can be seen at a glance. Figure 5(1)-(6) show the change of the mean value of human activity density over time in the whole study area.
Figure 5. (1)-(6) show the distribution of human activity density at three typical moments in the whole study area. (7) shows the locations of a typical commercial zone (a) and a typical residential zone (b). (8)-(13) are the enlarged details of human activity density of (a,b).
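The two processing steps, kernel density estimation followed by zonal statistics, can be sketched without GIS software as follows. A Gaussian kernel with a 25 m bandwidth is assumed purely for illustration, since the kernel function and search radius used in the paper are not stated. Zone membership is given as one hypothetical zone ID per raster cell.

```python
import math

def kernel_density(points, cell_centers, bandwidth):
    """Gaussian kernel density of weighted points, evaluated at each cell centre.

    points: iterable of (x, y, weight); weight is the activity intensity level.
    """
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    densities = []
    for cx, cy in cell_centers:
        total = 0.0
        for px, py, weight in points:
            d2 = (cx - px) ** 2 + (cy - py) ** 2
            total += weight * norm * math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(total)
    return densities

def zonal_stats(values, zone_ids):
    """Maximum and mean of cell values per zone (one zone ID per cell)."""
    grouped = {}
    for value, zone in zip(values, zone_ids):
        grouped.setdefault(zone, []).append(value)
    return {
        zone: {"max": max(vals), "mean": sum(vals) / len(vals)}
        for zone, vals in grouped.items()
    }

# Four 25 m cells; cells 0-1 belong to TAZ "A" and cells 2-3 to TAZ "B".
cell_centers = [(12.5, 12.5), (37.5, 12.5), (12.5, 37.5), (37.5, 37.5)]
zone_ids = ["A", "A", "B", "B"]
rtu_points = [(12.5, 12.5, 3.0), (37.5, 37.5, 1.0)]  # hypothetical (x, y, intensity)
densities = kernel_density(rtu_points, cell_centers, bandwidth=25.0)
stats = zonal_stats(densities, zone_ids)
```

Repeating this for each of the six time slices (three times of day on weekdays and weekends) would give a maximum and a mean per slice for every TAZ.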

Classification Based on Ensemble Learning with Active Sample Labeling
The data preparation of the four metrics mentioned above is the data basis of functional zone identification. At the same time, a good classifier is also an important factor in ensuring the accuracy of supervised classification. This paper uses an ensemble learning method to classify functional zones. Ensemble learning, also known as committee-based learning or multiple classifier systems, usually trains multiple learners and combines them to solve a problem [59]. An ensemble of classifiers can produce a more accurate prediction than the best single classifier [60]. More specifically, the XGBoost algorithm [61] is applied in this study as a representative ensemble learning method. XGBoost is helpful for identifying functional zones for three reasons: first, the generalization ability and robustness of multiclassifier combinations are stronger; second, XGBoost supports shrinkage and column subsampling, which help prevent overfitting and reduce the amount of calculation; third, XGBoost can effectively deal with sparse data, which suits this case study.
How to ensure the accuracy and reduce the cost of labeling are problems that supervised learning has been facing, especially when the small-scale experimental methods are extended to a large-scale practical application. As an iterative sampling method, active learning [62][63][64] can effectively meet our requirement of reducing manual labeling. A small number of representative samples are selected to train the initial classifier. By using the query function, the samples with sufficient information are selected to label, which provides new knowledge for the next round of classifier training. Active learning adjusts training samples iteratively until the requirements are met. In this paper, active learning uses a query function based on information entropy to select a certain proportion of confusing samples with large uncertainty to participate in classifier training. The calculation method of sample information entropy is shown in Equation (1).
$$\mathrm{Entropy}(S_n) = -\sum_{i=1}^{I} P(C_i \mid S_n)\,\log P(C_i \mid S_n) \qquad (1)$$

In Equation (1), I represents the total number of functional zone categories, P(C_i | S_n) represents the probability that sample S_n is classified into category C_i (1 ≤ i ≤ I) by the current classifier, and Entropy(S_n) is the information entropy of sample S_n.
The process of active learning is shown in Figure 6a. The query function selects confusing samples with high information entropy for manual labeling and adds them to the sample set, so that the classifier improves with each iteration. Figure 6b shows a schematic diagram of the distribution of samples in the feature space. Compared with randomly selected samples, the samples located near the classification boundary have a greater impact on the final classification boundary. Selecting these samples first therefore helps to balance accuracy and labeling cost.
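The entropy-based query function described above can be sketched as follows. This is a minimal illustration, not the authors' code: the natural logarithm, the `fraction` parameter, the function names, and the example probabilities are assumptions.

```python
import math

def entropy(probs):
    """Information entropy of one sample's class-probability vector."""
    # Zero-probability terms contribute nothing (0 * log 0 is taken as 0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def query_most_uncertain(proba_by_sample, fraction):
    """Pick the given fraction of unlabeled samples with the highest entropy.

    proba_by_sample: {sample_id: [P(C_1|S_n), ..., P(C_I|S_n)]} from the current classifier.
    fraction: share of the pool to send for manual labeling (assumed parameter).
    """
    k = max(1, round(fraction * len(proba_by_sample)))
    ranked = sorted(
        proba_by_sample,
        key=lambda s: entropy(proba_by_sample[s]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical predicted probabilities over four zone categories.
pool = {
    "taz_1": [0.97, 0.01, 0.01, 0.01],  # confident prediction, low entropy
    "taz_2": [0.25, 0.25, 0.25, 0.25],  # maximally confusing, highest entropy
    "taz_3": [0.50, 0.40, 0.05, 0.05],  # confused between two categories
    "taz_4": [0.85, 0.05, 0.05, 0.05],
}
selected = query_most_uncertain(pool, fraction=0.5)
```

Here the two samples sent for labeling are the uniformly uncertain one and the one split between two categories, which are the samples nearest the decision boundary.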

Sentiment Analysis of Functional Zones
Urban functional zones are closely related to human activities, and human activities are inevitably accompanied by the generation and expression of emotions. This paper provides a new perspective for the understanding of functional zones by analyzing the sentiment of location-based Weibo check-in data. The Baidu AI open platform [65] was used to analyze emotion in Weibo data and explore the emotional polarity of functional zones in the study area. Specifically, the core emotional analysis module, Sentiment Knowledge Enhanced Pretraining (SKEP) [66], which achieved new state-of-the-art performance on typical specific sentiment analysis tasks, was used in this study. SKEP can distinguish the emotional polarity categories (positive, negative, and neutral) of texts containing subjective opinion information and provide the corresponding confidence level. When the probability of positive emotion is less than 45%, it is regarded as negative polarity; if it is greater than 55%, it is regarded as positive polarity; otherwise, it is considered neutral. Through the emotional polarity analysis of the Weibo check-in data and the reorganization and statistics according to the functional zone units, the sentiment distribution of each functional zone can be understood. This part of the experiment is based on the Baidu PaddlePaddle open-source framework and Senta sentiment analysis system.
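The thresholding rule above can be written as a short sketch. It does not call SKEP, Senta, or the Baidu platform: the positive-emotion probabilities are assumed to be already available, and the function names and example values are hypothetical.

```python
def polarity(positive_prob):
    """Map the probability of positive emotion to a polarity label."""
    if positive_prob < 0.45:
        return "negative"
    if positive_prob > 0.55:
        return "positive"
    return "neutral"

def polarity_shares(positive_probs):
    """Share of each polarity among the check-in records of one zone."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for p in positive_probs:
        counts[polarity(p)] += 1
    n = len(positive_probs)
    return {label: count / n for label, count in counts.items()}

# Hypothetical positive-emotion probabilities for five check-ins in one TAZ.
shares = polarity_shares([0.91, 0.30, 0.50, 0.62, 0.45])
```

Values exactly at 45% or 55% fall into the neutral class, following the strict inequalities in the text.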

Classification Results of Urban Functional Zones
For the functional zone samples, 60% were assigned to a training set and 40% to a testing set. The actual functional categories of these samples were identified by experienced analysts based on high-resolution remote sensing images, Internet maps, and field surveys. The results of functional zone identification for the whole study area are shown in Figure 7. The urban green space zones in the north and northwest of the study area show a certain concentration trend, corresponding to the Old Summer Palace, the Summer Palace, Yuquan Park, and the golf club. Some public service zones are gathered in the northeast, such as Tsinghua University, Peking University, Beijing Forestry University, Beijing Language and Culture University, China University of Geosciences (Beijing), etc. There are more residential zones in the south part of the study area. The commercial zones show a pattern of overall dispersion and local aggregation.

The confusion matrix is shown in Table 4. It shows that the overall recognition accuracy of the functional zones in the study area reaches 82.37%, which demonstrates the effectiveness of the method in this paper. As for the specific categories of functional zones, residential zones and urban green space zones have higher user's accuracy and producer's accuracy: 89.68% of the real residential zones are accurately identified, and the user's accuracy of green space zones is 100%. However, some of the public service zones are misclassified as residential zones. Closer analysis shows that some administrative agencies have family dormitory buildings in their blocks as supporting facilities, and the presence of these buildings leads to the misclassification of those public service zones.
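The accuracy measures read from a confusion matrix can be computed as in the sketch below. The matrix orientation (rows as reference classes, columns as predictions) and all counts are assumptions for illustration; they are not the values in Table 4.

```python
def accuracy_metrics(matrix, labels):
    """Overall, user's, and producer's accuracy from a confusion matrix.

    matrix[i][j] counts samples whose true class is labels[i] and whose
    predicted class is labels[j] (rows = reference, columns = prediction).
    """
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    producers, users = {}, {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])                       # all true members of the class
        col_total = sum(matrix[r][i] for r in range(n))  # all samples predicted as the class
        producers[label] = matrix[i][i] / row_total if row_total else 0.0
        users[label] = matrix[i][i] / col_total if col_total else 0.0
    return correct / total, users, producers

labels = ["commercial", "residential", "public service", "green space"]
# Hypothetical counts for illustration only; these are NOT the values in Table 4.
matrix = [
    [30, 8, 2, 0],
    [5, 90, 5, 0],
    [4, 12, 24, 0],
    [0, 1, 1, 18],
]
overall, users, producers = accuracy_metrics(matrix, labels)
```

Producer's accuracy answers "what share of the real zones of this class were found", while user's accuracy answers "what share of the zones labeled as this class really belong to it". In this toy matrix, public service zones lose producer's accuracy mainly to the residential column, mirroring the confusion discussed above.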

Portraits of Typical Functional Zones
In previous studies, the analysis of functional zones often stopped at the recognition of their categories. However, categories are only a basic description of functional zones and cannot establish an overall understanding of them. As in other cognitive processes, functional zones need to be described from multiple perspectives to form an overall concept. As shown in Figure 8, four blocks are selected as representatives of typical functional zones. Blocks a, b, c, and d correspond to commercial zones, residential zones, public service zones, and urban green space zones, respectively. Combined with the above experimental results, word clouds (due to the limited number of POIs in a single block, the word clouds in Figure 8 are generated from all POIs of the corresponding functional zone category), pie charts of land cover categories, and block sentiment analysis results are used together to describe a functional zone. The word clouds show that mixed function is common in every category of functional zone. The font size in the word clouds represents the frequency with which a POI category occurs. In commercial zones, residential zones, and urban green space zones, the main POI categories (companies, residential areas, scenic spots) clearly dominate in number. However, in public service zones, many government agencies, scientific research institutions, and universities contain a large number of family areas or dormitories, so the word "residential" takes up a noticeable share of the word cloud. At the same time, subsidiary or affiliated companies of some institutions also appear in public service zones, which further increases the degree of functional mixing. Nevertheless, compared with the word clouds of other categories of functional zones, "government", "scientific", "institution", and "dormitory" clearly appear more frequently.
In addition to the word clouds, the typical functional zone portraits in Figure 8 show the land cover type proportion map of the current TAZ and the results of sentiment analysis. Since none of the four selected blocks contains water, the pie charts show only three types of land cover (vegetation, barren soil, impervious surface). The results of sentiment analysis show the emotional tendency of Weibo check-in records in each TAZ and the proportion of various emotional tendencies (positive, negative, neutral). The portraits of functional zones describe the features of functional zones from multiple perspectives so that people can clearly understand the characteristics of a certain functional zone. In the future, with the increasing abundance and availability of multisource data, the aspects of functional zone description can also be increased.

Comparison of Different Metric Sets and Methods
In this paper, four metrics are selected to describe the functional zones from different aspects. A set of experiments, shown in Table 5, were designed to verify the necessity of selecting each metric. The experimental results show that when the building-level metrics, the landscape metrics, the semantic metrics, and the human activity metrics are removed in turn, the overall classification accuracy of the ensemble model decreases by approximately 6.5%, 9.0%, 9.0%, and 7.2%, respectively. The results show that any one of the four metrics contributes to improving the accuracy of functional zone recognition. In addition, the comparison between ensemble model (XGBoost) and non-ensemble model (DecisionTree) is also designed in the experiment. Unlike the DecisionTree model which uses a single tree model to simulate the mapping relationship between samples' features and categories, XGBoost combines a series of basic tree models to get an enhanced classifier, which has better results for functional zone recognition. As shown in Table 5, the overall classification accuracy of the decision tree model is 67.75%, which is much lower than the 82.37% of XGBoost.  Figure 9 shows the ranking result of the importance of all features. Among the 42 features, the top 10 features involve each of the four metrics, which confirm the conclusion mentioned above that the four metrics are closely related to the identification of functional zones. In the top 10 features, there are four features of human activity metrics and two features of semantic metrics. Their high ranking is in line with the expectation that functional zones are closely related to land use and human behavior. In building-level and landscape metrics, features named "BA_P" (percentage of building area), "C_SHAPE_MN_V" (mean shape index of the vegetation patches), "L_PD" (patch density of all patches), and "C_AREA_MN_B" (mean area of the barren soil patches in the parcel) are in the top 10. 
These features fully show that the proportion, shape, area, and aggregation degree of different land cover types are the external representation of functional zones.

Efficiency and Accuracy of Active Sampling Method
Supervised classification usually uses labeled samples to train classifiers. Different from the small-scale research area experiment, when the research method is extended to the application of large-scale functional zone recognition, in addition to the well-designed classifier, the sampling method also has higher requirements. The process of sample labeling requires considerable time and labor, which is an unavoidable problem in machine learning and deep learning. The model needs to be fed a large number of samples to become stable and robust. In small-scale experiments, satisfactory accuracy can be obtained by completely labeling the training samples, but large-scale functional zone identification needs to balance accuracy and cost. Therefore, this paper added the experiment of active learning sample selection. Forty percent of the samples were considered as testing samples. In the rest of the samples, only 10% of the representative samples with obvious features of functional zones were manually labeled for training basic classifiers at first. Then the samples with larger information entropy relative to the current classifier were selected by an iteration method for active annotation, and the classifier was updated. Three groups of experiments were designed to verify the effectiveness of active learning. Samples (10%, 25%, and 40%) were selected to update the classifier. Compared with the classifier trained by the same proportion of random samples, the classifier trained by active selection samples shows better accuracy, as shown in Table 6. The experimental results show that, in three groups of experiments with different proportions of samples, the sample selection method based on active learning can perform better with the same manual labeling burden, which can provide a strategy to balance the recognition accuracy and labeling cost in the process of large-scale functional zone recognition. 
The targeted sample selection method can quickly meet certain accuracy requirements. However, it must be noted that there is still a small gap in the results of active learning compared with the way that training samples are all labeled. The purpose of active learning for sample selection is to take into account both the labeling burden and the recognition accuracy while identifying the urban functional zones on a large scale.
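The iterative procedure described above (a small labeled seed set, then repeated querying of high-entropy samples and retraining) can be sketched as a loop. This is an illustration under strong assumptions: a toy one-dimensional nearest-centroid model stands in for XGBoost, the `oracle` function stands in for manual labeling, and the batch size, number of rounds, and data are hypothetical.

```python
import math

def entropy(probs):
    """Information entropy of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def fit_centroids(labeled):
    """Toy stand-in for the classifier: one mean feature value per class."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_proba(centroids, x):
    """Softmax over negative distances to the class centroids."""
    classes = sorted(centroids)
    scores = [math.exp(-abs(x - centroids[c])) for c in classes]
    total = sum(scores)
    return [s / total for s in scores]

def active_learning(seed, pool, oracle, rounds, batch):
    """Iteratively label the highest-entropy pool samples and retrain."""
    labeled, pool = list(seed), list(pool)
    for _ in range(rounds):
        model = fit_centroids(labeled)
        # Rank the unlabeled pool by uncertainty under the current model.
        pool.sort(key=lambda x: entropy(predict_proba(model, x)), reverse=True)
        queried, pool = pool[:batch], pool[batch:]
        labeled += [(x, oracle(x)) for x in queried]  # manual labeling step
    return fit_centroids(labeled), labeled

# Hypothetical two-class problem with the true boundary at x = 5.
oracle = lambda x: int(x >= 5.0)
seed = [(1.0, 0), (9.0, 1)]
pool = [i * 0.5 for i in range(21) if i * 0.5 not in (1.0, 9.0)]
model, labeled = active_learning(seed, pool, oracle, rounds=3, batch=5)
```

In the first round the queried samples are the five pool values closest to the boundary between the two seed centroids, which is the behavior that lets active selection reach a given accuracy with fewer labels than random selection.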

Sentiment Analysis of Urban Functional Zones
Functional zones are the objective distribution of functions in a city, whereas the Weibo check-in data represent people's subjective emotional expression. However, the statistical results obtained from the sentiment analysis of a large amount of Weibo check-in data are an objective reflection of the functional zones in the dimension of sentiment characteristics. Table 7 shows the mean sentiment score of the four kinds of functional zones and the proportion of positive Weibo check-in data to the total number. Based on the analysis of 506,165 Weibo check-in data points in the study area, it is found that the emotional tendencies of the four functional zones are all positive. In comparison, the statistical results of sentiment analysis in commercial zones, residential zones, and public service zones are not significantly different.

Table 7. The statistical results of sentiment analysis in the functional zones: "mean" represents the mean value of the sentiment score, and "proportion" represents the proportion of positive sentiment Weibo check-in records.

Through careful observation of the check-in records, it is found that most of the specific Weibo check-in records have obvious positive or negative emotional tendencies. These check-in records reflect different topics or relate to different events. However, when taking a TAZ as the spatial statistical unit and a whole year as the time statistical unit, these obvious emotional tendencies are averaged out. Even so, urban green space has a significantly higher score and proportion of positive sentiment. The results show that urban green space plays a role in promoting people's positive emotions. Parks, lakesides, and gardens provide people with space for relaxation, leisure, and entertainment. 
This finding suggests that in urban built-up areas where the functional pattern is difficult to change, the development of new types of urban green space, such as roof greening and garden-type commercial blocks, can be regarded as an effective way to relieve daily pressure and improve people's happiness index.

Limitations and Possible Improvements
In this paper, the building-level metrics are calculated based on the semantic segmentation results of high-resolution images, including the perimeter, area, and other statistical information of buildings in TAZs. This method can extract building information quickly, accurately, and widely. However, due to the limitation of data sources, the features of building metrics are still two-dimensional. In fact, the height and number of floors of a building are related to its functional types. For example, residential and office buildings in cities generally have more floors, while shopping malls and supermarkets have fewer floors. High-precision Light Detection and Ranging (LiDAR) data [67] and oblique photogrammetry modeling technology [68,69] can be used in the future to provide more 3D information of buildings.
The distribution difference of human activity intensity in spatial and temporal dimensions is an important metric to distinguish different functional zones. The RTU data used in this paper reflect the spatial distribution of the activity intensity levels of Tencent users. Although the Tencent platform has a wide range of users in China, its platform applications are more focused on providing instant messaging and social network services. There is a certain age bias among its user groups. Compared with the elderly group, the young and middle-aged user groups are larger. This kind of user age bias is also limited by the data source. The mobile phone signaling data can relieve the age bias of users to a certain extent. It is conceivable that in the near-future era of the Internet of Things, new sensors can provide more comprehensive population density or human activity intensity data.

Conclusions
This paper proposes a framework to quantitatively describe functional zones, identify functional zone categories, and draw functional zone portraits. With the characteristics of repeatability, comprehensiveness, and large-scale implementation, it describes the features of functional zones from multiple perspectives and realizes the accurate identification of functional zones based on the supervised ensemble learning algorithm XGBoost. Within the Fifth Ring Road, Haidian District, Beijing, the study area of this paper, experimental results fully prove the effectiveness of the method proposed in this paper. Based on the classification of functional zones, the method of sample labeling is optimized and the portraits of typical functional zones are drawn.
The contribution of this study is mainly concentrated in three aspects. First, this paper integrates multisource heterogeneous data and an ensemble learning method to effectively identify the categories of urban functional zones, and the overall recognition accuracy reaches 82.37%. Based on VHR remote sensing images, Internet map POI data, and real-time human activity density data, this study uses deep learning technology, the landscape ecological index calculation method, the NLP theme model, and the kernel density analysis method to sort out and calculate four kinds of metrics: building-level metrics, landscape metrics, semantic metrics, and human activity metrics. These four metrics describe urban functional zones comprehensively and lay a good foundation for the category identification. Compared with a single classifier method (decision tree, 67.75%), the ensemble learning method (XGBoost) improves the overall classification accuracy by 14.62 percentage points. Second, considering the burden of sample labeling when expanding the method of functional zone identification from a small experimental area to a large-scale application, this paper proposes a sample labeling method based on active learning to balance the accuracy of identification and the cost of sample labeling. Compared with the random sampling method, the sampling method based on active learning effectively improves the identification accuracy of the functional zones with the same sample labeling burden. Third, this paper goes beyond the classification of functional zones by adding sentiment analysis, word clouds, and land cover proportion maps to the portraits of functional zones, making the image of functional zones clear at a glance. The functional zone portraits show the characteristics of the functional zones from multiple perspectives. When new data sources become available in the future, their analysis results can be added to the portraits, offering new perspectives for understanding functional zones.
The research framework of this paper involves the construction of a metrics system, sample labeling, ensemble learning, and category portraits. This framework can be transferred to other supervised classification scenarios, such as land use and land cover classification. However, it is necessary to adjust the data source and metrics system according to the application scenario to ensure good classification results.
The next step of our study will focus on how to integrate urban traffic big data into the research of urban functional zones to analyze the impact range of functional zones. Furthermore, it can also analyze the origin and destination of the people flow in the functional zones, and the results can help companies to reasonably arrange off-peak commuting, or help shopping malls to accurately formulate shopping shuttle routes.