Comparison of Approaches for Urban Functional Zones Classification Based on Multi-Source Geospatial Data: A Case Study in Yuzhong District, Chongqing, China

Accurate and timely classification and monitoring of urban functional zones are important in rapidly developing cities, because they reveal the actual and changing functions of urban space and thereby support urban planning and management. Many efforts have been undertaken to identify urban functional zones using various classification approaches and multi-source geospatial datasets. The complexity of this type of classification poses considerable challenges, especially in terms of classification accuracy; at the same time, the rapid development of machine learning technologies provides new opportunities. In this study, a set of commonly used urban functional zones classification approaches, including Multinomial Logistic Regression, K-Nearest Neighbors, Decision Tree, Support Vector Machine (SVM), and Random Forest, are examined and compared with the more recently developed eXtreme Gradient Boosting (XGBoost) model, using the case study of Yuzhong District, Chongqing, China. The investigation is based on multi-source geospatial data, including night-time light imagery, geotagged Weibo data, points of interest (POI) from Gaode, and the Baidu Heat Map. This study is the first endeavor to implement the XGBoost model in the field of urban functional zones classification. The results suggest that the XGBoost classification model performed best, achieving an accuracy of 88.05%, which is significantly higher than that of the other commonly used approaches. In addition, the integration of night-time light imagery, geotagged Weibo data, POI from Gaode, and the Baidu Heat Map has demonstrated its value for the classification of urban functional zones in this case study.


Introduction
In recent years, most cities have focused on classifying land use/land cover (LULC) based on remote sensing (RS) satellite images, which is costly and lacks timely updates. Urban functional zones detection, an effective way to understand urban space and the interaction between human activities and the environment, is seldom conducted by governments due to limited budgets and manpower [1,2]. Meanwhile, diverse and complex urban functional zones have been formed and transformed continuously to meet people's increasing social and economic needs as a result of rapid urbanization [3,4]. Thus, up-to-date urban function information is becoming increasingly crucial, because it is the basis for capturing the human behavior patterns of a city and, in turn, for effectively informing urban management with respect to traffic control, energy recycling, and emergency management [5][6][7].
On the other hand, higher classification accuracy is always desirable. To find a more accurate method for inferring hybrid transportation modes based on trajectory data, Xiao, Wang, Fu, and Wu [36] compared different tree-based ensemble models with traditional methods and noted that eXtreme Gradient Boosting (XGBoost) achieved the highest classification accuracy. Moreover, although many machine learning classification methods have been used for urban functional zones, XGBoost, as a newly developed machine learning method, has not yet been applied to urban functional zones classification in the existing literature. To bridge this research gap, this research compares XGBoost with other commonly used classification methods, including Logistic Regression, K-Nearest Neighbors, Decision Tree, Support Vector Machine, and Random Forest, for urban functional zones classification through a case study in Yuzhong, Chongqing, China, based on multi-source geospatial datasets, including night-time light imagery, social media records, POI, and the Baidu Heat Map.
We believe that this research can help other urban functional zones classification applications, provided that sufficient datasets are available and the target research context is similar to the case study in this research. The remainder of this paper is divided into four sections. Section 2 briefly describes the research area and how the geospatial datasets were selected. Sections 3 and 4 introduce the methodology and the results, and the discussion and conclusion are presented in Section 5.

Study Area
Yuzhong District, situated in Chongqing, China, with an area of 23.71 km², was chosen as our research area (Figure 1). This district has the highest population density in the Chongqing Municipality. With a population of 657,200 and a GDP of CNY 105.021 billion in 2016, this region is essential for the economic development of Chongqing. It is also the most important political, cultural, and commercial circulation center of Chongqing. The urban structure of Yuzhong District is highly complex owing to the acute conflict among land resources, the ecological environment, population, and economic growth.

Night-Time Light Imagery
In this research, night-time light imagery was used to map the features and distribution of the population. The Visible Infrared Imaging Radiometer Suite (VIIRS) was launched at the end of 2011, and its data have been available since April 2012. Compared with the previously widely used Defense Meteorological Satellite Program's Operational Linescan System (DMSP/OLS), VIIRS images have higher spatial resolution and a wider radiance range, and can therefore provide more spatial information [37]. The VIIRS data products used in this research were downloaded from the National Geophysical Data Center (NGDC) (http://ngdc.noaa.gov/eog/viirs/download_viirs_ntl.html) in 2016, with a spatial resolution of 15 arc-seconds on the geographic grid [38]. The night-time light imagery was radiometrically calibrated using robust regression [39]. To reduce georeferencing error, the dataset was transformed into the Universal Transverse Mercator projection system.

Social Media Data
Geo-located social media records are appropriate and commonly used for detecting population activity, because check-in locations are usually recorded when users stay at a place for some time while engaging in social media activities. For example, Weibo, a microblogging platform similar to Twitter, allows users to post geo-located messages of fewer than 140 characters each [40]. Location-based spatio-temporal data from Sina Weibo were obtained with a large-scale crawler through the application programming interface (API) provided on the website (http://open.weibo.com/wiki/位置服务). It should be noted that Sina Weibo users are mainly young and middle-aged people. Altogether, more than 444,000 check-in observations from 2016 were collected through the API for this research. Each check-in record contains a longitude and a latitude. After preprocessing, ESRI ArcGIS 10.5 was used to convert the check-in data from .csv format to point .shp format. The fishnet function of ArcGIS 10.5 was then used to divide Yuzhong District into a grid, so that the number of check-ins in each grid cell could be counted. Finally, the density of Weibo check-ins in 2016 was calculated for Yuzhong District (Figure 2) for use in further experiments.
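The gridding step can be illustrated with a minimal sketch in plain Python. It is not the ArcGIS fishnet workflow used in the study; it only shows the idea of assigning each check-in to a square cell and counting. The coordinates, the grid origin, and the 100 m cell size (borrowed from the grid size mentioned in the data pre-processing section) are illustrative assumptions, and the points are assumed to be already projected to metres.

```python
from collections import Counter


def count_checkins_per_cell(points, x_min, y_min, cell_size):
    """Count points per square grid cell.

    points: iterable of (x, y) pairs in projected coordinates (metres).
    Returns a Counter keyed by (column, row) cell index.
    """
    counts = Counter()
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        counts[(col, row)] += 1
    return counts


# Hypothetical check-in coordinates in metres, relative to the grid origin.
checkins = [(10.0, 20.0), (55.0, 80.0), (150.0, 20.0), (199.9, 99.0), (250.0, 130.0)]
cell_size = 100.0  # assumed cell size in metres

cell_counts = count_checkins_per_cell(checkins, x_min=0.0, y_min=0.0, cell_size=cell_size)

# Density expressed as check-ins per square kilometre for each occupied cell.
cell_area_km2 = (cell_size / 1000.0) ** 2
density = {cell: n / cell_area_km2 for cell, n in cell_counts.items()}
```

Cells with no check-ins are simply absent from the result; a full workflow would fill them with zero before joining the counts to the other data layers.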

POI data
The POI data used in this study were obtained through the API (a free HTTP interface) provided by Gaode online map (http://lbs.amap.com), a well-known web mapping and location-based services provider with over 100 million daily active users in China. The POI data were crawled in JSON format and include the ID, name, address, type, longitude, and latitude of each POI. After processing, the POI categories were combined into commercial, residential, transportation, educational, and cultural POI data. The processed dataset comprises more than 33,000 POIs from 2016 (Figure 3).

Baidu Heat Map
Baidu Heat Map was launched by Baidu based on real-time location data derived from smartphones running Baidu Maps and other apps [41]. Baidu Map has over 200 million users and processes 3 million position requests per day, which makes it a reliable and effective source for detecting urban population mobility (Figure 4). After geocoding, the Baidu Heat Map used in this study was converted into raster format, and heat values between 0 and 1 were assigned to different places. The closer the value is to 1, the higher the population concentration, and vice versa.

Figure 3. Density of POI data in the study area.

Figure 4. Baidu Heat Map of the study area.

Methodology
In this study, the XGBoost method was employed and compared with other commonly used methods to classify urban functional zones. The framework of our methodology, shown in Figure 5, comprises data pre-processing, feature selection, model construction, model evaluation, and model comparison. Each module is explained in detail in the following sections.

Multinomial Logistic Regression
Multinomial Logistic Regression is a popular and widely used linear classification method for multiclass problems [42]. It predicts the probabilities of the different possible outcomes of a dependent variable. In Multinomial Logistic Regression, the linear predictor function $f(n, i)$ is used to calculate the probability that observation $i$ has outcome $n$. The formula is as follows:

$$f(n, i) = \beta_{0,n} + \beta_{1,n} x_{1,i} + \beta_{2,n} x_{2,i} + \cdots + \beta_{M,n} x_{M,i}$$

where $\beta_{m,n}$ is the regression coefficient associated with the $m$th explanatory variable and the $n$th outcome, and $x_{m,i}$ is the value of the $m$th explanatory variable for observation $i$.
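As a minimal sketch of how linear predictors become class probabilities, the example below evaluates one linear predictor per outcome and normalizes the scores with the softmax function. The softmax step is the standard formulation and is assumed here rather than taken from the text; the coefficient values and the feature vector are purely illustrative.

```python
import math


def linear_predictor(beta, x):
    """Linear score for one outcome: beta[0] + sum_m beta[m] * x[m - 1]."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def outcome_probabilities(betas, x):
    """Softmax over the linear predictors of all outcomes."""
    scores = [linear_predictor(beta, x) for beta in betas]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical coefficients: one row per outcome, intercept first.
betas = [
    [0.0, 1.0, -1.0],
    [0.5, -0.5, 0.5],
    [-0.2, 0.1, 0.3],
]
x = [2.0, 1.0]  # hypothetical explanatory variables for one observation

probs = outcome_probabilities(betas, x)
predicted_outcome = probs.index(max(probs))
```

The predicted class is the outcome with the largest probability, which is also the outcome with the largest linear score.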

K-Nearest Neighbors
The K-Nearest Neighbors (KNN) algorithm is a common classification model based on distance measures. For two n-feature instances, for example $A = (a_1, a_2, a_3, \ldots, a_n)$ and $B = (b_1, b_2, b_3, \ldots, b_n)$, the Euclidean distance is usually measured as

$$d(A, B) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$$

Once the nearest neighbors are identified, the prediction depends on majority or distance-weighted voting [24].
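A minimal sketch of the majority-voting variant is given below, using only the Python standard library. The training points and the class labels "R" and "C" are illustrative assumptions, not data from the study.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training samples with two class labels.
train_X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
train_y = ["R", "R", "R", "C", "C"]

prediction = knn_predict(train_X, train_y, query=(0.5, 0.5), k=3)
```

Distance-weighted voting would replace the simple count with a sum of weights such as the inverse distance of each neighbor.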

Decision Tree
The Decision Tree algorithm relies on decision rules, learned from training samples, to classify new data [43]. Several algorithms have been used to build decision trees, including CART (Classification and Regression Trees) and ID3 (Iterative Dichotomiser 3). Decision trees have been frequently used in land-use classification in the past decade because they are relatively easy to understand and interpret. They can also be combined with other classification methods. Nevertheless, this method is not very stable: even very small changes in the input data may lead to a significant difference in the structure of the optimal decision tree [44]. In addition, compared with other classification models such as Random Forest, the classification result of a Decision Tree is usually less accurate.

SVM
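The SVM model is non-parametric and is derived from statistical learning theory. It is normally able to outperform traditional classifiers when the number of training samples is small [26]. The principle of the SVM model is to use an optimal hyperplane with the maximal margin to categorize new examples; the hyperplane is obtained by solving a constrained quadratic programming problem, which in its standard dual form reads:

$$\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C$$

where $x_i \in R^d$ denotes the vector of the $i$th training sample, $y_i \in \{-1, 1\}$ denotes the related class label, $\alpha_i$ are the Lagrange multipliers, $C$ is the penalty parameter, and $K(u, v)$ denotes the kernel function [45]. The radial basis function (RBF) was used as the kernel function in this study [46]. In its common Gaussian form, given in the following equation, the width $\sigma$ (the only free parameter) was set to 1.0:

$$K(u, v) = \exp\left(-\frac{\lVert u - v \rVert^2}{2\sigma^2}\right)$$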
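To make the role of the width parameter concrete, the following minimal sketch evaluates an RBF kernel for two feature vectors. It assumes the common Gaussian parameterization exp(-||u - v||^2 / (2 sigma^2)); the exact form used by the authors is not recoverable from the text, so this is an illustration only. Libraries such as scikit-learn parameterize the same kernel with gamma, where gamma = 1 / (2 sigma^2) under this assumption.

```python
import math


def rbf_kernel(u, v, sigma=1.0):
    """Gaussian RBF kernel, assuming the form exp(-||u - v||^2 / (2 * sigma^2))."""
    squared_distance = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Identical points have similarity 1; similarity decays with distance.
same = rbf_kernel((0.0, 0.0), (0.0, 0.0))
near = rbf_kernel((0.0, 0.0), (1.0, 1.0))
far = rbf_kernel((0.0, 0.0), (3.0, 3.0))

# A larger width makes distant points look more similar.
far_wide = rbf_kernel((0.0, 0.0), (3.0, 3.0), sigma=3.0)
```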

Random Forest
Random Forest (RF), first developed by Breiman [47], is a supervised classification method that consists of numerous trees generated from bootstrap samples. The RF model has the following strengths for classification. First, it is insensitive to outliers, noise, and even overtraining. Second, it readily accepts input layers of different natures. Third, it is able to generate a measure of layer importance [47]. In the RF model, the 'out-of-bag' (OOB) samples, which are held out before a tree is grown, are used for validation, whereas the samples used to train the tree are regarded as 'in bag'. To run the RF model, two essential parameters need to be set: the maximum depth of the trees and the number of trees (ntree). The final predicted class is determined by the majority vote of the individual trees.
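The two mechanisms described above, bootstrap sampling with out-of-bag samples and majority voting, can be sketched in a few lines of plain Python. This is not a Random Forest implementation: no trees are grown, and the per-tree predictions passed to the vote are hypothetical.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


def majority_vote(tree_predictions):
    """Final class is the label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
in_bag, out_of_bag = bootstrap_indices(10, rng)

# Hypothetical predictions from five trees for a single grid cell.
final_class = majority_vote(["C", "R", "C", "M", "C"])
```

Each tree is trained on its own in-bag sample, and the samples it never saw can serve as a validation set for that tree.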

XGBoost
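XGBoost is a highly efficient, flexible, and accurate implementation of distributed gradient boosting [48]. Developed to improve model performance, XGBoost runs much faster than many other machine learning algorithms [49]. For a dataset with $n$ labeled samples and $m$ features, $D = \{(x_i, y_i)\}$ ($|D| = n$, $x_i \in R^m$), the tree ensemble model predicts the output with $K$ additive functions:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$

where $F = \{ f(x) = w_{q(x)} \}$ is the space of regression trees. $q$ denotes the structure of a tree with $T$ leaves. Each $f_k$ corresponds to an independent tree structure $q$ and leaf weights $w$. The regularized objective to be minimized is as follows:

$$L = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k)$$

where $l$ denotes the loss function and $\Omega$ denotes the regularization term. In addition, $\hat{y}_i^{(t)}$ denotes the prediction for the $i$-th instance at the $t$-th iteration, and the objective at that iteration is approximated to second order as:

$$L^{(t)} \simeq \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + \delta_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$

where $\delta_i$ and $h_i$ are the first- and second-order gradient statistics of the loss function. Other techniques are also employed in the XGBoost model to improve the classification results [49].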
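To show how the first- and second-order gradient statistics are used, the sketch below computes the optimal weight of a single leaf and the gain of a candidate split. Both formulas come from the standard XGBoost derivation and are not stated in the text above: they assume the regularization term has the usual form with an L2 penalty `reg_lambda` on leaf weights and a per-leaf penalty `gamma`. The labels and the squared-error loss are illustrative choices.

```python
def optimal_leaf_weight(grads, hessians, reg_lambda):
    """Weight that minimizes the second-order objective for one leaf."""
    return -sum(grads) / (sum(hessians) + reg_lambda)


def split_gain(g_left, h_left, g_right, h_right, reg_lambda, gamma):
    """Reduction in the objective obtained by splitting one leaf into two."""
    def score(g_sum, h_sum):
        return g_sum * g_sum / (h_sum + reg_lambda)

    gl, hl = sum(g_left), sum(h_left)
    gr, hr = sum(g_right), sum(h_right)
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)) - gamma


# Hypothetical targets with squared-error loss l = 0.5 * (y - y_hat)^2,
# so the gradient is (y_hat - y) and the second derivative is 1.
y = [1.0, 1.0, 5.0]
y_hat = [0.0, 0.0, 0.0]
grads = [p - t for p, t in zip(y_hat, y)]
hessians = [1.0] * len(y)

leaf_weight = optimal_leaf_weight(grads, hessians, reg_lambda=1.0)
gain = split_gain(grads[:2], hessians[:2], grads[2:], hessians[2:],
                  reg_lambda=1.0, gamma=0.0)
```

A positive gain means the split lowers the regularized objective; raising `gamma` makes the model more reluctant to add leaves.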

Evaluation and Comparison Approaches
To assess the performance of the classification models trained in this study, accuracy is used as the major evaluation and comparison measure. In addition, the confusion matrix, the ROC (receiver operating characteristic) curve, and the AUC (area under the curve) are employed as supplementary evaluation and comparison metrics.
The performance measures are defined as:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{TP Rate} = \frac{TP}{TP + FN}, \quad \text{FP Rate} = \frac{FP}{FP + TN}$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. A confusion matrix is a table or figure that is often used to show the TP, TN, FP, and FN of a classification model (see Table 1 for details). As an evaluation indicator, it shows the strengths and weaknesses of a classification model in tabular form. In the confusion matrix, element $a_{ij}$ denotes the number of test samples of class $i$ that the classification model predicted as class $j$; the diagonal elements $a_{ii}$ are therefore the correct predictions. The ROC curve is a graph that summarizes the performance of a classification model across different threshold settings and is generated by plotting the TP Rate against the FP Rate, and the AUC represents the degree or measure of separability [50].
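The relationship between the confusion matrix and the reported measures can be sketched as follows for the multiclass case. The true and predicted labels are hypothetical; rows index the true class and columns the predicted class, matching the definition of the matrix elements given above.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Matrix whose entry [i][j] counts samples of class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def tp_rate(matrix, i):
    """One-vs-rest true positive rate for class i: TP / (TP + FN)."""
    return matrix[i][i] / sum(matrix[i])


# Hypothetical test labels and predictions for three zone types.
labels = ["R", "C", "T"]
y_true = ["R", "R", "C", "C", "C", "T"]
y_pred = ["R", "C", "C", "C", "T", "T"]

cm = confusion_matrix(y_true, y_pred, labels)
overall_accuracy = accuracy(cm)
commercial_tp_rate = tp_rate(cm, labels.index("C"))
```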

Data Pre-Processing
The dataset collected in 2016 was used to assess the performance of the models. It covers Yuzhong District, Chongqing Municipality, China, and contains night-time light imagery, Sina Weibo check-in data, POI data, and the Baidu Heat Map. The whole of Yuzhong District was divided into 100 m × 100 m grid cells, and 2381 grids were generated after data pre-processing. The dataset was randomly split into a training set (80% of the data) and a testing set (20% of the data) using the scikit-learn Python library. The parameters of each model were calibrated with the GridSearchCV function from the same library. In addition, the ground truth data for validating and comparing the models were obtained through visual interpretation and field survey for each grid. Referring to Tu et al. [51], who divided the urban functional zones of Shenzhen into eight types, and based on the master plan of Chongqing (2007-2020) [52] and the field survey of Yuzhong District, the urban functional zones of Yuzhong District are categorized in this research into seven types: residential functional zones, commercial and financial functional zones, transportation and parking functional zones, educational and research functional zones, cultural and entertainment functional zones, mixed functional zones, and green land and square functional zones (Figure 6). As Yuzhong District is the economic and entertainment center of Chongqing Municipality, industrial zones account for only a very small portion of its functional zones and are therefore not considered in this study. The abbreviations of the urban functional zones are shown in Table 2.
The detailed experiments are described below. After data pre-processing, there are 2381 urban functional zone grids in Yuzhong District, Chongqing. There are seven kinds of functional zones, of which the commercial functional zone is the most common (Figure 7). To make the results reliable and stable, the testing of each classification model was repeated 100 times on the input datasets, and the average classification accuracy was taken as the final accuracy of the model for further comparison [31].

Table 2. Abbreviations of urban functional zones.

| Full name | Abbreviation |
| --- | --- |
| Residential functional zones | R |
| Commercial and financial functional zones | C |
| Transportation and parking functional zones | T |
| Educational and research functional zones | E |
| Cultural and entertainment functional zones | L |
| Mixed functional zones | M |
| Green land and square functional zones | G |
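The evaluation protocol described in this section, a random 80/20 split repeated 100 times with the accuracies averaged, can be sketched in plain Python. The study used scikit-learn for the split; the version below is a stand-in that uses only the standard library. The label distribution is hypothetical, and the "model" is a deliberately trivial placeholder that always predicts the most common training label.

```python
import random
from collections import Counter


def split_indices(n_samples, test_fraction, rng):
    """Shuffle sample indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def majority_class_accuracy(labels, train_idx, test_idx):
    """Placeholder model: predict the most common training label everywhere."""
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
labels = ["C"] * 60 + ["R"] * 30 + ["M"] * 10  # hypothetical zone labels

scores = []
for _ in range(100):
    train_idx, test_idx = split_indices(len(labels), 0.2, rng)
    scores.append(majority_class_accuracy(labels, train_idx, test_idx))

mean_accuracy = sum(scores) / len(scores)
```

Averaging over many random splits reduces the dependence of the reported accuracy on any single partition of the grids.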

Multinomial Logistic Regression
In the Multinomial Logistic Regression model, parameter C was used as the regularization parameter to avoid overfitting and underfitting. The GridSearchCV function from the scikit-learn Python library was used to tune the parameters penalty and C: a list of candidate values was first defined for each parameter, and the grid search was then used to calibrate them. When penalty is set to l1 and C is set to 59.94, the accuracy is highest, at 70.86%. The confusion matrix, ROC curves, and AUC values are presented in Figure 8.

KNN
KNN is a supervised classification method whose main task is to determine the K closest labeled data points. Generally, if K is set too small, the model becomes complex, which leads to overfitting: the model memorizes too much of the training dataset to predict the test dataset accurately. If K is set too large, the model becomes too simple, which causes underfitting. To find the best value of the hyperparameter K, we varied K from 1 to 40 and calculated the corresponding accuracy. The best value, K = 5, was obtained using the GridSearchCV function from the scikit-learn Python library. The highest accuracy achieved by the KNN model was 61.22%. Figure 9 shows the confusion matrix, ROC curves, and AUC values of the classification results. It can be noted that the KNN model does not perform very well in differentiating between urban functional zones.
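The idea of scanning candidate values of K and keeping the one with the highest validation accuracy can be sketched without any external library. The version below is a simplified stand-in for the GridSearchCV procedure used in the study: it scores each K by leave-one-out validation on a tiny hypothetical dataset instead of k-fold cross-validation on the real grids.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training items closest to `query`."""
    def distance(item):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(item[0], query)))

    ranked = sorted(train, key=distance)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Predict each sample from all the others and report the hit rate."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)


# Hypothetical data: two well-separated clusters of four samples each.
data = [
    ((0.0, 0.0), "R"), ((0.0, 1.0), "R"), ((1.0, 0.0), "R"), ((1.0, 1.0), "R"),
    ((5.0, 5.0), "C"), ((5.0, 6.0), "C"), ((6.0, 5.0), "C"), ((6.0, 6.0), "C"),
]

candidate_ks = range(1, 8)
scores = {k: leave_one_out_accuracy(data, k) for k in candidate_ks}
best_k = max(scores, key=scores.get)
```

With the largest K every held-out sample is outvoted by the other cluster, which illustrates the underfitting that occurs when K is set too large.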

Decision Tree
The Decision Tree model is simple to understand and to interpret. Unlike an artificial neural network, which is a black-box model, the Decision Tree model can be easily explained. In addition, in contrast to other methods that require normalizing the data, creating dummy variables, and removing missing values beforehand, the Decision Tree model needs little data preparation. The hyper-parameters of the Decision Tree model include criterion, splitter, max_depth, min_samples_split, min_samples_leaf, min_weight_fraction_leaf, max_features, random_state, max_leaf_nodes, min_impurity_decrease, min_impurity_split, class_weight, and presort. If the tree is allowed to keep splitting without constraint, overfitting is very likely to occur. Hence, suitable parameters are needed to stop the recursive splitting process. One of the most important parameters is max_depth, the maximum depth of the tree: the deeper the tree, the more splits it makes and the more information it captures about the training data. The maximum depth of the tree was therefore tuned using the GridSearchCV function from the scikit-learn Python library. When the maximum depth was set to nine, the accuracy reached 76.10%. The ROC curves show that, although the Decision Tree model performs better than the KNN model, it is still unable to identify the different urban functional zones effectively (Figure 10). The confusion matrix and AUC values are also reported.


SVM
SVM can be divided into linear SVM and non-linear SVM. For the more commonly used non-linear SVM with a Gaussian RBF kernel, C and gamma are the parameters that need to be tuned to find the best margin separating positive and negative samples, and the model is sensitive to both. The gamma parameter defines how far the influence of a single training example reaches. With a high value of gamma, the influence reaches only very close points, in the extreme case only the support vector itself, which causes overfitting. With a low value of gamma, the region of influence of any chosen support vector includes the whole training dataset. The overall accuracy of the SVM model was highest when C and gamma were 100 and 0.01, respectively; these values were obtained using the GridSearchCV function from the scikit-learn Python library. The accuracy assessment indicates that the highest accuracy of the urban functional zones classification is 67.71%. The confusion matrix, ROC curves, and AUC values are presented in Figure 11. The results indicate that it is difficult to distinguish different urban functional zones using SVM in this research.
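The role of gamma can be made concrete by evaluating the Gaussian RBF kernel directly. The helper rbf_kernel and the three points below are hypothetical names and values chosen for illustration; only the kernel formula exp(-gamma * ||x - z||^2) is standard.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian RBF kernel: exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


support_vector = (0.0, 0.0)
near_point = (0.1, 0.0)   # squared distance 0.01 from the support vector
far_point = (3.0, 4.0)    # squared distance 25.0 from the support vector

for gamma in (0.01, 1.0, 100.0):
    k_near = rbf_kernel(support_vector, near_point, gamma)
    k_far = rbf_kernel(support_vector, far_point, gamma)
    print(f"gamma={gamma:>6}: near={k_near:.4f}, far={k_far:.4f}")
```

With gamma = 0.01 the far point still has a similarity of about 0.78 to the support vector, whereas with gamma = 100 even the near point drops to about 0.37 and the far point is effectively zero. This is the sense in which a large gamma shrinks each support vector's region of influence.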

Random Forest
The Random Forest model was selected for this research because of its high accuracy and its ability to limit overfitting. During model training, it is important to tune the model to improve its efficiency. n_jobs denotes the number of cores used in the training process; -1 was selected because it enables all the cores in our experiment. n_estimators denotes the number of classification trees in the model. A high value of n_estimators makes the predictions more accurate and stable, but training may take more time. max_depth denotes the maximum depth of each tree, that is, how far a node may be expanded during training. If max_depth is set too high, the whole model has a high risk of overfitting. min_samples_split and min_samples_leaf control the minimum number of samples required to split an internal node and to form a leaf node, respectively. As a leaf is usually where the decision tree ends, small values may lead to overfitting, while large values may prevent learning. The default values of min_samples_leaf and min_samples_split were used in this research. Lastly, n_estimators and max_depth were calibrated using the GridSearchCV function from the scikit-learn Python library, and the best values were 81 and 21, respectively. The testing accuracy reached 84.49%. The confusion matrix, the ROC curves, and the AUC values of the experiment based on the Random Forest model are shown in Figure 12.
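A joint search over n_estimators and max_depth, with the other parameters left at their defaults, might look like the sketch below. The candidate values in param_grid, the synthetic data, and the 3-fold cross-validation are assumptions made for illustration; the ranges actually searched are not stated in the text.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (NOT the study's data).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5,
    n_classes=4, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Illustrative grid only; the values searched in the study are not given.
param_grid = {"n_estimators": [21, 51, 81], "max_depth": [5, 11, 21]}

# n_jobs=-1 uses all available cores; min_samples_split and
# min_samples_leaf are left at their scikit-learn defaults.
forest = RandomForestClassifier(n_jobs=-1, random_state=0)
search = GridSearchCV(forest, param_grid, scoring="accuracy", cv=3)
search.fit(X_train, y_train)

best_params = search.best_params_
test_accuracy = search.score(X_test, y_test)
print(best_params, round(test_accuracy, 4))
```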


XGBoost
XGBoost, a well-known boosted tree learning model, was built to optimize large-scale boosted tree algorithms. XGBoost has several parameters that can dramatically affect the model's accuracy and training speed, including max_depth, gamma, eta, and min_child_weight. max_depth determines how deep each tree is allowed to grow; if it is set too high, the model runs the risk of overfitting. gamma is the minimum loss reduction required to make a further partition on a leaf node of the tree; the larger gamma is, the more conservative the algorithm will be. eta is the step size shrinkage used in each boosting step to prevent overfitting and make the model more robust. min_child_weight is the minimum sum of instance weights needed in a child node. In this research, max_depth was calibrated using the GridSearchCV function from the scikit-learn Python library, while default values were used for all the other parameters. The accuracy of the experiment reached 88.05%. The confusion matrix, the ROC curves, and the AUC values of the experiment based on the XGBoost model are shown in Figure 13.
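How gamma and min_child_weight constrain tree growth can be illustrated with the split-gain formula from the XGBoost formulation, written here in plain Python. The function names split_gain and split_allowed are hypothetical; g and h denote the sums of first- and second-order gradients of the loss over the samples in a child, reg_lambda is the L2 regularization term, and this is a simplified sketch of the criterion rather than the library's implementation.

```python
def split_gain(g_left, h_left, g_right, h_right, gamma=0.0, reg_lambda=1.0):
    """Loss reduction from splitting one leaf into a left and a right child."""
    def leaf_score(g, h):
        return g * g / (h + reg_lambda)

    parent = leaf_score(g_left + g_right, h_left + h_right)
    children = leaf_score(g_left, h_left) + leaf_score(g_right, h_right)
    # gamma is subtracted as a fixed penalty for adding one more leaf.
    return 0.5 * (children - parent) - gamma


def split_allowed(g_left, h_left, g_right, h_right,
                  gamma=0.0, reg_lambda=1.0, min_child_weight=1.0):
    """Keep a split only if both children are heavy enough and the gain is positive."""
    if min(h_left, h_right) < min_child_weight:
        return False
    return split_gain(g_left, h_left, g_right, h_right, gamma, reg_lambda) > 0.0


print(split_gain(-4.0, 3.0, 4.0, 3.0))                # gain with no gamma penalty
print(split_allowed(-4.0, 3.0, 4.0, 3.0, gamma=5.0))  # same split under a large gamma
```

For the example values, the gain is 4.0 with gamma = 0, so the split is kept; with gamma = 5.0 the gain becomes negative and the split is rejected, which is why larger gamma values make the algorithm more conservative.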


Model Performance Comparison
From Tables 3-5, it can be seen that the tree-based models, including XGBoost, Random Forest, and Decision Tree, performed better than the other classifiers in our experiments. XGBoost outperforms all the other models in terms of accuracy, followed by Random Forest; both reached more than 80%. These results align with our hypothesis. The Decision Tree model, also tree-based, reached an accuracy of 76.1%, lower than the ensemble models Random Forest and XGBoost. The classification accuracy of the Multinomial Logistic Regression model is around 70%. The second-worst model is SVM, and the worst is the K-Nearest Neighbors model, whose accuracy is nearly 27 percentage points lower than that of XGBoost in this research. The reason might be that the K-Nearest Neighbors model does not learn weights from the training data, which leads to poor generalization and makes it easily influenced by noise [53].

Table 3. The accuracy of the different models in this research.

Rank  Model                            Accuracy
1     XGBoost                          88.05%
2     Random Forest                    84.49%
3     Decision Tree                    76.10%
4     Multinomial Logistic Regression  70.86%
5     SVM                              67.71%
6     K-Nearest Neighbors              61.22%

In terms of the confusion matrix, we compared the number of correctly classified grids per class and the ranks of the models (see Tables 4 and 5). XGBoost was the best among all the classifiers tested in this research, showing the best performance for every class except class G. Random Forest was not as good as XGBoost for most classes, but it outperformed XGBoost on class G. The Decision Tree model performed worse than Random Forest and XGBoost on every class. Multinomial Logistic Regression classified classes C, E, M, R, and T worse than Decision Tree, Random Forest, and XGBoost. SVM and K-Nearest Neighbors performed the worst in general: class T was classified worst by SVM, and classes C, L, and R were classified worst by the K-Nearest Neighbors model.
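The per-class counts of correctly classified grids used in this comparison are the diagonal entries of the confusion matrix, and the overall accuracy is the diagonal sum divided by the total. The sketch below shows this on ten hypothetical grid cells; the labels and predictions are invented for illustration and are not taken from Tables 4 and 5.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for ten grid cells; the class letters are placeholders.
labels = ["C", "G", "R"]
y_true = ["C", "C", "C", "G", "G", "G", "R", "R", "R", "R"]
y_pred = ["C", "C", "R", "G", "G", "C", "R", "R", "R", "C"]

# Rows are true classes, columns are predicted classes, in the order of `labels`.
cm = confusion_matrix(y_true, y_pred, labels=labels)

correct_per_class = dict(zip(labels, cm.diagonal().tolist()))
overall_accuracy = cm.trace() / cm.sum()

print(cm)
print(correct_per_class)   # correctly classified cells per class
print(overall_accuracy)
```

In this toy case, seven of the ten cells lie on the diagonal, giving an overall accuracy of 0.7.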
We also compared the ROC curves of these models (see Figures 8b, 9b, 10b, 11b, 12b and 13b) as well as their AUC values and ranks (see Tables 6 and 7). The curves of XGBoost and Random Forest are closer to the top-left corner than those of the other tested models, which reveals that these two models have a higher capability of distinguishing between individual classes. In particular, most of their per-class AUC values were close to 1, which means that XGBoost and Random Forest showed good separability between classes. The classification performance of the Decision Tree model was clearly worse than that of XGBoost and Random Forest: its AUC values for classes M and R were less than 0.8, while its AUC values for all the other classes were greater than or equal to 0.89. For Multinomial Logistic Regression, the AUC values of all classes were greater than or equal to 0.81, and its ROC curves look smoother than those of the Decision Tree model, which suggests better performance when the classes are considered as a whole. The ROC curves of the SVM model looked much worse than those of XGBoost and Random Forest, and its AUC values were also significantly worse. Lastly, the K-Nearest Neighbors model performed worst of all the models in this study: its AUC values for classes M, R, and T were even less than 0.8, and its ROC curves were much less smooth. In summary, XGBoost showed dominant performance among the tested models in this study. However, to reach a high accuracy, the XGBoost model may require more knowledge and parameter calibration than other techniques, such as Random Forest, which also needs to be considered when using XGBoost.
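Per-class AUC values for a multi-class classifier are commonly obtained in a one-vs-rest fashion, scoring each class against all the others. The sketch below assumes that setup, which the text does not state explicitly, and uses synthetic data with a Random Forest as a stand-in model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic stand-in data (NOT the study's data).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    n_classes=4, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)
proba = model.predict_proba(X_test)  # shape: (n_test_samples, n_classes)

# One-vs-rest: binarize the true labels, then score each class column separately.
y_onehot = label_binarize(y_test, classes=model.classes_)
auc_per_class = {
    int(cls): roc_auc_score(y_onehot[:, i], proba[:, i])
    for i, cls in enumerate(model.classes_)
}
print(auc_per_class)
```

A class whose AUC is close to 1 is well separated from the rest, while values nearer 0.5 indicate that the model can barely distinguish that class.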

Discussion and Conclusion
Urban functional zones classification has become increasingly important because it gives urban planners and decision makers a reference for monitoring changes in urban functional zones over space and time, and thus for making better plans and decisions. However, it remains a challenge due to the complexity of urban systems and the limitations of the available datasets. Although many classification approaches have been used to distinguish between different urban functional zones based on various kinds of datasets, there is still room for improvement in designing and employing more effective models for better accuracy. The newly developed machine learning classifier XGBoost has shown high efficiency and effectiveness in many applications, but it had not been tested in urban functional zones classification. Hence, in this study, the XGBoost model was employed, tested, and compared with other commonly used classification models to classify a variety of urban functional zones in the case study of Yuzhong District, Chongqing, China. In these experiments, the XGBoost model was found to be the best among all the commonly used models tested in this research, with the highest accuracy of 88.05%. The results demonstrate that the XGBoost model can be effectively applied in urban functional zones classification through the combination of physical and socioeconomic features extracted from high-resolution satellite images and multi-source geospatial data, respectively. In this study, ensemble classifiers, such as XGBoost and Random Forest, also showed promising classification performance compared to other kinds of state-of-the-art classifiers.
In addition, although XGBoost is a highly sophisticated algorithm, the model is still quite straightforward to use and performs better than Random Forest and the other tested models in terms of accuracy, confusion matrix, ROC curves, and AUC values in this case study. This might be due to the following aspects. First, XGBoost is a regularized boosting technique that allows users to define custom optimization objectives and evaluation criteria, which gives it high flexibility and helps reduce overfitting. Second, XGBoost is able to handle missing values, which could decrease the uncertainty. The Random Forest model ranks second among all the tested models in the case study. The Decision Tree, also a tree-based model, performed well but was less accurate than the XGBoost and Random Forest models. Surprisingly, Multinomial Logistic Regression, the most commonly used and simplest model, shows an accuracy of 70.86%, which is better than the SVM and K-Nearest Neighbors models in this study. SVM, although shown to be efficient in previous studies [26,44], only achieved an accuracy of 67.71% in this case study, which is lower than most of the classification models tested. The K-Nearest Neighbors model performed worst in both accuracy and classification separability in this study.
Our comparison of these models could serve as a reference for other case studies in urban functional zones classification, or even for other classification applications. As an extension of our current research, more effort will be put into the temporal dimension of urban functional zones classification, which requires even more efficient classification models or the integration of high-performance computation. In addition, in this research, the sample size was set to 100 m by 100 m given the data availability and the size of the research area. However, the scale may also affect the performance of these models, and we would like to continue our research in this direction in the future. Our case study based on multi-source geospatial datasets has also revealed the value of nighttime light imagery, social media datasets, POI datasets, and Baidu Heat Map for the recognition of urban functional zones. More geospatial datasets, especially social sensing datasets, are also worth exploring, which will be another direction of our future research.
Last but not least, there are a few limitations in this research. First, there are some uncertainties in the multi-source geospatial data collected. Because these uncertainties apply equally to all the models, their influence on the results of our comparison can be neglected. Second, considering the computational intensity, the parameter calibration in this research could be improved to obtain possibly more precise results. These aspects will be addressed in our future research.