Uncovering the Nature of Urban Land Use Composition Using Multi-Source Open Big Data with Ensemble Learning

: Detailed information on urban land uses has been an essential requirement for urban land management and policymaking. Recent advances in remote sensing and machine learning technologies have contributed to the mapping and monitoring of multi-scale urban land uses, yet there lacks a holistic mapping framework that is compatible with different end users’ demands. Moreover, land use mix has evolved to be a key component in modern urban settings, but few have explicitly measured the spatial complexity of land use or quantitively uncovered its driving forces. Addressing these challenges, here we developed a novel two-stage bottom-up scheme for mapping essential urban land use categories. In the ﬁrst stage, we conducted object-based land use classiﬁcation using crowdsourcing features derived from multi-source open big data and an automated ensemble learning approach. In the second stage, we identiﬁed parcel-based land use attributes, including the dominant type and mixture mode, by spatially correlating land parcels with the object-based results. Furthermore, we investigated the potential inﬂuencing factors of land use mix using principal components analysis and multiple linear regression. Experimental results in Ningbo, a coastal city in China, showed that the proposed framework could accurately depict the distribution and composition of urban land uses. At the object scale, the highest classiﬁcation accuracy was as high as 86% and 78% for the major (Level I) and minor (Level II) categories, respectively. At the parcel scale, the generated land use maps were spatially consistent with the object-based maps. We found larger parcels were more likely to be mixed in land use, and industrial lands were characterized as the most complicated category. We also identiﬁed multiple factors that had a collective impact on land use mix, including geography, socioeconomy, accessibility, and landscape metrics. Altogether, our proposed framework offered an alternative to investigating urban land use composition, which could be applied in a broad range of implications in future urban studies.


Introduction
Our planet witnessed rapid urbanization in recent decades. By 2018, global artificial surface areas reached 797,076 km 2 , more than 2.5 times that of 1990 [1]. This trend is expected to continue in the coming decades that by 2050, about 70% of the world's population (6.7 billion) is going to live in urban areas [2,3]. Although urbanization can promote economic growth and living standards improvement, its negative outcome in for the Level II category [41]. This hinders their applications in fine-resolution research at local and regional scales. Additionally, the majority of current studies are carried out in a determined spatial context (either pixel, object, or parcel). In consequence, it is valuable to construct a flexible and adjustable mapping framework that can simultaneously meet the demands from varying research and practice groups.
In recent years, ensemble learning, as an efficient method of machine learning, has received much attention from the remote sensing community. One advantage of ensemble learning is the capacity that strategically generates and combines multiple models or classifiers in solving a particular computational intelligence problem [42]. When multiple models are employed, the combined result of them is almost always better as compared to using a single model [43]. Consequently, ensemble learning is extremely helpful in improving mapping accuracy and has been widely adopted in various studies, including land cover classification [44,45], wetland monitoring [46,47], and change detection [48,49]. Nonetheless, the performance of ensemble learning in urban land use classification remains poorly understood.
Mixed land use has always been an unneglectable issue in land management and urban planning [50,51]. It is defined as the phenomenon that two or more land use types, such as industrial zones, commercial zones, and residential districts, simultaneously provide services for different groups in a spatial entity (for example, a land parcel) [52,53]. The city itself is an integrated, complex, and multifunctional systems system. Theoretically and practically, land use mix, as a fundamental principle of urban development, has achieved substantial progress worldwide [54]. In the book named The Death and Life of Great American Cities, Jacobs claims that the mixture of land uses is one of the critical preconditions for maintaining the city's vitality [55]. Apart from that, mixed land use has been treated as a desirable wheel for advocating active travel and promoting public health [56][57][58][59]. However, existing measurements of urban land use mix are mainly based on ground survey or statistical data and usually require a large amount of labor expense [51,54,60,61]. Moreover, few studies have quantitively explored the underlying factors that drive land use mix.
To address these challenges, we developed a novel two-stage bottom-up framework for urban land use categories mapping with multi-source geospatial big data and an automatic ensemble learning approach. Taking Ningbo as the case study, we provided a comprehensive review of urban land use composition in a Chinese city with the four research aims as follows: (1) derive urban land use classification maps accurately at both object and parcel scales; (2) verify the efficiency and robustness of ensemble learning in object-based urban land use classification; (3) measure the degree of land use mix at the parcel scale; and (4) investigate potential influencing factors that drive land use mix.

Basic Assumption: Each Parcel Is Composed of Objects
The starting point of our two-stage urban land use mapping framework lies in the spatial structure of mapping units, that is, each parcel consists of several objects, which have the same or different land use attributes. Since a pixel does not explicitly represent an entity characterizing the urban environment, we exclude the pixel-scale context in this study. A parcel is defined as a geographically meaningful region with relatively homogeneous socioeconomic functions and is usually delineated by the road network that surrounds [12,39,41]. An object is defined as a group of pixels with consistent visual cues, such as spectrum, texture, and shape [24,36]. Normally, a parcel has a larger area than an object does. Figure 1a displays a conceptional example of spatial interdependence between parcels and objects, in which a parcel contains multiple objects with different urban land uses including residential areas, commercial areas, and educational areas. In some cases, parcels may be used for a single land use type, for example, all buildings within a land parcel being for residential purpose. We interpret such kinds of parcels (i.e., one with objects in it all having the same function) as relatively pure land use.
(i.e., the basic spatial context for urban land use classification in this study). We leveraged multiple datasets from OSM road network, global urban boundaries (GUB) [62], and the 10 m global land cover product (FROM-GLC10) [63] for generating parcels while adopting a seed-based segmentation approach called the simple non-iterative clustering (SNIC) algorithm [64] for segmenting the 10 m Sentinel-2 imagery into homogeneous objects. See Supplementary Material Section S1 for more information on the generation of two types of mapping units.  Figure 2 presents a flowchart outlining the entire mapping scheme. According to the basic assumption in Section 2.1, we divided this framework into mapping essential urban land use categories at two stages: the object scale (EULUC-seg, stage 1) and the parcel scale (EULUC-parcel, stage 2). In stage 1, four main procedures were involved ( Figure 2): extracting features from multi-source remotely sensed and social sensing data; collecting training and validation samples; automatic classification with ensemble learning; and mapping and accuracy assessment. We performed ensemble learning and statistical analysis in the Python 3 environment and used the ArcMap 10.3 software for spatial analysis and map production. Figure 1b shows a flowchart for the generation of the two kinds of mapping units (i.e., the basic spatial context for urban land use classification in this study). We leveraged multiple datasets from OSM road network, global urban boundaries (GUB) [62], and the 10 m global land cover product (FROM-GLC10) [63] for generating parcels while adopting a seed-based segmentation approach called the simple non-iterative clustering (SNIC) algorithm [64] for segmenting the 10 m Sentinel-2 imagery into homogeneous objects. See Supplementary Material Section S1 for more information on the generation of two types of mapping units.  According to the basic assumption in Section 2.1, we divided this framework into mapping essential urban land use categories at two stages: the object scale (EULUC-seg, stage 1) and the parcel scale (EULUC-parcel, stage 2). In stage 1, four main procedures were involved ( Figure 2): extracting features from multi-source remotely sensed and social sensing data; collecting training and validation samples; automatic classification with ensemble learning; and mapping and accuracy assessment. We performed ensemble learning and statistical analysis in the Python 3 environment and used the ArcMap 10.3 software for spatial analysis and map production.

Feature Extraction
The inclusive features can be divided into two categories: remote sensing based and social sensing based. For remote sensing images, the average value, the sum value, and the standard deviation of each band are the most commonly used features. Additionally, texture features that describe the degree of tonal variations across pixels of an image or the level of landscape heterogeneity of an area [65] is another widely adopted item of information in land use classification [15,20,22,66,67]. Among the numerous texture calculation approaches in the literature, the gray level co-occurrence matrix (GLCM) method proposed by Haralick et al. [68] is a reliable method that computes the texture of an image by counting the occurrences of combinations of specific values between neighborhood pixels [69]. Following the GLCM method, we calculated six texture metrics of variance, correlation, contrast, dissimilarity, entropy, and angular second moment in this study. Details about the metric calculation and descriptions are provided in Table S1.

Feature Extraction
The inclusive features can be divided into two categories: remote sensing based and social sensing based. For remote sensing images, the average value, the sum value, and the standard deviation of each band are the most commonly used features. Additionally, texture features that describe the degree of tonal variations across pixels of an image or the level of landscape heterogeneity of an area [65] is another widely adopted item of information in land use classification [15,20,22,66,67]. Among the numerous texture calculation approaches in the literature, the gray level co-occurrence matrix (GLCM) method proposed by Haralick et al. [68] is a reliable method that computes the texture of an image by counting the occurrences of combinations of specific values between neighborhood pixels [69]. Following the GLCM method, we calculated six texture metrics of variance, correlation, contrast, dissimilarity, entropy, and angular second moment in this study. Details about the metric calculation and descriptions are provided in Table S1. As for social sensing data, feature extraction is determined by the structure and characteristics of data. For example, the point of interest (POI) data are a set of points recording multiple spatial and attribute information of geographical entities, such as addresses, names, coordinates, and land use types. In this case, features of POI data are usually calculated based on the number and proportion of each POI type within the mapping unit. Table S2 summarizes all object-based features used in the stage 1 mapping of EULUCseg. In total, 76 features derived from multi-source remotely sensed and social sensing data sources were included in this study.

Sample Collection
We adopted the two-level classification system proposed in our previous study for mapping urban land use, which comprises five major (Level I) categories (residential, commercial, industrial, transportation, and public) and twelve minor (Level II) categories (residential, village, business, commercial, industrial, transportation, administrative, educational, medical, sport and cultural, park and greenspace, and undeveloped) ( Table 1). This classification system originated from the EULUC scheme proposed by Gong et al. [41] and was later modified by Tu et al. [23] considering the characteristics of the study area. Detailed descriptions for each category of the two-level EULUC classification system are provided in Table S3. Based on the defined classification system, we identified 485 samples of the ground truth through visual interpretation and field investigation (Table 1). Practically, we first randomly selected objects within the study area and interpreted them based on multiple online sources such as Google Earth (https://www.google.com/earth/, accessed on 20 August 2021) and Baidu Street View Maps (https://map.baidu.com/, accessed on 20 August 2021). Second, we conducted an on-site field survey back in October 2019 and confirmed that more than 99% of the investigated samples were correct [23]. For subsequent analysis, the collected samples were randomly split into two datasets, that is, 70% for training (340) and 30% for validation (145). The training data were used for ensemble learning through multi-layer stacking, 5-fold cross-validation, and parameter tuning (see Section 2.2.3). The optimal model retrieved from ensemble learning was then applied to the validation samples for an accuracy assessment (see Section 2.2.4).

Ensemble Learning
As shown in Figure 2, we leveraged the multi-layer stacking model in ensemble learning. Each stack layer (L) is composed of several individual base models ("Base Learner" (BL)) and a "Meta Learner" (ML). Iteratively, each base model in BL n is trained individually with the output of ML n−1 in the previous stack layer, and ML n is trained by stacking the learning results from BL n and original input features. The revisit of the original data enables high-layer stackers to achieve more robust and accurate performance during the training process.
Apart from multi-layer stacking, here we adopted a bagging approach called k-fold cross-validation for reducing variance in predicting results and mitigating over-fitting issues in ensemble learning. Practically, for any base model in BL n , we randomly partitioned the input dataset into k subsamples with equal size (stratified based on labels). Among the k subsamples, one single subsample was reserved as the validation set for model testing, and the remaining k−1 subsamples were used for model training. This cross-validation process would be repeated k times, in which each k subsample was used only once for the validation. The averaged k results were then computed as the final output.
We employed the AutoGluon package introduced by Erickson et al. [70] to realize automatic ensemble learning. AutoGluon is an open-source Python library that automates the process of model selection, hyperparameter tuning, and model ensembling during machine learning [70]. Based on the given parameters, such as training time, bagging strategy, etc., AutoGluon will automatically achieve the best classification results by combining and stacking multi-model classifications. In this study, the parameter "num_bag_folds" was set to 5 for 5-fold cross-validation, "auto_stack" was set to True for automatic multi-layer stacking, and "time_limit" was set to 3600 for a maximum learning time of 3600 s in total. To achieve the most robust classification results, we used all the available base models provided by AutoGluon herein ensemble learning, which included Random Forest [71], Extremely Randomized Trees [72], Gradient Boosted Decision Trees (CatBoost) [73], Light Gradient Boosting Machine (LightGBM) [74], and Neural Networks [75]. For each base model, we tested its performance under 20 sets of parameter combinations and chose values with the highest overall accuracy as the optimal parameters. An introduction to each base model as well as its parameter settings is provided in Supplementary Material Section S2.

Accuracy Assessment and Mapping
Two evaluation schemes were included in the accuracy assessment. For the training process, the average overall accuracy [76] derived from the 5-fold cross-validation using ensemble learning was calculated for comparing the classification performance of different models. In this process, the model with the highest training accuracy was defined as the optimal model. For the validation process, the validation sample set (described in Section 2.2.2) was used to assess the performance of the optimal model independently. Specifically, we calculated the overall accuracy, Kappa coefficient, user accuracy, and producer accuracy based on the confusion matrix [77].
After that, we predicted the land use categories based on the derived features (Section 2.2.1) and the optimal training model (Section 2.2.3), and finally generated the stage-1 mapping results of EULUC-seg.

Stage-2: Mapping at the Parcel Scale
Since each parcel consisted of several objects, we could further identify the land use attributes (dominant category, degree of mix, etc.) of parcels according to objects within them, which had been classified in the previous stage of EULUC-seg. Specifically, for stage 2 mapping of EULUC-parcel, we defined and calculated three indices, i.e., dominant category (DC), dominant rate (DR), and complexity index (CI), based on the spatial relationship between land parcels and objects. We assigned DC as the final map of EULUC-parcel. We used DR and CI to measure the land use mix of parcels.
Let P i represent the area proportion of the i-th land use category to the entire parcel and n the total number of land use categories of a parcel. Three indices of DC, DR, and CI can be calculated as: DC is determined as the land use category with the largest area proportion within each parcel. DR refers to the area proportion of DC in the entire parcel. The larger the DR, the purer the parcel (1 indicates single land use). CI is essentially an entropy-based index that characterizes the evenness of land use classes and has been widely adopted in measuring land use mix [50,60,[78][79][80]. It ranges from 0 to 1 with a higher CI value indicating a more mixed parcel and vice versa. CI equals 0 when there is only one land use within the parcel and reaches 1 when all the land use classes are evenly distributed. In this study, we mainly focused on the calculation and analysis of land use mix for the Level I category.

Quantifying Influencing Factors of Land Use Mix
With an understanding of the current status of land use mix, our next goal was to explore the underlying factors behind the spatial heterogeneity of land use mix, which can provide guidance and insight for urban planning and neighborhood design. After a comprehensive review of the existing literature [51,57,61,81], we selected 19 variables from four aspects of geography, socioeconomy, accessibility, and landscape as the potential influencing factors of urban land use mix ( Table 2). The variables, calculated at the parcel scale, were based on attributes of the landscape itself (e.g., parcel size), in addition to other multi-source geospatial data (e.g., using digital elevation model (DEM) data to obtain elevations). Taking the Level I category as an experiment, we further uncovered the driving forces of land use mix by performing principal components analysis (PCA) and multiple linear regression with CI as the dependent variable. A detailed description of how we processed and calculated the raw data to derive these variables as well as quantified their associations with land use mix was provided in the Supplementary Material Section S3.

Study Area
Ningbo is a sub-provincial city located in the northeastern Zhejiang province, China (between 28 • 51 -30 • 33 N and 120 • 55 -122 • 16 E, Figure 3). It faces the East China Sea and Zhoushan Archipelago to the east, Hangzhou Bay to the north, and Shanghai-the largest and most prosperous metropolis in China-across the sea [82]. Topographically, the city is high in the southwest and low in the northeast with an average elevation of 4 m [83]. The total land area of Ningbo is 9816.23 km 2 , of which plain accounts for 40.3%, hill accounts for 25.2%, and mountain accounts for 24.9% [84].
Ningbo is a sub-provincial city located in the northeastern Zhejiang province, China (between 28°51′-30°33′N and 120°55′-122°16′E, Figure 3). It faces the East China Sea and Zhoushan Archipelago to the east, Hangzhou Bay to the north, and Shanghai-the largest and most prosperous metropolis in China-across the sea [82]. Topographically, the city is high in the southwest and low in the northeast with an average elevation of 4 m [83]. The total land area of Ningbo is 9816.23 km 2 , of which plain accounts for 40.3%, hill accounts for 25.2%, and mountain accounts for 24.9% [84]. Benefiting from the country's "Reform and Opening-Up" policy, Ningbo is among the first batch of Chinese coastal cities that opened to the outside world back in the 1980s. It has, therefore, been experiencing dramatic socioeconomic development, rapid urban expansion, and substantial population growth in the past four decades. Statistically, Ningbo's gross domestic product (GDP) has increased from USD 0.37 billion in 1979 to USD 191.68 billion in 2020. It now possesses a registered population of 6.03 million with an urbanization rate of 72.9% [85]. As an important economic, industrial, and trading center in East China, the city has a diversified land use pattern [23,84,86], making it a representative region for case studies. Benefiting from the country's "Reform and Opening-Up" policy, Ningbo is among the first batch of Chinese coastal cities that opened to the outside world back in the 1980s. It has, therefore, been experiencing dramatic socioeconomic development, rapid urban expansion, and substantial population growth in the past four decades. Statistically, Ningbo's gross domestic product (GDP) has increased from USD 0.37 billion in 1979 to USD 191.68 billion in 2020. It now possesses a registered population of 6.03 million with an urbanization rate of 72.9% [85]. As an important economic, industrial, and trading center in East China, the city has a diversified land use pattern [23,84,86], making it a representative region for case studies.

Data
As listed in Table 3, we included an expansive set of remotely sensed and social sensing data layers for mapping urban land use categories, including Sentinel-1 Synthetic Aperture Radar (SAR) imagery, Sentinel-2 multispectral imagery, Luojia-1 NTL imagery, WorldPop population dataset, and Baidu POI data. All these datasets were collected during the year 2018 for temporal consistency. Details for the preparation and processing of each category of the used datasets are provided in the Supplementary Material Section S4. Figure S2 displays and compares six layers of the datasets used in the city center of Ningbo, in which a significant difference is observed in spectral reflectance, spatial resolution, and data structure. Compared with Sentinel-1 and Sentinel-2 ( Figure S2a-c), Luojia-1 and WorldPop reveal fewer spatial details due to the relatively low spatial resolution ( Figure S2d,e). As the only vector dataset used, the distribution of POI data shows a certain heterogeneity across space ( Figure S2f). Following Section 2.2.1, we further extracted an expansive set of spectrum, texture, and additional features based on these datasets for object-based urban land use classification (i.e., stage 1 mapping). Figure S3 provides an example of the five typical object-based features derived.

Accuracy Assessment
Tables 4 and 5 compare the classification accuracy of different models of EULUCseg for the two-level categories. At the object scale, Ensemble models achieved the best performance for both the Level I category (training accuracy: 86.47%) and the Level II category (training accuracy: 77.94%), followed by Neural Networks, LightGBM, CatBoost, Extremely Randomized Trees, and Random Forest models with slightly lower accuracy. This indicated that the multi-layer stacking strategy did help improve model performance in land use classification. Another finding was that Ensemble and Neural Networks models required more training time than other models. Given the relatively superior performance of Ensemble models, we chose them as the optimal models for subsequent analysis. We further evaluated the classification performance of optimal models for each land use category in EULUC-seg, using the validation sample set in Section 2.2.2. An overall accuracy of 85.52% and a kappa coefficient of 0.79 were obtained for the five Level I categories (Table S5). Residential land had the highest user accuracy of 97.73%, while transportation land had the lowest user accuracy of 71.43%. In terms of producer accuracy, the classification performance of all Level I categories was rather satisfying (>85%). As for the Level II category, the overall accuracy and the kappa coefficient were 77.93% and 0.75, respectively (Table S6). Out of the twelve Level II categories, residential, village, and greenspace could be well classified, with both user accuracy and producer accuracy higher than 80%. In contrast, public land uses such as educational, medical, and sport and cultural lands yielded less plausible classification performance. As shown in the confusion matrix in Table S6, such kinds of land use categories, which aim at public management and service, can be easily confused with residential and business lands. Moreover, it was discovered that commercial lands had a relatively low user accuracy (41.67%), while industrial lands had a relatively low producer accuracy (60.00%). Figure 4 shows the two-stage urban land use mapping results for Level II categories in Ningbo. Compared with EULUC-parcel, EULUC-seg was more fragmented with smaller mapping units. Owing to the proposed bottom-up mapping scheme, urban land use distributions showed good spatial consistency between the object scale and the parcel scale. The core urban area was dominated by residential, business, and commercial lands (Figure 4b1,b2), whereas the suburban area was mainly distributed with residential, village, industrial, and greenspace lands (Figure 4a1,a2,c1,c2).    Figure 5 distributes land use mix in Ningbo as well as its relationship with parcel size. In terms of spatial distribution, land use was heterogeneously mixed in the city center, with larger parcels generally having higher CI values (Figure 5a) and lower DR values (Figure 5b). This was confirmed in the linear regression results, where a significantly positive relationship (r = 0.57, p < 0.01) was observed between CI and parcel area (Figure 5c), while a significantly negative relationship (r = −0.60, p < 0.01) was observed between DR and parcel area (Figure 5d). Moreover, it was discovered that land use was relatively mixed in peri-urban areas, with CI values generally higher than 0.6 ( Figure 5a).  Table 6 compares the degree of mix between different land use categories. Land uses for industrial and transportation purposes were the most complicated, with an average CI value of 0.56 ± 0.29 and 0.49 ± 0.35 and an average DR value of 0.76 ± 0.17 and 0.73 ± 0.23, respectively. In contrast, public lands had the simplest use among all categories, with a minimum CI value of 0.31 ± 0.37 and the maximum DR value of 0.87 ± 0.18. Moreover, residential and commercial lands were relatively simple, with half of them having a DR value higher than 0.86. In general, among the 5562 parcels investigated, most of them were dominated by one land use category only (average DR value: 0.84 ± 0.18) and about a quarter of them had a CI value greater than 0.71. Table 6. Statistics of complexity index and dominant rate for each Level I land use category. Definition for each land use category can be seen in Table 1.  Table 7 summarizes the extracted ten components from PCA, which totally explain 93.68% of the variables listed in Table 2. According to the calculated variable weights of different PCs (Table S8), we defined variables with loading factors greater than 0.5 as important variables and assigned physical meaning to each PC. For example, PC 1 and PC 2 , which accounted for 39.17% and 16.59% of the overall variance in the original data, respectively, were both mainly explained by the variable of distance to subway station (dis_subway). Therefore, these two components could be generally grouped as accessibility. Table 7. Summary of the extracted ten components from PCA. Important variables with a loading factor higher than 0.50 are listed.

Component
Percentage of Explained Variances (%) Important Variables Physical Meanings PC 1 39.17 dis_subway accessibility PC 2 16.59 dis_subway accessibility PC 3 11.41 dis_track_road, dis_rail accessibility PC 4 7.01 ndvi geography PC 5 4.09 richness, shape landscape PC 6 3.99 fra_silt geography PC 7 3.74 dis_major_road accessibility PC 8 3.37 house_price, shape socioeconomy and landscape PC 9 2.43 dis_major_road, house_price accessibility and socioeconomy PC 10 1.89 dis_rail accessibility Total 93.68 Table 8 reports the results of the multiple linear regression model with complexity index CI as the dependent variable. Both PC 1 and PC 2 , which represented the accessibility aspect, were significantly associated with the complexity index in a positive manner, indicating that the longer the distance to the subway station, the larger the land use mix will be. Contrary to the subway, regression results of PC 3 showed that the complexity of parcels would increase when their distances to the track road or the railway decreased (p < 0.05), a negative relationship. These results of PC 1-3 indicated that accessibility had a double-edged impact on land use. PC 8 was negatively associated with the complexity index at the p < 0.001 level, implying parcels with higher house prices and more irregular shapes would have less mixed land use. Moreover, regression results of PC 4 and PC 6 both reflected that a friendlier natural environment (such as higher green cover rates) could lead to an increase in land use mix.

Advantages of the Proposed Framework
Different from previous studies that only focused on a single mapping scale, in this research, we identified land use attributes at both parcel and object scales using a bottom-up mapping strategy, with the aid of multi-source geospatial big data and ensemble learning approaches. The developed mapping scheme has some noticeable advantages. First, it significantly improves classification accuracy at the object scale. Machine learning has been widely accepted as a fundamental tool for land use and land cover classification. In this study, leveraging the automatic ensemble learning strategy, we tested a group of machine learning algorithms and compared multi-model performance in urban land use classification, using the same training and validation samples. Our results showed that Ensemble models achieved more robust and better performance in terms of classification accuracy. Compared with other models, the utilization of Ensemble models could yield a net increase in the training accuracy of 2.65-6.18% for the Level I category and 2.65%-8.53% for the Level II category (Tables 4 and 5). Since the protocol of ensemble learning is to incorporate results of various models through multi-layer stacking and bagging, this method is especially suitable for dealing with extensive data with high dimensions. The efficiency and robustness of machine learning have also been observed and discussed in our previous experiments for land use and land cover mapping [12,23,44,[87][88][89].
Second, the proposed scheme has, in the meantime, achieved robust classification results at the parcel scale. Mixed land use has been a major challenge in parcel-based land use mapping. In the nationwide study of EULUC-China, for instance, Gong et al. [41] discovered that overall accuracy decreases rapidly with the increase in the land use mix of parcels. Figure S4 also provides a comparison between the two-stage mapping results of this study with the EULUC-China maps [41]. Since Gong et al. [41] directly performed classification at the parcel scale, parcels with mixed land use, especially large ones, were easily misclassified ( Figure S4j-l). By addressing this shortcoming, our mapping results will be more accurate in revealing land use patterns and keep solid spatial consistency between objects (Figure S4d-f) and parcels ( Figure S4g-i).
Third, the proposed mapping framework has offered an alternative to measuring land use mix. At the parcel scale, we delineated the degree of land use mix (CI) and the rate of dominant land use (DR) through spatial aggregation and indices calculation, using the classification results at the object scale. We found that larger parcels had more mixed land use, of which many were distributed in the peri-urban area ( Figure 5). We also found that industrial land uses were the most complicated, while residential and public lands were the simplest land uses (Table 6). These findings were generally in line with previous research [51,80]. Theoretically, the proposed scheme is very flexible and can be extended to any other region in the future. Based on the generated land use maps, urban planners, decision-makers, and stakeholders may use them as a benchmark to understand the current distribution, rationality, and mixture of land use or to examine the implementation effects of land adjustment and planning policy. Moreover, the produced maps can be correlated with factors such as urban livability or environmental quality to explore the potential influencing factors of mixed land use, which in turn guide future urban planning.

Limitations and Future Work
A few remaining caveats caused by data limitations need to be acknowledged. On the one hand, the size of mapping units largely depends on the data quality and parameter settings. For instance, parcels in this study are generated based on buffered roads of OSM data (Figure 1b). For areas where OSM roads are sparse or even absent, the generated parcels can be too large and are not suitable for urban studies. On the other hand, this study focuses on the spatial complexity of different land use categories but neglects other kinds of mixes (such as horizontal mix or inner mix). In reality, it is common in urban planning that a building provides both commercial and residential functions. To deeper uncover the nature of urban land use composition, higher-quality emerging data (such as street view maps) and more refined models (such as the 3D model) are urgently needed in future work. Lastly, our experiments and analysis focus on one Chinese city of Ningbo given the cost of sample collections and data availability, and the results and conclusions may not be applicable to other cities. Nonetheless, the main purpose of this research is to develop a prototype that comprehensively examines the distribution, mixture, and factors of urban land use. Potentially through crowdsourcing and cross-cooperation, our next goal is to de-composite land use patterns across cities with different socioeconomic development and historical-cultural backgrounds, which provides new insights into urban planning and management at a broader scale.

Conclusions
Leveraging multi-source open big data and machine learning algorithms, this research developed a flexible and cost-effective framework for multi-scale urban land use category mapping. Following this framework, we first performed object-based land use classification using an expansive set of remote sensing and social sensing data layers including OSM, Sentinel-1, Sentinel-2, Luojia-1, WorldPop, and Baidu POI data. Secondly, by spatially joining the classification results from the object scale, we calculated three land use attributes including dominant category, dominant rate, and complexity index at the parcel scale. Our results indicated that Ensemble models achieved better results than the other base models, with a training accuracy of 86% for the Level I category and 78% for the Level II category, respectively. In addition, the two-stage mapping results showed strong consistency in spatial patterns. These findings elucidated the role of multi-layer stacking and bagging in urban land use classifications.
Land use mix, as a ubiquitous characteristic of cities, has become a key concern in recent urban planning. Here, we proposed an efficient approach to measuring the degree of land use mix and its underlying driving forces. With this detailed information, planners, stakeholders, and city officials can quickly understand current land use compositions as well as decide when and where adjustments should be made. The new framework is expected to be widely utilized in various applications and implications across regions and countries.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/rs13214241/s1, Section S1: Mapping units generation, Section S2: Base models and parameter tuning in ensemble learning, Section S3: Quantifying influencing factors of land use mix, Section S4: Data preparation and processing, Table S1: Texture metrics used in this study, Table S2: Summary of features used in the stage-1 mapping of EULUC-seg, Table S3: The two-level essential urban land use categories (EULUC) classification system, Table S4: Tuned optimal parameters for each base model in ensemble learning, Table S5: Confusion matrix for the Level I category of EULUC-seg, Table S6: Confusion matrix for the Level II category of EULUC-seg, Table S7: Urban land use composition in Ningbo, 2018, Table S8: Extracted ten components from PCA and their variable weights, Figure S1: The architecture of Neural Networks in ensemble learning, Figure S2: Comparison of multi-source data with different spatial resolutions used for urban land use classification in this study, Figure S3: Comparison of object-based features derived from multi-source data in the city center of Ningbo, Figure S4: Comparison of remotely sensed images and urban land use mapping results.

Data Availability Statement:
Interactive map for the Level I categories of essential urban land use in Ningbo is available at https://thutyecology.users.earthengine.app/view/euluc-ningbo-viewer (accessed on 20 August 2021).

Conflicts of Interest:
The authors declare no conflict of interest.