Effectiveness of Semi-Supervised Learning and Multi-Source Data in Detailed Urban Landuse Mapping with a Few Labeled Samples

Abstract: Detailed urban landuse information plays a fundamental role in smart city management. A sufficient sample size has been identified as a crucial prerequisite for machine learning algorithms in urban landuse classification. However, it is often difficult to recognize and label landuse categories from remote sensing images alone, while field investigation is time-consuming and demanding in human resources and monetary cost. Therefore, previous studies on urban landuse classification have often relied on a small number of labeled samples with a very uneven spatial distribution. This study explores the effectiveness of a semi-supervised classification framework with multi-source data for detailed urban landuse classification with a few labeled samples. A disagreement-based semi-supervised learning approach, the co-forest, was employed and compared with traditional supervised methods (e.g., random forest and XGBoost). Multi-source geospatial data were utilized, including optical and nighttime light remote sensing and geospatial big data, which present the physical and socio-economic features of landuse categories. Taking urban landuse classification in Shenzhen City as a case, the results show that the classification accuracy of the semi-supervised method is generally on par with that of traditional supervised methods, and that fewer labeled samples are needed to achieve a comparable result under different training set ratios. Given a small sample size, the accuracy tends to be stable when training samples account for no less than 5% of the total. Our results also indicate that the classification accuracy achieved with multi-source data is significantly higher than that achieved with any single data source. Among these data, map POI and high-resolution optical remote sensing data contribute most to the classification, followed by mobile data and nighttime light remote sensing data.


Introduction
With the development of smart city construction, accurate and detailed urban landuse information plays a fundamental role in urban planning, resource allocation, and public administration. An up-to-date urban landuse map is in high demand for the management of a smart society. Remote sensing technology, which provides wide-range observation and rapid response to change, has been widely used in studies on urban landuse and land cover classification [1][2][3][4]. Traditional urban landuse classification techniques are based on multi-spectral remote sensing images. As the spatial resolution of remote sensing imagery has improved, geometric and texture features have been employed in addition to spectral features to obtain a more accurate classification [5][6][7]. However, it is

Related Work
Implementing urban landuse classification based on multi-source geospatial big data, including remote sensing and social attribute data, raises some key issues. (1) Traditional machine learning algorithms, including deep learning methods, need a large number of training samples to build a reliable classifier. However, it is usually difficult to obtain a large number of labeled samples, since reliable labeling of urban landuse types by field investigation is time-consuming and places a high demand on human and monetary resources. (2) Considering the balance between the cost of sampling and the effect of model training, the optimal size of training samples for different algorithms in urban landuse classification has not been well addressed. This section reviews existing urban landuse classification methods and the relevant discussion on classification stability with limited training samples.

Urban Landuse Classification Methods
A variety of machine learning algorithms have been developed for urban landuse and land cover classification and urban functional zoning in the past decades. Apart from remote sensing images, multi-source social sensing data have increasingly been adopted. The application of multi-source data has become an important direction in the urban landuse classification field [24]. The most commonly used algorithms include the support vector machine (SVM), decision tree (DT), and random forest (RF), which are also regarded as the benchmarks in comparative analyses. Mountrakis et al. [25] reviewed the SVM algorithm in remote sensing applications and pointed out that it was suitable for multi-class image classification tasks. Owing to its simplicity, the DT algorithm is widely used in remote sensing urban landuse classification, with the advantage of fusing complex features at different scales [26]. As a further advance of the tree-based DT algorithm, the RF algorithm is another commonly used method. Talukdar et al. [27] summarized several machine learning algorithms, including RF, SVM, and artificial neural network (ANN), in landuse and land cover classification based on multi-spectral remote sensing images. They found that the RF algorithm obtained the best performance. Zhang et al. [28] compared machine learning algorithms in landuse classification based on map POI data. They concluded that tree-based methods such as the RF algorithm performed better than the Bayesian network, rule-based learning, and the SVM. With the development of deep learning, deep neural network algorithms with high-level features have been used in the recognition of urban functional zones [29]. Jozdani et al. [30] compared deep learning algorithms with traditional machine learning algorithms in an object-based landuse classification. They indicated that traditional machine learning algorithms such as the XGBoost algorithm achieved results comparable to those of deep learning algorithms.
Besides, unsupervised algorithms such as Gaussian mixture model, the k-means algorithm, kernel density classification algorithm, and hierarchical clustering algorithm have been applied to urban landuse classification [31,32]. The classification results are generally not as good as the supervised methods.

Classification Stability with Small Sample Size
In practical applications, the use of limited samples is very common in urban landuse classification because labeled samples are always rare, and labeling a large number of unlabeled samples costs too much. A small sample size refers to a situation where the proportion of labeled or training samples is small relative to the samples to be classified. Zhao et al. [33] reviewed the current work of machine learning with small size of labeled samples. They concluded that there were still many challenges, although progress has been made in this area. Based on multi-spectral remote sensing data, Li et al. [34] tested the impact of training sample size on urban landuse classification by using supervised and unsupervised classifiers. They indicated that tree-based classifiers were more sensitive to training sample size. Based on multi-source geospatial data, Su et al. [35] analyzed the impact of training sample size on detailed urban landuse classification by using the RF algorithm. They pointed out that a stable result could be achieved with a proportion of training Remote Sens. 2022, 14, 648 4 of 17 samples no less than 7%. From the angle of increasing labeled samples, Gong et al. [36] employed a crowd-sourcing method to obtain a large number of labeled urban landuse samples in more than 30 cities in China to improve the generalization performance of the classification model. However, considering the total number of samples to be classified over the 30 cities, the number of labeled samples is still regarded as a small size for a specific urban landuse category. Furthermore, labeling landuse information on urban land parcels is highly dependent on expert experience (e.g., urban planner) and is usually difficult to obtain through a crowd-sourcing approach.

Materials and Methods
In this study, we conducted a parcel-based landuse classification based on road segmentation. We chose a set of urban landuse features from multi-source geospatial big data for model training and classification, including multi-spectral and textural features from high-resolution optical remote sensing images, light brightness features from nighttime light (NTL) remote sensing images, and human activity and behavior features from map POI and mobile phone data. Based on these multi-dimensional features, we adopted a semi-supervised co-forest classification framework for detailed urban landuse classification and compared it with the most popular supervised tree-based classifiers, namely random forest (RF) and XGBoost. To analyze the impact of small sample size on model performance, we also tested the stability of the classification models under different proportions of training samples.

Study Area
We chose Shenzhen City as the study area (Figure 1a). Shenzhen is a coastal city in southern China and on the border with Hong Kong. It has an area of 1997 km² with a population of more than 13 million by the end of 2020. The city is one of the pioneer cities experiencing reform and the opening-up policy in China. With rapid urban development in the past decades, it has experienced a dramatic change including changes in the urban landscape. Shenzhen has been designated as a national pilot city for China's comprehensive reform, and to lead the construction of the Guangdong-Hong Kong-Macao Greater Bay Area. New residential areas, industrial parks, transport network, and tourism infrastructure have been planned. More changes in urban landscape are expected in the future. As one of the fastest growing cities in China, Shenzhen can offer more open and comprehensive geospatial big data and is regarded as a natural template for urban studies.

Road Segmentation for Land Parcels
Road network information from an open-source dataset, OpenStreetMap (OSM) (https://www.openstreetmap.org, accessed on 6 December 2021), was utilized to divide the whole study area into land parcels. Two levels of important roads from the OSM data were utilized, namely, the main road and the secondary road. The geometry of the road network is represented in the OSM data by single lines (i.e., road centerlines). Buffer zones were applied to those lines based on the widths of the roads, which were determined by road level. Since we aimed to classify detailed urban landuse rather than land cover, impervious surface data of Shenzhen from GAIA_2018 (http://data.ess.tsinghua.edu.cn/gaia.html, accessed on 6 December 2021) were utilized to mask non-built-up areas such as woodland, grassland, and wetland. The sizes of the land parcels were heterogeneous. After segmentation, very small land parcels with an area of less than 1000 square meters were removed. Finally, the number of urban land parcels to be classified was more than 6800. Figure 1b shows the distribution of these land parcels in Shenzhen.
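The final filtering step above can be illustrated with a minimal sketch. It assumes that each parcel is a simple polygon whose vertices are given in a projected coordinate system with units of meters; the function names and the two sample parcels are hypothetical and are not taken from the study.

```python
# Hypothetical sketch: drop land parcels smaller than 1000 m^2.
# Assumption: each parcel is a simple polygon given as (x, y) vertices
# in a projected coordinate system with units of meters.

MIN_PARCEL_AREA_M2 = 1000.0  # threshold stated in the text


def polygon_area(vertices):
    """Return the area of a simple polygon using the shoelace formula."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def filter_small_parcels(parcels, min_area=MIN_PARCEL_AREA_M2):
    """Keep only parcels whose area is at least min_area."""
    return {
        parcel_id: vertices
        for parcel_id, vertices in parcels.items()
        if polygon_area(vertices) >= min_area
    }


# Made-up example parcels (not data from the study).
parcels = {
    "p1": [(0, 0), (100, 0), (100, 50), (0, 50)],  # 100 m x 50 m rectangle
    "p2": [(0, 0), (20, 0), (20, 30), (0, 30)],    # 20 m x 30 m rectangle
}
kept = filter_small_parcels(parcels)
```

In practice a GIS library would normally handle buffering and area computation; the pure-Python version is only meant to make the threshold rule explicit.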

Data and Data Pre-processing
To present the characteristics of urban landuse in multiple dimensions, multi-source geospatial data were utilized, most of them free of charge (Figure 2). These comprise Sentinel-2 high-resolution remote sensing data (source: https://earthengine.google.com/, accessed on 6 December 2021), Luojia-1 NTL remote sensing data (source: http://59.175.109.173:8888/app/login.html, accessed on 6 December 2021), Gaode Map POI data (source: https://lbs.amap.com, accessed on 6 December 2021), and GPS location-based mobile big data provided by a leading third-party big data company in China. The mobile data recorded the cumulative number of active mobile devices in a grid (around 140 m resolution) by month. Daytime (9 a.m.-5 p.m.) and nighttime (9 p.m.-5 a.m.) statistics were adopted. To maintain the temporal consistency of the data, all datasets were collected from the same period of 2019, except for the map POI data, because historical POI data are difficult to obtain through public methods. Hence, POI data from 2020 were collected to keep the temporal consistency as much as possible. Table 1 summarizes the basic characteristics of the data.
To reduce the influence of clouds and precipitation on the remote sensing data (Sentinel-2 and Luojia-1), cloud-free composite images from autumn and winter were utilized. Because Shenzhen is an immigrant city, human activities are easily affected by the holiday economy and population migration. The mobile data were therefore collected in October 2018 (National Day holiday), November 2018 (non-holiday), and February 2019 (Chinese Lunar New Year). Data pre-processing included data cleansing and coordinate transformation. For the map POI data, the pre-processing also included category reclassification to match the urban landuse classification system.

Feature Selection and Dimension Reduction
The multi-scale data were spatially unified based on land parcels in the form of data features (i.e., attributes of land parcels). Statistical characteristics were utilized to generate these attributes (e.g., the average NTL brightness value in a land parcel). Based on prior knowledge, initial features were manually selected, including physical and socioeconomic features. The former were mainly from remote sensing imagery data, such as band spectral characteristics, vegetation index, and nighttime light brightness. The latter were mainly from map POI and mobile big data, such as population density in the daytime and nighttime, population in different months, and POI density and type.
As with most machine learning workflows, it is necessary to reduce the dimension of the feature space to limit the complexity of the classification model. A few features that can best describe and present urban landuse types were finalized. Generally, there are two approaches to dimension reduction, namely, feature extraction and feature selection [37]. Feature extraction aims to construct a new feature space through feature transformation or a combination of features. It involves the generation of new features, which may result in the loss of original feature information, and the newly generated attributes are usually difficult to explain physically. Feature selection, in contrast, aims to obtain a subset of the original attribute features, which retains the physical interpretability of the features. To preserve this interpretability, feature selection was adopted for feature dimension reduction. The correlation coefficient (r) was utilized to filter out redundancy with a threshold of 0.95 to minimize the feature dimension. If r is greater than 0.95 for a pair of features, only one of the paired features is retained. After feature dimension reduction, a subset of the features was utilized for model training.
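The correlation-based redundancy filter described above could be sketched as follows. This is only an illustration under stated assumptions: the text does not say which feature of a correlated pair is retained or whether negative correlations count, so the sketch keeps the first feature encountered and compares the absolute value of Pearson's r with the threshold. The feature names and values are made up.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no linear correlation
    return cov / math.sqrt(var_x * var_y)


def drop_redundant_features(features, threshold=0.95):
    """Greedy filter: keep a feature only if |r| with every kept feature
    does not exceed the threshold (assumption: first-seen feature wins)."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson_r(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up per-parcel feature values (not data from the study).
features = {
    "ndvi_mean": [0.10, 0.20, 0.30, 0.40],
    "ndvi_median": [0.11, 0.19, 0.31, 0.41],  # nearly identical to ndvi_mean
    "poi_density": [5.0, 1.0, 4.0, 2.0],
}
selected = drop_redundant_features(features)
```

Because the filter is greedy, the result depends on feature order; a different tie-breaking rule (for example, keeping the feature with higher importance) would be equally compatible with the description in the text.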

Semi-Supervised Multi-Feature Classification Framework
The main objective of semi-supervised learning is to train classifiers with both labeled and unlabeled samples. The classification model is first trained on labeled samples and then refined with unlabeled samples [38]. In our study, the semi-supervised co-forest algorithm was applied to detailed urban landuse classification. The co-forest algorithm is a disagreement-based semi-supervised classification method and is regarded as an extension of co-training based on the RF algorithm (a co-training-style RF algorithm) [39]. In the co-forest, N (N > 3) classifiers such as random decision trees are first individually trained on the original labeled sample set. If the classifiers agree on the labels of some unlabeled samples with a certain confidence, a new training set is generated from the original and newly labeled samples to re-train the classifiers. This automated labeling strategy for unlabeled samples is characterized by higher cost-performance [40]. Figure 3 illustrates the research framework of detailed urban landuse classification based on multi-source geospatial data and the co-forest algorithm.
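The agreement-with-confidence step can be sketched in isolation. The code below is not a full co-forest implementation and is not the authors' Java/Weka code; it assumes that, for a given classifier, the remaining classifiers vote on each unlabeled sample and that the sample is pseudo-labeled only when the share of agreeing votes exceeds a threshold. The threshold value of 0.7, the function names, and the votes are hypothetical.

```python
from collections import Counter


def ensemble_vote(votes, exclude_index):
    """Majority label and agreement ratio among all classifiers except one."""
    others = [v for i, v in enumerate(votes) if i != exclude_index]
    label, count = Counter(others).most_common(1)[0]
    return label, count / len(others)


def select_pseudo_labels(all_votes, exclude_index, threshold=0.7):
    """Return {sample_id: label} for unlabeled samples on which the other
    classifiers agree with confidence above the (assumed) threshold."""
    selected = {}
    for sample_id, votes in all_votes.items():
        label, confidence = ensemble_vote(votes, exclude_index)
        if confidence > threshold:
            selected[sample_id] = label
    return selected


# Made-up votes from five classifiers on two unlabeled parcels.
all_votes = {
    "u1": ["R", "R", "R", "C", "R"],  # others agree on R with 3/4 = 0.75
    "u2": ["R", "C", "I", "C", "R"],  # others agree on C with 2/4 = 0.50
}
new_labels_for_classifier_0 = select_pseudo_labels(all_votes, exclude_index=0)
```

In a complete loop, each classifier would then be re-trained on the original labeled set plus its selected pseudo-labeled samples, and the process would repeat until no classifier changes.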

Model Adjustment and Improvement
Urban landuse classification is a typical multi-class classification task in which the training set is commonly imbalanced. For instance, the various urban landuse types are not evenly distributed in a city, which can easily cause a sampling bias. Besides, unlike traditional target recognition and extraction tasks, in which the samples have a clear-cut definition or attribute, urban land parcels may have a mixed landuse type (e.g., commercial-residential mixed), especially in many well-developed megacities. This can bias the training results of a semi-supervised model. To minimize these problems, three improvement schemes of the co-forest algorithm were applied as follows.
Improvement scheme 1: Add sample weights in the process of initializing and constructing the classifiers to deal with the sample imbalance issue; set a rule in the model that unlabeled samples are not added to the labeled sample set unless a certain confidence level is reached (to exclude mixed landuse samples).
Improvement scheme 2: On the basis of improvement scheme 1, add a restriction on the error rate in the iteration process (i.e., the error is limited to below 0.2 as the condition for ending the iteration, to avoid overfitting).
Improvement scheme 3: On the basis of improvement scheme 2, execute a noise cutting step after adding the unlabeled samples to the labeled sample set.
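The text does not specify how the sample weights in improvement scheme 1 are computed. As one possible illustration, the sketch below uses inverse-frequency ("balanced") class weights, which is a common heuristic and an assumption here rather than the authors' formula. The label counts are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_total / (n_classes * n_class).
    Assumption: this heuristic stands in for the unspecified weighting."""
    counts = Counter(labels)
    n_total = len(labels)
    n_classes = len(counts)
    return {cls: n_total / (n_classes * n) for cls, n in counts.items()}


# Made-up imbalanced label set: residential dominates, industrial is rare.
labels = ["R"] * 6 + ["C"] * 3 + ["I"] * 1
weights = balanced_class_weights(labels)

# Each sample inherits the weight of its class.
sample_weights = [weights[label] for label in labels]
```

With these weights, every class contributes the same total weight to training, so rare landuse types are not overwhelmed by frequent ones.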

Model Evaluation and Accuracy Assessment
In this study, k-fold cross-validation (k = 5) was adopted to evaluate model performance reliably. To quantify the classification accuracy, a confusion matrix was used, and the assessment metrics included overall accuracy (OA) and the Kappa coefficient [41].
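For reference, both metrics can be derived directly from a confusion matrix. The sketch below uses the standard definitions (OA as the share of correctly classified samples, and Cohen's Kappa as the agreement beyond chance); the two-class matrix is a made-up example, not a result from the study.

```python
def overall_accuracy(matrix):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa_coefficient(matrix):
    """Cohen's Kappa: (p_o - p_e) / (1 - p_e)."""
    total = sum(sum(row) for row in matrix)
    p_o = overall_accuracy(matrix)
    # Expected chance agreement from row (reference) and column (predicted) totals.
    p_e = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(len(matrix))
    ) / (total ** 2)
    return (p_o - p_e) / (1 - p_e)


# Made-up confusion matrix: rows are reference classes, columns are predictions.
confusion = [
    [40, 10],
    [5, 45],
]
oa = overall_accuracy(confusion)
kappa = kappa_coefficient(confusion)
```

Here the observed agreement is 0.85 and the chance agreement is 0.5, so Kappa is lower than OA, which is the usual pattern when class totals are balanced.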

Impact Analysis of Small Sample Size
To investigate the influence of small sample size on the classification result, we tested the stability of the classification using different proportions of the training samples. Starting from all training samples, the classification models were tested as the number of training samples decreased by 1% each time. A stratified random sampling method was adopted for each sampling process to keep the proportion of sample distribution consistent. The optimal cost-performance for the number of training samples was determined by the change rate of the accuracy. The change rate (m) is calculated with the following formula: m = (A − a_k)/A, where A represents the best accuracy obtained using all training samples and a_k represents the accuracy obtained using k% training samples.
To make the model learning more reliable, training samples were randomly selected from the training set for model training five times at every training set ratio. The average of five-time accuracy scores was adopted as the classification accuracy under that training set ratio.
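The change rate and the five-run averaging can be combined in a short sketch. The accuracy values below are invented purely to show the arithmetic and are not results from the study; the dictionary layout and function name are likewise hypothetical.

```python
from statistics import mean


def change_rate(best_accuracy, accuracy_k):
    """m = (A - a_k) / A, following the definition in the text."""
    return (best_accuracy - accuracy_k) / best_accuracy


# Invented accuracies from five random draws at each training set ratio (%).
runs_by_ratio = {
    12: [0.80, 0.82, 0.81, 0.79, 0.83],
    5: [0.76, 0.78, 0.77, 0.75, 0.79],
}

# a_k is the mean of the five runs at ratio k; A is the accuracy at the full ratio.
mean_accuracy = {ratio: mean(runs) for ratio, runs in runs_by_ratio.items()}
best = mean_accuracy[max(mean_accuracy)]
rates = {ratio: change_rate(best, acc) for ratio, acc in mean_accuracy.items()}
```

A small m means that reducing the training set to k% costs little accuracy relative to using all training samples.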

Subset of Features
According to the researchers' prior knowledge, 59 features from multi-source geospatial data were initially selected as the main characteristics of urban landuse types. The correlation coefficient between any two of those features was computed to eliminate redundant features. Finally, 34 individual features were adopted for model training. Table 2 lists the selected features by data type.

Urban Landuse Classification System
Referring to previous studies [34,35] and the national standard of landuse classification (GB/T 21010-2017), we adopted an urban landuse classification system consisting of five level-1 urban landuse categories, namely residential (R), commercial (C), industrial (I), transportation (T), and public management and service (P). Table 3 lists the detailed landuse categories with descriptions or examples.
Table 3 (partial): Governmental office zone, medical and health services, sports and cultural facilities. * Because land parcels are derived from road network segmentation, the classification of roads was excluded in our study.
Figure 4 illustrates the distribution of labeled samples, which come from two sources: field survey and manual interpretation of very high-resolution (VHR) imagery. The field survey data contain 162 labeled land parcels, sourced from the Urban Planning and Land Resource Research Center, Planning and Nature Resource Bureau of Shenzhen Municipality. The other labeled samples were derived from visual interpretation of VHR images from Google Earth, assisted by map apps including Gaode Maps and Microsoft Bing Maps. The field survey data were also used to assist the manual image interpretation. After sample quality control, a total of 1021 labeled samples, accounting for around 15% of all land parcels, were collected for model training and testing.

Labeling and Train/Test Split
All labeled samples were divided into two groups: a training set and a test set. Stratified sampling was adopted to ensure the same sample distribution across landuse categories. To make the model training results comparable, a fixed number of labeled samples (204), accounting for 3% of all land parcels, was selected as the test set. The remaining labeled samples (817), accounting for 12% of all land parcels, were employed as the training set. To find an optimal training set ratio, the proportion of labeled samples used for model training was decreased by 1% each time. For each proportion, training samples were randomly selected five times, and the resulting models were evaluated on the fixed test set. The average of the five results at each training set ratio was taken as the accuracy of the model at that ratio.
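A stratified split of this kind can be sketched as follows. The test fraction of 0.2 approximates the 204 of 1021 labeled samples mentioned above; the per-class rounding rule, the seed, the function name, and the label counts are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Split sample ids into train/test so that each class contributes
    roughly the same fraction to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample_id, label in labels.items():
        by_class[label].append(sample_id)

    train, test = [], []
    for label in sorted(by_class):
        ids = by_class[label][:]
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)  # assumed rounding rule
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return train, test


# Made-up labeled parcels: 50 residential, 30 commercial, 20 industrial.
labels = {}
for cls, count in (("R", 50), ("C", 30), ("I", 20)):
    for i in range(count):
        labels[f"{cls}{i}"] = cls

train_ids, test_ids = stratified_split(labels, test_fraction=0.2)
```

The same function can be reused to draw the smaller training subsets: calling it repeatedly with different seeds on the training ids gives the five random selections per training set ratio.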

Experimental Environment and Parameters
The experiment was carried out on a macOS platform (4-core CPU, 16 GB of memory). The implementation of the co-forest algorithm and its modified versions was based on the Java language, JDK version 8.1, and the Waikato Environment for Knowledge Analysis (Weka) framework, version 3.8.4. The number of co-training classifiers for the co-forest algorithm was set from 3 to 20.
Figure 5 shows the classification accuracies of the original co-forest algorithm and the improved versions in terms of overall accuracy and Kappa coefficient, respectively. The results showed that all three improvement schemes performed better than the original version of the co-forest algorithm. Among them, improvement scheme 2 performed best, since it ranked first more often than the other schemes. Therefore, improvement scheme 2 was employed as the preferred method in the further comparison and analysis.

Comparison with Traditional Supervised Algorithms
Based on the model improvement, the classification accuracy of the semi-supervised co-forest algorithm was compared with that of traditional supervised algorithms, including RF and XGBoost, at different training sample sizes. As shown in Figure 6, the semi-supervised algorithm performed better than the two supervised algorithms at training set ratios of 7% or above. In other words, the proposed co-forest classification method can achieve a comparable, or even better, classification result with fewer labeled training samples. Table 4 lists the minimum sample size required by the three algorithms at different accuracy levels. Compared with similar tree-based classifiers, the semi-supervised learning framework (co-forest) reduced the number of labeled samples needed to reach the same accuracy level by 17% to 20%.
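The comparison summarized in Table 4 rests on two simple calculations: finding the smallest training set ratio at which a classifier reaches a target accuracy, and expressing the saving in labeled samples relative to a baseline classifier. A minimal sketch is given below. The function names and the dictionary layout (training set ratio mapped to accuracy) are hypothetical and chosen only for illustration.

```python
def minimum_ratio_for_accuracy(accuracy_by_ratio, target):
    """Smallest training set ratio whose accuracy reaches `target`, else None."""
    reaching = [ratio for ratio, acc in accuracy_by_ratio.items() if acc >= target]
    return min(reaching) if reaching else None


def relative_reduction(baseline_samples, reduced_samples):
    """Fraction of labeled samples saved relative to the baseline classifier."""
    return (baseline_samples - reduced_samples) / baseline_samples
```

For instance, `relative_reduction(500, 400)` gives 0.2, i.e., a 20% saving. These numbers are purely illustrative and are not taken from Table 4.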

Impact of Training Sample Size
To analyze the influence of the training sample size on classification performance, Figure 7 shows the rate of change in classification accuracy when a small sample size is used. Taking the best accuracy at a 12% training set ratio as the reference, the model performance declined as the training sample size became smaller. The classification accuracy declined rapidly once the training sample size fell below 5%. Accordingly, a training sample size of no less than 5% offers a favorable cost-performance trade-off (i.e., the balance between the labor cost of sampling and the classification accuracy).
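One way to express the change rate discussed above is as the relative difference between the accuracy at each training set ratio and the accuracy at the reference ratio. The sketch below assumes this definition; the exact formula behind Figure 7 is not stated in the text, so both the function name and the formula should be read as assumptions.

```python
def accuracy_change_rates(accuracy_by_ratio, reference_ratio):
    """Relative accuracy change at each ratio versus the reference ratio."""
    reference = accuracy_by_ratio[reference_ratio]
    return {
        ratio: (accuracy - reference) / reference
        for ratio, accuracy in sorted(accuracy_by_ratio.items())
    }
```

Under this definition, a negative value means that accuracy at that ratio is lower than at the reference ratio, and the reference ratio itself always maps to zero.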

Figure 8 presents the influence of different combinations of multi-source data on classification accuracy. Using all of the multi-source features yielded the best classification accuracy. In general, accuracy improved as more features were added. When the sources were considered separately, map POI and high-resolution optical remote sensing (Sentinel-2) data showed better results than the other datasets.

Detailed Urban Landuse Mapping with Few Samples
Based on the modified co-forest algorithm and multi-source data, Figure 9 illustrates the detailed urban landuse classification result in Shenzhen with 5% training samples. The spatial distributions of the detailed urban landuse categories were consistent with the official urban planning scheme to some extent. Residential and commercial lands were mainly distributed in the downtown area, while industrial lands were mainly distributed in the suburbs.

To quantify the classification result, Figure 10 shows the confusion matrix and accuracy assessment. The overall accuracy of the classification was 0.79. For individual urban landuse categories, the producer's and user's accuracies measure the omission and commission errors, respectively. According to the results, the "residential", "industrial", and "public management and service" types achieved higher accuracies. The classification errors were mainly concentrated in the confusion between the "commercial" and "residential" types, and between the "public management and service" and "residential" types.

Figure 10. The confusion matrix and accuracy assessment. The landuse category codes 1 to 5 represent residential (1), commercial (2), industrial (3), transportation (4), and public management and service (5), respectively.

6. Discussion

Small Sample Learning in Urban Landuse Classification
The selection of the training sample size is always an empirical process. Although the usual guidance is "use as much as possible", there is a tradeoff when considering the cost and time of collecting the labeled samples for model training and testing [42]. In the case of urban landuse classification with limited labeled samples, ideas for making better use of the labeled samples come mainly from the data level or the model level. The former includes the use of high-dimensional data, and the latter includes automatically exploiting unlabeled samples through active learning or semi-supervised learning methods. In this study, both ideas are considered.
The physical properties of landuse, such as "buildings" and "built-up areas", can be obtained directly from remote sensing imagery. However, remote sensing data alone are not enough to obtain the high-level semantic information of detailed urban landuse. By providing high-dimensional landuse features, socio-economic big data offer a better approach to the classification of detailed urban landuse. For example, map POI data are usually considered to be very closely related to the identification of urban landuse categories [43]. It should be pointed out that a single data source is still insufficient. According to our experiment, it is hard to obtain a satisfactory classification result using only map POI data or the combination of POI and optical remote sensing data. This confirms the importance of combining multiple features, where a better classification accuracy can be achieved with all multi-source and multi-modal data.
Deep neural networks may have many more parameters than there are training samples, which is generally considered a reason for their failure when the training sample size is small. Tree-based classifiers such as RF and XGBoost have been shown to classify better under such conditions and have been widely used in urban landuse classification applications [44]. Therefore, in this study, we adopted a semi-supervised tree-based classification framework for the comparison. The results show that the semi-supervised method performed better than the supervised classifiers, and that it effectively reduced the demand for labeled samples for model training without reducing the classification accuracy.

Classification Stability under Small Size of Training Samples
Regarding classification stability with a small sample size, a previous study pointed out that the number of training samples should not be less than 7% of all land parcels for the RF algorithm based on multi-source geospatial data [35]. Our results agree with this conclusion. For the semi-supervised co-forest algorithm, even a 5% sample size keeps the variation in classification accuracy within an acceptable range. Our study also showed that the semi-supervised algorithm outperformed the supervised algorithms when the training set ratio was 7% or above.

Limitations and Uncertainties
Most previous studies have focused on the classification of broad landuse/land cover categories rather than detailed urban landuse categories. Due to the difficulty of semantic segmentation, it is hard to obtain a highly accurate result for detailed urban landuse classification. Some scholars have reported that the overall accuracy of detailed urban landuse classification in megacities such as Shenzhen is lower than 0.76 [35,36]. Although our results reached or even exceeded that accuracy level, this study focused on the effectiveness of applying multi-source geospatial data and a semi-supervised classifier to improve landuse classification with a small sample size, rather than on the absolute accuracy of the classification task.
In this study, training samples were mainly high-purity landuse samples (e.g., the dominant type occupies more than 90% of the area of a land parcel). However, as a fast-growing and well-developed city, Shenzhen has various forms of mixed landuse, such as "commercial-residential mixed" land. Mixed use can occur in both horizontal and vertical space, which may introduce a certain degree of uncertainty into the classification. When considering generalization to other cities (e.g., applying the original labels from one city to another), the major challenge comes from differences in urban landuse structure. The same urban landuse type may have distinct physical characteristics in different cities. Therefore, more city cases and landuse labels are needed to verify the generalization performance of the model.
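The high-purity criterion mentioned above can be made concrete with a small check on the area shares within a parcel. The sketch below assumes that each parcel is described by a hypothetical mapping from landuse type to occupied area; the function names and this data layout are illustrative only and do not reflect how the study's samples were actually screened.

```python
def dominant_share(area_by_type):
    """Return the dominant landuse type and its share of the parcel area."""
    total = sum(area_by_type.values())
    dominant = max(area_by_type, key=area_by_type.get)
    return dominant, area_by_type[dominant] / total


def is_high_purity(area_by_type, threshold=0.9):
    """True if the dominant type occupies more than `threshold` of the area."""
    _, share = dominant_share(area_by_type)
    return share > threshold
```

A parcel split 60/40 between residential and commercial use would fail this check, which is the kind of mixed-use parcel that the current training samples largely exclude.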

Conclusions
In this study, we explored the effectiveness of the semi-supervised co-forest algorithm and multi-source geospatial data in detailed urban landuse classification with a small sample size. Given that collecting a large number of labeled samples for urban landuse classification is difficult and costly in practice, we also searched for a training set ratio that maintains a stable classification result. Taking Shenzhen City as a case, the semi-supervised co-forest method achieved results comparable with those of traditional supervised classifiers such as RF and XGBoost at a lower training set ratio (reduced by 17% to 20%). The model performance declined rapidly once the training sample size was less than 5% in total. Therefore, a training sample size of 5% or above is necessary to keep the loss of classification accuracy within an acceptable range. This study also confirms the importance of multi-source and multi-modal data, which significantly improved the classification accuracy. Among them, POI data and high-resolution remote sensing data made the larger contributions.
In the future, we will extend the proposed method to other rapidly changing cities to evaluate its generalization performance. To use labeled samples more efficiently, we will introduce data enhancement methods, such as unsupervised enhancement algorithms, to generate new labeled samples based on the existing labeled and unlabeled samples. In addition, we will analyze mixed landuse by creating more labels to mine the features of the mixed landuse category.

Data Availability Statement: Publicly accessible datasets presented in this study are available. The data can be found via the links in the text.