Remote Sensing Mapping of Build-Up Land with Noisy Label via Fault-Tolerant Learning
Remote Sens. 2022, 14, 2263

Abstract: China's urbanization has accelerated dramatically in recent decades. Land for urban build-up has changed not only in large cities but also in small counties. Land cover mapping is one of the fundamental tasks in remote sensing and has received considerable attention. However, most current mapping requires significant manual effort for labeling or classification. It is therefore of great practical value to use existing low-resolution label data to classify higher-resolution images. To this end, this work proposes a method based on noisy-label learning for fine-grained mapping of urban build-up land in a county in central China. Specifically, this work produces a build-up land map with a resolution of 10 m from a land cover map with a resolution of 30 m. Experimental results show that the overall accuracy is improved by 5.5% compared with that of the baseline method. This indicates that the time required to produce a fine land cover map can be significantly reduced by using existing coarse-grained data.


Introduction
Land cover data are crucial for ecological environmental protection [1][2][3][4], natural resource management [5][6][7][8], urban planning [9][10][11][12], and precision agriculture [13][14][15][16]. As a result of urbanization, rural populations continue to migrate to cities and towns to work, study, and live, prompting planning authorities to adjust the extent of new urban build-up land. Advances in remote sensing sensors and aerospace technology have made it easier to access an increasing number of remote sensing images at shorter revisit intervals. Remote sensing imagery has become an important data source for land cover and urban land use monitoring [17]. Since the reform and opening-up policy, urban land has expanded in both large cities and small counties in China [18,19]. Large cities can update their map data in a timely manner because their mapping is supported by established institutional systems and investment [20][21][22]. In contrast, small towns in China are numerous and widely dispersed with varied topography, which makes dynamic mapping of urban land difficult.
Many scholars have extensively studied land surface mapping based on remote sensing. These studies have mainly used machine learning for supervised or unsupervised classification of medium- and high-resolution remote sensing images [23][24][25]. The supervised approach is more widely used because the sample data enable the algorithm to differentiate features effectively. Early supervised classification methods include maximum likelihood, neural networks, and decision trees, whereas support vector machines (SVM) and random forests (RF) have been shown to outperform other traditional supervised classifiers [26]. Supervised methods require a training set with a sufficient number of correctly labeled samples for the model to learn the classification patterns of the samples.
Deep neural network models have achieved great success in remote sensing image analysis with the development of deep learning techniques and computer hardware. Existing deep learning approaches deal with label noise by cleaning up noisy labels or designing robust loss functions [62].
The noise contained in a dataset falls into two main categories. The first category is feature noise, defined as inaccuracies or errors in the attribute values of instances. Feature noise arises from spectral noise caused by poor acquisition conditions (e.g., cloudy days); geometric errors introduced by data preprocessing, such as orthorectification and geometric correction; alignment differences caused by digitization or outlining; and coding errors. The second category is class label noise (i.e., instance labels that differ from the ground truth labels); the corresponding instances are called corrupted or mislabeled instances. Label noise is considered more harmful and more difficult to handle than attribute noise and can significantly degrade classification performance [63]. Noisy label learning with shallow learning methods has been studied in the literature [61]; however, research in the context of deep learning is still scarce, although it has recently grown [64]. Among the approaches proposed to train deep neural networks robustly on datasets with noisy labels, some remove noisy labels and train deep neural networks on the estimated clean labels, while others smoothly reduce the effect of noisy labels by assigning smaller weights to noisily labeled samples. These approaches use directed graphical models [65], conditional random fields [66], knowledge graph distillation [60], meta-learning [67], or noise transition matrix estimation [64] to address the noisy labeling problem. However, these approaches require an additional small set of cleanly labeled data, or ground truth for pre-identified noisy labels, to model the noise in the dataset.
Few studies have focused on the adverse effects of label noise in remote sensing image analysis. Some studies have analyzed the effects of noisy labels on the classification performance of satellite image time series [68] and hyperspectral images [69]. Jian [70] and Damodaran [71] proposed loss functions for learning improved classification models that reduce the detrimental effect of noisy labels when classifying remote sensing images. Kaiser [72] used online OpenStreetMap data (i.e., outdated or unlabeled ground truth) to demonstrate the feasibility of producing classification maps. However, that study did not explicitly treat label noise as a specific problem. Other studies have dealt with label noise for shallow classifiers (RF and logistic regression) by selecting cleanly labeled instances through outlier detection [68] or by using existing noise-robust logistic regression methods [73]. However, these label noise mitigation methods are designed for specific models and therefore lack generality. Combining noisy label correction strategies with deep learning is a promising approach to land cover classification of remote sensing images under noisy labels.
This work develops a noisy label learning method that uses existing land cover products as label data to produce high-resolution build-up land maps. The method takes existing remote sensing images and noisy low-resolution land cover maps as label data and produces high-resolution build-up land maps through semi-supervised data filtering and a fault-tolerant learning loss function. Section 2 introduces the study area and the data sources. Section 3 explains the methodology, including the details of semi-supervised data filtering and the fault-tolerant learning loss function. Section 4 presents the results. A discussion is presented in Section 5, and conclusions are drawn in Section 6.

Study Area
The study area of this work is Taoyuan County in northwestern Hunan Province, China, a county in central China with typical, complex geomorphology. Taoyuan County lies in the central Hunan hills, in the transition zone from the western Hunan mountains to the lakeside plain of Dongting Lake; its terrain is steep in the west and east, high in the north and south, and low in the middle. Most of the county is covered by forest land, which accounts for about 70% of the area, while construction land accounts for 2%. The remaining area is covered by agricultural land and water bodies.

Remote Sensing Data
Sentinel-2 images with 10 m resolution were used as the image data for land cover classification. Sentinel-2 L2A data for September 2020 were downloaded from Google Earth Engine at a spatial resolution of 10 m. Sentinel-2 L2A data are a geometrically corrected, atmospherically corrected, and radiometrically calibrated product released by the European Space Agency (ESA). In this study, the 10 m RGB bands of the Sentinel-2 L2A image of Taoyuan County were used as the classification data, as shown in Figure 1. The 2020 GlobeLand30 product [55] for Taoyuan County was downloaded from the GlobeLand30 website [74] to build the training samples. It provides land cover labels for Taoyuan County at 30 m resolution, but its build-up land category is relatively coarse and deviates visibly from the actual land cover, as shown in Figure 2a. In addition, the 2020 10 m land cover data [54] of Taoyuan County were downloaded from the ESA website [75] and used to create a test set for validating the algorithm, as shown in Figure 2b. As Figure 2 shows, the distributions of the farmland and forest land categories differ between the land cover maps at the two resolutions, and the labels of these two categories are easily confused because they contain a large amount of noise. Meanwhile, compared with the 10 m land cover map, the 30 m land cover map has rough build-up land boundaries and larger build-up land patches, and other land cover categories exist within the 30 m build-up land patches. Therefore, labeling build-up land with the 30 m land cover map inevitably introduces category noise.

Sample Collection
To create the training set, the 30 m resolution land cover label map was resampled to a spatial resolution of 10 m using the ArcGIS resampling tool, so that the 30 m label map is spatially aligned with the 10 m image. Then, 4000 pixels of the build-up land category and 4000 pixels of the forest land, water body, farmland, and other categories were randomly selected across the whole image, using the 30 m land cover as category labels. Subsequently, to preserve the contextual information of the selected pixels, 4000 patches of 64 × 64 pixels were cropped with the selected pixels at their centers and added to the training set; the category of each patch is the category of its center pixel. Another advantage of this approach is that every pixel of the remote sensing image of Taoyuan County can serve as the center pixel of a mapping sample, so pixel-level mapping results can be obtained in the final mapping stage. Because the pixels are selected randomly within each category, patches whose center pixels lie close together partially overlap. To build the test set, 8000 pixels of the build-up land category and 8000 pixels of other categories were randomly selected across the whole image, using the 10 m land cover categories as labels, and patches of the same size were cropped. Considering that the 10 m land cover product contains less category noise, 4000 build-up land samples and 4000 samples of other land cover types were selected from these 8000 real samples as the test set. When the 30 m land cover product is used to label the 10 m image, its erroneous labels are transferred to the pixels of the 10 m image, producing category noise. In addition, the 30 m land cover product itself cannot be guaranteed to be accurate at the time of production, and its misclassifications introduce further category noise.
Consequently, the training set produced in this work contains category noise.
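The sample-collection steps above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than the authors' code: it assumes the 30 m and 10 m grids are already co-registered, so that nearest-neighbour resampling reduces to copying each 30 m label into a 3 × 3 block of 10 m pixels. The helper names (`upsample_labels`, `crop_patch`, `sample_centres`) are invented for this sketch, and excluding centre pixels whose patch would extend beyond the image is an assumption the text does not specify.

```python
import random
from typing import List, Tuple

Grid = List[List[int]]


def upsample_labels(labels_30m: Grid, factor: int = 3) -> Grid:
    """Nearest-neighbour upsampling: copy each 30 m label into a factor x factor block."""
    out: Grid = []
    for row in labels_30m:
        expanded = [value for value in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(expanded))  # fresh copy for every output row
    return out


def crop_patch(image: Grid, labels: Grid, r: int, c: int, size: int = 64) -> Tuple[Grid, int]:
    """Crop a size x size patch with pixel (r, c) at its centre; the patch takes that pixel's label."""
    half = size // 2
    top, left = r - half, c - half
    if top < 0 or left < 0 or top + size > len(image) or left + size > len(image[0]):
        raise ValueError("patch would extend beyond the image")
    patch = [row[left:left + size] for row in image[top:top + size]]
    return patch, labels[r][c]


def sample_centres(labels: Grid, target: int, n: int, size: int = 64, seed: int = 0) -> List[Tuple[int, int]]:
    """Randomly choose n pixels of class `target` whose centred patch fits inside the image."""
    half = size // 2
    h, w = len(labels), len(labels[0])
    candidates = [
        (r, c)
        for r in range(half, h - size + half + 1)
        for c in range(half, w - size + half + 1)
        if labels[r][c] == target
    ]
    return random.Random(seed).sample(candidates, n)
```

Because every 10 m pixel inside a 3 × 3 block receives the same coarse label, any block whose 10 m pixels belong to different classes immediately contributes noisy labels to the patches centred on it.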

Method
One pixel at 30 m resolution corresponds to 3 × 3 pixels at 10 m resolution. When 30 m build-up land labels are used to train a model for build-up land mapping at 10 m resolution, these nine pixels can only inherit the category label of the single corresponding 30 m pixel. However, at 10 m resolution the actual land cover categories of these nine pixels are not always the same. Consequently, the training set contains incorrect labels (i.e., noisy labels). The goal of the method in this work is for the classifier, trained on a set that contains noisy labels, to identify the incorrect labels and learn from the correctly labeled samples, so that it learns the features of the correct classes and correctly classifies the 10 m resolution images.
The flow chart of the method is shown in Figure 3 and includes three steps. First, the land cover map and remote sensing images are preprocessed. Second, the pre-trained baseline is used to calculate the classification confidence of the unlabeled data, and samples with high confidence for the building category are added to a new training set for initial noise filtering. Finally, the model is trained on the filtered samples using the fault-tolerant learning loss, and the build-up land mapping results are obtained after accuracy verification.
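The origin of the noise can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, an idealised 30 m map in which each 30 m pixel takes the majority label of its 3 × 3 block of 10 m reference pixels (a real 30 m product such as GlobeLand30 is not built this way); it then measures the fraction of 10 m pixels whose inherited label is wrong. The function names `block_majority` and `inherited_noise_rate` are hypothetical.

```python
from collections import Counter
from typing import List

Grid = List[List[int]]


def block_majority(labels_10m: Grid, factor: int = 3) -> Grid:
    """Idealised coarse map: each factor x factor block takes its most frequent 10 m label."""
    h, w = len(labels_10m), len(labels_10m[0])
    coarse: Grid = []
    for r0 in range(0, h - h % factor, factor):
        row = []
        for c0 in range(0, w - w % factor, factor):
            block = [labels_10m[r][c] for r in range(r0, r0 + factor) for c in range(c0, c0 + factor)]
            row.append(Counter(block).most_common(1)[0][0])
        coarse.append(row)
    return coarse


def inherited_noise_rate(labels_10m: Grid, factor: int = 3) -> float:
    """Fraction of 10 m pixels whose true label differs from the coarse label copied onto them."""
    coarse = block_majority(labels_10m, factor)
    wrong = total = 0
    for i, row in enumerate(coarse):
        for j, coarse_label in enumerate(row):
            for r in range(i * factor, (i + 1) * factor):
                for c in range(j * factor, (j + 1) * factor):
                    total += 1
                    wrong += labels_10m[r][c] != coarse_label
    return wrong / total
```

For a single block with seven build-up pixels and two other pixels, two of the nine inherited labels are wrong, a noise rate of 2/9 (about 22%), even though the coarse label itself is the "best" single label for that block.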

Problem Formulation of Fault-Tolerant Learning
We suppose that the training set contains N remote sensing image samples belonging to C land cover classes, and each land cover category can be represented by its features. Let $\mathcal{X}$ be the feature space of the land cover categories, and let $\mathcal{Y} = \{0, 1\}^C$ be the label space. We assume that all labels are one-hot vectors and use $e_c$ to denote the one-hot vector corresponding to class $c$. Let $A^r = \{(x_i, y_i^r), i = 1, 2, \cdots, N\}$ be $N$ independently and identically distributed samples drawn from a distribution $D$ over $\mathcal{X} \times \mathcal{Y}$. The task of land cover classification is to train a classifier that learns the pattern of distribution $D$ using $A^r$ as the training set. However, no such training set $A^r$ is available when 30 m resolution land cover labels are used to classify 10 m resolution remote sensing images. Instead, the training set $A = \{(x_i, y_i), i = 1, 2, \cdots, N\}$ is obtained, drawn from a distribution $D_\varepsilon$. Here, $y_i$ denotes the observed (possibly incorrect) label, and $y_i^r$ denotes the correct label. $y_i$ and $y_i^r$ are related as follows:
$$P\left(y_i = e_{c'} \mid y_i^r = e_c\right) = \varepsilon_{cc'}, \quad \forall c, c' \in \{1, \dots, C\}, \tag{1}$$
where $\varepsilon_{cc'}$ denotes the noise rate, i.e., the probability that label $c$ becomes $c'$. This general model is called class-conditional noise because the probability of a label error depends on the original label class. A special case of this model is symmetric noise, in which a class label is equally likely to be flipped to any other class label (i.e., $\varepsilon_{cc} = 1 - \varepsilon$ and $\varepsilon_{cc'} = \varepsilon/(C-1)$, $\forall c' \neq c$, where $\varepsilon$ denotes the probability of a class label error). If, under the condition $\varepsilon_{cc} > \varepsilon_{cc'}$, $\forall c' \neq c$, all samples labeled with a particular category are selected from the noisy training set, then the samples that truly belong to that category are still the majority in the selected set.
The fault-tolerant learning problem of build-up land classification under noisy labels can now be formulated as follows. The build-up land classifier needs to learn the pattern of distribution $D$, but the training set can only be drawn from the distribution $D_\varepsilon$, which contains erroneous labels. In the build-up land classification task, let the classifier function be $f(\cdot; \theta)$, where $\theta$ denotes its parameters. Assuming that a softmax output layer is the last layer of the neural network classifier, the labels $y_i$ of the training samples are all one-hot vectors, and $f(x_i; \theta)$ is a probability vector of the same length as $y_i$. Thus, the loss function of the classifier in the training phase can be defined as $L\left(f(x_i; \theta), y_i\right)$.
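To make the class-conditional noise model concrete, the following sketch builds the symmetric noise transition matrix, with probability $1 - \varepsilon$ of keeping the true label and $\varepsilon/(C-1)$ of flipping to each other class, and uses it to corrupt a list of true labels. It is an illustrative simulation only: the paper does not inject synthetic noise, and `symmetric_transition` and `corrupt` are hypothetical names.

```python
import random
from typing import List


def symmetric_transition(num_classes: int, eps: float) -> List[List[float]]:
    """T[c][k] = P(noisy label k | true label c) under symmetric noise."""
    off_diagonal = eps / (num_classes - 1)
    return [
        [1.0 - eps if c == k else off_diagonal for k in range(num_classes)]
        for c in range(num_classes)
    ]


def corrupt(labels: List[int], T: List[List[float]], seed: int = 0) -> List[int]:
    """Draw one noisy label per true label from the corresponding row of T."""
    rng = random.Random(seed)
    classes = list(range(len(T)))
    return [rng.choices(classes, weights=T[y])[0] for y in labels]
```

With $C = 5$ and $\varepsilon = 0.4$, each label stays correct with probability 0.6 and moves to each wrong class with probability 0.1, so the diagonal dominates every row; with balanced classes, most samples carrying a given noisy label still truly belong to that class, which is what the sample selection below relies on.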

Training Set Sample Filtering and Pseudo-Label Assignment Method
The presence of category noise in the training set degrades classification performance, so direct training on a training set containing a large amount of noise does not achieve satisfactory accuracy. In this work, we propose a semi-supervised scheme that filters unlabeled remote sensing images by confidence to produce a low-noise dataset. The scheme is inspired by the pseudo-label assignment strategy [76], which obtains valuable samples, and the joint fine-tuning strategy [77], which lets high-confidence samples participate in optimization. Pseudo-label assignment selects valuable samples based on the predicted classification confidence [76]; however, these pseudo-labels may not be reliable. Joint fine-tuning optimizes the classification model by adding high-confidence samples to the training set [77], but it requires a small number of labeled samples. Our solution combines the advantages of these two approaches to obtain reliable training information from unlabeled data and construct a low-noise dataset for model optimization.
Given a training set $A = \{(x_i, y_i), i = 1, 2, \cdots, N\}$, we input each patch $x_i$ into the baseline for pre-training. The output vector of the softmax layer gives the confidence of the classifier $f(x_i; \theta)$ for each class to which $x_i$ may belong. This vector is denoted $P_i = \{P_{i1}, P_{i2}, P_{i3}, \ldots, P_{iC}\}$, $P_i \in \mathbb{R}^C$, where $P_{ic}$ denotes the confidence that patch $x_i$ belongs to category $c$, and $C$ is the total number of categories. Because the pre-trained baseline has strong discrimination ability, this confidence can be used to decide which label should be associated with an unlabeled sample.
After the unlabeled samples $B = \{(x_i, y_i), i = 1, 2, \cdots, N\}$ are input into the pre-trained model, the first step is to rank the unlabeled patches $x_i$ by their maximum category confidence, $\max_c P_{ic}$. Given a filtering threshold $\sigma \in [0, 1]$, the $N \times \sigma$ samples with the highest confidence are added to the new training set, and each selected sample is assigned the category with the maximum value in $P_i$ as its label. The filtering threshold can be set according to the test precision of the pre-trained model, which also reflects the quality of the category labels in the samples. If the precision is low, indicating that the samples contain more noise, $\sigma$ should be reduced to keep only samples with higher confidence; if the precision is high, indicating less noise, $\sigma$ can be increased appropriately. However, $\sigma$ should not be set too large, because a large $\sigma$ brings in many noisy samples and weakens the effect of sample filtering.
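The confidence-based filtering step can be sketched as follows, assuming the pre-trained baseline's softmax outputs are already available as lists of probabilities. This is a hypothetical illustration, not the authors' implementation: samples are ranked globally by their maximum confidence, the top $\lfloor N \times \sigma \rfloor$ are kept, and each is pseudo-labeled with its arg-max class. Ranking within each category instead would be an equally plausible reading of the text; `softmax` and `filter_by_confidence` are names chosen for this sketch.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax; its output plays the role of the confidence vector P_i."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_by_confidence(probs: List[List[float]], sigma: float) -> List[Tuple[int, int, float]]:
    """Keep the floor(N * sigma) most confident samples, pseudo-labelled with their arg-max class.

    Returns (sample index, pseudo-label, confidence) tuples, most confident first.
    """
    if not 0.0 <= sigma <= 1.0:
        raise ValueError("sigma must lie in [0, 1]")
    scored = []
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=p.__getitem__)  # arg-max class
        scored.append((i, label, p[label]))
    scored.sort(key=lambda t: t[2], reverse=True)
    return scored[: int(len(probs) * sigma)]
```

Lowering `sigma` keeps fewer but more confident samples, which matches the guidance above for a low pre-training precision; raising it admits more samples at the cost of more label noise.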

Adaptive Fault-Tolerant Curriculum Learning Based on Batch Statistics
Curriculum learning can be considered as the minimization of a weighted loss [78]:
$$\min_{\theta,\, w \in [0,1]^b} E(\theta, w) = \sum_{i=1}^{b} w_i\, L\left(f(x_i; \theta), y_i\right) + G(w), \tag{2}$$
where $G(w)$ represents the curriculum. Here, $b$ is the minibatch size, because the optimizer is generally SGD, which processes one minibatch at a time. A simple choice for the curriculum is $G(w) = -\lambda \|w\|_1$, $\lambda > 0$. Substituting this choice into Equation (2), taking $l_i = L\left(f(x_i; \theta), y_i\right)$, and letting $\lambda$ depend on the category label, Equation (2) becomes:
$$\min_{\theta,\, w \in [0,1]^b} E(\theta, w) = \sum_{j=1}^{C} \sum_{i:\, y_i = e_j} \left(w_i\, l_i - \lambda_j\, w_i\right), \tag{3}$$
where $\lambda_j = \lambda(e_j)$. For any fixed $\theta$, the optimal $w_i$ for a sample with $y_i = e_j$ is given by: $w_i = 1$ when $l_i < \lambda_j$, and $w_i = 0$ when $l_i > \lambda_j$. Moreover, this relationship holds even when $\lambda_j$ is a function of $\theta$ or of all samples $x_i$ in the dataset with $y_i = e_j$. A truly dynamic, adaptive curriculum is then obtained by letting $\lambda_j$ depend on all $x_i$ in the minibatch and on the current value of $\theta$.
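The closed-form weight rule for the category-dependent objective follows because, for fixed $\theta$, the objective is linear in each $w_i$ with coefficient $l_i - \lambda_j$: a negative coefficient is minimised by $w_i = 1$ and a positive one by $w_i = 0$. The sketch below (hypothetical helper names; ties $l_i = \lambda_j$ are resolved to 0, which is one valid optimum) checks this rule against a brute-force search over the vertices of $[0,1]^b$, which suffices because a linear objective over a box attains its minimum at a vertex.

```python
import itertools
from typing import Dict, List, Sequence


def optimal_weights(losses: List[float], labels: List[int], lam: Dict[int, float]) -> List[int]:
    """Closed-form minimiser for fixed theta: w_i = 1 iff l_i < lambda_{y_i}."""
    return [1 if l < lam[y] else 0 for l, y in zip(losses, labels)]


def objective(w: Sequence[int], losses: List[float], labels: List[int], lam: Dict[int, float]) -> float:
    """Category-dependent curriculum objective: sum_i (w_i * l_i - lambda_{y_i} * w_i)."""
    return sum(wi * l - lam[y] * wi for wi, l, y in zip(w, losses, labels))


def brute_force(losses: List[float], labels: List[int], lam: Dict[int, float]) -> Sequence[int]:
    """Search all vertices of [0, 1]^b; a linear objective over a box is minimised at a vertex."""
    return min(itertools.product([0, 1], repeat=len(losses)),
               key=lambda w: objective(w, losses, labels, lam))
```

For example, with losses 0.2 and 1.5 for class 0 ($\lambda_0 = 1.0$) and 0.7 and 0.1 for class 1 ($\lambda_1 = 0.5$), only the first and last samples are kept.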
To choose the threshold $\lambda_j$, consider the samples $i$ with $y_i = e_j$: ideally, $w_i = 1$ (i.e., $l_i < \lambda_j$) should hold exactly for those whose labels are correct, so that only they are used to update $\theta$ in the minibatch. Given ample empirical evidence that samples with correct category labels are easier to learn (and therefore have lower loss) than noisy ones, a quantile of the loss values in the minibatch, or a similar statistic, is a good choice for $\lambda_j$.
Because we use the cross-entropy loss, $l_i = -\ln f_j(x_i; \theta)$, where $f_j(x_i; \theta)$ is the posterior probability that $x_i$ belongs to class $j$ under the current $\theta$ (the network has a softmax output layer). Since the loss decreases monotonically as this posterior probability increases, the criterion $l_i < \lambda_j$ is equivalent to requiring the assigned posterior probability to exceed a threshold ($f_j(x_i; \theta) > e^{-\lambda_j}$). In this work, the threshold is set to the mean posterior probability of the samples of each category in a minibatch, because the mean represents the confidence level of most samples in that category.
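The resulting adaptive fault-tolerant curriculum learning method based on batch statistics selects samples as follows:
$$w_i = \begin{cases} 1, & f_{y_i}(x_i; \theta) \geq \mu_{y_i}, \\ 0, & \text{otherwise}, \end{cases} \qquad \mu_{y_i} = \frac{1}{|S_{y_i}|} \sum_{s \in S_{y_i}} f_{y_i}(x_s; \theta), \tag{4}$$
where $\mu_{y_i}$ is the sample mean of the posterior probabilities of the minibatch samples with category label $y_i$, $S_{y_i} = \{q \in [b] : y_q = y_i\}$ is the set of indices of all samples of category $y_i$ in the minibatch, and $|S_{y_i}|$ is their number.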
Because the neural network is trained with minibatches, the algorithm consists of three parts:

1. Calculation of the sample selection threshold $\lambda_{y_i}$ for a given minibatch;
2. Sample selection based on the threshold and Equation (4);
3. Network parameter update using the selected samples.
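The three steps above can be sketched for a single minibatch in plain Python. The sketch assumes the softmax posteriors of the minibatch and the (noisy) integer labels are given; in a real PyTorch training loop, the same mask would be applied to the per-sample cross-entropy before back-propagation. The function names and the small tolerance `tol`, which only guards against floating-point round-off when a posterior equals its class mean, are assumptions of this sketch.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def select_by_batch_mean(
    probs: List[List[float]], labels: List[int], tol: float = 1e-12
) -> Tuple[List[int], Dict[int, float]]:
    """Steps 1 and 2: per-class threshold mu_j (mean posterior of the class-j samples in the batch),
    then w_i = 1 if f_{y_i}(x_i) >= mu_{y_i}, as in Equation (4), else 0."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for p, y in zip(probs, labels):
        sums[y] += p[y]
        counts[y] += 1
    mu = {y: sums[y] / counts[y] for y in sums}
    # tol only absorbs floating-point round-off in the mean
    weights = [1 if p[y] >= mu[y] - tol else 0 for p, y in zip(probs, labels)]
    return weights, mu


def selected_cross_entropy(probs: List[List[float]], labels: List[int], weights: List[int]) -> float:
    """Input to step 3: mean cross-entropy -ln f_{y_i}(x_i) over the selected samples only."""
    chosen = [-math.log(p[y]) for p, y, w in zip(probs, labels, weights) if w]
    return sum(chosen) / len(chosen)
```

For instance, if three samples labeled class 0 have class-0 posteriors 0.9, 0.7, and 0.2, the class mean is 0.6, so the third sample (a likely mislabeled one) is left out of the parameter update. Because at least one sample of each class present always reaches its class mean, the selected set is never empty for a non-empty batch.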

Experimental Settings
Because buildings vary in size and scale, this study uses a multi-scale classification model for remote sensing image feature extraction and classification. The model consists of four convolutional modules and four fully connected layers. Max pooling is added to the second and fourth convolutional modules, ReLU activation is applied in the fully connected layers, and dropout is used to prevent overfitting. The model takes 64 × 64, 32 × 32, and 16 × 16 inputs derived from the same patch for multi-scale feature extraction; the specific structure of the CNN is shown in Table 1. The multi-scale classification model trained with cross-entropy loss serves as the baseline. The experiments use the PyTorch deep learning framework and are run on a computer with an NVIDIA RTX 3090 GPU, an Intel i9-10900KF CPU, and 32 GB of RAM. The number of training epochs is set to 1000, the batch size to 64, and the initial learning rate to $10^{-3}$. When the loss stops decreasing, the learning rate is divided by 10. Pre-training is carried out on the training set before filtering, and the unlabeled samples are filtered with the hyperparameter $\sigma$ set according to the pre-training test accuracy. The model is then trained on the confidence-filtered 10 m image samples, used to map the whole of Taoyuan County, and tested against the ground-truth data.
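The multi-scale input can be sketched as three nested crops of one 64 × 64 patch. The text does not state how the 32 × 32 and 16 × 16 inputs are derived; this sketch assumes centred crops around the same centre pixel (downsampling the full patch would be an alternative), and `centre_crop` and `multi_scale_inputs` are hypothetical names. A single-band list-of-lists patch stands in for the RGB tensor.

```python
from typing import List, Sequence

Grid = List[List[float]]


def centre_crop(patch: Grid, size: int) -> Grid:
    """Return the central size x size window of a square patch."""
    n = len(patch)
    if size > n or (n - size) % 2:
        raise ValueError("crop must fit inside the patch and stay centred")
    off = (n - size) // 2
    return [row[off:off + size] for row in patch[off:off + size]]


def multi_scale_inputs(patch: Grid, sizes: Sequence[int] = (64, 32, 16)) -> List[Grid]:
    """Hypothetical helper: the three views of one patch, one per CNN input scale."""
    return [centre_crop(patch, s) for s in sizes]
```

With this choice, the labeled centre pixel sits at position (size // 2, size // 2) in every view, so all three branches look at the same location at different spatial extents. Separately, the learning-rate rule described above (divide by 10 when the loss stops decreasing) corresponds to PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau` with `factor=0.1`; the patience the authors used is not stated.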

Evaluating Indexes
The widely used overall accuracy (OA), confusion matrix (CM), and Cohen's kappa coefficient are used for evaluation. In addition, the producer's accuracy and user's accuracy are calculated.
The CM is an n × n matrix that summarizes classification performance: each row represents an actual category and each column a predicted category. It shows which categories are prone to confusion and thus represents the performance of the algorithm more intuitively.
The formulas for OA and the kappa coefficient are as follows:
$$\mathrm{OA} = \frac{\text{sum of the diagonal values of the confusion matrix}}{\text{total number of samples in the confusion matrix}} = \frac{\sum_{i=1}^{n} X_{ii}}{N}, \tag{5}$$
$$\kappa = \frac{N \sum_{i=1}^{n} X_{ii} - \sum_{i=1}^{n} X_{i+} X_{+i}}{N^2 - \sum_{i=1}^{n} X_{i+} X_{+i}}, \tag{6}$$
where $n$ is the number of categories, $N$ is the total number of samples, $X_{ii}$ denotes the diagonal elements of the confusion matrix, and $X_{i+}$ and $X_{+i}$ denote the total of row $i$ and the total of column $i$, respectively.
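OA, Cohen's kappa, and the producer's and user's accuracies can all be computed from a confusion matrix as in the sketch below. Following the convention stated above, rows are actual categories and columns are predicted categories, so the producer's accuracy divides by row totals and the user's accuracy by column totals. The function name `accuracy_metrics` is hypothetical, and a class with no reference or predicted samples would cause a division by zero in this minimal version.

```python
from typing import List, Tuple


def accuracy_metrics(cm: List[List[int]]) -> Tuple[float, float, List[float], List[float]]:
    """Overall accuracy, Cohen's kappa, and per-class producer's and user's accuracy.

    cm[i][j] counts samples of actual class i predicted as class j (rows: actual, columns: predicted).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)                            # N
    diag = sum(cm[i][i] for i in range(n))                         # sum of X_ii
    row_tot = [sum(cm[i]) for i in range(n)]                       # X_{i+}
    col_tot = [sum(cm[r][i] for r in range(n)) for i in range(n)]  # X_{+i}
    oa = diag / total
    chance = sum(row_tot[i] * col_tot[i] for i in range(n))
    kappa = (total * diag - chance) / (total ** 2 - chance)
    producers = [cm[i][i] / row_tot[i] for i in range(n)]  # correct / reference total
    users = [cm[i][i] / col_tot[i] for i in range(n)]      # correct / predicted total
    return oa, kappa, producers, users
```

For the two-class matrix [[45, 5], [10, 40]], OA = 85/100 = 0.85, the chance term is 50 × 55 + 50 × 45 = 5000, and kappa = (100 × 85 − 5000)/(100² − 5000) = 0.7.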

Mapping Results
The 10 m resolution build-up land map of Taoyuan County in 2020 is shown in Figure 4d. Overall, build-up land in Taoyuan County in 2020 is extracted well, and the boundaries of build-up areas are more detailed than in the 30 m land cover map, as shown in Figure 4. For large areas of build-up land, the method refines the boundaries and distinguishes the non-build-up land inside them, as shown in Figure 4e-g. The method can also detect smaller settlements and even single-family houses, which is difficult to achieve with 30 m land cover maps (Figure 4a-c); this indicates that it can help to accurately track urban expansion and land use change. Another major advantage of the method is that, because the temporal coverage of the 10 m product is limited, the trained model can be applied to the latest image products to obtain up-to-date information on build-up land. Table 2 shows that, averaged over 10 replicate experiments, our method improves OA by 5.5%, Cohen's kappa coefficient by 0.11, producer's accuracy by 4.9%, and user's accuracy by 6% over the baseline. This result shows the effectiveness of our method in classifying build-up land in Taoyuan County. It also supports the premise that category noise is generated when the 30 m land cover map is used as the label for the 10 m image. The confusion matrices are shown in Figure 5.

Generalizability Assessment
Sample filtering in this method is based on the confidence of the unlabeled samples in the pseudo-label assignment step, so unlabeled samples from other regions can be added to the pool of samples to be filtered in order to improve the generalizability of the model. In this experiment, we keep the pre-training sample set unchanged and change only the pool of unlabeled samples to be filtered. Using cloud-free Sentinel-2 L2A images from September 2020 of Taojiang, Anhua, and Xinhua counties, which are geographically adjacent to Taoyuan County and have similar land cover types, we cropped a total of 24,000 unlabeled image patches of 64 × 64 pixels and added them to this pool. After sample screening and fault-tolerant training, we verified the generalization of the method on a test set for the built-up land category built from the 10 m land cover of the four counties.
The built-up land maps of Taojiang, Anhua, and Xinhua counties, obtained with the experimental method of Section 4.4, are shown in Figure 6. As Figure 7 shows, these maps also capture fine town boundaries, small settlements, and single buildings. Good results are obtained in different scenes, such as farmland, woodland, and urban areas, and the maps are considerably finer than the 30 m land cover.
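The confidence-based filtering described above can be illustrated with a simple rule: keep an unlabeled patch only if the pre-trained model's highest class probability reaches a threshold, and use the arg-max class as its pseudo-label. The sketch below is a generic illustration under that assumption; the function name and the 0.9 threshold are hypothetical, and the paper's actual filtering and label assignment rules may differ.

```python
import numpy as np

def filter_by_confidence(probs, threshold=0.9):
    """Select confident unlabeled samples and assign pseudo-labels.

    probs: (N, C) class probabilities predicted by a pre-trained model.
    threshold: hypothetical confidence cut-off.
    Returns the indices of kept samples and their pseudo-labels.
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)          # top-class probability per sample
    pseudo_labels = probs.argmax(axis=1)    # most likely class per sample
    keep = np.flatnonzero(confidence >= threshold)
    return keep, pseudo_labels[keep]

keep, labels = filter_by_confidence([[0.95, 0.05], [0.6, 0.4], [0.02, 0.98]])
print(keep, labels)
```

Samples added from new regions simply enlarge the `probs` array; only those passing the threshold enter the training set.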

As Table 3 shows, the method performs well on the data of all four counties. With the pre-training set unchanged and unlabeled samples from the remote sensing images of the four counties used for filtering and fault-tolerant training, the average OA reaches 88.2% and the average Cohen's Kappa reaches 0.588, indicating that the method maps built-up land more accurately and generalizes better. In contrast, the baseline, trained only on the Taoyuan County training set (which contains more category noise) without filtering or additional unlabeled samples, reaches an average OA of only 79.4% on the image data of the four counties, indicating insufficient generalization. Among the individual counties, our method obtains relatively lower accuracy in Anhua County than in the other two adjacent counties, possibly because built-up areas in Anhua County are fewer and smaller than in Taoyuan County, so its data differ to some degree from those of Taoyuan County. The confusion matrices in Figure 8 show that our method produces less confusion than the baseline.
In addition, to evaluate the mapping performance of the model in an area whose urbanization process differs from that of Taoyuan County, we selected cloud-free Sentinel-2 images of Changsha County from September 2020.
Changsha County belongs to Changsha City, the capital of Hunan Province, and is one of the top ten economic counties in China. Located close to Changsha City, it has a large population and developed industries. Its built-up area is large and widely distributed, the terrain is dominated by plains, and it slopes gradually from the north, east, and south toward the center and west. As before, we keep the pre-training sample set unchanged and change only the pool of unlabeled samples to be screened: we cropped a total of 8000 unlabeled 64 × 64 patches from the Sentinel-2 L2A images of Changsha County, added them to the pool, and then performed sample filtering and fault-tolerant training. The validation results are shown in Table 4, the mapping results in Figure 9, and the confusion matrices in Figure 10.
As shown in Table 4, in Changsha County, whose urbanization process differs from that of Taoyuan County, our method reaches an OA of 85.9%, 8.2% higher than the baseline, and a Cohen's Kappa of 0.718, 0.164 higher than the baseline. This indicates that the method generalizes well and achieves satisfactory results in areas with a different urbanization process. Figure 9c,e,g show the advantage of the model in extracting small settlements, and Figure 10 shows that our method outperforms the baseline. However, the OA of our method in Changsha County is 2.3% lower than the average OA for Taojiang, Xinhua, and Anhua counties, indicating that the urbanization process, the extent and size of building sites, and the distribution of land cover types may affect the results. As Figure 9b,f show, some building sites with white and blue roofs are not well extracted. A possible reason is a difference in the types of construction land: Changsha County is industrially developed, with many factories, whereas Taoyuan County is not.

Applications for Future Scenarios
Urbanization is progressing rapidly, and construction land maps need frequent updates to meet practical needs. One application scenario of our method is therefore to use a model trained on existing data to map built-up land in newer images and thereby update existing maps. We acquired cloud-free Sentinel-2 L2A images of Taoyuan County from September 2021 from GEE. After cropping 8000 patches of 64 × 64 pixels, we fed them into the sample filtering and pseudo-label assignment model pre-trained on the 2020 Taoyuan County data, which performed confidence-based sample filtering and label assignment on the 8000 patches from 2021 to build a training set for fault-tolerant training.
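Cropping the 8000 patches can be sketched as drawing 64 × 64 windows from the image array. The paper does not state how patch positions were chosen, so random placement, the function name, and the seed are assumptions made for illustration.

```python
import numpy as np

def random_patches(image, n_patches, size=64, seed=0):
    """Crop n_patches random size x size windows from an (H, W, C) image.

    Random placement is an assumption; a regular grid would work equally
    well. Requires H >= size and W >= size.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape[:2]
    # upper bound is exclusive, so the last valid top-left corner is h - size
    rows = rng.integers(0, h - size + 1, n_patches)
    cols = rng.integers(0, w - size + 1, n_patches)
    return np.stack([image[r:r + size, c:c + size] for r, c in zip(rows, cols)])

scene = np.zeros((1000, 1200, 3), dtype=np.float32)  # placeholder RGB scene
patches = random_patches(scene, 8000)
print(patches.shape)
```

Fixing the seed keeps the sampled patch set reproducible across the replicate experiments.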
For testing, land cover labels were unavailable because there is no 10 m land cover product for Taoyuan County in 2021. However, the two images were acquired only one year apart in the same season, and built-up land in Taoyuan County does not change substantially within one year, so the test set built from the 2020 data in Section 2.2.2 is still used. In addition, because training the baseline requires the land cover map as the label, the baseline was not evaluated in this section. The mapping results are shown in Figure 11, and the test results are shown in Table 5 and Figure 12.
As shown in Table 5, the OA for Taoyuan County reached 90.2% in 2021, only 0.5% lower than in 2020; Cohen's Kappa decreased by 0.01, producer's accuracy by 0.4%, and user's accuracy by 0.5%. Figure 11 shows that the model also detects newly added built-up land in Taoyuan County. The model can therefore be applied to newer data to rapidly update construction land maps.

Evaluation of the Effects of SF and AFCB
SF denotes the training set sample filtering and pseudo-label assignment method, and AFCB denotes the adaptive fault-tolerant curriculum learning method based on batch statistics. Four sets of experiments are conducted in this section:
1. Baseline: cross-entropy loss and the multiscale network.
2. Baseline with the training set sample filtering scheme (SF).
3. AFCB with the multiscale network.
4. The training set sample filtering scheme (SF) combined with AFCB.
The experimental results are shown in Table 6. Training sample filtering alone improves OA by about 3%, and AFCB alone improves OA by 2.5%, showing that both noisy-label learning strategies used in this study are effective. Using both methods together improves OA by 5.5% over the baseline, reaching 90.7%, which is sufficient to support the production of the built-up land map.
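To make the idea of selecting samples from batch statistics concrete, the sketch below keeps, within each mini-batch, the samples whose loss does not exceed the mean loss of their (possibly noisy) label class; high-loss samples are treated as likely mislabeled and excluded from the update. This is a generic small-loss selection rule written for illustration, not the authors' AFCB algorithm, whose adaptive and curriculum components are not reproduced here; the function name is hypothetical.

```python
import numpy as np

def select_by_batch_stats(losses, labels):
    """Return a boolean mask of samples kept for the gradient update.

    A sample is kept if its loss is at most the mean loss of its label
    class within the current batch (a generic small-loss heuristic).
    """
    losses = np.asarray(losses, dtype=float)
    labels = np.asarray(labels)
    keep = np.zeros(losses.shape, dtype=bool)
    for c in np.unique(labels):
        idx = labels == c                              # samples labeled c
        keep[idx] = losses[idx] <= losses[idx].mean()  # per-class batch mean
    return keep

mask = select_by_batch_stats([0.1, 0.2, 2.0, 0.2, 1.5, 0.1], [0, 0, 0, 1, 1, 1])
print(mask)
```

This also illustrates the limitation noted in the Conclusions: if most labels of a class in a batch are wrong, the class mean no longer reflects clean samples and the selection degrades.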

Band Evaluation
In this paper, we use the RGB bands of Sentinel-2 data for built-up land mapping, but Sentinel-2 data have 13 bands (Table 7). To evaluate the usefulness of the other bands, we acquired from GEE seven additional bands suitable for land cover classification, taken from the same L2A image as the Sentinel-2 RGB image described in Section 2.2.1. (The remaining three bands, B01, B09, and B10, are intended for atmospheric correction and are therefore not considered.) We evaluated our method using single-band images as well as shortwave-infrared (SWIR) and red edge band combinations. The 20 m bands were upsampled to 10 m using cubic spline interpolation [79]. Sampling and training followed the same procedure as for the RGB images. Figure 13 compares the performance of each spectral band, and Table 8 shows the results for the band combinations. The R, G, and B bands each outperform the other single bands, band combinations outperform single bands, and the RGB combination performs best for construction land classification, which is consistent with the results of [51].
Figure 13. Overall classification accuracy using single-band images.
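Upsampling a 20 m band onto the 10 m grid with cubic spline interpolation can be sketched with SciPy, where `scipy.ndimage.zoom` with `order=3` performs cubic B-spline interpolation. The paper does not say which software was used, so the choice of SciPy and the function name are assumptions; the sketch also ignores georeferencing and exact pixel-grid alignment between the 20 m and 10 m bands.

```python
import numpy as np
from scipy.ndimage import zoom

def upsample_20m_to_10m(band):
    """Upsample a 2-D 20 m band by a factor of 2 to a 10 m grid.

    order=3 selects cubic spline interpolation; the output keeps the
    input dtype (float32 here).
    """
    return zoom(np.asarray(band, dtype=np.float32), 2, order=3)

swir_20m = np.random.rand(50, 60)          # placeholder 20 m band
swir_10m = upsample_20m_to_10m(swir_20m)
print(swir_10m.shape)
```

The upsampled band can then be sampled into 64 × 64 patches in the same way as the native 10 m RGB bands.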

In addition, many studies fuse SAR data with optical data for land cover classification [48,80,81]. To evaluate whether fusing SAR data improves the performance of our method, we obtained Sentinel-1 C-band VV and VH data with a spatial resolution of 10 m for September 2020 from GEE. These data had been processed by thermal noise removal, radiometric calibration, and terrain correction, and the final terrain-corrected values were converted to decibels via log scaling (10 × log10(x)). Since RGB performed best in the band combination experiment, we changed the number of input channels of the CNN model to 5, produced five-channel images (R, G, B, VV, and VH), and sampled and trained them with the same method as above. The test results are shown in Table 9.
The results show that the accuracy with RGB+SAR fusion is 0.2% lower than with RGB optical data alone, possibly because the benefit of fusing optical and SAR data depends on the classifier, and a CNN may not yield better performance from this fusion [48,81].
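The decibel conversion and the five-channel input described above amount to the following operations. This sketch assumes co-registered 10 m arrays of equal shape; the function names and the small `eps` guard against log10(0) are hypothetical. Since the Sentinel-1 values described above are already converted to decibels, `to_db` is shown only to make the formula explicit.

```python
import numpy as np

def to_db(linear, eps=1e-10):
    """Convert linear backscatter to decibels: 10 * log10(x).

    eps (a hypothetical guard) avoids log10(0) on zero-valued pixels.
    """
    return 10.0 * np.log10(np.maximum(linear, eps))

def stack_rgb_sar(r, g, b, vv_db, vh_db):
    """Stack co-registered bands into an (H, W, 5) input: R, G, B, VV, VH."""
    return np.stack([r, g, b, vv_db, vh_db], axis=-1).astype(np.float32)

print(to_db(np.array([1.0, 10.0, 0.1])))
patch = stack_rgb_sar(*[np.zeros((64, 64))] * 5)
print(patch.shape)
```

A CNN consuming these patches needs its first convolution to accept 5 input channels instead of 3, which matches the change described in the text.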

Testing of Ground Truth Data
The advantage of the method in this paper is its ability to quickly obtain high-resolution built-up maps from publicly available data products, so the test set labels are taken from the global 10 m land cover product. However, this product carries its own uncertainty, which may persist despite the sample selection used to reduce it. To assess this uncertainty, we used sub-meter high-resolution remote sensing imagery from Google Maps to label ground truth samples of built-up and non-built-up land in Taoyuan County. Using visual interpretation and map annotation tools, we annotated 200 built-up land points and 200 non-built-up land points. Taking the latitude and longitude of these 400 points as patch centers, we cropped 64 × 64 patches from the 2020 Sentinel-2 RGB image, obtaining 200 ground truth samples of construction land and 200 of non-construction land, as shown in Figure 14. We tested the model using both the 10 m land cover product and the ground truth data; the results are shown in Table 10. Whether using the baseline or our approach, the performance of the model on the test set produced from the 10 m land cover product differs only slightly from that on the ground truth. A possible reason is that both the WorldCover land cover product and our built-up land extraction for Taoyuan County use 2020 Sentinel-2 images as data sources, and the categories confused with built-up land in the product are bare land and grassland [54], whereas most of Taoyuan County is covered by trees and cropland. Therefore, for construction land, the uncertainty in the land cover product can be largely ignored, and the product can serve as a reliable substitute for ground truth data when such data are unavailable.
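Cropping a 64 × 64 patch centered on an annotated point can be sketched as follows, assuming the latitude and longitude of each point have already been converted to a pixel row and column (for example, with the image's geotransform). The function name and the border handling (raising an error) are hypothetical choices.

```python
import numpy as np

def center_crop(image, row, col, size=64):
    """Extract a size x size window centered on pixel (row, col).

    image: (H, W, C) array. For an even size, the annotated pixel lands
    at index size // 2 of the patch, just below-right of the geometric
    center. Raises ValueError if the window leaves the image.
    """
    half = size // 2
    r0, c0 = row - half, col - half
    if r0 < 0 or c0 < 0 or r0 + size > image.shape[0] or c0 + size > image.shape[1]:
        raise ValueError("window falls outside the image")
    return image[r0:r0 + size, c0:c0 + size]

img = np.zeros((500, 500, 3), dtype=np.float32)  # placeholder RGB image
patch = center_crop(img, 250, 250)
print(patch.shape)
```

Points closer than half a patch to the image edge would need padding or a shifted window; rejecting them keeps every sample centered on its annotation.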

Conclusions
In this work, we convert 30 m resolution land cover mapping results into 10 m resolution built-up land maps. Our method is a new solution that accomplishes this without manual labeling effort. Owing to its denoising and fault-tolerant capabilities, the proposed built-up land mapping method achieves high accuracy even when the labels contain category noise. By learning from 10 m images with 30 m land cover category labels, we obtained 10 m construction land maps with 90.7% accuracy. The proposed solution uses training set sample filtering and pseudo-label assignment to select high-confidence samples and filter out part of the noisy samples. The adaptive fault-tolerant curriculum learning method based on batch statistics then further selects samples with relatively correct labels during training. Combining the two strategies improves the overall accuracy by 5.5%. In addition, the proposed method generalizes well and can be applied to regions with similar land cover classes. Our method demonstrates the possibility of moving from existing low-resolution map products to high-resolution ones, can significantly reduce the human and material resources consumed by such mapping efforts, and provides timely data support for monitoring urbanization.
However, the method also has shortcomings. If the label noise is very heavy, good pre-training is difficult to achieve, and sample selection becomes challenging given the limitations of the method. Heavy noise also makes it difficult for the batch statistics to reflect the true category mean, which can cause the algorithm to fail. Remote sensing data still contain a great deal of category noise, so fault-tolerant learning strategies have many potential applications in remote sensing. In future research, we will consider adding prior knowledge of categories to reduce the impact of noisy labels on classification models.