Annual Maps of Built-Up Land in Guangdong from 1991 to 2020 Based on Landsat Images, Phenology, Deep Learning Algorithms, and Google Earth Engine

: Accurate mapping of built-up land is essential for urbanization monitoring and ecosystem research. At present, remote sensing is one of the primary means used for real-time and accurate surveying and mapping of built-up land, due to the long time series and multi-information advantages of existing remote sensing images and the ability to obtain highly precise year-by-year built-up land maps. In this study, we obtained feature-enhanced data regarding built-up land from Landsat images and phenology-based algorithms and proposed a method that combines the use of the Google Earth Engine (GEE) and deep learning approaches. The Res-UNet++ structural model was improved for built-up land mapping in Guangdong from 1991 to 2020. Experiments show that overall accuracy of built-up land map in the study area in 2020 was 0.99, the kappa coefficient was 0.96, user accuracy of built-up land was 0.98, and producer accuracy was 0.901. The trained model can be applied to other years with good results. The overall accuracy (OA) of the assessment results every five years was above 0.97, and the kappa coefficient was above 0.90. From 1991 to 2020, built-up land in Guangdong has expanded significantly, the area of built-up land has increased by 71%, and the proportion of built-up land has increased by 3.91%. Our findings indicate that the combined approach of GEE and deep learning algorithms can be developed into a large-scale, long time-series of remote sensing classification techniques framework that can be useful for future land-use mapping research.


Introduction
The term "built-up land" refers to locations in which people engage in multiple social and economic activities [1,2].These locations make use of the land's carrying capacity or building space and are not used for the main purpose of obtaining biological products.Built-up land consists primarily of residential space, roads, public facilities, and parks that may include vegetation and scenic facilities.The change in built-up land is a slow and long process, and may not be complete if we only look at the change in built-up land in one or two years.Long-term monitoring and mapping can more accurately reflect dynamic changes in built-up land over long periods of time and reveal the temporal and spatial laws governing these changes.Amid rapid economic development and the acceleration of urbanization, large amounts of crop land and forest are occupied by built-up land, leading to a sharp decline in biodiversity and further adverse impacts.In particular, expansion affects the water cycle system, altering surface runoff and increasing the risk of waterlogging disasters [3,4], and aggravates the urban "heat island" phenomenon [5][6][7], contributing to ecosystem deterioration [8][9][10][11][12].As such, the rate of change and trends in built-up land can directly or indirectly reflect and be used to evaluate the degree of urbanization.The swift acquisition of accurate spatial distribution information of built-up land supports analysis of changing trends in built-up land in the process of urban and rural development.This is crucial for formulating ecological and environmental protection policies, rationally planning land resources, and ensuring people's quality of life [13].
Thanks to recent rapid developments in remote sensing technology, big data with high spatial resolution and temporal resolution are accumulating continuously, and long time series remote sensing images have become indispensable for the timely and accurate monitoring of built-up land.Several previous studies have explored the possibilities offered by remote sensing for identifying built-up land.These studies can be roughly divided into two classes: spectral analysis and image classification.
Spectral analysis can be simply understood as utilizing the spectral characteristics of remote sensing images to analyze and obtain specific information.Various built-up land indexes are available, including the Normalized Difference Built-up Index (NDBI) [14][15][16], the Index-based Built-up Index (IBI) [2,[16][17][18], the Combinational Built-up Index (CBI) [16,19], the Modified Built-up Index (MBI) [16,19,20], and the Normalized Urban Areas Composite Index (NUACI) [14].These index methods were relatively popular in early and even current spectral analysis research [21].They can strengthen image features through calculations, and are then usually combined with the threshold method for the purpose of making judgments.The development of high-resolution images has allowed scholars to obtain built-up land characteristics based on dense time series image data using time series spectral analysis.For example, the spectral differences between impervious and pervious surfaces were analyzed from Landsat time series data [22]; time-space rules and Landsat time series stacks were used to capture continuous impervious surface dynamics [2]; MODIS NDVI and Landsat NDVI were merged for crop classification using long short-term memory (LSTM) algorithm [23]; and some scholars performed segment fitting on 12-year time series data and successfully extracted built-up land by extracting trajectory features [24].In summary, spectral analysis can accomplish much using the spectral and temporal characteristic information provided by remote sensing images but cannot provide optimal accuracy for surveying and mapping long time series built-up land.
In contrast with spectral analysis, image classification involves the use of image segmentation technology to classify remote sensing images [25].With the rapid development of image classification and segmentation technology, classification methods have evolved from decision tree classifiers and support vector machine (SVM) [18,26] methods to the more recent artificial neural network (ANN) [20,27,28] approach.Early research focused on simpler neural networks, such as random forests (RF) and back propagation (BP) neural networks.Zhang et al. [29] used multi-temporal synthesis and relative radiation normalization methods to extract phenological information and then mapped the impervious surface of the Yangtze River Delta from 1984 to 2020 using the RF method.Wan et al. [30] proposed a method that combined BP neural networks and support vector data description to extracted impervious surfaces.As deep learning methods used in other fields, such as medicine and biology, have matured, they have found new applications in remote sensing recognition research.Since 2018, deep learning has been used to identify built-up land.Tan et al. [31] designed a double-stream convolutional neural network that combines complementary cues from high-resolution panchromatic and multispectral images for highresolution built-up area recognition.Compared with the most advanced technology available hitherto, Tan's method has higher overall accuracy (OA) and superior generalization ability.Sun et al. [32] began from different data sources and proposed a three-dimensional convolutional neural network (3D-CNN) method for extracting impervious surfaces from WorldView-2 and airborne LiDAR datasets.To extract feature information more comprehensively, Zeng et al. [33] combined and optimized the Resnet and U-Net models to identify airport land.Pixel-level spatial and spectral information are typically used first, followed by the use of texture and feature maps through a multi-scale convolution process.This enhances the identification of impervious surfaces.Image classification methods can provide the year-by-year feature recognition required by national land monitoring authorities and other departments.The principle is simple, and the law of feature change can also be further analyzed.However, the data used in most studies were manually selected first-phase images with superior quality over a single year.Further research is required to address the need for year-by-year data processing and assess how the timespace spectral information provided by remote sensing images may be fully utilized to generate higher-precision built-up land maps.
Several large-scale identification and research projects have focused on built-up land.For example, NUACI is a global 30-m urban land cover dataset from 1990 to 2010 produced using an algorithm based on Landsat images [14].LUC is a 1-km land use cover dataset from 2015 in China based on multi-source remote sensing data, such as Landsat-8 and GF-2, combined with data obtained from field surveys and human-computer interaction interpretation, with a comprehensive evaluation accuracy greater than 90% [34].The Annual Global Land Cover (AGLC) is a global 30-m land cover dataset from 2000 to 2015, based on one year of high-confidence land cover data; it used random forests to classify areas that change every two years, resulting in 15 consecutive years of land cover data [13].Mainstream large-scale mapping methods are still based on pixel-based approaches, such as RF.Pixel-based methods are more efficient in terms of computing power; they are also highly sensitive to small objects and able to map in greater detail.However, due to the complexity of mixed pixels and ground objects in remote sensing images, the results of pixel-based methods typically include "salt and pepper" noise.
To compensate for the above-mentioned shortcomings of existing research, we performed an analysis based on the following assumptions.First, the spectral differences in the remote sensing images captured across the period of a year differ significantly, and the artificial selection of one of these images is not scientifically sound.Remote sensing image classification is used primarily to obtain information from spectral features for classification.Therefore, we propose beginning from the spectral characteristics of different land uses, and then integrating all available images from a given year, and synthesizing feature-enhanced images for that year.Furthermore, to capture the dynamic changes in built-up land in a long-term series, it is necessary to perform year-by-year built-up land mapping.Similarly, the corrected outlier classification of mapping results should be based on long time series of annual results.Finally, year-by-year large-scale built-up land mapping research will inevitably involve problems related to processing, feature selection and training, and prediction of large sets of data.On one hand, deep learning identifies image features through neural networks and automatically filters image features through labels.This can significantly save computational power and time, and to a certain extent, minimize reliance on professional knowledge and experience.The Convolutional Neural Network (CNN) approach involves learning and predicting based on pixel blocks, which can take into account the context information of pixels, avoiding the problem of "salt and pepper" noise.On the other hand, big data can be accessed and analyzed quickly and easily on the Google Earth Engine (GEE) platform, which is provided by Google for online visualization, calculation, analysis, and processing of large amounts of large-scale earth science data (especially satellite data).GEE is supported by numerous parallel servers, so it has significant advantages in data sourcing and processing efficiency.
In conjunction with what was discussed above, in this study all available Landsat time-series imagery data were integrated to synthesize yearly feature-enhanced data with a phenology-based algorithm.Then, we combined the advantages of deep learning and GEE to train a built-up land model that enables fast and accurate identification of builtup land.Finally, we obtained and corrected the annual built-up map of Guangdong from 1991 to 2020, and analyzed the spatial and temporal distribution of built-up land in Guangdong for 30 consecutive years.The main objective of this study was to generate a large-scale, long-term and accurate dataset of 30 m built-up land in Guangdong from 1991 to 2020 by combining time-series Landsat imagery, phenology-based algorithm, deep learning methods, and GEE.

Study Area
Guangdong is a provincial-level administrative region of the People's Republic of China.According to the 2018 China Statistical Yearbook, Guangdong has an area of approximately 179,710 km 2 .Guangdong has been China's most populous and largest economic province since 1989.In tandem with its economic development, land use in Guangdong has also witnessed drastic changes as the area of built-up land continues to increase.Moreover, the types and the distribution characteristics of its built-up land vary across different regions, making it a particularly suitable candidate for the study of built-up land identification.
As shown in Figure 1, Guangdong is divided into four sub-regions: West Guangdong, North Guangdong, East Guangdong, and the Pearl River Delta.Of these, the Pearl River Delta is Guangdong's most prosperous region.It is also one of China's three most densely populated and robust urban agglomerations.

Landsat Imagery
The data used in this study were obtained from GEE, and all data processing was implemented in GEE.We collected all available Landsat-4/5/7/8 surface reflectance (SR) data from 1991 to 2020.Table 1 shows the number of Landsat imagery used each year.Landsat-7 only used data from 1999 to 31 May 2003, due to the problematic presence of striping in the images, and data from 2012 (Landsat-8 has no available data from 2012).This dataset is the SR data from the Landsat 8 OLI/TIRS sensors, and had processed to the Level-1 Precision Terrain (L1TP) level.These data have been atmospherically corrected using Land Surface Reflectance Code (LaSRC, https://www.usgs.gov/media/files/landsat-8-collection-2-level-2-science-product-guide,accessed on 24 March 2022) and includes a cloud, shadow, water and snow mask produced using C implementation of mask function (CFMasK) [35], as well as a per-pixel saturation mask.The quality of Landsat SR data is determined by the "pixel_qa" band (a pixel quality attribute generated by the CFMask).
To ensure high-quality observation data, cloud, cloud confidence, cirrus confidence, cloud shadow, snow/ice, and water were filtered by "pixel_qa" band (Figure 2).

Vegetation Index
Numerous studies have demonstrated that the Normalized Difference Vegetation Index (NDVI) is useful in identifying built-up land [34,36,37].Moreover, NDVI is highly sensitive to green vegetation, which typically surrounds built-up land.To better extract built-up land boundaries, the NDVI of all valid data was calculated from Landsat SR data (Equation ( 1)) [38].

Algorithms for Identifying Time Series of Built-Up Land
Figure 3 presents the workflow followed to map built-up land in Guangdong from 1991 to 2020.Based on all high-quality observations of the time series data collected, we synthesized the annual feature-enhanced data by analyzing the phenological spectral features of different land use types.Built-up land cover from 1991 to 2020 was predicted using deep learning techniques with ground truth data, and temporal segmentation was used to correct and generate 30-year spatiotemporal dynamics of the built-up land.Following accuracy assessing, built-up maps of Guangdong from 1991 to 2020 were produced.These methods are described in detail in the following sections.

Spectral Variability between Built-Up Land and Other Lands Based on Phenology
Forest, grassland, and cropland are typically covered by crops or vegetation during the lush growth period of the year, and bare during non-growth periods.Therefore, the spectral values usually differ significantly in the time series remote sensing images from a single year.The growth period also differs, as crop land in particular may undergo several growth periods.Compared with forest, grassland, and cropland, land covered by water is relatively stable, but the water may be covered by algae, so small fluctuations may be observed.Data for built-up land use should be stable with no significant fluctuations across the one-year spectral curve.Based on this principle, it is not scientifically sound to select only one image from a given year [39].
To generate an annual effective characteristic data set of built-up land, we analyzed the time series spectral characteristics of different land use types.Figure 4 presents the time series spectral curves for different land use types.The NDVI of water is usually less than 0 (Figure 4b); the NDVI of forest, cropland, and grassland is usually higher, and concentrated at 0.3-0.8(Figure 4c-e); and the NDVI of built-up land is typically 0.1-0.3(Figure 4a), which is smaller than the values for forest, cropland, and grassland and larger than water, which can be effectively distinguished.At the same time, built-up land is usually surrounded by vegetation, and the spectral information of mixed pixels also increasingly exhibits vegetation spectral characteristics [36].In theory, therefore, NDVI can effectively distinguish built-up land from other land use types.
Figure 4. Time−series spectral curves for different land-use types with 1 data every 10 days were synthesized using all available Landsat−7, 8 and Sentinel-2 remote sensing imagery.The abscissa of the curve is the number of days in a year, and the ordinate is the reflectance of SR. (a-e) represent the time−series spectral curves of built-up land, crop land, forest, grass and water, respectively.

Annual Composite Feature Enhancement Images
To enhance the data characteristics and improve the model training efficiency, we synthesized the valid data for each year.All images were composited in a collection, using a quality band as a per-pixel ordering function.Each pixel used the maximum value of NDVI as an index to compose the NDVI_MAX image per year [37] (Figure 5).In the annual NDVI_MAX images, blue (452-512 nm), green (533-590 nm), red (636-673 nm), NIR (851-879 nm), and SWIR-1 (1566-1651 nm) NDVI bands were selected as training bands, and some sample enhancement operations, such as random flip and rotation, were completed.

Deep Learning for Identifying Built-Up Land Per Year
Res-UNet++ Model for Identifying Built-Up Land U-Net is a typical encoder-decoder structure network [40].This model architecture can achieve good accuracy by training a smaller number of data samples, thereby reducing training time and resources.However, the feature maps obtained by the encoder in the traditional U-Net structure are directly superimposed with the decoder through skip connections.This will cause the network to lose a lot of information in the feature fusion part, affecting the mapping accuracy.Therefore, a dense jump connection is added between the encoder and the decoder-namely, U-Net++ [41].This is considered an extension of U-Net, which can effectively reduce the semantic gap between low-level information and high-level information [42,43].To address the problem of gradient disappearance or explosion during the U-Net structure training process, we introduced the residual structure design of Resnet to optimize our model [44], which became the Res-UNet++ model.
Figure 6a presents the structure of the Res-UNet++ model.Resnet18 residual structure was used as the backbone of Res-UNet++, as shown in Figure 6b.Each basic-block contained two 3 × 3 convolutions.Batch normalization layers were added to speed up the network learning, and maximum pooling was added to downsampling.Through this model structure, the features of each layer from shallow to deep in the encoder were connected and fused with the deep features in the decoder.The original image size of the feature was then restored by upsampling step by step.

Deeplab-v3 and FCN Models for Comparison
In the field of semantic segmentation, FCN [45] is a typical model, and Deeplab-v3 [46] has been applied very successfully in recent years.In this study, FCN and Deeplab-v3 were adopted for comparison with our Res-UNet++ and improved according to the characteristics of built-up land in the study area.
Figure 7a presents the architecture of Deeplab-v3.Xception was used as the backbone and Deeplab-v3 structure as the framework.Each encoder block was composed of 3 × 3 depthwise convolution, 1 × 1 pointwise convolution, batch normalization, and maximum pooling.Five encoder blocks were designed in total.In particular, atrous spatial pyramid pooling was added to the final step of downsampling, which is a module unique to Deeplab-v3.We applied atrous convolution to the cascade module to realize Atrous Spatial Pyramid Pooling (ASPP) [47].On one hand, atrous convolution can address the problem of information loss caused by pooling, and the receptive field was increased without increasing the number of parameters to ensure that information was not lost.On the other hand, increasing multi-scale parallelism can resolve the problem of simultaneous segmentation of differently sized objects.
Figure 7b presents the architecture of FCN.Vgg16 was used as the backbone, and convolution transpose was used for upsampling.Each block was composed of two convolutions and one pooling.Five blocks were designed in total.

Parameters and Design of Deep Learning Training
Due to the vastness of the research area, the production of label data based on the entire research area would be an enormous project.Therefore, the entire study area was divided using a 60 × 60-km grid.Two grids were selected as training data patches in each sub-region.The selection was based on the larger built-up land area and more built-up land types in the sub-regions, and the adjacent grid should be avoided as far as possible.
To test the model's generalization, two grids were selected as testing data patches in North Guangdong and Pearl River Delta (Figure 8).
Data from 2019 were used as training and validation data, and all ground truth data were produced through manual visual interpretation.The ground truth data were based on the China Multi-period Land Use Land Cover Change Remote Sensing Monitoring Dataset (CNLUCC, from the Resource and Environmental Science Data Registration and Publishing System), which was verified by field inspection and manually revised.
The learning rate adopted a piecewise constant decay strategy.The initial learning rate was 1 × 10 -³, and the minimum learning rate is limited to 1 × 10 -5 .Learning rate was reduced by 5% when the accuracy difference between the two classes was very small.The model's loss function set the sum of improved balanced binary cross-entropy loss and the dice coefficient loss [42].The Adam optimization algorithm was used.

Correction of Time Series Built-Up Land by Temporal Segmentation Algorithm
Due to the complexity of the built-up land system, some built-up land is temporarily occupied, and some non-built-up land is temporarily occupied for several years.To adapt to the long-term dynamic changes of built-up land, we attempted to correct the mapping results of built-up land year-by-year from the perspective of a longer time series.We adopted LandTrendr (Landsat-based detection of Trends in Disturbance and Recovery) [48] to fit the long time series of built-up land probabilities to correct the misclassification of individual years.Figure 9 presents the pixel-based temporal smoothing procedure.The spectral time series of pixels were modeled as a linear segment sequence through temporal segmentation.Next, spikes with similar spectral values before and after the de-spiking algorithm were removed.Subsequently, the residual-error criterion was used to identify potential vertices, which had large deviations from the fitted regression line.We performed simplification and refitting based on vertices and simplifying models, and the final output of the segmentation algorithm emerged from the best model, which was determined using the p-value for the F-statistic.

Accuracy Assessment and Area Estimation
Validation data were selected and the area accuracy estimated following the "good practice" suggested by Olofsson [49].
Taking a Landsat pixel (30 × 30 m) as the evaluation unit, the stratified random sampling method was used to select the validation data based on four sub-regions.The sample size was based on the expected OA of the map and the assumed error matrix to obtain the approximate range, and the validation process was repeated to determine the best sample size [49].We calculated the kappa coefficient [50] based on the confusion matrix and overall accuracy (OA), producer accuracy (PA), and user accuracy (UA) based on the error matrix.To ensure that the accuracy and area estimates were unbiased and consistent, we also estimated each standard deviation and confidence interval at 95%.
Since the research area is large and the sample weight is sufficient, the study area was stratified by sub-regions and map classes.The proportion of built-up land and nonbuilt-up land was significantly unbalanced, so a combination of proportional and optimal distribution was adopted in the sample size allocation.After multiple experiments, 30 sample points for built-up land and 120 sample points for non-built-up land were obtained in each sub-region (Figure 10).According to the Landsat series imagery and the high spatial resolution Google Earth TM image, these samples were visually interpreted as reference data for accuracy assessment and area estimation.Assessment was completed every five years from 1991 to 2020.

Model Comparison Experiment Result
We used the improved Res-UNet++, Deeplab-v3, and FCN models to predict the built-up land map of Guangdong Province in 2020 and used the validation sample points for validation.Table 2 presents the comparison of each model's accuracy.Res-UNet++ clearly yielded the highest accuracies: OA was 0.99, kappa was 0.96 in 2020, and UA and PA were also the highest among the three models.Among them, the PA of the BL in Res-UNet++ was considerably higher than that of the other two models.Accuracies of Deeplab-v3 and FCN were comparable: FCN was slightly higher on kappa and OA (OA was 0.95, kappa was 0.82 in 2020).
Figure 11 presents detailed comparisons of the mapping results of the three models.The main problem of Deeplab-v3 and FCN was the serious omission error of built-up land (PA of BL on Deeplab-v3 was 0.53, PA of BL on FCN was 0.55), which was mainly reflected in the rough identification of the built-up land boundary, and more omission errors could be expected in small areas of built-up land.As with the precision results, Res-UNet++ yielded the best results: it provided the best results for the identification effect of the builtup land boundary, and also provided the best results in mapping small areas of built-up land and the capture and judgment of spatial context information; in other words, it yielded the best semantic segmentation effects.
Table 3 summarizes the accuracy results of sub-regions according to the results of Res-UNet++, which was convenient for further analysis and comparison of regions in Guangdong.In general, the accuracies of each of the four sub-regions were relatively similar.The accuracy indicators of WG, NG, and PRD were almost identical, and the OA of EG was relatively low.The main reason for this was that the PA of built-up land was low, so omission errors were present, and from the accuracies of four sub-regions, the PA of built-up land was also lower than other accuracy values.

Assessment of Built-Up Land Maps from 1991 to 2020
The built-up land mapping model was trained using the 2019 training patches.Based on this model, we predicted the built-up land map for each year from 1991 to 2020.The built-up land maps were displayed every five years, as shown in Figure 12.Table 4 lists the accuracy assessment, area estimation, and 95% confidence interval of area.
From the accuracy assessment in Table 4, we can see that OA peaked at 0.99 (2010) with a lowest value of 0.98 (2005); the kappa value peaked at 0.96 (2020) and had a lowest value of 0.94 (2010).Overall, the accuracies in 1995 and 2005 were lower than those in other years (OA was 0.98 and kappa was 0.90 in 1995; OA was 0.98 and kappa was 0.91 in 2005), mainly due to UA (0.93 in 1995; 0.94 in 2005) and PA (0.55 in 1995; 0.66 in 2005) in the built-up land class; 2020 had the highest accuracies (0.99 for OA and 0.96 for kappa).
From the spatial distribution of the built-up land maps in Figure 12 and the estimated built-up land area in Table 4, it is clear that there was an obvious phenomenon of built-up land expansion centered on towns and cities.Overall, the area of built-up land increased from 4.98(±0.94)× 10 3 km 2 in 1995 to 10.65(±1.56)× 10 3 km 2 in 2020.Among these, the greatest increase occurred between 2000 and 2005 (an increase of 1.56 × 10 3 km 2 ). Figure 12b,c shows that the most obvious built-up land expansion occurred in the PRD and EG.The second most obvious expansion was between 2015 and 2020 (an increase of 1.48 × 10 3 km 2 ), with an obvious expansion of built-up land at the village and town scale.The expansion of urban built-up land in the PRD region was also prominent.The smallest increase occurred between 1995 and 2000 (an increase of 390 km 2 ), followed by between 2005 and 2010 (an increase of 940 km 2 ).

Spatial-Temporal Changes of Built-Up Land in Guangdong
We counted the built-up land area mapped each year, as shown in Figure 13. Figure 14 presents the spatial distribution of built-up land expansion in Guangdong from 1991 to 2020.As shown in Figure 13, the built-up land did not increase every year: for example, it decreased by 60 km 2 in 2012 and by 110 km 2 in 2018, but on the whole, the built-up land in Guangdong expanded significantly, increasing from 2.64×10 3 km 2 in 1991 to 9.12×10 3 km 2 in 2020.The greatest expansion was 960 km 2 in 2019, followed by 540 km 2 in 1993 and 2006.Over a 30-year period, the built-up land increased by 71%.The proportion of builtup land in Guangdong has increased from 1.68% in 1991 to 5.59% in 2020-an increase of 3.91%.
From the perspective of regional spatial distribution, PRD had the largest area of built-up land and the fastest increase; WG had the smallest area and the slowest increase.PRD's increase was concentrated in the Guangzhou-Dongguan-Shenzhen-Foshan urban agglomeration and was mainly due to the outward expansion of urban built-up land, which expanded faster in 2005 and 2020 (Figure 14e).The built-up land in EG was concentrated in the Chaoshan area (Shantou-Jieyang-Chaozhou-Shanwei).It was mainly urban built-up land that was expanding, with faster expansion rates in 2000 and 2005 (Figure 14d).The built-up land in NG was relatively scattered.The expansion mainly affected town built-up land, with faster expansion rates in 1995 and 2020 (Figure 14b).The builtup land in WG was relatively scattered and located mainly in villages and towns; it expanded most rapidly in 2020 (Figure 14c).

Algorithms for Mapping Annual Built-Up Land Maps
In early built-up land mapping research, it was common to artificially select the best quality scene data for mosaicking [16,36] and calculate the indexes [36,51,52].To maximize the advantages of existing time series images and highlight the spectral characteristics of built-up land, we synthesized the annual pixel-based NDVI maximum data as featureenhanced data using all available time series data based on a phenology-based algorithm.The mapping results also verified that the NDVI_MAX data can reflect and highlight the characteristics of built-up land to a certain extent and can be used as basic data for builtup land mapping.
Res-UNet++, Deeplab-v3, and FCN structures were used in this study (Section 2.3.3), and the improved Res-UNet++ model was the best among these three models.Deeplab-v3 and the FCN structural model that we designed were not sufficient to extract built-up land details with a spatial resolution of 30 m, but the Res-UNet++ structural model can yield good results.These findings are sufficient to verify that the Res-UNet++ model we designed can effectively extract built-up land from large-scale 30-m spatial resolution remote sensing images.Moreover, the training rate of a built-up land mapping model in Guangdong was 1.80 × 10 3 km 2 /s, enabling it to achieve good results and be applied to long-term series mapping.Therefore, we can also conclude that the deep learning method can efficiently identify built-up land in large-scale remote sensing images.
In earlier attempts at built-up land mapping, bare land, sandy land, and mixed pixels were commonly confused with built-up land [15,22,37].In our built-up land maps, we also counted a matrix showing the specific situations of all commission errors in the builtup land map from 2020, as detailed in Table 5, and Figure 15 presents a detailed map of all misclassifications in 2020.We can see that three of the 120 built-up land validation points were misclassified (Figure 15B): the actual land uses of these three points are bare land, mixed pixels, and cropland.However, the level of commission error is still acceptable.In particular, our method had no commission error for sand, and the commission error for crop land was actually rare.The more serious commission error was the case in which bare land was classified as built-up land.Some omission errors also emerged in our mapping results (Figure 15A).This phenomenon was more prominent on some small areas of built-up land or branch roads.Since our method was implemented based on pixel blocks (such as 3×3 pixel blocks), some details could not be captured.Therefore, the builtup land area counted by our mapping results should be smaller than the actual area.These shortcomings may be addressed in future research.
Finally, we used the LandTrendr algorithm (Section 2.3.4) to re-fit the time series as post-processing.The results demonstrated that this method is suitable for post-processing of time series mapping, which can not only remove some simple errors but can also handle some complex situations.For example, due to the mixed pixels, the identification result of one pixel for three consecutive years was built-up land-non-built-up land-built-up land.LandTrendr can observe and judge results from longer time series.Some problems occurred with data due to cloud occlusion, and a small portion of water was mapped as built-up land over several years.These classifications will be reprocessed as built-up land.Temporal filtering improved the accuracy of the results.Table 5. Statistical matrix of land use for all commission errors of built-up land in Guangdong in 2020.Among 120 BL validation sample points in Guangdong, 3 N-BL were mapped as BL, and the actual land types were counted.

Comparison with Different Datasets
To better reflect the quality of mapping data, the built-up land identification results in 2015 (DL-based_2015) were compared with NUACI_2015 [14], LUC_2015 [34], and AGLC_2015 [53]. Figure 16 compares four regions: the selected cities are relatively developed large cities or megacities, so the built-up land was concentrated and large.Overall, the spatial distribution of built-up land presented by the four data products was relatively consistent.The built-up land distribution of PRD was highly concentrated and contiguous, while the built-up land of WG had the spatial characteristics of scattered small areas.Specifically, NUACI_2015, LUC_2015, and AGLC_2015 were pixel-based data products and were thus highly sensitive to the identification of tiny features.However, because the identification of NUACI_2015 was urban land, the identification of rural built-up land was lacking compared with other data, and the identification of built-up land should be the lowest.LUC_2015 was able to identify built-up land based on historical land use vector data and the human-computer interaction interpretation method, so the spatial distribution of the identified built-up land was relatively accurate, but the boundary was relatively rough.AGLC_2015 was highly sensitive to the identification of small roads and small areas of built-up land, and the identification was highly detailed.In contrast with the impervious surface identified by AGLC_2015, the data used in this study (DL-based_2015) identified built-up land.Due to the different definitions of identification objects, DL-based_2015 can identify areas such as urban wetlands and urban green spaces as built-up land, but impervious surfaces are not allowed.We also adopted the CNN method to realize identification based on pixel blocks.Therefore, our method focuses more on the extraction of spatial context information, with the result that the built-up land mapping was more comprehensive.
In sum, the data products obtained using different methods have different characteristics and different applications.In future research, appropriate methods can be selected according to different needs.

Advantages of Combining GEE and Deep Learning Methods
On one hand, GEE offers considerable advantages for the acquisition and processing of large amounts of data.Large amounts of free, high-quality data with different spatial and temporal resolutions are available via the GEE platform, which allowed us to directly acquire and process the data in parallel on GEE.In fact, it took only about 25 min to synthesize 30 years of annual NDVI_MAX data in Guangdong Province, which greatly reduced time and labor costs.Additionally, the algorithm of temporal smoothing had also been integrated on the GEE platform, which was convenient and fast to use.Therefore, batch data processing of large data volumes can be easily implemented using GEE.
On the other hand, deep learning has significant advantages in terms of data feature selection and feature learning.Deep learning simulates human brain neurons through complex neural networks to recognize and learn data features.In particular, CNNs are widely used in remote sensing image classification [20,31,33].The relationship between time, space, and spectrum can be automatically discovered through convolution.This considerably reduces the manual selection features in the identification of built-up land, making it more intelligent and automated.
We combined GEE and deep learning methods: we used the cloud to process data on GEE and used GEE data for deep learning training, and the trained model could also be directly utilized and predicted on GEE, saving space, memory, and time.Real-time map display can also be performed on GEE, and the model's shortcomings can be found over time.GEE has led to a new era of cloud operation, and the combination of GEE and deep learning will likely become the main method used in remote sensing mapping research.

Implications and Future Work
Mastering the spatial distribution of the temporal and spatial changes of built-up land over a period of 30 consecutive years helps support land planning and dynamic management of land resources.It is a necessary prerequisite to ensure the scientific implementation of spatial planning [1].The phenology-based algorithm based on time series remote sensing images obtains characteristic image data by selecting unique spectral or phenological features, which can be applied to other land use classifications and even remote sensing interpretation work.The combination of this algorithm, GEE, and deep learning methods has developed into a large-scale, long-term series of remote sensing classification technology framework that will be useful for future land use mapping research.
In future research, we will apply the algorithm generalization to the built-up land mapping of the entire country as well as on a global scale in a longer time series.Theoretically, the algorithm of this study can be directly applied and implemented.However, the edge detection effect and detail effect of deep learning convolutional neural networks need to be optimized.

Conclusions
We proposed a method combining phenology-based algorithm, GEE, and deep learning to map annual built-up land from medium-resolution remote sensing images.This method used 30-m spatial resolution time series remote sensing images to generate pixelbased annual NDVI maximum images as feature enhancement data on GEE.Taking China's Guangdong Province as the study area, the Res-UNet++ model with the U-Net++ structure as the backbone and Resnet18 as the framework was largely improved.Temporal segmentation was performed with the time series data of built-up land identification results.The accuracy of built-up land mapping was improved in various aspects, and we were able to map built-up land in Guangdong from 1991 to 2020.The results of our experiments indicate that our algorithm can identify built-up land relatively broadly and comprehensively, but the recognition effect needs to be improved in small area of built-up land.At the same time, the combination of the phenology-based algorithm, GEE, and deep learning offers significant advantages in data processing, feature learning, and model prediction efficiency and can be applied to long-term, large-scale remote sensing rapid identification research.

Figure 2 .
Figure 2. Good-quality observation of time-series remote sensing image from 1991 to 2020 according to the latitude.(a-d) show the number of Landsat 4, 5, 7 and 8 each year for 30 years.(e) shows total numbers in all study areas.(f) shows the number of Guangdong in 2019 in the form of a map.

Figure 3 .
Figure 3. Workflow of mapping built-up land in Guangdong province from 1991 to 2020.

Figure 5 .
Figure 5.The feature enhancement data (NDVI_MAX) of the study area in 2019.(a) shows the Day of Year (DOY) each pixel of NDVI_MAX; (b) shows false color image of NDVI_MAX.

Figure 7 .
Figure 7.The structure of the compared model.(a) the structure of Deeplab-v3; (b) the structure of FCN.

Figure 8 .
Figure 8. Spatial distribution of training and testing patches.

Figure 9 .
Figure 9.The procedure of pixel−based temporal smoothing.

Figure 10 .
Figure 10.Spatial distribution of validation sample points.

Figure 13 .
Figure 13.Built-up land area and its standard error statistics in annual classification maps.

Figure 14 .
Figure 14.Built-up land expansion in Guangdong from 1991 to 2020.

Figure 15 .
Figure 15.Examples of classification errors in the mapping of built-up land in Guangdong in 2020.(A) Examples of omission errors; (B) all commission errors in 120 built-up land validation points.

Figure 16 .
Figure 16.Comparison of built-up land mapping examples of three important data products in four sub-regions in 2015.DL-based_2015 is the data of this study, which identified built-up land; NUACI_2015 is the product of the global 30 m urban land; LUC_2015 is the product of the 1 km land use (built-up land) in China; AGLC_2015 is the product of the global 30 m land cover (impervious surface).

Table 1 .
Number of Landsat imagery used totally each year.

Table 2 .
Accuracy assessment of built-up land maps based on Res-UNet++, Deeplab-v3 and FCN model in 2020 (BL-Built-up land, N-BL-Non-Built-up land).

Table 3 .
Error matrix with cell entries expressed in terms of proportion of area and accuracies based on Res-UNet++ in 2020.The vertical axis represents the truth, and the horizontal axis represents the results of mapping, which are divided into four sub-regions for statistics (WG-West Guangdong, NG-North Guangdong, EG-East Guangdong, PRD-Pearl River Delta).