Mapping Paddy Rice in Complex Landscapes with Landsat Time Series Data and Superpixel-Based Deep Learning Method

: The spatial pattern and temporal variation in paddy rice areas captured by remote sensing imagery provide an effective way of performing crop management and developing suitable agricultural policies. However, fragmented and scattered rice paddies due to undulating and varied topography, and the availability and quality of remote sensing images (e.g., frequent cloud coverage) pose signiﬁcant challenges to accurate long-term rice mapping, especially for traditional pixel and phenological methods in subtropical monsoon regions. This study proposed a superpixel and deep-learning-based time series method to analyze Landsat time series data for paddy rice classiﬁcation in complex landscape regions. First, a superpixel segmentation map was generated using a dynamic-time-warping-based simple non-iterative clustering algorithm with preprocessed spectral indices (SIs) time series data. Second, the SI images were overlaid onto the superpixel map to construct mean SIs time series for each superpixel. Third, a multivariate long short-term memory full convolution neural network (MLSTM-FCN) classiﬁer was employed to learn time series features of rice paddies to produce accurate paddy rice maps. The method was evaluated using Landsat imagery from 2000 to 2020 in Cengong County, Guizhou Province, China. Results indicate that the superpixel MLSTM-FCN achieved a high performance with an overall accuracy varying from 0.9547 to 0.9721, which presents an 0.17–1.23% improvement compared to the random forest method. This study showed that combining spectral, spatial, and temporal features with deep learning methods can generate accurate paddy rice maps in complex landscape regions.


Introduction
As one of the most important staple crops in China, rice provides food for about 65% of Chinese people [1]. Paddy rice agriculture is a crucial component in national and global food security. However, pests and diseases continue to have detrimental effects on rice production [2,3]. In addition, maintaining the rice planting area is a critical challenge for global food security [4]. Accelerating urbanization and the "Grain for Green" project may create continual pressure on the rice planting area [5]. Furthermore, paddy rice agriculture also plays an essential role in water use and climate change. Rice agriculture requires a considerable amount of water [6]. The total water consumption of rice irrigation around the world accounts for about a quarter to one-third of global freshwater usage [7,8]. Rice paddies are an important source of methane emissions, contributing approximately 11% of total anthropogenic methane emissions [9]. Methane emissions from rice paddies significantly influence climate change by accelerating global warming [10]. Consequently, the spatial distribution of paddy rice is valuable information for agriculture management, and understanding and assessing pests and diseases, food security, climate change, and water usage at regional, national, and global scales [11]. Compared to conventional agricultural of land cover, which remains challenging in complex regions with variable topography, landscape composition, and configuration [16,41,42], such as rice planting areas in South and Southwest China. In such regions, the undulating terrain, non-uniform phenology, and small, irregular, and fragmented rice paddies generally mixed with other croplands lead to difficulties in rice mapping using phenological variables due to the spatial proximity and spectral similarity to other land types [42][43][44]. In addition, the salt-and-pepper effect is another issue in using per-pixel classification for complex landscapes [16].
Identifying paddy rice could be regarded as a multivariate time series (MTS) classification task with two classes: rice and non-rice. With the recent vigorous development of deep learning, ensemble deep learning models have been developed to accommodate the time series classification problem without heavy preprocessing or feature engineering. For example, Karim et al. [45] proposed the multivariate long short-term memory (LSTM) full convolution neural network (FCN) (MLSTM-FCN) classification method. The proposed MLSTM-FCN model was tested on 35 datasets and achieved strong performances with most of them. Therefore, the MLSTM-FCN was employed to identify the paddy rice in the Landsat SIs time series in this study. Due to object-based classification producing better results than per-pixel methods [12], a superpixel segmentation algorithm was employed to generate superpixels as objects.
The goal of this study was to use ensemble deep learning models with Landsat data to map annual paddy rice coverage. This method could generate precise rice paddy maps with 30 m-spatial-resolution images for complex regions. The specific objectives of the study were to (1) evaluate the potential of Landsat time series images for mapping rice paddy in complex regions; (2) compare the performance of the per-pixel-based and superpixel-based MLSTM-FCN classifier and RF classifier in mapping paddy rice with SIs time series; and (3) map the temporal variation in paddy rice using time series Landsat images.
The contribution of this study is two-fold. The first contribution integrates agronomic domain knowledge with deep learning methods to generate accurate paddy rice maps, and this approach could be extended to other crop studies as well. On the other hand, the superpixel approach proposed in this paper could be employed to accommodate various spatial-temporal analytics in agriculture, which offers a baseline for decision making in agriculture management.

Study Area
The study area, Cengong County, is located in the eastern part of Guizhou Province, China (longitude 108 • 20 E to 109 • 03 E and latitude 27 • 09 N to 27 • 32 N) ( Figure 1). Cengong has a total area of 1486.5 km 2 , and elevations vary between 330 m and 1359.9 m. The hilly terrain in Cengong typifies a complex landscape. According to a field survey, due to the topography and well-developed roads, paddy rice fields are mainly distributed in the intermountain basins (local name "Bazi"), which are generally fragmented, relatively small, and irregular. The major land cover categories are forest, built-up, water, and cropland, where the cropland includes upland, paddy fields, and abandoned land; because of the topography restriction, the rice fields are usually mixed with other croplands. It has a humid subtropical climate (Köppen Cfa), with distinct seasons and abundant precipitation. The annual average temperature ranges between 15.7 • C and 17.1 • C, and the annual precipitation ranges between 1005.6 mm and 1403.5 mm, which is mainly concentrated in May and June. Cengong County is the only national hybrid rice seed production base in Guizhou Province, and single-cropped rice is the predominant cultivar. The growing season for paddy rice is from May to September.

Time Series Landsat Data
All available Landsat TM, ETM+, and OLI Collection 2 Level 2 products from 2000 to 2020 for Worldwide Reference System-2 (WRS-2) 126/041 with more than 20% clear observations were downloaded from the USGS EROS Science Processing Architecture on Demand Interface (https://espa.cr.usgs.gov/ (accessed on 26 February 2022)), including surface reflectance, brightness temperature (BT), and quality assurance (QA) bands. The QA band can provide the mask of clouds and cloud shadows generated by the CFmask (C version of Fmask) algorithm [46]. The images for Landsat TM, Landsat ETM+, and Landsat OLI were 93, 156, and 63, respectively ( Figure 2).

Sample Points
Pure pixels were generated as sample points from the land use map in 2019, the field survey in July 2019, and Tianditu online images [47]. This land use map was provided by the Bureau of Natural Resources of Cengong. The land use map was reclassified to paddy fields, other crops, forests, shrubs, wetlands, built-up, water, and barren. A total of 2500 random sample points were generated in the parcels containing at least six connected Landsat pixels using the Create Random Points tool in ArcGIS Pro 2.8.1 software from ESRI (Redlands, CA, USA). The ground survey was conducted in July 2019 in Cengong County. During the field survey, a handheld data collector (Trimble GeoExplorer 6000, Trimble, Inc. Sunnyvale, CA, USA), which records location with high accuracy, was used

Time Series Landsat Data
All available Landsat TM, ETM+, and OLI Collection 2 Level 2 products from 2000 to 2020 for Worldwide Reference System-2 (WRS-2) 126/041 with more than 20% clear observations were downloaded from the USGS EROS Science Processing Architecture on Demand Interface (https://espa.cr.usgs.gov/ (accessed on 26 February 2022)), including surface reflectance, brightness temperature (BT), and quality assurance (QA) bands. The QA band can provide the mask of clouds and cloud shadows generated by the CFmask (C version of Fmask) algorithm [46]. The images for Landsat TM, Landsat ETM+, and Landsat OLI were 93, 156, and 63, respectively ( Figure 2).

Time Series Landsat Data
All available Landsat TM, ETM+, and OLI Collection 2 Level 2 products from 2000 to 2020 for Worldwide Reference System-2 (WRS-2) 126/041 with more than 20% clear observations were downloaded from the USGS EROS Science Processing Architecture on Demand Interface (https://espa.cr.usgs.gov/ (accessed on 26 February 2022)), including surface reflectance, brightness temperature (BT), and quality assurance (QA) bands. The QA band can provide the mask of clouds and cloud shadows generated by the CFmask (C version of Fmask) algorithm [46]. The images for Landsat TM, Landsat ETM+, and Landsat OLI were 93, 156, and 63, respectively ( Figure 2).

Sample Points
Pure pixels were generated as sample points from the land use map in 2019, the field survey in July 2019, and Tianditu online images [47]. This land use map was provided by the Bureau of Natural Resources of Cengong. The land use map was reclassified to paddy fields, other crops, forests, shrubs, wetlands, built-up, water, and barren. A total of 2500 random sample points were generated in the parcels containing at least six connected Landsat pixels using the Create Random Points tool in ArcGIS Pro 2.8.1 software from ESRI (Redlands, CA, USA). The ground survey was conducted in July 2019 in Cengong County. During the field survey, a handheld data collector (Trimble GeoExplorer 6000, Trimble, Inc. Sunnyvale, CA, USA), which records location with high accuracy, was used

Sample Points
Pure pixels were generated as sample points from the land use map in 2019, the field survey in July 2019, and Tianditu online images [47]. This land use map was provided by the Bureau of Natural Resources of Cengong. The land use map was reclassified to paddy fields, other crops, forests, shrubs, wetlands, built-up, water, and barren. A total of 2500 random sample points were generated in the parcels containing at least six connected Landsat pixels using the Create Random Points tool in ArcGIS Pro 2.8.1 software from ESRI (Redlands, CA, USA). The ground survey was conducted in July 2019 in Cengong County. During the field survey, a handheld data collector (Trimble GeoExplorer 6000, Trimble, Inc. Sunnyvale, CA, USA), which records location with high accuracy, was used to obtain the latitude and longitude of rice-dominated surrounding plots, and a total of 43 points were recorded. The field survey sample points and randomly generated points were merged and further interpreted carefully by using the corresponding Tianditu high-resolution images to exclude the points for which a confident interpretation could not be made.

Agriculture Statistical Data
Agricultural statistics for Cengong were provided by the Bureau of Agriculture and Rural Affairs of Cengong. Statistical data contain estimations of the paddy-rice-sown area from 2004 to 2017. The statistical data for the paddy-rice-sown areas were used to verify the agreement between the government statistics data and our results. Figure 3 shows the flowchart of the superpixel-based MLSTM-FCN for mapping paddy rice. First, the Continuous Change Detection and Classification (CCDC) algorithm was used to synthesize cloud-free Landsat time series, and then SIs time series were created for each year (Section 2.3.1). Next, a superpixel segmentation method was employed to generate annual superpixel maps (Section 2.3.2), and the superpixel-based training/validation dataset was extracted from superpixel maps and SIs time series (Section 2.3.3). Finally, the MLSTM-FCN was used to identify the paddy rice superpixels from the training data generated in the previous section (Section 2.3.4). to obtain the latitude and longitude of rice-dominated surrounding plots, and a total of 43 points were recorded. The field survey sample points and randomly generated points were merged and further interpreted carefully by using the corresponding Tianditu highresolution images to exclude the points for which a confident interpretation could not be made.

Agriculture Statistical Data
Agricultural statistics for Cengong were provided by the Bureau of Agriculture and Rural Affairs of Cengong. Statistical data contain estimations of the paddy-rice-sown area from 2004 to 2017. The statistical data for the paddy-rice-sown areas were used to verify the agreement between the government statistics data and our results. Figure 3 shows the flowchart of the superpixel-based MLSTM-FCN for mapping paddy rice. First, the Continuous Change Detection and Classification (CCDC) algorithm was used to synthesize cloud-free Landsat time series, and then SIs time series were created for each year (Section 2.3.1). Next, a superpixel segmentation method was employed to generate annual superpixel maps (Section 2.3.2), and the superpixel-based training/validation dataset was extracted from superpixel maps and SIs time series (Section 2.3.3). Finally, the MLSTM-FCN was used to identify the paddy rice superpixels from the training data generated in the previous section (Section 2.3.4).

Creating Landsat Spectral Indices Time Series
The CCDC algorithm was employed to generate synthetic Landsat surface reflectance images. The CCDC algorithm is based on the Landsat time series images without

Creating Landsat Spectral Indices Time Series
The CCDC algorithm was employed to generate synthetic Landsat surface reflectance images. The CCDC algorithm is based on the Landsat time series images without (or at least with few) noises such as clouds and cloud shadows, and requires the surface reflectance of Blue, Green, Red, Near-Infrared (NIR), Shortwave Infrared 1 (SWIR1), SWIR2, and the BT of the thermal infrared band as the inputs [48,49]. The clouds and cloud shadows can be screened out by the QA band of Landsat images [46]. For each individual pixel, the CCDC algorithm estimates three sets of Fourier models (e.g., simple (k = 1), advanced (k = 2), and full (k = 3)) according to clear-sky observations, as in Equation (1), and uses the models to detect land cover changes once new Landsat images are collected [50]. If a change is detected, a new time series model will be generated. The CCDC algorithm can also predict Landsat surface reflectance for any desired date based on estimated time series models. The surface reflectance image is synthesized every 8 days to meet the needs of temporal resolution of Landsat images for capturing the paddy rice growth profiles using the CCDC time series models [18,51], which results in 46 synthetic Landsat surface reflectance images per year [48].ρ where i is the i-th Landsat band;ρ i,t is the predicted surface reflectance for the i-th Landsat band on the Julian date t. a 0,i , and c 1,i are the overall (intercept) and interal-annual change (slope) coefficients for the ith Landsat band; a k,i and b k,i are the intra-annual change coefficients for the i-th Landsat band; k is the temporal frequency of the Fourier component (k = 1, 2, and 3), which is determined by the number of clear-sky observations; and T is the number of days per year (365. 25). Many previous studies have demonstrated that SIs (e.g., NDVI, EVI, LSWI, and NDSI) are useful for paddy rice mapping [25,26,29,30,35,36]. Prior to their input into rice paddy classifiers, the NDVI, EVI, LSWI, and NDSI are computed as follows ( Figure 4): where ρ Blue , ρ Green , ρ Red , ρ NIR , and ρ SWIR are the surface reflectance from the Blue, Green, Red, NIR, and SWIR1 bands of the synthesized Landsat image, respectively.
Remote Sens. 2022, 14, x FOR PEER REVIEW 6 of 20 (or at least with few) noises such as clouds and cloud shadows, and requires the surface reflectance of Blue, Green, Red, Near-Infrared (NIR), Shortwave Infrared 1 (SWIR1), SWIR2, and the BT of the thermal infrared band as the inputs [48,49]. The clouds and cloud shadows can be screened out by the QA band of Landsat images [46]. For each individual pixel, the CCDC algorithm estimates three sets of Fourier models (e.g., simple (k = 1), advanced (k = 2), and full (k = 3)) according to clear-sky observations, as in Equation (1), and uses the models to detect land cover changes once new Landsat images are collected [50]. If a change is detected, a new time series model will be generated. The CCDC algorithm can also predict Landsat surface reflectance for any desired date based on estimated time series models. The surface reflectance image is synthesized every 8 days to meet the needs of temporal resolution of Landsat images for capturing the paddy rice growth profiles using the CCDC time series models [18,51], which results in 46 synthetic Landsat surface reflectance images per year [48].
where i is the i-th Landsat band; ̂, is the predicted surface reflectance for the i-th Landsat band on the Julian date t. 0, , and 1, are the overall (intercept) and interal-annual change (slope) coefficients for the ith Landsat band; , and , are the intra-annual change coefficients for the i-th Landsat band; k is the temporal frequency of the Fourier component (k = 1, 2, and 3), which is determined by the number of clear-sky observations; and T is the number of days per year (365. 25). Many previous studies have demonstrated that SIs (e.g., NDVI, EVI, LSWI, and NDSI) are useful for paddy rice mapping [25,26,29,30,35,36]. Prior to their input into rice paddy classifiers, the NDVI, EVI, LSWI, and NDSI are computed as follows ( Figure 4):

Time Series Superpixel Segmentation
Superpixel is a perceptually meaningful cluster of connected similar pixels on images. It captures image redundancy and greatly reduces the complexity of subsequent image processing tasks [52]. In this study, the superpixel was introduced to reduce the salt-andpepper effect in rice paddy maps [12]. Many approaches have been proposed to generate superpixels, such as Superpixels Extracted via Energy-Driven Sampling (SEEDS) [53], Simple Non-Iterative Clustering (SNIC) [54], and Superpixel Sampling Networks (SSNs) [55]. Among these algorithms, SNIC is faster and requires less computation memory, and is easy to implement at higher dimensions. So, it was modified to generate superpixel maps using the SIs time series.
The original SNIC was designed for RGB images. Several modifications were adopted to apply the SNIC algorithm to time series images. Firstly, since the CIELAB color space is not suitable for time series images, SIs time series were directly input into the SNIC, and a penalty-coefficient-based dynamic time warping (DTW) [56] algorithm was introduced to measure the distance from candidate time series (t i ) to the superpixel centroid (t k ), as given by Equation (6). Secondly, an adaptive centroid placement method [57] was employed to produce initial centroids according to the information distribution of the images.
where x i and x k are the spatial position of time series t i and t k , respectively. s and m are the normalizing factors for spatial and time series distances. γ_DTW is the penalty-coefficientbased DTW, which can be written as γ_DTW = γ × DTW; the penalty coefficient γ is defined as Equation (7).
where n i and n k are the length of time series t i and t k , respectively; and l is the longest common subsequence (LCS) between t i and t k .
In this study, considering the size of the paddy rice field in Cengong, the initial superpixel size was set to 5 × 5 (i.e., approximately 150 × 150 m). The connectivity and compactness were set to 4 and 1, respectively. For each year, 46 SIs images were used to produce annual superpixel segmentation maps, and no postprocessing was applied to eliminate small superpixels (see Figure 5b).

Superpixel-Wise Time Series and Sample Data Construction
Pixels within a given superpixel present similar characteristics, which can be represented by the mean feature [58]. In this study, a mean operation was first applied to the pixels within the superpixel, and the mean feature value was assigned to the superpixel as a feature. Finally, four SI (NDVI, EVI, LSWI, and NDSI) time series were constructed for each superpixel.
To ensure the accuracy of samples, the sample points were screened out when land cover changes occurred based on the results of the CCDC algorithm. A total of 235 rice points and 658 non-rice points were collected, and the SIs time series at each sample point were extracted as features of the pixel-wise sample. Additionally, to obtain the superpixelwise sample features, as is shown in Figure 5a, a superpixel-containing sample s was first overlaid onto the land use map parcel p. Then, the features of the superpixel sample were produced by averaging the SI pixels completely within the superpixel and the parcel p (Figure 5c). pactness were set to 4 and 1, respectively. For each year, 46 SIs images were used to produce annual superpixel segmentation maps, and no postprocessing was applied to eliminate small superpixels (see Figure 5b).

Superpixel-Wise Time Series and Sample Data Construction
Pixels within a given superpixel present similar characteristics, which can be represented by the mean feature [58]. In this study, a mean operation was first applied to the pixels within the superpixel, and the mean feature value was assigned to the superpixel as a feature. Finally, four SI (NDVI, EVI, LSWI, and NDSI) time series were constructed for each superpixel.
To ensure the accuracy of samples, the sample points were screened out when land cover changes occurred based on the results of the CCDC algorithm. A total of 235 rice points and 658 non-rice points were collected, and the SIs time series at each sample point were extracted as features of the pixel-wise sample. Additionally, to obtain the superpixelwise sample features, as is shown in Figure 5a, a superpixel-containing sample s was first overlaid onto the land use map parcel p. Then, the features of the superpixel sample were produced by averaging the SI pixels completely within the superpixel and the parcel p (Figure 5c).  [45]. The MLSTM-FNC augments the full convolution neural network (FCN) module with a long short-term memory (LSTM) module and squeeze-and-excitation blocks [59] to improve the performance. The MLSTM-FCN considers the complex structure of MTS and classifies MTS without heavy preprocessing or feature engineering. Based on the MLSTM-FCN, paddy rice can be classified more accurately and quickly with the consideration of interdependences among different SIs.
In the MLSTM-FCN, feature extractors of the FC block were three temporal convolutional (TC) blocks. Each TC block was followed by a batch normalization (BN) layer with a momentum of 0.99 and an epsilon of 0.001 [60], which was activated by the Rectified Linear Units (ReLUs) [61]. Moreover, the first two TC blocks were succeeded by a squeeze-andexcite block with a reduction ratio r of 16, which was used to recalibrate the input feature maps adaptively [59]. Finally, a global average pooling layer was applied after the final TC block.
Besides the FC block, the MTS was passed through a dimension shuffle layer and conveyed into the LSTM block. The LSTM block contained an LSTM layer followed by a dropout layer, with a dropout rate of 80% to mitigate overfitting. Finally, the results obtained from the FC block and the LSTM block were merged and input into a SoftMax classification layer to produce the final classification result.

Experiment Design 2.4.1. Methods for Comparison
The pixel-based MLSTM-FCN and RF classifier were used for comparison. RF is an ensemble learning algorithm for classification based on the bagging technique, which is suitable for processing high-dimensional data and prevents overfitting [62].

Model Training and Mapping
The pixel-based (pixel MLSTM-FCN) classifier and superpixel-based classifiers (superpixel MLSTM-FCN and superpixel RF) were trained with pixel-wise and superpixel-wise samples, respectively. The 46 SIs (including the NDVI, EVI, LSWI, and NDSI every 8 days) time series images produced in a year were entered into the MLSTM-FCN classifier and the RF classifier. For the RF classifier, the SI values of each period of the time series were taken as independent features, and important features were selected using SelectFromModel in sklearn according to the importance of the feature.
For tuning the hyperparameters of the MLSTM-FCN and RF, a grid search with cross-validation (GridSearchCV) was employed. According to the specified parameter values, the GridSearchCV generated the best candidates by establishing models with each hyperparameter combination and using 10-fold cross-validation to evaluate each model. The hyperparameter combination with the highest OA was obtained. Table 1 shows the hyperparameter combinations used to search for optimal hyperparameter combinations of the classifiers. Then, a final model was trained with all sample data and optimal hyperparameter combination after searching. The other hyperparameters, including learning rate, dropout value, and loss function for the MLSTM-FCN, max_features, min_samples_leaf, and min_samples_split for RF, were set as the default values.

Performance Evaluation
To assess the performance of the paddy rice classifier, 10-fold cross-validation was conducted to evaluate the model established with the optimal hyperparameter combination; the average OA, Producer's Accuracy (PA), User's Accuracy (UA), and kappa coefficient of 10 times validation were analyzed. Considering the TP as correctly classified paddy rice samples, TN as the number of accurately identified non-paddy rice samples, FN as paddy rice classified as non-paddy rice, and FP as wrongly classified paddy rice, the performance metrics of classifiers can be defined as Equations (8)- (12). (TP + TN + FP + FN) 2 (12) In this study, the CCDC algorithm used to generate synthetic Landsat surface reflectance images was the Matlab procedure (https://github.com/GERSL/CCDC, (accessed on 20 November 2021)). Other data processes were carried out with Python, and the paddy rice classifiers were developed and evaluated using the scikit-learn and PyTorch packages established in Python. The classifiers were trained and executed on a workstation with 2 Intel Xeon E5-2650 processors, NVIDIA GeForce RTX 1080Ti (12 GB), and a memory of 128 GB.

Performance Evaluation and Comparison with Agriculture Statistical Data
With the validation samples given in Section 2.2.3 and Figure 1, the performance of annual paddy rice classifiers was evaluated, as listed in Table 2. The results indicate that the OA, PA, UA, and kappa of the annual paddy rice classifiers differed across the study period.  Table 2 as underlined values). It is noteworthy that all the classifiers used in this study appeared to be reasonably accurate. On average, the superpixel MLSMT-FCN produced slightly higher values of OA (0.9646), PA (0.9478), and kappa (0.9140) than the pixel MLSTM-FCN and superpixel RF and had the lowest value of UA (0.9330).
Moreover, the areas of paddy rice mapped by the superpixel MLSTM-FCN, superpixel RF, and pixel MLSTM-FCN were compared with agricultural statistics on rice-sown areas from 2004 to 2017. Figure 6 shows that the temporal profile of paddy rice areas generated by the superpixel MLSTM-FCN was more consistent with the rice-sown area than the paddy rice areas mapped by the pixel MLSTM-FCN and superpixel RF. During 2004-2017, the relative error in areas between annual paddy rice classification maps and agriculture statistics was 0.20-4.15%, 1.74-31.59%, and 0.07-14.55% for the superpixel MLSMT-FCN, superpixel RF, and pixel MLSTM-FCN, respectively. It is worth noting that the superpixel RF mapped paddy rice areas were underestimated in most years, whereas the pixel-MLSTM-FCN-mapped paddy rice areas were overestimated. The comparison of mapped paddy rice areas by three classifiers and statistical rice-sown areas is depicted in Figure 7

Visual Assessment of Paddy Rice Mapping Results
The classification maps of the superpixel MLSMT-FCN, superpixel RF, and pixel MLSTM-FCN are shown in Figure 8 to describe the spatial distribution of paddy rice in

Visual Assessment of Paddy Rice Mapping Results
The classification maps of the superpixel MLSMT-FCN, superpixel RF, and pixel MLSTM-FCN are shown in Figure 8 to describe the spatial distribution of paddy rice in

Visual Assessment of Paddy Rice Mapping Results
The classification maps of the superpixel MLSMT-FCN, superpixel RF, and pixel MLSTM-FCN are shown in Figure 8 to describe the spatial distribution of paddy rice in 2020. The local paddy rice classification results and the false-color image of some representative sites (subregions A-F in Figure 8a) are presented in Figure 9. The classified rice paddies were usually small and scattered. The classified rice paddies of models were mainly distributed in the intermountain basins (locally named "Bazi") along the rivers, which is shown in the river map in Figure 1b. Figure 9a-f confirm that the superpixel MLSMT-FCN achieved much higher classification accuracy than the other two methods. The pixel MLSMT-FCN generally identifies more rice paddies, and the superpixel RF classifier identified fewer rice paddies. Rice paddies with larger areas could be identified correctly by all the classifiers that were used in this study (Figure 9a,c,e). The major difference among paddy rice maps generated by three classifiers was the classification of edge pixels (mixed with other land cover types, such as well-developed roads) (Figure 9d) and smaller rice paddy areas ( Figure 9b). As shown in Figure 9, the pixel MLSTM-FCN rice map suffered from the salt-and-pepper effect across the region, mainly caused by commission errors. A reduction in commission error was achieved using the superpixel-based classifier as opposed to pixel-based classifiers (Figure 9). Figure 9a-f confirm that the superpixel MLSMT-FCN achieved much higher classification accuracy than the other two methods. The pixel MLSMT-FCN generally identifies more rice paddies, and the superpixel RF classifier identified fewer rice paddies. Rice paddies with larger areas could be identified correctly by all the classifiers that were used in this study (Figure 9a,c,e). The major difference among paddy rice maps generated by three classifiers was the classification of edge pixels (mixed with other land cover types, such as well-developed roads) (Figure 9d) and smaller rice paddy areas ( Figure 9b). As shown in Figure 9, the pixel MLSTM-FCN rice map suffered from the salt-and-pepper effect across the region, mainly caused by commission errors. A reduction in commission error was achieved using the superpixel-based classifier as opposed to pixel-based classifiers (Figure 9). Remote Sens. 2022, 14, x FOR PEER REVIEW 14 of 20

Interannual Spatial Distribution of Paddy Rice
Since the superpixel MLSMT-FCN yielded more accurate paddy rice maps, annual paddy rice maps from 2000 to 2020 in Cengong were generated with the superpixel MLSMT-FCN ( Figure 10). These maps can describe the temporal dynamics of paddy rice distributions. On a temporal scale, the paddy rice area did not exhibit a significant trend but fluctuated during the different years ( Figure 10

Interannual Spatial Distribution of Paddy Rice
Since the superpixel MLSMT-FCN yielded more accurate paddy rice maps, annual paddy rice maps from 2000 to 2020 in Cengong were generated with the superpixel MLSMT-FCN ( Figure 10). These maps can describe the temporal dynamics of paddy rice distributions. On a temporal scale, the paddy rice area did not exhibit a significant trend but fluctuated during the different years ( Figure 10 Figure 11 shows that the changed rice paddies were mainly located at boundary pixels of the field or the smaller plots.  Figure 11 shows that the changed rice paddies were mainly located at boundary pixels of the field or the smaller plots.     Figure 11 shows that the changed rice paddies were mainly located at boundary pixels of the field or the smaller plots.

Discussion
Compared with traditional machine learning algorithms used for mapping paddy rice regions, deep learning time series classifiers can effectively learn salient characteristics and long-term dependencies of time series to provide better performances without onerous preprocessing or feature engineering [63,64]. The superpixel MLSTM-FCN could map annual paddy rice accurately with available Landsat images in complex landscape regions, and its accuracy assessment indicated a better performance than that of superpixel RF. The superpixel RF usually omitted the edge pixels and rice paddies within small areas (Figure 9), which may have been caused by the input features of RF. The input features of RF were the SI values of each period in the time series, which were taken as independent features. Therefore, the RF classifier could capture the temporal correlation of SIs time series. These characteristics are usually used to represent the phenology of paddy rice and have been proven to play a critical role in mapping paddy rice in previous studies [6,14,26]. Consequently, when using traditional machine learning algorithms to map paddy rice, robust feature engineering is required to extract the temporal characteristics as part of the inputs. In addition, the spatial context plays a major role and provides inherent information about the target pixels in remote sensing image classification [30]. The DTW-SINC algorithm was employed to produce superpixel segmentation maps, and mean features were obtained based on superpixel segmentation to describe the spatial feature information of each superpixel. The well-trained MLSMT-FCN with superpixel features was more robust in resisting the salt-and-pepper noise and fragmentation than the pixel MLSTM-FCN, as clearly shown in Figure 9. Therefore, integrating more robust spectral, spatial, and temporal feature extraction into a single model for improving classification accuracy would be worth in-depth research.
Although the superpixel MLSMT-FCN in this study presented satisfactory paddy rice maps, the superpixel LSTM-FCN also exhibited misclassification in paddy rice maps. The synthetic SIs time series images relied on the change detection results from the CCDC algorithm. That algorithm has commission and omission errors in land change detection, especially for land cover types that have higher intra-annual variation, such as agriculture [48,49], and these errors will undermine the time series models used to generate synthetic surface reflectance images and be inherited in the final paddy rice maps. In addition, the misclassification pixels were usually found at the boundaries of the rice paddies, which was also confirmed by previous studies [16]. This finding suggests that using Landsat images with a spatial resolution of 30 m may be insufficient for classifying boundary pixels for small rice paddies. Additionally, land cover with similar spectral and phenological characteristics is often misclassified as paddy rice fields, which leads to noteworthy commission errors. The misclassified land cover mainly includes aquatic plants, such as lotus, abandoned paddies with weeds, and the shadows cast by buildings or topography [12,34,48]. Thus, methods that can combine neighborhood information [65] and high-spatial-resolution satellite images could potentially reduce the misclassification and provide more accurate paddy rice maps in future research.
Based on the spatiotemporal distribution of the annual paddy rice maps in this study, the declining trend from 2000 to 2004 was consistent with the decline in the rice-sown areas in China [66]. The paddy rice areas increased sharply in 2005, and declined severely in 2008, followed by increasing fluctuations until 2020. The possible reasons for the paddy rice area increase in 2005 may be that the government provided a series of special support policies to farmers, such as adopting a reduction in or exemption from agriculture tax and directly subsidizing grain planting while substantially increasing grain purchase prices [67]. In contrast, the increased prices of agricultural inputs (e.g., seeds, fertilizers, and chemicals) reduced farmers' willingness and confidence to plant crops, resulting in a severe decrease in paddy rice areas in 2008. Since 2008, the Chinese government has implemented a series of policies to promote farmland circulation, which have directly re-inspired the restoration of paddy rice planting.

Conclusions
This study proposed a framework that combined superpixel segmentation and deep learning for mapping paddy rice in complex landscape regions with all available Landsat images. Superpixel segmentation and deep learning were combined to extract spectral, spatial, and temporal features of rice paddies. The coupling of deep learning and superpixel segmentation could effectively improve the classification accuracy and reduce the misclassification of the paddy rice map. Compared with county statistics data, the results from using the superpixel MLSTM-FCN were more consistent. The validation results also show that the proposed method had an OA of 0.9547-0.9721, which indicated that the proposed method outperformed pixel-based methods and traditional machine learning algorithms. The dynamics of rice paddies over the last two decades were analyzed, and the policies and the cost of agricultural inputs had greatly affected farmers' willingness to plant. The resulting maps could provide useful information for policymakers in identifying the dynamics of the paddy rice area, which should be greatly helpful for planning government interventions to promote rice production effectively. The success of this framework verified the necessity of integrating domain knowledge with deep learning techniques for agriculture studies. More research should be conducted to verify the applicability of the proposed method for mapping paddy rice in other complex landscape regions, such as South China. Further, to avoid the irregular availability of remote sensing images, the time series inputs were generated by the CCDC algorithm, which is a heavy computation cost method, and this may limit the applicability of this method in a wider range of regions. Thus, improving the model's ability to handle the irregular data available without heavy computation is necessary in the future.