Development of a Phenology-Based Method for Identifying Sugarcane Plantation Areas in China Using High-Resolution Satellite Datasets

: Sugarcane is an important sugar and biofuel crop with high socio-economic importance, and its planted area has increased rapidly in recent years. China is the world’s third or fourth sugarcane producer. However, to our knowledge, no study has investigated the mapping of sugarcane cultivation areas across entire China. In this study, we developed a phenology-based method to identify sugarcane plantations in China at 30-m spatial resolution from 2016–2020 using the time-series of Landsat and Sentinel-1/2 images derived from Google Earth Engine (GEE) platform. The method worked by comparing the phenological similarity in normalized difference vegetation index (NDVI) series between unknown pixels and sugarcane samples. The phenological similarity was assessed using the time-weighted dynamic time warping method (TWDTW), which has less sensitivity to training samples than machine learning methods and therefore can be easily applied to large areas with limited samples. More importantly, our method introduced multiple and moving time standard phenological curves of sugarcane to the TWDTW by fully considering the variable crop life-cycle of sugarcane, particularly its long harvest season spanning from December to March of the following year. Validations showed the method performed well in 2019, with overall accuracies of 93.47% and 92.74% for surface reﬂectance (SR) and top of atmosphere reﬂectance (TOA) data, respectively. The sugarcane maps agreed well with the agricultural statistical areas from 2016–2020. The mapping accuracies using TOA data were comparable to SR data in 2019–2020, but outperformed SR data in 2016–2018 when SR data had lower availability on GEE. The sugarcane maps produced in this study can be used to monitor growing conditions and production of sugarcane and, therefore, can beneﬁt sugarcane management, sustainable sugarcane production, and national


Introduction
Sugarcane (Saccharum officinarum) is a semi-perennial crop with high socio-economic importance planted across tropical and subtropical regions of the world [1]. Sugarcane can be used to produce sugar and provides over 70% of the world's sugar production [2,3]. Sugarcane residues are rich in cellulose and hemicellulose, which can be used to produce bioethanol and other liquid transportation biofuels [4]. Sugarcane biofuel is an alternative energy source that can contribute to reducing greenhouse gases (GHG) emissions associated with fossil fuel combustion [5][6][7]. For example, sugarcane bioethanol could reduce fossil fuel GHG emissions by 85% in Brazil [8]. Sugarcane expanded rapidly in the past decades by

Landsat Data and Sentinel-2 Data
To test the capability of the model in the identification of sugarcane by using atmospherically corrected optical data from the SR and TOA data, we used all the available Landsat and Sentinel-2 TOA data and SR data at the Google Earth Engine (GEE) platform

Landsat Data and Sentinel-2 Data
To test the capability of the model in the identification of sugarcane by using atmospherically corrected optical data from the SR and TOA data, we used all the available Landsat and Sentinel-2 TOA data and SR data at the Google Earth Engine (GEE) platform to generate 16-day composite 30 m normalized difference vegetation index (NDVI) series from 2016 to 2020. We used the Landsat-7 Enhanced Thematic Mapper (ETM+) and Landsat-8 Operational Land Imager (OLI) Collection 1 SR and TOA data at 16-day intervals from 2016-2020. The quality bands pixel_qa and BQA were used to remove cloud and cloud shadow contaminations to obtain high-quality SR and TOA images, respectively. We also used the Sentinel-2A/B Multi-Spectral Instrument (MSI) Level-1C (TOA) and Level-2A (SR) data at 5-day intervals from 2016-2020. The quality band QA60 in Level-1C and Level-2A were used to remove cloud and cloud shadow contaminations. The Level-2A Sentinel-2 data on GEE were derived from the Sentinel Hub generated from Level-1C using the Sen2Cor processor. These data were produced by the European Space Agency (ESA) over Europe since March 2018 and extended to global since December 2018. The Level-2A data can also be derived from Level-1C using the Sentinel Toolbox provided by ESA. However, the toolbox is not directly available in GEE and the process needs huge computation and storage when applied to a large scales. Therefore, in China, the Level-2A (SR) data cannot be obtained directly from either GEE or ESA at large scales before December 2018.
The 16-day composite NDVI were produced by selecting the maximum value with cloud-free observation in each 16-day period. NDVI can be calculated from the reflectance of near-infrared band (ρNIR) and red band (ρRED): We further filled the cloud gaps in the 16-day NDVI using the linear interpolation method, and filtered the filled NDVI using the Whittaker smoother to eliminate noises [49,50]. The number of high-quality observations (cloud-free observations) for the 16-day composite SR and TOA data for each year from 2016 to 2020 are shown in Figure 2. For TOA data, the majority of the pixels (over 60%) had more than 17 high-quality observations for each year from 2018 to 2020, while the high-quality observations were fewer for each pixel in the years 2016 and 2017 for the absence of Sentinel-2B data (Sentinel-2B was launched on March 2018) (Figure 2a-e). For SR data, the number of high-quality observations was comparable to the TOA data from 2019 to 2020, while it greatly decreased in 2016-2018 with most of the pixels had fewer than 14 high-quality observations (Figure 2f-j). The main cause for this is that, at the time of this study, the Sentinel-2A/B SR data for China were not available on the GEE platform before December 2018.

Sentinel-1 Data
SAR data from Sentinel-1 were used to separate banana from sugarcane (see Section 2.3.3). The Sentinel-1 mission provides data from a dual-polarization C-band SAR instrument at 5.405 GHz at a spatial resolution of 10 m. The data were processed using the Sentinel-1 Toolbox to generate a calibrated, ortho-corrected product by ESA. The refined Lee filter method was used to remove the speckle noises on the SAR images [51]. Finally, we obtained the averaged value of the VH (vertical transmit/horizontal receive) polarization band during the growing season of sugarcane (from June to November) for each year in 2016-2020 from the GEE platform.

Field Data and Agricultural Statistical Data
Field data were used for training and validating the proposed sugarcane identification method. We collected the field data from two sources. First, field surveys were conducted in the four major sugarcane production provinces in the year 2019, and the vegetation types and their locations were marked using a hand-held GPS (G120, UniStrong, Beijing, China). Second, sample points were selected and labelled as sugarcane or non-sugarcane according to the color and textures on the high-resolution images from Google Earth. To further guarantee the accuracy of the labels, we cross-checked each sample using their NDVI series curve to verify the crop type (i.e., sugarcane or non-sugarcane). Finally, for the year 2019, we collected a total of 5764 samples, of which 1596 were for sugarcane and 4168 for non-sugarcane ( Figure 1).
We further filled the cloud gaps in the 16-day NDVI using the linear interpolation method, and filtered the filled NDVI using the Whittaker smoother to eliminate noises [49,50]. The number of high-quality observations (cloud-free observations) for the 16-day composite SR and TOA data for each year from 2016 to 2020 are shown in Figure 2. For TOA data, the majority of the pixels (over 60%) had more than 17 high-quality observations for each year from 2018 to 2020, while the high-quality observations were fewer for each pixel in the years 2016 and 2017 for the absence of Sentinel-2B data (Sentinel-2B was launched on March 2018) (Figure 2a-e). For SR data, the number of high-quality observations was comparable to the TOA data from 2019 to 2020, while it greatly decreased in 2016-2018 with most of the pixels had fewer than 14 high-quality observations (Figure 2fj). The main cause for this is that, at the time of this study, the Sentinel-2A/B SR data for China were not available on the GEE platform before December 2018.  this study and were replaced with the year 2019. The sugarcane planted area at the county level could not be found for Yunnan. For Hainan Province, the administrative regions of municipalities and counties are not separated, therefore we used all the statistical data at the municipalities and counties in the validation at both the municipal and county levels.

Methods
As a typical and commonly used vegetation index in landcover and land use classification, NDVI has been successfully applied to the TWDTW in crop identification in several studies [23,42]. In our study, we used NDVI time series to represent the phenological changes of vegetation in TWDTW. The workflow includes the following steps: (1) preprocess of the satellite data, including optical data (i.e., Landsat 7/8 and Sentinel-2) and SAR data (Sentinel-1) (Section 2.2.1; Section 2.2.2); (2) extract and generate the standard NDVI curves (i.e., the averaged NDVI temporal patterns) of randomly selected 100 sugarcane samples from the 16-day composite NDVI time series for SR and TOA data; (3) develop the mapping method to identify the sugarcane planted area in China based on the TWDTW; (4) validate the sugarcane maps at point and regional levels in different provinces and years.

Time-Weighted Dynamic Time Warping (TWDTW) Method
The DTW was originally proposed for speech recognition [52]. The DTW compares the similarity between two sequences, such as the unknown sequence (Y) and target sequence (X), through warping the unknown sequence (Y) to search for their optimal path and obtain their minimum distance, namely the similarity (dissimilarity). The TWDTW is an improved version of the DTW method by adding a temporal weight accounting for phenological changes [43]. The TWDTW method includes three main steps in land cover and land use classification: (1) generation of the standard phenological curves (temporal pattern) of the field samples of crops to be classified using NDVI time series; (2) calculation of TWDTW distances between unknown pixels and the standard NDVI curves; (3) identification or classification of the unknown pixels according to the TWDTW distances, in which pixels with lower distances indicate higher similarities and, therefore, higher probabilities of being the target class.

Incorporating the TWDTW Method into Sugarcane Mapping in China
In China, newly planted sugarcane and ratoon sugarcane have similar growing stages and NDVI variations. Most of the sugarcane in China begins to grow from March to April and can be harvested from December to March of the following year, with about a one year life cycle [16,17]. The sugarcane crop cycle in China generally can be divided into four phenological stages, including (1) seeding or germination phase (including ratoon cane), from March to early June with low NDVI; (2) tillering phase, from May to August, with NDVI increasing quickly; (3) grand growth phase, from July to the end of November, with NDVI reaching its peak; (4) ripening and harvesting phase, from November to March of the following year, with NDVI sharply decreasing. The temporal patterns of NDVI, namely standard NDVI curves, for sugarcane in China for both SR and TOA data are shown in Figure 3. The standard NDVI curves were produced as following. (1) Randomly selecting 100 sugarcane samples across China from the field samples. (2) Extracting their NDVI curves from the 16-day composited NDVI time series in 2019/2020 crop year for both SR and TOA data. (3) Adjusting the timeline of each NDVI curves manually by moving forward or back 16 days (or 32 days) to ensure all the 100 NDVI curves generally in the same growing period. Because the majority of these NDVI curves are consistent with each other, we only need to adjust a very small number of them manually to let their germination phase, harvesting phase, and peak of growing season are relatively in the same 16-day period with the majorities. (4) Calculating their averaged NDVI values to obtain the standard NDVI curves for both SR and TOA data ( Figure 3).
crop year for both SR and TOA data. (3) Adjusting the timeline of each NDVI curves manually by moving forward or back 16 days (or 32 days) to ensure all the 100 NDVI curves generally in the same growing period. Because the majority of these NDVI curves are consistent with each other, we only need to adjust a very small number of them manually to let their germination phase, harvesting phase, and peak of growing season are relatively in the same 16-day period with the majorities. (4) Calculating their averaged NDVI values to obtain the standard NDVI curves for both SR and TOA data ( Figure 3). The TWDTW method was applied to sugarcane planted area identification in China with the following processes for both the SR and TOA data. (1) Generation of multiple standard curves from the standard NDVI curve. Because the germination and harvest times may vary in different areas and fields, we tested the result in each province using multiple and moving time standard curves. Namely, the standard curves of sugarcane in Figure 3 were moved forward and back by 16 days and 32 days, and therefore five The TWDTW method was applied to sugarcane planted area identification in China with the following processes for both the SR and TOA data. (1) Generation of multiple standard curves from the standard NDVI curve. Because the germination and harvest times may vary in different areas and fields, we tested the result in each province using multiple and moving time standard curves. Namely, the standard curves of sugarcane in Figure 3 were moved forward and back by 16 days and 32 days, and therefore five standard curves were derived from the original standard curves for both SR and TOA data ( Figure 4). (2) Calculation of the minimum TWDTW distances. For each unknown pixel, all the five NDVI distance (NDVI dis ) and their minimum distance were calculated using the TWDTW method based on the five standard NDVI curves in Figure 4 for each year in 2016-2020. (3) Calculation of the NDVI difference (NDVI diff ) between sugarcane grand growth period and harvest season. The NDVI of sugarcane is high in its grand growth period and decreases quickly during harvest season, which can be a good feature to help further separate sugarcane from other vegetation types that have no discernible NDVI decrease during the harvest season of sugarcane (e.g., evergreen forest). Therefore, for each pixel, we calculated the "difference" between the maximum NDVI (NDVI max ) in sugarcane grand growth period and the minimum NDVI (NDVI min ) in sugarcane harvest season. The NDVI max was calculated as the averaged value of the three maximum 16-day NDVI from July to December (namely from the 161-day to the 321-day, shadowed with orange in Figure 4); and the NDVI min was calculated as the two minimum 16-day NDVI form December to April in the next year (namely from the 337-day to the 113-day in the next year, shadowed with grey in Figure 4). From 100 randomly selected sugarcane samples, we found most of the sugarcane samples had higher NDVI diff than 0.36. Therefore, pixels with NDVI diff less than 0.36 were labelled as non-sugarcane, and pixels with NDVI diff greater than 0.36 were selected for further identification. (4) Identification and classification to obtain sugarcane maps. Pixels with lower dissimilarities (NDVI dis ) have a higher probability of being sugarcane. Agricultural statistical areas of sugarcane at the province level were used to determine the threshold of NDVI dis (T-NDVI dis ). Pixels with NDVI dis less than T-NDVI dis were recognized as sugarcane, with the condition that the identified area of sugarcane should be equal to the statistical area of sugarcane in the studied province. For each province, except Guangxi, municipalities with small areas of sugarcane plantation (less than 1000 ha or 1% plantation areas of the studied province) were identified separately to obtain higher accuracies.
classification to obtain sugarcane maps. Pixels with lower dissimilarities (NDVIdis) have a higher probability of being sugarcane. Agricultural statistical areas of sugarcane at the province level were used to determine the threshold of NDVIdis (T-NDVIdis). Pixels with NDVIdis less than T-NDVIdis were recognized as sugarcane, with the condition that the identified area of sugarcane should be equal to the statistical area of sugarcane in the studied province. For each province, except Guangxi, municipalities with small areas of sugarcane plantation (less than 1000 ha or 1% plantation areas of the studied province) were identified separately to obtain higher accuracies.  Multiple standard NDVI curves for sugarcane in China for both SR and TOA data. The orange areas are the periods used to calculate maximum NDVI (NDVI max ) in sugarcane grand growth period and the grey areas are the periods used to calculate the minimum NDVI (NDVI min ) in sugarcane harvest season.

Separating Banana from Sugarcane
Banana is a multi-year growing fruit exhibiting NDVI curves similar to those of sugarcane. Both banana and sugarcane have low NDVI at the start of the growing season (from March to May) and the end of the growing season (from December to March of the following year) because of harvest (Figure 5a). Where both crops are grown, some banana pixels may be misclassified as sugarcane. While compared with sugarcane, banana has larger leaves and branches with a rougher canopy surface and therefore exhibits larger backscatter coefficients (e.g., in VH polarization) during the growing season. Therefore, we calculated the mean value of VH from Sentinel-1 images from June to November for field samples of sugarcane and banana. The histogram shows that most of the sugarcane has an averaged VH less than −13.3 and most of the banana has an averaged VH greater than −13.3 (Figure 5b). Therefore, we set the threshold of the averaged VH from June to November as −13.3 to separate banana from sugarcane. Pixels with averaged VH greater than −13.3 were identified as non-sugarcane.

Separating Banana from Sugarcane
Banana is a multi-year growing fruit exhibiting NDVI curves similar to those of sugarcane. Both banana and sugarcane have low NDVI at the start of the growing season (from March to May) and the end of the growing season (from December to March of the following year) because of harvest (Figure 5a). Where both crops are grown, some banana pixels may be misclassified as sugarcane. While compared with sugarcane, banana has larger leaves and branches with a rougher canopy surface and therefore exhibits larger backscatter coefficients (e.g., in VH polarization) during the growing season. Therefore, we calculated the mean value of VH from Sentinel-1 images from June to November for field samples of sugarcane and banana. The histogram shows that most of the sugarcane has an averaged VH less than −13.3 and most of the banana has an averaged VH greater than −13.3 (Figure 5b). Therefore, we set the threshold of the averaged VH from June to November as −13.3 to separate banana from sugarcane. Pixels with averaged VH greater than −13.3 were identified as non-sugarcane.

Accuracy Assessment
The model accuracy for sugarcane mapping was validated at both pixel and regional levels. At the pixel level, we assessed the sugarcane map by comparing it with the field samples derived from field surveys and high-resolution images from Google Earth using

Accuracy Assessment
The model accuracy for sugarcane mapping was validated at both pixel and regional levels. At the pixel level, we assessed the sugarcane map by comparing it with the field samples derived from field surveys and high-resolution images from Google Earth using the indices of producer's accuracy (PA), user's accuracy (UA), and overall accuracy (OA). At the regional level, the identified sugarcane areas were calculated and compared with the agricultural statistical area at both municipal and county levels. The coefficient of determination (R 2 ), RMSE (root mean squared error), and RMAE (relative mean absolute error) between the estimated areas and agricultural statistical areas of sugarcane were adopted to evaluate the mapping accuracy. The RMSE can be calculated as: The RMAE is the ratio of MAE (mean absolute error) to the averaged value of the agricultural statistical areas for all the administrative regions: whereŜ i and S i are the estimated area and agricultural statistical area of sugarcane for the ith administrative region, respectively; n is the number of administrative regions included in the validation. The MAE can be calculated as: The MAE represents the accumulated absolute errors, and the RMSE reflects the deviation. Compared with MAE, RMSE is more sensitive to outliers.

Results
Annual sugarcane maps in the four major production provinces (Guangxi, Yunnan, Guangdong, and Hainan) in China from 2016-2020 were produced using the TWDTW method. We show year 2019 with SR data as an example in Figure 6. Based on the validations at pixel level using field samples from field surveys and Google Earth in 2019, the producer's, user's, and overall accuracies were 86.76%, 88.30%, and 93.47% with SR data; and 85.63%, 86.73%, and 92.74% with TOA data for sugarcane in China, respectively. The performance in different provinces decreased with the area of sugarcane, that is province with larger sugarcane area or proportion generally showed higher performance (Table 1). For example, Guangxi, the province with the largest sugarcane plantation (accounting for about 64.01% of the sugarcane in China), showed high producer's, user's, and overall accuracies of 87.66%, 89.25%, and 94.46% with SR data; and 86.31%, 87.89%, and 93.81% with TOA data, respectively (Table 1). Yunnan (17.70%) and Guangdong (12.20%) showed moderate performance, with an overall accuracy of 89.32% derived from SR data and 88.64% derived from TOA data for Yunnan, and 90.43% derived from SR data and 88.78% derived from TOA data for Guangdong (Table 1). Hainan, accounting for only 1.32% of the sugarcane in China, exhibited low performance (with an overall accuracy of 87.33% derived from SR data and 86.33% derived from TOA data) compared with other provinces. All the investigated provinces exhibited overall accuracies (OA) over 86% with both SR and TOA data (    In addition, our method accurately estimated the planted area of sugarcane compared with agricultural statistical data at the municipal and county levels. The R 2 were 0.97 and 0.92 at the municipal and county levels, respectively, and the respective RMAEs were 23.7% and 38.2% with SR data (Figure 7a,c). The model performance by using TOA data was comparable to that with SR data, with R 2 of 0.95 and 0.91 at the municipal and county levels, respectively, and the respective RMAEs were 26.6% and 40.0% (Figure 7b,d). Correlations between the estimated and agricultural statistical planted areas of sugarcane exhibited differently in different provinces at the municipal and county levels. At the municipal level, the R 2 ranged from 0.80 (Yunnan) to 0.99 (Guangdong) and the RMAE ranged from 19.4% (Guangxi) to 42.7% (Hainan) with SR data. And the R 2 ranged from 0.73 (Yunnan) to 0.98 (Guangdong) and the RMAE ranged from 21.4% (Guangxi) to 43.2% (Hainan) with TOA data (Figure 8a1-h1). At the county level, the R 2 ranged from 0.82 (Hainan) to 0.96 (Guangdong) and the RMAE ranged from 35.7% (Guangxi) to 51.1% (Guangdong) with SR data. And the R 2 ranged from 0.83 (Hainan) to 0.94 (Guangdong) and the RMAE ranged from 37.1% (Guangxi) to 55.0% (Guangdong) with TOA data (Figure 8a2-h2). Correlations between the estimated and agricultural statistical planted areas of sugarcane exhibited differently in different provinces at the municipal and county levels. At the municipal level, the R 2 ranged from 0.80 (Yunnan) to 0.99 (Guangdong) and the RMAE ranged from 19.4% (Guangxi) to 42.7% (Hainan) with SR data. And the R 2 ranged from 0.73 (Yunnan) to 0.98 (Guangdong) and the RMAE ranged from 21.4% (Guangxi) to 43.2% (Hainan) with TOA data (Figure 8a). At the county level, the R 2 ranged from 0.82 (Hainan) to 0.96 (Guangdong) and the RMAE ranged from 35.7% (Guangxi) to 51.1% (Guangdong) with SR data. And the R 2 ranged from 0.83 (Hainan) to 0.94 (Guangdong) and the RMAE ranged from 37.1% (Guangxi) to 55.0% (Guangdong) with TOA data (Figure 8b). Correlations between the estimated and agricultural statistical planted areas of sugarcane exhibited differently in different provinces at the municipal and county levels. At the municipal level, the R 2 ranged from 0.80 (Yunnan) to 0.99 (Guangdong) and the RMAE ranged from 19.4% (Guangxi) to 42.7% (Hainan) with SR data. And the R 2 ranged from 0.73 (Yunnan) to 0.98 (Guangdong) and the RMAE ranged from 21.4% (Guangxi) to 43.2% (Hainan) with TOA data (Figure 8a). At the county level, the R 2 ranged from 0.82 (Hainan) to 0.96 (Guangdong) and the RMAE ranged from 35.7% (Guangxi) to 51.1% (Guangdong) with SR data. And the R 2 ranged from 0.83 (Hainan) to 0.94 (Guangdong) and the RMAE ranged from 37.1% (Guangxi) to 55.0% (Guangdong) with TOA data (Figure 8b). The performance of the method also varied in different years (Figures 9 and 10). For the SR data, the R 2 generally increased with the year from 2016-2020, with lower R 2 in 2016-2018 (the lowest R 2 was 0.83 in 2016) and higher R 2 in 2019 and 2020 (the highest R 2 was 0.97 in 2019) for the entire China at the municipal level (Figure 9a). The RMAE showed the opposite trend in value to R 2 , with higher RMAE in 2016-2018 (the highest RMAE was 45.65% in 2016) and lower RMAE in 2019 and 2020 (the lowest RMAE was 23.66% in 2019) for the entire China at the municipal level (Figure 9e). Similar to SR data, for the TOA data, the R 2 generally increased with the year from 2016-2020, with the lowest R 2 (0.85) in 2016 and the highest R 2 (0.95) in 2019, and the RMAE generally decreased with the year from 2016-2020, with the highest RMAE (42.87%) in 2016 and the lowest RMAE (26.57%) in 2019 (Figure 9b,f). Additionally, at the county level, the identification using TOA data generally showed good performance from 2016-2020. However, the accuracy using SR data were high in 2019 and 2020 but a bit lower from 2016-2018 ( Figure 10).

Discussion
Sugarcane is distributed in tropical and subtropical areas of the world, with different phenological characteristics in different fields, regions, and countries. For example, sugarcane in Brazil has a long harvest season, spanning from April to December in the southcentral Brazil and spanning from September to April of the following year in northeast

Discussion
Sugarcane is distributed in tropical and subtropical areas of the world, with different phenological characteristics in different fields, regions, and countries. For example, sugarcane in Brazil has a long harvest season, spanning from April to December in the south-central Brazil and spanning from September to April of the following year in northeast Brazil [35]. In China, on the other hand, sugarcane enters the ripening stage in the November and can be harvested thereafter until March of the following year [31]. The long harvest season and variable phenological characteristics of sugarcane make it difficult to map sugarcane plantation areas at large scales. With the rapid expansion of sugarcane in recent years, studies appeared on sugarcane identification at local or regional scale. The first high resolution (30 m) sugarcane map in Brazil was generated for São Paulo for the 2003/04 crop year and updated annually using visual/manual image interpretation [35,53].
China is the world's fourth largest sugarcane-producing country, accounting for about 5.64% of the world's sugarcane production in 2019 [14]. In the recent ten years (2010-2019), the planted area of sugarcane in China decreased by about 14.35%, while its production increased by about 3.21% as a result of increasing yield (20.51%) [48]. In the recent four years, the planted area of sugarcane in China remained relatively stable (decreasing by about 0.78%), while its production and yield significantly increased by 5.98% and 6.18%, respectively [48]. Therefore, mapping sugarcane plantations in China is helpful to detect the mechanism of sugarcane plantation and yield variations in China. Some studies attempted to investigate sugarcane plantations in China at the local or regional scales. Tan et al. [17] extracted the sugarcane planted areas in Guangxi Province using maximum likelihood method and MODIS data. Jiang et al. [31] mapped the sugarcane in Zhanjiang City in Guangdong using RF and XGBoost classification methods based on Sentinel-1A and Sentinel-2 data. Wang et al. [40] identified the sugarcane in Guangxi Province using the thresholds of NDVI and phenology periods in different growing stages. However, to our knowledge, there are no moderate to high resolution sugarcane distribution maps all over the entire China. This is probably because machine learning methods strongly require extensive training samples, which are difficult to obtain [20,36,37]. Traditional phenology-based methods need fewer samples for training, but need local training samples to calibrate several thresholds when applied to other regions or years [40].
We proposed a phenology-based sugarcane identification method using the TWDTW method and our results showed a good performance in identifying planted areas of sugarcane. We also compared the performance of the method by using SR and TOA optical data from 2016-2020. Classifications based on high resolution optical data from the GEE platform often use TOA data instead of SR data mainly because the Sentinel-2A/B SR data before 2019 are not available on the GEE platform [40,54]. Therefore, it is necessary to examine whether it is suitable to use TOA data instead of SR data in sugarcane mapping on a large scale in our study. Study indicated that, in many applications involving classification and change detection, the atmospheric correction was unnecessary as long as the training data and the data to be classified were in the same relative scale [55]. Our study exhibited that the mapping accuracy of sugarcane was comparable by using SR and TOA data in 2019-2020, when there were sufficient data with high-quality observations (Figures 2 and 11). However, the mapping accuracy of sugarcane using SR data was lower than TOA data in 2016-2018, with lower R 2 and slope and higher RMSE and RMAE, when there were not enough SR images (Figures 2 and 11). Our study demonstrated that, TOA data can be used to identify sugarcane plantation areas based on the proposed method when there are not enough available SR data.
Overall, the method developed in our study can identify the sugarcane planted areas automatically and effectively with limited training samples for China, but there remain potential uncertainties in our proposed method. For example, the sugarcane maps produced in our study contain some isolated pixels. As shown in Figure 12, sugarcane patches with only one pixel account for 5.52% (Guangxi), 8.85% (Hainan), 13.37% (Guangdong), 14.74% (Yunnan) of the sugarcane planted area with SR data, and account for 6.12% (Guangxi), 9.23% (Hainan), 11.41% (Guangdong), 11.68% (Yunnan) of the sugarcane planted area with TOA data in the year 2019. On the one hand, south China is located in the tropical and subtropical areas charactered with frequent rainy and cloudy weather, therefore speckle saltand-pepper effects may exist in the sugarcane maps due to poor quality optical images [16]. In future study, applying an object-based image segmentation technique to our proposed method instead of the pixel-based method may reduce the salt-and-pepper effects in the sugarcane maps [23]. Furthermore, image fusion with high revisit MODIS data or cloud-free SAR data to reconstruct high quality NDVI time series may help to reduce the salt-and-pepper effects and improve mapping performance [56][57][58]. On the other hand, the landscape in south China is also highly heterogeneous and fragmented, with complex sugarcane plantation patterns. Some of the sugarcane is planted by small farm holders and is scattered in small patches. For example, the most sugarcane planted municipality in Guangdong is Zhanjiang, accounting for about 81.98% (139,100 ha) of the sugarcane plantations in Guangdong, while the remaining 18.02% (30,580 ha) of sugarcane plantations are dispersed in other municipalities [48]. The mapping accuracy at 30 m spatial resolution could be limited by the spectral mixing with other types of vegetation in the heterogeneous or scattered planted areas [27]. Additionally, directly comparing with other prevailing methods, such as machine learning, can help to deepen understanding of the method in our study and therefore reduce uncertainties. But the machine learning methods rely on plenty of samples which are difficult to apply to such a large region in current study. methods strongly require extensive training samples, which are difficult to obtain [20,36,37]. Traditional phenology-based methods need fewer samples for training, but need local training samples to calibrate several thresholds when applied to other regions or years [40].
We proposed a phenology-based sugarcane identification method using the TWDTW method and our results showed a good performance in identifying planted areas of sugarcane. We also compared the performance of the method by using SR and TOA optical data from 2016-2020. Classifications based on high resolution optical data from the GEE platform often use TOA data instead of SR data mainly because the Sentinel-2A/B SR data before 2019 are not available on the GEE platform [40,54]. Therefore, it is necessary to examine whether it is suitable to use TOA data instead of SR data in sugarcane mapping on a large scale in our study. Study indicated that, in many applications involving classification and change detection, the atmospheric correction was unnecessary as long as the training data and the data to be classified were in the same relative scale [55]. Our study exhibited that the mapping accuracy of sugarcane was comparable by using SR and TOA data in 2019-2020, when there were sufficient data with high-quality observations (Figures 2 and 11). However, the mapping accuracy of sugarcane using SR data was lower than TOA data in 2016-2018, with lower R 2 and slope and higher RMSE and RMAE, when there were not enough SR images (Figures 2 and 11). Our study demonstrated that, TOA data can be used to identify sugarcane plantation areas based on the proposed method when there are not enough available SR data. Figure 11. Comparisons of the regional performance (R 2 , slope, RMSE, and RMAE) of the method in identification sugarcane planted areas with SR and TOA data at the municipal level for China from 2016 to 2020.
Overall, the method developed in our study can identify the sugarcane planted areas automatically and effectively with limited training samples for China, but there remain Figure 11. Comparisons of the regional performance (R 2 , slope, RMSE, and RMAE) of the method in identification sugarcane planted areas with SR and TOA data at the municipal level for China from 2016 to 2020.
(30,580 ha) of sugarcane plantations are dispersed in other municipalities [48]. The mapping accuracy at 30 m spatial resolution could be limited by the spectral mixing with other types of vegetation in the heterogeneous or scattered planted areas [27]. Additionally, directly comparing with other prevailing methods, such as machine learning, can help to deepen understanding of the method in our study and therefore reduce uncertainties. But the machine learning methods rely on plenty of samples which are difficult to apply to such a large region in current study.

Conclusions
Large area sugarcane identification is difficult because of the geographically variable phenology characteristics of sugarcane. In our study, we developed a phenology-based method to identify sugarcane plantation areas in China using Landsat-7/8, Sentinel-1, and Sentinel-2 images. The proposed method can identify sugarcane cultivation areas effectively and accurately with limited training sample data. Sugarcane maps were produced for China at 30 m spatial resolution from 2016 to 2020 with both SR and TOA data. Validations against field samples and agricultural statistical data showed the maps were in high accuracy. The sugarcane maps produced in our study are applicable to monitor the

Conclusions
Large area sugarcane identification is difficult because of the geographically variable phenology characteristics of sugarcane. In our study, we developed a phenology-based method to identify sugarcane plantation areas in China using Landsat-7/8, Sentinel-1, and Sentinel-2 images. The proposed method can identify sugarcane cultivation areas effectively and accurately with limited training sample data. Sugarcane maps were produced for China at 30 m spatial resolution from 2016 to 2020 with both SR and TOA data. Validations against field samples and agricultural statistical data showed the maps were in high accuracy. The sugarcane maps produced in our study are applicable to monitor the growing conditions and yields of sugarcane, and contribute to sustainable sugarcane production. Of course, there are uncertainties in our sugarcane maps. In the future, integrating other techniques to the proposed method, such as the object-based image segmentation technique or image fusion with high revisit MODIS data or cloud-free SAR data, may help to improve the model performance and obtain sugarcane maps with higher accuracy and longer time period.