Collaborative Extraction of Paddy Planting Areas with Multi-Source Information Based on Google Earth Engine: A Case Study of Cambodia

: High-precision spatial mapping of paddy planting areas is very important for food security risk assessment and agricultural monitoring. Previous studies have mainly been based on multisource satellite imagery, the fusion of Synthetic Aperture Radar (SAR) with optical data, and the combined use of multi-scale and multi-source sensors. However, there have been few studies on paddy spatial mapping using collaborative multi-source remote sensing product information, especially in tropical regions such as Southeast Asia. Therefore, based on the Google Earth Engine (GEE) platform, in this study, Cambodia, which is dominated by agriculture, was taken as the study area, and an extraction scheme for paddy planting areas was developed from collaborative multisource information, including multi-source remote sensing images (Sentinel-1 and Sentinel-2), multisource remote sensing land cover products (GFSAD30SEACE, GLC_FCS30-2015, FROM_GLC2015, SERVIR MEKONG, and GUF), paddy phenology information, and topographic features. Evaluation and analysis of the extraction results and the SERVIR MEKONG and ESACCI-LC paddy products revealed that the accuracy of the paddy planting areas extracted using the proposed method is the highest, with an overall accuracy of 89.90%. The results of the proposed method are better than those of the other products in terms of the outline of the paddy planting areas and the description of the road information. The results of this study provide a reference for future high-precision paddy mapping. products, and the results are better for different paddy landscape patterns. The results demonstrate that the paddy planting area overall of the is 89.90%, and the spatial expression ability for planting areas is good under different landscape patterns, the feasibility of the proposed The results of the and demonstrate that the paddy extraction results obtained using the proposed method are superior to those of the SERVIR MEKONG and ESACCI-LC products. This shows that the method of using multi-source information for cooperative extraction of paddy planting areas proposed in this has great applicability and superiority.


Introduction
The timely and accurate acquisition of the spatial distribution of paddy planting areas, on which humans depend for survival, is of great significance for agricultural monitoring, paddy yield estimation, and other research [1,2]. Paddy is the most important food security crop in Southeast Asia, but most paddy is planted in the rainy season, and the cloudy weather in the rainy season seriously affects the use of optical remote sensing images, which in turn affects the accurate extraction of paddy planting areas [3]. Synthetic Aperture Radar (SAR) can observe the ground surface in near real-time and can reflect basic information about an area, such as the geometric structure of the land cover and the roughness of the vegetation canopy [4]. However, compared with optical remote sensing images, SAR data are very often used in agricultural mapping and monitoring [5,6]. The Sentinel series of freely available satellite data, which has high temporal and spatial resolutions, Free Sentinel-1 SAR and Sentinel-2 multispectral data provide the possibility for paddy extraction in Cambodia, which contains a large paddy planting area and is susceptible to cloud cover. Studies have shown that the Vertical Horizontal (VH) polarization backscattering of the Sentinel-1 SAR is more sensitive to paddy growth [16,18]. In addition, the Interferometric Wide swath (IW) mode is the main mode of operation according to Sentinel-1's mission requirements [19], which supports its application in natural resources such as agriculture and forestry [20]. In order to meet the image quality and scan width requirements, IW mode is implemented as the ScanSAR mode with a progressive azimuth scan. IW mode combines a large mapping zone width (250 km) with a high geometric resolution (5 m × 20 m above ground). Using this technique, the radar beam scans back and forth three times along a strip, allowing the production of a very high quality and uniform SAR image. Sentinel-1 VH polarization and IW level-1 Ground Distance image (GRD) data were used as the remote sensing data sources in this study. The Sentinel-2 data include Level-1C top-of-atmosphere reflectance products and Level-2A bottom-of-atmosphere reflectance data. The image block is a 100 × 100 km 2 ortho image projected onto the UTM/WGS84 coordinate system. In this study, the Level-2A products of Sentinel-2 were selected for use in the extraction of the paddy growing areas. Tables 1 and 2 present the detailed information about Sentinel-1 and Sentinel-2 data used in this study.  The Sentinel-1 data provided by the GEE (Figure 2) have been processed, including orbit correction, GRD boundary noise cancellation, thermal noise cancellation, radiometric calibration, and terrain correction. The unit of the radar backscattering coefficient is decibels (dB). The Sentinel-2 data provided by the GEE have been preprocessed, including atmospheric correction. Based on this, in this study, the preprocessing included Sentinel-2 data cloud removal and region clipping. First, the Sentinel-2 data package contains cloud mask band QA60 with a resolution of 60 m. The dense clouds and cirrus clouds are labeled as binary positions 10 and 11, respectively. Based on this, the QA10 and QA20 bands are resampled to resolutions of 10 m and 20 m, respectively. In the corresponding binary bit, 0 indicates no clouds, and 1 indicates that there are clouds. Therefore, the GEE can be used to facilitate cloud removal using the band logic operation of the image. Finally, the clip function in the GEE was used to obtain Sentinel-2 image data covering the study area.

Multi-Source Land Cover Data
In order to solve the problem of the use of time-consuming and labor-intensive training samples, in this study, the acquisition of the paddy extraction training samples was conducted using a combination of multi-source remote sensing land cover products. The land cover products used included GLC_FCS30-2015 [21], FROM_GLC2015 [22,23], SERVIR MEKONG [24,25], Global Food Security-Support Analysis Data Cropland Extent 2015 Southeast and Northeast Asia 30 m (GFSAD30SEACE) [26], and Global Urban Footprint (GUF) (https://www.dlr.de/eoc/en/desktopdefault.aspx/tabid-11725/20508_read-4794 4/) (accessed on 6 December 2021) [27,28]. According to the production instructions of each product, the overall accuracies of GLC_FCS30-2015, FROM_GLC2015, GFSAD30SEACE, and GUF products are 81.4%, 77.3%, 88.6%, and 93.96%, respectively, based on the absolute accuracy evaluation of the validation samples. The overall accuracy of the SERVIR MEKONG product is currently unpublished. Among them, the spatial resolution of the GUF product is 12 m, and those of the other products are 30 m. The GUF product was produced in the period 2011-2012, and the other products (GFSAD30SEACE, GLC_FCS30-2015, FROM_GLC2015, and SERVIR MEKONG) were produced in 2015. According to previous studies [29], natural ecosystems often undergo significant changes during periods of 10 years or longer. Therefore, the timeframes of the data selected are adequate for use in this study. After obtaining the land cover products for the study area, the WGS84 data and the Albert equal-area projection coordinate system were used as the spatial reference system for the analysis.

Other Auxiliary Data
Paddy planting calendar. The paddy planting calendar information from the Food and Agriculture Organization (FAO) Global Information and Early Warning System (GIEWS) information (http://www.fao.org/giews/countrybrief) (accessed on 8 December 2021) and the RiceAtlas (https://doi.org/10.7910/DVN/JE6R2R) (accessed on 15 December 2021), which is a spatial database of the global paddy calendar and yield produced by Laborte et al. [30], were used in this study. For more information about the RiceAtlas data, please refer to Laborte et al. [30]. In addition, in order to ensure the reliability of the paddy planting calendar information, they were compared with the information obtained from the field investigations and consultation with local residents in 2018 and data obtained from the literature. The results revealed that the paddy planting calendar information from the GIEWS and RiceAtlas is consistent with the reference data. Detailed information about the paddy planting calendar for the study area is presented in Figure 3. Altimeter System (GLAS), and Portable Remote Imaging Spectrometer (PRISM) data in terms of processing and improving the original SRTM data, filling in missing values, and improving the elevation accuracy [31]. In order to improve the computational efficiency, the DEM data and slope data calculated based on the DEM were obtained directly from the GEE platform. Figure 4 shows the overall extraction process for the paddy planting areas. First, in order to quickly obtain the training samples, multi-source remote sensing land cover products were combined to obtain the training samples. Second, using the GEE platform, Sentinel-1 and Sentinel-2 remote sensing images acquired in the dry and rainy seasons were selected based on the phenological information for the study area. The backscattering coefficients of the three key phenological periods (transplanting, heading, and mature) of paddy were extracted based on the Sentinel-1 images. Then, other classification features were extracted from Sentinel-2 images and were classified using the random forest method. Finally, the accuracy of the paddy extraction was qualitatively and quantitatively verified and was compared with existing paddy data products to verify the advantages and disadvantages of the proposed method. Figure 5 shows the overall process used to quickly acquire training samples from the combination of multi-source remote sensing land cover products.  Acquisition samples of forests, water bodies, and artificial surfaces. In this study, using the ArcGIS software (version 10.3 of Esri USA), the spatial superposition method was used to calculate the consistency of the different land cover data for each grid point pixel by pixel. The same number was defined for each type of pixel-by-pixel statistic, and the consistency was divided into different levels from low to high. In this study, samples were collected from the regions with the highest spatial consistency. The spatially consistent regions of the GLC_FCS30-2015, FROM_GLC2015, and SERVIR MEKONG products were obtained via spatial superposition. The results revealed that the consistencies of the water bodies, forests, and artificial surfaces were high, but the consistencies of the other types (such as shrubland) were low, which did not satisfy the requirements for the automatic acquisition of training samples. Therefore, in order to ensure the quality of the samples, the random sampling method was adopted to obtain the training samples from consistent areas of forest and water bodies. For the artificial surfaces, the GUF products and artificial surface consistency area were spatially superimposed, and the artificial surface samples were randomly sampled in the new consistent area.

Acquisition of Training Samples
Paddy sample acquisition. The consistent areas of the GLC_FCS30-2015, FROM_GLC2015, SERVIR MEKONG, and GFSAD30SEACE were spatially superposed with the paddy extraction areas of the SERVIR MEKONG, then the paddy samples were randomly sampled in the consistent areas. In addition, in order to ensure the number of paddy samples, Sentinel-2 images, high-resolution Google Earth images, and field investigation data were used as auxiliary data for the sample encryption to obtain the final paddy training samples.
Acquisition of other types of samples. Due to the small coverage areas of the shrubland, grassland, other cropland, and other land cover types in the study area, the random sampling of sample points based on the consistent areas could not guarantee the sample quality. Therefore, in order to improve the accuracy of the samples, these types of samples were acquired based on Google Earth images, Sentinel-2 images, and field investigation data. Using the above three steps, 1350 training samples covering the study area were finally obtained ( Figure 6 and Table 3).

Feature Extraction
Key phenological characteristics of paddy. The phenological growth characteristics of paddy mainly include the germination stage, seedling stage, and transplanting stage (Figures 7 and 8). During the process of paddy growth and development, the radar backscattering coefficients under different polarization and polarization combinations exhibit different response characteristics [32][33][34]. Therefore, based on the transition of paddy between the different phenological stages, the backscattering coefficient of the VH polarization exhibits different trends, which can be used to obtain the paddy planting areas [35].  The backscattering coefficients of the different land surface types were different in the time series, and the paddy types exhibited obvious phenological characteristics. During the growth and development of paddy, the radar backscattering coefficients under different polarizations and polarization combinations exhibit different response characteristics. Specifically, the VH polarization characteristics of paddy in the transplanting stage are similar to those of water bodies, and the backscattering coefficient is low. With the continuous growth of the paddy, the radar backscattering is mainly from the paddy itself, and the backscattering coefficient increases. When the paddy is in the heading stage and the milk ripening stage, the backscattering coefficient of the paddy reaches its maximum value. However, when the paddy is in the waxy ripening stage to the mature stage, the attenuation effect of the backscattering becomes stronger and the backscattering coefficient begins to decrease. In summary, in the three key phenological stages (transplanting, heading, and mature stages), the backscattering coefficients of paddy are small, large, and small, successively. In this study, the paddy phenology information for Cambodia was used as the time window to enhance the paddy information based on the Sentinel-1 image extraction of the backscattering coefficients in the three different growth stages of paddy in the dry season and rainy season. The detailed steps are as follows. Using the GEE platform, according to the image data for the first growth stage of paddy in the dry season, the minimum backscattering coefficient was extracted pixel by pixel using the image collection function in order to enhance the signal of the paddy transplanting stage. Then, using the image data for the second growth stage during the dry season, the maximum backscattering coefficient was extracted pixel by pixel using the image collection function to enhance the signal of the paddy heading stage. Finally, using the image data for the third growth stage during the dry season, the minimum backscattering coefficient was extracted pixel by pixel using the image collection function to enhance the signal of the mature paddy stage. The method of enhancing the signals of the paddy transplanting stage, heading stage, and mature stage for the rainy season is consistent with that of the dry season. For a single pixel, the backscattering coefficients of the three key phenological stages were extracted using Equations (1)-(3). Through the above steps, the radar backscattering coefficients of six bands were obtained to enhance the extraction of the paddy planting areas.
where σ 0 a , σ 0 b , and σ 0 c are the backscattering coefficients of the transplanting, heading, and mature stages, respectively, and k, m, and n are the numbers of all of the available backscattering coefficients for the transplanting, heading, and ripening stages, respectively.
Other characteristics. The other features used in this study included the thematic index feature, textural feature, and terrain feature. The features of the thematic indexes mainly include the Normalized Difference Vegetation Index (NDVI) [36], Modified Normalized Difference Water Index (MNDWI) [37], and Normalized Difference Building Index (NDBI) [38]. In this study, after resampling the resolution to 10 m, based on previous research results, the annual maximum value of the NDVI was used to highlight the forest information, the annual mean value of the MNDWI was used to highlight the water information, and the annual mean value of the NDBI was used to highlight the building information. Haralick et al. [39] proposed the Gray-Level Co-Occurrence Matrix (GLCM), which is widely used at present and can obtain good classification results, to characterize the textural features of images [40,41]. Previous studies have shown that the four features of contrast, entropy, angle of the second moment, and correlation calculated using the GLCM have good classification effects for remote sensing images and can effectively represent the textural information of the ground objects [42]. Therefore, in this study, the four textural features were calculated based on the B8 band Sentinel-2 data and were introduced into the classification. During the calculation of the above characteristic indices, the data resolution was resampled to 10 m. In addition to the extracted paddy phenology features, thematic index features, and textural features, the DEM and slope (topographic factors) also have an impact on the accuracy of the cropland extraction. Therefore, the DEM and slope were added to the classification to avoid unreasonable classification results.

Classification and Accuracy Evaluation
At present, many remote sensing images classification studies have verified the robustness of the random forest algorithm [43][44][45][46]. Therefore, in this study, the random forest algorithm [47] was used to extract the paddy areas. When building a random forest model using the GEE platform, the number of randomly selected features is the default value of the random forest function, which is the square root of the total number of input features. The number of spanning trees is determined according to the highest classification accuracy obtained through multiple tests. The final value is set to 40, and the default values of the other parameters are used.
The accuracy evaluation was conducted from qualitative and quantitative perspectives. The qualitative evaluation involved comparing the extracted results with high-resolution Google Earth images and Sentinel-2 images to assess the spatial pattern. The quantitative evaluation involved calculating the Overall Accuracy (OA), User Accuracy (UA), Producer Accuracy (PA), and Kappa coefficients using the confusion matrix [48,49]. The validation samples used for the quantitative evaluation were based on the data collected during the field investigation in 2018 and high-resolution Google Earth images. The spatial distribution of the validation samples and photos of typical samples taken in the field are shown in Figures 9 and 10. In addition, the results were compared with the SERVIR MEKONG and European Space Agency Climate Change Initiative Land Cover (ESACCI-LC) [50] products to verify the advantages and disadvantages of the proposed method. More information about the ESACCI-LC products can be found at http://maps.elie.ucl.ac.be/CCI/viewer/ index.php (accessed on 10 December 2021).   Figure 11 shows the spatial distribution of the paddy areas in Cambodia in 2015. The paddy planting areas in Cambodia were mainly distributed in the northwestern, central, and southern regions. In the east and northeast, the paddy planting areas were small and scattered, which is closely related to the topography and water resources in the study area. By comparing the results of this study with the SERVIR MEKONG and ESACCI-LC products for the same period, it was found that the SERVIR MEKONG product has a good spatial consistency with the extraction results obtained in this study. However, the extraction results obtained in this study differ from the SERVIR MEKONG product in the Tonle Sap Lake (https://hydrosheds.org/downloads) (accessed on 18 December 2021) basin and the southern Mekong River (https://hydrosheds.org/downloads) (accessed on 18 December 2021) basin. This may be due to the fact that the two products use different data sources and classification methods. The consistencies of the spatial patterns of the ESACCI-LC product and the other two products are low. The paddy area extracted from the ESACCI-LC was mainly concentrated in the southern Mekong River basin, and fewer paddy areas were located in the vicinity of the Tonle SAP Lake basin. There was a significant leakage phenomenon in this product. The main reason for this phenomenon is that when the classification system of the ESACCI-LC product was developed, the paddy types only included irrigated paddy fields, not rain-fed paddy fields.

Comparative Analysis with Google Earth Images
The four different land cover types were selected based on high-resolution Google Earth images to compare the extraction results with the existing paddy products. As can be seen from Figure 12, the paddy planting areas extracted in this study are superior to the existing paddy products for all four types of surface cover. In regions A and B, there are obvious omissions in the paddy areas derived from the SERVIR MEKONG product, resulting in low estimates. The paddy extraction results of the ESACCI-LC products not only have serious omissions, but they also failed to distinguish paddy areas from water, with obvious characteristics such as spectrum and texture. In regions C and D, the results obtained using the proposed method are also better than the results of the other two products in terms of depicting linear features such as roads and rivers.  The yellow area represents paddy, and the white area represents non-paddy. Figure 13 shows that the spatial representation of the paddy planting areas extracted in this study is superior to those of the other two products, and its ability to depict linear features such as roads and rivers is also slightly better. The SERVIR MEKONG product depicts the large contours and boundaries of the paddy planting areas, but there is a certain degree of leakage. For the ESACCI-LC product, the spatial comparison results are consistent with the aforementioned high-resolution Google Earth images. For all four types of surface cover, the paddy planting area extracted using this product has a serious omission phenomenon. Therefore, it is necessary to be cautious when using these data for relevant research.

Quantitative Evaluation
The results of the paddy extraction were evaluated using data for 792 verification points collected in 2015. Table 4 shows that the OA and kappa values of the paddy areas extracted in this study are 89.90% and 0.792, respectively. According to the corresponding relationship between the kappa value and the classification accuracy [51], the method of extracting the paddy planting areas developed in this study is very good and can meet the requirements of paddy correlation analysis. By comparing the evaluation results with the SERVIR MEKONG and ESACCI-LC products acquired in the same period (Figure 14), it was found that the accuracy of the paddy extraction obtained using the proposed method is significantly superior to those of the other two products, especially the ESACCI-LC. The PA of the SERVIR MEKONG product is lower than the UA, indicating that the product has more omissions than misclassifications, and there is a certain degree of underestimation. The UA of the ESACCI-LC product is significantly higher than the PA, indicating that the product has serious omissions and serious underestimation.

Potential Sources of Uncertainty
The Sentinel satellites, as data sources with finer temporal and spatial scales, have been used for paddy extraction and land cover monitoring research [16,18]. Previous studies have shown that the combination of optical and SAR data can improve the extraction accuracy of land cover information in areas with strong heterogeneity [52,53]. In this study, it was found that the overall accuracy of the paddy planting area is higher than those of the SERVIR MEKONG and ESACCI-LC products, and the results are better for different paddy landscape patterns. The results demonstrate that the paddy planting area extraction method that uses multi-source information developed in this study can not only reduce the time-consuming and labor-consuming training sample acquisition but can also rapidly obtain high-precision paddy planting area results using the GEE platform, so it has good applicability. Therefore, the proposed paddy planting area extraction method based on Sentinel data provides a new strategy for accurately obtaining high resolution paddy planting information.
The paddy planting areas extracted in this study were overestimated. The Sentinel SAR data have different radar backscattering coefficients for the different surface cover types and are sensitive to the geometric and dielectric properties [54,55]. However, some dryland crops and other vegetation types were misclassified as paddy ( Figure 15). The temporal radar backscattering coefficient of paddy is largely a function of the paddy growth stage, and the high temporal frequency of SAR observations can reveal more detailed growth profiles, which improves the ability to identify paddy from other land cover types [56]. However, different planting densities and planting dates for paddy have important effects on the radar backscattering coefficient. In addition, the classification method developed in this study also relies on the three key growth stages of paddy (transplanting stage, heading stage and mature stage), which may increase the error of the extraction results.

Advantages of the GEE
The results of this study demonstrate the feasibility and reliability of using the GEE platform and Sentinel image data to obtain high-resolution paddy data products for Cambodia. The GEE platform has a very large database, huge computing power, and many algorithms, and it is helpful in processing a large number of high-resolution images for the creation of crop maps [57] and for the timely acquisition of different levels of processed image products for direct use. This study reveals the incentive effect of land cover mapping on cloud computing technology in emerging platforms such as GEE.

Suggestions for Future Work on Paddy Mapping
The Sentinel series of satellites provide open data with hyperspectral, high spatial, and temporal resolutions, which is important for tropical and subtropical countries with severe cloud cover. The data fusion techniques and methods are being further developed and applied to land cover mapping [58]. In addition, phenology-based approaches and available Landsat images are increasingly being used to study land cover and land-use changes. For example, Zhong et al. [59] used phenological indicators to map soybean and maize crops. Dong et al. [60] extracted the area of a rubber plantation using Landsat and SAR images and a phenology method. Methods based on phenology have been proven to be effective in mapping paddy using time-series MODIS data [61]. However, this method faces more challenges when applied to tropical areas (such as Southeast Asia) with more clouds [62], because paddy planting in these regions is fragmented and the paddy planting intensity is high [63]. However, the combined application of Landsat and Sentinel-2 data provides an opportunity to produce high-resolution paddy maps based on phenological methods.

Conclusions
Based on the GEE platform, this paper utilized multi-source information, including multi-sensor remote sensing images (Sentinel-1 and Sentinel-2), multi-source remote sensing land cover products (GFSAD30SEACE, GLC_FCS30-2015, FROM_GLC2015, SERVIR MEKONG and GUF), paddy phenology information and topographic factors (DEM and slope) to acquire paddy data products in Cambodia. The results show that the overall accuracy of the paddy extraction is 89.90%, and the spatial expression ability for paddy planting areas is good under different landscape patterns, indicating the feasibility of the proposed method. The results of the qualitative and quantitative analysis demonstrate that the paddy extraction results obtained using the proposed method are superior to those of the SERVIR MEKONG and ESACCI-LC products. This shows that the method of using multi-source information for cooperative extraction of paddy planting areas proposed in this paper has great applicability and superiority.