Monitoring Oasis Cotton Fields Expansion in Arid Zones Using the Google Earth Engine: A Case Study in the Ogan-Kucha River Oasis, Xinjiang, China

Rapid and accurate mapping of the spatial distribution of cotton fields helps to ensure safe cotton production and rational land-resource planning. As cotton is an important economic pillar in Xinjiang, accurate and efficient mapping of cotton fields supports the implementation of the rural revitalization strategy in the region. In this paper, based on the Google Earth Engine cloud computing platform, we use a random forest machine-learning algorithm to classify Landsat 5, Landsat 8 and Sentinel 2 satellite images and obtain the spatial distribution of cotton fields in 2011, 2015 and 2020 in the Ogan-Kucha River oasis, Xinjiang. Unlike previous studies, the mulching process was considered when using cotton field phenology information as a classification feature. The results show that Landsat 5, Landsat 8 and Sentinel 2 can all successfully classify cotton fields when the mulching process is considered, with Sentinel 2 achieving the best user accuracy, 0.947. Sentinel 2 images separate some cotton fields from roads better than Landsat 8 images because of their higher spatial resolution. After the cotton fields were mulched, spectral reflectance increased significantly in the visible, red-edge and near-infrared bands and decreased in the short-wave infrared bands. The increase in the area of oasis cotton fields and the extensive use of water-saving mulched drip irrigation may lead to a decrease in the groundwater level. Overall, mulching is a good phenological feature for classification mapping in cotton-growing areas covered by mulch, and mulched drip irrigation may lead to a decrease in groundwater levels in oases in arid areas.


Introduction
China is the world's largest cotton producer and consumer. China produced a total of 5.91 million tons of cotton in 2020, of which Xinjiang alone produced 5.16 million tons, or 87.3% of the national total. In the same year, China sowed 3169.9 thousand hectares of cotton, of which the largest cotton-producing region, Xinjiang, sowed 2501.9 thousand hectares, or 78.9% [1]. Fast, accurate and efficient mapping of cotton acreage helps to improve the efficiency of land resource use and to make land-use planning more rational. It plays an important role in maintaining food security in Xinjiang, consolidating the economic status of the cotton industry, improving quality and efficiency, and implementing the rural revitalization strategy [2].
Google Earth Engine (GEE) is a cloud-based geographic information processing platform that hosts commonly used remote sensing datasets such as MODIS, Landsat and Sentinel. GEE can acquire and process data using online or offline programming and performs remote sensing data analysis in the cloud, thus avoiding the tedious data downloading and pre-processing required by the traditional remote sensing analysis workflow [3].
A number of studies have demonstrated the high potential of GEE for monitoring farmland, including farmland classification, crop growth monitoring, mapping agricultural acreage and disaster area monitoring [4][5][6][7][8][9]. GEE also stands out in identifying cotton fields. For example, Aneece et al. [10] constructed a data collection from Earth Observing-1 Hyperion hyperspectral images to classify and map major crops in the United States, including cotton, with the aid of the GEE cloud computing platform. Al-Shammari et al. [11] used Landsat 8 images and the Random Forest (RF) model on the GEE platform to assess the ability of a vegetation phenology-based crop type mapping approach to map cotton fields in the cotton growing region of eastern Australia. Their results show that amplitude and phase can be combined for mapping, and that model accuracy increases significantly when amplitude and phase are added as predictor variables rather than using only harmonized NDVI or raw bands. More recently, Yan et al. [12] reported very good results for mapping crop types in two growing regions, Illinois and the Central Valley of California, using a neural network model with Google Street View images. In summary, GEE is highly applicable to identifying cotton crop information, and most studies classify cotton on the basis of cotton boll spectral features. Fewer studies have identified crops on the basis of crop phenology, as this requires a large amount of continuous image data to match the phenological information.
In Xinjiang, under-film drip irrigation has been promoted since the last century to cope with increasing water stress, and under-film drip-irrigated cotton fields account for a very high proportion of the cotton cultivation area [13]. Covering cotton with mulch before growth increases soil temperature and moisture content and suppresses soil salinity, ensuring seed germination and crop growth [14,15]. The feasibility of mapping plastic mulch from satellites has been confirmed by research that mapped mulched farmland from Landsat 8 data using RF algorithms [16]. However, few studies of cotton field identification based on phenological information have explored the effect of film on image classification. The phenological sequence from bare soil to film cover to crop germination may be a useful way to identify cotton fields; therefore, research on this is needed.
This work aims to investigate the potential of this particular phenology information in identifying cotton fields and to provide a reference for identifying and monitoring cotton fields in arid zones. We also explore the differences between Landsat 8 and Sentinel 2 in identifying cotton fields on the GEE platform and explain why phenology helps to identify cotton fields. This study can provide a scientific reference for cotton field identification and monitoring and for the regulation and management of land and water resources in arid areas.

Study Area
Figure 1 shows that the Ogan-Kucha River oasis is located in the northern part of the Tarim Basin, at the southern foot of the Tianshan Mountains in Xinjiang (82°10′-83°50′ E, 41°06′-41°40′ N). The oasis was deposited by alluvial fans and can be divided into three parts: proximal fan, intermediate fan and distal fan. The topography is high in the north and low in the south, sloping from northwest to southeast, and the alluvial fans are about 900 m above sea level. The climate is continental, warm temperate and arid, characterized by drought and little rain, abundant light and heat resources, strong evaporation, and large diurnal differences in temperature. The average annual temperature is 11.6 °C, the average annual precipitation is 52 mm, and the ratio of evaporation to precipitation is 54:1. The soil types of the oasis are, in descending order of area, loam, clay loam, sand and sandy loam, and the land cover types are mainly farmland and desert. Natural vegetation is dominated by Populus euphratica, Tamarix chinensis, Phragmites australis, Alhagi sparsifolia, Suaeda glauca, and Kalidium foliatum. The desert on the periphery of the oasis is accompanied by seasonal waterlogging. Oasis crops are dominated by cotton and fruit trees, as well as wheat, corn and other food crops.
In the 1990s, managers gradually introduced under-film drip irrigation technology to alleviate the adverse effects of salinization on cotton fields. Cotton production has since increased, attesting to the effectiveness of the film, which is now used on 100 percent of the cotton fields in the oasis.

Cotton Phenology Information
The whole growth cycle of cotton can be divided into five stages: sowing, emergence, squaring, flowering and boll setting, and boll opening [17]. In Xinjiang, close to 100% of cotton fields are covered with mulch during the sowing period to reduce water evaporation, retain heat and alleviate soil salinity. The cotton boll opening period is an important phenological feature in most cotton field classification studies. In this study, the entire growing season of cotton in Xinjiang, including the process of mulching, was used as a phenological feature to classify cotton fields.

GEE Image Collection
The Landsat series satellites, with their long time series, and the Sentinel 2 satellites, with their fine spatial resolution, were selected as data sources. Sentinel 2 offers finer pixels and richer band information than Landsat, i.e., four additional red-edge bands, whereas the Landsat series covers a longer time range. Figure 2 shows the detailed technical flow of the study. All Landsat 5 TM, Landsat 8 OLI and Sentinel 2 data covering the entire Ogan-Kucha River oasis were filtered in the code editor (JavaScript) interface of GEE to generate the image collection. The data were filtered by time, cloud cover and location. The time windows, March to October of 2011, 2015 and 2020, were selected based on the quality of the satellite images. Clouds in the Landsat 5 TM, Landsat 8 and Sentinel 2 data were masked using an algorithm provided by GEE. The band features used for classification were the visible, near-infrared and short-wave infrared bands of each sensor. The Normalized Difference Vegetation Index (NDVI) was calculated for all images in the image collection to better distinguish cotton fields from other agricultural fields. The Normalized Difference Built-up Index (NDBI) and the Modified Normalized Difference Water Index (MNDWI) were calculated to classify built-up land and water bodies. For the Landsat 5 and 8 images, the visible, NIR and short-wave infrared bands and the NDVI, NDBI and MNDWI indices were used, while for the Sentinel 2 images the red-edge bands were used in addition to these. The bands and indices used for each image when entering the classification model are shown in Table 1.
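The three indices named above can be illustrated with a minimal Python sketch. It assumes the standard normalized-difference definitions of NDVI, NDBI and MNDWI, which the text does not spell out, and the band mapping per sensor is our assumption based on the usual band designations rather than on Table 1. The reflectance values in the example call are made up for illustration only.

```python
# Assumed band names for green, red, NIR and the first SWIR band of each sensor.
BAND_MAP = {
    "Landsat 5 TM": {"green": "B2", "red": "B3", "nir": "B4", "swir1": "B5"},
    "Landsat 8 OLI": {"green": "B3", "red": "B4", "nir": "B5", "swir1": "B6"},
    "Sentinel 2": {"green": "B3", "red": "B4", "nir": "B8", "swir1": "B11"},
}


def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def spectral_indices(green, red, nir, swir1):
    """Compute NDVI, NDBI and MNDWI from surface reflectance values."""
    return {
        "NDVI": normalized_difference(nir, red),      # vegetation
        "NDBI": normalized_difference(swir1, nir),    # built-up land
        "MNDWI": normalized_difference(green, swir1), # open water
    }


# Illustrative (made-up) reflectance values for a vegetated pixel.
sample = spectral_indices(green=0.10, red=0.08, nir=0.40, swir1=0.20)
```

In this made-up example NDVI is positive while NDBI and MNDWI are negative, which is the pattern one would expect for vegetated farmland rather than built-up land or water.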

Supervised Classification Model
A variety of supervised machine-learning classification algorithms are provided in GEE, such as RF, Bayesian and Support Vector Machine classifiers. In this study, the commonly used RF algorithm was chosen as the classifier because it corrects the tendency of decision trees to overfit their training set [18]. RF builds on bootstrap aggregation (bagging) [19] with decision trees as the base learners and further introduces random attribute selection into the training of each tree. The number of trees in the model was set to 20. Polygon feature datasets for 2011, 2015 and 2020 were generated from local land-use data and visual interpretation and comprise six types: cotton fields, agricultural lands other than cotton, urban construction lands, salt-affected lands, water bodies and deserts. Each type and its detailed characterization are shown in Table 2. We used 70% of the feature set as training samples and the remaining 30% as validation samples to evaluate the classification accuracy.
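The 70/30 division of the samples can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the study: the function name `split_samples`, the fixed seed and the 100 placeholder samples are all hypothetical, and the study may have drawn its split differently (for example, per class).

```python
import random


def split_samples(samples, train_fraction=0.7, seed=0):
    """Shuffle the samples reproducibly and split them into training and validation sets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder samples: (sample id, land-cover type) pairs for the six types in the text.
TYPES = ["cotton", "other farmland", "built-up", "salt-affected", "water", "desert"]
samples = [(i, TYPES[i % len(TYPES)]) for i in range(100)]

train, valid = split_samples(samples)
```

Fixing the seed keeps the split reproducible, so that accuracy figures computed on the validation set can be compared across runs.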

Accuracy Assessment
Some common methods for assessing classification accuracy are based on confusion matrices [20]. After the confusion matrix is constructed, the overall accuracy, Kappa accuracy and user accuracy are calculated to assess the classification accuracy. Equations (1)-(3) give the formulae for overall accuracy, Kappa accuracy and user accuracy, respectively.
where OA is overall accuracy, KA is Kappa accuracy, UA is user accuracy, k is the total number of types, N is the total number of samples, N_ii is the number of samples that are type i in the test sample and type i in the actual classification result, N_i+ is the total number of samples in type i in the test sample, and N_+j is the total number of samples classified into type j in the actual classification.
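The three measures defined above can be computed from a confusion matrix as in the following minimal sketch. It uses the standard definitions of overall accuracy and Cohen's kappa. For user accuracy it assumes the common convention of dividing the correctly classified samples of a type by all samples classified into that type; the matrix layout (rows are test-sample types, columns are classified types) is likewise an assumption, and the 2 x 2 matrix is made up for illustration.

```python
def accuracy_metrics(matrix):
    """Return (OA, KA, UA list) for a square confusion matrix.

    Assumed layout: matrix[i][j] is the number of samples of type i in the
    test sample that were classified as type j.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)                      # N
    diagonal = sum(matrix[i][i] for i in range(k))           # sum of N_ii
    row_totals = [sum(matrix[i]) for i in range(k)]          # N_i+
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # N_+j

    overall = diagonal / n
    chance = sum(row_totals[i] * col_totals[i] for i in range(k))
    kappa = (n * diagonal - chance) / (n * n - chance)
    # User accuracy: correct samples of a type over all samples classified as that type.
    user = [
        matrix[j][j] / col_totals[j] if col_totals[j] else float("nan")
        for j in range(k)
    ]
    return overall, kappa, user


# Made-up two-type example (for instance cotton versus everything else).
demo = [[45, 5],
        [10, 40]]
oa, ka, ua = accuracy_metrics(demo)
```

If the matrix is stored the other way round (rows are classified types), the user accuracy is obtained from the row totals instead; overall accuracy and kappa are unaffected by transposition.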

Classification Accuracy
In terms of each accuracy indicator (Table 3), the classification results obtained with the RF classifier provided by GEE are credible for the different land types in the different years. Both Sentinel 2 and Landsat 8, as data sources, yielded good classification accuracy, with overall accuracy and Kappa coefficient greater than 0.9. In the classification results for 2020, the overall accuracy and Kappa coefficient of Landsat 8 were better than those of Sentinel 2, whereas user accuracy showed the opposite. The results indicate that both Landsat 8 and Sentinel 2 data were successful in classifying cotton fields, but Sentinel 2 showed higher user accuracy. We also tested, using the Sentinel 2 images from 2020, whether considering the film feature affects the classification accuracy (Table 3). Including the film feature in the date filter gave higher overall accuracy, Kappa accuracy and user accuracy. The improvement in overall accuracy and Kappa accuracy was slight, while user accuracy improved more, by 0.144. Tables 4 and 5 show the confusion matrices obtained when considering film features and when using only cotton boll features, respectively. Considering film features reduces the number of cotton field pixels classified as other types.
Figure 3 compares the detailed texture of the Sentinel 2 and Landsat 8 classification results. Both Sentinel 2 and Landsat 8 extract cotton field information well; the differences lie, for example, in the classification errors caused by roads and small cotton fields. One reason for this difference may be the higher spatial resolution of the Sentinel 2 image compared to the Landsat 8 image, which allows Sentinel 2 to better recognize dirt roads less than 30 m wide within fields.
Another reason may be that local farmers plant cotton among the fruit trees or reclaim additional small plots of farmland for cotton, and these small cotton fields may cause classification errors.

Differences in Reflectance between Cotton Fields and Other Fields during the Growing Season
In this study, cotton phenological information was used as the main feature to distinguish cotton fields from other farmland. Figures 4 and 5 show the variation in surface reflectance and NDVI, respectively, for cotton fields, fruit trees and wheat during the growing season. The results show that cotton fields and other agricultural fields have similar reflectance variation characteristics, i.e., reflectance in the B6, B7, B8, B8A and B9 bands gradually increases and then decreases during the growing season, while reflectance in B1, B2, B3, B4, B5, B11 and B12 gradually decreases and then increases. The difference is that the reflectance of the B6, B7, B8, B8A and B9 bands starts to increase around the beginning of May in cotton fields (Figure 4a), while in fruit trees and winter wheat it starts to increase in April (Figure 4b,c), which may be related to the earlier growth of crops such as winter wheat. In the cotton fields, NDVI increased through the growing season and then decreased after flowering and boll setting in July (Figure 5a). In the fruit trees and winter wheat, NDVI decreased and then increased, with minima in June and July (Figure 5b,c), probably due to the maturation of crops such as winter wheat during this period. The rebound is due to the replanting of crops such as maize after the wheat harvest in June. Therefore, differences between crops in reflectance and NDVI can help to extract cotton fields by classification.

Changes in Spectral Reflectance Due to Mulching and Cotton Picking in Cotton Fields
To further clarify the extent to which mulching disturbed the reflectance of the cotton fields, the spectral reflectance of the ground surface before and after mulching was summarized using the Sentinel 2 image collection. The results (Figure 6) show that reflectance increases in bands 1 to 9 after mulching and decreases in bands 11 and 12 (Figure 6a). The NDVI box is slightly lower and can be considered little changed (Figure 6b). In the Sentinel 2 true color image, the mulched ground appears as distinct greyish-white pixels (Figure 6c,d). This result suggests that mulching produces an abrupt, readily detectable change during the cotton growing season in Xinjiang, and that mulching, treated as a phenological event, is a good indicator for cotton field classification. Some studies use cotton phenology information such as cotton bolls as the main feature for classifying cotton fields, so we also compared the reflectance of cotton fields in Sentinel 2 images before and after cotton picking (Figure 7). Figure 7a shows that the spectral reflectance of cotton fields decreases from band 1 to band 9 after cotton picking, while bands 11 and 12 show an increase. The box becomes smaller after cotton picking, i.e., the data distribution becomes more concentrated, but the range of values in some bands becomes wider, and the NDVI of the cotton field shows similar changes (Figure 7b). In the Sentinel 2 true color images, the grey-white pixels (cotton bolls) also disappear after cotton picking (Figure 7c,d). This suggests that cotton bolls are necessary as a feature to classify cotton crops, but that a single image may contain both picked and unpicked cotton fields, which may cause errors.
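The direction of change described above for mulching (bands 1 to 9 up, bands 11 and 12 down) can be written as a simple check on mean reflectance before and after a candidate mulching date. This is only an illustrative sketch of that qualitative pattern, not the classification procedure of the study, which feeds the image collection to a random forest. The function name and the reflectance values are hypothetical.

```python
# Sentinel 2 bands grouped by the direction of change reported after mulching.
RISING_BANDS = ["B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B8A", "B9"]
FALLING_BANDS = ["B11", "B12"]


def matches_mulching_signature(before, after):
    """Return True if every band moves in the direction reported after mulching."""
    rises = all(after[band] > before[band] for band in RISING_BANDS)
    falls = all(after[band] < before[band] for band in FALLING_BANDS)
    return rises and falls


# Made-up mean reflectance values for one field, before and after mulching.
before = {band: 0.20 for band in RISING_BANDS + FALLING_BANDS}
after = {**{band: 0.26 for band in RISING_BANDS},
         **{band: 0.17 for band in FALLING_BANDS}}

is_mulched = matches_mulching_signature(before, after)
```

Swapping the two arguments gives the opposite pattern, which is also the direction of change the text reports after cotton picking.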

Spatial Trends of Oasis Farmland and Its Relationship to Groundwater Depth
The above findings suggest that it is feasible to use GEE to distinguish oasis cotton fields from other agricultural fields in arid zones. Focusing only on cotton fields and other farmlands, we explore the spatial distribution of farmlands in the oasis from 2011 to 2020. Figure 8 shows a clear trend of spatial expansion of cotton fields in the study area from 2011 to 2020, mainly in the southern, eastern and western parts of the oasis. Newly reclaimed cotton fields are mainly located on the periphery of the oasis, while other farmlands are mainly located within the oasis along the river, mainly due to the greater water demand of other crops such as wheat, maize and fruit trees. Groundwater depth data are from the groundwater salt monitoring station of the Ogan-Kucha River Basin Management Office in Aksu District, Xinjiang. The area of cotton fields shows an increasing trend that is in line with the change in groundwater depth, while the area of other fields does not change much (Figure 9). The area of cotton fields was highly correlated with groundwater depth, with a correlation coefficient (r) of 0.979. As the area of cotton fields increased, the groundwater table became deeper, i.e., the groundwater level declined, which may be related to the extensive implementation of drip irrigation facilities in cotton fields.
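A correlation coefficient of this kind is presumably Pearson's r, although the text does not name the statistic. A minimal sketch of the computation follows. The two series are placeholders with arbitrary values and units, not the cotton area or groundwater depth data of the study, so the result is not meant to reproduce the reported 0.979.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Placeholder series (arbitrary units): cotton area and groundwater depth.
cotton_area = [1.0, 2.0, 3.0, 4.0]
groundwater_depth = [2.1, 2.9, 4.2, 4.8]

r = pearson_r(cotton_area, groundwater_depth)
```

With only a handful of time points, a high r indicates that the two series move together but is weak evidence of a causal link on its own, which is consistent with the cautious wording in the text.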

Discussion
In this study, a collection of image data for the study area was constructed in GEE and a random forest machine-learning algorithm was used to classify the land cover, and the classification results met expectations. Previous studies have compared the potential of Landsat 8 and Sentinel 2 for classifying cotton fields, but they disagree on the classification accuracy, and results similar to ours can be found among them. Paul et al. [21], comparing Landsat 8 with Sentinel 2 imagery in classifying multiple crops (including cotton), found better accuracy for Sentinel 2, which they attributed to the red-edge bands providing more crop information. Liu et al. [22] found that Sentinel 2 had lower overall accuracy and Kappa coefficient than Landsat 8, which they attributed to an insufficient number of sample points. In this study, the detailed differences between the Landsat 8 and Sentinel 2 classification maps (Figure 3) show that both datasets are effective in extracting cotton fields and distinguishing them from other agricultural fields. However, Sentinel 2 is superior in extracting edge information from cotton fields, whereas Landsat 8 imagery cannot show details such as the edge features of some fields. Therefore, the lower user accuracy of Landsat 8 compared to Sentinel 2 in this study is more likely explained by both the coarser pixel resolution and the smaller number of spectral bands of the Landsat 8 data. Overall, phenology-based image collections from both Landsat 8 and Sentinel 2 can successfully extract cotton fields, and the choice of data source depends on user needs.
From a classification accuracy perspective, our results are similar to those of Wang et al. [23], who identified cotton fields from Sentinel 2 time series using cotton phenological information (without considering films); the accuracy of their proposed white boll index (OA = 0.95) was better than that of support vector machines (OA = 0.59) and one-dimensional convolutional neural networks (OA = 0.27). Other studies [24] evaluated the potential of hyperspectral images together with Sentinel 2 and Landsat 8 for cotton field identification and found that the classification accuracy increased with an increasing number of bands. Their classification accuracy based on Sentinel 2 images was similar to that of this study, whereas for Landsat 8 the overall accuracy in this study was higher and the user accuracy was lower. Compared to another study on cotton field identification based on phenological information [17], considering mulching gives higher accuracy, although part of the difference may be explained by the coarse image resolution used in that study.
Information on cotton phenology throughout the growing season, including mulching, was used as a key feature to distinguish cotton fields from other farmland, and using GEE to construct image collections makes it possible to include this phenological information. The results of this study show that mulch film produces an abrupt change that is easy to detect, because spectral reflectance increases in bands 1 to 9. Wang et al. [23] constructed the White Bolls index for identifying cotton fields from Sentinel 2 time series data and highlighted that cotton bolls are important features for distinguishing cotton fields from other fields. This study shares the perspective of Wang et al. [23] in using Sentinel 2 time series data to classify cotton fields. Csillik et al. [25] also emphasized the usefulness of the Sentinel 2 satellite for farmland classification. However, relying on cotton bolls alone could cause many cotton fields to be missed if time series image data are missing before cotton picking. The advantage of using mulching is that it adds a further feature for identifying cotton fields, compensating for the absence of other features such as cotton bolls after the cotton is picked.
The expansion of cotton fields into alluvial plains is mainly influenced by farmer choices driven by crop price fluctuations, agricultural policies and natural conditions [26]. The area of cotton fields extracted in this study was smaller than in previous studies and lower than the statistics from the government sector [27]. One possible reason for the former is that Li et al. [28] may have identified some other farmland as cotton fields, while the area estimated in this study is lower than the government statistics because there are still some cotton fields outside the southern part of the study area.
Mulched drip irrigation and the anti-leakage hardening of irrigation canals have reduced the infiltration of irrigation water into the ground, and groundwater recharge has been greatly reduced. With the spread of mulched drip-irrigation cotton cultivation technology and the expansion of agricultural land to the periphery of the oasis, we should be alert to vegetation death due to the ecological water shortage caused by the decline of the groundwater level [27,29,30].
Although this study successfully classified cotton fields in Xinjiang based on the GEE platform, the following shortcomings remain. (1) Regardless of how the image collection is constructed to capture cotton phenological information, the quality of Sentinel 2 imagery is still affected by clouds. One way to address this problem is to use spatio-temporal fusion of MODIS and Sentinel 2 images to obtain an image collection with high temporal resolution that provides more phenological information. (2) Using all Sentinel 2 bands and calculating several vegetation indices increases the computational effort of the classification algorithm, and even on the GEE platform individual users face computational limits. (3) Multiple classification models, such as other machine-learning and deep learning algorithms, were not compared. Finally, we propose the development of a cloud-based client system [31] to carry out rapid and large-scale cotton field monitoring tasks.

Conclusions
In this study, based on the GEE cloud computing platform, a random forest machine-learning algorithm was used for the supervised classification of Landsat 5, Landsat 8 and Sentinel 2 satellite images to obtain the spatial distribution of cotton fields in the Ogan-Kucha River oasis in 2011, 2015 and 2020. After the cotton fields were mulched, spectral reflectance increased significantly in the visible, red-edge and near-infrared bands and decreased in the short-wave infrared bands. The classification results showed that Sentinel 2 images extracted cotton fields with higher user accuracy than Landsat 8 images and reduced the misclassification of roads. The mulching process of the cotton sowing period is a good phenological feature for extracting cotton fields. Over the last 10 years, the cotton field area of the Ogan-Kucha River oasis has expanded toward the periphery, and further promotion of mulched drip irrigation may lead to a decrease in the groundwater level in the oasis area. The decline in groundwater level needs to be taken seriously by local managers, as plants whose roots can no longer reach water die, leading to ecological degradation. The study provides a reference for monitoring and mapping cotton fields based on a cloud computing platform and can inform land-resource planning and water resource regulation.