Mapping of Cotton Fields Within-Season Using Phenology-Based Metrics Derived from a Time Series of Landsat Imagery

: A phenology-based crop type mapping approach was carried out to map cotton ﬁelds throughout the cotton-growing areas of eastern Australia. The workﬂow was implemented in the Google Earth Engine (GEE) platform, as it is time e ﬃ cient and does not require processing in multiple platforms to complete the classiﬁcation steps. A time series of Normalised Di ﬀ erence Vegetation Index (NDVI) imagery were generated from Landsat 8 Surface Reﬂectance Tier 1 (L8SR) and processed using Fourier transformation. This was used to produce the harmonised-NDVI (H-NDVI) from the original NDVI, and then phase and amplitude values were generated from the H-NDVI to visualise active cotton in the targeted ﬁelds. Random Forest (RF) models were built to classify cotton at early, mid and late growth stages to assess the ability of the model to classify cotton as the season progresses, with phase, amplitude and other individual bands as predictors. Results obtained from leave-one-season-out cross validation (LOSOCV) indicated that Overall Accuracy (OA), Kappa, Producer’s Accuracies (PA) and User’s Accuracy (UA), increased signiﬁcantly when adding amplitude and phase as predictor variables to the model, than prediction using H-NDVI or raw bands only. Commission and omission errors were reduced signiﬁcantly as the season progressed and more in-season imagery was available. The methodology proposed in this study can map cotton crops accurately based on the reconstruction of the unique cotton reﬂectance trajectory through time. This study conﬁrms the importance of phenological metrics in improving in-season cotton ﬁelds mapping across eastern Australia. This model can be used in conjunction with other datasets to forecast yield based on the mapped crop type for improved decision making related to supply chain logistics and seasonal outlooks for production.


Introduction
Remote identification of crop type is widely studied for many agricultural applications. In most cases, this is done retrospectively after the growing season [1], however, in-season mapping is required for better decisions for policymakers and others in the agribusiness sector [2]. For example, knowing the type of crop and the hectares of a particular crop can help in making crucial management, marketing, and logistical decisions [3]. Cotton is an economically important crop in Australia and is worth $2 billion dollars annually on average. The number of cotton fields grown in Australia considerably fluctuates from season to season. For example, in 2011, 599,630 hectares were grown, whereas in 2019, be used to perform this in relatively short time periods [19]. The classifier package in the Google Earth Engine provides both generative and discriminative models including Random Forest (RF) [20], Classification and Regression Trees (CART) [21], Support Vector Machine (SVM) [22] and Naïve Bayes [23]. A classifier can be selected depending on the purpose and volume of data being used for a study. Generally, RF requires less time than the SVM for training and can provide comparable accuracies to the SVM classifier [24,25]. RF can give better results when datasets are large, ascompared to the Naïve Bays, which can give good results with small datasets. Moreover, compared to Decision Trees (DT), RF overcomes overfitting by constructing an ensemble of DTs. Agricultural applications of the GEE platform such as those in [26][27][28] have all shown benefits from the flexibility and efficiency of the platform in solving big data problems.
In machine learning models, training and validation of a model requires representative samples to achieve reliable and accurate results. In many studies, machine learning models use subsets of data for training and validation. There are many methods used to train and evaluate models. For instance, some studies use samples that are drawn from the same datasets based on assigning a specific percent of samples for training and percent of samples for evaluation. For example, in a study conducted by Deschamps et al. (2012) [29], data from 2009 season were used to classify different types of beans, cereals and oilseeds. The ground data (fields) were split randomly into equal numbers of training and validation subsets where each class in the training and validation sets were assigned equally. Accuracies obtained from this study were above 80%. In another study, Waldner et al. (2015) [30] classified land cover and crop type using data from the 2013 season. The model was trained on 50% of data which was split randomly. In this study, crop type classification reached an accuracy of 89%. Another study conducted by Hao et al. (2018) [31] showed data obtained from two campaigns (2011 and 2013) that were split randomly for crop type classification. Howard and Wylie (2014) [32]  Other studies rely on using cross validation (CV) techniques [33] to evaluate models where the dataset is partitioned into a number of folds, where a specific number of folds are used for training and the remaining folds are used for validation. There are a limited number of studies that have used the CV technique for crop classification purposes. For example, Pena and Brenning (2015) [34] classified four fruit-tree crops in central Chile using Landsat-8 time series for the 2013/2014 season. They evaluated three models' performances (linear discriminant analysis, random forest and support vector machine) using the cross validation technique. CV was performed at the field level where 10 folds were used to partition 2291 fields for training and validation subsets. They repeated the CV with different training sample sizes (100, 200, 400 and 800, and all 2291). They achieved a low misclassification error rate <0.21. However, as mentioned earlier, data used in their study were taken from only one season.
However, one limitation of all these studies is that they use observations from the same season to calibrate and validate the models. This means to use this data operationally for mapping crop types, they are testing the interpolation quality of their models rather than their ability to predict in-season without in-season training data. The real challenge is to evaluate a model using a subset that has never been seen by a model. Therefore, this study aims to map the cotton crop in-season across the major cotton fields of eastern Australia based on the phenology information derived from a time series of Landsat 8 imagery in the GEE platform. All models will be tested using the leave-one-season-out cross validation (LOSOCV) technique where an entire season will be left out to test the ability of the model to map cotton fields that have never been modelled. Furthermore, we will test the quality of the model at different stages of the growing season. Successful development of an approach will enable near real-time in-season mapping of cotton that can be used for managing supply chain logistics and seasonal outlooks for the agricultural sector.

Study Area
This study covered major cotton production areas (about 90% of the cotton production region in eastern Australia) stretching from the Macintyre River south of Queensland (QLD), to the Murrumbidgee River in New South Wales (NSW), Australia, for 2013/2014, 2014/2015 and 2019/2020 seasons. This area is known for the dominance of cotton crops in the summer season. Figure 1 shows the study region in NSW and QLD. Actual cotton field boundaries were retrieved from the SataCrop (https://satacrop.com.au/) tool which provides in-season boundaries. This data is collected for the primary purpose of mitigating the risk of spray drift, which allows growers to understand where sensitive crops are located in proximity to their spray operation.
Those field boundaries are uploaded or drawn by farmers on a voluntary basis in the SataCrop tool. Therefore, some filtering of the polygons was based on the NDVI threshold. Polygons were assessed based on the mean NDVI for the entire growing season where polygons with NDVI values that exceeded the threshold of 0.3 were kept and polygons that showed lower values than this threshold were discarded and assumed as bare soil or a mix between bare soil and weeds. Additionally, some polygons drawn in Victoria and Northern Queensland were not used, as these were well outside where cotton is typically grown. The retrieved field boundaries covered~1,195,439 hectares (ha) with 2742 polygon boundaries (fields). Field boundaries for three seasons were extracted from this tool and a new subset of 286, 211 and 231 field boundaries with a total area of 8426, 6482 and 4643 ha for 2013/2014, 2014/2015 and 2019/2020 seasons, respectively, were created. The new subset was used to train and validate the classification models.
These sites are located along paths 91 to 93 and row 80 to 84 (Figure 1) of the Landsat 8 satellite mission and are covered by a total number of 408 remotely sensed images during this period.

Study Area
This study covered major cotton production areas (about 90% of the cotton production region in eastern Australia) stretching from the Macintyre River south of Queensland (QLD), to the Murrumbidgee River in New South Wales (NSW), Australia, for 2013/2014, 2014/2015 and 2019/2020 seasons. This area is known for the dominance of cotton crops in the summer season. Figure 1 shows the study region in NSW and QLD. Actual cotton field boundaries were retrieved from the SataCrop (https://satacrop.com.au/) tool which provides in-season boundaries. This data is collected for the primary purpose of mitigating the risk of spray drift, which allows growers to understand where sensitive crops are located in proximity to their spray operation.
Those field boundaries are uploaded or drawn by farmers on a voluntary basis in the SataCrop tool. Therefore, some filtering of the polygons was based on the NDVI threshold. Polygons were assessed based on the mean NDVI for the entire growing season where polygons with NDVI values that exceeded the threshold of 0.3 were kept and polygons that showed lower values than this threshold were discarded and assumed as bare soil or a mix between bare soil and weeds. Additionally, some polygons drawn in Victoria and Northern Queensland were not used, as these were well outside where cotton is typically grown. The retrieved field boundaries covered ~1,195,439 hectares (ha) with 2742 polygon boundaries (fields). Field boundaries for three seasons were extracted from this tool and a new subset of 286, 211 and 231 field boundaries with a total area of 8426, 6482 and 4643 ha for 2013/2014, 2014/2015 and 2019/2020 seasons, respectively, were created. The new subset was used to train and validate the classification models.
These sites are located along paths 91 to 93 and row 80 to 84 (Figure 1) of the Landsat 8 satellite mission and are covered by a total number of 408 remotely sensed images during this period.

Analytical Process
In Australia, the variation in cotton planting/harvesting dates are primarily due to differences in climate and weather conditions between the different growing regions. However, the cotton season generally starts from October (planting) and ends in March/April (picking). For this reason, the start and end dates of images were used to take into account the variation in sowing dates across the study area. Figure 2 explains the methodology used in this study. The first step included the acquisition of the surface reflectance bands and removal of pixels with clouds and cloud shadows using the function of mask (FMASK) described by [35]. After that, the NDVI was calculated and subsequently used for the generation of amplitude and phase bands. Three-time windows (Figure 3) for early-, mid-and late-season were selected based on the cotton crop calendar for the NSW and QLD regions. Images were gathered from the beginning of the growing season (which usually starts in September/October according to the cotton growing calendar in Australia) to the end of the season (April/May). The three-time windows were assigned to enable the assessment of the models with increasing amounts of in-season information. For example, for window 1, five images were available at each single path and row. Those five images were used to create metrics to map cotton at early stages. For window 2, the images from the early season (window 1) were supplemented with an additional four to five (depending on revisit time at each path and row) images gathered after the end of window 1. For window 3, a further four to five images were added to the nine to 10 images used in window 1 and 2. Window 3 also includes images from window 1 and 2, in total meaning that approximately 13 to 14 images were used for mapping cotton at the late stage.
The second step was splitting the datasets into training and validation sets. The third step was fitting the classification models and evaluating them accordingly.

Analytical Process
In Australia, the variation in cotton planting/harvesting dates are primarily due to differences in climate and weather conditions between the different growing regions. However, the cotton season generally starts from October (planting) and ends in March/April (picking). For this reason, the start and end dates of images were used to take into account the variation in sowing dates across the study area. Figure 2 explains the methodology used in this study. The first step included the acquisition of the surface reflectance bands and removal of pixels with clouds and cloud shadows using the function of mask (FMASK) described by [35]. After that, the NDVI was calculated and subsequently used for the generation of amplitude and phase bands. Three-time windows (Figure 3) for early-, mid-and late-season were selected based on the cotton crop calendar for the NSW and QLD regions. Images were gathered from the beginning of the growing season (which usually starts in September/October according to the cotton growing calendar in Australia) to the end of the season (April/May). The three-time windows were assigned to enable the assessment of the models with increasing amounts of in-season information. For example, for window 1, five images were available at each single path and row. Those five images were used to create metrics to map cotton at early stages. For window 2, the images from the early season (window 1) were supplemented with an additional four to five (depending on revisit time at each path and row) images gathered after the end of window 1. For window 3, a further four to five images were added to the nine to 10 images used in window 1 and 2. Window 3 also includes images from window 1 and 2, in total meaning that approximately 13 to 14 images were used for mapping cotton at the late stage.
The second step was splitting the datasets into training and validation sets. The third step was fitting the classification models and evaluating them accordingly. October 1st was set for the start of imagery acquisition and the 1st of May for the end of acquisition to cover the entire study area in terms of the revisit period of the Landsat 8 satellite and (1) Data acquisition, processing and metric variable extraction. (2) Data preparation for analysis.
(3) Data analysis and validation. Window 1 includes images from the 1st of October to the 1st of January for each season. Window 2 includes images from the 1st of October to the 1st of February (this includes images from window 1 and window (2). Window 3 includes images from 1st of October to the 1st of May, including also images from windows 1 and 2.
October 1st was set for the start of imagery acquisition and the 1st of May for the end of acquisition to cover the entire study area in terms of the revisit period of the Landsat 8 satellite and also the difference in sowing dates. In this way, it was ensured that all fields in NSW and QLD were covered for the period of study. The analysis of time series data within each of the time-windows tests if the cotton crop can be spectrally mapped at early-, mid-or late-season as well as identifying the reduction/increase in accuracies at each time-window as more in-season data becomes available [36].
Remote Sens. 2020, 12, x FOR PEER REVIEW 6 of 19 also the difference in sowing dates. In this way, it was ensured that all fields in NSW and QLD were covered for the period of study. The analysis of time series data within each of the time-windows tests if the cotton crop can be spectrally mapped at early-, mid-or late-season as well as identifying the reduction/increase in accuracies at each time-window as more in-season data becomes available [36].

Data Acquisition and Preparation
The Google Earth Engine (GEE) platform is a tool that helps to acquire, process, analyse and visualise remote sensing (RS) observations that are available freely in different temporal and spatial scales [37,38]. In this study, GEE was used to carry out all steps starting from surface reflectance data acquisition to the accuracy assessment stage. All surface reflectance bands were acquired from the Landsat 8 Tier 1 surface reflectance with a spatial resolution of 30 m and temporal resolution of 16 days and preprocessed in GEE using available functions including masking clouds/cloud shadows using the FMASK algorithm, which is capable of identifying and removing clouds/cloud shadows from pixels in all images in the collection [35]. The FMASK algorithm labels potential clouds and cloud shadows of each pixel using the 'pixel_qa' band.
Images were acquired for the entire growing season to extract the reflectance band information and the NDVI [39] was calculated for each pixel (Equation (1)): where NIR is reflectance in the Near Infrared wavelength range (0.636-0.673 μm), and red is the red wavelength range (0.851-0.879 μm). All acquired images were stacked together to build a time series of images to be used in the classification process.

Data Acquisition and Preparation
The Google Earth Engine (GEE) platform is a tool that helps to acquire, process, analyse and visualise remote sensing (RS) observations that are available freely in different temporal and spatial scales [37,38]. In this study, GEE was used to carry out all steps starting from surface reflectance data acquisition to the accuracy assessment stage. All surface reflectance bands were acquired from the Landsat 8 Tier 1 surface reflectance with a spatial resolution of 30 m and temporal resolution of 16 days and preprocessed in GEE using available functions including masking clouds/cloud shadows using the FMASK algorithm, which is capable of identifying and removing clouds/cloud shadows from pixels in all images in the collection [35]. The FMASK algorithm labels potential clouds and cloud shadows of each pixel using the 'pixel_qa' band.
Images were acquired for the entire growing season to extract the reflectance band information and the NDVI [39] was calculated for each pixel (Equation (1)): where ρNIR is reflectance in the Near Infrared wavelength range (0.636-0.673 µm), and ρred is the red wavelength range (0.851-0.879 µm). All acquired images were stacked together to build a time series of images to be used in the classification process. Phenological metrics derived from time series data can provide very useful information for temporal monitoring of vegetation. However, due to noise in the data and missing data (caused by removing pixels affected by clouds and cloud shadows), temporal monitoring applications can be hindered. Therefore, it is usual to reconstruct or harmonise the times series data using a continuous model to improve the signal quality and deal with data gaps [40].

Harmonised NDVI (H-NDVI)
A Fourier Transformation method (FT) was used to harmonise the original NDVI (H-NDVI), and the coefficients were added to the linear model to get the amplitude (A) and phase (ϕ) (Equation (2)): where H-NDVI(t) i is the value of H-NDVI at time t within a pixel (i), β 0 is a constant component of the regression which represents the start of greenness at each pixel, β 1 is a linear coefficient (linear trend), β 2 = A cos (ϕ) and β 3 = A sin (ϕ) are the harmonic terms that represent the cyclical period of 32 weeks for cotton crops (for this study) and ω the angular frequency which was set to 1 to limit the time series by one cycle per time unit (year), and e t represents the error term.
A specific H-NDVI threshold was necessary to confirm active cotton growth in the target fields, delineate the start of the cotton growing season and designate the beginning of the time series data for further analysis [41]. Visual assessment was performed relying on the H-NDVI threshold which was set to 0.3 (after testing different H-NDVI thresholds) to ensure an accurate identification of active growth of cotton within field boundaries. Following this process, time series data sets were constructed for the three time windows from which amplitude and phase data were extracted.

Amplitude
The amplitude (Equation (3)) was recovered from the H-NDVI to indicate the size of H-NDVI change during the growing season: where β 2 and β 3 are the fitted coefficients used to calculate the magnitude of H-NDVI change.
The accumulation of magnitudes in the time series shows the amount of change through time.

Phase
Phase images were constructed using (Equation (4)) to indicate when the season H-NDVI peak occurs (timing of growth stage): where atan2 calculates the arctangent at two dimensions to obtain the phase of H-NDVI [42]. These metrics for each pixel were then collated with the reflectance band information and the H-NDVI for all images in the data sets for the three time periods.

Random Forest (RF) Classification Model
An RF classification model was used due to its ability to indicate the best input variables based on their contribution in the model. These models are also more computationally efficient compared to other classifiers [43]. Since this study seeks to evaluate some phenological-based metrics that might help for cotton detection from surface reflectance data, only two land cover classes were generated (cotton versus others that includes bare soil, water, other crops and pasture and native vegetation). A pixel-based supervised classification was performed, and training of the RF model was conducted in space and time where temporal predictors were used as input for each pixel location. Time series metric variables were assigned to each model based on allocated time windows. Each data point in the training and validation datasets included more metric values by including new scenes as the growing season progresses. For example, mapping cotton at early stages included a time series constructed from five observations for each pixel. For window 2, the number of observations increased to 10 observations for each pixel (including observations from window 1). For window 3, the number of observations increased to 15 observations (including observations from window 1 and 2).
The RF classifier was trained using the training subset with one Rifle decision tree per class, a size of terminal node set to 1, a fraction of input to bag per tree of 0.  (Table 1). A comparison between random sampling and Leave-One-Season-Out Cross Validation (LOSOCV) was performed. With random sampling approach, all samples from all years were used to train and validate the models where the dataset was split to 80% for training and 20% for validation and this was repeated 20 times and the mean of model quality metrics were calculated. For the LOSOCV, two years were used to train each model and one season left out for validation. This was repeated so that each season was included in the validation set once. The RF model was tested with five different combinations of metric variables. The first model was based on only the H-NDVI to test the reliability of using the constructed H-NDVI in identifying cotton fields from other classes. The second model was based on using surface reflectance bands only. The third model was based on using only the phase and amplitude images. The fourth model was based on using H-NDVI, phase and amplitude. The fifth model was based on using all metrics as variables.

Accuracy Assessment
Many accuracy assessment methods are available to validate the model predictions. However, the most common method to assess classification accuracy in remote sensing is based on the error matrix [44]. Classification accuracy was computed by establishing a confusion/error matrix [45] on the training and validation samples [46]. Once the confusion matrix was built, overall accuracies, kappa values, producers' accuracies, users' accuracies, omission and commission errors were derived from this matrix to report the performance of each model.
The overall accuracy (OA), which is the simplest description for the model performance, is calculated by dividing the total number of correctly classified samples by the total number of samples (Equation (5)).
Overall accuracy ranges between 0 when there is no agreement and 1 when there is complete agreement: where OA stands for the overall accuracy, C s is the number of Correct samples (C s ) classified, and N s is the total number of samples. The Kappa statistic is considered a more robust accuracy measure because it considers the random probability of agreement. It estimates the percentage of correctly classified samples compared to chance [47]. Kappa coefficient values range between 0 (when there is no agreement) and 1 (when there is strong agreement). Kappa coefficient is calculated using (Equation (6)): where K is the kappa coefficient, P o is the correctly allocated samples (the proportion of cases in agreement), and P e is the hypothetical random agreement. Producer accuracy (PA) indicates the percentage of correctly classified samples of an individual class in the "map maker" point of view [48]. PA was also derived from the error matrix and it ranges from 0 (when the samples of a class are totally misclassified, to 1 when all samples of a class are all correctly classified) (Equation (7)): where PA is the producer accuracy, C c is the accurately classified samples of a specific class divided by the total number of samples of the same class (C t ).
User's accuracy (UA) is a measure of how well the classified pixels belong to the actual class (vegetation cover). It measures the reliability of the map according to the "user's" point of view [49], and it can be calculated by dividing the total number of correct classifications of a particular class by the total number of reference data for that class (Equation (8)): where UA is the user's accuracy, C c is the total of correctly classified pixels of a specific class, and C r is the total number of a reference data of that particular class. RF variable importance is a very useful feature that ranks the importance of individual predictors regarding their contribution in predicting response and their interaction with the other predictor variables [50][51][52]. For this study, mean decrease in accuracy (MDA) [53] was used to measure the importance of the metric variables for the three time periods.
Commission Error (CE) and Omission Error (OE) (which are supplementary to UA and PA) were also considered to assess the models' performances. CE can have occurred when pixels from a specific class are wrongly classified as other classes in the validation data. CE was simply calculated as (CE = 1 − UA). OE occurs when pixels of a specific class are excluded from that class in the validation data. OE can be calculated as (OE = 1 − PA).

Results
The entire study area is shown in the left side of Figure 4, but a small region (representing cotton fields near Moree, NSW) is highlighted to show the different model mapping results due to the large extent of the study area. The selected time windows allowed the capture of crop variation at the start, middle and towards the end of the season effectively. Tables 2 and 3 show the effect of using different combinations of metrics variables on the models' performances. The accuracy assessment includes the overall accuracy, kappa coefficient, User's accuracy, Producer's accuracy, omission and commission.
According to the results achieved from random sampling technique (Table 2), high accuracies can be achieved when using all data to train and validate the models, and this is in contrast with the models trained on LOSOCV which scored lower accuracies, especially in early season. In Table 2, the only low results when using random sampling were when using H-NDVI only, which resulted in an OA of 0.70 to 0.79 and kappa coefficient of 0.57 to 0.70. UA, PA were also decreased, and the CE and OE were increased significantly. This is expected because using data to train and validate a model from the same season is likely to guarantee similar signatures of metric variables in both training and validation sets, unlike the LOSOCV, where a metric variable of a new season might not have been modelled before, which makes it hard to decide which class a specific signal might belong to. Remote Sens. 2020, 12, x FOR PEER REVIEW 10 of 19 Conversely, LOSOCV accuracies increased and errors decreased with the progress of the growing season. Moreover, the overall accuracy, kappa coefficient, UA and PA increased with the addition of amplitude and phase in all three time periods as well as with the allocation of more information (with the progression of the season). It can be seen in Table 3 that the best results were obtained from the models trained on a combination of surface reflectance bands, amplitude, phase and H-NDVI, followed by models using amplitude, phase and H-NDVI and models that used only phase and amplitude. The poorest results were obtained when using only H-NDVI. Overall accuracies increased significantly as the season progressed from a mean of ~0.57 (early season) to a mean of ~0.97 when using all metrics as variables. Kappa coefficients were very low for all three-time windows with a mean of ~0.12 when using only H-NDVI and a mean of ~0.46 when using all metrics (late season classification). The obtained kappa values are generally low even though classification is performed at late stages of plant growth, which means that relying on only H-NDVI to map cotton can be very difficult and not reliable. UA was high in the three time periods and for all four models except when using the H-NDVI, which produced relatively lower UA. PA recorded lower accuracies in the early season models. Commission and omission errors were also high at the early stages of classification. However, these errors were reduced significantly with the progress of the growing season and this is not surprising, since more classes were classified correctly at the mid and late seasons.  Conversely, LOSOCV accuracies increased and errors decreased with the progress of the growing season. Moreover, the overall accuracy, kappa coefficient, UA and PA increased with the addition of amplitude and phase in all three time periods as well as with the allocation of more information (with the progression of the season). It can be seen in Table 3 that the best results were obtained from the models trained on a combination of surface reflectance bands, amplitude, phase and H-NDVI, followed by models using amplitude, phase and H-NDVI and models that used only phase and amplitude. The poorest results were obtained when using only H-NDVI. Overall accuracies increased significantly as the season progressed from a mean of~0.57 (early season) to a mean of~0.97 when using all metrics as variables. Kappa coefficients were very low for all three-time windows with a mean of~0.12 when using only H-NDVI and a mean of~0.46 when using all metrics (late season classification). The obtained kappa values are generally low even though classification is performed at late stages of plant growth, which means that relying on only H-NDVI to map cotton can be very difficult and not reliable. UA was high in the three time periods and for all four models except when using the H-NDVI, which produced relatively lower UA. PA recorded lower accuracies in the early season models. Commission and omission errors were also high at the early stages of classification. However, these errors were reduced significantly with the progress of the growing season and this is not surprising, since more classes were classified correctly at the mid and late seasons.  In this study, the gap-filling technique using FT ( Figure 5 shows the mean original NDVI time series and H-NDVI time series values of cotton fields for the three growing seasons; Figure 6 shows the mean error derived from the H-NDVI time series for the cotton fields for the three growing seasons) helped to compensate masked pixels with interpolated values in the time series and building of the unique signal. The FT was used to decompose the NDVI signal into sine and cosine waves, whereas other studies (e.g., Inglada et al. 2015 [54]) used the linear interpolation to compensate for missing information. However, the linear interpolation might be good in some cases where the gap and variance are small. In contrast, if the gap and variance are large (which is usually a problem of clouds) then more sophisticated techniques such as FT are required [55]. Furthermore, the FT technique also preserves the seasonality of the NDVI which helped to create a distinct trajectory for the cotton and other classes and made them separable especially at the mid and late seasons [56]. This also can be supported by Figure 6, showing a very low error produced by the harmonic model. A study conducted by Hendrik and Cyrus (2006) [57] reported that FT-based NDVI attributes have limitations for crop-type classification. However, their study was based on coarse resolution (1.1 km) data derived from AVHRR, unlike this study which was based on moderate resolution data from L8SR (30 m).
Variable importance plots of model 4 are shown in Figure 7 (for early-season, mid-season, and late-season) which were derived from each RF model. Based on the MDA results, the constructed metric variables showed better contribution to the models than the other reflectance bands. More specifically, the H-NDVI was very important for the model at early stages of cotton growth. However, the contribution of this metric variable decreased with the progress of the growing season. The decrease of H-NDVI importance was influenced by the increased contribution of amplitude with the progress of the growing season (mid-and late-season). Similarly, phase contributed more in the late-season model than in the early-and mid-season models. This is due to the fact the at the beginning of the season, there was insufficient time series data to build a unique phase for the cotton crop.
of the unique signal. The FT was used to decompose the NDVI signal into sine and cosine waves, whereas other studies (e.g., Inglada et al. 2015 [54]) used the linear interpolation to compensate for missing information. However, the linear interpolation might be good in some cases where the gap and variance are small. In contrast, if the gap and variance are large (which is usually a problem of clouds) then more sophisticated techniques such as FT are required [55]. Furthermore, the FT technique also preserves the seasonality of the NDVI which helped to create a distinct trajectory for the cotton and other classes and made them separable especially at the mid and late seasons [56]. This also can be supported by Figure 6, showing a very low error produced by the harmonic model. A study conducted by Hendrik and Cyrus (2006) [57] reported that FT-based NDVI attributes have limitations for crop-type classification. However, their study was based on coarse resolution (1.1 km) data derived from AVHRR, unlike this study which was based on moderate resolution data from L8SR (30 m).  Variable importance plots of model 4 are shown in Figure 7 (for early-season, mid-season, and late-season) which were derived from each RF model. Based on the MDA results, the constructed metric variables showed better contribution to the models than the other reflectance bands. More specifically, the H-NDVI was very important for the model at early stages of cotton growth. However, the contribution of this metric variable decreased with the progress of the growing season. The decrease of H-NDVI importance was influenced by the increased contribution of amplitude with the progress of the growing season (mid-and late-season). Similarly, phase contributed more in the late-

Discussion
Mapping cotton fields using only H-NDVI reduced the accuracy, which could be due to the fact that there are many areas that can reflect similar greenness during the growing season, so relying on only changes in greenness might lead to a confusion of cotton pixels with other classes. Therefore, the delineation of cotton fields was more reliable using the amplitude and phase images only. This also can be seen in Table 3 when using H-NDVI with the amplitude and phase bands, which led to a slight reduction in all accuracies, yet higher than models 1 and 3 ( Table 3). The amplitude and phase outperformed the use H-NDVI only and this is because amplitude and phase are the basis to determine the amount change and time of change in the greenness as mentioned previously. But H-NDVI is still very important and provides complementary information to the models. Conversely, even though the H-NDVI, amplitude and phase were the dominant metric variables, the use of the surface reflectance bands (1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11) was still important to the models because they carry additional information that the three metric variables might not have. This is especially the case for band 5 (Near Infrared) which has been used in some crop-type studies and it is very important in crop type classification applications [58,59].
Random sampling technique outperformed the LOSOCV and this was expected because the dataset was split randomly into training and validation. This means data from the same season can be found in both training and validation sets. The only poor results obtained from the random sampling technique was when using H-NDVI only. In the three time windows, H-NDVI yielded very low results and this might be because most classes showed similar greenness, as summer is the time of high vegetative growth, and without capturing the time and amount of H-NDVI change, the separation of cotton from other classes was not effective.

Discussion
Mapping cotton fields using only H-NDVI reduced the accuracy, which could be due to the fact that there are many areas that can reflect similar greenness during the growing season, so relying on only changes in greenness might lead to a confusion of cotton pixels with other classes. Therefore, the delineation of cotton fields was more reliable using the amplitude and phase images only. This also can be seen in Table 3 when using H-NDVI with the amplitude and phase bands, which led to a slight reduction in all accuracies, yet higher than models 1 and 3 ( Table 3). The amplitude and phase outperformed the use H-NDVI only and this is because amplitude and phase are the basis to determine the amount change and time of change in the greenness as mentioned previously. But H-NDVI is still very important and provides complementary information to the models. Conversely, even though the H-NDVI, amplitude and phase were the dominant metric variables, the use of the surface reflectance bands (1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11) was still important to the models because they carry additional information that the three metric variables might not have. This is especially the case for band 5 (Near Infrared) which has been used in some crop-type studies and it is very important in crop type classification applications [58,59].
Random sampling technique outperformed the LOSOCV and this was expected because the dataset was split randomly into training and validation. This means data from the same season can be found in both training and validation sets. The only poor results obtained from the random sampling technique was when using H-NDVI only. In the three time windows, H-NDVI yielded very low results and this might be because most classes showed similar greenness, as summer is the time of high vegetative growth, and without capturing the time and amount of H-NDVI change, the separation of cotton from other classes was not effective.
Conversely, the models' performances improved with the progress of the season using the LOSOCV (Table 3), which was because, at the early stages of cotton, the phenological trajectory was not yet clear. This led to confusing pixels between cotton and other classes which reflect similar amplitude at the same acquired time window, especially at early stages of growth [1]. However, the trajectory becomes unique mid-and late-season. The confusion between classes might be avoided by using higher temporal resolution imagery with shorter revisit periods such as Sentinel 2, which can improve capturing distinct trajectories to differentiate different classes. For instance, Vaudour et al. (2015) [46] found that confusion between spring barley and bare soil at early stages of crop growth occurred because of low plant cover at early stages [60]. However, that study was not based on analysis of time series information, which might be an additional cause of confusion. Therefore, incorporating high spatiotemporal imagery might be beneficial for distinguishing between classes based on phenological information. The only concern about higher resolution imagery is that it is very expensive in terms of computation time and may be useful for small scale studies only. For example, it is reported that fine resolution imagery is useful for overcoming over/underestimation of small-sized fields due to the sub-pixel proportion that can cause mixed pixels with other classes [61]. However, in this present study, field sizes are relatively large with an average of 394 hectares, so the L8SR spatial resolution of 30-m was sufficient to avoid this issue. However, cotton mapping at early stages might be improved by using higher temporal resolution images such as from Sentinel 2, which unfortunately are only available from mid-2015 onwards, so could not be applied here.
The accuracies shown in Table 3 reflect the usefulness of amplitude and phase and their contribution to the model ( Figure 6) when using them without other inputs. This is because temporal amplitude and phase images delineate cotton fields based on the size of NDVI changes and the time of that change throughout the growing season, which was obviously different for other vegetation in the study area. This can be seen in Figure 3 where the reflectance from cotton fields started from bare soil (NDVI~0.2) in October to February (NDVI~0.9) when the maximum changes (peak) occurred and then decreased to the bare soil status after harvesting (~April and May). The shape of trajectory was the main driver of the high accuracy and this indicates the change of temporal profile [62,63] which gave a description of the cycle [64].
The unique amplitude could be used to identify cotton, and this was similar to that found by Jakubauskas et al. (2001) [49]. However, they produced seven additive terms which correlated to crop identification. They found that the first term of amplitude can be correlated to a single season, whereas the second term of amplitude (cosine waves) can be correlated to fallow rotation. Cotton fields in the study area were classified using the first term which yielded high accuracy for mapping cotton fields.
Up to now, there are no other studies that have mapped cotton over large areas in Australia. Also, the use of amplitude and phase for in-season mapping of cotton fields is not studied widely and no other studies have handled this aspect. The LOSOCV in this study allowed to test the ability of the models to classify classes on data that were not used to train the models. Therefore, this technique provided results that are reliable and not biased.

Conclusions
The aim of this study was to demonstrate the potential of mapping cotton fields in NSW and QLD within season, by applying a simple method using phenology-based indices derived from a time series of imagery. GEE helped to map large areas of cotton fields without the need of multiple platforms for processing and analysing data. The results obtained from this study show that mapping cotton fields at early-, mid-and late-season can be done with high accuracy. Including a signal transformation to create amplitude and phase to feed the random forest classifier increased the accuracy. This approach can be extended by including more crop types to the model to test its ability to discriminate between a wide range of crops. Furthermore, different vegetation indices can be included in this model to test their sensitivity to discriminate different classes.
More work needs to be done on summer crops such as sorghum, which have a similar reflectance signal to cotton. Also, more work needs to be done in a more complicated system where many crops are grown in the same area to examine the validity of the model on other crops. There is also an opportunity to further test the transferability of the model by including more data from different seasons.
While the present study addressed the cotton detection in NSW and QLD, Australia, it might be applied in other regions. However, it relies on "ground truth" data, and therefore it constrains its applicability on areas with sparse data. A global crop classifier might be envisioned by including the latitudinal/longitudinal gradient. However, this is a matter for future research.