Temporal Sensitivity of In-Season Crop Classification: An Explainable Multi-Year Sentinel-2 Analysis in Western Australia

Sharma, Sneha; Eslick, Harry; Pires, Rodrigo; Singh, Balwinder; Tareque, Hasnein

doi:10.3390/rs18101653


by Sneha Sharma *, Harry Eslick, Rodrigo Pires, Balwinder Singh and Hasnein Tareque

Department of Primary Industries and Regional Development, Perth 6000, Australia

* Author to whom correspondence should be addressed.

Remote Sens. 2026, 18(10), 1653; https://doi.org/10.3390/rs18101653

Submission received: 29 March 2026 / Revised: 8 May 2026 / Accepted: 14 May 2026 / Published: 20 May 2026

(This article belongs to the Special Issue Advances in the Remote Sensing of Crop Phenology and Production Monitoring Under Environmental Constraints)


Highlights

What are the main findings?

Classification performance increased through the season, with the strongest improvement in late August to early September.
Under multi-year LOYOCV against the reference labels, overall agreement exceeded 90% once vegetation-index observations through August were included, and canola separated earlier than wheat and barley.

What are the implications of the main findings?

Late-winter Sentinel-2 observations provide a practical pre-harvest decision window for in-season crop mapping in this Mediterranean broadacre system.
The workflow provides an interpretable multi-year benchmark for evaluating temporal transferability and linking model performance to crop phenology.

Abstract

Accurate in-season crop type mapping is critical for agricultural monitoring and yield assessment, yet most operational products remain proprietary, post-seasonal or insufficiently tested across contrasting seasons. This study presents an open and transferable framework that quantifies how in-season crop classification skill evolves through the growing season across the southwest agricultural region of Western Australia (WA), using a multi-year (2020–2024) time series of Sentinel-2-derived vegetation indices (VIs). Six crop classes (i.e., wheat, barley, canola, lupins, pasture, and fallow) were evaluated using extreme gradient boosting (XGBoost) and long short-term memory (LSTM) models under a leave-one-year-out cross-validation (LOYOCV) design. Classification performance increased progressively through the season, with a marked improvement in late winter (late August to early September). In LOYOCV, overall agreement with the reference dataset exceeded 90% once vegetation-index observations through August were included, indicating that reliable in-season mapping was achievable before harvest. Canola was separated consistently from mid-season onwards, whereas reliable discrimination between wheat and barley required later phenological information. Independent field-based testing was used to assess true crop identification accuracy for the three externally observed classes: wheat, barley, and canola. In this test set, precision was highest for canola (0.93), followed by wheat (0.82) and barley (0.71). These field-based results supported the main temporal pattern observed in the LOYOCV analysis, particularly the strong mid-season separability of canola and the persistent confusion between wheat and barley. SHapley Additive exPlanations (SHAP) showed that VIs centred on late winter contributed most strongly to model predictions, consistent with peak phenological divergence among crop types. These results identify a phenologically meaningful decision window for in-season crop mapping and provide a multi-year benchmark for evaluating temporal transferability in Mediterranean broadacre systems.

Keywords:

Sentinel-2; vegetation indices; time-series analysis; in-season crop mapping; explainable artificial intelligence

1. Introduction

Western Australia (WA) is a globally significant agricultural region, characterised by vast broadacre cropping systems dominated by wheat, barley, and canola, together with legume and pulse crops. In the 2024 growing season, more than 8.3 million hectares were planted, contributing to one of the largest grain harvests on record (approximately 23 million tonnes) and reflecting continued expansion of the cropping area driven by agronomic, economic, and climatic trends [1]. In such a large and dynamic production system, accurate and timely information on crop distribution is crucial to support regional agricultural intelligence, yield forecasting, biosecurity surveillance, resource allocation for logistics and marketing, and policy development.

Agricultural land monitoring also plays a critical role in addressing global challenges such as rapid population growth, rising food demand, climate variability and shifting consumption patterns [2,3,4]. This pressure highlights the need for reliable spatial and temporal information on cropping systems to support sustainable food production and resource management. Crop distribution data and location-specific crop classification maps enable timely monitoring of cropping patterns and provide key inputs for in-season yield forecasting and agricultural risk assessment. This need for accurate crop-type mapping is particularly pronounced in the Mediterranean-type climate of the southwest agricultural region in WA, where strong interannual rainfall variability, increasing climate extremes, and diverse management practices influence crop phenology and productivity. Despite this uncertainty, most crop-mapping products remain post-seasonal or proprietary, arriving too late to inform in-season decisions.

Advances in digital agriculture under the Agriculture 4.0 paradigm that integrates smart farming technologies, big data analytics, artificial intelligence (AI), the Internet of Things (IoT), and information and communication technology (ICT) are reshaping agricultural monitoring and decision-making [5,6,7,8,9]. In parallel, progress in earth observation (EO), cloud computing, and data-driven modelling has enabled scalable crop mapping, demonstrated by operational products such as the USDA Cropland Data Layer (CDL) in the United States [10,11,12], Canada’s Annual Crop Inventory [13], and Copernicus high-resolution crop type layers across the European Union [14,15]. However, despite these advances, WA lacks an open, scalable, and explainable crop mapping framework capable of providing reliable in-season crop classification and transferable performance across multiple years and regions. Addressing this gap requires robust modelling approaches and transparent interpretation methods to support agricultural monitoring and decision-making at regional scales.

Satellite remote sensing has advanced significantly in recent decades, particularly with the launch of the Sentinel-1 and Sentinel-2 missions under the Copernicus programme. Sentinel-2 provides freely accessible optical imagery at 10–20 m spatial resolution with a five-day revisit cycle, enabling the monitoring of crop growth dynamics throughout the growing season [16,17,18]. When combined with long-term Landsat archives, these datasets allow the detection of spectral and structural differences among crops across key phenological stages. However, the 8–16-day revisit cycle of Landsat-7, -8 and -9 limits its suitability for near-real-time monitoring in regions with persistent cloud cover [19]. In addition, the 30 m spatial resolution may not resolve individual paddocks in heterogeneous agricultural landscapes [20]. Sentinel-2’s higher temporal and spatial resolution addresses these constraints, though processing time series at scale requires substantial computational resources. Cloud computing platforms such as Google Earth Engine (GEE) have reduced technical barriers, enabling large-scale and near-real-time crop monitoring [21,22].

Remote sensing-based crop classification has largely followed two broad strategies. The first relies on spectral features from a single satellite scene sampled on a particular day within the growing season [23,24]. While computationally efficient, this approach often struggles to distinguish some crops with similar spectral responses during peak growth, leading to reduced classification accuracy. The second strategy incorporates multi-temporal time-series data, allowing models to capture seasonal dynamics and crop-specific phenological patterns throughout the season [7,25,26,27]. Time-series approaches are particularly valuable because different crops exhibit distinct growth trajectories, which can reveal when crop discrimination becomes possible and which phenological stages contain the most useful information [28,29]. Previous studies have demonstrated the potential of time-series satellite data for crop classification in systems such as rice, maize, soybean and major summer crops [30,31,32]. However, much of this evidence is based on single year experiments, within-year train-test splits or simplified crop systems, including binary rotations [21,32,33], leaving the year-to-year transferability of in-season classifiers less well resolved for diverse broadacre environments such as those found in WA.

Within this context, Extreme Gradient Boosting (XGBoost) and Long Short-Term Memory (LSTM) networks represent two complementary strategies for learning from time-series Earth observation data. XGBoost is well suited to tabular remote-sensing predictors because it can model non-linear relationships among spectral and temporal variables, accommodate heterogeneous feature importance, and perform strongly in operational agricultural applications [27,34,35]. LSTM networks were developed to learn temporal dependencies in sequential data [36,37] and have shown strong performance in crop classification, in some cases exceeding traditional machine-learning approaches when crop separability depends on dense phenological trajectories [31,38,39,40]. More recent developments, including BiLSTM, hybrid temporal masking, and multimodal memory-based architectures, further highlight the value of recurrent and memory-based models for extracting temporal, spatial, and spectral information from remote-sensing data [41,42,43,44]. These models provide a useful comparison between two ways of learning from the same seasonal vegetation index (VI) information: XGBoost as a tabular learner over date-specific predictors, and LSTM as a sequential learner over ordered VI trajectories.

Yet two issues continue to constrain operational uptake. First, many studies use limited temporal coverage, which reduces confidence that in-season performance will remain stable across seasons with different rainfall, temperature, and management conditions. Second, high-performing classifiers often operate as black boxes, providing little insight into which phenological stages or VIs drive crop discrimination. Both issues are especially important in the Mediterranean cropping systems of south-western WA, where strong inter-annual variability can shift emergence, canopy development, flowering, and senescence, and thereby alter the timing of spectral separability.

Here, we use five growing seasons of Sentinel-2 observations (2020–2024) and paddock-level reference labels to evaluate in-season crop classification for six dominant crop classes in the southwest agricultural region of WA. We compare XGBoost and LSTM under a leave-one-year-out cross-validation (LOYOCV) design, assess independent external performance using field-based observations for canola, wheat, and barley, and progressively truncate the seasonal time-series to quantify how classification skill changes as the season progresses. We then apply SHapley Additive exPlanations (SHAP) to identify which VIs and observation windows drive model predictions. Our objectives are to determine the earliest decision-ready mapping window, assess year-to-year transferability, and relate model performance to phenological timing. The contribution of our study lies in combining multi-year temporal transfer testing, progressive in-season truncation, and phenology-linked feature attribution to identify when in-season crop mapping becomes reliable and why.

2. Materials and Methods

2.1. Study Area

The focus of this research is the southwest broadacre agricultural region of WA, shown in Figure 1. The study area extends from Geraldton in the north to Esperance in the south and spanned approximately 20 million hectares in 2022 [45]. The region has a Mediterranean-type climate characterised by hot, dry summers and mild to cool, wet winters, shaped by the seasonal shift in the subtropical ridge and the passage of winter westerly storm systems that deliver cold fronts and low-pressure events to the southwest. Mean maximum temperatures peak in January–February (~31–32 °C) and are lowest in July (~18 °C), with typical diurnal ranges of 8–18 °C. Rainfall is strongly seasonal, averaging ~700 mm annually with around 80 rain days (>1 mm) per year [46]. Agricultural soils in WA are inherently low in nitrogen and phosphorus availability and are predominantly sandy to duplex in profile [47,48].

Importantly, the region displays pronounced spatial climatic gradients that influence both agricultural production and crop phenology. Rainfall generally declines with increasing distance from the coast, creating a marked west–east gradient in moisture availability. Higher-rainfall zones in the western and southern coastal districts support earlier crop establishment and longer growing seasons, whereas the drier inland eastern grainbelt is characterised by shorter seasons and increased susceptibility to moisture stress. A north–south temperature gradient also exists: northern areas experience warmer conditions and an earlier growing-season onset, while cooler conditions in southern zones delay emergence, prolong vegetative growth, and shift phenological stages later into the season. These interacting gradients in rainfall, temperature, and season length contribute to substantial variability in the timing of crop development across the study area. Such temporal shifts in phenology highlight the importance of time-sensitive approaches for in-season crop classification, particularly when applying fixed calendar-based observation windows across multiple years and locations.

The main grains cultivated in the southwest farming system include wheat, barley, canola, and lupins [49]. In recent years, particularly since the late 2000s, the area under canola cultivation has expanded significantly, especially in the southern regions where average rainfall exceeds 400 mm [49]. Annual pastures constitute the predominant land cover type, reflecting the extensive areas dedicated to livestock grazing in the WA rangelands and agricultural region [50]. Broadacre crops in WA follow a typical pattern of autumn sowing with the onset of consistent seasonal rainfall (April–June), winter vegetative growth (June–August), spring flowering (August–October) and harvest (October–December).

Table 1 presents published sowing, flowering, maturity, and harvest ranges for the target crops from WA-based and closely comparable Australian agronomic sources. These ranges were harmonised into approximate regional phenology windows for the southwest agricultural region to support interpretation of temporal separability in the classification experiments. Because phenological timing varies with cultivar, sowing date, rainfall zone, and seasonal conditions, these windows should be interpreted as indicative rather than as fixed stage boundaries for all paddocks or years.

2.2. Labelled Reference Data Sources

Crop type labels used for supervised training and validation were obtained from Digital Agriculture Services (DAS), unpublished data, under a research licence. These labels were derived from an independently developed operational crop-mapping system and were used for model calibration and benchmarking. Details regarding the DAS training data and machine learning model are described in Lawes et al. [45] and Fowler et al. [57]. While these data do not represent field-verified ground truth, they provide a consistent and spatially comprehensive reference dataset suitable for large-scale modelling applications, particularly where field observations are limited or unavailable. The use of high-quality, model-derived labels as reference data is increasingly adopted in remote sensing studies for regional-scale classification and benchmarking, particularly when operational products are available and internally validated. In this study, DAS labels are treated as a reference dataset rather than absolute truth, and model evaluation is complemented by independent external test data using field-based observations to assess robustness and generalisability.

In addition to the training dataset, an independent field-based test dataset was collected between 2020 and 2024 through crop disease surveillance activities conducted by the Department of Primary Industries and Regional Development (DPIRD) across Western Australian farming systems. Field surveys were undertaken annually during the growing season, with crop type and associated observations recorded at paddock level across a broad geographic transect spanning from Geraldton to Esperance.

After restricting the DAS reference dataset to the six target classes (barley, canola, fallow, lupins, pasture, and wheat) and excluding low-confidence DAS records, 1,247,690 paddock-level samples were retained for model development. Under the outer LOYOCV design, the held-out sample counts were 242,630 (2020), 247,239 (2021), 253,926 (2022), 254,771 (2023), and 249,124 (2024), with corresponding training counts of 1,005,060, 1,000,451, 993,764, 992,919, and 998,566. An independent external field-based test dataset comprising 340 paddocks was reserved for final evaluation and included canola, wheat, and barley only.
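
The fold sizes quoted above follow directly from the leave-one-year-out design: each training count is the retained total minus the held-out year. The short check below reproduces that arithmetic from the reported numbers; the variable names are illustrative only.

```python
# Held-out paddock counts per year, as reported in the text.
held_out = {
    2020: 242_630,
    2021: 247_239,
    2022: 253_926,
    2023: 254_771,
    2024: 249_124,
}

# Total retained samples across all five seasons.
total = sum(held_out.values())

# In leave-one-year-out CV, each fold trains on every year except the held-out one.
training = {year: total - n for year, n in held_out.items()}

for year in sorted(held_out):
    print(year, held_out[year], training[year])
```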

2.3. Satellite Data Preparation

Figure 2 shows the overall methodology used for data preparation and classification model development. Sentinel-2 Level-2A surface reflectance imagery at 10 m spatial resolution (COPERNICUS/S2_SR_HARMONIZED) was processed using the Google Earth Engine (GEE) platform to generate multi-temporal inputs across the study region. To reduce cloud and shadow contamination, cloud masking was applied using a combination of the Sentinel-2 Scene Classification Layer (SCL) and the cloud probability band (MSK_CLDPRB). Pixels classified as cloud shadow (SCL = 3) and cirrus (SCL = 10) were removed, and an additional threshold was applied to exclude pixels with cloud probability greater than 5%. This combined approach filters atmospheric contamination more conservatively than a single quality layer. To balance temporal resolution with data completeness, a 10-day median composite product [58] was produced for each growing season between April and October for the years 2020–2024. Despite this preprocessing, optical time-series data remain susceptible to temporal gaps, particularly during periods of persistent cloud cover. To ensure continuous time-series inputs for subsequent modelling, linear interpolation was applied to fill missing observations within the composite sequences, preserving seasonal trajectories while minimising artefacts associated with irregular acquisition intervals. Time-series spectral bands, including blue, green, red, near-infrared (NIR), red-edge 1, and red-edge 2, were extracted for the growing season from April to October for each year between 2020 and 2024. Shortwave infrared (SWIR-1 and SWIR-2) bands were not included in this analysis to maintain a consistent 10 m spatial resolution across all input features and avoid potential artefacts introduced by resampling 20 m bands to finer resolution. Although 10-day median compositing and linear interpolation may smooth short-term phenological variation, this approach was adopted to reduce residual noise and cloud-related gaps while preserving the broader seasonal vegetation trajectories required for in-season crop classification.
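
As a rough illustration of the masking rule and gap filling described above, the sketch below applies the stated thresholds to a single pixel and linearly interpolates missing composites in one series. It is a plain-Python stand-in, not the GEE workflow used in the study. The function names are hypothetical, and filling leading or trailing gaps with the nearest valid value is an assumption, since the text does not say how edge gaps were handled.

```python
def is_clear(scl, cloud_prob):
    """Mask rule from the text: drop cloud shadow (SCL 3), cirrus (SCL 10),
    and any pixel whose cloud probability exceeds 5%."""
    return scl not in (3, 10) and cloud_prob <= 5


def fill_gaps_linear(series):
    """Linearly interpolate None entries between valid neighbours.

    Edge gaps are filled with the nearest valid value (an assumption).
    """
    valid = [i for i, v in enumerate(series) if v is not None]
    if not valid:
        raise ValueError("series has no valid observations")
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


# Hypothetical 10-day NDVI composites with cloud-related gaps (None).
ndvi = [0.20, None, None, 0.50, 0.62, None, 0.70]
filled_ndvi = fill_gaps_linear(ndvi)
print([round(v, 2) for v in filled_ndvi])
```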

To derive paddock-level inputs for model development, zonal statistics were applied using paddock boundaries as spatial units. For each composite date and spectral band, the mean value of all cloud-free Sentinel-2 pixels within each paddock was calculated, resulting in a paddock-scale time series of spectral features used for crop classification. Each paddock was treated as a single observational unit in the modelling framework, consistent with object-based (parcel-level) classification approaches. Consequently, all subsequent model training and validation procedures were based on paddock-level samples rather than individual pixels or paddock area, ensuring that each paddock contributed equally to the analysis and reducing potential bias associated with varying paddock sizes, although smaller paddocks may exhibit greater variability due to fewer contributing pixels.
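
The zonal-statistics step can be pictured as a grouped mean over cloud-free pixels. The sketch below assumes pixels arrive as `(paddock_id, value, is_clear)` tuples, which is a hypothetical layout chosen for illustration; the study itself computed these statistics from paddock boundary polygons.

```python
from collections import defaultdict


def paddock_means(pixels):
    """Mean of cloud-free pixel values per paddock.

    pixels: iterable of (paddock_id, value, is_clear) tuples (assumed layout).
    Paddocks with no clear pixels are omitted from the result.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for paddock_id, value, clear in pixels:
        if clear:
            sums[paddock_id] += value
            counts[paddock_id] += 1
    return {pid: sums[pid] / counts[pid] for pid in counts}


# Hypothetical NDVI pixels for one composite date.
pixels = [
    ("A", 0.4, True),
    ("A", 0.6, True),
    ("A", 0.9, False),  # cloudy pixel, excluded from the mean
    ("B", 0.2, True),
    ("C", 0.7, False),  # paddock C has no clear pixels on this date
]
means = paddock_means(pixels)
print(means)
```

Each paddock contributes one value per date and band regardless of its size, which is the equal-weighting behaviour described above.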

Vegetation Indices (VIs) and Feature Engineering

A diverse set of VIs derived from Sentinel-2 spectral data was used as model features to capture crop growth dynamics and canopy biochemical properties throughout the growing season. These indices were selected to represent key spectral–biophysical relationships, such as chlorophyll concentration, vegetation density, and canopy structure, providing both temporal and structural context for LSTM and XGBoost models.

Traditional indices such as the Normalised Difference Vegetation Index (NDVI) [59] and the Enhanced Vegetation Index 2 (EVI2) [60,61] were included to represent photosynthetic activity and overall vegetation vigour across the season. The Soil-Adjusted Vegetation Index (SAVI) [62] was used to reduce soil brightness effects, particularly in early phenological stages, while the Normalised Difference Red-Edge Index (NDRE2) [63] enhanced sensitivity to chlorophyll variation within dense crop canopies. In addition, the Visible Difference Vegetation Index (VDVI) [64] was calculated from visible reflectance bands (blue, green, red); it was originally designed to provide a vegetation signal independent of near-infrared data for applications where NIR information is unavailable or unreliable. The Vegetation Coverage Index (VCI) proposed by He et al. [65] was also incorporated; unlike earlier drought-monitoring VCIs, this new index directly estimates fractional vegetation cover (FVC) using spectral contrast between vegetation and soil. The VCI demonstrates improved linearity with canopy closure and minimal sensitivity to soil background, making it a robust input for models that infer crop coverage and vigour at fine spatial scales [65]. Furthermore, two recently developed indices, Vi₂ and Vi₃, proposed by Ashourloo et al. [66], were integrated to enhance separability among cereal crops, particularly wheat and barley. These indices exploit optimised combinations of Sentinel-2 visible and red-edge bands identified through Relief-F feature selection, emphasising phenological differences during heading and flowering stages. When tested in combination with SVM and Random Forest classifiers, Vi₂ and Vi₃ achieved classification accuracies above 88% and Kappa > 0.80, outperforming NDVI and EVI2 for similar datasets [66]. The equations used to calculate the VIs are listed in Table S2.
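
For orientation, four of the indices named above have widely used textbook forms, sketched below on reflectance values scaled to 0–1. These are the standard published definitions and are assumed, not confirmed, to match Table S2. NDRE2, VCI, Vi₂ and Vi₃ are omitted because their exact band combinations are given only in that table.

```python
def ndvi(nir, red):
    """Normalised Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def evi2(nir, red):
    """Two-band Enhanced Vegetation Index (standard coefficients 2.5, 2.4, 1)."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


def savi(nir, red, soil_factor=0.5):
    """Soil-Adjusted Vegetation Index; soil_factor=0.5 is the common default (assumption)."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def vdvi(green, red, blue):
    """Visible-band Difference Vegetation Index (no NIR required)."""
    return (2.0 * green - red - blue) / (2.0 * green + red + blue)


# Hypothetical paddock-mean reflectances for a green canopy.
nir, red, green, blue = 0.50, 0.10, 0.12, 0.05
print(round(ndvi(nir, red), 4), round(evi2(nir, red), 4),
      round(savi(nir, red), 4), round(vdvi(green, red, blue), 4))
```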

2.4. Classification Model

Two complementary machine learning algorithms were implemented to evaluate in-season crop classification performance using Sentinel-2 derived VI features: XGBoost and LSTM networks. These algorithms were selected to represent contrasting but widely adopted approaches for modelling multi-temporal remote-sensing data. XGBoost was used as a tabular tree-based learner, in which multi-temporal vegetation-index observations were represented as fixed predictors. In contrast, LSTM was used as a sequential learner, in which the same vegetation-index observations were arranged as an ordered time series. Therefore, the comparison between XGBoost and LSTM should be interpreted as an evaluation of tabular and sequential representations of the same underlying spectral-temporal information.

2.4.1. Extreme Gradient Boosting (XGBoost)

The XGBoost algorithm is an advanced ensemble learning method based on gradient-boosted decision trees, optimised for speed and predictive performance [67]. It sequentially builds an ensemble of weak learners, typically decision trees, where each new tree corrects the residual errors of the previous ones using gradient descent optimisation [68]. XGBoost incorporates regularisation (L1 and L2) to prevent overfitting and employs techniques such as parallelised tree construction and weighted quantile sketching for handling large, sparse datasets efficiently [67]. The algorithm’s key hyperparameters govern its learning process and model complexity. The learning rate ($\eta$) controls the contribution of each tree, while max_depth and min_child_weight regulate tree size and overfitting [69]. The subsample and colsample_bytree parameters introduce stochasticity by sampling subsets of the data and features, enhancing generalisation. Regularisation parameters ($\lambda$, L2; $\alpha$, L1) penalise overly complex trees, while gamma ($\gamma$) controls the minimum loss reduction required for further partitioning [67]. During training, XGBoost uses second-order gradient-based optimisation to efficiently compute both gradients and curvatures, thereby accelerating convergence and improving stability [67].

To maintain strict separation between model development and evaluation, XGBoost hyperparameter optimisation was performed only within the training years of each LOYO fold. Candidate hyperparameter combinations were evaluated using three-fold cross-validation applied exclusively to the four training years, with macro-F1 score used as the optimisation metric to give balanced weight to all crop classes. The held-out year was not used during hyperparameter tuning or model selection and was reserved only for fold-level evaluation. The evaluated hyperparameter space included n_estimators: 100, 200, and 300; max_depth: 3, 5, 7, and 10; $\eta$: 0.01, 0.05, 0.1, and 0.2; subsample: 0.7 and 1.0; colsample_bytree: 0.7 and 1.0; $\gamma$: 0 and 1; and min_child_weight: 1 and 3 [69]. Randomised grid search combined with targeted grid refinement around promising regions was used to identify the best-performing parameter set within each fold. After completing all five LOYO folds, the best hyperparameter sets identified from the training data were aggregated, and the most frequently occurring parameter combination was selected as the final XGBoost configuration.
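
To give a sense of scale, the search space listed above can be enumerated directly, and the final selection rule (most frequent per-fold winner) reduces to a simple count. The sketch below uses only the standard library and does not train any model. The per-fold winners are hypothetical placeholders, and `learning_rate` is used as the parameter name for $\eta$.

```python
from collections import Counter
from itertools import product

# Hyperparameter values as listed in the text.
grid = {
    "n_estimators": [100, 200, 300],
    "max_depth": [3, 5, 7, 10],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],  # eta
    "subsample": [0.7, 1.0],
    "colsample_bytree": [0.7, 1.0],
    "gamma": [0, 1],
    "min_child_weight": [1, 3],
}

names = list(grid)
candidates = [dict(zip(names, values)) for values in product(*grid.values())]
n_candidates = len(candidates)  # size of the full grid


def most_frequent_config(best_per_fold):
    """Return the configuration that wins the most folds.

    Ties are broken by first occurrence (an assumption; the text does not
    describe tie handling).
    """
    counts = Counter(tuple(sorted(cfg.items())) for cfg in best_per_fold)
    winner, _ = counts.most_common(1)[0]
    return dict(winner)


# Hypothetical per-fold winners, for illustration only.
fold_best = [candidates[10], candidates[42], candidates[10],
             candidates[10], candidates[42]]
final_config = most_frequent_config(fold_best)
print(n_candidates, final_config)
```

The size of the full grid, combined with three-fold inner cross-validation in each of five outer folds, suggests why a randomised search with local refinement was preferred over exhaustive evaluation.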

2.4.2. Long Short-Term Memory (LSTM)

For crop classification, time-series features derived from optical or radar imagery are commonly structured as sequential input vectors and passed through one or more stacked LSTM layers, followed by a fully connected dense layer that outputs the predicted crop label [39,43]. To guide hyperparameter selection and model configuration, we reviewed previous studies that evaluated recurrent deep-learning models for remote-sensing-based crop classification. Sher et al. [70] systematically tested more than 1000 LSTM hyperparameter combinations, including optimiser type, activation function, batch size, and number of LSTM layers, and showed that LSTM performance is sensitive to hyperparameter choice. Zhao et al. [41] evaluated several deep-learning models, including LSTM, GRU, LSTM-CNN, and GRU-CNN, for Sentinel-2 crop-type mapping and used empirical and grid-search-based tuning of parameters such as dropout rate and recurrent cell number. Similarly, Durrani et al. [44] demonstrated that layer number, batch size, and filter configuration can affect recurrent deep-learning performance in crop classification.

Based on these studies, a bidirectional LSTM model was implemented in this study to represent multi-temporal vegetation-index trajectories. The LSTM input tensor had dimensions $T \times p$, where $T$ represents the number of temporal observations within the seasonal window and $p$ represents the number of VI features at each time step. For the April–October growing-season configuration, $T = 22$ time steps were available at 10-day intervals, spanning from d10 to d31. This sequence-based approach is consistent with previous crop-mapping studies that have demonstrated the value of multi-temporal Sentinel-2 observations for capturing crop growth dynamics [43].
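
The two model inputs differ only in layout: the tabular learner sees one flat vector per paddock, while the LSTM sees the same values arranged as $T$ ordered steps of $p$ features. The sketch below converts between the two, taking $p = 8$ from the eight VIs listed for the in-season experiment. The step-major column ordering and the function names are assumptions made for illustration.

```python
VI_NAMES = ["NDVI", "EVI2", "SAVI", "VDVI", "NDRE2", "Vi2", "Vi3", "VCI"]
STEPS = [f"d{k}" for k in range(10, 32)]  # d10 ... d31 at 10-day intervals
T, p = len(STEPS), len(VI_NAMES)


def to_sequence(flat):
    """Reshape a flat feature vector into T rows of p values.

    Assumes step-major ordering: [d10_NDVI, d10_EVI2, ..., d31_VCI].
    """
    if len(flat) != T * p:
        raise ValueError(f"expected {T * p} values, got {len(flat)}")
    return [flat[t * p:(t + 1) * p] for t in range(T)]


def to_tabular(sequence):
    """Flatten a T x p sequence back into a single tabular row."""
    return [value for step in sequence for value in step]


flat_row = list(range(T * p))   # placeholder values standing in for VI observations
sequence = to_sequence(flat_row)
print(T, p, len(flat_row), len(sequence), len(sequence[0]))
```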

The standard sigmoid and hyperbolic tangent activation functions were used internally within the LSTM gates, while ReLU activation was applied in the dense layer. Model training used a weighted categorical cross-entropy loss function with the AdamW optimiser, with a learning rate of $1 \times 10^{-4}$ and weight decay of $1 \times 10^{-4}$. Class weights were calculated from the inverse frequency of each crop class in the training data to reduce the influence of class imbalance and ensure that under-represented classes contributed more strongly to the loss during optimisation. Gradient norms were clipped to 1.0 to stabilise training. Early stopping was applied with a patience of 10 epochs based on macro-averaged F1-score from an internal validation split within the training years, up to a maximum of 100 epochs.
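
Two of the training controls described above are easy to state in isolation: inverse-frequency class weights and patience-based early stopping. The sketch below shows one common way to compute each, independent of any deep-learning framework. The normalisation `n / (k * count)` is an assumption (the text specifies only "inverse frequency"), and the label counts and validation scores are invented for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), so a balanced dataset gives 1.0.

    The exact normalisation is an assumption; only the inverse-frequency
    principle comes from the text.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def best_epoch_with_patience(val_scores, patience=10, max_epochs=100):
    """Return the index of the best epoch, stopping once the validation
    score has not improved for `patience` consecutive epochs."""
    best_score, best_epoch = float("-inf"), -1
    for epoch, score in enumerate(val_scores[:max_epochs]):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            break
    return best_epoch


# Hypothetical imbalanced training labels.
labels = ["pasture"] * 6 + ["wheat"] * 3 + ["lupins"] * 1
weights = inverse_frequency_weights(labels)

# Hypothetical validation macro-F1 curve that plateaus after epoch 2.
val_f1 = [0.50, 0.60, 0.70] + [0.65] * 20
stop_at = best_epoch_with_patience(val_f1)
print(weights, stop_at)
```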

Hyperparameter optimisation was performed within the LOYOCV framework using only the training years available in each fold. For each LOYO fold, the held-out year was excluded from hyperparameter selection, early stopping, scaler fitting, and model training, and was used only for final fold-level evaluation. Candidate LSTM configurations were evaluated through grid search using only the four training years in each fold. The evaluated grid search space included hidden dimensions $h \in \{32, 64, 128, 256\}$, number of LSTM layers $n \in \{1, 2, 3\}$, and batch size $\in \{32, 64, 128, 256, 512\}$, with a fixed dropout rate of 0.3. For each fold, the best-performing hyperparameter configuration (highest macro-F1 on training years) was selected and evaluated on the held-out test year. The final LSTM model used the best-performing hyperparameter configuration identified through the within-fold grid-search optimisation.

2.5. Model Validation and Accuracy Assessment

To ensure model robustness and temporal generalisation, a LOYOCV framework was implemented using growing season data from 2020 to 2024. This temporal validation strategy avoids optimistic bias arising from random within-season splits and is particularly appropriate for this study because spatial and temporal autocorrelation are prevalent in large-scale agricultural datasets. LOYOCV ensures that models are evaluated on temporally independent held-out data, providing a realistic assessment of predictive performance on unseen crop years. For both XGBoost and LSTM, the held-out year was not used for model fitting, hyperparameter optimisation, scaler fitting, early stopping, or model selection. Hyperparameter tuning was conducted only within the training years available in each LOYO fold, using the model-specific procedures described in Sections 2.4.1 and 2.4.2.
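
The leakage controls listed above amount to one rule: anything fitted (scalers, hyperparameters, stopping criteria) sees only the four training years. A minimal sketch of the fold generator and a training-only scaler follows. The function names and the toy NDVI values are hypothetical, and standardisation is assumed as the scaling method because the text does not name one.

```python
from statistics import mean, pstdev


def loyo_folds(years):
    """Yield (held_out_year, train_indices, test_indices) for each distinct year."""
    for held_out in sorted(set(years)):
        train_idx = [i for i, y in enumerate(years) if y != held_out]
        test_idx = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train_idx, test_idx


def fit_scaler(values):
    """Fit a standardising scaler (mean, std) on training values only."""
    mu, sigma = mean(values), pstdev(values)
    return mu, (sigma if sigma > 0 else 1.0)


def apply_scaler(values, scaler):
    """Apply a previously fitted scaler without refitting."""
    mu, sigma = scaler
    return [(v - mu) / sigma for v in values]


# Hypothetical paddock samples: season year and one VI feature.
years = [2020, 2020, 2021, 2022, 2023, 2024, 2024]
peak_ndvi = [0.61, 0.58, 0.72, 0.66, 0.70, 0.55, 0.63]

folds = list(loyo_folds(years))
for held_out, train_idx, test_idx in folds:
    scaler = fit_scaler([peak_ndvi[i] for i in train_idx])  # training years only
    test_scaled = apply_scaler([peak_ndvi[i] for i in test_idx], scaler)
    print(held_out, len(train_idx), len(test_idx), [round(v, 2) for v in test_scaled])
```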

Model performance was quantified using overall accuracy, precision, recall, class-specific F1-score, commission error, omission error, and confusion matrices (Table 2) to assess within-class and between-class classification patterns. Because the dataset was imbalanced, with pasture and wheat more prevalent than minor classes such as lupins and fallow, class-specific metrics were emphasised alongside overall accuracy. After LOYOCV evaluation, final models were retrained using the full multi-year labelled dataset from 2020 to 2024 with the selected hyperparameter configurations. These final models were then evaluated using the independent external field-based test dataset.
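
All of the metrics named above can be derived from a single confusion matrix. The sketch below uses the conventional definitions, with commission error as 1 − precision and omission error as 1 − recall; these are assumed to agree with Table 2, which is not reproduced here. The matrix values are invented for illustration and are not results from the study.

```python
def classification_metrics(cm, classes):
    """Compute overall accuracy and per-class metrics.

    cm[i][j] is the number of samples of reference class i predicted as class j.
    """
    n = len(classes)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    per_class = {}
    for i, name in enumerate(classes):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))  # column total
        actual = sum(cm[i])                          # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class[name] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "commission_error": 1.0 - precision,
            "omission_error": 1.0 - recall,
        }
    return correct / total, per_class


# Hypothetical confusion matrix (rows: reference, columns: predicted).
classes = ["wheat", "barley", "canola"]
cm = [
    [80, 15, 5],
    [20, 70, 10],
    [2, 3, 95],
]
overall_accuracy, metrics = classification_metrics(cm, classes)
macro_f1 = sum(m["f1"] for m in metrics.values()) / len(metrics)
print(round(overall_accuracy, 3), round(macro_f1, 3))
```

Macro-F1 averages the class-specific F1-scores with equal weight, which is why it was suited to the imbalanced class distribution as an optimisation metric.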

In addition to the full-season evaluation, an in-season classification experiment was conducted to assess the feasibility of early crop-type identification. For this experiment, time series of VIs (NDVI, EVI2, SAVI, VDVI, NDRE2, Vi₂, Vi₃, and VCI) were incrementally truncated at multiple temporal cut-off points, corresponding to key phenological stages between April and October (e.g., early growth, flowering, grain fill). Models were retrained and evaluated at each truncation using the same LOYOCV framework, so that classification accuracy could be tracked through the growing season and the earliest period of reliable crop discrimination could be identified, while maintaining consistency in model structure, validation strategy, and performance metrics. This validation framework provides an operationally relevant assessment of model performance against the reference dataset, addressing temporal generalisation, class imbalance, and early-season uncertainty.
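
The truncation step can be sketched as a simple feature filter, assuming features follow the VI_dXX naming used for the SHAP outputs (Section 3.3.2), where XX is the sequential 10-day window. The function name, the window range 10 to 30 and the two cut-off values are hypothetical choices for illustration, not the cut-offs used in the study.

```python
import re
from typing import List

# Matches names such as "ndvi_d10" or "evi2_d25".
FEATURE_PATTERN = re.compile(r"^(?P<vi>[a-z0-9]+)_d(?P<window>\d+)$")


def truncate_features(feature_names: List[str], last_window: int) -> List[str]:
    """Keep only features whose 10-day window index is <= last_window."""
    kept = []
    for name in feature_names:
        match = FEATURE_PATTERN.match(name)
        if match is None:
            raise ValueError(f"Unexpected feature name: {name!r}")
        if int(match.group("window")) <= last_window:
            kept.append(name)
    return kept


# Hypothetical feature set: two indices over windows 10..30.
all_features = [f"{vi}_d{w}" for vi in ("ndvi", "evi2") for w in range(10, 31)]

# Hypothetical cut-offs; each truncated set would be passed through the
# same leave-one-year-out evaluation as the full-season model.
for cutoff in (21, 25):
    subset = truncate_features(all_features, cutoff)
    print(cutoff, len(subset), subset[-1])
```

Because only the feature list changes between runs, model structure, validation folds and metrics stay identical across cut-offs.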

3. Results

3.1. Crop Distribution

The distribution of crop types followed consistent patterns throughout the study period (2020–2024). Pasture was the dominant class in all seasons, accounting for 38% of the total paddock area each year (Table 3). This reflects the prevalence of mixed farming systems across the southwest agricultural region, where pasture plays a key role in crop-livestock rotations and long-term soil management.

The second most prevalent crop class was wheat, consistently accounting for 27–29% of the annual paddock composition. Its relatively stable spatial extent across seasons makes wheat a consistent contributor to overall classification performance. Canola exhibited higher inter-annual variability than the pasture and wheat classes, with its proportional area increasing from approximately 9% in 2020 to a peak of 17% in 2022, before stabilising around 14% in subsequent years. This expansion is consistent with longer-term trends toward increased canola adoption in higher rainfall zones. Barley maintained representation across all seasons, accounting for 10–14% of the paddock area. Lupins and fallow were consistently minor classes, each contributing a small proportion of the dataset (≤6.2%). The limited representation of lupins and fallow compared with the dominant wheat and pasture classes highlights the class imbalance inherent in large-scale agricultural landscapes and presents a more challenging classification problem than balanced experimental datasets.

The observed class distribution provides important context for interpreting classification results presented in subsequent sections. Dominant classes, such as pasture and wheat, benefit from larger training samples, whereas minority classes, e.g., lupins and fallow, are more sensitive to inter-annual variability and to spectral overlap with other crop types. The use of LOYOCV ensures that model performance reflects temporal generalisation rather than year-specific class distributions.

3.2. Seasonal Vegetation Index (VI) Signature and Phenological Behaviour

Seasonal trajectories of VIs revealed distinct phenological patterns among major crop classes, providing a clear basis for temporal crop discrimination. Figure 3, Figure 4 and Figure 5 show representative seasonal profiles for key VIs, highlighting both inter-crop separability and inter-annual variability across the 2020–2024 growing seasons.

Figure 3 shows the EVI2 seasonal overlay of barley, wheat and canola. Both cereals exhibit a classic bell-shaped growth curve, with the onset of vegetative growth, the ascending phase, beginning in June, followed by a peak around late August (approximately August 20–30), and a decline thereafter toward harvest in late October. Peak EVI2 values for wheat and barley typically ranged between 0.4 and 0.5, which corresponds to partial canopy closure, that is, an intermediate canopy density during heading and early grain fill. Notably, EVI2 values rarely approach 1 in agricultural systems; even dense vegetation usually peaks below ~0.7 [60]. Both cereal crops have similar curve characteristics, reflecting rapid early growth followed by a symmetric senescence period, corresponding to the tillering, flag leaf emergence, and grain fill stages that strongly influence canopy reflectance. Figure 4 and Figure 5 show that the start of the growing season, marked by the initial rise in NDVI and VCI, occurred around mid-May for both wheat and barley, although barley often showed slightly earlier canopy development and marginally higher peak values.

In contrast to the cereals, canola exhibited a broader and earlier reflectance profile. Canopy development began earlier, with peak EVI2 values reaching approximately 0.6 between 20 July and 10 August across seasons. Unlike the cereals, which show a sharp and symmetrical rise and decline, canola displayed an extended plateau associated with prolonged flowering and pod development. Importantly, canola also senesced earlier, with the decline in EVI2 beginning soon after peak flowering, well before senescence was apparent in wheat and barley. This broader peak and earlier onset of senescence produced a distinct temporal signature that differentiated canola from cereal crops, even under varying seasonal conditions (Figure 3). The NDVI and VCI curves for canola show the start of the season in early May, with the point of maximum curvature before the peak lying between early and mid-July (Figure 4 and Figure 5).

Seasonal climate variability between 2020 and 2024 strongly influenced the magnitude and timing of these VI trajectories. For example, cooler and wetter winter conditions in 2021 promoted prolonged canopy greenness, elevating mid-season EVI2 values for cereals (Figure 3a,b). In contrast, 2023 had below-average winter rainfall across much of the grainbelt, resulting in compressed growing-season curves with lower peak VI values and an earlier onset of senescence. Break-of-season rainfall also played a major role in determining the timing of initial canopy development: years with early, consistent April–May rainfall exhibited an earlier rise in EVI2, NDVI and VCI, whereas late rainfall delayed canopy establishment regardless of crop type.

Inter-annual climatic variability influenced the magnitude and timing of VI trajectories but did not fundamentally alter their relative structure. Wetter seasons, such as 2021, were associated with elevated peak VI values and prolonged periods of high greenness, particularly for wheat, barley and canola (Figure 3). Drier or more variable seasons showed compressed growth cycles, reduced peak values, and earlier onset of senescence. Despite these shifts, the relative ordering and shape of crop-specific VI profiles remained consistent across years, indicating stable phenological signatures that are robust to moderate seasonal variability (Figure 3).

In contrast to the major grain crops (the cereals and canola), pasture, lupins, and fallow displayed markedly different VI trajectories. Annual pasture showed an early rise in greenness following autumn rainfall, reaching moderate VI values (0.3–0.4) by May–June and peaking around August with values approaching 0.7. This pattern reflects rapid biomass accumulation in mixed grass–legume systems under favourable soil moisture, followed by gradual senescence during spring. The seasonal trajectory of the lupin VIs was similar in shape to that of the cereal crops, showing a rapid increase in VI magnitude and a well-defined peak, but with a delayed onset and greater seasonal amplitude (Figure 4 and Figure 5). This indicates slower establishment but denser canopies later in winter for lupins, as demonstrated by peak VI values that frequently exceeded those of cereals, reaching 0.75–0.85 in mid to late August before declining toward maturity. Fallow paddocks maintained consistently low VI values throughout the season, typically below 0.3, reflecting limited vegetation growth.

3.3. In-Season Classification Performance and Temporal Sensitivity

Figure 6 and Table 4 report LOYOCV accuracy metrics representing agreement with the reference dataset for the LSTM and XGBoost models. Classification performance increased consistently as longer segments of the growing season were incorporated into the LSTM model, demonstrating strong temporal sensitivity (Figure 6). Across most years, F1-scores and overall accuracy (Table 4) improved from early-season (April–July) to full-season (April–October) inputs, reflecting the increasing availability of phenological information as crops progressed toward maturity. While the general trend indicates improved performance with longer observation windows, some years exhibited plateauing or slight declines at the end of the season, highlighting inter-annual variability in temporal signal quality.

Using the full growing season (April–October), both XGBoost and LSTM models achieved high classification accuracy, with overall accuracies exceeding 90% across the LOYOCV experiments (Table 4). XGBoost consistently achieved higher accuracy than LSTM, particularly under truncated early-season conditions (Figure 6, Table 4), indicating robust predictions when limited temporal information was available. The performance of XGBoost relies on the summary features derived from the VI time series, which tend to be more robust to noise, missing observations, and irregular temporal sampling. In contrast, LSTM models depend on learning temporal patterns directly from sequential inputs and may be more sensitive to architectural choices, limited hyperparameter tuning, class imbalance or noise in training data. LSTM performance improved as additional time steps were included, reflecting its capacity to exploit longer sequential phenological patterns. Additionally, the use of paddock-level averaging may reduce within-field temporal variability, thereby diminishing the advantage of sequence-based models [71].

The temporal truncation experiments showed a clear inflection point in classification performance during late winter. When VI time series were truncated at the end of July, both models maintained reasonable accuracy, with overall accuracy exceeding 80%; however, substantial improvements were observed when data through August were included, with accuracy increasing to approximately 90% for XGBoost and 88% for LSTM. Beyond this point, gains from additional late-season data diminished, suggesting that key information was already captured by late August to early September.

Inter-annual variability affected early-season model performance, particularly when limited observations and reduced spectral separability constrained crop discrimination. Lower early-season accuracy was observed across multiple years (e.g., 2020, 2021, and 2024; Figure 6), indicating that this pattern is not attributable to a single season but reflects a broader limitation of early phenological stages, when crops exhibit similar spectral characteristics. In some seasons, such as 2021, persistent winter cloud cover likely further reduced temporal data density and contributed to lower performance during early to mid-season periods. However, once observations from August onwards were included, when cloud frequency typically decreases and phenological divergence between crop types increases, classification performance improved and became more consistent across years.

Overall, the observed error patterns are consistent with known limitations of multispectral VIs for separating crop types with similar growth habits. The dominance of wheat–barley confusion highlights the challenge of separating cereal crops using optical indices alone, particularly during early to mid-season stages. These results underscore the importance of incorporating phenologically informative temporal windows and motivate the use of explainable analyses to identify when crop separability is maximised, as explored in the subsequent section.

3.3.1. Crop Specific Errors and Confusion Patterns

Class-specific accuracy metrics and confusion matrices based on the reference-labelled dataset revealed systematic patterns of misclassification that are consistent with the observed phenological overlap among crop types (Figure 7 and Figure 8). While overall classification performance was high, errors were not evenly distributed across classes, with the highest uncertainty observed among spectrally and phenologically similar crops.

Canola consistently exhibited the lowest commission and omission errors across all temporal windows (Figure 7). When the full growing season was used, both models achieved commission and omission errors below 5% for canola. Even under a truncated observation period (April–July), canola errors remained below 13%, indicating strong separability from other crop types early in the season. These results suggest that canola’s distinct phenological pattern allows for early-season classification with minimal confusion. This is consistent with its characteristic early and broad VI peaks, which produce a clear spectral signature (i.e., earlier canopy closure, higher peak VI values, and an extended flowering period) that sets it apart from cereals even before full maturity.

In contrast, the errors in wheat classification show a stronger dependence on the length of the temporal input data. For the LSTM model, total error rates increased from approximately 10% using full season data to nearly 19% when only data up to July were included. Omission errors dominated this increase, indicating that wheat paddocks were more frequently misclassified as other crops under early-season conditions. XGBoost exhibited higher temporal stability, maintaining total error rates of approximately 10% across most observation periods, although confusion with barley remained evident. These results suggest that wheat identification relies heavily on late-season phenological features associated with grain filling and early senescence.

Figure 8 shows the confusion matrices from the XGBoost model for different observation periods. Because the overall accuracy of XGBoost was higher than that of LSTM, confusion matrices are presented for this higher-performing model. The barley class showed the highest classification uncertainty across both models. The confusion matrices indicate that approximately 30–40% of barley paddocks were misclassified as wheat, particularly when training data were truncated before August (Figure 8). The confusion persisted at reduced levels (i.e., 0.90 precision) even when full-season data were available. The high degree of misclassification reflects the overlap in canopy structure, chlorophyll dynamics, and phenology between wheat and barley, as demonstrated by their closely aligned VI trajectories (Section 3.2/Figure 8). Additional confusion was observed between barley and annual pasture during mid-season, when pasture reached peak reflectance and exhibited VI values comparable to sparse or early-senescing cereal canopies.

Minor classes displayed contrasting classification outcomes. Lupins were generally well classified once sufficient mid- to late-season data were available, achieving precision values exceeding 90% by August. This performance reflects lupins’ delayed onset and higher peak VI values relative to cereals. Pasture exhibited high precision and recall throughout most of the season (0.95 and 0.97, respectively), although limited confusion with cereals occurred during early establishment and late senescence phases (Figure 4 and Figure 5). Fallow paddocks did not exhibit strong seasonal VI peaks and showed lower amplitude across the season, making them largely separable from other classes. Misclassification of fallow mainly occurred when pasture or cereal fields were sparsely vegetated, reducing spectral contrast between classes. This is supported by observed confusion patterns (Figure 8) and is consistent with known limitations of VI-based classification, where low fractional vegetation cover and poor crop establishment can lead to spectral similarity between bare soil, stressed crops, and fallow conditions [72].

The confusion matrix from the XGBoost model further illustrates how temporal input length affects class separability. In October, when the full signal was available, canola achieved a high precision of 98%, demonstrating the model’s ability to identify canola with minimal false positives. Even with data limited to the period up to July, canola precision remained above 93%, confirming that canola can be accurately classified as early as mid-season. Wheat and barley showed greater inter-class confusion early on; misclassification rates decreased only after post-anthesis and senescence features were included. In a similar study over a region in Victoria, Australia, Nguyen et al. [73] reported LSTM performance for early-season (April–June), mid-season (April–October) and full-season (April–December) inputs, with overall accuracies of 80%, 91% and 93%, respectively. Compared with our model, the accuracies for canola, wheat, and barley reported by Nguyen et al. [73] for the April–October window were lower (88%, 86%, and 83%, respectively) and declined further for early-season classification (79%, 69%, and 65%). These differences may reflect variations in class distribution, environmental conditions, and crop phenology between the study regions, as well as differences in data sources and preprocessing approaches.

Lupins were also well classified in August, with a precision of 93%. In both models, pasture and fallow exhibited contrasting classification behaviours, with a precision of 95% and 85% in August, respectively. Minor confusion mainly involved wheat and pasture, especially early in the season when pasture greenness overlapped with emerging cereals, and later when pasture’s VI values resembled those of sparsely vegetated surfaces. Fallow had lower precision at 85% but consistently showed a low VI signal (below 0.3, according to Figure 4 and Figure 5) throughout the season, clearly distinguishing it from active land uses. Overall, pasture can be reliably detected throughout most of the season, while fallow, despite its spectral distinctness, can occasionally be confused with degraded or senescing vegetation under variable growing conditions.

3.3.2. Explainable AI Analysis of Temporal Importance

To interpret the drivers of model performance and identify phenologically critical periods for crop discrimination, SHAP analysis was conducted on the XGBoost model trained across different temporal windows. The SHAP value measures the contribution of each feature, defined by its VI and time interval, to the model’s prediction, allowing direct interpretation of when and why particular features influence crop classification outcomes. A positive SHAP value indicates that the feature increases the model’s likelihood of predicting the target crop class, thereby supporting that class. Conversely, a negative SHAP value indicates that the feature decreases the likelihood of that crop class and may favour a different class. The colour represents the feature value, that is, whether a paddock exhibits a high or low value for the specific VI (Figure 9). Feature names reported in SHAP outputs follow the form VI_dXX, where VI denotes the vegetation index and dXX denotes the sequential 10-day composite window within the annual series retained for analysis (e.g., ndvi_d10 = NDVI for the tenth 10-day window, corresponding approximately to early April after exclusion of January–March and November–December windows) (Table S1).
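
To make the VI_dXX naming concrete, the sketch below maps a feature name to approximate calendar dates. It assumes that windows are consecutive 10-day blocks counted from 1 January (window 1 = 1–10 January); this is an assumption that happens to match the "d10 ≈ early April" example in the text, and Table S1 remains the authoritative mapping. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple


def parse_feature(name: str) -> Tuple[str, int]:
    """Split a name such as 'ndvi_d10' into ('ndvi', 10)."""
    vi, window = name.rsplit("_d", 1)
    return vi, int(window)


def window_dates(window: int, year: int, window_days: int = 10) -> Tuple[date, date]:
    """Approximate start and end date of a sequential window.

    Assumes windows are counted from 1 January in fixed 10-day blocks.
    """
    start = date(year, 1, 1) + timedelta(days=(window - 1) * window_days)
    end = start + timedelta(days=window_days - 1)
    return start, end


for feature in ("ndvi_d10", "evi2_d25"):
    vi, window = parse_feature(feature)
    start, end = window_dates(window, 2023)
    print(feature, vi, start.isoformat(), end.isoformat())
```

Under this assumption, window 25 in a non-leap year spans 29 August to 7 September, in line with the late August to early September period highlighted by the SHAP results. In leap years the dates shift one day earlier.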

The SHAP analysis showed a strong concentration of feature importance in late August to early September (d25; Table S1). Across all temporal configurations that included this window, VIs centred on late August ranked as the most influential predictors, with EVI2, Vi₃ and VCI at d25 consistently ranked as the top three features (Figure 9). When models were trained using data truncated before August (i.e., up to July), the contribution of these features disappeared, and overall classification accuracy declined substantially (0.88 to 0.82, Table 4). SHAP values for earlier-season indices were more diffuse and of lower magnitude, indicating weaker and less consistent discriminatory power during early vegetative stages. Conversely, features derived from late September and October exhibited negligible SHAP values, suggesting that post-maturity vegetation signals contribute little additional information for crop differentiation once senescence is well underway.
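
One common way to turn per-sample SHAP values into a feature ranking is the mean absolute SHAP value per feature. The sketch below illustrates that summary on a small hypothetical matrix; whether the study ranked features in exactly this way is an assumption, and the numbers are invented for the example rather than taken from Figure 9.

```python
from typing import List, Sequence, Tuple


def rank_by_mean_abs_shap(
    shap_values: Sequence[Sequence[float]], feature_names: Sequence[str]
) -> List[Tuple[str, float]]:
    """Rank features by mean |SHAP| over samples (rows = samples, columns = features)."""
    n_samples = len(shap_values)
    totals = [0.0] * len(feature_names)
    for row in shap_values:
        for j, value in enumerate(row):
            totals[j] += abs(value)  # sign is dropped: only magnitude matters here
    means = [total / n_samples for total in totals]
    return sorted(zip(feature_names, means), key=lambda pair: pair[1], reverse=True)


# Hypothetical SHAP values for three paddocks and three features.
feature_names = ["ndvi_d12", "evi2_d25", "vci_d29"]
toy_shap = [
    [-0.05, 0.40, 0.01],
    [0.10, -0.60, -0.02],
    [-0.03, 0.50, 0.00],
]

ranking = rank_by_mean_abs_shap(toy_shap, feature_names)
for name, importance in ranking:
    print(name, round(importance, 3))
```

Taking absolute values before averaging prevents positive and negative contributions from cancelling, so a feature that pushes some paddocks strongly toward a class and others strongly away from it still ranks as influential.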

The late August to early September window coincides with a phenologically important stage in the southwest agricultural region. Winter cereals typically reach peak canopy development and begin early senescence, while canola and lupins exhibit divergent canopy trajectories. During this period, wheat and barley maintain high but subtly declining greenness, while canola displays partial senescence associated with flowering completion and pod development, and lupins sustain high canopy vigour. These contrasting dynamics create maximum inter-class spectral and structural divergence, which the model exploits for classification (Figure 3, Figure 4 and Figure 5). Overall, the SHAP analysis establishes late August to early September as the most informative period for in-season crop classification.

3.4. External Test Data

To distinguish reference-product agreement from field-verified crop identification accuracy, external validation was conducted using an independent field-based test dataset derived from crop disease-surveillance records collected between 2020 and 2024. This dataset was not used during model training and provides an independent evaluation of model performance under real-world conditions, including variation in climate, management practices, and spatial distribution of crop types. The classification model was validated for only three major crop types (wheat, barley and canola) because field records for the other crop types were limited.

Using VI data truncated at the end of August, the XGBoost classification model achieved consistently strong agreement with the external test data. Canola was classified with high accuracy, achieving an F1-score of 0.91, with balanced precision (0.93) and recall (0.90). Wheat also demonstrated robust performance, with an F1-score of 0.85, reflecting reliable detection across seasons despite residual confusion with barley. Barley exhibited lower performance relative to other crops, with an F1-score of 0.65, driven primarily by misclassification as wheat. These results are consistent with the error patterns observed in the cross-validation experiments and reflect the inherent spectral and phenological similarity between the two cereal crops.
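
The reported canola scores can be checked against the definition of the F1-score as the harmonic mean of precision and recall. The snippet below is a worked arithmetic check using the rounded values quoted above; the function name is introduced only for this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Canola values as reported for the external test set (rounded to two decimals).
canola_f1 = f1_score(0.93, 0.90)
print(round(canola_f1, 2))  # 0.91
```

Because the inputs are themselves rounded, agreement to two decimals is the most that this check can confirm.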

The confusion matrix for the external test set shows that most classification errors occurred among cereal classes, while confusion between broadleaf crops and cereals was minimal (Table 5). Canola paddocks were rarely misclassified as wheat or barley, reinforcing the stability of its phenological signature across seasons. The persistence of wheat–barley confusion in the independent test set indicates that this limitation is systematic rather than specific to the dataset in this study, further supporting the interpretation that optical VIs alone provide limited separability for these crops during certain growth stages.

Overall, the external validation showed strong agreement between the model-derived classifications and the observed field records across multiple seasons and variable climatic conditions. Canola achieved the highest and most stable accuracy, reflecting its distinct phenological trajectory and clear separability across years. Wheat also performed well, although its accuracy was slightly lower than that of canola because of residual confusion with barley, consistent with the overlapping spectral and phenological characteristics observed during mid-season. Barley showed comparatively lower agreement with field records, again reflecting systematic confusion with wheat rather than year-specific effects. The results support the temporal patterns observed in the LOYOCV analysis for the three major crop types.

4. Discussion

This study demonstrates that accurate, in-season crop classification across a large and heterogeneous agricultural region is achievable using open-access Sentinel-2 data, multi-temporal VIs, and transparent machine learning methods. Integrating phenology-aware modelling with explainable AI delivered high classification accuracy and provided clear insight into when and why crop separability is maximised. The results show that late winter, specifically late August to early September, represents the most informative temporal window for reliable in-season crop discrimination in the southwest agricultural region of WA.

4.1. Phenological Timing as the Primary Driver of In-Season Accuracy

The strong dependence of classification performance on temporal coverage highlights the central role of crop phenology in remote sensing-based crop mapping. Both classification approaches exhibited steady improvements in accuracy as additional time windows of the growing season were included, with a marked inflection point once August data were incorporated. This finding aligns with established literature indicating that mid- to late-season phenological stages contain the greatest spectral separability among crops, as differences in canopy structure, chlorophyll dynamics, and senescence processes become more pronounced [74,75].

The SHAP analysis provides a mechanistic explanation for this temporal sensitivity, demonstrating that VIs centred on late August dominate model predictions. This period corresponds to peak canopy development for winter cereals and the onset of divergent senescence trajectories among wheat, barley, canola, and lupins. The results also point to the limited contribution of post-maturity signals, indicating that extending classification into late spring yields diminishing returns and reinforcing the value of targeting phenologically informative windows rather than maximising data volume.

The high accuracies achieved by the October models (90–93%) indicate strong agreement between the predicted crop classes and the reference crop labels under the LOYOCV framework, supporting the temporal transferability of the classification models across different growing seasons. The performance decline observed in the 2021 season highlights the sensitivity of spectral signatures to extreme climatic conditions, where excessive rainfall likely altered the expected phenological reflectance or impacted satellite data availability due to cloud cover [76]. Despite this, the ability of both models to achieve approximately 90% accuracy by August suggests that in-season classification is viable for most crop types well before harvest.

Together, these results identify late August to early September as the most stable and phenologically informative period for in-season crop discrimination in the southwest agricultural region of WA. This window captures maximum divergence in crop growth trajectories while still providing sufficient lead time for operational applications such as yield forecasting, disease surveillance, and regional production assessment.

4.2. Crop-Specific Performance and Sources of Uncertainty

Classification performance varies systematically among crop types, reflecting known agronomic and phenological similarities. Canola was consistently well classified across all temporal windows, including early- to mid-season periods, due to its distinctive growth pattern, earlier canopy closure, and extended flowering phase [7,77]. This robustness suggests strong potential for early in-season detection of canola, which is particularly valuable for yield forecasting, disease surveillance, and market intelligence [73,78,79].

In contrast, wheat and barley exhibited persistent confusion across models and validation scenarios. This limitation reflects the substantial overlap in their canopy architecture, growth timing, and spectral signatures, especially during the vegetative and early reproductive stages. The persistence of wheat–barley confusion under independent external validation confirms that this challenge is structural rather than bound to the study dataset. Similar limitations have been widely reported in optical remote sensing studies and highlight the constraints of relying solely on multispectral VIs for cereal discrimination [4,80]. Addressing this challenge will likely require integrating additional information sources, such as shortwave infrared bands, radar backscatter, or ancillary agronomic data.

Minor classes such as lupins, pasture, and fallow displayed more variable but generally interpretable performance. Lupins benefited from their delayed phenology and higher peak VI values, enabling reliable classification once sufficient mid-season data were available. Pasture and fallow exhibited distinct temporal signatures but occasionally overlapped with cropped paddocks under conditions of sparse canopy cover or stress, underscoring the influence of seasonal variability on class separability. Overall, crop-specific results indicate that the main residual errors were associated with spectrally and phenologically similar cereal crops and with seasonal variability affecting minor or more heterogeneous classes.

4.3. Implications for Operational Crop Mapping

The identification of a stable, phenologically meaningful in-season window has direct implications for operational agricultural monitoring. The ability to generate reliable crop maps by late August provides a practical lead time for applications such as yield forecasting, biosecurity monitoring, and regional production assessments. Importantly, the consistency of this window across contrasting seasonal conditions suggests that the framework is resilient to inter-annual climate variability, a critical requirement for deployment in Mediterranean cropping systems such as the southwest of WA.

Beyond immediate seasonal applications, reliable crop type mapping enables the development of longer-term temporal records of cropping patterns [35]. Multi-year crop maps can support monitoring of crop rotations, changes in cropping intensity, and the expansion or contraction of certain crop types across regions [15]. The availability of consistent crop type information across multiple seasons further supports risk management and policy development, including monitoring shifts in land use, identifying emerging production zones, and assessing regional exposure to pests, diseases, or extreme climate events. For example, multi-year crop distribution datasets can help track host crop availability for disease modelling, support biosecurity surveillance, and improve the targeting of extension and advisory services.

The use of explainable AI further supports the operational relevance of the framework. The approach supports transparency, validation, and stakeholder confidence by explicitly linking model predictions to specific VIs and time periods. This interpretability is particularly important in public-sector and research contexts, where black-box models can limit trust and uptake despite strong predictive performance [81,82].

4.4. Limitations and Future Directions

Several limitations should be acknowledged, and each suggests a clear direction for future work. First, the reliance on optical VIs may constrain the separability of crops with similar phenological behaviour, most notably wheat and barley. Furthermore, the use of 10-day median composites with linear interpolation may smooth short-term phenological variation in the reconstructed time series. Future work could address these issues by incorporating complementary data sources such as Sentinel-1 synthetic aperture radar (SAR), shortwave infrared bands, soil type, and thermal time accumulation. Sentinel-1 SAR would be particularly useful because it is less affected by cloud cover than optical imagery and can provide additional information on crop structure and moisture conditions, thereby reducing reliance on optical gap-filling while improving discrimination among crops with similar spectral and phenological behaviour. Future studies should also assess the sensitivity of classification performance to alternative gap-filling and time-series reconstruction approaches, including different compositing windows, non-interpolated time series, Savitzky–Golay filtering [83], and harmonic smoothing [84].

Second, supervised training and validation were conducted using an operational crop-mapping product rather than field-level ground observations, introducing potential uncertainty in class labels. This approach is pragmatic for large-scale studies, and its effect was mitigated through multi-year cross-validation and external accuracy testing, but residual label error cannot be ruled out. Future work should therefore prioritise expanded field-based validation datasets across additional crops, seasons, and regions to better quantify label uncertainty and strengthen confidence in crop-specific accuracy estimates. Importantly, although the training labels are licensed, the satellite data, feature-generation workflow, and modelling framework are based entirely on open-access data and tools.
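The multi-year cross-validation referred to here is the leave-one-year-out (LOYOCV) design. A minimal sketch of how such folds can be generated is given below; the function name `leave_one_year_out` and the sample list are hypothetical and only illustrate the splitting logic, not the study's actual code.

```python
def leave_one_year_out(years):
    """Yield (held_out_year, train_indices, test_indices) for each distinct year."""
    for held_out in sorted(set(years)):
        train = [i for i, y in enumerate(years) if y != held_out]
        test = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train, test

# Hypothetical paddock-level samples: one season label per sample.
sample_years = [2020, 2020, 2021, 2022, 2022, 2023, 2024, 2024]
folds = list(leave_one_year_out(sample_years))

for held_out, train, test in folds:
    # Each model is trained on four seasons and evaluated on the unseen one.
    print(held_out, len(train), len(test))
```

Because every test fold contains a season the model has never seen, agreement scores from this design reflect temporal transferability rather than within-season fit.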

Future work could also explore newer representation-learning products such as Geospatial Embedding Models (AlphaEarth and BetaEarth), which provide a unified representation of the terrestrial land surface by integrating diverse data sources, including optical imagery, radar, terrain, and climate information [85,86]. In practical terms, these embeddings could be extracted and aggregated to paddock scale and combined with the existing vegetation-index features before classification. Their expected benefit would be to provide additional landscape, structural, and environmental context that may improve discrimination among spectrally similar crops, particularly wheat and barley, and support more robust mapping of under-represented classes. However, their use should be evaluated carefully through staged experiments, such as VIs only and VIs plus embeddings, because embedding dimensions are not directly interpretable and annual embedding products may need to be assessed for compatibility with in-season prediction.
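The staged comparison suggested above could be set up along the following lines. This is a sketch under stated assumptions: the helper names, the three-value vectors, and the use of a simple per-paddock mean are all illustrative choices, and real embedding products have their own dimensionality and access methods that are not shown here.

```python
from statistics import fmean

def paddock_mean(pixel_vectors):
    """Average per-pixel embedding vectors into one paddock-level vector."""
    return [fmean(dim) for dim in zip(*pixel_vectors)]

def build_feature_sets(vi_features, pixel_embeddings):
    """Return the two staged feature sets: VIs only, and VIs plus embeddings."""
    embedding = paddock_mean(pixel_embeddings)
    return {
        "vi_only": list(vi_features),
        "vi_plus_embedding": list(vi_features) + embedding,
    }

# Toy inputs for a single paddock; all numbers are made up.
vi_features = [0.31, 0.52, 0.74]                          # e.g. three composite VI values
pixel_embeddings = [[0.1, -0.2, 0.4], [0.3, 0.0, 0.2]]    # two pixels, three dimensions
feature_sets = build_feature_sets(vi_features, pixel_embeddings)
```

Training the same classifier on each feature set under the same validation folds would isolate the contribution of the embeddings from that of the vegetation indices.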

From a modelling perspective, future work could improve performance for under-represented crops such as lupins and fallow by combining class-imbalance mitigation strategies with more advanced temporal modelling approaches. For example, targeted sampling, class weighting, or additional field observations could improve the representation of minority classes. In parallel, sequence-learning extensions such as BiLSTM, attention-based LSTM, temporal masking, and spatial–temporal integration, together with broader tuning of hidden units, layer depth, dropout, and learning rate, could improve the model’s ability to capture complex crop phenological trajectories under variable seasonal conditions. In this study, the LSTM model was used as a literature-guided temporal deep-learning baseline, and these extensions provide a clear pathway for further model development.
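As an illustration of the class-weighting option, the sketch below derives inverse-frequency weights from the 2020 training shares reported in Table 3. The weighting rule (total divided by the number of classes times the class share) is a common "balanced" heuristic and is an assumption here; the study does not state that such weights were applied.

```python
# Training shares (%) for 2020, taken from Table 3.
shares_2020 = {
    "Barley": 11.9, "Canola": 9.2, "Fallow": 2.2,
    "Lupins": 4.5, "Pasture": 42.7, "Wheat": 29.6,
}

def balanced_class_weights(shares):
    """Inverse-frequency weights: total / (n_classes * class_share)."""
    total = sum(shares.values())
    n_classes = len(shares)
    return {name: total / (n_classes * share) for name, share in shares.items()}

weights = balanced_class_weights(shares_2020)

# Print classes from most to least up-weighted.
for name, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {weight:.2f}")
```

Under this rule each class contributes the same total weight to the loss, so fallow and lupins are up-weighted strongly while pasture and wheat are down-weighted.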

Extending the framework across additional growing seasons and regions would further strengthen generalisability and support the development of nationally consistent in-season crop-mapping products. Incorporating spatial stratification or climate-zone-specific parameterisation by calibrating models separately for the northern, central, and southern grainbelt regions of WA could also enable prediction windows to adjust across climatic gradients, improving regional applicability. While this study focuses on the southwest agricultural region of WA, the findings are relevant to other Mediterranean and temperate cropping systems characterised by winter-dominant rainfall and broadacre cereal production. The emphasis on phenology-aware modelling, temporal validation, and explainability provides a transferable template for developing transparent, in-season crop monitoring systems using open-access satellite data.

Looking ahead, the insights from this study suggest clear opportunities for developing an in-season crop monitoring pipeline across a larger extent of southwest WA. Since canola accuracy remains high even in early growth stages, future applications could focus on translating these models into real-time, near-real-time, or early-season mapping tools for operational decision making. Such systems could support disease surveillance, nutrient management, and harvest planning, where early and accurate detection of crop type enables timely intervention. By coupling explainable AI models with continuous Earth Observation data streams, future research can advance toward dynamic in-season crop mapping and improve regional-scale agronomic decision-support systems for public use.

5. Conclusions

This study shows that in-season crop classification in the southwest agricultural region of WA can achieve strong performance before harvest when multi-temporal Sentinel-2 vegetation-index data are available through August. Across five growing seasons, the analysis showed a clear late-winter improvement in class separability, and SHAP analysis indicated that vegetation-index observations centred on late August to early September contributed most strongly to model predictions. Independent field-based testing provided additional validation for the three externally observed crops: canola, wheat, and barley. These results supported the main temporal pattern observed in the LOYOCV analysis, with canola separable earlier and more consistently than the cereal classes, whereas wheat–barley confusion remained the dominant residual limitation.

The framework demonstrated strong temporal transferability under contrasting seasonal conditions when evaluated against the reference dataset, and field-based testing further supported its utility for the three major externally observed crops. The results identify a biologically grounded decision window for operational in-season mapping and provide a multi-year benchmark for testing temporal transferability in Mediterranean broadacre systems. The framework is most immediately useful as a reproducible Sentinel-2-based workflow for regional crop monitoring, while future work should focus on expanding field-based validation across all crop classes, reducing cereal confusion, testing preprocessing sensitivity, and integrating complementary data sources where earlier or more robust discrimination is required.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs18101653/s1, Table S1: Sentinel-2 composite days and acquisition dates for the years 2020–2024; Table S2: Vegetation indices, with formulas, used as features in the model.

Author Contributions

Conceptualisation, S.S., R.P. and H.E.; methodology, S.S.; software, S.S.; validation, S.S.; formal analysis, S.S. and R.P.; investigation, S.S.; resources, S.S., R.P. and H.E.; data curation, S.S.; writing—original draft preparation, S.S. and R.P.; writing—review and editing S.S., R.P., H.E., B.S. and H.T.; visualisation, S.S., R.P. and H.E.; supervision, B.S. and H.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets presented in this article are not readily available because the crop data are proprietary and provided as a commercial product.

Acknowledgments

The authors would like to acknowledge the Department of Primary Industries and Regional Development (DPIRD) and Grains Research and Development Corporation (GRDC) funding projects DAW1907-002RTX, DAW2104-003RTX and DAW2404-005RTX, and the team for providing the valuable dataset used in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. GIWA Crop Report—February 2025|Grain Industry Association of Western Australia. Available online: https://www.giwa.org.au/wa-crop-reports/2024-season/giwa-crop-report-february-2025/ (accessed on 4 January 2026).
2. Heupel, K.; Spengler, D.; Itzerott, S. A Progressive Crop-Type Classification Using Multitemporal Remote Sensing Data and Phenological Information. PFG—J. Photogramm. Remote Sens. Geoinf. Sci. 2018, 86, 53–69.
3. Lira Melo de Oliveira Santos, C.; Augusto Camargo Lamparelli, R.; Kelly Dantas Araújo Figueiredo, G.; Dupuy, S.; Boury, J.; Luciano, A.C.d.S.; Torres, R.d.S.; le Maire, G. Classification of Crops, Pastures, and Tree Plantations along the Season with Multi-Sensor Image Time Series in a Subtropical Agricultural Region. Remote Sens. 2019, 11, 334.
4. Faqe Ibrahim, G.R.; Rasul, A.; Abdullah, H. Improving Crop Classification Accuracy with Integrated Sentinel-1 and Sentinel-2 Data: A Case Study of Barley and Wheat. J. Geovis. Spat. Anal. 2023, 7, 22.
5. Rose, D.C.; Chilvers, J. Agriculture 4.0: Broadening Responsible Innovation in an Era of Smart Farming. Front. Sustain. Food Syst. 2018, 2, 87.
6. Santos Valle, S.; Kienzle, J. Agriculture 4.0—Agricultural Robotics and Automated Equipment for Sustainable Crop Production. In Integrated Crop Management, 1st ed.; FAO: Rome, Italy, 2020.
7. Zhang, C.; Di, L.; Lin, L.; Li, H.; Guo, L.; Yang, Z.; Yu, E.G.; Di, Y.; Yang, A. Towards Automation of In-Season Crop Type Mapping Using Spatiotemporal Crop Information and Remote Sensing Data. Agric. Syst. 2022, 201, 103462.
8. Zhai, Z.; Martínez, J.F.; Beltran, V.; Martínez, N.L. Decision Support Systems for Agriculture 4.0: Survey and Challenges. Comput. Electron. Agric. 2020, 170, 105256.
9. Verdouw, C.; Tekinerdogan, B.; Beulens, A.; Wolfert, S. Digital Twins in Smart Farming. Agric. Syst. 2021, 189, 103046.
10. Boryan, C.; Yang, Z.; Mueller, R.; Craig, M. Monitoring US Agriculture: The US Department of Agriculture, National Agricultural Statistics Service, Cropland Data Layer Program. Geocarto Int. 2011, 26, 341–358.
11. Boryan, C.G.; Yang, Z.; Di, L.; Hunt, K. A New Automatic Stratification Method for U.S. Agricultural Area Sampling Frame Construction Based on the Cropland Data Layer. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 4317–4327.
12. Boryan, C.G.; Yang, Z.; Willis, P.; Di, L. Developing Crop Specific Area Frame Stratifications Based on Geospatial Crop Frequency and Cultivation Data Layers. J. Integr. Agric. 2017, 16, 312–323.
13. Fisette, T.; Davidson, A.; Daneshfar, B.; Rollin, P.; Aly, Z.; Campbell, L. Annual Space-Based Crop Inventory for Canada: 2009–2014. In Proceedings of the 2014 IEEE Geoscience and Remote Sensing Symposium; IEEE: New York, NY, USA, 2014; pp. 5095–5098.
14. CLMS High Resolution Layer Croplands. Available online: https://land.copernicus.eu/en/products/high-resolution-layer-croplands (accessed on 16 January 2026).
15. Zhang, C.; Kerner, H.; Wang, S.; Hao, P.; Li, Z.; Hunt, K.A.; Abernethy, J.; Zhao, H.; Gao, F.; Di, L.; et al. Remote Sensing for Crop Mapping: A Perspective on Current and Future Crop-Specific Land Cover Data Products. Remote Sens. Environ. 2025, 330, 114995.
16. Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Services. Remote Sens. Environ. 2012, 120, 25–36.
17. Orynbaikyzy, A.; Gessner, U.; Conrad, C. Spatial Transferability of Random Forest Models for Crop Type Classification Using Sentinel-1 and Sentinel-2. Remote Sens. 2022, 14, 1493.
18. Felegari, S.; Sharifi, A.; Moravej, K.; Amin, M.; Golchin, A.; Muzirafuti, A.; Tariq, A.; Zhao, N. Integration of Sentinel 1 and Sentinel 2 Satellite Images for Crop Mapping. Appl. Sci. 2021, 11, 10104.
19. Whitcraft, A.K.; Becker-Reshef, I.; Killough, B.D.; Justice, C.O. Meeting Earth Observation Requirements for Global Agricultural Monitoring: An Evaluation of the Revisit Capabilities of Current and Planned Moderate Resolution Optical Earth Observing Missions. Remote Sens. 2015, 7, 1482–1503.
20. Defourny, P.; Bontemps, S.; Bellemans, N.; Cara, C.; Dedieu, G.; Guzzonato, E.; Hagolle, O.; Inglada, J.; Nicola, L.; Rabaute, T.; et al. Near Real-Time Agriculture Monitoring at National Scale at Parcel Resolution: Performance Assessment of the Sen2-Agri Automated System in Various Cropping Systems around the World. Remote Sens. Environ. 2019, 221, 551–568.
21. Sharma, S.; Ryu, D.; C, S.K.; Lee, S.-G.; Jeong, S. Synergistic Use of Sentinel-1 and Sentinel-2 Images for in-Season Crop Type Classification Using Google Earth Engine and Machine Learning. In Proceedings of the IGARSS 2023—2023 IEEE International Geoscience and Remote Sensing Symposium; IEEE: New York, NY, USA, 2023; pp. 3498–3501.
22. Yao, J.; Wu, J.; Xiao, C.; Zhang, Z.; Li, J. The Classification Method Study of Crops Remote Sensing with Deep Learning, Machine Learning, and Google Earth Engine. Remote Sens. 2022, 14, 2758.
23. Yang, C.; Everitt, J.H.; Murden, D. Evaluating High Resolution SPOT 5 Satellite Imagery for Crop Identification. Comput. Electron. Agric. 2011, 75, 347–354.
24. Van Niel, T.G.; McVicar, T.R. Determining Temporal Windows for Crop Discrimination with Remote Sensing: A Case Study in South-Eastern Australia. Comput. Electron. Agric. 2004, 45, 91–108.
25. Cai, Y.; Guan, K.; Peng, J.; Wang, S.; Seifert, C.; Wardlow, B.; Li, Z. A High-Performance and in-Season Classification System of Field-Level Crop Types Using Time-Series Landsat Data and a Machine Learning Approach. Remote Sens. Environ. 2018, 210, 35–47.
26. Kumar, P.; Gupta, D.K.; Mishra, V.N.; Prasad, R. Comparison of Support Vector Machine, Artificial Neural Network, and Spectral Angle Mapper Algorithms for Crop Classification Using LISS IV Data. Int. J. Remote Sens. 2015, 36, 1604–1617.
27. Maleki, S.; Baghdadi, N.; Najem, S.; Dantas, C.F.; Bazzi, H.; Ienco, D. Determining Effective Temporal Windows for Rapeseed Detection Using Sentinel-1 Time Series and Machine Learning Algorithms. Remote Sens. 2024, 16, 549.
28. Shen, J.; Wang, H.; Tareque, H. Toward Resilience in Broadacre Agriculture: A Methodological Review of Remote Sensing in Crop Productivity, Phenology, and Environmental Stress Detection. Remote Sens. 2025, 17, 3886.
29. Potgieter, A.B.; Zhao, Y.; Zarco-Tejada, P.J.; Chenu, K.; Zhang, Y.; Porker, K.; Biddulph, B.; Dang, Y.P.; Neale, T.; Roosta, F.; et al. Evolution and Application of Digital Technologies to Predict Crop Type and Crop Phenology in Agriculture. in silico Plants 2021, 3, diab017.
30. Wei, P.; Ye, H.; Qiao, S.; Liu, R.; Nie, C.; Zhang, B.; Song, L.; Huang, S. Early Crop Mapping Based on Sentinel-2 Time-Series Data and the Random Forest Algorithm. Remote Sens. 2023, 15, 3212.
31. Zhong, L.; Hu, L.; Zhou, H. Deep Learning Based Multi-Temporal Crop Classification. Remote Sens. Environ. 2019, 221, 430–443.
32. Konduri, V.S.; Kumar, J.; Hargrove, W.W.; Hoffman, F.M.; Ganguly, A.R. Mapping Crops within the Growing Season across the United States. Remote Sens. Environ. 2020, 251, 112048.
33. Mahlayeye, M.; Darvishzadeh, R.; Nelson, A. Characterising Maize and Intercropped Maize Spectral Signatures for Cropping Pattern Classification. Int. J. Appl. Earth Obs. Geoinf. 2024, 128, 103699.
34. Yu, X.; Yin, Q.; Qian, L.; Zhang, C.; Shao, L.; Ran, D.; Wang, W.; Zhang, B.; Hu, X. Cross-Scale Soil Moisture Content Monitoring of Winter Wheat by Integrating UAV and Sentinel-1/2 Data. Agric. Water Manag. 2025, 320, 109831.
35. Som-ard, J.; Hossain, M.D.; Keawsomsee, S.; Suwanlee, S.R.; Veerachitt, V.; Heawchaiyaphum, P.; Puntura, A.; Izquierdo-Verdiguier, E.; Immitzer, M.; Atzberger, C. Integrating Multi-Temporal Satellite Data and Machine Learning Approaches for Crop Rotation Pattern Mapping in Thailand. Remote Sens. 2025, 17, 3156.
36. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
37. Rußwurm, M.; Körner, M. Multi-Temporal Land Cover Classification with Sequential Recurrent Encoders. ISPRS Int. J. Geo-Inf. 2018, 7, 129.
38. Kussul, N.; Lavreniuk, M.; Skakun, S.; Shelestov, A. Deep Learning Classification of Land Cover and Crop Types Using Remote Sensing Data. IEEE Geosci. Remote Sens. Lett. 2017, 14, 778–782.
39. Pelletier, C.; Webb, G.I.; Petitjean, F. Deep Learning for the Classification of Sentinel-2 Image Time Series. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium; IEEE: New York, NY, USA, 2019; pp. 461–464.
40. Lyu, H.; Lu, H.; Mou, L. Learning a Transferable Change Rule from a Recurrent Neural Network for Land Cover Change Detection. Remote Sens. 2016, 8, 506.
41. Zhao, H.; Duan, S.; Liu, J.; Sun, L.; Reymondin, L. Evaluation of Five Deep Learning Models for Crop Type Mapping Using Sentinel-2 Time Series Images with Missing Information. Remote Sens. 2021, 13, 2790.
42. Chen, B.; Zheng, H.; Wang, L.; Hellwich, O.; Chen, C.; Yang, L.; Liu, T.; Luo, G.; Bao, A.; Chen, X. A Joint Learning Im-BiLSTM Model for Incomplete Time-Series Sentinel-2A Data Imputation and Crop Classification. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102762.
43. Feng, F.; Gao, M.; Liu, R.; Yao, S.; Yang, G. A Deep Learning Framework for Crop Mapping with Reconstructed Sentinel-2 Time Series Images. Comput. Electron. Agric. 2023, 213, 108227.
44. Durrani, A.u.R.; Minallah, N.; Aziz, N.; Frnda, J.; Khan, W.; Nedoma, J. Effect of Hyper-Parameters on the Performance of ConvLSTM Based Deep Neural Network in Crop Classification. PLoS ONE 2023, 18, e0275653.
45. Lawes, R.; Mata, G.; Richetti, J.; Fletcher, A.; Herrmann, C. Using Remote Sensing, Process-Based Crop Models, and Machine Learning to Evaluate Crop Rotations across 20 Million Hectares in Western Australia. Agron. Sustain. Dev. 2022, 42, 120.
46. Bureau of Meteorology (BOM). Climate Data Online; Australian Government: Melbourne, Australia. Available online: https://www.bom.gov.au/climate/data/ (accessed on 25 December 2025).
47. McArthur, W.; Australian Society of Soil Science; Branch, W.A.; Department of Agriculture and Food. Reference Soils of South-Western Australia; Department of Primary Industries and Regional Development: Perth, Australia, 2004.
48. Waddell, P.-J.; Galloway, P. Land Systems, Soils and Vegetation of the Southern Goldfields and Great Western Woodlands of Western Australia; Technical Bulletins; Department of Primary Industries and Regional Development: Perth, Australia, 2023; Volume 2.
49. Kingwell, R.; Rice, A.; Pratley, J.; Mayfield, A.; van Rees, H. Farms and Farmers—Conservation Agriculture amid a Changing Farm Sector. In Australian Agriculture in 2020: From Conservation to Automation; Agronomy Australia and Charles Sturt University: Wagga Wagga, Australia, 2019; p. 33.
50. Australian Bureau of Agricultural and Resource Economics and Sciences (ABARES). Land Use of Australia 2021–2022 Dataset. Available online: https://www.abs.gov.au/statistics/environment/environmental-accounts/national-land-account-experimental-estimates/latest-release (accessed on 15 December 2025).
51. Li, S. Modeling the Impacts of Compound Dry and Hot Extremes on Australia’s Wheat. Ph.D. Thesis, University of Technology Sydney (Australia), Sydney, Australia, 2024.
52. Malik, R.S.; Seymour, M.; French, R.J.; Kirkegaard, J.A.; Lawes, R.A.; Liebig, M.A. Dynamic Crop Sequencing in Western Australian Cropping Systems. Crop Pasture Sci. 2015, 66, 594–609.
53. Walton, G.; Mendham, N.; Robertson, M.; Potter, T. Phenology, Physiology and Agronomy. In Canola in Australia: The First Thirty Years; NSW Department of Primary Industries: Orange, Australia, 1999; pp. 9–14.
54. McBeath, T.M.; Meier, E.A.; Ware, A.; Kirkegaard, J.; Moodie, M.; Davoren, B.; Hunt, E. Agronomic Management Combining Early-Sowing on Establishment Opportunities, Cultivar Options and Adequate Nitrogen Is Critical for Canola (Brassica napus) Productivity and Profit in Low-Rainfall Environments. Crop Pasture Sci. 2020, 71, 807–821.
55. Coppa, I.P.M. Use of Remote Sensing Data for Broad Acre Grain Crop Monitoring in Southeast Australia. Ph.D. Thesis, RMIT University, Melbourne, Australia, 2006.
56. Chen, C.; Fletcher, A.; Lawes, R.; Berger, J.; Robertson, M. Modelling Phenological and Agronomic Adaptation Options for Narrow-Leafed Lupins in the Southern Grainbelt of Western Australia. Eur. J. Agron. 2017, 89, 140–147.
57. Fowler, J.; Waldner, F.; Hochman, Z. All Pixels Are Useful, but Some Are More Useful: Efficient in Situ Data Collection for Crop-Type Mapping Using Sequential Exploration Methods. Int. J. Appl. Earth Obs. Geoinf. 2020, 91, 102114.
58. Griffiths, P.; Nendel, C.; Hostert, P. Intra-Annual Reflectance Composites from Sentinel-2 and Landsat for National-Scale Crop and Land Cover Mapping. Remote Sens. Environ. 2019, 220, 135–151.
59. Tucker, C.J. Red and Photographic Infrared Linear Combinations for Monitoring Vegetation. Remote Sens. Environ. 1979, 8, 127–150.
60. Jiang, Z.; Huete, A.R.; Didan, K.; Miura, T. Development of a Two-Band Enhanced Vegetation Index without a Blue Band. Remote Sens. Environ. 2008, 112, 3833–3845.
61. Miura, T.; Yoshioka, H.; Fujiwara, K.; Yamamoto, H. Inter-Comparison of ASTER and MODIS Surface Reflectance and Vegetation Index Products for Synergistic Applications to Natural Resource Monitoring. Sensors 2008, 8, 2480–2499.
62. Huete, A.R. A Soil-Adjusted Vegetation Index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
63. Clevers, J.G.P.W.; Gitelson, A.A. Remote Estimation of Crop and Grass Chlorophyll and Nitrogen Content Using Red-Edge Bands on Sentinel-2 and -3. Int. J. Appl. Earth Obs. Geoinf. 2013, 23, 344–351.
64. Xue, J.; Su, B. Significant Remote Sensing Vegetation Indices: A Review of Developments and Applications. J. Sens. 2017, 2017, 1353691.
65. He, B.; Zhu, W.; Zhao, C.; Xie, Z.; Zhuang, H. A Novel Index for Directly Indicating Fractional Vegetation Cover Based on Spectral Differences between Vegetation and Soil. Remote Sens. Environ. 2025, 331, 115056.
66. Ashourloo, D.; Nematollahi, H.; Huete, A.; Aghighi, H.; Azadbakht, M.; Shahrabi, H.S.; Goodarzdashti, S. A New Phenology-Based Method for Mapping Wheat and Barley Using Time-Series of Sentinel-2 Images. Remote Sens. Environ. 2022, 280, 113206.
67. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
68. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
69. Liaghat, A.-M.; Rabbaniha, H.; Kaviani, A. Comparison and Hyperparameter Optimization of Ensemble Learning Models for Soil Moisture Prediction Using Remote Sensing Data. SSRN 2025.
70. Sher, M.; Minallah, N.; Ahmad, T.; Khan, W. Hyperparameters Analysis of Long Short-Term Memory Architecture for Crop Classification. Int. J. Electr. Comput. Eng. IJECE 2023, 13, 4661–4670.
71. Huber, F.; Yushchenko, A.; Stratmann, B.; Steinhage, V. Extreme Gradient Boosting for Yield Estimation Compared with Deep Learning Approaches. Comput. Electron. Agric. 2022, 202, 107346.
72. Thenkabail, P.S.; Lyon, J.G.; Huete, A. Advances in Hyperspectral Remote Sensing of Vegetation and Agricultural Crops. In Fundamentals, Sensor Systems, Spectral Libraries, and Data Mining for Vegetation; CRC Press: Boca Raton, FL, USA, 2018.
73. Nguyen, L.H.; Robinson, S.; Galpern, P. Medium-Resolution Multispectral Satellite Imagery in Precision Agriculture: Mapping Precision Canola (Brassica napus L.) Yield Using Sentinel-2 Time Series. Precis. Agric. 2022, 23, 1051–1071.
74. Yang, Z.; Diao, C.; Gao, F. Towards Scalable Within-Season Crop Mapping with Phenology Normalization and Deep Learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 1390–1402.
75. Elders, A.; Carroll, M.L.; Neigh, C.S.R.; D’Agostino, A.L.; Ksoll, C.; Wooten, M.R.; Brown, M.E. Estimating Crop Type and Yield of Small Holder Fields in Burkina Faso Using Multi-Day Sentinel-2. Remote Sens. Appl. Soc. Environ. 2022, 27, 100820.
76. Nolasco, M.; Balzarini, M. Assessment of Temporal Aggregation of Sentinel-2 Images on Seasonal Land Cover Mapping and Its Impact on Landscape Metrics. Environ. Monit. Assess. 2025, 197, 142.
77. Liu, T.; Li, P.; Zhao, F.; Liu, J.; Meng, R. Early-Stage Mapping of Winter Canola by Combining Sentinel-1 and Sentinel-2 Data in Jianghan Plain China. Remote Sens. 2024, 16, 3197.
78. Rai, N.; Pathak, H.; Mahecha, M.V.; Buckmaster, D.R.; Huang, Y.; Overby, P.; Sun, X. A Case Study on Canola (Brassica napus L.) Potential Yield Prediction Using Remote Sensing Imagery and Advanced Data Analytics. Smart Agric. Technol. 2024, 9, 100698.
79. Cao, F.; Liu, F.; Guo, H.; Kong, W.; Zhang, C.; He, Y. Fast Detection of Sclerotinia Sclerotiorum on Oilseed Rape Leaves Using Low-Altitude Remote Sensing Technology. Sensors 2018, 18, 4464.
80. Gerstmann, H.; Möller, M.; Gläßer, C. Optimization of Spectral Indices and Long-Term Separability Analysis for Classification of Cereal Crops Using Multi-Spectral RapidEye Imagery. Int. J. Appl. Earth Obs. Geoinf. 2016, 52, 115–125.
81. Höhl, A.; Obadic, I.; Fernández-Torres, M.-Á.; Najjar, H.; Oliveira, D.A.B.; Akata, Z.; Dengel, A.; Zhu, X.X. Opening the Black Box: A Systematic Review on Explainable Artificial Intelligence in Remote Sensing. IEEE Geosci. Remote Sens. Mag. 2024, 12, 261–304.
82. Cartolano, A.; Cuzzocrea, A.; Pilato, G. Analyzing and Assessing Explainable AI Models for Smart Agriculture Environments. Multimed. Tools Appl. 2024, 83, 37225–37246.
83. Chen, J.; Jönsson, P.; Tamura, M.; Gu, Z.; Matsushita, B.; Eklundh, L. A Simple Method for Reconstructing a High-Quality NDVI Time-Series Data Set Based on the Savitzky–Golay Filter. Remote Sens. Environ. 2004, 91, 332–344.
84. Zhou, J.; Jia, L.; Menenti, M. Reconstruction of Global MODIS NDVI Time Series: Performance of Harmonic ANalysis of Time Series (HANTS). Remote Sens. Environ. 2015, 163, 217–228.
85. Brown, C.F.; Kazmierski, M.R.; Pasquarella, V.J.; Rucklidge, W.J.; Samsikova, M.; Zhang, C.; Shelhamer, E.; Lahera, E.; Wiles, O.; Ilyushchenko, S.; et al. AlphaEarth Foundations: An Embedding Field Model for Accurate and Efficient Global Mapping from Sparse Label Data. arXiv 2025, arXiv:2507.22291.
86. Fang, J.; Wu, M.; Zhang, Z.; Luo, W. Leveraging AlphaEarth Foundations Embeddings for High-Accuracy County-Scale Corn and Soybean Yield Estimation. TechRxiv 2025.

Figure 1. Location of study area within the southwestern agricultural region of Western Australia (WA).

Figure 2. Overall framework for crop type classification (in-season and post-season).

Figure 3. EVI2 seasonal overview of wheat (a), barley (b) and canola (c) for the years 2020 to 2024, along with the standard smooth curve.

Figure 4. Smoothed NDVI signature of the target crop types across the growing season, averaged over 2020–2024.

Figure 5. Smoothed VCI signature of the target crop types across the growing season, averaged over 2020–2024.

Figure 6. Heatmap showing class-specific F1-scores from the LSTM model for each held-out year under the LOYOCV framework.

Figure 7. Comparison of error rates for XGBoost and LSTM across different growing-season observation windows.

Figure 8. Confusion matrices from final XGBoost models trained using different observation periods: (a) April–October; (b) April–August. Models were trained using the full 2020–2024 reference-labelled dataset, and the accompanying tables show class-specific agreement metrics for each crop type.

Figure 9. SHAP scores showing the most influential features in the XGBoost models trained on different observation periods.

Table 1. Typical dates for major phenological stages of barley, canola, lupins and wheat in the southwest agricultural region [51,52,53,54,55,56]. Dates indicate the range of reported values.

Crops	Sowing	Flowering	Maturity	Harvest
Barley	Late April–Mid June	August–September	September–October	October–November
Canola	Early April–Late May	July–August	August–September	October–November
Lupins	Mid April–Late May	Late July–September	September–October	October–November
Wheat	Mid April–Early June	August–September	September–October	October–December

Table 2. Accuracy metrics used for machine learning model evaluation.

Metric	Formula	Description
Accuracy	$\frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}$	Proportion of correct predictions among all predictions
Precision	$\frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$	Proportion of samples predicted as a class that truly belong to it
Recall	$\frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$	Proportion of true members of a class that the model detects
Commission error	$\frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TP}}$	Proportion of samples assigned to a class that belong to another class
Omission error	$\frac{\mathrm{FN}}{\mathrm{FN} + \mathrm{TP}}$	Proportion of true members of a class that are assigned to another class
F1-score (F-measure)	$\frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$	Harmonic mean of precision and recall

TP: true positive; TN: true negative; FP: false positive; FN: false negative.

Table 3. Yearly percentage (%) of each crop type used for training from DAS.

Year	Barley	Canola	Fallow	Lupins	Pasture	Wheat
2020	11.9	9.2	2.2	4.5	42.7	29.6
2021	13.5	13.5	1.2	4.2	39.4	28.2
2022	11.0	17.1	1.7	3.3	38.3	28.5
2023	10.7	14.7	6.2	2.8	38.6	27.1
2024	14.1	13.2	1.4	3.6	38.4	29.3

Table 4. Overall agreement (overall accuracy) of XGBoost and LSTM models with the reference dataset across different growing-season observation windows.

Observation Period	XGBoost	LSTM
April–October	0.928	0.909
April–September	0.917	0.903
April–August	0.900	0.883
April–July	0.862	0.824

Table 5. Confusion matrix and accuracy metrics for the independent field-based test dataset using the XGBoost model.

Actual \ Predicted	Canola	Barley	Wheat
Canola	102	7	1
Barley	5	49	27
Wheat	6	13	130
Total samples: 340

Crop	F1-score	Precision	Recall
Canola	0.91	0.93	0.90
Wheat	0.85	0.82	0.87
Barley	0.65	0.71	0.61
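
As a worked check of the Table 2 formulas, the sketch below recomputes per-class precision, recall and F1-score from the Table 5 confusion matrix, reading rows as actual classes and columns as predicted classes as labelled. The function name `per_class_metrics` is illustrative. Under this reading the wheat and barley values and all three F1-scores reproduce the reported figures, whereas canola precision and recall come out as 0.90 and 0.93, the reverse of the order printed in the table; this is an observation from the arithmetic alone and has not been confirmed against the authors' data.

```python
labels = ["Canola", "Barley", "Wheat"]

# Rows: actual class; columns: predicted class (counts from Table 5).
matrix = [
    [102, 7, 1],
    [5, 49, 27],
    [6, 13, 130],
]

def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f1)} from a square confusion matrix."""
    metrics = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # actual k, predicted as another class
        fp = sum(row[k] for row in matrix) - tp   # predicted k, actually another class
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        metrics[label] = (precision, recall, f1)
    return metrics

total = sum(sum(row) for row in matrix)
overall_accuracy = sum(matrix[k][k] for k in range(len(labels))) / total
metrics = per_class_metrics(matrix, labels)

for label, (precision, recall, f1) in metrics.items():
    print(f"{label:7s} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

The same counts give an overall accuracy of 281/340 for the three field-observed classes, and the off-diagonal barley and wheat cells show directly where the cereal confusion arises.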

