Crop Type Mapping in an Irrigation District Using Multi-Source Remote Sensing and LSTM-Based Time Series Analysis

Shi, Sensen; Liu, Quanming; Yan, Zhiyuan

doi:10.3390/agriculture16090920

Open Access Article

by Sensen Shi, Quanming Liu * and Zhiyuan Yan *

College of Water Conservancy and Civil Engineering, Inner Mongolia Agricultural University, Hohhot 010018, China

* Authors to whom correspondence should be addressed.

Agriculture 2026, 16(9), 920; https://doi.org/10.3390/agriculture16090920

Submission received: 12 March 2026 / Revised: 14 April 2026 / Accepted: 14 April 2026 / Published: 22 April 2026

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

Abstract

Fine-scale crop type information is essential for agricultural monitoring, irrigation management, and food security assessment. This study mapped three major crops—wheat, corn, and sunflower—in the Hetao Irrigation District, China, using multi-temporal Sentinel-2 optical imagery and Sentinel-1 SAR observations at the parcel scale. A multi-source feature set, including spectral bands, vegetation and red-edge indices, moisture-related variables, radar backscatter coefficients, and derived radar features, was constructed from the full growing season. An LSTM network was used to learn temporal representations of crop phenological dynamics, and the resulting embeddings were then combined with traditional machine learning classifiers, including Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost), for final classification. The results show that the hybrid framework substantially improves classification performance compared with the corresponding non-LSTM classifiers. Among all tested models, XGBoost + LSTM achieved the best performance, with an overall accuracy of 93.61%, a Kappa coefficient of 91.66%, and a mean IoU of 87.41%. The class-wise F1-scores were 85.61% for wheat, 97.22% for corn, and 87.27% for sunflower. Additional experiments further confirmed the advantages of parcel-based aggregation in improving spatial consistency and reducing mixed-field noise. The proposed framework provides a promising parcel-scale workflow for crop type mapping in fragmented irrigation districts, while its transferability across years and regions still requires further validation.

Keywords: crop planting structure; multi-source remote sensing; temporal feature learning; LSTM; parcel-scale classification; Hetao Irrigation District

1. Introduction

Farmland mapping based on remote sensing imagery plays a crucial role in agricultural monitoring and management [1]. Accurate crop distribution information supports food security assessment, irrigation planning, and precision agriculture [2,3]. Compared with traditional field surveys, remote sensing provides efficient, objective, and spatially continuous observations, enabling large-scale monitoring of crop dynamics and planting structures [4,5].

With the rapid development of Earth observation missions such as Sentinel-1 and Sentinel-2, agricultural monitoring capabilities have significantly improved [5,6]. Optical multispectral imagery provides rich spectral information related to vegetation biophysical properties, including chlorophyll content, canopy density, and biomass. However, optical observations are frequently affected by cloud cover and atmospheric conditions. Synthetic Aperture Radar (SAR) imagery provides all-weather observation capability and is sensitive to vegetation structure, soil moisture, and canopy water content [7,8]. Therefore, the integration of SAR and optical data has become an effective strategy for improving crop classification accuracy [9,10].

Time-series analysis further enhances crop identification using remote sensing data. Different crops exhibit distinctive phenological patterns throughout the growing season, which can be captured using multi-temporal satellite observations [11,12]. Previous studies have shown that vegetation indices and radar backscatter signals present unique temporal trajectories for different crop types. However, traditional machine learning approaches usually treat multi-temporal observations as independent features, ignoring the temporal dependencies embedded in crop growth processes [13,14,15].

Recent studies have further demonstrated the value of machine learning and deep learning for crop mapping using Sentinel time series, including object-oriented classification with SNIC and conventional classifiers, deep temporal models based on reconstructed Sentinel-2 sequences, and large-scale Sentinel-1/Sentinel-2 crop type classification workflows implemented on cloud platforms [15,16,17,18].

Deep learning models such as Long Short-Term Memory (LSTM) networks can effectively model sequential data and capture temporal relationships [19]. Several studies have successfully applied LSTM models to crop classification based on satellite time series [20,21]. Nevertheless, purely deep learning frameworks typically require large labeled datasets and incur high computational cost, both of which limit their applicability in many regional agricultural studies.

Another challenge arises from the spatial unit used for classification. Most existing crop classification studies operate at the pixel level. However, farmland parcels represent the actual agricultural management units. Pixel-based classification often suffers from mixed-pixel effects and salt-and-pepper noise, particularly in fragmented agricultural landscapes and smallholder parcel systems [16,22,23]. These limitations reduce the practical value of classification results for agricultural management and decision-making.

Despite recent advances in crop mapping, several limitations remain in fragmented irrigation districts. First, many studies still rely on pixel-based classification, which is prone to mixed-pixel effects and salt-and-pepper noise in heterogeneous agricultural landscapes. Second, end-to-end deep learning methods often require large and well-distributed labeled datasets and may be less robust when parcel boundaries are narrow or poorly represented at medium spatial resolution. Third, although Sentinel-1 and Sentinel-2 fusion has been widely explored, the temporal mismatch between optical and SAR observations and its influence on parcel-scale crop classification remain insufficiently discussed.

Accordingly, the contribution of this study does not lie in proposing an entirely new standalone algorithm, but in establishing and validating a parcel-scale workflow tailored to fragmented irrigated cropland under limited training samples. This workflow integrates pseudo-parcel construction, temporal alignment of asynchronous optical and SAR observations, and LSTM-based temporal representation learning, and is evaluated through ablation experiments, feature–source comparison, pixel-versus-parcel comparison, and SNIC sensitivity analysis. Therefore, the novelty of this work lies in the problem-oriented integration and systematic validation of the framework in a challenging irrigation-district setting.

2. Materials and Methods

2.1. Study Area

As shown in Figure 1, the Hetao Irrigation District is located on the alluvial plain of the middle and upper reaches of the Yellow River in Inner Mongolia, China. It is a typical irrigated agricultural ecosystem with saline-alkali soil and scattered farmland plots [24,25,26]. The Hetao Irrigation District has a typical temperate continental arid and semi-arid climate. According to the Hetao Irrigation District Management Bureau, the annual average precipitation is approximately 130–180 mm, while the annual average evaporation reaches 2000–2300 mm. The annual average temperature ranges from 6 °C to 8 °C, with a frost-free period of 130–150 days. Wheat, corn, and sunflower are the dominant crops in the district [25]. The crop growth periods of the major crops and the acquisition dates of Sentinel-2 and Sentinel-1 imagery are shown in Figure 2.

2.2. Data Sources and Preprocessing

2.2.1. Sentinel-2 Satellite Imagery

Sentinel-2 Level-2A scenes used in this study were obtained from the Copernicus Data Space Ecosystem and served as the primary optical data source for crop mapping [5]. A summary of the Sentinel-2 data used in this study, including acquisition period, data source, cloud-cover threshold, processing level, tile ID, and number of scenes, is provided in Table 1.

2.2.2. Sentinel-1 Satellite Imagery

Sentinel-1 IW GRD dual-polarization (VV/VH) SAR time-series data were used to complement the optical observations [6]. A summary of the Sentinel-1 data used in this study, including acquisition period, data source, acquisition mode, product type, polarization, and number of scenes, is also provided in Table 1.

2.2.3. Data Acquisition and Preprocessing

This study focused on representative crop planting areas within the Hetao Irrigation District. Multi-temporal Sentinel-2 MSI optical imagery and Sentinel-1 C-band dual-polarization (VV/VH) SAR imagery were obtained from the Copernicus Data Space Ecosystem. Sentinel-2 data were used to characterize crop spectral and biochemical properties, whereas Sentinel-1 data provided complementary information on canopy structure and moisture conditions. The integrated preprocessing workflow of Sentinel-2 and Sentinel-1 data is shown in Figure 3.

The preprocessing workflow consisted of optical preprocessing, SAR preprocessing, temporal alignment, and parcel-level feature integration. For Sentinel-2, Level-2A surface reflectance products from the 2024 growing season were selected by prioritizing scenes with low cloud contamination during key phenological stages. Cloud and shadow pixels were removed using the Scene Classification Layer (SCL). All 20 m bands were resampled to 10 m using cubic convolution to ensure a consistent spatial resolution for feature extraction [27]. Cubic convolution was adopted because it is a widely used resampling approach for continuous remote sensing imagery and provides a practical balance between edge preservation and spatial smoothness [27]. Subsequently, the vegetation and moisture indices listed in Table 2, including NDVI, EVI, GNDVI, SAVI, and NDWI, as well as the red-edge indices, were calculated and aggregated within each parcel to construct the optical time-series dataset.

For Sentinel-1, IW GRD dual-polarization products were preprocessed in ESA SNAP 12.0 (Sentinel Application Platform, European Space Agency, Paris, France) using orbit correction, thermal noise removal, radiometric calibration to sigma naught (σ⁰), speckle filtering, terrain correction, and co-registration with Sentinel-2 data. Incidence-angle normalization was not applied as an additional independent step. This decision was made because the study area is predominantly flat and all Sentinel-1 scenes were processed using the same workflow, which improved temporal comparability among radar observations. This treatment is consistent with previous SAR crop-monitoring studies conducted in relatively flat agricultural regions, where acquisition geometry effects and local incidence-angle normalization have been discussed in relation to temporal comparison of backscatter signals [28,29]. However, we acknowledge that explicit incidence-angle normalization may be beneficial in more topographically complex regions.

Because Sentinel-1 acquisition dates did not fully coincide with Sentinel-2 observations, linear interpolation was applied to align radar features with the optical time series. Linear interpolation and temporal harmonization have been used in previous crop-mapping studies to reconstruct or align irregular satellite time series before temporal modeling and multi-source fusion [17,30,31]. The resulting VV, VH, and VV/VH ratio features were then integrated at the parcel level, forming a unified parcel–time–feature dataset for downstream classification. However, linear interpolation may smooth short-term fluctuations associated with precipitation events or irrigation pulses, and this limitation should be considered when interpreting the aligned radar time series. Recent evidence also suggests that linear interpolation does not necessarily guarantee substantial classification gains in temporal models; thus, its role in this study should be regarded primarily as a practical temporal alignment strategy rather than a guaranteed source of performance improvement [31].
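The temporal alignment step can be illustrated with a minimal sketch. It assumes that each radar feature is stored per parcel as a one-dimensional series indexed by day of year and that interpolation is carried out directly on the stored values; the function name, the dates, and the backscatter values below are hypothetical and are not taken from the study.

```python
import numpy as np

def align_sar_to_optical(sar_doy, sar_values, optical_doy):
    """Linearly interpolate one SAR feature series onto optical acquisition dates."""
    sar_doy = np.asarray(sar_doy, dtype=float)
    sar_values = np.asarray(sar_values, dtype=float)
    order = np.argsort(sar_doy)  # np.interp expects increasing x coordinates
    return np.interp(np.asarray(optical_doy, dtype=float),
                     sar_doy[order], sar_values[order])

# Hypothetical VH backscatter (dB) of one parcel on four Sentinel-1 dates
sar_doy = [100, 112, 124, 136]
vh_db = [-20.0, -18.0, -15.0, -16.0]
# Hypothetical Sentinel-2 acquisition dates (day of year)
optical_doy = [106, 118, 130]

vh_aligned = align_sar_to_optical(sar_doy, vh_db, optical_doy)
print(vh_aligned)
```

Note that `np.interp` holds the first or last value constant for target dates outside the radar date range, so optical dates before the first or after the last Sentinel-1 acquisition would need separate handling.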

Feature redundancy was not explicitly removed in the main workflow because some correlated optical and radar variables may still capture complementary crop responses at different phenological stages. Removing them too early could therefore discard temporally informative signals. Nevertheless, we acknowledge that this design choice may increase model complexity and affect interpretability. A dedicated comparison with feature-selection strategies is still needed to evaluate this issue more rigorously.

In this study, a dual-polarization RVI formulation based on Sentinel-1 VV/VH observations was used, following the commonly adopted alternative form 4 × VH/(VV + VH) in dual-pol agricultural applications [32].
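As a brief illustration of this formulation, the sketch below evaluates 4 × VH/(VV + VH) for two hypothetical observations. It assumes that the backscatter coefficients are stored in decibels and must be converted to linear power before the ratio is formed; the function names and input values are illustrative only.

```python
import numpy as np

def db_to_linear(sigma0_db):
    """Convert backscatter from decibels to linear power units."""
    return 10.0 ** (np.asarray(sigma0_db, dtype=float) / 10.0)

def dual_pol_rvi(vv_db, vh_db):
    """Dual-pol radar vegetation index, 4 * VH / (VV + VH), in linear units."""
    vv = db_to_linear(vv_db)
    vh = db_to_linear(vh_db)
    return 4.0 * vh / (vv + vh)

# Hypothetical observations: equal VV and VH, then VH 10 dB below VV
rvi = dual_pol_rvi([-10.0, -10.0], [-10.0, -20.0])
print(rvi)
```

When VV and VH are equal the index evaluates to 2, and it decreases toward 0 as the cross-polarized return weakens relative to the co-polarized return.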

2.3. Ground Sample Data

Based on the preprocessed multi-source time-series dataset, parcel-level ground samples were constructed for model training and validation. Crop labels were obtained through a combination of field surveys, visual interpretation of remote sensing imagery, and temporal similarity analysis. First, parcels with clear crop characteristics were identified from imagery acquired during key growth stages. Ground observations were then collected during the 2024 growing season using handheld RTK equipment with an approximate positioning accuracy of 1 m, and crop types were recorded in the field. These field records were subsequently combined with parcels visually interpreted in ArcGIS 10.8.1 (Esri, Redlands, CA, USA) to generate parcel-level training and validation data annotated on Sentinel imagery.

The classification unit in this study was the parcel rather than the field survey point. The approximately 1 m accuracy of the RTK measurements refers only to the positional accuracy of the ground observations and does not imply a 1 m comparison with satellite imagery. Instead, the RTK records were used to verify field labels, after which parcel-level statistics were extracted from Sentinel imagery for training and validation. The comparison was therefore conducted between field-validated parcel labels and parcel-aggregated remote sensing features, which helps reduce the scale mismatch between in situ observations and satellite data. The ground samples and their local zoomed-in view are shown in Figure 4.

In total, 3750 parcel-level samples were labeled in 2024, including 1000 wheat samples, 1000 corn samples, and 1750 sunflower samples (Figure 5). The dataset was divided into training and validation subsets using an 8:2 ratio, which provides a practical balance between sufficient training samples for model learning and an independent subset for robust validation. Samples were randomly split at the parcel level to avoid duplication of parcels between the two subsets. However, we acknowledge that random splitting may still retain a degree of spatial autocorrelation among neighboring parcels, and spatially independent validation strategies should therefore be considered in future work.
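A parcel-level 8:2 random split of this kind can be sketched as follows. The parcel identifiers, the random seed, and the function name are hypothetical; the sketch only shows that splitting on unique parcel IDs keeps every parcel in exactly one subset, and it does not address the spatial autocorrelation issue noted above.

```python
import numpy as np

def split_parcels(parcel_ids, train_fraction=0.8, seed=0):
    """Randomly assign unique parcel IDs to training and validation subsets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(parcel_ids))
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical IDs for the 3750 labeled parcels
parcel_ids = np.arange(3750)
train_ids, val_ids = split_parcels(parcel_ids)
print(len(train_ids), len(val_ids))
```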

2.4. Parcel Extraction

After vectorizing the U-Net segmentation outputs [33] and extracting parcel-level temporal features, we found that many segmented units still contained mixed crop types (Figure 6). As shown in Table 3, the Boundary IoU of U-Net is considerably lower than that of SNIC. Consequently, the resulting temporal profiles deviated substantially from field-consistent phenological patterns. This suggests that parcel adhesion in the U-Net outputs compromised the spatial purity required for reliable time-series feature extraction. Therefore, SNIC-based superpixel segmentation was adopted to generate pseudo-parcel units that better preserve spatial homogeneity and more closely approximate field boundaries, thereby providing more reliable spatial units for parcel-scale temporal feature aggregation.

To support the selection of SNIC parameters, a preliminary sensitivity analysis was conducted using the XGBoost + LSTM framework under several representative combinations of seed spacing and compactness. As shown in Table 4, classification performance remained relatively stable across the tested parameter range, with OA consistently above 92.8%. Among these settings, seed spacing = 10 pixels and compactness = 20 yielded the best overall performance in terms of OA, mIoU, and Kappa, and were therefore selected for the final SNIC configuration. The limited variation across the tested settings also indicates that the parcel-scale framework is reasonably robust to moderate parameter changes.

Parcel extraction was performed using the SNIC superpixel segmentation algorithm to generate pseudo-parcel units [34]. The imagery was first partitioned into spectrally homogeneous regions based on spectral similarity and spatial proximity. The selected SNIC parameters (seed spacing = 10 pixels; compactness = 20) were then applied to generate the final parcel units. Local merging and filtering were subsequently performed using a heterogeneity threshold based on spectral distance, followed by boundary smoothing to produce approximate parcel units. The changes in parcels after merging and filtering are shown in Figure 7.

SNIC segments imagery into connected regions based on a combination of spectral and texture similarity and spatial proximity, producing pseudo-parcel units that substantially reduce salt-and-pepper noise and improve the spatial consistency of classification results. This is particularly important in the study area, where farmland is highly fragmented and direct delineation of field boundaries from medium-resolution imagery is challenging. Compared with pixel-wise classification, parcel-based aggregation provides more stable spatial units for time-series feature extraction and classification [35].
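Parcel-based aggregation can be illustrated with a small sketch. It assumes that the segmentation output is available as a raster of non-negative integer parcel labels, that masked pixels (for example, cloud or shadow) are stored as NaN, and that the parcel mean is the aggregation statistic; these are illustrative assumptions, and the function name and values are hypothetical.

```python
import numpy as np

def parcel_means(labels, feature):
    """Average one feature raster over each parcel label, ignoring NaN pixels."""
    labels = np.asarray(labels).ravel()
    feature = np.asarray(feature, dtype=float).ravel()
    valid = ~np.isnan(feature)
    n_parcels = int(labels.max()) + 1
    sums = np.bincount(labels[valid], weights=feature[valid], minlength=n_parcels)
    counts = np.bincount(labels[valid], minlength=n_parcels)
    with np.errstate(invalid="ignore", divide="ignore"):
        # Parcels without any valid pixel are reported as NaN
        return np.where(counts > 0, sums / counts, np.nan)

# Hypothetical 2 x 3 scene with two pseudo-parcels and one masked pixel
labels = [[0, 0, 1],
          [0, 1, 1]]
ndvi = [[0.2, 0.4, 0.8],
        [np.nan, 0.6, 0.7]]
means = parcel_means(labels, ndvi)
print(means)
```

Repeating this for every feature and acquisition date would yield the parcel-by-time-by-feature array used for temporal modeling.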

2.5. Overall Research Framework

As shown in Figure 8, this study constructs a parcel-scale crop classification framework that integrates the strengths of deep learning and traditional machine learning. The overall strategy is as follows: multi-temporal optical vegetation indices and radar features are first organized as time-series inputs. An LSTM network is then used to learn phenological patterns and high-level temporal representations during crop growth. These learned representations are subsequently fused with parcel-level features and used as input for RF, SVM, and XGBoost classifiers for final crop discrimination.

Within this framework, the learned temporal embeddings can be analyzed to identify key phenological stages and important feature dimensions. In addition, the hybrid strategy improves robustness under limited training samples by combining temporal modeling in the LSTM with the stable decision boundaries of traditional machine learning classifiers.

Overall, this study integrates multi-source remote sensing data, parcel-scale spatial units, and a hybrid LSTM–machine learning framework to establish a crop-mapping approach suitable for fragmented irrigation districts.

2.6. Time-Series Dataset Construction

2.6.1. Vegetation Indices and Optical Time-Series Features

Vegetation indices derived from multi-temporal optical imagery were used to characterize crop growth dynamics throughout the growing season. In this study, indices including NDVI, EVI, GNDVI, SAVI, NDWI, RENDVI, NDRE, and cIre were calculated for each parcel based on Sentinel-2 observations [36,37,38,39,40,41]. These variables capture complementary information on canopy vigor, chlorophyll content, and moisture status, and are therefore suitable for describing crop phenological differences.
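For orientation, the sketch below computes several of these indices from surface reflectance using their commonly cited definitions. The exact formulas and band assignments used in the study are those given in Table 2; the soil adjustment factor of 0.5, the choice of a single red-edge band, and the reflectance values here are assumptions made for illustration.

```python
import numpy as np

def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b)

def optical_indices(green, red, red_edge, nir, soil_factor=0.5):
    """Common textbook definitions; band choices are illustrative assumptions."""
    green, red, red_edge, nir = (np.asarray(v, dtype=float)
                                 for v in (green, red, red_edge, nir))
    return {
        "NDVI": normalized_difference(nir, red),
        "GNDVI": normalized_difference(nir, green),
        "NDRE": normalized_difference(nir, red_edge),
        "SAVI": (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor),
        "cIre": nir / red_edge - 1.0,
    }

# Hypothetical parcel-mean reflectances for a dense green canopy
idx = optical_indices(green=0.08, red=0.05, red_edge=0.15, nir=0.45)
print({name: round(float(value), 3) for name, value in idx.items()})
```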

Figure 9 shows the temporal trajectories of representative optical indices for wheat, corn, and sunflower.

As shown in Figure 9, wheat exhibited an early-rise and early-senescence pattern, with most growth-related indices increasing rapidly in spring, peaking around June, and declining sharply after July. In contrast, corn and sunflower showed delayed development, with low index values during April–May, rapid increases from June onward, and peak values during July–August. Indices related to canopy biomass and leaf area, such as NDVI, EVI, SAVI, and GNDVI, effectively characterized the overall crop growth process with smooth temporal trajectories. Red-edge indices were more sensitive to chlorophyll content and canopy structure during June–August, resulting in stronger inter-crop separability during this period. By contrast, NDWI was more sensitive to irrigation and precipitation effects and therefore showed weaker stability as an independent discriminator.

2.6.2. Radar Time-Series Features

Radar remote sensing systems commonly employ four polarization modes: HH, VV, HV, and VH [42]. The first two represent co-polarization, while the latter two correspond to cross-polarization. As shown in Figure 10, time-series curves of radar backscatter features, including VV, VH, RVI, and VV/VH, are presented for the three crop types.

The VV/VH ratio time-series curves indicate that corn generally exhibits higher VV/VH ratios, with a pronounced increase during the peak growing season (approximately June–August). Wheat reaches its peak earlier (May–June) and then declines, while sunflower shows larger fluctuations with a slightly later peak (July–August). This can be explained by the fact that the VH channel is more sensitive to volume scattering related to vegetation structure, whereas VV is more sensitive to surface scattering. Distinct canopy structure development stages (e.g., jointing, heading, grain filling) result in clear discrimination in VV/VH among different crops, making this ratio effective for crop type characterization.

The VH time-series curves show that corn exhibits significantly enhanced VH signals during mid-to-late growth stages; wheat has relatively high VH values during early stages followed by rapid decline; and sunflower shows a slight increase during mid-to-late stages. Because VH is highly sensitive to vegetation structural changes, the early-maturing nature of wheat and late-maturing characteristics of corn are clearly reflected in the curves. VH is therefore particularly useful for time-series models such as LSTM, because it provides key temporal discrimination among the crops.

The VV curves exhibit less pronounced differences among the three crops, with similar overall trends and peak values occurring around July–August. Corn generally shows slightly higher VV values than wheat, while sunflower overlaps substantially with corn during mid-to-late stages. VV is primarily influenced by surface and geometric scattering and thus has limited discriminative power when used alone. However, when combined with VH or optical features such as NDVI, VV can contribute to improved crop discrimination. The VV/VH ratio exhibits trends similar to those of VH and VV, emphasizing delayed and sustained peaks for corn, early maturity and rapid decline for wheat, and mid-season variability for sunflower. This ratio highlights combined structural and moisture characteristics and serves as a useful auxiliary classification feature.

2.7. Classification Models

The aim of this study was not to exhaustively compare all deep learning architectures, but to evaluate whether LSTM-derived temporal embeddings could enhance stable and interpretable conventional classifiers under limited training samples. Therefore, RF, SVM, and XGBoost were selected as representative baseline models because they are widely used in remote sensing classification and provide complementary decision mechanisms. End-to-end deep learning models such as CNN- or U-Net-based frameworks are valuable alternatives, but under medium spatial resolution and fragmented field conditions they may be more sensitive to parcel adhesion and boundary uncertainty.

2.7.1. Long Short-Term Memory (LSTM)

In this study, parcel-scale crop classification based on NDVI and other temporal features was formulated as a sequence learning task. Long Short-Term Memory (LSTM) networks are well suited for modeling long temporal dependencies and can alleviate the vanishing and exploding gradient problems commonly encountered in standard recurrent neural networks when handling long sequences. In the proposed framework, the LSTM was used to encode multi-temporal Sentinel-1 and Sentinel-2 features into compact temporal representations, which were subsequently used as input features for traditional machine learning classifiers, including RF, SVM, and XGBoost, as illustrated in Figure 11.

A multi-layer LSTM architecture was adopted, with 128 hidden units in each layer and a dropout rate of 0.2. At each time step, the LSTM maintains a cell state for long-term information storage and a hidden state for information propagation. The information flow is regulated by the forget gate, input gate, and output gate [19], as shown in Figure 12.

The network was optimized using the Adam optimizer with a learning rate of 0.001. The maximum number of training epochs was set to 100, and early stopping was applied when the validation loss did not improve for 10 consecutive epochs.
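The stopping rule described above can be sketched independently of any deep learning library. The function below is a hypothetical helper that takes a list of per-epoch validation losses and returns the epoch at which training would stop under a patience of 10 and a cap of 100 epochs; the loss curves are invented for illustration.

```python
def early_stopping_epoch(val_losses, patience=10, max_epochs=100):
    """Return the 1-based epoch at which training stops."""
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return min(len(val_losses), max_epochs)

# Hypothetical curve: improvement until epoch 3, then a plateau
plateau_losses = [1.0, 0.8, 0.7] + [0.75] * 20
stop_epoch = early_stopping_epoch(plateau_losses)

# Hypothetical curve that keeps improving, so the epoch cap applies
improving_losses = [1.0 / (k + 1) for k in range(120)]
cap_epoch = early_stopping_epoch(improving_losses)
print(stop_epoch, cap_epoch)
```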

After training, the final hidden state was used as the temporal embedding for downstream machine learning tasks.
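To make the gate mechanism and the use of the final hidden state concrete, the following sketch implements a single-layer LSTM forward pass in NumPy. It is not the trained multi-layer network with 128 hidden units and dropout used in the study: the weights are random, the hidden size is reduced to 4, and the stacked gate ordering (input, forget, candidate, output) is an arbitrary convention chosen for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W is (4H, D), U is (4H, H), b is (4H,)."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    g = np.tanh(z[2 * H:3 * H])    # candidate cell update
    o = sigmoid(z[3 * H:4 * H])    # output gate
    c_t = f * c_prev + i * g       # cell state: long-term storage
    h_t = o * np.tanh(c_t)         # hidden state: propagated information
    return h_t, c_t

def encode_sequence(X, W, U, b):
    """Run the LSTM over a (T, D) sequence and return the final hidden state."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for x_t in X:
        h, c = lstm_step(x_t, h, c, W, U, b)
    return h

# Hypothetical parcel series: 12 dates, 5 features, hidden size 4
rng = np.random.default_rng(0)
T, D, H = 12, 5, 4
X = rng.normal(size=(T, D))
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
embedding = encode_sequence(X, W, U, b)
print(embedding.shape)
```

Because the hidden state is the product of a sigmoid gate and a tanh output, each embedding component lies strictly between -1 and 1, which gives the downstream classifiers a bounded feature range.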

This embedding summarizes the annual temporal dynamics of multi-source Sentinel-1 and Sentinel-2 observations in a compact feature space. Rather than corresponding to raw input variables, it represents a higher-level abstract feature learned by the network and is therefore more suitable for crop discrimination.

To further examine the effectiveness of the learned temporal representations, the extracted embeddings were visualized using t-SNE dimensionality reduction [43]. As shown in Figure 13, the embeddings form relatively compact clusters corresponding to different crop types, indicating that the LSTM successfully captures crop-specific phenological patterns. These embeddings integrate the temporal dynamics of multi-source observations throughout the growing season and provide a more discriminative representation than the original spectral or radar variables.

A comparison of embeddings derived before and after Sentinel-1 temporal interpolation shows that the overall cluster structure remains largely consistent, suggesting that the interpolation process does not introduce substantial distortion into the temporal representation.

Compared with corn and sunflower, wheat embeddings exhibit slightly larger dispersion in the feature space, which may be related to greater variability in sowing dates, growth stages, and management practices across the study area.

The visualization confirms that the LSTM effectively transforms the original multi-source temporal features into discriminative representations suitable for downstream machine learning classification.

2.7.2. Random Forest (RF)

Random Forest (RF), proposed by Breiman [44], is a tree-based ensemble learning algorithm built on the bagging strategy. It constructs multiple decision trees using different bootstrap samples and aggregates their outputs through majority voting for classification. By combining many weak learners, RF can effectively reduce variance, improve prediction accuracy, and alleviate overfitting. Owing to its robustness and ability to handle high-dimensional features, RF is widely used in remote sensing classification tasks.

In this study, the RF classifier was implemented with 500 trees (n_estimators = 500). The number of variables considered at each split was set to the square root of the total number of input features (max_features = sqrt), tree depth was left unrestricted, and bootstrap sampling was enabled. These settings represent a commonly adopted configuration for remote sensing classification and were used to ensure both robustness and reproducibility.

2.7.3. Support Vector Machine (SVM)

Support Vector Machine (SVM) is a supervised learning algorithm originally developed by Vapnik and colleagues [45]. Its main objective is to find an optimal separating hyperplane that maximizes the margin between classes. For linearly separable samples, SVM directly identifies the optimal decision boundary in the original feature space. For nonlinear problems, kernel functions are used to map the data into a higher-dimensional space, where linear separation can then be achieved.

In this study, the SVM classifier used a radial basis function (RBF) kernel, which is widely applied in remote sensing classification because of its ability to model nonlinear class boundaries. The penalty parameter was set to C = 10, and the kernel parameter was set to gamma = scale. A one-versus-one strategy was adopted for multi-class classification.
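The RF and SVM settings reported above map directly onto scikit-learn estimators, as in the sketch below. The use of scikit-learn, the concatenation of embeddings with parcel-level features, the synthetic data, and the absence of feature scaling are all assumptions made for illustration; the study does not state its implementation at this level of detail.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins: 128-D temporal embeddings plus 6 parcel-level features
n_samples = 300
labels = rng.integers(0, 3, size=n_samples)  # 0 wheat, 1 corn, 2 sunflower
embeddings = rng.normal(size=(n_samples, 128)) + 2.0 * labels[:, None]
parcel_features = rng.normal(size=(n_samples, 6))
X = np.hstack([embeddings, parcel_features])  # simple concatenation (assumed)

rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                            max_depth=None, bootstrap=True, random_state=0)
svm = SVC(kernel="rbf", C=10, gamma="scale", decision_function_shape="ovo")

# 8:2 split of the synthetic samples
rf.fit(X[:240], labels[:240])
svm.fit(X[:240], labels[:240])
rf_acc = rf.score(X[240:], labels[240:])
svm_acc = svm.score(X[240:], labels[240:])
print(rf_acc, svm_acc)
```

The accuracies printed here describe only the synthetic, deliberately separable toy data and carry no information about the results reported in this study.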

2.7.4. Extreme Gradient Boosting (XGBoost)

Extreme Gradient Boosting (XGBoost) [46] is an efficient implementation of the gradient boosting framework and is widely used for classification and regression tasks. It builds decision trees sequentially, where each new tree is trained to correct the errors made by the previous ensemble. Compared with traditional boosting algorithms, XGBoost introduces several optimizations, including regularization, parallel computation, and efficient handling of sparse or missing data, which improve both prediction accuracy and computational efficiency.

In this study, the XGBoost classifier was trained with 300 boosting rounds (n_estimators = 300), a learning rate of 0.1, and a maximum tree depth of 6. To improve generalization, the subsample ratio and column sampling ratio were set to 0.8 (subsample = 0.8, colsample_bytree = 0.8). The objective function was specified as multi-class soft probability output, which is suitable for the three-class crop classification task considered in this study.
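For reference, these settings can be collected in a parameter dictionary, as sketched below. The key names follow the scikit-learn-style wrapper of the xgboost package and `multi:softprob` is its multi-class probability objective; whether the study used this wrapper is an assumption, and the probability values in the example are hypothetical.

```python
import numpy as np

# Hyperparameters as reported in the text; key names assume the
# scikit-learn-style interface of the xgboost package.
xgb_params = {
    "n_estimators": 300,
    "learning_rate": 0.1,
    "max_depth": 6,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "objective": "multi:softprob",
}

# Possible usage if xgboost is installed (left commented out on purpose):
# from xgboost import XGBClassifier
# model = XGBClassifier(**xgb_params).fit(X_train, y_train)

# A soft-probability output is one probability vector per parcel; the
# predicted crop is the class with the highest probability.
class_names = ["wheat", "corn", "sunflower"]
probabilities = np.array([[0.10, 0.85, 0.05],   # hypothetical parcel 1
                          [0.55, 0.15, 0.30]])  # hypothetical parcel 2
predicted = [class_names[k] for k in probabilities.argmax(axis=1)]
print(predicted)
```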

2.8. Accuracy Evaluation Metrics

Model performance was evaluated using several commonly used metrics, including Overall Accuracy (OA), Kappa coefficient, class-wise F1-score, and mean Intersection over Union (mIoU). All evaluation metrics were calculated at the parcel level to ensure consistency with the classification unit used in this study.

OA = \frac{\sum_{c=1}^{n} TP_{c}}{N}, \quad UA = \frac{TP_{c}}{TP_{c} + FP_{c}}

PA = \frac{TP_{c}}{TP_{c} + FN_{c}}, \quad F1 = 2 \times \frac{TP_{c}}{2 TP_{c} + FN_{c} + FP_{c}}

IoU_{c} = \frac{\text{Area of Intersection}}{\text{Area of Union}}, \quad mIoU = \frac{\sum_{c=1}^{n} IoU_{c}}{n}

\text{Boundary IoU} = \frac{|(G_{d} \cap G) \cap (P_{d} \cap P)|}{|(G_{d} \cap G) \cup (P_{d} \cap P)|}

Kappa = \frac{p_{o} - p_{e}}{1 - p_{e}}

where n represents the number of classes, N denotes the total number of parcels in the validation dataset, and TP_{c}, FP_{c}, and FN_{c} represent the true positives, false positives, and false negatives for a specific class c, respectively. OA represents the proportion of correctly classified parcels among all validation parcels. The Kappa coefficient measures the agreement between predicted and reference labels after accounting for chance agreement, where p_{o} is the observed agreement and p_{e} is the agreement expected by chance. Precision (user's accuracy, UA) and recall (producer's accuracy, PA) were used to evaluate the reliability and completeness of crop identification for each class, and the F1-score was calculated as the harmonic mean of precision and recall.

IoU_{c} was calculated for each crop class, and mIoU was obtained as the mean value across all classes. Boundary IoU measures the intersection-over-union ratio between the predicted boundary and the actual boundary to evaluate the quality of edge segmentation. Here, G and P denote the reference and predicted regions, and G_{d} and P_{d} denote the sets of pixels within a distance d of their respective boundaries.
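The parcel-level metrics can be computed from a confusion matrix, as in the sketch below. It assumes that rows hold reference classes and columns hold predicted classes, and that the class-wise IoU at parcel level is counted as TP/(TP + FP + FN); both the convention and the example matrix are illustrative assumptions and do not reproduce the results of this study.

```python
import numpy as np

def parcel_metrics(cm):
    """cm[i, j] = number of parcels of reference class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # predicted as class c but belonging elsewhere
    fn = cm.sum(axis=1) - tp  # belonging to class c but predicted elsewhere
    oa = tp.sum() / total     # observed agreement p_o
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    iou = tp / (tp + fp + fn)
    return {
        "OA": oa,
        "Kappa": (oa - pe) / (1.0 - pe),
        "UA": tp / (tp + fp),
        "PA": tp / (tp + fn),
        "F1": 2.0 * tp / (2.0 * tp + fn + fp),
        "IoU": iou,
        "mIoU": iou.mean(),
    }

# Hypothetical 3-class confusion matrix (wheat, corn, sunflower)
cm = [[45, 3, 2],
      [2, 46, 2],
      [3, 1, 46]]
metrics = parcel_metrics(cm)
print(round(metrics["OA"], 4), round(metrics["Kappa"], 4), round(metrics["mIoU"], 4))
```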

3. Results

3.1. SHAP-Based Model Interpretability Results

SHAP is a model interpretation method based on Shapley values from game theory, which enables interpretation of the decision logic of black-box machine learning models by quantifying feature contributions [47]. SHAP can be applied to most machine learning models. In this study, the SHAP library together with commonly used Python 3.8 (Python Software Foundation, Wilmington, DE, USA) libraries was employed to conduct model interpretability analysis. SHAP values were calculated for each feature, global feature importance was visualized using bar plots to highlight the most influential features, and scatter plots were used to show how variations in individual feature values affect their corresponding SHAP values. Nonlinear functions were fitted to these relationships to better capture the influence trends, thereby enabling a deeper understanding of model behavior.
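The Shapley value underlying SHAP can be illustrated with an exact computation on a toy model. The sketch below averages each feature's marginal contribution over all feature orderings relative to a baseline input; the toy model, feature names, and values are hypothetical, and the SHAP library obtains such attributions far more efficiently than this brute-force enumeration.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating every feature ordering."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for j in order:
            current[j] = x[j]           # switch feature j from baseline to x
            updated = predict(current)
            phi[j] += updated - previous  # marginal contribution of j
            previous = updated
    return [value / len(orderings) for value in phi]

def toy_model(features):
    """Hypothetical score with one interaction term (not the study's model)."""
    cire, ndwi, vh = features
    return 2.0 * cire + cire * ndwi - 0.5 * vh

phi = shapley_values(toy_model, x=[1.0, 2.0, 4.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

The attributions sum to the difference between the model output at x and at the baseline, and the interaction term is shared equally between the two features involved.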

Figure 14 presents the SHAP-based feature interpretation results for the XGBoost model. From the feature importance ranking, it can be observed that cIre, RENDVI, NDRE, B12, and NDWI contribute most significantly to model decisions, with red-edge indices and shortwave infrared bands playing dominant roles in overall contribution. SHAP dependence analysis indicates that the SHAP values of cIre and RENDVI exhibit clear monotonic increasing trends with increasing feature values, suggesting that the XGBoost model can stably capture the response of red-edge bands to variations in crop chlorophyll content and biomass. In contrast, NDWI shows a significant positive contribution to classification results in the low-to-medium value range, while its contribution becomes more moderate in the high-value range, reflecting the stage-dependent sensitivity of moisture information for crop discrimination. Compared with RF and SVM, the XGBoost model exhibits clearer feature–response relationships and stronger nonlinear fitting capability in SHAP space. Its decision process relies more heavily on a small number of key features, effectively reducing interference from redundant features. This characteristic is consistent with its superior performance in classification accuracy and spatial consistency.

Figure 15 shows the SHAP-based interpretation results for the Support Vector Machine (SVM) model. From the global feature importance perspective, the SVM model mainly relies on a small number of features such as B3, EVI, and VH, and its overall feature utilization is lower than that of RF and XGBoost. SHAP dependence relationships indicate that the relationships between most features and model outputs are relatively scattered, with some features exhibiting weak or unstable nonlinear responses. This suggests that under high-dimensional, multi-source feature conditions, SVM struggles to fully capture complex feature interactions, and its discriminative capability depends more on a limited set of dominant features. These results further demonstrate that, in multi-temporal and multi-feature crop classification tasks, tree-based ensemble learning methods have clear advantages in feature representation capacity and decision stability.

Figure 16 presents the SHAP-based interpretability analysis results for the Random Forest (RF) model. The left panel shows the mean absolute SHAP values (Mean |SHAP|) of features, which measure their relative contributions to classification decisions at the global scale, while the right panel displays SHAP dependence scatter plots for several key features, illustrating the direction and magnitude of feature value effects on model outputs. The global importance ranking indicates that features such as WVP, BI, EVI, cIre, and VH contribute substantially to RF model decisions, suggesting that the model primarily relies on vegetation vigor information, red-edge-sensitive features, and radar backscattering characteristics for crop discrimination. From the local dependence perspective, most key features exhibit pronounced nonlinear relationships with SHAP values. For example, EVI and NDWI show the strongest positive contributions to classification results in intermediate value ranges, while contributions weaken at lower or higher ranges, reflecting crop sensitivity to changes in canopy structure and moisture conditions during key growth stages. Radar features such as VH exhibit negative regulatory effects on classification results within certain ranges, indicating their complementary discriminative capability for differentiating crop structural characteristics. These results demonstrate that the RF model can adaptively learn complex nonlinear decision boundaries in multi-source feature space; however, it is also sensitive to the number of features and involves a certain degree of redundant feature participation in decision-making.

Overall, the SHAP analysis indicates that red-edge indices, vegetation indices, and moisture-related features play stable roles in crop discrimination, while radar features provide complementary information in specific value ranges. These results support the interpretability of the proposed multi-source framework. However, because several optical variables are correlated, the SHAP-based importance ranking should be interpreted as indicative rather than definitive.

To empirically justify the decision to retain the full feature space rather than applying explicit dimensionality reduction, an ablation experiment was conducted. We applied Recursive Feature Elimination (RFE) to systematically remove redundant and highly correlated variables before training the XGBoost + LSTM model [48].

As shown in Table 5, the model trained on the reduced feature set achieved an Overall Accuracy of 93.35%, which was slightly lower than the 93.61% obtained with the full feature set. This result suggests that retaining the full feature space preserved temporally complementary signals that were useful for crop discrimination. Therefore, in the current framework, explicit feature removal was not adopted in the main workflow, while SHAP analysis was used only as an interpretability tool rather than as a feature-selection criterion.

3.2. Visual Comparison of Classification Results

In this study, several classification cases within the study area were selected for analysis. As shown in Figure 17, the selected cases are spatially distributed across representative parts of the study area, suggesting that the model performance remains visually stable within the regional application context. The results indicate that temporal features learned by the LSTM provide an effective basis for crop classification. However, the models differ notably in how well they exploit these temporal representations.

3.3. Analysis of Individual Model Results

As shown in Table 6, the XGBoost + LSTM model achieves relatively high classification accuracy. Its classification results exhibit good within-parcel consistency, with relatively uniform colors and few anomalous pixels inside the same parcel. The model is able to maintain good class consistency for small and elongated parcels, indicating that it can effectively learn complex and nonlinear decision boundaries among crop classes in the temporal representation space extracted by the LSTM.

The RF + LSTM model also shows generally good parcel-level homogeneity; however, a small number of block-like misclassifications or localized noise appear at the boundaries between adjacent parcels. This model is slightly sensitive to parcel edges, and some boundary areas are prone to confusion with neighboring crops.

The classification results of the SVM + LSTM model indicate that SVM is less suitable for high-dimensional temporal feature analysis, resulting in weaker performance. In some areas, a certain degree of over-smoothing is observed: although parcel interiors appear relatively uniform, some small parcels or local variations that should be preserved are ignored. In regions where temporal differences among crops are weak, large contiguous areas of a single class are more likely to occur, leading to a loss of local detail.

Table 6. Classification result display for representative parcels.

Number	Image	Imagery and Farmland Boundaries	RF + LSTM	SVM + LSTM	XGBoost + LSTM
Rows ① to ⑤ are image panels for five representative parcels.

3.4. Quantitative Validation Results

To quantitatively compare the performance of the three models for different crop types, the class-wise validation metrics, including F1-score, user’s accuracy (UA), producer’s accuracy (PA), and IoU, are summarized in Table 7.

3.5. Ablation Experiment

The ablation results in Table 8 confirm the value of the LSTM temporal representation module. Without the LSTM, OA was 84.67% for RF, 78.83% for SVM, and 85.40% for XGBoost. After incorporating the LSTM, OA increased to 90.78%, 82.81%, and 93.61%, corresponding to improvements of 6.11, 3.98, and 8.21 percentage points, respectively. These results indicate that the LSTM effectively captures temporal dependencies that are difficult to represent using static multi-date features alone. The learned temporal embeddings help distinguish crops with overlapping spectral responses by encoding differences in peak timing, growth duration, and senescence patterns across the growing season. This is particularly important for separating early-maturing wheat from later-developing corn and sunflower.
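The notion of a temporal embedding can be illustrated with a minimal forward pass of a single LSTM layer. The sketch below is not the network trained in this study: the weights are random and untrained, the layer sizes and input values are hypothetical, and taking the final hidden state as the embedding is an assumption about how such a vector might be extracted before being passed to RF, SVM, or XGBoost.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def random_weights(input_size, hidden_size, seed=0):
    """Untrained weights for the input (i), forget (f), output (o) gates
    and the candidate cell state (g); for illustration only."""
    rng = random.Random(seed)
    cols = input_size + hidden_size
    return {
        gate: ([[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(hidden_size)],
               [0.0] * hidden_size)
        for gate in ("i", "f", "o", "g")
    }


def lstm_embedding(sequence, weights, hidden_size):
    """Run one LSTM layer over a sequence and return the final hidden state."""
    h = [0.0] * hidden_size  # hidden state
    c = [0.0] * hidden_size  # cell state
    for x_t in sequence:
        z = list(x_t) + h    # concatenation of current input and previous hidden state
        pre = {}
        for gate, (W, b) in weights.items():
            pre[gate] = [sum(w * v for w, v in zip(W[k], z)) + b[k]
                         for k in range(hidden_size)]
        i_gate = [sigmoid(v) for v in pre["i"]]
        f_gate = [sigmoid(v) for v in pre["f"]]
        o_gate = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        # Cell state: keep part of the past, add part of the new candidate.
        c = [f * c_prev + i * g
             for f, c_prev, i, g in zip(f_gate, c, i_gate, cand)]
        h = [o * math.tanh(c_new) for o, c_new in zip(o_gate, c)]
    return h


# Hypothetical parcel time series: four dates, two normalized features per date.
sequence = [(0.20, 0.35), (0.45, 0.40), (0.80, 0.55), (0.60, 0.50)]
weights = random_weights(input_size=2, hidden_size=3, seed=0)
embedding = lstm_embedding(sequence, weights, hidden_size=3)
print(embedding)  # fixed-length vector usable as classifier input
```

Because the hidden state is updated date by date, the resulting fixed-length vector depends on the order and timing of the observations, which a classifier fed with unordered per-date features cannot exploit directly.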

3.6. Optical/SAR Contribution and Temporal Interpolation Analysis

To further assess the effect of temporal alignment, we compared the XGBoost + LSTM model using non-interpolated Sentinel-1 data and linearly interpolated Sentinel-1 data (Table 9). Linear interpolation produced a small but consistent improvement, indicating that it improved temporal synchronization while preserving the main seasonal signal.
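The alignment step can be sketched as a linear interpolation of a Sentinel-1 series onto the Sentinel-2 acquisition dates. The function below is a minimal illustration under stated assumptions: the day-of-year values and backscatter values are hypothetical, the source days must be strictly increasing, and holding the first and last values constant outside the observed range is a choice made here, not a detail reported in this study.

```python
def interpolate_linear(src_days, src_values, target_days):
    """Linearly interpolate a time series onto target days.

    src_days must be strictly increasing. Targets outside the observed
    range take the nearest end value (an assumption of this sketch).
    """
    out = []
    for d in target_days:
        if d <= src_days[0]:
            out.append(src_values[0])
            continue
        if d >= src_days[-1]:
            out.append(src_values[-1])
            continue
        for k in range(1, len(src_days)):
            if d <= src_days[k]:
                t = (d - src_days[k - 1]) / (src_days[k] - src_days[k - 1])
                out.append(src_values[k - 1]
                           + t * (src_values[k] - src_values[k - 1]))
                break
    return out


# Hypothetical Sentinel-1 VH backscatter (dB) at three acquisition days,
# resampled onto three hypothetical Sentinel-2 acquisition days.
s1_days = [100, 112, 124]
s1_vh = [-20.0, -17.0, -15.0]
s2_days = [105, 112, 120]
aligned = interpolate_linear(s1_days, s1_vh, s2_days)
print(aligned)
```

After this step, each optical date has a matching radar value, so both sources can be stacked into a single feature vector per time step.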

To further evaluate the contribution of different feature sources, we compared optical-only, SAR-only, and multi-source classification results using the optimal XGBoost + LSTM framework. As shown in Table 10, optical features alone provided relatively strong classification performance, whereas SAR-only features were less effective when used independently. The best results were achieved by integrating optical and SAR observations, which confirms that the two data sources provide complementary information for crop discrimination.

Although the differences among the large-area classification maps in Figure 18 appear pronounced, they should be interpreted together with the quantitative results in Table 8. Figure 18 presents wall-to-wall spatial predictions over the entire study area and therefore emphasizes differences in class distribution, parcel continuity, spatial coherence, and boundary delineation across the landscape. By contrast, the metrics reported in Table 8 are derived from validation samples and summarize the statistical agreement between predicted labels and reference observations. Consequently, visually obvious differences in mapped patterns do not necessarily correspond to equally large differences in OA, Kappa, or mIoU. In the present study, many core parcels were correctly classified by all three methods, whereas the most visible differences were concentrated in fragmented fields, boundary zones, and locally heterogeneous areas. These regions strongly affect the visual impression of the classification maps, but their contribution to the global sample-based accuracy metrics is more limited.

3.7. Pixel-Based vs. Parcel-Based Comparison

To further illustrate the spatial effect of parcel-based classification, Figure 19 presents a representative visual comparison between pixel-based and parcel-based classification results. The pixel-based approach shows evident salt-and-pepper noise and less coherent field boundaries, whereas the parcel-based framework produces more homogeneous within-parcel patterns and clearer spatial continuity. This visual example suggests that parcel-level aggregation can effectively suppress local classification noise and improve the spatial consistency of crop maps in fragmented agricultural landscapes.

To further evaluate the effectiveness of parcel-based classification, the proposed parcel-level framework was compared with a conventional pixel-based baseline using the same feature set and classifiers. As shown in Table 11, parcel-based classification consistently improved overall accuracy, Kappa, and mIoU for RF, SVM, and XGBoost. These improvements indicate that parcel-level aggregation reduces within-field heterogeneity and suppresses mixed-pixel effects, thereby enhancing both numerical performance and spatial consistency.
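The noise-suppressing effect of parcel-level aggregation can be illustrated with a simple majority vote over the pixel labels inside each parcel. This is a stand-in chosen for clarity and is an assumption of this sketch: the aggregation used in this study may instead operate on features before classification. The labels and parcel identifiers are hypothetical.

```python
from collections import Counter


def parcel_majority_vote(pixel_labels, pixel_parcels):
    """Assign every pixel the most frequent label within its parcel.

    Ties are resolved in favor of the label encountered first.
    """
    votes = {}
    for label, parcel in zip(pixel_labels, pixel_parcels):
        votes.setdefault(parcel, Counter())[label] += 1
    parcel_label = {p: counts.most_common(1)[0][0]
                    for p, counts in votes.items()}
    return [parcel_label[p] for p in pixel_parcels]


# Hypothetical pixel-based predictions with isolated misclassified pixels.
pixel_labels = ["corn", "corn", "wheat", "corn",
                "sunflower", "sunflower", "corn"]
pixel_parcels = [1, 1, 1, 1, 2, 2, 2]
smoothed = parcel_majority_vote(pixel_labels, pixel_parcels)
print(smoothed)
```

Isolated pixels that disagree with their neighbors are overruled by the parcel majority, which is the mechanism behind the reduction in salt-and-pepper noise. The same mechanism explains why errors in parcel delineation matter: if two crops are merged into one parcel, the minority crop is removed entirely.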

4. Discussion

4.1. Agricultural Interpretation of Differences in Crop Classification Results

Recent studies increasingly employ attention-based deep learning and multi-task learning for multi-source crop classification. However, under moderate spatial resolution, field boundaries are frequently misaligned with image grids, resulting in mixed pixels and parcel adhesion effects in pixel-wise segmentation models, particularly in fragmented agricultural landscapes. In contrast, the proposed parcel-based temporal representation learning framework reduces noise through pseudo-parcel aggregation, thereby improving spatial coherence and classification robustness.

From the overall classification performance, clear differences in recognition accuracy are observed among the three major crops. Corn achieves the highest classification accuracy, followed by wheat, while sunflower shows relatively lower accuracy. These differences are not solely determined by model structure but are closely related to crop-specific growth characteristics and agricultural management practices in the irrigation district.

Corn in the study area generally has relatively concentrated sowing times and consistent field management practices. Its growing season is relatively long, and canopy structure changes markedly during growth, resulting in stable temporal curve patterns. Consequently, corn exhibits clear and continuous temporal features in both optical vegetation indices and radar backscattering characteristics, making it easier to distinguish from other crops in the multi-source temporal feature space. This is an important reason why all models achieve relatively high classification accuracy for corn.

Wheat shows lower classification stability than corn because its sowing and harvesting dates vary more substantially across the study area. In addition, relay cropping with corn occurs in some parcels, which increases phenological overlap during the middle of the growing season. As a result, wheat is more easily confused with other crops in both optical and radar temporal features.

The relatively lower classification accuracy of sunflower is mainly related to its planting system and growth characteristics. Sunflower varieties in the study area are diverse, sowing dates vary considerably, and some parcels involve double cropping or crop rotation with other crops. As a result, sunflower temporal curves exhibit high dispersion in both timing and magnitude. This high variability weakens the ability of a unified temporal model to characterize its growth features and increases the risk of inter-crop confusion.

Overall, the differences in classification accuracy among crops reflect real differences in crop growth rhythms and management practices in irrigation agriculture. This indicates that crop remote sensing classification results should be interpreted in conjunction with agricultural context rather than being evaluated solely from the perspective of model performance.

Compared with previous Sentinel-1/Sentinel-2 crop-mapping studies conducted at pixel scale or relying mainly on temporal statistical features, the proposed framework achieved competitive parcel-level performance in a fragmented irrigation district. Its main contribution is not to claim universally superior accuracy, but to show that parcel-scale aggregation and LSTM-based temporal representation learning can jointly improve spatial consistency and reduce mixed-pixel noise under limited training samples. This is consistent with previous object-oriented crop classification studies showing that parcel- or object-based aggregation can reduce salt-and-pepper noise and improve the spatial coherence of crop maps derived from Sentinel time series [15,22].

4.2. Significance of Parcel-Scale Classification for Irrigation District Agricultural Management

Under fragmented irrigation district conditions, farmland parcels usually correspond to actual cultivation and management units and also represent the basic spatial units for irrigation scheduling, operational planning, and statistical accounting. Compared with pixel-scale classification, parcel-scale classification results exhibit better spatial integrity and consistency, making them more suitable for agricultural applications.

By introducing parcel extraction and spatial constraints, this study effectively suppresses the “salt-and-pepper noise” commonly observed in pixel-scale classification and significantly improves classification consistency within the same parcel. For irrigation district management, such consistency is particularly important, as scattered misclassifications within parcels not only affect the accuracy of crop area statistics but may also interfere with irrigation water allocation calculations and management decisions. It should be noted that in areas with highly fragmented parcels and complex boundaries, relying solely on image resolution for parcel delineation may lead to the erroneous merging of different crops into the same unit, thereby reducing classification accuracy. Therefore, this study adopts a parcel construction strategy based on spectral and spatial constraints, which balances spatial consistency and crop heterogeneity to a certain extent and enables better adaptability of parcel-scale classification in complex irrigation environments.

From an agricultural application perspective, parcel-scale crop classification results are easier to integrate with existing farmland management data and statistical systems and can be directly used for monitoring crop planting structure, evaluating crop layout in irrigation districts, and analyzing the effectiveness of agricultural policy implementation.

The parcel-scale multi-source temporal crop classification method proposed in this study demonstrates strong practical potential for monitoring crop planting structures in irrigation districts. The method can reliably identify major crop types under relatively limited sample conditions, providing technical support for annual crop distribution mapping and planting structure change analysis.

In irrigation management practice, accurate knowledge of the spatial distribution of different crops helps optimize irrigation water allocation and evaluate the matching relationship between crop layout and water resource carrying capacity. With multi-year application, this method can also provide continuous and objective data support for crop rotation assessment, agricultural structural adjustment in irrigation districts, and food security analysis.

4.3. Role of Temporal Features in Characterizing Crop Phenological Differences

Crop phenological processes are an important source of information for distinguishing crop types. In irrigation district environments, due to the influence of irrigation systems and human management factors, spectral differences among crops at a single time point are often insufficient to support reliable classification, whereas complete growing-season temporal information can more comprehensively reflect crop growth rhythms.

The results of this study indicate that temporal features at key growth stages make important contributions to crop discrimination. For example, wheat exhibits temporal characteristics of early peak and early senescence, while corn and sunflower display later growth peaks and more prolonged development. Such differences are difficult to fully capture in single-date imagery but can be effectively learned within a time-series modeling framework. Similar observations have been reported in previous studies using optical and SAR time series for crop classification, which emphasize the importance of key phenological stages for improving class separability.

Combined with feature separability analysis and model interpretation results, it can be observed that vegetation indices, red-edge indices, and radar structural features show stronger discriminative power within specific temporal windows. This suggests that key information for crop classification is not evenly distributed throughout the entire growing season but is concentrated in several critical phenological stages. Integrating information from these stages through temporal modeling helps alleviate the problem of spectral confusion and improves the stability of classification results.

4.4. Method Limitations and Future Research Directions

Although this study achieves favorable results in irrigation district crop classification, certain limitations remain. First, the temporal interpolation applied to Sentinel-1 radar time series may weaken signals associated with extreme phenological variations to some extent, and the ability to characterize anomalous years or special management scenarios still requires improvement. Second, interannual variations in climatic conditions and farmers’ management practices may affect the stability of crop temporal features, limiting the applicability of the model when transferred across years or regions.

Future research could leverage multi-year datasets to further analyze the interannual stability of crop phenological features and explore the integration of higher temporal resolution data or auxiliary agricultural information to enhance the robustness and transferability of crop classification in complex agricultural environments.

5. Conclusions

This study developed a parcel-scale crop classification framework for the Hetao Irrigation District by integrating Sentinel-2 optical imagery, Sentinel-1 SAR observations, and LSTM-based temporal representation learning. The results show that temporal embeddings derived from the LSTM can effectively improve the performance of conventional machine learning classifiers for crop mapping in fragmented irrigated farmland. Among the tested models, XGBoost + LSTM achieved the best performance, with an overall accuracy of 93.61%, a Kappa coefficient of 91.66%, and a mean IoU of 87.41%. In addition, the comparison results indicate that parcel-based aggregation improves within-field consistency and reduces mixed-pixel noise relative to pixel-based classification.

The results also confirm that optical and SAR data provide complementary information for crop discrimination, while temporal modeling helps distinguish crops with different phenological characteristics across the growing season. This is particularly important in irrigation districts where crop phenology, fragmented field structure, and management differences increase classification difficulty. Therefore, the proposed framework provides an effective approach for crop type mapping and planting structure monitoring in the study area, with potential value for irrigation management and agricultural statistics.

Nevertheless, this study was conducted for a single year and one irrigation district, and the transferability of the model across years and regions still requires further validation. Future work should focus on multi-year experiments, cross-region applications, and the integration of additional auxiliary data to improve the robustness and generalizability of parcel-scale crop mapping.

Author Contributions

All authors contributed to the conception and design of this study. Conceptualization, S.S. and Q.L.; methodology, S.S.; software, S.S.; validation, S.S., Q.L. and Z.Y.; formal analysis, S.S.; investigation, S.S.; resources, Q.L. and Z.Y.; data curation, S.S.; writing—original draft preparation, S.S.; writing—review and editing, Q.L. and Z.Y.; visualization, S.S.; supervision, Q.L. and Z.Y.; project administration, Q.L.; funding acquisition, Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation Project of Inner Mongolia Autonomous Region, grant number 2025MS05114. The APC was funded by the Natural Science Foundation Project of Inner Mongolia Autonomous Region, grant number 2025MS05114.

Data Availability Statement

Publicly available Sentinel-1 and Sentinel-2 datasets were analyzed in this study and can be accessed through the Copernicus Data Space Ecosystem. The derived parcel-level datasets generated during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

NDVI	Normalized Difference Vegetation Index
EVI	Enhanced Vegetation Index
NDRE	Normalized Difference Red Edge Index
cIre	Chlorophyll red-edge index
NDWI	Normalized Difference Water Index
SAVI	Soil-adjusted Vegetation Index
RVI	Radar Vegetation Index
SNIC	Simple Non-Iterative Clustering
IoU	Intersection Over Union
mIoU	mean IoU
XGBoost	Extreme Gradient Boosting
SVM	Support Vector Machine
RF	Random Forest
LSTM	Long Short-Term Memory

References

Foley, J.A.; Ramankutty, N.; Brauman, K.A.; Cassidy, E.S.; Gerber, J.S.; Johnston, M.; Mueller, N.D.; O’Connell, C.; Ray, D.K.; West, P.C.; et al. Solutions for a cultivated planet. Nature 2011, 478, 337–342.
Thenkabail, P.S.; Lyon, J.G.; Huete, A. Remote Sensing of Vegetation: Principles, Techniques, and Applications; CRC Press: Boca Raton, FL, USA, 2012.
Atzberger, C. Advances in Remote Sensing of Agriculture: Context Description, Existing Operational Monitoring Systems and Major Information Needs. Remote Sens. 2013, 5, 949–981.
Mulla, D.J. Twenty five years of remote sensing in precision agriculture: Key advances and remaining knowledge gaps. Biosyst. Eng. 2013, 114, 358–371.
Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: ESA’s optical high-resolution mission for GMES operational services. Remote Sens. Environ. 2012, 120, 25–36.
Torres, R.; Snoeij, P.; Geudtner, D.; Bibby, D.; Davidson, M.; Attema, E.; Potin, P.; Rommen, B.; Floury, N.; Brown, M.; et al. GMES Sentinel-1 mission. Remote Sens. Environ. 2012, 120, 9–24.
McNairn, H.; Shang, J. A Review of Multitemporal Synthetic Aperture Radar (SAR) for Crop Monitoring. In Multitemporal Remote Sensing. Remote Sensing and Digital Image Processing; Ban, Y., Ed.; Springer: Cham, Switzerland, 2016; Volume 20.
Veloso, A.; Mermoz, S.; Bouvet, A.; Toan, T.L.; Planells, M.; Dejoux, J.F.; Ceschia, E. Understanding the temporal behavior of crops using Sentinel-1 and Sentinel-2-like data for agricultural applications. Remote Sens. Environ. 2017, 199, 415–426.
Zhong, L.; Hu, L.; Zhou, H. Deep learning based multi-temporal crop classification. Remote Sens. Environ. 2019, 221, 430–443.
Wardlow, B.D.; Egbert, S.L.; Kastens, J.H. Analysis of time-series MODIS 250 m vegetation index data for crop classification in the U.S. Central Great Plains. Remote Sens. Environ. 2007, 108, 290–310.
Pan, Y.; Li, L.; Zhang, J.; Liang, S.; Zhu, X.; Sulla-Menashe, D. Winter wheat area estimation from MODIS-EVI time series data using the Crop Proportion Phenology Index. Remote Sens. Environ. 2012, 119, 232–242.
Tuia, D.; Persello, C.; Bruzzone, L. Domain adaptation for the classification of remote sensing data: An overview of recent advances. IEEE Geosci. Remote Sens. Mag. 2016, 4, 41–57.
Zhu, X.X.; Tuia, D.; Mou, L.; Xia, G.-S.; Zhang, L.; Xu, F.; Fraundorfer, F. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geosci. Remote Sens. Mag. 2017, 5, 8–36.
Zhang, L.; Zhang, L.; Du, B. Deep learning for remote sensing data: A technical tutorial on the state of the art. IEEE Geosci. Remote Sens. Mag. 2016, 4, 22–40.
Xue, H.; Xu, X.; Zhu, Q.; Yang, G.; Long, H.; Li, H.; Yang, X.; Zhang, J.; Yang, Y.; Xu, S.; et al. Object-oriented crop classification using time series Sentinel images from Google Earth Engine. Remote Sens. 2023, 15, 1353.
Feng, F.; Gao, M.; Liu, R.; Yao, S.; Yang, G. A deep learning framework for crop mapping with reconstructed Sentinel-2 time series images. Comput. Electron. Agric. 2023, 213, 108227.
Eisfelder, C.; Boemke, B.; Gessner, U.; Sogno, P.; Alemu, G.; Hailu, R.; Mesmer, C.; Huth, J. Cropland and crop type classification with Sentinel-1 and Sentinel-2 time series using Google Earth Engine for agricultural monitoring in Ethiopia. Remote Sens. 2024, 16, 866.
Sun, Y.; Li, Z.-L.; Luo, J.; Wu, J.; Liu, N. Farmland parcel-based crop classification in cloudy/rainy mountains using Sentinel-1 and Sentinel-2 based deep learning. Int. J. Remote Sens. 2022, 43, 1054–1073.
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
Crisóstomo de Castro Filho, H.; Abílio de Carvalho Júnior, O.; Ferreira de Carvalho, O.L.; Pozzobon de Bem, P.; dos Santos de Moura, R.; Olino de Albuquerque, A.; Rosa Silva, C.; Guimarães Ferreira, P.H.; Fontes Guimarães, R.; Trancoso Gomes, R.A. Rice Crop Detection Using LSTM, Bi-LSTM, and Machine Learning Models from Sentinel-1 Time Series. Remote Sens. 2020, 12, 2655.
Rußwurm, M.; Körner, M. Temporal vegetation modelling using long short-term memory networks for crop identification from medium-resolution multi-spectral satellite images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops; IEEE Computer Society: Los Alamitos, CA, USA, 2017; pp. 11–19.
Sitokonstantinou, V.; Papoutsis, I.; Kontoes, C.; Lafarga Arnal, A.; Armesto Andrés, A.P.; Garraza Zurbano, J.A. Scalable Parcel-Based Crop Identification Scheme Using Sentinel-2 Data Time-Series for the Monitoring of the Common Agricultural Policy. Remote Sens. 2018, 10, 911.
Wang, J.; Ding, J.; Yu, D.; Ma, X.; Zhang, Z.; Ge, X.; Teng, D.; Li, X.; Liang, J.; Lizaga, I.; et al. Capability of Sentinel-2 MSI data for monitoring and mapping of soil salinity in dry and wet seasons in the Ebinur Lake region, Xinjiang, China. Geoderma 2019, 353, 172–187.
Cai, J.; Liu, H.; Lei, T.; Pereira, L.S. Estimating reference evapotranspiration with the FAO Penman–Monteith equation using daily weather forecast messages. Agric. For. Meteorol. 2007, 145, 22–35.
Yu, B.; Shang, S. Multi-Year Mapping of Maize and Sunflower in Hetao Irrigation District of China with High Spatial and Temporal Resolution Vegetation Index Series. Remote Sens. 2017, 9, 855.
Song, W.; Wang, C.; Dong, T.; Wang, Z.; Wang, C.; Mu, X.; Zhang, H. Hierarchical extraction of cropland boundaries using Sentinel-2 time-series data in fragmented agricultural landscapes. Comput. Electron. Agric. 2023, 212, 108097.
Keys, R. Cubic convolution interpolation for digital image processing. IEEE Trans. Acoust. Speech Signal Process. 1981, 29, 1153–1160.
Arias, M.; Campo-Bescós, M.Á.; Álvarez-Mozos, J. On the influence of acquisition geometry in backscatter time series over wheat. Int. J. Appl. Earth Obs. Geoinf. 2022, 106, 102671.
Kaplan, G.; Fine, L.; Lukyanov, V.; Manivasagam, V.S.; Tanny, J.; Rozenstein, O. Normalizing the local incidence angle in Sentinel-1 imagery to improve leaf area index, vegetation height, and crop coefficient estimations. Land 2021, 10, 680.
Weilandt, F.; Behling, R.; Goncalves, R.; Madadi, A.; Richter, L.; Sanona, T.; Spengler, D.; Welsch, J. Early crop classification via multi-modal satellite data fusion and temporal attention. Remote Sens. 2023, 15, 799.
Che, X.; Zhang, H.K.; Li, Z.B.; Wang, Y.; Sun, Q.; Luo, D. Linearly interpolating missing values in time series helps little for land cover classification using recurrent or attention networks. ISPRS J. Photogramm. Remote Sens. 2024, 212, 73–95.
Mandal, D.; Kumar, V.; Ratha, D.; Dey, S.; Bhattacharya, A.; Lopez-Sanchez, J.M.; McNairn, H.; Rao, Y.S. Dual polarimetric radar vegetation index for crop growth monitoring using Sentinel-1 SAR data. Remote Sens. Environ. 2020, 247, 111954.
Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer: Cham, Switzerland, 2015; pp. 234–241.
Achanta, R.; Süsstrunk, S. Superpixels and polygons using simple non-iterative clustering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4895–4904.
Yan, L.; Roy, D.P. Automated crop field extraction from multi-temporal Web Enabled Landsat Data. Remote Sens. Environ. 2014, 144, 42–64.
Sonobe, R.; Yamaya, Y.; Tani, H.; Wang, X.; Kobayashi, N.; Mochizuki, K.-I. Crop classification from Sentinel-2-derived vegetation indices using ensemble learning. J. Appl. Rem. Sens. 2018, 12, 026019.
Delegido, J.; Verrelst, J.; Alonso, L.; Moreno, J. Evaluation of Sentinel-2 red-edge bands for empirical estimation of green LAI and chlorophyll content. Sensors 2011, 11, 7063–7081.
Tucker, C.J. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens. Environ. 1979, 8, 127–150.
Huete, A.R. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
Gao, B.-C. NDWI—A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens. Environ. 1996, 58, 257–266.
Gitelson, A.A.; Gritz, Y.; Merzlyak, M.N. Relationships between leaf chlorophyll content and spectral reflectance and algorithms for non-destructive chlorophyll assessment in higher plant leaves. J. Plant Physiol. 2003, 160, 271–282.
Hashemi, M.G.Z.; Jalilvand, E.; Alemohammad, H.; Tan, P.-N.; Das, N.N. Review of synthetic aperture radar with deep learning in agricultural applications. ISPRS J. Photogramm. Remote Sens. 2024, 218, 20–49. [Google Scholar] [CrossRef]
van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef]
Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 4768–4777. [Google Scholar]
Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene selection for cancer classification using support vector machines. Mach. Learn. 2002, 46, 389–422. [Google Scholar] [CrossRef]

Figure 1. Schematic map of the geographical location of the Hetao Irrigation District.

Figure 2. Crop growth periods and acquisition dates of Sentinel-2 and Sentinel-1 imagery.

Figure 3. Integrated preprocessing workflow of Sentinel-2 and Sentinel-1 data.

Figure 4. Ground samples and local zoomed-in view.

Figure 5. Spatial distribution of the ground samples in the irrigation district.

Figure 6. Land parcel extraction based on U-Net.

Figure 7. Changes in parcels after merging and filtering.

Figure 8. Technical workflow of the proposed parcel-scale crop classification framework.

Figure 9. Time-series curves of typical optical phenological indices, including NDVI, EVI, GNDVI, SAVI, NDWI, and RENDVI. Shaded areas represent the standard deviation.

Figure 10. Time-series curves of VH and VV backscattering coefficients, RVI, and VV/VH. Shaded areas represent the standard deviation.

Figure 11. LSTM model architecture and outputs.

Figure 12. Schematic diagram of a single LSTM unit.

Figure 13. t-SNE visualization of the learned temporal embeddings derived from the LSTM network. The embeddings integrate multi-temporal Sentinel-1 and Sentinel-2 observations and compress the annual time-series information into a low-dimensional representation space. Different colors represent different crop types.

Figure 14. SHAP-based interpretation of the XGBoost model.

Figure 15. SHAP-based interpretation of the SVM model.

Figure 16. SHAP-based interpretation of the RF model.

Figure 17. Five example distribution charts for the three classification models. The numbers 1–5 in the figure correspond to the numbers 1–5 in Table 6.

Figure 18. Large-scale mapping results of the three methods and corresponding parcel-level statistical values.

Figure 19. Representative visual comparison between pixel-based and parcel-based classification results. The pixel-based approach shows evident salt-and-pepper noise and more ambiguous field boundaries, whereas the parcel-based framework produces more homogeneous within-parcel patterns and clearer spatial continuity. Although some parcel borders may be narrower than the spatial resolution of the input imagery, parcel-level aggregation still helps reduce local mixed-pixel noise and improves the overall coherence of field-scale crop maps.

Table 1. Summary of Sentinel-1 and Sentinel-2 data used in this study.

Data Source	Acquisition Period	Source	Acquisition Mode	Type	Polarisation Channels	Number of Scenes
Sentinel-1	1 March 2024 to 12 October 2024	Copernicus Browser	IW	GRD	VV + VH	8
Data Source	Acquisition Period	Source	Cloud Threshold	Processing Level	Tile ID	Number of Scenes
Sentinel-2	1 March 2024 to 30 October 2024	Copernicus Browser	<10%	L2A	T48TYL	66

Table 2. Band information and feature definitions of Sentinel-2 (S2) and Sentinel-1 (S1).

Sensor	Band and Feature	Band Designation or Calculation Formula
S2	Blue	B2 496.6 nm (S2A)/492.1 nm (S2B)
	Green	B3 560 nm (S2A)/559 nm (S2B)
	Red	B4 664.5 nm (S2A)/665 nm (S2B)
	NIR	B8 835.1 nm (S2A)/833 nm (S2B)
	SWIR-1	B11 1613.7 nm (S2A)/1610.4 nm (S2B)
	SWIR-2	B12 2202.4 nm (S2A)/2185.7 nm (S2B)
	RENDVI	(B8 − B7)/(B8 + B7)
	GNDVI	(B8 − B3)/(B8 + B3)
	NDVI	(B8 − B4)/(B8 + B4)
	EVI	2.5 × (B8 − B4)/(B8 + 6 × B4 − 7.5 × B2 + 1)
	NDRE	(B8 − B5)/(B8 + B5)
	cIre	B8/B5 − 1
	NDWI	(B8 − B11)/(B8 + B11)
	SAVI	1.5 × (B8 − B4)/(B8 + B4 + 0.5)
S1	Backscattering coefficient under VV polarization	σ⁰_VV
	Backscattering coefficient under VH polarization	σ⁰_VH
	Backscattering coefficient ratio	VV/VH
	RVI	4 × VH/(VV + VH)
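
The formulas in Table 2 map directly onto array operations. The following is a minimal NumPy sketch, assuming optical inputs are surface reflectance already scaled to 0–1 and radar inputs are backscatter in linear power units (not dB). The function names `optical_indices` and `radar_features` and the dictionary keys are illustrative and are not taken from the study's code.

```python
import numpy as np

def optical_indices(b2, b3, b4, b5, b7, b8, b11):
    """Compute the Table 2 optical indices from Sentinel-2 band arrays.

    Assumption: inputs are reflectance scaled to 0-1. No masking of
    zero denominators is done here; real imagery would need it.
    """
    b2, b3, b4, b5, b7, b8, b11 = (
        np.asarray(b, dtype=float) for b in (b2, b3, b4, b5, b7, b8, b11)
    )

    def nd(a, b):
        # Normalized difference (a - b) / (a + b)
        return (a - b) / (a + b)

    return {
        "RENDVI": nd(b8, b7),
        "GNDVI": nd(b8, b3),
        "NDVI": nd(b8, b4),
        "EVI": 2.5 * (b8 - b4) / (b8 + 6.0 * b4 - 7.5 * b2 + 1.0),
        "NDRE": nd(b8, b5),
        "cIre": b8 / b5 - 1.0,
        "NDWI": nd(b8, b11),
        "SAVI": 1.5 * (b8 - b4) / (b8 + b4 + 0.5),
    }

def radar_features(vv, vh):
    """Compute the Table 2 derived radar features from linear-scale backscatter."""
    vv = np.asarray(vv, dtype=float)
    vh = np.asarray(vh, dtype=float)
    return {
        "VV/VH": vv / vh,
        "RVI": 4.0 * vh / (vv + vh),
    }

# Hypothetical single-pixel values, for illustration only
opt = optical_indices(b2=0.04, b3=0.08, b4=0.10, b5=0.20, b7=0.40, b8=0.50, b11=0.25)
rad = radar_features(vv=0.10, vh=0.02)
print({k: round(float(v), 3) for k, v in {**opt, **rad}.items()})
```

The same functions accept full image arrays, so each acquisition date can be processed in one call before the per-parcel time series is assembled.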

Table 3. Quantitative comparison of parcel extraction methods.

Extraction Method	Boundary IoU	Remarks
U-Net (Baseline)	63.21%	Suffers from severe parcel adhesion; spectral and phenological signals of different crop types are frequently mixed.
SNIC (Proposed)	77.34%	Effectively suppresses “salt-and-pepper” noise while maintaining high within-parcel spatial homogeneity.

Table 4. Preliminary sensitivity analysis for selecting SNIC parameter settings.

Model	Seed Spacing (Pixels)	Compactness	Overall Accuracy (OA)	mIoU	Kappa
XGBoost + LSTM	8	20	93.42%	87.15%	91.35%
XGBoost + LSTM	10	20	93.61%	87.41%	91.66%
XGBoost + LSTM	15	20	92.85%	86.20%	90.50%
XGBoost + LSTM	10	10	93.15%	86.80%	91.02%
XGBoost + LSTM	10	30	93.50%	87.00%	91.40%

Table 5. Ablation study on the impact of feature selection (using the XGBoost + LSTM model).

Feature Strategy	Overall Accuracy (OA)	mIoU	Kappa
Reduced Feature Set (After explicit feature selection via RFE)	93.35%	86.92%	0.90
Full Feature Set (No explicit feature removal)	93.61%	87.41%	0.91

Table 7. Validation results of different crops for the three models.

Model	RF + LSTM			SVM + LSTM			XGBoost + LSTM
Crop Types	Wheat	Corn	Sunflower	Wheat	Corn	Sunflower	Wheat	Corn	Sunflower
F1	81.35%	93.33%	90.9%	70.83%	85.71%	76.9%	85.61%	97.22%	87.27%
UA	80%	96.55%	92.1%	73.91%	80.76%	78.12%	87.20%	96.80%	89.10%
PA	82.76%	90.32%	89.74%	68%	91.35%	75.66%	84.08%	97.64%	85.51%
IoU	68.57%	87.5%	83.33%	54.84%	75%	62.47%	74.84%	94.59%	77.41%
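
The per-class values in Table 7 are linked by simple identities, which gives a quick consistency check. The sketch below assumes the standard definitions: F1 is the harmonic mean of user's accuracy (UA, precision) and producer's accuracy (PA, recall), and the per-class IoU equals F1/(2 − F1). The helper names are illustrative.

```python
def f1_from_ua_pa(ua, pa):
    """Harmonic mean of user's accuracy (precision) and producer's accuracy (recall)."""
    return 2.0 * ua * pa / (ua + pa)

def iou_from_f1(f1):
    """Per-class IoU = TP / (TP + FP + FN), which equals F1 / (2 - F1)."""
    return f1 / (2.0 - f1)

# XGBoost + LSTM, wheat column of Table 7, expressed as fractions
ua, pa = 0.8720, 0.8408
f1 = f1_from_ua_pa(ua, pa)
iou = iou_from_f1(f1)
print(f"F1 = {f1:.2%}, IoU = {iou:.2%}")  # F1 = 85.61%, IoU = 74.84%
```

Under these assumptions the computed values agree with the wheat F1 and IoU reported for XGBoost + LSTM in Table 7.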

Table 8. Results of the ablation experiments.

Model	RF	RF + LSTM	SVM	SVM + LSTM	XGBoost	XGBoost + LSTM
OA	84.67%	90.78%	78.83%	82.81%	85.4%	93.61%
Kappa	0.80	0.88	0.72	0.81	0.81	0.92
mIoU	72.8%	81.34%	61.85%	70.46%	73.23%	87.41%

Table 9. Comparison of classification performance before and after Sentinel-1 linear interpolation (using the XGBoost + LSTM model).

Data Processing Strategy	Overall Accuracy (OA)	mIoU	Kappa
Without Interpolation (Nearest-neighbor temporal matching)	93.18%	86.82%	0.91
With Linear Interpolation (Synchronized to 15-day intervals)	93.61%	87.41%	0.92
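
The two strategies compared in Table 9 can be illustrated on a single parcel's radar series. This is a minimal sketch, assuming acquisition dates are expressed as day of year; the function names, the dates, and the backscatter values are hypothetical and only show the mechanics of resampling onto a 15-day grid.

```python
import numpy as np

def resample_linear(obs_days, obs_values, target_days):
    """Linearly interpolate an irregular time series onto target dates.

    np.interp holds the first/last observed value outside the observed
    range, so no extrapolation takes place.
    """
    obs_days = np.asarray(obs_days, dtype=float)
    obs_values = np.asarray(obs_values, dtype=float)
    order = np.argsort(obs_days)  # np.interp requires increasing x
    return np.interp(target_days, obs_days[order], obs_values[order])

def resample_nearest(obs_days, obs_values, target_days):
    """Nearest-neighbor temporal matching: use the observation closest in time."""
    obs_days = np.asarray(obs_days, dtype=float)
    obs_values = np.asarray(obs_values, dtype=float)
    target_days = np.asarray(target_days, dtype=float)
    gaps = np.abs(obs_days[None, :] - target_days[:, None])
    return obs_values[gaps.argmin(axis=1)]

# Hypothetical VH backscatter (dB) for one parcel at eight acquisition dates
s1_days = [65, 95, 125, 155, 185, 215, 250, 280]
s1_vh = [-22.0, -21.0, -19.0, -16.0, -15.0, -15.5, -18.0, -20.0]
grid = np.arange(60, 300, 15)  # 15-day grid over the season

vh_linear = resample_linear(s1_days, s1_vh, grid)
vh_nearest = resample_nearest(s1_days, s1_vh, grid)
print(np.round(vh_linear, 2))
print(vh_nearest)
```

Nearest-neighbor matching repeats the same observation for adjacent grid dates when acquisitions are sparse, whereas linear interpolation produces a smoother series aligned with the optical time steps.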

Table 10. Comparison of optical-only, SAR-only, and multi-source classification performance.

Feature Source	OA	Kappa	mIoU
Optical only	91.24%	0.88	83.92%
SAR only	84.31%	0.79	72.46%
Optical + SAR	93.61%	0.92	87.41%

Table 11. Pixel-based vs. parcel-based classification comparison.

Method	OA	Kappa	mIoU
Pixel-based RF	88.42%	0.85	75.31%
Parcel-based RF	90.78%	0.88	81.34%
Pixel-based XGBoost	89.57%	0.87	77.18%
Parcel-based XGBoost	93.61%	0.91	87.41%
Pixel-based SVM	79.57%	0.77	67.44%
Parcel-based SVM	82.81%	0.81	70.46%
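
To make the pixel-based versus parcel-based contrast in Table 11 concrete, the sketch below shows two common ways of aggregating to parcels: averaging pixel features within each parcel before classification, and smoothing pixel-level predictions by majority vote. The aggregation rule used in the study is not restated here, so both functions are assumptions for illustration, and all names and values are hypothetical.

```python
import numpy as np

def parcel_mean_features(features, parcel_ids):
    """Average per-pixel feature vectors within each parcel.

    features:   (n_pixels, n_features) array
    parcel_ids: (n_pixels,) integer parcel label for each pixel
    Returns (unique_ids, means), means has shape (n_parcels, n_features).
    """
    features = np.asarray(features, dtype=float)
    parcel_ids = np.asarray(parcel_ids).ravel()
    unique_ids, inverse = np.unique(parcel_ids, return_inverse=True)
    sums = np.zeros((unique_ids.size, features.shape[1]))
    np.add.at(sums, inverse, features)  # accumulate rows per parcel
    counts = np.bincount(inverse, minlength=unique_ids.size)
    return unique_ids, sums / counts[:, None]

def parcel_majority_vote(pixel_labels, parcel_ids, n_classes):
    """Replace each pixel's predicted class with the most frequent class in its parcel.

    Ties are resolved in favor of the lowest class index (argmax behavior).
    """
    pixel_labels = np.asarray(pixel_labels).ravel()
    parcel_ids = np.asarray(parcel_ids).ravel()
    unique_ids, inverse = np.unique(parcel_ids, return_inverse=True)
    votes = np.zeros((unique_ids.size, n_classes), dtype=int)
    np.add.at(votes, (inverse, pixel_labels), 1)  # count votes per parcel and class
    return votes.argmax(axis=1)[inverse]

# Hypothetical example: five pixels in two parcels, one noisy prediction
parcel_ids = np.array([0, 0, 0, 1, 1])
pixel_pred = np.array([2, 2, 1, 0, 0])
print(parcel_majority_vote(pixel_pred, parcel_ids, n_classes=3))  # [2 2 2 0 0]

ids, means = parcel_mean_features([[0.2], [0.4], [0.6], [1.0], [3.0]], parcel_ids)
print(ids, means.ravel())
```

In the example, the single mislabeled pixel in parcel 0 is overruled by its neighbors, which is the mechanism by which parcel aggregation suppresses salt-and-pepper noise.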

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

