Yield Prediction in Winter Oilseed Rape Based on Multi-Temporal NDVI and Modelling Approaches

Okupska, Edyta; Juostas, Antanas; Gozdowski, Dariusz; Wójcik-Gront, Elżbieta

doi:10.3390/agronomy16070763

¹ Seed and Agricultural Farm, “Bovinas” Ltd., Chodów 17, 62-652 Poznań, Poland

² Institute of Agricultural Engineering and Safety, Agriculture Academy, Vytautas Magnus University, Studentu 15, Akademija, LT-53362 Kaunas, Lithuania

³ Department of Biometry, Institute of Agriculture, Warsaw University of Life Sciences, Nowoursynowska 159, 02-776 Warsaw, Poland

* Author to whom correspondence should be addressed.

Agronomy 2026, 16(7), 763; https://doi.org/10.3390/agronomy16070763

Submission received: 2 March 2026 / Revised: 1 April 2026 / Accepted: 3 April 2026 / Published: 5 April 2026

(This article belongs to the Special Issue Agricultural Monitoring and Yield Assessment Through Remote Sensing and GIS)

Abstract

Accurate prediction of winter oilseed rape yield is essential for optimising crop management and improving production efficiency. However, the reliability of commonly reported model performance remains uncertain due to the widespread use of random validation strategies. This study evaluated the predictive potential of multi-temporal Normalised Difference Vegetation Index (NDVI) metrics collected between September 2023 and May 2024 for yield estimation across multiple Lithuanian fields, while explicitly addressing spatial generalisation. The analytical dataset comprised dry yield (t ha⁻¹), monthly NDVI, and field identifiers, and underwent quality control, including outlier removal. Four modelling approaches were compared: ordinary least squares (OLS) regression, Random Forest (RF), Extreme Gradient Boosting (XGBoost), and a Deep Neural Network (DNN). Model performance was assessed using both a random (80/20) split and a spatially independent field-wise (GroupSplit) validation scheme. The field-wise scheme was designed to test model transferability to previously unseen fields and was further extended by repeated group-based resampling to quantify variability in model generalisation. Under random sampling, RF and XGBoost achieved the highest accuracy (RMSE ≈ 0.85 t ha⁻¹, R² ≈ 0.55). However, under spatially independent validation, predictive performance declined markedly for all models, with tree-based ensembles showing near-zero R² values, indicating limited transferability to unseen fields. In contrast, the DNN demonstrated more consistent generalisation (RMSE = 1.09 t ha⁻¹, R² = 0.28). Repeated field-wise validation confirmed that performance estimates based on random splits substantially overestimate true predictive capability. Feature importance analyses consistently identified spring NDVI, particularly from March to May, as the dominant predictor of yield, whereas autumn NDVI showed weaker and less consistent relationships with yield. 
These findings demonstrate that a large portion of the predictive skill reported in NDVI-based yield modelling may arise from spatial information leakage rather than transferable crop-environment relationships. By explicitly quantifying the gap between random and spatial validation, this study provides a more realistic benchmark for model performance and highlights the necessity of spatially robust evaluation frameworks for operational yield prediction in precision agriculture.

Keywords:

winter oilseed rape; NDVI; yield prediction; remote sensing; machine learning; deep neural network; random forest; precision agriculture

1. Introduction

Accurate yield prediction remains a cornerstone of modern precision agriculture, informing management decisions, supply chain planning, and broader food security assessments. For winter oilseed rape (Brassica napus L.), one of the dominant oilseed crops in temperate regions, yield formation is strongly conditioned by both autumn canopy establishment and rapid biomass accumulation during spring regrowth [1,2]. Detecting spatial and temporal variability in crop vigour during these phases can provide actionable guidance for fertilisation, irrigation, and pest control, thereby supporting more efficient and sustainable production systems [3]. Over the past two decades, remote sensing has become an increasingly integral component of agricultural monitoring, largely due to the growing availability of multispectral satellite imagery and improvements in data processing workflows. Among vegetation metrics, the Normalised Difference Vegetation Index (NDVI) is still one of the most widely used proxies for green biomass, canopy closure, and photosynthetic activity [4,5,6]. Based on near-infrared and red reflectance, NDVI provides a sensitive though not flawless indicator of plant vigour across developmental stages and has been widely applied in yield estimation studies for cereals, maize, soybean, and oilseed rape [7,8,9,10]. The temporal depth of the NDVI time series further enables tracking of phenological transitions, which often carry important signals related to final yield [11]. In winter oilseed rape, earlier work has pointed to the combined importance of autumn establishment and spring canopy dynamics for yield determination [12,13,14,15]. NDVI measured in early spring frequently shows a positive association with yield, likely because it reflects overwintering success and the initial pace of canopy recovery [16,17,18]. The autumn signal, however, appears more context-dependent. 
Under mild conditions, vigorous autumn growth may support higher yield potential, whereas in colder regions, excessive early biomass can increase exposure to winter damage [19]. For this reason, reliable yield prediction is unlikely to depend on any single observation date but rather on capturing the seasonal trajectory of canopy development [20]. Parallel to advances in remote sensing, data-driven modelling approaches have reshaped yield prediction research. Classical linear regression remains attractive for its transparency and interpretability, yet it may struggle to capture nonlinear interactions among biophysical variables [21]. Machine-learning algorithms, including Random Forest (RF), Extreme Gradient Boosting (XGBoost), and artificial neural networks (ANNs), offer greater flexibility in modelling complex relationships between multi-temporal NDVI features and yield [22,23,24]. Encouraging results have been reported for several crops, such as wheat [25], maize [26], and oilseed rape [27]. Still, an important limitation persists. Many models perform well under random sampling but degrade when applied beyond their training domain, reflecting spatial heterogeneity, field-specific effects, and environmental variability [28]. Such models are mostly evaluated using random train-test splits. This may lead to overly optimistic performance estimates because spatial structure and field-specific effects are partially shared between the training and test data. Assessing model generalisation therefore requires evaluation under spatially independent validation schemes such as field-wise cross-validation, which more closely mimic the operational task of predicting yield for previously unseen fields [29]. Despite its critical importance for operational deployment, the explicit comparison of random versus spatially independent validation remains underexplored in winter oilseed rape yield modelling. 
This study addresses this methodological gap by quantifying the spatial transferability penalty across different machine learning architectures. At the same time, the relative importance of individual phenological windows remains insufficiently resolved for winter oilseed rape, whose growth trajectory spans both autumn establishment and spring regrowth phases.

Against this background, the present study aims to clarify the temporal sensitivity and predictive value of multi-temporal NDVI for winter oilseed rape yield estimation. Specifically, the objectives of this study are to: (1) quantify relationships between multi-temporal NDVI and winter oilseed rape yield during the 2023–2024 growing season; (2) evaluate and compare linear and nonlinear modelling approaches (OLS, RF, XGBoost, and Deep Neural Network); and (3) identify the phenological periods that contribute most strongly to predictive performance. This work combines multi-temporal Sentinel-2 NDVI data with several modelling frameworks and a spatially rigorous validation strategy. In doing so, it seeks to improve understanding of the temporal dynamics of NDVI-yield relationships and to provide practical guidance for remote-sensing-supported decision-making in oilseed rape production. The study explicitly quantifies the discrepancy between random and spatially independent validation using repeated group-based resampling. It demonstrates that commonly reported model accuracies may substantially overestimate true predictive performance, thereby redefining how NDVI-based yield models should be evaluated for operational use.

2. Materials and Methods

2.1. Study Area

The study was conducted in central Lithuania, a region characterised by a humid continental climate with cold winters and moderately warm summers. The average annual temperature is approximately 6–7 °C, while mean annual precipitation ranges between 600 and 700 mm. The growing season typically extends from April to October. The study included eight commercial winter oilseed rape fields, ranging in size from 6.9 to 62.5 ha and totalling approximately 224 ha. Yield monitor data were collected at harvest in July and August 2024 and cover a single growing season, 2023–2024 (Figure 1, Table 1).

2.2. Yield Monitor Data

Each yield dataset consisted of georeferenced yield monitor observations of dry seed yield, expressed in t ha⁻¹. In total, 29,569 yield observations were retained for analysis after preprocessing and filtering of raw yield monitor data (Table S1). Observed winter oilseed rape yield across the analysed fields ranged from 0.31 to 9.95 t ha⁻¹, reflecting substantial spatial variability within and between fields. Normalised Difference Vegetation Index (NDVI) metrics derived from Sentinel-2 imagery were used to describe crop development during the growing season. NDVI values were aggregated into monthly metrics spanning September 2023 to May 2024 (Table 2).

2.3. Sentinel-2 NDVI Processing

Sentinel-2 multispectral imagery was used to derive vegetation metrics describing crop development during the growing season. NDVI was calculated using the standard formulation:

\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}}

where NIR corresponds to Sentinel-2 band B8 and RED corresponds to band B4. These bands provide a spatial resolution of 10 m.
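A minimal NumPy sketch of the NDVI formula applies it to co-registered Sentinel-2 B8 (NIR) and B4 (RED) arrays. It assumes the inputs are already surface reflectance in physical units; the scaling and offsets of the delivered Sentinel-2 products are not described in the text. The `ndvi` function name and the NaN handling for zero denominators are illustrative choices, not the authors' code.

```python
import numpy as np

def ndvi(b8_nir, b4_red, eps=1e-12):
    """NDVI = (NIR - RED) / (NIR + RED) for Sentinel-2 B8 (NIR) and B4 (RED).

    Inputs are reflectance arrays of the same shape (e.g. 10 m rasters).
    Pixels where NIR + RED is (near) zero are returned as NaN instead of
    producing a division-by-zero warning.
    """
    nir = np.asarray(b8_nir, dtype=float)
    red = np.asarray(b4_red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    valid = np.abs(denom) > eps
    out[valid] = (nir[valid] - red[valid]) / denom[valid]
    return out

# Two vegetated pixels and one empty (zero-reflectance) pixel.
print(ndvi([0.40, 0.30, 0.0], [0.05, 0.10, 0.0]))
```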

Monthly NDVI values were calculated as the mean of all available cloud-free Sentinel-2 observations within each month. The analytical dataset, therefore, consisted of winter oilseed rape yield observations, together with multi-temporal NDVI metrics describing crop development during the autumn establishment phase (September–November 2023) and the spring regrowth period (February–May 2024). The final modelling dataset included winter oilseed rape dry yield (t ha⁻¹), monthly NDVI metrics, and categorical field identifiers. Field identifiers were encoded using one-hot encoding to capture field-specific variability that may not be fully represented by vegetation indices alone. NDVI variables were treated as continuous predictors and standardised where required by the modelling algorithms. Missing NDVI values were imputed with the median from the training subset, preventing information leakage from the test data. Yield observations exceeding 10 t ha⁻¹ were removed prior to modelling because they were considered implausible outliers likely resulting from measurement artefacts. Preliminary data inspection and formatting were performed in Microsoft Excel prior to statistical analysis.
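The preprocessing steps in this paragraph can be sketched with pandas. The column names (`point_id`, `date`, `ndvi`, `cloud_free`, `field_id`, `yield_t_ha`) and the function names are hypothetical. What the sketch shows is the ordering of operations:

- monthly means are computed from cloud-free scenes only;
- the 10 t ha⁻¹ outlier filter is applied before splitting;
- medians are estimated on the training subset alone and then used to fill both subsets;
- one-hot field identifiers are aligned to the training columns.

```python
import numpy as np
import pandas as pd

def monthly_ndvi(scenes: pd.DataFrame) -> pd.DataFrame:
    """Mean cloud-free NDVI per point and calendar month, in wide format.

    Hypothetical input: one row per point and acquisition date with columns
    point_id, date, ndvi and cloud_free (bool).
    """
    clear = scenes[scenes["cloud_free"]].copy()
    clear["month"] = pd.to_datetime(clear["date"]).dt.strftime("NDVI_%Y-%m")
    return clear.pivot_table(index="point_id", columns="month",
                             values="ndvi", aggfunc="mean")

def drop_yield_outliers(df: pd.DataFrame, max_yield: float = 10.0) -> pd.DataFrame:
    """Remove yield observations above max_yield t/ha (applied before splitting)."""
    return df[df["yield_t_ha"] <= max_yield].copy()

def prepare_split(train: pd.DataFrame, test: pd.DataFrame, ndvi_cols: list):
    """Train-only median imputation and one-hot field IDs aligned to train."""
    train, test = train.copy(), test.copy()
    medians = train[ndvi_cols].median()          # estimated on training rows only
    train[ndvi_cols] = train[ndvi_cols].fillna(medians)
    test[ndvi_cols] = test[ndvi_cols].fillna(medians)
    cols = ndvi_cols + ["field_id"]
    X_train = pd.get_dummies(train[cols], columns=["field_id"], dtype=float)
    X_test = pd.get_dummies(test[cols], columns=["field_id"], dtype=float)
    # Keep exactly the training columns: dummies for unseen fields are dropped,
    # and training-field dummies missing from the test set are filled with 0.
    X_test = X_test.reindex(columns=X_train.columns, fill_value=0.0)
    return X_train, train["yield_t_ha"], X_test, test["yield_t_ha"]
```

With this column alignment, a field that never appears in the training subset (the GroupSplit case) receives zeros in every field-identifier column. The one-hot encoding therefore carries no information for that field.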

2.4. Modelling Approaches

To evaluate the predictive capacity of NDVI time series, both classical statistical and machine-learning models were implemented. Four modelling methods were compared:

- Ordinary Least Squares (OLS) regression, used as an interpretable baseline describing linear relationships between NDVI and yield;
- Random Forest (RF) regression, capable of capturing nonlinear interactions between predictors;
- Extreme Gradient Boosting (XGBoost), a gradient boosting ensemble method designed to model complex nonlinear relationships;
- a Deep Neural Network (DNN), used to evaluate the potential of deep learning for yield prediction.

The Deep Neural Network consisted of three fully connected hidden layers with 256, 128, and 64 neurons, respectively. LeakyReLU activation functions were used in each hidden layer. To reduce the risk of overfitting, dropout regularisation (0.20–0.25) was applied. The network was trained using the Adam optimiser with a learning rate of 0.001 and mean squared error as the loss function [30]. Training was performed for up to 400 epochs, with early stopping triggered when validation loss did not improve for 20 consecutive epochs. Deep Neural Networks generally require large training datasets. The present dataset, with more than 29,000 observations, was considered sufficient for the model to learn spatial patterns within fields.
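The DNN described above can be sketched in NumPy to make the stated architecture concrete:

- three hidden layers of 256, 128 and 64 units with LeakyReLU;
- dropout on the hidden layers during training;
- a single linear output for yield;
- patience-based early stopping.

This is not the authors' TensorFlow/Keras implementation, and Adam training itself is omitted. Several details are assumptions rather than facts from the text: the weight initialisation, the LeakyReLU slope of 0.3 (the Keras layer default), the dropout rate of 0.2, and the example input width of 12 (for example, four NDVI months plus eight one-hot field columns).

```python
import numpy as np

HIDDEN = (256, 128, 64)  # hidden layer sizes stated in the text

def init_params(n_inputs, hidden=HIDDEN, seed=0):
    """He-style random initialisation (an assumption, not stated in the paper)."""
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, *hidden, 1]                 # single linear output for yield
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def leaky_relu(x, slope=0.3):                      # 0.3 mirrors the Keras default
    return np.where(x > 0, x, slope * x)

def forward(params, X, dropout=0.2, training=False, rng=None):
    """LeakyReLU hidden layers; inverted dropout only when training (needs rng)."""
    h = np.asarray(X, dtype=float)
    for W, b in params[:-1]:
        h = leaky_relu(h @ W + b)
        if training and dropout > 0:
            keep = rng.random(h.shape) >= dropout
            h = h * keep / (1.0 - dropout)         # rescale to keep expected value
    W, b = params[-1]
    return h @ W + b                               # linear output, trained with MSE

def n_trainable(params):
    """Total number of weights and biases."""
    return sum(W.size + b.size for W, b in params)

def early_stopping_epoch(val_losses, patience=20, max_epochs=400):
    """Return (stop_epoch, best_epoch) for patience-based early stopping."""
    best, best_epoch, wait = np.inf, 0, 0
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return min(len(val_losses), max_epochs) - 1, best_epoch
```

For an input width of 12, this architecture has 44,545 trainable weights and biases. That is one plausible reason why dropout and early stopping are needed as regularisation.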

Model performance was evaluated using two complementary validation strategies. First, a random 80/20 split divided the dataset into training (80%) and testing (20%) subsets. Under this scheme, approximately 23,600 observations were used for training and 5900 observations for testing. For the Deep Neural Network, a further 20% of the training subset was held out as a validation set, used to monitor validation loss for early stopping. Second, a field-wise GroupSplit validation was applied. In this approach, the grouping variable corresponded to the field identifier, ensuring that all observations originating from the same field were assigned either to the training or to the testing subset. Approximately 80% of the fields were used for training, and the remaining 20% for testing. This strategy prevents spatial information leakage and represents a more realistic scenario in which models must predict yield for previously unseen fields. To further assess the robustness of spatial transferability, field-wise validation was repeated 20 times using GroupShuffleSplit.
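The two validation schemes differ only in how indices are drawn. The NumPy sketch below follows the idea of scikit-learn's `GroupShuffleSplit` without depending on it: in each of `n_splits` repeats, whole groups (here, fields) are sampled into the test set. The number of test fields is rounded up from 20% of the fields, which gives two out of eight. The function names and defaults are illustrative.

```python
import numpy as np

def group_shuffle_splits(groups, n_splits=20, test_size=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs in which whole groups are held out.

    In each repeat, ceil(test_size * n_groups) randomly chosen groups form
    the test set, so no field contributes observations to both subsets.
    """
    groups = np.asarray(groups)
    unique = np.unique(groups)
    n_test = max(1, int(np.ceil(test_size * len(unique))))
    rng = np.random.default_rng(seed)
    for _ in range(n_splits):
        test_groups = rng.choice(unique, size=n_test, replace=False)
        is_test = np.isin(groups, test_groups)
        yield np.flatnonzero(~is_test), np.flatnonzero(is_test)

def random_split(n, test_size=0.2, seed=0):
    """Plain random 80/20 split of observation indices (ignores fields)."""
    perm = np.random.default_rng(seed).permutation(n)
    n_test = int(round(test_size * n))
    return perm[n_test:], perm[:n_test]
```

For the 29,569 observations in this study, `random_split` yields 23,655 training and 5,914 test indices, consistent with the approximate figures above. Under field-wise splitting, the number of test observations instead depends on which fields are drawn in each repeat.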

Predictive performance was assessed using Root Mean Square Error (RMSE) and the coefficient of determination (R²), which were treated as the main measures of predictive accuracy. Pearson’s correlation coefficient (r) was also reported as a supplementary descriptive statistic, as it does not directly quantify prediction error or explained variance.

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

r = \frac{\sum_{i = 1}^{n} (y_{i} - \bar{y}) ({\hat{y}}_{i} - \bar{\hat{y}})}{\sqrt{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}} \sqrt{\sum_{i = 1}^{n} {({\hat{y}}_{i} - \bar{\hat{y}})}^{2}}}

where n is the number of observations, y_{i} is the observed yield, {\hat{y}}_{i} is the predicted yield, \bar{y} is the mean observed yield, and \bar{\hat{y}} is the mean predicted yield.
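The three formulas translate directly into NumPy. The toy example below uses invented numbers, not study data. It illustrates why r is reported only as a supplementary statistic: predictions shifted by a constant 1 t ha⁻¹ are perfectly correlated with the observations (r = 1), yet R² is only 0.2 and RMSE is 1 t ha⁻¹. In scikit-learn, `mean_squared_error` and `r2_score` in `sklearn.metrics` compute the corresponding error and R² quantities.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def r2(y, y_hat):
    """Coefficient of determination relative to the mean of the observations."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def pearson_r(y, y_hat):
    """Pearson correlation between observed and predicted values."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    dy, dp = y - y.mean(), y_hat - y_hat.mean()
    return float(np.sum(dy * dp) / np.sqrt(np.sum(dy ** 2) * np.sum(dp ** 2)))

# Perfectly correlated but biased predictions (+1 t/ha everywhere).
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 3.0, 4.0, 5.0]
print(rmse(y_obs, y_pred), r2(y_obs, y_pred), pearson_r(y_obs, y_pred))
```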

To identify which periods of crop development contributed most strongly to yield prediction, feature importance was analysed for the Random Forest and CART (Classification and Regression Trees) models. Two complementary measures were used: the built-in mean decrease in impurity and permutation importance calculated on the test set. Using both metrics helped reduce the risk of over-interpreting a single importance measure. All models were implemented in Python (version ≥ 3.10) and Statistica 13® software (TIBCO Software Inc., Palo Alto, CA, USA). Machine learning models were developed using the scikit-learn library, XGBoost was implemented using the official XGBoost Python package, and the Deep Neural Network was implemented using TensorFlow/Keras (Figure 2).
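Permutation importance measures how much the test-set error increases when one predictor's values are shuffled. Shuffling breaks that predictor's relationship with yield while keeping its distribution. The sketch below implements the idea in NumPy for any prediction function, scoring each predictor by the increase in mean squared error. The function name and the `n_repeats` default are illustrative. scikit-learn provides a comparable `sklearn.inspection.permutation_importance`, which by default reports the drop in the estimator's own score instead.

```python
import numpy as np

def permutation_importance_mse(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in test-set MSE when each predictor column is shuffled."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    baseline = np.mean((y - predict(X)) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffle one column to break its link with yield, keeping its values.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            increases.append(np.mean((y - predict(X_perm)) ** 2) - baseline)
        importances[j] = np.mean(increases)
    return importances

# Toy check with invented data: the "model" only uses the first column.
rng = np.random.default_rng(1)
X_demo = rng.random((200, 2))
y_demo = 3.0 * X_demo[:, 0]
imp = permutation_importance_mse(lambda X: 3.0 * X[:, 0], X_demo, y_demo)
print(imp)
```

A predictor the model ignores gets an importance of exactly zero, whereas impurity-based importance is computed from the training fit. This difference is why the two measures can rank predictors differently.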

3. Results

3.1. Yield Variability and NDVI Dynamics

Field-level dry yield of winter oilseed rape varied markedly among the studied fields, from 1.52 to 3.33 t ha⁻¹. The highest yield was recorded at Maišiagala (3.33 t ha⁻¹), followed closely by Barskunai (3.27 t ha⁻¹), whereas the lowest yields were observed at Drublionys (1.52 t ha⁻¹) and Paberžine (1.67 t ha⁻¹). NDVI values showed clear seasonal dynamics. In September 2023, NDVI ranged from 0.35 to 0.72, indicating variable early crop establishment across sites. Where available, October–November 2023 NDVI values were generally high (up to 0.85), reflecting well-developed autumn canopies at selected locations (e.g., Paberžine and Drublionys). During the winter/early spring period (February–March 2024), NDVI declined to 0.35–0.62, consistent with reduced vegetation activity. A strong increase was observed in April–May 2024, when NDVI reached peak values of 0.65–0.84 in April and 0.68–0.79 in May, indicating vigorous spring regrowth and canopy closure. Fields with the highest yields (e.g., Maišiagala and Barskunai) were generally characterised by high spring NDVI values (0.78–0.83), whereas lower-yielding fields showed more moderate or variable NDVI patterns. Overall, the data indicate substantial spatial variability in crop performance and support the usefulness of NDVI time series for monitoring winter oilseed rape development.

Pearson correlation analysis was conducted using observations with complete NDVI information (N = 7349) to describe relationships between NDVI and yield. Missing NDVI values reduced the number of usable observations for correlation analysis compared with the full modelling dataset (29,569 sampling points). Pearson correlation analysis revealed statistically significant relationships (p < 0.05) between dry yield and the multi-temporal NDVI metrics. However, the strength and direction of these associations varied substantially across the growing season. The strongest positive relationship with yield was observed for NDVI in March 2024 (r = 0.48), indicating that early spring canopy development was the most informative period for yield prediction. Weaker but still meaningful positive correlations were found for February 2024 (r = 0.26) and April 2024 NDVI (r = 0.19). By contrast, NDVI in May 2024 showed a near-negligible relationship with yield (r = 0.02), suggesting that very late-season canopy greenness adds little independent predictive value, possibly due to index saturation or reduced variability at this stage. Autumn NDVI metrics displayed weak negative associations with yield. November 2023 NDVI showed a modest inverse correlation (r = −0.26), and September 2023 NDVI followed a similar pattern (r = −0.21). October 2023 NDVI exhibited only a very weak positive relationship (r = 0.08). These mixed and generally weak autumn signals may reflect variability in establishment conditions that does not consistently translate into final yield differences.
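The complete-case restriction (N = 7349 rather than 29,569) amounts to dropping every row with a missing NDVI month before correlating. As a result, all coefficients refer to the same observations. Below is a pandas sketch with hypothetical column names; this paragraph does not state which software computed the correlations.

```python
import numpy as np
import pandas as pd

def complete_case_correlations(df, ndvi_cols, target="yield_t_ha"):
    """Pearson r between yield and each NDVI column, on complete cases only.

    Rows with a missing value in any NDVI column (or in yield) are dropped
    first, so every coefficient is computed on the same N observations.
    """
    complete = df.dropna(subset=list(ndvi_cols) + [target])
    r = complete[list(ndvi_cols)].corrwith(complete[target])  # Pearson by default
    return len(complete), r

# Invented example: the last row lacks March NDVI and is excluded everywhere.
demo = pd.DataFrame({
    "NDVI_2024-03": [0.3, 0.5, 0.7, np.nan],
    "NDVI_2024-05": [0.8, 0.7, 0.9, 0.8],
    "yield_t_ha": [1.0, 2.0, 3.0, 2.5],
})
n_complete, r_by_month = complete_case_correlations(demo, ["NDVI_2024-03", "NDVI_2024-05"])
print(n_complete, r_by_month.round(2).to_dict())
```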

3.2. Model Performance

The final dataset comprised 29,569 observations and four NDVI predictors (February–May 2024). Model performance varied substantially across validation strategies and feature sets (Table 3). Under the random 80/20 split, the highest accuracy was achieved by the tree-based ensemble models when both NDVI and Field ID were included. XGBoost and Random Forest performed best (RMSE ≈ 0.85 t ha⁻¹, R² ≈ 0.55), followed by the DNN (RMSE = 0.88 t ha⁻¹, R² = 0.52). OLS showed the lowest predictive performance (R² = 0.39). Using NDVI only led to a consistent decrease in model accuracy across all methods. Under field-wise GroupSplit validation, by contrast, predictive accuracy declined for all models. The DNN showed the best generalisation under field-wise validation (RMSE = 1.09 t ha⁻¹, R² = 0.28), followed by OLS (R² = 0.19). The tree-based models exhibited poor transferability, with higher prediction errors and near-zero R² values. Although Pearson’s r remained moderate in some cases, this did not translate into strong predictive performance, indicating limited ability to generalise beyond the training fields. Repeated GroupSplit validation confirmed these findings. Mean R² values were low and highly variable across repeated field-wise splits, and prediction errors remained persistently elevated, particularly for tree-based models. The DNN showed relatively more stable performance. The results demonstrate that model accuracy is substantially overestimated under random splits and that spatial transferability remains a major limitation. Including Field ID improved performance under random validation but had limited or inconsistent effects under GroupSplit. This suggests that field-specific information contributes primarily to within-sample fitting rather than to generalisable prediction. Feature importance analysis consistently identified spring NDVI as the dominant predictor. 
NDVI in April showed the highest importance, followed by May and March, confirming that the main predictive signal originates from the spring regrowth period.

Feature importance analysis based on the RF model consistently pointed to spring NDVI as the primary driver of yield variability. The strongest contribution came from April NDVI (NDVI 1 April 2024 to 30 April 2024), which ranked highest in impurity-based importance (0.36). May NDVI was the second-most-influential predictor (importance = 0.22), followed by March NDVI (importance = 0.18). February NDVI showed a smaller but still detectable contribution (importance = 0.12). This pattern suggests that the model is most sensitive to canopy conditions during the peak spring growth window. Categorical variables representing field identity were less prominent in the impurity-based ranking. Yield prediction was primarily driven by NDVI during the main spring development phase, especially in April and May. The spatial maps (Figure 3) reveal pronounced within-field variability in winter oilseed rape yield. In most fields, yield patterns showed clear striping related to harvesting direction and management history. The residual maps show that model errors were generally moderate but spatially structured. Clusters of positive residuals indicate zones where the model tended to underestimate yield, whereas negative residuals mark areas of overestimation. These spatially coherent residual patches suggest that part of the yield variability may be driven by factors not fully captured by the multi-temporal NDVI metrics, such as soil heterogeneity, micro-topography, or local management effects.

The CART model provided an interpretable view of the hierarchical relationships between multi-temporal NDVI and winter oilseed rape yield (Figure 4 and Figure 5). The tree structure suggests that NDVI in March 2024 was the primary splitting variable, indicating that early spring canopy condition was likely the strongest determinant of yield variability in the analysed dataset. At the root node (N = 7349), the mean predicted yield was 2.23 t ha⁻¹. The first split, at NDVI 2024-03 ≤ 0.50, clearly separated lower- from higher-yielding situations. Observations with lower March NDVI values formed a subgroup with a reduced mean yield (1.88 t ha⁻¹), whereas higher NDVI values were associated with substantially greater yield potential (2.72 t ha⁻¹). This pattern is consistent with the expectation that early canopy development sets the trajectory for subsequent yield formation. Within the low-NDVI branch, the model next split on Field ID, pointing to the presence of field-specific effects under suboptimal early-season conditions. One subgroup (Field ID = 1) showed a noticeably higher mean yield (2.17 t ha⁻¹) than the remaining fields (1.41 t ha⁻¹), which may indicate that local soil properties or management practices partly compensated for weaker early vigour. Further partitioning in this branch was driven by April NDVI. Higher values (>0.53) were associated with improved yield (2.39 t ha⁻¹) relative to lower values (1.46 t ha⁻¹), suggesting that mid-spring canopy development can partially offset earlier limitations. On the high-March-NDVI branch, NDVI in May 2024 emerged as the next influential predictor, further emphasising the importance of late-spring canopy status. Interestingly, the subgroup with NDVI 2024-05 ≤ 0.80 exhibited the highest mean yield (2.82 t ha⁻¹), whereas very high May NDVI (>0.80) corresponded to somewhat lower yield (1.86 t ha⁻¹). 
This pattern may reflect NDVI saturation effects or structural differences in dense canopies, although this interpretation should be treated with some caution. The final splits incorporated November NDVI (2023), suggesting that autumn crop establishment still contributed to yield differentiation even under otherwise favourable spring conditions. The CART analysis reinforces the central role of spring NDVI (March–May) in explaining yield variability. Field-specific effects were most evident under low early-season vigour, suggesting that site conditions matter most when crop development is constrained. The tree also hints at diminishing returns or possible index saturation at very high late-season NDVI values. At the same time, the relatively shallow tree and the consistent prominence of NDVI predictors support the biological plausibility of remotely sensed canopy metrics as indicators of winter oilseed rape yield.
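Each CART split reported above, for example NDVI 2024-03 ≤ 0.50 at the root, is the threshold that most reduces the squared error of the two child nodes. The sketch below shows that search for a single numeric predictor. A full tree repeats it over all predictors and recursively within each child node, and categorical predictors such as Field ID need a different candidate-split scheme. The names, the `min_leaf` parameter and the toy data are illustrative. Statistica's CART implementation may differ in details such as stopping rules and pruning.

```python
import numpy as np

def best_split(x, y, min_leaf=1):
    """CART-style search for the best threshold on one numeric predictor.

    Candidate thresholds are midpoints between consecutive distinct sorted
    values; observations with x <= threshold go to the left child. The chosen
    threshold minimises the summed squared error (SSE) of the two children.
    Returns (threshold, child_sse), or (None, parent_sse) if no valid split.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = ys.size
    csum, csum2 = np.cumsum(ys), np.cumsum(ys ** 2)
    best_t, best_sse = None, csum2[-1] - csum[-1] ** 2 / n    # parent SSE
    for i in range(min_leaf, n - min_leaf + 1):               # left = first i rows
        if xs[i - 1] == xs[i]:
            continue                                          # cannot split ties
        sl, sl2 = csum[i - 1], csum2[i - 1]
        sr, sr2 = csum[-1] - sl, csum2[-1] - sl2
        sse = (sl2 - sl ** 2 / i) + (sr2 - sr ** 2 / (n - i))
        if sse < best_sse:
            best_t, best_sse = (xs[i - 1] + xs[i]) / 2.0, sse
    return best_t, best_sse

# Invented March NDVI values and yields (t/ha) forming two clear groups.
t, sse = best_split([0.40, 0.45, 0.48, 0.55, 0.60, 0.70],
                    [1.8, 1.9, 1.9, 2.7, 2.8, 2.7])
print(t, sse)
```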

The CART variable importance analysis revealed a clear dominance of spring NDVI metrics in explaining winter oilseed rape yield variability (Figure 5). The highest importance was assigned to NDVI from March 2024, which reached the maximum normalised value (1.0). This strongly suggests that early spring canopy status was the single most influential predictor in the model. Substantial importance was also observed for NDVI from September and November 2023, indicating that autumn crop establishment contributed meaningfully to final yield formation. Although these effects were secondary to the spring signal, their consistent presence implies that early crop condition still leaves a detectable imprint on yield potential. Among the late-spring indicators, NDVI from April and May 2024 showed comparable, moderately high importance values (≈0.70). This pattern reinforces the central role of peak spring biomass development, which appears to act in concert with the earlier March signal rather than replacing it. The categorical variable Field ID displayed intermediate importance, suggesting that site-specific effects were present but generally less influential than the key NDVI-derived canopy metrics. By contrast, NDVI from February 2024 ranked lowest among the analysed predictors, suggesting limited sensitivity of yield to canopy conditions during late winter. The ranking consistently highlights the critical role of spring canopy vigour (March–May), with additional, though smaller, contributions from autumn establishment metrics. From a physiological perspective, this pattern is plausible, as yield formation in winter oilseed rape is tightly linked to rapid canopy expansion and biomass accumulation during the main spring growth phase.

4. Discussion

4.1. Temporal Relevance of NDVI for Winter Oilseed Rape Yield Prediction

This study showed that NDVI’s predictive value depends strongly on crop phenology. Across the analyses, the most informative period for yield prediction was the spring regrowth phase, particularly March–May. This pattern is physiologically plausible because winter oilseed rape undergoes rapid canopy expansion and biomass accumulation during early and mid-spring, and these processes strongly influence pod formation and final seed yield [1,2,31]. Earlier work has pointed to the same window, noting that spring greenness often reflects overwinter survival and the vigour of regrowth [13,32]. The clearest linear signal appeared in March. In practical terms, this suggests that canopy condition at the onset of stem elongation already provides a meaningful, though far from perfect, indication of yield potential. NDVI in February and April also showed weaker positive relationships with yield. These align with studies linking early canopy development to nitrogen uptake and the initiation of reproductive structures [33,34]. From a phenological standpoint, March marks the transition from the rosette stage to stem elongation, so its prominence in the models is biologically plausible. By contrast, late-season NDVI (May) contributed little independent linear information, even though it ranked highly in the tree-based importance metrics. This most likely reflects NDVI saturation in dense canopies [4]. Once the canopy closes, additional biomass no longer produces proportional changes in reflectance, and a decline in NDVI sensitivity near flowering has also been observed [35]. Autumn NDVI (September–November 2023) showed weak or even negative associations with yield. This somewhat counterintuitive pattern has been described before in cooler regions. Excessively vigorous autumn growth can, under some conditions, increase vulnerability to winter injury, whereas moderate development tends to favour overwinter survival [36,37]. 
The negative correlation observed in November is consistent with that interpretation, although other unmeasured factors may also be at play.

4.2. Model Performance and the Importance of Validation Strategy

Model performance depended strongly on the validation approach. Under the random 80/20 split, tree-based ensemble methods (RF and XGBoost) produced the highest accuracy. This outcome reinforces a now familiar point: vegetation index-yield relationships are rarely purely linear, and flexible machine-learning models tend to capture their structure more effectively [38,39]. The picture changed noticeably when field-wise group-split validation was applied. Once the models were forced to predict entirely unseen fields, accuracy dropped across the board. This contrast is methodologically important. It shows that random splitting can overestimate model skill when spatially related observations from the same field are present in both training and testing subsets. In this sense, one of the main contributions of the present study is not simply the comparison of models. It is the explicit demonstration that conclusions about the “best model” depend on whether spatial transfer is taken into account. Including R² alongside RMSE provided a more complete view of predictive skill, particularly in spatially independent validation. In contrast, Pearson’s r was less suitable as a primary performance metric because a model may preserve the relative ranking of observations while still producing large prediction errors and low explained variance. The repeated field-wise validation applied in this study reveals a critical limitation that is largely overlooked in NDVI-based yield prediction research. Most existing studies report high predictive performance with random train-test splits. Our results demonstrate that these estimates are substantially inflated by spatial information leakage. When evaluated using repeated GroupShuffleSplit, model performance dropped markedly, with mean R² values frequently approaching zero for widely used tree-based ensemble models. 
This indicates that these models, although effective in fitting within-field variability, fail to generalise to new spatial domains. In contrast, the DNN exhibited comparatively greater robustness across repeated spatial splits, suggesting that it captures more generalisable temporal patterns of canopy development rather than relying on field-specific signals. Importantly, the inclusion of field identifiers did not improve, and in some cases degraded, performance under repeated spatial validation. This highlights that part of the predictive skill reported in previous studies may stem from implicit encoding of site-specific effects rather than transferable biological relationships. These findings directly challenge the common assumption that high accuracy under random validation reflects true predictive capability. This study explicitly quantifies the spatial transferability gap using repeated group-aware validation. It thereby provides a more realistic and methodologically rigorous assessment of NDVI-based yield prediction and underscores the need to re-evaluate model performance standards in precision agriculture. Yield variability is driven not only by canopy greenness, but also by field-specific differences in soil properties, microtopography, moisture redistribution, management, and other local factors not fully represented by NDVI alone [40]. Tree ensembles were especially sensitive to this shift. Random Forest, in particular, showed the least transferability, consistent with its known tendency to fit spatially autocorrelated patterns closely [41]. Interestingly, the Deep Neural Network maintained comparatively better performance under spatial transfer. One possible explanation is that the network learned more general temporal signatures of canopy development rather than field-specific patterns. 
This does not necessarily mean that the DNN is universally superior, but in the present dataset it was more robust when the task shifted from within-sample prediction to prediction in previously unseen fields. This distinction addresses an important gap in the literature, as many remote-sensing-based yield studies still report performance only on random splits, even though operational applications require models to generalise beyond the training domain.
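The leakage mechanism described above can be illustrated with a small simulation. The sketch below is hypothetical: the data generator (`make_synthetic_fields`), its parameters and the Random Forest settings are assumptions chosen for illustration, not the authors' pipeline or hyperparameters. Each simulated field has its own NDVI signature and a yield offset that NDVI cannot explain (a stand-in for soil or management effects), and scikit-learn's `ShuffleSplit` and `GroupShuffleSplit` are each repeated ten times to mimic the random 80/20 and repeated field-wise schemes.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GroupShuffleSplit, ShuffleSplit


def make_synthetic_fields(n_fields=8, n_per_field=200, seed=0):
    """Hypothetical data: each field has its own NDVI profile and a yield
    offset that NDVI does not explain (stand-in for soil, topography, etc.)."""
    rng = np.random.default_rng(seed)
    field = np.repeat(np.arange(n_fields), n_per_field)
    profile = rng.uniform(0.3, 0.9, size=(n_fields, 7))      # field-level NDVI signature (7 months)
    X = profile[field] + rng.normal(0.0, 0.05, size=(field.size, 7))
    offset = rng.normal(0.0, 1.0, size=n_fields)[field]       # unexplained field effect
    y = 2.5 + 3.0 * (X[:, 4] - 0.6) + offset + rng.normal(0.0, 0.5, field.size)
    return X, y, field


def repeated_r2(X, y, groups, splitter):
    """Mean and standard deviation of test-set R² over all splits of `splitter`."""
    scores = []
    for train_idx, test_idx in splitter.split(X, y, groups):
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit(X[train_idx], y[train_idx])
        scores.append(r2_score(y[test_idx], model.predict(X[test_idx])))
    return float(np.mean(scores)), float(np.std(scores))


X, y, field = make_synthetic_fields()
random_cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=1)       # ignores field membership
group_cv = GroupShuffleSplit(n_splits=10, test_size=0.25, random_state=1)  # holds out whole fields

r2_random = repeated_r2(X, y, field, random_cv)
r2_group = repeated_r2(X, y, field, group_cv)
print(f"random 80/20: R² = {r2_random[0]:.2f} ± {r2_random[1]:.2f}")
print(f"field-wise:   R² = {r2_group[0]:.2f} ± {r2_group[1]:.2f}")
```

Under random splitting, points from every field appear in training, so the forest can recognise a field from its NDVI signature and recall its offset; under field-wise splitting that shortcut disappears, and the spread of R² across repeats shows how strongly the estimate depends on which fields happen to be held out.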
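The point about Pearson's r can be made concrete with a worked example. The yields below are hypothetical; the predictions are a perfectly ranked but biased and compressed linear function of the observations, so r equals 1 while R² is negative.

```python
import numpy as np


def pearson_r(obs, pred):
    """Linear correlation: insensitive to bias and scale of the predictions."""
    return float(np.corrcoef(obs, pred)[0, 1])


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot around the 1:1 line."""
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - np.mean(obs)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(obs, pred):
    return float(np.sqrt(np.mean((obs - pred) ** 2)))


observed = np.array([1.5, 2.0, 2.5, 3.0, 3.5])   # hypothetical yields, t/ha
predicted = 0.5 * observed + 2.0                  # same ranking, but biased and compressed

print(pearson_r(observed, predicted), r2(observed, predicted), rmse(observed, predicted))
# r = 1.00, R² = -0.375, RMSE ≈ 0.83 t/ha
```

Because R² is computed from squared errors around the 1:1 line, it penalises the bias and scale errors that r ignores.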

The residual maps (Figure 3) were perhaps among the more revealing outputs. Rather than appearing random, prediction errors formed coherent spatial patches within fields. This pattern strongly suggests that yield variability is driven by factors not fully captured by canopy greenness, such as soil heterogeneity (e.g., texture, organic matter, water-holding capacity), microtopographic controls on moisture redistribution, localised pest or disease pressure, and small-scale management differences. NDVI time series are highly informative but not exhaustive. Incorporating ancillary layers, such as soil maps, crop rotation data, or other structural indicators, would likely help reduce the spatial bias observed in the current models.

Both RF and CART consistently placed spring NDVI at the top of the predictor hierarchy. In the CART models, March NDVI frequently appeared as the first splitting variable, effectively separating lower- from higher-yield cases. Subsequent splits involving April and May NDVI suggest that mid-spring canopy vigour adds incremental information rather than replacing the early signal.
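One way to quantify whether residuals form coherent patches, rather than judging the maps visually, is a spatial autocorrelation statistic such as global Moran's I. This statistic was not reported in the study; the sketch below is an illustrative addition that assumes planar point coordinates and binary distance-band weights (neighbours within `max_dist`). Values near zero indicate spatially random residuals, and clearly positive values indicate clustering.

```python
import numpy as np


def morans_i(values, coords, max_dist):
    """Global Moran's I with binary distance-band weights:
    w_ij = 1 if 0 < distance(i, j) <= max_dist, else 0."""
    z = values - values.mean()
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    w = ((dist > 0) & (dist <= max_dist)).astype(float)
    return float(len(values) / w.sum() * (z @ w @ z) / (z @ z))


# Hypothetical 20 x 20 grid of yield-monitor points with unit spacing
xx, yy = np.meshgrid(np.arange(20), np.arange(20))
coords = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)

rng = np.random.default_rng(0)
patchy = np.sin(coords[:, 0] / 4) + np.cos(coords[:, 1] / 4)  # coherent error patches
noise = rng.normal(size=len(coords))                            # spatially random errors

patchy_i = morans_i(patchy, coords, max_dist=1.5)  # 1.5 on a unit grid = 8 nearest neighbours
random_i = morans_i(noise, coords, max_dist=1.5)
print(f"patchy residuals: I = {patchy_i:.2f}; random residuals: I = {random_i:.2f}")
```

The dense distance matrix is fine for a few hundred points but not for full yield-monitor fields with several thousand points; for those, a sparse neighbour structure (for example from the PySAL `esda` package) and permutation-based significance testing would be more practical.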
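The root split of a CART model and its impurity-based importances can be read directly from a fitted scikit-learn tree. The sketch below uses hypothetical data in which March NDVI is deliberately given the largest effect, with smaller April and May contributions; the month labels mirror Table 2, but the values and tree settings (`max_depth=3`, `min_samples_leaf=50`) are assumptions, not those behind Figures 4 and 5.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

months = ["Sep", "Oct", "Nov", "Feb", "Mar", "Apr", "May"]  # column order of the features

rng = np.random.default_rng(42)
X = rng.uniform(0.3, 0.9, size=(1000, len(months)))
# Hypothetical yield: dominated by March NDVI, smaller April and May effects
y = 1.0 + 4.0 * X[:, 4] + 1.0 * X[:, 5] + 0.5 * X[:, 6] + rng.normal(0.0, 0.3, 1000)

tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=50, random_state=0).fit(X, y)

# Node 0 is the root; tree_.feature / tree_.threshold hold its split variable and cut-off
root = tree.tree_.feature[0]
print("root split:", months[root], "at NDVI", round(float(tree.tree_.threshold[0]), 3))

# Impurity-based importances, largest first
for name, imp in sorted(zip(months, tree.feature_importances_), key=lambda t: -t[1]):
    print(f"{name}: {imp:.2f}")
```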

One detail worth noting is the apparent plateau and occasional decline in yield at very high May NDVI values (>0.80). This pattern may point to NDVI saturation, but structural changes in the canopy could also play a role, for example, a higher proportion of stem biomass relative to reproductive tissue [42]. Without complementary structural data, it is difficult to fully disentangle these mechanisms. Although autumn NDVI ranked lower overall, it still appeared in both RF and CART importance outputs. This hints that early crop establishment leaves a detectable, if modest, imprint on final yield potential.
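A simple check for this kind of plateau is to compare mean yield across NDVI bins and see how much it changes from one bin to the next. The response curve below is hypothetical (linear up to NDVI 0.80, slightly declining above it), and the helper `binned_means` is illustrative; with real data, the bin edges and the minimum number of points per bin would need care. Such a check only reveals the shape of the relationship and cannot by itself separate saturation from canopy-structure effects.

```python
import numpy as np


def binned_means(ndvi, yld, edges):
    """Mean yield within each [lo, hi) NDVI bin (NaN if a bin is empty)."""
    means = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (ndvi >= lo) & (ndvi < hi)
        means.append(yld[mask].mean() if mask.any() else np.nan)
    return np.array(means)


rng = np.random.default_rng(1)
may_ndvi = rng.uniform(0.5, 0.9, 4000)
# Hypothetical response: linear up to NDVI 0.80, slight decline above it
may_yield = np.where(may_ndvi <= 0.8, 1.0 + 3.0 * may_ndvi, 3.4 - 2.0 * (may_ndvi - 0.8))
may_yield = may_yield + rng.normal(0.0, 0.3, may_ndvi.size)

edges = np.array([0.5, 0.6, 0.7, 0.8, 0.9])
means = binned_means(may_ndvi, may_yield, edges)
diffs = np.diff(means)  # change in mean yield between consecutive bins
print(np.round(means, 2), np.round(diffs, 2))
```

A steady positive step between lower bins followed by a much smaller (or negative) step above 0.80 is the numerical signature of the plateau.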

4.3. Comparison with Previous Studies

Previous work has suggested that NDVI-driven machine-learning tools can assist in fine-tuning fertilisation decisions in oilseed rape systems [13,43]. In the study by Sulik and Long [44], conducted in North America, the strongest relationships between NDVI and yield were observed during mid-season growth stages, particularly around peak canopy development and the flowering period. Correlations were weaker in the early growth and late senescence stages, and during peak flowering NDVI performance was further reduced because yellow flowers distort the spectral signal. In the study by Domínguez et al. [45], the strongest correlation between NDVI and winter oilseed rape seed yield occurred after flowering, at the beginning of pod development; that study also found weaker correlations during flowering. A similarly weak correlation between UAV-derived NDVI during flowering and rapeseed seed yield was reported by Lukas et al. in a study conducted in the Czech Republic [10]. In the study by Zamani-Noor and Feistkorn conducted in Germany [46], the strongest correlations between UAV-derived NDVI and final winter oilseed rape seed yield were found before winter, at the full-leaf development stage (beginning of November), and during pod development (beginning of June). In a study conducted in Canada [47] on the relationships between satellite-derived NDVI and rapeseed seed yield, the strongest correlation was observed in later growth stages. The relationships were evaluated across several agro-climatic zones; the correlation was strongest in the semi-arid zone and weakest in the arid zone.

Rapeseed yield prediction during early growth stages can support precise nitrogen fertilisation decisions. Wen et al. [33] demonstrated that a random forest regression (RFR) model using NDVI and other predictors can accurately predict yield and help optimise nitrogen fertilisation. In the study by Hu et al. [43], random forest regression performed best among the machine learning methods tested for rapeseed yield prediction from UAV-derived spectral indices, achieving the highest coefficient of determination and the lowest error metrics compared with multiple linear regression and support vector machine models. This indicates that ensemble tree-based approaches, such as random forests, can capture complex relationships between spectral data and rapeseed yield more effectively than simpler regression techniques in this context. Machine learning algorithms also allow the most important predictors, such as spectral indices, to be selected and enable accurate prediction of rapeseed seed yield across different spatial extents, from the small-plot scale (e.g., selection in breeding programmes) through whole crop fields up to the regional level [48,49,50].

4.4. Practical Implications, Limitations, and Future Work

Several practical points emerge from the analysis. First, the March–April window appears to be the most reliable period for operational yield forecasting in winter oilseed rape. Earlier NDVI signals are weaker and more context-dependent, whereas later NDVI increasingly approaches saturation. Second, model selection matters. Tree ensembles perform well when training and test data come from the same fields, but their performance degrades when they are applied to new fields. Deep Neural Networks showed somewhat better spatial transfer here, although the advantage is not yet definitive. Third, and perhaps most importantly, NDVI alone has clear structural limits; integrating additional vegetation indices or weather variables would likely improve model stability and interpretability. Finally, the results reinforce the value of spatial validation, because random splits alone can give an overly optimistic view of model skill. From an agronomic standpoint, reasonably accurate early-season forecasts could support adaptive nitrogen management.

At the same time, several limitations must be acknowledged. The study is based on a single growing season, and the stability of the identified NDVI-yield relationships under different weather conditions remains uncertain. In addition, only NDVI and field identifiers were used as predictors, leaving important structural and environmental drivers of yield unrepresented. The DNN results also need to be interpreted cautiously: although the dataset included more than 29,000 observations, these originated from only eight fields and one growing season, so the number of independent spatial units available for assessing transfer was small. Future work should therefore focus on three directions. First, multi-year and multi-region datasets are needed to test the temporal and geographic robustness of the identified relationships. Second, model performance could likely be improved by integrating additional predictors, such as soil properties, topography, or weather variables. Third, spatially explicit modelling frameworks and spatial cross-validation should be further explored, as they may be more appropriate for operational yield forecasting than conventional random validation alone.
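Leave-one-field-out cross-validation is one spatial cross-validation scheme suited to a dataset with only eight fields: each field is predicted by a model trained on the remaining seven, and per-field errors show which fields transfer poorly. This is an alternative to the repeated GroupShuffleSplit used in the study, shown here as a sketch with hypothetical data and an OLS model (scikit-learn's `LinearRegression`) for speed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict

rng = np.random.default_rng(7)
# Hypothetical stand-in data: fields 1-8, 150 points each, 7 monthly NDVI features
field = np.repeat(np.arange(1, 9), 150)
field_shift = rng.normal(0.0, 0.05, size=(8, 7))[field - 1]      # field-level NDVI differences
X = rng.uniform(0.3, 0.9, size=(field.size, 7)) + field_shift
field_effect = rng.normal(0.0, 0.6, size=8)[field - 1]           # yield effect not seen in NDVI
y = 1.0 + 3.0 * X[:, 4] + field_effect + rng.normal(0.0, 0.4, field.size)

logo = LeaveOneGroupOut()
# Every point is predicted by a model that never saw its own field
pred = cross_val_predict(LinearRegression(), X, y, groups=field, cv=logo)

per_field_rmse = {
    int(f): float(np.sqrt(np.mean((y[field == f] - pred[field == f]) ** 2)))
    for f in np.unique(field)
}
for f, err in per_field_rmse.items():
    print(f"field {f}: RMSE = {err:.2f} t/ha")
```

Reporting the per-field errors alongside their mean makes it visible whether poor transfer is general or driven by one or two atypical fields.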

5. Conclusions

Multi-temporal NDVI proved to be a powerful yet interpretable basis for predicting winter oilseed rape yield. The analysis consistently indicated that NDVI acquired during the critical spring window (March–May) carried the strongest predictive signal for final yield, highlighting the importance of early canopy recovery and biomass accumulation during this phenological phase. Among the evaluated approaches, tree-based ensemble models (RF and XGBoost) achieved the highest accuracy under random 80/20 validation, whereas the DNN demonstrated more reliable spatial transfer to previously unseen fields. These results emphasise that model ranking may depend strongly on the validation strategy used and underline the importance of spatially independent validation when assessing remote-sensing-based yield models. At the same time, the results highlight that overfitting remains an important limitation of ensemble models when applied across heterogeneous field environments. Taken together, the findings suggest that integrating multi-temporal vegetation metrics with more diverse remote-sensing inputs and temporally aware modelling frameworks represents a promising pathway toward scalable, operational yield forecasting systems.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agronomy16070763/s1, Table S1: Yield and NDVI rapeseed 2024.

Author Contributions

Conceptualisation, E.O. and E.W.-G.; methodology, E.O. and E.W.-G.; software, E.O.; validation, E.O. and D.G.; formal analysis, E.O., D.G. and E.W.-G.; investigation, E.O. and E.W.-G.; resources, A.J. and D.G.; data curation, D.G.; writing—original draft preparation, E.O.; writing—review and editing, E.O., A.J., D.G. and E.W.-G.; visualisation, E.O. and E.W.-G.; supervision, E.W.-G.; project administration, E.W.-G.; funding acquisition, E.W.-G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article/Supplementary Material. Further inquiries can be directed to the corresponding author.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT (OpenAI, GPT-5.2) to assist with code refinement and language editing. The authors carefully reviewed and edited all outputs and take full responsibility for the content of this publication.

Conflicts of Interest

Author Edyta Okupska was employed by the company Seed and Agricultural Farm, “Bovinas” Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

Rathke, G.W.; Behrens, T.; Diepenbrock, W. Integrated nitrogen management strategies to improve seed yield, oil content and nitrogen efficiency of winter oilseed rape (Brassica napus L.): A review. Agric. Ecosyst. Environ. 2006, 117, 80–108.
Berry, P.M.; Spink, J.; Foulkes, M.J.; White, P.J. The physiological basis of genotypic differences in nitrogen use efficiency in oilseed rape (Brassica napus L.). Field Crops Res. 2010, 119, 365–373.
Getahun, S.; Kefale, H.; Gelaye, Y. Application of precision agriculture technologies for sustainable crop production and environmental sustainability: A systematic review. Sci. World J. 2024, 2024, 2126734.
Tucker, C.J. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens. Environ. 1979, 8, 127–150.
Xue, J.; Su, B. Significant remote sensing vegetation indices: A review of developments and applications. J. Sens. 2017, 2017, 1353691.
Jiang, Z.; Huete, A.R.; Didan, K.; Miura, T. Development of a two-band enhanced vegetation index without a blue band. Remote Sens. Environ. 2008, 112, 3833–3845.
Bolton, D.K.; Friedl, M.A. Forecasting crop yield using remotely sensed vegetation indices and crop phenology metrics. Agric. For. Meteorol. 2013, 173, 74–84.
Becker-Reshef, I.; Justice, C.; Whitcraft, A.K.; Jarvis, I. Geoglam: A Geo Initiative on Global Agricultural Monitoring. In Proceedings of the IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, Spain, 22–27 July 2018; pp. 8155–8157.
Wei, C.; Huang, J.; Mansaray, L.R.; Li, Z.; Liu, W.; Han, J. Estimation and Mapping of Winter Oilseed Rape LAI from High Spatial Resolution Satellite Data Based on a Hybrid Method. Remote Sens. 2017, 9, 488.
Lukas, V.; Huňady, I.; Kintl, A.; Mezera, J.; Hammerschmiedt, T.; Sobotková, J.; Brtnický, M.; Elbl, J. Using UAV to Identify the Optimal Vegetation Index for Yield Prediction of Oil Seed Rape (Brassica napus L.) at the Flowering Stage. Remote Sens. 2022, 14, 4953.
Ishaq, R.A.F.; Zhou, G.; Jing, G.; Shah, S.R.A.; Ali, A.; Imran, M.; Jiang, H.; Obaid-ur-Rehman. Geospatial Robust Wheat Yield Prediction Using Machine Learning and Integrated Crop Growth Model and Time-Series Satellite Data. Remote Sens. 2025, 17, 1140.
Fan, H.; Liu, S.; Li, J.; Li, L.; Dang, L.; Ren, T.; Lu, J. Early prediction of the seed yield in winter oilseed rape based on the near-infrared reflectance of vegetation (NIRv). Comput. Electron. Agric. 2021, 186, 106166.
Zhu, H.; Lin, C.; Dong, Z.; Xu, J.-L.; He, Y. Early Yield Prediction of Oilseed Rape Using UAV-Based Hyperspectral Imaging Combined with Machine Learning Algorithms. Agriculture 2025, 15, 1100.
Hussain, S.; Gao, K.; Din, M.; Gao, Y.; Shi, Z.; Wang, S. Assessment of UAV-Onboard Multispectral Sensor for Non-Destructive Site-Specific Rapeseed Crop Phenotype Variable at Different Phenological Stages and Resolutions. Remote Sens. 2020, 12, 397.
Jełowicki, Ł.; Sosnowicz, K.; Ostrowski, W.; Osińska-Skotak, K.; Bakuła, K. Evaluation of Rapeseed Winter Crop Damage Using UAV-Based Multispectral Imagery. Remote Sens. 2020, 12, 2618.
Sun, Y.; Hao, Z.; Chang, H.; Yang, J.; Ding, G.; Guo, Z.; He, X.; Huang, J. Accurate mapping of rapeseed fields in the initial flowering stage using Sentinel-2 satellite images and convolutional neural networks. Ecol. Indic. 2024, 162, 112027.
do Nascimento Bendini, H.; Fieuzal, R.; Carrere, P.; Clenet, H.; Galvani, A.; Allies, A.; Ceschia, É. Estimating Winter Cover Crop Biomass in France Using Optical Sentinel-2 Dense Image Time Series and Machine Learning. Remote Sens. 2024, 16, 834.
Fieuzal, R.; Baup, F.; Marais-Sicre, C. Monitoring wheat and rapeseed by using synchronous optical and radar satellite data: From temporal signatures to crop parameters estimation. EARSeL Adv. Remote Sens. 2013, 2, 162–180.
Tovpyha, M. Features of growing winter rapeseed in abnormally warm winters. Ukr. Black Sea Reg. Agrar. Sci. 2025, 29, 94–106.
Li, N.; Hou, Z.; Jiang, H.; Chen, C.; Yang, C.; Sun, Y.; Yang, L.; Zhou, T.; Chu, J.; Fan, Q.; et al. Rapeseed Yield Estimation Using UAV-LiDAR and an Improved 3D Reconstruction Method. Agriculture 2025, 15, 2265.
Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: New York, NY, USA, 2009.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2016; pp. 785–794.
LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
Wang, L.a.; Zhou, X.; Zhu, X.; Dong, Z.; Guo, W. Estimation of biomass in wheat using random forest regression algorithm and remote sensing data. Crop J. 2016, 4, 212–219.
Shahhosseini, M.; Hu, G.; Archontoulis, S.V. Forecasting corn yield with machine learning ensembles. Front. Plant Sci. 2020, 11, 1120.
Zhu, H.; Liang, S.; Lin, C.; He, Y.; Xu, J.-L. Using multi-sensor data fusion techniques and machine learning algorithms for improving UAV-based yield prediction of oilseed rape. Drones 2024, 8, 642.
You, J.; Li, X.; Low, M.; Lobell, D.; Ermon, S. Deep gaussian process for crop yield prediction based on remote sensing data. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017.
Filippi, P.; Han, S.Y.; Bishop, T.F. On crop yield modelling, predicting, and forecasting and addressing the common issues in published studies. Precis. Agric. 2025, 26, 8.
Diepenbrock, W. Yield analysis of winter oilseed rape (Brassica napus L.): A review. Field Crops Res. 2000, 67, 35–49.
Ma, Y.; Fang, S.; Peng, Y.; Gong, Y.; Wang, D. Remote Estimation of Biomass in Winter Oilseed Rape (Brassica napus L.) Using Canopy Hyperspectral Data at Different Growth Stages. Appl. Sci. 2019, 9, 545.
Habekotté, B. A model of the phenological development of winter oilseed rape (Brassica napus L.). Field Crops Res. 1997, 54, 127–136.
Liu, Y.; Liu, S.; Li, J.; Guo, X.; Wang, S.; Lu, J. Estimating biomass of winter oilseed rape using vegetation indices and texture metrics derived from UAV multispectral images. Comput. Electron. Agric. 2019, 166, 105026.
Han, J.; Zhang, Z.; Cao, J. Developing a new method to identify flowering dynamics of rapeseed using landsat 8 and sentinel-1/2. Remote Sens. 2020, 13, 105.
Rife, C.; Zeinali, H. Cold tolerance in oilseed rape over varying acclimation durations. Crop Sci. 2003, 43, 96–100.
Balodis, O.; Gaile, Z. Changes of winter oilseed rape plant survival during vegetation. Rural Sustain. Res. 2015, 33, 35–45.
Liu, Y.; Sun, L.; Liu, B.; Wu, Y.; Ma, J.; Zhang, W.; Wang, B.; Chen, Z. Estimation of winter wheat yield using multiple temporal vegetation indices derived from UAV-based multispectral and hyperspectral imagery. Remote Sens. 2023, 15, 4800.
Pignède, E.; Roudier, P.; Diedhiou, A.; N’Guessan Bi, V.H.; Kobea, A.T.; Konaté, D.; Péné, C.B. Sugarcane yield forecast in Ivory Coast (West Africa) based on weather and vegetation index data. Atmosphere 2021, 12, 1459.
Verhulst, N.; Govaerts, B.; Sayre, K.D.; Deckers, J.; François, I.M.; Dendooven, L. Using NDVI and soil quality analysis to assess influence of agronomic management on within-plot spatial variability and factors limiting production. Plant Soil 2009, 317, 41–59.
Tepe, E. A random forests-based hedonic price model accounting for spatial autocorrelation. J. Geogr. Syst. 2024, 26, 511–540.
Soudani, K.; Hmimina, G.; Delpierre, N.; Pontailler, J.-Y.; Aubinet, M.; Bonal, D.; Caquet, B.; De Grandcourt, A.; Burban, B.; Flechard, C. Ground-based Network of NDVI measurements for tracking temporal dynamics of canopy structure and vegetation phenology in different biomes. Remote Sens. Environ. 2012, 123, 234–245.
Hu, H.; Ren, Y.; Zhou, H.; Lou, W.; Hao, P.; Lin, B.; Zhang, G.; Gu, Q.; Hua, S. Oilseed Rape Yield Prediction from UAVs Using Vegetation Index and Machine Learning: A Case Study in East China. Agriculture 2024, 14, 1317.
Sulik, J.J.; Long, D.S. Spectral considerations for modeling yield of canola. Remote Sens. Environ. 2016, 184, 161–174.
Domínguez, J.; Kumhálová, J.; Novák, P. Assessment of the relationship between spectral indices from satellite remote sensing and winter oilseed rape yield. Agron. Res. 2017, 15, 55–68.
Zamani-Noor, N.; Feistkorn, D. Monitoring Growth Status of Winter Oilseed Rape by NDVI and NDYI Derived from UAV-Based Red–Green–Blue Imagery. Agronomy 2022, 12, 2212.
Mkhabela, M.S.; Bullock, P.; Raj, S.; Wang, S.; Yang, Y. Crop yield forecasting on the Canadian Prairies using MODIS NDVI data. Agric. For. Meteorol. 2011, 151, 385–393.
Shahsavari, M.; Mohammadi, V.; Alizadeh, B.; Alizadeh, H. Application of machine learning algorithms and feature selection in rapeseed (Brassica napus L.) breeding for seed yield. Plant Methods 2023, 19, 57.
Fernando, H.; Ha, T.; Nketia, K.A.; Attanayake, A.; Shirtliffe, S. Machine learning approach for satellite-based subfield canola yield prediction using floral phenology metrics and soil parameters. Precis. Agric. 2024, 25, 1386–1403.
Nguyen, L.H.; Robinson, S.; Galpern, P. Medium-resolution multispectral satellite imagery in precision agriculture: Mapping precision canola (Brassica napus L.) yield using Sentinel-2 time series. Precis. Agric. 2022, 23, 1051–1071.
Sadenova, M.; Beisekenov, N.; Varbanov, P.S.; Pan, T. Application of Machine Learning and Neural Networks to Predict the Yield of Cereals, Legumes, Oilseeds and Forage Crops in Kazakhstan. Agriculture 2023, 13, 1195.

Figure 1. Location of the crop fields (F1–F8) with winter oilseed rape included in the study.

Figure 2. The workflow for winter oilseed rape yield prediction using multi-temporal Sentinel-2 NDVI and machine learning models.

Figure 3. Spatial distribution of winter oilseed rape yield and model residuals across the analysed fields. The left column shows the observed dry yield (t ha⁻¹) derived from yield monitor data, whereas the right column presents the residuals calculated as the difference between observed and predicted yield (Observed-Predicted). Positive residuals indicate model underestimation, while negative residuals indicate model overestimation.

Figure 4. CART regression tree for rapeseed seed yield based on monthly NDVI. Red squares indicate terminal (leaf) nodes.

Figure 5. Variable importance of predictors in the CART model for rapeseed seed yield.

Table 1. Location of the crop fields and seed dry yield of winter oilseed rape in 2024.

Field ID	Location	Longitude, Latitude	Area (ha)	Sampling Points (n)	Seed Dry Yield (t/ha), Mean (Min–Max)
1	Valiuliai	23.12 E, 54.97 N	62.5	7469	2.54 (0.31–8.29)
2	Drublionys	24.94 E, 55.10 N	11.6	1302	1.52 (0.32–4.86)
3	Barskunai	25.10 E, 54.93 N	17.6	2909	3.27 (0.31–8.97)
4	Maišiagala	25.08 E, 54.88 N	17.5	2109	3.33 (0.32–9.95)
5	Paberžine	25.18 E, 54.95 N	31.7	3073	1.67 (0.31–7.49)
6	Barskunai	25.11 E, 54.93 N	60.8	8694	2.74 (0.31–8.72)
7	Barskunai	25.10 E, 54.94 N	15.1	2395	2.79 (0.32–9.15)
8	Gudaičio	23.09 E, 54.97 N	6.9	1618	2.52 (0.33–5.58)
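The figure of more than 29,000 observations quoted in the limitations can be checked against Table 1. The sketch below sums the sampling points and computes a point-weighted mean yield; it assumes that the Table 1 counts and means describe the same set of observations, which the table does not state explicitly (for example, whether outliers had already been removed). It also shows that fields 1 and 6 together contribute more than half of all points, which matters when individual observations, rather than whole fields, are sampled at random.

```python
# (sampling points, mean seed dry yield in t/ha) per field, transcribed from Table 1
table1 = {
    1: (7469, 2.54), 2: (1302, 1.52), 3: (2909, 3.27), 4: (2109, 3.33),
    5: (3073, 1.67), 6: (8694, 2.74), 7: (2395, 2.79), 8: (1618, 2.52),
}

total_points = sum(n for n, _ in table1.values())
# Point-weighted mean: larger fields count more, as they would in a pooled dataset
weighted_mean = sum(n * mean for n, mean in table1.values()) / total_points
# Share of all points contributed by the two largest fields (1 and 6)
share_1_6 = (table1[1][0] + table1[6][0]) / total_points

print(total_points)             # 29569
print(round(weighted_mean, 2))  # 2.61
print(round(share_1_6, 3))      # 0.547
```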

Table 2. Mean monthly NDVI values for winter oilseed rape during the 2023–2024 growing season (– = no value reported).

Field ID	Sep 2023	Oct 2023	Nov 2023	Feb 2024	Mar 2024	Apr 2024	May 2024
1	0.37	0.76	0.28	0.57	0.49	0.65	0.68
2	0.58	0.79	0.65	0.48	0.35	–	0.72
3	0.70	–	–	0.50	0.43	0.82	0.79
4	0.43	–	–	0.45	0.43	0.83	0.78
5	0.72	0.85	0.68	0.62	0.47	0.79	0.78
6	0.69	–	–	0.55	0.47	0.84	0.75
7	0.70	–	–	0.52	0.44	0.81	0.79
8	0.35	0.81	0.26	–	0.56	0.73	0.73

Table 3. Predictive performance of yield models based on multi-temporal NDVI under random and field-wise validation schemes.

Split	Model	RMSE (t/ha)	R²
GroupSplit by Field ID	DNN	1.09	0.28
	OLS	1.16	0.19
	RF	1.44	0.05
	XGBoost	1.32	0.10
Random 80/20	DNN	0.88	0.52
	OLS	0.99	0.39
	RF	0.85	0.55
	XGBoost	0.85	0.55
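The transferability gap in Table 3 can be expressed as the drop in R² and the relative increase in RMSE when moving from the random 80/20 split to the field-wise split. The arithmetic below uses only the values reported in Table 3.

```python
# RMSE in t/ha and R², transcribed from Table 3
table3 = {
    # model: (RMSE random, R² random, RMSE field-wise, R² field-wise)
    "DNN": (0.88, 0.52, 1.09, 0.28),
    "OLS": (0.99, 0.39, 1.16, 0.19),
    "RF": (0.85, 0.55, 1.44, 0.05),
    "XGBoost": (0.85, 0.55, 1.32, 0.10),
}

gaps = {}
for model, (rmse_rand, r2_rand, rmse_field, r2_field) in table3.items():
    r2_drop = r2_rand - r2_field                                  # loss of explained variance
    rmse_rise_pct = 100.0 * (rmse_field - rmse_rand) / rmse_rand  # relative error increase
    gaps[model] = (r2_drop, rmse_rise_pct)
    print(f"{model:8s} R² drop = {r2_drop:.2f}, RMSE increase = {rmse_rise_pct:.0f}%")
```

By this measure, RF loses about 0.50 in R² and its RMSE rises by roughly 69%, compared with about 0.24 and 24% for the DNN.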

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Okupska, E.; Juostas, A.; Gozdowski, D.; Wójcik-Gront, E. Yield Prediction in Winter Oilseed Rape Based on Multi-Temporal NDVI and Modelling Approaches. Agronomy 2026, 16, 763. https://doi.org/10.3390/agronomy16070763

AMA Style

Okupska E, Juostas A, Gozdowski D, Wójcik-Gront E. Yield Prediction in Winter Oilseed Rape Based on Multi-Temporal NDVI and Modelling Approaches. Agronomy. 2026; 16(7):763. https://doi.org/10.3390/agronomy16070763

Chicago/Turabian Style

Okupska, Edyta, Antanas Juostas, Dariusz Gozdowski, and Elżbieta Wójcik-Gront. 2026. "Yield Prediction in Winter Oilseed Rape Based on Multi-Temporal NDVI and Modelling Approaches" Agronomy 16, no. 7: 763. https://doi.org/10.3390/agronomy16070763

APA Style

Okupska, E., Juostas, A., Gozdowski, D., & Wójcik-Gront, E. (2026). Yield Prediction in Winter Oilseed Rape Based on Multi-Temporal NDVI and Modelling Approaches. Agronomy, 16(7), 763. https://doi.org/10.3390/agronomy16070763

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
