High-Resolution Wheat and Barley Yield Forecasting Using Multi-Temporal Satellite Time Series and Machine Learning

Arizo-García, Patricia; Castiñeira-Ibáñez, Sergio; Cruzado-Campos, Enric; San Bautista, Alberto; Rubio, Constanza

doi:10.3390/agriculture16050516

Open Access Article


by Patricia Arizo-García^{1,2}, Sergio Castiñeira-Ibáñez^{2,3,*}, Enric Cruzado-Campos^{1}, Alberto San Bautista^{1,4} and Constanza Rubio^{2,3}

¹ Centro de Investigación del Regadío y Agrosistemas Mediterráneos, Universitat Politècnica de València, Camí de Vera s/n, 46022 Valencia, Spain

² Departamento de Física Aplicada, Universitat Politècnica de València, Camí de Vera s/n, 46022 Valencia, Spain

³ Centro de Tecnologías Físicas, Universitat Politècnica de València, Camí de Vera s/n, 46022 Valencia, Spain

⁴ Departamento de Producción Vegetal, Universitat Politècnica de València, Camí de Vera s/n, 46022 Valencia, Spain

* Author to whom correspondence should be addressed.

Agriculture 2026, 16(5), 516; https://doi.org/10.3390/agriculture16050516

Submission received: 30 January 2026 / Revised: 20 February 2026 / Accepted: 24 February 2026 / Published: 26 February 2026

(This article belongs to the Special Issue Artificial Intelligence in Precision Agriculture: Applications in Crop Management)


Abstract

High-resolution yield forecasting is essential for advancing precision agriculture and improving the sustainability of wheat and barley production. While most previous studies focus on field-scale predictions, pixel-level approaches are needed to capture intra-field variability and support site-specific management. This paper evaluates the performance of machine learning models for 10 m resolution yield prediction using multi-temporal Sentinel-2 surface reflectance data across seven major cereal-producing regions in Spain. Yield monitor data from winter wheat and barley fields collected over five growing seasons (2020–2024) were combined with spectral bands and vegetation indices. Random Forest (RF) and XGBoost (XGB) models were trained at five phenological stages expressed as days before harvest (DBH) and validated using both internal (2020–2023) and independent external (2024) datasets. Model accuracy increased as harvest approached. In external validation, RF achieved the best performance for wheat (R² = 0.77; RMSE ≈ 697 kg · ha⁻¹), while XGB performed best for barley (R² = 0.86; RMSE ≈ 744 kg · ha⁻¹). Visible, red-edge, and SWIR bands were the most informative predictors, especially during grain filling and senescence. Results demonstrate the potential of multi-temporal Sentinel-2 data and machine learning for accurate, transferable, pixel-level yield forecasting in Mediterranean cereal systems.

Keywords:

cereals; prediction models; machine learning; digital farming

1. Introduction

Cereals are the cornerstone of global food systems, providing a significant proportion of the calories consumed by the population and acting as key regulators of the stability of international agricultural markets [1]. Moreover, cereal production is susceptible to climate variability, since productivity is largely determined by the crop's tolerance to stress during sensitive phenological stages (tillering, stem elongation, and grain filling). Climate change increases both the frequency and intensity of abiotic stress conditions and raises average temperatures. This exacerbates yield variability and uncertainty through direct effects on crop growth dynamics, biomass accumulation, and yield formation, and it places additional pressure on food security and agricultural resilience [2]. Consequently, the development of robust, field-relevant, and scalable yield forecasting approaches has become a key agronomic research priority. These tools support crop management decisions, risk assessment, and yield gap analysis at the farm and agricultural-system levels. They also serve as a basis for agricultural policy planning, insurance schemes, and market stabilization strategies at the regional and global levels [3,4].

In response to the growing need for reliable yield forecasting amid increasing climatic variability, productivity can be forecast in two ways: with process-based models that incorporate agronomic parameters, or with linear or non-linear statistical models derived from empirical relationships. Both are supported by environmental and crop production variables. Crop models simulate crop growth, development, and resource use in response to environmental and management conditions, providing a strong ecological and physiological basis for yield analysis at the field scale. However, their need for highly detailed input data and site-specific calibration limits their ability to produce robust forecasts when scaled to regional or global levels [5]. Statistical models, in contrast, are based on empirical relationships between yield and a limited set of explanatory variables. This facilitates their implementation but does not improve their scalability across regions and seasons. Statistical approaches can therefore provide robust and computationally efficient yield predictions, particularly when detailed inter-field information is unavailable. However, they often struggle to account for environmental uncertainty and for the complex interactions between climate drivers and crop behavior [5,6]. Among statistical modeling methods, simple or multiple linear regression (MLR) has frequently been applied to crop yield forecasting because of its simplicity and interpretability. A critical limitation of these empirical models, however, is the assumption of linearity commonly made when predicting final crop yields.

Nevertheless, final biological productivity is determined by the interaction among multiple environmental, physiological, and crop management variables, which often affect crop performance as an n-dimensional yield-agronomic response. Linear approaches may therefore be insufficient to fully capture the complexity of yield performance under variable climatic and agronomic conditions. For this reason, machine learning (ML) algorithms have become increasingly common in precision agriculture (PA). These methods can:

- model non-linear relationships;
- handle complex interactions among predictors;
- improve robustness to noise through regularization and ensemble strategies;
- gain predictive performance as the size and variability of the training dataset grow [7].

Random Forest (RF), Support Vector Machines (SVM), and Gradient Boosting (GB) are among the most widely used ML algorithms for forecasting final yield, reflecting their flexibility and consistent performance across a range of crops and production environments [4].
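The linearity argument can be made concrete with a minimal Python sketch on synthetic data (not the study's data). The hypothetical response depends only on the product of two predictors. An additive MLR cannot represent that, while an RF can approximate it. The function name and the data-generating choices are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def compare_linear_vs_rf(n=2000, seed=0):
    """Fit MLR and RF to a synthetic response driven purely by an interaction."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    # Hypothetical "yield": product of two predictors plus small noise.
    # It has no additive linear component, so MLR should explain almost nothing.
    y = X[:, 0] * X[:, 1] + rng.normal(0.0, 0.05, size=n)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    lin = LinearRegression().fit(X_tr, y_tr)
    rf = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X_tr, y_tr)
    return r2_score(y_te, lin.predict(X_te)), r2_score(y_te, rf.predict(X_te))


lin_r2, rf_r2 = compare_linear_vs_rf()
print(f"MLR R2 = {lin_r2:.2f}, RF R2 = {rf_r2:.2f}")
```

Real yield responses are rarely this extreme. The sketch only isolates the failure mode that motivates non-linear learners.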

Four types of inputs are commonly used to train these models: yield data, meteorological data, crop management data, and remote sensing data. The most common source of yield data is government statistics [4,5], which are typically available at aggregated spatial scales, thereby limiting their applicability for field-level analyses and constraining model transferability despite their usefulness for regional or global predictions. The use of yield data extracted from combine harvesters remains scarce. When employed, it is typically limited to specific fields or restricted study areas, which hampers the development of broadly applicable models [3,8].

The use of meteorological data has expanded in recent years; however, it is most often incorporated as complementary information alongside other data sources, such as remote sensing [6,7]. Furthermore, climatic variables are not always available at the within-farm level. This introduces uncertainty into both the analysis and the results, particularly when the distance between weather stations and cropped fields is large, thereby reducing the statistical significance of the results. Crop management data can provide valuable information and define new variables related to site-specific cultural practices and crop responses at the field scale. Still, their heterogeneity and limited availability constrain the development of scalable prediction models.

Finally, remote sensing data from different platforms have been used with reasonable success, either alone or in combination with other data sources, at both regional and field scales [3,4,5,6,7,8]. However, some of these applications have focused on vegetation indices (VIs), such as the Normalized Difference Vegetation Index (NDVI), the Enhanced Vegetation Index (EVI), or the Green Difference Vegetation Index (GDVI). These indices are known to be affected by saturation effects during periods of high biomass accumulation. This limitation could be particularly significant in wheat and barley, because final productivity is defined during advanced vegetative stages. Saturation could therefore weaken the relationship between VI-based predictors and yield variability during the key phenological phases of yield formation. In this sense, Unmanned Aerial Vehicles (UAVs) have been widely used for field-scale yield prediction because of their high-resolution imagery [9]. However, UAV sensors are mostly limited to the RGB region, have high operating costs, and require the operator to be physically present in the field. Sentinel-2, by contrast, offers freely available reflectance products over a wide spectral range at high spatial resolution, readily applicable to large cultivated areas [8,9,10]. Furthermore, recent studies in cereals have shown that red-edge bands are more sensitive to chlorophyll content and nitrogen status, potentially overcoming the saturation effects observed in RGB-based VIs [8,11].
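Part of the NDVI saturation follows directly from the index's algebra. Writing r = NIR/Red gives NDVI = (r - 1)/(r + 1). This expression is bounded by 1, so equal increases in the NIR/Red contrast produce ever smaller NDVI changes. The short Python sketch below illustrates only this bounded response. It is not a radiative-transfer model of canopy saturation, and the ratio values are arbitrary.

```python
def ndvi_from_ratio(r):
    """NDVI written as a function of the reflectance ratio r = NIR / Red."""
    # NDVI = (NIR - Red) / (NIR + Red); dividing numerator and denominator by Red
    # gives (r - 1) / (r + 1), which approaches 1 as r grows.
    return (r - 1.0) / (r + 1.0)


for r in (2, 5, 10, 20):
    print(f"NIR/Red = {r:>2}: NDVI = {ndvi_from_ratio(r):.3f}")
# NDVI: 0.333, 0.667, 0.818, 0.905
```

Doubling the ratio from 10 to 20 raises NDVI by less than 0.09. A ratio index such as RVI keeps growing linearly over the same range.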

In this context, this study aims to develop and evaluate robust pixel-level yield prediction models for wheat and barley using multi-temporal Sentinel-2 surface reflectance data and machine learning techniques. Specifically, the objectives are to:

- assess the predictive capability of the Random Forest and XGBoost algorithms across multiple phenological stages;
- evaluate the temporal and spatial transferability of the models using an independent growing season for validation;
- identify the most informative spectral regions for yield prediction in Mediterranean cereal systems.

By integrating multi-year yield-monitoring data with satellite observations at 10 m resolution, this work aims to provide scalable, field-relevant tools to support precision agriculture and site-specific crop management. Compared with other studies, the present work integrates high-resolution multi-temporal satellite reflectance data with a large yield monitor dataset covering different cropping areas in Spain. In addition, the potential of reflectance band data as yield predictors is evaluated, in contrast to the frequently used VI-trained yield prediction models. To ensure the reliability of yield predictions, a two-tier validation strategy was implemented. Data from four growing seasons (2020–2023) were split in an 80/20 ratio for internal calibration and validation. Crucially, the 2024 growing season was withheld entirely as an independent external validation set to rigorously assess the models’ inter-annual transferability.

2. Materials and Methods

2.1. Site of Study

The study was conducted using yield data from seven of the main wheat and barley production areas in Spain (Figure 1): Burgos, Córdoba, León, Palencia, Sevilla, Soria, and Valladolid. The studied production areas represent 39.1% of the Spanish cultivated area (hectares) of wheat (Burgos 9.7%, Córdoba 3.1%, León 2.7%, Palencia 6.3%, Sevilla 6.5%, Soria 5.1%, and Valladolid 5.7%) and 23.4% of that of barley (Burgos 6.0%, Córdoba 0.7%, Palencia 5.2%, Soria 3.5%, and Valladolid 8.0%). They also account for 40.1% of Spanish wheat production (tons) (Burgos 11.4%, Córdoba 2.5%, León 3.2%, Palencia 8.1%, Sevilla 5.0%, Soria 4.1%, and Valladolid 5.8%) and 25.5% of barley production (Burgos 7.6%, Córdoba 0.5%, Palencia 6.4%, Soria 2.9%, and Valladolid 8.1%) [12]. Yield data covering a total of 5466.3 ha (2700.4 ha of wheat and 2765.9 ha of barley) over five crop seasons (2020 to 2024) entered the data-cleaning process. Detailed information about the number of plots, growing seasons, and total surface is given in Table A1 of Appendix A. Summary statistics of the elevation and elevation differences of the cropping areas are given in Table A2. Wheat data were available for all seven locations (643 fields), whereas barley data (893 fields) were not available for Sevilla or León. For wheat, data were available for all growing seasons in Burgos, Palencia, Córdoba, and Valladolid, while Sevilla had data for the 2020 season, Soria for the 2021 and 2023–2024 seasons, and León for the 2023–2024 seasons. For barley, data were available for all growing seasons in Burgos, Palencia, and Valladolid, while Soria had data for 2021 and 2023–2024, and Córdoba for 2021–2023. The available yield data correspond to winter wheat and barley crops sown in November. Crop management of wheat and barley fields in Spain follows the recommendations of López-Bellido et al. [13,14].

According to the Köppen Climate Classification (KCC), the studied areas have different characteristic climates:

- temperate oceanic climate (Cfb): Burgos and Soria;
- warm-summer Mediterranean climate (Csb): León and Palencia;
- cold semi-arid (steppe) climate (BSk): Valladolid.

The soil type in these areas is calcimorphic, except in the León area, which has an umbrisol soil type [15].

Both wheat and barley are mainly rainfed winter cereals in Spain: they are sown in autumn, grow through the cool winter, and are harvested in late spring or early summer. These crops occupy a large share of the Spanish land used for cereal production. Spain is the largest barley producer in the European Union (EU) and also contributes an important share of EU wheat production [16]. Winter barley is used for malting and animal feed, while wheat is used for both animal feed and human consumption. Barley has traditionally been considered less productive than wheat. However, under similar soil, climate, and management conditions, both crops can achieve yields of 1.0–5.0 t·ha⁻¹ under rainfed conditions and up to 7.0–8.0 t·ha⁻¹ under irrigation [17,18]. Figure 2 shows the phenological cycle of both crops under Spanish Mediterranean conditions, with the phenological stages expressed in days before harvest (DBH) [19,20,21].

2.2. Yield Data Acquisition

The final yield data were recorded using two different yield-tracking software programs, one from TOPCON Corporation (Tokyo, Japan) and the other from Trimble (Westminster, CO, USA). A total of three combine harvesters were used for data acquisition. Both companies’ measurement systems estimate volumetric grain flow with optical sensors before the grain enters the combine harvester hopper. The software that generates the yield maps includes an internal calibration based on crop type, so combine operators select the grain type to be harvested and calibrate the sensors before starting. However, the data representation in the resulting shapefiles differs. TOPCON (Yieldtrakk YM-1) (Topcon Positioning Systems, Inc., Livermore, CA, USA) creates a layer of polygons of irregular area with a constant width equal to the combine's cutting width (7.5 m for wheat and barley). Trimble (Trimble Ag) (Trimble Inc., Westminster, CO, USA) instead creates a layer of points. The yield maps of Palencia, León, and Valladolid were created using TOPCON software (MAGNET Field v7.x), while the Burgos and Soria yield maps were created using Trimble software (v2022.10). TOPCON yield maps were downloaded directly from the combine harvester, while Trimble yield maps were obtained from the company. These datasets were processed following the methodology described in [22]:

1. Overlapping records and data outside biological limits were deleted.
2. A global filter was applied at the field level, excluding records outside the range median_field ± 1 · IQR_field.
3. A local filter was applied within each field using a 40 m search radius, excluding all records outside the range median_searchradius ± 2.5 · IQR_searchradius.
4. A mean filter was applied to aggregate the data to the maximum spatial resolution of Sentinel-2 (10 × 10 m).
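Steps 2 to 4 of the cleaning procedure can be sketched in Python with pandas and SciPy. This is a minimal sketch under stated assumptions, not the exact implementation of [22]:

- One field's records are assumed to be in a DataFrame with projected coordinates in metres, using the hypothetical columns `x`, `y` and `yield_kg_ha`.
- Overlaps and biologically implausible values (step 1) are assumed to have been removed already.
- The 10 m aggregation assumes that the pixel grid origin lies on multiples of 10 m, which we take to match the Sentinel-2 UTM tile grid.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree


def iqr(values):
    """Interquartile range (Q3 - Q1)."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1


def clean_yield(df, global_k=1.0, local_k=2.5, radius_m=40.0, pixel_m=10.0):
    """Global IQR filter, local IQR filter, then mean aggregation to pixels."""
    # Global filter: keep records within median_field +/- global_k * IQR_field.
    y = df["yield_kg_ha"].to_numpy()
    df = df[np.abs(y - np.median(y)) <= global_k * iqr(y)].reset_index(drop=True)

    # Local filter: compare each record with neighbours within radius_m metres.
    xy = df[["x", "y"]].to_numpy()
    vals = df["yield_kg_ha"].to_numpy()
    keep = np.ones(len(df), dtype=bool)
    for i, neighbours in enumerate(cKDTree(xy).query_ball_point(xy, r=radius_m)):
        local = vals[neighbours]  # neighbourhood includes the record itself
        keep[i] = abs(vals[i] - np.median(local)) <= local_k * iqr(local)
    df = df[keep]

    # Mean aggregation: each record falls into the pixel_m x pixel_m cell below it.
    cells = df.assign(
        col=np.floor(df["x"] / pixel_m).astype(int),
        row=np.floor(df["y"] / pixel_m).astype(int),
    )
    return cells.groupby(["row", "col"], as_index=False)["yield_kg_ha"].mean()
```

A k-d tree keeps the 40 m neighbourhood search fast on the dense point clouds that yield monitors produce.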

2.3. Satellite Data

The satellite data were obtained from the MultiSpectral Instrument (MSI) on board two twin satellites, Sentinel-2A and Sentinel-2B. The satellites fly in the same orbit but are phased at 180° to each other, which allows the acquisition of wide-swath, high-resolution images with a 5-day revisit time [23]. The instrument samples 13 spectral bands, 10 of which were used (Table 1). Only cloud-free images were used, downloaded from ESA’s official Copernicus Browser platform. The downloaded images were Level-2A products, which provide Bottom-Of-Atmosphere (BOA) reflectance data. They belong to seven different tiles: T30TVM (Burgos), T30SUG (Córdoba), T29TQH (León), T30TUM (Palencia), T30STG (Sevilla), T30TWM (Soria), and T30TUL (Valladolid). All band information was aggregated into a 10 × 10 m pixel grid by resampling all 20 m bands to 10 m with bilinear interpolation, i.e., estimating the value of each 10 m pixel as a distance-weighted average of the four nearest 20 m pixels. In addition, four vegetation indices (VIs) widely used in yield prediction modeling were selected and computed from the Sentinel-2 surface reflectance products:

- the Normalized Difference Vegetation Index (NDVI) (Equation (1)) [24];
- the Normalized Difference Red-Edge Index (NDRE) (Equation (2)) [25];
- the Ratio Vegetation Index (RVI) (Equation (3)) [26];
- the Enhanced Vegetation Index (EVI) (Equation (4)) [27].

NDVI = \frac{NIR - Red}{NIR + Red}

(1)

NDRE = \frac{NIR - RedEdge}{NIR + RedEdge}

(2)

RVI = \frac{NIR}{Red}

(3)

EVI = 2.5 \cdot \frac{NIR - Red}{NIR + 6 \cdot Red - 7.5 \cdot Blue + 1}

(4)
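The resampling and index computations described above can be sketched with NumPy and SciPy. This minimal sketch makes the following assumptions:

- The arrays are co-registered and already converted from Level-2A digital numbers to reflectance on a 0–1 scale. The constant 1 in the EVI denominator presumes that scale.
- The pixel edges of the 20 m grid coincide with those of the 10 m grid.
- B5 is the red-edge band used in NDRE; the band is not specified in Equation (2).

The EVI line includes the gain factor G = 2.5 of the usual EVI formulation. Because G only rescales the index, it would not change how tree-based models partition the data.

```python
import numpy as np
from scipy.ndimage import map_coordinates


def upsample_20m_to_10m(band20):
    """Bilinear resampling of a 20 m band onto the co-registered 10 m grid.

    Each 10 m value is a distance-weighted average of the four nearest 20 m
    pixel centres (weights 0.75/0.25 along each axis in the interior).
    """
    h, w = band20.shape
    # The centre of 10 m pixel i lies at fractional 20 m index (i - 0.5) / 2.
    rows = (np.arange(2 * h) - 0.5) / 2.0
    cols = (np.arange(2 * w) - 0.5) / 2.0
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    # order=1 -> bilinear; mode="nearest" clamps the outer half-pixel border.
    return map_coordinates(band20.astype(float), [rr, cc], order=1, mode="nearest")


def vegetation_indices(blue, red, red_edge, nir):
    """NDVI, NDRE, RVI and EVI from surface reflectance on a 0-1 scale."""
    ndvi = (nir - red) / (nir + red)
    ndre = (nir - red_edge) / (nir + red_edge)
    rvi = nir / red
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)  # G = 2.5 assumed
    return {"NDVI": ndvi, "NDRE": ndre, "RVI": rvi, "EVI": evi}


# Usage on tiny synthetic tiles: B5 (red edge) is 20 m, the other bands 10 m.
b5_10m = upsample_20m_to_10m(np.array([[0.10, 0.12], [0.14, 0.16]]))
b2, b4, b8 = np.full((4, 4), 0.05), np.full((4, 4), 0.05), np.full((4, 4), 0.40)
vis = vegetation_indices(b2, b4, b5_10m, b8)
```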

Table 2 shows the selected cloud-free images for the studied growing seasons. Five images were finally selected for each location and year; these five dates correspond to specific moments of the crop cycle, expressed in Days Before Harvest (DBH). Images at 110, 95, 80, 65, 50, 35, 15, and 5 DBH were considered in the present study. The five modeling dates correspond to four phenological stages of wheat and barley: booting (80 DBH), grain filling (50 and 35 DBH), ripening (15 DBH), and senescence (5 DBH).
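Selecting acquisition dates by DBH can be sketched as follows. This is a hypothetical helper. It assumes that the harvest date of each field or location is known and that, for each target DBH, the cloud-free image closest to it is chosen within a tolerance. The tolerance (5 days, matching the nominal revisit time) and the example dates are our assumptions, not values reported in the study.

```python
from datetime import date


def pick_images_by_dbh(harvest, cloud_free_dates, targets=(80, 50, 35, 15, 5), tol_days=5):
    """For each target DBH, choose the cloud-free image acquired closest to it.

    Returns {target: (acquisition_date, actual_dbh)}, or None for a target with
    no image within `tol_days` (a hypothetical tolerance).
    """
    selected = {}
    for target in targets:
        best = None
        for acquired in cloud_free_dates:
            dbh = (harvest - acquired).days
            if dbh < 0:
                continue  # image taken after harvest
            gap = abs(dbh - target)
            if gap <= tol_days and (best is None or gap < best[2]):
                best = (acquired, dbh, gap)
        selected[target] = best[:2] if best else None
    return selected


# Hypothetical harvest and acquisition dates for one location and season.
harvest = date(2022, 7, 1)
images = [date(2022, 3, 1), date(2022, 4, 12), date(2022, 5, 12),
          date(2022, 5, 27), date(2022, 6, 16), date(2022, 6, 26)]
chosen = pick_images_by_dbh(harvest, images, targets=(110, 80, 50, 35, 15, 5))
```

In this example, no image falls within 5 days of 110 DBH, so that target is reported as missing rather than filled with a distant date.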

2.4. Experimental Design

The workflow followed in the present study is presented in Figure 3. First, the data for the 2020–2024 seasons were gathered, and the selected VIs were calculated. Once the datasets were correctly assembled, the data were analyzed before being split for the final stage, which consisted of training and validating the yield prediction models.

2.4.1. Modeling Dataset Building

The dataset used for modeling was built by combining the yield datasets (target variable) with the downloaded Sentinel-2 surface reflectance data (independent variables). A 10 × 10 m pixel grid aligned with the Sentinel-2 pixel geometry was created to avoid spectral contamination from mixed pixels. The observed yield measurements (kg·ha⁻¹) and the Sentinel-2 spectral information (the 10 selected surface reflectance bands and the calculated VIs) were then aggregated at the pixel level. A negative 10 m buffer was applied at the field level to remove the field-edge effect, which is characterized by mixed pixels containing both crop and adjacent areas. Field boundaries tend to have a significantly lower plant density, and the reflectance of those pixels results from a combination of soil and crop. Removing these data is therefore necessary to avoid introducing noise into the training dataset. The final dataset size is given in Table 3. The number of 10 × 10 m pixels, or records (Records; matrix rows), depends on the size and number of fields in each growing season and crop, based on the available yield data. On the other hand, the number of data or predictors (N; matrix columns) depends on the number of pixels, the selected number of dates, and the spectral information (Pixels × number of dates × Sentinel-2 bands and VIs).
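The negative buffer and the pixel-level join can be sketched with shapely and pandas. This minimal sketch assumes field polygons in a projected CRS in metres and pixel tables indexed by hypothetical integer `row`/`col` columns on a 10 m grid. Keeping the pixels whose centres fall inside the shrunk polygon is one possible rule and is assumed here. Requiring each whole pixel to lie inside the original field would be a stricter alternative.

```python
import numpy as np
import pandas as pd
from shapely.geometry import Point, Polygon


def drop_edge_pixels(field, pixels, buffer_m=10.0, pixel_m=10.0):
    """Keep pixels whose centres lie inside the field shrunk by `buffer_m` metres."""
    inner = field.buffer(-buffer_m)  # negative buffer strips the field edge
    centres_x = (pixels["col"].to_numpy() + 0.5) * pixel_m
    centres_y = (pixels["row"].to_numpy() + 0.5) * pixel_m
    inside = [inner.contains(Point(x, y)) for x, y in zip(centres_x, centres_y)]
    return pixels[np.array(inside, dtype=bool)]


def build_modeling_table(yield_px, spectral_px):
    """Join per-pixel yield (target) with per-pixel predictors on the shared grid."""
    return yield_px.merge(spectral_px, on=["row", "col"], how="inner")


# Usage: a 100 m x 100 m field covered by a 10 x 10 grid of 10 m pixels.
field = Polygon([(0, 0), (100, 0), (100, 100), (0, 100)])
rows, cols = np.meshgrid(np.arange(10), np.arange(10), indexing="ij")
grid = pd.DataFrame({"row": rows.ravel(), "col": cols.ravel()})
yield_px = drop_edge_pixels(field, grid.assign(yield_kg_ha=5000.0))
table = build_modeling_table(yield_px, grid.assign(B4_dbh5=0.06))
```

In this example, the outer ring of 36 pixels is removed, leaving an 8 × 8 block of interior pixels.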

2.4.2. Data Analysis

A descriptive analysis of all variables in the dataset was conducted to understand the nature of the data, identify potential patterns and temporal trends, and detect anomalies. This analysis included basic statistics such as the mean, median, coefficient of variation, standard deviation, and maximum and minimum values. Subsequently, the correlation between the spectral variables and the measured yield was studied at the pixel level. Pearson’s correlation coefficient was used to quantify the relationships between the available predictors and the target variable. This helped to identify, and potentially remove, less informative predictors and thus to reduce the complexity of the yield prediction models.
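The descriptive statistics and the Pearson screening can be sketched with pandas. This minimal sketch assumes one DataFrame per crop and date, with one numeric column per predictor and a hypothetical `yield_kg_ha` target column. Ranking predictors by |r| is one assumed way of flagging the less informative ones. The data below are synthetic.

```python
import numpy as np
import pandas as pd


def describe_and_correlate(df, target="yield_kg_ha"):
    """Basic statistics per column and Pearson r of each predictor with yield."""
    stats = df.agg(["mean", "median", "std", "min", "max"]).T
    stats["cv_percent"] = 100.0 * stats["std"] / stats["mean"]
    r = df.drop(columns=target).corrwith(df[target], method="pearson")
    # Strongest relationships (positive or negative) first.
    return stats, r.sort_values(key=np.abs, ascending=False)


# Usage with synthetic reflectance values (not study data).
rng = np.random.default_rng(1)
b8 = rng.uniform(0.2, 0.5, 300)
df = pd.DataFrame({
    "B8": b8,
    "B4": 0.5 - b8 + rng.normal(0, 0.01, 300),
    "B6": rng.uniform(0.1, 0.3, 300),
    "yield_kg_ha": 2000 + 8000 * b8,
})
stats, r = describe_and_correlate(df)
```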

2.4.3. Machine Learning Algorithms

Random Forest (RF) and XGBoost (XGB) were chosen for the present study, as these algorithms are widely used and especially useful when the relationships between variables are non-linear, and the data include noise or missing values [8].

RF is an ensemble method that randomly selects variables to construct multiple decision trees, thereby reducing the risk of overfitting when high multicollinearity is present, as is the case with hyperspectral remote sensing data [28]. In RF, first introduced by Breiman [29], each tree is built on a bootstrap sample of the training data and considers a random subset of features at each node. This reduces the correlation between trees and improves generalization. In addition, as a non-parametric algorithm, RF can handle many predictors and is robust to noise, and its generalization error converges to a limit as more trees are added.
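In scikit-learn, the RF ingredients described above map onto `RandomForestRegressor` parameters, as in the minimal sketch below on synthetic data. The parameter values are illustrative assumptions, not the tuned values of Table 4. Recent scikit-learn releases make regression forests consider all features at each split by default, so the per-node random subset has to be requested explicitly through `max_features`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
# Synthetic stand-in for 14 predictors (10 bands + 4 VIs) at one date.
X = rng.normal(size=(600, 14))
y = 3.0 * X[:, 2] - 2.0 * X[:, 9] + np.sin(X[:, 0]) + rng.normal(0.0, 0.3, 600)

rf = RandomForestRegressor(
    n_estimators=300,   # more trees: the generalization error settles to a limit
    max_features=0.5,   # half of the predictors, drawn at random, tried per split
    bootstrap=True,     # each tree is grown on a bootstrap sample of the rows
    oob_score=True,     # score each row with the trees that never saw it
    random_state=0,
).fit(X, y)
print(f"Out-of-bag R2: {rf.oob_score_:.2f}")
```

The out-of-bag score comes for free with bootstrapping and gives a quick generalization estimate before any separate validation.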

XGB is also based on an additive ensemble of decision trees, but it uses gradient boosting, in which each new tree is fitted to correct the residual errors of the trees built so far. It builds a scalable gradient-boosted tree framework in an additive, stage-wise manner and prevents overfitting by minimizing a regularized objective function that penalizes tree complexity [30].
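The residual-correcting idea behind gradient boosting can be shown with a from-scratch sketch for squared error, using scikit-learn regression trees. This is a simplified illustration, not XGBoost. XGBoost also uses second-order gradient information and a regularized objective when growing each tree; its `reg_lambda` and `gamma` parameters control that regularization. The data are synthetic.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    """Least-squares gradient boosting: each new tree fits the current residuals."""
    base = float(np.mean(y))
    pred = np.full(len(y), base)
    trees, train_mse = [], []
    for _ in range(n_rounds):
        residuals = y - pred  # negative gradient of 0.5 * squared error
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residuals)
        pred = pred + learning_rate * tree.predict(X)  # additive, stage-wise update
        trees.append(tree)
        train_mse.append(float(np.mean((y - pred) ** 2)))
    return {"base": base, "trees": trees, "lr": learning_rate, "train_mse": train_mse}


def predict_boosted(model, X):
    """Sum the shrunken contributions of all trees on top of the base value."""
    return model["base"] + model["lr"] * sum(t.predict(X) for t in model["trees"])


rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, 500)
model = fit_boosted_trees(X, y)
```

The learning rate shrinks each tree's correction. More rounds are then needed, but the ensemble is usually less prone to fitting noise.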

To tune each algorithm, k-fold cross-validation (CV) was used (k = 3). k-fold CV is widely recommended for hyperparameter selection in yield prediction models, rather than relying on the algorithm’s default settings, because it provides out-of-sample estimates and improves model generalization [9,31]. The tested hyperparameter ranges for each algorithm are presented in Table 4, along with the parameters selected for each studied crop.
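Below is a minimal sketch of 3-fold tuning with scikit-learn's `HalvingGridSearchCV`, using synthetic data and a hypothetical grid; the actual ranges are in Table 4. The class is still experimental in scikit-learn, so it must be enabled with the `enable_halving_search_cv` import before use. Successive halving evaluates all candidates on small subsamples and passes only the best-scoring fraction on to larger ones.

```python
import numpy as np
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV, KFold
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 14))
y = 2.0 * X[:, 0] - X[:, 3] + rng.normal(0, 0.5, 400)

param_grid = {  # hypothetical ranges, for illustration only
    "n_estimators": [50, 150],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 5],
}
search = HalvingGridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=KFold(n_splits=3, shuffle=True, random_state=0),  # k = 3 folds
    scoring="r2",
    factor=2,          # keep the best half of candidates at each iteration
    random_state=0,
).fit(X, y)
print(search.best_params_)
```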

2.4.4. Model Training and Performance Evaluation

To prevent data leakage, the data were split by season. The 2020–2023 growing seasons were used for model development, while the 2024 season was reserved for testing the inter-annual transferability of the models (external validation). The 2020–2023 dataset was randomly split in an 80/20 ratio, with 20% held out for internal validation. The 80% portion was first used for hyperparameter tuning with 3-fold CV. After tuning, the yield prediction models were trained on this same 80% of the 2020–2023 dataset using the selected hyperparameters.
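The season-based split can be sketched with pandas and scikit-learn's `train_test_split`. The sketch assumes a pixel table with a hypothetical `season` column; the random seed is an illustrative choice.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_by_season(pixels, external_season=2024, internal_size=0.2, seed=42):
    """Hold out one whole season externally, then split the rest 80/20 at random."""
    external = pixels[pixels["season"] == external_season]
    historical = pixels[pixels["season"] != external_season]
    train, internal = train_test_split(historical, test_size=internal_size, random_state=seed)
    return train, internal, external


# Usage with a toy table: 200 pixels per season, 2020-2024.
pixels = pd.DataFrame({"season": [2020, 2021, 2022, 2023, 2024] * 200,
                       "yield_kg_ha": range(1000)})
train, internal, external = split_by_season(pixels)
```

Because the 2024 rows are removed before the random split, no pixel from the external season can reach tuning or training.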

The models were trained at five points in the growing season: 5, 15, 35, 50, and 80 DBH. Predictors from earlier prediction moments were carried forward into later models; i.e., the model at 35 DBH contains all the information from 35, 50, and 80 DBH. In addition, all pixels from the 2024 growing season were used as an external validation dataset, to test the predictive performance of the models for future predictions and their inter-annual transferability.
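The cumulative use of earlier dates can be expressed as a column filter. In this hypothetical sketch, predictor columns are assumed to be named `<variable>_dbh<days>` (e.g., `B11_dbh35`); the study's actual naming is not given.

```python
ALL_DBH = (80, 50, 35, 15, 5)  # prediction moments, earliest first


def cumulative_feature_columns(columns, scenario_dbh, all_dbh=ALL_DBH):
    """Predictor columns usable at `scenario_dbh`: that date plus all earlier ones.

    Earlier in the season means more days before harvest, so a date is usable
    when its DBH is greater than or equal to the scenario's DBH.
    """
    usable = {d for d in all_dbh if d >= scenario_dbh}
    return [c for c in columns if int(c.rsplit("_dbh", 1)[1]) in usable]


columns = [f"{var}_dbh{d}" for d in ALL_DBH for var in ("B4", "B11", "NDVI")]
cols_35 = cumulative_feature_columns(columns, 35)  # columns for 80, 50 and 35 DBH
```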

Model performance was evaluated using R², the Mean Absolute Error (MAE), the Mean Absolute Percentage Error (MAPE), and the Root Mean Square Error (RMSE). To increase the statistical reliability of these metrics, bootstrap confidence intervals at the 95% confidence level were calculated for all trained and validated models. The bootstrap estimates a range of plausible values for a parameter by repeatedly resampling the original sample, without relying on the normality assumptions of classical methods. Here, 1000 bootstrap resamples were drawn (n_iterations = 1000). Each resample had the same size as the original dataset, training or test (size = len(dataset)), and was drawn with replacement (replace = True).
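The four metrics and the bootstrap procedure described above can be sketched with NumPy. The exact interval construction is not specified, so the percentile method used here to turn the 1000 resampled values into a 95% interval is an assumption. MAPE also assumes that no observed yield is zero.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """R2, MAE, MAPE (%) and RMSE for arrays of observed and predicted yield."""
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAE": np.mean(np.abs(err)),
        "MAPE": 100.0 * np.mean(np.abs(err) / np.abs(y_true)),  # assumes y_true != 0
        "RMSE": np.sqrt(np.mean(err ** 2)),
    }


def bootstrap_ci(y_true, y_pred, n_iterations=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for each metric (resample size = len(dataset))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    draws = {name: [] for name in ("R2", "MAE", "MAPE", "RMSE")}
    for _ in range(n_iterations):
        idx = rng.choice(n, size=n, replace=True)  # resample pixels with replacement
        for name, value in regression_metrics(y_true[idx], y_pred[idx]).items():
            draws[name].append(value)
    tail = 100.0 * (1.0 - level) / 2.0
    return {name: (np.percentile(v, tail), np.percentile(v, 100.0 - tail))
            for name, v in draws.items()}


y_obs = np.linspace(2000.0, 8000.0, 200)
ci = bootstrap_ci(y_obs, y_obs + 100.0)  # constant 100 kg/ha bias
```

In this toy case every prediction is off by exactly 100 kg·ha⁻¹, so the MAE interval collapses to a single value, while the R² interval still varies with the resampled spread of yields.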

2.5. Software

Google Earth Engine was used to download all the Sentinel-2 images. Yield map processing was performed with QGIS 3.34.6. The yield data and Sentinel-2 data were merged using Python (v3.11.7) in Visual Studio Code v1.101.2. The descriptive analysis of the data and the correlation study were performed using the libraries pandas v2.2.3, geopandas v1.0.1, and numpy v2.2.6. The scikit-learn v1.6.1 library was used to train the Random Forest algorithm and obtain performance metrics. The XGB algorithm was trained and tested using the xgboost v3.0.5 library. The k-fold CV was performed using the HalvingGridSearchCV class of the scikit-learn library, and the train_test_split function from the same library was used to partition the historical 2020–2023 dataset. Finally, all plots and figures were created using the libraries matplotlib v3.10.3 and seaborn v0.13.2 (Table 5).

3. Results

3.1. Study of Correlation

Figure 4 shows the evolution of the correlation coefficient (r) between the selected Sentinel-2 reflectance bands and the reference yield throughout the wheat crop cycle for each studied growing season. Two clear trends can be observed across all seasons. On the one hand, the Near-Infrared (NIR) reflectance bands (B8 and B8A) and one red-edge band (B7) exhibit a positive correlation with yield, reaching values between 0.25 and 0.80 at times of maximum correlation. On the other hand, yield shows a negative correlation with the visible bands (B2–B4), the Short-Wave Infrared bands (SWIR; B11 and B12), and one red-edge band (B5), with values ranging from −0.77 to −0.27 at times of strongest correlation. However, the remaining red-edge band (B6) does not show a consistent pattern, aligning with either group depending on the growing season.

When comparing the evolution of correlation across all wheat growing seasons, it can be observed that at several points during the crop cycle, the two aforementioned groups exhibit simultaneous peaks of opposite sign. These peaks occur approximately between 5–15 DBH (ripening and senescence), 35–50 DBH (grain filling), and 65–80 DBH (booting). Beyond 80 DBH, no consistent pattern across growing seasons can be identified. Based on these periods of maximum correlation, five temporal points (5, 15, 35, 50, and 80 DBH) were selected for yield prediction modeling.

Figure 5 shows the evolution of r between the selected Sentinel-2 reflectance bands and the reference yield during the barley crop cycle for each studied growing season. Two clear trends can be observed across all the studied seasons. On the one hand, the correlations of yield with the NIR reflectance bands (B8 and B8A) and with the red-edge band B7 follow a similar tendency. However, unlike in wheat, the correlation of these bands is not always positive: depending on the growing season, it can also be negative or close to 0, especially after flowering (from 60 to 5 DBH). At the moment of maximum correlation, values range between −0.25 and 0.50. On the other hand, yield is negatively correlated with the visible bands (B2–B4), the Short-Wave Infrared bands (SWIR; B11 and B12), and one red-edge band (B5). At the moments of strongest correlation, these values are in the same range as for wheat. However, the remaining red-edge band (B6) does not show a clear pattern, following the tendency of one or the other group depending on the growing season.

In barley, the simultaneous correlation peaks occur at the same moments of the crop cycle as in wheat (5–15 DBH, 35–50 DBH, and 65–80 DBH), which led to the selection of the same five temporal moments for yield modeling. Nevertheless, the simultaneous peaks of the two groups are not necessarily of opposite sign; unlike in wheat, both groups can show a simultaneous negative correlation.

3.2. Models Evaluation

3.2.1. Yield Prediction Models of Wheat

The performance of the trained ML models for wheat yield prediction is presented in Table 6 and Table 7. The models were validated using two datasets. The internal validation set contained data from the 2020–2023 growing seasons that were not used during training. The external validation set contained all available data from the 2024 growing season. Confidence intervals (CI) were calculated to assess the uncertainty of the model performance metrics. For both RF and XGB, internal validation gave similar results:

- R² of 0.876–0.940 (95% CI of 0.002–0.004);
- MAPE of 15.4–26.4% (95% CI of 3.8–5.4%);
- MAE of 252.4–403.4 kg·ha⁻¹ (95% CI of 3.0–4.4 kg·ha⁻¹);
- RMSE of 441.5–635.6 kg·ha⁻¹ (95% CI of 8.3–9.8 kg·ha⁻¹).

In both cases, model performance improved as the prediction scenario approached the harvest. However, the XGB models performed slightly better (Table 7) in internal validation for the scenarios farther from the harvest date (80–35 DBH). External validation likewise showed better results as the prediction scenarios approached the harvest date (15–5 DBH), with:

- R² of 0.22–0.78 (95% CI of 0.007–0.016);
- MAPE of 19.3–49.5% (95% CI of 0.4–1.7%);
- MAE of 515.9–933.2 kg·ha⁻¹ (95% CI of 6.7–13.0 kg·ha⁻¹);
- RMSE of 680.1–1284.9 kg·ha⁻¹ (95% CI of 9.6–16.6 kg·ha⁻¹).

On the external dataset, the RF models proved more robust and generally outperformed the XGB models.

Figure 6 presents the evolution of the predictive performance of the models on the external validation dataset. For both RF and XGB models, R² increases as the crop approaches harvest (i.e., lower DBH). In fact, R² at 80 DBH did not reach 0.3 in either model, whereas at the next prediction time (50 DBH) it was close to 0.7. This increase in R² is also reflected in the prediction errors: MAE decreased by 28.3% and 33.4%, and RMSE by 31.7% and 36.1%, for RF and XGB, respectively. Performance on the external dataset continued to improve up to 15 DBH for both RF and XGB. RF increased its R² from 0.75 at 35 DBH to 0.78 at 15 DBH, and R² then remained constant from 15 to 5 DBH, with a slight decrease in MAE but an increase in RMSE. XGB likewise increased its R² from 0.75 at 35 DBH to 0.77 at 15 DBH, with consistently lower MAPE, MAE, and RMSE, but its performance dropped sharply at 5 DBH.

3.2.2. Yield Prediction Models of Barley

The performance of the trained ML models for barley yield prediction is presented in Table 8 and Table 9. The models were validated in the same way as those for wheat. For both RF and XGB, internal validation gave:

- R² of 0.892–0.961 (95% CI of 0.001–0.003);
- MAPE of 8.4–15.3% (95% CI of 0.1–0.2%);
- MAE of 203.1–349.0 kg·ha⁻¹ (95% CI of 2.0–3.1 kg·ha⁻¹);
- RMSE of 299.1–497.0 kg·ha⁻¹ (95% CI of 3.6–5.5 kg·ha⁻¹).

In both cases, model performance improved as the prediction scenario approached the harvest. However, the XGB models performed slightly better (Table 9) in internal validation across all prediction scenarios, with higher R² and lower errors. External validation also showed better results as the prediction scenarios approached the harvest date. The first scenario (80 DBH), the furthest from harvest, gave the following results for RF and XGB, respectively:

- R² of 0.333 ± 0.006 and 0.405 ± 0.007;
- MAPE of 92.0 ± 1.3% and 83.6 ± 1.2%;
- MAE of 1487.0 ± 9.9 and 1356.8 ± 10.2 kg·ha⁻¹;
- RMSE of 1795.7 ± 10.1 and 1695.8 ± 10.7 kg·ha⁻¹.

The remaining models performed considerably better, with:

- R² of 0.689–0.855 (95% CI of 0.001–0.006);
- MAPE of 26.0–46.2% (95% CI of 0.4–0.8%);
- MAE of 529.3–924.0 kg·ha⁻¹ (95% CI of 5.4–8.6 kg·ha⁻¹);
- RMSE of 744.2–1225.4 kg·ha⁻¹ (95% CI of 8.8–11.6 kg·ha⁻¹).

On the external dataset, the XGB models appear more robust and generally outperform the RF models, with R² values 2.0–21.6% higher and lower values for all error metrics (MAPE 9.1–21.3%, MAE 8.8–16.1%, and RMSE 5.6–16.0% lower).
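The percentages above are assumed here to be relative changes computed with respect to the RF values. Under that assumption, the 80 DBH external-validation figures reproduce one bound of each reported range: the 21.6% R² gain and the 9.1%, 8.8%, and 5.6% error reductions. The short check below shows this.

```python
def relative_change(rf_value, xgb_value, higher_is_better):
    """Improvement of XGB over RF, in % of the RF value, rounded to 0.1."""
    if higher_is_better:
        change = (xgb_value - rf_value) / rf_value
    else:
        change = (rf_value - xgb_value) / rf_value
    return round(100.0 * change, 1)


# Barley, external validation at 80 DBH: (RF, XGB, higher_is_better).
at_80_dbh = {
    "R2": (0.333, 0.405, True),
    "MAPE": (92.0, 83.6, False),
    "MAE": (1487.0, 1356.8, False),
    "RMSE": (1795.7, 1695.8, False),
}
improvement = {m: relative_change(*v) for m, v in at_80_dbh.items()}
print(improvement)  # {'R2': 21.6, 'MAPE': 9.1, 'MAE': 8.8, 'RMSE': 5.6}
```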

Figure 7 shows how the predictive performance of the barley models evolves on the external validation dataset. For both RF and XGB, R² increases as the crop approaches harvest (i.e., as DBH decreases). At 80 DBH, R² did not reach 0.5 in either model, whereas at the next prediction time (50 DBH) it rose to 0.689 and 0.759 for RF and XGB, respectively. This gain is also reflected in the prediction errors: between 80 and 5 DBH, MAE decreased by 57.6% and 61.0% and RMSE by 50.7% and 56.1% for RF and XGB, respectively. External performance continues to improve up to 5 DBH for both algorithms. For RF, R² increases from 0.689 at 50 DBH to 0.838 at 5 DBH. XGB follows the same pattern, with R² increasing from 0.759 to 0.855 over the same period and consistently lower MAPE, MAE, and RMSE.

3.3. Model Analysis

To aid interpretation of the models, the relative weight of each group of predictor variables was evaluated across the different prediction scenarios.

3.3.1. Wheat Model Interpretability

Figure 8 illustrates the feature importance (FI) of each group of variables in the wheat yield prediction models. Regardless of the algorithm used to train the models and the prediction time, the selected VIs and the Sentinel-2 NIR bands have below-mean importance, whereas the importance of the visible, Red-Edge (RE), and SWIR bands varies through the crop cycle. In the models trained with RF (Figure 8a), the SWIR variables gain importance as the harvest date approaches. At 80 DBH, the visible and RE bands are equally important, followed by the SWIR bands. At 50 DBH, RE carries the greatest weight, while the visible and SWIR bands have similar importance. At 35 DBH, the three groups have similar weight, with the visible bands showing a slightly higher FI. At 15 DBH, the FI of SWIR increases slightly relative to the visible and RE bands. Finally, at the end of the crop cycle (5 DBH), the model depends heavily on the SWIR bands.

In contrast to the RF models, in the models trained with XGB (Figure 8b) the SWIR bands have below-mean FI for almost the entire crop cycle, the exception being the 5 DBH model. Together, the visible and RE variables account for 70–80% of the models' predictive power. In the 80 DBH model, the FI of the visible bands was markedly higher than that of all other variables. From 50 DBH until the end of the cycle, however, the RE bands carry a high weight in the model, which decreases as the FI of the visible bands increases.
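Figures 8 and 9 summarise FI per group of variables and compare it with a mean. This section does not specify how per-variable importances were aggregated or which mean is used, so the Python sketch below assumes that per-variable FI values are summed within each spectral group, normalised to shares, and compared with the mean share across groups. The visible (B2–B4), RE (B5–B7), and NIR (B8, B8A) groupings follow the text; the SWIR band list (B11, B12), the VI list, the function names, and all FI values are hypothetical.

```python
# Sentinel-2 variables grouped as in the text; SWIR and VI lists are assumptions
GROUPS = {
    "Visible": ["B2", "B3", "B4"],
    "RE": ["B5", "B6", "B7"],
    "NIR": ["B8", "B8A"],
    "SWIR": ["B11", "B12"],
    "VI": ["NDVI", "NDRE", "RVI"],
}


def group_shares(fi, groups):
    """Sum per-variable FI within each group and normalise so shares sum to 1."""
    totals = {g: sum(fi.get(v, 0.0) for v in members) for g, members in groups.items()}
    grand_total = sum(totals.values())
    return {g: t / grand_total for g, t in totals.items()}


def above_mean(shares):
    """Flag groups whose share exceeds the mean share across groups."""
    mean_share = sum(shares.values()) / len(shares)
    return {g: s > mean_share for g, s in shares.items()}


# Hypothetical FI values for a single model (illustrative, not Figure 8 or 9 data)
fi_example = {
    "B2": 0.08, "B3": 0.07, "B4": 0.10,
    "B5": 0.12, "B6": 0.06, "B7": 0.05,
    "B8": 0.03, "B8A": 0.03,
    "B11": 0.18, "B12": 0.16,
    "NDVI": 0.04, "NDRE": 0.05, "RVI": 0.03,
}

shares = group_shares(fi_example, GROUPS)
flags = above_mean(shares)
for group in GROUPS:
    print(f"{group:8s} share={shares[group]:.2f} above_mean={flags[group]}")
```

Summing within groups favours groups with more members (three visible bands versus two NIR bands). Dividing each total by the group size would give a per-variable alternative if that is closer to the intended comparison.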

These results suggest that the SWIR, RE, and visible Sentinel-2 reflectance bands may be sufficient to achieve accurate yield prediction.

3.3.2. Barley Model Interpretability

Figure 9 illustrates the FI of each group of variables in the barley yield prediction models. As for wheat, regardless of the algorithm and the prediction time, the selected VIs and the Sentinel-2 NIR bands have below-mean importance, whereas the importance of the visible, Red-Edge (RE), and SWIR bands varies through the crop cycle. In the models trained with RF (Figure 9a), the visible bands gain importance as the harvest date approaches, but the SWIR bands accumulate the highest FI at the end of the crop cycle (5 DBH). At 80 DBH, the RE and SWIR bands have similar weight in the model, followed by the visible bands. At 50 DBH, the visible bands carry the most weight, while RE and SWIR have similar importance. At 35 and 15 DBH, the visible bands have the highest FI, followed by the SWIR and RE bands, which still carry equal weight. At the end of the crop cycle (5 DBH), the model depends heavily on the SWIR bands.

In contrast to the RF models, in the models trained with XGB (Figure 9b) the SWIR bands have below-mean FI except in the 5 and 80 DBH models. Together, the visible and RE variables account for around 80% of the predictive power in the 15, 35, and 50 DBH models. In the 5 DBH model, the FI of the SWIR bands is markedly higher than that of all other variables, whereas in the remaining models the RE bands maintain a relatively constant weight.

As with the wheat models, the SWIR, RE, and visible Sentinel-2 reflectance bands may be sufficient to achieve accurate yield prediction.

4. Discussion

In the first part of the Results section, the correlation between the satellite reflectance bands and yield was studied. In general, two main groups of correlation patterns emerged across the growing seasons in both wheat and barley. On the one hand, the visible bands (B2–B4), the SWIR bands, and one of the RE bands (B5) showed a negative correlation coefficient (r) with yield. On the other hand, the NIR bands (B8 and B8A) and another RE band (B7) showed a positive r. The remaining RE band (B6) behaved in between the two groups. As is well known, the visible bands and the RE band at ≈705 nm (B5) are highly sensitive to chlorophyll: their reflectance decreases as chlorophyll concentration increases [32]. Conversely, the NIR bands and the RE band at ≈783 nm (B7) are related to the amount of canopy, because this radiation is strongly scattered by healthy, dense vegetation [11]. The B6 band shows mixed behavior because it lies between B5 and B7 in the RE region [11].

The SWIR region, in turn, has been associated with water content, with reflectance decreasing as plant water content increases [33]. The observed general patterns therefore agree with those reported by other authors for cereal crops, since greater canopy and chlorophyll content are generally associated with higher yield, and in non-flooded field crops an increase in water content mainly reflects an increase in canopy [11,33,34]. The largest absolute r values, both positive and negative, were generally observed in two distinct periods of the crop cycle. The first was around 80–60 DBH, when the plant was in the reproductive stage but still growing vegetatively; at this point, high canopy and chlorophyll content are essential for producing enough photosynthetic compounds to form the inflorescences that will become the final grain [35]. The second was around 35–15 DBH, at the end of grain filling and the beginning of senescence, when the longer the plants remain photosynthetically active (high canopy and/or chlorophyll content), the longer they can continue filling the grain, increasing its final weight [34,35]. Accordingly, the NIR and B7 bands showed positive r values of 0.25–0.55 and 0.35–0.70 in the 80–60 DBH and 35–15 DBH periods, respectively, whereas the SWIR, visible, and B5 bands showed negative r values of −0.25 to −0.77 and −0.33 to −0.63 in the same periods.
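The band-yield correlation analysis can be sketched as follows. This Python sketch assumes that r is Pearson's correlation computed across pixels for each band and acquisition date, and that the per-date values are then summarised within the two DBH windows discussed above (80–60 and 35–15 DBH). The function names and all numbers in the example are hypothetical and are not the study's data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_range_in_window(r_by_dbh, start, end):
    """(min r, max r) over acquisitions with end <= DBH <= start."""
    values = [r for dbh, r in r_by_dbh.items() if end <= dbh <= start]
    return min(values), max(values)


# Hypothetical pixels: red reflectance (B4) falls as yield (kg/ha) rises
b4 = [0.035, 0.041, 0.052, 0.060, 0.074]
yield_kg_ha = [7400, 6900, 6100, 5200, 4300]
print("r(B4, yield) =", round(pearson_r(b4, yield_kg_ha), 3))

# Hypothetical r of one band (e.g., B7) versus yield at each acquisition date
r_b7 = {85: 0.20, 78: 0.41, 70: 0.52, 62: 0.38, 45: 0.22,
        33: 0.47, 25: 0.66, 18: 0.59, 8: 0.12}
print("80-60 DBH:", r_range_in_window(r_b7, 80, 60))
print("35-15 DBH:", r_range_in_window(r_b7, 35, 15))
```

Reporting a min-max range per window, as in the text, keeps the date-to-date variability visible instead of averaging it away.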

In this study, the accuracy of the trained RF and XGB yield prediction models improved as the harvest date approached for both crops. The models were validated using an internal dataset (a random 20% of the 2020–2023 data) and an external dataset (the 2024 season, never seen during training). Internal validation results were similar for RF and XGB in both crops, with R² values of 0.876–0.961. Across both crops, MAPE ranged from 8.4% to 26.4%, MAE from 203.1 to 403.4 kg·ha⁻¹, and RMSE from 299.1 to 635.6 kg·ha⁻¹; errors were generally lower for barley, and the upper limit of each error range always corresponded to the RF wheat model at 80 DBH. External validation, on the other hand, showed that none of the 80 DBH models is reliable enough for forecasting in future seasons. From 50 DBH to 5 DBH, however, performance was good for both crops, with R² of 0.664–0.855. Across both crops, MAPE ranged from 19.3% to 46.2%, MAE from 515.9 to 924.0 kg·ha⁻¹, and RMSE from 697.2 to 1225.4 kg·ha⁻¹, with errors lower for wheat, which accounts for the lower limits of these ranges.
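The two validation schemes can be sketched as a single split function. This Python sketch assumes that each pixel record carries its harvest year; the record layout, function name, and seed are hypothetical, and the study's actual splitting code may differ (for example, in whether it stratifies by crop or location).

```python
import random


def split_internal_external(records, train_years=(2020, 2021, 2022, 2023),
                            external_year=2024, test_frac=0.2, seed=0):
    """Random internal hold-out from the training seasons plus a temporal external set.

    records: list of dicts, one per pixel, each with a 'year' key.
    Returns (train, internal_test, external).
    """
    external = [r for r in records if r["year"] == external_year]
    pool = [r for r in records if r["year"] in train_years]
    rng = random.Random(seed)
    rng.shuffle(pool)  # random pixel-level split of the 2020-2023 data
    n_test = round(test_frac * len(pool))
    return pool[n_test:], pool[:n_test], external


# Hypothetical dataset: 100 pixels per season, 2020-2024
records = [{"year": y, "pixel_id": i} for y in range(2020, 2025) for i in range(100)]
train, internal_test, external = split_internal_external(records)
print(len(train), len(internal_test), len(external))  # 320 80 100
```

Because the internal test pixels are drawn at random from the same seasons and fields as the training pixels, very similar neighbouring pixels can fall on both sides of the split. This is one plausible reason why internal scores exceed external ones. The 2024 hold-out avoids this by construction.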

Other authors have developed yield prediction models, mainly at the regional level, using RF, SVM, and various neural network architectures. Their performance varies, with R² values from 0.60 to 0.89 [4,6,7,36,37,38] and error metrics that differ widely depending on how the models were validated. However, much of this work validates the prediction models only through random data splitting, which offers no guarantee that they will perform well in future seasons. Examples include Sadenova et al. [7] for wheat (R² of 0.81–0.84 and MAE of 120–250 kg·ha⁻¹), Koparde et al. [36] for cereals (R² of 0.88 and an accuracy of 92%), and Okupska et al. [4] for cereals and rapeseed (R² of 0.77 and MAE < 600 kg·ha⁻¹). In contrast, other authors, such as Meroni et al. [37] (R² 0.82–0.89, RMSE 150–350 kg·ha⁻¹), Morales and Villalobos [38] (R² 0.65–0.75, MAE 800 kg·ha⁻¹), and Newlands et al. [6] (R² 0.60–0.85, MAPE 5–15%), have presented models validated on external data with positive results, but these required long time series (20–30 years) to achieve such low prediction errors. Even so, the 5 DBH results of this study, in both internal and external validation, are in line with these studies, suggesting that such models may be suitable for final yield predictions by government agencies or insurance companies.

In parallel, from a crop management perspective, prediction models at 50 DBH are of particular interest, because at this point in the phenological cycle yield estimates can help producers plan corrective measures, such as N fertilization. Furthermore, because the models are trained at the intra-field level, these corrective measures can be made site-specific, optimizing production inputs [10]. In this regard, the trained models may be of great interest, especially those for wheat, since external validation shows smaller prediction errors for wheat than for barley. Such models are less common in the literature and yield variable results, because they draw on different information sources (e.g., climatological data and reflectance measurements at different spatial resolutions) and different validation schemes. Nevertheless, studies of this type that also perform external validation report results consistent with those presented here, with R² values of 0.4–0.77 and RMSE values of 300–1600 kg·ha⁻¹ [8,10,39]. Again, the lowest errors are observed in the final stage of the crop, at the end of grain filling, in line with the present results. Models run approximately 2–3 months before harvest show higher errors, although moderate errors (400–700 kg·ha⁻¹) have been achieved for some crops by combining climatological data, soil type, and MODIS imagery [10]. However, the applicability of such approaches is limited: not all producers can acquire all of this information, and the moderate spatial resolution of MODIS requires sufficiently large plots. In addition, models based on higher-spatial-resolution data (UAV, Sentinel-2) often have the added limitations of being validated with small datasets, at few locations, and with random cross-validation [8,9,40,41].

Regarding the importance of the predictor variables, the bands in the visible, RE, and SWIR regions contribute most of the predictive power in both crops. In the early stages of the crop, the visible and RE bands, especially the visible ones, are more important owing to their sensitivity to chlorophyll [42]. The wheat results at 80 DBH are consistent with this, whereas for barley SWIR is of great importance at this stage, in agreement with other authors who have highlighted this spectral range as essential for barley prediction [10], especially at the beginning and end of the crop cycle, when the plant is susceptible to drought. In the intermediate stages (50–35 DBH), RE and visible remain the most important, again because of their ability to detect chlorophyll [10,41,43], while SWIR takes on a key role at the end of the crop cycle [10,44,45], especially at the end of grain filling (15–5 DBH), when the state of the vegetation is closely correlated with its water content. Only the XGB model for wheat deviated from this end-of-cycle trend, with the RE and visible bands retaining the highest predictive power.

Unlike most commercial UAV platforms that primarily capture imagery in the RGB spectrum, the use of Sentinel-2 in this study provides a significant advantage through its specific Red-Edge (RE) bands (B5, B6, B7). While RGB-based indices from UAVs often fail to detect subtle physiological changes once the canopy is closed, the results of the present study demonstrate that the RE region maintains high predictive power during intermediate stages (50–35 DBH) due to its sensitivity to chlorophyll and nitrogen status without the early saturation typical of RGB sensors [8]. This spectral richness, combined with the 10 m spatial resolution, allows the models to balance the high-detail requirements of field-level precision agriculture with the scalability needed for regional monitoring [8,41], which is logistically and economically unfeasible with UAV-based systems [9].

Overall, yield prediction models have been trained at the pixel level (10 × 10 m), with results in line with those of other studies. The best models (RF for wheat and XGB for barley) were obtained at the end of grain filling (5 DBH), so these predictions can be useful for agricultural insurance and market estimates. The 50 DBH models, despite their higher errors, can help farmers adjust management and plan variable-rate fertilization or biostimulant applications. The trained models address several limitations widely identified in the field: they are trained across multiple spatially separated locations, and they use Sentinel-2 reflectance bands as predictors alongside VIs, which highlights the limited contribution of VIs when working with high-spatial-resolution satellite data. In addition, these models make predictions at the pixel level and have been validated with data from a later season, entirely external to the training set.

In summary, the novelty of this research lies in addressing three critical gaps simultaneously: the spatial gap, by moving from aggregated regional statistics to pixel-level (10 m) predictions; the spectral gap, by demonstrating the superiority of multi-temporal reflectance bands (especially RE and SWIR) over traditional VIs; and the validation gap, by employing a strictly temporal external validation with future-year data, a more rigorous approach than the random cross-validation commonly found in the literature. This framework provides a scalable tool that bridges the gap between localized UAV studies and coarse-resolution satellite models.

To ensure that these models are practically useful, it is essential to make them accessible to producers. A key next step is to develop a user-friendly platform that can serve as a decision-support and management tool. Additionally, while this study primarily used Sentinel-2 data as predictor variables, future research could enhance predictive performance by incorporating other field-level information, such as soil properties or rainfall data. Furthermore, the continuous expansion of the databases utilized is crucial for building extensive temporal and spatial series, which will enable the training of more robust models. This approach will not only improve the performance of the algorithms presented in this study but also allow for the integration of advanced deep learning methods, which have shown promising results in predictive modeling. In this context, the proposed approach supports Sustainable Development Goal 12 (Responsible Consumption and Production) by promoting more efficient use of agricultural inputs, optimizing resource management, and fostering data-driven decision-making that contributes to more sustainable production systems.

5. Conclusions

This study developed and externally validated pixel-level (10 × 10 m) yield prediction models for wheat and barley using Random Forest (RF) and Extreme Gradient Boosting (XGB) algorithms and multi-temporal Sentinel-2 data. By operating at the spatial resolution of Sentinel-2, the proposed approach captures within-field yield variability and supports site-specific agricultural decision-making. The models were trained using data from four growing seasons (2020–2023) across seven spatially separated locations in Spain and validated on an independent growing season (2024), allowing inter-annual variability to be assessed under realistic conditions.

Both machine learning algorithms successfully estimated crop yield at the pixel level, with RF achieving the best performance for wheat and XGB for barley. Model accuracy increased as the crop cycle progressed, with the highest predictive performance obtained during the grain-filling stage. The best models achieved R² values of 0.771 ± 0.009 for wheat and 0.855 ± 0.003 for barley, with corresponding RMSE values of 697.2 ± 10.8 kg·ha⁻¹ and 744.2 ± 8.8 kg·ha⁻¹, and MAPE of 19.3 ± 0.5% and 26.0 ± 0.4%, respectively. Early-season forecasts (80 DBH) exhibited greater uncertainty, suggesting that predictions at this stage should be interpreted with caution. In contrast, forecasts from the grain-filling stage onwards provide more reliable yield estimates.

Feature importance analysis highlighted the relevance of the visible, red-edge, and Short-Wave Infrared (SWIR) spectral regions for yield prediction, with SWIR bands becoming particularly informative towards the end of grain filling. These findings suggest that vegetation indices and near-infrared bands may contribute limited additional information at high spatial resolution, depending on crop type and phenological stage.

Future work will focus on integrating these models into a user-friendly decision-support platform for producers. Although the models were trained exclusively on Sentinel-2 data, incorporating additional predictors, such as soil properties and precipitation, and longer temporal series could further improve model robustness and reduce prediction uncertainty. Moreover, enhancing model transferability through complementary field-level data and uncertainty reporting would strengthen their applicability for operational use. Overall, the proposed workflow provides a scalable framework for high-resolution yield forecasting and mapping, offering reliable late-season predictions that can support yield monitoring, agricultural insurance, and site-specific management of wheat and barley crops.

Author Contributions

Conceptualization, A.S.B., S.C.-I., C.R. and P.A.-G.; methodology, A.S.B., S.C.-I., C.R. and P.A.-G.; software, P.A.-G., S.C.-I. and E.C.-C.; validation, A.S.B., P.A.-G., S.C.-I., E.C.-C. and C.R.; formal analysis, A.S.B., P.A.-G., S.C.-I., E.C.-C. and C.R.; investigation, A.S.B., P.A.-G., S.C.-I., E.C.-C. and C.R.; resources, A.S.B. and C.R.; data curation, A.S.B., P.A.-G., S.C.-I., E.C.-C. and C.R.; writing—original draft preparation, A.S.B., P.A.-G. and S.C.-I.; writing—review and editing, A.S.B., P.A.-G., S.C.-I., E.C.-C. and C.R.; visualization, P.A.-G., S.C.-I. and E.C.-C.; supervision, A.S.B., P.A.-G., S.C.-I., E.C.-C. and C.R.; project administration, A.S.B.; funding acquisition, A.S.B. and C.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the PREDIC-PRO project SCPP2100C008733XV0 of the State Research Agency of the Ministry of Science, Innovation and Universities, and the ACIF Generalitat Valenciana, European Union (European Social Fund. Investing in Your Future) grant number CIACIF/2022/255.

Data Availability Statement

The Sentinel-2 data used in this study are openly available at https://browser.dataspace.copernicus.eu (accessed on 7 January 2025). The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

P.A.-G. acknowledges financial support from Generalitat Valenciana, European Union (European Social Fund. Investing in Your Future) through grant CIACIF/2022/255.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

MLR	Multiple Linear Regression
ML	Machine Learning
PA	Precision Agriculture
RF	Random Forest
SVM	Support Vector Machines
GB	Gradient Boosting
XGB	Extreme Gradient Boosting
VIs	Vegetation Indices
NDVI	Normalized Difference Vegetation Index
EVI	Enhanced Vegetation Index
GDVI	Green Difference Vegetation Index
KCC	Köppen Climate Classification
EU	European Union
DBH	Days Before Harvest
PPS	Principal Phenological Stage
MSI	Multi-Spectral Instrument
ESA	European Space Agency
BOA	Bottom of Atmosphere
VI	Vegetation Index
NDRE	Normalized Difference Red-Edge Index
RVI	Ratio Vegetation Index
CV	Cross-Validation
MAE	Mean Absolute Error
MAPE	Mean Absolute Percentage Error
RMSE	Root Mean Square Error
NIR	Near Infrared
SWIR	Short-Wave Infrared
RE	Red-Edge
RGB_Visible	Bands B2, B3, and B4
CI	Confidence Intervals
V. Dataset	Validation datasets
FI	Feature Importance
UAV	Unmanned Aerial Vehicle
MODIS	Moderate Resolution Imaging Spectroradiometer

Appendix A

This appendix details the selected yield data for wheat and barley production in Spain. Table A1 lists the number of available fields (N fields) and the productive surface (ha) for each crop, cropping area, and growing season, and Table A2 summarizes the elevation and elevation-difference statistics for each cropping area.

Table A1. Selected yield data for Spanish wheat and barley production: number of available fields (N fields) and productive surface (ha) for each crop, cropping area, and growing season.

Cropping Area	Year	Wheat N Fields	Wheat Surface (ha)	Barley N Fields	Barley Surface (ha)
Burgos	2020	6	0.6	6	3.5
	2021	43	115.0	43	90.8
	2022	36	124.2	44	49.0
	2023	25	84.6	58	117.3
	2024	31	48.4	49	151.6
Córdoba	2021	8	63.0	2	19.6
	2022	2	21.2	1	2.7
	2023	13	122.3	1	23.7
	2024	1	15.9	–	–
León	2023	27	44.2	–	–
	2024	32	36.5	–	–
Sevilla	2020	1	11.3	–	–
Soria	2021	88	389.9	84	213.7
	2023	3	4.1	120	118.1
	2024	4	14.8	76	107.8
Palencia	2020	38	111.9	59	177.1
	2021	40	119.4	52	155.0
	2022	31	88.3	53	140.0
	2023	44	96.0	83	236.4
	2024	30	71.4	39	117.8
Valladolid	2020	23	204.8	12	231.7
	2021	26	280.0	53	399.9
	2022	39	284.7	–	–
	2023	52	348.5	58	410.5
Total		643	2700.4	893	2765.9

Table A2. Elevation and elevation-difference statistics for the studied cropping areas. All statistics were computed over the available fields of each cropping area: Max and Min are the maximum and minimum elevation (or elevation difference) among those fields, and Mean and Median are the mean and median over all fields of the cropping area.

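Cropping Area	Elevation Max (m)	Elevation Min (m)	Elevation Mean (m)	Elevation Median (m)	Elevation Difference Max (m)	Elevation Difference Min (m)	Elevation Difference Mean (m)	Elevation Difference Median (m)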
Burgos	929.0	765.4	829.7	827.3	157.7	0.0	26.5	14.4
Córdoba	297.1	164.4	230.6	235.4	101.2	0.0	31.4	31.7
León	990.7	765.6	931.9	946.2	162.8	0.0	19.1	11.8
Sevilla	803.0	181.8	466.2	465.4	102.7	0.0	12.1	9.2
Soria	1110.9	801.0	945.6	952.2	103.9	0.0	8.2	5.3
Palencia	1002.7	765.6	878.7	871.7	176.3	0.0	28.9	18.6
Valladolid	803.0	675.5	746.8	747.3	111.2	0.0	28.6	23.7

References

Hossain, A.; Farhad, M.; Aonti, A.J.; Kabir, M.P.; Hossain, M.M.; Ahmed, B.; Haq, M.I.; Azim, J. Cereals production under changing climate. In Challenges and Solutions of Climate Impact on Agriculture; Elsevier: Amsterdam, The Netherlands, 2025; pp. 63–83.
Neupane, D.; Adhikari, P.; Bhattarai, D.; Rana, B.; Ahmed, Z.; Sharma, U.; Adhikari, D. Does climate change affect the yield of the top three cereals and food security in the world? Earth 2022, 3, 45–71.
Zhao, Y.; Potgieter, A.B.; Zhang, M.; Wu, B.; Hammer, G.L. Predicting wheat yield at the field scale by combining high-resolution Sentinel-2 satellite imagery and crop modelling. Remote Sens. 2020, 12, 1024.
Okupska, E.; Gozdowski, D.; Pudełko, R.; Wójcik-Gront, E. Cereal and Rapeseed Yield Forecast in Poland at Regional Level Using Machine Learning and Classical Statistical Models. Agriculture 2025, 15, 984.
Jeong, J.H.; Resop, J.P.; Mueller, N.D.; Fleisher, D.H.; Yun, K.; Butler, E.E.; Timlin, D.J.; Shim, K.M.; Gerber, J.S.; Reddy, V.R.; et al. Random forests for global and regional crop yield predictions. PLoS ONE 2016, 11, e0156571.
Newlands, N.K.; Zamar, D.S.; Kouadio, L.A.; Zhang, Y.; Chipanshi, A.; Potgieter, A.; Toure, S.; Hill, H.S. An integrated, probabilistic model for improved seasonal forecasting of agricultural crop yield under environmental uncertainty. Front. Environ. Sci. 2014, 2, 17.
Sadenova, M.; Beisekenov, N.; Varbanov, P.S.; Pan, T. Application of machine learning and neural networks to predict the yield of cereals, legumes, oilseeds and forage crops in Kazakhstan. Agriculture 2023, 13, 1195.
Li, F.; Miao, Y.; Chen, X.; Sun, Z.; Stueve, K.; Yuan, F. In-season prediction of corn grain yield through PlanetScope and Sentinel-2 images. Agronomy 2022, 12, 3176.
Kešelj, K.; Stamenković, Z.; Kostić, M.; Aćin, V.; Tekić, D.; Novaković, T.; Ivanišević, M.; Ivezić, A.; Magazin, N. Machine learning (AutoML)-driven wheat yield prediction for European varieties: Enhanced accuracy using multispectral UAV data. Agriculture 2025, 15, 1534.
Filippi, P.; Jones, E.J.; Wimalathunge, N.S.; Somarathna, P.D.; Pozza, L.E.; Ugbaje, S.U.; Jephcott, T.G.; Paterson, S.E.; Whelan, B.M.; Bishop, T.F. An approach to forecast grain crop yield using multi-layered, multi-farm data sets and machine learning. Precis. Agric. 2019, 20, 1015–1029.
Al-Shammari, D.; Whelan, B.M.; Wang, C.; Bramley, R.G.; Bishop, T.F. Assessment of red-edge based vegetation indices for crop yield prediction at the field scale across large regions in Australia. Eur. J. Agron. 2025, 164, 127479.
Ministerio de Agricultura, Pesca y Alimentación. Anuario de Estadística 2023. Capítulo 7: Cereales. 2023. Available online: https://www.mapa.gob.es/es/estadistica/temas/publicaciones/anuario-de-estadistica/2023?parte=3&capitulo=7&grupo=1&seccion=3 (accessed on 26 January 2026).
López-Bellido, L. Cereales, Cultivos Herbáceos. Esc. Técnica Super. Ing. Agrónomos. Univ. Córdoba 1991, 1, 539.
López-Bellido, R.J.; López-Bellido, L.; Benítez-Vega, J.; López-Bellido, F.J. Tillage system, preceding crop, and nitrogen fertilizer in wheat crop: II. Water utilization. Agron. J. 2007, 99, 66–72.
IGME. IGME. 2023. Available online: https://igme.maps.arcgis.com/home/webmap/viewer.html?useExisting=1 (accessed on 26 September 2025).
Bento, V.A.; Ribeiro, A.F.; Russo, A.; Gouveia, C.M.; Cardoso, R.M.; Soares, P.M. The impact of climate change in wheat and barley yields in the Iberian Peninsula. Sci. Rep. 2021, 11, 15484.
Cossani, C.M.; Slafer, G.A.; Savin, R. Nitrogen and water use efficiencies of wheat and barley under a Mediterranean environment in Catalonia. Field Crop. Res. 2012, 128, 109–118.
Acreche, M.M.; Briceño-Félix, G.; Sánchez, J.A.M.; Slafer, G.A. Physiological bases of genetic gains in Mediterranean bread wheat yield in Spain. Eur. J. Agron. 2008, 28, 162–170.
de Lima, V.J.; Gracia-Romero, A.; Rezzouk, F.Z.; Diez-Fraile, M.C.; Araus-Gonzalez, I.; Kamphorst, S.H.; do Amaral Júnior, A.T.; Kefauver, S.C.; Aparicio, N.; Araus, J.L. Comparative performance of high-yielding European wheat cultivars under contrasting Mediterranean conditions. Front. Plant Sci. 2021, 12, 687622.
Nasrallah, A.; Baghdadi, N.; El Hajj, M.; Darwish, T.; Belhouchette, H.; Faour, G.; Darwich, S.; Mhawej, M. Sentinel-1 data for winter wheat phenology monitoring and mapping. Remote Sens. 2019, 11, 2228.
Yang, C.; Fraga, H.; van Ieperen, W.; Trindade, H.; Santos, J.A. Effects of climate change and adaptation options on winter wheat yield under rainfed Mediterranean conditions in southern Portugal. Clim. Change 2019, 154, 159–178.
Arizo-García, P.; Castiñeira-Ibáñez, S.; Cruzado-Campos, E.; Ricarte, B.; Rubio, C.; San Bautista, A. A Standardized Framework for Cleaning Non-Normal Yield Data from Wheat and Barley Crops, and Validation Using Machine Learning Models for Satellite Imagery. Agronomy 2026, 16, 386.
European Space Agency. Sentinel-2 User Handbook; Technical Report; ESA Standard Document; ESA: Paris, France, 2015.
Kriegler, F.J. Preprocessing transformations and their effects on multispectral recognition. In Proceedings of the Sixth International Symposium on Remote Sensing of Environment, Ann Arbor, MI, USA, 13–16 October 1969; pp. 97–131.
Barnes, E.; Clarke, T.; Richards, S.; Colaizzi, P.; Haberland, J.; Kostrzewski, M.; Waller, P.; Choi, C.; Riley, E.; Thompson, T.; et al. Coincident detection of crop water stress, nitrogen status and canopy density using ground based multispectral data. In Proceedings of the Fifth International Conference on Precision Agriculture, Bloomington, MN, USA, 16–19 July 2000; Volume 1619.
Jordan, C.F. Derivation of leaf-area index from quality of light on the forest floor. Ecology 1969, 50, 663–666.
Huete, A.; Warrick, A. Assessment of vegetation and soil water regimes in partial canopies with optical remotely sensed data. Remote Sens. Environ. 1990, 32, 155–167.
Pullanagari, R.; Kereszturi, G.; Yule, I. Mapping of macro and micro nutrients of mixed pastures using airborne AisaFENIX hyperspectral imagery. ISPRS J. Photogramm. Remote Sens. 2016, 117, 1–10.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
Bami, Z.; Behnampour, A.; Doosti, H. A New Flexible Train-Test Split Algorithm, an approach for choosing among the Hold-out, K-fold cross-validation, and Hold-out iteration. arXiv 2025, arXiv:2501.06492.
Hatfield, J.L.; Gitelson, A.A.; Schepers, J.S.; Walthall, C.L. Application of spectral remote sensing for agronomic decisions. Agron. J. 2008, 100, S-117–S-131.
Jacques, D.C.; Kergoat, L.; Hiernaux, P.; Mougin, E.; Defourny, P. Monitoring dry vegetation masses in semi-arid areas with MODIS SWIR bands. Remote Sens. Environ. 2014, 153, 40–49.
Fita, D.; Rubio, C.; Uris, A.; Castiñeira-Ibáñez, S.; Franch, B.; Tarrazó-Serrano, D.; San Bautista, A. Remote Sensor Images and Vegetation Indices to Optimize Rice Yield Analysis for Specific Growth Stages Within Extensive Data. Appl. Sci. 2025, 15, 3870.
Arizo-García, P.; Castiñeira-Ibáñez, S.; Tarrazó-Serrano, D.; Franch, B.; Rubio, C.; San Bautista, A. Use of Sentinel-2 Images to Elaborate a VRT Sensor-Based and Map-Based Nitrogen Fertilization in Wheat and Barley Crops. Appl. Sci. 2025, 15, 11646.
Koparde, S.; Behare, A.; Kasare, S.; Patil, J.; Nadar, K. Crop Yield Prediction for Cereals using Machine Learning. In Advancements in Communication and Systems; Soft Computing Research Society: New Delhi, India, 2024; pp. 407–418.
Meroni, M.; Waldner, F.; Seguini, L.; Kerdiles, H.; Rembold, F. Yield forecasting with machine learning and small data: What gains for grains? Agric. For. Meteorol. 2021, 308, 108555.
Morales, A.; Villalobos, F.J. Using machine learning for crop yield prediction in the past or the future. Front. Plant Sci. 2023, 14, 1128388.
Lobell, D.B.; Thau, D.; Seifert, C.; Engle, E.; Little, B. A scalable satellite-based crop yield mapper. Remote Sens. Environ. 2015, 164, 324–333.
Hollinger, D.L. Crop Condition and Yield Prediction at the Field Scale with Geospatial and Artificial Neural Network Applications. PhD Thesis, Kent State University, Kent, OH, USA, 2011.
Son, N.T.; Chen, C.F.; Cheng, Y.S.; Toscano, P.; Chen, C.R.; Chen, S.L.; Tseng, K.H.; Syu, C.H.; Guo, H.Y.; Zhang, Y.T. Field-scale rice yield prediction from Sentinel-2 monthly image composites using machine learning algorithms. Ecol. Inform. 2022, 69, 101618.
Camenzind, M.P.; Yu, K. Multi temporal multispectral UAV remote sensing allows for yield assessment across European wheat varieties already before flowering. Front. Plant Sci. 2024, 14, 1214931.
Barzin, R.; Pathak, R.; Lotfi, H.; Varco, J.; Bora, G.C. Use of UAS multispectral imagery at different physiological stages for yield prediction and input resource optimization in corn. Remote Sens. 2020, 12, 2392.
Medar, R.; Rajpurohit, V.S.; Rashmi, B. Impact of training and testing data splits on accuracy of time series forecasting in machine learning. In Proceedings of the 2017 International Conference on Computing, Communication, Control and Automation (ICCUBEA), Pune, India, 17–18 August 2017; pp. 1–6.
Marshall, M.; Belgiu, M.; Boschetti, M.; Pepe, M.; Stein, A.; Nelson, A. Field-level crop yield estimation with PRISMA and Sentinel-2. ISPRS J. Photogramm. Remote Sens. 2022, 187, 191–210.

Figure 1. Location of the studied cropping areas. The arrows indicate the provinces where the analyzed sites are located. Wheat fields are outlined in red and barley fields in pink for each location and growing season included in the study.

Figure 2. Phenological cycle of winter wheat and barley crops under Spanish conditions. The crop stage and the principal phenological stage (PPS) are expressed in days before harvest (DBH).

Figure 3. Study workflow, composed of three stages: building the modeling dataset, analyzing the available data, and training and validating the models.

Figure 4. Evolution of the correlation coefficient (r) between Sentinel-2 reflectance bands and observed yield at different prediction moments (DBH) for wheat across the studied growing seasons: (a) 2020, (b) 2021, (c) 2022, (d) 2023, and (e) 2024.

Figure 5. Evolution of the correlation coefficient (r) between Sentinel-2 reflectance bands and observed yield at different prediction moments (DBH) for barley across the studied growing seasons: (a) 2020, (b) 2021, (c) 2022, (d) 2023, and (e) 2024.

Figure 6. Evolution of the coefficient of determination (R²) of the models in external validation as a function of DBH for wheat.

Figure 7. Evolution of the coefficient of determination (R²) of the models in external validation as a function of DBH for barley.

Figure 8. Comparison of variable importance in the wheat prediction models, (a) RF and (b) XGB, across the DBH scenarios (5, 15, 35, 50, and 80 DBH). NIR: bands B8 and B8A; RGB_Visible: bands B2–B4; RE: bands B5–B7; SWIR: bands B11 and B12; VI: the selected vegetation indices.

Figure 9. Comparison of variable importance in the barley prediction models, (a) RF and (b) XGB, across the DBH scenarios (5, 15, 35, 50, and 80 DBH). NIR: bands B8 and B8A; RGB_Visible: bands B2–B4; RE: bands B5–B7; SWIR: bands B11 and B12; VI: the selected vegetation indices.
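Figures 8 and 9 report importance aggregated into band groups. The captions do not state how per-feature importances were combined; a simple option, assumed here, is to sum them within each group and normalize. The per-feature scores (for example, from a fitted model's `feature_importances_` attribute in scikit-learn or XGBoost) and the vegetation-index names below are hypothetical.

```python
# Band groups as defined in the figure captions; VI member names are placeholders
GROUPS = {
    "NIR": ["B08", "B8A"],
    "RGB_Visible": ["B02", "B03", "B04"],
    "RE": ["B05", "B06", "B07"],
    "SWIR": ["B11", "B12"],
    "VI": ["NDVI", "NDRE"],  # hypothetical index names
}


def grouped_importance(feature_importance, groups=GROUPS):
    """Sum per-feature importance within each band group, then normalize
    so that the group shares add up to 1."""
    totals = {
        group: sum(feature_importance.get(f, 0.0) for f in members)
        for group, members in groups.items()
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("all importances are zero")
    return {group: value / grand_total for group, value in totals.items()}


# Illustrative per-feature scores (not study results)
importances = {
    "B02": 0.05, "B03": 0.05, "B04": 0.10,
    "B05": 0.05, "B06": 0.05, "B07": 0.10,
    "B08": 0.10, "B8A": 0.20,
    "B11": 0.10, "B12": 0.10,
    "NDVI": 0.05, "NDRE": 0.05,
}
shares = grouped_importance(importances)
print(shares)
```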

Table 1. Characteristics of the Sentinel-2 bands used in this study.

Sentinel-2 Band	Central Wavelength (nm)	Spatial Resolution (m)
B02—Blue	450	10
B03—Green	560	10
B04—Red	665	10
B05—Vegetation Red-Edge	705	20
B06—Vegetation Red-Edge	740	20
B07—Vegetation Red-Edge	783	20
B08—Near-Infrared (NIR)	842	10
B8A—Narrow NIR	865	20
B11—SWIR	1610	20
B12—SWIR	2190	20
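The bands in Table 1 are also the inputs to vegetation indices. The indices actually selected are not listed in this section; purely as an illustration, three standard formulations are the normalized difference vegetation index (NDVI, NIR B08 vs. red B04), the simple ratio (SR, NIR/red), and a red-edge normalized difference (NDRE, here assumed to pair B08 with B05; other pairings are also used). Because B05 is delivered at 20 m, the sketch assumes all bands have already been resampled to a common 10 m grid.

```python
def normalized_difference(a, b):
    """Generic (a - b) / (a + b) index; returns None when a + b == 0."""
    total = a + b
    return None if total == 0 else (a - b) / total


def vegetation_indices(px):
    """px: surface reflectance (0-1) per band for one pixel, keyed by the
    Table 1 band names and assumed resampled to a common 10 m grid."""
    return {
        "NDVI": normalized_difference(px["B08"], px["B04"]),
        "SR": px["B08"] / px["B04"] if px["B04"] else None,
        "NDRE": normalized_difference(px["B08"], px["B05"]),  # assumed band pair
    }


pixel = {"B04": 0.05, "B05": 0.15, "B08": 0.45}  # illustrative reflectances
vi = vegetation_indices(pixel)
print(vi)
```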

Table 2. Image acquisition dates (month/day) selected for each location and year, and their equivalent in days before harvest (DBH).

Location	Year	110 DBH	95 DBH	80 DBH	65 DBH	50 DBH	35 DBH	15 DBH	5 DBH
Burgos	2020	2/26	–	3/27	–	–	5/21	6/5	6/20
	2021	–	3/27	4/6	4/16	5/6	–	6/15	6/25
	2022	–	–	–	4/16	–	5/11	–	6/15
	2023	–	–	3/27	4/6	5/1	–	–	6/15
	2024	–	–	4/15	–	5/10	6/4	6/24	7/4
Córdoba	2021	–	2/15	3/2	3/12	4/1	–	5/1	5/16
	2022	2/15	3/2	–	–	4/16	5/1	5/26	5/31
	2023	–	3/2	3/22	4/3	4/18	5/1	–	6/5
	2024	1/21	2/2	2/20	3/6	–	–	–	5/10
León	2023	–	3/27	4/6	4/16	5/6	5/16	–	6/25
	2024	–	4/10	4/20	4/20	5/10	5/25	6/24	7/4
Sevilla	2020	2/3	2/16	2/28	3/12	4/1	–	5/6	5/21
Soria	2021	–	3/24	4/8	4/8	5/8	5/8	6/12	6/12
	2023	3/4	–	4/3	4/18	–	5/18	–	6/27
	2024	–	–	4/12	4/22	–	–	–	7/1
Palencia	2020	2/26	–	3/27	–	–	5/21	6/5	6/20
	2021	–	3/27	4/6	4/16	5/6	–	6/15	6/25
	2022	–	–	–	4/6	–	5/11	5/26	6/15
	2023	–	–	3/27	–	5/1	–	–	6/15
	2024	–	4/15	4/15	5/10	5/10	6/4	6/24	7/4
Valladolid	2020	2/21	–	–	–	–	5/6	5/21	6/5
	2021	–	3/7	3/17	4/6	4/16	5/6	–	5/31
	2023	3/12	3/27	4/6	–	5/1	–	–	6/25
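Table 2 does not state how acquisition dates were matched to each nominal DBH, but repeated dates (e.g., León 2024 at 80 and 65 DBH, Soria 2021 at 35 and 50 DBH) suggest each scenario was filled with a nearby available image. A minimal sketch of such nearest-date matching follows; the known harvest date, the list of usable images and the `tolerance_days` cut-off are all assumptions, and the example dates are hypothetical rather than a reproduction of the table.

```python
from datetime import date, timedelta
from typing import Optional

# Nominal DBH scenarios used as columns in Table 2
DBH_SCENARIOS = (110, 95, 80, 65, 50, 35, 15, 5)


def assign_images_to_dbh(harvest, acquisitions, tolerance_days=7):
    """Map each nominal DBH to the closest pre-harvest image date.

    harvest: harvest date (assumed known per field and season).
    acquisitions: dates of usable (e.g., cloud-free) Sentinel-2 images.
    tolerance_days: hypothetical maximum gap; beyond it the slot stays empty.
    """
    pre_harvest = [d for d in acquisitions if d <= harvest]
    assignment: dict[int, Optional[date]] = {}
    for dbh in DBH_SCENARIOS:
        target = harvest - timedelta(days=dbh)
        best = min(pre_harvest, key=lambda d: abs((d - target).days), default=None)
        if best is not None and abs((best - target).days) > tolerance_days:
            best = None  # no image close enough: shown as "–" in the table
        assignment[dbh] = best
    return assignment


slots = assign_images_to_dbh(
    harvest=date(2024, 7, 9),  # hypothetical harvest date
    acquisitions=[date(2024, 6, 4), date(2024, 6, 24), date(2024, 7, 4)],
)
for dbh, img in slots.items():
    print(dbh, img.strftime("%m/%d") if img else "–")
```

With this rule, one image can fill two neighboring slots whenever it lies within the tolerance of both targets.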

Table 3. Dataset size for each crop and growing season. The size is reported as the number of records or rows (Records) and total predictors or columns (N).

Crop	2020 Records	2021 Records	2022 Records	2023 Records	2024 Records	Total Records	Total N (×10⁶)
Wheat	32,847	96,711	51,828	69,947	18,703	270,036	30.2
Barley	41,230	87,889	19,160	90,590	37,721	273,590	30.9

Table 4. Ranges of values used to tune the RF and XGB models, and the best hyperparameters (Best HP) selected for wheat and barley.

Algorithm	Hyperparameter	Range of Values	Best HP (Wheat)	Best HP (Barley)
RF	n_estimators	[200, 400, 600, 800, 1000]	600	1000
	max_depth	[None, 4, 6, 8, 10, 12]	None	None
	min_samples_split	[2, 3, 5, 8, 10]	8	10
	min_samples_leaf	[1, 2, 3, 4, 5]	5	5
	max_features	[“sqrt”, “log2”, 0.3, 0.5, 0.7]	0.3	0.5
	bootstrap	[True, False]	False	True
	random_state	–	42	42
	n_jobs	–	−1	−1
XGB	n_estimators	[300, 500, 700, 900, 1200]	900	900
	learning_rate	[0.01, 0.03, 0.05, 0.08, 0.1, 0.15]	0.15	0.15
	max_depth	[3, 4, 5, 6, 7, 8, 9, 10]	10	10
	subsample	[0.6, 0.7, 0.8, 0.9, 1.0]	1.0	1.0
	colsample_bytree	[0.6, 0.7, 0.8, 0.9, 1.0]	0.9	0.9
	gamma	[0, 0.05, 0.1, 0.2, 0.3]	0.2	0.2
	reg_alpha	[0, 0.1, 0.3, 0.5, 0.8]	0.1	0.1
	reg_lambda	[0.1, 0.3, 0.5, 1.0, 2.0]	0.5	0.5
	random_state	–	42	42
	n_jobs	–	−1	−1
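The ranges and selected values in Table 4 can be written as plain Python configuration. As a sketch, assuming scikit-learn's `RandomForestRegressor` and the `xgboost` package's `XGBRegressor` (versions in Table 5), the code below also counts how many combinations an exhaustive grid over these ranges would contain: 7,500 for RF and 750,000 for XGB. Whether the study searched the full grid or sampled it is not stated in this section and is not assumed here.

```python
from math import prod

# Search spaces from Table 4
RF_SPACE = {
    "n_estimators": [200, 400, 600, 800, 1000],
    "max_depth": [None, 4, 6, 8, 10, 12],
    "min_samples_split": [2, 3, 5, 8, 10],
    "min_samples_leaf": [1, 2, 3, 4, 5],
    "max_features": ["sqrt", "log2", 0.3, 0.5, 0.7],
    "bootstrap": [True, False],
}
XGB_SPACE = {
    "n_estimators": [300, 500, 700, 900, 1200],
    "learning_rate": [0.01, 0.03, 0.05, 0.08, 0.1, 0.15],
    "max_depth": [3, 4, 5, 6, 7, 8, 9, 10],
    "subsample": [0.6, 0.7, 0.8, 0.9, 1.0],
    "colsample_bytree": [0.6, 0.7, 0.8, 0.9, 1.0],
    "gamma": [0, 0.05, 0.1, 0.2, 0.3],
    "reg_alpha": [0, 0.1, 0.3, 0.5, 0.8],
    "reg_lambda": [0.1, 0.3, 0.5, 1.0, 2.0],
}
FIXED = {"random_state": 42, "n_jobs": -1}

# Best hyperparameters reported in Table 4
XGB_BEST = {"n_estimators": 900, "learning_rate": 0.15, "max_depth": 10, "subsample": 1.0,
            "colsample_bytree": 0.9, "gamma": 0.2, "reg_alpha": 0.1, "reg_lambda": 0.5}
BEST = {
    ("RF", "wheat"): {"n_estimators": 600, "max_depth": None, "min_samples_split": 8,
                      "min_samples_leaf": 5, "max_features": 0.3, "bootstrap": False},
    ("RF", "barley"): {"n_estimators": 1000, "max_depth": None, "min_samples_split": 10,
                       "min_samples_leaf": 5, "max_features": 0.5, "bootstrap": True},
    ("XGB", "wheat"): XGB_BEST,
    ("XGB", "barley"): XGB_BEST,
}


def grid_size(space):
    """Number of combinations an exhaustive grid search would evaluate."""
    return prod(len(values) for values in space.values())


def build_model(algorithm, crop):
    """Instantiate a regressor with the Table 4 settings (needs scikit-learn or xgboost)."""
    params = {**BEST[(algorithm, crop)], **FIXED}
    if algorithm == "RF":
        from sklearn.ensemble import RandomForestRegressor
        return RandomForestRegressor(**params)
    from xgboost import XGBRegressor
    return XGBRegressor(**params)


print(grid_size(RF_SPACE), grid_size(XGB_SPACE))
```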

Table 5. Software, programming languages, and libraries used in the study.

Software/Library	Version	Developer/Organization	Official URL
Platforms
Google Earth Engine	–	Google	https://earthengine.google.com (accessed on 12 January 2025)
QGIS	3.34.6	QGIS Development Team	https://qgis.org (accessed on 21 June 2025)
Visual Studio Code	1.101.2	Microsoft	https://code.visualstudio.com (accessed on 20 October 2025)
Languages
Python	3.11.7	Python Software Foundation	https://www.python.org (accessed on 5 January 2025)
Python Libraries
pandas	2.2.3	NumFOCUS / Wes McKinney	https://pandas.pydata.org (accessed on 5 March 2025)
geopandas	1.0.1	GeoPandas contributors	https://geopandas.org (accessed on 21 October 2025)
numpy	2.2.6	NumPy Developers	https://numpy.org (accessed on 21 October 2025)
scikit-learn	1.6.1	scikit-learn Developers	https://scikit-learn.org (accessed on 21 October 2025)
xgboost	3.0.5	XGBoost Developers	https://xgboost.ai (accessed on 21 October 2025)
matplotlib	3.10.3	Matplotlib Development Team	https://matplotlib.org (accessed on 21 October 2025)
seaborn	0.13.2	Michael Waskom / PyData	https://seaborn.pydata.org (accessed on 21 October 2025)
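To reproduce the environment in Table 5, installed library versions can be compared with the listed ones. A minimal sketch using the standard library's `importlib.metadata`; the check is an illustration and not part of the study's workflow.

```python
from importlib.metadata import PackageNotFoundError, version

# Library versions listed in Table 5 (PyPI distribution names)
PINNED = {
    "pandas": "2.2.3",
    "geopandas": "1.0.1",
    "numpy": "2.2.6",
    "scikit-learn": "1.6.1",
    "xgboost": "3.0.5",
    "matplotlib": "3.10.3",
    "seaborn": "0.13.2",
}


def check_environment(pinned=PINNED):
    """Return {distribution: (status, installed_version)} where status is
    'ok', 'mismatch' or 'missing'."""
    report = {}
    for dist, wanted in pinned.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            report[dist] = ("missing", None)
            continue
        report[dist] = ("ok" if installed == wanted else "mismatch", installed)
    return report


rep = check_environment()
for dist, (status, installed) in rep.items():
    print(f"{dist:<14} {status:<9} {installed or '-'}")
```

The same pins can also be kept in a requirements file with lines such as `scikit-learn==1.6.1`.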

Table 6. Performance of the wheat yield prediction models trained with RF at each DBH scenario. Metrics are reported for both validation datasets (V. Dataset): the internal validation dataset (2020–2023) and the external validation dataset (2024).

DBH	V. Dataset	R²	MAPE (%)	MAE (kg·ha⁻¹)	RMSE (kg·ha⁻¹)
80	2020–2023	0.876 ± 0.004	26.4 ± 5.4	403.4 ± 4.5	635.6 ± 8.8
80	2024	0.280 ± 0.016	47.7 ± 1.7	875.6 ± 13.0	1236.5 ± 16.4
50	2020–2023	0.908 ± 0.003	19.2 ± 5.6	318.8 ± 3.7	546.5 ± 9.0
50	2024	0.664 ± 0.009	29.2 ± 0.9	627.8 ± 8.1	844.7 ± 11.2
35	2020–2023	0.932 ± 0.002	16.8 ± 4.0	278.0 ± 3.3	471.3 ± 8.4
35	2024	0.747 ± 0.008	24.9 ± 0.7	559.3 ± 7.5	732.5 ± 9.6
15	2020–2023	0.935 ± 0.002	16.1 ± 3.8	238.5 ± 3.4	460.2 ± 7.9
15	2024	0.782 ± 0.007	20.0 ± 0.4	519.3 ± 6.7	680.1 ± 9.7
5	2020–2023	0.940 ± 0.002	15.5 ± 3.8	252.4 ± 3.0	441.5 ± 8.3
5	2024	0.771 ± 0.009	19.3 ± 0.5	515.9 ± 7.6	697.2 ± 10.8
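Tables 6–9 report R², MAPE, MAE and RMSE. A minimal sketch of these metrics under their common textbook definitions follows. It assumes R² is the coefficient of determination (1 − SS_res/SS_tot), which some studies replace with the squared Pearson correlation, so it should not be read as the paper's exact implementation; the yield values are invented for illustration.

```python
from math import sqrt


def regression_metrics(observed, predicted):
    """R², MAPE (%), MAE and RMSE; MAE and RMSE share the yield units (e.g., kg/ha)."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MAPE": 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": sqrt(ss_res / n),
    }


obs = [4000.0, 5000.0, 6000.0, 7000.0]   # illustrative yields, kg/ha
pred = [4200.0, 4800.0, 6300.0, 6700.0]
m = regression_metrics(obs, pred)
print(m)
```

Because MAPE divides each error by the observed yield, low-yield pixels weigh more heavily in it than in MAE or RMSE.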

Table 7. Performance of the wheat yield prediction models trained with XGB at each DBH scenario. Metrics are reported for both validation datasets (V. Dataset): the internal validation dataset (2020–2023) and the external validation dataset (2024).

DBH	V. Dataset	R²	MAPE (%)	MAE (kg·ha⁻¹)	RMSE (kg·ha⁻¹)
80	2020–2023	0.886 ± 0.003	24.9 ± 4.8	392.7 ± 4.4	610.2 ± 8.6
80	2024	0.223 ± 0.016	49.5 ± 1.6	933.2 ± 12.5	1284.9 ± 16.6
50	2020–2023	0.914 ± 0.003	18.9 ± 5.3	315.4 ± 3.8	530.1 ± 8.9
50	2024	0.683 ± 0.009	27.4 ± 0.9	621.3 ± 8.0	820.9 ± 11.2
35	2020–2023	0.932 ± 0.002	17.1 ± 4.0	281.9 ± 3.4	472.5 ± 8.4
35	2024	0.747 ± 0.008	20.8 ± 0.5	555.0 ± 6.7	732.5 ± 9.8
15	2020–2023	0.934 ± 0.003	16.2 ± 3.8	274.3 ± 3.3	464.0 ± 9.1
15	2024	0.766 ± 0.008	19.2 ± 0.4	520.8 ± 6.9	704.8 ± 10.8
5	2020–2023	0.938 ± 0.002	15.4 ± 4.1	261.4 ± 3.1	450.1 ± 8.7
5	2024	0.707 ± 0.009	25.3 ± 0.6	588.2 ± 8.1	789.3 ± 11.5

Table 8. Performance of the barley yield prediction models trained with RF at each DBH scenario. Metrics are reported for both validation datasets (V. Dataset): the internal validation dataset (2020–2023) and the external validation dataset (2024).

DBH	V. Dataset	R²	MAPE (%)	MAE (kg·ha⁻¹)	RMSE (kg·ha⁻¹)
80	2020–2023	0.892 ± 0.003	15.3 ± 0.2	349.0 ± 3.1	497.0 ± 5.5
80	2024	0.333 ± 0.006	92.0 ± 1.3	1487.0 ± 9.9	1795.7 ± 10.1
50	2020–2023	0.927 ± 0.002	11.6 ± 0.1	277.1 ± 2.7	409.6 ± 5.2
50	2024	0.689 ± 0.006	46.2 ± 0.8	924.0 ± 8.6	1225.4 ± 11.6
35	2020–2023	0.946 ± 0.002	10.1 ± 0.1	238.8 ± 2.3	351.1 ± 4.9
35	2024	0.724 ± 0.005	37.4 ± 0.7	837.7 ± 8.4	1154.4 ± 11.0
15	2020–2023	0.955 ± 0.001	9.5 ± 0.1	230.0 ± 2.2	338.8 ± 4.2
15	2024	0.815 ± 0.004	32.3 ± 0.6	666.1 ± 7.1	945.2 ± 10.3
5	2020–2023	0.955 ± 0.001	9.3 ± 0.1	216.3 ± 2.1	332.0 ± 4.5
5	2024	0.838 ± 0.003	33.0 ± 0.7	631.0 ± 6.4	886.0 ± 9.7

Table 9. Performance of the barley yield prediction models trained with XGB at each DBH scenario. Metrics are reported for both validation datasets (V. Dataset): the internal validation dataset (2020–2023) and the external validation dataset (2024).

DBH	V. Dataset	R²	MAPE (%)	MAE (kg·ha⁻¹)	RMSE (kg·ha⁻¹)
80	2020–2023	0.910 ± 0.002	13.6 ± 0.2	316.0 ± 3.0	452.5 ± 5.2
80	2024	0.405 ± 0.007	83.6 ± 1.2	1356.8 ± 10.2	1695.8 ± 10.7
50	2020–2023	0.942 ± 0.002	10.2 ± 0.1	250.2 ± 2.4	365.5 ± 4.6
50	2024	0.759 ± 0.005	38.2 ± 0.7	798.2 ± 7.2	1079.4 ± 10.5
35	2020–2023	0.955 ± 0.001	9.0 ± 0.1	219.6 ± 2.2	321.5 ± 4.3
35	2024	0.788 ± 0.004	32.9 ± 0.6	742.4 ± 6.7	1013.6 ± 9.8
15	2020–2023	0.958 ± 0.001	8.8 ± 0.1	212.9 ± 2.1	311.2 ± 4.1
15	2024	0.860 ± 0.003	26.6 ± 0.5	586.3 ± 5.8	822.3 ± 9.7
5	2020–2023	0.961 ± 0.001	8.4 ± 0.1	203.1 ± 2.0	299.1 ± 3.6
5	2024	0.855 ± 0.003	26.0 ± 0.4	529.3 ± 5.4	744.2 ± 8.8
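The performance tables distinguish internal validation on 2020–2023 from external validation on the 2024 season. A minimal sketch of such a season-based hold-out follows; the 20% internal fraction, the pixel-level random shuffle, the seed and the record layout are assumptions, not the study's documented procedure.

```python
import random


def split_by_season(records, external_year=2024, internal_fraction=0.2, seed=42):
    """Hold out one season as external validation and split the remaining
    seasons' records at random into training and internal validation.

    records: list of dicts with a 'year' key (hypothetical layout).
    Returns (train, internal_validation, external_validation).
    """
    external = [r for r in records if r["year"] == external_year]
    pool = [r for r in records if r["year"] != external_year]
    rng = random.Random(seed)
    rng.shuffle(pool)  # shuffles the new list, not the caller's records
    n_val = round(len(pool) * internal_fraction)
    return pool[n_val:], pool[:n_val], external


years = [2020] * 10 + [2021] * 10 + [2024] * 5
records = [{"id": i, "year": y} for i, y in enumerate(years)]
train, internal_val, external_val = split_by_season(records)
print(len(train), len(internal_val), len(external_val))
```

Holding out a whole season keeps its fields and weather year entirely out of training, whereas a random pixel split of 2020–2023 places spatially neighboring pixels in both subsets.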


Share and Cite

MDPI and ACS Style

Arizo-García, P.; Castiñeira-Ibáñez, S.; Cruzado-Campos, E.; San Bautista, A.; Rubio, C. High-Resolution Wheat and Barley Yield Forecasting Using Multi-Temporal Satellite Time Series and Machine Learning. Agriculture 2026, 16, 516. https://doi.org/10.3390/agriculture16050516

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
