Article

Temporal Dynamics of UAV Multispectral Vegetation Indices for Accurate Machine Learning-Based Wheat Yield Prediction

1
Faculty of Agriculture, University of Novi Sad, Trg Dositeja Obradovića 8, 21000 Novi Sad, Serbia
2
Institute of Field and Vegetable Crops, Maksima Gorkog 30, 21000 Novi Sad, Serbia
3
BioSense Institute, University of Novi Sad, Dr. Zorana Đinđića 1, 21000 Novi Sad, Serbia
*
Author to whom correspondence should be addressed.
AgriEngineering 2026, 8(2), 71; https://doi.org/10.3390/agriengineering8020071
Submission received: 19 December 2025 / Revised: 22 January 2026 / Accepted: 9 February 2026 / Published: 16 February 2026

Abstract

Accurate wheat yield prediction is essential for ensuring food security and sustainable resource management under the increasing challenges of climate change. This study investigates the integration of unmanned aerial vehicle (UAV)-based multispectral imaging and machine learning (ML) techniques to improve yield forecasting in European wheat cultivars. Field experiments were conducted on 400 sub-plots with varying NPK fertilization regimes and five wheat varieties, monitored across six phenological stages during the 2023 growing season in Vojvodina, Serbia. A DJI Phantom 4 Multispectral UAV collected high-resolution imagery, from which 65 vegetation indices were computed. Using PyCaret’s automated ML framework, 25 regression algorithms were evaluated for yield prediction. Ensemble models, particularly Random Forest, Extra Trees, Gradient Boosting, and LightGBM, consistently outperformed linear and kernel-based approaches. The highest prediction accuracy was achieved with the Random Forest Regressor during full flowering (BBCH 65–69), yielding an R2 of 0.952 and an RMSE of 0.44 t/ha. Results highlight the temporal dynamics of model performance, with optimal predictions occurring during reproductive stages. The findings confirm that UAV-derived multispectral data, coupled with ensemble machine learning, provide a non-invasive, accurate, and computationally efficient method for yield forecasting. This framework has significant potential for supporting precision agriculture, enabling real-time decision-making, and enhancing the resilience of wheat production systems.

1. Introduction

Wheat is a crucial cereal crop and a primary source of calories and proteins for a significant share of the global population [1,2]. It plays an essential role in food security, particularly in regions where wheat-based products are dietary staples. Recent studies indicate that wheat contributes to feeding approximately 40% of the world’s population, underscoring the importance of reliable yield prediction for maintaining stable food supplies and supporting agricultural management strategies [3,4]. Accurate yield estimation is vital for optimizing resource allocation, stabilizing production and supply chains, and reducing the risks associated with fluctuations in food production [5].
In the context of Serbian wheat production, average yields typically range from 4 to 6 t/ha, although depending on the year and production conditions they may vary from as low as 3 t/ha to as high as 10 t/ha.
Wheat productivity is influenced by climatic conditions, soil fertility, and management practices. Variability in temperature, precipitation patterns, and occurrences of stress conditions can affect plant development and final yield, which highlights the need for timely and precise monitoring tools [3]. Additionally, suboptimal resource management, such as inefficient fertilizer application or irrigation scheduling, may contribute to yield instability. The increased availability of remote sensing technologies and data-driven analytics offers opportunities to monitor crop health more effectively and support improved decision-making [6,7,8].
Traditional yield estimation methods rely on field observations, crop modeling, and expert judgment. While these approaches have been widely used, they are often time-consuming, labor-intensive, and limited in their ability to capture spatial variability at the field or regional scale [5]. Remote sensing, particularly using unmanned aerial vehicles (UAVs), has introduced new possibilities for non-destructive, high-resolution monitoring of crop growth dynamics [7]. UAV-mounted multispectral sensors allow for detailed analysis of canopy reflectance characteristics, enabling the computation of vegetation indices that reflect biomass accumulation, chlorophyll content, and the physiological status of crops [8,9,10]. Such indices facilitate early detection of crop stress related to drought, nutrient imbalance, or disease pressure, enabling timely interventions to protect yield [11].
Satellite data have long been utilized for crop monitoring [11,12], yet constraints related to cloud cover, spatial resolution, and revisit frequency can limit their utility for field-scale management. UAV systems provide high spatial and temporal resolution and can be deployed on demand to capture crop conditions at critical growth stages [7]. Studies have shown that UAV-derived vegetation indices may outperform satellite-derived data in yield estimation tasks, particularly when combined with machine learning (ML) models tailored to the specific production environment [6,13].
Machine learning has significantly improved predictive performance in agricultural research, enabling the integration of diverse agronomic, spectral, and environmental variables [3,4,11]. However, challenges remain regarding generalization, data requirements, and selection of optimal modeling techniques [6]. Various ML approaches such as Long Short-Term Memory (LSTM) networks, Random Forest (RF), Extreme Gradient Boosting (XGBoost), Bayesian models, Support Vector Machines (SVMs), and Multi-Layer Perceptron (MLP) networks have been used for yield prediction, each exhibiting strengths and limitations [5,7,14,15]. For instance, ref. [4] evaluated multiple models using multispectral, RGB, and thermal features, but reported constraints related to model performance and prediction reliability. Hybrid and deep learning strategies have shown incremental improvements [8], while ref. [13] demonstrated that incorporating invasive measurements such as canopy height and chlorophyll content may improve predictive accuracy, but at the cost of increased complexity and reduced scalability.
In contrast to studies requiring extensive field measurements, the present research focuses on using UAV-based multispectral imaging as a fully non-invasive data source combined with automated machine learning model comparison. The PyCaret framework enables efficient testing of multiple ML algorithms with automated hyperparameter tuning and performance benchmarking, supporting systematic model selection for yield estimation applications [9,15,16].
Therefore, the objectives of this study are to:
(a) Collect high-resolution UAV-based multispectral data across multiple wheat varieties and fertilizer treatments to capture variation in crop performance;
(b) Evaluate and compare a broad range of machine learning models for wheat yield prediction using the PyCaret framework;
(c) Identify the most effective modeling approach for accurate, scalable, and non-invasive wheat yield prediction to support precision agriculture in real-world production systems.
Despite the growing body of research on UAV-based yield prediction, most existing studies focus on single-season experiments, limited cultivar sets, or non-European production systems. Moreover, comparative evaluations of multiple machine learning models using standardized multispectral inputs across different growing seasons remain scarce. In particular, there is a lack of studies that systematically assess the robustness of machine learning-based yield prediction models for European winter wheat cultivars under real field conditions.
To address these gaps, the present study integrates UAV-derived multispectral imagery with a comprehensive set of machine learning algorithms to evaluate wheat yield prediction performance across multiple seasons and phenological stages. While UAV-based multispectral data and machine learning have been previously applied for yield estimation, the novelty of this study lies in its large experimental scale, temporal depth, and unified modeling framework. Unlike many existing studies relying on single-date observations or limited experimental setups, this work systematically analyzes the temporal dynamics of prediction accuracy using repeated UAV acquisitions. Furthermore, the application of an AutoML framework enables an objective and reproducible comparison of a wide range of regression models without manual tuning. By combining field-scale experiments, multi-season temporal analysis, and automated model selection, this research provides new insights into optimal phenological windows for wheat yield prediction and demonstrates the practical advantages of AutoML-driven approaches for precision agriculture.
Based on the identified research gaps and objectives, the following hypotheses were formulated:
H1. Temporal UAV-based multispectral observations enable higher wheat yield prediction accuracy compared to single-date acquisitions.
H2. The flowering stage (BBCH 65–69) represents the most robust and informative phenological window for machine learning-based wheat yield prediction across different regression models.
H3. AutoML-based model selection using multispectral UAV data achieves comparable or superior prediction performance relative to conventional manually tuned modeling approaches.

2. Materials and Methods

2.1. Overview of the Research Methodological Framework

This study follows a structured methodological approach for wheat yield prediction, integrating UAV-based multispectral data acquisition with machine learning modeling. The workflow (Figure 1) consists of five key stages: experimental design, data collection, data preprocessing, model development, and model evaluation. A UAV equipped with multispectral and RGB sensors was used to monitor wheat growth across six phenological stages, generating 65 vegetation indices. Machine learning models were trained and evaluated using statistical metrics to identify the most accurate yield predictions. The methodological framework ensures a systematic and data-driven approach to yield estimation.

2.2. Experimental Setup, Study Location, and Wheat Phenological Stages

The experiment was conducted as part of a long-term field study at the Institute of Field and Vegetable Crops in Novi Sad, Vojvodina, Serbia, during the 2023 growing season. The trial site is geographically positioned at 45°20′00.5″ N 19°49′51.4″ E, at an altitude of 82 m above sea level (Figure 2). The field was systematically divided into 400 experimental sub-plots, each measuring 5 × 10 m, arranged in a 20 × 20 grid structure. The soil type, classified as Haplic Chernozem Aric, is among the most fertile in the Vojvodina region, representing approximately 43% of the total arable land.
In the study area (Serbia), wheat is commonly sown in autumn using a conventional planting pattern with a row spacing of 12.5 cm. The standard seeding rate ranges from 450 to 550 viable seeds per square meter, depending on the variety, soil fertility, and expected overwintering conditions. This planting pattern is widely adopted in Serbian wheat production systems to ensure uniform crop establishment and optimal canopy closure during the early growing stages. The study included five European wheat cultivars (NS Epoha, NS Futura, NS Igra, NS Obala and NS Rajna) sown at a density of 0.205–0.225 t/ha, depending on the cultivar. Each cultivar was subjected to 20 different NPK fertilization treatments, with each treatment replicated four times to ensure statistical validity and a comprehensive assessment of yield variability under different nutrient regimes.
Phenological Stages and Measurement Timeline
Throughout the growing season, data collection was carried out across six key phenological stages, in accordance with the BBCH growth scale:
  • End of winter dormancy/Beginning of stem elongation (BBCH 20–29)—The first phase of data collection, capturing early vegetative growth and the resumption of metabolic activity.
  • Stem elongation (BBCH 30–39)—Characterized by internode elongation, critical for biomass accumulation.
  • Booting and heading (BBCH 40–59)—The period when the spike emerges from the leaf sheath, influencing grain formation.
  • Flowering (BBCH 60–69)—The reproductive phase, where successful pollination determines grain set.
  • Milk stage (BBCH 70–79)—The grain-filling period, during which chlorophyll degradation begins.
  • Dough stage (BBCH 80–89)—The final pre-harvest stage, when kernels reach physiological maturity.
Stages were monitored through 15 consecutive measurement dates, ensuring a detailed temporal assessment of vegetation indices and crop performance (Table 1). By integrating UAV-based remote sensing with precise phenological tracking, this study provides insights into how spectral responses correlate with physiological changes in wheat growth.
All UAV flights were conducted under dry conditions with no recorded precipitation and predominantly clear or mostly sunny skies (Table 2), ensuring stable illumination and minimizing atmospheric effects on multispectral data acquisition.

2.3. Yield Variability Under Different NPK Treatments

The experimental setup was designed to simulate a wide range of field conditions, ensuring a realistic representation of agronomic variability. A total of 400 experimental sub-plots were systematically arranged, each planted with one of five wheat cultivars. To assess the impact of nutrient availability on crop performance, 20 distinct NPK fertilization regimes were applied, creating a complex matrix of treatment combinations.
This diverse experimental approach allowed for an investigation into how different nutrient levels influence wheat growth and spectral responses, mirroring the heterogeneity found in real-world agricultural landscapes. A comprehensive overview of NPK treatments and their corresponding yield outcomes across different wheat cultivars is presented in Table 3.
The structured fertilization treatments resulted in a broad spectrum of yield outcomes, reflecting the influence of different nutrient combinations on wheat growth. The lowest recorded mean yield was 2.55 t/ha for the NS Futura variety in the 0N-0P-100K treatment, indicating the limited impact of potassium alone on wheat productivity. On the other hand, the highest mean yield of 9.09 t/ha was observed in the 0.10N-0.15P-0.05K treatment for NS Igra, highlighting the substantial yield improvement with increased nitrogen and phosphorus availability. This dataset (Table 3) provides a comprehensive foundation for analyzing spectral responses and developing predictive models based on multispectral UAV data.

2.4. Multispectral UAV Platform and Sensor Specifications

The DJI Phantom 4 Multispectral (P4M) (SZ DJI Technology Co., Ltd., Shenzhen, China) drone was utilized as the primary remote sensing platform for data acquisition in this study. This UAV is specifically designed for agricultural and environmental monitoring, integrating a high-precision multispectral imaging system with a stable flight platform.
The UAV operates using an Intelligent Flight Battery (6000 mAh, LiPo 4S; SZ DJI Technology Co., Ltd., Shenzhen, China), allowing approximately 23–27 min of continuous flight per charge, depending on weather conditions and payload. The aircraft uses a GPS + GLONASS dual-band positioning system, ensuring high flight stability and precise georeferencing of acquired images. The drone is equipped with RTK (Real-Time Kinematic) technology, which enables centimeter-level positioning accuracy.
The integrated camera system consists of six sensors: one RGB sensor and five monochrome sensors covering distinct spectral bands, which are detailed in Table 4.
Each spectral sensor uses a 1/2.9-inch CMOS and provides a resolution of 2 MP, while the RGB camera operates at 12 MP. The multispectral sensors produce images in TIFF format, and the RGB camera produces images in JPEG format. The global shutter mechanism eliminates motion blur, enabling accurate spectral readings even during flight.

2.5. Data Acquisition Strategy and Digital Footprint

All UAV-based measurements were conducted under optimal meteorological conditions, ensuring clear skies, no cloud cover, and the sun positioned at or near zenith. To achieve high geospatial precision, the DJI D-RTK 2 high-precision GNSS receiver was employed, providing real-time kinematic (RTK) corrections for sub-centimeter accuracy. Before each flight, the UAV’s compass was calibrated on-site to minimize potential magnetic interference and enhance navigational reliability.
The UAV’s flight plan was algorithmically generated using the DJI GS Pro software suite (Version: V2.0 2018.11, DJI, Shenzhen, China). The mission parameters were adaptively optimized based on real-time solar irradiance data, obtained through an onboard solar radiation spectral sensor. This allowed for automatic adjustments of photographic parameters, including ISO, white balance, and shutter speed, ensuring consistent image quality without the need for manual calibrations.
Each flight was conducted at an altitude of 50 m, providing a spatial resolution of approximately 2.65 cm per pixel. To ensure comprehensive field coverage and prevent data gaps, a 75% image overlap was maintained both longitudinally and laterally. The drone’s exposure setting was configured to 2 s per image, with flight margins of 10 m to account for positional drift. On average, each mission covered an area of approximately 2.85 hectares, optimizing data collection efficiency for large-scale spectral analysis.

2.6. Processing and Computation of Vegetation Indices

After data collection using the DJI P4 Multispectral drone, raw images were processed using Pix4Dmapper Enterprise (version 4.5.6, 2020, Pix4D, Prilly, Switzerland). The workflow involved stitching individual images into orthomosaics, ensuring precise geometric alignment of the experimental field. This step produced georeferenced RGB and multispectral orthomosaics, exported as GeoTIFF files containing metadata on pixel location and spatial orientation. These files served as the basis for further spectral analysis.
To extract relevant spectral information, the processed datasets were imported into ArcGIS (ArcGIS-ArcMap 10.8.2, 2021, Esri Inc., 380 New York St., Redlands, CA, USA), where numerical reflectance values were assigned to each experimental plot. Following the approach of [17], statistical descriptors were calculated for each plot, enabling a detailed assessment of spectral variations across treatments.
A total of 65 vegetation indices were selected from the literature to capture key spectral patterns linked to wheat growth and biomass. This included 40 indices based on multispectral reflectance (red, blue, green, red-edge, and NIR) and 25 indices derived from RGB data. The formulas and references for all computed indices are provided in Table 5 and Table 6.
By analyzing spectral reflectance values across different wavelengths, these indices provided insights into vegetation health, canopy structure, and physiological variations. The methodology for selecting and computing indices followed the principles established by [9], ensuring consistency with previous research while allowing for further application in predictive modeling.
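To make the index computation concrete, the following NumPy sketch shows one widely used index, NDVI = (NIR - Red) / (NIR + Red), computed per pixel and averaged per sub-plot. The band arrays, plot mask, and epsilon guard are illustrative assumptions, not the authors' exact processing code:

```python
import numpy as np

def ndvi(nir: np.ndarray, red: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    `nir` and `red` are reflectance arrays of equal shape (e.g., one band
    each from a GeoTIFF orthomosaic); `eps` guards against division by
    zero over masked or non-reflective pixels.
    """
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)

def plot_mean_index(index_raster: np.ndarray, plot_mask: np.ndarray) -> float:
    """Per-plot descriptor: mean index value over one sub-plot's pixels."""
    return float(index_raster[plot_mask].mean())
```

The other multispectral and RGB indices in Table 5 and Table 6 follow the same pattern: an arithmetic combination of band rasters, followed by per-plot statistical aggregation.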

2.7. Predictive Modeling and Accuracy Assessment

To develop a robust framework for wheat yield prediction, a machine learning-based approach was implemented using PyCaret (Version: 3.0, Toronto, ON, Canada), an open-source automated machine learning (AutoML) library. The dataset consisted of 400 experimental plots, where spectral data were systematically collected, yielding 65 vegetation indices per plot. The objective was to establish predictive models that correlate these spectral indices with yield values, resulting in a dataset comprising 66 features (65 vegetation indices + actual yield).
Cultivar identity and fertilization treatments (NPK levels) were not included as explicit input variables in the machine learning models. Instead, their effects were implicitly captured through the UAV-derived multispectral spectral responses and vegetation indices. This modeling choice was made deliberately to ensure that yield prediction relied exclusively on non-invasive remotely sensed data, thereby enhancing model generalization and operational applicability. By avoiding the use of categorical or management-specific inputs, the models were designed to learn yield-related variability directly from canopy spectral characteristics, which inherently reflect differences in cultivar traits and fertilization-induced physiological responses.
Before modeling, the dataset was subjected to data normalization to ensure that all variables were on a comparable scale. This step helped mitigate potential biases caused by differences in numerical ranges across vegetation indices and prevented certain features from disproportionately influencing the models. Normalization was performed using Min-Max scaling, transforming values into a range between 0 and 1.
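Min-Max scaling as described can be sketched in a few lines of NumPy; the toy matrix below is illustrative, and PyCaret applies an equivalent column-wise transformation internally:

```python
import numpy as np

# Toy feature matrix: rows = plots, columns = vegetation indices with
# very different numeric ranges.
X = np.array([[0.2, 150.0],
              [0.5, 300.0],
              [0.9, 450.0]])

# Min-Max scaling: x' = (x - min) / (max - min), applied per column,
# mapping every feature into the range [0, 1].
col_min = X.min(axis=0)
col_max = X.max(axis=0)
X_scaled = (X - col_min) / (col_max - col_min)
```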
To streamline the model development process, PyCaret’s regression module was utilized to automate feature selection, model comparison, and hyperparameter tuning. PyCaret efficiently handled several key tasks, including:
  • Automated preprocessing (handling missing values, encoding categorical variables, feature scaling).
  • Feature selection (removing redundant or non-contributory features).
  • Comparative model evaluation (testing multiple regression algorithms).
  • Hyperparameter optimization (fine-tuning models for better performance).
In this context, feature selection does not involve removing vegetation indices, but rather computing and assigning importance weights to each index based on its contribution to the prediction. This approach allows the model to emphasize highly informative indices while reducing the influence of those with lower predictive value, ensuring that all spectral information is retained while improving model interpretability and preventing overfitting.
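This importance-weighting idea can be illustrated with a tree ensemble's impurity-based feature importances, as exposed by scikit-learn; the synthetic data below are placeholders, not the study's measurements:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 5))                       # 5 mock "vegetation indices"
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.05 * rng.random(200)  # index 0 dominates

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Non-negative weights that sum to 1.0; the dominant predictor receives
# the largest share, while uninformative indices receive near-zero weight.
weights = model.feature_importances_
```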
The modeling workflow involved training and evaluating 25 different regression algorithms, with a comprehensive list of all tested models presented in Table 7.
To ensure the robustness of the predictive models, the dataset was divided into training (280 samples) and test (120 samples) subsets using a random split strategy following a 70/30 ratio. This random allocation allowed samples from different wheat cultivars and fertilization treatments to be present in both subsets, enabling the evaluation of overall predictive performance under realistic field-scale variability rather than strict spatial or temporal generalization. The test set, comprising 30% of the total dataset, was reserved for assessing model performance on previously unseen data.
In addition, 10-fold cross-validation was applied during model training, whereby the training data were iteratively divided into ten subsets, with nine used for model fitting and one for validation in each iteration. This procedure reduced the risk of overfitting and ensured that model evaluation was based on multiple independent validation cycles rather than a single train–test split, thereby improving the robustness of the performance assessment. Model training and hyperparameter optimization were performed using the PyCaret AutoML framework (PyCaret version 3.0). For each regression algorithm, PyCaret automatically conducts model-specific hyperparameter tuning using internal cross-validation-based search strategies, aiming to maximize predictive performance on the training data. Hyperparameters are optimized independently for each model and each phenological stage, ensuring objective and reproducible model comparison without manual intervention. Due to the large number of evaluated models and phenological stages, explicit reporting of individual hyperparameter settings was intentionally omitted to avoid excessive methodological complexity and to maintain focus on comparative model performance and operational robustness.
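The evaluation protocol described above (70/30 random split, then 10-fold cross-validation on the training subset) can be sketched with scikit-learn as follows; the synthetic data stand in for the 400 plots and 65 indices, and PyCaret automates the analogous steps internally:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split, cross_val_score

rng = np.random.default_rng(42)
X = rng.random((400, 65))                    # 400 plots x 65 vegetation indices
y = 4.0 + 5.0 * X[:, 0] + rng.normal(0, 0.3, 400)  # mock yield in t/ha

# 70/30 random split: 280 training and 120 test samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, random_state=42)

model = RandomForestRegressor(n_estimators=100, random_state=42)

# 10-fold cross-validation on the training subset only.
cv_r2 = cross_val_score(model, X_train, y_train, cv=10, scoring="r2")

# Final fit and held-out evaluation on the untouched 30% test set.
model.fit(X_train, y_train)
test_r2 = model.score(X_test, y_test)
```

Keeping the test set outside the cross-validation loop, as here, is what allows the reported test metrics to reflect performance on genuinely unseen data.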
Model performance was assessed using statistical metrics, including:
1. Root Mean Squared Error (RMSE)—provides insight into prediction variance.

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where $y_i$ represents the observed (measured) values, $\hat{y}_i$ denotes the predicted values, and $n$ is the total number of observations. Lower RMSE values indicate smaller deviations between predictions and observations, and thus better model performance.
2. R2 Score (Coefficient of Determination)—quantifies how well the model explains variance in yield.

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

where $\bar{y}$ is the mean of the observed values. An R2 value approaching 1.0 indicates a strong agreement between predicted and observed values, reflecting high model explanatory power.
The selection of RMSE and R2 as performance metrics follows established practice in UAV- and remote sensing-based crop yield prediction studies, where these indicators are commonly used to evaluate regression accuracy and model robustness [3,4,5,6,8,9,10,12,15]. RMSE is particularly sensitive to larger prediction errors and provides insight into overall prediction variance [3,4,5,8,9]. The coefficient of determination (R2) is widely adopted to quantify the proportion of yield variability explained by the model, enabling direct comparison across different modeling approaches and datasets [3,4,5,6,8,9,10,12,15]. Together, these complementary metrics provide a comprehensive and interpretable evaluation of model performance in an agricultural context.
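The two metrics can be computed directly from their definitions; this minimal NumPy sketch mirrors the formulas term by term (observed values, predicted values, and the observed mean), with mock yields as illustrative inputs:

```python
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root Mean Squared Error: sqrt(mean((y_i - yhat_i)^2))."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Example with mock observed and predicted yields (t/ha):
y_obs = np.array([4.2, 5.1, 6.3, 7.0])
y_hat = np.array([4.0, 5.3, 6.1, 7.4])
```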
In addition to evaluating model accuracy, training time (TT, sec) was recorded for each regression algorithm to assess computational efficiency. The TT metric represents the total time required for model training and hyperparameter tuning.

3. Results

3.1. Temporal Dynamics of Model Performance Across 15 Measurements

The temporal dynamics of model performance were evaluated over 15 distinct measurement dates corresponding to critical wheat phenological stages, ranging from the end of winter dormancy (BBCH 20–29) to late dough stage (BBCH 87–89). This section presents the predictive performance of the five top-performing machine learning models—Random Forest Regressor, Extra Trees Regressor, Gradient Boosting Regressor, Light Gradient Boosting Machine, and AdaBoost Regressor—based on two primary evaluation metrics: the coefficient of determination (R2) and root mean squared error (RMSE).

3.1.1. R2 Dynamics Across Measurement Dates

Figure 3 presents the R2 scores for the five best-performing models over the 15 measurement dates. A clear temporal trend can be observed, where model accuracy generally increases as the growing season progresses, peaking during specific phenological stages.
During the early stages of crop development (measurements 1–3; BBCH 20–29), R2 values ranged from 0.80 to 0.86, reflecting the limited ability of models to accurately predict yield based on early vegetative indices. The accuracy improved significantly between measurements 4 and 7 (BBCH 31–39), where the appearance of nodes and flag leaves enhanced the spectral distinction between treatments, resulting in R2 values rising to approximately 0.92.
A notable decline in model performance occurred at measurement 8 (BBCH 49–50), coinciding with the pre-booting stage. This reduction in R2 (down to ~0.89) suggests a potential physiological complexity or spectral saturation that affected the models’ capacity to differentiate yield outcomes during this transitional period.
Following this dip, model accuracy recovered during heading and flowering stages (measurements 9–12; BBCH 51–69), where R2 scores again exceeded 0.92. The highest predictive performance was observed during the flowering stage (BBCH 61–69), where the reproductive development of wheat likely contributed to greater variability captured by the UAV multispectral data.
After the flowering stage, a gradual decline in R2 was noted through the milk and dough stages (measurements 13–15; BBCH 71–89). By the late dough stage, R2 values decreased to approximately 0.88, suggesting diminishing returns of spectral data in explaining yield variance as the crop approached physiological maturity.
Among the models, the Random Forest Regressor consistently outperformed others, especially during mid to late-season stages, followed closely by Extra Trees and LightGBM. AdaBoost and Gradient Boosting regressors exhibited slightly lower but still competitive performance throughout.

3.1.2. RMSE Dynamics Across Measurement Dates

Figure 4 illustrates the RMSE values for the same five models over the 15 measurement dates. RMSE exhibited an inverse trend relative to R2, decreasing as the models’ predictive accuracy improved over time.
Initial RMSE values were high (up to 0.86 t/ha) during the early stages (measurements 1–3), reflecting the limited predictive power of spectral data prior to significant crop development. A steady decrease in RMSE occurred between measurements 4 and 7 (BBCH 31–39), reaching its lowest point (~0.52 t/ha) by measurement 6, which corresponds to the third node visible stage (BBCH 33–34).
Mirroring the dip in R2, RMSE values increased slightly at measurement 8 (BBCH 49–50), peaking around 0.59 t/ha. This transient reduction in accuracy is consistent with the physiological transition of the crop and possible spectral confusion at pre-booting.
The lowest RMSE values were recorded during measurements 11 and 12 (BBCH 61–69), particularly during the full flowering stage, where values dropped below 0.48 t/ha. This phase demonstrated optimal model performance, reflecting the highest predictive precision achieved in the study.
In the final stages (measurements 13–15; BBCH 71–89), RMSE increased again, rising to ~0.55–0.65 t/ha. This late-season performance decline may be attributed to spectral homogeneity as wheat crops approached maturity, reducing model discrimination capacity.
As with R2, Random Forest Regressor and Extra Trees Regressor achieved the lowest RMSE values consistently, particularly during mid-season measurements. LightGBM also demonstrated comparable RMSE values, confirming its suitability for yield prediction in multispectral datasets.

3.2. Comparative Evaluation of 25 Machine Learning Models Across Multiple Metrics

A comprehensive comparison of 25 machine learning regression models was performed to evaluate their predictive accuracy for wheat yield estimation across 15 measurement dates. The results are presented in two summary tables, showing the coefficient of determination (R2) and root mean squared error (RMSE) for each model and time point. These tables highlight clear differences in model performance, with ensemble-based algorithms generally achieving superior accuracy and stability.

3.2.1. R2 Score Comparison Across 25 Models

The results demonstrate a consistent trend where ensemble-based models outperform linear and kernel-based algorithms.
Top Performers:
Random Forest Regressor and Extra Trees Regressor consistently achieved the highest R2 scores across nearly all measurements. For example, Random Forest Regressor reached an R2 of 0.9317 on 10 May 2023 (measurement 11, full flowering stage, BBCH 65–69), the highest recorded in this study.
Light Gradient Boosting Machine and Gradient Boosting Regressor also showed excellent predictive performance, with R2 values regularly exceeding 0.92 during key stages such as heading and flowering (BBCH 51–69).
AdaBoost Regressor, while generally competitive, exhibited slightly lower R2 scores compared to the aforementioned models, particularly in early- and late-season measurements.
Moderate Performers:
Models such as K Neighbors Regressor (KNN) and Support Vector Regression (SVR) demonstrated moderate R2 scores, typically ranging from 0.80 to 0.90 during the mid-season stages (BBCH 30–69), but experienced notable drops during the early (BBCH 20–29) and late (BBCH 80–89) stages.
Low Performers:
Linear models (Linear Regression, Lasso Regression) and kernel-based methods (Kernel Ridge) consistently underperformed, with R2 values frequently below 0.80. Notably, Kernel Ridge and Least Angle Regression failed to provide meaningful predictions, often producing extreme negative or undefined R2 values across multiple measurement dates. Although the Dummy Regressor, used as a baseline, performed consistently poorly (e.g., R2 = −5.31 on 21 February 2023), Kernel Ridge and Least Angle Regression were nevertheless the least effective models throughout the growing season owing to their instability and lack of robustness in this application context. For instance, Kernel Ridge recorded R2 values as low as −13.39 on 21 February 2023, while Least Angle Regression produced missing or non-numeric results in several cases. These outcomes indicate that both models struggled to generalize in the presence of high-dimensional multispectral data and the complex nonlinear relationships characteristic of crop yield prediction.
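For readers unfamiliar with negative R2 values, a tiny illustration: R2 drops below zero whenever a model's predictions are worse than simply predicting the test-set mean. The numbers below are arbitrary and chosen only to make the effect obvious.

```python
# Negative R2 means "worse than predicting the mean"; toy values only.
from sklearn.metrics import r2_score

y_true = [4.0, 5.0, 6.0]
pred_mean = [5.0, 5.0, 5.0]    # constant prediction equal to the mean -> R2 = 0
pred_bad = [10.0, 10.0, 10.0]  # constant prediction far from the targets -> R2 << 0

r2_mean = r2_score(y_true, pred_mean)  # 0.0
r2_bad = r2_score(y_true, pred_bad)    # strongly negative
```

This is why baseline and unstable models (Dummy Regressor, Kernel Ridge) can report values such as −5.31 or −13.39: the metric has no lower bound.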
Trend Observation:
The highest R2 scores were recorded during the heading, flowering, and early milk stages (measurements 9–13; BBCH 51–73), corresponding to critical periods in wheat development when the relationship between vegetation indices and final yield is most pronounced.

3.2.2. RMSE Comparison Across 25 Models

The observed patterns reveal a clear distinction between high-performing ensemble models and the remaining algorithms.
Top Performers:
Random Forest Regressor consistently achieved the lowest RMSE values throughout the growing season. The minimum RMSE was 0.44 t/ha, recorded on 1 June 2023 (measurement 13; early milk stage, BBCH 71–73). Extra Trees Regressor and Gradient Boosting Regressor also performed well, maintaining RMSE values below 0.50 t/ha during the critical reproductive stages, particularly between measurements 10 and 13 (5 May 2023–1 June 2023). Light Gradient Boosting Machine (LightGBM) demonstrated comparable performance, although with slightly higher RMSE values in certain measurements (e.g., 0.56 t/ha on 13 June 2023, BBCH 80–85). These models demonstrated stable predictive accuracy, particularly from heading to the early dough stages (BBCH 51–85), where spectral indicators strongly correlated with yield.
Moderate Performers:
AdaBoost Regressor and Extreme Gradient Boosting (XGBoost) achieved moderate RMSE values, typically ranging between 0.50 t/ha and 0.65 t/ha. They were generally less consistent, showing larger fluctuations across different phenological stages. K Neighbors Regressor (KNN), Support Vector Regression (SVR), and Decision Tree Regressor exhibited higher RMSEs, often exceeding 0.65 t/ha, particularly in early-season (BBCH 20–29) and late-season (BBCH 87–89) measurements.
Low Performers:
Linear models (Linear Regression, Lasso Regression), Elastic Net, and kernel-based models (Kernel Ridge) showed significantly higher RMSE values, frequently surpassing 0.80 t/ha. Dummy Regressor and Random Sample Consensus (RANSAC) performed poorly, with RMSE values consistently above 1.0 t/ha. Kernel Ridge Regressor produced extremely high and unstable error values, often exceeding 6.0 t/ha, and in some cases, returning undefined or infinite results. Least Angle Regression also failed to deliver reliable predictions, frequently resulting in missing or infinite values across multiple dates.
Temporal Trends:
The lowest RMSE values were consistently observed between measurements 9 and 13 (27 April 2023–1 June 2023; BBCH 51–73), aligning with the heading, flowering, and early grain-filling phases. These periods coincide with optimal physiological development, where spectral signatures most accurately reflect final yield outcomes.
Higher RMSE values were recorded at the beginning (measurements 1–3; BBCH 20–29) and at the end of the growing season (measurements 14–15; BBCH 80–89), likely due to reduced spectral variability and the diminishing sensitivity of vegetation indices to yield components at this stage.

3.3. Identification of the Best Performing Model for Each Measurement Date

For each of the 15 measurement dates, the best-performing regression model was identified based on the highest R2 score obtained on test data. The selected models and their corresponding R2 values varied over time, reflecting the influence of different phenological stages on model accuracy.
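The per-date selection procedure described above amounts to the following loop. This is a hypothetical sketch with simulated data and a reduced candidate set, shown only to make the selection logic explicit; the actual study evaluated all 25 models on the real vegetation-index matrices.

```python
# Per-date best-model selection by test R2; simulated data, three dates only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
candidates = {
    "Random Forest": lambda: RandomForestRegressor(n_estimators=100, random_state=0),
    "Extra Trees": lambda: ExtraTreesRegressor(n_estimators=100, random_state=0),
    "Ridge": lambda: Ridge(),
}

best_per_date = {}
for date in ["2023-02-21", "2023-03-08", "2023-03-14"]:  # subset of the 15 dates
    # stand-in for that date's vegetation-index matrix (400 sub-plots x 65 indices)
    X = rng.normal(size=(400, 65))
    y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=400)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    scores = {}
    for name, make in candidates.items():
        m = make().fit(X_tr, y_tr)
        scores[name] = r2_score(y_te, m.predict(X_te))
    best = max(scores, key=scores.get)  # winner = highest test R2
    best_per_date[date] = (best, scores[best])
```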

3.3.1. Early Growth Stages (BBCH 20–29)

21 February 2023 (BBCH 20–29): Random Forest Regressor was the best-performing model (R2 = 0.81), which is expected due to limited spectral differentiation early in the season.
8 March 2023 (BBCH 20–29): Random Forest Regressor again achieved the highest performance (R2 = 0.84), showing improved canopy separability.
14 March 2023 (BBCH 20–29): Random Forest Regressor maintained the lead (R2 = 0.87), indicating stronger yield–spectral coupling as biomass accumulated.

3.3.2. Stem Elongation and Node Development (BBCH 31–39)

20 March 2023 (BBCH 31): Random Forest, Extra Trees, Gradient Boosting, and LightGBM performed equally well (R2 = 0.88), reflecting increased structural differentiation.
29 March 2023 (BBCH 32): The same ensemble group remained dominant (R2 = 0.91), confirming robustness during rapid canopy development.
6 April 2023 (BBCH 33–34): Random Forest Regressor provided the highest accuracy (R2 = 0.93), marking a key improvement as the flag leaf emerged.

3.3.3. Heading and Flowering Phases (BBCH 49–69)

12 April 2023 (BBCH 37–39): Ensemble models (RF/ET/GBR/LightGBM) performed similarly (R2 ≈ 0.89), while Ridge did not outperform them.
22 April 2023 (BBCH 49–50): Ensemble models continued to yield the highest accuracy (R2 ≈ 0.89).
27 April 2023 (BBCH 51–55): Ensemble models again led performance (R2 ≈ 0.92).
5 May 2023 (BBCH 59): Random Forest Regressor achieved the highest accuracy (R2 = 0.92), during full heading.

3.3.4. Flowering and Grain Filling Phases (BBCH 61–89)

10 May 2023 (BBCH 61): Random Forest Regressor remained best (R2 = 0.93).
22 May 2023 (BBCH 65–69): Random Forest Regressor reached its peak (R2 = 0.93), indicating excellent predictive power during full flowering.
1 June 2023 (BBCH 71–73): Random Forest Regressor again performed best (R2 = 0.92), even as grain filling progressed.
13 June 2023 (BBCH 80–85): Ensemble models remained leading (R2 ≈ 0.90), while SVR did not outperform them.
23 June 2023 (BBCH 87–89): Random Forest Regressor concluded the season with the best performance (R2 = 0.91), despite reduced spectral sensitivity at maturity.

3.3.5. General Performance Patterns Across Models

The comparative analysis of all 25 machine learning models reveals several consistent patterns. Ensemble methods—particularly Random Forest, Extra Trees, Gradient Boosting, and LightGBM—demonstrated the highest accuracy and stability across different phenological stages, making them the most reliable for wheat yield prediction based on UAV multispectral data.
The performance gap between these ensemble models and simpler linear algorithms (such as Linear Regression, Ridge, Lasso, and Least Angle Regression), as well as the kernel-based Kernel Ridge, was especially pronounced during the early vegetative stages (BBCH 20–29) and at crop maturity (BBCH 80–89), where spectral data were less informative and model robustness became critical.
The most accurate predictions were consistently achieved during the heading, flowering, and early grain-filling stages (BBCH 51–73), where the relationship between vegetation indices and final yield was strongest. During these stages, ensemble models frequently achieved R2 values above 0.90 and RMSE values below 0.50 t/ha.
Ensemble models consistently outperformed all other algorithms, highlighting their suitability for operational deployment in precision agriculture workflows.

3.4. Training Time and Computational Efficiency of Models

Training time (TT, in seconds) was evaluated for all 25 machine learning models across 15 measurement dates. The computational efficiency analysis highlights clear differences in processing time, largely aligned with model complexity, as shown in Figure 5. All training and evaluation procedures were performed on a workstation equipped with an Intel Core i5-12400F CPU, an NVIDIA RTX 2060 (12 GB) GPU, 32 GB of RAM, and a 1 TB NVMe M.2 SSD, ensuring consistent computational conditions across all experiments.
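PyCaret reports TT automatically, but the underlying measurement reduces to wall-clock timing around each model's fit. The minimal sketch below illustrates this with two representative models on simulated data; it is not the study's instrumentation.

```python
# Minimal per-model training-time (TT) measurement; simulated data.
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 65))  # stand-in for the vegetation-index matrix
y = rng.normal(size=400)

train_times = {}
for name, model in {
    "Ridge": Ridge(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=1),
}.items():
    t0 = time.perf_counter()
    model.fit(X, y)
    train_times[name] = time.perf_counter() - t0  # seconds, as reported in Figure 5
```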
Simpler models, such as Kernel Ridge Regression, Least Angle Regression, and Elastic Net, consistently demonstrated the shortest training times. For example, Kernel Ridge Regression had an average training time of 0.046 s, while Least Angle Regression averaged 0.059 s. These models completed training rapidly but offered limited predictive performance, as discussed in previous sections.
Among the linear models, Ridge Regression, Bayesian Ridge, and Passive Aggressive Regressor generally trained in under 0.3 s, balancing computational efficiency with moderate prediction accuracy.
Ensemble models, including Random Forest Regressor, Extra Trees Regressor, Gradient Boosting, and LightGBM, exhibited moderate training times:
  • Random Forest averaged 0.82 s (min: 0.034 s; max: 1.53 s).
  • Extra Trees Regressor averaged 0.54 s.
  • LightGBM averaged 0.40 s, confirming its efficiency compared to other ensemble approaches.
Several other models, notably Lasso Regression, Lasso Least Angle Regression, and TheilSen Regressor, incurred significantly higher training times. For instance:
  • Lasso Least Angle Regression averaged 1.97 s, with maximum times up to 8.30 s.
  • TheilSen Regressor averaged 1.13 s, peaking at 9.22 s.
The Dummy Regressor and Random Sample Consensus (RANSAC) models, despite their conceptual simplicity, exhibited unexpectedly high training times in some cases, with averages of 0.59 s and 0.64 s, respectively. For RANSAC this likely reflects its iterative resampling procedure, while for the Dummy Regressor it is probably due to dataset-specific factors or implementation overhead.
A clear trade-off was observed between training time and predictive performance. Ensemble models offered the best balance of computational demand and prediction accuracy, justifying their use in practical applications. On the other hand, although some linear and kernel-based models trained quickly, they failed to deliver competitive yield prediction results.
Although the time differences observed here are relatively small, they may become significantly more pronounced when working with larger datasets or higher-resolution imagery.

4. Discussion

4.1. Interpretation of Key Findings

This study confirms the superiority of ensemble-based machine learning models, notably Random Forest Regressor and LightGBM, in predicting wheat yield from UAV-derived multispectral data. These models demonstrated consistent accuracy across 15 phenological stages, especially during key developmental phases such as flowering and grain filling (BBCH 51–89).
The Random Forest Regressor achieved the highest test R2 value of 0.952 during full flowering (BBCH 65–69), and a training R2 of 0.9317 with 10-fold cross-validation during flowering onset (BBCH 61), showcasing its robustness. LightGBM also delivered high accuracy with shorter training times, highlighting its practical value.
Error metrics were minimized in the later stages, with the lowest RMSE (0.44 t/ha) observed at the early milk stage (BBCH 71–73). These findings suggest that late-season spectral data offer the most reliable indicators of final yield.
In contrast, linear and kernel-based models such as Least Angle Regression and Kernel Ridge consistently underperformed. The Dummy Regressor baseline results confirmed the added value of the advanced modeling approaches.
For operational implementation, UAV monitoring should be prioritized during flowering to early grain-filling stages (BBCH 61–75), when spectral signals show the strongest correlation with final yield. Conducting one to two UAV flights in this period and applying ensemble-based models such as Random Forest or LightGBM enables high-accuracy yield prediction without requiring additional ground-based measurements. This provides a cost-effective and scalable workflow for farmers, consultants, and digital agriculture service providers.
The decision to exclude cultivar identity and fertilization treatments as explicit input variables has important implications for model interpretability and generalization. While the inclusion of such variables could potentially improve predictive performance within a specific experimental setup, it would also reduce the transferability of the models to other fields, cultivars, or management systems. In this study, cultivar- and fertilization-specific effects were intentionally represented implicitly through spectral vegetation indices, which capture integrated physiological responses of the crop canopy. This approach supports the development of scalable UAV-based yield prediction models that are less dependent on prior agronomic knowledge and more robust to variations in genotype and management practices. Nevertheless, the explicit integration of agronomic variables remains a relevant direction for future research focused on interpretability and site-specific optimization.
The use of a large number of vegetation indices inevitably introduces a high degree of correlation among input features, which may raise concerns regarding multicollinearity and model interpretability. In this study, the primary objective was not to identify a minimal or physiologically optimal subset of indices, but rather to evaluate the predictive robustness of machine learning models under realistic UAV-based monitoring conditions. Ensemble-based models such as Random Forest, Gradient Boosting, and LightGBM are known to be relatively robust to multicollinearity, as they internally perform feature weighting, regularization, and split selection that reduce the adverse effects of redundant inputs.
Within the AutoML framework, model selection and hyperparameter optimization further mitigated potential multicollinearity effects by favoring models and configurations that generalize well on unseen data. Consequently, the inclusion of a comprehensive set of vegetation indices was intended to maximize information content rather than model interpretability. Nevertheless, an explicit analysis of multicollinearity and feature importance was not the primary focus of this study and remains a limitation. Future work should therefore investigate dimensionality reduction techniques and feature selection strategies to improve interpretability and to better understand the contribution of individual vegetation indices at different phenological stages.
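The redundancy among the 65 indices noted above can be quantified with a simple pairwise-correlation check. The sketch below was not part of the study's workflow (the ensembles were left to absorb the redundancy internally); it only illustrates how the degree of multicollinearity could be measured, using simulated indices built as noisy mixtures of a few latent canopy signals.

```python
# Quantifying redundancy among vegetation indices via pairwise correlation.
# Simulated: 65 "indices" derived as noisy mixes of 10 latent signals.
import numpy as np

rng = np.random.default_rng(7)
base = rng.normal(size=(400, 10))        # latent canopy signals
mix = rng.normal(size=(10, 65))          # each index = linear mix + noise
vi = base @ mix + 0.1 * rng.normal(size=(400, 65))

corr = np.abs(np.corrcoef(vi, rowvar=False))   # 65 x 65 |correlation| matrix
iu = np.triu_indices_from(corr, k=1)           # unique index pairs
n_redundant = int((corr[iu] > 0.95).sum())     # highly collinear pairs
```

A feature-selection step would then drop one index from each highly correlated pair; as discussed above, this was deliberately omitted here in favor of maximizing information content.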
The primary objective of this study was the comparative evaluation of machine learning models and their predictive robustness across phenological stages, rather than the interpretability of individual input features. Consequently, a detailed feature importance analysis was not performed. Ensemble-based models were assessed based on overall predictive accuracy and stability, which aligns with the operational goal of identifying reliable yield prediction windows rather than explaining individual vegetation index contributions. Feature importance analysis therefore remains an important direction for future work, particularly for improving agronomic interpretability and understanding the physiological relevance of specific spectral indices at different growth stages.
From an agronomic perspective, the consistently superior performance observed during the flowering stage (BBCH 65–69) is biologically plausible. This phenological phase represents a critical transition from vegetative to reproductive development, during which canopy structure, chlorophyll content, and biomass accumulation reach peak expression and are strongly linked to final grain number and yield potential. Spectral signals acquired during flowering therefore capture integrated effects of crop vigor, nitrogen status, and stress responses accumulated throughout earlier growth stages, explaining their high predictive power. As grain filling progresses, these physiological signals become increasingly constrained by sink–source dynamics, which may reduce the sensitivity of spectral indices to final yield outcomes.
The results of this study provide clear support for the formulated hypotheses. The use of multi-temporal UAV-based multispectral data consistently improved yield prediction accuracy compared to single-date observations, confirming Hypothesis H1. Furthermore, the flowering stage (BBCH 65–69) emerged as the most robust and informative phenological window across different machine learning models, validating Hypothesis H2. Finally, the AutoML-based modeling framework demonstrated comparable or superior performance relative to conventional manually tuned approaches, supporting Hypothesis H3 and highlighting the effectiveness of automated model selection for UAV-based yield prediction.

4.2. Comparative Evaluation with Prior Studies

The findings of this study, particularly the high predictive accuracy achieved by the Random Forest Regressor (R2 = 0.952 on test data during BBCH 65–69) and LightGBM models, demonstrate the potential of ensemble learning methods when applied to UAV-based multispectral imagery for wheat yield prediction. When compared to previous research, these results represent a significant advancement, both in terms of accuracy and methodological scope.
For instance, ref. [6] employed Multilayer Perceptron (MLP) models integrated with Bayesian Model Averaging (BMA) and Copula Bayesian Model Averaging (CBMA), using 30 years of meteorological and management data across Iran. They reported R2 values up to 0.92 using CBMA, yet their focus was on long-term regional yield prediction, whereas our study emphasizes in-season, high-resolution forecasting using UAV data at the plot scale.
Other researchers have incorporated a wide variety of input features for yield prediction. Study [11] combined vegetation indices such as NDVI, enhanced vegetation index (EVI), leaf area index (LAI), meteorological data, and Gross Primary Productivity within a Bayesian-optimized long short-term memory neural network (LSTM) model, achieving RMSE of 0.178 t/ha and R2 of 0.82. Similarly, ref. [4] used hyperspectral canopy reflectance in breeding trials with a Bayesian model averaging (BMA) model, obtaining an R2 of 0.684. Ref. [5] extended this approach by integrating RGB, multispectral, and thermal imagery in a stacking ensemble, reaching R2 of 0.692 and RMSE of 0.916 t/ha. Our study, by comparison, demonstrates that high performance can be achieved using multispectral data alone, without the need for sensor fusion.
Other approaches have focused on historical data or broader spatial scales. Study [3] modeled wheat yield using long-term climate and yield data combined with global climate models, reporting R2 = 0.953 and RMSE = 0.107 t/ha. Study [12] used solar-induced chlorophyll fluorescence (SIF), near-infrared reflectance of vegetation (NIRv), and other satellite-derived indices to model yield across the U.S., achieving a peak R2 of 0.75. While these studies illustrate the value of climate-based and satellite-derived modeling, our approach prioritizes in-season responsiveness and sub-field spatial precision.
A few studies have also leveraged UAV and satellite imagery at varying resolutions. Ref. [10] reported NDVI–yield correlations of R2 = 0.6–0.9 using Sentinel-2 data across a limited number of time points. Ref. [15] modeled corn yield using climate and spatiotemporal data at the county level, achieving R2 = 0.81. Ref. [8] combined LSTM and RF models with UAV-based multispectral and thermal data, obtaining R2 = 0.78 and RMSE = 0.684 t/ha. Our findings, using spectral-only UAV data, provide more temporally resolved insights with higher predictive strength.
Other modeling frameworks have focused on neural networks and time-series analysis. Ref. [39] applied convolutional neural networks using soil, weather, and phenological variables, yielding RMSE = 0.64 t/ha and r = 0.87. Ref. [14] developed an LSTM model using Moderate Resolution Imaging Spectroradiometer (MODIS)-derived LAI, reporting R2 = 0.87 and RMSE = 0.522 t/ha. Despite their use of broader time-series and environmental variables, our UAV-based models outperformed these in both R2 and RMSE.
Several recent studies also employed PyCaret to explore large model libraries. Ref. [9] estimated photosynthetic efficiency (Fv/Fm, where Fv is variable fluorescence and Fm is maximum fluorescence) using multispectral and RGB imagery, achieving R2 = 0.925, although their target was not yield but physiological stress. Ref. [7] combined RGB, MS, thermal infrared (TIR), and texture data in a stacking ensemble, reaching R2 = 0.733. Our PyCaret-based pipeline, focused strictly on multispectral inputs, achieved higher performance with simpler implementation.
Ref. [13] used UAV RGB imagery and ground-measured physiological traits for yield prediction. Their highest R2 was 0.93 during flowering, but their method depended on extensive in-field calibration data. In contrast, our UAV-only multispectral model reached R2 = 0.952 without any ground measurements, underlining its scalability and operational feasibility.

4.3. Contributions and Practical Implications

This study provides a comprehensive comparison of 25 machine learning models across 15 phenological stages, offering one of the most detailed temporal assessments reported in wheat yield prediction research. The results demonstrate that UAV-based multispectral imagery alone can achieve high predictive performance, confirming its suitability for non-invasive field-scale monitoring. Ensemble models, particularly Random Forest and LightGBM, consistently showed strong accuracy and stability, indicating their practicality for use in precision agriculture platforms and advisory decision-support systems.
Furthermore, the identification of optimal prediction periods, especially around flowering and early grain-filling stages, provides actionable guidance for scheduling UAV monitoring flights and optimizing data collection efforts. These findings can help reduce operational costs while maintaining high prediction reliability. It is also important to emphasize that the required equipment is relatively affordable: a professional agricultural UAV system suitable for this workflow can be acquired for approximately €5000, while a standard mid-range computer capable of running the modeling pipeline costs around €2000. This makes the proposed approach accessible not only to researchers but also to producers, cooperatives, and agronomic advisory services.
At present, no fully automated software platform exists that would allow non-expert users to directly implement the complete UAV acquisition, vegetation index extraction, and model-based yield prediction workflow described in this study. However, the modeling framework developed here was intentionally designed to be modular and transferable, and efforts are currently underway to integrate these components into a user-oriented software tool. The long-term aim is to provide an accessible interface that enables agronomists, advisors, and producers to perform yield prediction without requiring advanced programming or machine learning expertise. Future work will therefore focus on implementing this workflow into a standalone application, which is planned to be released once validated across multiple locations and production seasons.
Finally, the modeling framework demonstrated here can be extended to larger-scale monitoring, including satellite-based applications. Although satellite data have lower spatial resolution and longer revisit intervals, the workflow allows for adaptation through techniques such as temporal averaging or feature harmonization, suggesting the potential for scalable regional yield forecasting.

4.4. Limitations and Future Work

Limitations of this study include the use of data collected from a single growing season and a single geographic location, which may constrain the broader applicability and generalizability of the developed models across diverse environmental conditions, management practices, and wheat genotypes. While the results demonstrate strong predictive performance within the studied context, model robustness may vary when applied to different agro-ecological zones. Future work should therefore incorporate multi-season and multi-location datasets to capture wider climatic and soil variability and to ensure that the model generalizes across regions and production systems. Formal statistical significance testing between model performances was not conducted, as the primary objective was comparative ranking and operational robustness rather than hypothesis testing between individual algorithms.
Although treatments were replicated four times, the substantial yield variability within replicates reduces the risk of spatial memorization and supports the validity of the random split for evaluating within-field predictive performance.
An additional limitation relates to the validation strategy employed in this study. The use of a random train–test split may lead to optimistic performance estimates when samples from the same field, cultivar, or management treatment appear in both training and test sets. Although this approach is suitable for assessing overall model accuracy under realistic within-field variability, it does not fully evaluate spatial or temporal transferability. Future studies should therefore implement spatially and temporally structured validation schemes, such as leave-one-field-out or leave-one-season-out cross-validation, to more rigorously assess model generalization across different environments and growing conditions.
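The grouped validation schemes proposed above can be implemented directly with scikit-learn's leave-one-group-out splitter, where the group label is the field (or season) each sample belongs to. The sketch below uses simulated data and three hypothetical groups purely to show the mechanics.

```python
# Leave-one-field-out cross-validation sketch; simulated data and groups.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))
y = X[:, 0] * 2 + rng.normal(scale=0.3, size=300)
groups = np.repeat(np.arange(3), 100)  # e.g., three fields (or three seasons)

scores = []
for tr, te in LeaveOneGroupOut().split(X, y, groups):
    # every sample of one field is held out together, so the model
    # is always tested on a field it has never seen
    model = RandomForestRegressor(n_estimators=50, random_state=3).fit(X[tr], y[tr])
    scores.append(r2_score(y[te], model.predict(X[te])))
```

Swapping the random split for this scheme yields one score per held-out field, giving a more honest estimate of spatial transferability.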
Another important direction for future research is the scalability of the developed framework to satellite remote sensing. While UAV-based multispectral data provide high spatial and temporal resolution, satellite platforms offer substantially broader coverage at regional and national scales. However, coarser spatial resolution and lower revisit frequency introduce additional uncertainty in yield predictions. Future work should therefore examine how the modeling framework performs when using satellite-derived vegetation indices and explore strategies such as spatial downscaling, image fusion, or multi-resolution feature integration to maintain predictive accuracy at larger operational scales.
Additionally, integrating complementary data sources—such as high-resolution weather records, soil physicochemical properties, hyperspectral or thermal infrared (TIR) imaging, and phenological growth metrics—may further enhance prediction performance and improve the interpretability of feature–yield relationships.
Furthermore, the potential extension of this modeling framework to other cereal crops (e.g., rice or barley) or different crop families remains an open question. Given differences in canopy architecture, physiological development, and spectral response dynamics, direct transferability cannot be assumed. Although this approach was not tested beyond wheat in the present study, future research should investigate model retraining and validation across multiple crop types to assess whether the workflow is broadly generalizable or requires crop-specific calibration.
Finally, evaluating advanced deep learning architectures—such as convolutional neural networks and transformer-based models—as well as exploring transfer learning and multisource data fusion strategies represents a promising direction for advancing UAV- and satellite-based crop yield prediction. Such approaches may enable the development of more scalable, adaptable, and resilient prediction frameworks suitable for real-world precision agriculture applications. In addition, although data volume and processing requirements were reported to illustrate operational feasibility, future studies should more explicitly quantify radiometric uncertainty, spectral noise, and vegetation index saturation effects, particularly during dense canopy stages. Model performance under extreme weather conditions and across multiple growing seasons also remains an important topic for further investigation to ensure long-term robustness and transferability.

4.5. Summary

Overall, this study demonstrates the power of UAV-acquired multispectral data and machine learning, particularly ensemble models, in accurate, non-invasive wheat yield prediction. Our approach provides a scalable, efficient, and high-resolution solution for supporting timely decisions in precision agriculture.

5. Conclusions

The results of this study clearly demonstrate that integrating UAV-based multispectral data with machine learning methods provides an effective and reliable framework for wheat yield prediction. By analyzing 400 experimental plots under varying fertilization regimes and five wheat cultivars, monitored across six phenological stages, it was shown that ensemble algorithms (particularly Random Forest, Extra Trees, Gradient Boosting, and LightGBM) significantly outperformed linear and kernel-based approaches. The highest prediction accuracy was achieved during the flowering stage, where the Random Forest Regressor reached the highest R2 (0.952) and the lowest RMSE (0.44 t/ha), confirming the reproductive phase as the most critical period for predictive optimization.
An important contribution of this work lies in the detailed assessment of the temporal dynamics of model performance. While early vegetative stages offered limited predictive capacity, the mid-season and reproductive phases (heading, flowering, and early grain filling) provided the strongest correlations between vegetation indices and final yield. This confirms that the timing of data acquisition is essential for achieving highly accurate yield estimation.
Moreover, the application of the PyCaret AutoML framework enabled systematic evaluation of 25 algorithms, highlighting the advantage of ensemble methods not only in terms of accuracy but also in stability and computational efficiency.
In conclusion, this study confirms that UAV multispectral technologies, when combined with advanced machine learning, represent a sustainable and scalable approach toward the digital transformation of agriculture and precision yield management. Future research should focus on extending this framework by incorporating climatic and soil variables and testing the models under diverse agroecological conditions to further enhance their generalizability and operational deployment.

Author Contributions

Conceptualization, K.K., Z.S., M.K. and V.A.; methodology, K.K., Z.S., M.K. and V.A.; software, K.K. and Z.S.; validation, Z.S., M.K. and N.M.; formal analysis, K.K., A.I. and M.I.; investigation, K.K. and Z.S.; resources, V.A.; data curation, M.K. and M.I.; writing—original draft preparation, K.K. and Z.S.; writing—review and editing, M.I. and N.M.; visualization, A.I. and M.I.; supervision, N.M., M.I. and M.K.; project administration, N.M.; funding acquisition, N.M. All authors have read and agreed to the published version of the manuscript.

Funding

This publication is part of the TALLHEDA project that has received funding from the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101136578. Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency (REA). Neither the European Union nor the granting authority can be held responsible for them.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The financial support described in the Funding section is gratefully acknowledged.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Park, H.; Cha, J.-K.; Lee, S.-M.; Kwon, Y.; Choi, J.; Lee, J.-H. Artificial Rainfall on Grain Quality and Baking Characteristics of Winter Wheat Cultivars in Korea. Foods 2024, 13, 1679. [Google Scholar] [CrossRef] [PubMed]
  2. Shen, Y.; Han, X.; Feng, H.; Han, Z.; Wang, M.; Ma, D.; Jin, J.; Li, S.; Ma, G.; Zhang, Y.; et al. Wheat GSPs and Processing Quality Are Affected by Irrigation and Nitrogen through Nitrogen Remobilisation. Foods 2023, 12, 4407. [Google Scholar] [CrossRef] [PubMed]
  3. Iqbal, N.; Shahzad, M.U.; Sherif, E.-S.M.; Tariq, M.U.; Rashid, J.; Le, T.-V.; Ghani, A. Analysis of Wheat-Yield Prediction Using Machine Learning Models Under Climate Change Scenarios. Sustainability 2024, 16, 6976. [Google Scholar] [CrossRef]
  4. Fei, S.; Chen, Z.; Li, L.; Ma, Y.; Xiao, Y. Bayesian Model Averaging to Improve the Yield Prediction in Wheat Breeding Trials. Agric. For. Meteorol. 2023, 328, 109237. [Google Scholar] [CrossRef]
  5. Fei, S.; Hassan, M.A.; Xiao, Y.; Su, X.; Chen, Z.; Cheng, Q.; Duan, F.; Chen, R.; Ma, Y. UAV-Based Multi-Sensor Data Fusion and Machine Learning Algorithm for Yield Prediction in Wheat. Precis. Agric. 2023, 24, 187–212. [Google Scholar] [CrossRef]
  6. Bazrafshan, O.; Ehteram, M.; Gerkaninezhad Moshizi, Z.; Jamshidi, S. Evaluation and Uncertainty Assessment of Wheat Yield Prediction by Multilayer Perceptron Model with Bayesian and Copula Bayesian Approaches. Agric. Water Manag. 2022, 273, 107881. [Google Scholar] [CrossRef]
  7. Yang, S.; Li, L.; Fei, S.; Yang, M.; Tao, Z.; Meng, Y.; Xiao, Y. Wheat Yield Prediction Using Machine Learning Method Based on UAV Remote Sensing Data. Drones 2024, 8, 284. [Google Scholar] [CrossRef]
  8. Shen, Y.; Mercatoris, B.; Cao, Z.; Kwan, P.; Guo, L.; Yao, H.; Cheng, Q. Improving Wheat Yield Prediction Accuracy Using LSTM-RF Framework Based on UAV Thermal Infrared and Multispectral Imagery. Agriculture 2022, 12, 892. [Google Scholar] [CrossRef]
  9. Wu, Q.; Zhang, Y.; Xie, M.; Zhao, Z.; Yang, L.; Liu, J.; Hou, D. Estimation of Fv/Fm in Spring Wheat Using UAV-Based Multispectral and RGB Imagery with Multiple Machine Learning Methods. Agronomy 2023, 13, 1003. [Google Scholar] [CrossRef]
  10. Panek, E.; Gozdowski, D.; Stępień, M.; Samborski, S.; Ruciński, D.; Buszke, B. Within-Field Relationships between Satellite-Derived Vegetation Indices, Grain Yield and Spike Number of Winter Wheat and Triticale. Agronomy 2020, 10, 1842. [Google Scholar] [CrossRef]
  11. Di, Y.; Gao, M.; Feng, F.; Li, Q.; Zhang, H. A New Framework for Winter Wheat Yield Prediction Integrating Deep Learning and Bayesian Optimization. Agronomy 2022, 12, 3194. [Google Scholar] [CrossRef]
  12. Joshi, A.; Pradhan, B.; Chakraborty, S.; Behera, M.D. Winter Wheat Yield Prediction in the Conterminous United States Using Solar-Induced Chlorophyll Fluorescence Data and XGBoost and Random Forest Algorithm. Ecol. Inform. 2023, 77, 102194. [Google Scholar] [CrossRef]
  13. Zeng, L.; Peng, G.; Meng, R.; Man, J.; Li, W.; Xu, B.; Lv, Z.; Sun, R. Wheat Yield Prediction Based on Unmanned Aerial Vehicles-Collected Red–Green–Blue Imagery. Remote Sens. 2021, 13, 2937. [Google Scholar] [CrossRef]
  14. Wang, J.; Si, H.; Gao, Z.; Shi, L. Winter Wheat Yield Prediction Using an LSTM Model from MODIS LAI Products. Agriculture 2022, 12, 1707. [Google Scholar] [CrossRef]
  15. Schumacher, B.L.; Burchfield, E.K.; Bean, B.; Yost, M.A. Leveraging Important Covariate Groups for Corn Yield Prediction. Agriculture 2023, 13, 618. [Google Scholar] [CrossRef]
  16. Kešelj, K.; Stamenković, Z.; Kostić, M.; Aćin, V.; Tekić, D.; Novaković, T.; Ivanišević, M.; Ivezić, A.; Magazin, N. Machine Learning (AutoML)-Driven Wheat Yield Prediction for European Varieties: Enhanced Accuracy Using Multispectral UAV Data. Agriculture 2025, 15, 1534. [Google Scholar] [CrossRef]
  17. Wilber, A.L.; Czarnecki, J.M.P.; McRudy, J.D. An ArcGIS Pro workflow to extract vegetation indices from aerial imagery of small-plot turfgrass research. Crop Sci. 2021, 62, 503–511. [Google Scholar] [CrossRef]
  18. Lukas, V.; Hunady, I.; Kintl, A.; Mezera, J.; Hammerschmiedt, T.; Sobotková, J.; Brtnický, M.; Elbl, J. Using UAV to Identify the Optimal Vegetation Index for Yield Prediction of Oil Seed Rape (Brassica napus L.) at the Flowering Stage. Remote Sens. 2022, 14, 4953. [Google Scholar] [CrossRef]
  19. Aldubai, A.A.; Alsadon, A.A.; Al-Gaadi, K.A.; Tola, E.; Ibrahim, A.A. Utilizing spectral vegetation indices for yield assessment of tomato genotypes grown in arid conditions. Saudi J. Biol. Sci. 2022, 29, 2506–2513. [Google Scholar] [CrossRef]
  20. Qi, H.; Wu, Z.; Zhang, L.; Li, J.; Zhou, J.; Jun, Z.; Zhu, B. Monitoring of peanut leaves chlorophyll content based on drone-based multispectral image feature extraction. Comput. Electron. Agric. 2021, 187, 106292. [Google Scholar] [CrossRef]
  21. Herrmann, I.; Bdolach, E.; Montekyo, Y.; Rachmilevitch, S.; Townsend, P.A.; Karnieli, A. Assessment of maize yield and phenology by drone-mounted superspectral camera. Precis. Agric. 2019, 20, 1016–1035. [Google Scholar] [CrossRef]
  22. Nadjla, B.; Assia, S.; Ahmed, Z. Contribution of spectral indices of chlorophyll (RECl GCI) in the analysis of multi-temporal mutations of cultivated land in the Mostaganem plateau. In Proceedings of the 2022 7th International Conference on Image and Signal Processing and their Applications (ISPA), Mostaganem, Algeria, 8–9 May 2022; IEEE: Piscataway, NJ, USA, 2022. [Google Scholar] [CrossRef]
  23. Shashikant, V.; Mohamed Shariff, A.R.; Wayayok, A.; Kamal, M.R.; Lee, Y.P.; Takeuchi, W. Utilizing TVDI and NDWI to Classify Severity of Agricultural Drought in Chuping, Malaysia. Agronomy 2021, 11, 1243. [Google Scholar] [CrossRef]
  24. Main, R.; Cho, M.A.; Mathieu, R.; O’Kennedy, M.M.; Ramoelo, A.; Koch, S. An investigation into robust spectral indices for leaf chlorophyll estimation. ISPRS J. Photogramm. Remote Sens. 2011, 66, 751–761. [Google Scholar] [CrossRef]
  25. Dorigo, W.A.; Zurita-Milla, R.; deWit, A.J.W.; Brazile, J.; Singh, R.; Schaepman, M.E. A review on reflective remote sensing and data assimilation techniques for enhanced agroecosystem modeling. Int. J. Appl. Earth Obs. Geoinf. 2007, 9, 165–193. [Google Scholar] [CrossRef]
  26. Prananda, A.R.A.; Kamal, M.; Wijaya Kusuma, D. The effect of using different vegetation indices for mangrove leaf area index modelling. IOP Conf. Ser. Earth Environ. Sci. 2020, 500, 012006. [Google Scholar] [CrossRef]
  27. Ravi, P.S. Determining In-Season Nitrogen Requirements for Corn Using Aerial Color-Infrared Photography. Ph.D. Thesis, North Carolina State University, Raleigh, NC, USA, 2005. Available online: https://repository.lib.ncsu.edu/handle/1840.16/4200 (accessed on 29 September 2024).
  28. Hunt, E.R., Jr.; Daughtry, C.S.T.; Eitel, J.U.H.; Long, D.S. Remote sensing leaf chlorophyll content using a visible band index. Agron. J. 2011, 103, 1090–1099. [Google Scholar] [CrossRef]
  29. Sims, D.A.; Gamon, J.A. Relationships between leaf pigment content and spectral reflectance across a wide range of species, leaf structures and developmental stages. Remote Sens. Environ. 2002, 81, 337–354. [Google Scholar] [CrossRef]
  30. Pu, R.; Gong, P.; Yu, Q. Comparative Analysis of EO-1 ALI and Hyperion, and Landsat ETM+ Data for Mapping Forest Crown Closure and Leaf Area Index. Sensors 2008, 8, 3744–3766. [Google Scholar] [CrossRef]
  31. Bannari, A.; Asalhi, H.; Teillet, P.M. Transformed difference vegetation index (TDVI) for vegetation cover mapping. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Toronto, ON, Canada, 24–28 June 2002; IEEE: Piscataway, NJ, USA, 2002; pp. 3053–3055. [Google Scholar] [CrossRef]
  32. Gitelson, A.A.; Kaufman, Y.J.; Stark, R.; Rundquist, D. Novel algorithms for remote estimation of vegetation fraction. Remote Sens. Environ. 2002, 80, 76–87. [Google Scholar] [CrossRef]
  33. Wang, F.; Huang, J.; Tang, Y.; Wang, X. New Vegetation Index and Its Application in Estimating Leaf Area Index of Rice. Rice Sci. 2007, 14, 195–203. [Google Scholar] [CrossRef]
  34. Bannari, A.; Morin, D.; Bonn, F.; Huete, A.R. A review of vegetation indices. Remote Sens. Rev. 1995, 13, 95–120. [Google Scholar] [CrossRef]
  35. Daughtry, C.S.T.; Walthall, C.L.; Kim, M.S.; de Colstoun, E.B.; McMurtrey, J.E., III. Estimating corn leaf chlorophyll concentration from leaf and canopy reflectance. Remote Sens. Environ. 2000, 74, 229–239. [Google Scholar] [CrossRef]
  36. Ehammer, A.; Fritsch, S.; Conrad, C.; Lamers, J.; Dech, S. Statistical derivation of fPAR and LAI for irrigated cotton and rice in arid Uzbekistan by combining multi-temporal RapidEye data and ground measurements. Remote Sens. 2010, 7824, 782409. [Google Scholar] [CrossRef]
  37. Zhang, H.; Li, J.; Liu, Q.; Lin, S.; Huete, A.; Liu, L.; Croft, H.; Clevers, J.G.P.W.; Zeng, Y.; Wang, X.; et al. A novel red-edge spectral index for retrieving the leaf chlorophyll content. Methods Ecol. Evol. 2022, 13, 2771–2787. [Google Scholar] [CrossRef]
  38. Clevers, J.G.P.W.; Gitelson, A.A. Remote estimation of crop and grass chlorophyll and nitrogen content using red-edge bands on Sentinel-2 and -3. Int. J. Appl. Earth Obs. Geoinf. 2013, 23, 344–351. [Google Scholar] [CrossRef]
  39. Srivastava, A.K.; Safaei, N.; Khaki, S.; Lopez, G.; Zeng, W.; Ewert, F.; Gaiser, T.; Rahimi, J. Winter Wheat Yield Prediction Using Convolutional Neural Networks from Environmental and Phenological Data. Sci. Rep. 2022, 12, 3215. [Google Scholar] [CrossRef]
Figure 1. Structured Workflow for Multispectral UAV Data Processing and Yield Estimation.
Figure 2. Study Site Overview: Experimental Plot Arrangement and Coordinates.
Figure 3. Coefficient of determination (R2) for the five top-performing regression models across 15 measurement dates.
Figure 4. Root mean squared error (RMSE) for the five top-performing regression models across 15 measurement dates.
Figure 5. Average Training Time per Model Across 15 Measurements.
Table 1. Phenological Stages of Wheat and Measurement Timeline in the 2023 Growing Season.
| Date (2023) | Phenological Stage | Physiological Status (Summary) |
|---|---|---|
| 21 February 2023 | End of winter dormancy/early stem elongation (BBCH 20–29) | Resumption of vegetative growth after winter dormancy; initiation of tiller activity and early biomass accumulation. |
| 8 March 2023 | Early stem elongation (BBCH 20–29) | Active vegetative growth with increasing leaf area and stem elongation; photosynthetic activity intensifies. |
| 14 March 2023 | Early stem elongation (BBCH 20–29) | Continued canopy expansion and biomass accumulation; crop architecture becomes more developed. |
| 20 March 2023 | First node visible (BBCH 31) | Onset of rapid stem elongation; structural differentiation of the main stem begins. |
| 29 March 2023 | Second node visible (BBCH 32) | Accelerated stem growth and internode elongation; increasing demand for nutrients and assimilates. |
| 6 April 2023 | Third node visible (BBCH 33–34) | Advanced stem elongation; canopy structure becomes more uniform and vertically developed. |
| 12 April 2023 | Flag leaf sheath extending, head developing (BBCH 37–39) | Flag leaf emergence; establishment of maximum photosynthetically active leaf area. |
| 22 April 2023 | Head development, pre-booting (BBCH 49–50) | Transitional phase between vegetative and reproductive development; increased structural and spectral variability. |
| 27 April 2023 | Heading begins (BBCH 51–55) | Emergence of the ear from the flag leaf sheath; onset of reproductive development. |
| 5 May 2023 | Heading complete (BBCH 59) | Complete ear emergence; canopy reaches peak structural complexity. |
| 10 May 2023 | Flowering begins (BBCH 61) | Start of anthesis; initiation of grain number determination. |
| 22 May 2023 | Full flowering (BBCH 65–69) | Peak canopy development and physiological activity; strong coupling between biomass, chlorophyll content, and yield formation. |
| 1 June 2023 | Beginning of milk stage (BBCH 71–73) | Onset of grain filling; assimilates actively translocated from vegetative organs to grains. |
| 13 June 2023 | Early dough stage (BBCH 80–85) | Advanced grain filling; gradual decline in leaf photosynthetic activity. |
| 23 June 2023 | Late dough stage, approaching maturity (BBCH 87–89) | Final grain development and onset of senescence; reduced canopy greenness and spectral response. |
Table 2. Temperature, precipitation, and sky conditions during UAV-based multispectral data acquisition (2023).
| Date (2023) | Avg. Temp. (°C) | Max Temp. (°C) | Min Temp. (°C) | Precipitation (mm) | Sky Conditions |
|---|---|---|---|---|---|
| 21 February 2023 | 10 | 18 | 3 | 0 | Mostly sunny |
| 8 March 2023 | 11 | 14 | 7 | 0 | Sunny |
| 14 March 2023 | 12 | 20 | 6 | 0 | Mostly sunny |
| 20 March 2023 | 14 | 20 | 7 | 0 | Sunny |
| 29 March 2023 | 13 | 20 | 4 | 0 | Sunny |
| 6 April 2023 | 8 | 10 | −1 | 0 | Sunny |
| 12 April 2023 | 11 | 17 | 5 | 0 | Sunny |
| 22 April 2023 | 15 | 22 | 9 | 0 | Sunny |
| 27 April 2023 | 10 | 16 | 2 | 0 | Mostly sunny |
| 5 May 2023 | 18 | 24 | 9 | 0 | Mostly sunny |
| 10 May 2023 | 15 | 20 | 7 | 0 | Sunny |
| 22 May 2023 | 21 | 27 | 12 | 0 | Sunny |
| 1 June 2023 | 20 | 26 | 15 | 0 | Sunny |
| 13 June 2023 | 18 | 25 | 12 | 0 | Sunny |
| 23 June 2023 | 29 | 35 | 21 | 0 | Sunny |
Table 3. Yield Variability Under Different Mineral Fertilization Treatments.
(N, P, K columns give the fertilization application rate in t/ha; variety columns give the yield range with the mean in parentheses, t/ha.)

| Combination | N (t/ha) | P (t/ha) | K (t/ha) | NS Epoha | NS Futura | NS Igra | NS Obala | NS Rajna |
|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 3.58–5.18 (4.12) | 2.55–3.26 (2.95) | 2.96–4.33 (3.38) | 2.42–3.92 (2.94) | 2.31–3.03 (2.65) |
| 2 | 0.10 | 0 | 0 | 7.09–8.86 (8.04) | 6.21–7.76 (6.97) | 5.54–9.37 (7.77) | 5.20–7.89 (7.07) | 5.06–7.72 (6.96) |
| 3 | 0 | 0.10 | 0 | 3.79–6.84 (5.30) | 3.58–3.91 (3.76) | 3.07–4.45 (3.86) | 3.64–4.66 (3.95) | 3.09–3.45 (3.26) |
| 4 | 0 | 0 | 0.10 | 2.87–6.08 (4.21) | 2.31–3.06 (2.55) | 2.94–3.10 (3.03) | 2.49–3.46 (2.87) | 2.26–3.18 (2.62) |
| 5 | 0.10 | 0.10 | 0 | 8.24–9.18 (8.61) | 6.90–7.80 (7.34) | 8.36–8.88 (8.65) | 7.44–8.18 (7.86) | 7.25–8.08 (7.71) |
| 6 | 0.10 | 0 | 0.10 | 7.77–8.81 (8.40) | 7.07–7.50 (7.25) | 7.55–8.71 (8.33) | 7.15–8.59 (7.91) | 6.63–7.62 (7.33) |
| 7 | 0 | 0.10 | 0.10 | 2.82–7.50 (5.29) | 2.59–4.05 (3.23) | 2.73–4.68 (3.88) | 3.11–5.22 (3.77) | 2.60–3.98 (3.42) |
| 8 | 0.05 | 0.05 | 0.05 | 6.93–7.89 (7.30) | 6.11–6.70 (6.28) | 6.75–7.42 (7.02) | 5.78–6.15 (5.97) | 4.61–5.94 (5.42) |
| 9 | 0.05 | 0.10 | 0.05 | 7.57–8.19 (7.92) | 6.53–6.72 (6.61) | 7.70–8.65 (8.21) | 6.08–7.47 (7.06) | 5.27–6.88 (6.03) |
| 10 | 0.05 | 0.10 | 0.10 | 5.94–8.31 (7.45) | 5.71–6.76 (6.28) | 6.75–8.07 (7.32) | 5.66–7.41 (6.49) | 5.27–6.28 (5.79) |
| 11 | 0.10 | 0.05 | 0.05 | 8.31–8.83 (8.60) | 6.89–7.76 (7.32) | 7.97–9.06 (8.73) | 7.77–8.09 (7.92) | 7.48–7.74 (7.58) |
| 12 | 0.10 | 0.10 | 0.05 | 8.50–8.98 (8.78) | 7.10–7.72 (7.39) | 8.61–9.20 (8.86) | 7.67–8.08 (7.92) | 7.26–7.63 (7.44) |
| 13 | 0.10 | 0.10 | 0.10 | 8.51–8.86 (8.73) | 6.39–7.76 (7.18) | 8.79–9.12 (8.90) | 7.86–8.14 (8.00) | 7.66–7.91 (7.82) |
| 14 | 0.10 | 0.15 | 0.05 | 7.86–9.00 (8.61) | 7.43–7.72 (7.59) | 8.74–9.65 (9.09) | 7.84–8.32 (8.08) | 7.08–7.92 (7.68) |
| 15 | 0.10 | 0.15 | 0.15 | 7.89–9.39 (8.85) | 7.26–8.08 (7.61) | 8.82–9.16 (9.01) | 7.99–8.36 (8.15) | 7.53–7.86 (7.69) |
| 16 | 0.15 | 0.05 | 0.05 | 8.55–9.22 (8.93) | 6.89–7.54 (7.22) | 7.97–9.52 (8.73) | 7.75–8.40 (8.10) | 7.62–7.84 (7.73) |
| 17 | 0.15 | 0.10 | 0.05 | 8.38–9.43 (8.83) | 6.65–8.09 (7.07) | 8.03–9.27 (8.76) | 7.85–8.51 (8.08) | 7.31–7.74 (7.50) |
| 18 | 0.15 | 0.10 | 0.10 | 8.23–9.40 (8.88) | 6.78–7.53 (7.10) | 8.60–9.04 (8.88) | 7.67–8.30 (7.95) | 7.30–7.90 (7.61) |
| 19 | 0.15 | 0.15 | 0.10 | 7.68–9.27 (8.65) | 6.49–6.77 (6.65) | 8.22–9.10 (8.63) | 7.78–8.54 (8.18) | 7.24–8.07 (7.57) |
| 20 | 0.15 | 0.15 | 0.15 | 7.20–9.28 (8.51) | 6.39–7.72 (7.05) | 8.19–8.76 (8.41) | 7.96–8.59 (8.25) | 7.51–7.94 (7.68) |
Table 4. DJI Phantom 4 Multispectral Camera Sensor Characteristics [16].
| Band | Center Wavelength/nm | Bandwidth/nm |
|---|---|---|
| Blue (B) | 450 | 16 |
| Green (G) | 560 | 16 |
| Red (R) | 650 | 16 |
| Red Edge (RE) | 730 | 16 |
| Near-infrared (NIR) | 840 | 26 |
Table 5. Equations and References for Vegetation Indices derived from RGB Bands [9].
| Index Type | Formula |
|---|---|
| Red (R), Green (G), Blue (B) | Numerical values |
| Normalized Red | r = Red / (Red + Green + Blue) |
| Normalized Green | g = Green / (Red + Green + Blue) |
| Normalized Blue | b = Blue / (Red + Green + Blue) |
| Green–Red Ratio Index | GRRI = Green / Red |
| Green–Blue Ratio Index | GBRI = Green / Blue |
| Red–Blue Ratio Index | RBRI = Red / Blue |
| Excess Red Vegetation Index | ExR = 1.4 × r − g |
| Excess Green Vegetation Index | ExG = 2 × g − r − b |
| Excess Blue Vegetation Index | ExB = 1.4 × b − g |
| Excess Green Minus Excess Red Index | ExGR = ExG − ExR |
| Woebbecke Index | WI = (Green − Blue) / (Green + Red) |
| Normalized Difference Index | NDI = (r − g) / (r + g + 0.01) |
| Color Intensity | INT = (Red + Blue + Green) / 3 |
| Green Leaf Index 1 | GLI1 = (2 × Green − Red − Blue) / (2 × Green + Red + Blue) |
| Green Leaf Index 2 | GLI2 = (2 × Green − Red + Blue) / (2 × Green + Red + Blue) |
| Vegetative Index | VEG = Green / (Red^(2/3) × Blue^(1/3)) |
| Color Index of Vegetation | CIVE = 0.441 × r − 0.811 × g + 0.3856 × b + 18.79 |
| Combination | COM = 0.25 × ExG + 0.3 × ExGR + 0.33 × CIVE + 0.12 × VEG |
| Normalized Green–Red Vegetation Index | NGRVI = (Green − Red) / (Green + Red) |
| Kawashima Index | IKAW = (Red − Blue) / (Red + Blue) |
| Visible Band Difference Vegetation Index | VDVI = (2 × g − r − b) / (2 × g + r + b) |
| Visible Atmospherically Resistant Index | VARI = (g − r) / (g + r − b) |
| Principal Component Analysis Index | IPCA = 0.994 × (Red − Blue) + 0.961 × (Green − Blue) + 0.914 × (Green − Red) |
| Modified Green–Red Vegetation Index | MGRVI = (Green² − Red²) / (Green² + Red²) |
| Red–Green–Blue Vegetation Index | RGBVI = (Green² − Blue × Red) / (Green² + Blue × Red) |
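Several of the RGB-derived indices in Table 5 follow directly from the normalized chromatic coordinates; a minimal sketch for the excess-green family, using hypothetical reflectance values:

```python
def rgb_indices(red, green, blue):
    """Chromatic coordinates and the excess-green family of indices (Table 5)."""
    total = red + green + blue
    r, g, b = red / total, green / total, blue / total
    exg = 2 * g - r - b        # Excess Green Vegetation Index
    exr = 1.4 * r - g          # Excess Red Vegetation Index
    return {"ExG": exg, "ExR": exr, "ExGR": exg - exr}

# Illustrative per-pixel reflectances (hypothetical values)
idx = rgb_indices(red=0.20, green=0.35, blue=0.15)
```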
Table 6. Equations and References for Vegetation Indices derived from Multispectral Bands.
| Index Type | Formula | Reference |
|---|---|---|
| Normalized Difference Vegetation Index | NDVI = (NIR − Red) / (NIR + Red) | [18] |
| Renormalized Difference Vegetation Index | RDVI = (NIR − Red) / (NIR + Red)^(1/2) | [19] |
| Difference Vegetation Index | DVI = NIR − Red | [20] |
| Blue Normalized Difference Vegetation Index | BNDVI = (NIR − Blue) / (NIR + Blue) | [18] |
| Green Normalized Difference Vegetation Index | GNDVI = (NIR − Green) / (NIR + Green) | [21] |
| Modified Soil-Adjusted Vegetation Index | MSAVI = (2 × NIR + 1 − √((2 × NIR + 1)² − 8 × (NIR − Red))) / 2 | [10] |
| Red-Edge Chlorophyll Vegetation Index | ReCI = NIR / Red − 1 | [22] |
| Normalized Difference Red Edge Index | NDRE = (NIR − RedEdge) / (NIR + RedEdge) | [21] |
| Normalized Difference Water Index | NDWI = (Green − NIR) / (Green + NIR) | [23] |
| Optimized Soil-Adjusted Vegetation Index | OSAVI = (1 + 0.16) × (NIR − Red) / (NIR + Red + 0.16) | [21] |
| Simple Ratio | SR = NIR / Red | [21] |
| Modified Simple Ratio | MSR = (NIR / Red − 1) / ((NIR / Red)^(1/2) + 1) | [24] |
| Infrared Percentage Vegetation Index | IPVI = NIR / (NIR + Red) | [19] |
| Enhanced Vegetation Index | EVI = 2.5 × (NIR − Red) / (NIR + 6 × Red − 7.5 × Blue + 1) | [25] |
| Green Atmospherically Resistant Vegetation Index | GARI = (NIR − (Green − (Blue − Red))) / (NIR − (Green + (Blue − Red))) | [26] |
| Soil-Adjusted Vegetation Index | SAVI = (NIR − Red) / (NIR + Red + 0.5) × (1 + 0.5) | [10] |
| Green Soil-Adjusted Vegetation Index | GSAVI = (NIR − Green) / (NIR + Green + 0.5) × (1 + 0.5) | [27] |
| Green Optimized Soil-Adjusted Vegetation Index | GOSAVI = (NIR − Green) / (NIR + Green + 0.16) | [27] |
| Green Chlorophyll Vegetation Index | GCI = NIR / Green − 1 | [28] |
| Plant Senescence Reflectance Index | PSRI = (Red − Green) / NIR | [29] |
| Nonlinear Vegetation Index | NLI = (NIR² − Red) / (NIR² + Red) | [30] |
| Transformed Difference Vegetation Index | TDVI = 1.5 × (NIR − Red) / √(NIR² + Red + 0.5) | [31] |
| Visible Atmospherically Resistant Index | VARI = (Green − Red) / (Green + Red − Blue) | [32] |
| Wide Dynamic Range Vegetation Index | WDRVI = (0.1 × NIR − Red) / (0.1 × NIR + Red) | [26] |
| Green–Red NDVI | GRNDVI = (NIR − (Green + Red)) / (NIR + Green + Red) | [24] |
| Green–Blue NDVI | GBNDVI = (NIR − (Green + Blue)) / (NIR + Green + Blue) | [33] |
| Red–Blue NDVI | RBNDVI = (NIR − (Red + Blue)) / (NIR + Red + Blue) | [33] |
| Pan NDVI | PNDVI = (NIR − (Green + Red + Blue)) / (NIR + Green + Red + Blue) | [33] |
| Simple Ratio Red/NIR Vegetation Index | ISR = Red / NIR | [34] |
| Leaf Chlorophyll Index | LCI = (NIR − RedEdge) / (NIR + Red) | [20] |
| Ratio Between NIR and Green Bands | VI(NIR/Green) = NIR / Green | [35] |
| Ratio Between NIR and Red Edge Bands | VI(NIR/RedEdge) = NIR / RedEdge | [36] |
| Simplified Canopy Chlorophyll Content Index | SCCCI = NDRE / NDVI | [9] |
| Modified Chlorophyll Absorption Reflectance Index | MCARI = ((RedEdge − Red) − 0.2 × (RedEdge − Green)) × (RedEdge / Red) | [20] |
| Transformed Chlorophyll Absorption Reflectance Index | TCARI = 3 × ((RedEdge − Red) − 0.2 × (RedEdge − Green) × (RedEdge / Red)) | [21] |
| Structure Insensitive Pigment Index | SIPI = (NIR − Blue) / (NIR − Red) | [24] |
| TCARI/OSAVI | TCARI / OSAVI | [24] |
| MCARI/OSAVI | MCARI / OSAVI | [24] |
| Red-Edge Chlorophyll Index 1 | CI1 = NIR / RedEdge − 1 | [37] |
| Red-Edge Chlorophyll Index 2 | CI2 = RedEdge / Green − 1 | [38] |
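The multispectral indices in Table 6 reduce to simple per-pixel band arithmetic; a minimal sketch for two widely used ones (NDVI and NDRE), with hypothetical reflectance values:

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index (Table 6)."""
    return (nir - red) / (nir + red)

def ndre(nir, red_edge):
    """Normalized Difference Red Edge Index (Table 6)."""
    return (nir - red_edge) / (nir + red_edge)

# Illustrative canopy reflectances (hypothetical values)
v1 = ndvi(nir=0.60, red=0.10)
v2 = ndre(nir=0.60, red_edge=0.45)
```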
Table 7. List of Machine Learning Models Used for Yield Prediction [16].
| Model Type | Model Abbreviation |
|---|---|
| Linear Regression | lr |
| Lasso Regression | lasso |
| Ridge Regression | ridge |
| Elastic Net | en |
| Least Angle Regression | lar |
| Lasso Least Angle Regression | llar |
| Orthogonal Matching Pursuit | omp |
| Bayesian Ridge | br |
| Automatic Relevance Determination | ard |
| Passive Aggressive Regressor | par |
| Random Sample Consensus | ransac |
| TheilSen Regressor | tr |
| Huber Regressor | huber |
| Kernel Ridge | kr |
| Support Vector Regression (SVR) | svr |
| K Neighbors Regressor | knn |
| Decision Tree Regressor | dt |
| Random Forest Regressor | rf |
| Extra Trees Regressor | et |
| AdaBoost Regressor | ada |
| Gradient Boosting Regressor | gbr |
| MLP Regressor | mlp |
| Extreme Gradient Boosting | xgboost |
| Light Gradient Boosting Machine | lightgbm |
| Dummy Regressor | dummy |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Kešelj, K.; Stamenković, Z.; Kostić, M.; Aćin, V.; Ivezić, A.; Ivanišević, M.; Magazin, N. Temporal Dynamics of UAV Multispectral Vegetation Indices for Accurate Machine Learning-Based Wheat Yield Prediction. AgriEngineering 2026, 8, 71. https://doi.org/10.3390/agriengineering8020071
