Wheat Yield Prediction Using Machine Learning Method Based on UAV Remote Sensing Data

Abstract: Accurate forecasting of crop yields holds paramount importance in guiding decision-making processes related to breeding efforts. Despite significant advancements in crop yield forecasting, existing methods often struggle with integrating diverse sensor data and achieving high prediction accuracy under varying environmental conditions. This study focused on the application of multi-sensor data fusion and machine learning algorithms based on unmanned aerial vehicles (UAVs) in wheat yield prediction. Five machine learning (ML) algorithms, namely random forest (RF), partial least squares (PLS), ridge regression (RR), k-nearest neighbor (KNN) and extreme gradient boosting decision tree (XGboost), were utilized for multi-sensor data fusion, together with three ensemble methods including the second-level ensemble methods (stacking and feature-weighted) and the third-level ensemble method (simple average), for wheat yield prediction. The 270 wheat hybrids were used as planting materials under full and limited irrigation treatments. A cost-effective multi-sensor UAV platform, equipped with red–green–blue (RGB), multispectral (MS), and thermal infrared (TIR) sensors, was utilized to gather remote sensing data. The results revealed that the XGboost algorithm exhibited outstanding performance in multi-sensor data fusion, with the RGB + MS + Texture + TIR combination demonstrating the highest fusion performance (R² = 0.660, RMSE = 0.754). Compared with the single ML model, the employment of three ensemble methods significantly enhanced the accuracy of wheat yield prediction. Notably, the third-layer simple average ensemble method demonstrated superior performance (R² = 0.733, RMSE = 0.668 t ha⁻¹). It significantly outperformed both the second-layer ensemble methods of stacking (R² = 0.668, RMSE = 0.673 t ha⁻¹) and feature-weighted (R² = 0.667, RMSE = 0.674 t ha⁻¹), thereby exhibiting superior predictive capabilities. This finding highlighted the third-layer ensemble method's ability to enhance predictive capabilities and refined the accuracy of wheat yield prediction through simple average ensemble learning, offering a novel perspective for crop yield prediction and breeding selection.


Introduction
Wheat stands as one of the most vital crops globally, with approximately 35-40% of the world's population relying on it as a primary food source. It contributes approximately 21% of food energy and 20% of protein intake. Given the backdrop of population growth and climate change, the early and accurate prediction of wheat yield holds utmost importance for safeguarding national food security and maintaining people's living standards [1,2]. Conventionally, the yield prediction method has primarily been dependent on field observation and investigation, which are not only time-consuming and laborious processes but also susceptible to subjective biases, and can even result in crop damage [3]. In recent years, remote sensing technology has gained widespread application in the domain of agricultural monitoring. This technology enables the effective acquisition of canopy spectral data from aerial sources, thereby facilitating the prediction of crop yields [4,5]. Furthermore, unmanned aerial vehicle (UAV)-based remote sensing technology has witnessed rapid development, owing to its distinctive advantages of flexibility and high resolution [6].
The vegetation index (VI) derived from UAV images has demonstrated its effectiveness in predicting crop yields. Spectral, structural, thermal infrared (TIR), and texture features extracted from UAV-collected datasets through sensors can be utilized to assess various plant traits and structures [7]. For instance, low-altitude UAVs were employed to capture RGB imaging data of potato canopies at two distinct growth stages, to predict yields [8]. The normalized difference vegetation index (NDVI), monitored swiftly with a multispectral (MS) UAV platform during the wheat filling stage, exhibited a strong correlation with wheat grain yield [9]. Texture information extracted from UAV images can effectively reflect the spatial variations in pixel intensity, thereby emphasizing the structural and geometric characteristics of the plant canopy [10]. The potential of UAV TIR imaging technology for assessing crop water stress and predicting wheat kernel yield in different wheat varieties has also been thoroughly validated [11]. However, the majority of studies rely solely on data from a single sensor to predict crop yields, overlooking the advantages of combining multiple sensors. For example, by combining the features derived from MS, RGB, and TIR imaging, the accuracy of soybean yield prediction can be significantly improved [7]. The combination of canopy TIR information with spectral and structural characteristics can improve the robustness of crop yield prediction across diverse climatic conditions and developmental stages [12]. In particular, the application of machine learning (ML) techniques to the analysis of multi-sensor data collected by UAVs can significantly enhance the accuracy of crop yield predictions [13]. On this basis, to fully harness the potential of ML algorithms, machine learning was combined with the VIs extracted from the spectral images of the sensors to build a yield prediction model that provides strong support for the relevant practices of precision agriculture [14,15].
At present, a variety of machine learning methods have been applied to yield predictions, such as random forest (RF) [16], partial least squares (PLS) [17], ridge regression (RR) [18], k-nearest neighbor (KNN) [19] and extreme gradient boosting decision tree (XGboost) [20]. However, predictions by the same model may vary significantly across different crops and environments, primarily due to the quality of data, the representation of the model, and the dependencies between input and target variables within the collected dataset [21]. If the data are biased or if the chosen model exhibits overfitting to the respective dataset, the model will fail to demonstrate accurate performance [22]. Ensemble learning, a research hotspot, is proposed to address these challenges. Its objective is to integrate data fusion, data modeling, and data mining into a cohesive framework [23]. The ensemble learning paradigm known as stacked regression involves linearly combining various predictors to enhance prediction accuracy [24,25]. The feature-weighted ensemble method assigns weights according to the correlation of features and predicts the degree of correlation between each feature and the extracted output model [26,27]. In this study, we employ a feature-weighted ensemble learning approach that assigns weights to the training dataset generated by the primary learner, based on the prediction accuracy of each individual learner. Subsequently, utilizing these weighted data, the meta-learner is trained to enhance the overall model's learning efficiency. To further optimize the model performance, we introduce a novel ensemble method in the third layer, specifically the simple average ensemble method. The method calculates the average values of the predictions of the stacking ensemble method and the feature-weighted ensemble method on the test set and compares them with the actual measured values to realize the effect of the third-layer ensemble learning.
This study utilized remote sensing data collected by UAVs 21 days after wheat flowering to predict wheat yield. The main objectives were (1) to evaluate the data based on UAV-derived RGB, MS, Texture, and TIR, along with multi-sensor data fusion, for multidimensional feature analysis to understand the relationship between crop growth status and yield, and to verify the applicability of various classic machine learning algorithms in agriculture, and (2) to apply stacking, feature-weighted, and simple average ensemble methods to enhance the accuracy of wheat yield prediction, providing new technical approaches for the implementation of precision agriculture.

Experiment Location and Design
Two hundred and seventy recombinant inbred lines (RILs) from the cross Zhongmai 578/Jimai 22 were planted at the research site of the Chinese Academy of Agricultural Sciences (35°18′0″ N, 113°52′0″ E) in Xinxiang, Henan province, China during the 2021-2022 growing season (Figure 1). The experimental design was a randomized complete block with three replicates. The experimental protocol included two distinct irrigation treatments: full and limited irrigation. Both treatments received two irrigations during the seedling and overwintering stages. The full irrigation treatment was supplemented with additional flooding at the greening, jointing and early grain filling stages. Each plot area was 3.6 m² (1.2 m × 3 m) and consisted of six rows spaced 0.20 m apart, resulting in a total of 1620 plots across the study area (270 lines × 3 replicates × 2 treatments; Figure 1). The planting density was maintained at 270 plants/m², and agricultural management was performed according to local conditions. After maturity, the harvest was conducted using a combine harvester. The seeds were weighed after drying to a moisture content of less than 12.5%.

Multi-Sensor Image Acquisition and Processing Based on UAV
Data acquisition for all traits was performed by a UAV platform M210 (SZ DJI Technology Co., Shenzhen, China). The RGB and TIR sensors were housed in the same camera (Zenmuse XT2, SZ DJI Technology Co., Shenzhen, China), with image resolutions of 4000 × 3000 and 640 × 512 pixels, respectively. The MS sensor (Red-Edge MX camera, MicaSense, Seattle, WA, USA) captured images of identical resolution (1280 × 960 pixels) in five bands, including blue, green, red, red edge and near infrared (NIR), at wavelengths of 475 nm, 560 nm, 668 nm, 717 nm and 842 nm, respectively. The aerial surveys were carried out at 21 days post-anthesis due to the proven high accuracy of yield predictions during this period [13]. All flight tasks were carried out from 10:00 to 14:00 in clear skies, using DJI Pilot 2.3.15 software to set route parameters as follows: the forward and side overlap were 90% and 85%, respectively, and the flight altitude was 30 m.
In this study, the Pix4D Mapper Pro 4.5.6 software (Pix4D, Lausanne, Switzerland) was used to perform radiometric correction and image stitching on RGB, TIR and MS images obtained by UAV, and the visible and TIR orthophoto images and the five-band orthophoto reflectance map were obtained. The obtained images with spectral reflectance were imported into ArcGIS 10.8.1 (Environmental Systems Research Institute, Inc., Redlands, CA, USA) software for image cropping; each plot was selected as the area of interest, and the features were extracted to calculate the different VIs used in this study. The detailed process is shown in Figure 2. To minimize the noise impact on the images and enhance the efficiency of subsequent processing steps, it was necessary to exclude non-target areas from the acquired MS images. The Pix4D Mapper Pro 4.5.6 software was utilized to perform image stitching, shading correction, and digital number (DN) processing on the filtered MS data, ultimately converting them into a TIFF image format with spectral reflectance. Radiometric calibration was conducted prior to and following each flight using a dedicated calibration plate. Subsequently, the TIR data were calibrated based on the blackbody reference to determine the temperature corresponding to each pixel value in the TIR imagery.

Extraction of Vegetation and Texture Index
As a metric for evaluating physiological parameters of crops, VIs could effectively reflect the real-time growth level of crops [28]. The selected 10 color indices and 11 MS VIs provided a robust assessment of the vegetation's physiological condition, ranging from greenness and biomass to senescence and pigment content, as shown in Table 1.
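As a minimal illustration of how such indices are derived from per-plot band values, the sketch below computes NDVI and VARI from their standard definitions. The function names and the sample reflectance values are hypothetical and are not taken from the study; Table 1 remains the authoritative list of the indices actually used.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index: (NIR - R) / (NIR + R)."""
    return (nir - red) / (nir + red)


def vari(red: float, green: float, blue: float) -> float:
    """Visible atmospherically resistant index: (G - R) / (G + R - B)."""
    return (green - red) / (green + red - blue)


# Hypothetical mean band values of one plot (illustrative only).
plot = {"blue": 0.04, "green": 0.08, "red": 0.05, "nir": 0.45}

print(round(ndvi(plot["nir"], plot["red"]), 3))
print(round(vari(plot["red"], plot["green"], plot["blue"]), 3))
```

In practice the same functions would be applied to the mean band value of every plot polygon, giving one index value per plot and per index.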
In addition to spectral information, texture features as another important type of remote sensing information were less susceptible to external environmental factors. They reflected the grayscale nature of the image and its spatial relationships, thereby enhancing the inversion accuracy of single spectral information sources that may suffer from saturation issues. Furthermore, texture features enhanced the potential for inverting physicochemical parameters to a certain extent [29]. In ENVI 5.3, the widely utilized gray-level co-occurrence matrix (GLCM) was used to extract 40 texture features for the RGB-based R, G, B bands and MS-based red-edge and NIR bands, which included parameters such as angular second moment, contrast, correlation, sum average, sum variance, and others. Then, the region of interest was delimited for the texture feature images of each band in ArcGIS 10.8.1 (Figure 2).
Principal component analysis (PCA) is a data mining technique in multivariate statistics. It converts high-dimensional data into low-dimensional data through dimensionality reduction, while preserving the majority of the information within the data [30]. Through principal component analysis, we concluded that while using eight principal components could capture most of the variance, it did not significantly increase the accuracy of our model. Considering the trade-off between retaining texture information and achieving computational efficiency with PCA, we transformed the original 40 texture features into 3 new principal components. These principal components were linear combinations of the original features and encapsulated the most significant variance in the dataset. By using PCA, we effectively condensed the information from the 40 texture features into a more manageable and interpretable form, without losing the essential characteristics of the texture. This approach not only simplified our analysis but also enhanced our ability to identify and interpret the most meaningful texture features that contribute to the physiological assessment of wheat. Consequently, these three principal components could be regarded as representative of the most significant texture features within the dataset (Figure 2).
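The reduction from 40 texture features to 3 principal components can be sketched as follows. This is a minimal NumPy illustration, not the software used in the study: it assumes the features are standardized first (PCA on the correlation matrix), and the function name `pca_scores` and the synthetic input matrix are hypothetical.

```python
import numpy as np


def pca_scores(features: np.ndarray, n_components: int = 3):
    """Return component scores and explained-variance ratios.

    `features` has shape (n_plots, n_features). Columns are standardized
    first, so this corresponds to PCA on the correlation matrix.
    """
    z = (features - features.mean(axis=0)) / features.std(axis=0, ddof=1)
    # SVD of the standardized matrix: rows of vt are the component loadings.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    eigenvalues = s**2 / (z.shape[0] - 1)
    ratios = eigenvalues / eigenvalues.sum()
    scores = z @ vt[:n_components].T
    return scores, ratios[:n_components]


# Synthetic stand-in for the 40 GLCM texture features of 1620 plots.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1620, 3))
mixing = rng.normal(size=(3, 40))
texture = latent @ mixing + 0.3 * rng.normal(size=(1620, 40))

scores, ratios = pca_scores(texture)
print(scores.shape)      # one row per plot, one column per component
print(ratios.round(3))   # share of total variance captured by each component
```

The three score columns would then replace the 40 original texture features as model inputs.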

Ensemble Learning Framework
This study employed five ML algorithms, RF, PLS, RR, KNN, and XGboost, as base learners to explore the accuracy of yield prediction through ensemble learning and the integration of multi-sensor data.
In ML, each algorithm possesses its distinct strengths. Ensemble learning achieves superior generalization performance by harnessing the combined advantages of various machine learning algorithms [23]. This study proposed three ensemble learning methods in total. The first method was stacking regression, which is a heterogeneous ensemble learning model first introduced by Wolpert in 1992 [49], capable of enhancing the accuracy of yield prediction [50]. The objective of this method was to integrate the predictive strengths of the five base models. Initially, the dataset was partitioned into an 80% training subset and a 20% testing subset. Each base model was then trained independently on the training subset, utilizing a 10-fold cross-validation approach, and their respective predictions were generated for the testing subset. Subsequently, these prediction results were employed as input features for the meta-model. RR served as the regression algorithm for the meta-model, tasked with learning to effectively integrate the predictions of the various base models in order to generate a final ensemble prediction. Throughout the training process, cross-validation techniques were employed to fine-tune the hyperparameters of the meta-model, with the ultimate goal of bolstering its generalization capabilities. Upon completion of the training phase, the refined stacking model was then utilized to predict outcomes for the test set, subsequently enabling a thorough evaluation of the model's overall performance (Figure 3). The second approach was feature-weighted ensemble learning. By assigning weights to each base learner that reflected their predictive power, it significantly enhanced the model's accuracy and generalizability [51]. Each base model underwent training on the training set, and the coefficient of determination (R²) for each base model was computed using the testing set. Subsequently, the R² values served as the foundation for allocating weights (Figure 3).
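The stacking procedure described above can be sketched as follows. This is a simplified illustration under several assumptions: it uses the common out-of-fold variant of stacking, two hand-written base learners (a closed-form ridge regression and a k-nearest-neighbor regressor) stand in for the five base models, hyperparameters are fixed rather than tuned, and the data are synthetic. All function names are hypothetical.

```python
import numpy as np


def ridge_fit(x, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc = x - x_mean
    coef = np.linalg.solve(xc.T @ xc + alpha * np.eye(x.shape[1]), xc.T @ (y - y_mean))
    return lambda x_new: (x_new - x_mean) @ coef + y_mean


def knn_fit(x, y, k=5):
    """k-nearest-neighbor regression using Euclidean distance."""
    def predict(x_new):
        dist = np.linalg.norm(x_new[:, None, :] - x[None, :, :], axis=2)
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict


def stack(x_train, y_train, x_test, base_fitters, n_folds=10, seed=0):
    """Train a ridge meta-model on out-of-fold base predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y_train)), n_folds)
    oof = np.zeros((len(y_train), len(base_fitters)))
    for held_out in folds:
        keep = np.setdiff1d(np.arange(len(y_train)), held_out)
        for j, fit in enumerate(base_fitters):
            oof[held_out, j] = fit(x_train[keep], y_train[keep])(x_train[held_out])
    meta = ridge_fit(oof, y_train)
    # Refit each base learner on the full training subset for test-time features.
    test_features = np.column_stack([fit(x_train, y_train)(x_test) for fit in base_fitters])
    return meta(test_features)


# Synthetic data standing in for plot-level features and yield.
rng = np.random.default_rng(1)
x = rng.normal(size=(300, 4))
y = x @ np.array([1.0, -0.5, 0.3, 0.0]) + 0.1 * rng.normal(size=300)
split = int(0.8 * len(y))  # 80% training, 20% testing
pred = stack(x[:split], y[:split], x[split:], [ridge_fit, knn_fit])
print(pred.shape)
```

Training the meta-model on out-of-fold predictions keeps it from seeing base-model outputs for samples those models were fitted on, which is the usual motivation for cross-validation inside stacking.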
The third approach proposed in this study was simple average ensemble learning, characterized by its ease of implementation and enhanced predictive stability. It took the average of the predictive results obtained from stacking regression and feature-weighted ensemble methods on the test set, thereby enhancing accuracy and generalizability while reducing the risk of overfitting. Then, the R² score was computed between the averaged predictions and the true values of the testing set (Figure 3).
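The third-layer step amounts to an element-wise mean of the two second-layer prediction vectors. A minimal sketch, with hypothetical prediction values and a hypothetical function name:

```python
def simple_average(pred_a, pred_b):
    """Element-wise mean of two prediction lists of equal length."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have the same length")
    return [(a + b) / 2 for a, b in zip(pred_a, pred_b)]


# Hypothetical test-set predictions (t/ha) from the two second-layer ensembles.
stacking_pred = [9.8, 10.4, 8.9, 9.5]
weighted_pred = [9.6, 10.6, 9.1, 9.3]

third_layer_pred = simple_average(stacking_pred, weighted_pred)
print([round(p, 2) for p in third_layer_pred])
```

Because the average has no trainable parameters, this layer cannot itself overfit the test set; any gain comes from the errors of the two second-layer models partly cancelling.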

Model Performance Evaluation
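In this study, R², root-mean-square error (RMSE) and normalized root-mean-square error (NRMSE) were selected as the indexes to evaluate the prediction accuracy of the base learner [52]. The formulas are as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{\bar{y}} \times 100\%$$

where $y_i$ and $\hat{y}_i$ are measured and predicted values of wheat yield, respectively, $\bar{y}$ is the mean value of measured yield, and $n$ is the sample size.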
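The three evaluation metrics can be computed as in the sketch below. It uses the standard definitions and assumes that NRMSE is the RMSE expressed as a percentage of the mean measured yield; the function names and sample values are hypothetical.

```python
import math


def r2_score(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root-mean-square error, in the units of the measured values."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / len(measured))


def nrmse(measured, predicted):
    """RMSE as a percentage of the mean measured value (assumed normalization)."""
    return 100 * rmse(measured, predicted) / (sum(measured) / len(measured))


# Hypothetical measured and predicted yields (t/ha).
measured = [9.0, 10.0, 11.0, 10.0]
predicted = [9.5, 10.0, 10.5, 10.0]

print(round(r2_score(measured, predicted), 3))  # 0.75
print(round(rmse(measured, predicted), 4))      # 0.3536
print(round(nrmse(measured, predicted), 3))     # 3.536
```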
The weight allocation formula [26] is as follows:

$$w_i = \frac{R_i^2}{\sum_{j=1}^{T} R_j^2}$$

where $w_i$ is the weight of the $i$th primary learner, $i = 1, 2, \ldots, T$; $T$ is the number of primary learners; $R_i^2$ is the R² of the $i$th primary learner; and $R_j^2$ is the R² of the $j$th primary learner. This formula transforms the R² scores of each base model into weights and ensures that the sum of all weights equals 1. Thus, a base model with stronger predictive performance is assigned a higher weight and contributes a larger proportion to the ensemble prediction.
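The weight allocation reduces to normalizing the R² scores so that they sum to 1. The sketch below assumes each weight is a learner's R² divided by the sum of all learners' R²; the R² values are hypothetical and chosen only for illustration.

```python
def r2_weights(r2_scores):
    """Map each learner's R2 to a weight; the weights sum to 1."""
    total = sum(r2_scores.values())
    return {name: score / total for name, score in r2_scores.items()}


# Hypothetical R2 values for the five base learners (illustrative only).
r2_by_model = {"RF": 0.65, "PLS": 0.64, "RR": 0.65, "KNN": 0.62, "XGboost": 0.66}

weights = r2_weights(r2_by_model)
for name, weight in weights.items():
    print(name, round(weight, 4))
```

When the base learners have similar R² values, as here, the weights stay close to 1/T, so the weighting only mildly favors the stronger learners.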

Principal Component Analysis of Texture Features
In analyzing the initial eigenvalue, variance contribution rate and cumulative variance contribution rate of the texture feature principal components (Table 2), we observed that the initial eigenvalues of the first, second, and third principal components exceeded 1, specifically 19.72, 11.13 and 3.09, respectively. The variance contribution rates were 49.30%, 27.80% and 7.70%, respectively, and the cumulative variance contribution rate amounted to 84.90%. This indicated that the first three principal components were capable of retaining 84.90% of the information from the original data. Consequently, the first three components were extracted as the principal components for the comprehensive evaluation of texture features. Figure 4 displays the loadings of the principal component analysis for the 40 texture features. The variance contributions of the first (PC1), second (PC2), and third (PC3) principal components were represented on the X-, Y- and Z-axes, respectively. The larger the absolute value of a variable's coefficient on a particular principal component, the greater its contribution to that component.
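The link between the eigenvalues and the contribution rates can be checked with simple arithmetic. The sketch assumes PCA was run on standardized features, so the total variance equals the number of features (40) and each contribution rate is the eigenvalue divided by 40. Under that assumption the results agree with the reported rates to within roughly 0.05 percentage points.

```python
eigenvalues = [19.72, 11.13, 3.09]  # first three eigenvalues reported in the text
n_features = 40  # total variance of 40 standardized texture features (assumption)

rates = [100 * ev / n_features for ev in eigenvalues]
cumulative = sum(rates)

print([round(r, 2) for r in rates])
print(round(cumulative, 2))
```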

Correlation Analysis of CI, VI, Texture Features and TIR with Wheat Yield
The results of the Pearson's correlation coefficient (r) analysis between wheat yield and the 10 color indices (CIs), 11 VIs, 3 texture principal components and the thermal infrared index are shown in Figure 5. The absolute correlation between CIs and wheat yield ranged from r = 0.13 to r = 0.72. Among these, the highest correlation was observed with VARI (r = 0.72), while the lowest correlations were with PPR and GBRI (r = 0.13). Seven indices, IKAW, ExG, RGBVI, GLA, CIVE, RBRI and VARI, all exhibited correlations of 0.60 and above (r ≥ 0.60). The absolute correlation between VIs and wheat yield consistently approached 0.70, with RDVI and GOSAV showing the highest correlation (r = 0.70). The lowest correlation was observed with MTCI (r = 0.68). The texture features were assessed through their principal components. Among these, PC1 had the highest absolute correlation with wheat yield (r = 0.69), whereas the remaining components exhibited lower correlations. Notably, TIR also demonstrated a relatively high correlation (r = 0.68).
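For reference, the Pearson correlation coefficient between one feature and yield can be computed as in the sketch below. The function name and the sample index and yield values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-plot index values and measured yields (t/ha).
index_values = [0.21, 0.25, 0.30, 0.33, 0.38]
yields = [8.7, 9.4, 9.6, 10.3, 10.1]

r = pearson_r(index_values, yields)
print(round(abs(r), 2))  # the text reports absolute correlations
```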

Wheat Yield Prediction for Optimal Sensor
In this study, five regression algorithms (RF, PLS, RR, KNN, and XGboost) were employed, alongside three ensemble learning algorithms, to forecast wheat yield. These predictions were based on features extracted from three distinct types of sensors (RGB, MS, and TIR) and their various combinations, as depicted in Table 3 and Figure 6. Among the predicted results from the single data source, the fusion of two data sources, the fusion of three data sources and the fusion of four data sources, the highest total R² values, summed across the eight machine learning algorithms, were observed for Texture (R² = 4.773), Texture + TIR (R² = 4.934), RGB + Texture + TIR (R² = 5.153) and RGB + MS + Texture + TIR (R² = 5.238). Additionally, the total prediction error of the RGB + MS + Texture + TIR data fusion model was also the lowest, with RMSE = 5.546 t ha⁻¹ and NRMSE = 55.733%. Therefore, the RGB + MS + Texture + TIR data fusion yielded the most accurate predictions for wheat yield, surpassing single, dual and triple data source fusion. Specifically, its total R² was higher by 9.74% to 33.48%, 6.17% to 19.61% and 1.64% to 8.88% than that of single, dual and triple data source fusion, respectively. Furthermore, it demonstrated a lower total RMSE, decreasing by 7.53-17.72%, 5.12-16.07% and 3.23-6.97%, respectively. Similarly, the total NRMSE was reduced by 7.54-17.73%, 5.13-16.06% and 3.31-6.98%, respectively. In conclusion, the RGB + MS + Texture + TIR data fusion emerged as the most precise in estimating wheat yield. To further analyze the contribution of each data source, when compared with the three-source data fusion wheat yield predictions, adding the RGB, MS, Texture and TIR data improved the R² by 2.13-7.57%, 0.72-2.76%, 6.33-11.05%, and 3.90-7.15%, respectively. Among them, the Texture features contributed the most to the accuracy of wheat yield predictions.
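The percentage gains quoted above are relative changes. As a worked check, and assuming the gain is computed as (new − old) / old, the total R² of the four-source fusion (5.238) relative to the best single source (4.773) reproduces the reported lower bound of 9.74%. The function name is hypothetical.

```python
def relative_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return 100 * (new - old) / old


# Total R2 values reported in the text: four-source fusion vs. best single source.
print(round(relative_change(5.238, 4.773), 2))  # 9.74
```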

Optimal Machine Learning Algorithm for Wheat Yield Prediction
Based on the results above, the fusion data of RGB + MS + Texture + TIR demonstrated high accuracy in predicting wheat yield. Among the five base models, the RR model performed the best when using RGB data (R² = 0.517) and TIR data (R² = 0.490) as single data sources. Conversely, PLS exhibited the highest predictive value for MS data (R² = 0.534), while XGboost showed the highest predictive value for Texture data (R² = 0.593). After the fusion of multi-sensor data, the prediction accuracy of most machine learning models was notably enhanced. The findings indicated that XGboost emerged as the top-performing predictive machine learning model, achieving an R² value of 0.660 (Table 3). The analysis results for the models on different data combinations are depicted in Figure 6. The R² value of XGboost was observed to be 0.011, 0.014, 0.0053, and 0.044 higher than RF, PLS, RR, and KNN, respectively. Furthermore, the XGboost model exhibited smaller errors in terms of RMSE and NRMSE. Specifically, its RMSE was lower than that of the other four models by 0.010, 0.013, 0.005, and 0.040, respectively, while the NRMSE was lower by 0.104, 0.131, 0.053, and 0.399, respectively. These findings further confirm the superiority of XGboost in wheat yield prediction, followed by RR.
Compared with the basic model, three ensemble methods were used in this study, including two second-layer ensemble methods (stacking and feature-weighted methods) and one third-layer ensemble method (simple average method). The analysis results are shown in Table 3. All three ensemble methods demonstrated higher model prediction accuracy compared to the single ML model. When compared to the single ML model that performed best on single sensor data, stacking, feature-weighted and simple average ensemble learning increased the R² values of the single sensor by 1.53-2.16%, 0.50-2.67% and 14.33-21.26%, respectively. Additionally, RMSE was reduced by 0.81-1.48%, 0.33-1.55% and 1.10-1.65%, respectively, while NRMSE was reduced by 0.83-1.51%, 0.37-1.54% and 1.08-1.66%, respectively.
Compared with the single ML model exhibiting the best performance in the optimal combination of multi-source data fusion (RGB + MS + Texture + TIR), the prediction accuracy of the three ensemble learning methods was also superior, with R² higher by 1.23%, 1.07% and 11.01%, respectively. Additionally, the RMSE was reduced by 1.19%, 1.03% and 1.97%, respectively, while NRMSE decreased by 1.20%, 1.04% and 1.68%, respectively. These results confirmed the effectiveness of the ensemble learning model.
Additionally, Figure 7 illustrates that the R² of the simple average ensemble model shows a significant difference (p < 0.05) compared to the other models, being notably higher than that of the stacking ensemble model and the feature-weighted ensemble model, with increases of 1.121 and 1.157, respectively. Moreover, both RMSE and NRMSE were significantly lower in the simple average ensemble model compared to the other two ensemble models (p < 0.05). By comparing the correlation and linear fit between the predicted yield and measured yield of different ensemble methods under the optimal combination of RGB + MS + Texture + TIR (Figure 8), it was observed that there was a significant positive correlation between the yield predictions of the simple average ensemble method and the actual yields, demonstrating good fit and reliability. Therefore, it can be inferred that the simple average ensemble model was more accurate for wheat yield prediction.

Prediction of Wheat Yield from Single Sensor Data and Multi-Sensor Fusion Data
In this study, through the analysis of the single sensor prediction results, it was found that the wheat yield prediction accuracy ranked as follows: Texture > MS > RGB > TIR. Among them, texture features exhibited superior performance in wheat yield prediction accuracy, with R² values ranging from 0.539 to 0.593. This has been consistently demonstrated in studies across various sites and crops. The utilization of PCA in maize yield prediction effectively reduced the standard deviation of the prediction performance, thereby enhancing the accuracy of yield forecasts [53].
In Vietnam, the rice yield prediction model utilizing PCA-ML exhibited an average improvement of 18.5-45.0% compared to using ML alone. This outcome fully underscores the reliability and effectiveness of the combined model [54]. This indicates that the method combining PCA and ML effectively handles redundant data in multi-channel texture features, consequently leading to a significant enhancement in the accuracy of yield prediction.
The wheat yield prediction results from MS data were superior to those from RGB data, primarily due to their capability to capture spectral information across multiple bands from visible light to near infrared. Particularly, the near-infrared band provides the opportunity to accurately calculate VIs such as NDVI, which in turn can be utilized to better assess wheat yield. Furthermore, the stability of MS cameras across varying lighting conditions minimizes the influence of environmental fluctuations on prediction accuracy, ensuring the provision of reliable data for yield prediction [55,56]. The performance of TIR information extracted by TIR sensors was not satisfactory, with R² values ranging from 0.434 to 0.490. This finding aligns with the results reported by Luz and Elarab [57,58]. The possible explanation for this could be that canopy heat information is intricately linked to factors such as leaf water content, pigment concentration and canopy structural characteristics. If these factors are not appropriately controlled or corrected for during data processing, they can significantly impact the accuracy of yield predictions [7,59].
Multi-sensor fusion (RGB + MS + Texture + TIR) demonstrated clear advantages over single sensor prediction. By harnessing the capabilities of multiple sensors and integrating data from different sources, it provided a more comprehensive overview of crop growth information, thereby enhancing forecast accuracy [13].
However, it also poses challenges in terms of data processing and algorithm optimization. Future research efforts should focus on streamlining the fusion process and enhancing algorithm efficiency to achieve more reliable wheat yield prediction.

Application of Basic Model in Wheat Yield Prediction
Five basic models were employed for wheat yield forecasting. XGboost, as a novel ML algorithm, has demonstrated superior predictive capabilities compared to other models, such as RF [60]. RF has been favored by many researchers due to its capability of removing redundant information from spectral data and achieving higher inversion accuracy through a smaller set of spectral characteristic variables [60,61]. Indeed, the XGboost model exhibited exceptional performance in the wheat yield prediction task. This was primarily attributed to its innovative algorithm design and optimization strategy, which effectively minimized overfitting and reduced computational demands. Consequently, the model's generalization ability was significantly enhanced, leading to more accurate predictions [62]. This research result has been corroborated by Li et al., who confirmed that the XGboost model outperforms other models in soybean yield prediction when utilizing the same input data [63]. Furthermore, in the prediction of winter wheat yield, the XGboost model not only marginally exceeded the RF model in terms of prediction accuracy but also demonstrated significant superiority in computational efficiency in most scenarios. Notably, it requires less time, making it a more efficient and practical choice for yield prediction [64]. These results underscore the advantages of XGboost in processing large-scale agricultural data, particularly in situations where swift and efficient output predictions are imperative. The model's superior performance in terms of both accuracy and computational efficiency demonstrates its potential as a valuable tool for agricultural yield forecasting.
The PLS model exhibited the poorest performance in wheat yield prediction, both in single-sensor and multi-sensor data fusion scenarios. Although PLS is capable of addressing the issue of multicollinearity among independent variables, as the number of potential variables increases, the training model tends to overfit. This overfitting phenomenon adversely impacts the model's performance on new test data, limiting its accuracy and reliability for yield prediction tasks [65,66].

Performance of Ensemble Learning in Wheat Yield Prediction
Despite recent advances in ML methods and their successful application across many fields, a purely data-driven use of ML still has fundamental limitations. The accuracy and uncertainty of predictions generated by ML algorithms depend heavily on the quality of the data, the representativeness of the chosen model, and the dependencies between the input and target variables within the collected dataset [67]. Data that contain high levels of noise, erroneous information, outliers, biases, or missing values can significantly diminish the predictive capability of a machine learning model [21]. For this reason, this study incorporated three ensemble methods: stacking, feature-weighted, and simple average. Compared with any single model, the ensemble models achieved higher accuracy, a finding that aligns with previous research [13,68]. The R² values of the stacking ensemble method, which served as the second layer, were closely comparable to those of the feature-weighted ensemble method. The primary advantage of stacking lies in its ability to learn and exploit the complementarities among diverse base learners, thereby enhancing prediction accuracy [69]. However, because the performance of each primary learner varies, large output errors in some primary learners can introduce significant error features into the training of the meta-learner, which in turn can reduce the prediction accuracy of the entire model [70]. The other second-layer method, feature-weighted ensemble learning, corrects the prediction error of each primary learner. By doing so, it mitigates the poor prediction performance of individual models to some extent and generates a dataset that is more conducive to learner training [67]. Therefore, when the correlation among features within the data varies, it is prudent to select ensemble methods tailored to the specific characteristics of the dataset [71]. In summary, the prediction accuracy of the stacking and feature-weighted methods was comparable, likely because each approach offers its own advantages. Notably, the novel third-layer simple average ensemble method exhibited the highest R² value. This superior performance may be attributed to its ability to integrate prediction results from diverse methods, mitigating potential issues such as model disparities, variations in sample distribution, and inaccuracies in feature weights, and ultimately leading to enhanced prediction accuracy.
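The three-layer structure discussed above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the base learners are small NumPy stand-ins (two ridge models and a k-nearest-neighbour regressor) rather than RF, PLS, RR, KNN and XGboost, and the second-layer weighted ensemble uses inverse-error weights as an assumed placeholder, because the exact feature-weighting scheme is not specified in this section. All function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda Z: (Z - x_mean) @ w + y_mean

def knn_fit(X, y, k=5):
    """k-nearest-neighbour regression: mean target of the k closest training rows."""
    def predict(Z):
        dist = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

# Layer 1: illustrative base learners (assumption: stand-ins for the paper's five models).
BASE_LEARNERS = {
    "ridge_weak": lambda X, y: ridge_fit(X, y, 0.1),
    "ridge_strong": lambda X, y: ridge_fit(X, y, 10.0),
    "knn": lambda X, y: knn_fit(X, y, 5),
}

def out_of_fold(X, y, n_folds=5, seed=0):
    """Out-of-fold base predictions, so layer 2 is trained on unseen-sample outputs."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    oof = np.zeros((len(y), len(BASE_LEARNERS)))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        for j, fit in enumerate(BASE_LEARNERS.values()):
            oof[test_idx, j] = fit(X[train], y[train])(X[test_idx])
    return oof

def three_layer_predict(X_train, y_train, X_test):
    oof = out_of_fold(X_train, y_train)
    base_test = np.column_stack(
        [fit(X_train, y_train)(X_test) for fit in BASE_LEARNERS.values()]
    )
    # Layer 2a: stacking, with ridge regression as the meta-learner.
    stacking = ridge_fit(oof, y_train, 1.0)(base_test)
    # Layer 2b: weighted ensemble; inverse out-of-fold MSE weights are an assumption.
    inv_mse = 1.0 / ((oof - y_train[:, None]) ** 2).mean(axis=0)
    weighted = base_test @ (inv_mse / inv_mse.sum())
    # Layer 3: simple average of the two second-layer predictions.
    return {
        "stacking": stacking,
        "weighted": weighted,
        "simple_average": (stacking + weighted) / 2.0,
    }

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Calling `three_layer_predict(X_train, y_train, X_test)` with NumPy arrays returns the predictions of both second-layer ensembles and of their simple average. One property of the averaging step holds regardless of the data: by the triangle inequality, the RMSE of the averaged prediction cannot exceed the larger of the two second-layer RMSEs. This is consistent with, though it does not by itself explain, the gain reported for the third layer.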
With the rise of deep learning, architectures such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have demonstrated exceptional capabilities in handling large-scale, high-dimensional datasets [72]. In agricultural yield forecasting, deep learning models are expected to help overcome the constraints inherent in conventional methodologies. Such models have already facilitated the integration of diverse datasets, including satellite imagery, climatic data, and soil conditions, for holistic analysis aimed at forecasting crop production [73][74][75].

Conclusions
This study examined the capabilities of UAV multi-sensor data fusion and machine learning algorithms for wheat yield prediction. Three ensemble learning methods (stacking, feature-weighted, and simple average) were employed to improve the performance of the prediction model. The results demonstrated that these ensemble learning methods enhanced the accuracy of wheat yield prediction. By combining the strengths of different learners, the ensemble methods mitigated the risk of overfitting associated with individual models, thereby improving generalization ability. The introduction of the simple average as a third-layer ensemble represented a novel concept in wheat yield prediction. This method not only improved forecasting performance in a more robust and comprehensive manner, but also enhanced adaptability to data variations while maintaining high predictive accuracy. These ensemble learning methods are expected to become valuable tools for assessing the yield potential of diverse wheat genotypes. By providing a scientific basis and decision support, they can contribute to the development of superior wheat varieties with higher yield potential.

Figure 4. Principal component analysis loading plots for different texture features.

Figure 6. Comparison of the prediction accuracies of models for different sensors and their combinations. (a) R²; (b) RMSE; (c) NRMSE.

Figure 7. Comparison of the prediction accuracies of different ML algorithms. Note: The same letter indicates no significant difference between two groups of data, while different letters indicate significant differences (p < 0.05).

Figure 8. Comparison of ensemble learning prediction and measured yields.

Table 1. Vegetation index formulas for UAV images.

Table 2. Initial eigenvalues, variance contribution rates, and cumulative variance contribution rates of the texture feature principal components.

Table 3. Test accuracy statistics of different models for wheat yield prediction. TIR, thermal infrared features; RF, random forest; PLS, partial least squares; RR, ridge regression; KNN, k-nearest neighbor; XGboost, extreme gradient boosting decision tree; StRR, stacking ensemble using ridge regression as a secondary learner; En_FW, feature-weighted ensemble as a secondary learner; En_Mean, simple mean ensemble as a tertiary learner.