remote sensing Prediction of Field-Scale Wheat Yield Using Machine Learning Method and Multi-Spectral UAV Data

: Accurate prediction of food crop yield is of great signiﬁcance for global food security and regional trade stability. Since remote sensing data collected from unmanned aerial vehicle (UAV) platforms have the features of ﬂexibility and high resolution, these data can be used as samples to develop regional regression models for accurate prediction of crop yield at a ﬁeld scale. The primary objective of this study was to construct regional prediction models for winter wheat yield based on multi-spectral UAV data and machine learning methods. Six machine learning methods including Gaussian process regression (GPR), support vector machine regression (SVR) and random forest regression (RFR) were used for the construction of the yield prediction models. Ten vegetation indices (VIs) extracted from canopy spectral images of winter wheat acquired from a multi-spectral UAV at ﬁve key growth stages in Xuzhou City, Jiangsu Province, China in 2021 were selected as the variables of the models. In addition, in situ measurements of wheat yield were obtained in a destructive sampling manner for prediction algorithm modeling and validation. Prediction results of single growth stages showed that the optimal model was GPR constructed from extremely strong correlated VIs (ESCVIs) at the ﬁlling stage (R 2 = 0.87, RMSE = 49.22 g/m 2 , MAE = 42.74 g/m 2 ). The results of multiple stages showed GPR achieved the highest accuracy (R 2 = 0.88, RMSE = 49.18 g/m 2 , MAE = 42.57 g/m 2 ) when the ESCVIs of the ﬂowering and ﬁlling stages were used. Larger sampling plots were adopted to verify the accuracy of yield prediction; the results indicated that the GPR model has strong adaptability at different scales. These ﬁndings suggest that using machine learning methods and multi-spectral UAV data can accurately predict crop yield at the ﬁeld scale and deliver a valuable application reference for farm-scale ﬁeld crop management.


Introduction
Accurate prediction of grain crop yield is of great value for the formulation of food policies, the regulation of food prices and agricultural management, and is urgently needed for the development of precision agriculture [1]. Small and medium-sized farms are widely distributed in the world, accounting for more than 80% of global farms, and their food security cannot be guaranteed [2]. Predicting crop yields at small and medium-sized plots can help agricultural producers locate in advance target areas where crop yields are low due to abnormal crop health and low soil fertility. To enable early intervention measures to be taken, precise fertilization and management are required, which represents an effective way to maintain the normal level of crop yield and ensure food security [3]. The traditional method of predicting yield is primarily through field investigation, which is inefficient and even destructive to crops. With the diversity of remote sensing platforms and improvement in the spatial resolution of images, remote sensing techniques have been considered as effective methods for the monitoring of crop growth [4,5]. At present, satellite remote sensing technology is widely used in large-scale agricultural monitoring [6], but it has some shortcomings, such as long revisit periods, coarse resolution and ability to operate in limited meteorological conditions [7,8]. Low-altitude unmanned aerial vehicle (UAV) remote sensing has the advantages of improved spatio-temporal resolution, low operation cost, flexibility and repeatability [9,10]. It can quickly and efficiently acquire centimeterlevel remote sensing images of large areas of farmland and effectively assist agricultural operators in management and decision making [11].
Currently, the three typical optical sensors commonly carried by UAV platforms are RGB, multispectral and hyperspectral sensors. RGB cameras are low-cost, but the number of bands is small, and it is difficult to capture complex information contained in the spectrum of the crop canopy [12]. Although hyperspectral sensors have outstanding performance in the high precision characterization of spectral response, they are expensive and involve complicated data processing [13]. Multi-spectral sensors have recently attracted extensive attention from the agricultural quantitative remote sensing community because they are more economical and contain red edge and near-infrared bands, which are important for the estimation of agricultural parameters [14]. The band wavelength of multispectral imaging equipment is generally between 400 and 900 nm, mainly including blue, green, red, red edge and near-infrared. Different vegetation indices (VIs) obtained from different bands have been widely used to determine the values of crop biophysical parameters, which can accurately reflect details of crop growth and the response of the crop to stress (e.g., pests, diseases, temperature, soil, water, etc.) [10]. Many researchers have estimated crop physical and chemical parameters, as well as yield, based on VIs extracted from UAV multi-spectral images, which have achieved good results [15][16][17].
For crop yield prediction, several methods, including process-oriented crop growth models and empirical statistical models [18,19], have recently been developed. Processoriented crop models are also used to simulate and predict production around the world, such as the world food studies (WOFOST) model [20], the decision support system for agrotechnology transfer (DSSAT) [21], and the agricultural production systems simulator (APSIM) [22]. Although these models are effective in simulating crop yield, they rely on a large number of initial input variables with a priori knowledge, such as climate, soil, and hydrology. Moreover, they are hampered by the high cost of time and labor [23,24]. Traditional statistical regression predicts outputs by developing regression equations between meteorological variables (e.g., temperature, precipitation, solar radiation, etc.) and measured output at different times and different spatial scales [25,26]. These regression results clearly show the impact of climatic factors on yield. The factors that affect yield prediction vary, depending on geographic location, crop varieties and growing seasons [23].
As an immediate successor to statistical regression, machine learning can be used to analyze the hierarchical and non-linear relationships between predictor variables and response variables, and usually performs better than traditional linear regression methods regarding goodness of fit [27]. In addition, machine learning can effectively analyze spectral data and identify crop growth information and is widely used in physiological parameter estimation and crop yield prediction [28,29]. Therefore, many researchers have used UAV remote sensing data and machine learning methods to improve the accuracy of crop yield prediction models. For example, Fu et al. constructed prediction models for winter wheat yield using six machine learning methods including partial least squares regression (PLSR), and found that random forest regression (RFR) had the optimal prediction result in a model constructed using a normalized difference vegetation index (NDVI) (R 2 = 0.78) [30]. Maimaitijiang et al. used five machine learning methods to develop soybean yield predic-tion models and found that the highest accuracy was obtained for a deep neural network that provided an R 2 of 0.72 [31].
Currently, the prediction of many crop yields is based on fixed VIs with a single growth stage of crops [13,32]. Hence, the characteristics of crops at different growth stages are not taken into account [33]. According to previous research, a few studies predict crop yield using an optimal combination of some of the VIs of multiple growth stages. Intuitively determining the optimal VI combination of multiple growth stages can better reflect crop growth law and greatly improve the prediction models of crop yield [34,35]. Furthermore, it is often difficult to predict yield at field scale due to limited surface weather observations on small farms [27]. There are few studies on the stability of yield prediction models at different field scales, so it is difficult to ensure the stability of the model in a larger range of yield prediction. Therefore, the accurate prediction of crop yield at a farm field scale is crucial for optimizing the management of large-scale agricultural operations and improving food security [36].
Since wheat is one of the most widely cultivated food crops in the world and has very high economic value [37], winter wheat was selected for investigation in this study. Based on machine learning methods and the VIs extracted from multi-spectral UAV images, fieldscale yield prediction models were developed to better suit the study area and surrounding areas, and the accuracies of the prediction models constructed were subsequently evaluated. The main objectives of this study were to: (1) compare the accuracies of the six selected regression models and to identify the optimal model for single growth stages of winter wheat; (2) identify the prediction model constructed by the combination of the VIs at multiple stages that leads to the best accuracy; (3) verify the performance of the optimal prediction model in larger sampling plots, and also to evaluate the effectiveness of the optimal model in predicting winter wheat yield at field scale.

Study Area
The ground observation experiment was conducted at the Tongshan experimental station (117 • 23 48 E, 34 • 8 24 N; Figure 1) in Xuzhou City, Jiangsu province, China. Xuzhou is located in Huang-Huai-Hai Plain, the main winter wheat producing area in China. It is an important grain production base in north China and plays a very important strategic position in China's grain production pattern. The study area was located in the monsoon climate of medium latitudes, with an average annual temperature of 13.9 • C, an average annual frost-free period of 210 days and average annual precipitation of 868.6 mm. The main soil type in the study area is tidal soil. Note that the main planting structure is the rotation of winter wheat and summer corn. Wheat seeds (Xumai-33) were sown in early October 2020 and the wheat was harvested in early June 2021. An intensive cultivation method was adopted in the study area, and a base fertilizer plus topdressing method was used in fertilization. The basal fertilizer dosage was 1000 kg of organic fertilizer and 30 kg of NPK compound fertilizer (N:P:K = 15:15:15) and 10 kg of urea per mu; the topdressing dosage was 15 kg urea per mu in the middle and late March of 2021.

UAV Image Acquisition and Processing
In our experiments, image data were obtained using a multi-rotor UAV (DJI Phantom4 RTK Multispectral) at 100 m altitude with a 5.4 cm spatial resolution (focal length was 5.74 mm, flight speed was 7.5 m/s, the sideways overlap was 60%, the heading overlap was 75%). The UAV images were acquired on 8 April (jointing stage), 19 April (heading stage), 29 April (flowering stage), 20 May (filling stage) and 26 May (milk-ripe stage) in 2021, and the flights were conducted under stable sunlight conditions between 11 a.m. and 1 p.m. The software DJI GS PRO (https://www.dji.com/cn/ground-station-pro/, accessed on 2 June 2021) was used to design the route of the flights and observe the flight status. The multispectral DJI Innovations 4 camera onboard the UAV consisted of an RGB color sensor with a resolution of 1600 × 1300 and five single-band sensors with central wavelengths of 450 ± 16 (blue, B), 560 ± 16 (green, G), 650 ± 16 (red, R), 730 ± 16 (red edge, RE) and 840 ± 26 nm (near-infrared, NIR). After image data acquisition, the UAV images are required to be preprocessed with image mosaicing and radiometric calibration. The tag image file format (TIFF) images obtained from the UAV were recorded as a set of digital number (DN) values, which needed to be converted into reflectance through radiometric calibration [38]. Before each aerial shooting, the images of a calibration whiteboard (as a reference panel) on the ground were collected for radiometric calibration. Our experiment used the DN values, and the reflectivity of the calibration whiteboard area obtained before the UAV flight, to convert ground DN values into the corresponding reflectivity using the empirical linear correction method [39]. The aerial survey software Pix4Dmapper (Pix4DSA, Co. Ltd., Prilly, Switzerland) was used to complete the radiometric calibration of the multispectral images, and the image mosaic was completed at the same time. Finally, a high-resolution orthophoto of the winter wheat in the study area was obtained.

Yield Data Acquisition
At the mature stage of winter wheat, 119 of 1 m × 1 m quadrats ( Figure 1) with uniform growth were randomly selected from the fields in the study area. As shown in Figure 2, all the wheat of the quadrats was harvested (destructive sampling). Moreover, the yield of winter wheat samples was obtained after the processes of threshing, drying to constant weight and weighing (the unit is g/m 2 ). GPS real-time kinematic (RTK) positioning was used to obtain the central position of the sampling squares, and the UAV images of the quadrats region were clipped according to the central position of the quadrats and used to extract the VIs for the modeling of the winter wheat yield. The frequency distribution of the measured yield data of winter wheat is shown in Figure 3. It can be seen that the spatial heterogeneity of the wheat yield in different quadrats was significant. The lowest and highest yields of the winter wheat samples were 145.9 g/m 2 and 839.7 g/m 2 , respectively. The values of most of the samples were distributed between 400 and 700 g/m 2 , implying an approximately normal distribution and good representativeness.

Selection of VIs
It is known that VIs quantitatively indicate crop growth state and can be used for wheat yield prediction [40,41]. In this study, ten VIs obtained from UAV multispectral images were selected to construct winter wheat yield prediction models, and their related information is shown in Table 1. The ten VIs have been widely used in crop yield prediction [42,43], including green red ratio index (GRRI), green blue ratio index (GBRI), red blue ratio index (RBRI), normalized difference yellowness index (NDYI), NDVI, ratio vegetation index (RVI), modified triangular vegetation index (MTVI), enhanced vegetation index 2 (EVI2), modified soil adjusted vegetation index 2 (MSAVI2) and transformed chlorophyll absorption in reflectance index (TCARI). Each VI value of each quadrat was obtained by substituting the mean reflectivity of each band of the clipped quadrat image into the VI formulation in Table 1. Table 1. Selected vegetation indices (VIs) for the yield prediction of the winter wheat. [52] Each subfigure in Figure 4 shows the variation of each VI extracted from the multispectral images of 119 yield quadrats with the five crop growth stages. It can be seen that the VIs in most of the subfigures (except for the second, third, and fourth subfigures from left to right) showed a gradually decreasing or increasing trend with crop growth, and the VIs at the filling and milky-ripe stages were significantly different from the other stages in the same subfigure. In addition, the VIs at the flowering stage showed a notable difference from the first two stages. These phenomena reflect the changes in crops at different growth stages.

Machine Learning Methods for Yield Prediction
Six common machine learning algorithms were tested for the identification of the optimal prediction model of wheat yield, including Gaussian process regression (GPR), support vector machine regression (SVR), RFR, decision tree (DT), least absolute shrinkage and selection operator (Lasso), and gradient boost regression tree (GBRT). The hyper-parameters of these models were tuned to improve model accuracy through the GridSearchCV package in Python 3.7.
GPR is a nonlinear regression algorithm conforming to a generalized Gaussian probability distribution [53]. Compared with conventional regression methods, the parameter optimization of GPR is simpler and is often used for regression problems with small samples [54]. In this study, the DotProduct kernel function was applied in the GPR method to construct the yield prediction models. The SVR designs the linear classifier of the maximum decision boundary to ensure the minimum generalization error in the worst case, according to the principle of structural risk minimization [55]. The SVR model uses kernel functions to map the input into a high-dimensional space and then make it linearly separable to balance error minimization and overfitting [25]. In this study, the linear kernel function was applied in the SVR method. The RFR model integrates multiple decision trees through the bagging algorithm idea of ensemble learning, and is suitable for high-dimensional data for feature selection and machine learning [56]. The algorithm is more robust to noise immunity and less prone to overfitting problems [57]. For the RFR model, n_estimators, max_depth, min_samples_split were all optimally adjusted using the GridSearchCV package. Although the DT is a popular and effective modeling algorithm mainly for predicting simple data [58], it can be more easily overfitted compared with the RFR, and provides poor prediction results [59]. The Lasso method obtains a relatively refined model by constructing a penalty function and forcing the sum of the absolute values of some regression coefficients to be within a fixed value and setting some of the other regression coefficients to be zero [60]. Although this method can effectively eliminate significant and high correlation variables, the penalty function may lead to underfitting [27]. The GBRT method is a DT integrated learning algorithm with forward distribution [61]; it builds a new model through the current model and a fitting function to minimize the loss function, thus the accuracy of the model is gradually improved [62,63]. This algorithm can reduce the model learning rate to prevent overfitting, however, the constructed model is sensitive to parameter changes.

Evaluation Indexes of Models
The procedure followed for the six prediction models for winter wheat yield, constructed based on the ten VIs extracted from UAV multispectral image data and measured yield, is shown by the flowchart in Figure 5. A hold-out method was used to select the training and test samples [64]. All samples were divided into disjoint training and test samples in a ratio of 3:1. At the same time, we used a distribution ratio of training and test samples of roughly 3:1 in different yield intervals, in a similar way to stratified sampling, to avoid the phenomenon of uneven sample distribution. To avoid the influence of an inconsistent order of magnitude of the sample data, all the sample data were standardized before the modeling process started. In this study, the coefficient of determination (R 2 ), root mean square error (RMSE) and mean absolute error (MAE) were selected for the evaluation of the performance of the model, and their formulas are: where i (=1, 2, . . . , k) is the index of the sample; k is the number of the samples used in the modeling; y i is the measured winter wheat yield; y is the mean of all y i ; f i is the predicted winter wheat yield; f is the mean of all f i . The larger the R 2 value, the higher the accuracy of the model. A small RMSE or MAE value means a small discrepancy between the predicted yield and the measured yield.

Comparison of Model Accuracies at Different Stages
In this study, the measured yield for winter wheat samples (which are mentioned in Section 2.2.2) were taken as the truth of the yield, and the ten VIs at each growth stage were used as the input variables of the selected six regression models. Figure 6 shows the accuracy of the predictions made by these models at each single growth stage. It was found that the results of the GPR model were 0.77 ≤ R 2 ≤ 0.87, 49.41 g/m 2 ≤ RMSE ≤ 65.49 g/m 2 and 42.82 g/m 2 ≤ MAE ≤ 55.48 g/m 2 . The R 2 , RMSE and MAE values of the SVR and RFR models at the jointing, flowering and filling stages were all larger than 0.72, less than 71.60 g/m 2 and less than 64.07 g/m 2 , respectively. However, the prediction accuracies of DT, Lasso and GBRT were all low, and the R 2 , RMSE and MAE values at the jointing, heading, flowering and milk-ripe stages were all under 0.71, above 73.40 g/m 2 and above 63.97 g/m 2 , respectively. These results imply that GPR, SVR and RFR were more suitable for yield prediction than the other three; thus, these three models were considered as the candidates for the identification of the optimal model. The training results of GPR, SVR and RFR are presented in the form of scatter plots of yield prediction of the models and measured yield at each growth stage, see Figure 7. The plots show good linear fitting between the yield predictions and measurements. The RMSEs and MAEs of the three models were all lower than 78.20 g/m 2 and 64.07 g/m 2 , respectively, showing high prediction accuracy. Moreover, it was found that GPR outperformed SVR and RFR with respect to comparison of the prediction accuracies of the above three models at the same growth stage. Among the accuracies of the same model at different growth stages, the prediction accuracy at the filling stage was the best.

Correlation Analysis
In this section, the Pearson correlation coefficient (R) between each of the ten selected VIs and the yield at each stage is determined, and the correlation is used to identify both the growth stages and the combination of some of the ten VIs that led to the best accuracy. The Pearson correlation coefficient is often used to evaluate the degree of correlation between two variables. Its advantage lies in the averaging of the two variables, which reduces the influence of the numerical difference between the two variables on the similarity of the two variables. The formula of R between yield and each VI at each growth stage is: where i (=1, 2, . . . , k) is the index of the sample; k is the number of the samples used in the modeling; x i is the value of the VI at the growth stage; x is the mean of all x i ; y i is the measured yield; y is the mean of all y i . The closer |R| is to 1, the stronger the correlation between the VI and the yield. Generally, if the correlation values are in the range 0.8-1.0, 0.6-0.8, 0.4-0.6, 0.2-0.4, and 0.0-0.2, then they are regarded to be an extremely strong correlation, a strong correlation, a moderate correlation, a weak correlation, and a very weak or no correlation, respectively [65].  Figure 8 shows the absolute value of R between yield and each of the ten VIs at each growth stage. We found that, at the filling stage, the VIs of RBRI, RVI, NDVI, MTVI, EVI2, MSAVI2 and the yield were extremely strong correlated, which can be called extremely strong correlated VIs (ESCVIs). At the flowering stage, RVI, MTVI, EVI2, MSAVI2 were ESCVIs. The VIs of the other three growth stages were mainly moderately or strongly correlated with the yield. It can be concluded that the correlation between each of the VIs and the yield was generally highest at the filling stage, and higher at the flowering stage than the other three stages. This explains why the accuracy of the selected six models at the filling stage was the highest among the five growth stages in Section 3.1. Previous studies have shown that the correlations between each VI and crop yield at the flowering and filling stages are particularly close [66,67], which is consistent with the results in this section.

Yield Prediction for Multiple Stages
To determine the correlation between each VI and crop yield at each growth stage, in this section, the GPR, SVR and RFR methods are used to build yield models for multiple stages and different combinations of the VIs. First, all the VIs at the flowering and filling stages were input into the model training simultaneously. Secondly, the ESCVIs of Section 3.2.1 at either the flowering stage (RVI, MTVI, EVI2, MSAVI2) or the filling stage (RBRI, RVI, NDVI, MTVI, EVI2, MSAVI2) were input into model training. Finally, all the ESCVIs of the two stages were used for the model training. In this experiment, 89 samples were randomly selected for the training phase and the remaining 30 samples were used for the verification phase. The accuracy of the models constructed with two combinations of the VIs at each of the two stages and their merged stages (i.e., flowering and filling stages) are shown in Table 2. It can be seen from Table 2 that the R 2 , RMSE and MAE values of the GPR, SVR and RFR models were in the range 0.75-0.88, 49.18-67.66 g/m 2 , 42.57-53.08 g/m 2 , respectively. In terms of RMSE and MAE, GPR slightly outperformed SVR at all three stages mentioned in this section. At the flowering stage, SVR effectively improved its performance from R 2 = 0.77, RMSE = 64.34 g/m 2 and MAE = 52.32 g/m 2 to R 2 = 0.82, RMSE = 59.26 g/m 2 and MAE = 49.65 g/m 2 using the ESCVIs (consisting of RVI, MTVI, EVI2, and MSAVI2). Different from the flowering stage, the three models at the filling stage using the ESCVIs (consisting of RBRI, RVI, NDVI, MTVI, EVI2 and MSAVI2) slightly outperformed the models using all the VIs. Moreover, when jointly using VIs at the flowering & filling stages, Table 2 shows that the accuracy of the above three models based on all the VIs was not significantly improved. However, the accuracies of the models based on the ESCVIs were slightly improved at this stage, especially the GPR model (R 2 = 0.88, RMSE = 49.18 g/m 2 , MAE = 42.57 g/m 2 ) and the RFR model (R 2 = 0.86, RMSE = 52.88 g/m 2 , MAE = 43.42 g/m 2 ). All the above results imply that the models that were based on the ESCVIs, rather than all the VIs, significantly improved their accuracies at the flowering stage. In addition, the accuracies of the predictions made by the three models were also improved at the filling and flowering & filling stages, but the improvement was insignificant.

Yield Prediction in Large Plots
To evaluate the adaptability of the optimal model at different scales, ten 3 m × 3 m and seven 5 m × 5 m sampling plots were selected for testing. These plots were in the field shown in Figure 1, and their distribution is shown in Figure 9. To obtain the mean yield of each plot, the three (or five 1 m × 1 m sampling squares) shown in this figure were randomly selected for the sampling of the plot. The mean yield of the three (or five) sampling squares in the plot was used as the reference (or truth) for the yield of the plot. Based on the results presented in Section 3.2 that the GPR model constructed by the ESCVIs at the flowering and filling stages was the optimal model, this model was used to predict the mean yield of winter wheat in each plot; the comparisons between the predicted values and the truth of each plot are shown in Table 3.  Table 3. Comparison of yield prediction accuracy of the GPR model constructed by combining the ESCVIs at the flowering and filling stages in different sized plots. Boldface represents the best performance. It can be seen from Table 3 that the RMSE of the predicted yields of all the 3 m × 3 m sampling plots was 55.95 g/m 2 , which was smaller than the RMSE of 63.44 g/m 2 of the 1 m × 1 m sampling squares (see all the green points in Figure 9) by 7.49 g/m 2 . Meanwhile, the MAE of the predicted yields of all the 3 m × 3 m sampling plots was smaller than the MAE of the 1 m × 1 m sampling squares by 12.24 g/m 2 . The yield sampling points in the 5 m × 5 m sampling plots were evenly distributed within the sampling plots, and the RMSE and MAE of the yield prediction were smaller than that of 1 m × 1 m plots by 23.68 g/m 2 and 15.49 g/m 2 , respectively. The reason is assumed to be that the prediction errors of wheat yield in large sampling plots will cancel each other, which is consistent with previous studies [64]. From Table 3, the difference between the predicted mean yield and the measured mean yield was small, which verifies the strong adaptability of the GPR model constructed by combining the ESCVIs at the flowering and filling stages at different scales.

Spatial Distribution of Predicted Yield
In this section, the accuracy of the predictions made by the optimal model (i.e., the GPR model constructed by the ESCVIs at the flowering and filling stages) for an area that was expanded from the area covered by the sampling points (in Figure 1) is evaluated. Since the true yield of the expanded area is unknown, the spatial distribution of the predicted yield was compared with that of the UAV orthoimage. Figure 10 shows the results, where a, b, c, d, e, f, and g are the feature areas selected for analysis of their accuracy, and the minimum unit of prediction was 1 m × 1 m. It should be noted that the reason for the selection of the filling stage in (B) is that model accuracy at the filling stage was higher than at the flowering stage.  Figure 10, we found that the spatial characteristics of the yield predictions shown in (A) agree with the (horizontal) line seeding texture of the sowing machine shown in (B), especially in the feature areas b and d. The yield predictions are mainly distributed in the range 400-700 g/m 2 , which is consistent with the measured yields of the region. Those areas where the yield was above 700 g/m 2 (in orange) were mainly concentrated in the eastern region (areas c, e, and g), while the areas that yielded under 400 g/m 2 were concentrated in a and f. During the period of sampling, we found that the planting density in the eastern part of the plot was larger than the western part, while the growth in the western part was obviously not as good as in the eastern part. Moreover, bare soil could be directly observed in the f area in Figure 10B. Therefore, the spatial distribution of the predicted yield was roughly consistent with the actual situation.

Comparing (A) and (B) in
To some extent, the predicted spatial distribution of yield reflected the yield distribution of winter wheat in farmland. The actual yield of winter wheat was higher in the region with higher yield prediction, which was roughly consistent with the field survey. According to Figure 10, the areas with low yield (area a and f) can be found in advance, associated with insufficient nitrogen application, diseases and insect pests, etc., and prevention and control can be carried out in advance to maintain the normal level of yield and ensure food security.

Discussion
Many studies have found that ground spatial heterogeneity and the physiological characteristics of crops vary with crop growth stage [68][69][70]. Figure 6 indicates that the ten selected VIs at different stages may lead to significantly different accuracies of yield predicted by a model, which is consistent with previous studies. The jointing stage is the key growth stage for determining the numbers of panicles and amount of grain, but the VIs at this stage cannot reflect the dry matter accumulation process of yield forming organs, resulting in low prediction accuracy. The filling stage is the stage when starch, protein and other organic matter produced by photosynthesis are transferred from vegetative organs to grains [71]. The VIs at the filling stage directly reflect the final growth state of winter wheat and are closely related to the thousand-grain weight, thus yield prediction accuracy is highest at the filling stage. As can be seen from Figure 7, the R 2 values for the prediction models constructed by GPR, SVR and RFR machine learning methods at the filling stage were 0.87, 0.86 and 0.83, respectively, showing the highest accuracy among all the stages, which is consistent with previous studies [71,72]. Due to the gradual transfer of nutrients from canopy leaves and stems to grains, chlorophyll content in leaves decreases, and the correlation between the VIs based on red and near-infrared wavelengths and dry matter accumulation in grains decreases, thus the accuracy of the model decreased at the later stage of crops (i.e., the milk-ripe stage) [33].
This study also constructed yield prediction models for the VIs at multiple stages. When all ten VIs were used as the input variables, the accuracy of the models was lower than that of a single stage, which may be caused by the multicollinearity of the input VIs and the low correlation between some of the VIs and the yield. Some studies have also shown that the VIs at all stages have certain errors, thus the accumulated VIs for multiple stages may have accumulated errors in the VI variables, which may decrease model accuracy [73]. It is worth mentioning that in this study, when the ESCVIs at the flowering and filling stages were used as input variables, the yield prediction accuracy was improved to some extent, compared with the same model but at a single stage, which is consistent with previous studies [66,67].
The prediction of wheat yield at the field scale has significant practical significance for wheat production and planting planning. In this study, the spatial resolution of UAV remote sensing data was 5.4 cm, the area size of the sampling square was 1 m 2 , and the growth of wheat in the selected quadrat was relatively uniform. Therefore, the mean of the VIs of the quadrat was adopted for the VIs of the quadrat. It is worth mentioning that 3 m × 3 m and 5 m × 5 m sampling plots have rarely been tested in previous studies. It can be seen from Table 3 that the prediction error will not increase if 1 m 2 sampling squares fully represent the mean yield of each sampling plot. Thus, the optimal model (i.e., the GPR model constructed by the ESCVIs at the flowering and filling stages) can be applied to the yield predictions at a larger field scale. Previous studies have shown that the larger the spatial scale, the more positive and negative errors of the predictions will be canceled, and the higher the accuracy of the yield predictions [74]. It can be seen that UAV remote sensing data has a high degree of generality, and with increase in field size, the actual yield fluctuates less to a certain extent. Therefore, the yield prediction model has strong adaptability at different field scales, and this study can provide a reference and technical support for farm managers in field management and regulation.
Some uncertainties remain in this study. First, there are uncertainties in data quality, such as GPS RTK recording drift errors in the central position of the sampling squares and artificial harvesting and these measurement errors potentially create uncertainties in yield prediction. Second, different varieties of wheat have different dry matter accumulation [75], and their nitrogen accumulation in grains and various vegetative organs is also different [76]. The accuracy of the yield prediction model may change due to the different accumulation of dry matter and nitrogen. Therefore, yield prediction for different varieties of wheat can be studied in the future. Third, crop yields are affected by many factors, such as climate factors (e.g., temperature, rainfall, light, etc.), soil properties (e.g., PH, organic matter, humidity, etc.), and human management practices (e.g., fertilization, irrigation, pesticides) [77]. Due to small differences in variables at the field scale, this study has not considered these variables for the time being. However, these variables need to be considered in subsequent predictions of crop yield at a larger scale. For example, Han et al. used VIs, climate, soil and other variables to predict the county-level winter wheat yield in various agricultural regions in China [78]. In terms of research scale, the transition from field scale to regional scale should be gradually effected to achieve efficient, high-precision and large-scale crop yield prediction [79].

Conclusions
In this study, VIs extracted from multi-spectral UAV data and machine learning methods were used to construct six regression models or algorithms for the prediction of winter wheat yield at the field scale. The results showed that the yield prediction model established at the filling stage outperformed other single-stage models. They also showed that the yield prediction model established at the single stage by GPR, SVR and RFR showed better performance compared with the other three algorithms, among which the GPR model could predict the winter wheat yield more accurately before harvest. The GPR, SVR and RFR models for different stages and from the input of different combinations of some of the VIs were developed and compared. It was found that the optimal yield prediction model of a single stage was established by the ESCVIs of the filling stage. This result showed that if only one observation was provided in the predicted yield area, the best observation time was at the filling stage. Moreover, the GPR model combined with the ESCVIs at the flowering and filling stages was the best performer, i.e., the optimal model. This indicates that, if the yield prediction area can be observed many times, the model combined with the ESCVIs at the flowering and filling stages can improve the accuracy. Furthermore, the optimal model was tested using its predicted yield of large plots compared with the yield measurements, and we found that the optimal model had strong adaptability at different scales. The spatial distribution of wheat yield predicted by the optimal model for the fields studied was also compared with the reference of the UAV orthoimage and the predicted result was found to be reliable.
As indicated above, this study provides a reference and technical support for field management and decision making in small and medium-sized planting areas. However, there is still much to explore in crop yield prediction at the field scale. Regarding the limitations of this study, future work will focus on the following: (1) increasing samples of measured yield data to improve model accuracy and stability; (2) combining more types of crop physiological parameters to improve model accuracy; (3) establishing a cross-regional model to verify the universal applicability of the model; and (4) exploring a combination with satellite imagery to predict crop yield over a wide region possible.

Data Availability Statement:
The data presented in this study are available within the article.