A Novel Hybrid GOA-XGB Model for Estimating Wheat Aboveground Biomass Using UAV-Based Multispectral Vegetation Indices

Abstract: The rapid and nondestructive determination of wheat aboveground biomass (AGB) is important for accurate and efficient agricultural management. In this study, we established a novel hybrid model, extreme gradient boosting (XGBoost) optimized by the grasshopper optimization algorithm (GOA-XGB), which can accurately determine an ideal combination of vegetation indices (VIs) for simulating wheat AGB. Five multispectral bands from an unmanned aerial vehicle platform and 56 types of VIs derived from those bands were used to drive the new model. The GOA-XGB model was compared with many state-of-the-art models: multiple linear regression (MLR), multilayer perceptron (MLP), gradient boosting decision tree (GBDT), Gaussian process regression (GPR), random forest (RF), support vector machine (SVM), XGBoost, SVM optimized by particle swarm optimization (PSO-SVM), SVM optimized by the whale optimization algorithm (WOA-SVM), SVM optimized by the GOA (GOA-SVM), XGBoost optimized by PSO, and XGBoost optimized by the WOA. The results demonstrated that the MLR and GOA-MLR models had poor prediction accuracy for AGB, and their accuracy did not significantly improve when more than three input factors were used. Among single-factor-driven machine learning (ML) models, the GPR model had the highest accuracy, followed by the XGBoost model. When combinations of multispectral bands and VIs were used as input, the GOA-XGB model (with 37 input factors) had the highest accuracy, with RMSE = 0.232 kg m−2, R² = 0.847, MAE = 0.178 kg m−2, and NRMSE = 0.127. When XGBoost feature selection was used to reduce the number of input factors to 16, the model accuracy improved further to RMSE = 0.226 kg m−2, R² = 0.855, MAE = 0.172 kg m−2, and NRMSE = 0.123.


Introduction
Wheat is the primary grain crop in China, with a 227,000-km² planting area and a yield of 131.7 Mt in 2020; however, extensive wheat management has resulted in chronically low yields. Current wheat production cannot fully meet the demands of 1.4 billion people, and China has to import 8 Mt of grain annually. Therefore, it is essential to improve the management level of wheat farmland. Aboveground biomass (AGB) is an important indicator for monitoring agricultural ecosystems. It is not only closely related to crop growth monitoring, yield per unit area, and yield formation, but is also central to studies of global climate change, the carbon cycle, material flow, and energy exchange. In one reported study, the best model (R² = 0.78, RMSE = 2.86 t ha−1, and MAE = 1.86 t ha−1) could reduce the number of features from 27 to 9.
Moreover, the accuracy and efficiency of ML models depend largely on their internal parameters. To improve simulation accuracy, model parameters must be determined in real time. ML model calibration methods are primarily based on grid-search and gradient descent algorithms; however, these algorithms are complicated and prone to local convergence. Biological heuristic algorithms are accurate and efficient and can provide globally optimal solutions. Combining biological heuristic algorithms with ML models can efficiently find optimal solutions to optimization problems and improve the calculation speed and performance of the model. Wang et al. [11] designed a hybrid model based on an SVM and a binary-coded ant colony optimization algorithm to classify remote sensing images. Dong et al. [12] evaluated the performance of four bio-inspired algorithms in optimizing the KNEA model for predicting monthly reference evapotranspiration; they reported that the gray wolf optimizer outperformed the other algorithms across different climate zones of China. As an effective ML model, XGBoost can accurately predict biomass. However, certain limitations on parameter optimization remain: establishing an accurate relationship between variables and objectives, and selecting the most appropriate input parameters. In this study, a state-of-the-art heuristic algorithm, the grasshopper optimization algorithm (GOA), was coupled with the XGBoost model to address both parameter optimization and feature selection.
Because of the positive relationship between leaf biomass and VIs, a general model of leaf biomass across all growth stages should be achievable. Except for Li [13], previous AGB models have had limited validity across all growth stages. Furthermore, more than one hundred different VIs have been used to invert AGB, and scholars are frequently unsure which VIs best describe the dynamics of biomass across growth stages. It is therefore necessary to devise a method for quickly selecting suitable combinations from a large number of VIs. The main objectives of this study are as follows: (1) to develop a new hybrid model by coupling the GOA with XGBoost (GOA-XGB) and evaluate its accuracy in estimating AGB based on 56 types of multispectral VI data; (2) to compare the performances of six standalone ML models and a hybrid model coupling the GOA with SVM (GOA-SVM); and (3) to recommend the optimal ML model and parameter input combination for estimating wheat AGB.

Study Region
Winter wheat field experiments were performed during the 2020-2021 growing season at the Gaoqiao Farm Experiment Station (34°25′39″ N, 108°00′03″ E, WGS84), Shaanxi, China (Figure 1). The soil texture was classified as fine loam, with an organic matter content of 15.3-19.1 g/kg, nitrate nitrogen (NO3−-N) content of 8.21-41.71 mg/kg, ammonium nitrogen (NH4+-N) content of 0.45-8.2 mg/kg, available potassium content of 60.58-109.61 mg/kg, and available phosphorus content of 3.14-21.18 mg/kg in the top 30-cm soil layer. The experimental station is characterized by a warm temperate, semi-humid continental monsoon climate, with minimum and maximum temperatures of 10 °C and 40 °C, respectively.
Wheat AGB was observed seven times, on the same dates the UAV data were collected. In the test area, 32 sample plots were randomly selected to record wheat density; the area of each plot was 1 × 1 m. On each UAV observation day, ten wheat plants were randomly selected, the aboveground part was harvested at the stem base, deactivated in an oven at 75 °C, and then dried to constant weight at 105 °C. The VIs and AGB data were split into two parts: 75% for training and validation of the ML models and 25% for model testing. The 75% portion was randomly divided into five folds, four of which were used to train the model and the fifth to validate it. This procedure was repeated five times so that every fold was used to train the model four times and to validate it once. The statistical values of AGB can be found in Table 3.
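The 75/25 split with five-fold cross-validation can be sketched as follows. This is an illustrative sketch, not the study's code; the sample count of 224 (32 plots × 7 dates) is our assumption.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples = 224  # assumed: 32 plots x 7 sampling dates
idx = rng.permutation(n_samples)

# 75% for training/validation, 25% held out for testing
n_train = int(0.75 * n_samples)
train_idx, test_idx = idx[:n_train], idx[n_train:]

# Split the 75% into five folds; each fold serves as validation once
folds = np.array_split(rng.permutation(train_idx), 5)
for k, val_fold in enumerate(folds):
    fit_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
    # model.fit(X[fit_idx], y[fit_idx]); evaluate on X[val_fold]
```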
Note 1: G, R, RE, and NIR indicate green, red, red-edge, and near-infrared band reflectance, respectively.

Multilayer Perceptron (MLP)
MLP is a type of feedforward ANN, also called a deep feedforward network. It maps an input vector to an output vector and can be viewed as a directed graph whose nodes are organized into layers: data are passed from the input layer through one or more hidden layers, whose nodes are neurons with activation functions, to the output layer. The training algorithm is the most crucial part of applying a neural network. The backpropagation algorithm is adopted for MLP training, and supervised training can overcome the weakness of unidentifiable data. A feedforward neural network contains three types of nodes: input, hidden, and output nodes [39]. In this study, a three-layer network with an input layer, a hidden layer, and an output layer was used. The number of nodes in the input layer equaled the number of input features, the number of neurons in the hidden layer was tuned by trial and error, and the number of nodes in the output layer was 1.

Gaussian Process Regression (GPR)
GPR is a nonparametric supervised model that uses Gaussian process priors to perform regression analysis on data. The model hypothesis of GPR includes noise (the regression residual) and a Gaussian process prior, and its solution is obtained via Bayesian inference. Provided the kernel form is not restricted, GPR is theoretically a universal approximator of arbitrary continuous functions on a compact space [40]. Moreover, GPR can provide a posterior over predicted results, which has an analytic form when the likelihood is a normal distribution. Therefore, GPR is a probabilistic model with generality and analyzability. We did not tune the parameters of GPR because it is a nonparametric algorithm.
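As a minimal illustration of GP regression from scratch (the study does not report its kernel; the squared-exponential kernel, length scale, and noise level below are our assumptions), the posterior mean follows directly from the Gaussian conditioning described above:

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel k(a, b) = exp(-|a - b|^2 / (2 l^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * length_scale ** 2))

def gpr_predict(X_train, y_train, X_new, noise=1e-2, length_scale=1.0):
    """Posterior mean of a zero-mean GP with Gaussian likelihood."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    k_star = rbf_kernel(X_new, X_train, length_scale)
    alpha = np.linalg.solve(K, y_train)   # K^-1 y
    return k_star @ alpha

# Toy example: recover a smooth function from samples
X = np.linspace(0, 1, 20)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
pred = gpr_predict(X, y, X, length_scale=0.2)
```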

Support Vector Machine (SVM)
The SVM model was established by Vapnik [41]. General models usually adopt empirical risk minimization, i.e., they aim to minimize the cumulative error. To reduce overfitting, SVM instead adopts structural risk minimization, which allows some points to be discarded entirely as noise. SVM models can be used for classification, pattern recognition, and regression analysis. For regression, the original problem is transformed into a convex quadratic programming problem so that the model has a unique global optimal solution. At the core of SVM are kernel functions, which implicitly transform the original low-dimensional input data into a higher-dimensional feature space. The radial basis function kernel was used in this study because it has better applicability in prediction than other kernel functions, such as linear, polynomial, and sigmoid functions. Please refer to the literature for more information on the SVM model. A grid search was used to tune the two parameters of SVM, the regularization coefficient and the width of the kernel function. Both parameters ranged from 0.01 to 10,000, with each step 10 times the previous value.
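The multiplicative grid described above can be generated as follows; this is a sketch, and the function and variable names are ours.

```python
# Build the logarithmic grid: 0.01 to 10,000, each step 10x the previous value
def log_grid(start=0.01, stop=10_000.0, factor=10.0):
    vals, v = [], start
    while v <= stop * 1.0000001:   # small tolerance for float drift
        vals.append(v)
        v *= factor
    return vals

c_grid = log_grid()        # candidate regularization coefficients C
gamma_grid = log_grid()    # candidate RBF kernel width parameters
pairs = [(c, g) for c in c_grid for g in gamma_grid]  # full search grid
```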

Random Forest (RF)
The RF model, proposed by Breiman [42], is an ensemble of decision trees that controls variance using the "bagging" idea. Compared with a single decision tree, this method integrates a set of CARTs to improve generalization and obtain a more accurate and stable prediction [42]. RF trains each tree on only a subset of the dataset, known as a bootstrap sample, and all decision trees predict results through a voting mechanism. The remaining samples that do not belong to the bootstrap set, namely, the out-of-bag samples, are used for internal cross-validation to improve accuracy and assist in assessing variable importance. The quantitative measure of each input variable's contribution to the prediction is known as RF variable importance. Compared with other ML algorithms, RF is insensitive to noise and overtraining. A grid-search method was also used: the number of trees ranged from 100 to 1000 with a step of 100, and the maximum tree depth ranged from 2 to 30 with a step of 5.
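The bootstrap/out-of-bag mechanism can be illustrated in a few lines; this is a generic sketch, not tied to the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
indices = np.arange(n)

# One bootstrap sample per tree: draw n indices with replacement
boot = rng.choice(indices, size=n, replace=True)

# Out-of-bag samples: those never drawn for this tree
oob = np.setdiff1d(indices, boot)

# On average ~36.8% of samples are out-of-bag, since (1 - 1/n)^n -> e^-1
frac_oob = len(oob) / n
```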

Gradient Boosting Decision Tree (GBDT)
The GBDT model proposed by Friedman [43] has been extensively used for classification and regression problems. It is based on the boosting strategy and uses the classification and regression tree (CART) as the weak learner. In the GBDT model, each weak learner measures the error at each node and uses test functions to split nodes. Weak learners must be built sequentially: the residual of the previous weak learner becomes the input learned by the next one. The GBDT model uses the gradient descent strategy to accelerate the convergence of the weak learners; additional details are reported by Friedman [43]. This study tuned three parameters with the grid-search technique: the number of trees [100, 1000], the maximum tree depth [2, 30], and the learning rate [0.001, 0.3].
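The residual-fitting loop can be sketched with depth-1 trees (decision stumps). This is a toy illustration of the boosting strategy, not the tuned GBDT used in the study.

```python
import numpy as np

def fit_stump(X, residual):
    """Find the depth-1 split minimizing squared error on the residuals."""
    best = None
    for thr in np.unique(X):
        left, right = residual[X <= thr], residual[X > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    return best[1:]  # (threshold, left value, right value)

def gbdt_predict(X, y, n_trees=50, lr=0.1):
    """Boosting loop: each stump fits the residual of the current ensemble."""
    pred = np.full_like(y, y.mean(), dtype=float)
    for _ in range(n_trees):
        thr, lval, rval = fit_stump(X, y - pred)
        pred += lr * np.where(X <= thr, lval, rval)  # shrinkage = learning rate
    return pred

X = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * X)
pred = gbdt_predict(X, y)
```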

XGBoost
XGBoost is a tree-based ensemble algorithm that improves on GBDT in several ways. First, XGBoost uses a presorting mechanism to find the optimal split point: feature columns are sorted and stored in blocks [44]. All features are presorted according to their values and stored during the first traversal; when the split points are traversed again, the best split on each feature is found at a cost of O(#data). The presorted data are stored in memory as a block structure that can be reused in subsequent iterations, significantly reducing the amount of computation [44]; this block structure also allows XGBoost to parallelize the split search across features. Second, XGBoost approximates the loss function with a second-order Taylor expansion, using both the first and second derivatives to accelerate optimization. Third, the complexity of the tree model is added as a regularization term in the loss function to avoid overfitting. This study tuned three parameters with the grid-search technique: the number of trees [100, 1000], the maximum tree depth [2, 30], and the learning rate [0.001, 0.3].
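The second-order machinery can be illustrated with a small numpy sketch (ours, not the library's internals): with per-sample gradients g_i and Hessians h_i, the optimal leaf weight is w* = −G/(H + λ), and a candidate split is scored by the gain in the regularized objective.

```python
import numpy as np

def leaf_weight(g, h, lam=1.0):
    """Optimal leaf weight w* = -G / (H + lambda) from the
    second-order Taylor expansion of the loss."""
    return -g.sum() / (h.sum() + lam)

def split_gain(g, h, mask, lam=1.0):
    """Gain of splitting a node into left (mask) and right (~mask) children."""
    def score(gs, hs):
        return gs.sum() ** 2 / (hs.sum() + lam)
    return 0.5 * (score(g[mask], h[mask]) + score(g[~mask], h[~mask])
                  - score(g, h))

# Squared-error loss: g_i = pred_i - y_i, h_i = 1
y = np.array([1.0, 1.2, 3.0, 3.4])
pred = np.zeros(4)
g, h = pred - y, np.ones(4)
w = leaf_weight(g, h)           # pulls predictions toward the target mean
gain = split_gain(g, h, y < 2)  # separating small from large targets helps
```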

Grasshopper Optimization Algorithm (GOA)
In this algorithm, the exploration process is realized by imitating grasshoppers' sudden movements when searching for a food source, and the exploitation process by imitating their movement toward a local food source to consume it [45]. These behaviors are performed naturally by grasshoppers. The algorithm simulates the two behaviors through a position update formula: the next position of a grasshopper is updated according to its current position, the global target value, and the positions of all other grasshoppers. The position update of the i-th grasshopper in D-dimensional space is as follows:

X_i = c_w ( Σ_{j=1, j≠i}^{N} c_n ((ε − η)/2) s(d_ij) (x_j − x_i)/d_ij ) + T_d, (1)

where X_i is the i-th grasshopper position vector; c_w is similar to the inertia weight in the particle swarm algorithm and balances the exploration and exploitation of the entire group around the target; ε and η are the upper and lower bounds of the search scope, respectively; s is the attraction function; d_ij = |x_j − x_i| is the distance between the i-th and j-th grasshoppers; x_i and x_j are the position vectors of the i-th and j-th grasshoppers, respectively; and T_d is the best target value found so far. c_n is the decreasing coefficient that shrinks the comfort, repulsion, and attraction zones; it is calculated as follows:

c_n = c_max − t (c_max − c_min)/L, (2)

where c_max is the maximum value, t is the current iteration number, c_min is the minimum value, and L is the maximum number of iterations. The attraction function is

s(r) = u e^{−r/τ} − e^{−r}, (3)

where s(r) is the attraction function, u is the strength of attraction, r is the distance variable, and τ is the range of attraction. The literature explains how the social forces (attraction or repulsion) change with the distance between grasshoppers. When a grasshopper is 2.0 units away from another, there is neither attraction nor repulsion: the comfort zone. When s(r) is greater than 0, the grasshoppers attract each other: the attraction zone. When s(r) is less than 0, they repel each other: the repulsion zone. Equation (4) gives the distance between the i-th and j-th grasshoppers, and Equation (5) the unit vector from the i-th to the j-th grasshopper:

d_ij = |x_j − x_i|, (4)

d̂_ij = (x_j − x_i)/d_ij. (5)

Through these interactions, the swarm gradually approaches the food source and, with the linearly decreasing coefficient, converges on the global optimal value. In this study, the GOA was used to optimize the parameters of the SVM and XGBoost models.
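The update rules above can be sketched in Python as follows. This is our simplification for illustration, not the authors' R implementation: a single decreasing coefficient c plays the roles of both c_w and c_n, as in the original GOA paper, and the algorithm minimizes a simple sphere function.

```python
import numpy as np

def s(r, u=0.5, tau=1.5):
    """Social force: attraction at mid-range distances, repulsion when close."""
    return u * np.exp(-r / tau) - np.exp(-r)

def goa_minimize(f, dim=2, n=20, iters=100, lb=-5.0, ub=5.0,
                 c_max=1.0, c_min=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n, dim))
    fitness = np.apply_along_axis(f, 1, X)
    best, best_val = X[np.argmin(fitness)].copy(), float(fitness.min())
    for t in range(1, iters + 1):
        c = c_max - t * (c_max - c_min) / iters  # Eq. (2): linear decrease
        X_new = np.empty_like(X)
        for i in range(n):
            social = np.zeros(dim)
            for j in range(n):
                if i == j:
                    continue
                d = np.linalg.norm(X[j] - X[i]) + 1e-12
                # Eq. (1): scaled social interaction along the unit vector
                social += c * (ub - lb) / 2.0 * s(d) * (X[j] - X[i]) / d
            X_new[i] = np.clip(c * social + best, lb, ub)
        X = X_new
        fitness = np.apply_along_axis(f, 1, X)
        if fitness.min() < best_val:
            best_val = float(fitness.min())
            best = X[np.argmin(fitness)].copy()
    return best, best_val

sphere = lambda x: float(np.sum(x ** 2))
best, val = goa_minimize(sphere)
```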

Particle Swarm Optimization (PSO) Algorithm
Kennedy and Eberhart proposed the PSO algorithm in 1995 [46]. PSO resembles the GA in population initialization: both randomly generate initial solutions. In PSO, however, each potential solution, called a particle, is assigned a random velocity and position, and the particles fly through the problem space to find the optimal solution. Particles have simple behavior: each imitates the "success" of neighboring particles as well as its own. From these simple individuals, group behavior emerges that enables an optimal-solution search in high-dimensional spaces. The original PSO is frequently plagued by premature convergence. The fitness-distance-ratio method [47] can resolve this problem, and the updated velocity equation is as follows:

v_i = w v_i + c_1 r_1 (p_i − x_i) + c_2 r_2 (p_g − x_i) + c_3 r_3 (p_n − x_i), x_i = x_i + v_i,

where x_i is the location of the i-th particle; v_i is the corresponding velocity; p_i is the i-th particle's best position; p_g is the global best position; p_n is the nearby best position; w is the inertia weight; r_1, r_2, and r_3 are random numbers; and c_1, c_2, and c_3 are acceleration constants, all set to 2 in this study. The range of x_i was the same as in the GOA.
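A minimal PSO sketch with the three attraction terms described above follows. This is our illustration: the "nearby best" p_n is approximated by the better of each particle's two ring neighbours' personal bests, a simplification of the fitness-distance-ratio rule.

```python
import numpy as np

def pso_minimize(f, dim=2, n=20, iters=100, lb=-5.0, ub=5.0,
                 w=0.7, c1=2.0, c2=2.0, c3=2.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(lb, ub, (n, dim))
    v = np.zeros((n, dim))
    pbest = x.copy()
    pbest_val = np.apply_along_axis(f, 1, x)
    for _ in range(iters):
        g = pbest[np.argmin(pbest_val)]  # global best position p_g
        for i in range(n):
            # "nearby best" p_n: better of the two ring neighbours' bests
            nb = min((i - 1) % n, (i + 1) % n, key=lambda j: pbest_val[j])
            r1, r2, r3 = rng.random(3)
            v[i] = (w * v[i] + c1 * r1 * (pbest[i] - x[i])
                    + c2 * r2 * (g - x[i]) + c3 * r3 * (pbest[nb] - x[i]))
            x[i] = np.clip(x[i] + v[i], lb, ub)
            val = f(x[i])
            if val < pbest_val[i]:
                pbest_val[i], pbest[i] = val, x[i].copy()
    k = np.argmin(pbest_val)
    return pbest[k], float(pbest_val[k])

sphere = lambda x: float(np.sum(x ** 2))
best, val = pso_minimize(sphere)
```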


Whale Optimization Algorithm (WOA)
Mirjalili et al. [48] developed the WOA, which simulates the hunting behavior of humpback whales. The whales hunt using a technique known as bubble-net hunting, in which bubbles are created by encircling prey or passing along a "9"-shaped path. When hunting, a humpback whale dives more than 10 m deep and creates a spiral of bubbles that surrounds its prey; the bubble net then rises to the surface, trapping the prey inside. The optimization process in the WOA begins by randomly initializing the whale population. The whales then locate and attack their prey through two mechanisms: encircling the prey and making a net of bubbles. The WOA can be formulated as follows.

(1) Searching for and encircling prey:

D = |C X*(t) − X(t)|, X(t+1) = X*(t) − A D,

where X*(t) is the best whale position found so far, and the coefficients A and C are computed as

A = 2 a r − a, C = 2 r,

where a is a coefficient decreasing linearly from 2 to 0 and r is a random number in the range [0, 1]. If |A| > 1, search behavior is performed around a randomly selected whale instead of the best one:

D = |C X_rand(t) − X(t)|, X(t+1) = X_rand(t) − A D.

(2) Spirally updating location. Some whales prey around the best whale position while others follow the "9"-shaped path:

X(t+1) = D′ e^{bl} cos(2πl) + X*(t), D′ = |X*(t) − X(t)|,

where the choice between the two mechanisms is made with probability p, a random number between 0 and 1; l is a random number in [−1, 1]; and b is a constant describing the spiral shape, set to 1 in this study.
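The two WOA mechanisms can likewise be sketched as follows; this is our simplified illustration, with parameter choices mirroring the description above (b = 1).

```python
import numpy as np

def woa_minimize(f, dim=2, n=20, iters=100, lb=-5.0, ub=5.0, b=1.0, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n, dim))
    fit = np.apply_along_axis(f, 1, X)
    best, best_val = X[np.argmin(fit)].copy(), float(fit.min())
    for t in range(iters):
        a = 2 - 2 * t / iters            # linearly decreases from 2 to 0
        for i in range(n):
            r = rng.random(dim)
            A, C = 2 * a * r - a, 2 * rng.random(dim)
            if rng.random() < 0.5:       # encircling / searching branch
                if np.all(np.abs(A) < 1):
                    D = np.abs(C * best - X[i])
                    X[i] = best - A * D          # encircle the best whale
                else:
                    rand = X[rng.integers(n)]
                    D = np.abs(C * rand - X[i])
                    X[i] = rand - A * D          # explore around a random whale
            else:                        # spiral bubble-net branch
                l = rng.uniform(-1, 1)
                D = np.abs(best - X[i])
                X[i] = D * np.exp(b * l) * np.cos(2 * np.pi * l) + best
            X[i] = np.clip(X[i], lb, ub)
        fit = np.apply_along_axis(f, 1, X)
        if fit.min() < best_val:
            best_val = float(fit.min())
            best = X[np.argmin(fit)].copy()
    return best, best_val

sphere = lambda x: float(np.sum(x ** 2))
best, val = woa_minimize(sphere)
```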

Tune the Parameters of the Hybrid Machine Learning Models
In this study, the populations of PSO, WOA, and GOA were set to 50, with a total of 200 iterations. The parameter ranges of the hybrid machine learning models were the same as those of their corresponding standalone models. All models were implemented in the R language (version 4.4).

Statistical Indicators
We employed five commonly used statistical indicators to evaluate the performance of the models from different dimensions. The calculation formulas are as follows:
• Root Mean Square Error: RMSE = [ (1/n) Σ (O_e − O_m)² ]^{1/2}
• Mean Absolute Error: MAE = (1/n) Σ |O_e − O_m|
• Normalized Root Mean Square Error: NRMSE = RMSE / Ō_m
• Percent of Bias: PBIAS = 100 × Σ (O_e − O_m) / Σ O_m
• Coefficient of Determination: R² = 1 − Σ (O_e − O_m)² / Σ (O_m − Ō_m)²
where n is the total number of data points; O_e and O_m are the AGB values estimated by the ML models and the measured AGB values, respectively; and Ō_m is the mean measured AGB value. R² lies between 0 and 1; a value closer to 1 indicates a better regression fit between the estimated and measured AGB, and hence better model performance, whereas RMSE or MAE closer to 0 indicates better performance. RMSE is generally useful when model errors follow a normal distribution, whereas MAE is suitable for models with a uniform error distribution. To evaluate model performance with NRMSE, four levels are used: NRMSE ≤ 10%, 10% < NRMSE ≤ 20%, 20% < NRMSE ≤ 30%, and NRMSE > 30%, corresponding to perfect, good, fair, and poor model performance, respectively.
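These indicators can be computed directly; the sketch below follows the formulas above. Note that R² is computed here as 1 − SSE/SST (the coefficient of determination); the study may instead use the squared correlation coefficient.

```python
import numpy as np

def evaluate(obs, est):
    """RMSE, MAE, NRMSE, PBIAS, and R2 between measured and estimated AGB."""
    obs, est = np.asarray(obs, float), np.asarray(est, float)
    err = est - obs
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    nrmse = rmse / obs.mean()
    pbias = 100 * err.sum() / obs.sum()
    r2 = 1 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    return {"RMSE": rmse, "MAE": mae, "NRMSE": nrmse, "PBIAS": pbias, "R2": r2}

scores = evaluate([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```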

Linear Regression (LR) Model
The LR model showed that when there was only one input factor (Band G), the agreement between the simulated and observed AGB values was poor, with R² = 0.383 (Table 4). When the number of input factors was 2 or 3, the multiple LR (MLR) accuracies slightly improved compared with the single-factor LR model, with RMSE and MAE decreasing by approximately 7%. When the number of input factors was 4, the model accuracy no longer improved compared with the top three MLR models. When the number of input factors was 1, the GOA-LR model was identical to the corresponding LR model, showing that the optimization algorithm offers no advantage when a single factor is used. When the number of input factors was 2, the GOA-MLR model (GOA-MLR1) had different input factors from the corresponding MLR model, and its accuracy was significantly better than that of the MLR1 model; RMSE and MAE decreased by 14.7% and 17.9%, respectively. When the number of input factors was 3, the GOA-MLR2 model was also superior to the MLR2 model; RMSE and MAE decreased by 15.8% and 11.6%, respectively. In addition, the accuracy of the GOA-MLR2 model slightly improved compared with the GOA-MLR1 model. However, the GOA-MLR3 model had no advantage over the GOA-MLR1 and GOA-MLR2 models, indicating that the MLR model could no longer describe the nonlinear relationship between VIs and AGB when more than three VIs were used.

ML Models with Single VI as Input
To investigate which VI is most associated with AGB, the 5 bands and 56 types of VIs were each used individually to drive the different ML models. The results are illustrated in Figure 3 and Table 5. The near-infrared (NIR) band ranked first in all six ML models, although the accuracy of the models varied. When NIR was used as input, the GPR model had the highest accuracy, with RMSE = 0.318 kg m−2, R² = 0.725, MAE = 0.267 kg m−2, and NRMSE = 0.173, and its deviation was small. The XGBoost model ranked second, followed by the SVM, MLP, GBDT, and RF models. The six ML models disagreed on the second-ranked VI: the GBDT, RF, SVM, and XGBoost models selected CIg, whereas the GPR and MLP models recommended MSR_G and GRVI, respectively. Among these, the XGBoost model had the highest accuracy, with errors only slightly worse than those of the XGBoost and GPR models fed with NIR. In addition, the GPR model fed with MSR_G also had high accuracy, superior to the GBDT, RF, and SVM models with CIg as input. Except for MLP, the third-ranked VI-driven ML models had the same accuracy as the second-ranked ones, indicating that, for the ML models, different VIs differ numerically but their effects are equivalent.

ML Models with All Features as Input
To preliminarily evaluate the performance of the eight different ML models, all features, i.e., 5 bands and 56 types of VIs, were used as input to drive the models (Table 6). The results showed that all ML models yielded good accuracy, with NRMSE less than 0.2, and none showed obvious overestimation or underestimation problems (|PBIAS| < 0.1). However, the models varied slightly in accuracy: the GOA-XGB1 model was superior to the others, with RMSE = 0.232 kg m−2, R² = 0.847 (Figure 4), and MAE = 0.178 kg m−2. The XGBoost1 model was slightly better than GOA-XGB1 in the consistency of simulated and predicted values but slightly worse in accuracy; its RMSE and MAE were 6% and 5% higher, respectively (Figure 4). The RF1 model ranked third, with RMSE and MAE 13.7% and 9.5% higher than those of the GOA-XGB1 model, respectively. The GBDT model was comparable to the RF model: compared with the GOA-XGB1 model, its RMSE and MAE increased by 16.8% and 10.1%, respectively. The RMSE of the GPR1 model was comparable to that of the GBDT1 model, but its MAE was significantly lower, being 20% higher than that of the GOA-XGB1 model. The performance of the SVM model was poor; its RMSE and MAE were 25% higher than those of the GOA-XGB1 model, and even when the GOA was employed to optimize the SVM's parameters, the accuracy did not significantly improve. The MLP model was significantly inferior to the other models and could only explain 72.2% of the variance in terms of R²; its RMSE and MAE were 43% higher than those of the best model.

ML Models with Optimized Features as Input
When there are too many input factors in an ML model, especially irrelevant ones, the model will try to explain the relationship between noise and targets, degrading its predictive ability. To further select important factors, we used the factors of the GOA-XGB1 model whose gain value was greater than 0.01 as input to re-train the eight ML models and evaluated whether the model accuracy improved. The results are shown in Table 7 and Figure 5. All eight ML models had good accuracy (NRMSE < 0.2), and in general no model showed an overestimation or underestimation problem. The GOA-XGB2 model was superior to the other models, with RMSE = 0.226 kg m−2, R² = 0.855, and MAE = 0.172 kg m−2. The RMSE and MAE of the XGBoost2 model were about 5% higher than those of the GOA-XGB2 model, ranking it second in accuracy. The RF2 and GPR2 models performed slightly worse than the XGBoost2 model and slightly better than the GBDT2 model. The performances of the SVM2 and GOA-SVM2 models were comparable and only better than that of the MLP2 model. Figure 6 shows that the importance rankings of the top 16 features from GOA-XGB2 were entirely consistent with those from GOA-XGB1. This demonstrates that removing redundant features improves the model's interpretability.
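The gain-threshold screening described above amounts to a simple filter over the fitted model's feature importances. In this sketch the gain values are invented for illustration, not taken from the study:

```python
# Hypothetical gain shares from a fitted XGBoost model (feature -> gain)
gains = {"NIR": 0.41, "CIg": 0.18, "MSR_G": 0.09, "GRVI": 0.02,
         "Band_G": 0.008, "NDVI": 0.005}

# Keep only features whose gain exceeds the 0.01 threshold used in the study,
# ordered from most to least important
selected = [name for name, gain in sorted(gains.items(),
                                          key=lambda kv: -kv[1]) if gain > 0.01]
```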

By comparing Tables 4 and 5, 75% of the models achieved higher accuracy after using fewer input factors. Compared with the GOA-XGB1 model, the RMSE and MAE of the GOA-XGB2 model decreased by 2.6% and 3.4%, respectively. Compared with the GOA-SVM1 model, the RMSE and MAE of the GOA-SVM2 model decreased by 4.7% and 9.2%, respectively. The XGBoost, RF, and GPR models showed similar results. However, the accuracy of the GBDT and MLP models did not change, or even decreased, after using fewer input factors.

Mapping AGB at Field Scale
Knowledge of wheat biomass at the field scale helps evaluate crop growth potential, rationally plan water and fertilizer management, and achieve precise farmland management. In this study, the model with the highest accuracy, GOA-XGB2, was used to simulate the spatial dynamics of wheat AGB. The results are shown in Figure 7. The mean value of the plot was 1.49 ± 0.34 kg m−2. AGB was smallest in the central northern part of the plot, at about 1 kg m−2, whereas biomass was larger in the southern part, with values above 1.6 kg m−2, mainly because there was only a small amount of irrigation in 2021; the entire plot had a certain slope, and surface runoff mainly gathered in the southern region.


Uncertainty of Observed Data
Using a UAV to quickly and nondestructively collect spectral, texture, point cloud, and digital elevation information for wheat prediction has proven to be an effective method, although its applicability varies across crops; for example, point cloud features suit sparse vegetation, whereas tall vegetation can be aided by digital elevation data. The uncertainty of using a UAV to simulate wheat AGB mainly comes from two aspects: the crop and the image. First, when calculating wheat AGB at the field scale, individual growth differences between wheat plants within a given plot are not considered; wheat is assumed to be at a similar growth level within a plot. For destructive sampling, simply multiplying the average biomass by the number of plants may introduce systematic errors and outliers into the data. In addition, there are multisource errors in UAV remote sensing images that affect the accuracy of wheat AGB estimation. Because wheat growth causes changes in canopy structure, leaf development, and senescence, differences in radiation intensity on different dates and in the uniformity of the UAV shooting angle introduce considerable uncertainty into data consistency. VIs are susceptible to the mixed influence of canopy greenness and soil reflectance [49], while the accumulation of wheat AGB is directly related to changes in the wheat's physical structure. Using hyperspectral cameras to obtain rich narrow-band spectral features for AGB estimation can reduce the collinearity and redundancy of spectral predictors that arise from similar calculation formulas [7,50]; however, the high cost of such a program greatly hinders its popularization. Moreover, when only four or five narrow multispectral bands are used to calculate VIs, many highly intercorrelated VIs are generated because of similar calculation formulas and overlapping spectral information.
We encountered similar problems, but we attempted to use a coupling algorithm to reduce the collinearity effect and screen out an ideal combination of features, thereby reducing the redundancy and overfitting problems of VIs. The results showed that the input features were reduced from 5 bands plus 56 VIs (61 features) to 16 features, and the accuracy of the model further improved. Finally, unlike laboratory experiments, farmlands are not completely controlled environments: they can be affected by weeds, plant health, or seeding losses that cause canopy changes under crop management; we did not consider this in this study, which is a limitation.
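The collinearity problem among VIs discussed above can be made concrete with a simple correlation screen. Note that this is not the GOA-XGB coupling used in the study, only a minimal pure-Python illustration of why near-duplicate VIs are redundant; the feature names and sample values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def drop_collinear(features, threshold=0.95):
    """Greedily drop any feature highly correlated with an already-kept one."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical VI samples: vi_b is nearly a rescaled copy of vi_a
features = {
    "vi_a": [0.10, 0.20, 0.30, 0.40, 0.50],
    "vi_b": [0.21, 0.41, 0.61, 0.79, 1.00],  # ~ 2 * vi_a, redundant
    "vi_c": [0.90, 0.10, 0.70, 0.20, 0.60],  # weakly related
}
print(drop_collinear(features))
```

A model-driven screen such as the gain filtering used in this study has the advantage of ranking features by their contribution to the target, rather than only by their mutual correlation.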


Comparison of Different Models
Certain studies report that the MLR model's obvious advantage is its high explicability [51]: it can use standardized partial regression coefficients to determine the impact strength of one or more predictor variables on the response variable. However, we found that adding more input features did not significantly improve the MLR model's accuracy, indicating that MLR is interpretable only in limited dimensions and is not strong for complex problems.
In this study, the ML models performed significantly better than the MLR model because nonlinear regression models can capture nonlinear relationships in the data. We explored twelve ML regression algorithms, all of which yielded acceptable accuracy, and the tree-based algorithms achieved high precision. This is similar to the results of Han et al. [1], who compared four ML algorithms and recommended the RF model. Notably, there are limitations to these model comparisons. Owing to the small sample size, the advantages and disadvantages of the different modeling strategies were not fully demonstrated. For example, the ANN model requires a large number of samples; when the hidden layer contains many neurons, small-sample data cause the model to overfit. The SVM also lacks a clear internal working mechanism, and when the data contain substantial noise its parameters become very sensitive, resulting in great uncertainty. The GPR model works well when the data follow a Gaussian distribution, but produces considerable uncertainty when the data distribution is irregular. The tree-based models (GBDT, RF, and XGBoost) depend more on feature selection and can rank feature contributions and eliminate low-contribution features, an obvious advantage when the data are high dimensional and correlated. In particular, after the optimization algorithm is applied, the model attains higher precision and a more simplified structure.
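The accuracy metrics used throughout these comparisons (RMSE, MAE, R2, and NRMSE) can be computed as below. This is a minimal pure-Python sketch; the paper does not state its NRMSE normalization, so normalization by the observed mean is assumed here, and the sample values are hypothetical.

```python
import math

def metrics(obs, pred):
    """RMSE, MAE, R^2, and NRMSE (assumed: RMSE divided by the observed mean)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    sq_err = sum((o - p) ** 2 for o, p in zip(obs, pred))
    rmse = math.sqrt(sq_err / n)
    mae = sum(abs(o - p) for o, p in zip(obs, pred)) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    r2 = 1.0 - sq_err / ss_tot
    nrmse = rmse / mean_obs
    return rmse, mae, r2, nrmse

# Hypothetical observed vs. predicted AGB values (kg m^-2)
rmse, mae, r2, nrmse = metrics([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])
print(f"RMSE={rmse:.3f}, MAE={mae:.3f}, R2={r2:.3f}, NRMSE={nrmse:.3f}")
```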

Determining the Most Important VIs
Han et al. [1] found that BIOVP was a volume indicator for estimating corn AGB and that it was highly correlated with some spectral indicators, such as NGRDI and VARI; because BIOVP contains both spectral and plant-height information, both affect its calculation accuracy. Jannoura et al. [52] obtained a significant relationship between the NGRDI of peas and oats and their AGB. However, other authors have described saturation at maximum leaf-area-index values for corn, soybean, and alfalfa [53]. Other studies have used point cloud computing based on digital images; compared with other commonly used indicators, the CSM has certain applicability because it can avoid saturation effects, e.g., the saturation of NDVI in later growth periods. In this study, owing to the low plant height and high density of wheat, we found after testing that the applicability of this index was low. The top five contributors in the optimal model parameter combination determined in this study were B, G, NGI, NREI, and MCARI4. Notably, except for MCARI4, all of these parameters reached their maximum values at the end of growth, unlike NDVI and other indices that decrease as leaf chlorophyll content declines. In addition, some authors [54,55] reported that it was difficult to obtain good growth monitoring results at early stages because of low biomass, mainly because differences in VI characteristics are relatively pronounced only from the mid-late growth stages. We obtained similar results; however, this problem can be alleviated by choosing a better model, such as the GOA-XGB2 model.
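For reference, the two normalized-difference indices discussed above follow standard formulas; a short sketch with hypothetical canopy reflectance values:

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - R) / (NIR + R)."""
    return (nir - red) / (nir + red)

def ngrdi(green, red):
    """Normalized Green-Red Difference Index: (G - R) / (G + R)."""
    return (green - red) / (green + red)

# Hypothetical band reflectances for a dense green canopy
print(round(ndvi(0.45, 0.05), 3))   # high NIR, low red -> NDVI near 0.8
print(round(ngrdi(0.12, 0.08), 3))
```

Because both indices share the same normalized-difference form, they tend to be strongly correlated when derived from a handful of broad bands, which is exactly the redundancy the feature selection in this study addresses.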

Effect of Growth Stage on VIs-AGB Model
The most common method for developing AGB models is to create multiple models for different crop growth stages [56], because wheat exhibits different spectral characteristics at different growth stages and it is extremely difficult to establish an AGB prediction model for the entire growth period using just one or a few vegetation indices. For example, although NDVI is thought to have a good relationship with AGB, it suffers from light saturation: NDVI no longer increases during the flowering and grain-filling stages of wheat even though biomass continues to increase. Li et al. [13] achieved good results by incorporating accumulated temperature, days after sowing, and other data to assist the vegetation index in establishing an AGB model for the entire growth period. However, one potential limitation of this method is that the growth accumulation temperatures of different wheat varieties vary, and a lack of water and fertilizer can also affect wheat growth, so information such as GDD may itself introduce model uncertainty. This study proposes a general AGB prediction model (GOA-XGB2) for wheat at all growth stages that does not depend on auxiliary information such as GDD. It uses machine learning to mine various vegetation index combinations that capture the characteristics of wheat at different stages, and it achieves extremely high accuracy. The new GOA-XGB2 method screened out a combination of 16 vegetation indices and bands that reflect the growing trend of AGB throughout the growth period, which is the main reason for its feasibility. Similarly, Geng et al. [9] used the XGBoost model to simulate AGB and reduced the number of features from 27 to 9.

Conclusions
Rapid and nondestructive determination of wheat AGB is crucial for accurate and efficient agricultural management. In this study, we established a new model, named XGBoost optimization by GOA (GOA-XGB), which can accurately determine an ideal VI combination for inversion of wheat AGB. GOA-XGB was compared with many state-of-the-art models. The results showed that the MLR and GOA-MLR models had poor prediction accuracy for AGB, and the accuracy did not improve significantly when there were more than three input factors. Among single-factor-driven ML models, the GPR model had the highest accuracy, followed by the XGBoost model. When input combinations of multispectral bands and VIs were used, the GOA-XGB model with 37 input factors had the highest accuracy, with RMSE = 0.232 kg m−2, R2 = 0.847, MAE = 0.178 kg m−2, and NRMSE = 0.127. When XGBoost feature selection was used to reduce the input factors to 16, the model accuracy further improved to RMSE = 0.226 kg m−2, R2 = 0.855, MAE = 0.172 kg m−2, and NRMSE = 0.123. Based on the developed model, the average AGB of the plot was 1.49 ± 0.34 kg m−2. These results show that the newly established model has high accuracy and can help accurately estimate wheat AGB at any growth stage at the farmland scale.
The results of this study confirm that the GOA-XGB model, driven by a combination of vegetation indices, can accurately predict AGB across the whole growth period of wheat. A limitation of this study is that the model was evaluated in only one wheat field. In the future, we hope to apply the model to satellite remote sensing platforms to conduct large-scale estimation of wheat AGB and related variables. In addition, we wish to evaluate the ability of the GOA-XGB model and related models to predict leaf nitrogen concentration and yield, and to estimate nitrogen dynamics in crop ecosystems.