Improvement of Wheat Grain Yield Prediction Model Performance Based on Stacking Technique

: Crop growth and development is a dynamic and complex process, and the essence of yield formation is the continuous accumulation of photosynthetic products from multiple fertility stages. In this study, a new stacking method for integrating multiple growth stages information was proposed to improve the performance of the winter wheat grain yield (GY) prediction model. For this purpose, crop canopy hyperspectral reﬂectance and leaf area index (LAI) data were obtained at the jointing, ﬂagging, anthesis and grain ﬁlling stages. In this case, 15 vegetation indices and LAI were used as input features of the elastic network to construct GY prediction models for single growth stage. Based on Stacking technique, the GY prediction results of four single growth stages were integrated to construct the ensemble learning framework. The results showed that vegetation indices coupled LAI could effectively overcome the spectral saturation phenomenon, the validated R 2 of each growth stage was improved by 10%, 22.5%, 3.6% and 10%, respectively. The stacking method provided more stable information with higher prediction accuracy than the individual fertility results (R 2 = 0.74), and the R 2 of the model validation phase improved by 236%, 51%, 27.6%, and 12.1%, respectively. The study can provide a reference for GY prediction of other crops.


Introduction
Wheat is one of the most widely cultivated food crops in the world, with China ranking first in terms of yield and sales [1]. Therefore, a timely and accurate prediction of grain yield (GY) can not only help guide agricultural production management and assist relevant government departments to formulate scientific food policies, but also has significance for mitigating climate risks, ensuring national food security and economic stability [2][3][4][5][6][7]. The essence of the formation of the yield is the accumulation of dry matter produced by photosynthesis in different growth stages. The level of GY is influenced by complex factors such as light, soil, moisture, and atmosphere, which poses a great challenge to the accurate prediction of GY [8]. The conventional GY measurement method is not only expensive and less efficient, but also harmful to crops, making it unsuitable for large-areas crop yield estimation [9,10]. In recent decades, remote sensing technology has been successfully applied to crop growth monitoring through satellite platforms, manned airborne platforms, and ground spectral equipment [11]. However, several limitations such as deficient temporal-spatial resolution and cloud cover contamination restrain the application of satellite-based platforms [12]. In comparison, near-ground hyperspectral remote sensing data is not only cost-effective, but also has narrow band, strong continuity, and large spectral information contents [13]. In recent years, scholars at home and abroad have carried out research and made progress in yield estimation of various crops using remote sensing technology [14][15][16]. To the best of our knowledge, most current studies construct GY prediction models by using a single index such as spectral reflectance, spectral index, or vegetation index. However, as the plants grow, spectral saturation occurs when the crop coverage is high, making it difficult to continue improving GY prediction accuracy.
The essence of crop yield formation is the continuous accumulation of dry matter produced by photosynthesis in the plant body [17,18]. It was found that LAI determines many biophysical processes, such as photosynthesis, respiration, and transpiration in crops [19,20]. LAI is used as a common monitoring variable during crop growth and development and has a close relationship with final crop yield [21]. For example, Li et al. [22] used UAV-based hyperspectral remote sensing data to evaluate correlations between soybean yield and physiological and ecological parameters such as plant height, LAI, fresh biomass, and dry biomass, and identified LAI and fresh biomass as key indicators of soybean yield. Dente et al. [23] proved that LAI was closely related to GY by studying the influence of LAI at different growth stages of wheat on GY estimation and concluded that LAI can be used as an important index for GY estimation. Delécolle et al. [24] used remote sensing inversion data on the LAI as the input parameter of an output estimation model to improve the accuracy of the output estimation. Based on this, an important objective of this study is to use the winter wheat canopy structure parameter LAI coupled with vegetation indices as input features for yield prediction, in order to effectively overcome the canopy spectral saturation phenomenon.
The level of GY mainly depends on the number of ears, grains, and thousand seed weight of the crop [25]. During the growth stage of wheat from the jointing stage to the flag-raising stage, the LAI increases quickly, which is more conducive to accelerating the photosynthesis of plants. This period plays a decisive role in the number of ears of wheat. The number of grains in wheat mainly depends on the intensity of the photosynthesis from the flagging stage to the anthesis stage. In this growth period, there is a transition from vegetative growth to reproductive growth, and the nutrient transfers from the leaves to the ear, so it is the key period of grain formation and accumulation. After the anthesis stage, the number of grains per panicle is determined. The thousand seed weight of wheat is mainly determined by the intensity of photosynthesis and the rate of energy transfer from the anthesis to the grain filling stage [26,27]. Therefore, the physiology, shape, and structure of the crop vary throughout the growth cycle from germination, flowering, and fruiting to senescence and death, and the formation of GY occurs sequentially and overlappingly at different growth stages [28]. A study has shown that each key growth period of crop is closely related to the formation of the final GY [4]. Most existing studies considered the grain filling stage as the key stage of GY formation; therefore, it was considered to have the best prediction accuracy in many studies [29]. Hernandez et al. [30] reached a similar conclusion in that a penalized ridge regression model constructed using spectral reflectance data at the anthesis or grain filling stages could effectively predict GY at different water levels. However, the GY prediction model constructed using only a single critical growth stage will lose part of GY correlation information, and the prediction results may vary with the changes in the vegetation types, growth stages, and environmental conditions. Moreover, the robustness of the model is not strong, and the generalization ability is weak [31,32]. Based on this, many researchers have proposed multitemporal vegetation index-based GY prediction models to improve the accuracy and universality of model prediction. For example, Patel et al. [33] used multitemporal Terra/MODIS satellite data to simulate GY in the west of Uttar Pradesh, India. Li et al. [34] used the support vector regression method to construct a GY prediction model based on the normalized vegetation index obtained from multitemporal Landsat TM images, which effectively improved the prediction accuracy. Li et al. [35] used a large regional remote sensing GY estimation model based on multifertility MODIS-NDVI data and obtained a higher estimation accuracy than the single-fertility GY prediction model. Ghizlane et al. [36] obtained multispectral data of the four key growth periods of wheat based on the UAV remote sensing platform and estimated GY using the method of random forest combined with the vegetation index. To the best of our knowledge, there are no studies related to the ensemble of ground spectral data and LAI data from multiple growth stages of crops for yield prediction. Therefore, it is important to find a method that can combine the advantages of multiple growth stages to achieve an improved prediction accuracy and robustness for grain yield prediction.
As an important part of ensemble learning, the stacking method trains a secondary model through "learning method", which is used to combine other primary models. Most existing studies applied this method to heterogeneous models to integrate and improve the final prediction accuracy of the model using the differences between the primary models [37]. Currently, this method has been applied to the prediction of plant photosynthesis capacity, map synthesis, and forest coverage monitoring [38][39][40]. Considering the heterogeneity of the canopy spectral information in the different growth stages of crops, multiple key growth stages related to GY formation were selected, and the integration of hyperspectral data and phenotypic data of the multiple growth stages based on the stacking method was studied. This was helpful to improve the estimation accuracy and generalization ability of the GY prediction model.
In recent years, machine-learning techniques using spectral reflectance data as input to build predictive models for plant traits have been employed and have shown improved prediction accuracy and robustness [41]. The elastic net algorithm is a type of neural network first proposed by Durbin and Willshaw [42]. Recently, the elastic network algorithm has been proven to achieve excellent results in machine learning, which can effectively avoid over fitting of the model prediction results due to small sample size and high dimension. For example, based on the elastic network regression model, Wang et al. [43] constructed the transmission and distribution cost allocation model of a power grid project to realize a reasonable allocation of operation and maintenance cost. Chen et al. [44] proposed a dam deformation prediction model based on the extreme learning machine and elastic network. The elastic network algorithm effectively alleviated the over fitting phenomenon, which remains challenging for the extreme learning machine, and the prediction results showed better stability. These results indicate that elastic networks can be used to solve regression problems.
To our knowledge, no studies have applied the strategy of combining the ENET algorithm and stacking techniques to stack LAI and hyperspectral data from multiple growth cycles to construct an ensemble learning framework for GY prediction. The main objectives of this study are as follows: (1) To assess the ability of winter wheat canopy hyperspectral vegetation indices and LAI to be used for grain yield prediction based on the ENET algorithm; (2) to study the effect of Stacking technology in improving the accuracy of grain yield prediction based on ensemble learning; (3) to analyze the influence of each individual growth stage on the development of an accurate integrated learning model. To achieve these goals, the correlations between 15 vegetation indices and LAI from four growth stages and yield were analyzed. Then, yield prediction models for individual growth stage of winter wheat were constructed based on the ENET algorithm in two cases of vegetation indices without or with coupled LAI, respectively. Finally, stacking techniques were used to integrate the yield prediction capabilities in the four growth stages, develop an ensemble learning framework to improve the robustness of the model, and analyze the weights attributed to each growth stage based on the prediction results.

Experiment Location and Research Material
Field trials were conducted in Yongxingtun village, Shanyang District, Jiaozuo City, Henan Province, China, during two cropping seasons in 2016-2017 and 2017-2018. Figure 1 showed the specific location of the study area. The experimental area was 110 m in length from east to west and 38 m in length from north to south, with each plot measuring 6 m × 8 m. There were 48 plots, 16 treatments, and three replications. The orthogonal experiment was conducted with different amounts of nitrogen fertilizer, different moisture contents, and different crop varieties (Table A1). Four levels of N fertilizer were set: 0, 0.0195, 0.0390, and 0.0585 kg/m 2 , respectively; three levels of water irrigation were set: rainfed (W1, no irrigation), normal water (W2, irrigation water 192 mm), and twice normal water (W3, irrigation water 384 mm); there are two crop varieties: Jing 9843 (J9843) and Zhong Mai 175 (ZM175). The experimental design scheme was described in Figure A1.  Figure 2 showed the general framework of this paper, which consists of four main parts. (1) Data acquisition: simultaneous acquisition of winter wheat canopy hyperspectral data and LAI data at the jointing, flagging, anthesis, and grain filling stages, and acquiring the GY data during the winter wheat harvest period; (2) Data processing: pre-processing and statistical analysis of measured data using ViewSpecPro, SPSS, Origin and Excel, and construction of various vegetation indices; (3) Data analysis: to explore the relationship between GY and LAI and vegetation indices, respectively, through correlation analysis; (4) Model framework construction: first, 15 vegetation indices and LAI coupled vegetation indices were used as input features for the ENET model to construct individual growth stages GY prediction models. Subsequently, based on the Stacking technology, the GY prediction results of the different growth stages were stacked to construct the GY prediction model that integrates multiple growth stages. Finally, the accuracy of each model was evaluated to screen the optimal model for winter wheat GY prediction.

ASD Canopy Hyperspectral Data
The canopy spectral reflectance of winter wheat measurements was performed using an ASD FieldSpec spectroradiometer. The spectral range of the ground object spectrometer was 350-2500 nm, the spectral sampling interval was 1.4 nm in the range of 350-1000 nm, the spectral sampling interval was 2 nm in the range of 1000-2500 nm, the resampling interval was 1 nm, and the field of view of the spectrometer was 25 • [45]. To obtain a higher-precision spectral reflectance, the data were collected between 10:00 and 14:00 Beijing time under clear and windless weather conditions. Before obtaining the spectral reflectance, a BaSO4 calibration panel was used for the spectral calibration to eliminate the influence of sky light change on the spectral measurement [46]. The whiteboard was calibrated once in about 15 min in sunny and windless weather. When collecting the spectral reflectance, the probe should be kept vertically downward, and the height should be stable at approximately 0.6-1.0 m above the canopy. Each plot randomly selected four sample points that represent the overall growth of the wheat for spectral data collection. The parameter setting was that each sample point automatically collected 10 spectral curves, so that 40 spectral curves could be obtained from each plot. After collecting the spectral data, the ViewSpec Pro software was used to process the spectral reflectance data. The average value of 10 spectral curves was taken as the reflectance of a sample point. After processing, the reflectance was derived, and the average reflectance values of the four sample points in each cell were obtained. The average reflectance values of the four sample points in each cell were then averaged again; finally, the canopy spectral reflectance data of each experimental plot were obtained. In this study, the average spectral reflectance of the winter wheat canopy was obtained at the jointing stage, flagging stage, anthesis stage, and grain filling stage.

Data Collection of Grain Yield
GY data were obtained after winter wheat ripening. To ensure the accuracy of data collection, a plot combine harvester was used for harvesting, and each plot was then individually bagged and dried. When the grain moisture content was approximately 12.5%, GY was measured by weighing, GY unit was kg/hm 2 , and GY data of 48 plots were obtained. In addition, the yield characters of different treatments were significantly different (Table A2).

Data Collection of Leaf Area Index
Leaf area index (LAI) was measured by using the LAI 2200 plant canopy analyzer (LI-COR Inc., Lincoln, NE, USA) at the jointing, flagging, anthesis, and grain filling stages of winter wheat. Firstly, the LAI 2200 were performed backlit measurement in an open area to ensure that all LAI data obtained were accurate and valid. Then, the LAI 2200 was placed at the sample point parallel to the ridge and perpendicular to the ridge, respectively, with the probe placed close to the wheat plant. For wheat LAI measurement, three sample points were randomly selected for each plot. LAI values were measured for 4 times, and the arithmetic mean value was taken as the LAI of the sample point, and finally the average of the LAI values at the three sample points was calculated as the LAI value of the plot. There were significant differences in the statistical characteristics of LAI between different treatments at different growth stages (Tables A3-A6).

Method 2.3.1. Construction of Vegetation Index
The acquired hyperspectral data contain hundreds of spectral bands, and many adjacent bands are highly correlated. To reduce the data dependency, instead of using all the original bands, we extracted the narrow-band indices as spectral features and used them for modeling the winter wheat yield [36]. As listed in Table 1, 15 vegetation indices were calculated as secondary traits to predict the grain yield in this study. These indices showed resistance to environmental factors such as background (e.g., soil, atmosphere, etc.) and showed potential in assessing yield and physiological and biochemical parameters of plants.

Correlation Analysis
A correlation analysis is a consistency analysis of the change trend in two or more sets of data, and the magnitude of the correlation coefficient is often used to assess whether the relationship between each set of data is close and its degree. The correlation coefficient is obtained by dividing the covariance of two random variables by the standard deviation. The correlation coefficient lies between 1 and 1. The greater its absolute value, the greater the degree of correlation. When its value is close to 0, it indicates that there is no correlation. The calculation method is as follows: where ρ X,Y represents the correlation coefficient, and cov(X, Y) and σ represent the covariance and standard deviation, respectively.

Elastic Network Algorithm
The elastic network (ENET) algorithm uses an adjustable convex combination of L1 regular and L2 regular, by introducing an additional parameter α to control the ratio of L1 regular and L2 regular term [62]. Thus, the objective function of defining ENET is as follows: The ENET algorithm realizes the linear combination of ridge regression and lasso regression by introducing L1 and L2 penalty terms into the objective function [63]. The L1 regular term in the Lasso regression can avoid the loss of important variables by screening the variables and eliminating irrelevant variables, which helps improve the prediction accuracy of the model. The L2 regularized shrinkage regression coefficient in the ridge regression can achieve the stability of model prediction by reducing the error. The elastic network regression algorithm can automatically solve a specific objective function. The solution method is generally to determine the minimum value of the objective function through continuous iteration. When the objective function value reaches the local minimum or the global minimum, the network stops running. The algorithm does not require manual training and can ensure the global convergence of the network [64]. It can balance the complexity of the model while maintaining the prediction accuracy. It has the advantages of strong robustness and stable operation, which is fully reflected in this study.

Stacking Technology
Ensemble learning can achieve better generalization performance than using a single learner by constructing and combining multiple learners. Based on the different strategies of the learner combination, it can be divided into boosting, bagging, and stacking learning methods [37]. Among them, stacking is a typical representative of the learning method, which learns how to best combine the predictions from multiple machine learning models with good performance and is typically used to generate a set of heterogeneous prediction variables. The basic idea of stacking regression is as follows: first, the entire training set is divided into T blocks by bootstrapped sampling, and the subset of T-1 blocks is randomly selected for model training of the primary learners. The model is tested on the remaining subset. Subsequently, through the training of the primary learner, the output of various learners is used as input of the next level to train the secondary learner to obtain a final output. Figure 3 showed the working principal diagram of this study based on the use of the stacking regression technology to predict the yield of winter wheat. The stacking method of different growth periods is similar to the stacking method of the regression model, and the entire process is divided into two levels. In this study, the ENET was used as the primary model and the secondary model for testing, to effectively avoid collinearity between the yield prediction results of each growth period of the winter wheat because of the overlap of information. First, the dataset was randomly and evenly divided into 10 parts, 9 of which were randomly selected as the original training dataset to train the first-level learner, and the remaining one was used as the model test set. Subsequently, the primary model was trained with ten-fold cross-validation to obtain the prediction of each fold, and the results were collected in the prediction matrix outside the sample. At the same time, the 10 models trained in the test set were used to predict the yield of winter wheat. The results produced 10 groups of predicted values in the test set, and the average predicted value was calculated. In this study, the regression models of the four growth stages were tested, and the outputs of each primary model were used as input to the secondary model to train again, thus integrating the prediction ability of the multiple growth stages. Similarly, ten-fold cross-validation was also used in the secondary model to obtain the final predicted value to minimize model uncertainty. To compare the prediction accuracies of the different models more fairly, the dataset was divided into training set and test set with a sampling strategy of 9:1, which was repeated 100 times, and the same ten-fold cross-validation was performed on the primary model and the secondary model in each division process. Finally, 1000 models were generated, and the average values of the coefficient of determination (R 2 ) and root-mean-square error (RMSE) on these 1000 test sets were used as the final evaluation index of the model. The stacking regression framework was built in R v4.0.2.

Cross-Validation
Cross-validation [65,66], also known as rotation estimation, this theory was proposed by Seymour [67]. It is a statistically practical method of cutting data samples into smaller subsets. Typically, it is used as an accuracy test and hyperparameter screening method when the amount of sample data is small. This study used 10-fold cross-validation to determine the best parameters of the model to obtain a model with higher accuracy. By dividing the dataset into 10 parts, 9 of them were used as training data in turn, and the remaining one was used as test data. Each test yielded a corresponding accuracy. The average error rate of 10 results was used to estimate the accuracy of the algorithm. The advantage of this method is that the randomly generated sub samples could be repeatedly used for training and verification, and the results could be verified each time.

Model Performance Evaluation
The coefficient of determination (R 2 ) and the root-mean-square error (RMSE) were selected as the evaluation indicators of the modeling accuracy and verification accuracy. The closer the R 2 value to 1 and the lower the RMSE value, the better the model fitting effect. The calculation method is as follows: where x i is the measured value of the yield; y i is the predicted value of yield; y is the average yield; n is the number of samples.

t-test
The t-test [68], also known as the student's test, uses the t-distribution theory to infer the probability of the differences. Typically, in the case of small sample data, the t-test is performed to determine whether the p value is greater than 0.05 to compare whether the difference between the two sets of data is significant. If p < 0.05, there is a significant difference between the two groups. If p > 0.05, there is little possibility of a significant difference between the two groups. The calculation formula is as follows: where X 1 and X 2 are the mean values of two samples; S 2 1 and S 2 2 are their variances; n 1 and n 2 are the two sample sizes, respectively.

Phenotypic Analysis
The measured GY of each plot was obtained in the winter wheat harvest period, and the main statistical parameters of GY from different growing seasons were shown in Table 2.
The results show that the coefficient of variation (CV) values of GY from both growing seasons separately were similar, while the CV of overall GY from both growing seasons had elevated compared to the results from single growing season only. LAI measured values of each plot were obtained the jointing, flagging, anthesis, and grain filling stages of winter wheat, and the main statistical parameters and variation characteristics of LAI were shown in Table 3. Overall, the mean values of LAI from different growing seasons all showed a trend of increasing and then decreasing with the advancement of the fertility period, with the greatest increase rate of LAI from the jointing stage to the flagging stage and the peak of LAI at the anthesis stage, and the smallest mean value of LAI at the grain filling stage. In addition, the CV of LAI from single growing seasons were similar, while the CV of LAI from both growing seasons were significantly higher.

Correlations between Hyperspectral Vegetation Indices and Grain Yield
A correlation analysis between GY and 15 hyperspectral vegetation indices and LAI at the jointing, flagging, anthesis, and grain filling stages of wheat was conducted, as shown in Figure 4. The results showed that at the jointing stage, the absolute value of the correlation coefficient (|r|) between GY and 15 vegetation indices and LAI ranged from 0.39 to 0.49, and the absolute value of the correlation coefficient between NDWI and GY was the highest, with the best correlation (|r| = 0.49). At the flagging stage, the |r| value between GY and indices ranged from 0.47 to 0.67, among which the absolute value of the NDWI and GY correlation coefficient was the highest, that was, the strongest correlation (|r| = 0.67). At the anthesis stage, the |r| value between GY and the indices ranged from 0.48 to 0.81, and the absolute value of the correlation coefficient between NDWI and GY was the highest (|r| = 0.81). In the grain filling period, the |r| value between GY and the indices was between 0.7 and 0.82, and the correlation between the VREI and GY was the strongest (|r| = 0.82). The results showed that the correlation coefficients of LAI and GY were 0.47, 0.67, 0.68, and 0.78 at the jointing, flagging, anthesis, and grain filling stages, respectively. Overall, the correlation between GY and the hyperspectral vegetation indices and LAI showed the gradually increasing trend as the growth period progressed. All indices of each growth period were significantly correlated with GY (p < 0.001), which could be used to predict the yield of winter wheat. The correlations between different indices were shown in the attached Figure A2.

Elastic Network for Grain Yield Prediction
Based on the ENET, the vegetation indices and the vegetation indices coupled LAI were used as the input factors for the model to construct GY prediction models for four growth stages, respectively. Based on the prediction results, the box-plot of GY prediction accuracy for each growth stage was drawn with or without LAI, as shown in Figure 5. Only 15 vegetation indices were used as the input features of the model to construct the GY prediction model for each growth period. The results showed that GY prediction accuracy at the jointing stage was R 2 = 0.20, RMSE = 1.33 t ha −1 ; the GY prediction accuracy at the flagging stage was R 2 = 0.40, RMSE = 1.18 t ha −1 ; the GY prediction accuracy at the anthesis stage was R 2 = 0.56, RMSE = 1.06 t ha −1 ; the GY prediction accuracy during the grain filling period was R 2 = 0.60, RMSE = 0.94 t ha −1 . The above results show that the prediction accuracy R 2 of the GY prediction model based on the vegetation indices increased with the growth period and that the RMSE decreased with the growth of winter wheat, indicating that GY prediction ability for winter wheat increased with the growth period and reached the best prediction effect at the grain filling stage.
To study the influence of LAI on the accuracy of winter wheat GY prediction model, 15 vegetation indices of the four growth periods coupled with LAI of the same growth period were used as input factors for the model to construct a single-growth-period GY prediction model. The results showed that R 2 = 0.22, RMSE = 1.31 t ha −1 at the jointing stage, R 2 = 0.49, RMSE = 1.1 t ha −1 at the flagging stage, R 2 = 0.58, RMSE = 1.02 t ha −1 at the anthesis stage, and R 2 = 0.66, RMSE = 0.88 t ha −1 at the grain filling stage. Overall, the prediction accuracy of GY of winter wheat gradually increased with the growth period, and the prediction accuracy of the model at the grain filling stage was the highest.
The results showed that the 15 vegetation indices could be coupled with LAI to construct a GY prediction model. During the jointing, flagging, anthesis, and grain filling stages, the average R 2 of GY prediction model increased by 10%, 22.5%, 3.6% and 10%, the average RMSE were reduced by 1.5%, 6.7%, 3.8%, and 6.4%, respectively. A comparative analysis of the above results showed that GY prediction accuracy of each growth period had been improved after coupling the vegetation indices with LAI. Among them, the GY prediction accuracy of the flagging stage had improved the most.
To further verify the performance of GY prediction by coupling LAI with the vegetation indices, a paired t-test was used to evaluate whether there was a significant difference in the impact of vegetation indices on GY prediction accuracy when coupling with LAI, as listed in Table 4. The results showed that the p-values for all four growth stages were less than 0.05, indicating that the vegetation indices coupled with LAI were used to construct GY prediction models with statistically significant improvement in prediction accuracy. The T value of the flagging stage was the highest (t = 32.264), and the accuracy was improved most evidently.
As shown in Figure A3, the Pearson's correlations between the GY predictions for the four individual growth stages were high (r = 0.6-0.95). By contrast, the density curves of the grain yield predictions were slightly different from each other. Moreover, the base learners differed significantly in the distribution interval of the prediction accuracy.

Comparative Analysis of Grain Yield Prediction Accuracy from Multiple Growth Stages
To evaluate the performance of the stacking method, the ENET algorithm was used as a secondary model to construct GY prediction model by integrating the prediction results of the four growth stages with and without coupled LAI in two cases, respectively. The accuracy of GY prediction results obtained by combining four of the base learners were presented in Figure 6.
The GY prediction model based on multiple growth stages was constructed using only the vegetation indices as the input factors for the model. The average prediction accuracy of the integrated model was R 2 = 0.67, RMSE = 0.85 t ha −1 . Overall, the GY prediction accuracy of the integrated multiple growth stages had been improved to varying degrees compared with the four single growth stages. Compared with the prediction accuracy of GY at the jointing, flagging, anthesis, and grain filling stages, the average R 2 of the integrated model increased by 235%, 67.5%, 19.6% and 11.7%, and the average RMSE decreased by 36.1%, 28%, 19.8%, and 9.6%, respectively.
The vegetation indices coupled with LAI were used as the input factors of the model to construct a GY prediction model by integrating multiple growth stages. The results showed that the prediction accuracy of the integrated model was R 2 = 0.74, RMSE = 0.81 t ha −1 . Overall, after integrating the multiple growth stages, the prediction accuracy of GY had been improved to varying degrees. Compared with the jointing, flagging, anthesis and grain filling stages, the average R 2 values of the integrated model increased by 236%, 51%, 27.6% and 12.1%, and the average RMSE decreased by 38.2%, 26.4%, 20.6%, and 8%, respectively. Based on the analysis of the above research results, the vegetation indices coupled with LAI were used as the model input factors, and based on the stacking technology, a GY prediction model integrating the four growth stages was constructed, yielding highest verification accuracy (R 2 = 0.74, RMSE = 0.81 t ha −1 ). To further verify whether the improvement in GY prediction accuracy by integrating information from multiple growth stages is significant, paired samples t-tests were used to assess whether there is a statistical difference in the validation R 2 between the integrated model and the single-growth stage model under two cases of vegetation indices with and without coupled LAI, respectively. The results were listed in Table 5. The results showed that the integrated model outperformed the other base models with or without LAI, and the p values were less than 0.05 indicating that the accuracy improvement obtained by the integrated model was statistically significant. This further demonstrated the effectiveness of the stacking technique proposed in this paper for GY prediction. Among them, the accuracy of the jointing stage was improved most evidently, and the t value was the highest (t = 51.153 and t = 60.18, respectively), consistent with the above results. Moreover, when the vegetation indices coupled with LAI were used as the input factors of the model, the prediction accuracy of the integrated model improved more evidently, and GY prediction effect was better.

Weight Results for the Secondary Models
To determine the weight of each growth stage in the model stacking process, the regression coefficients of the prediction results of each growth period in the secondary model were statistically analyzed with and without coupling LAI, as shown in Figure 7. The results showed that when only the vegetation indices were used as the input factors for GY prediction model, the average regression coefficients of the jointing, flagging, anthesis, and grain filling stages in the integrated model were 0.04, 0.05, 0.49, and 0.66, respectively. In the case of vegetation indices coupled with LAI, a GY prediction model integrating multiple growth stages was constructed, and the average regression coefficients of the jointing, flagging, anthesis, and grain filling stages were 0.06, 0.23, 0.48, and 0.72, respectively. Overall, with the development of the growth period, the proportion of each growth stage in the integrated model showed a gradual upward trend, in which the average regression coefficient of the grain filling stage and the weight were the highest. Compared with only using vegetation indices for GY prediction, after coupling the vegetation indices with LAI, the weight of the jointing, flagging, and grain filling stages in the integrated model all increased, while the weight of the anthesis stage decreased slightly, and the weight of the flagging stage increased most evidently.

Discussion
This study aims to improve the accuracy and robustness of crop yield prediction models. In this study, the data of two planting seasons from 2016 to 2018 were used to evaluate the improvement effect of GY prediction model. The information from multiple cropping seasons increased the variability of the data (Tables 2 and 3), which was due to the differences in environmental and phenological conditions between different years and enhanced the prediction accuracy and transferability of the model. In recent years, hyperspectral remote sensing has great potential for application in rapid and accurate estimation of crop parameters and yields in agricultural fields because of its advantages of narrow waveband and strong continuity [69]. Considering the information overlap between adjacent bands, 15 vegetation indices were calculated in this study for constructing the GY prediction model instead of all bands in order to avoid generating data redundancy. These indices are located in the visible red and near-infrared wavelengths, which are important for photosynthesis in plants and are often used for timely monitoring of vegetation cover and growth vigor, etc. [70]. The results of the correlation analysis ( Figure 4) indicated that all vegetation indices performed a highly significant correlation with GY (p < 0.001). However, there are differences in the potential of different vegetation indices for GY prediction due to the fact that the vegetation information obtained from different spectral channels has various correlations with different elements of vegetation or a certain character state. The NDWI index showed a high correlation with GY due to its ability to effectively extract the water content of the vegetation canopy and respond to moisture changes in a timely manner [71]. The VREI index provides a timely response to changes in chlorophyll concentration, canopy structure and water content in plants, and is often used in vegetation phenology change monitoring, precision agriculture and vegetation productivity modeling [59]. The vegeta-tion cover will change as the fertility period advances, and the effect of soil background can be effectively reduced by introducing MCARI and SAVI. The ARVI can effectively reduce the dependence of spectral reflectance on atmospheric properties [48], which is important for the transfer of models to airborne remote sensing platforms. NDVI eliminates most of the variation in irradiance associated with instrument calibration, sun angle, topography, cloud shadows, and atmospheric conditions, enhancing the response to vegetation [52]. PRI is sensitive to changes in carotenoid content in plants and gradually increases in correlation with GY during late crop growth. SIPI and PSRI indices are often used for canopy physiological stress monitoring and crop yield analysis [56,58]. In addition, the correlation between the same vegetation index and GY varies at different growth stages. This is due to the fact that as the crop fertility progresses, the growth and development of the plant and the influence of the external environment led to changes in its internal components and structure, which cause the spectral characteristics of the plant canopy or leaves to change subsequently [72]. Thus, the combination of multiple vegetation indices used to construct the GY prediction model can achieve complementary information to overcome the limitations of a single band or a single index, which effectively improves the prediction accuracy and robustness of the model (Table A7). This has important significance for the transfer of the model to practical applications.
The essence of yield formation is the complex process of production, transportation, accumulation, and distribution of photosynthetic products [73]. As the reproductive period advances, spectral saturation of the vegetation index occurs when the vegetation cover is greater than 80%. In this study, the coupling of structural parameters (LAI) and spectral parameters (vegetation indices) of the crop effectively improved the accuracy of GY prediction at each growth stage, and the results of the t-test (Table 4) also fully demonstrated that the improvement of model effects at each growth stage was significant (p < 0.05). This is because LAI is an important canopy structure information of crops and one of the important variables to determine the intensity of photosynthesis [4,74]. The size of vegetation leaf area directly affects light interception, photosynthesis and transpiration efficiency of vegetation canopy, and is considered to be the most important factor affecting plant productivity [75]. The results are consistent with those reported by Delécolle et al. [24] and Wang et al. [76]. Among them, the most significant improvement in model accuracy was observed in the flagging stage (the relative value of R 2 increased by 22.5%) which is consistent with our statistical t-test (t = 32.264). This is because the flagging stage is a critical period for the formation of wheat spike number, and this growth stage is mainly reflected in the rapid growth of leaf area and the consequent increase in photosynthetic efficiency, which contributes more to the accumulation of dry matter [77]. Our study demonstrated that canopy spectral saturation can be effectively overcome by coupling LAI and vegetation indices, and this study can provide a new method for accurate and rapid estimation of crop yield.
The results of GY estimation at single growth stage indicated that there were significant differences in yield estimation at different fertility stages, which showed an increasing accuracy of prediction with the advancement of growth stage. This is because crop growth and development is a dynamic and continuous process, and changes in the growth state and environmental characteristics of the crop at each growth stage can lead to different morphological characteristics of the spectral curve, and thus the accuracy of the yield estimation models at different growth stages varies. Since each key stage of crop growth and development can have a significant impact on yield formation, it is important to find a method to combine information from multiple growth stages to improve the accuracy and robustness of the model. The objective of employing a stacking technique was to improve the prediction ability by combining data from multiple growth stages instead of using single growth stage data. When implementing a stacking approach, it is necessary to include self-sufficient, independent, and diverse base learners for analysis ( Figure A3) [47]. The results showed that there was a strong correlation between the predicted results of different growth stages (correlation coefficients of R = 0.6-0.95), which was due to the overlap of information between adjacent fertility stages. ENET is used as a secondary model to perform the multicollinearity data analysis in the model when stacking multiple base learners due to its advantages of both ridge regression and lasso regression [63]. In this study, vegetation indices and LAI from four key growth stages of the crop were successfully combined to achieve higher accuracy in predicting grain yield than any single growth stage (Figures 5 and 6). In addition, the accuracy parameters (R 2 and RMSE) of the base learner fluctuated more and had a wider range than those of the ensemble approach, which also demonstrated the stability of the GY model constructed based on the Stacking technique. However, stacking techniques are sensitive to the timing of data collection at different growth stages. Due to the differences in developmental rates of different varieties, this may lead to the fact that different varieties may be at different growth stages at the same time. In addition, due to the different gradients of water and fertilizer stress treatments, which may have an impact on the growth and development rate of the crop, this poses a significant challenge to our data collection process.
The analysis of the regression coefficients combining the four base learners showed that the highest weight was given to the grain filling stage with or without LAI, which is consistent with the highest accuracy of the previous prediction of the grain filling stage. Therefore, the use of the grain filling stage as a key learner in the grain yield prediction has a very positive effect on the improvement of the prediction effectiveness of the ensemble model. This is rational since grain filling is the stage where wheat transfers organic matter such as starch and protein produced by photosynthesis from the vegetative organs to the grains, and the vegetation indexes in this period are closely related to the final quality of thousand-grain weight [29]. However, the regression coefficients for the jointing period showed negative values in some cases, which is obviously detrimental to improving the model accuracy. The reason for this phenomenon is that the crop cover is small and the ground surface is exposed at the jointing stage, so it is easy to be disturbed by the soil background and other factors during the spectral data collection [78]. Moreover, the LAI at this stage is relatively small and photosynthesis is weak, resulting in the accumulation of little dry matter yet, and its contribution in the integrated model is poor compared to other growth stages. The t-test results (Table 5) show that the GY prediction model with stacked four growth stages performed with good accuracy and robustness than any single growth stage, which has important implications for the practical application of the prediction model.

Conclusions
Winter wheat is one of the most widely grown grain crops in the world, and preharvest insights into yields can help optimize management practices. In this study, we developed an ensemble learning framework for stacking information from multiple growth stages of the crop. The results showed that the ensemble model outperformed other individual growth stage models. In addition, coupling vegetation indices with LAI can effectively overcome the spectral saturation phenomenon and improve the prediction accuracy of the model. Our study demonstrated the effectiveness of Stacking technique for improving GY prediction accuracy ( Figure A4), which can provide new ideas for yield estimation of other crops.

Data Availability Statement:
Since the data sets were acquired through field collection, all data cannot be shared due to legal, ethical, and privacy restrictions.