Article

Hyperspectral Yield Estimation of Winter Wheat Based on Information Fusion of Critical Growth Stages

1 College of Agriculture, Shanxi Agricultural University, Taigu, Jinzhong 030801, China
2 Key Laboratory of Sustainable Dryland Agriculture of Shanxi Province, Taiyuan 030031, China
3 Department of Basic Sciences, Shanxi Agricultural University, Taigu, Jinzhong 030801, China
* Author to whom correspondence should be addressed.
Agronomy 2026, 16(2), 186; https://doi.org/10.3390/agronomy16020186
Submission received: 26 November 2025 / Revised: 26 December 2025 / Accepted: 4 January 2026 / Published: 12 January 2026
(This article belongs to the Section Precision and Digital Agriculture)

Abstract

Timely and accurate crop yield estimation is vital for food security and management decision-making, and integrating remote sensing with machine learning provides an effective solution. In this study, canopy hyperspectral data were collected with an ASD FieldSpec 3 handheld spectrometer during the critical growth stages of winter wheat. From these data, 18 vegetation indices (VIs) were calculated and their correlations with yield were analyzed, and the Successive Projections Algorithm (SPA) was used to screen characteristic bands. Recursive Feature Elimination (RFE) was then employed to select optimal features from the VIs and characteristic bands, producing a multi-temporal fusion feature set. To identify the best yield estimation approach, four machine learning models were compared: Deep Forest (DF), Support Vector Regression (SVR), Random Forest (RF), and Gaussian Process Regression (GPR). Performance was evaluated using the coefficient of determination (R²), root mean square error (RMSE), and relative root mean square error (rRMSE). The highest correlations between VIs and grain yield were observed during the flowering and grain-filling stages: the absolute correlations with yield exceeded 0.713 at flowering (except for TCARI) and 0.730 at grain filling. SPA identified key bands primarily in the near-infrared and short-wave infrared regions. On this basis, integrating multi-temporal features through RFE improved the accuracy of yield estimation. The DF model with fused flowering-stage and filling-stage features performed best (R² = 0.786, RMSE = 641.470 kg·hm⁻², rRMSE = 15.67%). This study demonstrates that combining hyperspectral bands and VIs from different growth stages provides complementary information, and the findings offer an effective method for crop yield estimation in precision agriculture.

1. Introduction

Wheat, one of the world’s three major staple crops, is the primary food source for nearly half of the global population, and its yield is critical to national food security and social stability [1]. With continued population growth and economic development, global food demand continues to rise [2]. In 2024, China’s wheat sowing area reached 2.359 × 10⁷ hm², with a yield of 1.401 × 10⁸ t, accounting for approximately 20% of the total annual grain output. In this context, establishing precise yield prediction models is of great significance for safeguarding food security and optimizing agricultural policies. Yield estimation based on agricultural monitoring and intelligent algorithms can provide a scientific basis for government departments to formulate grain regulation policies and for agricultural producers to optimize management measures, while also enhancing China’s voice in the international grain market [3,4].
In recent years, the rapid development of remote sensing technology has provided a new perspective for crop yield prediction [5,6]. In particular, hyperspectral remote sensing data, with their high spectral resolution, can reflect the physiological status and environmental responses of crops at different growth stages in greater detail [7]. In current remote sensing yield estimation research, vegetation indices (VIs) have become an important tool because of their efficiency in crop growth monitoring. Tao et al. [8] used Unmanned Aerial Vehicle (UAV)-based hyperspectral data to confirm a robust correlation between VIs and winter wheat yield, and Li et al. [9] noted that optimized VIs can effectively suppress soil background noise. However, single vegetation indices have limitations: they tend to saturate under high biomass conditions, which makes it difficult to reflect the complex physiological state of crops comprehensively [10,11]. To overcome this limitation, Li et al. [9] and Wu et al. [12] proposed introducing advanced feature selection algorithms to address the “curse of dimensionality” and information redundancy in hyperspectral data. Ranđelović et al. [13] demonstrated that combining Random Forest models with optimized feature selection strategies can significantly improve yield prediction accuracy by identifying the most sensitive spectral variables. Feng et al. [14] also indicated that fusing multiple types of spectral information, especially combining vegetation indices with specific spectral features (such as red-edge parameters), can reveal crop physiological changes and environmental adaptability more accurately. Despite these advances, most existing studies treat “characteristic bands” and “vegetation indices” separately and lack a systematic assessment of hybrid feature subsets (bands + indices) selected by screening methods. This limits the full utilization of spectral information by the models [15].
In addition to feature fusion in the spectral dimension, the fusion of hyperspectral data from multiple growth stages is also an important method for improving crop yield prediction accuracy [16,17,18,19]. Crop yield formation is a dynamic cumulative process, and observations from a single period are often one-sided. Bian et al. [20] emphasized in winter wheat research that combining data from different phenological stages can significantly enhance model robustness. Research by Liu et al. [21] also confirmed that the accuracy of data fusion across multiple growth stages is superior to that of single growth stages. This suggests that integrating spectral information from the flowering stage and the filling stage is essential for comprehensively capturing the dynamic changes in crops across various growth stages.
Regarding modeling methods, machine learning algorithms have become mainstream because they handle non-linear relationships and high-dimensional data effectively. Fu et al. [22] compared Random Forest (RF), Artificial Neural Networks (ANN), and traditional regression models for winter wheat yield estimation and showed that the ANN model was superior in capturing the complex relationships between spectral features and yield, achieving an R² of 0.78. However, the existing literature mostly focuses on traditional machine learning models such as RF or Support Vector Regression (SVR) and has relatively overlooked Deep Forest (DF), a tree-based model that borrows the layered structure of deep learning while requiring less training data [23]. Although Shen et al. [24] demonstrated the advantages of deep learning architectures, their work was largely based on massive datasets. How to balance model complexity and generalization ability on small-to-medium-scale hyperspectral data remains an open scientific question.
Given these limitations, this study aims to explore a hyperspectral prediction model for winter wheat yield based on machine learning. Specifically, this study conducted the following work: (1) Constructed a hyperspectral dataset covering the flowering stage, filling stage, and their combination (Multi-stage) of winter wheat; (2) Proposed a hybrid feature selection strategy, using the RFE algorithm to select the top-10 most representative features (comprising both VIs and characteristic bands), and compared them against traditional full-band VIs and SPA-selected characteristic bands; (3) Systematically evaluated the performance of four regression models (SVR, RF, DF, and GPR). By optimizing feature selection and model training, this study aims to propose a practical data fusion method to provide more reliable technical support for winter wheat yield estimation in precision agriculture.

2. Materials and Methods

2.1. Experimental Design

The experiment was carried out at the farming station of Shanxi Agricultural University in Taigu District, Jinzhong City, Shanxi Province, China, from September 2023 to June 2025. The experimental plots (water pools) were built according to the FAO standard [25]. The site consisted of 30 plots, each measuring 2 m × 3 m with a depth of 1.5 m. To prevent lateral seepage and moisture exchange between adjacent units, all plots were isolated by 24.5 cm thick concrete walls. Two winter wheat cultivars, ‘Changmai 6878’ (dryland variety) and ‘Zhongmai 175’ (irrigated variety), were used. All plots received uniform basal irrigation during the overwintering and regreening stages. Five irrigation treatments were established: T1 (rain-fed), T2 (jointing), T3 (jointing + flowering), T4 (jointing + filling), and T5 (jointing + flowering + filling). For each irrigation event, 60 mm of water was applied. The experiment followed a completely randomized design with three replicates per cultivar and treatment, and other field management practices were consistent with local conventional agricultural standards. The geographic location of the study area is illustrated in Figure 1.

2.2. Data Acquisition

2.2.1. Canopy Hyperspectral Acquisition

Canopy hyperspectral reflectance was collected at five critical growth stages: jointing, booting, heading, flowering, and filling. Measurements were conducted using a FieldSpec 3 spectroradiometer (Analytical Spectral Devices, ASD, Boulder, CO, USA) with a spectral range of 350–2500 nm. The spectral sampling interval and resolution were 1.4 nm and 3 nm for the 350–1000 nm range, and 2 nm and 10 nm for the 1000–2500 nm range, respectively. Field measurements were carried out under clear, cloudless, and calm conditions (wind speed < level 3) during the period of 10:00 to 14:00 local time to ensure stable solar illumination. To ensure representativeness, uniform growth positions within each plot were selected for spectral acquisition. The fiber optic probe was positioned vertically at a constant height of 1 m above the canopy, and 15 individual spectral scans were recorded and averaged per plot to determine the final canopy reflectance. Radiometric calibration was performed using a standard white reference panel before each measurement to maintain high data accuracy and consistency.

2.2.2. Determination of Winter Wheat Yield and Yield Components

At maturity, 1 m double rows were selected in each plot to determine the number of panicles per m², and 20 single stems were taken from each plot for indoor testing to determine the number of grains per panicle and the 1000-grain weight. In addition, 1 m² samples were harvested from the area where the reflectance spectra had been measured, and the actual yield was determined after threshing and air-drying. The measured yield was calculated using Formula (1).
$$\text{The measured yield} = \text{sampling fresh wheat weight} \times [1 - \text{grain moisture content}] \div [1 - 14\%] \div \text{sampling area} \tag{1}$$
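As a worked illustration of Formula (1), the following minimal sketch standardizes a plot sample to 14% moisture. The function name, the argument names and the final conversion from kg·m⁻² to kg·hm⁻² (1 hm² = 10,000 m²) are assumptions made for this example; the sample values are hypothetical and are not data from the experiment.

```python
def measured_yield(fresh_weight_kg, moisture_content, area_m2, standard_moisture=0.14):
    """Yield in kg per hm^2, standardized to `standard_moisture` (Formula (1)).

    fresh_weight_kg  : sampled grain weight in kg (hypothetical input name)
    moisture_content : grain moisture as a fraction, e.g. 0.20 for 20%
    area_m2          : sampling area in square metres
    """
    if not 0.0 <= moisture_content < 1.0:
        raise ValueError("moisture_content must be a fraction in [0, 1)")
    if area_m2 <= 0:
        raise ValueError("area_m2 must be positive")
    # Dry matter in the sample, re-expressed at the standard moisture content.
    standardized_kg = fresh_weight_kg * (1.0 - moisture_content) / (1.0 - standard_moisture)
    # Per-area yield; the hm^2 conversion is an assumption of this sketch.
    return standardized_kg / area_m2 * 10_000.0


# Hypothetical sample: 0.75 kg of grain at 20% moisture from a 1 m^2 quadrat.
print(round(measured_yield(0.75, 0.20, 1.0), 1))
```

A sample that is already at 14% moisture is returned unchanged apart from the area conversion, which is a quick sanity check on the formula.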

2.3. Data Analysis Method

The original spectral data were preprocessed in ViewSpec Pro 6.0, and data quality was ensured through spectral curve shape analysis and outlier detection. Spectral regions with severe noise or strong atmospheric water absorption (350–400 nm, 1350–1500 nm, 1800–2100 nm and 2350–2500 nm) were removed [26]. Finally, Savitzky-Golay (SG) smoothing (window width 9, second-order polynomial), the first-order derivative and the standard normal variate (SNV) transformation were applied in series as the pretreatment scheme of this study. This tandem scheme was designed to suppress sensor noise and preserve spectral morphology while eliminating baseline drift and scattering interference, thereby amplifying subtle features sensitive to crop growth.
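The pretreatment chain can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the authors' ViewSpec Pro workflow: the smoothing and first derivative are taken in a single Savitzky-Golay step, the derivative is expressed per band index rather than per nanometre, spectra are rows of a 2-D array, and all function names are hypothetical.

```python
import math
import numpy as np


def savgol_kernel(window=9, polyorder=2, deriv=1):
    """Savitzky-Golay weights for the centre point of a sliding window."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row k of the pseudo-inverse maps window values to the fitted polynomial
    # coefficient c_k; the k-th derivative at the centre equals k! * c_k.
    return np.linalg.pinv(design)[deriv] * math.factorial(deriv)


def sg_derivative(spectra, window=9, polyorder=2, deriv=1):
    """Smoothed derivative of each spectrum (rows), same length as the input."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    kernel = savgol_kernel(window, polyorder, deriv)
    half = window // 2
    # Repeat edge values so the output keeps its length; the first and last
    # `half` bands are therefore only approximate.
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    return np.array([np.correlate(row, kernel, mode="valid") for row in padded])


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def preprocess(spectra):
    """SG (window 9, order 2) first derivative followed by SNV."""
    return snv(sg_derivative(spectra, window=9, polyorder=2, deriv=1))
```

SciPy's `scipy.signal.savgol_filter` (with its `window_length`, `polyorder` and `deriv` parameters) provides the same filter and would usually be preferred in practice; the explicit kernel is shown here only to make the computation visible.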
Sample Set Partitioning Based on Joint X-Y Distances (SPXY) was used to divide the samples. This method accounts for the distribution of the samples in both the spectral feature space (x) and the yield space (y) by means of Euclidean distances. Its mathematical expression is:
$$d_k(x, y) = \alpha \frac{d(x_i, x_j)}{\max(d_x)} + (1 - \alpha) \frac{d(y_i, y_j)}{\max(d_y)}$$
where α is the weight coefficient (0.5 in this study), d(x_i, x_j) and d(y_i, y_j) are the distances between samples i and j in the spectral space and the yield space, respectively, and max(d_x) and max(d_y) are the corresponding maximum distances used for normalization [27].
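A minimal sketch of this partitioning is given below. It combines the weighted joint distance with a Kennard-Stone style max-min selection, which is the usual way SPXY picks calibration samples; the function name, the `n_train` argument and the toy data are assumptions for illustration, and the split ratio used in the study is not stated here.

```python
import numpy as np


def spxy_split(X, y, n_train, alpha=0.5):
    """Return (train_idx, test_idx) chosen by joint x-y distance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    if not 2 <= n_train <= len(X):
        raise ValueError("n_train must be between 2 and the number of samples")
    # Pairwise Euclidean distances in spectral space and in yield space.
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    dy = np.abs(y - y.T)
    # Each distance matrix is normalized by its maximum, then weighted.
    d = alpha * dx / dx.max() + (1.0 - alpha) * dy / dy.max()

    # Start from the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # For every candidate, distance to its nearest already-selected sample.
        nearest = d[np.ix_(remaining, selected)].min(axis=1)
        # Add the candidate that is farthest from the selected set.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)


# Toy data: four samples with one spectral variable each.
train_idx, test_idx = spxy_split([[0.0], [1.0], [2.0], [10.0]], [0.0, 1.0, 2.0, 10.0], n_train=3)
print(sorted(train_idx.tolist()), test_idx.tolist())
```

The full pairwise array grows with the square of the sample count times the number of bands, which is acceptable for plot-scale data sets but would need a more economical distance computation for large ones.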

2.4. Feature Extraction and Analyses

2.4.1. Vegetation Indices

Vegetation indices (VIs) are parameters obtained by algebraically combining the reflectance of different bands, which can reduce the interference of background conditions on spectral reflectance data and have higher sensitivity than a single band [28]. On the basis of previous studies, 18 VIs with good correlation with yield were selected in this study. Table 1 lists the definition, description and source of the VIs used in this study.
Before modeling, the correlation between yield and the VIs calculated from the spectral reflectance at each growth stage was analyzed to determine the optimal growth period for yield estimation. The Pearson correlation coefficient (r) and its absolute value (|r|) were used to evaluate the correlation between the VIs and yield, and the analysis was completed in SPSS 26.0.
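The correlation screening can be reproduced outside SPSS with a few lines of NumPy. The sketch below computes one index, NDVI, from its standard definition and ranks a set of indices by |r|. The choice of bands fed to `ndvi`, the helper names and the toy values are assumptions for illustration; the band definitions actually used are those listed in Table 1.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    return (nir - red) / (nir + red)


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(a, b)[0, 1])


def rank_by_abs_r(index_table, yields):
    """Sort (name, r) pairs by decreasing |r| against yield."""
    scores = [(name, pearson_r(values, yields)) for name, values in index_table.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical per-plot index values and yields, not data from the study.
toy_yield = [1.0, 2.0, 3.0, 4.0]
toy_indices = {"noisy": [1.0, 3.0, 2.0, 4.0], "up": [1.0, 2.0, 3.0, 4.0]}
for name, r in rank_by_abs_r(toy_indices, toy_yield):
    print(f"{name}: r = {r:.3f}, |r| = {abs(r):.3f}")
```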

2.4.2. Selection of Spectral Characteristic Bands Based on SPA

The successive projections algorithm (SPA) is a feature selection method based on forward iterative search. Its core is to select, through vector projection analysis, the combination of variables with the least collinearity from the spectral matrix [44]. Mathematically, a projection operator is constructed through Gram-Schmidt orthogonalization so that each newly selected wavelength has maximum linear independence from the subspace spanned by the wavelengths already selected. This removes collinearity among the selected variables, reduces model redundancy, and improves computational efficiency and prediction accuracy. In this study, SPA was used to preliminarily screen the characteristic bands of the key growth periods.
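The projection step described above can be sketched in NumPy as follows. This covers only the projection phase for a single starting band; a complete SPA additionally compares candidate chains (different starting bands and subset sizes) by validation error, which is omitted here. The function name, the `start` argument and the toy matrix are assumptions for illustration.

```python
import numpy as np


def spa_projection(X, n_select, start=0):
    """Pick `n_select` column indices of X with minimal collinearity.

    X is a samples-by-bands matrix; `start` is the index of the first band.
    """
    P = np.asarray(X, dtype=float).copy()
    n_samples, n_bands = P.shape
    if not 1 <= n_select <= min(n_samples, n_bands):
        raise ValueError("n_select must not exceed min(samples, bands)")
    selected = [start]
    for _ in range(n_select - 1):
        v = P[:, selected[-1]]
        # Project every column onto the orthogonal complement of the most
        # recently selected column (one Gram-Schmidt step).
        P = P - np.outer(v, v @ P) / (v @ v)
        norms = np.linalg.norm(P, axis=0)
        norms[selected] = -1.0  # never pick the same band twice
        # The column with the largest residual norm is the least collinear
        # with everything selected so far.
        selected.append(int(np.argmax(norms)))
    return selected


# Toy matrix: column 1 is an exact multiple of column 0 and is never chosen.
toy = np.array([[1.0, 2.0, 0.0, 0.0],
                [0.0, 0.0, 3.0, 0.0],
                [0.0, 0.0, 0.0, 1.0]])
print(spa_projection(toy, n_select=3, start=0))
```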

2.4.3. Recursive Feature Elimination

Recursive Feature Elimination (RFE) is a greedy sequential backward search algorithm for finding an optimal feature subset. Its core idea is to repeatedly build a model, eliminate the one or several least relevant features after each round according to the objective function, and repeat this process on the remaining features until a ranking of feature importance is obtained [45]. In this study, RFE was used to rank feature importance and perform a secondary screening, and a multi-period combined feature set was constructed as the model input.
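The elimination loop can be illustrated with a small NumPy sketch. The base model here is an ordinary least-squares fit on standardized features, with the absolute coefficient used as the importance score; this is an assumption chosen to keep the example self-contained, since the estimator used inside RFE in the study is not specified in this section. The function name and the synthetic data are likewise hypothetical.

```python
import numpy as np


def rfe_rank(X, y, n_keep):
    """Backward elimination: drop the weakest feature until `n_keep` remain.

    Returns (kept, eliminated); `eliminated` is ordered from first dropped
    (least important) to last dropped.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Standardize so that coefficient magnitudes are comparable across features.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    active = list(range(X.shape[1]))
    eliminated = []
    while len(active) > n_keep:
        # Refit on the surviving features only.
        coef, *_ = np.linalg.lstsq(Xs[:, active], yc, rcond=None)
        # Remove the feature with the smallest absolute coefficient.
        weakest = int(np.argmin(np.abs(coef)))
        eliminated.append(active.pop(weakest))
    return active, eliminated


# Synthetic example: only features 0 and 2 carry signal.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 2] + 0.01 * rng.normal(size=60)
kept, dropped = rfe_rank(X_demo, y_demo, n_keep=2)
print("kept:", kept, "dropped in order:", dropped)
```

scikit-learn's `sklearn.feature_selection.RFE` implements the same loop around any estimator that exposes `coef_` or `feature_importances_`, and is the more practical choice when a tree-based or kernel model should drive the ranking.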

2.5. Modeling Method

In order to evaluate the effectiveness of the method and explore the relationship between canopy hyperspectral reflectance and winter wheat yield, this study selected four modeling methods: Support Vector Regression (SVR), Gaussian Process Regression (GPR), Random Forest (RF) and Deep Forest (DF).
(1) Support vector regression
Support vector regression evolved from the support vector machine. As a supervised learning algorithm, its basic principle is to fit the sample data by establishing an optimal hyperplane that minimizes the distance between the samples and the hyperplane. Compared with other algorithms, it offers high accuracy, handles high-dimensional and small-sample data well, and has good generalization ability and robustness [46].
(2) Gaussian process regression
Gaussian process regression is a machine learning method that combines statistical learning theory and Bayesian theory [47]. It performs well on regression and classification problems involving small samples, high dimensions and complex nonlinear relationships. In recent years, it has become one of the most efficient machine learning algorithms for crop parameter estimation [48].
(3) Random forest
Random forest is a prediction model based on decision trees and an ensemble learning strategy. Each tree considers a random subset of predictor variables at each split, and the predictions of many trees are combined to improve the accuracy and robustness of the overall model [49]. These characteristics give it excellent performance on a variety of prediction problems, including classification and regression. Its advantages are that the importance of each feature can be evaluated through unbiased estimation during regression, and that it handles complex nonlinear relationships more efficiently than traditional linear models [50].
(4) Deep forest
Deep Forest is a deep learning model based on decision tree ensembles. It extracts local features through multi-grained scanning and uses a cascade forest to enhance features layer by layer. The cascade forest is a decision-tree-based ensemble learning method that aims to improve on the performance of traditional forest models [23]. Its design borrows the hierarchical structure of deep learning, but it cascades multiple random forests or extremely randomized forests layer by layer, with each layer trained on the output features of the previous layer. The cascade forest offers high model performance, a simple structure, few hyperparameters to tune, and good adaptability to small-sample data sets.
In this study, to ensure a fair comparison and balance accuracy with generalization, hyperparameters were tailored to each algorithm’s sensitivity. Bayesian optimization was used to determine kernel parameters for SVR and GPR, while the RF and DF models followed empirical configurations: 500 trees for RF, and for DF, a 6-layer cascade with 200 trees per forest and an early stopping threshold of 2 layers to mitigate overfitting on the small hyperspectral dataset. Under these optimized settings, the four models (SVR, RF, GPR, and DF) were evaluated as parallel candidates on identical feature subsets to benchmark their predictive accuracy and generalization for wheat yield estimation.

2.6. Model Validation and Evaluation

Five-fold cross-validation is a commonly used method for evaluating the generalization performance of a model and is often applied to small-sample data sets [51]. In this study, five-fold cross-validation was applied to each model, and the average over the five folds was taken as the final model performance. The coefficient of determination (R²), root mean square error (RMSE) and relative root mean square error (rRMSE) were used to evaluate model accuracy and were calculated using Formulas (2)–(4).
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - x_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{2}$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (x_i - y_i)^2}{n}} \tag{3}$$
$$rRMSE = \frac{RMSE}{\bar{y}} \times 100\% \tag{4}$$
where $y_i$ is the measured value of wheat yield; $x_i$ is the predicted value of wheat yield; $\bar{y}$ is the mean of the measured wheat yield; and $n$ is the number of samples.
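Formulas (2)–(4) translate directly into code. The sketch below uses only the Python standard library; the function names and the four-sample example are illustrative assumptions, with `measured` playing the role of y and `predicted` the role of x.

```python
import math


def r_squared(measured, predicted):
    """Coefficient of determination, Formula (2)."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - x) ** 2 for y, x in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean square error, Formula (3)."""
    n = len(measured)
    return math.sqrt(sum((x - y) ** 2 for y, x in zip(measured, predicted)) / n)


def rrmse(measured, predicted):
    """Relative RMSE in percent of the mean measured value, Formula (4)."""
    mean_y = sum(measured) / len(measured)
    return rmse(measured, predicted) / mean_y * 100.0


# Hypothetical values: one of four predictions is off by 1.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(r_squared(y_true, y_pred), rmse(y_true, y_pred), rrmse(y_true, y_pred))
# → 0.8 0.5 20.0
```

Under five-fold cross-validation, these functions would be evaluated on each held-out fold and the five results averaged.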

3. Results

3.1. Descriptive Statistical Analysis

In the field experiments conducted from 2024 to 2025, the water treatments had a significant regulatory effect on winter wheat grain yield. Both cultivars showed a clear increase in yield with increasing irrigation, and this trend was consistent across the two study years. For Changmai 6878, the maximum yield was achieved under the T5 treatment. In 2025, the average yield for T5 reached 7227.3 kg·hm⁻², an 81.8% increase over the T1 treatment (3974.5 kg·hm⁻²) of the same year. In 2024, the yield under T5 (3693.6 kg·hm⁻²) showed a more pronounced increase of 164.4% over the T1 treatment (1397.1 kg·hm⁻²). Notably, for Changmai 6878 the yield differences among T3, T4, and T5 were relatively small, particularly in 2024, suggesting that the yield gain for this cultivar tends to plateau once moderate irrigation intensity is reached. Zhongmai 175 showed even greater potential for yield enhancement. Its peak yield also occurred under the T5 treatment, reaching 7844.9 kg·hm⁻² in 2025, a 93.1% increase over T1. In 2024, the T5 yield (4068.5 kg·hm⁻²) was 302.8% higher than that of the T1 treatment (1010.0 kg·hm⁻²). Compared with Changmai 6878, Zhongmai 175 was more sensitive to the irrigation gradient and had a higher yield ceiling under high water availability. Regarding interannual variation, overall yield levels in 2025 were significantly higher than those in 2024, which could be attributed to more favorable climatic conditions or reduced environmental fluctuations during the 2025 growing season. The larger error bars in the 2024 yield data reflect greater dispersion among replicates at lower yield levels. These results indicate that increasing irrigation intensity significantly enhances the grain yield of both winter wheat cultivars; the yields of Changmai 6878 and Zhongmai 175 under the different treatments are illustrated in Figure 2.

3.2. Optimal Feature Extraction Results

3.2.1. Correlation of VIs with Yield

As shown in Figure 3, the correlations between the 18 vegetation indices and yield were analyzed at the five growth stages of winter wheat. ARVI at the filling stage showed the strongest correlation with yield (r = 0.888), the highest correlation coefficient among all five growth stages. All vegetation indices at the grain-filling stage were significantly correlated with yield (|r| > 0.70), and the indices at this stage generally had stronger predictive ability than those at the other stages.
At the flowering stage, TCARI was only weakly correlated with yield, whereas the other vegetation indices (such as ARVI and PRI) still showed highly significant correlations (|r| > 0.70). At the jointing stage, all indices except TCARI and CRI were significantly correlated with yield (p < 0.05), with absolute correlation coefficients above 0.58. At the booting stage, all indices except TCARI and TVI were significantly correlated (p < 0.05), with absolute correlation coefficients above 0.59. At the heading stage, all indices except TCARI, CARI and CRI were significantly correlated with yield, with absolute correlation coefficients above 0.52. These results show that vegetation indices have a statistically significant correlation with crop yield and can serve as effective indicators for yield estimation.
Figure 4 shows box plots of the absolute correlation between the vegetation indices and yield at each growth stage. Comparing the stages, the absolute correlation coefficients at the flowering and filling stages were higher than those at the other stages. Except for TCARI, the absolute correlation coefficients of the vegetation indices at the flowering stage exceeded 0.713, and those at the filling stage exceeded 0.730. Most vegetation indices were thus significantly correlated with yield at the flowering and filling stages. This result confirms the importance of the flowering and filling stages as key windows for yield prediction [52].

3.2.2. Characteristic Band Screening

SPA was applied to the preprocessed hyperspectral data (SG + first derivative + SNV) at the flowering and filling stages to extract the spectral information most related to winter wheat yield. Figure 5 shows that the characteristic bands extracted at the two growth stages differ, which reflects the stage-specific spectral response of the crop. At the flowering stage, the 10 key bands screened by SPA were 726, 745, 756, 929, 936, 1134, 1345, 1503, 2103 and 2346 nm, covering important near-infrared and short-wave infrared bands, some of which are closely related to canopy structure and water status. The 10 bands selected at the filling stage were 565, 691, 714, 735, 929, 1118, 1347, 2103, 2348 and 2349 nm. Compared with the flowering stage, these bands were more concentrated in the short-wave infrared region (especially 2200–2350 nm), reflecting the spectral response to water and organic matter accumulation during grain filling.

3.2.3. Spectral Characteristics and Characteristic Band Sorting Based on RFE

Based on the multi-period hyperspectral data, the RFE algorithm was applied separately to the vegetation indices and the spectral bands to optimize the feature subsets. Through iterative evaluation and ranking of feature importance, five vegetation indices and five key spectral bands were selected at each stage and combined into a new feature set for yield prediction. Figure 6 shows heat maps of how feature importance evolved during RFE. The color represents the normalized importance of each feature at each iteration. The vertical axis is sorted from bottom to top according to the importance of the finally retained features, and the horizontal axis represents, from left to right, the order in which features were eliminated during the iterations. At the flowering stage, the top five vegetation indices were MTCI, SR, PRI, MSR and CRI, and the key spectral bands were 745 nm, 2103 nm, 936 nm, 1345 nm, and 929 nm. At the filling stage, the five highest-ranked vegetation indices were VARI, CARI, CRI, PSRI and NDVI, with key bands at 735 nm, 565 nm, 714 nm, 1347 nm, and 691 nm. Most of the selected indices and bands were concentrated in the red-edge, near-infrared, and short-wave infrared regions, which are known to reflect photosynthetic efficiency and plant water status during critical growth stages. Finally, the features selected at the flowering and filling stages were integrated into a multi-period comprehensive feature set, which enriches the temporal information and provides stronger support for physiological interpretation.

3.3. Evaluation of Winter Wheat Yield Prediction Model

The vegetation indices, characteristic bands, and combined features were extracted from the optimal observation windows (flowering and filling stages) determined in the previous section. Four machine learning methods, DF, SVR, RF and GPR, were used to construct the yield estimation models. To evaluate the estimation performance of the different data sets systematically, a progressive feature combination was designed. First, models were built on the vegetation indices and on the characteristic bands separately. Then, the top five vegetation indices and the top five characteristic bands ranked by RFE were combined for joint modeling. These inputs were compared in three scenarios: flowering stage, filling stage, and the combination of the two stages.
Table 2 shows that, for single-stage modeling at the flowering stage, models using the fused input of characteristic bands and vegetation indices generally outperformed those using vegetation indices or characteristic bands alone. The DF model performed best, with a validation-set R² of 0.673, an RMSE of 983.669 kg·hm⁻², and an rRMSE of 19.46%. At the filling stage, the performance of most models improved further. With the fused input, the R² of the GPR model reached 0.742, the RMSE decreased to 701.410 kg·hm⁻², and the rRMSE decreased to 17.73%. This indicates that filling-stage spectra are more sensitive and stable predictors of yield.
In the joint modeling of flowering-stage and filling-stage features, most models achieved their best performance, indicating that fusing multiple growth stages and joint features helps to improve prediction accuracy. The DF model achieved the best results with the fused input (R² = 0.786, RMSE = 641.470 kg·hm⁻², rRMSE = 15.67%). Compared with the vegetation-index-only input, R² increased by about 8% (0.730 → 0.786), RMSE decreased by 11% (721.229 → 641.470 kg·hm⁻²), and rRMSE decreased by 11% (17.53% → 15.67%). Compared with the characteristic-band-only input, the reduction in RMSE was larger: R² increased by 3% (0.760 → 0.786), and RMSE decreased by 26% (872.010 → 641.470 kg·hm⁻²). This result highlights the value of using vegetation indices and characteristic bands together and indicates that the DF model can integrate spectral information effectively.
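The relative changes quoted for the DF model can be recomputed from the paired values given in the text. The short sketch below does this with a hypothetical helper, `relative_change`; the dictionary labels are illustrative, and the numbers are the ones quoted in the paragraph above.

```python
def relative_change(old, new):
    """Signed change from `old` to `new`, as a percentage of `old`."""
    return (new - old) / old * 100.0


# (single-input value, fused-input value) pairs for the DF model.
comparisons = {
    "R2, VI-only -> fused": (0.730, 0.786),
    "RMSE, VI-only -> fused": (721.229, 641.470),
    "rRMSE, VI-only -> fused": (17.53, 15.67),
    "R2, band-only -> fused": (0.760, 0.786),
    "RMSE, band-only -> fused": (872.010, 641.470),
}

for label, (old, new) in comparisons.items():
    print(f"{label}: {relative_change(old, new):+.2f}%")
```

Expressing every change relative to the single-input baseline keeps the R² gains and the RMSE reductions on the same footing.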
These results verify that multi-period feature information has complementary and cumulative effects on yield estimation. Performance also differed among the models. The DF model was the most stable and accurate in most cases, indicating that, by integrating multiple base learners, it can exploit the nonlinear relationship between spectral characteristics and yield more fully.
Scatter plots were used to compare the best predicted values obtained by each machine learning method with the corresponding measured values (Figure 7). The DF model performed best: with the fused flowering + filling features, its R² was 0.786, and its RMSE and rRMSE were the lowest of the four models, at 641.470 kg·hm⁻² and 15.67%, respectively. For the GPR model, R² was 0.769, and RMSE and rRMSE were 667.440 kg·hm⁻² and 16.31%, respectively. The R² values of the RF and SVR models were 0.725 and 0.738, respectively; their prediction accuracy was slightly lower but still better than that of the single-stage input models, indicating that introducing hyperspectral features from two growth stages improved the generalization ability of the models to some extent. Overall, combining the flowering and filling stages with vegetation indices and characteristic bands produced a clear synergistic gain, and the DF model gave the best yield estimation accuracy in this study.

4. Discussion

4.1. Winter Wheat Yield Under Different Irrigation

Field experiments from 2024 to 2025 demonstrated that irrigation significantly regulated winter wheat yield, which increased consistently with irrigation intensity. Both cultivars achieved their peak yields under the T5 treatment: in 2025 and 2024, Zhongmai 175 reached 7844.93 and 4068.47 kg·hm⁻², respectively, while Changmai 6878 reached 7227.3 and 3693.6 kg·hm⁻². Notably, in the 2024 season, Zhongmai 175 exhibited a greater yield potential with a 302.8% increase from T1 to T5, compared to a 164.4% increase for Changmai 6878.
Water availability is fundamental to yield formation, yet significant differences were observed between cultivars: Zhongmai 175 was more sensitive to water, showing distinct yield gradients, whereas the yield increment for Changmai 6878 plateaued after the T3 treatment, suggesting its water requirements were largely optimized at the T3 stage. Interannual comparisons revealed that overall yields in 2025 were significantly higher than those in 2024, likely due to more favorable climatic conditions. The larger error bars in 2024 reflected a higher degree of dispersion among replications at lower yield levels. Under conditions of severe water deficit, the yield reduction rate was predominantly determined by the total water supply, while timely irrigation significantly promoted the grain-filling process and ultimate yield accumulation [53,54].

4.2. The Strong Correlation Between Vegetation Index and Yield

Remote crop yield estimation methods are commonly based on the high correlation between crop yield and vegetation indices measured at a specific growth stage [21]. This study found that the correlation between vegetation indices and winter wheat yield was strongest at the filling stage; ARVI in particular showed a strong correlation with yield (r = 0.888). Moreover, the correlations between almost all vegetation indices and yield at the filling stage were highly significant (|r| > 0.70, p < 0.001), much higher than those at the jointing, booting and heading stages. This shows that as the crop approaches the end of reproductive growth, the correspondence between vegetation indices and yield gradually strengthens. At the flowering stage, although some indices (such as TCARI) were not strongly correlated with yield, most (such as MSR and PRI) still showed high correlations. As a transition window from vegetative to reproductive growth, the anthesis stage of winter wheat is characterized by peak canopy development and structural stability. The MSR vegetation index overcomes the saturation effect at high LAI levels through non-linear transformation, precisely characterizing the crop’s production capacity. Meanwhile, PRI captures light use efficiency during peak photosynthesis by monitoring the xanthophyll cycle [55]. The synergy between MSR’s biomass representation and PRI’s photosynthetic monitoring at this stage jointly reveals the yield formation potential during the source-sink transition.
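The index-yield correlation analysis discussed above can be sketched as follows. The ARVI and PRI expressions use the band positions listed in Table 1; the reflectance spectra, yields and function names are hypothetical and serve only to show the calculation.

```python
import math


def arvi(r):
    """ARVI from a {wavelength_nm: reflectance} mapping (bands 490, 682, 802)."""
    rb = r[682] - (r[490] - r[682])       # red band corrected with the blue band
    return (r[802] - rb) / (r[802] + rb)


def pri(r):
    """PRI from a {wavelength_nm: reflectance} mapping (bands 530, 570)."""
    return (r[570] - r[530]) / (r[570] + r[530])


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical canopy reflectance for four plots and their yields (kg·hm−2)
plots = [
    {490: 0.05, 530: 0.09, 570: 0.10, 682: 0.08, 802: 0.35},
    {490: 0.04, 530: 0.08, 570: 0.10, 682: 0.06, 802: 0.42},
    {490: 0.04, 530: 0.08, 570: 0.09, 682: 0.05, 802: 0.47},
    {490: 0.03, 530: 0.07, 570: 0.09, 682: 0.04, 802: 0.52},
]
yields = [3100.0, 4400.0, 5600.0, 7200.0]

for name, index in (("ARVI", arvi), ("PRI", pri)):
    values = [index(p) for p in plots]
    print(f"{name}: r = {pearson_r(values, yields):.3f}")
```

In practice the same loop would run over all 18 indices and each growth stage, which is how a stage-by-index correlation matrix such as the one summarized in Figure 4 could be assembled.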

4.3. Differences in Characteristic Bands Across Growth Periods and Their Relation to Yield

Based on the SPA, the characteristic bands of the winter wheat hyperspectral data were screened at multiple growth stages, and the results revealed stage-dependent differences in spectral response characteristics [56]. This shows that the spectrally sensitive bands of the crop are not fully consistent across growth stages, but are affected by physiological status, tissue structure and nutrient distribution. At the flowering stage, the characteristic bands are concentrated in the red-edge (726–756 nm), NIR, and SWIR regions. Red-edge bands accurately capture the production foundation at peak biomass and LAI. The 929–1345 nm range is associated with cell structure and tissue water content, while the 2103–2346 nm range reflects the vigorous potential for lignin, cellulose, and protein synthesis [57]. In contrast, the characteristic bands selected during the grain-filling stage exhibit a significant evolution in wavelength distribution and physiological significance. Beyond retaining common bands such as 929 and 2103 nm, the enhanced sensitivity at the green peak (565 nm) and the red-edge onset (691–735 nm) reflects the dynamic red-edge shift resulting from chlorophyll degradation. Most notably, the 2348–2349 nm region shows a higher concentration of bands, directly associated with starch accumulation and carbohydrate transport in the grains, revealing the physiological transition of winter wheat from source to sink. In addition, as a typical band screening algorithm, SPA has the advantage of effectively reducing the multicollinearity of spectral data and improving model efficiency. However, it is also sensitive to noise and has limited stability.
A related study constructed a high-precision monitoring model of wheat stripe rust by fusing multi-growth-period hyperspectral features (vegetation indices and texture features) and using the CA/SPA-CARS algorithm to select sensitive bands. It showed that the accuracy of multi-modal feature fusion was more than 10% higher than that of single-feature modeling, verifying the synergistic enhancement effect of spectral and texture features. Therefore, future research could combine SPA with other feature selection algorithms, such as CARS or principal component analysis (PCA), for band fusion to improve the robustness and versatility of the extracted features.
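To make the band-screening idea concrete, the following is a minimal sketch of the projection phase of the SPA: starting from one band, each remaining band is projected onto the subspace orthogonal to the band chosen last, and the band with the largest remaining norm is selected next, which is what keeps collinearity among the selected bands low. The starting band and the number of bands are fixed here by assumption; a full SPA would normally choose both by validation error, and that step is omitted. The function name and toy matrix are hypothetical.

```python
def spa_select(X, n_select, start=0):
    """Select `n_select` column indices from X (rows = samples, columns = bands)."""
    n_cols = len(X[0])
    # Work on column vectors so that each band can be projected independently
    cols = [[row[j] for row in X] for j in range(n_cols)]
    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]                  # most recently selected band
        ref_norm_sq = sum(v * v for v in ref)
        best_j, best_norm = None, -1.0
        for j in range(n_cols):
            if j in selected:
                continue
            # Remove the component of band j that lies along the reference band
            coef = sum(a * b for a, b in zip(cols[j], ref)) / ref_norm_sq
            cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
            norm = sum(v * v for v in cols[j]) ** 0.5
            if norm > best_norm:
                best_j, best_norm = j, norm
        selected.append(best_j)
    return selected


# Hypothetical reflectance matrix: 3 samples x 3 bands.
# Band 1 is an exact multiple of band 0, so it carries no new information.
X = [[1.0, 2.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 0.0]]
print(spa_select(X, n_select=2, start=0))  # prints [0, 2]
```

Because band 1 collapses to a zero vector after projection, the sketch skips it and picks band 2, which mirrors how SPA avoids redundant neighbouring wavelengths in real spectra.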

4.4. Feature Fusion and Deep Forest Model to Optimize Crop Yield Estimation

Based on the multi-growth-period hyperspectral data and their fused features, the RFE algorithm was used to optimize the feature subset of vegetation indices and spectral bands. The feature selection results show that the predictive ability of a single input variable (such as a single vegetation index or spectral band) is limited. Models built on a single feature usually face the risk of overfitting and are sensitive to environmental changes [58]. Combining multiple sources of spectral information (such as vegetation indices and multiple spectral bands) significantly improves the predictive ability of the model, especially for complex crop growth stages. In terms of model evaluation, the DF model was superior to SVR, RF and GPR in multiple scenarios, especially with the fused input and the multi-period combination (R2 = 0.786, RMSE = 641.470 kg·hm−2, rRMSE = 15.67%). DF leverages a cascade structure to achieve feature re-representation, uncovering high-order non-linear relationships through layer-by-layer enhancement of predictive information. Compared to traditional deep learning, its ensemble learning mechanism effectively suppresses overfitting risks in small-sample scenarios. Coupled with an adaptive termination mechanism that balances model complexity, it demonstrates superior prediction accuracy and robustness over shallow models within low-dimensional feature spaces. The distinct advantage of the DF model lies in its capacity to handle the inherent non-linear relationships within high-dimensional spectral data [59]. In addition, among the single-growth-period models, the yield estimation model at the filling stage was superior to those of the other single periods, including the flowering stage, which is consistent with the results of Han et al. [60]. Although prediction accuracy varies across agricultural zones and algorithms, the filling stage consistently outperforms other single periods. However, compared with the single-period estimation models, multi-period feature fusion significantly improved model performance.
While this study achieved high-accuracy yield prediction using two-year hyperspectral data and the DF model, certain limitations remain. The implementation of concrete isolation walls (water tanks) according to FAO standards effectively eliminated lateral seepage interference and ensured data independence; however, it also constrained the sample size, theoretically increasing the risk of overfitting during feature selection and modeling. Future research will incorporate multi-site observation data to further evaluate the spatial generalizability of the model.
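The recursive feature elimination step mentioned in this section can be illustrated with a small, dependency-free sketch. It assumes a plain least-squares model on standardized features as the ranking estimator, which is a stand-in: the estimator actually used with RFE in the study is not stated in this section. Feature names and data are hypothetical.

```python
def fit_linear(X, y):
    """Least-squares coefficients (no intercept) from the normal equations."""
    k = len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting (assumes X'X is non-singular)
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def standardize(X):
    """Z-score every column so that coefficient magnitudes are comparable."""
    n = len(X)
    cols = []
    for col in zip(*X):
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        cols.append([(v - mean) / sd for v in col])
    return [list(row) for row in zip(*cols)]


def rfe(X, y, names, n_keep):
    """Refit and drop the weakest feature until `n_keep` features remain."""
    Z = standardize(X)
    y_mean = sum(y) / len(y)
    yc = [t - y_mean for t in y]
    active, eliminated = list(range(len(names))), []
    while len(active) > n_keep:
        coefs = fit_linear([[row[j] for j in active] for row in Z], yc)
        weakest = min(range(len(active)), key=lambda i: abs(coefs[i]))
        eliminated.append(names[active.pop(weakest)])
    return [names[j] for j in active], eliminated


# Hypothetical design: yield depends strongly on the first feature,
# weakly on the second and not at all on the third.
names = ["ARVI_filling", "R929_flowering", "R565_filling"]
X = [[a, b, c] for a in (-1.0, 1.0) for b in (-1.0, 1.0) for c in (-1.0, 1.0)]
y = [4000.0 + 500.0 * a + 100.0 * b for a, b, c in X]
kept, dropped = rfe(X, y, names, n_keep=1)
print(kept, dropped)
```

The order in which features are dropped gives an importance ranking, with the last surviving feature ranked highest. With the small sample sizes noted above, such a ranking would normally be checked with cross-validation before being trusted.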

5. Conclusions

Based on field-measured data, this study clarified the effect of irrigation at key growth stages on the yield formation of winter wheat by applying different irrigation treatments. On this basis, the correlations between vegetation indices and yield at the flowering and filling stages were systematically analyzed, and the SPA was used to preliminarily screen the characteristic bands. The RFE method was then used to rank the importance of the vegetation indices and characteristic bands and to construct the feature set for modeling. To establish a winter wheat yield estimation model, the vegetation indices, characteristic bands and fused features for the flowering stage, the filling stage and their combination were used as input variables, and four machine learning algorithms (SVR, GPR, RF and DF) were used for model construction and validation. The conclusions are as follows:
(1) Winter wheat yield was most effectively promoted by the T5 treatment, with Zhongmai 175 exhibiting higher water sensitivity and a maximum yield increase of 302.8% compared to T1. While both cultivars reached peak yields under T5, yields in 2025 were consistently higher than those in 2024, likely due to more favorable climatic conditions. These findings confirm that timely irrigation at key growth stages is essential for optimizing the grain-filling process and final yield accumulation.
(2) Based on correlation analysis and model validation results, vegetation indices during the flowering and grain-filling stages were found to exhibit strong predictive capabilities for winter wheat yield estimation. The grain-filling stage, in particular, emerged as the optimal single growth stage. When the input features for the DF model during grain filling were changed from a single vegetation index to a fusion of vegetation indices and characteristic spectral bands, the R2 value increased from 0.729 to 0.735, and the RMSE decreased from 808.270 kg·hm−2 to 710.201 kg·hm−2. This result indicates that even within a single critical growth stage, integrating multiple spectral features can effectively optimize model prediction accuracy and stability. Furthermore, integrating features from both the flowering and grain-filling stages further enhances model performance. With both vegetation indices and feature bands as inputs, the DF model’s R2 increased from 0.735 during grain filling to 0.786, while RMSE decreased from 710.201 kg·hm−2 to 641.470 kg·hm−2. This demonstrates that integrating multi-period and multi-feature information effectively improves yield estimation accuracy, further validating the advantages of multidimensional feature fusion in crop yield prediction.
(3) In this study, the optimal performance for winter wheat yield estimation was achieved by the DF model utilizing feature fusion from flowering and filling stages, with an R2 of 0.786, RMSE of 641.470 kg·hm−2, and rRMSE of 15.67%.
In comparison with support vector regression (SVR, R2 = 0.738), Gaussian process regression (GPR, R2 = 0.769), and random forest (RF, R2 = 0.725) under identical feature combinations, R2 improvements of approximately 6%, 2%, and 8%, respectively, were observed with the DF model. Furthermore, strong robustness was demonstrated by the DF model across various feature sets, indicating its superior capability in processing multi-period and multi-feature fusion data.
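The multi-period, multi-feature input summarized in conclusions (2) and (3) amounts to concatenating stage-specific vegetation indices and characteristic bands into one feature vector per plot. A minimal sketch of that bookkeeping is given below; the function names, the naming scheme `stage_feature` and all values are assumptions made for illustration.

```python
def fuse_features(stage_features):
    """Flatten {stage: {feature: value}} into one {stage_feature: value} mapping."""
    fused = {}
    for stage, feats in stage_features.items():
        for name, value in feats.items():
            fused[f"{stage}_{name}"] = value
    return fused


def build_matrix(plots):
    """Return (column names, rows) with a consistent column order across plots."""
    fused = [fuse_features(p) for p in plots]
    names = sorted(fused[0])
    return names, [[f[n] for n in names] for f in fused]


# Hypothetical features for two plots: one index and one band per stage
plots = [
    {"flowering": {"MSR": 0.52, "R929": 0.41}, "filling": {"ARVI": 0.61, "R2103": 0.12}},
    {"flowering": {"MSR": 0.47, "R929": 0.38}, "filling": {"ARVI": 0.55, "R2103": 0.14}},
]
names, rows = build_matrix(plots)
print(names)
print(rows)
```

Prefixing each feature with its growth stage keeps the same index or band measured at two stages as two separate model inputs, which is what allows a model to use their complementary information.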

Author Contributions

Conceptualization, X.W., L.X. and X.G.; methodology, X.W., Y.W. and L.X.; software, H.W. and C.K.; validation, X.W., M.F., X.G. and L.X.; formal analysis, C.K., J.S. and Y.W.; investigation, H.W. and Y.W.; resources, X.W. and Y.W.; data curation, J.S. and C.K.; writing—original draft preparation, X.W.; writing—review and editing, L.X., X.G., Y.Z. and M.F.; visualization, X.W., Y.W., J.S. and H.W.; supervision, M.F., X.G. and L.X.; project administration, L.X.; funding acquisition, X.W., L.X., Y.Z. and M.F.; All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the project of Shanxi Province Key Lab Construction (Z135050009017-3-4); Shanxi Agricultural University Science and Technology Innovation Enhancement Project (CXGC202444); Shanxi Province Graduate Student Practice Innovation Project (2024SJ140).

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to institutional data management policies and research cooperation agreements that designate [Shanxi Agricultural University] as the owner and custodian of the data. To obtain access, the corresponding author will submit reasonable requests to the university’s scientific research management department for review and approval, following which data sharing will be facilitated in compliance with internal protocols.

Acknowledgments

We sincerely thank all the members of the team for their enthusiastic help and for making laboratory facilities available. During the preparation of this study, the authors used a generative AI tool to improve the language. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MTCI: MERIS Terrestrial Chlorophyll Index
PSRI: Plant Senescence Reflectance Index
PRI: Photochemical Reflectance Index
TCARI: Transformed Chlorophyll Absorption Reflectance Index
CARI: Chlorophyll Absorption Ratio Index
NDVI: Normalized Difference Vegetation Index
GNDVI: Green Normalized Difference Vegetation Index
MSAVI: Modified Soil-Adjusted Vegetation Index
VARI: Visible Atmospherically Resistant Index
RVI: Ratio Vegetation Index
GRVI: Green-Red Vegetation Index
CRI: Carotenoid Reflectance Index
ARVI: Atmospherically Resistant Vegetation Index
SR: Simple Ratio
TVI: Triangular Vegetation Index
RDVI: Renormalized Difference Vegetation Index
EVI: Enhanced Vegetation Index
MSR: Modified Simple Ratio
VIs: Vegetation Indices
SPA: Successive Projections Algorithm
RFE: Recursive Feature Elimination
SPXY: Sample Set Partitioning Based on Joint X-Y Distances
SVR: Support Vector Regression
GPR: Gaussian Process Regression
RF: Random Forest
DF: Deep Forest
UAV: Unmanned Aerial Vehicle
ANN: Artificial Neural Networks

References

  1. Fei, S.; Hassan, M.A.; He, Z.; Chen, Z.; Shu, M.; Wang, J.; Li, C.; Xiao, Y. Assessment of ensemble learning to predict wheat grain yield based on UAV-multispectral reflectance. Remote Sens. 2021, 13, 2338.
  2. Van Dijk, M.; Morley, T.; Rau, M.L.; Saghai, Y. A meta-analysis of projected global food demand and population at risk of hunger for the period 2010–2050. Nat. Food 2021, 2, 494–501.
  3. Mueller, N.D.; Gerber, J.S.; Johnston, M.; Ray, D.K.; Ramankutty, N.; Foley, J.A. Closing yield gaps through nutrient and water management. Nature 2012, 490, 254–257.
  4. Wang, L.; Tian, Y.; Yao, X.; Zhu, Y.; Cao, W. Predicting grain yield and protein content in wheat by fusing multi-sensor and multi-temporal remote-sensing images. Field Crops Res. 2014, 164, 178–188.
  5. Yue, J.; Feng, H.; Li, Z.; Zhou, C.; Xu, K. Mapping winter-wheat biomass and grain yield based on a crop model and UAV remote sensing. Int. J. Remote Sens. 2021, 42, 1577–1601.
  6. Weiss, M.; Jacob, F.; Duveiller, G. Remote sensing for agricultural applications: A meta-review. Remote Sens. Environ. 2020, 236, 111402.
  7. Sishodia, R.P.; Ray, R.L.; Singh, S.K. Applications of Remote Sensing in Precision Agriculture: A Review. Remote Sens. 2020, 12, 3136.
  8. Tao, H.; Feng, H.; Xu, L.; Miao, M.; Yang, G.; Yang, X.; Fan, L. Estimation of the Yield and Plant Height of Winter Wheat Using UAV-Based Hyperspectral Images. Sensors 2020, 20, 1231.
  9. Li, Z.; Chen, Z.; Cheng, Q.; Duan, F.; Sui, R.; Huang, X.; Xu, H. UAV-Based Hyperspectral and Ensemble Machine Learning for Predicting Yield in Winter Wheat. Agronomy 2022, 12, 202.
  10. Thenkabail, P.S.; Smith, R.B.; De Pauw, E. Hyperspectral Vegetation Indices and Their Relationships with Agricultural Crop Characteristics. Remote Sens. Environ. 2000, 71, 158–182.
  11. Mutanga, O.; Skidmore, A.K. Narrow band vegetation indices overcome the saturation problem in biomass estimation. Int. J. Remote Sens. 2004, 25, 3999–4014.
  12. Wu, R.; Fan, Y.; Zhang, L.; Yuan, D.; Gao, G. Wheat Yield Estimation Study Using Hyperspectral Vegetation Indices. Appl. Sci. 2024, 14, 4245.
  13. Ranđelović, P.; Đorđević, V.; Miladinović, J.; Bukonja, S.; Ćeran, M.; Đukić, V.; Vasiljević, M. Soybean Yield Prediction with High-Throughput Phenotyping Data and Machine Learning. Agriculture 2026, 16, 22.
  14. Feng, H.; Tao, H.; Fan, Y.; Liu, Y.; Li, Z.; Yang, G.; Zhao, C. Comparison of Winter Wheat Yield Estimation Based on Near-Surface Hyperspectral and UAV Hyperspectral Remote Sensing Data. Remote Sens. 2022, 14, 4158.
  15. Yue, J.; Feng, H.; Yang, G.; Li, Z. A Comparison of Regression Techniques for Estimation of Above-Ground Winter Wheat Biomass Using Near-Surface Spectroscopy. Remote Sens. 2018, 10, 66.
  16. Luo, S.; He, Y.; Li, Q.; Jiao, W.; Zhu, Y.; Zhao, X. Nondestructive estimation of potato yield using relative variables derived from multi-period LAI and hyperspectral data based on weighted growth stage. Plant Methods 2020, 16, 150.
  17. Zhang, Y.; Qin, Q.; Ren, H.; Sun, Y.; Li, M.; Zhang, T.; Ren, S. Optimal Hyperspectral Characteristics Determination for Winter Wheat Yield Prediction. Remote Sens. 2018, 10, 2015.
  18. Fan, J.; Zhou, J.; Wang, B.; De Leon, N.; Kaeppler, S.M.; Lima, D.C.; Zhang, Z. Estimation of Maize Yield and Flowering Time Using Multi-Temporal UAV-Based Hyperspectral Data. Remote Sens. 2022, 14, 3052.
  19. Tan, S.; Pei, J.; Zou, Y.; Fang, H.; Wang, T.; Huang, J. Improving rice yield prediction with multi-modal UAV data: Hyperspectral, thermal, and LiDAR integration. Geo Spat. Inf. Sci. 2025, 1–20.
  20. Bian, C.; Shi, H.; Wu, S.; Zhang, K.; Wei, M.; Zhao, Y.; Sun, Y.; Zhuang, H.; Zhang, X.; Chen, S. Prediction of Field-Scale Wheat Yield Using Machine Learning Method and Multi-Spectral UAV Data. Remote Sens. 2022, 14, 1474.
  21. Liu, Y.; Sun, L.; Liu, B.; Wu, Y.; Ma, J.; Zhang, W.; Wang, B.; Chen, Z. Estimation of Winter Wheat Yield Using Multiple Temporal Vegetation Indices Derived from UAV-Based Multispectral and Hyperspectral Imagery. Remote Sens. 2023, 15, 4800.
  22. Fu, Z.; Jiang, J.; Gao, Y.; Krienke, B.; Wang, M.; Zhong, K.; Cao, Q.; Tian, Y.; Zhu, Y.; Cao, W.; et al. Wheat Growth Monitoring and Yield Estimation based on Multi-Rotor Unmanned Aerial Vehicle. Remote Sens. 2020, 12, 508.
  23. Zhou, Z.-H.; Feng, J. Deep forest. Natl. Sci. Rev. 2019, 6, 74–86.
  24. Shen, Y.; Mercatoris, B.; Cao, Z.; Kwan, P.; Guo, L.; Yao, H.; Cheng, Q. Improving Wheat Yield Prediction Accuracy Using LSTM-RF Framework Based on UAV Thermal Infrared and Multispectral Imagery. Agriculture 2022, 12, 892.
  25. Allen, R.G.; Pereira, L.S.; Raes, D.; Smith, M. Crop Evapotranspiration - Guidelines for Computing Crop Water Requirements - FAO Irrigation and Drainage Paper 56; FAO: Rome, Italy, 1998; Volume 56.
  26. Li, C.; Li, X.; Meng, X.; Xiao, Z.; Wu, X.; Wang, X.; Ren, L.; Li, Y.; Zhao, C.; Yang, C. Hyperspectral Estimation of Nitrogen Content in Wheat Based on Fractional Difference and Continuous Wavelet Transform. Agriculture 2023, 13, 1017.
  27. Galvão, R.K.H.; Araujo, M.C.U.; José, G.E.; Pontes, M.J.C.; Silva, E.C.; Saldanha, T.C.B. A method for calibration and validation subset partitioning. Talanta 2005, 67, 736–740.
  28. Han, Y.; Zhang, J.; Bai, Y.; Liang, Z.; Guo, X.; Zhao, Y.; Feng, M.; Xiao, L.; Song, X.; Zhang, M.; et al. Ensemble Learning-Driven and UAV Multispectral Analysis for Estimating the Leaf Nitrogen Content in Winter Wheat. Agronomy 2025, 15, 1621.
  29. Dash, J.; Curran, P.J. The MERIS terrestrial chlorophyll index. Int. J. Remote Sens. 2004, 25, 5403–5413.
  30. Prasad, N.; Semwal, M.; Kalra, A. Hyperspectral vegetation indices offer insights for determining economically optimal time of harvest in Mentha arvensis. Ind. Crops Prod. 2022, 180, 114753.
  31. Gamon, J.A.; Penuelas, J.; Field, C.B. A narrow-waveband spectral index that tracks diurnal changes in photosynthetic efficiency. Remote Sens. Environ. 1992, 41, 35–44.
  32. Cogato, A.; Meggio, F.; Collins, C.; Marinello, F. Medium-Resolution Multispectral Data from Sentinel-2 to Assess the Damage and the Recovery Time of Late Frost on Vineyards. Remote Sens. 2020, 12, 1896.
  33. Gitelson, A.A.; Merzlyak, M.N. Remote estimation of chlorophyll content in higher plant leaves. Int. J. Remote Sens. 1997, 18, 2691–2697.
  34. Qi, J.; Huete, A.R.; Cabot, F.; Chehbouni, A. Bidirectional properties and utilizations of high-resolution spectra from a semiarid watershed. Water Resour. Res. 1994, 30, 1271–1279.
  35. Han, L.; Yang, G.; Dai, H.; Xu, B.; Yang, H.; Feng, H.; Li, Z.; Yang, X. Modeling maize above-ground biomass based on machine learning approaches using UAV remote-sensing data. Plant Methods 2019, 15, 10.
  36. Houborg, R.; Mccabe, M.F. Daily Retrieval of NDVI and LAI at 3 m Resolution via the Fusion of CubeSat, Landsat, and MODIS Data. Remote Sens. 2018, 10, 890.
  37. Verger, A.; Filella, I.; Baret, F.; Peñuelas, J. Vegetation baseline phenology from kilometric global LAI satellite products. Remote Sens. Environ. 2016, 178, 1–14.
  38. Kaufman, Y.J.; Tanre, D. Atmospherically resistant vegetation index (ARVI) for EOS-MODIS. IEEE Trans. Geosci. Remote Sens. 1992, 30, 261–270.
  39. Fan, Y.; Feng, H.; Yue, J.; Liu, Y.; Jin, X.; Xu, X.; Song, X.; Ma, Y.; Yang, G. Comparison of different dimensional spectral indices for estimating nitrogen content of potato plants over multiple growth periods. Remote Sens. 2023, 15, 602.
  40. Gao, C.; Li, H.; Wang, J.; Zhang, X.; Huang, K.; Song, X.; Yang, W.; Feng, M.; Xiao, L.; Zhao, Y.; et al. Combined use of spectral resampling and machine learning algorithms to estimate soybean leaf chlorophyll. Comput. Electron. Agric. 2024, 218, 108675.
  41. Liang, L.; Huang, T.; Di, L.; Geng, D.; Yan, J.; Wang, S.; Wang, L.; Li, L.; Chen, B.; Kang, J. Influence of Different Bandwidths on LAI Estimation Using Vegetation Indices. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 1494–1502.
  42. Son, N.T.; Chen, C.F.; Chen, C.R.; Minh, V.Q.; Trung, N.H. A comparative analysis of multitemporal MODIS EVI and NDVI data for large-scale rice yield estimation. Agric. For. Meteorol. 2014, 197, 52–64.
  43. Li, Z.; Li, Z.; Fairbairn, D.; Li, N.; Xu, B.; Feng, H.; Yang, G. Multi-LUTs method for canopy nitrogen density estimation in winter wheat by field and UAV hyperspectral. Comput. Electron. Agric. 2019, 162, 174–182.
  44. Araújo, M.C.U.; Saldanha, T.C.B.; Galvão, R.K.H.; Yoneyama, T.; Chame, H.C.; Visani, V. The successive projections algorithm for variable selection in spectroscopic multicomponent analysis. Chemom. Intell. Lab. Syst. 2001, 57, 65–73.
  45. Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene Selection for Cancer Classification using Support Vector Machines. Mach. Learn. 2002, 46, 389–422.
  46. Drucker, H.; Burges, C.J.; Kaufman, L.; Smola, A.; Vapnik, V. Support vector regression machines. Adv. Neural Inf. Process. Syst. 1996, 9, 155–161.
  47. Williams, C.K.; Rasmussen, C.E. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; Volume 2.
  48. Verrelst, J.; Muñoz, J.; Alonso, L.; Delegido, J.; Rivera, J.P.; Camps-Valls, G.; Moreno, J. Machine learning regression algorithms for biophysical parameter retrieval: Opportunities for Sentinel-2 and -3. Remote Sens. Environ. 2012, 118, 127–139.
  49. Zhang, Y.; Huang, C.; Li, H.; Li, S.; Lu, J. Spectral Index Optimization and Machine Learning for Hyperspectral Inversion of Maize Nitrogen Content. Agronomy 2025, 15, 2485.
  50. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
  51. Wu, Z.; Li, D.; Zou, L.; Zhao, H. Multi-granularity awareness via cross fusion for few-shot learning. Inf. Sci. 2025, 714, 122209.
  52. Li, C.; Wang, Y.; Ma, C.; Chen, W.; Li, Y.; Li, J.; Ding, F.; Xiao, Z. Improvement of Wheat Grain Yield Prediction Model Performance Based on Stacking Technique. Appl. Sci. 2021, 11, 12164.
  53. Xi, Y.; Wang, D.; Weiner, J.; Du, Y.-L.; Li, F.-M. Time to Onset of Flowering, Water Use, and Yield in Wheat. Agronomy 2023, 13, 1217.
  54. Basheir, S.M.O.; Hong, Y.; Lv, C.; Xu, H.; Zhu, J.; Guo, B.; Wang, F.; Xu, R. Identification of Wheat Germplasm Resistance to Late Sowing. Agronomy 2023, 13, 1010.
  55. Sun, Q.; Jiao, Q.; Qian, X.; Liu, L.; Liu, X.; Dai, H. Improving the Retrieval of Crop Canopy Chlorophyll Content Using Vegetation Index Combinations. Remote Sens. 2021, 13, 470.
  56. Sun, X.; Zhang, B.; Dai, M.; Jing, C.; Ma, K.; Tang, B.; Li, K.; Dang, H.; Gu, L.; Zhen, W.; et al. Accurate irrigation decision-making of winter wheat at the filling stage based on UAV hyperspectral inversion of leaf water content. Agric. Water Manag. 2024, 306, 109171.
  57. Fu, Y.; Yang, G.; Song, X.; Li, Z.; Xu, X.; Feng, H.; Zhao, C. Improved Estimation of Winter Wheat Aboveground Biomass Using Multiscale Textures Extracted from UAV-Based Digital Images and Hyperspectral Feature Analysis. Remote Sens. 2021, 13, 581.
  58. Mu, C.; Liu, Y.; Liu, Y. Hyperspectral Image Spectral–Spatial Classification Method Based on Deep Adaptive Feature Fusion. Remote Sens. 2021, 13, 746.
  59. Datta, D.; Mallick, P.K.; Reddy, A.V.N.; Mohammed, M.A.; Jaber, M.M.; Alghawli, A.S.; Al-Qaness, M.A.A. A Hybrid Classification of Imbalanced Hyperspectral Images Using ADASYN and Enhanced Deep Subsampled Multi-Grained Cascaded Forest. Remote Sens. 2022, 14, 4853.
  60. Han, J.; Zhang, Z.; Cao, J.; Luo, Y.; Zhang, L.; Li, Z.; Zhang, J. Prediction of Winter Wheat Yield Based on Multi-Source Data and Machine Learning in China. Remote Sens. 2020, 12, 236.
Figure 1. Location Map of the Study Area.
Figure 2. Changes in wheat yield across different irrigation treatments. Note: T1 (no irrigation), T2 (jointing stage), T3 (jointing stage + flowering stage), T4 (jointing stage + filling stage), T5 (jointing stage + flowering stage + filling stage), each irrigation amount is 60 mm.
Figure 3. Correlation between Vegetation Indices and Yield during Key Growth Stages of Winter Wheat. Note: (a) Jointing Stage (b) Flag Leaf Stage (c) Heading Stage (d) Flowering Stage (e) Filling Stage. The shape of the ellipses indicates the strength of the correlation: narrower ellipses represent stronger correlations, while rounder ellipses represent weaker correlations.
Figure 4. Absolute values of correlation coefficients between vegetation indices and yield across different growth periods. Note: (S1) Jointing Stage, (S2) Flag Leaf Stage, (S3) Heading Stage, (S4) Flowering Stage, (S5) Grain-filling Stage.
Figure 5. Characteristic spectral regions during flowering and grain-filling stages. Note: Red dashed lines indicate the selected bands.
Figure 6. Heatmap of the Evolution of Feature Importance at Different Growth Stages.
Figure 7. Scatter plots of model-predicted versus measured yield (kg·hm−2): (a) SVR; (b) GPR; (c) RF; (d) DF. Note: Blue and red colors represent the training and validation sets, respectively.
Table 1. Vegetation indices used in this study.
| Abbreviation | Vegetation Index | Formula | Reference |
|---|---|---|---|
| MTCI | MERIS Terrestrial Chlorophyll Index | $(R_{754} - R_{710}) / (R_{710} - R_{682})$ | [29] |
| PSRI | Plant Senescence Reflectance Index | $(R_{682} - R_{502}) / R_{750}$ | [30] |
| PRI | Photochemical Reflectance Index | $(R_{570} - R_{530}) / (R_{570} + R_{530})$ | [31] |
| TCARI | Transformed Chlorophyll Absorption Reflectance Index | $3[(R_{702} - R_{670}) - 0.2(R_{702} - R_{550})(R_{702} / R_{670})]$ | [32] |
| CARI | Chlorophyll Absorption Ratio Index | $(R_{702} - R_{670}) - 0.2(R_{702} + R_{670})$ | [32] |
| NDVI | Normalized Difference Vegetation Index | $(R_{802} - R_{682}) / (R_{802} + R_{682})$ | [33] |
| GNDVI | Green Normalized Difference Vegetation Index | $(R_{802} - R_{562}) / (R_{802} + R_{562})$ | [33] |
| MSAVI | Modified Soil-Adjusted Vegetation Index | $0.5\left[2(R_{802} + 1) - \sqrt{(2R_{802} + 1)^2 - 8(R_{802} - R_{682})}\right]$ | [34] |
| VARI | Visible Atmospherically Resistant Index | $(R_{562} - R_{682}) / (R_{562} + R_{682} - R_{490})$ | [35] |
| RVI | Ratio Vegetation Index | $R_{802} / R_{682}$ | [35] |
| GRVI | Green-Red Vegetation Index | $R_{802} / R_{562}$ | [36] |
| CRI | Carotenoid Reflectance Index | $1 / R_{510} - 1 / R_{550}$ | [37] |
| ARVI | Atmospherically Resistant Vegetation Index | $\dfrac{R_{802} - [R_{682} - (R_{490} - R_{682})]}{R_{802} + [R_{682} - (R_{490} - R_{682})]}$ | [38] |
| SR | Simple Ratio | $R_{750} / R_{550}$ | [39] |
| TVI | Triangular Vegetation Index | $0.5[120(R_{752} - R_{550}) - 200(R_{670} - R_{550})]$ | [40] |
| RDVI | Renormalized Difference Vegetation Index | $(R_{802} - R_{682}) / \sqrt{R_{802} + R_{682}}$ | [41] |
| EVI | Enhanced Vegetation Index | $2.5 \times (R_{824} - R_{651}) / (1 + R_{824} + 2.4 \times R_{651})$ | [42] |
| MSR | Modified Simple Ratio | $(R_{750} / R_{705} - 1) / (R_{750} / R_{705} + 1)$ | [43] |
Table 2. Performance statistics of different yield estimation models.
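| Period | Feature Type | Machine Learning Model | R² | RMSE (kg·hm−2) | rRMSE (%) |
|---|---|---|---|---|---|
| Flowering period | Vegetation index | SVR | 0.302 | 1606.100 | 32.20 |
| Flowering period | Vegetation index | GPR | 0.314 | 1592.700 | 31.93 |
| Flowering period | Vegetation index | RF | 0.385 | 1507.405 | 30.22 |
| Flowering period | Vegetation index | DF | 0.414 | 1472.156 | 29.52 |
| Flowering period | Feature band | SVR | 0.403 | 1202.851 | 26.71 |
| Flowering period | Feature band | GPR | 0.599 | 831.530 | 19.40 |
| Flowering period | Feature band | RF | 0.329 | 1274.876 | 28.81 |
| Flowering period | Feature band | DF | 0.575 | 856.300 | 19.38 |
| Flowering period | Vegetation index + Characteristic band | SVR | 0.517 | 1194.800 | 26.55 |
| Flowering period | Vegetation index + Characteristic band | GPR | 0.652 | 1014.900 | 20.90 |
| Flowering period | Vegetation index + Characteristic band | RF | 0.582 | 1111.920 | 23.17 |
| Flowering period | Vegetation index + Characteristic band | DF | 0.673 | 983.669 | 19.46 |
| Filling period | Vegetation index | SVR | 0.704 | 845.420 | 18.60 |
| Filling period | Vegetation index | GPR | 0.693 | 860.419 | 18.89 |
| Filling period | Vegetation index | RF | 0.684 | 873.422 | 19.15 |
| Filling period | Vegetation index | DF | 0.729 | 808.270 | 18.07 |
| Filling period | Feature band | SVR | 0.736 | 898.900 | 17.93 |
| Filling period | Feature band | GPR | 0.708 | 1004.927 | 19.80 |
| Filling period | Feature band | RF | 0.685 | 1043.639 | 20.56 |
| Filling period | Feature band | DF | 0.728 | 967.520 | 19.06 |
| Filling period | Vegetation index + Characteristic band | SVR | 0.696 | 760.140 | 19.21 |
| Filling period | Vegetation index + Characteristic band | GPR | 0.742 | 701.410 | 17.73 |
| Filling period | Vegetation index + Characteristic band | RF | 0.709 | 743.604 | 18.79 |
| Filling period | Vegetation index + Characteristic band | DF | 0.735 | 710.201 | 17.95 |
| Flowering period + Filling period | Vegetation index | SVR | 0.736 | 716.780 | 17.03 |
| Flowering period + Filling period | Vegetation index | GPR | 0.706 | 752.570 | 18.30 |
| Flowering period + Filling period | Vegetation index | RF | 0.670 | 797.353 | 19.38 |
| Flowering period + Filling period | Vegetation index | DF | 0.730 | 721.229 | 17.53 |
| Flowering period + Filling period | Feature band | SVR | 0.652 | 1048.853 | 22.98 |
| Flowering period + Filling period | Feature band | GPR | 0.668 | 1024.700 | 22.45 |
| Flowering period + Filling period | Feature band | RF | 0.646 | 1058.113 | 23.19 |
| Flowering period + Filling period | Feature band | DF | 0.760 | 872.010 | 19.11 |
| Flowering period + Filling period | Vegetation index + Characteristic band | SVR | 0.738 | 710.084 | 17.35 |
| Flowering period + Filling period | Vegetation index + Characteristic band | GPR | 0.769 | 667.440 | 16.31 |
| Flowering period + Filling period | Vegetation index + Characteristic band | RF | 0.725 | 727.100 | 17.76 |
| Flowering period + Filling period | Vegetation index + Characteristic band | DF | 0.786 | 641.470 | 15.67 |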
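
For readers who want to reproduce the three statistics reported in Table 2, the sketch below shows one common way to compute them from measured and predicted yields. It assumes the usual definitions: R² as one minus the ratio of residual to total sum of squares, RMSE as the root of the mean squared residual, and rRMSE as RMSE divided by the mean measured yield, expressed in percent. The exact definitions used in the study are not given in this section, so these are assumptions, and the function name `regression_metrics` and the sample values are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, rRMSE in percent) for measured vs. predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred

    ss_res = np.sum(residual ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    rmse = np.sqrt(np.mean(residual ** 2))
    # Assumption: rRMSE is normalised by the mean of the measured values.
    rrmse = 100.0 * rmse / y_true.mean()
    return r2, rmse, rrmse


if __name__ == "__main__":
    # Illustrative yields in kg·hm−2; not data from the study.
    measured = [3200.0, 4100.0, 4800.0, 5300.0]
    predicted = [3500.0, 3900.0, 4600.0, 5600.0]
    r2, rmse, rrmse = regression_metrics(measured, predicted)
    print(f"R2={r2:.3f}  RMSE={rmse:.1f} kg/hm2  rRMSE={rrmse:.2f}%")
```

Because rRMSE scales RMSE by the mean yield, it allows errors to be compared across data sets with different yield levels, which RMSE alone does not.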

Wang, X.; Wang, Y.; Wu, H.; Kang, C.; Sun, J.; Gao, X.; Feng, M.; Zhao, Y.; Xiao, L. Hyperspectral Yield Estimation of Winter Wheat Based on Information Fusion of Critical Growth Stages. Agronomy 2026, 16, 186. https://doi.org/10.3390/agronomy16020186