Estimation of the Bio-Parameters of Winter Wheat by Combining Feature Selection with Machine Learning Using Multi-Temporal Unmanned Aerial Vehicle Multispectral Images

Abstract: Accurate and timely monitoring of biochemical and biophysical traits associated with crop growth is essential for indicating crop growth status and predicting yield for precise field management. This study evaluated the application of three combinations of feature selection and machine learning regression techniques based on unmanned aerial vehicle (UAV) multispectral images for estimating the bio-parameters, including leaf area index (LAI), leaf chlorophyll content (LCC), and canopy chlorophyll content (CCC), at key growth stages of winter wheat. The performance of Support Vector Regression (SVR) in combination with Sequential Forward Selection (SFS) for bio-parameter estimation was compared with that of Least Absolute Shrinkage and Selection Operator (LASSO) regression and Random Forest (RF) regression with internal feature selectors. A consumer-grade multispectral UAV was used to conduct four flight campaigns over a split-plot experimental field with various nitrogen fertilizer treatments during a growing season of winter wheat. Eighteen spectral variables were used as the input candidates for analyses against the three bio-parameters at four growth stages. Compared to the LASSO and RF internal feature selectors, the SFS algorithm selected the fewest input variables for each crop bio-parameter model, which can reduce data redundancy while improving model efficiency. The results of the SFS-SVR method show better accuracy and robustness in predicting winter wheat bio-parameter traits during the four growth stages. The regression models developed based on SFS-SVR for LAI, LCC, and CCC had the best predictive accuracy in terms of the coefficient of determination (R²), root mean square error (RMSE) and relative predictive deviation (RPD), with values of 0.967, 0.225 and 4.905 at the early filling stage, 0.912, 2.711 µg/cm² and 2.872 at the heading stage, and 0.968, 0.147 g/m² and 5.279 at the booting stage, respectively. Furthermore, the spatial distributions in the retrieved winter wheat bio-parameter maps accurately depicted the application of the fertilization treatments across the experimental field, and further statistical analysis revealed the variations in the bio-parameters and yield under different nitrogen fertilization treatments. This study provides a reference for monitoring and estimating winter wheat bio-parameters based on UAV multispectral imagery during specific crop phenology periods.


Introduction
Winter wheat is a crucial grain crop that plays a pivotal role in global food security and agricultural sustainability. In recent years, the significance of winter wheat research has been underscored by the growing challenges posed by climate change, population growth, and the need for sustainable agricultural practices [1,2]. Crop biophysical and biochemical parameters provide important information about various aspects of crop conditions that have direct implications for productivity. Leaf area index (LAI) is a key biophysical parameter for quantifying crop canopy structure and function. Previous studies have highlighted the significance of LAI data in enhancing estimates of crop yield and land-atmosphere carbon dioxide exchanges by updating state variables in process-based agroecosystem models [3][4][5]. Canopy chlorophyll content (CCC) is defined as the total chlorophyll content per unit ground area in a contiguous group of plants, serving as a valuable metric for estimating canopy nitrogen content, vegetation physiological status and gross primary production [6,7]. Unlike other crop physiological and biochemical traits, leaf chlorophyll content (LCC) directly reflects the nutrition status of individual crop plants. LAI, LCC, and CCC serve as crucial phenotypic traits in crops, offering effective insights into crop growth, plant health and yield prediction [8,9]. Timely monitoring and accurate estimation of these bio-parameters are necessary for grasping wheat growth dynamics and guiding field management.
Remote sensing techniques have found widespread application in estimating growth bio-parameters and grain yield across various experimental environments for precision agriculture practices [10]. The collection of bio-parameter data at the ground level typically involves labor-intensive and time-consuming manual processes conducted via pointwise sampling. Moreover, ground-measured data are often limited to a few sampling points, which makes it difficult to represent the traits of the entire crop field and thereby restricts the scope of traditional ground bio-parameter data [11]. Large-scale and high-throughput data can be acquired by satellite-based remote sensing technology; however, it is difficult to reveal detailed local features due to the coarse spatial resolution. In this context, unmanned aerial vehicle (UAV) remote sensing technologies have emerged as a capable tool for mapping crop bio-parameter traits with fine spatial and temporal resolution.
Over the past few decades, various vegetation indices (VIs) have been proposed for the remote sensing estimation of bio-parameters to simplify predictive modeling [12][13][14]. However, the optimal VI for a given bio-parameter varies depending on the crop growth stage, the range of variation in the bio-parameter, or the crop phenotype [15][16][17][18][19][20]. Consequently, using a single VI to calibrate a general-purpose model for the entire growing season may not accurately capture the variation in bio-parameters at crucial individual stages. Many studies have therefore used combinations of spectral bands or spectral indices as input variables, combined with multiple linear regression or machine learning (ML), for predictive modeling [21][22][23][24][25]. However, when VIs are calculated from only a few spectral bands, highly correlated VIs enter the regression model because of the similarity of the index formulas and of the spectral information they carry, especially for broad-band multispectral data [11]. The presence of data redundancy and multicollinearity among spectral variables significantly diminishes the stability and efficiency of model prediction.
Feature variable selection can adaptively select the optimal combination of variable candidates to match the ML model, reduce data dimensionality, and improve modeling accuracy and efficiency [26]. Therefore, many studies have used feature selection to improve predictive modeling performance. Feature selection methods can be divided into three categories: filter, wrapper, and embedded [27,28]. In embedded algorithms, variable selection is built into the model training process and is achieved by identifying the variables with high importance scores in the model, as in LASSO [29], variable importance in projection based on partial least squares (PLS-VIP) [30] and various regression trees [31,32]. Filter-based algorithms, such as Pearson correlation coefficient thresholding, are the most commonly used due to their simplicity. The variables selected by a filter-based algorithm can be easily interpreted with respect to the dependent variable. The disadvantage is that such algorithms do not take into account the characteristics of the ML model and are more suitable for simple empirical regression algorithms. Wrapper algorithms treat feature selection as a search problem and evaluate the merits of feature variables through the evaluation function of the induction learner, which allows "tailor-made" variables to be selected for each model. The generation procedures for finding the optimal variable combination in wrapper algorithms include forward or backward search, recursive feature elimination (RFE), and bionic algorithms [33]. Wrapper algorithms are computationally more expensive than filter algorithms because of repeated training steps and cross-validation, but they are more accurate. Wang [34] and Wang [35] estimated wheat LCC using multiple ML models combined with the importance ranking of the random forest model. Zhu [15] and Yin [36] used multiple ML models combined with filter-based and RFE feature selection, respectively, to estimate wheat LCC at different growth stages. These studies indicated that an ML model alone cannot achieve optimal accuracy, and that combining feature selection with an ML model estimates the LCC of winter wheat more accurately. However, the results of the above studies also indicated that the number of feature variables selected by RF or RFE is still relatively large, so the goal of data dimensionality reduction has not been effectively achieved and data redundancy remains a problem for regression modeling.
Therefore, the primary objective of this study was to develop a machine learning regression model combined with an adapted variable selection scheme for estimating the bio-parameters of winter wheat at various growth stages. The specific objectives were (i) to examine changes in crop bio-parameters during the growth stages and their correlation with spectral variables; (ii) to compare and evaluate the performance of combined variable selection and machine learning estimation in monitoring bio-parameter traits during the growth stages; and (iii) to explore the variations in crop bio-parameters and grain yield under multiple fertilization treatments, with the aim of providing a reference and technical support for UAV remote sensing monitoring of crop bio-parameters under fertilizer management, thus boosting the application of UAV multispectral remote sensing technologies in precision agriculture.

Study Site and Experimental Design
The experimental study was conducted during the 2022/2023 winter wheat growing season at the Jiangsu Xuhuai Regional Institute of Agricultural Science in Xuzhou, Jiangsu Province, China (33°16′58″N; 117°17′23″E, elevation 35 m a.s.l.). The experiment involved two local wheat varieties (XM35 and XM28) and four nitrogen fertilizer rates (0, 180, 225, 270 kg N/ha). The field experiment used a split-plot design with a total of 82 plots of 7.5 m × 1.5 m each (Figure 1). The nitrogen fertilizer in each plot was split into base and topdressing fertilizer in a proportion of 1:1. The four nitrogen fertilizer treatments were applied before sowing and at the jointing stage. The field was kept under natural rainfed conditions, and weed control followed local field management practices. Winter wheat was sown on 10 October 2022 and harvested on 11 June 2023, completing a 243-day life span. Measurements were conducted at four growth stages of winter wheat: late jointing (DAS 172), booting (DAS 185), heading (DAS 198), and early filling (DAS 214) (Table 1).

In Situ Measurements and Laboratory Processes
Growth-related bio-parameters, including the LAI, LCC, and CCC, were collected during the growing season (Table 1). Examples of ground photos reflecting crop growth status are shown in Figure 2; the photos were taken at a height of approximately one meter above the wheat canopy. The LAI and LCC measurements were conducted within a 1 m² area in each field plot. The center position of each ground sample area was recorded using standard portable navigation equipment with Network Real-Time Kinematic (Network RTK) technology. The LAI value was obtained using an LAI-2200C plant canopy analyzer (Plant Canopy Analyzer, LI-COR, Lincoln, NE, USA). For each sampling area, one sky value and five target values recorded by the LAI-2200C were used to calculate the LAI of the crop canopy. The sky value served as the calibration reference, while the average of the five target values was taken as the ground truth LAI for the corresponding site. Measurements were taken between 16:00 and 18:00 local time, avoiding direct sunlight whenever possible.
For the measurement of LCC, the "five-point sampling" method was applied to select five flag leaves from wheat plants within the one-square-meter sampling area. The selected leaves were promptly placed in an insulated box with ice for transportation to the laboratory. In total, 0.1 g of fresh leaf disks was collected from the wheat leaf blades of each sampling area using a leaf puncher (diameter = 8 mm), and the pigments were extracted using 10 mL of 95% analytical reagent alcohol. Extract absorbance at 649 nm and 665 nm was measured using an ultraviolet-visible spectrophotometer (MAPADA, Shanghai, China) after 24 h of dark storage. Total chlorophyll concentrations (in mg/mL) were determined by applying extinction coefficients to the absorbance values. These concentrations were subsequently converted to µg/cm², taking into account the area of the leaf disks and the solution volume, as detailed in [13]. The canopy chlorophyll content (CCC), expressed per unit ground area, was determined by multiplying the LAI and LCC [37]. A total of 194 valid measurements for each crop bio-parameter were obtained from the four sampling campaigns during the growing season.
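The conversion chain described above can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the extinction coefficients are the values commonly cited for 95% ethanol extracts (they yield µg/mL) and are an assumption, since the text only refers to [13]; the function names, the number of disks, and the use of the total punched-disk area as the sampled leaf area are hypothetical.

```python
import math

# ASSUMPTION: coefficients commonly cited for 95% ethanol extracts; the
# source does not state which extinction coefficients were actually used.
def total_chlorophyll_ug_per_ml(a665, a649):
    """Total chlorophyll (a + b) concentration of the extract in ug/mL."""
    chl_a = 13.95 * a665 - 6.88 * a649
    chl_b = 24.96 * a649 - 7.32 * a665
    return chl_a + chl_b

def lcc_ug_per_cm2(conc_ug_per_ml, volume_ml, n_disks, disk_diameter_cm=0.8):
    """Convert an extract concentration to chlorophyll per unit leaf area."""
    disk_area = math.pi * (disk_diameter_cm / 2) ** 2  # cm^2 per disk
    return conc_ug_per_ml * volume_ml / (n_disks * disk_area)

def ccc_g_per_m2(lai, lcc):
    """CCC = LAI x LCC; 1 ug/cm^2 of leaf corresponds to 0.01 g/m^2."""
    return lai * lcc * 0.01

# Hypothetical readings for one sampling area.
conc = total_chlorophyll_ug_per_ml(a665=0.5, a649=0.2)
lcc = lcc_ug_per_cm2(conc, volume_ml=10.0, n_disks=4)
ccc = ccc_g_per_m2(lai=3.0, lcc=lcc)
```

The factor 0.01 in the last function is only the unit conversion from µg/cm² to g/m²; multiplying by the LAI then re-expresses the leaf-level value per unit ground area.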

UAV Platform and Flight Configuration
Multispectral data were simultaneously acquired by a DJI Phantom 4 multispectral UAV (Da-Jiang Innovations, Shenzhen, China). The equipment integrates five optical filter sensors with different central wavelengths (blue: 450 ± 16 nm, green: 560 ± 16 nm, red: 650 ± 16 nm, red edge: 730 ± 16 nm, near infrared: 840 ± 26 nm). The UAV campaigns were conducted between 10:00 and 14:00 local time, under clear sky and low wind speed conditions. A 90% reflectance calibration board was photographed with the UAV camera before takeoff and landing. The flight path was automatically generated by DJI GS Pro, and the flight parameter settings were kept consistent for each flight (Table 2). Network RTK technology was used to enhance UAV positioning accuracy. Following image acquisition, band registration and image stitching were performed using DJI Terra, followed by radiometric correction to obtain a multispectral reflectance orthophoto. To reduce the influence of additional factors such as soil, we performed image segmentation using the Sequential Maximum Angle Convex Cone (SMACC) tool [38] within the ENVI 5.3 software (Harris Geospatial Solutions Inc., Boulder, CO, USA). Linear spectral unmixing is a commonly used method for handling mixed pixels in spectral image classification. The method consists of two steps: first, extracting the spectra of "pure" ground objects (endmember extraction); and second, representing mixed pixels as linear combinations of endmembers (mixed pixel decomposition). The abundance image is the visualization of the mixed pixel decomposition and reveals the relative contribution of each endmember within each pixel. The SMACC tool integrates linear spectral unmixing to simplify the endmember extraction process, enabling the rapid and automated extraction of endmember spectra and abundance images from raw spectral images. The experimental field predominantly contained wheat and soil; thus, the endmembers considered were wheat, bare soil, and shadow. Based on the watershed of the statistical histogram of the wheat abundance image at each growth stage, a threshold was set to remove the soil background. As shown in Figure 3, the endmembers (wheat and soil) could be identified from the images with a spatial resolution of 3 cm.
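The soil-removal step can be illustrated with a small sketch. It assumes that a wheat abundance value and a reflectance value are already available for every pixel of a plot; the function name, the flat-list layout, the sample values, and the threshold of 0.5 are hypothetical choices for illustration, since the study derived its thresholds from the abundance histogram at each growth stage.

```python
def mean_canopy_reflectance(reflectance, wheat_abundance, threshold):
    """Average reflectance over pixels whose wheat abundance >= threshold."""
    kept = [r for r, a in zip(reflectance, wheat_abundance) if a >= threshold]
    if not kept:
        raise ValueError("no pixel passes the abundance threshold")
    return sum(kept) / len(kept)

# Hypothetical NIR reflectance and wheat abundance for five pixels.
plot_nir = [0.42, 0.45, 0.20, 0.47, 0.18]
plot_abundance = [0.90, 0.80, 0.10, 0.95, 0.05]

# Pixels 3 and 5 are dominated by soil or shadow and are excluded.
canopy_nir = mean_canopy_reflectance(plot_nir, plot_abundance, threshold=0.5)
```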

Calculation of Vegetation Index
The reflectance values of the five bands served as the basis for calculating the vegetation indices, which are commonly used in the literature to estimate growth bio-parameters and monitor crop growth status. In this study, the initial variable set was established by combining the five spectral bands and 13 vegetation indices, and was used to develop a feature selection and machine learning-based model for estimating the bio-parameter values of winter wheat. The formulas of the spectral variables are presented in Table 3.

Table 3. Formulas of the spectral variables.
Spectral variable | Abbreviation | Formula | Reference
Chlorophyll Index using Red Edge Reflectance | CIred-edge | (NIR/Edge) − 1 | [41]
Chlorophyll Vegetation Index | CVI | NIR × (Red/Blue²) | [42]
Green Normalized Difference Vegetation Index | GNDVI | (NIR − Green)/(NIR + Green) | [43]
Leaf Chlorophyll Index | LCI | (NIR − Edge)/(NIR + Red) | [44]
Modified Soil Adjusted Vegetation Index | MSAVI | [2 × NIR + 1 − sqrt((2 × NIR + 1)² − 8 × (NIR − Red))]/2 | [45]
MERIS Terrestrial Chlorophyll Index | MTCI | (NIR − Edge)/(Edge − Red) | [46]
Modified Triangular Vegetation Index 2 | MTVI2 | 1.5 × [1.2 × (NIR − Green) − 2.5 × (Red − Green)]/sqrt((2 × NIR + 1)² − (6 × NIR − 5 × sqrt(Red)) − 0.5) |

Note: In the formulas, Blue, Green, Red, Edge, and NIR represent the reflectance values of the blue (450 nm), green (560 nm), red (650 nm), red edge (730 nm), and near-infrared (840 nm) bands, respectively. '-' refers to the reflectance value of the corresponding band.
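A few of the index formulas listed in Table 3 can be computed from band reflectances as in the sketch below. The function name and the example reflectance values are hypothetical, and only four of the indices are shown.

```python
def vegetation_indices(green, red, edge, nir):
    """Return a few Table 3 indices for one pixel or one plot mean."""
    return {
        "GNDVI": (nir - green) / (nir + green),
        "CIred-edge": nir / edge - 1,
        "LCI": (nir - edge) / (nir + red),
        "MTCI": (nir - edge) / (edge - red),
    }

# Hypothetical canopy reflectances (unitless, between 0 and 1).
example = vegetation_indices(green=0.08, red=0.04, edge=0.20, nir=0.44)
```

Because several of these indices share the same NIR and red-edge terms, their values tend to move together, which is the multicollinearity concern raised in the Introduction.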

Least Absolute Shrinkage and Selection Operator Regression (LASSO)
LASSO is a statistical method used for variable selection and regularization in linear regression models. In LASSO regression, a shrinkage (or regularization) process is incorporated into the traditional linear regression model, which helps prevent overfitting and select the most relevant predictor variables. It achieves this by introducing a penalty term based on the absolute values of the regression coefficients. The key feature of LASSO is its ability to shrink some coefficients to exactly zero, effectively performing automatic variable selection. The final objective of the process is to minimize the prediction error. The parameter controlling the amount of regularization was tuned in this study through 5-fold cross-validation. The parameter was searched over values between 0.01 and 100 along the regularization path to identify the value with the minimal mean squared error. The regression modeling was performed with the R package 'glmnet'.
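For reference, the penalized least-squares objective that LASSO minimizes can be written as below. The notation is assumed for illustration: N samples, response y_i, predictor vector x_i, intercept β_0, coefficients β_j over p predictors (the 18 spectral candidates here), and regularization parameter λ. The 1/(2N) scaling follows the convention used by 'glmnet'.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\, \beta}
  \; \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{\top} \beta \right)^2
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Larger values of λ drive more coefficients to exactly zero, which is how the variable selection arises.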

Random Forest Regression (RFR)
RFR is an ensemble learning method in machine learning that leverages the power of multiple decision trees using the "Bagging" idea [50]. RFR works by constructing a multitude of decision trees during training and outputting the average prediction of the individual trees for the regression task. RF also provides insights into feature importance, aiding in variable selection and in understanding the factors influencing the regression model. In this study, three parameters were tuned with a grid search: the number of rounds (50 to 150 in steps of 20), the maximum tree depth (3 to 20 in steps of 5), and the mode for the maximum number of features considered at each split ('None', 'log2' or 'sqrt'). The regression modeling was performed with Python's 'sklearn' library.
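The search grid described above can be enumerated as in the following sketch. Mapping "number of rounds" to `n_estimators` and "max branching features" to `max_features` is an assumption based on scikit-learn's `RandomForestRegressor` parameter names; the helper `expand_grid` is hypothetical.

```python
from itertools import product

# ASSUMPTION: parameter names follow scikit-learn's RandomForestRegressor.
param_grid = {
    "n_estimators": list(range(50, 151, 20)),  # 50, 70, ..., 150
    "max_depth": list(range(3, 21, 5)),        # 3, 8, 13, 18
    "max_features": [None, "log2", "sqrt"],
}

def expand_grid(grid):
    """Yield every parameter combination in the grid as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))

combinations = list(expand_grid(param_grid))
```

With these ranges the grid holds 6 × 4 × 3 = 72 combinations. Note that a step of 5 starting at 3 stops at a depth of 18, so the stated upper bound of 20 is not itself evaluated. A dictionary of this shape could be passed as `param_grid` to scikit-learn's `GridSearchCV`.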

Support Vector Machine Based Sequential Forward Selection Regression (SFS-SVR)
SFS-SVR is the combination of the sequential forward selection algorithm and the support vector machine. Variable selection is first performed by SFS following the wrapper idea: in each iteration, the variable subset is extended with the variable that increases the performance of the induction learner the most [28,51]. SFS starts with an empty subset and iteratively adds a variable to the subset to find the input variable combination with the best merit value according to the evaluation function. In this study, a support vector machine with a radial basis function kernel was used as the induction learner, and its root mean square error was used as the criterion to be minimized, with resampling at each iteration to stabilize the feature rankings. The variables selected by SFS were then taken as the input variables for the subsequent regression modeling. SVR with a radial basis function kernel was used as the regression model, and a grid search was employed to optimize the model parameters C and γ. To avoid overfitting, C was varied from 0.1 to 10 and combined with γ from 0.005 to 5 in the grid search. The variable selection and regression modeling were performed with the R packages 'mlr3' and 'caret', respectively.
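The forward search can be sketched in a few lines. This is a generic illustration, not the 'mlr3' implementation: the function name, the `min_gain` stopping rule, and the toy scoring function are assumptions. In practice `score` would return the resampled RMSE of an RBF-kernel SVR trained on the given subset.

```python
def sequential_forward_selection(candidates, score, min_gain=0.0):
    """Greedy forward search; `score(subset)` returns an error to minimize."""
    selected = []
    best_error = float("inf")
    remaining = list(candidates)
    while remaining:
        # Evaluate each remaining candidate added to the current subset.
        trial_errors = {v: score(selected + [v]) for v in remaining}
        best_var = min(trial_errors, key=trial_errors.get)
        # Stop when the best addition no longer reduces the error enough.
        if best_error - trial_errors[best_var] <= min_gain:
            break
        selected.append(best_var)
        remaining.remove(best_var)
        best_error = trial_errors[best_var]
    return selected, best_error

def toy_rmse(subset):
    # Hypothetical stand-in for a cross-validated SVR RMSE: two variables
    # carry signal, and every added variable incurs a small cost.
    useful = {"NDVI": 0.30, "MTVI2": 0.15}
    return 1.0 - sum(useful.get(v, 0.0) for v in subset) + 0.01 * len(subset)

selected, error = sequential_forward_selection(
    ["Red", "NDVI", "GNDVI", "MTVI2"], toy_rmse
)
```

With the toy scorer, the search adds NDVI, then MTVI2, and stops because no third variable lowers the error. This is the behavior that lets a wrapper method end up with a small subset.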

Accuracy Assessment
To test how accurately the models predict the bio-parameter values, including LAI, LCC, and CCC, the coefficient of determination (R²), root mean square error (RMSE), and relative predictive deviation (RPD) were selected to evaluate the accuracy of model training and model validation. Hold-out validation was used to obtain these metrics: 70% of the samples were used for training and 30% for validation.
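The three metrics and the hold-out split can be sketched as follows. The definitions used here are the common ones and are assumptions, since the text does not give the formulas: R² is computed as 1 − SSE/SST, and RPD as the sample standard deviation of the observed values divided by the RMSE. The function names and the random split are hypothetical.

```python
import math
import random

def regression_metrics(observed, predicted):
    """Return R^2, RMSE and RPD for paired observed/predicted values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(sse / n)
    sd = math.sqrt(sst / (n - 1))  # sample standard deviation of observations
    return {"R2": 1 - sse / sst, "RMSE": rmse, "RPD": sd / rmse}

def hold_out_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and split it into training/validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical observed and predicted LAI values.
metrics = regression_metrics([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0])
train, validation = hold_out_split(range(10))
```

Under these definitions a higher RPD means that the prediction error is small relative to the natural spread of the measured values.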

Distribution of Biochemical Parameters in the Winter Wheat
Table 4 displays the variations in ground-measured LAI, LCC, and CCC values at the four growth stages of winter wheat. Across all stages, the LAI varied from 0.70 to 5.82 with a standard deviation (SD) of 1.23, the LCC varied from 15.40 µg/cm² to 70.08 µg/cm² with an SD of 12.98, and the CCC varied from 0.12 g/m² to 3.25 g/m² with an SD of 0.82. The mean values of the three bio-parameters first increased and then decreased. Figure 4 shows the LAI, LCC, and CCC values of winter wheat at the four growth stages under different nitrogen fertilizer levels. The LAI values presented a consistent trend under the different N fertilizer levels and reached their maximum at the booting stage. Similar to the trend of LAI, the LCC and CCC values of winter wheat under the N180, N225, and N270 treatments reached their maximum at the booting stage and decreased afterward. These results indicate that the bio-parameter values of wheat peak during the booting stage and then begin to decrease. This is because nutrients and water are primarily used for the growth of roots, stems, and leaves before the jointing stage, and are then allocated to the growth of wheat spikes. In addition, the senescence of leaves in the lower layers of the canopy changes the bio-parameter traits of the canopy. Note that the LCC and CCC values of winter wheat under the N0 treatment reached their maximum at the heading stage. This relative delay may be caused by the slow growth of wheat leaves under nitrogen deficiency. The standard deviations of LCC at high N levels were lower than those at low N levels, indicating that crop canopies with sufficient N fertilizer were more homogeneous, whereas the opposite pattern was observed for LAI and CCC. Overall, the values of the three bio-parameters were roughly correlated with the N fertilizer level and presented a similar trend across the four growth stages.

Correlation Analysis
Figure 5 shows a Mantel test heat map of the correlation between the bio-parameters and the spectral variables at the four growth stages of winter wheat. The correlation between bio-parameters and variables is represented by line color and thickness, while the correlation between variables is represented by color and rectangle area. Mantel's P refers to the p-value; the larger the Mantel test's r and the smaller the p-value, the greater the impact of the variable on the bio-parameter. The results indicated significant correlations between the bio-parameters and most of the spectral variables across the different growth stages. Meanwhile, a high correlation was observed among the spectral variables, posing potential multicollinearity challenges for regression modeling. Spectral variable selection removes some of the redundant variables to obtain a simpler model.

Feature Variable Selection
Eighteen variable candidates, including five spectral bands and 13 vegetation indices, were used to select the optimal variables for modeling each of the three bio-parameters. LASSO, the RF importance measurement, and SFS were applied to the 18 spectral variables to select the optimal variable combination at the four growth stages.
Figure 6 shows the feature variables selected by the LASSO model for each bio-parameter and growth stage. The optimal combination of variables can be selected by the LASSO internal selector to simplify the model. LASSO reduces the coefficients of unimportant features to zero, and the size of a coefficient reflects the impact of the feature on the target variable. A coefficient with a larger absolute value indicates that the feature has a significant impact on the target variable, while a coefficient with a smaller or zero value indicates that the feature has a smaller or non-existent impact on the dependent variable [29,52]. The results showed that the selected feature variables differed considerably among growth stages, which demonstrates that it was not appropriate to choose a single unified variable set for further research. MTVI2 and NDVI were the top selected feature variables for LAI across all growth stages; CCCI, NDVI, and SIPI were the preferred feature variables for LCC across all growth stages; and MSAVI and MTCI were the favored feature variables for CCC across all growth stages. The feature importance scores of all variable candidates derived from the RF model for each bio-parameter and growth stage are shown in Figure 7.
RF considers the importance of each feature variable and assigns greater weights to the features that are more important to the model. This means that a variable with a high importance score contributes a larger share to the model's prediction and has a more significant impact on the final regression results. When the variable importance score is very low, the variable is either not important or highly collinear with one or more other variables [50]. It should be noted that although RF can estimate the importance of a feature variable, it cannot provide specific information on how the feature variable explains the target variable. GNDVI and SIPI made a relatively high contribution to the RF model for LAI at all growth stages. Red edge, CCCI, NDRE, and SIPI had relatively high and stable contributions to the RF model for LCC at all growth stages. Red, GNDVI, and SIPI had relatively high and stable contributions to the RF model for CCC at all growth stages. Unlike LASSO and RF, which embed internal selectors, SFS is a wrapper variable selection algorithm that is separate from the regression modeling. SFS forms a feature subset from all feature candidates for the subsequent regression modeling. The feature subset is the optimal combination of feature variables determined by the induction learner during the forward search process. It reflects which features are considered to have a positive impact on model performance and which features are ignored or excluded.
The optimal variables selected by the three methods for bio-parameter modeling are listed in Tables 5-7, respectively. For RF, only the variables with an importance score greater than 0.7 are presented in the tables. The results showed that the optimal variables differed considerably among the three methods for each bio-parameter and growth stage. In the tables, we marked the variables selected by two or more of the three methods at each growth stage. For LAI, red edge and MTVI2 were selected most frequently at the late jointing stage; blue, red, red edge, ACI and MTVI2 at the booting stage; and LCI, MSAVI, MTVI2 and NDVI at the heading stage. NDVI was the only variable selected by all three methods at the early filling stage. For LCC, CCCI was the only variable selected by all three methods at the late jointing stage; CCCI, MTCI and SIPI were selected most frequently at the booting stage; MTCI and NDVI at the heading stage; and blue, red edge, CCCI and SIPI at the early filling stage. For CCC, MSAVI and CIre were the variables selected by all three methods at the late jointing and booting stages, respectively. NIR and MTCI were selected most frequently at the heading stage, and CIre and CVI at the early filling stage. Overall, SFS selected fewer variables from the candidates than LASSO and RF, and thus performed better in reducing data redundancy.

Winter Wheat Bio-Parameters Mapping
The models with the highest predictive capability among those developed in this study were used to construct pixel-level spatial maps of the three bio-parameter values at each growth stage. Figure 9 displays the predicted maps of winter wheat bio-parameter values across the four growth stages using the SFS-SVR model. The visualization results were highly consistent with the field experiment layout and measurements, as displayed in Figures 1 and 4, which indicates that the inversion results were reliable. The within-plot variance for each fertilization treatment was low, and the differences between plot treatments were obvious, particularly for CCC. Consequently, the results can support further field-level precision fertilization studies.

Figure 9. The maps of winter wheat bio-parameters retrieved by SFS-SVR at four growth stages (panels: late jointing, booting, heading, early filling).

The Relationship between Winter Wheat Grain Yield and Biochemical Parameters
Remote sensing estimation of winter wheat bio-parameters is based on VIs, which can serve as indicators of grain yield [53]. It is therefore necessary to verify whether the relationship between wheat LAI, LCC, CCC, and measured yield is significant. The average values of the bio-parameters of each plot were extracted using the ArcGIS zonal statistics tool. Figure 10 shows the relationship between LAI, LCC, CCC, and yield at the different stages. The results showed that LAI, LCC, and CCC were related to yield, and that the relationship varied with the growth stage. For LAI, the goodness of fit (R²) values were 0.561 (late jointing), 0.631 (booting), 0.674 (heading) and 0.722 (early filling). For LCC, the R² values were 0.534 (late jointing), 0.278 (booting), 0.297 (heading) and 0.525 (early filling). For CCC, the R² values were 0.461 (late jointing), 0.601 (booting), 0.563 (heading) and 0.523 (early filling). The growth stages with the highest correlation between yield and LAI, LCC, and CCC were early filling (R² = 0.722), late jointing (R² = 0.534), and booting (R² = 0.601), respectively. The relevance ranking was LAI > CCC > LCC. The correlation analysis showed that LAI and CCC were crucial indicators for assessing the yield of winter wheat, but that the assessment can be affected by the growth stage. This also provides a basis for predicting winter wheat yield from the crop bio-parameters. To explore the variations in wheat bio-parameters and yield under different nitrogen treatments, the bio-parameter values at the growth stage with the highest prediction accuracy were compared with yield under the different nitrogen treatments (Figure 11). Both bio-parameters and yield increased with the N fertilizer level, except for the N270 treatment under T4, and the differences in bio-parameter values and yield among treatments were consistent with each other. For the four N fertilizer treatments, the average yield ranking was T3 > T4 > T1 > T2, and the ranking of the average values of the three bio-parameters was T4 > T3 > T1 > T2. This indicated that wheat growth status and yield were related not only to the N fertilizer level but also to the way the base and topdressing fertilizers were applied, and that excessive fertilization cannot increase wheat yield. In this study, the optimal N treatment for increasing yield was N225 under T3, rather than N270 under T4, which had the highest fertilization rate. A possible reason is that, for the N270 treatment under T4, the available nutrients went mainly to the growth of other organs such as leaves and stems rather than to the grain. Overall, the trends of the three bio-parameters and yield were consistent among the different N treatments: both increased with the N fertilizer level, except for the N270 treatment under T4. The results indicated that a reasonable combination of base fertilizer and topdressing fertilizer can improve wheat growth and yield. This also indirectly demonstrated the accuracy of the bio-parameter prediction results. In addition, these findings provide a scientific basis for enhanced monitoring and diagnosis of nitrogen nutrition, contributing to improved field management practices for winter wheat.

... received half of the fertilizer as urea at sowing and half as slow-release fertilizer at the jointing stage; T3 received half of the fertilizer as slow-release fertilizer at sowing and half as urea at the jointing stage; T4 received half of the fertilizer as urea at sowing and half as slow-release fertilizer at the jointing stage.

Uncertainty of Observed Data
First, a commonly overlooked issue in UAV multispectral remote sensing is the presence of multi-source errors in the multispectral data, which can affect the accuracy of wheat bio-parameter estimation. Because the sensor has multiple independent lenses for different spectral bands, band registration and image mosaicking are necessary preprocessing steps. However, the surface texture of a field crop canopy is uniform, so few distinctive texture features are available and the number of matching feature points is easily reduced [54]. Image data are currently processed with widely used software such as DJI Terra. In addition, changes in canopy structure as winter wheat grows, the growth and senescence of wheat leaves and spikes, and differences in light radiation between flight times all introduce a degree of uncertainty in data consistency. This issue can generally be alleviated in two ways: (i) collecting data under relatively stable and sufficient solar radiation without cloud cover; and (ii) recording light radiation with the built-in photometer of the multispectral sensor and using it in subsequent calibration with a standard reflectance panel. Second, this study did not consider the vertical heterogeneity of bio-parameters caused by light conditions, which has been reported in previous studies [55][56][57]. The foliar chlorophyll content was assumed to be constant across the vertical layers of the wheat canopy. Using only the LCC of flag leaves may therefore not represent the chlorophyll content of the sample area, which affected the estimation of LCC to some extent. We therefore used CCC to estimate canopy chlorophyll content. Finally, because many spectral index formulas are similar, calculating VIs from the five broadband multispectral bands generates many highly correlated VIs for regression modeling. We used different algorithms to reduce collinearity effects and to screen an optimal combination of variables, reducing the redundancy of the VIs. The results showed that the 18 candidate spectral variables can be reduced to only a few feature variables, further improving the accuracy and efficiency of model prediction.
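To illustrate why VIs built from the same few bands tend to be collinear, the following Python sketch computes three normalized-difference indices and lists the pairs whose absolute Pearson correlation exceeds a threshold. The index names (NDVI, GNDVI, NDRE) are standard formulas chosen for illustration and are not necessarily among the 18 variables used in this study; the reflectance values, the 0.9 threshold, and the helper names are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def normalized_difference(band_1, band_2):
    """Per-plot (band_1 - band_2) / (band_1 + band_2)."""
    return [(p - q) / (p + q) for p, q in zip(band_1, band_2)]


def correlated_pairs(features, threshold=0.9):
    """Return (name_1, name_2, r) for every pair with |r| >= threshold."""
    pairs = []
    for (name_1, v1), (name_2, v2) in combinations(features.items(), 2):
        r = pearson_r(v1, v2)
        if abs(r) >= threshold:
            pairs.append((name_1, name_2, r))
    return pairs


# Hypothetical plot-mean reflectances for six plots; NOT data from the study.
green = [0.060, 0.055, 0.052, 0.048, 0.045, 0.041]
red = [0.050, 0.044, 0.040, 0.035, 0.031, 0.028]
red_edge = [0.180, 0.190, 0.200, 0.215, 0.225, 0.240]
nir = [0.350, 0.380, 0.410, 0.450, 0.480, 0.520]

vis = {
    "NDVI": normalized_difference(nir, red),
    "GNDVI": normalized_difference(nir, green),
    "NDRE": normalized_difference(nir, red_edge),
}

for name_1, name_2, r in correlated_pairs(vis):
    print(f"{name_1} vs {name_2}: r = {r:.3f}")
```

A pairwise screen like this only diagnoses redundancy; it does not by itself choose which variables to keep, which is the role of the selection methods compared in this study.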

Comparison of Different Models
This study evaluated the effectiveness of three regression methods combined with variable selection (LASSO, RFR, and SFS-SVR) for estimating winter wheat bio-parameters. Previous studies have indicated that, because a single VI saturates and has low sensitivity during the growth stages, it is difficult to accurately estimate multi-temporal changes in crop bio-parameters using traditional VI methods [16][17][18][19][20]. Non-parametric machine learning methods such as RFR and SVR are less sensitive to skewness in the data distribution and can therefore handle non-normal data [21][22][23][24][25]. The results of this study demonstrate that it is feasible to accurately predict winter wheat bio-parameters at various growth stages using VIs and machine learning regression combined with variable selection. Moreover, combining machine learning with variable selection is well suited to dealing with data redundancy and multicollinearity. The relative importance of each input variable may vary with the crop growth stage and the severity of crop stress, and using data from different stages as input resulted in variations in model accuracy. The input variables selected by SFS, LASSO, and the RF internal feature selector also differed. Specifically, SFS selected the fewest variables for bio-parameter modeling at the four growth stages. The RFR model had the highest training accuracy of the three models, but its training accuracy was much higher than its testing accuracy, which was the lowest of the three. This indicated that the RFR model was overfitted, possibly because of the limited number of modeling samples and inappropriate model hyperparameters. For the LASSO model, at most one bio-parameter prediction had the highest testing accuracy at each growth stage. Comparing the three models, we found that SFS-SVR was more robust and predicted wheat bio-parameters better across the growth stages. In addition, SFS usually selected the fewest input variables of the three methods, which reduces data redundancy while saving prediction time and improving model efficiency.
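The greedy logic behind sequential forward selection can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the pipeline used in this study: ordinary least squares is swapped in for SVR so the example needs only the standard library, the stopping rule (`min_gain`), the five-fold split, and all function names are hypothetical, and the data are synthetic. In practice one might wrap an SVR in a library selector such as scikit-learn's `SequentialFeatureSelector`.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ols_fit(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return solve_linear(XtX, Xty)


def cv_r2(rows, y, n_folds=5):
    """R2 of pooled out-of-fold predictions from k-fold cross-validation."""
    n = len(y)
    preds = [0.0] * n
    for k in range(n_folds):
        train = [i for i in range(n) if i % n_folds != k]
        test = [i for i in range(n) if i % n_folds == k]
        coef = ols_fit([rows[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = coef[0] + sum(c * v for c, v in zip(coef[1:], rows[i]))
    mean_y = sum(y) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def forward_select(data, y, names, min_gain=0.01):
    """Greedy forward selection: add the feature that most improves CV R2."""
    selected, best = [], float("-inf")
    remaining = list(range(len(names)))
    while remaining:
        scores = {}
        for j in remaining:
            cols = selected + [j]
            scores[j] = cv_r2([[s[c] for c in cols] for s in data], y)
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best < min_gain:
            break  # no candidate improves the score enough; stop adding
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return [names[j] for j in selected], best


# Synthetic data: the target depends on VI_0 and VI_3; VI_1 nearly duplicates VI_0.
rng = random.Random(0)
data = []
for _ in range(60):
    sample = [rng.gauss(0, 1) for _ in range(6)]
    sample[1] = sample[0] + 0.05 * rng.gauss(0, 1)
    data.append(sample)
target = [3.0 * s[0] - 1.5 * s[3] + 0.1 * rng.gauss(0, 1) for s in data]
names = [f"VI_{j}" for j in range(6)]

chosen, score = forward_select(data, target, names)
print(chosen, round(score, 3))
```

Because each step keeps only the candidate that adds the most cross-validated skill, a near-duplicate of an already selected variable contributes almost nothing and is left out, which is one way to read the small variable sets reported for SFS.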

Effects of Crop Phenology on Bio-Parameters Estimation
The effects of growth stage, crop type, and the range of variation in bio-parameters should be taken into account when applying remote sensing in precision agriculture [16,58,59]. The findings of this study confirmed that model accuracy varied across growth stages and that phenology can affect model accuracy within the experimental setup. Specifically, prediction accuracy was highest at the early filling stage for LAI, at the heading stage for LCC, and at the booting stage for CCC. Across the four growth stages, prediction accuracy ranked as follows: for LAI, early filling > booting > late jointing > heading; for LCC, heading > early filling > booting > late jointing; and for CCC, booting > early filling > heading > late jointing. This variation can be attributed to changes in factors including crop canopy structure, leaf thickness and cell structure, leaf pigment content, and crop coverage [15,31,36,60]. In addition, LAI, LCC, and CCC change as leaf size and number increase through the vertical profile of the wheat canopy, and the growth and senescence of wheat spikes also affect canopy reflectance and the estimation of bio-parameters. At the jointing stage, because the wheat plants are small, multiple scattering among leaves and mixing with the soil background significantly affect canopy reflectance. At the booting stage, when the bio-parameters peak, the VIs may become saturated, reducing the prediction accuracy of the three bio-parameter models. At the early filling stage, the senescence of wheat leaves and spikes may affect canopy reflectance, which reduces the prediction accuracy of LCC relative to the heading stage.
The study also demonstrated the importance of bio-parameters in evaluating the comprehensive yield traits of winter wheat, which is consistent with previous research [59,61]. Correlation analysis between wheat yield and LAI, LCC, and CCC confirmed that the bio-parameters are important indicators for yield estimation, but that their relationship with yield varied with growth stage (Figure 10). The correlation between yield and LCC was low (at best R² = 0.534, at LJ), indicating that it is difficult to accurately evaluate wheat yield using LCC alone. As a product of LAI and LCC, CCC showed a stronger correlation with yield (R² = 0.601 at BS), but it did not exceed the correlation between LAI and yield (R² = 0.722 at EF). The high correlation between yield and LAI indicated that LAI better characterizes winter wheat growth status and supports better yield evaluation. Therefore, further research should use LAI as an important factor to assimilate into wheat yield prediction models.
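For readers who want to see how CCC relates to the other two traits, here is a minimal Python sketch assuming the commonly used definition CCC = LAI × LCC, with LCC in µg/cm² and CCC in g/m² (1 µg/cm² = 0.01 g/m²). The exact formula and units used in this study are defined elsewhere in the paper, so the function name and the example values below are illustrative assumptions only.

```python
def canopy_chlorophyll_content(lai, lcc_ug_per_cm2):
    """Return CCC in g/m2 from LAI (m2 leaf per m2 ground) and LCC (ug/cm2).

    Assumes CCC = LAI * LCC; dividing by 100 converts ug/cm2 to g/m2
    (1 ug/cm2 = 1e-6 g / 1e-4 m2 = 0.01 g/m2).
    """
    if lai < 0 or lcc_ug_per_cm2 < 0:
        raise ValueError("LAI and LCC must be non-negative")
    return lai * lcc_ug_per_cm2 / 100.0


# Hypothetical plot values, not measurements from the study.
example_ccc = canopy_chlorophyll_content(lai=4.0, lcc_ug_per_cm2=45.0)
print(f"CCC = {example_ccc:.2f} g/m2")
```

Under this definition CCC inherits variation from both LAI and LCC, which is consistent with its yield correlation falling between those of the two component traits.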

Conclusions
In this study, UAV-based spectral variables were used to estimate the growth-related bio-parameters of winter wheat at four growth stages. We compared three statistical methods combining feature selection and machine learning (LASSO, RFR, and SFS-SVR) for estimating winter wheat LAI, LCC, and CCC. The findings revealed that: (1) the values of the three bio-parameters were generally correlated with N fertilizer level and showed similar trends across the four growth stages; (2) LAI estimates at the early filling stage, LCC estimates at the heading stage, and CCC estimates at the booting stage were more accurate than estimates at other stages; (3) SFS-SVR was a robust method for estimating winter wheat bio-parameters at key growth stages from UAV multispectral imagery, effectively reducing data redundancy and enhancing prediction accuracy; and (4) LAI was a more crucial indicator of winter wheat yield than CCC and LCC, and the correlation was affected by the growth stage.
In summary, the results demonstrated the potential of SFS-SVR regression for estimating winter wheat bio-parameters at field scale. This study is valuable for monitoring and estimating winter wheat bio-parameters based on UAV multispectral imaging at specific phenological periods, thereby offering guidance for field management and for optimizing agricultural practices to enhance crop yield. Future studies should encompass multiple crops across diverse agricultural contexts, which would enhance the generalizability of the findings. A more in-depth temporal analysis over multiple growing seasons could assess model robustness under varying environmental conditions. Exploring the fusion of different sensor types, conducting on-farm validation studies, and optimizing UAV flight parameters would contribute to the scalability and practical applicability of the methodology. Additionally, incorporating advanced machine learning techniques and assessing crop yield will be important for comprehensive advancements.

Figure 1. Diagram of the winter wheat experimental site. (a) RGB image marked with the nitrogen fertilizer application levels, and (b) experimental design with multiple fertilizer treatments.

Figure 2. Examples of ground photos reflecting winter wheat growth status.

Figure 3. The results of wheat extraction based on the sequential maximum angle convex cone (SMACC) method. (a) RGB image, (b) wheat abundance, (c) soil abundance, (d) wheat image after removing soil background.

Figure 4. Variation in bio-parameters under various nitrogen (N) fertilizer levels.

Figure 5. The results of correlation analysis between three bio-parameters and spectral variables at four stages of winter wheat. (a) Late jointing, (b) booting, (c) heading, and (d) early filling stage.

Figure 6. Feature variables selected by LASSO and their importance scores at four growth stages, displayed as regression coefficients with the bio-parameters. Blue bars indicate positive correlation and red bars indicate negative correlation. (a) LAI, (b) LCC, and (c) CCC.

Figure 7. The feature importance ranking from RF of winter wheat at four growth stages. (a) LAI, (b) LCC, and (c) CCC.

Figure 8. The validation results of the SFS-SVR model in evaluating the LAI (a), LCC (b), and CCC (c) status of winter wheat at four growth stages (LJ: late jointing, BS: booting, HS: heading, EF: early filling stage).

Figure 10. The relationship between winter wheat yield and LAI, LCC, and CCC at four growth stages (LJ: late jointing, BS: booting, HS: heading, EF: early filling stage).

Figure 11. The variation in winter wheat LAI (a), LCC (b), CCC (c), and yield (d) under different N treatments. N represents the nitrogen fertilizer level (180, 225, 270 kg N/ha). Treatment T1 received the fertilizer in two halves, each a 1:1 mixture of urea and slow-release fertilizers, one at sowing and one at the jointing stage; T2

Table 1. Measurement date and corresponding growth stage of winter wheat.

Columns: Ground and UAV Measurement Data (2023); Growth Stage; Description; Abbreviation.
Note: Days after sowing (DAS). Winter wheat was sown on 10 October 2022 and harvested on 11 June 2023, completing a 243-day life span.

Table 3. Spectral variables used in this study.

Table 4. Descriptive statistics of the values of bio-parameters for different growth stages.

Table 5. The results of variable selection for LAI modeling.

Table 8. The results of the regression models for winter wheat bio-parameters prediction at the four growth stages.