Estimation of the Leaf Area Index of Winter Rapeseed Based on Hyperspectral and Machine Learning

Leaf area index (LAI) is essential for evaluating crop growth and development. Traditionally, crop LAI has been acquired mainly through destructive manual measurement. Because it is fast and non-destructive, spectroscopy offers a feasible alternative for obtaining crop LAI. To acquire winter oilseed rape LAI efficiently, this study collected hyperspectral and LAI data at the full-bloom stage of winter oilseed rape. LAI-related spectral indices were calculated from the original spectrum and from the first-order differential spectrum, and the indices with the highest correlation with LAI at the flowering stage were selected as the optimal model inputs. Three machine learning methods, Back Propagation Neural Network (BPNN), Support Vector Machine (SVM), and Random Forest (RF), were then used to construct and test LAI models of winter oilseed rape. The results show that first-order differential processing of the original spectral data significantly improves the correlation between the spectral indices and winter rapeseed LAI compared with the original data. The spectral index with the best correlation with LAI is NDVI under first-order differentiation, with a correlation coefficient of 0.734 at the wavelength combination of 716 nm and 724 nm. With the same input variables, the RF model achieves higher estimation accuracy than the other models, and the best accuracy is obtained when the first-order differential spectral indices are used as inputs: the validation set R² is 0.810, RMSE is 0.455 cm²/cm², and MRE is 10.465%. These results can provide a theoretical basis for spectral crop monitoring and for assessing crop growth.


Introduction
Oilseed rape is one of the most important economic crops in China and one of the four major oil crops worldwide. It is widely cultivated, highly adaptable, versatile in use, and of substantial economic value, with promising development prospects [1]. Oil extracted from winter oilseed rape seeds can be used to produce biodiesel, a renewable energy source that, compared with conventional fossil fuels, reduces greenhouse gas emissions and reliance on finite energy resources [2]. Leaf area index (LAI) is the total one-sided leaf area per unit of ground area. This key physiological parameter reflects vegetation density, canopy structure and function, biological characteristics, and ecological environmental factors [3]. LAI correlates closely with biomass and crop yield, which makes it a vital indicator of the growth status of crop populations. However, traditional field measurement of rapeseed LAI is time-consuming, causes significant plant damage, and yields imprecise results [4]. Consequently, researchers have turned to efficient, accurate, and non-destructive methods for obtaining vegetation parameters such as rapeseed LAI.
Sustainability 2023, 15, 12930
In recent years, remote sensing has been widely used for real-time monitoring in modern agriculture owing to its dynamic and macroscopic advantages. Numerous scholars have used hyperspectral remote sensing to determine vegetation physiological parameters [5][6][7]. For instance, Qi et al. [8] measured hyperspectral data of peanut leaves under different planting densities. They constructed vegetation indices from leaf spectra by combining bands across the full spectral range (350-1830 nm), selected the optimal index reflecting the SPAD value of peanut leaves according to its correlation with chlorophyll content, and established a regression-based SPAD estimation model. Their results showed that both excessive and inadequate planting densities decreased the chlorophyll content of peanut leaves, which peaked at an appropriate density, and that the spectral reflectance of peanut leaves varied significantly across chlorophyll content levels. Similarly, Li et al. [9] combined maize canopy spectral data with measured plant height and biomass to analyze the relationship between biomass and spectral reflectance. They identified sensitive bands, constructed hyperspectral vegetation indices, and used Partial Least Squares (PLS), Support Vector Machine (SVM), and Random Forest (RF) regression to establish and validate a maize biomass inversion model. The results demonstrated that incorporating plant height improved the accuracy of biomass inversion to some extent, thereby improving crop biomass estimation.
In previous studies, spectral indices calculated at fixed wavelengths were used to estimate crop LAI, and the indices that correlated best with LAI were selected for modeling [10,11]. However, even within the same variety, crops grown in different environments can differ greatly in physiological status, which causes variation in their spectral data. Calculating spectral indices at fixed wavelengths under such circumstances underutilizes the spectral information and can reduce the accuracy of the resulting estimation model. Recently, researchers have used the correlation matrix method to further process and model spectral data [12,13]. However, these studies have focused mainly on soil and environmental science [14] and on other crops, with few addressing winter oilseed rape. Xiang et al. [15] studied soybean LAI at the flowering stage under different nitrogen application levels and film mulching treatments. They processed the original hyperspectral reflectance with 0-2 order differential transformations, selected the index with the highest correlation with flowering-stage soybean LAI as the optimal input, and used three machine learning methods (SVM, RF, and a BP neural network optimized by a genetic algorithm, GA-BP) to construct soybean LAI prediction models. They concluded that the 1.5-order differential and the RF method were the optimal differential order and modeling method, respectively. For the optimal model, the modeling and validation sets gave R² values of 0.890 and 0.880, RMSE values of 0.348 cm²/cm² and 0.320 cm²/cm², NRMSE values of 11.278% and 10.354%, and MRE values of 9.795% and 9.572%, respectively. Therefore, this study focused on winter oilseed rape in Northwestern China. Canopy hyperspectral data were collected at the flowering stage, and seven spectral indices (14 in total across order 0 and order 1) were calculated from the full-band hyperspectral reflectance. The RF model was applied to estimate the LAI of winter oilseed rape at flowering, with spectral indices derived from the full-band reflectance as inputs and the LAI of winter oilseed rape as the output. The accuracy of RF was compared with that of SVM and a back-propagation neural network (BPNN) to establish a more accurate LAI estimation model for winter oilseed rape. The results provide a valuable reference for estimating LAI in winter oilseed rape.

Overview of the Test Area
The experiment was carried out at the Water-saving Irrigation Experimental Station of the Water-saving Agriculture Research Institute of Northwest A & F University (34°14′ N, 108°10′ E, altitude 521 m); the location of the experimental area is shown in Figure 1. Shaanxi Province is an important rapeseed production base. The experimental area lies in the Guanzhong Plain irrigation area, which has a warm temperate, semi-humid monsoon climate; these climatic conditions strongly influence the growth and yield of winter rapeseed. The station is equipped with an irrigation system and a meteorological monitoring station, which provide the necessary conditions for the experiment. The winter oilseed rape variety 'Shanyou 18', bred by Northwest A & F University, was sown on 3 October 2021.

Data Acquisition and Processing
On 8, 11, 13, and 15 April 2022, ten oilseed rape plants were randomly selected in each plot for LAI measurement. The length L_ij and width B_ij of all leaves of the selected plants were measured with a tape measure, and the LAI of each plot was represented by the average of these measurements. A total of 64 samples were obtained, and LAI was calculated using the following equation [16]:
where α is the conversion factor, taken as 1.0; ρ is the planting density (plants/m²); n is the total number of leaves on the jth plant; and m is the number of plants measured.
The final data obtained are shown in Table 1.
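As an illustration of the plot-level calculation, the sketch below assumes the common length-width form LAI = α·ρ·(1/m)·Σ_j Σ_i (L_ij × B_ij), with leaf length and width recorded in cm and converted to m² (× 10⁻⁴) so that LAI is dimensionless. The exact form and units of the equation from [16] may differ; the function name `plot_lai` and the unit convention are hypothetical, and the sample numbers are made up.

```python
def plot_lai(plants, density, alpha=1.0):
    """Estimate plot LAI from leaf length/width measurements (hypothetical helper).

    plants  -- list of sampled plants; each plant is a list of (length_cm, width_cm)
    density -- planting density rho, in plants per m^2
    alpha   -- conversion factor (1.0 in the text)
    Assumed form: LAI = alpha * rho * (1/m) * sum_j sum_i L_ij * B_ij, with the
    leaf-area sum converted from cm^2 to m^2 so the result is in m^2/m^2.
    """
    if not plants:
        raise ValueError("need at least one sampled plant")
    m = len(plants)
    total_cm2 = sum(length * width for leaves in plants for length, width in leaves)
    mean_area_m2 = total_cm2 / m * 1e-4  # cm^2 -> m^2, averaged over plants
    return alpha * density * mean_area_m2


# Illustrative (made-up) measurements: two plants, 30 plants per m^2
sample = [[(20.0, 10.0), (15.0, 8.0)], [(18.0, 9.0)]]
print(round(plot_lai(sample, density=30), 4))
```

If the study's α is a leaf-shape coefficient rather than a pure unit factor, only the `alpha` argument changes; the averaging over the m sampled plants stays the same.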

Hyperspectral Data Acquisition and Processing
The canopy spectral reflectance of winter oilseed rape was measured using an ASD FieldSpec 3 backpack field hyperspectral spectrometer (Analytical Spectral Devices, Inc., Boulder, CO, USA); a detailed description of the instrument and its usage can be found in [17]. The instrument covers a wavelength range of 350 to 1830 nm, and measurements were made with a 1.5 m optical fiber. For each sample, 10 spectral curves were collected and averaged to represent the spectral reflectance of the sample [18]. During measurement, the fiber probe pointed vertically downward at approximately 1 m above the canopy. Data were collected between 10:00 and 14:00, yielding a total of 64 samples. To reduce the influence of background noise, baseline drift, stray light, and other irrelevant information on the hyperspectral reflectance curves, the spectral data were preprocessed with Savitzky-Golay convolution smoothing [19], using a quadratic polynomial and a 9-point window for fitting and filtering.
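The preprocessing chain described above (averaging the 10 scans per sample, 9-point quadratic Savitzky-Golay smoothing, and the first-order differential used later) can be sketched in plain Python as follows. The smoothing weights are the standard tabulated 9-point quadratic coefficients; the edge handling, the central-difference definition of the first derivative, and the 1 nm band spacing (`step_nm`) are assumptions, since the text does not specify them. The function names are hypothetical.

```python
# Tabulated Savitzky-Golay smoothing coefficients for a 9-point window
# with a quadratic polynomial (a cubic fit gives the same coefficients).
SG9_COEFFS = [-21, 14, 39, 54, 59, 54, 39, 14, -21]
SG9_NORM = 231


def average_scans(scans):
    """Band-by-band mean of repeated scans (e.g. the 10 curves per sample)."""
    return [sum(band) / len(scans) for band in zip(*scans)]


def savgol_smooth_9q(spectrum):
    """9-point quadratic Savitzky-Golay smoothing.

    The 4 points at each end are copied unchanged (no padding); this is one
    simple edge convention, not necessarily the one used in the study.
    """
    half = len(SG9_COEFFS) // 2
    out = list(spectrum)
    for k in range(half, len(spectrum) - half):
        window = spectrum[k - half:k + half + 1]
        out[k] = sum(c * r for c, r in zip(SG9_COEFFS, window)) / SG9_NORM
    return out


def first_derivative(spectrum, step_nm=1.0):
    """First-order differential spectrum via central differences (one-sided at the edges)."""
    n = len(spectrum)
    if n < 2:
        raise ValueError("need at least two bands")
    deriv = []
    for k in range(n):
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)
        deriv.append((spectrum[hi] - spectrum[lo]) / ((hi - lo) * step_nm))
    return deriv
```

A quadratic smoothing window reproduces any locally quadratic signal exactly, so it removes high-frequency noise while keeping the broad shape of the reflectance curve, including the red edge slope.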

Selection and Extraction of Vegetation Index
The leaves of healthy crops show three main physiological and spectral characteristics: (1) low reflectance in the visible range (400-750 nm) due to strong absorption by photosensitive pigments such as chlorophyll; (2) high reflectance in the near-infrared range (750-1300 nm) due to multiple scattering at the cell-air interfaces inside the leaves; and (3) low reflectance in the short-wave infrared range (1300-2500 nm) due to water absorption [20]. Vegetation spectral indices can therefore serve as indicators of plant physiological status. Based on the existing literature, seven spectral indices closely related to vegetation were selected: the ratio index (RI), triangular vegetation index (TVI), modified simple ratio (mSR), modified normalized difference index (mNDI), difference index (DI), normalized difference vegetation index (NDVI), and soil-adjusted vegetation index (SAVI). Their calculation formulas are presented in Table 2.
Table 2. Spectral indices of winter rapeseed at the flowering stage.
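For orientation only, the sketch below uses the common two-band forms of four of the indices (RI, DI, NDVI, and SAVI with the usual soil factor L = 0.5). These forms are assumptions rather than the formulas of Table 2; TVI, mSR, and mNDI are omitted because several variants exist in the literature. R_i and R_j denote reflectance (or first-derivative values) at two wavelengths i and j.

```python
def ri(r_i, r_j):
    """Ratio index: R_i / R_j."""
    return r_i / r_j


def di(r_i, r_j):
    """Difference index: R_i - R_j."""
    return r_i - r_j


def ndvi(r_i, r_j):
    """Normalized difference index: (R_i - R_j) / (R_i + R_j)."""
    return (r_i - r_j) / (r_i + r_j)


def savi(r_i, r_j, soil_l=0.5):
    """Soil-adjusted index: (1 + L)(R_i - R_j) / (R_i + R_j + L); L = 0.5 assumed."""
    return (1 + soil_l) * (r_i - r_j) / (r_i + r_j + soil_l)
```

First-derivative values can be zero or negative, so ratio and normalized indices computed from them may be undefined or change sign for some band pairs; any band-by-band search has to skip such pairs.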

Modeling Technique
From the above spectral indices, the five most correlated with LAI at each of the 0 and 1 orders were selected as model input variables. Three machine learning models, BPNN, SVM, and RF, were then used to model the LAI of winter rapeseed [23]. BPNN is an artificial neural network commonly used for classification, regression, and pattern recognition; here it was implemented with the MATLAB Neural Network Toolbox, which provides functions for constructing, training, testing, and evaluating neural networks. The hidden-layer transfer function was set to tansig, and the Levenberg-Marquardt algorithm (trainlm), based on numerical optimization theory, was used as the training function. After several rounds of trial training, the number of hidden-layer neurons was set to 15. Training is typically stopped when the cost function falls below a very small positive threshold or no longer decreases between iterations; at that point, training of the BP network is complete and the mapping between inputs and outputs is established [24]. RF is an ensemble learning method composed of multiple decision trees, each trained on a random sample of the training data using randomly selected feature subsets. Through parameter optimization and repeated training, the number of decision trees in the RF model was set to 500 [25]. SVM maps data that are not linearly separable in the input space into a high-dimensional feature space through a kernel function, where a hyperplane is used for classification or regression. In this study, the SVM penalty coefficient C and kernel parameter γ were set to 20 and 0.02, respectively [26].
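To make the RF description concrete, the sketch below is a minimal, standard-library random forest regressor: each tree is grown on a bootstrap sample, and each split considers a random subset of the input features. It is not MATLAB's implementation or the authors' code. Only `n_trees=500` comes from the text; `max_depth`, `min_leaf`, `max_features`, and `seed` are assumed values, and all names are hypothetical. SVM and BPNN are not sketched.

```python
import random
from statistics import mean


def _sse(ys):
    """Sum of squared deviations from the mean (regression node impurity)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def _grow(X, y, max_features, min_leaf, depth, max_depth, rng):
    """Grow a regression tree: a leaf is a number, a node is (feature, threshold, left, right)."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean(y)
    best = None  # (sse, feature, threshold)
    # Only a random subset of features is considered at this split
    for f in rng.sample(range(len(X[0])), max_features):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= thr]
            right = [yi for row, yi in zip(X, y) if row[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:
        return mean(y)
    _, f, thr = best
    go_left = [row[f] <= thr for row in X]
    children = []
    for side in (True, False):
        xs = [row for row, g in zip(X, go_left) if g == side]
        ys = [yi for yi, g in zip(y, go_left) if g == side]
        children.append(_grow(xs, ys, max_features, min_leaf, depth + 1, max_depth, rng))
    return (f, thr, children[0], children[1])


def _predict_tree(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def fit_forest(X, y, n_trees=500, max_features=None, min_leaf=2, max_depth=8, seed=0):
    """Bagged regression trees with random feature subsets (a minimal random forest)."""
    rng = random.Random(seed)
    if max_features is None:
        max_features = max(1, len(X[0]) // 3)  # common regression default (assumption)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        forest.append(_grow([X[i] for i in idx], [y[i] for i in idx],
                            max_features, min_leaf, 0, max_depth, rng))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return mean(_predict_tree(tree, x) for tree in forest)
```

With five spectral indices as inputs, each row of `X` would hold the five index values of one sample and `y` the measured LAI; averaging many decorrelated trees is what gives RF its robustness on small data sets such as the 42-sample modeling set.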

Data Processing and Model Evaluation
A total of 64 samples were obtained in this experiment. After one abnormal sample was removed, 2/3 of the data (42 samples) were randomly selected as the modeling set, and the remaining 1/3 (21 samples) were used as the validation set. The data were analyzed in MATLAB: the SVM and RF models were built with separate scripts, and the BPNN model with the Neural Network Toolbox. Three evaluation indices, R², RMSE, and MRE, were used to assess the accuracy of each model [27]; a higher R² (closer to 1) and lower RMSE and MRE indicate better prediction accuracy [28].
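The random 2/3 to 1/3 split can be sketched as below. The seed, the function name, and the use of sample indices as stand-ins for the real records are assumptions for illustration; the study does not state how the random selection was seeded.

```python
import random


def split_modeling_validation(samples, modeling_fraction=2 / 3, seed=42):
    """Randomly split samples into modeling and validation sets (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_model = round(len(shuffled) * modeling_fraction)
    return shuffled[:n_model], shuffled[n_model:]


# 64 samples minus one abnormal sample leaves 63; indices stand in for the records
modeling, validation = split_modeling_validation(range(63))
print(len(modeling), len(validation))  # 42 21
```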
The accuracy evaluation indices are calculated as follows [25]:
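The sketch below uses the standard definitions of the three indices, which are assumed here: R² = 1 - Σ(y_i - ŷ_i)² / Σ(y_i - ȳ)², RMSE = sqrt(Σ(y_i - ŷ_i)² / n), and MRE = (100/n) Σ |y_i - ŷ_i| / y_i, where y_i is measured LAI and ŷ_i predicted LAI. Some studies instead report R² as the squared Pearson correlation between measured and predicted values, which can differ from the form below.

```python
import math


def r2(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean square error, in the units of LAI (cm^2/cm^2)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(measured, predicted)) / len(measured))


def mre(measured, predicted):
    """Mean relative error, in percent."""
    return 100 * sum(abs(y - p) / y for y, p in zip(measured, predicted)) / len(measured)
```

For example, measured values [2, 4] with predictions [3, 3] give RMSE = 1, MRE = 37.5%, and R² = 0 (the predictions are no better than the mean).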

Spectral Index Construction and Optimal Spectral Index Band Combination Extraction
First, correlation matrix diagrams were constructed for the 14 spectral indices. In Figures 2 and 3, the color gradient from blue to red represents the correlation between each spectral index and winter rapeseed LAI, ranging from strongly negative to strongly positive [27]. Specifically, the hyperspectral reflectance was processed with 0-order and 1-order differentiation, the spectral indices were calculated band by band for every wavelength pair, and the correlation matrix method was used to assess their relationship with winter rapeseed LAI. For each index and order, the wavelength pair with the highest correlation coefficient was selected to construct the spectral index. The correlation coefficients between the 0-order and 1-order indices and winter rapeseed LAI all exceeded 0.310 (p < 0.01), indicating highly significant relationships. This suggests that all seven selected spectral indices can be used to estimate the LAI of winter rapeseed at the flowering stage. NDVI under first-order differentiation had the highest correlation with winter rapeseed LAI, with a coefficient of 0.734 at the wavelength combination (716, 724). The correlation coefficients of the spectral indices with winter rapeseed LAI ranked from highest to lowest as NDVI > RI > TVI > DI > SAVI > mNDI > mSR. The five most correlated indices, NDVI, RI, TVI, DI, and SAVI, were selected as the optimal band combinations, with corresponding wavelengths of (716, 724), (716, 724), (755, 725), (717, 753), and (717, 753). The band combinations for the 0-order spectral indices are presented in Table 3.
Note: The wavelength range with higher correlation lies between 670 and 760 nm. For each order, the five most correlated spectral index combinations are listed from highest to lowest correlation.
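The band-by-band correlation matrix search described above can be sketched as an exhaustive loop over wavelength pairs: compute the index for every sample, correlate it with LAI, and keep the pair with the largest absolute Pearson coefficient. This is a minimal sketch under stated assumptions: NDVI is used as the default index in its common two-band form, pairs whose index is undefined or constant are skipped, and `best_band_pair` is a hypothetical name. The same function works on 0-order reflectance or on first-derivative spectra.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (raises ZeroDivisionError if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ndvi(r_i, r_j):
    """Two-band normalized difference (assumed common form)."""
    return (r_i - r_j) / (r_i + r_j)


def best_band_pair(spectra, lai, wavelengths, index=ndvi):
    """Exhaustive band-pair search.

    spectra     -- one reflectance (or first-derivative) list per sample
    lai         -- measured LAI per sample
    wavelengths -- wavelength (nm) of each band
    Returns (|r|, r, wavelength_i, wavelength_j) for the pair with the largest |r|.
    """
    best = None
    n_bands = len(wavelengths)
    for i in range(n_bands):
        for j in range(n_bands):
            if i == j:
                continue
            try:
                values = [index(s[i], s[j]) for s in spectra]
                r = pearson(values, lai)
            except ZeroDivisionError:
                continue  # index undefined or constant for this pair
            if best is None or abs(r) > best[0]:
                best = (abs(r), r, wavelengths[i], wavelengths[j])
    return best
```

Storing every r instead of only the best one yields the full correlation matrix that Figures 2 and 3 visualize; the exhaustive search is what allows the optimal wavelengths (such as 716 and 724 nm) to differ from the fixed wavelengths used in earlier studies.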

Construction and Comparison of LAI Prediction Models for Winter Oilseed Rape
The optimal spectral index combination of each order was used as the independent variables and winter rapeseed LAI as the response variable, and SVM, RF, and BPNN were used to construct LAI estimation models of winter rapeseed at the flowering stage. Model accuracy was evaluated comprehensively using R², RMSE, and MRE. Table 4 presents the prediction results of the different models for winter oilseed rape LAI on the modeling and validation sets under 0-order and 1-order differential transformation. Under the 1st-order differential, the R² values of all three models are higher than under the 0th order, and the validation set R² values of all three models exceed 0.6, indicating that all three models fit well. Under the 1st-order differential, the RF model achieves a validation set R² of 0.810, which is 0.171 and 0.180 higher than those of the BPNN and SVM models, respectively. Its validation set RMSE of 0.455 cm²/cm² is 24.0% and 31.3% lower than those of BPNN and SVM, respectively, and its MRE of 10.465% is 30.5% and 33.9% lower. With the same input variables, the accuracy of the three models for winter oilseed rape LAI estimation ranks RF > BPNN > SVM. Therefore, 1st-order differential processing combined with RF is identified as the optimal differential order and modeling approach in this study. The LAI estimation model constructed with this approach achieves a modeling set R² of 0.705, RMSE of 0.725 cm²/cm², and MRE of 14.562%, and a validation set R² of 0.810, RMSE of 0.455 cm²/cm², and MRE of 10.465%. The evaluation results of the model are shown in Figure 4.
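The relative reductions quoted above can be turned back into approximate absolute values for the other two models, assuming that "X% lower" means (other - RF) / other = X, so other = RF / (1 - X), and that the R² gaps are absolute differences. The printed numbers are back-calculated approximations under that assumption, not values taken from Table 4; they do agree with the statement that all validation R² values exceed 0.6.

```python
def implied_other(rf_value, relative_reduction):
    """Back-calculate a competitor's metric: (other - rf) / other = p  =>  other = rf / (1 - p)."""
    return rf_value / (1 - relative_reduction)


rf_rmse, rf_mre, rf_r2 = 0.455, 10.465, 0.810  # RF validation metrics, 1st-order inputs
for name, d_rmse, d_mre, d_r2 in [("BPNN", 0.240, 0.305, 0.171),
                                  ("SVM", 0.313, 0.339, 0.180)]:
    print(name,
          round(implied_other(rf_rmse, d_rmse), 3),  # implied RMSE, cm^2/cm^2
          round(implied_other(rf_mre, d_mre), 2),    # implied MRE, %
          round(rf_r2 - d_r2, 3))                    # implied validation R^2
```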

Discussion
In calculating and selecting the spectral indices, all of the selected wavelength combinations fall within the red edge range, and the optimal wavelength positions are consistent with previous research [29]. The red edge refers to the steep rise in green-plant reflectance between 670 and 760 nm, and its position is the point where the reflectance increases fastest [30]. Studies have shown that the spectral curve of leaf chlorophyll has a red edge, whereas those of water and carotenoids do not. Because the red edge is highly sensitive to chlorophyll content, it reflects changes in chlorophyll content to the greatest extent [31]. LAI is closely related to leaf chlorophyll, because leaf growth depends on photosynthesis, in which chlorophyll participates; this is the underlying cause of LAI growth. When the best spectral index combinations were selected for modeling, the first-order differential spectral indices showed high predictive ability for LAI, and analysis of the model inputs showed that they are closely related to LAI. While filtering background noise, differential treatment also preserves the ability of the red edge bands to describe plant physical and chemical parameters [32]. Accordingly, the band combinations selected under each differential order that correlate very significantly with winter oilseed rape LAI lie mostly in the 670-760 nm range, consistent with previous studies [33]. The first-order differential spectral indices correlate better with LAI than the original ones. This may be because differential treatment of canopy reflectance data reduces the influence of baseline drift and background noise and enhances the spectral features of physiological and biochemical parameters in winter rapeseed. This improves the response of the spectrum to LAI and the extraction of LAI-related information, strengthens the correlation between LAI and the spectrum, and makes the model based on first-order differential spectral indices more adaptable to unknown samples and better able to characterize the growth status of winter rapeseed. Notably, baseline drift and noise interference are mostly non-stationary signals, and differentiation combined with spectral variable selection can effectively suppress background noise [34], thereby improving the correlation between spectrum and LAI.
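The link between the first-order differential and the red edge can be made concrete: the red edge position is commonly located as the wavelength of maximum first-derivative reflectance within 670-760 nm. The sketch below assumes that definition and a central-difference derivative; it is an illustration of the concept, not a calculation performed in this study, and `red_edge_position` is a hypothetical name.

```python
import math


def red_edge_position(wavelengths, reflectance, lo_nm=670, hi_nm=760):
    """Wavelength of maximum first-derivative reflectance within [lo_nm, hi_nm]."""
    best_wl, best_slope = None, float("-inf")
    for k in range(1, len(wavelengths) - 1):
        wl = wavelengths[k]
        if not lo_nm <= wl <= hi_nm:
            continue
        # Central-difference estimate of dR/d(lambda)
        slope = (reflectance[k + 1] - reflectance[k - 1]) / (wavelengths[k + 1] - wavelengths[k - 1])
        if slope > best_slope:
            best_wl, best_slope = wl, slope
    return best_wl


# Synthetic sigmoid-shaped canopy curve with its steepest rise placed at 715 nm
wls = list(range(650, 801))
refl = [0.05 + 0.45 / (1 + math.exp(-(w - 715) / 10)) for w in wls]
print(red_edge_position(wls, refl))
```

Because the derivative peaks exactly where reflectance rises fastest, band pairs taken from a first-derivative spectrum near this peak (such as 716 and 724 nm) respond strongly to changes in red edge shape.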
Among the three modeling methods used in this study, the RF-based winter rapeseed LAI model achieved the highest accuracy, indicating an advantage of RF over the other models for estimating winter rapeseed LAI [35]; this is consistent with previous crop LAI studies. Research has shown that the modeling method significantly affects the prediction accuracy of estimation models [36]. The lower accuracy of the SVM model can be attributed to limitations in the selection of the kernel function and related parameters [37]. The weaker performance of the BPNN model may be due to the small sample size or an improper choice of activation function, which limits its generalization ability. Optimizing kernel functions and parameter settings, increasing the number of training samples, and selecting appropriate activation functions are therefore key to improving the accuracy of SVM and BPNN models in future research.
Using hyperspectral data to model winter rapeseed LAI yielded promising results in this study. However, the applicability of the model is limited because only data from a single growth stage, the flowering stage, were used for model construction. Future research should therefore incorporate hyperspectral data from different growth stages as input variables. Exploring other machine learning methods could also help test and improve the model and balance estimation accuracy against generality. Such work could support prediction of winter rapeseed growth status across multiple growth stages and contribute to sustainable energy development.

Conclusions
This study focused on the LAI of winter rapeseed at the flowering stage. The LAI and canopy hyperspectral reflectance of winter rapeseed were measured during this growth stage, the reflectance data were processed with 0-order and 1-order differential transformations, and band-by-band spectral indices were calculated. The correlation matrix method was used to determine the optimal wavelengths for constructing spectral indices of each order. Finally, LAI prediction models for winter rapeseed at the flowering stage were constructed from these optimal spectral indices using three machine learning models: RF, SVM, and BPNN. The following conclusions were drawn:
(1) Compared with the original hyperspectral reflectance, the optimal spectral indices extracted after first-order differential transformation correlated significantly better with winter rapeseed LAI. The average correlation coefficient between the spectral indices and winter rapeseed LAI under first-order treatment was 1.5% higher than that of the original spectral indices. Among them, NDVI(716, 724) showed the highest correlation, with a correlation coefficient of 0.734.
(2) With the same modeling method, the winter rapeseed LAI prediction models based on first-order differential inputs are more accurate than those based on the original spectra. On the other hand, when keeping the input variables the same and changing the modeling method, the accuracy of the winter

Figure 1. Geographical location of the test area.

Figure 4. Correlation between the predicted and measured values of winter rapeseed LAI for 0-order and 1-order inputs based on BPNN, RF, and SVM.

Table 1. Statistical results of winter rapeseed LAI data at the flowering stage.

Table 3. Spectral index wavelength combinations at different differential orders.

Table 4. Comparison of model prediction accuracy under different differential orders.