Retrieval of Leaf Chlorophyll Contents (LCCs) in Litchi Based on Fractional Order Derivatives and VCPA-GA-ML Algorithms

Accurate estimation of leaf chlorophyll content (LCC) is an important basis for assessing the photosynthetic activity and potential nutrient status of litchi. Hyperspectral remote sensing data have been widely used in agricultural quantitative monitoring research for the non-destructive assessment of LCC. Variable selection is crucial when analyzing such high-dimensional datasets because of the high risk of overfitting, long processing times, and substantial computational requirements. In this study, the performance of five machine learning regression algorithms (MLRAs) was investigated based on the hyperspectral fractional order derivative (FOD) reflectance of 298 leaves together with the variable combination population analysis (VCPA)-genetic algorithm (GA) hybrid strategy for estimating the LCC of litchi. The results showed that the 0.8-order derivative spectrum had the highest correlation with LCC (r = 0.9179, p < 0.01). The VCPA-GA hybrid strategy exploits the strengths of both VCPA and GA while compensating for their limitations when the number of variables is large. Moreover, the model developed using the 14 sensitive bands selected from the 0.8-order hyperspectral reflectance data had the lowest root mean square error in prediction (RMSEP = 5.04 μg·cm−2). Among the five MLRAs, validation results confirmed that the ridge regression (RR) algorithm applied to the 0.2-order data was the most effective for estimating the LCC, with a coefficient of determination (R2) of 0.88, a mean absolute error (MAE) of 3.40 μg·cm−2, a root mean square error (RMSE) of 4.23 μg·cm−2, and a ratio of performance to inter-quartile distance (RPIQ) of 3.59. This study indicates that a hybrid variable selection strategy (VCPA-GA) combined with MLRAs is very effective in retrieving the LCC from hyperspectral reflectance at the leaf scale. The proposed methods could further provide a scientific basis for the hyperspectral band setting of different remote sensing platforms, such as unmanned aerial vehicles (UAVs) and satellites.


Plants 2023, 12, 501

Introduction
Litchi, a typical subtropical evergreen fruit tree, is an important source of income for farmers in southern China, for example in Guangdong Province. The timely and rapid monitoring of the growth and nutrition of this crop is conducive to precise field management [1]. Chlorophyll absorbs sunlight and uses its energy to synthesize carbohydrates from CO2 and H2O. It plays an important role in vegetation stress, photosynthetic capacity, and physiological status and thus affects primary production and crop harvest [2][3][4][5]. In addition, the leaf chlorophyll content (LCC) is closely related to the nitrogen (N) content [6,7] and can be used as a close proxy for the N concentration at the leaf level [8,9], so it also reflects the nutritional status of crops. The laboratory chemical measurement of LCC is destructive and relatively time- and labor-intensive, and it can hardly meet the practical demands of precise crop management in large or regional fields [10]. Thus, it is crucial to develop quick, non-destructive, and efficient techniques that can deliver precise LCC estimates.
With the advancement of remote sensing techniques, hyperspectral remote sensing data, with their abundance of information, spectral continuity, and rich hidden characteristics, have been widely used to monitor crop chlorophyll content non-destructively and accurately [9,11]. However, there is a significant risk of over-fitting when modeling spectral data with a large number of wavelength variables and relatively few samples, which leads to poor or unreliable predictions from multivariable estimation models. Therefore, efficient variable (feature) selection techniques have taken center stage in the analysis of hyperspectral remote sensing data. By alleviating the curse of dimensionality, variable selection can yield faster and more cost-effective models, improve predictive performance, and make the resulting models easier to understand and justify [12]. Yun et al. [13] confirmed the importance and necessity of variable selection in complex analytical systems. In recent decades, a growing number of researchers have worked on this problem and proposed many variable selection algorithms, which can be grouped into four types: (1) wavelength point-based selection algorithms, which treat each wavelength variable as a unit, differ in four factors (variable initialization, modeling method, evaluation indicator, and selection strategy), and finally select the best combination of variables, such as the successive projections algorithm (SPA) [14], Monte Carlo uninformative variables elimination (MC-UVE) [15], competitive adaptive reweighted sampling (CARS) [16], and variable combination population analysis (VCPA) [17]; (2) wavelength interval selection algorithms, which treat each wavelength interval as a unit and select the best combination of intervals according to different search strategies. Each interval is composed of several contiguous variables, which is consistent with the continuous band characteristics of vibrational and rotational spectra and makes the models more interpretable; examples include interval partial least-squares (iPLS) [18], interval random frog (iRF) [19], Fisher optimal subspace shrinkage (FOSS) [20], and the interval variable iterative space shrinkage approach (iVISSA) [21]; (3) hybrid variable selection algorithms, which combine two or three existing algorithms to exploit their respective advantages, such as CARS-SPA [22] and iPLS-SPA [23]; and (4) improved variable selection algorithms, which modify at least one of the four factors (variable initialization, modeling method, evaluation indicator, and selection strategy), such as stability competitive adaptive reweighted sampling (SCARS) [24] and variable permutation population analysis (VPPA) [25].
Leaf reflectance is an efficient basis for determining the LCC [26][27][28], since an increase or reduction in LCC produces more or less absorption in the blue and red wavelengths, which ultimately alters the spectral reflectance of leaves. In recent years, hyperspectral reflectance data have been used in several studies to estimate LCC at various scales based on the response of leaf reflectance to LCC (Table 1). Current research on crop LCC mainly analyzes differences in LCC inversion from two perspectives: the spatial scale effect and broad- versus narrow-band spectral resolution. Remote sensing data acquisition platforms have expanded from aerospace and aviation to low altitude, and LCC inversion models have progressed from traditional empirical models, such as linear regression (LR), to physical models, such as PROSPECT, and then to hybrid inversion models using machine learning algorithms (MLAs). However, the studies mentioned above use only the original spectral reflectance or its integer-order transformations, such as first and second derivatives, and ignore the information contained between these orders, which may result in the loss of crucial information and a decline in model accuracy. Zhang et al. [29] analyzed the correlation between fractional order derivative (FOD) hyperspectral reflectance and heavy metal content in maize leaves and found that FODs can expand the selection space of sensitive bands. Moreover, most studies use a single variable selection approach, and few have considered the potential interaction effects of variables through random combinations.
Hence, to address the above difficulties, this study proposed the use of machine learning regression algorithms (MLRAs) with hyperspectral reflectance data for litchi LCC estimation. The main objectives of this study are as follows: (1) to explore the impact of FODs on litchi leaf spectra and comparatively analyze the correlation between the litchi LCC and FOD spectra based on Pearson's correlation; (2) to explore the hybrid variable selection algorithm, VCPA coupled with the genetic algorithm (GA), and its potential application in retrieving the LCC of litchi; (3) to develop MLRAs and evaluate the accuracy of the optimal litchi LCC estimation model based on FOD-VCPA-GA.
Figure 1a displays the leaf spectral curves of litchi with various LCCs. As shown in this figure, the reflectance curves of litchi leaves with different LCCs included one reflection peak (about 550 nm) and two absorption valleys (450 nm and 670 nm) in the 400-780 nm (visible) range. Chlorophyll, which strongly absorbs blue and red light and strongly reflects green light, is primarily responsible for this property [39]. The leaf reflectance in the vicinity of 550 nm gradually dropped as the LCC increased. In the range of 670-750 nm, the reflectance rose steeply, and as the LCC increased, the reflectance curve of litchi leaves shifted toward longer wavelengths. Beyond 750 nm, there were no obvious variations in the leaf reflectance of litchi with various LCCs. At 1450 nm and 1950 nm, there were two absorption valleys that were mostly caused by leaf water content. The spectral features of litchi leaves described above were comparable to those of green plant leaves in general.

Correlation Analysis between LCC and FOD Spectra
Correlation analysis indicates whether a linear relationship exists between two variables, how strong it is, and whether it is positive or negative, all of which can be read from the correlation coefficient (r). In this study, Pearson's correlation coefficients between the LCC and the FOD spectra (0-2 order) were calculated and tested at the 0.01 significance level (|r| > 0.1465). The full results are plotted in Figure 1b. The positions of the bands with positive and negative associations with LCC shifted as the order increased, and they were primarily distributed in the visible and near-infrared (VIS-NIR) range (400-900 nm). For the original spectral (0 order) data, the reflectance in the 400-497 nm, 665-679 nm, and 756-900 nm regions was positively correlated with the LCC, while the reflectance in the 498-664 nm and 680-755 nm ranges was negatively correlated. The maximum absolute value of the correlation coefficient occurred at 709 nm (r = −0.8542). Table 2 displays the number of bands that passed the 0.01 significance test (0-2 order). As shown in Table 2, the overall number of bands passing the 0.01 significance test and the number of bands positively correlated with the LCC decreased as the order increased, while the number of bands negatively correlated with the LCC generally increased first and then decreased. The correlation coefficient reached its greatest value at 756 nm of the 0.8 order (r = 0.9179), followed by 720 nm of the 1.8 order (r = 0.9020) and 723 nm of the 1.6 order (r = 0.9018). These bands all lie in the red-edge region, which is an important indicator region for describing the state of plant pigments. In conclusion, the correlation analysis showed that the correlation between FOD spectra and the LCC of litchi was stronger than that of the commonly used first- and second-order derivatives, and it is worthwhile to further investigate the potential of FOD spectra for estimating LCC.
Tb, Pb, and Nb refer to the number of total, positive, and negative correlation bands that passed the 0.01 significance test, respectively (400-2400 nm); rmax refers to the maximum absolute value of correlation coefficient.
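The band-wise screening described in this section can be illustrated with a short sketch. It assumes the spectra are stored as a samples × bands array; the function names `band_correlations` and `summarize` are hypothetical, and the default threshold is simply the critical value quoted in the text, applied to the absolute value of r.

```python
import numpy as np


def band_correlations(spectra, lcc):
    """Pearson r between every band (column of `spectra`) and the LCC vector."""
    spectra = np.asarray(spectra, dtype=float)
    lcc = np.asarray(lcc, dtype=float)
    xc = spectra - spectra.mean(axis=0)          # center each band
    yc = lcc - lcc.mean()                        # center the LCC values
    denom = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = (xc * yc[:, None]).sum(axis=0) / denom
    return np.nan_to_num(r)                      # constant bands get r = 0


def summarize(r, r_crit=0.1465):
    """Counts in the style of Table 2 (Tb, Pb, Nb, rmax).

    r_crit is the critical value quoted in the text; using |r| > r_crit
    for both signs is an assumption of this sketch.
    """
    significant = np.abs(r) > r_crit
    return {
        "Tb": int(significant.sum()),
        "Pb": int((significant & (r > 0)).sum()),
        "Nb": int((significant & (r < 0)).sum()),
        "rmax": float(np.abs(r).max()),
    }
```

Applying `band_correlations` to each FOD-transformed dataset in turn and passing the result to `summarize` would give one row of counts per order.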

Performance of VCPA-GA Hybrid Strategy for Variable Selection
A VCPA-GA hybrid strategy was proposed to further optimize and extract sensitive band information from the spectra in the 400-900 nm range. Figure 2 shows the distribution of the sensitive bands screened using the VCPA-GA hybrid strategy. As illustrated in Figure 2, variable selection is a critical and necessary step for the LCC estimation models: the spectral regions selected using VCPA-GA were similar, but the number of sensitive bands was greatly reduced, with the majority concentrated around 590 nm, 760 nm, and 840 nm. The spectral reflectance near 590 nm and 760 nm was strongly related to the LCC, which was broadly consistent with the results of the Pearson correlation analysis.

Table 3 shows the statistical results of the VCPA-GA hybrid strategy for the 0-2-order datasets, including the number of selected variables (Nvar), the number of optimal PLS latent variables (Nlvs), the root mean square error in calibration (RMSEC), the root mean square error in cross validation (RMSECV), and the root mean square error in prediction (RMSEP). As seen in Table 3, the number of selected sensitive bands showed no clear trend as the order increased. The 0.2 order retained the most bands (Nvar = 54), while the original spectrum retained the fewest (Nvar = 5). The prediction performance of the 0.8 order (RMSEP = 5.04) was better than that of the other orders, followed by that of the 1.4 order (RMSEP = 5.24) and the 1.8 order (RMSEP = 5.25). The FOD spectrum therefore has potential for determining the LCC of litchi, and variable selection is a crucial and necessary step in FOD spectral data mining. The VCPA-GA hybrid strategy can exploit the benefits of both the VCPA and GA algorithms and is a considerable improvement for FOD spectral variable selection.
Nvar and Nlvs refer to the number of selected variables and the number of optimal PLS latent variables. RMSEC, RMSECV, and RMSEP refer to the root mean square error in calibration, the root mean square error in cross validation, and the root mean square error in prediction.

MLRAs for Estimating the LCC of Litchi
After selecting the best sensitive band combination for each of the 0-2-order derivative datasets through the VCPA-GA hybrid strategy, five machine learning regression models were constructed for estimating the LCC of litchi. The training, testing, and validation results of the MLRAs are shown in Table 4. For the training set, the XGBoost model performed best for all datasets of the 0-2 order, with R2 reaching 0.99, followed by the RF (R2: 0.85~0.92) and SVR (R2: 0.83~0.88) models. Among them, the XGBoost model trained on the 0.2-order derivative data gave the lowest MAE and RMSE values (MAE = 1.21 µg·cm−2, RMSE = 1.70 µg·cm−2), followed by XGBoost on the 0.4 order (MAE = 2.06 µg·cm−2, RMSE = 2.75 µg·cm−2) and RF on the 1.6 order (MAE = 2.42 µg·cm−2, RMSE = 3.19 µg·cm−2). No clear pattern was found for the testing set. The MAE values of SVR and GPR were typically high for all 0-2-order spectral datasets, and the RR model of the 1.8 order performed best on the testing set (R2 = 0.85, MAE = 3.59 µg·cm−2, RMSE = 4.67 µg·cm−2).
The validation of the MLRAs for predicting the LCC was conducted using an independent dataset (n = 47). As with the training and testing sets, the validation performance varied between orders and models, and it remained largely stable at the 0.2 order across the five MLRAs in terms of R2, MAE, RMSE, and RPIQ. The rankings were as follows: RR (R2

Study Area
Guangdong is the most important litchi-producing area in China, ranking first among all provinces and regions in the country in both cultivation area and output. In this study, two commercial 'Guiwei' litchi orchards, normally operated by local farmers, were selected as the study area (Figure 4). One (Litchi orchard 1) was located in Yangxi County of Yangjiang City (111°22′-111°48′ E, 21°29′-21°55′ N), and the other (Litchi orchard 2) was in Dianbai District of Maoming City (110°54′-111°29′ E, 21°22′-21°59′ N). Both areas have a subtropical monsoon climate with sufficient sunshine and abundant rainfall. The annual average temperature is about 23 °C, the vegetation is evergreen, and flowers are in bloom year-round. Litchi is one of the specialties of the two places. Data collection was carried out at the flower bud differentiation stage (28 December 2020) and the flowering stage (19 March 2021). The selected trees were in good condition.

Hyperspectral Measurements and Preprocessing
In total, 49 'Guiwei' litchi trees (25 in Yangxi County and 24 in Dianbai District) were selected, and the longitude and latitude of each tree were recorded using a GPS. Six leaves from each litchi tree were collected and put into fresh-keeping bags for later spectral measurements and chlorophyll extraction. Hyperspectral data for the litchi leaves were measured using an ASD FieldSpec3 spectrometer (Analytical Spectral Devices, Inc., Boulder, CO, USA) [5] with a range of 350-2500 nm. To reduce the influence of the solar altitude angle, the spectral measurements were carried out between 10:00 and 14:00 Beijing time in cloudless, sunny weather. Every 3-5 min, the spectral reflectance was calibrated using a standard white reference panel (25 cm × 25 cm, 100% reflectance). Ten spectral curves were collected for each leaf sample, with a measurement interval of 0.1 s, and their average was taken as the spectral data of that leaf sample. In total, 294 leaves were collected, giving 294 sets of data. One set was removed because of data damage, so 293 sets of data were used for the analysis.

The edge bands 350-399 nm and 2401-2500 nm, which had high optical noise, were removed [40]. The remaining spectral curves, taken as the original reflectance spectrum, were smoothed using the Savitzky-Golay filtering method [41]. Then, the fractional order derivative (FOD) of the smoothed spectral data was calculated with the Grünwald-Letnikov (G-L) algorithm as shown in Equation (1) [42] using a program in Matlab R2021a (The MathWorks Inc.: Natick, MA, USA).
In Equation (1), Γ is the Gamma function, x is the value of the corresponding point, m is the difference between the upper and lower bounds of the differential, and v is the order, which was varied from 0 to 2 in steps of 0.2 in this study. In addition, v = 0 indicates that the spectral data comprise the original reflectance.
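As an illustration of how a G-L fractional derivative can be evaluated on a sampled spectrum, the sketch below uses the standard binomial-weight recurrence with a unit step between adjacent bands. It is written in Python rather than Matlab, the function names `gl_weights` and `gl_derivative` are hypothetical, and the unit step and the use of all preceding bands as the memory length are assumptions of this sketch, not details confirmed by the text.

```python
import numpy as np


def gl_weights(order, n_terms):
    """G-L weights w_k = (-1)^k * C(order, k) for k = 0 .. n_terms - 1.

    Uses the recurrence w_0 = 1, w_k = w_{k-1} * (k - 1 - order) / k,
    which avoids evaluating the Gamma function directly.
    """
    w = np.empty(n_terms)
    w[0] = 1.0
    for k in range(1, n_terms):
        w[k] = w[k - 1] * (k - 1 - order) / k
    return w


def gl_derivative(reflectance, order):
    """Fractional derivative along the last (band) axis with unit step.

    out[x] = sum_k w_k * f[x - k], using every band to the left of x.
    """
    f = np.asarray(reflectance, dtype=float)
    n = f.shape[-1]
    w = gl_weights(order, n)
    out = np.zeros_like(f)
    for k in range(n):
        # shift the spectrum right by k bands and accumulate the k-th term
        out[..., k:] += w[k] * f[..., : n - k]
    return out


# The eleven orders considered in the study: 0, 0.2, ..., 2.0
ORDERS = [round(0.2 * i, 1) for i in range(11)]
```

With this formulation, order 0 returns the input unchanged, and orders 1 and 2 reduce to the ordinary first and second backward differences, which is a convenient sanity check.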

Determination of the LCC
In this study, a SPAD-502 Plus portable chlorophyll meter (Konica Minolta, Osaka, Japan) was used to measure the leaf chlorophyll content of litchi. Since the value read from the SPAD-502 Plus is unitless, it was converted into LCC (µg·cm−2) using Equation (2) [43].
The chlorophyll content of the selected trees ranged from 12.44 to 73.95 µg·cm−2. The descriptive statistics of leaf chlorophyll content are presented in Figure 5.

VCPA-GA Hybrid Strategy for Variable Selection
VCPA is a relatively new variable selection algorithm. It first uses an exponentially decreasing function (EDF) to determine how many variables remain after each run. In each EDF run, binary matrix sampling (BMS) [44] is used to create a population of different variable combinations. Then, using model population analysis (MPA) [45], the top 10% of the sub-models, i.e., those with the lowest cross validation root mean square error (RMSECV), are used to identify the most promising variables. When all EDF runs are finished, VCPA searches the combinations of the 14 remaining variables to obtain the best variable subset. GA uses selection, exchange (crossover), and mutation operators to mimic the natural selection and genetic mechanisms of the biological world. Through successive genetic iterations, the variables with better objective function values are retained, and the variables with worse objective function values are deleted until the desired results are obtained. GA has been widely used in feature variable screening [46].
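The GA stage described above can be sketched as follows. This is a minimal illustration only, not the authors' Matlab implementation: it covers only the GA half of the hybrid, it scores each candidate subset by the RMSECV of ordinary least squares rather than PLS to stay dependency-free, and the function names (`cv_rmse`, `ga_select`) and all parameter values are hypothetical.

```python
import numpy as np


def cv_rmse(X, y, mask, n_folds=5):
    """RMSECV of ordinary least squares restricted to the columns in `mask`."""
    if not mask.any():
        return np.inf                      # an empty subset cannot be fitted
    Xs = np.column_stack([np.ones(len(y)), X[:, mask]])
    sq_err = 0.0
    for test_idx in np.array_split(np.arange(len(y)), n_folds):
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        coef, *_ = np.linalg.lstsq(Xs[train], y[train], rcond=None)
        sq_err += np.sum((Xs[test_idx] @ coef - y[test_idx]) ** 2)
    return float(np.sqrt(sq_err / len(y)))


def ga_select(X, y, pop_size=30, n_gen=40, p_mut=0.05, seed=0):
    """Return (best boolean mask, its RMSECV) after n_gen generations."""
    rng = np.random.default_rng(seed)
    n_var = X.shape[1]
    pop = rng.random((pop_size, n_var)) < 0.5
    pop[0] = True                          # the full variable set is one candidate
    fitness = np.array([cv_rmse(X, y, m) for m in pop])
    for _ in range(n_gen):
        children = [pop[np.argmin(fitness)].copy()]   # elitism: keep the best
        while len(children) < pop_size:
            # selection: two binary tournaments, lower RMSECV wins
            a, b = rng.integers(pop_size, size=2)
            p1 = pop[a] if fitness[a] <= fitness[b] else pop[b]
            a, b = rng.integers(pop_size, size=2)
            p2 = pop[a] if fitness[a] <= fitness[b] else pop[b]
            # exchange (crossover): single cut point
            cut = rng.integers(1, n_var)
            child = np.concatenate([p1[:cut], p2[cut:]])
            # mutation: flip each bit with probability p_mut
            flip = rng.random(n_var) < p_mut
            children.append(np.where(flip, ~child, child))
        pop = np.array(children)
        fitness = np.array([cv_rmse(X, y, m) for m in pop])
    best = int(np.argmin(fitness))
    return pop[best], float(fitness[best])
```

Because the best individual is always carried over unchanged, the RMSECV of the returned subset can never be worse than that of the full variable set included in the initial population.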
The two main steps of the VCPA-GA hybrid method are shown in Figure 6. The details of this strategy are described in Yun et al. [47]. The dataset was divided into a calibration set (193 samples) and an independent test set (100 samples). Once model establishment and variable selection were completed on the calibration set, the independent test set was used to verify the calibration model. Partial least squares (PLS) was employed as the modeling technique. The optimal number of PLS latent variables was determined using 5-fold cross validation (CV) over a range of 1 to 10. All data were mean-centered before modeling so that the mean of each column was zero. Fifty replications of VCPA-GA (ω = 100, where ω is the number of variables left for GA) were performed in order to assess the model's repeatability and produce statistical results. All calculations were implemented using MATLAB (Version 2021a, The MathWorks, Inc.) on a desktop computer equipped with a 12th Gen Intel(R) Core(TM) i9-12900H 2.50 GHz CPU and 32 GB of RAM, and the operating system was Windows 11.
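The calibration procedure above (mean-centering, PLS, and 5-fold CV over 1 to 10 latent variables) can be sketched with a compact NIPALS PLS1 routine. This is an illustrative Python version under stated assumptions, not the Matlab code used in the study: the function names are hypothetical, the folds are formed by a seeded random permutation, and the demonstration data are synthetic.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """PLS1 via NIPALS on mean-centered data; returns (coef, x_mean, y_mean)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)             # weight vector
        t = Xk @ w                         # scores
        p = Xk.T @ t / (t @ t)             # X loadings
        qk = (yk @ t) / (t @ t)            # y loading
        Xk = Xk - np.outer(t, p)           # deflate X
        yk = yk - qk * t                   # deflate y
        W.append(w); P.append(p); q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (np.asarray(X, dtype=float) - x_mean) @ coef + y_mean


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def select_n_lv(X, y, max_lv=10, n_folds=5, seed=0):
    """Return (best number of latent variables, RMSECV for each candidate)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    max_lv = min(max_lv, X.shape[1])
    curve = []
    for n_lv in range(1, max_lv + 1):
        pred = np.empty(len(y))
        for test_idx in folds:
            train = np.setdiff1d(np.arange(len(y)), test_idx)
            pred[test_idx] = pls1_predict(pls1_fit(X[train], y[train], n_lv), X[test_idx])
        curve.append(rmse(y, pred))
    return int(np.argmin(curve)) + 1, curve


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(293, 14))                     # synthetic stand-in data
    y = X[:, :3] @ np.array([4.0, -2.0, 1.0]) + 40 + rng.normal(size=293)
    cal, test = np.arange(193), np.arange(193, 293)    # 193 / 100 split
    n_lv, curve = select_n_lv(X[cal], y[cal])
    model = pls1_fit(X[cal], y[cal], n_lv)
    print("Nlvs:", n_lv)
    print("RMSEC:", rmse(y[cal], pls1_predict(model, X[cal])))
    print("RMSECV:", curve[n_lv - 1])
    print("RMSEP:", rmse(y[test], pls1_predict(model, X[test])))
```

The three error statistics differ only in which samples are predicted: RMSEC uses the calibration samples the model was fitted on, RMSECV uses held-out folds of the calibration set, and RMSEP uses the independent test set.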

The Evaluation of the Proposed MLRMs
For this study, hyperspectral sensitive bands selected using a VCPA-GA hybrid strategy were taken as independent variables with LCCs as dependent variables. Then, 293 measured LCC values were randomly divided into three parts: 187 as a training set, 59 as a testing set and 47 as a validation set for validating model performance, as shown in Figure 5.
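A minimal sketch of this evaluation workflow is given below: a seeded random split into 187/59/47 samples, a closed-form ridge regression fit (ridge regression is one of the five MLRAs considered in this study), and the four metrics reported in the validation. The function names and the penalty value are hypothetical, R2 is computed here as 1 − SSE/SST, and RPIQ is taken as the inter-quartile range of the observed values divided by RMSE; both definitions are assumptions, since the text does not spell them out.

```python
import numpy as np


def split_indices(n, sizes=(187, 59, 47), seed=0):
    """Randomly split n sample indices into training, testing, and validation sets."""
    if sum(sizes) != n:
        raise ValueError("split sizes must add up to the number of samples")
    order = np.random.default_rng(seed).permutation(n)
    train, test, val = np.split(order, np.cumsum(sizes)[:-1])
    return train, test, val


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def metrics(y_true, y_pred):
    """R2, MAE, RMSE, and RPIQ for one set of predictions."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    q1, q3 = np.percentile(y_true, [25, 75])
    return {
        "R2": float(1 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)),
        "MAE": float(np.mean(np.abs(resid))),
        "RMSE": float(rmse),
        "RPIQ": float((q3 - q1) / rmse),
    }
```

In use, a model would be fitted on the training indices, tuned against the testing indices, and scored once on the validation indices, for example `metrics(y[val], X[val] @ coef + intercept)`.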
Five MLRAs were selected to explore and analyze hyperspectral reflectance data for LCC modeling based on their fast training, strong performance, and popularity in different application fields. These five MLRAs were ridge regression (RR), random forest (RF), extreme gradient boosting (XGBoost), support vector regression (SVR), and Gaussian process regression (GPR). Here, RR [48] is a biased estimation regression method designed for the analysis of collinear data. It is essentially an enhanced least squares estimation technique: by giving up the unbiasedness of the least squares method, at the expense of losing some information and lowering accuracy, it yields regression coefficients that are more practical and dependable. As for the RF model [49], a decision tree is built on each sample drawn from the training data using the bootstrap resampling approach, and the average of the values predicted by all the decision trees is used as the final prediction result. XGBoost [50] is a distributed gradient boosting toolkit that has been tuned for high performance, flexibility, and portability. It implements gradient-boosted decision trees (GBDT). Being more than ten times faster than standard toolkits, it is now the
The two main steps of the VCPA-GA hybrid method are shown in Figure 6. This strategy's specifics was described in Yun et al. [47]. A calibration set (193 samples) and an independent test set (100 samples) were created from the dataset. Once the model establishment and variable selection were completed in the calibration set, an independent test set was used to verify the calibration model. As a modeling technique, partial least square (PLS) was employed. Using 5-fold cross validation (CV) with a range of 1 to 10, the ideal number of PLS latent variables was determined. All data were centered before preprocessing so that the mean of each column would be zero. Fifty replications of VCPA-GA (ɷ = 100, ɷ is the number of variables left for GA) were performed in order to assess the model's repeatability and produce statistical results. All calculations were implemented using MATLAB (Version 2021a, the MathWorks, Inc) on a desktop computer equipped with an 12th Gen Intel(R) Core (TM) i9-12900H 2.50 GHz CPU and 32GB of RAM memory, and the operating system was Windows 11.

The Evaluation of the Proposed MLRMs
In this study, the hyperspectral sensitive bands selected using the VCPA-GA hybrid strategy were taken as independent variables and the LCC as the dependent variable. The 293 measured LCC values were randomly divided into three parts: 187 as a training set, 59 as a testing set, and 47 as a validation set for assessing model performance, as shown in Figure 5.
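A random 187/59/47 partition of this kind can be sketched as follows. The use of scikit-learn's `train_test_split`, the random seed, and the synthetic arrays (14 placeholder bands) are assumptions for illustration; the text does not state how the random split was generated.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 293 leaves, 14 selected bands (placeholder values)
rng = np.random.default_rng(42)
X = rng.normal(size=(293, 14))
y = rng.uniform(10, 80, size=293)

# First split off the 187 training samples, then divide the remaining 106
# samples into 59 testing samples and 47 validation samples.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=187, random_state=42
)
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, train_size=59, random_state=42
)

print(len(y_train), len(y_test), len(y_val))  # 187 59 47
```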
Five MLRAs were selected to explore and analyze the hyperspectral reflectance data for LCC modeling based on their fast training, strong performance, and popularity in different application fields. These five MLRAs were ridge regression (RR), random forest (RF), extreme gradient boosting (XGBoost), support vector regression (SVR), and Gaussian process regression (GPR). RR [48] is a biased estimation regression method designed for the analysis of collinear data. It is essentially a modified least squares estimation technique: by giving up the unbiasedness of the least squares method, it yields regression coefficients that are more practical and reliable, at the expense of losing some information and lowering accuracy. In the RF model [49], a decision tree is built for each sample drawn from the training data by bootstrap resampling, and the average of the predictions of all decision trees is used as the final prediction. XGBoost [50] is a distributed gradient boosting toolkit optimized for performance, flexibility, and portability. It implements gradient-boosted decision trees (GBDT) and, being reported as more than ten times faster than standard toolkits, is among the fastest and most widely used open-source boosted tree toolkits. SVR [35] maps the training samples into a high-dimensional space, so that a nonlinear problem in the low-dimensional space becomes a linear problem in the high-dimensional space, and then fits a linear model there. Here, a radial basis function kernel was used for this mapping. GPR [51] is a nonparametric model that performs regression analysis using Gaussian process priors within a Bayesian framework. By training on observed data, it converts the prior distribution into a posterior model and produces predictions with an associated uncertainty. The above five MLRAs were implemented using the scikit-learn Python package.
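A minimal sketch of how such a set of regressors might be assembled in scikit-learn is given below. All hyperparameter values are placeholders rather than the settings used in the study, the data are synthetic, and scikit-learn's `GradientBoostingRegressor` is used as a labeled stand-in for XGBoost, which is distributed as the separate `xgboost` package.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_models(seed=0):
    """Return the five regressors; hyperparameters are illustrative placeholders."""
    gpr = GaussianProcessRegressor(
        kernel=RBF() + WhiteKernel(), normalize_y=True, random_state=seed
    )
    return {
        "RR": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        "RF": RandomForestRegressor(n_estimators=100, random_state=seed),
        # Stand-in for XGBoost: a gradient-boosted decision tree ensemble
        "GBDT": GradientBoostingRegressor(random_state=seed),
        "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
        "GPR": make_pipeline(StandardScaler(), gpr),
    }


# Synthetic stand-in data: 14 selected bands, 187 training + 59 testing samples
rng = np.random.default_rng(0)
X = rng.normal(size=(246, 14))
y = 40.0 + X @ rng.normal(size=14) + rng.normal(scale=0.5, size=246)
X_train, y_train, X_test, y_test = X[:187], y[:187], X[187:], y[187:]

scores = {}
for name, model in build_models().items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R2 on the held-out samples
    print(f"{name}: R2 = {scores[name]:.3f}")
```

Standardizing the inputs before RR, SVR, and GPR is a design choice of this sketch, since these three methods are sensitive to feature scale while the tree ensembles are not.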
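The agreement between the measured and predicted LCC values was evaluated using the coefficient of determination (R²), mean absolute error (MAE), root mean square error (RMSE), and ratio of performance to inter-quartile distance (RPIQ) generated during prediction (Equations (3)-(6)):

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{3}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \tag{4}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \tag{5}$$

$$\mathrm{RPIQ} = \frac{Q3 - Q1}{\mathrm{RMSE}} \tag{6}$$

where n is the number of samples, y_i is the ith measured LCC, ŷ_i is the ith estimated LCC, ȳ is the mean measured LCC, and Q1 and Q3 are the first and third quartiles of the measured LCC, respectively.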
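The four evaluation metrics can be computed directly with NumPy, as in the sketch below. The standard definitions are assumed (R² as one minus the ratio of residual to total sum of squares, RPIQ as the inter-quartile range of the measured values divided by the RMSE); the function name `regression_metrics` and the example values are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, RMSE, and RPIQ for measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    mae = np.mean(np.abs(resid))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    # Q1 and Q3 are taken from the measured values
    q1, q3 = np.percentile(y_true, [25, 75])
    return {"R2": r2, "MAE": mae, "RMSE": rmse, "RPIQ": (q3 - q1) / rmse}


# Small hypothetical example (values in ug/cm2)
m = regression_metrics([10, 20, 30, 40, 50], [12, 18, 33, 39, 50])
print({k: round(float(v), 3) for k, v in m.items()})
```

Because RPIQ scales the error by the spread of the measured values, it remains comparable across datasets whose LCC distributions are not normal.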

Discussion
The LCC is a key indicator of a crop's physiological status, and changes in it can be used to assess a crop's photosynthetic capacity, growth and development stage, nutrition, anthropogenic or environmental stress, diseases, and pests. Hyperspectral remote sensing technology has become a non-destructive way to estimate the LCC and can provide detailed information about how the spectral reflectance characteristics of vegetation differ from those of soil, water, and other ground objects. Numerous spectral transformation techniques have been studied in the past, such as integer derivatives, continuum-removal transformations, and mathematical transformations. Integer derivatives are particularly good at enhancing absorption features, lowering background noise, and eliminating baseline drifts [52]. However, they cannot capture gradual changes in slope or curvature and may therefore miss information that is useful for the target variable. Recently, the FOD has received increasing attention in the processing of hyperspectral data because it widens the selection space for sensitive bands. In this study, we calculated the derivatives of the spectral reflectance of litchi leaves for orders from 0 to 2 in increments of 0.2. Pearson correlation analysis showed that the absolute value of the correlation coefficient between the derivative spectrum and the LCC reached its maximum for the 0.8-order derivative at 756 nm (r = 0.9179; Table 2). The proposed VCPA-GA hybrid strategy performed best on the FOD datasets. In particular, the hybrid variable selection strategy generalized well, with RMSEP values of 5.04, 5.24, and 5.25 µg·cm−2 for the LCC using the 0.8-, 1.4-, and 1.8-order spectral data, respectively (Table 4). Compared with that of the first- and second-order derivatives, the accuracy of the LCC estimation model based on the FOD was significantly improved. A possible explanation is that, compared to integer-order spectral data, the FOD spectra offer a better balance among spectral resolution, spectral information, and noise.
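The order scan described above can be sketched in a few lines of Python. The sketch assumes the Grünwald-Letnikov definition of the fractional derivative, a 1 nm sampling interval, and synthetic spectra with a hypothetical LCC-dependent red-edge position; none of these details are taken from the measurements in this study.

```python
import numpy as np


def gl_weights(alpha, n_terms):
    """Grunwald-Letnikov weights w_k = (-1)^k * binom(alpha, k), k = 0..n_terms-1."""
    w = np.ones(n_terms)
    for k in range(1, n_terms):
        w[k] = w[k - 1] * (k - 1 - alpha) / k
    return w


def fractional_derivative(spectrum, alpha, step=1.0):
    """Order-alpha derivative of a 1-D spectrum sampled at a constant interval."""
    f = np.asarray(spectrum, dtype=float)
    w = gl_weights(alpha, f.size)
    # out[i] = sum_{k=0}^{i} w[k] * f[i - k]; values before the first band count as 0
    return np.convolve(f, w)[: f.size] / step ** alpha


def band_correlations(X, y):
    """Pearson r between each column (band) of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc * yc[:, None]).sum(axis=0) / np.sqrt(
        (Xc ** 2).sum(axis=0) * (yc ** 2).sum()
    )


# Synthetic stand-in data: 50 leaves, 501 bands (400-900 nm at 1 nm)
rng = np.random.default_rng(0)
wavelengths = np.arange(400, 901)
lcc = rng.uniform(20, 80, size=50)
red_edge = 700 + 0.3 * lcc[:, None]  # hypothetical LCC-dependent red-edge position
spectra = 0.05 + 0.45 / (1 + np.exp(-(wavelengths - red_edge) / 15))
spectra += rng.normal(scale=0.002, size=spectra.shape)

orders = np.round(np.linspace(0, 2, 11), 1)  # 0, 0.2, ..., 2.0
best = {}
for alpha in orders:
    fod = np.array([fractional_derivative(s, alpha) for s in spectra])
    r = band_correlations(fod, lcc)
    i = int(np.argmax(np.abs(r)))
    best[float(alpha)] = (int(wavelengths[i]), float(r[i]))

for alpha, (wl, r) in best.items():
    print(f"order {alpha:.1f}: |r| is largest at {wl} nm (r = {r:.3f})")
```

For orders 0, 1, and 2 the weights reduce to the identity, the backward first difference, and the second difference, so the integer-order derivatives are special cases of the same formula.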
The findings of our research are, to a certain extent, consistent with previous studies. Cui et al. [53] investigated the potential of the FOD for estimating soil copper content and found that the model using the 0.8-order FOD spectra performed best, with an R² and RPD of 0.6416 and 1.63, respectively, for the validation set. Jin and Wang [54] constructed hyperspectral indices from FOD spectra to retrieve the leaf mass per area (LMA), and their results showed that the 0.3-order FOD indices provided the highest accuracy for tracing LMA while being the least sensitive to random noise. In short, FOD spectra are generally superior, or at least comparable, to the original reflectance or the first- and second-order derivatives and could further promote the practical application of hyperspectral remote sensing in estimating plant physiological and biochemical parameters. Thus, we suggest that FOD analysis is an efficient way to identify the best band combination, which could be applied to a large measurement database covering a wide variety of plant leaves and field conditions from various remote sensing platforms.
Variable selection plays a key role in eliminating irrelevant or uninformative variables and reducing the dimensionality of hyperspectral data. Yun et al. [47] used VCPA-based hybrid strategies with iteratively retaining informative variables (IRIV) and GA to select optimized variables in near-infrared (NIR) spectral datasets for beer, cotton, and tablets. Their findings demonstrated that, compared with other approaches, VCPA-IRIV and VCPA-GA significantly improve model prediction performance and that the modified VCPA step is a very effective way of removing uninformative variables. This provides methodological support for our study. In this study, VCPA gradually reduced the number of variables based on the EDF until the hyperspectral bands had been reduced to an optimized subset. Then, a modified version of VCPA was combined with GA to create a hybrid variable selection approach that overcomes the limitation of GA when handling a large number of variables. The hybrid strategy also helps to overcome a problem of VCPA itself, namely that it selects too few variables: the original VCPA selects fewer than 14 variables, although it contains a component that continuously shrinks the variable space. GA, in turn, is a useful optimization tool but has a number of limitations when working with many variables. The litchi hyperspectral dataset contained 501 variables from 400 nm to 900 nm, and finding the ideal variable subset with GA alone in such a large variable space would be exceedingly challenging. When the modified VCPA was used as the initial step, the variable space decreased from 501 to 100 variables, making it much simpler to identify the ideal variable subset in this highly compressed and optimized space. It is clear from Table 3 that the RMSEC and RMSECV decrease as the order increases, indicating that the variable space is constantly optimized.
Additionally, the combination of 0.2-order derivative sensitive bands chosen by VCPA-GA gave the best accuracy for LCC prediction using the RR model. Compared with previous studies, our research showed that the suggested VCPA-GA hybrid approach can be successfully applied to hyperspectral reflectance with FODs. It could also help maintain the accuracy of the MLRAs and avoid model overfitting.
MLRAs, such as SVR, RF, the back-propagation neural network (BPNN), and the kernel-based extreme learning machine (KELM), have been widely used for estimating crop biochemical properties [32,33,36,55]. In our study, to investigate and evaluate FOD spectral data optimized using the VCPA-GA approach for litchi LCC modeling, five MLRAs were developed, taking into account their quick training, good performance, and popularity in numerous application areas. A comparison revealed that the accuracy of the models differed among the FOD spectra of various orders. Among them, the RR model based on the 0.2-order derivative spectra estimated the LCC of litchi well. The performance of GPR and XGBoost closely followed that of RR (Table 4) in terms of R², RMSE, and RPIQ. The stochastic gradient boosting used in XGBoost helps prevent overfitting and improves prediction accuracy, which may explain its high accuracy. Additionally, because the XGBoost ensemble combines a number of decision trees, it can handle noisy data. There are numerous instances in which the XGBoost model has been used effectively to predict soil characteristics and nutrients [56,57]. Future research could also look into combining radiative transfer models (RTMs) and machine learning algorithms to accurately estimate the chlorophyll content at both the leaf and canopy scales, in addition to investigating other advanced machine learning techniques, such as stochastic gradient boosting (SGB), Cubist (CB), and deep learning.

Conclusions
In this study, we investigated the performance of five MLRAs and assessed the potential of fractional order derivatives and a VCPA-GA hybrid variable selection strategy to enhance the hyperspectral estimation of litchi LCC. Compared with the common first and second derivatives, the correlation coefficient between the FOD spectrum and LCC was improved, reaching 0.9179 at the 0.8 order (756 nm), followed by the 1.8 order (0.9020, 720 nm) and the 1.6 order (0.9018, 723 nm). The VCPA-GA hybrid method built on VCPA's ability to continuously shrink the variable space and combined it with GA for further optimization. To evaluate this hybrid approach, hyperspectral datasets (0-2 order) of litchi leaves were used. The findings demonstrated that the VCPA-GA hybrid strategy fully utilizes the benefits of both VCPA and GA while compensating for their shortcomings. It addresses VCPA's tendency to select too few variables and removes GA's limitations when working with a large number of variables. Additionally, compared to the commonly used first- and second-order derivatives, this hybrid strategy performs noticeably better with FOD spectral data, demonstrating the effectiveness of employing FOD spectral data to compress and optimize the variable space. As a result, VCPA-GA is an effective alternative variable selection approach for FOD spectral data.
From the performance of the MLRAs, we found that the training performance of the XGBoost algorithm was the best for the 0 order, with the highest R² (0.99) and the lowest MAE (0.53 µg·cm−2) and RMSE (0.71 µg·cm−2). During validation, RR showed the highest accuracy at the 0.2 order, with R² = 0.88, MAE = 3.40 µg·cm−2, RMSE = 4.23 µg·cm−2, and RPIQ = 3.59. It is important to note that the VCPA-GA hybrid method is a general framework that may be combined with other optimization or variable selection strategies to obtain even better results. Although it was applied in this study to hyperspectral datasets of litchi leaves, it might also be used with other high-dimensional datasets at other scales, including the canopy, landscape, and regional scales.