A Novel Approach for Estimation of Above-Ground Biomass of Sugar Beet Based on Wavelength Selection and Optimized Support Vector Machine

Abstract: Timely diagnosis of sugar beet above-ground biomass (AGB) is critical for the prediction of yield and optimal precision crop management. This study established an optimal quantitative prediction model of AGB of sugar beet by using hyperspectral data. Three experiment campaigns in 2014, 2015 and 2018 were conducted to collect ground-based hyperspectral data at three different growth stages, across different sites, for different cultivars and nitrogen (N) application rates. A competitive adaptive reweighted sampling (CARS) algorithm was applied to select the wavelengths most sensitive to AGB. This was followed by developing a novel modified differential evolution grey wolf optimization algorithm (MDE–GWO), which introduces the differential evolution algorithm (DE) and a dynamic non-linear convergence factor into the grey wolf optimization algorithm (GWO), to optimize the parameters c and γ of a support vector machine (SVM) model for the prediction of AGB. The prediction performance of SVM models under the three optimization methods (GWO, DE–GWO and MDE–GWO) was examined for CARS-selected wavelengths and for the whole spectral data. Results showed that CARS reduced the number of wavelengths by 97.4% for the rapid growth stage of leaf cluster, 97.2% for the sugar growth stage and 97.4% for the sugar accumulation stage. Models built after CARS wavelength selection were more accurate than models developed using the entire spectral data. The best prediction accuracy was achieved after the MDE–GWO optimization of SVM model parameters for the prediction of AGB in sugar beet, independent of growing stage, years, sites and cultivars.
The best coefficient of determination (R²), root mean square error (RMSE) and residual prediction deviation (RPD) ranged, respectively, from 0.74 to 0.80, 46.17 to 65.68 g/m² and 1.42 to 1.97 for the rapid growth stage of leaf cluster, 0.78 to 0.80, 30.16 to 37.03 g/m² and 1.69 to 2.03 for the sugar growth stage, and 0.69 to 0.74, 40.17 to 104.08 g/m² and 1.61 to 1.95 for the sugar accumulation stage. It can be concluded that the proposed methodology can be implemented for the prediction of AGB of sugar beet using proximal hyperspectral sensors under a wide range of environmental conditions.
(CARS) and support vector machine (SVM) modeling after SVM parameter optimization using three optimization algorithms. A novel modified differential evolution grey wolf optimization algorithm (MDE–GWO) was used and compared to two existing algorithms. Results showed that SVM models developed using CARS-selected wavelengths after parameter optimization by MDE–GWO provided the best prediction accuracy for AGB for each of the three growth stages. These SVM models outperformed the corresponding models using the full spectral range.


Introduction
Sugar beet is one of the most important crops for sugar production, with the sugar stored in its roots. As the development of root (below-ground) and leaf (above-ground) biomass is closely correlated, above-ground biomass (AGB) is considered an essential parameter for plant growth status, yield and harvest quality [1,2]. Therefore, accurate estimation of AGB is essential for sugar improved algorithm of convergence factor to optimize the parameters of SVM that is hoped to result in more accurate prediction results.
This paper is the first to evaluate the feasibility of a novel MDE-GWO algorithm for improving the prediction accuracy of SVM models of AGB in sugar beet. The objectives of this study are (1) to determine the most important wavelengths for the assessment of AGB in sugar beet, (2) to develop a nonlinear convergence factor for DE-GWO to improve the prediction accuracy of SVM model and (3) to demonstrate the feasibility of MDE-GWO for the optimization of SVM models.

Experimental Design and Crop Growing
Three experiments were conducted in 2014, 2015 and 2018 at different locations in Inner Mongolia Autonomous Region, China. They were laid out in a randomized complete block split-plot design with one factor (N level), as shown in Figure 1. In 2014, the study area was located in Tai Pingdi town (119°24′E, 42°29′N) of Song Shan District, Chi Feng city, and included seven levels of N (0, 15, 32.5, 76, 108.5, 163 and 217.5 kg/hm²) with four replicates. In 2015, the study area was located in an experimental farm (111°41′E, 40°48′N) of the Inner Mongolia Agricultural University, in Hohhot city, and included four levels of N (0, 80, 120 and 200 kg/hm²) with three replicates. In 2018, the study area was located in Ma Heli village (111°13′E, 40°38′N) of Tumd Left Banner, in Hohhot city, and included six levels of N (0, 70, 90, 116, 130 and 150 kg/hm²) with four replicates. The N treatments were randomly assigned to plots (Figure 1), each with an area of approximately 50 m² (5 m by 10 m). Sugar beet was transplanted with 25 cm by 50 cm spacing. All plots were fertilized with 1.2 kg/plot potassium chloride and 3.8 kg/plot calcium superphosphate. The entire amount of phosphorus, potassium and nitrogen fertilizer was applied prior to seeding as basal fertilizer. Other detailed management information is shown in Table 1. For disease and pest control, pesticides were applied following the local standard practices. Sugar beet was grown once a year; the cropping season started in May (transplanting) and ended in October (harvesting).

Measurements
All measurements were made during three growth stages (the rapid growth stage of leaf cluster, the sugar growth stage and the sugar accumulation stage), which are the critical stages for the diagnosis of fertilizer requirement as well as for yield prediction. Detailed information about data collection is shown in Table 2. Hyperspectral images (HSI) of the sugar beet canopy were recorded using a hyperspectral line-scanning spectrometer (Imspecim V10E, Oulu, Finland), with a scanning field of view of 40°, under windless, cloudless and appropriate sunshine conditions around midday (10:00-14:00 LST). The spectral range of the sensor is from 383 to 1003 nm, with a spectral resolution (full width at half maximum (FWHM)) of 2.8 nm. The sensor was held stably 1 m above the canopy by a triangular frame with a nadir view (Figure 2). For each spectral measurement, two scans were performed per plot at the same location where the plants were sampled for AGB assessment with the traditional method, which was necessary to reduce error. The image spatial resolution was set to 1628 pixels by 428 pixels. The exposure time was 5 ms, and the electronic control platform rotated the sensor at a rate of 0.36 degrees per second. The average spectral sampling interval of the data was less than 1 nm in the range of 383-1003 nm. Therefore, a hypercube with dimensions of 1628 (x axis) by 428 (y axis) by 854 wavelengths (z axis) was obtained. Hyperspectral data were recorded for the three growth stages. Considering the scanning area of the spectrometer and the different sizes of sugar beet during each growth stage, the number of sugar beet plants sampled per plot varied per stage: four for the rapid growth stage of leaf cluster and the sugar growth stage, and two for the sugar accumulation stage. In total, 168 samples were taken in 2014, 72 samples in 2015 and 144 samples in 2018. Therefore, the total number of sugar beet samples obtained during the 3-year experimental period was 384.
Percent plant reflectance was derived as the ratio of reflected radiance to incident radiance, estimated from the white reference of a white standard panel and black references (dark current signal), which were taken prior to each reflectance measurement. The reflectance was calibrated using the following formula [25,26]:

$$R_b = \frac{R_0 - B}{W - B} \tag{1}$$

where $R_0$ is the raw spectral intensity, $R_b$ is the corrected spectral intensity, $W$ is the calibrated spectral intensity of the white board and $B$ is the calibrated spectral intensity obtained by covering the camera lens completely with a black cap.
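As an illustration of the white/dark reference correction described above, the following is a minimal sketch that assumes the usual form R_b = (R_0 − B)/(W − B), applied band by band. The function name `calibrate_reflectance` and the input values are hypothetical and are not measurements from this study.

```python
def calibrate_reflectance(raw, white, dark):
    """Apply a white/dark reference correction band by band.

    raw, white and dark are equal-length sequences holding the raw intensity,
    the white-board intensity and the dark-current intensity of each band.
    """
    corrected = []
    for r0, w, b in zip(raw, white, dark):
        if w == b:
            # The correction is undefined when both references coincide.
            raise ValueError("white and dark references must differ in every band")
        corrected.append((r0 - b) / (w - b))
    return corrected


# Hypothetical digital numbers for two bands (illustrative only).
print(calibrate_reflectance([50.0, 120.0], [200.0, 220.0], [0.0, 20.0]))
```

Multiplying the result by 100 would express it as percent reflectance, if that convention is preferred.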

Above-Ground Biomass (AGB) Measurement
Sugar beet samples were collected in each plot. Immediately after HSI measurement of the sugar beet canopy, samples were divided into two parts: the above-ground part (leaves and stems) and the under-ground part (root tubers). Samples were weighed for total fresh weight and then, for logistic reasons, a sub-sample of about 50% of the total fresh weight was selected randomly from the above-ground part and brought back to the laboratory, where the dry weight of the sub-sample was recorded after oven drying at 80 °C to constant weight. Then, the AGB in g/m² was calculated based on the transplanting spacing of sugar beet, using the following equation:

$$\mathrm{AGB} = \frac{D_p \times F_T}{F_p \times A} \tag{2}$$

where $D_p$ is the dry weight (g) of the part sample brought back to the laboratory, $F_T$ and $F_p$ are the fresh weights (g) of the total sample and the part sample, $N$ is the number of plants in the total sample and $A$ is the area (m²) of the total sample, calculated from $N$ and the row spacing and plant spacing of sugar beet.

Data Analysis and Modeling
Remote Sens. 2020, 12, 620
Five square regions of interest (ROI) of 400 pixels each, covering the top, middle and bottom parts of the leaf, were selected randomly from the sugar beet HSI in the ENVI 5.3 software to calculate the mean reflectance spectrum (Figure 3). ROIs were chosen from different positions on the leaf because, owing to external environmental factors, nitrogen content is unevenly distributed across different parts of leaves (including those of sugar beet). Because the spectral regions of 383-389 nm and 991-1003 nm were highly noisy, they were removed and only the 390-990 nm range was used for subsequent data analysis. The datasets collected for each growth stage in each year were randomly separated into two sub-datasets: a calibration set (50% of observations) and a validation set (50% of observations). The calibration set of each stage then pooled the calibration sub-datasets of the three years, whereas validation was carried out on the samples of each individual year to verify the accuracy of the AGB calibration model. In other words, 64 samples per growth stage were used to build the calibration models, whereas 28, 12 and 24 samples (validation set) were used in 2014, 2015 and 2018, respectively, to validate the prediction models (Table 3).
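The calibration and validation partition described above can be sketched as follows. This is only an illustration of a per-year random 50/50 split with pooled calibration data; the function name `split_by_year`, the seed and the integer sample identifiers are assumptions, and the per-stage sample counts (56, 24 and 48) are derived from the totals reported for each year divided by the three growth stages.

```python
import random


def split_by_year(samples_by_year, seed=0):
    """Randomly split each year's samples 50/50 for one growth stage.

    Returns the pooled calibration set (all years together) and a dict
    holding a separate validation set for each year.
    """
    rng = random.Random(seed)
    calibration = []
    validation = {}
    for year, samples in samples_by_year.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        calibration.extend(shuffled[:half])  # pooled across years
        validation[year] = shuffled[half:]   # kept separate per year
    return calibration, validation


# Hypothetical sample identifiers for a single growth stage.
stage_samples = {
    2014: list(range(0, 56)),
    2015: list(range(100, 124)),
    2018: list(range(200, 248)),
}
calibration, validation = split_by_year(stage_samples)
print(len(calibration), {year: len(v) for year, v in validation.items()})
```

With these counts the pooled calibration set holds 64 samples and the validation sets hold 28, 12 and 24 samples, matching the numbers given in the text.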
Table 3 (validation sets). Three studied growth stages: 28 samples (2014), 12 samples (2015) and 24 samples (2018). The slash indicates no data.
In this paper, models for the prediction of AGB were developed using both full spectra and selected wavelengths. CARS was first applied to select the most sensitive wavelengths to AGB for three growth stages. Three optimization methods of grey wolf optimization (GWO), differential evolution-GWO (DE-GWO) and modified DE-GWO (MDE-GWO) were used to optimize SVM parameters, c and γ. A support vector machine (SVM) was finally used to predict AGB using the full spectra and selected wavelengths by CARS. The main steps of the HSI prediction of AGB in sugar beet followed in this study are shown in Figure 4.

Competitive Adaptive Reweighted Sampling Algorithm (CARS)
The literature shows that using all variables contained in the spectra does not always result in the best prediction accuracy, and it increases the computational cost. Selecting a subset of wavelength variables can therefore both improve prediction accuracy and reduce the computational cost. CARS was adopted in this study to select the most significant wavelengths for AGB. It is a variable selection algorithm that imitates the "survival of the fittest" principle of Darwin's theory of evolution [27]. In CARS, each wavelength variable is considered as an individual, and individuals that contribute little to prediction accuracy are gradually eliminated. During wavelength selection, an exponentially decreasing function (EDF) is first used to forcibly remove the wavelengths with relatively small absolute regression coefficients. Then, adaptive reweighted sampling (ARS) is employed to further eliminate wavelengths in a competitive way, retaining individuals with larger absolute regression coefficients from a partial least squares (PLS) regression model, so that multiple subsets of wavelength variables are obtained. Finally, the subset with the lowest root mean squared error of cross-validation (RMSE) was selected as the optimal set of wavelengths for further analysis. More detailed information about the principle and algorithm of CARS can be found in the open literature [28].
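To illustrate the forced-elimination stage, the sketch below computes how many wavelengths an exponentially decreasing function would retain in each sampling run. It assumes the commonly published form r_i = a·exp(−k·i), with a and k chosen so that all wavelengths are kept in the first run and two remain in the last run. The function names and the figure of 800 wavelengths are hypothetical; the ARS and PLS steps are not shown.

```python
import math


def edf_ratios(n_wavelengths, n_runs):
    """Fraction of wavelengths retained in runs 1..n_runs.

    Assumed form: r_i = a * exp(-k * i), with r_1 = 1 and r_N = 2 / p.
    """
    a = (n_wavelengths / 2) ** (1 / (n_runs - 1))
    k = math.log(n_wavelengths / 2) / (n_runs - 1)
    return [a * math.exp(-k * i) for i in range(1, n_runs + 1)]


def retained_counts(n_wavelengths, n_runs):
    """Number of wavelengths kept by the EDF in each sampling run."""
    return [round(r * n_wavelengths) for r in edf_ratios(n_wavelengths, n_runs)]


# Hypothetical setting: 800 wavelengths and 50 sampling runs.
counts = retained_counts(800, 50)
print(counts[:5], counts[-5:])
```

The counts fall steeply in the early runs and only slowly later on, which is the two-phase behaviour (fast selection followed by refined selection) that CARS relies on.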

Grey Wolf Optimization Algorithm (GWO)
Although SVM is one of the most efficient methods for creating reliable quantitative models of key crop properties, its accuracy, stability and generalization are determined by two parameters, the penalty factor (c) and the kernel parameter (γ), whose best values change with the data [16]. An optimization approach is therefore needed to tune these parameters with the aim of maximizing the performance of SVM. Grey wolf optimization (GWO) is a meta-heuristic algorithm proposed by Mirjalili et al. [29] that imitates the hierarchical mechanism (four-level hierarchy) and the hunting mechanism of the grey wolf pack. It offers strong convergence with few input parameters and is easy to implement. Like other bionic algorithms, GWO has a strict mechanism of synergy within the group. In each iteration, the leader wolves are selected through competition within the group. Guided by the leader wolves, the pack continually approaches the prey and attempts to find better prey through collaborative communication. In the algorithm, the position of each grey wolf corresponds to a possible solution. The alpha (α) wolves, the leaders of the pack, represent the best solution of the problem. The beta (β) wolves, the second most eligible candidates for the position of α, and the delta (δ) wolves, who obey the orders of α, represent the second and third best solutions, respectively. The lowest level of grey wolves is the omega (ω) wolves, whose main responsibility is to balance the internal relations of the population.
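The hunting mechanism of a grey wolf pack includes three successive steps: encircling, hunting and attacking. To encircle the prey, the position of each individual wolf in the pack in each iteration is modeled as detailed in Equations (3)-(7) [30]:

$$D = |C \cdot X_p(t) - X(t)| \tag{3}$$

$$X(t+1) = X_p(t) - A \cdot D \tag{4}$$

$$A = 2a \cdot r_1 - a \tag{5}$$

$$C = 2 \cdot r_2 \tag{6}$$

$$a = 2 - \frac{2t}{T} \tag{7}$$

where $t$ is the current iteration and $T$ is the maximum iteration, $X_p$ is the position of the prey, $X$ is the position of a grey wolf, $r_1$ and $r_2$ are random numbers in [0, 1] and $a$ is the convergence factor. For hunting, the distances between each wolf and the three leaders are calculated using Equations (8)-(10), and the position of the wolf is updated using Equations (11)-(14):

$$D_\alpha = |C_1 \cdot X_\alpha - X|, \quad D_\beta = |C_2 \cdot X_\beta - X|, \quad D_\delta = |C_3 \cdot X_\delta - X| \tag{8-10}$$

$$X_1 = X_\alpha - A_1 \cdot D_\alpha, \quad X_2 = X_\beta - A_2 \cdot D_\beta, \quad X_3 = X_\delta - A_3 \cdot D_\delta \tag{11-13}$$

$$X(t+1) = \frac{X_1 + X_2 + X_3}{3} \tag{14}$$

For attacking, when |A| < 1 the grey wolves attack the prey; otherwise, when |A| > 1, the grey wolves expand the region to search for prey. In this paper, the GWO algorithm was employed for parameter optimization because few operators and parameters need to be adjusted. The prey is the optimal value of the parameters. However, for high-dimensional or multi-objective optimization problems, the GWO algorithm is prone to falling into a local optimum, with low optimization accuracy.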
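The encircling, hunting and attacking steps can be illustrated with a minimal GWO sketch for a generic objective function. It assumes the standard update rules with a linearly decreasing convergence factor and, as a simplification, keeps the three best positions found so far as the leaders. The function name `gwo_minimize`, its arguments and the toy objective are hypothetical; this is not the authors' Matlab implementation, and no SVM is involved.

```python
import random


def gwo_minimize(objective, bounds, n_wolves=30, max_iter=200, seed=0):
    """Minimize `objective` inside the box `bounds` with a basic GWO."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    leaders = []  # best three positions so far: alpha, beta, delta

    for t in range(max_iter):
        # Rank the current wolves together with the previous leaders.
        leaders = sorted(leaders + wolves, key=objective)[:3]
        a = 2.0 - 2.0 * t / max_iter  # linear convergence factor, 2 -> 0
        new_wolves = []
        for x in wolves:
            position = []
            for j, (lo, hi) in enumerate(bounds):
                estimate = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a  # |A| > 1 explores, |A| < 1 attacks
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - x[j])   # distance to this leader
                    estimate += leader[j] - A * D
                # Average of the three leader-guided moves, kept inside the bounds.
                position.append(min(max(estimate / 3.0, lo), hi))
            new_wolves.append(position)
        wolves = new_wolves

    alpha = min(leaders + wolves, key=objective)
    return alpha, objective(alpha)


# Toy objective with its minimum at (3, -1); it stands in for the SVM error.
def toy_objective(p):
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


best, fitness = gwo_minimize(toy_objective, [(-10.0, 10.0), (-10.0, 10.0)])
print(best, fitness)
```

In the setting of this paper the objective would be the RMSE of an SVM trained with the candidate (c, γ) pair, so each evaluation is far more expensive than in this toy example.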

Differential Evolution Algorithm (DE)
The differential evolution (DE) algorithm, a multi-objective (continuous variable) optimization algorithm, was proposed by Storn and Price [31] on the basis of evolutionary ideas. The main idea of DE is to evolve the population based on the differences between individuals. The DE algorithm runs in four successive steps: initialization, mutation, crossover and selection. To start with, a population of NP candidate solutions is randomly created [32]:

$$X_{i,j} = X_j^{\min} + a_j \left( X_j^{\max} - X_j^{\min} \right), \quad i = 1, 2, \ldots, NP, \quad j = 1, 2, \ldots, D \tag{15}$$

where $a_j$ is a uniformly distributed random number within the range [0, 1], regenerated for each value of $j$, $D$ is the dimension of each solution vector and $X_j^{\max}$ and $X_j^{\min}$ are the upper and lower bounds of the $j$-th decision parameter, respectively.
The mutation operator creates mutant vectors $V_i$ by perturbing a randomly selected vector $X_{r1}$ with the difference of two other randomly selected vectors $X_{r2}$ and $X_{r3}$, according to the following equation:

$$V_i^{G+1} = X_{r1}^{G} + F \left( X_{r2}^{G} - X_{r3}^{G} \right) \tag{16}$$

where $G$ is the generation number of the population, $i$, $r1$, $r2$ and $r3$ ∈ {1, 2, ..., NP} are randomly chosen and must be different from each other, and $F$ ∈ [0, 2] is the scaling factor, which adjusts the size of the perturbation vector $X_{r2}^{G} - X_{r3}^{G}$ and improves algorithm convergence. The crossover process in DE produces a trial vector $U_i$ as follows:

$$U_{j,i}^{G+1} = \begin{cases} V_{j,i}^{G+1}, & \text{if } rand_{j,i} \le C_R \text{ or } j = j_{rand} \\ X_{j,i}^{G}, & \text{otherwise} \end{cases} \tag{17}$$

where $rand_{j,i}$ denotes a random number within the range [0, 1], generated anew for each value of $j$. The crossover constant $C_R$, chosen from within the range [0, 1], is an algorithm parameter that controls the diversity of the population and helps the algorithm to escape from local minima. $j_{rand}$ is an integer randomly generated within the range [1, D], which ensures that the trial vector differs from the current individual. The selection operator forms the next population by choosing between the trial vectors and their predecessors (target vectors), keeping the individual with the better fitness according to Equation (18):

$$X_i^{G+1} = \begin{cases} U_i^{G+1}, & \text{if } f\left(U_i^{G+1}\right) \le f\left(X_i^{G}\right) \\ X_i^{G}, & \text{otherwise} \end{cases} \tag{18}$$

Although the DE algorithm has strong global search ability, its performance is very sensitive to parameter changes and its local search ability is insufficient. Therefore, DE was combined with GWO in this work so that the two algorithms compensate for each other's weaknesses.
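The four DE steps (initialization, mutation, crossover and selection) can be sketched for a generic objective as follows. The function name `differential_evolution`, the default scaling factor of 0.5 and the toy objective are assumptions made for illustration. As a simplification, the population is updated in place rather than generation by generation, and mutants are clipped to the bounds.

```python
import random


def differential_evolution(objective, bounds, pop_size=30, max_gen=200,
                           f_scale=0.5, cr=0.2, seed=0):
    """Minimize `objective` with a basic DE scheme (random base vector, one difference)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialization: uniform random positions inside the bounds.
    pop = [[lo + rng.random() * (hi - lo) for lo, hi in bounds]
           for _ in range(pop_size)]
    fit = [objective(x) for x in pop]

    for _ in range(max_gen):
        for i in range(pop_size):
            # Mutation: three distinct individuals, all different from i.
            r1, r2, r3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            mutant = [pop[r1][j] + f_scale * (pop[r2][j] - pop[r3][j])
                      for j in range(dim)]
            mutant = [min(max(v, lo), hi) for v, (lo, hi) in zip(mutant, bounds)]
            # Crossover: j_rand guarantees at least one component from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[j] if (rng.random() <= cr or j == j_rand) else pop[i][j]
                     for j in range(dim)]
            # Selection: keep the trial only if it is at least as good.
            f_trial = objective(trial)
            if f_trial <= fit[i]:
                pop[i], fit[i] = trial, f_trial

    best = min(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]


# Toy objective with its minimum at (3, -1).
def toy_sphere(p):
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2


best, fitness = differential_evolution(toy_sphere, [(-10.0, 10.0), (-10.0, 10.0)])
print(best, fitness)
```

The greedy selection step means that the fitness of each individual can never get worse, which is what lets DE preserve population diversity without losing good solutions.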

Differential Evolution Grey Wolf Optimization Algorithm (DE-GWO)
Considering the complementarity and differences between intelligent optimization algorithms, a more efficient hybrid optimization algorithm, the DE-GWO algorithm, was proposed by Zhu et al. [21] to solve the premature convergence problem and improve the overall search capability of GWO. DE is used to maintain the diversity of the grey wolf population and to avoid the reduction of population differences during iteration. The mathematical model of DE-GWO can be described in seven steps as follows:
Step 1: Initialization. After repeated attempts, the initial parameter values of DE-GWO were determined. The initial population size was 30 and the maximum number of iterations was 500. The dimension of the independent variables was 2. The upper and lower bounds of the scaling factor were set to 0.8 and 0.2, respectively, and the crossover probability was 0.2; the lower and upper bounds of the parameter values were 0.01 and 100, respectively. This was followed by setting the initial values of the parameters a, A and C and randomly generating the initial positions of the population individuals using Equation (15).
Step 2: Perform mutation operations on the individuals of the population to produce a mutant population using Equation (16), and then generate the initial parent population through the selection operation of DE using Equation (18).
Step 3: Calculate the objective function value of each grey wolf individual in the population using Equation (19). The three individuals with the lowest fitness values are selected and recorded as $X_\alpha$, $X_\beta$ and $X_\delta$ in ascending order of fitness.

$$\mathrm{fitness}(X_i) = f_{\mathrm{RMSE}}(X_i) \tag{19}$$

where $f_{\mathrm{RMSE}}(\cdot)$ is the function that calculates the root mean squared error (RMSE) of the SVM.
Step 4: Calculate the distance between the other grey wolf individuals in the population and the optimal $X_\alpha$, $X_\beta$ and $X_\delta$ using Equations (8)-(10), and update the position of each grey wolf individual using Equations (11)-(14).
Step 5: Update the values of A, C and a using Equations (5)-(7). Based on the intermediate population generated by the mutation operation of DE, create the progeny population by the crossover operation. Then, update the parent population by the selection process of DE.
Step 6: Calculate the fitness value of all grey wolf individuals, and update the positions of $X_\alpha$, $X_\beta$ and $X_\delta$ of the parent population according to the fitness values.
Step 7: Determine whether the maximum number of iterations has been reached. If so, exit the optimization and return the value of $X_\alpha$ as the final optimal solution; otherwise, go back to Step 2 and continue.

Modified Differential Evolution Grey Wolf Optimization Algorithm (MDE-GWO)
Compared to GWO and DE alone, the DE-GWO algorithm mitigates the local-optimum problem and improves the global search ability. However, higher prediction accuracy and faster convergence can be achieved only when global exploration and local exploitation are well balanced. In the GWO algorithm, whether the absolute value of A is larger or smaller than 1 determines whether the algorithm performs global exploration or local exploitation. The value of A changes with the convergence factor (a), as explained in Equation (5), which means that a directly controls the balance between global exploration and local exploitation in DE-GWO. In general, a is set to decrease linearly from 2 to 0 as the iterations proceed. In practice, however, the actual iterative search process of the GWO algorithm is non-linear. Therefore, a linearly decreasing a cannot accurately reflect the actual cooperative hunting process of the grey wolf pack. To obtain more accurate optimization results, a novel modification of the DE-GWO algorithm (MDE-GWO) is proposed in this paper. To balance global exploration and local exploitation, a novel formula for calculating a, based on a cosine function and an exponential function, was developed (Equation (20)). It helps DE-GWO to adapt the search space non-linearly, searching quickly over a large area at first and then attacking slowly within a small space.
where t is the iteration and T is the maximum iteration.

Support Vector Machines Algorithm (SVM)
The SVM is a linear data processing method derived from statistical learning theory, originally introduced by Cortes and Vapnik [33] and based on the principle of structural risk minimization. The introduction of kernel methods is the pivotal step that allows SVM to resolve the conflict between high dimensionality and the computational complexity of samples. An SVM is described as a quadratic optimization problem [33]:

$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + c \sum_{i} \xi_i \quad \text{subject to} \quad y_i \left( w \cdot x_i + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0 \tag{21}$$

where $x_i$ and $y_i$ are the $i$-th input vector and its target, $w$ is the optimal solution, $b$ is the bias parameter, $c > 0$ is the penalty parameter of the error term and $\xi_i$ is the slack variable, which is related to prediction errors in SVM. The form and function of an SVM are determined by the type of kernel function used. Polynomial, radial basis function (RBF) and sigmoid (two-layer neural network) kernels are the most commonly used kernel functions for nonlinear data. Owing to its small computational cost, high prediction accuracy and good stability, the RBF kernel was used in this paper, which can be written as follows [34]:

$$K(x_i, x_j) = \exp\left( -\gamma \|x_i - x_j\|^2 \right) \tag{22}$$

where $\gamma$ is the width parameter of the RBF function. The values of $c$ and $\gamma$ are critical and affect the prediction accuracy of SVM. Selecting suitable values results in high prediction accuracy and reduces the processing time. They should be determined prior to the training stage. Therefore, to obtain an accurate prediction of AGB in sugar beet with the SVM model, the penalty factor, $c$, and the kernel parameter, $\gamma$, were expressed as powers of 2 (with the exponents as the search variables), and GWO, DE-GWO and MDE-GWO were employed to optimize them in this paper. Prediction models of GWO-SVM, DE-GWO-SVM and MDE-GWO-SVM were constructed in Matlab R2014a software. The flowchart of MDE-GWO-SVM is shown in Figure 5, as an example. The performance of the SVM prediction models was compared by means of the coefficient of determination (R²), the root mean squared error (RMSE) and the ratio of prediction deviation (RPD), which is the standard deviation of measured AGB divided by RMSE. The higher the R² and RPD values and the lower the RMSE values, the better the model prediction performance.
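The RBF kernel and the three evaluation metrics can be illustrated with a short sketch. It assumes that R² is computed as 1 − SS_res/SS_tot and that RPD uses the sample standard deviation (n − 1 denominator); the paper does not state either convention, so both are assumptions. The function names and the numbers in the example are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def regression_metrics(measured, predicted):
    """Return (R2, RMSE, RPD) for paired measured and predicted values."""
    n = len(measured)
    mean = sum(measured) / n
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    r2 = 1.0 - ss_res / ss_tot               # assumed definition of R2
    rmse = math.sqrt(ss_res / n)
    sd = math.sqrt(ss_tot / (n - 1))         # sample SD of the measured values
    rpd = sd / rmse
    return r2, rmse, rpd


# Hypothetical measured and predicted values (not data from the study).
r2, rmse, rpd = regression_metrics([1.0, 2.0, 3.0, 4.0, 5.0],
                                   [1.1, 1.9, 3.2, 3.8, 5.0])
print(round(r2, 3), round(rmse, 3), round(rpd, 2))
```

If R² were instead defined as the squared Pearson correlation between measured and predicted values, the two definitions would generally give slightly different numbers on a validation set.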

Above-Ground Biomass (AGB) Variability
Figure 5. Flow chart of the modified differential evolution grey wolf optimization support vector machine (MDE-GWO-SVM) algorithm.
Table 4 summarizes the measurement results of AGB during each growth stage for the calibration and validation datasets. For the calibration set, AGB varies from 33.47 to 596.95 g/m² in the rapid growth stage of leaf cluster, 82.64 to 1012.04 g/m² in the sugar growth stage and 138.54 to 789.77 g/m² in the sugar accumulation stage. The sugar growth stage has the highest mean AGB (530.25 g/m²) of the three stages. For the validation dataset, the AGB ranges of each year are smaller than, and fall within, the range of the corresponding calibration dataset. This is important because, if the range of variability in the validation set were larger than that of the calibration set, prediction accuracy might deteriorate owing to the smaller range of variability accounted for in the calibration stage [35,36].

Correlation between Above-Ground Biomass (AGB) and Canopy Reflectance Wavelength
Broadly similar patterns of Pearson correlation coefficients (r) between AGB and wavelengths can be observed across the entire spectral range (Figure 6) for the three studied growth stages. However, large differences in correlation coefficients among growth stages are observed, particularly at wavebands in the visible range of 390-410 nm (blue), 570-584 nm (green), 660-680 nm (red) and 700-710 nm (red-edge), and at 936-960 nm in the near-infrared spectral range. Similar results were reported by Hansen et al. [11] and Nguyen et al. [5] for wheat and rice, respectively. These peak differences can therefore be attributed to differences in AGB content, which indicates the potential for successful AGB prediction with HSI. The large differences in raw reflectance at these wavebands (data not shown) are in agreement with the reasonable linear correlations between AGB and reflectance, with r value ranges of −0.7 to 0.33, 0.37 to 0.49, 0.42 to 0.58, −0.57 to −0.53 and 0.49 to 0.67 for the blue, green, red, red-edge and near-infrared bands, respectively. Moreover, owing to the high dimensionality and collinearity of the three-dimensional HSI data, the correlation with AGB is almost identical for adjacent wavelengths and does not clearly differ within a certain range, such as 870 nm to 890 nm. In addition, the highest correlation coefficient value was less than 0.7, illustrating that information from a single band is not enough to predict AGB of sugar beet successfully and that several selected wavelengths or the full spectral data are necessary. Therefore, the potential of the variable selection algorithm is investigated in the following section.
Remote Sens. 2020, 12, x FOR PEER REVIEW
Figure 6. The Pearson correlation coefficients (r) between the spectral wavelengths and the measured above-ground biomass (AGB).

Characteristic Wavelengths Selection with Competitive Adaptive Reweighted Sampling (CARS)
As shown in Figure 7, as the number of sampling runs increases from 0 to 50, the number of sampled variables (wavelengths), the root mean square error of cross-validation (RMSE) obtained from the PLS regression analysis and the regression coefficient paths all change, and the pattern of change varies among the three studied growth stages. The number of sampled wavelengths shows a two-phase selection: fast selection followed by refined (fine-tuned) selection. In the fast selection phase, the EDF caused the number of sampled variables to drop rapidly during the early sampling runs. Then, as the sampling runs continued, ARS was used to select the key variables based on the regression coefficients. Therefore, after the second sampling run, the number of sampled variables decreased gradually in the refined selection phase until the optimal subset was obtained. The RMSE values of 10-fold cross-validation did not change in sampling runs 1-18 (rapid growth stage of leaf cluster and sugar growth stage) and 1-17 (sugar accumulation stage), which is attributed to the presence of uninformative variables. The RMSE values then decreased in sampling runs 19-31 (rapid growth stage of leaf cluster), 19-30 (sugar growth stage) and 18-31 (sugar accumulation stage), indicating that redundant variables uncorrelated with AGB were excluded. The minimal RMSE values were located at sampling runs 31, 30 and 31 for the rapid growth stage of leaf cluster, sugar growth stage and sugar accumulation stage, respectively. These minimal RMSE values, marked by a blue asterisk line in Figure 7, were used to determine the optimal variable subset. After the sampling run with the minimal RMSE, some key variables for AGB prediction were culled, resulting in larger residuals; hence, the RMSE values increased and reached maximum values larger than those at the start of the sampling runs.
The CARS algorithm reduced the number of wavelengths considerably, from the original 823 to 21 for both the rapid growth stage of leaf cluster and the sugar accumulation stage, and to 23 for the sugar growth stage. These wavelengths are predominantly located in eight distinct wavebands centered at 410, 674, 715, 751, 833, 893, 940 and 971 nm (Figure 8). The selected wavelengths were then used for SVM modeling, and the results are compared with those of SVM models developed with the full spectrum.
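The fast selection phase can be illustrated with the exponentially decreasing function (EDF) as it is commonly formulated for CARS, where the fraction of variables kept at run i is r_i = a·exp(−k·i), with a and k chosen so that all variables are kept in the first run and two in the last. The sketch below assumes this standard form, 50 sampling runs and 823 wavelengths; the function names are hypothetical, and the ARS step and the PLS cross-validation that determine the final subset are deliberately left out.

```python
import math

def edf_ratio(run, n_runs, n_vars):
    """Fraction of variables kept at a 1-based sampling run (standard CARS EDF).

    The constants are set so that the ratio is 1 at run 1 and 2 / n_vars
    at the final run.
    """
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    k = math.log(n_vars / 2) / (n_runs - 1)
    return a * math.exp(-k * run)

def retained_counts(n_runs, n_vars):
    """Number of variables the EDF alone would keep at each sampling run."""
    return [round(n_vars * edf_ratio(i, n_runs, n_vars))
            for i in range(1, n_runs + 1)]

# Assumed settings: 50 runs over the 823 original wavelengths.
counts = retained_counts(n_runs=50, n_vars=823)

# The drop over the first ten runs is much larger than over the next ten,
# which is the "fast selection" behaviour described in the text.
early_drop = counts[0] - counts[9]
later_drop = counts[9] - counts[19]
```

In a full CARS run, the optimal subset is taken at the run with the lowest cross-validated RMSE rather than at the end of this schedule.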


Figure 8. The selected key wavelengths, marked by squares, for the three studied growth stages, resulting from the competitive adaptive reweighted sampling (CARS) algorithm.

Modified Differential Evolution Grey Wolf Optimization (MDE-GWO)
The convergence curves of the fitness value over the iterations for DE-GWO and MDE-GWO are shown in Figure 9. The MDE-GWO curve converges faster and to a lower fitness value than the DE-GWO curve: minimum fitness values of 17.93 and 17.82 were reached after 130 and 70 iterations for DE-GWO and MDE-GWO, respectively. Owing to the dynamic adjustment of the convergence factor (Ia) as the iterations proceed, the convergence speed is improved, with an accompanying improvement in the accuracy of optimization, evaluated as RMSE. Therefore, the proposed MDE-GWO algorithm was chosen as the best method for optimizing the SVM performance for the prediction of AGB of sugar beet, with less computational effort and higher accuracy.
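To make the role of the convergence factor concrete, the sketch below contrasts the linear decay used in standard GWO (from 2 to 0 over the iterations) with a non-linear alternative. The non-linear form shown here, including its `power` parameter, is purely an illustrative assumption and is not the formula of the MDE-GWO algorithm proposed in this paper; the function names are likewise hypothetical.

```python
def linear_a(t, t_max):
    """Convergence factor of standard GWO: decreases linearly from 2 to 0."""
    return 2.0 * (1.0 - t / t_max)

def nonlinear_a(t, t_max, power=2.0):
    """Illustrative non-linear convergence factor (an assumption, not the
    paper's formula): stays high early on and falls faster late in the run."""
    return 2.0 * (1.0 - (t / t_max) ** power)

def coefficient_A(a, r1):
    """GWO coefficient A = 2*a*r1 - a for a random r1 in [0, 1].

    |A| > 1 pushes wolves away from the leaders (exploration);
    |A| < 1 pulls them toward the leaders (exploitation).
    """
    return 2.0 * a * r1 - a

# Compare the two schedules halfway through a 100-iteration run.
t_max = 100
mid_linear = linear_a(50, t_max)
mid_nonlinear = nonlinear_a(50, t_max)
```

With these settings the non-linear schedule still allows |A| > 1 at mid-run, whereas the linear one no longer does, so exploration lasts longer before the search contracts around the best candidates.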


Figure 9. Fitness value convergence curves for the differential evolution grey wolf optimization (DE-GWO) (a) and modified differential evolution grey wolf optimization (MDE-GWO) (b) algorithms.

Support Vector Machine (SVM) Models for Above-Ground Biomass (AGB) Prediction
Using the selected characteristic wavelength variables shown in Figure 8, quantitative prediction models (GWO-SVM, DE-GWO-SVM and MDE-GWO-SVM) for AGB of sugar beet were established. To evaluate the performance of the models based on wavelengths selected by CARS, corresponding models using the full wavelength range were also developed. The performance of these prediction models is given in Table 5. Surprisingly, almost all models with the wavelengths selected by CARS performed better than those developed with the full wavelength range, with improvements in R² and RPD of 2.8-50% and 5.6-164%, respectively, and decreases in RMSE of 18.6-28% for the validation set.
Table 5 also shows that the choice of optimization algorithm affects the prediction accuracy of the SVM models. The best results were achieved by employing the MDE-GWO algorithm to optimize the two SVM parameters, c and γ, followed successively by the DE-GWO and GWO algorithms. With MDE-GWO, the best R², RMSE and RPD values ranged from 0.74 to 0.80, 46.17 to 65.68 g/m² and 1.42 to 1.97, respectively, for the rapid growth stage of leaf cluster; from 0.78 to 0.80, 30.16 to 37.03 g/m² and 1.69 to 2.03 for the sugar growth stage; and from 0.69 to 0.74, 40.17 to 104.08 g/m² and 1.61 to 1.95 for the sugar accumulation stage. Therefore, the MDE-GWO algorithm is deemed to be the best among the three methods for parameter optimization of SVM in this paper, and the convergence factor calculated by the proposed MDE-GWO algorithm is more suitable in this case than those obtained with the GWO or DE-GWO algorithms.
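For reference, the three accuracy metrics reported in Table 5 can be computed as in the sketch below. Two choices here are assumptions, since the exact definitions are not restated in this section: R² is taken as the coefficient of determination (1 − SS_res/SS_tot), and RPD as the sample standard deviation of the measured values divided by RMSE. The toy values are hypothetical.

```python
import math

def rmse(obs, pred):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def rpd(obs, pred):
    """Residual prediction deviation: SD of measured values over RMSE.

    The sample standard deviation (n - 1 denominator) is assumed here.
    """
    mean_obs = sum(obs) / len(obs)
    sd = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / (len(obs) - 1))
    return sd / rmse(obs, pred)

# Hypothetical measured and predicted AGB values in g/m² (not study data).
measured = [100.0, 200.0, 300.0, 400.0, 500.0]
predicted = [110.0, 190.0, 310.0, 390.0, 510.0]

toy_rmse = rmse(measured, predicted)
toy_r2 = r_squared(measured, predicted)
toy_rpd = rpd(measured, predicted)
```

Because RPD scales RMSE by the spread of the measured values, a stage with a narrow AGB range can show a low RPD even when its RMSE is comparatively small.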

Table 5. The results of support vector machine (SVM) prediction of above-ground biomass (AGB) for each growth stage, after grey wolf optimization (GWO), differential evolution grey wolf optimization (DE-GWO) and modified differential evolution grey wolf optimization (MDE-GWO) algorithms.

Important Waveband for the Prediction of Above-Ground Biomass (AGB)
The spectral profiles obtained from HSI data contained more than 800 wavelengths with considerable redundant and multicollinear information. To reduce or even eliminate the redundant part of the data, CARS was implemented to extract the most useful information from the full spectra by selecting the most significant wavelengths for AGB prediction. Results showed that CARS reduced the number of wavelength variables considerably, with reduction rates of 97.4%, 97.2% and 97.4% for the rapid growth stage of leaf cluster, sugar growth stage and sugar accumulation stage, respectively. The majority of selected wavelengths for each growth stage were located in the blue, red, red-edge and near-infrared regions, which are more sensitive to biomass [37,38]. These wavelengths are predominantly located in eight distinct wavebands centered at 410, 674, 715, 751, 833, 893, 940 and 971 nm (Figure 8). Biomass is associated with plant or leaf moisture. Studies found 955 and 970 nm to be sensitive to moisture [39,40], which is consistent with the findings of this study for bands at 954, 955 and 970 nm for the rapid growth stage of leaf cluster, 941 and 981 nm for the sugar growth stage and 953 and 957 nm for the sugar accumulation stage.
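The reduction rates quoted above follow directly from the variable counts reported in the results (823 original wavelengths, with 21, 23 and 21 retained). A short check, with a hypothetical helper name:

```python
def reduction_rate(n_full, n_selected):
    """Percentage of variables removed by wavelength selection."""
    return 100.0 * (n_full - n_selected) / n_full

# Variable counts as reported in the results: 823 original wavelengths.
selected = {
    "rapid growth stage of leaf cluster": 21,
    "sugar growth stage": 23,
    "sugar accumulation stage": 21,
}
rates = {stage: round(reduction_rate(823, n), 1) for stage, n in selected.items()}
```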
The sensitive wavelengths of the rapid growth stage of leaf cluster are mainly located in the NIR, with a few in the visible region and none in the red-edge. This can be attributed to the small leaf area index at this early growth stage and hence the large soil background. With plant aging, the center of the selected wavebands tends to migrate toward shorter wavelengths, which can be attributed to the larger canopy coverage, leaf aging and the shift of the growth center (e.g., canopy or roots). It is worth noting that the selection by CARS of red-edge wavelengths as sensitive variables for both the sugar growth stage and the sugar accumulation stage can be attributed to the correlation between the red-edge and biochemical parameters such as chlorophyll content and biomass [41]. Since canopy coverage reaches a maximum and leaves overlap each other in the sugar growth stage, more significant wavelengths were found near the blue, green and red-edge regions than in the rapid growth stage of leaf cluster (Figure 8). At the sugar growth stage, leaves almost stop growing and the growth center of the plant is transferred to the roots, while the leaves continue photosynthesis. However, the chlorophyll content of canopy leaves decreases [42] as the leaves age during the sugar accumulation stage, leading to a decrease in near-infrared sensitivity, which agrees with findings for winter wheat [1]. This explains why the wavelengths of 567 and 569 nm, near the green peak associated with photosynthesis and leaf senescence [43], were selected as significant variables for the sugar accumulation stage. Meanwhile, three bands located in the chlorophyll absorption range (630-690 nm), namely 665.9, 671.1 and 673.3 nm, were also found significant in the sugar accumulation stage, which can be attributed to changes in the chlorophyll content at this late growth stage.
Overall, during the sugar accumulation stage, the majority of important variables lie within the visible range, whereas the NIR range contains fewer important variables for AGB prediction. In contrast, the NIR range seems much more important for the two earlier stages, particularly for the rapid growth stage of leaf cluster (Figure 8).

Performance of Support Vector Machine (SVM) Models
Based on the HSI data of sugar beet, this paper analyzed the influence of different optimization algorithms on the accuracy of SVM models for the prediction of AGB of sugar beet. Results showed that SVM models with selected wavelengths provided higher prediction accuracy than the corresponding models with the full wavelength range, indicating that CARS successfully extracted the most significant information related to AGB and that the irrelevant information in the full spectra indeed weakens the performance of each model. The best R², RMSE and RPD values of the most accurate model based on CARS-selected wavelengths were 0.80, 30.16 g/m² and 2.03, respectively. Therefore, we believe that the CARS wavelength selection approach should be applied to assess the AGB of sugar beet. However, there is still room for improvement in prediction accuracy, as comparable studies on biomass assessment using hyperspectral data reported better results (e.g., R² > 0.87) [44,45]. This might be attributed to CARS ignoring some wavelengths that are important for AGB prediction. Therefore, the selection of the best feature extraction algorithm will be the focus of future work.
Among the three algorithms used to optimize the SVM parameters, the MDE-GWO algorithm was found to be the best method for the prediction of AGB of sugar beet, with less computational effort and higher accuracy. Although the results of SVM models optimized with MDE-GWO were generally better than those with GWO and DE-GWO, some of the MDE-GWO-SVM models still showed poor prediction (RPD < 1.5) [46,47], such as the model for the rapid growth stage of leaf cluster with the validation data of 2015 (Table 5). This might be due to the variability of the data used (collected in different years, at different measurement times, and for different N status and sugar beet cultivars), resulting in poor regression and poor model performance [48]. Meanwhile, the prediction accuracy differs among growth stages, which could be attributed to different ranges of AGB and different background interferences, among other reasons. Another reason is the effect of outliers on the prediction accuracy, as no outliers were removed from the current dataset [49]. Furthermore, uncontrolled external conditions in the field can negatively affect data acquisition, including the camera height, nadir angle, light conditions and canopy structure. Therefore, further research is required to detect and remove outliers to improve accuracy. It would also be of interest to validate the approach used in this study for the prediction of AGB in other crops.
Grid search with 5-fold cross-validation (SCV) is a common and simple method for optimizing the SVM parameters c and γ, although the running time and prediction accuracy of SVM models optimized by SCV are not ideal [50]. This might be due to an unsuitable search step of the grid, which causes the optimal solution to be missed while increasing the running time. In this work, an R² of 0.72 was obtained with SCV optimization, which is lower than the values obtained with the GWO, DE-GWO and MDE-GWO methods. In contrast, GWO, DE-GWO and MDE-GWO, as intelligent algorithms, adjust themselves autonomously during the calculations, avoiding the disadvantages of methods such as SCV. Although the structure of the MDE-GWO-SVM algorithm was the most complicated, it did not need to traverse every position in the solution space in search of the best solution. In this way, MDE-GWO-SVM saves time and provides an efficient new solution to the problem of SVM parameter optimization for the AGB assessment of sugar beet.
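The trade-off between grid step and cost can be seen in a small sketch. The error surface below is a hypothetical stand-in for a cross-validated RMSE over (log2 c, log2 γ), with an arbitrarily placed optimum; it is not derived from the study's data, and no SVM is actually fitted. The function names and grid bounds are also assumptions.

```python
def toy_cv_error(log2_c, log2_gamma):
    """Hypothetical stand-in for a cross-validated RMSE surface.

    The minimum is placed at (3.3, -4.7), deliberately off the grid points.
    """
    return (log2_c - 3.3) ** 2 + (log2_gamma + 4.7) ** 2

def grid_search(objective, lo, hi, step):
    """Exhaustively evaluate the objective on a square grid.

    Returns (best_error, best_x, best_y) and the number of evaluations.
    With k-fold cross-validation, each evaluation would cost k model fits.
    """
    best = None
    n_evals = 0
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        for j in range(n_steps + 1):
            x = lo + i * step
            y = lo + j * step
            err = objective(x, y)
            n_evals += 1
            if best is None or err < best[0]:
                best = (err, x, y)
    return best, n_evals

# A coarse grid is cheap but lands far from the optimum; a fine grid gets
# closer at a much higher evaluation cost.
coarse, coarse_evals = grid_search(toy_cv_error, -8, 8, 2.0)
fine, fine_evals = grid_search(toy_cv_error, -8, 8, 0.5)
```

Neither grid reaches the optimum exactly, since the candidate points are fixed in advance, whereas a population-based optimizer moves its candidates according to the fitness values it observes.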

Conclusions
Timely assessment of above-ground biomass (AGB) in sugar beet is essential for evaluating crop growth, which is necessary for the precision management of farm input resources (e.g., agrochemicals), with the aim of maximizing yield and yield quality while minimizing the environmental footprint. In this paper, a novel approach for the assessment of AGB in sugar beet was proposed, based on hyperspectral image (HSI) data combined with wavelength selection using competitive adaptive reweighted sampling (CARS) and support vector machine (SVM) modeling after SVM parameter optimization using three optimization algorithms. A novel modified differential evolution grey wolf optimization algorithm (MDE-GWO) was used and compared with two existing algorithms. Results showed that SVM models developed using CARS-selected wavelengths after parameter optimization by MDE-GWO provided the best prediction accuracy for AGB in each of the three growth stages. These SVM models outperformed the corresponding models using the full spectral range.
CARS selected 21 sensitive wavelengths for both the rapid growth stage of leaf cluster and the sugar accumulation stage, and 23 wavelengths for the sugar growth stage. These wavelengths are predominantly located in regions centered at the blue (410 nm), red (674 nm), red-edge (715 nm and 751 nm) and near-infrared (833 nm, 893 nm, 940 nm and 971 nm) spectral bands. In addition, with plant aging, the center of the selected wavebands tends to migrate toward the shorter wavelength range.
It is recommended to adopt the MDE-GWO-SVM model using selected wavelengths instead of the full spectral range for the measurement of AGB of sugar beet under field conditions using proximal HSI data. This method also has the potential to be used with airborne and remote sensing data, which remains to be verified. Further studies are also needed to test the applicability of the method developed in this study for the prediction of AGB in other crops with different canopy structures.