A Comparison of Regression Techniques for Estimation of Above-Ground Winter Wheat Biomass Using Near-Surface Spectroscopy

Above-ground biomass (AGB) provides a vital link between solar energy consumption and yield, so its accurate estimation is crucial for monitoring crop growth and predicting yield. In this work, we estimate AGB by using 54 vegetation indexes (e.g., Normalized Difference Vegetation Index, Soil-Adjusted Vegetation Index) and eight statistical regression techniques: artificial neural network (ANN), multivariable linear regression (MLR), decision-tree regression (DT), boosted binary regression tree (BBRT), partial least squares regression (PLSR), random forest regression (RF), support vector machine regression (SVM), and principal component regression (PCR). These techniques are used to analyze hyperspectral data acquired with a field spectrometer. The vegetation indexes (VIs) determined from the spectra were first used to train and validate the regression models in order to select the best VI input. White Gaussian noise was then added to the VIs to study how remote-sensing errors affect the regression techniques. Next, the VIs were divided into modeling and validation groups of different sizes by using various sampling methods, to test the stability of the techniques. Finally, AGB was estimated with each technique by using leave-one-out cross validation. Of the eight techniques investigated, PLSR and MLR are the most stable and are most suitable when accurate, stable estimates are required from relatively few samples. In addition, RF is highly robust against noise and is best suited to repeated observations involving remote-sensing data (i.e., data affected by atmosphere, clouds, observation times, and/or sensor noise).
Finally, leave-one-out cross validation shows that PLSR provides the highest accuracy (R² = 0.89, RMSE = 1.20 t/ha, MAE = 0.90 t/ha, NRMSE = 0.07, CV(RMSE) = 0.18); thus, PLSR is best suited to applications requiring high-accuracy estimation models. The results indicate that all eight techniques provide high accuracy. The comparison and analysis provided herein thus reveal the advantages and disadvantages of the ANN, MLR, DT, BBRT, PLSR, RF, SVM, and PCR techniques and can help researchers build efficient AGB-estimation models.


Introduction
Accurate estimates of crop biophysical variables are crucial for monitoring vegetation growth and for analyzing important physiological parameters during the crop growth cycle [1,2]. One such variable, above-ground biomass (AGB), plays an important role in plant functioning because it reflects the status of crop growth and is related to solar-energy consumption, yield, and grain quality [3,4]. Therefore, AGB is considered one of the most important crop biophysical parameters, and its accurate estimation can help improve crop monitoring and yield prediction [5]. Traditional AGB estimates are based on destructive measurements, which are time- and labor-consuming and, more importantly, difficult to apply over large areas [6]. In recent years, hyperspectral remote-sensing data acquired from the ground [7,8], unmanned aerial vehicles [9], airborne platforms [10][11][12], and satellite platforms [13] have been used to capture crop canopy spectra in narrow bands and thereby provide information on the biochemical composition of the canopy. Crop physiology research shows that spectral absorption by plant leaves is mainly due to leaf pigments, especially chlorophyll content (Chl) [14]. Reflectance is low in both the blue and red regions of the spectrum because chlorophyll absorbs light there for photosynthesis, and it peaks in the green region, which gives vegetation its green color [15]. In the near-infrared region, reflectance is much higher than in the visible region because of the cellular structure of the leaves [16]. Previous studies have shown that near-infrared- and red-band vegetation indexes (VIs) are effective for estimating AGB [8,9,11]. However, during the reproductive growth of crops, leaf senescence reduces the effectiveness of photosynthesis [14,17]. As both photosynthesis and near-infrared reflectance clearly decrease, the correlation between AGB and the red- and near-infrared-based VIs is reduced. Therefore, hyperspectral remote sensing of AGB has received increasing attention as an efficient and precise method for nondestructive monitoring in agricultural research [18].
Physically based models and empirical regression techniques are the two main approaches for estimating vegetation characteristics from hyperspectral measurements [19]. Physically based models are founded on physical principles; the two main examples of this approach are radiative transfer (RT) models and geometric optical models [19]. Because vegetation canopy reflectance depends on a number of factors [20] (e.g., leaf-area index, Chl, water content, dry matter content, soil reflectance, and the bidirectional reflectance distribution function), physically based models require canopy biophysical parameters, soil parameters, and some external parameters to simulate canopy reflectance, and these are often not readily available. In contrast, empirical regression techniques require a large number of ground measurements but offer a direct relationship between spectral features and vegetation parameters. Previous research has used many powerful empirical regression techniques that make full use of narrow hyperspectral bands, VIs, and even different types of sensor data [21]. These techniques fall into two categories: (i) machine-learning techniques such as artificial neural network (ANN) [22], decision tree regression (DT) [23], boosted binary regression tree (BBRT) [24], random forest regression (RF) [25], and support vector machine regression (SVM) [26]; and (ii) conventional regression techniques such as multivariable linear regression (MLR) [26,27], partial least squares regression (PLSR) [7,8,22], and principal component regression (PCR) [7]. Many studies have obtained promising results by using these techniques [8][9][10][11][26]. However, hyperspectral data redundancy is a major problem because of the high spectral dimensionality and large number of bands [28]. In addition, the correlation between the spectra and AGB varies with the crop growth period, which is related to the physiological state of the crop [17]. To address these problems, many researchers first extract features from the narrow hyperspectral bands, and many methods to do this have been proposed, for example, correlation analysis, continuum removal [29], red-edge position [30], gray relational analysis [31], and out-of-bag analysis [21]. Spectral vegetation indexes (VIs) have been widely used for decades, and more than 60 VIs [32] have been proposed for estimating biophysical variables [33].
Conventional regression techniques are more suitable for data that have a clear linear or exponential relationship with a distinct estimation equation, whereas machine-learning techniques are typically better able to cope with the strong nonlinearity between the biophysical and biochemical parameters and the reflectance spectra [34]. However, many studies indicate that empirical regression techniques are rarely transferable to other sites with different vegetation, or to data acquired with other types of sensors or under different acquisition conditions. Despite this, empirical regression techniques still have some advantages, such as fewer input variables, less computation, and ease of application, which have led to their widespread use under many conditions.
Numerous studies have used hyperspectral remote-sensing data and empirical regression techniques to estimate AGB [26], and some have analyzed the performance of these techniques, although they focus mostly on comparing estimation accuracy. As yet, no comprehensive study has evaluated these regression techniques for estimating AGB, and no study has compared the different statistical techniques to better understand their respective advantages and disadvantages.
The main objective of the present study is to evaluate the performance (in particular, with respect to data selection, sampling methods, and noise immunity) of eight regression techniques for estimating AGB. The following four tests were applied:
(1) VIs were used to train regression models, which were validated to select the best VI input (Section 4.1).
(2) The noise immunity of these techniques was compared by simulating remote-sensing errors through added white Gaussian noise (Section 4.2).
(3) The stability of these techniques was examined by using samples of varying sizes and different sampling methods for modeling and validation (Section 4.3).
(4) Leave-one-out cross validation was used to evaluate the accuracy of AGB estimation by these techniques (Section 4.4).
We then discuss the performance of the eight AGB estimation techniques and the advantages and disadvantages of each (Section 5), and summarize the optimal conditions for using them.

Study Area
The study area was situated in Changping District, in the northwest of Beijing, China (see Figure 1). Experiments were conducted at the National Precision Agriculture Research Center of China (116°26′36″ E, 40°10′44″ N). Changping District has an average altitude of 36 m and a total area of about 1352 km², and it has a warm temperate semi-humid continental monsoon climate, with an average rainfall of 450 mm, an average low temperature of −10 to 7.5 °C, and an average high temperature of 35 to 40 °C.
The agronomy experiment was designed to increase the variation in AGB by using two crop varieties, three irrigation treatments, and four nitrogen treatments. AGB was measured by using ground-based techniques. The experiments involved two winter wheat cultivars, J9843 and ZM175, which are the main winter wheat varieties grown in northern China. The irrigation treatments were rainfall only (W0, see Figure 1), rainfall plus normal irrigation (W1, 100 mm), and rainfall plus double the normal irrigation (W2, 200 mm). The nitrogen fertilizer treatments were no fertilizer (N0), half the normal fertilization (N1, 195 kg/ha), normal fertilization (N2, 390 kg/ha), and twice the normal fertilization (N3, 780 kg/ha).

Measurement of Data
A 5000 m² square area was selected as the experimental field (see Figure 1) and divided into 48 plots, each 6 m × 8 m. In each plot, winter wheat near the center of the plot was selected for spectral, physiological, and biochemical measurements and analyses. AGB, Chl, and canopy spectra were measured at four growth stages: the jointing stage (13 and 14 April 2015), the flag leaf stage (26 and 27 April 2015), the flowering stage (12 to 14 May 2015), and the filling stage (25 to 27 May 2015). The four ground-based campaigns yielded 192 sets (48 plots × 4 stages) of Chl, winter wheat biomass, and canopy hyperspectral data.

Measurements of Winter Wheat Canopy Reflectance
Canopy hyperspectral reflectance was acquired with an ASD FieldSpec 3 spectrometer (Analytical Spectral Devices, Boulder, CO, USA) between 10:00 and 14:00 (Beijing time, UTC+08:00) under windless and cloudless conditions. The spectrometer was calibrated against the reflectance of a 40 cm × 40 cm BaSO4 white reference panel, and the sensor was held 1.3 m vertically above the canopy. The canopy reflectance was measured 10 times (scanning time 0.2 s) at the center of each plot, and the average reflectance was recorded. To reduce the influence of sky and field conditions on the spectral measurements, each plot was measured three times, and the mean value was used as the canopy reflectance for that plot. Figure 2a shows the average hyperspectral reflectance spectrum for the four growth stages.
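The reflectance processing described above can be sketched as follows. This is a minimal illustration, not the authors' processing chain: it assumes relative reflectance is the ratio of target to white-panel readings scaled by a hypothetical `panel_reflectance` (an idealized default of 1.0), and that the 10 scans of each of the three repeats are averaged before the repeat means are averaged. The function names are hypothetical.

```python
import numpy as np


def relative_reflectance(target_dn, white_dn, panel_reflectance=1.0):
    """Convert raw target readings to reflectance with a white-reference panel.

    panel_reflectance is a hypothetical, idealized value; a real panel has a
    calibrated, wavelength-dependent reflectance.
    """
    target = np.asarray(target_dn, dtype=float)
    white = np.asarray(white_dn, dtype=float)
    return target / white * panel_reflectance


def plot_reflectance(repeats):
    """Average one plot's spectra.

    repeats has shape (n_repeats, n_scans, n_bands), e.g. (3, 10, n_bands):
    scans are averaged within each repeat, then the repeat means are averaged.
    """
    repeats = np.asarray(repeats, dtype=float)
    per_repeat = repeats.mean(axis=1)   # mean of the scans in each repeat
    return per_repeat.mean(axis=0)      # mean over repeats, one value per band
```

With equal numbers of scans per repeat, this two-stage mean equals the grand mean of all 30 scans; the two-stage form simply mirrors the measurement protocol.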

Measurements of Winter Wheat Chlorophyll and Above-Ground Biomass
During the measurements, the planting density of winter wheat (row spacing 15 cm) was recorded, and 20 stems were collected near the center of each plot. Chl was measured on the first and second uppermost leaves with a Dualex 4 (Dualex Scientific Portable Sensor for Leaf Measurements, Force-a, Université Paris Sud, Orsay, France), and the readings were averaged (see Figure 2b).
After the field measurements, the winter wheat organs were processed in the laboratory. They were first put into paper bags and dried at 80 °C to remove moisture; once the sample weight became constant (after about 24 h), they were weighed on a balance with an accuracy of 0.001 g. Finally, the biomass per unit area was calculated from the measured planting density and sample dry weight. The winter wheat AGB was calculated from m, the dry weight of the sample; n, the number of winter wheat ears per unit area; and l, the row spacing. Statistics of the AGB measurements for the different growth periods are shown in Table 1.

Methods
Data selection, sampling methods, noise immunity, and prediction performance were analyzed by using a series of VIs and ground-based AGB measurements. The flowchart in Figure 3 illustrates the process.

Machine Learning Techniques
(1) Artificial neural networks have been an active research topic in artificial intelligence since the 1980s, and they are very powerful in dealing with nonlinear relationships [35]. An ANN is based on a collection of connected units called artificial neurons, each of which can transmit a signal to other neurons. An ANN is composed of a large number of neurons, each applying a particular output (activation) function, and each connection between two neurons carries a weight applied to the signal passing through it. The network output therefore depends on the connection topology, the connection weights, and the activation functions.
(2) Support vector machines were proposed by Cortes and Vapnik [36] in 1995 and offer many unique advantages for dealing with complex multidimensional data. An SVM is a supervised learning model with associated learning algorithms that analyze data for classification and regression. An SVM can be used as a regression method while maintaining all the main features that characterize the algorithm (i.e., the maximal margin). Support vector regression (SVR) uses the same principles as SVM classification, with only a few minor differences. Herein, we use LIBSVM 3.12 (A Library for Support Vector Machines [37]) for the tests.
(3) A decision tree is a tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a category [38]. A decision node has two or more branches, each representing values of the attribute tested. The tree is developed by incrementally breaking a dataset down into smaller and smaller subsets. The final result is a tree with decision nodes and leaf nodes; in regression, each leaf node represents a prediction of the numerical target.
(4) A boosted binary regression tree is a powerful regression method proposed by Friedman [24] in 2001. Boosted binary regression trees combine binary regression trees by using a gradient-boosting technique [39].
(5) Random forest regression is a data analysis and statistical method that is widely used in machine-learning research. It was proposed by Breiman and Cutler [25] in 2001 and is an ensemble learning method for classification and regression that constructs a multitude of decision trees at training time and outputs the mode of the classes (classification) or the mean prediction (regression) of the individual trees. RF typically achieves high accuracy, tolerates outliers and noise well, and makes excellent use of the full spectral information.
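To make the tree-growing step in item (3) concrete, the sketch below finds the best binary split of a single input variable under the squared-error criterion used by CART-style regression trees. It illustrates that criterion under this assumption; it is not the MATLAB routine used in this study, and `best_split` is a hypothetical name.

```python
import numpy as np


def best_split(x, y):
    """Find the threshold on one feature that minimizes the summed squared
    error of the two child nodes (CART regression criterion).

    Each child predicts the mean of its samples. Returns (threshold, sse).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # identical values cannot be separated by a threshold
        left, right = y_sorted[:i], y_sorted[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            # place the threshold midway between neighbouring feature values
            best_threshold = (x_sorted[i - 1] + x_sorted[i]) / 2
            best_sse = sse
    return best_threshold, best_sse
```

A full tree applies this search over all inputs and then recursively to each child subset; RF and BBRT build many such trees and combine their predictions.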

Conventional Regression Techniques
(1) Multiple linear regression analyzes a dependent variable by using two or more independent variables. The regression coefficients are estimated by the least squares method, which minimizes the sum of the squared errors.
(2) Partial least squares regression is a data analysis method proposed by Wold [40] in 1966. PLSR has been widely used in vegetation studies because it provides an efficient way to make full use of hyperspectral information. Previous studies [8][9][10] indicate that PLSR makes excellent use of the full spectral information and is a flexible method for monitoring agricultural crop parameters.
(3) Principal component analysis (PCA) simplifies data sets through a linear transformation of the data into a new coordinate system. After that transformation, the largest variance of the projected data lies along the first coordinate (the first principal component), the second largest variance along the second coordinate (the second principal component), and so on. PCA is often used to reduce the dimensionality of data sets. Principal component regression applies this transformation to the input data and regresses the dependent variable on the leading principal components, thus reducing the dimensionality of hyperspectral data and avoiding the collinearity problem that can occur in MLR [7].
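The least squares step in item (1) can be written in a few lines. This is a minimal sketch, assuming an intercept term and NumPy's least squares solver; the helper names `fit_mlr` and `predict_mlr` are hypothetical.

```python
import numpy as np


def fit_mlr(X, y):
    """Ordinary least squares with an intercept.

    Minimizes sum((y - b0 - X @ b) ** 2) and returns [b0, b1, ..., bp].
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    """Apply fitted MLR coefficients to new inputs."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]
```

When the input VIs are strongly collinear, the design matrix is nearly rank-deficient and these coefficients become unstable, which is the weakness that PCR and PLSR are meant to address.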
In the present work, ANN, SVM, RF, BBRT, DT, MLR, PLSR, and PCR regression were implemented in MATLAB R2014a (MathWorks, Inc., Natick, MA, USA) on a Microsoft Windows platform.

Selection of Vegetation Indexes
A VI is a combination of reflectance values in two or more spectral bands acquired by multispectral or hyperspectral remote sensing. It is a simple, effective, empirical measure of surface vegetation status. VIs are widely used to classify vegetation, detect environmental changes, estimate crop and forage yield, monitor droughts, and so on. After many years of research on narrow-band hyperspectral data, even incomplete counts show that dozens of VIs exist that can be used to estimate biophysical parameters [13,.
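As an example of a two-band VI, the sketch below computes NDVI, (NIR − Red)/(NIR + Red), from a reflectance spectrum. The band centers (670 nm for red, 800 nm for near-infrared) and the nearest-wavelength lookup are illustrative assumptions; the VIs actually used in this study are listed in Table 2. The function names are hypothetical.

```python
import numpy as np


def band(wavelengths, reflectance, target_nm):
    """Reflectance at the sampled wavelength closest to target_nm."""
    wavelengths = np.asarray(wavelengths, dtype=float)
    reflectance = np.asarray(reflectance, dtype=float)
    return reflectance[np.argmin(np.abs(wavelengths - target_nm))]


def ndvi(wavelengths, reflectance, red_nm=670, nir_nm=800):
    """NDVI = (NIR - Red) / (NIR + Red); the band centers are assumptions."""
    red = band(wavelengths, reflectance, red_nm)
    nir = band(wavelengths, reflectance, nir_nm)
    return (nir - red) / (nir + red)
```

Other VIs follow the same pattern of extracting a few bands and combining them arithmetically, which is why many of them end up strongly correlated with one another.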
Data redundancy and multicollinearity can seriously affect regression performance. By using the 54 selected VIs (Table 2) [32], we can analyze the ability of the eight techniques to cope with multicollinearity. Section 4.1 identifies the best input VIs for each of the eight techniques by analyzing modeling and validation accuracy as a function of the number of input VIs, at seven levels: 5, 10, 15, 20, 30, 40, and 54.
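The ranking that defines the "top n VIs" input sets can be sketched as follows: VIs are sorted by the absolute Pearson correlation with measured AGB, and the first n columns are kept at each level. Ranking by |r| is an assumption of this sketch (the text reports r values in Table 4); the function names are hypothetical.

```python
import numpy as np


def rank_vis_by_correlation(vi_matrix, agb):
    """Return VI column indices sorted by decreasing |Pearson r| with AGB, plus r."""
    vi_matrix = np.asarray(vi_matrix, dtype=float)
    agb = np.asarray(agb, dtype=float)
    r = np.array([np.corrcoef(vi_matrix[:, j], agb)[0, 1]
                  for j in range(vi_matrix.shape[1])])
    order = np.argsort(-np.abs(r))  # strongest correlation first
    return order, r


def top_n_inputs(vi_matrix, agb, n):
    """Keep the n VIs most strongly correlated with AGB as model inputs."""
    order, _ = rank_vis_by_correlation(vi_matrix, agb)
    return np.asarray(vi_matrix, dtype=float)[:, order[:n]]
```

Usage would loop over the levels, e.g. `for n in (5, 10, 15, 20, 30, 40, 54)`, building one input matrix per level for each technique.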

Noise Simulation
Many error sources exist in remote-sensing imaging and sensor systems (see Figure 4), including radiometric errors caused by the atmosphere and topography, geometric errors, and systematic errors related to the charge-coupled device (CCD) sensor [78,79]. Although radiometric calibration and radiometric correction are applied to correct for sensor degradation and atmospheric effects, noise cannot be completely removed. Shot noise, due to the quantum nature of light, and readout noise, generated by the output amplifier, remain, and both follow a Poisson distribution [78]. In addition, dark-current noise and thermal noise are present, increase with CCD temperature, and follow a Gaussian distribution [78,79]. To evaluate how sensor noise and other sources of uncertainty affect these data-analysis techniques, we simulate the internal noise (dark current, random noise) of a CCD used for remote sensing by adding white Gaussian noise to the validation VIs [7]. The AGB models are built with the original VIs and validated with the noise-added VIs. In these tests, the signal-to-noise ratio (SNR) is set to 5, 10, 15, 20, 30, and 50 (the noise increases as the SNR decreases). We also compare the results with those obtained by using noiseless VIs.
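The noise test can be sketched as below. The text does not state whether the SNR values are linear ratios or decibels, nor how signal power is measured, so this sketch assumes SNR in dB, with signal power taken per VI column as its mean squared value; `add_white_gaussian_noise` is a hypothetical name.

```python
import numpy as np


def add_white_gaussian_noise(vi_matrix, snr_db, rng=None):
    """Add zero-mean white Gaussian noise to each VI column at a target SNR.

    Assumptions: snr_db is in decibels, SNR_dB = 10 * log10(P_signal / P_noise),
    and P_signal is the mean squared value of each VI column.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(vi_matrix, dtype=float)
    signal_power = np.mean(x ** 2, axis=0)              # one power per VI
    noise_power = signal_power / (10 ** (snr_db / 10))  # invert the dB definition
    noise = rng.standard_normal(x.shape) * np.sqrt(noise_power)
    return x + noise
```

In use, the models are trained on the original VIs and validated on noisy copies of the validation VIs, for example one copy per SNR level in (5, 10, 15, 20, 30, 50).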

Modeling Parameters and Sampling Methods
Before the tests, the ability of each technique to handle multiple VIs (best input VIs) must be analyzed and its modeling parameters determined. The optimal modeling parameters are determined from the validation accuracy by an exhaustive search within the limits given in Table 3, and these optimal parameters were then used in the subsequent modeling tests. Through thousands of coarse modeling and validation runs (see the Note in Table 3), we obtained the optimal model parameters for the optimal VI input. Leave-one sampling (LOS) was used to evaluate the performance of each technique, and global random sampling (GRS) and growth-period sampling (GPS) were used to evaluate the performance and stability of each technique under different sampling methods [81,82]. Global random sampling draws random samples from all samples; three samplings were taken, denoted GRS1/3 (64 samples for modeling and the remaining 128 samples for validation), GRS1/2 (96 for modeling and 96 for validation), and GRS2/3 (128 for modeling and 64 for validation). Growth-period sampling draws random samples from each growth period so that each period contributes an equal number of samples. To this end, all samples from the whole growing season were divided into four strata, each corresponding to one growth period. Again, three samplings were taken, denoted GPS1/3, GPS1/2, and GPS2/3. Leave-one sampling corresponds to leave-one-out cross validation [83], in which a single sample is held out for validation and all other samples are used for training (191 samples for modeling and the remaining sample for validation).
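The three sampling schemes can be sketched with NumPy's random generator. This is an illustrative version under stated assumptions (samples indexed 0 to 191, one growth-period label per sample, and the modeling fraction rounded to a whole number of samples); the function names are hypothetical.

```python
import numpy as np


def global_random_split(n_samples, train_fraction, rng):
    """GRS: draw the modeling set at random from all samples; the rest validate."""
    idx = rng.permutation(n_samples)
    n_train = round(n_samples * train_fraction)
    return np.sort(idx[:n_train]), np.sort(idx[n_train:])


def growth_period_split(periods, train_fraction, rng):
    """GPS: stratified draw with the same number of modeling samples per period."""
    periods = np.asarray(periods)
    train = []
    for p in np.unique(periods):
        members = np.flatnonzero(periods == p)     # samples of this growth period
        n_train = round(len(members) * train_fraction)
        train.extend(rng.choice(members, size=n_train, replace=False))
    train = np.sort(np.array(train))
    valid = np.setdiff1d(np.arange(len(periods)), train)
    return train, valid


def leave_one_out(n_samples):
    """LOS: each sample in turn is the single validation sample."""
    for i in range(n_samples):
        yield np.delete(np.arange(n_samples), i), np.array([i])
```

For example, `global_random_split(192, 1/3, rng)` gives the GRS1/3 sizes (64 modeling, 128 validation), and `growth_period_split` with 48 samples per period draws 16 modeling samples from each of the four periods.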

Precision Evaluation
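We use the coefficient of determination (R²), the root mean square error (RMSE), the mean absolute error (MAE), the normalized root mean square error (NRMSE), and the coefficient of variation of the root mean square error [CV(RMSE)] to evaluate the accuracy of each technique. Larger R² values and smaller RMSE, MAE, NRMSE, and CV(RMSE) values indicate greater model accuracy. These metrics are calculated as follows:
$$R^2 = \frac{\left[\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\right]^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-y_i)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert x_i-y_i\rvert$$
$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{y_{\max}-y_{\min}}$$
$$\mathrm{CV(RMSE)} = \frac{\mathrm{RMSE}}{\bar{y}}$$
where x_i and y_i are the estimated and measured AGB values, respectively, y_max and y_min are the maximum and minimum measured values, respectively, x̄ and ȳ are the average estimated and measured values, respectively, and n is the number of samples.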
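A direct NumPy translation of these accuracy measures is sketched below. It assumes R² is the squared Pearson correlation between estimated and measured values, NRMSE is RMSE divided by the range of the measured values, and CV(RMSE) is RMSE divided by the mean measured value; `accuracy_metrics` is a hypothetical helper.

```python
import numpy as np


def accuracy_metrics(estimated, measured):
    """Return R2, RMSE, MAE, NRMSE and CV(RMSE) (x = estimated, y = measured)."""
    x = np.asarray(estimated, dtype=float)
    y = np.asarray(measured, dtype=float)
    r2 = np.corrcoef(x, y)[0, 1] ** 2          # squared Pearson correlation
    rmse = np.sqrt(np.mean((x - y) ** 2))
    mae = np.mean(np.abs(x - y))
    return {
        "R2": r2,
        "RMSE": rmse,
        "MAE": mae,
        "NRMSE": rmse / (y.max() - y.min()),   # normalized by measured range
        "CV(RMSE)": rmse / y.mean(),           # normalized by measured mean
    }
```

Because NRMSE and CV(RMSE) share the same RMSE numerator, their ratio depends only on the range and the mean of the measured AGB, which makes them convenient for comparing errors across growth stages.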

Results
The correlation coefficients r between AGB and the VIs are shown in Table 4. All the VIs are correlated with biomass to varying degrees. Of the 54 VIs investigated, NPQI performs the best (r = 0.757). The VIs based on the red, green, and blue bands [e.g., NPQI (0.757), BGI (0.555), TCARI (0.188), NPCI (0.474), BRI (0.519), and MCARI (0.237)] are generally more strongly correlated with AGB than those based on the red and near-infrared bands [e.g., MND680 (0.193), MND705 (0.086), NDVI (0.039), SR (0.029), EVI2 (0.182), OSAVI (0.088), and EVI (0.149)]. The correlation coefficients r among the 54 VIs are shown in Figure 5, where the colors for each VI represent its correlations with the other 53 VIs. The results (Figure 5) show that complex correlations exist among these 54 VIs. NPQI (first), BGI (second), ARI (16th), and TCARI (22nd) have low correlations with the other VIs. The top 22 VIs show low mutual correlation (zone a), whereas the remaining 32 VIs are highly correlated (zone b). The total explained variance and variance inflation factor (VIF) values of the VIs are given in Tables A1 and A2 (Appendix A).

Selection of Vegetation Indexes
The best AGB models and the associated validation accuracy of the eight techniques are shown in Figure 6.After a different number of VIs (Table 5) were incorporated into the modeling, the validation accuracy of ANN, BBRT, DT, and RF (Figure 6a-c,g) flattens out.BBRT and DT performed well when using the top five VIs as the input, although the validation accuracy decreases slightly after using lower-correlation VIs as the input for modeling (Figure 6b,c).The performance of ANN, PCR, and SVM becomes complex (Figure 6a,f,h) after using lower-correlation VIs as the input for modeling.Note: the number of input VIs (n) represents the top n VIs in Table 4.
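The procedure behind Figure 6 and Table 5 (feed the top n VIs into a model and keep the n that maximizes validation accuracy) can be sketched as follows. This is a minimal illustration assuming validation R² is the selection criterion and that the VI columns are already sorted as in Table 4; ordinary least squares stands in for whichever of the eight techniques is being tuned, and all names are hypothetical.

```python
import numpy as np

def r2(y, y_hat):
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def mlr_fit_predict(x_model, y_model, x_eval):
    # Stand-in model: ordinary least squares with an intercept
    a = np.column_stack([np.ones(len(x_model)), x_model])
    coef, *_ = np.linalg.lstsq(a, y_model, rcond=None)
    return np.column_stack([np.ones(len(x_eval)), x_eval]) @ coef

def select_top_n(x_model, y_model, x_val, y_val, sizes, fit_predict=mlr_fit_predict):
    """Try the top-n VIs for each n in `sizes`; return the n with the best validation R2.

    Columns of x_model / x_val are assumed to be sorted by correlation with AGB.
    """
    scores = {}
    for n in sizes:
        y_hat = fit_predict(x_model[:, :n], y_model, x_val[:, :n])
        scores[n] = r2(y_val, y_hat)
    best_n = max(scores, key=scores.get)
    return best_n, scores

rng = np.random.default_rng(0)
x = rng.normal(size=(90, 10))
y = x[:, 0] + x[:, 1] + 0.1 * rng.normal(size=90)  # only the first two columns matter
best_n, scores = select_top_n(x[:60], y[:60], x[60:], y[60:], sizes=[1, 2, 5, 10])
print(best_n, {n: round(s, 3) for n, s in scores.items()})
```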
Table 5 lists the optimal number of input VIs for each technique, determined from the change in validation accuracy (Figure 6); each number corresponds to the input set that gave the best estimation accuracy. For example, ANN achieved its highest accuracy with the top 30 VIs as input (Figure 6a). The corresponding model parameters were ntree = 520 and mtry = 8 for RF, hidden layers of 10 and 2 neurons for ANN, and c = 10 and g = −2.5 for LIBSVM. In PCR modeling, we used an 85% cumulative-variance threshold to determine the number of principal components; with the top five VIs as input, the first two principal components account for 89.136% of the cumulative variance.
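The 85% cumulative-variance rule used for PCR can be written as a short NumPy routine: standardize the VIs, take an SVD, keep the fewest components whose cumulative explained variance reaches the threshold, and regress AGB on their scores. Standardizing the inputs is an assumption (the text does not say whether the VIs were scaled), and the two-factor synthetic data and function names are illustrative only.

```python
import numpy as np

def pcr_fit(x, y, threshold=0.85):
    """Principal component regression keeping the fewest PCs that reach `threshold`."""
    mean, std = x.mean(axis=0), x.std(axis=0, ddof=1)
    z = (x - mean) / std                       # standardized VIs (assumption)
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    cum_var = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(cum_var, threshold)) + 1
    scores = z @ vt[:k].T                      # principal-component scores
    a = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    return {"mean": mean, "std": std, "loadings": vt[:k], "coef": coef,
            "k": k, "cum_var": float(cum_var[k - 1])}

def pcr_predict(model, x):
    scores = ((x - model["mean"]) / model["std"]) @ model["loadings"].T
    return model["coef"][0] + scores @ model["coef"][1:]

# Five collinear toy VIs driven by two latent factors
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 200))
x = np.column_stack([f1, f1, f2, f2, f1 + f2]) + 0.1 * rng.normal(size=(200, 5))
y = 7 + f1 + 2 * f2 + 0.2 * rng.normal(size=200)
model = pcr_fit(x, y)
print(model["k"], round(model["cum_var"], 3))
```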
Because the correlation coefficients r among the VIs are high (Figure 5), multi-collinearity may be a problem when so many VIs are used for modeling; the total explained variance (Table A1) and VIF values (Table A2) support this view. Figure 6 shows that the techniques differ in their ability to handle multi-collinear data. ANN, BBRT, and RF handle collinear data well (Figure 6a,b,g), performing well in modeling and validation with the 30-, 40-, and 54-VI input groups. MLR attains greater modeling accuracy as more VIs are added (Figure 6d), but its validation accuracy decreases, especially once more than 20 VIs are used (R² (V), MAE (V), and RMSE (V) in Figure 6d).
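VIF quantifies the collinearity referred to above: for each VI j, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing that VI on all the others. A NumPy sketch on synthetic data follows; values above about 10 are a common rule-of-thumb warning sign, not a threshold stated in this paper, and the function name is hypothetical.

```python
import numpy as np

def variance_inflation_factors(x):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on all other columns."""
    x = np.asarray(x, dtype=float)
    n, p = x.shape
    vif = np.empty(p)
    for j in range(p):
        target = x[:, j]
        a = np.column_stack([np.ones(n), np.delete(x, j, axis=1)])
        coef, *_ = np.linalg.lstsq(a, target, rcond=None)
        resid = target - a @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vif[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return vif

rng = np.random.default_rng(0)
base = rng.normal(size=100)
x = np.column_stack([
    base,                               # VI 1
    base + 0.1 * rng.normal(size=100),  # VI 2: nearly a copy of VI 1
    rng.normal(size=100),               # VI 3: independent
])
vif = variance_inflation_factors(x)
print(np.round(vif, 1))
```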

Test with White Gaussian Noise
A comparative analysis of the estimation accuracy of the eight techniques under white Gaussian noise at different signal-to-noise ratios (SNRs) is presented in Figure 7. For each technique, the three panels of Figure 7 show R², RMSE, and MAE as functions of SNR.
Figure 7 shows the noise immunity of the eight techniques, which ranks as RF > SVM > DT > BBRT > ANN > PCR > PLSR > MLR. RF performs best in this test, with a validation R² near 0.2 at SNR = 5. MLR shows poor noise immunity: its validation accuracy declines from SNR = 30 onward (Figure 7), and its validation results (R² < 0.2, RMSE ≈ 8 t/ha, MAE ≈ 6 t/ha) are the worst of the eight techniques. As the noise increases (SNR < 30), MLR, PCR, and PLSR are extremely sensitive to it, whereas ANN, SVM, DT, and BBRT are more robust; however, the latter techniques also perform poorly at the highest noise level (SNR = 5).
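One way to generate the noisy inputs behind Figure 7 is to add zero-mean white Gaussian noise whose power is set by the target SNR. This sketch assumes the SNR is expressed in decibels and that signal power is measured separately for each VI column; the paper does not specify either convention, and the function name is hypothetical.

```python
import numpy as np

def add_awgn(x, snr_db, rng):
    """Return a copy of x with white Gaussian noise added at `snr_db` decibels.

    Signal power is measured separately for each column (each VI); this
    per-column convention is an assumption.
    """
    x = np.asarray(x, dtype=float)
    signal_power = np.mean(x ** 2, axis=0)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    return x + rng.normal(size=x.shape) * np.sqrt(noise_power)

rng = np.random.default_rng(0)
vis = np.tile([0.8, 0.3, 0.05], (100_000, 1))  # constant toy VI values
# One noisy copy of the inputs per SNR level used in Figure 7
noisy = {snr: add_awgn(vis, snr, rng) for snr in (5, 10, 15, 20, 30, 50)}
```

Each noisy copy would then be fed to the eight techniques in place of the clean VIs.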

Stability Test
Figure 8 shows the modeling and validation results for GRS1/3, GRS1/2, and GRS2/3 with the parameters given in Table 5. The absolute differences between AGB−GRS modeling and validation accuracy (denoted ∇R², ∇RMSE, and ∇MAE) are given in Table 6. The gap between modeling and validation results under global random sampling differs among the eight techniques (Figure 8). MLR and PLSR perform stably under all three sampling proportions (Table 6), whereas SVM and PCR are less stable than MLR and PLSR in this test (Figure 8 and Table 6). For ANN, RF, DT, and BBRT, the difference between modeling and validation accuracy is large (Figure 8 and Table 6), and across all three sampling proportions (1/3, 1/2, 2/3) these four techniques remain the least stable of all those investigated. In addition, the validation accuracy of almost all techniques is lower than their modeling accuracy (Figure 8).
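The stability metrics in Table 6 can be sketched as follows: draw a global random split that uses a given fraction of the samples for modeling, fit on that subset, and report the absolute differences between modeling and validation R², RMSE, and MAE. Interpreting GRS1/3, GRS1/2, and GRS2/3 as the fraction used for modeling is an assumption, least squares again stands in for the eight techniques, and the function names are hypothetical.

```python
import numpy as np

def ols_fit_predict(x_model, y_model, x_eval):
    # Stand-in model (MLR via least squares); any of the eight techniques could be plugged in
    a = np.column_stack([np.ones(len(x_model)), x_model])
    coef, *_ = np.linalg.lstsq(a, y_model, rcond=None)
    return np.column_stack([np.ones(len(x_eval)), x_eval]) @ coef

def accuracy(y, y_hat):
    err = y - y_hat
    return {
        "R2": float(1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
    }

def global_random_split(n_samples, model_fraction, rng):
    # Shuffle all samples, ignoring growth stage, and take the first fraction for modeling
    idx = rng.permutation(n_samples)
    n_model = int(round(model_fraction * n_samples))
    return idx[:n_model], idx[n_model:]

def stability_gaps(x, y, model_fraction, rng, fit_predict=ols_fit_predict):
    """Absolute modeling-vs-validation differences (the ∇ metrics of Table 6)."""
    m_idx, v_idx = global_random_split(len(y), model_fraction, rng)
    modeling = accuracy(y[m_idx], fit_predict(x[m_idx], y[m_idx], x[m_idx]))
    validation = accuracy(y[v_idx], fit_predict(x[m_idx], y[m_idx], x[v_idx]))
    return {"∇" + k: abs(modeling[k] - validation[k]) for k in modeling}

rng = np.random.default_rng(0)
x = rng.normal(size=(90, 5))
y = 6 + x @ np.array([1.0, 0.5, 0.0, -0.5, 0.2]) + 0.5 * rng.normal(size=90)
for frac in (1 / 3, 1 / 2, 2 / 3):
    gaps = stability_gaps(x, y, frac, rng)
    print(f"modeling fraction {frac:.2f}:", {k: round(v, 3) for k, v in gaps.items()})
```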
The modeling and validation results for GPS1/3, GPS1/2, and GPS2/3 are shown in Figure 9, and the absolute differences between AGB−GPS modeling and validation accuracy (∇R², ∇RMSE, and ∇MAE) are given in Table 7.
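Growth-period sampling differs from GRS only in how the split is drawn: the modeling fraction is taken separately within each growth stage, i.e., a stratified split. A minimal sketch, assuming stage labels are available for every sample; the labels S1 to S4 are hypothetical placeholders for the four growing stages in Figure 2, and the function name is illustrative.

```python
import numpy as np

def growth_period_split(stages, model_fraction, rng):
    """Stratified split: take `model_fraction` of the samples within each growth stage."""
    stages = np.asarray(stages)
    model_parts, val_parts = [], []
    for stage in np.unique(stages):
        idx = rng.permutation(np.flatnonzero(stages == stage))
        n_model = int(round(model_fraction * len(idx)))
        model_parts.append(idx[:n_model])
        val_parts.append(idx[n_model:])
    return np.concatenate(model_parts), np.concatenate(val_parts)

# Hypothetical stage labels: four growth stages with 24 plots each
stages = np.repeat(["S1", "S2", "S3", "S4"], 24)
model_idx, val_idx = growth_period_split(stages, 2 / 3, np.random.default_rng(0))
labels, counts = np.unique(stages[model_idx], return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```

The resulting indices can be passed to the same modeling and validation metrics as in the GRS case, so the two sampling schemes differ only in how samples are drawn.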

Estimation Accuracy with Leave-One-Out Sampling
The LOS validation results appear in Figure 10. All eight techniques achieve high accuracy: the lowest R² is 0.79, obtained with PCR [RMSE = 1.63 t/ha, MAE = 1.24 t/ha, NRMSE = 0.10, CV(RMSE) = 0.25], whereas PLSR provides the highest accuracy [R² = 0.89, RMSE = 1.20 t/ha, MAE = 0.90 t/ha, NRMSE = 0.07, CV(RMSE) = 0.18]. On the basis of this leave-one-out cross validation, the prediction performance of the techniques ranks as PLSR > MLR > RF > SVM > BBRT > ANN > DT > PCR (Figure 10i).
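The leave-one-out procedure and the five accuracy measures reported above can be sketched in NumPy with a PLS1 (single-response PLSR) model fitted by the NIPALS algorithm. The number of components, the synthetic data, and the normalizations used for NRMSE (observed range) and CV(RMSE) (observed mean) are assumptions; the paper's own definitions may differ. R² is computed here as 1 − SS_res/SS_tot.

```python
import numpy as np

def pls1_fit(x, y, n_components):
    """PLS1 regression via NIPALS; returns (coefficients, intercept)."""
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xk, yk = x - x_mean, y - y_mean
    n_vars = x.shape[1]
    w_mat = np.zeros((n_vars, n_components))
    p_mat = np.zeros((n_vars, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = xk.T @ yk
        w /= np.linalg.norm(w)        # weight vector
        t = xk @ w                    # scores
        tt = t @ t
        p = xk.T @ t / tt             # X loadings
        q[a] = yk @ t / tt            # y loading
        xk = xk - np.outer(t, p)      # deflate X
        yk = yk - q[a] * t            # deflate y
        w_mat[:, a], p_mat[:, a] = w, p
    coef = w_mat @ np.linalg.solve(p_mat.T @ w_mat, q)
    return coef, y_mean - x_mean @ coef

def loo_predictions(x, y, n_components):
    """Leave-one-out: refit without sample i, then predict sample i."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef, b0 = pls1_fit(x[keep], y[keep], n_components)
        preds[i] = x[i] @ coef + b0
    return preds

def accuracy_report(y, y_hat):
    err = y - y_hat
    rmse = np.sqrt(np.mean(err ** 2))
    return {
        "R2": 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2),
        "RMSE": rmse,
        "MAE": np.mean(np.abs(err)),
        "NRMSE": rmse / (y.max() - y.min()),  # assumed: normalized by observed range
        "CV(RMSE)": rmse / y.mean(),          # assumed: normalized by observed mean
    }

rng = np.random.default_rng(1)
x = rng.normal(size=(60, 8))  # 60 samples x 8 hypothetical VIs
y = 8 + x @ np.array([1.5, -1.0, 0.8, 0, 0, 0.5, 0, 0]) + 0.5 * rng.normal(size=60)
report = accuracy_report(y, loo_predictions(x, y, n_components=4))
print({k: round(float(v), 2) for k, v in report.items()})
```

With all components retained, PLS1 reduces to ordinary least squares, which is a convenient check on the implementation.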

Analysis and Selection of Vegetation Indexes
Our correlation analysis shows that the VIs are correlated with AGB to varying degrees, although only the top 20 VIs have correlation coefficients exceeding 0.2 (Table 4). Previous studies have shown that near-infrared- and red-band VIs are effective for estimating AGB [8,9,11], yet in this study the correlation between AGB and these VIs is low (Table 4). This may be because photosynthesis and near-infrared reflectance [84] both clearly decrease during the reproductive stage (Figure 2), weakening the correlation between AGB and the red- and near-infrared-based VIs; this result is consistent with a previous study [17]. By contrast, over the entire growth period of winter wheat, AGB is more strongly correlated with the red-, green-, and blue-band spectral indexes (Table 4), and these indexes also proved effective for estimating AGB (Figure 6, top-five-VI input). Red-, green-, and blue-band spectral indexes are therefore useful because they can be used to estimate AGB during both the vegetative and reproductive growth stages.
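To make the band-based distinction concrete, the sketch below computes three normalized-difference indexes from a reflectance spectrum. The band pairs follow commonly cited literature definitions (NDVI from 800 and 670 nm, NPQI from 415 and 435 nm, NPCI from 680 and 430 nm) and may differ from those in Table 2; the step-shaped spectrum is a toy example, not measured data, and the helper names are hypothetical.

```python
import numpy as np

# Band pairs (nm) from common literature definitions; Table 2 may use different bands
INDEX_BANDS = {
    "NDVI": (800, 670),  # near-infrared vs red
    "NPQI": (415, 435),  # two blue bands
    "NPCI": (680, 430),  # red vs blue
}

def band(wavelengths, reflectance, nm):
    # Reflectance at the sampled wavelength closest to `nm`
    return reflectance[np.argmin(np.abs(wavelengths - nm))]

def normalized_difference(wavelengths, reflectance, nm_a, nm_b):
    ra = band(wavelengths, reflectance, nm_a)
    rb = band(wavelengths, reflectance, nm_b)
    return (ra - rb) / (ra + rb)

wavelengths = np.arange(350, 2501)                     # 1-nm sampling
reflectance = np.where(wavelengths < 700, 0.05, 0.45)  # toy step-shaped canopy spectrum
values = {name: float(normalized_difference(wavelengths, reflectance, a, b))
          for name, (a, b) in INDEX_BANDS.items()}
print(values)
```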
The 54 VIs investigated suffer from serious multi-collinearity (Figure 5, Tables A1 and A2), and our results (Figure 6) show that the validation accuracy of the eight techniques differs when these multi-collinear data are used as input. Previous studies have shown that machine learning techniques (ANN [81], BBRT [24], and RF [21,85]) can make full use of narrow hyperspectral bands (strongly collinear data) and VIs. In the current study, ANN, BBRT, and RF are almost unaffected by multi-collinear input (Figure 6a,b,g), which may indicate that these techniques are robust against noise, possibly because of the principles on which they are built. Garg et al. [86] indicated that machine learning techniques are suitable for tackling multi-collinearity, and our results likewise show that they handle the multi-collinearity problem better than conventional regression techniques (Figure 6). In addition, MLR performs poorly when multi-collinear data are used to estimate AGB, whereas PLSR performs best of the three conventional regression techniques in tackling multi-collinearity; both findings confirm a previous study [87]. Indeed, PLSR can estimate several response variables simultaneously while accounting for multi-collinear predictor variables [88].
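The contrast between MLR and PLSR on collinear inputs can be illustrated with a toy example: ten nearly identical VIs driven by one latent signal. The data and the choice of a single PLS component are assumptions made for illustration only; they do not reproduce the paper's models.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 10
z = rng.normal(size=n)                           # one latent canopy signal
x = z[:, None] + 0.01 * rng.normal(size=(n, p))  # ten nearly identical VIs
y = 5 + 2 * z + 0.3 * rng.normal(size=n)         # toy AGB

xc, yc = x - x.mean(axis=0), y - y.mean()

# MLR: ordinary least squares on the centered data
ols_coef, *_ = np.linalg.lstsq(xc, yc, rcond=None)

# One-component PLS1: weights proportional to X^T y, then regress y on the score t
w = xc.T @ yc
w /= np.linalg.norm(w)
t = xc @ w
pls_coef = w * (t @ yc) / (t @ t)

print("OLS:", np.round(ols_coef, 1))
print("PLS:", np.round(pls_coef, 2))
```

In this toy setting the OLS coefficients are individually unstable, whereas the PLS coefficients each sit near 0.2 and sum to roughly 2, the true effect of the latent signal.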

Analysis of Noise Immunity
Our results show that machine learning techniques are more immune to strong noise than conventional regression techniques (Figure 7), with RF performing best in the noise test. This may be because RF randomly selects input variables, evaluates the importance of the input data, and aggregates a large number of decision trees, which reduces the impact of noise; this result agrees with previous studies [25,89,90]. Our results (Figure 7) also show that MLR is more sensitive to noise than PLSR, which is consistent with the findings of Zhao et al. [91]. Atzberger et al. [7] ranked the noise immunity of PCR, PLSR, and SMLR as PCR > PLSR > SMLR, the same order obtained here for PCR, PLSR, and MLR (Figure 7). These noise-immunity results (Figure 7) matter because repeated remote-sensing observations are made at different times, so techniques with poor noise immunity may yield low accuracy because of data errors (e.g., due to atmosphere, clouds, observation times, and sensor noise) [26,78,79]. They may also explain why regression studies of vegetation parameters based on remote sensing report significantly different results.

Analysis of Stability and Prediction Performance
PLSR and MLR both perform better in the stability tests than the machine learning techniques (Tables 6 and 7, from 1/3 to 1/2 and 2/3 sampling). Farifteh et al. [92] likewise found that PLSR is more stable than ANN for estimating soil salinity (PLSR: R² = 0.6–0.98, RMSE% = 11.6–48%; ANN: R² = 0.46–0.97, RMSE% = 12.5–57%). PLSR and MLR may therefore be suitable for work in which few samples are available for modeling. BBRT and DT perform poorly in the stability tests, and their AGB estimation models appear to overfit because R² is close to unity in all tests (Figure 6b,c). However, the validation accuracy of BBRT remains high, whereas that of DT is poor. Few studies have evaluated DT for AGB estimation, possibly because DT does not deliver high accuracy for AGB estimation by remote sensing. Yuan et al. [81] indicated that simple random sampling yields lower accuracy than stratified sampling, and our results agree: all GPS models are more stable than the GRS models with 2/3 sampling (Table 6, GRS2/3; Table 7, GPS2/3). This may be because an inappropriate sample selection method degrades modeling and validation accuracy, which suggests that GPS is more suitable for these techniques. A previous study showed that stratified sampling helps to generate a good calibration set [82], which may explain why GPS outperformed GRS in this study.

Conclusions
We have applied a series of machine learning and conventional regression techniques to estimate winter wheat AGB from hyperspectral data, examining the selection of input VIs and of sampling methods, as well as noise immunity and prediction accuracy. The results support the following conclusions: (1) Machine learning techniques are well suited to tackling the multi-collinearity problem: ANN, BBRT, and RF are almost unaffected by it (Figure 6a,b,g), whereas MLR and PCR could not resolve it. (2) Machine learning techniques are much more immune to noise than conventional regression techniques, ranking as follows in terms of noise immunity (Figure 7): RF > SVM > DT > BBRT > ANN > PCR > PLSR > MLR. RF may thus be suitable for work that requires repeated remote-sensing observations. (3) The growth-period sampling method performed better in the stability tests, and PLSR and MLR performed well in all stability tests (Figures 8 and 9 and Tables 6 and 7); these techniques, combined with this sampling method, may be suitable for high-accuracy, stable estimation modeling when only a few samples are available. (4) This study demonstrates the potential of VIs combined with machine learning and conventional regression techniques for estimating winter wheat biomass; the experimental results indicate that PLSR, MLR, and RF may be suitable for work that requires high-accuracy estimation models.
Note (Table A2): VIF provides an index that measures how much the variance of an estimated regression coefficient is increased because of collinearity. VI 1–VI 54 represent the VIs in Table 4 that were fed into MLR modeling and validation.

Figure 1. (a) Location of study area shown in red. (b) Map showing Changping District in Beijing City. (c) Design of treatments and an unmanned-aerial-vehicle image of the experimental field (acquired on 12 May 2015). Three plant groups are present, with two winter wheat varieties (J9843 and ZM175), three water treatments (W0, W1, and W2), and four nitrogen treatments (N0, N1, N2, and N3).

Figure 2. (a) Average hyperspectral reflectance spectrum for the four growing stages. (b) Average Chl and above-ground biomass (AGB) for the four growing stages.

Figure 3. Flowchart showing the experiment methodology. Data selection, sampling methods, noise immunity, and prediction accuracy were analyzed.

Figure 4. Remote-sensing imaging process and noise sources in a charge-coupled device (CCD) system.

Figure 5. Correlation coefficients r among the 54 VIs. NPQI (1st), BGI (2nd), ARI (16th), and TCARI (22nd) are weakly correlated with the other VIs. Zone a (orange and light blue): low-correlation zone; zone b (red and dark blue): high-correlation zone. VIs are ordered by their correlation coefficient with AGB.

Figure 7. Three measures of accuracy as a function of the signal-to-noise ratio (SNR). Note: SNR 100 represents no noise. All techniques used the top 30 VIs (Table 4) as input, with SNR = 5, 10, 15, 20, 30, 50, and no noise.

Figure 8. Modeling and validation results for global random sampling (GRS).

Figure 9. Modeling and validation results for growth period sampling (GPS).

Table 2. Summary of VIs used in this study.

Table 5. Optimal number of input VIs for the eight analytical techniques.

Table 6. Absolute differences between AGB−GRS modeling and validation accuracy. ∇R², ∇RMSE, and ∇MAE are the absolute differences between the modeling and validation R², RMSE, and MAE, respectively; ∇R² values > 0.15 and ∇RMSE and ∇MAE values > 0.800 t/ha are marked with *. The smaller the difference, the more stable and reliable the technique.

Table 7. Absolute differences between AGB−GPS modeling and validation accuracy. ∇R², ∇RMSE, and ∇MAE are the absolute differences between the modeling and validation R², RMSE, and MAE, respectively; ∇R² values > 0.15 and ∇RMSE and ∇MAE values > 0.800 t/ha are marked with *. The smaller the difference, the more stable and reliable the technique.

Table A1. Total explained variance of each group of VIs (%).