Estimation Model of Rice Aboveground Dry Biomass Based on the Machine Learning and Hyperspectral Characteristic Parameters of the Canopy

Wang, Xiaoke; Xu, Guiling; Feng, Yuehua; Peng, Jinfeng; Gao, Yuqi; Li, Jie; Han, Zhili; Luo, Qiangxin; Ren, Hongjun; You, Xiaoxuan; Lu, Wei

doi:10.3390/agronomy13071940


by Xiaoke Wang ¹, Guiling Xu ¹, Yuehua Feng ¹,²,*, Jinfeng Peng ¹, Yuqi Gao ¹, Jie Li ¹, Zhili Han ¹, Qiangxin Luo ¹, Hongjun Ren ¹, Xiaoxuan You ¹ and Wei Lu ¹

¹ College of Agronomy, Guizhou University, Guiyang 550025, China

² Laboratory of Plant Resource Conservation and Germplasm Innovation in Mountainous Region (Ministry of Education), Guizhou University, Guiyang 550025, China

* Author to whom correspondence should be addressed.

Agronomy 2023, 13(7), 1940; https://doi.org/10.3390/agronomy13071940

Submission received: 12 June 2023 / Revised: 10 July 2023 / Accepted: 18 July 2023 / Published: 22 July 2023

(This article belongs to the Topic Applications of Big Data and Machine Learning in Smart Agriculture)


Abstract:

Accurately estimating aboveground dry biomass (ADB) is crucial for monitoring rice growth. The ADB of rice has primarily been estimated using vegetation indices built from several discrete bands; however, these indices cannot take advantage of the continuous bands available with hyperspectral remote sensing. This study analyzed the quantitative relationship between canopy hyperspectral characteristic parameters (HCPs) and the ADB of rice. Twenty HCPs were used, including red edge area (SDr), blue edge area (SDb), and others. The variable-screening methods were stepwise regression (SR), regression coefficient (RC), variable importance in projection (vip), and random forest (RF). Stepwise and partial least squares regression were employed as traditional linear regression methods, alongside machine learning methods including random forest (RF), a support vector machine (SVM), a BP artificial neural network (BPNN), and an extreme learning machine (ELM). Whole- and screening-variable models were constructed to estimate rice ADB at the jointing, booting, heading, and maturing stages and across growth stages. Screening-variable models include SVM models based on SR (SVM-sr), RF models based on vip (RF-vip), and others. The results show that the HCPs significantly correlated with ADB contained elements of the red edge region, namely SDr, SDr/SDb, and (SDr − SDb)/(SDr + SDb), at each growth stage. In addition, the screening performance of vip and SR was better than that of RC and RF, and fewer variables were screened. Moreover, HCPs of the red edge region were selected by the different screening methods at each growth stage. Among them, SDr/SDb and (SDr − SDb)/(SDr + SDb) appeared frequently, indicating that they are important. Furthermore, at each growth stage, ADB could be well estimated using diverse models, with the RF modeling method based on vip screening variables found to be the best modeling method for ADB estimation; the independent variables of the RF-vip model included (SDr − SDb)/(SDr + SDb) at each growth stage.

Keywords: rice; aboveground dry biomass; hyperspectral characteristic parameter; machine learning; model

1. Introduction

Rice, as one of the major food crops in the world, meets many of the food needs of more than half of the human population [1,2], so the production of rice is essential to ensure global food security. Monitoring rice growth is an important means of improving rice yield and is the basis for making field-management decisions [2]. Aboveground biomass is one of the main factors determining the economic yield of crops [3], and can reflect the growth status of crops [4]. Therefore, monitoring aboveground biomass is an important prerequisite for judging the growth status of rice. Traditional methods for monitoring the aboveground biomass of a crop are based on field sampling; despite having high accuracy, field sampling is time-consuming, labor-intensive, time-sensitive, and destructive to the sampling site [5]. Using non-contact sensing technology can avoid the shortcomings of traditional methods and provide representative results [6]. For example, remote sensing technology enables real-time, non-destructive data collection and has been widely used to obtain crop biomass information.

Vegetation indices calculated from remote sensing data have been used to estimate the aboveground biomass of various crops [7]. Traditional vegetation indices based on wide bands, such as the normalized difference vegetation index (NDVI), are susceptible to interference from the soil background and atmospheric environment, while a saturation effect occurs when the amount of biomass is high [8]. Although using a narrow-band vegetation index calculated based on a broad-band vegetation index formula has improved the performance of remote sensing when compared with a broad-band vegetation index by screening the best band combination [9], only a few individual bands of hyperspectral data have been used in past research. The best band combination of the same vegetation index varies from region to region for the same physiological and biochemical parameters, resulting in a single vegetation index not being universally applicable in different regions [10,11]. Therefore, trying other kinds of spectral parameters to estimate aboveground biomass may avoid the defects caused by using a vegetation index.

When compared with multispectral remote sensing, the technique of hyperspectral remote sensing has the characteristic of high spectral resolution while having a continuous band with a narrow band interval, so it can display the typical spectral characteristics of ground objects [12]. The characteristic band regions of green vegetation are the blue edge, green peak, yellow edge, and the red absorption valley of the visible light band and the red edge from the low red reflectance to the high infrared reflectance [12]. Using the parameters extracted from the characteristic band regions to obtain the crop growth status can take full advantage of hyperspectral datasets when compared with the data available in a vegetation index [13], and some studies have shown that the estimation performance of the characteristic parameters is better than that of a vegetation index [14]. The red edge position in the red edge parameter, as one of the earliest characteristic parameters derived from hyperspectral remote sensing [15], has been used to invert the chlorophyll content of vegetation [16,17], the leaf area index [18], mineral nutrition [14,19], and biomass [20,21] because of its insensitivity to the light and soil background [22]. Red edge amplitude, red edge area, blue edge, green peak, yellow edge, and red valley parameters were subsequently developed and used in the inversion of vegetation growth status [14,20,23,24,25]. Nonetheless, when compared with the use of a vegetation index, the application of characteristic parameters derived from hyperspectral remote sensing is still rare, especially in the estimation of crop biomass.

Identification of an appropriate modeling method is another important step when constructing aboveground biomass models [26]. From simple regression [27] to multiple regression [28] to machine learning (ML) methods [29], various methods have been used to estimate the aboveground biomass of crops. Among them, the amount of sensitive information in the model is determined by the number of independent variables (NIV), so the estimation performance of multiple linear regression and stepwise regression (SR) has been proven to be better than simple regression [9,28]. Meanwhile, any multicollinearity between independent variables may weaken the estimation performance of a model to some extent [30,31]. Partial least squares regression (PLSR), which combines principal component analysis, canonical correlation analysis, and multiple linear regression, can solve the problem of multiple collinearities between independent variables to some extent [32,33]. That is, PLSR is a more practical linear regression analysis method [34,35]. In contrast, regression methods based on ML, such as random forest [36], support vector machines [37], and artificial neural networks [38], are not affected by linear regression assumptions and can describe the nonlinear mapping relationship between independent and dependent variables [39]. The modeling method with the best estimation performance is usually determined by the specific study area and type of remote sensing data employed [37,40]. It is not clear which method can produce the best estimation results, so it is necessary to compare different modeling methods [26].

At present, some methods for estimating the aboveground biomass of rice using hyperspectral remote sensing have been reported. Gnyp et al. [9] reported that using a soil-adjusted vegetation index and an optimized narrow-band vegetation index (such as ratio vegetation index and NDVI) can improve the estimation performance of the aboveground dry biomass (ADB) of rice in each growth period, but they modeled with only a univariate regression method and did not involve hyperspectral characteristic parameters (HCPs). Wang et al. [20] screened the HCPs that were significantly correlated with the aboveground fresh biomass of rice and conducted univariate linear and nonlinear regression across the whole growth period, but their work did not address dry biomass, individual growth stages, or ML methods. Dong et al. [41] used an SR method to screen vegetation indices and combined this with a support vector machine (SVM) to estimate the fresh biomass of rice at all growth stages, but likewise did not address dry biomass, individual growth periods, or hyperspectral characteristic parameters. In conclusion, HCPs and ML methods have rarely been used to estimate the ADB of rice in each growth stage.

Therefore, the aims of this study were to: (1) analyze the relationship between the HCPs and ADB of rice; (2) compare the estimation performance of different ADB models (machine learning models and traditional linear regression); (3) determine the best inversion model for each growth stage; and (4) determine whether a general model for each growth stage can be established.

2. Materials and Methods

2.1. Study Site

Field experiments were conducted in Jiuzhou Town (107°46′44″ E, 26°59′17″ N), Huangping County, Guizhou Province, southwest China (Figure 1) from April to September in 2020 and 2021. This site was situated in a subtropical monsoon climate zone with an elevation of 701 m, an average annual temperature of 15.7 °C, an average annual precipitation of 1200 mm, and a frost-free period of 296 days. The experiment site had an average soil pH of 4.98 (1:2.5 soil/water), organic matter of 20.85 g kg⁻¹, total N of 2.51 g kg⁻¹, total K of 13.28 g kg⁻¹, total P of 0.43 g kg⁻¹, alkali-hydrolyzable N of 107.21 mg kg⁻¹, exchangeable K of 69.48 mg kg⁻¹, and Olsen-P of 3.13 mg kg⁻¹.

2.2. Experimental Design

The three cultivars used were Qyou 6 (Chongqing Zhong Yi Seed Co., Ltd., Chongqing, China), Yixiangyou 2115 (Sichuan Lv Dan Seed Co., Ltd., Chengdu, China), and Huanghuazhan (Hunan Jin Se Nong Feng Co., Ltd., Changsha, China). The fertilizers used were urea (containing 46.2% N) (Chongqing Jianfeng Chemical Co., Ltd., Chongqing, China), triple superphosphate (containing 16% P₂O₅) (Guizhou Fuquan Phosphate Fertilizer Co., Ltd., Fuquan, China), and potassium chloride (containing 60% K₂O) (CNAMPGC Holding Co., Ltd., Beijing, China).

Experiments were conducted using a split-plot design with three replications. The three cultivars were assigned to the main plots. Nitrogen application rates of 0, 75, 150, 225, and 300 kg ha⁻¹ were applied in individual subplots having a size of 25.84 m² (6.8 m long, 3.8 m wide) to obtain a large range of ADBs. Urea was broadcast as N fertilizer and was split-applied as 35% basal, 20% at 7 days after transplanting, 30% at the panicle initiation stage, and the remainder at the booting stage. For all treatments, 96 kg P₂O₅ ha⁻¹ and 67.5 kg K₂O ha⁻¹ were applied as basal fertilizers before transplanting, and 67.5 kg K₂O ha⁻¹ was applied at the panicle initiation stage. Pre-germinated rice seeds of each cultivar were sown in a seedbed on 21 April 2020 and 19 April 2021; seedlings were transplanted at a spacing of 0.2 m by 0.3 m with one plant per hill on 27 May 2020 and 29 May 2021. Other field-management practices such as irrigation and pesticide application were conducted in accordance with high-yield cultivation management measures.
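The fertilizer arithmetic described above can be checked with a short sketch. The split fractions, urea N content, and subplot size come from the text; the function name and the choice of the 150 kg ha⁻¹ treatment as the example are illustrative assumptions.

```python
# Illustrative arithmetic only; names are hypothetical, not from the paper.
PLOT_AREA_M2 = 6.8 * 3.8      # subplot size from the text (25.84 m^2)
UREA_N_FRACTION = 0.462       # urea containing 46.2% N
SPLITS = {
    "basal": 0.35,
    "7 days after transplanting": 0.20,
    "panicle initiation": 0.30,
}
# The text gives the final split only as "the remainder".
SPLITS["booting"] = 1.0 - sum(SPLITS.values())


def urea_per_subplot(n_rate_kg_ha):
    """Return kg of urea per subplot for each split at a given N rate."""
    n_per_plot = n_rate_kg_ha * PLOT_AREA_M2 / 10_000  # kg N per subplot
    return {stage: n_per_plot * fraction / UREA_N_FRACTION
            for stage, fraction in SPLITS.items()}


doses = urea_per_subplot(150)
for stage, kg in doses.items():
    print(f"{stage}: {kg:.3f} kg urea")
```

The remainder works out to 15% of the total N, applied at the booting stage.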

2.3. Measurement Methods

2.3.1. Canopy Spectral Reflectance Measurement

Rice canopy reflectance was measured using a FieldSpec® 4 Standard-Res portable spectroradiometer (Analytical Spectral Devices Inc., Boulder, CO, USA). This type of spectroradiometer can acquire reflectance data at wavelengths of 350–2500 nm, with a 1.4 nm sampling interval between 350 nm and 1000 nm and a 2 nm sampling interval between 1001 nm and 2500 nm. Hyperspectral data were resampled to 1 nm intervals using the built-in interpolation method of the ASD spectroradiometer.

Measurements were obtained from 10 a.m. to 3 p.m. (Beijing time) on clear days. The reflectance was obtained with a 25° field of view at a height of approximately 0.75 m above the crop canopy, resulting in a sample area of 0.09 m² with a 0.33 m diameter at the canopy surface. Calibration measurements were performed with a reference panel at least every 10–15 min to eliminate the effects of environmental changes. The reflectance was measured at the growth stages of jointing (JS), booting (BS), heading (HS), and maturing (MS). Ten sample counts were collected for each scanning position of the rice canopy. Within one subplot, five scanning positions were selected randomly, and the reflectance was averaged to represent each subplot.
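The stated footprint follows from simple geometry, as the sketch below shows. It assumes a nadir-viewing sensor with a circular field of view; the function name is hypothetical.

```python
import math


def footprint(height_m, fov_deg):
    """Diameter (m) and area (m^2) of the circular footprint of a nadir-viewing sensor."""
    diameter = 2.0 * height_m * math.tan(math.radians(fov_deg / 2.0))
    area = math.pi * (diameter / 2.0) ** 2
    return diameter, area


diameter, area = footprint(0.75, 25.0)
print(f"diameter = {diameter:.2f} m, area = {area:.2f} m^2")
```

With a 25° field of view at 0.75 m, this reproduces the 0.33 m diameter and 0.09 m² sample area given in the text.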

2.3.2. Plant Sampling and Measurements

In addition to the reflectance measurements, four hills of the scanned subplots with an average number of tillers per subplot were cut at the ground surface. All plant samples were rinsed with water, and the roots were removed. The samples were then separated into stalk sheaths, leaves, and panicles. The dry weights of these plant organs were determined by oven-drying at 80 °C to constant weight after deactivating the enzymes at 105 °C for 30 min in the oven. Next, the total ADB (Mg ha⁻¹) was calculated. A total of 360 samples of ADB were collected from the JS to the MS.
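The paper does not spell out how the dry weight of four hills was scaled to Mg ha⁻¹. One plausible conversion, sketched below, assumes that each hill represents the 0.2 m × 0.3 m ground area implied by the transplanting spacing; the function name and the 240 g example value are hypothetical.

```python
HILL_AREA_M2 = 0.2 * 0.3  # ground area per hill, assumed from the transplanting spacing


def adb_mg_per_ha(dry_weight_g, n_hills=4):
    """Convert the total dry weight (g) of the sampled hills to Mg ha^-1."""
    g_per_m2 = dry_weight_g / (n_hills * HILL_AREA_M2)
    return g_per_m2 * 10_000 / 1_000_000  # g m^-2 -> Mg ha^-1


example = adb_mg_per_ha(240.0)  # made-up sample weight
print(f"{example:.2f} Mg ha^-1")
```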

2.4. Data Analysis

2.4.1. Data Preprocessing of Hyperspectral Data

The reflectance spectra were analyzed using ViewSpecPro software Version 6.2.0 (Analytical Spectral Devices Inc.) to obtain averaged raw spectral reflectance, which was then smoothed using the Savitzky–Golay digital filter available in Matlab R2018a with a frame size of 15 data points (second-degree polynomial); then, the first derivative was calculated.
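The preprocessing steps can be sketched in plain Python as follows. This is not the Matlab routine used in the study: it uses the closed-form Savitzky–Golay weights for a quadratic fit, copies the first and last seven points unchanged (an edge-handling assumption; library implementations typically fit a polynomial at the ends), and takes the first derivative by central differences, which is also an assumption because the paper does not state how the derivative was computed.

```python
def savgol_quadratic_weights(half_width):
    """Closed-form Savitzky-Golay smoothing weights for a quadratic fit."""
    m = half_width
    denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    return [3 * (3 * m * m + 3 * m - 1 - 5 * k * k) / denom
            for k in range(-m, m + 1)]


def smooth(values, frame=15):
    """Smooth interior points; the first and last frame // 2 points are copied."""
    m = frame // 2
    weights = savgol_quadratic_weights(m)
    out = list(values)
    for i in range(m, len(values) - m):
        out[i] = sum(w * values[i + k - m] for k, w in enumerate(weights))
    return out


def first_derivative(values, step_nm=1.0):
    """Central-difference first derivative; one-sided at the two ends."""
    n = len(values)
    deriv = [0.0] * n
    for i in range(1, n - 1):
        deriv[i] = (values[i + 1] - values[i - 1]) / (2 * step_nm)
    deriv[0] = (values[1] - values[0]) / step_nm
    deriv[-1] = (values[-1] - values[-2]) / step_nm
    return deriv


# Synthetic reflectance at 1 nm spacing, for illustration only.
reflectance = [0.05 + 0.0004 * x + 0.00001 * x * x for x in range(60)]
derivative = first_derivative(smooth(reflectance, frame=15))
```

A quadratic filter leaves a quadratic signal unchanged at interior points, which gives a quick sanity check on the weights.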

2.4.2. Extraction of HCPs

Hyperspectral characteristic parameters refer to the spectral parameters that are based on spectral position characteristics, namely blue edge, yellow edge, red edge, green peak, and red valley. The definition of HCPs is shown in Table 1. Meanwhile, we selected two classic vegetation indices (VIs), normalized-difference red edge index (Nre) [42] and red-edge chlorophyll index (Rec) [43], to compare with HCPs. HCPs and VIs are hereinafter collectively referred to as parameters.
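As a minimal sketch of how edge parameters of this kind are typically extracted from a first-derivative spectrum, the example below computes the position, amplitude, and area of the blue and red edges and then the ratio and normalized forms SDr/SDb and (SDr − SDb)/(SDr + SDb). The band limits (490–530 nm and 680–760 nm) are common conventions and an assumption here, since Table 1 is not reproduced in this text; the function names and the synthetic spectrum are hypothetical.

```python
# Band limits are common conventions and an ASSUMPTION; the paper's Table 1
# (not reproduced here) gives the definitions actually used.
BLUE_EDGE_NM = (490, 530)
RED_EDGE_NM = (680, 760)


def edge_parameters(wavelengths_nm, first_deriv, band):
    """Return (position, amplitude, area) of the first derivative within a band."""
    low, high = band
    in_band = [(wl, d) for wl, d in zip(wavelengths_nm, first_deriv)
               if low <= wl <= high]
    position, amplitude = max(in_band, key=lambda pair: pair[1])
    area = sum(d for _, d in in_band)  # sum of derivative values at 1 nm spacing
    return position, amplitude, area


def ratio_and_normalized(sd_r, sd_b):
    """Return SDr/SDb and (SDr - SDb)/(SDr + SDb)."""
    return sd_r / sd_b, (sd_r - sd_b) / (sd_r + sd_b)


# Synthetic first-derivative spectrum, for illustration only.
wavelengths = list(range(350, 1001))
deriv = [0.0] * len(wavelengths)
for i, wl in enumerate(wavelengths):
    if 490 <= wl <= 530:
        deriv[i] = 0.003 if wl == 520 else 0.001
    elif 680 <= wl <= 760:
        deriv[i] = 0.010 if wl == 720 else 0.002

red_pos, d_r, sd_r = edge_parameters(wavelengths, deriv, RED_EDGE_NM)
blue_pos, d_b, sd_b = edge_parameters(wavelengths, deriv, BLUE_EDGE_NM)
ratio, normalized = ratio_and_normalized(sd_r, sd_b)
print(red_pos, blue_pos, round(ratio, 3), round(normalized, 3))
```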

2.4.3. Sample Division

The 2 years of data employed here were consolidated into one data set and divided into training and test sets with a ratio of 2:1. More specifically, the sample data set was first arranged in ascending order according to the measured value of ADB of rice. The first and last sorted samples were assigned to the training set. For the intermediate samples, the first of every three samples was assigned to the test set while the next two samples were assigned to the training set.
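The division rule can be sketched as a small function. The function name is hypothetical, and the 90 stand-in values are an inference (360 samples over four stages), used only to show that the rule yields a 2:1 split.

```python
def split_train_test(adb_values):
    """Return (train_idx, test_idx) using the sorted 2:1 scheme described above."""
    order = sorted(range(len(adb_values)), key=lambda i: adb_values[i])
    train = [order[0], order[-1]]  # smallest and largest ADB go to training
    test = []
    for pos, idx in enumerate(order[1:-1]):
        # First of every three intermediate samples -> test, next two -> training.
        (test if pos % 3 == 0 else train).append(idx)
    return sorted(train), sorted(test)


# Stand-in ADB values in scrambled order, for illustration only.
values = [(i * 37) % 90 for i in range(90)]
train_idx, test_idx = split_train_test(values)
print(len(train_idx), len(test_idx))
```

Keeping the extreme samples in the training set means the test set never requires extrapolation beyond the range the model was fitted on.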

2.5. The Construction Methods of an ADB Model

Multivariate ADB models for each growth stage and across growth stages were constructed, including models based on twenty hyperspectral characteristic parameters and models based on screening variables.

2.5.1. Variable Screening

In spectral data, many variables may be redundant and contribute only interfering information. Variable screening can reduce the number of noisy variables, reduce the complexity of the model, and help to build a stable ADB estimation model. The variable-screening methods used in this study include linear and nonlinear methods: regression coefficient (RC), variable importance in projection (vip), and stepwise regression (SR) are linear methods, whereas random forest (RF) is a nonlinear method.

The RC method eliminates variables using a stepwise process. Initially, a PLSR model based on twenty HCPs was constructed. Subsequently, the hyperspectral characteristic parameters were sorted according to the absolute value of the regression coefficients obtained from the PLSR model. Each time, the HCP with the lowest value was eliminated, and the best combination of independent variables was the one with the largest training determination coefficient during the backward elimination process [44].

In addition, vip is a variable selection method based on the use of a threshold. The vip score of the independent variable is a summary of the projection contribution of each independent variable in PLSR. Variables with a vip score greater than 1 were included in the model [45].
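For reference, a commonly used definition of the vip score is given below. The notation is an assumption, since the paper does not write the formula out: p is the number of independent variables, A is the number of latent variables, w_{ja} is the PLSR weight of variable j on component a, and SS_a is the sum of squares of the dependent variable explained by component a.

```latex
\mathrm{VIP}_{j} = \sqrt{\frac{p \sum_{a=1}^{A} SS_{a} \left( w_{ja} / \lVert \mathbf{w}_{a} \rVert \right)^{2}}{\sum_{a=1}^{A} SS_{a}}}
```

Under this definition the squared scores average to 1 across the p variables, so a threshold of 1 retains the variables with an above-average contribution.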

Meanwhile, SR is a method of independent variable selection that combines forward selection and backward elimination, starting with no variables. At each step, the p value of the F test was used to determine whether a new variable should be entered or an existing variable eliminated [46].

Lastly, RF is a machine learning algorithm. The backward feature elimination method was used to eliminate relatively less important variables among all variables, and the most important variables were retained after multiple iterations. When the training root mean square error was minimized, the most accurately estimated variable combination could be obtained [47].
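The RC and RF screening procedures share the same backward-elimination loop, which can be sketched generically as below. The two callables are placeholders for a fitted PLSR or RF model and are not the toolbox calls used in the study; the toy importance values and toy error function are made up purely to exercise the loop.

```python
def backward_elimination(variables, importance_fn, score_fn, higher_is_better=True):
    """Repeatedly drop the least important variable and keep the best subset seen.

    importance_fn(subset) -> {name: importance}; score_fn(subset) -> float.
    Both are hypothetical stand-ins for refitting a model on the subset.
    """
    current = list(variables)
    best_subset, best_score = list(current), score_fn(current)
    while len(current) > 1:
        importance = importance_fn(current)
        current.remove(min(current, key=lambda v: importance[v]))
        score = score_fn(current)
        improved = score > best_score if higher_is_better else score < best_score
        if improved:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-ins, for illustration only.
toy_importance = {"SDr": 4.0, "SDb": 3.0, "Dr": 2.0, "Dy": 1.0}


def toy_importance_fn(subset):
    return {name: toy_importance[name] for name in subset}


def toy_rmse(subset):
    return abs(len(subset) - 2) + 0.5  # pretends two variables is optimal


subset, score = backward_elimination(
    list(toy_importance), toy_importance_fn, toy_rmse, higher_is_better=False)
print(subset, score)
```

Setting `higher_is_better=True` with a training R² scorer corresponds to the RC criterion, while `higher_is_better=False` with a training RMSE scorer corresponds to the RF criterion.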

2.5.2. Regression Methods

(1): Traditional linear methods

Stepwise regression (SR) is a method for fitting regression models, and its process is described in Section 2.5.1.

Partial least squares (PLS) is a commonly and widely used multivariate linear quantitative analysis method [32] which projects the original variable to a new dimension with the greatest change and uses the dependent variable to regress the latent variable [48]. The data were normalized before modeling, and the optimal number of latent variables was determined with cross-validation.

(2): Machine learning methods

Random forest (RF) is an ensemble learning method [49]. First, bootstrap samples are drawn at random from the training data set, and each tree is grown on one of these samples using a deterministic algorithm. Then, a random subset of variables is considered for the split at each node, and the out-of-bag samples are used to estimate the error [50]. Finally, the results of the individual regression trees are combined to generate predictive values. This study used the Matlab “Windows-Precompiled-RF_mexstandalone-v0.02” toolbox interface for RF regression.

An SVM is a general ML algorithm invented by Cortes and Vapnik [51]. In this study, the libsvm 3.25 toolbox [52] was used to optimize the penalty parameter C and the kernel function parameter g through grid search and cross-validation with the radial basis function as the kernel function.

A back propagation artificial neural network (BPNN) is a multi-layer network first proposed by Werbos [53]. In this method, the initial weights and thresholds are randomly determined, and the network error and structure are optimized by the gradient descent of the BP network. The structure of a BP neural network consists of input, output, and hidden layers. The most important parameter in a neural network regression model is the number of hidden layer neurons, which is determined via a process of trial and error [54].

An extreme learning machine (ELM) is a feedforward neural network based on a single hidden layer [55]. The input weights and hidden layer deviations of an ELM are randomly assigned, and the sigmoid function is used as the activation function. The number of hidden nodes is determined by trial and error.

2.6. The Evaluation of the Hyperspectral Model

The determination coefficient (R²) and root mean square error (RMSE) were used to evaluate the model fitting performance and estimation performance. Akaike information criterion (AIC) and Bayesian information criterion (BIC) [56] were used to evaluate the simplicity and accuracy of the model. The calculation formula is as follows:

R^{2} = 1 - \frac{\sum (y_{i} - \hat{y}_{i})^{2}}{\sum (y_{i} - \bar{y})^{2}}

SSE = \sum (y_{i} - \hat{y}_{i})^{2}

RMSE = \sqrt{\frac{\sum (y_{i} - \hat{y}_{i})^{2}}{n}}

AIC = n \ln (SSE / n) + 2 K

BIC = n \ln (SSE / n) + K \ln (n)

where y_{i}, \hat{y}_{i}, and \bar{y} are the measured values, the predicted values, and the mean of the measured values, respectively; n is the number of samples; SSE represents the residual sum of squares; and K is the number of variables in the model.
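A direct transcription of these evaluation formulas into Python is sketched below. The function names and the four made-up measured and predicted values are illustrative assumptions.

```python
import math


def sse(y, y_hat):
    """Residual sum of squares."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))


def r_squared(y, y_hat):
    mean_y = sum(y) / len(y)
    total = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - sse(y, y_hat) / total


def rmse(y, y_hat):
    return math.sqrt(sse(y, y_hat) / len(y))


def aic(y, y_hat, k):
    n = len(y)
    return n * math.log(sse(y, y_hat) / n) + 2 * k


def bic(y, y_hat, k):
    n = len(y)
    return n * math.log(sse(y, y_hat) / n) + math.log(n) * k


# Made-up values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(measured, predicted), rmse(measured, predicted))
print(aic(measured, predicted, k=2), bic(measured, predicted, k=2))
```

Because both criteria share the term n ln(SSE/n), they differ only in the penalty on K, and BIC penalizes extra variables more heavily than AIC once ln(n) exceeds 2.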

3. Results and Analysis

3.1. Variations in Rice ADB

The rice ADB increased from 1.63 to 24.07 Mg ha⁻¹ across the rice growth stages (Table 2). At each stage, the mean and standard deviation of the training set and the test set were consistent. The training and test sets exhibited a similar statistical distribution of ADB, avoiding potentially biased estimations in model construction and testing. The data variability at the JS and all stages (AS) was large (31.25% and 56.16%, respectively), while the data variability at the BS, HS, and MS (14.28%, 14.85%, and 16.23%, respectively) was low.

3.2. The Relationship between HCPs and ADB

Based on the training sets of different growth stages, the linear correlation and curve correlation analysis between each parameter and ADB were carried out. Correlation analysis showed that the best relationship between each parameter and ADB was nonlinear at different growth stages (Figure 2).

The explained degree (ED) of parameters on the ADB varied with the growth stages, and the overall ED was not high. From the JS to AS, the explained ranges were 0.4–50.1% (Figure 2a), 1.7–20.4% (Figure 2b), 2.6–16.9% (Figure 2c), 0.4–53.2% (Figure 2d), and 7.0–67.7% (Figure 2e), respectively. The HCPs with the highest ED for ADB at each growth stage were Rrb, λo, ρr, Nrb, and λg, respectively. The HCPs with a high ED for ADB at each growth stage were SDr, Rrb, and Nrb. In addition, the R² of Nrb at each growth stage reached an extremely significant level (p < 0.01), which was better than the Nre and Rec on the whole.

The ED of HCPs for the ADB varied among the various parameters. In general, except for Db, Dy, SDy, SDr, λo, Rry, and Nry, the ED of other HCPs to biomass was better in the AS than in every other growth period. Among the amplitude parameters (Db, Dy, and Dr), Dr had the highest ED across growth stages. The ED of the position parameters (λb, λy, and λr) of the first derivative to the ADB was consistent in each growth stage. In the area parameters (SDb, SDy, SDr, and SDg), SDr had the highest ED for each growth stage except for the AS. The ED of reflectance parameters (ρg and ρr) for the ADB was higher in ρr at each growth stage. The ED of position parameters (λg and λo) of the original spectra for ADB was different at each growth stage. In the ratio and normalized parameters, the EDs of normalized parameters at each growth stage were similar to or better than those of the ratio parameters. The ratio and normalized parameters of SDb and SDr at each growth stage were better than those of SDb and SDr; whether the ratios and normalized parameters of SDy and SDr and ρg and ρr were better than themselves varied with the growth stage.

3.3. Screening of HCPs

3.3.1. Variable Screening Based on the RC

The RC-based variable screening results are shown in Table 3, where the variable names are sorted from large to small according to the absolute value of the RC. The top variables varied with the growth stage. The same variables were screened at different growth stages, namely SDr, Dr, Rrb, ρr, Nrb, Nry, and λo.

3.3.2. Variable Screening Based on vip

The importance of variables based on the vip and the results of the variable screening are shown in Table 4 and Figure 3, respectively. The selected variables were ranked from large to small according to the vip.

The number of variables screened based on the vip was small. The (SDr − SDb)/(SDr + SDb) was screened at each growth stage, and it was ranked first at each growth stage.

3.3.3. Variable Screening Based on SR

The SR-based variable screening results are shown in Table 5. Although the screened variables differed among growth stages, Rrb (screened at the JS, MS, and AS), λo (screened at the BS), and ρr (screened at the HS) are all related to the red edge region.

3.3.4. Variable Screening Based on RF

The results of variable screening based on RF are shown in Table 6, where the selected variables are ranked from large to small according to their RF Importance score. Similar to the results of the linear-screening method, the results of variable screening based on RF varied in the growth period, but the same variables were screened at each growth period, specifically Rrb, Nrb, ρr, ρg, SDr, Ngr, SDg, Rgr, Dy, λo, and Nry. Among them, the Rrb and Nrb were at the top of the order of variable importance in each growth period with very high importance.

3.4. Construction and Application of the ADB Model Based on Parameters

3.4.1. The Performance Evaluation Results of the ADB Model on the Training Set

The performance evaluation results of the ADB model on the training set at different growth stages are shown in Figure 4 and Figure 5.

The performance of the AS models based on HCPs was the best, with R² values of 0.77–0.97 (Figure 4e). The performance of the models at the JS (R² 0.59–0.88) (Figure 4a) and MS (R² 0.53–0.89) (Figure 4d) ranked second, while the performance of the models at the BS and HS was not high, with R² values of 0.16–0.85 (Figure 4b) and 0.17–0.84 (Figure 4c), respectively. The model with the best performance at each growth stage was the model constructed with the RF method. For example, the R² of the RF-rf model at the JS was as high as 0.88.

In terms of models based on VIs, the R² at JS, BS, HS, MS, and AS was 0.48–0.84 (Figure 5a), 0.11–0.68 (Figure 5b), 0.12–0.68 (Figure 5c), 0.50–0.85 (Figure 5d), and 0.60–0.92 (Figure 5e), respectively, which was inferior to the models based on HCPs. In addition, the RMSEs at JS, BS, HS, MS, and AS were 0.40–0.72 (Figure 5a), 0.54–0.89 (Figure 5b), 0.75–1.25 (Figure 5c), 1.02–1.83 (Figure 5d), and 1.37–3.07 (Figure 5e), respectively, which exceeded the RMSEs (0.34–0.64, 0.37–0.87, 0.54–1.21, 0.86–1.77, and 0.88–2.33 at JS, BS, HS, MS, and AS, respectively) of models based on HCPs.

3.4.2. The Performance Evaluation Results of the ADB Model on the Test Set

The performance evaluation results of the ADB model on the test set at different growth stages are shown in Figure 6 and Figure 7. Overall, the performance of the AS and JS models based on HCPs was the best, with R² values of 0.80–0.88 (Figure 6e) and 0.49–0.76 (Figure 6a), respectively. The performance of the model at the MS (R² 0.30–0.47) (Figure 6d) was second, while the performances of the models at the BS and HS were not high where the R² was 0.01–0.36 (Figure 6b) and 0.05–0.49 (Figure 6c), respectively.

For the models based on VIs, the R² at JS, BS, HS, MS, and AS was 0.06–0.60 (Figure 7a), 0.12–0.22 (Figure 7b), 0.01–0.13 (Figure 7c), 0.18–0.49 (Figure 7d), and 0.61–0.72 (Figure 7e), respectively, which was inferior to the models based on HCPs as a whole. Furthermore, the RMSEs at JS, BS, HS, MS, and AS were 0.63–127.57 (Figure 7a), 0.87–0.92 (Figure 7b), 1.31–3.65 (Figure 7c), 1.83–4.68 (Figure 7d), and 2.74–3.17 (Figure 7e), respectively, which exceeded the RMSEs (0.48–0.95, 0.77–2.51, 1.02–1.79, 1.86–2.90, and 1.78–2.20 at JS, BS, HS, MS, and AS, respectively) of models based on HCPs in general.

The performance of the linear models based on HCPs was the best at the JS and AS; R² was 0.49–0.70 and 0.80–0.87, respectively (Figure 6). The performance of the MS (R² 0.40–0.45) was second, while the performance of the BS and HS was not high, with R² values of 0.19–0.29 and 0.07–0.19, respectively. The SR model was slightly better than the PLS model at the JS, BS, HS, and MS, while the opposite was true at the AS. Except for the AS, the PLS-rc model was slightly lower than the PLS-sr model.

For machine learning models based on HCPs, the overall performance decreased in the order AS, JS, MS, HS, and BS (Figure 6). The performance of the various machine learning models at the AS and JS was better than that at the BS, HS, and MS. The best machine learning method at the JS was the support vector machine with an R² above 0.70, followed by random forest with an R² above 0.60. The models performed well at the BS and HS, except for the BPNN, BPNN-vip, and BPNN-rf models at the BS and the RF-sr, SVM-sr, BPNN, BPNN-vip, and BPNN-sr models at the HS. The R² of the machine learning models was 0.30–0.47 at the MS and 0.81–0.88 at the AS.

3.4.3. The Determination of the Appropriate Model Based on HCPs

To further compare the pros and cons between machine learning models and traditional linear models, we used Taylor diagrams to determine the most appropriate model.

Figure 8 shows a statistical comparison of the estimation performance of the ADB models at each growth stage. In the Taylor diagram, the closer the model is to the observation point, the better the estimation performance is. It can be seen that other models performed better than BPNN-sr, BPNN-rf, BPNN-rc, BPNN, ELM, ELM-rc, ELM-vip, and PLS-vip at the JS (Figure 8a). The models with relatively better performance at the BS were SR, PLS-sr, RF-sr, RF-vip, and ELM-sr (Figure 8b). The models with relatively better performance at the HS were RF, RF-rc, RF-vip, and RF-rf (Figure 8c). Other models had better performance at the MS than ELM-rf, ELM-sr, ELM, ELM-vip, BPNN, BPNN-rf, and BPNN-vip (Figure 8d). The performance of the models at the AS was the best (Figure 8e).
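The distance to the observation point in a Taylor diagram summarizes three statistics at once: the standard deviations of the observed and predicted values, their correlation, and the centered root mean square difference. The sketch below computes them for made-up ADB values (the function name and the data are hypothetical) and relies on the identity E′² = σ_o² + σ_p² − 2σ_oσ_pR that underlies the diagram's geometry.

```python
import math


def taylor_statistics(observed, predicted):
    """Return (sd_obs, sd_pred, correlation, centered_rmsd)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    dev_o = [o - mean_o for o in observed]
    dev_p = [p - mean_p for p in predicted]
    sd_o = math.sqrt(sum(d * d for d in dev_o) / n)
    sd_p = math.sqrt(sum(d * d for d in dev_p) / n)
    corr = sum(a * b for a, b in zip(dev_o, dev_p)) / (n * sd_o * sd_p)
    crmsd = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_o, dev_p)) / n)
    return sd_o, sd_p, corr, crmsd


# Made-up ADB values (Mg ha^-1), for illustration only.
obs = [2.1, 4.8, 9.5, 14.2, 20.3]
pred = [2.6, 4.1, 10.2, 13.0, 19.1]
sd_o, sd_p, corr, crmsd = taylor_statistics(obs, pred)
print(f"sd_obs={sd_o:.2f} sd_pred={sd_p:.2f} r={corr:.3f} crmsd={crmsd:.2f}")
```

A model plots close to the observation point only when its correlation is high and its standard deviation is close to that of the measurements, which is why the diagram separates models that R² alone would rank similarly.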

Based on the analysis results of the Taylor diagram, the best ADB estimation model for each growth stage was determined according to the training set R², NIV, and the AIC/BIC method (Figure 4, Figure 5, Figure 6 and Figure 7). The best ADB estimation model was taken to be the one with a larger training set R², fewer independent variables, and lower AIC/BIC values.

The results showed that the estimation performance of the SVM-sr model at the JS was the best (R² 0.76), but its fitting performance was poor (R² 0.64). The RF-vip model had the highest fitting performance (R² 0.87) with a small NIV (eight), good estimation performance (R² 0.63), and low AIC and BIC values (−12.41 and 0.20, respectively).

The RF-vip model at the BS had the best estimation performance (R² 0.33) while ensuring a high fitting performance (R² 0.64). At the same time, the NIV was small, and the AIC and BIC values (−4.64, 2.36) were also low.

The RF-vip model at the HS had a high fitting performance (R² 0.81), good estimation performance (R² 0.41), and low AIC and BIC values (23.96, 37.97), while the NIV (nine) was small.

At the MS, the SVM-sr model had a slightly better estimation performance (R² 0.43), but its fitting performance (R² 0.53) was not high, and its NIV (two) was too small. Although the NIV of the RF-vip model (seven) was higher than that of SVM-sr, its fitting performance was very high (R² 0.86), its estimation performance was relatively high (R² 0.40), and its AIC and BIC values (56.04, 67.25) were lower.

At the AS, the NIV of the BPNN-sr and RF-vip models was relatively low (10 and 9, respectively). The RF-vip model had the highest fitting performance (R² 0.96), and the estimation performance was relatively good (R² 0.83). The AIC and BIC values (193.05, 220.93) were also lower.

Based on the above analysis, the best model at every growth stage was the RF model built on the variables screened by vip (RF-vip).

4. Discussion

4.1. Relationship between Hyperspectral Characteristic Parameters and ADB

The present study showed that the HCPs with the highest ED for ADB at the JS, BS, HS, and MS were Rrb, λo, ρr, and Nrb, respectively. All of these are related to the red edge region, which is similar to previous findings [18,20]. A likely reason is that the red edge region is insensitive to soil background and atmospheric effects [18].

In addition, the ED of SDr for ADB was higher than that of SDb and SDy, which is consistent with previous results [20]. The main reason is that the red edge is formed by the strong absorption of chlorophyll near 680 nm and the strong scattering by leaf structure (biomass and leaf area index) near 760 nm [57], so SDr contains both pigment and biomass information [58]. In contrast, SDb and SDy are mainly controlled by pigments and may therefore contain more pigment information and less biomass information than SDr, which would explain their lower ED for ADB.

Bannari et al. [59] pointed out that a vegetation index built from a combination of single bands improves the ED of the dependent variable relative to that of a single band. The same phenomenon was found in the present study: the ratio (Rrb) and the normalized value (Nrb) of SDr and SDb had a higher ED for ADB than SDr or SDb alone. The main reason is that combining different variables can eliminate the interference of environmental noise to a certain extent [60].

4.2. HCP Screening for ADB Estimation

For the screening of spectral data, previous studies mostly used linear methods, such as Pearson correlation [61] and stepwise regression [9,62]. The variables selected by these two methods have a good linear relationship with the outcome, but the best relationship between spectral data and the outcome is not necessarily linear, which limits the ability of the selected spectral data to explain the dependent variable. Therefore, the present study used several variable-screening methods to screen the HCPs.

This study showed that each variable-screening method selected HCPs of the red edge region at every growth stage. For example, the vip-screening method selected Nrb at each growth stage, and the HCPs that appeared most frequently in the screening results were Rrb and Nrb, which is similar to the results of previous studies [20]. This finding indicates that the ratio and the normalized value of SDr and SDb played an important role in the estimation of rice ADB, and it is consistent with their high ED for ADB at each growth stage.
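To make the vip screening step concrete, the sketch below computes variable importance in projection scores from a single-response PLS fit (NIPALS) and keeps the variables whose score exceeds 1, the threshold marked in Figure 3. It is an illustrative sketch, not the authors' implementation: the standardization of the predictors, the number of components, the function names, and the toy data are all assumptions.

```python
import math

def standardize(X):
    """Center each column and scale it to unit standard deviation."""
    n, p = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in X) / (n - 1)) for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in X]

def pls1_vip(X, y, n_components):
    """VIP scores from a PLS1 (single response) NIPALS fit."""
    E = standardize(X)
    n, p = len(E), len(E[0])
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]
    weights, explained = [], []
    for _ in range(n_components):
        # Weight vector: direction of maximum covariance between X and y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        explained.append(q * q * tt)  # y sum of squares explained by this component
    total = sum(explained)
    return [math.sqrt(p * sum(s * w[j] ** 2 for s, w in zip(explained, weights)) / total)
            for j in range(p)]

# Toy data: three hypothetical predictors; y follows the first one closely.
names = ["x1", "x2", "x3"]
X = [[1.0, 0.2, 5.0], [2.0, 0.1, 3.0], [3.0, 0.4, 4.0],
     [4.0, 0.3, 6.0], [5.0, 0.5, 2.0], [6.0, 0.2, 5.5]]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

vip = pls1_vip(X, y, n_components=2)
selected = [name for name, score in zip(names, vip) if score > 1.0]
```

Because each weight vector has unit length, the squared VIP scores always sum to the number of predictors, so a score above 1 marks a variable that contributes more than the average.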

4.3. Evaluation of the ADB Estimation Model

Previous studies have shown that constructing a model from selected variables can improve the estimation performance, simplicity, and practicability of the original model [63,64]. The present study reached similar conclusions. For example, compared with the SVM model, the SVM-sr model at the JS had a smaller NIV, better fitting and estimation performance, and greater simplicity. This occurred mainly because redundant information contains noise, so reducing redundant information can improve the performance of the model.

The results of this study showed that the RF, SVM, BPNN, and ELM machine learning methods combined with appropriate variable-screening methods could improve the performance of the model when compared with traditional linear regression models. This is similar to previous research results [65], indicating that a good nonlinear relationship exists between ADB and HCPs. Among them, the RF model performed best and had the best fitting performance at every growth stage, which is similar to some previous research results [63]. This occurred because a random forest has a good ability to correct data [49].

Although the estimation performance at the BS, HS, and MS was lower than that at the JS, the best model at these three growth stages was still RF-vip rather than a linear model. This indicates that an RF model can reduce the saturation effect caused by canopy closure in the middle and late stages of rice growth, which is consistent with the view of Yang et al. [36]. However, not all machine learning models performed well. For example, the fitting and estimation performance of SVM-sr at the BS was poor, with R² values of 0.16 and 0.20, respectively. This is why we compared different machine learning modeling methods.

Thus far, few studies have used HCPs to construct ADB models at different growth stages [20]. The present study showed that the best ADB model at each growth stage of rice was the RF model built on the variables screened by vip. This indicates that, based on the HCPs, a unified modeling approach with simplicity and high accuracy can be obtained by combining vip variable screening with random forest modeling. This contrasts with the findings of Gnyp et al. [9], who showed that the fixed-band vegetation index, the optimized narrow-band vegetation index, and the six-band best multi-narrowband reflectance could each be used to construct an ADB model for individual growth stages, but that no unified ADB model suitable for every growth stage could be obtained.

5. Conclusions

Based on 20 HCPs and different variable-screening methods, the present study constructed linear regression and machine learning models for rice ADB and evaluated the model accuracy, precision, and simplicity with the following results.

(1): At each growth stage, the hyperspectral characteristic parameters that were significantly related to ADB contained elements in the red edge region, including SDr, Rrb, and Nrb.
(2): The Rrb and Nrb appeared frequently in the variable screening results, indicating that they played an important role in the estimation of rice ADB.
(3): The RF modeling method based on vip screening variables was found to be the best modeling method for estimating ADB in rice. The independent variables of the RF-vip model involved Nrb at each growth stage.

However, the estimation performance of the HCP-based ADB models in the middle and late growth stages was not as good as that in the early growth stage. Further research should identify spectral parameters that are sensitive in the middle and late growth stages of rice and improve the estimation performance of the ADB model over the whole growth period.

Author Contributions

Conceptualization, Y.F. and G.X.; Data curation, X.W., J.P. and Y.G.; Funding acquisition, Y.F.; Investigation, X.W., J.P., Y.G., J.L., Z.H., Q.L., H.R., X.Y. and W.L.; Methodology, X.W. and Y.F.; Software, X.W.; Supervision, G.X. and J.L.; Writing—original draft, X.W.; Writing—review and editing, Y.F. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (32260531), the National Key Research and Development Plan Project Sub Topic of China (2022YFD1901500/2022YFD1901505-07), the Talents Program of High Level and Innovative in Guizhou Province (Grant no. Qiankehe Platform Talents [2018]5632, 5632-2), the Key Laboratory of Molecular Breeding for Grain and Oil Crops in Guizhou Province (Qiankehezhongyindi (2023) 008), and the Key Laboratory of Functional Agriculture of Guizhou Provincial Higher Education Institutions (Qianjiaoji (2023) 007).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. McCouch, S.R.; Doerge, R.W. QTL mapping in rice. Trends Genet. 1995, 11, 482–487.
2. Du, M.; Noguchi, N. Monitoring of Wheat Growth Status and Mapping of Wheat Yield’s within-Field Spatial Variations Using Color Images Acquired from UAV-Camera System. Remote Sens. 2017, 9, 289.
3. Yang, W.; Peng, S.; Laza, R.C.; Visperas, R.M.; Dionisio Sese, M.L. Yield Gap Analysis between Dry and Wet Season Rice Crop Grown under High-Yielding Management Conditions. Agron. J. 2008, 100, 1390–1395.
4. Dhillon, M.S.; Dahms, T.; Kuebert-Flock, C.; Borg, E.; Conrad, C.; Ullmann, T. Modelling Crop Biomass from Synthetic Remote Sensing Time Series: Example for the DEMMIN Test Site, Germany. Remote Sens. 2020, 12, 1819.
5. Boschetti, M.; Bocchi, S.; Brivio, P.A. Assessment of pasture production in the Italian Alps using spectrometric and remote sensing information. Agric. Ecosyst. Environ. 2007, 118, 267–272.
6. Kim, M.; Wang, Q.; Li, H. Non-contact sensing based geometric quality assessment of buildings and civil structures: A review. Autom. Constr. 2019, 100, 163–179.
7. Chao, Z.; Liu, N.; Zhang, P.; Ying, T.; Song, K. Estimation methods developing with remote sensing information for energy crop biomass: A comparative review. Biomass Bioenergy 2019, 122, 414–425.
8. Li, F.; Zhang, H.; Jia, L.; Bareth, G.; Miao, Y.; Chen, X. Estimating winter wheat biomass and nitrogen status using an active crop sensor. Intell. Autom. Soft Comput. 2010, 16, 1221–1230.
9. Gnyp, M.L.; Miao, Y.; Yuan, F.; Ustin, S.L.; Yu, K.; Yao, Y.; Huang, S.; Bareth, G. Hyperspectral canopy sensing of paddy rice aboveground biomass at different growth stages. Field Crops Res. 2014, 155, 42–55.
10. Gonsamo, A. Normalized sensitivity measures for leaf area index estimation using three-band spectral vegetation indices. Int. J. Remote Sens. 2011, 32, 2069–2080.
11. Heiskanen, J.; Rautiainen, M.; Stenberg, P.; Mõttus, M.; Vesanto, V. Sensitivity of narrowband vegetation indices to boreal forest LAI, reflectance seasonality and species composition. ISPRS J. Photogramm. Remote Sens. 2013, 78, 1–14.
12. Pu, R.; Gong, P. Hyperspectral Remote Sensing and Its Applications; Beijing Higher Education Press: Beijing, China, 2000.
13. Verrelst, J.; Malenovský, Z.; Van der Tol, C.; Camps-Valls, G.; Gastellu-Etchegorry, J.; Lewis, P.; North, P.; Moreno, J. Quantifying Vegetation Biophysical Variables from Imaging Spectroscopy Data: A Review on Retrieval Methods. Surv. Geophys. 2019, 40, 589–629.
14. Gong, P.; Pu, R.; Heald, R.C. Analysis of in situ hyperspectral data for nutrient estimation of giant sequoia. Int. J. Remote Sens. 2002, 23, 1827–1850.
15. Gates, D.M.; Keegan, H.J.; Schleter, J.C.; Weidner, V.R. Spectral Properties of Plants. Appl. Opt. 1965, 4, 11–20.
16. Li, L.; Ren, T.; Ma, Y.; Wei, Q.; Wang, S.; Li, X.; Cong, R.; Liu, S.; Lu, J. Evaluating chlorophyll density in winter oilseed rape (Brassica napus L.) using canopy hyperspectral red-edge parameters. Comput. Electron. Agric. 2016, 126, 21–31.
17. Ta, N.; Chang, Q.; Zhang, Y. Estimation of Apple Tree Leaf Chlorophyll Content Based on Machine Learning Methods. Remote Sens. 2021, 13, 3902.
18. Pu, R.; Gong, P.; Biging, G.S.; Larrieu, M.R. Extraction of red edge optical parameters from Hyperion data for estimation of forest leaf area index. IEEE Trans. Geosci. Remote Sens. 2003, 41, 916–921.
19. Feng, W.; Zhu, Y.; Yao, X.; Tian, Y.; Guo, T.; Cao, W. Monitoring nitrogen accumulation in wheat leaf with red edge characteristics parameters. Trans. CSAE 2009, 25, 194–201.
20. Wang, X.; Huang, J.; Li, Y.; Wang, R. Study on hyperspectral remote sensing estimation models for the ground fresh biomass of rice. Acta Agron. Sin. 2003, 5544, 815–821.
21. Li, W.; Dou, Z.; Wang, Y.; Wu, G.; Zhang, M.; Lei, Y.; Ping, Y.; Wang, J.; Cui, L.; Ma, W. Estimation of above-ground biomass of reed (Phragmites communis) based on in situ hyperspectral data in Beijing Hanshiqiao Wetland, China. Wetl. Ecol. Manag. 2019, 27, 87–102.
22. Salisbury, J.W.; Milton, N.M.; Walsh, P.A. Significance of non-isotropic scattering from vegetation for geobotanical remote sensing. Int. J. Remote Sens. 1987, 8, 997–1009.
23. Elvidge, C.D.; Chen, Z. Comparison of broad-band and narrow-band red and near-infrared vegetation indices. Remote Sens. Environ. 1995, 54, 38–48.
24. Zarco-Tejada, P.J.; Miller, J.R.; Noland, T.L.; Mohammed, G.H.; Sampson, P.H. Scaling-up and model inversion methods with narrowband optical indices for chlorophyll content estimation in closed forest canopies with hyperspectral data. IEEE Trans. Geosci. Remote Sens. 2001, 39, 1491–1507.
25. Chen, J.; Li, F.; Wang, R.; Fan, Y.; Raza, M.A.; Liu, Q.; Wang, Z.; Cheng, Y.; Wu, X.; Yang, F.; et al. Estimation of nitrogen and carbon content from soybean leaf reflectance spectra using wavelet analysis under shade stress. Comput. Electron. Agric. 2019, 156, 482–489.
26. Lu, D.; Chen, Q.; Wang, G.; Liu, L.; Li, G.; Moran, E. A survey of remote sensing-based aboveground biomass estimation methods in forest ecosystems. Int. J. Digit. Earth 2016, 9, 63–105.
27. Chen, P.; Haboudane, D.; Tremblay, N.; Wang, J.; Vigneault, P.; Li, B. New spectral indicator assessing the efficiency of crop nitrogen treatment in corn and wheat. Remote Sens. Environ. 2010, 114, 1987–1997.
28. Jayathunga, S.; Owari, T.; Tsuyuki, S. Digital Aerial Photogrammetry for Uneven-Aged Forest Management: Assessing the Potential to Reconstruct Canopy Structure and Estimate Living Biomass. Remote Sens. 2019, 11, 338.
29. Zhu, W.; Sun, Z.; Peng, J.; Huang, Y.; Li, J.; Zhang, J.; Yang, B.; Liao, X. Estimating Maize Above-Ground Biomass Using 3D Point Clouds of Multi-Source Unmanned Aerial Vehicle Data at Multi-Spatial Scales. Remote Sens. 2019, 11, 2678.
30. Ray-Mukherjee, J.; Nimon, K.; Mukherjee, S.; Morris, D.W.; Slotow, R.; Hamer, M. Using commonality analysis in multiple regressions: A tool to decompose regression effects in the face of multicollinearity. Methods Ecol. Evol. 2014, 5, 320–328.
31. Sun, J.; Zhou, X.; Hu, Y.; Wu, X.; Zhang, X.; Wang, P. Visualizing distribution of moisture content in tea leaves using optimization algorithms and NIR hyperspectral imaging. Comput. Electron. Agric. 2019, 160, 153–159.
32. Wold, S.; Sjöström, M.; Eriksson, L. PLS-regression: A basic tool of chemometrics. Chemom. Intell. Lab. Syst. 2001, 58, 109–130.
33. Wang, G.; Wang, W.; Fang, Q.; Jiang, H.; Xin, Q.; Xue, B. The Application of Discrete Wavelet Transform with Improved Partial Least-Squares Method for the Estimation of Soil Properties with Visible and Near-Infrared Spectral Data. Remote Sens. 2018, 10, 867.
34. Hansen, P.M.; Schjoerring, J.K. Reflectance measurement of canopy biomass and nitrogen status in wheat crops using normalized difference vegetation indices and partial least squares regression. Remote Sens. Environ. 2003, 86, 542–553.
35. Ryu, C.; Suguri, M.; Umeda, M. Multivariate analysis of nitrogen content for rice at the heading stage using reflectance of airborne hyperspectral remote sensing. Field Crops Res. 2011, 122, 214–224.
36. Yang, H.; Li, F.; Wang, W.; Yu, K. Estimating Above-Ground Biomass of Potato Using Random Forest and Optimized Hyperspectral Indices. Remote Sens. 2021, 13, 2339.
37. Breunig, F.M.; Galvão, L.S.; Dalagnol, R.; Dauve, C.E.; Parraga, A.; Santi, A.L.; Della Flora, D.P.; Chen, S. Delineation of management zones in agricultural fields using cover–crop biomass estimates from PlanetScope data. Int. J. Appl. Earth Obs. Geoinf. 2020, 85, 102004.
38. Zeng, N.; Ren, X.; He, H.; Zhang, L.; Li, P.; Niu, Z. Estimating the grassland aboveground biomass in the Three-River Headwater Region of China using machine learning and Bayesian model averaging. Environ. Res. Lett. 2021, 16, 114020.
39. Ali, I.; Greifeneder, F.; Stamenkovic, J.; Neumann, M.; Notarnicola, C. Review of Machine Learning Approaches for Biomass and Soil Moisture Retrievals from Remote Sensing Data. Remote Sens. 2015, 7, 16398–16421.
40. Yue, J.; Feng, H.; Yang, G.; Li, Z. A Comparison of Regression Techniques for Estimation of Above-Ground Winter Wheat Biomass Using Near-Surface Spectroscopy. Remote Sens. 2018, 10, 66.
41. Dong, Y.; Cai, B.; Wang, F.; Zhang, Y.; Wang, X.; Wang, F.; Xie, J. Estimation of Fresh Biomass of Rice Based on Optimum Vegetation Index. Bull. Sci. Technol. 2019, 35, 58–65.
42. Fitzgerald, G.; Rodriguez, D.; Leary, G.O. Measuring and predicting canopy nitrogen nutrition in wheat using a spectral index—The canopy chlorophyll content index (CCCI). Field Crops Res. 2010, 116, 318–324.
43. Li, F.; Miao, Y.; Feng, G.; Yuan, F.; Yue, S.; Gao, X.; Liu, Y.; Liu, B.; Ustin, S.L.; Chen, X. Improving estimation of summer maize nitrogen status with red edge-based spectral vegetation indices. Field Crops Res. 2014, 157, 111–123.
44. Ong, P.; Chen, S.; Tsai, C.; Chuang, Y. Prediction of tea theanine content using near-infrared spectroscopy and flower pollination algorithm. Spectrochim. Acta Part A Mol. Biomol. Spectrosc. 2021, 255, 119657.
45. Chong, I.; Jun, C. Performance of some variable selection methods when multicollinearity is present. Chemom. Intell. Lab. Syst. 2005, 78, 103–112.
46. Jin, J.; Wang, Q. Selection of Informative Spectral Bands for PLS Models to Estimate Foliar Chlorophyll Content Using Hyperspectral Reflectance. IEEE Trans. Geosci. Remote Sens. 2019, 57, 3064–3072.
47. Gao, Y.; Lu, D.; Li, G.; Wang, G.; Chen, Q.; Liu, L.; Li, D. Comparative Analysis of Modeling Algorithms for Forest Aboveground Biomass Estimation in a Subtropical Region. Remote Sens. 2018, 10, 627.
48. Haaland, D.M.; Thomas, E.V. Partial least-squares methods for spectral analyses. 1. Relation to other quantitative calibration methods and the extraction of qualitative information. Anal. Chem. 1988, 60, 1193–1202.
49. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
50. Breiman, L. Out-of-Bag Estimation; University of California: Berkeley, CA, USA, 1996.
51. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
52. Chang, C.; Lin, C. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 21–27.
53. Werbos, P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. Thesis, Harvard University, Cambridge, MA, USA, 1974.
54. Esteban, L.G.; Fernández, F.G.; de Palacios, P. MOE prediction in Abies pinsapo Boiss. timber: Application of an artificial neural network using non-destructive testing. Comput. Struct. 2009, 87, 1360–1365.
55. Huang, G.; Zhu, Q.; Siew, C. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
56. Pham, T.D.; Yoshino, K.; Bui, D.T. Biomass estimation of Sonneratia caseolaris (L.) Engler at a coastal area of Hai Phong city (Vietnam) using ALOS-2 PALSAR imagery and GIS-based multi-layer perceptron neural networks. GISci. Remote Sens. 2017, 54, 329–353.
57. Horler, D.N.H.; Dockray, M.; Barber, J. The red edge of plant leaf reflectance. Int. J. Remote Sens. 1983, 4, 273–288.
58. Filella, I.; Peñuelas, J. The red edge position and shape as indicators of plant chlorophyll content, biomass and hydric status. Int. J. Remote Sens. 1994, 15, 1459–1470.
59. Bannari, A.; Morin, D.; Bonn, F.; Huete, A.R. A review of vegetation indices. Remote Sens. Rev. 1995, 13, 95–120.
60. Pôças, I.; Calera, A.; Campos, I.; Cunha, M. Remote sensing for estimating and mapping single and basal crop coefficientes: A review on spectral vegetation indices approaches. Agric. Water Manag. 2020, 233, 106081.
61. Melendez-Pastor, I.; Navarro-Pedreño, J.; Gómez, I.; Koch, M. Identifying optimal spectral bands to assess soil properties with VNIR radiometry in semi-arid soils. Geoderma 2008, 147, 126–132.
62. Schlerf, M.; Atzberger, C.; Hill, J.; Buddenbaum, H.; Werner, W.; Schüler, G. Retrieval of chlorophyll and nitrogen in Norway spruce (Picea abies L. Karst.) using imaging spectroscopy. Int. J. Appl. Earth Obs. Geoinf. 2010, 12, 17–26.
63. Gao, J.; Meng, B.; Liang, T.; Feng, Q.; Ge, J.; Yin, J.; Wu, C.; Cui, X.; Hou, M.; Liu, J.; et al. Modeling alpine grassland forage phosphorus based on hyperspectral remote sensing and a multi-factor machine learning algorithm in the east of Tibetan Plateau, China. ISPRS J. Photogramm. Remote Sens. 2019, 147, 104–117.
64. Sun, H.; Feng, M.; Xiao, L.; Yang, W.; Ding, G.; Wang, C.; Jia, X.; Wu, G.; Zhang, S. Potential of Multivariate Statistical Technique Based on the Effective Spectra Bands to Estimate the Plant Water Content of Wheat under Different Irrigation Regimes. Front. Plant Sci. 2021, 12, 631573.
65. Ma, W.; Tan, K.; Du, P. Predicting soil heavy metal based on Random Forest model. In Proceedings of the 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Beijing, China, 10–15 July 2016; IEEE: Piscataway, NJ, USA, 2016.

Figure 1. Map of study area.

Figure 2. Relationship between parameters and ADB at different growth stages. (a)—jointing stage; (b)—booting stage; (c)—heading stage; (d)—maturing stage; (e)—all stages. The same below. E, P, Q denote exponential, power, and quadratic fit. * and ** indicate a significant correlation at p < 0.05 and p < 0.01 level, respectively.

Figure 3. Variable importance based on vip at different growth stages; panels (a) to (e) are as in Figure 2. The red dotted line indicates vip score = 1.

Figure 4. The training set performance evaluation results of ADB model based on HCPs. P, R, S, B, and E are abbreviations for PLS, RF, SVM, BPNN, and ELM, respectively. NIV, number of independent variables. P represents the PLS model of twenty variables, P-rc represents the PLS model of screening variables based on RC, R represents the RF model of full variables, R-rc represents the RF model of screening variables based on RC, etc. The same below.

Figure 5. The training set performance evaluation results of ADB model based on VIs.

Figure 6. The test set performance evaluation results of ADB model based on HCPs.

Figure 7. The test set performance evaluation results of ADB model based on VIs.

Figure 8. Taylor diagram of ADB estimation model based on HCPs at different growth stages.

Table 1. The definition of HCPs.

HCP Identifier	HCP Symbol	Name	Definition
1	Db	Blue edge amplitude	Maximum value of the first-derivative spectral reflectance in blue edge (490~530 nm)
2	λb	Blue edge position	Corresponding wavelength of the maximum value of the first-derivative spectral reflectance in the blue edge (490~530 nm)
3	SDb	Blue edge area	Sum of first-derivative spectral reflectance in blue edge (490~530 nm)
4	Dy	Yellow edge amplitude	Maximum value of first-derivative spectral reflectance in the yellow edge (560~640 nm)
5	λy	Yellow edge position	Corresponding wavelength of the maximum value of the first-derivative spectral reflectance in the yellow edge (560~640 nm)
6	SDy	Yellow edge area	Sum of first-derivative spectral reflectance in the yellow edge (560~640 nm)
7	Dr	Red edge amplitude	Maximum value of the first-derivative spectral reflectance in the red edge (680~760 nm)
8	λr	Red edge position	Corresponding wavelength of the maximum value of the first-derivative spectral reflectance in the red edge (680~760 nm)
9	SDr	Red edge area	Sum of first-derivative spectral reflectance in red edge (680~760 nm)
10	ρg	Green peak reflectance	Maximum raw spectral reflectance within the green peak (510~560 nm)
11	λg	Green peak position	Wavelength corresponding to the maximum raw spectral reflectance in the green peak (510~560 nm)
12	SDg	Green peak area	Sum of the raw spectral reflectance in the green peak (510~560 nm)
13	ρr	Red valley reflectance	Minimum raw spectral reflectance in red valley (650~690 nm)
14	λo	Red valley position	Wavelength corresponding to minimum raw spectral reflectance in red valley (650~690 nm)
15	Rrb	SDr/SDb	Ratio of red edge area to blue edge area
16	Rry	SDr/SDy	Ratio of red edge area to yellow edge area
17	Nrb	(SDr − SDb)/(SDr + SDb)	Normalized value of red edge area and blue edge area
18	Nry	(SDr − SDy)/(SDr + SDy)	Normalized value of red edge area and yellow edge area
19	Rgr	ρg/ρr	Ratio of green peak reflectance to red valley reflectance
20	Ngr	(ρg − ρr)/(ρg + ρr)	Normalized value of green peak reflectance and red valley reflectance
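The definitions in Table 1 can be sketched in code. The example below derives the blue and red edge areas, the red edge position, and the combined parameters Rrb and Nrb from a reflectance spectrum. It assumes a 1 nm sampling interval and a central-difference first derivative, neither of which is stated in this section; the function names are hypothetical and the spectrum is synthetic rather than measured.

```python
import math

def first_derivative(wavelengths, reflectance):
    """Central-difference first derivative as (wavelength, slope) pairs; endpoints are dropped."""
    return [
        (wavelengths[i],
         (reflectance[i + 1] - reflectance[i - 1]) / (wavelengths[i + 1] - wavelengths[i - 1]))
        for i in range(1, len(wavelengths) - 1)
    ]

def edge_area(derivative, lo, hi):
    """Sum of first-derivative values within [lo, hi] nm (SDb, SDy, SDr in Table 1)."""
    return sum(d for wl, d in derivative if lo <= wl <= hi)

def edge_position(derivative, lo, hi):
    """Wavelength of the maximum first-derivative value within [lo, hi] nm."""
    return max((pair for pair in derivative if lo <= pair[0] <= hi), key=lambda pair: pair[1])[0]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Synthetic canopy-like spectrum: a small rise near 510 nm and a large rise near 720 nm.
wavelengths = list(range(400, 901))
reflectance = [0.05 + 0.03 * sigmoid((wl - 510) / 5) + 0.40 * sigmoid((wl - 720) / 10)
               for wl in wavelengths]

deriv = first_derivative(wavelengths, reflectance)
sdb = edge_area(deriv, 490, 530)           # blue edge area
sdr = edge_area(deriv, 680, 760)           # red edge area
lambda_r = edge_position(deriv, 680, 760)  # red edge position
rrb = sdr / sdb                            # Rrb = SDr/SDb
nrb = (sdr - sdb) / (sdr + sdb)            # Nrb = (SDr - SDb)/(SDr + SDb)
```

Note that Nrb equals (Rrb − 1)/(Rrb + 1), so the two combined parameters carry the same ranking information and differ only in scale; Nrb is bounded between −1 and 1 whenever both areas are positive.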

Table 2. Descriptive statistics of ADB for train and test sets at different growth stages.

Stage	Data Set	n	Min	Max	Mean	SD	CV
JS	All	90	1.63	5.85	3.20	1.00	31.25
	Train	60	1.63	5.85	3.21	1.01	31.48
	Test	30	1.72	5.43	3.20	1.00	31.33
BS	All	90	4.41	9.00	6.66	0.95	14.28
	Train	60	4.41	9.00	6.67	0.95	14.30
	Test	30	4.68	8.92	6.66	0.97	14.49
HS	All	90	6.56	12.41	9.06	1.35	14.85
	Train	60	6.56	12.41	9.06	1.34	14.78
	Test	30	6.69	12.31	9.07	1.38	15.26
MS	All	90	9.91	24.07	15.77	2.56	16.23
	Train	60	9.91	24.07	15.77	2.61	16.57
	Test	30	11.28	21.47	15.78	2.50	15.82
AS	All	360	1.63	24.07	8.68	4.87	56.16
	Train	240	1.63	24.07	8.68	4.88	56.24
	Test	120	1.72	21.47	8.67	4.88	56.22

n, number of observations; Min, minimum value; Max, maximum value; Mean, mean value; SD, standard deviation; CV, coefficient of variation. Min, Max, Mean, and SD in Mg ha⁻¹; CV in %.

Table 3. Variable screening results based on RC.

Growth Stage	Number of Variables	Variable Names
JS	20	SDr Dr Rrb ρr Dy ρg Nrb Rry SDy SDg Db Nry λg SDb Rgr λo λb λy λr Ngr
BS	11	ρg Db SDg ρr Nry Nrb SDr Dr Dy Rrb λo
HS	17	SDy ρg Dy SDg SDr Dr Db Ngr Nry Rry Rgr Nrb Rrb ρr λg λo λr
MS	15	Rrb Nrb Nry SDr λo Dr λb λy λr Db λg SDb ρr Rgr Rry
AS	20	SDg ρg Db SDb SDr Dr SDy Ngr ρr Rrb Rgr Dy λg Nrb Nry λo λb λy λr Rry

Table 4. Variable screening results based on vip.

Growth Stage	Number of Variables	Variable Names
JS	8	Rrb Nrb λg Dr SDr ρr Dy λo
BS	4	λo λg Nrb Nry
HS	9	Rry ρr Nrb Rgr Dy Ngr λo Rrb SDy
MS	7	Rrb Nrb SDr Nry SDy Ngr Rgr
AS	9	λg Rrb Nrb ρr SDg λb λy λr ρg

Table 5. Variable screening results based on the SR.

Growth Stage	Number of Variables	Variable Names
JS	5	Dy SDy ρr Rrb Rry
BS	1	λo
HS	1	ρr
MS	2	SDb Rrb
AS	10	Db SDb Dr SDr ρg λg SDg Rrb Rgr Ngr

Table 6. Variable screening results based on the RF.

Growth Stage	Number of Variables	Variable Names
JS	17	Rrb Nrb ρr Dr λg ρg SDr SDb Ngr SDg Rgr Dy Db Rry λo Nry λr
BS	19	Rry Nrb Rrb Dy Nry SDy Dr SDb Db ρr Ngr SDr ρg λo Rgr SDg λg λr λb
HS	16	ρr Rrb Nrb Ngr Nry Rgr SDy Db Rry SDb ρg Dy SDr Dr SDg λo
MS	18	Rrb Nrb SDr SDy Nry ρr λg Rgr Dr Ngr ρg Dy λo SDg Rry SDb Db λr
AS	15	Nrb Rrb λg ρr Rgr Ngr λo Dy SDy Nry ρg SDg λy λb SDr

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
