Simulating the Leaf Area Index of Rice from Multispectral Images

: Accurate estimation of the leaf area index (LAI) is essential for crop growth simulations and agricultural management. This study conducted a ﬁeld experiment with rice and measured the LAI in different rice growth periods. The multispectral bands (B) including red edge (RE, 730 nm ± 16 nm), near-infrared (NIR, 840 nm ± 26 nm), green (560 nm ± 16 nm), red (650 nm ± 16 nm), blue (450 nm ± 16 nm), and visible light (RGB) were also obtained by an unmanned aerial vehicle (UAV) with multispectral sensors (DJI-P4M, SZ DJI Technology Co., Ltd.). Based on the bands, ﬁve vegetation indexes (VI) including Green Normalized Difference Vegetation Index (GNDVI), Leaf Chlorophyll Index (LCI), Normalized Difference Red Edge Index (NDRE), Normalized Difference Vegetation Index (NDVI), and Optimization Soil-Adjusted Vegetation Index (OSAVI) were calculated. The semi-empirical model (SEM), the random forest model (RF), and the Extreme Gradient Boosting model (XGBoost) were used to estimate rice LAI based on multispectral bands, VIs, and their combinations, respectively. The results indicated that the GNDVI had the highest accuracy in the SEM (R 2 = 0.78, RMSE = 0.77). For the single band, NIR had the highest accuracy in both RF (R 2 = 0.73, RMSE = 0.98) and XGBoost (R 2 = 0.77, RMSE = 0.88). Band combination of NIR + red improved the estimation accuracy in both RF (R 2 = 0.87, RMSE = 0.65) and XGBoost (R 2 = 0.88, RMSE = 0.63). NDRE and LCI were the ﬁrst two single VIs for LAI estimation using both RF and XGBoost. However, putting more than one VI together could only increase the LAI estimation accuracy slightly. Meanwhile, the bands + VIs combinations could improve the accuracy in both RF and XGBoost. Our study recommended estimating rice LAI by a combination of red + NIR + OSAVI + NDVI + GNDVI + LCI + NDRE (2B + 5V) with XGBoost to obtain high accuracy and overcome the potential over-ﬁtting issue (R 2 = 0.91, RMSE = 0.54).


Introduction
Leaf area index (LAI) was first introduced by Watson [1] and defined as the sum of the leaf area per unit ground area. LAI is commonly used as an important structural and biophysical indicator of vegetation for crop photosynthesis [2], productivity [3], and water utilization [4]. Moreover, LAI is often required as an input parameter in many models for crop growth diagnosis, biomass estimation, and yield prediction in the application of precision agriculture [5][6][7]. The observations of LAI include direct and indirect measurement blue, the use of a multispectral camera with an NIR band has more advantages than the visible light image. Therefore, our study selected the multispectral images for estimating the LAI. Nevertheless, the calculation of VIs also needs several pre-treatment steps for the multispectral images, calling for professional skills and domain specific expertise.
Another factor affecting the LAI estimation accuracy is the algorithm. Besides the process-based method such as the radiative transfer model, which is not the focus of this study, the VI-based algorithms are usually empirical or semi-empirical. Peng et al. [34] estimated the LAI from VIs by four empirical models including the linear model, the exponential model, the logarithmic model, and the quadratic polynomial model. Dong et al. [35] estimated the LAI of spring wheat and canola using eight VIs by a semi-empirical model based on the modified Beer's law. In the later stages of crop growth, the VIs move closer to saturation because the field is almost covered with plant leaves. Moreover, due to the dynamic change of LAI, there would be multiple collinearities between different VIs and the LAI. Therefore, the machine learning models are increasingly applied for LAI estimation. Reisi Gahrouei et al. [36] applied an artificial neural network (ANN) and support vectors regression (SVR) to estimate the LAIs of canola, corn, and soybeans with the VIs from the multispectral images and indicated that SVR provided better accuracy than ANN. Maimaitijiang et al. [37] predicted the soybean's LAI with the VIs by four machine learning models including partial least squares regression (PLSR), random forest (RF), SVR, and extreme learning regression (ELR). However, a classical shortcoming of the machine learning models is over-fitting. Recently, the Extreme Gradient Boosting (XGBoost) algorithm proposed by Chen and Guestrin [38] based on GBDT and RF models was proven as a novel implementation method for gradient boosting machines (GBMs) and classification and regression trees (CART) with the ability to reduce over-fitting [39][40][41]. However, the accuracy of XGBoost for estimating LAI was not evaluated. Moreover, previous studies mainly used only one of the VIs or selected some of the VIs as input variables for the empirical models, while very few studies explored the effects of different combinations of VIs on the LAI estimation. In addition, the conversion from spectral bands to VIs may lose some information, while almost no study evaluated the effects of combinations of spectral bands and VIs together on the LAI estimation.
Therefore, this study conducted field experiments in Yancheng, Jiangsu Province, China and used the DJI Phantom 4 Multispectral (DJI-P4M, SZ DJI Technology Co., Ltd., Shenzhen, China) UAV to obtain both multispectral and RGB images of rice and establish the observation dataset. The observation dataset included five bands and five VIs retrieved from the UAV images and the observed LAI from the field experiments. The objectives of this study were to (1) establish the rice LAI estimation model by XGBoost and compare the accuracy with RF and semi-empirical (SEM) models, (2) evaluate the effects of different combinations of spectral bands and VIs on the rice LAI estimation accuracy, and (3) find out the optimal combinations of both spectral bands and VIs for rice LAI estimation for different models.

Study Site
The field experiments were carried out in 14 fields of the Qixing farm (33 • 11 01 N, 119 • 52 53.8 E), Yancheng, Jiangsu Province, China from April to October 2020 ( Figure 1). Yancheng is located between the northern subtropical zone and the southern warm temperate zone. The average annual precipitation is around 1014.7 mm, and the average annual runoff is about 3.96 billion m 3 . All 14 fields were irrigated with water pumps and pipes and connected to the drainage ditches, while the outlet of the main ditch was a river.
Remote Sens. 2021, 13, 3663 4 of 22 sium content, and effective zinc content were 7.4, 24.4 g·kg , 180.2 mg·kg , 24.2 mg·kg , 242.8 mg·kg −1 , 0.7 mg·kg −1 respectively. Three varieties of rice including glutinous rice, indica rice, and japonica rice were planted in the experiments. The seedling and the trans-pla4nting dates of all three varieties were 21 April 2020 and 30 May 2020, respectively. The harvest date of indica rice was 14 September 2020, and the harvest date of glutinous rice and japonica rice was 23 October 2020.
Six cameras were installed in the DJI-P4M, which contained the bands of red edge (RE, 730 nm ± 16 nm), near-infrared (NIR, 840 nm ± 26 nm), green (560 nm ± 16 nm), red (650 nm ± 16 nm), blue (450 nm ± 16 nm), and visible light (RGB). Based on the bands except for RGB, five vegetation indexes (VI) including Green Normalized Difference Vegetation Index (GNDVI), Leaf Chlorophyll Index (LCI), Normalized Difference Red Edge Index (NDRE), Normalized Difference Vegetation Index (NDVI), and Optimization Soil-Adjusted Vegetation Index (OSAVI) were calculated from the bands (Equations (1)-(5)). During the crop growth period, rice plants in 14 fields were sampled 8 times on the same day of the UAV data collection (June 12, June 21, July 1, July 13, July 20, August 2, August 14, and August 25 in 2020). Three samples were selected and cut from the junction between the root and the stem, and the leaves were separated from the plants using scissors for each field. In each sampling, 42 plants (3 plants × 14 field) were taken in 14 fields. Therefore, 336 plants were taken during the whole experiments. However, due to the operation miss, 324 plants in total were used in our study. The leaves were then scanned with a resolution of 300 PPI (EPSON V39 Scanner), and the leaf area was determined by image threshold segmentation (ITS) and specific leaf area (SLA, cm 2 ·g −1 ).
In the ITS, the images were first converted to the HSV (hue, saturation, value) color space using the OpenCV library of Python [42]. After that, the leaf threshold was manually segmented (Figure 2a-c), and the leaf areas (LA, cm 2 ) were calculated from the number of leaf pixels (NLP) (Equation (6) During the crop growth period, rice plants in 14 fields were sampled 8 times on the same day of the UAV data collection (June 12, June 21, July 1, July 13, July 20, August 2, August 14, and August 25 in 2020). Three samples were selected and cut from the junction between the root and the stem, and the leaves were separated from the plants using scissors for each field. In each sampling, 42 plants (3 plants × 14 field) were taken in 14 fields. Therefore, 336 plants were taken during the whole experiments. However, due to the operation miss, 324 plants in total were used in our study. The leaves were then scanned with a resolution of 300 PPI (EPSON V39 Scanner), and the leaf area was determined by image threshold segmentation (ITS) and specific leaf area (SLA, cm 2 ·g −1 ).
In the ITS, the images were first converted to the HSV (hue, saturation, value) color space using the OpenCV library of Python [42]. After that, the leaf threshold was manually segmented (Figure 2a-c), and the leaf areas (LA, cm 2 ) were calculated from the number of leaf pixels (NLP) (Equation (6)). After scanning, the leaves were put at 105 • C for 30 min, then dried at 75 • C to a constant weight, and the dry matter of the leaves (DML, g) was weighed. The SLA was calculated by Equation (7), and the LA of a single plant (LAS, cm 2 ·plant −1 ) was calculated by SLA and the dry matter of leaves of the single plant (DMLS, g) (Equation (8)).
The plant density (PD, plant·cm −2 ) of each field was determined from the NIR band image captured by DJI-P4M using the OpenCV library of Python (Figure 2d-f). More exactly, the NIR images were treated by several steps including Gaussian blur [43], adaptive binary [44], and denoising with closing operation of the convolution kernel [45]. After that, the LAI was calculated using LAS and PD (Equation (9)).
In total, 324 pairs of observation data were collected with the observed LAI and five VIs.

Semi-Empirical Model (SEM)
An SEM in the form of the modified Beer's law was used to calculate the LAI of rice (Equation (10)) [46,47].
In Equation (10), the VI ∞ is the VI value for the infinitely dense green canopy; K VI is the extinction coefficient linking LAI and the VI; VI g is the VI value of bare soil. To calculate the LAI, three parameters of the SEM (K VI , VI ∞ , and VI g ) needed to be determined and were estimated from the observations using the Metropolis-Hastings Markov chain Monte Carlo (MCMC) algorithm [48]. More exactly, five VIs including GNDVI, LCI, NDRE, NDVI, and OSAVI were used as VIs in Equation (10) to fit K VI , VI ∞ , and VI g with the observed LAI one by one. To enhance the credibility and the reliability, the 324 pairs of data were divided into the training and the test datasets. There were 240 and 84 pairs of data in the training and the test datasets, respectively.

Random Forest Model (RF)
The RF was developed by Breiman [49] based on CARTs and bagging (bag) methods ( Figure 3). RF is an ensemble classifier composed of a series of decision trees. Each tree in the forest makes independent predictions based on the characteristics of its own test data and finally uses a voting method (the minority obeys the majority) to make the final prediction. More exactly, the RF selects k sub-training sample sets from the total data set D to form k decision trees (D 1 , D 2 , . . . , D t , . . . , D k ), Based on D t , a basic decision tree model can be formed. After that, the test data are inputted into each decision tree of RF, and the risk level corresponding to each leaf node is the evaluation result of the decision tree. In addition, the RF method uses bootstrapping to perform random sampling with replacement to obtain different sample data. The datasets not included in the model are called "out-of-bag" (OOB). Using different sample data to train the decision tree can successfully reduce the correlation of samples. Moreover, the node selection of RF was also random, which further reduced the correlation of samples, basically solving the overfitting problem of a single decision tree model and causing RF to have a good tolerance to noise. Further details about RF can be found in Breiman [49]. The training and the test samples for RF were 204 and 120 pairs of the data, respectively, which was the same as the XGBoost. The RF was implemented using the package "randomForest" in R software (version 4.0.2), and this package also offered the important feature selection analysis including the indexes of IncMSE and IncNodePurity. More exactly, the IncMSE referred to the increase in the estimated error with a randomly selected variable compared with the original estimated error. The larger the IncMSE value was, the more important the variable was. The IncNodePurity referred to the degree of influence of this variable on each decision tree node. The larger the IncNodePurity value was, the more important the variable was [50].
Remote Sens. 2021, 13, x FOR PEER REVIEW 7 of 204 and 120 pairs of the data, respectively, which was the same as the XGBoost. The R was implemented using the package "randomForest" in R software (version 4.0.2), an this package also offered the important feature selection analysis including the indexes IncMSE and IncNodePurity. More exactly, the IncMSE referred to the increase in the es mated error with a randomly selected variable compared with the original estimated e ror. The larger the IncMSE value was, the more important the variable was. T IncNodePurity referred to the degree of influence of this variable on each decision tr node. The larger the IncNodePurity value was, the more important the variable was [50

Extreme Gradient Boosting Model (XGBoost)
The XGBoost is a new, efficient ensemble learning algorithm established by Chen an Guestrin [38], which is an improved algorithm of gradient boosting and uses the Tayl expansion to obtain the second derivative as an independent variable ( Figure 4). T boosting algorithm is based on the concepts of strong and weak learnable problems giv by Kearns and Valiant [51], who proposed a very interesting theorem: a problem strongly learnable if and only if it is weakly learnable. The XGBoost integrates sever "weak" learners to generate a "strong" learner by additive learning. The prediction fun tion of XGBoost can be given in Equation (11).
In Equation (11), yi,p is the predicted value, f(xi) is a learner, K is the number of lear ers, and xi is the input data.
To evaluate the fitting effect, the loss function (ψ) was determined as Equation (12 ( ) In Equation (12), l is the function that indicates the size of the residual between t predicted and the observed values. Ω is the regularization term, which indicates the com plexity of the model (Equation (13)).

Extreme Gradient Boosting Model (XGBoost)
The XGBoost is a new, efficient ensemble learning algorithm established by Chen and Guestrin [38], which is an improved algorithm of gradient boosting and uses the Taylor expansion to obtain the second derivative as an independent variable ( Figure 4). The boosting algorithm is based on the concepts of strong and weak learnable problems given by Kearns and Valiant [51], who proposed a very interesting theorem: a problem is strongly learnable if and only if it is weakly learnable. The XGBoost integrates several "weak" learners to generate a "strong" learner by additive learning. The prediction function of XGBoost can be given in Equation (11).
In Equation (11), y i,p is the predicted value, f (x i ) is a learner, K is the number of learners, and x i is the input data.
To evaluate the fitting effect, the loss function (ψ) was determined as Equation (12).
cluding the indexes of gain, cover, and frequency. The gain implies the relative contribution of the corresponding feature to the model calculated by taking each feature's contribution for each tree in the model. The cover metric means the relative number of observations related to this feature. The frequency is the percentage representing the relative number of times a particular feature occurs in the trees of the model [52].

Statistical Evaluation
Grid-search method was used for the training of RF and XGBoost. Tuning parameters of RF are "number-of-tree" and "max-feature". The range of number-of-tree was 50-500, and the interval was 50. The max-feature was set as the square root of the feature. XGBoost model has 3 parameters that needed to be tuned, which are "number-of-tree", "learningrate", and "max-depth-of-tree". The range of number-of-tree was also 50-500, and the interval was also 50. The range of learning-rate was 0.01-0.3, and the interval was 0.05. The range of max-depth-of-tree was 1-15, and the interval was 3. The tuned parameters of RF and XGBoost with different inputs are indicated in Table S1.
The evaluation statistics used in this study included determination coefficient (R 2 ), normalized root mean squared error (RMSE), mean absolute error (MAE), Nash-Sutcliffe coefficient (NSE), and mean absolute percentage error (MAPE) (Equations (14)-(18)). In Equation (12), l is the function that indicates the size of the residual between the predicted and the observed values. Ω is the regularization term, which indicates the complexity of the model (Equation (13)).
In Equation (13), γ is the mini loss, T is the label set of the leaf node of the regression tree, λ is the regularization parameter, and ω is the score vector.
More details about XGBoost can be found in Chen and Guestrin [38]. Different from the SEM, there were 204 and 120 pairs of data in the training and the test datasets, respectively.
The XGBoost was implemented using the package "xgboost" in R software (version 4.0.2). Similar to RF, this package also offered the important feature selection analysis including the indexes of gain, cover, and frequency. The gain implies the relative contribution of the corresponding feature to the model calculated by taking each feature's contribution for each tree in the model. The cover metric means the relative number of observations related to this feature. The frequency is the percentage representing the relative number of times a particular feature occurs in the trees of the model [52].

Statistical Evaluation
Grid-search method was used for the training of RF and XGBoost. Tuning parameters of RF are "number-of-tree" and "max-feature". The range of number-of-tree was 50-500, and the interval was 50. The max-feature was set as the square root of the feature. XGBoost model has 3 parameters that needed to be tuned, which are "number-of-tree", "learningrate", and "max-depth-of-tree". The range of number-of-tree was also 50-500, and the interval was also 50. The range of learning-rate was 0.01-0.3, and the interval was 0.05. The range of max-depth-of-tree was 1-15, and the interval was 3. The tuned parameters of RF and XGBoost with different inputs are indicated in Table S1.
In Equations (14)- (18), O i , P i , O ave , and P ave are observed LAI, predicted LAI, mean of observed LAI, and mean of predicted LAI, respectively. Higher R 2 and NSE values indicate better model performance, and the regression line fits the data well. Conversely, the lower values of RMSE, MAE, and MAPE show better prediction accuracy.

Performance of the Semi-Empirical Model (SEM)
The scatterplots between VIs and LAI of the SEM in the test process are shown in Figure 5 with the parameters (K VI and VI ∞ ) in Table 1 (the parameters were fitted in the training process, while the statistical indexes were determined in the test process). Although the VI g was also a parameter in the SEM (Equation (10)), the fitted value of VI g in our study was almost zero for all inputs, which was similar to the study of Dong et al. [35]. Gonsamo [53] also concluded that the contribution of VI g on the final LAI estimation was far less than that of VI ∞ . With the five VIs as the input, K VI and VI ∞ ranged from 0.285 to 0.899 and 0.540 to 0.950, respectively. Using GNDVI as the input obtained the highest accuracy for LAI prediction. More exactly, R 2 and NSE were both 0.780, while RMSE, MAE, and MAPE were 0.770, 0.540, and 0.240, respectively. Previous studies also indicated that the GNDVI was more stable than NDVI by replacing the red band with the green band of NDVI and had higher LAI prediction accuracy than NDVI [54,55]. The LCI and the NDVI had similar accuracies for LAI prediction with GNDVI using SEM. The R 2 s of LCI and NDVI were 0.77 and 0.75, while the RMSEs were 0.80 and 0.82, respectively. The NDRE had the lowest accuracy for LAI prediction, and R 2 , NSE, RMSE, MAE, and MAPE were 0.67, 0.35, 1.31, 0.90, and 0.40, respectively. The LCI and the NDRE were usually applied in predicting the chlorophyll content. However, Simic Milas et al. [56] also found that NDRE correlated with the LAI of corn (R 2 = 0.62), which also had similar accuracy with our study. Moreover, the lower accuracy of NDRE compared with LCI could be because NDRE had a higher relationship with the crop's chlorophyll content in the late stages compared to the early stages [57,58]. Moreover, the biases between predicted LAI and the 1:1 line (dash lines in Figure 5) for all VIs as inputs were larger when the LAI > 2. The reason might be the VIs became insensitive to the LAI as light absorption by chlorophyll became saturated at high LAI [59].

Machine Learning Models with Multispectral Band
For RF, the order of prediction accuracy (test process) of LAI for the five bands w NIR > red > green > blue > RE ( Figure 6). The R 2 and the RMSE of the five bands we 0.733, 0.706, 0.688, 0.629, and 0.596 and 0.988, 1.039, 1.025, 1.136, and 1.167, respective (Table 2). NIR and red are popular bands used to calculate VIs such as Difference Veget tion Index (DVI) [60] and OSAVI [26]. Considering NIR and red were the first two accura input variables, the combinations of NIR and red were treated as the input variables of R together. The NIR and the red combination had higher prediction accuracy than the sing NIR or red as an input variable. More exactly, R 2 and NSE of the NIR and red combinati were 0.875 and 0.872, increasing 19.4% and 23.0% compared with single NIR as an inp variable, respectively. Furthermore, RMSE, MAE, and MAPE of the NIR and red comb nation were 0.655, 0.392, and 0.171, only accounting for 66.3%, 57.1%, and 57.2% of t

Machine Learning Models with Multispectral Band
For RF, the order of prediction accuracy (test process) of LAI for the five bands was NIR > red > green > blue > RE ( Figure 6). The R 2 and the RMSE of the five bands were 0.733, 0.706, 0.688, 0.629, and 0.596 and 0.988, 1.039, 1.025, 1.136, and 1.167, respectively (Table 2). NIR and red are popular bands used to calculate VIs such as Difference Vegetation Index (DVI) [60] and OSAVI [26]. Considering NIR and red were the first two accurate input variables, the combinations of NIR and red were treated as the input variables of RF together. The NIR and the red combination had higher prediction accuracy than the single NIR or red as an input variable. More exactly, R 2 and NSE of the NIR and red combination were 0.875 and 0.872, increasing 19.4% and 23.0% compared with single NIR as an input variable, respectively. Furthermore, RMSE, MAE, and MAPE of the NIR and red combination were 0.655, 0.392, and 0.171, only accounting for 66.3%, 57.1%, and 57.2% of the single NIR as an input variable, respectively. However, adding more bands into the combination could only increase the accuracy slightly. For example, the highest accurate combination of bands was red, blue, green, NIR, and RE. The R 2 and the NSE of this combination were 0.889, only increasing 1.60% and 1.95%, respectively, compared with the combination of NIR and red. Meanwhile, RMSE, MAE, and MAPE of the combination of five bands were 0.612, 0.369, and 0.161, respectively, only decreasing by 6.56%, 5.86%, and 5.85% compared with the combination of NIR and red. able 2. Statistical evaluation for RF and XGBoost with multispectral bands (B) as input variables.  This phenomenon can be explained by the important feature selection analysis [50,61]. More exactly, the order of increase in mean square error (IncMSE) and increase in node purity (IncNodePurity) of the five bands is indicated in Figure 7a. To make the IncMSE and the IncNodePurity comparable, all values were converted into the relative value (ranged from zero to one) by dividing the maximum IncMSE or IncNodePurity, respectively. Figure 7 indicates that the NIR had the maximum relative IncMSE (RIncMSE, 1.00) and secondary relative IncNodePurity (RIncNodePurity, 0.99), while the red had the secondary RIncMSE (0.75) and the maximum RIncNodePurity (1.00). This phenomenon can be explained by the important feature selection analysis [50,61]. More exactly, the order of increase in mean square error (IncMSE) and increase in node purity (IncNodePurity) of the five bands is indicated in Figure 7a. To make the IncMSE and the IncNodePurity comparable, all values were converted into the relative value (ranged from zero to one) by dividing the maximum IncMSE or IncNodePurity, respectively. Figure 7 indicates that the NIR had the maximum relative IncMSE (RIncMSE, 1.00) and secondary relative IncNodePurity (RIncNodePurity, 0.99), while the red had the secondary RIncMSE (0.75) and the maximum RIncNodePurity (1.00).  Similar to RF, the order of prediction accuracy (test process) of LAI for the five bands using XGBoost was also NIR > red > green > blue > RE (Figure 8). The R 2 and the RMSE of the five bands were 0.771, 0.720, 0.698, 0.630, and 0.574 and 0.881, 0.980, 1.012, 1.119, and 1.197, respectively ( Table 2). The chord also showed that the three importance indicators (gain, cover, and frequency) of NIR and red were all larger than other bands (Figure 7b) [52]. More exactly, the values of gain, cover, and frequency of NIR and red were 0.726, 0.368, and 0.375, and 0.214, 0.366, and 0.268, respectively. Moreover, the accuracy of XGBoost was higher than RF when only one band was the input variable. For example, using NIR as the input variable obtained R 2 and NSE of 0.771 and 0.769 of XGBoost, increasing 5.18% and 8.46% compared with the RF. Similar results were also found in solar radiation and reference evapotranspiration prediction [62]. Furthermore, using NIR and red bands as input variables also increased the prediction accuracy in the XGBoost. Specifically, R 2 and NSE of the NIR and red combination were 0.883 and 0.879, increasing 14.5% and 14.3% compared with single NIR as an input variable, respectively. Furthermore, RMSE, MAE, and MAPE of the NIR and red combination were 0.636, 0.377, and 0.164, only accounting for 72.2%, 59.3%, and 59.2% of the single NIR as an input variable, respectively. This is consistent with previous studies which found that NIR and red bands can better reflect vegetation conditions [25,63]. However, different from the RF, the highest accurate combination of bands was red, blue, green, and NIR; adding RE into the com-  (XGBoost, b). IncMSE is the order of increase in mean square error, and IncNodePurity is the increase in node purity. Similar to RF, the order of prediction accuracy (test process) of LAI for the five bands using XGBoost was also NIR > red > green > blue > RE (Figure 8). The R 2 and the RMSE of the five bands were 0.771, 0.720, 0.698, 0.630, and 0.574 and 0.881, 0.980, 1.012, 1.119, and 1.197, respectively ( Table 2). The chord also showed that the three importance indicators (gain, cover, and frequency) of NIR and red were all larger than other bands (Figure 7b) [52]. More exactly, the values of gain, cover, and frequency of NIR and red were 0.726, 0.368, and 0.375, and 0.214, 0.366, and 0.268, respectively. Moreover, the accuracy of XGBoost was higher than RF when only one band was the input variable. For example, using NIR as the input variable obtained R 2 and NSE of 0.771 and 0.769 of XGBoost, increasing 5.18% and 8.46% compared with the RF. Similar results were also found in solar radiation and reference evapotranspiration prediction [62]. Furthermore, using NIR and red bands as input variables also increased the prediction accuracy in the XGBoost. Specifically, R 2 and NSE of the NIR and red combination were 0.883 and 0.879, increasing 14.5% and 14.3% compared with single NIR as an input variable, respectively. Furthermore, RMSE, MAE, and MAPE of the NIR and red combination were 0.636, 0.377, and 0.164, only accounting for 72.2%, 59.3%, and 59.2% of the single NIR as an input variable, respectively. This is consistent with previous studies which found that NIR and red bands can better reflect vegetation conditions [25,63]. However, different from the RF, the highest accurate combination of bands was red, blue, green, and NIR; adding RE into the combination decreased the prediction accuracy slightly. This indicates that RE does not contribute much to LAI prediction and causes slight over-fitting of the model [64]. In addition, adding more bands into the combination could only increase the accuracy slightly, which was also similar to the RF. For example, the highest R 2 and NSE in the XGBoost were 0.900 and 0.894, respectively, only increasing 1.93% and 1.71% compared with using the NIR and red combination as the input variable. However, the accuracy of XGBoost was only slightly higher than RF with combined bands as input variables. The highest R 2 and NSE of XGBoost increased only 1.24% and 0.56%, while RMSE, MAE, and MAPE of XGBoost decreased only 2.61%, 14.36%, and 14.29%, respectively, compared with the RF.

RF XGBoost
x FOR PEER REVIEW 13 of 22

Machine Learning Models with Vegetation Index
For RF, prediction accuracy (test process) of LAI for the four VIs except OSAVI was very close, and the first two accurate VIs were NDRE and LCI (Figure 9). The ranges of R 2 , NSE, RMSE, MAE, and MAPE of the four VIs except OSAVI were 0.861-0.875, 0.856-

Machine Learning Models with Vegetation Index
For RF, prediction accuracy (test process) of LAI for the four VIs except OSAVI was very close, and the first two accurate VIs were NDRE and LCI (Figure 9). The ranges of R 2 , NSE, RMSE, MAE, and MAPE of the four VIs except OSAVI were 0.861-0.875, 0.856-0.867, 0.667-0.696, 0.409-0.420, and 0.178-0.183, respectively ( Table 3). The lowest accurate VI was OSAVI. Although OSAVI only decreased the R 2 (0.811) by 7.31%, it increased 20.54% of RMSE compared with NDRE. The reason might be the OSAVI considering the effect of soil on the crop's chlorophyll content, which could improve the LAI prediction in the early growth stages [26,65]. However, the effect of soil on rice's LAI could be almost ignored, as the soil surface was covered with about 4 cm water depth (observed in the experiment). Meanwhile, putting more than one VI together as a combination could not improve the prediction accuracy. For example, the highest accuracy was the combination with all five VIs as the input variables, which had R 2 and NSE of 0.892 and 0.890, increasing by only 1.94% and 2.65% compared with using NDRE as the single input variable. The importance analysis also showed similar results: RIncMSE and RIncNodePurity of the five VIs were very close except the OSAVI (Figure 10a).     Figure 10. Important feature selection analysis with vegetation indexes (VIs) as input variables b random forest model (RF, a) and Extreme Gradient Boosting model (XGBoost, b). IncMSE is th order of increase in mean square error, and IncNodePurity is the increase in node purity.
For XGBoost, using LCI as the input variable had slightly higher accuracy tha NDRE, which was different from RF ( Figure 11). The R 2 and the RMSE with LCI wer 0.881 and 0.879, respectively, which increased 1.03% and 1.15% compared with NDRE Moreover, prediction accuracy (test process) of LAI for the four VIs except OSAVI wa also very close, which was similar to the RF. The ranges of R 2 , NSE, RMSE, MAE, an MAPE of the four VIs except OSAVI were 0.883-0.881, 0.851-0.879, 0.637-0.708, 0.397 0.431, and 0.173-0.188, respectively ( Table 3). The lowest accurate VI was also OSAV which decreased 7.71% and 7.96% of R 2 and NSE, while it increased 25.59%, 34.00%, an 34.10%, respectively, compared with using LCI as the single input variable. However, pu ting more than one VI together as a combination could not improve the prediction accu racy obviously, which was similar to RF. For example, the most accurate combination o VIs was NDVI, GNDVI, LCI, and NDRE in XGBoost; R 2 and NSE of this combination wer 0.890 and 0.886, respectively, only increasing 1.02% and 0.80% compared with using LC as the single input variable. Meanwhile, RMSE, MAE, and MAPE of the above combina tion were 0.617, 0.341, and 0.148, which decreased 3.14%, 14.1%, and 14.5% compared wit using LCI as the single input variable, respectively. Figure 10b also indicates that the LC was the most important VI for the LAI estimation compared with other VIs. For example  (XGBoost, b). IncMSE is the order of increase in mean square error, and IncNodePurity is the increase in node purity.
For XGBoost, using LCI as the input variable had slightly higher accuracy than NDRE, which was different from RF ( Figure 11). The R 2 and the RMSE with LCI were 0.881 and 0.879, respectively, which increased 1.03% and 1.15% compared with NDRE. Moreover, prediction accuracy (test process) of LAI for the four VIs except OSAVI was also very close, which was similar to the RF. The ranges of R 2 , NSE, RMSE, MAE, and MAPE of the four VIs except OSAVI were 0.883-0.881, 0.851-0.879, 0.637-0.708, 0.397-0.431, and 0.173-0.188, respectively ( Table 3). The lowest accurate VI was also OSAVI, which decreased 7.71% and 7.96% of R 2 and NSE, while it increased 25.59%, 34.00%, and 34.10%, respectively, compared with using LCI as the single input variable. However, putting more than one VI together as a combination could not improve the prediction accuracy obviously, which was similar to RF. For example, the most accurate combination of VIs was NDVI, GNDVI, LCI, and NDRE in XGBoost; R 2 and NSE of this combination were 0.890 and 0.886, respectively, only increasing 1.02% and 0.80% compared with using LCI as the single input variable. Meanwhile, RMSE, MAE, and MAPE of the above combination were 0.617, 0.341, and 0.148, which decreased 3.14%, 14.1%, and 14.5% compared with using LCI as the single input variable, respectively. Figure 10b also indicates that the LCI was the most important VI for the LAI estimation compared with other VIs. For example, the gain value of LCI was 0.544, which increased 106% compared to the NDVI. Moreover, the gain value of other three VIs (NDRE, OSAVI, GNDVI) could be ignored compared with LCI. Meanwhile, four VIs (except OSAVI) also obtained the highest accuracy in XGBoost, while all five VIs were needed for the highest prediction accuracy of LAI by RF, which was also in accordance with using the bands as input variables. In addition, the order of importance indicators in both RF and XGBoost for the VIs was not strictly the same with the LAI estimation accuracy of a single VI, which was different from the bands as input variables. The possible reason was the VIs were the nonlinear arithmetic combination of bands.

Machine Learning Models with Multispectral Band and Vegetation Index
The above analysis indicated that both RF and XGBoost improved the prediction accuracy of LAI compared with the SEM whether they used the bands or the VIs as the input variables. Moreover, the band combination (e.g., NIR, red) could increase the prediction accuracy of LAI, while the improvement of VI combinations was relatively low. Previous studies indicated that the conversion of the bands into VI might reduce the useful information. Therefore, our study compared two combinations with both bands and VIs. For RF, adding red and NIR bands (2B) into the VI combination of OSAVI, NDVI, GNDVI, LCI, and NDRE (5VI) improved the prediction accuracy of LAI. More exactly, R 2 and NSE of the 2B + 5VI were 0.906 and 0.904, respectively, which increased 3.54%, and 3.67% and 1.57% and 1.57%, respectively, compared with only using 2B and 5VI as the input varia- In addition, the order of importance indicators in both RF and XGBoost for the VIs was not strictly the same with the LAI estimation accuracy of a single VI, which was different from the bands as input variables. The possible reason was the VIs were the non-linear arithmetic combination of bands.

Machine Learning Models with Multispectral Band and Vegetation Index
The above analysis indicated that both RF and XGBoost improved the prediction accuracy of LAI compared with the SEM whether they used the bands or the VIs as the input variables. Moreover, the band combination (e.g., NIR, red) could increase the prediction accuracy of LAI, while the improvement of VI combinations was relatively low. Previous studies indicated that the conversion of the bands into VI might reduce the useful information. Therefore, our study compared two combinations with both bands and VIs. For RF, adding red and NIR bands (2B) into the VI combination of OSAVI, NDVI, GNDVI, LCI, and NDRE (5VI) improved the prediction accuracy of LAI. More exactly, R 2 and NSE of the 2B + 5VI were 0.906 and 0.904, respectively, which increased 3.54%, and 3.67% and 1.57% and 1.57%, respectively, compared with only using 2B and 5VI as the input variables ( Figure 12). Moreover, RMSE, MAE, and MAPE of the 2B + 5VI were 0.568, 0.326, and 0.142, respectively, which decreased 13.28%, 16.84%, and 16.69% and 6.58%, 10.68%, and 10.69%, respectively, compared with only using 2B and 5VI as the input variables (Table 4). Similar results were found for XGBoost: R2 and NSE of the 2B + 5VI were 0.919 and 0.912, respectively, which increased 4.08% and 1.67% and 3.37% and 3.05%, respectively, compared with only using 2B and 5VI as the input variables ( Figure 12). Moreover, RMSE, MAE, and MAPE of the 2B + 5VI were 0.542, 0.300, and 0.131, respectively, which decreased 14.78%, 20.42%, and 20.12% and 12.72%, 14.77%, and 14.38%, respectively, compared with only using 2B and 5VI as the input variables (Table 4).  Moreover, our study also put all 10 bands and VIs as the input combination (ALL 10) for RF and XGBoost, which indicated that ALL-10 increased the accuracy slightly fo RF compared with 2B + 5VI (Figure 12). For example, R 2 and RMSE of ALL-10 were 0.91 and 0.551, while they were 0.906 and 0.568 of 2B + 5VI (Table 4). However, ALL-10 de creased the accuracy slightly for XGBoost compared with 2B + 5VI (Figure 12). The R 2 an  Moreover, our study also put all 10 bands and VIs as the input combination (ALL-10) for RF and XGBoost, which indicated that ALL-10 increased the accuracy slightly for RF compared with 2B + 5VI ( Figure 12). For example, R 2 and RMSE of ALL-10 were 0.911 and 0.551, while they were 0.906 and 0.568 of 2B + 5VI (Table 4). However, ALL-10 decreased the accuracy slightly for XGBoost compared with 2B + 5VI ( Figure 12). The R 2 and the RMSE of ALL-10 were 0.915 and 0.561, while they were 0.919 and 0.542 of 2B + 5VI for the XGBoost (Table 4). This is because the two models use different modeling methods. The RF model did not fully reflect the information of 2B + 5VIs, while the XGBoost model overcame the problem of increasing the band information and increasing the noise after eliminating the information of the NIR and the red bands [38,41]. In addition, our study also divided the statistical evaluation of each model for LAI estimation into three categories of only using a single band, a single VI, and band or VI combinations as input variables ( Figure 13). Generally, the combinations (band, VI, or band + VI) had the highest estimation accuracy of LAI, followed by the single VI and the single band, which had the lowest accuracies. Meanwhile, for the single VI as an input variable, the machine learning models had higher accuracy than the SEM. Importantly, although the estimation accuracies of RF and XGBoost with different input variables were similar in the test process, we still propose that the XGBoost was more suitable for LAI estimation in our study. The main reason is that a big difference between the training and the test processes for the LAI estimation accuracy was found in RF (Tables S2-S4), which implied the over-fitting issue might occur in RF.
Remote Sens. 2021, 13, x FOR PEER REVIEW lowest accuracies. Meanwhile, for the single VI as an input variable, the machine models had higher accuracy than the SEM. Importantly, although the estimation cies of RF and XGBoost with different input variables were similar in the test pro still propose that the XGBoost was more suitable for LAI estimation in our stu main reason is that a big difference between the training and the test processes fo estimation accuracy was found in RF (Tables S2-S4), which implied the over-fitt might occur in RF.

Conclusions
LAI is an important indicator of vegetation for crop growth and a vital var driving almost all agricultural and eco-hydrological models. This study establi rice LAI estimation models with different multispectral bands, VIs, and their c tions by XGBoost and compared the accuracy with RF and SEM models. The res

Conclusions
LAI is an important indicator of vegetation for crop growth and a vital variable for driving almost all agricultural and eco-hydrological models. This study established the rice LAI estimation models with different multispectral bands, VIs, and their combinations by XGBoost and compared the accuracy with RF and SEM models. The results indicated that the performance of multispectral bands and VIs for LAI estimation varied from different methods. The GNDVI had the highest LAI estimation accuracy in the SEM, and the NIR was the best band in both RF and XGBoost models. Meanwhile, NDRE and LCI performed better than other VIs in RF and XGBoost models, respectively. Band combination of NIR + red could improve the LAI estimation accuracy in both RF and XGBoost models, while adding more bands into the combination had little improvement for the LAI estimation accuracy. Similar to the band combinations, putting more than one VI together could only increase the LAI estimation accuracy slightly. Furthermore, our study found the bands + VIs combinations obtained higher LAI estimation accuracies in both RF and XGBoost models. Considering the potential over-fitting issue of RF, our study suggested using 2B + 5VI (red + NIR + OSAVI + NDVI + GNDVI + LCI + NDRE) to estimate the rice LAI by XGBoost. All bands and VIs could be easily obtained by the DJI Phantom 4 Multispectral (DJI-P4M, SZ DJI Technology Co., Ltd.), which is inexpensive and very convenient, increasing the practical value of this study.
Due to the limited parameters in RF and XGBoost, our study only calibrated the parameters manually by trial-and-error, making it difficult to obtain the global optimal solution. In the future, the automatic intelligent parameter optimization algorithms (e.g., bat algorithm, salp swarm algorithm) will be coupled with the XGBoost to further improve the estimation accuracy [39,66]. Moreover, the machine learning models could also be coupled with SEM or process-based methods to increase the universality of the model while improving the estimation accuracy.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/rs13183663/s1, Table S1: Parameters of RF and XGBoost with different inputs; Table S2: Statistical evaluation for the RF and XGBoost with multispectral bands (B) as input variables during the training period; Table S3: Statistical evaluation for the RF and XGBoost with vegetation indexes (VIs) as input variables during the training period; Table S4: Statistical evaluation for the RF and XGBoost with multispectral bands and vegetation indexes (VIs) as input variables during the training period.