Predicting Grassland Leaf Area Index in the Meadow Steppes of Northern China : A Comparative Study of Regression Approaches and Hybrid Geostatistical Methods

Leaf area index (LAI) is a key parameter used to describe vegetation structures and is widely used in ecosystem biophysical process and vegetation productivity models. Many algorithms have been developed for the estimation of LAI based on remote sensing images. Our goal was to produce accurate and timely predictions of grassland LAI for the meadow steppes of northern China. Here, we compare the predictive power of regression approaches and hybrid geostatistical methods using Chinese Huanjing (HJ) satellite charge coupled device (CCD) data. The regression methods evaluated include partial least squares regression (PLSR), artificial neural networks (ANNs) and random forests (RFs). The two hybrid geostatistical methods were regression kriging (RK) and random forests residuals kriging (RFRK). The predictions were validated for different grassland types and different growing stages, and their performances were also examined by adding several groups of vegetation indices (VIs). The two hybrid geostatistical models (RK and RFRK) yielded the most accurate predictions (root mean squared error (RMSE) = 0.21 m2/m2 and 0.23 m2/m2 for RK and RFRK, respectively), followed by the RF model (RMSE = 0.27 m2/m2), which was the most accurate among the regression models. These three models also exhibited the best temporal performance across the duration of the growing season. The PLSR and ANN models were less accurate (RMSE = 0.33 m2/m2 and 0.35 m2/m2 for ANN and PLSR, respectively), and the PLSR model performed the worst (exhibiting varied temporal performance and unreliable prediction accuracy that was susceptible to ground conditions). By adding VIs to the predictor variables, the predictions of the PLSR and ANN models were obviously improved (RMSE improved from 0.35 m2/m2 to 0.28 m2/m2 for PLSR and from 0.33 m2/m2 to 0.28 m2/m2 for ANN); the RF and RFRK models did not generate more accurate predictions and the performance of the RK model declined (RMSE decreased from 0.21 m2/m2 to 0.32 m2/m2).


Introduction
Leaf area index (LAI) is defined as the one-sided green leaf area per unit ground area [1] and is a crucial parameter driving the biological processes of plants.The index is closely associated with vegetative biological and physical processes, such as photosynthesis and transpiration.Therefore, LAI is used as an essential input in a variety of climate and ecosystem models [2][3][4][5].As an important indicator describing the energy and carbon exchange between vegetation and the atmosphere, the quantitative retrieval of this biophysical parameter is necessary for understanding the changes in energy and carbon cycling in response to climate change [5][6][7].
To this end, remote sensing data, which are endowed with high temporal resolution and the capacity for large-scale observation, are widely used for LAI prediction.A number of LAI prediction methods have been developed from remotely sensed data.The most popular and commonly used approaches are empirical statistical methods, including simple linear regression [8], multiple linear regression [9], and partial least squares regression (PLSR) [10].These methods primarily compute the relationship between LAI and a spectral observation or a combination of spectral observations (vegetation indices, VIs) by relying on statistical or physical knowledge.However, due to their empirical nature, these regression models are site and sensor specific, and their performance can be hampered by factors, such as differences in surface properties and sun position, as well as viewing geometry [11][12][13].Machine learning methods, such as decision tree learning, artificial neural networks (ANNs) [14], support vector machines [15], and random forests (RFs) [16,17] are also increasingly employed to optimize the use of spectral information with the goal of minimizing prediction uncertainty.The non-linear relationship between remote sensing data and biogeophysical variables endows these flexible models with the ability to combine different data structure features in a non-linear manner and to conform to the requirements of different tasks [18,19].
In addition, geostatistical prediction methods, including ordinary kriging (OK) [20], kriging with external trend [21,22], and regression kriging (RK) [23,24], which model the data structure of spatial autocorrelation and incorporate this information in the response variables for unsampled locations, have also been used to map environmental variables [25][26][27].Remote sensing images are widely used as auxiliary data.The motivation for utilizing geostatistical analysis is that geostatistical methods can exploit the presence of spatial autocorrelation and joint dependence in space and time, which occur in most natural resource variables, and can improve ecological interpretation and help to assess error spatially [21,28].Moreover, several new hybrid prediction methods that combine regression methods with geostatistical interpolation, such as the random forests residuals kriging (RFRK) method [29], have also been proposed to account for the spatial structure of observed data and the environmental correlation.
Although these methods have reportedly met with varying success [19,21,30,31], systematic comparisons among them have rarely been conducted, especially for geostatistical methods, which are often not the first choice for analyses and are not widely used to map vegetation photosynthetic parameters [32].The goal of this study was to comparatively assess the predictive power of PLSR, ANNs, RFs, and hybrid geostatistical methods (RK and RFRK) in assessing grassland LAI in a meadow steppe in Hulunber, northern China.First, the four reflectance bands (three visible bands (blue, green and red) and one near-infrared (NIR) band) were used as predictor variables while training the models with an experimental dataset, and their performance was assessed using an independent validation dataset.Next, VIs were added to the inputs to determine whether additional predictor variables would improve model performance.Finally, model performance was evaluated for different grassland types (grazing grassland, mowed grassland, and fenced grassland) and during different growing stages (early, middle, and late stage) to future explore their spatial and temporal stability and discover their sensitivity to environmental factors.This study provides useful knowledge regarding the performance of different methods for the quantitative prediction of grassland LAI to guide their applications in ecosystem modeling.

Study Site
Hulunber is the most complete and best-preserved natural meadow steppe on the Eurasian continent and has an amazing abundance of species endowed with high economic and ecological value.The Hulunber grassland ecosystem observation and research station (Hulunber station) is located in the middle of the Hulunber meadow steppe (49 ˝20 1 24 11 N, 119 ˝59 1 44 11 E), which is approximately 30 km northeast of the Hailaer District in Hulunber City, Inner Mongolia, China (Figure 1).The study area is located around the Hulunber station and covers an area of approximately 17 km ˆ7 km; the land cover is mainly meadow steppe, with 18.21 km 2 of cropland in the center (Figure 1).The region is characterized by a semi-arid inland climate with an annual mean precipitation of 350-400 mm and annual mean temperatures ranging from ´3 to 1 ˝C.The average elevation is 626 m, and the terrain features a rolling surface that varies by as much as 200 m in elevation.Field observations were conducted on the grassland of the study area, primarily at the experimental site in an area of 3 km ˆ3 km that was centered at an eddy covariance flux tower.The site is homogeneous, and Leymus chinensis and Stipa baicalensis are the dominant grass species.The length of the growing season is approximately 140 days and lasts from May to September [33].There are three grassland types at the site [34]: grazing grassland, which feeds cattle; mowed grassland that is used for silage; and fenced grassland, which is enclosed by a fence and has grown naturally for the past seven years without any external influence.

Study Site
Hulunber is the most complete and best-preserved natural meadow steppe on the Eurasian continent and has an amazing abundance of species endowed with high economic and ecological value.The Hulunber grassland ecosystem observation and research station (Hulunber station) is located in the middle of the Hulunber meadow steppe (49°20′24′′ N, 119°59′44′′ E), which is approximately 30 km northeast of the Hailaer District in Hulunber City, Inner Mongolia, China (Figure 1).The study area is located around the Hulunber station and covers an area of approximately 17 km × 7 km; the land cover is mainly meadow steppe, with 18.21 km 2 of cropland in the center (Figure 1).The region is characterized by a semi-arid inland climate with an annual mean precipitation of 350-400 mm and annual mean temperatures ranging from −3 to 1 °C.The average elevation is 626 m, and the terrain features a rolling surface that varies by as much as 200 m in elevation.Field observations were conducted on the grassland of the study area, primarily at the experimental site in an area of 3 km × 3 km that was centered at an eddy covariance flux tower.The site is homogeneous, and Leymus chinensis and Stipa baicalensis are the dominant grass species.The length of the growing season is approximately 140 days and lasts from May to September [33].There are three grassland types at the site [34]: grazing grassland, which feeds cattle; mowed grassland that is used for silage; and fenced grassland, which is enclosed by a fence and has grown naturally for the past seven years without any external influence.

Sampling Design and Field Measurements
Field campaigns were performed during the growing seasons of 2014 and 2015.The experimental dates were 6 June, 1 July and 28 July in 2014 and 19 June, 10 August and 26 August in 2015.For regular experiments performed at the site, a two-scale sampling strategy designed by the VALERI project [35,36] was adopted to collect ground LAI data.The two scales used for the VALERI method were the site scale (at least a 3 km × 3 km square representative of the entire experimental site) and the elementary sampling unit scale (ESU, 30 m × 30 m, in this study corresponding to the pixel size of the remote sensing imagery).For the site scale, the 3 km × 3 km region was divided into nine 1 km × 1 km grids, and three to five ESUs were randomly selected in each grid.In total, 29 ESUs were chosen across the entire site.For each field campaign, the sampling plots were selected randomly from the 29 ESUs.A more detailed sampling protocol of this site has been described  For regular experiments performed at the site, a two-scale sampling strategy designed by the VALERI project [35,36] was adopted to collect ground LAI data.The two scales used for the VALERI method were the site scale (at least a 3 km ˆ3 km square representative of the entire experimental site) and the elementary sampling unit scale (ESU, 30 m ˆ30 m, in this study corresponding to the pixel size of the remote sensing imagery).For the site scale, the 3 km ˆ3 km region was divided into nine 1 km ˆ1 km grids, and three to five ESUs were randomly selected in each grid.In total, 29 ESUs were chosen across the entire site.For each field campaign, the sampling plots were selected randomly from the 29 ESUs.

Sampling Design and Field Measurements
A more detailed sampling protocol of this site has been described previously [37].In addition, each ESU was located with a Global Positioning System (GPS) that was accurate to 2 m, ensuring that the measurements for each campaign were collected in the same location.
The effective LAI was measured using an LAI-2200C plant canopy analyzer (Li-Cor, Lincoln, NE, USA) with a 270 ˝view cap.The LAI-2200C is an indirect, non-contact instrument that measures the gap fraction by observing diffuse radiation transmission through the canopy based on the assumption of a random leaf distribution within the canopy [38][39][40].At each ESU, the effective LAI was measured at five points organized in a "cross" pattern in which each sample point was 15 m from the next point.One above-canopy and six below-canopy LAI-2200 measurements were obtained at each point to obtain one local LAI value, and five local LAI values were averaged to calculate a mean value for each ESU.The measurements were collected near sunrise or sunset to ensure nearly uniform sky illumination.From six field campaigns at the experimental site, a total of 690 LAI measurements were collected from 138 ESU plots.

Satellite Data
Chinese HJ-1A/1B (Huanjing (HJ)) charge coupled device (CCD) images were used as the remote sensing data source in this study.The HJ-1A/1B satellites are small civilian Earth-observing satellites that were launched on 6 September 2008 by China [41].Among the payloads aboard the two satellites, multispectral CCD cameras are widely used in eco-environmental monitoring.Each satellite carries two CCD cameras, named CCD1 and CCD2, with a 700 km swath width, 48 h return period and 30 m pixel size.The HJ-1A/1B CCDs have three visible bands (blue (430-520 nm), green (520-600 nm) and red (630-690 nm)) and one near-infrared (NIR) band (760-900 nm) [42].
Six HJ-1A/1B CCD images corresponding to the dates of the field experiments were used in this study (Table 1).All images were radiometrically and geometrically corrected and were projected as UTM coordinates (WGS84 datum, Zone 50N).All images were high quality, and minimal (<10%) or no cloud contamination occurred at the site.To obtain the reflectance of the top of the canopy, the images were atmospherically corrected using the FLAASH program embedded in ENVI 4.8 software [43].Two important parameters were used in the FLAASH program for atmospheric correction: aerosol optical depth and the water vapor column.These parameters were obtained using a Microtops II Sunphotometer (Solar Light Company, Inc., Glenside, PA, USA) during each field experiment.Finally, geometric correction was performed on all HJ-1A/1B CCD images using ground points collected in the field around the site; the correction accuracy was limited to within 1 pixel.This study incorporated reflectance in four individual bands (blue, green, red, and NIR) and four VIs calculated from individual bands as independent variables.The vegetation indices included the simple ratio (SR), normalized difference vegetation index (NDVI), Atmospherically Resistant Vegetation Index (ARVI), and Wide Dynamic Range Vegetation Index (WDRVI), which exhibited strong and significant relationships with canopy LAI in the previous study [8,17,44].These indices were computed using the following equations [45][46][47][48]: where blue, red, and N IR refer to the band reflectance, γ is the atmospheric self-correcting factor (a value of 1 was recommended by Kaufman and Tanre [47] and is used in this study), and α is the weighting coefficient (an α value of 0.1 is used in this study [48]).

Partial Least Squares Regression (PLSR)
PLSR is a technique that reduces a large number of measured collinear spectral variables to a few non-correlated latent variables or factors, which must both summarize the variance of the explanatory variables well and correlate highly with the response variables [49,50].The aim of PLSR is to build a linear model as follows: where Y is the mean-centered vector of the response variable, X is the mean-centered matrix of the predictive variables, β is the matrix of coefficients, and ε is the matrix of residuals.PLSR is closely related to principal component regression (PCR).In addition to PCR, PLSR uses both the predictor variables and response variable during the decomposition process and performs the decomposition on both the predictor variables and the response variable simultaneously, whereas PCR performs the decomposition on the predictor variables alone.The optimal number of factors for a PLS analysis is usually determined by minimizing the prediction residual error sum of squares (PRESS) statistic.
The PRESS statistic was calculated through a cross-validation (CV) prediction for each model.The root mean squared error of cross validation (RMSCV) is also used to assess the predictive abilities of the PLS models [51].The analysis was accomplished using the "pls" package [52] within the statistical software package R 3.2.0.

Artificial Neural Networks (ANNs)
ANNs are non-linear statistical learning approaches that have great potential for predictive modeling [53,54].ANNs are composed of a large number of highly interconnected artificial neurons with weighted links that connect the input and output data through a learning process [14].Various types of neural networks have been developed, and a layered feed-forward ANN with three layers is the most common ANN structure.In an ANN, information flows in a unidirectional forward mode from an input layer to an output layer via hidden layer(s).Neural networks attempt to identify the best solution based on network complexity through adaptive learning processes and the incorporation of various ancillary information (e.g., topography, sun angle, and ground data) [14].A feed-forward neural network with a single hidden layer was used in this study.The analysis was accomplished using the "nnet" package within the statistical software package R 3.2.0[55].

Random Forests (RFs)
An RF is an ensemble learning method that can be used for either classification or regression [56,57].The algorithm is conceptually similar to the bagging decision tree but has extensions.An RF is a combination of tree predictors (n tree ) such that each tree depends on a collection of random variables sampled independently and then aggregates to produce accurate predictions.This method also shows better resistance to the over-fitting problem and to noise in the data compared with other regression methods [56].Unlike bagging trees, an RF grows its trees from a randomly chosen subset of the total number of predictors at each splitting node (m try ), and the tree is allowed to grow fully without pruning.Each tree in the RF is independently grown to its maximum size based on a bootstrap sample from the training dataset (approximately two-thirds), and the remaining one-third of the samples are randomly left out.The left out samples are called the out-of-bag (OOB) samples, which are used to calculate an unbiased OOB error rate and the variable's importance (measured by calculating the percent increase in the mean square error when the OOB data for each variable are permuted) [56,58].At each binary split, the predictor that produces the best split is chosen from a random subset (m try ) of the entire predictor set (p); m try is recognized as the main tuning parameter of an RF and must therefore be optimized [59,60].The analysis was accomplished using the "randomForest" package [57] within the statistical software package R 3.2.0.

Regression Kriging (RK)
RK is a hybrid geostatistics method that combines a regression between the target variable and environmental variables with the ordinary kriging (OK) of the regression residuals.In RK, a liner regression is first used to fit the explanatory variation, and then kriging is used to fit the unexplained variation and to model the spatial variability of the data [61,62].Finally, predictions at unvisited locations ẑRK ps 0 q are performed by summing the predicted trend and residuals.The trend is commonly fitted using generalized least-squares regression, and the residuals are interpolated using OK [32,61].
where βk corresponds to the estimated trend model coefficients, q k ps 0 q represents the predictive variables at the location s 0 , eps i q is the regression residual, λ i is the kriging weight determined by the spatial autocorrelation structure of the residual, and p is the number of auxiliary predictors.The analysis was accomplished using the "gstat" package [63] within the statistical software package R 3.2.0.

Random Forests Residuals Kriging (RFRK)
Although RF is a robust method that can improve prediction accuracy, this method ignores spatial autocorrelation information.To overcome this disadvantage, a hybrid method that combines RF and OK was developed and has been verified to generate much lower prediction errors and to yield a more realistic spatial distribution than the RF model [29].RFRK is an extension of RF and is very similar to RK. RFRK also consists of trends and residuals.Here, the trend is modeled using RF; the residuals from RF are interpolated to prediction grids using OK, and the interpolated residuals are added to the RF prediction results to obtain the RFRK prediction results.The RFRK formula is as follows: where LAI RFRK ps i q is the predicted LAI at location s i , LAI RF ps i q is the trend modeled by RF, and LAI OK ps i q is the residual interpolated by OK.This analysis was accomplished using the "randomForest" and "gstat" packages within the statistical software package R 3.2.0.

Model Implementation and Validation
In this study, we assumed that the ground measurements for each field campaign were conducted independently, and the temporal mixed effects of clustered data that originated from repeated measurements were not considered.To compare performance among models, the models were first implemented using predictive variables for the four spectral bands, and their performance for different grassland types and growing seasons was analyzed.The VIs were then gradually added to the input variables and were divided into five groups: four bands plus the best-performing VI, four bands plus one VI, four bands plus two VIs, four bands plus three VIs, and four bands plus four VIs.For each group, the LAI at the study site was first predicted using four bands plus the group number of VIs that were randomly combined.Then, after predicting LAI using all the combinations, each LAI prediction was assessed separately using ground measurements, and the accuracy indicator (root mean squared error (RMSE) in this study) was averaged to assess the performance of the group.
An independent dataset with 34 ESU plots was randomly selected from the original 138 samples to validate model performance, while the remaining 104 samples were used to train the models.The RMSE, mean absolute error (MAE) and coefficient of determination (R 2 ) between measured and predicted values were calculated to assess the accuracy of each model: where ŷi is the predicted LAI value, y i is the measured LAI value and n is the number of measured values in the validation data.RMSE and MAE values close to zero and an R 2 value close to one indicate a better predictive capability of a model.

Field LAI Measurements
Detailed summary statistics of the LAI measurements are shown in Table 2. Generally, the mean effective LAI values for the site ranged from 0.5-3.6 m 2 /m 2 .During the early stage of the growing season (6 June 2014 and 19 June 2015), the mean and variability of the grass LAI values were relatively low (mean LAI value of 0.9-1.6 m 2 /m 2 and a standard deviation of 0.1-0.3m 2 /m 2 ).The middle of the growing stage (July in 2014) exhibited relatively high LAI values as well as increased variability.At the end of the growing season (August in 2015), the LAI variability in the mowed grassland remained at a high level due to grass cutting activity, while the grazing grassland showed a low LAI variability, and the fenced grassland still had a very high LAI value (LAI > 3.0 m 2 /m 2 ). 1 LAI in units of m 2 /m 2 ; 2 "-"means no data are available for that day.

LAI Spatial Prediction Based on the Four Reflectance Bands
For the selection of tuning parameters used in the regression models, exploratory trials were conducted using the training dataset.For the PLSR, the first three principal components (PC) contained 99.01% of the predictor variable information and 74.17% of the response variable information (data not shown), which indicates that these three components could reasonably substitute the original inputs and explain the output information.For the ANN model, after trials, the optimized network with the best estimation performance was determined using 1 neuron in the hidden layers.For the RF model, n tree was set to 500 (after exploratory trials using the training data).For m try , the results from previous studies [59,64] showed that m try optimization resulted in minimal improvements in RF predictions.We therefore used a predictive variable number of 2 in this analysis.
To apply the hybrid geostatistical methods (RK and RFRK), we utilized regression residual semivariograms and parameters obtained from the training dataset (Figure 2 and Table 3), and the optimal variogram models were determined by the criteria of the sum of squared errors reported by the "gstat" package.An exponential or Gaussian model was fit to the sample semivariogram.For the RK, the models revealed the spatial autocorrelation of grassland LAI during the peak of the growing season with negligible nugget, a partial sill of 0.2, and a spatial range of 400-600 m.However, the models indicated a lower spatial dependence at the beginning and end of the growing season (larger partial sill and spatial range).For RFRK, the models also revealed aspects of spatial autocorrelation but performed worse than RK.The LAI maps predicted from the five models are displayed in Figure 3. Overall, the prediction surfaces were similar in terms of the spatial patterns of grassland LAI.Differences between the mowed grassland, fenced grassland and grazing grassland were apparent from northwest to southeast.We observed a higher LAI value in the mowed grassland than in the grazing grassland throughout the growing season.The grazing grassland is used to feed cattle throughout the growing season, and the LAI value remained low.In the mowed grassland, the grass was fenced prior to August and then cut for silage.Thus, the LAI value was higher than the grazing grassland before it was cut and then sharply declined.The fenced grassland accumulated a considerable amount of litter prior to 2015, which delayed and then prevented the grass from growing, resulting in a lower LAI value in June 2014 but a similar value to the grazing grassland and mowed grassland on 1 July 2014 and 28 July 2014, respectively.At the beginning of 2015, the fenced grassland was burned and little litter remained.The grass grew quickly and displayed higher LAI values in 2015.
Table 4 presents the general statistics for the predicted LAI maps.Compared to the measured LAI (Table 2), the predicted mean values approximated the measured values, whereas the minimum, maximum and standard deviation values were more varied.The RK predictions had higher standard deviation values (smaller minima and greater maxima) except for 19 June 2015 (Table 4).The LAI maps predicted by the PLSR model had a lower standard deviation value at the middle and end of the growing season (July and August), and the RF predictions exhibited greater minima.

Model Evaluation
Table 5 presents a model evaluation result derived from an independent validation of the LAI maps using the validation dataset.RK was the most accurate method and exhibited the lowest RMSE (0.21 m 2 /m 2 ) and MAE (0.16 m 2 /m 2 ) values, as well as the highest R 2 (0.92) value.The accuracy of the RFRK predictions was nearly as good, with RMSE, MAE, and R 2 values 0.23 m 2 /m 2 , 0.17 m 2 /m 2 , and 0.91, respectively, indicating an obvious improvement compared to the RF predictions.The predictive ability of the regression models was further improved by a geostatistical analysis of the regression residuals to compensate for the spatial autocorrelation information.However, the RF model still performed the best compared with all of the regression models due to its randomness and majority rule [56].Although PLSR performed satisfactorily (RMSE of 0.35 m 2 /m 2 , MAE of 0.27 m 2 /m 2 , and R 2 of 0.77), this model was the worst of the five evaluated models (followed by the ANN model).

LAI Predictions Based on the Four Reflectance Bands and VIs
The selected four VIs all had a good relationship with LAI; after conducting a correlation analysis between satellite-derived VIs and measured LAI, the R 2 values were higher than 0.65 (0.77, 0.74, 0.69, and 0.66 for SR, WDRVI, ARVI and NDVI, respectively), and the correlation coefficients were higher than 0.79 (0.88, 0.86, 0.83, and 0.79 for SR, WDRVI, ARVI and NDVI, respectively).To examine whether adding predictor variables would improve the model performance, the four VIs of SR, NDVI, ARVI and WDRVI were calculated, added to the original predictors, and divided into five groups (Table 6): four bands plus the best-performing VI (SR), four bands plus one VI, four bands plus two VIs, four bands plus three VIs, and four bands plus four VIs, detailed information was introduced in Section 3.6.The statistical analysis was performed separately for each of the five groups and the validation results are shown in Table 6.Generally, the performance of all the models, except for RK, with input variables of the four bands plus SR had the smallest prediction error with a small MAE and RMSE.For the PLSR and ANN models, adding VIs to the input variables improved the model performance; more predictors mean more information that can be used, resulting in better accuracy of the model.For the RF and RFRK models, adding more VIs did not result in more accurate predictions.By combining an ensemble of decision trees and randomly changing the predictors and training data for each decision tree, the RF model improved the prediction accuracy and demonstrated a more robust capacity with respect to the over-fitting problem and resisting noise data [56,57,59].Furthermore, the RF model exhibited insensitivity to highly correlated predictors and irrelevant information [65,66].Thus, VIs that were highly correlated with reflectance bands did not provide a considerable amount of additional knowledge to the RF model.For the RK model, the regression trend was implemented using multiple linear regression, and the multi-collinearity between independent variables stemming from the addition of VIs highly correlated with the reflectance band prevented the available information from being fully used, which generates ill-posed inversion problems [14,67].The introduction of VIs to the RK model therefore resulted in poorer performance.

Model Performance in Different Grassland Types and at Different Growing Stages
Model performance was also assessed for different grassland types and different growing stages using the validation dataset.The RMSE between the predicted and measured LAIs is displayed in Figure 4. Across the entire site, the five models performed similarly and generated similar RMSEs (except for the prediction on 28 July 2014 for which PLSR and ANN generated a higher RMSE than the other three models, mainly due to the higher RMSE value of the grazing grassland).In addition, higher RMSE values were observed at the end of the growing season (August) for all five models.In the mowed grassland, the predicted LAI values were greater than 1 m 2 /m 2 on all six experimental dates.The five models exhibited a similar performance throughout the season but greater uncertainty was observed in the late-middle and end stages of the growing season (28 July 2014 and 10 August 2015), possibly resulting from an increasing amount of litter in the grassland [68,69].After this time, grass cutting activity reduced the proportion of canopy litter, and the prediction uncertainty decreased on 26 August 2015.For the grazing grassland, the five models displayed more varied performance, especially the PLSR and ANN models.In contrast, the RF, RK and RFRK models performed steadily and exhibited low prediction error throughout the growing season.The PLSR and ANN models performed well with a smaller RMSE value during the beginning and early-middle stages of the growing season (June and early July).More uncertainty was observed during the late-middle and end stages of the growing season, possibly resulting from the continuous decline in vegetation cover due to grazing [70].In the fenced grassland, the validation data existed for only two dates (10 August and 26 August in 2015).PLSR performed the worst.All five models generated considerable uncertainty for 26 August 2015 due to dead grasses.
Overall, the RF, RK and RFRK models performed well with a comparatively smaller and steady RMSE value throughout the growing season, while the performance of the PLSR and ANN models was more varied.Vegetation cover and grass litter were two factors that affected model performance.

Model Performance in Different Grassland Types and at Different Growing Stages
Model performance was also assessed for different grassland types and different growing stages using the validation dataset.The RMSE between the predicted and measured LAIs is displayed in Figure 4. Across the entire site, the five models performed similarly and generated similar RMSEs (except for the prediction on 28 July 2014 for which PLSR and ANN generated a higher RMSE than the other three models, mainly due to the higher RMSE value of the grazing grassland).In addition, higher RMSE values were observed at the end of the growing season (August) for all five models.In the mowed grassland, the predicted LAI values were greater than 1 m 2 /m 2 on all six experimental dates.The five models exhibited a similar performance throughout the season but greater uncertainty was observed in the late-middle and end stages of the growing season (28 July 2014 and 10 August 2015), possibly resulting from an increasing amount of litter in the grassland [68,69].After this time, grass cutting activity reduced the proportion of canopy litter, and the prediction uncertainty decreased on 26 August 2015.For the grazing grassland, the five models displayed more varied performance, especially the PLSR and ANN models.In contrast, the RF, RK and RFRK models performed steadily and exhibited low prediction error throughout the growing season.The PLSR and ANN models performed well with a smaller RMSE value during the beginning and early-middle stages of the growing season (June and early July).More uncertainty was observed during the late-middle and end stages of the growing season, possibly resulting from the continuous decline in vegetation cover due to grazing [70].In the fenced grassland, the validation data existed for only two dates (10 August and 26 August in 2015).PLSR performed the worst.All five models generated considerable uncertainty for 26 August 2015 due to dead grasses.
Overall, the RF, RK and RFRK models performed well with a comparatively smaller and steady RMSE value throughout the growing season, while the performance of the PLSR and ANN models was more varied.Vegetation cover and grass litter were two factors that affected model performance.The predicted mean LAI is the mean value averaged from the maps predicted using the five models; the missing RMSE in the fenced grassland indicates that no validation data were available for that day.

Model Comparison and Study Limitations
In our study, the geostatistical methods proved to be the most accurate models in predicting grassland LAI, as indicated by the higher R 2 and lower RMSE values.Through geostatistical analysis of the residuals of the regression models, the prediction ability was further improved by supplying spatial autocorrelation information, showing better performance than the regression methods [21,23,71].Despite these attractive properties, geostatistical methods are more sophisticated than simple mechanistic or kriging techniques.The requirements for the regression modeling are high, and the ground data accuracy will directly affect the fitted variogram parameters [32,61].Moreover, issues associated with the transferability of results between images can also represent a serious limitation [23].However, outside of the regression training area, the geostatistical analysis lost its power and generated more uncertainty.
The machine learning methods, such as ANNs and RFs, tend to be more powerful in predicting grassland LAI than the linear regression methods (e.g., PLSR) [17,19].Through adaptive learning processes and the incorporation of more ancillary information, the best solution could be found without the constraints of linearity and multi-collinearity.Moreover, the ANN and RF models can also be used as variable selection tools to identify informative variables based on the network's performance [14] or variable importance score [30,66].However, the "black box" property of these approaches affected the model transparency [53], which prevents users from interpreting the results in physical terms [18].In comparison, the recently popular RF method demonstrates a higher efficiency spatially and temporally compared with other machine learning methods due to its randomness and majority rule [59,72], and the model is recommended to be considered superior when combined with the hybrid inversion strategy [17,18].The ANN method is susceptible to variation in the training data [19,73] and environmental interference information such as atmospheric scattering and background reflectance [17], which was also shown in this study and which reduced its spatial and temporal accuracy.
The PLSR also demonstrated certain power for predicting grassland LAI.The method was simple, computationally inexpensive and devoid of the co-linearity problem.However, the model was also very sensitive to some disturbing factors, such as surface property differences and satellite sun and viewing geometry.Therefore, this type of model cannot be applied to different site and sensor conditions [12,13].The problem can be alleviated by using certain VIs that are sensitive to the target variables and relatively insensitive to interference factors [8,17,44].
The major source of uncertainty in this study is associated with the ground measurement data.For repeated measurements, the machine learning methods accounted for the spatial structure of errors, but the temporal structure error was neglected by assuming independent ground measurements between field campaigns.In contrast, certain methods that can handle mixed effects The predicted mean LAI is the mean value averaged from the maps predicted using the five models; the missing RMSE in the fenced grassland indicates that no validation data were available for that day.

Model Comparison and Study Limitations
In our study, the geostatistical methods proved to be the most accurate models in predicting grassland LAI, as indicated by the higher R 2 and lower RMSE values.Through geostatistical analysis of the residuals of the regression models, the prediction ability was further improved by supplying spatial autocorrelation information, showing better performance than the regression methods [21,23,71].Despite these attractive properties, geostatistical methods are more sophisticated than simple mechanistic or kriging techniques.The requirements for the regression modeling are high, and the ground data accuracy will directly affect the fitted variogram parameters [32,61].Moreover, issues associated with the transferability of results between images can also represent a serious limitation [23].However, outside of the regression training area, the geostatistical analysis lost its power and generated more uncertainty.
The machine learning methods, such as ANNs and RFs, tend to be more powerful in predicting grassland LAI than the linear regression methods (e.g., PLSR) [17,19].Through adaptive learning processes and the incorporation of more ancillary information, the best solution could be found without the constraints of linearity and multi-collinearity.Moreover, the ANN and RF models can also be used as variable selection tools to identify informative variables based on the network's performance [14] or variable importance score [30,66].However, the "black box" property of these approaches affected the model transparency [53], which prevents users from interpreting the results in physical terms [18].In comparison, the recently popular RF method demonstrates a higher efficiency spatially and temporally compared with other machine learning methods due to its randomness and majority rule [59,72], and the model is recommended to be considered superior when combined with the hybrid inversion strategy [17,18].The ANN method is susceptible to variation in the training data [19,73] and environmental interference information such as atmospheric scattering and background reflectance [17], which was also shown in this study and which reduced its spatial and temporal accuracy.
The PLSR also demonstrated certain power for predicting grassland LAI.The method was simple, computationally inexpensive and devoid of the co-linearity problem.However, the model was also very sensitive to some disturbing factors, such as surface property differences and satellite sun and viewing geometry.Therefore, this type of model cannot be applied to different site and sensor conditions [12,13].The problem can be alleviated by using certain VIs that are sensitive to the target variables and relatively insensitive to interference factors [8,17,44].
The major source of uncertainty in this study is associated with the ground measurement data.For repeated measurements, the machine learning methods accounted for the spatial structure of errors, but the temporal structure error was neglected by assuming independent ground measurements between field campaigns.In contrast, certain methods that can handle mixed effects can be considered in future analyses [74,75].Furthermore, HJ-CCD images and derived VIs were used in this study, and even though satisfactory results were obtained, the comparatively narrow spectral range still limited the more powerful VIs (e.g., reduced simple ratio (RSR) and cellulose absorption index (CAI)) used.In future studies, more powerful predictive variables derived from optical remote sensing datasets with broader spectral range (e.g., Landsat, Sentinel-2), light detection and ranging (LIDAR) [76] and synthetic aperture radar (SAR) [24] can be used, providing more information to train the dataset, which may lead to more accurate results.The study models were tested to determine their sensitivity to additional input variables (VIs); however, the impact of irrelevant information on model prediction was not tested in this study, and certain irrelevant and interference predictors can be introduced into the input variables to test their effect in future modeling.Finally, although the area of the study site is limited (only 3 km ˆ3 km), the biomes studied represent meadow steppes in northern China, and they can serve as a reference for future studies.Future research must also be expanded to larger area and new grassland types (desert grassland and typical grassland) that have not yet been adequately represented.

Conclusions
This study compared regression and hybrid geostatistical methods for predicting grassland LAI from Chinese HJ-1A/1B CCD images of northern China.The methods used were partial least squares regression, artificial neural networks, random forests, regression kriging, and random forests residuals kriging.Ground measurements of LAI were used for model training and validation.The two hybrid geostatistical models (RK and RFRK) resulted in more accurate predictions than the other models, and the regression residuals used to supply spatial autocorrelation information improved the prediction ability.The RF model was the most accurate of the regression models, while the other three models resulted in improved temporal performance throughout the growing season.The PLSR and ANN models were less accurate, and the PLSR model performed the worst.The temporal performances of PLSR, ANN, and PLSR were more varied, and their prediction accuracies were more susceptible to ground conditions, including vegetation cover and grass litter.By adding VIs to the predictor variables, the predictions of the PLSR and ANN models were markedly improved.However, the RF and RFRK models did not generate more accurate predictions, and the RK model generated poorer predictions due to inversion problems caused by the multi-collinearity between the independent input variables.These results should be further validated for a larger area, and more powerful predictive variables and remote sensing datasets (e.g., Landsat and Sentinel-2) should be utilized to develop accurate hybrid methods.Differential performance was observed for the methods evaluated herein for the meadow steppe of northern China, indicating that further comparisons for other applications, contexts, data quality and objectives are necessary.

Field
campaigns were performed during the growing seasons of 2014 and 2015.The experimental dates were 6 June, 1 July and 28 July in 2014 and 19 June, 10 August and 26 August in 2015.

Figure 3 .
Figure 3. Spatially distributed maps of grassland LAI.Figure 3. Spatially distributed maps of grassland LAI.

Figure 3 .
Figure 3. Spatially distributed maps of grassland LAI.Figure 3. Spatially distributed maps of grassland LAI.

Figure 4 .
Figure 4. RMSE of the validation results for different grassland types using the measured validation dataset for: (a) the whole site; (b) mowed grassland; (c) grazing grassland; and (d) fenced grassland.The predicted mean LAI is the mean value averaged from the maps predicted using the five models; the missing RMSE in the fenced grassland indicates that no validation data were available for that day.

Table 2 .
Descriptive statistics of the measured LAI dataset 1 .

Table 3 .
Parameters of the fitted empirical variogram models built from the residuals for RK and RFRK prediction.

Table 4 .
Descriptive statistics of the LAI maps predicted by the study models.

Table 4 .
Descriptive statistics of the LAI maps predicted by the study models.

Table 5 .
Validation of the predicted LAI maps using the measured validation dataset.

Table 6 .
Validation of LAI predictions based on the four reflectance bands including VIs using the measured validation dataset.

Table 6 .
Validation of LAI predictions based on the four reflectance bands including VIs using the measured validation dataset.