Canopy closure (CC), the percentage of land area covered by the vertical projection of tree crowns [1], is the most common variable estimated in forest inventories, because CC > 10% is a criterion in the international definition of forests [2]. Leaf area index (LAI), defined as the total one-sided area of all leaves in the canopy within a defined region (m²), mainly quantifies the amount of live green leaf material present in the canopy per unit ground area [3]. Both CC and LAI are commonly used to characterize the structure and function of forest ecosystems, but they describe different aspects: CC is mainly used for subcompartment division, land classification, stand type classification, and stand quality evaluation, whereas LAI is an important structural parameter related to the energy and mass exchange of terrestrial ecosystems, such as photosynthesis, respiration, transpiration, carbon and nutrient cycling, and rainfall interception [3]. Because the main purpose of this study is to replace traditional field forest measurements with remote sensing, CC is the more appropriate and convenient variable to estimate.
Traditional methods of CC measurement include visual assessment, sample points, transects, canopy projections, observation tubes, and canopy instrument analysis [8]. These are manual field methods that require considerable human and material resources, and they make it difficult to obtain CC estimates at a regional scale. Remote sensing estimation of CC was originally performed with aerial photographs of the target area, but because forest areas are very large and forest boundaries change over time, obtaining CC from aerial photographs is too expensive [9]. Light detection and ranging (LiDAR) data can aid in estimating CC at the pixel level; however, LiDAR is not cost-effective for long-term repeat measurements and cannot be used for historical analyses. Although LiDAR, alone or integrated with other sensor data and field measurements, is considered the best way to estimate CC, automation and extrapolation to larger regional scales remain a challenge [10]. CC estimation based on microwave data is also limited by system availability, data processing, and heterogeneous interactions between microwave bands and forest structure [15].
Optical satellite imagery is effective, reproducible, and inexpensive, and well-established methods exist for processing it. Consequently, a variety of optical remote sensing data sources have been successfully applied to estimating forest parameters such as CC. Hyperspectral and multi-angle optical data are not the best sources for CC estimation because their processing is complex [19]. Medium- and low-spatial-resolution optical data (e.g., from the Moderate-resolution Imaging Spectroradiometer (MODIS)) cannot always meet the requirements of forest canopy estimation [21]. Although optical imagery with medium spatial resolution is currently the main data source for CC estimation, it cannot detect small changes in closure at the forest stand scale [22]. High spatial resolution remote sensing data therefore provide a more effective source for CC estimation, with higher spatial and temporal resolution and lower acquisition costs. Studies have shown that high spatial resolution data can not only be used to verify CC estimates derived from lower-resolution data but can also reduce the estimation error [26].
Remote sensing inversion models of forest canopy density fall mainly into physical and statistical models. Physical models are represented mainly by geometric-optical and radiative transfer models [29]. However, they are complex and require many parameters, and problems such as non-unique solutions make them difficult to apply widely [30]. Statistical models combine remote sensing data with a small number of field-measured samples to predict canopy closure in unsampled areas, and they are more economical and efficient than physical models. Remote sensing factors such as spectral bands, vegetation indices, and texture features, together with auxiliary information such as soil characteristics, seasonal information, and topography, are often used in statistical models of canopy closure [31]. Among these sources of remote sensing information, vegetation indices perform very well in predicting canopy closure [34]. Statistical models are generally divided into parametric, semi-parametric, and non-parametric models, and the performance of these three kinds of models combined with vegetation indices for estimating canopy closure needs further study [9].
Earth observation satellites such as China's Gaofen-1 (GF-1) have been launched in succession, greatly enriching the availability of high-resolution multispectral remote sensing data. This data source has great potential for CC estimation and is worthy of further study; therefore, high-resolution GF-1 images were used in this study. A northern temperate forest (Wangyedian Forest Farm, Chifeng City, Inner Mongolia Autonomous Region, China) and a subtropical forest (Gaofeng Forest Farm, Gaofeng City, Guangxi Zhuang Autonomous Region, China) were selected as experimental areas. We developed a parametric model (multiple linear regression, MLR), a semi-parametric model (generalized additive model, GAM), and a non-parametric model (random forest, RF) for estimating CC from GF-1 high spatial resolution images, and compared the performance of the three models in plantation forests. The study had three main objectives: to evaluate the performance of GF-1 high spatial resolution imagery for estimating CC; to identify which of the three model types performed best; and to determine whether the performance of the three models was consistent across the two experimental areas.
The R², RMSE, and rRMSE of the MLR, GAM, and RF models for Wangyedian (WYD) ranged from 0.45 to 0.76, 0.0632 to 0.0953, and 9.98% to 15.05%, respectively (Table 4). Among the three models, the semi-parametric model (GAM) performed best and the non-parametric model (RF) performed worst. Compared with MLR and RF, GAM reduced the RMSE (rRMSE) by 0.0211 (3.33%) and 0.0321 (5.05%), respectively. The R², RMSE, and rRMSE of the MLR, GAM, and RF models for the Gaofeng (GF) area ranged from 0.40 to 0.59, 0.0967 to 0.1152, and 16.73% to 19.93%, respectively. Again, GAM performed best and RF performed worst; compared with MLR and RF, GAM reduced the RMSE (rRMSE) by 0.0051 (0.88%) and 0.0185 (3.20%), respectively.
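The three accuracy measures can be computed as in the sketch below. It assumes the common definitions R² = 1 − SS_res/SS_tot and rRMSE = RMSE / mean(observed CC) × 100%, which the section does not state explicitly. Under that assumption, the reported RMSE and rRMSE pairs are mutually consistent and imply a mean observed CC of about 0.63 in WYD and about 0.58 in GF.

```python
import math
from typing import Sequence


def accuracy_metrics(observed: Sequence[float], predicted: Sequence[float]) -> dict:
    """Return R², RMSE and rRMSE (%) for paired observed/predicted CC values.

    Assumes rRMSE = RMSE / mean(observed) * 100, a common definition that the
    source does not spell out.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "rrmse_pct": 100.0 * rmse / mean_obs,
    }


def implied_mean_cc(rmse: float, rrmse_pct: float) -> float:
    """Invert rRMSE = 100 * RMSE / mean to recover the mean observed CC."""
    return rmse / (rrmse_pct / 100.0)


# Reported (RMSE, rRMSE %) pairs: lowest and highest RMSE in each area.
print(round(implied_mean_cc(0.0632, 9.98), 3))    # WYD, lowest RMSE  -> 0.633
print(round(implied_mean_cc(0.0953, 15.05), 3))   # WYD, highest RMSE -> 0.633
print(round(implied_mean_cc(0.0967, 16.73), 3))   # GF, lowest RMSE   -> 0.578
print(round(implied_mean_cc(0.1152, 19.93), 3))   # GF, highest RMSE  -> 0.578
```

Because each area shares a single observed data set, a constant implied mean across models is what this definition predicts, which makes it a quick consistency check on tabulated results.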
Stepwise regression selected six independent variables for the parametric model (MLR) in each experimental area, four of which were shared by both areas: the Difference Vegetation Index (DVI), the near-infrared band (NIR), the Renormalized Difference Vegetation Index (RDVI), and the Soil-Adjusted Vegetation Index (SAVI). Of the three independent variables used by the GAM in each area, only one, NIR, was shared by both areas. The RF model used all 10 variables: the green band (GREEN), NIR, DVI, Ratio Vegetation Index (RVI), Simple Ratio Index (SR), Normalized Difference Vegetation Index (NDVI), RDVI, Perpendicular Vegetation Index (PVI), SAVI, and Modified Soil-Adjusted Vegetation Index (MSAVI) (Table 4). The relative performance of MLR, GAM, and RF was consistent between the two study areas: GAM had the highest modeling accuracy, MLR was second, and RF had the lowest accuracy. Comparing the two study areas, all three models were more accurate in WYD than in GF, because part of the GF-1 image of GF was covered by thin clouds (Figure 1).
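The vegetation indices listed above are usually computed from red and near-infrared reflectance. The sketch below uses common textbook forms; the section does not give the authors' formulas, so the SAVI soil-adjustment factor (L = 0.5), the soil-line slope and intercept used by PVI (placeholders a = 1, b = 0), and the closed-form MSAVI (sometimes called MSAVI2) are assumptions. SR is omitted because the source lists it separately from RVI without defining it, and GREEN and NIR enter the models as raw bands.

```python
import math


def vegetation_indices(red: float, nir: float,
                       savi_l: float = 0.5,
                       soil_a: float = 1.0, soil_b: float = 0.0) -> dict:
    """Common red/NIR vegetation indices for one pixel (reflectance 0-1).

    savi_l, soil_a and soil_b are assumed defaults, not values from the study:
    savi_l is the SAVI soil-adjustment factor; soil_a/soil_b are the slope and
    intercept of a hypothetical soil line NIR = a * red + b used by PVI.
    """
    diff = nir - red
    total = nir + red
    return {
        "DVI": diff,
        "RVI": nir / red,
        "NDVI": diff / total,
        "RDVI": diff / math.sqrt(total),
        "SAVI": (1.0 + savi_l) * diff / (total + savi_l),
        "MSAVI": (2.0 * nir + 1.0
                  - math.sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * diff)) / 2.0,
        "PVI": (nir - soil_a * red - soil_b) / math.sqrt(1.0 + soil_a ** 2),
    }


# Example pixel: dense canopy with low red and high NIR reflectance.
vi = vegetation_indices(red=0.04, nir=0.36)
for name, value in vi.items():
    print(f"{name}: {value:.3f}")
```

In practice the same expressions would be applied element-wise to red and NIR reflectance rasters; the scalar form just keeps the definitions easy to check.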
Figure 2 shows scatter plots of estimated versus measured CC values for the parametric model (MLR), semi-parametric model (GAM), and non-parametric model (RF) in the two test areas. Judging by their position relative to the y = x line, the scatter plots of MLR and GAM were very similar in both test areas, although GAM estimated CC more accurately than MLR. The errors of MLR and GAM were mainly overestimation of low canopy closure values and underestimation of high canopy closure values.
Figure 3 shows the average importance of the RF predictors in the two experimental areas. The variable importance patterns for WYD and GF were similar: RDVI and NDVI were the most important variables, followed by RVI and SR, while NIR and GREEN had the lowest mean decrease in MSE. In both areas, vegetation indices were more important than band reflectance (NIR and GREEN), and the normalized vegetation indices (RDVI and NDVI) were more important than the other vegetation indices, indicating a strong relationship between RDVI and NDVI and the predicted variable (canopy density). Importance was concentrated in a few variables, suggesting that the other predictors acted mainly as noise variables.
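The mean decrease in MSE used to rank RF predictors is a permutation importance: each predictor's values are shuffled and the resulting increase in prediction error is recorded. The sketch below uses scikit-learn's `RandomForestRegressor` and `sklearn.inspection.permutation_importance` on synthetic data. It is an illustration, not the authors' setup: the predictor names, the data-generating relationship, and the hyperparameters are invented, and R's `randomForest` computes its %IncMSE on out-of-bag samples, whereas this sketch permutes a held-out set, so the numbers are comparable in spirit only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300

# Hypothetical predictors: one strong index, one weaker index, one noise band.
ndvi = rng.uniform(0.3, 0.9, n)
rvi = rng.uniform(1.0, 10.0, n)
noise_band = rng.uniform(0.0, 0.5, n)
X = np.column_stack([ndvi, rvi, noise_band])
names = ["NDVI", "RVI", "NOISE"]

# Synthetic canopy closure: strong NDVI effect, weaker RVI effect, no NOISE effect.
cc = 0.6 * ndvi + 0.02 * rvi + rng.normal(0.0, 0.03, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, cc, test_size=0.3, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)

# With a negated-MSE score, importance = MSE after permutation - baseline MSE.
result = permutation_importance(
    rf, X_test, y_test, scoring="neg_mean_squared_error",
    n_repeats=20, random_state=0,
)
importance = dict(zip(names, result.importances_mean))
for name in sorted(importance, key=importance.get, reverse=True):
    print(f"{name}: {importance[name]:.5f}")
```

A predictor that carries no signal, like the noise band here, gets an importance near zero, which matches the interpretation of low-ranked predictors as noise variables.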
Medium-resolution remote sensing images are currently the main source of data for estimating CC, mainly because they can be obtained free of charge and offer long time series [9]. For example, the Landsat series of satellites has provided stable data for 45 years [54]. In this study, MLR, GAM, and RF models were established for two experimental areas from GF-1 images, with RMSE ranging from 0.0632 to 0.1152 and rRMSE from 9.98% to 19.93%. For existing CC estimation models, RMSE ranged from 0.07 to 0.13 and rRMSE was about 20%. The accuracy achieved with GF-1 imagery was therefore comparable to that obtained with other remote sensing imagery and, in some cases, higher than that obtained with, for example, Sentinel-2A MSI and Landsat 8 OLI (Table 5).
The parametric (MLR), semi-parametric (GAM), and non-parametric (RF) models differed considerably in R² in both study areas. GAM fitted relatively well because it can model variables with complex relationships without requiring a prespecified functional form for the relationship between each independent variable and the dependent variable. Smoothing splines adapt the shape of the fitted curve to the data, whether the relationship is linear or non-linear, which makes GAM flexible and versatile [23]. MLR showed the second-best fit. When the relationship between the dependent and independent variables is clearly linear, a linear regression model can represent it directly. In this study there were only 56 samples and the linear relationships between the independent and dependent variables were weak, but using multiple (six) independent variables partly compensated for this weakness.
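The contrast between a linear term and a data-adaptive smooth term can be illustrated with a small additive model. The sketch below is a simplified stand-in for a GAM, not the authors' implementation: each predictor gets a cubic regression spline (truncated-power basis with knots at quantiles), the fit is penalized least squares with a hand-picked ridge penalty on the knot coefficients, and the data are synthetic, with one saturating and one periodic effect chosen for illustration. GAM software would normally select the smoothing penalty automatically.

```python
import numpy as np


def spline_basis(x, knots):
    """Cubic truncated-power basis for one predictor: x, x^2, x^3, (x - k)^3_+."""
    cols = [x, x ** 2, x ** 3] + [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


def design_matrix(X, knot_sets):
    """Intercept column followed by one spline block per predictor (additive)."""
    blocks = [spline_basis(X[:, j], knots) for j, knots in enumerate(knot_sets)]
    return np.column_stack([np.ones(X.shape[0])] + blocks)


def fit_additive(X, y, n_knots=8, lam=1e-3):
    """Penalized least squares for y = b0 + sum_j f_j(x_j).

    `lam` is a fixed, hand-picked ridge penalty on the knot coefficients that
    controls wiggliness (an assumption; GAM software usually selects it).
    """
    knot_sets = [np.quantile(X[:, j], np.linspace(0, 1, n_knots + 2)[1:-1])
                 for j in range(X.shape[1])]
    D = design_matrix(X, knot_sets)
    # Penalize only the (x - k)^3_+ terms, not the intercept, x, x^2, or x^3.
    per_block = np.r_[np.zeros(3), np.ones(n_knots)]
    P = lam * np.diag(np.r_[0.0, np.tile(per_block, X.shape[1])])
    coef = np.linalg.solve(D.T @ D + P, D.T @ y)
    return coef, knot_sets


def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))


rng = np.random.default_rng(1)


def simulate(n):
    """Synthetic CC with a saturating and a periodic component (illustrative)."""
    X = rng.uniform(0.0, 1.0, size=(n, 2))
    y = (0.7 * (1.0 - np.exp(-5.0 * X[:, 0]))
         + 0.1 * np.sin(2.0 * np.pi * X[:, 1])
         + rng.normal(0.0, 0.02, n))
    return X, y


X_train, y_train = simulate(200)
X_test, y_test = simulate(200)

# Additive smooth model evaluated on held-out data.
coef, knots = fit_additive(X_train, y_train)
rmse_add = rmse(y_test, design_matrix(X_test, knots) @ coef)

# Multiple linear regression on the same predictors.
L_train = np.column_stack([np.ones(len(y_train)), X_train])
beta, *_ = np.linalg.lstsq(L_train, y_train, rcond=None)
rmse_lin = rmse(y_test, np.column_stack([np.ones(len(y_test)), X_test]) @ beta)

print(f"additive (spline) RMSE: {rmse_add:.4f}")
print(f"linear (MLR) RMSE:      {rmse_lin:.4f}")
```

When the true effects are close to linear, the penalized spline terms shrink toward the polynomial part and the two fits converge, which is the flexibility the GAM discussion refers to.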
The scatter plots of MLR and GAM had similar distributions because two of the three modeling variables of the GAM were also modeling variables of MLR. In addition, MLR and GAM fit each independent variable with a linear term and a smoothing spline, respectively; both models estimate an intercept, and both combine the terms for multiple variables additively. All three evaluation indicators of RF were worse than those of MLR and GAM because of the small number of modeling samples. Although non-parametric models do not require the data to satisfy theoretical assumptions and are relatively easy to implement, small samples limit the modeling effect and accuracy of RF and weaken its predictive ability to some extent [9]. Even with a sample size poorly suited to RF, its RMSE and rRMSE were still good; RF was therefore reliable, although it was not the best model for CC estimation in this study. Sample size is one of the factors affecting modeling accuracy.
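The model structures compared above can be written in a generic form. This formulation assumes Gaussian errors and an identity link, and the notation is introduced here rather than taken from the study: $p$ is the number of selected predictors (six for MLR and three for GAM in this study), $f_j$ are smooth functions with smoothing parameters $\lambda_j$, $n$ is the number of samples, and $T_b$ is a regression tree grown on the $b$-th of $B$ bootstrap samples.

$$
\begin{aligned}
\text{MLR:}\quad & \mathrm{CC}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i \\
\text{GAM:}\quad & \mathrm{CC}_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij}) + \varepsilon_i, \\
& \min_{\beta_0,\, f_1,\dots,f_p} \; \sum_{i=1}^{n}\Big(\mathrm{CC}_i - \beta_0 - \sum_{j=1}^{p} f_j(x_{ij})\Big)^2 + \sum_{j=1}^{p} \lambda_j \int f_j''(t)^2\,dt \\
\text{RF:}\quad & \widehat{\mathrm{CC}}(\mathbf{x}) = \frac{1}{B}\sum_{b=1}^{B} T_b(\mathbf{x})
\end{aligned}
$$

Setting every $f_j$ to a straight line recovers MLR, which is why the two additive models behave similarly when the smooth terms are close to linear; the RF prediction, by contrast, is an average over trees and has no additive intercept-plus-terms structure.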
The RF importance rankings of the eight vegetation indices showed that the two normalized vegetation indices (RDVI and NDVI) were more important than the other vegetation indices. This is because normalized vegetation indices can reduce the effects of solar elevation angle, satellite viewing angle, topographic variation, cloud/shadow, and atmospheric attenuation, while still reflecting the influence of the canopy background. NDVI was a good predictor of the canopy cover of arid forests in Africa [58]. Martin et al. (2015) reached similar conclusions when using a random forest model to estimate canopy density from vegetation indices extracted from Landsat-8 imagery [39]. SAVI and MSAVI have also been reported to play an important role in predicting canopy density in some studies [9]. However, their contribution was not prominent in our study, mainly because the forests in both experimental areas were relatively dense and thus little affected by the soil background.
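The robustness of normalized indices can be illustrated for one idealized case: if illumination or atmospheric effects scale red and NIR reflectance by the same factor k, the factor cancels in NDVI, while DVI scales by k and RDVI by √k. Real topographic and atmospheric effects are also additive and band-dependent, so this is only a simplified sketch; the reflectance values and the factor k = 0.6 are hypothetical.

```python
import math


def ndvi(red: float, nir: float) -> float:
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def rdvi(red: float, nir: float) -> float:
    """Renormalized Difference Vegetation Index."""
    return (nir - red) / math.sqrt(nir + red)


def dvi(red: float, nir: float) -> float:
    """Difference Vegetation Index."""
    return nir - red


red, nir = 0.05, 0.35  # hypothetical sunlit canopy reflectance
for k in (1.0, 0.6):   # k = 0.6 mimics a uniformly darker, shaded slope
    r, n = k * red, k * nir
    print(f"k={k}: NDVI={ndvi(r, n):.3f}  RDVI={rdvi(r, n):.3f}  DVI={dvi(r, n):.3f}")
# k=1.0: NDVI=0.750  RDVI=0.474  DVI=0.300
# k=0.6: NDVI=0.750  RDVI=0.367  DVI=0.180
```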
The residuals of all three models were within a reasonable range in both experimental areas (Figure 4). However, MLR and GAM both overestimated low canopy density and underestimated high canopy density to different degrees. This is mainly because the sensitivity of vegetation indices changes with vegetation cover: in areas with higher vegetation cover the vegetation index tends to be compressed (saturated), whereas in areas with lower vegetation cover it is exaggerated. Although RF showed no obvious overestimation or underestimation of low and high canopy density, its predictions for these extremes were not ideal. The main reason is that the final RF prediction is the average of the predictions of individual trees, each grown on a bootstrap sample. Because the reference data set contained few samples with very high or very low canopy density, the trees may have represented these extremes poorly, and RF predictions tended toward the mean. Other research has reported RF to be the optimal model when a large sample is available [33]. Overall, GAM performed better than MLR and RF in this study.
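Part of the over- and underestimation pattern described for MLR and GAM is also a generic property of least-squares fitting, which complements the vegetation-index explanation: when R² is well below 1, fitted values are pulled toward the mean, so plots with low observed CC tend to be overestimated and plots with high observed CC underestimated. For an ordinary least-squares fit with an intercept, the residuals are uncorrelated with the fitted values but positively correlated with the observations, with correlation √(1 − R²). The sketch below uses synthetic data with 56 samples (the sample size mentioned in this study) and a noisy linear relationship chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 56
vi = rng.uniform(0.4, 0.9, n)                    # hypothetical vegetation index
cc = 0.2 + 0.6 * vi + rng.normal(0.0, 0.08, n)   # noisy synthetic canopy closure

# Ordinary least squares with an intercept.
design = np.column_stack([np.ones(n), vi])
coef, *_ = np.linalg.lstsq(design, cc, rcond=None)
fitted = design @ coef
resid = cc - fitted  # observed minus predicted

r2 = 1.0 - np.sum(resid ** 2) / np.sum((cc - cc.mean()) ** 2)
corr_resid_obs = np.corrcoef(resid, cc)[0, 1]

# Negative residual = overestimation; positive residual = underestimation.
low = cc < np.quantile(cc, 0.25)
high = cc > np.quantile(cc, 0.75)
print(f"R2 = {r2:.2f}, corr(residual, observed) = {corr_resid_obs:.2f}, "
      f"sqrt(1 - R2) = {np.sqrt(1 - r2):.2f}")
print(f"mean residual, lowest quartile of CC:  {resid[low].mean():+.3f}")
print(f"mean residual, highest quartile of CC: {resid[high].mean():+.3f}")
```

Plotting residuals against observed rather than fitted values therefore shows this tilt even for a correctly specified model, so the pattern alone does not prove a model deficiency.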
Canopy closure is a key forest parameter that plays an important role in forest management, surveys, and planning, and it is widely used in ecology-related fields. A variety of modeling methods are effective for estimating CC. This research focused on estimating CC with a parametric model (MLR), a semi-parametric model (GAM), and a non-parametric model (RF) based on high spatial resolution remote sensing images (GF-1). The main conclusions are as follows. Firstly, all three models built from high spatial resolution GF-1 imagery estimated the CC of plantation forests with satisfactory results. Secondly, the semi-parametric model (GAM) performed better, showed stronger generalization ability, and was more stable than the parametric model (MLR) and the non-parametric model (RF). Thirdly, MLR, GAM, and RF are typical parametric, semi-parametric, and non-parametric models, respectively; however, other models of each type should be tested with high spatial resolution imagery for CC estimation to make the conclusions of this study more general. Fourthly, this study can provide a reference for selecting remote sensing data sources and model forms for canopy closure estimation, as well as theoretical and technical support for transregional applications.