Development of Crown Ratio and Height to Crown Base Models for Masson Pine in Southern China

: Crown ratio (CR) and height to crown base (HCB) are important crown characteristics inﬂuencing the behavior of forest canopy ﬁres. However, the labor-intensive and costly measurement of CR and HCB have hindered their wide application to forest ﬁre management. Here, we use 301 sample trees collected in 11 provinces in China to produce predictive models of CR and HCB for Masson pine forests ( Pinus massoniana Lamb.), which are vulnerable to forest canopy ﬁres. We ﬁrst identiﬁed the best basic model that used only diameter at breast height (DBH) and height (H) as independent variables to predict CR and HCB, respectively, from 11 of the most used potential candidate models. Second, we introduced other covariates into the best basic model of CR and HCB and developed the ﬁnal CR and HCB predictive models after evaluating the model performance of di ﬀ erent combinations of covariates. The results showed that the Richards form of the candidate models performed best in predicting CR and HCB. The ﬁnal CR model included DBH, H, DBH 0.5 and height-to-diameter ratio (HDR), while the ﬁnal HCB model was the best basic model (i.e., it did not contain any other covariates). We hope that our CR and HCB predictive models contribute to the forest crown ﬁre management of Masson pine forests.


Introduction
The tree canopy is an important forest layer where trees interact with their surrounding environment [1,2]. Many key physiological processes related to the growth and development of trees, such as photosynthesis [3,4], respiration [5,6] and transpiration [7,8], occur in the canopy layer. It also plays a vital role in maintaining the soil water of the forest stand [9,10]. Additionally, the tree canopy is the primary fuel stratum involved in independent crown fires [11], and the spatial continuity and density of tree canopies determine the initiation and rate of crown fire spread and severity [12,13]. Therefore, canopy fuel characteristics (e.g., canopy fuel load (CFL), which describes available canopy fuel in the aerial layer [14][15][16]; canopy bulk density (CBD), which indicates the dry weight of the fuel available per canopy volume unity [14][15][16]), have been studied extensively for forest crown fire management [15,17,18]. For instance, CBD has been widely integrated into stand density management diagrams (SDMDs) to guide thinning practices while reducing the potential risk of crown fires [19][20][21]. Many forest growth simulators have incorporated crown characteristics as the explanatory variables in their predictive functions, such as FORMIT-M [22], the Forest Vegetation Simulator (FVS-PROGNOSIS) [23] and the Tree and Stand Simulator (TASS) [24]. Additionally, crown characteristics have also been extensively used as the independent variable for predicting basal collected from 11 provinces in southern China representing the core distribution areas of Masson pine: Anhui, Fujian, Guangdong, Guangxi, Guizhou, Hubei, Hunan, Jiangsu, Jiangxi, Sichuan and Zhejiang ( Figure 1). A total of 301 P. massoniana trees were sampled in this study and were evenly distributed among 10 diameter classes (i.e., 2,4,6,8,12,16,20,26,32 cm and >38 cm) with about 30 sample trees per diameter class. After measuring the diameter at breast height (DBH) and CW of each sample in the field, the trees were felled, and tree height (H) and CL were measured. The CR and HCB of P. massoniana were calculated based on their definitions. All 301 trees were used for model fitting and validation. Distributions of the measured and derived variables for the 301 trees are shown in Figure 2.
Forests 2020, 11, x FOR PEER REVIEW 3 of 18 pine: Anhui, Fujian, Guangdong, Guangxi, Guizhou, Hubei, Hunan, Jiangsu, Jiangxi, Sichuan and Zhejiang ( Figure 1). A total of 301 P. massoniana trees were sampled in this study and were evenly distributed among 10 diameter classes (i.e., 2,4,6,8,12,16,20,26,32 cm and >38 cm) with about 30 sample trees per diameter class. After measuring the diameter at breast height (DBH) and CW of each sample in the field, the trees were felled, and tree height (H) and CL were measured. The CR and HCB of P. massoniana were calculated based on their definitions. All 301 trees were used for model fitting and validation. Distributions of the measured and derived variables for the 301 trees are shown in Figure 2.     Figure 2.

Basic Model Selection
We modeled CR and HCB using logistic, exponential, Richards and Weibull equations [33,37,49,61]. The Richards and Weibull equations are special forms of logistic and exponential equations, respectively [49,61]. We selected six CR models and five HCB models as candidate models (Table 1). Because H and DBH are key predictor variables in modeling forest crown characteristics [37], we used both H and DBH as initial predictor variables in each HCB and CR candidate model. Based on model evaluation and validation, the best basic model was then finally identified for further analysis.

Additional Variable Selection
Stand development stage, site quality, stand density or competition may substantially influence the tree crown [36,48,67]. Hasenauer and Monserud [36] divided the variables that predict CR and HCB into the following three groups: tree size characteristics (SIZE), competition measures (COMP) and site factors (SITE). The βX in candidate models in Table 1 is described as a linear function of the variables of the three groups and is expressed as follows [36,37,61,62]: where β 0 is the intercept of the model, and b, c and d are a set of parameters of the model.
Size-related attributes included DBH and H. In addition, we also considered transformations of the basic size variables, including the power of DBH (DBH 2 and DBH 0.5 ) and the natural logarithm of DBH and H (ln(DBH) and ln(H)). These size-related attributes are important means of describing stand structure, tree vigor and the competitive ability of individual trees, and hence are closely related to CR and HCB [68,69]. For instance, a shorter DBH and a larger tree height, which correspond to a higher stand density, are usually associated with a lower CR (or higher HCB) [37,49].
Competition-related covariates included the height-to-diameter ratio (HDR) and CW. We also considered their transformation, such as the natural logarithm of CW (ln(CW)). HDR has been used extensively to describe tree slenderness, which is significantly associated with the degree of competition within a forest stand [70][71][72]. CW is also a variable that describes competition among neighboring trees and has been used extensively in several studies [73][74][75]. CR generally decreases as HDR or CW increases and vice versa for HCB [62,76].
Site-related covariates were not considered in our study because of a lack of site data. Relative to tree size and competition variables, many authors [36,37] have reported that site variables make insignificant contributions to predictions of CR and HCB.
We selected variables using graphical or visual analysis of the data and examination of the correlation statistics [37,62,63,69,77,78]. Moreover, an analysis of the relationship between the residuals of the original models and the potential regressors (one at a time) was also conducted for variable selection. Different combinations of candidate variables and their transformations were tested in the best base model based on the adjusted determination coefficient (R 2 a ) and root mean square error (RMSE) [33,78]. Additionally, because many [69,79,80] have suggested that introducing many predictors into models not only impedes computational convergence but also results in biased parameter estimates, we also considered over-parameterization before deciding on the final form of our predictive models.

Parameter Estimation and Evaluation
The parameters of all nonlinear regressions were estimated using the ordinary nonlinear least squares technique [33]. The performance of models was evaluated using four statistical indices: R 2 a , RMSE, Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) [81,82]. Their formulas are shown below: where y i is the observed value,ŷ i is the estimated value, y i is the mean observed value, n is the number of observations and p is the number of estimated parameters of the model (including the intercept). A model with a high R 2 a and low values of the other indicators (RMSE, AIC and BIC) was judged to have a good fit. Additionally, the presence of heteroscedasticity was checked by plotting the studentized residuals against the predicted values, and normal quantile-quantile (Q-Q) plots were used to check for normality of the studentized residuals [83]. If heteroscedasticity was detected, dependent variables were square-root or logarithm transformed to eliminate heteroscedasticity [84].

Model Validation
The robustness and predictive performance of the model was further validated using a 10-fold cross-validation procedure [85,86]. The cross-validation evaluation indicators used in this study included the normalized mean square error for the test set (NMSE te ) [87] and the predicted residual error sum of squares (PRESS) [88]. The description of each indicator is as follows: where y i is the observed value,ŷ i is the estimated value andŷ i is the mean observed value. A model with lower NMSEte and PRESS was judged to have better predictive performance.
All analyses were conducted using R version 3.6.1 statistical software (R Foundation for Statistical Computing, Vienna, Austria) [89]. The following R packages were used in this study: "nlme" [90] for model fitting and "ggplot2" [91] for plotting.

Basic Model Selection
The goodness of fit statistics of model evaluation and validation for all candidate CR models are shown in Table 2. Additionally, we also drew the residual and quantile-quantile (Q-Q) plots of all models and found that they were normally distributed without obvious heteroscedasticity ( Figure 3). Furthermore, plots of the predicted values versus the observed values of all models ( Figure S2, in the Supplementary file) indicated that there were no differences among these models. Although the CR4 model performed best in model evaluation and validation, the predictive value of CR4 cannot be constrained to [0, 1], meaning that this model lacks biological relevance. After the CR4 model, the CR2 model showed the highest performance and was therefore selected as the best base model to further develop the final CR model. The best basic model form for CR was the following: In the process of selecting the best basic model of HCB, we found that the residuals of all candidate models showed pronounced heteroscedasticity ( Figure S3, in the Supplementary file). Additionally, the plots of the predicted values versus the observed values for all HCB candidate models revealed a tendency for overestimation ( Figure S4, in the Supplementary file). However, the heteroscedasticity was successfully mitigated (Figure 4) and the tendency of overestimation was ameliorated ( Figure S5, in the Supplementary file) by conducting a square-root transformation of the dependent variable.
The goodness of fit statistics of model evaluation and validation for all candidate HCB models with the square-root transformation are shown in Table 3. Although there were almost no significant differences in the performance of all HCB candidate models, HCB2 and HCB4 were slightly better performing than the other models. The model HCB4 was not considered, given that the range of dependent variables could not be limited to [0, H]. Therefore, we selected the HCB2 model as the best basic model form to construct the final model of HCB. The best basic model for predicting HCB was the following: In the process of selecting the best basic model of HCB, we found that the residuals of all candidate models showed pronounced heteroscedasticity ( Figure S3, in the Supplementary file). Additionally, the plots of the predicted values versus the observed values for all HCB candidate models revealed a tendency for overestimation ( Figure S4, in the Supplementary file). However, the heteroscedasticity was successfully mitigated (Figure 4) and the tendency of overestimation was ameliorated ( Figure S5, in the Supplementary file) by conducting a square-root transformation of the dependent variable.   The goodness of fit statistics of model evaluation and validation for all candidate HCB models with the square-root transformation are shown in Table 3. Although there were almost no significant differences in the performance of all HCB candidate models, HCB2 and HCB4 were slightly better performing than the other models. The model HCB4 was not considered, given that the range of dependent variables could not be limited to [0, H]. Therefore, we selected the HCB2 model as the best Residual and quantile-quantile (Q-Q) plots of HCB candidate models with square-root-transformed data. The solid line in the residual plots is the smoothing spline.

Inclusion of Additional Covariates
The relationships among the dependent variables (CR and HCB), the candidate covariates and their transformation were first investigated by performing a visual analysis ( Figures S6 and S7, in the Supplementary file). We found that all of the candidate covariates and their transformations were negatively correlated with CR ( Figure S6). In contrast, HCB showed a positive relationship with all of the candidate covariates and their transformation, except for HDR, which exhibited a negative correlation ( Figure S7). Furthermore, we also conducted a Pearson correlation analysis to quantitively investigate the relationships among the dependent variables (CR and HCB), the candidate covariates and their transformation. The correlation coefficients are shown in Table 4. We found that the results of the Pearson correlation analysis were consistent with the visual analysis.  To identify additional covariates, we also performed an analysis of the relationship between the residuals of the original basic models and the potential regressors (one at a time) (Figures S8 and S9, in the Supplementary file). We found that almost all of the residuals improved after adding the regressors (one at a time). Similar to Temesgen et al. [62], the inclusion of HDR in the best basic model of CR improved the performance of the model, although the correlation between HDR and CR was not significant in this study (Table 4). Thus, we still considered HDR when developing the final CR model.
Based on the best basic model, different combinations of additional covariates were added and examined. We identified the best combination of additional covariates for the best basic model of CR and HCB (Table 5) based on adjusted determination coefficients and root mean square error. In general, we observed that the performance of the model improved as additional covariates were added (Table 6). For instance, the R 2 a increased by 0.058 from 0.3983 (initial variables: DBH and H) to 0.4563 (two additional covariates: DBH0.5 and HDR). However, the R 2 a only increased by 0.0089 from 0.4563 (two additional covariates: DBH0.5 and HDR) to 0.4652 (three additional covariates: DBH2, DBH0.5, HDR). Consequently, in order to maximize model simplicity and generality, we finally selected CR2_1 as our final CR predictive model, which was written as follows:  The residual plot and quantile-quantile (Q-Q) plot of the final CR model is shown in Figure 5. The residuals were normally distributed with no pronounced heteroscedasticity.  The residual plot and quantile-quantile (Q-Q) plot of the final CR model is shown in Figure 5. The residuals were normally distributed with no pronounced heteroscedasticity. For the best basic HCB model (HCB2), the best combination of additional covariates is shown in Table 5. The models improved slightly when covariates were added (Table 7). However, since CW measurement in these models is labor-intensive as well as time-and money-consuming, we decided to not introduce any additional covariates and selected the best basic HCB model (HCB2) as our final HCB model. Figure 6 shows the comparison between the predicted and observed values of the final CR and HCB models.  For the best basic HCB model (HCB2), the best combination of additional covariates is shown in Table 5. The models improved slightly when covariates were added (Table 7). However, since CW measurement in these models is labor-intensive as well as time-and money-consuming, we decided to not introduce any additional covariates and selected the best basic HCB model (HCB2) as our final HCB model. Figure 6 shows the comparison between the predicted and observed values of the final CR and HCB models.

Discussion
To systematically compare this research with existing models, we constructed a table for other similar models containing model equations, tree species, predictors and lack of fit statistics (i.e., 2 R or RMSE) (Table S1, in the Supplementary file). The equations for predicting CR and HCB are mainly logistic equations, followed by exponential equations (Table S1). The final CR and HCB models of Masson pine in this study were developed based on the best candidate model (basic model): the Richards equation (a special form of the logistic equation). The advantage of this form is that the prediction of CR and HCB can be constrained to be between 0 and 1 as well as between 0 and H, which avoids the generation of predictions that are inconsistent with reality [33,36]. Furthermore, the Richards form has also been found to significantly reduce biases, and the problems of convergence rarely occur [49]. Thus, the Richards form has been used extensively to model crown characteristics, such as CR [61,63], HCB [49,66] and CW [78]. However, the best candidate model (basic model) might change after additional covariates are included. Similar to other CR and HCB modeling studies (Table S1), DBH and H were also the key predictors for predicting CR and HCB in our study. This could be because H and DBH can describe tree structure, tree vigor and competitive ability well [37,69]. Additionally, we also included potential covariates into the best basic model for CR and HCB. Generally, we found that the model performance of CR and HCB improved with the addition of covariates. However, improvements in performance with the addition of covariates were minor for the HCB model; therefore, the best basic model of HCB was selected as our final HCB model.
In contrast, there was a pronounced improvement in the performance of the CR model when other covariates were introduced. In addition to DBH and H, HDR and DBH 0.5 were included in our final CR model. Many authors [36,[62][63][64] have shown that HDR is a robust predictor of CR and some of them [36,64] have argued that HDR can be considered as an indicator of past competition. For instance, Dyer and Burkhart [64] found that HDR made significant contributions to predicting CR and they think this ability to predict CR stems from the fact that HDR largely accounted for the effect of density on CR; that is, HDR can accurately represent local competition status. Therefore, HDR can serve as a good surrogate for local competition in the absence of competition variables.
Generally, a tree with a greater diameter or a smaller tree height is associated with a higher CR (or lower HCB) in a stand [33,49]. The findings of our study are consistent with this conclusion. In general, the canopy of a larger tree that grows better in a stand experiences less competition from surrounding trees and has better light conditions, resulting in higher crown length (or lower HCB) [81]. Additionally, we observed that CR was negatively correlated with tree height while HCB was positively correlated with tree height. This observation suggested that taller trees in a dense stand experience a greater degree of competition and physical interactions from neighboring trees, or poorer light conditions under the canopy of taller trees, which may result in crown recession and a larger HCB (or lower CR). We also observed that CR was negatively correlated with HDR. This same finding has been reported by several other studies [49,61,62,69]. HDR has also been called the

Discussion
To systematically compare this research with existing models, we constructed a table for other similar models containing model equations, tree species, predictors and lack of fit statistics (i.e., R 2 or RMSE) (Table S1, in the Supplementary file). The equations for predicting CR and HCB are mainly logistic equations, followed by exponential equations (Table S1). The final CR and HCB models of Masson pine in this study were developed based on the best candidate model (basic model): the Richards equation (a special form of the logistic equation). The advantage of this form is that the prediction of CR and HCB can be constrained to be between 0 and 1 as well as between 0 and H, which avoids the generation of predictions that are inconsistent with reality [33,36]. Furthermore, the Richards form has also been found to significantly reduce biases, and the problems of convergence rarely occur [49]. Thus, the Richards form has been used extensively to model crown characteristics, such as CR [61,63], HCB [49,66] and CW [78]. However, the best candidate model (basic model) might change after additional covariates are included.
Similar to other CR and HCB modeling studies (Table S1), DBH and H were also the key predictors for predicting CR and HCB in our study. This could be because H and DBH can describe tree structure, tree vigor and competitive ability well [37,69]. Additionally, we also included potential covariates into the best basic model for CR and HCB. Generally, we found that the model performance of CR and HCB improved with the addition of covariates. However, improvements in performance with the addition of covariates were minor for the HCB model; therefore, the best basic model of HCB was selected as our final HCB model.
In contrast, there was a pronounced improvement in the performance of the CR model when other covariates were introduced. In addition to DBH and H, HDR and DBH 0.5 were included in our final CR model. Many authors [36,[62][63][64] have shown that HDR is a robust predictor of CR and some of them [36,64] have argued that HDR can be considered as an indicator of past competition. For instance, Dyer and Burkhart [64] found that HDR made significant contributions to predicting CR and they think this ability to predict CR stems from the fact that HDR largely accounted for the effect of density on CR; that is, HDR can accurately represent local competition status. Therefore, HDR can serve as a good surrogate for local competition in the absence of competition variables.
Generally, a tree with a greater diameter or a smaller tree height is associated with a higher CR (or lower HCB) in a stand [33,49]. The findings of our study are consistent with this conclusion. In general, the canopy of a larger tree that grows better in a stand experiences less competition from surrounding trees and has better light conditions, resulting in higher crown length (or lower HCB) [81]. Additionally, we observed that CR was negatively correlated with tree height while HCB was positively correlated with tree height. This observation suggested that taller trees in a dense stand experience a greater degree of competition and physical interactions from neighboring trees, or poorer light conditions under the canopy of taller trees, which may result in crown recession and a larger HCB (or, lower, C.R.). We also observed that CR was negatively correlated with HDR. This same finding has been reported by several other studies [49,61,62,69]. HDR has also been called the slenderness coefficient, as it is usually associated with a tree's mechanical stability when calculated at the level of the individual tree [92]. Therefore, HDR has been used extensively to characterize stand stability against disturbances, such as wind and snow [20,72,92].
Many authors have established models of CR and HCB for different tree species (Table S1). The coefficient of determination (R 2 ) and RMSE have often been used to measure model performance. For CR models, the values of R 2 and RMSE are mainly in the range of 0.09-0.94 and 0.041-0.168, respectively; in contrast, the range of R 2 and RMSE for HCB models is 0.04-0.91 and 0.239-3.330, respectively. In our study, the value of R 2 a and RMSE in the CR model was 0.46 and 0.128. These values were within the range of previous studies, suggesting that our CR model is applicable for Masson pine forests in China with an appropriate level of reliability. In comparison, the value of RMSE was 0.350 and R 2 a was 0.88 in our HCB model, which is better than most of those documented in previous studies. However, our HCB model only contained DBH and H relative to existing models. The distinguished performance of our HCB model might be explained by the stable (linear) relationship between H and HCB, which also explains why there was little difference among model candidates of HCB. In contrast, the other independent variable (CW) was measured before the trees were felled. Moreover, it was generally more difficult to measure CW, which may have resulted in less accurate measurements.
Competition indices are frequently used as predictors in CR and HCB models (Table S1). For instance, Hasenauer and Monserud [36] included crown competition factor (CCF) in their CR model for Austrian forests. Sharma et al. [69] introduced basal area in larger trees (BAL) to predict HCB. In addition, many [63,69,93] have insisted that including site-related covariates, such as elevation, site index, slope and aspect, could significantly improve model performance; however, some [36,49,62] have disagreed with this suggestion and have argued that these variables only contribute 1% to 5% to the R 2 in CR and HCB models.
Mixed-effect models have been extensively used to develop forestry models, such as the individual tree diameter growth model [77,[94][95][96][97], the tree crown width model [78,80,98,99], the site index model [100][101][102] and the biomass model [103][104][105][106]. These studies have shown that mixed-effect models can significantly improve model performance. For instance, Fu et al. [33] reported that the R 2 increased from 0.26 to 0.64 after introducing both blocks and the interaction of blocks and sample plots as random effects when developing a crown ratio model. Pokharel and Dech [107] included ecological land classification (ELC) ecosites as a random effect to predict basal area growth and found that the model's performance was also significantly improved. They attributed the improvement to the fact that ELC ecosites used substrate characteristics (e.g., soil texture, moisture regime and soil depth) and overstorey tree species composition to delineate forest conditions. Therefore, ELC could accurately represent the specific site environment of each tree.
In our study, detailed site information, such as ELC ecosites, were not available; we only had access to information on the province in which each tree was located. Including province as a random effect in the models did not improve model performance, which might stem from the fact that the delimitation of provinces is not inherently based on the biological, geographical and climatic factors that significantly affect tree growth. In future studies, site conditions and other types of relevant information affecting the environments associated with trees need to be collected when conducting forest inventories.

Conclusions
Here, we developed CR and HCB models for Masson pine in southern China. In practice, forest managers could use our CR and HCB models combined with other forest models (e.g., the basal area increment model and crown width model) to estimate crown surface area, crown volume and crown biomass. Additionally, CR and HCB models can also be used as submodels in forest growth simulators, which are useful tools for prescribing forest management strategies. Most importantly, CR and HCB are critically important for forest fire management, as these models can be used in fire modeling systems that can be used to assess crown fire hazards and provide fuel treatment alternatives for forest managers. However, other canopy fuel characteristics, such as CFL and CBD, were not examined because of a lack of data. We encourage future studies to examine these other canopy fuel characteristics, given that the joint consideration of all possible fuel characteristics would facilitate the development of forest fire management strategies.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4907/11/11/1216/s1. Table S1. Detail information of crown ratio and height to crown base mode in existing literature; Figure S1. Schematic diagram of tree crown dimensions; Figure S2. Plots of predicted values against observed values of CR candidate models; Figure S3. Residual and quantile-quantile (Q-Q) plots of HCB candidate models. The solid lines in residual plots is a smoothing spline; Figure S4. Plots of predicted values against observed values of HCB candidate models; Figure S5. Plots of predicted values against observed values of HCB candidate models with root square-transformed data; Figure S6. Scatter plot of CR and all variables; Figure S7. Scatter plot of HCB and all variables; Figure S8. Residuals of the CR best base model with the potential covariates. The first subplot showed the residual of the CR best base model; Figure S9. Residuals of the HCB best base model with the potential covariates. The first subplot showed the residual of the HCB best base model.