Live Crown Ratio Models for Loblolly Pine (Pinus taeda) with Beta Regression

The growth and production potential of a tree depends on its crown dimensions, as these are closely related to the tree's photosynthetic capacity. However, tree crowns have been studied less than main stems because of their lower market value and because crown dimensions, such as crown volume or surface area, are difficult to measure. Frequently, an individual tree's live crown ratio (LCR) is predicted by linear or nonlinear models that are functions of easy-to-measure dendrometric variables and are fit using ordinary least-squares techniques. Using long-term data from established genetic and spacing trials, we developed and evaluated the predictive performance of three nonlinear models and introduced a new generalized linear model for predicting LCR. The nonlinear models were fit using exponential, Weibull, and Richards functions. The generalized linear model was based on beta regression, which resulted in a slightly smaller error than the other models in predicting the LCR of the loblolly pine trees used in this study. Crown ratio is a proportion, often expressed as a percentage, and should be modeled using generalized linear models that assume a beta distribution for error terms.


Introduction
The growth and production potential of a tree depend on its crown dimensions because these determine the tree's photosynthetic capacity [1][2][3]. Depending on the tree and its site characteristics, the tree crown accounts for a substantial portion of the tree's total biomass and provides critical information for assessing the economic feasibility of crown utilization for bioenergy production. Tree crowns are also relevant in studies of stand growth because of their close correlation with stem diameter and stand density. From a wildlife management perspective, crown dimensions affect the type and amount of understory vegetation available, which correlates with habitat suitability for different wildlife species. However, tree crowns have been studied much less than main stems as a result of their lower market value [1,2]. Additionally, crown dimensions such as crown volume or surface area are difficult to measure directly; therefore, live crown length, height to the base of the live crown, or live crown ratio (LCR) are often used as surrogate predictors in individual tree growth and yield models [3]. The LCR plays a key role in determining responses to thinning and other treatments (e.g., fertilization) in individual tree growth and yield models [4]. Individual tree LCR is defined as the ratio of live crown length to tree height and provides an estimate of potential photosynthetic surface area for a tree compared to its total respiration area [5]. Therefore, LCR is considered an indirect measure of a tree's photosynthetic capacity and serves as an indicator of a tree's competitive status in the stand [6]. A tree with an LCR of 1.0 (or 100%) has the maximum possible leaf surface area and biomass, whereas an LCR value closer to zero indicates that a tree's growth is inhibited because of limited leaf surface area [1]. A tree with a greater LCR can take greater advantage of available light resources than a tree with a smaller LCR [5].
Accordingly, for a given species and age, trees with larger crown sizes have higher growth rates [7]. For example, Patton et al. [8] observed higher annual diameter growth after thinning in planted white spruce (Picea glauca (Moench) Voss) trees with larger crown ratios in Minnesota, USA.
The size of a live crown, as well as its distribution along the bole, affects the shape of a tree's bole [9] and biomechanics. This also affects the quality of logs that can be retrieved from a tree. Therefore, crown dimensions have also been used as predictors of stem taper in addition to diameter and height (e.g., [10,11]). Tree crowns can also be critical in fuel load assessments and in the development of fire management strategies [12]. For example, the height to base of a live crown, a variable algebraically related to LCR, is a critical factor in modeling fire behavior, as it relates to the rate of spread of a crown fire. Additionally, smaller heights to the base of live crowns (i.e., larger crown ratios) are associated with an increased risk of crown fires developing from surface fires [13].
The LCR is indicative of tree vigor, and an LCR of 0.33 or higher is considered optimal for the growth of southern pines [14]. Crown ratio itself is influenced by many tree and site factors such as species, age, and stand density. Younger trees have a high LCR, with almost 100% of the stem eligible to be considered as live crown; when the stand is at its oldest and growth is slowing, the LCR reaches its minimum [14]. Increasing stand density causes lower branches to die and reduces LCR [14][15][16]. Similarly, increasing management intensity, such as bedding, chemical site preparation, pest control treatments, and micronutrient applications, tends to decrease the LCR [14].
The LCR follows an inverse sigmoidal curve: trees have almost 100% LCR in their early years, LCR declines rapidly as stands enter a period of height growth, and it then gradually levels off at a minimum value as stands grow older [14]. This property has been modeled using several nonlinear functions, such as the exponential, logistic, Richards, and Weibull functions. The objective of this study was to develop an LCR model based on beta regression using tree- and stand-level variables and to compare its performance with LCR models based on commonly used functions for predicting individual tree LCR of loblolly pine (Pinus taeda L.) trees.

Data
Data used in this study were obtained from a long-term genetic and spacing study established to evaluate the effects of family and spacing on the growth and yield of loblolly pine. Seedlings from eight open-pollinated families (families NC1-NC8) of loblolly pine in North Carolina, and one open-pollinated "local commercial check" (LCK) from east-central Mississippi and west-central Alabama, were planted between 22 April and 7 May 1985, at two sites on the John Starr Memorial Forest in Winston County, MS. Families NC1 and NC8 are fast growing with small crowns, NC4 and NC7 are fast growing with large crowns, NC3 and NC6 are slow growing with small crowns, and NC2 and NC5 are characterized by slow growth and large crowns. The eight NC families were selected based on 12-year-old progeny tests to represent the range of growth rates and crown dimensions [17]. The soil was an acidic, silty clay loam with a fragipan, and was somewhat poorly drained [17]. The soil series was the same at both sites; however, one site was previously forested ground that was cleared and site-prepared, while the other site was an agricultural field [17].
The experimental design was a randomized complete block design with four blocks at each site. The family treatments were arranged in split-split plots, with each replication split into three spacing levels (1.  local check (LCK) family, respectively. Summary statistics of individual tree dbh, height, and crown ratio at the family level are presented in Table 1.

Methods and Models
Several model forms, linear and nonlinear, are used to model crown ratios. The most commonly used functions include the exponential, logistic, Richards, and Weibull. One important consideration in modeling ratios and percentages such as LCR is to select a model form that ensures predictions are biologically meaningful. Since the LCR is bounded between 0 and 1.0 (0 to 100%), the selected models should provide predictions that are confined within this range. Depending on model formulations, many functions allow predictions to fall outside the (0, 1) boundary. In this study, we used three common nonlinear functions (i.e., exponential, Richards, and Weibull) and one generalized linear model based on beta regression (Table 2). Although initially considered, the logistic function was dropped because, compared to other functions, it performed poorly for ages 9 and 21 and was similar to other nonlinear functions for ages 13 and 17. The exponential function was selected from the model used in the distance-dependent individual tree loblolly pine growth and yield model PTAEDA 4.0. Therefore, we refer to this model as PTAEDA 4.0 in this study. All statistical analyses were carried out using the statistical software R 4.1.1 [18].

Table 2. Models and equations for predicting live crown ratio of loblolly pine trees used in this study.

Model
Prediction Equation

Note: LCR = predicted live crown ratio; D = dbh (cm); H = total tree height (m); A = age (years). The sixth root in the denominator of the Richards function and the tenth power in the Weibull function were used based on the findings of Soares and Tomé [19] to overcome model convergence problems.

Beta Regression
Beta regression was introduced by [20] and is useful for modeling variables that are bounded in the (0, 1) interval. It assumes that the dependent variable is beta-distributed and that its mean is related to a set of predictors through a linear predictor and a link function. The model naturally accommodates features such as heteroscedasticity and skewness, which are commonly observed in data confined to the (0, 1) interval, such as rates or proportions [21]. This approach has been used to model percentage of canopy cover in conifer-dominated stands [22], percentage of shrub cover in riparian areas [23], proportion of biomass components [24], and fire severity in forested landscapes [25].
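The way a beta-distributed response accommodates heteroscedasticity can be illustrated with the mean and precision form of the beta distribution, which is the form used by the betareg package. The sketch below is in Python rather than R and is only an illustration: the precision value `phi = 30.0` and the function names are assumptions made here, not quantities from this study.

```python
import math


def beta_shapes(mu, phi):
    """Convert a mean mu in (0, 1) and precision phi > 0 to the two shape parameters."""
    if not 0.0 < mu < 1.0 or phi <= 0.0:
        raise ValueError("need 0 < mu < 1 and phi > 0")
    return mu * phi, (1.0 - mu) * phi


def beta_variance(mu, phi):
    """Variance implied by the mean-precision form: mu * (1 - mu) / (1 + phi)."""
    beta_shapes(mu, phi)  # reuse the argument checks
    return mu * (1.0 - mu) / (1.0 + phi)


def beta_logpdf(y, mu, phi):
    """Log density at y in (0, 1) for a beta variable with mean mu and precision phi."""
    p, q = beta_shapes(mu, phi)
    log_norm = math.lgamma(phi) - math.lgamma(p) - math.lgamma(q)
    return log_norm + (p - 1.0) * math.log(y) + (q - 1.0) * math.log(1.0 - y)


# Hypothetical precision: the variance is largest near mu = 0.5 and shrinks
# as the mean approaches either boundary, so the spread depends on the mean.
phi = 30.0
variances = {mu: beta_variance(mu, phi) for mu in (0.1, 0.3, 0.5, 0.9)}
```

Because the variance is a function of the mean, no separate variance model or transformation is needed to allow the spread of LCR to change along the fitted curve.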
Complete statistical details about the beta regression formulation and estimation can be found in [20]. Briefly, with the parameterization of the beta distribution suggested by [20], the beta regression model can be formulated as:

$$g(\mu_i) = x_i^{T}\beta = \eta_i$$

where $\mu_i$ is the mean of the response for observation $i$, $g(\cdot)$ is a strictly increasing and twice-differentiable link function that maps (0, 1) onto the real line $\mathbb{R}$, $x_i = (x_{i1}, \ldots, x_{ik})^{T}$ is a vector of $k$ explanatory variables, $\beta = (\beta_1, \ldots, \beta_k)^{T}$ is a vector of $k$ unknown regression parameters ($k < n$), and $\eta_i$ is the linear predictor (i.e., $\eta_i = \beta_1 x_{i1} + \ldots + \beta_k x_{ik}$). Usually, $x_{i1} = 1$ for all $i$ so that the model has an intercept [20]. Beta regression was implemented in R 4.1.1 [18] with the betareg function in the betareg package [21]. We used the logit link function $g(\mu) = \log\{\mu/(1-\mu)\}$; therefore, the predicted LCR was obtained as

$$\widehat{LCR}_i = \frac{\exp(\eta_i)}{1 + \exp(\eta_i)}$$
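As a minimal sketch of how a logit link turns a linear predictor into a predicted LCR, the Python snippet below applies the inverse logit to an intercept plus weighted predictors. The coefficient values and the choice of two predictors are hypothetical placeholders chosen for illustration; they are not the estimates fitted in this study.

```python
import math


def logit(mu):
    """Logit link: maps a mean in (0, 1) to the real line."""
    return math.log(mu / (1.0 - mu))


def inv_logit(eta):
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def predict_lcr(coefs, predictors):
    """Predicted LCR from coefficients (intercept first) and predictor values."""
    if len(coefs) != len(predictors) + 1:
        raise ValueError("expected one intercept plus one coefficient per predictor")
    eta = coefs[0] + sum(b * x for b, x in zip(coefs[1:], predictors))
    return inv_logit(eta)


# Hypothetical coefficients and predictor values, for illustration only.
example_coefs = [1.2, -0.05, 0.03]
example_tree = [21.0, 18.5]
lcr_hat = predict_lcr(example_coefs, example_tree)
```

Whatever values the linear predictor takes, the inverse logit returns a value between 0 and 1, which is why predictions from this model form cannot leave the admissible range of LCR.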

Model Evaluation and Additional Predictors
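The selected models were compared with one another using mean absolute error (MAE), mean absolute percentage error (MAPE), root mean square error (RMSE), and RMSE percent, computed as follows:

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$

$$MAPE = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{n}} \tag{8}$$

$$RMSE\% = \frac{RMSE}{\bar{y}} \times 100 \tag{9}$$

where $y_i$, $\hat{y}_i$, and $\bar{y}$ are the observed, predicted, and mean live crown ratio, respectively, and $n$ is the number of trees in the dataset.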
Zhao et al. [14] reported that live crown ratio was affected by stand density and management intensity. Therefore, we first identified the best model using only tree-level variables (i.e., dbh, height, and their combinations as identified). Then, stand-level variables, namely trees per hectare (TPH) and the basal area per hectare (BAL, m²/ha) of trees larger than the subject tree in the given plot, were added to the model. The contribution of stand-level variables was assessed by comparing the errors produced by the enhanced models with the errors obtained from the base model.
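The four evaluation statistics named in this section can be computed with a few lines of code. The Python sketch below uses the usual textbook definitions of MAE, MAPE, RMSE, and RMSE percent; the observed and predicted values are made-up numbers for illustration, not data from this study.

```python
import math


def error_metrics(observed, predicted):
    """Return MAE, MAPE (%), RMSE, and RMSE percent for paired values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")
    abs_err = [abs(y - y_hat) for y, y_hat in zip(observed, predicted)]
    mae = sum(abs_err) / n
    # MAPE divides each absolute error by its observed value (must be nonzero).
    mape = 100.0 * sum(e / y for e, y in zip(abs_err, observed)) / n
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    # RMSE percent expresses RMSE relative to the mean observed value.
    rmse_pct = 100.0 * rmse / (sum(observed) / n)
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "RMSE%": rmse_pct}


# Hypothetical observed and predicted live crown ratios.
observed_lcr = [0.50, 0.40, 0.25, 0.60]
predicted_lcr = [0.45, 0.42, 0.30, 0.55]
metrics = error_metrics(observed_lcr, predicted_lcr)
```

The same function can be applied to the base model and to each enhanced model, so that the contribution of a stand-level variable shows up as a change in these statistics.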

Results and Discussion
There were no obvious trends in the LCR and dbh relationship (Figure 1) for any family of loblolly pine trees used in this study. However, the LCR slightly increased with increasing tree heights up to a certain point (about 12 m), and then declined and remained constant (Figure 2) for all families. Although dbh and height are commonly used predictors, their regression relationship with LCR without any transformation or combination is not that obvious.

The LCR decreased with increasing age for all levels of initial plantation spacing (Figure 3). This is consistent with the findings of previous studies and is expected because, during the early stage of growth, trees retain almost all of their branches, and LCR starts to decline rapidly as they begin to grow in height [14]. At any given age, the LCR was higher for wider spacing. This is also expected because the wider the spacing, the longer the crown length [26].

When these results were disaggregated, the trend was consistent across all families of loblolly pine used in this study. The average LCR at age 9 ranged from 0.45 for the narrow (1.5 × 1.5 m) spacing to 0.66 for the wide (3.0 × 3.0 m) spacing, whereas the average LCR at age 21 was 0.26 for the narrow spacing and 0.34 for the wide spacing. The age 9 average LCR for the medium (2.4 × 2.4 m) spacing was 0.58, which was reduced to 0.30 by age 21. It should be noted that the percent decline in LCR from ages 9 to 21 was almost identical for the 2.4 × 2.4 m and 3.0 × 3.0 m spacings (about 48% of the age 9 value in both cases, based on the averages above), i.e., both spacings experienced the same proportional reduction from the initial live crown ratio.

The coefficient of determination ($R^2$) for the fitted models ranged from 0.65 for the exponential PTAEDA 4.0 model to 0.69 for the beta regression model (Table 3). Note that the number of parameters estimated differs among these models: the PTAEDA 4.0 model requires the estimation of only two parameters, the Weibull model has three parameters, and the Richards and beta models have four parameters. However, since the number of observations used in this study is large (n = 20,330), the difference between $R^2$ and adjusted $R^2$ is negligible (less than 0.0001). Furthermore, the number of independent (predictor) variables required by all models is the same, i.e., all models require age, dbh, and total height to predict LCR. All models had similar residual plots, with residuals forming a band around zero without any trend. The predicted values of LCR obtained from all models are plotted against measured (observed) values of loblolly pine LCR in Figure 4.

Table 3. Parameter estimates, standard errors, and the coefficient of determination ($R^2$) for live crown ratio models fitted within the study. Prediction equations are provided in Table 2.

Model
Parameter Estimate * (Standard Error)

Although the error statistics produced by all models were very similar, beta regression produced the smallest error statistics of all models (Table 4). Mean absolute error (MAE) for the beta regression model was 0.05, whereas all other models had an MAE of 0.06; that is, the predicted LCR was, on average, within 0.05 (5 percentage points) and 0.06 (6 percentage points) of the observed LCR for the beta and other models, respectively. The mean absolute percentage error (MAPE) ranged from 15.45% for the beta model to 16.84% for the PTAEDA 4.0 model. The RMSE was 0.07 for the beta model and 0.08 for all other models. The RMSE percent was 18.79%, 18.81%, 18.44%, and 17.58% for the PTAEDA 4.0, Weibull, Richards, and beta models, respectively. Therefore, our final model for predicting the LCR of loblolly pine trees is in the form of Equation (10). However, we would like to note that the predictions from the different models were not significantly different from each other (p-value = 0.753).
The RMSE percent and MAPE values for all models generally increased with increasing age (Table 5). Except for age 17, the model based on beta regression produced comparatively smaller RMSEs than the other models. For age 17, the PTAEDA 4.0 model had the highest RMSE percent (19.75%), and all other models had almost identical RMSE percentages: 18.32%, 18.26%, and 18.41% for the Weibull, Richards, and beta models, respectively. Model performance was also almost identical in terms of MAPE (Table 5).

The LCR is bounded between 0 and 1. Hence, the most appropriate model for this type of data should assume an error distribution that is also within this interval. An alternative is to fit models for predicting height to the base of the live crown (HCB) and then compute the LCR from HCB and total tree height. However, the relationship of HCB with dbh and total height may not be linear. For example, [13] found that HCB models without transformation tended to overestimate HCB and produced heteroscedastic errors. They corrected the overestimation and heteroscedasticity by using a square root transformation of the dependent variable. However, any transformation of the dependent variable can result in erroneous predictions caused by back-transformation bias.
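The two ideas in the preceding paragraph, computing LCR from HCB and total height and the bias introduced by back-transforming a square root, can be sketched in a few lines. The Python example below uses the definition of LCR as live crown length divided by total height, and a small set of hypothetical HCB values to show that squaring the mean of the square roots falls short of the mean on the original scale by exactly the variance of the square roots.

```python
import math
from statistics import mean, pvariance


def lcr_from_hcb(total_height, hcb):
    """Live crown ratio from total height and height to the base of the live crown."""
    if total_height <= 0.0 or not 0.0 <= hcb <= total_height:
        raise ValueError("need total_height > 0 and 0 <= hcb <= total_height")
    return (total_height - hcb) / total_height


# Hypothetical HCB values (m) chosen so the square roots are simple numbers.
hcb_values = [4.0, 6.25, 9.0, 12.25]
sqrt_hcb = [math.sqrt(v) for v in hcb_values]

# Naive back-transformation: square the mean obtained on the square root scale.
naive_mean = mean(sqrt_hcb) ** 2
true_mean = mean(hcb_values)

# The shortfall equals the variance of the square-root values, since
# E[Y] = (E[sqrt(Y)])^2 + Var(sqrt(Y)).
bias = true_mean - naive_mean
sqrt_variance = pvariance(sqrt_hcb)

# Example: a 20 m tree with its crown base at 12 m.
example_lcr = lcr_from_hcb(20.0, 12.0)
```

In a fitted regression the same mechanism applies to the conditional mean, which is why a model fitted directly to LCR on its original scale avoids this correction step.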
As a tree crown is affected by competition, using stand-level variables such as stand density can further improve the predictive ability of individual tree LCR models [27]. Therefore, the basic beta regression model in Table 2 was enhanced by incorporating stand-level variables computed for each sampled plot. The additional stand-level variables used in the current study were basal area per hectare of trees larger than the subject tree (BAL) and trees per hectare (TPH). Adding BAL to the base model did not improve model performance at any age (Table 6). Adding TPH or both BAL and TPH to the base model reduced RMSE percent and MAPE at ages 9 and 13, but not at ages 17 and 21 (Table 6). Liu and Burkhart [28] found a statistically significant correlation between crown height (which is algebraically related to LCR) and initial planting density for loblolly pine trees through age 8, which is in contrast to the results obtained in the current study. This could be because the minimum age of the trees used in this study was 9 years and, potentially, the effect of the stand-level variables on crown dimensions was less evident during the later years of tree growth.

Chumra et al. [29] found statistically significant differences in crown lengths between two families of young loblolly pine trees growing on the same site in the U.S. Western Gulf. In the current study, a likelihood ratio test of the full versus reduced model showed statistically significant differences in crown ratio among families. Parameter estimates for the family-specific beta regression model are given in Table 7. Family NC3, which is characterized by its slow growth and small crown, was, however, not significantly different from the local commercial check. It should be noted that crown length is measured in absolute units (e.g., m or ft), whereas LCR is a relative measure. Trees with the same LCR can have different crown lengths and vice versa, depending on total tree height. Chumra et al. [29] also studied the differences in crown shape ratio, estimated as the ratio of maximum measured crown diameter to crown length. They found this ratio to be more affected by environmental variables such as site conditions than by genetic factors such as families.

Table 7. Parameter estimates and their standard errors (in parentheses) for family-specific beta regression models for predicting live crown ratio of loblolly pine.

The final equation for predicting individual tree LCR by family takes the same logit-link form as the base beta regression model, where $b_0$ to $b_4$ are the family-specific coefficients in Table 7 and all other variables are as described previously.
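A likelihood ratio test of a full (family-specific) model against a reduced (pooled) model compares twice the difference in maximized log-likelihoods with a chi-square distribution whose degrees of freedom equal the number of extra parameters. The Python sketch below shows the calculation under stated assumptions: the log-likelihood values and the number of extra parameters are hypothetical, and the chi-square tail probability uses a closed form that is valid only for an even number of degrees of freedom, a simplification adopted here to stay within the standard library.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail probability of a chi-square variable with an even number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Return the LR statistic and its p-value for nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf_even_df(stat, extra_params)


# Hypothetical log-likelihoods and parameter count, for illustration only.
lr_stat, p_value = likelihood_ratio_test(
    loglik_reduced=100.0, loglik_full=112.0, extra_params=8
)
```

A small p-value indicates that allowing the coefficients to vary by family improves the fit more than would be expected by chance, which is the sense in which crown ratio is reported to differ among families.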

Conclusions
Using data from a genetic and spacing study, we developed and evaluated the predictive performance of three nonlinear models and a generalized linear model for predicting the LCR of loblolly pine trees. The generalized linear model based on beta regression explained the most variability in LCR (i.e., had the highest $R^2$) and generally produced a smaller root mean square error than the other models. The beta regression model produced slightly smaller error statistics than the nonlinear models fit using the ordinary least-squares method, including the crown ratio model used in a widely used individual tree growth and yield model (PTAEDA 4.0) for loblolly pine in the southern United States. Furthermore, the assumption about the error distribution in beta regression makes it more appropriate for fitting LCR models. Crown ratio differed significantly among families, but family NC3, which was characterized by its slow growth and small crown, was not significantly different from the local commercial check. The basal area of larger trees did not reduce error, but the addition of trees per hectare did slightly improve model performance. Crown ratio is a proportion, often expressed as a percentage, and should be modeled using generalized linear models that assume a beta distribution for error terms.