A Crown Contour Envelope Model of Chinese Fir Based on Random Forest and Mathematical Modeling

Abstract: The tree crown is an important part of a tree and is closely related to forest growth status, forest canopy density, and other forest growth indicators. Chinese fir (Cunninghamia lanceolata (Lamb.) Hook) is an important tree species in southern China. A three-dimensional (3D) visualization-assisted decision-making system for plantations could be improved by constructing crown contour envelope models (CCEMs), which could aid plantation production. The goal of this study was to establish CCEMs based on random forest and on mathematical modeling, and to compare them. First, the regression equations of the tree crown were fitted using the least squares method. Then, forest characteristic factors were screened using methods based on mutual information, recursive feature elimination, the least absolute shrinkage and selection operator, and random forest, and random forest models were established from the different screening results. The accuracy of the random forest model was higher than that of the mathematical modeling. The best-performing model based on mathematical modeling was the quartic polynomial with the largest crown radius as a variable (R² = 0.8614, root mean square error (RMSE) = 0.2657). Among the random forest regression models, the model built with mutual information as the feature screening method was the most accurate (R² = 0.886, RMSE = 0.2406); its R² was about two percentage points higher than that of the mathematical modeling. Compared with mathematical modeling, the random forest model can reflect the differences among trees and aid 3D visualization of a Chinese fir plantation.


Introduction
The tree crown is an important part of a tree that reflects the growth status of individual trees and the degree to which trees adapt to and vary with different growth environments [1,2]. Significant physiological processes such as photosynthesis, respiration, and transpiration take place in the tree crown. The ecological environment in the tree crown is also an important component of the forest ecosystem. The shape of the tree crown and the distribution of its leaves affect the interception of rainfall and the utilization of solar energy [3][4][5][6]. The tree crown structure affects the growth of trees and also the dynamic changes of forest stands. Therefore, the study of the shape of tree crowns is of great significance. Visualization of the tree crown provides the basis for dynamic visual simulation of forest stands and is a major research topic in forestry informatization. It is also important for a three-dimensional (3D) visualization-assisted decision-making system for plantations, because it permits direct observation of the growth status of plantations [...]
[...] The random forest model is more robust to outliers and noise, is faster than boosting algorithms, and is less prone to overfitting. This model has been applied to fire prediction, forest growth, and harvest prediction [28].
Fujian Province is located in southeastern China and has the highest forest coverage rate (66.8%) in China. Chinese fir is one of the most important plantation tree species in Fujian Province and accounts for 21.35% of the total plantation area in China. To date, few studies have been conducted on the crown contour envelope of Chinese fir plantations. Most CCEMs based on mathematical modeling have used only crown depth as an independent variable to predict crown shape, and few studies have added variables such as tree height and diameter at breast height as covariates to improve model accuracy. In addition, machine learning has not been used to predict Chinese fir crown shape. Thus, the goals of the present study were the following: (1) to collect classical CCEMs suitable for Chinese fir and fit them to Chinese fir crown data from Fujian Province; (2) to use different feature selection methods to screen the tree factors that affect the crown shape of Chinese fir, construct random forest regression models, and tune their hyperparameters; and (3) to evaluate and compare the CCEMs constructed by mathematical modeling and by the random forest regression model.

Study Area and Data Collection
The study area is located in Fujian Province of China (between 23.5° and 28.3° N and between 115.6° and 120.5° E), which has a humid subtropical monsoon climate. Fujian is the core production area for Chinese fir and has the highest level of cultivation worldwide. The data of this study were collected from the Dali forest farm, the Lanxia forest farm, and the Jiangle state-owned forest farm in Jiangle County and Shunchang County, Fujian Province. The terrain is primarily hilly and mountainous, with forest coverage greater than 90%. The tree species are mainly Chinese fir and Eucalyptus (Eucalyptus robusta Smith). Shunchang County is the central area of Chinese fir production in Fujian Province and is known as the "hometown of Chinese fir".
In this study, temporary sample plots were set up in Chinese fir plantations covering different age groups and different stand densities. At the Dali forest farm, Lanxia forest farm, and Jiangle forest farm, 65, 23, and 3 standard plots, respectively, each with a size of 30 m², were set up. Three to five trees were selected in each standard plot, and a total of 423 trees were studied. Measurements taken on trees included diameter at breast height (DBH, cm), total tree height (HT, m), crown length (CL), height under branch (HBLC), largest crown radius (LCR), and crown radius (CR) at 1/10 CL, 1/4 CL, 1/2 CL, 3/4 CL, and 9/10 CL from crown top to crown bottom (Figure 1). The crown radius was measured in the north-south and east-west directions, and the mean value was used as the variable to establish the CCEMs.
The tree factors used to establish CCEMs were selected according to the factors affecting the growth of trees. The crown contour envelope is mainly affected by AGE, N, HT, DBH, CL, HBLC, and LCR. In addition, this paper defines the following composite tree factors: the tree crown length ratio (CH), the HT to DBH ratio (HD = HT/DBH), and the tree crown diagonal coefficient (CLC = CL/LCR). To facilitate description of the crown contour envelope, we defined the perpendicular distance from any crown position to the horizontal plane of the crown top as DINCT, the perpendicular distance from any crown position to the horizontal plane of the crown bottom as DINCB, the ratio of DINCT to CL as RDINCT, and the ratio of DINCB to CL as RDINCB. All factors related to the crown and their descriptions are shown in Table 1.
To construct the CCEMs, the data were divided into a training set and a test set according to the following two principles: First, to ensure the integrity of the tree crown data, the crown data of the same tree should be assigned to the same dataset.
Second, as there are major differences in the tree crown shapes of Chinese fir of different ages, the training set and the test set must contain Chinese fir of the same age, and the proportion of Chinese fir of different ages in the training set and test set should be similar.
According to the two principles above, the data were randomly partitioned 10 times, with 70% assigned to the training set each time. The coefficients of variation of the important factors (AGE, CR, DBH, HT, and N) were calculated for each partition, and the partition whose training set and test set had the most similar coefficients of variation was selected for model construction. Basic information on the dataset is shown in Table 2.
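The partition procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the tree records, the field names (`id`, `age`, `dbh`, `ht`), and the helper functions are hypothetical, and only three factors are compared instead of the five used in the study. Each record stands for one whole tree, so all crown measurements of a tree would stay in the same set.

```python
import random
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)

def split_by_tree(trees, train_fraction=0.7, seed=0):
    """Assign whole trees to train/test, keeping age proportions similar."""
    rng = random.Random(seed)
    by_age = {}
    for tree in trees:
        by_age.setdefault(tree["age"], []).append(tree)
    train, test = [], []
    for age in sorted(by_age):
        group = by_age[age][:]
        rng.shuffle(group)
        cut = round(train_fraction * len(group))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

def cv_gap(train, test, factors=("age", "dbh", "ht")):
    """Sum of absolute differences in coefficient of variation between sets."""
    return sum(
        abs(coefficient_of_variation([t[f] for t in train])
            - coefficient_of_variation([t[f] for t in test]))
        for f in factors
    )

# Hypothetical trees: 10 per age class, with made-up DBH and height values.
data_rng = random.Random(42)
trees = [
    {"id": i, "age": age,
     "dbh": age * 0.8 + data_rng.uniform(0, 4),
     "ht": age * 0.6 + data_rng.uniform(0, 3)}
    for i, age in enumerate(a for a in (5, 10, 15, 20, 25) for _ in range(10))
]

# Repeat the random partition 10 times and keep the most balanced one.
candidates = [split_by_tree(trees, seed=s) for s in range(10)]
train_set, test_set = min(candidates, key=lambda pair: cv_gap(*pair))
print(len(train_set), len(test_set))
```

Splitting within each age class is one simple way to keep the age proportions of the two sets similar; other stratification schemes would serve the same purpose.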

Mathematical Modeling of Chinese Fir Crown Contour Envelope Model (CCEM)
In this study, we defined the geometric cross-section formed by any plane through the trunk and tree crown as the max crown profile. In this cross-section, we constructed a plane rectangular coordinate system with the crown top as the origin, the direction of the trunk as the X-axis, and the direction perpendicular to the trunk as the Y-axis. The red curve shown in Figure 2 is the crown contour envelope, and the CCEM was constructed with CR as the dependent variable; RDINC as the independent variable; and HT, DBH, HBLC, CL, and DINC as covariates. The crown contour can be rotated 360 degrees around the X-axis to obtain the entire tree crown; therefore, the crown shape can be studied through the CCEM.
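To illustrate how a 2D crown contour yields a 3D crown by rotation around the trunk axis, the following minimal sketch generates surface points and a crown volume from a profile function. The conical profile, its dimensions, and the function names are hypothetical and are not one of the models fitted in this study.

```python
import math

def crown_surface_points(profile, crown_length, n_heights=20, n_angles=36):
    """Rotate a 2D profile CR = profile(x) around the trunk (X) axis."""
    points = []
    for i in range(n_heights + 1):
        x = crown_length * i / n_heights      # distance below the crown top
        radius = profile(x)
        for j in range(n_angles):
            theta = 2 * math.pi * j / n_angles
            points.append((x, radius * math.cos(theta), radius * math.sin(theta)))
    return points

def crown_volume(profile, crown_length, n_slices=1000):
    """Volume of the solid of revolution: pi * integral of CR(x)^2 dx (midpoint rule)."""
    dx = crown_length / n_slices
    return math.pi * sum(profile((k + 0.5) * dx) ** 2 for k in range(n_slices)) * dx

# Hypothetical conical crown: radius grows linearly to 1.5 m over a 4 m crown.
CL, LCR = 4.0, 1.5

def cone(x):
    return LCR * x / CL

points = crown_surface_points(cone, CL)
volume = crown_volume(cone, CL)
print(len(points), volume)
```

Any fitted CCEM could be passed in place of `cone`, since the rotation step only needs a function that returns the crown radius at a given distance from the crown top.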
In early research, Gill and other scholars described the crown shape of different tree species at different growth stages using simple geometric solids, such as cylinders, cones, and paraboloids [12][13][14][15]. These solids are obtained by rotating a power function (a straight line, a parabola, etc.) around the tree trunk. The tree crown contour envelope based on this idea is expressed in the form of Model (1). To describe the tree crown shape more flexibly and to reflect the differences among tree species at different growth stages, McPherson modified Model (1) slightly to obtain Model (2).
Baldwin constructed a CCEM to predict the crown radius at any position of Pinus taeda L. using RDINCB as the variable [29][30][31]. The model described the vertical distribution of crown radius better. Crecente-Campo added the LCR to Model (3), which was used to predict the tree crown shape of Pinus radiata. Chmura [32,33] combined Baldwin's model form and McPherson's [34] reparameterization method to obtain Model (4) to predict the tree crown shape.
Crecente-Campo et al. used RDINCB as the variable and LCR as a constraint condition to construct CCEMs successively with a quadratic polynomial, a cubic polynomial, and a quartic polynomial to simulate the crown shape of radiata pine (Pinus radiata D. Don). Chen Dong [35] applied Models (5) to (8) to Chinese fir and achieved good results.
The variable-exponent model can predict the crown radius at different positions of the crown because its exponent changes with the relative position within the crown. Hann [14,34] and Yanrong Guo [19] described the tree crown contour using Models (9) and (10), respectively.
Kozak proposed and revised the trunk (taper) equation, a variable-exponent equation. Because of the high degree of similarity between the trunk equation and the CCEM, Maguire, Garber, Weiskittel, Huilin Gao, and others modified and reparameterized the Kozak equation by adding the tree factors DBH, CH, and HD [36,37]. This equation is more consistent with the characteristics of the tree crown contour. The basic form of the modified model is shown in Model (11).
Wang Chengde added DBH to Model (1) and HT and N to Model (9) to predict the crown contour of Chinese fir and eucalyptus. The modified models are shown as Models (12)-(14) [38].
The least squares method is widely used in model parameter fitting because it finds the best-matching function by minimizing the sum of squared errors. The above models are all multivariate nonlinear models. Among them, Model (1) and Model (2) can be transformed into simple linear regression models, and Models (3)-(9) can be transformed into multiple linear models for regression [39].
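As an illustration of how a power-function model can be transformed into a simple linear regression, the sketch below fits a hypothetical form CR = a · DINC^b by taking logarithms, which gives ln CR = ln a + b · ln DINC. The functional form, the parameter values, and the data are assumptions made for this example only; the exact form of Model (1) is not reproduced in this text.

```python
import math

def fit_power_function(x, y):
    """Fit y = a * x**b by ordinary least squares on log-transformed data."""
    log_x = [math.log(v) for v in x]
    log_y = [math.log(v) for v in y]
    n = len(x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    # Closed-form slope and intercept of a simple linear regression.
    slope = (
        sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
        / sum((u - mean_x) ** 2 for u in log_x)
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), slope   # back-transform ln(a) to a

# Hypothetical noise-free data generated with a = 0.8 and b = 0.6.
dinc = [0.5 + 0.25 * i for i in range(19)]
cr = [0.8 * d ** 0.6 for d in dinc]

a_hat, b_hat = fit_power_function(dinc, cr)
print(a_hat, b_hat)
```

With real measurements the log transform also changes the error structure, so the back-transformed fit is not identical to a direct nonlinear fit of the original equation.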
Generally, linear regression can be solved using the ordinary least squares method, but nonlinear regression problems, such as those posed by Models (10)-(14), are more difficult to address. The Levenberg-Marquardt (L-M) algorithm, proposed by D.W. Marquardt in 1963 [40], is the most widely used nonlinear least squares algorithm. It combines the steepest descent method with the linearization (Gauss-Newton) method, which is based on a Taylor series expansion. The steepest descent method is suitable for the initial stage of iteration, when the parameter estimates are far from the optimal values, and the Gauss-Newton method is suitable for the later stage of iteration, when the parameter estimates are close to the optimal values [41]. The optimal values can be found quickly by combining these two methods.
When the random forest method is used for regression, overfitting is likely to occur if there are too many features in the dataset. Some algorithms can be used to score the importance of each feature. With these scores, a threshold can be determined and the features that are most helpful for model training can be selected. Model training is then carried out on the selected variables. Common feature selection methods fall into three types: filter, embedded, and wrapper [42]. In this study, the mutual information method (MI), recursive feature elimination (RFE), the least absolute shrinkage and selection operator (LASSO), and random forest (RF) were used for feature selection. Each method is described below.
Mutual information method (MI): MI indicates whether two variables X and Y are related and how strong the relationship is [43]. If (X, Y) ~ p(x, y), the mutual information I(X, Y) between X and Y is defined as:
$$I(X, Y) = \sum_{x}\sum_{y} p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)} = H(Y) - H(Y \mid X)$$
where H(Y) is the entropy of Y and H(Y|X) is the conditional entropy of Y given X. The more closely X and Y are related, the larger I(X, Y) is. The maximum value of I(X, Y) is H(Y); in that case H(Y|X) is 0, meaning that X and Y are completely related: once X is determined, Y takes a fixed value and no uncertainty remains. When I(X, Y) is 0, X and Y are independent and H(Y) = H(Y|X), which means that knowing X does not affect Y.
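The two limiting cases of mutual information described above can be checked numerically. The following minimal sketch computes discrete mutual information directly from its definition; the small example sequences are made up for illustration, and the estimator used in the study for continuous tree factors may well differ.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy H(Y) in bits of a discrete sample."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def mutual_information(xs, ys):
    """I(X, Y) = sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) * p(y)))."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    return sum(
        c / n * math.log2((c / n) / ((count_x[x] / n) * (count_y[y] / n)))
        for (x, y), c in joint.items()
    )

x = [0, 0, 1, 1, 2, 2, 3, 3]
y_dependent = [v % 2 for v in x]            # Y is fully determined by X
y_independent = [0, 1, 0, 1, 0, 1, 0, 1]    # every (x, y) pair equally likely

print(mutual_information(x, y_dependent), entropy(y_dependent))
print(mutual_information(x, y_independent))
```

When Y is a function of X the mutual information equals H(Y), and when the two are independent it is zero, which is why a higher score marks a more informative feature.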
Recursive feature elimination (RFE): RFE is a greedy algorithm for finding the optimal feature subset. The main idea is to repeatedly build a model (here, a regression model), pick out the best (or worst) feature, set it aside, and then repeat the process on the remaining features until all features have been traversed [44]. The order in which features are eliminated in this process gives the feature ranking.
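A minimal sketch of recursive feature elimination with scikit-learn's `RFE` class is given below. The synthetic data, the generic feature names, and the choice of a random forest as the base estimator are assumptions made for illustration; the text does not state which regression model was used inside RFE.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
feature_names = ["f0", "f1", "f2", "f3", "f4", "f5"]   # hypothetical names
X = rng.normal(size=(200, 6))
# Only the first two features carry signal; the rest are pure noise.
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Drop the least important feature one at a time until two remain.
estimator = RandomForestRegressor(n_estimators=30, random_state=0)
selector = RFE(estimator, n_features_to_select=2, step=1).fit(X, y)

selected = [name for name, keep in zip(feature_names, selector.support_) if keep]
print(selected)
print(selector.ranking_)   # 1 = kept; larger values were eliminated earlier
```

The `ranking_` attribute records the elimination order, which corresponds to the feature ranking mentioned in the text.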
Least absolute shrinkage and selection operator (LASSO): LASSO was first proposed by Robert Tibshirani in 1996 and is a shrinkage estimation method [45]. By constructing a penalty function, it yields a more parsimonious model: the sum of the absolute values of the regression coefficients is constrained to be less than a fixed value, which shrinks some coefficients and sets others exactly to 0. The method therefore retains the advantage of subset selection and provides a biased estimate suited to data with multicollinearity. LASSO adds an L1-norm penalty to the minimization of the residual sum of squares (RSS). The advantage of the L1 norm is that, when λ is sufficiently large, some of the coefficients to be estimated are reduced exactly to 0. The value of λ is determined by cross-validation: candidate values of λ are cross-validated, and the value with the minimum cross-validation error is selected. The model is then refitted to all of the data with the chosen λ.
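The behaviour described above can be sketched with scikit-learn, where the penalty weight λ is called `alpha`. The data are synthetic and the fixed value `alpha=0.5` is an arbitrary choice made only to show coefficients being driven exactly to zero; none of these values come from the study.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = StandardScaler().fit_transform(rng.normal(size=(200, 5)))
# Only the first two standardized features influence the response.
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Choose the penalty weight by five-fold cross-validation, then refit on all data.
cv_model = LassoCV(cv=5, random_state=0).fit(X, y)
print(cv_model.alpha_, cv_model.coef_)

# A deliberately large penalty sets the uninformative coefficients exactly to 0.
strong = Lasso(alpha=0.5).fit(X, y)
kept = np.flatnonzero(strong.coef_)
print(kept, strong.coef_)
```

Standardizing the features first matters because the L1 penalty treats all coefficients on the same scale.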
Random forest (RF): A random forest consists of multiple decision trees. Each node in a decision tree is a condition on one feature that divides the dataset in two according to the response variable. The split at a node can be chosen using an impurity measure; for regression problems, a least squares criterion is typically used. When training a decision tree, we can calculate how much each feature reduces the Gini impurity in that tree. For a forest of decision trees, we can calculate how much each feature reduces the Gini impurity in every tree and take the average reduction as the feature's score for feature selection. The Gini impurity of node m is:
$$G_m = \sum_{k=1}^{K} p_{mk}\,(1 - p_{mk})$$
where K is the number of categories and p_mk is the proportion of category k in node m. Intuitively, the Gini impurity is the probability that two samples randomly selected from node m belong to different categories.
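The Gini impurity and the impurity reduction produced by a split can be computed in a few lines. The following is a minimal sketch with made-up class labels; the function names are hypothetical and the example is not tied to any particular library implementation.

```python
from collections import Counter

def gini_impurity(labels):
    """Sum over categories of p * (1 - p), where p is the category proportion."""
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted

parent = ["a", "a", "a", "b", "b", "b"]

# A split that separates the two categories perfectly.
clean = impurity_decrease(parent, ["a", "a", "a"], ["b", "b", "b"])
# A split that leaves both children mixed.
mixed = impurity_decrease(parent, ["a", "a", "b"], ["a", "b", "b"])
print(gini_impurity(parent), clean, mixed)
```

Averaging such decreases over all nodes that split on a given feature, and over all trees, gives the kind of importance score used for feature screening.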

Hyperparameter Optimization
The random forest regression model is a set of n decision trees {T_1(X), ..., T_n(X)}, where X = {x_1, ..., x_p} is the p-dimensional vector of features related to the target variable; the outputs of the n trees are {Y_1 = T_1(X), ..., Y_n = T_n(X)}, where Y_n is the predicted value of the nth tree. For regression problems, the prediction Y is the average of the predicted values of the individual trees [46][47][48]. The random forest model has the following five important hyperparameters: the maximum depth of the tree (max_depth), the number of features in the feature subset (max_features), the minimum number of samples in a leaf node (min_samples_leaf), the minimum number of samples required to split a node (min_samples_split), and the number of decision trees (n_estimators). With the root mean square error (RMSE) as the evaluation index and M = {(X_1, Y_1), ..., (X_n, Y_n)} as the training set, the training process of the random forest regression model is as follows: (1) Randomly generate m variables for the binary split at each node; the choice of split variable follows the principle of minimum Gini impurity.
Grid search and random search are two commonly used hyperparameter optimization methods, implemented in scikit-learn as GridSearchCV and RandomizedSearchCV. The principle of GridSearchCV is simple: the program tries every combination of hyperparameters in turn and then selects the best one. This process is time-consuming and suffers from the curse of dimensionality. In 2012, James Bergstra and Yoshua Bengio proposed random search for hyperparameter optimization [47], on which RandomizedSearchCV is based. RandomizedSearchCV can effectively improve the efficiency of optimization, but its solution is not necessarily the optimal one. In this study, RandomizedSearchCV was first used to obtain the approximate range of the optimal solution, and GridSearchCV was then used to obtain the optimal solution within a small range around that result.
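The two-stage search described above (a coarse random search followed by a grid search in a narrow range around the result) can be sketched with scikit-learn as follows. The synthetic data, the hyperparameter ranges, the three-fold cross-validation, and the small number of iterations are assumptions chosen to keep the example fast; they are not the settings reported in Table 5.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

rng = np.random.default_rng(0)
X = rng.uniform(size=(150, 4))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.05, size=150)

scoring = "neg_root_mean_squared_error"   # higher is better, so RMSE is negated

# Stage 1: sample a few combinations from a wide, coarse search space.
coarse_space = {
    "max_depth": [2, 4, 6, 8],
    "min_samples_leaf": [1, 5, 10],
    "n_estimators": [20, 40, 60],
}
coarse = RandomizedSearchCV(
    RandomForestRegressor(random_state=0), coarse_space,
    n_iter=6, cv=3, scoring=scoring, random_state=0,
).fit(X, y)
best = coarse.best_params_

# Stage 2: exhaustive search in a small neighbourhood of the coarse optimum.
def around(value):
    return sorted({max(1, value - 1), value, value + 1})

fine_space = {
    "max_depth": around(best["max_depth"]),
    "min_samples_leaf": around(best["min_samples_leaf"]),
    "n_estimators": [best["n_estimators"]],
}
fine = GridSearchCV(
    RandomForestRegressor(random_state=0), fine_space, cv=3, scoring=scoring,
).fit(X, y)

print(coarse.best_params_, -coarse.best_score_)
print(fine.best_params_, -fine.best_score_)
```

Because the fine grid contains the coarse optimum and uses the same folds and random seed, the second stage can only match or improve the cross-validated RMSE found in the first stage.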

Model Evaluation and Validation
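To assess the goodness of fit of each model, the coefficient of determination (R²), mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE) were used as the test indexes in this study. They are calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
In the above formulas, n is the number of observed samples, y_i is the actual crown radius of the ith observation, ŷ_i is the predicted crown radius of the ith observation, and ȳ is the average of the actual crown radius over all observed samples.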
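The four test indexes can be computed directly from observed and predicted crown radii. The sketch below uses the standard definitions of these metrics; the function name and the four sample values are made up for illustration.

```python
import math

def regression_metrics(observed, predicted):
    """Return R2, MAE, MSE, and RMSE for paired observations and predictions."""
    n = len(observed)
    mean_observed = sum(observed) / n
    residuals = [y - y_hat for y, y_hat in zip(observed, predicted)]
    ss_res = sum(r ** 2 for r in residuals)                  # residual sum of squares
    ss_tot = sum((y - mean_observed) ** 2 for y in observed) # total sum of squares
    mse = ss_res / n
    return {
        "R2": 1 - ss_res / ss_tot,
        "MAE": sum(abs(r) for r in residuals) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
    }

# Hypothetical crown radii (m) and model predictions.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```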

Mathematical Modeling
The transformed Models (1)-(9) were fitted by the ordinary least squares algorithm, and Models (10)-(14) were fitted by the nonlinear least squares (L-M) method; the parameter fitting results of each model are shown in Table 3. Note that the symbol "/" means that the parameter is not included in the model. Table 4 shows the evaluation results of the Chinese fir CCEMs based on mathematical modeling. The best-fitting models in the training set are Models (7), (6), (4), (10), and (5); in the test set, the best-fitting models are Models (7), (6), (4), (12), and (10). Under the constraint of LCR, the quartic and cubic polynomials with RDINCT as the variable fit best. With the addition of DBH to Model (1) (i.e., Model (12)), the fitting precision is slightly improved; with the addition of HT and N to Model (9) (i.e., Models (13) and (14), respectively), the improvement in fitting precision is not obvious. Therefore, a modified model suited to specific areas or specific tree species is difficult to apply to other regions or other tree species. To obtain a more accurate CCEM suitable for Chinese fir, the basic model needs to be revised, and the process of adding variables and fitting parameters is difficult.
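For models that cannot be linearized, a Levenberg-Marquardt fit can be sketched with SciPy's `curve_fit` and `method="lm"`. The variable-exponent profile below, its parameter values, and the synthetic data are hypothetical stand-ins chosen for illustration; they are not one of Models (10)-(14).

```python
import numpy as np
from scipy.optimize import curve_fit

LCR = 1.5  # hypothetical largest crown radius (m), treated as known

def variable_exponent_profile(rdinc, b0, b1):
    """Hypothetical crown profile: CR = LCR * RDINC ** (b0 + b1 * RDINC)."""
    return LCR * rdinc ** (b0 + b1 * rdinc)

# Synthetic, noise-free crown radii generated with b0 = 0.5 and b1 = 0.3.
rdinc = np.linspace(0.05, 1.0, 25)
cr = variable_exponent_profile(rdinc, 0.5, 0.3)

# method="lm" selects the Levenberg-Marquardt algorithm; p0 is the starting guess.
params, covariance = curve_fit(
    variable_exponent_profile, rdinc, cr, p0=[1.0, 0.0], method="lm"
)
print(params)
```

With real data the result can depend on the starting values in `p0`, so it is common to try several starting points and compare the residual sums of squares.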
The residual plots of Models (4), (6), (7), and (10) are shown in Figures 3 and 4. Figure 3 shows the residual diagram of the training set models, and Figure 4 shows the residual diagram of the test set models. The residual error of both the training set and the test set models shows a clear trumpet shape and wide fluctuation range.

Random Forest
The RFE feature screening results are as follows: RMSE depends on the number of variables used. The RMSE is lowest, and the accuracy highest, with 14 features, but when the number of features is greater than nine, RMSE no longer changes significantly (Figure 6), indicating that model accuracy does not keep improving appreciably as more variables are used for modeling. Therefore, DINCT, LCR, RDINCB, DINCB, RDINCT, AGE, HBLC, CLC, and DBH were used as the modeling features.
The random forest screening results are as follows: The variables DINCT, LCR, RDINCB, RDINCT, and DINCB have higher scores; the variables AGE, CLC, DBH, HD, and HBLC have intermediate scores; and CH, HT, N, and CL have lower scores (Figure 8). Therefore, DINCT, LCR, RDINCB, RDINCT, DINCB, AGE, CLC, and DBH were used as the modeling features.
According to the characteristics of the tree crown dataset and the parameter optimization process of the random forest algorithm, the number of iterations was set to 100, RMSE was used as the evaluation standard, and five-fold cross-validation was used for hyperparameter optimization. Table 5 shows the hyperparameter ranges and the results of RandomizedSearchCV and GridSearchCV. Accordingly, the random forest regression model was established by setting max_depth = 5, max_features = auto, min_samples_leaf = 10, min_samples_split = 4, and n_estimators = 700.
Table 6 shows the results of the CCEMs based on the random forest regression model. Using R² as the evaluation indicator, the accuracy of the models established with the four feature screening methods ranks, from high to low, MI, RF, RFE, and LASSO in the training set, and MI, RFE, RF, and LASSO in the test set. For both the training set and the test set, the random forest regression model established with MI as the feature screening method has the highest accuracy. Using RMSE as the evaluation indicator, the model established with MI as the feature screening method has the lowest RMSE in both the training set and the test set. In summary, the random forest regression model established with MI as the feature screening method performs best in predicting the crown radius, which has some significance and practical value.
Figures 9 and 10 show the residual plots for the training set and test set of the random forest regression models established with the four feature screening methods. The random forest regression model established with LASSO as the feature screening method has a clear trumpet shape and shows heteroscedasticity in the training set. In the test set, the heteroscedasticity of all models is similar. Considering both the evaluation metrics and the residual plots, the random forest regression model constructed with MI as the feature screening method performs best for predicting the tree crown contour envelope of Chinese fir.


Discussion
The tree crown is important for evaluating the growth vigor of trees and the status of competition with adjacent trees. Forest stand 3D visualization is also an important part of the decision-making system for plantation growth and harvest. In early 3D visualizations of forest stands, trees were represented only by simple geometric solids, such as cylinders and cones. Such an approach could not accurately capture the actual growth of trees. In the 1980s, some researchers applied the concept of fractals to the visualization of the tree crown contour. Although this method could capture the shape of the tree crown contour to a certain extent, it could not reflect differences among individual crowns, and the fractal parameters were not easy to determine. Another approach treats the tree crown contour as a continuous and complete curve and expresses it with a specific function. In the early stage, such equations contained only two variables: DINCT (or DINCB) and CR. Models of this form therefore describe all tree crowns uniformly, even though the shape of the tree crown contour differs among growth stages. Consequently, some researchers tried to add variables such as AGE, N, DBH, and CL to the equation. However, AGE is strongly correlated with DBH, CL, and other variables. Adding these variables to the mathematical model can therefore improve its accuracy and better reflect the differences among trees, but determining and modifying the model form is difficult; furthermore, the model forms for different tree species and different ages need to be considered comprehensively. Among the models examined in this paper, HT, N, AGE, CH, and other variables did not noticeably improve model accuracy.
The results of random forest regression showed that adding multiple tree characteristic factors improved the fitting accuracy of the Chinese fir crown contour envelope. In addition, the precision of the random forest regression models built from different combinations of tree characteristics also differed. Therefore, using single factors such as HT and AGE together with composite factors such as CR and CLC to predict the Chinese fir crown contour envelope could prove useful. In both the training set and the test set, simulation accuracy and explanatory power were higher for the random forest regression model than for the mathematical regression model, and the overall performance of the random forest regression model was better. The results of the variable importance analysis showed that the main factors affecting the crown contour envelope in Chinese fir plantations were LCR, N, AGE, DBH, and HT. Among these factors, LCR had the most significant effect.
The CCEM based on the random forest regression method does not need to consider the correlation between variables, and the process is relatively simple. Therefore, we can select different forms of variable combinations to select the best group to build a random regression forest model. In this study, the random forest regression models constructed by four feature selection methods showed high performance; the best was the random forest model constructed by MI. The reserved features of this method were N, AGE, DBH, HT, HBLC, LCR, CLC, DINC T , DINC B , RDINC T , and RDINC B . Among these variables, N and AGE were the initial factors, and the DBH, HT, HBLC, and LCR had mature growth models with AGE and N and SI (site index) as variables and its distribution model; the other composite factors could be calculated from the above single factor [49][50][51][52]. Therefore, the CCEM based on the random forest has higher accuracy than the CCEMs based on mathematical modeling, and it can describe different shapes of tree crown at various stages of growth. Therefore, the random forest CCEM can accurately reflect differences in tree crown morphology among forests. Thus, the forest stand 3D model is of great significance for a 3D visualization of a plantation and for the management of plantation growth and harvest. Figure 11 shows the crown contour envelope of a Chinese fir plantation with 5-year, 10year, 15-year, 20-year, and 25-year standard trees. The X-axis is DINC T , and the Y-axis is CR. The mathematical modeling regression model and random forest regression model selected the best performing Model (7), i.e., the random forest regression model based on MI. The 5-year, 10-year, and 15-year prediction results show that the CCEM based on random forest regression model is superior to the mathematical modeling regression model. 
For 20-year-old Chinese fir, the random forest prediction result is slightly better than the mathematical modeling, whereas the prediction results of 25-year-old Chinese fir are close. In general, the random forest method has higher fitting accuracy than mathematical modeling.

Figure 11. Crown contour envelope of 5-year-old to 25-year-old Chinese fir. Error1 is the absolute error between true value and random forest; Error2 is the absolute error between true value and Model (7). The black points are crown radius of 1/10 CL, 1/4 CL, 1/2 CL, 3/4 CL, and 9/10 CL from crown top to crown bottom.
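The error quantities defined in the Figure 11 caption can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `absolute_errors` is hypothetical, and all crown radii below are made-up placeholder values rather than measurements or model outputs from the paper.

```python
# Relative depths into the crown (fractions of crown length, CL), from crown
# top to crown bottom, as listed in the Figure 11 caption.
RELATIVE_DEPTHS = [1 / 10, 1 / 4, 1 / 2, 3 / 4, 9 / 10]


def absolute_errors(measured, predicted):
    """Return the pointwise |measured - predicted| for paired crown radii."""
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    return [abs(m - p) for m, p in zip(measured, predicted)]


# Placeholder crown radii in metres, for illustration only (NOT study data).
measured = [0.30, 0.62, 0.95, 1.20, 1.28]
rf_predicted = [0.33, 0.60, 0.99, 1.17, 1.30]  # hypothetical random forest output
m7_predicted = [0.25, 0.70, 1.05, 1.10, 1.20]  # hypothetical Model (7) output

error1 = absolute_errors(measured, rf_predicted)  # true value vs. random forest
error2 = absolute_errors(measured, m7_predicted)  # true value vs. Model (7)

for depth, e1, e2 in zip(RELATIVE_DEPTHS, error1, error2):
    print(f"depth {depth:.2f} CL: Error1 = {e1:.2f} m, Error2 = {e2:.2f} m")
```

Computing the errors at fixed relative depths makes trees with different crown lengths directly comparable.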
One advantage of the mathematical modeling approach is that it is highly generalized; consequently, the CCEM constructed by mathematical modeling is relatively simple. However, it gives all of the trees in a stand the same crown shape, which does not meet the requirements of modern precision forest management. A CCEM based on random forest, combined with existing stand distribution models such as the HT distribution model and the DBH distribution model, can accurately reflect the differences among trees in a stand. Covariables such as HT and N can also be added to the mathematical model to improve the prediction accuracy. However, the form of the model then becomes extremely difficult to determine, the fitting is more difficult, and generalization is reduced to some extent. For example, Chengde Wang added the covariates DBH, HT, and N to Models (1) and (9) to obtain Models (12)-(14), and the results showed that adding covariates effectively improved the fitting accuracy [38]. However, in our study, the improvements associated with adding covariates were small. If more covariables are added, the model form becomes more difficult to control. For example, Model (7) with three covariables had the lowest prediction accuracy among all of the CCEMs. Another advantage of random forest is that it can help to identify the tree factors most closely related to the crown in the process of feature screening, which aids the study of crown shape. According to the four feature screening methods in this study, AGE, DBH, N, and HT significantly affect crown shape. To address the black-box problem of machine learning, several random sampling tests were carried out for further discussion in this paper. The dataset was split in the usual way: it was randomly divided 200 times into a model training set (70%) and a test set (30%). 
After the parameters were selected, validated, and compared, the final parameters showed strong stability across the repeated iterations. The ultimate goal of establishing CCEMs is to aid plantation management. Therefore, CCEMs constructed based on random forest can be tailored to specific areas. As the amount of sample data increases, the prediction accuracy of the CCEM increases, because random forest regression provides a robust method for dealing with similar samples. For Chinese fir in other areas or for other tree species, the feature combination or hyperparameter optimization scheme used in this study may not be optimal, but if sample data are sufficient, the method described in this paper could still be used to construct CCEMs based on random forest for the target areas or tree species.
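The repeated random splitting described above (200 random 70%/30% splits) can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure used in the study: a one-variable least-squares line stands in for the random forest model so that the example needs only the Python standard library, the data are synthetic, and the names `fit_line`, `rmse_and_r2`, and `repeated_holdout` are hypothetical.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a stand-in for the real model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse_and_r2(ys, preds):
    """Return (RMSE, R2) for observed values ys and predictions preds."""
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    my = statistics.fmean(ys)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return math.sqrt(ss_res / len(ys)), 1.0 - ss_res / ss_tot


def repeated_holdout(xs, ys, n_repeats=200, train_frac=0.7, seed=0):
    """Split the data at random n_repeats times; score each held-out test set."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_train = int(round(train_frac * len(idx)))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in test]
        scores.append(rmse_and_r2([ys[i] for i in test], preds))
    return scores


# Synthetic data for illustration only (NOT measurements from the study).
data_rng = random.Random(42)
xs = [data_rng.uniform(0.0, 1.0) for _ in range(100)]
ys = [0.2 + 1.5 * x + data_rng.gauss(0.0, 0.1) for x in xs]

scores = repeated_holdout(xs, ys)
rmses = [s[0] for s in scores]
r2s = [s[1] for s in scores]
print(f"RMSE: mean {statistics.fmean(rmses):.3f}, sd {statistics.stdev(rmses):.3f}")
print(f"R2:   mean {statistics.fmean(r2s):.3f}, sd {statistics.stdev(r2s):.3f}")
```

A small standard deviation of RMSE and R2 across the splits indicates that performance does not depend on one particular partition of the data, which is the kind of stability reported above.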

Conclusions
Crown attributes are key components of growth and yield models because crown shape and size influence production efficiency, which is directly related to growth and mortality [53,54]. HT, DBH, CL, and other tree factors have a substantial effect on crown shape. The main objective of the CCEM based on random forest in this study was to predict the crown radius at any crown depth, which might improve the prediction accuracy. Compared with previous research on CCEMs, we used a random forest regression model and added more basic tree factors and compound tree factors to predict the crown contour envelope of Chinese fir, which showed a significant improvement over mathematical modeling. This model has higher accuracy and more easily describes differences among trees. Different feature combinations can be used to construct the random forest regression model for predicting the crown contour envelope. The CCEMs constructed using MI, LASSO, RFE, and RF as feature selection methods all performed well, but MI was the best approach. The newly developed CCEMs are important for predicting crown shape and also provide a new method for studying the crown shape of Chinese fir plantations. In future research, we plan to build an automated variable screening and hyperparameter optimization program to rapidly construct the CCEMs of target tree species.

Funding: This study was funded by the Key National Research and Development Program of China (project no. 2017YFD0600906). The authors would also like to thank the reviewers for their comments, which were helpful in improving the manuscript.