Modeling Height–Diameter Relationship for Poplar Plantations Using Combined-Optimization Multiple Hidden Layer Back Propagation Neural Network

Abstract: The relationship between total height and diameter at breast height (hereafter diameter) of trees is generally nonlinear and therefore has complex characteristics, which can be accurately described by a height-diameter model developed using the back propagation (BP) neural network approach. The multiple hidden layered-BP neural network has several hidden layers and neurons, and is therefore considered a more appropriate modeling approach than the single hidden layered-BP neural network. However, the former approach is not widely applied for tree height prediction due to the absence of an effective optimization method, but such optimization can be achieved with the BP neural network modeling approach presented here. Poplar (Populus spp. L.) plantation data acquired from Guangdong province of China were used to evaluate the BP neural network modeling approach, and its results were compared with those obtained from the traditional regression modeling and mixed-effects modeling approaches. We determined the best BP neural network structure to have two hidden layers with five neurons in each layer and logistic sigmoid transfer functions. Relative to the Mitscherlich height-diameter model, which had the highest fitting precision among the six traditional height-diameter models evaluated, the coefficient of determination (R²) of the neural network height-diameter model increased by 10.3%, and the root mean square error (RMSE) and mean absolute error (MAE) decreased by 12% and 13.5%, respectively. The BP neural network height-diameter model also appeared more accurate than the mixed-effects height-diameter model. Our study proposes a method for determining the optimal number of hidden layers, the number of neurons in each layer, and the transfer functions of the BP neural network structure. This method can be useful for other modeling studies of similar or different types, such as tree crown modeling, height and diameter increment modeling, and so on.


Introduction
Tree height is one of the most important tree characteristics, and its measurement is a fundamental basis for evaluating forest growth and biomass, assessing site quality, and classifying the vertical structure of a forest [1,2]. Direct measurement of tree height is generally difficult and time consuming. However, because of the strong relationship between tree height and diameter at breast height (DBH), height can be predicted using DBH as a predictor in a height-diameter model [3–12]. This method uses measurements of tree height and DBH to fit mathematical functions with different forms and numbers of parameters, and the optimal one is determined based on standard statistical indices. This modeling approach is generally known as traditional modeling, and its main idea is to establish mathematical equations and obtain tree height predictions by solving them [3–12]. However, tree height growth is substantially affected by various factors whose relationships may be nonlinear, which makes it difficult to describe the wide variation in tree height with a single height-diameter equation.
As mentioned above, tree height growth generally has nonlinear characteristics and is strongly correlated with various factors, such as tree size, site quality, stand density or competition, and climate. Site factors consist of slope, altitude, soil depth, soil texture, humus layer, and soil chemical constituents. Competition is attributed to stand crowding and density, such as the number of trees, stand basal area, and canopy density. Climate factors include solar radiation, temperature, and precipitation. Because accurate information on all these factors is difficult to acquire, and for ease of application, height prediction models are usually developed in one of three forms: using DBH as a single predictor (simple model); using stand variables, such as basal area and number of trees per hectare, in addition to DBH (generalized model); or incorporating DBH, stand variables, and random effects (generalized mixed-effects model). To develop these simple or complex height-diameter models, some versatile growth functions [1,9–12] are used, and fitting these functions to data using least squares regression is generally known as the traditional modeling approach. However, in recent years, there has been an increasing trend of applying the mixed-effects modeling approach to account for the larger variability of tree height at the subject level (e.g., sample plot level) and to increase the model's prediction accuracy [4,6,7].
The back propagation (BP) neural network is a machine learning method: a multiple layer feed-forward network trained by an error back propagation algorithm. It is a modern modeling approach and can be used to develop various forest models. The BP neural network is composed of an input layer, one or more hidden layers, and an output layer. It can realize the mapping from input to output and can approximate any nonlinear continuous function with high precision. Transfer functions, such as the logistic sigmoid and tangent sigmoid functions, can be selected between the input layer and the hidden layers and between successive hidden layers. Because the transfer functions selected between layers can differ, their expressions also differ, and the neural network can therefore represent various forms of nonlinear effects. In recent years, neural networks have been increasingly applied to predict forest dynamics with precise results. The neural network modeling approach has been used to predict different stand and individual tree characteristics, such as height [13], diameter distribution [14,15], and stem volume [16]; to establish models of height-diameter relationships [17], growth and yield [18], and inside-bark diameter and heartwood relationships [19]; and to assess forest biomass [20]. These studies compared the fitting precision of traditional regression models of different forms with that of neural network models and showed higher precision for the neural network models. Özçelik et al. [17] and Castaño-Santamaría et al. [13] also compared mixed-effects models with neural network models, and their results showed higher precision for the mixed-effects models than for the neural network models.
All these studies were based on a single hidden layer neural network, and therefore they lack sufficient performance analyses and comparisons of multiple hidden layered-neural networks. Furthermore, none of the previously applied neural network modeling approaches proposed a method for determining the optimal structure of the neural network (i.e., the optimal number of hidden layers and the number of neurons in each hidden layer).
Since the tree height-diameter relationship is substantially affected by various factors that may be nonlinearly related, the traditional height-diameter equation cannot accurately simulate the growth and development of tree height. This method has other shortcomings, such as low fitting precision and complex operational steps in the fitting procedure. In contrast, the BP neural network modeling approach has both higher fitting efficiency and higher precision, and is therefore highly useful in forest modeling research. However, the current application of neural networks in tree height-diameter modeling is limited to a single hidden layer, and lacks sufficient performance analyses and comparisons of the differences in optimizing the neural network structure. This study thus intends to solve this problem, mainly by improving the performance of the neural network structure through height-diameter modeling.
Using poplar tree height and DBH data collected from Guangdong Province in China, this study establishes a multiple hidden layered-BP neural network height-diameter model and analyzes the differences between the single hidden layered- and multiple hidden layered-BP neural network modeling approaches using the MATLAB 2016b software (MathWorks, Natick, MA, USA). This study also compares the fitting precision of the optimal BP neural network height-diameter model with that of the traditional regression height-diameter models and mixed-effects height-diameter models. The presented results will be an important basis for developing height-diameter models using the BP neural network and for predicting tree height. The proposed methods can be useful for other modeling studies of similar or different types, such as tree crown modeling, height and diameter increment modeling, and so on.

Data Materials
The data came from sample plots established in poplar plantations in Guangdong province of China to develop the height-diameter models. The square sample plots each had an area of 666.67 m². We only used the sample plots with a stand density of more than 300 trees per hectare and normal tree height records. A total of 9659 trees in 112 sample plots (20 measured in 1997 and 92 in 2002) were used for modeling the height-diameter relationship. We calculated the mean height (hereafter height) and mean DBH (hereafter DBH) by sample plot for ease of fitting, especially for neural network fitting. We divided the sample plots randomly into two parts: one for training the model (80 sample plots, the fitting data set) and another for testing the model (32 sample plots, the validation data set), by application of the k-fold cross-validation method. Summary statistics of both the fitting and validation data sets are presented in Table 1, and a scatter plot of tree height against DBH is presented in Appendix A (Figure A1).

Modelling Approach
We developed height-diameter models using three different modeling approaches: traditional least squares regression, mixed-effects modeling, and the artificial neural network approach. We focused on the last approach, namely the BP neural network. The optimal BP neural network height-diameter model, selected from several alternative models, was compared against the height-diameter models fitted using the traditional regression and mixed-effects modeling approaches.

Traditional Approach
This approach involves fitting traditional height-diameter functions by ordinary least squares regression, implemented with the nls function in R software (version 3.2.2), based on the fitting data set [21]. We considered six commonly used versatile height-diameter equations (Table 2) for this purpose. Since all of these are power exponential equations, they are more complex to fit than other forms of equations (e.g., linear and fractional forms), and choosing the best performing one is also more difficult.

Table 2. Traditional height-diameter equations (H, height (m); DBH, diameter at breast height (cm); a, b, and c are parameters to be estimated).

Name of Equation Form of Equation Source
Richards

Mixed-Effects Modeling Approach
We considered the sample plot-level effect as a random effect to establish the mixed-effects height-diameter model. To achieve convergence to the global minimum, a relatively simple function (the Schumacher function, Table 2) was chosen to include the random effect. Our main intention in developing the mixed-effects height-diameter model was to compare its performance against that of the model obtained from the BP neural network modeling approach.
We evaluated three different variance-stabilizing functions (exponential function, power function and power function with constant) for their effectiveness in removing the heteroskedasticity problem. Akaike's information criterion (AIC), Bayesian information criterion (BIC), and log likelihood (logLik) criteria were used to select the most effective variance-stabilizing function.
The parameters of the mixed-effects height-diameter model were estimated by maximum likelihood using the Lindstrom and Bates (LB) algorithm implemented in the nlme function of R software (version 3.2.2), based on the fitting data set [27]. Detailed descriptions of mixed-effects modeling are presented in references [28–31].

BP Neural Network
As pointed out in the introduction section, artificial neural networks have been increasingly applied to forest growth and yield modeling in recent years [13–20]. They have tremendous advantages in nonlinear mapping, adaptive generalization, and fault tolerance, which can make up for the shortcomings of traditional modeling approaches. However, most of the existing modeling studies are based on the single hidden layer neural network, and application of the multiple hidden layered-neural network, e.g., the BP neural network, has rarely been reported in forestry. The main reason is the absence of an optimization method for determining the number of hidden layers, the number of nodes, and the transfer functions of the neural network. Neural network modeling studies have shown that the more complex the problem, the more useful multiple hidden layers are [32]. The BP neural network is suitable for function approximation, pattern recognition, and classification [33]. Considering the above-mentioned advantages of the BP neural network, we developed the neural network height-diameter models in this study, which were expected to be more accurate than those obtained from the traditional regression and mixed-effects modeling approaches. The structural parameters of the BP neural network include the number of hidden layers, the number of nodes in each layer, and the transfer functions between the layers [34].

Setting up of the BP Neural Network Structure
We established the tree height-diameter model based on the multiple hidden layered-BP neural network. To do this, we first set the range and step size for the number of hidden layers, the number of nodes, and the transfer functions; we then took values within these reasonable ranges to generate a series of neural network height-diameter models. Finally, we used k-fold cross-validation to identify the best performing one among the alternatives.
The number of nodes in each hidden layer needs to be determined according to fitting precision. Equation (1) represents a commonly used determination method [35].

$$S = \sqrt{n + o} + m \tag{1}$$

where S is the number of hidden layer nodes, n is the number of nodes in the input layer, o is the number of nodes in the output layer, and m is an integer (m = 1, 2, ..., 10). The transfer functions, which occur between the hidden layers or between the input layer and a hidden layer, are the S-shaped logistic sigmoid and tangent sigmoid functions. The former is a unipolar S-function and the latter is a bipolar S-function. The expressions of the logistic sigmoid function and the tangent sigmoid function are represented by Equation (2) and Equation (3), respectively. The transfer function occurring between the hidden layer and the output layer is a linear function, and its expression is represented by Equation (4):

$$f(x) = \frac{1}{1 + e^{-x}} \tag{2}$$

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{3}$$

$$f(x) = x \tag{4}$$

To obtain the best BP neural network structure for given data, we set the selection ranges of the number of hidden layers, the number of nodes in each layer, and the transfer function, and then generated several height-diameter neural network models. Based on the mean squared error and the number of iterations, the best performing model was identified. An exhaustive analysis of all the network models would not be practical. Thus, we applied the "trial and error approach" [36] to reduce the number of tests and applied k-fold cross-validation [37] to improve the test results of the BP neural network structure.
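The node-count rule and the transfer functions described above can be sketched in Python as follows. This is only an illustration: it assumes the commonly used empirical rule S = sqrt(n + o) + m with truncation to an integer, and it assumes the tangent sigmoid is the hyperbolic tangent. The function names are hypothetical and are not taken from the MATLAB programs used in this study.

```python
import math


def candidate_hidden_nodes(n_inputs, n_outputs, m_values=range(1, 11)):
    """Candidate hidden-node counts from the assumed rule S = sqrt(n + o) + m."""
    base = math.sqrt(n_inputs + n_outputs)
    # Truncating to an integer is an assumption; rounding is another option.
    return [int(base + m) for m in m_values]


def logsig(x):
    """Unipolar logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tansig(x):
    """Bipolar tangent sigmoid (assumed here to be tanh), output in (-1, 1)."""
    return math.tanh(x)


def purelin(x):
    """Linear transfer function used for the output layer."""
    return x


# One input node (DBH) and one output node (height).
candidates = candidate_hidden_nodes(1, 1)
```

With one input node and one output node, the candidates run from 2 to 11 under these assumptions.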

Normalizing Input and Output Factors
Because the input factors may have different measurement units, the existence of singular samples in the data would increase the network training time and may also lead to non-convergence of the neural network. To address this problem, we used the mapminmax function to normalize the input and output factors, mapping them to a scale between −1 and 1. The reverse normalization was then used to map the results of the operation from the interval [−1,1] back to actual predictions.
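The normalization step can be sketched in Python as follows. This is an illustrative re-implementation of a linear min-max mapping to [−1, 1] and its inverse, not the mapminmax function itself; the helper names and the DBH values are hypothetical.

```python
def fit_minmax(values, lo=-1.0, hi=1.0):
    """Store the settings needed to map values linearly onto [lo, hi]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("cannot scale a constant series")
    return {"x_min": x_min, "x_max": x_max, "lo": lo, "hi": hi}


def apply_minmax(values, s):
    """Forward mapping: original units -> [lo, hi]."""
    span = s["x_max"] - s["x_min"]
    return [(s["hi"] - s["lo"]) * (v - s["x_min"]) / span + s["lo"] for v in values]


def reverse_minmax(scaled, s):
    """Inverse mapping: [lo, hi] -> original units."""
    span = s["x_max"] - s["x_min"]
    return [(y - s["lo"]) * span / (s["hi"] - s["lo"]) + s["x_min"] for y in scaled]


# Hypothetical plot-mean DBH values (cm), used only for illustration.
dbh = [8.0, 12.0, 16.0]
settings = fit_minmax(dbh)
scaled = apply_minmax(dbh, settings)
restored = reverse_minmax(scaled, settings)
```

The same stored settings should be reused for the validation data, so that training and validation inputs share one scale.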

Training the Model
We trained the BP neural network by applying the Levenberg-Marquardt (L-M) algorithm [38,39]. This algorithm does not follow a single negative gradient direction in each iteration, but allows the search to proceed in directions in which the error deteriorates. At the same time, by adaptively adjusting between the steepest gradient descent method and the Gauss-Newton method, it optimizes the network weights so that the neural network can converge effectively. Equation (5) is used for adjusting the weights and thresholds.
$$\Delta w = -\left(J^{T} J + \mu I\right)^{-1} J^{T} e \tag{5}$$

where ∆w is the adjustment to the weights and thresholds, I is the unit matrix, J is the Jacobian matrix of the errors with respect to the weights, e is the vector of errors, and µ is an adaptively adjusted scalar. As µ increases, the algorithm approaches the steepest descent method with a small learning rate; as µ decreases to 0, it approaches the Gauss-Newton method, so the algorithm blends smoothly between the two methods. While training the model, the parameters were set as follows: learning rate 0.01, maximum number of iterations 1000, target precision 0.001, maximum number of verification failures 20, and minimum performance gradient 0.000001.
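A minimal sketch of the damped update behind Equation (5) is given below in Python. It assumes the update takes the form ∆w = −(JᵀJ + µI)⁻¹Jᵀe, with J the Jacobian of the errors. It fits a hypothetical two-parameter saturating curve rather than a neural network, and the rule of multiplying µ by 0.1 after a successful step and by 10 after a failed step is also an assumption. It illustrates the idea only and is not the MATLAB training routine used in this study.

```python
import math


def model(d, a, b):
    # Hypothetical saturating curve, used only to illustrate the update rule.
    return a * (1.0 - math.exp(-b * d))


def errors_and_jacobian(ds, hs, a, b):
    """Return e = observed - predicted and J = d(e)/d(a, b), row by row."""
    e, jac = [], []
    for d, h in zip(ds, hs):
        decay = math.exp(-b * d)
        e.append(h - a * (1.0 - decay))
        # Derivatives of the error (hence the minus signs) w.r.t. a and b.
        jac.append((-(1.0 - decay), -a * d * decay))
    return e, jac


def lm_step(e, jac, mu):
    """Solve delta_w = -(J^T J + mu*I)^(-1) J^T e for two parameters."""
    a11 = sum(r[0] * r[0] for r in jac) + mu
    a12 = sum(r[0] * r[1] for r in jac)
    a22 = sum(r[1] * r[1] for r in jac) + mu
    g1 = sum(r[0] * ei for r, ei in zip(jac, e))
    g2 = sum(r[1] * ei for r, ei in zip(jac, e))
    det = a11 * a22 - a12 * a12
    return (-(a22 * g1 - a12 * g2) / det, -(a11 * g2 - a12 * g1) / det)


def sse(e):
    return sum(ei * ei for ei in e)


def fit_lm(ds, hs, a, b, mu=0.01, max_iter=100):
    e, jac = errors_and_jacobian(ds, hs, a, b)
    for _ in range(max_iter):
        da, db = lm_step(e, jac, mu)
        e_new, jac_new = errors_and_jacobian(ds, hs, a + da, b + db)
        if sse(e_new) < sse(e):
            # Accept the step and move toward Gauss-Newton behaviour.
            a, b, e, jac = a + da, b + db, e_new, jac_new
            mu *= 0.1
        else:
            # Reject the step and move toward small steepest-descent steps.
            mu *= 10.0
        if sse(e) < 1e-20:
            break
    return a, b, sse(e)


# Noise-free synthetic data generated from a = 20, b = 0.1 (illustrative only).
ds = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
hs = [model(d, 20.0, 0.1) for d in ds]
initial_sse = sse(errors_and_jacobian(ds, hs, 10.0, 0.05)[0])
a_hat, b_hat, final_sse = fit_lm(ds, hs, 10.0, 0.05)
```

Because a step is accepted only when it lowers the sum of squared errors, the error never increases from one accepted iteration to the next.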

Model Evaluation
We used the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE) as evaluation indices to compare the models based on the validation data set. These indices were calculated using Formulae (6), (7), and (8), respectively.

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^{2}}{\sum_{i=1}^{n} (Y_i - \bar{Y})^{2}} \tag{6}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^{2}} \tag{7}$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - \hat{Y}_i \right| \tag{8}$$

where n is the number of samples; $\bar{Y}$, $Y_i$, and $\hat{Y}_i$ are the mean value, measured value, and predicted value of the response variable (tree height, in our case), respectively. Theoretically, the closer the coefficient of determination is to 1 and the smaller the root mean square error and the mean absolute error, the higher the model's fitting precision.
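The three evaluation indices can be computed as in the short Python sketch below. It assumes the usual definitions with n in the denominator of RMSE and MAE; the function names and the observed and predicted heights are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, with n in the denominator (an assumption)."""
    n = len(observed)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / n


# Hypothetical observed and predicted heights (m), for illustration only.
observed = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]
metrics = (r_squared(observed, predicted),
           rmse(observed, predicted),
           mae(observed, predicted))
```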

Model Selection
We first obtained a series of BP neural network height-diameter models by setting different values of the structural parameters. Then, the k-fold cross-validation method was used to select the most suitable model [37]. Given a sample set $S$ containing $m$ data records and $t$ candidate models $M_1, M_2, \ldots, M_t$, the k-fold (k = 5) cross-validation procedure is as follows:
Step 1. The sample set $S$ is randomly divided into $k$ disjoint subsets, each containing $m/k$ samples; these subsets are denoted by $S_1, S_2, \ldots, S_k$.
Step 2. For each model $M_j$ ($j = 1, 2, \ldots, t$), the following is done:
For n = 1 to k {
    Take $S_1 \cup \cdots \cup S_{n-1} \cup S_{n+1} \cup \cdots \cup S_k$ as the training set;
    Train the model $M_j$ and get the corresponding hypothesis function $H_{jn}$;
    Take $S_n$ as the verification set and calculate the generalization error $\varepsilon_{S_n}(H_{jn})$ of model $M_j$.
}
Calculate the average of $\varepsilon_{S_n}(H_{jn})$, $n = 1, 2, \ldots, k$, to get the average generalization error of model $M_j$.
Step 3. Calculate the average generalization error of every model, and select the model $M_p$ with the smallest average generalization error as the best model.
It is noted that, as Arlot and Lerasle [40] recommended, five-fold (k = 5) cross-validation was applied in this study.
In general, the mean square error is used to represent the generalization error $\varepsilon_{S_n}(H_{jn})$, as shown in Equation (9).

$$MSE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^{2} \tag{9}$$

where n is the number of samples, and $Y_i$ and $\hat{Y}_i$ are the observed and model-predicted height values, respectively.
In addition to the generalization error $\varepsilon_{S_n}(H_{jn})$, the number of iterations, running time, and other criteria are also used to select the best performing model.
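The selection procedure in Steps 1 to 3 can be sketched in Python as follows. The two candidate "models" (a constant mean and a straight line) are hypothetical stand-ins for alternative network structures, and the data are synthetic; only the fold splitting, the per-fold mean squared error, and the choice of the smallest average error mirror the procedure described above.

```python
import random


def k_fold_indices(m, k, seed=0):
    """Step 1: randomly split the indices 0..m-1 into k disjoint folds."""
    idx = list(range(m))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mse(observed, predicted):
    return sum((y - p) ** 2 for y, p in zip(observed, predicted)) / len(observed)


def cross_validate(fit, xs, ys, k=5, seed=0):
    """Step 2: average generalization error of one candidate model."""
    folds = k_fold_indices(len(xs), k, seed)
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(mse([ys[i] for i in held_out],
                               [predict(xs[i]) for i in held_out]))
    return sum(fold_errors) / k


def fit_mean(xs, ys):
    """Hypothetical candidate 1: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Hypothetical candidate 2: ordinary least squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


# Synthetic, exactly linear data (illustrative only).
dbh = [6.0 + 2.0 * i for i in range(10)]
height = [1.3 + 0.8 * d for d in dbh]

candidates = {"mean": fit_mean, "line": fit_line}
scores = {name: cross_validate(fit, dbh, height) for name, fit in candidates.items()}
# Step 3: keep the candidate with the smallest average generalization error.
best_model = min(scores, key=scores.get)
```

Using the same seed for every candidate means all models are compared on identical folds.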
The tree height-diameter modeling process using BP neural networks is shown in Figure 1. According to the data and actual needs, we first set the ranges of the number of hidden layers, the number of nodes in each hidden layer, and the transfer functions. Then, we used the "trial and error approach" to determine the actual values of these structural parameters and generate N height-diameter models. Finally, the optimal BP neural network height-diameter model was selected through k-fold cross-validation.

Model Generation and Performance
We used MATLAB 2016b (MathWorks, Natick, MA, USA) to build the neural network height-diameter models by writing M programs. Several alternative neural network height-diameter models were generated by setting the number of hidden layers from 1 to 3 and the number of neurons in each hidden layer from 2 to 11 with a step size of 3, adapting the neurons through the "trial and error approach", and employing logistic sigmoid and tangent sigmoid transfer functions, with DBH as the input layer and height as the output layer. The model with the smallest RMSE and MAE and the largest R² was then identified using k-fold cross-validation with k = 5. The iterative results are listed in Tables 3 and 4, and more results are in Appendix A (Table A1). The MSE and the number of iterations were used as evaluation indices in screening the models, and the performance statistics of the neural networks with different numbers of hidden layers are presented in Table 5. With 1 hidden layer, there were 8 network structure combinations; with 2 hidden layers, 64; and with 3 hidden layers, 512. The average MSE of the double hidden layer networks did not differ substantially from that of the triple hidden layer networks, although the neural network with double hidden layers and the structure 1:5:5:1 had the smallest MSE. This indicated that this structure, which provided the best precision, could be used as the optimal neural network structure of the height-diameter model.
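The counts of 8, 64, and 512 structure combinations can be reproduced with the short Python sketch below. It assumes that each hidden layer independently takes one of four neuron counts (2, 5, 8, or 11) and one of two transfer functions; this reading is consistent with the reported counts but is an interpretation, and the names used are hypothetical.

```python
from itertools import product

# Neuron counts from 2 to 11 with a step size of 3.
neuron_options = list(range(2, 12, 3))
# Labels for the logistic sigmoid and tangent sigmoid transfer functions.
transfer_options = ["logsig", "tansig"]


def enumerate_structures(n_hidden_layers):
    """All (neurons, transfer) choices for each hidden layer, combined."""
    per_layer = list(product(neuron_options, transfer_options))
    return list(product(per_layer, repeat=n_hidden_layers))


counts = {layers: len(enumerate_structures(layers)) for layers in (1, 2, 3)}
# The structure reported as best: two hidden layers of 5 logistic sigmoid neurons.
selected = ((5, "logsig"), (5, "logsig"))
```

Under this assumption each hidden layer has 4 × 2 = 8 options, so the totals are 8, 8² = 64, and 8³ = 512.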
The number of iterations differed only slightly between the double hidden layer and single hidden layer networks, and both required far fewer iterations than the triple hidden layer networks, which shows that the double and single hidden layer networks converged at similar rates and faster than the triple hidden layer networks. Taken together, the double hidden layer network had the higher precision, so we selected the double hidden layers (Table 4). When the number of neurons in each layer was 1:5:5:1 and the logistic sigmoid functions were all selected, the MSE reached its minimum value of 0.0416. The neural network height-diameter model generated with this structure had the best fitting performance. The number of iterations required to generate the best neural network model was not necessarily the minimum number of iterations for the corresponding number of hidden layers (Table 5). The reason is that reaching the minimum MSE may require additional iterations.

Transfer Function
We generated several neural network models and selected the best performing one (Figure 2). This figure shows that both the first and the second hidden layers had 5 neurons; the first hidden layer neurons were H 11  respectively. The transfer function between the input layer and the first hidden layer was the logistic sigmoid function, the transfer function between the first and second hidden layers was also the logistic sigmoid function, and the transfer function to the output layer was the purelin function.
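The forward pass through a 1:5:5:1 network with logistic sigmoid hidden layers and a linear output can be sketched in Python as below. The weights and thresholds are random placeholders, not the fitted values from this study, and all names are hypothetical; the input is assumed to be a DBH value already scaled to [−1, 1].

```python
import math
import random


def logsig(x):
    return 1.0 / (1.0 + math.exp(-x))


def purelin(x):
    return x


def init_layer(n_in, n_out, rng):
    """Random placeholder weights and thresholds for one layer."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_out)]
    return weights, biases


def layer_output(inputs, weights, biases, transfer):
    """Weighted sum plus threshold for each neuron, passed through the transfer function."""
    return [transfer(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


rng = random.Random(42)
sizes = [1, 5, 5, 1]                      # input : hidden 1 : hidden 2 : output
transfers = [logsig, logsig, purelin]
layers = [init_layer(sizes[i], sizes[i + 1], rng) + (transfers[i],) for i in range(3)]


def forward(x_scaled, layers):
    """Propagate one scaled DBH value through all layers."""
    activations = [x_scaled]
    for weights, biases, transfer in layers:
        activations = layer_output(activations, weights, biases, transfer)
    return activations[0]


n_parameters = sum(len(w) * len(w[0]) + len(b) for w, b, _ in layers)
first_hidden = layer_output([0.2], layers[0][0], layers[0][1], logsig)
prediction_scaled = forward(0.2, layers)
```

A network of this shape has 46 weights and thresholds in total (10 + 30 + 6), and its scaled output would still need the reverse normalization to give a height in metres.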

Comparison with Traditional Model Accuracy
We compared the height-diameter models (Table 2) fitted using ordinary least squares regression with the BP neural network height-diameter model based on three evaluation indices (Table 6). Parameter estimates of all the traditional height-diameter models were significant (p < 0.05) and are presented in Appendix A (Table A2). The BP neural network modeling approach appeared better than the traditional regression approach for establishing tree height-diameter models. Among the traditional models evaluated (Table 2), the Mitscherlich model had the highest fitting precision and the Logistic model the lowest. The BP neural network height-diameter model had a larger R² (by 10.3%) and smaller RMSE (by 12%) and MAE (by 13.51%) than the Mitscherlich height-diameter model. The tree heights predicted from the best BP neural network height-diameter model and the Mitscherlich height-diameter model were compared against the observed heights (Figure 3). The former model shows substantially higher prediction accuracy than the latter. The residuals of the BP neural network model are concentrated around 0 (Figure 4), indicating that this model has better fitting performance. Taking as examples the two intervals (−1,1) and (−2,2), in which most of our data points were scattered, Figure 4 shows that with the BP neural network height-diameter model, 20 residual points are within (−1,1), accounting for 62.5%, and 30 points are within (−2,2), accounting for 93.75%. With the Mitscherlich height-diameter model, however, 15 points are within (−1,1), accounting for 50%, and 28 points are within (−2,2), accounting for 87.5%. Thus, the precision of the BP neural network model appeared higher than that of the Mitscherlich model.

Comparison with Mixed-Effects Model Accuracy
We included random effects at the sample plot level in the simplest traditional height-diameter model (the Schumacher model) among the six models presented in Table 2. The model with the random effect added to parameter b converged with the smallest AIC and BIC and the largest log likelihood among the alternative mixed-effects models formulated by adding the random effect to the fixed-effect parameters. The exponential function accounted for the heteroskedasticity most effectively and was thus used to develop the final mixed-effects height-diameter model (Table 7). The mixed-effects height-diameter model with estimated parameters is presented in Equation (10).
where $H_i$ and $DBH_i$ are the stand mean height and stand mean diameter at breast height of the ith sample plot, $u_i$ is the random effect of the ith sample plot, assumed to be normally distributed with zero expectation and variance-covariance matrix ψ, and $\varepsilon_i$ is the error term of the ith sample plot, also assumed to be normally distributed with zero expectation and variance-covariance matrix $R_i$. The fitting precision of the mixed-effects height-diameter model (R² = 0.7179, RMSE = 1.1926, and MAE = 0.9888) was substantially lower than that of the neural network height-diameter model, but higher than that of the traditional height-diameter model (Equation (10), Table 6). We considered that using a mixed-effects model is equivalent to adding more input factors. It would then be necessary to change the structure of the neural network and add the random-effect factor as an input to the neural network for comparison purposes; that is, it would be more meaningful to compare models with the same number of input factors.

Discussion
We established a modeling method that can generate several BP neural network height-diameter models based on combinatorial mathematics. Among these models, we selected the best one by comparing fitting and prediction accuracies and convergence rates. The BP neural network optimization method was employed to establish the optimal height-diameter model for poplar plantations in Guangdong Province, China. Poplar has become one of the main plantation tree species in China in recent decades because of its fast growth, and it is recognized as a focus of woody plant research and an ideal material for bioenergy research. Poplar is therefore an important research subject for our study. This study compares the fitting performance of the single hidden layered- and multiple hidden layered-BP neural network approaches. The highest performance was obtained with two hidden layers, i.e., the neural network with double hidden layers had higher fitting precision, higher estimating efficiency, and a more acceptable number of iterations. This is because increasing the number of hidden layers not only results in a longer computational time, but also increases the likelihood of overfitting, which leads to non-optimal prediction performance. In the past, most studies focused on single hidden layer neural networks [13–20], but none of them investigated the effects of larger numbers of hidden layers, neurons, and transfer functions. In this context, our study, which focused on these features of neural network modeling, may be interesting and useful to other researchers.
The traditional Mitscherlich height-diameter model, which had the highest fitting precision, was compared with the optimized multiple hidden layered-BP neural network height-diameter model. The BP neural network model appears substantially superior to the Mitscherlich model in predicting tree height (Table 6, Figures 3 and 4). Our results are consistent with those of Castaño-Santamaría et al. [13] and Özçelik et al. [17], who predicted tree height in uneven-aged beech forests in northwestern Spain and for Crimean juniper in the southwestern region of Turkey, respectively. These studies compared neural network models against nonlinear regression models. Although our study is based on a different tree species from those studied by Castaño-Santamaría et al. [13] and Özçelik et al. [17], the comparison results are almost identical, suggesting that neural networks can be the best modeling alternative for tree data regardless of species. Özçelik et al. [17] used a single hidden layer with only one or two hidden nodes in the neural network, and they did not investigate the effects of multiple hidden layers on the precision of the neural network model or the determination of appropriate forms of the transfer functions. Our study is substantially different from the previous studies [13–20], because we proposed a method of selecting the optimal model through application of the "trial and error approach" [36], the k-fold cross-validation approach [37], and the combinatorial optimization approach. It can help determine the structure of the neural network, such as the number of hidden layers, the hidden layer nodes, and the transfer functions. The number of hidden layers, the number of neurons in each layer, and the transfer functions can have substantial effects on the precision of the neural network model.
Castaño-Santamaría et al. [13] considered the change of input factors, but did not take into account other factors, such as different transfer functions of the neural network, and did not describe a process for determining the optimal neural network model. Castro et al. [18] established a multi-layer perceptron neural network growth model for Eucalyptus. They estimated annual mortality with the best structure, which had three neurons in the input layer, four neurons in the hidden layer, and one neuron in the output layer. All these studies [13,17,18] compared precision using different input variables, but none of them compared different transfer functions, numbers of hidden layers, or numbers of neurons of the neural networks.
In our study, neural network modeling produced the highest fitting precision and prediction accuracy, followed by the mixed-effects model, and finally the nonlinear traditional regression models (Table 6, Equation (10)). This might be related to the methods we employed for the structural optimization of the BP neural network. The fitting precision of three different modeling approaches, namely traditional regression, mixed-effects modeling, and neural network modeling, was also compared in previous modeling studies [13,17]. These studies showed the highest precision for mixed-effects models, followed by neural network models and traditional regression models. This may be due to weaknesses in their modeling techniques, whereby they did not investigate the effects of multiple hidden layers on the precision of the neural network models they developed. The results obtained from different modeling studies, including ours, thus indicate an inconsistent ranking of approaches on the basis of their fitting and prediction accuracies. Further investigation of the models, especially those developed with the mixed-effects modeling and neural network modeling approaches using large data sets collected from extensive forest areas, is necessary to better explore their differences. Generally, neural network models have strong robustness, but traditional regression models and mixed-effects models have biological significance; for example, they have parameters describing growth rates and growth patterns. The neural network modeling approach has tremendous advantages, such as avoiding complex selection procedures for the best performing model and obtaining higher precision. Furthermore, BP neural network modeling has a better generalization ability, and therefore can approximate any nonlinear continuous function with high precision.
The BP neural network modeling approach is suitable for describing plant growth, which generally follows nonlinear patterns, and for handling the problems caused by interrelated factors that affect plant growth.
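The comparisons above rest on three fit statistics: the coefficient of determination (R2), root mean squares error (RMSE), and mean absolute error (MAE). The following is a minimal sketch of how they can be computed from observed and predicted heights. The function name `fit_statistics` is hypothetical, and the RMSE here divides by the number of observations, which is an assumption; a degrees-of-freedom correction would change the denominator.

```python
import math

def fit_statistics(observed, predicted):
    """Return R2, RMSE and MAE for paired observed/predicted tree heights.

    Assumption: RMSE uses n in the denominator (no degrees-of-freedom
    correction); adjust if a different definition is required.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)                 # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)    # total sum of squares
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }
```

Applying the same function to the predictions of each candidate model (regression, mixed-effects, or neural network) on the same data keeps the comparison consistent.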
The process of selecting the traditional height-diameter models and the neural network height-diameter models was also evaluated in this study. Traditional regression modeling requires evaluating and comparing the fitting precisions of the candidate models, and the fitted model with the highest precision is selected as the final model. Neural network modeling, on the other hand, requires determining the number of hidden layers, the number of neurons in each layer, and the number and forms of the transfer functions; together, these choices define the best structure of the BP neural network. There is no well-established, robust method for determining the numbers of hidden layers, transfer functions, and neurons that yield high precision. We therefore applied a method based on combinatorial mathematics, and the height-diameter model proposed in this article is based on the multiple hidden layered-BP neural network. This method can help modelers find the best neural network structure and thus obtain the best performance. However, neural network weights and thresholds are not easy to interpret, and determining the optimal structure of a neural network is tedious. The procedure could be implemented as a program module with a user-friendly interface, which would help future modelers quickly determine the best structure of the BP neural network.
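The combinatorial search described above can be sketched as an enumeration of every combination of hidden-layer count, neurons per layer, and transfer function per layer, followed by selection of the candidate with the lowest error. This is an illustrative sketch, not the procedure used in the study: the search bounds, the transfer-function labels other than the logistic sigmoid, and the names `candidate_structures` and `select_best` are all assumptions.

```python
from itertools import product

# Hypothetical search space (assumed bounds and labels, not from the study).
MAX_HIDDEN_LAYERS = 2
NEURON_CHOICES = range(1, 6)                       # 1 to 5 neurons per hidden layer
TRANSFER_FUNCTIONS = ("logsig", "tansig", "purelin")

def candidate_structures(max_layers=MAX_HIDDEN_LAYERS,
                         neurons=NEURON_CHOICES,
                         transfers=TRANSFER_FUNCTIONS):
    """Yield (layer_sizes, layer_transfers) for every hidden-layer combination."""
    for n_layers in range(1, max_layers + 1):
        for sizes in product(neurons, repeat=n_layers):
            for funcs in product(transfers, repeat=n_layers):
                yield sizes, funcs

def select_best(candidates, validation_error):
    """Return the candidate with the smallest error.

    `validation_error` is a caller-supplied function that trains a network
    with the given structure and returns its error (for example, RMSE).
    """
    return min(candidates, key=validation_error)

candidates = list(candidate_structures())
```

With these assumed bounds, the grid contains 15 one-layer and 225 two-layer candidates, 240 in total. The count grows quickly with each added layer or transfer function, which is why the search becomes tedious to carry out by hand.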
Our modeling system introduces a method for determining the best neural network structure, with optimal numbers of hidden layers and neurons in each layer and an optimal number and forms of transfer functions. The model comparison makes sense when the traditional regression models, mixed-effects models, and neural network models have the same input factor, such as DBH in our case. Because DBH is the main factor influencing tree height, only the effect of DBH on tree height was considered for all of the modeling approaches in this study. In our subsequent studies, the effects of other factors on tree height growth may be considered in addition to DBH, making the models more comprehensive and complex. However, introducing many variables does not guarantee high accuracy for models developed with any modeling approach. In neural network modeling, all potential interconnected factors that affect the response variable (tree height, in our case) are assumed to be properly described, and thus this approach can be considered more appropriate than other modeling approaches, such as ordinary least squares regression and mixed-effects modeling. This is because the neural network is able to optimize the model efficiently through the combinatorial optimization process.

Conclusions
We proposed a modeling method that generates several BP neural network models based on combinatorial mathematics; among them, the model with the best structure was selected by comparing fitting and prediction precision and convergence rate. In determining the structure of the neural network, the number of hidden layers, the number of neurons, and the number of transfer functions of the BP neural network were all considered. We used this BP neural network optimization method to establish the optimal tree height-diameter model for poplar plantations in Guangdong Province, China. The optimal BP neural network structure was 1:5:5:1, and the selected transfer functions were logistic sigmoid functions. The BP neural network height-diameter model with the optimal structure accounted for 75% of the variation in the tree height-diameter relationship, which is higher (by 10.3%) than that of the best-fitting traditional regression height-diameter model. The BP neural network height-diameter model also outperformed the mixed-effects height-diameter model. In addition to diameter at breast height, tree height growth is also substantially affected by several other factors, such as site and climate factors and stand conditions, which may be introduced into the height-diameter model to gain higher prediction accuracy. The proposed neural network modeling method may be suitable for other forest modeling studies of similar or different types, such as tree crown modeling, height and diameter increment modeling, and so on.
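To make the 1:5:5:1 notation concrete, the sketch below shows a forward pass through a network with one input (diameter), two hidden layers of five neurons each, and one output (height), using the logistic sigmoid in every layer. Several things are assumptions for illustration only: the weights are random placeholders rather than fitted values, the sigmoid is also applied to the output layer, diameter and height are taken to be scaled to the interval (0, 1), and the names `logsig`, `build_network`, and `forward` are hypothetical.

```python
import math
import random

def logsig(x):
    """Logistic sigmoid transfer function."""
    return 1.0 / (1.0 + math.exp(-x))

def build_network(layer_sizes, rng):
    """Create placeholder weights and thresholds for consecutive layers.

    layer_sizes such as [1, 5, 5, 1] gives three weight layers.
    Values are random, NOT the fitted parameters of any published model.
    """
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-1, 1) for _ in range(n_out)]
        layers.append((weights, biases))
    return layers

def forward(scaled_diameter, layers):
    """Propagate one scaled diameter value through the network."""
    activations = [scaled_diameter]
    for weights, biases in layers:
        activations = [
            logsig(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations[0]   # scaled height in (0, 1)

layers = build_network([1, 5, 5, 1], random.Random(0))
scaled_height = forward(0.5, layers)
```

In practice the weights and thresholds would be obtained by back propagation training, and the scaled output would be transformed back to the original height units.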