Comparison of Modeling Approaches for the Height–diameter Relationship: An Example with Planted Mongolian Pine ( Pinus sylvestris var. mongolica ) Trees in Northeast China

: In the process of modeling height–diameter models for Mongolian pine ( Pinus sylvestris var. mongolica ), the ﬁtting abilities of six models were compared: (1) a basic model with only diameter at breast height (D) as a predictor (BM); (2) a plot-level basic mixed-effects model (BMM); (3) quantile regression with nine quantiles based on BM (BQR); (4) a generalized model with stand or competition covariates (GM); (5) a plot-level generalized mixed-effects model (GMM); and (6) quantile regression with nine quantiles based on GM (GQR). The prediction bias of the developed models was assessed in cases of total tree height (H) predictions with calibration or without calibration. The results showed that extending the Chapman–Richards function with the dominant height and relative size of individual trees improved the prediction accuracy. Prediction accuracy was improved signiﬁcantly when H predictions were calibrated for all models, among which GMM performed best because random effect calibration provided the lowest prediction bias. When at least 8% of the trees were selected from a new plot, relatively accurate and low-cost prediction results were obtained by all models. When predicting the H values of Mongolian pine for a new stand, GMM and BMM were preferable if there were available height measurements for calibration; otherwise, GQR was the best choice.


Introduction
Diameter at breast height (D) and total tree height (H) are the two most important variables for measuring forest inventory, and are often used to estimate site index, biomass, growth and yield, and other important parameters [1][2][3][4][5]. Measuring D is a quick, easy and reliable process, while measuring H is difficult and time-consuming [6]. Moreover, the measurement of H is often severely affected by visual impairment, which may produce considerable bias [7]. Therefore, developing a highly accurate height-diameter (H-D) model is necessary since forest managers prefer to use the H-D model to estimate H. The H-D relationship varies in different growing conditions, as well as with differences in other factors. Research has shown that predicting H with only the D variable is unreliable because trees with similar D values may have different H values, and the base H-D model (BM) is only suitable for homogeneous stands [8]. Numerous studies have shown that the generalized H-D model (GM) has wider applicability than the BM [9,10]. The GM takes stand and competitive factors into account, and the local variation in the relationship between D and H can be explained by extending other stand-level and tree-level variables [11]. The variables expanded to the BM usually include the average height of dominant trees (Hdom) [12], the quadratic mean diameter (Dq) [13], stand density (SD) [14], the ratio of diameter to Dq (Rd) [10], and stand basal area (BAS) [15]. Among these stand variables, Hdom is the most useful because it can represent the combined impact of stand level and site quality [16]. Thus, the accuracy of H predictions can be significantly improved by incorporating Hdom into the model [6,17]. Additionally, competing variables are sometimes introduced into the H-D model to improve prediction accuracy [18]. To date, numerous H-D models have been developed for different tree species around the world [14,[19][20][21][22]. Although current research suggests many different methods for modeling the H-D relationship, there is no unified conclusion as to which method is most useful. Therefore, choosing the most appropriate modeling method to accurately predict H is a problem that urgently needs to be solved.
In the past, ordinary least squares (OLS) regression, which can provide appropriate parameter estimates for the overall average, was widely used to develop the H-D model [8].
In general, H-D data are characterized by a grouping structure of more than one level, such as trees nested within the plots and plots nested within the region, which may influence the H-D relationship. The OLS method is insufficient for explaining potential differences among groups [23]. Consequently, the nonlinear mixed effects (NLME) modeling approach is suitable to account for differences between groups and can reduce predicted bias [24,25]. In NLME, part or all of the model parameters include both fixed effects and random effects. The fixed effects enable the data modeling of typical sample plots, while the random effects explain the differences between each level based on the data structure [26,27]. In addition, NLME can solve the problem of high correlation encountered in datasets with a hierarchical data structure [28], while the OLS method is unable to solve this problem [14]. Therefore, NLME is more popular in the field of forestry [18,29,30].
Quantile regression (QR) was proposed by Koenker and Bassett [31] and has become increasingly popular in forestry research. QR is a method for estimating the conditional distribution of the dependent variable, which allows for more flexibility in expressing the relationship between the response variable and the predictor variable [9]. QR has been applied to self-thinning boundary lines [32], height-diameter [10], taper models [33,34], diameter growth [35], and crown modeling [36,37]. For different quantile curves, two adjacent quantile curves can be used to calibrate the plot-specific H-D curve, which can provide more accurate tree height predictions [10]. As a result, each plot has its H-D curve, which is similar to the NLME approach. QR calibration has been used for tree diameter growth prediction [35], tree height prediction [9,10], branch growth prediction [38], and stand volume growth modeling [39]. Currently, research on QR calibration has not received a great deal of attention, and more diverse data are needed to validate the method. The calibration of model predictions has become a trend in recent research regarding the H-D model. Models based on OLS, NLME, and QR require calibration for the most accurate predictions [10]. For the OLS method, the calibration factor can be constructed and used to calibrate H predictions [40]. For NLME models, the empirical best linear unbiased prediction (EBLUP) method is used to calculate the random effects parameters [41]. The QR method constructs a new H-D curve for each group, which is achieved by combining two consecutive known curves for a specific quantile [38]. With the calibration of model predictions, an additional subsample consisting of H measurements from a new group is indispensable for the three modeling approaches. At present, the most critical issue is the size of the actual calibration subsample used. Therefore, it is important to analyze the effect of different sample sizes on the prediction accuracy of the three modeling methods [38,42,43]. If the subsample is not measured and the model predictions cannot be calibrated, H can still be predicted, but may with a larger bias [10].
Mongolian pine (Pinus sylvestris var. mongolica) is a geographic variety of Scots pine (Pinus sylvestris) and plays an important role in sand fixation, carbon sequestration, water and soil conservation, and timber production [44]. After the successful introduction of Mongolian pine, this tree species has been used in afforestation in large numbers in Northeast China [45]. It has the advantages of being good quality timber and of exhibiting strong growth ability under harsh site conditions [46]. Therefore, it is necessary to provide a variety of management schemes based on tree and stand attributes to maximize the ecological benefits of Mongolian pine [47]. To this end, a systematic study of the H-D curve of Mongolian pine is necessary to provide highly accurate H predictions, which can also help forest managers effectively plan Mongolian pine plantations. Until now, relatively few H-D models of Mongolian pine have been developed in Northeast China; only the H-D model of Mongolian pine grown in sandy areas has been developed [48], and an H-D model of Mongolian pine with wider applicability has not been proposed.
Thus, the objectives of this study were: (1) to evaluate the predictive capability of H-D models developed using the basic model (BM), basic mixed-effects model (BMM), basic nine-quantile regression model (BQR), generalized model (GM), generalized mixed-effects model (GMM), and generalized nine-quantile regression model (GQR); (2) to evaluate the prediction performance of different modeling methods using the jackknife technique; (3) to determine the required sampling percentage for calibration; and (4) to recommend several alternatives, considering their accuracy and applicability.

Study Area and Data Collection
Heilongjiang Province is the northernmost and easternmost provincial administrative region in China (from 121 • 11 -135 • 05 E and from 43 • 26 -53 • 33 N), and it is one of the largest Chinese forestry provinces. The forest area, total amount of forest volume, and timber production in Heilongjiang are among the highest ranked in the country. The area has a temperate continental climate, with a low-temperature and dry spring, warm and rainy summer, waterlogging-prone autumn with an early frost, and a cold, long winter. The annual average temperature is 3.5 • C, the altitude of the mountains is between 50 and 1000 m, and the average annual precipitation is 525 mm.
The data in this study are from pure plantations of Mongolian pine with different ages (12-59 years Forest Farm were selected randomly. In each selected plantation, rectangular or square plots were established and the sample plot sizes ranged from 0.02 to 0.2 ha. The differences in shapes and sizes of sample plots were mainly determined by the stand area, terrain, and other conditions. Finally, a total of 183 plots were established during the last twenty years. The distribution of the plots was representative of the current age range, stand density, and altitude of Mongolian pine. In the field work, the basic properties of all living trees were measured and recorded, such as D, H, tree status, and so on. D was measured using a diameter tape measure, and H was measured by a hypsometer (Vertex IV Ultrasonic, Haglöf, Sweden). A total of 10,841 individual Mongolian pines were measured. A basic summary of the modeling data is presented in Table 1, and the scatter plot between D and H is shown in Figure 1. Additionally, stand and competition variables were calculated, such as dominant tree height (Hdom, the average height of the 100 thickest trees per hectare), basal area (BAS), dominant tree diameter (Ddom, the average D of the 100 thickest trees per hectare), quadratic mean diameter (Dq), stand density (SD), the basal area of trees with larger diameter than the subjected tree (BAL), and relative size in the stand (Rd, ratio of D to Dq).

Basic and Generalized Model
The relationship between H and D has been described using a number of functions. In previous studies, the Chapman-Richards model has been shown to be highly flexible and exhibits satisfactory performance for fitting statistics; this model is one of the most widely used functions for developing H-D models [17,[49][50][51][52]. Therefore, the Chapman-Richards function is used in this study as the form of the BM for Mongolian pine. The specific formula is where H ij and D ij are the total height and diameter at breast height of the jth Mongolian pine in the ith plot, respectively; β 1 -β 3 are the parameters of this model; and the ε ij terms are the residuals.

Basic and Generalized Model
The relationship between H and D has been described using a number of functions. In previous studies, the Chapman-Richards model has been shown to be highly flexible and exhibits satisfactory performance for fitting statistics; this model is one of the most widely used functions for developing H-D models [17,[49][50][51][52]. Therefore, the Chapman-Richards function is used in this study as the form of the BM for Mongolian pine. The specific formula is where and are the total height and diameter at breast height of the th Mongolian pine in the th plot, respectively; 1 -3 are the parameters of this model; and the terms are the residuals.
The H-D relationship is greatly affected by stand characteristics and competition status [29]. Therefore, it is necessary to extend the BM to the GM by introducing covariates. To consider the effects of stand and competition variables on the H-D relationship, we took all variables in Table 1 as candidate variables. After evaluating the effect of stand variables and competing variables on the H-D relationship by a two-step covariate selection method, we developed a GM to predict H. First, we used the BM to fit the data of each plot and obtain the corresponding parameter estimates. Then, the relationship be- The H-D relationship is greatly affected by stand characteristics and competition status [29]. Therefore, it is necessary to extend the BM to the GM by introducing covariates. To consider the effects of stand and competition variables on the H-D relationship, we took all variables in Table 1 as candidate variables. After evaluating the effect of stand variables and competing variables on the H-D relationship by a two-step covariate selection method, we developed a GM to predict H. First, we used the BM to fit the data of each plot and obtain the corresponding parameter estimates. Then, the relationship between the model coefficients and candidate variables was assessed through scatter plots and correlation analysis [50]. The stand factor and competitive factor introduced need to be significant and the fitting accuracy of the GM needs to be significantly improved. In addition, a collinearity analysis that make sure the independence of the variables was also needed.
As a result, Hdom and Rd were selected as the best covariates, as the inclusion of the two variables outperformed other candidates and the corresponding model coefficients were significant (at p < 0.05 level) according to the approximate t-statistic. In addition, the condition number (CN) was used as a collinearity diagnostic indicator to assess the collinearity effects, which indicated that there is no multicollinearity among the variables, using the CN = 30 as a threshold value [53]. Indeed, the CN of the GM was 21.33, which suggested that multicollinearity is not detected. Finally, the specific form of GM was as follows: where β 1 -β 5 are the parameters of this model, and the others were defined above. Temesgen et al. [40] showed that predictions of ordinary models can be calibrated by a calibration factor, which can reduce a certain bias. According to the study of Temesgen et al. [40], the following calibration factors can be calculated when an H subsample is obtained: where c f is the calibration factor,Ĥ i is the H prediction made by the BM or GM, H i is the observed H of tree ith in plot k, and n k is the size of the subsample in plot k. Then the calibrated H predictions (H C ) can be expressed as follows: whereĤ i is the prediction value of the BM or GM, and H C represents the predicted value after calibration.

Basic and Generalized Mixed-Effects Models
NLME models can handle the intrinsic variation and nested structure of sample data. In NLME, all parameters in the model can be represented as fixed effect parameters, and some or all the parameters can contain additional random components. The single-level NLME form was as follows [54]: where m is the total number of plots; Φ i represents the formal parameter vector, which includes the fixed-effect parameter vector β and random-effect parameter vector b i of the ith sample plot; and symbols A i and B i are the design matrices for β and b i , respectively. y i and x i represent the response and independent variable vectors, respectively, and ε i is normally distributed within the group error. b i and ε i follow a normal distribution: where Ψ is the variance-covariance matrix of b i , R i is the within-plot variance-covariance matrix, σ 2 is the residual variance common to all plots, G i accounts for a within-plot variance heteroscedasticity, and Γ i is the identity matrix and accounts for the withinplot autocorrelation structure of residual errors. Since the H-D model was prone to heteroscedasticity, a power variance function was applied to correct for variance heterogeneity (Equation (10)) [55,56]. In addition, the heteroscedasticity of the Basic and Generalized models can also be solved by the same weighting function.
where δ is the estimated parameter and ε ij andĤ ij are the residuals and H predictions, respectively, obtained by fixed-effect unweighted regression. In NLME, there were two different scenarios of model prediction: fixed effects and a combination of fixed and random effects [57,58]. The NLME model results without random effects were referred to as uncalibrated predictions; conversely, the model results using random effects were referred to as calibrated predictions. Estimates of random effects parameters were calculated using the EBLUP method, where the detailed formula was [59][60][61][62] u where u k is a vector of the random effects for plot k, Ψ is the estimate of the variancecovariance matrix for the random effects, R k is the estimated variance-covariance matrix of within-group errors, Z k is the design matrix of the partial derivatives of the nonlinear function corresponding to the random parameters, Z T K represents the transposition of Z k and e k is the vector of residuals calculated by the fixed-effects predictions and observations.

Basic and Generalized Quantile Regression Models
QR can accurately estimate the complete conditional distribution of the dependent variable and can assess the predictors' effects at different quantiles [63]. Based on Equations (1) and (2), the basic and generalized QR models were as follows: whereĤ τ is the predicted tree H at the τth quantile and τ is the quantile. The quantile regression parameters were estimated by minimizing Equation (14), which was different from minimizing the sum of squared deviations in the OLS technique [64].
where S is the sum of the weighted residuals of the τth quantile and y ij andŷ τ D ij are the measured H and predicted H in the τth quantile, respectively. Model predictions of QR can be calibrated using different quantile curves. In this study, the nine-quantile regression (τ = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9) was found to generate better statistical results than other combinations of quantiles because the current data were too scattered. In predicting Hs for plot k by the QR model, the first step was to identify the two correct closest quantile regression curves based on the smallest absolute mean difference between observed and QR predicted Hs of the subsampled trees measured from plot k. Then, two consecutive quantile regressions (i.e., qth and (q + 0.1)th) were chosen for interpolation of the H-D curve: whereŷ ki D ij indicates the predicted H of the ith tree in plot k,ŷ q D ij andŷ q+0.1 D ij are the QR predictions in the qth and (q + 0.1)th quantiles when D is D ij , and q can be 0.1, 0.2, . . . ,0.8. α represents the interpolated ratio and was obtained by minimizing [35]:  (6) GQR. In the current cases, given the lack of independent data sets, data splitting is unreasonable. Therefore, the use of cross validation techniques, such as the jackknife technique, was considered to be an appropriate way to evaluate the prediction accuracy of a model [65,66]. Thus, the jackknife technique [38,67] was applied to test the prediction ability of the developed models in both calibrated and uncalibrated cases, where the models were fitted using leave-one-out sample plots (i.e., m-1 sample plots), and the fitted model was used to predict each tree height in the excluded sample plot. Three indices were used for the assessment of fitting performance in this study: adjusted coefficient of determination (R 2 a ), root mean square error (RMSE), and Akaike information criterion (AIC). Prediction bias was evaluated by two statistics: mean absolute percent error (MAPE) and mean absolute error (MAE).
where m is the total number of sampled plots; n i is the size of the ith plot; LL was the log likelihood value, p is the number of effective parameters; y ij andŷ ij are the observed and fitted tree heights, respectively; andŷ ij,−i is the predicted tree height obtained by the jackknife technique.
To obtain H predictions with higher accuracy for a new plot, the predictions of all six models were calibrated once there was an H subsample. It was reported that the prediction accuracy improved as the sample size of measured subsamples increased [9,10]. However, more measurements require more work, which would increase the cost and time of fieldwork. Therefore, it was necessary to find an appropriate size that considered both the cost of the survey and the prediction accuracy. In this study, a percentage of trees in each plot were sampled for calibration to improve the prediction ability as there were obvious differences in the sizes of the plots. l (l = 2%, 4%, 6%, 8%, . . . , 20%) sample trees were selected to adjust the calibration coefficients, random effects, and two specific continuous quantile curves to compare the predictive performance of the H-D models. The selection was a random process without replacement, and the calibration was repeated 200 times in each subsample size [6,68].

Model Fitting
A preliminary result for the analysis of random effects parameters showed that the optimal forms of the BMM (Equation (22)) and GMM (Equation (23)) were expressed as follows: where the variables are as defined above, β 1 -β 5 are fixed-effects parameters, u 1 -u 3 are random effects parameters. Among the BM, BMM, GM, and GMM, the BM exhibited heteroscedasticity and a weighted regression was applied with Equation (10), while others had homogeneous variance (Figure 2). The parameter estimates of the BM, BMM, and the basic nine quantile regressions are listed in Table 2, and the parameter estimates of the GM, GMM, and the generalized nine quantile regressions are given in Table 3. All parameters were statistically  Table 4. BM described 71.76% of the variation when only D was used as a predictor. After the introduction of other variables (Hdom and Rd) in BM, the model fit was significantly improved, as R 2 a increased by 34.04% (from 0.7176 to 0.9619), RMSE decreased by 63.28% (from 3.4644 to 1.2718), and AIC decreased by 36.60%. After the introduction of random effects in the BM and GM, the fit of the model was further improved, which had a higher value of R 2 a and lower values of RMSE and AIC. The AIC value of the GMM was lower than that of the BMM; however, the R 2 a and RMSE of the BMM were similar to (or even better than) the GMM results. In addition, based on the fit statistics for BQR (τ = 0.5) and GQR (τ = 0.5), all indicators were similar to those of the BM and GM, respectively, with the exception of a large gap between the AIC of BQR (τ = 0.5) and that of the BM.
where the variables are as defined above, 1 -5 are fixed-effects parameters, 1 -3 are random effects parameters.
Among the BM, BMM, GM, and GMM, the BM exhibited heteroscedasticity and a weighted regression was applied with Equation (10), while others had homogeneous variance ( Figure 2). The parameter estimates of the BM, BMM, and the basic nine quantile regressions are listed in Table 2, and the parameter estimates of the GM, GMM, and the generalized nine quantile regressions are given in Table 3. All parameters were statistically significant, with p values < 0.05. The fit statistics of the six models are shown in Table  4. BM described 71.76% of the variation when only D was used as a predictor. After the introduction of other variables (Hdom and Rd) in BM, the model fit was significantly improved, as 2 increased by 34.04% (from 0.7176 to 0.9619), RMSE decreased by 63.28% (from 3.4644 to 1.2718), and AIC decreased by 36.60%. After the introduction of random effects in the BM and GM, the fit of the model was further improved, which had a higher value of 2 and lower values of RMSE and AIC. The AIC value of the GMM was lower than that of the BMM; however, the 2 and RMSE of the BMM were similar to (or even better than) the GMM results. In addition, based on the fit statistics for BQR (τ = 0.5) and GQR (τ = 0.5), all indicators were similar to those of the BM and GM, respectively, with the exception of a large gap between the AIC of BQR (τ = 0.5) and that of the BM.

Model Evaluation and Prediction Comparison
The results of the evaluation of the six models are listed in Table 5. The performance of the BM was relatively poor when compared to other multivariate equations, which suggested that using D alone as a predictor is not sufficient to explain the variation in H. When all methods were uncalibrated, all models with D as the only predictor had low prediction accuracy, and the highest prediction accuracy was obtained with BQR (τ = 0.5), followed by the BM and BMM. In uncalibrated methods with multiple predictors, the GQR model (τ = 0.5) had the best model performance when compared with the other two models, as the MAPE and MAE were 6.9132% and 0.9706, respectively. The performance of the GM was worse than that of the GQR model but slightly better than that of the GMM. Overall, the GQR model had the highest prediction accuracy, followed by the GM, GMM, BQR model, BM, and BMM in the case of using models without calibration.
When all methods were calibrated, in calibrated methods containing only D as a predictor, the prediction accuracy of the calibrated BMM and calibrated BQR model was better than that of the calibrated BM, while the prediction accuracy of the calibrated BMM was higher than that of the BQR model (Table 5). In the calibrated methods with multiple predictors, the prediction accuracies of the calibrated GMM and calibrated GQR model were better than that of the calibrated GM, while the prediction accuracy of the calibrated GMM was higher than that of the GQR model. When considering all calibration models, the GMM had the highest prediction accuracy, followed by the GQR model, GM, BMM, BQR model, and BM. Scatter plots of measured tree heights and calibrated predicted tree heights are shown in Figure 3.    Figure 4 shows the statistics for calibrating the six developed models using different sampling percentages. The prediction performance of each model improved as the sampling percentage per plot increased. Compared to the case where the subsample was zero and, thus, the model was uncalibrated, the BM, BMM, BQR model, and GMM performed better based on the MAPE and MAE values. The calibrated GQR model and calibrated GM yielded better statistics than the corresponding uncalibrated models when measuring 4% and 6% of trees per plot, respectively ( Table 6). Regardless of the sampling percentage, the GMM always exhibited better predictive performance than the other models, followed by the GQR model, GM, BMM, BQR model, and BM. Thus, tree height predictions cali-  Figure 4 shows the statistics for calibrating the six developed models using different sampling percentages. The prediction performance of each model improved as the sampling percentage per plot increased. Compared to the case where the subsample was zero and, thus, the model was uncalibrated, the BM, BMM, BQR model, and GMM performed better based on the MAPE and MAE values. The calibrated GQR model and calibrated GM yielded better statistics than the corresponding uncalibrated models when measuring 4% and 6% of trees per plot, respectively ( Table 6). Regardless of the sampling percentage, the GMM always exhibited better predictive performance than the other models, followed by the GQR model, GM, BMM, BQR model, and BM. Thus, tree height predictions calibrated by sampling 8% of trees per plot were optimal in terms of the prediction accuracy and considering the measurement costs in our study.  Figure 4 shows the statistics for calibrating the six developed models using different sampling percentages. The prediction performance of each model improved as the sampling percentage per plot increased. Compared to the case where the subsample was zero and, thus, the model was uncalibrated, the BM, BMM, BQR model, and GMM performed better based on the MAPE and MAE values. The calibrated GQR model and calibrated GM yielded better statistics than the corresponding uncalibrated models when measuring 4% and 6% of trees per plot, respectively ( Table 6). Regardless of the sampling percentage, the GMM always exhibited better predictive performance than the other models, followed by the GQR model, GM, BMM, BQR model, and BM. Thus, tree height predictions calibrated by sampling 8% of trees per plot were optimal in terms of the prediction accuracy and considering the measurement costs in our study.   A further comparison was carried out using trees with a variety of diameters and calibrated based on 8% of the trees in each plot. Boxplots of MAPE and MAE across D classes (taking 4 cm as a class) were used to compare the BM, BMM, BQR model, GM, GMM, and GQR model in terms of prediction performance and stability ( Figure 5). The trees with D values greater than 36 cm were classified into 34 diameter classes, as there were few trees that in which D was greater than 36 cm. Both BM and BQR exhibited large changes and relatively poor prediction performance in all D classes predicted at the tree level in terms of MAPE. For the BMM, GM, GMM, and GQR model, the mean MAPE in different D classes decreased with increasing D, and the mean MAPE in different D classes for the GMM was closer to 0 than those of other models, which had a narrower distribution. The MAE, BM and BQR model yielded poor prediction results in different D classes, and the MAE distribution range was larger. In addition, the BMM, GM, GMM, and GQR model were relatively stable in predicting tree height in different D classes. The MAE distribution range of the GMM was always narrowest in different D classes; therefore, a higher prediction accuracy was obtained. Overall, the GMM outperformed other models in predicting individual tree heights, followed by the GQR model, GM, BMM, BQR model, and BM.

Specification of a Generalized H-D Model
H-D models are often used to predict individual H, which is necessary for estimating tree volume and predicting stand development over time [14]. Therefore, it is particularly important to determine the most appropriate method for developing H-D models. Thus, this study used the H-D dataset of Mongolian pine to develop H-D models and compared the predictive power of OLS, NLME and QR approaches. The Chapman-Richards function has been recognized as one of the most widely used functions in the H-D model due to its flexibility in current studies [50,51]. Previous studies have shown that there are cer-

Specification of a Generalized H-D Model
H-D models are often used to predict individual H, which is necessary for estimating tree volume and predicting stand development over time [14]. Therefore, it is particularly important to determine the most appropriate method for developing H-D models. Thus, this study used the H-D dataset of Mongolian pine to develop H-D models and compared the predictive power of OLS, NLME and QR approaches. The Chapman-Richards function has been recognized as one of the most widely used functions in the H-D model due to its flexibility in current studies [50,51]. Previous studies have shown that there are certain differences in the H-D relationship between different stands, and the BM with only D as the predictor was only applicable to the stand for which the data were collected [17,69]. Therefore, a GM with wider applicability was proposed by introducing significant standlevel covariates or competition indices into the basic Chapman-Richards model [9]. In the GM of this study, the asymptotic parameters were best explained by the stand-related variable Hdom, as Hdom can represent the pros and cons of stand quality [70]. In addition, Hdom represented the developmental stage of the stand. In the H-D model, Hdom is often used as a covariate to reduce the bias of H prediction, [29,51,68]. For the tree-level competing factors that affect the H-D relationship, we chose Rd as the representative variable, and the larger the RD, the stronger the competitiveness of the tree. We incorporated Rd into the asymptote parameter as well. Additionally, the independence of all the variables (i.e., D, HDOM, and RD) was further tested by using the CN indictor, which would influence the inferences made about significance and parameter estimates [71,72]. It was found that the CN of the GM was smaller than 30, which suggested that there was no multicollinearity. The final form of GM was a Chapman-Richards function with Hdom and Rd, which had a high degree of flexibility and could further explain the variation in H for similar values of D in different stands.

Comparison of Modeling Approaches
There are still many stochastic variabilities in the H-D relationship controlled by other factors, such as soil, terrain, and climate factors [28,70,73]. The useful variables are sometimes difficult to find, which suggests that the GM may be not sufficient to explain all the differences between stands [70]. Therefore, this study developed H-D models by using the NLME and QR methods. Plot-level random effects can accurately express the H-D relationship. In addition, a set of QR models can accurately reveal possible relationships that standard regression analysis is unable to determine [38]. Different quantile curves can provide the H-D relationship of a certain plot more flexibly.
The fit indices of the BMM and GMM were somewhat unexpected, the statistics R 2 a and RMSE indicated that the BMM has a better fit than the GMM, while the AIC of the GMM was better than that of the BMM, which was consistent with Han et al. [18,48]. A possible explanation for why the GMM does not provide better precision than the BMM is related to the ability of the random-effects parameter to explain the various H-D relationships among stands. In our research, both the stand variable and the random effects were introduced alongside the asymptote coefficient, and the random effects of the BMM were significantly larger than those of the GMM [48].
Among the three modeling methods OLS, NLME, and QR, it was found that the developed NLME model and QR model were better than the OLS model because NLME and QR took into account the potential differences between plots [39]. In addition, the OLS and NLME methods for H-D modeling have been widely studied, while the use of QR has been relatively limited [7]. In our study, the results showed that the GMM and GQR model achieved accurate predictions with calibration. Generally, QR is more flexible than the NLME model because it can provide different parameter estimates based on different quantiles [10,38], and it has the ability to provide different H-D curves, as shown in Tables 2 and 3. However, evaluation statistics showed that the prediction accuracy of the calibrated NLME model was better than that of the calibrated QR model (Table 5). Özçelik et al. [9] found that QR curves from different plots generally follow the basic shapes dictated by the QR model, and do not accept the fact that some of these QR curves might cross one another. In contrast, the plot is used as a grouping variable and enables specific and accurate predictions in NLME. The random effects in the NLME model can describe the differences among plots and account for variation in the H-D curve between plots, which improves the prediction accuracy of NLME model [69,74,75]. Nevertheless, QR still has its own advantages; QR can robustly estimate the extreme values of the response variable when the data are highly scattered and contain certain outliers in this case, and it is suitable to apply QR for analysis [76]. In addition, QR not only does not have high data requirements but also does not require multilevel structural data. The calibration of a QR model can be based on any category of classification, so even if QR prediction accuracy is slightly worse than that of NLME, when there is no obvious hierarchical structure (for example, there is no clustering by plot), QR models are more flexible and appropriate.
As shown in Table 5, calibrating the H predictions of all models using all trees per plot improved the prediction accuracy compared to the case that was not calibrated. However, it is very difficult and impractical to measure H for all trees in a forest inventory, and it is unreasonable to measure many samples for the calibration of H predictions [68]. Because of the uneven distribution of plot area in the data in this study, it may be a more reasonable approach to perform sampling calibration on the percentage of trees in each plot, which can more accurately predict the H of all trees in each plot. Sampling is a simulation of the actual survey situation, and the measured subsample can be used to adjust the calibration coefficient, random effect, or interpolation ratio of the two consecutive quantile regression curves, and effectively improve model estimation accuracy. Based on the results of OLS, NLME, and QR with calibration, the prediction accuracy of NLME was always higher than that of the OLS and QR methods regardless of the sample sizes (Table 6), which was consistent with Özçelik et al. [9]. In addition, the calibrated OLS prediction accuracy was the worst. For the calibration of the BM and GM, the calibration factor was the same for the predicted H of each tree in the plot, indicating that the default bias was the same; this was different from the NLME and QR methods and may be the reason for the poor performance of the the BM and GM's calibration. In practice, it is crucial to determine the number of suitable measurement samples due to the high cost of sampling. Figure 4 shows that with the increase in the sampling percentage of each plot, the change curve of each index value showed a steep trend and then gradually became a straight line. In addition, the study found that for the GQR model and the GM, 4% and 6% of trees were selected as subsamples in each plot, respectively, and the prediction accuracy was better than that of the corresponding uncalibrated model, which was different from the GMM (Table 6). Furthermore, when a smaller percentage of trees was selected for calibration from each plot, the GMM was found to have certain advantages, and the prediction accuracy was significantly better than that of the GQR model and the GM; therefore, NLME was indeed the best recommended technique. Considering the sampling cost and prediction accuracy of the model, we believe that sampling 8% (or more) of trees per plot for calibration is sufficient to provide high-accuracy predictions. The calculation was based on the number of trees and the number of sample plots in this study, and was equivalent to taking an average of five trees from each sample plot for calibration calculation.
The statistical measures presented in this paper are based on the data used for the regression analysis, only assessing the quality of the fitting process. The model fit is currently assessed but not the error of the model application. In the future, it is possible to validate the developed models by using an independent data set to evaluate the quality of the predictions in real world applications. Hence, in a follow-up research, we will also focus on the combination of quantile regression and mixed effects models to explore a new method, namely, the QRNLMM. At present, the generalized QRNLMM has some problems in developing models, such as difficulty obtaining convergence, the calculation principle of random effects, and the prediction of QRNLMM, which are all required for making further explorations. In conclusion, we aim to explore the prediction of H using the QRNLMM, including calibration with different subsamples, in future studies to minimize the burden of field measurements.

Conclusions
In this study, six modeling approaches, the BM, BMM, BQR model, GM, GMM, and GQR model, were used to develop an H-D model based on the H-D data from 183 plots in Mongolian pine plantations in Northeast China, and were further compared. All models benefited from calibration when compared with uncalibrated models, and the calibrated models exhibited more stable performance and higher prediction accuracy. Among the calibrated models, the GMM obtained the highest prediction accuracy, followed by the GQR model, GM, BMM, BQR model, and BM; when there were no available H measurements for calibration, the GQR model provided the most accurate H predictions, suggesting that the calibration of random effects has advantages over both the combination of quantile regressions with different quantiles and calibration factors for tree-specific H prediction. The variation in prediction accuracy with increasing calibration sample size was assessed, and we found that the appropriate sample size for balancing both prediction accuracy and investigation cost was achieved when 8% of trees were randomly selected from a plot. Overall, the mixed-effects model showed the best performance for predicting H. However, the proposed models were developed for Mongolian pine plantations in Northeast China, and caution should be exercised when applying the newly developed models for other tree species or other areas.