A Flexible Height–Diameter Model for Tree Height Imputation on Forest Inventory Sample Plots Using Repeated Measures from the Past

: In this study, height–diameter relations were modeled using two different mixed model types for imputation of missing heights from longitudinal data. Model Type A had a hierarchical structure of sample plot-specific and measurement occasion-specific random effects. In Model Type B, a possible temporal variance was modeled by a sample plot-specific linear time trend. Furthermore, various calibration strategies of random effects were performed on past and current data, and a combination of both. The performance of the mixed models was compared on independent data using bias and root mean square error (RMSE). The results showed that Model Type A achieved the highest precision (lowest RMSE), if sample plot-specific random effects were predicted from old data and measurement occasion-specific ones were predicted from new data. In comparison, however, Model Type B had a higher RMSE, and lower bias. Model performance was almost unaffected from the usage of past or current data for the prediction of random effects. Results revealed that a certain calibration strategy should be simultaneously applied to all random effects from the same hierarchy level. Otherwise, predictions would become imprecise and a serious bias may result. In comparison with traditional uniform height curves, the novel mixed model approach performed slightly better.


Introduction
The estimation of growing stock timber volume can be regarded as a major purpose of traditional forest inventories [1]. In traditional forest inventories with multiple sample plots, the total growing stock of a forest landscape is usually estimated by the Horwitz-Thompson estimator applied to single tree stem volumes together with the trees' inclusion probabilities. The direct measurement of tree volume is, however, not feasible in practice and single tree volume is usually estimated by allometric relationships or regression functions dependent on easy-to-measure auxiliary variables, such as tree height (h) and diameter at breast height (d); the reader is referred for further details to standard textbooks on forest mensuration, such as Kershaw et al. [2]. Compared with the diameter measurement, the measurement of tree height is time-consuming and has larger relative measurement errors. Tree height is therefore often measured only for a subsample per sample plot and the heights of the other trees are imputed. Tree heights may be estimated using functions solely based on d. This type of function is termed "local" [3] or "simple" [4] height-diameter curve. These equations do not include additional variables that may influence the height-diameter relation in different stands. The h − d relationship, however, can vary with environmental conditions and stand structure [5,6] and h − d curves have to be updated for a new survey. In addition, retaining the same sample trees for h − d curve h − d data from the past inventories. It was further examined whether random effects on the sample plot level remain stationary or whether a time trend can be detected, which could be modeled by additional fixed effects. Furthermore, different strategies for the calibration were compared. In the different calibration strategies, only a specific subset of the random effects as predicted by means of new data and the other random effects were adopted from the past. The performance of the different candidate models was examined by means of independent data, which was not used for the model fitting. Finally, a comparison with the traditional uniform height-curve approach was conducted.

Materials and Methods
The BOKU Institute of Forest Growth maintains a permanently repeated forest inventory in the forest district Ofenbach located in the Austrian pre Alpes, in the federal state of Lower Austria nearby the village Forchtenstein. The forest inventory was initiated by Schodterer [28] in 1989 and comprises in total 554 sample plot locations. Since then, approximately one fifth of the total sample size has been re-measured in each year, meaning that a complete follow-up of the entire sample has been conducted during a five-year interval. From 1989 to 2016, sampling plots were measured on five to six occasions. The most recent inventory cycle with the sixth measurement occasion includes so far only recordings for three years (2014, 2015, and 2016) meaning that data from 2017 and 2018 are missing before the follow-up is completed. Therefore, the sixth measurement period was discarded from our data analysis. In addition, inconsistencies with respect to the tree numbering between the first and the second measurement existed and could not be solved properly, because old data collection forms are lacking. Thus, data from the first measurement occasion were also discarded from the analysis.
Sample plots are systematically aligned in a regular grid having a mesh width of 141.4 m × 141.4 m. At each sample point, Bitterlich relascope sampling [29][30][31] was conducted using a basal area factor of 4 m 2 per ha. Trees were only recorded if their diameter was greater than or equal to a lower threshold of 5 cm.
Site, stand and tree variables measured at each sample point are shown in Table 1, although it should be noted that not all of the listed features were addressed on each measurement occasion during the entire period from 1989 to 2016. Furthermore, the following variables were additionally calculated additional by means of the forest to the inventory sample data: basal area, and stem number per hectare, stand density index (SDI), quadratic mean diameter (QMD) and d/QMD ratio. The data were split into a model dataset used for model fitting and an extra dataset for evaluation purpose. The model dataset comprised 1491 sample plots measured in three periods (1994-1998: 468 plots; 1999-2003: 510 plots; and 2004-2008: 513 plots). The number of sample points varied between measurement occasions due to clear cuts in some forest stands. Data mining was necessary and a few outliers and missing values of the tree and stand variables were corrected. For a simple and clear overview, Tables 2 and 3 show summary statistics (number of observations, mean, min, and max) of the observed heights and the covariates used in the final models for both datasets.  Tables 4 and 5 contain summary statistics for tree species groups, which share a similar growth pattern in the survey area and which possess statistically not significant differences with respect to their h − d ratios. The variables include the main regressor-variable d, the diameter of median basal area tree per species (dm), the height of median basal area tree per species (hm), and the tree height (h) as response variables. Figure 1 shows a smoothed scatterplot of the observed height-diameter pairs of the model dataset. Scatterplots of the different species groups can be found in Appendix A.

Basic Model and Model Variants
Many height-diameter equations have been proposed in several studies. A proper heightdiameter model possesses some important characteristics (e.g., [11]). The ℎcurve is monotonically increasing and has a functional inflection point as well as an asymptote. Parameter-parsimonious models that meet these requirements are the "Petterson function" (e.g., [32]) and the "Näslund function" (e.g., [33]). The Petterson function is commonly applied in German speaking countries of Central Europe and the Näslund function is applied in Nordic countries.

Basic Model and Model Variants
Many height-diameter equations have been proposed in several studies. A proper height-diameter model possesses some important characteristics (e.g., [11]). The h − d curve is monotonically increasing and has a functional inflection point as well as an asymptote. Parameter-parsimonious models that meet these requirements are the "Petterson function" (e.g., [32]) and the "Näslund function" (e.g., [33]). The Petterson function is commonly applied in German speaking countries of Central Europe and the Näslund function is applied in Nordic countries.
According to findings in [34][35][36][37] and recent suggestions by Mehtätalo et al. [4], the Näslund function was chosen in our work: where h ijk and d ijk are the total height (in m) and diameter at breast height (in cm) on the k-th measurement occasion of tree j at sample plot i, respectively. Parameters α, β and γ are fixed effects.
As fitting a hierarchical nonlinear mixed model often becomes challenging [18,38], the base function in Equation (1) was linearized. For parameter γ, a grid search was performed with regard to the minimum of bias and RMSE. After setting parameter γ fixed, linear mixed model theory was applied to the transform Hereof, y ijk is the transformed response. The process of model construction was strictly based on guidelines outlined in the standard textbook of Pinheiro and Bates [39]: first, the structure of the random effects has to be defined and tested via LRTs (likelihood ratio test), and, afterwards, the fixed effects were selected guided by Wald-tests [40]. In mixed models, random effects can affect both the absolute term (intercept) and the slope. A first simple model that integrates the random effects into the basic linear function (Equation (2)) is: where β 0 is the fixed intercept and β 1 is the fixed slope coefficient for d ijk , and d is the diameter of tree j on sample plot i at measurement occasion k. Both terms provide an estimate for the conditional expectation of the population mean of the response variable. a i is a random effect on the sample plot level, i.e., the random deviation on the sample plot level around the intercept β 0 . ijk is the residual error. That is, A1 considers the differences in the intercept between the individual sample plots. For the next step of the model construction, it is assumed that not only the intercept, but also the slope of the height curve can vary among the sample plots: with b (1) i being the random deviation of the coefficient for d ijk . Since data comprised several measurement periods, the model should also consider a possible temporal variation of the parameters. Thus, an additional random effect is introduced, which considers the grouping with respect to the measurement occasions: where a ik and b (1) ik are additional random effects for the measurement occasion k. The R [41] package "nlme" [39] was used for fitting the models. However, nlme could not be applied for the calibration of Model A3 by means of new independent data. This is because nlme does not provide a straightforward implementation of the random effect prediction when new units for a grouping level occur. This case simply happened, as a future measurement occasion always represents a new unit k for a ik and b ik . Thus, a little extra algebra was necessary for the evaluation of the BLUPs. As an alternative solution to the extra programming effort associated with the calibration of Model A3, the h − d-model was further reformulated. Whereas it is assumed for Model A3 that the random effects a ik and b ik are independently distributed among the different measurement occasions, the reformulations assume that the model parameters follow a linear temporal trend. The latter assumption then no longer requires random effects on the measurement-occasion level. This approach was implemented by adding the measurement occasion m as an extra fixed effect to Model A1 (Equation (3)) and Model A2 (Equation (4)), meaning that a population-wide time trend is modeled for the parameters. The models are further enhanced by extra sample plot level random effects, which allow the temporal trend to vary among the sample plots. With these prerequisites, the models were specified as Model Type B: B2 : According to the generalized h − d curve approach, further regressor covariates were tested as fixed effects. The random effects at each level were plotted against all covariates and carefully examined for any nonlinear effect. In consequence, only the dm was transformed logarithmically. Diagnostic graphs for the other covariates did not show any relevant trends. The candidate variables were d, m (measurement occasion), logarithmic dm (diameter median basal area tree per species), hm (height median basal area tree per species), dummy categories for the different tree species groups, interaction effects between tree species groups and d, dummies for the mixture, SDI (Stand Density Index) [42], QMD (quadratic mean diameter), d/QMD ratio, basal area per hectare and stem number per hectare, elevation, slope and aspect. Based on Wald tests, the d, the measurement occasion, the logarithmic dm, the hm, the fixed offset effects for dummy-coded tree species groups, the interaction between tree species groups and d, the QMD and the d/QMD ratio proved to be as significant. However, the QMD was strongly correlated with the dm and did not lead to any improvement when the model was used for predictions. The latter also applied to the QMD/d ratio, which did not show any biological logics in connection with h − d curves. Furthermore, the models were kept as simple as possible to produce reliable prediction results; and because of that the two variables were not included in the full models. The respective variables were considered with fixed effects in two full models, which differed with respect to their specification of the variance among the measurement occasions. Model B3 is analogous to Model B2 and also includes a sample plot level random time trend for the intercept parameter as well as for the slope dependent on d.
Model A4 is analogous to A3 and incorporates the measurement occasion itself as a random effect.
y ijk is the transformed tree height and β 0 , . . . , β 14 are fixed effects. d ijk is the diameter at breast height, m is the measurement occasion, log(dm ik ) is the logarithmic diameter of the median basal area tree per tree species, hm ik is the height of the median basal area tree per tree species and I{spec = · · ·} is the dummy categorical variable for the respective tree species. The order of tree species groups in Equations (9) and (10)  (2) i are sample plot level random effects; the former acts on the slope dependent on d ijk and the latter reflects the sample plot specific temporal trend of the model intercept. In Model A4, a ik is a random intercept-effect and b (1) ik is a random slope-effect on the measurement-occasion level. ijk is the residual error.

Matrix Formulation
The linear mixed model can be written in matrix notation [15] as where y is a column vector of the response values having length n (number of observations), β is a vector of the total p fixed effects, b is a vector of the total q Gaussian random effects, and is an independent and identically distributed residual Gaussian error. The model matrix X has dimension n × p and contains the values of the regressor covariables and columns of one-values for fixed effect intercept parameters as well as for dummy-categorical regressor variables. Matrix Z is a n × q design matrix containing the covariate values associated with the respective random slope-effects and one-values for the random intercept-effects. The summand Xβ is the fixed part of the model and Zb + is the random part. It is assumed that the vector of random effects b is normally distributed with b ∼ N(0, D) with D denoting the q × q variance-covariance matrix of the random effects. It is assumed that is normally distributed with ∼ N(0, R) with R being the corresponding n × n variance matrix containing var( ) on the diagonal. Because residual errors of different trees are assumed to be uncorrelated, the off-diagonal elements of R have zero values.

Estimation of Fixed Effects and Model Calibration via BLUP
The fixed effects and the variances and covariances of the random effects were estimated using restricted maximum likelihood methods (REML) [38,39]. Hypothesis testing for the fixed effects is performed using Wald [40] tests sequentially for each of the parameters. The null hypothesis is H 0 : β = 0 and the alternative is The linear mixed model approach allows for model calibration by means of prior information [15,17]. Different kinds of information of the predictors in the fixed part combined with additional local height measurements can be used to improve the prediction accuracy. The random effects in b were predicted via BLUPs: D contains the REML-estimates of the covariances,R the estimated residual variance andβ the estimates of the fixed effects. The BLUP can be either applied to response observations, which have been used for the model fitting or to new observations from independent data. In the latter case, y is substituted by y* denoting any prior information from existing h − d measurements. An example of matrix algebra can be found in Appendix B.
The calibration of the mixed models was done by means of old and new data. The old data represent the model dataset and the evaluation dataset was formed by new independent measurements, which were not used for model fitting. For a comparison under fair conditions, the calibration by means of new data and the adjustment of the uhc were performed exclusively based on tree height information from the median basal area trees per species. For Model B3 in total eight calibration strategies exist, among them two extremes: one calibration variant in which all random effects were adopted from predictions by means of old model data and the other extreme for which all random effects were "re-predicted" by means of the new independent evaluation data. For Model A4, which has only sample plot level random effects, in total four calibration strategies exist. Likewise, in one extreme variant the random effects predictions were adopted from the model fitting data and in the other extreme they were entirely predicted by means of the calibration data. It is noted that random effects at the measurement occasion level (Model B) were always re-predicted by means of the new calibration data.

The Traditional Uniform Height Curve Method
The principle of the uniform height curve approach is explained by Knieling [27] who proposed a height curve type according to Pollanschütz [43]: where h is the tree height and a and b are regression coefficients. In the study of Knieling [27], the permanent samples of the Austrian Forest Inventory 1981/1985 formed the basis for the calculation of the uhc. The coefficients a and b were separately estimated for each forest inventory sample plot and per each tree species. As expected, the sample plot-wise estimates of parameter a had a relatively large variance and estimates of b were almost constant. The rationale is that parameter a represents the asymptote of the h − d-curve that generally increases with stand age and site quality [6,25]. For the practical application, this means that b can be kept fixed depending on tree species, and parameter a can be estimated for each sample plot by the transform using dm and hm.

Model Performance and Evaluation
The goodness of model fits was assessed by means of the AIC (Akaike information criterion) in which the negative logarithmic likelihood (L) is penalized by the number of k parameters. As inference was based on a restricted likelihood, only model candidates could be compared having the same structure in the fixed model part. Thus, models that differ in their specification of random effects were compared using a likelihood-ratio test (LRT) with L F being the likelihood of the full model and L R the likelihood of a restricted model. Compared to the full model, a restricted model has at least one random effects set to zero. The LR test statistic is χ 2 distributed with degrees of freedom corresponding with the number of random effects set to zero. As stated above, the selection of fixed effects among the candidates of the regressor covariates was conducted using Wald tests. In addition, the coefficient of determination was calculated based on the model dataset twice: (i) by means of mean curve predictions (R 2 m ); and (ii) for prognoses with random effect calibrations (R 2 c ).
By Equation (2), the predicted tree heights of the different model variants were calculated. The evaluation of the mixed models was done by comparing the observed (h ijk ) and predicted (ĥ ijk ) heights from the evaluation dataset, with n trees, through a statistical analysis of the residuals. The performance of the models was examined by means of the precision and accuracy of the predictions for independent data from the 5th measurement occasion that was not used for the model fitting. Therefore, the RMSE as well as the bias were evaluated. In addition, it was assessed, using diagnostic plots for the residuals against predictions and for the predictions against observations, whether the models behaved well or whether a bias or some signs of heteroscedasticity occurred.

Results
The grid search revealed that an optimum value of 3 existed for the parameter γ of the Näslund function (Equation (1)), and, accordingly, γ was kept fixed at that optimum. LRTs showed that Model A3 (Equation (5)) was superior to A2 (Equation (4)) and also to A1 (Equation (3)) with respect to the specification of the random effects structure (Table 6). This means that it was appropriate to place random effects on the intercept as well as on the slope that acted on both levels, the sample plot level and the measurement-occasion level. The full Model A4 (Equation (10)) for the A-model-type, with random effects on the measurement-occasion level, additionally includes significant fixed effects for d, logarithmic dm, hm and for dummy-coded tree species groups. When the random parameters on the measurement-occasion level (Model A3 of the A-type) were substituted by a temporal trend in the models belonging to the B-type, Model B2 (Equation (8)) was superior to B1 (Equation (7)), meaning that the inclusion of random effects for both regressor covariates, d and measurement occasion, was advantageous. The full Model B3 (Equation (9)) for the B-model-type includes significant fixed effects for d, measurement occasion, logarithmic dm, hm and dummy-coded for tree species groups.
ik are random effects; AIC is the Akaike information criterion; LRT is the likelihood ratio test with the χ 2 distributed LR (likelihood ratio) and the p -value; R 2 m is the marginal coefficient of determination; and R 2 c is the conditional coefficient of determination. Note that, in contrast to the covariance, the variance is reported as standard deviation squared.  The performance of both full Models B3 and A4 was compared under different calibration strategies using prior observations from the independent evaluation data. In addition, a comparison was made with the traditional uhc approach ( Table 7). The calibration of the mixed models was done by means of old and new data. Note that new data calibration and the localization of the uhc were performed exclusively based on median basal area trees per species to get a fair comparison. Systematically, we started with the prediction of all random effects from new data. For Model B3, the lowest RMSE of 2.095 and the lowest bias of −0.059 were achieved with Calibration Strategy 8 when the random effects were predicted by means of the model data and were assumed to be also valid for predictions of the new independent data. In this calibration strategy, a calibration via BLUPs using new data was not necessary and the prior predictions of random effects by means of the model data were likewise assumed to be valid for future measurement occasions. Thus, in this particular calibration variant, the random effects were entirely predicted using old data, and these predictions were henceforth plugged into Equation (11) to derive response prognoses for new covariate data in X representing the evaluation data. However, Calibration Strategy 1 in which the complete set of random effects was predicted by means of calibrations to the new data performed only slightly worse and had a RMSE of 2.148 and a bias of 0.071.

A1
i and b i were adopted from calibrations to the old data, had a poor performance; in the former case, RMSE was 12.384 and bias was −4.675, and, in the latter case, RMSE and bias were 8.988 and −3.743, respectively. Calibration Strategies 2 and 4 with only a i or b (2) i predicted using data from the past had medium performance. Strategies 5 and 6 with b (2) i or b (1) i predicted using new data had a relatively high RMSE of 5.032 and 5.047, respectively. The former also had a high bias with −1.974. The latter has a small bias with 0.007.
Model A4 had the smallest RMSE of 2.081, when the sample plot level random effects were adopted from calibrations to the old data and when the random effects on the measurement-occasion level were predicted by means of the new data (Calibration Strategy 4); however, bias was 0.332. The smallest bias of 0.214 was achieved with the A-model type, when the complete set of random effects was predicted by means of the new data (Calibration Strategy 1). However, RMSE was slightly higher with 2.178 compared to the Calibration Strategy 4, which used sample plot level random effects from calibrations to the past data. The procedure, which used only b (1) i from past calibrations, had the worst performance among all calibration strategies with Model A4, and using only a i from calibrations by means of the old data yielded a medium prediction performance. Please note that, with Model A4, the random effects a ik and b (1) ik , which consider the measuring occasion, always need to be estimated from new data. Therefore, this approach had fewer variants.
In summary, Model B3 yielded smaller biases than Model A4, when the random effects were entirely predicted either by means of the old data or completely by the new data. In terms of RMSE, Model A4 had the best performance, especially when sample plot level random effects were predicted by means of the old data and when the random effects on the measurement-occasion level were predicted by means of the new data. However, the absolute bias of the best performing calibration variant with Model A4 was higher than with the best performing Model B3 variant.
The traditional uhc approach had a RMSE of 2.167 and a bias of −0.116. Thus, in terms of RMSE and bias, uhc performed only slightly worse than the Model B3 Calibration Strategy 8 with random effects predicted by means of the old data. In terms of RMSE, the uhc approach was slightly inferior compared with Model A4 and random effects for the measurement occasion predicted by means of the new data (Calibration Strategy 4). In terms of bias, uhc was superior.
Graphical diagnostic checks were conducted for Model B3 in the Calibration Strategy 8, Model A4 in Calibration Strategy 4 and the uhc approach. Figures 2 and 3 contain all predicted heights for the evaluation dataset. The smoothed scatterplot of predicted versus observed heights (Figure 2) of the three models showed that the tuples were independently and closely distributed around the reference line having zero intercept and slope one. In addition, the smoothed scatterplots of the residuals versus predictions (Figure 3) showed for both mixed models and also the uhc, that residuals behaved well, and evidence was found neither for a trend nor for heteroscedasticity. The partially accurate estimates (residue = 0) of the uhc can be explained by the localization based on the height and diameter of the median basal area tree per species. Due to the high density in the residue = 0, deviations from it have a much lower density, which results in a lighter gray value compared to the mixed models. Although the mixed model also contains the diameter of the median basal area tree per species as a covariate, this is in addition to some other covariables, which usually leave a residual error. In the mixed model approach, the height curve is not forced to exactly pass the tuple consisting of the diameter and height of the median tree, as is the case with the traditional uhc method. The graphical diagnostic plots of Figures 2 and 3 can be seen for the different tree species in Appendix C. Figure 4 contains the Model B3 and Model A4 mean curves for each tree species group. with Model A4, the random effects a ik and b ik (1) , which consider the measuring occasion, always need to be estimated from new data. Therefore, this approach had fewer variants. In summary, Model B3 yielded smaller biases than Model A4, when the random effects were entirely predicted either by means of the old data or completely by the new data. In terms of RMSE, Model A4 had the best performance, especially when sample plot level random effects were predicted by means of the old data and when the random effects on the measurement-occasion level were predicted by means of the new data. However, the absolute bias of the best performing calibration variant with Model A4 was higher than with the best performing Model B3 variant.
The traditional uhc approach had a RMSE of 2.167 and a bias of −0.116. Thus, in terms of RMSE and bias, uhc performed only slightly worse than the Model B3 Calibration Strategy 8 with random effects predicted by means of the old data. In terms of RMSE, the uhc approach was slightly inferior compared with Model A4 and random effects for the measurement occasion predicted by means of the new data (Calibration Strategy 4). In terms of bias, uhc was superior.
Graphical diagnostic checks were conducted for Model B3 in the Calibration Strategy 8, Model A4 in Calibration Strategy 4 and the uhc approach. Figures 2 and 3 contain all predicted heights for the evaluation dataset. The smoothed scatterplot of predicted versus observed heights (Figure 2) of the three models showed that the tuples were independently and closely distributed around the reference line having zero intercept and slope one. In addition, the smoothed scatterplots of the residuals versus predictions (Figure 3) showed for both mixed models and also the uhc, that residuals behaved well, and evidence was found neither for a trend nor for heteroscedasticity. The partially accurate estimates (residue = 0) of the uhc can be explained by the localization based on the height and diameter of the median basal area tree per species. Due to the high density in the residue = 0, deviations from it have a much lower density, which results in a lighter gray value compared to the mixed models. Although the mixed model also contains the diameter of the median basal area tree per species as a covariate, this is in addition to some other covariables, which usually leave a residual error. In the mixed model approach, the height curve is not forced to exactly pass the tuple consisting of the diameter and height of the median tree, as is the case with the traditional uhc method. The graphical diagnostic plots of Figures 2 and 3 can be seen for the different tree species in Appendix C. Figure 4 contains the Model B3 and Model A4 mean curves for each tree species group.

Model Formulation and Calibration Strategy
The present paper demonstrates two different approaches to account for possible temporal changes of the sample plot level variance of ℎ--curves. The first approach uses additional random effects at the level of the measurement occasion (Model A4), and the second includes a fixed time trend which can vary randomly among the sample plots (Model B3). In the context of the Näslund function (Equation (1)), Schmidt [44] wrote that it provides numerical advantages when specifying complex additive and mixed regression models as linear models. Using the linearized form of Näslund's function may introduce a (small) transformation bias into the model. However, the alternative would be to use a nonlinear mixed model and fit a true nonlinear curve type to avoid a transformation bias (e.g., [4,8,45]). The application of a true nonlinear curve type would change

Model Formulation and Calibration Strategy
The present paper demonstrates two different approaches to account for possible temporal changes of the sample plot level variance of ℎ--curves. The first approach uses additional random effects at the level of the measurement occasion (Model A4), and the second includes a fixed time trend which can vary randomly among the sample plots (Model B3). In the context of the Näslund function (Equation (1)), Schmidt [44] wrote that it provides numerical advantages when specifying complex additive and mixed regression models as linear models. Using the linearized form of Näslund's function may introduce a (small) transformation bias into the model. However, the alternative would be to use a nonlinear mixed model and fit a true nonlinear curve type to avoid a

Model Formulation and Calibration Strategy
The present paper demonstrates two different approaches to account for possible temporal changes of the sample plot level variance of h − d-curves. The first approach uses additional random effects at the level of the measurement occasion (Model A4), and the second includes a fixed time trend which can vary randomly among the sample plots (Model B3). In the context of the Näslund function (Equation (1)), Schmidt [44] wrote that it provides numerical advantages when specifying complex additive and mixed regression models as linear models. Using the linearized form of Näslund's function may introduce a (small) transformation bias into the model. However, the alternative would be to use a nonlinear mixed model and fit a true nonlinear curve type to avoid a transformation bias (e.g., [4,8,45]). The application of a true nonlinear curve type would change statistical inference completely and would require imprecise first order Taylor expansions for the model fitting as well as for the calibration (random effect prediction). Thus, model fitting as well as calibration would easily become instable and may produce illogical results [18]. The rationale of choosing the Näslund function was that, besides its well-known proper statistical behavior, this curve type can be linearized and produces highly stable results even when applied with a large set of possible covariates. To linearize the Näslund function, the parameter γ was determined via a grid search. If optimized with respect to a small bias, Model B3, Strategy 8 would yield γ = 2.6. The smallest RMSE results with an exponent of 3. Model A4 Strategy 4 achieves the lowest bias with γ = 1.5 and the smallest RMSE with γ = 3.2. Kramer et al. [46] and Schmidt [47] suggested fixing gamma at a value of 3 to linearize the function. Based on the suggestions found in the literature [46,47] and with the prerequisite of a small RMSE, even under a negligible bias, the parameter was fixed at a value of 3. For comparison with nonlinear models, Temesgen et al. [8] obtained an RMSE between 1.4 and 4.6 and a bias between 0 and 0.1. Compared to our results (Table 7), this means that with linear mixed models a similar RMSE (especially for Model B3 Calibration Strategy 8 and Model A4 Calibration Strategy 4) but a slightly higher bias (especially for Model A4 Calibration Strategy 4) can be achieved. The latter probably results from the linearization, but still has a negligible small amount. However, as seen in Table 7, model performance depends much more on the calibration strategy.
The different calibration strategies were evaluated using straightforward matrix algebra, shown in Equation (12). Hence, under certain circumstances, it was feasible to use random effect predictions that stem from calibrations by means of different data sources. However, such a mixing of predicted random effects only worked well if the random effects from a certain model hierarchy were thoroughly calibrated for a single data source. Contrarily, it was not feasible to mix random effects predictions from different calibrations on a specific model hierarchy, as this resulted in a relatively high RMSE and bias. The rationale is that covariances between the random effects were not properly addressed in the latter case.
This means that in the case of Model B3 (time trend), either all random effects (Calibration Strategy 1) must be recalibrated by means of new data, or all must be calibrated based on old data (Calibration Strategy 8), to achieve a good model performance. With Model A4 (with random effects on measurement occasion), either all random effects (Calibration Strategy 1) or only those at the level of measurement occasion (Calibration Strategy 4) should be recalibrated by means of new data. In the latter case, sample plot level random effects would stem from old data. Furthermore, it turned out that a recalibration based on current observations is actually not necessary with Model B3, but it works surprisingly well, despite the very small number of observations that only contained d and h of the median basal area tree per species of each sample plot. The latter also applies to Model A4, where a recalibration or partial recalibration is always necessary, i.e., either the complete set of random effects was predicted via BLUPs or only a subset was predicted. Based upon these findings, the Models B3 Calibration Strategy 8 and A4 Calibration Strategy 4 can be applied as appropriate models for height imputations in forest inventory practice. Both models showed similar performance with respect to RMSE, whereas Model A4 Calibration Strategy 4 has a slightly higher bias. A benefit of Model B3 Calibration Strategy 8 is that predictions can be derived using standard routines that are implemented in the R package "nlme" [39]; this is because random effects are predicted based on old data. In particular, as the Model A4 Calibration Strategy 4 4 is calibrated based on old and new data, a little extra programming was necessary.
To account for the temporal variance of the model parameters, the measurement occasion was considered in Model B3 as a population-wide fixed effect as well as by using sample plot level random effects. It is crucial to note that with such a formulation, the temporal trend is linearly extrapolated from the model data into the future. To overcome possible limitations associated with the latter simplification, further trials were made exploring the fixed interaction effects of the measurement occasion and the mean stand height hm together with additional sample plot specific random effects. The advantage of this approach would be that hm could account for other unknown effects on tree height development that could possibly result from different environmental or climatic conditions. However, the accuracy and precision of the latter model construction was not satisfactory.
In addition to the random effects on intercept (Model B3 and A4), d (Model B3 and A4) and measurement occasion (Model B3), additional random effects were also tested for dm and hm, but did result in an enhanced prediction performance. In some existing mixed h − d models (e.g., [15]), additional tree-level random effects are applied that account for possible correlations due to multiple repeated measurements of a single tree. Mehtätalo [17] also tested the implementation of random effects at the tree level, but this approach was discarded because estimation was too computer expensive. Nevertheless, neglecting possible tree-level random effects should not significantly affect the prediction performance of the models, as long as the number of repeated measurements per tree is small compared to the number of trees in modeling data. The experiences made in the present study were in conformance with those presented in Mehtätalo [17], whereby no random effect was placed at the tree level.

Covariates
The final mixed generalized height-diameter models included fixed effects for dummy categorical tree-species groups, for interaction effects between tree species groups and d and for the diameter and height of the median basal area tree per species. Knieling [27] also showed, in estimating the coefficients for the uhc, that tree-specific differences should be taken into account. Therefore, it was also tested whether the slope of the h − d curve varies among the different tree species groups [27]. However, hypothesis tests revealed that a significant tree species effect only exists for the model intercept.
Following Mehtätalo [17], the diameter and height of the median basal area tree were used to describe mean tree size. Both measures can be easily derived from angle-count sample plots [26,43]. Alternatively, Schmidt [36] and Calama et al. [48] integrated age rather than average tree size as a predictor in the model. Calama et al. [48] also showed that the inclusion of age in the model in addition to mean tree size did not lead to an improvement. The inclusion of age has rather a higher relevance in conjunction with growth models, but is actually necessary for the modeling of h − d relationships in a forest inventory context. In addition, the application of stand age is generally restricted to even-aged stands due to the limited explanatory power of age in uneven-aged stands (e.g., [49] and references therein).
As part of the modeling process, the site variables elevation, slope and aspect were also tested as fixed effects. None of these parameters could describe a significant correlation with tree height. Köhler [50] investigated the relationships between site factors and height growth of Norway spruce. In contrast to the findings of our work, it was found that the factors sunshine class (including terrain and slope), elevation and relative humidity proved to be relevant regressor covariates for modeling stand height. Similarly, in [51], it has been shown for the subalpine zone that stand height is decreased by 13% per an increase in elevation of 70 m. However, site elevation in the present study region of the BOKU training forest has only a small range from 350 m to 725 m, overall representing lower mountainous sites. Accordingly, on such sub-montane sites in Austria, only little differences can be expected with respect to height growth [52].
The tested SDI (stand density index) showed no significant influence on the shape of the height curve. Inclusion of SDI in the models also resulted in a worse RMSE and bias and was therefore not considered as a fixed effect in the full models. Some authors (e.g., [53]) demonstrated that stand density is a highly relevant factor influencing the h − d relationship in a forest stand. However, in other studies (e.g., [16]), less influence is attributed to stand density. In this study, both the diameter and the height of the median basal area tree per species are included, which represent the average h − d ratio of the stand. The h − d ratio is highly correlated with stand density in that higher ratios generally occur in denser stands (e.g., [54,55]). The rationale behind this phenomenon is a well-known individual resource reallocation strategy to maintain survival. If high competition is present, trees react with reduced diameter growth rates, whereas height growth remains nearly unchanged. This may also explain why additional measures for tree competition or density (basal area and stem number) likewise proved to be not significant in the model, as long as the h − d ratio is included. Due to the included h − d ratio, the QMD and the d/QMD ratio also show no improved height predictions.
Based on the assumption that trees in mixed stands grow differently than in pure stands, a dummy variable for the mixture-type was tested during the model construction process. Sharma et al. [35] showed that the h − d relationship in mixed stands has a higher variance than in pure stands. This is because mixed stands often form multiple layers. However, it was found in the present study that species mixture does not affect tree height.

Conclusions
When applied to new independent data, the novel mixed model h − d-curve approach provides more precise height predictions than the traditional uhc method. However, the relatively simple uhc method achieved surprisingly good results and has only a slightly higher bias and RMSE. Consideration of repeated measurements in mixed models can be implemented either via additional random effects for the measurement occasion (Model A4) or via a co-modeling of the temporal trend (Model B3) of the relevant model parameters.
Results gave evidence that a recalibration of a mixed h − d model is not strictly necessary, when a temporal trend model (Model B3) is applied for predictions at new measurement occasions. For Model A4 and B3, it is appropriate to use random effect predictions obtained from calibrations by means of measurements in the past. However, this procedure must be consequently applied for all random effects on a specific model hierarchy. Otherwise, predictions would become imprecise and a serious bias may result.
As fixed covariables, the diameter at breast height, the logarithmic diameter, and the height of the median basal area tree per species, as well as the measuring period are used in the created model. Regarding the structure of the random effects, the best results could be achieved with sample plot specific effects on the intercept (Model B3 and A4), diameter at breast height (Model B3 and A4) and measurement period (Model B3).
Author Contributions: A.N. and C.G. designed the study; C.G. and C.W. collected the field data of the current inventory; C.W. and C.G. processed the data; C.G. and A.N. performed the model fitting; S.V. and T.R. supported data analysis; and C.G., A.N., S.V., T.R. and C.W. wrote the paper.

Appendix B
Matrix algebra for example calibrations of Model B3 on a sample plot in model data via BLUP (Equation (12)):

Appendix B
Matrix algebra for example calibrations of Model B3 on a sample plot in model data via BLUP (Equation (12)):         Figure A2. Predicted heights against observed heights.