Performance of Kernel Estimator and Johnson S B Function for Modeling Diameter Distribution of Black Alder ( Alnus glutinosa (L.) Gaertn.) Stands

: We compare the usefulness of nonparametric and parametric methods of diameter distribution modeling. The nonparametric method was represented by the new tool—kernel estimator of cumulative distribution function with bandwidths of 1 cm (KE1), 2 cm (KE2), and bandwidth obtained automatically (KEA). Johnson S B (JSB) function was used for the parametric method. The data set consisted of 7867 measurements made at breast height in 360 sample plots established in 36 managed black alder ( Alnus glutinosa (L.) Gaertn.) stands located in southeastern Poland. The model performance was assessed using leave-one-plot-out cross-validation and goodness-of-fit measures: mean error, root mean squared error, Kolmogorov–Smirnov, and Anderson–Darling statistics. The model based on KE1 revealed a good fit to diameters forming training sets. A poor fit was observed for KEA. Frequency of diameters forming test sets were properly fitted by KEA and poorly by KE1. KEA develops more general models that can be used for the approximation of independent data sets. Models based on KE1 adequately fit local irregularities in diameter frequency, which may be considered as an advantageous in some situations and as a drawback in other conditions due to the risk of model overfitting. The application of the JSB function to training sets resulted in the worst fit among the developed models. The performance of the parametric method used to test sets varied depending on the criterion used. Similar to KEA, the JSB function gives more general models that emphasize the rough shape of the approximated distribution. Site type and stand age do not affect the fit of nonparametric models. The JSB function show slightly better fit in older stands. The differences between the average values of Kolmogorov– Smirnov (KS), Anderson–Darling (AD), and root mean squared error (RMSE) statistics calculated for models developed with test sets were statistically nonsignificant, which indicates the similar usefulness of the investigated methods for modeling diameter distribution.


Introduction
Diameter at breast height (dbh) is a basic tree attribute that is important to both forestry practice and science. It is a simple to measure, and closely related to other difficult to obtain tree features. Diameter as an explanatory variable is used to predict tree attributes such as height [1,2], crown length [3,4], bark thickness [5], volume [6,7], biomass [8,9], and carbon storage [10].
The mean diameter provides only a general view of a tree stand, and it is required to know dbh distribution in order to obtain detailed information about the stand [11]. Research focused on mathematical description of dbh distribution has long history. Initially, diameter frequency models were mainly used as part of yield models. Over time, it was obvious that dbh distribution models can essentially support other areas of forestry practice and science. Examples include the optimization of silviculture treatments, preharvest prediction of the kind and amount of wood assortments, tree bucking control, forest resource evaluation, preparation of timber trade contracts, and financial planning undertaken by forest holdings. A close relationship between diameter distribution and stand volume from one side and between stand volume and financial income from the second side justify the inclusion of diameter distribution models as part of decision support systems [12][13][14][15]. In addition to the abovementioned strictly quantitative approach, mathematically described diameter distribution can provide valuable information about forest dynamics [16] and support the management of uneven-aged and mixed forests [17,18].
Forests as a multipurpose site provide wood and other goods and services that are important to society. Forest managers requires tools for evaluating different management practices and their effects on stand structure [19]. In this context, dbh distribution models can provide very valuable information that constitute the first link of the management chain-the so-called three Ds: data, dynamics, and decisions [20].
Dbh distribution modeling is an example of pattern recognition [21]. It combines a set of different issues related first to the choice of method that is adequate to approximate the number of trees belonging to distinguished diameter classes. Other issues related to dbh distribution modeling include inventory of forest resources [22], geostatistics [23,24], and remote data acquisition [25][26][27][28] combined with the application of advanced machine learning methods [10,29].
One of the most popular approaches that allows a mathematical description of dbh distribution relies on the use of available theoretical probability density functions and is commonly referred to as "parametric". As diameters are thought to be realizations of some kind of continuous distribution [30], the research scope is to find adequate function. To date, there are no biological reasons to justify the choice of any particular function [31,32]. The basic premise justifying the use of theoretical distribution is usually the ease of estimating and interpreting its parameters as well as the goodnessof-fit to empirical data. Examples of theoretical functions applied to date to model frequency of diameter include the following distributions: normal [ [37]. The usefulness of the aforementioned models for describing dbh distribution of black alder stands was assessed in the authors' previous paper [38]. The best results were obtained using Johnson SB function, while the worst result was obtained for normal distribution. In the frame of parametric approach are also used finite mixture models. They combine two or more single functions (e.g., Weibull or gamma) and are mainly used to model irregularly shaped diameter frequencies occurring in mixed-species and uneven-aged stands [39][40][41][42].
In addition to the parametric approach, there are a number of alternative methods for modeling dbh distribution that do not rely on any predefined function and are jointly termed as "nonparametric". The concept of nonparametric methods is based on the optimal use of information contained in the collected data [43]. Previous studies have reported the following examples of nonparametric approaches used in the context of dbh distribution modeling: percentile method [44][45][46], k-nearest neighbor method [25, [47][48][49][50], decimal curves [51], neural networks [52,53], and kernel estimators [54][55][56][57][58][59].
The application of kernel estimator to model dbh frequency does not require any assumptions on the form of the underlying density, which is a characteristic of nonparametric methods [60]. On the other hand, this method requires the selection of the type of kernel function and adequate value of bandwidth, which is also known as a smoothing parameter or window. Kernel function ensures continuity and is responsible for weighting observation [61]. Examples of the available kernel types include Gaussian, Epanechnikov, uniform, biweight, and triweight. The type of kernel has minor influence on the performance of estimator. However, the properties of kernel function, including, for example, the order of its derivatives, play an important role in the optimization process used to select smoothing parameter. Bandwidth determines the neighborhood of the point (diameter record) used during the estimation of the probability value [62]. Taubert et al. [63] focused on the importance of the chosen diameter class width to the final effects of parametric diameter distribution modeling. Similar to kernel estimator, the selected bandwidth has major influence on the results of diameter frequency estimation. Kernel estimator with too small bandwidth provides a dbh distribution model with excessively highlighted local extrema. On the other hand, an estimator with high values of bandwidth causes a very flat final distribution shape and neglects local extrema. This shows the crucial role of bandwidth in determining a bias-variance tradeoff [64].
Because of few studies on the use of kernel estimator to describe dbh distribution, it is actually a poorly recognized tool in forest resource modeling. Results of previous research by Pogoda et al. [65] provide only preliminary information about the size and variability of the smoothing parameter depending on the age of the alder stand, the size of used sample and the chosen method of bandwidth selection. However, a detailed analysis focused on the assessment of the direct use of kernel estimator to model dbh distribution is lacking.
The present study aimed to assess the effects of diameter distribution description for black alder stands by using kernel estimator with different bandwidth values and then performing a comparative analysis of the usefulness of kernel estimator and Johnson SB function for modeling frequency of diameters.

Data Collection
Data from 36 managed black alder stands located in southeastern Poland in the Dąbrowa Tarnowska (10 stands) and Niepołomice (26 stands) Forest District were used in this study. The stands represent four age groups, i.e., 20, 40, 60, and 80 years old and three distinguished site types: moist mixed broadleaved forest (MMB), moist broadleaved forest (MB), and alder swamp forest (AS). Each age group was represented by three stands, while site type by 12 stands (4 age groups of 3 stands). All stands were characterized by the share of black alder not less than 0.9 and the ratio of actual to tabular growing stock of above 0.8. The basic attributes characterizing the studied stands are summarized in Table 1.
In each stand, 10 circular sample plots were established that were located systematically in the nodes of a grid of squares. The side of a square depended on the stand area and ranged from 20 to 70 m. The size of sample plots was 0.02, 0.03, 0.04, and 0.05 ha for stands in age 20, 40, 60, and 80 years old, respectively. For each tree in the sample plot, the dbh (1.3 m above ground) was measured using a caliper in two perpendicular directions with the accuracy of 0.1 cm. The average value from the two measurements was taken for further analysis. In total, 7867 black alders were measured, whose diameters ranged from 6.7 to 61.7 cm.

Diameter Distribution Models
In the first stage of analysis, the cumulative frequency of diameters in the considered stands was approximated using kernel estimator of cumulative distribution function (CDF), proposed in 1964 by Nadaraya, which is expressed by the following formula [66]: where n is sample size, x is the value of dbh (cm) for which the CDF was estimated, i X is individual dbh belonging to the sample, h is a bandwidth (h > 0, cm),

 
K  is an integral of kernel function k expressed by the following formula: In the presented analysis, the estimator with Gaussian kernel was applied as follows: Cumulative frequency of diameters was approximated by kernel estimator of CDF with three different smoothing parameter values. Kernel estimator denoted as "KEA" use bandwidth obtained automatically in the process of optimizing the discrepancy between empirical and predicted probability. Optimization is carried out with the selected criterion, which may include mean integrated squared error (MISE) or asymptotic mean integrated squared error (AMISE) [67]. The research problem is to find an adequate method to estimate the value of predicted probability. In the present study, a plug-in method proposed by Altman and Legér [68] was used for this purpose. Pilot bandwidth required by the above method during the estimation of predicted probability value was selected by Silverman's rule of thumb [67,69]. Kernel estimators denoted as "KE1" and "KE2" use bandwidth 1 cm and 2 cm, respectively. The mentioned values correspond to the span of diameter classes commonly used in practical presentation of building stand tables.
In the second stage of analysis, frequency of diameters was described by Johnson SB ("JSB") function, which was applied for the first time in dbh distribution modeling context by Hafley and Schreuder [37]. The JSB probability density function is given by the formula: and the CDF takes the form as: where x is dbh (cm), γ (-∞ < γ < +∞) and δ (δ > 0) are shape parameters denoting asymmetry and kurtosis, respectively, λ (λ > 0) is a scale parameter, ξ (-∞ < ξ < +∞) is location parameter, and Φ is Laplace integral. The parameters of JSB function were estimated from detailed diameter records using the maximum likelihood method (MLE).

Model Evaluation and Validation
The performance of KE1, KE2, KEA, and JSB function was assessed by the cross-validation method [70][71][72]. It allows to avoid some too optimistic results ("optimism principle") [73], which can occur when model assessment is performed using the same data that was used to construct the model. Considering the hierarchical structure of the collected data (levels: single tree and sample plot), it was assumed that the sample plot will be the basic unit to group the collected data for resampling. This corresponds to the fact that in forestry research, the data representing sample plot are most often used for model development. It refers both to the traditional process of model development based on field inventories and to the construction of models using remotely sensed data in the frame of areabased approach (ABA) [27]. In the present study, cross-validation was performed using the leaveone-plot-out cross-validation method (LOPOCV) [74]. From the set of 10 sample plots established in stand, 9 sample plots were used as a training set, and the remaining one plot constituted the test set. Detailed diameter records forming the training set were used to estimate the parameters of JSB function and to the obtain bandwidth for KEA. The LOPOCV procedure was repeated 10 times, so that each sample plot established in the stand was used exactly once as a test set. Goodness-of-fit was assessed using the following statistics: mean error (ME): root mean squared error (RMSE): the Kolmogorov-Smirnov statistic (KS) [73]: the Anderson-Darling statistic (AD) [75]: Final comparison of kernel estimator and JSB function was based on the analysis of mean values and variability of statistics obtained with Equations (6-9) for training and test sets. The analyses were performed for all 36 stands and within the distinguished age groups and site types. By comparing calculated KS and AD statistics with their critical values, it was assessed in how many cases the CDF approximated by kernel estimator or JSB function differed significantly from the empirical CDF at the assumed level of significance α = 0.05. Because of the lack of normality, the Kruskal-Wallis test was used to validate the significance of differences between the average values of goodness-of-fit statistics (Equations (6-9)) for assessing the results of dbh distribution modeling by the considered methods.
Calculations and analyses were performed using the EasyFit 5.5 Professional Program and packages sROC [76] and SuppDists [77] available in R environment [78].

Results
By using kernel estimator with three bandwidth variants, namely KE1, KE2, and KEA, the nonparametric models of dbh distribution for all (360) training sets were obtained. The KS test showed significant (α = 0.05) differences between empirical and estimated dbh frequencies for 34 training sets for the KE2 model. A similar analysis (KS test, α = 0.05) performed for test sets showed significant differences for 100, 94, and 95 sets for KE1, KE2, and KEA models, respectively. The outcomes of the AD test showed significant (α = 0.05) differences for 68 training sets for only the KE2 model and for 134, 131, and 124 test sets for KE1, KE2, and KEA, respectively. The MLE used for parameter estimation of JSB function failed for 48 (13.3%) training sets due to lack of convergence. Consequently, the parametric model was used for only 312 training and test sets. From the results of the KS test, significant differences were found between empirical dbh distribution and estimated by JSB function for 25 training and 82 test sets, while the AD test showed significant differences for 25 training and 118 test sets.
For training sets, the best fit for nonparametric dbh distribution models (KS = 0.0293, AD = 0.2594, and RMSE = 0.0121) was observed for the KE1 model and the worst fit (KS = 0.0569, AD = 1.1252, and RMSE = 0.0268) for the KEA model (Table 2). In terms of fitting to training sets, the parametric model developed on the basis of JSB function was the worst (KS = 0.0666, AD = 4.2440, and RMSE = 0.0326), in which the largest variability of KS, AD, and RMSE was also observed ( Table  2). The differences between the average values of KS, AD, and RMSE observed for parametric and nonparametric models for training sets were statistically significant (Kruskal-Wallis test, α = 0.05). The goodness-of-fit assessment carried out for test sets showed that for the nonparametric method, the smallest differences between the empirical and modeled CDF (KS = 0.2440, AD = 3.0245, and RMSE = 0.1426) were observed for KEA and the largest for KE1 ( Table 2). The assessment of the effectiveness of the parametric JSB model strongly depended on the fitting criterion used. In terms of KS statistics (KS = 0.2427), it had the best fit, while according to AD statistics (AD = 3.4290), it had the worst fit. The value of RMSE statistics (0.1431) placed JSB function in the second spot immediately below the KEA. The differences between the average values of the goodness-of-fit statistics observed for test sets approximated by KE1, KE2, KEA, and JSB function were statistically nonsignificant (Kruskal-Wallis test, α = 0.05).
An example of diameter distribution modeling by the nonparametric method represented by KE1, KE2, KEA, and the parametric method based on JSB function is shown in Figure 1. The frequency of diameters belonging to the training set (Figure 1a), which was used for model development, was very well fitted by KE1. The application of KEA led to a much more general model that does not consider local maxima and minima in diameter frequency, which were observed in sample distribution. The parametric model based on JSB function is also a more general model. The comparison of estimated models for an independent data-the test set ( Figure 1b)-showed the advantage of general models reflecting the rough shape of distribution over models that excessively fit the shape of the empirical distribution from the sample used for estimation. Among the nonparametric models: KE1, KE2, and KEA, in the majority of the distinguished group of stands, the best fit to empirical diameter frequencies of training sets was obtained using KE1, whereas the worst fit was observed for KEA (Tables 3-5). Compared to the nonparametric method of kernel estimation, the parametric model based on JSB function showed in most cases a much worse fit to data forming training sets. The evaluation of dbh distribution models performed on independent samples of diameter forming test sets indicated that among the nonparametric models, the best fit in the majority of site and age groups of stands was found for KEA, and the worst was for KE1 (Tables 3-5).  Table 2. Abbreviations as denoted in Table 3.
The average value of bandwidth to KEA determined using samples of diameters representing stands growing in MMB, MB, and AS forest was 2.91, 2.33, and 2.95 cm, respectively, and for particular age groups, namely 20, 40, 60, and 80 years old, the value was 1.46, 2.41, 3.34, and 3.71 cm, respectively. Thus, in every case, the bandwidth was greater than 1 cm, and in most cases, it was more than 2 cm, which resulted in a much worse fit of KEA to the training sets compared to that of KE1 and KE2. At the same time, a more general shape of the distribution obtained by KEA allows better fit to independent data forming test sets. The evaluation of JSB function carried out on test sets for the distinguished groups of stands, especially for age groups, showed that this model was inferior to the nonparametric models in stands of age 20 and 40 years old. In older stands, i.e., 60 and 80 years old, the model based on JSB function proved to be better than KE1, KE2, and KEA (Tables 3-5). Abbreviations as denoted in Table 3.
The Kruskal-Wallis test (α = 0.05) conducted independently for separate site and age groups of stands showed significant differences between the average values of RMSE, KS, and AD statistics calculated for the analyzed models on the basis of data of training sets. For independent test sets, differences in the average values of the abovementioned statistics were nonsignificant, except for the KS statistic determined for MB forest (Table 6). Models were also evaluated in terms of the value of systematic error (ME). For both training and test sets, a slightly larger ME error was found for KE1, KE2, and KEA models than that noted for the parametric model of JSB function ( Table 2). The observed differences of ME statistics for the developed models were nonsignificant, except for training sets representing MB forest (Kruskal-Wallis test, Table 6).  Tables 2 and 3.

Discussion
The present study is a continuation of a previous investigation focused on modeling dbh distribution for managed black alder stands located in southeastern Poland [38, 46,65]. Diameter distribution as a basic attribute of stand structure [79,80] is modeled using different methods that can be divided into two distinct categories: parametric and nonparametric. The parametric method used in the present study is represented by the JSB function. It was first used to describe the frequency of diameters for pine stands growing on the east coast of the USA [37]. Because of its good performance, the JSB function was widely used to model dbh structure of different stands growing in various geographical zones [30, [81][82][83][84][85]. Previous studies on the application of JSB function in Poland showed contradictory findings. Zasada [86] reported satisfactory performance of the JSB function when applied to modeling diameter frequency of fir stands. The same author found low usefulness of this function to model dbh distribution of birch stands [87]. Jagiełło et al. [88] confirmed the usefulness of the JSB function for beech stands, while Rymer-Dudzińska and Dudzińska [89,90] indicated high instability of the results obtained using this model. Interesting results were also reported by Siekierski [91], who, as the first researcher in Poland, paid attention to the problem associated with the estimation of parameters to JSB function.
Our study results confirmed that the performance of JSB function used to modeling dbh distribution for black alder stands was in line with the earlier works listed above. The JSB function shows the worst fit to data forming training sets. Additionally, problems occurred for parameter estimation for 48 training sets. Frequencies of diameters belonging to test sets were, however, better approximated by the JSB function. While analyzing the performance of the JSB function, it should be noted that it is a four-parameter function. In some situations, a larger set of parameters cause that model to be more flexible and therefore, more adequate to describe dbh distributions with different shapes. Simultaneously, the application of the abovementioned model may cause the problem of lack of convergence to iterative procedures used for parameter estimation and additionally hamper their biological interpretation [92]. It should also be noted that JSB function is a member of the Johnson family of distribution [93]. Functions belonging to Johnson's family have different normalizing transformations and by this cover other distribution shapes. It is possible that the estimation of JSB function parameters failed in our study because the collected diameter records did not meet the conditions for the given transformation.
Some studies [32, 50,94] have emphasized that the method applied during parameters estimation has substantial influence on the relative performance of the chosen probability density function. In this situation, the application of nonparametric methods of dbh distribution modeling and using information from the observed stand tables seems to be a good support to the parametric approach. The performance of kernel estimator used in the present study strictly depends on the selected bandwidth. KEA, by using bandwidth obtained automatically as a result of the optimization process, emphasizes the general shape of distribution by ignoring local irregularities observed in training set diameter frequency. Different results were achieved using KE1 that can model extrema (minima and maxima) occurring in empirical diameter frequency. The good performance of KE1 can be very helpful in modeling irregular diameter distributions, although the well fit carries the risk that the final model is over-fitted, as observed in the present study. KE1 properly describes the distribution of the training set (Figure 1a), but when transferred to independent data (Figure 1b), it presents much worst fit. KEA, through its general shape, approximate very well to the distribution of diameter records forming test set. The performance of KE2 can be considered as intermediate between KEA and KE1. For training sets characterized by irregular shape of diameter distribution, KE2 depicted local extrema less excessively than KE1.
The observed good performance of KEA justifies the use of bandwidths obtained automatically on the basis of data used for model development instead of arbitrarily adopted values. In the process of bandwidth selection, it is worth additionally considering the hint given by Sheather [69] and to analyze simultaneously few values of bandwidth.
The practical application of kernel estimator requires some diameter records; thus, this approach allows only the interpolation of probability values without the possibility of their extrapolation [58]. By considering two aspects of dbh distribution modeling, first: prediction of the number of trees belonging to distinguished diameter classes solely on the basis of assessed stand attributes, and second: the shape description of dbh distribution, it seems that kernel estimator can effectively support the smoothing of empirical diameter frequencies. This ability may be important especially in the stage of preliminary approximations [56], as it influences the follow up selection of theoretical probability density function for dbh distribution modeling. Uutera and Maltamo [54] reported that the use of kernel estimator with a relatively small amount of data can be seen as an advantage to this method for dbh distribution modeling, considering that traditional field inventories are often laborious and expensive. Remote-sensing techniques and remotely sensed materials, which according to some researchers drastically improve the outcome of inventories [27], also have some limitations related to dbh distribution modeling [20] and still need to be validated by field measurements. As forest inventories and planning require cost reduction [95], it seems that the application of kernel estimator can fit in this scope. Siipilehto et al. [96] and Siipilehto and Rajala [97] reported promising examples of dbh distribution smoothing by using a pre-harvest measurement tool termed as EMO. The practical application of EMO requires the measurement of several diameters in the stand and the subsequent application of kernel estimator, to obtain the diameter frequency.
The results obtained during the approximation of dbh frequency confirmed that KE1, KE2, KEA, and JSB function are slightly different models. They differ in the sensitivity to the structure of the data and behave differently depending on the data provided for their development. KE1 and KE2 to some extent, highlight local extrema (minima and maxima) in diameter frequency. KEA and JSB are susceptible to changes in kurtosis and skewness. The similar performance of KE1, KE2, KEA, and JSB function in the case of test sets can be partially explained by the number of trees that constitute training sets (several dozen or even several hundred trees) used in the stage of model development.
The performance of the investigated models developed with training sets containing fewer numbers of trees seems to be an interesting topic for further analysis.
Our investigation confirmed that the final evaluation of dbh distribution models requires adequate procedure of data resampling. As shown in Table 2 and Figure 1, the evaluation conducted on the data set used for model construction and on independent data sample yields completely different results. From the practical point of view, more valuable information about model performance can be obtained with test sets. Diameter records gathered in the test sets include new information that should be properly presented by the dbh distribution model.
Despite the different methodological grounds, both kernel estimator and JSB function are suitable to describe empirical diameter frequencies of black alder stands. A slight disadvantage of the JSB function is the lack of possibility to estimate the parameters for 48 training sets. It is difficult to unambiguously define a situation in which kernel estimator or JSB function is preferred. Considering the importance of information about diameter frequency for forestry practice and science, it seems that both these methods should be treated as complementary to each other unless the analysis purpose, or data set characteristics (size, variability, etc.), discriminate against one of them.

Conclusions
Kernel estimator with three bandwidth variants, namely KE1, KE2, and KEA, and JSB function represent the nonparametric and parametric approaches of density estimation and are useful tools for dbh distribution modeling in alder stands. The KE1 model is characterized by a very high degree of mapping the shape of empirical diameter distributions. KE1 can accurately represent the irregularities in the shape of distribution observed in the sample used for estimation; this can be considered as an advantage for the estimation of multimodal irregularly shaped distributions. However, the use of KE1 carries the risk of excessive adjustment of the model to the distribution observed in the sample, which implies that the obtained model may differ from the actual dbh distribution occurring in a given population of trees. Models developed using KEA and JSB function provide a more general shape of distribution and do not adjust to local irregularities observed in the sample; thus, ensuring greater compatibility with real-world distribution. Reliable assessment of diameter distribution models requires consideration of the data structure. Such an assessment should be performed on an independent set of measurement data, and conducting it only on the data used in the development of the model can lead to overly optimistic conclusions.