Modeling in Forestry Using Mixture Models Fitted to Grouped and Ungrouped Data

: The creation and maintenance of complex forest structures has become an important forestry objective. Complex forest structures, often expressed in multimodal shapes of tree size/diameter (DBH) distributions, are challenging to model. Mixture probability density functions of two- or three-component gamma, log-normal, and Weibull mixture models offer a solution and can additionally provide insights into forest dynamics. Model parameters can be efﬁciently estimated with the maximum likelihood (ML) approach using iterative methods such as the Newton-Raphson (NR) algorithm. However, the NR algorithm is sensitive to the choice of initial values and does not always converge. As an alternative, we explored the use of the iterative expectation-maximization (EM) algorithm for estimating parameters of the aforementioned mixture models because it always converges to ML estimators. Since forestry data frequently occur both in grouped (classiﬁed) and ungrouped (raw) forms, the EM algorithm was applied to explore the goodness-of-ﬁt of the gamma, log-normal, and Weibull mixture distributions in three sample plots that exhibited irregular, multimodal, highly skewed, and heavy-tailed DBH distributions where some size classes were empty. The EM-based goodness-of-ﬁt was further compared against a nonparametric kernel-based density estimation (NK) model and the recently popularized gamma-shaped mixture (GSM) models using the ungrouped data. In this example application, the EM algorithm provided well-ﬁtting two- or three-component mixture models for all three model families. The number of components of the best-ﬁtting models differed among the three sample plots (but not among model families) and the mixture models of the log-normal and gamma families provided a better ﬁt than the Weibull distribution for grouped and ungrouped data. For ungrouped data, both log-normal and gamma mixture distributions outperformed the GSM model and, with the exception of the multimodal diameter distribution, also the NK model. The EM algorithm appears to be a promising tool for modeling complex forest structures.


Introduction
In forestry, the contemporary paradigms of ecological forestry and close-to-nature silviculture look toward natural disturbance regimes to inform management approaches [1,2]. In general, natural disturbances and silvicultural cutting practices are important drivers of forest dynamics that often cause partial upper canopy tree mortality and create openings that serve as sites and niches for the establishment of new and/or the release of advance tree regeneration [3][4][5][6][7]. Due to tremendous variabilities in size, intensity, severity, and frequency of disturbances [4,8], large live legacy trees often survive natural disturbances and are thus placed in the immediate proximity of the new regeneration [7,[9][10][11]. The outcomes of such partial disturbances are highly heterogeneous vertical forest structures in time and space that range from single to two or more cohorts or single to two or multiple canopy layers formed by pure or mixed tree species [12].
To quantify vertical forest structures, model forest dynamics, growth, and yield, and compare managed to natural stands, size/diameter at breast height (DBH) distributions are typically used [13]. Several shapes of DBH distributions have been consistently found in both managed and natural stands [14]. For example, relatively even-aged stands are often characterized by unimodal and near normal distributions, uneven-aged stands typically exhibit rotated sigmoid or reverse-J diameter distributions and multi-aged stands such as many old-growth forests show multimodal and/or irregularly descending distributions characterized by asymmetry, skewness, interruptions (i.e., gaps in the distribution), multimodality, and heavy tails [14].
Two general approaches (i.e., nonparametric and parametric methods) dominate the modeling of DBH distributions. The nonparametric kernel-based density estimation (NK) method e.g., Ref. [15] is a very efficient approach with the capacity to approximate a wide variety of shapes, but suffers from two restrictions: (i) the need for a suitable choice of bin size [16,17] and (ii) the inability to estimate standard errors and confidence intervals for the estimated parameters. Whereas NK methods smooth empirical DBH distributions without the need to estimate parameters, parametric methods estimate the parameters of particular probability density functions that can conform to a wide array of DBH distributions and are credited with providing a deeper understanding of forest dynamics [18]. Parametric methods are often differentiated into flexible single or onecomponent parametric probability density functions (PDFs) [19][20][21][22] and finite mixture models consisting of two or more components [23][24][25][26]. Recent research has shown that despite their flexibility in shape, popular single/one-component parametric models such as the gamma and Weibull PDFs do not adequately portray irregular bi-and multi-modal DBH distributions [15,25]. It is precisely these bi-and multi-modal DBH distributions that represent heavy-tailed and highly skewed DBH distributions with some large trees and gaps in the distribution, however, that reflect outcomes of natural disturbances that are of interest to forest scientists and managers [15,25]. For these latter, more complex distributions, finite mixture distributions that contain a few more components and thus define different shape parameters for the sub-populations that comprise the overall DBH distribution, are now recommended [27,28].
Finite mixture distributions treat overall, heterogeneous DBH distributions as a compound of several distributions of sub-populations composed of multiple basic shapes that reflect different age cohorts or canopy layers [15,18,29]. Among the recent studies that have investigated various mixture distributions, we refer to [18] for two-component mixture Weibull distributions, [30] for two-component mixtures of three-parameter Weibull distributions, [29] for the mixture of log-normal, normal and two-parameter gamma distributions, [23] for the mixture of two-parameter Weibull distributions, [31] for mixture of normal, log-normal, and two-parameter Weibull distributions, [25] for the mixture of three-parameter gamma and Weibull distributions, [15] for the mixture of two-and three-parameter gamma and Weibull distributions, [24] for the mixture of three-parameter Weibull distributions, and [14,28] for two-component mixtures of two-parameter gamma distributions and gamma-shaped mixture (GSM) models. The GSM model is a special type of the two-parameter gamma mixture model for which estimating the parameters is much easier, because the scale parameters of all components in each GSM model are the same and the shape parameter of the k-th component is k, for k = 1, . . . , K [27,32]. Hence, for a K-component GSM model 2K parameters are estimated whereas for a K-component gamma mixture model the number of parameters is 3K-1.
The parameters of the finite mixture models (including the GSM model) can be estimated via a Bayesian or maximum likelihood (ML) approach e.g., [14,15,25,28]. The Bayesian approach uses Gibbs samplers or Markov chain Monte Carlo (MCMC) computations that are computationally cumbersome and time consuming for large K. The ML approach requires suitable starting parameters and iteratively executes and completes com-putations when numerical algorithms converge to a global maximum or terminates once a convergence criterion is met. Iterative algorithms such as the expectation-maximization (EM) are often combined with the Newton-Raphson (NR) method to find suitable initial values to estimate the parameters of the distributions [33]. Finding a suitable choice of the initial values for implementing the ML approach is not an easy task, however. Because the number of initial values that must be chosen depends on the number of groups (i.e., diameter classes) and the numbers of the components of the finite mixture model, this task becomes particularly challenging when the number of components for which starting values must be found is more than three or four. The NR method is, however, sensitive to the initial values, and the choice of initial values can strongly influence the speed of convergence of the estimation procedure and its ability to locate the global maximum [25]. Finally, complex forests structures characterized by irregular and multimodal empirical DBH distributions often exhibit many local maxima, minima, and saddle points on the likelihood surface that can cause algorithms to become extremely unstable and to not converge on the global maximum. In contrast, the iterative EM algorithm always converges, and does so expeditiously when a good starting value is chosen [34]. The K-means clustering algorithm is a powerful method for obtaining initial values [35] that can then be used in the iterative EM algorithm for estimating the parameters of a mixture model [33].
Although recent research on finite mixture models has predominantly focused on the gamma and Weibull PDFs, e.g., [14,15,25,28], less attention has been given to the log-normal mixture model, which is also a very flexible PDF that is able to fit highly skewed and heavytailed DBH distributions. Further, we have found no research devoted to estimating the parameters of finite mixture models fitted to grouped data in the field of forestry.
In this study, we expand on previous work on finite mixture models in forestry and developed the EM algorithm for the most commonly used multi-component mixture models in forestry, i.e., the gamma, Weibull, and log-normal, mixture models for conditions when observations are available both in grouped and in ungrouped form. In this study, we used the K-means clustering algorithm to obtain initial values for the EM parameter estimation approach. The objective of this paper is to explore the utility of the EM algorithm for fitting gamma, log-normal, and Weibull mixture models to diameter distributions of three irregular and multi-sized/aged forests. We further compared the performance of the resulting models with the best fit to the NK and GSM models when DBH data were ungrouped. Finally, we explored whether the number of groups (i.e., the number and width of diameter classes) influenced which model family provided the better fit in our empirical example datasets.

Results
The EM algorithm resulted in mixture models that fit the empirical data quite well. The model family that produced the closest fit with lowest Akaike Information Criterion (AIC) and largest log likelihood (LL) values differed among the three samples, with the log-normal and gamma families typically producing better fits than the Weibull family. The number of components in the various K-component mixture models that resulted in the best model when data were grouped depended on the width of the diameter classes and, to a minor extent, on the evaluation criterion (AIC or LL). Whether data were analyzed in grouped or ungrouped form did not influence the selection of the best model family and number of model components. For reference, the initial values for implementing the EM algorithm for the three sample plots obtained using the k-means clustering approach are given in Appendix C. Sample 1 was characterized by a broadly bimodal, rotated sigmoid DBH distribution that had no trees in the size classes between 30 and 50 cm in DBH ( Figure A1). The DBH distribution was best captured by two-or three-component models ( Table 1). The twocomponent mixture model was the superior model for grouped-5 and ungrouped (based on AIC), whereas the three-component model was superior for grouped-2.5, grouped-5, and ungrouped (the latter two based on LL). The shapes fit by the two-component mixture model families were similar, with the log-normal and gamma families providing the better fit ( Figure A1a-d). In general, all models underestimated the density of trees in the smallest size class, particularly in grouped-2.5, and successfully reflected the bimodal shape of the underlying DBH distribution. The shapes of the curves of grouped-2.5 and grouped-5 as well as of grouped and their corresponding ungrouped data forms were very similar. Among the three-component mixture models, the log-normal family was consistently identified as superior for grouped and ungrouped data ( Table 1). The threecomponent mixture models identified three modes in the underlying DBH distribution, but the different model families disagreed in the placement of the modes, particularly when the data were in grouped form ( Figure A1e-h). For grouped-2.5, the log-normal distinguished between the smallest DBH class (first mode) and the 17.5-27.5 cm classes (second mode) and identified the 62.5-65 cm classes as representing the third mode of trees in the 52.5-87.5 cm size classes. In contrast, the gamma and Weibull families identified a single mode between 12.5-27.5 cm but distinguished two modes in the 52.5-87.5 cm size classes. For ungrouped data, the three model families provided similar shapes of the fitted curves and agreed on placing two modes in the 12.5-27.5 cm size classes. For grouped-5, the three model families agreed more closely on the fitted shape, with similar shapes of the log-normal and gamma families for size classes up to 32.5 cm and similar shapes of the gamma and Weibull families for the size classes above 52.5 cm. Despite some differences, the shapes of fitted curves of grouped and ungrouped data were generally similar. Table 1. Parameter estimates and goodness-of-fit statistics for the mixture of log-normal, gamma, and Weibull distributions fitted to sample 1 when DBH data are ungrouped (UG), grouped in classes of width 2.5 cm (G2.5), and grouped in classes of width 5 cm (G5) . It should be noted that the estimated vector of mixing parameters is not given for the sake of saving space. Sample 2 was characterized by a broadly bimodal to multimodal distribution with an understory cohort between 12.5-45 cm in DBH, no trees between 45-65 cm in DBH, and a small overstory cohort between 65-75 cm in DBH ( Figure A2). The understory cohort had a mode in the 22-22.5 cm class and one in the 30-35 cm classes. Overall, the two-component Weibull was the superior model for grouped-5 and ungrouped whereas the three-parameter gamma and log-normal were superior for grouped-2.5 ( Table 2). The shapes fit by the two-component mixture model families were similar, particularly of the log-normal and gamma families, but the Weibull family provided the better fit by placing the first mode correctly between 30-35 cm and providing a closer fit to the second mode between 65-75 cm ( Figure A2a-d). In general, all models underestimated the density of trees in the first in grouped-2.5, but successfully reflected the bimodal shape of the underlying DBH distribution. The shapes of the curves of grouped-2.5 and grouped-5 as well as of grouped and their corresponding ungrouped data forms were very similar. Among the three-component mixture models, the log-normal and gamma families were consistently identified as superior for grouped and ungrouped data ( Table 2). This is largely because of nearly identical shapes of the three-component log-normal and gamma mixture models that identified the same modes, whereas the Weibull family missed the actual mode of the underlying DBH distribution, particularly for grouped-5 ( Figure A2e-h). Even though the first mode was a more obvious feature of grouped-2.5, both log-normal and gamma models portrayed this mode more sharply for grouped-5 but overestimated the density of the 30-35 cm and 65-75 cm classes. All three model families captured the empty size classes between 45-60 cm. For ungrouped data, the three model families provided similar shapes of the fitted curves and agreed rather well on placing the modes, with slightly better fits provided by the log-normal and gamma than the Weibull model. The overall relative advantage of the log-normal and gamma over the Weibull mixture distribution was a consistent similar shape for grouped and ungrouped forms. Table 2. Parameter estimates and goodness-of-fit statistics for the mixture of log-normal, gamma, and Weibull distributions fitted to sample 2 when DBH data are ungrouped (UG), grouped in classes of width 2.5 cm (G2.5), and grouped in classes of width 5 cm (G5) . It should be noted that the estimated vector of mixing parameters is not given for the sake of saving space. Sample 3 was characterized by a generally negative exponential or reverse-J diameter distribution typical of many old-growth forests where the number of trees initially declines sharply with increasing tree size ( Figures A3 and A4). In contrast to the previous two samples, the diameter distribution in sample 3 was much wider, contained very few empty diameter classes toward the larger end of the long right-tailed distribution (between 80 and 100 cm DBH; Figures A3 and A4), and was mildly multi-modal. The multi-modality was largely smoothed over by all two-component mixture models but were more clearly expressed in the three-and four-component mixture models. Overall, the three-component log-normal was the superior model for grouped-2.5 and grouped-5 (based on AIC) whereas the four-parameter log-normal was superior for grouped-2.5 and grouped-2.5 (based on LL) and ungrouped (Tables 3 and 4). Though broadly similar, the shapes of the threeand four-component log-normal and gamma models differed from the Weibull model at the first mode around the 10 cm size class, for which the Weibull provided the better fit, particularly for grouped-5, and around the mid-sized classes between 20-40 cm, which the Weibull model typically underestimated. For ungrouped data, the three model families provided similar shapes of the fitted curves and agreed rather well on placing the modes, with slightly better fits provided by the log-normal and gamma than the Weibull model. All three mixture model families provided curves that fit the data much more closely when the data were analyzed in ungrouped form. Table 3. Parameter estimates and goodness-of-fit statistics for the mixture of log-normal, gamma, and Weibull distributions fitted to sample 3 when DBH data are ungrouped (UG) and grouped (G) in classes of width 2.5 cm. It should be noted that the estimated vector of mixing parameters is not given for the sake of saving space. In all three samples, the log-normal mixture model generally did a comparable, if not better, job fitting the ungrouped data than the gamma mixture model and both models outperformed the GSM ( Table 5). The log-normal mixture model was identified as superior to the gamma mixture model by all three goodness-of-fit measures in sample 1 and the Kolmogorov-Smirnov (KS) and the Cramér-von Mises (CVM) measures in sample 2, and the KS and Anderson-Darling (AD) measures in sample 3 (indicated by boldface values in Table 5). The NK model was identified as the superior model by the KS measure in sample 2 and by all three measures in sample 3 while the GSM model consistently exhibited the worst performance in all three samples. The differences in goodness-of-fit among the three model families can be readily seen when the PDFs of the log-normal mixture, NK, and GSM models were superimposed onto the DBH distributions of all three samples ( Figure A5). Table 4. Parameter estimates and goodness-of-fit statistics for the mixture of log-normal, gamma, and Weibull distributions fitted to sample 3 in grouped case when class width is 5 cm. It should be noted that the estimated vector of mixing parameters is not given for the sake of saving space.

Discussion
The EM algorithm was quite successful fitting two-and three-parameter gamma, lognormal, and Weibull mixture distributions to three empirically observed example diameter distributions. Empirical diameter distributions in natural forests are often characterized by random, local, irregular, and multiple modes that reflect peaks of establishment of natural regeneration at certain time intervals and/or episodic growth releases of individual or small groups of trees following disturbances or gap dynamics [4,14,36]. Theoretical probability density functions anticipate gradual, but not necessarily small, differences in the frequency of trees in neighboring size classes and are generally more successful at approximating multimodal than irregular distributions [25]. Thus, as long as empirical diameter distributions do not exhibit large, erratic differences in the frequency of trees in neighboring size classes (i.e., irregular distributions), kernel density and theoretical density functions can provide smoothed fits that closely approximate various shapes of empirical diameter distributions. In this study, flexible two-and three-parameter log-normal and gamma mixture models were particularly successful at approximated two differently shaped, highly skewed, heavy tailed, and multimodal empirical DBH distributions with empty size classes.
The log-normal, gamma, and Weibull mixture distribution families are often used to analyze heterogeneous lifetime or survival data [37], which DBH distributions represent. In this study, these three mixture model distribution families did not perform equally well, however. In most cases, but not always, the log-normal and the gamma mixture models provided a closer fit to the empirical diameter distributions than the Weibull model; in most cases, the log-normal mixture models were only slightly superior to the gamma mixture models. Although mixture models are able to approximate stands with multimodal diameter distributions with high accuracy and fit DBH distributions very well around the largest maxima [14,15,29], the observed qualitative differences among the distribution families were largely due to an underestimation of the tree frequency in the class of the global maximum density, with more gradual and delayed changes in the shape of the Weibull mixture distribution than the log-normal and gamma mixture distributions ( Figures A1 and A2).
The accuracy of the fit of all three mixture models also depended on the number of components of each model. AIC and LL indicated that two components generated the superior model in sample 1 and three or four components provided the better fit for the DBH distributions in samples 2 and 3 , reflecting the fact that the number of components in mixture models should be related to the number of maxima observed in a distribution [14]. Whereas all mixture models using fewer components than the number of maxima smoothed over and missed some local maxima and generally overestimated densities in empty classes, mixture models that matched the number of maxima were able to more closely trace the subtleties of the DBH shapes. As [15] point out, however, the choice of using a two-or three-component model depends largely on the study objective. If the study objective is to fit theoretical distribution models as precisely as possible to a specific empirical data set, then the number of mixture components should be matched to the number of random, local extremes. If, however, the objective is to make more general inferences about regeneration or stand dynamics, then modeling random, local multimodality would not be of central interest and the focus should be on the separation of local maxima that reflect the existence and dynamics of subpopulations. For these types of investigations, two-component mixture models are often sufficient [15,18,23,25,30]. While the decision to select a two-, three-or higher-component mixed model may depend on the specific objective of the study, we found that the ungrouped data format gave more similar results among the model families that generally fit the underlying distribution very well. An additional benefit was that no decision on the width of the size classes needed to be made.
It has also been reported that the implementation of mixture models with more than two parameters is problematic, because the estimation process may fail to converge, the algorithm may become extremely unstable, and the global maximum of the likelihood function may not be found [14]. To avoid non-convergence, the GSM model has been promoted as an alternative, because it smooths small local DBH maxima, making it a useful model for approximating the empirical DBH distributions in stratified stands with complex structures [14]. In this study, however, the EM algorithm always led to convergence and robust estimates of parameters. In addition to a consistently superior performance of the two-and three-component log-normal and gamma mixture models over the GSM model, the speed taken by the central processing unit (CPU) for estimating the parameters was much faster for the finite mixture models (i.e., less than 1.5 seconds for all three samples) compared to the time needed for estimating the 250 components of the GSM model (i.e., 120, 150, and 720 seconds for samples 1 and 150 seconds for samples 1, 2, and 3, respectively). and 2, respectively).
The performances of the two-and three-component log-normal and gamma-mixture models were comparable, and often superior, to the approximation obtained using the highly flexible NK model [14,15,38]. This reflects the capacity of two-and three-component log-normal, gamma, and Weibull mixture distributions to very accurately approximate mul-timodal DBH distributions that describe complex forest structures with three or multiple age cohorts [25]. As exemplified in the two-cohort stands of samples 1 and 2, when DBH distributions reach a first local maximum, decrease thereafter, and increase again to reach a second local maximum (i.e., a rotated sigmoid distribution), two-and three-component mixture models fit very well. In these instances, the mixture models fit around the distinct and sharp local maxima and largely smooth over smaller local maxima, which often express random multimodality and whose overall influence on the quality of the approximation is limited [15]. In contrast, kernel density estimators smooth over the distinct and sharp extremes (i.e., both maxima and empty DBH classes), leading to less precise approximations than in multilayered stands with smoother distributions [15], as was seen with the reverse-J DBH distribution in separate heavy-tailed and highly skewed plot whose DBH is not given in this study. The successful performance of the NK model also reflects that an accurate parametric model was selected to fit the empirical DBH data that overcame one of the main limitations that challenge finite mixture distributions, namely the failure of the estimation process to converge or find the global maximum. We conclude that the EM algorithm, at least for the sample plots used in this study, provided initial values that were sufficiently close to the values for which the log likelihood function was able to reach the global maximum.

Material
The EM algorithm was applied to example data collected in three plots located in two different sampling areas. The first sample used two plots of 0.08 ha size that were part of a larger study on the effects of prescribed burning established in multi-aged mixed ponderosa pine (Pinus ponderosa Dougl. ex Laws.) that contained scattered western junipers (Juniperus occidentalis Hook.). The plots were located in the Malheur National Forest on the southern end of the blue Mountains near Burns, Oregon, USA [39]. Of the many variables originally collected in this study, we only used the diameter at breast height (DBH, measured at 1.3 m height) of all live trees in plot 75 (sample 1) and plot 23 (sample 2). The second sampling area of the study was located in district 1, compartment 111 in the Kordkuy forest in the Golestan province of northern Iran (UTM zone 40: E247508, N4065346). The forest is a semi-natural multi-aged Oriental beech (Fagus orientalis Lipsky) dominated forest that is managed with the single-tree selection system. One 100 × 100 m plot was established in an area of the compartment that had a tree canopy cover of 85%. All living trees in the plot with a DBH greater than 6 cm were measured using two caliper readings at an angle of 90 • to one another. The plot summary statistics are provided in Table 6.
The mixing parameters ω k s are non-negative and sum to one, i.e., ∑ K k=1 ω k = 1. In this study, f (.|θ k ) denotes the PDF of gamma, log-normal, or Weibull distribution given, respectively, by the following: where x > 0 and θ k = (α, β) . For the families given in (2) and (4), α and β play the role of the shape and scale parameters, respectively. Assuming that x = (x 1 , . . . , x n ) are coming from a mixture model with PDF (1), the ML estimator Θ of the parameter vector Θ is obtained by maximizing the log-likelihood function. This means In order to estimate Θ through the ML approach, computer packages use iterative methods such as the NR algorithm, which does not always converge on the global maximum, particularly for irregular and incomplete distributions with heavy tails. When encountering incomplete data problems, the EM algorithm introduced by [40] is the most popular inferential tool for estimating the parameter vector.

Application of Mixture Models to Sample Plots
Mixture models of gamma, log-normal, and Weibull distributions were each fitted to samples 1-3. Groups in all three samples were either 2.5 cm or 5 cm wide (hereafter grouped-2.5 or grouped-5). Parameters of the mixture models in both grouped and ungrouped cases were estimated using the EM algorithm described briefly in Appendices A and B. We note that the initial values for implementing the EM algorithm were obtained through the K-means method of clustering in the R [41] environment using command kmeans(.). To do this, data were partitioned into K groups and the proportion of data points belonging to the k-th cluster was considered as the initial values for k-th mixing parameter for k = 1, . . . , K. For each group k, the initial values for α k and β k were obtained using the method of moments. For applying the K-means approach to the grouped DBH observations, the midpoint of the i-th cluster is repeated f i times for i = 1, . . . , m in order to obtain n observations for applying K-means approach. For implementing the EM algorithm, we have provided an R package called ForestFit [42]. Performance of the various mixture models was compared for both grouped and ungrouped data. The performances were further compared with the GSM and NK models using only the ungrouped data. The Kcomponent GSM model with PDF given by where Θ = (ω 1 , . . . , ω K , β) has been used recently for modelling the DBH distribution [28]. The NK model estimate the fitted PDF f (x), to the data using where x 1 , . . . , x n are DBH observations, h is bin size, and K(y) is a mathematical function that satisfies ∞ −∞ K(y) = 1. It should be noted that in the present study, h is determined by the Plug-in bandwidth estimator proposed by [43] and the kernel was the Gaussian type defined as K(y) = 1/ √ 2π exp −y 2 /2 . The statistical packages GSM [27] and kerdiest [44] that were developed for the R [41] environment were used to fit the GSM and NK models, respectively. Also, for having a fair comparison, we suppose that data are given in ungrouped from. As statistical goodness-of-fit measures, we used the AIC, AD, CVM, KS, and LL given by were x (i) denotes the i-th ordered value of DBH and F(.|Θ) is cumulative distribution function. For computing statistics AIC, AD, CVM, and LL given in (5)-(8), we assume that Θ is estimated using the EM algorithm. We note that the smaller (larger) values of AIC (LL) indicate better models.

Conclusions
We derived the expectation-maximization (EM) algorithm for estimating parameters of the most commonly used finite mixture distributions (i.e., gamma, log-normal, and Weibull mixture models) fitted to grouped and ungrouped empirical diameter at breast height (DBH) data. We used three sample plots with different DBH distributions to showcase the EM algorithm, investigated the performance of the mixture models, and compared their performance to nonparametric kernel-based density estimation (NK) and gammashaped mixture (GSM) model. We want to stress, however, that while our analysis of three different example sample plots provided very encouraging results for the utility of the EM algorithm to model with irregular diameter distributions, the conclusions about the relative performances of the gamma, log-normal, and Weibull mixture models apply only to these three sample plots and should not be extrapolated beyond these plots. Sample plots represented a bimodal rotated sigmoid distribution without any trees between 30 and 50 cm in size, a bimodal distribution with understory trees up to 45 cm, no trees between 45 and 65 cm in DBH, and a small overstory component between 65 and 75 cm (mixed-age ponderosa pine with scattered western junipers in both plots), and a reverse-J diameter distribution with several small modes toward the right tail of the distribution (mixed, deciduous uneven-aged Oriental beech forest). In these mixedspecies, two-cohort/two-layered, and multi-cohort/multi-layered stands, the two-and three-component finite log-normal and gamma mixture models modeled empirical DBHs (grouped and ungrouped) very well and generally outperformed the Weibull mixture model. Both of these mixed model types also consistently outperformed the very flexible GSM model that has previously been shown to work well for stands with high skewness and heavy tails. The log-normal and gamma distributions were superior to the NK model in clearly bimodal stands and were inferior in the reverse-J situation that contained several minor modes along the right tail of the distribution. Because nonparametric models preclude the computation of statistics such as standard errors and confidence intervals or hypothesis testing of the estimated parameters, parametric mixture models are generally regarded as a more insightful approach for fitting DBH distributions. We conclude that the two-and three-component log-normal and gamma mixture models are well suited to characterize multimodal DBH distributions for natural stands with two, three or multiple age cohorts of complex structure. The capability of these models to approximate and quantify multimodal empirical DBH distributions makes these models a valuable tool for investigating forest dynamics in complex stands that increasingly guide management approaches in forestry. Acknowledgments: Authors would like to thank three anonymous referees for their constructive comments that greatly improved quality of the paper. The second author would like to thank S. Ghalandarayeshi from Department of Statistics of Gonbad Kavous University for sharing experimental data of the sampling area that is located in Iran.

Conflicts of Interest:
The authors declare no conflict of interest.
Sample Availability: Samples of the compounds are available from the second author.

Appendix A. A Brief Introduction to the EM Algorithm for Mixture Models Fitted to Ungrouped Data
In this study we use the EM algorithm for estimating Θ in the mixture model (1) when x is available in both grouped and ungrouped (raw) forms. The EM algorithm finds the ML estimators for the parameters of a statistical model when some information about the model is missing. For a k-th component mixture model, each observed value is accompanied by a label that determines the observation belongs to which component. For example, the paired sample (x i , j) shows that i-th observed value x i belongs to j-th component (for i = 1, . . . , n and k = 1, . . . , K) in which j is unknown label or origin of component. Here, we have two sets of data (or variables) including the set of observed variable and latent variable (labels). A sample including both of observed and latent variables is called complete data. For a mixture model corresponds to (1), the observed data (DBH data) are denoted by x 1 , . . . , x n where the k-th component of the mixture has the PDF f . θ k . In the EM algorithm framework, the observed data corresponding to (1) are denoted by x 1 , . . . , x n where the k-th component of the mixture has the pdf f . θ k . We denote the complete data by ξ = ξ 1 , . . . , ξ n = (x 1 , z 1 ), . . . , (x n , z n ) in which z i = (z i1 , . . . , z iK ) is the latent realization of Z i = (Z i1 , . . . , Z iK ) defining the origin component of x i for i = 1, . . . , n. In each realization of Z i , one of its components, say Z ik , equals to 1 and the others are zero. Such realization states that y i comes from the k-th component of the mixture model. We note that by this construction the observation x i comes from the k-th component when Z ik = 1, for i = 1, . . . , n and k = 1, . . . , K. The complete data log-likelihood function, i.e., l c (Θ, ξ) is given as where the constant C is independent of parameter vector Θ. Each EM algorithm has two parts, including expectation (E) and maximization (M) steps. Both E-and M-steps are repeated until convergence occurs. In what follows, we introduce both the E-and M-step of the EM algorithm. Assuming that we are at the (t + 1)-th iteration of the EM algorithm, the E-step requires the calculation of the conditional expectation Q Θ Θ (t) of the complete data log-likelihood function given the observed data x and a current estimate , θ (t) of the parameter vector Θ. We have  (2) where θ = (α, β) . Substitute pdf of the gamma distribution into the right-hand side of (A1) to obtain where C is a constant independent of θ k = (α k , β k ) . Assume that we are currently performing the (t + 1)-th iteration of the EM algorithm. We maximize (A3) with respect to α k and β k in order to update α .

Appendix A.2. M-Step of the EM Algorithm for Mixture of Log-Normal Distributions
Suppose x = (x 1 , . . . , x n ) denotes a sample of n independent observations from mixture of log-normal distributions with pdf given in (3) where θ = (α, β) . Substitute pdf of the log-normal distribution into the right-hand side of (A1) to obtain where C is a constant independent of θ k = (α k , β k ) . Assume that we are currently performing the (t + 1)-th iteration of the EM algorithm. We maximize (A4) with respect to α k and β k in order to update α , respectively. It follows that Step of the EM Algorithm for Mixture of Weibull Distributions Suppose x = (x 1 , . . . , x n ) denotes a sample of n independent observations from mixture of Weibull distributions with pdf given in (4) where θ = (α, β) . Substitute pdf of the gamma distribution into the right-hand side of (A1) to obtain where C is a constant independent of θ k = (α k , β k ) . Assume that we are currently performing the (t + 1)-th iteration of the EM algorithm. We maximize (A5) with respect to α k and β k in order to update α For more details about the EM algorithm for mixture models fitted to ungrouped data, we refer reader to [26,45].

Appendix B. A Brief Introduction to the EM Algorithm for Mixture Models Fitted to Grouped Data
Suppose each element of the random sample x = (x 1 , . . . , x n ) follows the PDF given in (1). Further assume that sample x has been partitioned into m mutually exclusive groups each of the form (a i , b i ) for i = 1, . . . , m. We note that the a 1 and b m can be regarded, respectively, as the minimum and maximum observed values and ∪ m i=1 (a i , b i ) ⊆ S in which S denotes the support of the distribution. It is worth noting that we just know about the number n i of observations falling in (a i , b i ) and the fact that a i < x ij < b i for j = 1, . . . , n i and i = 1 . . . , m. So, the vector of observed data is given by y = (n 1 , a 1 , b 1 ) , (n 2 , a 2 , b 2 ) , . . . , (n m , a m , b m ) . Table A3. Initial values of the EM algorithm for sample 3 obtained using the K-means clustering approach when DBH data are ungrouped (UG) and grouped (G) in classes of width 2.5 cm. It should be noted that the estimated vector of mixing parameters is not given for the sake of saving space.

K Type
Family Fore more details about the EM algorithm for mixture models fitted to grouped data, we refer reader to [46,47].