Analysis of Longitudinal Forest Data on Individual-Tree and Whole-Stand Attributes Using a Stochastic Differential Equation Model

This paper focuses on individual-tree and whole-stand growth models for uneven-aged and mixed-species stands in Lithuania. All the growth models were derived using a single trivariate diffusion process defined by a mixed-effect parameters trivariate stochastic differential equation describing the tree diameter, potentially available area, and height. The mixed-effect parameters of the newly developed trivariate transition probability density function were estimated using an approximate maximum likelihood procedure. Using the relationship between the multivariate probability density and univariate marginal (conditional) densities, the growth equations were derived to predict or forecast the individual-tree and whole-stand variables, such as diameter, potentially available area, height, basal area, and stand density. All the results are illustrated using an observed dataset from 53 permanent experimental plots remeasured from 1 to 7 times. The computed statistical measures showed high predictive and forecast accuracy compared with validation data that were not used to find parameter estimates. All the results were implemented in the Maple computer algebra system.


Introduction
Forest growth forecasts play an important role in the planning of timber resources and maintaining strategies of sustainable forest management. Significant errors in stand growth forecasts lead to an unreasonable use of forest resources, and they also result in unsustainable forestry. In recent decades, the large number of publications on growth models has mainly dealt with even-aged pure stands, and has reflected the basic principles of the mean response regression approach that takes into account the existing experimental databases.
The usefulness of an individual-tree-based machine learning model in predicting and forecasting a stand's response to treatment is due in part to the sensitivity obtained by treating each tree as a separate entity. A modern unified modeling system that can provide an accurate and realistic forecast of forest growth for different species in different terrain and stand conditions has many different applications. Stand growth model systems are composed of many independent system components playing key roles in the behavior of the whole system, and interacting among themselves. The complexity of stand growth systems can be characterized by understanding the relationships, interactions, and behavior of the strongly interdependent variables.
The traditional approach to individual-tree and whole-stand growth modeling is a complex process of model development and selection, where multiple models or model variants are developed and evaluated to determine the model that "best" describes a separate simulated variable [1][2][3]. The evaluation of the distribution shape of tree size variables is of particular importance for the assessment of forest resources and the planning of future forestry procedures. Diverse shapes of distributions of tree size variables have long been used in forest statistics to generalize the structure of stands as, firstly, they are easy to estimate from national forest inventories; and secondly, they become useful themselves to formalize the relationships between the tree or stand variables [4,5].
Most mathematical models describing the actions of an individual tree or the collective trees in a stand aim to create reliable formalizations of the problem with the help of limited experimental datasets. Stochastic consistency with uncontrollable environmental rules or other random influences in a forest stand pertaining to the greater picture is rarely imposed. Studies in recent decades have sought to demonstrate the need for a more critical examination of the paradigms used in the current context, which may lead to models that would stand the test of time.
Considering the evolution of individual-tree and whole-stand growth paradigms over the past decades, we focus on the "diffusion process" theory of how trees in a stand can respond differently individually and collectively to both their internal and environmental factors, which is based on the Brownian motion model. Diffusion processes operate in many natural phenomena starting in the field of physics [6], but diffusion processes have been used in many biological, engineering, economic, and social phenomena [7][8][9]. In summary, a diffusion process is a stochastic continuous time process that satisfies the formalized stochastic differential equation. Since the data on tree or stand variables are obtained from national forest inventories and experimental plots, they consist of observations through time, so it is natural to define the kinetics of trees and stands by using stochastic differential equations. Mixed-effect parameters stochastic differential equations have also been used to define the kinetics of tree height via age [10] and via diameter [11], and tree crown width via age [12]. The usefulness of mixed-effect parameters, which account for both fixed and random effects, in tree or stand growth modeling lies in their ability to split the total variation into within and between stands by using fixed and random effects. The stochastic differential mixed-effect models have long been useful tools in medicine and biology for revealing incomplete or inaccurate model kinetics [13,14].
The mean response regression models used in forestry, in summary, although well advanced and developed in recent years, are still in a very primitive phase, and it is necessary to select one of the "best" of the many possible relationships to model a single variable. First, the models used do not use a mathematical formula describing the evolution from the beginning of the stand establishment to the age of extinction, and the natural growth or decay of the tree or the stand growth variables are not continuously maintained. Second, the equations in the derived models depend on unknown parameters (effects), the specific physical interpretations of which are usually not understood. Third, the developed models representing the growth process do not address the covariance between the fundamental tree size variables. Although many other mean response regression models that adjust the observed data are already available, it is clear that none of them are a panacea for the many problems that confront forest statisticians. In this sense, it is important to propose and evaluate the mixed-effect parameters trivariate stochastic differential equation model for tree diameter, potentially available area, and height kinetics by age. For example, fixed-and mixed-effect parameters bivariate stochastic differential equation models were proposed in [15,16].
The main purpose of this investigation is to explore a trivariate Vasicek-Gompertz-Vasicek stochastic differential equation with fixed-and mixed-effect parameters. It should be noted that our proposed trivariate stochastic differential equation has an exact solution expressed in a trivariate probability density function. All the results were implemented in the Maple computer algebra system [17].
The novelty and contribution of our work is summarized as follows:  We introduce a trivariate Vasicek-Gompertz-Vasicek stochastic differential equation with biologically interpretable parameters (see, for instance, [11]).


The fixed-and mixed-effect parameters are estimated using the maximum likelihood procedure.  Newly derived equations of the mean and quantiles of the tree diameter, potentially available area, and height kinetics in the case of an individual-tree or the whole stand are illustrated and tested using a validation dataset.  Newly derived equations of the stand density are illustrated and tested using statistical measures and a validation dataset.  Newly derived equations of the tree and the stand basal area are illustrated and evaluated using statistical measures and a validation dataset.  The accuracy of the forecasts for the 5-and 15-year forecast periods is assessed using statistical measures and a validation dataset.
The forecast provided in this study is understood as a special prediction related only to data from past events in order to generate or transmit data for the future based on past events. Foresters typically use a projection to calculate a numerical value associated with a future event, which allows for more hypothetical inputs that are assumptions consistent with the purpose of the information, but are not necessarily the most probable [18].

Background
A stochastic differential equation model for height and diameter kinetics during a stand growth period was described in [19], regardless of the potentially available area dynamics. The potentially available area of an individual tree can be easily and reliably calculated using Voronoi diagrams [20]. To illustrate a Voronoi diagram, a polygon (potentially available area) of each individual pine, spruce, and birch tree in a randomly selected plot is shown in different colors in Figure 1. There are several standard stochastic differential equations that can be reduced to the Ornstein-Uhlenbeck model, such as Vasicek, Gompertz, Bertalanffy, and others [21]. Let the tree diameter at breast height (in the sequel-diameter), , the tree potentially available area (in the sequel-area), , and the tree height, (t ∈ [t0, T ]; t0 ≥ 0, l = 1, …, M, and M is the number of plots), frame the trivariate homogenous Vasicek-Gompertz-Vasicek diffusion process, which takes positive values and satisfies the following non-linear Ito-type [22] system of stochastic differential equations: where the drift term is defined as: and the diffusion matrix is defined as: (3) Applying the Ito formula [22] to the transformation: We can deduce that the solution of our original stochastic differential Equation (1) has a trivariate normal-lognormal-normal distribution ; , with the mean vector defined by: , , The variance-covariance matrix : and the transition probability density function: , , , | , 1 , , , | , , , , , , , , , , , , , , , , , l = 1, …, M.

Data
For each tree in the considered plots, we recorded the tree species, age, location of sample trees (planar coordinate position x and y), diameter at breast height, and height for 53 plots across the municipality of Kazlų Rūda in Lithuania. During the 1983-1987 period, 53 permanent experimental plots were established in the Kazlų Rūda forests in Lithuania. Regarding the regeneration mode, all the field sample plots vary between those that were naturally regenerated and those that were artificially regenerated, and spread in pure or mixed-species stands. The distribution of stands by tree species is: pine (Pinus sylvestris L.) stands, 63.8%; spruce (Picea abies L. Karst.), 30.2%; silver birch (Betula pendula Roth.), 5.8%; and others, 0.2%. Each sample plot consisted of about 0.16-0.72 ha and was remeasured several times. The plots were remeasured one to seven times at intervals of five years or more. The age of the i-th tree (ranging from all trees to the 10th) in the first measurement was recorded by counting its growth rings in the growth core (for even aged stands, from records in documents), and the age of the remaining trees was obtained from the arithmetic mean. The position accuracy of the plane coordinates was 1 dcm, the diameter measurements were made with an accuracy of approximately 1 mm, and the height measurements were made with an accuracy of approximately 1 dcm. The entire field sample plot dataset was randomly divided into estimation (43 plots) and validation (10 plots) datasets. Table 1 shows the summary statistics of the estimation and validation datasets.

Statistical Measures
In this study, we assessed the prediction and forecast accuracies of the newly developed models based on statistical measures, such as the mean bias B (the percentage mean bias, % ), the absolute mean bias AB (the percentage absolute mean bias, % ), the root mean square error RMSE (the percentage root mean square error, % ), and the coefficient of determination R 2 . These statistical measures are defined by: where: ∑ is the total number of observations used to validate the model; K is the number of stands; ni is the number of measured trees in the ith plot; and , ∧ , and are the measured, estimated, and average values of the dependent variable. The Student's t-test [23] determines whether there is any difference between the observed values of the validation dataset and the predicted or forecasted values.

Marginal Distributions
Traditional height-age or diameter-age relationship modeling methods are based on a difference equation that requires repeated measurements of data in different stands. Therefore, these data are hierarchical structures on which observations are dependent and often lead to bias in parameter estimates [24].
In the previous section, a stochastic differential equation model (1) was formalized for tree diameter, area, and height assessment against age, with the respective 17 fixedeffect parameters having been estimated. In this study, the fixed-effect parameters of Equation (1) were estimated using an approximate maximum likelihood procedure, which is framed by Equations (A17)-(A21) and implemented using the Maple environment [17]. Computed fixed-effect parameters estimates using the estimation dataset are given in Table 2. Estimates of standard errors were obtained as the inverse of the observed Fisher information matrix (see, for instance, [19]). Parameter estimates of all fitted models were significant (p < 0.05). As can be seen from Table 2, the asymptotic values (α1) of the tree diameter are 54.2402, 54.5705, and 46.5284 cm for Scots pine, Norway spruce, and silver birch trees, respectively. The silver birch trees have the smallest asymptotic diameter value compared with the Scots pine and Norway spruce species, but the difference between the Scots pine and Norway spruce species is insignificant. Similarly, Table 2 shows that the asymptotic values (α3) of the tree height are 42.1928, 42.3987, and 39.4933 m for the Scots pine, Norway spruce, and silver birch, respectively. The silver birch has the smallest asymptotic height value compared with the Scots pine and Norway spruce species, but the difference between the Scots pine and Norway spruce species is also insignificant.
In further study, only the validation dataset will be used, the fixed effect parameters will be taken from Table 2, and the random effects will be calibrated according to Equation (A22).
According to regeneration treatment, forest trees are often in a cyclical regime, which includes artificial or natural regeneration, spread in pure or mixed stands. As a special case of continuous stochastic process, a stationary system of marginal probability density functions may be obtained (by limiting age, t, to infinity), which displays a type of forest stand equilibrium. The marginal transient probability density functions for the diameter, area, and height variables for the three different age groups of 30, 70, and 120 years are shown in Figures A1-A3, respectively.
Figures A1-A3 show that marginal probability density functions change with stand age and show quite significant differences for different stands. These distributions allow us to define the trajectories of the mean and various moments for diameter, area, and height. According to the equations of the mean and quantile trajectories presented in Ta

Diameter, Area and Height Predictions and Forecasts.
The consistent conservation of forest resources can only be achieved through the use of reliable forest inventories, and accurate forest prediction and forecast models. Kinetic models of tree size variables are needed to evaluate and verify the long-term performance of biological and economic tree-growing programs.
In order to apply the stochastic differential Equation (1) model to the modeling of tree size variables in a new plot, we distinguished two scenarios (prediction and forecast) based on the calibration of random effects using Equation (A22). For the first scenario (prediction), the random effects for a new plot from the validation dataset were calibrated using all remeasurement cycles in a plot. For the second scenario (forecast), the random effects for a new plot from the validation dataset were calibrated using only the first measurement cycle in a plot in order to define the forecasts (at the 5-and 15-year forecast periods) of the diameter, area, and height for each individual tree or stand. Most remeasured plots from the validation dataset were uneven-aged and mixed-species. The results of the statistical measures calculated using the plots from the validation dataset for the diameter, area, and height equations against the age (see Table 1), against the age and one additional explanatory variable (see Tables A1-A2), and against the age and two additional explanatory variables (see Equations (A11)-(A16) and Table A2) are presented in: Table 3 for the predictions of the individual-tree scenario models; Table 4 for the predictions of the whole-stand scenario models; Table 5 for the forecasts of the individual-tree scenario models; and Table 6 for the forecasts of the whole-stand scenario models, respectively.

Tree Species
Marginal Trends (Table A1) Conditional Trends (Equations (A11)-(A16) and To validate the mixed-effect parameters stochastic differential equation model, a validation dataset was applied to establish the main objective of the study, which was to develop prediction and forecast models for the individual tree and the whole stand. It is evident that the whole-stand models (see Tables 3 and 4) show better statistical measures and the highest compatibility with the individual-tree models. In summary, it can be stated that the observed validation dataset of the Scots pine species best corresponded to the diameter and height predictions (see Tables 3 and 4). The diameter and height predictions for the silver birch species were significantly worse than for the other tree species because, on the one hand, the number of measurements for this species was not large and, on the other hand, the age of the tree was accurately recorded for a small number of trees. The simulated conditional diameter and height models listed in Table A2 show that the inclusion of the tree potentially available area in the form of an explanatory variable improves the diameter and height predictions very slightly by up to 3%. Tables 3 and 4 show that the individual-tree predictions of the potentially available area have low statistical measures, but the whole-stand predictions of the potentially available area show significantly better statistical measures. The stochastic differential equation kinetic model developed in this study appears to allow a more accurate prediction of tree or stand diameter and height growth than previous models [25][26][27]. The relationship between the potentially available area of individual trees and the tree size variables is critical for scaling up individual-tree effects to whole-stand phenomena. However, this problem has rarely been discussed in the literature, with the exception of a few separate studies [28,29].
The following Tables 5 and 6 provide statistical measures for the 5-and 15-year forecasts forward, with the fixed-effect parameters taken from Table 2 and the random-effects calibrated by Equation (A22) using only first-cycle measurements from a validation dataset. An initial condition , , at initial age is defined using only first-cycle measurements from a validation dataset.  As can be seen from Tables 5 and 6, the statistical measures acquired worse values with an increasing forecast period. It is also clear that the statistical measures (see Tables  5 and 6) for the forecasts of the whole-stand variables were higher than the forecasts for the individual-tree variables. In summary, we can confirm that the observed validation dataset of the Scots pine species best corresponded to the diameter and height forecasts than the area (see Tables 5 and 6). The forecasts of the diameter, area, and height for the silver birch species were the worst compared with the other tree species, probably due to the small number of observations, which was three plots with 19 trees for the 5-year forecast period, and four plots with 12 trees for the 15-year forecast period, respectively.
Traditionally used empirical models develop statistical relationships to observed datasets and are not explicitly linked to the mechanisms that determine kinetics [30,31]. Our derived model, based on Equation (1), is able to characterize individual-tree and wholestand developments, and can be used for understanding and analyzing the behavior of forest stands. The advantage of the presented model is that it enables us to link any variable (diameter, area, or height) of the individual-tree or whole-stand with a separate remaining variable, both remaining or only with age, and formalize a dependency that additionally includes covariance between size variables.

Tree and Stand Basal Area Predictions and Forecasts
The static growth models that are commonly used in forest modeling aim to predict directly at a given age the individual-tree or whole-stand quantities of interest (diameter, height, and basal area). As more remeasurements of the network of permanent plots were available, a kinetic growth model was formalized. The basal area of the individual tree or the whole stand is directly related to other very important tree or stand variables, such as volume, biomass, and CO2 uptake. On the basis of stochastic differential Equation (1), the basal area kinetic model was formulated by use of the mean trends listed in Table A1 for the individual tree, b (measured in cm 2 ), and the whole stand per hectare, B (measured in m 2 ), modes in the following forms, respectively: , , , , , l = 1, 2, …, K 10,000 where K is the number of observed plots in the validation dataset; is the part of the area occupied by the tree species concerned ( 1, 0 1, 0 1, 0 1); the fixed-effect parameter estimations are denoted by "hat" and are taken from Table 2; the random effects φ are calibrated using Equation (A22); and the probability density function is defined using Equation (A1).
The kinetics of the mean tree basal area for the different tree species and the four different stands are shown in Figure 5. As can be seen from Figure 5, the trajectory of the mean tree basal area of the pine species trees is significantly higher than that of other species, and the trajectory of the spruce trees is the lowest. The results of the statistical measures calculated using the plots from the validation dataset for the mean tree basal area against the age; and against the age, area, and height (see Equations (17) and (18)) are presented in: Table 7 for the tree basal area predictions of the individual-tree scenario models; and Table 8 for the basal area per hectare predictions of the whole-stand (the basal area per hectare predicted values are compared with the basal area per hectare in a corresponding observed stand) scenario models. Looking at the accuracy of the basal area predictions expressed by the statistical measures calculated in Tables 7 and 8, we can state that the Norway spruce species trees are the worst predicted. Moreover, the addition of the tree potentially available area with the additional explanatory variable improves the basal area predictions only slightly; for example, the coefficient of determination is only up to 3%. The p-values of all the discussed models indicate that there is no bias between the predicted basal area values and the observed values (from the validation dataset). The statistical measures presented in Table 8 (calculated by comparing the values of the predicted basal area per hectare with the observed basal area per hectare) show better accuracy than comparing the predicted and observed basal areas of the individual trees (see Table 7). Tables 9 and 10 show that the 5-year forecasts are in good agreement with the observed values and have the normal property of a good forecast that it is unbiased. The statistical measures calculated for the 15-year forecasts declare worse values compared with the 5-year forecast accuracy measures. The kinetics of the stand basal area per hectare against the age or the predictions of the mean tree diameter for different tree species and four different stands are shown in Figure 6. As can be seen from Figure 6, the trajectory of the stand basal area of the pine species trees is significantly higher than that of the other species, and the trajectory of the spruce trees is the lowest. The results of the accuracy of the stand basal area per hectare predictions and the forecasts expressed by the statistical measures are calculated in Table 11. Table 11 shows that the stand basal area per hectare predictions and the 15-year forecasts are in good agreement with the observed values and are unbiased.

Stand Density Predictions and Forecasts
In large-scale forest scenario analysis tools, the key is the inclusion of models on stand density effects. Having formulated appropriate mathematical models of stand density, there is the need to have multi-cycle measurement data on stand density that cannot normally be obtained from European-scale forest inventories. As the data in this study include remeasurements from 1 to 7, the accuracy of stand density models is not difficult to assess. On the basis of stochastic differential Equation (1), the stand density kinetic model can be formulated using the mean trends of the area listed in Table A1 in the following where K is the number of observed plots in a validation dataset; is the part of the stand occupied by the tree species concerned ( 1, 0 1, 0 1, 0 1); the fixed-effect parameter estimations are taken from Table 2; and the random effects φ are calibrated by Equation (A22) using the validation dataset. The kinetics of the stand density area against the age for the different tree species and the four different stands are shown in Figure 7. The results of the statistical measures of the stand density predictions and forecasts are calculated in  The mathematical models of individual-tree and whole-stand variables discussed in Sections 3.1-3.3 are extensively analyzed in the previous literature. Most models remain static over a long period of intensive research and ignore the importance of covariance relationships. Another disadvantage of the existing models is that a large system of mathematical equations is used to model a specific tree or stand variable, and the most appropriate relationship is selected from the list using simple statistical indices. It is worth noting a few works, such as [32][33][34][35][36][37][38]. In [32], the description of tree size variables is made by applying statistical models to the diameter distributions. The authors of [33] studied the problems of the performance of the height-diameter models in a local and regional context, and analyzed their use for modeling stand variables. They then considered the extensions and possible limitations of a system of algebraic difference equations [34,35]. Cieszewski and Bailey examined the generalized algebraic difference approach that can be used to derive age-dependent invariant difference equations capable of describing variable asymptotes [36]. Pommerening and Muszta explored and analyzed the definitions of absolute and relative growth rates, growth acceleration, growth multipliers, and allometry from a mathematical point of view [37]. General physical ideas combined with modern methods in forest growth modeling are considered in detail in [4,38].

Conclusions
The main stage in this study consisted of adjusting the Gompertz-and Vasicek-type diffusion processes to model the stand density and basal area in both individual-tree and whole-stand scenarios for uneven mixed-species stands. The newly developed mixed-effect parameters trivariate probability density function of the tree diameter, potentially available area, and height enabled the formulation of the equations for the mean, quantile, and variance trends of the stand density, and the tree or stand basal area in mixed-species forest stands. The model forms and predictors were derived in such a way that the models can be used in all stand structures, and are able to be employed to simulate transformation from even-aged to multi-layered structures.
Our future research will be focused on the creation of a general multivariate copula model for the evaluation of individual-tree or whole-stand variables, such as basal area, density, volume, and their increments. Copula models are very popular because they simplify the specification of a multivariate distribution, allowing the marginal distributions to be modeled arbitrarily at different sample sizes, and then combined using a copula function.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement:
Original data presented in the study are included in the main text, and further inquiries can be directed to the corresponding author.
The univariate conditional distribution of | , j = 1, 3, l = 1, …, M at a given is univariate normal , ; , and the univariate conditional distribution of at a given , j = 1, 3 is univariate lognormal , ; , with the means and variances defined by: , , , 22 ln (A9) , 1 ≤ j, k ≤ 3 (A10) The conditional mean, median, mode, p-quantile (0 < p < 1), and variance trends of the diameter, area, and height are listed in Table A2. The conditional functions defined in Table A2 represent the cumulative growth of the diameter, area, and height attained by a tree at any particular age and any particular value of additional explanatory variable (such as diameter, area, and height). Table A2. Conditional mean, median, mode, p-quantile (0 < p < 1), and variance trends with one predictor.

Mode ,
Quantile (0 < p < 1) , ; The univariate conditional distribution of | at a given and is univariate normal , , ; , the univariate conditional distribution of at a given and is univariate lognormal , , ; , and the univariate conditional distribution of | at a given and is univariate normal , , ; , with the means and variances defined by: , , ln , , , , The conditional mean, median, mode, p-quantile (0 <p < 1), and variance with two explanatory variables for diameter, area, and height trends can be expressed as shown in Table 2 additionally by changing the mean