Seemingly Unrelated Mixed-Effects Biomass Models for Black Locust in West Poland

Information about tree biomass is important not only in the assessment of wood resources but also in the process of preparing forest management plans, as well as for estimating carbon stocks and their flow in forest ecosystems. The study aimed to develop empirical models for determining the dry mass of the aboveground parts of black locust trees and their components (stem, branches, and leaves). The research was carried out based on data collected in 13 stands (a total of 38 sample trees) of black locust located in western Poland. The model system was developed based on multivariate mixed-effect models using two approaches. In the first approach, biomass components and tree height were defined as dependent variables, while diameter at breast height was used as an independent variable. In the second approach, biomass components and diameter at breast height were dependent variables and tree height was defined as the independent variable. Both approaches enable the fixed-effect and cross-model random-effect prediction of aboveground dry biomass components of black locust. Cross-model random-effect prediction was obtained using additional measurements of two extreme trees, defined as trees characterized by the smallest and largest diameter at breast height in sample plot. This type of prediction is more precise (root mean square error for stem dry biomass for both approaches equals 77.603 and 188.139, respectively) than that of fixed-effects prediction (root mean square error for stem dry biomass for both approaches equals 238.716 and 206.933, respectively). The use of height as an independent variable increases the possibility of the practical application of the proposed solutions using remote data sources.


Introduction
Black locust (Robinia pseudoacacia L.) is an alien tree species in Poland. Its natural range covers the southeastern part of the United States, and its optimum elevation range is between 150 and 1500 m above sea level in the Appalachian Mountains [1]. This tree species was brought to Europe around 1600 by the French gardener Jean Robin [2]. Initially, black locust was grown mainly as a park tree. Over time, due to its properties, the species has found a wider application [2][3][4]. The mechanical properties of the black locust wood make it suitable for veneers, fence posts, poles, flooring, furniture, and boat building. It is also a source of valuable fuelwood because of its high calorific potential [4,5]. This tree species is recommended for soil restoration and regeneration of damaged habitats. Due to its low habitat requirements and ability to fix atmospheric nitrogen, it is planted on degraded, post-industrial, post-agricultural, and initial soils [6][7][8][9][10]. Black locust is now quite common in the United Kingdom, Germany, France, the Netherlands, Belgium, Italy, and Switzerland [11]. It is the most frequently planted tree species in Hungary, occupying 23% of the total forest area of the country [12]. Moreover, the distribution of black locust throughout Europe appears to be mostly limited by low temperatures, but global warming might enhance its growth in presently colder areas [13,14].
Black locust was brought to Poland in 1806 [15]. Currently, this tree species occurs almost all over the country, but most of it is found in the western part of Poland. The black locust stands occupy over 273,000 ha and provide 84,000 m 3 of wood annually. In areas with a high share of this tree species, wood demand outweighs the supply [16,17].
The information about tree biomass is important not only for the assessment of wood resources, but it is also a significant element in the process of forest management plan preparation. This data is needed for the estimation of carbon stock in forest ecosystems [18]. Concerning climate change, there is a need to monitor changes in the forest carbon stock and how they influence the atmospheric CO 2 concentration. Obtaining reliable data on forest biomass and carbon stock allows to improve the estimation of the forest carbon sink and its role in the global carbon cycle as a counteraction to the global climate change [19].
The estimation of the carbon stock accumulated in the forest is difficult and requires reliable tools for its estimation. Despite the growing importance of remote sensing methods in collecting biological data about the forest (including volume, aboveground biomass, and amount of accumulated carbon, etc.), the methods based on field measurements are still important tools to assess forest resources [20] and often are used to verify the results obtained by remote sensing techniques [21].
One of the methods of assessing the tree and stand biomass is to use empirical models based on easy-to-measure tree parameters such as diameter at breast height (DBH) or height (H). Due to the high variability in the shape and structure of tree stems (observed even within one tree species) resulting from the genetically determined variability, local growing conditions, forest treatments, etc., it seems necessary to develop local models to determine the biomass of trees and their components [22,23].
The tree and stand biomass data are characterized by a grouped structure, containing information about trees and sample plot. Modeled relationships between dependent and independent variables differ among sample plots. The mixed-effects models make it possible to describe dependencies at different levels of the data [24,25]. The fixed-effects approach allows modeling data for a typical sample plot (defined grouping level), while the random-effects approach defines difference between each plot and the typical plot [26]. The description of dependencies at different levels translates into the flexibility of predictive model. If no local information is available to estimate the random effects, the prediction using the fixed part of the model can be used. In the case of sufficient data availability, a more accurate random-effects prediction can be carried out [27][28][29]. Random-effects prediction takes on a new meaning in the case of multivariate models, in which the crossmodel correlations within the system of developed models based on the estimated best linear unbiased predictor (EBLUP) is used [28]. In addition, mixed-effects biomass models are fitted in different forms, as they can be nonlinear [30] or linear with logarithmic data transformation [26,27,31].
During biomass modeling, it is worth considering the logical assumption that the sum of the biomass components of the tree estimated using equations should be equal to the estimated biomass of the whole tree (so-called additivity of the biomass models). This assumption can be met by seemingly unrelated regression application [32,33]. The assumption about the additivity of the biomass model was the basis of biomass models e.g., in Germany [34], Canada [35], and Poland [36]. Above papers are examples of using the seemingly unrelated regression. However, the knowledge of the authors indicates that no research has been undertaken yet that focuses on the development of model systems using diameter at breast height and height as both independent and dependent variables.
This study aimed to develop empirical models for determining the dry mass of the aboveground part of black locust trees and its components (stem, branches, and leaves). The models were developed using multivariate mixed-effects models with sample plot as grouping level. To obtain a compatible set of models, we used a seemingly unrelated regression [32,33]. We fitted two approaches of the models. In the first one, as a classic one, DBH was the independent variable. In the second one, the H was defined as the independent variable. Second approach is a new concept to biomass modeling, which allows to increase the practical application of the proposed solutions using remote sensing data sources. Moreover, we developed the model system applying the cross-model calibration (random-effects prediction) of the biomass component models using H and DBH.

Study Sites
The data was collected in West Poland (two State Forest Districts with a total area equal to 50,000 ha, 52 • 08 -51 • 30 N, 15 • 23 -16 • 14 E), which is the region with the highest abundance of black locust in Polish forests [17]. We selected 13 stands located on oligotrophic sites that are typically occupied by the investigated species in Poland ( Table 1). The chosen stands grew on moderate fertile rusty soils. The study sites are located in a transition zone from maritime temperate to continental temperate climate [37]. The mean annual temperature reaches 9.5 • C. January is the coldest month with an average temperature slightly below 0 • C, while the highest average temperature is recorded in July (+19.0 • C). The average annual precipitation is rather low and rarely exceeds 550-600 mm. The majority of the rainfall is recorded during the vegetation period [38].

Material Collection and Preparation
At each study site, we established a rectangular sample plot whose size was set to comprise at least 100 trees (size range of sample plots varied between 864 and 5944 m 2 ). We measured DBH of all live trees on the study plot (accuracy 0.1 cm). Then, we randomly chose three samples representing the whole range of diameter structure and felled them for further analyses (a total of 38 sample trees [39]). The DBH of the harvested trees varied from 8.1 to 46.7 cm and their height ranged from 7.5 to 29.22 m. The mean values of these parameters amounted to 24.1 cm and 20.11 m, respectively ( Table 2). The felled trees were measured section-wise (1-m-long sections) and divided into the following biomass components: stem, branches, and foliage. All parts of each sample tree were weighed in the field using portable scales (precision 0.1 g). Samples of each of the components (stem discs in the middle of each section, and 50-150 g random samples of branches and foliage) from every tree were taken to determine the relationship between fresh and dry biomass. The samples were oven-dried at 105 • C until they reached a constant weight [40]. The dry biomass of various components (Table 2, Figure 1) was calculated for each sample tree based on corresponding fresh to dry mass ratios [41]. Table 2. Diameter at breast height (DBH, cm), tree height (H, m), stem dry biomass (SDB, kg), branches dry biomass (BDB, kg) and foliage dry biomass (FDB, kg) of black locust sample trees.

Individual Mixed-Effect Models
Two approaches were used when elaborating individual mixed-effects models. To ensure that the predicted variables are positive, the first approach involved modeling the dependence of dependent variables, as logarithms of dry biomass components (stem, branches, and foliage) and H-1.3, on DBH defined as the independent variable. In the second approach, the logarithms of dry biomass components and DBH were dependent variables, while H was defined as an independent variable. Thus, the general model for the dependent variable of tree j was: For both approaches, linear mixed-effects modeling with sample plot as a single grouping factor (no other potential factors of grouping were present) was used. Each model for the tree j, j = 1, …, ni of plot i, i = 1, …, M, was of the following form: where ′ is the fixed part and ′ + describes the random part. Moreover, it is assumed that:

Individual Mixed-Effect Models
Two approaches were used when elaborating individual mixed-effects models. To ensure that the predicted variables are positive, the first approach involved modeling the dependence of dependent variables, as logarithms of dry biomass components (stem, branches, and foliage) and H-1.3, on DBH defined as the independent variable. In the second approach, the logarithms of dry biomass components and DBH were dependent variables, while H was defined as an independent variable. Thus, the general model for the dependent variable of tree j was: For both approaches, linear mixed-effects modeling with sample plot as a single grouping factor (no other potential factors of grouping were present) was used. Each model for the tree j, j = 1, . . . , n i of plot i, i = 1, . . . , M, was of the following form: where x ij β is the fixed part and z ij b i + ij describes the random part. Moreover, it is assumed that: where b i are independent among plots and of the residual errors ij . Matrix ψ is a positive definite variance-covariance matrix of random effects. The zero-mean residual errors ij were mutually independent, normally distributed, with variance calculated using the following formulas: First approach [24]: where σ 2 and δ are the estimated scale and shape parameters of the power function, respectively. The special case when δ = 0 leads to homoscedastic residuals.

Seemingly Unrelated Mixed-Effects Model System
Seemingly unrelated (multivariate) mixed-effects models with random group effects are useful in the case when several independent variables are measured for the same sample units. In this modeling context, random-effects prediction utilizes the cross-model correlations of the model system, thus taking into account the logical interactions between the biomass components. Moreover, their use in addition to the fixed-effects prediction enables to apply the relationship among the random effects and residuals across models for random-effects prediction. Assuming that we have L individual mixed-effects models and series of the observations of the mth of plot i to vector y i (m) and the corresponding model matrices of the fixed and random elements to X i (m) and Z i (m) , the seemingly unrelated mixed-effects model system for each plot i, i = 1, . . . , m can be elaborated as [25,27]: where var b (l) i = ψ (l) for l = 1, . . . ,L. We assume that the cross-model correlations of the random effects and residuals are the same for all groups. The above multivariate model can be written as an univariate one: where y i = y (1) i , y i , . . . , y (L) i , correspondingly for b i and i and X i and Z i are blockdiagonal matrices, which have the response-specific model matrices on the diagonal [25,27].

Fixed-and Random-Effects Prediction
The main purpose of creating regression models is to use them for predicting dependent variable. The strength of mixed-effects models lies in the fact that, depending on the circumstances, it is possible to perform both fixed-and random-effects predictions, whereas the system of seemingly unrelated mixed-effects models allows to use also the cross-model correlations of random effects and residual errors to predict all dependent variables using measurements of some of them. This is possible by defining vectors that include only observed components and by defining matrices by removing rows and columns that correspond to the unobserved variables. During the analysis, the best linear predictor of the complete random effect vector is calculated, and the fixed part and variance-covariance matrices are replaced by their estimates, which leads to the EBLUP [25,27].
In this study, during plot-specific cross-model random-effects prediction, we used both seemingly unrelated mixed-effects model system approaches. To predict unobserved biomass components during calibration in the first and second approach, we used H and DBH as a measured dependent variable, respectively. In our case, dependent variables were predicted in a logarithmic scale. To transform them to the original scale we applied bias correction by adding half of the prediction variance into the predictions before applying the exponential transformation [42]. The fixed-effects approach was applied for 0 available trees in sample plot, whereas the cross-model random-effects prediction was created for two available extreme trees in sample plot [43]. We used DBH as their selection criterion. The remaining tree in sample plot was used to calculate the root mean square error (RMSE) based on the equation: where y j describes the observed value of the dependent variable,ŷ j predicted value of the dependent variable, and n explains the number of trees. All analyses and graphs were performed using the nlme [44] and ggplot2 [45] packages of R 3.6.3 software [46] and RStudio 1.3.1073 [47].

Results
The use of the logarithm of the dependent variables and the linear individual mixedeffects models with a sample plot as a grouping factor sufficiently reflects the modeled dependencies for both the first and the second approach. In the case of the first approach, Akaike Information Criterion (AIC) for height equals −9.9, while for stem dry biomass, branches dry biomass, and foliage dry biomass, it equals 32.4, 69.7, and 57.9, respectively ( Figure 2). In the case of the second approach, AIC for DBH equals 7.2, while for stem dry biomass, branches dry biomass, and foliage dry biomass, it equals 56.9, 104.5, and 78.2, respectively ( Figure 3).
(RMSE) based on the equation: describes the observed value of the dependent variable, ̂ pred the dependent variable, and explains the number of trees.

Results
The use of the logarithm of the dependent variables and the linear indiv effects models with a sample plot as a grouping factor sufficiently reflects dependencies for both the first and the second approach. In the case of the f Akaike Information Criterion (AIC) for height equals −9.9, while for stem branches dry biomass, and foliage dry biomass, it equals 32.4, 69.7, and 57.9 ( Figure 2). In the case of the second approach, AIC for DBH equals 7.2, while biomass, branches dry biomass, and foliage dry biomass, it equals 56.9, 10 respectively ( Figure 3).
. Individual mixed-effect models (first approach) for height (H, (a)), stem dry biomass (SDB, (b)), bra (BDB, (c)), and foliage dry biomass (FDB, (d)) as dependent variables with diameter at breast height dent variable.   While analyzing the obtained residuals for both approaches of individual mixed-effect models, heteroscedasticity was identified. Application of the power function (Equations (4) and (5)) for residual variance modeling allowed achieving the stable distribution of residuals ( Figure 4). Furthermore, a likelihood ratio test indicated that the variance function significantly improved the fit (p < 0.0001). The estimated fixed regression coefficients and variance function parameters of the model system are given in Table 3. While analyzing the obtained residuals for both approaches of individual mixed-effect models, heteroscedasticity was identified. Application of the power function (Equations (4) and (5)) for residual variance modeling allowed achieving the stable distribution of residuals ( Figure 4). Furthermore, a likelihood ratio test indicated that the variance function significantly improved the fit (p < 0.0001). The estimated fixed regression coefficients and variance function parameters of the model system are given in Table 3.
For the first approach of the model system, the random effects for H are strongly correlated with random effects estimated for biomass components. The strongest correlation is observed between H random effects and stem dry biomass (range from −0.981 to 0.978), while the weakest correlation is observed between H and foliage dry biomass (range from −0.579 to −0.549) ( Table 4). In the case of biomass components, the strongest correlation between random effects is between stem dry biomass and branches dry biomass (range from −0.849 to 0.856). For the second approach of the model system, the correlation between random effects of DBH and biomass components is within the range from −0.007 to −0.992 (Table 4). Similar to the first approach in the case of biomass components, the strongest correlation between random effects is between stem dry biomass and branches dry biomass (range from −0.330 to 0.982).     Table 4. Random-effects variance-covariance matrix for two approaches of seemingly unrelated mixed-effects model system. Matrix diagonal-variance, matrix upper triangle-correlation between random effects.

First Approach
Dependent variable Comparing both approaches of the model system in terms of the correlation of the random effects of the non-biomass component with the biomass components, it is difficult to find an unambiguous trend. In the case of stem dry biomass, the strongest correlation is visible for H (first approach), while for branches dry biomass the strongest correlation between random effects occurs with DBH (second approach, Table 4).
The strongest correlation between residual errors for the first approach of the model system occurs for branches dry biomass and foliage dry biomass (0.557, Table 5), while the weakest correlation is present between stem dry biomass and foliage dry biomass (-0.100).
The strongest correlation between residual errors for the second approach is visible for stem dry biomass and diameter at breast height (0.933). Generally speaking, the stronger correlation between residual errors for all analyzed dependent variables occurs in the second approach of the model system (Table 5). Fixed-effects prediction for the second approach allows reaching lower RMSE for biomass components than the fixed-effects prediction for the first approach (Table 6). Both H and DBH measurements for two extreme (the thickest and thinnest in sample plot) trees indicated that plot-specific cross-model random-effects prediction allowed to obtain a smaller RMSE than fixed-effects prediction. The only exception was branches dry biomass for the second approach. Random-effects prediction for the first approach allows reaching the smallest RMSE value (the exception is foliage dry biomass, Table 6). Table 6. Root mean square error achieved during cross-model fixed and random-effects prediction for both approaches of seemingly unrelated mixed-effects model system.

Discussion and Conclusions
During this research, 38 felled trees from 13 sample plots located in West Poland were the basis for the elaboration of two approaches of a seemingly unrelated linear mixedeffects model system. In the first approach, the independent variable is the diameter at breast height, while in the second one, tree height. During analysis, both the fixed-effects and cross-model random-effects prediction in both approaches were taken into account. The sample plot was included as the grouping level. During random-effects prediction, tree height and diameter at breast height measurements for two available extreme trees in a plot in question were tested. Fitting of seemingly unrelated mixed-effects model system in both approaches enables the fixed-effects and cross-model random-effects prediction of aboveground dry biomass components (stem, branches, and foliage) of black locust.
When modeling individual biomass components, it is worth considering the logical assumption that the sum of the component of the tree estimated using equations should be equal to the estimated biomass of the whole tree. This assumption can be met by seemingly unrelated regression application [32,33,48,49]. The assumption about the additivity of the biomass model system and the use of nonlinear models was the basis of elaboration of a consistent set of additive biomass functions for eight tree species and nine components in Germany [34]. During that study, the authors defined two types of models: (i) a simple model with the diameter at breast height and height as independent variables, and (ii) a full model with more independent variables. Likewise, nonlinear seemingly unrelated regression was also used for nationwide biomass models covering the most common tree species of Canada's forests [35]. Models fitted in this study allow us to estimate dry biomass of four components (stem wood, stem bark, branches, and foliage) using either diameter at breast height or, unlike our research, a combination of diameter at breast height and height. Likewise, in the case of seemingly unrelated empirical equations for estimating aboveground biomass of Betula pendula growing on former farmland in central Poland, diameter at breast height or diameter and tree height were defined as independent variables [36]. Seemingly unrelated empirical equations for the determination of the dry biomass of aboveground components for Scots pine in Poland also used diameter and height as independent variables [50], whereas simplified empirical formulas to determine the dry biomass of aboveground components of trees for Scots pine [51] and seemingly unrelated models for above-and belowground biomass for young silver birch [27] used diameter at breast height as the independent variable.
The cited studies are examples of using the seemingly unrelated regression with different approaches to the independent variables used. However, the knowledge of the authors indicates that no research has been undertaken yet that focuses on the development of independent model systems using diameter at breast height and height as independent variables. On the one hand, it is a new approach to biomass modeling, and on the other, due to the use of height as an independent variable, it increases the possibility of the practical application of the proposed solutions using remote data sources [52][53][54].
Since biomass data is often characterized by a grouped structure, mixed-effects models for biomass modeling could be used. The already-mentioned research is an example of this solution [34]. In this case, the authors took into account groups and trees as random effects, while Repola [55], during his research, elaborated three multivariate variance component models for the aboveground biomass components and one for the belowground biomass components. In this research, the multivariate model was based on diameter at breast height and height, and for the next multivariate models, additional commonly measured tree variables were used. It is worth to mention that unlike our research, Repola [55] included both the stand and tree-level during prediction, whereas Fehrmann [26], similar to our research, elaborated plot-specific linear mixed-effect biomass models. Notwithstanding, in his case, diameter at breast height and height were used at the same time as independent variables.
Due to the inclusion grouping data structure during modeling, the mixed-effects models allow the flexible prediction. In the absence of the measurement, data fixed effects can be used, whereas additional measurements allow increasing the accuracy through random-effects prediction [24,25]. However, in the case of biomass modeling, measurement of additional data is troublesome, because it is associated with the felling of trees. That is why multivariate mixed-effects model application allows taking full advantage of the mixed-effects models. In the case of our study, additional measurements of tree height (the first approach of the elaborated model system) and diameter at breast height (the second approach of the elaborated model system) allow us to reach more accurate random cross-model prediction for analyzed biomass components [28,29].