Designing Wood Supply Scenarios from Forest Inventories with Stratified Predictions

Forest growth and wood supply projections are increasingly used to estimate the future availability of woody biomass and the correlated effects on forests and climate. This research parameterizes an inventory-based business-as-usual wood supply scenario, with a focus on southwest Germany and the period 2002–2012 with a stratified prediction. First, the Classification and Regression Trees algorithm groups the inventory plots into strata with corresponding harvest probabilities. Second, Random Forest algorithms generate individual harvest probabilities for the plots of each stratum. Third, the plots with the highest individual probabilities are selected as harvested until the harvest probability of the stratum is fulfilled. Fourth, the harvested volume of these plots is predicted with a linear regression model trained on harvested plots only. To illustrate the pros and cons of this method, it is compared to a direct harvested volume prediction with linear regression, and a combination of logistic regression and linear regression. Direct harvested volume regression predicts comparable volume figures, but generates these volumes in a way that differs from business-as-usual. The logistic model achieves higher overall classification accuracies, but results in underestimations or overestimations of harvest shares for several subsets of the data. The stratified prediction method balances this shortcoming, and can be of general use for forest growth and timber supply projections from large-scale forest inventories.


Introduction
Emerging bioeconomy strategies consider lignocelluloses from forests as an alternative to fossil raw materials [1][2][3], suggesting that the demand for woody biomass is likely to increase globally [4,5].At the same time, forest reference level scenarios gain increasing importance with regards to the United Nations Framework Convention on Climate Change (UNFCCC) [6,7].National forest inventories (NFIs) and aggregated forest growth and wood supply modeling (AFGWSM) can be used to estimate the future availability of woody biomass and the correlated effects on forests and climate.NFIs have been established in several countries to generate data about forest conditions, dynamics, and productivity.Today, most European and North American countries conduct forest inventories based on statistical sampling [8][9][10].In addition, many countries use or are developing AFGWSMs that are directly connected to the respective NFI data [11].
Although wood supply scenarios are not explicitly forecasts, they are expected to generate results that qualify as decision support for policymakers, administrations, industry, and other interest groups [12].As such, they are supposed to be "a coherent, internally consistent and plausible Forests 2018, 9, 77 2 of 24 description of a possible state of the world" [13] (p. 799) [14].Among possible themes for wood supply scenarios, business-as-usual (BAU) scenarios are of specific importance.While focusing rather on short-term trends, these scenario types can establish a baseline according to which other scenarios can be shaped or referenced [15].
To design BAU scenarios for AFGWSMs, "heterogeneous forest land and owners with heterogeneous objectives" [16] (pp.[200][201] have to be integrated.Concept-driven studies that apply theoretical preconceptions to the data [17][18][19][20] can be differentiated from data-driven methods that-as is the goal of this research-aim to learn these concepts from the data first [21]. In the past, numerous data-driven studies have focused on the objectives and harvest decisions of forest owners.An overview of research on the non-industrial private forest (NIPF) sector, which was the subject of the majority of these studies, is provided by Amacher et al. [22] and Beach et al. [23].A common method is to use panels or surveys that are analyzed with various specifications of tobit or logit models [24][25][26][27][28].Only a few panel-based studies have had access to large datasets [29,30], and the attitude towards or intention to harvest is often measured rather than the actual harvest [31].
NFI-based studies offer the advantage that forest development and timber harvests can be derived from a large number of inventory plots that are often permanent and repeatedly measured [8,32].Unlike survey-based studies, a representative fraction of the actual resource is considered as the principal research unit (PRU), as opposed to the individual forest owner or manager.Several studies used NFI data to review AFGWSM scenarios [33][34][35].Another group of studies used NFI or other forest inventory data to investigate and/or project harvest behavior using regression models or other machine learning algorithms [21,[36][37][38][39].
However, when BAU scenarios are learned and projected from NFI data, researchers are confronted with two major issues.First, unlike survey research, NFI-based studies often lack specific and relatable information about the relevant decisionmakers (forest owners or managers).For example, a study with access to linked inventory and survey data reported impacts of the more specific ownership characteristics of education and income on harvest probabilities [36].However, this information is often not available.In the case of the German NFI, for example, opportunities to collect information on individual plot owners are limited, since the exact location and land ownership of NFI tracts cannot be revealed.Second, on the PRU level, NFIs often produce noisy data that is not representative of the forest area upon which a harvest decision is made.Instead, data only become representative at higher aggregation levels.
The principal aim of this research was to parameterize a BAU wood supply scenario from NFI data by dealing explicitly with this restriction.The quality of the scenario was judged on its ability to reproduce 2002-2012 timber harvests reported by the NFI.To consider a BAU scenario as well calibrated, not only overall NFI harvested volumes should be well captured.The scenario should also retrace NFI harvested volumes throughout characteristics of the inventory data (e.g., stand types and timber dimensions).Furthermore, it was considered important that the observed occurrence or non-occurrence of harvest interventions at plot level be reflected in the scenario.
To address this issue, a stratified machine learning approach based on the Classification and Regression Tree (CART) algorithm [40] was designed and tested in the scope of this research.The method optimizes the prediction of harvest occurrences and harvest shares towards patterns of the original inventory.It can be used as an upstream model that predicts harvest decisions (yes or no) for individual inventory plots.Commonly, overall cut-off benchmarks [41] are used to convert predicted probabilities into a binary decision.In contrast, the presented method uses the learned harvest probabilities of strata as decision criteria.The first rule is that this probability must be met by the number of plots selected as harvested in each stratum.For the selection of individual plots to be harvested per stratum, two alternative options are presented: random selection (assuming that no other attributes would influence the harvest decisions in the strata), or a Random Forest (RF) algorithm [42] trained on the training data of the corresponding stratum.Once the plots are predicted Forests 2018, 9, 77 3 of 24 as harvested by the stratified approach, linear regression trained on harvested inventory plots only can be used to predict the harvested volumes for these plots.
To assess the pros and cons of the presented stratified harvest prediction approach, results are compared to a direct harvested volume prediction with linear regression, as well as to a two-step approach with logistic regression (using an overall cut-off benchmark), followed by linear regression trained on harvested inventory plots only.
A unique feature of this paper is that existing machine learning techniques are adapted and combined to serve specific requirements of large-scale inventory based harvest decisions and harvested volume predictions.The results of this research can be used to project business-as-usual timber supplies directly for a 10-year period (in this case 2012-2022).Furthermore, the results can inform harvest scenarios, which are used in combination with NFI-based forest growth projections, as described by Kändler and Riemer [43].

Focus
The regional focus of the study was Baden-Württemberg, a German federal state located in southwest Germany.Baden-Württemberg shares borders with Switzerland (south) and France (west).The state covers an area of around 35,750 km 2 .Around 38.4% of this area is considered to be forest.Around 40% of the forest area is managed by communities, 35.9% is managed by private owners, 23.6% is managed by the state, and only 0.5% is managed by federal bodies [44].A small proportion (around 1.5%) of the forest that was classified as community property was parish or cooperative forest in 2002.Spruce (Picea abies (L.) Karst.) is the most common tree species, and beech (Fagus sylvatica L.) is the most common broadleaved tree species in the region.The analysis focused on a 10-year period from 2002 to 2012.This corresponds to the period between the second and third German NFIs.In this period, forests in Baden-Württemberg were affected by the aftermath of storm Lothar (1999), a drought period in 2003, and storm Kyrill (2007).
With reference to the year 2008, 690 saw mills (116 with more than 10 employees), 52 wood composite producers, and 128 pulp and paper producers were counted within Baden-Württemberg.In 2008, a higher number of saw mills was registered in the west-especially in the southwest-of the federal state.However, the saw mills in the east-especially in the northeast-recorded much higher turnovers per holding.The timber market was dominated by spruce and fir (Abies alba Mill.), which accounted for around 80% of the timber sold via the federal state's forest service in the period 2000-2009.With reference to the same period, 88% of the timber sold via the forest service was stem wood [45].

Data Sources
NFI data (available online at Thünen Institut [44]) constituted the principal data source for this research.To date, three NFIs have been conducted in Germany; the first from 1986 to 1988, the second from 2001 to 2002 and the third from 2011 to 2012.The German NFI has been designed as a systematic single-level cluster sampling, with tracts (clusters of one to four plots) as primary sampling units [46].Plots are permanently marked and remeasured within the scope of each inventory.For this research, trees registered via the angle-count method (caliper threshold 7 cm) were of the highest relevance.For Baden-Württemberg, data from 8963 tracts, with a total of 35,743 plots, were collected on a 2 km × 2 km grid in the scope of the second NFI.Out of these, 13,619 plots were located within forest areas [46,47].In the scope of this research, only the plots that were registered as accessible, productive forest not owned by the federal government (see Section 2.4) were used.The final set of NFI plots included 11,722 plots from 4306 tracts, and did not contain missing values for any of the included variables.

Response Attributes
Both response and predicting attributes were registered in the scope of the NFI, or calculated from NFI data (see Tables A1 and A2).The variable harvested volumes was expressed as cubic meter standing volume over bark (m 3 ob) per ha within a 10-year period (2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012).Since the exact year of harvest remains unknown, harvested volumes correspond to tree dimensions estimated with the Sloboda trend function for the mid-period (2007) [46,48].To calculate harvested volumes, only trees registered via angle count sampling were used.Moreover, only trees indicated as selectively cut or harvested by clear felling and removed from the forest within the period of interest were considered.To extrapolate data from the angle-count method to one-hectare values, tree representation factors from the beginning of the period (second NFI) were used [46].
The variable harvest decision was generated on plot level as a binominal representation of harvested volumes with the characteristics yes (harvested volumes > 0) and no (harvested volumes = 0).

Predicting Attributes
The selection of potential influencing factors (predicting attributes) was determined using two criteria: First, existing concepts (derived from the literature) of how forest owners or managers make harvest decisions, and, second, the availability of information that could be linked directly to NFI plots.Based on this, the following 11 attributes were included in this research: Stand-related attributes (stand type, standing volume, average plot diameter at breast height (DBH), and average plot age), site-related attributes (site index, harvest condition, slope, and altitude above sea level (a.s.l.), ownership-related attributes (ownership type), and policy-related attributes (nature park, and nature protected area).
The variable stand type differentiated between spruce, fir, and Douglas fir (Pseudotsuga menziesii (Mirb.)Franco) stands (conifers 1); pine (Pinus sylvestris L.), larch (Larix decidua Mill.), and other softwood stands (conifers 2); deciduous stands (deciduous); mixed stands with spruce, fir or Douglas fir as principal species (mixed 1); and other mixed stands (mixed 2).A description of the stand types can be found in Table A3.The variable standing volume referred to the standing volume across all of the species at the time of the second NFI.It was indicated as m 3 ob per ha.The variable average plot DBH was calculated as the average DBH of all of the trees (across all species), which was measured on a plot by taking the representation factors of the sample trees into account.Only sample trees of the dominant and predominant forest stories were considered.The variable average plot age was calculated in a similar way to the average DBH, but with regards to the age of the individual trees and for the mid-term of the period (2007).
The variable site index [21] was expressed as the yearly average total increment in m 3 ob per ha at the age of 100 years, and originated from Ministerium für Ländlichen Raum (MLR) [49].Slope, distance to the road, and terrain roughness are identified in the literature as having an impact on wood production through harvesting costs and profitability [21,38,50].The variable harvest condition was registered in the scope of the third NFI.It included the classes favorable and unfavorable.Unfavorable harvest conditions refer to slopes greater than 30% that were not suitable for harvesters due to chunks of rocks, pits, or spring horizons.No differentiation was made between the necessity and non-necessity of cableways, as both groups contained only low numbers of NFI plots.The class also contained plots with large machine track distances (>150 m), and large distances between forest plot and nearest driveway (>1 km) [47].The attribute slope provided information about the slope at the location of the individual NFI plot in percent, and was also tested separately from harvest condition.The variable altitude (m a.s.l.) was included, since previous research established a relationship between altitudinal ranges (elevation) and timber harvests.Sterba et al. [33] found that the scenario they analyzed underestimated salvage harvests of smaller DBH classes at lower elevations.They attribute this finding to a higher susceptibility of these forest types to bark beetles, which they trace back to the biology of Ips typographus and Ips chalcographus.Thürig and Schelhaas [35] pointed towards factors related to altitude (a.s.l.) such as harvesting costs, accessibility, and forest functions (e.g., protection from avalanches in higher altitudes).Different ownership types are considered to show different harvesting behaviors [17,18,37].The analyzed attribute ownership type included the subgroups federal, state, community, and private forest.The private forest was further differentiated into large (>500 ha), medium (5-500 ha), and small (<5 ha).These size classes referred to the total forest area owned by the proprietor of the respective NFI plots within Germany.Federal plots were excluded from the analysis, since the NFI recorded only a few plots of this ownership type.
The variables nature park and nature protection area indicate whether the area surrounding an NFI plot was covered by the respective directive.Some attributes could not be included in the analysis.Profitability is a factor that is often considered with reference to harvest decisions [37].However, the calculation of potential returns at plot level would require pre-assumptions about applied harvest regimes, and a pre-selection of individual trees to be harvested.This would then contradict the data-driven approach followed in this research.The concept of profitability could, therefore, only be tested from a cost perspective by including the attributes harvest condition and slope.Several studies divide medium and small-scale forest owners into various subcategories [25,51].Due to a lack of information and small sample sizes, these subcategories could not be analyzed.
For the ordinary least squares (OLS) regressions and the logistic regression (LREG), numeric variables were centered.

Research Design
Figure 1 shows the research design for learning and predicting harvest decisions and harvested volumes.This paper focuses predominantly on the stratified methods used to learn and predict the binary harvest decision.The location of these elements within the prediction chain is emphasized in Figure 1 (bold and dashed lines).
Different ownership types are considered to show different harvesting behaviors [17,18,37].The analyzed attribute ownership type included the subgroups federal, state, community, and private forest.The private forest was further differentiated into large (>500 ha), medium (5-500 ha), and small (<5 ha).These size classes referred to the total forest area owned by the proprietor of the respective NFI plots within Germany.Federal plots were excluded from the analysis, since the NFI recorded only a few plots of this ownership type.
The variables nature park and nature protection area indicate whether the area surrounding an NFI plot was covered by the respective directive.Some attributes could not be included in the analysis.Profitability is a factor that is often considered with reference to harvest decisions [37].However, the calculation of potential returns at plot level would require pre-assumptions about applied harvest regimes, and a pre-selection of individual trees to be harvested.This would then contradict the data-driven approach followed in this research.The concept of profitability could, therefore, only be tested from a cost perspective by including the attributes harvest condition and slope.Several studies divide medium and small-scale forest owners into various subcategories [25,51].Due to a lack of information and small sample sizes, these subcategories could not be analyzed.
For the ordinary least squares (OLS) regressions and the logistic regression (LREG), numeric variables were centered.

Research Design
Figure 1 shows the research design for learning and predicting harvest decisions and harvested volumes.This paper focuses predominantly on the stratified methods used to learn and predict the binary harvest decision.The location of these elements within the prediction chain is emphasized in Figure 1 (bold and dashed lines).
The createDataPartition function of the caret R package [52,53] was used to split the data into training and test sets.This method was chosen since setting aside a test set completely is preferable to cross-validation or boosting if enough data is available [54].Models were trained and refined on training data only (75% of the data).The fitted models were then applied to the test data one time to predict harvest decision and harvested volumes.Finally, the test set outcome of the various models was compared to NFI results.   1 Direct refers to a one-step training and prediction method where harvested volumes are predicted directly.Downstream and upstream models are the components of a two-step approach, where the harvest decision is predicted first (upstream), and subsequently (downstream), harvested volumes are predicted for plots predicted as harvested.
The createDataPartition function of the caret R package [52,53] was used to split the data into training and test sets.This method was chosen since setting aside a test set completely is preferable to cross-validation or boosting if enough data is available [54].Models were trained and refined on training data only (75% of the data).The fitted models were then applied to the test data one time to predict harvest decision and harvested volumes.Finally, the test set outcome of the various models was compared to NFI results.

Stratification Model
The Classification and Regression Trees (CART) algorithm [40] built the foundation of the stratified prediction method.They were used to stratify the data, and generated harvest probabilities for each stratum.
CART is a tree-based model that partitions the feature space into rectangles, and subsequently fits a simple model in each rectangle.The method stratifies the data into strata with high and low probabilities [54].CART uses the Gini index, and applies the split that maximizes the Gini [55].
A drawback of CART is that only one variable at a time can be considered [56], and that single trees are considered unstable [57].A commonly used method for overcoming these issues is to grow a number of trees as implemented in Random Forest (RF) models [42].However, this procedure results in individual harvest probabilities per plot (share of trees that voted for yes), and is not suited to a stratified approach.
To generate the classification tree (method = class), the rpart function of R's rpart package [58] was used.The model was trained on the complete training set, including the entire set of predicting attributes.The minimum number of observations per terminal leaf was set at 100 to secure sufficient data in both the training and test sets.As a consequence, a node needed to include a minimum of 200 observations before a split was attempted [59].To reduce overfitting, the original tree was pruned to a size that minimized the cross-validation error calculated by a 10-fold cross-validation within the training set [60].Using this method, the trained classification tree was pruned to 18 leaves.The final outcome is shown in Figure 2.
Within the prediction process, the trained CART tree was applied to the test set.The test set plots were thereby assigned to the various leaves of the tree.Thus, each leaf (stratum) had a harvest probability (learned from the training set), and contained a number of test set plots.

Stratified Random Prediction
One option to predict harvest decisions for test set plots within the strata was to assume that no additional criteria would influence the actual harvest decision within a stratum.
Thus, within each stratum, a number of test set plots was randomly selected as harvested in accordance with the specific harvest probability of the stratum.For example, from a stratum that contained 100 test set plots and a harvest probability of 0.6, 60 plots were randomly selected as harvested.
The advantages and drawbacks of this stratified random prediction method in comparison to the commonly used cut-off method can be demonstrated by using a theoretical example.In this example, 10,000 fake plots are assumed to be sorted by CART into 100 strata containing 100 plots each.The derived harvest probabilities of the strata are assumed to range from 0.01 to 1 in steps of 0.01.In the scope of the cut-off method, plots with probabilities ≤0.5 are considered not-harvested, and plots with probabilities >0.5 are categorized as harvested.The random probability prediction described above is repeated 500 times to derive the arithmetic mean and standard deviations.It is assumed that the training set equals the test set.Figure 3 compares the prediction results of the two methods in question.

Stratified Random Prediction
One option to predict harvest decisions for test set plots within the strata was to assume that no additional criteria would influence the actual harvest decision within a stratum.
Thus, within each stratum, a number of test set plots was randomly selected as harvested in accordance with the specific harvest probability of the stratum.For example, from a stratum that contained 100 test set plots and a harvest probability of 0.6, 60 plots were randomly selected as harvested.
The advantages and drawbacks of this stratified random prediction method in comparison to the commonly used cut-off method can be demonstrated by using a theoretical example.In this example, 10,000 fake plots are assumed to be sorted by CART into 100 strata containing 100 plots each.The derived harvest probabilities of the strata are assumed to range from 0.01 to 1 in steps of 0.01.In the scope of the cut-off method, plots with probabilities ≤0.5 are considered not-harvested, and plots with probabilities >0.5 are categorized as harvested.The random probability prediction described above is repeated 500 times to derive the arithmetic mean and standard deviations.It is assumed that the training set equals the test set.Figure 3   The cut-off method would underestimate the number of harvested plots from strata with probabilities below the cut-off, and overestimate the number of harvested plots from strata above the cut-off.Under and overestimations would be greatest around the cut-off (Figure 3a,b).The advantage of the random prediction is that the number of harvested plots is predicted adequately throughout all of the strata.However, in comparison to the cut-off method, this advantage is offset by generally lower classification accuracies within each stratum (Figure 3c).
One clear disadvantage of the random selection is its randomness.Different runs resulted in different plots being selected as harvested.This selection then further influenced the results of the harvested volume model.
To be able to compare harvest shares and harvested volume results with the outcome of the NFI and competing models, the prediction procedure (including the subsequent harvested volume predictions, see below) was, therefore repeated 500 times.From these repetitions, the arithmetic mean and standard deviations were calculated.

Stratified Trained Prediction
Instead of addressing the selection of harvested plots within a stratum with a randomized procedure, it is also possible to take a trained decision.For this purpose, a follow-up model was trained on a stratum's training set and applied to the test set of the same stratum.Subsequently, the test set plots of the stratum were ranked according to their predicted individual harvest probability.The plots with the highest harvest probability were then selected as harvested until the harvest probability of the stratum was fulfilled.For example, in a stratum with a harvest probability of 30% that contained 100 test set plots, the 30 plots with the highest individual harvest probability were selected as harvested.This procedure was repeated for each stratum (leaf) of the classification tree.In this way, an adequate number of plots was selected as harvested for each stratum (compare the random procedure results of Figure 3a,b) by improving the prediction accuracy at the plot level.
RF algorithms appeared to be most practical to learn and predict harvest probabilities for the individual strata.The advantage of RF is that variable selection is automated.The use of logistic regression, in comparison, would require the manual selection and refinement of the models for each stratum.
To apply the RFs, R's randomForest package [61] was used.Within each RF algorithm, 500 unpruned classification trees were grown from 500 bootstrap data samples.To achieve modifications between the trees, not all, but variables were sampled and considered in order to decide on the best split (where is the total number of considered variables, in this research: = The cut-off method would underestimate the number of harvested plots from strata with probabilities below the cut-off, and overestimate the number of harvested plots from strata above the cut-off.Under and overestimations would be greatest around the cut-off (Figure 3a,b).The advantage of the random prediction is that the number of harvested plots is predicted adequately throughout all of the strata.However, in comparison to the cut-off method, this advantage is offset by generally lower classification accuracies within each stratum (Figure 3c).
One clear disadvantage of the random selection is its randomness.Different runs resulted in different plots being selected as harvested.This selection then further influenced the results of the harvested volume model.
To be able to compare harvest shares and harvested volume results with the outcome of the NFI and competing models, the prediction procedure (including the subsequent harvested volume predictions, see below) was, therefore repeated 500 times.From these repetitions, the arithmetic mean and standard deviations were calculated.

Stratified Trained Prediction
Instead of addressing the selection of harvested plots within a stratum with a randomized procedure, it is also possible to take a trained decision.For this purpose, a follow-up model was trained on a stratum's training set and applied to the test set of the same stratum.Subsequently, the test set plots of the stratum were ranked according to their predicted individual harvest probability.The plots with the highest harvest probability were then selected as harvested until the harvest probability of the stratum was fulfilled.For example, in a stratum with a harvest probability of 30% that contained 100 test set plots, the 30 plots with the highest individual harvest probability were selected as harvested.This procedure was repeated for each stratum (leaf) of the classification tree.In this way, an adequate number of plots was selected as harvested for each stratum (compare the random procedure results of Figure 3a,b) by improving the prediction accuracy at the plot level.
RF algorithms appeared to be most practical to learn and predict harvest probabilities for the individual strata.The advantage of RF is that variable selection is automated.The use of logistic regression, in comparison, would require the manual selection and refinement of the models for each stratum.
To apply the RFs, R's randomForest package [61] was used.Within each RF algorithm, 500 unpruned classification trees were grown from 500 bootstrap data samples.To achieve Forests 2018, 9, 77 9 of 24 modifications between the trees, not all, but √ p variables were sampled and considered in order to decide on the best split (where p is the total number of considered variables, in this research: p = √ 11 ∼ = 3).Harvest probabilities for the individual test set plots were then predicted by calculating the share of the 500 trees that voted for harvest = yes [42,61,62].For each RF variable, importance was read out as the mean decrease in accuracy (see Figure A1) with the varImpPlot function of the randomForest package [61].The 11 predicting variables were included within all of the RF procedures applied to the individual strata.However, for some strata, single discrete variables became irrelevant (mean decrease in accuracy = 0), as they contained only one specification of that variable (e.g., the variable harvest condition within leaf 2, see Figures 2 and A1)

Harvested Volume Predictions
The harvested volumes of NFI plots that were predicted as harvested by the upstream models (including logistic regression) were predicted with OLS regressions trained on harvested plots of the training set.For the OLS regressions, a Gaussian family distribution was used: where Y i is case i's harvested volume, β 0 is the regression constant, X ij is case i's score on the jth of p predictor attributes in the model, β j is the partial regression weight of predictor j, and ε i is the error for case i [63].
The final OLS model trained on the training set is shown in Table 1.Numeric attributes were only included in the models when they were found to be statistically significant (p < 0.05).Characteristics of discrete attributes that did not show statistically significant differences to each other were collated in one group.
Ownership types other than large private were collated in a single group, since statistically significant differences were only found between large private forest and all other ownership types.

Direct Harvested Volumes Prediction
In addition to the stratified prediction method described above, harvested volumes were also predicted directly with an OLS regression trained on the entire dataset (Table A3).For this OLS, no statistically significant differences were found between state and community forests.In addition, Forests 2018, 9, 77 10 of 24 no statistically significant differences were found between the stand types conifers 1 and mixed 1, or between the stand types conifers 2, mixed 2, and deciduous.Furthermore, no differences were found between small and medium private forests.The respective groups were joined.Average plot age, average plot DBH, harvest condition, and nature protected area were not statistically significant.

Harvested Decision Prediction with Logistic Regression
A commonly used option would be to use logistic regression to predict harvest occurrences instead of the stratified approach.The logistic regression was conducted with a binominal family distribution, with harvest decision as the response attribute.It can be expressed as: P i is the harvest probability of the ith plot within the 10-year period and βx is a linear combination of parameters (β) and explanatory attributes (x) [21].
When the logistic regression was fit to the training set, no statistically significant differences were found between the stand types conifers 1 and mixed 1, or between the stand types conifers 2, mixed 2, and deciduous.For ownership type, no statistically significant differences were found between state and large private forest.The respective groups were joined.The final logistic model included 11 variables and three interactions (see Table A4).
Within the prediction process, the fitted logistic regression was used to predict individual harvest probabilities at plot level first.To convert harvest probabilities to binary harvest decisions, cut-offs had to be defined.The definition of cut-off points was derived in an intermediate step using the training set.The models were used to predict harvest decisions for the training data.The predictions were then used in combination with the real NFI harvest decisions to determine cut-off points using the R package OptimalCutpoints [41,53] with max kappa (MK) or the Youden index (YI) as criteria [41,64,65].
Max kappa is based on Cohen's kappa, which is defined as: where p 1 is the probability that two classifiers agree, and p 2 is the probability that two classifiers agree by chance [55,64].Since max kappa already resulted in good predictions of overall harvest shares, the Youden index was only used as a criterion to demonstrate the effect of shifting the cut-off benchmark.It resulted in a cut-off clearly different from max kappa and is defined as [41,65]: The derived cut-offs were used to split the predicted probabilities of the test set plots into the characteristics yes and no.
OLS trained on harvested plots of the training data only (Table 1, see above) were then applied to those test set plots predicted as harvested by the upstream logistic models.The harvested volumes of test set plot predicted as not harvested were set to zero.

Results
Table 2 and Figure 4 compare the results of the upstream models that were used to predict the harvest occurrence with NFI results.Logistic regression resulted in higher classification accuracies when compared to the methods based on stratification (CART).Harvest decisions predicted randomly within the strata resulted in the lowest classification accuracies.When the selection within the strata was informed by RF, accuracy values were closer to, but still below, those of the logistic regression.CART: Classification and Regression Trees; MK: Max Kappa; YI: Youden Index. 1 Arithmetic mean and standard deviation (in parenthesis) from 100 repetitions.
Figure 4 offers insights regarding the question of how well the models were able to predict harvest shares within various subsets of the data.Logistic regression (based on max kappa cut-offs) delivered good overall results for the entire test set.However, shares of harvested plots were underestimated strongly for subgroups with lower harvest probabilities (e.g., average plot age of <30 years, lowest standing volume class, and unfavorable harvest conditions).On the other side, the share of harvested plots was overestimated for groups with higher harvest probabilities (e.g., highest standing volume groups).A comparison of these results with the prediction based on the Youden index shows that lower or higher cut-off benchmarks could not solve this problem, but would rather shift the entire figure.
The two methods based on stratification (lower panel of Figure 4) were able to retrace harvest shares of the NFI more accurately across the various subsets of the test set.For some subsets, the method based on random prediction within the strata performed better than the RF-informed stratified prediction (e.g., altitude >700 m a.s.l., small and medium private forest).CART: Classification and Regression Trees; MK: Max Kappa; YI: Youden Index. 1 Arithmetic mean and standard deviation (in parenthesis) from 100 repetitions.
Figure 4 offers insights regarding the question of how well the models were able to predict harvest shares within various subsets of the data.Logistic regression (based on max kappa cut-offs) delivered good overall results for the entire test set.However, shares of harvested plots were underestimated strongly for subgroups with lower harvest probabilities (e.g., average plot age of <30 years, lowest standing volume class, and unfavorable harvest conditions).On the other side, the share of harvested plots was overestimated for groups with higher harvest probabilities (e.g., highest standing volume groups).A comparison of these results with the prediction based on the Youden index shows that lower or higher cut-off benchmarks could not solve this problem, but would rather shift the entire figure.
The two methods based on stratification (lower panel of Figure 4) were able to retrace harvest shares of the NFI more accurately across the various subsets of the test set.For some subsets, the method based on random prediction within the strata performed better than the RF-informed stratified prediction (e.g., altitude >700 m a.s.l., small and medium private forest).
Before results for combined harvest decision and harvested volume models are presented, the harvested volume regression model should be evaluated separately.At the same time, results of the direct OLS-based harvest volume prediction can be considered.Figure 5 shows the results of the two versions of the harvested volume models applied to test set plots that were actually harvested according to the NFI.
OLS that were trained on the entire training set and used to predict harvested volumes directly delivered good overall average harvested volume estimations.However, predicted harvested volumes were generated from 97.6% of the test set plots (for 2.4% of the plots, predicted volumes were ≤0).At the same time, the harvest intensities of the plots harvested according to the NFI were underestimated strongly (mid-panel of Figure 5).When the OLS regression was trained on the harvested plots of the training set only (downstream model), average NFI harvest intensities could be retraced well.Figure A2 shows the results of downstream OLS tested on plots harvested according to the NFI for various subsets of the test set.Harvested volume predictions were close to NFI values for most characteristics.Harvested volumes of some characteristics were under or overestimated.Before results for combined harvest decision and harvested volume models are presented, the harvested volume regression model should be evaluated separately.At the same time, results of the direct OLS-based harvest volume prediction can be considered.Figure 5 shows the results of the two versions of the harvested volume models applied to test set plots that were actually harvested according to the NFI.
OLS that were trained on the entire training set and used to predict harvested volumes directly delivered good overall average harvested volume estimations.However, predicted harvested volumes were generated from 97.6% of the test set plots (for 2.4% of the plots, predicted volumes were ≤0).At the same time, the harvest intensities of the plots harvested according to the NFI were underestimated strongly (mid-panel of Figure 5).When the OLS regression was trained on the harvested plots of the training set only (downstream model), average NFI harvest intensities could be retraced well.Figure A2 shows the results of downstream OLS tested on plots harvested according to the NFI for various subsets of the test set.Harvested volume predictions were close to NFI values for most characteristics.Harvested volumes of some characteristics were under or overestimated.Figure 6 shows the final harvested volume predictions derived by combining upstream and downstream models.The results of direct OLS harvested volume predictions are not shown in the figure, since the results and deficits of this approach were already presented in Figure 5.When analyzing the final harvested volume predictions in Figure 6, Figure A2 and Figure 4 should also be considered.This comparison can provide answers to the question of whether underestimations, fits, or overestimations were caused by the respective upstream or downstream model, or by a combination of both models.Figure 6 shows the final harvested volume predictions derived by combining upstream and downstream models.The results of direct OLS harvested volume predictions are not shown in the figure, since the results and deficits of this approach were already presented in Figure 5.When analyzing the final harvested volume predictions in Figures 4, 6 and A2 should also be considered.This comparison can provide answers to the question of whether underestimations, fits, or overestimations were caused by the respective upstream or downstream model, or by a combination of both models.The best harvested volume predictions were delivered by the stratified approach, where plots were selected randomly as harvested within the individual strata.With only a few exceptions, predicted harvested volumes remained within the standard error (confidence interval: 95%) of the NFI of the various subsets of the test set.
The stratified approach that used RF to inform the selection of harvested plots within the strata delivered the second best results.However, in this case, harvested volumes of several subsets of the test data were under or overestimated.
Predictions based on logistic regression overestimated harvested volumes from conifer and mixed stands.At the same time, harvested volumes from several data subsets of the deciduous stands were underestimated.The best harvested volume predictions were delivered by the stratified approach, where plots were selected randomly as harvested within the individual strata.With only a few exceptions, predicted harvested volumes remained within the standard error (confidence interval: 95%) of the NFI of the various subsets of the test set.
The stratified approach that used RF to inform the selection of harvested plots within the strata delivered the second best results.However, in this case, harvested volumes of several subsets of the test data were under or overestimated.
Predictions based on logistic regression overestimated harvested volumes from conifer and mixed stands.At the same time, harvested volumes from several data subsets of the deciduous stands were underestimated.

Discussion
Research results showed that each of the tested methods had strengths and weaknesses.Logistic regression achieved the highest overall classification accuracies, but overestimated harvest shares for some characteristics, while underestimating the harvest shares of others.The stratified method with random selection resulted in better prediction of harvest shares, but achieved much lower harvest accuracies.The RF informed stratified method achieved results in between the logistic and the stratified random approach.Results are discussed in more detail below, starting with the direct harvested volume prediction.
The linear OLS regression that was trained on both harvested and not-harvested plots and used to predict harvested volumes directly appeared at first glance to deliver good results.However, this approach neglected that a substantial share of NFI plots was not harvested within the 10-year period.Since the absence of logging interventions is part of current business-as-usual, it should also be captured in BAU scenarios (compare also Eid [66] cited in Antón-Fernández and Astrup [21]).A related shortcoming is that, on average, harvested plots were in reality treated with considerably higher harvest intensities than predicted by the directly applied models.Using the direct approach to predict future wood supply might result in adequate business-as-usual overall quantities for the first decade.However, instead of realizing these harvested volumes from around 65% of the plots, the direct model would harvest these quantities from nearly 100% of the plots treated with comparably low intensities.Thus, the prediction would result in a forest inventory structure that is clearly different from business-as-usual at the end of the decade.This direct approach would be especially unsuitable for projection periods longer than 10 years.
However, it was also shown that the OLS was able to predict the harvested volumes of harvested NFI plots well when trained and tested on subsets of harvested plots only (the downstream model).
To make use of these versions of the model in a BAU scenario, it was necessary to first predict the occurrence or non-occurrence of harvest interventions.
Logistic regression is a common approach for binary decision problems [21,36].In the context of this research, it delivered better results than the tested stratified methods with regards to classification accuracy, precision, specificity, and Cohen's kappa.Nevertheless, the logistic regression failed to predict accurate harvest shares for several subsets of the test set.When combined with the downstream OLS model, the logistic model resulted in overestimations of harvested volumes for some characteristics, and underestimations of others.This problem is directly linked to only one overall benchmark being used to convert the predicted harvest probabilities into the decision yes or no.Predicting harvest occurrences from CART with an overall cut-off would create similar problems.
Logistic regression resulted in higher, but-at least in the case of this research-not sufficiently high classification accuracies.An important factor here is the noisiness of the large-scale inventory data.The forest area based on which an owner or manager makes a harvest decision might look different from what is suggested by the inventory plot.With the exception of selection (plenter) felling, a harvest decision is usually made for an area (e.g., a forest stand or a section of a forest stand).In the case of the German NFI, only one or two inventory plots might be found per included decision area.However, a higher number of plots would be needed in order to produce a representative picture of that area [67].
The most accurate way of designing BAU wood supply scenarios would be to generate models with high classification accuracies.However, if this is not feasible (e.g., due to data noise) an alternative is to design models that deliver accurate harvest proportions throughout subsets of the dataset.The stratified prediction methods tested in the scope of this research can be used to optimize the model towards this goal.
In this research, CART was used for the stratification process.However, comparable stratifying machine learning algorithms such as C.45 [68] might also be applicable for this purpose.The CART algorithm divided NFI plots of the training set into strata, and delivered common harvest probabilities for each stratum.The fitted CART could then be applied to the test set.The algorithm assigned the test set NFI plots to the respective stratum where corresponding probabilities (derived from the training set) could be used to select an adequate proportion of the test set plots as harvested.Thus, instead of assuming that plots with harvest probabilities of 20% were not harvested at all (as done by the cut-off method), it was possible to select 20% of these plots as harvested.The question is now: which of the plots should be selected as harvested?
In a first approach, it was simply assumed that no further factors, but rather only noise, influenced the harvest decision at the plot level, and plots were selected randomly as harvested according to the specific harvest probability of a stratum.This approach resulted in good predictions of harvest shares and harvested volume across the various subsets of the test set.An important drawback was that the classification accuracy, precision, specificity, and Cohen's kappa values dropped considerably when compared to the cut-off approach.Furthermore, it was necessary to repeat the procedure and average outcomes of the repetition.
In a second approach, a RF algorithm was used to inform the selection of harvested plots within the various strata.In this way, each NFI plot within the test set obtained an additional individual harvest probability.Again, shares of plots corresponding to the harvest probabilities of specific strata were selected.However, this time, the plots with the highest individual harvest probabilities were selected first, and the selection stopped as soon as the respective harvest share of the stratum was reached.Harvest share predictions were comparable to the results of the random approach.However, for some subsets (e.g., plots above 700 m a.s.l., plots above 50-cm average DBH, and small private forest plots), the random selection delivered better results.At the same time, classification accuracy, precision, specificity, and kappa values increased considerably, but still remained below the results of the logistic models.
With reference to factors impacting harvest decisions, the outcome of this research mostly confirmed the results of previous research.Harvest condition and slope [21,38,50] were found to have impacts on the general decision about whether or not a plot is harvested.These factors were not found to influence the harvest intensity once this decision was made.However, it is possible that plots with more difficult harvest conditions are harvested less frequently, but with higher intensity, and that this pattern is blurred due to the given period of 10 years.The factor ownership type [17,18,37] was also found to have important impacts.An interesting finding was that, for harvest occurrence, differences were found between community, state, and large private forests (as one group), medium private forests, and small private forests (with decreasing harvest probability across the groups listed from first to last, see Table A4).However, only the large private forest was found to be harvested at higher intensities.No differences in harvest intensities could be found between the other ownership types.
A restriction of the research is that the results remain on a relatively broad level as concerns timber species and dimensions.Instead of referring to individual timber species, plot level stand types grouped into five broad groups were used.Included dimension information referred to average values for the NFI plots.However, results of this research could be used to integrate BAU scenarios into AFGWSMs, where results could be further broken down to individual tree levels and even the computation of timber assortments would be possible [43,69].
The available information for NFI plots was dominated by stand or site-specific attributes.With regards to the human factor, only rough information on ownership type and property sizes was available.The objectives and behavior of NIPF owners are considered to be heterogeneous, dynamic, and complex [29,[70][71][72][73][74].However, given this complexity, and the fact that NFI plots are not representative of decision areas, it is unclear in how far more ownership-specific information would help to improve the accuracies of the models.
Within the reference period, business-as-usual procedures were disturbed by storms and a drought period.Here, it should be asked how far and to what degree of severity natural disturbances should be considered as part of usual business.For instance, bark beetle impacts related to elevation and timber dimensions as described by Sterba et al. [33] might, for example, still be considered normal.In contrast, Seidl et al. [75] dedicated a study to the projection of bark beetle disturbances in the context of climate change impacts and adaptive management strategies.This can be considered as diverging from business-as-usual.Within the German NFI, the specific timing of and reason for tree harvests are not recorded.Furthermore, the occurrence of salvage fellings is likely to impact other harvest decisions.Thus, in any case, it would be difficult to deduct harvest occurrences caused by disturbances.
Plots of the German NFI are not representative of the stands or decision areas in which they are located.The non-occurrence of harvest interventions on an NFI plot does not imply that the entire stand was not harvested and vice versa.Thus, it has to be kept in mind that a BAU scenario generated with the presented method is, first of all, a projection of the NFI.However, the NFI is considered a random sampling (by neglecting the systematic arrangement of the tracts) [46] that delivers representative and valid data on forest conditions on a larger scale [8].
CART turned out to be a valuable tool for the stratified prediction.An additional advantage is that the algorithm resulted in a classification tree that can be easily interpreted.In this context, Domingos [57] (p. 2) argues: "[ . . .] it is not enough for a learned model to be accurate; it also needs to be understood by its human users, if they are to trust it and deem it acceptable".However, a disadvantage of CART is its instability [57].Setting the minimum number of observations per terminal leaf to 100 and pruning the tree helped to gain more stability and avoid over-fitting.Nevertheless, smaller changes in the input data could still result in different tree structures.However, alternative algorithms that could be used for stratification such as C4.5 [68] have similar issues [55,57] and more stable methods such as RF are unsuitable for the stratification process.

Conclusions
It can be summarized that the stratified method generated good results.Among the group of tested prediction methods and under the given data conditions and restrictions, it can be considered as the most suitable for generating BAU timber supply scenarios.The direct prediction of harvested volumes with a linear regression trained on both harvested and not-harvested NFI plots is not an option, since this method resulted in harvesting patterns that were clearly different from BAU.Using logistic regression with an overall cut-off benchmark resulted in strong harvested volume overestimations for some characteristics, and underestimations for others.A trade-off of the stratified method is the decrease in classification accuracy and related quality measures.However, when the selection of plots to be harvested was not generated randomly, but rather informed by RF, this impairment remained within an acceptable range.Furthermore, in the scope of large-scale timber supply projections, it might be considered as more important to find the right number of harvested plots than to identify those plots that were actually harvested.
Results of this research must be interpreted in respect of the regional and periodical background conditions, the data quality of the German NFI, and the availability of attributes that could be used and linked to the NFI data.However, considering that large-scale forest inventories are often rather noisy, the presented stratified methods could also be helpful for generating useful BAU forest development and wood supply scenarios in other regions.
The stratified method should be tested and challenged when applied to other time periods, regions, and inventory variations.A further challenge is to combine this method with AFGWSMs to generate BAU scenarios for forest development and timber supply projections beyond one decade.In this context, another challenge is to learn and predict BAU harvesting choices for individual trees.The results of this research showed that BAU harvesting patterns could be retraced well without having more detailed background information on individual harvest decisions.However, the parametrized BAU scenario could also be used as a baseline for the design of alternative scenarios.For this purpose, underlying social and economic grounds for harvest choices would have to be studied specifically in relation to inventory based scenario design.More research explicitly dedicated to this task is needed.

Figure 1 .
Figure 1.Research design. 1 Direct refers to a one-step training and prediction method where harvested volumes are predicted directly.Downstream and upstream models are the components of

Figure 1 .
Figure 1.Research design.1 Direct refers to a one-step training and prediction method where harvested volumes are predicted directly.Downstream and upstream models are the components of a two-step approach, where the harvest decision is predicted first (upstream), and subsequently (downstream), harvested volumes are predicted for plots predicted as harvested.

Figure 2 .
Figure 2. Pruned outcome of the Classification and Regression Tree (CART) algorithm showing harvest probabilities (P) and number of training set plots (N) for the 18 leaves of the tree (leaf numbers sorted by harvest probability).The branch size is related to the number of training set plots sent towards the respective direction (DBH: average stand diameter at breast height).

Figure 2 .
Figure 2. Pruned outcome of the Classification and Regression Tree (CART) algorithm showing harvest probabilities (P) and number of training set plots (N) for the 18 leaves of the tree (leaf numbers sorted by harvest probability).The branch size is related to the number of training set plots sent towards the respective direction (DBH: average stand diameter at breast height).
compares the prediction results of the two methods in question.

Figure 3 .
Figure 3. Comparing results of cut-off prediction (cut-off at probability 0.5) and random prediction (randomly selecting a share of plots that corresponds with the respective harvest probability) on a theoretical example where 10,000 fake plots were sorted by a decision tree in 100 strata at 100 plots and probabilities per stratum ranging from 0.01 to 1 in steps of 0.01: (a) Underestimation of harvested plots; (b) Overestimations of harvested plots; (c) A comparison of the classification accuracy: for the random prediction, the arithmetic mean and standard deviations from 500 repetitions are shown.

Figure 3 .
Figure 3. Comparing results of cut-off prediction (cut-off at probability 0.5) and random prediction (randomly selecting a share of plots that corresponds with the respective harvest probability) on a theoretical example where 10,000 fake plots were sorted by a decision tree in 100 strata at 100 plots and probabilities per stratum ranging from 0.01 to 1 in steps of 0.01: (a) Underestimation of harvested plots; (b) Overestimations of harvested plots; (c) A comparison of the classification accuracy: for the random prediction, the arithmetic mean and standard deviations from 500 repetitions are shown.

Figure 4 .
Figure 4. Test set results comparing the share of harvested plots with regards to various subsets of the data.The prediction results of the models can be compared to the results of the national forest inventory (NFI).CART = Classification and Regression Trees; RF = Random Forest.

Figure 4 .
Figure 4. Test set results comparing the share of harvested plots with regards to various subsets of the data.The prediction results of the models can be compared to the results of the national forest inventory (NFI).CART = Classification and Regression Trees; RF = Random Forest.

Figure 5 .
Figure 5. Harvested volume predictions of ordinary least squares regression (OLS) trained (a) as direct model on the entire training set or (b) as downstream model on plots of the training set that were harvested according to the national forest inventory (NFI).The panels show test set results for the entire test set (upper panel), harvested plots of the test set (mid-panel), and not-harvested plots of the test set (lower panel).

Figure 5 .
Figure 5. Harvested volume predictions of ordinary least squares regression (OLS) trained (a) as direct model on the entire training set or (b) as downstream model on plots of the training set that were harvested according to the national forest inventory (NFI).The panels show test set results for the entire test set (upper panel), harvested plots of the test set (mid-panel), and not-harvested plots of the test set (lower panel).

Figure 6 .
Figure 6.Test set harvested volume predictions of selected methods compared to the results of the national forest inventory (NFI).The order of the bars corresponds to the order of the legend.CART = Classification and Regression Trees; RF = Random Forest.(Error bars of the NFI refer to the standard error of the NFI; error bars of the stratified random prediction refer to the standard deviation calculated from 500 repetitions).

Figure 6 .
Figure 6.Test set harvested volume predictions of selected methods compared to the results of the national forest inventory (NFI).The order of the bars corresponds to the order of the legend.CART = Classification and Regression Trees; RF = Random Forest.(Error bars of the NFI refer to the standard error of the NFI; error bars of the stratified random prediction refer to the standard deviation calculated from 500 repetitions).

Figure A1 .
Figure A1.Variable importance for Random Forests applied to individual leaves of the classification tree.

Figure A1 .
Figure A1.Variable importance for Random Forests applied to individual leaves of the classification tree.

ForestsFigure A2 .
Figure A2.Harvested volume predictions of Least Squares Regression (OLS) for test set plots that were harvested according to the National Forest Inventory (NFI).

Figure A2 .
Figure A2.Harvested volume predictions of Least Squares Regression (OLS) for test set plots that were harvested according to the National Forest Inventory (NFI).

Table 1 .
Ordinary Least Squares (OLS) regression with estimates, standard errors (in parentheses), and significance codes fitted on harvested plots of the training set (sample size: 5587 plots).

Table 2 .
Test set results of the methods used to learn and predict the general harvest decision (sample size 2930 plots).

Table 2 .
Test set results of the methods used to learn and predict the general harvest decision (sample size 2930 plots).