Multistage Sample Average Approximation for Harvest Scheduling under Climate Uncertainty

Abstract: Forest planners have traditionally used expected growth and yield coefficients to predict future merchantable timber volumes. However, because climate change affects forest growth, planning methods based on the expected value of forest growth can lead to sub-optimal harvest decisions. In this paper, we formulate harvest planning under growth uncertainty due to climate change as a multistage stochastic optimization problem and use sample average approximation (SAA) to find the best set of forest units to harvest in the first period, even though knowledge of the future climate is limited. The objective of the harvest planning model is to maximize the expected net present value (NPV), accounting for the uncertainty in forest growth and thus in revenues from timber harvest. We tested the proposed model on a small forest with 89 stands; the numerical results show that the approach yields solutions that are superior in net present value and more robust across climate scenarios than the approach using expected growth and yield. The SAA method requires generating samples from the distribution of the random parameter. Our results suggest that a sampling scheme that concentrates samples in distant future stages is preferable to one with large sample sizes in the near future stages. Finally, we demonstrate that, depending on the magnitude of forest growth change, ignoring this uncertainty can negatively affect the sustainability of forest resources.


Introduction
Climate change is arguably one of the most challenging issues that contemporary forest planners need to address. In general, forest practitioners need to provide a long-term forest harvest scheduling plan that takes into account the preferences of the stakeholders while complying with environmental, regional and ecological restrictions. Usually, forest planning aims at scheduling which forest units should receive a specific treatment, such as harvesting or thinning, in each period in order to achieve the management objectives. This planning, also known as harvest scheduling, spans several decades and is subject to many sources of uncertainty, including natural hazards (fire, windthrow, insects, etc.) and technological limitations such as forest inventory errors, growth prediction errors, and poor foresight of the price of forest products on the market [1,2]. It is important to incorporate all or some of these uncertainties in forest planning.
The need to incorporate uncertainty in forest harvest scheduling was acknowledged several decades ago [3]. Subsequently, [4] urged for the consideration of forest growth uncertainty in planning. The authors asserted that the cost of ignoring such uncertainty at a forest scale can be detrimental. When the uncertainty is ignored, decisions are made assuming that future growth will equal the expected (average) growth. However, it is rare that the actual growth matches the average, so many harvest decisions made in early periods of the planning horizon turn out to be sub-optimal. This leads to a failure to meet the planning objectives and to satisfy many of the aforementioned restrictions. Owing to the computational challenges that the introduction of uncertainty in harvest modeling involves, [5] recommended that each forest harvest scheduling plan be accompanied by an estimate of its uncertainty or reliability. Notwithstanding all the exhortations to incorporate uncertainty in forest planning, harvest scheduling models have for a long time ignored uncertainty or were limited to reporting the sensitivity of the harvesting plans to changes in one or more input parameters. Only recently has there been a prolific number of papers addressing the integration of uncertainty in harvest planning. The most commonly addressed uncertainties are wood price and demand [6][7][8][9][10]. Recently, there has been interest in incorporating climate uncertainty in forest planning. The first study that we are aware of that explicitly addresses the issue was conducted by [11]. The authors assessed how climate change affected the management decisions of a Eucalyptus forest in Portugal with a planning horizon of 15 years. A similar study was conducted by [12,13] using the same dataset.
In this paper, we model harvest planning as a stochastic optimization problem and use sample average approximation (SAA) as a tool for identifying the best set of actions one can take to meet the management objectives. In other words, we use SAA to solve harvest planning models. Stochastic optimization is a modeling framework for optimization problems dealing with uncertainty. In multistage stochastic programming, decisions are made sequentially at stages and the uncertainty unfolds over the periods. Thus, the decision maker needs to implement a decision at the beginning of the planning horizon (first-stage decisions) without knowing what the value of the uncertain parameter will be. After a period in which the uncertainty is revealed, the decision maker can take recourse actions at subsequent stages. It is therefore important to make optimal decisions in the early stages of the planning without knowing the magnitude of the uncertainty that will unfold. Specifically, in strategic harvest scheduling, the decision maker needs to prioritize the set of forest units (stands) that should be harvested here and now, during the ongoing decade or period. After that period, the decision maker will have the opportunity to revisit the management decisions in the following periods. In this case, the goal of the harvest scheduling is to decide the set of actions that managers should apply immediately, "here and now" (first-stage decisions).
SAA is a Monte Carlo simulation based approach for solving stochastic optimization problems. It consists of repeatedly drawing samples from the distribution of the random parameter and solving the resulting sample-average deterministic optimization problem. Each sample might lead to a different solution. However, if the sample size is large enough, the sample objective function value will approximate the true optimal value. [14] showed that for minimization problems, the expectation of the samples' objective value is a lower bound on the true optimal value of the stochastic optimization problem and that this bound monotonically increases with the sample size. SAA has been successfully applied in several fields such as portfolio selection [15], supply chain network design [16,17], facility location [18] and personnel assignment [19]. Notwithstanding all these applications, and the fact that SAA is well suited for problems where the objective function is difficult to evaluate, such as harvest planning under climate uncertainty [15], to the extent of our knowledge, SAA has never been applied in the forestry domain. In addition, unlike stochastic programming, which requires knowledge of the probability distribution of the samples, SAA, known as a data-driven optimization method, considers the samples to be equiprobable. The idea is that if the sample size is large enough, the sample statistics will represent those of the actual population the sample is taken from. Hence, there is no need to know the actual distribution of the uncertainty. This is particularly important for harvest scheduling under climate uncertainty because climate change is forecast as possible futures without any probabilities. Moreover, although the method performs well for two-stage stochastic problems [20], it is unclear how it will perform on a multistage harvest scheduling problem.
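To make the SAA idea concrete, the following toy sketch replaces the expectation of a random profit with a sample average and picks the decision that maximizes it. This is not the harvest model: the newsvendor-style profit function, the candidate grid and the uniform demand are hypothetical, chosen only to illustrate the mechanics.

```python
import random

def saa_estimate(candidates, sample, profit):
    """Pick the candidate decision maximizing the sample-average objective."""
    best_x, best_val = None, float("-inf")
    for x in candidates:
        # Sample average replaces the (intractable) expectation.
        val = sum(profit(x, xi) for xi in sample) / len(sample)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Toy newsvendor-style profit: sell min(x, demand) at price 2, unit cost 1.
profit = lambda x, xi: 2 * min(x, xi) - 1 * x

random.seed(0)
sample = [random.uniform(50, 150) for _ in range(1000)]  # hypothetical demand
x_star, v = saa_estimate(range(0, 201, 5), sample, profit)
```

With a large enough sample, the chosen decision approaches the true stochastic optimum (here, near the demand median), which is exactly the convergence property exploited in the paper.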
The objective of this paper is to fill the gap in the literature on SAA and multistage stochastic harvest scheduling models. For multistage stochastic optimization problems, each sample of the uncertainty is known as a scenario. The number of scenarios to consider grows exponentially with the number of stages in multistage stochastic programs, and forest harvest scheduling models are typically characterized by long planning horizons with many stages. The contributions of this paper are as follows:
1. We introduce a method to handle climate change uncertainty in forest harvest scheduling that allows sound decisions to be made in early stages of planning. Poor decisions in early stages are significantly more detrimental to many businesses than decisions in later stages; hence, later-stage decisions can be considered secondary.
2. We propose a method to generate the set of scenarios that reduces the sample size and keeps the optimization model tractable. If we generate possible scenarios of forest growth in an i.i.d. (independent and identically distributed) fashion, a very large sample size might be required before the SAA solution converges to optimality, which can make the problem computationally intractable. We propose a sampling scheme that focuses on having a higher number of replications for distant stages.
We will show that stochastic harvest scheduling has an advantage over the expected value approach (the deterministic model presented in Appendix A) because it allows the implementation of policies that are robust across all foreseen forest growth changes, by providing the management decisions that would be implemented both when we consider the uncertainty and when we do not. The rest of this document is structured as follows. In Section 2, we present the materials and methods used in this research, including the SAA method and the scenario generation scheme. Section 3 is dedicated to presenting our findings. Finally, in Section 4, we discuss the results and outline future research.

Problem description and formulation
We present in this section the multistage stochastic harvest scheduling problem with forest growth uncertainty due to climate change. Let t ∈ T = {1, . . . , T} be the set of stages in which forest units are eligible for harvest in the future. We reserve t = 0 for the time "now" (first-stage decision), where there is no uncertainty on the forest growth. We define ξ := {ξ_1, . . . , ξ_T} to be the random vector characterizing forest growth change due to climate change. Hence, ξ_t is the random parameter of ξ at time t. Each ξ_t has a support Ξ_t, which is the range of values that ξ_t can take. The meaning of parameters, variables and sets is given in Table 1.
The objective of the decision maker is to maximize the expected net present value (NPV) from timber harvest. This can be formulated in a generic form as (1) and (2), where the expectation is taken with respect to ξ and R(x, ξ^i) is the optimal value of the harvest scheduling given the harvest in the first stage (x) and the occurrence of one climate scenario ξ^i. In this formulation, the decision variables are only the first-stage variables, which means that the decision maker mainly cares about how to select the stands that can be harvested in the first period so that the expected NPV is maximized. The value of R(x, ξ^i) is obtained by solving the optimization problem with objective (3) subject to constraints (4)-(12). The objective function (3) maximizes the net present value from the subsequent stages after the current harvest (x). It includes the cost of harvest and re-plantation for each forest unit, and it accounts for the value of the stands that are not harvested during the planning horizon, because those stands have a monetary value. Constraint set (4) imposes that if a stand is harvested now, it cannot be harvested in subsequent years; in other words, a forest unit can only be harvested once during the whole planning horizon. The w_s variables account for the stands that are not scheduled for harvest during the whole planning horizon. Constraint sets (5) and (6) compute the volume of wood harvested now and in the future periods of the planning horizon, respectively. The volume computed in each period depends on the scenario ξ^i. Constraint sets (7) and (8) impose that the volume fluctuation between two consecutive stages stay within given lower and upper bounds, respectively. These constraints are also known as even-flow constraints since they ensure that the volume of timber harvested is almost evenly distributed over time.
Even with constraint sets (7) and (8), there is a possibility that the harvested volume declines with time; the volume at t = 4 might, for instance, be much lower than the volume at t = 1. To attenuate this effect, we impose the supplementary wood-flow constraint sets (9) and (10), which impose flow restrictions between two non-consecutive stages. Constraint set (11) states that the age of the forest at the end of the planning horizon should be greater than or equal to the current age of the forest. This constraint is a proxy for sustainability; it ensures that forest resources are not depleted during the planning horizon. Finally, the definition of the variables is given in (12).
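As an illustration, the even-flow logic of constraints (7)-(10) can be expressed as a feasibility check on a sequence of per-period harvest volumes. The bound multipliers mirror the α, β, γ, λ parameters used later in the experiment; pairing each stage with the stage two periods earlier in the non-consecutive check is our assumption for illustration, not the paper's exact formulation.

```python
def even_flow_ok(volumes, alpha=0.85, beta=1.15):
    """Check constraints (7)-(8): the volume in each period must lie
    within [alpha * previous, beta * previous]."""
    return all(alpha * v0 <= v1 <= beta * v0
               for v0, v1 in zip(volumes, volumes[1:]))

def nonconsecutive_flow_ok(volumes, gamma=0.85, lam=1.15):
    """Sketch of constraints (9)-(10): bound the volume at stage t
    against stage t-2 (assumed pairing), attenuating gradual decline."""
    return all(gamma * v0 <= v2 <= lam * v0
               for v0, v2 in zip(volumes, volumes[2:]))
```

For example, the volume path [100, 110, 100, 95] satisfies the consecutive check with the default bounds, while [100, 80] violates it because the 20% drop exceeds the 15% allowed decline.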

Climate change data
In this paper, the uncertainty in forest growth stems from climate change. We present in this section how climate change was translated into forest growth so as to be suitable for the harvest planning model. Climate change in this project refers to the change in forest growth; hence, the change can be positive or negative. The change is small in the near future (stage 1) compared to the distant future (stage 4), with a ten-year difference between two consecutive stages. We consider the case of the Pacific Northwest, where most models forecast that climate change will increase forest growth. We report in Table 2 the growth change used for the analysis; these data are based on [21, Table 3]. Forest growth is age dependent, and this is reflected in the growth change. In addition, this growth change modeling reflects a Geometric Brownian Motion process in which the absolute increments of growth are not independent from one period to another, although the percentages of change are. The Lower and Upper columns in the table correspond to the lowest and highest possible growth change, respectively, at each stage. Formally speaking, [Lower, Upper] at stage t represents the support Ξ_t of the random parameter ξ_t.
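The multiplicative structure described above can be sketched as follows: per-stage percentage changes are drawn independently within each stage's support and then compounded, so that absolute growth levels are serially dependent. The per-stage supports below are hypothetical placeholders for the values in Table 2.

```python
import random

def sample_growth_path(bounds, seed=None):
    """Sample one growth-change path.  Percentage changes are drawn
    independently within each stage's [lower, upper] support, then
    compounded so that absolute levels depend on the previous stage,
    mimicking a Geometric-Brownian-Motion-style process."""
    rng = random.Random(seed)
    level, path = 1.0, []
    for lo, hi in bounds:
        pct = rng.uniform(lo, hi)   # independent percentage change
        level *= 1.0 + pct          # serially dependent absolute level
        path.append(level)
    return path

# Hypothetical supports widening toward distant stages, as in the text.
bounds = [(-0.01, 0.02), (-0.02, 0.04), (-0.03, 0.06), (-0.04, 0.08)]
path = sample_growth_path(bounds, seed=1)
```

Each path of compounded levels is one realization of the random vector ξ used by the harvest model.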
To assess the performance of the proposed modeling framework, we change the lower bound on the growth change by multiplying it by a factor, for which we assessed the values 1, 20 and 40. This allows us to evaluate the proposed method under climate change scenarios that forecast a decrease in forest growth. Only a factor of 40 corresponds to forest decline, whereas a factor of 20 corresponds to a decrease in forest growth.

Multistage sampling
Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 13 October 2020 doi:10.20944/preprints202010.0273.v1
We have presented both the optimization model we intend to solve in this paper and the uncertain parameter. In this section, we describe how to sample from the distribution of the uncertain parameter. To this end, let N = {N_1, . . . , N_T} be a sequence of positive integers. At the first stage (stage 1), we generate N_1 replications drawn from Ξ_1, the support of the random parameter ξ_1. To minimize the number of replications necessary, we generate the N_1 replications by dividing Ξ_1 into N_1 intervals and sampling uniformly one realization of ξ_1 from each interval. We repeat this procedure for stage 2 and so forth for the following stages. At the end, we connect the realization at t = 0 to all realizations at t = 1, connect all realizations at t = 1 to all those at t = 2, and continue until the last stage. The result of this procedure is a scenario tree with a total number of scenarios, or sample size, of n = ∏_{t=1}^{T} N_t. Each path in this scenario tree can be viewed as a scenario with probability 1/n. Varying the values of N_t allows different sample sizes, which can be solved using SAA as described in the following section. It is not clear, however, whether having N_1 ≥ N_2 ≥ · · · ≥ N_T is preferable to a sampling scheme with N_1 ≤ N_2 ≤ · · · ≤ N_T. To answer this question, we tested the solution with a sampling scheme with N_1 = N_2 = · · · = N_T in Section 3.1 and evaluate in Section 3.2 the solution behavior under the two other sampling schemes.
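The sampling procedure above can be sketched in a few lines: stratify each stage's support into N_t equal sub-intervals, draw one uniform realization per sub-interval, and cross all per-stage realizations into equiprobable scenarios. The supports are hypothetical placeholders, and the stage sizes (N_1, ..., N_4) = (2, 3, 5, 6) are one example.

```python
import itertools
import random

def stratified_stage_sample(lo, hi, n_t, rng):
    """Divide the support [lo, hi] into n_t equal sub-intervals and
    draw one uniform realization from each."""
    width = (hi - lo) / n_t
    return [rng.uniform(lo + k * width, lo + (k + 1) * width)
            for k in range(n_t)]

def build_scenarios(supports, sizes, seed=0):
    """Cross all per-stage realizations into n = prod(N_t) scenarios,
    each carrying probability 1/n."""
    rng = random.Random(seed)
    stage_samples = [stratified_stage_sample(lo, hi, n, rng)
                     for (lo, hi), n in zip(supports, sizes)]
    scenarios = list(itertools.product(*stage_samples))
    prob = 1.0 / len(scenarios)
    return scenarios, prob

# Hypothetical per-stage supports [Lower, Upper] and sizes N_1..N_4.
supports = [(-0.01, 0.02), (-0.02, 0.04), (-0.03, 0.06), (-0.04, 0.08)]
scenarios, prob = build_scenarios(supports, [2, 3, 5, 6])
```

Here the cross product yields 2 * 3 * 5 * 6 = 180 scenarios, illustrating how unequal N_t values control the total sample size.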

Solution method
We used SAA to solve the stochastic harvest scheduling model, since the main challenge of the decision maker is to find the set of stands that are suitable for harvest in the first period. A random sample is generated with uniform probability, and the expected value function of the stochastic problem is approximated by the corresponding sample average function. The sample average approximation problem specified by the generated sample is then solved by a deterministic optimization technique, which is mixed integer programming in this study. This procedure is repeated with increasing sample sizes to obtain solutions that are close enough to the true optimal solution.
To formally introduce the SAA method used to solve the harvest scheduling problem, we reformulate the problem in (1) and (2) as (13), where X is the feasible set (constraints (2), (4) to (12)). We can write (13) because the term r_s x_s involves no uncertainty. Let ξ^1, . . . , ξ^n be a sample of size n drawn from the distribution of ξ, generated using the method described in Section 2.3. The sample-average objective function can then be written as (14)-(16). The idea of SAA is to solve (16) instead of (13). We now describe the steps for solving a stochastic optimization problem using SAA, as outlined in Figure 1.
Step c: Generate M samples of size n. The idea is to start with lower values of n and increase the sample size progressively. [22] describes a procedure for choosing the value of M; in a nutshell, M should be chosen so as to allow the computation of statistics such as the mean, variance and confidence interval.
Step d: Solve the SAA problem of each sample m to get $z_n^m = \max_{x \in X} \frac{1}{n} \sum_{i=1}^{n} f_n(x, \xi^i)$ and the upper bound $\bar{z}_n^M = \frac{1}{M} \sum_{m=1}^{M} z_n^m$. $\bar{z}_n^M$ is an upper bound because it is the average of the upper bounds $z_n^m$, each obtained by using only a subset of scenarios, and therefore the solutions are optimistic.
Step e: For each sample m, fix the first-stage variables to $\hat{x}$ obtained from steps a & b to get a lower bound on the optimal solution, $\hat{z}_n^m(\hat{x}) = \max \frac{1}{n} \sum_{i=1}^{n} f_n(\hat{x}, \xi^i)$.
Step f: Compute for each sample m the optimality gap $G_n^m(\hat{x}) = z_n^m - \hat{z}_n^m(\hat{x})$, where $f_n(\hat{x}, \xi^i)$ is obtained by fixing the first-stage variable in $f_n$ to $\hat{x}$ from step a. Notice that $G_n^m(\hat{x}) \geq 0$. We can then compute the average optimality gap $\bar{G}(\hat{x}) = \frac{1}{M} \sum_{m=1}^{M} G_n^m(\hat{x})$ and its variance $S_{\bar{G}}^2$. [14] proved that the optimality gap depends on the sample size n: as n → ∞, the optimality gap converges to zero and the SAA objective function value converges to the true optimal objective value with probability one. In other words, the sample size is large enough if the optimality gap is small.
Step g: If the optimality gap computed in step f is not sufficiently small, increase the sample size n and go back to step c. The decision of whether the optimality gap is sufficiently small and its variance sufficiently low is domain specific.
Step h: If the optimality gap is small enough, then we have obtained the optimal solution and we can compute the confidence interval on the optimality gap as $\mu_{\hat{x}} \in \left[0, \; \bar{G}(\hat{x}) + t_{M-1,\alpha} S_{\bar{G}}\right]$ with a confidence of 100(1 − α)%, α being the Type I error.
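Steps f-h reduce to simple statistics once the per-sample upper bounds $z_n^m$ and fixed-first-stage values $\hat{z}_n^m(\hat{x})$ are available. The sketch below uses toy numbers, takes the Student-t quantile $t_{M-1,\alpha}$ as an input (from a table or, in practice, scipy.stats.t.ppf), and assumes $S_{\bar{G}}$ denotes the standard error of the average gap.

```python
import math

def optimality_gap_ci(z_upper, z_lower, t_quantile):
    """Per-sample gaps G_m = z_m - zhat_m, their average, the sample
    variance, and the one-sided upper confidence limit
    mean + t_{M-1,alpha} * sqrt(var / M) on the true gap."""
    M = len(z_upper)
    gaps = [zu - zl for zu, zl in zip(z_upper, z_lower)]
    mean = sum(gaps) / M
    var = sum((g - mean) ** 2 for g in gaps) / (M - 1)
    upper = mean + t_quantile * math.sqrt(var / M)
    return mean, upper

# Toy numbers; t_quantile = t_{M-1, 0.05} from a Student-t table.
z_upper = [105.0, 103.0, 104.0]   # SAA optimal values z_n^m
z_lower = [101.0, 100.0, 102.0]   # fixed-first-stage values zhat_n^m(xhat)
mean_gap, ci_upper = optimality_gap_ci(z_upper, z_lower, t_quantile=2.92)
```

If the resulting upper limit is small relative to the objective value, the current sample size n is accepted (step h); otherwise n is increased (step g).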

Importance of SAA
To assess the advantage of solving the stochastic optimization problem with SAA instead of the deterministic model that only considers the average scenario (the model that ignores climate change uncertainty), we compute the so-called value of the stochastic solution (VSS) [23]. Let $\tilde{x}$ be the first-stage solution we would get if we implemented the average-growth solution, and let $x^*$ be the optimal solution of the stochastic model obtained from SAA. We denote by $z^i$ and $\tilde{z}^i$ the NPVs of scenario $\xi^i$ when the first-stage variables are fixed to $x^*$ and $\tilde{x}$, respectively. Finally, let $z(x^*)$ and $z(\tilde{x})$ be the corresponding expected NPVs over all scenarios. We compute VSS as $VSS = z(x^*) - z(\tilde{x})$. In relative terms, we express VSS in basis points (bps; 1 bp = 0.01%, a unit commonly used in economics and finance) as $VSS_{bps} = 10^4 \cdot VSS / z(\tilde{x})$. In cases where some scenarios are not feasible after fixing the first stage to $\tilde{x}$, we report the number of infeasible scenarios.
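The VSS computation is a straightforward comparison of scenario NPVs under the two first-stage decisions. The sketch below uses toy NPV lists and assumes the standard basis-point conversion (1 bp = 0.01%).

```python
def vss(npv_stochastic, npv_deterministic):
    """Value of the stochastic solution: the difference between the
    average NPV over scenarios under the SAA first-stage decision x*
    and under the deterministic (average-growth) decision, plus the
    same quantity expressed in basis points of the deterministic NPV."""
    z_star = sum(npv_stochastic) / len(npv_stochastic)
    z_det = sum(npv_deterministic) / len(npv_deterministic)
    absolute = z_star - z_det
    bps = 1e4 * absolute / z_det    # 1 bp = 0.01% (assumed definition)
    return absolute, bps

# Toy per-scenario NPVs under x* and under the deterministic solution.
abs_vss, bps_vss = vss([102.0, 104.0, 100.0], [100.0, 101.0, 99.0])
```

A positive VSS indicates that solving the stochastic model pays off relative to using the average-growth plan; infeasible scenarios under the deterministic decision would be counted separately, as in the text.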

Experiment
We present in this section the values of several parameters introduced in the methods in order to conduct a numerical experiment. The numerical experiment was run on a Windows desktop with an 8-core AMD processor at 4 GHz and 8 GB of RAM. We used CPLEX 12.8 to solve the stochastic model, with Python 3 as the modeling language. We solved each model to a 0.5% mixed integer programming (MIP) optimality gap. For each model, we generated 30 independent replications (M = 30). We used bootstrapping to compute 95% confidence intervals for the different metrics. N was set to 625 scenarios and we varied the sample size n from 16 to 625. We tested the modeling on the Phyllis Leeper forest with 89 stands (http://ifmlab.for.unb.ca/fmos/datasets/PhyllisLeeper/). Model parameters α and γ were set to 0.85, whereas β and λ were set to 1.15. The planning horizon was 50 years, divided into 5 planning periods of 10 years each.
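The bootstrap confidence intervals mentioned above can be computed with a percentile bootstrap. This is a generic sketch, not the study's exact procedure: the resampling count and the mean statistic are illustrative choices.

```python
import random

def bootstrap_ci(data, stat=lambda xs: sum(xs) / len(xs),
                 n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap 100(1-alpha)% confidence interval for a
    statistic: resample the data with replacement, recompute the
    statistic, and read off the empirical quantiles."""
    rng = random.Random(seed)
    reps = sorted(stat([rng.choice(data) for _ in data])
                  for _ in range(n_boot))
    lo = reps[int(n_boot * alpha / 2)]
    hi = reps[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

# Toy metric values from M replications (hypothetical numbers).
lo, hi = bootstrap_ci([10.0, 12.0, 11.0, 13.0, 9.0, 11.5])
```

The same routine can be applied to any of the reported metrics (NPV, optimality gap) across the M = 30 replications.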

Effect of sample size on the optimality gap
The results of the experiments are summarized in Figure 2. The sample size is the number of scenarios. These results were obtained by setting N_1 = N_2 = N_3 = N_4. The solutions show that as the sample size increases, the NPV decreases, converging toward the optimal objective function value. In addition, the confidence intervals tighten as the sample size increases. A similar behavior is observed for the optimality gap. However, larger sample sizes lead to a significant increase in solution time. We can conclude that solving problems with up to 256 scenarios is sufficient to capture the variability and obtain solutions that are close enough to the true optimal solution, since increasing the sample size to 625 did not significantly reduce the optimality gap or its variance. Furthermore, although larger sample sizes reduce the optimality gap and its variance, we must compromise on solution time, which increases exponentially with the sample size.

Effect of sampling scheme on the optimality gap
In this section, we test the effect of the scenario generation scheme on the optimality gap. The scheme used to generate scenarios for the four stages is presented in Figure 3. In the figure, each sampling scheme is written as four digits corresponding to N_1, N_2, N_3 and N_4. For instance, 1155 means that N_1 = 1, N_2 = 1, N_3 = 5 and N_4 = 5, leading to 1*1*5*5 = 25 scenarios. This means that the schemes 1555 and 5551 lead to the same number of scenarios (125 scenarios). As we can see from Figure 3, the optimality gap and its variability when N_1 = 1 are larger than when N_1 > 1. The lowest optimality gap is obtained with N_1 = 3. In general, having higher values of N_t for higher t seems to be favorable.

Value of the stochastic solution
Many scenarios were not feasible when the deterministic solution was implemented here and now (Table 3). As the magnitude of the forest growth change uncertainty increases, the value of the stochastic solution increases as well. When we consider a factor of 20, 3 out of 200 scenarios were infeasible.

Discussion and conclusions
Climate change is a serious issue in forest management planning. In a study conducted in Norway, the majority of forest managers acknowledged the importance of addressing climate change in forest planning [24]. Several researchers conducted similar studies in different ecosystems and reached analogous results [25,26]. To ensure the sustainability of forest resources, forest managers need to incorporate forest growth uncertainty in harvest scheduling models. In this work, we formulated a stochastic harvest scheduling model with forest growth uncertainty due to climate change and solved the model using sample average approximation (SAA). We tested the modeling and solution approach using climate change data transformed into forest growth change. We compared the robustness of the stochastic solution to that of the deterministic one by randomly generating a set of scenarios and comparing the expected NPV when implementing the stochastic solution with the NPV when using the deterministic solution. The numerical results showed that SAA yields stochastic solutions that are close enough to the true optimal solution when the sample size is large enough. However, large sample sizes lead to an exponential increase in solution time; this pattern of computational complexity growing exponentially with the sample size was previously reported by [22].
One of the main limitations of the proposed method is the computation required in step d of the algorithm presented in Figure 1. As discussed in the previous paragraph, increasing the sample size leads to an exponential increase in solution time. It is therefore important to have a strategy that reduces the sample size while still producing solutions for which SAA converges to the true optimum. Our tests suggest that with an appropriate sampling scheme, it is possible to reach convergence with smaller sample sizes. For instance, the sampling schemes 2356 and 3344, yielding sample sizes of 180 and 144, respectively, have smaller optimality gaps and variances than the sample size of 625 stemming from the sampling scheme 5555. In conclusion, when we adopt a sampling strategy that sufficiently explores the first stage and generates large samples for future stages, we can limit the number of scenarios necessary for the SAA solution to converge to the true optimal solution.
The proposed model not only allows managers to make intelligent decisions now, but also allows the preservation of forest resources that take time to replenish once depleted. Indeed, the stochastic solution is robust to different climate scenarios, whereas the deterministic one is infeasible for many of the tested climate scenarios. These results are in line with the ones obtained by [12,13]. The infeasibility of the deterministic solution stems mainly from the violation of the wood-flow constraints due to intensive harvest in early periods.
In addition, the solution method proposed in this paper is easy to develop and implement by many forest managers. Unlike the methods used in [11,13], which require knowledge of the probability distribution of the random parameter, SAA does not require such information. The method relies on the fact that if the sample size is large enough, the sample statistics will approximate those of the actual population. The method is also suitable for many applications where the objective function cannot be computed in closed form, such as so-called black-box optimization problems [27] and the stochastic knapsack problem [22]. In forestry, this method is well suited for harvest scheduling problems with wood price and demand uncertainties, because we can draw samples from historical demand and price data without the need to model prices as done in [10,28]. Compared to stochastic programming, which requires the so-called non-anticipativity constraints [13,29], the SAA model is relatively smaller in terms of the number of constraints (and possibly the number of variables, depending on the formulation) since it does not require such constraints.
Regarding climate change data, we used climate change forecasts from the statistical models developed in [30]. Unlike the data from [11][12][13], which originated from process-based modeling, statistical models of forest growth under climate change are much more common (e.g., [31]). Hence, the method developed in this study can easily be extended to other forest systems. Moreover, it is straightforward to translate forecasts of precipitation, air moisture and temperature into forest growth with statistical models, whereas process-based models require expertise in plant biology.
Finally, this research can be extended by considering other sources of uncertainty, such as the price of wood or the demand for forest products. In this research, we assumed that climate change does not lead to species transition. In some cases, one might need to consider the species shift caused by climate change. It would be interesting to include this information in the decision-making process and to provide forest practitioners with a range of options when implementing harvest scheduling plans and choosing species for regeneration.

Appendix A. Deterministic harvest scheduling model
The objective function (A1) maximizes the net present value from forest harvest. It includes the cost of harvest and re-plantation for each forest unit, and it accounts for the value of the stands that are not harvested during the planning horizon, because those stands have a monetary value. Constraint set (A2) imposes that if a stand is harvested now, it cannot be harvested in subsequent years; in other words, a forest unit can only be harvested once during the whole planning horizon. The n_s variables account for the stands that are not scheduled for harvest during the whole planning horizon. Constraint set (A3) computes the volume of wood harvested in each period of the planning horizon. Constraint sets (A4) and (A5) impose that the volume fluctuation between two consecutive stages stay within fixed lower and upper bounds, respectively. These constraints are also known as even-flow constraints since they ensure that the exploitation of forest resources is evenly distributed over time. Even with constraint sets (A4) and (A5), there is a possibility that the harvested volume declines with time; the volume at t = 4 might, for instance, be much lower than the volume at t = 0. To attenuate this effect, we impose the supplementary wood-flow constraint sets (A6) and (A7), which impose flow restrictions between two non-consecutive stages. Constraint set (A8) states that the age of the forest at the end of the planning horizon should be greater than or equal to the current age of the forest. This constraint is a proxy for sustainability; it ensures that forest resources are not depleted during the planning horizon. Finally, the definition of the variables is given in (A9).