Optimal Exploitation of a General Renewable Natural Resource under State and Delay Constraints

: In this work, we study an optimization problem arising in the management of a natural resource over an infinite time horizon. The resource is assumed to evolve according to a logistic stochastic differential equation. The manager is allowed to harvest the resource and sell it at a stochastic market price modeled by a geometric Brownian process. We assume that there are delay constraints imposed on the decisions of the manager. More precisely, starting harvesting order and selling order are executed after a delay. By using the dynamic programming approach, we characterize the value function as the unique solution to an original partial differential equation. We complete our study with some numerical illustrations.


Introduction
In the recent decades, the management of natural resources has become a major issue. Indeed, for many countries, natural resources ensure regular incomes, allowing for economic growth and development. In particular, the seeking of high short-term profits can lead to an overconsumption of the natural resources and therefore to their exhaustion (see, e.g., [1]). Hence, the question of the sustainability of such natural resources is crucial.
As a consequence, many countries have imposed restrictions on the exploitation of natural resources so as to avoid their depletion. One of the repercussions of these constraints is the non-immediacy of the decisions: the actions of the natural resources managers are executed after some delay. Moreover, the harvests are limited in time, and sometimes, we have a lag constraint between two harvests. The aim of this work is to model these delays and to study their effects on the gain and the behavior of the natural resource managers.
We suppose that the resource manager can act by two type of interventions: starting and stopping the harvesting of the natural resource. We therefore model the strategy by a double sequence (d i , s i ) i≥1 where d i and s i are stopping times representing respectively the time of the i-th decision to start and stop harvesting. Therefore, we assume s i ≥ d i for i ≥ 1.
Such a formulation naturally appears in decision-making problems in economics and finance. In many cases, managers face technical and regulatory delays, which may be significant. Thus, these delays need to be taken into account in the way of acting (see for example [2,3]). In our case, we consider the management of a natural resource when we have constraints and lags. We first suppose that there is a minimum time δ between the end of an action and the beginning of the following one. This constraint can be written on the strategy (d i , s i ) i≥1 as d i+1 ≥ s i + δ for i ≥ 1. We also suppose that we have two kind of lags. The first one appears for starting orders: the harvest of the natural resource starts after a given fixed delay . This delay represents the time needed to access the natural resource. The second kind of lag, denoted by m, corresponds to the time between the end of the harvest and the date when the manager sells the harvest; this lag can be due to the drying, the packaging, the transport, the time to find a counterpart to buy the harvest, etc.
Hence, our modeling takes into account the non-immediacy of both the harvest and its sale. As a result, the corresponding optimal strategies will be more practical and will lead to economic and environmental policies that are more effective than those suggested in the classic literature.
We assume that without any intervention of the manager, the natural resource abundance evolves according to a stochastic logistic diffusion model. Such a logistic dynamics is classic in the modeling of populations' evolution; see for example [4]. If we denote by X α the controlled resource abundance process and by P its price process, the problem of the manager turns into a maximization of the expected total profit on an infinite horizon of the form: over the strategies α = (d i , s i ) i≥1 satisfying the previous constraints. From a mathematical point of view, control problems with delay were studied in [5,6] where there was only one kind of intervention. In our model, we consider two kinds of interventions, which are moreover related by a constraint. In the paper [7], the authors also considered two kinds of interventions. However, there is no constraint linking them, and only one of them is lagged. Furthermore, the state variable (the resource abundance) is a physical quantity. We therefore have the additional state constraint restricting strategies to those in which the remaining abundance is nonnegative.
Control problems under state constraints without delay have been intensely studied in the literature (see for example [8] for the study of optimal portfolio management under liquidity constraints), and the classical approach to deal with such problems is to consider the notion of constrained viscosity solutions introduced in [9,10]. In this work, we adapt these techniques to a state constraints control problem with delay. Using a dynamic programming approach, we characterize the associated value function as the unique solution in the viscosity sense to an original partial differential equation (PDE). The novelty of the PDE lies in the different forms it takes on several regions of the space.
We then test the applicability of our approach by computing numerically the optimal strategies on some examples. Our numerical tests show that the optimal strategies heavily depend on the delay parameters. In particular, the effective optimal strategies are different from the naive optimal strategies, i.e., without delays. This illustrates the contribution of our approach to identifying optimal solutions for the management of natural resources.
The paper is organized as follows. We define the model and formulate our stochastic control problem in Section 2. In Section 3, we derive the partial differential equations associated with the control problem. Then, we characterize the value function as the unique viscosity solution of a Hamilton-Jacobi-Bellman equation. Finally, in Section 4, we compute numerically the value function and the associated optimal policy via an iterative procedure based on a quantization method. We further enrich our studies with numerical illustrations.

The Model
Let Ω = C(R + , R 2 ) be the space of continuous functions from R + to R 2 . We define on Ω the σ-algebra F generated by the coordinate functions ω ∈ Ω → ω t , for t ∈ R + , and we endow (Ω, F ) with the Wiener measure P. By an abuse of notation, we still denote by F the P-completed σ-algebra. We define on the probability space (Ω, F , P) the two R-valued processes B and W by: for t ∈ R + and ω = (ω 1 t , ω 2 t ) t≥0 . We then denote by F = (F t ) t≥0 the complete filtration generated by (W, B).
We consider a resource that evolves according to the classical logistic stochastic differential equation if there is no harvesting: where η, λ, and γ are three positive constants. The constant ηλ corresponds to the intrinsic rate of population growth, and 1/λ is the carrying capacity of the environment. A manager can harvest the resource under some conditions. We denote by α := (d i , s i ) i≥1 a harvesting strategy, which is described as follows.
• d i is the time at which the manager gives the order to harvest. The harvest starts only at time d i + , with a positive constant representing the delay. • s i is the time when the harvest is stopped.
In the following, we will only consider the set A of admissible strategies such that (d i ) i≥1 and (s i ) i≥1 are two increasing sequences of F-stopping times satisfying: and for any i ≥ 1, where δ and K are positive constants with < K. We assume that in the harvesting time, the manager harvests the quantity g(x) by the time unit where x is the quantity of the available resource, and g is a function satisfying the following conditions.
(Hg) g is an increasing function from R + to R + such that g(0) = 0, and there exist two positive constants a min and a max such that a min x ≤ g(x) ≤ a max x for any x ∈ R + . Moreover, the manager must pay a cost when he harvests during a period ∆t, and this cost is f (∆t) where f is an increasing function from R + to R + such that f (0) = 0.
Finally, after any harvest time, the manager sells at time s i + m, with m a positive constant, the harvested resource. We denote by P p the price of the resource, and we suppose that it evolves according to the following stochastic differential equation: where µ and σ are positive bounded F-adapted processes and p is the price at time 0. We assume that m < δ.
We can sum up all the constraints with the following graph.
The state A corresponds to the state where the manager can decide to start a harvest. The state B corresponds to the harvesting time. The state C corresponds to the moment of sale.
The variable d i (resp. s i ) corresponds to the time when the manager decides to leave the state A (resp. B). The time to go from the state A to the state B is . This means that the time between the order to harvest and the start of the harvest is . We cannot stay more than K − in the state B, which means that the harvesting time cannot be more than K − . The time to go from the state B to the state C is m. This means that the manager must wait m after the harvest to sell this production. The time to go from the state C to the state A is δ − m, which means that the minimum time between the sale and the next order to harvest is δ − m.
If the manager follows an admissible strategy α = (d i , s i ) i≥1 , then the quantity of available resource X x,α t at time t evolves with the following stochastic differential equation: with X x,α 0 = x.

The Value Function
The objective of the manager is to optimize the expected profit over an infinite horizon. The associated value function is then given by: where β is a positive constant corresponding to the discount factor, G α i and C α i corresponds to the gain, and the cost for the i-th harvest associated with the strategy α ∈ A: and:

Extension of the Value Function
In order to provide an analytic characterization of the value function V defined by (4), we need to extend the definition of this control problem to general initial conditions. Indeed, the delays imposed on the manager make the state process non-Markov. To overcome this issue, we introduce new variables keeping in mind the time spent from the previous decision. More precisely, we consider a gain function J(x, p, θ, ρ, y, α) from E := R + × R * + × D × A(Θ) to R, with x representing the size of the available resource at the initial time, p the price of the resource, θ the time from the last decision of the manager (start a harvest or stop a harvest), ρ the time from which the manager has decided to harvest the last time, y the quantity of the harvest until now associated with this harvest, and α the strategy. We introduce some notation to simplify the formulae: z := (x, p), Z := R + × R * + and Θ := (θ, ρ, y). We also introduce the following sets: The gain function J is given for any state (z, Θ) ∈ Z × D and strategy α ∈ A(Θ) by: where G 1 and C 1 are defined by: For any j ≥ 2, C j and G j are defined by (5) and (6).
To define the set of admissible strategies A(Θ), we first introduce the set of admissible strategies We can now define the extended value function v by: for any (z, Θ) ∈ Z × D.

Dynamic Programming Principle
To characterize the value function v by a PDE, we use the approach by dynamic programming principle. The value function v satisfies the following equalities, which depend on the set in which Θ lives.
Theorem 1. For any z ∈ Z and Θ ∈ D 0 , we have: For any z ∈ Z and Θ ∈ D 1 , we have: For any z ∈ Z and Θ ∈ D 2 , we have: The proof of this theorem is postponed to Appendix A.

Growth Property
We now impose the following assumption on the coefficients: We then get the following growth property for the value function v. This one will be useful to characterize v as the unique viscosity solution of a PDE system. Proposition 1. The value function v satisfies the following growth condition: there exist two positive constants C 1 and C 2 such that: for any z = (x, p) ∈ Z and Θ = (θ, ρ, y) ∈ D.
Proof. We first prove the left inequality. If Θ ∈ D 0 , we can consider the strategy that consists of stopping the harvest as soon as possible and never harvesting after that, so: If Θ ∈ D 1 , we can consider the strategy that consists of selling the harvest and never harvesting after that, and we get: If Θ ∈ D 2 , we can consider the strategy that consists of starting the harvest as soon as possible, stopping this as soon as possible, and never harvesting after that, so: Hence, the left inequality holds with C 1 = f (K − ).
We now prove the right inequality. For that, we introduce the processX x defined byX x 0 = x and: Using the closed formula of the logistic diffusion (see, e.g., in [11]), we have: which implies the following inequality: We now consider any strategy Since the cost function f is positive, we get: From Assumption (Hg), we have: From Inequality (10), we get (C is a generic constant, which can be modified): From Inequality (8) and all the constraints about d i and s i , we know that s i ≥ Ki + δ(i − 1) + m for any i ∈ N * , so we get:

Viscosity Properties and Uniqueness
We now consider all the cases to get the PDEs satisfied by the value function v, which is derived from the dynamic programming relation: • If Θ ∈ D 0 with θ ∈ [0, ), that means the manager has given the order to harvest, but this has not yet started, which implies: , that means the manager harvests, and he/she can decide to stop this, which implies: If Θ ∈ D 0 with θ = K, the manager must stop the harvest, so we have: • If Θ ∈ D 1 with θ ∈ [0, m), that means the manager has finished harvesting, but he/she has not yet sold his/her harvest, which implies: • If Θ ∈ D 1 with θ = m, that means the manager sells his/her harvest, which implies: , then the manager can do nothing, which implies: • If Θ ∈ D 2 with θ ≥ δ, then the manager can decide to start a harvest: The operator M 2 is defined for any function v ∈ C 2 (Z × D 2 ) by: As usual, we do not have any regularity property on the value function v. We therefore work with the notion of the viscosity solution.

5.
for any (z, Θ) ∈ Z × D 2 and ϕ ∈ C 2 (Z × D 2 ) such that: we have: A locally bounded function w defined on Z × D is said to be a viscosity solution to (11)-(17) if it is a supersolution and a subsolution to (11)-(17).
The next result provides the viscosity properties of the value function v.
Theorem 2 (Viscosity characterization). The value function v is the unique viscosity solution to (11)-(17) satisfying the growth condition (9). Moreover, v is continuous on Z × D.
The proof of this theorem is postponed in Appendixes A-C.

Numerical Results
Unfortunately, we are not able to provide an explicit solution for the HJB Equations (11)-(17). We therefore propose in this section a scheme to approximate the solution v.

The Discrete Problem
In the following, we introduce the numerical tools to solve the HJB equations related to the value function v. We use a numerical backward scheme based on the optimal quantization mixed with an iterative procedure. The convergence of the solution of the numerical scheme towards the solution of the HJB equation, when the time-space step on a bounded grid goes to zero, can be shown using the standard monotonicity, stability, and consistency arguments. We refer to [12,13] for numerical schemes of the same form.
Each HJB equation of the form min(βv − L i v, v − h i ) = 0, with i ∈ {0, 1}, will be approximated as follows: where: The constant ∆ represents the time step, and the index n stands for the iteration procedure steps, which are stopped when the error between two consecutive iterations becomes smaller than a given stopping criterion ε. The random variables ξ k and ξ l represent the quantization of two independent normally distributed random variables.

Remark 1.
Recall that the optimal quantization technique consists of approximating the expectation E[ f (Z)], where Z is a normal distributed variable and f is a given real function, by: The distribution of the discrete variable ξ is known for a fixed N := card(ξ(Ω)), and the approximation is optimal as the L 2 -error between ξ and Z is of order 1/N (see [14]). The optimal grid ξ(Ω) and the associated weights P(ξ = k) can be downloaded from the website: http://www.quantize.maths-fi.com/downloads.

Numerical Interpretations
The numerical computation is done using the following set of data: Penalty function: f (x) = 0.1 × x.  We plot the shape of the value function v for a fixed state (p, θ, ρ, y) in the plane of x s.t. (x, p, θ, ρ, y) ∈ D 0 and θ ∈ [0, ). We can see that, as expected, v is increasing with respect to x. We can remark about three cases in this figure: • x ∈ [1.5, 1.7]: in this case the, value function is increasing since if we have 1.5 ≤ x < x ≤ 1.7 at the initial time, then we will have X x −θ < X x −θ < 2 a.s., and the bigger is the resource when we harvest, the more we can harvest since the function g is increasing; • x ∈ [1. 7,2]: in this case, the value function is constant since if we have 1.7 ≤ x < x ≤ 2 at the initial time then we will have X x −θ = X x −θ = 2 a.s., so we harvest exactly the same quantity in the two cases; • x > 2: in this case, the value function is increasing since if we have 2 ≤ x < x at the initial time, then the resource decreases, but we will have X x −θ < X x −θ a.s. The value function increases with respect to x, which is natural since the greater the resource is, the more we can harvest as the function g is increasing. We can also see that the value function becomes concave after x = 2, which is due to the resource's mean-reverting nature. Indeed, if the quantity of the resource is greater than two, because the drift is negative, the resource would necessarily decrease, reducing the harvest. The oscillations observed when x is small are due to the delay . We plot the shape of the value function v for a fixed state (p, θ, ρ, y) in the plane of x s.t. (x, p, θ, ρ, y) ∈ D 0 and θ ∈ [0, ) for different values of . We can see that, when changing the delay time , the change point of the value function's monotony is also shifted: 1.6 for = 0.1379, 1.7 for = 0.2069, and 1.8 for = 0.2759. Indeed, as the figure shows, as the delay decreases, the change point of the monotony approaches zero and will likely disappear when there is no delay, leading to a perfect concave function. In fact, wasting time waiting to harvest because of the delay will lead the manager to skip the increasing period of the resource. The manager will start the real harvesting when the population is dropping due to the mean-reverting parameter λ; thus, the value function will decrease. We plot the optimal decision that the manager would make for a fixed state (p, θ, ρ, y) in the plane of x s.t. (x, p, θ, ρ, y) ∈ D 0 and θ ∈ [ , K]. As we can see, the optimal decision that the manager should make is to start harvesting if the resource x is over a given level; otherwise, he/she should stop and sell the harvest. This is due to the cost f , which penalizes him/her as long as the harvesting is ongoing. In fact, if the population is not large enough, he/she would not be able to cover his/her loss.   We plot the optimal decision that the manager would make for four fixed states (x, p, y) (P1, P2, P3, P4) in the plane of θ s.t. (x, p, θ, ρ, y) ∈ D 0 . The state P1 represents the case where x and y are both low; state P2 is for x low and y high, P3 for x high and y low, and final state P4 for x and y both high. Decision 1 stands for starting the harvest, and Decision 2 stands for stopping it. As we can see, the optimal decision that the manager should make in state P1 (resp. P2), knowing that he/she has already spent θ time since the starting decision, is to stop harvesting if θ ≤ θ 0 (resp. θ ≤ θ 1 ) where θ 0 0.34 (resp. θ 1 0.38). We can explain this as follows: on the one hand, in the case where θ ≤ θ 0 (resp. θ ≤ θ 1 ), due to the cost of harvesting and the fact that we are in state P1 (resp. P2) where the population is low, the manager prefers to immediately stop harvesting and sell the harvest; otherwise, he/she will likely lose money. On the other hand, if θ ≥ θ 0 (resp. θ ≥ θ 1 ), i.e., the manager has already given the order to harvest since a given period of time, it is optimal for him/her to harvest for the purpose of covering the cost due to the large time spent harvesting. We can also note that this last window of time is larger for state P1 in comparison with the one of state P2. Indeed, in state P2, the manager has already harvested more than in state P1, so he/she can stop harvesting sooner.
Concerning states P3 and P4, where the population is high, obviously, the optimal decision for the manager is to harvest at all times and stop harvesting when θ = K = 0.4828.  We plot the value function v for a fixed state (x, θ, ρ, y) in the plane of p s.t. (x, p, θ, ρ, y) ∈ D 1 and θ ∈ [0, m]. As expected, v is nondecreasing w.r.t. p. The more expensive the resource is, the more the manager takes benefits.  We plot the optimal decision that the manager would make for a fixed state (p, θ, ρ, y) in the plane of x s.t. (x, p, θ, ρ, y) ∈ D 2 and θ ∈ [δ, θ max ]. As we can see, the optimal decision that the manager should make, knowing that he/she has already sold the harvest, is to start harvesting over a certain level of x under the mean-reverting barrier λ so that the population grows enough to cover the harvesting costs and take benefits.

Conclusions
Our modeling takes into account the non-immediacy of both the harvest and its sale, described by the time delays and m. The optimum strategies commented on previously illustrate the effect of those delays on the manager's actions. In the classical literature, many studies suggesting optimal harvesting policies presume that the natural resource is immediately available, which is not the case in general. Consequently, the proposed policies would not be feasible and would lead to a sub-optimal use of the resource, in the best case scenario. The ecological and economic effects may thus be consequential.
For example, a modeling of fisheries that does not involve delays may lead to a harvesting strategy that would likely deplete the fish population, leading to extinction.
In fact, if the population is at a high level at the initial time, it is then likely to decline due to the logistic nature of the dynamics. As a result, if the time required to reach the harvest region is not taken into account, the best approach would be to harvest massively, thus causing a drastic degradation of the fish population.
We may make the same reasoning when it comes to selling the crop. Assume that we are in a position to sell our harvest immediately after harvesting and neglect the time required to return to land. In that case, if the price of fish falls, we would suffer losses, and we would not be able to amortize the costs of fishing.

Conflicts of Interest:
The authors declare no conflicts of interest.

Appendix A. Dynamic Programming Principle
We introduce some notations in this part to alleviate the proofs. We first denote by T the set of F-stopping time. For ω, ω ∈ Ω and t ≥ 0, we set: For any (z, Θ) ∈ Z × D and α ∈ A(Θ) we define Z z,α as the two-dimensional process (X x,α , P p ). For any t ≥ 0, we denote by Θ(t, α) the triple (θ t , ρ t , y t ) where θ t corresponds to the time from the last decision of the manager before t (this order can be an order to start a harvest or an order to stop a harvest), ρ t the time from which the manager has given the last order to harvest before t and y t is the harvested quantity until time t.
For τ ∈ T and α = (d i , s i ) i≥1 ∈ A(Θ), we define the shifted (random) strategy α τ by: Before proving the dynamic programming principle, we need the following results.

2.
Consistency of the gain function: Proof. These properties are direct consequences of the dynamics of Z z,α and of the definitions of J and A. We now turn to the proof of the dynamic programming principle. Unfortunately, we have not enough information on the value function v to directly prove these results. In particular, we do not know the measurability of v and this prevents us from computing expectations involving v as in the dynamic programming principle. We therefore provide weaker dynamic programing principles involving the envelopes v * and v * as in [15] where: We recall that in general, v * ≤ v ≤ v * . Since we get the continuity of v at the end, these results implies the dynamic programming principle.
Proposition A1. For any Θ ∈ D 0 and z ∈ Z, we have: For any Θ ∈ D 1 and z ∈ Z, we have: For any Θ ∈ D 2 and z ∈ Z, we have: Proof. Let i ∈ {0, 1, 2}, z ∈ Z and Θ ∈ D i , α ∈ A(Θ) and ϑ ∈ T . By definition of the value function v, for any ε > 0 and ω ∈ Ω, there exists α ε,ω = (s ε,ω We now define by concatenation the control strategyᾱ consisting of the impulse control components of α on [0, ϑ), and the impulse control components (ᾱ ε + ϑ) on [ϑ, ∞). More precisely, α is given by:ᾱ By definition of the shift given in (A1), we have: From Lemma A1 (ii) and the definition of the performance criterion we get the following equalities.

•
If z ∈ Z and Θ ∈ D 0 , then we have: • If z ∈ Z and Θ ∈ D 1 , then we have: • If z ∈ Z and Θ ∈ D 2 , then we have: Together with (A2), this implies if z ∈ Z and Θ ∈ D 0 , we have: If z ∈ Z and Θ ∈ D 1 , we have: If z ∈ Z and Θ ∈ D 2 , we have: Since ε, ϑ and α are arbitrarily chosen, we get the result.
If z ∈ Z and Θ ∈ D 2 , we get: since ϑ and α are arbitrary, we obtain the required inequality.
Sending n to ∞, we get the supersolution property from the mean value theorem.
By Theorem 1 we can find for each n ∈ N a control α i,n ∈ A(Θ n ) such that for all h n ∈ T : v(z n , Θ n ) ≤ E e −βh n v(Z z n ,α 0,n h n , Θ(h n , α 0,n ))1 h n <s 0,n 1 +n + e −β(s 0,n 1 +m y n +