A Constrained Markovian Diffusion Model for Controlling the Pollution Accumulation

This work presents a study of a finite-time horizon stochastic control problem with restrictions on both the reward and the cost functions. To this end, it uses standard dynamic programming techniques and an extension of the classic Lagrange multipliers approach. The coefficients considered here are allowed to be unbounded, and the strategies obtained are of non-stationary closed-loop type. The driving thread of the paper is a sequence of examples based on a pollution accumulation model, which is also used to present three algorithms for replicating the results. Along the way, the reader will find a result on the interchangeability of limits in a Dirichlet problem.


Introduction
The aim of pollution accumulation models is to study the management of some goods to be consumed by a society. It is generally accepted that such consumption generates two byproducts: a social utility, and pollution. The difference between the utility and the disutility associated with the pollution is known as social welfare. The theory developed in this work enables the decision maker to find a consumption policy that maximizes an expected social welfare for the society, subject to a constraint that may represent, for example, that some costs of cleaning the environment are not to exceed some given quantity over time.
This paper deals with the problem of finding optimal controllers and values for a class of diffusions with unbounded coefficients on a finite-time horizon under the total payoff criterion subject to restrictions. It uses standard dynamic programming tools, the Lagrange multipliers approach, and a result on the interchangeability of limits in a Bellman equation. The driving thread of the paper is a sequence of examples based on a pollution accumulation model, which is used to show how to replicate the theoretical results of the work.
The origin of the use of optimal control theory in the context of stochastic diffusions on a finite-time horizon can be traced back to the works of Howard (see [1]), Fleming (see, for instance, [2][3][4]), Kogan (see [5]), and Puterman (cf. [6]). However, the stochastic optimization problem with constraints was addressed only in the late 90s and early 2000s, when some financial applications demanded the consideration of these models, under the hypothesis that the coefficients of the diffusion itself, the reward function, and the restrictions are all bounded (see, for example, [7][8][9][10]). Constrained optimal control under the discounted and ergodic criteria was studied in the seminal paper of Borkar and Ghosh (see [11]), the work of Mendoza-Pérez, Jasso-Fuentes, Prieto-Rumeau and Hernández-Lerma (see [12,13]), and the paper by Jasso-Fuentes, Escobedo-Trujillo and Mendoza-Pérez [14]. In fact, these works serve as an inspiration to pursue an extension of their research to the realm of non-stationary strategies.
Although this is not the first time that the problem of pollution accumulation has been studied from the point of view of dynamic optimization (for example, [15] uses an LQ model to describe this phenomenon, [16] deals with the average payoff in a deterministic framework, [17,18] extend the approach of the former to a stochastic context, and [19] uses a stochastic differential game against nature to characterize the situation), this paper contributes to the state-of-the-art by adding constraints to the reward function, and by taking into consideration a finite-time horizon. Moreover, this work profits from this fact by proposing a simulation scheme to test its analytic results. However, it would not be possible to find a suitable Lagrange multiplier for such simulations without the results presented in Example 3, and Theorem 2, below.
The relevance of this work lies in the applicability of its analytic results on a finite-time interval. Unlike the models under infinite-time criteria (i.e., discounted and average payoffs, and the refinements of the latter), which focus on finding optimal controllers in the set of (Markovian) stationary strategies, the criterion at hand considers as well the more general set of (Markovian) non-stationary strategies. This fact implies that the functional form of the Bellman equation includes a time-dependent term, and that the feedback controllers will depend explicitly on the time argument. Since the coefficients of the diffusions involved in this study are assumed to be unbounded, all of the points in R^n will be attainable, and a verification result will be needed to ensure the existence of a solution to the Bellman equation that remains valid for all (t, x) in [0; T] × R^n, where T is the horizon.
Significance and contributions.
• This paper presents an application of two classic tools: the Lagrange multipliers approach, and Bellman optimization on a finite horizon for diffusions with possibly unbounded coefficients. This represents a major technical contribution with respect to the existing literature.
• This study illustrates its results by means of the full development and implementation of an example on the control of pollution accumulation. It also gives actual algorithms which can be used to replicate the results presented along its pages.
• This work lies within the framework of dynamic optimization. However, it considers a broader class of coefficients than, for instance, [15]. As in [16], it presents a pollution accumulation model, but it focuses on a stochastic context (as in [17,18]), with the difference that the present project does so on a finite-time horizon, and with restrictions on both the reward and the cost functions.
The rest of the paper is divided as follows. The next section gives the generalities of the model under consideration, i.e., the diffusion that drives the control problem, the total payoff criterion, the restrictions on the cost, and the control policies at hand. Example 1 introduces the pollution model. Section 3 deals with the actual (analytic and simulated) solution of the problem. Examples 2, 3, 4, Lemma 2, Theorem 2 and Example 5 illustrate the analytic technique and serve the purpose of comparing it with some numeric simulations. Finally, Section 4 is devoted to the presentation of the final remarks.
This section concludes by introducing some notation for spaces of real-valued functions on R^n. The space W^{ℓ,p}(R^n) stands for the Sobolev space consisting of all real-valued measurable functions h on R^n such that D^α h exists in the weak sense and belongs to L^p(R^n) for all |α| ≤ ℓ, where

D^α h := ∂^{|α|} h / (∂x_1^{α_1} · · · ∂x_n^{α_n}),

with α = (α_1, · · · , α_n) and |α| := α_1 + · · · + α_n. Moreover, C^κ(R^n) is the space of all real-valued continuous functions on R^n with continuous ℓ-th partial derivatives in x_i ∈ R, for i = 1, ..., n and ℓ = 0, 1, ..., κ. In particular, when κ = 0, C^0(R^n) stands for the space of real-valued continuous functions on R^n. Now, C^{κ,η}(R^n) is the subspace of C^κ(R^n) consisting of all those functions h such that D^α h satisfies a Hölder condition with exponent η ∈ ]0; 1] for all |α| ≤ κ; that is, there exists a constant K_0 such that

|D^α h(x) − D^α h(y)| ≤ K_0 |x − y|^η for all x, y ∈ R^n.

Preliminaries
This work studies a finite-horizon optimal control problem with restrictions. Concretely, let (Ω, F, {F_t : t ≥ 0}, P) be a filtered probability space supporting an F_t-adapted stochastic differential system of the form

dx(t) = b(x(t), u(t))dt + σ(x(t))dW(t), (1)

where b : R^n × U → R^n and σ : R^n → R^{n×d} are the drift and diffusion coefficients, respectively; and W(·) is a d-dimensional standard Brownian motion. Here, the set U ⊂ R^m is a Borel set called the action (or control) set. Moreover, let u(·) be a U-valued stochastic process representing the controller's action at each time t ≥ 0. Now, the profit that an agent can obtain from its activity in the system is measured with the performance index:

E^u_x [ ∫_t^T r(s, x(s), u(s))ds + r_1(x(T)) ], (2)

where r and r_1 are the running and terminal rewards, respectively; and the symbol E^u_x[·] stands for the conditional expectation of [·] given that x(t) = x and that the agent uses the sequence of controllers u.
The goal is to maximize (2) subject to a finite-horizon cost index of the operation:

E^u_x [ ∫_t^T c(s, x(s), u(s))ds + c_1(x(T)) ] ≤ E^u_x [ ∫_t^T θ(s, x(s))ds + θ_1(x(T)) ], (3)

where c is a running cost rate, c_1 is a terminal cost rate function, θ is a running constraint-rate function, and θ_1 is a terminal constraint-rate function. Observe that, while the running reward-rate function r depends on the action of the controller, the running constraint rate θ is independent of such variable.
The following is an assumption on the coefficients of the differential system (1).

Hypothesis (H1a). The control set U is compact.
Hypothesis (H1b). The drift coefficient b(x, u) is continuous on R^n × U, and x ↦ b(x, u) satisfies a local Lipschitz condition on R^n, uniformly on U; that is, for each R > 0, there exists a constant K_1(R) > 0 such that

|b(x, u) − b(y, u)| ≤ K_1(R)|x − y| for all |x|, |y| ≤ R and u ∈ U.

Hypothesis (H1c). The diffusion coefficient σ satisfies a local Lipschitz condition on R^n; that is, for each R > 0, there exists a constant K_2(R) > 0 such that

|σ(x) − σ(y)| ≤ K_2(R)|x − y| for all |x|, |y| ≤ R.

Hypothesis (H1d). The matrix a(x) := σ(x)σ(x)′ satisfies a uniform ellipticity condition; that is, there exists a positive constant K_3 such that

y′a(x)y ≥ K_3|y|^2 for all x, y ∈ R^n.

Remark 1.
The local Lipschitz conditions on the drift and diffusion coefficients referred to in Hypotheses (H1b)-(H1c), along with the compactness of the control set U stated in Hypothesis (H1a), yield that for each R > 0 there exists a number K_4(R) > 0 such that |b(x, u)| + |σ(x)| ≤ K_4(R) for all |x| ≤ R and u ∈ U.
For u ∈ U, and h(t, ·) ∈ W^{2,p}(R^n) for all t ≥ 0, define:

L^u h(t, x) := b(x, u) · ∇h(t, x) + (1/2)Tr[a(x)H(h)(t, x)], (4)

with a(·) as in Hypothesis (H1d), and ∇h, H(h) representing the gradient and the Hessian matrix of h with respect to the state variable x, respectively. The main application of this work is the pollution accumulation model. Although it will be possible to solve this problem within the realm of pure feedback strategies, this is not always the case. As a consequence, the set of actions needs to be widened.

Control Policies.
Let M be the family of measurable functions f : [0; T] × R^n → U. A strategy u(t) := f(t, x(t)), for some f ∈ M, is called a Markov policy.

Definition 1. Let (U, B(U)) be a measurable space, and let P(U) be the family of probability measures supported on U. A randomized policy is a family π := (π_t : t ≥ 0) of stochastic kernels on B(U) × R^n satisfying:
(a) for each t ≥ 0 and x ∈ R^n, π_t(·|x) ∈ P(U) is such that π_t(U|x) = 1, and for each D ∈ B(U), π_t(D|·) is a Borel function on R^n; and
(b) for each D ∈ B(U) and x ∈ R^n, the function π_t(D|x) is Borel-measurable in t ≥ 0.
The set of randomized policies is denoted by Π.
Observe that every f ∈ M can be identified with a strategy in Π by means of the P(U)-valued trajectory δ_f, where δ_f represents the Dirac measure at f. When the controller operates a policy π = (π_t : t ≥ 0) ∈ Π, both the drift coefficient b and the operator L^u, defined in (1) and (4), respectively, are written as

b(x, π_t) := ∫_U b(x, u)π_t(du|x), and L^{π_t} h(t, x) := ∫_U L^u h(t, x)π_t(du|x).

Under Hypotheses (H1a)-(H1d) and Remark 1, for each policy π ∈ Π there exists an almost surely unique strong solution x^π(·) of (1), which is a Markov-Feller process. Furthermore, for each policy π = (π_t : t ≥ 0) ∈ Π, the operator ∂_t ν(t, x) + L^{π_t} ν(t, x) becomes the infinitesimal generator of the dynamics (1) (for more details, see the arguments in [20] (Theorem 2.2.7)). Moreover, by the same reasoning as in Theorem 4.3 of [20], for each π ∈ Π, the associated probability measure P^π(t, x, ·) of x^π(·) is absolutely continuous with respect to Lebesgue's measure for every t ≥ 0 and x ∈ R^n. Hence, there exists a transition density function p^π(t, x, y) ≥ 0 such that

P^π(t, x, B) = ∫_B p^π(t, x, y)dy, for every Borel set B ⊂ R^n.
Topology of relaxed controls. The set Π is topologized as in [21]. Such a topology renders Π a compact metric space, and it is determined by the following convergence criterion (see [20][21][22]).

Definition 2 (Convergence criterion). It will be said that the sequence (π^m : m = 1, 2, ...) in Π converges to π ∈ Π, and such convergence is denoted as π^m →W π, if and only if

lim_{m→∞} ∫_0^T ∫_{R^n} g(t, x) ∫_U h(t, x, u)π^m_t(du|x) dx dt = ∫_0^T ∫_{R^n} g(t, x) ∫_U h(t, x, u)π_t(du|x) dx dt

for all g ∈ L^1([0; T] × R^n) and h ∈ C_b([0; T] × R^n × U), i.e., in the set of continuous and bounded functions on [0; T] × R^n × U.

Throughout this work, the convergence in Π is understood in the sense of the convergence criterion introduced in Definition 2.
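As a quick sanity check of Definition 2, the following Python sketch (an illustration under assumptions, not the paper's construction) pairs a bounded continuous test function with kernels that spread uniform mass on a shrinking interval around a point u_0; the pairings converge to the evaluation at u_0, i.e., to the pairing with the Dirac measure δ_{u_0}.

```python
import numpy as np

# pi^m puts uniform mass on [u0 - 1/m, u0 + 1/m]; as m grows it converges
# weakly to the Dirac measure at u0, so <pi^m, h> -> h(u0) for continuous h.
rng = np.random.default_rng(2)

def pairing(h, u_samples):
    """Monte Carlo approximation of the inner integral int_U h(u) pi_t(du|x)."""
    return float(np.mean(h(u_samples)))

u0 = 0.3
h = lambda u: u**2 + 1.0   # a bounded continuous test function on a compact U
vals = [pairing(h, rng.uniform(u0 - 1/m, u0 + 1/m, 200000)) for m in (1, 10, 100)]
# vals approaches h(u0) = 1.09 as m grows
```

The same computation with any other bounded continuous h exhibits the same convergence, which is the content of the criterion.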
The following Definition is this work's version of the polynomial growth condition quoted in, for instance [18].

Definition 3.
Given a polynomial function of the form w(x) = 1 + |x|^k (with k ≥ 2), and x ∈ R^n, let the normed linear space B_w([0; T] × R^n) be that which consists of all real-valued measurable functions ν on [0; T] × R^n with finite w-norm, given by

‖ν‖_w := sup_{(t,x) ∈ [0;T] × R^n} |ν(t, x)| / w(x).
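The w-norm can be approximated numerically on a grid. The sketch below is illustrative only: the test function ν, the truncation of R^n to a finite window, and the grid resolution are all assumptions made for the example.

```python
import numpy as np

def w_norm(nu, k, ts, xs):
    """Approximate sup_{(t,x)} |nu(t,x)| / (1 + |x|^k) over a finite grid."""
    T, X = np.meshgrid(ts, xs, indexing="ij")
    return float(np.max(np.abs(nu(T, X)) / (1.0 + np.abs(X) ** k)))

ts = np.linspace(0.0, 1.0, 11)
xs = np.linspace(-100.0, 100.0, 4001)
norm = w_norm(lambda t, x: t * x**2, k=2, ts=ts, xs=xs)
# nu(t, x) = t*x^2 lies in B_w, since |t*x^2| <= 1 * (1 + x^2); its w-norm is 1
```

The grid value approaches the exact supremum (here, 1) from below as the window widens, since x^2/(1 + x^2) increases toward 1 as |x| grows.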

Remark 2.
(a) Observe that for any function ν ∈ B_w([0; T] × R^n),

|ν(t, x)| ≤ ‖ν‖_w w(x) = ‖ν‖_w (1 + |x|^k) for all (t, x) ∈ [0; T] × R^n.

This last inequality implies that any function ν ∈ B_w([0; T] × R^n) satisfies the polynomial growth condition.
(b) Assume that the initial data x(s) = x has finite absolute moments of every order. Then E|x(t)|^k ≤ C_k(1 + E|x(s)|^k), where the constant C_k depends on k, T − s, and the constant K_1 from Hypothesis (H1b).
(c) In the application developed throughout this paper, the constant initial data x(s) = x is considered. Then E|x(t)|^k also has finite moments of every order (see Proposition 10.2.2 in [18]). Therefore, E|x(t)|^k ≤ C_k(1 + |x|^k).
Now, hypotheses on the reward, cost and constraint rates from (2) and (3) are stated. These are very standard, and represent an extension of the ones used in classic works, such as p. 157 in [23] (Chapter VI.3) and p. 130 in [24] (Chapter 3).
Hypothesis (H2a). The functions r, c : [0; T] × R^n × U → R are continuous, and locally Lipschitz on R^n, uniformly on U; that is, for each R > 0, there exists a constant K_5(R) > 0 such that

|r(t, x, u) − r(t, y, u)| + |c(t, x, u) − c(t, y, u)| ≤ K_5(R)|x − y| for all |x|, |y| ≤ R and u ∈ U.

Hypothesis (H2b). r(·, ·, u) and c(·, ·, u) belong to B_w([0; T] × R^n), uniformly on u ∈ U; that is, there exists a constant M > 0 such that |r(t, x, u)| + |c(t, x, u)| ≤ Mw(x) for all (t, x, u).

Hypothesis (H2c). The terminal reward and cost rates r_1(·, ·), c_1(·, ·) ∈ B_w([0; T] × R^n); and the running and terminal constraint rates θ(·, ·), θ_1(·, ·) ∈ B_w([0; T] × R^n) as well.

For π = (π_t : t ≥ 0) ∈ Π, the reward and cost rates are written as

r(t, x, π_t) := ∫_U r(t, x, u)π_t(du|x), and c(t, x, π_t) := ∫_U c(t, x, u)π_t(du|x). (6)

To complete this section, the main application of this work is introduced. It consists of a pollution accumulation model. This application is inspired by the one presented in [17,18], and satisfies Hypotheses (H1a)-(H1d) and (H2a)-(H2c).

Example 1. Fix the probability space (Ω, F, {F_t : t ≥ 0}, P), and let T > 0 be a given time horizon. Consider the pollution process defined by the controlled diffusion

dx(s) = (u(s) − ηx(s))ds + σdW(s), x(t) = x. (7)

Here, u(s) represents the consumption flow at time s ≥ 0, and γ is a certain consumption restriction imposed by, for instance, worldwide protocols. Additionally, the number η ∈ ]0; 1] is the rate of pollution decay.
It is easy to see that the coefficients of (7) meet Hypotheses (H1a)-(H1c). A simple calculation yields that K_3 ≥ σ^2 − c for any c ∈ ]0; σ^2[. Now, a simulation of the trajectories of the Itô diffusion (1) is presented. To this end, the extension of Euler's method for solving first-order differential equations known as the Euler-Maruyama method (see, for instance, [25] and Chapter 1 in [26]) is used. This technique is suitable for diffusions that meet Hypotheses (H1a)-(H1d). The focus is on the comparison between Vasicek's model for interest rates in finance (see, for instance, Chapter 5 in [27]):

dx(s) = (µ − ηx(s))ds + σdW(s), x(t) = x, (8)

with s ∈ [t; T], and the Kawaguchi-Morimoto model (7).
Let z^N : {0, 1, ..., N} × Ω → R^n, N ∈ N, be the Euler-Maruyama approximations for the stochastic differential Equation (1), recursively defined by z^N_0 := x and

z^N_{k+1} := z^N_k + b(z^N_k, u_k)Δ + σ(z^N_k)ΔW_k, k = 0, 1, ..., N − 1,

where Δ := T/N and ΔW_k := W((k + 1)Δ) − W(kΔ). In Figures 1 and 2, observe that the Kawaguchi-Morimoto process allows one to choose a deterministic (implicit) function of t, whereas Vasicek's series features what is known in the literature as mean reversion. The latter fact is clear from the choice of a constant parameter µ.
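The recursion above can be sketched as follows. This is a hedged illustration, not the paper's exact code: the parameter values are the ones quoted later for the pollution model (x_0 = 5, η = 1, σ = 0.5, µ = 5, T = 1, N = 100), and the constant policy u ≡ µ turns (7) into the mean-reverting process (8).

```python
import numpy as np

def euler_maruyama(x0, drift, sigma, T, N, rng):
    """One path of dx = drift(x) dt + sigma dW on [0, T] with N uniform steps."""
    dt = T / N
    path = np.empty(N + 1)
    path[0] = x0
    for k in range(N):
        dW = rng.normal(0.0, np.sqrt(dt))      # increment Delta W_k ~ N(0, dt)
        path[k + 1] = path[k] + drift(path[k]) * dt + sigma * dW
    return path

rng = np.random.default_rng(0)
eta, sigma, mu = 1.0, 0.5, 5.0
# Constant consumption u(s) = mu: the dynamics (7) become the Vasicek-type
# process (8), dx = (mu - eta*x) ds + sigma dW, which reverts to mu/eta.
path = euler_maruyama(5.0, lambda x: mu - eta * x, sigma, T=1.0, N=100, rng=rng)
```

Plotting `path` against a uniform time grid reproduces the qualitative behavior discussed for Figures 1 and 2: the trajectory fluctuates around the mean-reversion level µ/η = 5.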
Let h ∈ W^{1,2;p}([0; T] × R). After (4), the infinitesimal generator of (7) is given by

∂_t h(t, x) + L^u h(t, x) = ∂_t h(t, x) + (u − ηx)∂_x h(t, x) + (σ^2/2)∂_xx h(t, x).

The polynomial function w(x) = x^2 + x + 1 satisfies Definition 3. Please note that this function does not depend on the time argument t. The reward-rate function used in further developments represents the social welfare; it is given by r : [0; T] × R × U → R, defined as

r(t, x, u) := F(u) − a · x, (9)

where F ∈ C^2(R) stands for the social utility of the consumption u, and a · x stands for the social disutility (so to speak) of the pollution stock x, for a > 0 fixed. It is assumed that

F′ ≥ 0, F″ < 0, and F′(∞) = 0. (10)

The cost rate function will be given by

c(t, x, u) := c_1 u + c_2 x, (11)

with c_1 > 0, and c_2 ∈ R satisfying

c_1 + ηc_2 > 0. (12)

Since the pollution stock x depends on the time variable t ≥ 0, the functions defined in (9) and (11) also depend on this variable.
The running constraint-rate function θ depends on a positive constant q. (Here, as with the reward and cost functions, it is assumed that x implicitly depends on t.) The terminal constraint, cost and reward rates will be fixed at a level of zero. It is not difficult to see that if F meets Hypotheses (H2a)-(H2c), then so do the social welfare, the cost rate and the running constraint functions.

A Finite-Horizon Control Problem with Constraints
This section introduces the study of the finite-horizon problem with constraints.

Definition 4.
For each π ∈ Π and T ≥ t, the total expected reward, cost and constraint rates over the time interval [t; T], given that x(t) = x, are, respectively,

J_T(t, x, π, r) := E^π_x [ ∫_t^T r(s, x(s), π_s)ds + r_1(x(T)) ],
J_T(t, x, π, c) := E^π_x [ ∫_t^T c(s, x(s), π_s)ds + c_1(x(T)) ],
θ_T(t, x, π) := E^π_x [ ∫_t^T θ(s, x(s))ds + θ_1(x(T)) ],

with r(s, x(s), π_s) and c(s, x(s), π_s) as in (6).
The proof of the next result is an extension of [28] (Proposition 3.6).

Lemma 1. Hypotheses (H2a)-(H2c) imply that the total expected reward J_T(t, x, π, r), the total expected cost J_T(t, x, π, c), and the constraint rate θ_T(t, x, π) have finite w-norm, uniformly on π ∈ Π.

Proof of Lemma 1. The proof is presented only for J_T(t, x, π, r), for the line of reasoning is the same for J_T(t, x, π, c) and θ_T(t, x, π). By Hypothesis (H2b), it is known that |r(s, x(s), π_s)| ≤ Mw(x(s)) for every s ∈ [t; T], and the claim follows from Remark 2.

For each T > 0, and x ∈ R^n, assume that the (running and terminal) constraint functions θ(·, ·) and θ_1(·, ·) are given, and that they satisfy Hypothesis (H2c). In this way, define the set of feasible policies

F^{t,x}_{θ_T} := {π ∈ Π : J_T(t, x, π, c) ≤ θ_T(t, x, π)}.

To avoid trivial situations, it is assumed that this set is not empty (see Remark 3.8 in [14]). To formally introduce what is meant when talking about the maximization of (2) subject to (3), the finite-horizon problem with constraints is defined.

Definition 5.
A policy π* ∈ Π is said to be optimal for the finite-horizon problem with constraints (FHPC) with initial state x ∈ R^n if π* ∈ F^{t,x}_{θ_T} and, in addition,

J_T(t, x, π*, r) ≥ J_T(t, x, π, r) for all π ∈ F^{t,x}_{θ_T}.

In this case, J*_T(t, x, r) := J_T(t, x, π*, r) is called the T-optimal reward for the FHPC.

Lagrange Multipliers
To solve the FHPC, the Lagrange multipliers approach and the dynamic programming technique are used to transform the original FHPC into an unconstrained finite-horizon problem, parametrized by the so-named Lagrange multipliers. To do this, take λ ≤ 0 and consider the new (running and terminal) reward rates

r_λ(t, x, u) := r(t, x, u) + λ(c(t, x, u) − θ(t, x)), and r_{λ,1}(x) := r_1(x) + λ(c_1(x) − θ_1(x)).
It is natural to let, for all (t, x) ∈ [0; T] × R^n,

J_T(t, x, π, r_λ) := E^π_x [ ∫_t^T r_λ(s, x(s), π_s)ds + r_{λ,1}(x(T)) ].

Example 3 (Examples 1 and 2 continued). The performance index for the FHUP is given by

J_T(t, x, π, r_λ) = E^π_x [ ∫_t^T r_λ(s, x(s), π_s)ds ], (20)

since the terminal rates of Example 1 are fixed at zero. Return now to Example 1, where a single trajectory of the processes (7) and (8) was simulated for certain parameters, under the policy u(t) = x(t) for (7), and u(t) = µ for (8). The aim is to compute (20) for a fixed value of λ < 0, when the utility function derived from the consumption is given by F(u) = √u, by means of Monte Carlo simulation. To this end, the following pseudocodes are presented.
Walkthrough of Algorithm 1. This pseudocode's goal is to compute the integral inside (20).
• Line 1 initializes the process.
• Line 2 emphasizes the fact that λ < 0 is supposed to be given.
• In lines 3-11, the algorithm decides if it will work with (7), or with (8).
• Line 12 sets F = √u and D = a · x, and computes initial values for r, c and θ according to (9), (11) and (13).
• The main loop advances the state with the Euler-Maruyama step x ← x + (u − ηx)dt + σdW, updates the running time j ← j + dt, and accumulates the running reward; on exit, it sets I ← I · dt and returns I.

Walkthrough of Algorithm 2. The purpose of this pseudocode is to compute a 95% confidence interval for the expectation of the result of Algorithm 1 according to Monte Carlo's method.

Algorithm 1 receives the initial value x_0, the step size dt, the time horizon T, and the parameters of the diffusion (7) (resp. (8)) to calculate the (Itô) integral inside the expectation operator in (20) when the process (7) (resp. (8)) is used; then, Algorithm 2 iterates this process and returns the average of such iterations, thus approximating the value of (20). These algorithms require a negative and constant value of the Lagrange multiplier. Later, in Example 5, a modification of Algorithm 1 that solves this situation will be proposed. For the sake of illustration, take the parameter values from Example 1 (that is, x_0 = 5, η = 1, σ(x) ≡ 0.5, µ = 5, T = 1, and N = 100), and use Algorithms 1 and 2 to compute an approximation to the value of (20) when one considers the diffusion (8) (that is, the diffusion (7) with u(t) ≡ µ for all t ≥ 0). Additionally, take γ = 0.4, c_1 = 0.1, c_2 = 0.05, q = 0.0195, and a = 1.25.
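The two walkthroughs can be sketched in Python as follows. This is a hedged reconstruction, not the paper's exact Algorithms 1 and 2: the integrand uses the social welfare of Example 1 with F(u) = √u, while the linear forms taken below for the cost rate (c_1·u + c_2·x) and the running constraint rate (q·x) are illustrative stand-ins for (11) and (13), and λ is fixed and negative as the algorithms require.

```python
import numpy as np

def pathwise_integral(x0, u, T, N, eta, sigma, r_lambda, rng):
    """Algorithm-1-style step: simulate one Euler-Maruyama path of (7) under a
    constant consumption u and accumulate int_t^T r_lambda(x(s), u) ds."""
    dt = T / N
    x, I = x0, 0.0
    for _ in range(N):
        I += r_lambda(x, u) * dt          # left-endpoint quadrature of (20)
        x += (u - eta * x) * dt + sigma * rng.normal(0.0, np.sqrt(dt))
    return I

def monte_carlo_ci(n_rep, seed=0, **kwargs):
    """Algorithm-2-style step: average n_rep replications of Algorithm 1 and
    return a 95% confidence interval for the expectation in (20)."""
    rng = np.random.default_rng(seed)
    vals = np.array([pathwise_integral(rng=rng, **kwargs) for _ in range(n_rep)])
    half = 1.96 * vals.std(ddof=1) / np.sqrt(n_rep)
    return vals.mean() - half, vals.mean() + half

# Illustrative parameters from Example 1; the cost and constraint forms are
# assumptions for this sketch, not the paper's exact (11) and (13).
a, c1, c2, q, lam = 1.25, 0.1, 0.05, 0.0195, -0.5
r_lam = lambda x, u: (np.sqrt(u) - a * x) + lam * ((c1 * u + c2 * x) - q * x)
lo, hi = monte_carlo_ci(n_rep=1000, x0=5.0, u=5.0, T=1.0, N=100,
                        eta=1.0, sigma=0.5, r_lambda=r_lam)
```

The interval (lo, hi) narrows at the usual Monte Carlo rate 1/√n_rep, so the number of replications directly controls the precision of the approximation to (20).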
Notice that Proposition 1 does not assert the existence of a function that satisfies (21) (this is the purpose of Proposition 2 below). It rather motivates the definition of the finite-horizon unconstrained problem.

Definition 6. A policy π* ∈ Π for which

J_T(t, x, π*, r_λ) = sup_{π ∈ Π} J_T(t, x, π, r_λ) =: J*_T(t, x, r_λ)

is called finite-horizon optimal for the finite-horizon unconstrained problem (FHUP), and J*_T(·, ·, r_λ) is referred to as the finite-horizon optimal reward for the FHUP.
Use the former result to introduce the HJB equation for the FHUP for the examples presented along the paper.

Example 4 (Examples 1-3 continued). The HJB equation for the FHUP is given by:

∂_t h(t, x) + sup_{u ∈ U} { (u − ηx)∂_x h(t, x) + r_λ(t, x, u) } + (σ^2/2)∂_xx h(t, x) = 0, with h(T, x) = 0, (25)

where h ∈ C^{1,2}([0; T] × R). According to Proposition 2, a solution of the HJB Equation (25) yields the finite-horizon optimal reward J*_T(t, x, r_λ) and the optimal policy π* for the FHUP over the interval [t; T]. Now use Definition 6 and Propositions 1 and 2 to set expressions for the optimal performance index, policies, and constraint rates for the examples presented along this work.

Lemma 2 (Examples 1-4 continued). Let Λ and I be Lebesgue's measure and the indicator function, respectively. Consider the planning horizon [t; T] and assume that the conditions in (7), (9)-(13) hold. Then:
(i) For every x > 0 and λ ≤ 0, the value function J*_T(t, x, r_λ) in (23) becomes the expression given in (26)-(28), and the Markov policy f_λ in (29) is optimal.
(ii) For every x > 0 and λ ≤ 0, the total expected reward, cost and constraint, respectively J_T(t, x, f_λ(t), r), J_T(t, x, f_λ(t), c), and θ_T(t, x, f_λ(t)), defined in Example 2, take the forms given in (30)-(32).

Proof of Lemma 2.
(i) Start by making an informed guess of the solution of (25), namely

h(t, x) := p(t)x + m_2(t). (33)

Observe that h_t(t, x) = p′(t)x + m_2′(t), h_x(t, x) = p(t), and h_xx(t, x) = 0. The substitution of these expressions in (25) yields a relation that must hold for all x; this means that (34) holds. Impose the terminal condition p(T) = 0 on (34) to obtain (35), where k_1 is as in (27). Now, from (35), write (36). To find the supremum of the expression inside the braces, use a standard calculus argument to see that, at a critical point u, (37) holds. Next, since, by (10), F′(u) ≥ 0, the critical point is admissible. Then, from (37), obtain (38). With this in mind, (36) turns into an explicit expression in which Λ(·) stands for Lebesgue's measure. Therefore, from (33), obtain (26)-(28). The optimality of (29) for the FHUP (20) follows from Proposition 2(ii).
(ii) To see that (30) holds, use (17) to write the iterated integral; the interchange of integrals is possible due to the finiteness of the interval [t; T], and Fubini's theorem. Now, the solution of the controlled diffusion process (7) is given by the usual variation-of-constants formula with x(t_0) = x, and its expected value can be computed in closed form. By (29), the former equals the right-hand side of (30). To prove (31), use the two leftmost members of (18), and proceed as above. Finally, by the two rightmost members of (18), obtain (32). The proof is now complete.
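The calculus step that locates the critical point can be checked numerically for the utility F(u) = √u used throughout the examples. Here k is a hypothetical stand-in for the time-dependent coefficient that multiplies u inside the braces in (36); the first-order condition F′(u) = 1/(2√u) = k gives the interior maximizer u* = 1/(4k²).

```python
import numpy as np

def sqrt_utility_maximizer(k):
    """Interior critical point of u -> sqrt(u) - k*u for k > 0:
    F'(u) = 1/(2*sqrt(u)) = k  =>  u* = 1 / (4*k**2)."""
    return 1.0 / (4.0 * k**2)

k = 0.8          # hypothetical coefficient; in the proof it varies with t
u_star = sqrt_utility_maximizer(k)
grid = np.linspace(1e-9, 5.0, 200001)
u_grid = float(grid[np.argmax(np.sqrt(grid) - k * grid)])
# the grid search agrees with the closed form u* = 0.390625
```

Because F is strictly concave, the critical point is the unique maximizer whenever it lies in the interior of the control set, which is the case the proof exploits.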

From an Unconstrained Problem, to a Problem with Restrictions
This section starts with an important observation on the set of strategies which will be used.
Define

Π^λ := {π ∈ Π : J_T(t, x, π, r_λ) = J*_T(t, x, r_λ) for all (t, x) ∈ [0; T] × R^n; and J*_T(T, x(T), r_λ) = r_{λ,1}(x(T))}.

Since M can be thought of as a subset of Π, Proposition 2(ii) ensures that the set Π^λ is nonempty.

Lemma 3. Let (λ_m) be a sequence in ]−∞; 0] converging to some λ* ≤ 0, and assume that, for each m ≥ 1, there exists a policy π^{λ_m} ∈ Π^{λ_m} such that the sequence (π^{λ_m}) converges to a policy π ∈ Π. Then π ∈ Π^{λ*}; that is, π satisfies the two conditions that define Π^{λ*}.

Proof of Lemma 3. Recall Definition 2. Take an arbitrary sequence (π^m) ⊂ Π^{λ_m} such that π^m →W π. Observe that Proposition 2 ensures that, for each m ≥ 1, J_T(t, x, r_{λ_m}) satisfies the corresponding HJB equation. In terms of the operator L̃^{π^m_t}_{λ_m}, defined in (A4), the former relation reduces to a linear equation in J*_T(·, ·, r_{λ_m}), with λ_m constant. A verification that the hypotheses of Theorem A1 in Appendix A hold follows. Specifically, part (a) trivially follows from (39). Then, the focus will be on checking that part (b) of Theorem A1 is met. To do that, for some R > 0, take the ball B_R := {x ∈ R^n : |x| < R}. By [30] (Theorem 9.11), there exists a constant C_0 (depending on R) such that, for a fixed p > n, the W^{1,2;p}-norm of J*_T(·, ·, r_{λ_m}) over [0; T] × B_R is bounded in terms of |B_2R|, the volume of the closed ball with radius 2R, and the constants M and M_2(x, T, t) in Hypothesis (H2b) and in (14), respectively.
Notice that conditions (c) to (f) from Theorem A1 trivially hold, and that condition (g) is given as a part of the hypotheses just presented. Then, one can claim the existence of a function h_{λ*} ∈ W^{1,2;p}([0; T] × B_R), together with a subsequence (m_k), such that J*_T(·, ·, r_{λ_{m_k}}) = J_T(·, ·, π^{m_k}, r_{λ_{m_k}}) → h_{λ*}(·, ·) uniformly on [0; T] × B_R and pointwise on [0; T] × R^n as k → ∞ and π^m →W π. Furthermore, h_{λ*} satisfies the limiting HJB equation. Since the radius R > 0 was arbitrary, one can extend the analysis to all x ∈ R^n. Thus, Proposition 1 asserts that h_{λ*}(t, x) coincides with J*_T(t, x, r_{λ*}). This proves the result.
Lemma 3 gives, in particular, the continuity of the mapping π_t ↦ J_T(t, x, π_t, r_λ).

Remark 5.
If the opposite condition to (54) occurs, then the existence of a critical point of the mapping λ ↦ J*_T(t, z, r_λ) necessarily implies a degenerate situation: every λ ≤ 0 is a critical point of λ ↦ J*_T(t, z, r_λ), and f_λ(t) = γ is an optimal policy for the FHPC. To avoid this trivial situation, under the fact that F′(∞) = 0, choose γ large enough such that (54) holds.
Now use Theorem 2 to propose a modification of Algorithm 1 to compute the integral inside (20). Observe that it is no longer needed to include the computation of the Vasicek process (8), because the optimal values of the controllers f_{λ*} (given by (57)) and the Lagrange multipliers λ*_t (given by (60)) are non-stationary along time.
Example 5 (Examples 1-4, Lemma 2, and Theorem 2 continued). Algorithms 2 and 3 can be used to compare the Monte Carlo simulations for the integral inside the expectation operator in (20) with the results (formula (58)) from Theorem 2. To this end, recall from Example 1 the choice made for the parameters of (7) (that is: x_0 = 5, σ(x) ≡ 0.5, η = 1 and T = 1). In addition, choose constants that meet (12): these are a = 1.25, γ = 1, c_1 = 0.1, c_2 = 0.05, and q = 0.0195. With this configuration, condition (54) holds for all t ∈ [0; 1] with an error of, at most, 0.004 (see Figure 3). With all this in mind, formula (58) in Theorem 2 yields the optimal value for the FHPC.

Concluding Remarks
This work showed how to find the optimal controllers (and objective function) for a finite-time horizon under constraints. Moreover, it used the tools presented in [25], and the Monte Carlo simulation technique, to test its analytic findings. This represents a major implication of this work concerning the current methodology for resource management and consumption when pollution has an active role. Indeed, the model presented along this paper can be used for the purpose of decision-making when the social welfare, and the cost and reward constraints, are known and parametrized. A plausible extension of this paper could be related to looking for optimal controllers on a random horizon with a constrained performance index, in the fashion of [31].

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Technical Complements
In this appendix, an extension of Theorem 5.1 from [32] to the non-stationary case, with only one controller, on a finite horizon is introduced.
Theorem A1. Let a bounded C^2-class domain in R^n be given, and suppose that Hypotheses 1 and 2 hold. Moreover, assume the existence of sequences (h_m) ⊂ W^{1,2;p}([0; T] × R^n) satisfying conditions (a)-(g), among which: (e) λ_m converges uniformly to some function λ.
Since t ∈ [0; T], one could remove the time argument from the latter expression by merely substituting the constants M and M_1 by other constants; to keep the notation as straightforward as possible, this will not be done. Now, Hypothesis (H1b) gives the existence of a constant K_1(R^n) such that |b(x, π)| ≤ K_1(R^n). Moreover, there also exists a positive constant k([0; T] × R^n) such that |v_1(t, x, π)| + |v_3(t, x, π)| ≤ k([0; T] × R^n).
Take all of these facts, and observe that: