Sustainable Optimal Control for Switched Pollution-Control Problem with Random Duration

Considering the uncertainty of game duration and periodic seasonal fluctuation, an n-player switched pollution-control differential game is modeled to investigate a sustainable and adaptive strategy for players. Based on the randomness of game duration, two scenarios are considered in this study. In the first case, the game duration is a random variable, Tf, described by the shifted exponential distribution. In the second case, we assumed that players’ equipment is heterogeneous, and the i-th player’s equipment failure time, Tfi, is described according to the shifted exponential distribution. The game continues until a player’s equipment breaks down. Thus, the game duration is defined as Tf=min{Tf1,…,Tfn}. To achieve the goal of sustainable development, an environmentally sustainable strategy and its corresponding condition are defined. By using Pontryagin’s maximum principle, a unique control solution is obtained in the form of a hybrid limit cycle, the state variable converges to a stable hybrid limit cycle, and the total payoff of all players increases and then converges. The results indicate that the environmentally sustainable strategy in the n-player pollution-control cooperative differential game with switches and random duration is a unique strategy that not only ensures profit growth but also considers environmental protection.


Introduction
Practical ecological, economic, and engineering problems often involve switching phenomena [1][2][3]. All systems involving logical decision-making and continuous (smooth) dynamics, such as robot systems [4], chemical control processes [5], etc., can be transformed into hybrid systems with multiple regimes of dynamics.
Therefore, hybrid dynamic systems have gained considerable research interest in the environmental, economic, and engineering fields. In addition, switched systems, in the form of time-driven and state-driven switches, are increasingly common. Recent contributions to those fields include [6][7][8][9][10]. In [9,11], an optimal solution in the form of a hybrid limit cycle (HLC) was introduced as the best possible candidate for the infinite-horizon optimization problem. However, the results concerned only the optimal control of a single agent and did not explore the optimal control strategy of multiple players.
In addition, in contrast to the general pollution-control problems with deterministic terminal time [12,13] or with infinite horizon [10,14], the randomness of the game duration cannot be ignored, because the game may end abruptly. The reasons behind this can be an equipment breakdown, an economic failure, or a natural disaster, among many others [15]. In [15,16], differential pollution-control games with random duration were thoroughly analyzed. However, the impact of seasonal fluctuation on the system was not considered.
Thus, by combining the two aforementioned research directions, an n-player cooperative differential game was explored for pollution control. The game involves infinite time-driven switches and encompasses random game duration.
The contributions of this paper are summarized as follows:

1. A novel model is proposed to address challenges within the context of an n-player cooperative differential game for pollution control with time-driven switches and random duration. The time-driven switches within the system are described by a periodic piecewise-constant function. Taking into account the randomness of game duration and the players' equipment warranty period, the finite-horizon optimal control problem is reformulated as an infinite-horizon optimal control problem, in which the game duration is modeled considering two scenarios based on shifted exponential distributions. The proposed model introduces innovative concepts and refines previously established methodologies, aiming to enhance its adaptability to real-world scenarios and yield more practical outcomes.

2. In addition, this study proves the sustainability and uniqueness of the environmentally sustainable solution for the proposed model.

3. Solving the optimal pollution-control problem with time-driven switches and random duration: We employed Pontryagin's maximum principle and thoroughly analyzed the adjoint variable dynamics to derive the shifted equilibrium value, resulting in a unique environmentally sustainable solution in the form of an HLC, which is environmentally friendly and guarantees the profit of all players.
The obtained results can be applied to the optimization problems of switched systems with periodic switching signals, such as the formation control problem [17] and capital investment problem [8]. Additionally, they can provide valuable insights on how agents can effectively adapt to a rapidly changing and evolving environment, allowing them to reap the benefits.
The remainder of this paper is structured as follows. In Section 2, the model of the n-player pollution-control differential game with time-driven switches and random duration is formulated. In addition, considering different types of game durations, two cases are studied. Section 3 discusses the cooperative game with identical shifted exponential distribution, provides the definition of the environmentally sustainable control, and determines the equilibrium value of the adjoint variable that is used to derive the unique solution. In Section 4, we discuss a cooperative game with different shifted exponential distributions and obtain the corresponding equilibrium value and unique solution. Section 5 provides an illustration of a pollution-control differential game involving two players, and optimal solutions for both scenarios are demonstrated numerically.

The Problem Statement
All notations used throughout the paper are summarized in Appendix A, Table A1. In this study, we consider the optimal lake-pollution-control game model based on [15,16,18]. The game involves n players (factories). Each player i manages his/her emissions policy toward the lake, such that the dynamics of the pollution stock are governed by the linear differential equation with the following initial condition:

$$\dot{z}(t) = \sum_{i=1}^{n} \xi_i v_i(t) - \delta(t) z(t), \quad z(0) = z_0, \tag{1}$$

where $z$ is the stock of pollution within a fixed natural reservoir (e.g., a lake), $v_i(t) \in [0, b_i]$, $i = 1, \dots, n$, denotes the emissions rate of player $i$, $b_i$ is the maximal admissible emissions rate of each player, $\xi_i \in (0, 1)$ is the fraction of the emitted pollutants accumulated in the reservoir from each player (e.g., factory), and $\delta(t)$ is the self-cleaning rate of the reservoir. Furthermore, we have $z_0 \ge 0$, and the state $z(t)$ is non-negative for all $t \ge 0$.

The self-cleaning rate of lakes is widely acknowledged to vary throughout the year. This could be due to the impact of various factors, including temperature and light fluctuations, during specific periods (e.g., over the span of a year). Considering the impact of external seasonal changes on the lake, it is reasonable to postulate that the self-cleaning rate of the lake is not a fixed constant but varies as a function of time. Thus, we further assumed that the self-cleaning rate of the reservoir, $\delta(t)$, is represented by a periodic piecewise-constant function, which is defined as follows:

$$\delta(t) = \begin{cases} \delta_1, & t \in [kT, (k + \tau)T), \\ \delta_2, & t \in [(k + \tau)T, (k + 1)T). \end{cases} \tag{2}$$

The whole time duration $D = [0, T_f]$ is divided into equal periods of length $T$, each of which is subdivided into two parts: $[kT, (k + \tau)T]$ and $[(k + \tau)T, (k + 1)T]$, where $\tau \in (0, 1)$ is the switching ratio and $k \in \mathbb{N}_0$. When the system is in the first subperiod, $\delta(t) = \delta_1$, whereas in the second subperiod, $\delta(t) = \delta_2$, with $\delta_1 \neq \delta_2$.
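The effect of the seasonal switch on the pollution stock can be illustrated numerically. The sketch below assumes the linear dynamics dz/dt = Σ ξ_i v_i − δ(t) z with a constant total inflow; the function names, the forward-Euler scheme, and all numerical values are illustrative assumptions rather than settings taken from the paper.

```python
def delta(t, T=1.0, tau=0.5, delta1=0.9, delta2=0.45):
    """Periodic piecewise-constant self-cleaning rate (illustrative values)."""
    phase = t % T  # position inside the current period
    return delta1 if phase < tau * T else delta2


def simulate_stock(z0, inflow, t_end, dt=1e-3):
    """Forward-Euler integration of dz/dt = inflow - delta(t) * z.

    `inflow` stands for sum_i xi_i * v_i and is held constant here,
    which is a simplifying assumption made only for this sketch.
    """
    z = z0
    steps = int(round(t_end / dt))
    for k in range(steps):
        z += dt * (inflow - delta(k * dt) * z)
    return z


if __name__ == "__main__":
    # With a constant inflow the stock settles between inflow/delta1
    # and inflow/delta2 and oscillates with the period of delta(t).
    print(simulate_stock(z0=0.0, inflow=1.0, t_end=50.0))
```

With a constant inflow, the stock stays between the two mode-wise steady states inflow/δ_1 and inflow/δ_2, which is the simplest instance of the periodic behavior discussed later in the paper.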
Note that production is usually assumed to be linearly related to emissions. Therefore, the revenue function can be expressed in terms of emissions [18]. The revenue function of each player, $R_i(v_i)$, is strictly concave. The marginal revenue decreases with the increasing emissions rate of each player, $v_i(t) \in [0, b_i]$. However, zero emissions (production) is unprofitable. Each player incurs a damage cost, $C_i(z)$, for mitigating their emissions at moment $t$, and this cost function is increasing and convex. Thus, a quadratic revenue functional [19] and a linear cost functional of the form $C_i(z) = q_i z$ are commonly used to represent the instantaneous payoff of player $i$ as

$$R_i(v_i(t)) - C_i(z(t)) = a_i v_i(t) \left( b_i - \frac{v_i(t)}{2} \right) - q_i z(t).$$

Then, the general form of the integral payoff for player $i$ is as follows:

$$\int_0^{T_f} \left[ a_i v_i(t) \left( b_i - \frac{v_i(t)}{2} \right) - q_i z(t) \right] dt,$$

where $a_i > 0$ is a positive constant used to transform the emissions flow to the profit flow. Coefficient $q_i > 0$ is a positive constant, corresponding to the tax that a player must bear (e.g., an ecotax).
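Considering the randomness of the game duration, we assumed that the terminal time of the game, $T_f$, is a random variable. After simplifying the integral payoff [20], the expected integral payoff of player $i$ is obtained as follows:

$$\int_0^{\infty} \bigl(1 - F(t)\bigr) \left[ a_i v_i(t) \left( b_i - \frac{v_i(t)}{2} \right) - q_i z(t) \right] dt, \tag{4}$$

where $F(t)$ is the cumulative distribution function of $T_f$.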
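The simplification mentioned above can be sketched in a few lines. The derivation below assumes that $T_f$ has cumulative distribution function $F$ with $F(0) = 0$ and that the order of integration may be exchanged; $h_i(t)$ is shorthand introduced here for the instantaneous payoff of player $i$ and is not notation from the paper.

```latex
\begin{aligned}
\mathbb{E}\left[\int_0^{T_f} h_i(t)\,dt\right]
  &= \int_0^{\infty}\left(\int_0^{s} h_i(t)\,dt\right) dF(s) \\
  &= \int_0^{\infty} h_i(t)\left(\int_t^{\infty} dF(s)\right) dt \\
  &= \int_0^{\infty} \bigl(1 - F(t)\bigr)\, h_i(t)\,dt .
\end{aligned}
```

The survival function $1 - F(t)$ therefore plays the role of a time-varying discount factor on the infinite horizon.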

Cooperative Game with Identical Shifted Exponential Distribution
If all players share common pollution-control equipment (e.g., filters), we assume that the game duration is a realization of the random variable $T_f$ (the time of equipment failure is the same for all players and coincides with the end of the game).
Assume that at the beginning of the game, all players use new pollution-control equipment, and this equipment comes with a warranty period. The warranty period refers to a specified time frame after the sale of a product or service, during which the manufacturer or supplier assures free repairs or replacements. During this period, if the product experiences any quality issues or malfunctions, the consumer can obtain free repair or replacement services. Hence, during the warranty period, there is no risk of equipment breakdown, while after the warranty period, the equipment is subject to the risk of potential damage. The game terminates when the equipment breaks down.
Therefore, we consider a cooperative game for pollution control, wherein the randomness of game duration and consideration of the warranty period are represented by a shifted exponential distribution.
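For the mathematical definition of the random variable $T_f$, we applied a shifted exponential distribution, which is given by

$$F(t) = \begin{cases} 0, & t \in [0, \theta], \\ 1 - e^{-\lambda (t - \theta)}, & t \in (\theta, \infty), \end{cases} \tag{5}$$

where $\theta > 0$ is the shift of the exponential distribution from the start of the game at time 0, which represents the equipment warranty period. The equipment does not encounter the risk of failure before the warranty period ends, and after this period, $\theta$, the failure rate is constant over time. In addition, $\lambda > 0$ is the rate parameter of the distribution and $E(T_f) = \theta + \frac{1}{\lambda}$. By substituting (5) into (4), the whole time horizon $[0, +\infty)$ is split as $[0, \theta] \cup (\theta, \infty)$, according to Bellman's optimality principle, and the overall payoff functional (4) can be rewritten as a sum:

$$\int_0^{\theta} \left[ a_i v_i \left( b_i - \frac{v_i}{2} \right) - q_i z \right] dt + \int_{\theta}^{\infty} e^{-\lambda (t - \theta)} \left[ a_i v_i \left( b_i - \frac{v_i}{2} \right) - q_i z \right] dt.$$

All players (factories) act together to maximize their joint payoff, i.e., the sum of these functionals over $i = 1, \dots, n$. By using Pontryagin's maximum principle, the cooperative solution is obtained as a result of the joint optimization problem. We solved this optimization problem separately on the two intervals: first, the second interval $(\theta, \infty)$ is considered, and then the first interval $[0, \theta]$.

The payoff functional of the second interval $(\theta, \infty)$ is

$$\int_{\theta}^{\infty} e^{-\lambda (t - \theta)} \sum_{i=1}^{n} \left[ a_i v_i \left( b_i - \frac{v_i}{2} \right) - q_i z \right] dt. \tag{8}$$

Further, the Hamiltonian is given by

$$H_2 = e^{-\lambda (t - \theta)} \sum_{i=1}^{n} \left[ a_i v_i \left( b_i - \frac{v_i}{2} \right) - q_i z \right] + \psi_2 \left( \sum_{i=1}^{n} \xi_i v_i - \delta(t) z \right).$$

Then, we have the canonical system:

$$\dot{z} = \sum_{i=1}^{n} \xi_i v_i - \delta(t) z, \qquad \dot{\psi}_2 = q\, e^{-\lambda (t - \theta)} + \delta(t) \psi_2, \tag{10}$$

where $q = \sum_{i=1}^{n} q_i$. Let $\tilde{\psi}_2 = e^{-\lambda (\theta - t)} \psi_2$. Then, we have

$$\dot{\tilde{\psi}}_2 = q + (\delta(t) + \lambda)\, \tilde{\psi}_2. \tag{11}$$

Based on the first-order derivative of the Hamiltonian, the optimal control is obtained as follows:

$$v_i^*(t) = \begin{cases} 0, & \tilde{\psi}_2(t) \le -\frac{a_i b_i}{\xi_i}, \\ b_i + \frac{\xi_i}{a_i} \tilde{\psi}_2(t), & -\frac{a_i b_i}{\xi_i} < \tilde{\psi}_2(t) < 0, \\ b_i, & \tilde{\psi}_2(t) \ge 0. \end{cases}$$

For the payoff functional (8), which is defined on the infinite horizon $(\theta, \infty)$ with infinite time-driven switches, players should consider sustainable development. Therefore, this study aimed to find a sustainable decision-making pattern so that players can find an optimal compromise between profit and penalty (e.g., ecotax).

Definition 1. The optimal control, $v_i^*$, is environmentally sustainable if it does not take on boundary values, except at isolated instances of time, i.e., $0 < v_i^*(t) < b_i$ for almost all $t$.

This definition is based on the long-term economic interests of all players. If $v_i^*$ stays at the upper bound $b_i$, the control (the rate of emissions) of player $i$ remains at its maximum value. Evidently, this is not profitable because player $i$ bears a high ecotax. In the other situation, where $v_i^*$ stays at the lower bound 0, the revenue of player $i$ cannot exceed the cost; therefore, the player must halt production after a certain time interval to allow the stock of pollution to decrease to a lower level. Hence, neither of these situations is acceptable for sustainable production [21].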
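The shifted exponential distribution and the resulting control law can be made concrete with a short sketch. It assumes the quadratic revenue a_i v_i (b_i − v_i/2), for which the unconstrained maximizer of the Hamiltonian is b_i + ξ_i ψ̃_2 / a_i; the function names and the sample numbers are hypothetical and serve only as an illustration.

```python
import math


def shifted_exp_cdf(t, theta, lam):
    """CDF of the shifted exponential law: no failure risk before theta."""
    if t <= theta:
        return 0.0
    return 1.0 - math.exp(-lam * (t - theta))


def expected_duration(theta, lam):
    """Mean of the shifted exponential distribution, theta + 1/lambda."""
    return theta + 1.0 / lam


def optimal_emission(psi_tilde, a, b, xi):
    """Clamp the interior maximizer b + xi * psi_tilde / a to [0, b].

    Assumes revenue a * v * (b - v / 2); psi_tilde is the rescaled adjoint.
    """
    return min(b, max(0.0, b + xi * psi_tilde / a))


def is_interior(v, b):
    """True when the control lies strictly inside (0, b)."""
    return 0.0 < v < b


if __name__ == "__main__":
    v = optimal_emission(psi_tilde=-2.0, a=2.0, b=1.0, xi=0.5)
    print(v, is_interior(v, 1.0))
```

A control is environmentally sustainable in the sense of Definition 1 when `is_interior` holds along the trajectory except at isolated instants, which happens exactly when the rescaled adjoint stays between −a_i b_i/ξ_i and 0.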
The following theorem and its proof are close to the result presented in [11].The main difference is that the condition for the adjoint variable was not set at the initial moment but at moment θ.In addition, this value depends on the interval to which θ belongs.
Theorem 1. The solution to (10) satisfying $z(0) = z_0$ and $\tilde{\psi}_2(\theta) = \psi_{hlc}$, with $\psi_{hlc}$ given by […], is the unique optimal solution to (1)-(7). Here, $q = \sum_{i=1}^{n} q_i$ and $p_j = \delta_j + \lambda$, $j = 1, 2$.

Proof. To obtain a periodic solution, the following equation is first solved: […]. Let $\theta = kT + m$. If $m \in [0, \tau T]$, the solution to (11) has the form $\tilde{\psi}_2(t) = c_1 e^{p_1 t} - \frac{q}{p_1}$, so that $\tilde{\psi}_2(\theta) = c_1 e^{p_1 \theta} - \frac{q}{p_1}$. Solving (14), we have […] or, which is the same thing, […]. If $m \in [\tau T, T]$, then $\tilde{\psi}_2(\theta) = c_2 e^{p_2 \theta} - \frac{q}{p_2}$. Solving (14), we have […]. Then, the solution to (10) satisfying $\psi_2(\theta) = \tilde{\psi}_2(\theta) = \psi_{hlc}$ is given by […]; therefore, this solution is periodic and bounded.

The boundedness of $\tilde{\psi}_2^*(t)$ and the state $z(t)$ guarantees the fulfillment of the transversality condition […] for any admissible solution $z(t)$. Thus, by considering the concavity of the Hamiltonian, we can conclude that $v^*$ is the optimal control, where […].

The uniqueness of the obtained optimal solution also follows from the concavity of the Hamiltonian: it is a concave function with respect to the state $z$ and strictly concave with respect to the control $v_i$. This completes the proof of the theorem.
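The equilibrium value of the adjoint variable can also be obtained numerically. The sketch below assumes that the rescaled adjoint obeys dψ̃/dt = q + p_j ψ̃ in mode j, with p_j = δ_j + λ, so that one period acts on the initial value as an affine map whose fixed point is the periodic solution. The helper names are hypothetical, and the closed form used here is a derivation made for this sketch rather than the formula stated in the theorem.

```python
import math


def one_period_map(psi0, q, p1, p2, T, tau):
    """Propagate psi over one period: mode 1 for tau*T, then mode 2."""
    psi_switch = (psi0 + q / p1) * math.exp(p1 * tau * T) - q / p1
    psi_end = (psi_switch + q / p2) * math.exp(p2 * (1 - tau) * T) - q / p2
    return psi_switch, psi_end


def periodic_start_value(q, p1, p2, T, tau):
    """Fixed point of the affine one-period map psi0 -> psi_end."""
    _, intercept = one_period_map(0.0, q, p1, p2, T, tau)
    slope = math.exp(p1 * tau * T + p2 * (1 - tau) * T)
    return intercept / (1.0 - slope)


def psi_hlc(theta, q, delta1, delta2, lam, T, tau):
    """Value of the periodic solution at t = theta (theta = k*T + m)."""
    p1, p2 = delta1 + lam, delta2 + lam
    start = periodic_start_value(q, p1, p2, T, tau)
    m = theta % T
    if m <= tau * T:  # theta falls in the first subperiod
        return (start + q / p1) * math.exp(p1 * m) - q / p1
    switch, _ = one_period_map(start, q, p1, p2, T, tau)
    return (switch + q / p2) * math.exp(p2 * (m - tau * T)) - q / p2


if __name__ == "__main__":
    print(psi_hlc(theta=1.2, q=6.0, delta1=0.9, delta2=0.45,
                  lam=0.5, T=1.0, tau=0.5))
```

Because the slope of the one-period map exceeds one, the fixed point is the only initial value that produces a bounded trajectory, which is consistent with the uniqueness claimed in the theorem.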

Lemma 2. Optimal control $v_i^*$ is environmentally sustainable when the following inequalities are satisfied: […]

Proof. For $\delta_1 > \delta_2$, the adjoint variable $\tilde{\psi}_2^*(t)$ decreases in the first subperiod and increases in the second subperiod. In addition, according to Theorem 1, $\tilde{\psi}_2^*(t) \le \max\left(-\frac{q}{p_1}, -\frac{q}{p_2}\right) < 0$. Hence, to ensure that optimal control $v_i^*$ is environmentally sustainable, the condition $\min_{t \ge 0} \tilde{\psi}_2^*(t) \ge -\frac{a_i b_i}{\xi_i}$ must be satisfied. As such, we have […]. Similarly, the condition for $\delta_1 < \delta_2$ can be obtained.
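The sustainability condition can be checked numerically once the extreme values of the periodic adjoint are known. The sketch below assumes the mode-wise dynamics dψ̃/dt = q + (δ_j + λ) ψ̃, under which the periodic solution is monotone on each subperiod and therefore attains its extremes at the period start and at the switching instant. The parameters a, b, and ξ passed in the usage lines are hypothetical, and the test uses the non-strict bound from the proof.

```python
import math


def hlc_extremes(q, delta1, delta2, lam, T, tau):
    """Return (min, max) of the periodic adjoint over one period."""
    p1, p2 = delta1 + lam, delta2 + lam
    e1 = math.exp(p1 * tau * T)
    e2 = math.exp(p2 * (1 - tau) * T)
    # Image of 0 under the affine one-period map, then its fixed point.
    intercept = ((q / p1) * (e1 - 1.0) + q / p2) * e2 - q / p2
    start = intercept / (1.0 - e1 * e2)
    switch = (start + q / p1) * e1 - q / p1
    return min(start, switch), max(start, switch)


def is_sustainable(a, b, xi, q, delta1, delta2, lam, T, tau):
    """Check min psi >= -a_i*b_i/xi_i for every player and max psi < 0."""
    lo, hi = hlc_extremes(q, delta1, delta2, lam, T, tau)
    bounds_ok = all(lo >= -ai * bi / xi_i for ai, bi, xi_i in zip(a, b, xi))
    return hi < 0.0 and bounds_ok


if __name__ == "__main__":
    print(is_sustainable(a=[10.0, 10.0], b=[1.0, 1.0], xi=[0.5, 0.5],
                         q=6.0, delta1=0.9, delta2=0.45,
                         lam=0.5, T=1.0, tau=0.5))
```

For δ_1 > δ_2 the minimum is reached at the switching instant inside each period, in line with the monotonicity argument used in the proof.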
Figure 1 illustrates the dynamics of $\tilde{\psi}_2$ with different initial values in two situations: when the distribution shift parameter $\theta$ is located before and after the switching time in one period. Without loss of generality, period $T$ and switching ratio $\tau$ can be chosen arbitrarily because these two parameters do not affect the overall result. Hence, we set $T = 1$ and $\tau = 0.5$. In addition, for parameter $\lambda$, which directly determines the expectation of the game-termination time, we set $\lambda = 0.5$. The other parameters are set as $\delta_1 = 0.9$, $\delta_2 = 0.45$, and $q = 6$.
The blue line, whose initial value $\tilde{\psi}_2(\theta)$ equals the equilibrium value $\psi_{hlc}$, denotes an equilibrium solution of the adjoint variable. It varies periodically with equal amplitude within the interval $[\psi_{22}^*, \psi_{21}^*]$, where $\psi_{2i}^* = -\frac{q}{p_i}$, $i = 1, 2$, are the equilibrium positions of the two modes, which depend on the value of $\delta$. The equilibrium positions are represented by the sky-blue dashed lines, and the red lines indicate nonequilibrium solutions, whose initial values deviate slightly from the equilibrium value. Even a small deviation in the initial value causes the solution to diverge over time, toward either $+\infty$ or $-\infty$, so that it leaves the band between the two equilibrium positions, $[\psi_{22}^*, \psi_{21}^*]$. In this way, an equilibrium solution is uniquely determined, forming an HLC as time approaches an infinite horizon with infinite time-driven switches.
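The divergence of nonequilibrium solutions can be quantified with a small sketch. It assumes the mode-wise dynamics dψ̃/dt = q + p_j ψ̃ with p_1 = 1.4 and p_2 = 0.95 (that is, δ_1 + λ and δ_2 + λ for the parameter values quoted above); the function names are hypothetical. Since the one-period map is affine, the gap between any two solutions is multiplied by the same factor exp(p_1 τ T + p_2 (1 − τ) T) in every period.

```python
import math


def propagate(psi0, periods, q=6.0, p1=1.4, p2=0.95, T=1.0, tau=0.5):
    """Advance the rescaled adjoint over whole periods (closed form per mode)."""
    psi = psi0
    for _ in range(periods):
        psi = (psi + q / p1) * math.exp(p1 * tau * T) - q / p1
        psi = (psi + q / p2) * math.exp(p2 * (1 - tau) * T) - q / p2
    return psi


def growth_per_period(p1=1.4, p2=0.95, T=1.0, tau=0.5):
    """Factor by which the gap between two solutions grows in one period."""
    return math.exp(p1 * tau * T + p2 * (1 - tau) * T)


if __name__ == "__main__":
    eps = 1e-6
    gap = propagate(-5.0 + eps, 5) - propagate(-5.0, 5)
    print(gap, eps * growth_per_period() ** 5)
```

For these values the growth factor is larger than three per period, so a deviation of any size from the equilibrium value is amplified quickly, matching the behavior of the red curves described above.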

First Interval-[0, θ]
The payoff functional of the first interval, $[0, \theta]$, is denoted as […]. The Hamiltonian is given by […]. Then, we have the canonical system: […]. Moreover, the continuity condition $\psi_1(\theta) = \psi_2(\theta) = \psi_{hlc}$ follows from the continuity of the optimal control, which is directly driven by the adjoint variable, and from the fact that the switching instants depend only on time, i.e., the switches are autonomous or time-driven [9,22].
According to the first-order derivative of the Hamiltonian, the optimal control is obtained as follows: […]. As the terminal value of $\psi_1(t)$ is determined, we consider the backward-time dynamics of $\psi_1(t)$, which is described by the following system: […] (26), where $h \in \mathbb{N}_0$, with $\psi_1(\theta) = \psi_{hlc}$. Each subsystem of (26) has a single stable equilibrium at […]. Note that the equilibrium points of the adjoint variable in the first interval are less than those in the second interval; thus, the overall trend of adjoint variable $\psi_1(t)$ in the first interval, $[0, \theta]$, increases in backward time.
Accordingly, from the forward-time point of view, the optimal control in the first interval, $[0, \theta]$, may first remain near the maximal admissible value and then decrease to the HLC.

Numerical Overall Adjoint Variable
The initial value of adjoint variable $\psi_{hlc}$ in the second interval is uniquely determined, and this is used as the terminal value in the first interval. Now, we can solve for the cooperative adjoint variable, denoted by $\psi_{co}(t)$, over the whole time horizon $[0, +\infty)$. Herein, the parameter settings were similar to those in Section 3.1.
Figure 2 shows the situation when distribution shift parameter $\theta$ is located before the switching time in the second period, $T \le \theta \le T + \tau$, whereas Figure 3 shows the situation when distribution shift parameter $\theta$ is located after the switching time in the second period, $T + \tau \le \theta \le 2T$.
The blue lines in the figures represent the cooperative adjoint variable, $\psi_{co}(t)$; the sky-blue dashed lines represent the equilibrium points of the two intervals, and the green lines mark the distribution shift parameter $\theta$. From Figures 2 and 3, we can conclude that regardless of whether shift instant $\theta$ is located before or after the switching time in a period, the overall trend of $\psi_{co}(t)$ does not change.
Consequently, the overall optimal control of player i in the whole time duration [0, ∞] is derived as:

Cooperative Game with Different Shifted Exponential Distributions
In practice, the pollution-control equipment used by each player is different and comes with a different warranty period. The duration of the warranty period may vary depending on factors such as product type, brand, and contract terms, and it is usually measured in months or years. Hence, the i-th player's equipment fails abruptly at moment $T_{f_i}$, a random variable with a known probability distribution function $F_i(t)$, $i = 1, \dots, n$, and the equipment may break down owing to the end of its lifetime or to natural disasters. The game lasts until one of the players' equipment breaks down. Hence, this study also considered an n-player cooperative game of pollution control with different shifted exponential distributions. Furthermore, each player is assumed to possess specific equipment used in pollution control. Moreover, $\{T_{f_i}\}_{i=1}^{n}$ are assumed to be independent random variables. Thus, the game duration is defined as $T_f = \min\{T_{f_1}, \dots, T_{f_n}\}$.
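In this case, players' equipment is heterogeneous, and $\{T_{f_i}\}_{i=1}^{n}$ follow different shifted exponential distributions with different distribution parameters, $\{\lambda_i\}_{i=1}^{n}$. Without loss of generality, we assumed that $\theta_1 \le \theta_2 \le \dots \le \theta_n$, where $\theta_n$ is the shift parameter of player $n$ and is the largest shift parameter among all players. Then, we have $1 - F(t) = \prod_{i=1}^{n} \bigl(1 - F_i(t)\bigr)$, see [15], where $F_i(t)$ is defined as

$$F_i(t) = \begin{cases} 0, & t \in [0, \theta_i], \\ 1 - e^{-\lambda_i (t - \theta_i)}, & t \in (\theta_i, \infty). \end{cases}$$

The duration of the game is a random variable with a composite distribution function [23]. Thus, the cumulative distribution function, $F(t)$, with different shifted exponential distributions is denoted as

$$F(t) = \begin{cases} 0, & t \in [0, \theta_1], \\ 1 - e^{-\sum_{i=1}^{j} \lambda_i (t - \theta_i)}, & t \in (\theta_j, \theta_{j+1}], \; j = 1, \dots, n - 1, \\ 1 - e^{-\sum_{i=1}^{n} \lambda_i (t - \theta_i)}, & t \in (\theta_n, \infty). \end{cases}$$

The cooperative payoff functional (4), in this case, can be rewritten as the following sum:

$$\int_0^{\theta_1} \sum_{k=1}^{n} \left[ a_k v_k \left( b_k - \frac{v_k}{2} \right) - q_k z \right] dt + \sum_{j=1}^{n-1} \int_{\theta_j}^{\theta_{j+1}} e^{-\sum_{i=1}^{j} \lambda_i (t - \theta_i)} \sum_{k=1}^{n} \left[ a_k v_k \left( b_k - \frac{v_k}{2} \right) - q_k z \right] dt + \int_{\theta_n}^{\infty} e^{-\sum_{i=1}^{n} \lambda_i (t - \theta_i)} \sum_{k=1}^{n} \left[ a_k v_k \left( b_k - \frac{v_k}{2} \right) - q_k z \right] dt. \tag{31}$$

The payoff functional in the last interval, $(\theta_n, \infty)$, is

$$\int_{\theta_n}^{\infty} e^{-\sum_{i=1}^{n} \lambda_i (t - \theta_i)} \sum_{k=1}^{n} \left[ a_k v_k \left( b_k - \frac{v_k}{2} \right) - q_k z \right] dt.$$

The Hamiltonian is given by

$$H_n = e^{-\sum_{i=1}^{n} \lambda_i (t - \theta_i)} \sum_{k=1}^{n} \left[ a_k v_k \left( b_k - \frac{v_k}{2} \right) - q_k z \right] + \psi_n \left( \sum_{k=1}^{n} \xi_k v_k - \delta(t) z \right).$$

Then, we have the following canonical system:

$$\dot{z} = \sum_{k=1}^{n} \xi_k v_k - \delta(t) z, \qquad \dot{\psi}_n = e^{-\sum_{i=1}^{n} \lambda_i (t - \theta_i)} \sum_{i=1}^{n} q_i + \delta(t) \psi_n.$$

Let $\tilde{\psi}_n = e^{-\sum_{i=1}^{n} \lambda_i (\theta_i - t)} \psi_n$; then, we have $\dot{\tilde{\psi}}_n = \sum_{i=1}^{n} q_i + \left( \delta(t) + \sum_{i=1}^{n} \lambda_i \right) \tilde{\psi}_n$. As the differential equation of $\tilde{\psi}_n(t)$ is similar to that of $\tilde{\psi}_2(t)$ in the identical-shift case of Section 3, we can still uniquely determine the equilibrium initial value, $\tilde{\psi}_n(\theta_n) = \psi_{hlc}^{N}$, that forms a unique HLC, where […].

Next, the equilibrium value of the adjoint variable at moment $\theta_n$ is uniquely determined. Thus, the dynamics of $\psi_i(t)$, $i = 1, \dots, n - 1$, are also uniquely determined in backward time.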
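The composite distribution of the game duration follows directly from the independence of the individual failure times. The sketch below assumes T_f = min_i T_fi with independent shifted exponential failure times; the function names and the sample parameters are hypothetical.

```python
import math


def shifted_exp_cdf(t, theta, lam):
    """CDF of one player's failure time: zero risk during the warranty."""
    if t <= theta:
        return 0.0
    return 1.0 - math.exp(-lam * (t - theta))


def min_duration_cdf(t, thetas, lams):
    """CDF of T_f = min_i T_fi for independent failure times.

    Uses 1 - F(t) = prod_i (1 - F_i(t)).
    """
    survival = 1.0
    for theta, lam in zip(thetas, lams):
        survival *= 1.0 - shifted_exp_cdf(t, theta, lam)
    return 1.0 - survival


def total_hazard(t, thetas, lams):
    """Sum of lambda_i over players whose warranty has expired at time t."""
    return sum(lam for theta, lam in zip(thetas, lams) if t > theta)


if __name__ == "__main__":
    thetas, lams = [1.0, 2.0], [0.5, 0.3]
    for t in (0.5, 1.5, 3.0):
        print(t, min_duration_cdf(t, thetas, lams),
              total_hazard(t, thetas, lams))
```

The hazard rate of the game duration increases stepwise each time a warranty expires and reaches the sum of all λ_i after the largest shift θ_n, which is the coefficient that appears in the equation for the rescaled adjoint on the last interval.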
Consequently, the overall cooperative adjoint variable with different shifted distributions is obtained as follows: (39)

Numerical Optimal Solution
For simplicity, we considered an example of a two-player cooperative game-theoretic model of pollution control.As the overall cooperative adjoint variable is uniquely obtained in each case, optimal solutions are shown below.

Solution of Identical Shifted Exponential Distribution
The optimal control, state trajectory, and corresponding cooperative payoff are demonstrated in Figure 4.
The solution shows that the optimal emissions of each player change periodically after the distribution shift instant $\theta = 1.2$. Further, the stock of pollution converges to a unique HLC and stabilizes, and the cooperative payoff increases and then converges.
In some cases, before time instant $\theta$, there may exist a period of aggressive emissions. This could be interpreted as a more intense use of equipment by players during the warranty period.

Solution of Different Shifted Exponential Distributions
The optimal control, state trajectory, and corresponding cooperative payoff in this case are shown in Figure 5.
Figure 5 shows that after the expiration of each piece of equipment's warranty period ($\theta_i$), the control strategy of player $i$ tends to be increasingly conservative. After the maximal warranty period $\theta_2$, the control strategy of each player becomes a periodic solution in the form of an HLC. The stock of pollution also converges to a unique HLC and stabilizes, while the cooperative payoff increases and then converges. This result indicates that the proposed control strategy can still maintain profitability when the equipment of each player is nonhomogeneous and the self-cleaning rate $\delta(t)$ fluctuates seasonally.

For the dynamic switched system of lake pollution levels considered in this paper, the obtained results, coupled with the random duration of the process, suggest that, within the framework of a cooperative game, players may adopt a production strategy that transitions from aggressive to conservative (gradually decreasing output) before the maximum warranty period $\theta_2$. This strategy reaches its equilibrium value at $\theta_2$. After $\theta_2$, players adopt a production strategy following an HLC pattern, which involves the gradual increase and progressive decrease in production during periods of relatively high and low lake self-cleaning rates, respectively. Consequently, this approach ensures an optimally controlled outcome in the context of sustainable development.
Therefore, based on the considerations of real-world issues, the obtained results exhibit uniqueness, theoretical applicability, and practical relevance.

Conclusions
In this study, we analyzed the cooperative differential game for a typical hybrid optimal pollution-control problem with two types of time-driven switches: the seasonal fluctuation (self-cleaning rate of the lake) and the shift parameters of the exponential distributions that arise from the random game duration. A problem with random terminal time was transformed into a combination of an infinite-horizon optimal control problem and one or more finite-horizon optimal control problems.
Further, we first considered a scenario with identical game duration and then examined another scenario with joint probability distribution of game duration resulting from the heterogeneity of players' equipment.This paper discussed these two scenarios in detail and presented the results of each scenario analytically and numerically.Furthermore, an environmentally sustainable solution in the form of an HLC was uniquely determined for each scenario, ensuring both sustainable production revenue and environmental protection.
In a subsequent study, we will delve into the optimal control problem of each player in a noncooperative game featuring infinite time-driven switches and random game duration.In addition, we will provide comparisons of the results obtained from the noncooperative game with those obtained from the cooperative game to establish a justifiable allocation rule.

Figure 1. Dynamics of $\tilde{\psi}_2$ with different initial values.
(a) Optimal emissions and the stock of pollution. (b) Cooperative payoff.

Figure 4. Optimal solution and cooperative payoff with identical shifted exponential distribution.

Figure 5. Optimal solution and cooperative payoff with different shifted exponential distributions.