Transferable Utility Cooperative Differential Games with Continuous Updating Using Pontryagin Maximum Principle

Abstract: We consider a class of cooperative differential games with continuous updating, using the Pontryagin maximum principle. It is assumed that at each moment, players have or use information about the game structure defined on a closed time interval of fixed duration. Over time, information about the game structure is updated. The subject of the paper is to construct players' cooperative strategies, their cooperative trajectory, the characteristic function, and the cooperative solution for this class of differential games with continuous updating, using Pontryagin's maximum principle as the optimality conditions. To demonstrate the novelty of the method, we compare cooperative strategies, trajectories, characteristic functions, and the corresponding Shapley values for a classic (initial) differential game and a differential game with continuous updating. Our approach provides a means of more profound modeling of conflict-controlled processes. In a particular example, we demonstrate that players' behavior is braver at the beginning of the game with continuous updating because they lack information about the whole game and are "intrinsically time-inconsistent". In contrast, in the initial model the players are more cautious, which implies that they dare not emit too much pollution at first.


Introduction
Dynamic or differential games are an important branch of game theory that investigates interactive decision-making over time. In a differential game, the decision process evolves in continuous time and is generally described by a differential equation. Differential games provide an effective tool for studying a wide range of dynamic processes, for example, pollution control problems, where it is important to analyze the interaction between participants' strategic behavior and the dynamic evolution of the levels of pollutants that polluters release. In Carlson and Leitmann [1], a direct method for finding open-loop Nash equilibria for a class of differential n-player games is presented.
Cooperative optimization points to the possibility of socially optimal and individually rational solutions to decision-making problems involving strategic action over time. A static cooperative game is typically solved in two steps. First, one determines the collective optimal solution; then the payoffs are transferred and distributed using one of the many available cooperative game solutions, such as the core, the Shapley value, or the nucleolus. In a dynamic cooperative game, it must also be assured that, over time, all players will comply with the agreement. This will occur if each player's profit in the cooperative situation at any intermediate moment dominates their non-cooperative profit. This property is known as time consistency and was introduced by Petrosjan (originally 1977) [2].
To derive equilibrium solutions, existing differential games often rely on the assumption of time-invariant game structures. However, the future is full of essentially unknown events. Therefore, it is necessary to take into account that the information available to players about the future is limited. For dynamic updating, the Looking Forward Approach is used in game theory, and in differential games especially. The Looking Forward Approach solves the problem of modeling players' behavior when the information about the process is dynamically updated. This means that the Looking Forward Approach does not use a target trajectory, but determines how a trajectory is to be used by players and how a cooperative payoff is to be allocated along that trajectory. The Looking Forward Approach was first presented in [3]. Afterward, the works in [4-9] were published.
In [10-15], a class of non-cooperative differential games with continuous updating is considered, and it is assumed that the updating process continues to develop over time. In the paper [10], the Hamilton-Jacobi-Bellman equations of the Nash equilibrium in the game with continuous updating are derived. The work in [11] is devoted to the class of cooperative differential games with transferable utility using Hamilton-Jacobi-Bellman equations, the construction of the characteristic function with continuous updating, and several related theorems. Another result related to Hamilton-Jacobi-Bellman equations with continuous updating is devoted to the class of cooperative differential games with nontransferable utility [15]. The works in [13,14] are devoted to the class of linear-quadratic differential games with continuous updating; there, the cooperative and non-cooperative cases are considered and the corresponding solutions are obtained. In the paper [12], the explicit form of the Nash equilibrium for the differential game with continuous updating is derived by using the Pontryagin maximum principle. In this paper, the class of cooperative game models is examined, and results concerning the cooperative setting are presented: the notion of cooperative strategies, the characteristic function, and the cooperative solution for a class of games with continuous updating using the Pontryagin maximum principle. Theoretical results are illustrated for three players on a classic differential game model of pollution control presented in [16]. Another potentially important application of the continuous updating approach is related to the class of inverse optimal control problems with continuous updating [17]. The approach can be used for human behavior modeling in engineering systems.
The class of differential games with dynamic and continuous updating has some similarities with Model Predictive Control (MPC) theory, which is developed within the framework of numerical optimal control in the books [18-21]. In the MPC method, the current control action is obtained by solving a finite-horizon open-loop optimal control problem at each sampling moment. For linear systems, there is an explicit solution [22,23]. However, in general, the MPC approach needs to solve several optimization problems. Another series of related papers, corresponding to the stable control category, is [24-27], in which similar methods are considered for the linear-quadratic optimal control problem. However, the goals of MPC and of the continuous updating approach pursued in the current paper are different: here the aim is to model players' behavior when the information about the game process is continuously updated over time. In [28,29], a similar issue is considered and the authors investigate repeated games with sliding planning horizons.
The paper is structured as follows. Section 2 starts with the initial differential game model and the cooperative differential game model. Section 3 introduces the differential game with continuous updating and its cooperative version, treats them by the Pontryagin maximum principle, defines the characteristic function with continuous updating, and presents the theoretical results. Section 4 gives an example of pollution control based on continuous updating. The conclusion is drawn in Section 5.

Preliminary Knowledge
Consider the differential game starting from the initial position $x_0$ and evolving on the time interval $[t_0, T]$. The equations of the system's dynamics have the form

$$\dot{x}(t) = f(t, x, u), \quad x(t_0) = x_0, \tag{1}$$

where $x \in \mathbb{R}^l$ is the set of variables that characterizes the state of the dynamical system at any instant of time during the play of the game, $u = (u_1, \dots, u_n)$, and $u_i \in U_i \subset \mathrm{comp}\,\mathbb{R}^k$ is the control of player $i$. We shall use the notation $U = U_1 \times \dots \times U_n$. The existence, uniqueness, and continuability of the solution $x(t)$ for any admissible measurable controls $u_1(\cdot), \dots, u_n(\cdot)$ was dealt with by Tolwinski, Haurie, and Leitmann [30] under the following conditions: there exists a positive constant $k$ such that, $\forall t \in [t_0, T]$ and $\forall u \in U$, $\|f(t, x, u)\| \le k(1 + \|x\|)$; for every $R > 0$ there exists $K_R > 0$ such that, $\forall t \in [t_0, T]$ and $\forall u \in U$, $\|f(t, x, u) - f(t, x', u)\| \le K_R \|x - x'\|$ for all $x$ and $x'$ such that $\|x\| \le R$ and $\|x'\| \le R$; and for any $t \in [t_0, T]$ and $x \in X$ the set $G(x) = \{f(t, x, u) \mid u \in U\}$ is a convex compact subset of $\mathbb{R}^l$.

The payoff of player $i$ is then defined as

$$K_i(x_0, t_0, T; u) = \int_{t_0}^{T} g^i[t, x(t), u(t)]\,dt,$$

where $g^i[t, x, u]$ and $f(t, x, u)$ are integrable functions and $x(t)$ is the solution of the Cauchy problem (1) with fixed open-loop controls $u(t) = (u_1(t), \dots, u_n(t))$. The strategy profile $u(t) = (u_1(t), \dots, u_n(t))$ is called admissible if the problem (1) has a unique and continuable solution.

Let us agree that in a differential game, a subgame is a "truncated" version of the whole game. A subgame is a game in its own right; it starts at a time instant $t \in [t_0, T]$, after a particular history of actions $u(t)$. Denote such a subgame by $\Gamma(x, t, T)$. (A remark on this notation is in order. Let the state of the game be defined by the pair $(x, t)$ and denote by $\Gamma(x, t, T)$ the subgame starting at date $t$ with the state variable $x$; here, the model considered has a finite horizon, so the terminal time $T$ appears. In the infinite-horizon case, one expects all corresponding value functions to depend on the state and not on time.) For each $(x, t, T) \in X \times [t_0, T] \times \mathbb{R}$, we define a subgame $\Gamma(x, t, T)$ by replacing the objective functional for player $i$ and the system dynamics by

$$K_i(x, t, T; u) = \int_{t}^{T} g^i[s, x(s), u(s)]\,ds \quad \text{and} \quad \dot{x}(s) = f(s, x(s), u(s)), \quad x(t) = x,$$

respectively.
Therefore, Γ(x, t, T) is a differential game defined on the time interval [t, T] with initial condition x(t) = x.

Cooperative Differential Game Model
We adopt a cooperative game methodology to solve the differential game model with transferable utility. The steps are as follows.

1. Define the cooperative behavior (strategies) and the corresponding cooperative trajectory.
2. Compute the values of the characteristic function.
3. Allocate the total cooperative payoff among the players so that the allocation belongs to a chosen cooperative solution, such as the kernel, the bargaining set, the stable set, the core, the Shapley value, or the nucleolus (see, e.g., Osborne and Rubinstein [31] for an introduction to these concepts).
First of all, we introduce the notions of cooperative strategies of the players, $u^* = (u_1^*, \dots, u_n^*)$, and the corresponding trajectory $x^*(t)$. Strategies $u^*(t)$ are called optimal strategies, i.e., a set of controls that maximizes the joint payoff of the players:

$$u^* = \arg\max_{u} \sum_{i=1}^{n} K_i(x_0, t_0, T; u). \tag{2}$$

Suppose that the maximum in (2) is achieved on the set of admissible strategies. Substituting $u^*(t)$ into Equation (1), we obtain the cooperative trajectory $x^*(t)$.
Consequently, to determine the way to distribute the maximum total payoff among players, it is fundamental to define the concept of the characteristic function of the coalition S ⊆ N. This characteristic function shows the strength of the coalition; remarkably, it allows us to take into account the players' contributions to each coalition.
To define a cooperative game, a characteristic function must be introduced. We call $V(S; x_0, t_0, T)$, $S \subseteq N$, a characteristic function for the initial differential game $\Gamma(x_0, t_0, T)$. By a characteristic function we understand a map from the set of all possible coalitions,

$$V(\cdot): 2^N \to \mathbb{R}, \quad V(\emptyset) = 0,$$

which assigns to each coalition $S$ the total payoff that the players from $S$ can guarantee when acting independently. An important property is the superadditivity of a characteristic function:

$$V(S_1 \cup S_2; x_0, t_0, T) \ge V(S_1; x_0, t_0, T) + V(S_2; x_0, t_0, T), \quad \forall S_1, S_2 \subseteq N, \; S_1 \cap S_2 = \emptyset.$$

The construction of a characteristic function is one of the main questions in cooperative game theory. Originally, the value of the characteristic function $V(S)$ was interpreted by von Neumann and Morgenstern (1944) as the maximum guaranteed payoff that coalition $S$ can gain acting independently of the other players [32]. Presently, various means of constructing characteristic functions in cooperative games are known, such as the α-c.f. [33], β-c.f. [34], ζ-c.f. [35], and γ-c.f. [36].
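The superadditivity property can be checked mechanically once a characteristic function is tabulated. The following is a minimal sketch, assuming the characteristic function is stored as a dictionary keyed by `frozenset` coalitions; the function names and the three-player values in `v_example` are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def coalitions(players):
    """Yield every subset of `players`, including the empty coalition."""
    for r in range(len(players) + 1):
        for members in combinations(players, r):
            yield frozenset(members)


def is_superadditive(v, players, tol=1e-12):
    """Check V(S1 | S2) >= V(S1) + V(S2) for all disjoint S1, S2."""
    subsets = list(coalitions(players))
    for s1 in subsets:
        for s2 in subsets:
            if s1 & s2:
                continue  # only disjoint coalitions are compared
            if v[s1 | s2] + tol < v[s1] + v[s2]:
                return False
    return True


# Hypothetical three-player characteristic function (illustration only).
v_example = {
    frozenset(): 0.0,
    frozenset({1}): 1.0, frozenset({2}): 2.0, frozenset({3}): 3.0,
    frozenset({1, 2}): 4.0, frozenset({1, 3}): 5.0, frozenset({2, 3}): 6.0,
    frozenset({1, 2, 3}): 10.0,
}
```

The check enumerates all pairs of coalitions, so it is only practical for a small number of players, which is the setting of the three-player example considered later.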
Similar to the above, for each $(x^*(t), t, T) \in X \times [t_0, T] \times \mathbb{R}$, we define a cooperative subgame $\Gamma^c(x^*(t), t, T)$ (the superscript "c" means "cooperative") along the cooperative trajectory $x^*(t)$ by replacing the objective functional for player $i$ and the system dynamics by

$$K_i(x^*(t), t, T; u) = \int_{t}^{T} g^i[s, x(s), u(s)]\,ds \quad \text{and} \quad \dot{x}(s) = f(s, x(s), u(s)), \quad x(t) = x^*(t),$$

respectively. Therefore, $\Gamma^c(x^*(t), t, T)$ is a cooperative differential game defined on the time interval $[t, T]$ with initial condition $x(t) = x^*(t)$.
In this paper, we adopt the constructive approach proposed by Petrosjan L. and Zaccour G. [37] for building a δ-characteristic function. $V(S; x^*(t), t, T)$ denotes the strength of a coalition $S$ in the subgame $\Gamma(x^*(t), t, T)$. It is calculated in two stages: first, we compute the Nash equilibrium strategies $\{u_i^{NE}\}$ for all players $i \in N$; second, we freeze the strategies of the players from $N \setminus S$, and the players of the coalition $S$ maximize their joint payoff $\sum_{i \in S} K_i$ over $u_S = \{u_i\}_{i \in S}$. Thus, the characteristic function is defined as

$$V(S; x^*(t), t, T) = \begin{cases} \max\limits_{u} \sum\limits_{i=1}^{n} K_i(x^*(t), t, T; u), & S = N, \\ \max\limits_{u_S} \sum\limits_{i \in S} K_i(x^*(t), t, T; u_S, u^{NE}_{N \setminus S}), & S \subset N, \\ 0, & S = \emptyset. \end{cases}$$

Denote by $L(x^*(t), t, T)$ the set of imputations in the game $\Gamma(x^*(t), t, T)$:

$$L(x^*(t), t, T) = \Big\{ \xi = (\xi_1, \dots, \xi_n) : \sum_{i=1}^{n} \xi_i = V(N; x^*(t), t, T), \; \xi_i \ge V(\{i\}; x^*(t), t, T), \; i \in N \Big\},$$

where $V(\{i\}; x^*(t), t, T)$ is the value of the characteristic function $V(S; x^*(t), t, T)$ for the coalition $S = \{i\}$. By $M(x^*(t), t, T)$ we denote any cooperative solution, that is, a subset of the imputation set: $M(x^*(t), t, T) \subseteq L(x^*(t), t, T)$. The two most extensively used cooperative solutions are the Shapley value and the core. In the following, we consider a specific cooperative solution, the Shapley value [38]. The Shapley value selects a single imputation, an $n$-vector denoted $sh(\cdot) = (sh_1(\cdot), sh_2(\cdot), \dots, sh_n(\cdot))$, satisfying three axioms: fairness, which means that similar players are treated equally; efficiency, $\sum_{i=1}^{n} sh_i(\cdot) = V(N; \cdot)$; and linearity (a relatively technical axiom required to obtain uniqueness). The Shapley value is defined in a unique way and is particularly suitable for a range of applications.
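Membership in the imputation set amounts to two checks: the components sum to the value of the grand coalition (efficiency), and no player receives less than the value of their single-player coalition (individual rationality). The sketch below illustrates this under the assumption that the characteristic function is tabulated as a dictionary keyed by `frozenset`; the function name and the numbers are hypothetical.

```python
def is_imputation(xi, v, players, tol=1e-9):
    """Return True if the allocation `xi` is efficient and individually rational."""
    grand = frozenset(players)
    efficient = abs(sum(xi[i] for i in players) - v[grand]) <= tol
    rational = all(xi[i] >= v[frozenset({i})] - tol for i in players)
    return efficient and rational


# Hypothetical values: only the entries used by the check are needed.
v_example = {
    frozenset({1}): 1.0, frozenset({2}): 2.0, frozenset({3}): 3.0,
    frozenset({1, 2, 3}): 10.0,
}
good_allocation = {1: 2.0, 2: 3.0, 3: 5.0}   # sums to 10, each above stand-alone value
bad_allocation = {1: 0.5, 2: 4.5, 3: 5.0}    # sums to 10, but player 1 gets less than 1
```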

Preliminary Knowledge
To construct the corresponding differential game with continuous updating, we apply the classic differential game with the specified duration $T$ to the continuously updated setting. Consider the family of games $\Gamma(x, t, t + T)$ starting from the state $x$ at an arbitrary time $t > t_0$. Furthermore, assume that the evolution of the state of the game $\Gamma(x, t, t + T)$ can be described by the ordinary differential equation

$$\dot{x}^t(s) = f(s, x^t, u(t, s)), \quad x^t(t) = x, \tag{4}$$

where $\dot{x}^t(s)$ is the derivative with respect to $s$, $x^t \in \mathbb{R}^l$ are the state variables of the game that starts at time $t$, and $u(t, s) = (u_1(t, s), \dots, u_n(t, s))$, $u_i(t, s) \in U_i \subset \mathrm{comp}\,\mathbb{R}^k$, $s \in [t, t + T]$, is the control profile, at the time instant $s$, of the game that starts at time $t$. For the game $\Gamma(x, t, t + T)$, the payoff function of player $i$ has the form

$$K_i^t(x, t, t + T; u) = \int_{t}^{t + T} g^i[s, x^t(s), u(t, s)]\,ds, \tag{5}$$

where $x^t(s)$ and $u(t, s)$ are the trajectory and the strategy profile in the game $\Gamma(x, t, t + T)$. The continuously updated differential game is constructed according to the following rules.
The time instant $t \in [t_0, +\infty)$ evolves continuously and, accordingly, the players continuously receive new information about the equations of motion and the payoff functions in the game $\Gamma(x, t, t + T)$.
The strategy vector $u(t)$ in the continuously updated differential game is defined as

$$u(t) = u(t, s)|_{s = t}, \quad t \in [t_0, +\infty), \tag{6}$$

where $u(t, s)$, $s \in [t, t + T]$, are strategies in the game $\Gamma(x, t, t + T)$.
The trajectory $x(t)$ in the continuously updated differential game is determined by

$$\dot{x}(t) = f(t, x, u), \quad x(t_0) = x_0, \tag{7}$$

where $u = u(t)$ are the strategies in the continuously updated differential game (6) and $\dot{x}(t)$ is the derivative with respect to $t$. We assume that the strategy in the continuously updated differential game obtained using (6) is admissible, or that the uniqueness and continuability of the solution of problem (7) can be guaranteed. The existence, uniqueness, and continuity conditions of the open-loop Nash equilibrium for the continuously updated differential game have been mentioned previously.
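The construction in rules (6) and (7) can be sketched numerically: the realized control at time t is the value, at s = t, of the strategy planned for the game that starts at t, and the realized trajectory is obtained by integrating the dynamics with that control. The code below is a minimal sketch with an explicit Euler scheme; the function names, the scalar dynamics, the planned strategy `u_window`, and the constant `HORIZON` are all hypothetical choices made for illustration.

```python
def updated_strategy(u_window):
    """Return u(t) = u(t, s) evaluated at s = t, as in rule (6)."""
    return lambda t: u_window(t, t)


def integrate_trajectory(f, u, x0, t0, t_end, steps=1000):
    """Explicit Euler scheme for x'(t) = f(t, x, u(t)), x(t0) = x0, as in rule (7)."""
    dt = (t_end - t0) / steps
    x, t = x0, t0
    for _ in range(steps):
        x += dt * f(t, x, u(t))
        t += dt
    return x


# Hypothetical scalar example: the strategy planned at time t for the instant s
# depends only on the time remaining in the information window [t, t + HORIZON].
HORIZON = 2.0


def u_window(t, s):
    return 1.0 - 0.1 * (t + HORIZON - s)


u = updated_strategy(u_window)  # constant control 1 - 0.1 * HORIZON
x_final = integrate_trajectory(lambda t, x, ut: ut, u, x0=0.0, t0=0.0, t_end=5.0)
```

In this toy case the planned control depends on s only through the remaining window length, so the realized control is constant in t even though each planned control varies over its own window.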
There is an essential difference between a continuously updated differential game and a classic differential game with the specified duration, $\Gamma(x_0, t_0, T)$. In the classic game, the players are guided by the payoffs that they will eventually gain on the time interval $[t_0, T]$; in the game with continuous updating, they orient themselves toward the expected payoffs (5) at each time instant $t \in [t_0, T]$, which are computed from the information defined on the interval $[t, t + T]$, that is, the information that they possess at the time instant $t$. A subgame of the initial game has the form $\Gamma(x, t, T)$. In the same way, we can define a family of subgames of the differential game with continuous updating at each $t$ as $\Gamma(x_{t,s}, s, t + T)$, where $x_{t,s}$ is the state at the time instant $s \in [t, t + T]$. We define this next.
First, we introduce the dynamics of the state:

$$\dot{x}^t(\tau) = f(\tau, x^t, u(t, \tau)), \quad x^t(s) = x_{t,s}, \quad \tau \in [s, t + T]. \tag{8}$$

The payoff function of player $i$ in a subgame with continuous updating $\Gamma(x_{t,s}, s, t + T)$ then has the form

$$K_i^t(x_{t,s}, s, t + T; u) = \int_{s}^{t + T} g^i[\tau, x^t(\tau), u(t, \tau)]\,d\tau,$$

where $x^t(\tau)$ satisfies (8) and $u(t, \tau)$, $\tau \in [s, t + T]$, are strategies in the subgame $\Gamma(x_{t,s}, s, t + T)$.

Cooperative Differential Game with Continuous Updating
In a cooperative setting, before starting the game, all players agree to behave jointly in an optimal way (cooperate).

The Approach to Define the Characteristic Function on the Interval [s, t + T]
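We introduce the notion of a characteristic function $V^t(S; x, s, t + T)$, $\forall S \subseteq N$, defined for each subgame $\Gamma(x_{t,s}, s, t + T)$, where $s \in [t, t + T]$ and $t \in [t_0, +\infty)$. Before introducing the characteristic function for the subgame $\Gamma(x_{t,s}, s, t + T)$, it should be mentioned that, from Equation (4), $x_{t,s}$ depends on the initial point $x$. Therefore, we can replace $x_{t,s}$ by $x$ in the previous statement and use $\Gamma(x, s, t + T)$ and $K_i^t(x, s, t + T; u)$ to represent the subgame and the payoff function of each player $i$ in the subgame, respectively. Thus, the characteristic function is given by

$$V^t(S; x, s, t + T) = \begin{cases} \max\limits_{u} \sum\limits_{i=1}^{n} K_i^t(x, s, t + T; u), & S = N, \\ \max\limits_{u_S} \sum\limits_{i \in S} K_i^t(x, s, t + T; u_S, u^{NE}_{N \setminus S}), & S \subset N, \\ 0, & S = \emptyset, \end{cases}$$

where the state follows the dynamics already described in (4), $u^{NE}_{N \setminus S}$ are the Nash equilibrium strategies of the players outside the coalition, and $u_S = \{u_i\}_{i \in S}$ is the strategy profile of the players in the coalition $S$. We assume that the superadditivity conditions for the characteristic function $V^t(S; x, s, t + T)$ are satisfied:

$$V^t(S_1 \cup S_2; x, s, t + T) \ge V^t(S_1; x, s, t + T) + V^t(S_2; x, s, t + T), \quad \forall S_1, S_2 \subseteq N, \; S_1 \cap S_2 = \emptyset.$$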

An Algorithm to Calculate Characteristic Function with Continuous Updating and the Shapley Value
The first three steps are to compute the necessary elements in order to define the characteristic function. In the next step, the Shapley value is computed.
Step 1: Optimizing the total payoff of the grand coalition with continuous updating. We denote the cooperative differential game described above by Γ c (x, t, t + T); the duration of the game is T. We assume that there are no inherent obstacles to cooperation between the players and that their benefits can be transferred. More specifically, we assume that before the game actually starts, the players agree to cooperate in the game.
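Definition 1. The strategy profile $u^*(t, s) = (u_1^*(t, s), \dots, u_n^*(t, s))$ is a profile of generalized open-loop cooperative strategies in a game with continuous updating if, for any fixed $t \in [t_0, +\infty)$, the strategy profile $u^*(t, s)$ is an open-loop cooperative strategy profile in the game $\Gamma^c(x^*(t), t, t + T)$. Using the generalized open-loop cooperative strategies, it is possible to define the solution concept for a game model with continuous updating.
Definition 2. The strategies $u^*(t) = (u_1^*(t), \dots, u_n^*(t))$ are called open-loop cooperative strategies in a game with continuous updating when defined in the following way:

$$u^*(t) = u^*(t, s)|_{s = t}, \quad t \in [t_0, +\infty), \tag{10}$$

where $u^*(t, s)$ is defined above.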
We interpret the "intrinsic time-inconsistency" of the players as follows:
• $u^*(t)$ at the moment $t$ coincides with the cooperative strategies in the game defined on the interval $[t, t + T]$;
• $u^*(t + \varepsilon)$ at the instant $t + \varepsilon$ has to coincide with the cooperative strategies in the game defined on the interval $[t + \varepsilon, t + \varepsilon + T]$.
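The trajectory $x^*(t)$ that corresponds to the open-loop cooperative strategies with continuous updating $u^*(t)$ can be obtained from the system

$$\dot{x}(t) = f(t, x, u^*(t)), \quad x(t_0) = x_0.$$

Here, $x^*(t)$ denotes the cooperative trajectory with continuous updating. Let there exist a set of controls $u^*(t, s) = (u_1^*(t, s), \dots, u_n^*(t, s))$ that maximizes the joint payoff of the players in the game $\Gamma^c(x, t, t + T)$:

$$\max_{u} \sum_{i=1}^{n} K_i^t(x, t, t + T; u) \quad \text{s.t.} \quad \dot{x}^t(s) = f(s, x^t, u(t, s)), \quad x^t(t) = x. \tag{11}$$

The solution $x_t^*(s)$ of the system (11) corresponding to $u^*(t, s)$ is called the corresponding generalized cooperative trajectory.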
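Theorem 1. A set of strategies $u^*(t, s) = (u_1^*(t, s), \dots, u_n^*(t, s))$ provides generalized open-loop cooperative strategies in a differential game with continuous updating for the problem in (11) if, for any fixed $t \in [t_0, T]$, there exists a costate variable $\psi^t(s)$, $s \in [t, t + T]$, such that the following relations are satisfied: (1) the maximum condition, $u^*(t, s) = \arg\max_{u} H^t(s, x_t^*(s), u, \psi^t(s))$; (2) the costate equation, $\dot{\psi}^t(s) = -\partial H^t(s, x_t^*(s), u^*(t, s), \psi^t(s)) / \partial x^t$; (3) the terminal condition, $\psi^t(t + T) = 0$.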

Remark 1.
Let us fix $t \ge t_0$ and consider the game $\Gamma^c(x, t, t + T)$.
The motion equation has the form

$$\dot{x}^t(s) = f(s, x^t, u(t, s)), \quad x^t(t) = x. \tag{12}$$

The payoff function of the grand coalition has the form

$$\sum_{i=1}^{n} K_i^t(x, t, t + T; u) = \sum_{i=1}^{n} \int_{t}^{t + T} g^i[s, x^t(s), u(t, s)]\,ds. \tag{13}$$

For the optimization problem (12) and (13), the Hamiltonian has the form

$$H^t(s, x^t, u, \psi^t) = \sum_{i=1}^{n} g^i[s, x^t, u(t, s)] + \psi^t(s) f(s, x^t, u(t, s)).$$

If $u_i^*(t, s)$, $i \in N$, are the generalized open-loop cooperative strategies in the differential game with continuous updating, then, as stated in Definition 1, for every fixed $t \ge t_0$, $u_i^*(t, s)$, $i \in N$, is an open-loop cooperative strategy in the game $\Gamma^c(x^*(t), t, t + T)$. Therefore, for any fixed $t \ge t_0$, the conditions (1)-(3) of the theorem are satisfied as necessary conditions for cooperative strategies in open-loop strategies (see [39]).
On the other hand, if for every $t \ge t_0$ the Hamiltonian $H^t$ is concave in $(x^t, u(t, s))$, then the conditions of the theorem are sufficient for a cooperative open-loop solution [40].
To obtain the characteristic function of the grand coalition in the subgame $\Gamma(x^*(t), s, t + T)$, we substitute $u^*(t, \tau)$ and $x_t^*(\tau)$ into the corresponding payoff function and denote by $V^t(N; x^*(t), s, t + T)$ the characteristic function of the coalition $N$. The current-value maximized cooperative payoff $V^t(N; x^*(t), s, t + T)$ can be expressed as

$$V^t(N; x^*(t), s, t + T) = \sum_{i=1}^{n} \int_{s}^{t + T} g^i[\tau, x_t^*(\tau), u^*(t, \tau)]\,d\tau.$$

Step 2: Computation of the generalized open-loop Nash equilibrium with continuous updating.
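The problem of a non-cooperative subgame along the cooperative trajectory with continuous updating, $\Gamma(x^*(t), s, t + T)$, can be stated as follows: each player $i \in N$ maximizes

$$K_i^t(x^*(t), s, t + T; u_i, u_{-i}^{NE}) = \int_{s}^{t + T} g^i[\tau, x^t(\tau), u_i(t, \tau), u_{-i}^{NE}(t, \tau)]\,d\tau \quad \text{s.t.} \quad \dot{x}^t(\tau) = f(\tau, x^t, u_i(t, \tau), u_{-i}^{NE}(t, \tau)),$$

where $u_{-i}^{NE}$ denotes the components of $u^{NE}(t, \tau) = (u_1^{NE}(t, \tau), \dots, u_n^{NE}(t, \tau))$ belonging to all players except $i$. In this setting, the current-value Hamiltonian function can be written as

$$H_i^t(\tau, x^t, u, \psi_i^t) = g^i[\tau, x^t, u(t, \tau)] + \psi_i^t(\tau) f(\tau, x^t, u(t, \tau)),$$

where $\tau \in [s, t + T]$, $t \in [t_0, +\infty)$. By using the Pontryagin maximum principle with continuous updating [12], we obtain the open-loop Nash equilibrium $\{u_i^{NE}(t, \tau)\}_{i \in N}$ and the corresponding trajectory $x_t^{NE}(\tau)$, $\forall \tau \in [s, t + T]$, $t \in [t_0, +\infty)$. The characteristic function of a single-player coalition then follows for each $i = 1, 2, \dots, n$:

$$V^t(\{i\}; x^*(t), s, t + T) = \int_{s}^{t + T} g^i[\tau, x_t^{NE}(\tau), u^{NE}(t, \tau)]\,d\tau.$$

Step 3: Compute the characteristic function for all remaining possible coalitions with continuous updating.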
Here, we only need to consider the coalitions that contain more than one player, excluding the grand coalition. There are $2^n - n - 2$ such subsets. We apply the δ-characteristic function, so that the players of $S$ maximize their total payoff $\sum_{i \in S} K_i^t(x^*(t), s, t + T; u_S, u_{N \setminus S}^{NE})$ along the cooperative trajectory with continuous updating $x^*(t)$, while the other players, those from $N \setminus S$, use the generalized open-loop Nash equilibrium strategies $u_{N \setminus S}^{NE} = \{u_j^{NE}\}_{j \in N \setminus S}$. Thus, we have a two-stage construction procedure for the characteristic function: (1) find the generalized open-loop Nash equilibrium strategies $u_i^{NE}(t, \tau)$ for all players $i \in N$, which was done in Step 2; (2) "freeze" the Nash equilibrium strategies $u_j^{NE}(t, \tau)$ of the players from $N \setminus S$ and, for the players from the coalition $S$, maximize their total payoff over $u_S = \{u_i\}_{i \in S}$. To compute the value function of the subgame $\Gamma(x^*(t), s, t + T)$, $\forall t \in [t_0, +\infty)$, $s \in [t, t + T]$, we present the following concept.
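The two-stage "freeze and maximize" procedure can be illustrated on a linear-state pollution model of the kind used in the example section. The sketch below makes several assumptions that are not taken from the paper: the stock follows x' = sum of u_i, player i receives b_i u_i - u_i^2/2 - d_i x, the horizon is a single fixed interval [t0, T_end], controls stay interior, and all parameter values are hypothetical. Under these assumptions the maximum principle gives the Nash control b_j - d_j (T_end - tau) for an outsider j and b_i - d_S (T_end - tau) for a member i of S, where d_S is the sum of d_i over S; the payoff is then integrated numerically.

```python
def coalition_payoff(S, controls, b, d, x0, t0, T_end, steps=2000):
    """Midpoint-rule integral of sum_{i in S} (b_i u_i - u_i^2 / 2 - d_i x), x' = sum_i u_i."""
    dt = (T_end - t0) / steps
    x, total = x0, 0.0
    for k in range(steps):
        tau = t0 + (k + 0.5) * dt
        u = [c(tau) for c in controls]
        x_mid = x + 0.5 * dt * sum(u)  # stock at the midpoint of the step
        total += dt * sum(b[i] * u[i] - 0.5 * u[i] ** 2 - d[i] * x_mid for i in S)
        x += dt * sum(u)
    return total


def delta_cf(S, b, d, x0, t0, T_end):
    """Delta-characteristic function: outsiders keep Nash controls, S optimizes jointly."""
    d_S = sum(d[i] for i in S)
    controls = []
    for i in range(len(b)):
        rate = d_S if i in S else d[i]  # members internalize the coalition's damage
        controls.append(lambda tau, bi=b[i], r=rate: bi - r * (T_end - tau))
    return coalition_payoff(S, controls, b, d, x0, t0, T_end)


# Hypothetical parameters for three players indexed 0, 1, 2.
b, d = [5.0, 6.0, 7.0], [0.1, 0.2, 0.3]
V0 = delta_cf({0}, b, d, x0=0.0, t0=0.0, T_end=5.0)
V1 = delta_cf({1}, b, d, x0=0.0, t0=0.0, T_end=5.0)
V01 = delta_cf({0, 1}, b, d, x0=0.0, t0=0.0, T_end=5.0)
```

Because playing the Nash controls remains feasible for the coalition when the outsiders are frozen, `V01` cannot fall below `V0 + V1` in this sketch, up to discretization error.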

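Definition 3. A set of strategies $u_S^*(t, \tau) = \{u_i^*(t, \tau)\}_{i \in S}$ provides a generalized open-loop optimal strategy for the coalition $S \subset N$ in a subgame with continuous updating $\Gamma(x^*(t), s, t + T)$ when it is the solution, obtained by using the Pontryagin maximum principle, of the following problem:

$$\max_{u_S} \sum_{i \in S} K_i^t(x^*(t), s, t + T; u_S, u_{N \setminus S}^{NE}) \quad \text{s.t.} \quad \dot{x}_S^t(\tau) = f(\tau, x_S^t, u_S(t, \tau), u_{N \setminus S}^{NE}(t, \tau)), \quad \tau \in [s, t + T]. \tag{14}$$

The Hamiltonian function of the problem (14) has the form, $\forall S \subset N$ (the uppercase letter "S" always denotes the coalition $S$, e.g., in $x_S^t$, $\psi_S^t$, and $u_S$):

$$H_S^t(\tau, x_S^t, u_S, \psi_S^t) = \sum_{i \in S} g^i[\tau, x_S^t, u_S, u_{N \setminus S}^{NE}] + \psi_S^t(\tau) f(\tau, x_S^t, u_S, u_{N \setminus S}^{NE}).$$

Theorem 2. A set of strategies $u_S^*(t, \tau)$ provides generalized open-loop optimal strategies of the coalition $S$ in a subgame with continuous updating $\Gamma(x^*(t), s, t + T)$ for the problem (14) if there exist $2^n - n - 2$ costate functions $\psi_S^t(\tau)$, where $\tau \in [s, t + T]$, $S \subset N$, such that, $\forall s \in [t, t + T]$, $t \in [t_0, +\infty)$, the following relations are satisfied: (1) the maximum condition, $u_S^*(t, \tau) = \arg\max_{u_S} H_S^t(\tau, x_S^t, u_S, \psi_S^t)$; (2) the costate equation, $\dot{\psi}_S^t(\tau) = -\partial H_S^t / \partial x_S^t$; (3) the terminal condition, $\psi_S^t(t + T) = 0$.
Proof. Follow the proof of Theorem 1.
Therefore, the agents in the coalition $S$ adopt the generalized open-loop optimal control $u_S^*(t, \tau)$ characterized in Theorem 2. Note that these controls are functions of the fixed time $t \in [t_0, +\infty)$ and the time instant $\tau \in [s, t + T]$.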
The characteristic function for a coalition $S \subseteq N$ is given by

$$V^t(S; x^*(t), s, t + T) = \sum_{i \in S} \int_{s}^{t + T} g^i[\tau, x_S^t(\tau), u_S^*(t, \tau), u_{N \setminus S}^{NE}(t, \tau)]\,d\tau,$$

where $x_S^t(\tau)$ is the trajectory at the time instant $\tau \in [s, t + T]$ when the players in the coalition $S$ use the generalized open-loop optimal strategies $u_S^*(t, \tau)$, while the players in $N \setminus S$ use the generalized open-loop Nash equilibrium strategies $u_{N \setminus S}^{NE}(t, \tau)$ derived in Step 2. For the characteristic function in the game model with continuous updating, suppose first that the function $V^t(S; x^*(t), s, t + T)$, $\forall S \subseteq N$, is continuously differentiable in $s \in [t, t + T]$ and integrable in $t \in [t_0, +\infty)$. The characteristic function in the game model with continuous updating, $V(S; x^*(t), t, T)$, is then defined as follows.
Definition 4. The function $V(S; x^*(t), t, T)$, $t \in [t_0, T]$, $S \subseteq N$, is a characteristic function of the differential game with continuous updating $\Gamma(x^*(t), t, T)$ if it is defined as the following integral:

$$V(S; x^*(t), t, T) = \int_{t}^{T} \left( -\frac{d}{ds} V^{\tau}(S; x^*(\tau), s, \tau + T) \Big|_{s = \tau} \right) d\tau, \tag{15}$$

where $V^{\tau}(S; x^*(\tau), s, \tau + T)$, $s \in [\tau, \tau + T]$, $\tau \in [t, T]$, $S \subseteq N$, defined on the interval $[s, \tau + T]$, is a characteristic function in the game $\Gamma(x^*(\tau), s, \tau + T)$.
In (15), we assume that the integral is taken over a finite time interval, because only in this case can we claim that the values of the characteristic function with continuous updating are finite. Later, in the example model, we calculate the characteristic function and the Shapley value on a finite interval. We assume that the superadditivity conditions are satisfied:

$$V(S_1 \cup S_2; x^*(t), t, T) \ge V(S_1; x^*(t), t, T) + V(S_2; x^*(t), t, T), \quad \forall S_1, S_2 \subseteq N, \; S_1 \cap S_2 = \emptyset.$$

Step 4: Compute the Shapley value based on the characteristic function with continuous updating.
Consider again the cooperative game model $\Gamma^c(x^*(t), t, T)$ with continuous updating. Suppose that the players are allowed to form different coalitions $K \subseteq N$, where $K$ contains $k$ players. The imputation set of the cooperative game is

$$L(x^*(t), t, T) = \Big\{ \xi = (\xi_1, \dots, \xi_n) : \sum_{i=1}^{n} \xi_i = V(N; x^*(t), t, T), \; \xi_i \ge V(\{i\}; x^*(t), t, T), \; i \in N \Big\}.$$

A cooperative solution, or optimality principle, is a non-empty subset of the imputation set $L(x^*(t), t, T)$. In particular, the Shapley value $sh(x^*(t), t, T) = (sh_1(x^*(t), t, T), \dots, sh_n(x^*(t), t, T))$ is an imputation whose components are defined as

$$sh_i(x^*(t), t, T) = \sum_{K \subseteq N,\; i \in K} \frac{(k - 1)!\,(n - k)!}{n!} \big[ V(K; x^*(t), t, T) - V(K \setminus i; x^*(t), t, T) \big],$$

where $K \setminus i$ is the relative complement of $i$ in $K$, and $V(K; x^*(t), t, T)$ is defined by Definition 4 and is the profit of the coalition $K$. The difference $[V(K; x^*(t), t, T) - V(K \setminus i; x^*(t), t, T)]$ is the marginal contribution of player $i$ to the coalition $K$. There are many other cooperative optimality principles, for example, the von Neumann-Morgenstern solution, the core, and the nucleolus. In all cases they involve some subsets of the imputation set of the game.
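The Shapley value is a weighted sum of marginal contributions, with weight (k - 1)!(n - k)!/n! for a coalition K of size k that contains player i. A minimal sketch of this computation follows, assuming the characteristic function has already been evaluated and stored in a dictionary keyed by `frozenset`; the function name and the three-player values are hypothetical and serve only to exercise the formula.

```python
from itertools import combinations
from math import factorial


def shapley_value(v, players):
    """Shapley value from a characteristic function `v` (dict keyed by frozenset)."""
    n = len(players)
    sh = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for r in range(n):
            for rest in combinations(others, r):
                without_i = frozenset(rest)        # the coalition K \ i
                with_i = without_i | {i}           # the coalition K, of size k
                k = len(with_i)
                weight = factorial(k - 1) * factorial(n - k) / factorial(n)
                total += weight * (v[with_i] - v[without_i])
        sh[i] = total
    return sh


# Hypothetical three-player characteristic function (illustration only).
v_example = {
    frozenset(): 0.0,
    frozenset({1}): 1.0, frozenset({2}): 2.0, frozenset({3}): 3.0,
    frozenset({1, 2}): 4.0, frozenset({1, 3}): 5.0, frozenset({2, 3}): 6.0,
    frozenset({1, 2, 3}): 10.0,
}
sh = shapley_value(v_example, [1, 2, 3])  # components 7/3, 10/3, 13/3
```

The components add up to the value of the grand coalition, which is the efficiency axiom.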

A Cooperative Differential Game for Pollution Control
Let us consider the following game proposed by Long [41]. Countries are indexed by i ∈ N, and n = |N|. It is assumed that each player has an industrial production site and that production is proportional to the pollutant emissions u i . Therefore, the player's strategy is to decide the amount of pollutants emitted into the atmosphere.

Initial Game Model
Pollution accumulates over time. We denote by $x(t)$ the stock of pollution at time $t$ and assume that the countries "contribute" to the same stock of pollution. For simplicity, the evolution of the stock $x(t)$ is represented by the following linear equation:

$$\dot{x}(t) = \sum_{i=1}^{n} u_i(t) - \delta x(t), \quad x(t_0) = x_0,$$

where $\delta$ is a constant rate of decay, in other words, the absorption rate of pollution by nature.
In the following, we assume that the absorption coefficient $\delta$ is equal to zero:

$$\dot{x}(t) = \sum_{i=1}^{n} u_i(t), \quad x(t_0) = x_0. \tag{18}$$

Pollution is a "public bad" because it exerts adverse effects on health, quality of life, and productivity. We assume that these adverse effects can be represented by having $x$ as an argument of the instantaneous social welfare function $F_i$, with negative derivative: $F_i = F_i(x, u_i)$, $\partial F_i / \partial x < 0$. In each country, aggregate social welfare is taken to be the integral of the instantaneous social welfare. Thus, the payoff of player $i$ can be formulated as

$$K_i(x_0, t_0, T; u) = \int_{t_0}^{T} F_i(x(t), u_i(t))\,dt.$$

For tractability, the function $F_i$ is often assumed to take the separable form $F_i(x, u_i) = R_i(u_i) - D_i(x)$, where $R_i(u_i)$ may be thought of as the utility of the benefit and $D_i(x)$ as the "disutility" caused by pollution. Following standard practice, we take it that $R_i(u_i)$ is strictly concave and increasing in $u_i$, and that $D_i(x)$ is convex and increasing in $x$. The possibility that $D_i$ is linear is not ruled out.
We assume that the environmental damage cost of player $i$ caused by the pollution stock is linear, $D_i(x) = d_i x$, which is increasing and (weakly) convex in $x$. In the environmental economics literature, the typical assumption is that the production income of player $i$ can be expressed as a function of emissions, namely, $R_i(u_i(t)) = b_i u_i - \frac{1}{2} u_i^2$, satisfying $R_i(0) = 0$, where $b_i$ and $d_i$ are positive parameters. For this benefit function to be concave and increasing in emissions, we impose the restriction $u_i(t) \in (0, b_i)$.
Suppose that the game is played in a cooperative scenario in which the players cooperate to achieve the maximum total payoff:

$$\max_{u_1, \dots, u_n} \sum_{i=1}^{n} K_i(x_0, t_0, T; u) = \max_{u_1, \dots, u_n} \sum_{i=1}^{n} \int_{t_0}^{T} \Big( b_i u_i - \frac{1}{2} u_i^2 - d_i x \Big)\,dt. \tag{19}$$

To solve the optimization problem (19) subject to (18), we invoke the Pontryagin maximum principle. This is a linear state game, that is, a game in which the system dynamics and the utility functions are polynomials of degree 1 with respect to the state variables and which satisfies a certain property concerning the interaction between control variables and state variables. The class of linear state games has a very useful property: the linearity in the state variables, together with the decoupled structure between the state variables and the control variables, implies that the open-loop equilibrium is Markov perfect and that the value functions are linear in the state variables.
It is straightforward to show that the optimal emission control of player $i$ in the initial differential game model is given by

$$u_i^*(t) = b_i - (T - t) \sum_{j=1}^{n} d_j, \quad i \in N. \tag{20}$$

Indeed, the Hamiltonian $\sum_{i=1}^{n} (b_i u_i - \frac{1}{2} u_i^2 - d_i x) + \psi \sum_{i=1}^{n} u_i$ is maximized at $u_i = b_i + \psi$, and the costate satisfies $\dot{\psi} = \sum_{j=1}^{n} d_j$, $\psi(T) = 0$. To obtain the cooperative state trajectory for the initial differential game, it suffices to insert $u_i^*(t)$ from (20) into the dynamics (18) and to solve the differential equation, which gives

$$x^*(t) = x_0 + (t - t_0) \Big[ \sum_{i=1}^{n} b_i - n \Big( T - \frac{t + t_0}{2} \Big) \sum_{j=1}^{n} d_j \Big].$$
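The cooperative control and stock of the initial game can be cross-checked numerically. The sketch below assumes the model stated above (stock dynamics with zero absorption, benefit b_i u_i - u_i^2/2, linear damage d_i x) and an interior solution, for which the maximum principle gives u_i(t) = b_i - (T_end - t) D with D the sum of all d_j; the closed-form stock is then compared with a direct numerical integration of the dynamics. Function names and parameter values are hypothetical.

```python
def cooperative_control(t, b_i, d_total, T_end):
    """Interior cooperative emission rate u_i(t) = b_i - d_total * (T_end - t)."""
    return b_i - d_total * (T_end - t)


def cooperative_stock(t, x0, b, d_total, T_end, t0=0.0):
    """Closed-form stock obtained by integrating the sum of the cooperative controls."""
    n = len(b)
    return x0 + (t - t0) * (sum(b) - n * d_total * (T_end - 0.5 * (t + t0)))


def simulate_stock(t, x0, b, d_total, T_end, t0=0.0, steps=1000):
    """Midpoint-rule integration of x' = sum_i u_i(tau) from t0 to t."""
    dt = (t - t0) / steps
    x = x0
    for k in range(steps):
        tau = t0 + (k + 0.5) * dt
        x += dt * sum(cooperative_control(tau, b_i, d_total, T_end) for b_i in b)
    return x


# Hypothetical parameters: three players, game played on [0, 10].
b = [5.0, 6.0, 7.0]
d_total = 0.3
closed_form = cooperative_stock(10.0, 0.0, b, d_total, T_end=10.0)
numerical = simulate_stock(10.0, 0.0, b, d_total, T_end=10.0)
```

With these values every control stays inside (0, b_i), so the interior formula applies on the whole interval; for other parameters that restriction would have to be checked separately.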

A Pollution Control Game Model with Continuous Updating
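In the game $\Gamma(x, t, t + T)$, the dynamics of the total amount of pollution $x^t(s)$ is described by

$$\dot{x}^t(s) = \sum_{i=1}^{n} u_i(t, s), \quad x^t(t) = x, \tag{22}$$

in which we assume that the absorption coefficient corresponding to the natural purification of the atmosphere is equal to zero. The instantaneous benefit of the $i$-th player is defined as $R_i(u_i) = b_i u_i - \frac{1}{2} u_i^2$. Each player is also compelled to bear the cost $d_i x^t$ of decontamination. Therefore, the instantaneous utility of the $i$-th player is equal to $R_i(u_i) - d_i x^t$, and the payoff of player $i$ is defined as

$$K_i^t(x, t, t + T; u) = \int_{t}^{t + T} \Big( b_i u_i - \frac{1}{2} u_i^2 - d_i x^t \Big)\,ds,$$

where $u_i = u_i(t, s)$ is the control of player $i$ at the time instant $s \in [t, t + T]$ and $x^t = x^t(s)$ is the pollution accumulated at the same time $s$. The payoff function of player $i$ in the subgame with continuous updating $\Gamma(x, s, t + T)$ is then given by

$$K_i^t(x, s, t + T; u) = \int_{s}^{t + T} \Big( b_i u_i(t, \tau) - \frac{1}{2} u_i^2(t, \tau) - d_i x^t(\tau) \Big)\,d\tau,$$

where $x^t(\tau)$ and $u(t, \tau)$, $\tau \in [s, t + T]$, are the trajectory and the strategies in the game $\Gamma(x, s, t + T)$. The dynamics of the state is given by

$$\dot{x}^t(\tau) = \sum_{i=1}^{n} u_i(t, \tau), \quad x^t(s) = x.$$

Step 1: Optimizing the total payoff of the grand coalition with continuous updating. Consider the game in cooperative form. This means that all players work together to maximize their total payoff. We seek the optimal strategy profile $u^*(t, s) = (u_1^*(t, s), \dots, u_n^*(t, s))$ such that $\sum_{i=1}^{n} K_i^t \to \max_{u_1, u_2, \dots, u_n}$. The optimization problem is as follows:

$$\max_{u_1, u_2, \dots, u_n} \sum_{i=1}^{n} K_i^t(x, t, t + T; u) \quad \text{s.t. } x^t(s) \text{ satisfies (22)}. \tag{24}$$

To deal with the problem (24), we use the classical Pontryagin maximum principle. The corresponding Hamiltonian is

$$H^t(s, x^t, u, \psi^t) = \sum_{i=1}^{n} \Big( b_i u_i - \frac{1}{2} u_i^2 - d_i x^t \Big) + \psi^t \sum_{i=1}^{n} u_i.$$

The first-order partial derivatives with respect to the $u_i$ are

$$\frac{\partial H^t}{\partial u_i}(s, x^t, u, \psi^t) = b_i - u_i + \psi^t, \quad i \in N,$$

and the Hessian matrix $\frac{\partial^2 H^t}{\partial u^2}(s, x^t, u, \psi^t)$ is negative definite, so we can conclude that the Hamiltonian $H^t$ is concave with respect to $u_i$.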
From the first-order conditions we obtain the cooperative strategies in terms of the costate variable $\psi_t$. The costate equation of the Pontryagin maximum principle then determines $\psi_t$ and hence the final form of the cooperative strategies $u_i^*(t, s)$, and from (22) we get the optimal (cooperative) trajectory $x_t^*(s)$ in (26). According to procedure (10), we construct the open-loop optimal cooperative strategies with continuous updating, $u_i^*(t)$. Substituting $u_i^*(t)$ into the differential equation (18), we arrive at the optimal cooperative trajectory $x^*(t)$ with continuous updating.
The cooperative strategies and the corresponding trajectories of the initial differential game model and of the differential game with continuous updating are compared graphically in Figures 1 and 2. Figure 1 shows that the optimal control with continuous updating is more stable than the optimal control in the initial game model. It also shows that, from time $t = 4$ onward, the optimal control in the initial game is greater than that with continuous updating, which means that players increase their pollution emissions into the atmosphere in the initial differential game model, a harmful result. This occurs because in the initial game model players have full information about the game on the interval $[t_0, T]$; they are more cautious and do not dare to emit too much pollution at first. In real life, however, it is impossible to have information for the whole time interval. We therefore consider the game with continuous updating, in which at each time instant $t$ players have information only on $[t, t+T]$. In this case, players are bold enough to emit more pollution early on because they lack information about the whole game. Figure 2 shows that, from $t = 0$ to $t = 8$, the pollution accumulation in the initial game model is lower than in the model with continuous updating.
This is because players in the initial game model are better informed: they have information on the whole time interval, and their caution leads to lower pollution accumulation. From time $t = 8$ onward, pollution with continuous updating is lower than pollution in the initial game, because the information available to players in the model with continuous updating approaches that of the initial game model as time goes on. The continuous updating method thus helps make the model more consistent with real situations.
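The qualitative comparison above can be reproduced with a small numerical sketch. It is only an illustration under assumptions that are not stated in this section: a linear-quadratic utility $b_i u_i - u_i^2/2 - d_i x$, dynamics $\dot{x} = \sum_i u_i$, and hypothetical values for $b_i$, $d_i$, the game length, and the information horizon. Under these assumptions the cooperative control of the initial game is $b_i - D(T_{\text{end}} - t)$ with $D = \sum_i d_i$, while the control with continuous updating is the constant $b_i - D\,T_{\text{info}}$.

```python
import numpy as np

# Hypothetical parameters for illustration only (not taken from the paper).
b = np.array([10.0, 12.0, 15.0])  # assumed revenue coefficients b_i
d = np.array([0.2, 0.3, 0.5])     # assumed damage coefficients d_i
n, D, B = len(b), d.sum(), b.sum()
T_END = 10.0   # assumed end of the initial game on [0, T_END]
T_INFO = 6.0   # assumed length of the information horizon
X0 = 0.0       # assumed initial pollution stock


def u_initial(t):
    """Cooperative controls of the initial game (costate equals -D * (T_END - t))."""
    return b - D * (T_END - t)


def u_updating(t):
    """Cooperative controls with continuous updating: u_i*(t, s) evaluated at s = t."""
    return b - D * T_INFO  # constant in t under the assumed model


def x_initial(t):
    """Closed-form integral of sum_i u_initial over [0, t]."""
    return X0 + B * t - n * D * (T_END * t - 0.5 * t**2)


def x_updating(t):
    """Closed-form integral of sum_i u_updating over [0, t]."""
    return X0 + (B - n * D * T_INFO) * t


if __name__ == "__main__":
    for t in (0.0, 2.0, 4.0, 6.0, 8.0, 10.0):
        print(t, u_initial(t)[0], u_updating(t)[0], x_initial(t), x_updating(t))
```

With these hypothetical values the two controls coincide at $t = T_{\text{end}} - T_{\text{info}} = 4$ and the two trajectories cross at $t = 2(T_{\text{end}} - T_{\text{info}}) = 8$, which mirrors the pattern described for Figures 1 and 2; the actual parameters behind the figures are not given here.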
Next, for a given subgame $\Gamma(x^*(t), s, t+T)$ of a differential game with continuous updating $\Gamma(x^*(t), t, t+T)$, the characteristic function for the grand coalition $N$ is denoted by $V_t(N; x^*(t), s, t+T)$ and can be represented as in (25). From it we obtain the value function of the grand coalition $N$. Note that in (29), $x^*(t)$ represents the cooperative pollution level with continuous updating at time $t$.
For our problem, we can also use the dynamic programming method based on the Hamilton-Jacobi-Bellman equation. It is straightforward to verify that a Bellman function of the form $V_t = A(t, s)\,x_t(s) + B(t, s)$ satisfies this equation, and we obtain the same result as with the Pontryagin maximum principle.
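As a hedged illustration of why a linear Bellman function works, the following worked sketch assumes a linear-quadratic model that is not spelled out in this section: instantaneous utility $b_i u_i - u_i^2/2 - d_i x_t$, dynamics $\dot{x}_t = \sum_i u_i$, zero terminal value, and $D := \sum_i d_i$. The symbols $b_i$ and $D$ are introduced here for illustration only.

```latex
% Assumed model (illustrative): utility b_i u_i - u_i^2/2 - d_i x_t,
% dynamics \dot{x}_t = \sum_i u_i, and D := \sum_i d_i.
\begin{align*}
-\frac{\partial V_t}{\partial s}
  &= \max_{u_1,\dots,u_n} \Big\{ \sum_{i=1}^{n} \big( b_i u_i - \tfrac{1}{2} u_i^2 \big)
     - D\,x_t + \frac{\partial V_t}{\partial x_t} \sum_{i=1}^{n} u_i \Big\},
  \qquad V_t\big|_{s=t+T} = 0, \\
V_t = A(t,s)\,x_t + B(t,s)
  &\;\Longrightarrow\; u_i^*(t,s) = b_i + A(t,s), \\
-\frac{\partial A}{\partial s}\,x_t - \frac{\partial B}{\partial s}
  &= \tfrac{1}{2} \sum_{i=1}^{n} \big( b_i + A(t,s) \big)^2 - D\,x_t, \\
\frac{\partial A}{\partial s} = D,\quad A(t,t+T) = 0
  &\;\Longrightarrow\; A(t,s) = -D\,(t+T-s), \\
\frac{\partial B}{\partial s}
  &= -\tfrac{1}{2} \sum_{i=1}^{n} \big( b_i + A(t,s) \big)^2,
  \qquad B(t,t+T) = 0.
\end{align*}
```

Under the same assumptions the costate of the maximum principle satisfies $\dot{\psi}_t = D$ with $\psi_t(t+T) = 0$, so $A(t, s)$ coincides with $\psi_t(s)$ and both methods give $u_i^*(t, s) = b_i - D\,(t+T-s)$.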
Step 2: The computation of the generalized open-loop Nash equilibrium with continuous updating.
For each player $i = 1, 2, \dots, n$, we form the Hamiltonian $H_t^i$ and compute its first-order partial derivatives with respect to the $u_i$. The Hessian matrix is negative definite, whence we conclude that the Hamiltonian $H_t^i$ is concave with respect to $u_i$, and the first-order conditions yield the optimal controls. For the subgame starting at time instant $s \in [t, t+T]$, we can derive the corresponding trajectory (for the Nash equilibrium case) of the subgame $\Gamma(x^*(t), s, t+T)$ along the cooperative trajectory; in other words, the subgame starts from the state $x_{t,s} = x_t^*(s)$. Substituting the $x_t^*(s)$ already obtained in (26), we get the state variable $x_t^S(\tau)$, which depends on $x = x^*(t)$.
From this trajectory we compute the respective value of the characteristic function $V_t(S; x^*(t), s, t+T)$. According to Definition 4, the characteristic function $V(S; x^*(t), t, T)$ of a differential game with continuous updating is then constructed from it. We now check the superadditivity condition (16) for the constructed characteristic function $V(S; x^*(t), t, T)$. It turns out that the condition holds for any $S, P \subseteq N$ with $S \cap P = \emptyset$, $|S| = k \geq 1$, and $|P| = m \geq 1$. Thus, the δ-characteristic function $V(S; x^*(t), t, T)$ is superadditive without any additional conditions on the parameters of the model.
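Superadditivity can also be checked numerically once a characteristic function is available in closed form or as a table. The sketch below is a generic brute-force check over all pairs of disjoint non-empty coalitions; the two toy characteristic functions are hypothetical and are not the δ-characteristic function of this paper.

```python
from itertools import combinations


def coalitions(players):
    """Yield every non-empty coalition of `players` as a frozenset."""
    for size in range(1, len(players) + 1):
        for members in combinations(players, size):
            yield frozenset(members)


def is_superadditive(v, players, tol=1e-12):
    """Return True if v(S | P) >= v(S) + v(P) for all disjoint non-empty S, P."""
    all_coalitions = list(coalitions(players))
    for S in all_coalitions:
        for P in all_coalitions:
            if S & P:
                continue  # only disjoint pairs are relevant
            if v(S | P) < v(S) + v(P) - tol:
                return False
    return True


# Hypothetical toy characteristic functions, for illustration only.
def v_square(S):
    return len(S) ** 2      # (k + m)^2 >= k^2 + m^2, so superadditive


def v_sqrt(S):
    return len(S) ** 0.5    # sqrt(2) < 1 + 1, so not superadditive
```

For example, `is_superadditive(v_square, [1, 2, 3])` holds, while `is_superadditive(v_sqrt, [1, 2, 3])` does not. The number of pairs grows exponentially with the number of players, so this check is only practical for small games.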
Figure 3 compares the characteristic function for the grand coalition $N$ in the initial game model and in the differential game with continuous updating. The value of the characteristic function in the initial model is greater than that with continuous updating because the complexity of the information in a continuous updating setting can reduce the effectiveness of the coalition. It should be noted that the continuous updating case is more realistic. The payoff of the coalition decreases because, as time goes on, pollution accumulates in the air: each player's payoff depends on the level of pollution and decreases as pollution increases. It should also be noted that the coalition's effectiveness decreases at a faster rate in the initial game model than it does with continuous updating.
Step 4: Compute the Shapley value based on the characteristic function with continuous updating.
Any of the known principles of optimality can be applied to find a cooperative solution. First of all, the notation $\sum d_j d_l$ (note that $d_k d_j = d_j d_k$) represents the interaction of costs among players. Now, consider the cooperative solution of a differential game with continuous updating. According to procedure (17), we construct the Shapley value with continuous updating for any $i \in N$, using the characteristic function with continuous updating of the auxiliary subgame. The Shapley value for subgames with continuous updating and for the initial game model along the optimal cooperative trajectory $x^*(t)$ is shown graphically in Figure 4.
Figure 4 shows that in the more realistic case (continuous updating), a player receives a smaller allocation from the coalition than in the initial game model. This is because, at an early stage, more pollution is emitted into the atmosphere with continuous updating than in the initial game model, whereas at the later stage pollution with continuous updating is lower than in the initial game model. Thus, starting from $t = 0$, players receive more in the initial game model, but in both models the allocation falls to 0 at the end of the period. This shows that, as pollution intensifies, the benefits countries receive from the attendant production gradually decrease.
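The Shapley value itself is a weighted average of marginal contributions, which can be sketched generically as follows. The characteristic function used in the example is a hypothetical toy function (the square of a sum of made-up weights `w`), chosen only because its pairwise cross terms loosely resemble the interaction terms $d_j d_l$ mentioned above; it is not the characteristic function derived in this paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Shapley value of each player for a characteristic function v(frozenset)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition S of this size that does not contain i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for members in combinations(others, size):
                S = frozenset(members)
                total += weight * (v(S | {i}) - v(S))
        phi[i] = total
    return phi


# Hypothetical toy data, for illustration only.
w = {1: 1.0, 2: 2.0, 3: 3.0}


def v_toy(S):
    """Toy characteristic function: squared sum of the members' weights."""
    return sum(w[p] for p in S) ** 2


phi = shapley_values([1, 2, 3], v_toy)
```

In this toy game each cross term $2 w_i w_j$ is split equally between players $i$ and $j$, so the allocations are proportional to the weights and sum to the value of the grand coalition.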

Conclusions
In this paper, we presented a detailed consideration of a cooperative differential game model with continuous updating based on the Pontryagin maximum principle, where the decision-maker updates his/her behavior based on new information that becomes available as the time horizon shifts. The characteristic function with continuous updating is constructed using the Pontryagin maximum principle for the cooperative case. The results show that the δ-characteristic function computed for the game is superadditive without any additional restrictions on the model's parameters. The Shapley value as a cooperative solution with continuous updating is obtained in analytic form for pollution control problems. Finally, for the example of $n$-player pollution control, the optimal strategies, the corresponding trajectory, the characteristic function, and the Shapley value with continuous updating are derived and compared graphically with those of the initial game model. The simulation results demonstrate the applicability of the approach.
The practical significance of the work is determined by the fact that real-life conflict-controlled processes evolve continuously in time, and the players usually do not or cannot use full information about them. Therefore, it is important to introduce differential games with information updating to the field of game theory. Another important practical contribution of the continuous updating approach is the creation of a class of inverse optimal control problems with continuous updating [17], which can be used to analyze the profile of the human in human-machine engineering systems. Those results are illustrated on the model of a driver assistance system and applied to real driving data from the simulator located at the Institute of Control Systems, Karlsruhe Institute of Technology. Our method can provide more in-depth modeling of human engineering systems.