Time-Consistency of an Imputation in a Cooperative Hybrid Differential Game

Abstract: This work studies the problem of maintaining the sustainability of a cooperative solution in an n-person hybrid differential game. Specifically, we consider a differential game whose payoff function is discounted with a discounting function that changes its structure over time. We resolve the time-inconsistency of the cooperative solution using the imputation distribution procedure, adapted to this general class of differential games. The results are illustrated with a specific example of a differential game with random duration and a hybrid cumulative distribution function (CDF). We solve this example completely to demonstrate the application of the developed scheme in detail. All results are obtained in analytical form and illustrated by numerical simulations.


Introduction
This contribution aims at bringing together two different concepts: the notion of sustainable cooperation from game theory and the notion of hybrid optimal control from the theory of hybrid systems. We first give a brief overview of the related results.
The concept of sustainable cooperation is one of the central ingredients of cooperative game theory. Indeed, many works have shown that a cooperative agreement may turn out to be unstable in the sense that the players may decide to break the agreement at some intermediate time instant. To overcome this problem, Petrosyan [1] introduced the imputation distribution procedure (IDP), which has proven to be a very useful tool in the field of cooperative games. The original paper [1] considered a differential game with fixed and finite duration. Later, in [2], the notion of the IDP was extended to the class of differential games with infinite duration and a discounting function of a rather general form. Since then, a number of papers have been devoted to the analysis of optimal control problems with different types of discounting functions and their extension to the class of differential games (see, e.g., [3][4][5], where this problem was considered in both deterministic and stochastic settings, and [6] for very recent results).
Although the class of hybrid control systems was introduced more than 20 years ago, most results on hybrid control were formulated within a control-theoretic framework and did not address game-theoretic problems. We are mostly interested in hybrid optimal control, a direction that was actively developed during the first decade of the 21st century; see [7,8] for an overview of the main results. Further examples of optimal control and, to some extent, differential games with regime switching can be found in [9][10][11][12][13]. Only recently has the theory of hybrid control been formally extended to game-theoretic problems. The first attempt was made in [14]; later, in [15], a special but rather general class of hybrid differential games was considered in detail.
In this paper, we consider the problem of the sustainability of a cooperative solution in an n-person hybrid differential game. Earlier, a differential game with a random time horizon and a discontinuous distribution was studied in [16], where only one jump was considered, and in [17], where the method of parametrization was used to calculate the optimal controls. In this work, we extend and generalize the results of [17] and place them in the context of hybrid optimal control.
To solve the formulated hybrid optimal control problem, we use the method of parametrization, a well-known approach to the numerical solution of optimal control problems (see, e.g., [18]) that is relatively rarely used in the context of differential games. See [15] for some suggestions on how to apply this method to differential games with a switching structure. In short, the whole optimal control problem is decomposed into a number of subproblems whose initial or final states are parametrized by some variables. These subproblems are solved backwards by applying the Pontryagin maximum principle [19] on each interval. If the respective optimal control problems admit analytical solutions, these solutions can be further used to determine the optimal values of the switching states.
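The decomposition idea can be illustrated on a toy problem that is not taken from this paper: a scalar minimization problem on $[0, 2]$ whose cost weight switches at $t = 1$. Each subproblem is solved analytically in terms of the parametrized switching state $x_1 = x(1)$, and the remaining one-dimensional optimization over $x_1$ is done here by golden-section search. All names and numbers below are hypothetical; this is a sketch of the parametrization principle under these assumptions, not the solver used in [17].

```python
# Toy illustration of the parametrization method (hypothetical problem, not from the paper):
#   minimize  c1 * int_0^1 u(t)^2 dt + c2 * int_1^2 u(t)^2 dt
#   subject to x'(t) = u(t), x(0) = 0, x(2) = 1.
# The switching state x1 = x(1) is treated as a free parameter.


def subproblem_cost(c: float, a: float, b: float, p: float, q: float) -> float:
    """Optimal cost of min c * int_a^b u^2 dt with x' = u, x(a) = p, x(b) = q.

    The maximum principle yields a constant control u = (q - p) / (b - a),
    hence the optimal cost c * (q - p)^2 / (b - a).
    """
    return c * (q - p) ** 2 / (b - a)


def total_cost(x1: float, c1: float = 1.0, c2: float = 2.0) -> float:
    """Cost of the whole problem as a function of the switching state x1."""
    return subproblem_cost(c1, 0.0, 1.0, 0.0, x1) + subproblem_cost(c2, 1.0, 2.0, x1, 1.0)


def golden_section_min(f, lo: float, hi: float, tol: float = 1e-9) -> float:
    """Return an approximate minimizer of a unimodal function f on [lo, hi]."""
    g = (5 ** 0.5 - 1) / 2  # inverse golden ratio
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):  # minimizer lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:  # minimizer lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


x1_opt = golden_section_min(total_cost, -2.0, 3.0)
print(f"optimal switching state x1 = {x1_opt:.6f}")
print(f"analytic value c2 / (c1 + c2) = {2.0 / 3.0:.6f}")
```

In a realistic setting, the closed-form `subproblem_cost` would be replaced by the analytical solutions of the actual subproblems, and the scalar search by an optimizer over all switching states.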
The described approach was successfully applied to a particular differential game with random duration and a composite cumulative distribution function. We computed the optimal controls and cooperative solutions, and determined the imputation distribution procedure.
This paper is organized as follows. In Section 2, we formulate the problem and formally state all necessary results. In particular, we present a uniform description of the hybrid discounting function using the notion of the hybrid hazard rate and give an explicit formula for computing the IDP. In Section 3, we work out a numerical example that illustrates the theoretical results. The last section concludes the paper.

Differential Game
Consider a differential game with n participants (players), denoted $\Gamma^d(t_0, x_0)$, where the superscript d refers to discounting. The set of players is $N = \{1, \dots, n\}$. The game starts at the time instant $t_0$ from the initial state $x_0$.
We consider an n-person differential game with prescribed duration $T - t_0$, in which the integral payoff of player i has the form specified below, under the following assumptions:
• The dynamics of the game are given by
$$\dot{x}(t) = g(x(t), u_1(t), \dots, u_n(t)), \qquad x(t_0) = x_0, \tag{1}$$
where the right-hand side of (1) satisfies the standard requirements of existence and uniqueness. In particular, we assume that the function $g(x(t), u_1(t), \dots, u_n(t))$ in (1) is continuously differentiable w.r.t. all its arguments;
• The controls $u_i(t)$ are piecewise continuous functions on the interval $[t_0, T]$ that take values in the sets of admissible control values $U_i$, which are convex compact subsets of $\mathbb{R}^k$. The optimal controls are further assumed to be open-loop, i.e., they are defined as functions of t only.
We consider a differential game in which the payoff function changes its structure at specific time instants. Specifically, we consider the situation in which the discounting function changes as one passes from one interval to the next. Let $\sigma = \{T_0, \dots, T_j, \dots, T_r\}$, with $t_0 = T_0 < T_1 < \dots < T_{r-1} < T_r = T$, be the ordered sequence of time instants at which the switches occur. Then the payoff of player i is defined as
$$K_i(t_0, x_0, \sigma) = \sum_{j=0}^{r-1} \int_{T_j}^{T_{j+1}} L_j(t)\, h_i(x(t), u(t))\, dt, \qquad u = (u_1, \dots, u_n),$$
where $L_j(t)$ is the discounting function on the time interval $t \in [T_j, T_{j+1})$ and $h_i$ is the instantaneous payoff of the ith player. For further analysis of problems with heterogeneous discounting, see [3]. We assume that the following conditions hold for $L_j(t)$, $j = 0, \dots, r-1$:
• $L_0(T_0) = 1$ and $L_{r-1}(T) = 0$, i.e., the discounting function equals 1 at the initial time and 0 at the final time;
• $L_j(t)$, $j = 0, \dots, r-1$, are non-increasing functions on $[T_j, T_{j+1}]$ that are continuously differentiable a.e.;
• The discounting functions on neighboring intervals agree at the switching points:
$$L_{j-1}(T_j) = L_j(T_j), \qquad j = 1, \dots, r-1. \tag{3}$$
An example of a discounting function is given in Figure 1. Note that this figure shows not only the discounting function for the whole game but also its restriction to a subgame, as described in Section 2.2.
First, we present an approach to constructing a composite discounting function from a given set of not necessarily coordinated functions. Let a set of functions $l_j(t) = 1 - \varphi_j(t)$, $j = 0, \dots, r-1$, be given, where the functions $\varphi_j(t)$ satisfy the following conditions:
• $\varphi_0(T_0) = 0$ and $\varphi_{r-1}(T) = 1$;
• $\varphi_j(t)$, $j = 0, \dots, r-1$, are non-decreasing and continuously differentiable a.e. on $[T_j, T_{j+1}]$.
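We chose this specific form of the individual discounting functions, expressed in terms of $\varphi_j$, to keep the presentation compatible with the subsequent exposition. However, this choice is merely a convention and can be changed as long as the individual discounting functions satisfy the required properties. We define a composite discounting function $L(t)$ based on the individual functions $l_j(t) = 1 - \varphi_j(t)$, $j = 0, \dots, r-1$, while ensuring property (3): for $t \in [T_j, T_{j+1})$,
$$L(t) = \prod_{k=0}^{j-1} \frac{1 - \varphi_k(T_{k+1})}{1 - \varphi_k(T_k)} \cdot \frac{1 - \varphi_j(t)}{1 - \varphi_j(T_j)}. \tag{5}$$
We have previously assumed that $\varphi_j'(t)$ exists a.e. for any $j = 0, \dots, r-1$ and $t \in [t_0, T]$. Let us define the new function $\lambda_\sigma(t)$, which is referred to as the hazard rate in reliability theory (see, e.g., [20]):
$$\lambda_\sigma(t) = \lambda_j(t) = \frac{\varphi_j'(t)}{1 - \varphi_j(t)}, \qquad t \in [T_j, T_{j+1}).$$
Consider the first interval $[T_0, T_1)$ and the hazard rate $\lambda_0(t)$. Then
$$\lambda_0(t) = \frac{\varphi_0'(t)}{1 - \varphi_0(t)} = -\frac{d}{dt}\ln\big(1 - \varphi_0(t)\big). \tag{6}$$
Integrating both sides of (6) from $T_0$ to t and using $\varphi_0(T_0) = 0$, we obtain
$$\int_{T_0}^{t} \lambda_0(s)\, ds = -\ln\big(1 - \varphi_0(t)\big). \tag{7}$$
Finally, we can express $1 - \varphi_0(t)$ from (7) as
$$1 - \varphi_0(t) = \exp\Big(-\int_{T_0}^{t} \lambda_0(s)\, ds\Big), \tag{8}$$
and the first component of the payoff function can now be represented as
$$\int_{T_0}^{T_1} \exp\Big(-\int_{T_0}^{t} \lambda_0(s)\, ds\Big)\, h_i(x(t), u(t))\, dt.$$
Similarly to (6)-(8), we obtain the general formula
$$\frac{1 - \varphi_j(t)}{1 - \varphi_j(T_j)} = \exp\Big(-\int_{T_j}^{t} \lambda_j(s)\, ds\Big), \qquad t \in [T_j, T_{j+1}), \quad j = 0, \dots, r-1. \tag{9}$$
Substituting (9) into (5), we obtain an exponential representation of the composite discounting function $L(t)$:
$$L(t) = \exp\Big(-\int_{T_0}^{t} \lambda_\sigma(s)\, ds\Big). \tag{10}$$
Note that although $\lambda_{r-1}(t)$ is undefined at $t = T$, it can be shown that $\lim_{t \to T} L(t) = 0$. We thus formulate the following result.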
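The equivalence between the product construction and the exponential (hazard-rate) representation can be checked numerically. The sketch below assumes that the composite function is assembled in product form, $L(t) = \prod_{k<j} \frac{1 - \varphi_k(T_{k+1})}{1 - \varphi_k(T_k)} \cdot \frac{1 - \varphi_j(t)}{1 - \varphi_j(T_j)}$ for $t \in [T_j, T_{j+1})$, and compares it with $\exp(-\int_{T_0}^t \lambda_\sigma(s)\,ds)$ computed by Simpson quadrature. The switching instants and the functions $\varphi_0(t) = 0.3t$, $\varphi_1(t) = t - 1$ are hypothetical, chosen only to satisfy the conditions on $\varphi_j$.

```python
import math

# Hypothetical example with r = 2: switching instants T_0 = 0, T_1 = 1, T_2 = T = 2.
SWITCHES = [0.0, 1.0, 2.0]
PHIS = [lambda t: 0.3 * t, lambda t: t - 1.0]  # phi_0(T_0) = 0, phi_1(T) = 1
DPHIS = [lambda t: 0.3, lambda t: 1.0]         # derivatives phi_j'(t)


def composite_discount(t: float) -> float:
    """Product-form composite discount L(t) built from l_j = 1 - phi_j."""
    value = 1.0
    last = len(PHIS) - 1
    for j, phi in enumerate(PHIS):
        a, b = SWITCHES[j], SWITCHES[j + 1]
        if t < b or j == last:
            return value * (1.0 - phi(t)) / (1.0 - phi(a))
        # carry the value reached at the switch T_{j+1} into the next interval
        value *= (1.0 - phi(b)) / (1.0 - phi(a))
    return value


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return s * h / 3.0


def hazard_discount(t: float) -> float:
    """Exponential representation L(t) = exp(-int_{T_0}^t lambda_sigma(s) ds)."""
    total = 0.0
    for j, (phi, dphi) in enumerate(zip(PHIS, DPHIS)):
        a, b = SWITCHES[j], SWITCHES[j + 1]
        upper = min(t, b)
        if upper <= a:
            break

        def lam(s, phi=phi, dphi=dphi):
            # hazard rate lambda_j(s) = phi_j'(s) / (1 - phi_j(s))
            return dphi(s) / (1.0 - phi(s))

        total += simpson(lam, a, upper)
    return math.exp(-total)


for t in (0.0, 0.5, 1.0, 1.5, 1.9):
    print(f"t = {t:.1f}: product {composite_discount(t):.8f}, hazard {hazard_discount(t):.8f}")
```

The product form is continuous at $T_1$ by construction, which is exactly the agreement property required of the composite discounting function.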
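Proposition 1. The payoff of player i on the interval $[T_j, T_{j+1})$, $j = 0, \dots, r-1$, can be represented as
$$K_i^{j} = \exp\Big(-\sum_{k=0}^{j-1}\int_{T_k}^{T_{k+1}} \lambda_k(s)\, ds\Big) \int_{T_j}^{T_{j+1}} \exp\Big(-\int_{T_j}^{t} \lambda_j(s)\, ds\Big)\, h_i(x(t), u(t))\, dt, \tag{11}$$
and the payoff of player i in the whole game can be written as
$$K_i(t_0, x_0, \sigma) = \sum_{j=0}^{r-1} K_i^{j}. \tag{12}$$
Thus, the problem is reduced to a problem with different discounting rates $\lambda_j(t)$ on the different time intervals $[T_j, T_{j+1})$ (cf. [6]).
Taking (10) into account, we can also rewrite the payoff (12) of player i more concisely:
$$K_i(t_0, x_0, \sigma) = \int_{t_0}^{T} \exp\Big(-\int_{t_0}^{t} \lambda_\sigma(s)\, ds\Big)\, h_i(x(t), u(t))\, dt. \tag{13}$$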

Subgame
Let the game evolve along the trajectory $x^*(t)$. At any intermediate time instant τ, the players enter a subgame $\Gamma^d(\tau, x^*(\tau))$, which is considered a new game starting from the position $x^*(\tau)$ with duration $T - \tau$.
For this purpose, we have to redefine the payoff function (13) of player i for the subgame. First, note that the discounting function $\tilde{L}(t)$ of the subgame on the time interval $[\tau, T]$ should be normalized such that $\tilde{L}(\tau) = 1$ and $\tilde{L}(T) = 0$ (see Figure 1).
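Let the subgame start at $\tau \in [T_0, T]$. We define
$$\tilde{L}(t) = \frac{L(t)}{L(\tau)}, \qquad t \in [\tau, T].$$
Let $\tau \in [T_j, T_{j+1})$, $j = 0, \dots, r-1$. Then for $t \in [\tau, T_{j+1})$ we have
$$\tilde{L}(t) = \frac{1 - \varphi_j(t)}{1 - \varphi_j(\tau)} = \exp\Big(-\int_{\tau}^{t} \lambda_j(s)\, ds\Big),$$
and, since $t \geq \tau$, the hazard rate of the subgame coincides with $\lambda_\sigma(t)$. Accordingly, the discounting function in the whole subgame is
$$\tilde{L}(t) = \exp\Big(-\int_{\tau}^{t} \lambda_\sigma(s)\, ds\Big), \qquad t \in [\tau, T].$$
The payoff of player i in the subgame starting at τ then takes the form
$$K_i(\tau, x^*(\tau), \sigma_j) = \int_{\tau}^{T} \exp\Big(-\int_{\tau}^{t} \lambda_\sigma(s)\, ds\Big)\, h_i(x(t), u(t))\, dt,$$
where we use the notation $\sigma_j = \sigma \setminus \{T_0, \dots, T_{j-1}\}$. Along the optimal trajectory, we now obtain
$$K_i(t_0, x_0, \sigma) = \int_{t_0}^{\tau} L(t)\, h_i(x^*(t), u^*(t))\, dt + L(\tau)\, K_i(\tau, x^*(\tau), \sigma_j).$$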
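Normalizing the subgame discount as $\tilde{L}(t) = L(t)/L(\tau)$ implies that the payoff of the whole game splits into the part collected on $[t_0, \tau]$ plus the subgame payoff weighted by $L(\tau)$. The sketch below checks this split numerically. The discount $L(t) = 1 - 0.3t$ on $[0, 1)$, $L(t) = 0.7(2 - t)$ on $[1, 2]$, and the payoff rate $h(t) = 1 + t$ along the trajectory are hypothetical.

```python
# Hypothetical data: a two-interval composite discount with a switch at t = 1
# and a made-up instantaneous payoff h(t) evaluated along the optimal trajectory.
T0, T = 0.0, 2.0


def L(t: float) -> float:
    return 1.0 - 0.3 * t if t < 1.0 else 0.7 * (2.0 - t)


def h(t: float) -> float:
    return 1.0 + t


def L_sub(t: float, tau: float) -> float:
    """Normalized subgame discount: equals 1 at t = tau and 0 at t = T."""
    return L(t) / L(tau)


def simpson(f, a: float, b: float, n: int = 2000) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * f(a + k * step)
    return s * step / 3.0


def integrate(f, a: float, b: float) -> float:
    """Apply Simpson's rule separately on each side of the switch at t = 1."""
    points = [a] + [s for s in (1.0,) if a < s < b] + [b]
    return sum(simpson(f, p, q) for p, q in zip(points, points[1:]))


tau = 1.25
K_full = integrate(lambda t: L(t) * h(t), T0, T)           # payoff of the whole game
K_sub = integrate(lambda t: L_sub(t, tau) * h(t), tau, T)  # payoff of the subgame
K_split = integrate(lambda t: L(t) * h(t), T0, tau) + L(tau) * K_sub
print(K_full, K_split)
```

Because $\tilde{L}$ differs from $L$ only by the constant factor $1/L(\tau)$, both functions have the same hazard rate on $[\tau, T]$.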

Cooperative Differential Game
Suppose that the game is played cooperatively. In general, cooperation means that the players agree, before the game starts, to act together as a coalition.
Let the optimal cooperative controls be
$$(u_1^*, \dots, u_n^*) = \arg\max_{u_1, \dots, u_n} \sum_{i \in N} K_i(t_0, x_0, \sigma),$$
and the corresponding trajectory $x^*(t)$ obtained from (1) is said to be the optimal trajectory. We also have
$$V(N, x_0, \sigma) = \max_{u_1, \dots, u_n} \sum_{i \in N} K_i(t_0, x_0, \sigma).$$
As is standard in cooperative games, all players in the coalition unanimously agree on a distribution mechanism (cooperative agreement) to divide the total payoff $V(N, x_0, \sigma)$. However, the cooperative solution chosen at the outset may lose its optimality at some intermediate instant, i.e., in a subgame, which means that the time-consistency of the cooperative solution is not guaranteed. Since we are investigating a dynamic setting, it is necessary to define and determine an imputation distribution procedure [21][22][23][24] that is consistent with the form of the payoff.
First, we recall the notion of an imputation. In an n-player cooperative game, an imputation is a distribution $\xi = (\xi_1, \dots, \xi_n)$ of the total payoff among the players such that the sum of its components equals the maximal payoff of the grand coalition and the component $\xi_i$ allocated to the ith player is not less than what the player would obtain by playing alone. Specifically, let N be the set of players and $v: 2^N \to \mathbb{R}$ be the characteristic function [24] of the game; then ξ is an imputation if $\xi_1 + \dots + \xi_n = v(N)$ and $\xi_i \geq v(\{i\})$ for all $i = 1, \dots, n$. The first property, called efficiency, ensures that the imputation distributes the total gain among all players [25]; the second property is called individual rationality.
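The two defining conditions translate directly into a check. The sketch below represents the characteristic function as a dictionary keyed by coalitions (an assumption made only for illustration; the numbers are hypothetical) and tests efficiency and individual rationality, including for an egalitarian split of $v(N)$.

```python
def is_imputation(xi, v, players, tol=1e-9):
    """Check efficiency and individual rationality of an allocation xi.

    xi: dict player -> payoff; v: dict frozenset(coalition) -> value.
    """
    grand = frozenset(players)
    efficient = abs(sum(xi[i] for i in players) - v[grand]) <= tol
    individually_rational = all(xi[i] >= v[frozenset({i})] - tol for i in players)
    return efficient and individually_rational


# Hypothetical three-player characteristic function (only the values used by the check).
players = (1, 2, 3)
v = {frozenset({1}): 1.0, frozenset({2}): 1.0, frozenset({3}): 1.5, frozenset(players): 6.0}

egalitarian = {i: v[frozenset(players)] / len(players) for i in players}  # 2.0 each
print(is_imputation(egalitarian, v, players))               # True
print(is_imputation({1: 4.0, 2: 1.5, 3: 0.5}, v, players))  # False: xi_3 < v({3})
print(is_imputation({1: 2.0, 2: 2.0, 3: 1.9}, v, players))  # False: sum differs from v(N)
```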
We adopt the definition of an imputation distribution procedure (IDP) first introduced in [21] for differential games with prescribed duration.

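Definition 2. An imputation $\xi = (\xi_1, \dots, \xi_n)$ of the game $\Gamma^d(t_0, x_0)$ is called time-consistent if there exists an imputation distribution procedure $\beta(t) = (\beta_1(t), \dots, \beta_n(t))$ such that for any $\vartheta \in [t_0, T]$ and $i \in N$
$$\xi_i = \int_{t_0}^{\vartheta} L(t)\, \beta_i(t)\, dt + L(\vartheta)\, \xi_i^{\vartheta},$$
where $\xi^{\vartheta} = (\xi_1^{\vartheta}, \dots, \xi_n^{\vartheta})$ is an imputation in the subgame $\Gamma^d(\vartheta, x^*(\vartheta))$.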
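We now check the time-consistency property in detail. Let $\vartheta \in [T_j, T_{j+1})$. Then, by (10),
$$\xi_i = \int_{t_0}^{\vartheta} \exp\Big(-\int_{t_0}^{t}\lambda_\sigma(s)\,ds\Big)\, \beta_i(t)\, dt + \exp\Big(-\int_{t_0}^{\vartheta}\lambda_\sigma(s)\,ds\Big)\, \xi_i^{\vartheta}.$$
Taking the derivative with respect to $\vartheta \in [T_j, T_{j+1})$ and noting that $\xi_i$ is a constant, we obtain
$$0 = \exp\Big(-\int_{t_0}^{\vartheta}\lambda_\sigma(s)\,ds\Big)\Big(\beta_i(\vartheta) - \lambda_j(\vartheta)\, \xi_i^{\vartheta} + \frac{d\xi_i^{\vartheta}}{d\vartheta}\Big).$$
Canceling the exponential factor, which is positive for $\vartheta < T$, we obtain the final expression for $\beta_i(\vartheta)$:
$$\beta_i(\vartheta) = \lambda_\sigma(\vartheta)\, \xi_i^{\vartheta} - \frac{d\xi_i^{\vartheta}}{d\vartheta}.$$
We now formally state this result.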
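Theorem 1. Let the imputation $\xi_i^t$ of the game $\Gamma^d(t, x^*(t))$ be an absolutely continuous function of $t \in [0, T]$. If the IDP has the form
$$\beta_i(\vartheta) = \lambda_\sigma(\vartheta)\, \xi_i^{\vartheta} - \frac{d\xi_i^{\vartheta}}{d\vartheta}$$
for any $\vartheta \in [0, T]$, then $\xi_i$ is a time-consistent imputation in the game $\Gamma^d(0, x_0)$ with the IDP given by (16).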
Note that this formula has the same form as the IDP computed for a problem with a single discounting function [2,24], the only difference being that the composite hazard rate $\lambda_\sigma(t)$ is used instead of $\lambda(t)$. Furthermore, for a game with prescribed duration and without discounting, we have $\lambda_\sigma(\vartheta) \equiv 0$, and (18) takes the standard form $\beta_i(\vartheta) = -d\xi_i^{\vartheta}/d\vartheta$, as, e.g., in [23].
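A numerical sanity check of Theorem 1 can be sketched as follows, assuming the time-consistency condition in the form $\xi_i = \int_{t_0}^{\vartheta} L(t)\beta_i(t)\,dt + L(\vartheta)\xi_i^{\vartheta}$. The piecewise-constant composite hazard rate (0.1 on $[0, 1)$, 0.4 on $[1, 2]$) and the subgame imputation $\xi_i^{\vartheta} = 3 - \vartheta$ are hypothetical. The integral is taken separately on each switching interval because $\beta_i$ jumps together with $\lambda_\sigma$.

```python
import math

# Hypothetical setting: switching instants T_0 = 0, T_1 = 1, T_2 = 2 and a
# piecewise-constant composite hazard rate lambda_sigma(t).
SWITCHES = [0.0, 1.0, 2.0]
RATES = [0.1, 0.4]


def lam_sigma(t: float) -> float:
    """Composite hazard rate (right-continuous at the switching instants)."""
    for j, rate in enumerate(RATES):
        if t < SWITCHES[j + 1]:
            return rate
    return RATES[-1]


def L(t: float) -> float:
    """L(t) = exp(-int_0^t lambda_sigma(s) ds), exact for constant rates."""
    acc = 0.0
    for j, rate in enumerate(RATES):
        a, b = SWITCHES[j], SWITCHES[j + 1]
        if t <= a:
            break
        acc += rate * (min(t, b) - a)
    return math.exp(-acc)


def xi(theta: float) -> float:
    """Hypothetical subgame imputation xi_i^theta of one player."""
    return 3.0 - theta


def dxi(theta: float) -> float:
    return -1.0


def beta(theta: float) -> float:
    """IDP: beta_i = lambda_sigma * xi_i^theta - d xi_i^theta / d theta."""
    return lam_sigma(theta) * xi(theta) - dxi(theta)


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * f(a + k * h)
    return s * h / 3.0


def paid_out(theta: float) -> float:
    """int_0^theta L(t) beta_i(t) dt, integrated interval by interval."""
    total = 0.0
    for j, rate in enumerate(RATES):
        a, b = SWITCHES[j], min(SWITCHES[j + 1], theta)
        if b <= a:
            break

        def integrand(t, rate=rate):
            # on [a, b] the hazard rate equals `rate`, so the jump at T_{j+1} is avoided
            return L(t) * (rate * xi(t) - dxi(t))

        total += simpson(integrand, a, b)
    return total


for theta in (0.5, 1.0, 1.5, 2.0):
    # time-consistency: paid_out(theta) + L(theta) * xi_i^theta should equal xi_i^0
    print(theta, paid_out(theta) + L(theta) * xi(theta), xi(0.0))
```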

Description of the Model
This section builds upon the results presented in [17]. We omit most of the previously reported results, except for those that are necessary for understanding the current material.
Consider a model example describing a differential game of investment in a stock of knowledge. Assume that there are n individuals investing in a public stock of knowledge [26]. Let $x(t)$ be the stock of knowledge at time t and $u_i(t)$ be the ith agent's investment in public knowledge at time t. The dynamics of the stock of knowledge is described by the following state equation:
If each agent derives linear utility from the consumption of knowledge, the instantaneous payoff of the ith player is given by the following expression:
Further, assume that the switching instants are $\sigma = \{0, \bar{T}-\delta, \bar{T}+\delta, T_1, T_2, T_3\}$, where $\bar{T} > \delta$ and $\bar{T}+\delta < T_1 < T_2 < T_3$. We define the functions $\varphi_i(t)$ as follows:
Note that the conditions formulated in Section 2 hold, i.e., $\varphi_0(0) = 0$, $\varphi_4(T_3) = 1$, and all functions $\varphi_i(t)$ are continuously differentiable.
This choice can be interpreted as a problem with random duration, in which the game ends at a random time instant with a known cumulative distribution function (21). For instance, this c.d.f. implies that the game cannot stop before $\bar{T}-\delta$ but may then stop with the probability given by the uniform distribution on the time interval $[\bar{T}-\delta, \bar{T}+\delta]$. We denote this game by $\Gamma^r(t_0, x_0)$, where the superscript r refers to random duration.
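The link between the random-duration interpretation and the discounted payoff is the identity $E\big[\int_0^{T_f} h(t)\,dt\big] = \int_0^{\infty} (1 - F(t))\,h(t)\,dt$, where $T_f$ is the terminal time with c.d.f. F, so that $1 - F$ plays the role of the discounting function. The sketch below checks this identity by Monte Carlo for a simplified, hypothetical c.d.f. that keeps only the uniform part on $[\bar{T}-\delta, \bar{T}+\delta]$; it does not reproduce the full c.d.f. (21), and the parameter values and the payoff rate $h(t) = e^{-0.1t}$ are made up.

```python
import math
import random

# Hypothetical parameters (not from the paper): the whole probability mass of the
# terminal time is uniform on [T_BAR - DELTA, T_BAR + DELTA], and h(t) = exp(-RHO t).
T_BAR, DELTA, RHO = 5.0, 1.0, 0.1
LO, HI = T_BAR - DELTA, T_BAR + DELTA


def cdf(t: float) -> float:
    """C.d.f. of the terminal time: zero before LO, uniform on [LO, HI]."""
    if t < LO:
        return 0.0
    if t >= HI:
        return 1.0
    return (t - LO) / (HI - LO)


def h(t: float) -> float:
    return math.exp(-RHO * t)


def accumulated(t: float) -> float:
    """int_0^t h(s) ds in closed form."""
    return (1.0 - math.exp(-RHO * t)) / RHO


def simpson(f, a: float, b: float, n: int = 1000) -> float:
    step = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * f(a + k * step)
    return s * step / 3.0


def discounted_payoff() -> float:
    """Deterministic form int_0^HI (1 - F(t)) h(t) dt, split at the kink t = LO."""
    def integrand(t: float) -> float:
        return (1.0 - cdf(t)) * h(t)

    return simpson(integrand, 0.0, LO) + simpson(integrand, LO, HI)


def monte_carlo_payoff(samples: int = 200_000, seed: int = 0) -> float:
    """Average of int_0^{T_f} h(s) ds over sampled terminal times T_f."""
    rng = random.Random(seed)
    return sum(accumulated(rng.uniform(LO, HI)) for _ in range(samples)) / samples


print(discounted_payoff(), monte_carlo_payoff())
```

For these parameters, both quantities should be close to the closed-form value $10 - 50\,(e^{-0.4} - e^{-0.6})$; the Monte Carlo estimate agrees only up to sampling error.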

Optimal Solution
We consider the cooperative game. Assume that all players choose to cooperate and, hence, pool their efforts to maximize the total payoff.
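The optimization problem can be tackled by state parametrization on four consecutive intervals (see [17] for details):
$$I_1 = [0, \bar{T}-\delta], \quad I_2 = [\bar{T}-\delta, \bar{T}+\delta], \quad I_3 = [\bar{T}+\delta, T_1], \quad I_4 = [T_1, T_2].$$
Note that we do not consider the fifth interval $[T_2, T_3]$ because on this interval $\varphi_4(t) = 1$ and, respectively, $l_4(t) = 0$.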
For the interval $I_1$, we obtain the following expressions for the optimal trajectory and control:
For the interval $I_2$, the optimal trajectory and control are given by:
For the interval $I_3$, we obtain:
For the interval $I_4$, we have:
The switching states $x_1, x_2, x_3$ are given below:
Note that all optimal values of the switching states depend (either directly or indirectly) on the initial state $x_0$.

Optimal Solutions for Subgames
In this section, we consider the optimal solutions in a subgame starting at some time $\vartheta \in [0, T_3]$. The formal definition of a subgame and the respective analysis of the related optimal control problems are presented in Section 2.2.
Subgame starting at $\vartheta \in [\bar{T}-\delta, \bar{T}+\delta)$: Consider a subgame $\Gamma^r(\vartheta, x^*(\vartheta))$ such that $\vartheta \in [\bar{T}-\delta, \bar{T}+\delta)$. The conditional c.d.f., which corresponds to the function $\tilde{L}(t)$ defined in Section 2.2, takes the following form:
The expected integral payoff of player i in this subgame is given by the following formula:
Subgame starting at $\vartheta \in [\bar{T}+\delta, T_1]$: Consider a subgame $\Gamma^r(\vartheta, x^*(\vartheta))$ such that $\vartheta \in [\bar{T}+\delta, T_1]$. The conditional c.d.f. takes the following form:
The expected integral payoff of player i in this subgame is given by the following formula:
Subgame starting at $\vartheta \in [T_1, T_2]$: Consider a subgame $\Gamma^r(\vartheta, x^*(\vartheta))$ such that $\vartheta \in [T_1, T_2]$. The conditional c.d.f. takes the following form:
The expected integral payoff of player i in this subgame is given by the following formula:
Finally, we arrive at the following general expression for the expected integral payoff of player i in the subgame $\Gamma^r(\vartheta, x^*(\vartheta))$, $\vartheta \in [t_0, T_2]$:
Both the imputation and the imputation distribution procedure are illustrated in Figure 2. Note that the optimal solution has a discontinuity at time $t = T_1$, where the cumulative distribution function is discontinuous as well. One can observe that the imputation distribution procedure is positive on the whole time interval except for a short period between $t = 11$ and $t = 13$. The imputation and the imputation distribution procedure are computed using the egalitarian rule, under which each player receives an equal share of the total payoff, i.e., its average over the players. The same approach can be applied to other types of imputations.

Conclusions
The aim of this paper was not only to describe an approach to computing the IDP for a class of hybrid differential games but also to present a worked-out example demonstrating the described procedure in full detail. The main points of the paper are as follows: (1) the differential game with a hybrid discounting function describes a wide class of differential games, including games with a random horizon and a hybrid CDF; (2) the considered class of differential games can be described in a uniform way using the notion of a hybrid hazard rate; (3) finally, a problem of reasonable complexity can be solved completely. Our future work will focus on extending the class of hybrid games.

Conflicts of Interest:
The authors declare no conflict of interest.