Rational Behavior in Dynamic Multicriteria Games

Abstract: We consider a dynamic, discrete-time game model where n players use a common resource and have different criteria to optimize. To construct a multicriteria Nash equilibrium, the bargaining solution is adopted. To design a multicriteria cooperative equilibrium, a modified bargaining scheme that guarantees the fulfillment of the rationality conditions is applied. The concept of dynamic stability is adapted to dynamic multicriteria games. To stabilize the multicriteria cooperative solution, a time-consistent payoff distribution procedure is constructed. The conditions for rational behavior, namely the irrational-behavior-proofness condition and the each step rational behavior condition, are defined for dynamic multicriteria games. To illustrate the presented approaches, a dynamic bi-criteria bioresource management problem with many players is investigated.


Introduction
Mathematical models involving more than one objective [1] are closer to real-world problems. Players often seek to achieve several goals simultaneously, and these goals can be incomparable. Such situations are typical of game-theoretic models in economics and ecology. For example, in bioresource management problems the players wish to maximize their exploitation rates and to minimize the harm to the environment. The multicriteria approach makes it possible to determine optimal behavior in such situations.
In this paper, we consider a dynamic, discrete-time game model where the players use a common resource and have different criteria to optimize. First, we construct a multicriteria Nash equilibrium by applying the bargaining concept (via Nash products [2,3]). Then, we find a multicriteria cooperative equilibrium as the solution of a modified bargaining scheme in which the multicriteria Nash equilibrium payoffs play the role of status quo points [4,5]. This approach guarantees that the cooperative payoffs of the players are greater than or equal to their multicriteria Nash payoffs.
As is well known, in ecological problems cooperative behavior leads to a more sparing harvesting rate. The special importance of cooperative behavior for the exploitation of "common resources" was stressed by Nobel Prize winner E. Ostrom [6]. To maintain cooperative behavior, the players conclude a contract that satisfies the dynamic stability (time-consistency) condition [7,8]. A. Haurie [9] raised the problem of the instability of the Nash bargaining solution. The concept of time-consistency (dynamic stability) was introduced by L.A. Petrosyan [7]. Time consistency means that, as the cooperation develops, the participants are guided by the same optimality principle at each time moment and hence have no incentive to deviate from the cooperative agreement.

Consider a dynamic discrete-time game in which n players exploit a common resource. The state dynamics has the form

$$x_{t+1} = f(x_t, u_{1t}, \ldots, u_{nt}), \quad x_0 = x, \tag{1}$$

where $x_t \ge 0$ denotes the quantity of the resource at time $t \ge 0$, $f(x_t, u_{1t}, \ldots, u_{nt})$ is the natural growth function, and $u_{it} \in U_i = [0, \infty)$ specifies the strategy (resource exploitation rate) of player $i$ at time $t \ge 0$, $i \in N = \{1, \ldots, n\}$. Denote $u_t = (u_{1t}, \ldots, u_{nt})$. Each player has $k$ goals to optimize. The vector payoff functions of the players on the finite planning horizon $[0, m]$ have the form

$$J_i(x, u) = \left(J_i^1(x, u), \ldots, J_i^k(x, u)\right) = \left(\sum_{t=0}^{m} \delta^t g_i^1(u_t), \ldots, \sum_{t=0}^{m} \delta^t g_i^k(u_t)\right), \quad i \in N, \tag{2}$$

where $g_i^j(u_t) \ge 0$ are the instantaneous payoff functions, $j = 1, \ldots, k$, $i \in N$, and $\delta \in (0, 1)$ denotes the discount factor.
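To make the structure of the vector payoffs concrete, the following Python sketch simulates a generic state equation $x_{t+1} = f(x_t, u_t)$ under given feedback strategies and evaluates every criterion $J_i^j = \sum_{t=0}^{m} \delta^t g_i^j(u_t)$. It is only an illustration under stated assumptions. The helper names `simulate` and `vector_payoffs`, the growth rate 1.2, the 10% catch rule, and the two instantaneous payoffs are all hypothetical and do not come from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

Control = Tuple[float, ...]  # u_t = (u_1t, ..., u_nt)


def simulate(x0: float,
             f: Callable[[float, Control], float],
             strategy: Callable[[int, float], Control],
             m: int) -> Tuple[List[float], List[Control]]:
    """Return the states x_0..x_{m+1} and the controls u_0..u_m along the path."""
    xs: List[float] = [x0]
    us: List[Control] = []
    for t in range(m + 1):
        u = strategy(t, xs[-1])   # feedback strategies u_it = u_i(t, x_t)
        us.append(u)
        xs.append(f(xs[-1], u))   # state equation x_{t+1} = f(x_t, u_t)
    return xs, us


def vector_payoffs(us: Sequence[Control],
                   g: Sequence[Sequence[Callable[[Control], float]]],
                   delta: float) -> List[List[float]]:
    """J[i][j] = sum_{t=0}^{m} delta^t * g[i][j](u_t), player i's payoff in criterion j."""
    return [[sum(delta ** t * gij(u) for t, u in enumerate(us)) for gij in gi]
            for gi in g]


# Hypothetical example (not the paper's model): growth rate 1.2,
# two players, each catching 10% of the current stock.
def f(x: float, u: Control) -> float:
    return 1.2 * x - sum(u)


def strategy(t: int, x: float) -> Control:
    return (0.1 * x, 0.1 * x)


# Two hypothetical nonnegative criteria per player: own catch and its square root.
g = [[lambda u, i=i: u[i], lambda u, i=i: math.sqrt(u[i])] for i in range(2)]

xs, us = simulate(100.0, f, strategy, m=2)
J = vector_payoffs(us, g, delta=0.5)
```

The total catch $0.2x_t$ equals the natural increase, so the stock stays at 100 and every player catches 10 per stage. The first criterion is therefore $J_i^1 = 10\,(1 + 0.5 + 0.25) = 17.5$.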

Multicriteria Nash Equilibrium
We construct noncooperative behavior in the dynamic multicriteria game by applying Nash products [2,3]. Therefore, we begin with the construction of guaranteed payoffs, which play the role of status quo points.
Possible concepts for determining the guaranteed payoffs in the game with two players were presented in [2]. As was demonstrated there, the case in which the guaranteed payoffs are determined as Nash equilibrium payoffs is the best for the ecological system and is also profitable for the players. Therefore, for the multicriteria game with n players we adopt this concept of guaranteed payoff construction. Namely, $G_1^1, \ldots, G_n^1$ are the Nash equilibrium payoffs in the dynamic game $\langle x, N, \{U_i\}_{i \in N}, \{J_i^1\}_{i \in N} \rangle$, ..., $G_1^k, \ldots, G_n^k$ are the Nash equilibrium payoffs in the dynamic game $\langle x, N, \{U_i\}_{i \in N}, \{J_i^k\}_{i \in N} \rangle$, where the state dynamics is described by (1). Note that if the Nash equilibrium is not unique, one of the equilibria is taken to define the guaranteed payoff points.
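Each guaranteed payoff $G_i^j$ is a Nash equilibrium payoff of the scalar game with criterion $j$ only. The sketch below shows this step on a deliberately simple, hypothetical static two-player game with payoff $J_i(u) = u_i(1 - u_1 - u_2)$. This payoff is our assumption and is not the paper's model. Its best response $u_i = (1 - u_{-i})/2$ is known in closed form, so simultaneous best-response iteration converges to the equilibrium $u_1 = u_2 = 1/3$ with payoff $1/9$. In the multicriteria setting the same computation would be repeated once for each criterion.

```python
from typing import Tuple


def best_response(u_other: float) -> float:
    """Maximizer of u_i * (1 - u_i - u_other) over u_i >= 0 (hypothetical payoff)."""
    return max(0.0, (1.0 - u_other) / 2.0)


def payoff(u_own: float, u_other: float) -> float:
    return u_own * (1.0 - u_own - u_other)


def scalar_nash(iterations: int = 60) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Simultaneous best-response iteration; the map has slope -1/2, so it contracts."""
    u1 = u2 = 0.0
    for _ in range(iterations):
        u1, u2 = best_response(u2), best_response(u1)
    return (u1, u2), (payoff(u1, u2), payoff(u2, u1))


# For one criterion j, G plays the role of the guaranteed payoffs (G_1^j, G_2^j).
equilibrium, G = scalar_nash()
```

In a dynamic game the analytical best response would be replaced by the Bellman-based construction used in the paper. The iteration above only conveys the idea that each criterion yields its own status quo point.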
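To construct the multicriteria payoff functions, we adopt Nash products in which the guaranteed payoffs of the players play the role of the status quo points:

$$H_i(x, u) = \prod_{j=1}^{k} \left(J_i^j(x, u) - G_i^j\right), \quad i \in N.$$

Definition 1. A strategy profile $u_t^N = (u_{1t}^N, \ldots, u_{nt}^N)$ is called a multicriteria Nash equilibrium in problem (1), (2) if

$$u_{it}^N = \arg\max_{u_{it}} H_i\left(x, (u_{-it}^N, u_{it})\right) = \arg\max_{u_{it}} \prod_{j=1}^{k} \left(J_i^j\left(x, (u_{-it}^N, u_{it})\right) - G_i^j\right), \quad i \in N,$$

where $(u_{-it}^N, u_{it})$ denotes the profile in which player $i$ uses $u_{it}$ and all other players use their equilibrium strategies.

As demonstrated in Appendix A (for simplicity, for the bi-criteria game), this approach guarantees that the noncooperative payoffs of the players are greater than or equal to the guaranteed ones. Hence, the scheme for constructing noncooperative behavior is meaningful, since the multicriteria payoff functions are nonnegative.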
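A multicriteria best response maximizes a product of gains over the status quo. The following Python sketch illustrates this on a hypothetical one-shot problem: a single decision $u \in [0, 1]$, criteria $J^1(u) = u$ and $J^2(u) = 1 - u^2$, and assumed guaranteed payoffs $G = (0.2, 0.5)$. None of these functions or numbers come from the paper, and a plain grid search stands in for the analytical construction.

```python
import math


def nash_product(J, G):
    """H = prod_j (J^j - G^j): the Nash product with status quo point G."""
    return math.prod(Jj - Gj for Jj, Gj in zip(J, G))


def criteria(u: float):
    # Hypothetical criteria: harvest (increasing in u) and stock quality (decreasing in u).
    return (u, 1.0 - u * u)


G = (0.2, 0.5)  # assumed guaranteed payoffs (status quo point)

# Keep only decisions with J^j >= G^j in every criterion; otherwise two negative
# factors could multiply to a misleadingly positive Nash product.
grid = [k / 10000 for k in range(10001)]
feasible = [u for u in grid if all(Jj >= Gj for Jj, Gj in zip(criteria(u), G))]
u_best = max(feasible, key=lambda u: nash_product(criteria(u), G))
```

The first-order condition $3u^2 - 0.4u - 0.5 = 0$ gives $u^* = (0.4 + \sqrt{6.16})/6 \approx 0.4803$, which the grid search reproduces to within the grid step. The feasibility filter reflects why nonnegativity of every factor matters; the paper establishes it analytically in Appendix A.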

Multicriteria Cooperative Equilibrium
The cooperative equilibrium was obtained as a solution of the Nash bargaining scheme in [23,24]. For multicriteria dynamic games, the Nash product of the sums of the players' payoffs over each criterion, with the sums of their noncooperative payoffs acting as the status quo points, was applied in [3,4]. In [5], a new approach to determining cooperative behavior in dynamic multicriteria games with asymmetric players was presented. More specifically, the cooperative strategies and payoffs of the players are determined from a modified bargaining solution for the entire game horizon. The status quo points are the noncooperative payoffs obtained by the players using the multicriteria Nash equilibrium strategies $u_t^N$:

$$J_i^{jN} = \sum_{t=0}^{m} \delta^t g_i^j(u_t^N), \quad i \in N, \; j = 1, \ldots, k. \tag{4}$$

Definition 2. The cooperative strategies $u_t^c = (u_{1t}^c, \ldots, u_{nt}^c)$ and payoffs are constructed by solving the following problem:

$$\sum_{i=1}^{n} \prod_{j=1}^{k} \left(J_i^j(x, u) - J_i^{jN}\right) \to \max_{u_t}, \tag{5}$$

where $J_i^{jN}$ are the noncooperative payoffs given by (4), $i \in N$, $j = 1, \ldots, k$.
As was demonstrated in [5], with this approach the cooperative payoffs of the players are greater than or equal to the multicriteria Nash payoffs. Hence, the individual rationality conditions $V_i^{jc} \ge J_i^{jN}$, $i = 1, \ldots, n$, $j = 1, \ldots, k$, where $V_i^{jc}$ are the cooperative payoffs of the players, are fulfilled.
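The modified bargaining problem maximizes the sum of the players' Nash products, with the noncooperative payoffs as status quo points. Here is a minimal sketch under explicit assumptions. Two players choose $u_1, u_2 \in [0, 1]$ jointly. The hypothetical criteria are the player's own catch $J_i^1 = u_i$ and a shared environmental index $J_i^2 = 1 - (u_1 + u_2)/2$, and the noncooperative payoffs $J_i^{jN} = 0.3$ are assumed numbers. The search is restricted to the individually rational region $J_i^j \ge J_i^{jN}$. This restriction mirrors the property stated above and is not a procedure taken from [5].

```python
import itertools
import math

J_N = [(0.3, 0.3), (0.3, 0.3)]  # assumed noncooperative payoffs J_i^{jN}


def criteria(u1: float, u2: float):
    """Hypothetical vector payoffs (J_i^1, J_i^2) for players i = 1, 2."""
    quality = 1.0 - (u1 + u2) / 2.0
    return [(u1, quality), (u2, quality)]


def bargaining_objective(J, JN):
    """Sum over players of prod_j (J_i^j - J_i^{jN})."""
    return sum(math.prod(a - b for a, b in zip(Ji, JNi)) for Ji, JNi in zip(J, JN))


def individually_rational(J, JN) -> bool:
    return all(a >= b for Ji, JNi in zip(J, JN) for a, b in zip(Ji, JNi))


grid = [k / 100 for k in range(101)]
candidates = [(u1, u2) for u1, u2 in itertools.product(grid, grid)
              if individually_rational(criteria(u1, u2), J_N)]
u_c = max(candidates, key=lambda u: bargaining_objective(criteria(*u), J_N))
value = bargaining_objective(criteria(*u_c), J_N)
```

With $S = u_1 + u_2$, the objective reduces to $(S - 0.6)(0.7 - S/2)$, which is maximized at $S = 1$ with value $0.08$. Any split with $u_1, u_2 \ge 0.3$ is optimal, so this toy problem has many solutions and the grid search returns one of them.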

Dynamic Stability of Cooperative Solution
Classically, the solution optimality principle for a cooperative game includes (1) an agreement on a set of cooperative controls and (2) a mechanism for distributing the total payoff among the players. In the cooperative setting, the players seek a set of strategies that yields a Pareto optimal solution; hence, they maximize the sum of their individual payoffs. To determine the share of each player in the total payoff, called the imputation, solution concepts such as the NM-solution, the core, and the Shapley value are applied; see [25,26,27]. To construct the imputation of the cooperative game, the characteristic function reflecting the payoff of every coalition of players has to be determined. There are several approaches to defining the characteristic function, for example, the α-, β-, and γ-characteristic functions and others (see [8,25,28,29,30] for details).
In contrast to the classical approach, the cooperative behavior construction presented above requires no distribution of the total cooperative payoff among the players. Indeed, the players jointly seek a set of strategies that optimizes their individual payoffs presented as Nash products. Hence, neither the characteristic function nor the imputation is required. Note that the problems of constructing coalitions and of their stability in multicriteria dynamic games have also been considered; see [31,32]. In coalition games, naturally, the characteristic function and the imputation have to be determined. However, in this paper we are not concerned with coalition formation, and the players' cooperative payoffs for the whole game can be calculated without any imputation as

$$V_i^{jc} = \sum_{t=0}^{m} \delta^t g_i^j(u_t^c), \quad i \in N, \; j = 1, \ldots, k,$$

where $u_t^c = (u_{1t}^c, \ldots, u_{nt}^c)$ are the cooperative strategies determined in (5). Similarly, we determine the cooperative payoffs $J_i^c(t) = (J_i^{1c}(t), \ldots, J_i^{kc}(t))$, $i = 1, \ldots, n$, for every subgame starting from the state $x_t^c$ at time $t$. As is well known, the Nash bargaining scheme is not dynamically stable [9]. To stabilize the cooperative solution in multicriteria dynamic games, we adopt the idea of the imputation distribution procedure [7,10,18,19].

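Definition 3. A vector $\beta(t) = (\beta_1(t), \ldots, \beta_n(t))$, where $\beta_i(t) = (\beta_i^1(t), \ldots, \beta_i^k(t))$, is a payoff distribution procedure (PDP) if

$$J_i^c(0) = \sum_{t=0}^{m} \delta^t \beta_i(t), \quad i = 1, \ldots, n, \tag{6}$$

or in extended form,

$$\left(J_i^{1c}(0), \ldots, J_i^{kc}(0)\right) = \left(\sum_{t=0}^{m} \delta^t \beta_i^1(t), \ldots, \sum_{t=0}^{m} \delta^t \beta_i^k(t)\right).$$

The main idea of this scheme is to distribute the cooperative gain along the game path. Then $\beta_i(t)$ can be interpreted as the payment to player $i$ in all criteria at time $t$, $i = 1, \ldots, n$.

Definition 4. A vector $\beta(t) = (\beta_1(t), \ldots, \beta_n(t))$ is a time-consistent PDP if, for every $t = 0, \ldots, m$,

$$J_i^c(t) = \sum_{\tau=t}^{m} \delta^{\tau - t} \beta_i(\tau), \quad i = 1, \ldots, n, \tag{7}$$

or in extended form, $J_i^{jc}(t) = \sum_{\tau=t}^{m} \delta^{\tau - t} \beta_i^j(\tau)$, $j = 1, \ldots, k$. Here the players following the cooperative trajectory are guided by the same approach (5) to determining optimal behavior at each current time and hence have no reason to deviate from the cooperative agreement.

Theorem 1. The vector $\beta(t) = (\beta_1(t), \ldots, \beta_n(t))$, where

$$\beta_i(t) = J_i^c(t) - \delta J_i^c(t+1), \quad t = 0, \ldots, m, \quad J_i^c(m+1) = 0, \tag{8}$$

is a time-consistent PDP.

Proof. We give the proof for the first player; for the other players it is similar. Conditions (6) of Definition 3 are satisfied, since the sum telescopes:

$$\sum_{t=0}^{m} \delta^t \beta_1(t) = \sum_{t=0}^{m} \left(\delta^t J_1^c(t) - \delta^{t+1} J_1^c(t+1)\right) = J_1^c(0).$$

Let us prove that this vector is a time-consistent PDP (7). This follows from the equalities

$$\sum_{\tau=t}^{m} \delta^{\tau - t} \beta_1(\tau) = \sum_{\tau=t}^{m} \left(\delta^{\tau - t} J_1^c(\tau) - \delta^{\tau - t + 1} J_1^c(\tau + 1)\right) = J_1^c(t),$$

and similarly for the other players.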
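The telescoping argument can be checked numerically. The Python sketch below assumes the standard discrete-time form of the procedure, $\beta(t) = J^c(t) - \delta J^c(t+1)$ with $J^c(m+1) = 0$. It verifies that the discounted remaining payments $\sum_{\tau=t}^{m} \delta^{\tau-t}\beta(\tau)$ reproduce $J^c(t)$ from every time $t$, including $t = 0$, for one player and one criterion. The subgame cooperative payoffs `Jc` are made-up numbers. Because the bargaining solution is recomputed in every subgame, $J^c(t)$ need not equal the tail of the original cooperative plan, and the payment scheme is what makes the two agree along the path.

```python
from typing import List, Sequence


def pdp(Jc: Sequence[float], delta: float) -> List[float]:
    """beta(t) = Jc(t) - delta * Jc(t+1), t = 0..m, with Jc(m+1) = 0 (assumed form)."""
    extended = list(Jc) + [0.0]
    return [extended[t] - delta * extended[t + 1] for t in range(len(Jc))]


def tail_value(beta: Sequence[float], delta: float, t: int) -> float:
    """sum_{tau=t}^{m} delta^(tau - t) * beta(tau): what the PDP pays from time t on."""
    return sum(delta ** (tau - t) * beta[tau] for tau in range(t, len(beta)))


# Hypothetical subgame cooperative payoffs J_1^{1c}(t), t = 0..3 (m = 3).
Jc = [10.0, 7.0, 4.0, 1.5]
delta = 0.9
beta = pdp(Jc, delta)

# Time consistency: from every t the remaining payments reproduce Jc(t);
# t = 0 is the plain PDP condition.
for t in range(len(Jc)):
    assert abs(tail_value(beta, delta, t) - Jc[t]) < 1e-12
```

For these numbers the payments are $\beta = (3.7, 3.4, 2.65, 1.5)$. With $k$ criteria, the same function would be applied to each component $J_i^{jc}(t)$ separately.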

Conditions for Rational Behavior
This section considers the conditions for maintaining cooperative (rational) behavior in dynamic games. Since some irrational players may break the cooperation, D.W.K. Yeung [20] introduced a condition that protects the players against losses in this case.

Definition 5. The imputation $\xi = (\xi_1, \ldots, \xi_n)$ satisfies the irrational-behavior-proofness condition [20] if

$$\sum_{\tau=0}^{t} \delta^\tau \beta_i(\tau) + \delta^{t+1} V(i, t+1) \ge V(i, 0), \quad i \in N, \tag{9}$$

for all $t \ge 0$, where $\beta(t) = (\beta_1(t), \ldots, \beta_n(t))$ is the time-consistent imputation distribution procedure and $V(i, t)$ is the noncooperative payoff of player $i$ in the subgame starting at time $t$, $i \in N$.
If this condition is satisfied, then each player is irrational-behavior-proof because irrational actions that break the cooperative agreement will not bring his payoff below the initial noncooperative payoff.
For discrete-time problems, a new condition that is stronger than Yeung's condition and easier to verify was introduced in [21,22].

Definition 6. The imputation $\xi = (\xi_1, \ldots, \xi_n)$ satisfies the each step rational behavior condition if

$$\beta_i(t) + \delta V(i, t+1) \ge V(i, t), \quad i \in N, \tag{10}$$

for all $t \ge 0$, where $\beta(t) = (\beta_1(t), \ldots, \beta_n(t))$ is the time-consistent imputation distribution procedure and $V(i, t)$ is the noncooperative payoff of player $i$, $i \in N$.
The proposed condition offers an incentive to each player to maintain cooperation because at every step she gains more from cooperation than from noncooperative behavior.
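The statement that the each step condition is stronger than Yeung's condition can be verified with a telescoping sum. Assume the discrete-time forms $\beta_i(\tau) + \delta V(i, \tau+1) \ge V(i, \tau)$ for the each step condition and $\sum_{\tau=0}^{t} \delta^\tau \beta_i(\tau) + \delta^{t+1} V(i, t+1) \ge V(i, 0)$ for Yeung's condition. Multiply the each step inequality by $\delta^\tau > 0$ and sum over $\tau = 0, \ldots, t$:

$$
\begin{aligned}
0 &\le \sum_{\tau=0}^{t} \delta^\tau \left(\beta_i(\tau) + \delta V(i, \tau+1) - V(i, \tau)\right) \\
  &= \sum_{\tau=0}^{t} \delta^\tau \beta_i(\tau) + \sum_{\tau=0}^{t} \left(\delta^{\tau+1} V(i, \tau+1) - \delta^\tau V(i, \tau)\right) \\
  &= \sum_{\tau=0}^{t} \delta^\tau \beta_i(\tau) + \delta^{t+1} V(i, t+1) - V(i, 0).
\end{aligned}
$$

The last line is Yeung's condition at time $t$.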
Here, we adapt the rationality conditions to dynamic multicriteria games. Since the approach presented above requires no imputation, we restate the definitions in terms of the multicriteria cooperative solution.

Definition 7.
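The multicriteria cooperative solution $J^c(t) = (J_1^c(t), \ldots, J_n^c(t))$ satisfies the irrational-behavior-proofness condition if

$$\sum_{\tau=0}^{t} \delta^\tau \beta_i(\tau) + \delta^{t+1} J_i^N(t+1) \ge J_i^N(0), \quad i \in N, \tag{11}$$

for all $t \ge 0$, where $\beta(t) = (\beta_1(t), \ldots, \beta_n(t))$ is the time-consistent payoff distribution procedure (8) and $J_i^N(t)$ is the noncooperative payoff (4) of player $i$, $i \in N$. In extended form,

$$\sum_{\tau=0}^{t} \delta^\tau \beta_i^j(\tau) + \delta^{t+1} J_i^{jN}(t+1) \ge J_i^{jN}(0), \quad j = 1, \ldots, k.$$

Definition 8. The multicriteria cooperative solution $J^c(t) = (J_1^c(t), \ldots, J_n^c(t))$ satisfies the each step rational behavior condition if

$$\beta_i(t) + \delta J_i^N(t+1) \ge J_i^N(t), \quad i \in N, \tag{12}$$

for all $t \ge 0$, where $\beta(t) = (\beta_1(t), \ldots, \beta_n(t))$ is the time-consistent payoff distribution procedure (8) and $J_i^N(t)$ is the noncooperative payoff (4) of player $i$, $i \in N$. In extended form,

$$\beta_i^j(t) + \delta J_i^{jN}(t+1) \ge J_i^{jN}(t), \quad j = 1, \ldots, k.$$

For problem (1), (2), the conditions for rational behavior (11) and (12) can be rewritten as inequalities (13) and (14), respectively. Since the presented approach to constructing cooperative behavior satisfies the individual rationality conditions, the first parts of both inequalities are nonnegative. Hence, the each step rational behavior condition is fulfilled if $g_i(u_t^c) - g_i(u_t^N) \ge 0$ for all $t$ and $i \in N$. As is easily seen, the each step rational behavior condition implies Yeung's condition, so in this case the irrational-behavior-proofness condition is fulfilled as well.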
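In the multicriteria setting both conditions are checked componentwise, one criterion at a time. The Python sketch below does this for a single player. It assumes the forms $\beta_i^j(t) + \delta J_i^{jN}(t+1) \ge J_i^{jN}(t)$ and $\sum_{\tau \le t} \delta^\tau \beta_i^j(\tau) + \delta^{t+1} J_i^{jN}(t+1) \ge J_i^{jN}(0)$ with $J_i^{jN}(m+1) = 0$. The payments and noncooperative subgame payoffs are invented numbers. The second payment schedule shows a case where Yeung's condition holds but the each step condition fails.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]  # rows indexed by time t, columns by criterion j


def each_step_ok(beta: Matrix, JN: Matrix, delta: float, tol: float = 1e-12) -> bool:
    """beta[t][j] + delta * JN[t+1][j] >= JN[t][j] for all t = 0..m and all criteria j.

    beta has m+1 rows; JN has m+2 rows, the last one being JN(m+1) = 0.
    """
    return all(beta[t][j] + delta * JN[t + 1][j] >= JN[t][j] - tol
               for t in range(len(beta)) for j in range(len(beta[t])))


def yeung_ok(beta: Matrix, JN: Matrix, delta: float, tol: float = 1e-12) -> bool:
    """sum_{tau<=t} delta^tau beta[tau][j] + delta^(t+1) JN[t+1][j] >= JN[0][j] for all t, j."""
    for j in range(len(beta[0])):
        paid = 0.0
        for t in range(len(beta)):
            paid += delta ** t * beta[t][j]
            if paid + delta ** (t + 1) * JN[t + 1][j] < JN[0][j] - tol:
                return False
    return True


delta = 0.9
# Hypothetical noncooperative subgame payoffs J_i^{jN}(t), t = 0..3, k = 2 criteria (m = 2).
JN = [[5.0, 2.0], [3.5, 1.5], [2.0, 1.0], [0.0, 0.0]]

beta_good = [[2.0, 0.8], [1.8, 0.7], [2.0, 1.0]]   # satisfies every step
beta_front = [[3.0, 0.8], [1.0, 0.7], [2.0, 1.0]]  # front-loaded payment in criterion 1
```

For `beta_front`, the early payment keeps the cumulative condition satisfied. At $t = 1$, however, $1.0 + 0.9 \cdot 2.0 = 2.8 < 3.5$, so a player comparing only the next step would prefer to deviate.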
Next, to illustrate the suggested concepts, we consider a dynamic bi-criteria model related to the bioresource management (harvesting) problem.

Dynamic Bi-Criteria Resource Management Problem
Consider a bi-criteria discrete-time dynamic bioresource management model with many players. Let n players (countries or firms) exploit a bioresource on a finite time horizon $[0, m]$. The population evolves according to the equation

$$x_{t+1} = \varepsilon x_t - \sum_{i=1}^{n} u_{it}, \quad x_0 = x, \tag{15}$$

where $x_t \ge 0$ is the population size at time $t \ge 0$, $\varepsilon \ge 1$ denotes the natural birth rate, and $u_{it} \ge 0$ specifies the catch strategy of player $i$ at time $t \ge 0$, $i \in N = \{1, \ldots, n\}$.
Each player seeks to achieve two goals: to maximize the profit from resource sales and to minimize the catching costs. We assume that the players have different market prices but identical costs, which depend quadratically on each player's exploitation rate. The vector payoffs of the players on the finite planning horizon take the form (16), where, for $i \in N$, $p_i \ge 0$ is the market price of the resource for player $i$, $c \ge 0$ indicates the catching cost, and $\delta \in (0, 1)$ denotes the discount factor.
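We read the linear dynamics as $x_{t+1} = \varepsilon x_t - \sum_i u_{it}$. Under this reading, a symmetric linear catch rule $u_{it} = \gamma x_t$ gives $x_{t+1} = (\varepsilon - n\gamma) x_t$. The population therefore stays constant exactly when $\gamma = (\varepsilon - 1)/n$, the asymptotic strategy value reported for both equilibria in the numerical illustration. The Python sketch below checks this. A constant coefficient $\gamma$ and the values $\varepsilon = 1.1$, $n = 4$, $m = 10$ are hypothetical simplifications, since the equilibrium coefficients in the paper depend on the stage. The initial population $x_0 = 50{,}000$ matches the one used in the figures.

```python
from typing import List


def population_path(x0: float, eps: float, n: int, gamma: float, m: int) -> List[float]:
    """x_{t+1} = eps * x_t - n * gamma * x_t: n symmetric players each catch gamma * x_t."""
    xs = [x0]
    for _ in range(m):
        total_catch = n * gamma * xs[-1]
        xs.append(eps * xs[-1] - total_catch)
    return xs


eps, n, x0, m = 1.1, 4, 50_000.0, 10  # hypothetical parameters (x0 as in the figures)
stationary = (eps - 1.0) / n           # gamma that exactly offsets natural growth

flat = population_path(x0, eps, n, stationary, m)
growing = population_path(x0, eps, n, 0.5 * stationary, m)  # catching less lets the stock grow
```

Halving the catch coefficient leaves a net growth factor of $\varepsilon - n\gamma/2 = 1.05$ per stage. This is the mechanism behind the observation that more sparing exploitation lets the population recover.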

Multicriteria Nash Equilibrium
First, we construct the guaranteed payoffs using one of the modifications from [2]. The guaranteed payoff points $G_1^1, \ldots, G_n^1$ are defined as the Nash equilibrium payoffs in the game $\langle x, N, \{U_i\}_{i \in N}, \{J_i^1\}_{i \in N} \rangle$ with dynamics (15). Applying the Bellman principle and assuming linear forms of the strategies and value functions, we obtain the Nash equilibrium strategies, the corresponding state dynamics, and then the guaranteed payoff points $G_1^1, \ldots, G_n^1$. Similarly, determining the Nash equilibrium in the game with the second criterion of all players, $\langle x, N, \{U_i\}_{i \in N}, \{J_i^2\}_{i \in N} \rangle$, yields $n$ more guaranteed payoff points $G_1^2, \ldots, G_n^2$.

In accordance with Definition 1, to design the multicriteria Nash equilibrium of the game (15), (16), the following problem has to be solved:

$$\left(J_i^1\left(x, (u_{-i}^N, u_i)\right) - G_i^1\right)\left(J_i^2\left(x, (u_{-i}^N, u_i)\right) - G_i^2\right) \to \max_{u_i}, \quad i \in N.$$

Considering the process from the one-stage game up to the m-stage one and seeking the strategies in linear form, we obtain the multicriteria Nash equilibrium.

Proposition 1. The multicriteria Nash equilibrium strategies in problem (15), (16) are linear in the population size. The players' strategy at the last stage, $\gamma_1^N$, is determined from a separate equation.

Cooperative Equilibrium
To construct the cooperative payoffs and strategies, we apply the modified bargaining scheme [5]. First, we determine the noncooperative payoffs as those gained by the players using the multicriteria Nash strategies. Then, we construct the sum of the Nash products with the noncooperative payoffs of the players acting as the status quo points.
In view of Proposition 1, the noncooperative payoffs $J_i^{1N}$ and $J_i^{2N}$ can be written in closed form. In accordance with Definition 2, to design the multicriteria cooperative equilibrium, the following problem has to be solved:

$$\sum_{i=1}^{n} \left(J_i^1(x, u) - J_i^{1N}\right)\left(J_i^2(x, u) - J_i^{2N}\right) \to \max_{u}.$$

Considering the process from the one-stage game up to the m-stage one and seeking the strategies in linear form, we construct the cooperative behavior.

Proposition 2.
The multicriteria cooperative equilibrium strategies in problem (15), (16) are linear in the population size. The players' strategy at the last stage, $\gamma_1^c$, is determined from a separate equation.

Dynamic Stability and Conditions for Rational Behavior
Proposition 3. In problem (15), (16), the time-consistent payoff distribution procedure is obtained explicitly by substituting the cooperative strategies of Proposition 2 into the procedure of Theorem 1.

Proof. The statement follows from Theorem 1 and the form of the cooperative strategies given in Proposition 2.

Proposition 4. In problem (15), (16), the conditions for rational behavior are fulfilled.

Proof. In problem (15), (16), the irrational-behavior-proofness condition (13) and the each step rational behavior condition take explicit forms. Let us consider the each step rational behavior condition for the first criterion. Since the individual rationality conditions are fulfilled, the first part of the inequality is positive. Hence, only the sign of $u_{it}^c - u_{it}^N$ needs to be checked. In accordance with Propositions 1 and 2, this difference can be written explicitly as (23), and the right-hand side of (23) is nonnegative. Since the each step rational behavior condition is stronger than Yeung's condition, this implies the fulfillment of the irrational-behavior-proofness condition.
The parameter values used in the numerical simulation are typical for the fish species in a Karelian lake [33]. The natural growth function of the population was estimated in [22,34,35], and its linear approximation with the appropriate parameter $\varepsilon$ is applied in this paper. It should be stressed that the price and cost parameters do not influence the form of the players' strategies; hence, they can take any values.
The figures illustrate the theoretical results. Figure 1 shows the dynamics of the population size, while Figure 2 presents the players' strategies in the noncooperative and cooperative cases. As one can notice, cooperative behavior improves the ecological situation, since it limits bioresource exploitation. The population size increases in both settings, but much faster under cooperation (from $x_0 = 50{,}000$ to 110,000). Moreover, as Figure 2 shows, cooperative behavior is beneficial for the players. To emphasize the last conclusion, the instantaneous payoffs $\delta^t g_1^1(t)$ in the noncooperative and cooperative settings are presented in Figure 3. As is easily seen, the players' cooperative strategies (the catch) are larger than the noncooperative ones, and some convergence can be noticed at the end of the planning horizon. This is related to the fact that the asymptotic values of the players' strategies in both cases ($\gamma_t^N$, $\gamma_t^c$) are $(\varepsilon - 1)/n$. The instantaneous payoffs decrease in both settings because of discounting, but much more slowly under cooperation (from 60,000 to 4000 monetary units). Since the player's strategy at the last stage under cooperation is larger than the noncooperative one, the conditions for rational behavior are fulfilled. Figure 4 shows how the cooperative gain is distributed along the game path (PDP $\beta_1^1(t)$). Interestingly, the PDP differs only slightly from the instantaneous payoffs. Note that changing the number of players, the time horizon, and other parameters yields similar pictures, which are therefore not presented.

Conclusions
The problem of dynamic stability in multicriteria dynamic games with a finite horizon has been investigated. First, we have evaluated the multicriteria Nash equilibrium strategies. Second, we have constructed the multicriteria cooperative strategies and payoffs via the modified bargaining scheme. We have adapted the concept of dynamic stability to multicriteria dynamic games and constructed the payoff distribution procedure. The conditions for rational behavior have been modified for dynamic multicriteria games.
The approaches presented in the paper make it possible to find optimal solutions in various multicriteria dynamic games. As one possible application, we studied a bi-criteria discrete-time bioresource management problem in which the players differ in their aims. The multicriteria Nash and cooperative equilibrium strategies have been derived analytically in linear form. Hence, they can be directly applied to concrete populations with different parameter values. As cooperative behavior improves the ecological situation, the dynamic stability concept has been applied to stabilize the cooperative agreement. The time-consistent payoff distribution procedure has also been derived analytically, and the fulfillment of the conditions for rational behavior has been proved.
The presented theoretical constructions can be applied to various management problems in which the decision maker often has several criteria to optimize, for example, maximizing profit while minimizing the production cost or the labor involved in manufacturing. Moreover, the constructed payoff distribution procedure gives an incentive to maintain the cooperative agreement, which is extremely important for management problems with common resources. Hence, the results presented in this paper can be applied in biological, economic, and social game-theoretic models with vector payoffs.
Funding: This research was supported by the Shandong Province "Double-Hundred Talent Plan" (No. WST2017009) and by the Russian Science Foundation (No. 17-11-01079), which supported the study of dynamic stability.

Conflicts of Interest:
The author declares no conflict of interest.
Appendix A

Similarly, in the case where $\lambda_1 > 0$, $\lambda_2 = 0$, we naturally arrive at a contradiction.

3. Finally, consider the case $\lambda_1 = 0$, $\lambda_2 = 0$. Similarly, it is easy to check that $u_1 > 0$; hence, the minimum is achieved at an interior point and can be found from the first-order optimality condition. Here, the value of the goal function is less than zero.
The same holds for the other players. Thus, the scheme presented above guarantees that the solution satisfies the conditions $J_i^j \ge G_i^j$, $i \in N$, $j = 1, 2$.