Mean-Field Type Games between Two Players Driven by Backward Stochastic Differential Equations

In this paper, mean-field type games between two players with backward stochastic dynamics are defined and studied. They make up a class of non-zero-sum, non-cooperating, differential games where the players’ state dynamics solve backward stochastic differential equations (BSDE) that depend on the marginal distributions of player states. Players try to minimize their individual cost functionals, also depending on the marginal state distributions. Under some regularity conditions, we derive necessary and sufficient conditions for existence of Nash equilibria. Player behavior is illustrated by numerical examples, and is compared to a centrally planned solution where the social cost, the sum of player costs, is minimized. The inefficiency of a Nash equilibrium, compared to socially optimal behavior, is quantified by the so-called price of anarchy. Numerical simulations of the price of anarchy indicate how the improvement in social cost achievable by a central planner depends on problem parameters.


Related Work
Pontryagin's maximum principle is, alongside dynamic programming, one of the main tools for characterizing optimal controls in both deterministic and stochastic settings. It applies not only to standard stochastic systems but also generalizes to optimal stopping, singular controls, risk-sensitive controls and partially observed models. Pontryagin's maximum principle yields necessary conditions that must be satisfied by any optimal control.

Potential Applications of MFTG with Mean-Field BSDE Dynamics
In [31], a model is proposed for groups of pedestrians, such as delivery and emergency personnel, moving towards targets they are forced to reach. The strict terminal condition leads to a dynamic model for crowd motion in which the state dynamics is a mean-field BSDE. Mean-field effects appear in pedestrian crowd models as approximations of aggregate human interaction, so the game would in fact be an MFTG [32]. A game between such groups is of interest, since it can serve as a tool for decentralized decision making under conflicting interests. Other areas of application include financial investment strategies, where future conditions are often specified [8,33], leading to dynamic models that include BSDEs. The already mentioned study [1] presents a lengthy list of applications of forward MFTGs in engineering sciences.

Paper Contribution and Outline
In this paper, control of mean-field BSDEs is extended to games between players whose state dynamics are mean-field BSDEs. Such games are in fact MFTGs, since the distribution of each player's state is affected by both players' choice of strategy. Our MFTG could be viewed as a game between mean-field FBSDEs, where the backward equation is the state equation and the forward equation is pure noise. A Pontryagin-type stochastic maximum principle (SMP) is derived, resulting in a verification theorem and conditions for existence of a Nash equilibrium. This solution approach is similar to that of [23,24,28]. The use of spike-perturbation requires minimal assumptions on the set of admissible controls, and differentiation of functions of probability measures makes it possible to go beyond linear-quadratic mean-field cost and dynamics. The state BSDE is not converted to a forward optimization problem in the spirit of [25]. As a consequence, the adjoint equation in our SMP is a forward SDE. For the sake of comparison, optimality conditions for the cooperative situation are derived. In this setting, the players work together to optimize the social cost, which is the sum of the player costs. The approach used is a straightforward adaptation of the techniques used in control of SDEs of mean-field type; again, we do not need to take the route via an equivalent forward optimization problem to solve the backward MFTC problem. This cooperative game is an MFTC problem, and our result here is essentially a special case of the FBSDE results in [27] or [26] mentioned above, although mean-field terms are present. Numerical simulations are done in the linear-quadratic case, which is explicitly solvable up to a system of ODEs. The examples pinpoint differences between player behavior in the game and in the centrally planned solution.
The ratio of the social cost in the game equilibrium to the optimal social cost quantifies the efficiency of the game. It was first studied in [34] for traffic coordination on networks under the name coordination ratio, and was later renamed the price of anarchy in [35]. We observe that a high cost on large control values, or on deviations from a preferred initial position, makes the problem stiffer, in the sense that the improvement achievable by team optimality decreases, while a high cost on mean-field related terms makes the problem less stiff.
Games 2018, 9, 88
The rest of this paper is organized as follows. The problem formulation is given in Section 2. Sections 3 and 4 deal with necessary and sufficient conditions for Nash equilibria and social optima, respectively; maximum principles for the MFTG and the MFTC are derived. An LQ problem is solved explicitly in Section 5, and numerical results are presented. The paper concludes with some remarks on possible extensions in Section 6, followed by an appendix containing proofs.
$(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$: the underlying filtered probability space.
$\mathcal{L}(X)$: the distribution of a random variable $X$ under $\mathbb{P}$.
$\mathcal{U}_i$: the set of admissible controls for player $i$.
$\mathcal{P}(\mathcal{X})$: the set of probability measures on $\mathcal{X}$.
$\mathcal{P}_2(\mathcal{X})$: the set of probability measures on $\mathcal{X}$ with finite second moment.
$\Theta^i_t$: the $t$-marginal of the state-, law- and control-tuple of player $i$.
$\|Z\|_F$: the trace (Frobenius) norm of the matrix $Z$.
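The trace norm listed above can be checked numerically. The following minimal sketch, assuming NumPy is available (the helper name `trace_norm` is ours, not the paper's), computes $\mathrm{tr}(ZZ^*)^{1/2}$ directly and compares it with NumPy's built-in Frobenius norm.

```python
import numpy as np


def trace_norm(Z: np.ndarray) -> float:
    """Frobenius (trace) norm ||Z||_F = tr(Z Z^*)^{1/2} of a real or complex matrix."""
    # Z.conj().T is the adjoint Z^*; for a real Z it is the ordinary transpose.
    return float(np.sqrt(np.trace(Z @ Z.conj().T).real))


# Example: a d x (2 d1 + 2 d2) matrix with d = 2 and d1 = d2 = 1.
Z = np.array([[1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 2.0, 0.0]])
print(trace_norm(Z))             # 3.0
print(np.linalg.norm(Z, "fro"))  # 3.0, NumPy's built-in Frobenius norm
```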
Let $T > 0$ be a finite real number representing the time horizon of the game. Consider a filtered probability space $(\Omega, \mathcal{F}, \{\mathcal{F}_t\}_{t \geq 0}, \mathbb{P})$ on which two independent standard Brownian motions $W^1_\cdot$ and $W^2_\cdot$, $d_1$- and $d_2$-dimensional respectively, are defined. Additionally, $y^1_T, y^2_T \in L^2_{\mathcal{F}_T}(\Omega; \mathbb{R}^d)$ and an $\mathcal{F}_0$-measurable $\xi$ are defined on the space. We assume that these five random objects are independent and that they generate the filtration $\mathbb{F} := \{\mathcal{F}_t\}_{t \geq 0}$. Notice that $\xi$ makes $\mathcal{F}_0$ non-trivial. Let $\mathcal{G}$ be the σ-algebra on $[0, T] \times \Omega$ of $\mathcal{F}_t$-progressively measurable sets. For $k \geq 1$, let $\mathcal{S}^{2,k}$ be the set of $\mathbb{R}^k$-valued, continuous, $\mathcal{G}$-measurable processes $X_\cdot$ such that $\mathbb{E}[\sup_{t \in [0,T]} |X_t|^2] < \infty$, and let $\mathcal{H}^{2,k}$ be the set of $\mathbb{R}^k$-valued $\mathcal{G}$-measurable processes $X_\cdot$ such that $\mathbb{E}[\int_0^T |X_t|^2 \, dt] < \infty$. The distribution of any random variable $\xi \in \mathcal{X}$ will be denoted by $\mathcal{L}(\xi) \in \mathcal{P}(\mathcal{X})$, and $-i$ will denote the index $\{1, 2\} \setminus i$. Given a pair of controls $(u^1_\cdot, u^2_\cdot) \in \mathcal{U}_1 \times \mathcal{U}_2$, consider the system of controlled BSDEs (4), where $S := \mathbb{R}^d \times \mathcal{P}(\mathbb{R}^d)$ is equipped with the norm $\|(y, \mu)\|_S := |y| + d_2(\mu)$, $d_2$ being the 2-Wasserstein metric on $\mathcal{P}(\mathbb{R}^d)$. The space $\mathbb{R}^{d \times (2d_1 + 2d_2)}$ is equipped with the trace norm $\|Z\|_F = \mathrm{tr}(ZZ^*)^{1/2}$. Note that if $X$ is a square integrable random variable in $\mathbb{R}^d$, then $d_2(\mathcal{L}(X)) < \infty$ and $\mathcal{L}(X) \in \mathcal{P}_2(\mathbb{R}^d)$, the space of measures with finite $d_2$-norm.

The martingale representation theorem then gives existence of a unique process … plays the role of the projection, and without it $Y^i_\cdot$ would not be $\mathcal{G}$-measurable. Hence the noise $(W^1_\cdot, W^2_\cdot)$ generating the filtration is common to both players, and $[Z^{i,1}_\cdot, Z^{i,2}_\cdot]$, $i = 1, 2$, is their respective reaction to it.
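The norm on $S$ involves the 2-Wasserstein metric $d_2$. As a hedged illustration only, restricted to $d = 1$ and to two empirical measures with the same number of equally weighted atoms (assumptions of this sketch, not of the paper), $d_2$ can be computed by pairing sorted samples, since in one dimension the monotone coupling is optimal for quadratic cost. The function name `wasserstein2_1d` is hypothetical.

```python
import numpy as np


def wasserstein2_1d(x, y) -> float:
    """2-Wasserstein distance between two empirical measures on R with equally many atoms."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes samples of equal size")
    # In one dimension the monotone (sorted) coupling is optimal for quadratic cost.
    return float(np.sqrt(np.mean((x - y) ** 2)))


print(wasserstein2_1d([0.0, 1.0], [1.0, 2.0]))  # 1.0 (pure translation by 1)
print(wasserstein2_1d([0.0, 1.0], [1.0, 0.0]))  # 0.0 (same measure, atoms reordered)
```

In higher dimensions no such closed form exists in general, and an optimal transport solver would be needed instead.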
Player $i$ may actually be affected by all the noise in the filtration even if only some components of $(W^1_\cdot, W^2_\cdot)$ … is that it is a second control of player $i$: first she plays $u^i_\cdot$ to heed preferences on energy use, initial position, etc., then she picks $[Z^{i,1}_\cdot, Z^{i,2}_\cdot]$ so that her path to $y^i_T$ is the optimal prediction based on the information available in the filtration at any given time. The component $b^i$ in (4) acts as a velocity in.
Existence and uniqueness of solutions to (4) follow from a slight variation of the results of [10], where the one-dimensional case is treated. For the $d$-dimensional case without mean-field terms, see [36].

Assumption 3. For any pair of controls
The Mean-field Type Game (MFTG): find the Nash equilibrium controls of (9).
The Mean-field Type Control Problem (MFTC): find the optimal control pair of (10).

In the game, each player assumes that the other player acts rationally, i.e., minimizes cost, and picks her control as the best response to that. This leads to a set of two inequalities characterizing any control pair $(u^1_\cdot, u^2_\cdot)$ that constitutes a Nash equilibrium. In this paper, each player is aware of the other player's control set, best response function and state dynamics. Therefore, even though the decision process is decentralized, both players solve the same set of inequalities. When the Nash equilibrium is not unique and the players do not communicate, it is ambiguous which equilibrium strategy to play.

In the control problem, a central planner decides which strategies both players play. The central planner might simply be the two players cooperating towards a common goal, or some superior decision maker. The goal is to find the control pair that minimizes the social cost $J$. This notion of a centrally planned/cooperative solution is related to the concept of team optimality in team problems [37]. In a team problem, the players share a common objective, and a team-optimal solution is the solution to the joint minimization of that objective. In our case, the social cost $J$ is the common objective of the MFTC. The Nash solution to the team problem is given by the control pair $(\hat u^1_\cdot, \hat u^2_\cdot)$ that satisfies the two inequalities
$$J(\hat u^1_\cdot, \hat u^2_\cdot) \leq J(u^1_\cdot, \hat u^2_\cdot) \quad \forall u^1_\cdot \in \mathcal{U}_1, \qquad J(\hat u^1_\cdot, \hat u^2_\cdot) \leq J(\hat u^1_\cdot, u^2_\cdot) \quad \forall u^2_\cdot \in \mathcal{U}_2. \qquad (11)$$
In (11), each player minimizes the social cost with respect to her own control, under the assumption that the other player does the same. This is the so-called player-by-player optimality of a control pair in a team problem. Notice that if both player costs in (9) are set equal to the social cost $J$, the MFTG becomes a team problem. The solution to the MFTG (9) will then be the player-by-player optimal solution to the minimization of the social cost.
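The gap between player-by-player optimality (11) and team optimality can already be seen in a static finite team problem. The sketch below is a hypothetical discrete analogue, not the MFTC itself: `J` is an illustrative common cost matrix indexed by the two players' actions, and the helper names are ours. The pair (1, 1) satisfies both player-by-player inequalities, yet the team optimum is (0, 0).

```python
from itertools import product


def player_by_player_optima(J):
    """Action pairs (a1, a2) from which no single player can lower the common cost J."""
    n1, n2 = len(J), len(J[0])
    optima = []
    for a1, a2 in product(range(n1), range(n2)):
        best_for_1 = all(J[a1][a2] <= J[b1][a2] for b1 in range(n1))
        best_for_2 = all(J[a1][a2] <= J[a1][b2] for b2 in range(n2))
        if best_for_1 and best_for_2:
            optima.append((a1, a2))
    return optima


def team_optimum(J):
    """Action pair minimizing the common cost jointly over both players' actions."""
    pairs = product(range(len(J)), range(len(J[0])))
    return min(pairs, key=lambda a: J[a[0]][a[1]])


J = [[0, 2],
     [2, 1]]
print(player_by_player_optima(J))  # [(0, 0), (1, 1)]
print(team_optimum(J))             # (0, 0)
```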
By definition, the optimal social cost is no higher than the social cost in any Nash equilibrium. The ratio of the worst-case social cost in the game to the optimal social cost is called the price of anarchy, and we highlight it in the numerical simulations in Section 5, where we also observe behavioral differences between the MFTG and the MFTC given identical data.
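As a static, finite analogue of the price of anarchy (not the MFTG computation of Section 5), the sketch below enumerates the pure-strategy Nash equilibria of a two-player cost game by checking the two best-response inequalities directly, and divides the worst equilibrium social cost by the optimal social cost. The prisoner's-dilemma cost matrices are illustrative assumptions, and the sketch assumes the optimal social cost is positive.

```python
from itertools import product


def pure_nash_equilibria(C1, C2):
    """Pure-strategy Nash equilibria of a finite two-player cost game (both players minimize)."""
    n1, n2 = len(C1), len(C1[0])
    eq = []
    for a1, a2 in product(range(n1), range(n2)):
        no_gain_1 = all(C1[a1][a2] <= C1[b][a2] for b in range(n1))
        no_gain_2 = all(C2[a1][a2] <= C2[a1][b] for b in range(n2))
        if no_gain_1 and no_gain_2:
            eq.append((a1, a2))
    return eq


def price_of_anarchy(C1, C2):
    """Worst equilibrium social cost divided by the optimal social cost."""
    def social(a):
        return C1[a[0]][a[1]] + C2[a[0]][a[1]]

    eq = pure_nash_equilibria(C1, C2)
    if not eq:
        raise ValueError("no pure-strategy equilibrium")
    optimum = min(social(a) for a in product(range(len(C1)), range(len(C1[0]))))
    return max(social(a) for a in eq) / optimum


# Prisoner's dilemma written as costs; action 0 = cooperate, 1 = defect.
C1 = [[1, 3], [0, 2]]
C2 = [[1, 0], [3, 2]]
print(pure_nash_equilibria(C1, C2))  # [(1, 1)]
print(price_of_anarchy(C1, C2))      # 2.0
```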

Problem 1: MFTG
In this section, we derive necessary and sufficient equilibrium conditions for (9). Assuming that such a pair of controls exists, the conditions are derived by means of a Pontryagin-type stochastic maximum principle.
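Assume that $(\hat u^1_\cdot, \hat u^2_\cdot)$ is a Nash equilibrium for the MFTG, i.e., it satisfies the following system of inequalities, … Consider the first inequality, with $\bar u^{\varepsilon,1}_\cdot$ chosen as a spike-perturbation of $\hat u^1_\cdot$. That is, for $u_\cdot \in \mathcal{U}_1$,
$$\bar u^{\varepsilon,1}_t := \begin{cases} u_t, & t \in E_\varepsilon, \\ \hat u^1_t, & t \in [0, T] \setminus E_\varepsilon. \end{cases}$$
Here, $E_\varepsilon$ is any subset of $[0, T]$ of Lebesgue measure $\varepsilon$. Clearly, $\bar u^{\varepsilon,1}_\cdot \in \mathcal{U}_1$. When player 1 plays the spike-perturbed control $\bar u^{\varepsilon,1}_\cdot$ and player 2 plays the equilibrium control $\hat u^2_\cdot$, we denote the dynamics by … In this shorthand notation, which will be used from now on, the difference in performance is … Any derivative of $f : a \mapsto f(a)$ will be denoted $\partial_a f$, regardless of the spaces the function maps between.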
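On a time grid, the spike-perturbation above can be sketched as follows. This is a discretized illustration under our own assumptions: $E_\varepsilon$ is taken to be the interval $[t_0, t_0 + \varepsilon)$ (the text allows any set of Lebesgue measure $\varepsilon$), and the function name and control values are hypothetical.

```python
import numpy as np


def spike_perturbation(u_hat, u, t, t0, eps):
    """Control equal to u on E_eps = [t0, t0 + eps) and to u_hat elsewhere on the grid t.

    u_hat and u hold control values on the time grid t, with time along the first axis.
    """
    t = np.asarray(t, dtype=float)
    in_spike = (t >= t0) & (t < t0 + eps)  # indicator of the spike set E_eps
    # Broadcast the time mask over any extra (control-dimension) axes.
    mask = in_spike.reshape((-1,) + (1,) * (np.ndim(u_hat) - 1))
    return np.where(mask, u, u_hat)


t = np.arange(9) * 0.125   # grid 0, 0.125, ..., 1
u_hat = np.zeros_like(t)   # reference (equilibrium) control
u = np.ones_like(t)        # arbitrary admissible control
u_eps = spike_perturbation(u_hat, u, t, t0=0.25, eps=0.25)
print(u_eps)               # [0. 0. 1. 1. 0. 0. 0. 0. 0.]
```

The grid uses exact binary fractions so that the endpoints of the spike interval are not affected by floating-point rounding.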

Assumption 4. The functions
are for all $t$ a.s. uniformly bounded, and … A brief overview of differentiation of functions defined on $\mathcal{P}_2(\mathbb{R}^d)$ is found in Appendix A, and the notation ( … (21), this suggests that we need to introduce two first order variation processes. That is, we want … Let … a.s. for all $t \in [0, T]$.

Lemma 1. Let Assumptions 1, 2, 4 and 5 be in force. The first order variation processes that satisfy (22) are given by the following system of BSDEs, …

A proof is found in the appendix. By Lemma 1, … where the introduced costates $p^{1,j}_\cdot$, $j = 1, 2$, satisfy $p^{1,j}_0 := \partial_{y^j} \hat h^1_0 + \mathbb{E}[{}^*(\partial_{\mu^j} \hat h^1_0)]$. The notation ${}^*(\partial_{\mu^j} \hat h^1_0)$ is defined in (A10). Assumption 4 guarantees existence and uniqueness of solutions to Equation (27) below. … where for $(y^i, \mu^i) \in S$, $i = 1, 2$, and $(u^1, u^2, z) \in U_1 \times U_2 \times \mathbb{R}^{d \times (2d_1 + 2d_2)}$, $H^1(\omega, t, y^1, \mu^1, u^1, y^2, \mu^2, u^2, z, p^{1,1}_t, p^{1,2}_t)$ … Then the following duality relation holds, …

A proof of the lemma above is found in the appendix. We have that … By the expansion (30) and Lemma 1, … which yields … Therefore … From the last identity, we can derive necessary and sufficient conditions for player 1's best response to $\hat u^2_\cdot$. The same argument can be carried out for player 2's best response to $\hat u^1_\cdot$. Naturally, we need to impose the corresponding assumptions on player 2's control. For completeness and later reference, we now state the second player's version of Lemma 2.
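Because the state BSDE is not transformed into a forward problem, the costates $p^{1,j}_\cdot$ are specified at $t = 0$ and evolve forward in time. The coefficients of (27) are not reproduced here, so the following sketch assumes a hypothetical linear mean-field SDE $dp_t = (a p_t + b\,\mathbb{E}[p_t])\,dt + \sigma\,dW_t$ and simulates it with an Euler-Maruyama particle scheme, replacing $\mathbb{E}[p_t]$ by the empirical mean of the particles. The parameter names and values are illustrative only.

```python
import numpy as np


def simulate_mean_field_sde(p0, a, b, sigma, T=1.0, n_steps=1000, n_particles=20000, seed=0):
    """Euler-Maruyama particle approximation of dp = (a p + b E[p]) dt + sigma dW, p(0) = p0.

    The expectation E[p_t] is replaced by the empirical mean over the particles.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    p = np.full(n_particles, float(p0))
    for _ in range(n_steps):
        mean_p = p.mean()                               # empirical proxy for E[p_t]
        dW = rng.normal(0.0, np.sqrt(dt), n_particles)  # Brownian increments
        p = p + (a * p + b * mean_p) * dt + sigma * dW
    return p


p_T = simulate_mean_field_sde(p0=1.0, a=-0.5, b=0.2, sigma=0.3)
# The mean solves m' = (a + b) m, so E[p_T] = exp(-0.3), roughly 0.741.
print(p_T.mean(), np.exp(-0.3))
```

Since $m_t = \mathbb{E}[p_t]$ satisfies $m' = (a + b) m$ in this hypothetical model, comparing the particle mean with $e^{(a+b)T}$ gives a quick sanity check of the scheme.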

Theorem 2. [Necessary equilibrium conditions] Suppose that
is an equilibrium for the MFTG and that $p^{i,j}_\cdot$, $i, j = 1, 2$, solve (27) and (34). Then, for $i = 1, 2$, … Sending $\varepsilon$ to zero yields … The last inequality holds for all $A \in \mathcal{F}_s$, thus … By measurability of the integrand in (42), … The same argument yields …
Theorem 3.
Proof. By assumption, $\delta^i H^i(t) \leq 0$ for any spike variation, almost surely for a.e. $t$. Applying the convexity and concavity assumptions in the expansion steps results in the inequality

Problem 2: MFTC
Carrying out an argument similar to that of the previous section, we find necessary optimality conditions for problem (10). We also readily obtain a verification theorem. The pair $(\hat u^1_\cdot, \hat u^2_\cdot)$ … Assume from now on that $(\hat u^1_\cdot, \hat u^2_\cdot)$ is an optimal control. We study the inequality (48) when $(\check u^{\varepsilon,1}_\cdot, \check u^{\varepsilon,2}_\cdot)$ is a spike-perturbation of $(\hat u^1_\cdot, \hat u^2_\cdot)$, … For simplicity, we write … for … and, in this notation, … where $f_t := f^1_t + f^2_t$ and $h_t := h^1_t + h^2_t$. Again, we want to find first order variation processes
where $p^{i,i}_\cdot$ solves (27) or (34), depending on $i$. In fact, the equilibrium is unique in this case, since $\hat u^i_\cdot$ is the unique pointwise solution to (37) and $p^{i,i}_\cdot$ is unique; see (A25) and (A26). By Theorem 5, … where $p^i_\cdot$ solves (59), is an optimal control for the linear-quadratic MFTC, and it is unique.

MFTG
The equilibrium dynamics are … We see that only two costate processes, $p^{1,1}_\cdot$ and $p^{2,2}_\cdot$, are relevant here. This is a consequence of the lack of explicit dependence on $u^{-i}$ in the functions $b^i$ and $f^i$ specified in (67). Nevertheless, the running cost $f^i$ depends implicitly on $u^{-i}$ through player $-i$'s state and mean.
Clearly, we need to impose the terminal conditions … Calculations presented in the appendix identify the coefficients and yield the following system of ODEs determining $\alpha^i(\cdot), \dots, \theta^i(\cdot)$, … where … Now (74)-(77) give us the equilibrium dynamics. In this fashion, it is possible to solve LQ problems more general than (67).
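The coefficient functions $\alpha^i(\cdot), \dots, \theta^i(\cdot)$ are determined by ODEs with conditions imposed at the terminal time, so they are naturally integrated backward from $t = T$. Since the actual right-hand sides of the system are not reproduced here, the sketch below uses a hypothetical scalar Riccati-type equation with a known closed-form solution to illustrate a classical fourth-order Runge-Kutta scheme run backward in time; it is not the paper's solver.

```python
def integrate_backward(rhs, y_T, T, n_steps=1000):
    """Classical RK4 for y'(t) = rhs(t, y) on [0, T] with terminal condition y(T) = y_T.

    Returns the list of (t, y) pairs from t = T down to t = 0.
    """
    h = -T / n_steps  # negative step: march from T towards 0
    t, y = T, y_T
    path = [(t, y)]
    for _ in range(n_steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, y + h / 2 * k1)
        k3 = rhs(t + h / 2, y + h / 2 * k2)
        k4 = rhs(t + h, y + h * k3)
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + h
        path.append((t, y))
    return path


# Hypothetical scalar Riccati-type equation alpha' = -alpha^2, alpha(T) = a,
# whose exact solution is alpha(t) = a / (1 - a (T - t)).
a, T = 0.5, 1.0
t0, alpha0 = integrate_backward(lambda t, y: -y * y, a, T)[-1]
print(alpha0, a / (1 - a * T))  # both approximately 1.0
```

The same loop works unchanged when `y_T` is a NumPy array holding all coefficients, as long as `rhs` returns an array of the same shape.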

MFTC
The optimally controlled dynamics are … We make almost the same ansatz as before and assume that there exist deterministic functions … By redefining $Q^i, \bar Q^i, S^i, \bar S^i$ in (77), Equations (76), (77) and (79), (80) give us the optimally controlled state dynamics.

Simulation and the Price of Anarchy
Let $T := 1$, let $\xi := (y^1_0, y^2_0) \in L^2_{\mathcal{F}_0}(\Omega; \mathbb{R}^d \times \mathbb{R}^d)$ be the preferred initial positions of players 1 and 2, respectively, and … In this setup, $H^1$ and $H^2$ are negative semi-definite if $r_i, \rho_i > 0$, and $h^i$ is convex if $\nu_i > 0$.

In Figure 1, numerical simulations of the MFTG and the MFTC are presented. In (a), the two players have identical preferences but different terminal conditions. The situation is symmetric in the sense that we expect the realized paths of player 1, reflected through the line $y = 0$, to approximately be paths of player 2. In (c), preferences are asymmetric and, as a consequence, the realized paths are not mirror images of each other.
The central planner in the MFTC uses more information than a single player does. In fact, in our example, $\gamma^{i,j}(t) = 0$ when $i \neq j$ in the MFTG. The interpretation is that in the game, player $i$ does not care about player $-i$'s noise, only its mean state. For the central planner, however, $\gamma^{i,j}$ is not identically zero for $i \neq j$. This can be observed in (b), where the central planner makes the player states evolve under some common noise.
In (c) we see an interesting contrast between the MFTG and the MFTC. Player 1 (black) feels no attraction to player 2 ($\rho_1 = 0$), while player 2 is attracted to the mean position of player 1 ($\rho_2 > 0$). In the game, player 1 travels on a straight line from $(t, y) \approx (0, -1)$ to its terminal position $(t, y) = (1, -2)$. Player 2, on the other hand, deviates far from its preferred initial position at time $t = 0$, only to be in the proximity of player 1. In the MFTC, the central planner makes player 1 linger around $y = 0$ for some time before turning south towards the terminal position. The result is less movement by player 2. Even though player 1 pays a higher individual cost, the social cost is reduced by approximately 33%.

The social cost $J$ is approximated by … In (a) and (c), the outcomes of $j$ (circles for equilibrium control, stars for optimal control) are presented along with the approximation of $J$ (dashed lines) for $N = 100$. The optimal control yields the lower social cost in both cases. This is expected; the general inefficiency of Nash equilibria in nonzero-sum games is well known [39]. The price of anarchy quantifies the inefficiency due to non-cooperation; see [34,40] for static games, [41] for differential games and [42] for linear-quadratic mean-field type games. The price of anarchy in mean-field games has been studied recently in [43,44]. It is defined as the largest ratio of the social cost in an equilibrium (MFTG) to the optimal social cost (MFTC),
$$\mathrm{PoA} := \frac{\sup_{(u^1_\cdot, u^2_\cdot) \in \mathcal{N}} J(u^1_\cdot, u^2_\cdot)}{\inf_{(u^1_\cdot, u^2_\cdot) \in \mathcal{U}_1 \times \mathcal{U}_2} J(u^1_\cdot, u^2_\cdot)},$$
where $\mathcal{N}$ denotes the set of Nash equilibria of the MFTG.

Taking the parameter set of (a) as a point of reference (see Table 1), we vary one parameter at a time and study the PoA. The result is presented in Figure 2. In the intervals studied, the PoA is increasing in $\rho_i$ and $T$ and decreasing in $\nu_i$ and $r_i$. The reason is that the players become less flexible when $\nu_i$ and/or $r_i$ are increased, and the improvement a central planner can achieve decreases. On the other hand, an increased time horizon gives the central planner more time to improve the social cost.
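The sample-based approximation of $J$ can be sketched as a Monte Carlo average of realized social costs over $N$ simulated sample paths. The following is an assumption-laden illustration (a plain sample mean with its standard error; the helper name and the cost values are hypothetical), not necessarily the estimator used for the figures.

```python
import math
import statistics


def monte_carlo_estimate(samples):
    """Sample mean of realized costs and its standard error, as an estimate of J = E[j]."""
    n = len(samples)
    mean = statistics.fmean(samples)
    stderr = statistics.stdev(samples) / math.sqrt(n) if n > 1 else float("nan")
    return mean, stderr


# Hypothetical realized social costs j for N = 4 sample paths.
print(monte_carlo_estimate([1.0, 2.0, 3.0, 4.0]))
```

With estimates for the MFTG and the MFTC computed from the same simulated noise, a price-of-anarchy estimate could be formed as the ratio of the two sample means.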
Also, an increased preference for attraction rewards the non-egoistic behavior in the MFTC model.
Table 1. Parameter values in the symmetric case (a).

Conclusions and Discussion
Mean-field type games with backward stochastic dynamics, where the coefficients are allowed to depend on the marginal distributions of the player states, have been defined in this paper. Under regularity assumptions, necessary conditions for a Nash equilibrium have been derived in the form of a stochastic maximum principle. Additional convexity assumptions yielded sufficient conditions. In linear-quadratic examples, player behavior in the MFTG is compared to the centrally planned solution in the MFTC. The efficiency of the MFTG Nash equilibrium, quantified by the price of anarchy, and its dependence on problem parameters are studied.
The framework presented in this paper has many possible extensions, towards both theory and applications. The theory for martingale-driven BSDEs is now standard, and one could replace $W^1_\cdot, W^2_\cdot$ throughout this paper by two martingales $M^1_\cdot, M^2_\cdot$, possibly jump processes, and approach the game with the theory of forward-backward SDEs. Indeed, the topic of games between mean-field FBSDEs seems as yet unexplored. Problems of this kind would have immediate applications in finance.
With our definition of $\mathcal{U}_i$, we have restricted ourselves to open-loop adapted controls in this paper. Other information structures, such as perfect or partial state and/or law feedback controls, or lagged or noise-perturbed controls, are possible. Furthermore, both players have perfect information about each other in this paper. Taking inspiration from, for example, [45,46], the access to information could be restricted, so that the players have only partial information on states/laws. These types of extensions are interesting from both the theoretical and the applied point of view. Depending on the application, the information structure of the problem will naturally change.
Exploring conditions under which the MFTG is a potential game, or an S-modular game, could open the door to applications of this framework in, for example, interference management and resource allocation [47][48][49].
Acknowledgments: Financial support from the Swedish Research Council (2016-04086) is gratefully acknowledged. The author would like to thank Boualem Djehiche and Salah Choutri for fruitful discussions and useful suggestions, and the anonymous reviewers, whose remarks helped to substantially improve this work.

Conflicts of Interest:
The author declares no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

BSDE: Backward stochastic differential equation
FBSDE: Forward-backward stochastic differential equation
LQ: Linear-quadratic
MFTC: Mean-field type control problem
MFTG: Mean-field type game
ODE: Ordinary differential equation
PoA: Price of anarchy
SDE: Stochastic differential equation