Mean Field Game with Delay: a Toy Model

We study a toy model of a linear-quadratic mean field game with delay. We "lift" the delayed dynamics into an infinite-dimensional space and recast the mean field game system, which consists of a forward Kolmogorov equation and a backward Hamilton-Jacobi-Bellman equation. We identify the corresponding master equation, compute a solution to it, and show that this solution provides an approximation to a Nash equilibrium of the finite-player game.

Bank i interacts with the other banks by choosing its own strategy so as to minimize its cost functional J^i(α^i, α^{−i}), which involves the average of the log-monetary reserves of all the other banks. The notation α^{−i} stands for the (N − 1)-tuple of the α^j with j ≠ i, j ∈ {1, · · · , N}, i.e., the controls of all banks other than bank i. The cost functional for bank i ∈ {1, . . . , N} involves running and terminal cost functions f and g which, as specified in (2.4), depend on the average (1/N) Σ_k x^k and on a parameter ε > 0.

Construction of a Nash equilibrium
In order to apply the dynamic programming principle and identify a closed-loop Nash equilibrium, we have to enlarge the state space by including the path of past controls, which lies in H := L²([−τ, 0]; R), the Hilbert space of square-integrable real-valued functions on [−τ, 0], and write an infinite-dimensional representation of our system. This evolution-equation approach was initiated in [12] in a deterministic control setting, and was later generalized in [10] to stochastic control problems.
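The lifting can be made concrete numerically. Below is a minimal sketch, assuming a uniform discretization of H = L²([−τ, 0]; R): a lifted state is a pair (z_0, z_1) of a scalar and a grid of function values, and the product-space inner product z_0 z̃_0 + ∫ z_1(s) z̃_1(s) ds (cf. (2.5) below) is computed by the trapezoidal rule. The grid size and sample states are illustrative, not part of the model.

```python
import numpy as np

def inner_product(z, z_tilde, tau=1.0):
    """Discretized inner product on R x H, with H = L^2([-tau, 0]; R).

    A lifted state is a pair (z0, z1): z0 is the scalar component, and z1 is
    an array of function values on a uniform grid over [-tau, 0].
    """
    z0, z1 = z
    w0, w1 = z_tilde
    ds = tau / (len(z1) - 1)
    f = z1 * w1
    # <z, w> = z0*w0 + int_{-tau}^{0} z1(s) w1(s) ds  (trapezoidal rule)
    return z0 * w0 + float((f[0] / 2 + f[1:-1].sum() + f[-1] / 2) * ds)

# Example: constant paths z1 = 1, w1 = 1 on [-1, 0] give z0*w0 + tau = 2 + 1.
grid = np.linspace(-1.0, 0.0, 101)
val = inner_product((2.0, np.ones_like(grid)), (1.0, np.ones_like(grid)))  # 3.0
```

The same discretization underlies any numerical treatment of the infinite-dimensional HJB equation below: the H-component of the state is stored as a vector, and integrals over [−τ, 0] become quadrature sums.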
Given z ∈ R × H, we write z = (z_0, z_1), where z_0 ∈ R and z_1 ∈ H denote the two components of the product space R × H. The inner product on R × H is denoted by ⟨·, ·⟩ and defined by

⟨z, z̃⟩ = z_0 z̃_0 + ∫_{−τ}^{0} z_1(s) z̃_1(s) ds. (2.5)

The new state is denoted by Z^i_t = (Z^i_{0,t}, Z^i_{1,t}(s)), s ∈ [−τ, 0], which corresponds to (X^i_t, α^i_{t−τ−s}) in the notation of the original system (2.1). Bank i tries to minimize its cost functional J^i(α^i, α^{−i}) defined by

J^i(t, z, α^i, α^{−i}) = E[ ∫_t^T f^i(Z_{0,s}, α^i_s) ds + g^i(Z_{0,T}) | Z_t = z ]. (2.6)

After all other players j ≠ i have chosen the optimal strategies minimizing their own cost functionals, player i's value function V^i(t, z) is defined as the infimum of (2.6) over α^i. By the dynamic programming principle, the value function V^i(t, z) must satisfy the infinite-dimensional HJB equation (2.7) (see [9], Chapter 2, for details), where the infinite-dimensional representation of the original system (2.1) is given by (2.8). Minimizing the Hamiltonian in (2.7), the infimum can be computed explicitly, and the optimal control is attained in feedback form. Assuming that each player follows its own optimal strategy, (α^i)_{1≤i≤N} forms a Nash equilibrium, and the corresponding value functions satisfy an HJB system. After applying the definitions of the operators A, B and Q, the HJB equation for player i becomes (2.11). As shown in [5], a solution of the system (2.11) can be found in the form (2.12) for some deterministic functions E_0(t), E_1(t, s), E_2(t, s, r), and E_3(t) satisfying the system of PDEs (2.13) with boundary conditions (2.14). This system admits a unique solution, as shown in [12], and the optimal strategies take an integral form.

The mean field game system

Mean field game theory describes the structure of a game with infinitely many indistinguishable players. All players are rational, i.e., each player minimizes their own cost against the mass of the other players.
This assumption implies that the running and terminal costs in (2.4) depend only on the i-th player's state z^i_0 and the empirical distribution of (z^j_0)_{j≠i}. Denoting this empirical distribution by ν^i_0, the costs in (2.4) can be rewritten in terms of z^i_0 and ν^i_0. As the number N of players goes to ∞, the joint empirical distribution of the states and past controls Z^j_t = (Z^j_{0,t}, Z^j_{1,t}) converges to a deterministic limit denoted by ν(t) (with marginals denoted by µ_0(t) and µ_1(t)). Here, we assume that, at time 0, ν^i_0 satisfies the law of large numbers (for instance with i.i.d. Z^j_0), and that the propagation of chaos property holds. A full justification of this property would require generalizing the result in Section 2.1 of [3] to an infinite-dimensional setting in order to account for the past of the controls. This is highly technical but intuitively sound.
A complete proof is beyond the scope of this paper.
In the limit, a single representative player tries to minimize his cost functional and, dropping the index i, his value function is defined accordingly. The HJB equation for the value function V(t, z) reads (3.4). We then minimize in α to obtain the optimal feedback given by (3.5). Plugging it into (3.4), the backward HJB equation reads (3.6). Next, since we "lift" the original non-Markovian optimization problem into an infinite-dimensional Markovian control problem, we are able to characterize the generator L_t of the lifted dynamics (3.3), acting on smooth functions ϕ, where the time dependency comes from the optimal control α̂_t given by (3.5).
The derivation of the adjoint L*_t of L_t is given in Appendix A. Consequently, the forward Kolmogorov equation for the distribution ν(t) reads (3.8). Combining (3.6) with (3.8), we obtain the mean field game system. To solve it, we make an ansatz (3.9) for the value function, where we denote the mean of the state by m_0 := ∫_R z_0 dµ_0(z_0) and the mean of the past control by m_1 := ∫_H z_1 dµ_1(z_1). Plugging (3.9) into (3.8), multiplying both sides of (3.8) by z_0, and integrating over R × H, we obtain an equation for m_0; after integration by parts, we find that m_0 is constant in time, as can be seen directly using (3.9). Similarly, plugging (3.9) into (3.8), multiplying both sides of (3.8) by z_1, and integrating, we obtain (3.12); by integration by parts, we deduce that m_1 is constant as well. Now we are ready to verify the ansatz (3.9). We first compute the time derivative of the ansatz in (3.14). Then we plug the ansatz (3.9) into (2.7) and, collecting the (m_0 − z_0)² terms, the (m_0 − z_0)(m_1 − z_1) terms, the (m_1 − z_1)² terms, and the constant terms, we obtain the system of PDEs (3.15) with boundary conditions (3.16). As for (2.13)-(2.14), the system (3.15)-(3.16) admits a unique solution.
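The means m_0 and m_1 appearing in the ansatz (3.9) can be estimated by Monte Carlo from a sample of lifted states. The sketch below uses hypothetical i.i.d. samples (the distributions and sample sizes are illustrative only, not taken from the model); note that m_1, the mean of the past control, is itself a function of s ∈ [−τ, 0], estimated pointwise on a grid.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 10_000, 50  # number of sampled players, grid points on [-tau, 0]

# Hypothetical i.i.d. sample of lifted states Z^j = (Z^j_0, Z^j_1(.)):
Z0 = rng.normal(1.0, 1.0, size=N)        # state components, mean 1
Z1 = rng.normal(0.5, 1.0, size=(N, K))   # past-control paths, mean 0.5

m0 = Z0.mean()          # m_0 := int_R z_0 dmu_0(z_0), a scalar
m1 = Z1.mean(axis=0)    # m_1 := int_H z_1 dmu_1(z_1), a function of s
```

In the mean field game system these two quantities are constant in time, which is exactly what allows the ansatz (3.9) to be verified.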
4 The master equation

Derivatives
The master equation for this delayed game is posed in an infinite-dimensional space, and it requires a notion of derivative on the space of measures P(H).
The set P(H) of probability measures on H is endowed with the Monge-Kantorovich distance

d(µ, µ′) = sup_{ϕ ∈ Lip(H)} | ∫_H ϕ dµ − ∫_H ϕ dµ′ |,

where Lip(H) is the collection of real-valued Lipschitz functions on H with Lipschitz constant 1.
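For measures on R (e.g., the state marginal µ_0), this distance reduces, by the classical one-dimensional identity for the Wasserstein-1 distance, to the integral of the absolute difference of the cumulative distribution functions. A minimal sketch estimating the distance between two empirical measures on R (sample size, distribution, and grid resolution are all illustrative):

```python
import numpy as np

def mk_distance_1d(x, y, grid_pts=10_000):
    """Monge-Kantorovich (Wasserstein-1) distance between the empirical
    measures of two real samples, via d = int |F_x - F_y| dt."""
    lo = min(x.min(), y.min())
    hi = max(x.max(), y.max())
    grid = np.linspace(lo, hi, grid_pts)
    Fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    Fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    # Riemann sum of |F_x - F_y| over [lo, hi]; both CDFs agree outside.
    return float(np.abs(Fx - Fy).sum() * (hi - lo) / grid_pts)

# Translating a measure by c moves it by exactly c in this distance.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 5_000)
d = mk_distance_1d(x, x + 0.3)  # approximately 0.3
```

The supremum over Lip(H) in the definition is attained, for a translation, by the identity map, which is why the shifted example recovers the shift size.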

The master equation
Define U(t_0, z, ν_0) := V(t_0, z), where (V, ν) is a classical solution to the system of forward-backward equations (3.6) and (3.8), with initial condition ν(t_0) = ν_0 and terminal condition V(T, z) = (c/2)(∫_R y_0 dµ_0(y_0) − z_0)², respectively. Then U must satisfy the master equation (4.6), where µ_0 and µ_1 are the marginal laws of Z_0 and Z_1, respectively.
On the other hand, V satisfies the HJB equation (2.7).

Explicit solution of the master equation
It turns out that the master equation (4.6) can be solved explicitly by making the ansatz (4.9), where for convenience we define m_0 := ∫_R y_0 dµ_0(y_0) and m_1 := ∫_H y_1 dµ_1(y_1).

Convergence of the Nash system
From the previous section, we have seen that our master equation is well posed and admits an explicit solution. Furthermore, it also describes the limit of Nash equilibria of the N-player games as N → ∞. In this section, generalizing to the delayed case the results of [2] (see also [11]), we show that the solution of the Nash system (2.11) converges to the solution of the master equation (4.6) as the number of players N → +∞, with a Cesàro convergence rate of order 1/N.
In Section 4, we found that (4.9) is a solution to the master equation (4.6). We set u^i(t, z_0, z_1) := U(t, z^i, ν^i) as in (5.1), where ν^i denotes the joint empirical measure of the (z^k_0, z^k_1) for k ≠ i: the empirical measure of z_0 is (1/(N−1)) Σ_{k≠i} δ_{z^k_0}, and the empirical measure of z_1 is defined analogously. Note that, by direct computation, the derivatives of u^i with respect to the states of the players k ≠ i are of order 1/N, for any N ≥ 2. Proposition 5.1. For any i ∈ {1, · · · , N}, u^i(t, z_0, z_1) satisfies the Nash system (2.11) up to an error term e^i(t, z) with |e^i(t, z)| ≤ C/N, with terminal condition u^i(T, z) = (c/2)(z_0 − z^i_0)². This shows that (u^i)_{i∈{1,...,N}} is "almost" a solution to the Nash system (2.11).
Proof. We compute each term in the above equation in terms of U using the relationship (5.1), and we use the fact that U is a solution to the master equation.
• From the solution (4.9) of the master equation, ∂_z U is Lipschitz with respect to the measure argument. Theorem 5.2. Let V^i be the solution to the HJB equation (2.11) of the N-player system, with N ≥ 1 fixed, and let U be the solution to the master equation (4.6). Fix any (t_0, ν_0) ∈ [0, T] × P(R × H). Then, for any z ∈ R^N, with ν^i = (1/(N−1)) Σ_{k≠i} δ_{(z^k_0, z^k_1)} the associated empirical measure, the value functions V^i are close to the functions u^i, with a Cesàro-average error of order 1/N. Proof. We first apply Itô's formula to (V^i)_{i∈{1,...,N}} and use the fact that V^i satisfies the HJB equation (2.11) of the Nash system, which gives (5.5).
Then, we apply Itô's formula to u^i(t, Z_t) and use the fact that u^i satisfies (5.2), which gives (5.6). Subtracting (5.5) from (5.6), taking the square, and applying Itô's formula again, we obtain (5.7), where each coefficient involving a player k ≠ i is bounded by C/N, and e^i is bounded by C/N. Let (Ξ^i)_{i∈{1,...,N}} be a family of independent random variables with common law ν_0. Integrating (5.7) from t to T and taking the expectation conditional on Ξ, we obtain (5.8). Using the fact that u^i_T = V^i_T and Young's inequality, we obtain (5.9). Taking the average over i on both sides, and choosing Ξ uniformly distributed in (R × H)^N, the result follows by continuity of u^i and V^i.
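The order-1/N effects that drive the estimates above come from the fact that removing a single player perturbs the empirical measure by O(1/N). This can be checked numerically; the sketch below uses a one-dimensional surrogate for the state marginal and the CDF identity for the Monge-Kantorovich distance on R (distributions and sample sizes are illustrative only):

```python
import numpy as np

def mk_1d(x, y, grid_pts=20_000):
    """Wasserstein-1 distance between empirical measures on R via CDFs."""
    lo, hi = min(x.min(), y.min()), max(x.max(), y.max())
    grid = np.linspace(lo, hi, grid_pts)
    Fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    Fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return float(np.abs(Fx - Fy).sum() * (hi - lo) / grid_pts)

rng = np.random.default_rng(2)
scaled = []
for N in (50, 100, 200):
    z = rng.normal(0.0, 1.0, N)
    full = z                 # empirical measure of all N players
    loo = np.delete(z, 0)    # leave player i = 0 out
    scaled.append(N * mk_1d(full, loo))  # stays bounded as N grows
```

Pointwise, the two empirical CDFs differ by at most 1/N, so the distance is bounded by (range of the sample)/N, and the rescaled distances remain bounded, consistent with the C/N bounds used in the proof.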

Conclusion
The mean field game system acts as the characteristics of the master equation. The master equation contains all the information of the mean field game system, and it turns the forward-backward PDE system into a single equation. The solution to the mean field game system is a pair (V, ν), that is, the value function and the joint law of the current state and past control. The solution to the master equation is a function of (t, z, ν). Since our model is linear-quadratic, we are able to solve both the mean field game system and the master equation, as shown in Sections 3 and 4; however, the techniques are not the same. To solve the mean field game, we first make an ansatz for the solution of the HJB equation; plugging this ansatz into the Fokker-Planck equation (3.8), we find that the means of the state and of the past control are constant, so that the ansatz (3.9) can be verified. On the other hand, a notion of derivative with respect to measures is needed in order to solve the master equation. Again, we make an ansatz (4.9), which has a form similar to (3.9) but is a function of (t, z, ν), and we verify that it satisfies the master equation.
The set of PDEs (3.15) with boundary conditions (3.16) is the same for the two problems. This is due to the fact that our model is linear-quadratic and that the means of the states and past controls are constant.
Last but not least, the Nash equilibrium of the corresponding N-player game is presented in Section 2. The value function (2.12) is similar in form to the value function (3.9) of the mean field game system and to the solution (4.9) of the master equation. As N → ∞, the set of PDEs (2.13) becomes the same as (3.15). This suggests that the solution to the mean field game is the limit of the Nash system; in general, however, this convergence is known only in a few specific situations. Additionally, the solution to the master equation is also a limit of the Nash system, as shown in Section 5.
To summarize, we have extended the notion of master equation in the context of our toy model with delay, and we have shown that, as in the case without delay, this master equation provides an approximation to the corresponding finite-player game with delay. A general form of such a result, not necessarily for linear-quadratic games, is part of our ongoing research.

A Adjoint operator
Let ϕ be a smooth test function defined on R × H. In the following computation, we use the notation ⟨ϕ, ν(t)⟩ = ∫_{R×H} ϕ(z) dν(t, z).