Linear–Quadratic Mean-Field-Type Games: A Direct Method

In this work, we formulate and solve a multi-person mean-field-type game described by a linear jump-diffusion system of mean-field type and a quadratic cost functional involving the second moments, the square of the expected value of the state, and the control actions of all decision-makers. We propose a direct method to solve the game, team, and bargaining problems. This solution approach does not require solving the Bellman–Kolmogorov equations or backward–forward stochastic differential equations of Pontryagin's type. The proposed method can be easily implemented by beginners and engineers who are new to the emerging field of mean-field-type game theory. The optimal strategies for decision-makers are shown to be in a state-and-mean-field feedback form. They are given explicitly as a sum of the well-known linear state-feedback strategy for the associated deterministic linear–quadratic game problem and a mean-field feedback term. The equilibrium costs of the decision-makers are explicitly derived using a simple direct method. Moreover, the equilibrium cost is a weighted sum of the initial variance and an integral of a weighted variance of the diffusion and the jump process. Finally, the method is used to compute global optimum strategies as well as saddle point strategies and the Nash bargaining solution in state-and-mean-field feedback form.


Introduction
In 1952, Markowitz proposed a paradigm for dealing with risk issues concerning choices which involve many possible financial instruments [1]. Formally, it deals with only two discrete time periods (e.g., "now" and "3 months from now"), or equivalently, one accounting period (e.g., "3 months"). In this scheme, the goal of an investor is to select the portfolio of securities that will provide the best distribution of future consumption, given their investment budget. Two measures of the prospects provided by such a portfolio are assumed to be sufficient for evaluating its desirability: the expected value at the end of the accounting period and the standard deviation (or its square, the variance) of that value. If the initial investment budget is positive, there will be a one-to-one relationship between these end-of-period measures and comparable measures relating to the percentage change in value, or return, over the period. Thus, Markowitz's approach is often framed in terms of the expected return of a portfolio and its standard deviation of return, with the latter serving as a measure of risk. A typical example of risk in the current market is the evolution of the prices [2,3] of cryptocurrencies (Bitcoin, Litecoin, Ethereum, Dash, etc.). The Markowitz paradigm (also termed the mean-variance paradigm) is often characterized as dealing with portfolio risk and (expected) return [4,5]. We address this problem when several entities are involved.
Game problems in which the state dynamics is given by a linear stochastic system with a Brownian motion and a cost functional that is quadratic in the state and the control are often called linear-quadratic-Gaussian (LQG) games. For the continuous-time LQG game problem with positive coefficients, the optimal strategy is a linear state-feedback strategy which is identical to an optimal control for the corresponding deterministic linear-quadratic game problem, where the Brownian motion is replaced by the zero process. Moreover, the equilibrium cost only differs from the deterministic game problem's equilibrium cost by the integral of a function of time. For LQG control and LQG zero-sum games, it can be shown that a simple square completion method provides an explicit solution to the problem. It was successfully developed and applied by Duncan et al. [6–11] in the mean-field-free case. Interestingly, the method can be used beyond the LQG framework. Moreover, Duncan et al. extended the direct method to more general noises, including fractional Brownian noises, and to some non-quadratic cost functionals on spheres, tori, and more general spaces.
The main goal of this work is to investigate whether these techniques can be used to solve mean-field-type game problems, which are non-standard problems [12]. To do so, we modify the state dynamics to include mean-field terms in the drift function, namely (i) the expected value of the state and (ii) the expected value of the control actions. We also modify the instantaneous cost and terminal cost functions to include (iii) the square of the expected value of the state and (iv) the square of the expected value of the control action. When the state dynamics and/or the cost functional involve a mean-field term (such as the expected value of the state and/or the expected values of the control actions), the game is said to be an LQG game of mean-field type, or MFT-LQG. We aim to study the behavior of such MFT-LQG game problems when mean-field terms are involved. If, in addition, the state dynamics is driven by a jump-diffusion process, then the problem is termed an MFT-LQJD game problem.
For such game problems, various solution methods such as the stochastic maximum principle (SMP) [12] and the dynamic programming principle (DPP) with the Hamilton-Jacobi-Bellman-Isaacs and Fokker-Planck-Kolmogorov equations have been proposed [12–14]. Most studies illustrated these solution methods in the linear-quadratic game with an infinite number of decision-makers [15–21]. These works assume indistinguishability within classes, and the cost functions were assumed to be identical or invariant under permutation of the decision-makers' indexes. Note that the indistinguishability assumption is not fulfilled for many interesting problems, such as variance reduction and risk quantification problems, in which decision-makers have different sensitivities towards risk. One typical and practical example is an energy-efficient multi-level building in which every resident has their own comfort temperature zone and uses the heating, ventilation, and air conditioning (HVAC) system to get closer to their comfort temperature and to stay within that zone. This problem clearly does not satisfy the indistinguishability assumption used in the previous works on mean-field games. Therefore, it is reasonable to look at the problem beyond the indistinguishability assumption. Here we drop these assumptions and solve the problem directly with an arbitrary finite number of decision-makers. In LQ mean-field-type game problems, the state process can be modeled by a set of linear stochastic differential equations of McKean-Vlasov type, and the preferences are formalized by quadratic cost functions with mean-field terms. These game problems are of practical interest, and a detailed exposition of this theory can be found in [7,12,22–25]. The popularity of these game problems is due to practical considerations in signal processing, pattern recognition, filtering, prediction, economics, and management science [26–29].
To some extent, most of the risk-neutral versions of these optimal controls are analytically and numerically solvable [6,7,9,11,24]. On the other hand, the linear-quadratic robust setting naturally appears if the decision-makers' objective is to minimize the effect of a small perturbation and the related variance of the optimally controlled nonlinear process. By solving a linear-quadratic game problem of mean-field type, and using the implied optimal control actions, decision-makers can significantly reduce the variance (and the cost) incurred by this perturbation. The variance reduction and minimax problems have very interesting applications in risk quantification problems under adversarial attacks and in security issues in interdependent infrastructures and networks [27,30–33]. Table 1 summarizes some recent developments in MF-LQ-related games.

Table 1. Recent developments in MF-LQ-related games.

Feature                        State-of-the-Art    This Work
Jump                           -                   yes
Diffusion                      [15,16]             yes
Mean-Field Type                [12,31,32]          yes
One decision-maker             [12]                yes
Two or more decision-makers    [31,32]             yes
State-MF                       [12]                yes
Control-Action-MF              -                   yes
Bargaining                     -                   yes
Anonymity                      [15,16]             relaxed
Indistinguishability           [15,16]             relaxed

In this work, we propose a simple argument that gives the best-response strategy and the Nash equilibrium cost for a class of MFT-LQJD games without the use of the well-known solution methods (SMP and DPP). We apply the square completion method to risk-neutral mean-field-type game problems. It is shown that this method is well-suited to MFT-LQJD games, as well as to variance reduction performance functionals. Applying the solution methodology related to the DPP or the SMP requires an involved (stochastic) analysis and convexity arguments to generate necessary and sufficient optimality criteria. The direct method avoids all of this.

Contribution of This Article
Our contribution can be summarized as follows. We formulate and solve a mean-field-type game described by linear jump-diffusion dynamics and a mean-field-dependent quadratic or robust-quadratic cost functional for each generic decision-maker. The optimal strategies for the decision-makers are given semi-explicitly using a simple and direct method based on square completion, suggested by Duncan et al. (e.g., [7–9]) for the mean-field-free case. This approach does not use the well-known solution methods such as the stochastic maximum principle and the dynamic programming principle with the Hamilton-Jacobi-Bellman-Isaacs and Fokker-Planck-Kolmogorov equations. It does not require extended backward-forward integro-partial differential equations (IPDEs) to solve the problem. In the risk-neutral linear-quadratic mean-field-type game, we show that there is generally a best-response strategy to the mean of the state, and provide a sufficient condition for the existence of a mean-field Nash equilibrium. We also provide a global optimum solution to the problem in the case of full cooperation between the decision-makers. This approach gives a basic insight into the solution by providing a simple explanation for the additional term in the robust Riccati equation, compared to the risk-neutral Riccati equation. Sufficient conditions for the existence and uniqueness of mean-field equilibria are obtained when the horizon length is small enough and the Riccati coefficient parameters are positive. The method (see Figure 1) is then extended to linear-quadratic robust mean-field-type games under disturbance, formulated as a minimax mean-field-type game.
Only a very limited amount of prior work seems to have been done on MFT-LQJD game problems. As indicated in Table 1, the jump term brings a new feature to the existing literature and, to the best of our knowledge, this is the first work that introduces and provides a bargaining solution [34] in mean-field-type games using a direct method.
The last section of this article is devoted to the validation of the novel equations derived in this article using other approaches. We confirm the validity of the optimal feedback strategies. In the Appendix, we provide a basic example illustrating the sub-optimality of the mean-field game approach (which consists of freezing the mean-field term) compared with the mean-field-type game approach (in which an individual decision-maker can significantly influence the mean-field term).

Figure 1. Steps of the direct method: guess functional, Itô's formula, square completion, process identification.

Structure
A brief outline of the article follows. The next section introduces the non-cooperative mean-field-type game problem and provides its solution. Then, the fully-cooperative game and the bargaining problems and their solutions are presented. The last part of the article is devoted to adversarial problems of mean-field type.

Notation and Preliminaries
Let $T > 0$ be a fixed time horizon and $(\Omega, \mathcal{F}, \mathbb{F}^{B,N}, \mathbb{P})$ be a given filtered probability space on which a one-dimensional standard Brownian motion $B = \{B(t)\}_{t \ge 0}$ is given, and let $\tilde{N}(dt, d\theta) = N(dt, d\theta) - \nu(d\theta)dt$ be a centered jump process with Lévy measure $\nu$ defined over $\Theta$. The filtration $\mathbb{F} = \{\mathcal{F}^{B,N}_t, 0 \le t \le T\}$ is the natural filtration generated by the union $\{B, N\}$, augmented by the $\mathbb{P}$-null sets of $\mathcal{F}$. The processes $B$ and $N$ are mutually independent. In practice, $B$ is used to capture smaller disturbances and $N$ is used for larger jumps of the system.
We introduce the following notation: $\bar{X}(t) := E[X(t)]$ denotes the expected value of the random variable $X(t)$.
An admissible control strategy $u_i$ of decision-maker $i$ is an $\mathbb{F}$-adapted and square-integrable process with values in a non-empty subset $U_i$ of $\mathbb{R}$. We denote the set of all admissible controls of decision-maker $i$ by $\mathcal{U}_i$.
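To make the noise model concrete, the following minimal sketch simulates a scalar state driven by both a Brownian motion and a compensated jump term with an Euler scheme. It rests on simplifying assumptions that are not taken from the article: constant coefficients, a single jump size `mu` arriving at Poisson rate `lam` (so the Lévy measure is a point mass), and a Bernoulli approximation of the Poisson increment on each small time step. All function and parameter names are hypothetical.

```python
import math
import random


def simulate_terminal_state(x0, a, sigma, mu, lam, T, steps, rng):
    """Euler scheme for dx = a*x dt + sigma dB + mu (dN - lam dt) on [0, T].

    Assumption: jumps have a fixed size mu and arrive at rate lam; the
    Poisson increment over a small step is approximated by a Bernoulli draw.
    """
    dt = T / steps
    x = x0
    for _ in range(steps):
        dB = rng.gauss(0.0, math.sqrt(dt))          # Brownian increment
        dN = 1 if rng.random() < lam * dt else 0    # approximate jump count
        x += a * x * dt + sigma * dB + mu * (dN - lam * dt)
    return x


def sample_mean_terminal_state(n_paths=2000, seed=0):
    """Monte Carlo estimate of E[x(T)] for one illustrative parameter set."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        total += simulate_terminal_state(
            x0=1.0, a=-0.5, sigma=0.3, mu=0.5, lam=2.0, T=1.0, steps=200, rng=rng
        )
    return total / n_paths


if __name__ == "__main__":
    print("sample mean:", sample_mean_terminal_state())
    print("x0 * exp(a*T):", math.exp(-0.5))
```

Because both noise terms are centered, the sample mean of the terminal state should be close to `x0 * exp(a * T)` when the number of paths is large; the noise affects the variance but not the mean.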

Non-Cooperative Problem
Consider $n$ risk-neutral decision-makers ($n \ge 2$) and let $L_i(u_1, \ldots, u_n)$ be the objective functional of decision-maker $i$. The best response of decision-maker $i$ to the process $(u_j)_{j \ne i}$ solves the risk-neutral linear-quadratic mean-field-type control problem (2), where $E[x^2(0)] < +\infty$, $q_i(t) \ge 0$, $q_i(t) + \bar{q}_i(t) \ge 0$, $r_i(t) > 0$, $r_i(t) + \bar{r}_i(t) \ge 0$, the coefficients $a(t), \bar{a}(t), b_i(t), \sigma(t)$ are real-valued functions, and $E[x(t)]$ is the expected value of the state created by all decision-makers under the control action profile $(u_1, \ldots, u_n) \in \prod_{j=1}^{n} \mathcal{U}_j$. The method below can handle time-varying coefficients. For simplicity, we impose the integrability condition (3) on these coefficient functions over $[0, T]$. Under condition (3), the state dynamics in (2) has a solution for each $u = (u_1, \ldots, u_n) \in \prod_{j=1}^{n} \mathcal{U}_j$. Note that we do not impose boundedness or Lipschitz conditions (because quadratic functionals are not necessarily Lipschitz).
Definition 1 ($BR_i$: Best Response of decision-maker $i$). Any strategy $u_i^*(\cdot) \in \mathcal{U}_i$ attaining the infimum in (2) is called a risk-neutral best-response strategy of decision-maker $i$ to the other decision-makers' strategies $u_{-i} \in \prod_{j \ne i} \mathcal{U}_j$. The set of best-response strategies of $i$ is denoted by $BR_i : \prod_{j \ne i} \mathcal{U}_j \to 2^{\mathcal{U}_i}$, where $2^{\mathcal{U}_i}$ denotes the set of subsets of $\mathcal{U}_i$.
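Note that if $b_i = 0 = r_i$, there are multiple optimizers of the best-response problem.
Definition 2 (Nash equilibrium). Any strategy profile $(u_1^*, \ldots, u_n^*) \in \prod_{j=1}^{n} \mathcal{U}_j$ such that $u_i^* \in BR_i(u_{-i}^*)$ for every decision-maker $i$ is called a Nash equilibrium of the LQ-MFJD game above.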
The risk-neutral mean-field-type Nash equilibrium problem we are concerned with is to find and characterize the processes $(x^*, u^*, E[x^*], E[u^*])$ such that, for every decision-maker $i$, $u_i^*$ is an optimizer of the best-response problem (2) and the expected value $E[x^*]$ of the resulting common state created by all the decision-makers coincides with the mean-field term $\bar{x}$ used in that problem. This means that an equilibrium is a fixed point of the best-response correspondence $BR = (BR_1, \ldots, BR_n)$, where $BR_i : \prod_{j \ne i} \mathcal{U}_j \to 2^{\mathcal{U}_i}$ is the best-response correspondence of decision-maker $i$.
We rewrite the expected objective functional and the state coefficients in terms of $x - \bar{x}$ and $\bar{x}$. Note that the expected value of the first term in the integral in $L_i$ can be seen as a weighted variance of the state, since $q_i(t) E[(x(t) - E[x(t)])^2] = q_i(t)\,\mathrm{var}(x(t))$. Taking the expectation of the state dynamics, one arrives at deterministic linear dynamics for $\bar{x}$. The direct method consists of writing a generic structure of the cost functional, with unknown deterministic functions to be identified. Inspired by the structure of the terminal cost function, we try a generic solution $f_i(t, x)$ that is a quadratic form in $x - \bar{x}$ and $\bar{x}$, with coefficients $\alpha_i, \beta_i, \gamma_i, \delta_i$ that are deterministic functions of time. At the final time $T$, one can identify the terminal values of these functions, in particular $\alpha_i(T) = q_i(T)$. Recall Itô's formula for the jump-diffusion process, in which $D$ denotes the drift term. We compute the derivative terms of $f_i$; using (7) in (6) and taking the expectation, we then compute the gap between $E[L_i]$ and $E[f_i(0, x(0))]$.
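The rewriting in terms of $x - \bar{x}$ and $\bar{x}$ rests on the elementary identity $E[x^2] = \mathrm{var}(x) + \bar{x}^2$. The sketch below checks it numerically on sample moments, assuming (as the abstract's description suggests, but as an assumption here) a running state cost of the form `q * E[x^2] + q_bar * (E[x])^2`. The function name and the sample data are hypothetical.

```python
import random


def quadratic_cost_two_ways(samples, q, q_bar):
    """Evaluate the state cost in its original and its rewritten form.

    original : q * E[x^2] + q_bar * (E[x])^2
    rewritten: q * var(x) + (q + q_bar) * (E[x])^2
    Expectations are replaced by sample averages (population moments).
    """
    n = len(samples)
    mean = sum(samples) / n
    second_moment = sum(x * x for x in samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n
    original = q * second_moment + q_bar * mean ** 2
    rewritten = q * variance + (q + q_bar) * mean ** 2
    return original, rewritten


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.gauss(2.0, 1.5) for _ in range(1000)]  # illustrative data only
    print(quadratic_cost_two_ways(xs, q=1.3, q_bar=-0.4))
```

The two values agree up to floating-point rounding, which is why the weights $q_i$ and $q_i + \bar{q}_i$ are the natural quantities on which to impose sign conditions.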

Best Response to Open-Loop Strategies
In this subsection, we compute the best response of decision-maker $i$ to open-loop strategies $(u_j)_{j \ne i}$. The information structure for the other players is limited to time and the initial point; i.e., the mappings $(u_j)_{j \ne i}$ are measurable functions of time and of the initial point $x_0$ (and do not depend on $x$). The best response of decision-maker $i$ to the open-loop strategies $(u_j)_{j \ne i}$ has expected value $\bar{u}_i = -\frac{b_i + \bar{b}_i}{r_i + \bar{r}_i}(\beta_i \bar{x} + \gamma_i)$, where $\alpha_i, \beta_i, \gamma_i$ are deterministic functions of time $t$. Clearly, the best response to open-loop strategies is in state-and-mean-field feedback form. Here the mean-field feedback terms are the expected value of the state $E[x(t)]$ and the expected value of the control action $E[u_i(t)]$.
Therefore, we examine optimal strategies in state-and-mean-field feedback form in the next section.

Feedback Strategies
The information structure for the feedback solution is as follows. The model and the objective functions are assumed to be common knowledge. We assume perfect observation of the state. We will show below that the mean-field term is computable (via the initial mean state and the model). If the other decision-makers play their optimal state-and-mean-field feedback strategies, then the functions $\gamma_1, \ldots, \gamma_n$ are identically zero at any given time. We compute again $E[L_i - f_i(0, x(0))]$ and complete the squares using the elements of $\{x - \bar{x}, \bar{x}\}$.
It follows from these square completions that (15) provides a mean-field Nash equilibrium in feedback strategies. These Riccati equations are different from those of open-loop control strategies. The coefficients of the coupling terms $\beta_i \beta_j$ and $\alpha_i \alpha_j$ are different, reflecting the coupling through the state and the mean state. Notice that the optimal strategy is in state-and-mean-field feedback form, which is different from the standard LQG game solution. As $\bar{a}, \bar{b}_i, \bar{r}_i, \bar{q}_i$ vanish in (15), one gets the Nash equilibrium of the corresponding stochastic differential game in closed-loop strategies with $\alpha_i = \beta_i$, and $u_i$ becomes mean-field-free. When the diffusion coefficient $\sigma$ and the jump rate $\mu$ vanish, one obtains the noiseless deterministic game problem, and the optimal strategy will be given by the equation in $\beta_i$ because $x - \bar{x} = 0$ in the deterministic case.
How can the mean-field term $E[x(t)]$ be fed back? The mean-field term can be computed explicitly if the initial mean state $\bar{x}(0)$ is given and the model is known.
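The computation of the mean state can be sketched numerically as follows. The sketch assumes that the mean dynamics take the linear form $\dot{\bar{x}} = (a + \bar{a})\bar{x} + \sum_j (b_j + \bar{b}_j)\bar{u}_j$ (an assumption here, obtained by taking expectations of a drift that is linear in the state, its mean, the controls, and their means) and that each decision-maker uses $\bar{u}_j = -\frac{b_j + \bar{b}_j}{r_j + \bar{r}_j}\beta_j \bar{x}$, i.e., the expected best response with $\gamma_j \equiv 0$. The functions `beta[j]` stand in for solutions of the Riccati equations and are supplied by the caller; all names are hypothetical and coefficients are taken constant for simplicity.

```python
import math


def mean_state(t, x_bar0, a, a_bar, players, beta, steps=1000):
    """Mean state x_bar(t) = x_bar(0) * exp(integral_0^t rate(s) ds).

    players: list of tuples (b, b_bar, r, r_bar), one per decision-maker.
    beta:    list of callables s -> beta_j(s), assumed given.
    rate(s) = a + a_bar - sum_j (b_j + b_bar_j)^2 / (r_j + r_bar_j) * beta_j(s)
    The integral is approximated with the midpoint rule.
    """
    ds = t / steps
    integral = 0.0
    for k in range(steps):
        s = (k + 0.5) * ds
        rate = a + a_bar
        for (b, b_bar, r, r_bar), beta_j in zip(players, beta):
            rate -= (b + b_bar) ** 2 / (r + r_bar) * beta_j(s)
        integral += rate * ds
    return x_bar0 * math.exp(integral)


if __name__ == "__main__":
    players = [(1.0, 0.0, 1.0, 0.0), (1.0, 1.0, 2.0, 2.0)]
    beta = [lambda s: 1.0, lambda s: 1.0]  # placeholder Riccati solutions
    print(mean_state(1.0, 2.0, 0.5, 0.5, players, beta))
```

With the placeholder values above the rate is constant and equal to $-1$, so the mean state decays as $2e^{-t}$; in an actual application the `beta` callables would be replaced by numerically integrated Riccati solutions.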

Fully-Cooperative Solutions
In this section, we examine the global optimum and Nash bargaining solution [34] of the game.

Global Optimum
We now consider the fully cooperative scenario in which all the decision-makers jointly optimize a single global objective $L_0 := \sum_i L_i$. We follow the same methodology as above with $q_0 = \sum_i q_i$ and $\bar{q}_0 = \sum_i \bar{q}_i$. When the coefficients are constant in time, $\alpha_0$ and $\beta_0$ can be given explicitly. The global optimum cost in the fully cooperative case is less than the total cost at the Nash equilibrium. This loss of efficiency of Nash equilibria was analyzed in [35] and is often termed the price of anarchy [36,37].
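For intuition about coefficient functions such as $\alpha_0$ and $\beta_0$, the sketch below integrates the classical scalar Riccati equation of the deterministic, mean-field-free, single-decision-maker LQ problem: minimize $\frac{1}{2} q_T x(T)^2 + \frac{1}{2}\int_0^T (q x^2 + r u^2)\,dt$ subject to $\dot{x} = a x + b u$. This textbook equation, $\dot{\alpha} = -2a\alpha + \frac{b^2}{r}\alpha^2 - q$ with $\alpha(T) = q_T$, is used here as a stand-in and is an assumption of the sketch, not the article's coupled system. Function names are hypothetical.

```python
def riccati_rhs(alpha, a, b, q, r):
    """Right-hand side d(alpha)/dt of the classical scalar LQ Riccati equation."""
    return -2.0 * a * alpha + (b * b / r) * alpha * alpha - q


def solve_riccati(a, b, q, r, q_T, T, steps=2000):
    """Integrate backward from alpha(T) = q_T to alpha(0) with classical RK4."""
    h = -T / steps  # negative step: march from t = T down to t = 0
    alpha = q_T
    for _ in range(steps):
        k1 = riccati_rhs(alpha, a, b, q, r)
        k2 = riccati_rhs(alpha + 0.5 * h * k1, a, b, q, r)
        k3 = riccati_rhs(alpha + 0.5 * h * k2, a, b, q, r)
        k4 = riccati_rhs(alpha + h * k3, a, b, q, r)
        alpha += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return alpha


if __name__ == "__main__":
    # With a = 0 and b = q = r = 1, q_T = 0, the exact solution is tanh(T - t).
    print(solve_riccati(a=0.0, b=1.0, q=1.0, r=1.0, q_T=0.0, T=2.0))
```

In this classical setting the optimal feedback is $u = -\frac{b}{r}\alpha(t)x$, which is the "well-known linear state-feedback strategy" to which the mean-field-type solution adds a mean-field feedback term.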

Nash Bargaining Solution
Mean-field-type bargaining theory deals with the situation in which decision-makers can realize, through cooperation, better outcomes than the one that prevails when they do not cooperate. This non-cooperative outcome is called the threat point $L^{NE} = (L^{NE}_1, \ldots, L^{NE}_n)$. The question is which outcome the decision-makers might agree to. Let $V$ be the set of feasible outcomes of the benefit of bargaining [34]. We assume that if the agents unanimously agree on a point $v = (v_1, \ldots, v_n) \in V$, they obtain $v$. Otherwise, they obtain $L^{NE} = (L^{NE}_1, \ldots, L^{NE}_n)$. This presupposes that each decision-maker can enforce the threat point when they do not agree with a proposal. Generically, what decision-maker $i$ can guarantee is $\inf_{u_i} \sup_{u_{-i}} L_i$, which is not admissible in the quadratic setting. Therefore, we have chosen the non-cooperative solution as the outcome when there is a disagreement. The outcome $v$ the decision-makers finally agree on is called the solution of the bargaining problem. The Nash bargaining solution selects, for a given set $V$, the point at which the product of gains from $L^{NE}$ is maximal.
Since the function $v \mapsto \prod_{k \in N} v_k$ is non-convex, Problem (21) is non-convex. Here we exploit the convexity of the functional $u \mapsto \langle w, L(u) \rangle$ for any given $w = (w_1, \ldots, w_n) \in \mathbb{R}^n_{++}$ such that $\sum_{i=1}^{n} w_i = 1$ to reach any point on the Pareto frontier of the game. The Nash bargaining solution is then obtained by maximizing the product of gains over the weight vector $w$.
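The two-stage idea (scalarize with weights $w$ to reach the Pareto frontier, then maximize the product of gains over $w$) can be illustrated on a deliberately simple static toy problem. Everything in this sketch is hypothetical and chosen only for illustration: two decision-makers share one scalar decision `u`, with costs `L1(u) = u**2` and `L2(u) = (u - 1)**2`, and the threat point is taken to be `(1.0, 1.0)`. It is not the article's dynamic game.

```python
def pareto_costs(w):
    """Costs at the minimizer of w*L1 + (1 - w)*L2 for the toy problem.

    The first-order condition 2*w*u + 2*(1 - w)*(u - 1) = 0 gives u = 1 - w.
    """
    u = 1.0 - w
    return u ** 2, (u - 1.0) ** 2


def nash_bargaining_weight(threat, grid=1000):
    """Grid search over w in (0, 1) for the largest product of gains.

    Gains are cost reductions relative to the threat point; weights for which
    some decision-maker does not strictly gain are skipped.
    """
    best_w, best_product = None, float("-inf")
    for k in range(1, grid):
        w = k / grid
        c1, c2 = pareto_costs(w)
        g1, g2 = threat[0] - c1, threat[1] - c2
        if g1 <= 0.0 or g2 <= 0.0:
            continue
        if g1 * g2 > best_product:
            best_w, best_product = w, g1 * g2
    return best_w, best_product


if __name__ == "__main__":
    print(nash_bargaining_weight((1.0, 1.0)))
```

By symmetry of the toy costs and of the threat point, the search selects equal weights. In the dynamic setting, `pareto_costs` would be replaced by the weighted cooperative solution obtained with the direct method for each $w$.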
Following the same methodology as above, we obtain:

LQ Robust Mean-Field-Type Games
We now consider a robust mean-field-type game with two decision-makers. Decision-maker 1 minimizes with respect to $u_1$ and decision-maker 2 maximizes with respect to $u_2$. The minimax problem of mean-field type is given by (24). The risk-neutral robust mean-field-type equilibrium problem we are concerned with is to characterize the processes $(x^*, u_1^*, u_2^*, E[x^*])$ such that $u_1^*$ is the minimizer and $u_2^*$ is the maximizer of the best-response problem (24), and the expected value of the resulting common state created by all the decision-makers is $E[x^*]$. Below, we solve Problem (24) for $r_1(t) > 0$, $\bar{r}_1(t) > 0$, $r_2(t) < 0$, $\bar{r}_2(t) < 0$.
The latter expression being positive, we deduce that $L_{mfg} > L_{mftg}$ for any $T > 0$. Thus, in this example, the mean-field game approach, which consists of freezing the mean-field term, is sub-optimal. On the other hand, the mean-field-type game approach coincides with the global optimization problem in the one-decision-maker case. Hence, (A4) is the global optimum.

Difference with Multi-Population Mean-Field Games
The model studied here differs from (non-cooperative) multi-population (multi-class or multi-type) mean-field games.In multi-population mean-field games, it is usually assumed that there is an infinite number of decision-makers, each of them having their own control action.In those models, a single decision-maker does not influence the population mean state within its class, since the class size is assumed infinite.On the other hand, in the mean-field-type game model presented here, there is a finite number of "true" decision-makers, and each decision-maker does have a non-negligible effect on the mean-field terms.