Physics-Informed Graph Neural Operator for Mean Field Games on Graph: A Scalable Learning Approach

Mean-field games (MFGs) model the decision-making processes of a large number of interacting agents in multi-agent systems. This paper studies mean-field games on graphs (G-MFGs). The equilibria of G-MFGs, namely mean-field equilibria (MFE), are challenging to solve because of their high-dimensional action space: each agent has to make decisions both at junction nodes and on edges. Furthermore, whenever the initial population state on the graph changes, the MFE has to be recomputed, which is computationally challenging and memory-demanding. To improve scalability and avoid repeatedly solving G-MFGs every time the initial state changes, this paper proposes a physics-informed graph neural operator (PIGNO). The PIGNO utilizes a graph neural operator to generate population dynamics given an initial population distribution. To better train the neural operator, it leverages physics knowledge to propagate population state transitions on graphs. A learning algorithm is developed, and its performance is evaluated on autonomous driving games on road networks. Our results demonstrate that the PIGNO is scalable and generalizes to unseen initial conditions.


Introduction
Multi-agent systems (MAS) are prevalent in engineering and robotics applications. With a large number of interacting agents in a MAS, solving the agents' optimal control can be computationally intractable and not scalable. To address this challenge, MFGs [1,2] were developed to model strategic interactions among many agents who make dynamically optimal decisions, while a population distribution is propagated to represent the state of the interacting agents. Since their inception, MFGs have been widely applied to social networks [3], swarm robotics [4], and intelligent transportation [5,6].
MFGs are micro-macro games that bridge agent dynamics and population behaviors with two coupled processes: individuals' dynamics solved by optimal control (i.e., agent dynamics) and system evolution arising from individual choices (i.e., population behaviors).
In this work, we focus on a class of MFGs [7], namely mean-field games on graphs (G-MFGs), where the state space of the agent population is a graph and agents select a sequence of nodal and edge transitions at minimum individual cost. Solving these G-MFGs, however, poses the following challenges: (1) With a graph-based state space, the action space expands significantly, encompassing both nodes and edges, resulting in a high-dimensional search space. More specifically, the decision-making of a representative agent in a G-MFG consists not only of en-route choices at nodes but also of continuous velocity control on edges subject to congestion effects. (2) Existing work mainly assumes that the initial population distribution is fixed. A change in the initial population state requires recomputing the mean-field equilibrium (MFE), a task that demands computational and memory resources and hinders the practicality of deploying MFG solutions.
To address these challenges, this paper proposes a new learning tool for G-MFGs, namely a physics-informed graph neural operator (PIGNO). The key element is a graph neural operator (GNO), which can generate population dynamics given the initial population distribution. To enhance the training process, the GNO incorporates physics knowledge regarding how agent and population dynamics propagate over the spatiotemporal domain.

Related Work
Researchers have explored various machine learning methods for solving MFGs, such as reinforcement learning (RL) [8][9][10][11] and physics-informed neural networks (PINNs) [12][13][14]. However, it can be time-consuming and memory-demanding for these learning tools to adapt to changes in the initial population density. Specifically, each unique initial condition may require the assignment and retraining of a dedicated neural network to obtain the corresponding MFE. To enhance the scalability of learning frameworks for MFGs, Chen et al. [15] introduced a physics-informed neural operator (PINO) framework, which utilizes a Fourier neural operator (FNO) to establish a functional mapping between the mean-field equilibrium and the boundary conditions. However, the FNO fails to solve G-MFGs because it cannot directly project information over a graph into a high-dimensional space and generate population dynamics in the graph state space. Therefore, in this paper, we propose a graph neural operator (GNO) that learns mappings between graph-based function spaces to solve G-MFGs. The GNO leverages message-passing neural networks (MPNNs) to handle the state space and propagate state information efficiently by aggregating neighborhood messages.
Our contributions include: (1) We propose a scalable learning framework leveraging PIGNO to solve G-MFGs with various initial population states; (2) We develop a learning algorithm and apply it to autonomous driving games on road networks to evaluate the algorithm performance.
The rest of this paper is organized as follows: Section 2 introduces preliminaries about G-MFGs. Section 3 presents the details of our scalable learning framework for G-MFGs. Section 4 presents the solution approach. Section 5 demonstrates numerical experiments. Section 6 concludes.

Mean-Field Games on Graphs (G-MFG)
Mean-field games on graphs (G-MFGs) model population dynamics and a generic agent's optimal control on both nodes and edges. A G-MFG consists of a forward FPK equation and multiple backward HJB equations, defined on a graph G = {N, L} as follows.

Definition 1. A G-MFG with discrete-time graph states [16] is given by Equation (1), where ρ_t = [ρ_ij,t] is the population density on each edge (i, j) ∈ L at time step t, and ρ_0 denotes the initial population density over the graph. The Fokker-Planck (FPK) equation captures the evolution of the population state on the graph. The Hamilton-Jacobi-Bellman (HJB) equation captures the optimal control of a generic agent, including velocity control on edges and route choice at nodes. V_t = [V_ij,t] is the value function on each edge, and Ṽ denotes the terminal cost. u_t = [u_ij,t] denotes the exit rate on each edge, which represents the agent's velocity control. β_t = [β_jk,t] is the probability of choosing node k as the next-go-to node at node j, i.e., the route choice. c_t = [c_ij,t] is the cost incurred by the agent at time step t. The transition matrix M_t is determined by u_t and β_t. The MFE is denoted by MFE([G-MFG]) = {ρ*, V*, u*, β*}, satisfying Equation (1). The mathematical details of the G-MFG can be found in Appendix A.1, and a toy example is provided in Appendix A.2 to help readers better understand it.
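To make the forward half of the coupled system concrete, the discrete-time FPK update can be sketched as a one-step density propagation on a tiny two-edge graph. This is a minimal sketch, not the paper's implementation: the edge layout, the `fpk_step` helper, and the specific u and β values are all illustrative.

```python
import numpy as np

def fpk_step(rho, u, beta):
    """One discrete-time FPK update on edge densities.

    Mass exiting an edge (exit rate u) is split among downstream edges
    by the route-choice probabilities beta; the rest stays on the edge.
    """
    n = len(rho)
    M = np.diag(1.0 - u)                      # fraction staying on each edge
    for i in range(n):
        for j in range(n):
            if i != j:
                M[i, j] = u[i] * beta[i, j]   # exit edge i, choose edge j
    return M.T @ rho                          # rho_{t+1} = M_t^T rho_t

# Toy instance: edge 0 feeds edge 1.
rho0 = np.array([1.0, 0.0])
u = np.array([0.5, 0.0])                      # half of edge-0 mass exits
beta = np.array([[0.0, 1.0], [0.0, 0.0]])     # all exiting mass goes to edge 1
rho1 = fpk_step(rho0, u, beta)
# mass is conserved: rho1 = [0.5, 0.5]
```

Because each row of the transition matrix sums to one, total population mass is conserved across time steps, matching the role of the FPK equation in Definition 1.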

Graph Neural Operator (GNO)
Graph neural operators (GNOs) are generalized neural networks that can learn functional mappings between high-dimensional spaces [17]. A GNO utilizes an MPNN to update the space representation according to messages from the neighborhood. In this paper, we adopt a GNO to establish a mapping between the initial population state ρ_0 and the population ρ* at the MFE. We leverage physics knowledge (i.e., the FPK and HJB equations) to train the GNO to solve for the MFE under various initial population densities, eliminating the need to recompute the MFE.

Scalable Learning Framework
In this section, we propose a physics-informed graph neural operator (PIGNO) to learn G-MFGs. Figure 1 illustrates the workflow of two coupled modules: FPK for population behaviors and HJB for agent dynamics. The FPK and HJB modules depend on each other internally. In the FPK module, we estimate the population density ρ_0:T over the graph and update the GNO using a residual defined by the physical rule that captures the population dynamics triggered by the transition matrix in the FPK equation. In the HJB module, the transition matrix M_0:T is obtained given the population density ρ_0:T. We adopt another GNO to solve the HJB equation, since the agents' dynamics and the cost functions are known in the MFG system. We now delve into the details of the proposed PIGNO.

PIGNO for Population Behaviors
The GNO-ρ maps the initial population distribution to the population distributions from time 0 to T. The input of GNO-ρ is the initial population density ρ_0 along with the transition information used to propagate population dynamics. The output of GNO-ρ is the population dynamics over the spatiotemporal domain, denoted by ρ̂ ≡ ρ̂_0:T. The PIGNO is instantiated as the following MPNN, for all (i, j) ∈ L and t = 0, 1, . . ., where ρ̂_ij,t is the population density of edge (i, j) at time t, L_ij,t is the set of neighborhood edges of edge (i, j) at time t, κ_ρ is the corresponding graph kernel function, and m_ij,t denotes the cumulative message used to propagate population dynamics from time 0 to time t. m_ij,t indicates the ratio of the population entering edge (i, j) from a neighboring edge up to time t, which is determined by the ratio of the population exiting that edge (i.e., the velocity control u) and the ratio of the population choosing edge (i, j) as the next-go-to edge (i.e., the route choice β). The MPNN uses the initial population distribution and the messages to propagate the population dynamics in the G-MFG system. Each neighborhood message is transformed by the kernel function κ_ρ and aggregated as an additional feature to estimate the population density.
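The message-passing update above can be sketched as follows. This is a minimal, non-learned stand-in: the `mpnn_density` helper, the neighbor table, and the linear `kernel` are hypothetical placeholders for the learned graph kernel κ_ρ.

```python
import numpy as np

def mpnn_density(rho0, neighbors, messages, kernel):
    """Estimate edge densities from the initial density plus a
    kernel-transformed aggregate of neighborhood messages."""
    rho = np.empty_like(rho0)
    for e in range(len(rho0)):
        # aggregate transformed messages from neighboring edges
        agg = sum(kernel(messages[(n, e)]) for n in neighbors[e])
        rho[e] = rho0[e] + agg
    return rho

neighbors = {0: [], 1: [0]}        # edge 1 receives mass from edge 0
messages = {(0, 1): 0.3}           # cumulative inflow ratio up to time t
kernel = lambda m: 0.5 * m         # stand-in for the learned kernel
rho = mpnn_density(np.array([1.0, 0.2]), neighbors, messages, kernel)
# rho = [1.0, 0.35]
```

In the actual PIGNO the kernel is a trainable network and the messages encode the u- and β-determined inflow ratios; the sketch only shows the aggregate-then-update structure.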
The GNO-ρ adopts a physics-informed training scheme, which combines model-driven and data-driven methods. The training of GNO-ρ is guided by a residual determined by the physical rules of population dynamics. Mathematically, the residual R_ρ is defined over the set D_ρ, which contains various initial densities over the graph, and each term R_ρ^0 is calculated as follows. The first term in R_ρ^0 evaluates the physical discrepancy based on Equation (1a): it integrates the residual of the FPK equation, ensuring that the model adheres to the established laws of motion. As the predicted ρ̂ approaches the ρ* satisfying the FPK equation, this residual approaches 0. The second term quantifies the discrepancy between the estimate and the ground truth of the initial density. The observed data come from the initial population distribution ρ_0, and the training of this term follows the traditional supervised learning scheme. w_0 and w_1 are the weight coefficients.
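Under these definitions, the combined physics-plus-data loss might look like the following sketch, assuming the discrete FPK update ρ_{t+1} = M_t^T ρ_t. The `pigno_rho_loss` helper and default weights are illustrative, not the paper's implementation.

```python
import numpy as np

def pigno_rho_loss(rho_pred, M, rho0_obs, w0=1.0, w1=1.0):
    """Physics-informed loss for GNO-rho:
    w0 * (FPK residual over all steps) + w1 * (initial-density misfit)."""
    phys = sum(np.sum((rho_pred[t + 1] - M[t].T @ rho_pred[t]) ** 2)
               for t in range(len(M)))
    data = np.sum((rho_pred[0] - rho0_obs) ** 2)
    return w0 * phys + w1 * data

# A prediction consistent with the dynamics and the observed initial
# density incurs zero loss.
rho0 = np.array([1.0, 0.0])
M = [np.array([[0.5, 0.5], [0.0, 1.0]])]
rho_pred = [rho0, M[0].T @ rho0]
loss = pigno_rho_loss(rho_pred, M, rho0)
# loss = 0.0
```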

PIGNO for Optimal Control
Similar to GNO-ρ, GNO-V learns a reverse mapping from the terminal costs to the value functions over the graph, from time T back to 0. The input of GNO-V is the terminal cost Ṽ along with the transition information. The output of GNO-V is the value function over the spatiotemporal domain, denoted by V̂ ≡ V̂_0:T. The GNO-V also follows the MPNN formulation, for all (i, j) ∈ L and t = 0, 1, . . ., where V̂_ij,t is the value function of edge (i, j) at time t, L_ij,t is the set of neighborhood edges of edge (i, j) at time t, κ_V is the corresponding graph kernel function, and the cumulative message used to propagate the value functions comes from the transpose of the cumulative transition matrix in GNO-ρ. The MPNN uses the terminal costs and the messages to propagate the system's value functions. Each neighborhood message is transformed by the kernel function κ_V and aggregated as an additional feature to estimate the value functions.
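The backward propagation of value functions can be illustrated with a plain dynamic-programming recursion over a fixed transition sequence. This is a sketch of the structure the learned operator mimics, not the operator itself; `backward_values` and the assumed recursion V_t = c_t + M_t V_{t+1} (i.e., controls held fixed) are illustrative.

```python
import numpy as np

def backward_values(V_T, M_seq, cost):
    """Propagate values from the terminal cost back to time 0:
    V_t = c_t + M_t @ V_{t+1}, for t = T-1, ..., 0."""
    V = [None] * (len(M_seq) + 1)
    V[-1] = V_T
    for t in reversed(range(len(M_seq))):
        V[t] = cost[t] + M_seq[t] @ V[t + 1]
    return V

V = backward_values(np.array([0.0, 1.0]),      # terminal cost per edge
                    [np.eye(2)],                # one-step transition
                    [np.array([1.0, 1.0])])     # running cost at t = 0
# V[0] = [1.0, 2.0]
```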
The training of GNO-V takes the HJB residual R_V into consideration, defined over the set D_V, which contains various terminal costs over the graph. Each term R_V^T is calculated as follows. The first term evaluates the physical discrepancy based on Equation (1b): as the predicted V̂ approaches the optimal V*, this residual approaches 0. The second term calibrates the predictions to the ground truth of the terminal costs Ṽ by supervised learning. Similarly, w_0 and w_1 are the weight coefficients.

Note that a meta-assumption of this model is discrete time. Adopting a continuous-time model has significant limitations: (1) A continuum formulation of the forward process over a graph space cannot easily be captured by a few coupled partial differential equation systems. One option is to simplify decision-making over the graph to a discretized route choice at each node, rendering the game a continuous-time Markov decision process, which fails to capture real-time velocity control on each edge. The other is to formulate the dynamic process as a graph ODE, whose scalability requires further investigation. (2) A continuum formulation of the backward process can be solved by continuous-time reinforcement learning; we leave the scalability of such a scheme to future work.
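Mirroring the FPK-side loss, the HJB residual term might be sketched as follows, assuming a fixed control so the backward recursion reduces to V_t = c_t + M_t V_{t+1}. The `pigno_v_loss` helper and weights are illustrative names, not the paper's implementation.

```python
import numpy as np

def pigno_v_loss(V_pred, M, cost, V_T_obs, w0=1.0, w1=1.0):
    """Physics-informed loss for GNO-V:
    w0 * (HJB residual over all steps) + w1 * (terminal-cost misfit)."""
    phys = sum(np.sum((V_pred[t] - cost[t] - M[t] @ V_pred[t + 1]) ** 2)
               for t in range(len(M)))
    data = np.sum((V_pred[-1] - V_T_obs) ** 2)
    return w0 * phys + w1 * data

# A prediction satisfying the backward recursion and matching the
# observed terminal cost incurs zero loss.
V_T = np.array([0.0, 1.0])
M = [np.eye(2)]
cost = [np.array([1.0, 1.0])]
V_pred = [cost[0] + M[0] @ V_T, V_T]
loss = pigno_v_loss(V_pred, M, cost, V_T)
# loss = 0.0
```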

Solution Approach
In this section, we present our learning algorithm (Algorithm 1). We first initialize the GNO-ρ, parameterized by θ, and the GNO-V, parameterized by φ. During the k-th iteration of training, we first sample a batch of initial population densities ρ_0 and terminal costs Ṽ. The terminal cost Ṽ denotes the delay for agents who have not arrived at their destinations by the terminal time step; in this work, we assume the delay at the terminal step is proportional to the travel distance between the agent's location and her destination. D_ρ represents the set of initial population densities. We interpret the initial population density as travel demand (i.e., the number of agents entering the graph at time 0). Agents can enter the graph from each node, and we assume that at time 0 the number of agents at each node (i.e., the travel demand) follows a uniform distribution. We use each pair of ρ_0 and Ṽ to generate the population density ρ̂_0:T and value function V̂_0:T over the entire domain. Given ρ̂_0:T and V̂_0:T, we obtain the spatiotemporal transitions M for all nodes. We then update the parameters of the neural operators according to the residuals. At the end of each iteration, we check for convergence.
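The iteration structure of Algorithm 1 can be sketched as a plain training skeleton. No real networks or gradients are shown: the two operators, the `update` step, and the convergence test on successive losses are stand-ins for the components described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_pigno(gno_rho, gno_v, update, n_iters=50, batch=8, tol=1e-4):
    """Skeleton of the training loop: sample, predict, update, check."""
    prev_loss = np.inf
    for k in range(n_iters):
        # sample a batch of initial densities (uniform travel demands
        # at three origin nodes) and placeholder terminal costs
        rho0 = rng.uniform(0.0, 1.0, size=(batch, 3))
        V_T = np.ones_like(rho0)
        rho = gno_rho(rho0)                # predict population dynamics
        V = gno_v(V_T, rho)                # predict value functions
        loss = update(rho, V, rho0, V_T)   # residual-driven update
        if abs(prev_loss - loss) < tol:    # convergence check
            break
        prev_loss = loss
    return k
```

For example, plugging in identity operators and an update whose reported loss stops decreasing shows the loop terminating at the first iteration where successive losses agree within the tolerance.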

Numerical Experiments
In this section, we employ our algorithm to facilitate autonomous driving navigation in traffic networks. As illustrated in Figure 2, a substantial number of autonomous vehicles (AVs) move toward destination node 4, with the objective of minimizing total costs subject to the congestion effect. We use a representative agent as an example to elaborate on the speed control and the density dynamics of the population in this scenario. At node 1, the representative agent first selects edge e_12. The agent then drives along edge e_12 and selects continuous-time-space driving velocities on the edge. The agent selects her next-go-to edge at node 2, following this pattern until she reaches her destination at node 4. These choices regarding her route and speed actively influence the evolution of the population density across the network. The mathematical formulation of this autonomous driving game can be found in [16]. We construct the initial population state over the network as follows. We assume that at time 0, the traffic network is empty. Vehicles enter the road network at origin nodes 1, 2, 3 and move toward destination node 4. Travel demands at each origin satisfy q_i ∼ Uniform[0, 1], i = 1, 2, 3. Therefore, each initial population distribution over the network consists of travel demands at the origins (i.e., [q_1, q_2, q_3]), sampled from three independent uniform distributions.
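The initial population states used in the experiments can be generated directly. This is a minimal sketch; the sample count and seed are arbitrary choices, not from the paper.

```python
import numpy as np

# Travel demands at origin nodes 1-3, drawn i.i.d. from Uniform[0, 1];
# each row is one initial population state [q1, q2, q3].
rng = np.random.default_rng(42)
demands = rng.uniform(0.0, 1.0, size=(100, 3))
```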
Figure 3 demonstrates the convergence performance of the algorithm in solving the G-MFG. The x-axis represents the iteration index during training, the y-axis displays the convergence gap, and the 1-Wasserstein distance measures the closeness between our results and the MFE obtained by numerical methods [16]. The results demonstrate that our algorithm converges stably after 50 iterations.
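As a simple illustration of the convergence metric: for two one-dimensional discrete distributions, the 1-Wasserstein distance reduces to the L1 distance between their cumulative distribution functions. This sketch shows only that special case; the paper's gap is computed over graph-based population densities.

```python
import numpy as np

def wasserstein_1d(p, q):
    """1-Wasserstein distance between two 1-D discrete distributions,
    via the L1 distance between their normalized CDFs."""
    p = np.asarray(p, float) / np.sum(p)
    q = np.asarray(q, float) / np.sum(q)
    return np.sum(np.abs(np.cumsum(p) - np.cumsum(q)))

d = wasserstein_1d([1, 0, 0], [0, 0, 1])
# all mass moves two bins: d = 2.0
```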

Conclusions
In this paper, we propose a scalable learning framework for G-MFGs. Existing numerical methods have to recompute the MFE whenever the initial population density changes. To avoid this inefficient recomputation, this work proposes a learning method that utilizes graph neural operators to establish a functional mapping between the initial population density and the MFE over the entire spatiotemporal domain. This framework allows us to obtain the MFE corresponding to each initial population distribution without recomputation. We demonstrate the efficiency of this method on autonomous driving games. Our contribution lies in the scalability of the PIGNO in handling various initial population densities without recomputing MFEs, offering a memory- and data-efficient approach for solving G-MFGs.
We then look into Equation (A2). We assume that agents entering node i and making a route choice at time t will exit the node at time t + 1. Substituting into Equation (A3) according to Equation (A4), we obtain Equation (1b).

Figure 1. PIGNO for G-MFGs: we leverage graph neural operators to establish a functional mapping from the initial population density and terminal cost to the mean-field equilibrium over the entire spatiotemporal domain. The population density and terminal cost over the space domain at each time step are denoted by color bars over the graph. The PIGNO allows us to obtain the MFE corresponding to each initial population distribution and terminal cost without recomputation.

Algorithm 1 PIGNO-MFG
1: Initialize: GNO-ρ parameterized by θ, GNO-V parameterized by φ;
2: for k ← 0 to K do
3:   Sample a batch of initial population densities ρ_0 from the set D_ρ and terminal costs Ṽ from the set D_V;
4:   Predict ρ̂_0:T(k) and V̂_0:T(k) by GNO-ρ and GNO-V for each pair of ρ_0 and Ṽ in the batch;
5:   for each ρ̂_0:T(k) and V̂_0:T(k) do
6:     Obtain M_0:T-1(k) by solving the HJB;
7:   end for
8:   Update θ and φ according to the residuals;
9:   Check convergence;
10: end for

Figure 2. Autonomous driving game on the road network.

Figure 4 demonstrates the population density solved by our proposed method along three paths on the road network, i.e., (1 → 2 → 3 → 4), (1 → 2 → 4), and (1 → 3 → 4). The x-axis is the spatial position on the path, the y-axis represents time, and the z-axis represents the population density ρ. The running-cost functional form follows a non-separable cost structure with a crossing term of the agent action and the population density. We visualize the population density in the G-MFG with three initial population states, constructed from the travel demands [q_1, q_2, q_3]: [0.6, 0.4, 0.2] (see Figure 4a-c), [0.4, 0.4, 0.4]