Optimal Pursuit Strategies in Missile Interception: Mean Field Game Approach

Bai, Yu; Zhou, Di; He, Zhen

doi:10.3390/aerospace12040302

Open AccessArticle

Optimal Pursuit Strategies in Missile Interception: Mean Field Game Approach

by

Yu Bai

^†,

Di Zhou

^*,† and

Zhen He

^†

School of Astronautics, Harbin Institute of Technology, Harbin 150001, China

^*

Author to whom correspondence should be addressed.

^†

These authors contributed equally to this work.

Aerospace 2025, 12(4), 302; https://doi.org/10.3390/aerospace12040302

Submission received: 10 March 2025 / Revised: 28 March 2025 / Accepted: 30 March 2025 / Published: 1 April 2025

(This article belongs to the Section Aeronautics)

Download

Browse Figures

Versions Notes

Abstract

This paper investigates Mean Field Game methods to solve missile interception strategies in three-dimensional space, with a focus on analyzing the pursuit–evasion problem in many-to-many scenarios. By extending traditional missile interception models, an efficient solution is proposed to avoid dimensional explosion and communication burdens, particularly for large-scale, multi-missile systems. The paper presents a system of stochastic differential equations with control constraints, describing the motion dynamics between the missile (pursuer) and the target (evader), and defines the associated cost function, considering proximity group distributions with other missiles and targets. Next, Hamilton–Jacobi–Bellman equations for the pursuers and evaders are derived, and the uniqueness of the distributional solution is proved. Furthermore, using the

ϵ

-Nash equilibrium framework, it is demonstrated that, under the MFG model, participants can deviate from the optimal strategy within a certain tolerance, while still minimizing the cost. Finally, the paper summarizes the derivation process of the optimal strategy and proves that, under reasonable assumptions, the system can achieve a uniquely stable equilibrium, ensuring the stability of the strategies and distributions of both the pursuers and evaders. The research provides a scalable solution to high-risk, multi-agent control problems, with significant practical applications, particularly in fields such as missile defense systems.

Keywords:

mean field games; missile interception; forward–backward stochastic differential equations (FBSDEs); ϵ-Nash equilibrium

1. Introduction

Motivation: With the increasingly complex global security situation, missile interception technology has become an indispensable part of modern air defense and missile defense operations. Traditional missile interception problems often focus on the interactions between a few missiles and targets. However, as the number of missiles increases and the battlefield environment becomes more complex, the limitations of traditional methods have become more apparent. To address this challenge, the Mean Field Game method provides a new solution approach. The introduction of MFG theory allows for the effective resolution of control and optimization problems in large-scale, multi-target environments, avoiding the computational explosion caused by high-dimensional problems found in traditional methods [1,2]. Previous studies on the missile multi-to-multi interception problem mostly expanded from one-to-one or many-to-one situations to many-to-many. However, this process often leads to dimensional explosion, and as the number of individuals increases, the communication burden intensifies, often causing communication blockage. Therefore, this paper employs the MFG method to solve the missile multi-to-multi interception problem. As

N \to \infty

, the coupling effects between individuals are minimized. By deriving the optimal solution of the mean field for both parties in the game, this paper proves the consistency between the Nash equilibrium of finite individuals and the mean field Nash equilibrium.

The application of MFG has covered multiple fields, especially in military decision-making, resource allocation, and multi-agent systems, showing broad potential. In missile interception problems, the MFG method models the behavior of each missile, assuming that the behavior of participants is influenced solely by the statistical distribution of the group. This avoids complex mutual computations between individuals [3]. This makes MFG an effective tool for handling large-scale missile interception systems, providing near-optimal control strategies, without relying on the state of each missile [4,5]. In recent years, researchers have conducted extensive theoretical studies on MFG methods, covering topics from the derivation of the Hamilton–Jacobi–Bellman (HJB) equations to Nash equilibrium analysis in mean field games [6]. The HJB equations provide a mathematical framework for solving optimal control strategies, considering the control constraints, state variables, and optimality conditions of dynamic systems, thus leading to the optimal solution [7]. However, the difficulty of solving the HJB equations increases significantly as the number of participants grows [8]. Therefore, the introduction of MFG effectively simplifies this problem, allowing for reasonable solutions in large-scale systems [9,10]. In the specific application of missile interception problems, the MFG framework not only focuses on the interaction between missiles and targets but also considers the cooperation and competition among multiple missiles [11]. This is especially important because modern missile defense systems typically contain multiple interceptors. When facing multiple enemy targets, optimizing the strategy of each missile to achieve optimal performance for the overall system is a critical issue. By using MFG-based modeling, researchers are able to derive the optimal interception strategy for each missile, while ensuring computational efficiency [12]. With the continuous development of MFG theory, more and more research has begun to focus on how to effectively solve HJB equations in high-dimensional spaces and handle complex boundary conditions. For example, Bensoussan et al. [4] proposed a numerical solution method based on MFG theory that can handle complex problems involving large numbers of participants, while Fathi et al. [5] explored the optimization of control strategies in multi-agent collaboration and competition. These studies provide theoretical support for the design of multi-missile interception control strategies. Furthermore, the advantages of the MFG method lie in its ability to simplify computational complexity using the statistical distribution of group behavior, rather than directly calculating the interactions between individuals [13]. This characteristic makes the MFG method highly applicable in real-time defense systems, especially in complex battlefield environments with multiple targets and interceptors. Researchers have successfully solved many challenges that traditional methods find difficult to address by continuously improving the MFG framework [9].

In this study, we assume that the number of targets is sufficiently large to apply the Mean Field Game (MFG) approach. However, in practical applications, the number of targets is typically finite, which may affect the applicability of this method. While the MFG approach effectively handles large-scale systems and provides equilibrium solutions, its validity might be limited when the number of targets is small. We will discuss how a limited number of targets can influence the applicability and effectiveness of the MFG method.

The goal of this paper is to explore the optimal control strategy in missile interception problems based on the MFG theory, and to analyze the existence and uniqueness of the

ϵ

-Nash equilibrium [14] in mean field games. Through an in-depth study of this problem, this paper not only promotes the application of multi-agent systems in missile defense but also provides theoretical support for optimal control problems in large-scale complex systems [15]. The main innovations of this paper are as follows:

Application of MFG to missile interception problems in three-dimensional space: This paper is the first to apply the Mean Field Game method to multi-missile interception target problems in three-dimensional space, particularly focusing on nonlinear interception systems. This innovation supplants traditional one-to-one and many-to-one interception models, extending them to a many-to-many interception problem. By fully utilizing the distributed nature of the MFG method, missile strategies are optimized in complex environments, avoiding the issue of dimensional explosion, while not requiring individuals to perceive the behavior of all other participants. Compared to existing studies that mainly considered two-dimensional cases, this paper starts from three-dimensional space, where both state and control information are constrained, making it more aligned with practical application scenarios.
Consideration of complex interactions between missiles and targets: When multiple missiles simultaneously attack adjacent and closely located targets, interference between missiles may occur. To address this challenge, this paper proposes a dynamic adjustment mechanism based on the distribution of adjacent groups in three-dimensional space. Each missile adjusts its strategy according to the distribution of nearby groups, both its own and the target’s, thus minimizing interference. Furthermore, when approaching a target group, the missile updates its strategy based on the target group’s distribution to increase the interception probability. To reduce the risk of being intercepted, the target adopts a decentralized evasion strategy by perceiving changes in the missile distribution and the nearby target distribution, thus determining the optimal escape path. The optimal strategies for missiles and targets are inherently adversarial.
$ϵ$ -Nash equilibrium study in a two-group MFG model: This paper investigates a two-group MFG problem, where individuals within the same group cooperate with each other, and cooperation is realized through distribution, while individuals from different groups compete. Each player does not need to perceive the behavior of all individuals, but only needs to understand the distribution of both groups and the distribution of the adjacent group. As the number of players increases, the $ϵ$ -Nash equilibrium can be approximated as a global Nash equilibrium [16], greatly simplifying the problem-solving process and improving the computational efficiency of the model. This approach is inspired by the multi-population MFG model described in Section 7.1.1 of Probabilistic Theory of Mean Field Games with Application [17].

This paper proposes a method for calculating the distribution of nearby groups in three-dimensional space. By considering the distribution of nearby groups around the missile, it reduces the interference from surrounding missiles, as well as from groups nearby the target. Compared to traditional methods, which require calculating interactions between all individuals, this paper only considers the effects brought by nearby groups, making the computation more efficient and practically applicable. These innovations highlight the unique contribution of this paper in the existing literature.

In comparison with the approach proposed by Toumi et al. (2024) [18] in their study on large-scale multi-agent systems, which also considered interactions between agents in a distributed manner, our model extends the concept into a three-dimensional space, where both missile and target distributions become more complex, due to input constraints rather than spatial constraints. While Toumi et al. focused on congestion avoidance in crowd scenarios with agent rewards based on their distance from other agents, our model specifically incorporates missile–target dynamics in a three-dimensional combat scenario, optimizing strategies to minimize interference and improve interception success, despite the input limitations in such a constrained control environment.

Compared to the study by [17], in the field of missile interception, when the number of missiles is sufficiently large, the difference between the individual state and the group mean under the

ϵ

-equilibrium will gradually decrease, indicating that the individual state will progressively follow the changes in the group state.

The rest of the paper is organized as follows: In Section 2, we first present the missile interception model, based on which we derive the spatial equations and the cost functions for both the pursuers and the evaders. We also describe the basic form of the

ϵ

-Nash equilibrium. In Section 3, we derive the distribution functions for both the pursuers and the evaders. By manipulating the terminal function, we introduce a new cost function. In Section 4, we apply the principles of dynamic programming to obtain the Hamilton–Jacobi–Bellman (HJB) equations for both the pursuers and the evaders, from which we derive the optimal strategies. In Section 5, we present the unique solution form of the forward–backward stochastic differential equations. We then prove the boundedness of the states, and finally show the difference between individuals and the mean field in the

ϵ

-Nash equilibrium. In Section 6, we provide the experimental results. Finally, in Section 7, we present the concluding remarks.

2. Problem Setting

Here, we first present the mathematical model of the missile interception problem, then derive the state-space equations for the pursuit problem based on this model. Following that, we outline the related basic assumptions and, finally, we define the dual-population

ϵ

-Nash equilibrium.

2.1. Notation

From the field of missile interception, we will discuss the problem where the missile is represented as the pursuer and the target is represented as the evader. The pursuers,

P_{i}

, are collaboratively capturing the evaders,

E_{i}

, who are attempting to evade the capture. The game takes place in a three-dimensional space. It is assumed that both the pursuers and the evaders are treated as point masses with normal acceleration constraints. The direction of velocity is adjusted by the normal acceleration, with the direction of acceleration being perpendicular to the velocity direction. The pursuer model between a pursuer and an evader is shown in Figure 1. Based on the relationships between the pursuer

P_{i}

and the evader

E_{i}

in three-dimensional space, the non-linear differential equations are derived, as presented in the work by Song et al. [19]. Using this model helps analyze and solve interference issues in the terminal guidance phase, and it reflects the actual situation more accurately. While other models may lead to different impacts on missile interference and target selection strategies, this model demonstrates strong applicability and advantages for the current research.

{\dot{R}}_{P i} = v_{E i} cos θ_{E i} cos φ_{E i} - v_{P i} cos θ_{P i} cos φ_{P i}

(1)

R_{P i} {\dot{θ}}_{L i} = v_{E i} sin θ_{E i} - v_{P i} sin θ_{P i}

(2)

R_{P i} {\dot{φ}}_{L i} cos θ_{L i} = v_{P i} cos θ_{P i} sin φ_{P i} - v_{E i} cos θ_{E i} sin φ_{E i}

(3)

\begin{matrix} {\dot{θ}}_{P i} = & \frac{A_{y P i}}{v_{P i}} + tan θ_{L i} sin φ_{P i} \times \frac{(v_{P i} cos θ_{P i} sin φ_{P i} - v_{E i} cos θ_{E i} sin φ_{E i})}{R_{P i}} \\ + cos φ_{P i} \frac{(v_{P i} sin θ_{P i} - v_{E i} sin θ_{E i})}{R_{P i}} \end{matrix}

(4)

\begin{matrix} {\dot{φ}}_{P i} = & \frac{A_{z P i}}{v_{P i} cos θ_{P i}} + sin θ_{P i} cos φ_{P i} tan θ_{L i} + \frac{v_{E i} cos θ_{E i} sin φ_{E i} - v_{P i} cos θ_{P i} sin φ_{P i}}{R_{P i} cos θ_{P i}} \\ - sin θ_{P i} sin φ_{P i} \frac{v_{E i} sin θ_{E i} - v_{P i} sin θ_{E i}}{R_{P i} cos θ_{P i}} - \frac{v_{E i} cos θ_{E i} sin φ_{E i} - v_{P i} cos θ_{P i} sin φ_{P i}}{R_{P i}} \end{matrix}

(5)

\begin{matrix} {\dot{θ}}_{E i} = & \frac{A_{y E i}}{v_{E i}} + tan θ_{L i} sin φ_{E i} * \frac{(v_{P i} cos θ_{P i} sin φ_{P i} - v_{E i} cos θ_{E i} sin φ_{E i})}{R_{P i}} \\ + cos φ_{E i} \frac{(v_{P i} sin θ_{P i} - v_{E i} sin θ_{E i})}{R_{P i}} \end{matrix}

(6)

\begin{matrix} {\dot{φ}}_{E i} = & \frac{A_{z E i}}{v_{E i} cos θ_{E i}} + sin θ_{E i} cos φ_{E i} tan θ_{L i} + \frac{v_{E i} cos θ_{E i} sin φ_{E i} - v_{P i} cos θ_{P i} sin φ_{P i}}{R_{P i} cos θ_{E i}} \\ - sin θ_{E i} sin φ_{E i} \frac{v_{E i} sin θ_{E i} - v_{P i} sin θ_{E i}}{R_{P i} cos θ_{E i}} - \frac{v_{E i} cos θ_{E i} sin φ_{E i} - v_{P i} cos θ_{P i} sin φ_{P i}}{R_{P i}} \end{matrix}

(7)

The pursuit–evasion problem in three-dimensional space is modeled with multiple pursuers attempting to capture evaders. The notation used to describe the system dynamics is as given in Table 1:

In this paper, to clearly distinguish the parameters of pursuers and evaders, we use superscripts and subscripts. Specifically, the parameters of pursuers are denoted by the letter “P”, with corresponding superscripts and subscripts, while the parameters of the evaders are denoted by the letter “E”, with their respective superscripts and subscripts. Additionally, the letter “i” is used to represent each player, which helps clearly differentiate the state and behavior of each individual, especially in scenarios involving multiple pursuers and evaders. This notation ensures the clarity of formulas, figures, and descriptions, making it easier for readers to understand the components of the model and the relationships between players, while avoiding potential confusion.

2.2. Problem Formulation

In this paper,

(\bar{Ω}, F, {(F_{t})}_{t > 0}, P)

is a complete probability space,

W^{i} (s)

is m-dimensional Brownian motion. For any initial time

t \geq 0

, terminal time

T > t

, and initial state

x_{0}^{i} \in R^{n}

, the filtration

{(F_{t})}_{t > 0}

is the natural filtration generated by the Brownian motion

W^{i} (s)

for

t \leq s \leq T

, augmented by all the

P

-null sets of

F

. According to Equations (1)–(7), we consider a stochastic differential equation in a mean field game with control input constraints [20]:

\{\begin{matrix} d x^{i} (s) = (b^{i} (s, x_{i} (s)) + b_{u 1}^{i} (x_{i} (s)) u_{1}^{i} (s) + b_{u 2}^{i} (x_{i} (s)) u_{2}^{i} (s) \\ + b_{v 1}^{i} (x_{i} (s)) v_{1}^{i} (s) + b_{v 2}^{i} (x_{i} (s)) v_{2}^{i} (s)) d s \\ + σ^{i} (s, x_{i} (s), u_{1} (s), u_{2} (s), v_{1} (s), v_{2} (s)) d W^{i} (s), \\ s \in (t, T), \\ x^{i} (0) = x_{0} . \end{matrix}

(8)

where

i \in N

is the number of players in each population.

x^{i} (s) \in R^{m \times 1}

is the system state, with the initial condition

x_{0},

x^{i} = {[\begin{matrix} R_{P i}, θ_{L i}, φ_{L i}, φ_{P i}, θ_{P i}, θ_{E i}, φ_{E i} \end{matrix}]}^{T}

Let the accelerations of the pursuers and evader be defined as follows:

u_{1}^{i} (s) \in R^{n \times 1}

is the acceleration of the i-th pursuer in the velocity coordinate system of the y-axis.

u_{2}^{i} (s) \in R^{n \times 1}

is the acceleration of the i-th pursuer in the velocity coordinate system of the z-axis.

v_{1}^{i} (s) \in R^{k \times 1}

is the acceleration of the i-th evader in the velocity coordinate system of the y-axis.

v_{2}^{i} (s) \in R^{k \times 1}

is the acceleration of the i-th evader in the velocity coordinate system of thez-axis.

\{W^{i} (s) \in R^{m \times 1}, {(F_{t})}_{t > 0}, t \leq s \leq T\}

is the standard Wiener process for the i-th process.

b^{i} (s, x^{i} (s)) \in R^{m \times 1}, b_{u 1}^{i} (s, x^{i} (s)) \in R^{m \times 1}, b_{u 2}^{i} (s, x^{i} (s)) \in R^{m \times 1}

b_{v 1}^{i} (s, x^{i} (s)) \in R^{m \times 1}, b_{v 2}^{i} (s, x^{i} (s)) \in R^{m \times 1}

are real matrices for the i-th process.

σ^{i} (s, x^{i} (s), u_{1}^{i} (s), u_{2}^{i} (s), v_{1}^{i} (s), v_{2}^{i} (s)) \in R^{m \times m}

is the diffusion term of the system equation for the i-th process.

Assumption 1.

It can be assumed that

B^{i} = b^{i} + {b^{i}}_{u 1} u_{1}^{i} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1}^{i} + b_{v 2}^{i} v_{2}^{i}

, and σ is continuous and bounded. There exist positive constants

C_{b}, C_{u}, C_{σ}, C_{b σ}

such that

\begin{matrix} ∥B^{i} (s, x^{i}, u_{1}^{i}, u_{2}^{i}, v_{1}^{i}, v_{2}^{i}) - B^{i} (s, y^{i}, u_{1}^{i}, u_{2}^{i}, v_{1}^{i}, v_{2}^{i})∥ \leq C_{b} ∥x^{i} - y^{i}∥ \end{matrix}

(9)

\begin{matrix} ∥B^{i} (s, x^{i}, u_{1}^{i}, u_{2}^{i}, v_{1}^{i}, v_{2}^{i}) - B^{i} (s, x^{i}, {\bar{u}}_{1}^{i}, {\bar{u}}_{2}^{i}, {\bar{v}}_{1}^{i}, {\bar{v}}_{2}^{i})∥ \\ \begin{matrix}  \end{matrix} \leq C_{u} ({|u_{1}^{i} - {\bar{u}}_{1}^{i}|}^{p} + {|u_{2}^{i} - {\bar{u}}_{2}^{i}|}^{p} + {|v_{1}^{i} - {\bar{v}}_{1}^{i}|}^{p} + {|v_{2}^{i} - {\bar{v}}_{2}^{i}|}^{p}) \end{matrix}

(10)

∥σ^{i} (s, x^{i}, u_{1}^{i}, u_{2}^{i}, v_{1}^{i}, v_{2}^{i}) - σ^{i} (s, y^{i}, u_{1}^{i}, u_{2}^{i}, v_{1}^{i}, v_{2}^{i})∥ \leq C_{σ} ∥x^{i} - y^{i}∥

(11)

\begin{matrix} ∥B^{i} (s, x^{i}, u_{1}^{i}, u_{2}^{i}, v_{1}^{i}, v_{2}^{i}) + ∥σ^{i} (s, x^{i}, u_{1}^{i}, u_{2}^{i}, v_{1}^{i}, v_{2}^{i})∥∥ \leq C_{b σ} (1 + ∥x^{i}∥) \end{matrix}

(12)

where

s \in [t, T] . x^{i}, y^{i} \in R^{n}

. The admissible control sets for pursuers and evaders can be defined as follows:

π_{u} = \{\begin{matrix} u_{j}^{i} ∣ u_{j}^{i} \in [t, T] \times R^{n \times 1}, u_{j}^{i} is uniformly locally Lipschitz continuous, \\ |u_{j}^{i}| \leq u_{j}^{i, max}, j = 1, 2, i = 1, \dots N . \end{matrix}\}

(13)

π_{v} = \{\begin{matrix} v_{j}^{i} ∣ v_{j}^{i} \in [t, T] \times R^{n}, v_{j}^{i} is uniformly locally Lipschitz continuous, \\ |v_{j}^{i}| \leq v_{j}^{i, max}, j = 1, 2, i = 1, \dots, N . \end{matrix}\}

(14)

for all

s \in [t, T]

,

L (s, x^{i}, u_{1}^{i}, u_{2}^{i}, v_{1}^{i}, v_{2}^{i})

, and

g (x^{i})

is continuously differentiable. In the missile interception model presented in this paper, we assume that the control inputs are constrained. Specifically, the trajectory control engine, which governs the missile’s movement, has an upper bound on its output, which is directly related to the normal acceleration. This assumption is essential for accurately modeling real-world missile systems, where limitations on engine power and physical constraints on acceleration are commonly encountered. The assumption of bounded control inputs is well-supported by prior research in the field. For example, references [21] offered theoretical foundations for such constraints in control systems, while references [22,23] explored their relevance to missile interception problems. Furthermore, reference [24,25] discussed the effect of input constraints on system stability and performance, further reinforcing the validity of this assumption in our model [26].

In missile interception models, the pursuer’s cost function is crucial for optimizing the missile behavior to effectively intercept the target, as shown in Equation (15). The cost function can be expressed as a function of several key distributions within the missile–target environment. These distributions represent various interactions between the missile and surrounding entities, each of which contributes to the overall cost. The missile operates within a dynamic environment that requires careful consideration of four primary distribution factors:

Self-Proximity Distribution: This distribution aims to minimize the impact of other missiles in the same class on the missile’s behavior, ensuring that the missile’s movement is not adversely affected by its peers in close proximity.
Total Missile Distribution: The purpose of this distribution is to guide all missiles along a predetermined trajectory that aligns with the target distributions, optimizing the collective missile effort toward target interception.
Target-Proximity Distribution: This distribution is designed to avoid interference from nearby targets, ensuring that the missile’s approach is not obstructed by the presence of other targets in the vicinity.
Total Target Distribution: This distribution guides all missiles to converge toward the collective distribution of all targets, optimizing the strategy for interception of multiple targets simultaneously.

\begin{matrix} J_{P i} (s, u_{1}^{i}, u_{2}^{i}) \\ = inf_{X_{E i} \in C_{E}} E [∥X_{P i} (t) - X_{E i} (t)∥] \\ + inf_{u_{i} \in π_{u}} E [\int_{t}^{T} exp \{- \int_{t}^{s} c_{P} (x^{i} (r)) d r\} (x^{i} {(s)}^{T} Q_{P 1} x^{i} (s) + \frac{{(u_{1}^{i})}^{2}}{2} + \frac{{(u_{2}^{i})}^{2}}{2} \\ + \int_{Ω} λ_{P} (\partial_{t} ψ_{P} (s, x^{i}) + \partial_{x} ψ_{P} (s, x^{i}) \cdot d x^{i}) d (μ^{P} (s) - μ^{E} (s)) \\ + exp \{\frac{1}{{(X_{P i} - {\bar{X}}_{P i})}^{T} Q_{P 3} (X_{P i} - {\bar{X}}_{P i})}\} (γ_{P 1} u_{1}^{i} + γ_{P 1} u_{2}^{i}) \\ + {(X_{P i} - {\bar{X}}_{E i})}^{T} Q_{P 4} (X_{P i} - {\bar{X}}_{E i}) (γ_{E 1} u_{1}^{i} + γ_{E 1} u_{2}^{i})) d s] \\ + E [exp \{- \int_{t}^{s} c_{P} (x^{i} (r)) d r\} ({(x^{i})}^{T} (T) R_{P 1} x^{i} (T) + μ^{P} (T) - μ^{E} (T))] \end{matrix}

(15)

where

μ^{P}

refers to the probability measure of the pursuer group, and

μ^{E}

refers to the probability measure of the evader group;

{\bar{X}}_{P i}

is the mean of the pursuer’s neighboring group, and

{\bar{X}}_{E i}

is the mean of the evader’s neighboring group;

Ω

is used only for computing the distribution of the neighboring group, not for the entire set of individuals;

Q_{P 1} \in R^{n \times n}, Q_{P 3} \in R^{M \times M}, Q_{P 4} \in R^{M \times M}, R_{P 1} \in R^{n \times n}

are all positive definite matrices;

ψ_{P}

is a function with continuous first derivatives;

X_{P i} \in R^{5 \times 5}

is the vector of the pursuer’s distribution state, which includes the position and the line-of-sight angle in three-dimensional space;

{\bar{X}}_{P i}

is the mean vector of the distribution state;

γ_{P 1}, γ_{P 2}

are constants; and

c_{P}

is a state-dependent function. The cost function associated with the pursuer is composed of three distinct parts, each contributing to the optimal decision-making process for the missile:

Initial Cost: At the initial moment, the missile selects an appropriate target, with the aim of minimizing the cost. Once the target is chosen, the missile will not alter its choice, ensuring that the cost is minimized from the outset.
Running Cost: The running cost continues to accumulate over time, incorporating the dynamic effects of distribution on the missile’s trajectory and behavior, adjusting for the proximity and interactions with both other missiles and targets.
Terminal Cost: At the terminal time, the cost is defined by the final states of both the missile and its target, as well as the terminal values of the various distributions. This cost function seeks to minimize the final discrepancy between the missile’s state and the target’s state, considering both the interception success and the distributions at the end of the engagement.

Next, we present the cost function for the evader, as shown in Equation (16). This cost function primarily depends on three distributions:

Self-Proximity Distribution: This distribution aims to reduce the influence of other evaders of the same type on the evader’s behavior, avoiding interference between groups.
Total Evader Distribution: The purpose of this distribution is to guide all evaders along a predetermined trajectory to effectively move away from the pursuer’s area.
Total Pursuer Distribution: The evader adjusts its own distribution to ensure that its behavior in the group avoids getting too close to the distribution of the pursuer.

\begin{matrix} J_{E i} (s, v_{1}^{i}, v_{2}^{i}) \\ = sup_{v_{i} \in π_{v}} E [\int_{t}^{T} exp \{- \int_{t}^{s} c_{E} (x (r)) d r\} (x^{i} {(s)}^{T} Q_{E 1} x^{i} (s) + \frac{{(v_{1}^{i})}^{2}}{2} + \frac{{(v_{2}^{i})}^{2}}{2} \\ + \int_{Ω} λ_{E} (\partial_{t} ψ_{E} (s, x) + \partial_{x} ψ_{E} (s, x) \cdot d x^{i}) d (μ^{E} (s) - μ^{P} (s)) \\ + exp {(X_{E i} - {\bar{X}}_{P i})}^{T} Q_{E 3} (X_{E i} - {\bar{X}}_{E i}) (ϑ_{E 1} v_{1}^{i} + ϑ_{E 1} v_{2}^{i}) \\ + {(X_{E i} - {\bar{X}}_{P i})}^{T} Q_{E 4} (X_{E i} - {\bar{X}}_{P i}) (ϑ_{E 1} v_{1}^{i} + ϑ_{E 1} v_{2}^{i})) d s] \\ + E [exp \{- \int_{t}^{s} c_{E} (x (r)) d r\} ({(x^{i})}^{T} (T) R_{E 1} x^{i} (T) + m^{P} (T) - m^{E} (T))] \end{matrix}

(16)

where

Q_{E 1} \in R^{n \times n}, Q_{E 3} \in R^{M \times M}, Q_{E 4} \in R^{M \times M}, R_{E 1} \in R^{n \times n}

are positive definite matrices;

ψ_{E}

is a function with continuous first derivatives;

X_{E i} \in R^{5 \times 5}

is the vector representing the evader’s distribution state, which includes position and line-of-sight angles in three-dimensional space;

{\bar{X}}_{E i}

is the mean vector of the evader’s distribution state;

γ_{E 1}, γ_{E 2}

are constants; and

c_{E}

is a state-dependent function. The cost function consists of two main parts:

Running Cost: This part is similar to the previous ones but requires additional consideration of the influence of the evader group distribution on the cost.
Terminal Cost: This part concerns the difference between the evader’s state and the target state at the terminal time, as well as the final value for the influence of each distribution.

2.3. The $ϵ$ Nash Equilibrium of Two-Group Games

In a two-group game, assume there are two groups: the pursuer group P and the evader group E. Each group has N participants. Each participant makes decisions based on their strategy, and their strategy depends not only on their own choices but also on the strategies of other members in the same group and the strategies of members in the other group. For the pursuer group P, the cost function of each pursuer

P_{i}

is defined as

J_{P_{i}} (u_{1}^{i}, u_{2}^{i}, u_{1}^{- i}, u_{2}^{- i}),

where

u_{1}^{i}, u_{2}^{i}

are the strategies of the pursuer

P_{i}

, and

u_{1}^{- i}, u_{2}^{- i}

are the strategies of other members in the pursuer group

P_{- i}

. Under the

ϵ

-Nash equilibrium, assume the evader group’s strategy E remains unchanged, and the strategies of the other pursuers

P_{- i}

remain fixed at

u_{1}^{- i}, u_{2}^{- i}

. If pursuer

P_{i}

deviates from the optimal strategy

u_{1}^{i, *}, u_{2}^{i, *}

, the cost function of pursuer

P_{i}

will increase by at most

ϵ

, i.e.

J_{P_{i}} (u_{1}^{i, *}, u_{2}^{i, *}, u_{1}^{- i}, u_{2}^{- i}) - ϵ \leq J_{P_{i}} (u_{1}^{i}, u_{2}^{i}, u_{1}^{- i}, u_{2}^{- i}) \leq J_{P_{i}} (u_{1}^{i, *}, u_{2}^{i, *}, u_{1}^{- i}, u_{2}^{- i}) .

(17)

where

ϵ

is a positive constant that represents the tolerance for deviations from the optimal strategy. For the evader group E, the cost function of each evader

E_{i}

is defined as

J_{E_{i}} (v_{1}^{i}, v_{2}^{i}, v_{1}^{- i}, v_{2}^{- i}),

where

v_{1}^{i}, v_{2}^{i}

are the strategies of the evader

E_{i}

, and

v_{1}^{- i}, v_{2}^{- i}

are the strategies of the other members in the evader group

E_{- i}

. Under the

ϵ

-Nash equilibrium, assume the pursuer group’s strategy P remains unchanged, and the strategies of the other evaders

E_{- i}

remain fixed at

v_{1}^{- i}, v_{2}^{- i}

. If evader

E_{i}

deviates from the current optimal strategy

v_{1}^{i, *}, v_{2}^{i, *}

, the cost function of evader

E_{i}

will increase by at most

ϵ

, i.e.,

J_{E_{i}} (v_{1}^{i, *}, v_{2}^{i, *}, v_{1}^{- i}, v_{2}^{- i}) + ϵ \geq J_{E_{i}} (v_{1}^{i}, v_{2}^{i}, v_{1}^{- i}, v_{2}^{- i}) \geq J_{E_{i}} (v_{1}^{i, *}, v_{2}^{i, *}, v_{1}^{- i}, v_{2}^{- i}) .

(18)

Following the method of Huang et al. (2007) [2], the results can be generalized to the given system (8). Additionally, the system described by Equations (8), (17), and (18) can be viewed as a special case of the major–minor LQG mean field game (LQG MFG) systems analyzed by Firoozi, Jaimungal, and Caines (2020) [27]. For a comparison of various approaches to major–minor LQG mean field games, refer to Huang (2020) [28]. Furthermore, the analysis in [29] offered valuable insights into entropy regularization techniques applied to these models. In two-group games, the

ϵ

-Nash equilibrium is a relaxed equilibrium concept. This allows each participant to deviate from the optimal strategy within a certain range, without causing a significant loss in utility. Specifically, both the pursuers and the evaders can choose strategies that are not completely optimal, as long as the gain from deviation does not exceed the tolerance

ϵ

. In this context, the

ϵ

-Nash equilibrium provides an effective description that better captures the stability of the game and the behavior patterns of the participants. The concept of

ϵ

-Nash equilibrium in two-group games offers a relaxed equilibrium condition, taking into account real-world factors such as limited rationality and incomplete information. By allowing participants to deviate from their strategies to some extent, the

ϵ

-Nash equilibrium more accurately reflects the behavior and decision-making in real-world games.

3. The Distribution Functions for the Pursuers and Evaders

For convenience, in the subsequent derivations, we first introduce the distribution functions for the pursuer and evaders. Let

h (x) : R^{d} \to R

be a smooth function that is compactly supported and possesses at least second-order continuous derivatives. According to Itô’s lemma,

d E [h (x)] = E [(\frac{1}{2} Δ h (x) + α (s, x, m) \cdot \nabla h (x))] d s,

(19)

where

α (s, x, m)

is a function of s, x, and m, representing the dynamics of the system. We assume the difference in distribution of the pursuers and evaders is given by

m_{s} = m_{s}^{P} - m_{s}^{E}

(20)

where

m_{s}^{P}

and

m_{s}^{E}

are the distribution functions for the pursuer and the evaders, respectively. Substituting this into the equation, we obtain The following equations describe the evolution of the distribution of the system. We first start by writing down the change in the function

h (x)

over the domain

\bar{Ω}

:

d \int_{\bar{Ω}} h (x) d m_{s} = \int_{\bar{Ω}} [\frac{1}{2} Δ h (x) + α (t, x, m_{s}) \cdot \nabla h (x)] d m_{s} d s .

(21)

This equation shows the rate of change in the integral of

h (x)

with respect to

m_{s}

, involving both the Laplacian of

h (x)

and the interaction term

α (t, x, m_{s}) \cdot \nabla h (x)

, which represents the influence of the pursuers on the system’s evolution.

Next, we use the duality product to express the change in the system in terms of the inner product:

d 〈 h, m_{s} 〉 = 〈\frac{1}{2} Δ h + α (s, \cdot, m_{s}) \cdot \nabla h, m_{s}〉 d s,

(22)

where

〈 \cdot, \cdot 〉

denotes the inner product. This equation expresses the rate of change in the inner product of h and

m_{s}

, which corresponds to the evolution of the system in the dual space.

Further, we can write the next expression for the system dynamics:

〈h, \frac{d}{d s} m_{s}〉 d s = 〈\frac{1}{2} Δ m_{s} - div (α (s, x_{s}, m_{s}) m_{s}), h〉 d s .

(23)

This equation shows the rate of change in the inner product of h with the distribution

m_{s}

, where the evolution of

m_{s}

is described in terms of its Laplacian and the divergence of the interaction term.

Finally, based on the above derivation, we obtain the following partial differential equation for

m_{s}

:

\partial_{s} m_{s} = \frac{1}{2} Δ m_{s} - div (α (s, x_{s}, m_{s}) m_{s}) .

(24)

This equation is the key partial differential equation that governs the dynamics of the pursuer and evader system. It describes how the distribution

m_{s}

evolves over time.

The term

α (s, x_{s}, m_{s})

represents the relative information between the pursuer and the evaders, given by the following equation:

α (s, x_{s}, m_{s}) = \int m_{s}^{P} ln \frac{m_{s}^{P}}{m_{s}^{E}} .

This expression measures the relative information between the pursuer’s and evader’s distributions.

Finally, we can express the time evolution of the difference between the pursuer’s and evader’s distributions as

\frac{\partial (m_{s}^{P} - m_{s}^{E})}{\partial s} = \frac{1}{2} \nabla^{2} (m_{s}^{P} - m_{s}^{E}) - div (\int m_{s}^{P} ln \frac{m_{s}^{P}}{m_{s}^{E}} d s \cdot (m_{s}^{P} - m_{s}^{E})) + η_{s} .

(25)

This final equation describes the evolution of the difference between the pursuer’s and evader’s distributions. The term

η_{s}

represents any external influence on the system.

The continuity equation for the pursuer is defined as

\frac{\partial (m_{s}^{P})}{\partial s} = \frac{\partial (m_{s}^{E})}{\partial s} + \frac{1}{2} \nabla^{2} (m_{s}^{P} - m_{s}^{E}) - div (\int m_{s}^{P} ln \frac{m_{s}^{P}}{m_{s}^{E}} d s \cdot (m_{s}^{P} - m_{s}^{E})) + η_{P}

(26)

The continuity equation for the evaders is defined similarly, but with the opposite direction of change:

\frac{\partial (m_{s}^{E})}{\partial s} = - \frac{\partial (m_{s}^{P})}{\partial s} + \frac{1}{2} \nabla^{2} (m_{s}^{P} - m_{s}^{E}) - div (\int m_{s}^{P} ln \frac{m_{s}^{P}}{m_{s}^{E}} d s \cdot (m_{s}^{P} - m_{s}^{E})) + η_{E}

(27)

where

η_{P} (s)

and

η_{E} (s)

are random processes with zero mean and finite variance. The following definitions are used throughout the derivations:

m_{s}^{P}

is the distribution function of the pursuer at time s.

m_{s}^{E}

is the distribution function of the evaders at time s, which may be a Gaussian distribution or other distribution that describes the dynamics of the target.

λ

is a positive constant that controls the speed at which the missile distribution

m_{s}^{P}

converges to the target distribution

m_{s}^{E}

.

\nabla^{2} m_{s}^{P}

is the Laplace operator, which represents the spatial diffusion of the missile distribution.

η_{s}

is the perturbation term, which represents random noise in the system, typically modeled as white noise or a stochastic process. Finally, we have the normalization condition:

\int_{\bar{Ω}} d μ (X) = \int_{R^{n}} m (X) d X = 1,

where

\bar{Ω}

represents the support of the distribution, and the integral ensures that the total probability is normalized to 1.

4. The Optimal Feedback Strategies

For convenience, in the subsequent calculations, we simplify the cost function. The pursuer’s cost function is redefined as

\begin{matrix} J_{P i} (x^{i}, s; μ^{P}, μ^{E}) & = inf_{u_{i} \in π_{u}} E [e^{- \int_{t}^{T} c_{P} (x (r)) d r} g_{P i} (x (T), μ^{P} (T)) \\ + \int_{t}^{T} e^{- \int_{t}^{s} c_{P} (x (r)) d r} h_{P i} (x, s, μ^{P} (s)) d s] \end{matrix}

(28)

where This equation is a simplification of Equation (15), designed to facilitate subsequent calculations and proofs. We have divided Equation (15) into two parts: the first part represents the running cost, and the second part corresponds to the terminal cost.

The running cost includes the state during motion, the sum of the probability measures of the neighboring group distributions, and the functional changes between the neighboring groups of the pursuers and evaders. The terminal cost function incorporates the state at the terminal time and the distribution difference between pursuers and evaders at the terminal time.

g_{P i} (x (T), μ^{P} (T))

is the terminal cost, which depends on the state

x (T)

and distribution

μ^{P} (T)

at the final time T,

h_{P i} (x, s, μ^{P} (s))

is the running cost at each time s, with control strategies

u_{1}^{P i}, u_{2}^{P i}

and distribution

m_{s}

. The terminal cost

g_{P i} (x (T), μ^{P} (T))

is defined as

g_{P i} (x (T), μ^{P} (T)) = {(x^{i} (T))}^{T} R_{P 1} x^{i} (T) + μ^{P} (T) - μ^{E} (T)

The running cost

h_{P i} (x, s, μ^{P} (s))

is given by

\begin{matrix} h_{P i} (x, s, μ^{P} (s)) & = [(x^{i} {(s)}^{T} Q_{P 1} x^{i} (s) + \frac{{(u_{1}^{i})}^{2}}{2} + \frac{{(u_{2}^{i})}^{2}}{2}) \\ + \int_{Ω} λ_{P} (\partial_{t} ψ_{P} (s, x^{i}) + \partial_{x} ψ_{P} (s, x^{i}) \cdot d x^{i}) d (μ^{P} (s) - μ^{E} (s)) \\ + exp \{\frac{1}{{(X_{P i} - {\bar{X}}_{P i})}^{T} Q_{P 3} (X_{P i} - {\bar{X}}_{P i})}\} (γ_{P 1} u_{1}^{i} + γ_{P 1} u_{2}^{i}) \\ + {(X_{P i} - {\bar{X}}_{E i})}^{T} Q_{P 4} (X_{P i} - {\bar{X}}_{E i}) (γ_{E 1} u_{1}^{i} + γ_{E 1} u_{2}^{i})] \end{matrix}

Assumption 2.

For the convenience of the subsequent calculations of the cost function

J (u, v)

, it is assumed that [30]: There exist positive constants

C_{L}

,

C_{g}

, and

p \geq 2

such that

\begin{matrix} |e^{- \int_{t}^{s} c_{P} (x^{i} (r)) d r} h_{P i} (x^{i}, s, μ^{P} (s))| \leq C_{L} (1 + ∥ x^{i} ∥^{p} + | u_{1}^{i} |^{p} + {| u_{2}^{i} |}^{p}) \end{matrix}

(29)

|e^{- \int_{t}^{s} c_{p} (x^{i} (r)) d r} g_{P i} (x^{i} (T), μ^{P} (T))| \leq C_{g} (1 + ∥ x^{i} ∥^{p})

(30)

To process the terminal value

g_{P i} (x_{T}, m_{T})

using the Itô–Wentzell formula, we consider the auxiliary process

p_{P i} (s)

, which represents the terminal condition. The dynamics of

p_{P i} (t)

are given by

d p_{P i} (s) = a_{P i} (s, x^{i}, m^{P} (t)) d t + b_{P i} (s, x^{i}, m^{P} (t)) d W^{i} (s),

(31)

with the terminal condition as follows:

p_{P i} (T) = g_{P i} (x_{T}, m_{T}) .

(32)

By substituting the above formula and applying the Itô–Wentzell formula, we have

\begin{matrix} d p_{P i} (s) = & (\frac{\partial g_{P i} (s, x^{i} (t), m^{P} (t))}{\partial s} + a_{P i} (s, x^{i} (t), m^{P} (t)) \cdot \frac{\partial g_{P i} (s, x^{i} (s), m^{P} (s))}{\partial x} \\ + \frac{1}{2} Tr (b_{P i} (t, x (s), m (t)) b_{P i} {(s, x (s), m (s))}^{T} \frac{\partial^{2} g_{P i} (s, x^{i})}{\partial x^{2}} \\ + \frac{δ g_{P i} (s, x^{i})}{δ m} \cdot \dot{m} (s)) d s \\ + \frac{\partial g_{P i} (s, x^{i})}{\partial x} \cdot σ^{i} (s, x (s), m^{P}) d W^{i} (s) \end{matrix}

(33)

Derivative with respect to time t. Using the chain rule,

\frac{\partial g (t, x^{i}, m^{P})}{\partial s} = \frac{\partial g (t, x^{i}, m^{P})}{\partial x} \frac{\partial x^{i} (s)}{\partial s} + \frac{\partial g (t, x^{i}, m^{P})}{\partial m_{s}^{P} (X)} \frac{\partial m_{s}^{P} (X)}{\partial s} + \frac{\partial g (t, x^{i}, m^{P})}{\partial m_{s}^{E} (X)} \frac{\partial m_{s}^{E} (X)}{\partial s}

(34)

Substituting the evolution equations for the pursuer and evaders,

\begin{matrix} \frac{\partial m_{s}^{P} (X)}{\partial s} & = \frac{\partial m_{s}^{E} (X)}{\partial s} \\ + \frac{1}{2} \nabla^{2} (m_{s}^{P} (X) - m_{s}^{E} (X)) \\ - div (\int m_{s}^{P} (X) ln \frac{m_{s}^{P} (X)}{m_{s}^{E} (X)} d s \cdot (m_{s}^{P} (X) - m_{s}^{E} (X))) \\ + η_{P} \end{matrix}

(35)

According to the Equation (34), the following is the distribution change of the evader:

\begin{matrix} \frac{\partial m_{s}^{E} (X)}{\partial s} & = - \frac{\partial m_{s}^{P} (X)}{\partial s} \\ + \frac{1}{2} \nabla^{2} (m_{s}^{P} (X) - m_{s}^{E} (X)) \\ - div (\int m_{s}^{P} (X) ln \frac{m_{s}^{P} (X)}{m_{s}^{E} (X)} d s \cdot (m_{s}^{P} (X) - m_{s}^{E} (X))) \\ + η_{E} \end{matrix}

(36)

Substituting the above equations, we obtain

\begin{matrix} \frac{\partial g}{\partial s} & = \frac{\partial g}{\partial x} \frac{\partial x}{\partial s} \\ + \frac{\partial g}{\partial m_{s}^{P} (X)} (\frac{\partial m_{s}^{E} (X)}{\partial s} + \frac{1}{2} \nabla^{2} (m_{s}^{P} (X) - m_{s}^{E} (X)) \\ - div (\int m_{s}^{P} (X) ln \frac{m_{s}^{P} (X)}{m_{s}^{E} (X)} d s \cdot (m_{s}^{P} (X) - m_{s}^{E} (X))) + η_{P}) \\ + \frac{\partial g}{\partial m_{s}^{E} (X)} (- \frac{\partial m_{s}^{P} (X)}{\partial s} + \frac{1}{2} \nabla^{2} (m_{s}^{P} (X) - m_{s}^{E} (X)) \\ - div (\int m_{s}^{P} (X) ln \frac{m_{s}^{P} (X)}{m_{s}^{E} (X)} d s \cdot (m_{s}^{P} (X) - m_{s}^{E} (X))) + η_{E}) \end{matrix}

(37)

Therefore, we can obtain

\begin{matrix} \frac{\partial g}{\partial s} & = \frac{\partial g}{\partial x} \frac{\partial x}{\partial s} \\ + (\frac{\partial m_{s}^{E} (X)}{\partial s} + \frac{1}{2} \nabla^{2} (m_{s}^{P} (X) - m_{s}^{E} (X)) \\ - div (\int m_{s}^{P} (X) ln \frac{m_{s}^{P} (X)}{m_{s}^{E} (X)} d s \cdot (m_{s}^{P} (X) - m_{s}^{E} (X))) + η_{P}) \\ - (- \frac{\partial m_{s}^{P} (X)}{\partial s} + \frac{1}{2} \nabla^{2} (m_{s}^{P} (X) - m_{s}^{E} (X)) \\ - div (\int m_{s}^{P} (X) ln \frac{m_{s}^{P} (X)}{m_{s}^{E} (X)} d s \cdot (m_{s}^{P} (X) - m_{s}^{E} (X))) + η_{E}) \end{matrix}

(38)

Taking the Derivative with Respect to x:

\frac{\partial g}{\partial x} = 2 R_{P 1} x^{i} (T)

This is the derivative of

x^{T} (T) R_{P 1} x (T)

. Taking the Second Derivative with Respect to x:

\frac{\partial^{2} g}{\partial x^{2}} = 2 R_{P 1}

This is the second derivative of

x^{T} (T) R_{P 1} x (T)

. Variational Derivative with Respect to m:

\frac{δ g}{δ m} = \frac{\partial g}{\partial m}

Substituting Derivative Terms into the Formula. Substituting all the terms, we obtain

\begin{matrix} \frac{\partial g}{\partial s} & = 2 R_{P 1} x (T) (b^{i} + b_{u 1}^{i} u_{1} + b_{u 2}^{i} u_{2} + b_{v 1}^{i} v_{1} + b_{v 2}^{i} v_{2}) \\ + (\frac{\partial m_{s}^{E}}{\partial s} + \frac{1}{2} \nabla^{2} (m_{s}^{P} - m_{s}^{E}) - div (\int m_{s}^{P} ln \frac{m_{s}^{P}}{m_{s}^{E}} d s \cdot (m_{s}^{P} - m_{s}^{E})) + η_{P}) \\ - (- \frac{\partial m_{s}^{P}}{\partial s} + \frac{1}{2} \nabla^{2} (m_{s}^{P} - m_{s}^{E}) - div (\int m_{s}^{P} ln \frac{m_{s}^{P}}{m_{s}^{E}} d s \cdot (m_{s}^{P} - m_{s}^{E})) + η_{E}) \end{matrix}

(39)

Pursuer’s Dynamic Equation:

\begin{matrix} d p_{P i} (s) & = (\frac{\partial g_{P i}}{\partial s} + a_{P i} (s, x^{i} (s), m (s)) \cdot (2 R_{P 1} x^{i} (T)) \\ + \frac{1}{2} Tr (b_{P i} b_{P i}^{T} \cdot 2 R_{P 1}) + \partial_{s} m^{P}) d s \\ + (2 R_{P 1} x^{i} (T)) \cdot σ (s, x^{i} (s), m^{P} (s)) d W^{i} (s) \\ = (2 R_{P 1} x^{i} (T) (b^{i} + b_{u 1}^{i} u_{1}^{i} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1}^{i} + b_{v 2}^{i} v_{2}^{i}) \\ + \frac{\partial m_{s}^{E}}{\partial s} + \frac{1}{2} \nabla^{2} (m_{s}^{P} - m_{s}^{E}) - div (\int m_{s}^{P} ln \frac{m_{s}^{P}}{m_{s}^{E}} d s \cdot (m_{s}^{P} - m_{s}^{E})) + η_{P} \\ + \frac{\partial m_{s}^{P}}{\partial s} + \frac{1}{2} \nabla^{2} (m_{s}^{P} - m_{s}^{E}) + div (\int m_{s}^{P} ln \frac{m_{s}^{P}}{m_{s}^{E}} d s \cdot (m_{s}^{P} - m_{s}^{E})) - η_{E} \\ + \frac{1}{2} Tr (b b^{T} \cdot 2 R_{P 1}) + \partial_{s} m^{P}) d s \\ + (2 R_{P 1} x^{i} (T)) \cdot σ^{i} (s, x^{i} (s), m^{P}) d W^{i} (s) \end{matrix}

(40)

Evader’s Dynamic Equation:

\begin{matrix} d p_{E i} (s) & = (\frac{\partial g_{E i}}{\partial s} + a_{E i} (s, x^{i} (s), m (s)) \cdot (2 R_{P 1} x^{i} (T)) \\ + \frac{1}{2} Tr (b_{E i} b_{E i}^{T} \cdot 2 R_{P 1}) + \partial_{s} m^{E}) d s \\ + (2 R_{P 1} x^{i} (T)) \cdot σ (s, x^{i} (s), m^{E} (s)) d W^{i} (s) \\ = (2 R_{P 1} x^{i} (T) (b^{i} + b_{u 1}^{i} u_{1}^{i} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1}^{i} + b_{v 2}^{i} v_{2}^{i}) \\ + \frac{\partial m_{s}^{P}}{\partial s} + \frac{1}{2} \nabla^{2} (m_{s}^{E} - m_{s}^{P}) - div (\int m_{s}^{E} ln \frac{m_{s}^{E}}{m_{s}^{P}} d s \cdot (m_{s}^{E} - m_{s}^{P})) + η_{E} \\ + \frac{\partial m_{s}^{E}}{\partial s} + \frac{1}{2} \nabla^{2} (m_{s}^{E} - m_{s}^{P}) + div (\int m_{s}^{E} ln \frac{m_{s}^{E}}{m_{s}^{P}} d s \cdot (m_{s}^{E} - m_{s}^{P})) - η_{P} \\ + \frac{1}{2} Tr (b b^{T} \cdot 2 R_{P 1}) + \partial_{s} m^{E}) d s \\ + (2 R_{E 1} x^{i} (T)) \cdot σ^{i} (s, x^{i} (s), m^{E}) d W^{i} (s) \end{matrix}

(41)

To incorporate the terminal condition into the optimization process, we introduce an auxiliary process

p (t)

, with the dynamics as shown above. The terminal condition is now processed dynamically using the formula:

V_{P i} (x^{i}, s) = inf_{u_{i} \in π_{u}} E [e^{- \int_{t}^{s} c_{P i} (x^{i} (r)) d r} h_{p i} (x^{i}) d s + e^{- \int_{t}^{s} c (x^{i} (r)) d r} {(x^{i})}^{T} p_{P i} (s) x^{i} d s + V_{P i} (x_{s + d s}^{i}, s + d s)]

(42)

where

e^{- \int_{t}^{s} c_{P} (x^{i} (r)) d r}

represents the discount factor, accounting for the accumulated cost over time [31]. Next, we apply Ito’s Lemma to expand

V_{P i} (x_{s + d s}^{i}, s + d s)

:

V_{P i} (x_{s + d s}^{i}, s + d s) = V_{P i} (x^{i}, s) + \frac{\partial V_{P i}}{\partial s} d s + \frac{\partial V_{P i}}{\partial x} d x^{i} + \frac{1}{2} \frac{\partial^{2} V_{P i}}{\partial x^{2}} {(d x^{i})}^{2}

(43)

We then compute the expectation:

\begin{matrix} E [V_{P i} (x_{s + d s}^{i}, s + d s)] & = V_{P i} (x^{i}, s) \\ + E [\frac{\partial V_{P i}}{\partial s} + {(b^{i} + b_{u 1}^{i} u_{1} + b_{u 2}^{i} u_{2} + b_{v 1}^{i} v_{1} + b_{v_{2}}^{i} v_{2})}^{T} \frac{\partial V_{P i}}{\partial x} \\ + \frac{1}{2} Tr (σ^{i} {(σ^{i})}^{T}) \frac{\partial^{2} V_{P i}}{\partial x^{2}}] d s \end{matrix}

(44)

Substituting this into the dynamic programming equation,

\begin{matrix} V_{P i} (x^{i}, t) & = inf_{u_{i} \in π_{u}} E [e^{- \int_{t}^{s} c_{P} (x^{i} (r)) d r} (h_{P i} (x^{i}, s) d s \\ + {(x^{i})}^{T} p_{P i} (T) x^{i} d s) + V_{P i} (x^{i}, t) \\ + \frac{\partial V_{P i}}{\partial s} d s + \frac{\partial V_{P i}}{\partial x} d x^{i} \\ + \frac{1}{2} \frac{\partial^{2} V_{P i}}{\partial x^{2}} {(d x^{i})}^{2}] \end{matrix}

(45)

Simplifying yields the Hamilton–Jacobi–Bellman (HJB) equation:

\begin{matrix} V_{P i} (x^{i}, t) & = inf_{u_{i} \in π_{u}} E [e^{- \int_{t}^{s} c_{P} (x^{i} (r)) d r} (h_{P i} (x^{i}, μ^{P}) d s + p_{P i} (s) d s) + V_{P i} (x^{i}, t) + \frac{\partial V_{P i}}{\partial s} d s \\ + {(b^{i} + b_{u 1}^{i} u_{1} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1} + b_{v_{2}}^{i} v_{2})}^{T} \frac{\partial V_{P i}}{\partial x} d s \\ + \frac{1}{2} Tr (σ^{i} {(σ^{i})}^{T}) \frac{\partial^{2} V_{P i}}{\partial x^{2}} d s] \end{matrix}

(46)

Substituting

h_{P i} (s, x^{i})

into the equation,

\begin{matrix} V_{P i} (x^{i}, s) & = inf_{u_{i} \in π_{u}} E [e^{- \int_{t}^{s} c_{P} (x (r)) d r} (x^{i} {(s)}^{T} Q_{P 1} x^{i} (s) + \frac{{(u_{1}^{i})}^{2}}{2} + \frac{{(u_{2}^{i})}^{2}}{2}) \\ + \int_{Ω} λ_{P} (\partial_{t} ψ_{P} (s, x^{i}) + \partial_{x} ψ_{P} (s, x^{i}) \cdot d x^{i}) d (μ^{P} (s) - μ^{E} (s)) \\ + exp \{\frac{1}{{(X_{P i} - {\bar{X}}_{P i})}^{T} Q_{P 3} (X_{P i} - {\bar{X}}_{P i})}\} (γ_{P 1} u_{1}^{i} + γ_{P 1} u_{2}^{i}) \\ + {(X_{P i} - {\bar{X}}_{E i})}^{T} Q_{P 4} (X_{P i} - {\bar{X}}_{E i}) (γ_{E 1} u_{1}^{i} + γ_{E 1} u_{2}^{i}) + p_{P i} (T)) d s \\ + V_{P i} (x^{i}, s) + \frac{\partial V_{P i}}{\partial s} d s \\ + {(b^{i} + b_{u 1}^{i} u_{1} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1} + b_{v_{2}}^{i} v_{2})}^{T} \frac{\partial V_{P i}}{\partial x} d s \\ + \frac{1}{2} Tr (σ^{i} {(σ^{i})}^{T}) \frac{\partial^{2} V_{P i}}{\partial x^{2}} d s] \end{matrix}

(47)

where

e^{- \int_{t}^{s} c_{P} (x^{i} (r)) d r}

is the discount factor affecting all terms.

h_{P i} (s, x^{i})

represents the instantaneous cost, while

p_{P i} (s)

handles the terminal condition.

V_{P i} (x^{i}, t)

is the value function of the pursuer’s strategy at time t and state

x^{i}

. The term

\frac{\partial V_{P i}}{\partial s}

represents the time derivative of the value function, indicating the rate of change in

V_{P i}

with respect to time.

\frac{\partial V_{P i}}{\partial x}

is the spatial derivative, showing how

V_{P i}

changes with respect to the state variable

x^{i}

.

\frac{\partial^{2} V_{P i}}{\partial x^{2}}

is the second-order spatial derivative, describing the curvature of

V_{P i}

in space. The term

{(b^{i} + b_{u 1}^{i} u_{1}^{i} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1}^{i} + b_{v_{2}}^{i} v_{2}^{i})}^{T} \frac{\partial V_{P i}}{\partial x}

captures the interaction between the state process dynamics and the spatial gradient of the value function.

\frac{1}{2} Tr (σ^{i} {(σ^{i})}^{T}) \frac{\partial^{2} V_{P i}}{\partial x^{2}}

represents the diffusion term, accounting for uncertainty or noise in the process. We first simplify and obtain the following HJB equation for the pursuer:

\begin{matrix} 0 & = \frac{\partial V_{P i}}{\partial s} \\ + inf_{u_{i} \in π_{u}} E [e^{- \int_{t}^{T} c_{P} (x (r)) d r} ((x^{i} {(s)}^{T} Q_{P 1} x^{i} (s) + \frac{{(u_{1}^{i})}^{2}}{2} + \frac{{(u_{2}^{i})}^{2}}{2}) \\ + \int_{Ω} λ_{P} (\partial_{t} ψ_{P} (s, x^{i}) + \partial_{x} ψ_{P} (s, x^{i}) \cdot d x^{i}) d (μ^{P} (s) - μ^{E} (s)) \\ + exp \{\frac{1}{{(X_{P i} - {\bar{X}}_{P i})}^{T} Q_{P 3} (X_{P i} - {\bar{X}}_{P i})}\} (γ_{P 1} u_{1}^{i} + γ_{P 1} u_{2}^{i}) \\ + {(X_{P i} - {\bar{X}}_{E i})}^{T} Q_{P 4} (X_{P i} - {\bar{X}}_{E i}) (γ_{E 1} u_{1}^{i} + γ_{E 1} u_{2}^{i}) + p_{P i} (t) \\ + {(b^{i} + b_{u 1}^{i} u_{1} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1} + b_{v_{2}}^{i} v_{2})}^{T} \frac{\partial V_{P i}}{\partial x} \\ + \frac{1}{2} Tr (σ^{i} {(σ^{i})}^{T}) \frac{\partial^{2} V_{P i}}{\partial x^{2}}] \end{matrix}

(48)

Similarly, the HJB equation for the evaders is obtained as

\begin{matrix} 0 & = \frac{\partial V_{E i}}{\partial s} \\ + sup_{u_{i} \in π_{v}} E [e^{- \int_{t}^{T} c_{E} (x (r)) d r} ((x^{i} {(s)}^{T} Q_{E 1} x^{i} (s) + \frac{{(v_{1}^{i})}^{2}}{2} + \frac{{(v_{2}^{i})}^{2}}{2}) \\ + \int_{Ω} λ_{E} (\partial_{t} ψ_{E} (s, x^{i}) + \partial_{x} ψ_{E} (s, x^{i}) \cdot d x^{i}) d (μ^{E} (s) - μ^{P} (s)) \\ + {(X_{E i} - {\bar{X}}_{E i})}^{T} Q_{E 3} (X_{E i} - {\bar{X}}_{E i}) \\ + {(X_{E i} - {\bar{X}}_{P i})}^{T} Q_{E 4} (X_{E i} - {\bar{X}}_{P i}) + p_{E i} (t)) \\ + {(b^{i} + b_{u 1}^{i} u_{1} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1} + b_{v_{2}}^{i} v_{2})}^{T} \frac{\partial V_{E i}}{\partial x} \\ + \frac{1}{2} Tr (σ^{i} {(σ^{i})}^{T}) \frac{\partial^{2} V_{E i}}{\partial x^{2}}] \end{matrix}

(49)

Assumption 3.

Initial Conditions: We assume that there exists a constant K such that, for all

N \in N

(the number of players), the initial state

X_{0}^{i}

of each player i lies within a closed ball

B_{K} (0)

centered on the initial mean

\bar{X}

with radius K. In other words, the initial states of all players are constrained within a finite region.

This assumption ensures that, when analyzing and solving the game, the initial states of the players are not excessively large, thereby avoiding issues of instability or intractability due to extreme initial conditions. This is crucial for ensuring the existence and solvability of the solution. Based on Theorem 4.12 in [32], we assume that the population distribution

μ

is absolutely continuous and satisfies the following partial differential equation as its distributional solution:

\int_{t}^{T} \int_{\bar{Ω}} (\partial_{t} ψ (t, x) + \nabla ψ (t, x) \cdot v (t, x)) d μ_{t} (X) d t = 0 \forall ψ \in C_{c}^{1} ((t, T) \times \bar{Ω}),

where

μ_{t}

is the population distribution at time t, and

v (t, x)

is the Borel vector field associated with individual decisions. This condition ensures that the evolution of the population follows the distributional solution of the above equation, which directly influences the strategies of individuals in the game. In the game model, we assume that a player’s behavior is not only driven by their own state but also influenced by the distribution of the entire population. This influence is reflected in the second term of the HJB equation, through the distributional solution [33]. In this case, individuals respond to the distribution of the population in their decision-making process, ensuring that each player’s behavior is consistent with the distribution of the overall population. We assume that the value functions for the pursuer and the evaders are given by the following expressions:

V_{P i} = {(x^{i})}^{T} S_{1}^{P i} x^{i} + S_{2}^{P i} x^{i} + S_{3}^{P i}

(50)

Equation (50) represents the value function for the i-th pursuer. The terms

S_{1}^{P i}

,

S_{2}^{P i}

, and

S_{3}^{P i}

are parameters that determine the quadratic, linear, and constant contributions to the value function, respectively, while

x^{i}

is the position of the i-th pursuer.

V_{E i} = {(x^{i})}^{T} S_{1}^{E i} x^{i} + S_{2}^{E i} x^{i} + S_{3}^{E i}

(51)

Equation (51) represents the value function for the i-th evader. Similarly,

S_{1}^{E i}

,

S_{2}^{E i}

, and

S_{3}^{E i}

are parameters that define the quadratic, linear, and constant terms, while

x^{i}

is the position of the i-th evader.

Next, we compute the partial derivatives of the value functions with respect to s and x:

\frac{\partial V_{P i}}{\partial s} = {(x^{i})}^{T} \frac{\partial S_{1}^{P i}}{\partial s} x^{i} + \frac{\partial S_{2}^{P i}}{\partial s} x^{i} + \frac{\partial S_{3}^{P i}}{\partial s}, \frac{\partial V_{P i}}{\partial x} = S_{1}^{P i} x^{i} + S_{2}^{P i}, \frac{\partial^{2} V_{P i}}{\partial x^{2}} = S_{1}^{P i}

Substituting these derivatives into the equation, we obtain the following HJB equation for the pursuer:

\begin{matrix} 0 & = inf_{u_{i} \in π_{u}} E [e^{- \int_{t}^{T} c_{P} (x (r)) d r} ((x^{i} {(s)}^{T} Q_{P 1} x^{i} (s) + \frac{{(u_{1}^{i})}^{2}}{2} + \frac{{(u_{2}^{i})}^{2}}{2}) \\ + \int_{Ω} λ_{P} (\partial_{t} ψ_{P} (s, x^{i}) + \partial_{x} ψ_{P} (s, x^{i}) \cdot (b^{i} + b_{u 1}^{i} u_{1}^{i} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1}^{i} + b_{v_{2}}^{i} v_{2}^{i})) d (μ^{P} (s) - μ^{E} (s)) \\ + exp \{\frac{1}{{(X_{P i} - {\bar{X}}_{P i})}^{T} Q_{P 3} (X_{P i} - {\bar{X}}_{P i})}\} (γ_{P 1} u_{1}^{i} + γ_{P 1} u_{2}^{i}) \\ + {(X_{P i} - {\bar{X}}_{E i})}^{T} Q_{P 4} (X_{P i} - {\bar{X}}_{E i}) (γ_{E 1} u_{1}^{i} + γ_{E 1} u_{2}^{i}) + p_{P i} (T)) \\ + ({(x^{i})}^{T} \frac{\partial S_{1}^{P i}}{\partial s} x^{i} + \frac{\partial S_{2}^{P i}}{\partial s} x^{i} + \frac{\partial S_{3}^{P i}}{\partial s}) \\ + {(b^{i} + b_{u 1}^{i} u_{1} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1} + b_{v_{2}}^{i} v_{2})}^{T} (S_{1}^{P i} x^{i} + S_{2}^{P i}) \\ + \frac{1}{2} Tr (σ^{i} {(σ^{i})}^{T}) S_{1}^{P i}] \end{matrix}

(52)

Next, we compute the time derivatives of the parameters

S_{1}^{P i}

,

S_{2}^{P i}

, and

S_{3}^{P i}

:

\frac{\partial S_{1}^{P i}}{\partial s} = - e^{- \int_{t}^{T} c_{P} (x (s)) d s} (Q_{P 1})

(53)

\frac{\partial S_{2}^{P i}}{\partial s} = - {(b^{i} + b_{u 1}^{i} u_{1}^{i} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1}^{i} + b_{v_{2}}^{i} v_{2}^{i})}^{T} (S_{1}^{P i})

(54)

\begin{matrix} \frac{\partial S_{3}^{P i}}{\partial s} & = - e^{- \int_{t}^{T} c_{P} (x (s)) d s} [\frac{{(u_{1}^{i})}^{2}}{2} + \frac{{(u_{2}^{i})}^{2}}{2} \\ + \int_{Ω} λ_{P} (\partial_{t} ψ_{P} (s, x^{i}) + \partial_{x} ψ_{P} (s, x^{i}) \cdot (b^{i} + b_{u 1}^{i} u_{1}^{i} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1}^{i} + b_{v_{2}}^{i} v_{2}^{i})) d (μ^{P} (s) - μ^{E} (s)) \\ + exp \{\frac{1}{{(X_{P i} - {\bar{X}}_{P i})}^{T} Q_{P 3} (X_{P i} - {\bar{X}}_{P i})}\} (γ_{P 1} u_{1}^{i} + γ_{P 2} u_{2}^{i}) \\ + {(X_{P i} - {\bar{X}}_{E i})}^{T} Q_{P 4} (X_{P i} - {\bar{X}}_{E i}) (γ_{E 1} u_{1}^{i} + γ_{E 2} u_{2}^{i}) + p_{P i} (s) \\ + {(b^{i} + b_{u 1}^{i} u_{1} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1} + b_{v_{2}}^{i} v_{2})}^{T} (S_{2}^{P i})] \\ + \frac{1}{2} Tr (σ^{i} {(σ^{i})}^{T}) S_{1}^{P i} \end{matrix}

(55)

We assume the boundary conditions at time T are

S_{1}^{P i} (T) = 0, S_{2}^{P i} (T) = 0, S_{3}^{P i} (T) = 0 .

The optimal strategy for the pursuer can be derived as follows:

\begin{matrix} u_{1}^{i, *} & = - [\int_{Ω} λ_{P} (\partial_{x} ψ_{P} (s, x^{i}) \cdot b_{u 1}^{i}) d (μ^{P} (s) - μ^{E} (s)) \\ + exp \{\frac{1}{{(X_{P i} - {\bar{X}}_{P i})}^{T} Q_{P 3} (X_{P i} - {\bar{X}}_{P i})}\} (γ_{P 1}) \\ + {(X_{P i} - {\bar{X}}_{E i})}^{T} Q_{P 4} (X_{P i} - {\bar{X}}_{E i}) (γ_{E 1}) \\ + {(b_{u 1}^{i})}^{T} (S_{1}^{P i} x^{i} + S_{2}^{P i})] \end{matrix}

(56)

\begin{matrix} u_{2}^{i, *} & = - [\int_{Ω} λ_{P} (\partial_{x} ψ_{P} (s, x^{i}) \cdot b_{u 2}^{i}) d (μ^{P} (s) - μ^{E} (s)) \\ + exp \{\frac{1}{{(X_{P i} - {\bar{X}}_{P i})}^{T} Q_{P 3} (X_{P i} - {\bar{X}}_{P i})}\} (γ_{P 2}) \\ + {(X_{P i} - {\bar{X}}_{E i})}^{T} Q_{P 4} (X_{P i} - {\bar{X}}_{E i}) (γ_{E 2}) \\ + {(b_{u 2}^{i})}^{T} (S_{1}^{P i} x^{i} + S_{2}^{P i})] \end{matrix}

(57)

The optimal strategy for the evaders is derived as follows:

\begin{matrix} v_{1}^{i, *} & = [\int_{Ω} λ_{E} (\partial_{x} ψ_{E} (s, x^{i}) \cdot b_{v 1}^{i}) d (μ^{E} (s) - μ^{P} (s)) \\ + {(X_{E i} - {\bar{X}}_{E i})}^{T} Q_{E 3} (X_{E i} - {\bar{X}}_{E i}) (ϑ_{v 1}) \\ + {(X_{E i} - {\bar{X}}_{P i})}^{T} Q_{P 4} (X_{E i} - {\bar{X}}_{P i}) (ϑ_{v 1}) \\ + {(b_{v 1}^{i})}^{T} (S_{1}^{E i} x^{i} + S_{2}^{E i})] \end{matrix}

(58)

\begin{matrix} v_{2}^{i, *} & = - [\int_{Ω} λ_{E} (\partial_{x} ψ_{E} (s, x^{i}) \cdot b_{v 2}^{i}) d (μ^{E} (s) - μ^{P} (s)) \\ + {(X_{E i} - {\bar{X}}_{E i})}^{T} Q_{E 3} (X_{E i} - {\bar{X}}_{E i}) (ϑ_{v 2}) \\ + {(X_{E i} - {\bar{X}}_{P i})}^{T} Q_{P 4} (X_{E i} - {\bar{X}}_{P i}) (ϑ_{v 2}) \\ + {(b_{v 2}^{i})}^{T} (S_{1}^{E} x^{i} + S_{2}^{E})] \end{matrix}

(59)

5. $ϵ$ Nash Equilibrium

In this section, we first prove the uniqueness of the solution to the forward-backward stochastic differential equations using monotone operators, as well as the uniqueness of the distribution function. Then, we demonstrate the boundedness of the state and, based on the uniqueness of the solution and the boundedness of the state, we establish the

ϵ

Nash equilibrium.

Theorem 1.

Under the conditions of Assumptions 1 and 3, the solution to the forward-backward stochastic differential equations is unique, and for each player, the value function is unique.

Proof.

Suppose the state satisfies the uniqueness condition. We assume that the time-independent condition holds. Then, the backward stochastic differential equations for

S_{1}^{P i}

,

S_{2}^{P i}

, and

S_{3}^{P i}

have a unique solution: Define the linear operator

A

as follows:

A (\begin{matrix} S_{1}^{P i} \\ S_{2}^{P i} \\ S_{3}^{P i} \end{matrix}) = (\begin{matrix} - e^{- \int_{t}^{T} c_{P} (x (s)) d s} (Q_{P 1}) \\ - {(b^{i} + b_{u 1}^{i} u_{1} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1} + b_{v_{2}}^{i} v_{2})}^{T} S_{1}^{P} \\ \begin{matrix} - {(b^{i} + b_{u 1}^{i} u_{1} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1} + b_{v_{2}}^{i} v_{2})}^{T} S_{2}^{P} - \frac{1}{2} Tr (σ^{i} {(σ^{i})}^{T}) S_{1}^{P} \end{matrix} \end{matrix})

(60)

To obtain the unique solution for Equation (33), we only need to verify whether the monotonicity condition in [34] is satisfied. If there is no ambiguity,

〈 \cdot, \cdot 〉

represents the usual inner product in Euclidean space. Then, the following can be obtained:

\begin{matrix} 〈A (\begin{matrix} S_{1}^{P i} \\ S_{2}^{P i} \\ S_{3}^{P i} \end{matrix}), (\begin{matrix} S_{1}^{P i} \\ S_{2}^{P i} \\ S_{3}^{P i} \end{matrix})〉 & = {(\begin{matrix} S_{1}^{P i} & S_{2}^{P i} & S_{3}^{P i} \end{matrix})}^{T} \\ (\begin{matrix} - e^{- \int_{t}^{T} c_{P} (x (s)) d s} (Q_{P 1}) \\ - {(b^{i} + b_{u 1}^{i} u_{1} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1} + b_{v_{2}}^{i} v_{2})}^{T} S_{1}^{P} \\ - {(b^{i} + b_{u 1}^{i} u_{1} + b_{u 2}^{i} u_{2}^{i} + b_{v 1}^{i} v_{1} + b_{v_{2}}^{i} v_{2})}^{T} S_{2}^{P} \\ - \frac{1}{2} Tr (σ^{i} {(σ^{i})}^{T}) S_{1}^{P} \end{matrix}) \end{matrix}

(61)

Applying Young’s inequality, we obtain

\leq - e^{- \int_{t}^{T} c_{P} (x (s))} ∥Q_{P 1}∥ {∥S^{P i}∥}^{2} - 2 ∥B^{i}∥ {∥S^{P i}∥}^{2} - \frac{1}{2} Tr (σ^{i} {(σ^{i})}^{T}) {∥S^{P i}∥}^{2}

\leq (- e^{- \int_{t}^{T} c_{P} (x (s))} ∥Q_{P 1}∥ - 2 ∥B^{i}∥ - \frac{1}{2} Tr (σ^{i} {(σ^{i})}^{T})) {∥S^{P i}∥}^{2} < 0

where:

S^{P i} = min ∥S_{i}^{P i}∥, i = 1, 2, 3 .

Since

(e^{- \int_{t}^{T} c_{P} (x (s))} ∥Q_{P 1}∥ + 2 ∥B^{i}∥ + \frac{1}{2} Tr (σ^{i} {(σ^{i})}^{T})) > 0

is satisfied, the monotonicity condition in this theorem holds, and therefore the value function is unique, which implies that the corresponding optimal strategy is also unique. The proof for the evaders follows a similar process and is omitted here. □

Theorem 2.

Under the conditions set by Assumptions 1 and 2, the distribution processes

m^{P}

and

m^{E}

satisfy the uniqueness property, meaning that they each have a unique solution.

Proof.

We will prove the uniqueness of the distribution process by assuming the contrary; that is, there are two different solutions

m^{P} (s), m^{P} (T)

and

m^{P} {(s)}^{'}, m^{P} {(T)}^{'}

for the same initial conditions. Let

m^{P} (s)

and

m^{P} {(s)}^{'}

represent two different solutions. We consider the following equations:

m^{P} (s) = m_{s}^{P} (X_{s}) and m^{P} {(s)}^{'} = m_{s}^{P} (X_{s}^{'})

(62)

where

X, x

and

X^{'}, x^{'}

are the state processes under the distributions

m^{P}

and

m^{P^{'}}

, respectively. The optimal control strategies

u_{1}^{i, *}, u_{2}^{i, *}

correspond to the strategies under

m^{P}

, while the optimal control strategies

{(u_{1}^{i, *})}^{'}, {(u_{2}^{i, *})}^{'}

correspond to the strategies under

m^{P^{'}}

. Let

J_{P i} (u_{1}^{i}, u_{2}^{i}, m^{P})

be the cost functional associated with the control problem:

J_{P i} (u_{1}^{i}, u_{2}^{i}, m^{P}) = inf_{u_{i} \in π_{u}} E [e^{- \int_{t}^{T} c (x (r)) d r} g_{P i} (x^{i} (T), m^{P} (T)) + \int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x^{i}, X_{P i}, μ^{P} (s)) d s]

(63)

where

g_{P i} (x^{i} (T), m^{P})

is the terminal cost function, and

h_{P i} (x^{i}, X_{P i}, m^{P} (s))

is the running cost function. We now compare the cost functionals for both distributions m and

m^{'}

. Since we assume

m \neq m^{'}

, we have

J_{P i} (u_{1}^{i}, u_{2}^{i}, μ^{P}) < J_{P i} ({(u_{1}^{i})}^{'}, {(u_{2}^{i})}^{'}, μ^{P})

and

J_{P i} ({(u_{1}^{i})}^{'}, {(u_{2}^{i})}^{'}, μ^{P^{'}}) < J (u_{1}^{i}, u_{2}^{i}, μ^{P^{'}})

This means that the control strategies corresponding to

{(u_{1}^{i})}^{'}, {(u_{2}^{i})}^{'}

and

u_{1}^{i}, u_{2}^{i}

give different cost values. By adding the two inequalities, we obtain

\begin{matrix} E [e^{- \int_{t}^{T} c (x (r)) d r} g_{P i} (x^{i} (T), μ^{P} (T)) + \int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x^{i}, X_{P i}, μ^{P}) d s] \\ < E [e^{- \int_{t}^{T} c (x (r)) d r} g_{P i} (x^{i^{'}} (T), μ^{P} (T)) + \int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x^{i^{'}}, X_{P i}^{'}, μ^{P}) d s], \\ E [e^{- \int_{t}^{T} c (x (r)) d r} g_{P i} (x^{i^{'}} (T), μ^{P^{'}} (T)) + \int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x^{i^{'}}, X_{P i}^{'}, μ^{P^{'}}) d s] \\ < E [e^{- \int_{t}^{T} c (x (r)) d r} g_{P i} (x^{i} (T), μ^{P^{'}} (T)) + \int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x^{i}, X_{P i}, μ^{P^{'}}) d s] . \end{matrix}

(64)

By subtracting the two inequalities, we obtain

\begin{matrix} E [e^{- \int_{t}^{T} c (x (r)) d r} g_{P i} (x^{i} (T), μ^{P} (T)) + \int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x^{i}, X_{P i}, μ^{P}) d s] \\ - E [e^{- \int_{t}^{T} c (x (r)) d r} g_{P i} (x^{i} (T), μ^{P^{'}} (T)) + \int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x^{i}, X_{P i}, μ^{P^{'}}) d s] \\ - (E [e^{- \int_{t}^{T} c (x (r)) d r} g_{P i} (x^{i^{'}} (T), μ^{P} (T)) + \int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x^{i^{'}}, X_{P i}^{'}, μ^{P}) d s] \\ - E [e^{- \int_{t}^{T} c (x (r)) d r} g_{P i} (x^{i^{'}} (T), μ^{P^{'}} (T)) + \int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x^{i^{'}}, X_{P i}^{'}, μ^{P^{'}}) d s]) \\ < 0 \end{matrix}

(65)

This implies that the difference in terminal costs must be negative. Thus, we have

\int_{Ω} [g_{P i} (x^{i} (T), μ^{P} (T)) - g_{P i} (x^{i} (T), μ^{P^{'}} (T))] d μ_{T}^{P} - \int_{Ω} g_{P i} (x^{i} (T), μ^{P} (T)) - g_{P i} (x^{i} (T), μ^{P^{'}} (T)) d μ_{T}^{P^{'}}

Then, it can be obtained that

\begin{matrix} \int_{Ω} [\int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x^{i}, X_{P i}, μ^{P}) d s - \int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x^{i}, X_{P i}, μ^{P^{'}}) d s] d μ_{T}^{P} \\ - \int_{Ω} [\int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x^{i}, X_{P i}, μ^{P}) d s - \int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x^{i}, X_{P i}, μ^{P^{'}}) d s] d μ_{T}^{P^{'}} \end{matrix}

This leads to the following contradiction:

0 \leq \int_{Ω} [g_{P i} (x^{i}, μ^{P} (T)) - g_{P i} (x^{i}, μ^{P^{'}})] d (μ_{T}^{P} - μ_{T}^{P^{'}}) < 0

(66)

0 \leq \int_{Ω} [\int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x^{i}, X_{P i}, μ^{P}) d s - \int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x^{i}, X_{P i}, μ^{P^{'}}) d s] d (μ_{T}^{P} - μ_{T}^{P^{'}}) < 0

(67)

The inequality above cannot hold, because the integral over a non-negative measure must be non-negative. Therefore, our assumption that

μ^{P} \neq μ^{P^{'}}

is false. The distribution process is unique. □

Lemma 1.

Suppose the initial assumption about the distribution state holds, then there exists a constant

C > 0

such that for all

r \in [t, T]

and

i \in {1, \dots, N}

, the following holds:

E ∥x^{i} (r)∥ \leq C .

(68)

Proof.

Since

x^{i} (r)

is absolutely continuous in the state Equation (8), we can first express it as

E ∥x^{i} (r)∥ = E ∥x_{t}^{i} + \int_{t}^{r} {\dot{x}}^{i} (s) d s∥,

(69)

where

{\dot{x}}^{i} (s)

is the derivative of the state evolution equation. Next, applying the triangle inequality, we obtain

E ∥x^{i} (r)∥ \leq E ∥x_{t}^{i}∥ + E ∥\int_{t}^{r} B^{i} (s) d s∥ .

(70)

We now expand

B^{i} (s)

according to the known system model:

B^{i} (s) = b^{i} + b_{u 1}^{i} u_{1}^{i} (s) + b_{u 2}^{i} u_{2}^{i} (s) + b_{v 1}^{i} v_{1}^{i} (s) + b_{v 2}^{i} v_{2}^{i} (s),

where

u_{1}^{i}

,

u_{2}^{i}

,

v_{1}^{i}

, and

v_{2}^{i}

are the control inputs. Thus, we have

E ∥x^{i} (r)∥ \leq E ∥x_{t}^{i}∥ + E ∥\int_{t}^{r} (b^{i} + b_{u 1}^{i} u_{1}^{i} (s) + b_{u 2}^{i} u_{2}^{i} (s) + b_{v 1}^{i} v_{1}^{i} (s) + b_{v 2}^{i} v_{2}^{i} (s)) d s∥ .

(71)

Using the linearity of expectations and the subadditivity of the norm, we estimate each term as the maximum constant value:

E ∥x^{i} (r)∥ \leq C_{x t} + E \int_{t}^{r} (∥ b^{i} ∥ + ∥ b_{u 1}^{i} ∥ ∥ u_{1}^{i} (s) ∥ + ∥ b_{u 2}^{i} ∥ ∥ u_{2}^{i} (s) ∥ + ∥ b_{v 1}^{i} ∥ ∥ v_{1}^{i} (s) ∥ + ∥ b_{v 2}^{i} ∥ ∥ v_{2}^{i} (s) ∥) d s .

(72)

In the above estimate, we introduce a constant

C_{π}

, which represents the upper bound on the control inputs:

C_{π} = max \{∥ b_{u 1}^{i} ∥, ∥ b_{u 2}^{i} ∥, ∥ b_{v 1}^{i} ∥, ∥ b_{v 2}^{i} ∥\} .

And we let

π = max {∥ u_{1}^{i} ∥, ∥ u_{2}^{i} ∥, ∥ v_{1}^{i} ∥, ∥ v_{2}^{i} ∥}

. We also assume that

∥ b^{i} ∥ < A ∥ x^{i} ∥

, where A is a positive definite matrix, and thus we obtain

E ∥x^{i} (r)∥ \leq C_{x t} + E \int_{t}^{r} (A ∥ x^{i} ∥ + 4 C_{π} π) d s .

(73)

By applying Gronwall’s inequality, we obtain the upper bound:

\begin{matrix} E ∥x^{i} (r)∥ & \leq C_{x t} + 4 (T - t) C_{π} π + E \int_{t}^{r} (C_{x t} + 4 C_{π} π s) ∥ A ∥ e^{T ∥ A ∥} d s . \\ \leq C_{x t} + 4 T C_{π} π + T (C_{x t} + 4 C_{π} π T) ∥ A ∥ e^{T ∥ A ∥} . \end{matrix}

(74)

Finally, we can conclude that there exists a constant C, such that for all

r \in [t, T]

, the following holds:

E ∥x^{i} (r)∥ \leq C .

□

Theorem 3.

Consider a scenario where all players adopt optimal distributions. Although individual distributions may deviate from the optimal distribution, as the number of players increases, the difference between individual states and cost functions becomes smaller and approaches zero. Specifically, we have

lim_{n \to \infty} E {∥\frac{1}{n} \sum_{i = 1}^{n} x^{i} (s) - \bar{x} (s)∥}^{2} \leq C_{a}

(75)

E ∥J^{P i} - {\bar{J}}^{P}∥ \leq C_{J}

(76)

where

C_{J}

and

C_{a}

are constants that do not depend on n.

Proof.

We begin by considering the mean state equation:

d \bar{x} = (b^{i} + b_{u 1} {\bar{u}}_{1} + b_{u 2} {\bar{u}}_{2} + b_{v 1} {\bar{v}}_{1} + b_{v 2} {\bar{v}}_{2}) d s + \bar{σ} d W

(77)

Next, we need to prove the following inequality for the state difference:

\begin{matrix} ∥x^{i} (s) - \bar{x} (s)∥ & \leq e^{A t} (x_{0}^{i} - {\bar{x}}_{0}) \\ + \int_{t}^{T} e^{A (T - s)} (b_{u 1} (u_{1}^{i} - {\bar{u}}_{1}) + b_{u 2} (u_{2}^{i} - {\bar{u}}_{2}) + b_{v 1} (v_{1}^{i} - {\bar{v}}_{1}) + b_{v 2} (v_{2}^{i} - {\bar{v}}_{2})) d s \end{matrix}

(78)

Using the inequality,

{∥\sum_{i = 1}^{n} a_{i}∥}^{2} = n \sum_{i = 1}^{n} {∥a_{i}∥}^{2}

(79)

Now, we can expand this into several terms for each input:

E {∥x^{i} (s) - \bar{x} (s)∥}^{2} \leq 5 E {∥e^{A t} (x_{0}^{i} - {\bar{x}}_{0})∥}^{2}

+ 5 E {∥\int_{t}^{T} e^{A (T - s)} b_{u 1} (u_{1}^{i} - {\bar{u}}_{1}) d s∥}^{2}

+ 5 E {∥\int_{t}^{T} e^{A (T - s)} b_{u 2} (u_{2}^{i} - {\bar{u}}_{2}) d s∥}^{2}

+ 5 E {∥\int_{t}^{T} e^{A (T - s)} b_{v 1} (v_{1}^{i} - {\bar{v}}_{1}) d s∥}^{2}

+ 5 E {∥\int_{t}^{T} e^{A (T - s)} b_{v 2} (v_{2}^{i} - {\bar{v}}_{2}) d s∥}^{2}

Using the Cauchy–Schwarz and Jensen inequalities, we can now bound each term:

E {∥x^{i} (s) - \bar{x} (s)∥}^{2} \leq 5 {∥e^{A t}∥}^{2} E {∥(x_{0}^{i} - {\bar{x}}_{0})∥}^{2}

+ 5 \int_{t}^{T} {∥e^{A (T - s)}∥}^{2} E {∥b_{u 1} (u_{1}^{i} - {\bar{u}}_{1})∥}^{2} d s

+ 5 \int_{t}^{T} {∥e^{A (T - s)}∥}^{2} E {∥b_{u 2} (u_{2}^{i} - {\bar{u}}_{2})∥}^{2} d s

+ 5 \int_{t}^{T} {∥e^{A (T - s)}∥}^{2} E {∥b_{v 1} (v_{1}^{i} - {\bar{v}}_{1})∥}^{2} d s

+ 5 \int_{t}^{T} {∥e^{A (T - s)}∥}^{2} E {∥b_{v 2} (v_{2}^{i} - {\bar{v}}_{2})∥}^{2} d s

Next, we calculate the convergence of each term. For the first term, due to the boundedness of the state, we can assume

E {∥(x_{0}^{i} - {\bar{x}}_{0})∥}^{2} \leq C_{x}

where

C_{x}

is a bound derived from the above formulas. The remaining terms are related to the control inputs. We assume that there exists an upper bound for the control inputs. Therefore, we can conclude

E {∥x^{i} (s) - \bar{x} (s)∥}^{2} \leq C (O)

As

N \to \infty

, we can obtain

lim_{N \to \infty} E {∥\frac{1}{N} \sum_{i = 1}^{N} x^{i} (s) - \bar{x} (s)∥}^{2} \leq \frac{1}{N} \sum_{i = 1}^{N} E {∥\frac{x^{i} (s) - \bar{x} (s)}{\sqrt{N}}∥}^{2} \leq C_{a}

\begin{matrix} J_{P i} (x^{i}, s; μ^{P}, μ^{E}) & = inf_{u_{i} \in π_{u}} E [e^{- \int_{t}^{T} c (x (r)) d r} g_{P i} (x (T), μ^{P} (T), μ^{E} (T)) \\ + \int_{t}^{T} e^{- \int_{t}^{s} c (x (r)) d r} h_{P i} (x, s, μ^{P} (s), μ^{E} (s)) d s] \end{matrix}

(80)

Next, we prove the second inequality. By using the Lipschitz assumption, we can compute

\begin{matrix} E |J^{P i} - {\bar{J}}^{P}| & \leq E [\int_{t}^{T} e^{- \int_{t}^{T} c (x (r)) d r} |h_{P i} (s, x^{i}, X^{i}, u_{1}^{i}, u_{2}^{i}, μ^{P} (s), μ^{E} (s)) - h_{P i} (s, \bar{x}, \bar{X}, {\bar{u}}_{1}, {\bar{u}}_{2}, μ^{P} (s), μ^{E} (s))| d s] \\ + E [e^{- \int_{t}^{T} c (x (r)) d r} |g_{P i} (x^{i} (T), μ^{P} (T), μ^{E} (T)) - g_{P i} (\bar{x} (T), μ^{P} (T), μ^{E} (T))|] \end{matrix}

\begin{matrix} E |J^{P i} - {\bar{J}}^{P}| & \leq E [\int_{t}^{T} e^{- \int_{t}^{T} c (x (r)) d r} |C_{x} (x^{i} - \bar{x}) + C_{X} (X^{i} - \bar{X}) + C_{u 1} (u_{1}^{i} - {\bar{u}}_{1}) + C_{u 2} (u_{2}^{i} - {\bar{u}}_{2}) \\ + C_{W} d_{W} (μ^{P}, μ^{E})| d s] \\ + E [e^{- \int_{t}^{T} c (x (r)) d r} |C_{x T} (x^{i} (T) - \bar{x} (T)) + C_{W T} d_{W} (μ^{P} (T) - μ^{E} (T))|] \end{matrix}

For the state and control input bounds, the choice of upper bounds is the same as for the state bounds. Next, we need to establish the upper bound for the distribution. Considering the distribution differences between the pursuer group

μ^{P} (t)

and the evaders group

μ^{E} (t)

, we can use the Wasserstein distance to measure the difference between the two groups [35]. Specifically, we have

E [d_{W} (μ^{P} (t), μ^{E} (t))] = {(inf_{γ \in Γ (μ^{P}, μ^{E})} \int_{R^{n} \times R^{n}} {∥ X_{P i} - X_{E i} ∥}^{2} d γ (X_{P i}, X_{E i}))}^{1 / 2} \leq C_{l}

where

X_{P i}

and

X_{E i}

represent the sample points from the pursuer group

μ^{P} (t)

and the evader group

μ^{E} (t)

.

γ \in Γ (μ^{P}, μ^{E})

is the set of joint distributions that have

μ^{P} (t)

and

μ^{E} (t)

as marginal distributions.

∥ X_{P i} - X_{E i} ∥^{2}

is the distance metric (usually Euclidean) between the sample points of the pursuer and evaders. This formula represents the expected Wasserstein distance, which measures the distributional difference between the two groups. By minimizing all possible joint distributions

γ

, we obtain the minimal distance metric. Therefore, for the mean, the difference in cost functions can be expressed as

E |J^{P i} - {\bar{J}}^{P}| \leq C_{J}

Similarly, the

ϵ

Nash equilibrium for the evaders is consistent with the Nash equilibrium for a finite number of individuals. The proof follows a similar approach and is omitted. □

6. Numerical Analysis

In this study, we assume that there are differences in the flight dynamics between the pursuers and the evaders. Specifically, the pursuers typically have higher speeds and a greater normal acceleration, which enables them to more quickly approach the target and perform the interception task. In contrast, the evaders have lower speeds and a smaller normal acceleration, which results in weaker maneuverability, thus affecting their evasion ability.

The differences in flight dynamics between the pursuers and evaders directly influence the success rate of the interception task. Under otherwise similar conditions, the higher speed and greater maneuverability of the pursuers allow them to track the evaders more effectively and reduce the interception time. On the other hand, the lower maneuverability of the evaders makes them more likely to be caught, unless specific strategies are employed to increase the probability of successful evasion. We will further analyze the specific impact of differences in flight dynamics on interception strategies and explore the effects of optimization of these differences on system performance in the subsequent sections.

In this study, the control strategies for both the pursuers and evaders are calculated using the optimal state feedback strategy. Despite the differences in flight dynamics, the strategies for both parties are designed to maximize their individual performance. However, the differences in flight dynamics lead to adjustments in the strategies, so that the pursuers can approach the target more quickly, while the evaders must employ strategies to delay capture, despite their limited maneuverability.

Example 1.

The initial conditions for this simulation are as follows:

Initial positions: The initial positions of both the pursuer and the evaders follow a normal distribution. The mean position of the pursuer is $[0 15, 000 5000]$ , with the covariance matrix:

$[\begin{matrix} 10, 000, 000 & 1 & 2 \\ 1 & 10, 000, 000 & 3 \\ 2 & 3 & 10, 000, 000 \end{matrix}]$

The mean position of the evaders are $[100, 000 14, 000 0]$ , with the covariance matrix:

$[\begin{matrix} 10, 000, 000 & 1 & 2 \\ 1 & 10, 000, 000 & 3 \\ 2 & 3 & 10, 000, 000 \end{matrix}]$
Initial velocities: The pursuers’ velocities are set to 4000 m/s, respectively, while the evaders’ velocity is 3500 m/s.
Acceleration limits: The maximum normal accelerations are 30 g for the pursuers and 20 g for the evaders (g = 9.81 m/s²).
Flight path angles: The initial flight path angles for both the pursuers are set to $[0^{\circ}, 0^{\circ}]$ (elevation and azimuth). The initial flight path angles for both the evaders are set to $[0^{\circ}, 180^{\circ}]$ (elevation and azimuth). The covariance matrix for these angles is given by:

$[\begin{matrix} 10 & 1 \\ 1 & 10 \end{matrix}]$
Time step: A time interval of 0.1 s was used for numerical integration.

In the first experiment, five pursuers and five evaders each employed the optimal strategy during the simulation. The simulation environment is summarized in Table 2. During the simulation, the Runge–-Kutta method was applied in each iteration.

The optimal trajectory is depicted in Figure 2, which illustrates the efficiency and effectiveness of the optimal feedback strategy. The strategy ensured that both the pursuers and the evaders followed paths that minimized or maximized their respective objectives, demonstrating the capability of the algorithm to handle dynamic environments.

To further analyze the performance, two-dimensional projections in the X–Y and X–Z planes are shown in Figure 3 and Figure 4, respectively. These projections offer a clearer view of the trajectories in different planes, revealing how the optimal state feedback strategy successfully adapted to the spatial constraints. The figures demonstrate the feasibility of the optimal state feedback strategy algorithm proposed in this paper, highlighting its ability to maintain precise control over the motion of the pursuers and evaders, even under varying conditions.

These results show that the strategy not only guided the system toward the desired outcomes, but also did so in a manner that was robust to changes in the initial conditions and perturbations. The efficiency of the strategy is reflected in the smooth and predictable nature of the trajectories, making it an effective solution for real-time applications in pursuit–evasion problems.

The 3D plot illustrates the overall trajectory of the pursuers and evaders. The paths taken by both entities followed smooth, well-defined curves, which highlights the efficacy of the optimal feedback strategy in controlling their movement across three-dimensional space. The optimal feedback strategy is clearly visible, as both the pursuers and the evaders followed well-defined paths. This demonstrates the robustness of the strategy in guiding the system to its desired outcome.

The XY projection highlights the movement of both pursuers and evaders along the horizontal plane. It visually demonstrates how the optimal strategy governed their trajectories, ensuring effective pursuit and evasion on the two-dimensional surface. This projection shows the horizontal movement of the pursuers and evaders, illustrating the optimality of the feedback strategy in maintaining a controlled pursuit−evasion scenario.

In the XZ projection, we can observe the vertical motion of both the pursuers and the evaders. This visualization confirms the optimal feedback strategy’s ability to manage the 3D movement dynamics and maintain the desired behavior in the vertical direction. This view highlights the vertical movement of both the pursuers and evaders, further validating the strategy’s ability to effectively handle 3D motion dynamics.

The acceleration variations of the pursuers are shown in Figure 5 and Figure 6, while the acceleration variations of the evaders are shown in Figure 7 and Figure 8. Since the simulation model was based on a missile interception scenario, we introduced a constraint on the normal acceleration to better reflect real-world conditions. The sign of the normal acceleration only indicates direction, with the acceleration starting from zero and eventually returning to zero. This represents the convergence of the line-of-sight angle, achieving a head-on interception. Here, the normal acceleration refers to the acceleration in the velocity frame, where the acceleration directions of the pursuers and evaders are opposite. However, due to the nature of the game theory problem, the optimal strategies of both parties evolve simultaneously.

The distance between the evaders and the pursuers is shown in Figure 9, where the final miss distances were

0.53 m

,

0.83 m

,

0.62 m

,

0.94 m

, and

0.78 m

. In missile interception problems, an interception is typically considered successful when the distance is less than one meter. In this paper, interception is defined by the condition

\dot{R} > 0

or

R < 1

, where R represents the distance between the pursuers and the evaders. The evaders is captured by the pursuers between

30 s

and

37 s

. Generally, the terminal guidance phase in missile interception lasts between 20 and 40 s, and the results from Experiment 1 fell within this range. The convergence of the final distance and acceleration led to the convergence of the system state. Given that the initial state was bounded and ultimately converged, this demonstrates the bounded nature of the system state. The successful interception of five evaders by the pursuers proves the feasibility of the proposed approach.

Example 2.

To verify the feasibility of the algorithm, we increased the number of pursuers and evaders. In Experiment 2, we used 10 pursuers and 10 evaders. Figure 10 shows a 3D interception diagram for the 10VS10 scenario. From this, it can be observed that the double-population game strategy based on MFG is effective in practical situations. This experiment demonstrated the feasibility of the proposed strategy in addressing the evasion problem of nearby populations.

Figure 11 shows the projection in the vertical direction, clearly presenting the positional relationships between the groups. Figure 12 shows the projection in the horizontal direction, further enhancing the understanding of the interactions between the two groups.

To further analyze the movement patterns of the pursuers and evaders, we present the acceleration changes of both the pursuers and evaders along the y-axis and z-axis. Figure 13 shows the acceleration changes of the pursuers along the y-axis, illustrating how the pursuers adjusted their vertical acceleration to respond to the movement of the evaders. Figure 14 displays the acceleration changes of the pursuers along the z-axis, providing further insight into their vertical adjustments.

Similarly, Figure 15 shows the acceleration changes of the evaders along the y-axis, revealing how the evaders modified their vertical acceleration to increase the likelihood of a successful escape. Figure 16 demonstrates the acceleration changes of the evaders along the z-axis, helping us understand how they adjusted their movement strategy to avoid capture by the pursuers. These figures provide important information regarding the acceleration changes of both the pursuers and evaders in three-dimensional space, serving as a basis for subsequent behavioral analysis and strategy optimization.

Figure 17 shows the variation in distance between 10 pursuers and 10 evaders, where the final miss distances were

0.78 m

,

0.92 m

,

0.94 m

,

0.65 m

,

0.56 m

,

0.79 m

,

0.84 m

,

0.62 m

,

0.98 m

, and

0.73 m

. As the experiment progressed, the relative positions of the pursuers and evaders continuously changed in three-dimensional space, resulting in fluctuations in the distance between them. By observing this figure, we can clearly see how the pursuers gradually shortened the distance to the evaders, while at certain moments, the evaders successfully increased the distance through effective evasion strategies. These data provided important insights for further analyzing the behavior patterns of chasing and evading, helping to optimize the dynamic interaction strategies between pursuers and evaders.

Based on the many-to-many strategy presented in this article, the experimental results demonstrate that the pursuers successfully captured the evaders, with the system’s state converging to zero and the mean value of the state being finite. At the termination time, the values of the value functions equaled zero. The proposed many-to-one and many-to-many pursuit algorithms effectively enabled the interception of multiple targets.

To test the robustness of the model, we conducted 100 independent experiments with varying initial conditions. The initial positions of the pursuers and evaders were randomly selected, with the constraint that the initial distance between them was always greater than 100,000 m. The initial angles were chosen within a predefined permissible range, to ensure consistency across all trials. The experimental parameters and results are presented in Table 3.

Numerical Stability: To further evaluate the stability of the solution, we varied key system parameters, including control input constraints, system dynamics, and the distribution functions of both the pursuers and evaders, using different distributions—such as exponential, chi-square, Weibull, and normal distributions—for comparison. The results showed that, despite variations in these parameters, the system consistently performed well, with the pursuers successfully intercepting the evaders in most trials. This indicates that the solution to the FBSDEs remained stable and robust, even when small perturbations were introduced into the system parameters.

To provide a more comprehensive view of the advantages and limitations of the MFG approach, we will include a detailed comparison with traditional methods in this paper. Specifically, we compare the performance of the MFG approach and traditional methods in solving the same missile interception problem, particularly in the case of a limited number of targets. Traditional methods typically use proportional guidance laws to intercept targets and assume a smaller number of targets, calculating the behavior of each target individually. In contrast, the MFG approach handles the behavior of large-scale target populations using the mean field assumption. This comparison will help us better demonstrate the advantages of the MFG approach in various scenarios, as well as its potential limitations when the number of targets is limited.

Traditional methods use proportional guidance laws to intercept targets. In comparison, the MFG approach proposed in this paper was tested with 100 targets in 100 experiments, and the results were compared in terms of average miss distance and interception success rate. The experimental results are shown in Table 4.

The primary reason for the failure of traditional methods is that, as the number of targets increases, there is mutual interference between the individuals, making the system dynamics complex and difficult to control. Additionally, the proximity of target groups can significantly affect the interception results, as interactions between targets may lead to misjudgments or deviations in the pursuers’ actions, thereby reducing the success rate of interception. In this case, the density and relative positions of the target group become key factors influencing interception performance.

To overcome these issues, this paper adopts the Mean Field Game (MFG) strategy, which effectively simplifies the behavior model of large-scale target populations using the mean field assumption. Through this approach, we are able to reduce interference between individuals, particularly mitigating the negative impact of groups in proximity on the interception results. The experimental results show that the MFG strategy significantly improved the interception success rate, while reducing the miss distance, demonstrating the effectiveness and advantages of this method in complex environments.

7. Conclusions

This paper used the Mean Field Game approach to deeply explore an optimal control strategy for solving a multi-missile interception problem in three-dimensional space. The study showed that the mean field game model not only effectively overcame the computational explosion problems caused by high dimensions and large-scale systems in traditional methods, but also provided an optimized solution suitable for large-scale, multi-target environments. By introducing the Hamilton–Jacobi–Bellman equation and proving the uniqueness of its solution, this paper provides strong theoretical support for solving complex multi-agent system control problems.

Firstly, this paper derived the motion models of the pursuers and evaders, and described the game behaviors in the multi-missile interception problem through control constraints and cost function design. Using the framework of mean field games, it was shown that the participants could find approximately optimal strategies based on the given population statistics, without being aware of the state of each individual. Furthermore, by leveraging the concept of

ϵ

-Nash equilibrium, this paper further demonstrated that, within the tolerance range, the participants’ strategies could still minimize costs, ensuring the stability and feasibility of the system.

Finally, the research findings in this paper have broad application prospects, especially in high-risk multi-agent control problems such as missile defense systems. Through the effective application of mean field games, the study not only provides a theoretical basis for the design of practical missile interception systems, but also offers scalable solutions for similar multi-agent systems. The successful application of this theoretical framework demonstrates the immense potential of mean field games in complex multi-target environments. Future research could further explore more efficient solution methods and optimization strategies based on this foundation.

Author Contributions

Conceptualization, Y.B. and D.Z.; methodology, Y.B.; software, Y.B.; validation, Y.B., D.Z. and Z.H.; formal analysis, Y.B.; investigation, D.Z.; resources, Z.H.; data curation, Y.B.; writing—original draft preparation, Y.B.; writing—review and editing, D.Z. and Z.H.; visualization, Y.B.; supervision, D.Z.; project administration, D.Z.; funding acquisition, Z.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 61773142.

Data Availability Statement

We have added a Data Availability Statement to indicate that no new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Huang, M.; Caines, P.; Malhame, R. Large-population cost-coupled LQG problems: Generalizations to non-uniform individuals. In Proceedings of the 2004 43rd IEEE Conference on Decision and Control (CDC), Nassau, Bahamas, 14–17 December 2004; Volume 4, pp. 3453–3458. [Google Scholar] [CrossRef]
Huang, M.; Caines, P.E.; Malhame, R.P. Large-Population Cost-Coupled LQG Problems With Nonuniform Agents: Individual-Mass Behavior and Decentralized ε-Nash Equilibria. IEEE Trans. Autom. Control. 2007, 52, 1560–1571. [Google Scholar] [CrossRef]
Cecchin, A.; Pra, P.D.; Fischer, M.; Pelino, G. On the Convergence Problem in Mean Field Games: A Two State Model without Uniqueness. SIAM J. Control. Optim. 2019, 57, 2443–2466. [Google Scholar] [CrossRef]
Bensoussan, A.; Frehse, J.; Yam, P. Mean Field Games and Mean Field Type Control Theory; Springer: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
Fathi, M.; Matarasso, A. On the existence of solutions to the Hamilton-Jacobi-Bellman equation for optimal control problems. Appl. Math. Optim. 2007, 56, 229–252. [Google Scholar]
Fleming, W.H.; Soner, H.M. Controlled Markov Processes and Viscosity Solutions; Springer: Berlin/Heidelberg, Germany, 2006. [Google Scholar]
Angel, L.; Sebastian, J.; Saltaren, R.; Aracil, R. RoboTenis System Part II: Dynamics and Control. In Proceedings of the 44th IEEE Conference on Decision and Control, Seville, Spain, 12–15 December 2005; pp. 2030–2034. [Google Scholar] [CrossRef]
Osborne, Y.A.P.; Smears, I. Analysis and Numerical Approximation of Stationary Second-Order Mean Field Game Partial Differential Inclusions. SIAM J. Numer. Anal. 2024, 62, 29. [Google Scholar]
Lasry, J.M.; Lions, P.L. Mean field games. Jpn. J. Math. 2007, 2, 229–257. [Google Scholar] [CrossRef]
Ho, E.; Rajagopalan, A.; Skvortsov, A.; Arulampalam, S.; Piraveenan, M. Game Theory in Defence Applications: A Review. Sensors 2022, 22, 1032. [Google Scholar] [CrossRef]
Cesari, L. Optimization: Theory and Applications; Springer: Berlin/Heidelberg, Germany, 1983. [Google Scholar]
Salonen, H. On the existence of Nash equilibria in large games. Int. J. Game Theory 2010, 39, 351–357. [Google Scholar]
Dayanikli, G.; Laurière, M. Multi-population Mean Field Games with Multiple Major Players: Application to Carbon Emission Regulations. In Proceedings of the 2024 American Control Conference (ACC), Toronto, ON, Canada, 8–9 July 2024; pp. 5075–5081. [Google Scholar] [CrossRef]
Nash, J.F. Equilibrium points in n-person games. Proc. Natl. Acad. Sci. USA 1950, 36, 48–49. [Google Scholar] [CrossRef]
Turetsky, V.; Shinar, J. Missile guidance laws based on pursuit–evasion game formulations. Automatica 2003, 39, 607–618. [Google Scholar] [CrossRef]
Sun, Z.; Yang, J. Multi-missile interception for multi-targets: Dynamic situation assessment, target allocation and cooperative interception in groups. J. Frankl. Inst. 2022, 359, 7745–7770. [Google Scholar] [CrossRef]
Carmona, R.; Delarue, F. Probability Theory and Stochastic Modelling: Probabilistic Theory of Mean Field Games with Applications I; Springer International Publishing: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
Toumi, N.; Malhamé, R.; Ny, J.L. A mean field game approach for a class of linear quadratic discrete choice problems with congestion avoidance. Automatica 2024, 160, 13. [Google Scholar] [CrossRef]
Song, S.H.; Ha, I.J. A Lyapunov-like approach to performance analysis of 3-dimensional pure PNG laws. IEEE Trans. Aerosp. Electron. Syst. 1994, 30, 238–248. [Google Scholar] [CrossRef]
Cannarsa, P.; Capuani, R.; Cardaliaguet, P. Mean field games with state constraints: From mild to pointwise solutions of the PDE system. Calc. Var. Partial. Differ. Equations 2021, 60, 108. [Google Scholar] [CrossRef]
Song, J.; Zhang, X.; Wang, L. Impact Angle Constrained Guidance against Non-maneuvering Targets. AIAA J. Guid. Control. Dyn. 2023, 46, 1556–1565. [Google Scholar] [CrossRef]
Li, B.; Zhang, W.; Sun, Z. Guidance for Intercepting High-Speed Maneuvering Targets. J. Guid. Control. Dyn. 2021, 44, 2282–2293. [Google Scholar]
Hou, M.; Liang, X.; Duan, G. Adaptive block dynamic surface control for integrated missile guidance and autopilot. Chin. J. Aeronaut. 2013, 26, 741–750. [Google Scholar] [CrossRef]
Oza, H.B.; Padhi, R. Impact-Angle-Constrained Suboptimal Model Predictive Static Programming Guidance of Air-to-Ground Missiles. J. Guid. Control Dyn. 2012, 35, 153–164. [Google Scholar] [CrossRef]
Souganidis, P.E.; Zariphopoulou, T. Mean field games with unbounded controlled common noise in portfolio management with relative performance criteria. Math. Financ. Econ. 2024, 18, 28. [Google Scholar]
Ran, M.; Wang, Q.; Hou, D.; Dong, C. Backstepping design of missile guidance and control based on adaptive fuzzy sliding mode control. Chin. J. Aeronaut. 2014, 27, 634–642. [Google Scholar] [CrossRef]
Firoozi, D.; Jaimungal, S.; Caines, P.E. Convex analysis for LQG systems with applications to major–minor LQG mean-field game systems. Syst. Control. Lett. 2020, 142, 104725. [Google Scholar] [CrossRef]
Huang, M. Linear-quadratic mean field games with a major player: Nash certainty equivalence versus master equations. Commun. Inf. Syst. 2021, 21, 213–242. [Google Scholar] [CrossRef]
Firoozi, D.; Jaimungal, S.; Caines, P.E. Exploratory LQG mean field games with entropy regularization. J. Optim. Theory Appl. 2020, 184, 211–237. [Google Scholar] [CrossRef]
Engwerds, J.C.; van den Broek, W.A.; Schumacher, J.M. Feedback Nash equilibria in uncertain infinite time horizon differential games. In Proceedings of the 14th International Symposium of Mathematical Theory of Networks and Systems, Perpignan, France, 19–23 June 2000; pp. 1–6. [Google Scholar]
Carlini, E.; Silva, F.J.; Zorkot, A. A Lagrange-Galerkin scheme for first order mean field games systems. arXiv 2023, arXiv:2303.14941. [Google Scholar] [CrossRef]
Hintermüller, M.; Surowiec, T.M.; Theiß, M. On a Differential Generalized Nash Equilibrium Problem with Mean Field Interaction. SIAM J. Optim. 2024, 33, 1587–1620. [Google Scholar] [CrossRef]
Li, M.; Li, N.; Wu, Z. Linear–quadratic mean-field game for stochastic systems with partial observation. Automatica 2025, 171, 111821. [Google Scholar] [CrossRef]
Hu, Y.; Peng, S. Solution of forward-backward stochastic differential equations. Probab. Theory Relat. Fields 1995, 103, 273–283. [Google Scholar] [CrossRef]
Bensoussan, A.; Cass, T.; Chau, M.H.M.; Yam, S.C.P. Mean Field Games with Parametrized Followers. IEEE Trans. Autom. Control. 2019, 65, 12–27. [Google Scholar]

Figure 1. Model of movement in 3D space.

Figure 2. Three−dimensional trajectory of pursuers and evaders.

Figure 3. XY projection of the trajectories.

Figure 4. XZ projection of the trajectories.

Figure 5. Pursuer’s acceleration along the y−axis.

Figure 6. Pursuer’s acceleration along the z−axis.

Figure 7. Evader’s acceleration along the y−axis.

Figure 8. Evader’s acceleration along the z−axis.

Figure 9. Variation in the distance.

Figure 10. Three−dimensional interception diagram for 10VS10.

Figure 11. XY projection in the vertical direction.

Figure 12. XZ projection in the horizontal direction.

Figure 13. Pursuer’s y−axis acceleration.

Figure 14. Pursuer’s z−axis acceleration.

Figure 15. Evader’s y−axis acceleration.

Figure 16. Evader’s z−axis acceleration.

Figure 17. Distance variation between 10 pursuers and 10 evaders.

Table 1. Table of symbols and coordinate systems.

Symbol	Description	Coordinate System
$(X_{I}, Y_{I}, Z_{I})$	Inertial reference coordinate system	Inertial
$(X_{L}, Y_{L}, Z_{L})$	Line-of-sight (LOS) coordinate system	LOS
$(X_{E}, Y_{E}, Z_{E})$	Velocity coordinate system of the i-th pursuer	Pursuer
$v_{E i}$	Velocity of the i-th evader	Evader
$v_{P i}$	Velocity of the i-th pursuer	Pursuer
$A_{P i}$	Acceleration of the i-th pursuer	Pursuer
$A_{E i}$	Acceleration of the i-th evader	Evader
$γ_{P i}$	Angle between the acceleration of the i-th pursuer and axis $Y_{P i}$	Pursuer
$γ_{E i}$	Angle between the acceleration of the i-th evader and axis $Y_{E}$	Evader
$R_{P i}$	Distance between the i-th pursuer and the evader	Spatial
$θ_{L i}, φ_{L i}$	LOS angles between the evader and the i-th pursuer relative to the inertial reference coordinate system	LOS
$θ_{P i}, φ_{P i}$	Elevation and azimuth angles of $v_{P i}$ relative to the LOS coordinate system from pointing toward E	Pursuer
$θ_{E i}, φ_{E i}$	Elevation and azimuth angles of $v_{E i}$ relative to the LOS coordinate system from $E_{i}$ pointing toward $P_{i}$	Evader
$A_{z P i}, A_{y P i}$	Projections of the pursuer’s normal acceleration on the $Z_{P i}$ and $Y_{P i}$ axes in the velocity coordinate system	Pursuer
$A_{z E i}, A_{y E i}$	Projections of the evader’s normal acceleration on the $Z_{E i}$ and $Y_{E i}$ axes in the velocity coordinate system	Evader

Table 2. Experiment environment.

Item	Environment
Development language	Python
Library	Numpy
Disk capacity	2 T
RAM	32 G
CPU	i7 2.2 GHZ
OS	Ubuntu 16.04

Table 3. Experimental setup and parameters.

Parameter	Value
Initial distance	100,000
Initial number of evaders	50
Initial number of pursuers	50
Evader initial elevation angle	$[- 20^{\circ}, 20^{\circ}]$ (elevation), $[160^{\circ}, 200^{\circ}]$ (azimuth)
pursuer initial azimuth angle	$[- 20^{\circ}, 20^{\circ}]$ (elevation), $[- 20^{\circ}, 20^{\circ}]$ (azimuth)
Iteration time	0.1 s–0.5 s
Maximum normal acceleration	20–40 g
Number of experiments	100 experiments
Exponential distribution	98 successful, 2 failed
Chi-square distribution	95 successful, 5 failed
Weibull distribution	97 successful, 3 failed
Normal distribution	99 successful, 1 failed

Table 4. Experimental setup and results.

Metric	Proportional Guidance Law	Mean Field Game
Initial Number of Evaders	100	100
Initial Number of Pursuers	100	100
Number of Experiments	100	100
Miss Distance	0.93 m	0.82 m
Interception Success Rate	86 successful	97 successful

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Bai, Y.; Zhou, D.; He, Z. Optimal Pursuit Strategies in Missile Interception: Mean Field Game Approach. Aerospace 2025, 12, 302. https://doi.org/10.3390/aerospace12040302

AMA Style

Bai Y, Zhou D, He Z. Optimal Pursuit Strategies in Missile Interception: Mean Field Game Approach. Aerospace. 2025; 12(4):302. https://doi.org/10.3390/aerospace12040302

Chicago/Turabian Style

Bai, Yu, Di Zhou, and Zhen He. 2025. "Optimal Pursuit Strategies in Missile Interception: Mean Field Game Approach" Aerospace 12, no. 4: 302. https://doi.org/10.3390/aerospace12040302

APA Style

Bai, Y., Zhou, D., & He, Z. (2025). Optimal Pursuit Strategies in Missile Interception: Mean Field Game Approach. Aerospace, 12(4), 302. https://doi.org/10.3390/aerospace12040302

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Optimal Pursuit Strategies in Missile Interception: Mean Field Game Approach

Abstract

1. Introduction

2. Problem Setting

2.1. Notation

2.2. Problem Formulation

2.3. The $ϵ$ Nash Equilibrium of Two-Group Games

3. The Distribution Functions for the Pursuers and Evaders

4. The Optimal Feedback Strategies

5. $ϵ$ Nash Equilibrium

6. Numerical Analysis

7. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

Article Menu

Optimal Pursuit Strategies in Missile Interception: Mean Field Game Approach

Abstract

1. Introduction

2. Problem Setting

2.1. Notation

2.2. Problem Formulation

2.3. The ϵ Nash Equilibrium of Two-Group Games

3. The Distribution Functions for the Pursuers and Evaders

4. The Optimal Feedback Strategies

5. ϵ Nash Equilibrium

6. Numerical Analysis

7. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI

2.3. The $ϵ$ Nash Equilibrium of Two-Group Games

5. $ϵ$ Nash Equilibrium