Article

A Class of Pursuit Problems in 3D Space via Noncooperative Stochastic Differential Games

School of Astronautics, Harbin Institute of Technology, Harbin 150001, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Aerospace 2025, 12(1), 50; https://doi.org/10.3390/aerospace12010050
Submission received: 9 December 2024 / Revised: 31 December 2024 / Accepted: 10 January 2025 / Published: 13 January 2025

Abstract

This paper investigates three-dimensional pursuit problems in noncooperative stochastic differential games. By introducing a novel polynomial value function capable of addressing high-dimensional dynamic systems, the forward–backward stochastic differential equations (FBSDEs) for optimal strategies are derived. The uniqueness of the value function under bounded control inputs is rigorously established as a theoretical foundation. The proposed methodology constructs optimal closed-loop feedback strategies for both pursuers and evaders, ensuring state convergence and solution uniqueness. Furthermore, the Lebesgue measure of the barrier surface is computed, enabling the design of strategies for scenarios involving multiple pursuers and evaders. To validate its applicability, the method is applied to missile interception games. Simulations confirm that the optimal strategies enable pursuers to consistently intercept evaders under stochastic dynamics, demonstrating the robustness and practical relevance of the approach in pursuit–evasion problems.

1. Introduction

Motivation: The study of differential game theory has a long and rich history, beginning with the foundational work of Isaacs [1] in the mid-1950s. Over the decades, this field has found applications in diverse domains, such as aerospace, robotics, and control systems [2,3,4,5,6]. A central focus in this area has been the classical non-cooperative linear-quadratic differential game, characterized by linear dynamics and quadratic cost functions. Key advances include necessary and sufficient conditions for the existence of saddle points in deterministic two-player zero-sum differential games over finite time intervals [7]. In stochastic settings, the complexity increases due to uncertainties in system dynamics and decision-making processes. Recent studies have tackled challenges such as unknown system matrices for regulators and participants, leading to adaptive stability solutions for linear-quadratic stochastic differential games [8]. Despite these advances, multi-agent scenarios involving many pursuers and many evaders remain underexplored, particularly in stochastic differential games. Critical challenges include addressing solution uniqueness and quantifying the Lebesgue measure of barrier surfaces, both of which are fundamental to designing optimal strategies.
This paper aims to address these gaps by introducing a novel framework for stochastic differential games in three-dimensional spaces. Specifically, we focus on multi-agent interactions where pursuers and evaders operate under stochastic dynamics. The contributions of this paper are threefold. First, we model these interactions using stochastic differential equations, explicitly accounting for uncertainties to analyze their impact on game strategies. Second, we establish the uniqueness of solutions, ensuring the solvability of the proposed game model. Finally, we compute the Lebesgue measure of the barrier surface using precise mathematical tools, demonstrating its practical relevance in real-world applications such as missile interception.
Brief Summary of Prior Literature: Linear-quadratic differential games have long provided a theoretical foundation for understanding game dynamics under deterministic and stochastic conditions. For games over infinite time horizons, feedback Nash equilibria are well-established, with their existence linked to solutions of algebraic Riccati equations [9]. Extensions to stochastic systems on finite horizons have emphasized feedback information structures, with verification theorems for feedback Stackelberg Nash equilibria derived using fully nonlinear parabolic partial differential equations [10]. Additionally, coupled Riccati equations have been employed to establish local existence, uniqueness, and sufficient conditions for equilibrium solutions. Significant progress has been made in two-player stochastic differential games. For example, Stackelberg solutions under open-loop information structures have been investigated, with applications to mixed linear-quadratic games incorporating input constraints [11]. Similarly, studies on Stackelberg games with Markov jump-diffusion stochastic differential equations have utilized stochastic maximum principles to derive optimal solutions for both leaders and followers [12]. A notable development is the exploration of linear-quadratic Stackelberg differential games for Markov jump-diffusion systems. By formulating a general stochastic maximum principle, open-loop optimal strategies for leaders and followers are derived. The existence of an open-loop saddle point guarantees optimality for both players, as neither can unilaterally improve their outcomes. However, such studies often focus on scenarios involving a single controller in the optimization process. Extensions to multi-agent systems with Markov jump-diffusion dynamics have been investigated by Lv [13], Moon [14], Sun [15], and Zhang and Li [16]. 
These works analyze interactions between decision-makers under stochastic dynamics but primarily address two-player settings or assume specific structural simplifications.
Recent advancements in multi-agent stochastic differential games have introduced methods such as mean-field games [17] and mixed mean-field analysis [18] to address scenarios with numerous agents. While these approaches offer valuable insights, they often rely on open-loop solutions or simplifying assumptions, such as convex control sets. Feedback strategies, though explored in linear-quadratic frameworks [19], remain underutilized for multi-agent stochastic games involving bounded controls. Studies like [20] have applied the stochastic maximum principle to sequentially solve leader–follower problems, yielding open-loop Stackelberg equilibria. Adaptive strategies for mean-field stochastic differential game problems have been developed using techniques such as weighted least squares estimation, random regularization, and decreasing incentive methods [21]. Additionally, the state-feedback form of Nash equilibrium strategies has been constructed using coupled Riccati equations [22]. Sun and Yong [23] further investigated the properties of open-loop and closed-loop saddle points, while Yu [24] and Miller and Pham [25] extended these approaches to broader scenarios, including McKean–Vlasov stochastic differential equations and Markov jump-diffusion models [26]. Despite these contributions, the challenges of many-to-many stochastic interactions, solution uniqueness, and the impact of bounded controls remain open. This paper extends existing frameworks by employing forward–backward stochastic differential equations (FBSDEs) to model multi-agent interactions. Building on methods from [27,28], we incorporate bounded controls and multi-agent dynamics, addressing both the uniqueness of solutions and the computation of barrier surface measures.
In game theory, Nash equilibrium refers to a situation where all players make decisions simultaneously, and no player can improve their outcome by unilaterally changing their strategy, assuming others keep their strategies unchanged. This concept is widely used in non-hierarchical settings where players have equal decision-making power. On the other hand, Stackelberg equilibrium involves a hierarchical structure. In this scenario, one player (the leader) makes their decision first, while the other players (followers) observe this decision and respond optimally. This sequential decision-making process reflects many real-world scenarios, such as competition between a dominant firm and smaller firms in an industry. The comparison of Nash and Stackelberg Equilibria is shown in Figure 1.
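The distinction between simultaneous and hierarchical play can be made concrete with a small bimatrix game. The sketch below, with illustrative payoff matrices that are not taken from the paper, enumerates pure-strategy Nash equilibria and computes the leader's Stackelberg commitment; in this example the leader's payoff improves from 2 (Nash) to 3 (by committing first).

```python
import numpy as np

# Payoff matrices for a 2x2 game (illustrative values, not from the paper):
# A[i, j] = payoff to player 1, B[i, j] = payoff to player 2,
# when player 1 plays row i and player 2 plays column j.
A = np.array([[2, 4],
              [1, 3]])
B = np.array([[1, 0],
              [0, 2]])

def pure_nash(A, B):
    """Pure-strategy Nash equilibria: no player gains by a unilateral deviation."""
    eqs = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            if A[i, j] == A[:, j].max() and B[i, j] == B[i, :].max():
                eqs.append((i, j))
    return eqs

def stackelberg(A, B):
    """Leader (player 1) commits to a row; the follower best-responds."""
    best = max(range(A.shape[0]), key=lambda i: A[i, int(np.argmax(B[i, :]))])
    return best, int(np.argmax(B[best, :]))

print(pure_nash(A, B))    # simultaneous-move equilibria
print(stackelberg(A, B))  # hierarchical (leader-follower) outcome
```

Here the unique Nash equilibrium gives the leader payoff 2, while committing to the second row steers the follower to an outcome worth 3 to the leader, illustrating the value of the hierarchical information structure.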
Our work differs from studies by Lasry and Lions, as well as Carmona, Delarue, and Rachepel, who explored ϵ-mean field games [29]. These works typically assume solvable Riccati equations, simplifying the construction of Stackelberg equilibria. While recent studies have proposed strategies for two-player zero-sum Stackelberg and Nash stochastic linear-quadratic (LQ) games, the relationship between these two types of games remains unclear [30]. Additionally, feedback information structures have been analyzed for Stackelberg games on finite horizons [31], with verification theorems derived from Hamiltonian functions. However, the solvability of the Riccati equations remains an open question. Other studies incorporating linear-quadratic partial observation models [32] address correlated states and observation noise but primarily focus on two-player scenarios. In contrast, our study provides a comprehensive framework for addressing multi-agent stochastic differential games. By proving solution uniqueness and computing the Lebesgue measure of barrier surfaces, we address fundamental challenges in the field and broaden its applicability to systems with bounded controls and uncertain dynamics.
Contribution of this paper: Inspired by the above literature, we consider the pursuit problems via the non-cooperative stochastic differential game. The main contributions of this article focus on the following points:
1.
This work extends previous studies (e.g., Qi et al. (2024) in [10]) by addressing pursuit problems in many-to-one and many-to-many scenarios, with initial applications in missile interception. Building on the system dynamics introduced in Section 2.2, we derive the optimal strategies for high-dimensional systems through a system of partial differential equations.
2.
Section 2 presents a rigorous analysis proving the uniqueness of the value function under bounded control inputs. A novel polynomial value function is introduced, which plays a critical role in ensuring the stability and scalability of the proposed framework.
3.
Leveraging the uniqueness of the value function, we further establish in Section 5 the uniqueness of state trajectories within the pursuit problem. Additionally, we analyze the barrier surface separating the pursuit region and the termination set, demonstrating that its Lebesgue measure is zero. This result is crucial for ensuring the feasibility of optimal strategies in practical scenarios.
The value function chosen in this paper is a polynomial, which differs significantly from the one-dimensional value function discussed in reference [10]. This gap leads to distinct solutions and to forward–backward stochastic differential equations (FBSDEs) that also differ from those in prior works such as [20]. Specifically, the polynomial nature of the value function necessitates a different approach to solving the system, involving matrix calculations and a more complex formulation of the backward differential equations. Additionally, the method proposed in this paper is applied to high-dimensional systems and, for the first time, to missile interception problems, showcasing its versatility and effectiveness in real-world scenarios.
Through this novel approach, we establish the convergence and uniqueness of the state trajectory when both the pursuer and evader follow the optimal state feedback strategy. Furthermore, we demonstrate that the Lebesgue measure of the barrier surface, which separates the pursuit region from the termination set, is zero. These results, stemming from the novel polynomial form of the value function, contribute new insights to the field of non-cooperative linear-quadratic stochastic differential games.
Organization: The organization of this paper is as follows: In Section 2.1, we introduce a three-dimensional model of the pursuit problem. Based on this model, Section 2.2 formulates a high-dimensional stochastic differential game that incorporates the players’ control variables and the diffusion term of the state equation. In Section 3, we provide proof of the uniqueness of the value function and derive the expression for the optimal closed-loop state feedback strategy. In Section 4, based on the uniqueness of the state trajectory within the pursuit region, we show that the Lebesgue measure of the barrier surface is zero and that the barrier surface consists of points where the rate of distance change is zero. In Section 5, we present optimal strategies for many-to-many pursuit problems. Finally, in Section 6, we validate the feasibility of the optimal strategies designed for many-to-one and many-to-many pursuit problems through numerical simulations. The mind map of this paper is shown in Figure 2.

2. Many-to-One Pursuits Problem in Stochastic Differential Games

In this section, we provide a fundamental definition of the many-to-one pursuit problem, present a mathematical model for stochastic differential games, and state the necessary assumptions.

2.1. Notation

The pursuers $P_i$ collaborate to capture an evader $E_i$, while the evader tries to escape. The game is played in three-dimensional space. Assume that both the pursuers and the evader are mass points with normal acceleration constraints: the direction of the velocity is adjusted by the normal acceleration, which is perpendicular to the velocity. The chase model between a pursuer and an evader is shown in Figure 3. From the above references, the nonlinear differential equations are obtained based on the relationship between the pursuer $P_i$ and the evader $E_i$ in three dimensions [33], i.e.,
\[
\dot{R}_{P_i} = v_{E_i}\cos\theta_{E_i}\cos\varphi_{E_i} - v_{P_i}\cos\theta_{P_i}\cos\varphi_{P_i}
\]
\[
R_{P_i}\dot{\theta}_{L_i} = v_{E_i}\sin\theta_{E_i} - v_{P_i}\sin\theta_{P_i}
\]
\[
R_{P_i}\dot{\varphi}_{L_i}\cos\theta_{L_i} = v_{P_i}\cos\theta_{P_i}\sin\varphi_{P_i} - v_{E_i}\cos\theta_{E_i}\sin\varphi_{E_i}
\]
\[
\dot{\theta}_{P_i} = \frac{A_{yP_i}}{v_{P_i}} + \tan\theta_{L_i}\sin\varphi_{P_i}\,\frac{v_{P_i}\cos\theta_{P_i}\sin\varphi_{P_i} - v_{E_i}\cos\theta_{E_i}\sin\varphi_{E_i}}{R_{P_i}} + \cos\varphi_{P_i}\,\frac{v_{P_i}\sin\theta_{P_i} - v_{E_i}\sin\theta_{E_i}}{R_{P_i}}
\]
\[
\dot{\varphi}_{P_i} = \frac{A_{zP_i}}{v_{P_i}\cos\theta_{P_i}} + \sin\theta_{P_i}\cos\varphi_{P_i}\left(\tan\theta_{L_i} + \frac{v_{E_i}\cos\theta_{E_i}\sin\varphi_{E_i} - v_{P_i}\cos\theta_{P_i}\sin\varphi_{P_i}}{R_{P_i}\cos\theta_{P_i}}\right) - \sin\theta_{P_i}\sin\varphi_{P_i}\,\frac{v_{E_i}\sin\theta_{E_i} - v_{P_i}\sin\theta_{P_i}}{R_{P_i}\cos\theta_{P_i}} - \frac{v_{E_i}\cos\theta_{E_i}\sin\varphi_{E_i} - v_{P_i}\cos\theta_{P_i}\sin\varphi_{P_i}}{R_{P_i}}
\]
\[
\dot{\theta}_{E_i} = \frac{A_{yE_i}}{v_{E_i}} + \tan\theta_{L_i}\sin\varphi_{E_i}\,\frac{v_{P_i}\cos\theta_{P_i}\sin\varphi_{P_i} - v_{E_i}\cos\theta_{E_i}\sin\varphi_{E_i}}{R_{P_i}} + \cos\varphi_{E_i}\,\frac{v_{P_i}\sin\theta_{P_i} - v_{E_i}\sin\theta_{E_i}}{R_{P_i}}
\]
\[
\dot{\varphi}_{E_i} = \frac{A_{zE_i}}{v_{E_i}\cos\theta_{E_i}} + \sin\theta_{E_i}\cos\varphi_{E_i}\left(\tan\theta_{L_i} + \frac{v_{E_i}\cos\theta_{E_i}\sin\varphi_{E_i} - v_{P_i}\cos\theta_{P_i}\sin\varphi_{P_i}}{R_{P_i}\cos\theta_{E_i}}\right) - \sin\theta_{E_i}\sin\varphi_{E_i}\,\frac{v_{E_i}\sin\theta_{E_i} - v_{P_i}\sin\theta_{P_i}}{R_{P_i}\cos\theta_{E_i}} - \frac{v_{E_i}\cos\theta_{E_i}\sin\varphi_{E_i} - v_{P_i}\cos\theta_{P_i}\sin\varphi_{P_i}}{R_{P_i}}
\]
The pursuit–evasion problem in three-dimensional space is modeled with multiple pursuers attempting to capture an evader. The notation used to describe the system dynamics is as follows in Table 1.
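As a numerical illustration, the range and line-of-sight kinematics of Equations (1)–(3) can be integrated directly. The following sketch uses forward Euler with illustrative speeds, initial angles, step size, and capture radius (none taken from the paper), and holds the flight-path angles fixed rather than integrating Equations (4)–(7):

```python
import math

# Minimal forward-Euler sketch of the range / line-of-sight kinematics in
# Eqs. (1)-(3) for a single pursuer-evader pair. All numeric parameters
# below are illustrative placeholders, not values from the paper.
def simulate(R0, theta_L0, phi_L0, v_P, v_E,
             theta_P, phi_P, theta_E, phi_E, dt=0.01, steps=2000):
    R, th_L, ph_L = R0, theta_L0, phi_L0
    for _ in range(steps):
        # Eq. (1): closing speed along the line of sight
        dR = (v_E * math.cos(theta_E) * math.cos(phi_E)
              - v_P * math.cos(theta_P) * math.cos(phi_P))
        # Eq. (2): elevation rate of the line of sight
        dth_L = (v_E * math.sin(theta_E) - v_P * math.sin(theta_P)) / R
        # Eq. (3): azimuth rate of the line of sight
        dph_L = (v_P * math.cos(theta_P) * math.sin(phi_P)
                 - v_E * math.cos(theta_E) * math.sin(phi_E)) / (R * math.cos(th_L))
        R += dR * dt
        th_L += dth_L * dt
        ph_L += dph_L * dt
        if R <= 0.1:           # assumed capture radius
            break
    return R, th_L, ph_L

# A faster pursuer flying straight down the line of sight closes the range.
R, _, _ = simulate(R0=10.0, theta_L0=0.0, phi_L0=0.0,
                   v_P=2.0, v_E=1.0,
                   theta_P=0.0, phi_P=0.0, theta_E=0.0, phi_E=0.0)
print(R)
```

With equal speeds and identical headings the closing speed in Equation (1) vanishes and the range stays constant, which is a quick sanity check on the signs used above.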

2.2. Problem Formulation

In this paper, Ω , F , F t t > 0 , P is a complete probability space, and W s is m-dimensional Brownian motion. For any initial time t > 0 , terminal time T > t , and the initial state x 0 R n , the filtration F t t > 0 is the natural filtration generated by the Brownian motion W ( s ) for t s T , augmented by all the P -null sets of F . According to Equations (1)–(7), we consider a stochastic differential game in a pursuit problem with control input constraints:
\[
dx(s) = \big[\, b(s,x(s)) + b_{u_1}(x(s))u_1(s) + b_{u_2}(x(s))u_2(s) + b_{v_1}(x(s))v_1(s) + b_{v_2}(x(s))v_2(s) \,\big]\,ds + \sigma\big(s,x(s),u_1(s),u_2(s),v_1(s),v_2(s)\big)\,dW(s), \quad s \in [t,T], \quad x(t) = x_t,
\]
where $x(s)\in\mathbb{R}^{m\times1}$ is the system state with initial condition $x_t$, $m = 5n + 2k$, $n$ is the number of pursuers, and $k$ is the number of evaders. The state vector is $x = [R_{P_1}, R_{P_2}, \dots, \theta_{L_1}, \varphi_{L_1}, \theta_{L_2}, \varphi_{L_2}, \dots, \theta_{P_1}, \varphi_{P_1}, \theta_{P_2}, \varphi_{P_2}, \dots, \theta_{E_1}, \varphi_{E_1}, \dots]^{T}$. $u_1(s)\in\mathbb{R}^{n\times1}$ is the acceleration of the pursuers along the y-axis of the velocity coordinate system, and $u_2(s)\in\mathbb{R}^{n\times1}$ is the corresponding z-axis acceleration. $v_1(s)\in\mathbb{R}^{k\times1}$ and $v_2(s)\in\mathbb{R}^{k\times1}$ are the y-axis and z-axis accelerations of the evaders in the velocity coordinate system. $W(s)\in\mathbb{R}^{m\times1}$, adapted to $\{\mathcal{F}_t\}_{t>0}$ for $t \le s \le T$, is a standard Wiener process. $b(s,x(s))\in\mathbb{R}^{m\times1}$, $b_{u_1}(s,x(s))\in\mathbb{R}^{m\times n}$, $b_{u_2}(s,x(s))\in\mathbb{R}^{m\times n}$, $b_{v_1}(s,x(s))\in\mathbb{R}^{m\times k}$, and $b_{v_2}(s,x(s))\in\mathbb{R}^{m\times k}$ are real matrices, and $\sigma(s,x(s),u_1(s),u_2(s),v_1(s),v_2(s))\in\mathbb{R}^{m\times m}$ is the diffusion term of the system equation. The relevant parameters are illustrated using an example involving three pursuers and two evaders:
1. Matrix $b_{u_1}\in\mathbb{R}^{19\times3}$: all entries are zero except $(b_{u_1})_{10,1} = 1/v_{P_1}$, $(b_{u_1})_{12,2} = 1/v_{P_2}$, and $(b_{u_1})_{14,3} = 1/v_{P_3}$; the nonzero block starts at the 10th row.
2. Matrix $b_{u_2}\in\mathbb{R}^{19\times3}$: all entries are zero except $(b_{u_2})_{11,1} = 1/(v_{P_1}\cos\theta_{P_1})$, $(b_{u_2})_{13,2} = 1/(v_{P_2}\cos\theta_{P_2})$, and $(b_{u_2})_{15,3} = 1/(v_{P_3}\cos\theta_{P_3})$; the nonzero block starts at the 11th row.
3. Matrix $b_{v_1}\in\mathbb{R}^{19\times2}$: all entries are zero except $(b_{v_1})_{16,1} = 1/v_{E_1}$ and $(b_{v_1})_{18,2} = 1/v_{E_2}$; the nonzero block starts at the 16th row.
4. Matrix $b_{v_2}\in\mathbb{R}^{19\times2}$: all entries are zero except $(b_{v_2})_{17,1} = 1/(v_{E_1}\cos\theta_{E_1})$ and $(b_{v_2})_{19,2} = 1/(v_{E_2}\cos\theta_{E_2})$; the nonzero block starts at the 17th row.
5. Vector $b\in\mathbb{R}^{19\times1}$: the drift vector stacks the control-free right-hand sides of Equations (1)–(7) for each pursuer–evader pairing ($P_1$ and $P_2$ against $E_1$, $P_3$ against $E_2$). Its first nine components are the range and line-of-sight rates $\dot{R}_{P_i}$, $\dot{\theta}_{L_i}$, $\dot{\varphi}_{L_i}$ from Equations (1)–(3); the next six are the pursuer heading rates from Equations (4) and (5) with the input terms $A_{yP_i}/v_{P_i}$ and $A_{zP_i}/(v_{P_i}\cos\theta_{P_i})$ removed (these enter through $b_{u_1}$ and $b_{u_2}$); the last four are the evader heading rates from Equations (6) and (7) with $A_{yE_j}/v_{E_j}$ and $A_{zE_j}/(v_{E_j}\cos\theta_{E_j})$ removed (these enter through $b_{v_1}$ and $b_{v_2}$).
6. Pursuer inputs: $u_1 = [A_{yP_1}, A_{yP_2}, A_{yP_3}]^{T}$, $u_2 = [A_{zP_1}, A_{zP_2}, A_{zP_3}]^{T}$.
7. Evader inputs: $v_1 = [A_{yE_1}, A_{yE_2}]^{T}$, $v_2 = [A_{zE_1}, A_{zE_2}]^{T}$.
The cost function of the pursuers and the evader is as follows:
\[
J(u,v) = \mathbb{E}\left[ \int_t^T e^{-\int_t^s c(r)\,dr} \left( x^T C x + u^T D u - v^T \Delta v \right) ds + e^{-\int_t^T c(r)\,dr}\, x^T(T) R_T x(T) \right]
\]
where $e^{-\int_t^s c(r)\,dr}$ is the discounting function and $c(r)$ is a function of time. $C\in\mathbb{R}^{m\times m}$, $D\in\mathbb{R}^{2n\times2n}$, $\Delta\in\mathbb{R}^{2k\times2k}$, $R_T\in\mathbb{R}^{m\times m}$, $u = [u_1, u_2]^T$, $v = [v_1, v_2]^T$, $D = \operatorname{diag}(D_1, D_2)$, $\Delta = \operatorname{diag}(\Delta_1, \Delta_2)$, and $D$ and $\Delta$ are orthogonal matrices.
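Once candidate feedback laws are fixed, a discounted cost of this form can be estimated by Monte Carlo. The sketch below uses a scalar stand-in for the dynamics (8) with a constant discount rate, running cost $x^2 + u^2 - v^2$, and placeholder feedback gains; none of these values come from the paper.

```python
import numpy as np

# Monte Carlo sketch of the discounted cost J: a scalar stand-in with
# constant discount rate c, running cost x^2 + u^2 - v^2, and terminal
# cost R_T x(T)^2. Dynamics, gains, and all constants are illustrative.
rng = np.random.default_rng(0)

def estimate_J(x0=1.0, T=1.0, c=0.1, R_T=1.0, dt=1e-3, n_paths=2000):
    steps = int(T / dt)
    x = np.full(n_paths, x0)
    J = np.zeros(n_paths)
    disc = 1.0
    for _ in range(steps):
        u = -0.5 * x                      # placeholder pursuer feedback
        v = 0.2 * x                       # placeholder evader feedback
        J += disc * (x**2 + u**2 - v**2) * dt
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)
        x = x + (-x + u + v) * dt + 0.1 * dW   # Euler-Maruyama step
        disc *= np.exp(-c * dt)           # e^{-int c dr} accumulated stepwise
    J += disc * R_T * x**2                # discounted terminal cost
    return J.mean()

print(estimate_J())
```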
Assumption 1. 
For the convenience of subsequent calculations of the cost function J u , v , it is assumed that [9]:
  • $L(s,x,u_1,u_2,v_1,v_2) = x^T C x + u^T D u - v^T \Delta v$ and $g(x) = x^T(s) R_T x(s)$. There exist positive constants $C_L$, $C_g$, and $p \ge 2$ such that:
\[
\left| e^{-\int_t^s c(x(r))\,dr}\, L(s,x,u_1,u_2,v_1,v_2) \right| \le C_L \left( 1 + \left| x \right|^p + \left| u_1 \right|^p + \left| u_2 \right|^p + \left| v_1 \right|^p + \left| v_2 \right|^p \right)
\]
\[
\left| e^{-\int_t^s c(x(r))\,dr}\, g(x) \right| \le C_g \left( 1 + \left| x \right|^p \right)
\]
Assumption 2. 
Assume that $B = b + b_{u_1} u_1 + b_{u_2} u_2 + b_{v_1} v_1 + b_{v_2} v_2$, and that $\sigma$ is continuous and bounded. There exist positive constants $C_b$, $C_u$, $C_\sigma$, $C_{b\sigma}$ such that [18]:
\[
\left| B(s,x,u_1,u_2,v_1,v_2) - B(s,y,u_1,u_2,v_1,v_2) \right| \le C_b \left| x - y \right|
\]
\[
\left| B(s,x,u_1,u_2,v_1,v_2) - B(s,x,\bar{u}_1,\bar{u}_2,\bar{v}_1,\bar{v}_2) \right| \le C_u \left( \left| u_1 - \bar{u}_1 \right|^p + \left| u_2 - \bar{u}_2 \right|^p + \left| v_1 - \bar{v}_1 \right|^p + \left| v_2 - \bar{v}_2 \right|^p \right)
\]
\[
\left| \sigma(s,x,u_1,u_2,v_1,v_2) - \sigma(s,y,u_1,u_2,v_1,v_2) \right| \le C_\sigma \left| x - y \right|
\]
\[
\left| B(s,x,u_1,u_2,v_1,v_2) \right| + \left| \sigma(s,x,u_1,u_2,v_1,v_2) \right| \le C_{b\sigma} \left( 1 + \left| x \right| \right)
\]
where $s\in[t,T]$ and $x, y \in \mathbb{R}^m$. The admissible control sets for the pursuers and the evaders are defined as follows:
\[
\mathcal{U} = \left\{ u_i \,\middle|\, u_i : [t,T] \times \mathbb{R}^m \to \mathbb{R}^{n\times1},\ u_i \text{ is uniformly locally Lipschitz continuous},\ \left\| u_i(s,x) \right\| \le u_i^{\max} \right\}, \quad i = 1,2,
\]
\[
\nu = \left\{ v_i \,\middle|\, v_i : [t,T] \times \mathbb{R}^m \to \mathbb{R}^{k\times1},\ v_i \text{ is uniformly locally Lipschitz continuous},\ \left\| v_i(s,x) \right\| \le v_i^{\max} \right\}, \quad i = 1,2.
\]
For all $s\in[t,T]$, $L(s,x,u_1,u_2,v_1,v_2)$ and $g(x)$ are continuously differentiable. In the missile interception model presented in this paper, we assume that the control inputs are bounded. Specifically, the trajectory control engine, which governs the missile’s movement, has an upper limit on its output, which is related to the normal acceleration. This assumption is crucial for modeling practical missile systems, where limitations on engine power and physical constraints on acceleration are common.
This assumption of bounded control inputs is supported by previous works in the field. For instance, references [34,35] provide theoretical justification for such constraints in control systems, while references [36,37] discuss their applicability to missile interception problems. Moreover, reference [38] highlights the impact of input constraints on system stability and performance, further validating the choice of this assumption in our model.
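Under such bounds, a standard way to keep any feedback law admissible is to project (saturate) it onto the control constraint. A minimal sketch, with an illustrative gain matrix and limit not taken from the paper:

```python
import numpy as np

# The admissible sets bound each control by u_i^max / v_i^max. Projecting an
# unconstrained feedback onto the ball of that radius keeps it admissible
# while preserving its direction. K and the limit are illustrative.
def saturate(u, u_max):
    """Project u onto the ball of radius u_max (no-op if already inside)."""
    norm = np.linalg.norm(u)
    return u if norm <= u_max else u * (u_max / norm)

K = np.array([[1.0, 0.0], [0.0, 2.0]])    # placeholder feedback gain
x = np.array([3.0, 4.0])
u_unconstrained = -K @ x                   # [-3, -8], norm ~ 8.54
u = saturate(u_unconstrained, u_max=5.0)
print(u, np.linalg.norm(u))
```

Note that the projection rescales the vector rather than clipping each component, so the direction of the commanded acceleration is unchanged; this matters for the Lipschitz continuity of the resulting feedback law.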

3. The Optimal Feedback Strategies

Definition 1. 
The strategy tuple $(u_i^*, v_i^*) \in \mathcal{U} \times \nu$, $i = 1,2$, is said to constitute a Nash (saddle-point) equilibrium if it lies within the admissible control sets and the following holds:
\[
J(u_1^*, u_2^*, v_1, v_2) \le J(u_1^*, u_2^*, v_1^*, v_2^*) \le J(u_1, u_2, v_1^*, v_2^*)
\]
for any $(s,x) \in [t,T] \times \mathbb{R}^m$ and all admissible $(u_1, u_2, v_1, v_2)$. In this context, the pursuer aims to minimize the cost function, whereas the evader seeks to maximize it.
\[
\min_{u\in\mathcal{U}} \max_{v\in\nu} J = \mathbb{E}\left[ \int_t^T e^{-\int_t^s c(r)\,dr} \left( x^T C(s) x + u^T D u - v^T \Delta v \right) ds + e^{-\int_t^T c(r)\,dr}\, x^T(T) R_T x(T) \right]
\]
For the second term of the cost function, a matrix $p \in \mathbb{R}^{m\times m}$ solves the terminal-value equation
\[
dp = \alpha\,ds + \beta\,dW = \left( A^T p + \zeta_x E(x) \right) ds + \beta\,dW,
\]
where $p(T) = R_T$, $\alpha = A^T p + \zeta_x E(x)$, $\zeta(x)$ is a function that depends on the state with $\zeta(0) = 0$, and $\beta$ is a stochastic diffusion term. Applying Itô's Lemma to compute $\langle p(T) x(T), x(T) \rangle$:
\[
x^T(T) R_T x(T) = \langle p(T) x(T), x(T) \rangle = \langle p(t) x(t), x(t) \rangle + \int_t^T \left[ x^T \alpha x + x^T p B + B^T p x + 2\sigma^T \beta x + \mathrm{tr}\left( p\sigma\sigma^T \right) \right] ds + \int_t^T \left[ x^T \beta x + 2\sigma^T p x \right] dW.
\]
Combining (19) and (21), the following could be obtained:
\[
J = \mathbb{E}\left[ \int_t^T e^{-\int_t^s c(r)\,dr} \left( x^T C(s) x + u^T D u - v^T \Delta v \right) ds + e^{-\int_t^T c(r)\,dr}\, x^T R_T x \right] = \mathbb{E}\Bigg[ \int_t^T e^{-\int_t^s c(r)\,dr} \left( x^T C(s) x + u^T D u - v^T \Delta v + x^T \alpha x + 2 B^T p x + 2\sigma^T \beta x + \mathrm{tr}\left( p\sigma\sigma^T \right) \right) ds + \int_t^T e^{-\int_t^s c(r)\,dr} \left( x^T \beta x + 2\sigma^T p x \right) dW + e^{-\int_t^T c(r)\,dr} \langle p(t) x(t), x(t) \rangle \Bigg].
\]
The second term of the cost function (22) is equal to zero in expectation, and the third term depends only on the initial state, such that
\[
J = \mathbb{E}\left[ \int_t^T e^{-\int_t^s c(r)\,dr} \left( x^T C(s) x + u^T D u - v^T \Delta v + x^T \alpha x + 2 B^T p x + 2\sigma^T \beta x + \mathrm{tr}\left( p\sigma\sigma^T \right) \right) ds \right].
\]
In this paper, the HJB equation for $\Psi(s,x) = \inf_{u_i\in\mathcal{U}} \sup_{v_i\in\nu} J(s,x)$ becomes:
\[
\partial_s \Psi + \inf_{u_i\in\mathcal{U}} \sup_{v_i\in\nu} \left\{ \frac{1}{2} \mathrm{tr}\left( \sigma(s)\sigma^*(s)\, \partial_{xx}^2 \Psi \right) + b(s,x)\, \partial_x \Psi + c\,\Psi + L \right\} = 0, \quad s \in [t,T].
\]
Theorem 1. 
Suppose the value function $\Psi(t,x) \in C^{1,2}([t,T]\times\mathbb{R}^m)$ is a classical solution of the HJB equation, $(u_i^*, v_i^*) \in \mathcal{U}\times\nu$ with $\left\| u_i(s,x) \right\| \le u_i^{\max}$ and $\left\| v_i(s,x) \right\| \le v_i^{\max}$, $i = 1,2$, and $\left| \Psi \right|, \left| \partial_s \Psi \right|, \left| \partial_x \Psi \right|, \left| \partial_{xx}^2 \Psi \right| \le C_\Psi \left( 1 + \left| x \right|^N \right)$ for some $N \ge 0$. Let $(x^*, u_1^*, u_2^*, v_1^*, v_2^*)$ be an admissible pair at $(t,x)$, define $F_{cv} = \frac{1}{2}\mathrm{tr}\left( \sigma(s)\sigma^*(s)\, \partial_{xx}^2 \Psi \right) + b(s,x)\, \partial_x \Psi + c\Psi + L$, and suppose $(u_1^*, u_2^*, v_1^*, v_2^*) \in \arg\inf_{u_i\in\mathcal{U}} \sup_{v_i\in\nu} F_{cv}$ for almost every $s \in [t,T]$. Then the pair $(x^*, u_1^*, u_2^*, v_1^*, v_2^*)$ is optimal at $(t,x)$.
Proof. 
Based on Itô’s Lemma:
\[
\Psi(s,x) = \mathbb{E}\left[ \Psi(T,x) - \int_t^T \left( \partial_s \Psi + b\, \partial_x \Psi + \frac{1}{2} \mathrm{tr}\left( \sigma(s)\sigma^*(s)\, \partial_{xx}^2 \Psi \right) \right) ds \right]
\]
Then, the following could be obtained:
\[
J(t,x,u_1^*,u_2^*,v_1,v_2) \le \mathbb{E}\left[ g(x(T)) - \int_t^T \left( \partial_s \Psi - L + F_{cv} - \Psi + \Psi \right) ds \right] \le J(t,x,u_1,u_2,v_1^*,v_2^*).
\]
Equality $\Psi(t,x) = J(t,x,u_1,u_2,v_1,v_2)$ holds if and only if $F_{cv} = \Psi$. If $(u_1,u_2,v_1,v_2) = (u_1^*,u_2^*,v_1^*,v_2^*)$, the inequality above gives $\Psi(t,x) = V(t,x) = J(t,x,u_1^*,u_2^*,v_1^*,v_2^*)$, where the value function $V(t,x)$ is the viscosity solution of the HJB equation; continuous first and second derivatives ($C^{1,2}$ regularity) may not exist. □
Step 1: The value function $V(t,x)$ is Lipschitz continuous.
Proof. 
We introduce three Lemmas 1–3, to complete the proof. To prove that the value function is Lipschitz continuous, it is necessary to demonstrate that each input parameter of the value function satisfies the Lipschitz continuity condition. This step ensures the mathematical rigor of the subsequent solution process. While the numerical simulations in this paper are conducted within the R 3 space, the uniqueness of the value function as a solution holds true in higher-dimensional spaces as well. We provide a simplified overview of the lemmas used:
  • Lemma 1: Proves that the value function is Lipschitz continuous in the state, accounting for the relationship between states and dynamics.
  • Lemma 2: Shows that the value function is Lipschitz continuous in time, ensuring stability with respect to temporal variations.
  • Lemma 3: Demonstrates that the value function is Lipschitz continuous in the control inputs under bounded constraints, ensuring consistency of input–output relationships.
These lemmas collectively establish the Lipschitz continuity of the value function, providing a foundation for the uniqueness proof in both low- and high-dimensional spaces.
Lemma 1. 
For any $s \in [t,T]$ and $x_1, x_2 \in \mathbb{R}^m$, there exist constants $C_{\lambda_1}$, $C_{\lambda_2}$ such that:
\[
\inf_{u_i\in\mathcal{U}} \mathbb{E}\left[ -C_{\lambda_1} \left| x_1 - x_2 \right|^p \right] \le V(t,x_1,u_1,u_2,v_1,v_2) - V(t,x_2,u_1,u_2,v_1,v_2) \le \sup_{v_i\in\nu} \mathbb{E}\left[ C_{\lambda_2} \left| x_1 - x_2 \right|^p \right].
\]
That is, $V(t,\cdot)$ is Lipschitz continuous in the state.
Proof. 
For any $s \in [t,T]$, we can obtain the following:
\[
x_1(s) = x_1(t) + \int_t^s B(s,x_1)\,ds + \int_t^s \sigma(s,x_1)\,dW, \qquad x_2(s) = x_2(t) + \int_t^s B(s,x_2)\,ds + \int_t^s \sigma(s,x_2)\,dW.
\]
Then,
\[
x_1(s) - x_2(s) = x_1(t) - x_2(t) + \int_t^s \left[ B(s,x_1) - B(s,x_2) \right] ds + \int_t^s \left[ \sigma(s,x_1) - \sigma(s,x_2) \right] dW.
\]
Taking expectations on both sides of the equation:
\[
\mathbb{E}\left| x_1(s) - x_2(s) \right| \le \mathbb{E}\left| x_1(t) - x_2(t) \right| + \int_t^s C_b\, \mathbb{E}\left| x_1 - x_2 \right| ds.
\]
According to the Gronwall inequality, there exists a positive constant $C_b$ such that:
\[
\mathbb{E}\left| x_1(s) - x_2(s) \right| \le e^{C_b (s-t)}\, \mathbb{E}\left| x_1(t) - x_2(t) \right|
\]
for any $t \le s \le T$, such that:
\[
\mathbb{E}\left| x_1(s) - x_2(s) \right| \le e^{C_b T}\, \mathbb{E}\left| x_1(t) - x_2(t) \right|.
\]
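The Gronwall estimate above can be checked numerically: for a drift with Lipschitz constant $C_b = 1$ and additive noise shared by both paths, the mean gap between two solutions started from different initial states stays below $e^{C_b T}$ times the initial gap. All parameters below are illustrative.

```python
import numpy as np

# Numeric illustration of the Gronwall bound: drift B(x) = -x (so C_b = 1),
# additive noise common to both paths, Euler-Maruyama discretization.
rng = np.random.default_rng(1)
T, dt, n_paths = 1.0, 1e-3, 500
steps = int(T / dt)
Cb = 1.0

x1 = np.full(n_paths, 2.0)       # first initial state
x2 = np.full(n_paths, 1.0)       # second initial state (initial gap = 1)
for _ in range(steps):
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)   # same noise on both paths
    x1 = x1 + (-x1) * dt + 0.2 * dW
    x2 = x2 + (-x2) * dt + 0.2 * dW

gap = np.abs(x1 - x2).mean()     # E|x1(T) - x2(T)|
bound = np.exp(Cb * T) * 1.0     # e^{C_b T} |x1(t) - x2(t)|
print(gap, bound)
```

With this contracting drift the gap actually shrinks toward $e^{-T}$ times the initial gap, comfortably inside the (conservative) Gronwall bound.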
For any $\varepsilon > 0$ and $x_1, x_2 \in \mathbb{R}^m$, there exist admissible controls such that:
\[
V(t,x_2,u_1,u_2,v_1,v_2) + \varepsilon \ge \sup_{v_i\in\nu} \mathbb{E}\left[ \int_t^T e^{-\int_t^s c(r)\,dr} L(s,x_1,u_1,u_2,v_1,v_2)\,ds + e^{-\int_t^T c(r)\,dr} g(x_1(T)) \right]
\]
\[
V(t,x_1,u_1,u_2,v_1,v_2) - V(t,x_2,u_1,u_2,v_1,v_2) - \varepsilon \le \sup_{v_i\in\nu} \mathbb{E}\left[ \int_t^T e^{-\int_t^s c(r)\,dr} \left( L(s,x_1,u_1,u_2,v_1,v_2) - L(s,x_2,u_1,u_2,v_1,v_2) \right) ds + e^{-\int_t^T c(r)\,dr} \left( g(x_1(T)) - g(x_2(T)) \right) \right]
\]
There exist constants $C_L$ and $C_g$, such that:
\[
V(t,x_1,u_1,u_2,v_1,v_2) - V(t,x_2,u_1,u_2,v_1,v_2) - \varepsilon \le \sup_{v_i\in\nu} \mathbb{E}\left[ \int_t^T C_L \left| x_1 - x_2 \right|^p ds + C_g \left| x_1(T) - x_2(T) \right|^p \right] \le \sup_{v_i\in\nu} \mathbb{E}\left[ C_L e^{C_b T} \left| x_1 - x_2 \right|^p T + C_g e^{C_b T} \left| x_1 - x_2 \right|^p \right] \le \sup_{v_i\in\nu} \mathbb{E}\left[ C_\lambda \left| x_1 - x_2 \right|^p \right].
\]
Similarly, we could obtain the following:
\[
\inf_{u_i\in\mathcal{U}} \mathbb{E}\left[ -C_{\lambda_1} \left| x_1 - x_2 \right|^p \right] \le V(t,x_1,u_1,u_2,v_1,v_2) - V(t,x_2,u_1,u_2,v_1,v_2) \le \sup_{v_i\in\nu} \mathbb{E}\left[ C_{\lambda_2} \left| x_1 - x_2 \right|^p \right]
\]
where C λ is a constant, which depends on T , C L , C b , C g . This completes the proof of Lemma 1. □
Lemma 2. 
For any $x \in \mathbb{R}^m$, $V(\cdot,x)$ is Lipschitz continuous in time: there exist constants $k_1$, $k_2$ such that:
\[
\inf_{u_i\in\mathcal{U}} \mathbb{E}\left[ -k_2 \left( 1 + \left| x \right|^p \right) T \right] \le V(t_1, x(t_1)) - V(t_2, x(t_1)) \le \sup_{v_i\in\nu} \mathbb{E}\left[ k_1 \left( 1 + \left| x \right|^p \right) T \right].
\]
Proof. 
Let $t_1 \le t_2$ with $t_1, t_2 \in [0,T]$.
\[
x(t_2) - x(t_1) = \int_{t_1}^{t_2} B(s,x)\,ds + \int_{t_1}^{t_2} \sigma(s,x)\,dW
\]
Then,
\[
\mathbb{E}\left[ x(t_2) - x(t_1) \right] = \mathbb{E}\int_{t_1}^{t_2} B(s,x)\,ds.
\]
We could obtain the following:
\[
\mathbb{E}\left| B(t,x,u_1,u_2,v_1,v_2) - B(t,0,u_1,u_2,v_1,v_2) \right| \le \mathbb{E}\left[ C_b \left| x \right| \right], \qquad \mathbb{E}\left| B(t,x,u_1,u_2,v_1,v_2) \right| \le \mathbb{E}\left[ \left| B(t,0,u_1,u_2,v_1,v_2) \right| + C_b \left| x \right| \right] \le \mathbb{E}\left[ \max_{v_i\in\nu} \left| B(t,0,u_1,u_2,v_1,v_2) \right| + C_b \left| x \right| \right] \le \mathbb{E}\left[ M_F + C_b \left| x \right| \right],
\]
where $M_F := \max_{v_i\in\nu} \left| B(t,0,u_1,u_2,v_1,v_2) \right|$. Similarly,
\[
\mathbb{E}\left[ \min_{u_i\in\mathcal{U}} \left| B(t,0,u_1,u_2,v_1,v_2) \right| - C_b \left| x \right| \right] \le \mathbb{E}\left| B(t,x,u_1,u_2,v_1,v_2) \right|,
\]
where $m_F := \min_{u_i\in\mathcal{U}} \left| B(t,0,u_1,u_2,v_1,v_2) \right|$. Then, for $x \in \mathbb{R}^m$ and $t_1, t_2 \in [0,T]$:
\[
V(t_1, x) = \inf_{u_i\in\mathcal{U}} \sup_{v_i\in\nu} \mathbb{E}\left[ \int_{t_1}^{t_1+h} e^{-\int_t^s c(r)\,dr} L(s,x,u_1,u_2,v_1,v_2)\,ds + V(t_2, x) \right],
\]
where $h = t_2 - t_1$. The following could be obtained:
\[
V(t_1,x(t_1)) - V(t_2,x(t_1)) \le \left| V(t_2,x(t_2)) - V(t_2,x(t_1)) \right| + \sup_{v_i\in\nu} \mathbb{E}\int_{t_1}^{t_2} e^{-\int_t^s c(r)\,dr} L(s,x,u_1,u_2,v_1,v_2)\,ds \le \mathbb{E}\left[ C_\lambda \left| x(t_2) - x(t_1) \right|^p \right] + \sup_{v_i\in\nu} \mathbb{E}\int_{t_1}^{t_2} e^{-\int_t^s c(r)\,dr} L(s,x,u_1,u_2,v_1,v_2)\,ds \le \sup_{v_i\in\nu} \mathbb{E}\left[ C_\lambda \left( M_F + C_b \left| x \right| \right)^p T + \left( M_L + C_M \left| x \right|^p \right) T \right] \le \sup_{v_i\in\nu} \mathbb{E}\left[ k_1 \left( 1 + \left| x \right|^p \right) T \right].
\]
On the other hand, for any δ > 0 , such that:
\[
V(t_1,x(t_1)) - V(t_2,x(t_1)) + \delta \ge V(t_2,x(t_2)) - V(t_2,x(t_1)) + \inf_{u_i\in\mathcal{U}} \mathbb{E}\int_{t_1}^{t_2} e^{-\int_t^s c(r)\,dr} L(s,x,u_1,u_2,v_1,v_2)\,ds \ge \inf_{u_i\in\mathcal{U}} \mathbb{E}\left[ -k_2 \left( 1 + \left| x \right|^p \right) T \right].
\]
Then, letting $\delta \to 0$, we can obtain the following:
\[
\inf_{u_i\in\mathcal{U}} \mathbb{E}\left[ -k \left( 1 + \left| x \right|^p \right) T \right] \le V(t_1,x(t_1)) - V(t_2,x(t_1)) \le \sup_{v_i\in\nu} \mathbb{E}\left[ k \left( 1 + \left| x \right|^p \right) T \right].
\]
This completes the proof of Lemma 2. □
Then, we need to add a proof regarding the impact of bounded inputs on value functions.
Lemma 3. 
For $u_1, u_2, \bar{u}_1, \bar{u}_2 \in \mathcal{U}$ with $\left\| \bar{u}_i \right\| = u^{\max}$ and $v_1, v_2, \bar{v}_1, \bar{v}_2 \in \nu$ with $\left\| \bar{v}_i \right\| = v^{\max}$, there exist constants $C_{L_1}$, $C_{L_2}$ such that:
\[
\inf_{u_i\in\mathcal{U}} \mathbb{E}\left[ -C_{L_1} e^{C_u T} T \left( \left| u_1 - \bar{u}_1 \right|^p + \left| u_2 - \bar{u}_2 \right|^p + \left| v_1 - \bar{v}_1 \right|^p + \left| v_2 - \bar{v}_2 \right|^p \right) \right] \le V(t,x,u_1,u_2,v_1,v_2) - V(t,x,\bar{u}_1,\bar{u}_2,\bar{v}_1,\bar{v}_2) \le \sup_{v_i\in\nu} \mathbb{E}\left[ C_{L_2} e^{C_u T} T \left( \left| u_1 - \bar{u}_1 \right|^p + \left| u_2 - \bar{u}_2 \right|^p + \left| v_1 - \bar{v}_1 \right|^p + \left| v_2 - \bar{v}_2 \right|^p \right) \right].
\]
Proof. 
For $s \in [t,T]$ and $x \in \mathbb{R}^m$, there exists a constant $C_{L_1}$ such that:
\[
V(t,x,u_1,u_2,v_1,v_2) - V(t,x,\bar{u}_1,\bar{u}_2,\bar{v}_1,\bar{v}_2) \le \sup_{v_i\in\nu} \mathbb{E}\int_t^T e^{-C_e s} \left| L(s,x,u_1,u_2,v_1,v_2) - L(s,x,\bar{u}_1,\bar{u}_2,\bar{v}_1,\bar{v}_2) \right| ds \le \sup_{v_i\in\nu} \mathbb{E}\int_t^T C_{L_1} \left( 1 + \left| u_1 - \bar{u}_1 \right|^p + \left| u_2 - \bar{u}_2 \right|^p + \left| v_1 - \bar{v}_1 \right|^p + \left| v_2 - \bar{v}_2 \right|^p \right) ds \le \sup_{v_i\in\nu} \mathbb{E}\left[ C_{L_1} e^{C_u T} T \left( \left| u_1 - \bar{u}_1 \right|^p + \left| u_2 - \bar{u}_2 \right|^p + \left| v_1 - \bar{v}_1 \right|^p + \left| v_2 - \bar{v}_2 \right|^p \right) \right].
\]
Then, there exists a constant $C_{L_2}$ such that:
\[
V(t,x,u_1,u_2,v_1,v_2) - V(t,x,\bar{u}_1,\bar{u}_2,\bar{v}_1,\bar{v}_2) \ge -\sup_{v_i\in\nu} \mathbb{E}\int_t^T e^{-C_e s} \left| L(s,x,u_1,u_2,v_1,v_2) - L(s,x,\bar{u}_1,\bar{u}_2,\bar{v}_1,\bar{v}_2) \right| ds \ge -\sup_{v_i\in\nu} \mathbb{E}\left[ C_{L_2} e^{C_u T} T \left( \left| u_1 - \bar{u}_1 \right|^p + \left| u_2 - \bar{u}_2 \right|^p + \left| v_1 - \bar{v}_1 \right|^p + \left| v_2 - \bar{v}_2 \right|^p \right) \right].
\]
This completes the proof of Lemma 3. □
According to Lemmas 1–3, we can obtain the following:
\[
\left| V(t_1,x_1,u_1,u_2,v_1,v_2) - V(t_2,x_2,\bar{u}_1,\bar{u}_2,\bar{v}_1,\bar{v}_2) \right| \le \left| V(t_1,x_1,u_1,u_2,v_1,v_2) - V(t_1,x_2,u_1,u_2,v_1,v_2) \right| + \left| V(t_1,x_2,u_1,u_2,v_1,v_2) - V(t_2,x_2,u_1,u_2,v_1,v_2) \right| + \left| V(t_2,x_2,u_1,u_2,v_1,v_2) - V(t_2,x_2,\bar{u}_1,\bar{u}_2,\bar{v}_1,\bar{v}_2) \right|.
\]
Then, we could obtain the following:
inf u i U E C L 1 e C u T T u 1 u ¯ 1 p + u 2 u ¯ 2 p + v 1 v ¯ 1 p + v 2 v ¯ 2 p + C λ 1 x 1 x 2 p + k 1 1 + x P T V t 1 , x 1 , u 1 , u 2 , v 1 , v 2 V t 2 , x 2 , u ¯ 1 , u ¯ 2 , v ¯ 1 , v ¯ 2 sup v i ν E C L e C u T T u 1 u ¯ 1 p + u 2 u ¯ 2 p + v 1 v ¯ 1 p + v 2 v ¯ 2 p + C λ x 1 x 2 p + k 1 + x P T .
This completes the proof of step 1. □
Step 2: the value function V(t, x) is the unique viscosity solution of the HJB equation.
Proof. 
We assume that V1 is a subsolution. Let μ be the standard complete probability space μ = (Ω1, F, F_s^t, P), where W_η is a Wiener process on the filtered probability space (Ω, F, F_s^t, P). Here, F is the augmentation by the P-null sets, and F_s^t is the σ-algebra generated by W_η.
J t , x , u 1 , u 2 , v 1 , v 2 = E t t + h e t s c r d r L s , x , u 1 , u 2 , v 1 , v 2 d s + E t + h T e t s c r d r L s , x , u 1 , u 2 , v 1 , v 2 d s + e t s c r d r g x T | F t + h T = E t t + h e t s c s d r L s , x , u 1 , u 2 , v 1 , v 2 d s + E E t + h T e t + h s c s d r L s , x , u 1 , u 2 , v 1 , v 2 d s + e t s c s d r g x T | F t + h T = E t t + h e t s c s d r L s , x , u 1 , u 2 , v 1 , v 2 d s + E e t + h s c r d r J u t + h , x , u 1 , u 2 , v 1 , v 2 + ς E t t + h L s , x , u 1 , u 2 , v 1 , v 2 d s + inf u i U e t + h T c r d r V 1 t + h , x , u 1 , u 2 , v 1 , v 2 .
Then, we can obtain the following:
inf u i U V 1 t , x , u 1 , u 2 , v 1 , v 2 E t t + h e t s c r d r L s , x , u 1 , u 2 , v 1 , v 2 d s + inf u i U e t + h T c r d r V t + h , x , u 1 , u 2 , v 1 , v 2 . inf u i U V 1 t , x , u 1 , u 2 , v 1 , v 2 inf u i U e t + h T c r d r V 1 t + h , x , u 1 , u 2 , v 1 , v 2 E t t + h e t + h T c r d r L s , x , u 1 , u 2 , v 1 , v 2 d s . E t t + h e C e s L s , x , u 1 , u 2 , v 1 , v 2 d s + inf u i U e t + h T c r d r V 1 t + h , x , u 1 , u 2 , v 1 , v 2 inf u i U V 1 t , x , u 1 , u 2 , v 1 , v 2 0 . inf u i U V 1 t , x , u 1 , u 2 , v 1 , v 2 inf u i U e t + h T c r d r V 1 t + h , x , u 1 , u 2 , v 1 , v 2 E t t + h e t + h s c r d r L s , x , u 1 , u 2 , v 1 , v 2 d s 0 .
For any (t0, x0) ∈ [t, T] × R^n, there exists Ψ1 ∈ C^{1,2}([t, T] × R^n) with V1 − Ψ1 attaining its maximum at (t0, x0), so we have:
e t + h T c r d r V 1 t , x e t + h T c r d r Ψ 1 t , x V 1 t 0 , x 0 Ψ 1 t 0 , x 0
where t 0 , x 0 t , T × R n .
e t 0 + h T c r d r V 1 t 0 + h , x 0 e t 0 + h T c r d r Ψ 1 t 0 + h , x 0 + h V 1 t 0 , x 0 Ψ 1 t 0 , x 0 .
Ψ 1 t 0 , x 0 e t 0 + h T c r d r Ψ 1 t 0 + h , x 0 + h V t 0 , x 0 e t 0 + h T c r d r V t 0 + h , x 0 .
Ψ 1 t 0 , x 0 e t 0 + h T c r d r Ψ 1 t 0 + h , x 0 + h E t 0 t 0 + h e t 0 s c r d r L s , x , u 1 , u 2 , v 1 , v 2 d s 0 .
Then, we can obtain the following:
inf u i U E 1 h t 0 t 0 + h e t 0 s c r d r s Ψ 1 e t 0 s c r d r B T x Ψ 1 e t 0 s c r d r 1 2 σ s T σ s x x 2 Ψ 1 1 e t 0 s c r d r Ψ 1 e t 0 s c r d r L d s 0 .
lim h 0 s Ψ 1 + inf u i U B T x Ψ 1 + 1 2 σ s T σ s T x x 2 Ψ 1 + C e Ψ 1 + L 0 .
where C_e = lim_{h→0} (e^{∫_{t0}^{t0+h} c(r)dr} − 1)/h. Similarly, V2 is a supersolution: for any (t0, x0) ∈ [t, T] × R^n, there exists Ψ2 ∈ C^{1,2}([t, T] × R^n) such that V2 − Ψ2 attains a minimum at (t0, x0), and we have V2(t, x) − Ψ2(t, x) ≥ V2(t0, x0) − Ψ2(t0, x0). Then, we have the following: ∂_s Ψ2 + sup_{vi ∈ ν} ( B ∇_x Ψ2 + (1/2)(σ_s^T σ_s)^T ∇_{xx}^2 Ψ2 + C_e Ψ2 + L ) ≥ 0. Next, we prove the uniqueness of the solution V, ensuring that the HJB equation has a unique solution. Let M := sup_{[0,T]} {V1 − V2}. Assume, for contradiction, that M > 0. Using the variable-doubling technique, we construct the following function for ε, β, χ > 0 and 0 < m1 < 1:
Φ t 1 , x 1 , u 1 , u 2 , v 1 , v 2 , t 2 , x 2 , u 1 , u 2 , v 1 , v 2 = V 1 t 1 , x 1 , u 1 , u 2 , v 1 , v 2 V 2 t 2 , x 2 , u 1 , u 2 , v 1 , v 2 t 1 t 2 p + x 1 x 2 p p ε β 1 + x 1 p + 1 + x 2 p m 1 χ u 1 u 1 + u 2 u 2 + v 1 v 1 + v 2 v 2
where Φ is continuous, and the last three terms of Equation (54) are used to estimate the discontinuities. Since Φ(t1, x1, u1, u2, v1, v2, t2, x2, u1, u2, v1, v2) → −∞ as max{|x1|, |x2|} → ∞, the supremum of Φ is attained; that is, the following exists:
sup Φ t 1 , x 1 , u 1 , u 2 , v 1 , v 2 , t 2 , x 2 , u 1 , u 2 , v 1 , v 2 = Φ t ¯ 1 , x ¯ 1 , u ¯ 1 , u ¯ 2 , v ¯ 1 , v ¯ 2 , t ¯ 2 , x ¯ 2 , u ¯ 1 , u ¯ 2 , v ¯ 1 , v ¯ 2 .
According to the definition of M, there exists ( t ˜ , x ˜ , u ˜ 1 , u ˜ 2 , v ˜ 1 , v ˜ 2 ) , such that:
V 1 t ˜ , x ˜ , u ˜ 1 , u ˜ 2 , v ˜ 1 , v ˜ 2 V 2 t ˜ , x ˜ , u ˜ 1 , u ˜ 2 , v ˜ 1 , v ˜ 2 M 2
Then:
Φ t ¯ 1 , x ¯ 1 , u ¯ 1 , u ¯ 2 , v ¯ 1 , v ¯ 2 , t ¯ 2 , x ¯ 2 , u ¯ 1 , u ¯ 2 , v ¯ 1 , v ¯ 2 Φ t ˜ , x ˜ , u ˜ 1 , u ˜ 2 , v ˜ 1 , v ˜ 2 , t ˜ , x ˜ , u ˜ 1 , u ˜ 2 , v ˜ 1 , v ˜ 2 V 1 t ˜ , x ˜ , u ˜ 1 , u ˜ 2 , v ˜ 1 , v ˜ 2 V 2 t ˜ , x ˜ , u ˜ 1 , u ˜ 2 , v ˜ 1 , v ˜ 2 2 β 1 + x ˜ p m 1 χ u ˜ 1 u ˜ 1 + u ˜ 2 u ˜ 2 + v ˜ 1 v ˜ 1 + v ˜ 2 v ˜ 2 M 2 2 β 1 + x ˜ p m 1 .
There exists β ∈ R such that β(1 + |x̃|^p)^{m1} = M/8. Then:
Φ t ¯ 1 , x ¯ 1 , u ¯ 1 , u ¯ 2 , v ¯ 1 , v ¯ 2 , t ¯ 2 , x ¯ 2 , u ¯ 1 , u ¯ 2 , v ¯ 1 , v ¯ 2 M 4 .
We can obtain the following:
V 1 t ¯ 1 , x ¯ 1 , u ¯ 1 , u ¯ 2 , v ¯ 1 , v ¯ 2 V 2 t ¯ 2 , x ¯ 2 , u ¯ 1 , u ¯ 2 , v ¯ 1 , v ¯ 2 M 4 β 1 + x ¯ 1 p + 1 + x ¯ 2 p m 1
Since V1 and V2 are bounded, there exists a constant C_x such that:
β 1 + x ¯ 1 p + 1 + x ¯ 2 p m 1 C x
According to this assumption, we can obtain the following:
t ¯ 1 t ¯ 2 p + x ¯ 1 x ¯ 2 p p ε V 1 t ¯ 1 , x ¯ 1 , u ¯ 1 , u ¯ 2 , v ¯ 1 , v ¯ 2 V 1 t ¯ 2 , x ¯ 2 , u ¯ 1 , u ¯ 2 , v ¯ 1 , v ¯ 2 + V 2 t ¯ 1 , x ¯ 1 , u ¯ 1 , u ¯ 2 , v ¯ 1 , v ¯ 2 V 2 t ¯ 2 , x ¯ 2 , u ¯ 1 , u ¯ 2 , v ¯ 1 , v ¯ 2
We assume the existence of a modulus of continuity ω:
t ¯ 1 t ¯ 2 p + x ¯ 1 x ¯ 2 p p ε 2 ω t ¯ 1 t ¯ 2 + x ¯ 1 x ¯ 2 + u ¯ 1 u ¯ 1 + u ¯ 2 u ¯ 2 + v ¯ 1 v ¯ 1 + v ¯ 2 v ¯ 2
Since the state is bounded and the control inputs are strictly bounded, we have |t̄1 − t̄2| ≤ Cε and |x̄1 − x̄2| ≤ Cε, and the sum of the four control-difference terms is bounded by 4Cε. We can therefore obtain the following:
t ¯ 1 t ¯ 2 p + x ¯ 1 x ¯ 2 p p ε 2 ω 6 p ε ,
χ u 1 u 1 + u 2 u 2 + v 1 v 1 + v 2 v 2 4 χ C ε .
Next, we define:
ϕ 1 = V 2 t 2 , x 2 , u 1 , u 2 , v 1 , v 2 + t 1 t 2 p + x 1 x 2 p p ε + β 1 + x 1 p + 1 + x 2 p m 1 + χ u 1 u 1 + u 2 u 2 + v 1 v 1 + v 2 v 2 .
ϕ 2 = V 1 t 1 , x 1 , u 1 , u 2 , v 1 , v 2 t 1 t 2 p + x 1 x 2 p p ε β 1 + x 1 p + 1 + x 2 p m 1 χ u 1 u 1 + u 2 u 2 + v 1 v 1 + v 2 v 2 .
Incorporating Equations (62)–(64) into Equations (65) and (66), the following can be obtained:
s ϕ 1 + inf u i U B T x ϕ 1 + 1 2 σ s T σ s T x x 2 ϕ 1 + C e V 1 + L = | t 1 t 2 | p 1 ε + C e V 1 ( t 1 ¯ , x 1 ¯ ) + B T | x 1 x 2 | p 1 ε + β m 1 1 + | x 1 | p + 1 + | x 2 | p m 1 1 p | x 1 | p 1 2 1 + | x 1 | p + 1 2 σ s T σ s T | x 1 x 2 | p 2 ( p 1 ) ε + β m 1 ( m 1 1 ) 1 + | x 1 | p + 1 + | x 2 | p m 1 2 p | x 1 | p 1 2 1 + | x 1 | p + β m 1 1 + | x 1 | p + 1 + | x 2 | p m 1 1 p ( p 1 ) | x 1 | p 2 2 1 + | x 1 | p p | x 1 | p 1 2 1 + | x 1 | p 2 + L 0 .
We could obtain the following:
s ϕ 2 + sup v i ν B x ϕ 2 + 1 2 σ s T σ s x x 2 ϕ + C e V 2 + L = | t 1 t 2 | p 1 ε + C e V 2 ( t 2 ¯ , x 2 ¯ ) + B T | x 1 x 2 | p 1 ε + β m 1 1 + | x 1 | p + 1 + | x 2 | p m 1 1 p | x 2 | p 1 2 1 + | x 2 | p + 1 2 σ s T σ s T | x 1 x 2 | p 2 ( p 1 ) ε + β m 1 ( m 1 1 ) 1 + | x 1 | p + 1 + | x 2 | p m 1 2 p | x 2 | p 1 2 1 + | x 2 | p + β m 1 1 + | x 1 | p + 1 + | x 2 | p m 1 1 p ( p 1 ) | x 2 | p 2 2 1 + | x 2 | p p | x 2 | p 1 2 1 + | x 2 | p 2 + L 0 .
Subtracting the above two Equations (67) and (68) yields the following:
C e V 1 ( t 1 ¯ , x 1 ¯ ) V 2 ( t 2 ¯ , x 2 ¯ ) + B T x 1 x 2 p 1 ε + β m 1 1 + x 1 p + 1 + x 2 p m 1 1 p x 1 p 1 2 1 + x 1 p + 1 2 ( σ s T σ s ) T x 1 x 2 p 2 ( p 1 ) ε + β m 1 ( m 1 1 ) 1 + x 1 p + 1 + x 2 p m 1 2 p x 1 p 1 2 1 + x 1 p + β m 1 1 + x 1 p + 1 + x 2 p m 1 1 p ( p 1 ) x 1 p 2 2 1 + x 1 p p x 1 p 1 p x 1 p 1 2 1 + x 1 p ( 2 1 + x 1 p ) 2 B T x 1 x 2 p 1 ε + β m 1 1 + x 1 p + 1 + x 2 p m 1 1 p x 2 p 1 2 1 + x 2 p 1 2 ( σ s T σ s ) T x 1 x 2 p 2 ( p 1 ) ε + β m 1 ( m 1 1 ) 1 + x 1 p + 1 + x 2 p 1 2 p x 2 p 1 2 1 + x 2 p + β m 1 1 + x 1 p + 1 + x 2 p m 1 1 p ( p 1 ) x 2 p 2 2 1 + x 2 p p x 2 p 1 p x 2 p 1 2 1 + x 2 p ( 2 1 + x 2 p ) 2 0 .
For the second term of Equation (69):
B T x 1 x 2 p 1 ε + β m 1 1 + x 1 p + 1 + x 2 p m 1 1 p x 2 p 1 2 1 + x 2 p B T x 1 x 2 p 1 ε β m 1 1 + x 1 p + 1 + x 2 p m 1 1 p x 1 p 1 2 1 + x 1 p
= B T β m 1 1 + x 1 p + 1 + x 2 p m 1 1 p x 2 p 1 2 1 + x 2 p p x 1 p 1 2 1 + x 1 p
Then:
= B T β m 1 1 + x 1 p + 1 + x 2 p m 1 1 p x 2 p 1 2 1 + x 2 p p x 1 p 1 2 1 + x 1 p = B T β m 1 1 + x 1 p + 1 + x 2 p m 1 1 C p x 2 x 1
For the third term of Equations (69) and (70):
1 2 σ s T σ s T x 1 x 2 p 2 ( p 1 ) ε + β m 1 ( m 1 1 ) 1 + x 1 p + 1 + x 2 p m 1 2 × p x 2 p 1 2 1 + x 2 p + β m 1 1 + x 1 p + 1 + x 2 p m 1 1 × p ( p 1 ) x 2 p 2 2 1 + x 2 p p x 2 p 1 p x 2 p 1 2 1 + x 2 p 2 1 + x 2 p 2 1 2 σ s T σ s T x 1 x 2 p 2 ( p 1 ) ε + β m 1 ( m 1 1 ) 1 + x 1 p + 1 + x 2 p m 1 2 × p x 1 p 1 2 1 + x 1 p + β m 1 1 + x 1 p + 1 + x 2 p m 1 1 × p ( p 1 ) x 1 p 2 2 1 + x 1 p p x 1 p 1 p x 1 p 1 2 1 + x 1 p 2 1 + x 1 p 2
There exist constants C_p and C_p2 such that:
1 2 σ s T σ s T β m 1 ( m 1 1 ) 1 + x 1 p + 1 + x 2 p m 1 2 × C p x 2 x 1 + β m 1 1 + x 1 p + 1 + x 2 p m 1 1 × C p 2 x 2 x 1
Therefore, we can obtain the following:
C e V 1 t ¯ 1 , x ¯ 1 V 2 t ¯ 2 , x ¯ 2 B T β m 1 + x 1 p + 1 + x 2 p m 1 1 × C p x 2 x 1 + 1 2 σ s T σ s T β m 1 ( m 1 1 ) 1 + x 1 p + 1 + x 2 p m 1 2 × C p x 2 x 1 + β m 1 1 + x 1 p + 1 + x 2 p m 1 1 × C p 2 x 2 x 1
C e M 4 B T β m 1 1 + x 1 p + 1 + x 2 p m 1 1 × C p x 2 x 1 + 1 2 σ s T σ s T β m 1 ( m 1 1 ) 1 + x 1 p + 1 + x 2 p m 1 2 × C p x 2 x 1 + β m 1 1 + x 1 p + 1 + x 2 p m 1 1 × C p 2 x 2 x 1
Let β → 0, and assume that C_e > 0. Under this assumption, we obtain M ≤ 0, which contradicts the initial assumption that M > 0. Therefore, we conclude that M = sup_{[0,T]} {V1 − V2} ≤ 0, which implies V1 ≤ V2. Similarly, if V1 is a viscosity supersolution and V2 is a viscosity subsolution, we can deduce that V1 ≥ V2. Thus, it can be concluded that V1 = V2. This shows that when the control input is strictly bounded, the value function is the unique viscosity solution of the Hamilton–Jacobi–Bellman (HJB) equation. □
Viscosity solutions may lack first- and second-order differentiability. Given that the problem involves stochastic elements, we introduce functions Ψ1 and Ψ2 to approximate V1 and V2, respectively, which allows us to represent the Hamilton–Jacobi–Bellman (HJB) equation. By using Ψ1 − Ψ2 to approximate V1 − V2, we avoid dealing with the non-smooth points of V1 and V2 directly. Specifically, Formulas (65) and (66) allow us to remove the non-differentiable points of V1 and V2 through the introduction of boundedness conditions. Our ultimate goal is to prove that the upper bound of V1 − V2 is at most zero, thereby confirming that V1 ≤ V2. A similar proof shows V1 ≥ V2.
We now explain the derivation of the HJB equation in detail. To demonstrate the existence of a unique solution, two aspects must be addressed. Existence of a solution: established by proving that the value function is Lipschitz continuous. Uniqueness of the solution: proven by contradiction; by showing that the upper and lower solutions converge to the same value, we confirm uniqueness. To support these points, we use Lemmas 1–3: Lemma 1 establishes Lipschitz continuity in time, Lemma 2 establishes Lipschitz continuity in the state, and Lemma 3 confirms that, under bounded control inputs, the value function is Lipschitz continuous in the controls. The derivation process for the HJB equation is shown in Figure 4.
Next, we proceed to solve the HJB equation. Let Ψ(t, x) be the value function that solves the HJB equation. The solution takes the following form:
Ψ s , x = x T S 1 x + S 2 T x + x T S 3 + S 4
where S 1 R m × m with S 1 ( T ) = 0 , S 2 R m × 1 with S 2 ( T ) = 0 , S 3 R m × 1 with S 3 ( T ) = 0 , and S 4 R 1 × 1 with S 4 ( T ) = 0 . Using Equation (75), we obtain the following:
f ( s , x ) = x T S 1 x + ( S 2 ) T x + x T S 3 + S 4 + e t s c ( r ) d r x T C x + u T D u v T Δ v + x T α x + x T p B + B T p x + β T σ x + x T σ β + tr ( p σ T σ ) + C e x T S 1 x + S 2 T x + x T S 3 + S 4 + B T S 1 x + S 1 T x + S 2 + S 3 + i , j ( σ s σ s T ) i , j S 1 , i , j
where Ṡ1 = dS1/ds, Ṡ2 = dS2/ds, Ṡ3 = dS3/ds, and Ṡ4 = dS4/ds denote the time derivatives. According to the first-order optimality condition, the optimal feedback strategies can be expressed as follows:
u 1 * = D 1 1 e t s c r d r 2 b u 1 T S 1 x + S 1 T x + S 2 + S 3 + b u 1 T p x , u 2 * = D 2 1 e t s c r d r 2 b u 2 T S 1 x + S 1 T x + S 2 + S 3 + b u 2 T p x , v 1 * = Δ 1 1 e t s c r d r 2 b v 1 T S 1 x + S 1 T x + S 2 + S 3 b v 1 T p x , v 2 * = Δ 2 1 e t s c r d r 2 b v 2 T S 1 x + S 1 T x + S 2 + S 3 b v 2 T p x .
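The optimal strategies above are affine state-feedback laws: each control is obtained by inverting its own weighting matrix and projecting the value-function and costate terms through the corresponding input map. A minimal sketch of the gain extraction follows; the names D, b_u, S1, S2, S3, p mirror the paper's symbols, the discount factor e^{∫c dr} is abbreviated as the scalar disc, and the exact grouping of terms is an assumption.

```python
import numpy as np

def feedback_gain(D, b_u, S1, S2, S3, p, disc=1.0):
    # Hypothetical sketch of the first-order optimality condition (77):
    # u* = -D^{-1} [ (disc/2) b_u^T ((S1 + S1^T) x + S2 + S3) + b_u^T p x ],
    # rewritten as the affine feedback u* = L x + c.
    Dinv = np.linalg.inv(D)
    L = -Dinv @ (0.5 * disc * b_u.T @ (S1 + S1.T) + b_u.T @ p)  # state-feedback gain
    c = -0.5 * disc * Dinv @ (b_u.T @ (S2 + S3))                # affine offset
    return L, c
```

With the gains in hand, each player's control at state x is simply L @ x + c, which is what makes the strategies closed-loop.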
Next, by incorporating the optimal strategies from Equation (77) into the HJB Equation (24), we can compare the quadratic, linear, and constant terms. This leads to the following coupled Riccati equations:
S 1 + e t s c ( r ) d r ( α + C ) + C e S 1 + e t s c ( r ) d r 4 S 1 T b u 1 D 1 b u 1 T S 1 + e t s c ( r ) d r 4 S 1 T b u 1 D 1 S 1 T + 1 2 S 1 T b u 1 D 1 b u 1 T p + e t s c ( r ) d r 4 S 1 D 1 b u 1 T S 1 + e t s c ( r ) d r 4 S 1 D 1 S 1 T + 1 2 S 1 D 1 b u 1 T p + 1 2 p T b u 1 D 1 b u 1 T S 1 + 1 2 p T b u 1 D 1 S 1 T + e t s c ( r ) d r p T b u 1 D 1 b u 1 T p + e t s c ( r ) d r 4 S 1 T b u 2 D 2 b u 2 T S 1 + e t s c ( r ) d r 4 S 1 T b u 2 D 2 S 1 T + 1 2 S 1 T b u 2 D 2 b u 2 T p + e t s c ( r ) d r 4 S 1 D 2 b u 2 T S 1 + e t s c ( r ) d r 4 S 1 D 2 S 1 T + 1 2 S 1 D 2 b u 2 T p + 1 2 p T b u 2 D 2 b u 2 T S 1 + 1 2 p T b u 2 D 2 S 1 T + e t s c ( r ) d r p T b u 2 D 2 b u 2 T p e t s c ( r ) d r 4 S 1 T b v 1 Δ 1 b v 1 T S 1 e t s c ( r ) d r 4 S 1 T b v 1 Δ 1 S 1 T + 1 2 S 1 T b v 1 D 1 b v 1 T p e t s c ( r ) d r 4 S 1 Δ 1 b v 1 T S 1 e t s c ( r ) d r 4 S 1 Δ 1 S 1 T + 1 2 S 1 Δ 1 b v 1 T p + 1 2 p T b v 1 Δ 1 b v 1 T S 1 + 1 2 p T b v 1 Δ 1 S 1 T e t s c ( r ) d r x T p T b v 1 Δ 1 b v 1 T p x e t s c ( r ) d r 4 S 1 T b v 2 Δ 2 b v 2 T S 1 e t s c ( r ) d r 4 S 1 T b v 2 Δ 2 S 1 T + 1 2 S 1 T b v 2 D 2 b v 2 T p e t s c ( r ) d r 4 S 1 Δ 2 b v 2 T S 1 e t s c ( r ) d r 4 S 1 Δ 2 S 1 T + 1 2 S 1 Δ 2 b v 2 T p + 1 2 p T b v 2 Δ 1 b v 2 T S 1 + 1 2 p T b v 2 Δ 2 S 1 T e t s c ( r ) d r x T p T b v 2 Δ 2 b v 2 T p x = 0 .
where S1(T) = 0.
S 2 T + e t s c ( r ) d r 4 S 2 + S 3 T D 1 b u 1 T S 1 + e t s c ( r ) d r 4 S 2 + S 3 T D 1 b u 1 T S 1 T + 1 2 S 2 + S 3 T D 1 b u 1 T p + e t s c ( r ) d r 4 S 2 + S 3 T D 2 b u 2 T S 1 + e t s c ( r ) d r 4 S 2 + S 3 T D 2 b u 2 T S 1 T + 1 2 S 2 + S 3 T D 2 b u 2 T p + e t s c ( r ) d r 4 S 2 + S 3 T Δ 1 b v 1 T S 1 + e t s c ( r ) d r 4 S 2 + S 3 T Δ 1 b v 1 T S 1 T + 1 2 S 2 + S 3 T Δ 1 b v 1 T p + e t s c ( r ) d r 4 S 2 + S 3 T Δ 2 b v 2 T S 1 + e t s c ( r ) d r 4 S 2 + S 3 T Δ 2 b v 2 T S 1 T + 1 2 S 2 + S 3 T Δ 2 b v 2 T p + e t s c ( r ) d r B T p + e t s c ( r ) d r β T σ + C e S 2 T + B T S 1 + S 1 T = 0 .
where S2(T) = 0.
S 3 + e t s c ( r ) d r 4 S 1 T b u 1 D 1 ( S 2 + S 3 ) + e t s c ( r ) d r 4 S 1 b u 1 D 1 ( S 2 + S 3 ) + 1 2 p T b u 1 D 1 ( S 2 + S 3 ) + e t s c ( r ) d r 4 S 1 T b u 2 D 2 ( S 2 + S 3 ) + e t s c ( r ) d r 4 S 1 b u 2 D 2 ( S 2 + S 3 ) + 1 2 p T b u 2 D 2 ( S 2 + S 3 ) + e t s c ( r ) d r 4 S 1 T b v 1 Δ 1 ( S 2 + S 3 ) + e t s c ( r ) d r 4 S 1 b v 1 Δ 1 ( S 2 + S 3 ) + 1 2 p T b v 1 Δ 1 ( S 2 + S 3 ) + e t s c ( r ) d r 4 S 1 T b v 2 Δ 2 ( S 2 + S 3 ) + e t s c ( r ) d r 4 S 1 b v 2 Δ 2 ( S 2 + S 3 ) + 1 2 p T b v 2 Δ 2 ( S 2 + S 3 ) + e t s c ( r ) d r B T p + e t s c ( r ) d r β T σ + C e S 3 + B T S 3 .
where S3(T) = 0.
S 4 + e t s c ( r ) d r 4 ( S 2 + S 3 ) T b u 1 D 1 b u 1 T ( S 2 + S 3 ) + e t s c ( r ) d r 4 ( S 2 + S 3 ) T b u 2 D 2 b u 2 T ( S 2 + S 3 ) + e t s c ( r ) d r 4 ( S 2 + S 3 ) T b v 1 Δ 1 b v 1 T ( S 2 + S 3 ) + e t s c ( r ) d r 4 ( S 2 + S 3 ) T b v 2 Δ 2 b v 2 T ( S 2 + S 3 ) + e t s c ( r ) d r t r ( p σ σ T ) + C e S 4 + i , j ( σ s σ s T ) i , j S 1 , i , j = 0 .
where S4(T) = 0. Equations (8), (20) and (78)–(81) constitute the forward–backward stochastic differential equations (FBSDEs). Next, we will demonstrate that the state is globally stable for any initial state x0.
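The coupled Riccati equations above are solved backward in time from the terminal conditions S_i(T) = 0. The sketch below integrates an illustrative zero-sum game Riccati equation of the standard form backward with the classical Runge–Kutta scheme; it is a simplified stand-in for the full coupled system (78)–(81), and all matrix names and the reduced form of the right-hand side are assumptions.

```python
import numpy as np

def solve_game_riccati(A, Bu, Bv, Q, D, Delta, T, steps=2000):
    # Backward RK4 integration of the standard game Riccati equation
    #   dS/ds = -(Q + A^T S + S A - S (Bu D^{-1} Bu^T - Bv Delta^{-1} Bv^T) S),
    # with terminal condition S(T) = 0; returns S(0).
    m = A.shape[0]
    M = Bu @ np.linalg.inv(D) @ Bu.T - Bv @ np.linalg.inv(Delta) @ Bv.T
    def rhs(S):
        return -(Q + A.T @ S + S @ A - S @ M @ S)
    S = np.zeros((m, m))        # terminal condition S(T) = 0
    h = -T / steps              # negative step: integrate from T back to 0
    for _ in range(steps):
        k1 = rhs(S)
        k2 = rhs(S + 0.5 * h * k1)
        k3 = rhs(S + 0.5 * h * k2)
        k4 = rhs(S + h * k3)
        S = S + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return S
```

In the scalar case with A = 0 and unit weights, this reduces to dS/ds = −(1 − S²), whose backward solution from S(T) = 0 is S(t) = tanh(T − t), a convenient sanity check.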

4. The Barrier Surface in the Stochastic Differential Game

Theorem 2. 
The solution x(s), t ≤ s ≤ T, of the system (8) under the optimal strategies (77) of the pursuers and evaders is globally stable in the sense that, for any initial state x0, the following holds:
lim sup T 1 T E t T x s 2 d s <
Proof. 
It can be assumed that b(s, x(s)) ≈ Φ(s)x(s) + γ([t])(v(t) − v([t])), where Φ(s) is uniformly stable and converges to a random matrix Φ, with Φ^T(s)K(s) + K(s)Φ(s) = −I. To facilitate estimation, we employ reduced excitation or exploration signals. We use a random sequence with lim_{k→∞} γ_k = 0, min γ_k ≥ k^{−1/5}, and γ_0 = 0. An independent standard Wiener process sequence is selected to counteract the randomness of the states. It satisfies the following:
lim sup N 1 N k = 1 N k k + 1 γ k 2 v t v k 2 d t = 0 .
System (8) can now be rewritten as follows:
d x ( s ) Φ ( s ) x ( s ) + γ [ t ] ( v ( t ) v ( [ t ] ) ) + b u 1 ( x ( s ) ) u 1 ( s ) + b u 2 ( x ( s ) ) u 2 ( s ) + b v 1 ( x ( s ) ) v 1 ( s ) + b v 2 ( x ( s ) ) v 2 ( s ) d s + σ ( s , x ( s ) , u 1 ( s ) , u 2 ( s ) , v 1 ( s ) , v 2 ( s ) ) d W ( s )
Let Ω = [b_u1, b_u2, b_v1, b_v2]^T and Σ = [u1, u2, v1, v2]^T. The following can be obtained:
d x s Φ s x s + γ t v t v t + Ω T Σ d s + σ d W s
According to the optimal closed-loop feedback strategies (77), we can obtain the following:
u 1 = L u 1 x + c u 1 , u 2 = L u 2 x + c u 2 , v 1 = L v 1 x + c v 1 , v 2 = L v 2 x + c v 2 , Σ = L u v x + c u v .
By applying Itô's lemma, the following can be obtained:
d K ( s ) x ( s ) , x ( s ) = 2 K ( s ) x ( s ) , Φ ( s ) x ( s ) + Ω T ( L u v x + c u v ) + γ [ t ] ( v ( t ) v ( [ t ] ) ) d t + tr ( K ( s ) σ σ T ) d t + 2 K ( s ) x ( s ) , σ d W ( s ) + 1 m u v | x ( s ) | 2 d t = 2 K ( s ) x ( s ) , Ω T c u v + γ [ t ] ( v ( t ) v ( [ t ] ) ) d t + tr ( K ( s ) σ σ T ) d t + 2 K ( s ) x ( s ) , σ d W ( s )
where t T K s x s , σ d W s = O t T x s 2 d t 1 2 + ε , ε 0 , 1 / 2 . By integrating Equation (86) and applying the Cauchy–Schwarz inequality, we obtain the following:
i = t T K ( i 1 ) x ( i ) , x ( i ) K ( i 1 ) x ( i 1 ) , x ( i 1 ) + K ( [ T ] ) x ( T ) , x ( T ) K ( [ T ] ) x ( [ T ] ) , x ( [ T ] ) + K ( [ t ] ) x ( t ) , x ( t ) K ( [ t ] ) x ( [ t ] ) , x ( [ t ] ) + t T 1 m u v | x ( t ) | 2 d t = t T 2 K ( s ) x ( s ) , Ω T c u v + γ [ t ] v ( t ) v ( [ t ] ) d t + t T tr K ( s ) σ σ T d t + t T 2 K ( s ) x ( s ) , σ d W ( s ) = O t T | x ( s ) | 2 d t 1 2 + O t T | x ( s ) | 2 d t t T γ [ t ] 2 v ( t ) v ( [ t ] ) 2 d t 1 2 + O ( T ) + O t T | x ( s ) | 2 d t 1 2 + ε
Then, according to Equation (87), we integrate the system Equation (83) over the interval [k, k + 1]:
x k + 1 e Φ k x k + k k + 1 e k + 1 s Φ k γ t v s v k + Ω T c u v d s + k k + 1 e k + 1 s Φ k σ d W s
According to the Cauchy–Schwarz inequality, the following can be concluded:
x k + 1 2 m x k 2 + m 2 k k + 1 Ω T c u v d s 2 + m 3 k k + 1 γ t v s v k 2 d s + m 1 k k + 1 e k + 1 s Φ k σ d W s 2
where 0 < m < 1 and m1, m2, m3 > 0 are fixed constants determined by the maximum of e^{Φ(k)}, k ∈ N.
k = 1 N x k 2 m k = 0 N 1 x k 2 + m 2 0 N 1 Ω T c u v 2 d s + m 3 i = 0 N 1 k k + 1 γ k 2 v s v k 2 d s + m 1 k = 0 N 1 k k + 1 e k + 1 s Φ k σ d W s 2 .
The following could be obtained:
lim N 1 m N k = 1 N x k 2 m N x 0 2 + m 2 N 0 N 1 Ω T c u v 2 d s + m 3 N i = 0 N 1 k k + 1 γ k 2 v s v k 2 d s + m 1 N k = 0 N 1 k k + 1 e k + 1 s Φ k σ d W s 2 .
Since the second term is bounded and the expectations of the third and fourth terms are zero, we obtain the following:
lim sup T E 1 T t T x s 2 d s < .
This shows that the time-averaged mean-square norm of the state remains bounded as time approaches infinity. Hence, the proof is completed. □
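The time-averaged mean-square bound of Theorem 2 can be checked numerically on a toy stable system. The Euler–Maruyama sketch below estimates (1/T)∫E|x(s)|² ds for the scalar SDE dx = −a x ds + σ dW; the dynamics and all parameter values are illustrative assumptions, not the paper's closed-loop system, whose stationary second moment is σ²/(2a).

```python
import numpy as np

def time_avg_second_moment(a=1.0, sigma=0.5, T=20.0, dt=1e-3, paths=2000, seed=0):
    # Euler-Maruyama estimate of (1/T) int_0^T E|x(s)|^2 ds for
    # dx = -a x ds + sigma dW, x(0) = 1, averaged over Monte Carlo paths.
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    x = np.ones(paths)
    acc = 0.0
    for _ in range(n):
        acc += np.mean(x ** 2) * dt                     # accumulate the time integral
        x += -a * x * dt + sigma * np.sqrt(dt) * rng.standard_normal(paths)
    return acc / T
```

For a = 1 and σ = 0.5 the estimate settles near the stationary value 0.125 plus a small transient contribution from the initial condition, confirming that the time average stays finite.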
Let x^i(s): (Ω_i, F_i, P_i) → (Ω, F), s ∈ [t, T], i = 1, 2, be two stochastic processes. The processes x^1(s) and x^2(s) are said to have the same finite-dimensional distribution on [t, T] if there exists a set D of full measure on [t, T] such that, for all t ≤ t1 ≤ t2 ≤ ⋯ ≤ tn ≤ T with ti ∈ D and A ∈ F^⊗n (with ⊗ denoting the tensor product), we have the following:
P 1 W 1 , x 1 t 1 , x 1 t 2 , , x 1 t n A = P 2 W 2 , x 2 t 1 , x 2 t 2 , , x 2 t n A .
This can be written as L P 1 x 1 = L P 2 x 2 , where L P i denotes the law of the process x i under probability measure P i . Now, let μ 1 = Ω 1 , F 1 , F s 1 , t , P 1 , W 1 and μ 2 = Ω 2 , F 2 , F s 2 , t , P 2 , W 2 be two filtered probability spaces.
Theorem 3. 
Let η^i ∈ L^2(Ω_i, F_s^{i,t}, P_i), and let x^i(s) be the unique solution of the state equation under the control inputs u_1^i, u_2^i ∈ U, v_1^i, v_2^i ∈ ν with initial state x^i(t) = η^i. If the following equality holds:
L P 1 u 1 1 , u 2 1 , v 1 1 , v 2 1 , W 1 , η 1 = L P 2 u 1 2 , u 2 2 , v 1 2 , v 2 2 , W 2 , η 2
then we have:
L P 1 x 1 , u 1 1 , u 2 1 , v 1 1 , v 2 1 = L P 2 x 2 , u 1 2 , u 2 2 , v 1 2 , v 2 2
Proof. 
The solutions x^i(s) are obtained as the limits of iterates of the maps K^i. That is, we have the following:
K i z i s = η t + t s B r , z i r , u 1 i , u 2 i , v 1 i , v 2 i d r + t s σ r , z i r , u 1 i , u 2 i , v 1 i , v 2 i d W
where z 1 i s = η i , z k + 1 i s = K i z k i s , x i s = lim k z k + 1 i s .
z k 1 s z k 2 s = t s B r , z k 1 1 r , u 1 1 , u 2 1 , v 1 1 , v 2 1 B r , z k 1 2 r , , u 1 2 , u 2 2 , v 1 2 , v 2 2 d r + t s σ r , z k 1 1 r , u 1 1 , u 2 1 , v 1 1 , v 2 1 σ r , z k 1 2 r , u 1 2 , u 2 2 , v 1 2 , v 2 2 d W
Hence, we apply the Burkholder–Davis–Gundy inequality to the stochastic-integral term, writing C_z for the constant in the inequality:
E z k 1 s z k 2 s 2 2 E t s B r , z k 1 1 r , u 1 1 , u 2 1 , v 1 1 , v 2 1 B r , z k 1 1 r , , u 1 1 , u 2 1 , v 1 1 , v 2 1 d r 2 + t s σ r , z k 1 2 r , u 1 2 , u 2 2 , v 1 2 , v 2 2 σ r , z k 1 2 r , u 1 2 , u 2 2 , v 1 2 , v 2 2 d W 2 2 C z E t s σ r , z k 1 1 r , u 1 1 , u 2 1 , v 1 1 , v 2 1 σ r , z k 1 1 r , u 1 1 , u 2 1 , v 1 1 , v 2 1 2 d W + T E t s B r , z k 1 2 r , u 1 2 , u 2 2 , v 1 2 , v 2 2 B r , z k 1 2 r , , u 1 2 , u 2 2 , v 1 2 , v 2 2 2 d r 2 C z + T C k 2 E t s z k 1 1 z k 1 2 2 d r C v E t s z k 1 1 z k 1 2 2 d r
where C v = 2 C z + T C k 2 . According to this definition, we could have the following:
L P 1 z k 1 , w 1 , u 1 1 , u 2 1 , v 1 1 , v 2 1 = L P 2 z k 2 , w 2 , u 1 2 , u 2 2 , v 1 2 , v 2 2 .
This implies the following:
E x 1 s x 2 s 2 = lim k E z k 1 s z k 2 s 2 lim k C v 1 E t s z k 1 1 z k 1 2 2 d r . . . lim k C v n E t s z 1 1 z 1 2 2 d r C v E t s η t η t 2 d r 0 .
Therefore, passing to the limit as k gives the following result: E x 1 s = E x 2 s . This completes the proof. □
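The iteration map K^i used in this proof is a Picard fixed-point scheme. A minimal deterministic analogue (with the diffusion term dropped, an assumption made purely for illustration) shows the iterates z_{k+1} = K z_k converging to the solution of dx/ds = b(x):

```python
import numpy as np

def picard_iterates(b, x0, T=1.0, n_grid=1000, iters=40):
    # Picard iteration z_{k+1}(s) = x0 + int_0^s b(z_k(r)) dr on a uniform grid,
    # the deterministic analogue of the contraction map K^i in Theorem 3.
    s = np.linspace(0.0, T, n_grid)
    z = np.full(n_grid, x0, dtype=float)
    for _ in range(iters):
        integrand = b(z)
        # cumulative trapezoidal integral of b(z_k) from 0 to each grid point s
        integral = np.concatenate(
            ([0.0], np.cumsum((integrand[1:] + integrand[:-1]) * 0.5 * np.diff(s)))
        )
        z = x0 + integral
    return s, z
```

For b(x) = −x and x0 = 1 the iterates converge to e^{−s}, mirroring the geometric contraction estimate E|z_k^1 − z_k^2|² ≤ C_v^k(…) in the proof.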
Theorem 4. 
According to Assumptions 1 and 2, there exists a unique solution for ( s , x , α , u 1 * , u 2 * , v 1 * , v 2 * ) [ t , T ] × R m × 1 × R m × m × R n × 1 × R n × 1 × R 1 × 1 × R 1 × 1 to the forward–backward stochastic differential equations (FBSDEs).
Proof. 
(Proof by contradiction.) Suppose that there exist two solutions, (x^1, α^1, u_1^1, u_2^1, v_1^1, v_2^1) and (x^2, α^2, u_1^2, u_2^2, v_1^2, v_2^2), and denote x̂ = x^1 − x^2, α̂ = α^1 − α^2, û_1 = u_1^1 − u_1^2, û_2 = u_2^1 − u_2^2, v̂_1 = v_1^1 − v_1^2, v̂_2 = v_2^1 − v_2^2. Then, we can obtain the following:
d x ^ s = b + b u 1 u ^ 1 s + b u 2 u ^ 2 s + b v 1 v ^ 1 s + b v 2 v ^ 2 s d s , + σ ^ d W s , d p ^ = α ^ d s + β ^ d W = A T p ^ + ζ x ^ E ( x ^ ) d s + β ^ d W , x ^ 0 = 0 , p ^ T = R T x ^ T E x ^ T .
Applying Itô's lemma to ⟨p̂, x̂⟩ and taking the expectation on both sides:
0 = E R T x ^ T E ( x ^ T ) , x ^ + E p ^ , d x ^ + E d p ^ , x ^ + E d p ^ , d x ^ = E R T x ^ T E ( x ^ T ) , x ^ + E p ^ , b + b u 1 u ^ 1 ( s ) + b u 2 u ^ 2 ( s ) + b v 1 v ^ 1 ( s ) + b v 2 v ^ 2 ( s ) d s + σ ^ d W + E A T p ^ + ζ x ^ E ( x ^ ) d s + β ^ d W , x ^ + E β T σ d s = E R T x ^ T E ( x ^ T ) , x ^ T + E ( t T ( p ^ T b + b u 1 u ^ 1 ( s ) + b u 2 u ^ 2 ( s ) + b v 1 v ^ 1 ( s ) + b v 2 v ^ 2 ( s ) + x ^ T A T p ^ + ζ x ^ E ( x ^ ) + β T σ ) d s ) E R T 1 2 x ^ T E ( x ^ T ) , R T 1 2 x ^ T E ( x ^ T ) + E t T ζ 1 2 x ^ E ( x ^ ) , ζ 1 2 x ^ E ( x ^ ) d s .
Thus, we obtain E‖x̂(T) − E[x̂(T)]‖ = 0 and E‖ζ(x̂ − E[x̂])‖ = 0, which implies that p̂ ≡ 0. Therefore, the value function is the unique viscosity solution of the HJB equation, indicating that S1, S2, S3, S4 are unique. Consequently, we have E[û_1(s)] = E[û_2(s)] = E[v̂_1(s)] = E[v̂_2(s)] = 0. This leads to E[x̂] = 0. This completes the proof. □
Based on the global stability and uniqueness of the state, we can derive a comprehensive mapping of state information relative to the initial conditions. Next, we will explore the Lebesgue measure problem associated with barrier surfaces in the context of Nash equilibrium.
Theorem 5. 
The set of state points at which the expected rate of change of the distance between pursuers and evaders is zero (the Nash-equilibrium configuration shown in Figure 5) has Lebesgue measure zero:
m * x s | E R P i s = 0 , i = 1 , 2 , 3 , . . . = 0
Proof. 
The barrier surface can be defined as the set of points where the relative dynamics between the evader and pursuer reach a critical state such that the expected relative velocity projection between the two becomes zero. Mathematically, this surface is expressed as follows:
I i = x ( s ) | E R P i s = 0 , i = 1 , 2 , 3 , ,
where
R P i = v E cos θ E cos φ E v P i cos θ P i cos φ P i ,
with I i = ( θ E , φ E , θ P i , φ P i ) representing the points on the barrier surface such that
v E cos θ E cos φ E = v P i cos θ P i cos φ P i .
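The barrier condition above can be evaluated directly. The sketch below computes the projected relative speed Ṙ_Pi from the speeds and angles of the evader and one pursuer; on the barrier surface this quantity vanishes, while Ṙ < 0 indicates the pursuer is closing and Ṙ > 0 indicates the evader is opening the range.

```python
import numpy as np

def closing_rate(vE, thE, phE, vP, thP, phP):
    # Projected relative speed R_dot = vE cos(thE) cos(phE) - vP cos(thP) cos(phP).
    # Zero exactly on the barrier surface; negative when the range shrinks.
    return vE * np.cos(thE) * np.cos(phE) - vP * np.cos(thP) * np.cos(phP)
```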
The barrier surface describes the critical configurations where the pursuer’s control actions cannot effectively reduce the distance to the evader. This implies that the states on the barrier surface act as a dynamic boundary between regions of successful pursuit and regions where the evader can potentially avoid capture. The barrier surface can be further understood as a countable set of points:
I n = I 1 , I 2 , , I N , ,
with a small neighborhood around each point:
Q ξ = Q 1 , Q 2 , Q 3 , , Q N , , Q i = B I i , ε N I i ,
where I i I n . According to the nested open interval theorem, there exists ε > 0 such that the neighborhoods Q ξ form a cover for the points on the barrier surface. The measure of these points on the barrier surface can be expressed as follows:
m * I n = i = 1 m * I i = inf m * Q ξ = inf m * i = 1 Q i
Thus:
inf m * i = 1 Q i inf i = 1 N m Q i 4 3 π ε 3 N 3 N ( 1 + N ) 2 2 3 π ε 3 N
where m * Q i = 4 3 π ε N 3 represents the volume of the ball. This implies the following:
m * I n 2 3 π ε 3 N .
Taking the limit as ε 0 , we obtain the following:
lim ε 0 m * I n lim ε 0 2 3 π ε 3 N 0
where N is bounded. Thus, m*(I_n) = 0, completing the proof. □
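The covering estimate in the proof can be made concrete: N balls of radius ε/N have total volume proportional to ε³, which vanishes as ε → 0. A minimal numeric sketch follows; the constant here is the raw ball-volume sum and differs from the paper's displayed bound.

```python
import math

def covering_bound(eps, N):
    # Total volume of N covering balls of radius eps/N:
    # N * (4/3) * pi * (eps/N)^3 = (4/3) * pi * eps^3 / N^2, -> 0 as eps -> 0.
    return N * (4.0 / 3.0) * math.pi * (eps / N) ** 3
```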
Note: In the context of stochastic differential games, the barrier surface has Lebesgue measure zero and divides the state space into two distinct parts: the pursuit region and the terminal region. Within the pursuit region, the state can evolve toward one of two outcomes, and there is no equilibrium state between the pursuer and the evader. When discussing the equilibrium problem, it is important to clarify that equilibrium is not the desired outcome. We are primarily concerned with two results: capture and escape. Once an equilibrium state is reached, it indicates that the current strategies are ineffective: neither allowing the pursuer to capture the evader nor enabling the evader to escape from the pursuit region. Even if a termination time is established, the final state cannot be regarded as the definitive outcome of the game. Given any initial conditions, the state trajectory over a certain time frame is unique. Notably, when the state transitions through the barrier surface from the pursuit region to the terminal region, the time required for this transition is effectively zero.
In practical pursuit scenarios, it is often assumed that both sides maintain a balanced state when evenly matched. However, in reality, this balance is not static but fluctuates around the equilibrium. Using a missile interception model as our simulation framework, we illustrate that the condition R ˙ = 0 , representing a constant relative distance, does not persist in real-world scenarios. Instead, interception success corresponds to R ˙ < 0 , while R ˙ > 0 indicates that the evader escapes, resulting in the two participants diverging. This highlights an important insight: the condition R ˙ = 0 is a transient state, consisting of a set of discrete points forming a surface with a Lebesgue measure of zero. In practical terms, this means that the system’s state crosses the barrier surface in zero time. For missile interception problems, this translates to a binary outcome—either interception is successful, or it fails entirely.

5. The Multiple Pursuers and Evaders in a Stochastic Differential Game

When multiple pursuers chase multiple evaders, the pursuers should first be grouped, and each group should then be incorporated into the state equation. As shown in Figure 6, the pursuit process is divided into the following five steps:
  • Step 1: Dividing the n pursuers into groups, with each group corresponding to a respective number of evaders denoted by y.
  • Step 2: Substituting the information of the pursuers into the system Equation (8), where v 1 ( s ) R y × 1 and v 2 ( s ) R y × 1 . The status includes the following: Status : { R P 1 , R P 2 , , θ L 1 , φ L 1 , θ L 2 , φ L 2 , , θ P 1 , φ P 1 , θ P 2 , φ P 2 , , θ E 1 , φ E 1 , θ E 2 , φ E 2 , } .
  • Step 3: Calculating the parameters S1, S2, S3, S4 of the value function based on the HJB equation.
  • Step 4: Solving for the optimal closed-loop feedback strategies u 1 * , u 2 * , v 1 * , v 2 * based on FBSDEs.
  • Step 5: Implementing the optimal strategies into the motion equations for the pursuers to capture the evaders.
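Step 1 above requires a grouping rule, which the five-step procedure leaves open. A minimal sketch of one plausible rule, nearest-evader assignment, follows; this particular rule is a hypothetical choice for illustration and is not specified in the paper.

```python
import numpy as np

def group_pursuers(pursuer_pos, evader_pos):
    # Assign each pursuer to its nearest evader, producing one group per evader
    # (a simple stand-in for Step 1 of the multi-pursuer procedure).
    groups = {j: [] for j in range(len(evader_pos))}
    for i, p in enumerate(pursuer_pos):
        d = np.linalg.norm(np.asarray(evader_pos) - np.asarray(p), axis=1)
        groups[int(np.argmin(d))].append(i)
    return groups
```

Each resulting group is then substituted into the state equation (8) in Step 2, with one set of evader controls per group.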

6. Numerical Analysis

Example 1. 
The initial conditions for this simulation are as follows:
  • Initial positions: The initial position of pursuer P 1 is [0, 15,000, 5000] m, while that of P 2 is [0, 13,000, −5000] m. The initial position of the evader E is [100,000, 14,000, 0] m.
  • Initial velocities: The pursuers’ velocities are set to 3000 m/s and 3500 m/s, respectively, while the evader’s velocity is 2000 m/s.
  • Acceleration limits: The maximum normal accelerations are 30 g for the pursuers and 20 g for the evader (g = 9.81 m/s2).
  • Flight path angles: The initial flight path angles for the pursuers are set to −4° (elevation) and 4° (azimuth). The evader’s initial angles are 10° (elevation) and 170° (azimuth).
  • Time step: A time interval of 0.1 s was used for numerical integration.
During the simulation, both the pursuers and the evader employ the optimal strategy. The optimal trajectory is depicted in Figure 7. Two-dimensional projections in the X–Y and X–Z planes are shown in Figure 8 and Figure 9, respectively. These figures demonstrate the feasibility of the optimal state feedback strategy algorithm proposed in this paper. The distance between the evader and the pursuers is illustrated in Figure 10, where the final miss distance is 0.83 m . In missile interception problems, it is commonly assumed that interception occurs when the distance is less than one meter. Furthermore, in this paper, the interception termination set is defined by R ˙ > 0 or R < 1 . The evader is captured by the pursuers after 31.7 s . Generally, the terminal guidance phase of missile interception lasts between 20 s and 40 s, and the result from Experiment 1 is within this range. Acceleration variations are presented in Figure 11 and Figure 12. The state norm is shown in Figure 13. During the interception process, the state continuously converges, and we have proven this in Theorem 2. The experimental results confirm the validity of Theorem 2, with the terminal state approaching zero. The value of the cost function is shown in Figure 14. It is evident that the cost function converges to a constant value, indicating the convergence of state and input over a finite time. The value function is shown in Figure 15. The value function satisfies the termination condition of the forward–backward stochastic differential equation, with the terminal value being zero. The simulation environment is summarized in Table 2. During the simulation process, the Runge–Kutta method is utilized for each iteration.
The mean value of the state is 503,775.07. The state norm decreases gradually over time and behaves approximately linearly as the terminal time is approached. Concurrently, the cost function stabilizes at a constant value, its growth rate decreasing over time. Finally, the value function satisfies the terminal convergence criterion, with V ( T ) = 0 .
Example 2. 
In this case, compared to Example 1, the evader employs a sinusoidal maneuver while the pursuer follows the optimal feedback strategy, with all other initial conditions unchanged. After 32 s, the pursuer captures the evader with a miss distance of 0.38 m. The mean state value is 504,990.71. The three-dimensional pursuit diagram is displayed in Figure 16, with two-dimensional projections onto the X–Y and X–Z planes shown in Figure 17 and Figure 18, respectively. Because the evader's maneuver differs from that of Example 1, the optimal trajectory also differs; in both cases, however, the pursuer successfully intercepts the evader. The primary objective of Example 2 is to demonstrate that the pursuer can intercept the evader even when it adopts a different evasion maneuver. The distance between the evader and the pursuer is illustrated in Figure 19, confirming the capture. Acceleration variations are depicted in Figure 20 and Figure 21. The state norm is shown in Figure 22, the cost function in Figure 23, and the value function in Figure 24. The results confirm the feasibility of the proposed algorithm.
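A sinusoidal evasion command of the kind used in this example can be sketched as below. The period and the phase split between the two acceleration channels are our illustrative assumptions (the paper does not give them); the amplitude is kept within the evader's 20 g limit.

```python
import math

G0 = 9.81  # m/s^2

def sinusoidal_maneuver(t: float, a_max: float, period: float = 5.0):
    """Hypothetical sinusoidal evasion command: the evader's normal-
    acceleration projections oscillate within its acceleration limit."""
    a_z = a_max * math.sin(2.0 * math.pi * t / period)
    a_y = a_max * math.cos(2.0 * math.pi * t / period)
    return a_z, a_y
```

The resulting command vector has constant magnitude `a_max` but rotates in the evader's normal plane, which is what produces the weaving trajectory seen in Figure 16.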
Example 3. 
In this scenario, the evader employs a constant maneuver while the pursuer implements the optimal feedback strategy, with all other initial conditions unchanged from Example 1. The pursuer captures the evader with a final miss distance of 0.62 m. The mean state converges to 460,599.78, and the value function reaches zero at the moment of capture. The three-dimensional pursuit diagram is illustrated in Figure 25, with two-dimensional projections onto the X–Y and X–Z planes shown in Figure 26 and Figure 27, respectively. Because the evader employs a constant-value maneuver, its trajectory in three-dimensional space forms an arc. The distance between the evader and the pursuer is depicted in Figure 28, confirming the capture. The acceleration variations are presented in Figure 29 and Figure 30; in this experiment the pursuer's normal acceleration reached its maximum value, yet the pursuer still captured the evader. The state norm is shown in Figure 31, the cost function in Figure 32, and the value function in Figure 33. The results confirm the feasibility of the algorithm.
Based on Examples 1–3, we observe the following. In Example 1, both the pursuers and the evader use the optimal feedback strategy, and the pursuers capture the evader. The evolution of the state throughout the pursuit agrees with the proof of the theorem, and the cost function converges continuously; ultimately, the value function reaches its designated terminal value. In Examples 2 and 3, the evader does not employ the optimal feedback strategy; nevertheless, the pursuer captures the evader in both cases. Notably, the cost function at termination is lower than that observed in Example 1, and the value-function data at the terminal time confirm that V ( T ) = 0 .
Example 4. 
We simulated a scenario involving three pursuers and two evaders. Pursuers 1 and 2 captured Evader 1, while Pursuer 3 captured Evader 2. The final miss distances were R 2 = 0.91 m and R 3 = 0.75 m , indicating the close proximity of the pursuers to the evaders at interception. The state ultimately converged to 878,031.05, and at the moment of capture the value function converged to zero. The three-dimensional pursuit diagram is illustrated in Figure 34, with two-dimensional projections onto the X–Y and X–Z planes shown in Figure 35 and Figure 36, respectively. In this experiment, both the pursuers and the evaders employed optimal game strategies, and the pursuers successfully intercepted the evaders. Because the initial positions differ from those of the previous experiments, the optimal trajectories also differ, demonstrating the adaptability of the strategy to varying initial conditions. Figure 34 clearly shows the distinct paths taken by each pursuer and evader during the pursuit. Figure 37 depicts the distances between the evaders and the pursuers; each distance decreased progressively over time, confirming the effectiveness of the optimal feedback strategy in achieving interception. The acceleration variations are presented in Figure 38 and Figure 39. These plots reveal notable spikes during the final stage of interception, corresponding to the maximum effort exerted by the pursuers as they close the gap with the evaders. The state norm is illustrated in Figure 40, showing the progression of the system state during the pursuit.
As expected, the state norm decreases steadily, reflecting the convergence of the system towards the interception point. This convergence is a strong indicator of the effectiveness of the optimal strategy. The cost function is shown in Figure 41, and the value function is depicted in Figure 42. The cost function reaches a constant value as the simulation progresses, demonstrating the stability of the optimal strategy. Notably, the value function converges to zero at the moment of capture, indicating that the pursuers successfully achieved their objective of intercepting the evaders. These results confirm the robustness and effectiveness of the optimal game strategy in various pursuit scenarios, as well as the adaptability of the strategy to different initial conditions.
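The paper does not spell out how pursuers are paired with evaders in the many-to-many case (Pursuers 1 and 2 against Evader 1, Pursuer 3 against Evader 2), so the following nearest-evader assignment is purely an illustrative assumption, not the paper's rule:

```python
def assign_pursuers(pursuers, evaders):
    """Assign each pursuer to its nearest evader (squared Euclidean
    distance). A greedy stand-in for the paper's unstated pairing rule."""
    def dist2(p, e):
        return sum((pi - ei) ** 2 for pi, ei in zip(p, e))
    return [min(range(len(evaders)), key=lambda j: dist2(p, evaders[j]))
            for p in pursuers]
```

With the pairing fixed, each pursuer–evader pair then plays the two-player feedback game analyzed in the earlier sections.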
Based on the many-to-many strategy presented in this article, we conducted the simulation of Example 4. The results show that the pursuers successfully captured the evaders, with the state converging to zero and its mean value remaining finite; at the termination time, the value function satisfies V ( T ) = 0 . Examples 1 and 4 together illustrate that the proposed many-to-one and many-to-many pursuit algorithms effectively enable the interception of multiple targets. To test the robustness of the model, we conducted 100 separate experiments with different initial conditions. The initial positions of the pursuer and evader were selected randomly, subject to the constraint that their initial separation exceeded 100,000 m, and the initial angles were drawn from a predefined permissible range to ensure consistency across trials. The experimental parameters and results are presented in Table 3. Of the 100 experiments, 98 resulted in successful interception, and in only 2 cases did the evader escape. This demonstrates that the proposed model and algorithm remain highly robust and effective under varied initial conditions. Numerical stability: To further assess the stability of the solution, we varied key system parameters, such as the control input constraints and the system dynamics, and observed the resulting performance. Despite these variations, the pursuer intercepted the evader in the majority of trials, suggesting that the solution of the FBSDEs remains stable and robust under small perturbations of the system parameters.
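The 100-run robustness study can be reproduced in outline as follows. Here `run_trial` is a stand-in for the full game simulation, and the sampling ranges mirror Table 3: separation above 100,000 m and angles inside the permissible intervals. All function names are ours.

```python
import math
import random

def sample_initial_condition(rng, r_min=100_000.0):
    """Draw one randomized initial condition: separation above r_min
    and evader angles inside the permitted ranges from Table 3."""
    r = r_min * (1.0 + rng.random())               # distance > 100,000 m
    elev = math.radians(rng.uniform(-20.0, 20.0))  # elevation range
    azim = math.radians(rng.uniform(160.0, 200.0)) # azimuth range
    return r, elev, azim

def success_rate(run_trial, n_trials=100, seed=0):
    """Fraction of successful interceptions over n_trials randomized
    runs; run_trial maps an initial condition to True/False."""
    rng = random.Random(seed)
    hits = sum(bool(run_trial(sample_initial_condition(rng)))
               for _ in range(n_trials))
    return hits / n_trials
```

Plugging the actual closed-loop simulation in for `run_trial` would yield the 98/100 success rate reported in Table 3.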

7. Conclusions

In this paper, we investigated optimal feedback strategies for pursuit problems through stochastic differential games. We assumed that the players’ strategies are bounded, i.e., that their actions are constrained, ensuring that the strategies remain realistic within the context of the game. Our approach establishes the uniqueness of the value function as a viscosity solution, providing a robust theoretical foundation for our findings. We demonstrated that the parameters of the optimal feedback strategies are intrinsically linked to the solutions of the forward–backward stochastic differential equations (FBSDEs) and to the terminal conditions of the cost function. This relationship significantly extends the work in references [10,20] by offering a matrix formulation of the polynomial-growth value function, which not only simplifies the computation of the optimal strategies but also broadens the applicability of the results to real-world scenarios.
Furthermore, we provided the expressions for the optimal strategies through rigorous calculations and established the convergence of the state trajectory within the pursuit region. Our results, illustrated in Figure 6, indicate that when the state transitions from the pursuit region to the termination set, it crosses a barrier surface of Lebesgue measure zero. This reflects the fact that the state cannot remain at a Nash equilibrium during the pursuit, consistent with the dynamics observed in actual pursuit scenarios. Finally, we validated the proposed optimal feedback strategy through numerical simulations, which demonstrated effective state convergence and successful evader capture. These findings reinforce the theoretical contributions of this work and open avenues for further research in multi-target pursuit strategies.

Author Contributions

Conceptualization, Y.B. and D.Z.; methodology, Y.B.; software, Y.B.; validation, Y.B., D.Z. and Z.H.; formal analysis, Y.B.; investigation, D.Z.; resources, Z.H.; data curation, Y.B.; writing—original draft preparation, Y.B.; writing—review and editing, D.Z. and Z.H.; visualization, Y.B.; supervision, D.Z.; project administration, D.Z.; funding acquisition, Z.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 61773142.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Isaacs, R. Differential Games I, II, III, IV; Research Memoranda; RAND Corporation: Santa Monica, CA, USA, 1954. [Google Scholar]
  2. Basar, T.; Olsder, G.J. Dynamic Noncooperative Game Theory, 2nd ed.; SIAM: Philadelphia, PA, USA, 1999. [Google Scholar]
  3. Bagchi, A. Stackelberg Differential Games in Economic Models; Springer: Berlin/Heidelberg, Germany, 1984. [Google Scholar]
  4. Smith, J.M. Evolution and the Theory of Games; Cambridge University Press: Cambridge, UK, 1982. [Google Scholar] [CrossRef]
  5. Yeung, D.W.K.; Petrosyan, L.A. Cooperative Stochastic Differential Games; Springer: New York, NY, USA, 2006. [Google Scholar]
  6. Ho, Y.C.; Bryson, A.; Baron, S. Differential games and optimal pursuit-evasion strategies. IEEE Trans. Autom. Control 1965, 10, 385–389. [Google Scholar] [CrossRef]
  7. Bernhard, P. Linear-quadratic, two-person, zero-sum differential games: Necessary and sufficient conditions. J. Optim. Theory Appl. 1979, 27, 51–69. [Google Scholar] [CrossRef]
  8. Liu, N.; Guo, L. Adaptive Stabilization of Noncooperative Stochastic Differential Games. SIAM J. Control Optim. 2024, 62, 1317–1342. [Google Scholar] [CrossRef]
  9. Engwerda, J.C.; van den Broek, W.A.; Schumacher, J.M. Feedback Nash equilibria in uncertain infinite time horizon differential games. In Proceedings of the 14th International Symposium of Mathematical Theory of Networks and Systems, Perpignan, France, 19–23 June 2000; pp. 1–6. [Google Scholar]
  10. Huang, Q.; Shi, J. Mixed leadership stochastic differential game in feedback information pattern with applications. Automatica 2024, 160, 111425. [Google Scholar] [CrossRef]
  11. Xie, T.H.; Feng, X.W.; Huang, J.H. Mixed linear quadratic stochastic differential leader-follower game with input constraint. Appl. Math. Optim. 2021, 84, S215–S251. [Google Scholar] [CrossRef]
  12. Moon, J. Linear-quadratic stochastic leader-follower differential games for Markov jump-diffusion models. Automatica 2023, 147, 110713. [Google Scholar] [CrossRef]
  13. Lv, S. Two-player zero-sum stochastic differential games with regime switching. Automatica 2020, 114, 108819. [Google Scholar] [CrossRef]
  14. Moon, J. A sufficient condition for linear-quadratic stochastic zero-sum differential games for Markov jump systems. IEEE Trans. Autom. Control 2019, 64, 1619–1626. [Google Scholar] [CrossRef]
  15. Sun, H.; Yan, L.; Li, L. Linear-quadratic stochastic differential games with Markov jumps and multiplicative noise: Infinite-time case. Int. J. Innov. Comput. Appl. 2015, 11, 349–361. [Google Scholar]
  16. Zhang, C.; Li, F. Non-zero sum differential game for stochastic Markovian jump systems with partially unknown transition probabilities. J. Frankl. Inst. 2021, 358, 7528–7558. [Google Scholar] [CrossRef]
  17. Wang, B.; Zhang, H.; Fu, M.; Liang, Y. Decentralized strategies for finite population linear-quadratic-Gaussian games and teams. Automatica 2022, 148, 110789. [Google Scholar] [CrossRef]
  18. Luo, G.; Zhang, H.; He, H.; Jin, Y.; Cui, Y. Mean field theory-based multi-agent adversarial cooperative learning. IEEE Trans. Cybern. 2020, 50, 5052–5065. [Google Scholar]
  19. Bensoussan, A.; Chen, S.K.; Chutani, A.; Sethi, S.P.; Siu, C.C.; Yam, S.C.P. Feedback Stackelberg-Nash equilibria in mixed leadership games with an application to cooperative advertising. SIAM J. Control Optim. 2019, 57, 3413–3444. [Google Scholar] [CrossRef]
  20. Huang, J.; Qiu, Z.; Wang, S.; Wu, Z. Linear quadratic mean-field game-team analysis: A mixed coalition approach. Automatica 2024, 159, 111358. [Google Scholar] [CrossRef]
  21. Liu, N.; Guo, L. Stochastic Adaptive Linear Quadratic Differential Games. IEEE Trans. Autom. Control 2022, 69, 1066–1073. [Google Scholar] [CrossRef]
  22. Hamadène, S. Nonzero sum linear-quadratic stochastic differential games with time-inconsistent coefficients. SIAM J. Control Optim. 1999, 37, 460–485. [Google Scholar]
  23. Sun, J.; Yong, J. Linear quadratic stochastic differential games: Open-loop and closed loop saddle points. SIAM J. Control Optim. 2014, 52, 4082–4121. [Google Scholar] [CrossRef]
  24. Yu, Z. An optimal feedback control-strategy pair for zero-sum linear-quadratic stochastic differential game: The Riccati equation approach. SIAM J. Control Optim. 2015, 53, 2141–2167. [Google Scholar] [CrossRef]
  25. Miller, E.; Pham, H. Linear-quadratic McKean-Vlasov stochastic differential games. In Modeling, Stochastic Control, Optimization, and Applications; Springer: Berlin/Heidelberg, Germany, 2019; Volume 164, pp. 451–481. [Google Scholar] [CrossRef]
  26. Sun, J. Two-person zero-sum stochastic linear-quadratic differential games. SIAM J. Control Optim. 2021, 59, 1804–1829. [Google Scholar] [CrossRef]
  27. Chang, D.; Xiao, H. Linear quadratic nonzero sum differential games with asymmetric information. Math. Probl. Eng. 2014, 2014, 262314. [Google Scholar] [CrossRef]
  28. Shi, J.; Wang, G.; Xiong, J. Leader-follower stochastic differential game with asymmetric information and applications. Automatica 2016, 63, 60–73. [Google Scholar] [CrossRef]
  29. Nourian, M.; Caines, P.E. ϵ-Nash Mean Field Game Theory for Nonlinear Stochastic Dynamical Systems with Major and Minor Agents. arXiv 2012, arXiv:1209.5684. [Google Scholar]
  30. Goldys, B.; Yang, J.; Zhou, Z. Singular perturbation of zero-sum linear-quadratic stochastic differential games. SIAM J. Control Optim. 2022, 60, 48–80. [Google Scholar] [CrossRef]
  31. Shi, Q.H. A Verification Theorem for Stackelberg Stochastic Differential Games in Feedback Information Pattern. arXiv 2021, arXiv:2108.06498. [Google Scholar]
  32. Zheng, Y.; Shi, J. A linear-quadratic partially observed Stackelberg stochastic differential game with application. Appl. Math. Comput. 2022, 420, 126819. [Google Scholar] [CrossRef]
  33. Song, S.H.; Ha, I.J. A Lyapunov-like approach to performance analysis of 3-dimensional pure png laws. IEEE Trans. Aerosp. Electron. Syst. 1994, 30, 238–248. [Google Scholar] [CrossRef]
  34. Song, J.; Zhang, X.; Wang, L. Impact Angle Constrained Guidance against Non-maneuvering Targets. AIAA J. Guid. Control Dyn. 2023, 46, 1556–1565. [Google Scholar] [CrossRef]
  35. Zhang, Y.; Li, X.; Guo, J. A Review of Missile Interception Techniques. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 2341–2357. [Google Scholar]
  36. Li, B.; Zhang, W.; Sun, Z. Guidance for Intercepting High-Speed Maneuvering Targets. J. Guid. Control Dyn. 2021, 44, 2282–2293. [Google Scholar]
  37. Wang, L.; Zhou, J. Nonlinear Missile Guidance and Control; Springer: Berlin/Heidelberg, Germany, 2020. [Google Scholar]
  38. Chen, Q.; Zhou, X. Design of Optimal Missile Guidance Law Using LQR. Control Eng. Pract. 2019, 83, 62–73. [Google Scholar] [CrossRef]
Figure 1. Comparison of Nash and Stackelberg Equilibria.
Figure 2. The mind map.
Figure 3. Model of movement in 3D space.
Figure 4. The derivation process for the HJB equation.
Figure 5. State in stochastic differential game.
Figure 6. The model of the multiple pursuers and evaders.
Figure 7. Optimal trajectory, Example 1.
Figure 8. Two-dimensional plane view of XZ.
Figure 9. Two-dimensional plane view of XY.
Figure 10. Distance between evader and pursuers.
Figure 11. Z-axis acceleration.
Figure 12. Y-axis acceleration.
Figure 13. The state norm.
Figure 14. The cost function.
Figure 15. The value function.
Figure 16. Optimal trajectory, Example 2.
Figure 17. Two-dimensional plane view of XZ.
Figure 18. Two-dimensional plane view of XY.
Figure 19. Distance between evader and pursuers.
Figure 20. Z-axis acceleration.
Figure 21. Y-axis acceleration.
Figure 22. The state norm.
Figure 23. The cost function.
Figure 24. The value function.
Figure 25. Optimal trajectory, Example 3.
Figure 26. Two-dimensional plane view of XZ.
Figure 27. Two-dimensional plane view of XY.
Figure 28. Distance between evader and pursuers.
Figure 29. Z-axis acceleration.
Figure 30. Y-axis acceleration.
Figure 31. The state norm.
Figure 32. The cost function.
Figure 33. The value function.
Figure 34. Optimal trajectory, Example 4.
Figure 35. Two-dimensional plane view of XZ.
Figure 36. Two-dimensional plane view of XY.
Figure 37. Distance between evaders and pursuers.
Figure 38. Z-axis acceleration.
Figure 39. Y-axis acceleration.
Figure 40. The state norm.
Figure 41. The cost function.
Figure 42. The value function.
Table 1. List of symbols and their descriptions.

Symbol | Description | Coordinate system
(X_I, Y_I, Z_I) | Inertial reference coordinate system | Inertial
(X_L, Y_L, Z_L) | Line-of-sight (LOS) coordinate system | LOS
(X_E, Y_E, Z_E) | Velocity coordinate system of the i-th pursuer | Pursuer
v_Ei | Velocity of the i-th evader | Evader
v_Pi | Velocity of the i-th pursuer | Pursuer
A_Pi | Acceleration of the i-th pursuer | Pursuer
A_Ei | Acceleration of the i-th evader | Evader
γ_Pi | Angle between the acceleration of the i-th pursuer and axis Y_Pi | Pursuer
γ_Ei | Angle between the acceleration of the i-th evader and axis Y_E | Evader
R_Pi | Distance between the i-th pursuer and the evader | Spatial
θ_Li, φ_Li | LOS angles between the evader and the i-th pursuer relative to the inertial reference coordinate system | LOS
θ_Pi, φ_Pi | Elevation and azimuth angles of v_Pi relative to the LOS coordinate system, from P_i pointing toward E | Pursuer
θ_Ei, φ_Ei | Elevation and azimuth angles of v_Ei relative to the LOS coordinate system, from E_i pointing toward P_i | Evader
A_zPi, A_yPi | Projections of the pursuer’s normal acceleration on the Z_Pi and Y_Pi axes in the velocity coordinate system | Pursuer
A_zEi, A_yEi | Projections of the evader’s normal acceleration on the Z_Ei and Y_Ei axes in the velocity coordinate system | Evader
Table 2. Experiment environment.

Item | Environment
Development language | Python
Library | NumPy
Disk capacity | 2 TB
RAM | 32 GB
CPU | Intel i7, 2.2 GHz
OS | Ubuntu 16.04
Table 3. Experimental setup and parameters.

Parameter | Value
Initial distance | > 100,000 m
Evader initial angles | [−20°, 20°] (elevation), [160°, 200°] (azimuth)
Pursuer initial angles | [−20°, 20°] (elevation), [−20°, 20°] (azimuth)
Time step | 0.1–0.5 s
Maximum normal acceleration | 20–40 g
Result (100 experiments) | 98 successful, 2 failed

Bai, Y.; Zhou, D.; He, Z. A Class of Pursuit Problems in 3D Space via Noncooperative Stochastic Differential Games. Aerospace 2025, 12, 50. https://doi.org/10.3390/aerospace12010050