Article

Game Theory-Based Leader–Follower Tracking Control for an Orbital Pursuit–Evasion System with Tethered Space Net Robots

1 School of Astronautics, Northwestern Polytechnical University, Xi’an 710072, China
2 National Key Laboratory of Aerospace Flight Dynamics, Northwestern Polytechnical University, Xi’an 710072, China
* Author to whom correspondence should be addressed.
Aerospace 2025, 12(8), 710; https://doi.org/10.3390/aerospace12080710
Submission received: 24 June 2025 / Revised: 30 July 2025 / Accepted: 8 August 2025 / Published: 11 August 2025
(This article belongs to the Special Issue Dynamics and Control of Space On-Orbit Operations)

Abstract

The tethered space net robot offers an effective solution for active space debris removal due to its large capture envelope. However, most existing studies overlook the evasive behavior of non-cooperative targets. To address this, we model an orbital pursuit–evasion game involving a tethered net and propose a game theory-based leader–follower tracking control strategy. In this framework, a virtual leader—defined as the geometric center of four followers—engages in a zero-sum game with the evader. An adaptive dynamic programming method is employed to handle input saturation and compute the Nash Equilibrium strategy. In the follower formation tracking phase, a synchronous distributed model predictive control approach is proposed to update all followers’ control simultaneously, ensuring accurate tracking while meeting safety constraints. The feasibility and stability of the proposed method are theoretically analyzed. Additionally, a body-fixed reference frame is introduced to reduce the capture angle. Simulation results show that the proposed strategy successfully captures the target and outperforms existing methods in both formation keeping and control efficiency.

1. Introduction

For on-orbit capture missions, flexible tethered space nets (TSNs) are highly effective for debris removal and non-cooperative target capture, providing a wide range and reduced precision requirements compared to traditional methods like robotic arms or harpoons [1,2]. To further enhance control over the position and configuration of the deployed net, recent advances have led to the development of maneuverable tethered space net robots (TSNRs), which integrate a deformable tethered net with four actively controlled spacecraft that coordinate their motion to adjust the net’s configuration [3,4]. While recent studies have extensively explored the formation control of the TSNR system, few have considered the potential evasive maneuvers of non-cooperative targets. However, when the target exhibits such autonomous behavior, existing formation tracking methods may compromise capture performance. The TSNR capture problem thus becomes an orbital pursuit–evasion game (OPEG) scenario, further complicated by the safety constraints caused by the tethered net.
The safety constraints in the TSNR system primarily revolve around maintaining appropriate inter-spacecraft distances [5]. As the four maneuvering spacecraft regulate the net configuration by varying their relative positions, excessive separation can lead to the bouncing effect or even breaking of the tether, while overly close distances may result in entanglement or even collisions between the spacecraft [6]. Ma et al. [7] employed the artificial potential field (APF) method to constrain the safety distance between spacecraft in TSNR and designed a robust adaptive control scheme to achieve high-precision formation tracking control. To enhance control performance under multiple constraints, distributed model predictive formation control (DMPC) is employed to accomplish trajectory tracking and collision avoidance in multi-agent systems [8,9]. Although these methods demonstrate excellent performance in formation tracking, collision avoidance, and connectivity maintenance, they typically rely on a known reference or target trajectory [10]. However, when the target exhibits rational and autonomous behavior, its actions become interdependent with those of the pursuer [11]. In such cases, applying the aforementioned formation tracking control methods—which focus solely on one-sided optimization—may result in suboptimal solutions, thereby lacking foresight in their pursuit strategy.
Game theory is widely applied to pursuit–evasion problems and is increasingly used to address optimization challenges in orbital competition and cooperation tasks [12]. Treating each spacecraft as an intelligent agent, these orbital scenarios are typically modeled as differential games, with the core challenge being to identify the Nash Equilibrium (NE), where any unilateral deviation by a spacecraft leads to a disadvantage [13]. As is well recognized, input saturation constitutes a fundamental physical constraint, with exceeding actuator limits leading to poor control performance. Consequently, it is essential to incorporate input saturation constraints into the NE solution. To address the optimal control problem under such constraints, a data-driven adaptive dynamic programming (ADP) method with a saturated linear function is proposed, where spacecraft inputs exceeding prescribed bounds are truncated [14]. Furthermore, with the growing trend toward spacecraft clustering, the involvement of multiple agents in orbital games introduces additional complexity to the solution process [12]. The multi-agent pursuit–evasion scenario is established as a mixed zero-sum game problem [15], which involves solving a set of coupled Hamilton–Jacobi (HJ) equations. In particular, the presence of quadratic terms in the formation error cost—critical for tethered net formation tracking in this paper—substantially increases the computational complexity of the solution process.
The leader–follower approach is a widely used strategy in multi-agent systems, where followers adjust their positions based on the leader’s state to maintain a desired formation or achieve cooperative tasks. This method offers a clear control hierarchy and reduced communication [16]. In spacecraft attitude coordination, a distributed predefined-time control framework has been proposed in which each follower estimates the virtual leader’s motion and tracks it using a chattering-free adaptive controller, ensuring robustness against uncertainties and disturbances [17]. For spacecraft formation flight, Wei et al. [18] proposed an adaptive leader–follower formation control approach that integrates APF with prescribed performance control to achieve precise tracking while ensuring collision avoidance and connectivity maintenance. Under the same state constraints, the distributed formation control problem for a group of leader-following spacecraft with bounded control inputs is investigated in [19].
In this paper, we focus on minimizing the complexity of solving the multi-agent pursuit–evasion problem by employing a game theory-based virtual leader–follower tracking control approach. This method addresses the multi-agent orbital pursuit–evasion problem while enforcing safety constraints imposed by a tethered net. Additionally, the spacecraft is modeled as a continuous low thrust system with control bounds. At the leader level, the geometric center of the four pursuer spacecraft, which approximates the center of the tethered net, is treated as a virtual leader. The virtual leader engages in an orbital zero-sum game with the evader spacecraft. To solve the differential game problem under input constraints, we employ an adaptive dynamic programming method incorporating a hyperbolic tangent saturation function. At the follower level, the four spacecraft collaboratively maintain a formation to follow the virtual leader’s trajectory. Assuming synchronous decision-making among the followers and considering both safety and control constraints, we apply a synchronous distributed model predictive control (SDMPC) approach to determine the optimal tracking control for each spacecraft. Compared to sequential DMPC [20], where each agent must wait for the previous one to complete its computation before using the updated information, and iterative DMPC [21], where all agents solve their optimization problems and exchange information with neighbors iteratively within each sampling period, SDMPC allows all agents to update their control simultaneously through a single optimization step. This parallel update scheme significantly reduces both computation time and overall computational load [22]. In comparison with the existing research, the core contributions of this study are summarized below.
(1) Compared to other studies on orbital tethered net capture [3,4], this work models the target spacecraft as an intelligent maneuverable agent and formulates the problem as a multi-agent OPEG with safety constraints imposed by the tethered net.
(2) A novel game theory-based virtual leader–follower tracking control strategy is proposed to solve the multi-agent OPEG.
(3) An adaptive dynamic programming approach incorporating a saturation function is employed to address the OPEG, enabling effective handling of spacecraft input saturation.
(4) A synchronous distributed model predictive control method is developed to ensure optimal formation tracking of the pursuer system while satisfying safety constraints imposed by the tethered net.
The rest of the paper is structured as outlined below. Section 2 describes the mission objectives and formulates the dynamics of the multi-agent pursuit–evasion system with a tethered net. In Section 3, the ADP approach is employed to solve the orbital zero-sum game under input saturation constraints. Section 4 presents a distributed formation tracking strategy based on SDMPC with integrated safety constraints, along with a corresponding stability analysis. Simulation findings appear in Section 5, with conclusions summarized in Section 6.

2. Problem Formulation

2.1. Task Overview

In the OPEG scenario, the pursuer attempts to capture the evader, whereas the evader strives to maximize its distance from the pursuer. In this paper, four spacecraft comprising the pursuer system manipulate a square tethered net to capture the evader. The tethered net is modeled as a mass–spring system, which also exerts a dynamic influence on the motion of the four spacecraft, as illustrated in Figure 1. The capture distance is defined as the distance between the evader and the virtual leader, represented by the geometric center of the four spacecraft in the pursuer system. This distance is denoted as $d_{PE} = \| \mathbf{r}_P - \mathbf{r}_E \|_2$, where $\mathbf{r}_P = \frac{1}{4}\sum_{i=1}^{4} \mathbf{r}_i$, and $\mathbf{r}_P$, $\mathbf{r}_i$, $\mathbf{r}_E$ represent the positions of the virtual leader, the $i$th follower in the pursuer system, and the evader, respectively. Considering the lower precision requirements of the tethered net, a capture threshold $\delta$ is established, with the capture deemed successful when $d_{PE} \le \delta$.
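As a concrete illustration, the capture criterion above can be sketched in a few lines (a minimal numerical sketch assuming numpy; the function name `capture_check` and the formation coordinates are hypothetical, and $\delta = 10$ m is taken from the simulation section):

```python
import numpy as np

def capture_check(r_followers, r_E, delta=10.0):
    """Virtual-leader capture test: the leader is the geometric center
    of the four followers; capture succeeds when d_PE <= delta."""
    r_P = np.mean(r_followers, axis=0)        # r_P = (1/4) * sum_i r_i
    d_PE = np.linalg.norm(r_P - r_E)          # d_PE = ||r_P - r_E||_2
    return d_PE, d_PE <= delta

# Hypothetical square formation 20 m across; evader 5 m from its center.
followers = np.array([[10, 10, 0], [10, -10, 0],
                      [-10, -10, 0], [-10, 10, 0]], dtype=float)
d, captured = capture_check(followers, np.array([3.0, 4.0, 0.0]))
```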

2.2. Dynamics Modeling

The orbital motion of the spacecraft in the OPEG scenario is described by defining several coordinate systems based on the Earth-centered inertial (ECI) frame, labeled $O_E X_E Y_E Z_E$, which is positioned at the center of the Earth. The relative dynamics between the orbital pursuer and the evader are analyzed in the Euler–Hill coordinate frame. This frame, labeled $O_C X_C Y_C Z_C$, is defined with the $X_C$-axis pointing radially outward, the $Z_C$-axis aligned with the orbital angular momentum vector, and the $Y_C$-axis completing the right-handed orthogonal coordinate system. To describe the attitude of the tethered net, the body-fixed frame, denoted $o_B x_B y_B z_B$, is defined and located at the virtual leader. The $x_B$-axis aligns with the direction of the virtual leader’s velocity, the $y_B$-axis is parallel to the $O_C X_C Y_C$ plane, and the $z_B$-axis completes the right-handed orthogonal coordinate system. In the Euler–Hill coordinate frame, the spacecraft’s relative translational dynamics are governed by the following equations:
$$\ddot{x} = 3\omega_0^2 x + 2\omega_0 \dot{y} + u_x, \qquad \ddot{y} = -2\omega_0 \dot{x} + u_y, \qquad \ddot{z} = -\omega_0^2 z + u_z$$
Here, $\omega_0$ represents the mean motion of the Euler–Hill coordinate frame, and $\mathbf{u} = [u_x, u_y, u_z]^T$ denotes the acceleration vector generated by the continuous low thrust of the spacecraft. The matrix representation of the relative orbital dynamics in the pursuit–evasion scenario is given as follows:
$$\dot{\mathbf{x}} = A\mathbf{x} + B\mathbf{u}_P - B\mathbf{u}_E$$
where $\mathbf{x} = [x, y, z, \dot{x}, \dot{y}, \dot{z}]^T$, and the inputs of the pursuer and evader spacecraft are assumed to be bounded by positive constants as follows: $\|\mathbf{u}_P\|_\infty \le \lambda_P$ and $\|\mathbf{u}_E\|_\infty \le \lambda_E$.
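Under these definitions, the system matrices $A$ and $B$ of Equation (2) take the standard Clohessy–Wiltshire form; the following is a minimal sketch (assuming numpy, with $\omega_0$ taken from the simulation section; the explicit-Euler `step` helper is a hypothetical integrator for illustration only):

```python
import numpy as np

w0 = 7.2388e-5  # mean motion used later in the simulations, rad/s

# Clohessy-Wiltshire (Euler-Hill) relative dynamics: xdot = A x + B u_P - B u_E
A = np.block([
    [np.zeros((3, 3)), np.eye(3)],
    [np.array([[3 * w0**2, 0, 0], [0, 0, 0], [0, 0, -w0**2]]),
     np.array([[0, 2 * w0, 0], [-2 * w0, 0, 0], [0, 0, 0]])],
])
B = np.vstack([np.zeros((3, 3)), np.eye(3)])

def step(x, uP, uE, dt=0.1):
    """One explicit-Euler step of the relative dynamics (illustrative only)."""
    return x + dt * (A @ x + B @ uP - B @ uE)
```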

2.3. Leader–Follower Framework Design

The game theory-based virtual leader–follower tracking control framework is shown in Figure 2. Based on initial conditions and the game control strategy, both the virtual leader (the pursuer) and the evader can follow their ideal NE trajectories. Then, using the distributed formation tracking control, the followers with a tethered net can track their formation trajectory under the uncertainty of tethered net dynamics and safety constraints.
Furthermore, as illustrated in Figure 3, minimizing the impact of the evader on the net during capture requires reducing the angle $\varphi$ between the velocity vectors of the virtual leader and the evader ($\mathbf{v}_P$ and $\mathbf{v}_E$), as well as the angle $\psi$ between the net’s normal vector $\mathbf{k}$ and the virtual leader’s velocity $\mathbf{v}_P$. Aligning these vectors will help ensure a smoother interception and mitigate potential deformation or damage to the net.
To minimize the angle $\varphi$, the virtual leader’s strategy is optimized by minimizing the objective function $\mathbf{x}^T Q \mathbf{x}$, which reflects the velocity error between the virtual leader and the evader. In addition, to reduce the angle $\psi$, the desired relative states of the follower spacecraft are defined in the body-fixed frame such that the vector $\mathbf{k}$ is aligned with the $x_B$-axis, which is oriented along the virtual leader’s velocity vector $\mathbf{v}_P = [\dot{x}_P, \dot{y}_P, \dot{z}_P]^T$. The state $\mathbf{x}_i^B$ of the $i$th follower in the body-fixed frame is designed to achieve the desired formation configuration. By applying a coordinate transformation involving the rotation matrix $R_T$ and the translation vector $\mathbf{x}_P$ (representing the state of the virtual leader), we can obtain the reference state $\mathbf{x}_i^r$ for the $i$th follower spacecraft in the Euler–Hill frame as follows:
$$\mathbf{x}_i^r = R_T^{-1} \mathbf{x}_i^B + \mathbf{x}_P$$
where R T is defined as follows:
$$R_T = \begin{bmatrix} \cos\gamma\cos\theta & \cos\gamma\sin\theta & \sin\gamma \\ -\sin\theta & \cos\theta & 0 \\ -\sin\gamma\cos\theta & -\sin\gamma\sin\theta & \cos\gamma \end{bmatrix}$$
Here, the two rotation angles θ and γ of the coordinate axes are defined as follows:
$$\theta = \arctan\left(\frac{\dot{y}_P}{\dot{x}_P}\right), \qquad \gamma = \arctan\left(\frac{\dot{z}_P}{\|\mathbf{v}_P\|_2}\right)$$
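The rotation above can be sketched numerically as follows (a sketch assuming numpy; the sign pattern of the matrix entries follows the conventional azimuth–elevation rotation and is an assumption here, since only the magnitudes of the entries are recoverable from the text; orthonormality holds for any $\theta$, $\gamma$):

```python
import numpy as np

def body_frame_rotation(vP):
    """Rotation R_T from the Euler-Hill frame to the body-fixed net frame,
    built from the two angles theta and gamma defined above."""
    theta = np.arctan2(vP[1], vP[0])                # azimuth of v_P
    gamma = np.arctan(vP[2] / np.linalg.norm(vP))   # elevation angle as printed
    ct, st = np.cos(theta), np.sin(theta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    return np.array([
        [ cg * ct,  cg * st, sg],
        [     -st,       ct,  0],
        [-sg * ct, -sg * st, cg],
    ])

R_T = body_frame_rotation(np.array([1.0, 2.0, 0.5]))
```

Because $R_T$ is orthonormal, the inverse in the reference-state transformation reduces to the transpose, $R_T^{-1} = R_T^T$.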

3. Game Theory-Based Approach to Orbital Pursuit–Evasion Problem

3.1. OPEG Modeling

In the OPEG, two players are involved: a virtual leader and an evader. The spacecraft’s cost function is formulated as follows:
$$J(\mathbf{x}_0, \mathbf{u}_P, \mathbf{u}_E) = \int_0^\infty \left[ \mathbf{x}^T Q \mathbf{x} + U(\mathbf{u}_P, \lambda_P) - U(\mathbf{u}_E, \lambda_E) \right] \mathrm{d}\tau$$
where $\mathbf{x}_0$ denotes the initial state at time $t = 0$ and $Q$ is a positive definite matrix. Additionally, to handle the input saturation constraints of the spacecraft, this section introduces a non-quadratic function $U(\cdot,\cdot)$, which is expressed as follows [23]:
$$U(\mathbf{u}, \lambda) = 2 \int_0^{\mathbf{u}} \lambda \left( \tanh^{-1}(\mathbf{t}/\lambda) \right)^T R \, \mathrm{d}\mathbf{t}$$
where $R = \mathrm{diag}(r_1, r_2, r_3)$ is a positive definite diagonal matrix and $\tanh^{-1}(\cdot)$ denotes the inverse hyperbolic tangent function. $U(\mathbf{u}, \lambda)$ can be reduced to a form that no longer includes the integral, as shown below:
$$U(\mathbf{u}, \lambda) = 2\lambda \mathbf{u}^T R \tanh^{-1}(\mathbf{u}/\lambda) + \lambda^2 \bar{R} \ln\left( \bar{\mathbf{1}} - (\mathbf{u}/\lambda)^2 \right)$$
where $\bar{R} = [r_1, r_2, r_3]$ and $\bar{\mathbf{1}} = [1, 1, 1]^T$; $\ln(\cdot)$ denotes the natural logarithm (base $e$), while $(\cdot)^2$ indicates that each component of the vector is squared individually.
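The equivalence between the integral definition and its closed form above can be checked numerically (a sketch assuming numpy; the function names are hypothetical, and $\lambda = 0.4$ is the pursuer bound from the simulation section):

```python
import numpy as np

def U_closed(u, lam, Rdiag):
    """Closed form: 2*lam*u^T R atanh(u/lam) + lam^2 * Rbar . ln(1 - (u/lam)^2)."""
    z = u / lam
    return float(2.0 * lam * np.sum(Rdiag * u * np.arctanh(z))
                 + lam**2 * np.sum(Rdiag * np.log(1.0 - z**2)))

def U_integral(u, lam, Rdiag, n=200001):
    """Direct componentwise evaluation of the integral definition."""
    total = 0.0
    for ui, ri in zip(u, Rdiag):
        t = np.linspace(0.0, ui, n)
        f = np.arctanh(t / lam)
        total += 2.0 * lam * ri * np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))  # trapezoid rule
    return total

u, lam, Rd = np.array([0.2, -0.1, 0.05]), 0.4, np.ones(3)
```

Note that $U$ is positive for any nonzero $\mathbf{u}$ inside the bound, which is what makes it usable as a control penalty.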
In this OPEG scenario, the virtual leader designs a control to minimize $J(\mathbf{x}_0, \mathbf{u}_P, \mathbf{u}_E)$, while the evading spacecraft designs a control to maximize it. This optimal control problem, involving both the virtual leader and the evader, is a zero-sum game with a unique solution if an NE strategy exists, provided that the following condition holds:
$$\min_{\mathbf{u}_P} \max_{\mathbf{u}_E} J(\mathbf{x}_0, \mathbf{u}_P, \mathbf{u}_E) = \max_{\mathbf{u}_E} \min_{\mathbf{u}_P} J(\mathbf{x}_0, \mathbf{u}_P, \mathbf{u}_E)$$

3.2. Approach to Game Control Solution

The value function is defined as follows:
$$V(\mathbf{x}, \mathbf{u}_P, \mathbf{u}_E) = \int_t^\infty \left[ \mathbf{x}^T Q \mathbf{x} + U(\mathbf{u}_P, \lambda_P) - U(\mathbf{u}_E, \lambda_E) \right] \mathrm{d}\tau$$
Based on the Taylor expansion of the value function V, the equivalent expression of Equation (10) is as follows:
$$0 = \mathbf{x}^T Q \mathbf{x} + U(\mathbf{u}_P, \lambda_P) - U(\mathbf{u}_E, \lambda_E) + \nabla V^T (A\mathbf{x} + B\mathbf{u}_P - B\mathbf{u}_E)$$
The Hamiltonian is defined as follows:
$$H(\mathbf{x}, \nabla V, \mathbf{u}_P, \mathbf{u}_E) = \mathbf{x}^T Q \mathbf{x} + U(\mathbf{u}_P, \lambda_P) - U(\mathbf{u}_E, \lambda_E) + \nabla V^T (A\mathbf{x} + B\mathbf{u}_P - B\mathbf{u}_E)$$
Based on the stationary conditions of optimization, the optimal controls are derived by solving the following first-order conditions:
$$\frac{\partial H(\mathbf{x}, \nabla V, \mathbf{u}_P, \mathbf{u}_E)}{\partial \mathbf{u}_P} = 0, \qquad \frac{\partial H(\mathbf{x}, \nabla V, \mathbf{u}_P, \mathbf{u}_E)}{\partial \mathbf{u}_E} = 0$$
Accordingly, the game controls of the two players can be expressed as follows:
$$\mathbf{u}_P^* = -\lambda_P \tanh(D_{\lambda_P}^*), \qquad \mathbf{u}_E^* = \lambda_E \tanh(D_{\lambda_E}^*)$$
where $\tanh(\cdot)$ denotes the hyperbolic tangent function and $D_{\lambda_i}^*$ is defined as follows:
$$D_{\lambda_i}^* = \frac{1}{2\lambda_i} R^{-1} B^T \nabla V^*, \quad i = E \text{ or } P$$
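The saturated NE controls above can be sketched in code (a sketch assuming numpy; the sign convention, pursuer minimizing and evader maximizing, is as reconstructed here, and the gradient vector used in the example is hypothetical):

```python
import numpy as np

def saturated_controls(grad_V, B, Rinv, lam_P, lam_E):
    """NE controls: tanh saturation keeps each component of u_P in
    (-lam_P, lam_P) and each component of u_E in (-lam_E, lam_E)."""
    D_P = Rinv @ B.T @ grad_V / (2.0 * lam_P)
    D_E = Rinv @ B.T @ grad_V / (2.0 * lam_E)
    return -lam_P * np.tanh(D_P), lam_E * np.tanh(D_E)

B = np.vstack([np.zeros((3, 3)), np.eye(3)])
grad_V = np.array([0.0, 0.0, 0.0, 5.0, -3.0, 1.0])   # hypothetical value-function gradient
uP, uE = saturated_controls(grad_V, B, np.eye(3), 0.4, 0.1)
```

The hyperbolic tangent enforces the input bounds smoothly, so no post hoc clipping of the computed controls is required.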
Substituting Equations (8) and (13) into Equation (11) gives the following Hamilton–Jacobi–Isaacs (HJI) equation:
$$0 = \mathbf{x}^T Q \mathbf{x} + \nabla V^{*T} \left( A\mathbf{x} - 2\lambda_E B \tanh(D_{\lambda_E}^*) \right) + \lambda_P^2 \bar{R} \ln\left( \bar{\mathbf{1}} - \tanh^2(D_{\lambda_P}^*) \right) - \lambda_E^2 \bar{R} \ln\left( \bar{\mathbf{1}} - \tanh^2(D_{\lambda_E}^*) \right)$$
By solving the HJI equation, the optimal value function $V^*$ is obtained and subsequently substituted into Equation (13) to derive the NE game controls $(\mathbf{u}_P^*, \mathbf{u}_E^*)$ for the virtual leader and the evader. However, due to the nonlinearities in the cost terms and the presence of input saturation, the HJI equation cannot be solved analytically. Nevertheless, its explicit form is essential for constructing the loss function used in the neural network-based approximation of the value function. To address this, the ADP method is employed to obtain a numerical solution under input constraints. A neural network can be employed to approximate the optimal value function $V^*$ and its gradient $\nabla V^*$, as follows:
$$V^*(\mathbf{x}) = \mathbf{W}^T \phi(\mathbf{x}) + \varepsilon(\mathbf{x})$$
where $\mathbf{W}$ denotes the ideal constant weight vector, $\phi(\mathbf{x})$ is a vector of manually designed basis functions that act as neuron nodes, and $\varepsilon(\mathbf{x})$ represents the approximation error of the neural network. Then, the gradient of $V^*(\mathbf{x})$ is given as follows:
$$\nabla V^*(\mathbf{x}) = \nabla\phi(\mathbf{x})^T \mathbf{W} + \nabla\varepsilon(\mathbf{x})$$
Substituting Equation (17) into Equation (13), the NE controls are presented below.
$$\mathbf{u}_P^* = -\lambda_P \tanh\left( \frac{1}{2\lambda_P} R^{-1} B^T \left( \nabla\phi(\mathbf{x})^T \mathbf{W} + \nabla\varepsilon(\mathbf{x}) \right) \right), \qquad \mathbf{u}_E^* = \lambda_E \tanh\left( \frac{1}{2\lambda_E} R^{-1} B^T \left( \nabla\phi(\mathbf{x})^T \mathbf{W} + \nabla\varepsilon(\mathbf{x}) \right) \right)$$
Let $\hat{\mathbf{W}}$ denote the estimate of $\mathbf{W}$; the corresponding estimate $\nabla\hat{V}$ of the value-function gradient can be described as follows:
$$\nabla\hat{V} = \nabla\phi^T \hat{\mathbf{W}}$$
The NE control can be approximated as follows:
$$\hat{\mathbf{u}}_P = -\lambda_P \tanh\left( \frac{1}{2\lambda_P} R^{-1} B^T \nabla\phi(\mathbf{x})^T \hat{\mathbf{W}} \right), \qquad \hat{\mathbf{u}}_E = \lambda_E \tanh\left( \frac{1}{2\lambda_E} R^{-1} B^T \nabla\phi(\mathbf{x})^T \hat{\mathbf{W}} \right)$$
Substituting Equations (19) and (20) into Equation (12), we can obtain the estimation error of the HJI equation as follows:
$$e_H = \mathbf{x}^T Q \mathbf{x} + U(\hat{\mathbf{u}}_P, \lambda_P) - U(\hat{\mathbf{u}}_E, \lambda_E) + \hat{\mathbf{W}}^T \nabla\phi \left( A\mathbf{x} + B\hat{\mathbf{u}}_P - B\hat{\mathbf{u}}_E \right)$$
To approximate the solution of the HJI equation, the approximation error $e_H$ should be minimized. By minimizing the squared error term $E = \frac{1}{2} e_H^T e_H$, the learning of the neural network weight vector can be achieved. Based on this, the update law for the weight vector is defined as follows:
$$\Delta\hat{\mathbf{W}} = -\alpha \frac{1}{(1 + \boldsymbol{\rho}^T \boldsymbol{\rho})^2} \frac{\partial E}{\partial e_H} \frac{\partial e_H}{\partial \hat{\mathbf{W}}} = -\alpha \frac{\boldsymbol{\rho}}{(1 + \boldsymbol{\rho}^T \boldsymbol{\rho})^2} e_H$$
where $\alpha$ is the learning rate and $\boldsymbol{\rho} = \partial e_H / \partial \hat{\mathbf{W}}$. An enhanced Levenberg–Marquardt algorithm modifies the normalization denominator from $1 + \boldsymbol{\rho}^T \boldsymbol{\rho}$ to $(1 + \boldsymbol{\rho}^T \boldsymbol{\rho})^2$, ensuring boundedness in the mathematical proof [23]. By performing iterative updates of the form $\hat{\mathbf{W}} \leftarrow \hat{\mathbf{W}} + \Delta\hat{\mathbf{W}}$, the neural network weights are expected to converge, allowing the determination of the game controls for the spacecraft in the OPEG.
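A single step of the normalized update law above can be sketched as follows (a sketch assuming numpy; the function name and the stand-in values of $\boldsymbol\rho$ and $e_H$ are hypothetical, while $\alpha = 0.001$ and the 27-dimensional weight vector come from the simulation section):

```python
import numpy as np

def adp_weight_step(W_hat, rho, e_H, alpha=0.001):
    """Normalized gradient step: dW = -alpha * rho * e_H / (1 + rho^T rho)^2."""
    denom = (1.0 + rho @ rho) ** 2
    return W_hat - alpha * rho * e_H / denom

W = np.zeros(27)
rho = np.ones(27)           # stand-in for de_H/dW_hat
W_new = adp_weight_step(W, rho, e_H=2.0)
```

The squared normalization denominator keeps the step size bounded even when the regressor $\boldsymbol\rho$ grows large.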

4. SDMPC Approach to Multi-Agent Formation Control

4.1. Task Formulation for Formation Control

In this section, we examine a scenario in which four follower spacecraft cooperatively maneuver a tethered net to track the NE trajectory of a virtual leader. Effective coordination among the followers is essential for successful mission execution, necessitating synchronized decision-making. Each follower is required to reach its assigned position while maintaining the prescribed formation configuration. To ensure coordinated operation, the system must comply with the completeness constraints imposed on each follower spacecraft, specified as follows:
$$\lim_{k\to\infty} \left( \mathbf{r}_i(k) - \mathbf{r}_i^r(k) \right) = 0$$
$$\lim_{k\to\infty} \left( \mathbf{r}_i(k) - \mathbf{r}_j(k) \right) = \mathbf{d}_{ij}^r, \quad j \in N_i$$
$$\| \mathbf{r}_i(k) - \mathbf{r}_j(k) \|_2 \le 2R, \quad j \in N_i$$
$$\| \mathbf{r}_i(k) - \mathbf{r}_j(k) \|_2 \ge 2r, \quad j \in N_i$$
where $\mathbf{r}_i^r$ represents the reference position of the $i$th follower, computed according to Equation (3); $\mathbf{d}_{ij}^r$ denotes the desired relative position between followers, while $R$ and $r$ represent the maximum and minimum safe distances between followers, respectively. Let $N_A \triangleq \{1, 2, 3, 4\}$ be the index set of the four follower spacecraft. The neighbor set $N_i \triangleq \{ j \in N_A, j \ne i \mid \mathbf{d}_{ij}^r \}$ of agent $i$ is a priori information available to agent $i$.
In this work, SDMPC is employed to facilitate formation tracking while ensuring compliance with critical constraints and mitigating the disturbances caused by the tethered net. These constraints not only involve maintaining the safe inter-spacecraft distances above, but also require that the gap between the actual trajectories and the assumed trajectories communicated to other agents remains within acceptable bounds. The assumed trajectories are defined as the combination of the optimal trajectories computed at the previous time step and the terminal control input applied to the final state, as shown in Equation (40). The computational schematic of the SDMPC method is illustrated in Figure 4. At the $k$th time step, each spacecraft optimizes its trajectory based on the reference trajectory $\mathbf{x}_i^r$, the optimized trajectory from the $(k-1)$th time step, and the assumed trajectories of its neighboring spacecraft. The corresponding notations for control and state variables are summarized in Table 1. The first segment of the optimized control for each follower is applied as the actual input at the $k$th time step, and the actual next state is then determined.

4.2. Definition of the Cost Functions and Constraints

At time step $k$, the cost function for the $i$th follower can be written in the following form.
$$J_i^*(k, \mathbf{x}_i^*, \hat{\mathbf{x}}_i, \mathbf{x}_i^r, \mathbf{u}_i^*) = \min_{\mathbf{u}_i(k+t|k)} J_i(k, \mathbf{x}_i, \hat{\mathbf{x}}_i, \mathbf{x}_i^r, \mathbf{u}_i)$$
with the constraint that for $t = 1, 2, \ldots, N-1$,
$$\mathbf{x}_i(k+t+1|k) = A\mathbf{x}_i(k+t|k) + B\mathbf{u}_i(k+t|k),$$
$$\mathbf{u}_i(k+t|k) \in \mathcal{U},$$
$$\| \mathbf{r}_i(k+t|k) - \hat{\mathbf{r}}_j(k+t|k) \|_2 \le 2R - \mu_{ij}^I(k+t|k), \quad j \in N_i,$$
$$\| \mathbf{r}_i(k+t|k) - \hat{\mathbf{r}}_j(k+t|k) \|_2 \ge 2r + \mu_{ij}^O(k+t|k), \quad j \in N_i,$$
$$\| \hat{\mathbf{r}}_i(k+t|k) - \mathbf{r}_i(k+t|k) \|_2 \le \mu_i(k+t|k),$$
$$\mathbf{x}_i(k+N|k) \in X_i^f,$$
where
$$\mu_{ij}^I(k+t|k) = \frac{\| \hat{\mathbf{r}}_i(k+t|k) - \hat{\mathbf{r}}_j(k+t|k) \|_2 - 2r}{2}, \qquad \mu_{ij}^O(k+t|k) = \frac{2R - \| \hat{\mathbf{r}}_i(k+t|k) - \hat{\mathbf{r}}_j(k+t|k) \|_2}{2}, \qquad \mu_i(k+t|k) = \min_{j \in N_i} \min\left( \mu_{ij}^I(k+t|k), \mu_{ij}^O(k+t|k) \right).$$
Equations (28) and (29) represent the dynamic constraints and input saturation constraints, respectively. Equations (30) and (31) impose the safety constraints induced by the tethered net. Similarly, Equation (32) defines the compatibility constraints, while Equation (33) specifies the terminal state region constraint. The cost function in this optimization problem is expressed as follows:
$$J_i(k, \mathbf{x}_i, \hat{\mathbf{x}}_i, \mathbf{x}_i^r, \mathbf{u}_i) = \sum_{t=0}^{N-1} L_i(k+t|k, \mathbf{x}_i, \hat{\mathbf{x}}_i, \mathbf{x}_i^r, \mathbf{u}_i) + L_i^f(\mathbf{x}_i(k+N|k), \mathbf{x}_i^r)$$
where the stage and terminal cost functions are defined as follows:
$$L_i(k+t|k, \mathbf{x}_i, \hat{\mathbf{x}}_i, \mathbf{x}_i^r, \mathbf{u}_i) = \| \mathbf{x}_i(k+t|k) - \mathbf{x}_i^r(k+t|k) \|_Q^2 + \| \mathbf{x}_i(k+t|k) - \hat{\mathbf{x}}_j(k+t|k) - \mathbf{d}_{ij}^r \|_Q^2 + \| \mathbf{u}_i(k+t|k) \|_R^2 \triangleq \| \Delta\mathbf{x}_i(k+t|k) \|_Q^2 + \| \mathbf{x}_{ij}(k+t|k) \|_Q^2 + \| \mathbf{u}_i(k+t|k) \|_R^2$$
$$L_i^f(\mathbf{x}_i(k+N|k), \mathbf{x}_i^r) = \| \mathbf{x}_i(k+N|k) - \mathbf{x}_i^r(k+N|k) \|_{P_i}^2$$
where $\| \cdot \|_M^2$ denotes the squared weighted norm of a vector with respect to the matrix $M$, i.e., $\| \mathbf{x} \|_M^2 = \mathbf{x}^T M \mathbf{x}$.

4.3. Design of the Compatibility Constraints

To ensure formation tracking consistency under synchronized decision-making, each agent’s actual trajectory is required to deviate only slightly from the assumed trajectory communicated to the other agents. To this end, the compatibility constraint on position is defined as follows:
$$\| \hat{\mathbf{r}}_i(k+t|k) - \mathbf{r}_i(k+t|k) \|_2 = \varepsilon_i^p(k+t|k) \le \mu_i(k+t|k)$$

4.4. Design of Safety Constraints

Considering the configuration constraints of the tethered net deployment, the tracking spacecraft must satisfy the following safe distance constraints:
$$\| \mathbf{r}_i(k+t|k) - \mathbf{r}_j(k+t|k) \|_2 = \| \mathbf{r}_i(k+t|k) - \hat{\mathbf{r}}_j(k+t|k) + \hat{\mathbf{r}}_j(k+t|k) - \mathbf{r}_j(k+t|k) \|_2 \ge \| \mathbf{r}_i(k+t|k) - \hat{\mathbf{r}}_j(k+t|k) \|_2 - \varepsilon_j^p(k+t|k) \ge \| \mathbf{r}_i(k+t|k) - \hat{\mathbf{r}}_j(k+t|k) \|_2 - \mu_{ij}^O(k+t|k)$$
Hence, if the condition $\| \mathbf{r}_i(k+t|k) - \hat{\mathbf{r}}_j(k+t|k) \|_2 \ge 2r + \mu_{ij}^O(k+t|k)$ is satisfied, the safety constraint given in Equation (26) is guaranteed. Likewise, a sufficient condition for satisfying Equation (25) is $\| \mathbf{r}_i(k+t|k) - \hat{\mathbf{r}}_j(k+t|k) \|_2 \le 2R - \mu_{ij}^I(k+t|k)$.
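The margin construction above can be sketched numerically (a sketch assuming numpy; the function name `safety_margins` and the square formation coordinates are hypothetical, while $r = 10$ m, $R = 25$ m, and the ring communication topology come from the simulation section):

```python
import numpy as np

def safety_margins(r_hat, neighbors, r_min=10.0, R_max=25.0):
    """Back-off margins: mu^I shrinks the upper bound 2R, mu^O pads the
    lower bound 2r, and mu_i is the tightest margin over all neighbors."""
    mu_I, mu_O = {}, {}
    for i, js in neighbors.items():
        for j in js:
            d = np.linalg.norm(r_hat[i] - r_hat[j])
            mu_I[(i, j)] = (d - 2 * r_min) / 2.0
            mu_O[(i, j)] = (2 * R_max - d) / 2.0
    mu = {i: min(min(mu_I[(i, j)], mu_O[(i, j)]) for j in js)
          for i, js in neighbors.items()}
    return mu_I, mu_O, mu

# Hypothetical square formation (side 30 m) with a ring topology.
r_hat = {1: np.array([15., 15., 0.]), 2: np.array([15., -15., 0.]),
         3: np.array([-15., -15., 0.]), 4: np.array([-15., 15., 0.])}
nbrs = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
mu_I, mu_O, mu = safety_margins(r_hat, nbrs)
```

With adjacent spacecraft 30 m apart, both margins are positive, so the tightened constraints leave room for the compatibility deviation of the neighbors' assumed trajectories.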

4.5. Design of the Terminal Ingredients

4.5.1. Terminal Control

The assumed control and assumed state, which include the optimal solution from the previous step and the terminal component, are defined as follows:
$$\hat{\mathbf{u}}_i(k+t|k) = \begin{cases} \mathbf{u}_i^*(k+t|k-1), & t = 0, 1, \ldots, N-2 \\ \kappa_i(\mathbf{x}_i^*(k-1+N|k-1)), & t = N-1 \end{cases}$$
$$\hat{\mathbf{x}}_i(k+t|k) = \begin{cases} \mathbf{x}_i^*(k+t|k-1), & t = 0, 1, \ldots, N-1 \\ \mathbf{x}_i^{\kappa_i}(k-1+N|k-1), & t = N \end{cases}$$
The terminal control in the assumed control is constructed as follows:
$$\kappa_i(\mathbf{x}_i(k)) = K_{i1} \mathbf{x}_i(k) + K_{i2} \mathbf{x}_i^r(k)$$
Substituting the terminal control into the orbital dynamics (Equation (1)) and expressing them in difference form yields the following.
$$\Delta\mathbf{x}_i(k+1) = A\Delta\mathbf{x}_i(k) - (I_{2n} - A)\mathbf{x}_i^r(k) + BK_{i1}\mathbf{x}_i(k) + BK_{i2}\mathbf{x}_i^r(k) = (A + BK_{i1})\Delta\mathbf{x}_i(k) - (I_{2n} - A - BK_{i1} - BK_{i2})\mathbf{x}_i^r(k)$$
The gains $K_{i1}$ and $K_{i2}$ are computed such that the matrix $\Phi_i = A + BK_{i1}$ is stable and the equality $I_{2n} - A - BK_{i1} - BK_{i2} = 0$ holds. The expression can then be simplified as follows:
$$\Delta\mathbf{x}_i(k+1) = \Phi_i \Delta\mathbf{x}_i(k)$$

4.5.2. Terminal Cost

The terminal cost is defined in Equation (36), where P i is a positive definite matrix obtained as the solution to the Lyapunov equation, commonly represented as follows:
$$P_i \Phi_i + \Phi_i^T P_i + Q_i + 2K_{i1}^T R_i K_{i1} = -\bar{Q},$$
where $\bar{Q}$ is a predefined positive definite matrix. Additionally, it can be demonstrated that the terminal cost serves as a local control-Lyapunov function and satisfies the following condition. The detailed derivation is omitted here, as a similar process can be found in [24]:
$$\sum_{i \in N_A} \left[ L_i^f\left(\mathbf{x}_i(k+1), \mathbf{x}_i^r\right) - L_i^f\left(\mathbf{x}_i(k), \mathbf{x}_i^r\right) \right] + \sum_{i \in N_A} L_i\left(k, \mathbf{x}_i, \hat{\mathbf{x}}_i, \mathbf{x}_i^r, \kappa_i(\mathbf{x}_i)\right) \le 0, \quad \mathbf{x}_i \in X_i^f.$$

4.5.3. Terminal Set

The terminal set is determined according to the system dynamics and safety constraints, and is simplified as shown in Equation (46) [22]. It ensures that all states within this set strictly adhere to the safety constraints (Equations (30) and (31)) caused by the tethered net.
$$X_i^f = \left\{ \mathbf{x}_i \,\middle|\, \| \mathbf{r}_i^r - \mathbf{r}_i \|_2 \le \mu_i \right\}$$

4.6. Discussion of the Feasibility and Stability

At the kth time step, all follower spacecraft independently solve their respective optimization problems simultaneously. Provided a feasible solution exists at the initial time for each follower spacecraft, the optimization problem remains feasible at all subsequent time steps. Moreover, the entire system converges to a state of asymptotic stability.
Proof. 
(a) Feasibility: At the $k$th time step, it is assumed that a valid solution exists for the formation tracking of each follower spacecraft. Let the assumed control $\hat{\mathbf{u}}_i(k+t|k+1)$ and assumed state $\hat{\mathbf{x}}_i(k+t|k+1)$ serve as the feasible control $\tilde{\mathbf{u}}_i(k+t|k+1)$ and feasible state $\tilde{\mathbf{x}}_i(k+t|k+1)$. Since $\mathbf{u}_i^*(k+t|k)$ and $\mathbf{x}_i^*(k+t|k)$ are derived from the constrained optimization at the $k$th time step, they readily satisfy the constraints in Equations (28)–(32). Additionally, under the designed feedback control $\kappa_i(\mathbf{x}_i^*(k+N|k))$, the positively invariant set $X_i^f$ ensures that the safety constraints are met. Therefore, if a feasible solution exists at the $k$th time step for the optimization problem of each follower spacecraft, a feasible solution will also exist at the $(k+1)$th time step.
(b) Stability: The difference in the sum of optimal costs for all spacecraft between consecutive time steps can be expressed as follows:
$$J^*(k+1) - J^*(k) \le \sum_{i=1}^{N_a} \left( J_i(k+1, \tilde{\mathbf{x}}_i, \hat{\mathbf{x}}_i, \mathbf{x}_i^r, \tilde{\mathbf{u}}_i) - J_i(k, \mathbf{x}_i^*, \hat{\mathbf{x}}_i, \mathbf{x}_i^r, \mathbf{u}_i^*) \right) = \sum_{i=1}^{N_a} \left\{ \sum_{t=0}^{N-1} \left( L_i(k+1+t|k+1, \tilde{\mathbf{x}}_i, \hat{\mathbf{x}}_i, \mathbf{x}_i^r, \tilde{\mathbf{u}}_i) - L_i(k+t|k, \mathbf{x}_i^*, \hat{\mathbf{x}}_i, \mathbf{x}_i^r, \mathbf{u}_i^*) \right) + L_i^f(\tilde{\mathbf{x}}_i(k+1+N|k+1), \mathbf{x}_i^r) - L_i^f(\mathbf{x}_i^*(k+N|k), \mathbf{x}_i^r) \right\}$$
Further simplification can be obtained by substituting Equation (45) into the above, given that L i is a quadratic form defined by a positive semi-definite matrix, yielding the following:
$$J^*(k+1) - J^*(k) \le -\sum_{i=1}^{N_a} \left\{ \sum_{t=0}^{N-1} L_i(k+t|k, \mathbf{x}_i^*, \hat{\mathbf{x}}_i, \mathbf{x}_i^r, \mathbf{u}_i^*) \right\} \le 0$$

5. Simulation Results

In this section, numerical simulations are conducted to evaluate the proposed game theory-based virtual leader–follower control in addressing the multi-agent OPEG problem. The initial states of the virtual leader and the evader, along with the desired relative states of the followers, are provided in Table 2. For each simulation, the position of the evader is randomly selected within a certain range. The local orbital angular velocity is set to $\omega_0 = 7.2388 \times 10^{-5}\ \mathrm{s}^{-1}$. The coefficient bounding the maximum input of the virtual leader is specified as $\lambda_P = 0.4$, while the evader’s is limited to $\lambda_E = 0.1$ [25]. The weighting matrices are selected as $Q = I_6$ and $R = I_3$ [23]. Additionally, the capture threshold is $\delta = 10$ m. The learning rate is $\alpha = 0.001$ [26], and the activation function employed to approximate the optimal value function $V^*$ is chosen as follows:
$$\phi(\mathbf{x}) = [x_1^2, x_1 x_2, x_1 x_3, x_1 x_4, x_1 x_5, x_1 x_6, x_2^2, x_2 x_3, x_2 x_4, x_2 x_5, x_2 x_6, x_3^2, x_3 x_4, x_3 x_5, x_3 x_6, x_4^2, x_4 x_5, x_4 x_6, x_5^2, x_5 x_6, x_6^2, x_1^4, x_2^4, x_3^4, x_4^4, x_5^4, x_6^4]^T,$$
which results in a single hidden layer neural network architecture with 27 neuron nodes for approximating the optimal value function. The initial values of the neural network weights for the iterative process are given as follows:
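A compact way to generate this basis is the following sketch (assuming numpy; the ordering matches the activation function above: the 21 distinct quadratic monomials $x_i x_j$ with $i \le j$, followed by the 6 quartic terms $x_i^4$):

```python
import numpy as np
from itertools import combinations_with_replacement

def phi(x):
    """Polynomial basis: 21 quadratic monomials x_i*x_j (i <= j) in
    lexicographic order, followed by the 6 quartic terms x_i**4."""
    quad = [x[i] * x[j] for i, j in combinations_with_replacement(range(6), 2)]
    quart = [xi**4 for xi in x]
    return np.array(quad + quart)

v = phi(np.arange(1.0, 7.0))   # hypothetical state [1, 2, ..., 6]
```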
$$\mathbf{W}_0 = [0.90, 3.21, 5.11, 0.43, 3.45, 4.83, 2.11, 0.56, 1.46, 3.78, 1.25, 3.56, 4.28, 0.56, 2.53, 1.58, 5.32, 2.58, 6.75, 0.86, 3.57, 0.86, 6.12, 2.36, 2.25, 4.10, 0.87]^T.$$
To enrich the states for training the neural network weights, random noise $n(t) = 0.5 e^{-0.15t} \left( \sin^2(t)\cos(t) + \sin^2(2t)\cos(0.1t) \right)$ was injected into the $x$-axis control input of the virtual leader during the first 80 s.
Additionally, the prediction horizon length $N$ is set to 6, and the time step is $\mathrm{d}t = 0.1$ s. Considering the safety of each follower during formation tracking, the minimum distance between adjacent pursuers is set as $r = 10$ m, and the maximum distance between them is $R = 25$ m. The adjacency matrix $A_c$ that defines the communication topology among the followers is given as follows:
A_c =
[ 0 1 0 1
  1 0 1 0
  0 1 0 1
  1 0 1 0 ].
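The ring topology encoded by A_c, together with the safety bounds r and R, translates directly into the inter-agent distance constraint enforced during formation tracking; an illustrative sketch (not the authors' implementation):

```python
import numpy as np

# Ring communication topology among the four followers (the matrix A_c above)
A_C = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]])

R_MIN, R_MAX = 10.0, 25.0  # safety bounds on adjacent-pursuer spacing, m

def neighbors(i, adj=A_C):
    """Indices of followers that communicate with follower i."""
    return np.flatnonzero(adj[i])

def formation_safe(positions, adj=A_C, r=R_MIN, R=R_MAX):
    """Check r <= ||p_i - p_j|| <= R for every communicating pair (i, j)."""
    for i in range(len(positions)):
        for j in neighbors(i, adj):
            d = np.linalg.norm(np.asarray(positions[i]) - np.asarray(positions[j]))
            if not (r <= d <= R):
                return False
    return True
```

Only pairs connected in A_c are checked, which matches the distributed setting: each follower constrains its distance to its communication neighbors, not to every other agent.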
In this work, based on the initial condition x_E0 = [100; 200; 150; 0; 0; 0], the simulation results of the OPEG are presented in Figures 5–14. Specifically, Figure 5 illustrates the NE trajectories of both spacecraft, while Figure 6 depicts the evolution of the distance between them. From these figures, it can be observed that the evader attempts to escape, whereas the virtual leader actively minimizes the distance d_PE. Owing to its higher acceleration capability, the virtual leader gradually closes the gap and captures the evader. Figure 7 illustrates the convergence behavior of the neural network weights, which eventually stabilize at the following values:
W = [ 1.54 , 3.38 , 0.26 , 3.21 , 2.23 , 0.55 , 1.96 , 3.56 , 0.11 , 2.56 , 0.98 , 4.01 , 1.87 , 1.21 , 0.36 , 1.01 , 1.26 , 0.87 , 2.56 , 3.21 , 0.74 , 1.54 , 0.25 , 3.21 , 0.45 , 1.21 , 0.64 ]
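The pursuit–evasion trajectories above are propagated under linearized relative orbital dynamics in the Euler–Hill frame; a minimal sketch of the standard Clohessy–Wiltshire state-space matrices for ω_0 and the capture test against δ (illustrative, not the paper's code):

```python
import numpy as np

OMEGA0 = 7.2388e-5  # local orbital angular velocity, 1/s
DELTA = 10.0        # capture threshold, m

def cw_matrices(w=OMEGA0):
    """Clohessy-Wiltshire relative dynamics, state x = [x, y, z, vx, vy, vz]:
    xdot = A x + B u, with x radial, y along-track, z cross-track."""
    A = np.zeros((6, 6))
    A[0:3, 3:6] = np.eye(3)
    A[3, 0] = 3.0 * w**2   # radial stiffness term
    A[3, 4] = 2.0 * w      # Coriolis coupling
    A[4, 3] = -2.0 * w
    A[5, 2] = -w**2        # cross-track oscillator
    B = np.vstack([np.zeros((3, 3)), np.eye(3)])
    return A, B

def captured(x_rel, delta=DELTA):
    """Capture occurs when the pursuer-evader distance d_PE drops below delta."""
    return bool(np.linalg.norm(np.asarray(x_rel)[:3]) < delta)
```

Under this linearization, the zero-sum game plays out on the relative state between the virtual leader and the evader, and a run terminates once `captured` returns true.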
The weight convergence condition is defined using the infinity norm of the difference between two consecutive weight vectors: training is considered converged when ∥Ŵ_{k+1} − Ŵ_k∥_∞ < 0.1. Substituting the converged weights into Equation (18) yields the NE control for both the virtual leader and the evader. Based on this control strategy, we conducted 600 randomized simulations to investigate how the relative initial positions between the evader and the virtual leader influence the capture outcome. The evader's initial positions are sampled in the YZ, XZ, and XY planes passing through the origin. Under this NE control policy, all simulations resulted in successful capture within 100 s. Moreover, the control effort required by the virtual leader is generally positively correlated with the initial distance to the evader, as illustrated in Figure 8. To further evaluate the generalization ability of the NE control policy, we expand the sampling region of the evader's initial positions to an unseen area and conduct another 600 randomized simulations. The results are shown in Figure 9, where red crosses denote evader initial positions that were not captured within 300 s, and the color of each point indicates the required capture time. A success rate of 95.16% is achieved in the unseen region, demonstrating the generalization capability of the proposed NE control policy.
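The stopping rule can be written compactly; a minimal sketch:

```python
import numpy as np

TOL = 0.1  # convergence threshold on the weight update

def has_converged(w_prev, w_next, tol=TOL):
    """Training stops when the infinity norm of the change between two
    consecutive weight vectors falls below tol."""
    diff = np.asarray(w_next, dtype=float) - np.asarray(w_prev, dtype=float)
    return bool(np.linalg.norm(diff, ord=np.inf) < tol)
```

Because the infinity norm takes the largest per-weight change, a single weight still moving by more than 0.1 keeps the iteration running even if every other weight has settled.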
Following the NE trajectory of the virtual leader, Figure 10 illustrates the trajectory and configuration of the TSNR system under SDMPC control during the formation tracking simulation. Three representative states of the tethered net's motion are highlighted in the figure. The blue segments of the tethered net indicate the distances between neighboring nodes. When the relative distances among the four follower spacecraft remain within an appropriate range, the net maintains slight tension, contributing to overall system stability. Furthermore, by defining the reference trajectories of the followers in the body-fixed frame rather than the Euler–Hill frame, the capture angle during the net-capturing process is effectively reduced; the supporting simulation result is shown in Figure 11. In addition, a simulation analysis was conducted to evaluate the formation control performance under different prediction horizons, as illustrated in Figure 12, where the vertical axis represents the average distance between adjacent spacecraft. The results show that a larger prediction horizon slightly improves formation control performance but also increases the computational load: as the prediction horizon grows from N = 4 to N = 10, the average computation time per prediction step rises from 0.18 s through 0.25 s and 0.33 s to 0.46 s.
To further demonstrate the advantages of the proposed SDMPC approach, a comparative analysis is conducted against the APF method, a widely acknowledged state-of-the-art technique in formation tracking applications [7]. Figure 13 shows the real-time lengths of the formation edges: the proposed SDMPC approach reaches the desired formation more quickly and better mitigates the overshoot caused by the tethered-net dynamics. Additionally, Figure 14 presents the control effort of the follower spacecraft under both methods; the total control effort consumed by SDMPC is 938.26 m²/s⁴, compared with 1060.04 m²/s⁴ for APF.
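The effort figures reported here and in Table 3 are consistent with a summed-squared-acceleration metric; a sketch under that assumed definition (the exact expression used in the paper is not restated in this section):

```python
import numpy as np

def control_effort(u_hist):
    """Total control effort as the sum of squared control accelerations over
    all time steps; with u in m/s^2 this yields the m^2/s^4 units reported."""
    u = np.asarray(u_hist, dtype=float)
    return float(np.sum(u**2))
```

Summing over all followers' control histories gives a single scalar per run, which is what allows the SDMPC and APF totals to be compared directly.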
To ensure the generalizability of the results, 600 Monte Carlo simulation runs were conducted with randomized initial positions of the evader during the pursuit–evasion phase. Figure 15 presents the trajectory and formation configuration of the TSNR system under SDMPC-based tracking, corresponding to the initial condition x E 0 = [ 20 ; 30 ; 80 ; 0 ; 0 ; 0 ] . This Monte Carlo simulation also provided sufficient data to support a statistical comparison of performance metrics across different formation tracking methods. The performance metrics include control effort, maximum formation deviation, and formation error convergence time. Specifically, control effort represents the energy cost of control; maximum formation deviation denotes the largest error in the distance between adjacent spacecraft compared to the desired spacing; and formation error convergence time refers to the time required for the maximum formation deviation to fall below 0.5 m. As shown in Table 3, the proposed SDMPC method controls formation errors more effectively while also achieving lower control effort consumption.
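The mean ± standard deviation entries of Table 3 follow from a routine summary of the 600 Monte Carlo runs; an illustrative helper (sample standard deviation assumed):

```python
import numpy as np

def summarize(samples):
    """Mean and sample standard deviation of a metric across Monte Carlo runs."""
    a = np.asarray(samples, dtype=float)
    return float(a.mean()), float(a.std(ddof=1))

def compare(metric, sdmpc_vals, apf_vals):
    """Format one Table-3-style row comparing the two controllers."""
    (m1, s1), (m2, s2) = summarize(sdmpc_vals), summarize(apf_vals)
    return f"{metric}: SDMPC {m1:.2f} ± {s1:.2f} | APF {m2:.2f} ± {s2:.2f}"
```

Applying `compare` to the per-run control effort, maximum formation deviation, and convergence time reproduces the structure of Table 3.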

6. Conclusions

In this study, a game-theoretic virtual leader–follower tracking control strategy is proposed for orbital pursuit–evasion systems involving tethered space net robots. At the OPEG level, which involves the virtual leader and the evader, an ADP approach incorporating a saturation function is employed to handle input constraints and compute the NE trajectories of the orbital zero-sum game. At the formation tracking level, the followers' SDMPC strategy ensures robust formation tracking under safety constraints and uncertain net dynamics, with theoretical analysis provided for feasibility and stability. The Monte Carlo simulation results demonstrate that the proposed system successfully captures the target while maintaining safe inter-spacecraft distances, outperforming existing formation tracking methods in both formation maintenance and control effort. The designed capture strategy also maintains a small capture angle, which reduces the impact force on the tethered net during engagement. It is worth noting that the communication conditions considered in this work are relatively idealized; in future work, we plan to account for communication delays, noise, and packet loss to enhance the robustness of the system's cooperative control.

Author Contributions

Conceptualization, Z.Z. and C.W.; methodology, C.W.; software, C.W.; validation, C.W. and J.L.; formal analysis, Z.Z. and C.W.; investigation, C.W.; data curation, C.W.; writing—original draft preparation, C.W.; writing—review and editing, Z.Z. and C.W.; visualization, C.W.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Natural Science Foundation of China under Grant No. 12072269, and the Innovation Foundation for Doctor Dissertation of Northwestern Polytechnical University under Grant No. CX2021049.

Data Availability Statement

The data presented in this paper are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Aglietti, G.S.; Taylor, B.; Fellowes, S.; Salmon, T.; Retat, I.; Hall, A.; Steyn, W.H. The active space debris removal mission RemoveDebris. Part 2: In orbit operations. Acta Astronaut. 2020, 168, 310–322. [Google Scholar] [CrossRef]
  2. Svotina, V.V.; Cherkasova, M.V. Space debris removal–Review of technologies and techniques. Flexible or virtual connection between space debris and service spacecraft. Acta Astronaut. 2023, 204, 840–853. [Google Scholar] [CrossRef]
  3. Liu, Y.; Ma, Z.; Zhang, F.; Huang, P. Time-varying formation planning and scaling control for tethered space net robot. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 6717–6728. [Google Scholar] [CrossRef]
  4. Zhu, W.; Pang, Z.; Du, Z.; Gao, G.; Zhu, Z.H. Multi-debris capture by tethered space net robot via redeployment and assembly. J. Guid. Control Dyn. 2024, 10, 1359–1376. [Google Scholar] [CrossRef]
  5. Zhang, F.; Huang, P. Releasing dynamics and stability control of maneuverable tethered space net. IEEE/ASME Trans. Mechatron. 2016, 22, 983–993. [Google Scholar] [CrossRef]
  6. Ma, Y.; Zhang, Y.; Liu, Y.; Huang, P.; Zhang, F. An active energy management distributed formation control for tethered space net robot via cooperative game theory. Acta Astronaut. 2025, 227, 57–66. [Google Scholar] [CrossRef]
  7. Ma, Y.; Zhang, Y.; Huang, P.; Liu, Y.; Zhang, F. Game theory based finite-time formation control using artificial potentials for tethered space net robot. Chin. J. Aeronaut. 2024, 37, 358–372. [Google Scholar] [CrossRef]
  8. Du, Z.; Zhang, H.; Wang, Z.; Yan, H. Model predictive formation tracking-containment control for multi-UAVs with obstacle avoidance. IEEE Trans. Syst. Man Cybern. Syst. 2024, 54, 3404–3414. [Google Scholar] [CrossRef]
  9. Nie, Y.; Li, X. Antidisturbance distributed lyapunov-based model predictive control for quadruped robot formation tracking. IEEE Trans. Ind. Electron. 2025. [Google Scholar] [CrossRef]
  10. Li, J.; Li, C. Guidance strategy of motion camouflage for spacecraft pursuit-evasion game. Chin. J. Aeronaut. 2024, 37, 312–319. [Google Scholar] [CrossRef]
  11. Jia, Z.; Ye, D.; Xiao, Y.; Sun, Z. Closed-Loop Strategy Synthesis for Real-Time Spacecraft Pursuit-Evasion Games in Elliptical Orbits. IEEE Trans. Aerosp. Electron. Syst. 2025. [Google Scholar] [CrossRef]
  12. Liu, Y.; Zhang, Y.; Jiang, J.; Li, C. Multiple-to-one orbital pursuit: A computational game strategy. IEEE Trans. Aerosp. Electron. Syst. 2024, 61, 2213–2225. [Google Scholar] [CrossRef]
  13. Shen, H.; Casalino, L. Revisit of the three-dimensional orbital pursuit-evasion game. J. Guid. Control Dyn. 2018, 41, 1820–1831. [Google Scholar] [CrossRef]
  14. Shen, M.; Wang, X.; Zhu, S.; Wu, Z.; Huang, T. Data-driven event-triggered adaptive dynamic programming control for nonlinear systems with input saturation. IEEE Trans. Cybern. 2023, 54, 1178–1188. [Google Scholar] [CrossRef]
  15. Lv, Y.; Ren, X. Approximate Nash solutions for multiplayer mixed-zero-sum game with reinforcement learning. IEEE Trans. Syst. Man Cybern. Syst. 2018, 49, 2739–2750. [Google Scholar] [CrossRef]
  16. Zhang, T.; Zhang, S.; Li, H.; Zhao, X. Velocity-free prescribed-time orbit containment control for satellite clusters under actuator saturation. Adv. Space Res. 2025, 75, 5110–5123. [Google Scholar] [CrossRef]
  17. Yao, Q.; Li, Q.; Xie, S.; Jahanshahi, H. Distributed predefined-time robust adaptive control design for attitude consensus of multiple spacecraft. Adv. Space Res. 2025, 75, 7473–7486. [Google Scholar] [CrossRef]
  18. Wei, C.; Wu, X.; Xiao, B.; Wu, J.; Zhang, C. Adaptive leader-following performance guaranteed formation control for multiple spacecraft with collision avoidance and connectivity assurance. Aerosp. Sci. Technol. 2022, 120, 107266. [Google Scholar] [CrossRef]
  19. Xue, X.; Wang, X.; Han, N. Leader-Following Connectivity Preservation and Collision Avoidance Control for Multiple Spacecraft with Bounded Actuation. Aerospace 2024, 11, 612. [Google Scholar] [CrossRef]
  20. Grimm, F.; Kolahian, P.; Zhang, Z.; Baghdadi, M. A sphere decoding algorithm for multistep sequential model-predictive control. IEEE Trans. Ind. Appl. 2021, 57, 2931–2940. [Google Scholar] [CrossRef]
  21. Wu, J.; Dai, L.; Xia, Y. Iterative distributed model predictive control for heterogeneous systems with non-convex coupled constraints. Automatica 2024, 166, 111700. [Google Scholar] [CrossRef]
  22. Dai, L.; Cao, Q.; Xia, Y.; Gao, Y. Distributed MPC for formation of multi-agent systems with collision avoidance and obstacle avoidance. J. Franklin Inst. 2017, 354, 2068–2085. [Google Scholar] [CrossRef]
  23. Mu, C.; Wang, K. Approximate-optimal control algorithm for constrained zero-sum differential games through event-triggering mechanism. Nonlinear Dyn. 2019, 95, 2639–2657. [Google Scholar] [CrossRef]
  24. Jiang, Y.; Hu, S.; Damaren, C.; Luo, L.; Liu, B. Trajectory planning with collision avoidance for multiple quadrotor UAVs using DMPC. Int. J. Aeronaut. Space Sci. 2023, 24, 1403–1417. [Google Scholar] [CrossRef]
  25. Zhu, W.; Pang, Z.; Si, J.; Gao, G. Dynamics and configuration control of the Tethered Space Net Robot under a collision with high-speed debris. Adv. Space Res. 2022, 70, 1351–1361. [Google Scholar] [CrossRef]
  26. Yang, Y.; Fan, X.; Xu, C.; Wu, J.; Sun, B. State consensus cooperative control for a class of nonlinear multi-agent systems with output constraints via ADP approach. Neurocomputing 2021, 458, 284–296. [Google Scholar] [CrossRef]
Figure 1. Graphical representation of the orbital pursuit–evasion system with TSNR.
Figure 2. Diagram illustrating the structure of the proposed leader–follower scheme.
Figure 3. Diagram illustrating the capture angle.
Figure 4. The computational schematic of the SDMPC method for spacecraft formation tracking.
Figure 5. Trajectories of the virtual leader and the evader during the OPEG.
Figure 6. Relative position between the virtual leader and the evader during the OPEG.
Figure 7. Convergence of the weights of the optimal value function.
Figure 8. Colored scatter plot of initial positions, annotated by control effort.
Figure 9. Colored scatter plot of unseen initial positions, annotated by capture time.
Figure 10. Trajectories of the followers and configuration of the tethered net under the formation tracking control.
Figure 11. The variation in the capture angle under two different reference trajectories during formation tracking.
Figure 12. The average inter-agent distance with different predictive horizons.
Figure 13. Comparison of formation distance variations with the APF method.
Figure 14. Comparison of input variations with the APF method.
Figure 15. Trajectories of the followers and configuration of the tethered net under the formation tracking control with x_E0 = [20; 30; 80; 0; 0; 0].
Table 1. Description of different control and state variables.

Symbol | Description
û_i    | Assumed control
x̂_i    | Assumed state
ũ_i    | Feasible control
x̃_i    | Feasible state
u_i*   | Optimal control
x_i*   | Optimal state
Table 2. Initial states of the spacecraft in the Euler–Hill frame, and expected relative states between followers in the body-fixed frame.

          | x_P0 | x_E0    | d_12^r | d_23^r | d_34^r | d_41^r
x (m)     | 0    | 0 ± 200 | 0      | −20    | 0      | 20
y (m)     | 0    | 0 ± 200 | 0      | 0      | 0      | 0
z (m)     | 0    | 0 ± 200 | 20     | 0      | −20    | 0
ẋ (m/s)   | 0.01 | 0       | 0      | 0      | 0      | 0
ẏ (m/s)   | 0    | 0       | 0      | 0      | 0      | 0
ż (m/s)   | 0    | 0       | 0      | 0      | 0      | 0
Table 3. Comparison of performance metrics (mean ± standard deviation) between SDMPC and APF.

Metric                               | SDMPC           | APF
Control Effort (m²/s⁴)               | 975.46 ± 101.78 | 1031.25 ± 106.19
Maximum Formation Deviation (m)      | 8.66 ± 2.65     | 11.56 ± 3.11
Formation Error Convergence Time (s) | 39.56 ± 4.26    | 42.83 ± 5.67
