Abstract
A Stackelberg equilibrium–based Model Reference Adaptive Control (MSE) method is proposed for spacecraft Pursuit–Evasion (PE) games with incomplete information and sequential decision making under a non–zero–sum framework. First, the spacecraft PE dynamics under perturbation are mapped to a dynamic Stackelberg game model. Next, the equilibrium problem is solved via coupled Riccati equations, yielding the evader's optimal control strategy. Finally, a model reference adaptive algorithm enables the pursuer to dynamically adjust its control gains. Simulations show that the MSE strategy outperforms the Nash Equilibrium (NE) and Single–step Prediction Stackelberg Equilibrium (SSE) methods, achieving 25.46% faster convergence than SSE and 39.11% lower computational cost than NE.
1. Introduction
The rapid development of space technology has increased the importance of spacecraft in many fields, such as space exploration, satellite communication, and military reconnaissance [1,2,3]. At the same time, the problem of Pursuit–Evasion (PE) games between spacecraft has become increasingly significant [4]. In mission scenarios such as satellite formation flight, space debris cleanup, and space confrontation, developing strategies for efficient pursuit has become a key technical challenge [5,6,7]. For example, in satellite formation flight, each satellite must maintain a specific relative position; once a deviation occurs, the satellite must be returned to its predetermined orbit through a suitable control strategy, a decision–making process similar to pursuit–evasion tracking [8]. In space debris cleanup missions, the pursuer must develop a strategy to approach and capture debris, which can be regarded as a target with an intention to evade [9]. These practical application requirements drive the in–depth investigation of the spacecraft PE game problem.
When modeling the PE game problem, traditional control methods usually adopt a special structure: a competitive scenario in which the pursuer's and evader's cost functions sum to zero, i.e., a zero–sum dynamic game [10]. In recent years, zero–sum games have attracted extensive attention. Within this framework, the PE game converges to a saddle–point equilibrium, which is determined by solving the Hamilton–Jacobi–Isaacs (HJI) equation [11]. In [12], an integral reinforcement learning algorithm was developed for a class of zero–sum differential games with entirely unknown linear system dynamics. Reference [13] extends the zero–sum game concept to nonlinear systems, integrating neural network methods into adaptive dynamic programming to derive the saddle–point equilibrium solution in such nonlinear settings. The study in [14] also addresses the nonlinear zero–sum differential game problem and proposes a model–free, neural network–based control algorithm, providing new ideas for solving such problems. However, in practical situations, the goals of pursuers and evaders are not completely opposed; the zero–sum assumption leads to overly aggressive strategy design, neglects the reasonable allocation of safety and resources, and limits strategic flexibility. In some cases, the zero–sum framework can even raise ethical dilemmas, as it encourages absolute victory for one party without considering the losses of the other [15].
While zero–sum games simplify analysis, they often fail to capture real–world scenarios where objectives are not strictly adversarial. In contrast, non–zero–sum frameworks accommodate partial competition and cooperation, enabling strategies that balance safety, fuel efficiency, and mission success; a non–zero–sum framework more accurately models the complex interplay of competition and potential cooperation between the two parties. It allows for more flexible strategies that can adapt to different mission requirements and consider long–term stability [16,17]. This has led to the development of methods like Q–learning for non–zero–sum games with incomplete information [18] and optimal continuous thrust strategies within a differential game framework [19].
A significant challenge in PE games is the inherent asymmetry of information and the sequential nature of decision making. Most traditional game–theoretic solutions are based on the Nash Equilibrium (NE), which assumes that players act simultaneously with full knowledge of each other’s strategies [20]. This assumption is often invalid in practice, as information gathering and decision making involve inherent time lags. A more suitable theoretical basis is the Stackelberg equilibrium, which models sequential decision making [21]. In the PE context, the evader often acts as the “leader”, making a move first, while the pursuer is the “follower”, reacting to the leader’s action. Therefore, the PE game is more accurately modeled as a dynamic Stackelberg game [22]. While prior works have applied Stackelberg models to surface vehicles [23] and spacecraft using single–step prediction [24], our work distinguishes itself by integrating a Model Reference Adaptive Control architecture. This approach allows the pursuer to dynamically adjust its control gains based on a reference model of the evader’s optimal strategy, offering enhanced robustness against unmodeled dynamics and disturbances, which is a primary focus of this paper.
Recent approaches to spacecraft pursuit–evasion under perturbations [24] typically employ linearized approximations and fixed–gain control, resulting in suboptimal tracking performance. Our Stackelberg game–based framework introduces three key innovations, as detailed in Table 1. (1) Nonlinear coupling preservation reduces position errors by 62% through exact modeling. (2) Dynamic adaptation enables 22% fuel savings via multi–horizon optimization. (3) Global stability is guaranteed (Theorem 1), overcoming local convergence limitations. While the proposed method operates at 24.8 ms/step compared to 18.7 ms/step in [24], it achieves 6.42% faster execution than the SSE baseline (26.5 ms/step) through parallel computation.
Table 1.
Comparative analysis with state–of–the–art approaches.
The main contributions of this paper are as follows:
- 1.
- In a non–zero–sum framework, a Stackelberg equilibrium–based Model Reference Adaptive Control (MSE) method is proposed for the spacecraft PE game. This method incorporates the dynamic Stackelberg equilibrium game model and uses the Riccati equation to derive the optimal control strategy for the evader. Subsequently, an adaptive control algorithm enables the pursuer to adjust its control gains adaptively. This approach is novel, as it specifically addresses the challenges of perturbations and non–zero–sum game dynamics within the PE context, which has not been thoroughly explored in prior literature.
- 2.
- The existence and uniqueness of the solution to the Stackelberg game model are rigorously proven. Furthermore, the proposed MSE algorithm is compared with traditional methods, such as Nash Equilibrium (NE) [18] and Single–step Prediction Stackelberg Equilibrium (SSE) [24]. Numerical simulations demonstrate that the MSE algorithm offers significant advantages in terms of computational efficiency (e.g., an average generation time of 24.79 ms, which is 7.36% less than SSE and 39.11% less than NE), fuel consumption, pursuit success rate, and disturbance rejection capability.
- 3.
- While Stackelberg games have been applied to spacecraft PE problems, the novelty of this work lies in unifying Riccati–based Stackelberg solutions with model reference adaptive control, addressing perturbations and incomplete information, a gap in prior work.
The remainder of this article is organized as follows. In Section 2, the spacecraft PE game model is introduced and mapped onto a Stackelberg equilibrium system. In Section 3, the MSE tracking control method is proposed. Subsequently, Section 4 conducts simulation experiments. Finally, Section 5 summarizes the article.
2. Problem Formulation
2.1. System Model
In spacecraft PE games, the relative dynamics equations of the spacecraft are described in the Local Vertical Local Horizontal (LVLH) coordinate system, as illustrated in Figure 1. The LVLH frame is centered on a virtual reference spacecraft following a nominal reference orbit. The x-axis points from the Earth’s center toward the origin, the y-axis lies within the orbital plane, perpendicular to the x-axis and aligned with the direction of the spacecraft’s motion, while the z-axis is determined by the right–hand rule, coinciding with the direction of the orbital angular momentum.
Figure 1.
Schematic diagram of the pursuit and evasion process in the LVLH coordinate system.
The dynamics of a single spacecraft in the Local Vertical Local Horizontal (LVLH) coordinate system can be described as follows [24]:
These equations represent the relative motion in the LVLH frame, including the primary effects of the Earth's oblateness (the J2 perturbation). The terms involving n² describe the linearized gravitational and centrifugal forces, while the terms with the factor 2n represent Coriolis forces. The J2 perturbation introduces additional forces that cause long–term drift, which are captured by the trigonometric terms and constants defined in Appendix A. We denote the state and control input for the pursuer as x_p and u_p, and for the evader as x_e and u_e, respectively. The control inputs represent the thrust acceleration vectors. The relative state of the PE game is defined as x = x_p − x_e. After feedback linearization and discretization, the relative dynamics of the PE game can be expressed as:
The 6–dimensional state vector and the 3–dimensional control inputs imply that the control gain matrices map the state space to the control space; hence, their dimensions are 3 × 6. The adaptive gain matrices are positive definite to ensure compatibility with the state–space update laws (Equation (28)). The system matrices are derived from the continuous–time dynamics.
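To make the discrete–time model concrete, the following sketch propagates a relative state of the form x(k+1) = A x(k) + B u_p(k) − B u_e(k), using an Euler–discretized Clohessy–Wiltshire–style matrix as a stand–in for the paper's system matrices. All names and values here are illustrative assumptions; the paper's matrices additionally contain the J2 terms from Appendix A.

```python
import numpy as np

def cw_matrices(n: float, dt: float):
    """Continuous-time Clohessy-Wiltshire-style relative dynamics,
    discretized with a first-order (Euler) approximation.
    State x = [rx, ry, rz, vx, vy, vz]; inputs are 3-axis thrust accelerations.
    Illustrative only -- the paper's model additionally includes J2 terms."""
    Ac = np.zeros((6, 6))
    Ac[0:3, 3:6] = np.eye(3)      # position derivative = velocity
    Ac[3, 0] = 3 * n**2           # linearized gravitational/centrifugal terms (n^2)
    Ac[5, 2] = -n**2
    Ac[3, 4] = 2 * n              # Coriolis coupling terms (2n)
    Ac[4, 3] = -2 * n
    Bc = np.vstack([np.zeros((3, 3)), np.eye(3)])
    A = np.eye(6) + dt * Ac       # Euler discretization
    B = dt * Bc
    return A, B

def step(x, up, ue, A, B):
    """Relative-state update: with x = x_p - x_e, the two thrusts enter with opposite signs."""
    return A @ x + B @ up - B @ ue
```

Replacing `cw_matrices` with the exact J2–perturbed matrices would recover the discretized model in (2); only the propagation pattern is shown here.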
2.2. Dynamic Stackelberg Pursuit–Evasion Game
In the spacecraft PE problem, the evader acts as the leader in a Stackelberg game, and the pursuer is the follower. In this Stackelberg game, the information structure is sequential and asymmetric: the leader (evader) commits to a strategy first, and the follower (pursuer) observes the leader's action and then chooses its own strategy to optimize its objective function. This does not imply direct real–time communication of the entire strategy, but rather that the pursuer's decision–making process at time t is based on knowledge of the evader's action at time t. The decision–making process is modeled as a dynamic game in which each player optimizes a cost function. The cost functions for the pursuer (J_p) and the evader (J_e) are defined as:
In this equation, the weighting matrices on the relative state error are 6 × 6 and positive definite, while the weighting matrices on the control effort of the pursuer and the evader are 3 × 3 and positive definite. These matrices are design parameters used to tune the controller's performance.
Here, the remaining weighting matrices are similarly defined and positive definite.
A Stackelberg equilibrium is a pair of leader and follower strategies that satisfies:
It is important to distinguish the Stackelberg equilibrium condition in (5) from Pontryagin’s Minimum Principle (PMP). PMP provides necessary conditions for a single control to be optimal for a given objective function. In contrast, a Stackelberg Equilibrium is a solution concept for a bi–level game, defining a pair of strategies where neither player has an incentive to unilaterally deviate, given the hierarchical decision structure.
We model this dynamic interaction as the following two–stage Stackelberg optimization problem:
The hierarchical structure of the minimization problem itself acts as the primary constraint embodying the Stackelberg game logic; the inner minimization of with respect to is solved first for a given , yielding an optimal response function for the pursuer, . The outer minimization of is then solved by the leader (evader), who anticipates the follower’s rational response. The system dynamics in (2) serve as a hard constraint that both players must adhere to.
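The two–stage structure can be illustrated with a one–step scalar version of the game, in which the leader's optimum is computed against the follower's closed–form best response. All symbols and weights below are hypothetical, chosen only to show the bi–level logic of the inner and outer minimizations.

```python
# One-step scalar Stackelberg game (illustrative numbers, all hypothetical):
# relative state x1 = a*x0 + bp*up - be*ue; the pursuer (follower) drives x1 to 0,
# while the evader (leader) tries to keep a separation d at limited control effort.
a, bp, be = 1.0, 1.0, 0.8
qp, rp = 1.0, 0.5           # follower (pursuer) weights
qe, re, d = 1.0, 0.2, 2.0   # leader (evader) weights and desired separation
x0 = 1.0

def follower_best_response(ue):
    """Inner minimization: argmin_up  qp*(a*x0 + bp*up - be*ue)**2 + rp*up**2."""
    return -qp * bp * (a * x0 - be * ue) / (qp * bp**2 + rp)

def leader_cost(ue):
    """Outer cost, evaluated at the follower's anticipated best response."""
    up = follower_best_response(ue)
    x1 = a * x0 + bp * up - be * ue
    return qe * (x1 - d)**2 + re * ue**2

# Closed-form leader optimum: x1(ue) = c*(a*x0 - be*ue), with c = rp/(qp*bp**2 + rp)
c = rp / (qp * bp**2 + rp)
s = c * be
ue_star = qe * s * (c * a * x0 - d) / (qe * s**2 + re)
up_star = follower_best_response(ue_star)
```

The leader's cost is evaluated along the follower's reaction curve, which is exactly the anticipation step in the outer minimization of (6).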
3. Stackelberg Equilibrium–Based Model Reference Adaptive Control Algorithm
3.1. Resolution of Stackelberg Game
The Stackelberg PE game formulated in (6) is a bi–level, constrained optimization problem, where the system dynamics (2) serve as a hard constraint. To solve it, we first transform it into an unconstrained problem by incorporating the dynamics into the cost functions through a set of positive definite matrices. The derivation of the optimal control laws via the coupled Riccati Equations (7) and (8) relies on standard assumptions in optimal control theory: for a solution to exist, the system dynamics matrix pairs must be stabilizable, and the weighting matrices in the cost functions are chosen to be positive definite, which ensures the convexity of the optimization problem. This transformation leads to an unconstrained optimization problem with the cost functions (3) and (4), on which the Stackelberg PE control strategy is designed. For the pursuer's (follower's) cost function, the follower makes its decision after observing the leader's (evader's) action. Therefore, to find the pursuer's optimal strategy, we substitute the system dynamics (2) into the summation term. The optimal linear control strategies take the state–feedback form u_p(k) = K_p(k) x(k) and u_e(k) = K_e(k) x(k), where the gain matrices K_p and K_e are of dimension 3 × 6, mapping the 6–dimensional state space to the 3–dimensional control input space. Minimizing the respective cost functions yields a set of coupled discrete–time Riccati equations. For the evader (leader), the Riccati equation is:
The evader’s optimal strategy is derived via a Riccati equation, which encodes the cost trade–offs between state deviation and control effort. For the pursuer, the equation incorporates the evader’s anticipated actions, reflecting the hierarchical Stackelberg structure. For the pursuer (follower), the Riccati equation is:
These Riccati equations are solved backward in time, starting from the terminal boundary conditions and , which are the terminal cost matrices from (3) and (4).
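Since the coupled equations themselves are displayed above, the backward–in–time pattern they follow can be sketched with the standard single–player finite–horizon Riccati recursion. This is a simplified stand–in, not the paper's coupled solver: Equations (7) and (8) additionally couple the two players' matrices through the coupling matrix introduced below.

```python
import numpy as np

def lqr_backward(A, B, Q, R, QN, N):
    """Finite-horizon discrete-time Riccati backward pass (single-player LQR).
    The coupled pursuer/evader Equations (7)-(8) follow the same
    terminal-to-initial sweep, starting from the terminal cost matrix QN."""
    P = QN.copy()
    gains = []
    for _ in range(N):
        # feedback gain at this step: K = (R + B'PB)^{-1} B'PA
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update, equivalent to Q + A'PA - A'PB (R + B'PB)^{-1} B'PA
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()   # gains[k] is the gain applied at time step k
    return gains, P
```

Each player's equation in (7) and (8) performs the same backward sweep, with cross terms from the other player's gain added to the update.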
To provide a unified expression, we introduce the coupling matrix , defined as:
The optimal control gain vector can then be solved for simultaneously:
Mathematically, a Nash Equilibrium solves a set of simultaneous optimization problems, one per player. In contrast, the Stackelberg equilibrium solves the hierarchical problem shown in (6). This structural difference, captured by the coupled Riccati Equations (7) and (8) and the resulting asymmetric gain solution in (10), avoids the unrealistic assumption of simultaneous moves inherent in Nash models. In terms of computational complexity, the main burden is solving the N coupled Riccati equations, which scales linearly with the horizon N. Each step involves matrix multiplications and an inversion of the coupling matrix, both of constant cost per step. The online adaptive update is computationally light.
3.2. Existence and Uniqueness of Stackelberg Equilibria
For a unique Stackelberg equilibrium to exist, the following conditions must be met [25]:
- 1.
- The strategy sets for the leader and follower are non–empty compact convex sets.
- 2.
- For a given leader’s strategy, a unique optimal solution for the follower exists.
- 3.
- For a given follower’s strategy, a unique optimal solution for the leader exists.
We now verify that these three conditions are met:
- 1.
- (Non–empty compact convex set): The strategy spaces are subsets of Euclidean space and are non–empty. The cost functions (3) and (4) are quadratic, and thus continuous, in the control actions. Since they are strictly convex and tend to infinity as the norms of the control actions approach infinity, a minimum exists and the set of optimal strategies is compact and convex.
- 2.
- (Unique follower solution): For any given leader's strategy, the follower's (pursuer's) cost function in (4) is strictly convex in the follower's control. Strict convexity is guaranteed because the effective quadratic weighting on the control is positive definite: the control weighting matrix is chosen positive definite, and the Riccati solution, being positive definite, makes the additional quadratic term positive semi–definite. A strictly convex function has a unique minimum.
- 3.
- (Unique leader solution): Similarly, when the follower's strategy is given, the leader's (evader's) cost function in (3) is strictly convex in the leader's control, as its control weighting matrix is designed to be positive definite. This guarantees a unique optimal solution for the leader.
Furthermore, the uniqueness of the solution is guaranteed by the invertibility of the coupling matrix in Equation (10), which is ensured by the conditions above.
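The role of positive definiteness in conditions 2 and 3 can be checked numerically: once the dynamics are substituted in, each player's cost is a quadratic in its own control with a positive definite Hessian, so the stationary point is the unique minimizer. The matrices below are randomly generated placeholders, not the paper's actual weightings.

```python
import numpy as np

rng = np.random.default_rng(0)

def quad(H, g, x):
    """Quadratic cost f(x) = 0.5 x'Hx + g'x -- the form each player's cost
    takes in its own control once the dynamics are substituted in."""
    return 0.5 * x @ H @ x + g @ x

# Positive definite Hessian (analogue of the positive definite control weighting)
M = rng.standard_normal((3, 3))
H = M.T @ M + np.eye(3)
g = rng.standard_normal(3)

# Unique stationary point of a strictly convex quadratic: H x* = -g
x_star = np.linalg.solve(H, -g)

# Every perturbation strictly increases the cost, so the minimizer is unique
for _ in range(100):
    dvec = rng.standard_normal(3)
    assert quad(H, g, x_star + dvec) > quad(H, g, x_star)
```

This is exactly the argument used in conditions 2 and 3: positive definiteness of the Hessian gives strict convexity, and strict convexity gives uniqueness.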
3.3. Stackelberg Equilibrium–Based Model Reference Adaptive Control Algorithm
After obtaining the evader's optimal control strategy in the Stackelberg game, we design a model reference adaptive controller for the pursuer. The controller uses the evader's optimal strategy as a reference, allowing the pursuer to dynamically adjust its own controller parameters via observation and feedback to minimize error and achieve an efficient pursuit.
The adaptive update law for the pursuer's estimate of the evader's gain is designed as a robust gradient–descent–type law derived from the estimation error dynamics.
where the estimation error measures the mismatch between the reference gain and its estimate. The nonlinear function is used in place of a linear error term to achieve better performance: it provides a high effective gain when the error is small, leading to faster convergence, and limits the gain when the error is large, which enhances robustness to noise. It is defined as:
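A hypothetical saturating nonlinearity with exactly these properties (high effective gain near zero error, bounded gain for large errors) is the normalized error below. The paper's actual definition may differ, so this is only an illustrative stand–in.

```python
import numpy as np

def robust_error(e, delta=0.1):
    """Saturating error nonlinearity (hypothetical form, chosen to match the
    stated properties): the effective gain f(e)/e = 1/(delta + |e|) is large
    for small errors and decays for large errors, limiting noise amplification.
    The output magnitude is bounded by 1."""
    return e / (delta + np.abs(e))
```

In an update law, `robust_error` would replace the raw error term, trading the unbounded gain of a linear law for bounded, noise-tolerant corrections.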
Once the pursuer obtains a converged estimate of the evader's gain, it calculates its own optimal control gain using:
The stability of this adaptive system is proven using a Lyapunov–based analysis. By selecting an appropriate Lyapunov function candidate and designing the parameter update laws, it can be shown that the tracking and parameter estimation errors are uniformly ultimately bounded.
The overall control architecture is depicted in the flowchart in Figure 2, and a timeline of the process is shown in Figure 3.
Figure 2.
Stackelberg equilibrium–based model reference adaptive control for spacecraft Pursuit–Evasion game flowchart.
Figure 3.
Stackelberg equilibrium–based model reference adaptive control for spacecraft Pursuit–Evasion game timeline flowchart.
Figure 2 illustrates the overall control architecture. The process is divided into two main parts: the Model Reference Part and the Adaptive Control Part. In the Model Reference Part (top), the system calculates the optimal reference gain for the evader, . This involves initializing parameters, building the mathematical model, and iteratively solving the Riccati Equations (7) and (8) to compute the optimal strategies. The resulting optimal evader gain is then fed as a reference to the Adaptive Control Part (bottom). Here, the pursuer observes the system state and uses an adaptive law to update its own strategy to effectively track the evader’s behavior.
Figure 3 provides a timeline perspective of a single iteration cycle of the pursuer’s adaptive strategy generation. It begins with the ‘Pursuer Observation and Estimation’ phase, where the pursuer acquires the state of the evader. This is followed by ‘Error Feedback and Optimization’, where the error is calculated and parameters are updated based on the reference model. This leads to ‘Adaptive Strategy Generation’ and finally ‘Strategy execution and closure’. The cycle repeats, allowing for continuous adaptation.
Theorem 1.
Under Assumptions 1–7 below, the tracking and parameter estimation errors of the closed–loop system are uniformly ultimately bounded [25]:
- 1.
- The system dynamics (2) are Lipschitz continuous with respect to state and parameter variations, i.e., there exists such that:
- 2.
- The reference signal is uniformly bounded.
- 3.
- The disturbance is –bounded with known upper bound:
- 4.
- There exist constants and such that the regressor vector satisfies the persistent excitation condition:
- 5.
- The initial parameter estimation errors satisfy:
- 6.
- The matrix is uniformly positive definite, i.e., there exists such that:
- 7.
- The adaptive gain matrices are symmetric positive definite, with eigenvalues bounded by:and the higher–order term (h.o.t.) coefficient satisfies:where are design constants.
Proof of Theorem 1.
Based on the assumption of system observability, according to (2), the control gain estimator is designed as follows:
where the predicted state vector is constructed from the estimated relative state and the known input coupling matrix. The true system dynamics follow:
with the ideal (unknown) parameter matrix and a term accounting for bounded disturbances. The prediction error then satisfies:
The tracking error dynamics are governed by:
where the ideal closed–loop matrix governs the nominal error decay and the remaining quantities are parameter estimation errors. Consider the Lyapunov function candidate:
where satisfies the discrete–time algebraic Riccati equation (DARE):
The parameter update law is designed as:
with symmetric positive definite learning rate matrices. The Lyapunov difference is expanded as follows:
Substituting the error dynamics (25), we derive:
Using the DARE, this simplifies to:
Assuming the parameter update laws:
the parameter error differences are:
Combining all terms yields:
By selecting sufficiently large and appropriate learning rates , we ensure:
Defining the estimation error
then (35) can be written as
According to (36), the design of the adaptive update rule is
where M denotes a constant and the gain matrix is positive definite; the nonlinear function is defined as
where the adjustable parameter scales the nonlinearity, and the unit step function equals 1 when its argument is positive and 0 otherwise. When the switching condition is satisfied, the controller sets the gain to the adjustable parameter value. Equation (38) shows that the Lyapunov difference is negative as long as the errors remain outside a residual set. This does not guarantee asymptotic stability, but it does prove that the tracking and parameter estimation errors are uniformly ultimately bounded, as stated in Theorem 1. If, in addition, the regressor satisfies the persistent excitation condition (Equation (17)), the parameter errors converge at an exponential rate, which in turn ensures exponential convergence of the tracking error to a small neighborhood of the origin. □
To consolidate the proposed method, Algorithm 1 provides a step–by–step procedure for implementation. The overall approach consists of an offline stage to compute the reference Stackelberg strategies and an online adaptive stage where the pursuer refines its strategy.
| Algorithm 1 Stackelberg Equilibrium–Based Model Reference Adaptive Control |
| Require: System matrices as defined in (2)–(4); estimation threshold Ensure: Optimal control strategies (pursuer) and (evader) |
- Lines 3–9 (Offline Reference Calculation): The algorithm first calculates the optimal control gains for a finite horizon N. It iterates backward in time, computing the coupling matrix and solving for the optimal gains at each step using the Riccati updates from (7) and (8). This produces the optimal evader gain , which will serve as the reference.
- Lines 10–18 (Online Pursuer Adaptation): This loop represents the online adaptive process. The pursuer evaluates the estimation error (line 11) and refines its estimate of the evader’s gain, , using the robust update law from (37) (line 12).
- Lines 13–16 (Convergence Check): The adaptation continues until the change in the estimated gain becomes small, determined by the threshold . At this point, the algorithm has converged to a good estimate of the evader’s current strategy.
- Line 19 (Final Pursuer Strategy): With the best available estimate of the evader’s strategy, the pursuer calculates its own optimal control gain using Equation (13). This gain is then used to control the pursuer spacecraft.
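The two stages of Algorithm 1 can be sketched end to end under simplifying assumptions: a single–player Riccati pass stands in for the coupled Equations (7) and (8), and the evader's gain is estimated from observed state–input pairs with a plain gradient law rather than the robust law (37). All function names and parameters here are illustrative.

```python
import numpy as np

def offline_reference_gain(A, B, Q, R, N):
    """Offline stage (Algorithm 1, lines 3-9): backward Riccati pass producing
    a reference gain -- a simplified single-player stand-in for (7)-(8)."""
    P = Q.copy()
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

def online_adaptation(K_true, x_traj, gamma=0.5, tol=1e-6, max_iter=5000):
    """Online stage (Algorithm 1, lines 10-18): the pursuer refines its estimate
    of the evader's gain from observed (state, input) pairs by gradient descent
    on the prediction error, stopping when the update falls below a threshold."""
    K_hat = np.zeros_like(K_true)
    for k in range(max_iter):
        x = x_traj[k % len(x_traj)]          # persistently exciting states
        e = (K_true - K_hat) @ x             # observed minus predicted input
        update = gamma * np.outer(e, x)      # gradient-type update law
        K_hat += update
        if np.linalg.norm(update) < tol:     # convergence check (lines 13-16)
            break
    return K_hat
```

With a converged estimate in hand, the pursuer would then compute its own gain as in line 19, via Equation (13).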
4. Simulation Experiment
The reference orbital radius in the spacecraft model is , the orbital angular velocity , the Earth's gravitational constant , and the sampling period . The algorithm is set with the parameters , , , , where denotes the identity matrix of appropriate dimensions. The initial state of the model is
The number of iterations is N = 100. According to the algorithm, the control gains are obtained as follows:
The Persistent Excitation condition is naturally satisfied in our simulation due to the evader’s dynamic maneuvers and the pursuer’s continuous adaptation, ensuring sufficient signal richness for parameter convergence. The initial relative states of the PE spacecraft are set as shown in Table 2.
Table 2.
Initial relative state of the pursuit and evader spacecraft.
All simulation results presented are based on the initial conditions specified in Table 2 unless otherwise noted. For the performance comparison plots (Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9), a single representative simulation run is shown to illustrate the typical dynamic behavior of the strategies.
Figure 4.
The law of the relative motion state of the Pursuit–Evasion spacecraft. (a) The law of variation of relative position with time steps. (b) Trajectory diagram of pursuit and evader.
Figure 5.
Acceleration changes of spacecraft under three strategies. (a) The law of variation of . (b) The law of variation of . (c) The law of variation of . (d) The law of variation in the magnitude of acceleration.
Figure 6.
Cost function and fuel consumption of pursuit spacecraft over time. (a) Pursuit spacecraft cost function over time. (b) Pursuit spacecraft fuel consumption curve over time.
Figure 7.
The relative distance between the two spacecraft varies with time.
Figure 8.
Relative velocity variation.
Figure 9.
Time consumption in generating three strategies.
Figure 4 demonstrates the simulation results of the model under the MSE strategy. The figure shows the evolution of the relative motion state of the two spacecraft; as the number of iterations increases, the relative position tends to zero. This indicates that, under the designed game control strategy, the pursuer, as the follower, is able to adapt to the actions of the evader by dynamically adjusting its control strategy, gradually closing the distance between the two and finally achieving the pursuit.
To further verify the effectiveness of the proposed control algorithm, the Stackelberg equilibrium–based model reference adaptive control (MSE) algorithm constructed in this paper is compared with the single–step predictive Stackelberg equilibrium (SSE) algorithm [24] and the non–zero–sum Nash equilibrium (NE) algorithm [18]. The simulation results are presented as follows.
Figure 5 illustrates the acceleration variations of the spacecraft in the three directions for the three strategies. The MSE algorithm takes into account the dynamic interaction between the pursuer and the evader, allowing the pursuer to respond quickly and take more aggressive actions in the early stages of the game. This results in a large acceleration in the initial phase, followed by a rapid decrease and stabilization, indicating that the strategy quickly adjusts the spacecraft state to match the motion of the evader.
Figure 6a illustrates the changes in the pursuer's cost function over time for the three strategies. The MSE strategy has the fastest–growing cost function, rising 53.71% faster than the SSE strategy during the first 2000 s of the game before eventually stabilizing. This reflects the strategy's cost–effectiveness under the leader–follower structure: with the evader in the leadership position, the pursuer makes aggressive adjustments in the early stages, closes the distance more quickly, and thereby reduces long–term costs. Figure 6b shows the fuel consumption of the pursuer for the three strategies. During the initial phase of the MSE strategy, fuel consumption increases rapidly, consistent with the more aggressive early actions taken to respond quickly to the motion of the evader. Over time, the increase in fuel consumption slows and stabilizes at a relatively low level, demonstrating the algorithm's advantage under the fuel constraints of spacecraft operations.
Figure 7 illustrates the relative distance between the two spacecraft over time. Under the MSE strategy, the relative distance decreases rapidly and tends to zero, closing the gap within 1200 s at a rate 25.46% faster than the SSE strategy and about 32.14% faster than the NE strategy, which demonstrates its advantage in pursuit efficiency.
Figure 8 shows the relative velocity changes between the two spacecraft. Consistent with the pursuit efficiency shown in Figure 7, the MSE strategy achieves the fastest reduction in relative velocity, decreasing 10.47% faster than the SSE strategy in the initial 3000 s. This rapid decrease is crucial for a successful capture, as it allows the pursuer to quickly match the evader’s velocity vector and stabilize the relative dynamics.
Figure 9 compares the computational time distributions of the SSE, NE, and MSE strategies, based on 100 Monte Carlo trials per scenario. The boxplot shows that the median generation time of MSE (24.79 ms) is 7.36% and 39.11% lower than that of SSE (26.75 ms) and NE (40.70 ms), respectively, with a smaller variance (standard deviation 0.452 ms for MSE vs. 0.804 ms for NE), indicating its advantages in computational efficiency and stability.
To further verify the disturbance rejection performance of the proposed algorithm, interference signals are injected into the algorithm's input to simulate disturbances in the spacecraft PE game. The simulation parameters of the interference signals are listed in Table 3. Position measurement noise is the disturbance introduced when the pursuer acquires the evader's position; signal loss interference gives the pursuer a 5% probability of losing the evader's signal at each iteration of the algorithm; and signal delay interference denotes the time delay in receiving the evader's motion state.
Table 3.
Interference factor settings.
Figure 10a compares the success rates of the three algorithms under different interference conditions. In the 'No Interference' case, the MSE algorithm achieves a lower success rate than the SSE algorithm; this stems from the adaptive control mechanism of MSE, which continuously updates its parameters, whereas SSE relies on predictive optimization. Under interference, however, MSE demonstrates stronger robustness, maintaining high performance under measurement noise and signal loss, whereas NE and SSE exhibit significant performance degradation. Figure 10b analyzes the position error with standard deviation: MSE consistently satisfies the 20% safety threshold across all interference scenarios, including noise and signal loss, while NE and SSE show higher error variability, particularly under time delays. These results confirm the reliability of MSE in disturbance–prone environments.
Figure 10.
Comparative success rates of MSE, SSE, and NE algorithms under noise, signal loss, and time delay conditions.
5. Conclusions
This study presents a novel framework that combines Stackelberg game theory with the Riccati method and model reference adaptive control to resolve dynamic optimization challenges in spacecraft Pursuit-Evasion. By unifying these methodologies, the proposed MSE approach effectively addresses the intricacies of non-zero-sum games and perturbations, ensuring optimal control strategies under realistic conditions. This integration offers a significant advancement over conventional techniques by explicitly accounting for the sequential decision making and dynamic interplay between the spacecraft. Central to this work is a model reference adaptive control mechanism, which enables the pursuer to refine its strategy in real time based on the evader's behavior. The optimal solution for the Stackelberg equilibrium is analytically derived through coupled Riccati equations, with rigorous proofs establishing its existence and uniqueness.
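The adaptation idea can be illustrated with a minimal model reference adaptive control sketch on a scalar plant, using the classic MIT-rule gain update. This is not the paper's multi-dimensional update law or its coupled Riccati solution; all plant, model, and adaptation parameters here are illustrative:

```python
dt = 0.01             # Euler integration step (s)
gamma = 0.5           # adaptation gain (assumed)
a_m, b_m = 2.0, 2.0   # reference model: dx_m/dt = -a_m*x_m + b_m*r
a_p, b_p = 1.0, 1.5   # plant: dx/dt = -a_p*x + b_p*theta*r, theta adapted

x, x_m, theta = 0.0, 0.0, 0.0
for _ in range(5000):                 # 50 s of simulated time
    r = 1.0                           # constant reference command
    e = x - x_m                       # tracking error w.r.t. reference model
    # MIT rule: move the feedforward gain down the gradient of e^2 / 2
    theta -= dt * gamma * e * (b_p * r)
    x += dt * (-a_p * x + b_p * theta * r)
    x_m += dt * (-a_m * x_m + b_m * r)

print(round(x, 2), round(x_m, 2))     # both settle near the model's steady state
```

With a constant reference, the gain converges so that the plant output tracks the reference model; the same principle, generalized to matrix gains, underlies the pursuer's online adjustment described above.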
Extensive simulations validate the efficacy of the proposed method, demonstrating superior computational efficiency, minimal fuel expenditure, high pursuit success rates, and exceptional robustness against perturbations compared to NE and SSE strategies. These results underscore the method’s capability to deliver reliable and adaptive solutions for complex PE scenarios.
Despite the promising results, this work has limitations that open avenues for future research. The robustness analysis was conducted via scenario–based simulations; a direction for future work would be to formally integrate uncertainties like time delays into the system model itself and design a controller with certified robust stability guarantees. Additionally, the performance comparison could be expanded to include other optimal control benchmarks like Model Predictive Control (MPC). Future work will focus on addressing these areas to further enhance the practical applicability of the proposed framework.
Author Contributions
Conceptualization, G.G.; methodology, G.G.; software, G.G.; validation, G.G.; formal analysis, G.G.; investigation, G.G.; resources, G.G.; data curation, G.G.; writing—original draft preparation, G.G.; writing—review and editing, M.C., H.Z. and S.L.; visualization, G.G.; supervision, M.C.; project administration, M.C.; funding acquisition, M.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the Natural Science Foundation of Beijing Municipality, grant number 3252017.
Data Availability Statement
The data presented in this study are available on request from the corresponding author.
Conflicts of Interest
The authors declare that they have no competing interests. They have no financial or personal relationships with other people or organizations that could inappropriately influence their work. The research was conducted independently, and the results are presented without bias.
Appendix A. Parameters and Equations
The following provides additional information on the parameters in the LVLH reference frame appearing in Equation (1). Among these parameters are the inclination of the reference orbit and the inclinations of the two spacecraft orbits, together with the initial conditions of the relative state between the two spacecraft. c denotes the orbital eccentricity correction factor, q denotes the frequency shift caused by perturbation, and the remaining quantities are the corresponding dynamic coupling parameters.
References
- Wong, K.K.L.; Chipusu, K. In-space cybernetical intelligence perspective on informatics, manufacturing and integrated control for the space exploration industry. J. Ind. Inf. Integr. 2024, 42, 100724. [Google Scholar] [CrossRef]
- Ye, M.; Chen, C.L.P.; Zhang, T. Hierarchical Dynamic Graph Convolutional Network with Interpretability for EEG-Based Emotion Recognition. IEEE Trans. Neural Netw. Learn. Syst. 2022, 42, 1–12. [Google Scholar] [CrossRef]
- Li, Q.; Yan, J.; Zhu, J.; Huang, T.; Zang, J. State of the Art and Development Trends of Top-Level Demonstration Technology for Aviation Weapon Equipment. Acta Aeronaut. Astronaut. Sin. 2016, 37, 1–15. [Google Scholar] [CrossRef]
- Zhao, L.-R.; Dang, Z.-H.; Zhang, Y.-L. Orbital Game: Concepts, Principles and Methods. J. Command. Control 2021, 7, 215. [Google Scholar]
- Vela, C.; Opromolla, R.; Fasano, G. A low-thrust finite state machine based controller for N-satellites formations in distributed synthetic aperture radar applications. Acta Astronaut. 2023, 202, 686–704. [Google Scholar] [CrossRef]
- Yao, J.; Xu, B.; Li, X.; Yang, S. A clustering scheduling strategy for space debris tracking. Aerosp. Sci. Technol. 2025, 157, 109805. [Google Scholar] [CrossRef]
- Zhou, X.; Yang, X.; Ye, X.; Li, B. Dual generative adversarial networks for merging ocean transparency from satellite observations. GISci. Remote Sens. 2024, 61, 1. [Google Scholar] [CrossRef]
- Gu, Y.; Sun, X.; Fan, W. A fast star-ground coverage analysis method based on elevation angle visual element model. CEAS Aeronaut. J. 2025, 46, 330372. [Google Scholar] [CrossRef]
- Kreps, D. Game theory and economic modelling. J. Econ. Educ. 1990, 23, 2. [Google Scholar] [CrossRef]
- Başar, T.; Olsder, G.J. Dynamic Noncooperative Game Theory, 2nd ed.; Chapter 7: Stackelberg Equilibria of Infinite Dynamic Games; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1998; Volume 23, pp. 365–422. [Google Scholar] [CrossRef]
- Abu-Khalaf, M.; Lewis, F.L.; Huang, J. Policy Iterations on the Hamilton–Jacobi–Isaacs Equation for State Feedback Control with Input Saturation. IEEE Trans. Autom. Control 2006, 51, 1989–1995. [Google Scholar] [CrossRef]
- Li, H.; Liu, D.; Wang, D. Integral Reinforcement Learning for Linear Continuous-Time Zero-Sum Games with Completely Unknown Dynamics. IEEE Trans. Autom. Sci. Eng. 2014, 11, 706–714. [Google Scholar] [CrossRef]
- Wei, Q.; Liu, D.; Lin, Q.; Song, R. Adaptive Dynamic Programming for Discrete-Time Zero-Sum Games. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 957–969. [Google Scholar] [CrossRef] [PubMed]
- Zhong, X.; He, H.; Wang, D.; Ni, Z. Model-Free Adaptive Control for Unknown Nonlinear Zero-Sum Differential Game. IEEE Trans. Cybern. 2018, 48, 1633–1646. [Google Scholar] [CrossRef] [PubMed]
- Sun, Z.; Yang, S.; Piao, H.; Bai, C.; Ge, J. A survey of air combat artificial intelligence. Acta Aeronaut. Astronaut. Sin. 2021, 42, 25799. [Google Scholar] [CrossRef]
- Xiong, T.; Zhang, R.; Liu, J.; Huang, T.; Liu, Y.; Yu, F.R. A blockchain-based and privacy-preserved authentication scheme for inter-constellation collaboration in Space-Ground Integrated Networks. Comput. Netw. 2022, 206, 108793. [Google Scholar] [CrossRef]
- Hellmann, J.K.; Stiver, K.A.; Marsh-Rollo, S.; Alonzo, S.H. Defense against outside competition is linked to cooperation in male–male partnerships. Behav. Ecol. 2019, 31, 432–439. [Google Scholar] [CrossRef]
- Zheng, Z.; Zhang, P.; Yuan, J. Nonzero-Sum Pursuit-Evasion Game Control for Spacecraft Systems: A Q-Learning Method. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 3971–3981. [Google Scholar] [CrossRef]
- Wang, H.; Zhang, Y.; Liu, H.; Zhang, K. Impulsive thrust strategy for orbital pursuit-evasion games based on impulse-like constraint. Chin. J. Aeronaut. 2025, 38, 103180. [Google Scholar] [CrossRef]
- Zhang, P.; Zhang, Y. Two-Step Stackelberg Approach for the Two Weak Pursuers and One Strong Evader Closed-Loop Game. IEEE Trans. Autom. Control 2024, 69, 1309–1315. [Google Scholar] [CrossRef]
- Eltoukhy, A.E.; Wang, Z.; Chan, F.T.; Fu, X. Data analytics in managing aircraft routing and maintenance staffing with price competition by a Stackelberg-Nash game model. Transp. Res. Part E Logist. Transp. Rev. 2019, 122, 143–168. [Google Scholar] [CrossRef]
- Han, C.; Huo, L.; Tong, X.; Wang, H.; Liu, X. Spatial Anti-Jamming Scheme for Internet of Satellites Based on the Deep Reinforcement Learning and Stackelberg Game. IEEE Trans. Veh. Technol. 2020, 69, 5331–5342. [Google Scholar] [CrossRef]
- Hu, X.; Liu, S.; Xu, J.; Xiao, B.; Guo, C. Integral reinforcement learning based dynamic stackelberg pursuit-evasion game for unmanned surface vehicles. Alexandria Eng. J. 2024, 108, 428–435. [Google Scholar] [CrossRef]
- Liu, Y.; Li, C.; Jiang, J.; Zhang, Y. A model predictive Stackelberg solution to orbital pursuit-evasion game. Chin. J. Aeronaut. 2025, 38, 103198. [Google Scholar] [CrossRef]
- Lancaster, P.; Rodman, L. Algebraic Riccati Equations. Birkhäuser 2005, 108, 289–318. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).