1. Introduction
With the rapid development of space technology, space competition has intensified and threats to space security are on the rise. In response to this complex and changing space situation, countries around the world are competing to enhance spacecraft performance, achieving significant breakthroughs in computational capability, maneuverability, and intelligence. Against this backdrop, orbital games [1] have received widespread attention, with typical application scenarios including orbital pursuit–evasion [2], orbital defense [3], and orbital pursuit–defense [4]. However, a single satellite is limited by computational resources, maneuvering energy, and payload capacity, which leads to evident limitations when executing tasks in orbit. The cooperative deployment of multi-spacecraft systems has therefore become a research hotspot in aerospace, and it is essential to develop high-performance distributed cooperative control technologies. Distributed spacecraft cooperative control methods are fundamental to tasks such as in-orbit servicing, space target surveillance, and orbital games. Given the complexity of future orbital tasks, as well as the high operational costs and limited resources of spacecraft, trajectory planning must consider spacecraft safety while optimizing objectives such as fuel consumption and time efficiency. The flight trajectories of multiple spacecraft therefore need to be optimized to achieve the best cooperative approach strategy.
Spacecraft trajectory planning methods have been widely studied. Traditional planning methods can generally be categorized into two main types: indirect methods and direct methods. Indirect methods adopt an analytical optimization approach based on optimal control theory: they reformulate the optimal control problem as a boundary value problem (BVP) and derive the first-order necessary conditions (FONCs) for optimal trajectories using variational principles. However, as spacecraft missions grow increasingly complex, trajectory planning problems also scale in size and difficulty, with more constraints being introduced. Such complexity significantly increases the difficulty of deriving and numerically solving the FONCs, leading to higher computational costs. Moreover, indirect methods are highly sensitive to initial values, which limits their practical application in real-world engineering scenarios [5]. Direct methods discretize the state and control variables, transforming the continuous-time optimal control problem into a finite-dimensional nonlinear programming (NLP) problem that can be solved using numerical optimization techniques. Although direct methods sacrifice some accuracy compared with indirect methods, they have gained widespread attention due to their numerical robustness, efficiency, and ability to handle complex constraints. Gradient-based techniques [6] leverage numerical optimization algorithms such as sequential quadratic programming (SQP) to solve trajectory planning problems, while convexification approaches [7] transform non-convex optimization problems into convex ones to improve computational efficiency. Some researchers have attempted to combine direct and indirect methods to leverage their complementary advantages in addressing complex spacecraft trajectory planning problems [8]. However, direct methods tend to converge to local optima due to their gradient-based nature, and their computational cost grows as constraints are added.
Heuristic methods provide innovative approaches to trajectory optimization, with representative techniques including Genetic Algorithms (GA) [9], Particle Swarm Optimization (PSO) [10], and Ant Colony Optimization (ACO) [11] demonstrating success in solving complex optimization problems. Such methods are particularly effective for complex nonlinear, multi-modal, non-convex, or discontinuous optimization problems, which are common in engineering. Their strong global search capabilities effectively address sensitivity to initial conditions, and they have demonstrated outstanding performance in engineering applications [11]. Early spacecraft trajectory planning typically focused on optimizing a single objective, such as fuel consumption or flight time. In recent years, researchers have increasingly turned toward multi-objective trajectory optimization problems, whose solution can be viewed as determining the Pareto front. Advanced multi-objective optimization algorithms, such as MOPSO [12] and NSGAII [13,14,15], have already been successfully applied to single-spacecraft trajectory optimization. Such methods explore the solution space through iterative optimization. To improve the computational efficiency of spacecraft trajectory planning, the Clohessy–Wiltshire (CW) equations are typically adopted as the foundational model [13,15,16]. This model is obtained by linearizing the exact nonlinear equations of relative motion in the Local Vertical Local Horizontal (LVLH) coordinate system. The simplifications made during the derivation result in insufficient accuracy for elliptical orbits, limiting its applicability to circular orbits and short-range approach scenarios. However, the linearity of the CW equations enables the explicit derivation of the relationship between impulsive maneuvers and terminal states, allowing rapid evaluation of solution quality in each iteration without numerical integration. This not only enhances computational efficiency but also facilitates future engineering implementation for on-orbit applications. Therefore, this study focuses on approach decision-making methods for near-circular orbits based on the CW equations. Future work could extend the proposed trajectory planning method to elliptical orbits by reducing the complexity of solving the nonlinear dynamic equations and improving computational efficiency [17]. Additionally, intelligent methods such as imitation learning [18] and reinforcement learning [19] could be combined to achieve approach control for long-distance, high-eccentricity orbits.
When faced with more complex space missions, single-spacecraft systems encounter significant limitations due to constraints on computational resources, maneuvering energy, and payload capacity. In contrast, multi-spacecraft systems, with their high fault tolerance, flexibility, and efficiency, have gradually become a research focus in the aerospace field and are widely applied in tasks such as formation flying, orbital maintenance, and deep space exploration [20]. Centralized control requires a central node to gather global information and make unified decisions; however, given the constraints on spacecraft sensing, communication, and computing capabilities, achieving efficient and reliable on-orbit decision-making in this way is challenging. In comparison, distributed control spreads computational tasks among multiple nodes, with each node processing only information from itself and nearby nodes. This significantly reduces computational complexity, improves efficiency, and is better suited to high-dynamic cooperative operations. In [21], the computational efficiency of spacecraft formation reconfiguration path planning using a distributed method is approximately seven times higher than that of a centralized method, highlighting that distributed control is a key technological trend for addressing complex tasks in future high-dynamic space environments. In recent years, numerous spacecraft swarm control methods have emerged, but most are based on continuous maneuvering models and primarily target tasks such as formation maintenance and orbital adjustment. Common methods include leader–follower control [22], artificial potential fields [23], and swarm control [24]. These methods often fail to meet the demands of long-distance maneuvering and rapid target approach under an impulsive maneuvering mode. To achieve coordinated target approach for multiple spacecraft, there is an urgent need for a cooperative path-planning method suited to the impulsive maneuvering mode.
Distributed negotiation strategies are often used to achieve task balancing in multi-spacecraft systems. In [25], a networked game model based on a game-theoretic negotiation mechanism is proposed, in which individual actions are updated iteratively through cooperation with neighbors to reach a Nash equilibrium. Similarly, in [26], the self-organizing task allocation problem in multi-spacecraft systems is modeled as a potential game, and a distributed task allocation algorithm based on game learning is proposed. Inspired by this, this study uses a distributed negotiation method to determine the cooperative approach strategy. However, due to information asymmetry among spacecraft under distributed conditions, decentralized computational methods struggle to plan approach paths from a global perspective, so global optimality cannot be guaranteed. To address this, this paper studies cooperative approach strategies in multi-spacecraft systems under distributed conditions. Facing challenges such as complex constraints, opponent interception, information asymmetry, and the tendency to fall into local optima, a cooperative game negotiation strategy combining offline trajectory planning and online distributed negotiation is proposed. The main contributions are as follows:
A multi-objective optimization model considering multiple constraints is established, and a constraint-handling mechanism based on the constraint dominance principle (CDP) is introduced. By combining it with the NSGAII method, the NSGAII-CDP algorithm is designed to efficiently generate the Pareto front, thereby obtaining a set of approach paths that satisfy the constraints.
The negotiation strategy among multiple spacecraft is modeled as a cooperative game, with defined players, strategy space, and local reward functions. On this basis, the existence and convergence of the Nash equilibrium under distributed conditions are theoretically analyzed and verified by constructing an exact potential function.
A distributed negotiation strategy based on simulated annealing is proposed, effectively overcoming the problem of negotiation strategies tending to fall into local optima and improving global optimization performance.
The remaining sections of this paper are organized as follows.
Section 2 describes the multi-spacecraft coordinated approach problem and the relative dynamics model.
Section 3 presents the multi-objective optimization algorithm, including the optimization variables, objective functions, constraints, mathematical model, and the design of NSGAII-CDP. In
Section 4, the distributed negotiation strategy based on simulated annealing is presented, and the existence and convergence of the Nash equilibrium are discussed.
Section 5 demonstrates the effectiveness and superiority of the proposed method through numerical simulations. Finally,
Section 6 draws the conclusions of this paper.
3. Multi-Objective Optimization
This section takes the ith pursuer as an example to illustrate the design and application of the multi-objective optimization algorithm. The optimization variables consist of the time intervals between consecutive impulses and the velocity increments associated with each impulse.
To facilitate the constraint on the maximum velocity increment of each impulse, the kth velocity increment vector $\Delta \mathbf{v}_k$ is transformed into spherical coordinates $(\Delta v_k, \alpha_k, \beta_k)$, where $\Delta v_k$ denotes the magnitude of the velocity increment, $\alpha_k$ is the angle between the velocity vector and the x–y plane, and $\beta_k$ is the angle between the projection of the velocity vector onto the x–y plane and the x-axis. Therefore, $\Delta \mathbf{v}_k$ is defined as
$$\Delta \mathbf{v}_k = \Delta v_k \left[ \cos\alpha_k \cos\beta_k, \; \cos\alpha_k \sin\beta_k, \; \sin\alpha_k \right]^{\mathrm{T}}.$$
Define $\Delta \mathbf{v}_k^{\mathrm{ECI}}$ as the velocity increment corresponding to the transformation of $\Delta \mathbf{v}_k$ into the Earth-Centered Inertial (ECI) coordinate system. For K impulses, the optimization variables can be represented as the parameter set $\mathbf{X} = \{\Delta t_1, \Delta v_1, \alpha_1, \beta_1, \ldots, \Delta t_K, \Delta v_K, \alpha_K, \beta_K\}$, where $\Delta t_k$ is the time interval preceding the kth impulse, and the total number of optimization variables is $4K$.
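For illustration, the following Python sketch maps the spherical parameters back to a Cartesian impulse vector (the function and variable names are illustrative, not from the original formulation):

```python
import numpy as np

def impulse_to_cartesian(dv_mag, alpha, beta):
    """Map the spherical parameters (magnitude dv_mag, elevation alpha above
    the x-y plane, azimuth beta from the x-axis) to a Cartesian impulse."""
    return dv_mag * np.array([np.cos(alpha) * np.cos(beta),
                              np.cos(alpha) * np.sin(beta),
                              np.sin(alpha)])
```

Under this parameterization, bounding $\Delta v_k$ directly in $[0, \Delta v_{\max}]$ enforces the per-impulse magnitude constraint without any additional projection step.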
3.1. Objective Functions
Considering the timeliness of space game tasks, pursuers are required to complete the approach task within a relatively short period. Furthermore, the total fuel consumption during the mission and the successful completion of the approach task are also crucial. Therefore, three optimization objectives are defined as follows:

The total flight time:
$$J_1 = \sum_{k=1}^{K} \Delta t_k$$

The total fuel consumption:
$$J_2 = \sum_{k=1}^{K} \Delta v_k$$

The terminal distance:
$$J_3 = d_{i,T}(t_f)$$
where $d_{i,T}$ denotes the distance between pursuer $P_i$ and target $T$. The parameter $t_f$ specifies the terminal moment of the mission.
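A minimal evaluation sketch of these three objectives for one candidate impulse sequence (argument names are illustrative):

```python
import numpy as np

def objectives(dts, dv_mags, rel_pos_final):
    """Evaluate one candidate: J1 total flight time, J2 total velocity
    increment (fuel proxy), J3 terminal distance to the target."""
    J1 = float(np.sum(dts))                    # sum of inter-impulse intervals
    J2 = float(np.sum(dv_mags))                # sum of impulse magnitudes
    J3 = float(np.linalg.norm(rel_pos_final))  # pursuer-target distance at t_f
    return J1, J2, J3
```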
3.2. Constraints
Firstly, the relative dynamics constraint between the pursuer and the target must be satisfied. In the LVLH frame of the near-circular reference orbit with mean motion $n$, the CW equations give
$$\ddot{x} - 2n\dot{y} - 3n^2 x = 0, \qquad \ddot{y} + 2n\dot{x} = 0, \qquad \ddot{z} + n^2 z = 0,$$
with the impulsive velocity increments applied instantaneously at the maneuvering moments.
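Because the CW model is linear, the relative state propagates between impulses through a closed-form state transition matrix, which is what makes the impulse-to-terminal-state relation explicit. The following is a minimal Python sketch of that standard matrix (a sketch assuming the radial/along-track/cross-track state ordering; it is not code from the paper):

```python
import numpy as np

def cw_stm(n: float, t: float) -> np.ndarray:
    """Clohessy-Wiltshire state-transition matrix over time t for the state
    [x, y, z, vx, vy, vz] (x radial, y along-track, z cross-track)."""
    s, c = np.sin(n * t), np.cos(n * t)
    return np.array([
        [4 - 3*c,     0, 0,    s/n,         2*(1 - c)/n,     0],
        [6*(s - n*t), 1, 0,    2*(c - 1)/n, (4*s - 3*n*t)/n, 0],
        [0,           0, c,    0,           0,               s/n],
        [3*n*s,       0, 0,    c,           2*s,             0],
        [6*n*(c - 1), 0, 0,   -2*s,         4*c - 3,         0],
        [0,           0, -n*s, 0,           0,               c],
    ])

# Propagating a relative state after an impulse dv applied at the interval start:
# x_next = cw_stm(n, dt) @ (x0 + np.concatenate([np.zeros(3), dv]))
```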
The initial relative state conditions should satisfy
$$\mathbf{x}_i(t_0) = \mathbf{x}_{i,0},$$
where $\mathbf{x}_{i,0}$ is the known initial relative state of pursuer $P_i$.
The maneuvering times should satisfy the following constraints:
$$\Delta t_{\min} \le t_k - t_{k-1} \le \Delta t_{\max}, \quad k = 1, \ldots, K, \qquad t_f \le t_{\max},$$
where $t_k$ represents the maneuvering moment of the kth impulse ($t_0$ is the initial time). $\Delta t_{\min}$ defines the minimum time interval, either between two impulses or between the moment of the first impulse and the initial time; this ensures that the pursuer has sufficient time for iterative optimization or attitude adjustment before executing a maneuvering decision. $\Delta t_{\max}$ represents the maximum time interval, while $t_{\max}$ denotes the maximum total mission time.
Due to the limitation of the thruster capacity, the velocity increment of a single-impulse maneuver and the total velocity increment should satisfy the following constraints:
$$\Delta v_k \le \Delta v_{\max}, \quad k = 1, \ldots, K, \qquad \sum_{k=1}^{K} \Delta v_k \le \Delta v_{\mathrm{total}},$$
where $\Delta v_{\max}$ represents the maximum velocity increment of a single-impulse maneuver, and $\Delta v_{\mathrm{total}}$ stands for the maximum total velocity increment available from the thruster.
In addition, to ensure that the pursuer avoids interception, the following passive safety constraints must be satisfied:
$$d_{i,j}(t) \ge d_{\mathrm{safe}}, \quad \forall t \in [t_0, t_f], \; j = 1, \ldots, M,$$
where $d_{i,j}$ denotes the distance between pursuer $P_i$ and defender $D_j$, $d_{\mathrm{safe}}$ represents the safe distance, and M is the number of defenders.
Finally, to achieve the final approach operation, terminal constraints are established on the terminal distance and the terminal relative velocity.

The terminal distance constraint requires that, at the final moment, the distance between the pursuer and the target be less than the specified distance $d_f$ to accomplish the approach operation:
$$d_{i,T}(t_f) \le d_f.$$

The terminal relative velocity constraint requires that, at the final moment, the relative velocity between the pursuer and the target be less than the specified relative velocity $v_f$:
$$v_{i,T}(t_f) \le v_f,$$
where $v_{i,T}$ denotes the relative velocity between pursuer $P_i$ and target $T$.
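As a sketch of how these limits translate into per-constraint violation terms (all limit names in `cfg` are illustrative placeholders for the symbols above, not identifiers from the paper):

```python
import numpy as np

def constraint_violations(dts, dv_mags, d_def_min, d_term, v_term, cfg):
    """Return individual violations (0 where satisfied) for one candidate.
    d_def_min: closest pursuer-defender distance over the trajectory."""
    return np.array([
        max(0.0, cfg["dt_min"] - min(dts)),        # minimum impulse spacing
        max(0.0, max(dts) - cfg["dt_max"]),        # maximum impulse spacing
        max(0.0, sum(dts) - cfg["t_max"]),         # total mission time
        max(0.0, max(dv_mags) - cfg["dv_max"]),    # per-impulse magnitude
        max(0.0, sum(dv_mags) - cfg["dv_total"]),  # total velocity increment
        max(0.0, cfg["d_safe"] - d_def_min),       # passive safety distance
        max(0.0, d_term - cfg["d_f"]),             # terminal distance
        max(0.0, v_term - cfg["v_f"]),             # terminal relative velocity
    ])
```

Summing these terms yields the overall violation used by the constraint-handling mechanism in Section 3.4.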
3.3. Mathematical Model
The mathematical model of the multi-objective optimization problem (MOOP) studied here can be expressed as
$$\min_{\mathbf{X}} \; \mathbf{J}(\mathbf{X}) = \left[ J_1(\mathbf{X}), J_2(\mathbf{X}), J_3(\mathbf{X}) \right]^{\mathrm{T}} \tag{20}$$
subject to
$$\mathbf{g}(\mathbf{X}) \le \mathbf{0}, \tag{21}$$
where $\mathbf{g}(\mathbf{X})$ collects the dynamics, initial state, maneuvering time, velocity increment, passive safety, and terminal constraints defined above. The MOOP described by (20) and (21) can be formulated as a problem of finding the Pareto optimal solutions, and multi-objective optimization algorithms can be used to obtain the Pareto front.
3.4. NSGAII-CDP
NSGAII is an improved multi-objective evolutionary algorithm (MOEA) based on NSGA [27]. Within NSGAII, Simulated Binary Crossover (SBX) and polynomial mutation are applied directly to the continuous variables, eliminating the need for additional encoding or decoding steps and improving the efficiency and adaptability of the algorithm. Specifically, given two parent individuals $x^{(1)}$ and $x^{(2)}$, two offspring individuals $c^{(1)}$ and $c^{(2)}$ are generated using the SBX operator:
$$c^{(1)} = 0.5\left[(1 + \beta)x^{(1)} + (1 - \beta)x^{(2)}\right], \qquad c^{(2)} = 0.5\left[(1 - \beta)x^{(1)} + (1 + \beta)x^{(2)}\right], \tag{22}$$
$$\beta = \begin{cases} (2u)^{1/(\eta_c + 1)}, & u \le 0.5, \\[4pt] \left[\dfrac{1}{2(1 - u)}\right]^{1/(\eta_c + 1)}, & u > 0.5, \end{cases} \tag{23}$$
where $u$ is a random variable uniformly distributed between 0 and 1, denoted as $u \sim U(0, 1)$. $\beta$ is dynamically and randomly determined by the distribution factor $\eta_c$ according to (23).
An individual $x$ undergoes polynomial mutation and transforms into
$$x' = x + \delta \left( x_{\mathrm{u}} - x_{\mathrm{l}} \right), \tag{24}$$
$$\delta = \begin{cases} (2u)^{1/(\eta_m + 1)} - 1, & u < 0.5, \\[4pt] 1 - \left[2(1 - u)\right]^{1/(\eta_m + 1)}, & u \ge 0.5, \end{cases} \tag{25}$$
where $x_{\mathrm{u}}$ and $x_{\mathrm{l}}$ denote the variable's upper and lower bounds. $\delta$ is dynamically and randomly determined by the distribution factor $\eta_m$ according to (25).
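Both operators are standard; a compact vectorized sketch in Python (default distribution indices are common choices, not values specified by the paper):

```python
import numpy as np

def sbx(x1, x2, eta_c=15.0):
    """Simulated Binary Crossover on real-valued parent vectors, per (22)-(23)."""
    u = np.random.rand(*np.shape(x1))
    beta = np.where(u <= 0.5,
                    (2 * u) ** (1 / (eta_c + 1)),
                    (1 / (2 * (1 - u))) ** (1 / (eta_c + 1)))
    c1 = 0.5 * ((1 + beta) * x1 + (1 - beta) * x2)
    c2 = 0.5 * ((1 - beta) * x1 + (1 + beta) * x2)
    return c1, c2

def polynomial_mutation(x, lo, hi, eta_m=20.0):
    """Polynomial mutation within bounds [lo, hi], per (24)-(25)."""
    u = np.random.rand(*np.shape(x))
    delta = np.where(u < 0.5,
                     (2 * u) ** (1 / (eta_m + 1)) - 1,
                     1 - (2 * (1 - u)) ** (1 / (eta_m + 1)))
    return np.clip(x + delta * (hi - lo), lo, hi)
```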
Furthermore, for constraint handling, common techniques include the CDP, adaptive penalty methods, and adaptive trade-off models, among which CDP performs best [28]. Therefore, this paper adopts CDP to handle the multiple constraints, based on the following three basic rules [29]:
Any feasible solution is preferred to any infeasible solution.
Among two feasible solutions, the one with a better objective function value is preferred.
Among two infeasible solutions, the one with a smaller constraint violation is preferred.
For this purpose, an error operator $e$ is introduced to calculate the overall degree of constraint violation independently of the objective functions. $e$ is defined as
$$e(\mathbf{X}) = \sum_{j} \max\left(0, \, g_j(\mathbf{X})\right),$$
where $g_j(\mathbf{X}) \le 0$ denotes the jth constraint. During population sorting, solutions are first divided into feasible and infeasible according to whether $e$ equals zero. Feasible solutions are sorted using the non-dominated sorting method, while infeasible solutions are ordered by ascending degree of constraint violation and placed after the feasible ones. The flowchart of NSGAII-CDP is shown in Figure 3.
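The three CDP rules reduce to a single pairwise comparison; a minimal sketch (the `Candidate` container is an illustrative assumption):

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class Candidate:
    objs: Sequence[float]  # objective vector (J1, J2, J3)
    e: float               # total constraint violation from the error operator

def cdp_better(a: Candidate, b: Candidate) -> bool:
    """Return True if a is preferred over b under the three CDP rules."""
    if a.e == 0 and b.e > 0:           # rule 1: feasible beats infeasible
        return True
    if a.e > 0 and b.e > 0:            # rule 3: smaller violation wins
        return a.e < b.e
    if a.e == 0 and b.e == 0:          # rule 2: Pareto dominance among feasible
        not_worse = all(x <= y for x, y in zip(a.objs, b.objs))
        strictly = any(x < y for x, y in zip(a.objs, b.objs))
        return not_worse and strictly
    return False                       # a infeasible, b feasible
```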
4. Distributed Negotiation
A distributed cooperative game negotiation method is employed among the pursuers to negotiate jointly over the existing Pareto fronts and determine an optimal comprehensive approach strategy that balances arrival-time consistency, total flight time, and total fuel consumption.
4.1. Communication Topology
The communication topology is modeled using graph theory, with an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ describing the communication topology of the multi-spacecraft system. $\mathcal{V} = \{v_1, v_2, \ldots, v_N\}$ represents the set of nodes, and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ represents the set of edges in the network. Node $v_i$ corresponds to pursuer $P_i$, and edge $(v_i, v_j) \in \mathcal{E}$ indicates that $P_i$ can exchange information with $P_j$.
Due to communication resource constraints, each spacecraft can only communicate with a limited number of other spacecraft. An adjacency matrix $\mathbf{A} = [a_{ij}] \in \mathbb{R}^{N \times N}$ is defined to describe inter-satellite communication, specifically expressed as
$$a_{ij} = \begin{cases} 1, & (v_i, v_j) \in \mathcal{E}, \\ 0, & \text{otherwise}. \end{cases}$$
Since the communication network is undirected, the adjacency matrix is symmetric, meaning $a_{ij} = a_{ji}$ for all $i, j$. It is assumed that the communication topology contains at least one spanning tree, ensuring the connectivity of the communication network.
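As a concrete example of such a topology, a ring network satisfies both properties: it is symmetric and connected (hence contains a spanning tree). A small sketch:

```python
import numpy as np

def ring_adjacency(N: int) -> np.ndarray:
    """Ring topology: each of the N pursuers talks only to its two neighbours."""
    A = np.zeros((N, N), dtype=int)
    for i in range(N):
        A[i, (i - 1) % N] = A[i, (i + 1) % N] = 1
    assert (A == A.T).all()  # undirected graph => symmetric adjacency
    return A
```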
4.2. Distributed Cooperative Game Model
To achieve efficient strategy negotiation, this section models the cooperative game problem as $G = \{\mathcal{N}, \mathcal{S}, \mathcal{U}\}$, where $\mathcal{N} = \{1, 2, \ldots, N\}$ represents the set of players (i.e., the pursuers), each acting as an independent decision-maker. $\mathcal{S} = S_1 \times S_2 \times \cdots \times S_N$ represents the strategy space, where $S_i$ is the Pareto front for $P_i$, i.e., the feasible strategy set derived from its multi-objective optimization. $s = (s_i, s_{-i}) \in \mathcal{S}$ represents the combination of strategies of all players, where $s_i \in S_i$ is the strategy of $P_i$ and $s_{-i}$ represents the strategies of the other players. $\mathcal{U} = \{u_1, u_2, \ldots, u_N\}$ denotes the payoff functions of all players, where $u_i(s_i, s_{-i})$ represents the individual payoff of player $i$. During the cooperative game process, $P_i$ continuously adjusts its strategy and exchanges information with other players to maximize its own payoff $u_i$.
Pursuers aim to achieve a balance that prioritizes arrival times that are as consistent as possible while simultaneously minimizing the total flight time and reducing their respective fuel consumption. Therefore, the utility function $u_i$ is defined as
$$u_i(s_i, s_{-i}) = -\left( \lambda_1 \frac{\sum_{j \in \mathcal{N}_i} \left| t_i - t_j \right|}{a} + \lambda_2 \frac{t_i}{b} + \lambda_3 \frac{\Delta v_i}{c} \right),$$
where $t_i$ and $\Delta v_i$ represent the total flight time and total fuel consumption of $P_i$, respectively, while $t_j$ denotes the total flight time of neighbor $P_j$ (with $\mathcal{N}_i$ the neighbor set of $P_i$). Dimensionless parameters $a$, $b$, and $c$ are introduced to standardize these indicators, and coefficients $\lambda_1$, $\lambda_2$, and $\lambda_3$ serve as weighting factors. The overall optimization objective is obtained through a weighted linear summation, allowing different weighting schemes to be set according to varying requirements.
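A direct sketch of this local payoff, assuming the neighbor-coupled form reconstructed above (weights and normalizers are placeholders):

```python
import numpy as np

def utility(i, t_total, dv_total, A, w=(1.0, 1.0, 1.0), a=1.0, b=1.0, c=1.0):
    """Local payoff of pursuer i: penalize arrival-time spread against
    neighbours, own total flight time, and own fuel use (weighted sum)."""
    neighbours = np.nonzero(A[i])[0]
    time_spread = sum(abs(t_total[i] - t_total[j]) for j in neighbours)
    return -(w[0] * time_spread / a
             + w[1] * t_total[i] / b
             + w[2] * dv_total[i] / c)
```

Note that the coupling term depends only on the neighbours' announced flight times, so each pursuer can evaluate its payoff with purely local information.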
4.3. Nash Equilibrium Existence and Convergence
Each pursuer optimizes its strategy based on its payoff function to achieve an optimal planning outcome. The system reaches a Nash equilibrium [30] when no individual pursuer can unilaterally adjust its strategy to gain a greater payoff. Importantly, at the end of each cooperative game round, pursuers must exchange their strategies to maintain a consistent collective understanding.
4.3.1. Existence of Nash Equilibrium
Theorem 1. Every game with a finite number of players and action profiles has at least one Nash equilibrium [31].
In the negotiation and decision-making process of pursuers, the number of pursuers is finite, and the size of the Pareto front obtained from multi-objective optimization is also finite. Referring to Theorem 1, it can be concluded that a Nash equilibrium exists in the distributed negotiation process.
4.3.2. Convergence of Nash Equilibrium
To further illustrate the global convergence to a Nash equilibrium, construct a potential function
$$\Phi(s) = -\left( \lambda_1 \sum_{(v_i, v_j) \in \mathcal{E}} \frac{\left| t_i - t_j \right|}{a} + \sum_{i \in \mathcal{N}} \left( \lambda_2 \frac{t_i}{b} + \lambda_3 \frac{\Delta v_i}{c} \right) \right). \tag{30}$$
Analysis indicates that the following condition holds for all pursuers:
$$u_i\left(s_i^{g+1}, s_{-i}\right) - u_i\left(s_i^{g}, s_{-i}\right) = \Phi\left(s_i^{g+1}, s_{-i}\right) - \Phi\left(s_i^{g}, s_{-i}\right), \tag{31}$$
where $s_i^{g}$ and $s_i^{g+1}$ represent the strategy choices of $P_i$ in the gth and (g+1)th iteration, respectively. Therefore, $G$ is a potential game with an exact potential function [32].
This study is based on the best response selection strategy, which can be described as
$$s_i^{g+1} = \arg\max_{s_i \in S_i} u_i\left(s_i, s_{-i}^{g}\right). \tag{32}$$
Each pursuer selects the most advantageous strategy based on the current partial information; that is, each round of decision-making maximizes the current payoff $u_i$, so the payoff of each individual after each round of the game is non-decreasing. According to (31), the following can be derived:
$$\Phi\left(s^{g+1}\right) \ge \Phi\left(s^{g}\right). \tag{33}$$
Since the strategy space is finite, the potential function in (30) is monotonically non-decreasing and bounded. During the strategy adjustment process, when no single player can unilaterally change its strategy to achieve a higher payoff, the system has reached a stable state corresponding to a Nash equilibrium. Consequently, with a sufficient number of iterations, the decision-making process is guaranteed to converge to a locally optimal solution, and potentially even a globally optimal one.
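The best response in (32) is a one-line scan over the pursuer's own Pareto front; a minimal sketch (the `payoff` callable stands in for $u_i$ and is an assumption of this illustration):

```python
def best_response(i, pareto_front, others_choice, payoff):
    """Pursuer i's best response: pick the Pareto-front candidate that
    maximizes its local payoff, given neighbours' last announced strategies."""
    return max(pareto_front[i], key=lambda s: payoff(i, s, others_choice))
```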
4.4. Simulated Annealing Distributed Negotiation
However, due to factors such as decision-making under incomplete information and the mutual influence of strategies, pure best-response negotiation can struggle to converge to the globally optimal solution. Therefore, a distributed cooperative game negotiation strategy with a simulated annealing mechanism is designed to escape local optima and search for better solutions. The process of simulated annealing cooperative game negotiation is shown in Figure 4.
In each round of the cooperative game, pursuer $P_i$ randomly selects a strategy and calculates its payoff. The strategy is then accepted with probability $p$. If it is not accepted, $P_i$ optimizes its strategy according to (32). The acceptance probability $p$ is determined by the Metropolis criterion:
$$p = \begin{cases} 1, & \Delta u \ge 0, \\[4pt] \exp\left(\dfrac{\Delta u}{T}\right), & \Delta u < 0, \end{cases} \tag{34}$$
where $T$ represents the annealing temperature, and $\Delta u$ denotes the payoff difference between the randomly selected strategy and the current one. From (34), the following can be concluded:

When $\Delta u \ge 0$, the strategy is accepted fully.

When $\Delta u < 0$, the strategy is accepted according to the probability in (34).

The value of $p$ is influenced by the temperature $T$ and the payoff difference $\Delta u$: the higher the temperature $T$, the larger $p$ becomes; conversely, the worse the payoff of the new strategy, the smaller $p$ becomes. Therefore, when the temperature is very high, there is still a high probability of accepting the new strategy, even if its payoff is significantly lower.
To ensure proper convergence of the algorithm, the temperature $T$ is annealed using an exponential decay approach, defined as
$$T^{g+1} = \gamma T^{g}, \tag{35}$$
where $T^{g}$ and $T^{g+1}$ represent the annealing temperature in the gth and (g+1)th iteration, respectively, and $\gamma \in (0, 1)$ is the annealing parameter that determines the decay rate. Algorithm 1 illustrates the iterative process by which pursuer $P_i$ finds a cooperative approach strategy.
Algorithm 1: Simulated Annealing Distributed Negotiation Process
1. Obtain the Pareto front $S_i$ through NSGAII-CDP
2. Initialize the temperature $T^{0}$ and the annealing parameter $\gamma$
3. Randomly select an initial strategy $s_i^{0} \in S_i$
4. for $g = 0$ to $g_{\max}$ do
5.   if the randomly selected strategy is accepted with probability $p$ in (34)
6.     $s_i^{g+1} \leftarrow$ the randomly selected strategy
7.   else
8.     $s_i^{g+1} \leftarrow \arg\max_{s_i \in S_i} u_i\left(s_i, s_{-i}^{g}\right)$ according to (32)
9.   end if
10.  $T^{g+1} \leftarrow \gamma T^{g}$ according to (35)
11.  Randomly select a new strategy in $S_i$ and exchange strategies with neighbors
12. end for
13. Output the negotiated strategy $s_i$
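An executable sketch of Algorithm 1 for a single pursuer follows (names such as `others` and `payoff`, and the default hyperparameter values, are illustrative assumptions, not values from the paper):

```python
import math
import random

def sa_negotiation(i, pareto_front, others, payoff, T0=1.0, gamma=0.95, g_max=200):
    """Simulated annealing negotiation for pursuer i.
    `others(g)` returns the neighbours' strategies announced in round g;
    `payoff(i, s, s_others)` evaluates the local utility u_i."""
    s = random.choice(pareto_front[i])         # initial strategy
    cand = random.choice(pareto_front[i])      # first random proposal
    T = T0
    for g in range(g_max):
        s_others = others(g)
        du = payoff(i, cand, s_others) - payoff(i, s, s_others)
        # Metropolis criterion (34): always accept improvements; accept
        # worse proposals with probability exp(du / T).
        if du >= 0 or random.random() < math.exp(du / T):
            s = cand
        else:
            # otherwise fall back to the best response (32) on the Pareto front
            s = max(pareto_front[i], key=lambda x: payoff(i, x, s_others))
        T *= gamma                             # exponential annealing (35)
        cand = random.choice(pareto_front[i])  # random proposal for next round
    return s
```

Early in the run, the high temperature lets clearly worse proposals through, which is precisely the mechanism that lets the negotiation escape the local optima that pure best-response dynamics can get stuck in.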