1. Introduction
The pursuit–evasion problem between spacecraft, in which one spacecraft acts as the Pursuer and the other as the Evader, remains an important area of study in aerospace engineering. Various strategies have been proposed to solve it, but most studies consider a single pursuit–evasion strategy; that is, the Evader uses only one evasion strategy. However, as spacecraft become more intelligent, the Evader may adopt a variety of strategies and switch to a more appropriate one according to the situation.
For the pursuit–evasion problem, differential game theory, which has been widely used by researchers in this field, was first proposed as a solution by Isaacs in 1965 [
1]. Mauro et al. [
2] studied the problem of a long-distance pursuit–evasion game by transforming it into a two-point boundary value problem and proposed a semi-direct method to obtain the saddle point solution. Ye et al. [
3] also transformed the pursuit-and-evasion problem into a two-point boundary value problem and solved it through a heuristic search. This algorithm was applied in a close-range pursuit–evasion game under different thrust configurations. Li et al. [
4] and Zhang et al. [
5] studied the pursuit–evasion game considering the 
J2 perturbation and proposed new methods with which to quickly find the saddle point solution. Prince et al. [
6] used the indirect heuristic method to study the differential game of proximity operations in elliptical orbits. Pang et al. [
7] studied the pursuit–evasion game along an elliptical orbit by providing a precise gradient. Unfortunately, the methods mentioned above are all open-loop solutions and cannot be used for real-time feedback control.
Consequently, researchers studied the real-time feedback control of the pursuit–evasion problem. Li et al. [
8] proposed an infinite-horizon nonlinear quadratic differential game considering the motion camouflage pursuit problem. Wang et al. [
9] and Ye et al. [
10] investigated pursuit–evasion control based on zero-effort miss and deduced a pursuit–evasion feedback control strategy. Zhang et al. [
11] proposed a new adaptive weighted differential game guidance law to intercept maneuvering targets by combining two guidance laws derived from complete and incomplete information modes. Li et al. [
12] designed a linear-quadratic duration-adaptive strategy to solve the orbital pursuit–evasion–defense game problem.
In addition, other methods have also been studied to solve the pursuit–evasion problem. Gong et al. [
13] used the reachable region method to study the pursuit–evasion game under continuous thrust and derived the analytical form of the reachable region based on the Hill–Clohessy–Wiltshire (HCW) equation. Zhao et al. [
14] proposed an impulsive pursuit–evasion algorithm based on a multi-agent deep deterministic policy gradient (MADDPG) that can yield a pursuit–evasion strategy under multiple constraints. In summary, in the continuous thrust pursuit–evasion game scenario, the real-time feedback control law based on game theory is more likely to be applied in space due to its simple structure and excellent pursuit ability.
In order to evade interception by the Pursuer more effectively, the Evader may employ a variety of strategies and switch from one to another in the process depending on the situation [
15]. For example, in reference [
16], Evaders have varying structural dynamics and will switch from one mode to another in the game, resulting in changes of evasion strategies. Therefore, for the Pursuer, it is necessary to adjust its pursuit mode to achieve efficient interception of the Evader. In the complete information pursuit–evasion game, the Pursuer can obtain all the information of the Evader and adjust its pursuit strategy in real time. However, the Pursuer generally cannot obtain all the information on the Evader in practice. Under the condition of incomplete information, the Pursuer has to estimate the strategy adopted by the Evader according to the states of the two players and adjust its strategy in real time. This motivated us to study the problem of intercepting the Evader with a switchable evasion strategy under conditions of incomplete information.
The interactive multiple-model filter (IMM) is a method for estimating the state of dynamic systems, and it can be used for Markov stochastic jump systems. It is widely used in state estimation [
17], dynamic target tracking [
18], fault detection [
19,
20], and so on. Considering the characteristics of multiple-model estimation, the IMM method is used to solve the problem of Evader strategy switching in a pursuit–evasion game. Zou et al. [
21] proposed a cooperative estimation method that combines the IMM of the evader and the Kalman filter of the defender. The cooperative method can effectively estimate the state of the Pursuer and significantly improve the accuracy of active defense guidance. Tang et al. [
22] used the IMM method to estimate the state information of the Evader by combining the smooth variable sliding filter with mode matching, achieving a good interception effect.
However, the difference in control strategies is very small in spacecraft pursuit–evasion game strategy switching due to the low thrust of satellite thrusters. The classical IMM cannot predict the evasion mode precisely, which will affect interception performance. In addition, the model probability estimated by an IMM is unstable and fluctuates greatly due to the influence of navigation information error. Motivated by the above problems, this paper aims to design a method that can accurately and stably estimate the strategy mode of the Evader so as to intercept the Evader quickly.
The main novelties and contributions of this paper are as follows: (1) A switchable escape strategy based on a linear quadratic game strategy and a zero-effort miss game strategy is designed. (2) An interactive multiple-model learning (IMML) filter is proposed by feeding the fused estimate back into the filters. (3) An interactive multiple-model learning filter based on an LSTM network (LSTM-IMML) is proposed.
The remainder of this paper is organized as follows. The Pursuer and Evader dynamics in the pursuit–evasion game are introduced in 
Section 2. In 
Section 3, a switchable pursuit–evasion strategy based on linear quadratic and zero-effort miss distance is designed. 
Section 4 details the interactive multiple-model learning filtering method based on an LSTM network. In 
Section 5, the proposed LSTM-IMML method is compared with IMM to validate the performance of the proposed method. Finally, 
Section 6 concludes the paper.
  2. Dynamics of Spacecraft Pursuit–Evasion
To describe the maneuver game of two players, a reference spacecraft which is very close to the Pursuer and Evader is selected as the origin of the reference coordinate system 
oxyz as shown in 
Figure 1. The coordinate systems 
OXYZ and 
oxyz represent the Earth inertial and the reference coordinate systems, respectively. The axis 
ox points from the Earth’s center to the center of mass of the reference spacecraft; the axis 
oy points to the velocity direction, and the 
oz axis completes the right-handed coordinate system.
Assuming the orbit of the reference spacecraft is circular and the two satellites maneuver near the reference spacecraft, the dynamics of the Pursuer and Evader in the reference coordinate system 
oxyz can be simplified to the HCW equation.
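$$
\begin{cases}
\ddot{x}_i - 2\omega\dot{y}_i - 3\omega^{2}x_i = u_{x,i} \\
\ddot{y}_i + 2\omega\dot{x}_i = u_{y,i} \\
\ddot{z}_i + \omega^{2}z_i = u_{z,i}
\end{cases}
\qquad i = P, E \tag{1}
$$
where the subscript P represents the Pursuer and the subscript E represents the Evader, $(x_i, y_i, z_i)$ represent the positions of the Pursuer and the Evader in the reference coordinate system oxyz, $(\dot{x}_i, \dot{y}_i, \dot{z}_i)$ denote the velocities, $\omega$ is the orbital angular velocity of the reference spacecraft, and $(u_{x,i}, u_{y,i}, u_{z,i})$ represent the thrust acceleration in three axes.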
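As an illustration of Equation (1), the following minimal Python sketch propagates one spacecraft under the HCW dynamics with a fixed-step Runge-Kutta integrator. The function names, the choice of integrator, and the orbital rate used in the usage note are assumptions made for illustration and are not taken from the paper.

```python
def hcw_derivative(state, accel, omega):
    """Time derivative of [x, y, z, vx, vy, vz] under the HCW equations.

    x is radial, y is along the velocity direction, z completes the frame.
    accel is the thrust acceleration (ux, uy, uz); omega is the orbital rate.
    """
    x, y, z, vx, vy, vz = state
    ux, uy, uz = accel
    ax = 3.0 * omega**2 * x + 2.0 * omega * vy + ux
    ay = -2.0 * omega * vx + uy
    az = -(omega**2) * z + uz
    return [vx, vy, vz, ax, ay, az]


def rk4_step(state, accel, omega, dt):
    """One classical fourth-order Runge-Kutta step of length dt.

    The thrust acceleration is held constant over the step (an assumption).
    """
    def shifted(base, slope, h):
        return [b + h * s for b, s in zip(base, slope)]

    k1 = hcw_derivative(state, accel, omega)
    k2 = hcw_derivative(shifted(state, k1, dt / 2.0), accel, omega)
    k3 = hcw_derivative(shifted(state, k2, dt / 2.0), accel, omega)
    k4 = hcw_derivative(shifted(state, k3, dt), accel, omega)
    return [s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]
```

For example, with omega = 7.2921e-5 rad/s (approximately the geosynchronous rate), the state [0, 1000, 0, 0, 0, 0] has a zero derivative, which reflects the fact that a constant along-track separation is an equilibrium of the HCW equations.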
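Defining the state vector $\mathbf{x} = [x, y, z, \dot{x}, \dot{y}, \dot{z}]^{T}$ and the control vector $\mathbf{u} = [u_x, u_y, u_z]^{T}$, Equation (1) can be written as
$$
\dot{\mathbf{x}} = A\mathbf{x} + B\mathbf{u} \tag{2}
$$
where
$$
A = \begin{bmatrix}
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
3\omega^{2} & 0 & 0 & 0 & 2\omega & 0 \\
0 & 0 & 0 & -2\omega & 0 & 0 \\
0 & 0 & -\omega^{2} & 0 & 0 & 0
\end{bmatrix},
\qquad
B = \begin{bmatrix} 0_{3\times 3} \\ I_{3\times 3} \end{bmatrix}
$$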
Then, the state space equations of the Pursuer and Evader can be obtained as
      
      where subscript 
P represents the Pursuer and subscript 
E represents the Evader. In the reference coordinate system, the dynamics of the pursuit–evasion game are the difference between the dynamics of the Pursuer and the Evader. Defining 
, the relative dynamics can be written as
      
Considering the actual thruster limitation of the satellite, it is assumed that the constraints of thrust acceleration amplitude are
      
Using the control strategy that satisfies Equation (5), the Pursuer and Evader compete for the terminal distance. The goal of the Pursuer is to intercept the Evader in the shortest time, while the Evader expects to avoid the interception. The terminal interception set is defined as
      
      where 
 and 
 represent the position vectors of the Pursuer and the Evader, respectively. The Pursuer wants to use control 
 to make the relative state enter the set 
 as soon as possible, while the Evader expects to avoid this by applying control 
.
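The thrust constraint and the terminal interception test can be sketched in a few lines of Python. This sketch assumes that the constraint bounds the magnitude of the acceleration vector and that interception is declared when the relative distance falls within a capture radius; both the function names and this exact form are illustrative assumptions, since the precise expressions of the constraint and the interception set are not reproduced here.

```python
import math


def saturate(accel, a_max):
    """Scale an acceleration vector so that its magnitude does not exceed a_max."""
    norm = math.sqrt(sum(c * c for c in accel))
    if norm <= a_max or norm == 0.0:
        return list(accel)
    return [c * a_max / norm for c in accel]


def is_intercepted(r_pursuer, r_evader, capture_radius):
    """Return True when the relative distance is within the capture radius."""
    return math.dist(r_pursuer, r_evader) <= capture_radius
```

Scaling the whole vector, rather than clipping each axis separately, preserves the commanded thrust direction, which matters when the direction is the output of a game strategy.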
  3. Game Strategy Switch
In the pursuit–evasion game scenario, the Evader expects to increase the relative distance and quickly maneuver away from the Pursuer during the approach. However, an Evader that adopts a fixed evasion strategy is easily intercepted, which motivates the Evader to switch game strategies. As the relative distance changes, the Evader adjusts its evasion strategy accordingly. This paper assumes that two game strategies are adopted: a linear quadratic game strategy that accounts for fuel consumption, and a zero-effort miss game strategy that uses maximum thrust.
  3.1. Linear Quadratic Game Strategy
Linear quadratic differential game theory is widely used in pursuit–evasion problems. As described in [
23], the objective function is first constructed, which is the quadratic function of the state difference and the control vector of the Pursuer and the Evader. The objective function of the Pursuer is
        
Because the goals of the Pursuer and the Evader are opposite, their objective functions are also opposite. The objective function of the Evader can then be represented as
        
        where, 
 is a positive semi-definite matrix and 
 are positive definite matrices. The conditions for the control sets 
 and 
 of the Pursuer and the Evader to arrive at the saddle point solution of the game are
        
Based on linear quadratic differential game theory, the optimal feedback control law of both sides can be derived [
23].
        
        where, 
P is a symmetric matrix obtained by solving the algebraic Riccati equation reversely, which satisfies
        
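For reference, one standard form of the zero-sum linear quadratic differential game is sketched below. It assumes that the relative state is $\mathbf{x} = \mathbf{x}_P - \mathbf{x}_E$ with dynamics $\dot{\mathbf{x}} = A\mathbf{x} + B(\mathbf{u}_P - \mathbf{u}_E)$, and it uses the symbols $Q$, $R_P$, and $R_E$ for the weighting matrices. These sign conventions and symbol names are assumptions and may differ from those used in [23].

```latex
\begin{align}
J &= \frac{1}{2}\int_{0}^{\infty}
     \left( \mathbf{x}^{T} Q \mathbf{x}
          + \mathbf{u}_P^{T} R_P \mathbf{u}_P
          - \mathbf{u}_E^{T} R_E \mathbf{u}_E \right) \mathrm{d}t, \\
\mathbf{u}_P^{*} &= -R_P^{-1} B^{T} P \mathbf{x}, \qquad
\mathbf{u}_E^{*} = -R_E^{-1} B^{T} P \mathbf{x}, \\
0 &= A^{T} P + P A + Q - P B \left( R_P^{-1} - R_E^{-1} \right) B^{T} P .
\end{align}
```

Under these assumptions, the Evader's commanded acceleration scales with $R_E^{-1}$, so a smaller Evader control weight gives a larger commanded thrust acceleration for the same relative state.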
  3.2. Zero-Effort Miss Game Strategy
When the pursuit–evasion game comes to the final stage, the Pursuer and Evader compete for the distance between them. In this case, only the relative distance of the two satellites is considered without considering fuel consumption, and the zero-effort miss 
 is introduced.
        
        where, 
 and 
 are the state transition matrix of system (2) from 
 to 
, as shown in [
10], which satisfies
        
By taking the derivative of zero-effort miss, we can obtain
        
        where, 
.
In the pursuit–evasion process, the Pursuer expects to reduce the zero-effort miss, while the Evader expects to increase the zero-effort miss in a way that is beneficial to itself as much as possible, so the zero-effort miss as the objective function is defined as
        
According to the derivation process in reference [
10], the control strategy of the Pursuer under maximum thrust acceleration can be obtained as
        
        where 
 is the maximum thrust acceleration amplitude of the Pursuer.
In the same way, the control strategy of the Evader under maximum thrust acceleration is
        
        where 
 is the maximum thrust acceleration amplitude of the Evader.
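The zero-effort miss can be illustrated with the closed-form HCW state transition matrix. The Python sketch below computes the zero-effort miss of the relative state and the unit direction along which bounded thrust changes its magnitude fastest. The function names are hypothetical, and which sign each player applies along this direction depends on how the relative state is defined, so the sketch returns only the direction and should not be read as the exact control law of [10].

```python
import math


def hcw_position_blocks(omega, t_go):
    """Position rows of the HCW state transition matrix over t_go.

    Returns (phi_rr, phi_rv) such that, without thrust,
    r(t + t_go) = phi_rr @ r(t) + phi_rv @ v(t).
    """
    s, c = math.sin(omega * t_go), math.cos(omega * t_go)
    phi_rr = [[4.0 - 3.0 * c, 0.0, 0.0],
              [6.0 * (s - omega * t_go), 1.0, 0.0],
              [0.0, 0.0, c]]
    phi_rv = [[s / omega, 2.0 * (1.0 - c) / omega, 0.0],
              [-2.0 * (1.0 - c) / omega, (4.0 * s - 3.0 * omega * t_go) / omega, 0.0],
              [0.0, 0.0, s / omega]]
    return phi_rr, phi_rv


def _matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def zero_effort_miss(rel_pos, rel_vel, omega, t_go):
    """Relative position at t + t_go if neither player thrusts from now on."""
    phi_rr, phi_rv = hcw_position_blocks(omega, t_go)
    return [a + b for a, b in zip(_matvec(phi_rr, rel_pos), _matvec(phi_rv, rel_vel))]


def thrust_direction(rel_pos, rel_vel, omega, t_go):
    """Unit vector along phi_rv^T z, where z is the zero-effort miss."""
    _, phi_rv = hcw_position_blocks(omega, t_go)
    z = zero_effort_miss(rel_pos, rel_vel, omega, t_go)
    g = [sum(phi_rv[i][j] * z[i] for i in range(3)) for j in range(3)]
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g] if norm > 0.0 else [0.0, 0.0, 0.0]
```

A maximum-thrust command is then this unit vector multiplied by the maximum thrust acceleration amplitude, with the sign chosen so that the Pursuer decreases and the Evader increases the zero-effort miss.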
  3.3. The Design of Switchable Pursuit-Evasion Strategy
The pursuit–evasion strategy based on the linear quadratic method takes into consideration the fuel consumption, so it is suitable for long-distance pursuit and evasion. In addition, the thrust acceleration according to the linear quadratic feedback control law is related to  and the relative state , and the thrust acceleration of the Evader will decrease as the distance between the two satellites decreases. Hence, the Evader will increase its thrust acceleration by switching the parameters of linear quadratic strategy. That is, when the Pursuer approaches the Evader, the Evader increases the output thrust acceleration by decreasing .
When the distance between the Pursuer and the Evader is reduced to the warning range of the Evader, the best option for the Evader is to maneuver away from the Pursuer with a maximum thrust amplitude. That is, the Evader will switch to the zero-effort miss evasion strategy regardless of fuel consumption.
Based on the above analysis, the evasion strategy of the Evader can be designed. Suppose that the Evader has M evasion modes. The first M − 1 modes are linear quadratic evasion strategies obtained by changing the linear quadratic strategy parameter, and the M-th mode is the zero-effort miss evasion strategy. The Evader switches through the linear quadratic modes first and finally switches to the zero-effort miss mode, which can be expressed as
        
        where 
 represents the distance between the two spacecraft and 
 represents the strategy switch boundary of the Evader.
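The distance-based switching logic can be sketched as a simple lookup. The following Python function assumes that the switch boundaries are listed in decreasing order and that the mode depends only on the current relative distance; the function name and the use of an inclusive comparison at each boundary are illustrative assumptions.

```python
def select_evasion_mode(distance, boundaries):
    """Return the 1-based evasion mode for the current relative distance.

    boundaries holds the M - 1 switch distances in decreasing order.
    Mode 1 applies beyond the first boundary; mode M (zero-effort miss)
    applies inside the last boundary.
    """
    mode = 1
    for boundary in boundaries:
        if distance <= boundary:
            mode += 1
    return mode
```

With the hypothetical boundaries [5000.0, 1000.0], a distance of 8000 selects mode 1, a distance of 3000 selects mode 2, and a distance of 500 selects mode 3.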
The Pursuer will also switch its strategy after the Evader performs a strategy switch. Therefore, the Pursuer’s strategy can be designed as
        
However, in the case of incomplete information, the Pursuer does not know the strategy adopted by the Evader. Hence, the difficulty of strategy switching for the Pursuer lies in the estimation of the evasion strategy used by the Evader.
  4. Strategy Estimation Method
To maneuver away from the Pursuer more effectively, the Evader will actively switch evasion strategy, which requires the Pursuer to estimate the evasion strategy in real time, and then change to its appropriate pursuit strategy. In this section, an interactive multiple-model learning filtering combined with an LSTM neural network is proposed. Feedback learning filters are used to estimate the state of the Evader, and then the evasion strategy is estimated by the LSTM network. Afterwards, the Pursuer switches to an appropriate pursuit strategy based on the estimated evasion strategy to intercept the Evader.
  4.1. IMM-Based Strategy Switch Method
The multiple-model idea of the strategy switch of the pursuit–evasion game is to map the possible evasion strategy of the Evader into a model set, where each model corresponds to an evasion strategy. At the same time, multiple filters are used, working in parallel to estimate the state of each model. Then, the evasion strategy of the Evader is obtained by calculating the effective probability of each model.
The strategy switch method based on IMM is mainly divided into the following four steps:
Step 1. The dynamics of the pursuit–evasion game between two spacecraft satisfy a Markov process, and the transition probability is
        
        where 
 represents the system mode at time 
k.
According to the estimates of each filter at the previous moment, the inputs of the filter corresponding to the j-th model at the current moment are calculated first: the mixed probability, the mixed state estimate, and the corresponding error covariance matrix.
Mixed probability can be expressed as
        
        where 
 is constant, 
 is the probability of matching model 
i at time 
, 
 represents the transition probability from model 
i to model 
j.
Mixed state estimation and the corresponding error covariance matrix are given as
        
        where 
 and 
 are the state estimation and error covariance matrix of the 
i-th model filter at time 
k − 1, respectively.
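The input interaction described above follows the standard IMM mixing rule, which can be sketched in plain Python as follows. The function and variable names are hypothetical, and states and covariances are stored as nested lists to keep the example self-contained.

```python
def mix_inputs(trans, mu_prev, states, covs):
    """Standard IMM input interaction (mixing).

    trans[i][j]: transition probability from model i to model j
    mu_prev[i]:  probability of model i at time k-1
    states[i]:   state estimate of filter i at time k-1 (list)
    covs[i]:     error covariance of filter i at time k-1 (list of lists)
    Returns (c, mix_prob, mixed_states, mixed_covs).
    """
    n_models, dim = len(mu_prev), len(states[0])
    # Normalisation constants, i.e. the predicted model probabilities.
    c = [sum(trans[i][j] * mu_prev[i] for i in range(n_models))
         for j in range(n_models)]
    # mix_prob[i][j]: probability that model i was active given model j is now.
    mix_prob = [[trans[i][j] * mu_prev[i] / c[j] for j in range(n_models)]
                for i in range(n_models)]
    mixed_states, mixed_covs = [], []
    for j in range(n_models):
        x0 = [sum(mix_prob[i][j] * states[i][a] for i in range(n_models))
              for a in range(dim)]
        p0 = [[0.0] * dim for _ in range(dim)]
        for i in range(n_models):
            d = [states[i][a] - x0[a] for a in range(dim)]
            for a in range(dim):
                for b in range(dim):
                    # Filter covariance plus spread-of-means term.
                    p0[a][b] += mix_prob[i][j] * (covs[i][a][b] + d[a] * d[b])
        mixed_states.append(x0)
        mixed_covs.append(p0)
    return c, mix_prob, mixed_states, mixed_covs
```

The spread-of-means term inflates the mixed covariance when the model-conditioned estimates disagree, which is what lets each filter restart from an input that reflects the uncertainty about the active mode.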
Step 2. According to the different evasion strategies, the corresponding pursuit–evasion strategy models are constructed.
For the pursuit–evasion model with a linear quadratic evasion strategy, it can be expressed as
        
        where 
 and 
 denotes the 
 linear quadratic pursuit strategy.
By discretizing the above equation, the state transition equation in discrete form can be obtained.
        
        where 
 is zero-mean Gaussian white noise and represents the process noise sequence.
For the pursuit–evasion model with a zero-effort miss evasion strategy, it can be expressed as
        
Similarly, the state transition equation can be obtained by discretizing the above equation
        
In this way, the state transition matrix under the two game strategies can be acquired after receiving the new measurement information. Then, a Kalman filter is used to update the state estimation of the corresponding matching model 
j based on the new measurement information, which includes
        
        where superscript 
 represents the 
j-th filter corresponding to the 
j-th pursuit–evasion model, 
 and 
 are the process noise covariance and measurement noise covariance, 
 is the measurement matrix, and 
 is the measured value at time 
k.
Step 3. For the model 
, the model posterior probability at time 
k is calculated
        
        where 
 is the Gaussian likelihood function of the 
j-th model.
        
Obviously, the posterior probability at step 
k satisfies
        
According to the posterior probability of each model, the evasion strategy corresponding to the model with the highest probability is the strategy adopted by the Evader.
        
Then, as shown in Equation (20), the Pursuer chooses the corresponding pursuit strategy according to the evasion strategy.
Step 4. Based on the output of each sub-filter, the overall estimate and estimation error covariance matrix at time 
 can be calculated
        
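The model probability update, the mode decision, and the output fusion can be sketched together in plain Python. The sketch follows the standard IMM formulas; the function names are hypothetical, and the model likelihoods are taken as inputs rather than computed from the filter innovations.

```python
def update_mode_probabilities(likelihoods, c):
    """Posterior model probabilities: proportional to likelihood_j * c_j.

    c[j] is the predicted probability of model j from the mixing step.
    """
    weights = [lam * cj for lam, cj in zip(likelihoods, c)]
    total = sum(weights)
    if total <= 0.0:
        # All likelihoods vanished numerically; the update is undefined.
        raise ValueError("all model likelihoods are zero")
    return [w / total for w in weights]


def most_likely_mode(mu):
    """1-based index of the model with the highest posterior probability."""
    return max(range(len(mu)), key=lambda j: mu[j]) + 1


def fuse_estimates(mu, states, covs):
    """Probability-weighted overall estimate and its error covariance."""
    dim = len(states[0])
    x = [sum(m * s[a] for m, s in zip(mu, states)) for a in range(dim)]
    p = [[0.0] * dim for _ in range(dim)]
    for m, s, cov in zip(mu, states, covs):
        d = [s[a] - x[a] for a in range(dim)]
        for a in range(dim):
            for b in range(dim):
                p[a][b] += m * (cov[a][b] + d[a] * d[b])
    return x, p
```

Because the posterior is a ratio of likelihoods, two models with nearly identical predicted measurements receive nearly identical probabilities, which is the weakness of the classical IMM that the later sections address.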
  4.2. Interactive Multiple-Model Feedback Learning Filter
In the IMM method, the final fusion estimate is only the output, with no feedback for the estimation of the next state. However, the final fusion tends to be more accurate than the model-dependent estimates, so the final fusion can be used as a reference for next state estimation. On this basis, an interactive multiple-model filter based on fusion estimation feedback learning is proposed.
First, the feedback learning term is defined based on the IMM method, namely
        
        where 
 is obtained from the overall estimate at time 
 through the system state transition matrix, i.e.,
        
        where 
 is the total estimate at time 
, calculated by Equation (34).
The feedback learning term is used for state estimation in the next moment; that is,
        
        where 
 is the gain constant of the feedback learning term.
Due to the introduction of the feedback learning term, the gain matrix 
 needs to be redesigned. The error is defined as follows.
        
According to the updated estimation equation, the relationship between the three errors can be represented as
        
The corresponding covariance matrix is
        
        where 
 is the identity matrix and
        
The optimal gain matrix 
 can be obtained by minimizing the covariance matrix trace, i.e.,
        
Using the matrix calculation theory, any matrix satisfies
        
The optimal gain matrix is
        
Thus, the Kalman filter in Step 2 can be replaced by
        
In this way, the interactive multiple-model feedback learning filter is obtained. In Step 2, the multiple-model feedback learning filter is used to estimate the state of the corresponding mode.
  4.3. LSTM-IMML Method
RNN (Recurrent Neural Network) is a neural network used to deal with time series problems. It will memorize the previous information and apply it to the current output [
24]. Long short-term memory (LSTM) is a special RNN designed to handle long-term dependencies. It can process sequence data efficiently and has been adopted in natural language processing, trajectory prediction [
25], time series prediction [
26], and other fields [
27]. The LSTM unit can determine whether the current input is important, so long-term information is not affected by recursive operations and is stored in a more secure manner [
24]. Estimating the evasion strategy requires combining the current measurement information with the previously estimated states to improve accuracy and stability, so the estimate has a long-term dependence on past information. Therefore, the LSTM network is embedded into the IMML method in this section to estimate the mode probability of the evasion strategy.
The LSTM cell structure is shown in 
Figure 2. 
, 
, and 
 are the states of output, input, and the cell state in the 
t-th time step, respectively. A standard LSTM unit includes a forget gate, an input gate, and an output gate [
28]. The forget gate is used to determine which information will be forgotten from the cell state, the input gate determines whether new information can be kept in the cell state, and the output gate determines which information will be output. The mathematical expressions of the forget gate, input gate, and output gate can be seen in [
27].
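For convenience, the commonly used formulation of the LSTM gates is sketched below. The weight and bias symbols ($W$, $U$, $b$) are generic notation assumed here for illustration, and the exact expressions used in [27] may differ in detail.

```latex
\begin{align}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{align}
```

Here $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication; the additive update of $c_t$ is what allows information to persist over many time steps.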
In this paper, the LSTM network is mainly used to estimate the possible mode probability of the Evader based on the state information output by the multiple-model feedback filter. As shown in 
Figure 3, a probability estimation neural network based on LSTM is established.
As illustrated in 
Figure 3, the LSTM-based probability estimation network includes an input layer, two LSTM layers, two fully connected layers, and an output layer. The number of neurons in each layer is shown in 
Figure 3. A dropout layer is added after each LSTM layer to prevent the LSTM network from overfitting and to enhance the generalization ability of the network [
29]. The dropout probability of the dropout layer is set as 10%. During training, the dropout layer discards the activation value of the LSTM neurons with a probability of 10%, thereby improving the generalization ability of the network.
The framework of the proposed LSTM-IMML method for the Evader’s strategy estimation is shown in 
Figure 4. The method consists of two parts. The top block represents the offline training process and the bottom block represents the online estimation process. During training, the training data sets and the training test sets are first constructed according to the IMM method, and then the training is completed on the ground computer. After the training of the LSTM network is completed, the LSTM network can be applied in the actual space pursuit–evasion game scenario without online training.
The steps of using the LSTM-IMML algorithm to estimate the evasion strategy are detailed below.
- (1) Based on the last filter estimation, calculate the mixing probability, the mixing state estimation, and the mixing error covariance matrix;
- (2) Use multiple feedback learning filters to estimate and update the state of each model based on the new measurement information;
- (3) Take the measured residual of the filter as the input and adopt the trained LSTM network to calculate each mode probability;
- (4) Estimate and fuse the output of each filter, then repeat step (1).
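The four steps above can be arranged into one online cycle, as in the following Python skeleton. All four callables are hypothetical placeholders supplied by the caller: the sketch fixes only the order of operations and the data passed between steps, not the internals of the feedback learning filter or the trained network.

```python
def lstm_imml_step(mu_prev, states, covs, measurement, overall_prev,
                   mix, filter_update, mode_network, fuse):
    """One cycle of the LSTM-IMML estimation loop (structure only).

    Assumed interfaces of the placeholder callables:
      mix(mu_prev, states, covs) -> (mixed_states, mixed_covs)
      filter_update(j, x0, p0, measurement, overall_prev)
          -> (state, cov, residual)   # feedback learning filter of model j
      mode_network(residuals) -> list of mode probabilities
      fuse(mu, states, covs) -> (overall_state, overall_cov)
    """
    # Step (1): mixing based on the last filter estimates.
    mixed_states, mixed_covs = mix(mu_prev, states, covs)
    # Step (2): model-conditioned update; the previous fused estimate is
    # passed in so each filter can form its feedback learning term.
    updated = [filter_update(j, x0, p0, measurement, overall_prev)
               for j, (x0, p0) in enumerate(zip(mixed_states, mixed_covs))]
    new_states = [u[0] for u in updated]
    new_covs = [u[1] for u in updated]
    residuals = [u[2] for u in updated]
    # Step (3): the trained network maps residuals to mode probabilities.
    mu = mode_network(residuals)
    # Step (4): fuse the filter outputs into the overall estimate.
    overall_state, overall_cov = fuse(mu, new_states, new_covs)
    return mu, new_states, new_covs, overall_state, overall_cov
```

In use, the returned mode probabilities, filter outputs, and overall estimate become the inputs of the next cycle, and the mode with the highest probability determines the pursuit strategy.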
Compared with the classic IMM method, the proposed LSTM-IMML method mainly has the following improvements. (1) The IMML method was inspired by the feedback idea, and the final fusion estimation is introduced into the estimation of the next state in the pursuit–evasion game, which can effectively improve the precision of state estimation. (2) Combined with its memory function, the LSTM network is introduced to improve the estimation stability of the Evader’s evasion strategy mode and reduce fluctuations. (3) The introduction of the LSTM network avoids the error caused by singular value in model probability estimation using the IMM, which increases the robustness of the method.
  5. Numerical Simulation
The pursuit–evasion game scenario is simulated to verify the effectiveness of the proposed strategy switch algorithm based on an LSTM network and multiple-model feedback learning filtering. The reference orbit is a geosynchronous orbit, and the Pursuer and Evader maneuver around the reference spacecraft. The maximal thrust acceleration amplitudes of the Pursuer and Evader are 
 and 
, respectively. The interception range is set as 
. The initial state of the Pursuer is 
, and the initial state of the Evader is 
. The Evader adopts three evasion strategies: the first two are linear quadratic evasion strategies, and the third is a zero-effort miss evasion strategy.
      
      where the mode switch boundary 
. In the linear quadratic pursuit–evasion strategy, 
, 
, 
, and 
. In the LSTM-IMML method, the Markov transition probability matrix is 
, and the elements in this matrix correspond to 
 in Equation (21). The gain constant 
 is 0.05.
In the feedback learning filter, the measurement noise covariance matrix is  and the process noise covariance matrix is . All prior model probabilities are set to the same value, i.e., the initial moment mode probability is .
Two cases are simulated in this section. One is that the Pursuer adopts a fixed pursuit strategy, and the other is that the Pursuer switches its own pursuit strategy according to estimation of the Evader’s strategy. The Evader performs a strategy switch in both cases.
  5.1. The Case of a Pursuer with a Fixed Strategy
In this scenario, the Evader performs multiple strategy switches, as shown in Equation (46). The Pursuer adopts a fixed strategy, which is the linear quadratic pursuit strategy. The simulation results of pursuit–evasion are as follows.
Figure 5a shows the maneuvering trajectories of the Pursuer and the Evader, and the distance between the two satellites is depicted in 
Figure 5b. The simulation results indicate that the Pursuer does not intercept the Evader successfully, although the Pursuer gradually approaches the Evader at the beginning. After the distance reaches the switch boundary, the Evader switches its strategy to the zero-effort miss strategy. Then, the Evader maneuvers away from the Pursuer at maximum thrust acceleration, as shown in 
Figure 5d, which leads to an increase in the relative distance. 
Figure 5c,d show the velocity and control acceleration of the two satellites, respectively, and it is clear that the Evader adopts two strategy switches to increase the relative velocity between the Pursuer and the Evader.
 From the above simulation results, we can conclude that the Pursuer with a fixed pursuit strategy may not be able to intercept the Evader with multiple switchable evasion strategies. Thus, the Pursuer also needs to switch its own strategy to intercept the Evader.
  5.2. The Case of a Pursuer with Strategy Switch
In this scenario, the Pursuer performs a strategy switch to match the Evader’s strategy, and the IMM and LSTM-IMML methods are compared.
Firstly, the constructed strategy estimation network needs to be trained offline. According to the IMM method, the strategy of the Evader is estimated, and the strategy of the Pursuer is switched to perform the pursuit–evasion game simulation. Based on this, the training data set is constructed. The maximum number of training sets is 250, the gradient threshold is set as 1, and the learning rate is 0.005. The loss value in the training process is shown in 
Figure 6. From the training results, the LSTM neural network tends to converge after 250 iterations, indicating that the LSTM neural network is well trained.
Under the same conditions, based on the trained LSTM neural network, the simulation results of the LSTM-IMML method are shown as follows.
Figure 7a,b depict the maneuvering trajectory and relative distance when the Pursuer adopts the LSTM-IMML method for the estimation of the evasion strategy. It is clear that the Pursuer approaches the interception range within 20 m and finally intercepts the Evader. 
Figure 7c shows the velocities of the Pursuer and Evader. The control acceleration of the two satellites is shown in 
Figure 7d, which indicates the Pursuer quickly switches strategy after the Evader switches its evasion strategy. The effectiveness of the proposed LSTM-IMML strategy switch method is verified.
The evasion mode probabilities of the Evader estimated by the IMM method and the LSTM-IMML method are compared below.
Figure 8 shows the probability that the Pursuer uses the IMM method to estimate the evasion strategy adopted by the Evader. In general, the IMM method can estimate the evasion strategy effectively. However, when the difference between the different evasion strategies is small, the estimated probability difference is not obvious. Especially between 375 s and 420 s, the probability of mode 2 does not show an advantage, and the probability difference between mode 1 and mode 2 is weak. This is because, when the distance between the two satellites is reduced, the control outputs of the Evader’s strategy 1 and strategy 2 are similar, which leads to a weakening of the observability, and the filter cannot distinguish between the two strategy models. In addition, between 500 s and 600 s, although mode 3 is dominant, there is still a jump phenomenon.
 The mode probability of the Evader estimated by the LSTM-IMML method is shown in 
Figure 9, which illustrates that the evasion strategy used by the Evader can be accurately estimated. Compared with the IMM method, the LSTM network is more accurate and stable in estimating mode probability. In each time interval, the probability of the mode that matches the Evader’s actual evasion strategy is above 0.8. In particular, between 300 s and 420 s, the output difference between the control strategies of mode 1 and mode 2 is very small, and the observability becomes weaker. However, the mode 2 probability estimated by the LSTM-IMML method remains dominant, which also demonstrates the advantage of the LSTM-IMML method. In addition, the probability estimated by the LSTM-IMML method does not fluctuate drastically, unlike that of the IMM method.
To further analyze the accuracy of state estimation by the IMM method and the LSTM-IMML method, 100 Monte Carlo simulations were performed to calculate the error between the estimated state and the real value. The root mean square error of the position and the root mean square error of the velocity obtained by the IMM method and the LSTM-IMML method are shown in 
Figure 10 and 
Figure 11, respectively. The position and velocity root mean square errors of the LSTM-IMML method are smaller than those of the IMM method, indicating that introducing the feedback term in the filter improves state estimation accuracy.
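A Monte Carlo root mean square error of this kind can be computed as in the following Python sketch. It assumes that the error at each time step is the Euclidean norm of the estimation error vector and that the mean is taken across runs; the function name and the data layout are illustrative assumptions, and the exact definition used for the figures may differ.

```python
import math


def rmse_over_runs(errors):
    """Root mean square error at each time step across Monte Carlo runs.

    errors[run][k] is the estimation error vector (estimate minus truth)
    of the given run at time step k, e.g. the three position components.
    """
    n_runs, n_steps = len(errors), len(errors[0])
    rmse = []
    for k in range(n_steps):
        mean_sq = sum(sum(e * e for e in errors[r][k])
                      for r in range(n_runs)) / n_runs
        rmse.append(math.sqrt(mean_sq))
    return rmse
```

Applying the function once to the position errors and once to the velocity errors of each method gives curves that can be compared step by step.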
In addition, the case of the Evader using four evasion strategies is considered to further compare the IMM and LSTM-IMML methods; that is, the first three are linear quadratic strategies, and the fourth is a zero-effort miss strategy. The boundaries of mode switch are 
. In the linear quadratic pursuit–evasion strategy, 
. In the LSTM-IMML method, the Markov transition probability matrix is 
. The IMM and LSTM-IMML methods are used for simulation, and the simulation results are shown in 
Figure 12 and 
Figure 13.
Figure 12a and 
Figure 13a show the relative distance between the two satellites when the Pursuer adopts the IMM and LSTM-IMML methods, respectively. When the Evader adopts a switchable evasion strategy with four modes, the Pursuer using the LSTM-IMML method approaches the interception range within 20 m at the end, while the Pursuer using the IMM method does not. This is because evasion modes 2 and 3 are very similar: the IMM method cannot distinguish them accurately, whereas the LSTM-IMML method can, as can be seen from 
Figure 12b and 
Figure 13b. This simulation further proves the superiority of the proposed LSTM-IMML method in mode probability estimation.
   6. Conclusions
In this paper, a new switchable pursuit strategy for the Pursuer is proposed for the case in which the Evader adopts multiple switchable evasion strategies. Firstly, the linear quadratic and zero-effort miss pursuit–evasion strategies are designed. Then, the IMM method is used to identify the evasion strategy of the Evader by running multiple filters in parallel. To overcome the poor accuracy of the IMM method in estimating the Evader’s state, a feedback learning filter is proposed, in which a feedback term based on the fused estimate improves the state estimation accuracy. In addition, a mode probability estimation network based on LSTM is embedded in the interactive multiple-model learning filter to enhance the stability of the probability estimation. The simulation results verify the effectiveness of the proposed LSTM-IMML method. Compared with the IMM, the proposed LSTM-IMML method improves the state estimation accuracy and yields a more accurate and stable mode probability estimate, even when weak observability makes the evasion modes hard to distinguish, which improves the interception effectiveness of the Pursuer. The method can be implemented in a space pursuit–evasion game mission with incomplete information. In subsequent studies, we will consider constraints such as navigation information loss and sensor operating distance.