Article

Air Combat Maneuver Strategy Algorithm Based on Two-Layer Game Decision-Making and Distributed Double Game Trees MCTS under Uncertain Information

1 Aviation Engineering School, Air Force Engineering University, Xi’an 710038, China
2 Air Technical Sergeant Academy, Air Force Engineering University, Xinyang 464000, China
3 School of Computer Science, Shaanxi Normal University, Xi’an 710119, China
4 Unmanned System Research Institute, Northwestern Polytechnical University, Xi’an 710072, China
* Authors to whom correspondence should be addressed.
Electronics 2022, 11(16), 2608; https://doi.org/10.3390/electronics11162608
Submission received: 10 July 2022 / Revised: 30 July 2022 / Accepted: 1 August 2022 / Published: 20 August 2022
(This article belongs to the Section Systems & Control Engineering)

Abstract

In this paper, a model for maneuver decisions in air combat is established based on position situation information, fighter performance, the threat of combat intention, and the multi-fighter collaboration effect. Additionally, a two-layer game decision algorithm based on a distributed double game tree Monte Carlo tree search (MCTS) strategy is proposed, and the operational rules of interval numbers and possibility degree comparison rules are adopted to solve the designed method. The experimental results show that the model and algorithm are effective for their intended purpose. The two-layer game decision-making and distributed double game tree MCTS can pre-prune the huge game-tree strategy space and quickly identify the optimal air combat game decision scheme, which improves the efficiency of strategy searches. Comparative experiments show that the proposed algorithm can improve air combat performance.

1. Introduction

In recent years, the decision problem of close-range air combat has become a hot topic. Close-range air combat is a game process involving a target; it is characterized by high-level dynamics and intense confrontation [1,2]. Fighters make maneuver decisions according to rapidly changing air combat situation information. Since an air combat environment often contains complex factors such as uncertainty and incompleteness, it poses severe challenges for air combat decision making. Moreover, fighters are always in high-speed motion, and many players are involved in the game. The absence of definite combat patterns and methods, together with incomplete combat rules, leads to explosive growth of the game solution space. The purpose of air combat maneuver decisions is to obtain an optimal air combat situation, i.e., to threaten the target fighter and carry out effective attacks, or to escape the target fighter’s lock and get out of danger. Therefore, every maneuver directly affects the development of the engagement.
Despite the numerous achievements in air combat maneuver decision-making in a battlefield environment, few studies have focused on uncertain battlefield environment information. Addressing the decision problem of air combat maneuvers under certain (i.e., fully known) battlefield environmental information, Dr. Luo and Dr. Meng [3] used a multi-state transition Markov network to construct a maneuvering decision network, which met the real-time requirements of air combat decision-making but did not learn the network parameters. An iterative algorithm for online integral strategy combining approximate dynamic programming and a zero-sum game was proposed in [4]. Algorithms combining game theory with deep reinforcement learning to study the maneuvering decision problem of close-range air combat were proposed in [5,6]. The above models are mainly game methods applying the strategies of two or more sides, which can be categorized as matrix games and differential games. Their main characteristics are to factor an opponent’s strategy into the analysis and to emphasize the antagonism between two or more sides. In addition, there are methods that consider unilateral optimization, focusing on the optimization of one’s own strategies rather than predicting and analyzing opponents’ strategies; these mainly include intelligent methods, guidance laws, and expert system methods. An evolutionary expert system tree method for air combat decision-making was proposed in [7]; it solves the inability of traditional expert system methods to cope with unexpected situations. The method described in [8] uses deep reinforcement learning to solve the air combat maneuver decision-making problem. Prof. Du [9] proposed a maneuvering decision model that combines multi-objective optimization and reinforcement learning. Dr. Xu [10] combined the characteristics of a missile attack zone with the basic flight maneuver (BFM) method to study the 1vs1 autonomous air combat decision-making problem. An air combat maneuver method based on BFM was also introduced in [11].
It is usually difficult for both sides to obtain accurate information about each other on the battlefield. Therefore, it is necessary not only to study air combat games based on deterministic information, but also to solve the problems of incomplete information in air combat. Addressing the problems of uncertain battlefield information, Chen Xia and Liu Min [12] studied an offensive and defensive setting involving unmanned aerial vehicle (UAV) air combat. They modeled the payoff function and, combining it with particle swarm optimization (PSO), proposed a Nash equilibrium solution method under uncertain information conditions. The authors of [13] established an intuitionistic fuzzy game model for UAV air combat maneuvering and proposed a nonlinear programming method for solving the Nash equilibrium, which addresses UAV air combat maneuvering decision-making under uncertain environments. The authors of [14] attempted to solve the Nash equilibrium of non-cooperative games under an uncertain battlefield information environment and analyzed the influence of different combat factors on the outcomes of air-to-air confrontations. An information supplement method based on Bayesian theory and an information reduction method based on rough set theory were adopted to process uncertain air combat information; these processes improved the efficiency of autonomous decision-making in air combat [15]. Additionally, a belief-state MCTS was proposed by Dr. Xu to tackle problems with imperfect information [16].
Motivated by the above discussion, in this paper, a close-range air combat decision process under uncertain interval information conditions is modeled as a two-layer game decision-making problem. Additionally, a distributed double game tree Monte Carlo search algorithm is proposed to determine the optimal game strategy scheme, in which both parties establish a game tree and make synchronous decisions. Because the applied game is a multi-fighter, multi-round, continuous air combat game, there are a large number of players, and the air combat has no fixed strategies, i.e., the combat styles are complex and diverse. As a result, the maneuvering decision-making solution space grows dramatically in complexity. Traditional maneuvering decision-making methods are difficult to apply in simulation and are unable to comprehensively predict situations in air combat games, so a more efficient solution is needed. The MCTS algorithm introduces the idea of reinforcement learning based on trial sampling, and simulates and evaluates the air combat process through the algorithm’s iterative process, which is equivalent to filtering and optimizing the policy search space. This approach is suitable for solving problems with huge decision spaces. The algorithm can therefore grasp and predict air combat game situations increasingly accurately over a continuous simulation game process, and can thereby grasp and predict trends in the enemy’s strategy as accurately as possible. As a result, it can determine the best maneuvering strategy scheme for current and future situations.

2. Modeling of Two-Layer Game Maneuver Decision Problem

Because of limited intelligence information and the influence of electromagnetic interference on the battlefield, sensors may be restricted in their ability to identify targets or to determine the maximum detection range of radar and the ranges of certain weapons. As such, such information is likely to be obtained only within a certain range. Therefore, it is necessary to study an uncertain-information air combat game strategy algorithm for incomplete information. In order to improve maneuver decision-making, the analysis and processing of uncertain information is critical.
Multi-fighter air combat maneuver decision-making needs to solve the problems of who the combat target is, how to fly, and how to maneuver, which actually involve target allocation, coordinated tactics, and an action selection strategy. Based on the accurate modeling of air combat games and the determination of a fighter’s intentions, this paper simulates operational decision-making thinking according to the idea of simplifying complex problems, and divides maneuver decisions into two levels: target allocation and action selection. Target allocation decisions are made on the first level, and air combat maneuvering decisions on the second, after a target has been identified. Target allocation decision-making mainly determines the target allocation strategy in one-to-one or many-to-one situations, and solves the problems of who the combat target is and with whom to coordinate the operation. Action selection involves choosing a suitable maneuver strategy based on the target allocation plan, serving mainly to solve the problems of how to fly and how to maneuver. At the target allocation layer, the cooperative allocation scheme is mainly determined by the performance threat index and intention threat index of the whole system and by the cooperative effect generated by the game payoffs of each fighter in the system. At the action selection layer, the cooperative tactics are mainly determined by the cooperative performance threat index of the opponent’s fighters and the game payoff value of the maneuver decision.
First, a dominant function model should be established to assess the situation and the effect of multi-fighter coordination, as well as to support multi-fighter air combat target allocation and maneuver decisions. Many factors affect air combat situations. The process described in this paper makes air combat target allocation decisions based on angle and distance factors, the performance threat and combat intention threat indexes, and changes in the total threat index due to multi-fighter cooperation.

2.1. First Layer Target Allocation Decision Model

2.1.1. The Performance Dominant Function

The fighter performance dominant function needs to comprehensively consider factors such as maneuverability, detection ability, firepower, and electronic countermeasures capability. Suppose the maximum radar detection range of the i-th fighter of N is $d_i^{radar} = [d_i^{radar\,\min}, d_i^{radar\,\max}]$, its maximum attack range is $d_i^{missile} = [d_i^{missile\,\min}, d_i^{missile\,\max}]$, and its electronic countermeasure capability coefficient is $e_i^{ecm} = [e_i^{ecm\,\min}, e_i^{ecm\,\max}]$. Likewise, suppose the maximum radar detection range of the j-th fighter of M is $d_j^{radar} = [d_j^{radar\,\min}, d_j^{radar\,\max}]$, its maximum attack range is $d_j^{missile} = [d_j^{missile\,\min}, d_j^{missile\,\max}]$, and its electronic countermeasure capability coefficient is $e_j^{ecm} = [e_j^{ecm\,\min}, e_j^{ecm\,\max}]$. Since the fighter can perform various maneuvers quickly, it can be assumed that the radar on the fighter is omnidirectional and the fire attack angle is 360 degrees. Let $h_i$ be the air combat performance advantage index of the i-th fighter of N and $h_j$ that of the j-th fighter of M; $h_i$ and $h_j$ are calculated in the same way. Based on [17], the performance dominance function of fighter i is established as follows:
$$S_p = h_i / \max(h_i, h_j) \tag{1}$$
$$h_i = \left[ \ln\left(d_i^{radar} + 1\right) + \ln\left(d_i^{missile} + 1\right) \right] e_i^{ecm} \tag{2}$$

2.1.2. The Angular Dominant Function

The influence of the angle between two fighters on an attack situation is called the angular dominant function. Suppose A is the attacking fighter and D is the target fighter (Figure 1). The target line is defined as the line between A and D, of length $d_{AD}$. $\phi_{AD}$ is the angle between the velocity vector of A and the target line, and $q_{AD}$ is the angle between the velocity vector of D and the target line. Then, the angular dominant function is defined as follows:
$$S_a = 1 - \frac{|\phi_{AD}| + |q_{AD}|}{\pi} \tag{3}$$

2.1.3. The Distance Dominant Function

The influence of the distance between two fighters on an attack situation is called the distance dominant function. It represents the distance advantage of the two fighters and the influence of distance after an air combat decision. $d_{AD}$ is the distance between the i-th fighter of N and the j-th fighter of M. Generally, the maximum detection range of radar is larger than the range of missiles. Suppose that the distance advantage is 0 outside the radar detection range and 1 inside the missile range. When the distance lies between the maximum radar detection range and the missile range, the distance advantage increases as the distance between the two fighters decreases. The value of the distance dominant function is calculated as follows:
$$S_d = \begin{cases} 0, & d_{AD} > d_i^{radar\,\max} \\ \dfrac{d_i^{radar} - d_{AD}}{d_i^{radar} - d_i^{missile}}, & \text{others} \\ 1, & d_{AD} < d_i^{missile\,\min} \end{cases} \tag{4}$$
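For concreteness, the following is a minimal sketch of Equations (3) and (4), using scalar (point-valued) inputs for readability; in the paper the radar and missile ranges are interval numbers, handled with the rules of Section 3. Function and parameter names are illustrative, not taken from the authors’ code.

```python
import math

def angular_dominance(phi_AD: float, q_AD: float) -> float:
    """Eq. (3): S_a = 1 - (|phi_AD| + |q_AD|) / pi, angles in radians."""
    return 1.0 - (abs(phi_AD) + abs(q_AD)) / math.pi

def distance_dominance(d_AD: float, d_radar: float, d_missile: float) -> float:
    """Eq. (4): 0 beyond radar range, 1 inside missile range,
    increasing linearly as the fighters close in between."""
    if d_AD > d_radar:
        return 0.0
    if d_AD < d_missile:
        return 1.0
    return (d_radar - d_AD) / (d_radar - d_missile)

# Example: target nearly on the nose (phi = 0.1 rad), fleeing (q = 0.2 rad),
# at a distance between the missile range and the radar range.
S_a = angular_dominance(0.1, 0.2)             # ~0.905
S_d = distance_dominance(400.0, 650.0, 14.0)  # ~0.393
```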

2.1.4. The Performance Threat Index Dominant Function

When multiple fighters cooperate in air combat, the synergistic effect is reflected in the fact that the total threat index of one side decreases while that of the other side increases. Therefore, after the targets have been allocated, the performance threat index dominant function of the i-th fighter of N relative to the j-th fighter of M is calculated as follows:
$$P_{th}^{ij} = \begin{cases} 0, & d_{AD} > d_j^{radar\,\max} \\ \dfrac{\frac{h_j - h_i}{\max(h)}\, k\, d_{AD} + 1}{1 - k\, d_{AD}}, & \text{others} \\ 1, & d_{AD} \le d_j^{radar\,\max} - d_i^{radar\,\min} \end{cases} \tag{5}$$
When the distance between the two fighters is larger than the maximum detection range of the opponent’s radar, the threat index is 0. The parameter k is the threat index coefficient; k < 0 means that the threat index decreases as the distance increases. The threat index dominant function $P_{th}^{ji}$ of the j-th fighter of M relative to the i-th fighter of N is calculated analogously.

2.1.5. The Combat Intention Threat Value

In an air combat environment, both sides take a series of combat actions to achieve their combat intentions. Combat actions are achieved through a series of maneuvers, and the execution of such maneuvers leads to changes in the fighter status [18]. Therefore, the achievement of a combat intention is ultimately manifested as changes in the combat status parameters, while different combat intentions correspond to different change rules of the status parameters. In this paper, the intention space in close air combat is defined as four types of combat intention strategies, $F = \{\text{penetration}, \text{attack}, \text{cover}, \text{withdraw}\}$, where cover includes reconnaissance, jamming, maneuvering cover, and feint, and penetration includes low- and high-altitude penetration. According to [19,20,21], the mapping relationship between a fighter’s combat intention and its characteristic status parameters can be obtained. Table 1, Table 2 and Table 3 list the corresponding relationships between the fighter’s characteristic status information (altitude, course angle, and maneuver type) and its combat intention. The intention threat values are shown in Table 4.

2.1.6. The Combat Intention Threat Index Dominant Function

To estimate the threat index of the combat intentions of both sides, the enemy’s combat intentions must be identified first. According to the estimated threat index of the combat intention and the influence of the combat intention on combat effectiveness, game countermeasure strategies are adopted to achieve optimal combat effectiveness. In air combat, the enemy usually hides their true combat intentions as far as possible, which conceals them in the status information obtained by the opponent at any given moment. Moreover, the target fighters’ combat intentions are implemented through a series of combat actions and maneuvers. The real intention is usually hidden in dynamic, time-varying status information, so the combat intention should be identified from the target fighter’s status information over several consecutive moments. According to the method described in [22], we can extract a fighter’s characteristic information from time series and dynamically changing air combat situation data. On this basis, we can map the relationship between combat intention and the characteristic status parameters. Then, we can use a Long Short-Term Memory (LSTM) neural network to learn the fighter’s time-series characteristics and identify the target fighter’s combat intention. After determining the enemy’s combat intention, the intention threat index of the i-th fighter of N relative to the j-th fighter of M is estimated using the following formula; the threat index $P_f^{ji}$ of the j-th fighter of M relative to the i-th fighter of N is calculated analogously.
$$P_f^{ij} = 0.5 + \frac{S_F^i - S_F^j}{2 \max(S_F)} \tag{6}$$

2.1.7. The Total Threat Index of Multi-Fighter Coordination

The change in the collaborative total threat index reflects the influence of multi-fighter coordinated air combat on the combat situation. The total threat index of multi-fighter coordination mainly manifests as changes in the global performance threat index and combat intention threat index after multi-fighter coordination. Both the total performance threat index and the total intention threat index of the whole system should consider the threat of each combat unit to each of the opponent’s combat units. These indexes reflect the overall synergistic threat effect and performance of both sides. Therefore, when multiple fighters cooperate in air combat, the total performance threat index and the total intention threat index of both sides may be calculated by the following equations.
The cooperative performance threat index of N and the cooperative performance threat index of M are calculated using Equations (7) and (8):
$$P_{TH}^{n} = 1 - \prod_{i=1}^{n} \prod_{j=1}^{m} \left(1 - P_{th}^{ij}\right) \tag{7}$$
$$P_{TH}^{m} = 1 - \prod_{i=1}^{n} \prod_{j=1}^{m} \left(1 - P_{th}^{ji}\right) \tag{8}$$
Similarly, the cooperative intention threat index of N and the cooperative intention threat index of M are calculated as follows:
$$P_{F}^{n} = 1 - \prod_{i=1}^{n} \prod_{j=1}^{m} \left(1 - P_{f}^{ij}\right) \tag{9}$$
$$P_{F}^{m} = 1 - \prod_{i=1}^{n} \prod_{j=1}^{m} \left(1 - P_{f}^{ji}\right) \tag{10}$$
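As a quick illustration of Equations (7)–(10), the sketch below computes a cooperative index from a matrix of pairwise threat values; the variable names and the numeric values in the example are hypothetical.

```python
from math import prod

def cooperative_index(pairwise):
    """1 - product over all pairs (i, j) of (1 - p_ij), as in Eqs. (7)-(10)."""
    return 1.0 - prod(1.0 - p for row in pairwise for p in row)

# Example: pairwise performance threat indices P_th^{ij} for a 2 vs. 2
# engagement (hypothetical values).
p_th = [[0.3, 0.1],
        [0.2, 0.4]]
P_TH_n = cooperative_index(p_th)   # Eq. (7): 1 - 0.7*0.9*0.8*0.6 = 0.6976
```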

2.1.8. The Target Allocation Decision Function

In the first layer, the decisions on target allocation and combat intention are completed. In the target allocation decision stage, the following constraints are assumed to be met: a fighter from N should attack at least one fighter from M, and only one fighter from M can be attacked in a discrete short interval. Four factors are considered in making the target allocation decision. First, the performance and threat factors of the opponent’s fighters, such as their maneuverability, electronic countermeasure capability, and detection and firepower capabilities, are assessed. Second, the positional factors of the opponent’s fighters, i.e., the angle and distance, are considered. Third, the threat factors of the combat intentions of both sides are considered after determining the opponent’s combat intentions. Fourth, the effect factors of the threat index are updated based on multi-fighter collaboration. Since height dominance can be derived from the distance and angle, it is not considered in this paper. Speed is considered in the basic maneuver library, where the decision to either accelerate or decelerate is made. Taking an N fighter as an example, the comprehensive dominant function of the i-th N fighter relative to the j-th M fighter is constructed from $S_a^{ij}$, $S_d^{ij}$, $S_F^{ij}$ and $S_p^{ij}$. These values represent the angle dominant function, distance dominant function, combat intention threat value, and performance dominant function of the i-th N fighter relative to the j-th M fighter in the current situation, and are all normalized.
In conclusion, the total dominant function of multi-fighter cooperative air combat under a target allocation scheme can be established. These four factors, as well as the influence of the system’s global collaborative performance threat index and collaborative intention threat index, are comprehensively considered in the target allocation stage. The optimal payoff of collaborative target allocation is determined on this basis. Therefore, the total payoff functions of N and M, respectively, can be calculated as follows:
$$U_n = P_{TH}^{n} \sum_{i=1}^{n} \sum_{j=1}^{m} \left(\lambda_1 S_a^{ij} + \lambda_2 S_d^{ij} + \lambda_3 S_p^{ij}\right) + P_{F}^{n} \sum_{i=1}^{n} \sum_{j=1}^{m} \lambda_4 S_F^{ij} \tag{11}$$
$$U_m = P_{TH}^{m} \sum_{i=1}^{n} \sum_{j=1}^{m} \left(\lambda_1 S_a^{ji} + \lambda_2 S_d^{ji} + \lambda_3 S_p^{ji}\right) + P_{F}^{m} \sum_{i=1}^{n} \sum_{j=1}^{m} \lambda_4 S_F^{ji} \tag{12}$$
where $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ are weight coefficients and $\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 1$.
Both sides in the air combat game always try to maximize their respective payoff functions. Therefore, the target allocation decision function selects the scheme with the optimal total dominant function value from all candidate schemes, i.e., $g_n$ or $g_m$.
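A hedged sketch of how Equation (11) (and, symmetrically, Equation (12)) could be evaluated for one candidate allocation scheme follows. The n × m matrices of normalized dominant values and the helper name are assumptions for illustration; the default weights are the values used in the experiment of Section 5.

```python
def total_payoff(P_TH, P_F, S_a, S_d, S_p, S_F, lam=(0.2, 0.3, 0.3, 0.2)):
    """Eq. (11)/(12): weighted situational terms scaled by the cooperative
    performance index, plus intention terms scaled by the intention index.
    S_a, S_d, S_p, S_F are n x m matrices (lists of rows) of normalized values."""
    l1, l2, l3, l4 = lam
    situational = sum(l1 * a + l2 * d + l3 * p
                      for A, D, P in zip(S_a, S_d, S_p)
                      for a, d, p in zip(A, D, P))
    intention = sum(l4 * f for row in S_F for f in row)
    return P_TH * situational + P_F * intention

# The decision function then keeps the scheme with the largest payoff,
# e.g. g_n = max(schemes, key=lambda s: total_payoff(*s)).
```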

2.2. The Second Layer Maneuver Decision Model

2.2.1. Basic Maneuver Library

If an air combat game is considered as a game of chess in three-dimensional space, then the game strategies can be considered choices among the various positions on the chessboard, and different game strategies will result in different payoffs. According to [1], the maneuvers of a fighter in three-dimensional space can be divided into 11 categories. As shown in Figure 2, these maneuvers are: 1. direct flight without any maneuver; 2. climbing; 3. diving; 4. turning left; 5. climbing to the left; 6. diving to the left; 7. turning right; 8. climbing to the right; 9. diving to the right; 10. accelerating; and 11. decelerating. The inclination angle can reach −60°, 0°, and 60°, corresponding to climbing, flying without any maneuver, and diving, respectively, and the roll angle can reach −30°, 0°, and 30°, corresponding to turning left, flying without any maneuver, and turning right, respectively. The basic maneuver library can combine most tactical maneuvers in air combat, and different combinations of sequences correspond to different tactical maneuvers [1,9].
In the existing literature, the relationship between the current maneuver and the next one is often ignored. For example, if a fighter chooses to dive to the left, then in the next maneuver decision only maneuvers 3 and 4 should be possible, because the fighter cannot, e.g., turn or dive to the right in the next decision-making cycle. Therefore, for close-range air combat decision-making, the decision result of the current maneuver restricts the range of possible maneuvers in the next decision cycle. Accordingly, this paper establishes a constraint relationship between the current maneuver decision and the next maneuver decision library, as shown in Table 5 (and encoded in the sketch below), in order to prevent the algorithm from searching maneuvers that are infeasible or impractical.
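One way to encode the transition constraints of Table 5 is as a lookup table, with the maneuvers numbered 1–11 as in Section 2.2.1. This is an illustrative sketch, not the authors’ implementation.

```python
ANY = list(range(1, 12))          # "Any": all 11 maneuvers are permitted

NEXT_MANEUVERS = {
    1: ANY,         # direct flight -> any maneuver
    2: [1, 5, 8],   # climb -> direct flight, climb left, climb right
    3: [1, 6, 9],   # dive -> direct flight, dive left, dive right
    4: [1, 5, 6],   # turn left -> direct flight, climb left, dive left
    5: [2, 4],      # climb to the left -> climb, turn left
    6: [3, 4],      # dive to the left -> dive, turn left
    7: [1, 8, 9],   # turn right -> direct flight, climb right, dive right
    8: [2, 7],      # climb to the right -> climb, turn right
    9: [3, 7],      # dive to the right -> dive, turn right
    10: [1, 10],    # accelerate -> direct flight, accelerate
    11: [1, 11],    # decelerate -> direct flight, decelerate
}

def legal_moves(current: int) -> list:
    """Candidate maneuvers for the next decision cycle (Table 5)."""
    return NEXT_MANEUVERS[current]
```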

2.2.2. The Maneuver Decision Function

In the second layer, air combat maneuver decisions are made. Decisions regarding air combat maneuvers consider three factors: the first is the distance between the opposing fighters; the second is the angle between the two fighters; and the third is the performance threat effect of multi-fighter cooperation. The maneuver payoffs are calculated as follows:
$$U_i = \left(1 - \prod_{j=1}^{m} \left(1 - P_{th}^{ji}\right)\right) \left(\theta_1 S_a^{ij} + \theta_2 S_d^{ij}\right) \tag{13}$$
$$U_j = \left(1 - \prod_{i=1}^{n} \left(1 - P_{th}^{ij}\right)\right) \left(\theta_1 S_a^{ji} + \theta_2 S_d^{ji}\right) \tag{14}$$
where $U_i$ and $U_j$ are the payoffs of a maneuver decision, and $\theta_1, \theta_2$ are weight coefficients with $\theta_1 + \theta_2 = 1$. Here, $S_a^{ij}$, $S_a^{ji}$, $S_d^{ij}$, $S_d^{ji}$, $P_{th}^{ji}$, and $P_{th}^{ij}$ differ from those in the target allocation decision process: at this layer, they are the angle, distance, and performance dominant function values for the situation at the next moment. These values are all normalized.

3. Interval Number Correlation Methods

3.1. The Operational Rules of Interval Numbers

$x$ is called an interval number [23,24] if $x = [x^-, x^+] = \{\theta \mid x^- \le \theta \le x^+,\ x^-, x^+ \in R\}$, where $R$ is the set of real numbers, $x^-$ is the lower limit of $x$, and $x^+$ is the upper limit of $x$.
If $x = [x^-, x^+]$ and $y = [y^-, y^+]$ are two interval numbers, then their operational rules are defined as follows:
(1) Addition
$x + y = [x^-, x^+] + [y^-, y^+] = [x^- + y^-,\ x^+ + y^+]$.
(2) Subtraction
$x - y = [x^-, x^+] - [y^-, y^+] = [x^- - y^+,\ x^+ - y^-]$,
and $-x = -[x^-, x^+] = [-x^+, -x^-]$, so $x - x = [x^-, x^+] - [x^-, x^+] = [x^- - x^+,\ x^+ - x^-]$.
(3) Multiplication
$xy = [x^-, x^+][y^-, y^+] = [x^- y^-,\ x^+ y^+]$ (for the positive interval numbers used here), and $\lambda x = [\lambda x^-, \lambda x^+]$, where $\lambda$ is a positive real number.
(4) Division
$x \div y = [x^-, x^+] \div [y^-, y^+] = [x^-/y^+,\ x^+/y^-]$, where $y^- > 0$.
(5) Logarithm
$\log_c b = [\log_c b^-, \log_c b^+]$, where $c > 1$.
In the first-layer game decision, N has $m^n$ possible allocation schemes and M has $n^m$. According to the operational rules of interval numbers, and because the information is interval-valued, each value produced by the payoff function is an interval number. Therefore, the payoff values of the game can be written as $U_n = [U_n^{\min}, U_n^{\max}]$ and $U_m = [U_m^{\min}, U_m^{\max}]$.
Similarly, in the second-layer game decision, the payoff values can be written as $U_i = [U_i^{\min}, U_i^{\max}]$ and $U_j = [U_j^{\min}, U_j^{\max}]$.
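The following is a minimal interval-number class implementing the operational rules of Section 3.1, under the positivity assumptions stated above; the class and function names are illustrative, and the example values are the Section 5 settings for the first N fighter.

```python
from dataclasses import dataclass
import math

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, other):      # rule (1): addition
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):      # rule (2): subtraction
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):      # rule (3): multiplication of positive intervals
        return Interval(self.lo * other.lo, self.hi * other.hi)

    def __truediv__(self, other):  # rule (4): division, assuming other.lo > 0
        return Interval(self.lo / other.hi, self.hi / other.lo)

def ln1p(x: Interval) -> Interval:
    """Rule (5) applied to ln(x + 1), the form used in Eq. (2)."""
    return Interval(math.log(x.lo + 1.0), math.log(x.hi + 1.0))

# Example: the interval performance index h_i of Eq. (2), with the
# radar/missile/ECM intervals of N's first fighter from Section 5.
d_radar, d_missile, e_ecm = Interval(500, 800), Interval(12, 16), Interval(0.5, 0.8)
h_i = (ln1p(d_radar) + ln1p(d_missile)) * e_ecm
```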

3.2. The Solving Game Method Based on the Possibility Degree

According to [25], two interval numbers $U_1 = [U_1^{\min}, U_1^{\max}]$ and $U_2 = [U_2^{\min}, U_2^{\max}]$ can be compared by the possibility degree. The possibility degree that $U_1$ is superior to $U_2$ is denoted $P(U_1 \ge U_2)$ and given by Equation (15); accordingly, the possibility degree that $U_2$ is superior to $U_1$ is denoted $P(U_2 \ge U_1)$ and given by Equation (16):
$$P(U_1 \ge U_2) = \begin{cases} 1, & U_1^{\min} \ge U_2^{\max} \\[4pt] \dfrac{U_1^{\max} - U_2^{\max}}{U_1^{\max} - U_1^{\min}} + \dfrac{U_2^{\max} - U_1^{\min}}{U_1^{\max} - U_1^{\min}} \cdot \dfrac{U_1^{\min} - U_2^{\min}}{U_2^{\max} - U_2^{\min}} + 0.5\, \dfrac{U_2^{\max} - U_1^{\min}}{U_1^{\max} - U_1^{\min}} \cdot \dfrac{U_2^{\max} - U_1^{\min}}{U_2^{\max} - U_2^{\min}}, & U_2^{\min} \le U_1^{\min} \le U_2^{\max} \le U_1^{\max} \\[4pt] \dfrac{U_1^{\max} - U_2^{\max}}{U_1^{\max} - U_1^{\min}} + 0.5\, \dfrac{U_2^{\max} - U_2^{\min}}{U_1^{\max} - U_1^{\min}}, & U_1^{\min} \le U_2^{\min} \le U_2^{\max} \le U_1^{\max} \end{cases} \tag{15}$$
$$P(U_2 \ge U_1) = \begin{cases} 0, & U_1^{\min} \ge U_2^{\max} \\[4pt] 0.5\, \dfrac{U_2^{\max} - U_1^{\min}}{U_1^{\max} - U_1^{\min}} \cdot \dfrac{U_2^{\max} - U_1^{\min}}{U_2^{\max} - U_2^{\min}}, & U_2^{\min} \le U_1^{\min} \le U_2^{\max} \le U_1^{\max} \\[4pt] \dfrac{U_2^{\min} - U_1^{\min}}{U_1^{\max} - U_1^{\min}} + 0.5\, \dfrac{U_2^{\max} - U_2^{\min}}{U_1^{\max} - U_1^{\min}}, & U_1^{\min} \le U_2^{\min} \le U_2^{\max} \le U_1^{\max} \end{cases} \tag{16}$$
Using Equations (15) and (16) and comparing every two interval numbers, we can obtain the possibility degree matrix:
$$P_f = \begin{bmatrix} - & P_{12} & \cdots & P_{1h} \\ P_{21} & - & \cdots & P_{2h} \\ \vdots & \vdots & \ddots & \vdots \\ P_{h1} & P_{h2} & \cdots & - \end{bmatrix}$$
$U_h$ is the payoff value of the h-th scheme, and $P_{ij}$ is the possibility degree of $U_i \ge U_j$, with $P_{ji} = 1 - P_{ij}$ for $i, j \in \{1, 2, \ldots, h\}$. When $i = j$, the diagonal entry $P_{ii}$ is left blank (“−”), representing no comparison. The value of $P_{ij}$ indicates the degree to which $U_i$ is superior to $U_j$. If $P_{ij} = 1$, $U_i$ is definitely better than $U_j$; conversely, if $P_{ij} = 0$, $U_j$ is definitely better than $U_i$. Thus, the matrix $P_f$ is a complementary judgment matrix.
By comparing all the strategy combination schemes, i.e., comparing the interval payoffs in pairs, we obtain the possibility degree matrix. Then, by employing the improved chaotic particle swarm algorithm to sort the strategy combination schemes, we obtain the optimal scheme. A simplified code sketch of this comparison is given below.
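In the sketch below, intervals are (min, max) tuples assumed nondegenerate; the first two branches implement Equation (15) directly, and the remaining endpoint orderings are handled by complementarity with Equation (16). The row-sum ranking at the end is a simple stand-in, purely for illustration, for the improved chaotic PSO sorting used in the paper.

```python
def possibility(u1, u2):
    """Possibility degree P(u1 >= u2); Eq. (15), other cases via P_ji = 1 - P_ij."""
    (a1, b1), (a2, b2) = u1, u2
    l1, l2 = b1 - a1, b2 - a2
    if a1 >= b2:                                  # u1 entirely above u2
        return 1.0
    if a2 <= a1 <= b2 <= b1:                      # partial overlap
        w = (b2 - a1) / l1
        return (b1 - b2) / l1 + w * (a1 - a2) / l2 + 0.5 * w * (b2 - a1) / l2
    if a1 <= a2 and b2 <= b1:                     # u2 contained in u1
        return (b1 - b2) / l1 + 0.5 * l2 / l1
    return 1.0 - possibility(u2, u1)              # mirrored cases (Eq. 16)

def rank_schemes(payoffs):
    """Build P_f and pick the scheme with the largest row sum
    (diagonal set to 0.5 by convention in this sketch)."""
    h = len(payoffs)
    P = [[0.5 if i == j else possibility(payoffs[i], payoffs[j])
          for j in range(h)] for i in range(h)]
    return max(range(h), key=lambda i: sum(P[i]))

# Example: three candidate schemes with interval payoffs.
best = rank_schemes([(0.2, 0.5), (0.3, 0.4), (0.1, 0.6)])
```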

4. The Solving Algorithm Based on Two-Layer Game Decision-Making and Distributed MCTS

4.1. Distributed Monte Carlo Search Algorithm Based on Double Game Trees

Both sides establish their own game trees in the air combat game process. The whole process is described as a path from the root to the leaf of a multi-way game tree because the game process chooses a target fighter and a maneuver strategy for both sides. After the target allocation of the first layer decision, the target of each fighter is determined. This is equivalent to pre-pruning the game tree. Therefore, in the second layer decision-making, we consider the maneuvering decision under the condition that the target fighter has been determined. Compared with a one-time decision, the game strategy search space is greatly reduced, and the searching efficiency is improved.
The MCTS algorithm is used for game strategy selection. It can maintain a balance between exploitation and exploration, i.e., it aims to secure the best rewards from past decisions while obtaining greater rewards in the future, and it can thus master and predict game strategies increasingly accurately in current and future situations. MCTS builds a search tree to find the best decision by sampling the decision space of a specific domain; as summarized above, the air combat situation is mainly determined by the distance and angle between the two fighters. MCTS includes four steps: selection, expansion, simulation, and back-propagation [26]. In order to search the air combat game nodes, the MCTS framework in this paper adopts a modified Upper Confidence Bound (UCB) algorithm.
Step 1: Selection. Suppose the root node is the attacking side. The UCB value is calculated by Equation (17). The node with the maximum UCB value will be selected as the subsequent node.
$$UCB = \max\left\{ \overline{U_i} + C \sqrt{\frac{2 \ln n}{n_j}} \right\} \tag{17}$$
where $\overline{U_i}$ is the normalized average payoff of fighter i in the past (i.e., up to t − 1), n is the total number of times any game strategy has been selected, and $n_j$ is the number of times the j-th game strategy has been selected. C is a regulatory factor used to balance the known return value against unexplored nodes. (A code sketch of this selection rule follows the four steps below.)
Step 2: Expansion. If the MCTS algorithm has not reached the termination condition (the maximum number of iterations) or a leaf node, it continues to select game strategies in a downward process. If it reaches a leaf node, it expands the game strategy as a new node; as a result, the new game strategy is added to the Monte Carlo tree.
Step 3: Simulation. Since the new node has not yet been visited, its visit count and win count are both 0. A simulation is then carried out from the node according to the default random strategy.
Step 4: Back-propagation. Suppose side N or side M wins. Then, the visit and win counts of every node on the simulation path are both incremented by 1, i.e., 1 is added to those values on all the parent nodes.
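A minimal sketch of the UCB selection in Step 1 (Equation (17)) follows; the (mean payoff, visit count) node representation is an assumption for illustration, and the default C = 0.1 is the value used in Section 5.

```python
import math

def ucb_select(children, C=0.1):
    """children: list of (u_bar, n_j) pairs, where u_bar is the normalized
    average payoff of a child strategy and n_j its selection count.
    Returns the index of the child maximizing Eq. (17)."""
    n = sum(n_j for _, n_j in children)      # total number of selections so far
    def ucb(u_bar, n_j):
        if n_j == 0:
            return float("inf")              # always try unvisited strategies first
        return u_bar + C * math.sqrt(2.0 * math.log(n) / n_j)
    return max(range(len(children)), key=lambda k: ucb(*children[k]))

# Example: three candidate maneuvers at the current node.
best = ucb_select([(0.62, 10), (0.58, 4), (0.0, 0)])   # -> 2 (the unvisited node)
```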
In the traditional Monte Carlo search method, both sides of the game play sequential games on the same game tree and make decisions in turn [16,26]. In this way, both sides will make game decisions in chronological order. For example, when t = 1, the decision is made by N, but when t = 2, the decision is made by M; subsequently, when t = 3, the decision is made by N, and so on. Thus, the game is a repetitive process, as shown on the left of Figure 3. In sequential game decision-making, the player who makes the strategy choice and takes the action first usually occupies an advantageous position. The other player must choose its own strategy on the basis of the opponent’s action strategy. The most important characteristic of the air combat game, in contrast to a game of chess, is that both sides make decisions simultaneously under the current game situation environment.
Due to the different mission characteristics and the high real-time requirements of the air combat game, it is not suitable to directly use a traditional Monte Carlo tree search. In the multi-fighter and multi-round continuous air combat game, not only the coordination of multi-fighter combat, but also the influence of the historical game strategies of both sides should be taken into account, and the decisions of both sides should be made at the same time rather than in turn in order to avoid problems such as lagging decision information. Therefore, a distributed double game tree Monte Carlo search algorithm was designed in this paper for maneuvering decisions; this represents a novel MCTS method. As shown on the right of Figure 3, both sides establish a game tree. At t = 1, t = 2, …, t = n, both sides make synchronous decisions in their respective game trees at every moment, and there is no need to wait for the opponent to make a decision before taking turns. Meanwhile, the dominant value and decision function are calculated according to the real-time updated situation information. In this way, the battlefield situation can be perceived in real time, the opponent’s strategies can be applied when making decisions, and the optimal game strategy scheme can be obtained.

4.2. The Algorithm Flow

(1) Initialization: set the values of the parameters
(2) Repeat
(3) Determine the current nodes of the game trees and identify the opponent’s intentions according to the situation information;
(4) Calculate the payoffs, U n or U m , of the various target allocation schemes at the current moment;
(5) The first layer of game decision: select the optimal scheme, i.e., g n or g m ;
(6) Calculate all possible maneuver decision schemes, U i or U j , at the next moment;
(7) The second layer of game decision: select the maneuver decision scheme using the UCB algorithm and MCTS;
(8) Update the situation information of the game decision trees of both sides;
(9) t = t + 1;
(10) Repeat until the maximum number of iterations is reached or $|U_n - U_m| \ge \text{threshold}$.
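The skeleton below restates this flow in code form. Every helper is a runnable stub standing in for the corresponding model of Sections 2–4 (this is a control-flow sketch, not the authors’ implementation); the iteration cap of 21 and threshold of 0.9 follow the experiment settings in Section 5.

```python
# Placeholder hooks for the models of Sections 2-4; each would be replaced
# by the real implementation (these stubs only make the skeleton runnable).
def identify_intentions(state): return {}
def enumerate_allocations(state, intents): return [state]
def select_by_possibility_degree(schemes): return schemes[0]
def enumerate_maneuvers(state, g): return [g]
def mcts_ucb_select(moves): return moves[0]
def update_situation(state, action): return action
def payoff_gap(state): return 1.0   # forces immediate termination in this stub

def two_layer_decision_loop(initial_state, max_iters=21, threshold=0.9):
    """Control-flow sketch of steps (1)-(10) of Section 4.2."""
    state = initial_state                                 # step (1): initialization
    for t in range(max_iters):
        intents = identify_intentions(state)              # step (3): intention recognition (LSTM)
        schemes = enumerate_allocations(state, intents)   # step (4): interval payoffs U_n / U_m
        g = select_by_possibility_degree(schemes)         # step (5): first-layer target allocation
        moves = enumerate_maneuvers(state, g)             # step (6): maneuver payoffs at t + 1
        action = mcts_ucb_select(moves)                   # step (7): second-layer decision via UCB/MCTS
        state = update_situation(state, action)           # steps (8)-(9): update both game trees, t = t + 1
        if payoff_gap(state) >= threshold:                # step (10): |U_n - U_m| >= threshold
            break
    return state
```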

5. Experiment and Results

A 2vs2 air combat in three-dimensional space was simulated in this experiment. The simulation was initialized as follows. The initial coordinates of the N and M fighters were (1000, 500, 6100), (1500, 800, 4000) and (5000, 1000, 3200), (4500, 1500, 6500), respectively. The initial yaw angle, track inclination angle, and roll angle were (1, 0, 20), (1, 0, 0), (1, 0, 0), and (1, 0, −15), respectively. The $d^{missile}$ values of the N fighters were [12, 16] and [17, 20], and those of the M fighters were [10, 15] and [17, 21]. The $d^{radar}$ values of the N fighters were [500, 800] and [800, 1000], and those of the M fighters were [600, 800] and [700, 1000]. The $e^{ecm}$ values of the N fighters were [0.5, 0.8] and [0.6, 0.9], and those of the M fighters were [0.7, 0.9] and [0.6, 0.9]. We set k = −1/400 and C = 0.1. $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ were set to 0.2, 0.3, 0.3, and 0.2, and $\theta_1$ and $\theta_2$ were set to 0.6 and 0.4. The termination condition of the algorithm was set as follows: reaching the maximum number of decision iterations (21), or the difference between the two payoff function values reaching the threshold value of 0.9.
The simulation results are shown in Figure 4, Figure 5, Figure 6 and Figure 7 and Table 6. In this simulation, it was assumed that the basic maneuver libraries of both sides were the same, and the same decision algorithm was adopted. In Figure 5, Figure 6 and Figure 7, the average values of the interval results were used for plotting. Table 6 lists the maneuver decisions for all fighters. Figure 4 shows the air combat flight trajectories of the four fighters. Figure 4a,b present the three- and two-dimensional plane projection graphs of the flight trajectories, respectively. Each curve represents the combat flight trajectory of a fighter according to our algorithm. The total payoff function change curves of N and M, corresponding to the decision in the first layer game, are shown in Figure 5. Figure 5a presents the total payoff function of decisions taken by N, and Figure 5b those of M. The payoff function change curves of the four fighters corresponding to the decisions in the second layer game are shown in Figure 6 and Figure 7. Figure 6a,b present the changes in the payoff function of fighters 1 and 2 of N over the whole decision-making process, respectively, while Figure 7a,b present those of fighters 1 and 2 of M.
From the experimental results, it can be seen that the algorithm proposed in this paper is effective. Under the condition that the fighter performance of both sides is closely matched, combat fighters can correctly identify the combat intention and movement state of the opponent, forecast the trajectory of the opponent’s fighter, accurately predict the battlefield situation, and quickly make the optimal decision. It can be seen from Figure 4a,b that the two sides struggle through continuous target allocation and maneuvering decisions, and the flight trajectories show a highly staggered pattern. The situation evidently changes rapidly and becomes complicated. For example, the target allocation of the No. 1 fighter of N changed at t = 11, while the target allocation of the other fighters remained unchanged. From Figure 5a,b, it can be seen that the total payoff values of both sides and the payoff values of each fighter fluctuated constantly, indicating that the situation changed frequently for both sides, with advantages turning into disadvantages and vice versa. This conforms to actual combat situations. As can be seen from the results, when t = 1, the initial total payoff values of N and M were 0.6659 and −0.0375, respectively, at the target allocation stage; the payoff values of fighters 1 and 2 of N were 0.2145 and 0.1504, respectively; and the payoff values of fighters 1 and 2 of M were −0.1392 and −0.2145, respectively. However, when t = 21, the total payoff values of N and M were 0.6809 and 0.6829, respectively, at the target allocation stage; the payoff values of fighters 1 and 2 of N were 0.3869 and 0.1179, respectively; and the payoff values of fighters 1 and 2 of M were 0.1272 and 0.3827, respectively. It can be concluded that M was at a disadvantage in the initial situation, but by the end, the two sides were basically balanced. The strategy space of air combat games is very large, and it is impossible for an opponent to adopt the optimal solution every time. Even if the opponent uses the MCTS method to determine the optimal solution, it is still possible to turn defeat into victory by seizing the best decision-making moment and choosing the right decision scheme.
In order to verify the performance of the algorithm, we performed four groups of comparative experiments. In the four groups, N adopted the algorithm designed in this paper, while M adopted four different algorithms: the algorithm described in this paper, the traditional MCTS algorithm, the angle and distance optimal algorithm, and the distance optimal algorithm. All four algorithms must satisfy the maneuver constraints in Table 5 when making decisions. Following the design idea of the algorithm proposed in this paper, the traditional MCTS algorithm selects the optimal combat strategy for M from the first-layer decision-making schemes at time t (cf. Equation (12)) combined with the second-layer decision-making schemes at time t + 1 (cf. Equation (14)); however, it does not make two-layer decisions, but rather makes these decisions simultaneously in a single layer. The third group optimizes the angle and distance to make decisions; that is, according to Equations (3) and (4), the maneuvering strategy with the maximum benefit is selected as the combat strategy for M based on the combined optimization of the angle and distance at time t and at time t + 1. The fourth group applies optimal distance to make decisions; that is, the maneuvering strategy with the maximum benefit is selected as the combat strategy of M based on the comprehensive optimization of the distance at time t and at time t + 1.
In the algorithm proposed in this paper, the difference in the first-layer decision payoffs of the two sides is denoted $U_{nm} = U_n - U_m$, while the difference in the second-layer decision payoffs is denoted $U_{NM} = U_N - U_M = \sum_{i=1}^{n} U_i - \sum_{j=1}^{m} U_j$. The four groups of experiments were each carried out 20 times, and the $U_{nm}$ and $U_{NM}$ values of the four groups were compared. The results are shown in Table 7 and Table 8. When $U_{nm} > 0$ and $U_{NM} > 0$, the N side has the better payoff. Conversely, when $U_{nm} < 0$ and $U_{NM} < 0$, N has the poorer payoff and is at a disadvantage. When $U_{nm} = 0$ and $U_{NM} = 0$, both sides have the same payoff and the air combat situation is balanced. In the experiments, M used four different algorithms for air combat, while the N algorithm remained unchanged. The greater the probabilities of $U_{nm} > 0$ and $U_{NM} > 0$, the more difficult it was for M to gain an advantage and the smaller the benefits that M obtained; conversely, the greater the probabilities of $U_{nm} < 0$ and $U_{NM} < 0$, the more likely it was that M would gain an advantage, indicating better air combat countermeasure performance of M’s algorithm. The statistical results show that, among the four algorithms, the distance optimal algorithm had the simplest model but the worst air combat performance. Compared with the other three algorithms, the algorithm proposed in this paper had the best air combat performance and effect; the performance of the angle and distance optimal algorithm was close to that of the traditional MCTS algorithm, with the latter slightly better.

6. Conclusions

Addressing the maneuver decision-making problem of multi-fighter air combat under uncertain information conditions, this paper established a two-layer game decision-making algorithm using distributed MCTS with double game trees, and adopted the operational rules of interval numbers and the possibility-degree-based game solving method to determine the optimal game strategy scheme. The experimental results showed that the model and algorithm were effective, and that the fighters in air combat could accurately predict their opponent’s combat intentions, trajectories, and the battlefield situation. Based on repeated experiments, the following conclusions may be drawn. (1) The situation changes frequently in close-range air combat, and the strategy space of air combat games is very large. Even if the opponent uses the MCTS method to find the optimal solution, as long as the best decision-making moment is seized and the right solution is used, it is possible to turn defeat into victory. (2) The comparative experiments showed that the proposed algorithm can improve air combat performance. A great deal of research has been undertaken on air combat maneuver decision-making, and many achievements have been made; however, most focus on one-to-one air combat decision-making. Starting from the multi-fighter air combat maneuvering decision-making problem, this paper simulated the human decision-making process, decomposed maneuvering decision-making into a two-layer game problem comprising target allocation and maneuvering decisions, and considered uncertain information in the decision-making process. Moreover, the air combat maneuver decision-making problem was treated as a simultaneous decision-making problem for both sides, which improves on the traditional maneuver decision-making method and more closely resembles an actual combat situation. The research in this paper provides a theoretical and algorithmic basis for follow-up research on simulations of air combat confrontations in real scenarios.

Author Contributions

Conceptualization, Q.L. and F.W.; methodology, Q.L. and Z.L.; validation, F.W., Q.L. and Z.L.; investigation and data curation, W.Y.; writing—original draft preparation, Q.L.; writing—review and editing, Z.L.; project administration, Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number 62106284), the Natural Science Foundation of Shaanxi Province, China (grant number 2021JQ-370), and the Xi’an Youth Talent Promotion Plan (grant number 095920201309).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ji, H.M.; Yu, M.J.; Qiao, X.H.; Yang, H.Y.; Zhang, S.W. Application of the improved BAS-TIMS algorithm in air combat maneuver decision. J. Natl. Univ. Def. Technol. 2020, 42, 123–133. (In Chinese)
  2. Zhang, H.P.; Huang, C.Q.; Xuan, Y.B.; Tang, S.Q. Maneuver Decision of Autonomous Air Combat of Unmanned Combat Aerial Vehicle Based on Deep Neural Network. Acta Armamentarii 2020, 41, 1613–1622.
  3. Luo, Y.Q.; Meng, G.L. Research on UAV Maneuver Decision-Making Method Based on Markov Network. J. Syst. Simul. 2017, 29, 110–116. (In Chinese)
  4. Xi, Z.F.; Xu, A.; Kou, Y.X. Decision process of multi-aircraft cooperative air combat maneuver. Syst. Eng. Electron. 2020, 42, 381–389. (In Chinese)
  5. Mei, D.; Liu, J.T.; Gao, L. Maneuver decision of air combat based on approximate dynamic programming and zero-sum game. Ordnance Ind. Autom. 2017, 36, 35–39.
  6. Li, A.W.; Wang, Z. Close air combat maneuver decision based on deep stochastic game. Syst. Eng. Electron. 2020, 40, 1023–1032.
  7. Wang, X.; Wang, W.J.; Song, K.P. UAV air combat decision based on evolutionary expert system tree. Ordnance Ind. Autom. 2019, 38, 48–53.
  8. Zhang, X.B.; Liu, G.Q.; Yang, C.J.; Wu, J. Research on air confrontation maneuver decision-making method based on reinforcement learning. Electronics 2018, 7, 279.
  9. Du, H.; Cui, M.; Han, T.; Wei, Z.; Tang, C.; Tian, Y. Maneuvering decision in air combat based on multi-objective optimization and reinforcement learning. J. Beijing Univ. Aeronaut. Astronaut. 2018, 44, 2247–2256. (In Chinese)
  10. Xu, A.; Gao, C.Q.; Kou, Y.X. Autonomous air combat decision of 1vs1 based on BFM method. Syst. Eng. Electron. 2020, 42, 2513–2519. (In Chinese)
  11. Shin, H.M.; Lee, J.H.; Kim, H.G.; Shim, D.H.C. An autonomous aerial combat framework for two-on-two engagements based on basic fighter maneuvers. Aerosp. Sci. Technol. 2018, 72, 305–315.
  12. Chen, X.; Liu, M.; Hu, Y.-X. Study on UAV Offensive/Defensive Game Strategy Based on Uncertain Information. Acta Armamentarii 2012, 33, 1510–1515.
  13. Li, S.H.; Ding, Y.; Gao, Z.L. UAV air combat maneuvering decision based on intuitionistic fuzzy game theory. Syst. Eng. Electron. 2019, 41, 1063–1070.
  14. Li, Q.N.; Yang, R.N.; Feng, C.; Liu, Z.C. Approach for air-to-air confrontment based on uncertain interval information conditions. J. Syst. Eng. Electron. 2019, 30, 100–109.
  15. Li, F.F.; Wang, W.J.; Ma, D.Q. Research on Uncertain Information Processing in Tactical Decision-making. In Proceedings of the 7th China Conference on Command and Control, Beijing, China, 24–26 July 2019.
  16. Xu, X. Modeling CGF Tactical Decision Making through Monte Carlo Tree Search. Ph.D. Thesis, National University of Defense Technology, Changsha, China, 30 October 2019. (In Chinese)
  17. Zhang, B.C.; Kou, Y.N.; Wu, M. Close-range air combat situation assessment using deep belief network. J. Beijing Univ. Aeronaut. Astronaut. 2017, 43, 1450–1459. (In Chinese)
  18. Chen, Z.G.; Wu, X.F. A Novel Multi-Timescales Layered Intention Recognition Method. Appl. Mech. Mater. 2014, 644, 4607–4611.
  19. Chen, H.; Ren, Q.L.; Hua, Y. Fuzzy neural network based tactical intention recognition for sea targets. Syst. Eng. Electron. 2016, 38, 1847–1853. (In Chinese)
  20. Ou, W.; Liu, S.J.; He, X.Y. Tactical Intention Recognition Algorithm Based on Encoded Temporal Features. Command Control Simul. 2016, 38, 36–41.
  21. Xu, X.M. Research on Situational Awareness of Intelligent Air Combat Based on Machine Learning. Ph.D. Thesis, Air Force Engineering University, Xi’an, China, 2019. (In Chinese)
  22. Wu, X.Q.; Li, D.F. A Model for Aerial Target Attacking Intention Judgment Based on Reasoning and Multi-Attribute Decision Making. Electron. Opt. Control 2010, 17, 10–13.
  23. Moore, R.E.; Kearfott, R.B.; Cloud, M.J. Introduction to Interval Analysis; SIAM: Philadelphia, PA, USA, 2009.
  24. Sengupta, A.; Pal, T.K. On comparing interval numbers. Eur. J. Oper. Res. 2000, 127, 28–43.
  25. Zhang, Q.; Fan, Z.P.; Pan, D.H. A Ranking Approach for Interval Numbers in Uncertain Multiple Attribute Decision Making Problems. Theory Pract. Syst. Eng. 1999, 19, 129–133.
  26. Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of Go without human knowledge. Nature 2017, 550, 354–359.
Figure 1. The position relationship between two fighters.
Figure 2. Diagram of fighter maneuvers.
Figure 3. Diagram of the MCTS iterative process.
Figure 4. The air combat flight trajectories of four fighters. (a) The three-dimensional flight trajectory diagram; (b) the two-dimensional flight trajectory diagram.
Figure 5. The total payoff function change curves of the N and M sides in the first layer decision. (a) Total payoff of N; (b) total payoff of M.
Figure 6. The payoff function change curves of the N side in the second layer decision. (a) No. 1 fighter’s payoff for N; (b) No. 2 fighter’s payoff for N.
Figure 7. The payoff function change curves of the M side in the second layer decision. (a) No. 1 fighter’s payoff for M; (b) No. 2 fighter’s payoff for M.
Table 1. The relationship between altitude and the combat intention of the target fighter.

Altitude/m | Most Likely Combat Intention | Secondary Possible Combat Intention
50~200 | penetration | attack
200~1000 | cover | attack
1000~8000 | attack | cover
8000~10,000 | cover | penetration
above 10,000 | penetration | retreat
Table 2. The relationship between course and the combat intention of the target fighter.

Course Angle/(°) | Most Likely Combat Intention | Secondary Possible Combat Intention
0~20 | penetration | attack
20~60 | attack | penetration
60~90 | cover | attack
90~180 | retreat | cover
Table 3. The relationship between maneuver type and the combat intention of the target fighter.

Maneuver Type | Most Likely Combat Intention | Secondary Possible Combat Intention
8-shaped | cover | —
0-shaped | cover | —
S-shaped | cover | penetration
Climb | attack | retreat
Dive | attack | penetration
Snake maneuver | cover | penetration
Postposition tracking turn | attack | cover
Horizontal scissor maneuver | attack | cover
Table 4. Combat intention threat value.

Combat Intention F | Attack | Penetration | Cover
Intention threat value $S_F$ | 0.8 | 0.5 | 0.3
Table 5. Maneuver transfer list.

Case Number | The Current Maneuver Decision | The Optional Maneuvers in Next Decision Cycle
1 | Direct flight without any maneuver | Any
2 | Climb | Direct flight without any maneuver, Climb to the left, Climb to the right
3 | Dive | Direct flight without any maneuver, Dive to the left, Dive to the right
4 | Turn left | Direct flight without any maneuver, Climb to the left, Dive to the left
5 | Climb to the left | Climb, Turn left
6 | Dive to the left | Dive, Turn left
7 | Turn right | Direct flight without any maneuver, Climb to the right, Dive to the right
8 | Climb to the right | Climb, Turn right
9 | Dive to the right | Dive, Turn right
10 | Accelerate | Direct flight without any maneuver, Accelerate
11 | Decelerate | Direct flight without any maneuver, Decelerate
Table 6. Results of maneuver decisions.

Iterations | No.1 Fighter’s Maneuver for N | No.2 Fighter’s Maneuver for N | No.1 Fighter’s Maneuver for M | No.2 Fighter’s Maneuver for M
0 | 1 | 1 | 1 | 1
1 | 1 | 1 | 2 | 3
2 | 1 | 7 | 5 | 6
3 | 10 | 1 | 5 | 4
4 | 1 | 1 | 5 | 4
5 | 1 | 1 | 5 | 5
6 | 7 | 10 | 4 | 4
7 | 9 | 1 | 4 | 6
8 | 9 | 7 | 6 | 6
9 | 7 | 8 | 6 | 4
10 | 7 | 7 | 4 | 4
11 | 1 | 1 | 1 | 1
12 | 1 | 1 | 2 | 10
13 | 4 | 7 | 2 | 2
14 | 1 | 7 | 1 | 5
15 | 4 | 8 | 3 | 4
16 | 5 | 8 | 6 | 6
17 | 5 | 7 | 6 | 6
18 | 4 | 9 | 6 | 6
19 | 4 | 9 | 4 | 4
20 | 6 | 7 | 5 | 4
21 | 6 | 7 | 4 | 4
Table 7. Probability of $U_{nm}$ compared to 0.

$U_{nm}$ | The Proposed Algorithm | Traditional MCTS Algorithm | Angle and Distance Optimal Algorithm | Distance Optimal Algorithm
>0 | 50% | 68% | 75% | 84%
=0 | 2% | 7% | 3% | 3%
<0 | 48% | 25% | 22% | 13%
Table 8. Probability of $U_{NM}$ compared to 0.

$U_{NM}$ | The Proposed Algorithm | Traditional MCTS Algorithm | Angle and Distance Optimal Algorithm | Distance Optimal Algorithm
>0 | 49% | 63% | 68% | 89%
=0 | 3% | 5% | 3% | 1%
<0 | 48% | 32% | 29% | 10%

