Abstract
In human-computer gaming scenarios, the autonomous decision-making problem of an unmanned combat air vehicle (UCAV) is a complex sequential decision-making problem involving multiple decision-makers. In this paper, building on a game-theoretic approach, an autonomous maneuver decision-making method for the UCAV that considers the partially observable states of Human (the adversary) is proposed. The maneuver decision-making process within the current time horizon is modeled as a game between Human and the UCAV, which significantly reduces the computational complexity of the entire decision-making process. In each established game decision-making model, an improved maneuver library that contains all possible maneuvers (called the continuous maneuver library) is designed, and each of these maneuvers corresponds to a mixed strategy of the established game. In addition, the unobservable states of Human are predicted via the Nash equilibrium strategy of the previous decision-making stage. Finally, the effectiveness of the proposed method is verified through adversarial experiments.
1. Introduction
With the advancement of science and technology, air combat equipment is innovating at an accelerating pace, which gives air combat features such as high dynamics, strong confrontation, and incomplete information [1,2]. Traditionally, air combat has been commanded by experienced pilots. However, an excellent pilot must accumulate combat experience through a large amount of combat training, which is extremely costly. Moreover, limited by human physiology and cognition, it is difficult for pilots to maintain a comprehensive grasp of the battlefield situation in the face of the new characteristics of modern air combat [3,4]. Therefore, it is particularly necessary to develop an autonomous decision-making system that can take over the role of human pilots [5,6]. Driven by this demand, research on human-computer gaming has gradually emerged as artificial intelligence technologies have matured. As a human-computer gaming technology, the autonomous maneuver decision-making of an unmanned combat air vehicle (UCAV) versus human pilots has recently become a research hotspot [7,8].
In recent years, with the emergence of artificial intelligence technologies such as deep learning and reinforcement learning, the information acquisition, processing, and analysis capabilities of artificial intelligence-enabled machines have been greatly improved [9,10,11,12]. In the field of cognitive intelligence, human-computer gaming algorithms, represented by AlphaGo [13] and Libratus [14], have already defeated top human professional players in problems with defined boundaries and fixed rules. However, air combat decisions are made in an open environment with no fixed rules and incomplete information [15,16,17]. In addition, air combat differs from scenarios such as Go and Texas hold'em poker, which can be learned through repeated training sessions. As a result, existing human-computer gaming technologies cannot be transferred directly to air combat decision-making problems. In comparison, mathematical modeling of the human-computer gaming problem is a feasible research direction. Game theory, a mathematical tool used to describe the strategic interaction between multiple rational decision-makers, has recently been used by many scholars to explore decision-making approaches in air combat. In Ref. [18], a constraint strategy game approach was proposed to give intelligent decisions for multiple UCAVs with time-sensitive information. In Ref. [19], the task management and control problem was studied based on a dynamic game strategy, and a fast optimal search algorithm inspired by graph theory and the Kuhn–Munkres algorithm was designed to give the optimal decisions. In Ref. [20], a dimensionality reduction approach for matrix games was developed to provide efficient solutions for the multi-UCAV attack-defense decision-making problem.
As a typical air combat decision-making problem, maneuver decision-making has attracted considerable interest from researchers in recent years [21,22,23]. A matrix game approach was proposed to generate maneuver decisions for low-flying aircraft during one-on-one air combat over hilly terrain in Ref. [24], which inspired subsequent research on game theory in the maneuver decision-making problem. A maneuver library consisting of seven basic maneuvers was designed: max load factor turn left, max load factor turn right, max longitudinal acceleration, steady flight (flight at the current speed), max longitudinal deceleration, max load factor pull up, and max load factor push over. There are also variants of this maneuver library, such as a library with 15 maneuvers [25], a library with 36 maneuvers [26], etc. An autonomous maneuver decision-making method based on an improved deep reinforcement learning (DRL) algorithm was proposed in Ref. [27] for a UCAV in short-range aerial combat. In the proposed algorithm, the final return value is incorporated into the previous steps, which overcomes the defects of traditional DRL in terms of training speed and convergence speed. A novel dynamic quality replay (DQR) method was proposed in Ref. [28] with the help of a DRL algorithm, which enables UAVs to effectively learn maneuver strategies from historical data without relying on traditional expert systems.
The above studies have gradually promoted the development of autonomous maneuver decision-making approaches. On this basis, we propose a maneuver decision-making method that considers the partially observable states of Human (the adversary) together with a continuous maneuver library. Existing maneuver libraries are composed of a limited number of basic maneuvers, so all available maneuvers designed in this way are discrete. However, in the real maneuver decision-making process, the possible maneuvers are distributed in a continuous space. On this account, we design a continuous maneuver library that contains all possible maneuvers. In addition, the partially observable states of Human, as a class of incomplete information in air combat, are considered during the maneuver decision-making process. Maneuver decision-making is an iterative observation and decision-making process through interaction with the environment [29]. However, the current state of Human is sometimes unobservable in real air combat, due to equipment performance limitations and external disturbances. Considering this kind of incomplete information, we give a state prediction method for the case in which the state of Human is unobservable.
The main contributions of this paper are summarized as follows:
- (1) The maneuver decision-making process within the current time horizon is modeled as a game of UCAV and Human, which inherently reduces the computational complexity. In each established game decision-making model, a continuous maneuver library that contains all possible maneuvers is designed, where each maneuver corresponds to a mixed strategy of the game model; this not only enriches the maneuver library but also solves the problem of the executability of mixed strategies;
- (2) The partially observable state of Human is considered during the dynamic maneuver decision-making process of UCAV and Human, and a method to predict the unobservable state of Human is given via the Nash equilibrium strategy of the previous decision-making stage.
The structure of this paper is as follows. In Section 2, the maneuver decision-making problem in human-computer gaming is described, and a situation assessment method is reviewed to assess the quality of the maneuvers. In Section 3, an autonomous maneuver decision-making method is proposed for UCAV versus Human. Firstly, the decision-making process of the current time horizon is modeled as a game of both sides; then a continuous maneuver library that contains all possible maneuvers is designed; finally, a method for predicting the unobservable state of Human is given. In Section 4, simulations and analyses are presented to demonstrate the effectiveness of the proposed method.
2. Problem Formulation and Preliminaries
This paper considers the following maneuver decision-making problem in human-computer gaming scenarios: the two confronting entities in air combat are Human (the experienced pilot, labeled as $H$) and Computer (the autonomous decision-making system of the UCAV, labeled as $C$). Both aim to achieve the best possible attack position through a series of reasonable maneuvers. In this process, both $H$ and $C$ choose a maneuver according to the current air combat situation, after which the current situation evolves into the next situation; the two sides then observe the new air combat situation and make further maneuver decisions, and so on, until one side occupies the best possible attack position and launches missiles. This is a sequential decision-making process involving two interacting decision-makers, which is essentially a dynamic game between $H$ and $C$.
We assume that both $H$ and $C$ complete a maneuver every time interval $\Delta T$. As a consequence, the maneuver decision-making process of $H$ and $C$ can be divided into multiple consecutive decision-making stages, as shown in Figure 1, where $t_i$ ($i = 0, 1, \ldots, N$) is the $i$-th decision point, satisfying $t_{i+1} = t_i + \Delta T$, with the terminal situation at $t_N$ corresponding to an ideal attack position of one side; $s_i$ is the air combat situation of $H$ and $C$ at time $t_i$; and $a_{C,i}$ and $a_{H,i}$ represent the maneuvers taken by $C$ and $H$ at time $t_i$, respectively.
Figure 1.
Maneuver decision-making process of $H$ and $C$.
During the maneuver decision-making process of $H$ and $C$, $C$ first receives the air combat information of both sides in real time through airborne sensors and ground base stations, and then conducts a situation assessment based on the received information, which is the basis for subsequent maneuver decision-making. The air combat situation of $H$ and $C$ is given in Figure 2, where $D$ is the distance between $H$ and $C$, $\Delta h$ is the height difference between them, $\varphi$ and $q$ are the target azimuth angle and target entry angle, respectively, and $v_H$ and $v_C$ are the speeds of $H$ and $C$, respectively. We point out that the above parameters can be calculated from the positions and speeds of $H$ and $C$. Thus, the states of $C$ and $H$ are defined by their positions and speeds, and are denoted as $x_C$ and $x_H$, respectively.
Figure 2.
Air combat situation of $H$ and $C$.
The situation assessment index system is composed of speed advantage, angle advantage, distance advantage, height advantage, and performance advantage. In Ref. [30], the speed advantage function considers only the speeds of the two aircraft. On this basis, we propose an improved speed advantage function that considers both the aircraft speeds and the missile speeds. The speed advantage function is given as follows:

$$ f_v = \omega_1 f_{va} + \omega_2 f_{vm}, $$

where $f_{va}$ is the aircraft speed advantage function, $f_{vm}$ is the missile speed advantage function, and $\omega_1$ and $\omega_2$ are weight coefficients (satisfying $\omega_1 + \omega_2 = 1$, $\omega_1, \omega_2 \geq 0$). For the aircraft speed advantage function $f_{va}$, please refer to Ref. [30]. Similar to $f_{va}$, the missile speed advantage function $f_{vm}$ is defined in terms of the missile speeds $v_{Cm}$ and $v_{Hm}$ of $C$ and $H$, respectively.
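Since the original display equation for $f_{vm}$ is not reproduced here, the following is a minimal sketch of one plausible ratio-based form, assuming the monotone, bounded structure that is typical of speed advantage functions; the breakpoints 0.6 and 1.5 are illustrative assumptions rather than values from Ref. [30]:

$$
f_{vm} =
\begin{cases}
0.1, & v_{Cm} \le 0.6\, v_{Hm}, \\[2pt]
0.5 + \dfrac{v_{Cm} - v_{Hm}}{v_{Hm}}, & 0.6\, v_{Hm} < v_{Cm} < 1.5\, v_{Hm}, \\[2pt]
1.0, & v_{Cm} \ge 1.5\, v_{Hm}.
\end{cases}
$$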
For the detailed formulas of the angle advantage, distance advantage, height advantage, and performance advantage, please see Refs. [3,30,31,32]. By combining the advantage functions of speed, angle, distance, height, and performance, the overall situation advantage $\Phi_C$ of $C$ over $H$ can finally be obtained [3].
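As a concrete illustration, a minimal Python sketch of aggregating the five advantage terms into an overall situation assessment value; the weighted-sum structure and the weight values are assumptions made for illustration, not the exact aggregation used in Ref. [3], and the individual advantage functions are treated as given:

```python
import numpy as np

def situation_advantage(adv_speed, adv_angle, adv_dist, adv_height, adv_perf,
                        weights=(0.25, 0.30, 0.20, 0.15, 0.10)):
    """Combine the five advantage terms (each assumed to lie in [0, 1]) into an
    overall situation advantage value. The weighted sum and the weights are
    illustrative assumptions standing in for the aggregation of Ref. [3]."""
    terms = np.array([adv_speed, adv_angle, adv_dist, adv_height, adv_perf], dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w / w.sum(), terms))
```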
3. Design of the Maneuver Decision-Making Method
In this section, an autonomous maneuver decision-making method for $C$ that considers the partially observable states of $H$ is proposed. The maneuver decision-making process within the current time horizon is modeled as a game of $C$ and $H$. In each game, a continuous maneuver library that contains all possible maneuvers is designed. Finally, a method to predict the unobservable state of $H$ is proposed. The overall flow of the proposed autonomous maneuver decision-making method is given in Figure 3: the game decision-making model of $C$ and $H$ at the $i$-th decision-making stage is constructed based on the continuous maneuver library and the situation assessment method; the strategy executed by $C$ is then obtained from the calculated Nash equilibrium of the established game; subsequently, $C$ observes the state of $H$ when this state is observable; otherwise, the unobservable state of $H$ is predicted via the Nash equilibrium strategy $\sigma_{H,i}^*$; if one side reaches a suitable attack position, the guns/missiles are launched; otherwise, the air combat situation is updated and the above maneuver decision-making process is repeated.
Figure 3.
Overall flow of the maneuver decision-making method of $H$ and $C$.
3.1. Game Decision-Making Model of the Current Time Horizon
As a dynamic game process, the computation time of maneuver decision-making increases exponentially with the number of decision stages. As a consequence, it is unrealistic to consider the entire maneuver decision-making process at once. Here, we borrow the idea of moving horizon solutions of dynamic games in Ref. [33], and the maneuver decision-making process within the current time horizon is modeled as a game of $C$ and $H$. That is, we do not consider the entire decision-making process at once, but limit the computation to a short time horizon that may involve only the next few decision-making stages, which significantly reduces the computational complexity. Formally, the $i$-th decision stage of the maneuver decision-making process is modeled as the following game: $G_i = \{s_i; A_C, A_H; u_C, u_H\}$, where
- $s_i$ is the air combat situation of the $i$-th decision-making stage;
- $A_C$ and $A_H$ are the maneuver sets of $C$ and $H$, respectively. Since both $C$ and $H$ have full maneuverability, we assume that they have the same maneuver set, i.e., $A_C = A_H$;
- $u_C$ is the payoff function of $C$, which associates each maneuver pair $(a_C, a_H) \in A_C \times A_H$ with a real value $u_C(a_C, a_H)$. The payoff function of $H$ is defined as $u_H = -u_C$, due to the adversarial nature of $H$ and $C$.
Below we give the construction of the payoff functions. The purpose of $C$ (or $H$) in making maneuver decisions is to gain a situational advantage over the other side. Therefore, the payoff function is defined as the added value of the situation assessment after taking a certain maneuver combination. Formally, the payoff function of $C$ is constructed as follows:

$$ u_C(a_C, a_H) = \Phi_C(s_{i+1}) - \Phi_C(s_i), $$

where $s_{i+1}$ is the air combat situation after taking the maneuvers $a_C$ and $a_H$, and $\Phi_C(s_i)$ and $\Phi_C(s_{i+1})$ are the situation assessment values of $C$ under the air combat situations $s_i$ and $s_{i+1}$, respectively.
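The following Python sketch shows how a stage payoff matrix could be assembled from this definition; `propagate_situation` (the dynamics of both aircraft over one decision interval, standing in for the model of Ref. [36]) and `phi_C` (the situation assessment of Section 2) are hypothetical helper names introduced only for illustration:

```python
import numpy as np

def payoff_matrix(s_i, maneuvers_C, maneuvers_H, propagate_situation, phi_C, dt=4.0):
    """Build C's payoff matrix for one decision-making stage: entry (j, k) is the
    added situation value when C plays maneuver j and H plays maneuver k."""
    base = phi_C(s_i)
    M = np.zeros((len(maneuvers_C), len(maneuvers_H)))
    for j, a_C in enumerate(maneuvers_C):
        for k, a_H in enumerate(maneuvers_H):
            s_next = propagate_situation(s_i, a_C, a_H, dt)   # next air combat situation
            M[j, k] = phi_C(s_next) - base                    # added situation value
    return M
```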
In the context of game theory, $A_C$ and $A_H$ are called the pure strategy sets of $C$ and $H$, respectively. However, the strategy chosen in a game is often a probability distribution over the pure strategy set, which is called a mixed strategy. Formally, a mixed strategy of $C$ is a probability distribution $\sigma_C$, which associates each pure strategy $a_j \in A_C$ with a value $\sigma_C(a_j) \in [0, 1]$, where $\sigma_C(a_j)$ represents the probability of choosing strategy $a_j$. A pure strategy set is usually a finite set, so a mixed strategy can be represented by the range of the probability distribution (a probability vector). The set of all probability distributions on $A_C$ is called the mixed strategy set of $C$, denoted as $\Sigma_C$. Similarly, the mixed strategy set of $H$ is given as $\Sigma_H$. The mixed strategy is essentially an extension of the pure strategy. Naturally, the payoff function of $C$ defined on the mixed strategy sets can be given as [34]:

$$ U_C(\sigma_C, \sigma_H) = \sum_{a_C \in A_C} \sum_{a_H \in A_H} \sigma_C(a_C)\, \sigma_H(a_H)\, u_C(a_C, a_H), $$

where $\sigma_C \in \Sigma_C$, $\sigma_H \in \Sigma_H$. In fact, $U_C(\sigma_C, \sigma_H)$ is the expected payoff of $C$ under the probability distribution combination $(\sigma_C, \sigma_H)$.
Nash equilibrium is a widely adopted solution concept of a game. In a Nash equilibrium, no player can increase his own payoff by unilaterally changing his current strategy. For the above game $G_i$, a strategy combination $(a_C^*, a_H^*)$ is called a pure strategy Nash equilibrium of $G_i$ if, for each $a_C \in A_C$ and each $a_H \in A_H$, the following inequalities hold [34]:

$$ u_C(a_C^*, a_H^*) \ge u_C(a_C, a_H^*), \qquad u_H(a_C^*, a_H^*) \ge u_H(a_C^*, a_H). $$

Similarly, a strategy combination $(\sigma_C^*, \sigma_H^*)$ is called a mixed strategy Nash equilibrium of $G_i$ if, for each $\sigma_C \in \Sigma_C$ and each $\sigma_H \in \Sigma_H$, there is [34]

$$ U_C(\sigma_C^*, \sigma_H^*) \ge U_C(\sigma_C, \sigma_H^*), \qquad U_H(\sigma_C^*, \sigma_H^*) \ge U_H(\sigma_C^*, \sigma_H). $$
It has been proved by John Nash that every finite game has at least one mixed strategy Nash equilibrium [34]. However, not all games have a pure strategy Nash equilibrium. A mixed strategy Nash equilibrium of a two-player zero-sum game can be obtained by the following lemma.
Lemma 1
(Parthasarathy and Raghavan [35]). Consider a two-player zero-sum game $G = \{A_1, A_2; M\}$, where $A_1 = \{1, \ldots, m\}$ and $A_2 = \{1, \ldots, n\}$ are the pure strategy sets of players 1 and 2, respectively, and $M = (M_{jk})$ is the payoff matrix of player 1. If $(\mathbf{x}^*, v^*)$ and $(\mathbf{y}^*, w^*)$ are optimal solutions of the dual linear programs (7) and (8), respectively, then $(\mathbf{x}^*, \mathbf{y}^*)$ is a Nash equilibrium of $G$, where $\mathbf{x} = (x_1, \ldots, x_m)^{\top}$ is a mixed strategy of player 1 and $\mathbf{y} = (y_1, \ldots, y_n)^{\top}$ is a mixed strategy of player 2:

$$ \max_{\mathbf{x},\, v}\ v \quad \text{s.t.} \quad \sum_{j=1}^{m} x_j M_{jk} \ge v\ (k = 1, \ldots, n), \quad \sum_{j=1}^{m} x_j = 1, \quad x_j \ge 0, \tag{7} $$

$$ \min_{\mathbf{y},\, w}\ w \quad \text{s.t.} \quad \sum_{k=1}^{n} M_{jk} y_k \le w\ (j = 1, \ldots, m), \quad \sum_{k=1}^{n} y_k = 1, \quad y_k \ge 0. \tag{8} $$
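As an illustration of Lemma 1, the following is a minimal Python sketch that solves both linear programs with `scipy.optimize.linprog`; it is a generic zero-sum matrix-game solver written for this exposition, not the authors' implementation:

```python
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum_game(M):
    """Solve a two-player zero-sum matrix game by linear programming.

    M[j, k] is the payoff to the row player (maximizer) when the row player
    uses pure strategy j and the column player uses pure strategy k.
    Returns (x, y, v): both mixed strategies and the value of the game."""
    m, n = M.shape
    # Row player (program (7)): maximize v s.t. M^T x >= v*1, sum(x) = 1, x >= 0.
    # Decision vector z = [x_1, ..., x_m, v]; linprog minimizes, so minimize -v.
    c = np.r_[np.zeros(m), -1.0]
    res_row = linprog(c,
                      A_ub=np.hstack([-M.T, np.ones((n, 1))]), b_ub=np.zeros(n),
                      A_eq=np.r_[np.ones(m), 0.0].reshape(1, -1), b_eq=[1.0],
                      bounds=[(0, None)] * m + [(None, None)])
    x, v = res_row.x[:m], res_row.x[m]
    # Column player (program (8)): minimize w s.t. M y <= w*1, sum(y) = 1, y >= 0.
    c = np.r_[np.zeros(n), 1.0]
    res_col = linprog(c,
                      A_ub=np.hstack([M, -np.ones((m, 1))]), b_ub=np.zeros(m),
                      A_eq=np.r_[np.ones(n), 0.0].reshape(1, -1), b_eq=[1.0],
                      bounds=[(0, None)] * n + [(None, None)])
    return x, res_col.x[:n], v
```

Applied to an 8 × 8 payoff matrix such as the one in Table 2, the returned `x` and `y` would play the roles of the equilibrium strategies reported in Section 4.1.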
3.2. Continuous Maneuver Library
In this section, the maneuver library is designed. We know that the Nash equilibrium of a game is selected from the mixed strategy sets, so the mixed strategy set is actually the maneuver library (the set of all possible maneuvers). In other words, the mixed strategy set $\Sigma_C$ ($\Sigma_H$) corresponds to the maneuver library of $C$ ($H$). In order to design the maneuver libraries $\Sigma_C$ and $\Sigma_H$, it is only necessary to design the pure strategy sets $A_C$ and $A_H$ (called the maneuver sets).
Before designing the maneuver sets $A_C$ and $A_H$, we first investigate all the possible maneuvers of $C$ and $H$. For a maneuver of $C$ or $H$, its control variables are decomposed into the following three parts: the horizontal control variable, the vertical control variable, and the acceleration control variable, denoted as $u_h$, $u_v$, and $u_a$, respectively. So a maneuver $a_C$ (or $a_H$) can be represented by the three-dimensional vector $a = [u_h, u_v, u_a]$. For the three variables we give the following design:
- The horizontal control variable $u_h$ belongs to the interval $[-1, 1]$, where $u_h = -1$ represents max load factor turn left, $u_h = 1$ represents max load factor turn right, $u_h = 0$ represents no horizontal turning maneuver, $u_h \in (-1, 0)$ represents $|u_h|$ times max load factor turn left, and $u_h \in (0, 1)$ represents $u_h$ times max load factor turn right;
- The vertical control variable $u_v$ belongs to $[-1, 1]$, where $u_v = -1$ represents max load factor push over, $u_v = 1$ represents max load factor pull up, $u_v = 0$ represents no vertical turning maneuver, $u_v \in (-1, 0)$ represents $|u_v|$ times max load factor push over, and $u_v \in (0, 1)$ represents $u_v$ times max load factor pull up;
- The acceleration control variable $u_a$ belongs to $[-1, 1]$, where $u_a = -1$ represents maximum thrust deceleration, $u_a = 1$ represents maximum thrust acceleration, $u_a = 0$ represents that the thrust is zero, $u_a \in (-1, 0)$ represents $|u_a|$ times maximum thrust deceleration, and $u_a \in (0, 1)$ represents $u_a$ times maximum thrust acceleration.
As a result, all possible maneuvers form the cube $[-1, 1]^3$ in three-dimensional space, as shown in Figure 4.
Figure 4.
All possible maneuvers of $C$ and $H$.
Now, we construct $A_C$ as the set of all the vertices of the cube; in other words,

$$ A_C = \{a_1, a_2, \ldots, a_8\} = \{-1, 1\}^3. $$

The vertices $a_1, \ldots, a_8$ are marked in Figure 4, where each $a_j$ is a vector of the form $[\pm 1, \pm 1, \pm 1]$. Since $A_H = A_C$, the set $A_H$ is given by the same eight vertices.
Below we explain the advantages of defining the pure strategy sets in this way. As we discussed above, the Nash equilibrium of a game is often in the form of a mixed strategy. If the pure strategy set is defined as the vertices of the cube, i.e., points in three-dimensional space, then a mixed strategy is a convex combination of these vertices, which is a point inside the cube. Note that each point in the cube corresponds to a maneuver that can be executed, so each mixed strategy is an executable maneuver. More precisely, a mixed strategy $\sigma_C$ of $C$ can be represented by the following point inside the cube:

$$ \bar{a}(\sigma_C) = \sum_{j=1}^{8} \sigma_C(a_j)\, a_j, $$

where $\sigma_C(a_j)$ is the probability of executing the pure strategy $a_j$ in the mixed strategy $\sigma_C$. Furthermore, the set of all such points constitutes the whole cube; that is, the mixed strategy set $\Sigma_C$ can be represented by the cube $[-1, 1]^3$. In contrast with the existing discrete maneuver libraries, the maneuver library we design is therefore called the continuous maneuver library.
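To make this correspondence concrete, a minimal Python sketch (with an assumed lexicographic ordering of the cube vertices) of how a mixed strategy over the eight vertices maps to a single executable maneuver inside the cube:

```python
import numpy as np
from itertools import product

# The eight pure maneuvers: vertices of the cube [-1, 1]^3, each vertex being
# (horizontal, vertical, acceleration) control values.
VERTICES = np.array(list(product([-1.0, 1.0], repeat=3)))

def mixed_to_maneuver(probs):
    """Map a mixed strategy (probability vector over the 8 vertices) to the
    executable maneuver it represents: the convex combination of the vertices."""
    p = np.asarray(probs, dtype=float)
    assert p.shape == (8,) and np.all(p >= 0) and abs(p.sum() - 1.0) < 1e-9
    return p @ VERTICES

# Example: mixing the vertices (-1,-1,-1) and (1,1,1) with equal probability
# yields the interior point [0, 0, 0]: no turn in either plane and zero thrust.
p = np.zeros(8); p[0] = 0.5; p[7] = 0.5
print(mixed_to_maneuver(p))
```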
Remark 1.
Although the concept of a mixed strategy is widely used, it has been controversial since it was proposed, because it prescribes specific strategies only in a probabilistic manner. Significantly, constructing the pure strategy set in this way not only overcomes the difficulty that a mixed strategy is not easy to execute, but also expands the maneuver library to include all possible maneuvers.
3.3. Prediction of the Unobservable State of Human
As designed above, our maneuver decision-making method is an observation-decision-execution cycle. However, due to the uncertainty and various disturbances in the air combat process, not all states of $H$ can be observed throughout the decision-making stages. When the state of $H$ is unobservable, it is difficult to make the subsequent decisions. To this end, we give a method to predict the state of $H$ based on the Nash equilibrium calculated in the previous decision-making stage.
The prediction method we propose is based on the following idea: as a rational decision-maker, $H$ is considered to take an optimal action (the Nash equilibrium strategy) at the previous decision point, so the state after taking this action can be viewed as its predicted state at the current time. Essentially, the proposed state prediction method is an intention prediction method.
In fact, regardless of whether the state of $H$ is observable at time $t_{i+1}$, we have established the game model $G_i$ of $C$ and $H$ at situation $s_i$; furthermore, the Nash equilibrium of $G_i$ is calculated as $(\sigma_{C,i}^*, \sigma_{H,i}^*)$, where $\sigma_{C,i}^*$ is the maneuver that $C$ needs to perform at the $i$-th decision-making stage. At time $t_{i+1}$, if the state of $H$ is observable, the observed state of $H$ and the state of $C$ constitute the new air combat situation $s_{i+1}$, and the decision-making of the next stage continues; if the state of $H$ is unobservable, the Nash equilibrium strategy $\sigma_{H,i}^*$ of the previous step is used to predict the state of $H$. This is because, as a rational decision-maker, the optimal maneuver of $H$ is its Nash equilibrium strategy. As a result, $\sigma_{H,i}^*$ is taken as the predicted action of the previous step, and the state of $H$ under $\sigma_{H,i}^*$ is taken as the predicted state of $H$. The above method of predicting the state of $H$ is shown in Figure 5 and is called the Nash equilibrium based state prediction (NESP) algorithm; the detailed steps to predict the state of $H$ at time $t_{i+1}$ by the NESP algorithm are given as follows:
Figure 5.
Prediction of the state of $H$.
Step 1. Establish the game decision-making model $G_i$ under the situation $s_i$;
Step 2. Calculate the Nash equilibrium $(\sigma_{C,i}^*, \sigma_{H,i}^*)$ of $G_i$;
Step 3. The maneuver strategy $\sigma_{H,i}^*$ is considered to be the strategy adopted by $H$ at the $i$-th decision-making stage;
Step 4. Calculate the state of $H$ after taking maneuver strategy $\sigma_{H,i}^*$ according to the dynamical model [36], that is, the position and speed of $H$ at time $t_{i+1}$.
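A minimal sketch of one NESP step in Python, reusing the `payoff_matrix` and `solve_zero_sum_game` helpers sketched above; `propagate_situation`, `propagate_aircraft`, and the `state_H` attribute of the situation object are hypothetical stand-ins for the dynamical model of Ref. [36]:

```python
import numpy as np

def nesp_predict(s_i, maneuvers, propagate_situation, propagate_aircraft, phi_C, dt=4.0):
    """Predict H's state at the next decision point from the stage-i Nash equilibrium."""
    M = payoff_matrix(s_i, maneuvers, maneuvers, propagate_situation, phi_C, dt)
    _, y_star, _ = solve_zero_sum_game(M)        # H's equilibrium mixed strategy
    a_H = y_star @ np.asarray(maneuvers)         # executable maneuver (Section 3.2)
    return propagate_aircraft(s_i.state_H, a_H, dt)   # predicted position and speed of H
```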
Remark 2.
The above state prediction method is effective when the state of the adversary is unobservable for a short time (a small number of consecutive steps). However, when many consecutive states of the adversary are unobservable, the reliability of the predicted results cannot be guaranteed. This is because, in this case, it is necessary to keep predicting the state of the next step based on the predicted state of the previous step; as a consequence, the deviations in the predicted results accumulate as the number of predicted steps increases. Fortunately, the unobservability encountered in real air combat is mostly of the short-duration kind.
To sum up, an automatic maneuver decision-making method that considers the partially observable states of $H$ and uses a continuous maneuver library is designed, which is summarized as the automatic maneuver decision-making algorithm (AMDM) in Algorithm 1.
| Algorithm 1: Automatic maneuver decision-making algorithm (AMDM) |
Input: Initial air combat situation of $H$ and $C$: $s_0$; |
Output: The maneuver decision sequence of $C$: $\sigma_{C,0}^*$, $\sigma_{C,1}^*$, ⋯ |
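Since the body of Algorithm 1 appears only as an image in the original, the following is a hedged Python-style sketch of the AMDM loop, reconstructed from the flow described for Figure 3; every helper callable (`payoff_matrix`, `solve_zero_sum_game`, `propagate_situation`, `propagate_aircraft`, `observe_H`, `make_situation`, `advantage_gap`) is an illustrative stand-in, not the authors' code:

```python
import numpy as np

def amdm(s0, maneuvers, propagate_situation, propagate_aircraft, observe_H,
         make_situation, phi_C, advantage_gap, dt=4.0, threshold=0.5, max_stages=50):
    """Sketch of the automatic maneuver decision-making (AMDM) loop."""
    s_i, decisions = s0, []
    for i in range(max_stages):
        # Stage-i game: payoff matrix from situation assessment, then Nash equilibrium.
        M = payoff_matrix(s_i, maneuvers, maneuvers, propagate_situation, phi_C, dt)
        x_star, y_star, _ = solve_zero_sum_game(M)
        a_C = x_star @ np.asarray(maneuvers)      # C's executable maneuver for this stage
        decisions.append(a_C)
        state_C = propagate_aircraft(s_i.state_C, a_C, dt)
        obs_H = observe_H(i)                      # returns None when H's state is unobservable
        if obs_H is None:                         # NESP prediction (Section 3.3)
            a_H = y_star @ np.asarray(maneuvers)
            state_H = propagate_aircraft(s_i.state_H, a_H, dt)
        else:
            state_H = obs_H
        s_i = make_situation(state_C, state_H)    # updated air combat situation
        if advantage_gap(s_i) > threshold:        # one side reaches the attack position
            break
    return decisions
```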
4. Simulations
In this section, numerical simulations and analyses of the proposed method are presented. First, an example of one decision-making stage in air combat is given to illustrate the computational flow of the proposed method; then, some comparative experiments are carried out, and the simulation analyses are given to illustrate the effectiveness of our method.
4.1. A Numerical Example of Maneuver Decision-Making in One Decision-Making Phase
Consider the maneuver decision scenario of $H$ and $C$ with the initial parameters given in Table 1. As discussed in Section 3.2, the maneuver set $A_C$ of $C$ (and likewise the maneuver set $A_H$ of $H$) is composed of the following eight maneuvers: $a_1, a_2, \ldots, a_8$, where each $a_j$ is a three-dimensional vector representing a specific maneuver of $C$ or $H$. Under the initial states, the trajectories of $C$ when it chooses these eight maneuvers are shown in Figure 6, and the trajectories of $H$ are similar. When $C$ chooses a maneuver from $A_C$ and $H$ chooses a maneuver from $A_H$, both $C$ and $H$ move from the current state to the next state. Since each maneuver set contains 8 maneuvers, there are a total of 64 outcomes when $C$ and $H$ choose different maneuvers. According to the situation assessment method in Refs. [3,30,31,32], the situational advantage of $C$ can be calculated for each combination of maneuvers taken by $C$ and $H$. As discussed above, the payoff of $C$ is defined as the added value of its situation assessment after both sides perform their respective maneuvers. As a result, the payoff matrix of $C$ can be obtained through the situation assessment, which is presented in Table 2. In fact, the above maneuver decision-making process can be described by the game $G_1 = \{s_1; A_C, A_H; u_C, u_H\}$, where $s_1$ is the situation of the first decision-making stage, the payoff function $u_C$ is represented by the payoff matrix in Table 2, and the payoff function $u_H = -u_C$.
Table 1.
Initial parameters of $H$ and $C$ in a numerical example.
Figure 6.
Trajectories of $C$ when performing the eight basic maneuvers.
Table 2.
Payoff matrix of $C$.
According to Lemma 1, a Nash equilibrium of $G_1$ can be obtained by solving the dual linear programs (7) and (8), with the payoff matrix $M$ taken from Table 2.
By calculation, a Nash equilibrium $(\sigma_C^*, \sigma_H^*)$ of $G_1$ is obtained. It shows that $\sigma_C^*$ is a pure strategy, so $C$ should perform a single maneuver from $A_C$, whereas $H$ should mix four of its pure maneuvers with probabilities 0.156, 0.001, 0.444, and 0.6802, respectively. According to the discussion in Section 3.2, the Nash equilibrium strategies $\sigma_C^*$ and $\sigma_H^*$ can each be represented by a three-dimensional point in the maneuver cube. In consequence, $\sigma_C^*$ is represented by $[-1, 1, 1]$, which means max load factor turn left, max load factor pull up, and maximum thrust acceleration. Similarly, $\sigma_H^*$ is represented by a point inside the cube that means 0.998 times max load factor turn right, 0.688 times max load factor pull up, and 0.11 times maximum thrust deceleration.
In fact, in the current decision-making stage, $C$ only pays attention to its own Nash equilibrium strategy $\sigma_C^*$, while the calculated Nash equilibrium strategy $\sigma_H^*$ is used to predict the state of $H$ when the next state of $H$ is unobservable. It can be seen from the above calculation that the Nash equilibrium strategy $\sigma_H^*$ is a mixed strategy. In previous methods, such a mixed strategy is difficult to interpret and perform, whereas in our method, $\sigma_H^*$ corresponds to an executable maneuver. This guarantees that the proposed method can be used in real maneuver decision-making scenarios.
4.2. Some Comparative Experiments of Maneuver Decision-Making Process
In this part, some adversarial experiments between $H$ and $C$ are implemented to demonstrate the effectiveness of the proposed algorithm. The initial parameters of $H$ and $C$ are given in Table 3. These settings ensure that $H$ and $C$ have the same situational advantage at the beginning. The simulation step size of these experiments is 4 s.
Table 3.
Initial parameters of $H$ and $C$ in the comparative experiments.
We assume that $H$ and $C$ have the same initial situations and follow the same dynamic model, and that the only difference between them is the algorithm used to maneuver. In these experiments, $C$ adopts the proposed AMDM algorithm, while $H$ adopts one of the existing maneuver strategies or is operated manually by an experienced person. In order to characterize the partially observable state, one state of $H$ and one state of $C$ are randomly set to be unobservable, and the data of these states are artificially deleted. In the unobservable case, our algorithm continues to make maneuver decisions based on the predicted state of $H$, while the adversary's algorithm has no ability to deal with incomplete information. In this case, $H$ considers that $C$ took a random maneuver in the previous decision-making stage, so $H$ makes decisions based on the state of $C$ after that random maneuver.
The maneuver trajectories of the AMDM algorithm compared with other algorithms are shown in Figure 7, where $C$ adopts the AMDM algorithm and $H$ respectively adopts the following six maneuver methods: straight maneuver (SM), random maneuver (RM), elemental maneuver (EM) [24], experienced person 1 (EP1), experienced person 2 (EP2), and experienced person 3 (EP3). In each subgraph of Figure 7, the initial positions of $H$ and $C$ are marked by the symbol ★, and the unobservable states of $H$ and $C$ are covered with clouds. In each experiment, $C$ or $H$ is considered to have reached the ideal attack position when their situational advantage difference exceeds 0.5.
Figure 7.
Maneuver trajectories of the AMDM algorithm compared with the other algorithms. (a) Straight maneuver, (b) random maneuver, (c) elemental maneuver, (d) experienced person 1, (e) experienced person 2, (f) experienced person 3.
In the first experiment, $H$ adopts a fixed straight-flight strategy, and it can be seen from Figure 7a that $C$ can easily achieve the attack position through a maneuver combination of climbing and diving. In the second experiment, $H$ takes random maneuvers. Figure 7b shows that the maneuver trajectory of $H$ under the random strategy has no regularity, so $C$ can also readily reach the attack position. In the third experiment, $H$ selects an elemental maneuver from the basic maneuver library in each decision-making stage. In Figure 7c, the state information of $C$ in the third stage is unobservable, which makes $H$ choose an inappropriate maneuver. As a result, $H$ gradually loses its situational advantage and finally chooses to escape. In the last three experiments, since the experienced persons have no ability to predict unobservable states and their decision-making is highly subjective, $H$ eventually loses the advantage in angle and height, as shown in Figure 7d, Figure 7e, and Figure 7f, respectively. The situation advantage difference between $C$ and $H$ at each stage of these experiments is given in Figure 8, which shows that $C$ can always outperform $H$ and reach a favorable attacking position after six to seven decision-making stages. Thus, the effectiveness of the proposed method is verified.
Figure 8.
The difference between the situation advantages of $C$ and $H$.
Furthermore, a back-propagation neural network (BPNN) method is used as a comparison to predict the unobservable states of $H$ in the above experiments. In each confrontation experiment, the historical state data of $H$ are sampled every 1 s, and 8 consecutive data points are taken as a sample, where the first 7 are the inputs of the network and the last one is the target of the network. These samples are randomly divided into a training set, a validation set, and a test set with proportions of 70%, 15%, and 15%, respectively. The number of hidden neurons is set to 10, and the Levenberg–Marquardt algorithm is used to train the network. With this BPNN prediction method, the unobservable state of $H$ is predicted.
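A minimal sketch of such a one-step-ahead predictor under the stated setup (7-step history in, 1 step out, 10 hidden neurons). Note that scikit-learn's `MLPRegressor` does not offer Levenberg–Marquardt training, so the L-BFGS solver is used here as a stand-in, and the synthetic state series is a placeholder for the sampled data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

def make_windows(series, n_in=7):
    """Slice a 1-D state series into (7-step input, 1-step target) samples."""
    X = np.array([series[i:i + n_in] for i in range(len(series) - n_in)])
    y = np.array([series[i + n_in] for i in range(len(series) - n_in)])
    return X, y

series = np.cumsum(np.random.randn(400))      # placeholder for one sampled state channel
X, y = make_windows(series)
# 70% / 15% / 15% split into training, validation, and test sets.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.50, random_state=0)

net = MLPRegressor(hidden_layer_sizes=(10,), solver="lbfgs", max_iter=2000,
                   random_state=0).fit(X_train, y_train)
print("validation R^2:", net.score(X_val, y_val))
print("test R^2:", net.score(X_test, y_test))
```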
Then, based on the state predicted by the BPNN algorithm, the maneuver decision-making game model of the corresponding decision-making stage is established, and the corresponding Nash equilibrium strategy is obtained. If the state of $H$ is unobservable at time $t_i$, Table 4 shows the change of the situation advantage of $C$ over $H$ from $t_i$ to $t_{i+1}$ under the two state prediction methods. According to Table 4, compared with the BPNN algorithm, the proposed NESP state prediction method generally enables $C$ to obtain a higher situation advantage, which shows the effectiveness of the proposed prediction method. This is because the proposed prediction method considers the intentions of $H$, while the BPNN only makes predictions based on historical state data.
Table 4.
Situation advantage change of $C$ over $H$ under different state prediction methods.
5. Conclusions
The autonomous maneuver decision-making problem of a UCAV under the framework of human-computer gaming has been studied in this paper. The maneuver decision-making process has been decomposed into a sequential decision problem, where each decision-making stage is modeled as a game of $H$ and $C$. A continuous maneuver library has been designed, which not only expands the maneuver library from a finite set of basic maneuvers to a continuum, but also solves the executability problem of mixed strategies. Moreover, the unobservable states in the maneuver decision-making process have been considered, and a state prediction method based on the Nash equilibrium has been proposed. Future work will aim to extend the method to the task assignment problem of multiple UCAVs.
Author Contributions
Conceptualization, S.L. and Q.W.; methodology, S.L.; software, S.L. and B.D.; validation, Y.W. and M.C.; investigation, M.C.; writing—original draft preparation, S.L.; writing—review and editing, Q.W. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by Major Projects for Science and Technology Innovation 2030 (Grant No. 2018AAA0100805).
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Dalkıran, E.; Önel, T.; Topçu, O.; Demir, K.A. Automated integration of real-time and non-real-time defense systems. Def. Technol. 2021, 17, 657–670. [Google Scholar] [CrossRef]
- Yang, Z.; Sun, Z.X.; Piao, H.Y.; Huang, J.C.; Zhou, D.Y.; Ren, Z. Online hierarchical recognition method for target tactical intention in beyond-visual-range air combat. Def. Technol. 2022, 18, 1349–1361. [Google Scholar] [CrossRef]
- Li, S.; Wu, Q.; Chen, M.; Wang, Y. Air Combat Situation Assessment of Multiple UCAVs with Incomplete Information. In CISC 2020: Proceedings of 2020 Chinese Intelligent Systems Conference; Springer: Singapore, 2020; pp. 18–26. [Google Scholar]
- Guo, J.; Wang, L.; Wang, X. A Group Maintenance Method of Drone Swarm Considering System Mission Reliability. Drones 2022, 6, 269. [Google Scholar] [CrossRef]
- Shin, H.; Lee, J.; Kim, H.; Shim, D.H. An autonomous aerial combat framework for two-on-two engagements based on basic fighter maneuvers. Aerosp. Sci. Technol. 2018, 72, 305–315. [Google Scholar] [CrossRef]
- Li, J.; Chen, R.; Peng, T. A Distributed Task Rescheduling Method for UAV Swarms Using Local Task Reordering and Deadlock-Free Task Exchange. Drones 2022, 6, 322. [Google Scholar] [CrossRef]
- Zhou, X.; Qin, T.; Meng, L. Maneuvering Spacecraft Orbit Determination Using Polynomial Representation. Aerospace 2022, 9, 257. [Google Scholar] [CrossRef]
- Li, W.; Lyu, Y.; Dai, S.; Chen, H.; Shi, J.; Li, Y. A Multi-Target Consensus-Based Auction Algorithm for Distributed Target Assignment in Cooperative Beyond-Visual-Range Air Combat. Aerospace 2022, 9, 486. [Google Scholar] [CrossRef]
- Abdar, M.; Pourpanah, F.; Hussain, S.; Rezazadegan, D.; Liu, L.; Ghavamzadeh, M.; Fieguth, P.; Cao, X.; Khosravi, A.; Acharya, U.R.; et al. A review of uncertainty quantification in deep learning: Techniques, applications and challenges. Inf. Fusion 2021, 76, 243–297. [Google Scholar] [CrossRef]
- Zhang, P.; Wang, C.; Jiang, C.; Han, Z. Deep reinforcement learning assisted federated learning algorithm for data management of IIoT. IEEE Trans. Ind. Inform. 2021, 17, 8475–8484. [Google Scholar] [CrossRef]
- Du, B.; Mao, R.; Kong, N.; Sun, D. Distributed Data Fusion for On-Scene Signal Sensing With a Multi-UAV System. IEEE Trans. Control Netw. Syst. 2020, 7, 1330–1341. [Google Scholar] [CrossRef]
- Han, J.; Wu, J.; Zhang, L.; Wang, H.; Zhu, Q.; Zhang, C.; Zhao, H.; Zhang, S. A Classifying-Inversion Method of Offshore Atmospheric Duct Parameters Using AIS Data Based on Artificial Intelligence. Remote Sens. 2022, 14, 3197. [Google Scholar] [CrossRef]
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef] [PubMed]
- Brown, N.; Sandholm, T. Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science 2018, 359, 418–424. [Google Scholar] [CrossRef] [PubMed]
- Li, W.; Shi, J.; Wu, Y.; Wang, Y.; Lyu, Y. A Multi-UCAV cooperative occupation method based on weapon engagement zones for beyond-visual-range air combat. Def. Technol. 2022, 18, 1006–1022. [Google Scholar] [CrossRef]
- Kang, Y.; Pu, Z.; Liu, Z.; Li, G.; Niu, R.; Yi, J. Air-to-Air Combat Tactical Decision Method Based on SIRMs Fuzzy Logic and Improved Genetic Algorithm. In Advances in Guidance, Navigation and Control; Springer: Singapore, 2022; pp. 3699–3709. [Google Scholar]
- Li, B.; Liang, S.; Chen, D.; Li, X. A Decision-Making Method for Air Combat Maneuver Based on Hybrid Deep Learning Network. Chin. J. Electron. 2022, 31, 107–115. [Google Scholar]
- Li, S.; Chen, M.; Wang, Y.; Wu, Q. Air combat decision-making of multiple UCAVs based on constraint strategy games. Def. Technol. 2022, 18, 368–383. [Google Scholar] [CrossRef]
- Zhang, T.; Li, C.; Ma, D.; Wang, X.; Li, C. An optimal task management and control scheme for military operations with dynamic game strategy. Aerosp. Sci. Technol. 2021, 115, 106815. [Google Scholar] [CrossRef]
- Li, S.; Chen, M.; Wang, Y.; Wu, Q. A fast algorithm to solve large-scale matrix games based on dimensionality reduction and its application in multiple unmanned combat air vehicles attack-defense decision-making. Inf. Sci. 2022, 594, 305–321. [Google Scholar] [CrossRef]
- Ruan, W.; Duan, H.; Deng, Y. Autonomous Maneuver Decisions via Transfer Learning Pigeon-inspired Optimization for UCAVs in Dogfight Engagements. IEEE/CAA J. Autom. Sin. 2022, 9, 1–19. [Google Scholar] [CrossRef]
- Hu, J.; Wang, L.; Hu, T.; Guo, C.; Wang, Y. Autonomous Maneuver Decision Making of Dual-UAV Cooperative Air Combat Based on Deep Reinforcement Learning. Electronics 2022, 11, 467. [Google Scholar] [CrossRef]
- Du, B.; Chen, J.; Sun, D.; Manyam, S.G.; Casbeer, D.W. UAV Trajectory Planning with Probabilistic Geo-Fence via Iterative Chance-Constrained Optimization. IEEE Trans. Intell. Transp. Syst. 2021, 23, 5859–5870. [Google Scholar] [CrossRef]
- Austin, F.; Carbone, G.; Hinz, H.; Lewis, M.; Falco, M. Game theory for automated maneuvering during air-to-air combat. J. Guid. Control Dyn. 1990, 13, 1143–1149. [Google Scholar] [CrossRef]
- Yang, Q.; Zhang, J.; Shi, G.; Hu, J.; Wu, Y. Maneuver decision of UAV in short-range air combat based on deep reinforcement learning. IEEE Access 2019, 8, 363–378. [Google Scholar] [CrossRef]
- Zhang, H.; Huang, C.; Xuan, Y.; Tang, S. Maneuver Decision of Autonomous Air Combat of Unmanned Combat Aerial Vehicle Based on Deep Neural Network. Acta Armamentarii 2020, 41, 1613. [Google Scholar]
- Li, Y.; Shi, J.; Jiang, W.; Zhang, W.; Lyu, Y. Autonomous maneuver decision-making for a UCAV in short-range aerial combat based on an MS-DDQN algorithm. Def. Technol. 2022, 18, 1697–1714. [Google Scholar] [CrossRef]
- Hu, D.; Yang, R.; Zhang, Y.; Yue, L.; Yan, M.; Zuo, J.; Zhao, X. Aerial combat maneuvering policy learning based on confrontation demonstrations and dynamic quality replay. Eng. Appl. Artif. Intell. 2022, 111, 104767. [Google Scholar] [CrossRef]
- Du, B.; Sun, D.; Hwang, I. Distributed State Estimation for Stochastic Linear Hybrid Systems with Finite-Time Fusion. IEEE Trans. Aerosp. Electron. Syst. 2021, 57, 3084–3095. [Google Scholar] [CrossRef]
- Dong, Y.; Feng, J.; Zhang, H. Cooperative tactical decision methods for multi-aircraft air combat simulation. J. Syst. Simul. 2002, 14, 723–725. [Google Scholar]
- Jiang, C.; Ding, Q.; Wang, J.; Wang, J. Research on threat assessment and target distribution for multi-aircraft cooperative air combat. Fire Control Command Control 2008, 33, 8–12+21. [Google Scholar]
- Shen, Z.; Xie, W.; Zhao, X.; Yu, C. Modeling of UAV Battlefield Threats Based on Artificial Potential Field. Comput. Simul. 2014, 31, 60–64. [Google Scholar]
- Cruz, J.; Simaan, M.A.; Gacic, A.; Liu, Y. Moving horizon Nash strategies for a military air operation. IEEE Trans. Aerosp. Electron. Syst. 2002, 38, 989–999. [Google Scholar] [CrossRef]
- Maschler, M.; Solan, E.; Zamir, S. Game Theory; Cambridge University Press: Cambridge, UK, 2013. [Google Scholar]
- Parthasarathy, T.; Raghavan, T.E.S. Some Topics in Two-Person Games; American Elsevier Publishing Company: New York, NY, USA, 1971. [Google Scholar]
- Huang, C.; Dong, K.; Huang, H.; Tang, S.; Zhang, Z. Autonomous air combat maneuver decision using Bayesian inference and moving horizon optimization. J. Syst. Eng. Electron. 2018, 29, 86–97. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
