Article

Autonomous Maneuver Decision-Making for Unmanned Combat Aerial Vehicle Based on Modified Marine Predator Algorithm and Fuzzy Inference

1 Graduate School, Air Force Engineering University, Xi’an 710038, China
2 Aviation Engineering School, Air Force Engineering University, Xi’an 710038, China
* Author to whom correspondence should be addressed.
Drones 2025, 9(4), 252; https://doi.org/10.3390/drones9040252
Submission received: 13 February 2025 / Revised: 24 March 2025 / Accepted: 25 March 2025 / Published: 26 March 2025

Abstract
In recent years, autonomous maneuver decision-making has emerged as a key technology in autonomous air combat confrontation, garnering widespread attention. A method combining the modified marine predator algorithm (MMPA) and fuzzy inference is proposed to solve the autonomous maneuver decision-making problem of an unmanned combat aerial vehicle (UCAV). By incorporating the missile attack strategy into the process of calculating the maneuver strategy, the air combat decision-making capability of the UCAV is enhanced. First, the weight coefficients determined by the fuzzy inference method are combined with air combat superiority functions that consider the current missile attack zone and then the objective function is obtained, which is to be optimized at the current moment. Second, the MMPA is used to solve the objective function to obtain the missile attack maneuver strategy and the maneuver strategy for defending against missile attacks. A comparative analysis with other classical intelligent optimization algorithms highlights the advantages of the proposed method. Furthermore, the air combat confrontation simulation experiments are conducted under six different initial scenarios, namely, neutral, offensive, oppositional, defensive, parallel, and head-on. The simulation results show that the integrated maneuver and missile attack decision-making capabilities of the UCAV are improved using the proposed autonomous maneuver decision-making method.

1. Introduction

The autonomous decision-making capability of an aircraft plays a pivotal role in ensuring the success of an aerial confrontation [1,2]. In modern warfare, achieving air superiority is critical because it can greatly influence the outcome of combat. In high-intensity, high-threat air combat, pilots need to quickly assess the confrontation situation, make operational decisions, and accurately control the aircraft to complete flight maneuvers and launch missiles. Therefore, there is a growing need for autonomous air combat confrontation systems that can make effective decisions in complex and changing environments. Autonomous air combat decision-making is a critical and challenging task: it requires maximizing the probability of shooting down a target aircraft while minimizing the risk of being shot down. Given the excellent performance of unmanned aerial vehicles (UAVs) in reconnaissance and air-to-ground strikes, and the attention many countries pay to the development of UCAV autonomous air combat, UCAVs with autonomous decision-making capabilities will play an important role in future air combat, and autonomous air combat is a key issue in its future development. An autonomous air combat confrontation system is a complex system that mainly includes situational awareness and assessment, maneuver decision-making, flight control, missile attack, and damage assessment. Within such a system, the key task is to make reasonable maneuver decisions according to the real-time confrontation situation and to accurately control the aircraft to implement offensive or defensive maneuvers, so as to satisfy the missile launch conditions of the attacking aircraft or destroy those of the target aircraft, and ultimately to hit the target aircraft or evade its missile attack. How to integrate missile attack decision-making with UCAV autonomous maneuver decision-making is a problem that remains to be solved.
The main approaches to solving autonomous maneuver decision-making for UCAVs include mathematical solutions, intelligent optimization, and machine learning methods. Mathematical solutions apply differential game models and optimal control theory to solve offensive [3,4] and defensive maneuver [5,6] problems; they have the advantage of being intuitive and easy to understand in terms of modeling and solving. However, their drawback is that they are usually applicable only under certain idealized assumptions [7]. Intelligent optimization methods require formulating objective functions for the autonomous maneuver decision-making problem and using swarm intelligence algorithms to solve them, such as the genetic algorithm (GA) [8], particle swarm optimization (PSO) [9], and the pigeon-inspired optimization (PIO) algorithm [10]. These methods offer high precision, good environmental adaptability, and good real-time performance; however, their interpretability is insufficient. Moreover, machine learning methods have been successfully applied in many fields, such as complex games [11], image segmentation [12,13], trajectory planning [14,15], target tracking [16,17], autonomous landing [18], formation control [19,20], target assignment [21], and situational assessment [22]. They are increasingly applied to autonomous maneuver decision-making in air combat confrontation, and representative methods include the improved policy gradient algorithm [23], the hierarchical reinforcement learning algorithm [24], deep reinforcement learning (DRL) algorithms [25,26], curriculum learning techniques [27,28], and the fuzzy inference method [29]. These methods do not require traversing the solution space; however, a large amount of data is required to train the decision model, and the training process is complex. As for the missile attack decision-making problem, both reinforcement learning and deep learning methods play an important role in solving it. On the one hand, DRL algorithms have been improved for training missile attack strategies, as evidenced by the effective use of the hierarchical policy gradient algorithm [30] and the heuristic reinforcement learning algorithm [31]. On the other hand, neural network techniques have been used to predict missile attack zones, for instance, using Bayesian networks for missile attack zone modeling [32]. In addition, some rule-based autonomous decision-making methods have also been used to solve the missile attack decision-making problem, such as the fuzzy tree search method [33] and the state–event–condition–action method [34].
In air combat confrontation, the relationship between the maneuver decision-making and missile attack decision-making of a UCAV is intricate. Some studies have used DRL methods for maneuver decision-making and missile attack decision-making [35]. However, the sparse and delayed reward signals make it difficult for these algorithms to effectively explore missile attack maneuver strategies and maneuver strategies for defending against missile attacks. In addition, it is difficult for reinforcement learning methods to integrate the missile attack zone model into the maneuver decision-making process, as they can hardly correlate maneuver strategies with missile attack strategies. It remains challenging to effectively combine these two aspects to enhance the algorithm’s decision-making capability.
The development, theoretical background, and application of the MPA in recent years are reviewed in reference [36]. Compared with other swarm intelligence optimization algorithms, the MPA has the following advantages: derivative-free computation, fewer parameters, ease of use, scalability, reliability, and completeness. Therefore, the MPA has been applied to a wide range of optimization problems, such as engineering applications, classification and clustering, and image processing. In order to enhance the exploitation capability of the MPA, reference [37] combined the MPA with chaotic mapping and applied it to continuous optimization problems. By improving the predator and prey encounter rates and increasing the population diversity, reference [38] proposed a hybrid MPA and applied it to engineering optimization problems. In addition, introducing the fuzzy inference method can compensate for the insufficient interpretability of intelligent optimization algorithms. On the basis of the above analysis, we propose a method based on the MMPA and fuzzy inference for the UCAV autonomous maneuver decision-making problem, which enhances the air combat decision-making capability of the UCAV by combining the process of solving the missile attack strategy with that of solving the maneuver strategy. The core concept of the MMPA- and fuzzy-inference-based approach is to use missile attack zones computed in real time to enhance the effectiveness and superiority of the intelligent optimization algorithm for autonomous air combat confrontation. In the combined MMPA and fuzzy inference approach, according to the dynamic changes in the air combat situation, the missile attack zone computation model is used to compute the allowable missile launch range online, and the computation results are combined with the air combat superiority functions to obtain superiority functions that take into account the current allowable missile launch range. In addition, the fuzzy inference method is used to dynamically adjust the weight coefficients of the air combat superiority functions so that the confrontation state reflected by the comprehensive air combat situation assessment function remains consistent with the actual air combat situation. These superiority functions are combined with the weight coefficients determined by the fuzzy inference method to obtain the objective function to be optimized at the current moment. Then, the MMPA is used to solve this objective function to obtain the optimized maneuver control variables for the next moment. Because the missile attack zone computational model is integrated into the maneuver strategy solving process, the maneuver strategy and the missile attack strategy can be organically combined. In conclusion, the main contributions of this paper are summarized as follows.
Our confrontation strategy incorporates a missile attack zone model and an integrated strategy that generates both maneuver control commands and missile launch commands.
Compared with the comprehensive air combat situation assessment function with fixed weight coefficients [39], a dynamic adjustment strategy for the weight coefficients of the air combat superiority functions is proposed.
Through simulation experiments, the effectiveness of the fuzzy-inference-based air combat superiority function tuning weight mechanism in the MMPA is verified.
Our proposed autonomous maneuver decision-making method is tested against several autonomous maneuver decision-making methods for the target aircraft in a range of initial confrontation scenarios, in order to facilitate a comprehensive analysis of its confrontation performance and adaptability.
The rest of this paper is organized as follows. Section 2 models the missile attack zone computational model and the autonomous maneuver decision-making problem model for the within-visual-range (WVR) air combat confrontation scenario. In Section 3, the autonomous maneuver decision-making method based on the MMPA is proposed. In Section 4, air combat confrontation simulation experiments with different initial scenarios are implemented, and the experimental results are used to demonstrate the effectiveness of the autonomous maneuver decision-making method proposed in this paper. The conclusions are presented in Section 5.

2. Problem Statement

For the research objects involved in autonomous maneuver decision-making problems, such as UCAVs and missiles, this section mainly introduces the UCAV model construction, controller design, air combat geometric relationship establishment, and air-to-air missile attack zone calculation, and describes the overall framework of the autonomous maneuver decision-making system.

2.1. UCAV Dynamics Model

In the study of the autonomous maneuver decision-making problem, a 6-degree-of-freedom (6-DOF) UCAV model is considered [40], which has the following control variables: throttle stick offset, elevator, aileron, and rudder deflections. The mathematical model of the 6-DOF UCAV model is described as follows:
\begin{aligned}
\dot{x}_b &= u\cos\theta\cos\psi + v(\sin\theta\sin\phi\cos\psi - \cos\phi\sin\psi) + \omega(\sin\phi\sin\psi + \cos\phi\sin\theta\cos\psi) \\
\dot{y}_b &= u\cos\theta\sin\psi + v(\sin\theta\sin\phi\sin\psi + \cos\phi\cos\psi) + \omega(-\sin\phi\cos\psi + \cos\phi\sin\theta\sin\psi) \\
\dot{h}_b &= u\sin\theta - v\sin\phi\cos\theta - \omega\cos\phi\cos\theta \\
\dot{V} &= \frac{u\dot{u} + v\dot{v} + \omega\dot{\omega}}{V} \\
\dot{\alpha} &= \frac{u\dot{\omega} - \omega\dot{u}}{u^2 + \omega^2} \\
\dot{\beta} &= \frac{\dot{v}V - v\dot{V}}{V^2\cos\beta} \\
\dot{\phi} &= p + (r\cos\phi + q\sin\phi)\tan\theta \\
\dot{\theta} &= q\cos\phi - r\sin\phi \\
\dot{\psi} &= \frac{1}{\cos\theta}(r\cos\phi + q\sin\phi) \\
\dot{p} &= \frac{1}{I_x I_z - I_{xz}^2}\left[ I_z L + I_{xz} N + (I_x - I_y + I_z) I_{xz}\, pq + (I_y I_z - I_z^2 - I_{xz}^2)\, qr \right] \\
\dot{q} &= \frac{1}{I_y}\left[ M - I_{xz}(p^2 - r^2) \right] \\
\dot{r} &= \frac{1}{I_x I_z - I_{xz}^2}\left[ I_{xz} L + I_x N + (I_x^2 - I_x I_y + I_{xz}^2)\, pq - (I_x - I_y + I_z) I_{xz}\, qr \right]
\end{aligned}
where $x_b, y_b, h_b$ denote the position coordinates of the UCAV in the inertial reference frame, and $V, \alpha, \beta$ are the ground velocity, attack angle, and sideslip angle, respectively. $\phi, \theta, \psi$ denote the roll, pitch, and yaw angles. $I_x, I_y, I_z$ denote the components of the rotational inertia, and $I_{xz}$ is the product of inertia. $L, M, N$ are the roll, pitch, and yaw moments about the body-fixed axes. Furthermore, $u, v, \omega$ are the velocity components along the longitudinal, lateral, and normal axes of the body-fixed reference frame, respectively, all in m/s; together with the attitude angles, they are used to propagate the position of the aircraft in the inertial reference frame. $p, q, r$ are the roll, pitch, and yaw angular rates in the body-fixed reference frame, all in deg/s. The ailerons, elevators, and rudder are deflected to control the numerical changes in the above variables.
In equation system (1), the first three equations give the displacement velocity of the aircraft in the inertial reference frame and are used to dynamically update its position coordinates. The fourth, fifth, and sixth equations give the rates of change of the velocity, attack angle, and sideslip angle. The seventh, eighth, and ninth equations give the rates of change of the roll, pitch, and yaw angles. The last three equations give the roll, pitch, and yaw angular accelerations in the body-fixed reference frame.
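For illustration, the following minimal Python sketch (not the paper's MATLAB/Simulink implementation; all names are illustrative) propagates the inertial position with a forward-Euler step using the first three equations of system (1):

import math

def update_position(x, y, h, u, v, w, phi, theta, psi, dt):
    # First three equations of system (1): inertial position rates from
    # body-axis velocities (u, v, w) and attitude angles (phi, theta, psi).
    x_dot = (u * math.cos(theta) * math.cos(psi)
             + v * (math.sin(theta) * math.sin(phi) * math.cos(psi) - math.cos(phi) * math.sin(psi))
             + w * (math.sin(phi) * math.sin(psi) + math.cos(phi) * math.sin(theta) * math.cos(psi)))
    y_dot = (u * math.cos(theta) * math.sin(psi)
             + v * (math.sin(theta) * math.sin(phi) * math.sin(psi) + math.cos(phi) * math.cos(psi))
             + w * (-math.sin(phi) * math.cos(psi) + math.cos(phi) * math.sin(theta) * math.sin(psi)))
    h_dot = (u * math.sin(theta)
             - v * math.sin(phi) * math.cos(theta)
             - w * math.cos(phi) * math.cos(theta))
    # Simple forward-Euler step; the full 6-DOF model integrates all twelve states.
    return x + x_dot * dt, y + y_dot * dt, h + h_dot * dt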

2.2. UCAV Controller

The actual maneuver control variables of the UCAV are $u_c = [\delta_T, \delta_e, \delta_a, \delta_r]$, where $\delta_T, \delta_e, \delta_a, \delta_r$ are the throttle stick offset and the elevator, aileron, and rudder deflections, respectively. However, the result calculated by the maneuver decision method is $u_{init} = [\mu, \alpha, \delta_T]$, where $\mu, \alpha, \delta_T$ are the roll angle, attack angle, and throttle stick offset, respectively. If $u_{init}$ is directly adopted as the 6-DOF UCAV maneuver control input, the desired trajectory obviously cannot be obtained [41]. Therefore, trajectory planning is carried out based on the above calculations to obtain the planned trajectory $s_p = [V_p, \gamma_p, \chi_p]$, where $V_p, \gamma_p, \chi_p$ are the planned flight velocity, trajectory inclination angle, and trajectory declination angle in the trajectory coordinate reference frame, respectively. That is, $u_{init}$ is adopted as the maneuver control input of a 3-degree-of-freedom (3-DOF) UCAV model, from which $s_p$ is obtained. Considering the planned trajectory $s_p$ and the current actual trajectory $s_c = [V_c, \gamma_c, \chi_c]$, where $V_c, \gamma_c, \chi_c$ represent the current actual flight velocity, trajectory inclination angle, and trajectory declination angle in the trajectory coordinate reference frame, the desired trajectory $s_d = [V_d, \gamma_d, \chi_d]$ that the maneuver controller is required to reach is obtained, where $V_d, \gamma_d, \chi_d$ denote the desired flight velocity, trajectory inclination angle, and trajectory declination angle in the trajectory coordinate reference frame.
The desired trajectory s d acts as the input of the maneuver controller, and then the maneuver inverse solution [42] is performed on it to obtain the control command u d , which contains the throttle thrust T d , roll angle μ d , attack angle α d , and sideslip angle β d . The angular rate commands and the commands to control the wing deflection are generated using the nonlinear dynamic inverse (NDI) method [43]. Finally, the tracking control of the desired trajectory is achieved. The method for solving the control instructions using the NDI is described in detail in reference [43] and will not be discussed in detail in this paper. The workflow of the maneuver controller is depicted in Figure 1.
The derivation of the maneuver inverse solution process proceeds as follows.
In the trajectory coordinate reference frame, the aircraft dynamics equations can be expressed as:
\begin{aligned}
m\dot{V} &= T\cos\alpha\cos\beta - D + Y\sin\beta - mg\sin\gamma \\
mV\dot{\gamma} &= T(\cos\mu\sin\alpha + \sin\mu\cos\alpha\sin\beta) + L\cos\mu - Y\sin\mu\cos\beta - mg\cos\gamma \\
mV\dot{\chi}\cos\gamma &= T(\sin\mu\sin\alpha - \cos\mu\cos\alpha\sin\beta) + L\sin\mu + Y\cos\mu\cos\beta
\end{aligned}
where μ is the roll angle in the trajectory coordinate reference frame. T, D, Y represent the thrust, drag, and lateral forces on the UCAV. The direction of T is the same as the longitudinal axis of the body-fixed frame, D is opposite to the flight velocity, and Y is perpendicular to the longitudinal plane of symmetry of the body-fixed frame. m denotes the mass of the UCAV, and g is the acceleration of gravity. V , γ , χ represent the flight velocity, trajectory inclination angle, and trajectory declination angle in the trajectory coordinate reference frame, respectively.
The lift and drag coefficients are used to fit the lift and drag in the process of obtaining the desired control variables from the inverse solution of the desired trajectory. However, the lift and drag coefficients are quadratic polynomial functions of the attack angle, and the maneuver inverse solution needs to use them iteratively, which makes it difficult to meet the real-time requirements of maneuver control. In order to solve the above problem, a simplified calculation method for the maneuver inverse solution is considered, and the lateral force $Y$ is assumed to be 0. When the value of the sideslip angle fluctuates around 0, the above hypothesis is usually valid. In UCAV maneuver flights, it is assumed that the sideslip angle $\beta$ and the lateral force $Y$ are small [44], namely $Y \approx 0$, $\sin\beta \approx 0$, $\cos\beta \approx 1$. Then, Equation (2) can be simplified as:
\begin{aligned}
m\dot{V} &= T\cos\alpha - D - mg\sin\gamma \\
mV\dot{\gamma} &= (L + T\sin\alpha)\cos\mu - mg\cos\gamma \\
mV\dot{\chi}\cos\gamma &= (L + T\sin\alpha)\sin\mu
\end{aligned}
Using $\dot{V}_d, \dot{\gamma}_d, \dot{\chi}_d$ to replace $\dot{V}, \dot{\gamma}, \dot{\chi}$ in Equation (3), Equation (4) can be obtained:
\begin{aligned}
\dot{V}_d &= \frac{1}{m}\left[ T\cos\alpha - D - mg\sin\gamma \right] \\
\dot{\gamma}_d &= \frac{1}{mV}\left[ (L + T\sin\alpha)\cos\mu - mg\cos\gamma \right] \\
\dot{\chi}_d &= \frac{1}{mV\cos\gamma}\left[ (L + T\sin\alpha)\sin\mu \right]
\end{aligned}
By solving Equation (4), we can obtain the control instructions consisting of the throttle thrust $T_d$, roll angle $\mu_d$, and attack angle $\alpha_d$:
\mu_d = \mathrm{atan2}\left( V\dot{\chi}_d\cos\gamma,\; V\dot{\gamma}_d + g\cos\gamma \right)
\alpha_d = \mathrm{atan2}\left( mV\dot{\chi}_d\cos\gamma - L\sin\mu_d,\; (m\dot{V}_d + D + mg\sin\gamma)\sin\mu_d \right)
T_d = \frac{m\dot{V}_d + D + mg\sin\gamma}{\cos\alpha_d}
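A minimal Python sketch of the simplified maneuver inverse solution of Equations (5)-(7) is given below. It is illustrative only (the paper's controller is implemented in MATLAB/Simulink) and assumes the lift L and drag D have already been evaluated at the current attack angle rather than iterated:

import math

def inverse_maneuver_solution(V_dot_d, gamma_dot_d, chi_dot_d, V, gamma, L, D, m, g=9.81):
    # Equations (5)-(7): desired roll angle, attack angle, and thrust from the
    # desired trajectory rates, under the simplification Y ~ 0, beta ~ 0.
    mu_d = math.atan2(V * chi_dot_d * math.cos(gamma),
                      V * gamma_dot_d + g * math.cos(gamma))
    alpha_d = math.atan2(m * V * chi_dot_d * math.cos(gamma) - L * math.sin(mu_d),
                         (m * V_dot_d + D + m * g * math.sin(gamma)) * math.sin(mu_d))
    T_d = (m * V_dot_d + D + m * g * math.sin(gamma)) / math.cos(alpha_d)
    return mu_d, alpha_d, T_d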
In the specific implementation of UCAV maneuver control, the F-16 6-DOF model is selected as the control object. Based on the aerodynamic data of the F-16 aircraft, a 6-DOF flight dynamics model is constructed. Using the MATLAB/Simulink simulation platform, the NDI method is used to solve the actual maneuver control variables of the UCAV, which are used as inputs to the 6-DOF flight dynamics model of the F-16 to drive the UCAV to perform a series of maneuvers. Through the loop iteration of the above maneuver control process, the UCAV externally exhibits the maneuver capabilities of climbing, turning, diving, and level flight.

2.3. Establishment of Air Combat Geometry

Figure 2 depicts the spatial relative geometry between the attacking aircraft and the target aircraft. UCAV represents the attacking aircraft, and enemy represents the target aircraft. R is the distance vector between the attacking aircraft and the target aircraft, which is also called the line of sight (LOS). V u c a v and V e n e m y denote the velocity vectors of the attacking and target aircraft, respectively. λ u c a v and λ e n e m y represent the spatial off-axis angles of the attacking aircraft and the target aircraft, respectively, where λ u c a v is the angle between the LOS and the velocity vector of the attacking aircraft, and λ e n e m y is the angle between the LOS and the velocity vector of the target aircraft. The calculation of λ u c a v and λ e n e m y can be expressed as Equation (8).
\begin{aligned}
\lambda_{ucav} &= \cos^{-1}\left[ \frac{\boldsymbol{V}_{ucav}\cdot\boldsymbol{R}}{\left|\boldsymbol{V}_{ucav}\right|\left|\boldsymbol{R}\right|} \right] \\
\lambda_{enemy} &= \cos^{-1}\left[ \frac{\boldsymbol{V}_{enemy}\cdot\boldsymbol{R}}{\left|\boldsymbol{V}_{enemy}\right|\left|\boldsymbol{R}\right|} \right]
\end{aligned}
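A short Python sketch of Equation (8), assuming positions and velocities are given as 3-D vectors (names are illustrative):

import numpy as np

def off_axis_angles(pos_ucav, pos_enemy, vel_ucav, vel_enemy):
    # LOS vector R from the attacking aircraft to the target aircraft.
    R = np.asarray(pos_enemy, dtype=float) - np.asarray(pos_ucav, dtype=float)
    def angle(v):
        v = np.asarray(v, dtype=float)
        c = np.dot(v, R) / (np.linalg.norm(v) * np.linalg.norm(R))
        return np.arccos(np.clip(c, -1.0, 1.0))   # clip guards against round-off
    return angle(vel_ucav), angle(vel_enemy)       # lambda_ucav, lambda_enemy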

2.4. Calculation of the Missile Attack Zone

To meet the real-time and accuracy requirements, the missile motion model adopts a 3-DOF point-mass model.
The missile kinematics equations are:
\begin{aligned}
\dot{x}_m &= V_m\cos\gamma_m\cos\psi_m \\
\dot{y}_m &= V_m\cos\gamma_m\sin\psi_m \\
\dot{z}_m &= V_m\sin\gamma_m
\end{aligned}
where $x_m, y_m, z_m$ are the position coordinates of the missile in the inertial reference frame, $V_m$ is the velocity of the missile, $\gamma_m$ is the ballistic inclination angle, and $\psi_m$ is the ballistic declination angle.
The missile dynamics equations are:
\begin{aligned}
\dot{V}_m &= \frac{P_m - Q_m}{G_m}g - g\sin\gamma_m \\
\dot{\psi}_m &= \frac{n_{my}\,g}{V_m\cos\gamma_m} \\
\dot{\gamma}_m &= \frac{n_{mz}\,g}{V_m} - \frac{g\cos\gamma_m}{V_m}
\end{aligned}
where n m y , n m z are the turn control overloads of the missile in the yaw and pitch directions, respectively. P m , Q m are the thrust and air resistance, respectively. G m is the gravity force on the missile.
P_m = \begin{cases} P_0, & t \le t_w \\ 0, & t > t_w \end{cases}
Q_m = \frac{1}{2}\rho V_m^2 S_m C_{Dm}
G_m = \begin{cases} G_0 - G_t t, & t \le t_w \\ G_0 - G_t t_w, & t > t_w \end{cases}
where $t_w$ is the operating time of the missile engine, $\rho$ is the air density, $S_m$ is the reference cross-sectional area of the missile, $C_{Dm}$ denotes the drag coefficient, $P_0$ is the mean thrust, $G_0$ is the initial gravity force on the missile, and $G_t$ is the fuel consumption rate.
The turn control overloads $n_{my}, n_{mz}$ of the missile can be expressed as:
\begin{aligned}
n_{my} &= \frac{K V_m \cos\gamma_e}{g}\left[ \dot{\beta}_r + (\tan\varepsilon_r)\,\dot{\varepsilon}_r\tan(\beta_r - \varepsilon_r) \right] \\
n_{mz} &= \frac{V_m}{g} K \cos(\beta_r - \varepsilon_r)\,\dot{\varepsilon}_r
\end{aligned}
\begin{aligned}
\beta_r &= \mathrm{atan2}\left( r_y,\; r_x \right) \\
\varepsilon_r &= \mathrm{atan2}\left( r_z,\; \sqrt{r_x^2 + r_y^2} \right)
\end{aligned}
\begin{aligned}
\dot{\beta}_r &= \frac{\dot{r}_y r_x - r_y \dot{r}_x}{r_x^2 + r_y^2} \\
\dot{\varepsilon}_r &= \frac{(r_x^2 + r_y^2)\dot{r}_z - r_z(\dot{r}_x r_x + \dot{r}_y r_y)}{R_{em}^2\sqrt{r_x^2 + r_y^2}}
\end{aligned}
where $K$ is the proportional guidance coefficient, and $\gamma_e$ is the trajectory inclination angle of the target. $\beta_r$ and $\varepsilon_r$ are the LOS declination and inclination angles between the missile and the target aircraft, respectively. $\boldsymbol{r}$ is the LOS vector, namely, the line connecting the center of mass of the missile and the target aircraft, with the direction from the center of mass of the missile to the target aircraft taken as positive. $r_x$, $r_y$, and $r_z$ are the projections of the LOS vector $\boldsymbol{r}$ on the three axes, respectively. $x_e, y_e, z_e$ are the coordinates of the target’s position in the inertial reference frame, with $r_x = x_e - x_m$, $r_y = y_e - y_m$, $r_z = z_e - z_m$, and $R_{em} = \left|\boldsymbol{r}\right| = \sqrt{r_x^2 + r_y^2 + r_z^2}$.
The kinematics and dynamics equations of the target aircraft are:
\begin{aligned}
\dot{x}_e &= V_e\cos\gamma_e\cos\psi_e \\
\dot{y}_e &= V_e\cos\gamma_e\sin\psi_e \\
\dot{z}_e &= V_e\sin\gamma_e \\
\dot{V}_e &= g(n_{ex} - \sin\gamma_e) \\
\dot{\psi}_e &= \frac{n_{ey}\,g}{V_e\cos\gamma_e} \\
\dot{\gamma}_e &= \frac{n_{ez}\,g}{V_e} - \frac{g\cos\gamma_e}{V_e}
\end{aligned}
where V e , γ e , ψ e are the target aircraft’s velocity, trajectory inclination angle, and trajectory declination angle in the trajectory coordinate reference frame, respectively. n e x , n e y , n e z are the turn control overloads of the target aircraft in the velocity, yaw, and pitch directions, respectively.
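To make the model concrete, the following Python sketch performs one Euler integration step of the 3-DOF missile model with proportional navigation, following Equations (9)-(16) as reconstructed above. It is illustrative only: the numerical parameter values are placeholders, not the open-source missile data used in the paper.

import numpy as np

def missile_step(missile, target, dt, K=4.0, P0=20000.0, tw=3.0, rho=0.65,
                 Sm=0.05, CDm=0.6, G0=1200.0, Gt=40.0, g=9.81):
    # missile = (xm, ym, zm, Vm, gamma_m, psi_m, t); target = (xe, ye, ze, Ve, gamma_e, psi_e)
    xm, ym, zm, Vm, gamma_m, psi_m, t = missile
    xe, ye, ze, Ve, gamma_e, psi_e = target
    # LOS components, angles, and angular rates (Equations (15)-(16))
    rx, ry, rz = xe - xm, ye - ym, ze - zm
    Rem = np.sqrt(rx**2 + ry**2 + rz**2)
    rxd = Ve*np.cos(gamma_e)*np.cos(psi_e) - Vm*np.cos(gamma_m)*np.cos(psi_m)
    ryd = Ve*np.cos(gamma_e)*np.sin(psi_e) - Vm*np.cos(gamma_m)*np.sin(psi_m)
    rzd = Ve*np.sin(gamma_e) - Vm*np.sin(gamma_m)
    beta_r = np.arctan2(ry, rx)
    eps_r = np.arctan2(rz, np.sqrt(rx**2 + ry**2))
    beta_r_dot = (ryd*rx - ry*rxd) / (rx**2 + ry**2)
    eps_r_dot = ((rx**2 + ry**2)*rzd - rz*(rxd*rx + ryd*ry)) / (Rem**2 * np.sqrt(rx**2 + ry**2))
    # Turn control overloads (Equation (14))
    nmy = K*Vm*np.cos(gamma_e)/g * (beta_r_dot + np.tan(eps_r)*eps_r_dot*np.tan(beta_r - eps_r))
    nmz = Vm/g * K*np.cos(beta_r - eps_r)*eps_r_dot
    # Thrust, drag, and gravity force (Equations (11)-(13))
    Pm = P0 if t <= tw else 0.0
    Qm = 0.5*rho*Vm**2*Sm*CDm
    Gm = G0 - Gt*t if t <= tw else G0 - Gt*tw
    # Dynamics (Equation (10)) and kinematics (Equation (9)), forward-Euler step
    Vm_dot = (Pm - Qm)/Gm*g - g*np.sin(gamma_m)
    psi_m_dot = nmy*g/(Vm*np.cos(gamma_m))
    gamma_m_dot = nmz*g/Vm - g*np.cos(gamma_m)/Vm
    xm += Vm*np.cos(gamma_m)*np.cos(psi_m)*dt
    ym += Vm*np.cos(gamma_m)*np.sin(psi_m)*dt
    zm += Vm*np.sin(gamma_m)*dt
    return (xm, ym, zm, Vm + Vm_dot*dt, gamma_m + gamma_m_dot*dt, psi_m + psi_m_dot*dt, t + dt)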
In view of the shortcomings in accuracy and engineering practicability of traditional air-to-air missile attack zone solution methods, such as the interpolation method, theoretical calculation method, and polynomial fitting method, reference [45] establishes a kinematics model and a guidance model of air-to-air missiles and uses the golden section method together with a case-by-case analysis to solve the missile attack zone in real time. The simulation results show that this method is simple, has low complexity, and is highly efficient. Considering the high accuracy and real-time requirements of air combat, this paper chooses the golden section method to calculate the missile attack zone.
Consider the calculation of the far boundary value of the missile attack zone; the calculation method for the near boundary value is similar. The flowchart for calculating the far boundary value of the missile attack zone is shown in Figure 3.
By setting the initial LOS declination and inclination angles between the missile and the target aircraft to different values and applying the above steps for calculating the boundary values of the missile attack zone, the missile attack zone envelope can be obtained.
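The following Python sketch outlines a golden-section-style search for the far boundary. It is a minimal sketch, assuming a hypothetical predicate missile_hits(R) that runs the 3-DOF missile-target simulation from launch range R and reports whether the missile reaches the target; the actual stopping conditions follow the flowchart in Figure 3.

def far_boundary(missile_hits, R_low, R_high, tol=10.0):
    # Search for the largest launch range at which the missile still hits the target.
    # Assumes hits are monotone in range: every range below the far boundary is a hit.
    phi = 0.618  # golden-section ratio
    while R_high - R_low > tol:
        R_probe = R_low + phi * (R_high - R_low)
        if missile_hits(R_probe):
            R_low = R_probe      # still a hit: the far boundary lies farther out
        else:
            R_high = R_probe     # a miss: the far boundary lies closer in
    return R_low

The near boundary can be bracketed in the same way by searching downward from a known hit.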
The simulation results of the missile attack zone are given below for different values of the initial LOS declination and inclination angles. It is assumed that the target aircraft is in uniform linear motion; the initial velocities of the missile and the target are both 0.74 Ma, and the initial altitude of the missile is 8000 m. After determining the values of the LOS declination and inclination angles, the missile attack zones are calculated separately, distinguishing between attacks in which the missile is located in the forward hemisphere of the target aircraft and attacks in which it is located in the rear hemisphere of the target aircraft. The missile approaches the target aircraft using the proportional guidance method. The aerodynamic data of the missile are the open-source data of a close-range air-to-air missile. In Figure 4, Figure 5 and Figure 6, the boundary values of the LOS declination angle are indicated by the numbers pointed to by arrows on both sides of the attack zone, which are positive in the counterclockwise direction with the attacking aircraft as the reference point. The near distance sampling values of the missile attack zone are represented by the green hollow circles close to the attacking aircraft, and the far distance sampling values are represented by the red hollow circles far away from the attacking aircraft.
(1) The initial LOS inclination angle is taken as $\varepsilon_{r0} = \pi/6$, and the range of the LOS declination angle is $\beta_{r0} \in [-\pi/3, \pi/3]$. When calculating the missile attack zone, it is assumed that the LOS inclination angle remains constant while the LOS declination angle is swept cyclically. Starting from $-\pi/3$, the missile attack zone is calculated once per iteration and the LOS declination angle is increased by $\pi/15$; when the LOS declination angle exceeds $\pi/3$, the calculation stops. According to the above calculation results, the maximum effective attack distance values and the minimum effective attack distance values are connected to form the boundary, and finally, the attack zone shown in the following figure is obtained. The LOS declination sweep and attack zone generation methods in Figure 5 and Figure 6 are similar.
In Figure 4a, the minimum near distance sampling value is 389.7439 m, and the maximum far distance sampling value is 4412.5518 m in the missile attack zone. In Figure 4b, the minimum near distance sampling value of the missile attack zone is 1246.5888 m, and the maximum far distance sampling value is 10,829.3913 m. Comparing Figure 4a and Figure 4b, the attack range of the missile in the front hemisphere of the target aircraft is significantly larger than that of the rear hemisphere attack. The above results show that when the value of the LOS inclination angle is positive and the target aircraft is located within the off-axis angle of the missile attack, the attacking aircraft points its head toward the front hemisphere of the target aircraft by implementing a sequence of maneuvers, which is advantageous for increasing the missile attack range and improving the probability of the missile hitting the target. In the process of evading an attack by the target aircraft, the attacking aircraft can, through a sequence of maneuvers, keep the target aircraft in its rear hemisphere, which compresses the attack envelope of the target aircraft’s missiles and reduces the probability of being destroyed by the target aircraft.
(2) The initial LOS inclination angle is taken as $\varepsilon_{r0} = 0$, and the range of the LOS declination angle is $\beta_{r0} \in [-\pi/3, \pi/3]$.
In Figure 5a, the minimum near distance sampling value is 371.3086 m, and the maximum far distance sampling value is 4029.9262 m in the missile attack zone. In Figure 5b, the minimum near distance sampling value of the missile attack zone is 1228.1555 m, and the maximum far distance sampling value is 14,144.2515 m. The above results show that when the value of the LOS inclination angle is 0 and the target aircraft is located within the off-axis angle of the missile attack, the attacking aircraft points its head toward the front hemisphere of the target aircraft by implementing a sequence of maneuvers, which is advantageous for increasing the missile attack range and improving the probability of the missile hitting the target. However, it should be noted that if the missile attack range of the attacking aircraft is extended, the threat of it being attacked may also increase. In Equation (21) in Section 3.2, the spatial off-axis angles of the attacking and target aircraft are substituted into the angular superiority function, which facilitates an objective evaluation of the threat to both sides. If the two sides are in a directly head-on situation, although the missile attack distance of the attacking aircraft is larger, its angular superiority function value is 0.5, because the missile attack range of the target aircraft is also larger at this time, and both sides may be attacked by the other side.
(3) The initial LOS inclination angle is taken as $\varepsilon_{r0} = -\pi/6$, and the range of the LOS declination angle is $\beta_{r0} \in [-\pi/3, \pi/3]$.
In Figure 6a, the missile attack zone is sampled at a minimum of 394.3421 m in the near distance and a maximum of 4167.3509 m in the far distance. In Figure 6b, the minimum near distance sampling value of the missile attack zone is 1209.5226 m, and the maximum far distance sampling value is 12,579.9199 m. The above results show that when the value of the LOS inclination angle is negative and the target aircraft is located within the off-axis angle of the missile attack, the attacking aircraft points its head toward the front hemisphere of the target aircraft by implementing a sequence of maneuvers, which is conducive to increasing the missile attack range and improving the probability of the missile hitting the target. In the process of air combat confrontation, although the attacking aircraft expands its missile attack zone through maneuvers, the attacking aircraft and the target aircraft may be in a head-on situation at this time. This requires the UCAV to adjust its maneuver strategy according to the variation of the situation.

2.5. Framework of Autonomous Maneuver Decision-Making System

The autonomous maneuver decision-making system models the air combat confrontation process as a sequence of discrete decisions. At each decision point, the UCAV executes the acquired optimized maneuver control variables. When the simulation iterates to the next decision point, the maneuver control variables are updated by the intelligent optimization algorithm, and the UCAV drives its model with the new maneuver control variables. In Figure 7, the framework diagram of the autonomous maneuver decision-making system is depicted. The allowable missile attack distances of the UCAV are calculated from the collected state information of the UCAV itself and of the target aircraft. The missile attack distances are combined with the air combat superiority functions to obtain superiority functions that consider the missile’s current allowable attack distance. In addition, the missile attack distance calculation results provide information for the missile attack decision at the current moment. Then, the fuzzy inference method is used to determine the weight coefficients of the air combat superiority functions. Considering the air combat superiority functions and their weights comprehensively, the integrated air combat situation evaluation function, namely, the objective function to be optimized at the current moment, is obtained. Applying the MMPA to optimize the above function, the maneuver control variables that maximize the value of the current objective function are obtained through the iterative operations of the algorithm; they are used as the optimized maneuver control variables of the UCAV in the current iteration of the air combat confrontation simulation. The flow of the MMPA for solving the optimized maneuver control variables is shown in Figure 8. After that, the above maneuver control variables are used as the input of the maneuver controller, which solves them to obtain the maneuver control instructions of the 6-DOF UCAV model. According to these maneuver control instructions, the UCAV model performs the maneuver. The specific workflow of the maneuver controller is shown in Figure 1 in Section 2.2.
It can be seen from Figure 8 that the objective function is constructed by integrating the results of the missile attack zone, the attacker state, and the target state. The above objective function is taken as the fitness function of the MMPA, and the position coordinates of each individual in the population represent the maneuver control variables of the 3-DOF UCAV model. As the individual positions are updated, the individual with the largest fitness function value is selected and preserved. When the algorithm reaches the maximum number of iterations, the position coordinates of the optimal individual are output. Then, the above coordinates are used as the input of the maneuver controller to solve the maneuver control instructions of the 6-DOF UCAV model.
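As a schematic illustration (not the authors' implementation), one decision cycle of the framework in Figures 7 and 8 might be organized as in the following Python sketch, where every module is passed in as a callable; all names are hypothetical stand-ins for the components described in the text:

def decision_step(ucav_state, target_state,
                  compute_attack_zone, superiority_functions, fuzzy_weights,
                  evaluate, mmpa_optimize, launch_decision, maneuver_controller):
    # 1. Attack zone calculation (Section 2.4) from both aircraft states.
    zone = compute_attack_zone(ucav_state, target_state)
    # 2. Air combat superiority functions that account for the attack zone (Section 3.2).
    S_A, S_R, S_E = superiority_functions(ucav_state, target_state, zone)
    # 3. Fuzzy inference to obtain the weight coefficients (Section 3.3).
    w = fuzzy_weights(S_A, S_R, S_E)
    # 4. MMPA optimizes the weighted objective over candidate 3-DOF control variables (Section 3.4).
    objective = lambda u: evaluate(u, ucav_state, target_state, zone, w)
    u_opt = mmpa_optimize(objective)
    # 5. Missile launch decision based on the current allowable attack distance.
    launch = launch_decision(ucav_state, target_state, zone)
    # 6. Maneuver controller converts the optimized variables into 6-DOF commands (Section 2.2).
    commands = maneuver_controller(u_opt, ucav_state)
    return commands, launch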

3. Autonomous Maneuver Decision-Making Based on the MMPA and Fuzzy Inference

When using intelligent optimization algorithms to solve maneuver strategies, it is necessary to consider issues such as the design of the objective function and the algorithm’s global optimal solution search capability. This section first introduces the target aircraft state prediction related to the objective function design, the establishment of the air combat situation evaluation function, and the dynamic adjustment of the weight coefficients of the superiority functions; it then introduces the improvements made to the proposed optimization algorithm and its solution process.

3.1. Prediction of Target Aircraft State Information

The autonomous maneuver decision-making system collects state information from the attacking and target aircraft to support maneuver decisions. However, during air combat confrontation, the obtained target aircraft state information may be discontinuous due to interference or maneuvers performed by both sides. Therefore, a quadratic curve fitting method for the flight trajectory based on the recent state information of the target aircraft is proposed. Within the prediction time, the target aircraft’s movement is assumed to be continuous. To improve the real-time performance of the state prediction, the target aircraft is treated as a 3-DOF particle model, and its motion is described by a quadratic polynomial equation in time [10].
The parameter $t$ denotes the decision moment and $\Delta t$ is the prediction time. The predicted position of the target aircraft is $[x_{ep}(t+\Delta t), y_{ep}(t+\Delta t), z_{ep}(t+\Delta t)]$, $V_{ep}(t+\Delta t)$ represents the predicted velocity, and $\psi_{ep}(t+\Delta t)$ and $\gamma_{ep}(t+\Delta t)$ are the predicted trajectory declination angle and trajectory inclination angle, respectively. The predicted position of the target aircraft is calculated as follows:
\begin{bmatrix} x_{ep}(t) \\ y_{ep}(t) \\ z_{ep}(t) \end{bmatrix} = P_{3\times 3}\begin{bmatrix} t^2 \\ t \\ 1 \end{bmatrix} = \begin{bmatrix} p_{1x} & p_{2x} & p_{3x} \\ p_{1y} & p_{2y} & p_{3y} \\ p_{1z} & p_{2z} & p_{3z} \end{bmatrix}\begin{bmatrix} t^2 \\ t \\ 1 \end{bmatrix}
where $P_{3\times 3}$ is the parameter matrix, which is obtained using a polynomial curve fitting method based on the position information of the current moment and past moments. For instance, $p_{1x}$, $p_{2x}$ and $p_{3x}$ can be computed as follows:
\begin{aligned}
p_{1x} &= \frac{1}{2\Delta t^2}\left( x_e(t) - 2x_e(t-\Delta t) + x_e(t-2\Delta t) \right) \\
p_{2x} &= \frac{1}{2\Delta t}\left( 3x_e(t) - 4x_e(t-\Delta t) + x_e(t-2\Delta t) \right) \\
p_{3x} &= x_e(t)
\end{aligned}
Using the parameter matrix, the position, velocity, and trajectory angle of the target aircraft are calculated as follows:
\begin{aligned}
x_{ep}(t+\Delta t) &= p_{1x}\Delta t^2 + p_{2x}\Delta t + p_{3x} \\
y_{ep}(t+\Delta t) &= p_{1y}\Delta t^2 + p_{2y}\Delta t + p_{3y} \\
z_{ep}(t+\Delta t) &= p_{1z}\Delta t^2 + p_{2z}\Delta t + p_{3z} \\
V_{epx}(t+\Delta t) &= \frac{x_{ep}(t+\Delta t) - x_{ep}(t)}{\Delta t} \\
V_{epy}(t+\Delta t) &= \frac{y_{ep}(t+\Delta t) - y_{ep}(t)}{\Delta t} \\
V_{epz}(t+\Delta t) &= \frac{z_{ep}(t+\Delta t) - z_{ep}(t)}{\Delta t} \\
V_{ep}(t+\Delta t) &= \sqrt{V_{epx}^2 + V_{epy}^2 + V_{epz}^2} \\
\gamma_{ep}(t+\Delta t) &= \sin^{-1}\left( \frac{V_{epz}}{V_{ep}} \right) \\
\psi_{ep}(t+\Delta t) &= \tan^{-1}\left( \frac{V_{epy}}{V_{epx}} \right)
\end{aligned}
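A small Python sketch of the quadratic extrapolation of Equations (18)-(20) for a single coordinate (illustrative; the coefficients follow the reconstruction given above):

def predict_coordinate(x_t, x_t1, x_t2, dt):
    # x_t, x_t1, x_t2: coordinate of the target at times t, t - dt, and t - 2*dt.
    p1 = (x_t - 2.0 * x_t1 + x_t2) / (2.0 * dt ** 2)   # curvature term
    p2 = (3.0 * x_t - 4.0 * x_t1 + x_t2) / (2.0 * dt)  # backward-difference velocity
    p3 = x_t
    # Predicted coordinate at t + dt; algebraically equal to 3*x_t - 3*x_t1 + x_t2.
    return p1 * dt ** 2 + p2 * dt + p3

Applying the same extrapolation to each coordinate and differencing the results yields the predicted velocity and trajectory angles of Equation (20).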

3.2. Air Combat Situation Evaluation Function

The superiority functions include the angle superiority function, distance superiority function, and energy superiority function [10]. The fuzzy inference method is used to determine the weight coefficients of each superiority function, and then the weighting operation is performed on the above superiority functions to obtain the comprehensive air combat situation evaluation function.
According to Figure 2, a smaller value of $\lambda_{ucav}$ indicates that the attacking aircraft is aiming more accurately at the target aircraft, while a smaller value of $\lambda_{enemy}$ indicates a smaller probability of the attacking aircraft being attacked by the target aircraft. When $\lambda_{ucav} = 0$ and $\lambda_{enemy} = 0$, the velocity directions of the attacking and target aircraft coincide with the LOS; that is, the attacking aircraft is directly behind the target aircraft. The angular superiority function can be calculated by the following equation:
S_A = \frac{1}{2}\left( 1 + \cos\left( \frac{\lambda_{ucav} + \lambda_{enemy}}{2} \right) \right)
The missile attack distance is related to the distance between the UCAVs on both sides of the confrontation. The distance superiority function takes into account the effects of the angle function and the relative distance. $R_{max}$ represents the maximum attack distance of the missile, and $R_{eu}$ represents the relative distance between the attacking aircraft and the target aircraft. The expression of the distance superiority function is as follows:
S_R = 0.5 + (S_A - 0.5)\times\max\left( 0,\; \frac{10R_{max} - R_{eu}}{9R_{eu}} \right)
The energy values and energy ratios of the attacking and target aircraft are calculated as follows:
E_{ucav} = z_{ucav} + \frac{V_{ucav}^2}{2g}, \quad E_{enemy} = z_{enemy} + \frac{V_{enemy}^2}{2g}, \quad k = \frac{E_{ucav}}{E_{enemy}}
where $E_{ucav}$ is the energy value of the attacking aircraft, $z_{ucav}$ is the altitude of the attacking aircraft, $E_{enemy}$ represents the energy value of the target aircraft, and $k$ is the energy ratio of the attacking aircraft to the target aircraft.
According to the values of parameter k , the values of the energy superiority function are determined.
S_E = \begin{cases} 1, & k > 2 \\ 0.5, & 0.5 \le k \le 2 \\ 0, & k < 0.5 \end{cases}
By weighting and summing the above angle, distance, and energy superiority functions, a comprehensive air combat situation evaluation function is obtained.
S = w_A S_A + w_R S_R + w_E S_E
where $w_A, w_R, w_E$ denote the weights of the angle, distance, and energy superiority functions, respectively. According to the value of each superiority function, the weight of each superiority function in the comprehensive air combat situation evaluation function is determined using the fuzzy inference method. Moreover, the weights are dynamically adjusted according to the changes in the air combat confrontation situation.
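For illustration, a minimal Python sketch of Equations (21)-(25) as reconstructed here (the energy term uses the specific-energy form and the distance term uses the ratio shown above, both reconstructions of the garbled originals; the weights are supplied by the fuzzy inference step of Section 3.3):

import numpy as np

def situation_evaluation(lam_ucav, lam_enemy, R_eu, R_max,
                         z_ucav, V_ucav, z_enemy, V_enemy,
                         w_A, w_R, w_E, g=9.81):
    S_A = 0.5 * (1.0 + np.cos(0.5 * (lam_ucav + lam_enemy)))                   # angle superiority
    S_R = 0.5 + (S_A - 0.5) * max(0.0, (10.0 * R_max - R_eu) / (9.0 * R_eu))   # distance superiority
    k = (z_ucav + V_ucav**2 / (2.0 * g)) / (z_enemy + V_enemy**2 / (2.0 * g))  # energy ratio
    S_E = 1.0 if k > 2.0 else (0.5 if k >= 0.5 else 0.0)                       # energy superiority
    return w_A * S_A + w_R * S_R + w_E * S_E                                   # Equation (25)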

3.3. Determining the Weights of the Air Combat Superiority Functions Using Fuzzy Inference Methods

In Equation (25), the weights of the superiority functions vary with the change in the air combat situation, the aim being to establish a comprehensive air combat situation evaluation function that varies with the air combat situation and thus better guides the UCAV in making maneuver decisions. The air combat situation reflects the relationship between the attacking aircraft and the target aircraft and is characterized by ambiguity, dynamism, and diversity, whereas the four air combat situations in reference [46] only consider the angle and distance factors, which makes it difficult to accurately describe the battlefield situation. In this paper, we use the fuzzy decision-making method to determine the weights of the air combat superiority functions. The fuzzy decision-making process consists of fuzzification of the input and output variables, design of the membership functions, construction of a fuzzy rule base, and defuzzification.

3.3.1. Input and Output Fuzzification and Membership Function Design

The fuzzy inference machine based on air combat situations has 3 inputs and 3 outputs. The inputs are the angle, distance, and energy superiority function, and the outputs are the weights of the angle, distance, and energy superiority function. The subsets of fuzzy language for each input and output are listed below [47].
Angle fuzzy superiority function:
F_A = \begin{cases} A_{big}, & 0.75 \le S_A \le 1 \quad \text{(big angle superiority)} \\ A_{rebig}, & 0.5 \le S_A < 0.75 \quad \text{(relatively big angle superiority)} \\ A_{resmall}, & 0.25 \le S_A < 0.5 \quad \text{(relatively small angle superiority)} \\ A_{small}, & 0 \le S_A < 0.25 \quad \text{(small angle superiority)} \end{cases}
Distance fuzzy superiority function:
F_R = \begin{cases} R_{big}, & 0.75 \le S_R \le 1 \quad \text{(big distance superiority)} \\ R_{rebig}, & 0.5 < S_R < 0.75 \quad \text{(relatively big distance superiority)} \\ R_{resmall}, & 0.25 \le S_R \le 0.5 \quad \text{(relatively small distance superiority)} \\ R_{small}, & 0 \le S_R < 0.25 \quad \text{(small distance superiority)} \end{cases}
Energy fuzzy superiority function:
F_E = \begin{cases} E_{big}, & S_E = 1 \quad \text{(big energy superiority)} \\ E_{rebig}, & S_E = 0.5 \quad \text{(relatively big energy superiority)} \\ E_{small}, & S_E = 0 \quad \text{(small energy superiority)} \end{cases}
The weights of the angle, distance, and energy superiority functions are as follows:
\bar{w}_i = \begin{cases} CB, & w_i \ge 0.8 \quad \text{(big)} \\ CRB, & 0.6 \le w_i < 0.8 \quad \text{(relatively big)} \\ CM, & 0.4 \le w_i < 0.6 \quad \text{(middle)} \\ CRS, & 0.2 \le w_i < 0.4 \quad \text{(relatively small)} \\ CS, & w_i < 0.2 \quad \text{(small)} \end{cases}
where CB, CRB, CM, CRS, CS, $A_{big}$, $A_{rebig}$, $A_{resmall}$, $A_{small}$, $R_{big}$, $R_{rebig}$, $R_{resmall}$, $R_{small}$, $E_{big}$, $E_{rebig}$, $E_{small}$ are fuzzy language values representing the fuzzified forms of the corresponding inputs and outputs. The subscript $i$ denotes $A$, $R$, or $E$. $w_i$ represents the exact output after defuzzification, $\bar{w}_i$ is the fuzzy output, and $F_A$, $F_R$, $F_E$ denote the fuzzified inputs. Equations (26)–(29) specify both the fuzzy quantities of each input and output and the membership function of each fuzzy quantity. In order to reduce the complexity of the fuzzy inference calculation and facilitate fast input-to-output mapping according to the fuzzy rules, this paper adopts rectangular and single-point membership functions; that is, the membership degree of each input to a given fuzzy quantity is either 0 or 1, and the membership functions of the same input do not intersect.
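Because the membership functions are rectangular or single-point, fuzzification reduces to interval tests, as in the following illustrative Python sketch of Equations (26)-(28) (the label strings are placeholders):

def fuzzify(S_A, S_R, S_E):
    # Each crisp superiority value maps to exactly one fuzzy label.
    F_A = ("A_big" if S_A >= 0.75 else
           "A_rebig" if S_A >= 0.5 else
           "A_resmall" if S_A >= 0.25 else "A_small")
    F_R = ("R_big" if S_R >= 0.75 else
           "R_rebig" if S_R > 0.5 else
           "R_resmall" if S_R >= 0.25 else "R_small")
    F_E = "E_big" if S_E == 1.0 else ("E_rebig" if S_E == 0.5 else "E_small")
    return F_A, F_R, F_E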

3.3.2. Fuzzy Inference Rule Design

According to expert experience, the tree-like fuzzy inference rules are obtained as shown in Figure 9, which contains a total of 27 inference rules. The inference rules are sequentially numbered as $W_i\ (i = 1, 2, \ldots, 27)$, where the subscript $i$ denotes the $i$-th rule, corresponding to the 27 types of air combat situations.
In the process of formulating fuzzy rules, the following expert experiences are mainly followed. (1) When the attacking aircraft has obvious shortcomings in a certain fuzzy superiority function, the maneuver to make up for the shortcomings should be chosen as much as possible to rapidly increase its superiority. For example, in situation W 3 , where the attacking aircraft has advantages in angle and distance but is at a disadvantage in energy, the weight of the energy superiority function should be increased when making maneuver decisions. (2) When the angle superiority and energy superiority are large and the distance superiority is relatively small, the weight of the distance superiority function should be considered to be increased in order to meet the air-to-air missile launch conditions as soon as possible. For example, in situation W 7 , where the attacking aircraft has a large angle and energy superiority but a relatively small distance superiority, the weight of the distance superiority function should be increased when making maneuver decisions. (3) When the angle superiority of the attacking aircraft is relatively small and the energy superiority is small, an escape strategy should be adopted to increase the energy superiority and decrease the distance superiority as much as possible, with the aim of increasing the distance from the target aircraft and escaping from the missile launch zone of the target aircraft as soon as possible. As in the case of situation W 18 , the attacking aircraft has a relatively small angle superiority and a small energy superiority, so the weight of the angle and distance superiority functions should be reduced, and the weight of the energy superiority function should be increased when making maneuver decisions.

3.3.3. Defuzzification to Obtain the Weights of the Superiority Functions

According to the fuzzy rules, the fuzzy quantities of the output need to be defuzzified in order to convert them into exact quantities. We use the center of gravity method to complete the defuzzification [48]. The expression is as follows:
z_0 = \frac{\int_a^b z\,\mu(z)\,dz}{\int_a^b \mu(z)\,dz}
where $\mu(z)$ is the membership function of the fuzzy set in which the output $z$ is located, and $z_0$ is the exact quantity obtained by fuzzy inference. The range of $z_0$ may not be consistent with the range of values of the actual output, so an argument domain transformation is also required, as shown in the following equation:
u = \frac{u_{min} + u_{max}}{2} + k\left( z_0 - \frac{z_{min} + z_{max}}{2} \right)
where $k = (u_{max} - u_{min})/(z_{max} - z_{min})$ is the scale factor, $[u_{min}, u_{max}]$ denotes the range of variation of the actual control variables, and $[z_{min}, z_{max}]$ is the range of variation of $z_0$.
Combining the above analyses, the specific steps to obtain the outputs from the inputs of the fuzzy inference machine are as follows:
Step 1: Convert the exact input quantities into fuzzy input quantities according to the membership functions in Equations (26)–(29).
Step 2: Map the fuzzy input quantities to fuzzy output quantities according to the 27 rules contained in the fuzzy rule tree in Figure 9. Since the membership functions of the different fuzzy quantities of each input designed in this paper do not intersect, the fuzzy output quantities can be obtained directly from the mapping of the fuzzy inputs without the need for complex fuzzy implication calculations.
Step 3: According to Equations (30) and (31), calculate the exact output after defuzzification, and obtain the weight matrix w of the superiority functions in the current air combat situation. The expression is as follows:
w = [w_A, w_R, w_E]
where $w_A, w_R, w_E$ represent the weights of the angle, distance, and energy superiority functions in the current air combat situation.
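For reference, a minimal Python sketch of the Step 3 defuzzification (Equation (30), evaluated here on a sampled membership function) and the argument domain transformation of Equation (31); both are illustrative and assume the membership function has already been assembled from the fired rule:

import numpy as np

def defuzzify(mu, z_grid):
    # Discrete center-of-gravity defuzzification over sampled values of mu(z).
    mu = np.asarray(mu, dtype=float)
    z = np.asarray(z_grid, dtype=float)
    return float(np.sum(z * mu) / np.sum(mu))   # assumes mu is non-zero somewhere

def to_actual_range(z0, z_min, z_max, u_min, u_max):
    # Argument domain transformation from [z_min, z_max] to [u_min, u_max].
    k = (u_max - u_min) / (z_max - z_min)
    return 0.5 * (u_min + u_max) + k * (z0 - 0.5 * (z_min + z_max))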

3.4. Solving the Objective Function with MMPA

3.4.1. Original Marine Predator Algorithm

The marine predator algorithm (MPA) consists of an elite matrix, a prey matrix, a step size, a position update parameter, a step control parameter, and a movement type. In the d-dimensional search space, it is assumed that the population sizes of both the marine predators and prey are NP, the position update parameter is P, and the step control parameter is CF.
The elite matrix and prey matrix are denoted as follows:
Elite = \begin{bmatrix} X_{1,1}^{Top} & X_{1,2}^{Top} & \cdots & X_{1,d}^{Top} \\ X_{2,1}^{Top} & X_{2,2}^{Top} & \cdots & X_{2,d}^{Top} \\ \vdots & \vdots & \ddots & \vdots \\ X_{NP,1}^{Top} & X_{NP,2}^{Top} & \cdots & X_{NP,d}^{Top} \end{bmatrix}_{NP\times d}, \quad Prey = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,d} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ X_{NP,1} & X_{NP,2} & \cdots & X_{NP,d} \end{bmatrix}_{NP\times d}
where $Elite$ denotes the elite matrix and $Prey$ is the prey matrix. $X^{Top}$ represents the top predator, and $X_{i,j}$ is the $j$-th dimension of the $i$-th predator.
Based on the prey-to-predator velocity ratio and the stage of their life cycle, the way their positions are updated is determined.
(1) If the velocity ratio of prey to predator satisfies $P_v \ge 10$, the predators do not move. When $Iter < \frac{1}{3}Max\_Iter$, the positions are updated as follows:
\begin{aligned}
stepsize_i &= R_B \otimes (Elite_i - R_B \otimes Prey_i) \\
Prey_i &= Prey_i + P\cdot R \otimes stepsize_i
\end{aligned}, \quad i = 1, 2, \ldots, NP
where $R_B$ stands for Brownian motion and is a random vector with a normal distribution, $R$ is a uniform random vector with values in [0, 1], and $P = 0.5$. $Iter$ represents the current iteration number of the MPA, and $Max\_Iter$ represents the maximum iteration number of the algorithm. $\otimes$ denotes term-by-term multiplication.
(2) If the velocity ratio of prey to predator satisfies $P_v \approx 1$, the prey move by Lévy motion, and the predators start to move by Brownian motion. When $\frac{1}{3}Max\_Iter < Iter < \frac{2}{3}Max\_Iter$, the population is divided into two halves, and the positions are updated as follows:
\begin{aligned}
stepsize_i &= R_L \otimes (Elite_i - R_L \otimes Prey_i) \\
Prey_i &= Prey_i + P\cdot R \otimes stepsize_i
\end{aligned}, \quad i = 1, 2, \ldots, \tfrac{1}{2}NP
where $R_L \otimes Prey_i$ is used to simulate the movements of the prey, and $R_L$ denotes a random vector based on Lévy motion.
\begin{aligned}
stepsize_i &= R_B \otimes (R_B \otimes Elite_i - Prey_i) \\
Prey_i &= Elite_i + P\cdot CF \otimes stepsize_i
\end{aligned}, \quad i = \tfrac{1}{2}NP, \ldots, NP
where $R_B \otimes Elite_i$ is used to simulate the movements of the predators. The step control parameter CF is updated as follows:
CF = \left( 1 - \frac{Iter}{Max\_Iter} \right)^{\left( 2\frac{Iter}{Max\_Iter} \right)}
(3) If the velocity ratio of prey to predator satisfies $P_v = 0.1$, then no matter what kind of motion the prey adopt, the predators move by Lévy motion. When $Iter > \frac{2}{3}Max\_Iter$, the positions are updated as follows:
\begin{aligned}
stepsize_i &= R_L \otimes (R_L \otimes Elite_i - Prey_i) \\
Prey_i &= Elite_i + P\cdot CF \otimes stepsize_i
\end{aligned}, \quad i = 1, 2, \ldots, NP
Marine predators may be affected by eddy current formation or fish aggregation devices (FADs) during predation, which can cause the search to become trapped in local optimal solutions. Therefore, longer jumps are used to avoid falling into local optima. The jump patterns are as follows:
Prey_i = \begin{cases} Prey_i + CF\left[ X_l + R \otimes (X_u - X_l) \right] \otimes U, & \text{if } r \le FADs \\ Prey_i + \left[ FADs(1 - r) + r \right](Prey_{r1} - Prey_{r2}), & \text{if } r > FADs \end{cases}
where $FADs = 0.2$ and $r$ is a random number in [0, 1]. $X_u$ and $X_l$ represent the upper and lower limits of the prey position. $U$ is a binary vector containing only 0 and 1. $Prey_{r1}$ and $Prey_{r2}$ are randomly selected prey positions.
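An illustrative Python sketch of the FADs jump of Equation (39) follows; the construction of the binary vector U (thresholding uniform random numbers against FADs) follows the common MPA convention and is an assumption here:

import numpy as np

def fads_effect(prey, CF, X_l, X_u, FADs=0.2):
    # prey: (NP, d) array of prey positions; X_l, X_u: per-dimension bounds.
    NP, d = prey.shape
    new_prey = prey.copy()
    for i in range(NP):
        r = np.random.rand()
        if r <= FADs:
            U = (np.random.rand(d) < FADs).astype(float)   # binary vector (assumed construction)
            R = np.random.rand(d)
            new_prey[i] = prey[i] + CF * (X_l + R * (X_u - X_l)) * U
        else:
            r1, r2 = np.random.randint(0, NP, size=2)
            new_prey[i] = prey[i] + (FADs * (1.0 - r) + r) * (prey[r1] - prey[r2])
    return new_prey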
At the end of each iteration of the algorithm, the optimized solution obtained in this iteration is compared with the optimized solution of the previous iteration stored in the marine memory, and the solution with a higher fitness value is saved to improve the quality of the populations. The pseudo-code of the MPA is shown in Algorithm 1 [49].
Algorithm 1: The pseudo-code of the marine predator algorithm
1      Initialize {
2         population size NP, search space dimension d, prey matrix Prey (NP × d), step size control parameter CF,
3         uniform random vector R in [0, 1], random vector R_B based on Brownian motion,
4         random vector R_L based on Lévy motion, upper limits X_u of the prey position,
5         lower limits X_l of the prey position }
6      Main loop
7      while (Iter < Max_Iter)
8         compute the fitness of each agent;
9         construct the Elite matrix Elite (NP × d) and finish memory saving;
10        if (Iter < (1/3) Max_Iter)
11           compute the step size of each agent and update the prey position using Equation (34);
12        else if ((1/3) Max_Iter < Iter < (2/3) Max_Iter)
13           for the first half of the population (i = 1, 2, …, NP/2)
14              update the prey position using Equation (35);
15           for the other half of the population (i = NP/2, …, NP)
16              update the prey position using Equation (36);
17        else
18           update the prey position using Equation (38);
19        end if
20        compute the fitness of each agent, finish memory saving and Elite matrix updating;
21        apply the FADs effect and update the prey position using Equation (39);
22        Iter = Iter + 1;
23     end while
24     // P = 0.5 and FADs = 0.2

3.4.2. MMPA

In this paper, we propose an MMPA [50] that combines a logistic opposition-based learning (LOBL) mechanism and adaptive parameter updating rules to address the tendency of the MPA to fall into local optimal solutions. Compared with the MPA, the MMPA has the following advantages: it better balances the exploration and exploitation abilities of the algorithm and has better stability [50]. The MMPA mainly adopts the following measures to overcome the deficiencies of the MPA. (1) In the processes of population initialization and population position updating, the LOBL mechanism is introduced to improve the quality of the initial population and of the solutions obtained during the position updating process. (2) A new iterative update rule is proposed to improve the global search capability. (3) A strategy for correcting the position weights of the searching individuals in the early and middle iteration phases of the algorithm is proposed, whose purpose is to prevent the algorithm from falling into local optimal solutions and to accelerate convergence. (4) A new strategy for nonlinear step-size control parameter generation is proposed to balance the performance of the algorithm between exploration and exploitation.
Opposition-based learning (OBL) is a new technology in the field of computing proposed by Tizhoosh [51]. The logistic chaotic mapping is a well-known chaotic mechanism [52], which helps to generate new solutions and increase population diversity. In this paper, a novel LOBL mechanism is proposed by combining chaotic theory and OBL to enhance the population diversity and generate higher-quality solutions. The mathematical model of LOBL is as follows:
\bar{X}_{i,j} = L_j + U_j - \lambda_i X_{i,j}, \quad i = 1, 2, \ldots, NP
where $X_{i,j}$ and $\bar{X}_{i,j}$ represent the $j$-th dimensional component of the $i$-th prey and of its opposite $\overline{Prey}$, respectively, and $\lambda_i$ is the $i$-th logistic chaotic value, which is generated as follows:
\lambda_{i+1} = C_p \lambda_i (1 - \lambda_i), \quad i = 1, 2, \ldots, NP
where C p is a control parameter with a value of 4 [53].
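A minimal Python sketch of the LOBL mechanism of Equations (40) and (41) is shown below; it assumes L_j and U_j are the per-dimension lower and upper bounds of the search space (as in standard opposition-based learning) and uses an assumed initial chaotic value:

import numpy as np

def lobl_opposite(prey, X_l, X_u, lam0=0.7, Cp=4.0):
    # prey: (NP, d) array; X_l, X_u: per-dimension lower/upper bounds (length-d arrays or scalars).
    NP, _ = prey.shape
    lam = np.empty(NP)
    lam[0] = lam0                                       # assumed initial chaotic value
    for i in range(1, NP):
        lam[i] = Cp * lam[i - 1] * (1.0 - lam[i - 1])   # logistic map, Equation (41)
    return X_l + X_u - lam[:, None] * prey              # opposite population, Equation (40)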
During position updating using Brownian motion and Lévy motion, it is possible that the intelligent agents miss the optimal solutions due to the occurrence of a large step size [54]. Sine and cosine functions are introduced to improve the performance of the MPA, which are used to dynamically update the population positions in a three-phase optimization process. The new position update rules are as follows:
(1) When $Iter < \frac{1}{3}Max\_Iter$:
stepsize_i = R_B \otimes (Elite_i - R_B \otimes Prey_i)
Prey_i = \begin{cases} Prey_i + CF\cdot R_1 \otimes stepsize_i, & \text{if } p > 0.5 \\ Prey_i + CF\cdot R_2 \otimes stepsize_i, & \text{if } p \le 0.5 \end{cases}, \quad i = 1, 2, \ldots, NP
(2)
When 1 3 M a x _ I t e r < I t e r < 2 3 M a x _ I t e r
The positions of the first half of the populations are updated as follows:
stepsize i = R L ( Elite i R L Prey i )
Prey i = Prey i + C F R 1 stepsize i   i f   p > 0.5 Prey i + C F R 2 stepsize i   i f   p 0.5   i = 1 , 2 , 1 2 N P
The positions of the last half of the populations are updated as follows:
$$\mathrm{stepsize}_i = \vec{R}_B \otimes \left( \vec{R}_B \otimes \mathrm{Elite}_i - \mathrm{Prey}_i \right) \tag{46}$$

$$\mathrm{Prey}_i = \begin{cases} \mathrm{Elite}_i + CF \cdot R_1 \otimes \mathrm{stepsize}_i & \text{if } p > 0.5 \\ \mathrm{Elite}_i + CF \cdot R_2 \otimes \mathrm{stepsize}_i & \text{if } p \le 0.5 \end{cases}, \quad i = \tfrac{1}{2}NP, \ldots, NP \tag{47}$$
(3)
When $Iter > \frac{2}{3} Max\_Iter$:
$$\mathrm{stepsize}_i = \vec{R}_L \otimes \left( \vec{R}_L \otimes \mathrm{Elite}_i - \mathrm{Prey}_i \right) \tag{48}$$

$$\mathrm{Prey}_i = \begin{cases} \mathrm{Elite}_i + CF \cdot R_1 \otimes \mathrm{stepsize}_i & \text{if } p > 0.5 \\ \mathrm{Elite}_i + CF \cdot R_2 \otimes \mathrm{stepsize}_i & \text{if } p \le 0.5 \end{cases}, \quad i = 1, 2, \ldots, NP \tag{49}$$
In Equations (43)–(49), the expressions for R 1 , R 2 are as follows:
$$R_1 = \sin\left(2\pi \cdot \mathrm{rand}()\right), \qquad R_2 = \cos\left(2\pi \cdot \mathrm{rand}()\right), \quad i = 1, 2, \ldots, NP \tag{50}$$
where $\mathrm{rand}()$ is a random vector and $p$ is a random number, both uniformly distributed in [0, 1].
Employing the amplitude characteristics of the sine and cosine functions in the position iterative update rules improves the global search capability of the algorithm and is more favorable for guiding the search around the optimal solutions.
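For illustration, a minimal sketch of the phase-1 update (Equations (42), (43), and (50)) is given below; the Brownian step vector is drawn from a standard normal distribution, as in the original MPA, and all names are ours.

```python
import numpy as np

def phase1_update(prey, elite, cf, rng):
    """Phase-1 update (Iter < Max_Iter/3): Equation (42) for the step size and
    Equation (43) with the sine/cosine parameters of Equation (50)."""
    NP, d = prey.shape
    r_b = rng.standard_normal((NP, d))                 # Brownian-motion random vector R_B
    step = r_b * (elite - r_b * prey)                  # Equation (42)
    r1 = np.sin(2.0 * np.pi * rng.random((NP, d)))     # Equation (50)
    r2 = np.cos(2.0 * np.pi * rng.random((NP, d)))
    p = rng.random()
    r = r1 if p > 0.5 else r2                          # branch selection of Equation (43)
    return prey + cf * r * step

rng = np.random.default_rng(42)
prey = rng.uniform(-1.0, 1.0, size=(20, 4))            # 20 agents in a 4-dimensional space
elite = np.tile(prey[0], (20, 1))                      # elite matrix: best agent replicated
new_prey = phase1_update(prey, elite, cf=0.8, rng=rng)
```

The phase-2 and phase-3 updates follow the same pattern, with Lévy steps and the Elite-based forms of Equations (46)–(49).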
In the early and middle optimization phases of the algorithm, the predators and prey move very fast and may skip over the optimal solutions because of overly rapid exploration. In order to make the transition between exploration and exploitation of the MPA smoother, inertia weight coefficients are introduced into the position-updating equations of the early and middle phases; the improved position-updating equations are shown in Equations (52) and (53). The inertia weight coefficient is expressed as follows:
$$SF = \left[ 1 - \sin\!\left( \frac{\pi}{2} \cdot \frac{Iter}{Max\_Iter} \right) \right]^{\, e^{\left( \frac{\pi}{2} \cdot \frac{Iter}{Max\_Iter} \right)}} \tag{51}$$
After introducing the inertia weight coefficients, adjusting the influence of $\mathrm{Prey}_i$ on the population position update allows the search agents to focus on the region around the optimal solutions found during the iteration process. In addition, the nonlinear weight-changing strategy used in the early and middle phases of the algorithm helps to avoid falling into local optimal solutions and premature convergence. The improved position update equations are as follows:
$$\mathrm{Prey}_i = \begin{cases} SF \cdot \mathrm{Prey}_i + CF \cdot R_1 \otimes \mathrm{stepsize}_i & \text{if } p > 0.5 \\ SF \cdot \mathrm{Prey}_i + CF \cdot R_2 \otimes \mathrm{stepsize}_i & \text{if } p \le 0.5 \end{cases} \tag{52}$$

$$\mathrm{Prey}_i = \begin{cases} SF \cdot \mathrm{Elite}_i + CF \cdot R_1 \otimes \mathrm{stepsize}_i & \text{if } p > 0.5 \\ SF \cdot \mathrm{Elite}_i + CF \cdot R_2 \otimes \mathrm{stepsize}_i & \text{if } p \le 0.5 \end{cases} \tag{53}$$
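A minimal sketch of the inertia-weighted update is shown below. Note that the exact printed form of Equation (51) is ambiguous in the extracted text; the reading used here, with the bracketed term raised to the power $e^{(\pi/2)\,Iter/Max\_Iter}$, is an assumption, and the helper names are ours.

```python
import numpy as np

def inertia_weight(it, max_it):
    """Inertia weight coefficient SF (one plausible reading of Equation (51));
    it equals 1 at the first iteration and decays towards 0 at the last one."""
    x = 0.5 * np.pi * it / max_it
    return (1.0 - np.sin(x)) ** np.exp(x)

def weighted_update(prey, elite, step, sf, cf, r, use_elite=False):
    """Equations (52) and (53): SF-weighted position updates used in the early
    and middle phases; `r` is R_1 or R_2 depending on the random number p."""
    base = elite if use_elite else prey
    return sf * base + cf * r * step
```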
Larger step-size control parameters help the algorithm perform global exploration, whereas smaller values promote local exploitation. In order to further enhance the balance between exploration and exploitation, improve the global search capability, and promote fast convergence of the local optimization process, this paper proposes a new strategy for generating the nonlinear step-size control parameter. The expression for CF is as follows:
$$CF = \left[ \cos\!\left( \frac{\pi}{2} \cdot \frac{Iter}{Max\_Iter} \right) \right]^{\left( 2 \cdot \frac{Iter}{Max\_Iter} \right)} \tag{54}$$
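The two schedules compared in Figure 10 can be reproduced with a few lines of Python; the MPA formula used here, CF = (1 − Iter/Max_Iter)^(2·Iter/Max_Iter), is the commonly reported form of the original algorithm and is stated as an assumption rather than quoted from this paper.

```python
import numpy as np

def cf_mpa(it, max_it):
    """Step-size control parameter of the original MPA (commonly reported form)."""
    return (1.0 - it / max_it) ** (2.0 * it / max_it)

def cf_mmpa(it, max_it):
    """Modified step-size control parameter of Equation (54)."""
    return np.cos(0.5 * np.pi * it / max_it) ** (2.0 * it / max_it)

# the MMPA schedule stays larger through the early and middle iterations and
# only collapses near the end, matching the behavior described for Figure 10
for it in (10, 50, 90):
    print(it, round(float(cf_mpa(it, 100)), 3), round(float(cf_mmpa(it, 100)), 3))
```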
The CF variation rules of the original MPA and the MMPA are shown in Figure 10. Compared with the CF of the original MPA, the CF of the MMPA decreases slowly in the early phase of the iteration and rapidly in the late phase, which both increases the global search time of the algorithm and promotes rapid convergence in the late phase of the iteration. The pseudo-code of the MMPA is shown in Algorithm 2 [50].
Algorithm 2: The pseudo-code of the modified marine predator algorithm
1      Initialize {
2         population size NP, search space dimension d;
3         initialize the prey population to acquire the prey matrix Prey_{NP×d};
4         use the LOBL mechanism to produce the opposite population based on Equations (40) and (41);
5         compute the fitness and choose the best NP search agents from the union of Prey and its opposite population;
6         define inertia weight coefficient SF based on Equation (51);
7         define step-size control parameter CF based on Equation (54);
8         apply sine and cosine functions to update the population positions based on Equation (50);
9         random vector based on Brownian motion R_B, random vector based on Lévy motion R_L;
10         upper limits of the prey position X_u, lower limits of the prey position X_l  }
11      Main loop
12      while (Iter < Max_Iter)
13         compute the fitness of each agent;
14         construct the Elite matrix Elite_{NP×d} and finish memory saving;
15         refresh SF by Equation (51), CF by Equation (54), and refresh R_B and R_L;
16         for every search agent
17            if (Iter < Max_Iter/3)
18               compute the step size of the agent and update the prey position using Equations (42), (43) and (52);
19            else if (Max_Iter/3 < Iter < 2·Max_Iter/3)
20               for the first half of the population (i = 1, 2, …, NP/2)
21               update prey positions using Equations (44), (45) and (52);
22               for the last half of the population (i = NP/2, …, NP)
23               update prey positions using Equations (46), (47) and (53);
24            else
25               update prey positions using Equations (48) and (49);
26            end if
27         refresh R_1 and R_2 using Equation (50);
28         end for
29      employ the LOBL mechanism to produce the opposite population based on Equations (40) and (41);
30      compute the fitness and choose the best NP search agents from the union of Prey and its opposite population;
31      finish memory saving and Elite matrix updating;
32      apply FADs effect and refresh prey positions using Equation (39);
33      Iter = Iter + 1;
34      end while
35      // FADs = 0.2
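For readers who prefer an executable form, the following condensed Python skeleton mirrors Algorithm 2. It is a hedged sketch rather than the authors' implementation: it reuses the helper functions from the earlier sketches (logistic_sequence, lobl_opposition, select_best, inertia_weight, cf_mmpa), assumes a maximization objective, draws Brownian steps from a standard normal distribution and Lévy steps via Mantegna's method, and abbreviates the FADs effect of Equation (39).

```python
import numpy as np
from math import gamma, pi, sin

def levy(shape, beta=1.5, rng=None):
    """Lévy-flight random vector R_L via Mantegna's method (a common choice)."""
    rng = rng or np.random.default_rng()
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.standard_normal(shape) * sigma / np.abs(rng.standard_normal(shape)) ** (1 / beta)

def mmpa(objective, lower, upper, NP=30, max_iter=100, fads=0.2, seed=0):
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    d = lower.size
    rng = np.random.default_rng(seed)
    prey = lower + rng.random((NP, d)) * (upper - lower)
    lam = logistic_sequence(NP)
    prey = select_best(prey, np.clip(lobl_opposition(prey, lower, upper, lam), lower, upper), objective)
    best_x, best_f = prey[0].copy(), -np.inf
    for it in range(1, max_iter + 1):
        fit = np.array([objective(x) for x in prey])          # fitness + memory saving
        if fit.max() > best_f:
            best_f, best_x = fit.max(), prey[fit.argmax()].copy()
        elite = np.tile(best_x, (NP, 1))                      # Elite matrix
        sf, cf = inertia_weight(it, max_iter), cf_mmpa(it, max_iter)
        r_b, r_l = rng.standard_normal((NP, d)), levy((NP, d), rng=rng)
        r = np.sin(2 * np.pi * rng.random((NP, d))) if rng.random() > 0.5 \
            else np.cos(2 * np.pi * rng.random((NP, d)))      # R_1 / R_2, Equation (50)
        half = NP // 2
        if it < max_iter / 3:                                 # phase 1: Eqs. (42), (52)
            step = r_b * (elite - r_b * prey)
            prey = sf * prey + cf * r * step
        elif it < 2 * max_iter / 3:                           # phase 2: Eqs. (44)-(47), (52)-(53)
            step_a = r_l[:half] * (elite[:half] - r_l[:half] * prey[:half])
            step_b = r_b[half:] * (r_b[half:] * elite[half:] - prey[half:])
            prey[:half] = sf * prey[:half] + cf * r[:half] * step_a
            prey[half:] = sf * elite[half:] + cf * r[half:] * step_b
        else:                                                 # phase 3: Eqs. (48)-(49)
            step = r_l * (r_l * elite - prey)
            prey = elite + cf * r * step
        # LOBL refinement and a simplified FADs perturbation (Equation (39) abbreviated)
        prey = select_best(prey, np.clip(lobl_opposition(prey, lower, upper, lam), lower, upper), objective)
        mask = rng.random((NP, d)) < fads
        prey = np.clip(prey + cf * mask * (lower + rng.random((NP, d)) * (upper - lower)), lower, upper)
    fit = np.array([objective(x) for x in prey])
    if fit.max() > best_f:
        best_f, best_x = fit.max(), prey[fit.argmax()].copy()
    return best_x, best_f
```

A call such as `mmpa(lambda x: -np.sum(x**2), lower=np.full(4, -5.0), upper=np.full(4, 5.0))` maximizes a toy objective; in the paper, the objective would be the comprehensive air combat situation evaluation function and the decision variables would be the UCAV maneuver control variables.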

3.4.3. MMPA Time Complexity Analysis

The time complexity of the MMPA can be obtained through mathematical expressions. Suppose the marine predator population size is $NP$, the search space dimension is $d$, and the cost of evaluating the objective function is $S$. In the original MPA, the complexity of updating the elite and prey matrices can be expressed as $O(NP(2d + S))$, and the complexity of updating the positions of the prey and the marine predators is $O(NP\log NP + 2d\log NP + S\log NP)$. Defining the number of iterations as $Max\_Iter$, the computational cost of the original MPA is $O(Max\_Iter(NP\log NP + 2d\log NP + S\log NP))$. For the MMPA proposed in this paper, the computational cost of the LOBL mechanism is denoted $C_L$, the inertia weight coefficient is $SF$, and the position iterative update parameters are $R_1$ and $R_2$. The additional cost introduced by the LOBL mechanism is $O(NP(2d + S + C_L))$, that of the new position iterative update rules is $O(NP\log NP + NP)$, and the inertia weight coefficient adds a cost of $O(NP \cdot d)$ at every iteration. Combining these terms, the total computational cost of the MMPA is $O(Max\_Iter(3NP\,d + NP\,S + NP\,C_L + NP\log NP + NP))$.
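As a purely illustrative numerical check of this expression (NP = 30, d = 4, and Max_Iter = 100 are assumed values, not the settings of Table 1, and S and C_L are treated as unit costs):

```latex
\begin{align*}
\mathcal{O}_{\mathrm{MMPA}} &\approx Max\_Iter\left(3\,NP\,d + NP\,S + NP\,C_L + NP\log NP + NP\right)\\
&= 100\left(3\cdot 30\cdot 4 + 30 + 30 + 30\ln 30 + 30\right)\\
&\approx 100\,(360 + 30 + 30 + 102 + 30) \approx 5.5\times 10^{4}\ \text{elementary operations per run.}
\end{align*}
```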

4. Simulation Results and Analysis

In order to evaluate the performance of the maneuver strategy solving method proposed in this paper, firstly, a parameter sensitivity analysis of the MMPA is carried out and algorithm performance comparison experiments are implemented; secondly, six types of initial scenarios are set up for the air combat simulations; and finally, the simulation results of the MMPA are statistically analyzed and discussed.

4.1. Comparative Analysis of Intelligent Optimization Algorithms

4.1.1. Comparative Experiments with the MMPA Parameters

In order to select the algorithm-related parameters, the effects of parameter variations on the performance of the MMPA are presented in Figure 11. The following parameters are considered: population size NP, position update parameter P, step-size control parameter CF, and inertia weight coefficient SF. Their values are enumerated in Table 1. Iter denotes the current iteration number of the MMPA, Max_Iter is the maximum number of iterations, and rand is a random number in [0, 1]. R1 and R2 are the values of the position update parameters.
It can be seen from Figure 11 that the performance of the algorithm is affected to some extent by the above four parameters. In Figure 11a, increasing the population size enriches the solution space, which helps to improve the performance of the algorithm; however, once the population size exceeds a threshold, further increases no longer improve the algorithm's ability to find optimal solutions. Figure 11b depicts the effect of different position update parameters on the performance of the algorithm. Introducing sine and cosine functions into the iterative updating equations changes the position update rules according to the amplitude characteristics of these functions, which improves the algorithm's global search ability and guides the search toward the region of the optimal solution space. As shown in Figure 11b, the most appropriate values of the position update parameters are R1 = sin(2π·rand) and R2 = cos(2π·rand), as in Equation (50). In Figure 11c, a larger step-size control parameter improves the global search for optimal solutions, while a smaller value promotes the local search. Therefore, to further improve the balance between exploration and exploitation, enhance the global search capability, and promote fast local convergence, the preferred step-size control parameter is CF = [cos((π/2)·Iter/Max_Iter)]^(2·Iter/Max_Iter), as in Equation (54). This parameter decreases slowly at the beginning of the iteration and rapidly in the late iteration, which increases the global search time and accelerates the convergence of the algorithm in the late iteration. In Figure 11d, the inertia weight coefficients are used to smooth the transition between exploration and exploitation and thereby improve the performance of the algorithm. With the inertia weight coefficients, varying the influence of the current population on the next generation allows the population to cluster around the optimal solutions found during the iterations. An appropriate value of the inertia weight coefficient is SF = [1 − sin((π/2)·Iter/Max_Iter)]^(e^((π/2)·Iter/Max_Iter)), as in Equation (51).
As can be seen from Figure 11, under the parameter settings shown in Table 1, the MMPA quickly finds the optimal solution, and the fitness value of the objective function reaches its peak. The rapid convergence of the MMPA has two main causes. On the one hand, the initial solutions are of high quality, so the population searches within the optimal solution space, which improves the efficiency of the search for the optimal solution; on the other hand, the adaptive adjustment of the iterative position update parameters allows the individuals both to search the whole solution space and to gather around the optimal solution, which promotes fast convergence of the algorithm.

4.1.2. Comparative Experiments with the Algorithm Performance

In order to demonstrate the effectiveness and superiority of the MMPA, comparative experiments are implemented between the proposed maneuver decision-making method and other existing methods. The comparison methods are particle swarm optimization (PSO) [55], differential evolution (DE) [56], the MPA [49], the logistic opposition-based learning marine predator algorithm (LOBL-MPA) [57,58,59], the self-adaptive rules marine predator algorithm (SAR-MPA) [60,61], and Harris hawks optimization (HHO) [62]. The initial parameter settings of these algorithms are shown in Table 2. c1 and c2 are the acceleration coefficients, and w is the inertia weight. F0 is the initial mutation coefficient, and F is the dynamic mutation coefficient during the iteration process. CR denotes the crossover probability. J represents the rabbit's random jump strength, E0 is the initial energy of the rabbit, and E denotes the dynamic energy of the prey as it escapes. LOBL is a mechanism for population initialization and individual selection that combines logistic chaotic mapping with opposition-based learning.
In Figure 12, the results of the comparative experiments are depicted, and Table 3 reports the time spent by the different algorithms in solving the objective function. From Figure 12, it can be seen that the MMPA obtains the maximum value of the fitness curve, which demonstrates the effectiveness and superiority of the proposed algorithm, whereas HHO, the original MPA, LOBL-MPA, and SAR-MPA perform worse than the MMPA. The original MPA, DE, and PSO are not well suited to solving the multidimensional search problem presented in this paper. To facilitate the assessment of algorithm performance, the number of convergence iterations, defined as the number of iterations required to discover the global optimal solution, is used as the evaluation criterion. In terms of time consumption, although the MMPA takes more time than the original MPA, it requires fewer convergence iterations. Moreover, the time taken by the MMPA to solve the optimal maneuver control variables of the UCAV is 0.085 s, which is much less than the total air combat simulation time.
From Figure 12 and Table 3, it can be seen that all seven algorithms require few convergence iterations and find the optimal solution quickly; the MMPA converges in two iterations and PSO in five. The small number of convergence iterations is mainly due to the variable weight strategy for the air combat superiority functions, introduced in Section 3.3, that is used in constructing the comprehensive air combat situation evaluation function. This strategy strongly guides the search for the optimal solution and promotes the rapid convergence of the algorithms.
Ablation experiments are carried out to analyze the improvements of the MMPA, with the MPA and SAR-MPA selected as comparison algorithms; the parameter settings are listed in Table 2. The trial maneuver decision method is chosen as the baseline algorithm: the aircraft first estimates the state of the target and then selects, from the maneuver library, the maneuver with the maximum fitness function value as its output. The UCAV controlled by the baseline algorithm represents the blue side, and the UCAV controlled by each of the three test algorithms represents the red side. The initial confrontation scenario is set up as follows: the heading angle of the red side is 0° and that of the blue side is 180°; both sides are in an oppositional position at an altitude of 6000 m and a velocity of 220 m/s, and the initial distance between them is 14.14 km. In this scenario, the MMPA, MPA, and SAR-MPA are each tested against the baseline algorithm 100 times. To analyze the improvements of the MMPA, the control accuracy and the victory ratio are counted. The control accuracy of one confrontation is the ratio of the accumulated flight time of the aircraft in the confrontation area to the total confrontation time; the values reported in Table 4 are the averages over the 100 confrontations of each method. The victory ratio is the ratio of the number of victories achieved by the missile hitting the target to the total number of confrontations. The statistical results are shown in Table 4.
The results in Table 4 show that the MMPA achieves the best control accuracy and victory ratio, SAR-MPA is second best on both indicators, and the MPA performs worst. This demonstrates that the improvement mechanisms introduced in the MMPA are effective and can improve both the control accuracy and the victory rate of the aircraft.
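Both indices in Table 4 reduce to simple ratios; the sketch below uses purely illustrative numbers (not the values reported in Table 4), and the function names are ours.

```python
def control_accuracy(time_in_area_s, confrontation_time_s):
    """Ratio of accumulated flight time inside the confrontation area to the
    total confrontation time of one engagement."""
    return time_in_area_s / confrontation_time_s

def victory_ratio(missile_hits, total_confrontations):
    """Ratio of victories achieved by a missile hit to the total number of
    confrontations (100 per method in the ablation experiments)."""
    return missile_hits / total_confrontations

# illustrative example: 38.5 s of 52.0 s spent in the area, 87 wins out of 100 runs
print(control_accuracy(38.5, 52.0), victory_ratio(87, 100))
```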

4.2. Air Combat Simulation

To test the effectiveness of the MMPA in the autonomous maneuver decision-making of UCAVs, five different initial scenarios are set up, and air combat simulation experiments are conducted for each. According to the initial positions of the two UCAVs, these initial confrontation scenarios are classified into five categories: neutral, offensive, oppositional, defensive, and parallel. In addition, to test the effectiveness of the fuzzy-inference-based variable weight strategy for the superiority functions in UCAV autonomous maneuver decision-making, a sixth, head-on scenario is set up based on the initial positions of the two UCAVs, and the corresponding air combat simulation experiments are conducted.
In the air combat confrontation scenarios, the red UCAV stands for the attacking aircraft, the blue UCAV denotes the target aircraft, and the green curves are the missile flight trajectories. The aircraft platforms of both sides and their mounted weapons have identical performance, and the attacking aircraft and target aircraft always know each other's position information. F-16 models are used as aircraft platforms, each carrying two close-range air-to-air missiles. The maximum airspeed of the aircraft platform is 360 m/s, and the minimum is 82 m/s. The missile attack distances are dynamically solved by the fire control systems of both UCAVs. After launch, a missile approaches the target using the proportional guidance method. The decision time is T_D = 1 s, and the simulation sampling time is T_s = 0.02 s. The parameter settings of the MMPA and its comparison methods (the MPA, DE, PSO, and HHO) are shown in Table 2.
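The timing structure of the simulation (a maneuver decision every T_D = 1 s on top of a T_s = 0.02 s integration step, with proportional guidance after launch) can be sketched as follows; the `decide`/`step` interfaces and the `pn_accel` helper are assumptions for illustration, not the paper's implementation.

```python
T_D, T_S = 1.0, 0.02                       # decision period and simulation step (s)
STEPS_PER_DECISION = int(T_D / T_S)        # a new maneuver command every 50 samples

def pn_accel(nav_gain, closing_speed, los_rate):
    """Classical proportional-navigation command a = N * Vc * d(lambda)/dt, used
    here only to indicate how a launched missile is steered toward the target."""
    return nav_gain * closing_speed * los_rate

def run_engagement(red, blue, max_time=120.0):
    """Outer loop: maneuver decisions at 1 Hz, vehicle dynamics at 50 Hz."""
    t, k = 0.0, 0
    while t < max_time:
        if k % STEPS_PER_DECISION == 0:    # solve the optimized maneuver control variables
            red.decide(blue)               # e.g., with the MMPA
            blue.decide(red)               # e.g., with a comparison algorithm
        red.step(T_S)                      # integrate 6-DOF dynamics for one sample
        blue.step(T_S)
        t, k = t + T_S, k + 1
```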
Case 1 is used to test the effect of a dominant initial situation on the autonomous maneuver decision-making of the UCAV. In Case 2, the MMPA and the MPA are used as the maneuver decision-making methods of the two sides, and the effectiveness of the modified algorithm is tested through air combat confrontation simulation. Cases 3, 4, and 5 are comparison experiments in which the red UCAV always uses the MMPA while the blue UCAV uses DE, PSO, and HHO, respectively. Case 6 is used to test the superiority of the fuzzy-inference-based variable weight strategy for the superiority functions over a fixed weight strategy. In Case 6, both sides employ the MMPA for maneuver decision-making; when solving the optimized maneuver control variables, the red UCAV calculates the fitness values with the variable weight strategy, while the blue UCAV uses the fixed weight strategy. The detailed parameter settings for the six initial confrontation scenarios are listed in Table 5. In this section, typical adversarial simulation trajectories for the six types of initial scenarios are described, and the results are preliminarily analyzed; the repeated experiments and a discussion of the results are presented in Section 4.3.
Case 1: Neutral scenario
The initial heading angles of the red and blue sides are 0° and 90°, respectively. Both sides are in orthogonal positions and have the same initial altitude and flight velocity. At the initial moment, both UCAVs use the MMPA to solve the optimized maneuver control variables. The blue side presses the slope to hover to the left to disrupt the red side's stable tracking conditions, so that the red side cannot satisfy the missile launch conditions. At 2 s, the red presses the slope to turn left and, by controlling the longitudinal overload, achieves faster acceleration. At 8 s, the red's velocity is greater than the blue's, and by generating a greater normal overload, the red obtains greater pitch angle rates and turns faster than the blue. From 10 s to 15 s, the red attempts to establish stable tracking of the blue and to position itself directly behind the blue by turning at a greater angular rate than the blue, so as to satisfy the missile launch conditions and then shoot down the blue side. At 16 s, the blue attempts to escape the passive situation of being tracked by the red by increasing its velocity and changing its head pointing. At 17 s, as a result of turning at a higher rate, the blue's normal overload reaches 6 g. At 18 s, the blue is within the red's missile attack zone, and the red launches a missile. At 20 s, the relative distance between the missile and the blue side is less than 100 m, and the target is judged to have been successfully hit by the missile. Finally, the red side wins the air combat, and the simulation is terminated.
As can be seen from Figure 13b, the values of the roll angular rates are larger than those of the pitch and yaw angular rates. The change curves of the velocity, attack angle, and sideslip angle are shown in Figure 13c. At the beginning of the confrontation, the red side starts to increase its velocity and maintains a faster acceleration. At 6 s, the increase rates of the red side’s velocity slow down. At 12 s, the blue side achieves a rapid increase in velocity by increasing the longitudinal overload, and at 14 s, the magnitude of its velocity exceeds that of the red side. In air combat confrontation, the increase in flight velocity indicates the rise in air combat energy. The values of the sideslip angles are small, and both sides keep the values of the sideslip angles fluctuating around zero. The change curves of the overload are shown in Figure 13d. The overload is a key parameter to determine the maneuver performance of UCAVs. Due to the intensity of the close air combat confrontation, both sides may reach large values of overload to change the head pointing faster and maintain a large flight velocity. Taken together, the above analysis shows that the MMPA is effective in the autonomous maneuver decision-making of the UCAV, and due to the initial angle advantage of the red side, the red side wins the air combat finally.
Case 2: Offensive scenario
Both the red and blue sides have an initial heading angle of 0°. This means that both sides are flying in the same direction at the starting point, and the red is in an attacking position. Both sides have the same initial altitude and flight velocity. The red UCAV uses the MMPA to solve the optimized maneuver control variables, and the blue UCAV employs the MPA. At the initial moment, the red UCAV is at the right rear of the blue UCAV.
At 2 s, the red swings to the left to stabilize the tracking conditions, while the blue swings to the right and lowers the altitude to disrupt the red’s stabilizing tracking conditions. In response to the blue’s circling maneuver to the lower right, at 4 s, the red flattens out. From 5 s, the red presses the slope to the right and lowers the altitude to achieve stable tracking conditions. At 10 s, the red achieves a faster turn with larger pitch angle rates. During 11 to 32 s, both sides’ UCAV velocity and normal overload change patterns are similar. However, the red’s pitch angle rates are always larger than the blue’s, so the red follows the blue closely in the circling maneuver, and the red completes the head-turning more quickly. At 34 s, the blue increases the longitudinal overload to gain greater velocity. However, due to the unstable flight state of the blue at that moment, its value of attack angle fluctuates around zero, and the blue’s velocity does not increase but rather decreases. At 35 s, the red completes the turning maneuver with greater turning angle rates than the blue. At 36 s, the blue is within the red’s missile attack zone, and the red launches a missile. At 40 s, the relative distance between the missile and the blue side is less than 100 m, and it is judged that the target has been successfully hit by the missile. At last, the red side wins the air combat, and the simulation is terminated.
The change curves of the roll, pitch, and yaw angular rates based on the body-fixed coordinate frame are shown in Figure 14b. As can be seen from Figure 14c, at the beginning of the confrontation, the blue side first starts to increase the velocity. At 8 s, the blue side’s velocity reaches a peak, and then the blue side’s velocity decreases for 9 s to 15 s due to the circling maneuver, which is accompanied by energy loss. At 8 s, the red starts to increase its velocity. At 14 s, the red reaches a peak velocity and then decreases its velocity due to maneuvers. When both UCAVs maneuver to reduce their altitude, their velocity increases. The values of the sideslip angles are small. The overload variation curves are shown in Figure 14d. From 1 s to 30 s, the overload change patterns of both UCAVs are similar because they are performing similar circling maneuvers. At 34 s, the change rules of the overload curves of the blue side are not consistent with those of the red side because the blue side tries to increase the longitudinal overload to gain greater velocity as well as to achieve the purpose of avoiding being attacked by the red side. In summary, in the offensive initial scenario, the MMPA assists the red UCAV with high accuracy to obtain optimized maneuver control variables continuously, and the red side finally succeeds in occupying a favorable attack position and wins the air combat.
Case 3: Oppositional scenario
At the initial moment of the air combat confrontation, the heading angle of the red UCAV is 0°, and the blue UCAV is 180°. Both sides in the confrontation are in opposing positions, and they have the same initial altitude and flight velocity. The red UCAV uses the MMPA to solve the optimized maneuver control variables, and the blue UCAV employs the DE algorithm. The initial straight-line distance between the two sides is 14.14 km. At 2 s, both sides hover to the left with a pressure slope to point their heads toward each other and search for each other as soon as possible. During 5 s to 10 s, both UCAVs perform the climb accompanied by velocity and energy reduction. During 11 s to 18 s, the blue gains acceleration by increasing the longitudinal overload. However, in 11 s to 15 s, the red gains greater pitch angle rates by increasing the normal overload, which helps the red to complete the head-turning faster. In 17 s to 20 s, the blue increases the normal overload to achieve a faster turn. At 25 s, both UCAVs increase their velocity by increasing the longitudinal overload and achieve a faster turn by increasing the normal overload. At this point, both UCAVs have similar longitudinal overload change curves, but the change rates of the red’s normal overload are greater than those of the blue. The red side is trying to achieve a faster turn by increasing the longitudinal overload. Aiming at achieving a faster head pointing to the blue, the red has a normal overload value greater than 3 g at 28 s. From 25 s to 30 s, the red completes the circling maneuver faster than the blue due to the greater value of the red’s pitch angle rates compared to the blue. The red gains less energy loss; therefore, it has a greater velocity. At 28 s, the red’s velocity is 267 m/s. At 29 s, the red has maintained stable tracking of the blue, and the blue is within the red’s missile attack zone. Furthermore, the red has an energy advantage at this time, so the red launches a missile to attack the blue. At 31 s, the relative distance between the missile and the blue side is less than 100 m, and it is judged that the missile has successfully hit the target. At last, the red side wins the air combat, and the simulation is terminated.
As can be seen from Figure 15b, the values of the roll and yaw angular rates fluctuate around zero. The change curves of the velocity, attack angle, and sideslip angle are shown in Figure 15c. In the first half of the confrontation, the velocity change curves of the two UCAVs are similar; in the second half, the acceleration of the red side is larger than that of the blue side, so after the circling maneuver the red side has a greater velocity. The sideslip angles of both sides keep fluctuating around zero. The overload variation curves are shown in Figure 15d. The normal overload determines the turning performance of the UCAVs, and from 23 s to 31 s, the continuous circling maneuvers keep the normal overload values of both UCAVs above 2 g. At 28 s, to complete the position occupation before the blue side, the red side reaches a normal overload of more than 3 g. The results of this case show that the MMPA outperforms the DE algorithm in the autonomous maneuver decision-making of UCAVs; compared with the DE algorithm, the MMPA has better flexibility, and the prediction of the target's information is effective.
Case 4: Defensive scenario
Both the red and the blue have an initial heading angle of 180°. This means that both UCAVs fly in similar directions, and the blue has an initial positional advantage. The red UCAV uses the MMPA, and the blue UCAV uses the PSO algorithm to solve the optimized maneuver control variables. To attack the red side, the blue side reduces its altitude. To avoid being attacked by the blue, the red increases its altitude, converting the kinetic energy of the initial moment into potential energy. Between 1 s and 10 s, the velocity of the blue increases steadily, while the red’s velocity decreases steadily. Between 11 s and 20 s, by applying a greater normal overload, the blue makes a downward turn at a greater angular rate than the red, while the red makes a downward turn at a radius smaller than the blue. At 22 s, when the red discovers that the blue has changed out of the downward turn, by increasing its normal overload and implementing a greater pitch angle rate, the red quickly changes out of the downward turn as well. At 22 s, the normal overload of the red is greater than 5 g. To reduce the energy loss due to the fast maneuver, the red increases its longitudinal overload while performing the change out of the downward maneuver to keep its velocity stable. Between 25 s and 35 s, the blue climbs in altitude, then dives and maintains a pressure slope to turn left in an attempt to track the red. In 25 s to 35 s, the red performs a climb followed by a downward dive to destabilize the blue’s tracking and prevent it from meeting the missile launch conditions. At 36 s, the blue continues to hover to the left, and the red follows suit with a spin to the left. The red completes the maneuver faster because it has a smaller radius of rotation. At 48 s, the blue switches to level flight and then climbs to the right by pressing the slope. At 47 s, the red changes out of the circling maneuver and then climbs up to the left by pressing the slope. At 61 s, the blue maneuvers to the apex and attempts to continue to complete the second half of the maneuver to increase velocity and gain energy advantage. At 59 s, the red switches to level flight, followed by pressing the slope to the right, then transferring to a diagonal somersault maneuver. At 65 s, the blue is within the red’s missile attack zone, and the red launches a missile to attack the blue. At 69 s, the relative distance between the missile and the blue side is less than 100 m; it is judged that the missile has successfully hit the target. At last, the red side wins the air combat, and the simulation is terminated.
The change curves of the roll, pitch, and yaw angular rates based on the body-fixed coordinate frame are shown in Figure 16b. The change curves of the velocity, attack angle, and sideslip angle are shown in Figure 16c. The air combat confrontation is intense, and the UCAVs on both sides have large velocity variations. At 62 s, due to implementing the climb maneuver, the velocity of the blue side is less than 100 m/s but greater than the minimum 82 m/s. At 22 s, due to executing the fast dive maneuver, the maximum velocity of the red side is more than 340 m/s. The values of both sides’ sideslip angles keep fluctuating around zero for most of the time. At 65 s, the blue side’s sideslip angle fluctuates due to lateral overload. The overload change curves are shown in Figure 16d. The close air combat confrontation is intense, and both sides have a fast rhythm shift between attack and defense. To achieve the head of both sides rapidly pointing to the target aircraft, the normal overload values are sometimes greater than 5 g. The air combat simulation results of this case show that the MMPA is more effective than the PSO algorithm in solving the optimized maneuver control variables of the UCAVs. Although the PSO algorithm increases the diversity of the selection of the UCAV maneuver control variables by setting the mutation probability as well as the crossover probability, the MMPA has a greater advantage in selecting the optimized maneuver control variables. Therefore, the red side UCAV has a greater probability of winning the air combat confrontation.
Case 5: Parallel scenario
Both the initial heading angles for red and blue are 0°. Both sides are in parallel positions and have the same initial altitude and flight velocity. The red uses the MMPA to solve the optimized maneuver control variables, and the blue uses HHO. At the initial moment, the red presses the slope to hover to the left to search for the blue. At the same time, the blue presses the slope to hover to the right to search for the red. At 7 s, the blue performs a climb maneuver. At 14 s, the blue climbs to the top, then flattens the plane and executes a downward dive. During the first half of the confrontation, the red side maintains a hover maneuver. At 15 s, the red side climbs up. After gaining a velocity advantage by diving, the blue side performs a right circle maneuver at 23 s to change the head pointing and attempt to steadily track the red side. At 23 s, the red reaches its peak, followed by a dive maneuver. The red finds the blue during the dive and meets the missile attack conditions. The velocity of the blue is reduced due to the high overload turning maneuver. In order to change the head pointing faster, the maximum normal overload of the blue is 3.6 g. At 26 s, the red side launches a missile. At 29 s, the relative distance between the missile and the blue is less than 100 m, and it is judged that the missile has successfully hit the target. The red side wins the air combat, and the simulation is terminated.
The change curves of the roll, pitch, and yaw angular rates based on the body-fixed coordinate frame are shown in Figure 17b. As can be seen from Figure 17c, at the beginning of the confrontation, both sides hover steadily, with small fluctuations in velocity. At 7 s, the velocity of the blue decreases because of the climb maneuver. At 23 s, the velocity of the red reaches its minimum due to the conversion of kinetic energy into potential energy. The sideslip angles of both sides keep fluctuating around zero. The change curves of the overload are shown in Figure 17d. Based on the above analysis, compared with HHO, the MMPA has advantages in terms of the UCAV's autonomous maneuver decision-making.
Case 6: Head-on scenario
The initial heading angle of the red side is 0°, and the blue side is 180°. Both sides are in a head-on position with the same initial altitude and flight velocity, and the initial distance of both UCAVs is 14.14 km. The maneuver decision-making methods of both UCAVs are the MMPA, and the red UCAV adopts a superiority function variable weight strategy based on fuzzy inference to calculate the fitness values, while the blue UCAV adopts a superiority function fixed weight strategy. According to the results of the fitness values, the maneuver control variables that make the fitness values the largest are selected as the output of the algorithm and used as the maneuver control variables of the corresponding UCAVs at the next moment. From 1 s to 6 s, both sides hover to the left by pressing the slope to search and find each other as soon as possible. At 7 s, the blue reduces the slope and switches to an upward climb accompanied by a small reduction in its velocity. At 7 s, the red also begins to climb. At 10 s, the red climbs to the top and then dives. The change curves of the velocity of both sides are similar for the first 9 s. At 13 s, the red climbs to the apex, followed by a dive. At 16 s, the blue ends the dive, and its velocity reaches a peak at that moment, which has a value of 250 m/s, followed by turning left through pressing the slope. At 17 s, the red exits the dive, and its velocity also reaches a peak at that moment, then maintains level flight. During the dive of the UCAVs on both sides, the red exits the dive with greater velocity than the blue due to the red implementing greater longitudinal overload values than the blue. At 17 s, the blue performs a turn with smaller normal overload values than the red. The turning maneuver causes a loss of kinetic energy for the blue, and its velocity reaches a minimum at 28 s, which is less than 150 m/s at that moment. At 19 s, the red is circling to the left by pressing the slope. At this time, the red completes the head-turning faster than the blue by applying a larger normal overload during the circling maneuver, with larger pitch angle rates and flight velocity. When the red completes the turn in the vertical plane, at 30 s, the red begins to circle horizontally to the left to search for the blue. At 29 s, the blue increases the normal overload to achieve a quick change out of the maneuver in the vertical plane. At 36 s, the red points its head toward the blue; the blue is within the red’s missile attack zone, and the red launches a missile to attack the blue. At 38 s, the relative distance between the missile and the blue side is less than 100 m, and it is judged that the missile has successfully hit the target. At last, the red side wins the air combat, and the simulation is terminated.
The change curves of the roll, pitch, and yaw angular rates based on the body-fixed coordinate frame are shown in Figure 18b. As can be seen from Figure 18c, at the beginning of the confrontation, the change curves of the velocity of both UCAVs are similar. During the subsequent maneuvers, the change curves of the velocity of both UCAVs are different because the change curves of their overload values as well as the pitch and yaw angle rate values are not the same. At 30 s, the velocity of the red side remains steady with a small increase. Between 30 s and 35 s, although the velocity of the blue has a large increase, its velocity is smaller than that of the red. At this point, the red has a velocity advantage, which creates the conditions for the red to quickly complete the occupation attack of the missile. The values of the sideslip angles are small. The overload change curves are shown in Figure 18d. The change curves of the lateral overload on both sides are similar. From 11 s to 20 s, the longitudinal overload values of the red side are larger than that of the blue side, together with both sides performing similar maneuvers, so the velocity of the red side is larger than that of the blue side. Between 19 s and 35 s, the values of the normal overload of the blue side are smaller than the red, which results in the blue being slower than the red to complete the circling maneuver in the vertical plane. The air combat confrontation results of this case show that the superiority function variable weight strategy based on fuzzy inference helps to guide the red UCAV to choose superior maneuver control variables. Combining the MMPA method with the above strategy can help the red UCAV win the air combat.

4.3. Quantitative Results of the MMPA and Discussion

In order to test the stability of the maneuver decision-making method proposed in this paper, simulation experiments are carried out for the six types of air combat scenarios set up in Section 4.2. If a missile hits the other side, the attacker is judged to win the confrontation; if both sides have launched their missiles without shooting each other down, the confrontation is considered a draw. For each scenario, 20 adversarial experiments are repeated, and the statistical results of the winning rate are shown in Table 6.

As can be seen from these statistics, the MMPA achieves a win rate of over 90% against the MPA, DE, PSO, and HHO, and a win rate of 100% when starting from a dominant initial situation. Combining the air combat superiority function variable weight strategy with the MMPA yields a win rate of 95%. The confrontation statistics show that the MMPA has advantages over the other comparison methods considered in this paper and that the superiority function variable weight strategy is favorable for improving the confrontation win rate of the UCAV.
In order to explore whether there are significant differences between the optimal solutions obtained by the MMPA and those obtained by the other intelligent optimization algorithms, a Wilcoxon rank-sum test at the 5% significance level is conducted [63]. Table 7 shows the obtained p-values.

The p-values in Table 7 indicate that the optimal solutions found by the MMPA are significantly better than those obtained by the other algorithms in all six cases.
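The significance test itself is straightforward to reproduce; the snippet below applies SciPy's rank-sum test to two placeholder samples (the arrays are synthetic and are not the data behind Table 7).

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(1)
# placeholder best-fitness samples from repeated runs of two algorithms
mmpa_best = rng.normal(0.92, 0.01, size=20)
pso_best = rng.normal(0.85, 0.02, size=20)

stat, p_value = ranksums(mmpa_best, pso_best)
print(f"p = {p_value:.3e}, significant at the 5% level: {p_value < 0.05}")
```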

5. Conclusions

(1) In this paper, a method based on the combination of the modified MPA and fuzzy inference is proposed to solve the UCAV autonomous maneuver decision-making problem. Considering 6-DOF UCAVs in close-range air combat confrontation scenarios, the UCAV maneuver decision-making problem is transformed into an optimization problem. In solving the optimized maneuver control variables of the UCAVs, the MMPA shows better search capability and higher accuracy, and the superiority function variable weight strategy based on fuzzy inference helps guide the UCAVs to select superior maneuver control variables. The air combat confrontation simulation results show that the proposed maneuver decision-making method increases the probability of winning the air combat in the neutral, offensive, oppositional, defensive, parallel, and head-on initial scenarios, and that the method is well suited to autonomous maneuver decision-making problems in dynamic air combat environments.
(2) Our future work will focus on the following aspects. Firstly, we will port the developed autonomous maneuver decision-making methods to a real aircraft platform and test their effectiveness on that platform. Secondly, we will incorporate random disturbances, such as the loss of target information and the introduction of false targets, into the air combat confrontation scenarios to improve the robustness of the autonomous maneuver decision-making methods.

Author Contributions

Methodology, Y.L. (Yuequn Luo) and D.D.; software, Y.L. (Yuequn Luo), M.T. and N.L.; original draft preparation, Y.L. (Yuequn Luo) and M.T.; writing—review and editing, Y.L. (Yidong Liu), H.Z. and F.W. All authors have read and agreed to the published version of the manuscript.

Funding

This study was partly supported by the National Natural Science Foundation of China (No. 62101590).

Data Availability Statement

The data that support the findings of this study are available on request from the corresponding author [Yuequn Luo].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Choi, J.; Seo, M.; Shin, H.S.; Oh, H. Adversarial swarm defence using multiple fixed-wing unmanned aerial vehicles. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 5204–5219.
  2. Yang, Z.; Yuan, Z.; Wang, X.; Huang, J.; Zhou, D. Autonomous control of UAV trajectory based on RHC-Radau method in complex penetration combat environment. Aerosp. Sci. Technol. 2024, 146, 108915.
  3. Taylor, L.W. Application of the epsilon technique to a realistic optimal pursuit-evasion problem. J. Optim. Theory Appl. 1975, 15, 685–702.
  4. Anderson, G. A real-time closed-loop solution method for a class of nonlinear differential games. IEEE Trans. Autom. Control 1972, 17, 576–577.
  5. Nakagawa, S.; Yamasaki, T.; Takano, H.; Yamaguchi, I. Timing determination algorithm for aircraft evasive maneuver against unknown missile acceleration. In Proceedings of the AIAA Scitech 2019 Forum, San Diego, CA, USA, 7–11 January 2019.
  6. Yang, Z.; Zhou, D.; Kong, W.; Piao, H.; Zhang, K.; Zhao, Y. Nondominated maneuver strategy set with tactical requirements for a fighter against missiles in a dogfight. IEEE Access 2020, 8, 117298–117312.
  7. Dong, Y.; Ai, J.; Liu, J. Guidance and control for own aircraft in the autonomous air combat: A historical review and future prospects. Proc. Inst. Mech. Eng. Part G J. Aerosp. Eng. 2019, 233, 5943–5991.
  8. Smith, R.; Dike, B.; Mehra, R.; Ravichandran, B.; El-Fallah, A. Classifier systems in combat: Two-sided learning of maneuvers for advanced fighter aircraft. Comput. Methods Appl. Mech. Eng. 2000, 186, 421–437.
  9. Duan, H.; Li, P.; Yu, Y. A predator-prey particle swarm optimization approach to multiple UCAV air combat modeled by dynamic game theory. IEEE/CAA J. Autom. Sin. 2015, 2, 11–18.
  10. Duan, H.; Lei, Y.; Xia, J.; Deng, Y.; Shi, Y. Autonomous maneuver decision for unmanned aerial vehicle via improved pigeon-inspired optimization. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 3156–3170.
  11. Vinyals, O.; Babuschkin, I.; Czarnecki, W.M.; Mathieu, M.; Dudzik, A.; Chung, J.; Choi, D.H.; Powell, R.; Ewalds, T.; Georgiev, P.; et al. Grandmaster level in starcraft ii using multi-agent reinforcement learning. Nature 2019, 575, 350–354.
  12. Li, X.; Huang, H.; Zhao, H.; Wang, Y.; Hu, M. Learning a convolutional neural network for propagation-based stereo image segmentation. Vis. Comput. 2020, 36, 39–52.
  13. Zhang, X.; Wang, T.; Luo, W.; Huang, P. Multi-level fusion and attention-guided cnn for image dehazing. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 4162–4173.
  14. Zhu, B.; Bedeer, E.; Nguyen, H.H.; Barton, R.; Henry, J. UAV trajectory planning in wireless sensor networks for energy consumption minimization by deep reinforcement learning. IEEE Trans. Veh. Technol. 2021, 70, 9540–9554.
  15. Wang, L.; Wang, K.; Pan, C.; Xu, W.; Aslam, N.; Hanzo, L. Multi-agent deep reinforcement learning-based trajectory planning for multi-UAV assisted mobile edge computing. IEEE Trans. Cogn. Commun. Netw. 2020, 7, 73–84.
  16. Li, B.; Wu, Y. Path planning for UAV ground target tracking via deep reinforcement learning. IEEE Access 2020, 8, 29064–29074.
  17. Li, Y.; Han, W.; Wang, Y. Deep reinforcement learning with application to air confrontation intelligent decision-making of manned/unmanned aerial vehicle cooperative system. IEEE Access 2020, 8, 67887–67898.
  18. Wu, L.; Wang, C.; Zhang, P.; Wei, C. Deep reinforcement learning with corrective feedback for autonomous UAV landing on a mobile platform. Drones 2022, 6, 238.
  19. Xu, D.; Guo, Y.; Yu, Z.; Wang, Z.; Lan, R.; Zhao, R.; Xie, X.; Long, H. PPO-Exp: Keeping fixed-wing UAV formation with deep reinforcement learning. Drones 2023, 7, 28.
  20. Zhao, Y.; Chen, Y.; Zhen, Z.; Jiang, J. Multi-weapon multi-target assignment based on hybrid genetic algorithm in uncertain environment. Int. J. Adv. Robot. Syst. 2020, 17, 1729881420905922.
  21. Guo, T.; Jiang, N.; Li, B.; Zhu, X.; Wang, Y.; Du, W. UAV navigation in high dynamic environments: A deep reinforcement learning approach. Chin. J. Aeronaut. 2021, 34, 479–489.
  22. Hu, J.; Wang, L.; Hu, T.; Guo, C.; Wang, Y. Autonomous maneuver decision making of dual-UAV cooperative air combat based on deep reinforcement learning. Electronics 2022, 11, 467.
  23. Xianyong, J.; Hou, M.; Wu, G.; Ma, Z.; Tao, Z. Research on maneuvering decision algorithm based on improved deep deterministic policy gradient. IEEE Access 2022, 10, 92426–92445.
  24. Pope, A.P.; Ide, J.S.; Mićović, D.; Diaz, H.; Twedt, J.C.; Alcedo, K.; Javorsek, D. Hierarchical reinforcement learning for air combat at Darpa's alphadogfight trials. IEEE Trans. Artif. Intell. 2022, 4, 1371–1385.
  25. Wang, X.; Wang, Y.; Su, X.; Wang, L.; Lu, C.; Peng, H.; Liu, J. Deep reinforcement learning-based air combat maneuver decision-making: Literature review, implementation tutorial and future direction. Artif. Intell. Rev. 2024, 57, 1.
  26. Piao, H.; Han, Y.; Chen, H.; Peng, X.; Fan, S.; Sun, Y.; Liang, C.; Liu, Z.; Sun, Z.; Zhou, D. Complex relationship graph abstraction for autonomous air combat collaboration: A learning and expert knowledge hybrid approach. Expert Syst. Appl. 2023, 215, 119285.
  27. Hou, Y.; Liang, X.; Lv, M.; Yang, Q.; Li, Y. Subtask-masked curriculum learning for reinforcement learning with application to UAV maneuver decision-making. Eng. Appl. Artif. Intell. 2023, 125, 106703.
  28. Bae, J.H.; Jung, H.; Kim, S.; Kim, S.; Kim, Y.-D. Deep reinforcement learning-based air-to-air combat maneuver generation in a realistic environment. IEEE Access 2023, 11, 26427–26440.
  29. Wu, A.; Yang, R.; Liang, X.; Zhang, J.; Qi, D.; Wang, N. Visual range maneuver decision of unmanned combat aerial vehicle based on fuzzy reasoning. Int. J. Fuzzy Syst. 2022, 24, 519–536.
  30. Sun, Z.; Piao, H.; Yang, Z.; Zhao, Y.; Zhan, G.; Zhou, D.; Meng, G.; Chen, H.; Chen, X.; Qu, B.; et al. Multi-agent hierarchical policy gradient for air combat tactics emergence via self-play. Eng. Appl. Artif. Intell. 2021, 98, 104112.
  31. Zuo, J.L.; Yang, R.N.; Zhang, Y.; Li, Z.; Wu, M. Intelligent decision-making in air combat maneuvering based on heuristic reinforcement learning. Acta Aeronaut. Astronaut. Sin. 2017, 38, 323571.
  32. Sun, Y.; Wang, X.; Wang, T.; Gao, P. Modeling of air-to-air missile dynamic attack zone based on Bayesian networks. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; pp. 5596–5601.
  33. Ernest, N.; Carroll, D.; Schumacher, C.; Clark, M.; Cohen, K.; Lee, G. Genetic fuzzy based artificial intelligence for unmanned combat aerial vehicle control in simulated air combat missions. J. Def. Manag. 2016, 6, 144.
  34. Hou, Y.; Liang, X.; Zhang, J.; Lv, M.; Yang, A. Hierarchical decision-making framework for multiple UCAVs autonomous confrontation. IEEE Trans. Veh. Technol. 2023, 72, 13953–13968.
  35. Chen, C.; Mo, L.; Lv, M.; Lin, D.; Song, T.; Cao, J. Enhanced missile hit probability actor-critic algorithm for autonomous decision-making in air-to-air confrontation. Aerosp. Sci. Technol. 2024, 151, 109285.
  36. Al-Betar, M.A.; Awadallah, M.A.; Makhadmeh, S.N.; Alyasseri, Z.A.A.; Al-Naymat, G.; Mirjalili, S. Marine predators algorithm: A review. Arch. Comput. Methods Eng. 2023, 30, 3405–3435.
  37. Kumar, S.; Yildiz, B.S.; Mehta, P.; Panagant, N.; Sait, S.M.; Mirjalili, S.; Yildiz, A.R. Chaotic marine predators algorithm for global optimization of real-world engineering problems. Knowl. Based Syst. 2023, 261, 110192.
  38. Zhong, K.; Luo, Q.; Zhou, Y.; Jiang, M. TLMPA: Teaching-learning-based Marine Predators algorithm. Aims Math 2021, 6, 1395–1442.
  39. Zhang, H.; Wei, Y.; Zhou, H.; Huang, C. Maneuver decision-making for autonomous air combat based on FRE-PPO. Appl. Sci. 2022, 12, 10230.
  40. Duan, H.; Huo, M.; Yang, Z.; Shi, Y.; Luo, Q. Predator-prey pigeon-inspired optimization for UAV ALS longitudinal parameters tuning. IEEE Trans. Aerosp. Electron. Syst. 2019, 55, 2347–2358.
  41. Ruan, W.; Duan, H.; Deng, Y. Autonomous Maneuver Decisions via Transfer Learning Pigeon Inspired Optimization for UCAVs in Dogfight Engagements. IEEE/CAA J. Autom. Sin. 2022, 9, 1639–1657.
  42. Wang, M.; Wang, L.; Yue, T.; Liu, H. Influence of unmanned combat aerial vehicle agility on short-range aerial combat effectiveness. Aerosp. Sci. Technol. 2020, 96, 105534.
  43. Snell, S.A.; Enns, D.F.; Garrard, W.L., Jr. Nonlinear inversion flight control for a supermaneuverable aircraft. J. Guid. Control Dyn. 1992, 15, 976–984.
  44. Snell, S.; Garrard, W.; Enns, D. Nonlinear control of a supermaneuverable aircraft. In Proceedings of the AIAA Guidance, Navigation and Control Conference, Boston, MA, USA, 14–16 August 1989.
  45. Hanghang, Y.O.U.; Qisong, H.A.N.; Minjian, Y.U.; Huiming, J.I.; Zelong, Y.E. A method to solve the unreachable zone of mid-range air-to-air missile. In Proceedings of the 2019 IEEE 2nd International Conference on Electronic Information and Communication Technology (ICEICT), Harbin, China, 20–22 January 2019; pp. 649–654.
  46. Huang, C.; Dong, K.; Huang, H.; Tang, S. Autonomous air combat maneuver decision using Bayesian inference and moving horizon optimization. J. Syst. Eng. Electron. 2018, 29, 86–97.
  47. Wu, A.; Yang, R.; Liang, X.L.; Zhang, J.Q. Maneuver decision on visual range air combats of unmanned combat aerial vehicles based on fuzzy inference. J. Nanjing Univ. Aeronaut. Astronaut. 2021, 53, 898–908.
  48. Pusdekar, R.M.; Bawaskar, A.B. VLSI Architecture of Centre of Gravity Based Defuzzifier Unit. Int. J. Eng. Innov. Technol. 2015, 4, 9001.
  49. Faramarzi, A.; Heidarinejad, M.; Mirjalili, S.; Gandomi, A.H. Marine predators algorithm: A nature-inspired metaheuristic. Expert Syst. Appl. 2020, 152, 113377.
  50. Fan, Q.; Huang, H.; Chen, Q.; Yao, L.; Yang, K.; Huang, D. A modified self-adaptive marine predators algorithm: Framework and engineering applications. Eng. Comput. 2022, 38, 3269–3294.
  51. Tizhoosh, H.R. Opposition-Based Learning: A New Scheme for Machine Intelligence. In Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC'06), Vienna, Austria, 28–30 November 2005; pp. 695–701.
  52. Zhenyu, G.; Bo, C.; Min, Y.; Binggang, C. Self-Adaptive Chaos Differential Evolution; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4221, pp. 972–975.
  53. Mirjalili, S.; Gandomi, A.H. Chaotic gravitational constants for the gravitational search algorithm. Appl. Soft Comput. 2017, 53, 407–419.
  54. Emary, E.; Zawbaa, H.M.; Sharawi, M. Impact of Lèvy flight on modern meta-heuristic optimizers. Appl. Soft Comput. 2019, 75, 775–789.
  55. Wang, D.; Tan, D.; Liu, L. Particle swarm optimization algorithm: An overview. Soft Comput. 2018, 22, 387–408.
  56. Tan, M.; Tang, A.; Ding, D.; Xie, L.; Huang, C. Autonomous air combat maneuvering decision method of UCAV based on LSHADE-TSO-MPC under enemy trajectory prediction. Electronics 2022, 11, 3383.
  57. Balakrishnan, K.; Dhanalakshmi, R.; Mahadeo Khaire, U. Analysing stable feature selection through an augmented marine predator algorithm based on opposition-based learning. Expert Syst. 2022, 39, e12816.
  58. Balakrishnan, K.; Dhanalakshmi, R.; Mahadeo Khaire, U. Excogitating marine predators algorithm based on random opposition-based learning for feature selection. Concurr. Comput. Pract. Exp. 2022, 34, e6630.
  59. Alrasheedi, A.F.; Alnowibet, K.A.; Saxena, A.; Sallam, K.M.; Mohamed, A.W. Chaos Embed Marine Predator (CMPA) Algorithm for Feature Selection. Mathematics 2022, 10, 1411.
  60. Yu, G.; Meng, Z.; Ma, H.; Liu, L. An adaptive marine predators algorithm for optimizing a hybrid PV/DG/battery system for a remote area in China. Energy Rep. 2021, 7, 398–412.
  61. Ramezani, M.; Bahmanyar, D.; Razmjooy, N. A new improved model of marine predator algorithm for optimization problems. Arab. J. Sci. Eng. 2021, 46, 8803–8826.
  62. Heidari, A.A.; Mirjalili, S.; Faris, H.; Aljarah, I.; Mafarja, M.; Chen, H. Harris hawks optimization: Algorithm and applications. Future Gener. Comput. Syst. 2019, 97, 849–872.
  63. Derrac, J.; García, S.; Molina, D.; Herrera, F. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput. 2011, 1, 3–18.
Figure 1. UCAV maneuver controller workflow.
Figure 2. Spatial off-axis angles of the attacking and target aircraft.
Figure 3. Calculation flowchart for the missile attack zone.
Figure 4. Missile attack zone. (a) Missile attack in the rear hemisphere of the target aircraft. (b) Missile attack in the forward hemisphere of the target aircraft.
Figure 5. Missile attack zone. (a) Missile attack in the rear hemisphere of the target aircraft. (b) Missile attack in the forward hemisphere of the target aircraft.
Figure 6. Missile attack zone. (a) Missile attack in the rear hemisphere of the target aircraft. (b) Missile attack in the forward hemisphere of the target aircraft.
Figure 7. Framework of the autonomous maneuver decision-making system.
Figure 8. Flow of the MMPA for solving the optimized maneuver control variables.
Figure 9. Fuzzy rule tree. (a) Rules related to big angle superiority. (b) Rules related to relatively big angle superiority. (c) Rules related to relatively small angle superiority. (d) Rules related to small angle superiority.
Figure 10. Comparison of the change curves of the step-size control parameters.
Figure 11. Effect of the parameters on the performance of the MMPA. (a) Case 1: population number. (b) Case 2: position update parameters. (c) Case 3: step-size control parameters. (d) Case 4: inertia weight coefficients.
Figure 12. Comparison of the fitness value change curves of the different algorithms.
Figure 13. Simulation results for the air combat confrontation with the neutral initial scenario. (a) Trajectories of the red UCAV, blue UCAV, and missiles in three-dimensional space. (b) Roll, pitch, and yaw angular rates in the body-fixed coordinate frame. (c) Velocity, angle of attack, and sideslip angle curves. (d) Overload curves in the x-, y-, and z-axis directions.
Figure 14. Simulation results for the air combat confrontation with the offensive initial scenario. (a) Trajectories of the red UCAV, blue UCAV, and missiles in three-dimensional space. (b) Roll, pitch, and yaw angular rates in the body-fixed coordinate frame. (c) Velocity, angle of attack, and sideslip angle curves. (d) Overload curves in the x-, y-, and z-axis directions.
Figure 15. Simulation results for the air combat confrontation with the oppositional initial scenario. (a) Trajectories of the red UCAV, blue UCAV, and missiles in three-dimensional space. (b) Roll, pitch, and yaw angular rates in the body-fixed coordinate frame. (c) Velocity, angle of attack, and sideslip angle curves. (d) Overload curves in the x-, y-, and z-axis directions.
Figure 16. Simulation results for the air combat confrontation with the defensive initial scenario. (a) Trajectories of the red UCAV, blue UCAV, and missiles in three-dimensional space. (b) Roll, pitch, and yaw angular rates in the body-fixed coordinate frame. (c) Velocity, angle of attack, and sideslip angle curves. (d) Overload curves in the x-, y-, and z-axis directions.
Figure 17. Simulation results for the air combat confrontation with the parallel initial scenario. (a) Trajectories of the red UCAV, blue UCAV, and missiles in three-dimensional space. (b) Roll, pitch, and yaw angular rates in the body-fixed coordinate frame. (c) Velocity, angle of attack, and sideslip angle curves. (d) Overload curves in the x-, y-, and z-axis directions.
Figure 18. Simulation results for the air combat confrontation with the head-on initial scenario. (a) Trajectories of the red UCAV, blue UCAV, and missiles in three-dimensional space. (b) Roll, pitch, and yaw angular rates in the body-fixed coordinate frame. (c) Velocity, angle of attack, and sideslip angle curves. (d) Overload curves in the x-, y-, and z-axis directions.
Table 1. Parameter values for the four test cases.
Case 1 (population number): NP = 50~500; P: R1 = sin(2π·rand), R2 = cos(2π·rand); CF = cos((π/2)·Iter/Max_Iter)^(2·Iter/Max_Iter); SF = [1 − sin((π/2)·Iter/Max_Iter)]^exp((π/2)·Iter/Max_Iter)
Case 2 (position update parameters): NP = 300; P = 0.5; 0.8; R1 = sin(2π·rand), R2 = cos(2π·rand); CF = cos((π/2)·Iter/Max_Iter)^(2·Iter/Max_Iter); SF = [1 − sin((π/2)·Iter/Max_Iter)]^exp((π/2)·Iter/Max_Iter)
Case 3 (step-size control parameters): NP = 300; P: R1 = sin(2π·rand), R2 = cos(2π·rand); CF = 1 − sin((π/2)·Iter/Max_Iter); (1 − Iter/Max_Iter)^(2·Iter/Max_Iter); cos((π/2)·Iter/Max_Iter)^(2·Iter/Max_Iter); SF = [1 − sin((π/2)·Iter/Max_Iter)]^exp((π/2)·Iter/Max_Iter)
Case 4 (inertia weight coefficients): NP = 300; P: R1 = sin(2π·rand), R2 = cos(2π·rand); CF = cos((π/2)·Iter/Max_Iter)^(2·Iter/Max_Iter); SF = 0.2; 1.0; [1 − sin((π/2)·Iter/Max_Iter)]^exp((π/2)·Iter/Max_Iter)
Table 2. Algorithm initial parameter settings.
(1) PSO: Max_Iter = 50; NP = 300; c1 = 1.3; c2 = 1.7; w = 0.8
(2) DE: Max_Iter = 50; NP = 300; F0 = 0.3; F = F0·2^exp(1 − Max_Iter/(Max_Iter + 1 − Iter)); CR = 0.1·(1 + rand)
(3) MPA: Max_Iter = 50; NP = 300; P = 0.5; CF = (1 − Iter/Max_Iter)^(2·Iter/Max_Iter); SF = 1
(4) LOBL-MPA: Max_Iter = 50; NP = 300; P = 0.5; CF = (1 − Iter/Max_Iter)^(2·Iter/Max_Iter); SF = 1; LOBL: used
(5) SAR-MPA: Max_Iter = 50; NP = 300; P: R1 = sin(2π·rand), R2 = cos(2π·rand); CF = cos((π/2)·Iter/Max_Iter)^(2·Iter/Max_Iter); SF = [1 − sin((π/2)·Iter/Max_Iter)]^exp((π/2)·Iter/Max_Iter)
(6) HHO: Max_Iter = 50; NP = 300; E0 = rand − 1; J = 2·(1 − rand); E = E0·(1 − Iter/Max_Iter)
(7) MMPA: Max_Iter = 50; NP = 300; P: R1 = sin(2π·rand), R2 = cos(2π·rand); CF = cos((π/2)·Iter/Max_Iter)^(2·Iter/Max_Iter); SF = [1 − sin((π/2)·Iter/Max_Iter)]^exp((π/2)·Iter/Max_Iter); LOBL: used
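The CF (step-size control) and SF (inertia weight) schedules listed in Tables 1 and 2 are easiest to compare by evaluating them over the iteration counter, as done for the curves in Figure 10. The minimal NumPy sketch below does exactly that; reading the flattened SF expression as [1 − sin((π/2)·Iter/Max_Iter)] raised to exp((π/2)·Iter/Max_Iter) is an interpretation of the extracted layout, and the function names are illustrative, not identifiers from the paper.

```python
import numpy as np

MAX_ITER = 50  # as in Table 2

def cf_mpa(it, max_it=MAX_ITER):
    """Original MPA step-size control: (1 - Iter/Max_Iter)^(2*Iter/Max_Iter)."""
    r = it / max_it
    return (1.0 - r) ** (2.0 * r)

def cf_cosine(it, max_it=MAX_ITER):
    """Cosine-based CF from Tables 1 and 2: cos((pi/2)*Iter/Max_Iter)^(2*Iter/Max_Iter)."""
    r = it / max_it
    return np.cos(0.5 * np.pi * r) ** (2.0 * r)

def cf_sine(it, max_it=MAX_ITER):
    """Sine-based CF option from Table 1, Case 3: 1 - sin((pi/2)*Iter/Max_Iter)."""
    r = it / max_it
    return 1.0 - np.sin(0.5 * np.pi * r)

def sf_sine(it, max_it=MAX_ITER):
    """Sine-based SF; the exponent is read as exp((pi/2)*Iter/Max_Iter) (an assumption)."""
    r = it / max_it
    return (1.0 - np.sin(0.5 * np.pi * r)) ** np.exp(0.5 * np.pi * r)

if __name__ == "__main__":
    # Print a few sample points of each schedule; all decay from 1 toward 0.
    for it in (0, 10, 25, 40, 50):
        print(f"Iter={it:2d}  CF_mpa={cf_mpa(it):.3f}  CF_cos={cf_cosine(it):.3f}  "
              f"CF_sin={cf_sine(it):.3f}  SF_sin={sf_sine(it):.3f}")
```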
Table 3. Comparative analysis of the different algorithms.
Algorithm | Time Consumption/s | Convergence Iterations
PSO | 0.0674 | 5
DE | 0.1520 | 4
MPA | 0.0450 | 4
MMPA | 0.0850 | 2
SAR-MPA | 0.1254 | 3
LOBL-MPA | 0.1115 | 3
HHO | 0.0876 | 3
Table 4. Statistical results of the ablation experiments.
Algorithm | Control Precision/% | Victory Ratio/%
MPA | 84.91 | 84
SAR-MPA | 91.57 | 89
MMPA | 95.95 | 94
Table 5. Initial conditions for the air combat confrontation.
Scene | UCAV | x/m | y/m | h/m | V/(m/s) | θ/deg | ψ/deg
Case 1 | Red | 0 | 0 | 6000 | 220 | 0 | 0
Case 1 | Blue | 4000 | 0 | 6000 | 220 | 0 | 90
Case 2 | Red | 0 | 0 | 6000 | 220 | 0 | 0
Case 2 | Blue | 3000 | 1000 | 6000 | 220 | 0 | 0
Case 3 | Red | 0 | 0 | 6000 | 220 | 0 | 0
Case 3 | Blue | 10,000 | 10,000 | 6000 | 220 | 0 | 180
Case 4 | Red | 0 | 5000 | 5000 | 220 | 0 | 180
Case 4 | Blue | 0 | 6000 | 6000 | 220 | 0 | 180
Case 5 | Red | 0 | 0 | 6000 | 220 | 0 | 0
Case 5 | Blue | 0 | 3000 | 6000 | 220 | 0 | 0
Case 6 | Red | 0 | 0 | 6000 | 220 | 0 | 0
Case 6 | Blue | 10,000 | 10,000 | 6000 | 220 | 0 | 180
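For readers reimplementing the simulation scenarios, each row of Table 5 maps naturally onto a small state container. The sketch below is only an illustration of that mapping; `InitialState` and its field names are hypothetical helpers rather than identifiers from the paper, and only Case 1 is written out.

```python
from dataclasses import dataclass

@dataclass
class InitialState:
    """Hypothetical container for one UCAV's initial state (units as in Table 5)."""
    x: float      # x position, m
    y: float      # y position, m
    h: float      # altitude, m
    v: float      # airspeed, m/s
    theta: float  # pitch angle, deg
    psi: float    # heading angle, deg

# Case 1 (neutral scenario) from Table 5; the remaining cases follow the same pattern.
CASE_1 = {
    "red":  InitialState(x=0.0,    y=0.0, h=6000.0, v=220.0, theta=0.0, psi=0.0),
    "blue": InitialState(x=4000.0, y=0.0, h=6000.0, v=220.0, theta=0.0, psi=90.0),
}
```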
Table 6. Statistical results of the victory rate in repeated adversarial experiments.
Confrontation Scenario | Win Rate | Loss Rate | Tie Rate
Case 1: MMPA with dominant initial situation vs. MMPA with disadvantaged initial situation | 100% | 0% | 0%
Case 2: MMPA vs. MPA | 95% | 5% | 0%
Case 3: MMPA vs. DE | 95% | 5% | 0%
Case 4: MMPA vs. PSO | 100% | 0% | 0%
Case 5: MMPA vs. HHO | 90% | 5% | 5%
Case 6: MMPA with variable weight strategy vs. MMPA with fixed weight strategy | 95% | 0% | 5%
Table 7. p-values of the Wilcoxon rank-sum test at the 5% significance level for Case 1–Case 6.
Case | PSO | DE | MPA | LOBL-MPA | SAR-MPA | HHO
Case 1 | 8.15 × 10^−11 | 9.06 × 10^−11 | 1.27 × 10^−11 | 9.13 × 10^−11 | 6.32 × 10^−11 | 9.75 × 10^−12
Case 2 | 2.78 × 10^−11 | 5.47 × 10^−11 | 9.58 × 10^−11 | 9.65 × 10^−11 | 1.58 × 10^−11 | 9.71 × 10^−11
Case 3 | 9.57 × 10^−11 | 4.85 × 10^−11 | 8.00 × 10^−11 | 1.42 × 10^−11 | 4.22 × 10^−11 | 9.16 × 10^−11
Case 4 | 7.92 × 10^−11 | 9.59 × 10^−11 | 6.56 × 10^−11 | 3.57 × 10^−12 | 8.49 × 10^−11 | 9.34 × 10^−11
Case 5 | 6.79 × 10^−11 | 7.58 × 10^−11 | 7.43 × 10^−11 | 3.92 × 10^−11 | 6.55 × 10^−11 | 1.71 × 10^−11
Case 6 | 7.06 × 10^−11 | 3.18 × 10^−12 | 2.77 × 10^−11 | 4.62 × 10^−12 | 9.71 × 10^−12 | 8.23 × 10^−11
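Table 7 reports Wilcoxon rank-sum p-values comparing the MMPA against each baseline at the 5% significance level. A minimal sketch of how such a test could be run with SciPy is given below; the fitness samples are synthetic placeholders, not the paper's experimental data.

```python
import numpy as np
from scipy.stats import ranksums

rng = np.random.default_rng(0)

# Synthetic placeholder fitness samples from repeated independent runs
# of two optimizers (smaller fitness assumed better here).
mmpa_fitness = rng.normal(loc=0.80, scale=0.02, size=30)
mpa_fitness = rng.normal(loc=0.85, scale=0.03, size=30)

# Two-sided Wilcoxon rank-sum (Mann-Whitney-type) test on the two samples.
stat, p_value = ranksums(mmpa_fitness, mpa_fitness)
alpha = 0.05  # 5% significance level, as used for Table 7
print(f"rank-sum statistic = {stat:.3f}, p-value = {p_value:.3e}")
print("difference is significant" if p_value < alpha else "difference is not significant")
```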