Autonomous Maneuver Decision of Air Combat Based on Simulated Operation Command and FRV-DDPG Algorithm
Abstract
1. Introduction
2. Problems and Modeling
2.1. Problem Description
2.2. Aircraft Model
2.3. Air Combat Advantage Function Modeling
2.3.1. Geometric Situation Modeling
2.3.2. Stability Advantage Function
2.3.3. Missile and Environmental Advantage Function
2.3.4. Reward Value
3. Simulation of Operation Command Based on PSO-RBF Algorithm
3.1. RBF Neural Network Principle
3.2. PSO-RBF Algorithm
- The pilot's control instructions under different air combat situations are selected as the learning samples, which are normalized and used as the input layer of the RBF neural network.
- Each dimension of a single PSO individual encodes the basis-function centers and the output-layer weights of the RBF neural network; the size of the PSO population, the maximum number of iterations, and the initial velocity and position of each particle are then initialized.
- Calculate the fitness value of the ith particle at its current position to update the personal best position of that particle and the global best position of the entire swarm.
- Update the velocity and position of the ith particle, as shown in Equation (17).
- Determine whether the output results satisfy the termination criteria. If the criteria are met, proceed to the next step; if not, perform a new round of iteration until they are met.
- Record the optimal position after the final iteration and obtain the parameters of the new RBF neural network (a minimal training sketch follows this list).
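To make these steps concrete, the following is a minimal Python sketch of the PSO-RBF training loop, assuming a Gaussian kernel with width `sigma`, standard PSO coefficients (`w`, `c1`, `c2`), and a single-output network; these choices are illustrative, not values from the paper.

```python
import numpy as np

def rbf_forward(X, centers, weights, sigma=1.0):
    # Gaussian basis functions: phi_j(x) = exp(-||x - c_j||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * sigma ** 2))
    return phi @ weights                       # linear output layer

def fitness(particle, X, y, n_hidden, n_in):
    # Decode one particle: the first n_hidden*n_in entries are the
    # basis-function centers, the rest are the output-layer weights.
    centers = particle[: n_hidden * n_in].reshape(n_hidden, n_in)
    weights = particle[n_hidden * n_in :]
    return np.mean((rbf_forward(X, centers, weights) - y) ** 2)

def pso_rbf(X, y, n_hidden=10, n_particles=30, iters=100,
            w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    dim = n_hidden * n_in + n_hidden
    pos = rng.uniform(-1.0, 1.0, (n_particles, dim))    # initial positions
    vel = rng.uniform(-0.1, 0.1, (n_particles, dim))    # initial velocities
    pbest = pos.copy()
    pbest_fit = np.array([fitness(p, X, y, n_hidden, n_in) for p in pos])
    g = pbest_fit.argmin()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(dim), rng.random(dim)
            # Standard PSO velocity/position update (cf. Equation (17)).
            vel[i] = (w * vel[i] + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] += vel[i]
            f = fitness(pos[i], X, y, n_hidden, n_in)
            if f < pbest_fit[i]:               # update personal best
                pbest[i], pbest_fit[i] = pos[i].copy(), f
                if f < gbest_fit:              # update global best
                    gbest, gbest_fit = pos[i].copy(), f
    return gbest    # centers and weights of the new RBF network
```

Here `X` would hold the normalized air combat situation features and `y` one channel of the pilot's control instruction; in the full method each control channel would carry its own output weights.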
3.3. State Space
4. Air Combat Maneuver Decision Modeling
4.1. FRV-DDPG Algorithm
4.2. Algorithm Steps
Algorithm 1: FRV-DDPG UAV air combat decision algorithm.
1. Initialize the memory replay unit with capacity $R$, the number of samples $N$ drawn for a single learning step, and the random noise $\mathcal{N}$.
2. Initialize the critic online network $Q(s,a|\theta^{Q})$, the critic target network $Q'(s,a|\theta^{Q'})$, the actor online network $\mu(s|\theta^{\mu})$, and the actor target network $\mu'(s|\theta^{\mu'})$.
3. for episode = 1, 2, …, M do
4. Initialize the states of the UAVs on both sides and obtain the current situation $s_1$.
5. for step $t$ = 1, 2, …, T do
6. The UAV generates an exploratory action $a_t = \mu(s_t|\theta^{\mu}) + \mathcal{N}_t$ in the actor online network according to the situation $s_t$ of both sides.
7. After the UAV and the target perform their actions, obtain the reward value $r_t$ and the new situation $s_{t+1}$ of the two aircraft.
8. Obtain the final reward value $r_T$ at the end of the air combat round at time $T$.
9. Store the data samples $(s_t, a_t, r_t, s_{t+1})$ in the memory replay unit.
10. end for
11. Randomly sample a batch of $N$ samples $(s_i, a_i, r_i, s_{i+1})$.
12. Let $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1}|\theta^{\mu'})|\theta^{Q'})$, where $\gamma$ is the discount factor.
13. Update the critic online network by gradient descent on the objective function $L = \frac{1}{N}\sum_{i}\left(y_i - Q(s_i, a_i|\theta^{Q})\right)^2$.
14. Update the actor online network using the sampled policy gradient: $\nabla_{\theta^{\mu}} J \approx \frac{1}{N}\sum_{i} \nabla_{a} Q(s,a|\theta^{Q})\big|_{s=s_i,\,a=\mu(s_i)}\, \nabla_{\theta^{\mu}} \mu(s|\theta^{\mu})\big|_{s=s_i}$.
15. Softly update the critic target network and the actor target network: $\theta^{Q'} \leftarrow \tau\theta^{Q} + (1-\tau)\theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau\theta^{\mu} + (1-\tau)\theta^{\mu'}$.
16. end for
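Below is a minimal PyTorch sketch of steps 12-15 of Algorithm 1. The network sizes, the hyperparameters, and the `relabel` rule that folds the round's final reward value $r_T$ back into each stored transition are illustrative assumptions about the FRV mechanism, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

def mlp(sizes):
    # Simple fully connected network with ReLU hidden activations.
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

s_dim, a_dim, gamma, tau = 12, 3, 0.99, 0.005     # assumed dimensions/rates
actor, actor_t = mlp([s_dim, 64, a_dim]), mlp([s_dim, 64, a_dim])
critic, critic_t = mlp([s_dim + a_dim, 64, 1]), mlp([s_dim + a_dim, 64, 1])
actor_t.load_state_dict(actor.state_dict())       # targets start as copies
critic_t.load_state_dict(critic.state_dict())
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def relabel(rewards, r_T):
    # Assumed FRV step (cf. steps 8-9): blend the round's final reward
    # r_T into every stored step reward, discounted by steps remaining.
    T = len(rewards)
    return [r + gamma ** (T - t) * r_T for t, r in enumerate(rewards, start=1)]

def update(s, a, r, s2):
    # s, a, s2: (N, dim) tensors; r: (N, 1) relabeled rewards.
    with torch.no_grad():                          # step 12: TD target y_i
        y = r + gamma * critic_t(torch.cat([s2, actor_t(s2)], dim=1))
    q = critic(torch.cat([s, a], dim=1))           # step 13: critic descent
    q_loss = ((q - y) ** 2).mean()
    opt_c.zero_grad(); q_loss.backward(); opt_c.step()
    a_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()   # step 14
    opt_a.zero_grad(); a_loss.backward(); opt_a.step()
    for net, net_t in ((critic, critic_t), (actor, actor_t)):  # step 15
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```

In practice the actor's output would also be squashed (e.g., with tanh) to the aircraft's control limits before being applied to the 6-DOF model.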
5. Simulations and Results
5.1. Simulation Environment Settings
5.2. Simulation Training
5.2.1. Simulate Operation Command
5.2.2. Air Combat Maneuver Decision Training
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
UAV | unmanned aerial vehicle |
FRV-DDPG | final reward value deep deterministic policy gradient |
6-DOF | six-degree-of-freedom |
PSO-RBF | particle swarm optimization radial basis function |
DQN | deep Q network |
BP | back propagation |
Parameter | Value |
---|---|
Wing area | 27.87 m² |
Wing span | 9.144 m |
Wing chord | 3.45 m |
Weight | 9295.44 kg |
Parameters | Value | Parameters | Value |
---|---|---|---|
 | 50 km | | 90° |
 | 10 km | | 1 km |
 | 5 km | | 1 km |
 | 3 km | | |
 | 0.4 | | 0.25 |
 | 0.1 | | 0.25 |
Initial State | UAV | Target |
---|---|---|
x | 0 m | (−15,000, 15,000) m |
y | 0 m | (−15,000, 15,000) m |
h | 3000 m | (2000, 4000) m |
v | 200 m/s | (150, 350) m/s |
ψ | 0° | (0, 360)° |
Initial State | Case 1 | Case 2 |
---|---|---|
x | 0 m | 0 m |
y | 8000 m | −8000 m |
h | 3000 m | 3000 m |
v | 200 m/s | 200 m/s |
ψ | 180° | 180° |