Autonomous Maneuvering Decision-Making Method for Unmanned Aerial Vehicle Based on Soft Actor-Critic Algorithm
Highlights
- Proposed a continuous control strategy based on the Soft Actor-Critic (SAC) algorithm for UAV autonomous maneuvering in 1v1 tactical encounter scenarios.
- Introduced a multi-dimensional situation-coupled reward function with a Health Point (HP) system to quantitatively evaluate situational advantages and cumulative tactical performance.
- Enhanced decision-making precision and robustness under both ideal and noisy observation conditions.
- Demonstrated the effectiveness of SAC in handling high-dimensional state spaces and generating precise maneuvering commands.
Abstract
1. Introduction
- This paper models the UAV kinematics and the one-versus-one engagement scenario, and constructs a discrete maneuver command library.
- This paper implements SAC deep reinforcement learning over a continuous action space, using a multi-dimensional coupled reward function to generate maneuver decision commands.
- This paper establishes a Gym simulation environment to validate the robustness of the SAC and TD3 algorithms under both noise-free and noisy state observations; a minimal training sketch follows this list.
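To make the second and third contributions concrete, the sketch below wires a skeleton engagement environment into the Stable-Baselines3 SAC implementation. The class name `AirCombatEnv`, the observation and action dimensions, and the placeholder dynamics are illustrative assumptions, not the paper's actual environment.

```python
# Minimal sketch: a Gymnasium-style 1v1 engagement environment trained with
# Stable-Baselines3 SAC. All environment details here are hypothetical.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC

class AirCombatEnv(gym.Env):
    """Hypothetical 1v1 engagement environment (placeholder dynamics)."""

    def __init__(self):
        super().__init__()
        # Relative-situation observation vector (dimension is illustrative).
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(12,), dtype=np.float32)
        # Continuous maneuver command, e.g. normalized [n_x, n_z, mu].
        self.action_space = spaces.Box(-1.0, 1.0, shape=(3,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._state = np.zeros(12, dtype=np.float32)
        return self._state, {}

    def step(self, action):
        # Placeholder: integrate the UAV kinematics and score the new
        # situation with the coupled reward function here.
        reward = 0.0
        terminated, truncated = False, False
        return self._state, reward, terminated, truncated, {}

env = AirCombatEnv()
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
```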
2. Problem Formulation
2.1. UAV Maneuvering Model
2.2. Autonomous Aerial Maneuvering Environment
2.3. Opponent Maneuver Policy
3. Methodologies
3.1. State Space and Action Space
3.2. Reward Function Design
3.3. Policy Network Training Method
4. Simulation and Results
4.1. Simulation Parameters and Initialization Conditions Settings
4.2. Results Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
DURC Statement
Conflicts of Interest
References
| Action ID | Maneuver Name | Maneuver Primitive Command |
|---|---|---|
| 1 | Constant Speed Forward | |
| 2 | Accelerated Forward | |
| 3 | Decelerated Forward | |
| 4 | Constant Speed Right Turn | |
| 5 | Accelerated Right Turn | |
| 6 | Decelerated Right Turn | |
| 7 | Constant Speed Left Turn | |
| 8 | Accelerated Left Turn | |
| 9 | Decelerated Left Turn | |
| 10 | Constant Speed Climb | |
| 11 | Accelerated Climb | |
| 12 | Decelerated Climb | |
| 13 | Constant Speed Dive | |
| 14 | Accelerated Dive | |
| 15 | Decelerated Dive | |
| 16 | Constant Speed Right Climb | |
| 17 | Accelerated Right Climb | |
| 18 | Decelerated Right Climb | |
| 19 | Constant Speed Left Climb | |
| 20 | Accelerated Left Climb | |
| 21 | Decelerated Left Climb | |
| 22 | Constant Speed Right Dive | |
| 23 | Accelerated Right Dive | |
| 24 | Decelerated Right Dive | |
| 25 | Constant Speed Left Dive | |
| 26 | Accelerated Left Dive | |
| 27 | Decelerated Left Dive | |
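The "Maneuver Primitive Command" column above did not survive extraction. A common encoding for such 27-action libraries is a triple of tangential overload n_x, normal overload n_z, and bank angle mu; the numeric values in the sketch below are illustrative assumptions, not the paper's actual commands.

```python
# Hypothetical reconstruction of a 27-action maneuver command library:
# 9 geometric primitives x 3 speed profiles, each mapped to (n_x, n_z, mu).
# All numeric values are assumptions for illustration only.
N_X = {"Constant Speed": 0.0, "Accelerated": 1.0, "Decelerated": -1.0}

# (n_z, mu in degrees) for each geometric primitive (hypothetical values).
GEOMETRY = {
    "Forward":     (1.0,   0.0),
    "Right Turn":  (3.0,  60.0),
    "Left Turn":   (3.0, -60.0),
    "Climb":       (2.0,   0.0),
    "Dive":        (0.5,   0.0),
    "Right Climb": (2.0,  30.0),
    "Left Climb":  (2.0, -30.0),
    "Right Dive":  (0.5,  30.0),
    "Left Dive":   (0.5, -30.0),
}

# Enumerate all 27 actions in the table's order: for each geometric
# primitive, the constant-speed, accelerated, and decelerated variants.
MANEUVER_LIBRARY = {}
action_id = 1
for geom, (n_z, mu) in GEOMETRY.items():
    for speed in ("Constant Speed", "Accelerated", "Decelerated"):
        MANEUVER_LIBRARY[action_id] = {
            "name": f"{speed} {geom}",
            "command": (N_X[speed], n_z, mu),  # (n_x, n_z, mu in degrees)
        }
        action_id += 1

print(MANEUVER_LIBRARY[5]["name"])  # -> "Accelerated Right Turn"
```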
| State | x (km) | y (km) | z (km) | v (m/s) | θ (rad) | ψ (rad) |
|---|---|---|---|---|---|---|
| Blue | 0 | 0 | 0 | 100 | – | – |
| Red | – | – | – | 100 | 0 | – |
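A minimal episode-reset sketch consistent with the initialization table. Blue's position and speed come from the table; the dashed cells were lost in extraction, so every randomized range and zeroed angle below is a hypothetical stand-in.

```python
# Sketch of one-episode initialization following the table above.
# Cells shown as dashes in the table were lost in extraction, so the
# rng ranges and zeroed angles here are assumptions, not the paper's values.
import numpy as np

rng = np.random.default_rng(seed=0)

def reset_engagement():
    blue = {
        "pos_km": np.array([0.0, 0.0, 0.0]),       # x, y, z from the table
        "v_mps": 100.0,                            # from the table
        "theta_rad": 0.0, "psi_rad": 0.0,          # hypothetical (cells lost)
    }
    red = {
        "pos_km": rng.uniform(-5.0, 5.0, size=3),  # hypothetical range
        "v_mps": 100.0,                            # surviving '100' read as speed
        "theta_rad": 0.0,                          # surviving '0' read as pitch
        "psi_rad": rng.uniform(-np.pi, np.pi),     # hypothetical
    }
    return blue, red

blue0, red0 = reset_engagement()
```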
| Parameter | Value | Parameter | Value | Parameter | Value |
|---|---|---|---|---|---|
| – | 0.25 Ma | – | 1 Ma | – | 1 × 10⁴ |
| – | 1 × 10³ | – | – | – | – |
| – | 2 | – | 200 | – | – |
| – | 6 | – | 0.9 km | – | – |
| – | 0.5 km | – | 1.2 km | – | 5.5 km |
| – | 25 | – | – | – | – |
| – | 3 km | – | 8 km | – | 0.01 |
| – | 2 Ma | – | 1 km | – | 500 |
| – | 10 km | lr | 2.5 × 10⁻⁴ | – | 1/10 |
| batch | 256 | buffer | 1 × 10⁶ | – | 1 × 10³ |
| – | 0.99 | – | 1/15 | – | – |
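Only three parameter names in the table survived extraction (lr, batch, buffer). The sketch below shows how those values, together with 0.99 read as the discount factor (an assumption), would map onto the Stable-Baselines3 SAC constructor; `Pendulum-v1` stands in for the paper's engagement environment.

```python
# Mapping the surviving hyperparameters onto the Stable-Baselines3 SAC
# constructor. Treating 0.99 as the discount factor is an assumption;
# the other symbol names in the table were lost in extraction.
import gymnasium as gym
from stable_baselines3 import SAC

env = gym.make("Pendulum-v1")          # stand-in for the engagement environment
model = SAC(
    "MlpPolicy",
    env,
    learning_rate=2.5e-4,              # 'lr' in the table
    batch_size=256,                    # 'batch'
    buffer_size=1_000_000,             # 'buffer'
    gamma=0.99,                        # assumed to be the discount factor
    verbose=0,
)
model.learn(total_timesteps=1_000)     # short run for illustration
```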
| Evaluation Comparison | Win Rate (95% CI) | Loss Rate (95% CI) | Tie Rate (95% CI) | Mean Returns | Variance of Returns | One-Sided Win Rate p-Value (SAC > TD3) |
|---|---|---|---|---|---|---|
| SAC with no noise input | – | – | – | 266.5532 | 380.5362 | – |
| TD3 with no noise input | – | – | – | 138.5664 | 530.2239 | – |
| SAC with low noise input | – | – | – | 250.4452 | 420.4957 | – |
| TD3 with low noise input | – | – | – | 133.1989 | 545.2348 | – |
| SAC with high noise input | – | – | – | 263.4485 | 450.7286 | – |
| TD3 with high noise input | – | – | – | 72.4408 | 591.0407 | – |
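The confidence-interval and p-value cells above were lost in extraction. A sketch of how such figures are typically computed: a Wald 95% CI for each win rate and a one-sided two-proportion z-test for SAC > TD3. The episode counts in the example call are hypothetical; the paper's evaluation used its own.

```python
# Wald 95% CI for a win rate and a one-sided pooled two-proportion z-test.
# The counts used in the example call below are hypothetical.
import math

def wald_ci(wins, n, z=1.96):
    """Point estimate and Wald 95% CI for a win rate from wins out of n."""
    p = wins / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, (p - half, p + half)

def one_sided_p(wins_a, n_a, wins_b, n_b):
    """P-value for H1: win rate A > win rate B (pooled two-proportion z-test)."""
    pa, pb = wins_a / n_a, wins_b / n_b
    pool = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pool * (1 - pool) * (1 / n_a + 1 / n_b))
    z = (pa - pb) / se
    return 0.5 * math.erfc(z / math.sqrt(2))   # 1 - Phi(z), upper tail

p, ci = wald_ci(wins=160, n=200)               # hypothetical counts
print(p, ci, one_sided_p(160, 200, 120, 200))
```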