Hierarchical Decision-Making for UAV Close-Range Dynamic Tracking Using a Pursuit-Strategy Action Space
Abstract
1. Introduction
- Formulation of a dimensionally reduced yet trajectory-continuous action space anchored in pursuit strategies: Unlike existing methods, which map top-level outputs either to low-level continuous control commands or to fixed-duration maneuver templates, this study defines the top-level decision-making action space as three geometric pursuit strategies: lag pursuit, lead pursuit, and pure pursuit. Each action corresponds to a continuously updated guidance law rather than a rigid maneuver template, which distinguishes it from conventional discrete maneuvers.
- Development of strategy-level guidance laws grounded in angular geometry: To address the characteristics of close-range anticipatory tracking, mathematical models for lag pursuit, lead pursuit, and pure pursuit are formulated. These models serve as an intermediate layer that translates top-level guidance-mode decisions into continuous low-level flight reference parameters.
- Realization of high-fidelity hierarchical closed-loop validation: A low-level flight control model is constructed within the JSBSim flight-dynamics environment and combined with mid-level geometric guidance and top-level pursuit-strategy selection to form a hierarchical closed-loop architecture, which allows the proposed framework to be evaluated under nonlinear aerodynamic constraints.
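The distinction between a discrete strategy choice and a continuously updated reference can be illustrated with a minimal planar sketch. It is not the paper's implementation: the function names, the `horizon` offset, the decision interval `decide_every`, and the stationary tracker are all assumptions made for illustration, and the low-level controller is omitted.

```python
import math
from enum import Enum


class Strategy(Enum):
    """Top-level action space: one of three pursuit strategies."""
    LAG = "lag"
    LEAD = "lead"
    PURE = "pure"


def guidance_heading(strategy, own_xy, tgt_xy, tgt_vel, horizon=2.0):
    """Mid-level layer: map the active strategy to a heading reference (rad).

    The aim point is shifted along the tracked aircraft's velocity:
    ahead of it (lead), on it (pure), or behind it (lag). The shift of
    `horizon` seconds is a hypothetical value.
    """
    shift = {Strategy.LEAD: horizon, Strategy.PURE: 0.0, Strategy.LAG: -horizon}[strategy]
    aim_x = tgt_xy[0] + shift * tgt_vel[0]
    aim_y = tgt_xy[1] + shift * tgt_vel[1]
    return math.atan2(aim_y - own_xy[1], aim_x - own_xy[0])


def rollout(select_strategy, steps=6, decide_every=3, dt=1.0):
    """Run the two upper layers; the tracker is held fixed for simplicity."""
    own, tgt, tgt_vel = (0.0, 0.0), (1000.0, 0.0), (0.0, 100.0)
    strategy, log = None, []
    for k in range(steps):
        if k % decide_every == 0:
            # The top-level decision is refreshed only every few steps.
            strategy = select_strategy(own, tgt)
        # The guidance reference is recomputed at every step.
        log.append((strategy, guidance_heading(strategy, own, tgt, tgt_vel)))
        tgt = (tgt[0] + tgt_vel[0] * dt, tgt[1] + tgt_vel[1] * dt)
    return log


if __name__ == "__main__":
    for strategy, ref in rollout(lambda own, tgt: Strategy.PURE):
        print(strategy.name, round(math.degrees(ref), 2))
```

Even though the selected strategy never changes in this run, the heading reference differs at every step because it is recomputed from the current geometry, which is the property that separates a guidance-law action from a fixed maneuver template.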
2. Materials and Methods
2.1. Comprehensive Architectural Overview
2.2. Flight Controller
2.3. Definition of State Space and Relative-Motion Characteristics
2.4. Geometric Guidance Laws and High-Level Decision-Making Model
2.4.1. Relative-Motion State Assessment and High-Level Decision-Making Mechanism
2.4.2. Lag Pursuit Guidance Law
2.4.3. Lead Pursuit Guidance Law
2.4.4. Pure Pursuit Guidance Law
3. Results
3.1. Simulation and Validation of Pursuit Guidance Laws
3.2. Convergence Evaluation of the Hierarchical Decision Framework
3.3. Comparison Between PPO and Double-DQN for High-Level Guidance-Mode Decision-Making
3.4. Analysis of Representative Dynamic-Tracking Trajectories
3.5. Results of Monte Carlo Randomized Testing
4. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
DURC Statement
Conflicts of Interest
Appendix A
Appendix A.1. Low-Level Flight-Controller PPO Agent
| Hyperparameter | Value |
|---|---|
| Number of hidden layers in the critic network | 3 |
| Hidden-layer sizes of the critic network | {256, 128, 128} |
| Optimizer of the critic network | Adam |
| Learning rate of the critic network | 3 × 10⁻⁴ |
| Hidden-layer sizes of the actor network | {512, 256, 128} |
| Optimizer of the actor network | Adam |
| Learning rate of the actor network | 3 × 10⁻⁴ |
| Number of parallel environments | 30 |
| Discount factor γ | 0.99 |
| Buffer size | 900 |
| Entropy coefficient | 1 × 10⁻³ |
| GAE parameter λ | 0.95 |
| Clipping parameter | 0.2 |
Appendix A.2. Hyperparameters for the Cross-Algorithm Comparison
| Group | Parameter | Proposed PPO | Double-DQN |
|---|---|---|---|
| Algorithm-specific setting | Implementation | Stable-Baselines3 PPO | Custom PyTorch Double-DQN |
| Algorithm-specific setting | Learning paradigm | On-policy policy gradient | Off-policy value-based learning |
| Algorithm-specific setting | Learning rate | 3 × 10⁻⁴ | 3 × 10⁻⁴ |
| Algorithm-specific setting | Discount factor γ | 0.99 | 0.99 |
| Algorithm-specific setting | Batch size | 256 | 256 |
| Algorithm-specific setting | Network architecture | SB3 MlpPolicy, 64 × 64 | Q-network, 256 × 256 with ReLU |
| Algorithm-specific setting | Exploration mechanism | Entropy regularization | ε-greedy |
| Algorithm-specific setting | Replay buffer | Not used | 200,000 transitions |
| Algorithm-specific setting | Target network | Not used | Updated every 2000 steps |
| PPO-specific setting | Rollout length n_steps | 1024 | -- |
| PPO-specific setting | Effective rollout size | 16 × 1024 = 16,384 | -- |
| PPO-specific setting | Number of epochs | 10 | -- |
| PPO-specific setting | GAE parameter λ | 0.95 | -- |
| PPO-specific setting | Clip range | 0.3 | -- |
| PPO-specific setting | Entropy coefficient | 3 × 10⁻⁴ | -- |
| Double-DQN-specific setting | Learning starts | -- | 10,000 transitions |
| Double-DQN-specific setting | Training frequency | -- | Every 4 environment steps |
| Double-DQN-specific setting | Gradient clipping | -- | 10.0 |
| Double-DQN-specific setting | ε schedule | -- | 1.0 to 0.05 |
| Double-DQN-specific setting | ε decay length | -- | 500,000 transitions |
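Two of the table entries imply simple arithmetic that is easy to check in code. The sketch below makes two assumptions that the table does not state: the factor 16 in the effective rollout size is taken to be the number of parallel environments, and the ε schedule is taken to be linear over the decay length. The function names are hypothetical.

```python
def ppo_updates_per_rollout(n_envs=16, n_steps=1024, batch_size=256, n_epochs=10):
    """Return (effective rollout size, minibatch gradient steps per rollout).

    Assumes the rollout is split into equal minibatches and every
    minibatch is visited once per epoch.
    """
    rollout_size = n_envs * n_steps
    minibatches_per_epoch = rollout_size // batch_size
    return rollout_size, minibatches_per_epoch * n_epochs


def epsilon(step, start=1.0, end=0.05, decay_steps=500_000):
    """Exploration rate after `step` transitions, assuming linear decay."""
    fraction = min(max(step / decay_steps, 0.0), 1.0)
    return start + fraction * (end - start)


if __name__ == "__main__":
    print(ppo_updates_per_rollout())
    for step in (0, 250_000, 500_000, 1_000_000):
        print(step, round(epsilon(step), 3))
```

Under these assumptions, each PPO rollout of 16,384 transitions yields 64 minibatches per epoch, and the Double-DQN exploration rate stays at 0.05 once 500,000 transitions have been collected.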

| Symbol | Definition & Physical Meaning |
|---|---|
| | Line-of-sight range between the two aircraft, used to determine whether the tracked aircraft satisfies the predefined close-range geometric condition. |
| ATA | Antenna train angle (ATA), the angle between the tracking aircraft’s velocity vector and the line-of-sight vector. |
| AA | Aspect angle (AA), the angle between the tracked aircraft’s velocity vector and the line-of-sight vector. |
| | Relative altitude, used to evaluate the margin for potential energy conversion and assess the potential energy difference between the two aircraft. |
| | Flight airspeeds of the tracking aircraft and the tracked aircraft, which determine their respective kinetic energy levels. |
| | Current normal load factors of both aircraft, reflecting the intensity of their maneuvers. |
| α, β, γ | Angle of attack, sideslip angle, and flight-path angle, respectively. In the local North-East-Up inertial frame, the flight-path angle is defined by γ = arcsin(V_U / V), where V_U is the vertical component of the velocity vector and V is the flight speed. The pitch angle θ describes the attitude of the aircraft body, whereas γ describes the direction of the velocity vector; for coordinated small-sideslip flight, θ ≈ α + γ. |
| | The complete state-space vector used for relative-motion state assessment. |

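The geometric quantities in the table above can be computed directly from the positions and velocities of the two aircraft. The following is a minimal sketch, assuming both are given as (North, East, Up) tuples in a common local frame. The function names, the sign convention of the relative altitude (positive when the tracking aircraft is higher), and the arcsine form of the flight-path angle are assumptions for illustration.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def angle_between(u, w):
    """Unsigned angle (rad) between two nonzero 3D vectors."""
    cosine = _dot(u, w) / (_norm(u) * _norm(w))
    return math.acos(max(-1.0, min(1.0, cosine)))  # clamp against rounding


def relative_geometry(p_own, v_own, p_tgt, v_tgt):
    """Return (range, ATA, AA, relative altitude) for NEU inputs."""
    los = tuple(t - o for t, o in zip(p_tgt, p_own))  # tracker -> tracked
    rng = _norm(los)
    ata = angle_between(v_own, los)   # tracker velocity vs. line of sight
    aa = angle_between(v_tgt, los)    # tracked velocity vs. line of sight
    dh = p_own[2] - p_tgt[2]          # assumed sign: positive if tracker is higher
    return rng, ata, aa, dh


def flight_path_angle(v):
    """Flight-path angle (rad) from the Up velocity component and the speed."""
    return math.asin(v[2] / _norm(v))


if __name__ == "__main__":
    rng, ata, aa, dh = relative_geometry(
        (0.0, 0.0, 6000.0), (200.0, 0.0, 0.0),
        (1000.0, 0.0, 6500.0), (0.0, 150.0, 0.0),
    )
    print(round(rng, 1), round(math.degrees(ata), 1), round(math.degrees(aa), 1), dh)
```

Both angles are unsigned here; a signed variant would need an additional reference direction, which the table does not specify.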
| Strategy | Guidance Purpose | Typical Applicable Relative-Motion Condition | Possible Limitation |
|---|---|---|---|
| Lag pursuit | Maintains or enlarges angular alignment advantage while preventing overshoot by guiding the tracking aircraft toward the rear region of the tracked aircraft. | The tracking aircraft has obtained a preliminary favorable relative position or is approaching the tracked aircraft with excessive closure speed; the main objective is to remain within a favorable tracking region while avoiding crossing the tracked aircraft’s turn circle. | It may delay geometric-alignment acquisition when the tracking aircraft needs to rapidly align its nose with the tracked aircraft; in extremely challenging situations, lag pursuit alone may not provide sufficient angular recovery. |
| Lead pursuit | Captures an anticipatory tracking geometry by aiming ahead of the tracked aircraft and compensating for prediction delay and vertical motion effects. | The tracking aircraft is close to satisfying the predefined geometric tracking condition and needs to reserve an anticipatory angle for tracking a maneuvering object. | The current implementation uses a guidance-level first-order prediction of the tracked aircraft position; under rapidly varying three-dimensional maneuvers, higher-order trajectory-prediction and acceleration modeling may be required. |
| Pure pursuit | Rapidly reduces the line-of-sight angle by pointing directly toward the tracked aircraft’s current position, serving as an angular-recovery mechanism under challenging relative geometry. | The tracking aircraft is in a challenging or neutral angular position and needs to quickly regain nose-pointing direction toward the tracked aircraft. | Pure pursuit can cause excessive closure speed and may lead to overshoot or altitude loss if not coordinated with energy and safety constraints. It is not intended to replace more complex emergency recovery maneuvers. |
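The three strategies in the table can be read as three choices of aim point, each converted into reference angles for the lower layer. The sketch below is an illustration rather than the paper's guidance laws: it uses a first-order (constant-velocity) prediction for lead pursuit, as the table mentions, while the lead time `t_lead`, the lag distance `d_lag`, and the function names are hypothetical. Positions and velocities are (North, East, Up) tuples.

```python
import math
from enum import Enum


class Pursuit(Enum):
    LAG = 0
    LEAD = 1
    PURE = 2


def aim_point(strategy, p_tgt, v_tgt, t_lead=1.5, d_lag=300.0):
    """Return the 3D point the tracking aircraft should steer toward."""
    if strategy is Pursuit.PURE:
        # Point at the tracked aircraft's current position.
        return tuple(p_tgt)
    if strategy is Pursuit.LEAD:
        # First-order prediction: position after t_lead seconds at constant velocity.
        return tuple(p + v * t_lead for p, v in zip(p_tgt, v_tgt))
    # Lag: a point d_lag metres behind the tracked aircraft along its velocity.
    speed = math.sqrt(sum(v * v for v in v_tgt))  # must be nonzero
    return tuple(p - v / speed * d_lag for p, v in zip(p_tgt, v_tgt))


def reference_angles(p_own, aim):
    """Convert an aim point into (heading, flight-path angle) references in rad."""
    d_north, d_east, d_up = (a - o for a, o in zip(aim, p_own))
    heading = math.atan2(d_east, d_north) % (2.0 * math.pi)  # 0 = North, clockwise
    gamma = math.atan2(d_up, math.hypot(d_north, d_east))
    return heading, gamma


if __name__ == "__main__":
    p_own = (0.0, 0.0, 6000.0)
    p_tgt, v_tgt = (1000.0, 0.0, 6000.0), (0.0, 200.0, 0.0)
    for strategy in Pursuit:
        heading, gamma = reference_angles(p_own, aim_point(strategy, p_tgt, v_tgt))
        print(strategy.name, round(math.degrees(heading), 1), round(math.degrees(gamma), 1))
```

Because the aim point is recomputed from the current state at each call, the resulting references vary continuously even while the selected strategy stays the same.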

| Scenario | Tracked Aircraft Maneuver | Initial Velocity (m/s) | Initial Altitude (km) | Initial Heading Angle (°) |
|---|---|---|---|---|
| Lag pursuit | Level flight | 244, 183 | 6, 6 | 45, 0 |
| Lag pursuit (Tracking aircraft inside the turning circle) | Turning | 304, 183 | 6, 6 | 45, 0 |
| Lag pursuit (Tracking aircraft outside the turning circle) | Turning | 304, 183 | 6, 6 | 45, 0 |
| Lead pursuit | Turning | 244, 183 | 6, 6 | 45, 0 |
| Pure pursuit | Lag pursuit | 183, 183 | 6, 6 | 0, 0 |

| Hyperparameter | Value |
|---|---|
| Number of hidden layers in the critic network | 3 |
| Hidden-layer sizes of the critic network | {512, 256, 128} |
| Optimizer of the critic network | Adam |
| Learning rate of the critic network | 3 × 10⁻⁴ |
| Hidden-layer sizes of the actor network | {256, 256, 128} |
| Optimizer of the actor network | Adam |
| Learning rate of the actor network | 3 × 10⁻⁴ |
| Number of parallel environments | 32 |
| Discount factor γ | 0.99 |
| Buffer size | 512 |
| Entropy coefficient | 1 × 10⁻³ |
| GAE parameter λ | 0.95 |

| Method | Avg. Reward | Success (%) | Non-Crash Success (%) | Failure (%) | Simultaneous Termination (%) | Learning-Agent Crash (%) |
|---|---|---|---|---|---|---|
| Double-DQN | 634.98 | 57.0 | 34.5 | 29.0 | 14.0 | 20.5 |
| Proposed PPO | 827.35 | 83.5 | 67.0 | 8.5 | 8.0 | 1.0 |
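The headline differences between the two methods follow from the table by simple arithmetic. The short sketch below recomputes them and checks one reading of the columns, namely that success, failure, and simultaneous termination partition the test episodes. That reading is an assumption suggested by the row sums, not something the table states, and the variable names are hypothetical.

```python
# Values copied from the comparison table; percentages of test episodes.
results = {
    "Double-DQN": {"reward": 634.98, "success": 57.0, "failure": 29.0, "simultaneous": 14.0},
    "Proposed PPO": {"reward": 827.35, "success": 83.5, "failure": 8.5, "simultaneous": 8.0},
}


def outcome_total(row):
    """Sum of the three outcome columns assumed to be mutually exclusive."""
    return row["success"] + row["failure"] + row["simultaneous"]


ppo, dqn = results["Proposed PPO"], results["Double-DQN"]
success_gain = ppo["success"] - dqn["success"]        # percentage points
failure_reduction = dqn["failure"] - ppo["failure"]   # percentage points
reward_ratio = ppo["reward"] / dqn["reward"]

if __name__ == "__main__":
    for name, row in results.items():
        print(name, outcome_total(row))
    print(success_gain, failure_reduction, round(reward_ratio, 3))
```

On these numbers the proposed PPO agent succeeds in 26.5 percentage points more episodes, fails in 20.5 percentage points fewer, and obtains an average reward roughly 30% higher than Double-DQN.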
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Lai, Y.; Chen, Y.; Yang, Y.; Jian, J.; Liu, Y. Hierarchical Decision-Making for UAV Close-Range Dynamic Tracking Using a Pursuit-Strategy Action Space. Aerospace 2026, 13, 508. https://doi.org/10.3390/aerospace13060508