SCPL-TD3: An Intelligent Evasion Strategy for High-Speed UAVs in Coordinated Pursuit-Evasion
Abstract
Highlights
- Proposes the SCPL-TD3 strategy to enable effective evasion by high-speed UAVs (HSUAVs) against coordinated pursuers.
- Analyzes and classifies the impact of pursuer spacing on evasion difficulty levels.
- Achieves superior evasion rates while minimizing resource costs, thereby preserving operational capability for subsequent missions.
- Provides a foundational framework and a critical decision-making metric for assessing evasion difficulty and optimizing vehicle trajectory in complex pursuit-evasion scenarios.
1. Introduction
- Existing methodologies primarily emphasize the successful execution of evasion maneuvers while neglecting the extent of deviation from the intended trajectory [23,24,25,26,27]. This oversight reduces the probability of successfully reaching the target zone post-evasion, thereby limiting mission effectiveness;
- Classical methods suffer from computational intractability in high-dimensional state spaces and rely on idealized assumptions, while existing DRL approaches lack sample efficiency and exhibit slow convergence and training instability.
- A semantic classification progressive learning framework combined with the twin delayed deep deterministic policy gradient algorithm (SCPL-TD3) is introduced to improve agent training in complex PE environments. Unlike conventional fixed training paradigms, SCPL-TD3 dynamically adjusts training complexity and leverages semantic classification to prioritize critical state-action patterns, thereby providing targeted guidance during policy learning. Building on the TD3 framework, the method not only accelerates convergence but also significantly improves training stability and sample efficiency in dynamic, high-dimensional environments;
- This work presents an analysis of a coordinated PE scenario involving two pursuers, from which an optimal initial spatial interval is derived to enhance the effectiveness of coordinated pursuit maneuvers. Moreover, an evasion difficulty classification framework is introduced, which systematically categorizes two-pursuer PE scenarios into distinct levels according to their spatial and dynamic constraints. This framework provides a structured method for evaluating and optimizing evasion strategies under cooperative pursuer conditions;
- A reward function is proposed that integrates energy consumption constraints and critical miss distance into a unified optimization objective. This multi-objective design not only minimizes evasion cost but also maintains sufficient operational margins, thereby ensuring the reliable accomplishment of subsequent mission objectives. By effectively balancing immediate evasion demands and overall mission requirements, the proposed reward mechanism significantly enhances the practicality and robustness of the learned evasion strategies.
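The multi-objective reward described in the last contribution can be sketched in code. This is a minimal illustration under assumed coefficients, not the paper's actual reward function: the weights, the miss-distance threshold, and the quadratic form of the energy term are all hypothetical, as are the function and parameter names.

```python
def evasion_reward(miss_distance, accel_cmd, reached_target,
                   dt=0.1, miss_threshold=1.0,
                   w_miss=1.0, w_energy=0.01, w_goal=10.0):
    """Hypothetical multi-objective shaping: a terminal term keyed to the
    critical miss distance, a quadratic energy penalty on the lateral
    acceleration command, and a mission-completion bonus.
    All weights and the threshold are assumed values."""
    # Terminal reward: positive once the miss distance exceeds the threshold.
    if miss_distance is None:
        r_terminal = 0.0  # non-terminal step, no miss distance yet
    else:
        r_terminal = w_miss if miss_distance > miss_threshold else -w_miss
    # Energy cost accumulated per control step (penalizes hard maneuvers).
    r_energy = -w_energy * accel_cmd ** 2 * dt
    # Bonus for preserving the ability to reach the target zone post-evasion.
    r_goal = w_goal if reached_target else 0.0
    return r_terminal + r_energy + r_goal
```

The three additive terms mirror the stated design intent: balancing immediate evasion (miss distance) against evasion cost (energy) and subsequent mission objectives (target-zone reachability).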
2. Problem Formulation and Vehicle Modeling
2.1. Coordinated PE Scenario Construction
2.2. Vehicle Modeling
3. Methods
3.1. Initial Common Space Interval Generation
3.2. SCPL-TD3 Algorithm
3.3. Design of State Space and Action Space
3.4. Reward Function Design Considering the Evasion Cost
3.5. Network Structure Design
4. Simulation and Analysis
4.1. Effectiveness Validation
4.1.1. Training Efficiency and Performance
4.1.2. Simple Evasion Scenario
4.1.3. Difficult Evasion Scenario
4.2. Robustness Verification
4.3. Real-Time Performance Validation
- (1) The proposed SCPL-TD3 algorithm demonstrates superior training efficiency, converging significantly faster than TD3 and achieving higher performance than PPO by effectively avoiding premature convergence.
- (2) In evasion effectiveness tests, SCPL-TD3 achieves successful evasion in complex scenarios where all baseline methods fail, while its energy-saving trajectory optimization directly enhances mission success probability by preserving critical kinetic energy for follow-on tasks.
- (3) Comprehensive Monte Carlo simulations validate the algorithm’s robustness, showing a 97.04% success rate under significant initial-condition perturbations.
- (4) The real-time performance validation confirms the method’s practical deployability, generating intelligent evasion commands within stringent computational constraints.
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zhao, Z.Y.; Ma, Y.; Tian, Y.; Ding, Z.J.; Zhang, H.; Tong, S.H. Research on integrated design method of wide-range hypersonic vehicle/engine based on dynamic multi-objective optimization. Aerosp. Sci. Technol. 2025, 159, 110031.
- Bu, X.W.; Qi, Q. Fuzzy optimal tracking control of hypersonic flight vehicles via single-network adaptive critic design. IEEE Trans. Fuzzy Syst. 2022, 30, 270–278.
- Wang, J.; Zhang, C.; Zheng, C.M.; Kong, X.W.; Bao, J.Y. Adaptive neural network fault-tolerant control of hypersonic vehicle with immeasurable state and multiple actuator faults. Aerosp. Sci. Technol. 2024, 152, 109378.
- Ding, Y.B.; Yue, X.K.; Chen, G.S.; Si, J.S. Review of control and guidance technology on hypersonic vehicle. Chin. J. Aeronaut. 2022, 7, 1–18.
- Li, B.; Gan, Z.; Chen, D.; Sergey Alekandrovich, D. UAV maneuvering target tracking in uncertain environments based on deep reinforcement learning and meta-learning. Remote Sens. 2020, 12, 3789.
- Zhuang, X.; Li, D.; Wang, Y.; Liu, X.; Li, H. Optimization of high-speed fixed-wing UAV penetration strategy based on deep reinforcement learning. Aerosp. Sci. Technol. 2024, 148, 109089.
- Fainkich, M.; Shima, T. Cooperative guidance for simultaneous interception using multiple entangled sliding surfaces. J. Guid. Control Dyn. 2025, 48, 591–599.
- Zheng, Z.W.; Li, J.Z.; Feroskhan, M. Three-dimensional terminal angle constraint guidance law with class K∞ function-based adaptive sliding mode control. Aerosp. Sci. Technol. 2024, 147, 109005.
- Yan, T.; Cai, Y.L.; Xu, B. Evasion guidance algorithms for air-breathing hypersonic vehicles in three-player pursuit-evasion games. Chin. J. Aeronaut. 2020, 33, 3423–3436.
- Imado, F.; Uehara, S. High-g barrel roll maneuvers against proportional navigation from optimal control viewpoint. J. Guid. Control Dyn. 1998, 21, 876–881.
- Yu, W.B.; Chen, W.C.; Jiang, Z.G.; Zhang, W.Q.; Zhao, P.L. Analytical entry guidance for coordinated flight with multiple no-fly-zone constraints. Aerosp. Sci. Technol. 2018, 84, 273–290.
- Wang, C.C.; Wang, Z.L.; Zhang, S.Y.; Tan, J.R. Adam-assisted quantum particle swarm optimization guided by length of potential well for numerical function optimization. Swarm Evol. Comput. 2023, 79, 101309.
- Morelli, A.C.; Hofmann, C.; Topputo, F. Robust low-thrust trajectory optimization using convex programming and a homotopic approach. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 2103–2116.
- Mao, Y.Q.; Szmuk, M.; Xu, X.R.; Acikmese, B. Successive convexification: A superlinearly convergent algorithm for non-convex optimal control problems. arXiv 2018, arXiv:1804.06539.
- Liu, Z.; Zhang, X.G.; Wei, C.Z.; Cui, N.G. High-precision adaptive convex programming for reentry trajectories of suborbital vehicles. Acta Aeronaut. Astronaut. Sin. 2023, 44, 729430.
- Zhang, J.L.; Liu, K.; Fan, Y.Z.; Yu, Z.Y. A piecewise predictor-corrector re-entry guidance algorithm with no-fly zone avoidance. J. Astronaut. 2021, 42, 122–131.
- He, R.Z.; Liu, L.H.; Tang, G.J.; Bao, W.M. Entry trajectory generation without reversal of bank angle. Aerosp. Sci. Technol. 2017, 71, 627–635.
- Liang, Z.X.; Ren, Z. Tentacle-based guidance for entry flight with no-fly zone constraint. J. Guid. Control Dyn. 2017, 40, 1–10.
- Chai, R.Q.; Tsourdos, A.; Savvaris, A.; Cai, S.C.; Xia, Y.Q. High-fidelity trajectory optimization for aeroassisted vehicles using variable order pseudospectral method. Chin. J. Aeronaut. 2021, 34, 237–251.
- Rao, H.P.; Zhong, R.; Li, P.J. Fuel-optimal deorbit scheme of space debris using tethered space-tug based on pseudospectral method. Chin. J. Aeronaut. 2020, 34, 210–223.
- Hou, L.F.; Li, L.; Chang, L.L.; Wang, Z.; Sun, G.Q. Pattern dynamics of vegetation based on optimal control theory. Nonlinear Dyn. 2025, 113, 1–23.
- Wu, S. Linear-quadratic non-zero sum backward stochastic differential game with overlapping information. IEEE Trans. Autom. Control 2023, 68, 1800–1806.
- Yu, X.Y.; Wang, X.F.; Lin, H. Optimal penetration guidance law with controllable missile escape distance. J. Astronaut. 2023, 44, 1053–1063.
- Liu, C.; Sun, S.S.; Tao, C.G.; Shou, Y.X.; Xu, B. Optimizing evasive maneuvering of planes using a flight quality driven model. Sci. China Inf. Sci. 2024, 67, 132206.
- Du, Q.F.; Hu, Y.D.; Jing, W.X.; Gao, C.S. Three-dimensional target evasion strategy without missile guidance information. Aerosp. Sci. Technol. 2025, 157, 109857.
- Singh, S.K.; Reddy, P.V. Dynamic network analysis of a target defense differential game with limited observations. IEEE Trans. Control Netw. Syst. 2023, 10, 308–320.
- Wang, Y.Q.; Ning, G.D.; Wang, X.F. Maneuver penetration strategy of near space vehicle based on differential game. Acta Aeronaut. Astronaut. Sin. 2020, 41, 724276.
- Wang, X.; Wang, S.; Liang, X.X.; Zhao, D.W.; Huang, J.C.; Xu, X. Deep reinforcement learning: A survey. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 5064–5078.
- Kiran, B.R. Deep reinforcement learning for autonomous driving: A survey. IEEE Trans. Intell. Transp. Syst. 2022, 23, 4909–4926.
- Chen, J.Y.; Yu, C.; Li, G.S.; Tang, W.H.; Ji, S.L.; Yang, X.Y. Online planning for multi-UAV pursuit-evasion in unknown environments using deep reinforcement learning. IEEE Robot. Autom. Lett. 2025, 10, 8196–8203.
- Qu, X.Q.; Gan, W.H.; Song, D.L.; Zhou, L.Q. Pursuit-evasion game strategy of USV based on deep reinforcement learning in complex multi-obstacle environment. Ocean Eng. 2023, 273, 114016.
- Li, J.S.; Wang, X.F.; Lin, H. Intelligent penetration policy for hypersonic cruise missiles based on virtual targets. Acta Armamentarii 2024, 45, 3856–3867.
- Yan, T.; Liu, C.; Gao, M.J.; Jiang, Z.J.; Li, T. A deep reinforcement learning-based intelligent maneuvering strategy for the high-speed UAV pursuit-evasion game. Drones 2024, 8, 309.
- Duan, Z.K.; Xu, G.J.; Liu, X.; Ma, J.Y.; Wang, L.Y. Optimal confrontation position selecting games model and its application to one-on-one air combat. Def. Technol. 2024, 31, 417–428.
- Mishley, A.; Shaferman, V. Near-optimal evasion from acceleration bounded modern pursuers. J. Guid. Control Dyn. 2025, 48, 793–807.
- Hao, Z.M.; Zhang, R.; Li, H.F. Parameterized evasion strategy for hypersonic glide vehicles against two missiles based on reinforcement learning. Chin. J. Aeronaut. 2025, 38, 103173.
Method Category | Advantages | Limitations | Key Distinctions from Proposed Method |
---|---|---|---|
Predefined Maneuvers | Simple implementation, computationally efficient | Predictable, inflexible in dynamic environments | SCPL-TD3 uses adaptive learning rather than fixed patterns |
Trajectory Optimization | Provides safe trajectories by avoiding threat zones | Overly conservative, high-energy consumption, large deviations | SCPL-TD3 balances evasion with mission constraints |
Modern Control-Based Guidance Laws | Rigorous theoretical foundation, optimal solutions under ideal conditions | Requires perfect target knowledge, limited to unilateral solutions | SCPL-TD3 is model-free and computationally efficient for high-dimensional states |
Existing DRL Methods | Model-free, applicable to complex environments | Slow convergence, unstable training, limited to one-on-one scenarios | SCPL-TD3 accelerates convergence and handles coordinated pursuit
Layer | Policy Network | Actor Network |
---|---|---|
Input layer | 8 | 8 |
Hidden layer 1 | 256 | 256 |
Hidden layer 2 | 256 | 256 |
Hidden layer 3 | 256 | 256 |
Hidden layer 4 | 128 | 256 |
Hidden layer 5 | 64 | 128 |
Output layer | 1 | 1 |
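As a sanity check on the layer widths in the table above, the following pure-Python sketch lays out each fully connected layer's dimensions and counts trainable parameters. It reproduces the widths exactly as printed (8-dimensional input, scalar output) without assuming any deep learning framework; all function and variable names are illustrative.

```python
def mlp_layer_dims(input_dim, hidden_widths, output_dim):
    """Return (fan_in, fan_out) pairs for each fully connected layer."""
    dims = [input_dim] + list(hidden_widths) + [output_dim]
    return list(zip(dims[:-1], dims[1:]))

def param_count(layer_dims):
    """Total weights plus biases for a dense stack."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in layer_dims)

# Layer widths copied from the table: both columns take the 8-dimensional
# state input and end in a scalar output layer.
policy_net = mlp_layer_dims(8, [256, 256, 256, 128, 64], 1)
actor_net = mlp_layer_dims(8, [256, 256, 256, 256, 128], 1)
```

Counting parameters this way makes it easy to compare the two columns' capacity before committing to a framework implementation.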
Variable | Value |
---|---|
Initial coordinate value of HSUAV | |
Initial coordinate value of the typical | |
Initial coordinate value of the typical | |
Initial coordinate value of the typical | |
Mach number | 6, 4, 4 |
Initial value of the ballistic deviation angle | 0, 180, 180 |
Time constants of autopilot | 0.5 |
Maximum lateral overload | 2, 6, 6 |
Miss distance threshold | 1 |
Sampling time/ms | 0.1 |
Navigation coefficient | 4 |
Size of the experience replay buffer (ERB) | 
Mini-batch size | 256
Learning rate of actor network and critic network | |
Initial exploration rate | 0.3 |
Attenuation factor | |
Discount factor | 0.99 |
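Several of the tabulated hyperparameters (initial exploration rate 0.3, mini-batch size 256, discount factor 0.99) interact during training. The sketch below shows how such values are typically used; the attenuation-factor value is elided in the table, so the 0.995 default here is purely an assumption, and all function names are hypothetical.

```python
import random

def decayed_exploration(eps0=0.3, attenuation=0.995, episode=0, eps_min=0.01):
    """Exponentially decay the initial exploration rate (0.3 per the table).
    attenuation=0.995 and eps_min are assumed values, not from the paper."""
    return max(eps_min, eps0 * attenuation ** episode)

def sample_minibatch(replay_buffer, batch_size=256):
    """Draw a mini-batch (size 256 per the table) from the replay buffer."""
    return random.sample(replay_buffer, min(batch_size, len(replay_buffer)))

def discounted_return(rewards, gamma=0.99):
    """Accumulate episode rewards with the table's discount factor of 0.99."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

The decay schedule gradually shifts the agent from exploration toward exploiting the learned evasion policy, while the discount factor weights near-term miss-distance outcomes against longer-horizon mission rewards.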
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, X.; Yan, T.; Li, T.; Liu, C.; Jiang, Z.; Yan, J. SCPL-TD3: An Intelligent Evasion Strategy for High-Speed UAVs in Coordinated Pursuit-Evasion. Drones 2025, 9, 685. https://doi.org/10.3390/drones9100685