Intelligent Game Strategies in Target-Missile-Defender Engagement Using Curriculum-Based Deep Reinforcement Learning
Abstract
1. Introduction
- (1) Combining findings from differential game theory for the traditional three-body game with DRL algorithms gives agent training a clearer direction, avoids the inaccuracies introduced by model linearization, and adapts better to complex, strongly nonlinear battlefield environments.
- (2) The three-body adversarial game is modeled as a Markov decision process suitable for reinforcement learning. By analyzing the sign of the action space and designing the reward function in an adversarial form, the competing requirements of evasion and attack are balanced in training both the missile and the target/defender agents (a hedged sketch of such an adversarial reward follows this list).
- (3) The missile agent and the target/defender agent are trained with a curriculum learning approach to obtain intelligent game strategies for both attack and defense.
- (4) The intelligent attack strategy enables the missile to evade the defender and hit the target in various battlefield situations, adapting to the complex environment.
- (5) The intelligent active defense strategy enables the less capable target/defender to induce an effect on the missile agent similar to a network adversarial attack, allowing the defender to intercept the attacking missile before it hits the target.
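Contribution (2) describes the adversarial reward form only in general terms; the sketch below illustrates one way such a zero-sum signal could be structured. The kill radii match the engagement tables later in the paper, but the shaping weight and function names are illustrative assumptions, not the paper's exact design.

```python
def missile_reward(r_mt: float, r_dm: float,
                   kill_radius_t: float = 5.0,
                   kill_radius_d: float = 5.0) -> float:
    """Terminal reward sketch for the missile agent (illustrative).

    r_mt : missile-target range at episode end (m)
    r_dm : defender-missile range at closest approach (m)
    """
    if r_dm <= kill_radius_d:   # intercepted by the defender: evasion failed
        return -1.0
    if r_mt <= kill_radius_t:   # target hit: attack succeeded
        return 1.0
    return -1e-4 * r_mt         # assumed dense shaping toward the target


def target_defender_reward(r_mt: float, r_dm: float) -> float:
    """Negating the missile's reward makes the game zero-sum, so the
    same signal balances evasion and attack on both sides."""
    return -missile_reward(r_mt, r_dm)
```

The sign flip for the target/defender agent is what makes the game adversarial: any shaping that helps the missile's attack automatically penalizes the defending team, and vice versa.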
2. Dynamic Model of TMD Engagement
2.1. Nonlinear Engagement Model
2.2. Linearization and Zero-Effort Miss
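For reference, the zero-effort miss used in linearized planar engagements of this kind is standard: with $y$ the relative displacement normal to the initial line of sight and $t_f$ the interception time,

```latex
Z(t) = y(t) + \dot{y}(t)\,t_{\mathrm{go}}, \qquad t_{\mathrm{go}} = t_f - t,
```

i.e., the miss distance that would result if neither vehicle accelerated from time $t$ onward. In the three-body setting one such ZEM is maintained per pursuer–evader pair (missile–target and missile–defender), which is what the guidance gains tabulated later in the paper act to decrease or increase.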
3. Curriculum-Based DRL Algorithm
3.1. Deep Reinforcement Learning and Curriculum Learning
3.2. Reward Shaping
3.3. Action Selection
3.4. Observation Selection
3.5. Curricula for Steady Training
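The outline does not spell out the curricula; the sketch below shows one common stage-based pattern, in which the initial-condition ranges (the full ranges are taken from the engagement-parameter table later in the paper) widen and the opponent grows more capable as training progresses. The stage names, per-stage ranges, and promotion criterion are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    """One curriculum stage: wider initial conditions and a smarter
    opponent make each stage harder than the last (illustrative)."""
    name: str
    target_x_range: tuple  # lateral position range of the target (m)
    opponent: str          # 'fixed', 'scripted', or 'learned'
    promote_at: float      # recent success rate needed to advance

CURRICULUM = [
    Stage("warmup",   (3400, 3600), "fixed",    promote_at=0.90),
    Stage("maneuver", (3200, 3800), "scripted", promote_at=0.85),
    Stage("full",     (3000, 4000), "learned",  promote_at=0.80),
]

def current_stage(success_rate: float, stage_idx: int) -> int:
    """Advance to the next stage once the agent's recent success
    rate clears the promotion threshold of the current stage."""
    if stage_idx < len(CURRICULUM) - 1 and \
            success_rate >= CURRICULUM[stage_idx].promote_at:
        return stage_idx + 1
    return stage_idx
```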
3.6. Strategy Update Algorithm
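The outline names the strategy update algorithm without detail, but the hyperparameter table later in the paper lists "Ratio clipping 0.3", which points to a PPO-style clipped-surrogate update (an inference from that entry, not a statement of the paper's exact algorithm). The standard clipped objective is

```latex
L^{\mathrm{CLIP}}(\theta)
  = \mathbb{E}_t\!\left[\min\!\Big(r_t(\theta)\,\hat{A}_t,\;
    \operatorname{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\Big)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},
```

with $\hat{A}_t$ the advantage estimate and $\epsilon = 0.3$ matching the listed ratio clipping.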
4. Intelligent Game Strategies
4.1. Attack Strategy for the Missile
4.2. Active Defense Strategy for the Target/Defender
5. Simulation Results and Analysis
5.1. Training Setting
5.2. Simulation Analysis of Attack Strategy for Missile
5.2.1. Engagement in Different Scenarios
5.2.2. Analysis of Typical Engagement Process
5.2.3. Performance under Uncertainty Disturbances
5.3. Simulation Analysis of Active Defense Strategy for Target/Defender
5.3.1. Engagement in Different Scenarios
5.3.2. Analysis of Typical Engagement Process
5.3.3. Performance under Uncertainty Disturbances
6. Conclusions
- (1) Employing the attack strategy trained by curriculum-based DRL, the missile is able to evade the defender and hit the target in various situations.
- (2) Employing the active defense strategy trained by curriculum-based DRL, the less capable target/defender achieves an effect on the missile agent similar to a network adversarial attack, and the defender intercepts the missile before it hits the target.
- (3) The intelligent game strategies remain robust against disturbances from input noise and modeling inaccuracies.
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Gain | Meaning | Effective Time |
---|---|---|
| Responsible for pursuing the target, i.e., decreasing the missile–target ZEM | |
| Responsible for avoiding the defender, i.e., increasing the missile–defender ZEM | |
| Responsible for avoiding the missile, i.e., increasing the missile–target ZEM | |
| Responsible for assisting the defender in pursuing the missile, i.e., decreasing the missile–defender ZEM | |
| Responsible for assisting the target in avoiding the missile, i.e., increasing the missile–target ZEM | |
| Responsible for pursuing the missile, i.e., decreasing the missile–defender ZEM | |
Parameters | Missile | Target | Defender |
---|---|---|---|
Lateral position/m | 0 | [3000, 4000] | [1500, 2500] |
Longitudinal position/m | 1000 | [500, 1500] | [500, 1500] |
Max load/g | 15 | 5 | 10 |
Time constant/s | 0.1 | 0.2 | 0.3 |
Flight path angle/(°) | 0 | | |
Velocity/(m/s) | [250, 300] | [150, 200] | [250, 300] |
Kill radius/m | 5 | — | 5 |
Hyperparameters | Value |
---|---|
Ratio clipping | 0.3 |
Learning rate | |
Discount rate | 0.99 |
Buffer size | |
Actor network for M | 8-16-16-16-2 |
Critic network for M | 8-16-16-16-1 |
Actor network for T/D | 8-16-16-16-4 |
Critic network for T/D | 8-16-16-16-1 |
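The 8-16-16-16-2 entry above describes an actor that maps an 8-dimensional observation through three 16-unit hidden layers to a 2-dimensional action. Below is a minimal PyTorch sketch under that reading; the activation functions are an assumption, since the table does not specify them.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """8-16-16-16-2 actor for the missile agent, per the
    hyperparameter table (hidden activations are assumed)."""
    def __init__(self, obs_dim: int = 8, act_dim: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 16), nn.Tanh(),
            nn.Linear(16, 16), nn.Tanh(),
            nn.Linear(16, 16), nn.Tanh(),
            nn.Linear(16, act_dim), nn.Tanh(),  # bounded action in [-1, 1]
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Per the same table, the target/defender actor is identical except for
# act_dim=4, and both critics end in a single value output (…-16-1).
```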
Parameters | Missile | Target | Defender |
---|---|---|---|
Lateral position/m | 0 | 3500 | 2500 |
Longitudinal position/m | 1000 | 1300/1000/700 | 1300/1000/700 |
Velocity/(m/s) | 250 | 150 | 250 |
Kill radius/m | 5 | — | 5 |
Observation Noise | ||||
---|---|---|---|---|
5% | 89.2% | 79.1% | 88.4% | 76.5% |
15% | 89.0% | 76.4% | 86.5% | 75.3% |
25% | 82.5% | 75.5% | 78.5% | 75.1% |
35% | 75.0% | 74.1% | 79.0% | 74.2% |
Observation Noise | C-DRL | | | | CLQDG | |
---|---|---|---|---|---|---|
5% | 98.4% | 87.2% | 93.5% | 82.2% | 70.0% | 67.3% |
15% | 94.5% | 86.6% | 94.0% | 81.0% | 68.0% | 66.7% |
25% | 95.0% | 87.0% | 92.5% | 79.8% | 53.3% | 50.1% |
35% | 93.4% | 85.7% | 93.7% | 79.6% | 38.2% | 37.3% |
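The percentage levels in the two tables above suggest observations perturbed in proportion to their magnitude. The sketch below shows one minimal form such a robustness test might take; the uniform multiplicative noise model is an assumption, not necessarily the paper's exact disturbance.

```python
import numpy as np

def perturb_observation(obs: np.ndarray, noise_level: float,
                        rng: np.random.Generator) -> np.ndarray:
    """Scale each observation channel by a random factor in
    [1 - noise_level, 1 + noise_level], e.g. noise_level=0.25
    for the 25% row of the robustness tables."""
    factors = rng.uniform(1.0 - noise_level, 1.0 + noise_level,
                          size=obs.shape)
    return obs * factors

# example: 15% observation noise on an 8-dimensional observation
rng = np.random.default_rng(0)
noisy = perturb_observation(np.ones(8), 0.15, rng)
```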
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).