A Fully Controllable UAV Using Curriculum Learning and Goal-Conditioned Reinforcement Learning: From Straight Forward to Round Trip Missions
Abstract
1. Introduction
- Novel Integration Methodology: We propose a combination of curriculum learning and goal-conditioned RL that enables fully controllable autonomous UAV flight. Unlike existing methods, which typically apply curriculum learning or goal-conditioning in isolation, our approach combines the two to achieve superior performance on complex missions (see the training-loop sketch after this list).
- Advanced Mission Handling Framework: Our approach successfully handles round-trip missions and complex directional changes, capabilities that existing approaches have struggled to achieve. The method demonstrates particular effectiveness in scenarios that require multiple waypoint navigation and return-to-base operations, significantly expanding current UAV operational capabilities.
- Comprehensive Empirical Validation: Through extensive experiments, we demonstrate that our approach achieves significantly higher success rates (>70% for complex missions) compared to baseline methods (<35% for similar scenarios). Our experimental results validate the effectiveness of progressive learning in enhancing UAV control capabilities.
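To make this staged design concrete, below is a minimal sketch of a curriculum wrapped around a single goal-conditioned policy. The stage names and the `make_env`/`policy` interfaces are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: one goal-conditioned policy trained through a
# curriculum of progressively harder mission stages. Stage names and the
# make_env/policy interfaces are illustrative, not the paper's exact code.
CURRICULUM = [
    "straight_forward",    # fly straight ahead to a single goal
    "directional_change",  # goals that require turning maneuvers
    "multi_subgoal",       # sequences of intermediate waypoints
    "round_trip",          # reach the goal, then return to the start
]

def train_with_curriculum(policy, make_env, steps_per_stage=100_000):
    """Train the same goal-conditioned policy across all curriculum stages."""
    for stage in CURRICULUM:
        env = make_env(stage)  # environment samples goals for this stage
        obs, goal = env.reset()
        for _ in range(steps_per_stage):
            action = policy.act(obs, goal)          # policy conditioned on the goal
            obs, reward, done, info = env.step(action)
            policy.update(obs, goal, reward, done)  # standard RL update
            if done:
                obs, goal = env.reset()
    return policy
```

The key design point is that a single policy persists across stages, so maneuvers learned on easy missions carry over to harder ones rather than being retrained from scratch.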
2. Background
2.1. Goal-Conditioned RL
2.2. Actor-Critic RL
2.3. Random Network Distillation
2.4. Self-Imitation Learning
2.5. Curriculum Learning
3. Problem Definition
3.1. Environment
3.2. Test Environment
3.3. State
- The UAV’s angle and distance relative to the goal
- The UAV’s heading, pitch, and bank angle (see the observation sketch after this list)
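As an illustration, the observation listed above could be assembled as follows. The `uav`/`goal` attribute names are hypothetical placeholders; the paper's exact state encoding may differ.

```python
import numpy as np

def build_observation(uav, goal):
    """Assemble the state vector listed above (illustrative sketch;
    the uav/goal attribute names are hypothetical placeholders)."""
    delta = goal.position - uav.position  # vector from UAV to goal
    distance = np.linalg.norm(delta)      # distance to the goal
    # Bearing to the goal in the horizontal plane, relative to current heading
    angle_to_goal = np.arctan2(delta[1], delta[0]) - uav.heading
    return np.array(
        [angle_to_goal, distance, uav.heading, uav.pitch, uav.bank],
        dtype=np.float32,
    )
```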
3.4. Action
3.5. Reward
- Upon successfully reaching the final target (+20)
- Upon departing from its environment (−10)
- When the fuel is depleted before reaching the goal (−5), as sketched below
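The following is a minimal sketch of these terminal reward terms, assuming boolean event flags; any per-step shaping the paper applies is omitted here.

```python
def terminal_reward(reached_goal: bool, left_environment: bool,
                    fuel_depleted: bool) -> float:
    """Terminal reward terms listed above (sketch only; any per-step
    shaping terms the paper may use are omitted)."""
    if reached_goal:
        return 20.0   # successfully reached the final target
    if left_environment:
        return -10.0  # departed from the environment
    if fuel_depleted:
        return -5.0   # fuel depleted before reaching the goal
    return 0.0        # no terminal event this step
```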
4. A Fully Controllable UAV in Path Planning
4.1. Learning the Basic Flight for the Agent
4.2. Learning Various Flight Maneuvers
5. Experiment
5.1. Model Architecture
5.2. Training Phase
5.3. Test Phase
6. Results
6.1. Overall Learning Progression
6.2. Performance Evaluation in Test Scenarios
- Single-goal missions: ≥95% success rate
- Two-subgoal missions: ≥90% success rate
- Three-subgoal missions: ≥75% success rate
- Complex round-trip missions: ≥73% success rate
7. Discussion
7.1. Performance Analysis and Limitations
7.2. Implementation Challenges and Technical Limitations
7.3. Future Directions and Applications
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Lee, G.T.; Kim, C.O. Autonomous control of combat unmanned aerial vehicles to evade surface-to-air missiles using deep reinforcement learning. IEEE Access 2020, 8, 226724–226736.
- Shakhatreh, H.; Sawalmeh, A.H.; Al-Fuqaha, A.; Dou, Z.; Almaita, E.; Khalil, I.; Othman, N.S.; Khreishah, A.; Guizani, M. Unmanned aerial vehicles (UAVs): A survey on civil applications and key research challenges. IEEE Access 2019, 7, 48572–48634.
- Yan, C.; Xiang, X.; Wang, C. Towards real-time path planning through deep reinforcement learning for a UAV in dynamic environments. J. Intell. Robot. Syst. 2020, 98, 297–309.
- Cui, Z.; Wang, Y. UAV path planning based on multi-layer reinforcement learning technique. IEEE Access 2021, 9, 59486–59497.
- Chen, X.; Chen, X.M.; Zhang, J. The dynamic path planning of UAV based on A* algorithm. Appl. Mech. Mater. 2014, 494, 1094–1097.
- Li, J.; Huang, Y.; Xu, Z.; Wang, J.; Chen, M. Path planning of UAV based on hierarchical genetic algorithm with optimized search region. In Proceedings of the 2017 13th IEEE International Conference on Control & Automation (ICCA), Ohrid, Macedonia, 3–6 July 2017; pp. 1033–1038.
- Huang, C.; Fei, J. UAV path planning based on particle swarm optimization with global best path competition. Int. J. Pattern Recognit. Artif. Intell. 2018, 32, 1859008.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Sinha, S.; Mandlekar, A.; Garg, A. S4RL: Surprisingly simple self-supervision for offline reinforcement learning in robotics. In Proceedings of the Conference on Robot Learning (PMLR), Auckland, New Zealand, 14–18 December 2022; pp. 907–917.
- Hwang, H.J.; Jang, J.; Choi, J.; Bae, J.H.; Kim, S.H.; Kim, C.O. Stepwise Soft Actor–Critic for UAV Autonomous Flight Control. Drones 2023, 7, 549.
- Song, Y.; Romero, A.; Müller, M.; Koltun, V.; Scaramuzza, D. Reaching the limit in autonomous racing: Optimal control versus reinforcement learning. Sci. Robot. 2023, 8, eadg1462.
- Ma, B.; Liu, Z.; Dang, Q.; Zhao, W.; Wang, J.; Cheng, Y.; Yuan, Z. Deep reinforcement learning of UAV tracking control under wind disturbances environments. IEEE Trans. Instrum. Meas. 2023, 72, 2510913.
- Ma, B.; Liu, Z.; Zhao, W.; Yuan, J.; Long, H.; Wang, X.; Yuan, Z. Target tracking control of UAV through deep reinforcement learning. IEEE Trans. Intell. Transp. Syst. 2023, 24, 5983–6000.
- Wang, Y.; Boyle, D. Constrained reinforcement learning using distributional representation for trustworthy quadrotor UAV tracking control. IEEE Trans. Autom. Sci. Eng. 2024.
- Choi, J.; Kim, H.M.; Hwang, H.J.; Kim, Y.D.; Kim, C.O. Modular Reinforcement Learning for Autonomous UAV Flight Control. Drones 2023, 7, 418.
- Qu, C.; Gai, W.; Zhong, M.; Zhang, J. A novel reinforcement learning based grey wolf optimizer algorithm for unmanned aerial vehicles (UAVs) path planning. Appl. Soft Comput. 2020, 89, 106099.
- Bouhamed, O.; Ghazzai, H.; Besbes, H.; Massoud, Y. Autonomous UAV navigation: A DDPG-based deep reinforcement learning approach. In Proceedings of the 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Seville, Spain, 12–14 October 2020; pp. 1–5.
- Luo, Y.; Ji, T.; Sun, F.; Liu, H.; Zhang, J.; Jing, M.; Huang, W. Goal-Conditioned Hierarchical Reinforcement Learning With High-Level Model Approximation. IEEE Trans. Neural Netw. Learn. Syst. 2024.
- Ashraf, M.; Gaydamaka, A.; Tan, B.; Moltchanov, D.; Koucheryavy, Y. Low Complexity Algorithms for Mission Completion Time Minimization in UAV-Based Emergency Response. IEEE Trans. Intell. Veh. 2024.
- Yang, R.; Lu, Y.; Li, W.; Sun, H.; Fang, M.; Du, Y.; Li, X.; Han, L.; Zhang, C. Rethinking goal-conditioned supervised learning and its connection to offline RL. arXiv 2022, arXiv:2202.04478.
- Zhao, R.; Sun, X.; Tresp, V. Maximum Entropy-Regularized Multi-Goal Reinforcement Learning. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 7553–7562.
- Nasiriany, S.; Pong, V.; Lin, S.; Levine, S. Planning with goal-conditioned policies. Adv. Neural Inf. Process. Syst. 2019, 32.
- Lee, G.T.; Kim, K. A Controllable Agent by Subgoals in Path Planning Using Goal-Conditioned Reinforcement Learning. IEEE Access 2023, 11, 33812–33825.
- Bhatnagar, S.; Sutton, R.S.; Ghavamzadeh, M.; Lee, M. Natural actor–critic algorithms. Automatica 2009, 45, 2471–2482.
- Burda, Y.; Edwards, H.; Pathak, D.; Storkey, A.; Darrell, T.; Efros, A.A. Large-scale study of curiosity-driven learning. arXiv 2018, arXiv:1808.04355.
- Pathak, D.; Agrawal, P.; Efros, A.A.; Darrell, T. Curiosity-driven exploration by self-supervised prediction. In Proceedings of the International Conference on Machine Learning (PMLR), Sydney, Australia, 6–11 August 2017; pp. 2778–2787.
- Burda, Y.; Edwards, H.; Storkey, A.; Klimov, O. Exploration by random network distillation. arXiv 2018, arXiv:1810.12894.
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; Kavukcuoglu, K. Asynchronous methods for deep reinforcement learning. In Proceedings of the International Conference on Machine Learning (PMLR), New York, NY, USA, 19–24 June 2016; pp. 1928–1937.
- Oh, J.; Guo, Y.; Singh, S.; Lee, H. Self-imitation learning. In Proceedings of the International Conference on Machine Learning (PMLR), Stockholm, Sweden, 10–15 July 2018; pp. 3878–3887.
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
- Bengio, Y.; Louradour, J.; Collobert, R.; Weston, J. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009; pp. 41–48.
- Ivanovic, B.; Harrison, J.; Sharma, A.; Chen, M.; Pavone, M. BaRC: Backward reachability curriculum for robotic reinforcement learning. In Proceedings of the 2019 International Conference on Robotics and Automation (ICRA), Montreal, QC, Canada, 20–24 May 2019; pp. 15–21.
- Silva, F.L.D.; Costa, A.H.R. Object-oriented curriculum generation for reinforcement learning. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, Stockholm, Sweden, 10–15 July 2018; pp. 1026–1034.
- Narvekar, S.; Peng, B.; Leonetti, M.; Sinapov, J.; Taylor, M.E.; Stone, P. Curriculum learning for reinforcement learning domains: A framework and survey. J. Mach. Learn. Res. 2020, 21, 7382–7431.
- Kim, S.; Kim, Y. Three dimensional optimum controller for multiple UAV formation flight using behavior-based decentralized approach. In Proceedings of the 2007 International Conference on Control, Automation and Systems, Seoul, Republic of Korea, 17–20 October 2007; pp. 1387–1392.
- Lee, G.; Kim, K.; Jang, J. Real-time path planning of controllable UAV by subgoals using goal-conditioned reinforcement learning. Appl. Soft Comput. 2023, 146, 110660.
- Konda, V.; Tsitsiklis, J. Actor-critic algorithms. Adv. Neural Inf. Process. Syst. 1999, 12, 1008–1014.
- Razzaghi, P.; Tabrizian, A.; Guo, W.; Chen, S.; Taye, A.; Thompson, E.; Bregeon, A.; Baheri, A.; Wei, P. A survey on reinforcement learning in aviation applications. Eng. Appl. Artif. Intell. 2024, 136, 108911.
- Sanz, D.; Valente, J.; del Cerro, J.; Colorado, J.; Barrientos, A. Safe operation of mini UAVs: A review of regulation and best practices. Adv. Robot. 2015, 29, 1221–1233.
| Method | Simple Tasks (Case 1 & 2) | Complex Tasks (Case 3 & 4) | Round Trip |
|---|---|---|---|
| Goal-Conditioned Only [1] | 80% | 0% | 0% |
| Our Method | 95% | 92% | 77% |