iADA*-RL: Anytime Graph-Based Path Planning with Deep Reinforcement Learning for an Autonomous UAV
Abstract
1. Introduction
- Proposed a hybrid path planning algorithm that integrates iADA* for global planning with deep reinforcement learning for local planning of a UAV.
- Developed a simulator that mimics realistic features of real-world environments so that a UAV can learn adaptive behaviors.
- Performed a comprehensive evaluation and validation of the proposed hybrid path planning system on the AirSim simulation platform. The effectiveness of the proposed system for an autonomous UAV is investigated through simulations and experiments.
2. Proposed iADA*-RL Planning Algorithm
2.1. Overall Description
2.2. Global Path Planning and Re-Planning iADA* Algorithm
Algorithm 1: iADA*
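The full iADA* pseudocode is given in the original iADA* paper by Maw et al.; at a high level it behaves like an anytime weighted-A* search that returns a fast suboptimal path first and keeps improving it while planning time remains, repairing the plan when the environment changes. The following is only a minimal, generic sketch of such an anytime loop, assuming a simple adjacency-list graph; the function names, inflation schedule, and time budget are illustrative and do not reproduce the published algorithm or its incremental replanning step.

```python
import heapq
import itertools
import time

def weighted_astar(graph, start, goal, h, epsilon):
    """One weighted-A* search: f = g + epsilon * h, fast but up to epsilon-suboptimal."""
    counter = itertools.count()  # tie-breaker so the heap never compares node objects
    open_heap = [(epsilon * h(start, goal), next(counter), 0.0, start, [start])]
    best_g = {start: 0.0}
    while open_heap:
        _, _, g, node, path = heapq.heappop(open_heap)
        if node == goal:
            return path, g
        for nbr, cost in graph.get(node, []):  # graph: {node: [(neighbor, edge_cost), ...]}
            new_g = g + cost
            if new_g < best_g.get(nbr, float("inf")):
                best_g[nbr] = new_g
                f = new_g + epsilon * h(nbr, goal)
                heapq.heappush(open_heap, (f, next(counter), new_g, nbr, path + [nbr]))
    return None, float("inf")

def anytime_plan(graph, start, goal, h, time_budget=0.05, eps0=2.5, eps_step=0.5):
    """Anytime loop: keep the best path found before the time budget runs out,
    tightening the heuristic inflation on each pass (eps0, eps0 - eps_step, ..., 1.0)."""
    deadline = time.time() + time_budget
    epsilon, best_path, best_cost = eps0, None, float("inf")
    while epsilon >= 1.0 and time.time() < deadline:
        path, cost = weighted_astar(graph, start, goal, h, epsilon)
        if path is not None and cost < best_cost:
            best_path, best_cost = path, cost
        epsilon -= eps_step
    return best_path, best_cost
```

Calling anytime_plan with a larger time_budget simply allows more iterations at lower inflation factors, trading runtime for path quality; this anytime property is what the global planner exploits when a quick, feasible route is needed before an improved one.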
2.3. Vehicle Dynamics
2.4. Learning Algorithm for Local Planning
2.4.1. RL-Based Local Planner
2.4.2. Problem Statement
2.4.3. Simulator Development
2.5. RL Agent Training
1. Depth image data: The state representation must provide the information the agent needs to avoid obstacles in real time. To do so, the agent receives a pre-processed depth image from the drone’s front camera, and it follows and returns to the planned path and waypoints provided by the iADA* global path planner. The agent also receives the current position and state of the drone. The depth image shown in Figure 7 maps the distances to the obstacle surfaces seen by the camera to grey-scale values in an image of 256 × 144 pixels. In this experiment, we apply a basic image-processing step to reduce the dimensionality of the input and thereby the computation and memory requirements. Since the drone typically moves along a fixed plane, the middle section of the image is the most relevant, and the pre-processed image is reduced to 100 × 30 pixels (a pre-processing sketch is given after this list).
2. Relative goal position and current velocity: The relative goal position is a vector that represents the goal in polar coordinates (distance and direction) with respect to the current position of the UAV. The distance and the direction to the destination are calculated using Equation (2). The reward used in the learning process is then set using Equation (3), following the approach in [2]. When the UAV reaches the destination, i.e., when it comes within a threshold distance of the goal, an arrival reward is given. When it collides with obstacles such as walls, other UAVs, or cars, a collision reward is given. In all other cases, a positive or negative reward is given based on the relationship between the UAV and the destination: a positive reward for actions that approach the destination and a negative reward for actions that move away from it, with the distance-based term attenuated over time by a decay hyper-parameter. If the drone bumps into an obstacle, either static or dynamic, the agent receives a reward of −50. A new epoch then starts with the drone at the initial position [0, 0, 0], expressed in the North, East, Down (NED) coordinates of the AirSim simulator. To update the weights of the network, we apply mini-batch gradient descent to the loss function (a sketch of the goal representation and reward shaping also follows this list).
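As referenced in item 1, the depth image is cropped to its middle band and downsized before being fed to the agent. The snippet below is a minimal sketch of such a pre-processing step, assuming the frame has already been decoded into a NumPy array (e.g., from an AirSim DepthPerspective response); the crop boundaries, the maximum depth used for scaling, and the use of OpenCV for resizing are illustrative assumptions rather than the paper's exact pipeline.

```python
import numpy as np
import cv2  # OpenCV, used here only for resizing

def preprocess_depth(depth_img, out_w=100, out_h=30, max_depth=100.0):
    """Crop the middle horizontal band of a 256x144 depth image and
    downsize it to 100x30, with distances mapped to grey-scale in [0, 1].

    depth_img: float array of shape (144, 256) with distances in metres.
    """
    h, w = depth_img.shape
    band_h = h // 3                           # keep the middle third: the drone flies in a roughly fixed plane
    top = (h - band_h) // 2
    band = depth_img[top:top + band_h, :]
    band = np.clip(band, 0.0, max_depth) / max_depth   # map distances to [0, 1]
    return cv2.resize(band, (out_w, out_h), interpolation=cv2.INTER_AREA)

# Example with a dummy frame at the camera resolution quoted in the text:
dummy = np.random.uniform(0.0, 100.0, size=(144, 256)).astype(np.float32)
state_img = preprocess_depth(dummy)           # shape (30, 100)
```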
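For item 2, the relative goal vector and the shaped reward can be sketched as follows. Only the collision penalty of −50 and the NED frame come from the text; the arrival radius, arrival bonus, and decay factor are illustrative assumptions, and the exact forms are those of Equations (2) and (3) in the paper.

```python
import math

def relative_goal(pos, goal):
    """Equation (2)-style polar representation of the goal relative to the UAV:
    distance to the goal and bearing from the current position (NED frame, metres)."""
    dn, de = goal[0] - pos[0], goal[1] - pos[1]
    distance = math.hypot(dn, de)
    bearing = math.atan2(de, dn)
    return distance, bearing

def shaped_reward(prev_dist, dist, collided, arrival_radius=2.0, decay=0.99, step=0):
    """Equation (3)-style shaping: arrival bonus, collision penalty of -50,
    otherwise a decayed reward proportional to progress toward the goal."""
    if collided:
        return -50.0                          # collision penalty stated in the text
    if dist < arrival_radius:                 # assumed arrival condition (threshold radius)
        return 100.0                          # assumed arrival bonus
    progress = prev_dist - dist               # > 0 when approaching, < 0 when moving away
    return (decay ** step) * progress         # distance-based term attenuated over time
```

For the network update itself, a standard DQN mini-batch temporal-difference loss has the form L = E[(r + γ max_a′ Q_target(s′, a′) − Q(s, a))²], and the DDPG variant replaces the max with the target actor’s action; these are the usual formulations and are given only as context for the update step described above.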
2.6. Learning Phase
3. Case Study: Mission Planning for an Autonomous UAV
3.1. Experimental Setting in Simulation
3.2. Experiments and Results
3.3. Discussions and Future Work
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Park, J.; Kim, J.; Song, J. Path planning for a robot manipulator based on probabilistic roadmap and reinforcement learning. Int. J. Control Autom. Syst. 2007, 5, 674–680.
- Kato, Y.; Kamiyama, K.; Morioka, K. Autonomous robot navigation system with learning based on deep Q-network and topological maps. In Proceedings of the 2017 IEEE/SICE International Symposium on System Integration (SII), Taipei, Taiwan, 11–14 December 2017; pp. 1040–1046.
- Bello, I.; Pham, H.; Le, Q.V.; Norouzi, M.; Bengio, S. Neural Combinatorial Optimization with Reinforcement Learning. arXiv 2016, arXiv:1611.09940.
- Huang, H.; Yang, Y.; Wang, H.; Ding, Z.; Sari, H.; Adachi, F. Deep Reinforcement Learning for UAV Navigation Through Massive MIMO Technique. IEEE Trans. Veh. Technol. 2020, 69, 1117–1121.
- Faust, A.; Palunko, I.; Cruz, P.; Fierro, R.; Tapia, L. Automated aerial suspended cargo delivery through reinforcement learning. Artif. Intell. 2017, 247, 381–398.
- Faust, A.; Ramirez, O.; Fiser, M.; Oslund, K.; Francis, A.; Davidson, J.; Tapia, L. PRM-RL: Long-range Robotic Navigation Tasks by Combining Reinforcement Learning and Sampling-based Planning. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, Australia, 21–25 May 2018.
- Kavraki, L.E.; Svestka, P.; Latombe, J.; Overmars, M.H. Probabilistic roadmaps for path planning in high-dimensional configuration spaces. IEEE Trans. Robot. Autom. 1996, 12, 566–580.
- Dai, H.; Khalil, E.B.; Zhang, Y.; Dilkina, B.; Song, L. Learning Combinatorial Optimization Algorithms over Graphs. arXiv 2017, arXiv:1704.01665.
- Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer Networks. arXiv 2015, arXiv:1506.03134.
- Francis, A.; Faust, A.; Chiang, H.T.L.; Hsu, J.; Kew, J.C.; Fiser, M.; Lee, T.W.E. Long-Range Indoor Navigation with PRM-RL. IEEE Trans. Robot. 2020, 36, 1115–1134.
- Chiang, H.T.L.; Hsu, J.; Fiser, M.; Tapia, L.; Faust, A. RL-RRT: Kinodynamic Motion Planning via Learning Reachability Estimators from RL Policies. IEEE Robot. Autom. Lett. 2019, 4, 4298–4305.
- Maw, A.A.; Tyan, M.; Lee, J.W. iADA*: Improved Anytime Path Planning and Replanning Algorithm for Autonomous Vehicle. J. Intell. Robot. Syst. 2020, 100, 1005–1013.
- Shah, S.; Dey, D.; Lovett, C.; Kapoor, A. AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles. In Field and Service Robotics; Springer: Cham, Switzerland, 2017.
- Sanders, A. An Introduction to Unreal Engine 4; A. K. Peters, Ltd.: Natick, MA, USA, 2016.
- Juliani, A.; Berges, V.; Vckay, E.; Gao, Y.; Henry, H.; Mattar, M.; Lange, D. Unity: A General Platform for Intelligent Agents. arXiv 2018, arXiv:1809.02627.
- Fielding, R.T. Architectural Styles and the Design of Network-Based Software Architectures. Ph.D. Thesis, University of California, Irvine, CA, USA, 2000.
- Meier, L.; Tanskanen, P.; Fraundorfer, F.; Pollefeys, M. PIXHAWK: A system for autonomous flight using onboard computer vision. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 2992–2997.
- Qin, T.; Li, P.; Shen, S. VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator. IEEE Trans. Robot. 2018, 34, 1004–1020.
- García, J.; Molina, J.M. Simulation in real conditions of navigation and obstacle avoidance with PX4/Gazebo platform. Pers. Ubiquitous Comput. 2020.
- Mendoza-Mendoza, J.A.; Gonzalez-Villela, V.; Sepulveda-Cervantes, G.; Mendez-Martinez, M.; Sossa-Azuela, H. ArduPilot Working Environment. In Advanced Robotic Vehicles Programming: An Ardupilot and Pixhawk Approach; Apress: Berkeley, CA, USA, 2020; pp. 19–46.
- Bennett, S. Development of the PID controller. IEEE Control Syst. Mag. 1993, 13, 58–62.
- Rohmer, E.; Singh, S.P.N.; Freese, M. V-REP: A versatile and scalable robot simulation framework. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 1321–1326.
- Szepesvári, C. Algorithms for Reinforcement Learning. Synth. Lect. Artif. Intell. Mach. Learn. 2010, 4, 1–103.
- Kober, J.; Bagnell, J.A.; Peters, J. Reinforcement learning in robotics: A survey. Int. J. Robot. Res. 2013, 32, 1238–1274.
- Kormushev, P.; Calinon, S.; Caldwell, D. Reinforcement Learning in Robotics: Applications and Real-World Challenges. Robotics 2013, 2, 122–148.
- Sanghi, N. Markov Decision Processes. In Deep Reinforcement Learning with Python: With PyTorch, TensorFlow and OpenAI Gym; Apress: Berkeley, CA, USA, 2021; pp. 19–48.
- Koenig, N.; Howard, A. Design and use paradigms for Gazebo, an open-source multi-robot simulator. In Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566), Coimbra, Portugal, 8–12 September 2004; Volume 3, pp. 2149–2154.
- Hernandez, A.; Copot, C.; De Keyser, R.; Vlas, T.; Nascu, I. Identification and path following control of an AR.Drone quadrotor. In Proceedings of the 2013 17th International Conference on System Theory, Control and Computing (ICSTCC), Sinaia, Romania, 11–13 October 2013; pp. 583–588.
- Qiu, W.; Yuille, A. UnrealCV: Connecting Computer Vision to Unreal Engine. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016.
- Krishnan, S.; Boroujerdian, B.; Fu, W.; Faust, A.; Reddi, V.J. Air Learning: An AI Research Platform for Algorithm-Hardware Benchmarking of Autonomous Aerial Robots. arXiv 2019, arXiv:1906.00421.
- Boroujerdian, B.; Genc, H.; Krishnan, S.; Cui, W.; Faust, A.; Reddi, V.J. MAVBench: Micro Aerial Vehicle Benchmarking. arXiv 2019, arXiv:1905.06388.
- Hornung, A.; Wurm, K.M.; Bennewitz, M.; Stachniss, C.; Burgard, W. OctoMap: An efficient probabilistic 3D mapping framework based on octrees. Auton. Robot. 2013, 34, 189–206.
- Fujimura, K.; Kunii, T.; Yamaguchi, K.; Toriya, H. Octree-Related Data Structures and Algorithms. IEEE Comput. Graph. Appl. 1984, 4, 53–59.
- Chen, T.; Shan, J. A novel cable-suspended quadrotor transportation system: From theory to experiment. Aerosp. Sci. Technol. 2020, 104, 105974.
| | iADA* | iADA* + DQN | iADA* + DDPG |
|---|---|---|---|
| Success | 47 | 50 | 50 |
| Failure | 3 | – | – |
| Success rate | 94% | 100% | 100% |

| Task | iADA* | iADA* + DQN | iADA* + DDPG |
|---|---|---|---|
| Success | 37 | 47 | 45 |
| Failure | 13 | 3 | 5 |
| Success rate | 74% | 94% | 90% |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).