Sim-to-Real Quadrotor Landing via Sequential Deep Q-Networks and Domain Randomization
Abstract
1. Introduction
- The present work is the first to address UAV autonomous landing using a deep reinforcement learning approach with only visual inputs. The agent was trained exclusively with low-resolution grayscale images, without the need for direct supervision, hand-crafted features, or dedicated external sensors.
- A divide-and-conquer approach is used to split a complex task into a sequence of simpler sub-tasks, with a separate DQN assigned to each of them. Internal triggers are learned at training time to switch autonomously between policies. We call this new type of network Sequential Deep Q-Networks (SDQNs); a minimal sketch of the switching mechanism is given after this list.
- A partitioned buffer replay is defined and implemented to speed up learning. This buffer stores experiences in separate partitions based on their relevance (see the second sketch after this list). Note that this technique can be reused in other complex tasks.
- Using SDQNs, the partitioned buffer replay, and domain randomization, a commercial UAV has been trained entirely in simulation and tested in both real and simulated environments. Its performance is comparable to that of human pilots and a state-of-the-art algorithm, while retaining the benefits of DRL, such as learning suitable features in different scenarios.
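To make the SDQN idea concrete, the following is a minimal sketch of the sequential switching mechanism: each sub-task gets its own Q-network whose action set includes one extra "trigger" action, and selecting that trigger hands control to the next network in the chain. The `PhaseDQN` and `SDQNController` names, the network sizes, the flattened 84×84 observation, and the greedy action selection are illustrative assumptions, not the architecture used in the paper.

```python
import torch
import torch.nn as nn


class PhaseDQN(nn.Module):
    """One DQN per sub-task; the last action index is a learned
    'trigger' that hands control to the next network in the sequence."""

    def __init__(self, n_inputs: int, n_motion_actions: int):
        super().__init__()
        self.trigger = n_motion_actions  # index of the trigger action
        self.net = nn.Sequential(
            nn.Linear(n_inputs, 128),
            nn.ReLU(),
            nn.Linear(128, n_motion_actions + 1),  # motion actions + trigger
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class SDQNController:
    """Chains phase-specific DQNs and switches to the next one whenever
    the active DQN greedily selects its trigger action."""

    def __init__(self, phases: list[PhaseDQN]):
        self.phases = phases
        self.active = 0

    def act(self, observation: torch.Tensor) -> int | None:
        dqn = self.phases[self.active]
        action = int(torch.argmax(dqn(observation)))
        if action == dqn.trigger:
            # Hand over to the next phase; stay on the last one if already there.
            self.active = min(self.active + 1, len(self.phases) - 1)
            return None  # no motion command on the switching step
        return action


# Example: two phases (marker alignment, vertical descent) on a flattened
# 84x84 grayscale observation with 5 motion actions each (assumed values).
controller = SDQNController([PhaseDQN(84 * 84, 5), PhaseDQN(84 * 84, 5)])
action = controller.act(torch.zeros(84 * 84))
```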
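Similarly, the second sketch illustrates the partitioned buffer replay: experiences are stored in separate partitions according to their relevance (here approximated by reward sign), and every minibatch draws a fixed fraction from each partition so that rare but informative transitions are not drowned out. The partition keys, capacities, and sampling fractions are assumptions for illustration, not the settings used in the paper.

```python
import random
from collections import deque


class PartitionedReplayBuffer:
    """Stores transitions in separate partitions (keyed here by reward sign)
    and draws a fixed fraction of every minibatch from each partition."""

    def __init__(self, capacity_per_partition: int = 50_000,
                 fractions: dict | None = None):
        self.partitions = {
            "positive": deque(maxlen=capacity_per_partition),
            "negative": deque(maxlen=capacity_per_partition),
            "neutral": deque(maxlen=capacity_per_partition),
        }
        # Fraction of each minibatch taken from every partition (assumed values).
        self.fractions = fractions or {"positive": 0.25, "negative": 0.25, "neutral": 0.5}

    def add(self, state, action, reward, next_state, done):
        key = "positive" if reward > 0 else "negative" if reward < 0 else "neutral"
        self.partitions[key].append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        batch = []
        for key, partition in self.partitions.items():
            k = min(int(batch_size * self.fractions[key]), len(partition))
            batch.extend(random.sample(list(partition), k))
        random.shuffle(batch)
        return batch
```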
2. Related Work
3. Problem Definition and Notation
3.1. Landmark Alignment
3.2. Vertical Descent
3.3. Vehicle Characteristics
4. Proposed Method
4.1. Sequential Deep Q-Networks (SDQN)
4.2. Partitioned Buffer Replay
4.3. Training through Domain Randomization
4.4. Suitability of the Method to Robotics Applications
5. Experiments
5.1. Methods
5.2. Results
- Uniform. The first test was performed on 21 unknown uniform textures belonging to the same categories as the training set. In the marker alignment phase, SDQN-DR reached an accuracy of 91%, while SDQN obtained a lower score (39%). The human pilots scored 90%, the AR-tracker 95%, and the random agent 4%. In the vertical descent phase, SDQN-DR achieved an accuracy of 89%, against 44% for SDQN, 91% for humans, and 98% for the AR-tracker. Table 1 reports a comparison between our method (SDQN-DR) and human pilots. For the human subjects, the average time required to accomplish the task was 24 s (marker alignment) and 46 s (vertical descent), whereas SDQN-DR needed only 12 s and 38 s. The humans were significantly slower but more accurate than the artificial agents. This result highlights a difference in strategy between humans and artificial agents, so the performance of the two groups must be compared with care. Note that in our method the trade-off between accuracy and speed can be tuned through the cost-of-living (the neutral reward), with higher values pushing the agent to complete the task faster but with lower accuracy (see the sketch after this results list).
- Corrupted. The second test was performed on the same 21 unknown textures, but using a marker corrupted by a semi-transparent dust-like layer (Figure 4i). In this condition, we observed a significant drop in the AR-tracker performance, from 95% to 0% (marker alignment) and from 98% to 0% (vertical descent). This result can be explained by the failure of the underlying template-matching algorithm to identify the corrupted marker. Under the same condition, SDQN-DR performed well, with a more limited drop in performance, from 91% to 81% (marker alignment) and from 89% to 51% (vertical descent), showing greater robustness to marker corruption.
- Mixed. The third test was performed by randomly sampling 25 textures from the test set and mixing them in a mosaic-like composition. In the marker alignment phase, SDQN-DR had a success rate of 84%, SDQN 9%, human pilots , and the AR-tracker 82%. For the vertical descent, we registered worse performances for all the agents (SDQN-DR = 82%, SDQN = 40%, Humans = , Random = 1%, AR-tracker = 82%).
- Photo-Realistic. The fourth and last test was performed in three photo-realistic environments: a warehouse, a disaster site, and a power plant (Figure 4e,g). Here we observed a general drop in performance, both for marker alignment (SDQN-DR = 57%, SDQN = 8%, Human = , Random = 4%, AR-tracker = 84%) and for vertical descent (SDQN-DR = 81%, SDQN = 17%, Human = , Random = 1%, AR-tracker = 91%), showing how completing the task in a complex environment is harder for all agents.
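As noted in the Uniform paragraph above, the speed/accuracy trade-off is governed by the cost-of-living, i.e., the small negative reward returned on every neutral (non-terminal) step. The toy reward function below only illustrates that mechanism; the terminal rewards, the 0.01 default, and the `marker_centered` flag are assumptions, not the reward shaping actually used in the paper.

```python
def landing_reward(done: bool, marker_centered: bool, crashed: bool,
                   cost_of_living: float = 0.01) -> float:
    """Per-step reward: a larger cost_of_living pushes the agent to finish
    the episode sooner, typically at the expense of alignment accuracy."""
    if done:
        return 1.0 if marker_centered and not crashed else -1.0
    return -cost_of_living  # neutral step: small negative 'cost of living'
```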
6. Real World Implementation
7. Discussion and Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Gu, S.; Holly, E.; Lillicrap, T.; Levine, S. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 3389–3396.
- Andrychowicz, O.M.; Baker, B.; Chociej, M.; Jozefowicz, R.; McGrew, B.; Pachocki, J.; Petron, A.; Plappert, M.; Powell, G.; Ray, A.; et al. Learning dexterous in-hand manipulation. Int. J. Rob. Res. 2020, 39, 3–20.
- Tai, L.; Paolo, G.; Liu, M. Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 31–36.
- Zhu, Y.; Mottaghi, R.; Kolve, E.; Lim, J.J.; Gupta, A.; Fei-Fei, L.; Farhadi, A. Target-driven visual navigation in indoor scenes using deep reinforcement learning. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017; pp. 3357–3364.
- Kahn, G.; Villaflor, A.; Ding, B.; Abbeel, P.; Levine, S. Self-supervised deep reinforcement learning with generalized computation graphs for robot navigation. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 1–8.
- Ha, D.; Schmidhuber, J. Recurrent world models facilitate policy evolution. In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 2–8 December 2018; pp. 2450–2462.
- Thabet, M.; Patacchiola, M.; Cangelosi, A. Sample-efficient Deep Reinforcement Learning with Imaginary Rollouts for Human-Robot Interaction. arXiv 2019, arXiv:1908.05546.
- Zhang, F.; Leitner, J.; Milford, M.; Corke, P. Modular deep Q networks for sim-to-real transfer of visuo-motor policies. arXiv 2016, arXiv:1610.06781.
- Tobin, J.; Fong, R.; Ray, A.; Schneider, J.; Zaremba, W.; Abbeel, P. Domain randomization for transferring deep neural networks from simulation to the real world. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 23–30.
- Tobin, J.; Biewald, L.; Duan, R.; Andrychowicz, M.; Handa, A.; Kumar, V.; McGrew, B.; Ray, A.; Schneider, J.; Welinder, P.; et al. Domain randomization and generative models for robotic grasping. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 3482–3489.
- Polvara, R.; Patacchiola, M.; Sharma, S.; Wan, J.; Manning, A.; Sutton, R.; Cangelosi, A. Toward End-to-End Control for UAV Autonomous Landing via Deep Reinforcement Learning. In Proceedings of the 2018 International Conference on Unmanned Aircraft Systems (ICUAS), Dallas, TX, USA, 12–15 June 2018; pp. 115–123.
- Van Hasselt, H.; Guez, A.; Silver, D. Deep Reinforcement Learning with Double Q-Learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 2094–2100.
- Thrun, S.; Schwartz, A. Issues in using function approximation for reinforcement learning. In Proceedings of the 1993 Connectionist Models Summer School, 1st ed.; Psychology Press: London, UK, 1993.
- Forster, C.; Faessler, M.; Fontana, F.; Werlberger, M.; Scaramuzza, D. Continuous on-board monocular-vision-based elevation mapping applied to autonomous landing of micro aerial vehicles. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 111–118.
- Sukkarieh, S.; Nebot, E.M.; Durrant-Whyte, H.F. A high integrity IMU/GPS navigation loop for autonomous land vehicle applications. IEEE Trans. Rob. Autom. 1999, 15, 572–578.
- Baca, T.; Stepan, P.; Saska, M. Autonomous landing on a moving car with unmanned aerial vehicle. In Proceedings of the 2017 European Conference on Mobile Robots (ECMR), Paris, France, 6–8 September 2017; pp. 1–6.
- Beul, M.; Houben, S.; Nieuwenhuisen, M.; Behnke, S. Fast autonomous landing on a moving target at MBZIRC. In Proceedings of the 2017 European Conference on Mobile Robots (ECMR), Paris, France, 6–8 September 2017; pp. 1–6.
- Bähnemann, R.; Pantic, M.; Popović, M.; Schindler, D.; Tranzatto, M.; Kamel, M.; Grimm, M.; Widauer, J.; Siegwart, R.; Nieto, J. The ETH-MAV Team in the MBZ International Robotics Challenge. J. Field Rob. 2019, 36, 78–103.
- Gui, Y.; Guo, P.; Zhang, H.; Lei, Z.; Zhou, X.; Du, J.; Yu, Q. Airborne vision-based navigation method for UAV accuracy landing using infrared lamps. J. Intell. Rob. Syst. 2013, 72, 197.
- Tang, D.; Hu, T.; Shen, L.; Zhang, D.; Kong, W.; Low, K.H. Ground stereo vision-based navigation for autonomous take-off and landing of UAVs: A Chan-Vese model approach. Int. J. Adv. Rob. Syst. 2016, 13, 67.
- Lin, S.; Garratt, M.A.; Lambert, A.J. Monocular vision-based real-time target recognition and tracking for autonomously landing an UAV in a cluttered shipboard environment. Auton. Robots 2017, 41, 881–901.
- Falanga, D.; Zanchettin, A.; Simovic, A.; Delmerico, J.; Scaramuzza, D. Vision-based Autonomous Quadrotor Landing on a Moving Platform. In Proceedings of the 2017 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Shanghai, China, 11–13 October 2017; pp. 200–207.
- Serra, P.; Cunha, R.; Hamel, T.; Cabecinhas, D.; Silvestre, C. Landing of a Quadrotor on a Moving Target Using Dynamic Image-Based Visual Servo Control. IEEE Trans. Rob. 2016, 32, 1524–1535.
- Lee, D.; Ryan, T.; Kim, H.J. Autonomous landing of a VTOL UAV on a moving platform using image-based visual servoing. In Proceedings of the 2012 IEEE International Conference on Robotics and Automation, Saint Paul, MN, USA, 14–18 May 2012; pp. 971–976.
- Kersandt, K. Deep Reinforcement Learning as Control Method for Autonomous UAVs. Master's Thesis, Universitat Politècnica de Catalunya, Catalonia, Spain, February 2017.
- Xu, Y.; Liu, Z.; Wang, X. Monocular Vision based Autonomous Landing of Quadrotor through Deep Reinforcement Learning. In Proceedings of the 2018 37th Chinese Control Conference (CCC), Wuhan, China, 25–27 July 2018; pp. 10014–10019.
- Lee, S.; Shim, T.; Kim, S.; Park, J.; Hong, K.; Bang, H. Vision-Based Autonomous Landing of a Multi-Copter Unmanned Aerial Vehicle using Reinforcement Learning. In Proceedings of the 2018 International Conference on Unmanned Aircraft Systems (ICUAS), Dallas, TX, USA, 12–15 June 2018; pp. 108–114.
- Rodriguez-Ramos, A.; Sampedro, C.; Bavle, H.; De La Puente, P.; Campoy, P. A deep reinforcement learning strategy for UAV autonomous landing on a moving platform. J. Intell. Rob. Syst. 2019, 93, 351–366.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized experience replay. arXiv 2015, arXiv:1511.05952.
- Nonami, K.; Kendoul, F.; Suzuki, S.; Wang, W.; Nakazawa, D. Autonomous Flying Robots: Unmanned Aerial Vehicles and Micro Aerial Vehicles, 1st ed.; Springer: Berlin, Germany, 2010.
- Goldstein, H. Classical Mechanics, 2nd ed.; World Student Series; Addison-Wesley: Reading, MA, USA; Menlo Park, CA, USA; Amsterdam, The Netherlands, 1980.
- Barto, A.G.; Mahadevan, S. Recent advances in hierarchical reinforcement learning. Discrete Event Dyn. Syst. 2003, 13, 341–379.
- Narasimhan, K.; Kulkarni, T.; Barzilay, R. Language understanding for text-based games using deep reinforcement learning. arXiv 2015, arXiv:1506.08941.
- Wawrzyński, P.; Tanwani, A.K. Autonomous reinforcement learning with experience replay. Neural Netw. 2013, 41, 156–167.
- Polvara, R.; Sharma, S.; Wan, J.; Manning, A.; Sutton, R. Towards autonomous landing on a moving vessel through fiducial markers. In Proceedings of the 2017 European Conference on Mobile Robots (ECMR), Paris, France, 6–8 September 2017; pp. 1–6.
Table 1. Comparison between human pilots and SDQN-DR in the marker alignment and vertical descent phases (SR = success rate, T = average completion time, X/Y = mean (standard deviation) offset in metres).

| Method | Alignment SR | Alignment T | Alignment X | Alignment Y | Descent SR | Descent T | Descent X | Descent Y |
|---|---|---|---|---|---|---|---|---|
| Humans | 90% | 24 s | 0.47 (0.35) m | 0.49 (0.37) m | 91% | 46 s | 0.22 (0.17) m | 0.23 (0.18) m |
| SDQN-DR [ours] | 91% | 12 s | 0.80 (0.40) m | 0.78 (0.41) m | 89% | 38 s | 0.28 (0.18) m | 0.28 (0.18) m |
Success rate per test condition for the marker alignment (MA) and vertical descent (VD) phases.

| Method | MA Uniform | MA Corrupted | MA Mixed | MA Photo-Real. | MA TOT | VD Uniform | VD Corrupted | VD Mixed | VD Photo-Real. | VD TOT |
|---|---|---|---|---|---|---|---|---|---|---|
| Random agent | 4% | 4% | 4% | 4% | 4% | 1% | 1% | 1% | 1% | 1% |
| AR-Tracker [37] | 95% | 0% | 82% | 84% | 65% | 98% | 0% | 82% | 91% | 68% |
| SDQN [ours] | 39% | 27% | 9% | 8% | 21% | 44% | 18% | 40% | 17% | 30% |
| SDQN-DR [ours] | 91% | 81% | 84% | 57% | 78% | 89% | 51% | 82% | 81% | 75% |
Action-Space Robustness

| Phase | t = 2.0 | t = 1.0 | t = 0.5 | t = 0.25 |
|---|---|---|---|---|
| Marker alignment | 91% | 91% | 89% | 47% |
| Vertical descent | 89% | 78% | 82% | 42% |

Drift Robustness

| Phase | a = 0.0 | a = 0.1 | a = 0.2 | a = 0.3 |
|---|---|---|---|---|
| Marker alignment | 91% | 79% | 72% | 68% |
| Vertical descent | 89% | 82% | 82% | 82% |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).