Fast and High-Quality Monocular Depth Estimation with Optical Flow for Autonomous Drones
Abstract
1. Introduction
2. Proposed Method
2.1. Optical Flow Attention and Skip Connection
2.2. Perceptual Discriminator
3. Experiments
3.1. Implementation
3.2. Accuracy, Error and Latency Evaluation and Ablation Study
3.3. Collision Rate Evaluation
4. Results
4.1. Accuracy, Error and Latency Evaluation
4.2. Ablation Study
4.3. Collision Rate Evaluation
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
CNN | Convolutional Neural Network |
UAV | Unmanned Aerial Vehicle |
VTOL | Vertical Take-Off and Landing |
HTOL | Horizontal Take-Off and Landing |
DNN | Deep Neural Network |
GPU | Graphics Processing Unit |
CGAN | Conditional Generative Adversarial Network |
SLAM | Simultaneous Localization and Mapping |
VGG | Visual Geometry Group |
References
1. Macrina, G.; Pugliese, L.D.P.; Guerriero, F.; Laporte, G. Drone-aided Routing: A Literature Review. Transp. Res. Part C Emerg. Technol. 2020, 120, 102762.
2. Fotouhi, A.; Qiang, H.; Ding, M.; Hassan, M.; Giordano, L.G.; Garcia-Rodriguez, A.; Yuan, J. Survey on UAV cellular communications: Practical aspects, standardization advancements, regulation, and security challenges. IEEE Commun. Surv. Tutor. 2019, 21, 3417–3442.
3. Scott, M.J.; Verhagen, W.J.; Bieber, M.T.; Marzocca, P. A Systematic Literature Review of Predictive Maintenance for Defence Fixed-Wing Aircraft Sustainment and Operations. Sensors 2022, 22, 7070.
4. Susanto, T.; Setiawan, M.B.; Jayadi, A.; Rossi, F.; Hamdhi, A.; Sembiring, J.P. Application of Unmanned Aircraft PID Control System for Roll, Pitch and Yaw Stability on Fixed Wings. In Proceedings of the 2021 International Conference on Computer Science, Information Technology, and Electrical Engineering (ICOMITEE), IEEE, Banyuwangi, Indonesia, 27–28 October 2021; pp. 186–190.
5. Ito, S.; Akaiwa, K.; Funabashi, Y.; Nishikawa, H.; Kong, X.; Taniguchi, I.; Tomiyama, H. Load and Wind Aware Routing of Delivery Drones. Drones 2022, 6, 50.
6. Fuhrman, T.; Schneider, D.; Altenberg, F.; Nguyen, T.; Blasen, S.; Constantin, S.; Waibe, A. An interactive indoor drone assistant. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, Macau, China, 3–8 November 2019; pp. 6052–6057.
7. Hou, Y.; Zhang, Z.; Wang, C.; Cheng, S.; Ye, D. Research on Vehicle Identification Method and Vehicle Speed Measurement Method Based on Multi-Rotor UAV Equipped with LiDAR. In Proceedings of the IEEE International Conference on Advanced Electronic Materials, Computers and Software Engineering, Shenzhen, China, 24–26 April 2020.
8. Moffatt, A.; Platt, E.; Mondragon, B.; Kwok, A.; Uryeu, D.; Bhandari, S. Obstacle Detection and Avoidance System for Small UAVs Using A LiDAR. In Proceedings of the IEEE International Conference on Unmanned Aircraft Systems, Athens, Greece, 1–4 September 2020.
9. Li, J.; Liu, Y.; Du, S.; Wu, P.; Xu, Z. Hierarchical and adaptive phase correlation for precise disparity estimation of UAV images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7092–7104.
10. Liu, J.; Zhang, L.; Wang, Z.; Wang, R. Dense Stereo Matching Strategy for Oblique Images That Considers the Plane Directions in Urban Areas. IEEE Trans. Geosci. Remote Sens. 2020, 58, 5109–5116.
11. McGee, T.G.; Sengupta, R.; Hedrick, K. Obstacle Detection for Small Autonomous Aircraft using Sky Segmentation. In Proceedings of the IEEE International Conference on Robotics and Automation, Barcelona, Spain, 18–22 April 2005; pp. 4679–4684.
12. Valisetty, R.; Haynes, R.; Namburu, R.; Lee, M. Machine Learning for US Army UAVs Sustainment: Assessing Effect of Sensor Frequency and Placement on Damage Information in The Ultrasound Signals. In Proceedings of the IEEE International Conference on Machine Learning and Applications, Orlando, FL, USA, 17–20 December 2018; pp. 165–172.
13. Figetakis, E.; Refaey, A. UAV Path Planning Using on-Board Ultrasound Transducer Arrays and Edge Support. In Proceedings of the IEEE International Conference on Communications Workshops, Montreal, QC, Canada, 14–23 June 2021; pp. 1–6.
14. Lidar, V. Velodyne Lidar Products. Available online: https://velodynelidar.com/products/ (accessed on 1 March 2022).
15. Laina, I.; Rupprecht, C.; Belagiannis, V.; Tombari, F.; Navab, N. Deeper Depth Prediction with Fully Convolutional Residual Networks. In Proceedings of the Fourth International Conference on 3D Vision, Stanford, CA, USA, 25–28 October 2016; pp. 239–248.
16. Zhang, Z.; Xu, C.; Yang, J.; Gao, J.; Cui, Z. Progressive Hard-Mining Network for Monocular Depth Estimation. IEEE Trans. Image Process. 2018, 27, 3691–3702.
17. Li, J.; Klein, R.; Yao, A. A Two-Streamed Network for Estimating Fine-Scaled Depth Maps from Single RGB Images. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 3372–3380.
18. Kuznietsov, Y.; Stuckler, J.; Leibe, B. Semi-Supervised Deep Learning for Monocular Depth Map Prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6647–6655.
19. Eigen, D.; Fergus, R. Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 2650–2658.
20. Eigen, D.; Puhrsch, C.; Fergus, R. Depth Map Prediction from A Single Image Using A Multi-Scale Deep Network. Adv. Neural Inf. Process. Syst. 2014, 27.
21. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
22. Lee, J.H.; Han, M.K.; Ko, D.W.; Suh, I.H. From Big to Small: Multi-scale Local Planar Guidance for Monocular Depth Estimation. arXiv 2019, arXiv:1907.10326.
23. Liu, F.; Shen, C.; Lin, G. Deep Convolutional Neural Fields for Depth Estimation from a Single Image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 5162–5170.
24. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
25. Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
26. Bhat, S.F.; Alhashim, I.; Wonka, P. Adabins: Depth Estimation Using Adaptive Bins. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 4009–4018.
27. Li, Y.; Wang, Y.; Lu, Z.; Xiao, J. DepthGAN: GAN-based Depth Generation of Indoor Scenes from Semantic Layouts. arXiv 2022, arXiv:2203.11453.
28. Kwak, D.h.; Lee, S.h. A Novel Method for Estimating Monocular Depth Using Cycle GAN and Segmentation. Sensors 2020, 20, 2567.
29. Fraga-Lamas, P.; Ramos, L.; Mondéjar-Guerra, V.; Fernández-Caramés, T.M. A Review on IoT Deep Learning UAV Systems for Autonomous Obstacle Detection and Collision Avoidance. Remote Sens. 2019, 11, 2144.
30. Muruganathan, S.D.; Lin, X.; Määttänen, H.L.; Sedin, J.; Zou, Z.; Hapsari, W.A.; Yasukawa, S. An Overview of 3GPP Release-15 Study on Enhanced LTE Support for Connected Drones. IEEE Commun. Stand. Mag. 2021, 5, 140–146.
31. Koubâa, A.; Ammar, A.; Alahdab, M.; Kanhouch, A.; Azar, A.T. Deepbrain: Experimental Evaluation of Cloud-Based Computation Offloading and Edge Computing in The Internet-of-Drones for Deep Learning Applications. Sensors 2020, 20, 5240.
32. Wang, L.; Zhang, J.; Wang, O.; Lin, Z.; Lu, H. SDC-Depth: Semantic Divide-and-Conquer Network for Monocular Depth Estimation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 541–550.
33. Shimada, T.; Nishikawa, H.; Kong, X.; Tomiyama, H. Pix2Pix-Based Depth Estimation from Monocular Images for Dynamic Path Planning of Multirotor on AirSim. In Proceedings of the International Symposium on Advanced Technologies and Applications in the Internet of Things, Kusatsu, Japan, 23–24 August 2021.
34. Shimada, T.; Nishikawa, H.; Kong, X.; Tomiyama, H. Pix2Pix-Based Monocular Depth Estimation for Drones with Optical Flow on AirSim. Sensors 2022, 22, 2097.
35. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
36. Yang, X.; Chen, J.; Dang, Y.; Luo, H.; Tang, Y.; Liao, C.; Chen, P.; Cheng, K.T. Fast Depth Prediction and Obstacle Avoidance on A Monocular Drone Using Probabilistic Convolutional Neural Network. IEEE Trans. Intell. Transp. Syst. 2019, 22, 156–167.
37. Mur-Artal, R.; Montiel, J.M.M.; Tardos, J.D. ORB-SLAM: A Versatile and Accurate Monocular SLAM System. IEEE Trans. Robot. 2015, 31, 1147–1163.
38. Mirza, M.; Osindero, S. Conditional Generative Adversarial Nets. arXiv 2014, arXiv:1411.1784.
39. Arslan, A.T.; Seke, E. Face Depth Estimation With Conditional Generative Adversarial Networks. IEEE Access 2019, 7, 23222–23231.
40. Baby, A.T.; Andrews, A.; Dinesh, A.; Joseph, A.; Anjusree, V. Face Depth Estimation and 3D Reconstruction. In Proceedings of the 2020 Advanced Computing and Communication Technologies for High Performance Applications, Cochin, India, 2–4 July 2020; pp. 125–132.
41. Farnebäck, G. Two-Frame Motion Estimation Based on Polynomial Expansion. In Proceedings of the Scandinavian Conference on Image Analysis, Halmstad, Sweden, 29 June–2 July 2003; pp. 363–370.
42. Ronneberger, O.; Fischer, P.; Brox, T. U-NET: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
43. Sungatullina, D.; Zakharov, E.; Ulyanov, D.; Lempitsky, V. Image Manipulation with Perceptual Discriminators. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 579–595.
44. Shah, S.; Dey, D.; Lovett, C.; Kapoor, A. AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles. In Proceedings of the Field and Service Robotics, Zurich, Switzerland, 12–15 September 2017.
45. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision Meets Robotics: The KITTI Dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
46. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired Image-To-Image Translation Using Cycle-Consistent Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017.
47. Perez, E.; Winger, A.; Tran, A.; Garcia-Paredes, C.; Run, N.; Keti, N.; Bhandari, S.; Raheja, A. Autonomous Collision Avoidance System for a Multicopter using Stereoscopic Vision. In Proceedings of the IEEE International Conference on Unmanned Aircraft Systems, Dallas, TX, USA, 12–15 June 2018.
Method | RMSE (↓) | Rel. (↓) | δ < 1.25 (↑) | δ < 1.25² (↑) | δ < 1.25³ (↑) |
---|---|---|---|---|---|
Shimada et al. [33] | 5.924 | 0.133 | 0.882 | 0.956 | 0.977 |
Shimada et al. [34] | 5.917 | 0.131 | 0.886 | 0.957 | 0.982 |
CycleGAN [46] | 5.961 | 0.135 | 0.891 | 0.960 | 0.984 |
Ours | 5.771 | 0.131 | 0.898 | 0.970 | 0.994 |
Method | RMSE (↓) | Rel. (↓) | δ < 1.25 (↑) | δ < 1.25² (↑) | δ < 1.25³ (↑) |
---|---|---|---|---|---|
Eigen et al. [19] | 7.156 | 1.515 | 0.692 | 0.899 | 0.967 |
Liu et al. [23] | 6.986 | 0.217 | 0.647 | 0.882 | 0.961 |
Kuznietsov et al. [18] | 4.621 | 0.113 | 0.862 | 0.960 | 0.986 |
Shimada et al. [34] | 7.605 | 0.154 | 0.813 | 0.958 | 0.985 |
Xin et al. [36] | 5.752 | 0.125 | 0.869 | 0.956 | 0.980 |
Ours | 4.712 | 0.121 | 0.870 | 0.973 | 0.992 |
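For readers who want to reproduce numbers of the kind reported in the error and accuracy tables above, the following is a minimal sketch of the conventional monocular-depth metrics: root-mean-square error (RMSE), mean absolute relative error (Rel.), and threshold accuracy. It assumes the standard definitions from the depth-estimation literature, with thresholds 1.25, 1.25² and 1.25³; the function name `depth_metrics`, the clamping constant, and the rule of skipping pixels with non-positive ground truth are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import Dict, Sequence


def depth_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """Compute RMSE, mean absolute relative error, and threshold accuracies.

    `pred` and `gt` are flattened depth maps of equal length (hypothetical
    interface). Pixels whose ground-truth depth is not positive are ignored.
    """
    # Clamp predictions away from zero so the ratio gt / pred is always defined.
    pairs = [(max(p, 1e-6), g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)

    # Error metrics: lower is better.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    rel = sum(abs(p - g) / g for p, g in pairs) / n

    # Threshold accuracy: share of pixels with max(p/g, g/p) < 1.25 ** k.
    ratios = [max(p / g, g / p) for p, g in pairs]
    metrics = {"rmse": rmse, "rel": rel}
    for k in (1, 2, 3):
        metrics[f"delta{k}"] = sum(r < 1.25 ** k for r in ratios) / n
    return metrics


if __name__ == "__main__":
    # Toy example: the last pixel has no valid ground truth and is skipped.
    print(depth_metrics([1.0, 2.2, 8.0, 5.0], [1.0, 2.0, 4.0, 0.0]))
```

In practice the same formulas would be applied to whole depth maps with an array library; plain Python is used here only to keep the sketch dependency-free.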
Method | Nano [ms] | Xavier [ms] | RTX 2070 SUPER [ms] |
---|---|---|---|
Eigen et al. [19] | 8.8 | 5.1 | 1.4 |
Shimada et al. [33] | 18.4 | 13.4 | 2.9 |
Shimada et al. [34] | 18.0 | 12.5 | 2.7 |
CycleGAN [46] | 32.1 | 23.1 | 4.5 |
Xin et al. [36] | 33.4 | 22.2 | 4.8 |
Kuznietsov et al. [18] | 70.1 | 39.1 | 9.6 |
Ours | 23.8 | 14.5 | 3.6 |
Method | Nano [ms] | Xavier [ms] | RTX 2070 SUPER [ms] |
---|---|---|---|
Eigen et al. [19] | 9.5 | 7.0 | 1.6 |
Shimada et al. [33] | 18.4 | 12.9 | 3.1 |
Shimada et al. [34] | 17.7 | 13.4 | 3.0 |
CycleGAN [46] | 34.5 | 24.6 | 5.7 |
Xin et al. [36] | 35.2 | 24.8 | 5.5 |
Kuznietsov et al. [18] | 76.3 | 41.3 | 12.0 |
Ours | 25.9 | 16.4 | 4.3 |
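To relate the per-frame inference times to a frame rate, a rough conversion can be sketched as below. It uses the "Ours" row of the inference-time table directly above and assumes, purely for illustration, that frames are processed one at a time and that network inference is the only per-frame cost; camera capture, optical flow computation and path planning are not included, so real throughput would be lower. The helper name `max_fps` is hypothetical.

```python
def max_fps(latency_ms: float) -> float:
    """Upper bound on frames per second for a given per-frame latency in ms."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Inference times [ms] copied from the "Ours" row of the table above.
ours_latency_ms = {"Nano": 25.9, "Xavier": 16.4, "RTX 2070 SUPER": 4.3}

if __name__ == "__main__":
    for device, ms in ours_latency_ms.items():
        print(f"{device}: at most {max_fps(ms):.1f} frames/s")
```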
Components enabled (among Perceptual Discriminator, Optical Flow Attention, Skip Connection) | RMSE (↓) | Rel. (↓) | δ < 1.25 (↑) | δ < 1.25² (↑) | δ < 1.25³ (↑) |
---|---|---|---|---|---|
none | 6.721 | 0.244 | 0.774 | 0.887 | 0.928 |
✓ | 6.227 | 0.237 | 0.768 | 0.883 | 0.923 |
✓ | 6.290 | 0.226 | 0.771 | 0.888 | 0.929 |
✓ | 5.942 | 0.134 | 0.887 | 0.956 | 0.977 |
✓ ✓ | 6.287 | 0.236 | 0.768 | 0.887 | 0.930 |
✓ ✓ | 5.951 | 0.206 | 0.818 | 0.915 | 0.950 |
✓ ✓ | 5.929 | 0.131 | 0.887 | 0.960 | 0.986 |
✓ ✓ ✓ | 5.771 | 0.131 | 0.898 | 0.970 | 0.994 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Shimada, T.; Nishikawa, H.; Kong, X.; Tomiyama, H. Fast and High-Quality Monocular Depth Estimation with Optical Flow for Autonomous Drones. Drones 2023, 7, 134. https://doi.org/10.3390/drones7020134