Deep Learning-Based Monocular Estimation of Distance and Height for Edge Devices
Abstract
1. Introduction
2. Related Works
2.1. Object Detection
2.2. Homography-Based Mapping
2.3. Depth Estimation from Monocular Views
2.4. Absolute Height Estimation
3. System Architecture
3.1. AraBox Device
3.2. YOLOv5 and YOLOv8 Object Detectors
3.3. Homography-Based Mapping
3.4. Monocular Depth Estimation
3.5. Absolute Distance Estimation of Objects
3.6. Estimation of Absolute Height of Objects
3.7. Semi-Automatic Pipeline Configuration
4. Experiments and Results
4.1. Dataset
4.2. Validation Methodology
4.3. Results
4.3.1. Experiment No. 1—Indoor Area
4.3.2. Experiment No. 2—Outdoor Area
4.3.3. Summary of the Results
5. Conclusions and Future Works
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Subject | HBM Vision | HBM Thermo | MDE | Fusion | Ground Truth | Number of Frames |
---|---|---|---|---|---|---|
Person 1 | 182 cm | 180 cm | 192 cm | 187 cm | 186 cm | 346 |
Error 1 | 2.15% | 3.23% | 3.23% | 0.27% | - | 346 |
Person 2 | 177 cm | 167 cm | 180 cm | 174 cm | 178 cm | 350 |
Error 2 | 0.56% | 6.18% | 1.12% | 1.12% | - | 350 |
Person 3 | 173 cm | 167 cm | 177 cm | 174 cm | 173 cm | 412 |
Error 3 | 0.00% | 3.47% | 2.31% | 0.29% | - | 412 |
Avg. Error | 0.90% | 4.29% | 2.22% | 0.56% | - | - |
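
The error rows in the table above appear consistent with the absolute relative error between the estimated and ground-truth height, averaged per person. The following minimal sketch assumes that formula (it is not stated in this section) and reproduces the HBM Vision column from the rounded heights shown; the function and variable names are illustrative only.

```python
def relative_error_percent(estimate_cm: float, truth_cm: float) -> float:
    """Absolute relative error in percent (assumed definition of the Error rows)."""
    return abs(estimate_cm - truth_cm) / truth_cm * 100.0


# HBM Vision estimates and ground-truth heights for Persons 1-3, copied from the table.
hbm_vision_cm = [182, 177, 173]
ground_truth_cm = [186, 178, 173]

errors = [
    relative_error_percent(est, truth)
    for est, truth in zip(hbm_vision_cm, ground_truth_cm)
]
# Unweighted mean over persons (assumed definition of the Avg. Error row).
avg_error = sum(errors) / len(errors)

print([f"{e:.2f}%" for e in errors])  # ['2.15%', '0.56%', '0.00%']
print(f"{avg_error:.2f}%")            # 0.90%
```

Some cells, notably in the Fusion column, cannot be reproduced exactly from the rounded centimetre values, presumably because the published errors were computed before the heights were rounded.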
Subject | HBM Vision | HBM Thermo | MDE | Fusion | Ground Truth | Number of Frames |
---|---|---|---|---|---|---|
Person 4 | 187 cm | 188 cm | 190 cm | 189 cm | 185 cm | 865 |
Error 4 | 1.08% | 1.62% | 2.70% | 2.03% | - | 865 |
Person 5 | 178 cm | 187 cm | 175 cm | 179 cm | 179 cm | 1044 |
Error 5 | 0.56% | 4.47% | 2.23% | 0.00% | - | 1044 |
Person 6 | 169 cm | 189 cm | 179 cm | 179 cm | 174 cm | 937 |
Error 6 | 2.87% | 7.94% | 2.87% | 2.87% | - | 937 |
Person 7 | 169 cm | 180 cm | 167 cm | 171 cm | 170 cm | 1255 |
Error 7 | 0.59% | 5.88% | 1.76% | 0.44% | - | 1255 |
Person 8 | 172 cm | 179 cm | 166 cm | 171 cm | 168 cm | 1968 |
Error 8 | 2.38% | 6.55% | 1.19% | 1.64% | - | 1968 |
Person 9 | 162 cm | 171 cm | 160 cm | 163 cm | 167 cm | 1080 |
Error 9 | 2.99% | 2.40% | 4.19% | 2.25% | - | 1080 |
Person 10 | 155 cm | 157 cm | 163 cm | 160 cm | 160 cm | 1015 |
Error 10 | 3.13% | 1.88% | 1.88% | 0.00% | - | 1015 |
Avg. Error | 1.94% | 4.39% | 2.40% | 1.38% | - | - |
Metric | HBM Vision | HBM Thermo | MDE | Fusion |
---|---|---|---|---|
Avg. Error | 1.63% | 4.36% | 2.35% | 1.14% |
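
The summary row appears to be the per-person mean over all ten subjects, that is, the two per-experiment averages weighted by the number of people in each (3 and 7). The sketch below checks this under that assumption; the dictionary keys `exp1` and `exp2` are illustrative labels for the three-person and seven-person tables, not names used by the authors.

```python
# Number of people in each per-experiment table (Persons 1-3 and Persons 4-10).
people = {"exp1": 3, "exp2": 7}

# Avg. Error rows (in percent) copied from the two per-experiment tables.
avg_error = {
    "HBM Vision": {"exp1": 0.90, "exp2": 1.94},
    "HBM Thermo": {"exp1": 4.29, "exp2": 4.39},
    "MDE": {"exp1": 2.22, "exp2": 2.40},
    "Fusion": {"exp1": 0.56, "exp2": 1.38},
}


def pooled_mean(per_experiment: dict) -> float:
    """Person-weighted mean of the per-experiment average errors."""
    total_people = sum(people.values())
    weighted_sum = sum(per_experiment[name] * people[name] for name in people)
    return weighted_sum / total_people


pooled = {method: round(pooled_mean(vals), 2) for method, vals in avg_error.items()}
print(pooled)
```

Under this assumption the first three methods reproduce the summary values (1.63%, 4.36%, 2.35%). Fusion comes out 0.01 percentage points below the listed 1.14%, which is within the rounding of the inputs.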
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Gąsienica-Józkowy, J.; Cyganek, B.; Knapik, M.; Głogowski, S.; Przebinda, Ł. Deep Learning-Based Monocular Estimation of Distance and Height for Edge Devices. Information 2024, 15, 474. https://doi.org/10.3390/info15080474