ReS2tAC—UAV-Borne Real-Time SGM Stereo Optimized for Embedded ARM and CUDA Devices
Abstract
1. Introduction
- the optimization of the algorithm for embedded CUDA GPUs, such as the NVIDIA Tegra, by using massively parallel computing,
- the use of the NEON intrinsics to optimize the algorithm for vectorized SIMD processing on embedded ARM CPUs, and
- the deployment of our approach on the DJI Manifold 2-G attached to a DJI Matrice 210v2 RTK UAV and a use-case-specific evaluation with respect to accuracy, processing speed and power consumption.
1.1. Paper Outline
1.2. Related Work
1.2.1. Embedded Stereo Processing on FPGAs
1.2.2. On the Emergence of Embedded Processing on GPUs
1.2.3. Are Embedded CPUs Suitable for Stereo Processing?
2. Materials and Methods
2.1. Processing Pipeline for Real-Time Dense Disparity Estimation
2.1.1. Dense Image Matching and Cost Computation
2.1.2. Cost Optimization and Disparity Computation
2.1.3. Post-Processing
Subpixel Disparity Refinement
Occlusion Detection by Left–Right Consistency Check
Median Filter
2.2. Real-Time Processing by Massively Parallel Computing on CUDA-Enabled GPUs
2.2.1. Matching Cost Computation
Calculating the Census Transformation and Its Hamming Distance
Inverted and Truncated Version of the Normalized Cross-Correlation
2.2.2. Semi-Global Matching Optimization
2.2.3. Consistency Check
2.2.4. Median Filter
2.3. Vectorized SIMD Processing with NEON Intrinsic Set on ARM CPUs
- a thread-level parallelization, and
- a vectorized data processing with the Single-Instruction-Multiple-Data (SIMD) NEON intrinsics.
2.3.1. Calculating the Census Transformation and Its Hamming Distance
2.3.2. Semi-Global Matching Optimization
2.3.3. Consistency Check
2.3.4. Median Filter
3. Results
3.1. Quantitative Evaluation of Accuracy on Public Stereo Benchmarks
3.1.1. Accuracy
KITTI 2015 Stereo Benchmark
Middlebury 2014 Stereo Benchmark
3.1.2. The Effect of Subpixel Disparity Refinement
3.1.3. Accurate Left–Right Consistency Check
3.1.4. Throughput, Frame Rates and Power Consumption
- MAXN: This setting enables the maximum performance. All eight cores of the ARM CPU are activated and can clock up to a maximum of GHz. The maximum clock rate of the GPU is set to GHz. All previous experiments were conducted with this setting.
- 30 W: In this setting, all eight cores of the CPU are again enabled. However, they are restricted to a maximum clock rate of GHz. Furthermore, the clock rate of the GPU is restricted to 905 MHz.
- 15 W: In this setting, four cores of the CPU are enabled, which clock at a maximum rate of GHz, while the GPU clocks up to 675 MHz.
- 10 W: In the smallest setting, only two cores of the CPU are enabled with a maximum of GHz, and the clock rate of the GPU is restricted to only 522 MHz.
3.2. Qualitative and Quantitative Evaluation of Real-Time Stereo Processing on Board Low-Cost UAVs
4. Discussion
4.1. Accuracy
4.2. Processing Speed
4.3. Power Consumption
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Stereoscopic Vision
Appendix B. Left–Right Consistency Check
Appendix C. General-Purpose Computing on CUDA-Enabled GPUs
- Reduction of global memory accesses, since they have a higher latency. Instead, data that is processed multiple times by threads in a thread block should be cached in the shared memory space.
- Pooling of global memory access and reduction of non-contiguous data storage.
- Efficient and maximum use of hardware resources.
Appendix D. Map Reduce Method
Appendix E. Vectorized SIMD Processing on ARM CPUs
- Reduction of the dependencies between conventional CPU and vectorized SIMD processing, to minimize the latency induced by copying data between the SISD and SIMD pipeline.
- Exploitation of cache coherence, to speed up the data transfer between the memory and the vector-registers.
- Avoidance of data dependencies between vector instructions, since they trigger pipeline stalls in which the SIMD pipeline is stopped until the dependencies are resolved, slowing down the processing.
- Minimal use of conditional branching, since if the Branch Prediction Unit (BPU) of the CPU predicts the wrong branch, the pipeline must be cleared back to the point of branching and restarted.
Appendix F. Sorting Networks
- wires, which hold and transport one value of the input vector each, and
- comparators, which compare the values of the connected wires and swap them if necessary.
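The wires and comparators listed above can be illustrated with a small sketch. The example below is not taken from the paper's implementation: it uses a well-known five-comparator network for four inputs, and the names `NETWORK_4`, `apply_network` and `sorts_all_binary_inputs` are hypothetical helpers chosen for illustration. Each comparator is written with `min` and `max` rather than an explicit branch, which is the property that makes such networks attractive for SIMD processing.

```python
from itertools import product

# Each comparator connects two wires (i, j) with i < j. After it runs,
# wire i holds the smaller value and wire j the larger one.
# Assumption: a standard 5-comparator network for 4 inputs, used here
# only as an illustration.
NETWORK_4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]


def apply_network(values, network=NETWORK_4):
    """Run the input values through a fixed sequence of comparators."""
    wires = list(values)  # one wire per input value
    for i, j in network:
        # Compare-and-swap without a data-dependent branch.
        lo = min(wires[i], wires[j])
        hi = max(wires[i], wires[j])
        wires[i], wires[j] = lo, hi
    return wires


def sorts_all_binary_inputs(network, n):
    """Zero-one principle: a comparator network sorts every input
    if and only if it sorts every sequence of zeros and ones."""
    return all(
        apply_network(bits, network) == sorted(bits)
        for bits in product((0, 1), repeat=n)
    )


print(apply_network([3, 1, 4, 2]))
print(sorts_all_binary_inputs(NETWORK_4, 4))
```

Because the sequence of comparators is fixed and independent of the data, the same network can be applied to many input vectors in lockstep, for example when taking the median of a small filter window for every pixel.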
References
1. Nex, F.; Remondino, F. UAV for 3D mapping applications: A review. Appl. Geomat. 2014, 6, 1–15.
2. Restas, A. Drone applications for supporting disaster management. World J. Eng. Technol. 2015, 03, 316–321.
3. Perz, R.; Wronowski, K. UAV application for precision agriculture. Aircr. Eng. Aerosp. Technol. 2019, 91, 257–263.
4. Sebbane, Y.B. Intelligent Autonomy of UAVs: Advanced Missions and Future Use; CRC Press: Boca Raton, FL, USA, 2018.
5. Shakhatreh, H.; Sawalmeh, A.H.; Al-Fuqaha, A.; Dou, Z.; Almaita, E.; Khalil, I.; Othman, N.S.; Khreishah, A.; Guizani, M. Unmanned Aerial Vehicles (UAVs): A survey on civil applications and key research challenges. IEEE Access 2019, 7, 48572–48634.
6. Nex, F.; Rinaudo, F. LiDAR or photogrammetry? Integration is the answer. Eur. J. Remote Sens. 2011, 43, 107–121.
7. Hirschmueller, H. Accurate and efficient stereo processing by semi-global matching and mutual information. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–26 June 2005; pp. 807–814.
8. Hirschmueller, H. Stereo processing by semi-global matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 328–341.
9. Gehrig, S.K.; Eberli, F.; Meyer, T. A real-time low-power stereo vision engine using semi-global matching. In Proceedings of the International Conference on Computer Vision Systems, Liège, Belgium, 13–15 October 2009; pp. 134–143.
10. Banz, C.; Hesselbarth, S.; Flatt, H.; Blume, H.; Pirsch, P. Real-time stereo vision system using semi-global matching disparity estimation: Architecture and FPGA-implementation. In Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, Samos, Greece, 19–22 July 2010; pp. 93–101.
11. Wang, W.; Yan, J.; Xu, N.; Wang, Y.; Hsu, F. Real-Time High-Quality Stereo Vision System in FPGA. IEEE Trans. Circuits Syst. Video Technol. 2015, 25, 1696–1708.
12. Schmid, K.; Tomic, T.; Ruess, F.; Hirschmueller, H.; Suppa, M. Stereo vision based indoor/outdoor navigation for flying robots. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Tokyo, Japan, 3–7 November 2013; pp. 3955–3962.
13. Gehrig, S.K.; Rabe, C. Real-time semi-global matching on the CPU. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, San Francisco, CA, USA, 13–18 June 2010; pp. 85–92.
14. Honegger, D.; Oleynikova, H.; Pollefeys, M. Real-time and low latency embedded computer vision hardware based on a combination of FPGA and mobile CPU. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA, 14–18 September 2014; pp. 4930–4935.
15. Barry, A.J.; Oleynikova, H.; Honegger, D.; Pollefeys, M.; Tedrake, R. FPGA vs. pushbroom stereo vision for MAVs. In Proceedings of the IROS Workshop on Vision-based Control and Navigation of Small Lightweight UAVs, Hamburg, Germany, 28 September–2 October 2015.
16. Hofmann, J.; Korinth, J.; Koch, A. A scalable high-performance hardware architecture for real-time stereo vision by semi-global matching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 845–853.
17. Rahnama, O.; Cavalleri, T.; Golodetz, S.; Walker, S.; Torr, P. R3SGM: Real-time raster-respecting semi-global matching for power-constrained systems. In Proceedings of the IEEE International Conference on Field-Programmable Technology, Naha, Japan, 10–14 December 2018.
18. Zhao, J.; Liang, T.; Feng, L.; Ding, W.; Sinha, S.; Zhang, W.; Shen, S. FP-Stereo: Hardware-efficient stereo vision for embedded applications. In Proceedings of the IEEE International Conference on Field-Programmable Logic and Applications, Gothenburg, Sweden, 31 August–4 September 2020; pp. 269–276.
19. Ruf, B.; Monka, S.; Kollmann, M.; Grinberg, M. Real-time on-board obstacle avoidance for UAVs based on embedded stereo vision. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2018, XLII-1, 363–370.
20. Kalb, T.; Kalms, L.; Göhringer, D.; Pons, C.; Muddukrishna, A.; Jahre, M.; Ruf, B.; Schuchert, T.; Tchouchenkov, I.; Ehrenstråhle, C.; et al. Developing low-power image processing applications with the TULIPP reference platform instance. In Hardware Accelerators in Data Centers; Springer: Berlin/Heidelberg, Germany, 2019; pp. 181–197.
21. Rosenberg, I.D.; Davidson, P.L.; Muller, C.M.; Han, J.Y. Real-time stereo vision using semi-global matching on programmable graphics hardware. In ACM SIGGRAPH 2006 Sketches; Association for Computing Machinery: New York, NY, USA, 2006.
22. Ernst, I.; Hirschmüller, H. Mutual information based semi-global stereo matching on the GPU. In Proceedings of the International Symposium on Advances in Visual Computing, Las Vegas, NV, USA, 1–3 December 2008; pp. 228–239.
23. Banz, C.; Blume, H.; Pirsch, P. Real-time semi-global matching disparity estimation on the GPU. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Barcelona, Spain, 6–13 November 2011; pp. 514–521.
24. Michael, M.; Salmen, J.; Stallkamp, J.; Schlipsing, M. Real-time stereo vision: Optimizing semi-global matching. In Proceedings of the IEEE Intelligent Vehicles Symposium, Gold Coast City, Australia, 23–26 June 2013; pp. 1197–1202.
25. Hernandez-Juarez, D.; Chacón, A.; Espinosa, A.; Vázquez, D.; Moure, J.C.; López, A.M. Embedded real-time stereo estimation via semi-global matching on the GPU. Procedia Comput. Sci. 2016, 80, 143–153.
26. Chang, Q.; Zha, A.; Wang, W.; Liu, X.; Onishi, M.; Maruyama, T. Z2-ZNCC: ZigZag scanning based zero-means normalized cross correlation for fast and accurate stereo matching on embedded GPU. In Proceedings of the IEEE International Conference on Computer Design, Hartford, CT, USA, 18–21 October 2020.
27. Spangenberg, R.; Langner, T.; Adfeldt, S.; Rojas, R. Large scale semi-global matching on the CPU. In Proceedings of the IEEE Intelligent Vehicles Symposium, Dearborn, MI, USA, 8–11 June 2014; pp. 195–201.
28. Arndt, O.J.; Becker, D.; Banz, C.; Blume, H. Parallel implementation of real-time semi-global matching on embedded multi-core architectures. In Proceedings of the International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, Samos, Greece, 15–18 July 2013; pp. 56–63.
29. Rahnama, O.; Frost, D.; Miksik, O.; Torr, P.H. Real-Time dense stereo matching with ELAS on FPGA-accelerated embedded devices. IEEE Robot. Autom. Lett. 2018, 3, 2008–2015.
30. Geiger, A.; Roser, M.; Urtasun, R. Efficient large-scale stereo matching. In Proceedings of the Asian Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2011; pp. 25–38.
31. Saidi, T.E.; Khouas, A.; Amira, A. Accelerating stereo matching on multicore ARM platform. In Proceedings of the IEEE International Symposium on Circuits and Systems, Seville, Spain, 10–21 October 2020.
32. ARM. NEON Programmers Guide; ARM: Cambridge, UK, 2013.
33. Scharstein, D.; Szeliski, R. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vis. 2002, 47, 7–42.
34. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
35. Bay, H.; Tuytelaars, T.; Van Gool, L. SURF: Speeded up robust features. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 404–417.
36. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
37. Zabih, R.; Woodfill, J. Non-parametric local transforms for computing visual correspondence. In Proceedings of the European Conference on Computer Vision, Stockholm, Sweden, 2–6 May 1994; pp. 151–158.
38. Zhang, Z. A Flexible New Technique for Camera Calibration. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1330–1334.
39. Menze, M.; Geiger, A. Object scene flow for autonomous vehicles. In Proceedings of the Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3061–3070.
40. Scharstein, D.; Hirschmüller, H.; Kitajima, Y.; Krathwohl, G.; Nešić, N.; Wang, X.; Westling, P. High-resolution stereo datasets with subpixel-accurate ground truth. In Proceedings of the German Conference on Pattern Recognition, Münster, Germany, 2–5 September 2014; pp. 31–42.
41. Cui, H.; Dahnoun, N. Real-time stereo vision implementation on Nvidia Jetson TX2. In Proceedings of the Mediterranean Conference on Embedded Computing, Budva, Montenegro, 10–14 June 2019; pp. 1–5.
42. Schönberger, J.L.; Sinha, S.N.; Pollefeys, M. Learning to fuse proposals from multiple scanline optimizations in semi-global matching. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 739–755.
43. DJI Onboard SDK Advanced Sensing-Stereo Camera. Available online: https://developer.dji.com/onboard-sdk/documentation/guides/component-guide-advanced-sensing-stereo-camera.html (accessed on 19 March 2021).
44. DJI. Matrice 200 V2-Serie-User Manual; DJI: Shenzhen, China, 2020.
45. ARM. ARM Architecture Reference Manual for ARMv8 Architecture Profile; ARM: Cambridge, UK, 2017.
46. Knuth, D. Sorting and searching. Art Comput. Program. 1998, 3, 513.
Reference | HW Device | Embedded SoC | Resolution | Disp. Range | FPS | Power |
---|---|---|---|---|---|---|
Gehrig et al. [9] | FPGA | | | 64 | 27 | <3 W |
Banz et al. [10] | FPGA | | | 128 | 30 | n/a |
Honegger et al. [14] | FPGA | ✓ | | 32 | 60 | <5 W |
Wang et al. [11] † | FPGA | | | 96 | 67 | n/a |
Barry et al. [15] | FPGA | ✓ | | 32 | 120 | <5 W * |
Hofmann et al. [16] † | FPGA | ✓ | | 64 | 140 | n/a |
Ruf et al. [19] | FPGA | ✓ | | 64 | 29 | n/a |
Rahnama et al. [17] † | FPGA | ✓ | | 128 | 109 | <3 W |
Zhao et al. [18] † | FPGA | ✓ | | 128 | 161 | 6.6 W |
Banz et al. [23] † | GPU | | | 128 | 25 | n/a |
Michael et al. [24] | GPU | | | 64 | 11.7 | n/a |
Hernandez-Juarez et al. [25] † | GPU | ✓ | | 128 | 42 | <10 W |
ReS2tAC-CUDA (ours) | GPU | ✓ | | 128 | 24 | ~20 W * |
Gehrig and Rabe [13] † | CPU | | | 16 | 14 | n/a |
Arndt et al. [28] † | CPU | ✓ | | 64 | 0.5 | n/a |
Spangenberg et al. [27] † | CPU | | | 128 | 16 | n/a |
ReS2tAC-NEON (ours) | CPU | ✓ | | 128 | 7.2 | ~18 W * |
Approach | Configuration | HW Device | Resolution (in pixels) | Accuracy: D1-all (Est.) | Accuracy: D1-all (All) | Accuracy: Density |
---|---|---|---|---|---|---|
Zhao et al. [18] | —SGM | FPGA | - | - | ||
Zhao et al. [18] | —SGM | FPGA | - | - | ||
Ruf et al. [19] | —SGM | FPGA | ||||
Rahnama et al. [17] | —MGM | FPGA | ||||
Rahnama et al. [17] | —MGM | FPGA | ||||
Cui and Dahnoun [41] † | -native | GPU | - | - | ||
Cui and Dahnoun [41] † | -optimized | GPU | - | - | ||
Chang et al. [26] | Z-ZNCC | GPU | ||||
Hernandez-Juarez et al. [25] | —SGM | GPU | ||||
ReS2tAC-CUDA | —SGM | GPU | ||||
ReS2tAC-CUDA | —SGM | GPU | ||||
ReS2tAC-CUDA | —SGM | GPU | ||||
ReS2tAC-CUDA | —SGM | GPU | ||||
ReS2tAC-CUDA | —SGM | GPU | ||||
ReS2tAC-CUDA | —SGM | GPU | ||||
ReS2tAC-CUDA | —SGM | GPU | ||||
ReS2tAC-CUDA | —SGM | GPU | ||||
ReS2tAC-NEON | —SGM | CPU | ||||
ReS2tAC-NEON | —SGM | CPU | ||||
Schönberger et al. [42] | —SGM-Forest | CPU | ||||
OpenCV-SGBM | —SGM | CPU | ||||
Hirschmueller [8] | CT—SGM | GPU |
Approach | Configuration | Resolution | Accuracy: bad 0.5 | Accuracy: bad 1 | Accuracy: bad 2 | Accuracy: bad 4 | Accuracy: Density |
---|---|---|---|---|---|---|---|
ReS2tAC-CUDA | —SGM | Orig. Q | |||||
ReS2tAC-CUDA | —SGM | Orig. Q | |||||
ReS2tAC-CUDA | —SGM | Orig. Q | |||||
ReS2tAC-CUDA | —SGM | Orig. Q | |||||
ReS2tAC-NEON | —SGM | Orig. Q | |||||
Schönberger et al. [42] | —SGM-Forest | Orig. H | |||||
OpenCV-SGBM | —SGM | Orig. Q | |||||
Hirschmueller [8] | CT—SGM | Orig. H |
Approach | Configuration | Resolution (in pixels) | Accuracy: D1-all (Est.) | Accuracy: D1-all (All) |
---|---|---|---|---|
ReS2tAC-CUDA | —SGM—fine | () | () | |
ReS2tAC-CUDA | —SGM—fine | () | () | |
ReS2tAC-CUDA | —SGM—fine | () | () | |
ReS2tAC-CUDA | —SGM—fine | () | () |
Approach | Configuration | Resolution | Accuracy: bad 0.5 | Accuracy: bad 1 | Accuracy: bad 2 | Accuracy: bad 4 |
---|---|---|---|---|---|---|
ReS2tAC-CUDA | —SGM—fine | Orig. Q | () | () | () | () |
ReS2tAC-CUDA | —SGM—fine | Orig. Q | () | () | () | () |
ReS2tAC-CUDA | —SGM—fine | Orig. Q | () | () | () | () |
ReS2tAC-CUDA | —SGM—fine | Orig. Q | () | () | () | () |
Approach | Configuration | HW Device | Throughput (in MDE/s) |
---|---|---|---|
Zhao et al. [18] | —SGM | FPGA | |
Zhao et al. [18] | —SGM | FPGA | |
Ruf et al. [19] | —SGM | FPGA | |
Rahnama et al. [17] | —MGM | FPGA | |
Cui and Dahnoun [41] | -optimized | GPU (TX2) | |
Chang et al. [26] | Z-ZNCC | GPU (TX2) | |
Hernandez-Juarez et al. [25] | —SGM | GPU (TX1) | |
ReS2tAC-CUDA | —SGM | GPU (AGX) | |
ReS2tAC-CUDA | —SGM | GPU (AGX) | |
ReS2tAC-CUDA | —SGM—fine | GPU (AGX) | |
ReS2tAC-CUDA | —SGM—fine | GPU (AGX) | |
ReS2tAC-CUDA | —SGM—exact-cc | GPU (AGX) | |
ReS2tAC-CUDA | —SGM—exact-cc | GPU (AGX) | |
ReS2tAC-CUDA | —SGM | GPU (AGX) | |
ReS2tAC-CUDA | —SGM | GPU (AGX) | |
ReS2tAC-NEON | —SGM | CPU |
Approach | Configuration | HW Device | Throughput (in MDE/s) | Accuracy: D1-all (Est.) | Accuracy: Density |
---|---|---|---|---|---|
ReS2tAC-CUDA | —4-Path-SGM | GPU (AGX) | () | () | () |
ReS2tAC-CUDA | —4-Path-SGM | GPU (AGX) | () | () | () |
ReS2tAC-CUDA | —4-Path-SGM | GPU (AGX) | () | () | () |
ReS2tAC-CUDA | —4-Path-SGM | GPU (AGX) | () | () | () |
ReS2tAC-NEON | —4-Path-SGM | CPU | () | () | () |
Approach | Configuration | HW Device | Throughput (in MDE/s) | Frame Rate at pixels (in FPS) | Frame Rate at pixels (in FPS) |
---|---|---|---|---|---|
ReS2tAC-CUDA | —4-Path-SGM | GPU (TX2) | |||
ReS2tAC-NEON | —4-Path-SGM | CPU |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Ruf, B.; Mohrs, J.; Weinmann, M.; Hinz, S.; Beyerer, J. ReS2tAC—UAV-Borne Real-Time SGM Stereo Optimized for Embedded ARM and CUDA Devices. Sensors 2021, 21, 3938. https://doi.org/10.3390/s21113938