Depth Prediction Improvement for Near-Field iToF Lidar in Low-Speed Motion State
Abstract
1. Introduction
1.1. Contributions
- We propose a new method to unwrap iToF depth maps during motion and reduce motion blur artifacts, serving as a benchmark for future research. The method extends the NFL sensor depth range to the third and fourth depth cycles, enabling coverage up to 25 m and surpassing current methods limited to the second depth cycle (see the depth-cycle relation sketched after this list).
- We introduce the Atrous Spatial Inception Pyramid Module (ASIPM), which estimates the blur transformation of the vertices for the overall context, building on the study by Huo et al. [10]; a Visual Attention Network, which estimates the blur transformation of the vertices for specific objects; and an Inverse Blur Tensor Module, which efficiently compensates for the effect of motion blur.
- Unlike existing approaches that require ambient light frames, our method relies exclusively on laser measurements, making it well suited to visually challenging environments such as smoke, dust, or darkness.
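For context, a wrapped iToF measurement only resolves depth up to an integer number of depth cycles. The relation below uses generic AMCW notation ($c$, $f_{\mathrm{mod}}$, $d_{\mathrm{amb}}$, cycle index $n$) rather than this paper's own symbols, and the 24 MHz example frequency is an illustrative assumption, not a value taken from the paper.

```latex
% Standard AMCW iToF ambiguity relation (generic notation, illustrative only).
% Unambiguous (ambiguity) range for a modulation frequency f_mod:
\[
  d_{\mathrm{amb}} = \frac{c}{2 f_{\mathrm{mod}}}
\]
% The true depth is the wrapped measurement plus an integer number of depth cycles:
\[
  d = d_{\mathrm{wrapped}} + n\, d_{\mathrm{amb}}, \qquad n \in \{0, 1, 2, 3\}.
\]
% Example with assumed values: f_mod = 24 MHz gives d_amb ~ 6.25 m, so unwrapping up
% to the fourth cycle (n = 3) covers roughly 4 x 6.25 m = 25 m.
```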
1.2. Paper Structure
2. Problem Description
3. Related Works
3.1. Motion Blur Noise Removal
3.2. Phase Unwrapping in Motion State
3.3. Improving Model Predictions Using Continuous Learning
4. Depth Unwrapping with Motion Deblurring Method
4.1. Method Formal Background
4.1.1. Handling Grayscale Time Shift
4.1.2. Handling Ambiguous Depth Map Blurriness
4.1.3. Phase Unwrapping in the Dynamic State
4.2. Method Implementation Details
4.2.1. Proposed Model Architecture
- Main Branch (Unwrapped Depth Map Prediction Branch): an autoencoder architecture with four stages:
- The TofRegNetMotion input encoder is the first stage. It processes the ambiguous depth map to extract the overall depth context and comprises four consecutive vanilla ResNet blocks [34].
- The ResNet Preactivation backbone is the following stage, designed for detailed depth feature extraction. It is composed of eight consecutive pre-activation ResNet blocks [35].
- The TofRegNetMotion output decoder predicts unwrapped depth maps and reduces motion blur simultaneously. It fuses the features extracted by the ResNet Preactivation backbone with the features extracted by the motion blur inverse tensor prediction branch to perform the prediction task. The module is composed of four consecutive up-sampling residual blocks.
- The depth regression head predicts the final detailed unwrapped depth map with reduced motion blur. It uses the outputs of the TofRegNetMotion input encoder and the TofRegNetMotion output decoder through skip connections. The module comprises a transposed convolution layer followed by two convolution layers.
- Motion Blur Inverse Tensor Prediction Branch: consists of three primary components, as shown in Figure 5 (a minimal sketch of the ASIPM and Inverse Blur Tensor modules follows this list):
- The ASIPM module processes the wrapped depth map to predict the general-context optical flow of the point cloud vertices. It comprises four different CNN kernels, based on residual inception [16], with four dilation levels (dilations of one, two, four, and eight), followed by a feature-fusion CNN kernel composed of an adaptive average pooling layer, two convolution layers, a batch normalization layer, and a ReLU activation layer.
- The Visual Attention module estimates the blur transformation of the vertices for specific objects, complementing the general-context estimate produced by the ASIPM.
- The Inverse Blur Tensor module generates the final motion blur inverse tensor feature map. It concatenates the outputs of the Visual Attention and ASIPM modules and then applies cascaded convolution kernels to estimate the final inverse blur tensor. The module comprises a concatenation layer followed by three dilated convolution kernels, each with a dilation size of two.
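The following is a minimal PyTorch sketch of the ASIPM and Inverse Blur Tensor modules as we read them from the description above. Channel widths, kernel sizes, the single-channel input, and the pooled feature size are our assumptions for illustration, not the paper's exact configuration, and the Visual Attention module is only referenced, not implemented.

```python
# Illustrative sketch only: layer widths, kernel sizes, and pooled size are assumptions.
import torch
import torch.nn as nn


class DilatedBranch(nn.Module):
    """One residual-inception-style ASIPM branch with a fixed dilation rate."""

    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=dilation, dilation=dilation),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # Residual connection, as in residual-inception blocks.
        return torch.relu(x + self.body(x))


class ASIPM(nn.Module):
    """Four dilated branches (dilations 1, 2, 4, 8) followed by a fusion kernel."""

    def __init__(self, in_channels: int = 1, channels: int = 64, pooled_size=(60, 80)):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, channels, 3, padding=1)
        self.branches = nn.ModuleList(
            [DilatedBranch(channels, d) for d in (1, 2, 4, 8)]
        )
        # Fusion kernel: adaptive average pooling, two convolutions, BN, ReLU.
        self.fuse = nn.Sequential(
            nn.AdaptiveAvgPool2d(pooled_size),
            nn.Conv2d(4 * channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, wrapped_depth):
        x = self.stem(wrapped_depth)
        pyramid = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.fuse(pyramid)


class InverseBlurTensor(nn.Module):
    """Concatenates attention and ASIPM features, then three dilation-2 convolutions."""

    def __init__(self, channels: int = 64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=2, dilation=2),
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),
        )

    def forward(self, attention_feats, asipm_feats):
        # Both feature maps are assumed to share the same spatial resolution.
        return self.refine(torch.cat([attention_feats, asipm_feats], dim=1))
```

For instance, `ASIPM()(torch.randn(1, 1, 240, 320))` would return a 64-channel feature map at the assumed pooled resolution; the Visual Attention module would then need to produce features at that same resolution before the concatenation inside `InverseBlurTensor`.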
4.2.2. The Continuous Learning Framework
- Dropout layers are integrated into the model, activated solely during training with the dynamic dataset. These layers aim to prevent overfitting to the dynamic dataset, thereby retaining knowledge about the static dataset and promoting generalization [38].
- The rehearsal technique involves randomly selecting batches from the static dataset during training with the dynamic dataset. This selection process ensures representation from the entire static dataset, preventing overfitting to specific scenarios. By incorporating batches from the static dataset, the model retains previous knowledge while learning from the dynamic dataset [39].
- L2 regularization is applied by amplifying the loss function for static dataset batches during training with the dynamic dataset, effectively penalizing inaccurate predictions [40]. Specifically, the loss for static dataset batches is multiplied by a factor greater than 1.0; this over-penalization ensures that the model prioritizes fitting the static dataset and thereby retains its knowledge even while training on dynamic data [40] (see the training-loop sketch below).
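The sketch below shows how the three mechanisms above could be combined in a training loop, assuming PyTorch. The names `model`, `static_loader`, `dynamic_loader`, and `depth_loss`, as well as the amplification factor and replay probability values, are placeholders for illustration; only the structure (interleaved static batches, a >1.0 loss multiplier on them, and training-time dropout) follows the description above.

```python
# Illustrative rehearsal + loss-amplification loop (placeholder names and values,
# not the paper's exact implementation).
import random

STATIC_LOSS_FACTOR = 1.5   # assumed; the paper only states a factor greater than 1.0
REHEARSAL_PROB = 0.25      # assumed probability of replaying a static-dataset batch


def train_on_dynamic(model, optimizer, depth_loss, dynamic_loader, static_loader, epochs):
    static_iter = iter(static_loader)
    model.train()  # keeps the dropout layers active during dynamic-dataset training
    for _ in range(epochs):
        for dyn_inputs, dyn_target in dynamic_loader:
            optimizer.zero_grad()
            loss = depth_loss(model(dyn_inputs), dyn_target)

            # Rehearsal: occasionally replay a randomly drawn static-dataset batch,
            # with its loss amplified so static-dataset knowledge is retained.
            if random.random() < REHEARSAL_PROB:
                try:
                    stat_inputs, stat_target = next(static_iter)
                except StopIteration:
                    static_iter = iter(static_loader)
                    stat_inputs, stat_target = next(static_iter)
                loss = loss + STATIC_LOSS_FACTOR * depth_loss(model(stat_inputs), stat_target)

            loss.backward()
            optimizer.step()
```

In practice, the replay probability and amplification factor would be tuned per dataset; the text above only constrains the factor to exceed 1.0.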
4.2.3. Loss Functions
4.2.4. Learning Rate Decay Scheduling
5. Experiments and Results
5.1. Training Details
5.2. Datasets
5.3. Results
5.3.1. Comparative Analysis
5.3.2. Quantitative Analysis
- The ground truth is itself affected by motion blur, whereas the predicted maps compensate for it, which inflates the error computed between the prediction and the ground truth.
- The model’s predictions may capture more details than those detected by the ground truth, contributing to error calculation discrepancies.
- External factors like blooming and sunlight interference impact both ground truth capture and prediction, although this falls beyond the scope of this paper.
5.3.3. Qualitative Analysis
5.4. Ablation Study
- Full working model with all components intact.
- Full model without the spatial transformer network (STN).
- Model without the STN and the inverse blur kernel estimation network, operating only the core pipeline with the ambiguous depth map as input.
- Model with the STN but without the inverse blur kernel estimation network, operating only the core pipeline with a grayscale image instead of the ambiguous depth map as input.
- Model without the STN and the inverse blur kernel estimation network, operating only the core pipeline with a grayscale image instead of the ambiguous depth map as input.
- For the normal case (i.e., the static dataset followed by the dynamic dataset), we used the default training configuration as described in Section 5.1.
- For the dynamic dataset only, we used the following training configuration: the Adam optimizer with $\beta_1 = 0.9$ and $\beta_2 = 0.999$; a learning rate scheduled to decay using the stepwise algorithm, starting at 0.0007 with a minimum of 0.0003; 30 epochs with 10 stepwise decay steps; dropout with a 25% rate; and a batch size of 16 (sketched below).
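For reproducibility, the dynamic-only configuration above maps onto a standard optimizer/scheduler setup roughly as follows. This is a sketch assuming PyTorch; `model` is a placeholder, and the decay factor `gamma` is our assumption, chosen so that the rate falls from 0.0007 toward the stated 0.0003 floor over the 10 decay steps.

```python
# Sketch of the dynamic-only training configuration (assumed PyTorch equivalents).
import torch

model = torch.nn.Conv2d(1, 1, 3)  # placeholder for the TofRegNetMotion model
optimizer = torch.optim.Adam(model.parameters(), lr=7e-4, betas=(0.9, 0.999))

# 30 epochs with 10 stepwise decays -> one decay every 3 epochs.
# gamma = 0.92 is an assumption chosen so 7e-4 * 0.92**10 is roughly 3e-4, the stated minimum.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.92)
MIN_LR = 3e-4

for epoch in range(30):
    # ... one training epoch over the dynamic dataset (batch size 16, 25% dropout) ...
    scheduler.step()
    for group in optimizer.param_groups:  # clamp to the stated minimum learning rate
        group["lr"] = max(group["lr"], MIN_LR)
```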
6. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| MDPI | Multidisciplinary Digital Publishing Institute |
| iToF | Indirect Time-of-Flight |
| Lidar | Light Detection and Ranging |
| NFL | Near-Field Lidar |
| AMCW | Amplitude-Modulated Continuous Wave |
| CMOS | Complementary Metal-Oxide Semiconductor |
| ASIPM | Atrous Spatial Inception Pyramid Module |
| DCS | Differential Correlation Samples |
| SNR | Signal-to-Noise Ratio |
| FPP | Fringe Projection Profilometry |
| STN | Spatial Transformer Network |
| GS | Grayscale |
| RMSE | Root Mean Square Error |
| MAE | Mean Absolute Error |
| SqRel | Squared Relative Difference |
| AbsRel | Absolute Relative Difference |
| SiLog | Scale-Invariant Logarithmic Error |
| W/o | Without |
References
- Eising, C.; Horgan, J.; Yogamani, S. Near-Field Perception for Low-Speed Vehicle Automation Using Surround-View Fisheye Cameras. IEEE Trans. Intell. Transp. Syst. 2022, 23, 13976–13993. [Google Scholar] [CrossRef]
- IEEE Std 2846-2022; IEEE Standard for Assumptions in Safety-Related Models for Automated Driving Systems. IEEE: Piscataway, NJ, USA, 2022; pp. 1–59. [CrossRef]
- Kumar, V.R.; Eising, C.; Witt, C.; Yogamani, S.K. Surround-View Fisheye Camera Perception for Automated Driving: Overview, Survey & Challenges. IEEE Trans. Intell. Transp. Syst. 2023, 24, 3638–3659. [Google Scholar] [CrossRef]
- Reddy Cenkeramaddi, L.; Bhatia, J.; Jha, A.; Kumar Vishkarma, S.; Soumya, J. A Survey on Sensors for Autonomous Systems. In Proceedings of the 2020 15th IEEE Conference on Industrial Electronics and Applications (ICIEA), Kristiansand, Norway, 9–13 November 2020; pp. 1182–1187. [Google Scholar] [CrossRef]
- Zhang, X.; Gong, Y.; Lu, J.; Wu, J.; Li, Z.; Jin, D.; Li, J. Multi-Modal Fusion Technology Based on Vehicle Information: A Survey. IEEE Trans. Intell. Veh. 2023, 8, 3605–3619. [Google Scholar] [CrossRef]
- Valeo. Valeo Near Field LiDAR. 2022. Available online: https://levelfivesupplies.com/wp-content/uploads/2021/09/Valeo-Mobility-Kit-near-field-LiDAR-data-sheet.pdf (accessed on 9 December 2022).
- Roriz, R.; Cabral, J.; Gomes, T. Automotive LiDAR Technology: A Survey. IEEE Trans. Intell. Transp. Syst. 2022, 23, 6282–6297. [Google Scholar] [CrossRef]
- ESPROS Photonics Corporation. DATASHEET–epc660-3D TOF Imager 320 × 240 Pixel, V2.19. 2022. Available online: https://www.espros.com/downloads/01_Chips/Datasheet_epc660.pdf (accessed on 9 December 2022).
- Texas Instruments Incorporated. OPT8241 3D Time-of-Flight Sensor Datasheet-SBAS704B. 2015. Available online: https://www.ti.com/product/OPT8241 (accessed on 9 December 2022).
- Huo, D.; Masoumzadeh, A.; Yang, Y.H. Blind Non-Uniform Motion Deblurring using Atrous Spatial Pyramid Deformable Convolution and Deblurring-Reblurring Consistency. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), New Orleans, LA, USA, 19–20 June 2022; pp. 436–445. [Google Scholar] [CrossRef]
- Hanto, D.; Pratomo, H.; Rianaris, A.; Setiono, A.; Sartika, S.; Syahadi, M.; Pristianto, E.J.; Kurniawan, D.; Bayuwati, D.; Adinanta, H.; et al. Time of Flight Lidar Employing Dual-Modulation Frequencies Switching for Optimizing Unambiguous Range Extension and High Resolution. IEEE Trans. Instrum. Meas. 2023, 72, 7001408. [Google Scholar] [CrossRef]
- Bamji, C.; Godbaz, J.; Oh, M.; Mehta, S.; Payne, A.; Ortiz, S.; Nagaraja, S.; Perry, T.; Thompson, B. A Review of Indirect Time-of-Flight Technologies. IEEE Trans. Electron Devices 2022, 69, 2779–2793. [Google Scholar] [CrossRef]
- Bulczak, D.; Lambers, M.; Kolb, A. Quantified, Interactive Simulation of AMCW ToF Camera Including Multipath Effects. Sensors 2017, 18, 13. [Google Scholar] [CrossRef]
- Solomon, C.; Breckon, T. Enhancement. In Fundamentals of Digital Image Processing: A practical Approach with Examples in Matlab; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2011; pp. 85–111. [Google Scholar] [CrossRef]
- Gao, J.; Gao, X.; Nie, K.; Gao, Z.; Xu, J. A Deblurring Method for Indirect Time-of-Flight Depth Sensor. IEEE Sens. J. 2023, 23, 2718–2726. [Google Scholar] [CrossRef]
- Herrmann, C.; Willersinn, D.; Beyerer, J. Residual vs. inception vs. classical networks for low-resolution face recognition. In Proceedings of the Image Analysis: 20th Scandinavian Conference, SCIA 2017, Tromsø, Norway, 12–14 June 2017; Proceedings, Part II 20. Springer: Cham, Switzerland, 2017; pp. 377–388. [Google Scholar]
- Chang, M.; Yang, C.; Feng, H.; Xu, Z.; Li, Q. Beyond Camera Motion Blur Removing: How to Handle Outliers in Deblurring. IEEE Trans. Comput. Imaging 2021, 7, 463–474. [Google Scholar] [CrossRef]
- Argaw, D.M.; Kim, J.; Rameau, F.; Cho, J.W.; Kweon, I.S. Optical flow estimation from a single motion-blurred image. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; Volume 35, pp. 891–900. [Google Scholar]
- Zhang, S.; Zhen, A.; Stevenson, R.L. Deep motion blur removal using noisy/blurry image pairs. J. Electron. Imaging 2021, 30, 033022. [Google Scholar] [CrossRef]
- Nagiub, M.; Beuth, T.; Sistu, G.; Gotzig, H.; Eising, C. Near Field iToF LIDAR Depth Improvement from Limited Number of Shots. arXiv 2023, arXiv:2304.07047. [Google Scholar]
- Jung, H.; Brasch, N.; Leonardis, A.; Navab, N.; Busam, B. Wild ToFu: Improving Range and Quality of Indirect Time-of-Flight Depth with RGB Fusion in Challenging Environments. In Proceedings of the 2021 International Conference on 3D Vision (3DV), Virtual, 1–3 December 2021; pp. 239–248. [Google Scholar] [CrossRef]
- Wang, S.; Chen, T.; Shi, M.; Zhu, D.; Wang, J. Single-frequency and accurate phase unwrapping method using deep learning. Opt. Lasers Eng. 2023, 162, 107409. [Google Scholar] [CrossRef]
- Qiao, X.; Ge, C.; Deng, P.; Wei, H.; Poggi, M.; Mattoccia, S. Depth Restoration in Under-Display Time-of-Flight Imaging. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 5668–5683. [Google Scholar] [CrossRef]
- Schelling, M.; Hermosilla, P.; Ropinski, T. Weakly-Supervised Optical Flow Estimation for Time-of-Flight. In Proceedings of the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 2134–2143. [Google Scholar] [CrossRef]
- Jokela, M.; Pyykönen, P.; Kutila, M.; Kauvo, K. LiDAR Performance Review in Arctic Conditions. In Proceedings of the 2019 IEEE 15th International Conference on Intelligent Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, 5–7 September 2019; pp. 27–31. [Google Scholar] [CrossRef]
- Vyas, P.; Saxena, C.; Badapanda, A.; Goswami, A. Outdoor monocular depth estimation: A research review. arXiv 2022, arXiv:2205.01399. [Google Scholar]
- Wang, Z.; Zhang, Z.; Lee, C.Y.; Zhang, H.; Sun, R.; Ren, X.; Su, G.; Perot, V.; Dy, J.; Pfister, T. Learning to Prompt for Continual Learning. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 139–149. [Google Scholar] [CrossRef]
- Smith, J.; Tian, J.; Hsu, Y.C.; Kira, Z. A Closer Look at Rehearsal-Free Continual Learning. arXiv 2022, arXiv:2203.17269. [Google Scholar]
- Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar] [CrossRef]
- Zhang, K.; Ren, W.; Luo, W.; Lai, W.S.; Stenger, B.; Yang, M.H.; Li, H. Deep image deblurring: A survey. Int. J. Comput. Vis. 2022, 130, 2103–2130. [Google Scholar] [CrossRef]
- Solomon, C.; Breckon, T. Blind deconvolution. In Fundamentals of Digital Image Processing: A Practical Approach with Examples in Matlab; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2011; pp. 156–158. [Google Scholar]
- Verdié, Y.; Song, J.; Mas, B.; Busam, B.; Leonardis, A.; McDonagh, S. Cromo: Cross-modal learning for monocular depth estimation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 3927–3937. [Google Scholar]
- Riley, K.F.; Hobson, M.P.; Bence, S.J. Mathematical Methods for Physics and Engineering: A Comprehensive Guide; Cambridge University Press: Cambridge, UK, 2006; Volumes 1255–1271; pp. 377–388. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
- Gao, H.; Yang, Y.; Yao, D.; Li, C. Hyperspectral Image Classification With Pre-Activation Residual Attention Network. IEEE Access 2019, 7, 176587–176599. [Google Scholar] [CrossRef]
- White, J.; Ruiz-Serra, J.; Petrie, S.; Kameneva, T.; McCarthy, C. Self-Attention Based Vision Processing for Prosthetic Vision. In Proceedings of the 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Sydney, Australia, 24–27 July 2023; pp. 1–4. [Google Scholar] [CrossRef]
- Wang, X.; Zhou, W.; Jia, Y. Attention GAN for Multipath Error Removal From ToF Sensors. IEEE Sens. J. 2022, 22, 19713–19721. [Google Scholar] [CrossRef]
- Mai, Z.; Li, R.; Jeong, J.; Quispe, D.; Kim, H.; Sanner, S. Online continual learning in image classification: An empirical survey. Neurocomputing 2022, 469, 28–51. [Google Scholar] [CrossRef]
- Lomonaco, V.; Pellegrini, L.; Rodriguez, P.; Caccia, M.; She, Q.; Chen, Y.; Jodelet, Q.; Wang, R.; Mai, Z.; Vázquez, D.; et al. CVPR 2020 continual learning in computer vision competition: Approaches, results, current challenges and future directions. Artif. Intell. 2022, 303, 103635. [Google Scholar] [CrossRef]
- De Lange, M.; Aljundi, R.; Masana, M.; Parisot, S.; Jia, X.; Leonardis, A.; Slabaugh, G.; Tuytelaars, T. A Continual Learning Survey: Defying Forgetting in Classification Tasks. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 3366–3385. [Google Scholar] [CrossRef] [PubMed]
- Yan, X.; Gilani, S.Z.; Qin, H.; Mian, A. Structural similarity loss for learning to fuse multi-focus images. Sensors 2020, 20, 6647. [Google Scholar] [CrossRef]
- You, K.; Long, M.; Wang, J.; Jordan, M.I. How does learning rate decay help modern neural networks? arXiv 2019, arXiv:1908.01878. [Google Scholar]
- Konar, J.; Khandelwal, P.; Tripathi, R. Comparison of Various Learning Rate Scheduling Techniques on Convolutional Neural Network. In Proceedings of the 2020 IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India, 22–23 February 2020; pp. 1–5. [Google Scholar] [CrossRef]
| Method | Input | RMSE (m) | AbsRel | SqRel |
|---|---|---|---|---|
| Modified Wild ToFu | iToF Wrapped Depth Map (4-DCS + GS) | 1.85 | 1.57 | 0.7 |
| Ours | iToF Wrapped Depth Map (4-DCS + GS) | 1.07 | 0.23 | 0.42 |
| Ours | iToF Wrapped Depth Map (2-DCS + GS) | 1.12 | 0.25 | 0.43 |
| DCS | Dataset | Prediction Accuracy | Correction Accuracy | RMSE (m) | iRMSE (1/m) | SqRel | AbsRel | SiLog | MAE (m) | iMAE (1/m) |
|---|---|---|---|---|---|---|---|---|---|---|
| Two-DCS | Static | 99% | 99% | 1.12 | 0.31 | 0.43 | 0.26 | 0.17 | 0.28 | 0.043 |
| Two-DCS | Dynamic | 89.69% | 89.45% | 3.89 | 0.29 | 0.021 | 0.012 | 0.43 | 2.31 | 0.0017 |
| Four-DCS | Static | 99% | 99% | 1.08 | 0.31 | 0.42 | 0.24 | 0.1 | 0.25 | 0.05 |
| Four-DCS | Dynamic | 89.74% | 89.13% | 4.02 | 0.3 | 2.33 | 1.32 | 0.46 | 2.42 | 0.0024 |
| DCS | Configuration | Prediction Frame Accuracy | Prediction Frame Top 90% Precision | Correction Frame Accuracy | Correction Frame Top 90% Precision | RMSE (m) | iRMSE (1/m) | SqRel | AbsRel | SiLog |
|---|---|---|---|---|---|---|---|---|---|---|
| Two-DCS | Full Mode | 89.69% | 98.69% | 89.45% | 98.81% | 3.88 | 2.92 | 0.021 | 0.012 | 0.42 |
| Two-DCS | W/o STN | 88.74% | 98.71% | 88.61% | 98.88% | 4.16 | 3.29 | 0.029 | 0.016 | 0.98 |
| Two-DCS | W/o Inverse Blur | 87.74% | 98.63% | 87.41% | 98.72% | 4.35 | 3.94 | 0.030 | 0.017 | 1.05 |
| Two-DCS | Using GS Only | 71.50% | 99.17% | 71.48% | 99.17% | 7.22 | 8.53 | 0.210 | 0.086 | 6.16 |
| Two-DCS | Using GS w/o STN | 71.36% | 99.18% | 71.33% | 99.18% | 7.2 | 8.53 | 0.220 | 0.087 | 6.17 |
| Four-DCS | Full Mode | 89.47% | 98.43% | 89.13% | 98.81% | 4.03 | 2.99 | 0.023 | 0.013 | 0.46 |
| Four-DCS | W/o STN | 84.70% | 98.50% | 85.18% | 99.02% | 4.76 | 4.14 | 0.043 | 0.022 | 1.77 |
| Four-DCS | W/o Inverse Blur | 86.23% | 98.10% | 87.05% | 99.00% | 4.38 | 3.45 | 0.028 | 0.016 | 0.70 |
| Four-DCS | Using GS Only | 71.36% | 99.18% | 71.35% | 99.18% | 7.20 | 8.53 | 0.220 | 0.086 | 6.17 |
| Four-DCS | Using GS w/o STN | 79.30% | 97.88% | 78.58% | 98.22% | 5.94 | 5.17 | 0.082 | 0.041 | 2.44 |