# Deep Learning for Transient Image Reconstruction from ToF Data


## Abstract


## 1. Introduction

#### 1.1. iToF Cameras

#### 1.2. Transient Cameras

## 2. Related Work

## 3. Proposed Approach

#### 3.1. The Transient Imaging Prior

#### 3.2. Training Pipeline

#### 3.2.1. Backscattering Model

#### 3.2.2. Predictive Model

#### 3.2.3. Training of the Deep Learning Model

#### 3.2.4. Bilateral Filtering

## 4. Training and Test Datasets

## 5. Experimental Results

#### 5.1. Ablation Studies

## 6. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## References

1. Marin, G.; Dominio, F.; Zanuttigh, P. Hand gesture recognition with jointly calibrated leap motion and depth sensor. Multimed. Tools Appl. **2016**, 75, 14991–15015.
2. Durrant-Whyte, H.; Bailey, T. Simultaneous localization and mapping: Part I. IEEE Robot. Autom. Mag. **2006**, 13, 99–110.
3. Blais, F. Review of 20 years of range sensor development. J. Electron. Imaging **2004**, 13, 231–243.
4. Kim, Y.M.; Theobalt, C.; Diebel, J.; Kosecka, J.; Miscusik, B.; Thrun, S. Multi-view image and ToF sensor fusion for dense 3D reconstruction. In Proceedings of the International Conference on Computer Vision Workshops (ICCVW), Kyoto, Japan, 27 September–4 October 2009; pp. 1542–1549.
5. Kerl, C.; Souiai, M.; Sturm, J.; Cremers, D. Towards Illumination-Invariant 3D Reconstruction Using ToF RGB-D Cameras. In Proceedings of the 2014 2nd International Conference on 3D Vision, Tokyo, Japan, 8–11 December 2014; Volume 1, pp. 39–46.
6. Tang, Y.; Chen, M.; Lin, Y.; Huang, X.; Huang, K.; He, Y.; Li, L. Vision-Based Three-Dimensional Reconstruction and Monitoring of Large-Scale Steel Tubular Structures. Adv. Civ. Eng. **2020**, 2020, 1236021.
7. Tang, Y.; Li, L.; Wang, C.; Chen, M.; Feng, W.; Zou, X.; Huang, K. Real-time detection of surface deformation and strain in recycled aggregate concrete-filled steel tubular columns via four-ocular vision. Robot. Comput. Integr. Manuf. **2019**, 59, 36–46.
8. Zhu, Q.; Chen, L.; Li, Q.; Li, M.; Nüchter, A.; Wang, J. 3D LIDAR point cloud based intersection recognition for autonomous driving. In Proceedings of the 2012 IEEE Intelligent Vehicles Symposium, Madrid, Spain, 3–7 June 2012; pp. 456–461.
9. Wang, Y.; Chao, W.L.; Garg, D.; Hariharan, B.; Campbell, M.; Weinberger, K.Q. Pseudo-LiDAR From Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
10. Amzajerdian, F.; Pierrottet, D.; Petway, L.; Hines, G.; Roback, V. Lidar systems for precision navigation and safe landing on planetary bodies. In International Symposium on Photoelectronic Detection and Imaging 2011: Laser Sensing and Imaging; and Biological and Medical Applications of Photonics Sensing and Imaging; Amzajerdian, F., Chen, W., Gao, C., Xie, T., Eds.; SPIE: Bellingham, WA, USA, 2011; Volume 8192, pp. 27–33.
11. Dhond, U.R.; Aggarwal, J.K. Structure from stereo—A review. IEEE Trans. Syst. Man Cybern. **1989**, 19, 1489–1510.
12. Zanuttigh, P.; Marin, G.; Dal Mutto, C.; Dominio, F.; Minto, L.; Cortelazzo, G.M. Time-of-Flight and Structured Light Depth Cameras: Technology and Applications; Springer: Cham, Switzerland, 2016.
13. Dubayah, R.O.; Drake, J.B. Lidar remote sensing for forestry. J. For. **2000**, 98, 44–46.
14. Horaud, R.; Hansard, M.; Evangelidis, G.; Clément, M. An Overview of Depth Cameras and Range Scanners Based on Time-of-Flight Technologies. Mach. Vis. Appl. **2016**, 27, 1005–1020.
15. Frank, M.; Plaue, M.; Rapp, H.; Koethe, U.; Jähne, B.; Hamprecht, F.A. Theoretical and experimental error analysis of continuous-wave time-of-flight range cameras. Opt. Eng. **2009**, 48, 1–16.
16. Gupta, M.; Nayar, S.K.; Hullin, M.B.; Martin, J. Phasor Imaging: A Generalization of Correlation-Based Time-of-Flight Imaging. ACM Trans. Graph. **2015**, 34, 1–18.
17. Su, S.; Heide, F.; Wetzstein, G.; Heidrich, W. Deep End-to-End Time-of-Flight Imaging. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
18. Son, K.; Liu, M.; Taguchi, Y. Learning to remove multipath distortions in Time-of-Flight range images for a robotic arm setup. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 3390–3397.
19. Marco, J.; Hernandez, Q.; Muñoz, A.; Dong, Y.; Jarabo, A.; Kim, M.H.; Tong, X.; Gutierrez, D. DeepToF: Off-the-Shelf Real-Time Correction of Multipath Interference in Time-of-Flight Imaging. ACM Trans. Graph. **2017**, 36, 1–12.
20. Agresti, G.; Schaefer, H.; Sartor, P.; Zanuttigh, P. Unsupervised Domain Adaptation for ToF Data Denoising with Adversarial Learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
21. Lefloch, D.; Nair, R.; Lenzen, F.; Schäfer, H.; Streeter, L.; Cree, M.J.; Koch, R.; Kolb, A. Technical foundation and calibration methods for time-of-flight cameras. In Time-of-Flight and Depth Imaging. Sensors, Algorithms, and Applications; Springer: Berlin/Heidelberg, Germany, 2013; pp. 3–24.
22. Lindner, M.; Schiller, I.; Kolb, A.; Koch, R. Time-of-flight sensor calibration for accurate range sensing. Comput. Vis. Image Underst. **2010**, 114, 1318–1328.
23. Lenzen, F.; Schäfer, H.; Garbe, C. Denoising time-of-flight data with adaptive total variation. In International Symposium on Visual Computing; Springer: Berlin/Heidelberg, Germany, 2011; pp. 337–346.
24. Agresti, G.; Zanuttigh, P. Deep Learning for Multi-Path Error Removal in ToF Sensors. In Proceedings of the European Conference on Computer Vision Workshops (ECCVW), Munich, Germany, 8–14 September 2018.
25. Jarabo, A.; Masia, B.; Marco, J.; Gutierrez, D. Recent advances in transient imaging: A computer graphics and vision perspective. Vis. Inform. **2017**, 1, 65–79.
26. Kirmani, A.; Hutchison, T.; Davis, J.; Raskar, R. Looking around the corner using transient imaging. In Proceedings of the International Conference on Computer Vision (ICCV), Kyoto, Japan, 29 September–2 October 2009; pp. 159–166.
27. Sun, Q.; Dun, X.; Peng, Y.; Heidrich, W. Depth and Transient Imaging with Compressive SPAD Array Cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018.
28. O’Toole, M.; Heide, F.; Xiao, L.; Hullin, M.B.; Heidrich, W.; Kutulakos, K.N. Temporal Frequency Probing for 5D Transient Analysis of Global Light Transport. ACM Trans. Graph. **2014**, 33, 1–11.
29. O’Toole, M.; Lindell, D.B.; Wetzstein, G. Confocal non-line-of-sight imaging based on the light-cone transform. Nature **2018**, 555, 338–341.
30. Xin, S.; Nousias, S.; Kutulakos, K.N.; Sankaranarayanan, A.C.; Narasimhan, S.G.; Gkioulekas, I. A Theory of Fermat Paths for Non-Line-Of-Sight Shape Reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
31. Liu, X.; Guillén, I.; La Manna, M.; Nam, J.H.; Reza, S.A.; Le, T.H.; Jarabo, A.; Gutierrez, D.; Velten, A. Non-line-of-sight imaging using phasor-field virtual wave optics. Nature **2019**, 572, 620–623.
32. Dong, G.; Zhang, Y.; Xiong, Z. Spatial Hierarchy Aware Residual Pyramid Network for Time-of-Flight Depth Denoising. In Proceedings of the European Conference on Computer Vision (ECCV); Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 35–50.
33. Fuchs, S. Multipath Interference Compensation in Time-of-Flight Camera Images. In Proceedings of the International Conference on Pattern Recognition (ICPR), Istanbul, Turkey, 23–26 August 2010; pp. 3583–3586.
34. Fuchs, S.; Suppa, M.; Hellwich, O. Compensation for Multipath in ToF Camera Measurements Supported by Photometric Calibration and Environment Integration. In Computer Vision Systems; Chen, M., Leibe, B., Neumann, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 31–41.
35. Jiménez, D.; Pizarro, D.; Mazo, M.; Palazuelos, S. Modeling and correction of multipath interference in time of flight cameras. Image Vis. Comput. **2014**, 32, 1–13.
36. Freedman, D.; Smolin, Y.; Krupka, E.; Leichter, I.; Schmidt, M. SRA: Fast Removal of General Multipath for ToF Sensors. In Proceedings of the European Conference on Computer Vision (ECCV); Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 234–249.
37. Bhandari, A.; Feigin, M.; Izadi, S.; Rhemann, C.; Schmidt, M.; Raskar, R. Resolving multipath interference in Kinect: An inverse problem approach. In Proceedings of the 2014 IEEE SENSORS, Valencia, Spain, 2–5 November 2014; pp. 614–617.
38. Guo, Q.; Frosio, I.; Gallo, O.; Zickler, T.; Kautz, J. Tackling 3D ToF Artifacts Through Learning and the FLAT Dataset. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
39. Marin, G.; Zanuttigh, P.; Mattoccia, S. Reliable fusion of ToF and stereo depth driven by confidence measures. In Proceedings of the European Conference on Computer Vision (ECCV); Springer: Cham, Switzerland, 2016.
40. Gudmundsson, S.A.; Aanæs, H.; Larsen, R. Fusion of stereo vision and time-of-flight imaging for improved 3D estimation. Int. J. Intell. Syst. Technol. Appl. **2008**, 5, 425–433.
41. Poggi, M.; Agresti, G.; Tosi, F.; Zanuttigh, P.; Mattoccia, S. Confidence Estimation for ToF and Stereo Sensors and Its Application to Depth Data Fusion. IEEE Sens. J. **2020**, 20, 1411–1421.
42. Whyte, R.; Streeter, L.; Cree, M.J.; Dorrington, A.A. Resolving multiple propagation paths in time of flight range cameras using direct and global separation methods. Opt. Eng. **2015**, 54, 1–9.
43. Agresti, G.; Zanuttigh, P. Combination of Spatially-Modulated ToF and Structured Light for MPI-Free Depth Estimation. In Proceedings of the European Conference on Computer Vision Workshops (ECCVW), Munich, Germany, 8–14 September 2018.
44. Naik, N.; Kadambi, A.; Rhemann, C.; Izadi, S.; Raskar, R.; Bing Kang, S. A Light Transport Model for Mitigating Multipath Interference in Time-of-Flight Sensors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015.
45. Jongenelen, A.P.P.; Carnegie, D.A.; Payne, A.D.; Dorrington, A.A. Maximizing precision over extended unambiguous range for TOF range imaging systems. In Proceedings of the 2010 IEEE Instrumentation and Measurement Technology Conference, Austin, TX, USA, 3–6 May 2010; pp. 1575–1580.
46. Wójcik, P.I.; Kurdziel, M. Training neural networks on high-dimensional data using random projection. Pattern Anal. Appl. **2019**, 22, 1221–1231.
47. Liu, B.; Wei, Y.; Zhang, Y.; Yang, Q. Deep Neural Networks for High Dimension, Low Sample Size Data. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), Melbourne, Australia, 19–25 August 2017; pp. 2287–2293.
48. Dorrington, A.A.; Godbaz, J.P.; Cree, M.J.; Payne, A.D.; Streeter, L.V. Separating true range measurements from multi-path and scattering interference in commercial range cameras. In Three-Dimensional Imaging, Interaction, and Measurement; International Society for Optics and Photonics, SPIE: Bellingham, WA, USA, 2011; Volume 7864, p. 786404.

**Figure 5.** Depth error maps on the real datasets ${S}_{4}$ (**a**) and ${S}_{5}$ (**b**) obtained by applying our method and a single-frequency prediction at 60 MHz. Blue indicates depth underestimation, while red indicates depth overestimation. The dark blue areas are those for which no ground truth depth is available. The Mean Absolute Error (MAE) for each scene is also reported.
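Since the ground truth is unavailable in some areas (the dark blue regions above), the per-scene MAE must be computed only over valid pixels. A minimal sketch of such a masked MAE; the function and argument names are illustrative, not taken from the paper's code:

```python
def masked_mae(depth_pred, depth_gt, valid_mask):
    """Mean absolute depth error over pixels with valid ground truth.

    depth_pred, depth_gt: flat sequences of depths (same length).
    valid_mask: True where ground truth depth is available.
    """
    errors = [abs(p - g) for p, g, v in zip(depth_pred, depth_gt, valid_mask) if v]
    return sum(errors) / len(errors)
```

In practice this would run over flattened depth maps; the invalid pixels simply drop out of both the numerator and the denominator.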

**Figure 6.** Network prediction for selected pixels in an image. The dashed lines correspond to the ground truth depth values, while the red plots indicate the predicted backscattering vectors.

**Figure 7.** Qualitative comparison between several state-of-the-art MPI correction algorithms on real scenes sampled from ${S}_{4}$ and ${S}_{5}$. The leftmost column shows the ground truth depth, while the others display the error between each method's prediction and the ground truth.

**Figure 8.** Depth profile estimation in proximity of a corner. The left plot shows the ground truth depth map, while the right one compares the depth profiles along the line highlighted in the left image, as estimated by our approach and by different state-of-the-art MPI correction algorithms.

**Figure 9.** Training curves obtained by running the network optimization for noise levels ${\sigma}_{v} \in \{0.00, 0.01, 0.02, 0.03\}$ on the training and validation sets. The monitored metrics are, from left to right: the measurement error, the reconstruction error, the overall error, and the MAE of the depth estimated from the predicted output backscattering vector on synthetic data.

**Figure 10.** Performance on the ${S}_{3}$ dataset of a network trained on synthetic data with or without noise.

**Figure 11.** (**a**) Depth error maps on dataset ${S}_{3}$ obtained without spatial correlation. (**b**) Predicted depth error maps obtained with increasing kernel size on three real scenes; from left to right, $P = 1, 2, 3$.

**Figure 12.** Behaviour of the MAE, MSE and Earth Mover's Distance (EMD) loss functions when varying the amplitude $\widehat{A}$ and position $\widehat{T}$ of the predicted direct component.
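For 1D discrete distributions such as backscattering vectors, the Earth Mover's Distance reduces to the L1 distance between cumulative sums, which makes its behaviour under shifts of the predicted direct peak easy to reproduce. A self-contained sketch (not the paper's implementation):

```python
def emd_1d(p, q):
    """1D Earth Mover's Distance between two equal-length histograms,
    computed as the L1 distance between their running cumulative sums."""
    cp, cq, total = 0.0, 0.0, 0.0
    for a, b in zip(p, q):
        cp += a       # cumulative mass of p up to this bin
        cq += b       # cumulative mass of q up to this bin
        total += abs(cp - cq)
    return total
```

Unlike MAE or MSE, which are blind to how far a misplaced peak is from the target once the peaks no longer overlap, the EMD grows linearly with the shift of a unit spike, making it sensitive to the position $\widehat{T}$ of the direct component.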

**Table 1.** Overview of the compared MPI correction approaches.

| Method | Solution | # of Frequencies | Complexity | MPI Type | Output |
|---|---|---|---|---|---|
| Fuchs et al. [33,34] | Iterative | 1 | High | 2-sparse | Depth |
| Jiménez et al. [35] | Iterative | 1 | High | 2-sparse | Depth |
| SRA [36] | LP | $K>1$ | Avg | M-sparse | Backscattering |
| Bhandari et al. [37] | Deterministic | $2K+1$ | High | K-sparse | Depth |
| Son et al. [18] | FCN | | Low | General | Depth + Object Boundary |
| DeepToF [19] | CNN | 1 | High | General | Depth |
| Agresti et al. [24] | CNN | 3 | Avg | General | Depth |
| Guo et al. [38] | CNN | 3 | Avg | General | Depth |
| Su et al. [17] | CNN | 2 | High | General | Depth |
| Agresti et al. [20] | CNN + UDA | 3 | High | General | Depth |
| Dong et al. [32] | CNN | 1 | High | General | Depth |
| Our Approach | CNN | 3 | Low | 2-sparse | Backscattering |
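The "2-sparse" output above refers to a backscattering vector with a direct component plus one interfering path. Under the common phasor model of CW-ToF (cf. [16]), such a sparse vector maps to the per-frequency measurements as a sum of complex phasors, and decoding the phase of a single frequency as if only the direct path existed produces the classic MPI depth overestimation. A sketch of this forward model, with illustrative amplitudes and depths:

```python
import cmath

C = 299792458.0  # speed of light [m/s]

def phasor_measurement(components, freq_hz):
    """Complex CW-ToF measurement at one modulation frequency for a sparse
    backscattering vector given as a list of (amplitude, apparent_depth_m)."""
    v = 0j
    for amplitude, depth_m in components:
        # round-trip phase delay of this path: 4*pi*f*d/c
        phase = 4 * cmath.pi * freq_hz * depth_m / C
        v += amplitude * cmath.exp(-1j * phase)
    return v

def naive_depth(v, freq_hz):
    """Depth decoded from a single-frequency phase, ignoring MPI."""
    phase = (-cmath.phase(v)) % (2 * cmath.pi)
    return C * phase / (4 * cmath.pi * freq_hz)
```

With only a direct component at 1 m, `naive_depth` recovers 1 m exactly; adding a weaker, longer interfering path pulls the decoded depth above 1 m, i.e., the overestimation visible in the error maps of Figure 5.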

**Table 2.** Properties of the real-world datasets ${S}_{3}$, ${S}_{4}$ and ${S}_{5}$. In this work, we only used the data corresponding to the frequencies marked in bold.

| Dataset | Type | Depth GT | Trans. GT | No. Scenes | Spatial Res. | Modulation Frequencies |
|---|---|---|---|---|---|---|
| ${S}_{3}$ | Real | yes | no | 8 | $320\times 239$ | 10, $\mathbf{20}$, 30, 40, $\mathbf{50}$ and $\mathbf{60}$ MHz |
| ${S}_{4}$ | Real | yes | no | 8 | $320\times 239$ | $\mathbf{20}$, $\mathbf{50}$ and $\mathbf{60}$ MHz |
| ${S}_{5}$ (box) | Real | yes | no | 8 | $320\times 239$ | 10, $\mathbf{20}$, 30, 40, $\mathbf{50}$ and $\mathbf{60}$ MHz |
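The modulation frequencies listed in Table 2 trade precision against unambiguous range: a CW-ToF sensor at frequency $f$ can only disambiguate depths up to $c/(2f)$ before the phase wraps (cf. [45]). A quick check for the frequencies used here:

```python
C = 299792458.0  # speed of light [m/s]

def unambiguous_range_m(freq_hz):
    """Maximum depth a single CW modulation frequency can encode
    before the measured phase wraps around 2*pi."""
    return C / (2 * freq_hz)

for f_mhz in (20, 50, 60):
    print(f"{f_mhz} MHz -> {unambiguous_range_m(f_mhz * 1e6):.2f} m")
# 20 MHz -> 7.49 m
# 50 MHz -> 3.00 m
# 60 MHz -> 2.50 m
```

This is why higher frequencies (more precise, shorter range) are typically combined with a lower one (20 MHz here) that resolves the wrapping ambiguity.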

**Table 3.** Quantitative comparison between several state-of-the-art Multi-Path Interference (MPI) correction algorithms on the real datasets ${S}_{4}$ and ${S}_{5}$. Each row reports the depth MAE and the relative error obtained by applying the corresponding method, computed w.r.t. the single-frequency estimate at the maximum employed frequency (60 MHz for all methods, except 20 MHz for [19] (*)).

| Method | ${S}_{4}$ MAE [cm] | ${S}_{4}$ Relative Error | ${S}_{5}$ MAE [cm] | ${S}_{5}$ Relative Error |
|---|---|---|---|---|
| Single frequency (20 MHz) | $7.28$ | - | $5.06$ | - |
| Single frequency (60 MHz) | $5.43$ | - | $3.62$ | - |
| SRA [36] | $5.11$ | $94.1\%$ | $3.37$ | $93.1\%$ |
| DeepToF [19] | $5.13$ | $70.5\%$ * | $6.68$ | $132\%$ * |
| DeepToF [19] + calibration | $5.46$ | $75\%$ * | $3.36$ | $66.4\%$ * |
| Agresti et al. [24] | $3.19$ | $58.7\%$ | $2.22$ | $60.5\%$ |
| Agresti et al. [20] | $2.36$ | $43.5\%$ | $1.66$ | $46.1\%$ |
| Our Approach | $2.79$ | $51.4\%$ | $2.27$ | $62.7\%$ |
| Our Approach + bilateral filtering | $2.60$ | $47.9\%$ | $2.12$ | $58.6\%$ |
| Our Approach (without spatial correlation) | $3.43$ | $63.2\%$ | $2.52$ | $69.6\%$ |
| Our Approach (without spatial correlation) + bilateral filtering | $2.99$ | $55.1\%$ | $1.88$ | $52.0\%$ |
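The relative errors in Table 3 are, to within rounding, the ratio between each method's MAE and the MAE of the single-frequency baseline at the corresponding reference frequency (60 MHz, or 20 MHz for DeepToF). A sketch of the computation on the ${S}_{4}$ column; values are copied from the table:

```python
def relative_error_pct(mae_method_cm, mae_baseline_cm):
    """Relative depth error as a percentage of the single-frequency baseline MAE."""
    return 100.0 * mae_method_cm / mae_baseline_cm

# S4 dataset, 60 MHz single-frequency baseline MAE = 5.43 cm
for name, mae in [("SRA", 5.11), ("Agresti et al. [20]", 2.36), ("Ours", 2.79)]:
    print(f"{name}: {relative_error_pct(mae, 5.43):.1f}%")
# SRA: 94.1%, Agresti et al. [20]: 43.5%, Ours: 51.4% (matching Table 3)
```

The small residual discrepancies on the ${S}_{5}$ column suggest the published percentages were averaged per scene rather than computed from the aggregate MAEs.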

**Table 4.** MAE on the ${S}_{3}$ dataset for noise with different standard deviations (${\sigma}_{v}$). Window size is $3\times 3$. The best performance (in bold) is achieved with a noise standard deviation ${\sigma}_{v}=0.02$.

| ${\sigma}_{v}$ | $0.00$ | $0.01$ | $0.02$ | $0.03$ |
|---|---|---|---|---|
| MAE [cm] | $4.02$ | $2.65$ | $\mathbf{2.58}$ | $2.83$ |
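The ${\sigma}_{v}$ ablation above corresponds to injecting zero-mean Gaussian noise into the (normalized) input measurement vectors during training, a common augmentation for bridging the synthetic-to-real gap. A hedged sketch of such a step; this is an illustrative stand-in, not the paper's training code:

```python
import random

def add_measurement_noise(v, sigma=0.02):
    """Additive zero-mean Gaussian noise on a normalized ToF measurement
    vector, applied independently per entry as data augmentation."""
    return [x + random.gauss(0.0, sigma) for x in v]
```

With `sigma=0.0` the input is returned unchanged, matching the noiseless column of Table 4; `sigma=0.02` corresponds to the best-performing setting reported there.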

**Table 5.** MAE on the ${S}_{3}$ dataset for different window sizes. Noise level is ${\sigma}_{v}=0.02$. The best performance (in bold) is achieved with a $3\times 3$ window size.

| Window Size | $1\times 1$ | $3\times 3$ | $5\times 5$ | $7\times 7$ |
|---|---|---|---|---|
| MAE [cm] | $2.72$ | $\mathbf{2.58}$ | $2.61$ | $2.80$ |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Buratto, E.; Simonetto, A.; Agresti, G.; Schäfer, H.; Zanuttigh, P. Deep Learning for Transient Image Reconstruction from ToF Data. *Sensors* **2021**, *21*, 1962.
https://doi.org/10.3390/s21061962
