TCG-Depth: A Two-Stage Symmetric Confidence-Guided Framework for Transparent Object Depth Completion
Abstract
1. Introduction
- We propose TCG-Depth, a two-stage confidence-guided depth completion framework specifically tailored for transparent objects, which introduces spatial reliability awareness to enhance the robustness of single-view depth completion.
- We design a reliability estimation mechanism that couples pixel-wise confidence prediction with an image-level adaptive threshold, enabling the framework to autonomously distinguish reliable from unreliable regions across diverse scenes and transparent object geometries (see the sketch after this list).
- We develop a confidence-aware feature modulation strategy that selectively refines unreliable depth predictions while preserving high-confidence geometric information, significantly improving completion accuracy in challenging transparent regions.
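As a concrete illustration of the second and third contributions, the PyTorch sketch below wires a pixel-wise confidence head, an image-level threshold regressor, and a sigmoid soft gate together. This is a minimal sketch under our own assumptions: the module name ConfidenceGate, the layer sizes, and the sharpness parameter k are illustrative choices, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class ConfidenceGate(nn.Module):
    """Illustrative sketch: pixel-wise confidence head, image-level
    adaptive threshold, and a sigmoid soft gate. Names, layer sizes,
    and the sharpness k are assumptions, not the paper's released code."""

    def __init__(self, feat_ch: int = 64, k: float = 10.0):
        super().__init__()
        # Pixel-wise confidence head: features -> confidence map in [0, 1].
        self.conf_head = nn.Sequential(
            nn.Conv2d(feat_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 1), nn.Sigmoid(),
        )
        # Scene-adaptive threshold: pooled image features -> tau in (0, 1).
        self.tau_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_ch, 1), nn.Sigmoid(),
        )
        self.k = k  # gate sharpness

    def forward(self, feats: torch.Tensor):
        conf = self.conf_head(feats)                  # (B, 1, H, W)
        tau = self.tau_head(feats).view(-1, 1, 1, 1)  # (B, 1, 1, 1)
        # Soft gate: ~1 where conf >> tau (trust the initial depth),
        # ~0 where conf << tau (route the pixel to the refinement stage).
        gate = torch.sigmoid(self.k * (conf - tau))
        return conf, tau, gate

# Example: gate a batch of 64-channel feature maps.
# conf, tau, gate = ConfidenceGate(64)(torch.randn(2, 64, 120, 160))
```

In such a design, the refinement stage acts mainly where the gate is near zero, leaving high-confidence depth largely untouched, which matches the selective-refinement behavior described above.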
2. Related Work
2.1. Transparent Object Perception
2.2. Single-View Transparent Object Perception
2.3. Multi-View Transparent Object Perception
3. Method
3.1. Overview
3.2. Initial Depth Completion
3.3. Confidence Mask Estimation
3.3.1. Pixel-Wise Confidence Prediction
3.3.2. Scene-Adaptive Threshold Regression
3.3.3. Soft Confidence-Gating Function
3.3.4. Confidence Supervision
3.4. Confidence-Aware Depth Refinement
4. Experiments
4.1. Datasets
4.2. Evaluation Metrics
4.3. Implementation Details and Baselines
4.4. Ablation Studies
4.4.1. Effectiveness of the Overall Framework Design
4.4.2. Effectiveness of Confidence-Based Region Separation
4.4.3. Effectiveness of Different Confidence-Based Refinement Strategies
4.5. Comparison to State-of-the-Art Methods
4.6. Cross-Dataset Generalization
4.7. Qualitative Comparison
5. Discussions and Limitations
5.1. Reliability Analysis of Confidence Mapping
5.2. Failure Cases and Environmental Constraints
5.3. Future Roadmap
- Real-world Deployment and Hardware Integration: A primary focus of our future roadmap is to validate the framework’s zero-shot generalization on diverse real-world captures. We aim to deploy TCG-Depth onto physical robotic manipulation platforms to evaluate its real-time performance and grasping success rates in unstructured environments.
- Robustness in Extreme Optical Scenarios: Future work will involve extensive experiments on edge cases, such as glassware with high-curvature geometries and environments under extreme lighting conditions (e.g., intense specular glare or low-light transmission). This will further refine the confidence module’s ability to characterize structural unreliability in perceptually degraded scenes.
- Complete Volumetric Reconstruction: Beyond depth completion, we plan to explore the integration of generative priors to infer missing geometric structures. This research will focus on recovering complete 3D point clouds for transparent objects, even in cases of severe texture loss or complex overlapping refractions.
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| TCG-Depth | Confidence-Aware Two-Stage Depth Completion for Transparent Objects |
| RMSE | Root Mean Squared Error |
| MAE | Mean Absolute Error |
| REL | Absolute Relative Error |
| CBAM | Convolutional Block Attention Module |
| CMAM | Convolutional Multi-scale Attention Module |
| Model | RGB | Depth | CBAM | Conf-Gating | RMSE ↓ | REL ↓ | MAE ↓ |
|---|---|---|---|---|---|---|---|
| Stage1-a (RGB only) | ✓ |  |  |  | 0.029 | 0.071 | 0.018 |
| Stage1-a (Depth only) |  | ✓ |  |  | 0.027 | 0.053 | 0.016 |
| Stage1-a (RGB+Depth) | ✓ | ✓ |  |  | 0.018 | 0.023 | 0.013 |
| Stage1-a (RGB+Depth+CBAM) | ✓ | ✓ | ✓ |  | 0.017 | 0.022 | 0.012 |
| Stage2 w/o conf-gating | ✓ | ✓ | ✓ |  | 0.016 | 0.022 | 0.011 |
| TCG-Depth (Ours) | ✓ | ✓ | ✓ | ✓ | 0.013 | 0.018 | 0.009 |
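The Stage-1 variants above differ only in which modalities feed the encoder and whether attention is applied to the fused features. As a hedged sketch of this kind of fusion, the block below combines RGB and depth feature maps and optionally applies a standard CBAM module (Woo et al., ECCV 2018); the channel sizes and fusion layout are our assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    # Standard CBAM (Woo et al., ECCV 2018): channel then spatial attention.
    def __init__(self, ch: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(inplace=True),
            nn.Linear(ch // reduction, ch),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention from global avg- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise avg and max maps.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

class RGBDFusion(nn.Module):
    # Illustrative fusion of RGB and depth features, optionally with CBAM.
    def __init__(self, rgb_ch=64, d_ch=64, out_ch=64, use_cbam=True):
        super().__init__()
        self.fuse = nn.Conv2d(rgb_ch + d_ch, out_ch, 3, padding=1)
        self.cbam = CBAM(out_ch) if use_cbam else nn.Identity()

    def forward(self, f_rgb, f_depth):
        return self.cbam(self.fuse(torch.cat([f_rgb, f_depth], dim=1)))
```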
| Region | RMSE ↓ | REL ↓ | MAE ↓ |
|---|---|---|---|
| High-confidence regions | 0.009 | 0.012 | 0.006 |
| Low-confidence regions | 0.031 | 0.056 | 0.021 |
| Full scene | 0.017 | 0.022 | 0.012 |
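For reference, region-wise numbers of this form can be computed from a predicted depth map, ground truth, and confidence map with a short NumPy routine. The sketch below uses a fixed split threshold tau=0.5 as a placeholder; the paper instead regresses a scene-adaptive threshold (Section 3.3.2).

```python
import numpy as np

def region_metrics(pred, gt, conf, tau=0.5):
    """Per-region RMSE/REL/MAE given a confidence map; tau is a
    placeholder split threshold, not the paper's learned value."""
    masks = {
        "high_conf": conf >= tau,
        "low_conf": conf < tau,
        "full": np.ones_like(conf, dtype=bool),
    }
    valid = gt > 0  # ignore pixels without ground-truth depth
    out = {}
    for name, m in masks.items():
        m = m & valid
        if not m.any():
            continue
        err = pred[m] - gt[m]
        out[name] = {
            "RMSE": float(np.sqrt(np.mean(err ** 2))),
            "MAE": float(np.mean(np.abs(err))),
            "REL": float(np.mean(np.abs(err) / gt[m])),
        }
    return out
```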
| Strategy | RMSE ↓ | REL ↓ | MAE ↓ |
|---|---|---|---|
| Binary confidence-guided gating (B-Mask) | 0.016 | 0.021 | 0.011 |
| Linear confidence-guided gating (L-Mask) | 0.016 | 0.020 | 0.010 |
| Soft confidence-guided gating (S-Mask, Ours) | 0.013 | 0.018 | 0.009 |
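To make the three strategies concrete, the sketch below shows one plausible way to form the binary (B-Mask), linear (L-Mask), and soft (S-Mask) gates from the same confidence map and threshold; the ramp width and sharpness constants are illustrative, and the paper's exact parameterization may differ.

```python
import torch

def binary_gate(conf, tau):
    # B-Mask: hard 0/1 decision at the threshold.
    return (conf >= tau).float()

def linear_gate(conf, tau, width=0.2):
    # L-Mask: linear ramp of the given width centered on tau.
    return ((conf - tau) / width + 0.5).clamp(0.0, 1.0)

def soft_gate(conf, tau, k=10.0):
    # S-Mask: smooth sigmoid transition; differentiable everywhere.
    return torch.sigmoid(k * (conf - tau))
```

Only the soft gate is smooth and differentiable everywhere, which is consistent with its lower errors in the table: hard decision boundaries can leave visible seams between refined and unrefined regions, and a linear ramp is still non-smooth at its endpoints.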
| Model | RMSE ↓ | REL ↓ | MAE ↓ | δ1.05 (%) ↑ | δ1.10 (%) ↑ | δ1.25 (%) ↑ | Params (M) ↓ | FPS ↑ |
|---|---|---|---|---|---|---|---|---|
| CG (ClearGrasp) | 0.054 | 0.083 | 0.037 | 50.48 | 68.68 | 95.28 | 42.5 | 21.6 |
| LIDF-Refine | 0.019 | 0.034 | 0.015 | 78.22 | 94.26 | 99.80 | 28.7 | 31.2 |
| DFNet | 0.018 | 0.023 | 0.013 | 83.76 | 95.67 | 99.71 | 26.1 | 33.0 |
| FDCT | 0.015 | 0.022 | 0.010 | 88.18 | 97.15 | 99.81 | 27.8 | 30.2 |
| TranspareNet | 0.026 | 0.023 | 0.013 | 88.45 | 96.25 | 99.42 | 30.3 | 27.4 |
| TCRNet | 0.017 | 0.020 | 0.010 | 88.96 | 96.94 | 99.87 | 34.5 | 22.8 |
| Ours | 0.013 | 0.018 | 0.009 | 91.58 | 97.65 | 99.82 | 38.2 | 21.4 |
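The δ1.05/δ1.10/δ1.25 columns report threshold accuracy: the percentage of valid pixels whose depth ratio max(d/d*, d*/d) falls below the given threshold. A minimal NumPy sketch under this standard definition:

```python
import numpy as np

def delta_accuracy(pred, gt, thresholds=(1.05, 1.10, 1.25)):
    # Fraction of valid pixels whose depth ratio max(pred/gt, gt/pred)
    # is below each threshold, reported as a percentage.
    valid = (gt > 0) & (pred > 0)
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return {t: 100.0 * np.mean(ratio < t) for t in thresholds}
```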
| Model | RMSE ↓ | REL ↓ | MAE ↓ | δ1.05 (%) ↑ | δ1.10 (%) ↑ | δ1.25 (%) ↑ |
|---|---|---|---|---|---|---|
| CG (ClearGrasp) | 0.085 | 0.095 | 0.052 | 47.26 | 70.76 | 92.54 |
| LIDF-Refine | 0.152 | 0.225 | 0.139 | 9.86 | 20.63 | 46.02 |
| DFNet | 0.041 | 0.054 | 0.031 | 62.74 | 83.31 | 97.33 |
| FDCT | 0.042 | 0.058 | 0.033 | 60.12 | 81.45 | 96.88 |
| TranspareNet | 0.045 | 0.071 | 0.040 | 33.43 | 70.14 | 99.40 |
| TCRNet | 0.034 | 0.049 | 0.027 | 63.67 | 86.63 | 99.47 |
| Ours | 0.031 | 0.042 | 0.025 | 67.26 | 88.16 | 99.03 |