RGB-D Mirror Segmentation with Reliability-Guided Residual Correction
Abstract
1. Introduction
- We extend SATNet to RGB-D mirror segmentation by introducing a dedicated depth branch that injects hierarchical sensor-depth features into the symmetry-aware decoder.
- We propose a Reliability-Guided Residual Correction Module (RGRCM), which constructs dual-depth evidence from sensor depth and predicted depth through a Dual-Depth Evidence Block (DDEB) and performs uncertainty-aware residual correction to refine the final prediction.
- We demonstrate through extensive experiments, including repeated-run statistical tests and sensor-depth corruption studies, that the proposed framework consistently improves mirror segmentation performance, especially in region overlap (IoU) and balanced error rate (BER).
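
The RGRCM equations are not reproduced in this excerpt, so the following is only a minimal sketch of one plausible form of uncertainty-aware residual correction, written in plain Python over flattened per-pixel logits. It assumes (our assumption, not the authors' definition) that the uncertainty is the normalized binary entropy of the main prediction and that the refined logit is the main logit plus the uncertainty-gated residual; `reliability_guided_correction` and its helpers are hypothetical names.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_entropy(p: float, eps: float = 1e-7) -> float:
    """Entropy of a Bernoulli prediction, normalized to [0, 1] by dividing by ln 2."""
    p = min(max(p, eps), 1.0 - eps)
    h = -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
    return h / math.log(2.0)


def reliability_guided_correction(main_logits, residual_logits):
    """Hypothetical per-pixel residual correction gated by uncertainty.

    Confident pixels (uncertainty near 0) keep their main logit, while
    ambiguous pixels (uncertainty near 1) receive the full residual.
    Returns (refined_logits, uncertainty).
    """
    if len(main_logits) != len(residual_logits):
        raise ValueError("maps must have the same number of pixels")
    refined, uncertainty = [], []
    for z, r in zip(main_logits, residual_logits):
        u = binary_entropy(sigmoid(z))
        refined.append(z + u * r)
        uncertainty.append(u)
    return refined, uncertainty


# Illustrative input: one confident mirror pixel, one ambiguous pixel, one confident background pixel.
refined, unc = reliability_guided_correction([6.0, 0.0, -6.0], [-2.0, 2.0, 2.0])
print(refined, unc)
```

In this toy input the ambiguous pixel (logit 0) receives the whole residual and moves to a logit of 2.0, while the two confident pixels shift by only about 0.05. Gating by uncertainty is one way to let a correction branch act mainly where the main prediction is unreliable.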
2. Related Work
2.1. Mirror Segmentation from RGB Images
2.2. RGB-D Mirror Segmentation
2.3. Reliability-Aware RGB-D Fusion and Dual-Depth Cues
3. Method
3.1. Overall Architecture
3.2. Depth Branch
3.3. Reliability-Guided Residual Correction Module
3.3.1. Dual-Depth Evidence Block
3.3.2. Main Prediction and Uncertainty Estimation
3.3.3. Reliability-Guided Residual Correction
3.4. Loss Function
4. Experiments and Results
4.1. Experimental Setup
Dataset and Evaluation Metrics
4.2. Implementation Details
4.3. Comparison with State-of-the-Art Methods
4.4. Qualitative Results
4.5. Ablation Study
4.5.1. Effect of the Proposed Components
4.5.2. Uncertainty Map Visualization
4.5.3. Statistical Stability
4.5.4. Robustness to Sensor-Depth Corruption
4.5.5. Effect of Sensor Depth, Predicted Depth, and Dual-Depth Evidence
4.5.6. Effect of the Safe Loss Term
4.5.7. Hyperparameter Sensitivity Analysis
4.6. Complexity Analysis
4.7. Failure Case Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Yang, X.; Mei, H.; Xu, K.; Wei, X.; Yin, B.; Lau, R.W.H. Where Is My Mirror? In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8809–8818.
- Mei, H.; Dong, B.; Dong, W.; Peers, P.; Yang, X.; Zhang, Q.; Wei, X. Depth-Aware Mirror Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 3044–3053.
- Whelan, T.; Goesele, M.; Lovegrove, S.; Straub, J.; Green, S.; Szeliski, R.; Butterfield, S.; Verma, S.; Newcombe, R. Reconstructing Scenes with Mirror and Glass Surfaces. ACM Trans. Graph. 2018, 37, 102:1–102:11.
- Mei, H.; Yang, X.; Wang, Y.; Liu, Y.; He, S.; Zhang, Q.; Wei, X.; Lau, R.W.H. Don’t Hit Me! Glass Detection in Real-World Scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 3684–3693.
- Lin, J.; Wang, G.; Lau, R.W.H. Progressive Mirror Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 3697–3705.
- Tan, X.; Lin, J.; Xu, K.; Chen, P.; Ma, L.; Lau, R.W.H. Mirror Detection with the Visual Chirality Cue. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 3492–3504.
- Guan, H.; Lin, J.; Lau, R.W.H. Learning Semantic Associations for Mirror Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 5941–5950.
- Huang, T.; Dong, B.; Lin, J.; Liu, X.; Lau, R.W.H.; Zuo, W. Symmetry-Aware Transformer-Based Mirror Detection. Proc. AAAI Conf. Artif. Intell. 2023, 37, 935–943.
- Zhou, W.; Cai, Y.; Dong, X.; Qiang, F.; Qiu, W. ADRNet-S*: Asymmetric Depth Registration Network via Contrastive Knowledge Distillation for RGB-D Mirror Segmentation. Inf. Fusion 2024, 108, 102392.
- Ying, X.; Chuah, M.C. UCTNet: Uncertainty-Aware Cross-Modal Transformer Network for Indoor RGB-D Semantic Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 20–37.
- Yang, L.; Kang, B.; Huang, Z.; Zhao, Z.; Xu, X.; Feng, J.; Zhao, H. Depth Anything V2. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 10–15 December 2024.
- He, R.; Lin, J.; Lau, R.W.H. Efficient Mirror Detection via Multi-level Heterogeneous Learning. Proc. AAAI Conf. Artif. Intell. 2023, 37, 790–798.
- Zhou, W.; Cai, Y.; Zhang, L.; Yan, W.; Yu, L. UTLNet: Uncertainty-Aware Transformer Localization Network for RGB-Depth Mirror Segmentation. IEEE Trans. Multimed. 2024, 26, 4564–4574.
- Zhou, W.; Cai, Y.; Qiang, F. Morphology-Guided Network via Knowledge Distillation for RGB-D Mirror Segmentation. IEEE Trans. Intell. Transp. Syst. 2024, 25, 17382–17391.
- Zhou, W.; Zhang, H.; Liu, Y.; Luo, T. Enhancing RGB-D Mirror Segmentation with a Neighborhood-Matching and Demand-Modal Adaptive Network Using Knowledge Distillation. IEEE Trans. Autom. Sci. Eng. 2025, 22, 12679–12692.
- Kurohiji, R.; Hachiya, H. Depth Inconsistency-based Spatial-channel Attention Gate for Mirror Segmentation. In Proceedings of the 36th British Machine Vision Conference (BMVC), Sheffield, UK, 24–27 November 2025.
- Hu, X.; Yang, K.; Fei, L.; Wang, K. ACNet: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1440–1444.
- Chen, X.; Lin, K.-Y.; Wang, J.; Wu, W.; Qian, C.; Li, H.; Zeng, G. Bi-directional Cross-Modality Feature Propagation with Separation-and-Aggregation Gate for RGB-D Semantic Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 561–577.
- Fan, D.-P.; Zhai, Y.; Borji, A.; Yang, J.; Shao, L. BBS-Net: RGB-D Salient Object Detection with a Bifurcated Backbone Strategy Network. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 275–292.
- Wang, S.; Jiang, F.; Xu, B. Global Guided Cross-Modal Cross-Scale Network for RGB-D Salient Object Detection. Sensors 2023, 23, 7221.
- Peng, Y.; Zhai, Z.; Feng, M. SLMSF-Net: A Semantic Localization and Multi-Scale Fusion Network for RGB-D Salient Object Detection. Sensors 2024, 24, 1117.
- Kim, J.; Ghosh, D.K.; Jung, Y.J. Event-based video deblurring based on image and event feature fusion. Expert Syst. Appl. 2023, 223, 119917.
- Ghosh, D.K.; Jung, Y.J. Two-stage cross-fusion network for stereo event-based depth estimation. Expert Syst. Appl. 2024, 241, 122743.
- Ghosh, D.K.; Jung, Y.J. Depth cue fusion for event-based stereo depth estimation. Inf. Fusion 2025, 117, 102891.
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 10012–10022.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Xie, Z.; Wang, S.; Yu, Q.; Tan, X.; Xie, Y. CSFwinformer: Cross-Space-Frequency Window Transformer for Mirror Detection. IEEE Trans. Image Process. 2024, 33, 1853–1867.
- Zha, M.; Fu, F.; Pei, Y.; Wang, G.; Li, T.; Tang, X.; Yang, Y.; Shen, H.T. Dual Domain Perception and Progressive Refinement for Mirror Detection. IEEE Trans. Circuits Syst. Video Technol. 2024, 34, 11942–11953.
- Shao, Z.; Chen, R.; Shi, X.; Liu, B.; Li, C.; Ma, L.; Yeung, D.-Y. Mirror Detection via Multi-Directional Similarity Perception and Spectral Saliency Enhancement. IEEE Trans. Circuits Syst. Video Technol. 2025, 35, 10099–10109.
- Meng, Q.; Liu, Y.; Hu, R.; Liang, M.; Yan, J.; Zhu, L. SAMirror: Enhancing Mirror Detection via Integrated Visual-Depth Cues in Segment Anything Model. Vis. Comput. 2025, 41, 12679–12690.







| Type | Method | IoU↑ | Fβ↑ | MAE↓ | BER↓ |
|---|---|---|---|---|---|
| RGB mirror methods | VCNet | 73.01 | 0.849 | 0.052 | 10.42 |
| | SATNet | 80.69 | 0.877 | 0.030 | 7.33 |
| | CSFwinformer-B | 78.66 | 0.863 | 0.031 | 8.57 |
| | DPRNet | 76.10 | 0.811 | 0.047 | – |
| | S2MD | 78.60 | 0.866 | 0.030 | – |
| | SAMirror | 79.20 | 0.836 | 0.026 | 10.02 |
| RGB-D mirror methods | PDNet | 77.77 | 0.825 | 0.042 | 7.77 |
| | SANet | 78.43 | 0.834 | 0.041 | 8.16 |
| | NDANet-S* | 79.93 | 0.844 | 0.035 | 7.56 |
| | MGNet-S* | 80.80 | 0.859 | 0.030 | 7.39 |
| | UTLNet | 80.50 | 0.858 | 0.032 | 7.23 |
| | ADRNet-S* | 82.21 | 0.871 | 0.030 | 7.02 |
| | Kurohiji and Hachiya † | 70.94 | 0.881 | 0.079 | – |
| | Ours | 83.57 | 0.899 | 0.026 | 6.26 |

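
To make the four metric columns concrete, here is a per-image sketch using the definitions commonly used in mirror segmentation work: IoU and BER in percent, Fβ with β² = 0.3, and MAE on the continuous prediction map. The 0.5 binarization threshold and the fixed-threshold Fβ are assumptions, since this excerpt does not state the exact evaluation protocol; benchmark figures such as those in the table are normally averaged over all test images. The function name `mirror_metrics` is hypothetical.

```python
def mirror_metrics(pred, gt, threshold=0.5, beta2=0.3):
    """Per-image IoU (%), F-beta, MAE, and BER (%) for a binary mirror mask.

    pred: predicted mirror probabilities in [0, 1] (flattened image).
    gt:   ground-truth labels, 1 = mirror, 0 = background (same length).
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and equally long")
    # MAE is computed on the continuous map, before thresholding.
    mae = sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)
    binary = [1 if p >= threshold else 0 for p in pred]
    tp = sum(1 for b, g in zip(binary, gt) if b == 1 and g == 1)
    fp = sum(1 for b, g in zip(binary, gt) if b == 1 and g == 0)
    fn = sum(1 for b, g in zip(binary, gt) if b == 0 and g == 1)
    tn = len(gt) - tp - fp - fn
    union = tp + fp + fn
    # An empty prediction on an image without mirrors counts as a perfect match.
    iou = 100.0 * tp / union if union else 100.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta2 * precision + recall
    f_beta = (1 + beta2) * precision * recall / denom if denom else 0.0
    # BER averages the error rates of the mirror and non-mirror classes.
    pos_acc = tp / (tp + fn) if tp + fn else 1.0
    neg_acc = tn / (tn + fp) if tn + fp else 1.0
    ber = 100.0 * (1.0 - 0.5 * (pos_acc + neg_acc))
    return {"iou": iou, "f_beta": f_beta, "mae": mae, "ber": ber}


print(mirror_metrics([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

Because BER weights the mirror and background classes equally, it still penalizes missed mirror pixels when mirrors cover only a small part of the image, which is why it is reported alongside IoU.
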

| Method | IoU↑ | Fβ↑ | MAE↓ | BER↓ |
|---|---|---|---|---|
| Baseline (SATNet) | 80.69 | 0.877 | 0.030 | 7.33 |
| SATNet + Depth Branch | 83.02 | 0.898 | 0.028 | 6.72 |
| SATNet + Depth Branch + RGRCM (simple depth evidence) | 83.41 | 0.900 | 0.028 | 6.48 |
| SATNet + Depth Branch + RGRCM (full) | 83.57 | 0.899 | 0.026 | 6.26 |


| Method | IoU↑ | Fβ↑ | MAE↓ | BER↓ |
|---|---|---|---|---|
| Baseline (SATNet) | 80.13 ± 0.95 | 0.875 ± 0.006 | 0.032 ± 0.002 | 7.84 ± 0.49 |
| + Depth Branch | 83.04 ± 0.25 | 0.897 ± 0.003 | 0.028 ± 0.001 | 6.64 ± 0.20 |
| + DDEB + RGRCM (Full) | 83.42 ± 0.16 | 0.898 ± 0.002 | 0.027 ± 0.001 | 6.37 ± 0.14 |
| Paired t-test p-values vs. full model (* p < 0.05; ** p < 0.01; *** p < 0.001) | | | | |
| Baseline (SATNet) | 0.0015 ** | 0.0006 *** | 0.0053 ** | 0.0038 ** |
| + Depth Branch | 0.0294 * | 0.8712 | 0.0993 | 0.0695 |

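
The p-values above come from paired comparisons over repeated training runs. The sketch below shows how the mean ± standard deviation summaries and the paired t statistic can be computed with the Python standard library. The per-run numbers in the usage lines are arbitrary placeholders, not the paper's runs, and the excerpt does not say whether the table reports sample or population standard deviation (the sketch uses the sample version). A two-sided p-value then follows from the t distribution with n − 1 degrees of freedom, for example via `scipy.stats.ttest_rel`, which computes the same statistic.

```python
import math
import statistics


def summarize(runs):
    """Return (mean, sample standard deviation) over repeated runs."""
    return statistics.mean(runs), statistics.stdev(runs)


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched runs a[i] vs. b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Arbitrary placeholder scores for four matched runs (not results from the paper).
model_a = [3.0, 5.0, 4.0, 6.0]
model_b = [1.0, 2.0, 2.0, 3.0]
print(summarize(model_a))
print(paired_t(model_a, model_b))
```

With these placeholder values the differences are 2, 3, 2, 3, giving t = 5√3 ≈ 8.66 with 3 degrees of freedom. Pairing matters: runs that share a seed or data split are compared directly, so run-to-run variance common to both models cancels out.
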

| Method | Corruption | IoU↑ | Fβ↑ | MAE↓ | BER↓ |
|---|---|---|---|---|---|
| SATNet + Depth Branch | No corruption | 83.02 | 0.898 | 0.028 | 6.72 |
| | Missing 30% | 77.79 (−5.23) | 0.859 | 0.033 | 9.97 |
| | Missing 50% | 76.71 (−6.31) | 0.851 | 0.034 | 10.60 |
| | Missing 70% | 76.60 (−6.42) | 0.851 | 0.034 | 10.69 |
| | Noise = 3 | 82.68 (−0.34) | 0.896 | 0.029 | 6.95 |
| | Noise = 5 | 82.32 (−0.70) | 0.893 | 0.029 | 7.16 |
| | Noise = 10 | 81.53 (−1.49) | 0.888 | 0.030 | 7.67 |
| | Block 10% | 82.47 (−0.55) | 0.893 | 0.028 | 6.76 |
| | Block 30% | 81.37 (−1.65) | 0.884 | 0.029 | 6.97 |
| | Block 50% | 80.73 (−2.29) | 0.879 | 0.029 | 7.14 |
| Ours | No corruption | 83.57 | 0.899 | 0.026 | 6.26 |
| | Missing 30% | 78.94 (−4.63) | 0.868 | 0.031 | 9.37 |
| | Missing 50% | 77.93 (−5.63) | 0.861 | 0.032 | 9.98 |
| | Missing 70% | 77.87 (−5.69) | 0.861 | 0.032 | 10.04 |
| | Noise = 3 | 83.32 (−0.24) | 0.899 | 0.027 | 6.45 |
| | Noise = 5 | 83.11 (−0.46) | 0.898 | 0.027 | 6.58 |
| | Noise = 10 | 82.66 (−0.90) | 0.896 | 0.028 | 6.89 |
| | Block 10% | 83.21 (−0.35) | 0.897 | 0.027 | 6.41 |
| | Block 30% | 82.22 (−1.35) | 0.889 | 0.028 | 6.85 |
| | Block 50% | 81.64 (−1.92) | 0.886 | 0.029 | 7.14 |

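
The corruption protocol is not spelled out in this excerpt. As a hedged sketch, the three corruption families in the table could be simulated on a depth map as follows, assuming that "Missing" zeroes a random fraction of pixels (zero marking invalid depth), that "Noise" adds zero-mean Gaussian noise whose standard deviation is the listed value in the dataset's depth units, and that "Block" blanks one random square covering the listed fraction of the image. All function names are hypothetical and the depth map is a plain nested list.

```python
import math
import random


def drop_pixels(depth, ratio, rng):
    """Assumed 'Missing' corruption: set a random `ratio` of pixels to 0 (invalid depth)."""
    h, w = len(depth), len(depth[0])
    out = [row[:] for row in depth]
    for idx in rng.sample(range(h * w), round(ratio * h * w)):
        out[idx // w][idx % w] = 0.0
    return out


def add_gaussian_noise(depth, sigma, rng):
    """Assumed 'Noise' corruption: zero-mean Gaussian noise with std `sigma` on valid pixels.

    Invalid pixels (0) stay invalid, and noisy values are clamped at 0.
    """
    return [[max(0.0, d + rng.gauss(0.0, sigma)) if d > 0 else 0.0 for d in row]
            for row in depth]


def occlude_block(depth, area_ratio, rng):
    """Assumed 'Block' corruption: zero one random square covering about `area_ratio` of the map."""
    h, w = len(depth), len(depth[0])
    side = min(h, w, round(math.sqrt(area_ratio * h * w)))
    top = rng.randint(0, h - side)
    left = rng.randint(0, w - side)
    out = [row[:] for row in depth]
    for i in range(top, top + side):
        for j in range(left, left + side):
            out[i][j] = 0.0
    return out


# Toy 10 x 10 depth map with every pixel at depth 1.0.
depth = [[1.0] * 10 for _ in range(10)]
missing = drop_pixels(depth, 0.3, random.Random(0))
noisy = add_gaussian_noise(depth, 0.1, random.Random(0))
blocked = occlude_block(depth, 0.25, random.Random(0))
```

Passing an explicitly seeded `random.Random` keeps the corrupted inputs identical across the models being compared, so that differences in the table reflect the models rather than the sampled corruption.
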

| Method | IoU↑ | Fβ↑ | MAE↓ | BER↓ |
|---|---|---|---|---|
| Sensor depth only (without DDEB) | 83.02 | 0.898 | 0.028 | 6.72 |
| Predicted depth only (without DDEB; predicted depth as depth-branch input) | 82.50 | 0.882 | 0.029 | 6.85 |
| Sensor + predicted depth (full model with DDEB) | 83.57 | 0.899 | 0.026 | 6.26 |

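
The table indicates that the full model, which combines sensor and predicted depth through the DDEB, outperforms either depth source used alone. The DDEB internals are not given here, so the following is a hypothetical sketch of how dual-depth evidence might be assembled: the predicted depth is aligned to the valid sensor depth with a least-squares scale and shift, and the evidence consists of the sensor depth, the aligned prediction, their absolute discrepancy, and a sensor-validity mask. The alignment step, the choice of channels, and the names `fit_scale_shift` and `dual_depth_evidence` are our assumptions; if the depth predictor outputs relative inverse depth, the alignment would have to be done in that parameterization instead.

```python
def fit_scale_shift(pred, sensor):
    """Least-squares (s, t) so that s * pred + t approximates sensor on valid pixels (sensor > 0)."""
    pairs = [(p, d) for p, d in zip(pred, sensor) if d > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two valid sensor pixels")
    n = len(pairs)
    mean_p = sum(p for p, _ in pairs) / n
    mean_d = sum(d for _, d in pairs) / n
    var_p = sum((p - mean_p) ** 2 for p, _ in pairs)
    if var_p == 0:
        raise ValueError("predicted depth is constant on valid pixels")
    cov = sum((p - mean_p) * (d - mean_d) for p, d in pairs)
    s = cov / var_p
    return s, mean_d - s * mean_p


def dual_depth_evidence(sensor, pred):
    """Hypothetical per-pixel evidence channels built from sensor and predicted depth (flat lists)."""
    s, t = fit_scale_shift(pred, sensor)
    aligned = [s * p + t for p in pred]
    valid = [1.0 if d > 0 else 0.0 for d in sensor]
    # Discrepancy is only meaningful where the sensor returned a depth value.
    discrepancy = [abs(d - a) * v for d, a, v in zip(sensor, aligned, valid)]
    return {"sensor": list(sensor), "aligned_pred": aligned,
            "discrepancy": discrepancy, "valid": valid}


# Toy example: the third sensor pixel is missing (0), so only the prediction covers it.
evidence = dual_depth_evidence([2.0, 4.0, 0.0, 6.0], [1.0, 2.0, 5.0, 3.0])
print(evidence)
```

In the toy example the fitted scale is 2 and the shift is 0, so the aligned prediction fills the missing sensor pixel with a depth of 10 while the validity mask records that the sensor gave no reading there.
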

| Method | IoU↑ | Fβ↑ | MAE↓ | BER↓ |
|---|---|---|---|---|
| SATNet + Depth Branch + RGRCM (without the safe loss term) | 83.02 | 0.897 | 0.027 | 6.75 |
| SATNet + Depth Branch + RGRCM (with the safe loss term) | 83.57 | 0.899 | 0.026 | 6.26 |


| | | IoU↑ | Fβ↑ | MAE↓ | BER↓ |
|---|---|---|---|---|---|
| 0.01 | 5.0 | 82.85 | 0.894 | 0.027 | 6.83 |
| 0.1 | 5.0 | 83.57 | 0.899 | 0.026 | 6.26 |
| 0.2 | 5.0 | 83.04 | 0.896 | 0.027 | 6.70 |
| 0.1 | 3.0 | 83.15 | 0.895 | 0.027 | 6.60 |
| 0.1 | 5.0 | 83.57 | 0.899 | 0.026 | 6.26 |
| 0.1 | 7.0 | 83.56 | 0.899 | 0.027 | 6.34 |


| Method | Input Size | GFLOPs↓ | Params (M)↓ | FPS↑ | Peak Mem (GB)↓ | IoU↑ | BER↓ |
|---|---|---|---|---|---|---|---|
| PDNet | | 82.31 | 80.54 | 153.8 | 0.41 | 77.77 | 7.77 |
| UTLNet | | 157.74 | 263.69 | 9.5 | – | 80.50 | 7.23 |
| SATNet (baseline) | | 102.14 | 125.35 | 60.8 | 0.59 | 80.69 | 7.33 |
| + Depth Branch | | 103.11 | 126.01 | 59.5 | 0.62 | 83.02 | 6.72 |
| + Depth Branch + RGRCM (ours) | | 105.08 | 126.13 | 57.8 | 1.15 | 83.57 | 6.26 |

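
The relative cost of the full model over the SATNet baseline follows directly from the complexity table. The short helper below (the names `Cost` and `relative_change` are ours) computes the percentage change of each cost column using only the reported numbers.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Cost:
    """Cost columns taken from the complexity table."""
    gflops: float
    params_m: float
    fps: float
    peak_mem_gb: float


def relative_change(model: Cost, baseline: Cost) -> dict:
    """Percentage change of every cost column of `model` relative to `baseline`."""
    return {
        f.name: 100.0 * (getattr(model, f.name) - getattr(baseline, f.name))
        / getattr(baseline, f.name)
        for f in fields(Cost)
    }


satnet = Cost(gflops=102.14, params_m=125.35, fps=60.8, peak_mem_gb=0.59)
ours = Cost(gflops=105.08, params_m=126.13, fps=57.8, peak_mem_gb=1.15)

for name, pct in relative_change(ours, satnet).items():
    print(f"{name}: {pct:+.1f}%")
```

This gives about +2.9% GFLOPs, +0.6% parameters, −4.9% FPS, and +94.9% peak memory. In other words, the full model's extra cost over SATNet lies mainly in peak memory rather than in computation or model size.
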
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Kim, T.; Jung, Y.J. RGB-D Mirror Segmentation with Reliability-Guided Residual Correction. Sensors 2026, 26, 3739. https://doi.org/10.3390/s26123739
Kim T, Jung YJ. RGB-D Mirror Segmentation with Reliability-Guided Residual Correction. Sensors. 2026; 26(12):3739. https://doi.org/10.3390/s26123739
Chicago/Turabian Style: Kim, Taehyeon, and Yong Ju Jung. 2026. "RGB-D Mirror Segmentation with Reliability-Guided Residual Correction" Sensors 26, no. 12: 3739. https://doi.org/10.3390/s26123739
APA Style: Kim, T., & Jung, Y. J. (2026). RGB-D Mirror Segmentation with Reliability-Guided Residual Correction. Sensors, 26(12), 3739. https://doi.org/10.3390/s26123739

