SemABC: Semantic-Guided Adaptive Bias Calibration for Generative Zero-Shot Point Cloud Segmentation
Abstract
1. Introduction
- We propose a dual-branch network in which an auxiliary branch exploits visual–semantic correlations to complement the predictions of the main branch. By leveraging the stronger semantic relevance of the visual features synthesized for unseen classes, it counteracts the bias toward seen classes.
- We propose an adaptive bias calibration module driven by the confidence of the segmentation predictions. The module dynamically integrates the predictions of the main and auxiliary branches, effectively suppressing the network's bias toward seen classes (see the sketch after this list).
- Extensive experiments on the ScanNet, S3DIS, and SemanticKITTI benchmarks show that our method significantly outperforms existing generative GZSL methods in both segmentation accuracy and robustness.
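To make the second contribution concrete, below is a minimal NumPy sketch of confidence-weighted branch fusion combined with a calibrated-stacking-style offset on the seen-class scores (cf. Section 2.3). It is an illustration under stated assumptions, not the authors' implementation: confidence is taken here as the main branch's maximum softmax probability, and all names (`adaptive_bias_calibration`, `gamma_max`) are hypothetical.

```python
import numpy as np

def adaptive_bias_calibration(main_logits, aux_logits, seen_mask, gamma_max=1.0):
    """Hypothetical sketch of confidence-weighted fusion with calibrated stacking.

    main_logits, aux_logits: (N, C) per-point class scores from the two branches.
    seen_mask: (C,) boolean array, True for seen classes.
    gamma_max: upper bound on the bias subtracted from seen-class scores.
    """
    # Per-point confidence of the main branch, here simply the max softmax probability.
    probs = np.exp(main_logits - main_logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    confidence = probs.max(axis=1, keepdims=True)           # (N, 1), in (0, 1]

    # Low-confidence points rely more on the auxiliary (semantic-guided) branch ...
    fused = confidence * main_logits + (1.0 - confidence) * aux_logits

    # ... and receive a larger calibration offset on the seen classes
    # (calibrated stacking with a per-point, confidence-dependent bias).
    gamma = gamma_max * (1.0 - confidence)                   # (N, 1)
    fused = fused - gamma * seen_mask.astype(fused.dtype)    # broadcasts over classes

    return fused.argmax(axis=1)
```

The intuition is that points the main branch is unsure about lean more heavily on the semantically guided auxiliary branch and receive a larger bias correction on the seen classes.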
2. Related Works
2.1. Point Cloud Semantic Segmentation
2.2. Zero-Shot Point Cloud Semantic Segmentation
2.3. Calibrated Stacking
3. Methods
3.1. Overview
3.2. Primary Segmentation Branch
3.3. Visual–Semantic Fusion Branch
3.4. Adaptive Bias Calibration Module
3.5. Three-Stage Training Strategy
4. Experiments
4.1. Settings
4.2. Implementation Details
4.3. Comparisons
4.4. Ablation Studies
4.4.1. Module Effectiveness
4.4.2. Channel and Spatial Attention
4.4.3. Prediction Confidence Estimation
4.4.4. Hyperparameters
4.5. Limitations
5. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Meaning |
|---|---|
| 3D | Three-Dimensional |
| 2D | Two-Dimensional |
| ZSL | Zero-Shot Learning |
| GZSL | Generalized Zero-Shot Learning |
| NLP | Natural Language Processing |
| MLP | Multi-Layer Perceptron |
| MMD | Maximum Mean Discrepancy |
| mIoU | Mean Intersection-over-Union |
| HmIoU | Harmonic Mean of Seen- and Unseen-Class mIoU |
| SGD | Stochastic Gradient Descent |
| Adam | Adaptive Moment Estimation |
| GMMN | Generative Moment Matching Network |
| S3DIS | Stanford Large-Scale 3D Indoor Spaces |
| VSFM | Visual–Semantic Fusion Module |
| BCM | Adaptive Bias Calibration Module |
| Max | Maximum Probability |
References
- Ansary, S.I.; Mishra, A.; Deb, S.; Deb, A.K. A framework for robotic grasping of 3D objects in a tabletop environment. Multimed. Tools Appl. 2025, 84, 25865–25894. [Google Scholar] [CrossRef]
- Hu, D.; Gan, V.J.; Yin, C. Robot-assisted mobile scanning for automated 3D reconstruction and point cloud semantic segmentation of building interiors. Autom. Constr. 2023, 152, 104949. [Google Scholar] [CrossRef]
- Huang, C.; Mees, O.; Zeng, A.; Burgard, W. Visual language maps for robot navigation. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 10608–10615. [Google Scholar]
- Xiao, X.; Liu, B.; Warnell, G.; Stone, P. Motion planning and control for mobile robot navigation using machine learning: A survey. Auton. Robot. 2022, 46, 569–597. [Google Scholar] [CrossRef]
- Arena, F.; Collotta, M.; Pau, G.; Termine, F. An overview of augmented reality. Computers 2022, 11, 28. [Google Scholar] [CrossRef]
- Liu, G.; van Kaick, O.; Huang, H.; Hu, R. Active self-training for weakly supervised 3D scene semantic segmentation. Comput. Vis. Media 2024, 10, 425–438. [Google Scholar] [CrossRef]
- Shao, Y.; Tong, G.; Peng, H. Exploring high-contrast areas context for 3D point cloud segmentation via MLP-driven Discrepancy mechanism. Comput. Graph. 2025, 129, 104222. [Google Scholar] [CrossRef]
- Sun, C.Y.; Tong, X.; Liu, Y. Semantic segmentation-assisted instance feature fusion for multi-level 3D part instance segmentation. Comput. Vis. Media 2023, 9, 699–715. [Google Scholar] [CrossRef]
- Cheraghian, A.; Rahman, S.; Chowdhury, T.F.; Campbell, D.; Petersson, L. Zero-shot learning on 3d point cloud objects and beyond. Int. J. Comput. Vis. 2022, 130, 2364–2384. [Google Scholar] [CrossRef]
- Xu, W.; Xu, R.; Wang, C.; Xu, S.; Guo, L.; Zhang, M.; Zhang, X. Spectral prompt tuning: Unveiling unseen classes for zero-shot semantic segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 26–27 February 2024; Volume 38, pp. 6369–6377. [Google Scholar]
- Zhang, Y.; Guo, M.H.; Wang, M.; Hu, S.M. Exploring regional clues in CLIP for zero-shot semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 3270–3280. [Google Scholar]
- Su, T.; Wang, H.; Qi, Q.; Wang, L.; He, B. Transductive learning with prior knowledge for generalized zero-shot action recognition. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 260–273. [Google Scholar] [CrossRef]
- Ge, C.; Wang, J.; Qi, Q.; Sun, H.; Xu, T.; Liao, J. Semi-transductive learning for generalized zero-shot sketch-based image retrieval. In Proceedings of the AAAI Conference on Artificial Intelligence, Montréal, QC, Canada, 8–10 August 2023; Volume 37, pp. 7678–7686. [Google Scholar]
- Liu, Y.; Tao, K.; Tian, T.; Gao, X.; Han, J.; Shao, L. Transductive zero-shot learning with generative model-driven structure alignment. Pattern Recognit. 2024, 153, 110561. [Google Scholar] [CrossRef]
- Khan, F.B.; Khan, A.; Durad, M.H.; Khan, F.A.; Ali, A. ISAnWin: Inductive generalized zero-shot learning using deep CNN for malware detection across windows and android platforms. PeerJ Comput. Sci. 2024, 10, e2604. [Google Scholar]
- Chen, S.; Wang, W.; Xia, B.; Peng, Q.; You, X.; Zheng, F.; Shao, L. Free: Feature refinement for generalized zero-shot learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 122–131. [Google Scholar]
- Wu, J.; Zhang, T.; Zha, Z.J.; Luo, J.; Zhang, Y.; Wu, F. Prototype-augmented self-supervised generative network for generalized zero-shot learning. IEEE Trans. Image Process. 2024, 33, 1938–1951. [Google Scholar] [CrossRef]
- Michele, B.; Boulch, A.; Puy, G.; Bucher, M.; Marlet, R. Generative zero-shot learning for semantic segmentation of 3d point clouds. In Proceedings of the 2021 International Conference on 3D Vision (3DV), London, UK, 1–3 December 2021; pp. 992–1002. [Google Scholar]
- Yang, Y.; Hayat, M.; Jin, Z.; Zhu, H.; Lei, Y. Zero-shot point cloud segmentation by semantic-visual aware synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 11586–11596. [Google Scholar]
- Li, Y.; Swersky, K.; Zemel, R. Generative moment matching networks. In Proceedings of the International Conference on Machine Learning. PMLR, Lille, France, 7–9 July 2015; pp. 1718–1727. [Google Scholar]
- Odena, A.; Olah, C.; Shlens, J. Conditional image synthesis with auxiliary classifier gans. In Proceedings of the International Conference on Machine Learning. PMLR, Sydney, Australia, 6–11 August 2017; pp. 2642–2651. [Google Scholar]
- Wei, Z.; Chen, P.; Yu, X.; Li, G.; Jiao, J.; Han, Z. Semantic-aware SAM for point-prompted instance segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 3585–3594. [Google Scholar]
- Zheng, X.; Lyu, Y.; Wang, L. Learning modality-agnostic representation for semantic segmentation from any modalities. In Proceedings of the European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2024; pp. 146–165. [Google Scholar]
- Li, H.; Zhang, D.; Dai, Y.; Liu, N.; Cheng, L.; Li, J.; Wang, J.; Han, J. Gp-nerf: Generalized perception nerf for context-aware 3d scene understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 21708–21718. [Google Scholar]
- Zhai, X.; Han, W.; Li, X.; Huang, S. PLGCA: A Progressive Local-Global Context-Aware Semantic Segmentation Network for Crop Remote Sensing Mapping. In Proceedings of the 2024 6th International Conference on Electronics and Communication, Network and Computer Technology (ECNCT), Guangzhou, China, 19–21 July 2024; pp. 491–495. [Google Scholar]
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 652–660. [Google Scholar]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Adv. Neural Inf. Process. Syst. 2017, 30, 5105–5114. [Google Scholar]
- Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6411–6420. [Google Scholar]
- Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. Randla-net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11108–11117. [Google Scholar]
- Zhou, Z.; Lei, Y.; Zhang, B.; Liu, L.; Liu, Y. Zegclip: Towards adapting clip for zero-shot semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 11175–11185. [Google Scholar]
- Tian, J.; Aggarwal, L.; Colaco, A.; Kira, Z.; Gonzalez-Franco, M. Diffuse attend and segment: Unsupervised zero-shot segmentation using stable diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 3554–3563. [Google Scholar]
- Cheraghian, A.; Rahman, S.; Petersson, L. Zero-shot learning of 3d point cloud objects. In Proceedings of the 2019 16th International Conference on Machine Vision Applications (MVA), Tokyo, Japan, 27–31 May 2019; pp. 1–6. [Google Scholar]
- Cheraghian, A.; Rahman, S.; Campbell, D.; Petersson, L. Mitigating the hubness problem for zero-shot learning of 3d objects. arXiv 2019, arXiv:1907.06371. [Google Scholar]
- Cheraghian, A.; Rahman, S.; Campbell, D.; Petersson, L. Transductive zero-shot learning for 3d point cloud classification. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Snowmass, CO, USA, 1–5 March 2020; pp. 923–933. [Google Scholar]
- Chao, W.L.; Changpinyo, S.; Gong, B.; Sha, F. An empirical study and analysis of generalized zero-shot learning for object recognition in the wild. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part II 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 52–68. [Google Scholar]
- Shi, Y.; Jiang, C.; Song, F.; Ye, Q.; Long, Y.; Zhang, H. Multi-domain feature-enhanced attribute updater for generalized zero-shot learning. Neural Comput. Appl. 2025, 37, 8397–8414. [Google Scholar] [CrossRef]
- Pennington, J.; Socher, R.; Manning, C.D. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543. [Google Scholar]
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. Adv. Neural Inf. Process. Syst. 2013, 26, 3111–3119. [Google Scholar]
- Dai, A.; Chang, A.X.; Savva, M.; Halber, M.; Funkhouser, T.; Nießner, M. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5828–5839. [Google Scholar]
- Armeni, I.; Sener, O.; Zamir, A.R.; Jiang, H.; Brilakis, I.; Fischer, M.; Savarese, S. 3d semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1534–1543. [Google Scholar]
- Behley, J.; Garbade, M.; Milioto, A.; Quenzel, J.; Behnke, S.; Stachniss, C.; Gall, J. Semantickitti: A dataset for semantic scene understanding of lidar sequences. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9297–9307. [Google Scholar]
- Frome, A.; Corrado, G.S.; Shlens, J.; Bengio, S.; Dean, J.; Ranzato, M.; Mikolov, T. Devise: A deep visual-semantic embedding model. Adv. Neural Inf. Process. Syst. 2013, 26, 2121–2129. [Google Scholar]
- Boulch, A.; Puy, G.; Marlet, R. FKAConv: Feature-kernel alignment for point cloud convolution. In Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan, 30 November–4 December 2020. [Google Scholar]
- Boulch, A. ConvPoint: Continuous convolutions for point cloud processing. Comput. Graph. 2020, 88, 24–34. [Google Scholar] [CrossRef]
- Malinin, A.; Gales, M. Predictive uncertainty estimation via prior networks. Adv. Neural Inf. Process. Syst. 2018, 31, 7047–7058. [Google Scholar]
- Liu, W.; Wang, X.; Owens, J.; Li, Y. Energy-based out-of-distribution detection. Adv. Neural Inf. Process. Syst. 2020, 33, 21464–21475. [Google Scholar]
- Seo, S.; Seo, P.H.; Han, B. Learning for single-shot confidence calibration in deep neural networks through stochastic inferences. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9030–9038. [Google Scholar]
- Tang, Y.; Lin, Z.; Wang, Q.; Zhu, P.; Hu, Q. Amu-tuning: Effective logit bias for clip-based few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–22 June 2024; pp. 23323–23333. [Google Scholar]
mIoU (%) on seen (S), unseen (U), and all (A) classes, and the harmonic mean HmIoU, on ScanNet, S3DIS, and SemanticKITTI.

| Method | ScanNet-S | ScanNet-U | ScanNet-A | ScanNet-HmIoU | S3DIS-S | S3DIS-U | S3DIS-A | S3DIS-HmIoU | SemanticKITTI-S | SemanticKITTI-U | SemanticKITTI-A | SemanticKITTI-HmIoU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Full supervision | 43.3 | 51.9 | 45.1 | 47.2 | 74.0 | 50.0 | 66.6 | 59.6 | 59.4 | 50.3 | 57.5 | 54.5 |
| Unseen classes only for segmentor | 41.5 | 39.2 | 40.3 | 40.3 | 60.9 | 21.5 | 48.7 | 31.8 | 52.9 | 13.2 | 42.3 | 21.2 |
| Supervision with seen classes | 39.2 | 0.0 | 31.3 | 0.0 | 70.2 | 0.0 | 48.6 | 0.0 | 55.8 | 0.0 | 44.0 | 0.0 |
| ZSLPC-Seg [32] | 16.4 | 4.2 | 13.9 | 6.7 | 5.2 | 1.3 | 4.0 | 2.1 | 26.4 | 10.2 | 21.8 | 14.7 |
| DeViSe-3DSeg [42] | 12.8 | 3.0 | 10.9 | 4.8 | 3.6 | 1.4 | 3.0 | 2.0 | 42.9 | 4.2 | 27.6 | 7.5 |
| 3DGenZ [18] | 32.8 | 7.7 | 27.8 | 12.5 | 53.1 | 7.3 | 39.0 | 12.9 | 41.4 | 10.8 | 35.0 | 17.1 |
| SV-Seg [19] | 34.5 | 14.3 | 30.4 | 20.2 | 58.9 | 9.7 | 43.8 | 16.7 | 46.4 | 12.8 | 39.4 | 20.1 |
| Ours | 34.9 | 14.9 | 30.9 | 20.9 | 68.1 | 10.4 | 50.3 | 18.0 | 49.5 | 15.2 | 42.3 | 23.3 |
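HmIoU is the harmonic mean of the seen- and unseen-class mIoU, so it rewards balanced performance across the two groups. The tabulated values can be reproduced directly, e.g., for the "Ours" row:

```python
def hmiou(miou_seen, miou_unseen):
    """Harmonic mean of seen- and unseen-class mIoU (in %)."""
    return 2 * miou_seen * miou_unseen / (miou_seen + miou_unseen)

# "Ours" row: ScanNet, S3DIS, SemanticKITTI
print(round(hmiou(34.9, 14.9), 1),  # 20.9
      round(hmiou(68.1, 10.4), 1),  # 18.0
      round(hmiou(49.5, 15.2), 1))  # 23.3
```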
Ablation of VSFM and BCM on ScanNet: mIoU (%) on seen (S), unseen (U), and all (A) classes, and HmIoU.

| Method | VSFM | BCM | mIoU (S) | mIoU (U) | mIoU (A) | HmIoU |
|---|---|---|---|---|---|---|
| Fixed Calibration | ✕ | ✕ | 34.6 | 14.2 | 30.5 | 20.1 |
| w/o BCM | ✓ | ✕ | 34.9 | 13.9 | 30.8 | 19.9 |
| w/o VSFM | ✕ | ✓ | 32.4 | 14.0 | 28.7 | 19.6 |
| Ours | ✓ | ✓ | 34.9 | 14.9 | 30.9 | 20.9 |
Ablation of channel and spatial attention on ScanNet: mIoU (%) on seen (S), unseen (U), and all (A) classes, and HmIoU.

| Channel | Spatial | mIoU (S) | mIoU (U) | mIoU (A) | HmIoU |
|---|---|---|---|---|---|
| ✕ | ✕ | 32.8 | 13.9 | 29.0 | 19.5 |
| ✓ | ✕ | 32.0 | 13.8 | 28.3 | 19.3 |
| ✕ | ✓ | 31.8 | 14.4 | 28.4 | 19.9 |
| ✓ | ✓ | 34.9 | 14.9 | 30.9 | 20.9 |
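The channel and spatial attention ablated above admit many designs; the following is a CBAM-style sketch for per-point features, given only as a plausible reading (the VSFM attention in the paper may differ in structure and placement). The class and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """CBAM-style attention over per-point features of shape (B, C, N).

    Illustrative only; not necessarily the paper's VSFM attention design.
    """
    def __init__(self, channels, reduction=4):
        super().__init__()
        # Channel attention: pool over points, re-weight channels with a small MLP.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial (per-point) attention from channel-pooled statistics.
        self.spatial_conv = nn.Conv1d(2, 1, kernel_size=1)

    def forward(self, x):                       # x: (B, C, N)
        # --- channel attention ---
        avg = x.mean(dim=2)                     # (B, C)
        mx = x.max(dim=2).values                # (B, C)
        ca = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))  # (B, C)
        x = x * ca.unsqueeze(-1)

        # --- spatial attention ---
        stats = torch.stack([x.mean(dim=1), x.max(dim=1).values], dim=1)  # (B, 2, N)
        sa = torch.sigmoid(self.spatial_conv(stats))                      # (B, 1, N)
        return x * sa
```

Applying both branches in sequence mirrors the table's finding that channel and spatial attention are most effective when combined.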
Comparison of confidence estimators on ScanNet and S3DIS: mIoU (%) on seen (S), unseen (U), and all (A) classes, and HmIoU.

| Method | ScanNet-S | ScanNet-U | ScanNet-A | ScanNet-HmIoU | S3DIS-S | S3DIS-U | S3DIS-A | S3DIS-HmIoU |
|---|---|---|---|---|---|---|---|---|
| Baseline | 34.5 | 14.3 | 30.4 | 20.2 | 58.9 | 9.7 | 43.8 | 16.7 |
| Max | 34.8 | 13.3 | 30.5 | 19.3 | 67.7 | 9.8 | 49.9 | 17.1 |
| Entropy | 34.4 | 14.1 | 30.3 | 20.0 | 68.1 | 9.3 | 50.0 | 16.4 |
| Energy | 34.6 | 14.1 | 30.5 | 20.0 | 67.3 | 9.5 | 49.5 | 16.7 |
| Variance | 34.8 | 13.8 | 30.6 | 19.8 | 66.9 | 9.8 | 49.3 | 17.1 |
| Kurtosis | 34.9 | 14.9 | 30.9 | 20.9 | 68.1 | 10.4 | 50.3 | 18.0 |
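For reference, the confidence estimators compared above can be computed per point from the class logits roughly as follows. This is a minimal NumPy/SciPy sketch with illustrative names; the exact formulations in the paper (e.g., a temperature in the energy score) may differ.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import kurtosis


def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def confidence_scores(logits):
    """Per-point confidence estimators for an (N, C) array of class logits.

    All scores are oriented so that a larger value means higher confidence.
    """
    p = softmax(logits, axis=1)
    return {
        "Max": p.max(axis=1),                            # maximum softmax probability
        "Entropy": (p * np.log(p + 1e-12)).sum(axis=1),  # negative predictive entropy
        "Energy": logsumexp(logits, axis=1),             # negative energy score
        "Variance": p.var(axis=1),                       # spread of the softmax distribution
        "Kurtosis": kurtosis(p, axis=1),                 # peakedness of the softmax distribution
    }
```

In the adaptive bias calibration, such a score would determine, per point, how strongly the main-branch prediction is trusted relative to the auxiliary branch.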