Advancements in Semantic Segmentation of 3D Point Clouds for Scene Understanding Using Deep Learning
Abstract
1. Introduction
- We present a chronological and categorized view of the deep learning methods most relevant to 3D semantic segmentation.
- We synthesize these approaches through a clear classification based on data representation (points, voxels, views, etc.) and algorithmic paradigm (convolutional, graphs, transformers, etc.).
- We carry out a rigorous comparative analysis of the results (mIoU) obtained on the main public datasets: SemanticKITTI, ScanNet, S3DIS, etc.
- We highlight the challenges still open and areas for future research, particularly in terms of robustness, domain adaptation and computational efficiency.
2. Data Representation
3. Methods for 3D Semantic Segmentation
3.1. Supervised Approaches
3.1.1. Point-Based Methods
PointNets/MLP Methods
ConvNet-Based Methods
Graph-Based Methods
RNN-Based Methods
- An encoder: Extracts features from the 3D point cloud. This can be achieved by converting the point cloud into a voxel grid and applying a 3D CNN to extract features from the voxel grid. Alternatively, it can be accomplished using a PointNet++ architecture, which processes point clouds directly, leveraging a Multi-Layer Perceptron (MLP) to extract features from each point.
- Recurrent layers: These layers process the features extracted by the encoder and learn sequential dependencies between points in the point cloud. This can be performed using a recurrent network such as an LSTM or GRU, or a more complex architecture such as a bidirectional or multi-layer RNN.
- A decoder: This component predicts the semantic class of each point in the point cloud based on the features and dependencies learned by the recurrent layers. This can be achieved using a simple MLP or a more complex architecture, such as a fully connected Conditional Random Field (CRF) [59].
- A loss function: This component is used to train the model by comparing the predicted semantic class of each point to the ground-truth semantic class. Common loss functions used in 3D semantic segmentation include cross-entropy loss and Dice loss. A minimal sketch of how these four components fit together is given after this list.
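To make the assembly of these four components concrete, the following PyTorch sketch wires a shared-MLP encoder, a bidirectional LSTM, and an MLP decoder into a per-point classifier trained with cross-entropy. It is an illustrative skeleton rather than any specific published architecture; the class name, layer sizes, and the treatment of the point cloud as an ordered sequence are our assumptions.

```python
import torch
import torch.nn as nn

class RNNPointSegNet(nn.Module):
    """Encoder -> recurrent layers -> decoder, predicting a label per point (illustrative)."""
    def __init__(self, in_dim=3, feat_dim=64, hidden_dim=128, num_classes=13):
        super().__init__()
        # Encoder: PointNet-style shared MLP applied to each point independently.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
        )
        # Recurrent layers: a 2-layer bidirectional LSTM over the point sequence
        # models dependencies between points.
        self.rnn = nn.LSTM(feat_dim, hidden_dim, num_layers=2,
                           batch_first=True, bidirectional=True)
        # Decoder: a simple per-point MLP head (a CRF could refine this further).
        self.decoder = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, points):            # points: (B, N, 3)
        feats = self.encoder(points)      # (B, N, feat_dim)
        feats, _ = self.rnn(feats)        # (B, N, 2 * hidden_dim)
        return self.decoder(feats)        # per-point class logits

# Toy forward/backward pass with the cross-entropy loss described above.
model = RNNPointSegNet()
xyz = torch.rand(2, 1024, 3)                  # two clouds of 1024 points
labels = torch.randint(0, 13, (2, 1024))      # ground-truth classes
logits = model(xyz)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 13), labels.reshape(-1))
loss.backward()
```

In practice the plain sequence ordering used here would be replaced by a structured traversal of the cloud, such as the slice-based ordering of RSNet [178].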
Transformer-Based Methods
3.1.2. Projection-Based Methods
Multi-View Representation
Spherical Representation
Volumetric Representation
3.1.3. Voxel-Based Methods
Uniform Voxelization
Non-Uniform Voxelization
3.1.4. Other Representations
3.2. Unsupervised Approaches
3.2.1. Self-Supervised Learning
3.2.2. Generative Methods
3.2.3. Implicit Representation
4. Benchmark Datasets
Other Datasets
Dataset | Year | Type | Application | Size | Sensor |
---|---|---|---|---|---|
S3DIS [129] | 2016 | RWE | IS | 273 MP | Matterport |
ScanNet [130] | 2017 | RWE | IS | 242 MP | RGB-D |
Semantic3D [131] | 2017 | RWE | OS | 4000 MP | MLS |
SemanticKITTI [32] | 2019 | RWE | OS | 4549 MP | MLS |
NuScenes [132] | 2020 | RWE | OS | 341 TF | Velodyne HDL-32E |
ModelNet40 [35] | 2015 | SE | OC | 12.3 TN | - |
ParisLille-3D [135] | 2018 | RWE | OS | 1430 MP | MLS |
5. Evaluation Metrics
- Mean Intersection over Union (mIoU): To assess the effectiveness of semantic segmentation methods, mIoU is used as an indicator of the similarity between two sets, the ground truth and the predicted output; per class, the IoU is the ratio of the intersection to the union of the two sets. Given $k$ classes, including empty classes, mIoU is computed as
  $$\mathrm{mIoU} = \frac{1}{k}\sum_{i=1}^{k}\frac{TP_i}{TP_i + FP_i + FN_i},$$
  where $TP_i$, $FP_i$, and $FN_i$ denote the true positives, false positives, and false negatives for class $i$.
- Overall Accuracy (OAcc): Also known as OA, a simple metric given by the ratio of correctly classified samples to the total number of samples $N$: $\mathrm{OAcc} = \frac{1}{N}\sum_{i=1}^{k} TP_i$.
- Mean Accuracy (mAcc): The average per-class accuracy, i.e., the accuracy computed separately for each class and averaged over the total number of classes $k$: $\mathrm{mAcc} = \frac{1}{k}\sum_{i=1}^{k}\frac{TP_i}{TP_i + FN_i}$.
- Execution time: The computing time of the algorithms must also be considered seriously. With acquisition systems becoming increasingly high-resolution and accurate, and with ever larger amounts of data being stored, the runtime performance of semantic segmentation methods is a central issue. Authors report the efficiency of their methods in order to assess how well they scale to larger and higher-dimensional datasets. Fair comparison nevertheless remains a challenge, because methods are evaluated on different datasets and executed on machines with different performance characteristics (RAM capacity, GPU, etc.).
- Dice Similarity Coefficient (DSC): A widely used metric for measuring the similarity between two sets, particularly in segmentation tasks. It quantifies the degree of overlap between a predicted segmentation mask and the ground-truth mask, providing a robust evaluation of segmentation performance. Mathematically, the number of elements common to both masks is counted via their intersection and multiplied by two to account for both sets; this value is then divided by the total number of elements across the two masks, yielding a normalized score between 0 and 1:
  $$\mathrm{DSC} = \frac{2\,|A \cap B|}{|A| + |B|}.$$
  A DSC of 1 indicates a perfect match, while 0 signifies no overlap. A compact implementation of these metrics is sketched after this list.
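Since all of the accuracy metrics above derive from the same confusion matrix, they can be computed together cheaply. The following NumPy sketch (function names and the toy data are ours, not from any benchmark toolkit) illustrates one way to obtain mIoU, OAcc, mAcc, and the mean Dice score from predicted and ground-truth label arrays; here empty classes are counted as IoU 0, which is one common convention.

```python
import numpy as np

def confusion_matrix(gt, pred, k):
    """k x k matrix; rows index ground-truth classes, columns predictions."""
    cm = np.zeros((k, k), dtype=np.int64)
    np.add.at(cm, (gt, pred), 1)   # accumulate one count per point
    return cm

def segmentation_metrics(cm):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp            # points wrongly predicted as class i
    fn = cm.sum(axis=1) - tp            # points of class i that were missed
    miou = np.mean(tp / np.maximum(tp + fp + fn, 1))          # mean IoU
    oacc = tp.sum() / cm.sum()                                # overall accuracy
    macc = np.mean(tp / np.maximum(tp + fn, 1))               # mean class accuracy
    dice = np.mean(2 * tp / np.maximum(2 * tp + fp + fn, 1))  # mean Dice score
    return miou, oacc, macc, dice

# Toy example: 10,000 points over 4 classes.
rng = np.random.default_rng(0)
gt = rng.integers(0, 4, 10_000)
pred = rng.integers(0, 4, 10_000)
print(segmentation_metrics(confusion_matrix(gt, pred, 4)))
```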
6. Analysis and Discussion
6.1. From 2D to 3D: Transferring Vision Paradigms
6.2. Architectural Integration Techniques: From Hybrid Architectures to Multi-Sensor Fusion in 3D Semantic Segmentation
View | Representative Methods |
---|---|
Raw Points | RandLA-Net [149], KPConv [58] |
Range View | SqueezeSeg [8], RangeNet++ [91] |
Bird’s Eye View | PolarNet [148] |
Voxel (Dense) | PVCNN [4] |
Voxel (Sparse) | MinkowskiNet [6], SPVNAS [4] |
Voxel (Cylinder) | Cylinder3D [157] |
Multi-View | AMVNet [9], RPVNet [10] |
6.3. Benchmarking of Algorithms
Date of Publication | Methods | Mean IoU (%) | View |
---|---|---|---|
2023 | UniSeg [168] | 75.2 | P + V + T |
2023 | SphereFormer [92] | 74.8 | T |
2023 | RangeFormer [86] | 73.3 | P |
2023 | RangeViT [172] | 64.0 | T |
2022 | 2DPASS [11] | 72.9 | H |
2022 | PTv2 [83] | 71.2 | T |
2022 | PVKD [169] | 71.2 | V |
2022 | GFNet [170] | 65.4 | P |
2021 | AF2S3Net [101] | 70.8 | P + V |
2020 | Cylinder3D [157] | 68.9 | V |
2020 | SPVNAS [4] | 66.4 | P + V |
2020 | JS3C-Net [105] | 66.0 | P + V |
2020 | KPRNet [171] | 63.1 | H |
2017 | PointNet++ [40] | 20.1 | P |
2017 | SPGraph [54] | 17.4 | G |
2017 | PointNet [3] | 14.6 | P |
Method | mIoU (%) | Dataset | Computational Cost | Noise Robustness | Real-Time | Practical Applicability |
---|---|---|---|---|---|---|
PointNet [3] | 47.6 | S3DIS | Low | Low | Yes | Low |
PointNet++ [40] | 54.5 | S3DIS | Moderate | Moderate | No | Moderate |
KPConv [58] | 70.6 | S3DIS | High | High | No | High |
RandLA-Net [149] | 70.0 | SemanticKITTI | Low | Moderate | Yes | High |
PointCNN [42] | 65.4 | S3DIS | High | Moderate | No | Moderate |
SPVNAS [4] | 66.4 | SemanticKITTI | Moderate | High | Yes | High |
Cylinder3D [157] | 68.9 | SemanticKITTI | Moderate | High | No | High |
RangeNet++ [91] | 65.5 | SemanticKITTI | Low | Moderate | Yes | Moderate |
PolarNet [148] | 54.3 | SemanticKITTI | Low | Moderate | Yes | Moderate |
SPGraph [54] | 62.1 | Semantic3D | High | High | No | Moderate |
MinkowskiNet [6] | 67.0 | ScanNet | High | High | No | High |
PTV1 [78] | 70.4 | S3DIS | High | High | No | Moderate |
PTV2 [83] | 72.7 | S3DIS | High | High | No | Moderate |
PTV3 [84] | 72.3 | S3DIS | High | High | No | Moderate |
AF2S3Net [101] | 70.8 | SemanticKITTI | Moderate | High | Yes | High |
2DPASS [11] | 72.9 | SemanticKITTI | High | High | No | High |
SphereFormer [92] | 74.8 | SemanticKITTI | High | High | No | High |
UniSeg [168] | 75.2 | SemanticKITTI | High | High | No | High |
FusionNet [100] | 66.3 | SemanticKITTI | Moderate | High | No | High |
SalsaNext [173] | 59.5 | SemanticKITTI | Moderate | Moderate | Yes | Moderate |
RangeViT [172] | 64.0 | SemanticKITTI | High | Moderate | No | Moderate |
FlashSplat [156] | 71.0 | ScanNet | Moderate | High | Yes | High |
GaussianCut [154] | 69.5 | ScanNet | High | High | No | Moderate |
Mask3D [152] | 74.0 | ScanNet | High | High | No | High |
Point-BERT [113] | 66.6 | S3DIS | High | High | No | Moderate |
SPFormer [174] | 54.9 | ScanNet | High | Moderate | No | Low |
LatticeNet [175] | 64.0 | ScanNet | High | Moderate | No | Moderate |
SegCloud [20] | 61.3 | Semantic3D | Moderate | Moderate | No | Moderate |
OctNet [93] | 50.7 | Semantic3D | High | Moderate | No | Low |
VMNet [102] | 52.9 | SemanticKITTI | Moderate | Moderate | No | Moderate |
Methods | S3DIS mAcc | S3DIS oAcc | S3DIS mIoU | ScanNet mAcc | ScanNet oAcc | ScanNet mIoU | Semantic3D mAcc | Semantic3D oAcc | Semantic3D mIoU | SemanticKITTI mAcc | SemanticKITTI oAcc | SemanticKITTI mIoU |
---|---|---|---|---|---|---|---|---|---|---|---|---|
SqueezeSeg [8] | - | - | - | - | - | - | - | - | - | - | - | - |
RangeNet++ [91] | - | - | - | - | - | - | - | - | - | - | - | - |
OctNet [93] | 39.0 | 68.9 | 26.3 | 26.4 | 76.6 | 18.1 | 71.3 | 80.7 | 50.7 | - | - | - |
SegCloud [20] | - | - | - | - | - | - | - | 88.1 | 61.3 | - | - | - |
RangeNet53 [91] | - | - | - | - | - | - | - | - | - | - | - | 52.2 |
MI-Net [176] | - | - | - | - | - | - | - | - | - | - | - | - |
TangentConv [177] | 62.2 | 82.5 | 52.8 | 55.1 | 80.1 | 40.9 | 80.7 | 89.3 | 66.4 | - | - | 40.9 |
PointNet [3] | - | - | 47.6 | - | - | - | - | - | - | - | - | 14.6 |
PointNet++ [40] | 67.1 | 81.0 | 54.5 | - | - | - | - | - | - | - | - | 20.1 |
PointWeb [52] | 76.2 | 87.3 | 66.7 | - | - | - | - | - | - | - | - | - |
PointSIFT [50] | - | - | 70.2 | - | - | 41.5 | - | - | - | - | - | - |
RSNet [178] | 66.5 | - | 56.5 | - | - | - | - | - | - | - | - | - |
KPConv [58] | 79.1 | - | 70.6 | - | - | - | - | 92.9 | 74.6 | - | - | - |
PointCNN [42] | 75.6 | 88.1 | 65.4 | - | - | - | - | - | - | - | - | - |
PointConv [49] | - | - | - | - | - | - | - | - | - | - | - | - |
RandLA-Net [149] | 82.0 | 88.0 | 70.0 | - | - | - | - | 94.8 | 77.4 | - | - | 53.9 |
PolarNet [148] | - | - | - | - | - | - | - | - | - | - | - | 54.3 |
DGCNN [41] | - | - | - | - | - | - | - | - | - | - | - | - |
SPG [54] | 73.0 | 85.5 | 62.1 | - | - | - | - | 94.0 | 73.2 | - | - | 17.4 |
SPLATNet [179] | - | - | - | - | - | 39.3 | - | - | - | - | - | 18.4 |
LatticeNet [175] | - | - | - | - | - | 64.0 | - | - | - | - | - | 52.9 |
VMNet [102] | - | - | - | - | - | - | - | - | - | - | - | 52.9 |
LaserMix [180] | - | - | 73.0 | - | - | 61.4 | - | - | - | - | - | 60.8 |
PTv1 [78] | 76.5 | 90.8 | 70.4 | - | - | - | - | - | - | - | - | - |
PTv2 [83] | 78.0 | 91.6 | 72.7 | - | - | - | - | - | - | - | - | - |
PTv3 [84] | 78.4 | 91.4 | 72.3 | - | - | - | - | - | - | - | - | - |
7. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Bello, S.A.; Yu, S.; Wang, C.; Adam, J.M.; Li, J. Review: Deep Learning on 3D Point Clouds. Remote Sens. 2020, 12, 1729. [Google Scholar] [CrossRef]
- Xie, S.; Gu, J.; Guo, D.; Qi, C.R.; Guibas, L.; Litany, O. PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding. In Computer Vision–ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; Volume 12348, pp. 574–591. [Google Scholar] [CrossRef]
- Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar] [CrossRef]
- Tang, H.; Liu, Z.; Zhao, S.; Lin, Y.; Lin, J.; Wang, H.; Han, S. Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution. In Computer Vision–ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; Volume 12373, pp. 685–702. [Google Scholar] [CrossRef]
- Chen, R.; Liu, Y.; Kong, L.; Zhu, X.; Ma, Y.; Li, Y.; Hou, Y.; Qiao, Y.; Wang, W. CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP. arXiv 2023, arXiv:2301.04926. [Google Scholar] [CrossRef]
- Choy, C.; Gwak, J.; Savarese, S. 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3070–3079. [Google Scholar] [CrossRef]
- Aksoy, E.E.; Baci, S.; Cavdar, S. Salsanet: Fast Road and Vehicle Segmentation in Lidar Point Clouds for Autonomous Driving. In Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA, 19 October–13 November 2020; pp. 926–932. [Google Scholar] [CrossRef]
- Wu, B.; Wan, A.; Yue, X.; Keutzer, K. SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 1887–1893. [Google Scholar] [CrossRef]
- Liong, V.E.; Nguyen, T.N.T.; Widjaja, S.; Sharma, D.; Chong, Z.J. AMVNet: Assertion-based Multi-View Fusion Network for LiDAR Semantic Segmentation. arXiv 2020, arXiv:2012.04934. [Google Scholar] [CrossRef]
- Xu, J.; Zhang, R.; Dou, J.; Zhu, Y.; Sun, J.; Pu, S. RPVNet: A Deep and Efficient Range-Point-Voxel Fusion Network for LiDAR Point Cloud Segmentation. arXiv 2021, arXiv:2103.12978. [Google Scholar] [CrossRef]
- Yan, X.; Gao, J.; Zheng, C.; Zheng, C.; Zhang, R.; Cui, S.; Li, Z. 2DPASS: 2D Priors Assisted Semantic Segmentation on LiDAR Point Clouds. In Computer Vision–ECCV 2022; Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T., Eds.; Springer Nature: Cham, Switzerland, 2022; Volume 13688, pp. 677–695. [Google Scholar] [CrossRef]
- Roldão, L.; De Charette, R.; Verroust-Blondet, A. 3D Semantic Scene Completion: A Survey. Int. J. Comput. Vis. 2022, 130, 1978–2005. [Google Scholar] [CrossRef]
- Liu, M.; Zhu, Y.; Cai, H.; Han, S.; Ling, Z.; Porikli, F.; Su, H. Partslip: Low-shot Part Segmentation for 3d Point Clouds via Pretrained Image-Language Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 21736–21746. [Google Scholar] [CrossRef]
- Aleotti, J.; Caselli, S. A 3D Shape Segmentation Approach for Robot Grasping by Parts. Robot. Auton. Syst. 2012, 60, 358–366. [Google Scholar] [CrossRef]
- Mo, K.; Guerrero, P.; Yi, L.; Su, H.; Wonka, P.; Mitra, N.; Guibas, L.J. StructureNet: Hierarchical Graph Networks for 3D Shape Generation. arXiv 2019, arXiv:1908.00575. [Google Scholar] [CrossRef]
- Kareem, A.; Lahoud, J.; Cholakkal, H. PARIS3D: Reasoning-Based 3D Part Segmentation Using Large Multimodal Model. In Computer Vision–ECCV 2024; Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G., Eds.; Springer Nature: Cham, Switzerland, 2025; Volume 15130, pp. 466–482. [Google Scholar] [CrossRef]
- Zhou, F.; Zhang, Q.; Zhu, H.; Liu, S.; Jiang, N.; Cai, X.; Qi, Q.; Hu, Y. Attentional Keypoint Detection on Point Clouds for 3D Object Part Segmentation. Appl. Sci. 2023, 13, 12537. [Google Scholar] [CrossRef]
- Zhou, T.; Wang, W. Prototype-Based Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 6858–6872. [Google Scholar] [CrossRef]
- Yasir, S.M.; Ahn, H. Deep Learning-Based 3D Instance and Semantic Segmentation: A Review. arXiv 2024. [Google Scholar] [CrossRef]
- Tchapmi, L.; Choy, C.; Armeni, I.; Gwak, J.; Savarese, S. Segcloud: Semantic Segmentation of 3d Point Clouds. In Proceedings of the 2017 International Conference on 3D Vision (3DV), Qingdao, China, 10–12 October 2017; pp. 537–547. [Google Scholar] [CrossRef]
- Landrieu, L.; Simonovsky, M. Segmentation Sémantique à Grande Echelle Par Graphe de Superpoints. In Proceedings of the RFIAP, Paris, France, 26–28 June 2018; Available online: https://hal.science/hal-01939229v1 (accessed on 10 July 2025).
- Nguyen, A.; Le, B. 3D Point Cloud Segmentation: A Survey. In Proceedings of the 2013 6th IEEE Conference on Robotics, Automation and Mechatronics (RAM), Manila, Philippines, 12–15 November 2013; pp. 225–230. [Google Scholar] [CrossRef]
- Li, X.; Ding, H.; Yuan, H.; Zhang, W.; Pang, J.; Cheng, G.; Chen, K.; Liu, Z.; Loy, C.C. Transformer-Based Visual Segmentation: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 10138–10163. [Google Scholar] [CrossRef] [PubMed]
- Grilli, E.; Menna, F.; Remondino, F. A Review of Point Clouds Segmentation and Classification Algorithms. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 42, 339. [Google Scholar] [CrossRef]
- Garcia-Garcia, A.; Orts-Escolano, S.; Oprea, S.; Villena-Martinez, V.; Garcia-Rodriguez, J. A Review on Deep Learning Techniques Applied to Semantic Segmentation. arXiv 2017. [Google Scholar] [CrossRef]
- Yu, H.; Yang, Z.; Tan, L.; Wang, Y.; Sun, W.; Sun, M.; Tang, Y. Methods and Datasets on Semantic Segmentation: A Review. Neurocomputing 2018, 304, 82–103. [Google Scholar] [CrossRef]
- Lateef, F.; Ruichek, Y. Survey on Semantic Segmentation Using Deep Learning Techniques. Neurocomputing 2019, 338, 321–348. [Google Scholar] [CrossRef]
- Xie, Y.; Tian, J.; Zhu, X.X. Linking Points With Labels in 3D: A Review of Point Cloud Semantic Segmentation. IEEE Geosci. Remote Sens. Mag. 2020, 8, 38–59. [Google Scholar] [CrossRef]
- Guo, Y.; Wang, H.; Hu, Q.; Liu, H.; Liu, L.; Bennamoun, M. Deep Learning for 3D Point Clouds: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 4338–4364. [Google Scholar] [CrossRef]
- Vodrahalli, K.; Bhowmik, A.K. 3D Computer Vision Based on Machine Learning with Deep Neural Networks: A Review. J. Soc. Inf. Disp. 2017, 25, 676–694. [Google Scholar] [CrossRef]
- Ioannidou, A.; Chatzilari, E.; Nikolopoulos, S.; Kompatsiaris, I. Deep Learning Advances in Computer Vision with 3D Data: A Survey. ACM Comput. Surv. 2018, 50, 1–38. [Google Scholar] [CrossRef]
- Behley, J.; Garbade, M.; Milioto, A.; Quenzel, J.; Behnke, S.; Stachniss, C.; Gall, J. SemanticKITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9296–9306. [Google Scholar] [CrossRef]
- Hackel, T.; Wegner, J.D.; Schindler, K. Fast Semantic Segmentation of 3D Point Clouds with Strongly Varying Density. Isprs Ann. Photogramm. Remote. Sens. Spat. Inf. Sci. 2016, III-3, 177–184. [Google Scholar] [CrossRef]
- Chua, C.S.; Jarvis, R. Point Signatures: A New Representation for 3d Object Recognition. Int. J. Comput. Vis. 1997, 25, 63–85. [Google Scholar] [CrossRef]
- Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3D ShapeNets: A Deep Representation for Volumetric Shapes. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1912–1920. [Google Scholar] [CrossRef]
- Li, Y.; Pirk, S.; Su, H.; Qi, C.R.; Guibas, L.J. FPNN: Field Probing Neural Networks for 3D Data. arXiv 2016. [Google Scholar] [CrossRef]
- Su, H.; Maji, S.; Kalogerakis, E.; Learned-Miller, E. Multi-View Convolutional Neural Networks for 3D Shape Recognition. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 945–953. [Google Scholar] [CrossRef]
- Guo, K.; Zou, D.; Chen, X. 3D Mesh Labeling via Deep Convolutional Neural Networks. ACM Trans. Graph. 2015, 35, 1–12. [Google Scholar] [CrossRef]
- Uy, M.A.; Pham, Q.H.; Hua, B.S.; Nguyen, D.T.; Yeung, S.K. Revisiting Point Cloud Classification: A New Benchmark Dataset and Classification Model on Real-World Data. arXiv 2019. [Google Scholar] [CrossRef]
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. arXiv 2017, arXiv:1706.02413. [Google Scholar] [CrossRef]
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic Graph CNN for Learning on Point Clouds. ACM Trans. Graph. 2019, 38, 1–12. [Google Scholar] [CrossRef]
- Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; Chen, B. Pointcnn: Convolution on x-Transformed Points. arXiv 2018, arXiv:1801.07791. [Google Scholar] [CrossRef]
- Ma, X.; Qin, C.; You, H.; Ran, H.; Fu, Y. Rethinking Network Design and Local Geometry in Point Cloud: A Simple Residual MLP Framework. arXiv 2022, arXiv:2202.07123. [Google Scholar] [CrossRef]
- Yan, X.; Zheng, C.; Li, Z.; Wang, S.; Cui, S. PointASNL: Robust Point Clouds Processing using Nonlocal Neural Networks with Adaptive Sampling. arXiv 2020, arXiv:2003.00492. [Google Scholar] [CrossRef]
- Maturana, D.; Scherer, S. Voxnet: A 3d Convolutional Neural Network for Real-Time Object Recognition. In Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 28 September–2 October 2015; pp. 922–928. [Google Scholar] [CrossRef]
- Çiçek, Ö.; Abdulkadir, A.; Lienkamp, S.S.; Brox, T.; Ronneberger, O. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. arXiv 2016, arXiv:1606.06650. [Google Scholar] [CrossRef]
- Graham, B.; Engelcke, M.; Maaten, L.V.D. 3D Semantic Segmentation with Submanifold Sparse Convolutional Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 9224–9232. [Google Scholar] [CrossRef]
- Kundu, A.; Yin, X.; Fathi, A.; Ross, D.; Brewington, B.; Funkhouser, T.; Pantofaru, C. Virtual Multi-view Fusion for 3D Semantic Segmentation. arXiv 2020. [Google Scholar] [CrossRef]
- Wu, W.; Qi, Z.; Fuxin, L. Pointconv: Deep Convolutional Networks on 3d Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9621–9630. [Google Scholar] [CrossRef]
- Jiang, M.; Wu, Y.; Zhao, T.; Zhao, Z.; Lu, C. PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic Segmentation. arXiv 2018, arXiv:1807.00652. [Google Scholar] [CrossRef]
- Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar] [CrossRef]
- Zhao, H.; Jiang, L.; Fu, C.W.; Jia, J. PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5560–5568. [Google Scholar] [CrossRef]
- Simonovsky, M.; Komodakis, N. Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3693–3702. [Google Scholar] [CrossRef]
- Landrieu, L.; Simonovsky, M. Large-Scale Point Cloud Semantic Segmentation with Superpoint Graphs. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4558–4567. [Google Scholar] [CrossRef]
- Liu, Z.; Qi, X.; Fu, C.W. One Thing One Click: A Self-Training Approach for Weakly Supervised 3d Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 1726–1736. [Google Scholar] [CrossRef]
- Xie, Z.; Chen, J.; Peng, B. Point Clouds Learning with Attention-Based Graph Convolution Networks. Neurocomputing 2020, 402, 245–255. [Google Scholar] [CrossRef]
- Wang, L.; Huang, Y.; Hou, Y.; Zhang, S.; Shan, J. Graph Attention Convolution for Point Cloud Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10296–10305. [Google Scholar] [CrossRef]
- Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. Kpconv: Flexible and Deformable Convolution for Point Clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6411–6420. [Google Scholar] [CrossRef]
- Tseng, H.; Chang, P.; Andrew, G.; Jurafsky, D.; Manning, C. A Conditional Random Field Word Segmenter for Sighan Bakeoff 2005. In Proceedings of the Fourth SIGHAN Workshop on Chinese language Processing, Jeju Island, Republic of Korea, 14–15 October 2005; Available online: https://aclanthology.org/I05-3027/ (accessed on 10 July 2025).
- Fan, H.; Yang, Y. PointRNN: Point Recurrent Neural Network for Moving Point Cloud Processing. arXiv 2019. [Google Scholar] [CrossRef]
- Ye, X.; Li, J.; Huang, H.; Du, L.; Zhang, X. 3D Recurrent Neural Networks with Context Fusion for Point Cloud Semantic Segmentation. In Computer Vision–ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2018; Volume 11211, pp. 415–430. [Google Scholar] [CrossRef]
- Yi, L.; Zhao, W.; Wang, H.; Sung, M.; Guibas, L. GSPN: Generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud. arXiv 2018. [Google Scholar] [CrossRef]
- Shi, S.; Wang, X.; Li, H. Pointrcnn: 3D Object Proposal Generation and Detection from Point Cloud. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 770–779. [Google Scholar]
- Liang, D.; Zhou, X.; Xu, W.; Zhu, X.; Zou, Z.; Ye, X.; Tan, X.; Bai, X. PointMamba: A Simple State Space Model for Point Cloud Analysis. arXiv 2024. [Google Scholar] [CrossRef]
- He, Q.; Zhang, J.; Peng, J.; He, H.; Li, X.; Wang, Y.; Wang, C. PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning. arXiv 2024, arXiv:2405.15214. [Google Scholar] [CrossRef]
- Han, X.; Tang, Y.; Wang, Z.; Li, X. Mamba3D: Enhancing Local Features for 3D Point Cloud Analysis via State Space Model. arXiv 2024. [Google Scholar] [CrossRef]
- Zhang, T.; Yuan, H.; Qi, L.; Zhang, J.; Zhou, Q.; Ji, S.; Yan, S.; Li, X. Point Cloud Mamba: Point Cloud Learning via State Space Model. arXiv 2024, arXiv:2403.00762. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 3–7 May 2021. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 4171–4186. [Google Scholar] [CrossRef]
- Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in Vision: A Survey. arXiv 2021, arXiv:2101.01169. [Google Scholar] [CrossRef]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002. [Google Scholar] [CrossRef]
- Chu, X.; Tian, Z.; Wang, Y.; Zhang, B.; Ren, H.; Wei, X.; Xia, H.; Shen, C. Twins: Revisiting the Design of Spatial Attention in Vision Transformers. arXiv 2021. [Google Scholar] [CrossRef]
- Dong, X.; Bao, J.; Chen, D.; Zhang, W.; Yu, N.; Yuan, L.; Chen, D.; Guo, B. CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 12114–12124. [Google Scholar] [CrossRef]
- Minaee, S.; Boykov, Y.; Porikli, F.; Plaza, A.; Kehtarnavaz, N.; Terzopoulos, D. Image Segmentation Using Deep Learning: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 3523–3542. [Google Scholar] [CrossRef]
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.; Zhou, Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv 2021. [Google Scholar] [CrossRef]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 213–229. [Google Scholar] [CrossRef]
- Zhao, H.; Jiang, L.; Jia, J.; Torr, P.; Koltun, V. Point Transformer. arXiv 2020, arXiv:2012.09164. [Google Scholar] [CrossRef] [PubMed]
- Lu, D.; Xie, Q.; Wei, M.; Gao, K.; Xu, L.; Li, J. Transformers in 3D Point Clouds: A Survey. arXiv 2022. [Google Scholar] [CrossRef]
- Guo, M.H.; Cai, J.X.; Liu, Z.N.; Mu, T.J.; Martin, R.R.; Hu, S.M. PCT: Point Cloud Transformer. Comput. Vis. Media 2021, 7, 187–199. [Google Scholar] [CrossRef]
- Pan, X.; Xia, Z.; Song, S.; Li, L.E.; Huang, G. 3D Object Detection with Pointformer. arXiv 2020, arXiv:2012.11409. [Google Scholar] [CrossRef]
- Lai, X.; Liu, J.; Jiang, L.; Wang, L.; Zhao, H.; Liu, S.; Qi, X.; Jia, J. Stratified Transformer for 3D Point Cloud Segmentation. arXiv 2022. [Google Scholar] [CrossRef]
- Wu, X.; Lao, Y.; Jiang, L.; Liu, X.; Zhao, H. Point Transformer V2: Grouped Vector Attention and Partition-based Pooling. arXiv 2022. [Google Scholar] [CrossRef]
- Wu, X.; Jiang, L.; Wang, P.S.; Liu, Z.; Liu, X.; Qiao, Y.; Ouyang, W.; He, T.; Zhao, H. Point Transformer V3: Simpler, Faster, Stronger. arXiv 2023, arXiv:2312.10035. [Google Scholar] [CrossRef]
- Lahoud, J.; Cao, J.; Khan, F.S.; Cholakkal, H.; Anwer, R.M.; Khan, S.; Yang, M.H. 3D Vision with Transformers: A Survey. arXiv 2022. [Google Scholar] [CrossRef]
- Kong, L.; Liu, Y.; Chen, R.; Ma, Y.; Zhu, X.; Li, Y.; Hou, Y.; Qiao, Y.; Liu, Z. Rethinking Range View Representation for LiDAR Segmentation. arXiv 2023, arXiv:2303.05367. [Google Scholar] [CrossRef]
- Cohen, T.S.; Geiger, M.; Koehler, J.; Welling, M. Spherical CNNs. arXiv 2018, arXiv:1801.10130. [Google Scholar] [CrossRef]
- Wang, Y.; Shi, T.; Yun, P.; Tai, L.; Liu, M. PointSeg: Real-Time Semantic Segmentation Based on 3D LiDAR Point Cloud. arXiv 2018, arXiv:1807.06288. [Google Scholar] [CrossRef]
- Wu, B.; Zhou, X.; Zhao, S.; Yue, X.; Keutzer, K. SqueezeSegV2: Improved Model Structure and Unsupervised Domain Adaptation for Road-Object Segmentation from a LiDAR Point Cloud. arXiv 2018, arXiv:1809.08495. [Google Scholar] [CrossRef]
- Xu, C.; Wu, B.; Wang, Z.; Zhan, W.; Vajda, P.; Keutzer, K.; Tomizuka, M. SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation. In Computer Vision–ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; Volume 12373, pp. 1–19. [Google Scholar] [CrossRef]
- Milioto, A.; Vizzo, I.; Behley, J.; Stachniss, C. Rangenet++: Fast and Accurate Lidar Semantic Segmentation. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 4213–4220. [Google Scholar] [CrossRef]
- Lai, X.; Chen, Y.; Lu, F.; Liu, J.; Jia, J. Spherical Transformer for LiDAR-based 3D Recognition. arXiv 2023, arXiv:2303.12766. [Google Scholar] [CrossRef]
- Riegler, G.; Ulusoy, A.O.; Geiger, A. OctNet: Learning Deep 3D Representations at High Resolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6620–6629. [Google Scholar] [CrossRef]
- Wang, P.S.; Liu, Y.; Guo, Y.X.; Sun, C.Y.; Tong, X. O-CNN: Octree-Based Convolutional Neural Networks for 3D Shape Analysis. ACM Trans. Graph. 2017, 36, 1–11. [Google Scholar] [CrossRef]
- He, Y.; Yu, H.; Liu, X.; Yang, Z.; Sun, W.; Anwar, S.; Mian, A. Deep Learning Based 3D Segmentation: A Survey. arXiv 2021, arXiv:2103.05423. [Google Scholar] [CrossRef]
- Huang, Y.; Zheng, W.; Zhang, Y.; Zhou, J.; Lu, J. Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction (TPVFormer). arXiv 2023, arXiv:2302.07817. [Google Scholar] [CrossRef]
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223. [Google Scholar] [CrossRef]
- Tallamraju, R.; Rajappa, S.; Black, M.J.; Karlapalem, K.; Ahmad, A. Decentralized Mpc Based Obstacle Avoidance for Multi-Robot Target Tracking Scenarios. In Proceedings of the 2018 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Philadelphia, PA, USA, 6–8 August 2018; pp. 1–8. [Google Scholar] [CrossRef]
- Couprie, C.; Farabet, C.; Najman, L.; LeCun, Y. Indoor Semantic Segmentation Using Depth Information. arXiv 2013. [Google Scholar] [CrossRef]
- Zhang, F.; Fang, J.; Wah, B.; Torr, P. Deep FusionNet for Point Cloud Semantic Segmentation. In Computer Vision–ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M., Eds.; Springer International Publishing: Cham, Switzerland, 2020; Volume 12369, pp. 644–663. [Google Scholar] [CrossRef]
- Cheng, R.; Razani, R.; Taghavi, E.; Li, E.; Liu, B. (AF)2 -S3Net: Attentive Feature Fusion with Adaptive Feature Selection for Sparse Semantic Segmentation Network. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 12542–12551. [Google Scholar] [CrossRef]
- Hu, Z.; Bai, X.; Shang, J.; Zhang, R.; Dong, J.; Wang, X.; Sun, G.; Fu, H.; Tai, C.L. Vmnet: Voxel-mesh Network for Geodesic-Aware 3d Semantic Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 15488–15498. [Google Scholar] [CrossRef]
- Shi, S.; Jiang, L.; Deng, J.; Wang, Z.; Guo, C.; Shi, J.; Wang, X.; Li, H. PV-RCNN++: Point-Voxel Feature Set Abstraction with Local Vector Representation for 3D Object Detection. Int. J. Comput. Vis. 2023, 131, 531–551. [Google Scholar] [CrossRef]
- Wang, S.; Zhu, J.; Zhang, R. Meta-RangeSeg: LiDAR Sequence Semantic Segmentation Using Multiple Feature Aggregation. IEEE Robot. Autom. Lett. 2022, 7, 9739–9746. [Google Scholar] [CrossRef]
- Yan, X.; Gao, J.; Li, J.; Zhang, R.; Li, Z.; Huang, R.; Cui, S. Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion. Proc. AAAI Conf. Artif. Intell. 2021, 35, 3101–3109. [Google Scholar] [CrossRef]
- Gerdzhev, M.; Razani, R.; Taghavi, E.; Bingbing, L. TORNADO-Net: mulTiview tOtal vaRiatioN semAntic Segmentation with Diamond inceptiOn Module. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 9543–9549. [Google Scholar] [CrossRef]
- Chen, Z.; Xu, H.; Chen, W.; Zhou, Z.; Xiao, H.; Sun, B.; Xie, X.; Kang, W. PointDC: Unsupervised Semantic Segmentation of 3D Point Clouds via Cross-modal Distillation and Super-Voxel Clustering. arXiv 2024, arXiv:2304.08965. [Google Scholar] [CrossRef]
- Tang, P.; Xu, H.M.; Ma, C. ProtoTransfer: Cross-Modal Prototype Transfer for Point Cloud Segmentation. In Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 3314–3324. [Google Scholar] [CrossRef]
- Afham, M.; Dissanayake, I.; Dissanayake, D.; Dharmasiri, A.; Thilakarathna, K.; Rodrigo, R. CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding. arXiv 2022, arXiv:2203.00680. [Google Scholar] [CrossRef]
- Puy, G.; Gidaris, S.; Boulch, A.; Siméoni, O.; Sautier, C.; Pérez, P.; Bursuc, A.; Marlet, R. Three Pillars improving Vision Foundation Model Distillation for Lidar. arXiv 2023, arXiv:2310.17504. [Google Scholar] [CrossRef]
- Umam, A.; Yang, C.K.; Chen, M.H.; Chuang, J.H.; Lin, Y.Y. PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation. arXiv 2023, arXiv:2312.04016. [Google Scholar] [CrossRef]
- Zhang, Q.; Litany, O.; Choy, C.; Koltun, V. Self-Supervised Pretraining of 3D Features on Any Point-Cloud. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021. [Google Scholar] [CrossRef]
- Yu, X.; Tang, L.; Rao, Y.; Huang, T.; Zhou, J.; Lu, J. Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; arXiv:2111.14819. [Google Scholar] [CrossRef]
- Pang, Y.; Liu, W.; Lin, C.Z.; Yu, X.; Zheng, Y.; Gong, M.; Li, Y. Masked Autoencoders for Point Cloud Self-supervised Learning. In Proceedings of the European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022. [Google Scholar] [CrossRef]
- Xu, R.; Wang, T.; Zhang, W.; Chen, R.; Cao, J.; Pang, J.; Lin, D. MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training. arXiv 2023, arXiv:2303.13510. [Google Scholar] [CrossRef]
- Qin, Z.; Yu, H.; Wang, C.; Guo, Y.; Peng, Y.; Ilic, S.; Hu, D.; Xu, K. GeoTransformer: Fast and Robust Point Cloud Registration with Geometric Transformer. arXiv 2023, arXiv:2308.03768. [Google Scholar] [CrossRef]
- Wu, J.; Zhang, C.; Xue, T.; Freeman, W.T.; Tenenbaum, J.B. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Barcelona, Spain, 5–10 December 2016. [Google Scholar] [CrossRef]
- Yang, G.; Chen, X.; Zhao, H.; Xu, Z.; Zhang, L.; Huang, J.; Huang, J.; Niethammer, M.; Fidler, S. PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar] [CrossRef]
- Nichol, A.; Dhariwal, P.; Chan, W.; Ramachandran, A. Point-E: A System for Generating 3D Point Clouds from Complex Prompts. arXiv 2022. [Google Scholar] [CrossRef]
- Mescheder, L.; Oechsle, M.; Niemeyer, M.; Nowozin, S.; Geiger, A. Occupancy Networks: Learning 3D Reconstruction in Function Space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar] [CrossRef]
- Park, J.J.; Florence, P.; Straub, J.; Newcombe, R.; Lovegrove, S. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar] [CrossRef]
- Mildenhall, B.; Srinivasan, P.; Tancik, M.; Barron, J.; Ramamoorthi, R.; Ng, R. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020. [Google Scholar] [CrossRef]
- Zhi, S.; Laidlow, T.; Leutenegger, S.; Davison, A.J. In-Place Scene Labelling and Understanding with Implicit Scene Representation. arXiv 2021. [Google Scholar] [CrossRef]
- Wang, R.; Zhang, S.; Huang, P.; Zhang, D.; Yan, W. Semantic Is Enough: Only Semantic Information for NeRF Reconstruction. In Proceedings of the 2023 IEEE International Conference on Unmanned Systems (ICUS), Beijing, China, 13 October 2023; pp. 906–912. [Google Scholar] [CrossRef]
- Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Trans. Graph. (TOG) 2023, 42, 1–13. [Google Scholar] [CrossRef]
- Zheng, Y.; Li, X.; Xu, J.; Zhu, Y.; Jiang, Y.; Wang, Y.; Zhang, Z.; Bao, H.; Wang, G. MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar] [CrossRef]
- Xu, H.; Chen, W.; Zhang, Z.; Su, H.; Yu, F. VolRecon: Volume Rendering of Signed Ray Distance Fields for Generalizable Multi-view 3D Reconstruction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023. [Google Scholar]
- Jang, W.; Agapito, L. CodeNeRF: Disentangled Neural Radiance Fields for Object Categories. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 12929–12938. [Google Scholar] [CrossRef]
- Armeni, I.; Sener, O.; Zamir, A.R.; Jiang, H.; Brilakis, I.; Fischer, M.; Savarese, S. 3D Semantic Parsing of Large-Scale Indoor Spaces. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1534–1543. [Google Scholar] [CrossRef]
- Dai, A.; Chang, A.X.; Savva, M.; Halber, M.; Funkhouser, T.; Niessner, M. ScanNet: Richly-Annotated 3D Reconstructions of Indoor Scenes. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2432–2443. [Google Scholar] [CrossRef]
- Hackel, T.; Savinov, N.; Ladicky, L.; Wegner, J.D.; Schindler, K.; Pollefeys, M. Semantic3D.Net: A New Large-scale Point Cloud Classification Benchmark. Isprs Ann. Photogramm. Remote. Sens. Spat. Inf. Sci. 2017, IV-1/W1, 91–98. [Google Scholar] [CrossRef]
- Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A Multimodal Dataset for Autonomous Driving. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11618–11628. [Google Scholar] [CrossRef]
- Mo, K.; Zhu, S.; Chang, A.X.; Yi, L.; Tripathi, S.; Guibas, L.J.; Su, H. Partnet: A Large-Scale Benchmark for Fine-Grained and Hierarchical Part-Level 3d Object Understanding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 909–918. [Google Scholar] [CrossRef]
- Geiger, A.; Lenz, P.; Urtasun, R. Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361. [Google Scholar] [CrossRef]
- Roynard, X.; Deschaud, J.E.; Goulette, F. Paris-Lille-3D: A Large and High-Quality Ground-Truth Urban Point Cloud Dataset for Automatic Segmentation and Classification. Int. J. Robot. Res. 2018, 37, 545–557. [Google Scholar] [CrossRef]
- Ding, H.; Jiang, X.; Shuai, B.; Liu, A.Q.; Wang, G. Context Contrasted Feature and Gated Multi-Scale Aggregation for Scene Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 2393–2402. [Google Scholar] [CrossRef]
- Shuai, B.; Ding, H.; Liu, T.; Wang, G.; Jiang, X. Toward Achieving Robust Low-Level and High-Level Scene Parsing. IEEE Trans. Image Process. 2019, 28, 1378–1390. [Google Scholar] [CrossRef]
- Li, X.; Zhao, H.; Han, L.; Tong, Y.; Tan, S.; Yang, K. Gated Fully Fusion for Semantic Segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 11418–11425. [Google Scholar] [CrossRef]
- Li, X.; Zhang, L.; Cheng, G.; Yang, K.; Tong, Y.; Zhu, X.; Xiang, T. Global Aggregation Then Local Distribution for Scene Parsing. IEEE Trans. Image Process. 2021, 30, 6829–6842. [Google Scholar] [CrossRef]
- Zhang, L.; Li, X.; Arnab, A.; Yang, K.; Tong, Y.; Torr, P.H.S. Dual Graph Convolutional Network for Semantic Segmentation. arXiv 2019, arXiv:1909.06121. [Google Scholar] [CrossRef]
- Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-local Neural Networks. arXiv 2017, arXiv:1711.07971. [Google Scholar] [CrossRef]
- Yuan, Y.; Chen, X.; Chen, X.; Wang, J. Segmentation Transformer: Object-Contextual Representations for Semantic Segmentation. arXiv 2019, arXiv:1909.11065. [Google Scholar] [CrossRef]
- Kirillov, A.; Wu, Y.; He, K.; Girshick, R. Pointrend: Image Segmentation as Rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar] [CrossRef]
- Ding, H.; Jiang, X.; Liu, A.Q.; Thalmann, N.M.; Wang, G. Boundary-Aware Feature Propagation for Scene Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Glasgow, UK, 23–28 August 2019; pp. 6818–6828. [Google Scholar] [CrossRef]
- Li, X.; Li, X.; Zhang, L.; Cheng, G.; Shi, J.; Lin, Z.; Tan, S.; Tong, Y. Improving Semantic Segmentation via Decoupled Body and Edge Supervision. In Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK, 23–28 August 2020; pp. 435–452. [Google Scholar] [CrossRef]
- He, H.; Li, X.; Cheng, G.; Shi, J.; Tong, Y.; Meng, G.; Prinet, V.; Weng, L. Enhanced Boundary Learning for Glass-like Object Segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 15839–15848. [Google Scholar] [CrossRef]
- Jhaldiyal, A.; Chaudhary, N. Semantic Segmentation of 3D LiDAR Data Using Deep Learning: A Review of Projection-Based Methods. Appl. Intell. 2023, 53, 6844–6855. [Google Scholar] [CrossRef]
- Zhang, Y.; Zhou, Z.; David, P.; Yue, X.; Xi, Z.; Gong, B.; Foroosh, H. PolarNet: An Improved Grid Representation for Online LiDAR Point Clouds Semantic Segmentation. arXiv 2020, arXiv:2003.14032. [Google Scholar] [CrossRef]
- Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. Randla-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11108–11117. [Google Scholar] [CrossRef]
- He, W.; Jamonnak, S.; Gou, L.; Ren, L. CLIP-S4: Language-Guided Self-Supervised Semantic Segmentation. arXiv 2023, arXiv:2305.01040. [Google Scholar] [CrossRef]
- Alonso, I.; Riazuelo, L.; Montesano, L.; Murillo, A.C. 3D-MiniNet: Learning a 2D Representation From Point Clouds for Fast and Efficient 3D LIDAR Semantic Segmentation. IEEE Robot. Autom. Lett. 2020, 5, 5432–5439. [Google Scholar] [CrossRef]
- Schult, J.; Engelmann, F.; Hermans, A.; Litany, O.; Tang, S.; Leibe, B. Mask3D: Mask Transformer for 3D Semantic Instance Segmentation. arXiv 2022, arXiv:2210.03105. [Google Scholar] [CrossRef]
- Vu, T.; Kim, K.; Luu, T.M.; Nguyen, X.T.; Yoo, C.D. SoftGroup for 3D Instance Segmentation on Point Clouds. arXiv 2022, arXiv:2203.01509. [Google Scholar] [CrossRef]
- Jain, U.; Mirzaei, A.; Gilitschenski, I. GaussianCut: Interactive segmentation via graph cut for 3D Gaussian Splatting. arXiv 2024, arXiv:2411.07555. [Google Scholar] [CrossRef]
- Cen, J.; Fang, J.; Yang, C.; Xie, L.; Zhang, X.; Shen, W.; Tian, Q. Segment Any 3D Gaussians. arXiv 2023, arXiv:2312.00860. [Google Scholar] [CrossRef]
- Shen, Q.; Yang, X.; Wang, X. FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally. arXiv 2024. [Google Scholar] [CrossRef]
- Zhu, X.; Zhou, H.; Wang, T.; Hong, F.; Li, W.; Ma, Y.; Li, H.; Yang, R.; Lin, D. Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-Based Perception. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 6807–6822. [Google Scholar] [CrossRef]
- Madawy, K.E.; Rashed, H.; Sallab, A.E.; Nasr, O.; Kamel, H.; Yogamani, S. RGB and LiDAR fusion based 3D Semantic Segmentation for Autonomous Driving. arXiv 2019. [Google Scholar] [CrossRef]
- Zhang, X.; He, L.; Chen, J.; Wang, B.; Wang, Y.; Zhou, Y. Multiattention Mechanism 3D Object Detection Algorithm Based on RGB and LiDAR Fusion for Intelligent Driving. Sensors 2023, 23, 8732. [Google Scholar] [CrossRef] [PubMed]
- Dimitrovski, I.; Spasev, V.; Kitanovski, I. Deep Multimodal Fusion for Semantic Segmentation of Remote Sensing Earth Observation Data. arXiv 2024. [Google Scholar] [CrossRef]
- Liu, S.; Zhang, J.; Wang, B.; Liu, J. A Cross-Modal Feature Fusion Model Based on ConvNeXt for RGB-D Semantic Segmentation. Mathematics 2023, 11, 1828. [Google Scholar] [CrossRef]
- Li, J.; Xiao, J.; Lu, Q.; Xu, Q.; Deng, J.; He, Y.; Zheng, N. Deep Learning for LiDAR-Only and LiDAR-Fusion 3D Perception: A Survey. Intell. Robot. 2021, 1, 171–190. [Google Scholar] [CrossRef]
- Li, Y.; Chen, H.; Cui, Z.; Timofte, R.; Pollefeys, M.; Chirikjian, G.; Van Gool, L. Towards Efficient Graph Convolutional Networks for Point Cloud Handling. arXiv 2021. [Google Scholar] [CrossRef]
- Xu, M.; Ding, R.; Zhao, H.; Qi, X. PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds. arXiv 2021, arXiv:2103.14635. [Google Scholar] [CrossRef]
- Zhang, Z.; Hua, B.S.; Yeung, S.K. ShellNet: Efficient Point Cloud Convolutional Neural Networks using Concentric Shells Statistics. arXiv 2019, arXiv:1908.06295. [Google Scholar] [CrossRef]
- Razani, R.; Cheng, R.; Taghavi, E.; Bingbing, L. Lite-Hdseg: Lidar Semantic Segmentation Using Lite Harmonic Dense Convolutions. In Proceedings of the 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi’an, China, 30 May–5 June 2021; pp. 9550–9556. [Google Scholar]
- Graham, B.; van der Maaten, L. Submanifold Sparse Convolutional Networks. arXiv 2017, arXiv:1706.01307. [Google Scholar] [CrossRef]
- Liu, Y.; Chen, R.; Li, X.; Kong, L.; Yang, Y.; Xia, Z.; Bai, Y.; Zhu, X.; Ma, Y.; Li, Y.; et al. Uniseg: A Unified Multi-Modal Lidar Segmentation Network and the Openpcseg Codebase. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 1–6 October 2023; pp. 21662–21673. [Google Scholar] [CrossRef]
- Hou, Y.; Zhu, X.; Ma, Y.; Loy, C.C.; Li, Y. Point-to-Voxel Knowledge Distillation for Lidar Semantic Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 8479–8488. [Google Scholar] [CrossRef]
- Qiu, H.; Yu, B.; Tao, D. GFNet: Geometric Flow Network for 3D Point Cloud Semantic Segmentation. arXiv 2022, arXiv:2207.02605. [Google Scholar] [CrossRef]
- Kochanov, D.; Nejadasl, F.K.; Booij, O. KPRNet: Improving Projection-Based LiDAR Semantic Segmentation. arXiv 2020, arXiv:2007.12668. [Google Scholar] [CrossRef]
- Ando, A.; Gidaris, S.; Bursuc, A.; Puy, G.; Boulch, A.; Marlet, R. RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 5240–5250. [Google Scholar] [CrossRef]
- Cortinhal, T.; Tzelepis, G.; Erdal Aksoy, E. SalsaNext: Fast, Uncertainty-Aware Semantic Segmentation of LiDAR Point Clouds. In Advances in Visual Computing; Bebis, G., Yin, Z., Kim, E., Bender, J., Subr, K., Kwon, B.C., Zhao, J., Kalkofen, D., Baciu, G., Eds.; Springer International Publishing: Cham, Switzerland, 2020; Volume 12510, pp. 207–222. [Google Scholar] [CrossRef]
- Sun, J.; Qing, C.; Tan, J.; Xu, X. Superpoint Transformer for 3D Scene Instance Segmentation. arXiv 2022, arXiv:2211.15766. [Google Scholar] [CrossRef]
- Rosu, R.A.; Schütt, P.; Quenzel, J.; Behnke, S. LatticeNet: Fast Point Cloud Segmentation Using Permutohedral Lattices. arXiv 2019, arXiv:1912.05905. [Google Scholar] [CrossRef]
- Li, S.; Chen, X.; Liu, Y.; Dai, D.; Stachniss, C.; Gall, J. Multi-Scale Interaction for Real-Time LiDAR Data Segmentation on an Embedded Platform. IEEE Robot. Autom. Lett. 2022, 7, 738–745. [Google Scholar] [CrossRef]
- Tatarchenko, M.; Park, J.; Koltun, V.; Zhou, Q.Y. Tangent Convolutions for Dense Prediction in 3d. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3887–3896. [Google Scholar] [CrossRef]
- Huang, Q.; Wang, W.; Neumann, U. Recurrent Slice Networks for 3d Segmentation of Point Clouds. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2626–2635. [Google Scholar] [CrossRef]
- Su, H.; Jampani, V.; Sun, D.; Maji, S.; Kalogerakis, E.; Yang, M.H.; Kautz, J. Splatnet: Sparse Lattice Networks for Point Cloud Processing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2530–2539. [Google Scholar] [CrossRef]
- Kong, L.; Ren, J.; Pan, L.; Liu, Z. LaserMix for Semi-Supervised LiDAR Semantic Segmentation. arXiv 2023, arXiv:2207.00026. [Google Scholar] [CrossRef]