Geometry-Aware Human Noise Removal from TLS Point Clouds via 2D Segmentation Projection
Abstract
1. Introduction
2. Related Work
- (i) Point-cloud-native learning-based methods. Since PointNet [7], numerous architectures have been proposed for semantic segmentation and understanding of large-scale point clouds, including RandLA-Net [8], KPConv [9], Point Transformer [10], Stratified Transformer [11], MinkowskiNet [12], and PointNeXt [13]. While these models achieve strong performance, they generally require large amounts of 3D annotations for supervised training, which are expensive and time-consuming to obtain in practical surveying workflows [6,8]. Surveys on point-cloud denoising and processing provide broad overviews of such learning-based pipelines and their challenges in real-world scenes [3,6]. More recent point-based backbones continue to improve the efficiency–accuracy trade-off on large-scale scenes; for example, Point Transformer V3 (PTv3) provides a streamlined and scalable architecture for point cloud understanding [14]. Complementary to learning-based backbones, practical large-scale point-cloud workflows often employ lightweight geometric preprocessing, such as grid-based filtering, to reduce the search space and improve computational efficiency; this kind of preprocessing is widely used in tasks such as ground-point extraction.
- (ii) Image-based detection/segmentation models. High-accuracy detectors and instance segmentation frameworks, such as Mask Region-Based Convolutional Neural Network (Mask R-CNN) [15] and one-stage detectors such as YOLO [16], have been widely adopted in 2D vision. Recent Ultralytics releases (e.g., YOLOv8) also provide instance segmentation variants suitable for efficient mask extraction [17]. Large-scale datasets such as COCO [4] have made pretrained models widely available. For semantic segmentation, models such as DeepLab [18] are commonly used, while transformer-based detectors (e.g., DETR) [19] further broaden the design space for robust 2D recognition. In addition, modern mask prediction transformers and foundation segmentation models (e.g., Mask2Former and Segment Anything) have advanced general-purpose mask extraction, offering strong 2D priors that can be integrated into 2D–3D workflows when appropriate [20,21].
- (iii) 2D–3D fusion and projection-based coupling. Several frameworks propagate 2D recognition results into 3D space to extend image-level recognition to 3D scene understanding. Frustum PointNets [22] and RoarNet [23], for example, generate 3D frustums from 2D detections and apply dedicated 3D networks within them, and PointPainting assigns image-segmentation cues to points to improve 3D object detection [24]. These methods share the general direction of transferring 2D recognition into 3D, but they typically do not develop post-projection mechanisms that systematically suppress false positives using explicit geometric consistency checks. By contrast, our approach combines pretrained 2D segmentation with geometry-aware filtering, leveraging planarity-related cues [25] and density-based clustering principles [5,26] to reduce false positives caused by projection misalignment and background structures. Recent fusion methods have explored tighter cross-modal coupling, for example via cross-attention between multi-view image features and 3D point features (2D–3D Interlaced Transformer) [27] or via open-vocabulary co-embedding of points, images, and text for flexible 3D scene understanding (OpenScene) [28]. Recently, Yue et al. [29] reported a YOLO-based 2D–3D fusion approach that improves multi-class point cloud segmentation. A minimal sketch of the mask-to-point projection underlying this family of methods is given after this list.
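The sketch below illustrates the common core of such projection-based coupling: projecting 3D points through a pinhole camera model and tagging the points that land on a 2D "human" mask pixel. The function names, the calibration convention (world-to-camera rotation R, translation t, and intrinsics K), and the boolean mask format are illustrative assumptions, not the implementation used in this paper or in the cited works.

```python
import numpy as np

def project_points_to_image(points_xyz, K, R, t, image_shape):
    """Project world-frame 3D points through a pinhole camera model.

    points_xyz : (N, 3) points in world coordinates.
    K          : (3, 3) intrinsic matrix.
    R, t       : world-to-camera rotation (3, 3) and translation (3,).
    Returns (N, 2) pixel coordinates and a boolean visibility mask for
    points in front of the camera and inside the image bounds.
    """
    cam = points_xyz @ R.T + t           # world -> camera frame
    in_front = cam[:, 2] > 1e-6          # positive depth only
    uvw = cam @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]        # perspective division
    h, w = image_shape
    visible = (in_front
               & (uv[:, 0] >= 0) & (uv[:, 0] < w)
               & (uv[:, 1] >= 0) & (uv[:, 1] < h))
    return uv, visible

def label_points_with_mask(points_xyz, seg_mask, K, R, t):
    """Mark 3D points whose projection lands on a 'human' mask pixel."""
    uv, visible = project_points_to_image(points_xyz, K, R, t, seg_mask.shape)
    is_human = np.zeros(len(points_xyz), dtype=bool)
    cols = uv[visible, 0].astype(int)    # u -> column index
    rows = uv[visible, 1].astype(int)    # v -> row index
    is_human[visible] = seg_mask[rows, cols]
    return is_human
```

As the false-positive discussion above suggests, this raw projection step alone labels background points that happen to fall behind a person in image space, which is why geometry-aware filtering is needed afterwards.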
3. Proposed Method
3.1. Overview
3.2. Extraction of Noise Candidates from 2D Images (Input and Preprocessing)
3.3. The 2D Image–3D Point Cloud Matching Stage (Input and Preprocessing)
3.4. False-Positive Suppression via Geometric Processing
3.4.1. Removal of Small Noise Clusters
3.4.2. Coarse Extraction of Human Candidate Regions
3.4.3. Background Segmentation
3.4.4. Validation of Human Plausibility per Cluster
4. Experiments
4.1. Experimental Setup and Datasets
4.1.1. Data Acquisition and Datasets
- Osaka Metropolitan University, Sugimoto Campus (hereafter, OMU): 60 stations and 82 scenes. This campus has substantial daily pedestrian traffic, making it well suited for evaluating the detection and removal of human-related noise.
- Jinaimachi, Tondabayashi City, Osaka Prefecture, Japan (hereafter, JM): 55 stations and 68 scenes. Designated as an Important Preservation District for Groups of Traditional Buildings, this area contains many planar structures, such as walls; it is therefore appropriate for testing false-positive suppression in a practical cultural-heritage scenario.
- Waymo Open Dataset (Perception Dataset v2.0.1) [35]. We randomly selected 15 scenes from the validation split for evaluation. This dataset was captured with a vehicle-mounted MMS and thus differs substantially from the TLS datasets in sensor motion, continuously changing viewpoints, and illumination; it is therefore used to examine the transferability and limitations of the proposed method beyond static TLS settings.
4.1.2. Ground-Truth Annotation
4.1.3. Implementation Details (2D Inference and Computing Environment)
4.1.4. Evaluation of Reprojection Accuracy
4.2. Evaluation Metrics
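As a compact reference for the result tables at the end of this article, the sketch below computes the point-wise precision, recall, and IoU reported throughout Section 5, assuming binary human/non-human labels per 3D point. It reflects the standard TP/FP/FN definitions listed in the Abbreviations, not code from the paper itself.

```python
import numpy as np

def pointwise_metrics(pred, gt):
    """Point-wise precision, recall, and IoU for binary labels.

    pred, gt : boolean arrays of equal length, one entry per 3D point
               (True = labeled as human noise).
    IoU is the Jaccard index TP / (TP + FP + FN) over 'human' points.
    """
    tp = np.sum(pred & gt)    # correctly removed human points
    fp = np.sum(pred & ~gt)   # background wrongly removed
    fn = np.sum(~pred & gt)   # human points missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, iou
```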
4.3. Parameter Validation
4.3.1. Parameter Settings for DBSCAN and the Radius Parameter of the Cylindrical Filter
- (i) Performance Distribution Comparison Using IoU Heatmaps
- (ii) Optimal Parameter Selection
- (iii) Universal Robust Region
- Neighborhood search radius ε for DBSCAN clustering;
- Cylinder radius r (see the sketch below).
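To make these two parameters concrete, here is a minimal sketch of small-cluster removal with DBSCAN followed by a vertical cylindrical filter around a surviving cluster. It assumes scikit-learn's DBSCAN and a z-up coordinate frame; min_cluster_size and the exact decision rules are illustrative assumptions, and the tuned values of ε and r are those validated in this subsection, not fixed here.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def remove_small_clusters(points, eps, min_samples, min_cluster_size):
    """Cluster noise-candidate points and keep only large clusters.

    points : (N, 3) array of candidate human points.
    Returns the retained points and their cluster labels.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    keep = np.zeros(len(points), dtype=bool)
    for lbl in np.unique(labels):
        if lbl == -1:                      # -1 marks DBSCAN outliers
            continue
        members = labels == lbl
        if members.sum() >= min_cluster_size:
            keep |= members
    return points[keep], labels[keep]

def cylindrical_filter(scene_points, cluster_points, radius):
    """Collect every scene point whose horizontal distance to the
    cluster's (x, y) centroid is at most the cylinder radius r."""
    center_xy = cluster_points[:, :2].mean(axis=0)
    d_xy = np.linalg.norm(scene_points[:, :2] - center_xy, axis=1)
    return scene_points[d_xy <= radius]
```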
4.3.2. Parameter Settings for Principal Component Analysis
- (i) Stability Evaluation with Respect to Observation Distance and Point-Cloud Density
- (ii) Threshold Sensitivity Analysis and Final Optimization
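For reference, a minimal sketch of the PCA-based planarity score follows. It uses the geometric feature P = (e2 − e3)/e1 over the sorted covariance eigenvalues e1 ≥ e2 ≥ e3, in the spirit of Weinmann et al. [25]; treating high-planarity clusters as background (e.g., walls) rather than humans is an assumption consistent with the false-positive suppression in Section 3.4, and the default threshold of 0.60 is the optimum reported for the OMU dataset in the sensitivity table at the end of this article.

```python
import numpy as np

def planarity_score(cluster_points):
    """Planarity P = (e2 - e3) / e1 from a cluster's PCA eigenvalues.

    cluster_points : (N, 3) array; eigenvalues sorted e1 >= e2 >= e3.
    Scores near 1 indicate plane-like clusters (e.g., wall fragments);
    scores near 0 indicate linear or volumetric clusters.
    """
    centered = cluster_points - cluster_points.mean(axis=0)
    cov = centered.T @ centered / max(len(cluster_points) - 1, 1)
    eig = np.sort(np.linalg.eigvalsh(cov))[::-1]   # e1 >= e2 >= e3
    return (eig[1] - eig[2]) / eig[0] if eig[0] > 0 else 0.0

def is_planar_background(cluster_points, threshold=0.60):
    """Flag a cluster as planar background when its planarity score
    exceeds the threshold validated in this subsection."""
    return planarity_score(cluster_points) > threshold
```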
4.3.3. Distance Threshold Setting
4.3.4. Choice of 2D Model (Ablation)
4.4. Overall Evaluation Setup and Baseline Definition
- (1) Simple projection of 2D segmentation results (2D-only);
- (2) Addition of 3D clustering;
- (3) Addition of cylindrical processing;
- (4) Addition of background segmentation;
- (5) The full proposed method with cluster-level human plausibility validation (proposed).
5. Results
5.1. Ablation Study Results and Computational Cost
5.2. Quantitative Results on Two Real-World Datasets
5.2.1. Overall Performance of the Proposed Method
5.2.2. Comparison with Geometry-Based Sequential Dynamic Object Removal Methods
- (i) Overview of Quantitative Results
- (ii) Results on the OMU Dataset
- (iii) Results on the JM Dataset
- (iv) Qualitative Comparison and Failure Analysis
5.3. Generalization to MMS Data
- (1) Degradation under low-contrast conditions;
- (2) Occlusion by nearby structures.
5.4. Qualitative Visualization of Human Extraction and Removal
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| CNN | Convolutional Neural Network |
| COCO | Common Objects in Context |
| DBSCAN | Density-Based Spatial Clustering of Applications with Noise |
| FN | False Negative |
| FP | False Positive |
| FPS | Frames Per Second |
| GT | Ground Truth |
| IoU | Intersection over Union |
| KDE | Kernel Density Estimation |
| LiDAR | Light Detection and Ranging |
| Mask R-CNN | Mask Region-Based Convolutional Neural Network |
| MMS | Mobile Mapping System |
| PCA | Principal Component Analysis |
| RANSAC | Random Sample Consensus |
| RGB | Red-Green-Blue |
| RMSE | Root Mean Square Error |
| TLS | Terrestrial Laser Scanner |
| TP | True Positive |
| YOLO | You Only Look Once |
References
- Remondino, F. Heritage Recording and 3D Modeling with Photogrammetry and 3D Scanning. Remote Sens. 2011, 3, 1104–1138. [Google Scholar] [CrossRef]
- Yang, S.; Hou, M.; Li, S. Three-Dimensional Point Cloud Semantic Segmentation for Cultural Heritage: A Comprehensive Review. Remote Sens. 2023, 15, 548. [Google Scholar] [CrossRef]
- Zhang, C.; Zhang, X.; Lao, M.; Jiang, T.; Xu, X.; Li, W.; Zhang, F.; Chen, L. Deep Learning for Point Cloud Denoising: A Survey. arXiv 2025, arXiv:2508.11932. [Google Scholar] [CrossRef]
- Lin, T.-Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common Objects in Context. In Computer Vision—ECCV 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2014; Volume 8693, pp. 740–755. ISBN 978-3-319-10601-4. [Google Scholar]
- Schubert, E.; Sander, J.; Ester, M.; Kriegel, H.-P.; Xu, X. DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN. ACM Trans. Database Syst. 2017, 42, 19:1–19:21. [Google Scholar] [CrossRef]
- Guo, Y.; Wang, H.; Hu, Q.; Liu, H.; Liu, L.; Bennamoun, M. Deep Learning for 3D Point Clouds: A Survey. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 4338–4364. [Google Scholar] [CrossRef] [PubMed]
- Charles, R.Q.; Su, H.; Kaichun, M.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 77–85. [Google Scholar]
- Hu, Q.; Yang, B.; Xie, L.; Rosa, S.; Guo, Y.; Wang, Z.; Trigoni, N.; Markham, A. RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11105–11114. [Google Scholar]
- Thomas, H.; Qi, C.R.; Deschaud, J.-E.; Marcotegui, B.; Goulette, F.; Guibas, L. KPConv: Flexible and Deformable Convolution for Point Clouds. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6410–6419. [Google Scholar]
- Zhao, H.; Jiang, L.; Jia, J.; Torr, P.; Koltun, V. Point Transformer. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 16239–16248. [Google Scholar]
- Lai, X.; Liu, J.; Jiang, L.; Wang, L.; Zhao, H.; Liu, S.; Qi, X.; Jia, J. Stratified Transformer for 3D Point Cloud Segmentation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 8490–8499. [Google Scholar]
- Choy, C.; Gwak, J.; Savarese, S. 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 3070–3079. [Google Scholar]
- Qian, G.; Li, Y.; Peng, H.; Mai, J.; Al Kader Hammoud, H.A.; Elhoseiny, M.; Ghanem, B. PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies. In Proceedings of the 36th International Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2022; pp. 23192–23204. [Google Scholar]
- Wu, X.; Jiang, L.; Wang, P.-S.; Liu, Z.; Liu, X.; Qiao, Y.; Ouyang, W.; He, T.; Zhao, H. Point Transformer V3: Simpler, Faster, Stronger. arXiv 2023, arXiv:2312.10035. [Google Scholar] [CrossRef]
- He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
- Ultralytics YOLOv8—Documentation. Available online: https://docs.ultralytics.com/models/yolov8/ (accessed on 28 December 2025).
- Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Computer Vision—ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 11211, pp. 833–851. ISBN 978-3-030-01233-5. [Google Scholar]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-End Object Detection with Transformers. In Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; Volume 12346, pp. 213–229. ISBN 978-3-030-58451-1. [Google Scholar]
- Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-Attention Mask Transformer for Universal Image Segmentation. arXiv 2021, arXiv:2112.01527. [Google Scholar] [CrossRef]
- Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.-Y.; et al. Segment Anything. arXiv 2023, arXiv:2304.02643. [Google Scholar] [CrossRef] [PubMed]
- Qi, C.R.; Liu, W.; Wu, C.; Su, H.; Guibas, L.J. Frustum PointNets for 3D Object Detection from RGB-D Data. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 918–927. [Google Scholar]
- Shin, K.; Kwon, Y.P.; Tomizuka, M. RoarNet: A Robust 3D Object Detection Based on RegiOn Approximation Refinement. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019; pp. 2510–2515. [Google Scholar]
- Vora, S.; Lang, A.H.; Helou, B.; Beijbom, O. PointPainting: Sequential Fusion for 3D Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 4603–4611. [Google Scholar]
- Weinmann, M.; Jutzi, B.; Mallet, C. Geometric Features and Their Relevance for 3D Point Cloud Classification. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, IV-1/W1, 157–164. [Google Scholar] [CrossRef]
- Sander, J.; Ester, M.; Kriegel, H.-P.; Xu, X. Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications. Data Min. Knowl. Discov. 1998, 2, 169–194. [Google Scholar] [CrossRef]
- Yang, C.-K.; Chen, M.-H.; Chuang, Y.-Y.; Lin, Y.-Y. 2D-3D Interlaced Transformer for Point Cloud Segmentation with Scene-Level Supervision. arXiv 2023, arXiv:2310.12817. [Google Scholar] [CrossRef]
- Peng, S.; Genova, K.; Jiang, C.M.; Tagliasacchi, A.; Pollefeys, M.; Funkhouser, T. OpenScene: 3D Scene Understanding with Open Vocabularies. arXiv 2022, arXiv:2211.15654. [Google Scholar] [CrossRef]
- Yue, H.; Wang, Q.; Zhang, M.; Xue, Y.; Lu, L. 2D–3D Fusion Approach for Improved Point Cloud Segmentation. Autom. Constr. 2025, 177, 106336. [Google Scholar] [CrossRef]
- Habibiroudkenar, P.; Ojala, R.; Tammi, K. DynaHull: Density-Centric Dynamic Point Filtering in Point Clouds. J. Intell. Robot. Syst. 2024, 110, 165. [Google Scholar] [CrossRef]
- Duberg, D.; Zhang, Q.; Jia, M.; Jensfelt, P. DUFOMap: Efficient Dynamic Awareness Mapping. IEEE Robot. Autom. Lett. 2024, 9, 5038–5045. [Google Scholar] [CrossRef]
- Jia, M.; Zhang, Q.; Yang, B.; Wu, J.; Liu, M.; Jensfelt, P. BeautyMap: Binary-Encoded Adaptable Ground Matrix for Dynamic Points Removal in Global Maps. IEEE Robot. Autom. Lett. 2024, 9, 6256–6263. [Google Scholar] [CrossRef]
- Fischler, M.A.; Bolles, R.C. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Commun. ACM 1981, 24, 381–395. [Google Scholar] [CrossRef]
- Pauly, M.; Keiser, R.; Kobbelt, L.P.; Gross, M. Shape Modeling with Point-Sampled Geometry. ACM Trans. Graph. 2003, 22, 641–650. [Google Scholar] [CrossRef]
- Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B.; et al. Scalability in Perception for Autonomous Driving: Waymo Open Dataset. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2443–2451. [Google Scholar]
- Liu, Y.; Wang, X.; Hu, E.; Wang, A.; Shiri, B.; Lin, W. VNDHR: Variational Single Nighttime Image Dehazing for Enhancing Visibility in Intelligent Transportation Systems via Hybrid Regularization. IEEE Trans. Intell. Transp. Syst. 2025, 26, 10189–10203. [Google Scholar] [CrossRef]
- Li, Q.; Du, Q.; Tian, L.; Liao, W.; Lu, G. Enhanced Semantic Segmentation of LiDAR Point Clouds Using Projection-Based Deep Learning Networks. IEEE Trans. Geosci. Remote Sens. 2025, 63, 5706015. [Google Scholar] [CrossRef]
Overall performance of the proposed method on the two TLS datasets (Section 5.2.1):

| Dataset | Precision | Recall | IoU |
|---|---|---|---|
| OMU | 0.9502 | 0.9014 | 0.8607 |
| JM | 0.8912 | 0.9028 | 0.8132 |
IoU as a function of the planarity score threshold (Section 4.3.2):

| Planarity Score Threshold | OMU [IoU] | JM [IoU] |
|---|---|---|
| 0.40 | 0.6044 | 0.7850 |
| 0.45 | 0.7819 | 0.7953 |
| 0.50 | 0.8316 | 0.7798 |
| 0.55 | 0.8432 | 0.8186 |
| 0.60 | 0.8607 | 0.8132 |
| 0.65 | 0.8584 | 0.7721 |
| 0.70 | 0.8588 | 0.7739 |
| 0.75 | 0.8586 | 0.7939 |
| 0.80 | 0.8586 | 0.7894 |
Comparison of 2D segmentation models on the OMU dataset (Section 4.3.4):

| Model | FPS | Total Time [s] | Precision | Recall | IoU |
|---|---|---|---|---|---|
| YOLOv8 | 3.89 | 935.62 | 0.9502 | 0.9014 | 0.8607 |
| YOLOv12 | 3.96 | 938.18 | 0.9444 | 0.8964 | 0.8515 |
| YOLOv8 * | 2.20 | 990.15 | 0.9470 | 0.8810 | 0.8396 |
| YOLOv12 * | 1.56 | 1029.07 | 0.9471 | 0.8834 | 0.8419 |
| Mask R-CNN * | 0.59 | 1663.09 | 0.9408 | 0.8897 | 0.8425 |
Comparison of 2D segmentation models on the JM dataset (Section 4.3.4):

| Model | FPS | Total Time [s] | Precision | Recall | IoU |
|---|---|---|---|---|---|
| YOLOv8 | 3.89 | 571.58 | 0.8912 | 0.9028 | 0.8132 |
| YOLOv12 | 3.96 | 569.91 | 0.8549 | 0.8971 | 0.7786 |
| YOLOv8 * | 2.20 | 582.68 | 0.9112 | 0.8076 | 0.7486 |
| YOLOv12 * | 1.56 | 572.01 | 0.8823 | 0.9003 | 0.8038 |
| Mask R-CNN * | 0.59 | 689.17 | 0.9090 | 0.8152 | 0.7537 |
Stage-wise ablation results and computational cost (Section 5.1); patterns (1)–(5) as defined in Section 4.4:

| Dataset | Pattern | Method | Time [s] | Precision | Recall | IoU |
|---|---|---|---|---|---|---|
| OMU | (1) | 2D-only | 96.98 | 0.6090 | 0.7185 | 0.4917 |
| OMU | (2) | +Clustering | 187.90 | 0.7469 | 0.7001 | 0.5659 |
| OMU | (3) | +Cylinder | 290.79 | 0.6373 | 0.9340 | 0.6098 |
| OMU | (4) | +Background | 848.03 | 0.8310 | 0.9230 | 0.7771 |
| OMU | (5) | Proposed | 935.62 | 0.9502 | 0.9014 | 0.8607 |
| JM | (1) | 2D-only | 77.05 | 0.6623 | 0.7278 | 0.5308 |
| JM | (2) | +Clustering | 106.62 | 0.7617 | 0.7117 | 0.5821 |
| JM | (3) | +Cylinder | 170.63 | 0.4671 | 0.9283 | 0.4508 |
| JM | (4) | +Background | 505.39 | 0.7546 | 0.8941 | 0.6927 |
| JM | (5) | Proposed | 571.58 | 0.8912 | 0.9028 | 0.8132 |
Comparison with geometry-based dynamic object removal methods on the OMU dataset (Section 5.2.2); "-" denotes not reported:

| Method | Total Time [s] | Precision | Recall | IoU |
|---|---|---|---|---|
| DUFOMap | 303.01 | - | 0.9316 | - |
| BeautyMap | 1562.37 | - | 0.7343 | - |
| Ours (YOLOv8) | 935.62 | 0.9502 | 0.9014 | 0.8607 |
Comparison with geometry-based dynamic object removal methods on the JM dataset (Section 5.2.2); "-" denotes not reported:

| Method | Total Time [s] | Precision | Recall | IoU |
|---|---|---|---|---|
| DUFOMap | 1842.68 | - | 0.6479 | - |
| BeautyMap | 4936.39 | - | 0.5226 | - |
| Ours (YOLOv8) | 571.58 | 0.8912 | 0.9028 | 0.8132 |