Article

GRASS: Glass Reflection Artifact Suppression Strategy via Virtual Point Removal in LiDAR Point Clouds

1 College of Computer Science and Electronic Engineering, Hunan University, Lushan South Road, Changsha 410012, China
2 School of Design, South Campus, Hunan University, Pailou Road, Changsha 410012, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Remote Sens. 2026, 18(2), 332; https://doi.org/10.3390/rs18020332
Submission received: 1 December 2025 / Revised: 11 January 2026 / Accepted: 17 January 2026 / Published: 19 January 2026

Highlights

What are the main findings?
  • A dual-module GRASS framework is proposed, which robustly estimates complete and continuous glass planes involved in reflections by combining multi-echo properties with geometric segmentation, overcoming the limitations of previous sparse or incomplete glass estimations.
  • High-precision identification of virtual points is achieved by fusing reflection symmetry with learned geometric similarity, enabling accurate removal of sparse, low-structural-continuity reflection artifacts and significantly outperforming existing methods.
What are the implications of the main findings?
  • Provides a high-quality 3D data foundation for building surveying, offering cleaner and more precise data for downstream applications like architectural modeling and infrastructure inspection by effectively suppressing glass reflection artifacts in TLS point clouds.
  • Enhances the robustness of LiDAR-based perception systems by offering novel theoretical insights and practical solutions for 3D reflection problems, significantly improving reliability in complex glass-rich environments.

Abstract

In building measurement using terrestrial laser scanners (TLSs), the acquired 3D point clouds (3DPCs) often contain significant reflection artifacts caused by reflective glass surfaces. Such artifacts substantially degrade the performance of downstream applications. This study proposes a novel strategy, called GRASS, to remove these reflection artifacts. Specifically, candidate glass points are first identified based on the multi-echo returns caused by glass components. These potential glass regions are then refined through planar segmentation using geometric constraints. Next, we trace laser beam trajectories to identify the reflection-affected zones based on the estimated glass planes and scanner positions. Finally, reflection artifacts are identified using dual criteria: (1) reflection symmetry between artifacts and their source entities across glass components, and (2) geometric similarity estimated by a 3D deep neural network. We evaluate the effectiveness of the proposed solution across a variety of 3DPC datasets and demonstrate that the method can reliably estimate multiple glass regions and accurately identify virtual points. Furthermore, both qualitative and quantitative evaluations confirm that GRASS outperforms existing methods in removing reflection artifacts by a significant margin.

1. Introduction

Glass surfaces, such as glass doors and windows, are common in man-made urban and indoor environments. They may seem visually inconspicuous to the human eye, but their presence introduces significant challenges for both 2D and 3D perception systems. As light passes through glass, part of it is also reflected by the glass surface, causing a glass image to capture not only the intended scene behind the glass but also the unwanted scene in front of it. In the past decade, 2D glass reflection artifact suppression (GRAS) approaches, including single-image decomposition [1,2,3,4], learning-based [5,6,7,8], and multi-image-based methods [9,10,11,12,13,14], have garnered significant attention and achieved notable progress.

1.1. Motivation

Research on reflection removal in 3D scenarios remains relatively underexplored. In particular, existing point cloud denoising studies primarily target noisy points that deviate stochastically from the true geometric surface of objects [15,16,17,18]. Such “geometric noise”, typically originating from measurement equipment errors, environmental interference, or uneven point cloud density, is fundamentally different from reflections. Reflection-induced virtual points, in contrast, are physically inconsistent and incorrectly located behind reflective surfaces, as shown in Figure 1. These specular artifacts from real-world surfaces often share nearly identical geometric characteristics with true scene points and are particularly problematic in dense point cloud systems such as terrestrial laser scanning (TLS). Moreover, in many indoor or urban scenarios, the reflected content can be visually and structurally indistinguishable from actual indoor objects, which severely degrades the performance of downstream tasks such as registration [19], scene understanding [20], and building reconstruction [21].
Current studies for reflection removal in LiDAR measurement generally follow a two-stage paradigm: glass plane extraction and virtual point discrimination [22,23,24,25,26]. Specifically, these methods first detect glass surfaces and then search for counterparts for points behind the glass via mirror transformation. Reflection artifacts are identified by comparing reflections with their counterparts using feature or geometric consistency.
In the glass extraction stage, the high transmissivity and specular reflection properties of glass mean that glass surfaces on buildings often exhibit voids (i.e., areas with no returned points) in TLS scans, making it challenging to obtain complete and continuous glass planes. Existing glass extraction methods, including those based on multi-echo features [22,25], reflection intensity thresholds [24,26], and multimodal fusion [23], each possess advantages in specific scenarios but exhibit significant limitations: methods relying on multi-echo features typically yield extremely sparse glass surface representations; those using reflection intensity thresholds struggle to handle glass with occlusions, such as curtains drawn behind it; and multimodal fusion approaches are prone to misidentifying objects behind the glass as part of the glass itself due to glass voids. Consequently, virtual points generated from undetected or incomplete glass regions cannot be accurately mapped and identified. Obtaining all complete and continuous glass regions involved in reflections is therefore a critical prerequisite for suppressing reflection artifacts in TLS measurement.
In the virtual point discrimination stage, energy attenuation and the sparse nature of the light reflected by glass surfaces cause the resulting reflection artifacts to exhibit a sparser distribution and lower structural continuity than the actual objects, which makes it difficult to identify them as mirrored versions of the same real surfaces. Current methods based on traditional handcrafted features [26,27] are inadequate for capturing the subtle yet crucial similarities in sparse distribution patterns between virtual points and their counterparts. It is therefore urgent to introduce a neural network architecture with powerful nonlinear feature representation capabilities and cross-hierarchical receptive field aggregation mechanisms to effectively capture the similarities between reflection artifacts and their corresponding real objects.

1.2. Contribution and Paper Organization

To tackle this challenge, we propose a novel solution called GRASS (Figure 1a), which performs GRAS for 3DPCs acquired by LiDAR by effectively removing virtual points. It consists of two main modules: (1) Glass Plane Estimation Module: By analyzing multi-echo LiDAR returns, we first construct a 2D count map through spherical projection. To address irregular point count distributions caused by sparse virtual points, indoor occlusions, and missing glass points due to total reflection, planar segmentation is applied to cluster the sparse and fragmented glass points into surface patches. Reflective surfaces are then robustly identified by rules applied to segmented regions rather than raw pixels or points. (2) Virtual Points Detection Module: Given the estimated glass planes, we mirror-transform candidate virtual points located behind them to search for their corresponding real points. We then compute a confidence score for each point by combining reflection symmetry with geometric similarity learned through a feature similarity network. Finally, by thresholding these confidence scores, we distinguish virtual points from real objects. We validate our approach on challenging outdoor and indoor scenes with diverse glass structures. The results demonstrate that our method significantly improves the removal of reflection-induced artifacts while preserving real structures. Our contributions can be summarized as follows:
  • We propose a robust glass detection strategy leveraging multi-echo properties and geometrically segmented regions, which significantly enhances the completeness and accuracy of reflective surface estimation.
  • By jointly exploiting reflection symmetry and geometric similarity between reflected and real-world points, we develop an effective virtual point detection module that eliminates reflection artifacts without relying on handcrafted geometric constraints.
  • We evaluated the quantitative performance of virtual point removal by simulating glass reflection artifacts using multiple diverse public and self-collected 3DPC datasets. Extensive experiments demonstrate the effectiveness of the proposed solution for glass reflection artifact suppression over existing methods.
The remainder of this paper is organized as follows. Section 2 reviews existing denoising methods, including TLS-specific indoor virtual point removal, SLAM-related noise handling, and TLS-specific outdoor reflection artifacts. Section 3 explains the proposed solution and implementation details. Section 4 presents the experimental results. Current limitations and open challenges are discussed in Section 5.

2. Related Work

2.1. Indoor GRAS

To address the reflection problem caused by showcase glass in complex indoor environments, Gao et al. [28] transformed point clouds into a range map and calculated reflectance variances within sliding windows to identify and remove reflective areas by comparison with neighboring scans. However, this method requires adjacent scan data and is not applicable when such data are unavailable. Considering the challenges of directly denoising large-scale point clouds using deep learning, Gao et al. [29] proposed a transformer-based model to detect reflective regions from range maps. However, both [28,29] are specifically designed for indoor scenes and are not suitable for the large-scale 3D point clouds acquired by TLS in building measurement applications.

2.2. Outdoor GRAS

  • Echo-based methods. Research on virtual point removal in outdoor 3D point clouds remains scarce. Refs. [22,25] introduced a method that identifies glass surfaces using multi-return echoes. After detecting the dominant glass plane, they used a score function based on local geometric features to identify virtual points. However, their methods fail to extract glass surfaces in complete form, leaving residual reflection artifacts undetected. To achieve complete glass extraction, refs. [23,30] adopt image-domain glass inference strategies, in which glass regions are first estimated or completed in the 2D panoramic image space and then projected into the 3D point cloud. Specifically, ref. [23] performs super-pixel segmentation followed by morphological operations to fill glass holes, while ref. [30] employs learned image segmentation to complete glass regions based on a count map representation. However, these image-domain completion approaches tend to introduce background structures located behind the glass into the inferred glass regions, particularly in areas where laser returns are missing due to total reflection. When reprojected into 3D space, such filled regions do not correspond to physical samples on the true glass plane, leading to ambiguity between glass surfaces and background objects.
  • Intensity-based methods. Differing from the echo-based methods [22,23,25,30], Shao et al. [24] and Fang et al. [26] extract mirror-like reflective surfaces based on intensity instead, exploiting the fact that the intensity values returned by glass objects are much lower than those of other objects. However, glass components with curtains drawn behind them cannot be detected due to the high intensity values they return. As a result, the virtual points reflected by them cannot be removed.

2.3. GRAS in SLAM

In SLAM scenarios, mirror- or glass-induced reflective virtual points on mobile platforms are typically mitigated by leveraging multi-return effects. In refs. [31,32], after identifying the glass surface based on intensities, affected points behind the glass are directly filtered out. Even though this strategy allows fast and efficient virtual point removal in indoor environments, all real-world points behind the glass are also eliminated.
The method proposed in refs. [33,34] leverages reflection symmetry to identify virtual points by projecting points located behind glass surfaces onto their mirrored positions using glass plane equations, and then classifying them based on distance thresholds. However, limited scan lines and low resolution lead to large distances between corresponding points, even when they belong to the same physical object. As a result, many virtual points that should be removed exceed the distance threshold and are mistakenly classified as valid points.

3. Methodology

3.1. Overview

Figure 2 shows the workflow of our approach, which consists of two modules:
  • Glass plane detection module (Section 3.2): We first project the 3D points onto a 2D count map, recording the number of echo returns per pixel, and then assign each pixel's count value to its corresponding first-echo points. Next, a planar segmentation method is applied to decompose the entire scene into simple planar geometric structures. Subsequently, glass objects are identified using a rule-based extraction approach.
  • Virtual points removal module (Section 3.3): We present a learning-based approach for virtual point removal in which the 3D feature similarity estimation network is trained independently. Specifically, geometric similarities are computed between pairs of points located at symmetric positions with respect to the estimated glass plane. Virtual points are then identified by applying a threshold to the product of reflection symmetry and the estimated feature similarity at each point.

3.2. Glass Plane Detection

3.2.1. Detection Principle via Multi-Echo Count

In general, only a single 3D point is generated when a laser beam hits a real-world object. However, as shown in Figure 1b, multiple 3D points are produced when a single light ray hits a glass surface due to reflection and transmission. We leverage this phenomenon to detect glass regions based on their reflective properties.
As shown in Figure 3, we define a unit sphere centered at the scanner and partition its surface into local m × m patches, where each patch corresponds to a 3D frustum. These patches are defined based on the intrinsic angular resolution of the scanner (ω_azimuthal × ω_polar). Ideally, each primitive angular bin (m = 1) should receive only a single pulse for non-glass objects and two or three pulses for glass objects; however, due to mechanical uncertainties and sampling noise inherent in the whisk-broom scanning mechanism, the resulting count distribution is often too sparse to reliably distinguish glass from non-glass regions using a simple threshold. To mitigate these errors and enhance statistical stability, we employ a larger surface patch covering an angular range of m·ω_azimuthal × m·ω_polar. Through empirical evaluation, we found that m = 3 provides an optimal trade-off between statistical stability and angular resolution; for instance, using a RIEGL VZ-400 scanner (ω = 0.06°) results in an effective patch resolution of 0.18° × 0.18°. Consequently, glass regions become clearly distinguishable, as they exhibit significantly higher count values than non-glass regions.
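The count-map construction described above can be sketched as follows (a minimal illustration with a hypothetical function name; bin indexing and boundary handling are simplified relative to the authors' implementation):

```python
import numpy as np

def build_count_map(points, omega_deg=0.06, m=3):
    """Accumulate per-bin echo counts on a spherical count map.

    points: (N, 3) array of 3D echoes in the scanner frame.
    omega_deg: intrinsic angular resolution of the scanner (degrees).
    m: patch-size multiplier; each bin spans m * omega in both angles.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Spherical projection of each echo: azimuth in [0, 360), polar in [0, 180].
    azimuth = np.degrees(np.arctan2(y, x)) % 360.0
    polar = np.degrees(np.arccos(z / np.linalg.norm(points, axis=1)))
    bin_size = m * omega_deg                      # e.g., 0.18 deg for m = 3
    az_idx = (azimuth // bin_size).astype(int)
    po_idx = (polar // bin_size).astype(int)
    count_map = np.zeros((int(np.ceil(180.0 / bin_size)) + 1,
                          int(np.ceil(360.0 / bin_size))), dtype=int)
    # Unbuffered accumulation so repeated hits in one bin all count.
    np.add.at(count_map, (po_idx, az_idx), 1)
    return count_map
```

Bins covering glass then show counts of two or more (one return per echo), while bins on opaque surfaces stay near one.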

3.2.2. Challenges of Point-Wise Count-Based Glass Estimation

However, glass estimation based on a count map to directly explore reflection plane estimation in the 3D space of LiDAR still faces significant challenges:
  • According to the count map projection mechanism shown in Figure 4a, the count value of a single pixel depends on the number of valid return signals within its spatial domain. However, ambiguous multi-echo point clouds lead to an uneven count distribution across the same glass pane (Figure 4b), resulting in sparse extraction of the reflective area under a count-based threshold. These sparse reflective-area points in turn yield incomplete and inaccurate orientation and position estimates for the reflective plane. Finally, the method struggles to detect the virtual points such a plane creates, thereby decreasing the accuracy of virtual point removal.
  • It has also been shown that the high reflectivity of glass surfaces may lead to total specular reflection, especially at large incidence angles or under perpendicular incidence, resulting in significant missing point cloud data on glass surfaces [35]. As a result, no points can be sampled in the missing glass regions, while real objects and far-away reflection artifacts inside the building still form dense point clusters. Instead, these signals may be projected onto real planar objects (e.g., room partitions) inside the building, so that planar objects behind the glass receive a higher point count than the actual glass surfaces. As shown in Figure 4b, many pixels with high multi-count values correspond to points located on indoor objects, not glass planes. In such cases, it is challenging to use iterative RANSAC to extract glass surfaces: since RANSAC selects plane parameters based on the highest number of fitted points in each iteration, this purely statistical criterion may be unreliable in such scenarios [36].
In our study, to detect glass regions more reliably and accurately, we avoid extracting glass surfaces directly from the point-wise count map. Instead, we exploit the observation that glass windows in our scenes predominantly exhibit planar geometry. Accordingly, we first partition the point cloud into a set of simple planar segments (Figure 4c). At this stage, each point still retains its original point-wise count value. To suppress the strong intra-surface variability caused by multi-echo ambiguity and missing returns, we convert the count representation from the point level to the segment level. Specifically, for each planar segment, we compute the average count value of all points belonging to the segment and assign this average uniformly to every point within the segment. As a result, all points within the same planar partition share an identical count value (Figure 4d). This segment-level count map enables the use of a global threshold to extract glass-like planar partitions in a complete and stable manner. However, due to sparse returns and partial occlusions, a single physical glass surface is often fragmented into multiple disconnected planar segments. To address this issue and further suppress non-glass structures, we introduce a surface merging stage, in which planar segments on the same facade that exhibit consistent geometric properties are merged to form a dominant glass surface (Figure 4e). This dominant surface representation not only reconstructs the complete glass geometry but also facilitates the removal of remaining non-glass objects that may locally exhibit high count values, resulting in the final glass extraction shown in Figure 4f.
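The point-to-segment count conversion can be sketched as follows (a hypothetical helper, assuming per-point counts and segment IDs are already available from the projection and segmentation steps):

```python
import numpy as np

def segment_average_counts(counts, segment_ids):
    """Replace each point's echo count with the mean count of its planar segment.

    counts: per-point count values (from the spherical count map).
    segment_ids: per-point planar-segment labels (from surface growing).
    """
    counts = np.asarray(counts, dtype=float)
    segment_ids = np.asarray(segment_ids)
    seg_counts = np.empty_like(counts)
    for seg in np.unique(segment_ids):
        mask = segment_ids == seg
        # Every point in the segment receives the segment-wide average,
        # smoothing out multi-echo ambiguity and missing returns.
        seg_counts[mask] = counts[mask].mean()
    return seg_counts
```

A global threshold on the smoothed values then selects whole glass-like segments rather than scattered pixels.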

3.2.3. Surface Growing Segmentation

Here, we adopt a classical surface growing segmentation method [37] to segment planar structures from the point cloud. The segmentation is initialized by planar seeds detected using a 3D Hough transform [38], and planar regions are subsequently expanded through iterative surface growing. The surface growing process involves several parameters that serve different roles. Specifically, distance-related thresholds (e.g., seed_max_distance and segment_max_distance) directly control point-to-plane consistency and thus determine the geometric quality of the resulting segments. Other parameters, such as the neighborhood search radius and the maximum number of nearest neighbors, mainly affect neighborhood selection and computational efficiency, without imposing additional geometric constraints. For clarity and reproducibility, all parameters used in the surface growing segmentation and their corresponding values are summarized in Table 1. After segmentation, all points belonging to the same planar region are assigned a unique segment ID.

3.2.4. Dominant Glass Plane Extraction and Refinement

The segment-level count maps in 2D and 3D form are shown in Figure 4d, where count values within the same segment are identical. Here, we consider points with count values higher than a threshold (e.g., 10) as candidates. Instead of directly applying RANSAC to the candidate points for glass plane estimation in 3D space, we adopt a segment-growing method that merges adjacent planar segments into a large dominant planar segment based on normal vector direction and point-to-plane distance. As a result, glass-like clusters located in the indoor scene behind the dominant glass segment (e.g., room partitions), as well as clusters with fewer points, higher curvature, or higher linearity (such as trees), are discarded. Finally, the remaining large segments containing glass windows and doors are extracted.
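The geometric test for merging two adjacent planar segments might look like the following sketch (the angle and distance thresholds are illustrative assumptions, not the paper's values):

```python
import numpy as np

def can_merge(normal_a, centroid_a, normal_b, centroid_b,
              angle_thresh_deg=5.0, dist_thresh=0.1):
    """Decide whether two planar segments belong to the same dominant plane.

    Criteria: near-parallel normals and a small point-to-plane distance
    from segment B's centroid to segment A's plane.
    """
    na = normal_a / np.linalg.norm(normal_a)
    nb = normal_b / np.linalg.norm(normal_b)
    # Angle between normals, ignoring sign flips of the normal direction.
    angle = np.degrees(np.arccos(np.clip(abs(na @ nb), -1.0, 1.0)))
    # Offset of B's centroid from A's plane along A's normal.
    plane_dist = abs(na @ (centroid_b - centroid_a))
    return angle <= angle_thresh_deg and plane_dist <= dist_thresh
```

Repeatedly merging neighboring segments that pass this test grows the fragmented glass pieces into one dominant planar segment.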
After the glass components are detected, we can determine the affected area by tracing the transmission path from the laser scanner position through the four corner points of each glass plane: (1) For single dominant glass structures, even with partially missing points, the dominant glass segments form a complete continuous surface serving as the main glass plane; points that fall behind the dominant glass are then identified as affected points. (2) For multiple glass components distributed on different facades, we iteratively process each individual glass element to extract the corresponding reflection-affected point cloud.

3.3. Virtual Point Removal

In this section, we focus on detecting virtual points in the affected areas estimated in Section 3.2. As shown in Figure 5, evaluating whether every point P ∈ Ω_back is virtual is costly, and it is challenging to find the corresponding real points for given virtual points. Virtual points typically exhibit lower density than their real counterparts. Moreover, occlusions may cause missing symmetric correspondences. Given these challenges, we assume that global shape features may outweigh local details in identifying virtual points. This motivates our approach: we hierarchically downsample the points in Ω_back and Ω_front separately. Then, both global and local features are captured from multi-scale sphere representations of the downsampled points with our proposed 3D feature similarity estimation network. In addition, virtual points exhibit reflection symmetry to their real counterparts across the glass plane. Therefore, we can determine whether a point in Ω_back is virtual by combining reflection symmetry and neighborhood-based feature similarity.

3.3.1. Reflection Symmetry

Among the downsampled points, for a given point P ∈ Ω_back, we first need to find the symmetric position of its real point Q ∈ Ω_front. To do so, we need the plane equation of each glass surface. With the plane equation ax + by + cz + d = 0 (with a unit normal), we can obtain the mirror transformation matrix:
A = \begin{pmatrix} 1 - 2a^2 & -2ab & -2ac & -2ad \\ -2ab & 1 - 2b^2 & -2bc & -2bd \\ -2ac & -2bc & 1 - 2c^2 & -2cd \\ 0 & 0 & 0 & 1 \end{pmatrix}.
Then, we can calculate the mirrored point P̂ = AP, where A is the mirror matrix and P is expressed in homogeneous coordinates. Finally, we use a KD-tree nearest-neighbor search to find the point Q ∈ Ω_front nearest to P̂ and compute:
\gamma_{\mathrm{sym}}(P) = \begin{cases} 1, & \lVert \hat{P} - Q \rVert \le r, \\ e^{-\lVert \hat{P} - Q \rVert / \beta_1}, & \lVert \hat{P} - Q \rVert > r, \end{cases}
where ‖P̂ − Q‖ is the Euclidean distance between P̂ and Q, and the threshold r is normally set equal to the coarsest downsampling radius in our experiments. β_1 controls the rate at which the symmetry score γ_sym(P) decays with the Euclidean distance between the reflected point P̂ and its nearest neighbor Q. A larger value of β_1 makes the score less sensitive to small geometric deviations, whereas a smaller β_1 enforces stricter geometric symmetry.
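The mirror transform and symmetry score can be sketched together as follows (a dependency-free illustration that uses a brute-force nearest search in place of the KD-tree; the default r and β_1 are assumptions within the ranges reported in the experimental setup):

```python
import numpy as np

def mirror_matrix(a, b, c, d):
    """4x4 homogeneous reflection across the plane ax + by + cz + d = 0
    (the normal (a, b, c) is assumed to be unit length)."""
    return np.array([
        [1 - 2*a*a, -2*a*b,     -2*a*c,     -2*a*d],
        [-2*a*b,    1 - 2*b*b,  -2*b*c,     -2*b*d],
        [-2*a*c,    -2*b*c,     1 - 2*c*c,  -2*c*d],
        [0.0,       0.0,        0.0,        1.0],
    ])

def symmetry_scores(back_pts, front_pts, plane, r=0.45, beta1=1.5):
    """gamma_sym for candidate points behind the glass (Equation (2))."""
    A = mirror_matrix(*plane)
    homo = np.c_[back_pts, np.ones(len(back_pts))]   # homogeneous coordinates
    mirrored = (homo @ A.T)[:, :3]                   # P_hat = A P
    # Nearest real point in front of the glass; the paper uses a KD-tree,
    # brute force keeps this sketch dependency-free.
    dist = np.linalg.norm(mirrored[:, None, :] - front_pts[None, :, :],
                          axis=2).min(axis=1)
    return np.where(dist <= r, 1.0, np.exp(-dist / beta1))
```

A point whose mirror image lands within r of a real point scores 1; scores decay exponentially beyond that radius.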

3.3.2. Geometric Similarity

In this study, we utilize Kernel Point Convolution (KPConv) [39] as the backbone to construct a bi-scale Feature Pyramid Network (FPN), which we refer to as KPConv-FPN. As depicted in Figure 6a, KPConv-FPN serves as the multi-level feature extraction backbone in our study. Each level contains multiple consecutive residual blocks, each consisting of sequential feature extraction, batch normalization, ReLU activation, and a residual connection. It performs feature extraction on both the input point and its local neighborhood within the higher-density upper-level point cloud. After extracting the features of a point P and its symmetric point Q, we calculate the feature similarity score as follows:
\gamma_{\mathrm{sim}}(P) = e^{-\lVert \Phi(P) - \Phi(Q) \rVert / \beta_2},
where Φ(·) denotes the feature vector extracted by the 3D feature similarity estimation network, and ‖Φ(P) − Φ(Q)‖ is the Euclidean distance between the feature representations of P and its symmetric point Q. β_2 controls the rate at which γ_sim(P) decays with this feature distance. A larger value of β_2 allows greater tolerance to local feature variations extracted by the KPConv-FPN network, whereas a smaller β_2 enforces stricter feature-level consistency.
A challenging case arises when indoor real-world points and virtual points are spatially adjacent or exhibit similar features. This necessitates an optimization framework that simultaneously (1) minimizes the feature distance between virtual points and their real counterparts and (2) maintains discriminability between reflected objects and other indoor structures. To achieve these dual objectives, we employ a quadruplet loss. As shown in Figure 5, for each virtual point P_anc, we select its corresponding real-world counterpart P_pos, the geometrically closest negative sample P_neg1, and the feature-wise closest point P_neg2 within the affected region. Then, we perform a sphere radius search of pre-defined size around P_anc, P_pos, P_neg1, and P_neg2 to generate their corresponding local patches, respectively. The quadruplet loss is as follows:
\mathcal{L}_{\mathrm{quadruplet}}(P_{\mathrm{anc}}, P_{\mathrm{neg1}}, P_{\mathrm{neg2}}, P_{\mathrm{pos}}) = \max\big( D(P_{\mathrm{anc}}, P_{\mathrm{pos}}) - \alpha_1 \cdot D(P_{\mathrm{anc}}, P_{\mathrm{neg1}}) + \alpha_2,\, 0 \big) + \max\big( D(P_{\mathrm{anc}}, P_{\mathrm{pos}}) - \alpha_3 \cdot D(P_{\mathrm{anc}}, P_{\mathrm{neg2}}) + \alpha_2,\, 0 \big),
where
D(P, Q) = \lVert \Phi(P) - \Phi(Q) \rVert.
Through this loss, the network learns to simultaneously minimize the feature difference between the virtual point P_anc and its symmetric point P_pos while maximizing the feature differences from P_neg1 and P_neg2. Note that α_1, α_2, and α_3 in the quadruplet loss are trainable.
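A minimal sketch of the quadruplet loss on pre-computed feature vectors is given below (the α values are fixed for illustration, whereas the paper treats them as trainable, and the patch-level distance D is reduced to a plain vector distance):

```python
import numpy as np

def quadruplet_loss(f_anc, f_pos, f_neg1, f_neg2,
                    alpha1=1.0, alpha2=0.5, alpha3=1.0):
    """Quadruplet margin loss on feature vectors (Equation (4)).

    f_anc/f_pos: features of a virtual point and its real counterpart.
    f_neg1/f_neg2: features of the geometrically and feature-wise
    closest negatives, respectively. The alphas are fixed here for
    illustration; the paper learns them during training.
    """
    d = lambda u, v: np.linalg.norm(u - v)  # D(P, Q) = ||phi(P) - phi(Q)||
    term1 = max(d(f_anc, f_pos) - alpha1 * d(f_anc, f_neg1) + alpha2, 0.0)
    term2 = max(d(f_anc, f_pos) - alpha3 * d(f_anc, f_neg2) + alpha2, 0.0)
    return term1 + term2
```

The loss vanishes once the positive pair is closer than both negatives by the margin, which is exactly the dual objective described above.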

3.3.3. Detection of Virtual Points

After obtaining both the reflection symmetry score γ_sym(P) in Equation (2) and the feature similarity score γ_sim(P) in Equation (3), we can compute the final confidence score:
Γ ( P ) = γ sym ( P ) · γ sim ( P ) .
This score Γ(P) is shared with neighboring points. As shown in Figure 7, our method achieves more discriminative similarity scores for virtual points compared to [25]. Finally, we apply a threshold τ ∈ [0, 1] to each point P ∈ Ω_back as follows:
L(P) = \begin{cases} 1, & \Gamma(P) > \tau, \\ 0, & \Gamma(P) \le \tau, \end{cases}
where L(P) = 1 indicates a virtual point and L(P) = 0 denotes a real point.
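The decision rule of Equations (6) and (7) reduces to an elementwise product and threshold, for example:

```python
import numpy as np

def classify_virtual(gamma_sym, gamma_sim, tau=0.8):
    """Label points behind the glass: 1 = virtual (removed), 0 = real (kept).

    tau is an assumed default inside the 0.7-0.9 range reported in the
    experimental setup.
    """
    conf = np.asarray(gamma_sym) * np.asarray(gamma_sim)  # Gamma(P)
    return (conf > tau).astype(int)
```

Only points that are both geometrically symmetric and feature-similar to a real counterpart exceed the threshold and are removed.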

4. Evaluations and Results

Our evaluation consists of two parts: glass plane estimation and virtual point detection. For glass estimation, we compare our proposed method with the multi-echo count method [25] and an intensity-based method [24] using the dominant-glass scene data and the multi-glass scene data. For virtual point detection, we evaluate the performance of the proposed learning-based method in comparison with the existing methods [23,25,26,30] in indoor and outdoor urban scenes.

4.1. Results of Glass Plane Estimation

In this section, we first evaluate glass estimation in scenes containing multiple small glass planes. The intensity-based glass filtering method [24] is included in this comparison due to its particular sensitivity to surface reflectance properties. In real-world settings, small windows are often covered by curtains, and such scenarios effectively expose the limitations of reflectance-based approaches like [24], which rely on consistent surface reflectivity. Subsequently, our comparison focuses solely on the multi-return method when evaluating dominant glass plane estimation: since the points behind large glass surfaces are densely distributed and exhibit low reflectance due to the energy loss of the laser pulse after transmitting through the glass, intensity alone is no longer applicable for detecting them.
Multiple small glass plane scenes: Figure 8 shows that the intensity-based method [24] has two limitations: (1) It cannot extract glass objects with curtains drawn behind them, due to the high intensities they return; as a result, the points behind these glass objects, all of which are virtual, cannot be removed. (2) It detects all glass with low intensities, even though some of it generates no reflection-induced virtual points because of the scanning geometry (e.g., boundary-of-FOV or no-intersection cases), which increases the computational load of subsequent processing. In contrast, the multi-return method [22] and our method can detect high-intensity glass regions while ignoring glass regions that produce no virtual points. Furthermore, our method detects glass objects in a more complete and accurate form than the multi-return method, which benefits subsequent virtual point removal.
Dominant glass plane scene: As clearly shown in Figure 9, ref. [22] fails to capture the entire glass area; consequently, some virtual points returned in the missing regions are not considered in the subsequent virtual point removal process. In contrast, our method can exploit the sparse and fragmented glass points to faithfully fill in the locally missing glass regions, which leads to more reliable plane estimation and enhances the effectiveness of virtual point filtering.

4.2. Results of Virtual Point Removal

After extracting the glass, we then focus on virtual point removal. This part consists of four subsections: datasets, experimental setup, qualitative analysis, and quantitative analysis.

4.2.1. Datasets

Synthetic Training Dataset: Collecting a large number of building models with a large number of reflection artifacts by TLS is labor-intensive and time-consuming. Therefore, to obtain the training data, we add synthetically generated reflection artifacts to real building models. To ensure architectural diversity, our training dataset contains: Japanese-style building sets [24], Korean-style structures [25], and Semantic3D [40] with European-style buildings. These datasets collectively cover 50 building scenes, all captured using TLS [41].
For each scene, we randomly placed 20 glass planes on the facade, with sizes arbitrarily selected from 1 m to 10 m in both width and height. We then perform ray casting from the LiDAR to the four corner points of each glass surface to identify the area that the reflected laser pulses can reach. The point cloud within this area is then mirrored to the other side of the glass using the plane equation. To simulate more realistic reflections, we randomly remove half of the transformed points. In total, we generated 1000 glass components of various sizes with corresponding reflection artifacts. Figure 10 shows a real building scene with a few real reflection artifacts and the same scene with synthetically generated reflection artifacts. The overall characteristics of the synthetic model closely match those of the real model in terms of glass shape.
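The mirroring and random-dropping steps above can be sketched as follows. This is a simplified illustration, not the released generation code: the function names and the plane parameterization n·x + d = 0 (unit normal n) are our own assumptions.

```python
import numpy as np

def mirror_across_plane(points, normal, d):
    """Reflect 3D points across the plane n·x + d = 0 (n need not be unit).

    This mimics how a glass pane maps real-world points to virtual points:
    each point is moved to its mirror image on the other side of the plane.
    """
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    # Signed distance of every point to the plane.
    dist = points @ n + d
    # Step back twice the signed distance along the normal -> mirror image.
    return points - 2.0 * dist[:, None] * n[None, :]

def simulate_reflection(points, normal, d, drop_ratio=0.5, rng=None):
    """Generate synthetic virtual points and randomly drop a fraction of
    them (drop_ratio = 0.5 matches the half removed in our data synthesis)."""
    rng = np.random.default_rng(rng)
    virtual = mirror_across_plane(points, normal, d)
    keep = rng.random(len(virtual)) >= drop_ratio
    return virtual[keep]
```

Applying `mirror_across_plane` twice with the same plane returns the original points, which is a quick sanity check for the plane equation used.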
Real Test Dataset: To evaluate our method’s performance on real-world data in diverse reflection scenarios, we conducted experiments using three distinct datasets: the UNIST Building Dataset, containing five annotated scenes with a dominant glass plane [25] and one additional office building with multiple small glass planes captured in Japan [24]; the 3DRN Dataset, containing both outdoor urban scenes (two street-view scenarios) and indoor environments (two indoor settings) [26]; and 3DRef SLAM indoor scans (included in the Supplementary Materials) [42].

4.2.2. Experimental Setup

To enable the 3D feature similarity estimation network to learn various shapes, we divided the 50 models into 40 for training and 10 for validation. The network was trained for 100 epochs. We implemented a two-stage sampling strategy with resolutions of 0.15 m (first stage) and 0.45 m (second stage). We also applied point cloud augmentation, including random rotations and scaling. Training used a batch size of 4 and an initial learning rate of 0.01, decayed by a factor of 0.5 every 10 epochs with the Adam optimizer. The parameters β 1 in Equation (2) and β 2 in Equation (3) are set empirically from 1 to 2 in our experiments. Additionally, the confidence score threshold τ is generally set between 0.7 and 0.9, where a lower value is used for scattered reflected points and a higher value is applied when the reflected virtual points are more complete.
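The two-stage sampling at 0.15 m and 0.45 m can be approximated by a voxel-grid average, as sketched below. This is an illustrative helper under our own naming (`grid_subsample`); the actual preprocessing pipeline may differ in detail.

```python
import numpy as np

def grid_subsample(points, voxel):
    """Average all points falling into the same cubic voxel of edge `voxel`
    (meters). Applying it with voxel=0.15 and voxel=0.45 mimics the two
    sampling resolutions used in training."""
    keys = np.floor(points / voxel).astype(np.int64)
    # Group points by voxel index and average each group.
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = inv.reshape(-1)
    out = np.zeros((inv.max() + 1, points.shape[1]))
    np.add.at(out, inv, points)
    counts = np.bincount(inv).astype(float)
    return out / counts[:, None]
```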

4.2.3. Quantitative Performance

Results on UNIST dataset: In Table 2, we compare the proposed algorithm with other methods based on the F 1 score, given by the following:
F1 = (2 × Precision × Recall) / (Precision + Recall)
where recall and precision are defined as Recall = TP / (TP + FN) and Precision = TP / (TP + FP), and TP, FP, and FN denote the number of correctly detected virtual points, the number of real points incorrectly detected as virtual points, and the number of virtual points incorrectly retained as valid points, respectively. We see that the proposed method outperforms the previous methods except in the “Botanical garden” scene, where [30] performs best. Note that the “Botanical garden” scene contains a high density of real vegetation points and reflected virtual tree points inside the building. The intertwined indoor vegetation and the reflected trees exhibit nearly identical geometric and symmetric features. Consequently, our method easily misclassifies the actual vegetation as virtual tree points due to their strong similarity.
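From the raw counts, the F1 score reduces to a few lines of arithmetic. The helper below is an illustrative sketch under the count conventions above, not the evaluation code used in the experiments.

```python
def f1_score(tp, fp, fn):
    """F1 from raw counts for virtual point detection.

    tp: virtual points correctly detected; fp: real points wrongly flagged
    as virtual; fn: virtual points missed (retained as valid).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```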
Results on 3DRN dataset: In Table 3, we compare our method with other methods using the same evaluation metrics as [26], namely the outlier detection rate (ODR), inlier detection rate (IDR), accuracy, and signal-to-noise ratio (SNR):
ODR = TN / (FP + TN)
IDR = TP / (TP + FN)
Accuracy = (TP + TN) / (TP + FN + FP + TN)
SNR = 10 · lg( (TP + FN) / (FP + FN) )
In these metrics, TN represents the number of correctly removed virtual points, while FN denotes the number of real-world points incorrectly detected as virtual points. TP refers to the number of preserved real points, and FP is the number of undetected virtual points. In this metric system, ODR corresponds to recall in the F 1 score framework, measuring the proportion of detected virtual reflection points out of all virtual points. IDR denotes the ratio of preserved real-world points to all ground-truth real points. Accuracy reflects the overall efficacy of the proposed method in eliminating virtual points, and SNR quantifies the denoised output quality in decibels: a higher SNR indicates superior signal quality with negligible noise interference. Compared to the method introduced in [25], our approach demonstrates significant advantages in both indoor and outdoor scenes across all four metrics, except for ODR on Scan 10; although [25] removes a large number of virtual points on that scan, it also misclassifies a substantial portion of real points as virtual points. Compared with the method introduced in [26], the proposed method achieves superior performance except for IDR on Scan 04: even though our method incorrectly removes more real points on this scan, it also eliminates significantly more virtual points.
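Under these count conventions (TP = preserved real points, TN = correctly removed virtual points), the four metrics can be computed directly, as in the sketch below; the function name is our own.

```python
import math

def reflection_metrics(tp, fn, fp, tn):
    """Metrics used for the 3DRN evaluation.

    tp: preserved real points; fn: real points wrongly removed;
    fp: undetected virtual points; tn: correctly removed virtual points.
    """
    odr = tn / (fp + tn)                              # virtual points removed
    idr = tp / (tp + fn)                              # real points preserved
    accuracy = (tp + tn) / (tp + fn + fp + tn)        # overall correctness
    snr = 10.0 * math.log10((tp + fn) / (fp + fn))    # in decibels
    return {"ODR": odr, "IDR": idr, "Accuracy": accuracy, "SNR": snr}
```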

4.2.4. Qualitative Performance

As shown in Figure 11, the reflections removed by the existing method [25] exhibit a spatially sparse pattern. This is because it computes feature similarity at the point level and thus fails to eliminate points with insufficient feature similarity. In contrast, the proposed method selectively removes most of the virtual points. A limitation of our method is that some real-world points inside the building are misclassified as reflections, as shown in “Engineering building”. We can also observe that in “Office building”, some reflections produced by windows on the second and third floors still cannot be fully removed because their corresponding real-world points are occluded by treetops.
As shown in Figure 12, multiple dominant glass planes are successfully detected, enabling effective removal of their associated virtual points. However, two significant challenges persist in indoor scenes: (1) Valid ground points outside the room boundaries are systematically misclassified as reflections; (2) Certain ceiling reflection artifacts remain unremoved due to the absence of corresponding real ceiling points in the reference data.

4.2.5. Runtime and Efficiency Evaluation

Our framework adopts a hybrid implementation strategy: the glass plane detection module is implemented in C++ for optimal performance in geometric processing, while the virtual point removal module is implemented in PyTorch (v2.9.0) to leverage GPU acceleration for deep learning inference. All experiments were conducted on a workstation with an Intel Core i9-14900K CPU, 128 GB RAM, and an NVIDIA RTX A5000 GPU (24 GB GDDR6X).
Regarding processing efficiency (Table 4), the glass plane detection module accounts for the majority of the computation time, primarily due to the surface growing segmentation process. In contrast, the count value calculation and glass determination steps are computationally negligible. As for the virtual point removal module, excluding training time and considering inference only, both feature extraction and virtual point identification require only a small fraction of the time spent in the glass extraction stage. In the future, we will accelerate the surface growing segmentation algorithm through methods such as parallel processing.

4.3. Ablation Study

This section validates the effectiveness of the reflection symmetry (RS) module and the geometric similarity (GS) module with its quadruplet loss on virtual point removal results. In the ablation studies of the GS module, if we remove the D_anc,neg1 term, the quadruplet loss degrades to a triplet loss: the max term containing D(P_anc, P_neg1) is removed, and only the max term containing D(P_anc, P_neg2) is kept. Similarly, we keep only the max term containing D(P_anc, P_neg1) if D_anc,neg2 is removed. For each ablated configuration, the network was retrained to evaluate its performance.
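The quadruplet-to-triplet degradation described above can be sketched in PyTorch. This is an illustrative hinge-form loss under assumed margins m1 and m2, not the exact loss used in our network; dropping either negative term reduces it to a triplet loss, which is what ablation groups (C) and (D) evaluate.

```python
import torch

def quadruplet_loss(f_anc, f_pos, f_neg1, f_neg2, m1=0.5, m2=0.5):
    """Quadruplet loss with two hard negatives (illustrative sketch).

    f_anc/f_pos: features of a virtual point and its mirrored real counterpart;
    f_neg1: feature of the geometrically nearest negative sample;
    f_neg2: feature of the nearest negative in the latent feature space.
    m1, m2 are assumed margin values, not the paper's settings.
    """
    d_pos = torch.norm(f_anc - f_pos, dim=-1)
    d_neg1 = torch.norm(f_anc - f_neg1, dim=-1)
    d_neg2 = torch.norm(f_anc - f_neg2, dim=-1)
    # Two hinge (max) terms; removing one collapses this to a triplet loss.
    loss = (torch.clamp(d_pos - d_neg1 + m1, min=0.0)
            + torch.clamp(d_pos - d_neg2 + m2, min=0.0))
    return loss.mean()
```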
Table 5 shows that our method (with all modules enabled) achieves the highest average F 1 score, significantly outperforming the other controlled groups (A, B, C, D). The results of group (A) demonstrate that removing the RS module leads to an overall performance degradation, with particularly noticeable declines in models (a), (b), and (c). Interestingly, models (d) and (f) without RS even slightly outperform our full method. This performance fluctuation highlights the applicability boundaries of the RS module. In scenario (d), auxiliary structures (e.g., sunshades) attached to windows introduce non-planarity and artificial thickness, causing deviations in the estimated plane equation and subsequent reflection errors. In scenario (f), severe occlusions result in virtual points lacking true physical correspondences. While feature-only matching might yield accidental matches (false positives) due to similar local geometries, the RS module strictly prunes these physically invalid correlations. Although this may marginally reduce numerical metrics in specific cases, it ensures the geometric consistency and physical validity of the final reconstruction. Nevertheless, the comprehensive evaluation confirms that retaining the RS module yields superior overall performance across the entire dataset. The results of (B) show that removing the loss terms D_anc,neg1 and D_anc,neg2 from the GS module leads to a sharp performance drop. The key role of the GS module lies in negative sample processing: optimizing only the positive sample term D_anc,pos is insufficient for achieving good performance. It is crucial to also incorporate the geometrically nearest negative sample D_anc,neg1 and the feature-nearest negative sample D_anc,neg2. From the results of (C) and (D), it can be inferred that D_anc,neg1 and D_anc,neg2 target different types of hard negative samples (spatial proximity vs. feature-wise similarity).
Removing either component harms the final performance, but removing the geometric component D_anc,neg1 has a slightly greater impact on the overall average than removing the feature component D_anc,neg2. Overall, the ablation study quantifies the contributions of the RS module and the negative sample optimization terms (D_anc,neg1, D_anc,neg2) in the GS module, confirming that the full method significantly outperforms any variant with a single module removed and validating the rationality of the model design.

5. Discussion

Recent efforts [22,23,24,26,30] to remove reflection artifacts from TLS point clouds, including our approach, generally follow a two-stage pipeline: (1) glass surface identification, followed by (2) virtual point detection and removal behind these surfaces based on geometric features. While this framework is intuitive, it suffers from several limitations in both generalizability and scalability:
  • Limitations in glass surface identification: Glass region detection in existing methods often relies on rule-based heuristics tailored to specific scanning configurations and scene layouts, requiring prior knowledge of structural assumptions. As a result, irregular or curved glass surfaces (e.g., ring-shaped or convex panels), particularly when freestanding rather than wall-embedded, are difficult to detect reliably. Our method follows a similar geometric paradigm and likewise assumes that glass surfaces form dominant planar structures embedded in walls that can be geometrically distinguished in LiDAR data. Under this assumption, planar glass surfaces provide sufficient geometric evidence to support robust glass segmentation. Even when LiDAR returns on glass are sparse, fragmented, or entirely missing due to total reflection, partial glass observations can be aggregated through surface-level merging to form a dominant glass plane hypothesis. This aggregated planar representation then provides a reliable geometric basis for robust virtual point detection and removal. Note that in extreme cases where reflection points become excessively sparse, the glass reflection count may drop below a reliable level, making glass surface estimation inherently difficult for count-based methods, including ours.
  • Limitations in geometry-based virtual point detection: Virtual point detection typically relies on geometric features between corresponding point pairs across glass surfaces. However, due to occlusions, some real points that correspond to virtual points may not be captured; in such cases, these virtual points cannot be correctly classified. Moreover, non-glass reflective materials such as water surfaces, polished stone, or marble floors in buildings can also produce misleading reflection artifacts. These artifacts often exhibit different shapes, densities, or echo behaviors compared to real objects, which remains a challenging detection scenario.
  • Limitations in data availability and benchmarking: Due to the high cost of TLS data acquisition, existing datasets are not only limited in volume but also particularly scarce for scenarios involving reflective glass surfaces. Consequently, existing studies are based on self-collected datasets, and the research community lacks adequate open-source benchmarks to support broader research in this area.
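The occlusion failure mode noted in the second limitation above can be made concrete with a minimal reflection-symmetry check: a candidate virtual point is confirmable only if its mirror image across the glass plane lands near some captured real point. The function name, the 0.3 m search radius, and the brute-force neighbor search are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def has_real_counterpart(virtual_pts, real_pts, normal, d, radius=0.3):
    """Mirror each candidate virtual point across the glass plane
    n·x + d = 0 and test whether a real point lies within `radius` meters.

    Candidates whose real counterpart was occluded during scanning fail this
    test and cannot be confirmed, which is the failure mode discussed above.
    """
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    mirrored = virtual_pts - 2.0 * (virtual_pts @ n + d)[:, None] * n[None, :]
    # Brute-force nearest-neighbor distance (a KD-tree would be used at scale).
    dists = np.linalg.norm(mirrored[:, None, :] - real_pts[None, :, :], axis=-1)
    return dists.min(axis=1) <= radius
```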
These limitations highlight the necessity for developing a more generalized, data-driven learning-based framework capable of addressing the above challenges through the fusion of multi-modal data (e.g., LiDAR with RGB imagery).

6. Conclusions

We propose a novel virtual point removal framework for TLS measurements. Our approach begins with reflective glass detection using multi-return echoes combined with planar segmentation and geometric constraints. Subsequently, we developed a 3D feature similarity estimation network with a quadruplet loss that extracts deep point features through spherical representation, enabling effective identification of symmetry relations and geometric similarities between real and virtual points. The network was trained on TLS models with synthetically generated reflection artifacts and evaluated on real-world datasets containing manually annotated ground truth. Experimental results demonstrate that our method achieves significant qualitative and quantitative improvements over state-of-the-art approaches.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs18020332/s1.

Author Contributions

Conceptualization, W.S.; formal analysis, W.S. and Y.Z.; methodology, W.S. and Y.Z.; software, W.S. and Y.Z.; supervision, W.S., Y.L., and T.J.; writing—original draft, W.S. and Y.L.; writing—review and editing, W.S., Y.Z., Y.L., and Y.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Key R&D Program of China (Grant No. 2022ZD0119000), Hunan Provincial Key R&D Program of China (Grant No. 2024JK2020 and Grant No. 2024JK2021), Hunan Provincial Natural Science Foundation of China (Grant No. 2024JJ10027), Young Talents of Huxiang (Grant No. Z202433000575) and Changsha Science Fund for Distinguished Young Scholars (Grant No. kq2306002).

Data Availability Statement

The datasets can be accessed from the following sources: (1) UNIST Dataset: https://vip.unist.ac.kr/large-scale-3d-point-clouds-dataset-for-virtual-point-removal (accessed on 15 October 2024); (2) 3DRN Dataset: https://github.com/Tsuiky/3DRN (accessed on 5 January 2025); (3) Japan Dataset: https://github.com/wpshao/GRASS (accessed on 5 January 2025); (4) 3DRef Dataset: https://3dref.github.io (accessed on 10 January 2025); (5) PyTorch Framework: https://pytorch.org/get-started/locally/ (accessed on 15 October 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Shih, Y.-C.; Krishnan, D.; Durand, F.; Freeman, W.T. Reflection removal using ghosting cues. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 3193–3201. [Google Scholar] [CrossRef]
  2. Arvanitopoulos, N.; Achanta, R.; Süsstrunk, S. Single image reflection suppression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1752–1760. [Google Scholar] [CrossRef]
  3. Martinez, J.; Pistonesi, S.; Maciel, M.C.; Flesia, A.G. Multi-scale fidelity measure for image fusion quality assessment. Inf. Fusion 2019, 50, 197–211. [Google Scholar] [CrossRef]
  4. RahmaniKhezri, H.; Kim, S.; Hefeeda, M. Unsupervised single-image reflection removal. IEEE Trans. Multimed. 2023, 25, 4958–4971. [Google Scholar] [CrossRef]
  5. Fan, Q.; Yang, J.; Hua, G.; Chen, B.; Wipf, D. A generic deep architecture for single image reflection removal and image smoothing. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3258–3267. [Google Scholar] [CrossRef]
  6. Wan, R.; Shi, B.; Duan, L.-Y.; Tan, A.-H.; Kot, A.C. CRRN: Multi-scale guided concurrent reflection removal network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 4777–4785. [Google Scholar] [CrossRef]
  7. Yang, J.; Gong, D.; Liu, L.; Shi, Q. Seeing deeply and bidirectionally: A deep learning approach for single image reflection removal. In Proceedings of the Computer Vision—ECCV 2018: 15th European Conference, Munich, Germany, 8–14 September 2018; pp. 675–691. [Google Scholar] [CrossRef]
  8. Wei, K.; Yang, J.; Fu, Y.; Wipf, D.; Huang, H. Single image reflection removal exploiting misaligned training data and network enhancements. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 8170–8179. [Google Scholar] [CrossRef]
  9. Wieschollek, P.; Gallo, O.; Gu, J.; Kautz, J. Separating reflection and transmission images in the wild. In Computer Vision—ECCV 2018: Proceedings of the 15th European Conference, Munich, Germany, 8–14 September 2018; Proceedings, Part XIII; Springer: Berlin/Heidelberg, Germany, 2018; pp. 90–105. [Google Scholar] [CrossRef]
  10. Guo, X.; Cao, X.; Ma, Y. Robust separation of reflection from multiple images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 2195–2202. [Google Scholar] [CrossRef]
  11. Prasad, B.H.; Mitra, K. Burst reflection removal using reflection motion aggregation cues. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 2–7 January 2023; pp. 239–248. [Google Scholar] [CrossRef]
  12. Han, B.-J.; Sim, J.-Y. Reflection removal using low-rank matrix completion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3872–3880. [Google Scholar] [CrossRef]
  13. Hu, Q.; Guo, X. Single image reflection separation via component synergy. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 1–6 October 2023; pp. 13092–13101. [Google Scholar] [CrossRef]
  14. Kong, N.; Tai, Y.-W.; Shin, J.S. A physically-based approach to reflection separation: From physical modeling to constrained optimization. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 209–221. [Google Scholar] [CrossRef]
  15. Ambrosino, A.; Di Benedetto, A.; Fiani, M. Hybrid Denoising Algorithm for Architectural Point Clouds Acquired with SLAM Systems. Remote Sens. 2024, 16, 4559. [Google Scholar] [CrossRef]
  16. Gonizzi Barsanti, S.; Marini, M.R.; Malatesta, S.G.; Rossi, A. Evaluation of Denoising and Voxelization Algorithms on 3D Point Clouds. Remote Sens. 2024, 16, 2632. [Google Scholar] [CrossRef]
  17. Zheng, Z.; Zha, B.; Zhou, Y.; Huang, J.; Xuchen, Y.; Zhang, H. Single-Stage Adaptive Multi-Scale Point Cloud Noise Filtering Algorithm Based on Feature Information. Remote Sens. 2022, 14, 367. [Google Scholar] [CrossRef]
  18. Wang, L.; Chen, Y.; Xu, H. Point Cloud Denoising in Outdoor Real-World Scenes Based on Measurable Segmentation. Remote Sens. 2024, 16, 2347. [Google Scholar] [CrossRef]
  19. Zhao, Y.; Zhang, J.; Xu, S.; Ma, J. Deep learning-based low overlap point cloud registration for complex scenarios: A review. Inf. Fusion 2024, 107, 102305. [Google Scholar] [CrossRef]
  20. Wu, Y.; Liu, J.; Gong, M.; Miao, Q.; Ma, W.; Xu, C. Joint semantic segmentation using representations of LiDAR point clouds and camera images. Inf. Fusion 2024, 108, 102370. [Google Scholar] [CrossRef]
  21. Xiong, B.; Jin, Y.; Li, F.; Chen, Y.; Zou, Y.; Zhou, Z. Knowledge-driven inference for automatic reconstruction of indoor detailed as-built BIMs from laser scanning data. Autom. Constr. 2023, 156, 105097. [Google Scholar] [CrossRef]
  22. Yun, J.-S.; Sim, J.-Y. Reflection removal for large-scale 3D point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4597–4605. [Google Scholar] [CrossRef]
  23. Yun, J.-S.; Sim, J.-Y. Cluster-wise removal of reflection artifacts in large-scale 3D point clouds using superpixel-based glass region estimation. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1780–1784. [Google Scholar] [CrossRef]
  24. Shao, W.; Kakizaki, K.; Araki, S.; Mukai, T. Reflections removal produced by multiple transparent and reflective glass objects in TLS measurements. In Proceedings of the IEEE 47th Annual Computers, Software, and Applications Conference (COMPSAC), Torino, Italy, 26–30 June 2023; pp. 370–379. [Google Scholar] [CrossRef]
  25. Yun, J.-S.; Sim, J.-Y. Virtual point removal for large-scale 3D point clouds with multiple glass planes. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 729–744. [Google Scholar] [CrossRef] [PubMed]
  26. Fang, L.; Li, T.; Lin, Y.; Zhou, S.; Yao, W. A coupled optical–radiometric modeling approach to removing reflection noise in TLS data of urban areas. ISPRS J. Photogramm. Remote Sens. 2025, 220, 217–231. [Google Scholar] [CrossRef]
  27. Rusu, R.B.; Blodow, N.; Beetz, M. Fast point feature histograms (FPFH) for 3D registration. In Proceedings of the IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009; pp. 3212–3217. [Google Scholar] [CrossRef]
  28. Gao, R.; Park, J.; Hu, X.; Yang, S.; Cho, K. Reflective noise filtering of large-scale point cloud using multi-position LiDAR sensing data. Remote Sens. 2021, 13, 3058. [Google Scholar] [CrossRef]
  29. Gao, R.; Li, M.; Yang, S.-J.; Cho, K. Reflective noise filtering of large-scale point cloud using transformer. Remote Sens. 2022, 14, 577. [Google Scholar] [CrossRef]
  30. Lee, O.; Joo, K.; Sim, J.-Y. Learning-based reflection-aware virtual point removal for large-scale 3D point clouds. IEEE Robot. Autom. Lett. 2023, 8, 8510–8517. [Google Scholar] [CrossRef]
  31. Koch, R.; May, S.; Murmann, P.; Nüchter, A. Identification of transparent and specular reflective material in laser scans to discriminate affected measurements for faultless robotic SLAM. Robot. Auton. Syst. 2017, 87, 296–312. [Google Scholar] [CrossRef]
  32. Koch, R.; May, S.; Koch, P.; Kühn, M.; Nüchter, A. Detection of specular reflections in range measurements for faultless robotic SLAM. In Robot 2015: Second Iberian Robotics Conference. Advances in Intelligent Systems and Computing; Reis, L., Moreira, A., Lima, P., Montano, L., Muñoz-Martinez, V., Eds.; Springer: Cham, Switzerland, 2015; Volume 417. [Google Scholar] [CrossRef]
  33. Zhao, X.; Yang, Z.; Schwertfeger, S. Mapping with reflection: Detection and utilization of reflection in 3D lidar scans. In Proceedings of the IEEE International Symposium on Safety, Security, and Rescue Robotics, Abu Dhabi, United Arab Emirates, 4–6 November 2020; pp. 27–33. [Google Scholar] [CrossRef]
  34. Li, Y.; Zhao, X.; Schwertfeger, S. Detection and utilization of reflections in LiDAR scans through plane optimization and plane SLAM. Sensors 2024, 24, 4794. [Google Scholar] [CrossRef]
  35. Born, M.; Wolf, E. Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light, 7th ed.; Cambridge University Press: Cambridge, UK, 2013. [Google Scholar]
  36. Xu, Y.; Boerner, R.; Yao, W.; Hoegner, L.; Stilla, U. Pairwise coarse registration of point clouds in urban scenes using voxel-based 4-planes congruent sets. ISPRS J. Photogramm. Remote Sens. 2019, 151, 106–123. [Google Scholar] [CrossRef]
  37. Vosselman, G. Point cloud segmentation for urban scene classification. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2013, XL-7/W2, 257–262. [Google Scholar] [CrossRef]
  38. Vosselman, G.; Gorte, B.G.H.; Sithole, G.; Rabbani, T. Recognising structure in laser scanner point clouds. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. 2004, 46, 33–38. [Google Scholar]
  39. Thomas, H.; Qi, C.R.; Deschaud, J.-E.; Marcotegui, B.; Goulette, F.; Guibas, L. KPConv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6410–6419. [Google Scholar] [CrossRef]
  40. Hackel, T.; Savinov, N.; Ladicky, L.; Wegner, J.D.; Schindler, K.; Pollefeys, M. SEMANTIC3D.NET: A new large-scale point cloud classification benchmark. ISPRS Ann. Photogramm. Remote Sens. Spatial Inf. Sci. 2017, IV-1/W1, 91–98. [Google Scholar] [CrossRef]
  41. RIEGL Laser Measurement Systems. Available online: http://www.riegl.com/ (accessed on 15 November 2025).
  42. Zhao, X.; Schwertfeger, S. 3DRef: 3D dataset and benchmark for reflection detection in RGB and lidar data [dataset]. In Proceedings of the International Conference on 3D Vision, Davos, Switzerland, 18–21 March 2024; pp. 225–234. [Google Scholar] [CrossRef]
Figure 1. The Glass Reflection Artifact Suppression Strategy (GRASS). (a) An illustration of GRASS. (b) The principle of reflection in TLS measurement. Blue line: The laser beam hits the glass surface, producing a glass point ( P glass ), a virtual point reflected from tree by the glass ( P virtual ), and a sofa point inside the building ( P sofa ). Orange line: The real tree ( P tree , symmetric to P virtual ) is detected behind LiDAR during the sensor rotation. (c) The real building scene captured by TLS where the glass planes are shown in yellow and virtual points are shown in red.
Figure 2. The proposed GRASS framework comprises two main modules: (1) a glass plane detection module (Section 3.2) for precise estimation of reflective surfaces, and (2) a virtual point removal module (Section 3.3) that selectively removes virtual points while preserving real ones. Here, P anc represents virtual points produced by glass-induced specular reflection, which correspond to real physical object points P pos .
Figure 3. Partitioning of the unit sphere into local surface patches.
Figure 4. Overview of the proposed glass plane detection module. The top and bottom rows display the point clouds in 2D and 3D forms, respectively. (a) Schematic of the transformation from 3D point clouds to a 2D count map. (b) Point-level count map: red regions indicate areas with a high number of returns (e.g., glass surfaces), while blue regions represent areas with a low number of returns (e.g., building facades). (c) Segmentation results: points belonging to the same segment are assigned a uniform color. (d) Segment-level count map derived by averaging the point-level counts from (b) over each planar segment in (c): all points within a segment share an identical count value, resulting in a significantly more even count distribution than (b). (e) Glass-like surface extraction results: each color represents a unified surface resulting from merging adjacent candidates after the initial thresholding of the segment-level count map, while non-glass objects are rendered in gray for contrast. (f) Final glass object extraction results: indoor non-glass points are filtered out based on their position within the spatial frustum defined by the LiDAR sensor, and isolated small surfaces that do not meet the minimum area requirement are also removed.
Figure 5. Point sampling and radius search. For a given virtual point P anc , we find its positive sample P pos ∈ Ω front , its geometrically nearest negative sample by spatial distance P neg 1 ∈ Ω back , and its nearest neighbor in the latent feature space P neg 2 ∈ Ω back , respectively. Then, we use a KD-Tree to find neighbors within a radius centered on P anc , P pos , P neg 1 , and P neg 2 , respectively.
Figure 6. KPConv-FPN architecture and its core components. (a) Overall architecture of the KPConv-FPN network, featuring dual input streams and multi-level feature fusion. The input consists of first-level downsampled points with a constant scalar feature of 1 and second-level downsampled points. The symbol D represents the number of extracted feature channels, which is set to 128 in our experiments. After processing through multiple ResBlock layers, the final output feature dimension of the network is 4 · D (i.e., 512-dimensional). (b) Illustration of ConvBlocks (top) and ResBlock (bottom). The MLP block in the ResBlock refers to a multilayer perceptron consisting of a linear (fully connected) layer, batch normalization (BN), and ReLU activation. (c) KPConv illustrated on 3D points. Input points with a constant scalar feature (shown in grey) are convolved using KPConv kernels defined by a set of kernel points (shown in black), each associated with a learnable filter weight [39].
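The kernel point convolution in Figure 6c can be illustrated numerically. Below is a minimal NumPy sketch of a single rigid KPConv evaluation at one centre point, using the linear correlation of [39]; the shapes and the influence radius `sigma` are illustrative, not the paper's settings.

```python
import numpy as np

def kpconv_point(center, neighbors, feats, kernel_pts, weights, sigma=0.5):
    """One rigid KPConv step (sketch, after [39]). Each kernel point k
    carries a weight matrix W_k; a neighbour's feature is routed to the
    kernel points by h = max(0, 1 - ||(y_i - x) - x_k|| / sigma)."""
    rel = neighbors - center                                        # (N, 3)
    d = np.linalg.norm(rel[:, None, :] - kernel_pts[None, :, :], axis=-1)  # (N, K)
    h = np.maximum(0.0, 1.0 - d / sigma)                            # correlations
    # sum over neighbours n and kernel points k: (N,K) x (N,Din) x (K,Din,Dout)
    return np.einsum('nk,nd,kde->e', h, feats, weights)
```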
Figure 7. Similarity scores for virtual points. (a) A 3D scene where the wall and human in Ω affected are reflected from Ω front . The similarity scores in (b,c) are obtained by using the previous method [25] and the proposed method, respectively.
Figure 8. Comparison of the glass region estimation results on TLS data with multiple small glass planes. (a) Panoramic images of the input TLS point clouds: red rectangles mark glass objects with curtains drawn behind; green circles indicate glass components with no virtual points. (b) Intensity map: the color from dark blue to bright yellow represents intensity values ranging from low to high. (c) Glass region estimation results based on the intensity-based method [24], where the detected windows are coded in yellow and other points in purple. (d) Glass component estimation based on the multi-echo count method [22]; the red rectangle highlights the sparse pattern. (e) Glass region estimation results by the proposed algorithm. (d,e) follow the same color-coding scheme as (c). Results are shown from top to bottom for the scenes “Shopping mall” and “Office building”.
Figure 9. Comparison of the glass region estimation results on TLS data with a dominant glass plane. The red circles highlight that our method achieves more complete glass extraction than the multi-echo count method. The first row shows the panoramic images of the input TLS point clouds. The second row shows the glass region estimation results based on the multi-return method [22]; the glass components in the red rectangles exhibit the sparse pattern. The third row shows the glass region estimation results of the proposed algorithm. The scenes, from left to right: (a) “International hall”, (b) “Botanical garden”, (c) “Terrace”, (d) “Engineering building”, (e) “Gymnasium”.
Figure 10. Overview of synthetically generated reflection artifacts in a real scene. Top: the original real scene. Middle: the real scene with a small number of reflection artifacts. Bottom: the real scene further augmented with synthetically generated reflection artifacts, including multiple arbitrarily placed glass planes. Glass and virtual points are colored yellow and red, respectively.
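Geometrically, a synthetic virtual point is the mirror image of a real point across a glass plane. A minimal sketch of that reflection (Householder form p' = p − 2((p − o)·n)n; function name illustrative):

```python
import numpy as np

def mirror_points(points, plane_point, plane_normal):
    """Reflect real points across a glass plane (point o, normal n) to
    synthesise virtual reflection-artifact points, as in Figure 10."""
    n = plane_normal / np.linalg.norm(plane_normal)
    d = (points - plane_point) @ n           # signed distance to the plane
    return points - 2.0 * d[:, None] * n     # mirrored copies
```

For example, reflecting across the plane z = 0 flips the sign of the z coordinate, placing the virtual copy "behind" the glass from the scanner's viewpoint.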
Figure 11. Comparison of the virtual point detection results on UNIST building dataset [25] and a multiple-glass dataset [24]. The glass planes are visualized in yellow, while the real and virtual points are colored in gray and red, respectively. The blue and green boxes provide magnified views of localized details regarding virtual point removal results at different interior locations. The blue circles highlight false positives, which are real-world points inside the building incorrectly classified as virtual points. (a) Input TLS data with glass planes in yellow. (b) Ground truth labeling. (c) Existing method results [25]. (d) Proposed method. (e) Refined TLS data. Scenes from top to bottom: “Architecture building”, “Botanical garden”, “Engineering building”, “Natural science building”, “Terrace”, “Office building”.
Figure 12. Results of the proposed method on the 3DRN dataset. The yellow planes represent the extracted glass surfaces, while the red points denote virtual points. (a) Input point clouds. (b) Ground truth of virtual points. (c) Glass plane extraction and virtual point detection. (d) Virtual point removal results.
Table 1. Parameters of surface growing segmentation.
| Parameter | Default Value | Description |
| --- | --- | --- |
| seed_min_number | 3 | Minimum number of points required to form a seed plane |
| seed_search_radius | 0.3 m | Neighborhood radius for seed detection |
| seed_max_distance | 0.2 m | Distance threshold for evaluating point-to-seed-plane consistency |
| segment_min_number | 20 | Minimum number of points required to form a valid segment |
| segment_search_radius | 0.5 m | Neighborhood radius for surface growing |
| segment_search_nearest_k | 20 | Maximum number of nearest neighbors considered within the search radius (for efficiency in dense regions) |
| recompute_min_distance | 0.2 m | Distance threshold triggering plane re-fitting |
| recompute_min_number | 1 | Minimum number of newly added points required to trigger plane re-estimation |
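The core growing loop behind these parameters can be sketched as follows. This is a greatly simplified illustration using a subset of the Table 1 parameters: seed detection, plane re-fitting (the `recompute_*` parameters), and the segment-size check are omitted, and `seed_max_distance` is reused as the growing threshold, so this is not the full algorithm.

```python
import numpy as np
from scipy.spatial import cKDTree

def surface_growing(points, seed_idx, normal, d0,
                    segment_search_radius=0.5,
                    segment_search_nearest_k=20,
                    seed_max_distance=0.2):
    """Grow a planar segment from seed indices: repeatedly add radius
    neighbours lying within seed_max_distance of the seed plane
    n·p = d0 (simplified sketch of Table 1's surface growing)."""
    tree = cKDTree(points)
    segment, frontier = set(seed_idx), list(seed_idx)
    while frontier:
        i = frontier.pop()
        neigh = tree.query_ball_point(points[i], segment_search_radius)
        for j in neigh[:segment_search_nearest_k]:   # crude cap on neighbours
            if j in segment:
                continue
            if abs(points[j] @ normal - d0) <= seed_max_distance:
                segment.add(j)
                frontier.append(j)
    return sorted(segment)
```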
Table 2. Quantitative performance comparison in terms of the overall F1 score evaluated on the UNIST dataset (a–e) and an additional building scene with multiple small glass panes (f). Specifically, (a)–(e) correspond to “Architecture building”, “Botanical garden”, “Engineering building”, “Natural science building”, and “Terrace”, while (f) corresponds to “Office building”.
| Method | (a) | (b) | (c) | (d) | (e) | (f) | Average |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [23] | 0.615 | 0.585 | 0.729 | 0.880 | 0.520 | 0.709 | 0.673 |
| [25] | 0.694 | 0.822 | 0.627 | 0.777 | 0.379 | - | 0.659 |
| [30] | 0.766 | 0.862 | 0.781 | 0.924 | 0.862 | - | 0.839 |
| Proposed | 0.822 | 0.846 | 0.923 | 0.976 | 0.875 | 0.811 | 0.876 |
“-”: indicates that the corresponding method is not applicable to the office building scene or cannot be faithfully reproduced under the original method assumptions.
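Assuming the standard point-wise definition (the table does not spell it out), the F1 score over predicted versus ground-truth virtual-point labels is the harmonic mean of precision and recall:

```python
def f1_score(pred, truth):
    """Point-wise F1 over boolean virtual-point labels:
    F1 = 2PR / (P + R), with P = TP/(TP+FP), R = TP/(TP+FN)."""
    tp = sum(p and t for p, t in zip(pred, truth))        # virtual, flagged
    fp = sum(p and not t for p, t in zip(pred, truth))    # real, flagged
    fn = sum(t and not p for p, t in zip(pred, truth))    # virtual, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```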
Table 3. Comparison of different methods on the 3DRN dataset. From top to bottom: Scan 04, Scan 05, Scan 10, and Scan 11.
| Scene | Scenario | Method | ODR (%) | IDR (%) | Accuracy (%) | SNR (dB) |
| --- | --- | --- | --- | --- | --- | --- |
| Scan 04 | outdoor | [25] | 75.12 | 99.28 | 91.42 | 8.96 |
| Scan 04 | outdoor | [26] | 87.30 | 98.68 | 95.00 | 11.28 |
| Scan 04 | outdoor | Proposed | 89.72 | 98.78 | 95.83 | 12.09 |
| Scan 05 | outdoor | [25] | 50.24 | 97.49 | 88.32 | 8.39 |
| Scan 05 | outdoor | [26] | 88.53 | 97.48 | 95.74 | 12.77 |
| Scan 05 | outdoor | Proposed | 90.52 | 97.55 | 96.18 | 13.25 |
| Scan 10 | indoor | [25] | 47.81 | 74.95 | 69.00 | 4.01 |
| Scan 10 | indoor | [26] | 41.83 | 84.31 | 75.00 | 4.94 |
| Scan 10 | indoor | Proposed | 82.93 | 87.78 | 86.71 | 7.69 |
| Scan 11 | indoor | [25] | 97.43 | 78.09 | 79.59 | 6.55 |
| Scan 11 | indoor | [26] | 73.20 | 94.56 | 92.90 | 11.14 |
| Scan 11 | indoor | Proposed | 77.13 | 94.86 | 93.48 | 11.51 |
| Average | – | [25] | 67.65 | 87.45 | 82.08 | 6.98 |
| Average | – | [26] | 72.71 | 93.75 | 89.66 | 10.03 |
| Average | – | Proposed | 85.07 | 94.74 | 93.05 | 11.14 |
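Under the common convention that virtual points are "outliers" and real points "inliers", the rate columns of Table 3 can be sketched as below. The definitions here are an assumption from that convention, not quoted from the paper, and the SNR column (which follows the 3DRN evaluation protocol) is deliberately omitted rather than guessed.

```python
def detection_metrics(pred_virtual, is_virtual):
    """Sketch of the Table 3 rates (assumed definitions):
    ODR: share of virtual (outlier) points correctly flagged.
    IDR: share of real (inlier) points correctly kept.
    Accuracy: share of all points labelled correctly."""
    tp = sum(p and v for p, v in zip(pred_virtual, is_virtual))          # outliers found
    tn = sum(not p and not v for p, v in zip(pred_virtual, is_virtual))  # inliers kept
    n_out = sum(is_virtual)
    n_in = len(is_virtual) - n_out
    odr = 100.0 * tp / n_out if n_out else 0.0
    idr = 100.0 * tn / n_in if n_in else 0.0
    acc = 100.0 * (tp + tn) / len(is_virtual)
    return odr, idr, acc
```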
Table 4. Processing performance of the proposed algorithm evaluated on the UNIST dataset (scenes a–e), office building (f), and the 3DRN dataset.
| Model | Number of Points | Glass Region Estimation (s) | Virtual Point Detection (s) | Total (s) |
| --- | --- | --- | --- | --- |
| (a) | 5,562,972 | 66.72 | 1.26 | 67.98 |
| (b) | 6,140,383 | 46.31 | 0.78 | 47.09 |
| (c) | 9,720,671 | 88.90 | 1.84 | 90.74 |
| (d) | 4,913,710 | 32.25 | 0.71 | 32.96 |
| (e) | 5,000,902 | 79.83 | 1.52 | 81.35 |
| (f) | 8,316,360 | 59.64 | 0.26 | 59.90 |
| Scan 04 | 5,038,858 | 32.11 | 1.88 | 33.99 |
| Scan 05 | 3,342,466 | 26.22 | 0.79 | 27.01 |
| Scan 10 | 1,880,837 | 10.31 | 0.58 | 10.89 |
| Scan 11 | 1,942,031 | 12.42 | 0.30 | 12.72 |
Table 5. Ablation study of the proposed virtual point detection modules. The methods are defined by combinations of reflection symmetry (RS) and geometric similarity (GS) with a quadruplet loss. Performance is evaluated on the UNIST dataset scenes (a)–(e) and the office building (f) using the F1 score.
| Methods | Modules | Loss Terms | (a) | (b) | (c) | (d) | (e) | (f) | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Proposed | RS + GS | {D_anc,pos, D_anc,neg1, D_anc,neg2} | 0.822 | 0.846 | 0.923 | 0.976 | 0.875 | 0.811 | 0.876 |
| A | GS | {D_anc,pos, D_anc,neg1, D_anc,neg2} | 0.782 | 0.791 | 0.857 | 0.980 | 0.868 | 0.851 | 0.855 |
| B | RS + GS | {D_anc,pos} | 0.722 | 0.774 | 0.820 | 0.909 | 0.826 | 0.758 | 0.805 |
| C | RS + GS | {D_anc,pos, D_anc,neg1} | 0.802 | 0.842 | 0.888 | 0.959 | 0.874 | 0.782 | 0.858 |
| D | RS + GS | {D_anc,pos, D_anc,neg2} | 0.767 | 0.805 | 0.908 | 0.967 | 0.848 | 0.762 | 0.843 |
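A loss built directly from the three distance terms listed above can be sketched as a margin loss that pulls the anchor towards its positive and pushes it past both negatives. This is a hedged sketch: the margins m1 and m2 and the exact hinge form are illustrative assumptions, not the paper's stated formulation.

```python
import numpy as np

def quadruplet_loss(f_anc, f_pos, f_neg1, f_neg2, m1=0.5, m2=0.5):
    """Sketch of a quadruplet-style loss over the Table 5 terms:
    L = max(0, D_anc,pos - D_anc,neg1 + m1)
      + max(0, D_anc,pos - D_anc,neg2 + m2),
    where D are Euclidean distances between embeddings."""
    d_pos = np.linalg.norm(f_anc - f_pos)
    d_neg1 = np.linalg.norm(f_anc - f_neg1)    # spatial-nearest negative
    d_neg2 = np.linalg.norm(f_anc - f_neg2)    # feature-nearest negative
    return max(0.0, d_pos - d_neg1 + m1) + max(0.0, d_pos - d_neg2 + m2)
```

Rows B–D of the table correspond to dropping one or both negative terms, which is what the ablation varies.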

Share and Cite

MDPI and ACS Style

Shao, W.; Zhang, Y.; Xue, Y.; Ji, T.; Lao, Y. GRASS: Glass Reflection Artifact Suppression Strategy via Virtual Point Removal in LiDAR Point Clouds. Remote Sens. 2026, 18, 332. https://doi.org/10.3390/rs18020332
