Article

PcBD: A Novel Point Cloud Processing Flow for Boundary Detecting and De-Noising

1 State Key Laboratory of Modern Optical Instrumentation, Zhejiang University, No. 38 Zheda Road, Hangzhou 310027, China
2 Jiaxing Research Institute, Zhejiang University, Jiaxing 314031, China
3 Shanghai Institute for Advanced Study of Zhejiang University, Shanghai 201203, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(13), 7073; https://doi.org/10.3390/app15137073
Submission received: 1 May 2025 / Revised: 4 June 2025 / Accepted: 17 June 2025 / Published: 23 June 2025
(This article belongs to the Section Optics and Lasers)

Featured Application

The PcBD model presented in this article can serve as a preprocessing method for wind tunnel tests and other depth-camera-based target detection experiments, and researchers can flexibly combine one or more of its modules. In addition, researchers can directly use the proposed Bound57 dataset as a verification baseline for point cloud preprocessing, or generate datasets dedicated to special scenarios based on the construction logic and provided code of Bound57 to meet different needs.

Abstract

In target detection tasks that rely on depth sensors, the point cloud preprocessing stage is crucial because it directly determines the quality of the reconstructed three-dimensional model of the target. However, few methods can be combined with common preprocessing steps to quickly process ToF camera output; in practice, the usual approach is to chain several separate preprocessing methods and tune their parameters individually. We propose PcBD, a method that integrates outlier removal, boundary detection, and boundary smoothing. PcBD does not limit the number of input points and can remove outliers and predict a smoothed projection boundary in a single pass while keeping the total number of points unchanged. We also introduce Bound57, a benchmark dataset that contains point clouds with synthetic noise, outliers, and projected boundary labels. Experimental results show that PcBD performs significantly better than state-of-the-art methods on various de-noising and boundary detection tasks.

1. Introduction

Three-dimensional (3D) target detection plays a crucial role in various scientific and engineering fields. With the development of 3D laser ranging technology, Time-of-Flight (ToF) cameras now offer higher ranging accuracy and greater resolution, allowing users to collect three-dimensional point cloud data suitable for high-precision target detection tasks [1]. For researchers, it is necessary to measure geometric characteristics such as length, thickness, and projection boundaries in order to analyze the physical movements and deformation of a target, encompassing rigid cases such as the projected area of structures along the flow direction [2,3] and deformable ones such as parachute deployments [4,5] or even the accumulation of an ice cluster.
To calculate the geometric characteristics of a target, it is necessary to obtain its projection boundary in certain directions. For a parachute, as an example, the projection boundary is closely related to the wind resistance and is the key to judging whether the structure is fully deployed. However, raw point clouds from a ToF camera contain noise points, incomplete areas, and ranging errors caused by the inherent limitations of the sensors or by matching ambiguity in image reconstruction. These defects seriously affect downstream tasks, especially measurements. Therefore, developing a coherent preprocessing flow for ToF cameras is key to integrating real-time 3D measurement and data output. A traditional engineering approach is to apply multiple existing methods to deal with these defects one by one, but this requires tuning and optimizing parameters for each model separately and occupies a large amount of computing resources. On the other hand, recent research has focused only on 3D de-noising or completion of rigid targets, followed by a very simple traditional method to extract the projection boundaries. These methods may not be applicable in practice because existing real scan datasets lack labels for outlier points, while computer-simulated de-noising datasets are mostly generated from complete 3D models and thus cannot simulate the original output of ToF cameras well. Since it is difficult to establish completion standards for deformable targets, rigid point cloud completion algorithms are also of little use. In addition, traditional 3D object projection boundary extraction methods rely on parameter tuning and on density consistency of the target point cloud.
To solve the above problems, we propose a deep-learning-based point cloud processing flow called PcBD. The PcBD network contains three parts, each focusing on one processing task; the three-dimensional features of the input point cloud are retained after each block and further enhanced in the next block to gain stronger two-dimensional and three-dimensional representation capabilities. Figure 1 shows the processing flow of PcBD on a raw target point cloud from a ToF camera. In order to accurately recognize and remove outliers, we apply the SHOT local reference frame (LRF) construction [6] in the Feature-Extraction block and introduce an improved Point Transformer [7] to simultaneously process three-dimensional coordinates and local reference frames. In the Boundary-Detecting and Smoothing blocks, we propose a Cross Transformer structure that lets the three-dimensional features and the newly generated two-dimensional features attend to their positions in each other's geometric shapes, further combining 3D structures and 2D shapes to achieve accurate detection and smoothing of the projection boundary. We design a suitable mechanism in the network structure to ensure that the total number of points remains unchanged while removing outliers and outputting the projection boundary, which is necessary for a point cloud processing flow.
We also introduce a new benchmark Bound57 for point cloud processing flow, which simulates the raw output of ToF cameras from 3D objects through carefully designed methods. Researchers can simulate ToF raw data and ground truths for various tasks by themselves through the provided approach, as shown in Figure 2. Detailed generating methods are discussed in Section 4.1. Quantitative tests show that the proposed PcBD performs better as a processing flow for raw point cloud data and is expected to be applied to practical tasks.
Our main contributions can be summarized as follows:
  • We propose a novel point cloud processing flow PcBD for outlier removal, boundary detection, and smoothing of raw point cloud data. Compared with traditional engineering application methods, PcBD can fulfill multiple tasks in one network using only 3D coordinates, significantly improving the real-time performance of 3D measurement.
  • We have improved various traditional 3D point cloud processing modules and combined them with local reference frame calculation and novel convolutional layers to achieve point cloud 3D feature extraction with strong expressiveness. In the multi-task processing flow of PcBD, the extracted 3D features are continuously strengthened to guide different tasks.
  • We propose a method to combine 3D and 2D features for projection boundary prediction and smoothing with a novel cross transformer structure to simultaneously search for the locations of target features in 2D and 3D point clouds, allowing the network to extract 2D projection boundaries in combination with 3D features, and de-noise the 3D coordinates of the projection boundaries in combination with 2D features.
  • We propose a new benchmark, Bound57, for multiple point cloud processing tasks, including outlier removal, de-noising, boundary detection, and up-sampling. Our proposed method can be used to generate new 3D point cloud data and to test point cloud processing flows for arbitrary functions.
The source code for the proposed method and instructions for obtaining the dataset are available at the following link: https://github.com/desperadossy/PcBD, accessed on 1 June 2025.

2. Related Work

Point cloud de-noising. Real scans from a ToF camera can contain two kinds of noise: systematic ranging errors, and spatial outlier points caused by tiny ash or rain drops in the air, or by the target's reflectivity and reflection angle, especially in the presence of mirror-like surfaces. Early methods attempted to filter out outlier points with lower neighbor density [8,9]; these methods usually rely on prior data and can only remove the outlier noisy points. Other researchers have applied bilateral filters [10,11] or approximated and projected points onto an underlying surface [12] to deal with the underlying noise of each point, but flexible objects often cannot be represented as a set of regular surfaces, and bilateral filters tend to wipe out low-density edge areas. Some graph-based methods have introduced graph filters on graph-represented point clouds [13,14], but the 3D representation accuracy of these methods is seriously insufficient.
It is becoming clear that a more proper way to process 3D point clouds is to operate on each single point, and methods have been proposed to deal with 3D coordinates directly, such as PointNet++ [15] and other point cloud feature encoders based on convolutional neural network layers [16,17,18]. Some deep-learning-based methods focus on outlier removal using spatial and temporal information [19,20,21]; these have proved efficient for weather-caused noise points, but their effect on ranging noise is limited. PCNet [22] first introduced the idea of predicting a displacement for each point, and other methods followed to optimize the network's reasoning about the underlying surface [23,24,25]. The method proposed in [26] employed score matching [27] to estimate the noise-convolved distribution of a noisy point cloud and de-noise it by gradient ascent, and has been shown to apply to arbitrary shapes. Vogel et al. introduced P2P-Bridge, which treats de-noising as finding an optimal transport (Schrödinger bridge) between noisy and clean point clouds, and achieved significant improvement over prior methods on existing datasets [28]. These methods do not distinguish between outliers and ranging errors or discuss how to deal with them separately. Some recent studies have implemented outlier filtering and point cloud de-noising simultaneously in a single network [29], but their training data are only generated from complete 3D models by down-sampling and adding simple Gaussian noise.
None of the above-mentioned methods forms a stable processing flow for depth camera output data. PointCVaR [30] was proposed to de-noise input point clouds based on the results of classification networks; however, it only relies on the prediction output of other systems and does not utilize the effective three-dimensional features extracted by the preceding system.
Point cloud boundary detecting. Methods for determining the boundary, or "hull", of a projected point cloud can be divided into convex hulls and concave hulls, named after the shape of the constructed hull. Chand et al. [31] proposed a gift-wrapping algorithm to generate a convex hull for point clouds in arbitrary-dimensional space. Graham's algorithm [32], proposed in 1972, judges whether a point is a vertex of the convex hull by calculating the cross product of vectors between three points. Nevertheless, a convex hull is not perfectly suitable for a flexible target because the boundary of the projected cloud is unlikely to be a convex polygon. The alpha-shape method [33] was proposed to solve such problems, as it constructs a concave hull by rolling a circle with a fixed radius around the cloud to detect edge points. Several other approaches have been introduced to improve parameter adjustment [34,35]. However, in real-world scanning, the inherent noise of point cloud data means the algorithm may need to adopt a different hull-wrapping strategy in different areas. Due to the lack of attention to the original 3D structure, existing methods cannot determine which type of hull to build in which area. Subsequent studies have mostly focused on detecting 3D edges rather than 2D projected boundaries. A recent method, the circle-rolling method (CRM) [36], applies a 2D rolling circle to 3D point cloud slices and then maps the detected 2D boundaries back to 3D space to obtain the 3D boundary of a point cloud. Deep learning models for projection boundary detection are relatively rare. As a landmark method, BoundED [37] used statistical features to detect the 3D boundary of an object. However, the 2D projection boundary is not necessarily part of the 3D edges, so a dedicated method needs to be developed.
Compared with the above-mentioned point cloud de-noising and projection boundary detecting methods, PcBD treats outliers and ranging errors as two separate problems, retaining the idea of clearing out outliers and the idea of predicting underlying positions to eliminate the ranging errors of boundary points. The network directly predicts projection boundary points based on the projection shape and the original three-dimensional structures, without relying on specific construction shapes or local densities. In addition, PcBD is trained on a ToF simulation dataset, Bound57, so it can be better applied in practical situations. Combining the above characteristics, PcBD forms a continuous and effective point cloud processing flow that fulfills multiple tasks.

3. Methods

Figure 1 illustrates the structure of PcBD, which includes a Feature-Extracting block with an Outlier-Removal block, a Boundary-Detecting block, and a Smoothing block. We firstly introduce the overall framework, and detailed structures will be discussed as follows.

3.1. Overview

The overall structure of PcBD contains three main blocks: the Feature-Extracting block with an Outlier-Removal block to remove the outlier points, the Boundary-Detecting block to predict the projection boundary, and the Smoothing block to smooth the predicted boundary curves. Given an initial input point cloud $P = \{p_i\}_{i=0}^{N}$, where $N$ denotes the number of points, the Feature-Extracting block calculates the local reference frame (LRF) of every point to enhance the model's perception of geometric structures. The local reference frames contain three orthogonal vectors serving as the 3-dimensional axes of the local coordinate system for each point. We optimize point cloud processing units such as PointNet++ [15] and PAConv [16] to satisfy our need to understand geometric connections and to fit the point cloud with local reference frames. The output point-wise feature $F = \{f_i\}_{i=0}^{N}$ is of size $N \times C$. The Outlier-Removal process in the Feature-Extracting block takes the extracted feature $F$ and outputs outlier labels $L_O$ of size $N \times 1$ to mark the spatial outlier points with scores. To keep the data contiguous and preserve the tensor shape, we replace the predicted outlier points with the first non-outlier point and apply the same operation to the point-wise feature $F$ to generate an outlier-removed feature $F_C$, using the threshold $\theta_O$:
$$P_C = \{\, p_i \,\}_{L_O^i < \theta_O} \cup \{\, p_0 \,\}^{N - |\{ L_O < \theta_O \}|}, \qquad F_C = \{\, f_i \mid p_i \in P_C \,\} \qquad (1)$$
where $\theta_O$ is the threshold of the outlier label, and $|\cdot|$ represents the cardinality of a set. In this way, the number of points remains the same, which allows further processing.
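To make the shape-preserving replacement concrete, the following is a minimal PyTorch sketch of the score-extractor idea behind Equation (1); the tensor shapes and the function name are ours for illustration and do not come from the released code.

```python
import torch

def replace_outliers(points, scores, feats, threshold=0.7):
    """Replace predicted outliers with the first non-outlier point/feature.

    points: (B, N, 3) cloud, scores: (B, N) outlier scores in [0, 1],
    feats: (B, N, C) point-wise features. The point count N is preserved,
    so downstream blocks always receive a fixed-size tensor.
    """
    keep = scores < threshold                              # True for non-outliers
    first_idx = torch.argmax(keep.int(), dim=1)            # index of the first non-outlier
    batch = torch.arange(points.shape[0], device=points.device)
    first_pt = points[batch, first_idx].unsqueeze(1)       # (B, 1, 3)
    first_ft = feats[batch, first_idx].unsqueeze(1)        # (B, 1, C)
    mask = keep.unsqueeze(-1)
    clean_pts = torch.where(mask, points, first_pt)        # outliers -> first valid point
    clean_fts = torch.where(mask, feats, first_ft)         # same replacement on features
    return clean_pts, clean_fts
```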
We then project the outlier-removed point cloud $P_C$ along the Z axis to size $N \times 2$ before the Boundary-Detecting block. To detect the projection shape, we send the projected 2D point cloud through a 2D-Detection block to extract 2D shape information $F_{2D}$. To recover the missing 3D structural data, we employ a Transformer to perform cross-attention between the 3D and 2D features $F_C$ and $F_{2D}$. The output feature $F_B$ contains both 2D geometry and 3D structural data. The boundary labels $L_B$ of size $N \times 1$ are predicted through an MLP layer to mark projection boundary points with scores. In the same way as before, the projection-bound point cloud is generated by replacing non-boundary points with the first boundary point:
$$P_B = \{\, p_j \,\}_{L_B^j > \theta_B} \cup \{\, p_0 \,\}^{N - \mathrm{len}(\{ L_B > \theta_B \})} \qquad (2)$$
The Smoothing block takes the predicted boundary points $P_B$ together with the point-wise 3D and 2D features $F_C$ and $F_B$, and detects the edges of the predicted boundary by kNN sampling and grouping between 3D and 2D features. The block finally outputs a three-dimensional replacement vector $V_B$ of size $N \times 3$ to cancel out the noise and compensate for the inherent limitation of ToF cameras (see Section 3.4):
$$B = P_B + V_B \qquad (3)$$
To summarize the entire processing flow of our PcBD framework, the overall Algorithm 1 is provided. Through our design, the proposed PcBD can be directly utilized in real-world experiments to process raw scan data and output an outlier-removed target with a smoothed projection boundary $B$, which fits a wide range of experimental needs. Detailed structures are discussed in Section 3.2, Section 3.3 and Section 3.4.
Algorithm 1 Overall Pipeline of PcBD Processing Flow
Input: Raw point cloud P = {p_i}_{i=1}^N ∈ R^{N×3}
Output: Smoothed boundary point cloud B

/* Step 1: Feature Extraction and Outlier Removal */
 1: for all p ∈ P do
 2:     Find neighbors {p_i} within radius R
 3:     Calculate LRF V = {x, y, z} and disambiguate directions using Equations (4)–(7)
 4: end for
 5: Concatenate P with LRF → R^{N×12}
 6: Extract features F via SA + PT + FP (see Figure 3)
 7: Predict outlier labels L_O ∈ R^{N×1}
 8: Remove outliers using Equation (1): P_C ← replace outliers with the first non-outlier

/* Step 2: Boundary Detection */
 9: Project P_C onto the XY-plane: P_2D ∈ R^{N×2}
10: Extract 2D feature F_2D using Equation (11)
11: Perform cross-attention: F_B ← CrossTransformer(F, F_2D)
12: Predict boundary labels L_B ∈ R^{N×1}
13: Extract boundary points P_B using Equation (2)

/* Step 3: Smoothing */
14: Project P_B to 2D: P_B^2D
15: Find the kNN of P_B^2D in P_2D; gather F, F_2D, P_C
16: Fuse features via PAConv + reshape + PT
17: Predict displacement vector V_B ∈ R^{N×3} via MLP
18: Compute the final smoothed boundary: B = P_B + V_B (Equation (3))
19: return B

3.2. Feature-Extracting & Outlier Removal

Since our final goal is to predict the two-dimensional boundary of the projected target point cloud, the Feature-Extracting and Outlier-Removal block only cleans the spatial outlier points to save hardware resources; we deal with systematic error noise in the Smoothing block, which is introduced in Section 3.4.
Figure 3. The Feature-Extracting Block, with detailed structures of the advanced Set-Abstraction and Feature Propagation modules. Note that k is the number of kNN-sampling neighbors; C, C1, and C2 are numbers of channels; N is the number of input points of the SA module; and N1 is the number of output points. The Feature Propagation module takes both the input and output features of an SA module as its input.
Figure 3 illustrates the complete structure of the Feature-Extracting block as well as some detailed modules within it. The input includes a 3D point cloud PC and the corresponding local reference frame vectors LRF, forming a (3 + 9)-dimensional geometric descriptor for each point. The SA block follows the idea of PointNet++ [15], but extends it by incorporating LRFs and relative angles into the feature input. The output from the SA block is passed into an advanced Point Transformer layer PT [7], in which we integrate the normal vector encoding into the positional encoding of the attention query to further refine the neighborhood structure representation. The down-sampled features are propagated back to the original point cloud resolution through two Feature Propagation (FP) blocks. Finally, the extracted 3D feature $F$ is processed by a Multi-Layer Perceptron (MLP) with one output channel followed by a sigmoid activation function to predict the outlier labels $L_O$. The detailed design principles and structural details are introduced as follows:
It is not enough to calculate the relevant geometric information based only on the three-dimensional coordinates of a point cloud. Noisy point clouds may contain a lot of invalid information, so it is necessary to extract more stable embedded connections from the three-dimensional coordinates. Deep-learning-based methods can not only process three-dimensional point cloud data, but many also claim to accept inputs with more dimensions, such as six-dimensional point clouds that include normal vectors. However, in practical applications, most researchers want to directly input the three-dimensional point cloud acquired by the sensor into the model and obtain results without additional operations. We therefore want the normal vectors to be computed internally by the proposed model, thereby improving its robustness under high noise without extra processing. Since its introduction, the SHOT descriptor [6] has become one of the most accurate point cloud descriptors and has greatly improved the accuracy of point cloud registration. Its main contributions are the construction of local reference frames and the way it splits the neighborhood of a point and builds a histogram within each split area. In our proposed network, the input point cloud $P$ of size $N \times 3$ is used to build the local reference frame for each point [6]:
$$M = \frac{1}{\sum_{i : d_i \le R} (R - d_i)} \sum_{i : d_i \le R} (R - d_i)\,(p_i - p)(p_i - p)^{T} = U \Sigma V^{T} \qquad (4)$$
where $V = \{x, y, z\}$.
Then, the algorithm defines M ( k ) as the subset of points within the support whose distance from the feature point is among the k-closest to the median distance:
$$M(k) \doteq \{\, i : |m - i| \le k \,\}, \qquad m = \arg\operatorname{median}_{j}\, d_j \qquad (5)$$
And the spatial splitting sets are defined as:
$$S_x^{+} \doteq \{\, i : d_i \le R \wedge (p_i - p)\cdot x^{+} \ge 0 \,\}, \quad S_x^{-} \doteq \{\, i : d_i \le R \wedge (p_i - p)\cdot x^{-} > 0 \,\}, \quad \tilde{S}_x^{+} \doteq \{\, i : i \in M(k) \wedge (p_i - p)\cdot x^{+} \ge 0 \,\}, \quad \tilde{S}_x^{-} \doteq \{\, i : i \in M(k) \wedge (p_i - p)\cdot x^{-} > 0 \,\} \qquad (6)$$
And the LRF is defined as:
$$x = \begin{cases} x^{+}, & |S_x^{+}| > |S_x^{-}| \\ x^{-}, & |S_x^{+}| < |S_x^{-}| \\ x^{+}, & |S_x^{+}| = |S_x^{-}| \wedge |\tilde{S}_x^{+}| > |\tilde{S}_x^{-}| \\ x^{-}, & |S_x^{+}| = |S_x^{-}| \wedge |\tilde{S}_x^{+}| < |\tilde{S}_x^{-}| \end{cases} \qquad z = z^{+}/z^{-} \ (\text{determined analogously}), \qquad y = x \times z \qquad (7)$$
In the equations, $R$ denotes the neighborhood radius used to construct LRFs, $d_i$ is the Euclidean distance, $p$ is the query point, and $p_i$ represents its neighboring points such that $d_i = \|p_i - p\| \le R$. Singular value decomposition (SVD) of $M$ yields $M = U \Sigma V^{T}$, where the columns of $U$ correspond to the principal axes of the local neighborhood. The first column $x^{+}$ is taken as the initial x-axis direction of the LRF. To eliminate directional ambiguity, the sets $S_x^{+}$ and $S_x^{-}$, together with the refined subsets $\tilde{S}_x^{+}$ and $\tilde{S}_x^{-}$ (restricted to the median-based neighborhood $M(k)$), are constructed to decide the final sign of $x$. The final LRF is defined by $\{x, y, z\}$, as in Equations (5)–(7).
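For readers who want to reproduce the LRF construction, the following NumPy sketch implements the weighted covariance of Equation (4) and a simplified version of the sign disambiguation in Equations (5)–(7); the median-based refinement subsets are omitted for brevity, and the function name is illustrative rather than taken from the released code.

```python
import numpy as np

def local_reference_frame(p, neighbors, R):
    """Weighted-covariance LRF (Eq. 4) with a simplified sign disambiguation.

    p: (3,) query point; neighbors: (K, 3) points with ||p_i - p|| <= R.
    Returns three orthogonal axes (x, y, z); the median-based refinement
    of Equations (5)-(7) is omitted here.
    """
    diff = neighbors - p
    d = np.linalg.norm(diff, axis=1)
    w = R - d                                         # closer points weigh more
    M = (w[:, None, None] * (diff[:, :, None] * diff[:, None, :])).sum(0) / w.sum()
    U, _, _ = np.linalg.svd(M)
    x, z = U[:, 0], U[:, 2]                           # principal / least-variance axes
    # point each axis towards the majority of neighbors (sign disambiguation)
    if (diff @ x >= 0).sum() < (diff @ x < 0).sum():
        x = -x
    if (diff @ z >= 0).sum() < (diff @ z < 0).sum():
        z = -z
    y = np.cross(x, z)                                # y = x x z, as in Equation (7)
    return x, y, z
```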
We make full use of the accurate point cloud neighborhood coordinate system established by the SHOT descriptor: we use the calculated Z-axis as the normal vector to extend the point cloud to size $N \times 6$, and send the calculated X and Y axes together into the Set Abstraction block proposed in [15]. The original SHOT descriptor uses a complex and extremely time-consuming way to describe the neighborhood of points. Since we are using a neural network model, we hand this part of the work to the network: the point clouds and the calculated 3-axis vectors after kNN-sampling are fed into the Feature-Extraction module for learning. The original Set Abstraction block in PointNet++ first kNN-samples the input point cloud $P$ with the point-wise feature; the sampled point feature is then sent into a convolutional layer followed by a max-pooling layer. In the proposed PcBD, the Set Abstraction block takes the calculated 3-dimensional axes as additional input and applies the grouping operation with the kNN-sampling index to the input feature, as shown in Figure 3. We also calculate the angles between the neighborhood normal vectors and the local reference frame of the center point, allowing the network to better learn the distribution of neighborhood surfaces. To handle such comprehensive input, we make use of a recently proposed position-adaptive convolutional layer, the PAConv layer [16], to process the combination of 3D coordinates, local reference frames, and relative angles. The PAConv block was proposed to replace regular convolutional layers, as it learns 3D structures with a dynamic kernel called ScoreNet and has been proved to be efficient and easy to plug into existing CNN-based models. We make several adjustments to the proposed blocks to suit our input shapes, and we also weight the input feature in the PAConv block with the distances of the kNN neighbors to enhance the network's ability to grasp the underlying geometric structure.
After applying a max-pooling layer on the output feature, the final output of the Set Abstraction block contains the down-sampled 3D point cloud, the down-sampled local reference 3-dimensional axes, and the processed feature. For the distance weighting, we first normalize the kNN distances to the range (0, 1], add 1, and then divide the feature by this value. This ensures that the network cares more about neighbors closer to the center and less about those farther away from it. We utilize two Set Abstraction blocks in the Feature Extractor, each followed by a Point Transformer [7] that attends to connections between the down-sampled input point cloud and the processed feature, as in several other state-of-the-art works [38]. The Point Transformer was proposed to focus more on local and global structures, compensating for the potential loss of fine-grained details in Set Abstraction blocks. Through the self-attention mechanism, the Point Transformer uses point cloud coordinates as a query reference, making the point-wise feature extraction more reasonable [7]:
$$y_i = \sum_{x_j \in X(i)} \rho\big( \gamma( \varphi(x_i) - \psi(x_j) + \delta ) \big) \odot \big( \alpha(x_j) + \delta \big) \qquad (8)$$
where $\varphi$, $\alpha$, $\psi$, and $\gamma$ are convolutional layers, and $\delta$ is the position-encoding feature, often obtained by passing the relative 3D coordinates through a convolutional layer. $\odot$ is the Hadamard product, and $\rho$ represents the SoftMax function. Given an input point-wise feature $X = \{x_i\}_i$, $X(i)$ represents a local neighborhood of $x_i$, usually generated by applying the kNN-sampling index to the feature. In our case, we further let the calculated normal vectors guide the query process in self-attention:
$$y_i = \sum_{x_j \in X(i)} \rho\big( \gamma( \varphi(x_i) - \psi(x_j) + \phi(\delta + \delta_n) ) \big) \odot \big( \alpha(x_j) + \phi(\delta + \delta_n) \big) \qquad (9)$$
The position-encoding feature $\delta$ is usually obtained by calculating the relative coordinates from the center point after kNN-sampling and passing them through a simple MLP layer. We process the calculated normal vectors in the same way to obtain the normal-encoding feature $\delta_n$. To preserve the original steady structure of the Point Transformer, we concatenate the features $\delta$ and $\delta_n$ and integrate the information with another MLP layer $\phi$.
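A simplified PyTorch sketch of the modified positional encoding in Equation (9) is given below: relative coordinates and relative normals are embedded by two small MLPs, concatenated, and fused by an extra MLP ϕ before being injected into the attention branches. Module sizes and the class name are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class NormalAwarePositionEncoding(nn.Module):
    """Fuse the coordinate encoding (delta) with the normal encoding (delta_n)."""

    def __init__(self, dim):
        super().__init__()
        self.pos_mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.nrm_mlp = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.fuse = nn.Linear(2 * dim, dim)            # the extra MLP "phi"

    def forward(self, rel_xyz, rel_normal):
        # rel_xyz, rel_normal: (B, N, k, 3) relative coords / normals of kNN neighbors
        delta = self.pos_mlp(rel_xyz)
        delta_n = self.nrm_mlp(rel_normal)
        return self.fuse(torch.cat([delta, delta_n], dim=-1))   # encoding used in Eq. (9)
```

The fused encoding then enters both the attention-weight term and the value branch, as written in Equation (9).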
To predict the labels of outlier points, the network needs a point-wise feature $F$ of the input point cloud $P$ that represents the local structural distribution of each point. We further follow the work of PointNet++ and make use of its Feature Propagation blocks, which were proposed to propagate features from subsampled points back to the original points. We extend the original formula [15] to:
$$f^{(j)}(x) = \frac{\sum_{i=1}^{3} \omega_i(x)\, f_i^{(j)}}{\sum_{i=1}^{3} \omega_i(x)}, \qquad \text{where } \omega_i(x) = \frac{1}{d(x, x_i)^2} \cdot \frac{1}{d(n, n_i)^2}, \quad j = 1, \ldots, C \qquad (10)$$
Here, $x$ denotes a query point in the original point cloud, and $x_i$ are its three nearest neighbors from the subsampled feature set. The feature $f^{(j)}(x)$ is the interpolated value of the j-th feature channel at point $x$, and $f_i^{(j)}$ is the j-th channel of the i-th neighbor point. The interpolation weight $\omega_i(x)$ combines both spatial proximity and normal consistency, where $d(x, x_i)$ is the Euclidean distance and $d(n, n_i)$ denotes the cosine distance between normal vectors. This design allows the calculated normal vectors to contribute to the weight $\omega_i(x)$ during propagation, preserving more neighborhood information in the output point-wise feature. After two sets of Feature Propagation blocks, the output feature $F$ has the same point size as the input cloud $P$, and the outlier-point labels $L_O$ are obtained by applying an MLP layer with one output channel followed by a sigmoid layer to map the labels between 0 and 1.
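A short PyTorch sketch of this normal-weighted interpolation (Equation (10)) is given below, assuming the full-resolution points and the subsampled set with per-point normals are available; the 3-NN search via torch.cdist and the function name are illustrative.

```python
import torch

def normal_weighted_interpolation(xyz, normals, xyz_sub, normals_sub, feats_sub, eps=1e-8):
    """Propagate features from a subsampled set back to the full cloud (Eq. 10).

    xyz: (N, 3), normals: (N, 3)                 -- full-resolution points
    xyz_sub: (M, 3), normals_sub: (M, 3), feats_sub: (M, C) -- subsampled set
    """
    dist = torch.cdist(xyz, xyz_sub)                       # (N, M) pairwise distances
    d3, idx = dist.topk(3, dim=1, largest=False)           # three nearest neighbors
    n_nei = normals_sub[idx]                               # (N, 3, 3) neighbor normals
    cos = (normals.unsqueeze(1) * n_nei).sum(-1).clamp(-1, 1)
    d_normal = 1.0 - cos                                   # cosine distance d(n, n_i)
    w = 1.0 / (d3.pow(2) + eps) * 1.0 / (d_normal.pow(2) + eps)
    w = w / w.sum(dim=1, keepdim=True)                     # normalized weights
    return (w.unsqueeze(-1) * feats_sub[idx]).sum(dim=1)   # (N, C) interpolated feature
```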
At last, we apply a score extractor to clean the marked outlier points. The score extractor simply replaces the predicted outlier points with non-outlier points based on the predicted outlier-point labels; in the proposed PcBD we set each outlier to the first non-outlier point. Note that we use appropriate functions in PyTorch [39] to maintain the continuity of the point cloud data for backpropagation, as shown in Equation (1). In this article, we set the threshold $\theta_O$ for outlier points to 0.7; a different threshold could be set in other applications during the fine-tuning stage. We apply the same operation to the point-wise feature $F$ to remove the influence of outlier features on subsequent operations, and then input the outlier-removed point cloud $P_C$ and the outlier-removed point-wise feature $F_C$ into the Boundary-Detecting block. The calculated local reference frames are no longer accurate after outlier removal, so we do not use them in subsequent operations.

3.3. Boundary-Detecting

Now that we have obtained the outlier-removed input point cloud $P_C$ with a point-wise feature $F_C$, our next goal is to find the geometric characteristics of $P_C$ on the 2D plane. We first remove the third dimension of $P_C$ to turn it into a two-dimensional point cloud $P_{2D}$ on the X–Y plane. Note that we implement the three-dimensional to two-dimensional projection with a proper PyTorch function that supports backpropagation, ensuring the continuity and differentiability of the data during training. The projected point cloud $P_{2D}$ is of size $N \times 2$. As with the 3D inputs, we treat kNN-sampling as an essential way to process point clouds in the lower dimension, so we apply kNN-sampling on the projected point cloud $P_{2D}$ to find neighbors for each point on the 2D plane, and the 2D kNN feature $F_{2D}$ is extracted by applying several layers (see the 2D-Detection block in Figure 4):
$$F_{2D} = T\Big( \psi\big( \mathrm{PAConv}\big( P_{2D},\, \varphi(P_{2D}) \cdot (1 + \mathrm{dist}) \big) \big) \Big) \qquad (11)$$
Here, the projected point cloud $P_{2D}$ is kNN-sampled. As in the Feature Extractor, we utilize the kNN neighborhood distances as a weight to distribute the network's attention over kNN neighbors at different distances from the center. After an MLP layer $\varphi$ and the PAConv block, the size of the output feature is $N \times D \times k$, where $D$ is the feature channel and $k$ is the kNN-sampling number. We reshape this feature to size $N \times (kD)$, making it suitable for an extra MLP layer $\psi$ whose kernel size equals the kNN-sampling number; this layer combines the distributed neighborhood features and eliminates the need for max-pooling. We utilize another Point Transformer block $T$ to process the combined kNN-neighborhood feature, just as in the Encoder; here we modify some layers in the block to adapt to the two-dimensional point cloud coordinate reference. The output feature $F_{2D}$ of size $N \times D$ now contains 2D region distribution information.
It is not enough to use only the point cloud feature on the two-dimensional plane. To solve the previously mentioned problem of determining the hull-wrapping method, it is also necessary to combine the previously extracted three-dimensional features of the point cloud. Here, we propose a Transformer structure (see the Cross Transformer in Figure 4) to perform cross-attention between 2D and 3D coordinates and features. As illustrated in Figure 4, we first use the 3D point-wise feature $F_C$ as the query vector and the 2D point-wise feature $F_{2D}$ as the key vector to establish cross-modal associations between the 3D structure and the 2D projection. The initial value vector is composed of a combination of 2D and 3D features encoded by an MLP layer. After a Point Transformer block, the network looks for relevant information in the 2D point-wise feature $F_{2D}$ based on the 3D point-wise feature $F_C$; the 3D query is able to focus on relevant 2D key points, effectively capturing the essential spatial and semantic correlations between the original 3D point cloud and its 2D projection. The value output at this stage serves as an enhancement of the 3D features, providing a 3D representation with 2D details.
In order to extract the projection boundary, we swap the roles of query and key in the next step, as shown in Figure 4: the 3D point-wise feature $F_C$ now serves as the key vector, while the 2D point-wise feature $F_{2D}$ serves as the query vector. Through this process, the 2D feature actively seeks relevant information in the 3D feature space, and the network focuses more on the refinement and enhancement of the 2D point-wise feature to meet the needs of the final extraction of the projection boundary. The 3D structural data are embedded into the value vector after the first Transformer, and the value vector then receives further attention on the 2D geometric data. It now contains a deep combination of 2D and 3D information while focusing more on 2D shapes, which is suitable for the boundary-detecting task; we name it the boundary feature $F_B$.
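The two-pass role swap can be sketched as follows; standard multi-head attention is used here purely as a stand-in for the Point Transformer blocks shown in Figure 4, so this is a schematic illustration of the query/key exchange rather than the actual implementation.

```python
import torch
import torch.nn as nn

class CrossTransformerSketch(nn.Module):
    """Two-pass cross-attention between 3D features F_C and 2D features F_2D.

    Pass 1: the 3D features query the 2D features (value enriched with 2D detail).
    Pass 2: roles are swapped so the 2D features query the 3D structure,
    producing the boundary feature F_B.
    """

    def __init__(self, dim, heads=4):
        super().__init__()
        self.value_mlp = nn.Linear(2 * dim, dim)
        self.attn_3d_to_2d = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_2d_to_3d = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, f3d, f2d):
        # f3d, f2d: (B, N, dim)
        v = self.value_mlp(torch.cat([f3d, f2d], dim=-1))         # combined value vector
        v, _ = self.attn_3d_to_2d(query=f3d, key=f2d, value=v)    # 3D looks into 2D
        f_b, _ = self.attn_2d_to_3d(query=f2d, key=f3d, value=v)  # 2D looks into 3D
        return f_b                                                # boundary feature F_B
```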
Similar to the Feature Extractor, the boundary labels $L_B$ are obtained by applying an MLP layer and a sigmoid layer to the boundary feature $F_B$, and the predicted projection boundary is extracted by another score extractor. Note that the score extractor in the Feature Extractor was low-pass for the outlier-point labels $L_O$, whereas here it is high-pass for the boundary labels $L_B$: we replace the predicted non-boundary points with the first boundary point, as in Equation (2). We set the threshold $\theta_B$ for the boundary to 0.6 in this article, which means that any point with a predicted boundary label greater than 0.6 is considered a point of the projection boundary $P_B$. Here we directly use the 3D point cloud $P_C$ to output the three-dimensional coordinates of the projection boundary points. This ensures the consistency of the network outputs ($P_B \subseteq P_C$) and is conducive to subsequent processing of the projection boundary.

3.4. Smoothing

As discussed in Section 3.2, simply applying an outlier-removal process is not enough to correct errors in ToF camera imaging; we still need to deal with the systematic errors, which manifest themselves as random ranging noise on the point cloud. Since we are only focusing on the projection boundary, we design a smoothing model only for the output boundary prediction $P_B$, effectively reducing the computing power requirement. The smoothing process not only de-noises the boundary point cloud, but also corrects an inherent limitation of ToF cameras, namely the incompleteness of the target edge, which always occurs when the target has non-planar edges, especially in regions where the surface normal deviates significantly from the camera's viewpoint, as shown in Figure 5.
Point cloud de-noising methods based on deep learning can be roughly divided into two types: predicting displacement vectors for each point [22,25], or directly fitting underlying surfaces [23,24,26]. Our PcBD model follows the first approach and predicts a replacement vector $V_B$ to smooth the predicted boundary. It has been argued that shrinkage and outliers may cause inaccuracy in the prediction of displacement vectors; however, our input $P_B$ has already been outlier-removed, and we explain below how combining 2D and 3D features eliminates the effects of shrinkage. On the other hand, the underlying-surface-based methods are only suitable for de-noising the whole point cloud, and they have an inherent shortcoming of insufficient noise removal at the edge of the point cloud, which happens to be the part we are most concerned about.
To precisely predict a projection boundary that fits the spatial relationships, we design a smoothing block as shown in Figure 6. We first project the predicted three-dimensional boundary $P_B$ to a two-dimensional boundary $P_B^{2D}$ in order to make kNN neighbor connections with the projected 2D shape. We find the k-nearest neighbors of the projected boundary in the projected point cloud and group the relevant features, including the 3D point-wise feature $F_C$, the 2D point-wise feature $F_{2D}$, and the outlier-removed 3D point cloud $P_C$, using the kNN neighbor indices. The shapes of the corresponding outputs are shown in Figure 6. Each boundary point in the 2D projection now has an associated neighborhood that includes 3D coordinates and point-wise features from both the 3D structure and the 2D shape. This combination enables the model to refine the boundary smoothing while maintaining the geometric integrity of the underlying projection hull shape.
We then send the combined kNN features and coordinates into a PAConv block to integrate 3D information with the 2D boundary. This process enhances the network's ability to understand the extracted boundary from a higher-dimensional perspective. Through our design, the PcBD model not only tries to smooth the two-dimensional shape, but also makes the detailed 3D local structure more reasonable. We reshape the output feature to shape $N \times (k \cdot C)$, so that the subsequent blocks can operate on each point based on its multi-dimensional description. A Point Transformer block follows, adaptively focusing on relevant points in the neighborhood of each boundary point using self-attention. This is particularly useful for reasonably allocating the roles of 2D and 3D features in the smoothing process.
At last, the output of the Point Transformer is passed through a simple MLP layer, yielding a moving vector $V_B$ of size $N \times 3$. The smoothed projection boundary is obtained by simply adding the moving vector $V_B$ to the 3D boundary $P_B$, as in Equation (3). We deliberately keep the smoothing vector and the final boundary output of the network three-dimensional, so that users can directly combine the output smoothed boundary $B$ with the outlier-removed point cloud $P_C$ for subsequent processing, which is useful for other tasks such as registration, safety warnings, etc.
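The data flow of the Smoothing block can be sketched as below, with a plain MLP standing in for the PAConv and Point Transformer blocks of Figure 6; the class name, dimensions, and gather helper are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SmoothingSketch(nn.Module):
    """Predict a 3D displacement V_B for each boundary point (Eq. 3)."""

    def __init__(self, c3d, c2d, k=16, hidden=128):
        super().__init__()
        in_dim = k * (c3d + c2d + 3)               # gathered 3D feat + 2D feat + coords
        self.mlp = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 3))
        self.k = k

    def forward(self, p_b, p_c, f_c, f_2d):
        # p_b: (B, N, 3) boundary points; p_c: (B, N, 3) cleaned cloud
        # f_c: (B, N, c3d) 3D features;   f_2d: (B, N, c2d) 2D features
        dist = torch.cdist(p_b[..., :2], p_c[..., :2])           # kNN in the 2D projection
        idx = dist.topk(self.k, dim=-1, largest=False).indices   # (B, N, k)
        gather = lambda t: torch.gather(
            t.unsqueeze(1).expand(-1, idx.shape[1], -1, -1), 2,
            idx.unsqueeze(-1).expand(-1, -1, -1, t.shape[-1]))
        grouped = torch.cat([gather(f_c), gather(f_2d), gather(p_c)], dim=-1)  # (B, N, k, *)
        v_b = self.mlp(grouped.flatten(2))                       # (B, N, 3) displacement
        return p_b + v_b                                         # smoothed boundary B
```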

4. Experiments and Results

We first introduce the proposed Bound57 dataset in this section, including the generation steps and training setups. Then we present the results of the proposed PcBD model on the Bound57 dataset. As comparison methods, we chose several existing point cloud de-noising methods to test the outlier-removal and boundary-smoothing results, and we employ some existing 2D projection hull building methods to test the boundary detection results. We also test our pre-trained model on some other tasks to assess its robustness. Finally, an ablation study is presented to analyze the contribution of several detailed modules.

4.1. The Bound57 Dataset

We have designed a dataset, Bound57, for the tasks which PcBD covers. The dataset contains input target point clouds with initial noise, outliers, and incomplete edges, as well as an outlier-removed ground truth, a de-noised ground truth, an edge-patched ground truth, and a projection boundary ground truth. We expect users to employ the proposed dataset as a flexible verification baseline for their point cloud data processing flows, so it can serve as a point cloud outlier-removal dataset, a point cloud de-noising dataset, an edge-patching dataset, or a point cloud projection boundary dataset. In our case, we only use the outlier-removed ground truth and the projection boundary ground truth to test the outlier-removal and the projection boundary predicting and smoothing abilities of our network, since the projection boundary ground truth can also be considered part of the de-noised ground truth. Compared to other point cloud de-noising datasets [22,26], our dataset contains many more shapes, better simulates real-life point cloud data generated by ToF cameras, and provides labels for outlier points.
As shown in Figure 2, our dataset starts with 3D models from ShapeNet [40]. Designed to provide large-scale 3D shape data for computer vision and computer graphics research, the ShapeNet dataset contains a large number of 3D models in 57 categories, covering a variety of object types such as furniture, vehicles, buildings, and everyday objects. Given a 3D model in .obj format, we first import it as triangular meshes using the related function from the PyTorch3D library. Then, we randomly define a set of spherical coordinates $(r, \theta, \varphi)$ to determine the view spot. In the Bound57 dataset specifically, the radial distance $r$ is uniformly sampled within the range [1.25, 3.0], and the 3D models are already normalized to the unit sphere. The elevation angle $\theta$ is sampled from [−90°, 90°], and the azimuth angle $\varphi$ is sampled from [0°, 360°], enabling full coverage of viewpoints around the object. We define two Field-of-View (FoV) cameras in the PyTorch3D library under this viewpoint. The first camera has a resolution of 640 × 480 to simulate a ToF camera, while the other has 2560 × 1440 pixels to generate the ground truths. We generate two grayscale depth maps from the imported 3D meshes by rendering them onto the 2D image plane for both cameras. The rendering is achieved by related functions in the PyTorch3D library, and the depth map rendered by the ground-truth-generating camera is of higher resolution.
We then design a function to extract contours from a given depth map. It begins by identifying the top-left pixel of the depth map as the background depth value, and creates a binary image where non-background pixels are set to an intensity of 255, and background pixels are set to 0. By employing the Canny edge-detection algorithm [41] in the OpenCV library [42], a second binary map is generated, only setting the projection boundary to an intensity of 255. The boundary pixel indices are extracted using OpenCV functions, allowing us to highlight the 2D projection boundary in the original depth map based on these indices. Since we have already set the model’s pixels to the same grayscale value in the first step, the accuracy of the boundary extraction algorithm is ensured. This uniformity in pixel intensity guarantees that the boundary detection process effectively distinguishes between the object and the background, minimizing possible errors.
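The contour-extraction step can be reproduced with a few OpenCV calls, as in the sketch below; the Canny thresholds are illustrative choices, while the background test and the 255 intensity follow the description above.

```python
import cv2
import numpy as np

def extract_contour_pixels(depth_map):
    """Return (row, col) indices of the 2D projection boundary in a depth map."""
    background = depth_map[0, 0]                            # top-left pixel = background depth
    binary = np.where(depth_map != background, 255, 0).astype(np.uint8)
    edges = cv2.Canny(binary, 100, 200)                     # boundary pixels set to 255
    return np.argwhere(edges > 0)                           # (K, 2) boundary pixel indices
```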
The rendered depth maps can be back-projected to produce the needed point clouds using functions in the PyTorch3D library. The detailed steps are shown in Figure 7: we transform a given depth map from the pixel coordinate system uv to the image coordinate system xoy, and then to the camera coordinate system OcXcYcZc, leading to point cloud data similar to the output of common ToF cameras. We did not convert the point cloud to the world coordinate system as in other datasets such as PCN [43], so that researchers can directly use the point cloud obtained from a ToF camera as the input without any pre-processing. With the extracted contour pixel indices, the projection boundary can be back-projected in the same way. However, due to limitations of the previously used algorithm, an extracted contour index may not lie on the object in the depth map. As in Figure 8, pixels with correct depths are transformed into points of the output boundary point cloud, while for pixels whose depth equals the camera's far plane (Zfar) we propose a replacement method: for a pixel with depth Zfar, we first check the depths of its eight neighboring pixels. For the neighbors with correct depths, we further check their four neighbors along the u and v axes; those with at least one such secondary neighbor at Zfar depth are considered boundary pixels. This is because if a pixel has all four of these neighbors on the object, it cannot lie on the projection boundary. In this way, the final output boundary point cloud describes a detailed two-dimensional projection of the object without errors in the depth information.
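The neighbor-check rule for contour pixels that fall on the far plane can be sketched as follows; the helper name and indexing are ours, and only the eight-neighbor / four-secondary-neighbor test described above is shown.

```python
import numpy as np

def refine_far_plane_pixel(depth, u, v, z_far):
    """For a contour pixel at (u, v) with depth == z_far, return a nearby pixel
    that lies on the object and is itself adjacent to a far-plane pixel."""
    h, w = depth.shape
    candidates = []
    for du in (-1, 0, 1):
        for dv in (-1, 0, 1):
            if du == 0 and dv == 0:
                continue
            nu, nv = u + du, v + dv
            if not (0 <= nu < h and 0 <= nv < w) or depth[nu, nv] == z_far:
                continue                                    # need a neighbor on the object
            # check the four axis-aligned secondary neighbors along u and v
            for su, sv in ((nu - 1, nv), (nu + 1, nv), (nu, nv - 1), (nu, nv + 1)):
                if 0 <= su < h and 0 <= sv < w and depth[su, sv] == z_far:
                    candidates.append((nu, nv))             # still touches the background
                    break
    return candidates[0] if candidates else None
```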
For the ground-truth camera, we back-project both the depth map and the extracted contours to output ground truths for up-sampling and boundary-detecting tasks. For the ToF-simulating camera, we only back-project the depth map as an intermediary point cloud, while we mark the boundary points in it in an additional dimension as the ground truth of the boundary labels L B . Users can use this intermediary point cloud as a ground truth for point cloud de-noising tasks.
At last, we artificially make the intermediary point cloud more similar to real scans. As is shown in Figure 9, to simulate the output point clouds of a ToF camera, we firstly introduce deficiencies in the edge regions, which makes sense because defects always appear in sparse areas and non-planar structures where the reflection is the weakest in real scans, while sparse and non-planar structures always occur near the boundary (such as bicycle wheels in Figure 2). For an input point cloud, we calculate the average X–Y plane distances from each point to the nearest 32 points as the local 2D density, which will be used as a criterion for judging sparse areas. Our overall idea is to select seed points in sparse and complex areas and then apply the relative operations, and we divide the point cloud into n parts based on the polar angle of each point to do that, avoiding selecting seed points from the same area. We set n as a random number between 8 and 32.
In each divided part, the boundary points are extracted using the boundary labels L B , and we select at most two seed points within the boundary points in each divided part: the boundary points with at least 50% lower local 2D density than average will be grouped as sparse boundary points, and we choose one with the lowest local 2D density as a seed point. For the remaining boundary points, we introduce another new criterion to select a seed point: we count the number of nearer points alongside the Z-axis in the 32 closest neighbors for each boundary point, and select one with the largest number as a seed point. This criterion is able to select the points with the most complex local structure and thus the most likely to have defects. In this way, we select one complex seed point and at most one sparse seed point, ensuring that defects that do not comply with imaging rules will not occur.
For each extracted sparse seed point, we randomly select 16–128 kNN-neighbor points, and then randomly eliminate 12.5–33% of them. For the complex seed points, we randomly select 64–256 kNN-neighbor points, and then randomly eliminate 50–80% of them. It is reasonable for the sparse seed points to have fewer neighbors in the same range, while complex structures generally cover a wider range in space. The output point clouds with defects are shown in Figure 9, we successfully introduce suitable defects in the sparse and complex 3D structures, accurately simulating the imaging characteristics of a ToF camera.
Next, we add noise to the point cloud. Our method here is also based on point cloud density: we multiply the previously computed local 2D density by a constant $V_n$ to obtain a reference value $N_C$ for adding noise:
$$N_C = V_n \cdot \mathrm{density}(P_{2D}) \qquad (12)$$
In actual ToF imaging, the ranging accuracy is inversely proportional to the square of the target distance. Therefore, when we add Gaussian noise to the point cloud to simulate the ranging error, we set the standard deviation for each point to the square of the target point's Z coordinate multiplied by the reference value $N_C$. The noise generated in this way has a different standard deviation at each point, which effectively avoids overfitting during training. After proper adjustment, the value of $V_n$ in the proposed dataset is set to 0.15.
In addition to random noise on the target point cloud, we also need to introduce outliers for the outlier-removal task. We first set the number of outlier points $O_N$ randomly as 5–15% of the total number of points of the input point cloud. This ensures that we do not generate tasks that are either too simple or too complex, making the proposed dataset suitable for training universal networks. We randomly choose $O_N$ points in the noised point cloud and generate the outliers by moving these points along a random 3D vector. In ToF imaging, the noise along the depth direction (Z axis) is significantly higher than along the other two axes, so we set the noise level along the Z axis higher and the other two axes lower. Specifically, we set the Z-axis standard deviation of the outlier movement vector to 50–100 times $N_C$, and the X- and Y-axis standard deviations to 5–20 times $N_C$. We add another dimension to mark the outlier points as the ground truth of the outlier labels $L_O$. The point clouds shown in Figure 9 demonstrate the authenticity and rationality of the outliers in our dataset, and our final five-dimensional output is ready for training.
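A NumPy sketch of the noise and outlier injection is given below; the constants mirror the ranges stated above, and local_density stands for the previously computed local 2D density, while the function name and exact sampling details are illustrative.

```python
import numpy as np

def add_noise_and_outliers(points, local_density, v_n=0.15, seed=None):
    """points: (N, 3) back-projected cloud; local_density: (N,) mean X-Y distance
    to the 32 nearest neighbors. Returns the noisy cloud and outlier labels."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    n_c = v_n * local_density                              # per-point noise reference N_C
    sigma = n_c * points[:, 2] ** 2                        # std grows with the square of Z
    noisy = points + rng.normal(size=points.shape) * sigma[:, None]

    n_out = int(rng.uniform(0.05, 0.15) * n)               # 5-15% of points become outliers
    idx = rng.choice(n, n_out, replace=False)
    shift = np.zeros((n_out, 3))
    shift[:, :2] = rng.normal(0.0, rng.uniform(5, 20) * n_c[idx, None], (n_out, 2))
    shift[:, 2] = rng.normal(0.0, rng.uniform(50, 100) * n_c[idx], n_out)
    noisy[idx] += shift                                    # move chosen points off the surface
    labels = np.zeros(n)
    labels[idx] = 1.0                                      # ground-truth outlier labels L_O
    return noisy, labels
```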
We split the objects from ShapeNet in a rough proportion to 8:1:1 for training, validation, and testing. In total, we get 57,449 objects from 57 categories, 45,935 for training, 5800 objects for validation, and 5714 for testing. For each object in the training dataset, we generate data of eight viewpoints, each data pack of a viewpoint contains an input noised point cloud with outliers in .pcd format, a boundary label and outlier label for each point in .npy format, and a ground truth 2D boundary point cloud in .pcd format. If needed, the dataset could also output a de-noised ground truth and an up-sampled ground truth in .pcd format.

4.2. Training

We implemented our network in PyTorch (https://pytorch.org/) and trained it on an NVIDIA A100 GPU (Santa Clara, CA, USA). We set the learning rate to 0.0002 and exponentially decay it by a factor of 0.95 every 10 epochs. During the training process, we down-sample the input point clouds along with their associated outlier and boundary labels to a fixed size of 4096 points using shared indexing. Similarly, the ground truth for the 2D projection is also down-sampled to 4096 points to maintain consistency with the input size. This ensures that the network is trained on uniform point sets, optimizing both computational efficiency and learning efficacy.
Our loss function L is designed to optimize three key aspects of the network’s output: accurate classification of outlier and boundary points, precise reconstruction of cleaned and boundary point clouds, and effective adjustment of boundary points to their target positions. It combines the binary cross-entropy loss (BCE) [44] for label predictions with Chamfer Distance (CD) [45] for point cloud alignment:
$$\mathcal{L}_{\mathrm{CD}}(P, G) = \frac{1}{N} \sum_{x_i \in P} \min_{g \in G} \| x_i - g \| \qquad (13)$$
$$\mathcal{L}_{\mathrm{BCE}}(L, L') = -\frac{1}{N} \sum_{i=1}^{N} \Big[ L'_i \log(L_i) + (1 - L'_i) \log(1 - L_i) \Big] \qquad (14)$$
$$\mathcal{L} = \mathcal{L}_{\mathrm{BCE}}(L_O, \mathrm{gt}_{L_O}) + \mathcal{L}_{\mathrm{CD}}(P_C, \mathrm{gt}_{P_C}) + \mathcal{L}_{\mathrm{BCE}}(L_B, \mathrm{gt}_{L_B}) + \mathcal{L}_{\mathrm{CD}}(P_B, \mathrm{gt}_{P_B}) + \mathcal{L}_{\mathrm{CD}}(B, \mathrm{gt}_{B}) \qquad (15)$$
Here, $L$ and $L'$ are the two input labels used to calculate the BCE loss; $\mathrm{gt}_{L_O}$ and $\mathrm{gt}_{L_B}$ are the ground truths of the predicted outlier labels $L_O$ and boundary labels $L_B$, while $\mathrm{gt}_{P_C}$ and $\mathrm{gt}_{P_B}$ refer to the outlier-removed point cloud and the projected boundary extracted from the ground-truth labels. $\mathrm{gt}_B$ is the ground-truth boundary from the dataset. Given an input point cloud $P$ and ground truth $G$, the Chamfer distance is widely used in 3D vision tasks [38,46] to measure the similarity between two point clouds, particularly when the correspondences between points are not known; it penalizes the average distance from each point in one point cloud to its nearest neighbor in the other. In our implementation, we adopt the $L_1$ variant of the Chamfer distance as in Equation (13), where each distance term $\| x_i - g \|$ denotes the point-wise Euclidean distance without squaring. Compared with the squared $L_2$ formulation, the $L_1$ Chamfer distance is less sensitive to outliers and yields more stable gradients, especially in regions with sharp geometric boundaries or sparse correspondences. In the training step, we calculate the CD loss between the outlier-removed prediction $P_C$ and the corresponding ground truth together with the BCE loss between the outlier label prediction $L_O$ and its ground truth to supervise the Feature-Extracting & Outlier-Removal block; we calculate the CD loss between the 2D boundary prediction $P_B$ and the corresponding ground truth together with the BCE loss between the boundary label prediction $L_B$ and its ground truth to supervise the Boundary-Detecting block; and finally we calculate the CD loss between the smoothed boundary prediction $B$ and the corresponding ground truth to supervise the Smoothing block. In this way, we ensure a balance between classification accuracy and geometric fidelity, leading to a comprehensive learning objective that aligns with the network's goals of outlier removal, boundary extraction, and refinement.
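A compact PyTorch sketch of this objective is given below, using the one-directional L1 Chamfer term of Equation (13); in practice a dedicated CUDA Chamfer kernel is usually preferred for speed, and the function names here are illustrative.

```python
import torch
import torch.nn.functional as F

def chamfer_l1(pred, gt):
    """One-directional L1 Chamfer distance (Eq. 13). pred: (B, N, 3), gt: (B, M, 3)."""
    dist = torch.cdist(pred, gt)                 # (B, N, M) pairwise Euclidean distances
    return dist.min(dim=-1).values.mean()

def pcbd_loss(l_o, gt_l_o, p_c, gt_p_c, l_b, gt_l_b, p_b, gt_p_b, b, gt_b):
    """Total loss of Eq. (15): BCE on labels + Chamfer on the three point-cloud outputs."""
    return (F.binary_cross_entropy(l_o, gt_l_o)
            + chamfer_l1(p_c, gt_p_c)
            + F.binary_cross_entropy(l_b, gt_l_b)
            + chamfer_l1(p_b, gt_p_b)
            + chamfer_l1(b, gt_b))
```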

4.3. Results on Bound57

In our experiments, we test the performance of the proposed PcBD network in three steps: the outlier-removal results, the boundary detection results, and the boundary de-noising results. For outlier removal, we evaluated the effectiveness of our proposed method by comparing it with several traditional and state-of-the-art point cloud de-noising approaches, including Radius Outlier Removal (ROR) [47], Statistical Outlier Removal (SOR) [48], DBSCAN [49], PointCleanNet [22], DMRde-noise [24], Score de-noise [26], and PD-LTS [50]. For ROR, SOR, and DBSCAN, we implemented the algorithms directly from their principles and tested them on the dataset. We set the input parameters of these geometric methods to the average point-to-point distance of the input point cloud multiplied by a fixed value, which is consistent with the usual experimental setup. For DMRde-noise and Score de-noise, we trained the networks on the proposed Bound57 dataset using their respective open-source code, best hyperparameters, and the loss functions reported in their papers. We use the ground-truth boundary labels to extract the boundary points of the output point clouds from the comparison methods, calculate the CD loss to the ground-truth boundary point cloud, and add it to the loss function of the comparison methods; this allows us to compare the smoothing effects of different methods on the target boundary point cloud. The PointCleanNet and PD-LTS methods were verified in their original articles to be able to remove outlier points, so we utilize their provided pre-trained models, since they can only be trained on local patches sampled from a few target point clouds, which is not suitable for our dataset. The outlier-removal results for several categories and the overall average are shown in Table 1, using the CD loss between the cleaned point cloud and the extracted outlier-removed ground truth as the evaluation metric.
It should be noted that, unlike our proposed PcBD, which first removes outliers and then de-noises the projection boundary, the output point clouds of the compared deep-learning-based methods (besides PointCleanNet) are both outlier-removed and de-noised, so it is unfair to judge their outlier-removal abilities by comparing their outputs with the un-de-noised ground truths. However, we can still extract some useful information from the table. For example, the outlier-removal effects of the traditional methods (ROR, SOR, and DBSCAN) vary greatly across categories, while PcBD, Score de-noise, and PD-LTS are relatively stable. We make a visual comparison in Figure 10 to show more intuitively the effects of different methods on the outlier points. It can be seen that geometric methods such as ROR and SOR suffer from density differences between local regions; in particular, the lower half of the piano has been completely cleared due to its low density. PointCleanNet predicts too many outliers on the proposed dataset, resulting in an overly sparse output point cloud. Score de-noise and PD-LTS exhibit some point cloud shrinkage problems, and although they smooth the underlying surfaces, some outliers close to the surface are still not filtered out. Compared with the above methods, PcBD accurately predicts almost all of the outliers and can also use 3D structural features to distinguish target points from outliers in relatively sparse areas (such as bicycle handlebars and spokes).
To evaluate the boundary detection performance of our method, we compare its boundary outputs with several approaches, including traditional and improved recent methods. α-Shapes [33] was chosen as a classic geometric method, providing a baseline for simple and robust boundary estimation, and Adaptive α-Shapes [34] was chosen because it calculates the parameter α automatically. The Grid-Contour method [51] and the normal-vector-based method proposed in [52] are also tested to complement the geometric baselines. As above, we adaptively set the input parameters of these methods to values related to the input density so that they achieve their best results. We apply these methods to the outlier-removed ground truth to eliminate the influence of outliers, and the results are listed in Table 2. Most deep-learning-based point cloud boundary detection methods focus on 3D semantic edges [37,53,54,55], so a direct comparison is not possible; however, we demonstrate PcBD's potential on 3D boundary-detection tasks in Section 4.4 as evidence of the generality of our network.
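For reference, the following is a minimal, density-adaptive 2D alpha-shape baseline in the spirit of the geometric methods above: Delaunay triangles whose circumradius exceeds α are discarded, and edges belonging to exactly one remaining triangle form the boundary. The factor tying α to the mean nearest-neighbor spacing is an illustrative choice, not a value from the paper.

```python
import numpy as np
from scipy.spatial import Delaunay, cKDTree

def alpha_shape_edges(pts2d, factor=3.0):
    # Set alpha adaptively from the mean nearest-neighbor spacing
    alpha = cKDTree(pts2d).query(pts2d, k=2)[0][:, 1].mean() * factor
    tri = Delaunay(pts2d)
    edges = {}
    for ia, ib, ic in tri.simplices:
        a, b, c = pts2d[ia], pts2d[ib], pts2d[ic]
        la, lb, lc = np.linalg.norm(b - c), np.linalg.norm(a - c), np.linalg.norm(a - b)
        area = 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))
        # Circumradius R = (la * lb * lc) / (4 * area); drop degenerate or oversized triangles
        if area < 1e-12 or la * lb * lc / (4.0 * area) > alpha:
            continue
        for e in [(ia, ib), (ib, ic), (ic, ia)]:
            e = tuple(sorted(e))
            edges[e] = edges.get(e, 0) + 1
    # Boundary edges belong to exactly one kept triangle
    return [e for e, n in edges.items() if n == 1]
```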
A visual comparison of different 2D boundary-detection methods is shown in Figure 11. When α-Shapes is given an input point cloud whose local density does not match the chosen α value, parts of the predicted projection boundary may be missing, and other methods may output holes caused by noise inside the point cloud. In contrast, PcBD also compares and connects the three-dimensional features of the input point cloud, so it can accurately predict the projection boundary while avoiding labeling internal areas as boundaries. Moreover, even before smoothing, the projection boundary predicted by PcBD is already smoother than those of the above methods. The consistent improvement of PcBD across different shapes suggests that point-wise features with 3D context, combined with attention across 2D and 3D features, significantly outperform handcrafted 2D descriptors. Notably, PcBD does not rely on pre-determined parameters but instead learns to interpret structural continuity via cross-dimensional attention, which allows the model to distinguish real boundaries from internal gaps caused by noise or projection sparsity.
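The cross-dimensional attention mentioned above can be sketched as follows: per-point features from the 2D projection branch query the 3D features of the same points. The dimensions, head count, and module name are assumptions for illustration; this is not the released Cross Transformer.

```python
import torch
import torch.nn as nn

class Cross2D3D(nn.Module):
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat2d, feat3d):
        # feat2d, feat3d: (B, N, dim) per-point features from the two branches
        fused, _ = self.attn(query=feat2d, key=feat3d, value=feat3d)
        return self.norm(feat2d + fused)   # residual fusion of 2D and 3D information

fused = Cross2D3D()(torch.randn(2, 2048, 128), torch.randn(2, 2048, 128))
```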
To evaluate the boundary smoothing (de-noising) performance of our method, we compare against Bilateral Filter [10], Iterative Guidance Normal Filter [56], Weighted Multi-Proj [57], Sparse Regularization-Based de-noising [58], Moving Least Squares (MLS) [59], and Adaptive MLS [60] as traditional geometric approaches, and against PointCleanNet [22], PCDNF [61], PointFilter [62], DMRde-noise [24], Score de-noise [26], and PD-LTS [50] as deep-learning-based methods.
For the geometry-based methods, we adaptively set the input parameters using the average point distance and apply them to the outlier-removed point clouds; we then extract the boundaries using the ground-truth labels and compare them with the ground-truth boundaries. For DMRde-noise and Score de-noise, we feed the raw inputs into the networks trained on the Bound57 dataset. PointCleanNet provides two pre-trained networks, an outlier-removal one and a de-noising one; we use both so that its overall processing flow is similar to the proposed PcBD, and the boundaries of its de-noised output are extracted using the ground-truth boundary labels. For PD-LTS, we directly extract the boundaries from the output point clouds of the outlier-removal stage, as they have already been de-noised. For PCDNF and PointFilter, we found that the original boundary points were displaced considerably after processing, so we additionally allow these methods to directly de-noise the boundary point clouds extracted from the proposed dataset. The results are shown in Table 3.
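The evaluation step for the compared methods can be summarized by the short sketch below: boundary points are selected from the de-noised output with the ground-truth boundary labels and scored against the ground-truth boundary with the L1 Chamfer distance. Tensor shapes and variable names are assumptions.

```python
import torch

def boundary_chamfer(denoised, gt_boundary_labels, gt_boundary):
    pred_b = denoised[gt_boundary_labels.bool()]   # (Nb, 3): label-selected boundary points
    d = torch.cdist(pred_b, gt_boundary)           # pairwise Euclidean distances
    # Symmetric, un-squared Chamfer distance between predicted and ground-truth boundaries
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```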
Judging from the results in Table 3, PCDNF achieves higher overall accuracy; however, this is obtained on the extracted ground-truth projection boundaries, and even so, PCDNF is less accurate than PcBD in more than half of the categories (32 of 57). PointFilter enjoys the same advantage but still does not outperform PcBD. DMRde-noise proves unsuitable for training on the proposed dataset, as it performs poorly on both outlier removal and de-noising. The pre-trained models of PointCleanNet and PD-LTS also do not perform well, which is easy to explain: the datasets they were originally trained on generated noise at a fixed level, and that noise level also had to be supplied as an input when de-noising. Among the geometry-based methods, MLS has the best de-noising effect on projected boundaries. Note that PcBD outperforms many traditional de-noising methods even though it was not explicitly trained with a per-point displacement loss, which indicates that learning the underlying shapes is more effective than learning to predict displacement vectors. Furthermore, projected neighborhood grouping in 2D combined with 3D features enables better completion of low-density regions, which geometric methods often fail to restore because they rely on local consistency alone. Figure 12 shows some of the boundary point clouds output by these methods. We chose both simple and complex shapes to compare the smoothing effects of different methods on boundaries, and we emphasize objects with more curves, since our method is intended to handle a large number of flexible, deformable targets. The visual comparison shows that the smoothed boundary of PcBD has the best consistency, while the outputs of the other methods are scattered and sparse. In low-density parts of the input point cloud (such as a corner of a pillow, a motorcycle handle, and a rifle muzzle), the compared methods predict poorly and produce fractures, whereas PcBD identifies these low-density parts and compensates for them during smoothing. This also illustrates an advantage of our proposed Bound57 dataset, which generates more errors in complex regions to challenge a method's performance.
To sum up, PcBD outperforms most existing point cloud processing methods on the Bound57 dataset, even though we provided ground-truth labels to the other methods as a reference. The results support the conclusion that PcBD's combined flow of outlier removal, boundary detection, and smoothing forms a coherent, mutually beneficial pipeline: the features shared across modules contribute not only to improved accuracy but also to robustness across object types. The advantage of PcBD as a processing flow is also reflected in the following observation: if the best-performing models other than PcBD in the three tests above are selected and chained into a single processing flow, the final smoothed boundary is only worse (as in Figure 13), whereas PcBD reuses the point cloud features extracted in each stage for the subsequent operations, ensuring that all tasks are completed well.

4.4. Other Experiments

In this section, we add three experiments to further test the PcBD method. First, we use PcBD to extract and smooth the projection boundaries of real-world scans from wind tunnel tests. We selected two different parachutes to represent the flexible, deformable objects encountered in practical applications; the results are shown in Figure 14. The first target is a circular parachute: PcBD successfully predicted the shape of its projected boundary, identified boundary points at sparse edges, and ignored a hole caused by reflection errors in the point cloud. The second target is a star-shaped parachute with front traction ropes, whose edges are sparser in the point cloud, just as we simulate in Bound57. PcBD also outputs its projected boundary and identifies the sparse boundary points while removing the necessary outliers. The parachute point clouds we collected are relatively dense, so we set the number of input points to 16,384, which also demonstrates that the pre-trained PcBD model can process point clouds of different sizes.
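A minimal sketch of how a dense real-world scan can be resampled to a fixed network input size (16,384 points here) before running PcBD is given below; plain random sampling is shown for brevity, and farthest point sampling would be an alternative. The function name and padding strategy are illustrative assumptions.

```python
import numpy as np

def resample(points, n_target=16384, seed=0):
    rng = np.random.default_rng(seed)
    if len(points) >= n_target:
        # Dense scan: draw a random subset without replacement
        idx = rng.choice(len(points), n_target, replace=False)
    else:
        # Sparse scan: keep all points and pad by repeating random points
        idx = np.concatenate([np.arange(len(points)),
                              rng.choice(len(points), n_target - len(points), replace=True)])
    return points[idx]
```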
We then tested the PcBD model trained on Bound57 on an existing real-world scan dataset, WHU-TLS [63]. This experiment further examines the model's generalization across targets, projection directions, and numbers of input points. As shown in Figure 15, we selected two scenes from the dataset: pedestrians and a vehicle (a), and trees (b). The pedestrians and trees serve as deformable targets and the vehicle as a rigid target to test the model's generalization to different targets. We selected different directions as projection vectors to test the directional generalization of PcBD's projection boundary extraction and sampled different numbers of input points to test the model's adaptability to input size. As the figure shows, PcBD performs outlier removal, projection boundary detection, and smoothing at different input sizes and adapts well to both rigid and deformable objects. When the projection direction changes, PcBD still outputs accurate and valid smoothed boundaries. However, when the projection direction exposes too many incomplete target structures (such as the car in the third picture of Figure 15a), the error of the detected boundary increases, because PcBD relies on complete two-dimensional and three-dimensional structures to predict the projection boundary. For targets such as trees, the original point cloud is scattered to begin with, but PcBD can still make reasonable boundary predictions and remove potential outliers, which supports the rationality of our proposed processing flow.
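One way to handle an arbitrary projection direction is to rotate the scan so that the chosen direction coincides with the +z axis before running PcBD; the sketch below uses the Rodrigues rotation formula, and the example direction vector is arbitrary, not a value from the experiments.

```python
import numpy as np

def align_to_z(points, direction):
    d = direction / np.linalg.norm(direction)
    z = np.array([0.0, 0.0, 1.0])
    v, c = np.cross(d, z), float(np.dot(d, z))
    if np.linalg.norm(v) < 1e-8:                      # already (anti-)parallel to z
        return points if c > 0 else points * np.array([1.0, -1.0, -1.0])
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    R = np.eye(3) + vx + vx @ vx / (1.0 + c)          # Rodrigues formula: R maps d onto z
    return points @ R.T

rotated = align_to_z(np.random.rand(1000, 3), np.array([1.0, 0.5, 0.2]))
```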
Finally, we tested PcBD's ability to extract three-dimensional boundaries. For an input solid point cloud, we rotate it to different poses, sample each pose to 16,384 points, process it with PcBD, and then merge the projection boundaries of the target across all poses. Rotating the input point cloud ensures that the three-dimensional boundaries in all directions can be extracted completely. The result is shown in Figure 16: the PcBD network outputs fine three-dimensional point cloud boundaries and shows clear potential for this task. We believe that with appropriate fine-tuning, the proposed network could complete this task even better.
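The rotate-process-merge procedure can be sketched as follows, where `pcbd_process` is a hypothetical stand-in for the trained model and the set of yaw angles is an illustrative choice.

```python
import numpy as np

def extract_3d_boundary(points, pcbd_process, n_views=8):
    pieces = []
    for k in range(n_views):
        theta = 2.0 * np.pi * k / n_views
        R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta),  np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
        boundary = pcbd_process(points @ R.T)   # boundary predicted in the rotated frame
        pieces.append(boundary @ R)             # rotate back to the original frame
    return np.concatenate(pieces, axis=0)       # merged 3D boundary across all poses
```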

4.5. Ablation Studies

To prove the necessity of the modules designed for the PcBD network, we conduct an ablation study examining each of the proposed essential blocks. We replace one of PcBD's novel blocks with a traditional unit at a time and train each version of the network on the proposed Bound57 dataset. The ablation studies follow the order of the processing flow of our network. First, we replace the weighted normal-vector calculation based on SHOT with the traditional PCA method, as used in many geometry-based point cloud de-noising approaches. Then, we replace the PAConv blocks in the Set Abstraction block with regular convolutional layers, directly concatenating the kNN distances, normal vectors, and angles. Next, we replace the proposed Point Transformers with normal vectors in the Feature-Extracting block with the original ones. To verify the rationality of the Boundary-Detecting block, we first remove the part that extracts the relationship between the 2D and 3D point clouds and, in another version, directly detect the boundary from the 2D projection point cloud. We make similar adjustments in the Smoothing block by directly predicting the moving vectors on the detected boundary in 3D coordinates. The test results of these versions of PcBD on the Bound57 dataset are listed in Table 4.
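For completeness, the PCA baseline used in the first ablation can be sketched as follows: the normal of each point is the eigenvector of its k-nearest-neighbor covariance with the smallest eigenvalue; the neighborhood size is an illustrative choice.

```python
import numpy as np
from scipy.spatial import cKDTree

def pca_normals(points, k=16):
    _, idx = cKDTree(points).query(points, k=k)
    normals = np.empty_like(points)
    for i, nbrs in enumerate(idx):
        patch = points[nbrs] - points[nbrs].mean(axis=0)   # centered local neighborhood
        # Eigenvector of the local covariance with the smallest eigenvalue
        _, vecs = np.linalg.eigh(patch.T @ patch)
        normals[i] = vecs[:, 0]
    return normals
```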
The results show that our proposed designs of the Feature-Extracting, Boundary-Detecting, and Smoothing blocks outperform the widely used alternatives, and that their placement in the processing flow is appropriate. Furthermore, the connection between 2D and 3D features enables the network to learn structures and shapes of greater complexity.

5. Discussion

The proposed PcBD framework integrates three blocks, outlier removal, boundary detection, and smoothing, into a single processing flow. This structure addresses a long-standing challenge in point cloud processing: transforming raw point clouds into 3D data that can be used directly for measurement tasks. Experimental results on the Bound57 dataset demonstrate the advantages of PcBD on the related tasks, and ablation studies show that PcBD benefits from structures such as feature extraction guided by local reference frames, long-context feature reuse, and cross-attention between 2D and 3D information, which enable more consistent performance across shapes and densities.
In addition, experimental results on real-world scans and datasets demonstrate the effectiveness of our proposed Bound57 as a benchmark dataset for practical ToF scenarios. Bound57 introduces sparse defects, outliers, and ranging errors to better reflect real-world scans. The poor performance of many traditional and state-of-the-art methods on Bound57 indicates that the datasets on which they were trained do not adequately reflect these defects.
However, limitations still exist. PcBD is currently optimized for projection boundary smoothing and does not explicitly handle ranging errors of internal points. The idea of combining 2D and 3D features has nevertheless proven effective for smoothing boundary points, and we plan to extend the model to whole point cloud de-noising based on this idea in future research. On the other hand, PcBD's predictions of three-dimensional edges are promising, and we believe that with a reasonable redesign of the network structure, PcBD will also be able to handle this task. For tasks such as boundary extraction from tree point clouds, we believe specific semantic features are needed as guidance; combining PcBD with target recognition algorithms to introduce long-range semantic features along the processing flow should yield better results.

6. Conclusions

In this paper, we propose a novel network architecture, PcBD, for point cloud outlier removal, projection boundary extraction, and boundary smoothing in one processing flow. By introducing local reference frames and improved 3D point cloud processing modules, and by letting 2D and 3D features interact and guide each other during processing, PcBD learns to output high-precision, de-noised point cloud projection boundaries, ensuring more accurate and reliable processing for downstream applications. Additionally, we introduce a dataset, Bound57, designed to simulate realistic ToF camera outputs, incorporating edge deficiencies and synthetic noise to reflect real-world sensor imperfections.
Compared to existing methods, PcBD achieves:
  • An 89.47–99.30% reduction in Chamfer-L1 distance for outlier removal, outperforming methods such as ROR and PD-LTS;
  • A 58.15–81.33% improvement in boundary detection accuracy over methods such as Grid-Contour and Adaptive α-Shapes;
  • Superior smoothing results with an average Chamfer-L1 distance of 9.92, surpassing traditional methods such as MLS and modern networks such as Score de-noise;
  • As a processing flow, significantly better results than a pipeline assembled from three different advanced methods performing the same tasks.
In addition, the proposed Bound57 dataset serves as a benchmark for multi-task point cloud processing flows, filling a gap in current public datasets. Together, PcBD and Bound57 offer a reliable and extensible framework for future work in point cloud processing and real-time 3D detection.

Author Contributions

Conceptualization, S.S.; methodology, S.S.; software, S.S.; validation, S.S. and J.H.; investigation, S.S.; resources, S.Z.; writing—original draft preparation, S.S.; writing—review and editing, J.H. and S.Z.; visualization, S.S.; supervision, S.Z.; project administration, T.H.; funding acquisition, T.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the “Pioneer” and “Leading Goose” R&D Program of Zhejiang, grant number 2024C01131, and the Key Research and Development Program of JiaXing, grant number 2024BZ20017.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Codes of the PcBD model presented in this paper and the proposed Bound57 dataset are available at https://github.com/desperadossy/PcBD, accessed on 1 June 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ToF: Time of Flight
MLP: MultiLayer Perceptron
kNN: k-Nearest Neighbors
LRF: Local Reference Frame
GT: Ground Truth
CD: Chamfer Distance
BCE: Binary Cross Entropy

References

  1. Henry, P.; Krainin, M.; Herbst, E.; Ren, X.; Fox, D. RGB-D Mapping: Using Kinect-Style Depth Cameras for Dense 3D Modeling of Indoor Environments. Int. J. Robot. Res. 2012, 31, 647–663. [Google Scholar] [CrossRef]
  2. Han, D. Research on Control System Design of Automobile Wind Tunnel Model Test. In Proceedings of the 2023 Asia-Europe Conference on Electronics, Data Processing and Informatics (ACEDPI), Prague, Czech Republic, 17–19 April 2023; pp. 15–19. [Google Scholar] [CrossRef]
  3. Inman, J.A.; Danehy, P.M. From Wind Tunnels to Flight Vehicles: Visualization and quantitative measurements supporting NASA’s space program. In Proceedings of the Optical Sensors and Sensing Congress, Washington, DC, USA, 22–26 June 2020; p. LW4E.1. [Google Scholar] [CrossRef]
  4. Tanner, C.L.; Clark, I.G.; Chen, A. Overview of the Mars 2020 parachute risk reduction activity. In Proceedings of the 2018 IEEE Aerospace Conference, Big Sky, MT, USA, 3–10 March 2018; pp. 1–11. [Google Scholar] [CrossRef]
  5. O’Farrell, C.; Muppidi, S.; Brock, J.M.; Van Norman, J.W.; Clark, I.G. Development of models for disk-gap-band parachutes deployed supersonically in the wake of a slender body. In Proceedings of the 2017 IEEE Aerospace Conference, Big Sky, MT, USA, 4–11 March 2017; pp. 1–16. [Google Scholar] [CrossRef]
  6. Salti, S.; Tombari, F.; Di Stefano, L. SHOT: Unique signatures of histograms for surface and texture description. Comput. Vis. Image Underst. 2014, 125, 251–264. [Google Scholar] [CrossRef]
  7. Zhao, H.; Jiang, L.; Jia, J.; Torr, P.; Koltun, V. Point Transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021. [Google Scholar]
  8. Wang, W.; You, X.; Chen, L.; Tian, J.; Tang, F.; Zhang, L. A Scalable and Accurate De-Snowing Algorithm for LiDAR Point Clouds in Winter. Remote Sens. 2022, 14, 1468. [Google Scholar] [CrossRef]
  9. Kurup, A.; Bos, J. DSOR: A Scalable Statistical Filter for Removing Falling Snow from LiDAR Point Clouds in Severe Winter Weather. arXiv 2021, arXiv:2109.07078. [Google Scholar]
  10. Digne, J.; Franchis, C. The Bilateral Filter for Point Clouds. Image Process. Line 2017, 7, 278–287. [Google Scholar] [CrossRef]
  11. Si, H.; Wei, Z.; Zhu, Z.; Chen, H.; Liang, D.; Wang, W.; Wei, M. LBF:Learnable Bilateral Filter For Point Cloud Denoising. arXiv 2022, arXiv:2210.15950. [Google Scholar]
  12. Huang, H.; Wu, S.; Gong, M.; Cohen-Or, D.; Ascher, U.M.; Zhang, H. Edge-aware point set resampling. ACM Trans. Graph. (TOG) 2013, 32, 1–12. [Google Scholar] [CrossRef]
  13. Zeng, J.; Cheung, G.; Ng, M.; Pang, J.; Yang, C. 3D Point Cloud Denoising using Graph Laplacian Regularization of a Low Dimensional Manifold Model. arXiv 2019, arXiv:1803.07252. [Google Scholar] [CrossRef]
  14. Hu, W.; Gao, X.; Cheung, G.; Guo, Z. Feature Graph Learning for 3D Point Cloud Denoising. IEEE Trans. Signal Process. 2020, 68, 2841–2856. [Google Scholar] [CrossRef]
  15. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. arXiv 2017, arXiv:1706.02413. [Google Scholar]
  16. Xu, M.; Ding, R.; Zhao, H.; Qi, X. PAConv: Position Adaptive Convolution with Dynamic Kernel Assembling on Point Clouds. arXiv 2021, arXiv:2103.14635. [Google Scholar]
  17. Ao, S.; Hu, Q.; Yang, B.; Markham, A.; Guo, Y. SpinNet: Learning a General Surface Descriptor for 3D Point Cloud Registration. arXiv 2021, arXiv:2011.12149. [Google Scholar]
  18. Thomas, H.; Qi, C.R.; Deschaud, J.E.; Marcotegui, B.; Goulette, F.; Guibas, L.J. KPConv: Flexible and Deformable Convolution for Point Clouds. arXiv 2019, arXiv:1904.08889. [Google Scholar]
  19. Seppänen, A.; Ojala, R.; Tammi, K. 4DenoiseNet: Adverse Weather Denoising from Adjacent Point Clouds. IEEE Robot. Autom. Lett. 2023, 8, 456–463. [Google Scholar] [CrossRef]
  20. Heinzler, R.; Piewak, F.; Schindler, P.; Stork, W. CNN-based Lidar Point Cloud De-Noising in Adverse Weather. IEEE Robot. Autom. Lett. 2020, 5, 2514–2521. [Google Scholar] [CrossRef]
  21. Zhao, X.; Wen, C.; Wang, Y.; Bai, H.; Dou, W. TripleMixer: A 3D Point Cloud Denoising Model for Adverse Weather. arXiv 2024, arXiv:2408.13802. [Google Scholar]
  22. Rakotosaona, M.; La Barbera, V.; Guerrero, P.; Mitra, N.J.; Ovsjanikov, M. PointCleanNet: Learning to Denoise and Remove Outliers from Dense Point Clouds. Comput. Graph. Forum 2020, 39, 185–203. [Google Scholar] [CrossRef]
  23. Duan, C.; Chen, S.; Kovacevic, J. 3D Point Cloud Denoising via Deep Neural Network based Local Surface Estimation. arXiv 2019, arXiv:1904.04427. [Google Scholar]
  24. Luo, S.; Hu, W. Differentiable Manifold Reconstruction for Point Cloud Denoising. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 1330–1338. [Google Scholar] [CrossRef]
  25. Hermosilla, P.; Ritschel, T.; Ropinski, T. Total Denoising: Unsupervised Learning of 3D Point Cloud Cleaning. arXiv 2019, arXiv:1904.07615. [Google Scholar]
  26. Luo, S.; Hu, W. Score-Based Point Cloud Denoising (Learning Gradient Fields for Point Cloud Denoising). In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 17 October 2021; pp. 4563–4572. [Google Scholar] [CrossRef]
  27. Hyvarinen, A. Estimation of Non-Normalized Statistical Models by Score Matching. J. Mach. Learn. Res. 2005, 6, 695–709. [Google Scholar]
  28. Vogel, M.; Tateno, K.; Pollefeys, M.; Tombari, F.; Rakotosaona, M.J.; Engelmann, F. P2P-Bridge: Diffusion Bridges for 3D Point Cloud Denoising. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2024. [Google Scholar]
  29. Li, Y.; Sheng, H. A single-stage point cloud cleaning network for outlier removal and denoising. Pattern Recognit. 2023, 138, 109366. [Google Scholar] [CrossRef]
  30. Li, X.; Lu, J.; Ding, H.; Sun, C.; Zhou, J.T.; Meng, C.Y. Risk-optimized Outlier Removal for Robust 3D Point Cloud Classification. arXiv 2024, arXiv:2307.10875. [Google Scholar] [CrossRef]
  31. Chand, D.R.; Kapur, S.S. An Algorithm for Convex Polytopes. J. ACM 1970, 17, 78–86. [Google Scholar] [CrossRef]
  32. Graham, R.L. An Efficient Algorithm for Determining the Convex Hull of a Finite Planar Set. Inf. Process. Lett. 1972, 1, 132–133. [Google Scholar] [CrossRef]
  33. Edelsbrunner, H.; Kirkpatrick, D.; Seidel, R. On the shape of a set of points in the plane. IEEE Trans. Inf. Theory 1983, 29, 551–559. [Google Scholar] [CrossRef]
  34. dos Santos, R.C.; Galo, M.; Carrilho, A.C. Extraction of Building Roof Boundaries From LiDAR Data Using an Adaptive Alpha-Shape Algorithm. IEEE Geosci. Remote Sens. Lett. 2019, 16, 1289–1293. [Google Scholar] [CrossRef]
  35. Peethambaran, J.; Muthuganapathy, R. A non-parametric approach to shape reconstruction from planar point sets through Delaunay filtering. Comput.-Aided Des. 2015, 62, 164–175. [Google Scholar] [CrossRef]
  36. Yang, Q.; Li, Z.; Liu, Z.; Jiang, X.; Gao, X. The Circle Pure Rolling Method for Point Cloud Boundary Extraction. Sensors 2025, 25, 45. [Google Scholar] [CrossRef]
  37. Bode, L.; Weinmann, M.; Klein, R. BoundED: Neural Boundary and Edge Detection in 3D Point Clouds via Local Neighborhood Statistics. arXiv 2022, arXiv:2210.13305. [Google Scholar] [CrossRef]
  38. Xiang, P.; Wen, X.; Liu, Y.S.; Cao, Y.P.; Wan, P.; Zheng, W.; Han, Z. SnowflakeNet: Point Cloud Completion by Snowflake Point Deconvolution with Skip-Transformer. arXiv 2021, arXiv:2108.04444. [Google Scholar]
  39. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv 2019, arXiv:1912.01703. [Google Scholar]
  40. Chang, A.X.; Funkhouser, T.; Guibas, L.; Hanrahan, P.; Huang, Q.; Li, Z.; Savarese, S.; Savva, M.; Song, S.; Su, H.; et al. ShapeNet: An Information-Rich 3D Model Repository. arXiv 2015, arXiv:1512.03012. [Google Scholar]
  41. Canny, J. A Computational Approach to Edge Detection. In Readings in Computer Vision; Fischler, M.A., Firschein, O., Eds.; Morgan Kaufmann: San Francisco, CA, USA, 1987; pp. 184–203. [Google Scholar] [CrossRef]
  42. Bradski, G. The OpenCV Library. Dr. Dobb’s J. Softw. Tools 2000, 25, 120–123. [Google Scholar]
  43. Yuan, W.; Khot, T.; Held, D.; Mertz, C.; Hebert, M. PCN: Point Completion Network. arXiv 2019, arXiv:1808.00671. [Google Scholar]
  44. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  45. Fan, H.; Su, H.; Guibas, L.J. A Point Set Generation Network for 3D Object Reconstruction from a Single Image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 605–613. [Google Scholar]
  46. Achlioptas, P.; Diamanti, O.; Mitliagkas, I.; Guibas, L. Learning Representations and Generative Models for 3D Point Clouds. In Proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; pp. 40–49. [Google Scholar]
  47. Rusu, R.B.; Cousins, S. 3D is here: Point Cloud Library (PCL). In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 1–4. [Google Scholar] [CrossRef]
  48. Rusu, R.B. Semantic 3D Object Maps for Everyday Manipulation in Human Living Environments. KI-Künstliche Intelligenz 2010, 24, 345–348. [Google Scholar] [CrossRef]
  49. Anant, R.; Sunita, J.; Anand, J.; Kumar, M. A Density Based Algorithm for Discovering Density Varied Clusters in Large Spatial Databases. Int. J. Comput. Appl. 2010, 3, 1–4. [Google Scholar]
  50. Mao, A.; Yan, B.; Ma, Z.; He, Y. Denoising Point Clouds in Latent Space via Graph Convolution and Invertible Neural Network. In Proceedings of the 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 5768–5777. [Google Scholar] [CrossRef]
  51. Gao, Y.; Chen, W.; Yu, C.; Yao, Y.; Xiao, T. Automatic extraction of building elevation contours based on LIDAR data. In Proceedings of the 2017 4th International Conference on Systems and Informatics (ICSAI), Hangzhou, China, 11–13 November 2017; pp. 1352–1357. [Google Scholar] [CrossRef]
  52. Sun, D.; Fan, Z.; Li, Y. Automatic extraction of boundary characteristic from scatter data. J. Huazhong Univ. 2008, 36, 82–84. [Google Scholar]
  53. Yu, L.; Li, X.; Fu, C.W.; Cohen-Or, D.; Heng, P.A. EC-Net: An Edge-aware Point set Consolidation Network. arXiv 2018, arXiv:1807.06010. [Google Scholar]
  54. Xie, Y.; Tu, Z.; Yang, T.; Zhang, Y.; Zhou, X. EdgeFormer: Local patch-based edge detection transformer on point clouds. Pattern Anal. Appl. 2025, 28, 11. [Google Scholar] [CrossRef]
  55. Loizou, M.; Averkiou, M.; Kalogerakis, E. Learning Part Boundaries from 3D Point Clouds. Comput. Graph. Forum 2020, 39, 183–195. [Google Scholar] [CrossRef]
  56. Han, X.F.; Jin, J.S.; Wang, M.J.; Jiang, W. Iterative guidance normal filter for point cloud. Multimed. Tools Appl. 2018, 77, 16887–16902. [Google Scholar] [CrossRef]
  57. Duan, C.; Chen, S.; Kovacevic, J. Weighted multi-projection: 3D point cloud denoising with tangent planes. In Proceedings of the 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), Anaheim, CA, USA, 26–28 November 2018; pp. 725–729. [Google Scholar] [CrossRef]
  58. Leal, E.; Sanchez-Torres, G.; Branch, J.W. Sparse Regularization-Based Approach for Point Cloud Denoising and Sharp Features Enhancement. Sensors 2020, 20, 3206. [Google Scholar] [CrossRef] [PubMed]
  59. Alexa, M.; Behr, J.; Cohen-Or, D.; Fleishman, S.; Levin, D.; Silva, C. Computing and rendering point set surfaces. IEEE Trans. Vis. Comput. Graph. 2003, 9, 3–15. [Google Scholar] [CrossRef]
  60. Xu, Z.; Foi, A. Anisotropic Denoising of 3D Point Clouds by Aggregation of Multiple Surface-Adaptive Estimates. IEEE Trans. Vis. Comput. Graph. 2021, 27, 2851–2868. [Google Scholar] [CrossRef] [PubMed]
  61. Liu, Z.; Zhao, Y.; Zhan, S.; Liu, Y.; Chen, R.; He, Y. PCDNF: Revisiting Learning-Based Point Cloud Denoising via Joint Normal Filtering. IEEE Trans. Vis. Comput. Graph. 2024, 30, 5419–5436. [Google Scholar] [CrossRef]
  62. Zhang, D.; Lu, X.; Qin, H.; He, Y. Pointfilter: Point Cloud Filtering via Encoder-Decoder Modeling. arXiv 2020, arXiv:2002.05968. [Google Scholar] [CrossRef]
  63. Dong, Z.; Liang, F.; Yang, B.; Xu, Y.; Zang, Y.; Li, J.; Yuan, W.; Dai, W.; Fan, H.; Hyyppä, J.; et al. Registration of large-scale terrestrial laser scanner point clouds: A review and benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 163, 327–342. [Google Scholar] [CrossRef]
Figure 1. The PcBD Network. The structure contains three parts: Feature-Extraction, Boundary-Detection and Smoothing. The network uses the predicted point-by-point labels and filters to remove outliers and predict projection boundaries, and finally outputs a set of 3D vectors to smooth the projection boundaries.
Figure 2. The generation route of the proposed Bound57 dataset. By inputting an .obj model, a simulated output point cloud from a ToF camera and ground truths (GT) with labels for various point cloud processing tasks can be obtained.
Figure 4. The Boundary-Detecting block, with detailed structure of the proposed 2D detection process and the Cross Transformer.
Figure 5. Illustration of the inherent limitation of ToF cameras when detecting projection boundaries.
Figure 6. The Smoothing block.
Figure 7. The transformation from a depth map to a point cloud.
Figure 8. The method to back-project the boundary point cloud.
Figure 9. The simulation of ToF imaging in the Bound57 dataset.
Figure 10. Visual comparison of outlier-removing.
Figure 11. Visual comparison of boundary detection. We only show the inputs with outliers; the outlier points are removed before applying the compared methods.
Figure 12. Visual comparison of boundary smoothing. We only show the inputs with outliers; the outlier points are removed before applying the compared methods, and the projection boundaries of the compared methods are extracted using ground-truth boundary labels. * Results on the extracted ground truth boundaries.
Figure 13. A comparison between the PcBD processing flow and a classic processing flow which applies multiple methods.
Figure 14. The output of PcBD on real-world parachute scans.
Figure 15. Test results of PcBD on the WHU-TLS dataset. (a) Pedestrians and a vehicle. (b) Trees. We change the number of points and projection direction within a scene to test the adaptability of PcBD.
Figure 16. The attempts of PcBD to extract 3D point cloud boundaries.
Table 1. Outlier-removal results on Bound57 under Chamfer-L1 distance (×10³, in millimeters).

Methods/Results | Avg. | Airplane | Bench | Bicycle | Lamp | Microphone | Piano | Stove | Watercraft
DMRde-noise [24] | 14.43 | 13.83 | 13.77 | 14.29 | 16.48 | 17.46 | 14.10 | 15.26 | 12.49
PointCleanNet [22] | 5.02 | 4.39 | 4.68 | 5.72 | 4.48 | 5.93 | 6.13 | 2.79 | 5.81
Score de-noise [26] | 3.90 | 3.73 | 3.89 | 4.33 | 3.74 | 3.56 | 3.87 | 3.98 | 3.83
PD-LTS-Heavy [50] | 3.79 | 3.17 | 3.92 | 4.39 | 3.13 | 3.22 | 3.85 | 4.08 | 3.62
PD-LTS-Light [50] | 3.15 | 2.65 | 3.45 | 4.18 | 2.61 | 2.66 | 3.17 | 3.35 | 3.10
DBSCAN [49] | 2.31 | 0.81 | 1.54 | 1.30 | 2.00 | 3.00 | 2.89 | 2.46 | 1.11
SOR [48] | 1.38 | 0.58 | 0.96 | 0.98 | 1.10 | 1.07 | 1.70 | 1.77 | 0.73
ROR [47] | 0.95 | 0.26 | 0.61 | 0.58 | 0.58 | 0.56 | 1.02 | 1.42 | 0.46
PcBD (Ours) | 0.10 | 0.07 | 0.13 | 0.23 | 0.05 | 0.04 | 0.11 | 0.09 | 0.10
Table 2. Boundary-detecting results on Bound57 under Chamfer-L1 distance (×10³, in millimeters).

Methods/Results | Avg. | Bag | Bowl | Car | Chair | Mug | Printer | Telephone
Normal-based [52] | 38.78 | 45.26 | 61.28 | 39.80 | 34.60 | 55.84 | 40.78 | 33.62
Ada α-Shapes [34] | 25.18 | 25.77 | 55.62 | 25.27 | 19.44 | 47.28 | 46.39 | 19.07
α-Shapes [33] | 23.02 | 27.11 | 48.50 | 19.79 | 16.62 | 29.50 | 50.12 | 27.62
Grid-Contour [51] | 17.30 | 13.36 | 32.65 | 16.40 | 14.89 | 31.14 | 29.20 | 20.21
PcBD (Ours) | 7.24 | 10.55 | 14.20 | 7.08 | 7.72 | 9.54 | 6.42 | 3.23
Table 3. Boundary-smoothing results on Bound57 under Chamfer-L1 distance (×10³, in millimeters).

Methods/Results | Avg. | Bathtub | Camera | Earphone | Flowerpot | Motorbike | Pillow | Rifle | Train
DMRde-noise [24] | 61.99 | 81.24 | 55.51 | 54.01 | 81.91 | 33.96 | 63.26 | 25.12 | 37.30
PointCleanNet [22] | 59.90 | 66.98 | 57.65 | 43.59 | 73.29 | 31.21 | 75.18 | 23.54 | 46.42
PD-LTS-Heavy [50] | 16.03 | 18.86 | 15.90 | 13.21 | 17.75 | 11.86 | 22.59 | 8.21 | 12.53
Score de-noise [26] | 14.85 | 17.39 | 14.10 | 11.75 | 17.35 | 10.67 | 19.18 | 6.89 | 11.45
PD-LTS-Light [50] | 13.33 | 15.56 | 12.89 | 12.32 | 15.56 | 10.53 | 15.94 | 7.13 | 10.55
Bilateral Filter [10] | 12.03 | 14.48 | 11.46 | 9.42 | 13.95 | 8.99 | 14.90 | 5.76 | 8.60
AdaMLS [60] | 11.01 | 11.43 | 9.97 | 7.70 | 14.11 | 7.71 | 10.87 | 5.23 | 8.39
SparseReg [58] | 11.85 | 13.59 | 11.31 | 10.36 | 12.76 | 9.67 | 12.09 | 8.27 | 9.97
Iter-norm-filter [56] | 10.97 | 13.06 | 10.30 | 9.50 | 13.05 | 8.24 | 12.27 | 5.63 | 8.04
PointFilter [62] * | 10.86 | 12.78 | 10.33 | 8.96 | 12.24 | 8.73 | 12.41 | 6.85 | 8.19
W-Multi-Proj [57] | 10.29 | 12.86 | 9.04 | 10.00 | 12.32 | 8.36 | 12.90 | 5.57 | 7.72
MLS [59] | 10.21 | 12.51 | 9.86 | 8.62 | 12.34 | 8.68 | 11.33 | 5.41 | 7.98
PCDNF [61] * | 9.28 | 11.08 | 8.55 | 7.59 | 10.87 | 7.43 | 10.69 | 5.08 | 6.71
PcBD (Ours) | 9.92 | 8.29 | 7.04 | 19.60 | 13.24 | 8.75 | 9.68 | 4.37 | 8.05
* Results on the extracted ground truth boundaries.
Table 4. Ablation studies of the proposed PcBD. Results are under Chamfer-L1 distance (×10³, in millimeters).

Version | Outlier Removing | Boundary Detecting | Smoothing
PcBD | 0.100 | 7.239 | 9.918
PCA Normal Calculation | 0.109 | 7.533 | 10.201
Regular Conv in Set-Abstraction | 0.116 | 8.078 | 10.638
Regular PT in FE | 0.111 | 8.031 | 10.545
Only 2D Projection in BD | 0.107 | 8.338 | 10.782
No Cross Attention (2D and 3D) | 0.105 | 7.705 | 9.999
Smoothing Only Using 3D Boundary | 0.100 | 7.241 | 10.504
