1. Introduction
With the advancement of marine science and engineering, the demand for underwater structure monitoring has been steadily increasing. Traditional monitoring methods are primarily based on optical measurement techniques, often requiring diver-assisted operations [1]. These methods impose strict water quality requirements and pose risks to personnel. At present, multibeam sonar enables three-dimensional underwater point cloud mapping; however, the density of the measured points depends on the number of beams, typically 512 or 1024, which fails to meet high-precision monitoring requirements [2]. Other sonar systems, such as mechanically scanned sonar and side-scan sonar, provide high-resolution two-dimensional imaging but lack three-dimensional monitoring capabilities [3].
The emergence of 3D (three-dimensional) sonar technology has addressed the limitations of optical imaging and conventional sonar measurement techniques, such as multibeam and side-scan sonar, in terms of imaging coverage and measurement accuracy [2]. This advancement provides essential technical support for high-precision three-dimensional monitoring of underwater structures. However, 3D sonar measurements are susceptible to interference from complex underwater environments, often resulting in substantial noise, which hinders their broader application. For instance, suspended particles in the water generate clustered outliers, multiple reflections of acoustic waves between underwater objects introduce spurious noise points, and scattering from object surfaces and volume scattering within objects further degrade measurement accuracy [4]. Thus, effective data filtering is urgently needed. However, challenges such as acoustic shadows and complex underwater structure geometries complicate the filtering process [5]. Overall, current 3D sonar filtering techniques remain underdeveloped relative to their extensive application demands, making it difficult to satisfy the growing need for underwater monitoring. In particular, with the increasing construction of offshore wind farms and the rising demand for underwater infrastructure maintenance, enhancing 3D sonar data filtering capabilities is crucial for advancing automation and intelligent applications in underwater structure monitoring [6,7].
To address the filtering challenges in 3D sonar data, we review the existing research on underwater point cloud filtering.
Early international studies on bathymetric data outlier detection predominantly employed methods such as median/mean filtering [8], threshold filtering [9], and angle and gradient filtering [10]. These methods cannot remove clustered noise because they do not take information from a large neighborhood into account.
Filtering techniques based on trend surfaces, which can take a wide range of information into consideration, have been widely applied in sonar data processing [11]. These methods fit a quadratic polynomial surface function to approximate the actual seabed topography and identify outliers based on deviations from the fitted values. While easy to implement, these algorithms perform poorly in areas with complex terrain variations. To improve trend surface filtering, an enhanced algorithm was presented that incorporates the influence domain of natural neighboring points, achieving localized trend surface fitting and filtering. However, this approach suffers from high computational costs when processing large datasets [12].
Building on surface filtering, the cloth simulation filtering algorithm was introduced for removing gross errors from underwater point clouds under the assumption that all outliers lie above the surface [13,14]. When this assumption holds, it achieves good performance, but it cannot deal with more complex cases. To better address complex noise in 3D sonar data, He et al. proposed a partitioned filtering method that constructs local coordinate systems for point cloud blocks, fits a trend surface, and applies Grubbs’ test for adaptive threshold-based noise removal. However, this method struggles with clustered noise [15]. Overall, despite the success of statistical methods in large-scale point cloud denoising, they are highly sensitive to parameter selection, making them challenging to apply in complex environments.
Apart from the above surface-based methods, there are some statistical methods. Traditional LiDAR point cloud filtering methods often include slope-based filtering algorithms, morphological filtering algorithms, fitting-based filtering algorithms, and filtering based on triangulated irregular networks (TIN) [16,17,18,19]. Slope-based filtering algorithms differentiate outliers based on slope variations between neighboring points. Morphological filtering algorithms apply mathematical morphology operations (e.g., erosion, dilation, and opening) to remove outliers. Fitting-based filtering models surfaces using mathematical functions (e.g., plane or surface fitting) to determine whether a point belongs to a structural surface or is an outlier. TIN-based filtering constructs a TIN to analyze relative height relationships for noise removal. However, most of these methods focus on terrain data and struggle with the complex underwater structures found in sonar point cloud data.
- 2. Deep Learning-Based Methods
Recent advancements in deep learning have introduced innovative approaches to noise removal. Neural network architectures [20,21,22] can be trained to learn complex patterns in point clouds and effectively identify outliers, enabling end-to-end data filtering. Rakotosaona et al. [23] leveraged PCPNet [24] for the robust processing of densely sampled point clouds with significant noise. Hu et al. introduced RandLA-Net [25], an efficient and lightweight architecture for point-wise classification in large-scale point clouds, effectively identifying and removing anomalies. However, deep learning for underwater point cloud processing requires large-scale annotated underwater datasets, necessitating extensive underwater operations and auxiliary judgment using underwater imaging equipment, which significantly increases the costs. As a result, despite the widespread adoption of 3D sonar in underwater structure monitoring, no open datasets currently exist, and limited sample availability hinders the development of deep learning models, increasing the difficulty of intelligent processing [26].
To this end, we propose a method that captures the structural information of 3D sonar point cloud data and, on this basis, removes both far-surface and near-surface outliers, as well as clustered outliers, on the seafloor and on structures with near-vertical surfaces. The contributions of our work are as follows:
We employ plane features to effectively characterize the geometric attributes of the point cloud data.
These features are leveraged to precisely describe the inherent properties of the structural components within the scene.
By exploiting these distinct characteristics, our method successfully differentiates the structural point cloud from noise, enabling robust separation.
The core contribution of our work lies in the adaptation and systematic integration of these techniques specifically for 3D sonar data.
The rest of this paper is organized as follows. Section 2 introduces the theory and characteristics of 3D sonar data. Section 3 gives a detailed description of the proposed method. Section 4 and Section 5 describe and analyze the experimental results. Finally, our conclusions are drawn in Section 6. The flowchart of the proposed method is shown in Figure 1.
3. Methods
3.1. 3D Spatial Feature
In point cloud processing, traditional methods typically employ Least Squares (LS) fitting to estimate local surface normal vectors by computing the covariance matrix of the point cloud and performing eigenvalue decomposition. However, LS assumes that errors exist only in the z-direction, neglecting the fact that measurement errors can occur in all three coordinate directions. To achieve a more robust normal estimation, this study adopts the Total Least Squares (TLS) method [15], which solves for local plane parameters in three-dimensional space using Singular Value Decomposition (SVD).
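The point set A = {p1(x1, y1, z1), p2(x2, y2, z2), …, pm(xm, ym, zm)} can also be represented as $A = \{p_i \mid i = 1, 2, \ldots, m\}$ [15]. We select the $k$-neighbor set of $p_i$:
$$P_i = \{p_j\},$$
where $j$ is the index of the neighboring point $p_j$ in point set $A$. Then, the geometric center of the neighboring points of $p_i$ is $\bar{p}_i(\bar{x}_i, \bar{y}_i, \bar{z}_i)$, where
$$\bar{p}_i = \frac{1}{k} \sum_{p_j \in P_i} p_j.$$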
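In TLS, $\{P_i\}$ is used to construct the decentralized matrix $M \in \mathbb{R}^{k \times 3}$, whose rows are the neighbors of $p_i$ after subtracting their geometric center $\bar{p}_i$:
$$M = \begin{bmatrix} \vdots \\ (p_j - \bar{p}_i)^{T} \\ \vdots \end{bmatrix}, \quad p_j \in P_i.$$
SVD is then applied to $M$:
$$M = U \Sigma V^{T},$$
where $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3)$ is the singular value matrix, satisfying $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge 0$. The last column of the right singular matrix $V$ (corresponding to $\sigma_3$) is the normal vector of $p_i$.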
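As a minimal sketch of the normal estimation described above, the following pure-Python function fits a plane to one neighborhood. It assumes that the eigenvector of the smallest eigenvalue of the 3 × 3 scatter matrix (the product of the transposed decentralized matrix with itself) can stand in for the last right singular vector of the SVD; the two coincide, and the singular values are the square roots of the eigenvalues. The function name `tls_normal` and the closed-form eigenvalue solver are illustrative choices, not the authors' C++/PCL implementation.

```python
import math


def tls_normal(neighbors):
    """Return (unit normal, [s1, s2, s3]) for a list of 3D neighbor points."""
    k = len(neighbors)
    # Geometric center of the neighborhood.
    c = [sum(p[d] for p in neighbors) / k for d in range(3)]
    # Scatter matrix C = M^T M of the centered coordinates (3 x 3, symmetric).
    C = [[sum((p[a] - c[a]) * (p[b] - c[b]) for p in neighbors)
          for b in range(3)] for a in range(3)]

    # Closed-form eigenvalues of a symmetric 3 x 3 matrix (trigonometric method).
    q = (C[0][0] + C[1][1] + C[2][2]) / 3.0
    p1 = C[0][1] ** 2 + C[0][2] ** 2 + C[1][2] ** 2
    p2 = sum((C[d][d] - q) ** 2 for d in range(3)) + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    if p == 0.0:
        raise ValueError("isotropic neighborhood: the normal is undefined")
    B = [[(C[a][b] - (q if a == b else 0.0)) / p for b in range(3)]
         for a in range(3)]
    det_b = (B[0][0] * (B[1][1] * B[2][2] - B[1][2] * B[2][1])
             - B[0][1] * (B[1][0] * B[2][2] - B[1][2] * B[2][0])
             + B[0][2] * (B[1][0] * B[2][1] - B[1][1] * B[2][0]))
    phi = math.acos(max(-1.0, min(1.0, det_b / 2.0))) / 3.0
    lam1 = q + 2.0 * p * math.cos(phi)                          # largest
    lam3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)    # smallest
    lam2 = 3.0 * q - lam1 - lam3

    # Eigenvector of lam3: it is orthogonal to every row of (C - lam3 * I),
    # so the cross product of two independent rows gives its direction.
    R = [[C[a][b] - (lam3 if a == b else 0.0) for b in range(3)]
         for a in range(3)]

    def cross(u, v):
        return [u[1] * v[2] - u[2] * v[1],
                u[2] * v[0] - u[0] * v[2],
                u[0] * v[1] - u[1] * v[0]]

    best = max((cross(R[i], R[j]) for i, j in ((0, 1), (0, 2), (1, 2))),
               key=lambda v: sum(x * x for x in v))
    norm = math.sqrt(sum(x * x for x in best))
    if norm == 0.0:
        raise ValueError("collinear neighborhood: the normal is not unique")
    normal = [x / norm for x in best]
    singular_values = [math.sqrt(max(lam, 0.0)) for lam in (lam1, lam2, lam3)]
    return normal, singular_values
```

The sign of the returned normal is arbitrary, since a plane has two opposite unit normals; any later comparison of normals has to account for this.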
A matrix built from local three-dimensional point cloud statistics can be used to divide LiDAR data into spherical, linear and planar structures, as shown in Figure 2. Therefore, the singular values are used to describe the structural features of the neighborhood space of each point. As shown in (6), we select the most likely structure type as the structural feature of $p_i$.
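To illustrate how singular values can separate the three structure types, the sketch below uses the widely used dimensionality features (linearity, planarity, sphericity) and picks the largest one. This particular formula is an assumption made for illustration; the paper's own Equation (6) may define the features differently. The function name `structure_type` is hypothetical.

```python
def structure_type(s1, s2, s3):
    """Classify a neighborhood from singular values s1 >= s2 >= s3 >= 0.

    Assumed features (not taken from the paper):
      linearity  = (s1 - s2) / s1
      planarity  = (s2 - s3) / s1
      sphericity = s3 / s1
    """
    if s1 <= 0.0:
        raise ValueError("degenerate neighborhood: all points coincide")
    features = {
        "linear": (s1 - s2) / s1,
        "planar": (s2 - s3) / s1,
        "spherical": s3 / s1,
    }
    # The structure with the largest feature value is taken as the label.
    return max(features, key=features.get)
```

With these assumed features, one dominant singular value indicates a linear neighborhood, two dominant values indicate a planar one, and three values of similar size indicate scattered (spherical) points.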
At this point, the TLS method has been used to calculate the normal vector and structural features of each point in 3D space. These operations are performed on all points, providing a basis for the subsequent region growing segmentation of the point cloud and for noise removal. In the next section, we aggregate similar structures.
3.2. Region Growth Segmentation of Point Cloud Region Based on Spatial Features
The region growing algorithm is adopted to segment a point cloud dataset A = {p1(x1, y1, z1), p2(x2, y2, z2),…, pm(xm, ym, zm)} into n clusters with distinct attributes, where each cluster is a set of points C{p1, p2, …} and the clusters together form Clusters{C1, C2, …, Cn}. For a point cloud dataset A containing m points obtained from object scanning, extracting the target region while removing the surrounding noise requires region growing segmentation based on the 3D spatial distribution of the points.
The region growing algorithm progressively aggregates points with similar attributes to identify and segment different objects in the point cloud, thereby isolating the target region, namely the area where the structures of interest exist. In point cloud A, there are often spatial distribution differences between surfaces of structures or between surfaces and noise points, whereas points belonging to the same surface or adjacent regions typically exhibit smaller differences. To extract surfaces and separate noise points, we introduce the fitting residual $S_d$. For any point $p(x, y, z) \in A\{p_1, p_2, \ldots, p_m\}$, the distance from $p$ to the plane $S$ fitted by its k-nearest neighbors is given by
$$S_d = \frac{\left| a(x - \bar{x}) + b(y - \bar{y}) + c(z - \bar{z}) \right|}{\sqrt{a^2 + b^2 + c^2}},$$
where $(\bar{x}, \bar{y}, \bar{z})$ is the average coordinate of the neighbors of $p$, and $(a, b, c)$ is the normal vector of $p$.
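The fitting residual is the ordinary point-to-plane distance, which can be sketched in a few lines. The function name `fitting_residual` and the argument layout (tuples of three floats) are assumptions for illustration only.

```python
import math


def fitting_residual(point, centroid, normal):
    """Distance from `point` to the plane through `centroid` with `normal`."""
    # Projection of (point - centroid) onto the normal direction.
    numerator = abs(sum(n * (p - c)
                        for n, p, c in zip(normal, point, centroid)))
    # Dividing by the normal's length makes the result valid for
    # non-unit normals as well.
    denominator = math.sqrt(sum(n * n for n in normal))
    return numerator / denominator
```

For example, a point 2 m above a horizontal plane gives a residual of 2 regardless of how the normal is scaled.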
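At the same time, different planar objects are segmented according to the angle difference between normal vectors. For the normal vectors $n_i$ and $n_j$ of any two points $p_i$ and $p_j$ in $A$, the angle difference between them is
$$\theta_{ij} = \arccos\left( \frac{n_i \cdot n_j}{\lVert n_i \rVert \, \lVert n_j \rVert} \right).$$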
Then, the region growing segmentation process is as follows:
- 1. Set three thresholds according to the spatial distribution of the points in A: a residual threshold for the point-to-plane fit, a small included-angle difference threshold, and a large included-angle difference threshold, as shown in Figure 3.
- 2. Select any point p from the unsegmented points in A, add it to the empty seed point set seeds{} and to the point cloud cluster C1{}, and mark p as segmented.
- 3. Starting from point p, find all points in its neighborhood P{p1, p2, …, pk}. For each point pi in P, if the attribute difference between pi and p is below the corresponding threshold, pi is added to C1{}; if the seed criterion is below its threshold, pi is added to seeds{}. Then remove p from seeds{}.
- 4. Iterate over the remaining points in seeds{} and repeat step 3 until seeds{} is empty, at which point the point cloud cluster C1 is stored.
- 5. Repeat steps (2)–(4) until all points in A have been processed, which completes the storage of Clusters{C1, C2, …, Cn}.
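The five steps above can be sketched as follows. Because the exact role of each threshold is not recoverable from the text, this sketch makes explicit assumptions: a neighbor joins the cluster when its normal differs from the current seed's normal by less than the large angle threshold, and it becomes a new seed only when the angle is also below the small threshold and its fitting residual is below the residual threshold. The absolute value in `angle_deg` is a further assumption that treats a normal and its negation as the same plane. All names (`region_grow`, `t_res`, `t_small`, `t_large`) are hypothetical, and `neighbors` is assumed to be a precomputed list of neighbor indices per point.

```python
import math


def angle_deg(n1, n2):
    """Angle in degrees between two normals, ignoring their sign."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norm = (math.sqrt(sum(a * a for a in n1))
            * math.sqrt(sum(b * b for b in n2)))
    return math.degrees(math.acos(min(1.0, abs(dot) / norm)))


def region_grow(normals, residuals, neighbors, t_res, t_small, t_large):
    """Segment point indices into clusters (lists of indices)."""
    segmented = [False] * len(normals)
    clusters = []
    for start in range(len(normals)):
        if segmented[start]:
            continue
        # Step 2: start a new cluster and seed set from an unsegmented point.
        segmented[start] = True
        cluster = [start]
        seeds = [start]
        # Steps 3-4: grow until no seeds remain.
        while seeds:
            p = seeds.pop()
            for q in neighbors[p]:
                if segmented[q]:
                    continue
                diff = angle_deg(normals[p], normals[q])
                if diff < t_large:              # assumed membership test
                    segmented[q] = True
                    cluster.append(q)
                    if diff < t_small and residuals[q] < t_res:
                        seeds.append(q)         # assumed seed test
        clusters.append(cluster)
    # Step 5: the outer loop ends once every point has been processed.
    return clusters
```

Points that join a cluster without becoming seeds act as cluster boundaries: they are kept but do not propagate the growth further.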
After the region growing segmentation of all points, any point cloud cluster Ci{p1, p2, …} can be analyzed according to the point characteristics given by Equation (6), and its features are expressed in terms of the features of its points. Then, we can further describe the structural features of Clusters{C1, C2, …, Cn}.
3.3. Noise Removal Considering the Spatial Distribution and Structural Features of Point Clouds
With the spatial characteristics of the 3D point cloud obtained, the region growing algorithm gathers points with the same spatial distribution into clusters. It is then necessary to accurately extract the target object and remove noise according to the spatial distribution and structural characteristics of these clusters.
Since point cloud dataset A is obtained by scanning structures, it contains a large amount of planar information together with surrounding noise points. Some noise points will inevitably form a few planar structures, but such noise is usually sparse. Therefore, it is necessary not only to apply structural feature constraints but also to remove this noise with a filtering method.
The noise removal process is as follows:
- 1. Statistical analysis of the spatial distances in point cloud dataset A. The neighborhood of every point in A is analyzed: for any $p_i \in A$, the average neighborhood distance from point $p_i(x_i, y_i, z_i)$ to its k-nearest-neighbor set $\{P_i\}$ is
$$\bar{d}_i = \frac{1}{k} \sum_{p_j \in P_i} \lVert p_i - p_j \rVert_2,$$
where $\lVert p_i - p_j \rVert_2$ represents the distance between $p_i$ and $p_j$.
- 2. Spatial distance calculation for Clusters{C1, C2, …, Cn}. For any Ci{p1, p2, …} ∈ Clusters{C1, C2, …, Cn}, the neighboring distance of Ci is computed from the average neighborhood distances of its points.
- 3. Object extraction and noise removal with structural feature and spatial distance constraints. The point cloud clusters that have the characteristics of a planar structure and are relatively concentrated in spatial distribution are extracted (in our case, the structures are planar-like), as expressed in (12), where Tr is the threshold of the neighboring distance.
If a cluster Ci satisfies (12), it is a point cluster belonging to the structures. The threshold Tr is related to the performance parameters of the sonar and can usually be calculated from the angular resolution of the sonar and the detection distance. All point clusters that meet the conditions are merged; that is, the structure point cloud is extracted and the noise is removed.
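The three steps above can be sketched as a brute-force example. Two details are assumptions rather than the paper's definitions: the neighboring distance of a cluster is taken as the mean of its points' average neighborhood distances, and the structure type of a cluster is taken as the majority label of its points. The names `mean_knn_distance`, `extract_structures`, and `t_r` are hypothetical, and the quadratic-time neighbor search is only suitable for small inputs.

```python
import math
from collections import Counter


def mean_knn_distance(points, k):
    """Average distance from each point to its k nearest neighbors."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        result.append(sum(dists[:k]) / k)
    return result


def extract_structures(clusters, point_dist, point_label, t_r):
    """Keep clusters that are planar (majority vote) and dense enough."""
    kept = []
    for cluster in clusters:
        # Assumed cluster feature: most frequent point label in the cluster.
        label = Counter(point_label[i] for i in cluster).most_common(1)[0][0]
        # Assumed cluster distance: mean of the per-point average distances.
        dist = sum(point_dist[i] for i in cluster) / len(cluster)
        if label == "planar" and dist < t_r:
            kept.append(cluster)
    return kept
```

In this sketch a sparse planar cluster is rejected by the distance test and a dense non-planar cluster is rejected by the label test, so both constraints have to hold for a cluster to be kept.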
4. Experiments
4.1. 3D Sonar Point Cloud Data
The BV5000 3D Mechanical Scanning Sonar (Teledyne BlueView, Bothell, WA, USA) was used to observe each cabin, and the observation results are shown in the table below. There is a circular blind spot at the bottom of each scan, which is caused by the way the 3D sonar works. The BV5000 serves here as an example of a three-dimensional sonar system: it is available in two models, which operate at frequencies of 2.25 MHz and 1.35 MHz, respectively. Table 1 describes the main system parameters of the BV5000.
The data used in this study are from Changjiang underwater caisson observation, and the dataset contains caissons. Caisson foundations are widely employed in engineering construction, typically installed by excavating soil from beneath the structure to facilitate sinking. For this study, two representative regions within the research area were selected for experimentation, encompassing both structural data and surrounding noise (see Figure 4; noise clusters are highlighted in the red box). The two datasets, denoted as Data 1 and Data 2, comprise 2,624,672 and 2,417,866 points, respectively.
4.2. The Experimental Process
In this experiment, C++ and the PCL library are used for point cloud data processing. The experimental process for Data 1 was analyzed to verify the effectiveness of the proposed method. The analysis mainly focused on the following core steps: (1) extracting the three-dimensional spatial features of the point cloud based on TLS; (2) point cloud segmentation based on the region growing algorithm; (3) target extraction by combining the structural characteristics and spatial distribution information of the point cloud. For our method, the point-to-plane fitting residual threshold is set to 0.1 m and the angle threshold to 20°.
According to the method proposed in Section 3.1 of this paper, the three-dimensional sonar point cloud is processed in the experiment. First, the TLS method is adopted, the number of neighborhood points is set to k = 30, and the local features of the point cloud in three-dimensional space are solved by singular value decomposition. As shown in Figure 5a, yellow points represent scattered points, red points represent linear structure points, and blue points represent planar structure points. The experimental results show that this method can identify the spatial structure features of different points accurately. However, there are still some misclassified points in the planar area. These points actually belong to the structure, so preserving them is essential for the complete reconstruction of the structure area.
In fact, it can be observed that the main structure of the caisson presents a regular geometric shape: the bottom surface of the caisson’s shaft is square, the sides are approximately rectangular, the bottom is wider, tapering towards the middle section, the top is slightly narrower, and the length and width of the bottom are approximately 10 m, with a shaft height of about 10 m. The bottom of the data is the underwater terrain point cloud. This part contains circular voids, which are caused by the blind area of the sonar scanning. At areas with terrain undulations, there will also be sonar shadows. When the terrain rises, echo data will be displayed on the side close to the sonar equipment in the raised part, while the other side at the back is affected by terrain obstruction and has data missing, forming a shadow. Similarly, in the terrain depression area, shadows will also be produced. Unlike the raised part, in the depression area of the terrain, a shadow will appear on the side close to the sonar equipment.
To further optimize the point cloud segmentation, the region growing algorithm is introduced, and the point-to-plane fitting residual and the normal vector angle are combined to optimize the point cloud clustering. To correct the misclassified points in Figure 5a, the fitting residual threshold was set to three times the standard deviation about the mean of the fitting residuals, and the normal vector angle threshold was set to 5°, to effectively distinguish different planes and noise points. The segmentation results are shown in Figure 5b, where different colors represent different point cloud clusters. The purple point cloud area represents the structural point cloud, and its recognition is significantly improved, although some discrete point cloud clusters remain around it.
According to the obtained point cloud structural characteristics (Figure 5a), the spatial structure characteristics of each point cloud cluster are analyzed based on the cluster feature expression in Section 3.2. The structural characteristics of the point cloud after region growing are shown in Figure 5c. This shows that the method can strike a balance between local structural features (Figure 5a) and global region growing segmentation (Figure 5b) and obtain a better feature characterization. However, there are still limitations in selecting target point clouds by relying on spatial structure features alone. For example, the noise points in the red box area in Figure 5c are easily misidentified as planar point clouds due to their dense distribution and proximity to the surface. Therefore, to further remove interference points, the filtering method described in Section 3.3 was applied.
The neighborhood spatial analysis of point cloud dataset is carried out, the average neighborhood spatial distance of each point p is calculated, and the whole neighborhood distance of each point cloud cluster Ci is calculated. Based on the spatial distribution difference between the discrete point cloud cluster and the target point cloud cluster, the threshold of neighborhood spatial distance is set as the exclusion criterion.
The experimental results are shown in Figure 5d. The black point cloud represents the noise identified by the filtering method in Section 3.3, and the remaining colored point clouds meet the neighborhood spatial distance threshold. These point clouds are further checked against Equation (12) for feature verification, in order to select the final target point set.
Finally, according to the results (Figure 5d) and the structural characteristics of the point clouds after region growing (Figure 5c), the structural point clouds matching the planar characteristics (purple point clouds in Figure 6a) are selected, and the noise point clouds (black point clouds in Figure 6a) are eliminated. The final target object extraction results are shown in Figure 6b, indicating that the proposed method can effectively remove the noise points while maintaining the integrity and structural consistency of the target point cloud.
We also carried out the same process for Data 2, and the processing result is shown in Figure 7. A large amount of noise in the data has been filtered out, and the main structure has been well retained. Some points are missing in the lower part of the data. This is mainly because these points were not captured during measurement, rather than because of over-filtering by the proposed algorithm.
4.3. Accuracy Evaluation
To evaluate the denoising performance of the proposed combination of spatial features and region growing, Precision, Recall and F1 score were used as evaluation indicators in this study. Precision measures the proportion of the points identified by the algorithm as target points that are actually target points. Recall reflects the proportion of the actual target points that are successfully detected by the algorithm. The F1 score is the harmonic mean of precision and recall and is used to comprehensively evaluate the accuracy and coverage ability of the algorithm.
TP (True Positive) indicates the number of targets correctly recognized, FP (False Positive) indicates the number of noise points incorrectly recognized as targets, and FN (False Negative) indicates the number of targets that are not detected.
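Using the TP, FP and FN counts defined above, the three indicators follow their standard definitions, sketched below. The function names are illustrative. As a consistency check on the reported figures, the second helper recomputes the F1 score from a precision and recall pair: with the Data 2 values quoted in this paper (precision 85.64%, recall 83.38%), it gives approximately 84.49%, which matches the reported F1 score.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F1 from point counts."""
    precision = tp / (tp + fp)   # share of predicted targets that are real
    recall = tp / (tp + fn)      # share of real targets that were found
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


def f1_from_rates(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2.0 * precision * recall / (precision + recall)
```

The first function works on raw counts, while the second accepts rates or percentages directly, which is convenient when only the published percentages are available.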
The ground truth data are shown in Figure 8 and Table 2. It can be seen from the experimental results in Table 2 that the proposed method achieved a precision of 78.93% and 85.64% in the experiments on Data 1 and Data 2, respectively, demonstrating a good ability to suppress misclassification. In terms of Recall, the proposed method achieves 83.38% on Data 2, indicating that it retains the real target points when extracting the target point cloud.
The results differ significantly between the two datasets. Each dataset can be divided into two distinct regions: the shaft wall and the bottom surface. Initial evaluations indicate that all methods exhibit relatively lower accuracy on Data 1. Upon detailed comparative analysis, we identified that the primary discrepancy lies within the point clouds of the bottom surface. Data 2 contains a denser distribution of bottom points, with a significant number of points appearing outside the shaft wall. During the manual generation of ground truth, these points were retained as valid structural features rather than being classified as noise. In contrast, Data 1 contained fewer such points outside the wall, and these were predominantly filtered out as noise during manual processing. This suggests a potential human error in the annotation process, where valid points in the outer region were mistakenly labeled as noise. Consequently, the observed variations in accuracy are primarily attributed to deviations in the ground truth.
The proposed method utilizes TLS to extract 3D spatial features and optimizes the process through a combination of region growing and statistical filtering. This approach achieves a balance between local feature preservation and global segmentation, ensuring greater stability and generalization in complex environments. As a result, the method enhances the completeness of the target point cloud while reducing noise interference, demonstrating significant application value in 3D sonar point cloud processing.
5. Discussion
5.1. The Stability of the Algorithm for Threshold Setting
The threshold parameters were determined by analyzing data from small regions and observing the experimental results, after which they were extended to larger areas. Although underwater structures exhibit a certain degree of data complexity, the scenarios themselves are relatively uniform. Consequently, thresholds established in small regions demonstrate significant potential for application across larger domains.
To verify the stability of the algorithm with respect to threshold setting, we conducted an additional experiment with the angle threshold set to 50°. The included angle threshold is the most influential threshold, so it was the main one adjusted. The final results are shown in Figure 9 and Table 3. Overall, some indicators improved and some decreased, but the accuracy remained good in all cases, indicating that the proposed method is stable under different parameter settings.
5.2. Comparison and Ablation Experiments
To verify the performance of the algorithm, we conducted comparison experiments, using the statistical filtering method and the distance clustering method as comparison methods [15,27,28,29]. The core principle of statistical filtering is based on the local statistical characteristics of point cloud data. By calculating the average distance between each point and its neighboring points and comparing it with the global average distance, outliers that are far from the surrounding points can be identified and removed. The distance clustering method is a classification method based on the distance between points. The commonly used algorithm is Euclidean distance clustering, which calculates the Euclidean distance between points and groups points whose distance is less than a set threshold into one category. The results are shown in Figure 10, Figure 11 and Table 4. Overall, the accuracy of the distance clustering algorithm is higher than that of statistical filtering, but both are lower than that of the proposed method.
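The two baseline ideas described above can be sketched in a few lines. This is an illustration of the general techniques, not the configuration used in the experiments: the rejection rule (global mean plus `std_mul` standard deviations) follows common practice and is an assumption, as are the function names and the brute-force neighbor search.

```python
import math
import statistics


def statistical_filter(points, k, std_mul=1.0):
    """Return indices of points kept by a mean-kNN-distance outlier test."""
    mean_d = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q)
                       for j, q in enumerate(points) if j != i)
        mean_d.append(sum(dists[:k]) / k)
    # Assumed rule: reject points whose mean neighbor distance exceeds the
    # global mean by more than std_mul standard deviations.
    limit = statistics.mean(mean_d) + std_mul * statistics.pstdev(mean_d)
    return [i for i, d in enumerate(mean_d) if d <= limit]


def euclidean_clusters(points, tol):
    """Group point indices so that chained neighbors closer than tol merge."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = min(unvisited)            # deterministic starting point
        unvisited.remove(seed)
        cluster, queue = [seed], [seed]
        while queue:
            i = queue.pop()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) < tol]
            for j in near:
                unvisited.remove(j)
                cluster.append(j)
                queue.append(j)
        clusters.append(sorted(cluster))
    return clusters
```

Both baselines rely on point-to-point distance alone, so a dense clump of noise can pass the statistical test and form a cluster of its own.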
It can be seen that statistical filtering is prone to under-filtering, and some small clustered blocks are not handled well. This is because statistical filtering has difficulty distinguishing aggregated noise. The distance clustering method can handle some aggregated noise points, and the overall result is good. However, compared with the proposed method, its accuracy is still insufficient. The main reason is that a single distance feature is insufficient to describe the complex structures in underwater 3D sonar data.
Our algorithm also incorporates a statistical filtering strategy, yet it outperforms the standard statistical filtering method on the accuracy metrics. This comparison can therefore also be regarded as an ablation experiment, which shows that the adopted structural information is important for filtering. Both the statistical filtering and the distance clustering methods use only a single attribute of the point cloud, making it difficult to comprehensively describe the differences between structural point clouds and noise points. Therefore, their accuracy is also relatively low. In comparison, the proposed method achieves F1 scores of 78.65% and 84.49% in outlier removal, effectively eliminating noise while largely preserving structural features. For Data 1, our method achieved precision improvements of 0.38% and 0.07% compared to statistical filtering and distance clustering, respectively. In terms of recall, our method outperformed these baselines by 1.63% and 0.93%, respectively. Consequently, the F1 score saw gains of 1.02% and 0.51% over the two methods. For Data 2, our method improved precision by 0.13% over statistical filtering. Regarding recall, we observed improvements of 0.44% and 4.05% compared to statistical filtering and distance clustering, respectively. Finally, the F1 score increased by 0.28% and 0.14% relative to the respective baselines. The comparison is shown in Figure 12.
5.3. Limitations
We evaluated the method on two datasets acquired from Changjiang underwater caisson observations, both from the same structural category. This may be too narrow to support claims of robustness or generalization in complex underwater environments. However, this limitation stems from the scarcity of 3D underwater sonar data across different structural types and the lack of publicly available datasets. From a structural perspective, although we utilized data exclusively from caissons, other similar scenarios, such as bridge pier inspections and bank slope surveys, share comparable geometric characteristics, primarily consisting of planar or curved surfaces. Furthermore, unlike optical imaging, acoustic imaging is only slightly affected by the underwater environment. Therefore, we maintain that our method possesses a certain degree of generalization capability and practical value for these analogous structures. In terms of calculation speed, the method presented in this paper is slower than the methods it is compared with. However, we believe that the additional time spent on calculating the structural features is worthwhile for obtaining better feature description performance.
Finally, while our method demonstrates improvements over existing approaches, these gains are incremental rather than overwhelming; nevertheless, our approach exhibits a consistent overall advantage. More importantly, we believe this work validates a promising research direction: explicitly incorporating structural characteristics is an effective strategy for processing 3D sonar point clouds.