1. Introduction
In recent years, continued development of sonar equipment has made underwater point cloud data easier to acquire, with steadily improving quality and cost-effectiveness. Such data are widely used in the three-dimensional modeling of large scenes, underwater target identification and detection, and underwater pile foundation inspection in water-related projects [1,2,3,4,5,6,7,8]. For pile foundation inspection, a sonar is installed on a two-degrees-of-freedom head, and the scanning is driven by motor rotation. However, the three-dimensional point cloud data obtained for the underwater pile foundation and surrounding terrain are incomplete and cannot fully capture the state of the pile foundation. Xu et al. [9] proposed a method for detecting defects in underwater pile foundations using binocular vision and the YOLOv8 neural network model for recognition. However, this imaging method is ineffective at capturing the three-dimensional characteristics of underwater pile foundations. Point cloud data intrinsically contain three-dimensional features, but scans must be taken at multiple locations and registered to obtain a complete set of three-dimensional point cloud data.
Fine registration of point cloud data normally starts from an initial value provided by a coarse registration of the point clouds scanned from two different angles. The transformation matrix is then solved iteratively from that initial value. The iterative closest point (ICP) algorithm proposed by Besl and McKay [10] is the basis of iterative registration. It uses all of the point cloud information directly to carry out an iterative transformation. The algorithm is highly accurate and is often used for fine registration, and most iterative methods are improvements on it. Liu et al. [11] proposed a fast ICP matching algorithm based on principal component analysis (PCA). The principal components of the two point clouds are solved to form their respective PCA coordinate systems, and a K-D tree is then used to search quickly for nearest points, which improves the registration efficiency of the traditional ICP method.
Point cloud data for pile foundation defects are scarce, so overfitting often occurs during model training, and defect detection usually involves an extreme imbalance between positive and negative samples. To address these problems, additional point clouds should be generated from the original point cloud data. Goodfellow et al. [12] proposed a generative deep learning model called the Generative Adversarial Network (GAN). This network has a simple structure and can learn the main features of the original dataset without supervision; however, it is difficult to train, and the model is prone to collapse. To avoid this training complexity, researchers from Cornell University and NVIDIA proposed PointFlow [13], a three-dimensional (3D) point cloud generation model based on a flow model, which generates 3D point clouds by modeling them as distributions. Luo et al. [14] proposed a diffusion probability model in which the original distribution is replaced by a noise distribution. Point cloud generation is then equivalent to the reverse diffusion process, which transforms the noise distribution into the desired shape.
Point cloud classification divides a point cloud into point clusters whose points share the same or similar attributes. Since the convolutional neural network (CNN) has achieved great success in image segmentation, classification, and other fields, researchers have gradually applied CNNs to point cloud processing. However, point clouds are unordered, so many researchers first regularize them, converting the point cloud data into regular, image-like data, and then apply a traditional CNN [15]. Multi-view methods follow this idea: the 3D point cloud is projected onto two-dimensional image planes from multiple angles, and two-dimensional convolution is then applied to the projected images [16,17,18]. However, the transition from 3D to 2D loses a large amount of information, so the original point cloud data cannot be accurately classified. PointNet [19] introduced a new idea: it abandons the regularization of point cloud data and operates directly on the input point coordinates, learning each point independently and extracting global features for classification. PointNet++ [20] and PointMLP [21] improve on PointNet and achieve higher accuracy in point cloud classification and segmentation. Following the significant progress of the Transformer in natural language and image processing, Guo et al. [22] and Zhao et al. [23] designed Transformer-based neural networks for point cloud processing, which also achieved good accuracy. To improve the efficiency and accuracy of pile defect detection, this study introduces an attention mechanism into PointMLP so that the network focuses on the information most critical to the current task and gives less attention to other input information.
This article proposes a method based on PCA-ICP combined with an improved PointMLP point cloud segmentation network. Using scanned point cloud data of underwater pile foundations, defective and normal states can be accurately classified. The contributions of this article are as follows:
- (1) The PCA-ICP registration method, multiple filtering algorithms, and the Random Sample Consensus (RANSAC) method are used to obtain the complete pile point cloud.
- (2) An underwater pile defect detection method based on the diffusion probability model and an improved PointMLP is proposed, along with a slice method to calculate the pile defect volume.
The rest of this article is organized as follows: Section 2 describes the preprocessing method used to obtain the complete pile foundation point cloud. Section 3 presents the diffusion probability model, which enlarges the dataset, and the improved PointMLP with an attention mechanism, which recognizes the defect type. Section 4 describes the experiments that verify the defect detection method. Section 5 gives the conclusions.
2. Point Cloud Data Preprocessing Method
In this study, the sonar system was installed on a 2-DOF cradle head, which was rotated by a motor to scan the underwater scene and obtain point cloud data from the pile foundation’s mud surface and other views. The point cloud had irregular density and contained a large amount of data, outliers, and noise, so it could not be used directly and preprocessing was necessary. The preprocessing method is shown in Figure 1. The method consists of three stages: multi-site point cloud registration, point cloud filtering, and point cloud completion, as follows:
- (1) Multi-site point cloud registration: The PCA-ICP registration algorithm is used to register the point clouds scanned from different sites.
- (2) Point cloud filtering: Voxel filtering, pass-through filtering, spherical region filtering, Gaussian statistical filtering, and radius filtering are used to isolate the point cloud of an individual pile foundation.
- (3) Point cloud completion: The pile foundation point cloud is completed by fitting a cylinder with RANSAC.
PCA-ICP registration is applied to the point cloud data from different sonar sites, so that point clouds scanned from multiple perspectives are registered together.
- (1) PCA method
Suppose the point set $P = \{P_i\}$, $i = 1, 2, \ldots, n$, is three-dimensional data representing the point cloud distribution. First, the mean value and covariance matrix of the point set are calculated using Equation (1), and the three eigenvectors of the covariance matrix define three mutually perpendicular directions. A spatial Cartesian coordinate system for the point set is established with these directions as the XYZ coordinate axes and the mean value as the origin, and the transformation parameters between the two point clouds are obtained from their respective coordinate systems. The coordinate system of the point cloud to be matched is then adjusted to that of the target point cloud using these transformation parameters, which completes the PCA pre-matching of the point clouds.
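For reference, the mean and covariance of the point set, and the eigenvectors used as axes, can be written in the following standard form. This is a sketch that assumes a $1/n$ normalisation; the symbols $\bar{P}$, $C$, $\lambda_k$, and $v_k$ are introduced here for illustration and may differ from the notation of Equation (1).

```latex
\bar{P} = \frac{1}{n}\sum_{i=1}^{n} P_i, \qquad
C = \frac{1}{n}\sum_{i=1}^{n} \left(P_i - \bar{P}\right)\left(P_i - \bar{P}\right)^{\mathsf{T}}, \qquad
C\,v_k = \lambda_k v_k \quad (k = 1, 2, 3)
```

Because $C$ is symmetric, the eigenvectors $v_1, v_2, v_3$ can be chosen mutually orthogonal, which is why they can serve as the axes of a Cartesian coordinate system centered at $\bar{P}$.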
- (2) ICP method
The ICP method is the iterative closest point method. It iterates over nearest corresponding points to align and match point clouds from multiple viewpoints, and it can be regarded as a least-squares optimization of the spatial transformation. The basic idea of the algorithm is to compute a three-dimensional transformation matrix from the corresponding points between the two clouds and use it to convert the coordinates of the point cloud to be registered into the coordinate system of the reference point cloud, which achieves point cloud registration.
First, PCA coarse registration is performed on point clouds with a large overlap, and ICP registration is then performed starting from the coarse result. In other words, coarse PCA registration of point cloud M against point cloud N yields the first rotation and translation matrix, which gives the coarsely aligned M together with N. Point clouds P and Q are likewise approximately aligned by PCA through the first rotation and translation. The second rotation and translation matrix is obtained by ICP registration of the coarsely aligned M with N, and ICP registration is then applied to the coarsely aligned P and Q, which gives the finely registered P and Q after the second rotation and translation.
Point clouds P and Q, which have overlapping parts, are therefore registered using PCA first. The overlapping areas between the point clouds make it possible to carry out the ICP algorithm.
The basic principle of the ICP algorithm is as follows: between the target point cloud $N$ and the source point cloud $M$ to be matched, the nearest point pairs $(M_i, N_i)$ are found according to certain constraints, and the optimal matching parameters $R$ and $T$ are calculated to minimize the error function $E(R, T)$, which is the following:
$$E(R, T) = \frac{1}{n} \sum_{i=1}^{n} \left\| N_i - \left( R M_i + T \right) \right\|^2$$
where $n$ is the number of nearest point pairs; $N_i$ is a point in the target point cloud $N$; $M_i$ is the nearest point in the source point cloud $M$ corresponding to $N_i$; $R$ is the rotation matrix; and $T$ is the translation vector.
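The correspondence search and the evaluation of the error function can be sketched as follows. This is a minimal illustration, not the registration code used in the study: it assumes the error is the mean squared distance between $N_i$ and $R M_i + T$, uses a brute-force nearest-point search in place of a K-D tree, and does not solve for $R$ and $T$. The function names and the sample points are hypothetical.

```python
import math

def apply_transform(R, T, p):
    """Return R @ p + T for a 3x3 matrix R and 3-vectors T and p."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) + T[r] for r in range(3))

def nearest_pairs(source, target):
    """For every target point N_i, pick the closest source point M_i (brute force)."""
    pairs = []
    for n_pt in target:
        m_pt = min(source, key=lambda m: math.dist(m, n_pt))
        pairs.append((m_pt, n_pt))
    return pairs

def icp_error(R, T, pairs):
    """Mean squared distance between N_i and R M_i + T over all pairs."""
    total = 0.0
    for m_pt, n_pt in pairs:
        total += math.dist(apply_transform(R, T, m_pt), n_pt) ** 2
    return total / len(pairs)

# Hypothetical data: the target is the source shifted by 0.1 along x.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
source_cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
target_cloud = [(x + 0.1, y, z) for x, y, z in source_cloud]

pairs = nearest_pairs(source_cloud, target_cloud)
error_before = icp_error(identity, (0.0, 0.0, 0.0), pairs)
error_after = icp_error(identity, (0.1, 0.0, 0.0), pairs)
```

In a full ICP loop, these two steps alternate with an update of $R$ and $T$ until the error stops decreasing; here the known shift is simply plugged in to show the error dropping to approximately zero.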
After horizontal plane calibration, the data still contain spot clouds, multiple pile foundations, and noise points, so a filtering algorithm is needed to obtain the point cloud data of a single pile foundation. In this study, a voxel filter was first used to compress the point cloud, reducing the number of points and speeding up the subsequent filtering. Then, a pass-through filter was used to remove the points whose values in the specified dimension fall below the threshold. Subsequently, spherical region filtering was performed to keep the points inside the specified spherical region. Finally, point cloud radius filtering based on connectivity analysis was used to retain the points that satisfy the connectivity criterion, and the point cloud of a single pile foundation was obtained.
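The filtering chain described above can be sketched in plain Python. This is an illustrative simplification, not the implementation used in the study: the radius filter below keeps a point when it has enough neighbours within a given radius, which stands in for the connectivity-based radius filtering, and all thresholds, function names, and sample points are hypothetical.

```python
import math
from collections import defaultdict

def voxel_downsample(points, voxel_size):
    """Replace all points that fall in the same cubic voxel with their centroid."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / voxel_size) for c in p)
        buckets[key].append(p)
    return [tuple(sum(axis) / len(group) for axis in zip(*group))
            for group in buckets.values()]

def pass_through(points, axis, lower, upper):
    """Keep points whose coordinate along `axis` lies in [lower, upper]."""
    return [p for p in points if lower <= p[axis] <= upper]

def sphere_filter(points, center, radius):
    """Keep points inside the sphere of the given center and radius."""
    return [p for p in points if math.dist(p, center) <= radius]

def radius_outlier_filter(points, radius, min_neighbors):
    """Keep points that have at least `min_neighbors` other points within `radius`."""
    kept = []
    for i, p in enumerate(points):
        count = sum(1 for j, q in enumerate(points)
                    if i != j and math.dist(p, q) <= radius)
        if count >= min_neighbors:
            kept.append(p)
    return kept

# Hypothetical cloud: four pile points, one far outlier, one point deep below.
cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.2), (0.1, 0.1, 0.3),
         (5.0, 5.0, 5.0), (0.05, 0.05, -3.0)]

downsampled = voxel_downsample(cloud, 0.5)
in_range = pass_through(downsampled, axis=2, lower=-1.0, upper=10.0)
near_pile = sphere_filter(in_range, center=(0.0, 0.0, 0.0), radius=2.0)
dense = radius_outlier_filter(cloud, radius=0.5, min_neighbors=2)
```

The neighbour search here is quadratic in the number of points, which is acceptable for a sketch; a spatial index such as a K-D tree would normally be used on real sonar data.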
Because sonar scans are taken only at the measuring-point locations, the point cloud of a single pile foundation obtained via filtering is incomplete. In this paper, the RANSAC method was used to fit a cylinder, the point cloud coordinates were converted into polar coordinates, and the unscanned angles of the pile foundation were calculated. Finally, point cloud completion was carried out in the calculated angular region to obtain complete point cloud data for a single pile foundation.
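The angle-based completion step can be sketched as follows, assuming the cylinder has already been fitted. The sketch takes a vertical axis through a known center and a known radius as given (in the study these would come from the RANSAC fit), bins the points by polar angle, and fills the empty bins with synthetic surface points. The bin count, the vertical spacing, and all names are hypothetical.

```python
import math

def unscanned_bins(points, center, n_bins=36):
    """Return the angular bins around a vertical axis through `center` that hold no points."""
    occupied = set()
    for x, y, _z in points:
        theta = math.atan2(y - center[1], x - center[0]) % (2 * math.pi)
        occupied.add(min(int(theta / (2 * math.pi) * n_bins), n_bins - 1))
    return [b for b in range(n_bins) if b not in occupied]

def complete_cylinder(points, center, radius, n_bins=36, z_step=0.1):
    """Add synthetic points on the cylinder surface at the centre of every unscanned bin."""
    zs = [p[2] for p in points]
    z_min, z_max = min(zs), max(zs)
    n_levels = int(round((z_max - z_min) / z_step)) + 1
    filled = []
    for b in unscanned_bins(points, center, n_bins):
        theta = (b + 0.5) * 2 * math.pi / n_bins
        for k in range(n_levels):
            filled.append((center[0] + radius * math.cos(theta),
                           center[1] + radius * math.sin(theta),
                           z_min + k * z_step))
    return points + filled

# Hypothetical scan covering only half of a unit-radius pile (0 to 180 degrees).
half_scan = [(math.cos(math.radians((b + 0.5) * 10)),
              math.sin(math.radians((b + 0.5) * 10)), z)
             for b in range(18) for z in (0.0, 0.5, 1.0)]

missing = unscanned_bins(half_scan, center=(0.0, 0.0))
completed = complete_cylinder(half_scan, center=(0.0, 0.0), radius=1.0, z_step=0.5)
```

Filling the gaps with ideal cylinder points only restores the nominal pile surface, so any defect located in the unscanned region would not appear in the completed cloud.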
3. Underwater Pile Foundation Defect Detection Methods
This paper proposes an underwater pile foundation defect detection method in which the diffusion probability model is used to produce a pile foundation defect dataset and the PointMLP classification network is trained on it to obtain the final model. The preprocessed real pile foundation point cloud data are then analyzed by the model for defect detection.
3.1. Diffusion Probability Model
Inspired by the diffusion process in non-equilibrium thermodynamics, the points in a point cloud are treated as particles in a non-equilibrium thermodynamic system in contact with a heat bath. In the presence of the heat bath, the positions of the particles evolve randomly, diffusing from the original distribution to a noise distribution (an entropy-increasing process). Generating a point cloud is therefore equivalent to learning the reverse diffusion process, which transforms the noise distribution into a distribution of the desired shape. The model is shown in Figure 2.
For a point cloud consisting of $N$ points, each point $x_i^{(0)}$ can be considered to be independently sampled from the distribution $q(x_i^{(0)} \mid z)$, where $z$ is the shape’s latent coefficient. A diffusion model consists of two processes: the diffusion process and the reverse process.
The purpose of the diffusion process is to gradually map $x_i^{(0)}$ to a multidimensional normal distribution (Gaussian noise) via a Markov chain, i.e.,
$$q\left(x_i^{(1:T)} \mid x_i^{(0)}\right) = \prod_{t=1}^{T} q\left(x_i^{(t)} \mid x_i^{(t-1)}\right)$$
where $q(x_i^{(t)} \mid x_i^{(t-1)})$ is the Markov diffusion kernel, defined as the Gaussian distribution $\mathcal{N}\left(x_i^{(t)} \mid \sqrt{1-\beta_t}\, x_i^{(t-1)}, \beta_t I\right)$. The variance-scheduling hyperparameter $\beta_t$ controls the diffusivity of the process. The process corresponds to the iterative addition of small amounts of Gaussian noise, which eventually transforms the target into a multidimensional normal distribution that is independent across dimensions, $\mathcal{N}(0, I)$.
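The forward (noising) process can be sketched in a few lines. The sketch assumes the usual Gaussian kernel of denoising diffusion models, in which one step computes $x^{(t)} = \sqrt{1-\beta_t}\, x^{(t-1)} + \sqrt{\beta_t}\, \epsilon$ with $\epsilon$ drawn from a standard normal distribution. The linear schedule, its end points, the number of steps, and the function names are hypothetical choices for illustration, not the settings of the study.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.05):
    """Variance schedule beta_1 .. beta_T, increasing linearly."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + step * t for t in range(num_steps)]

def diffuse_step(points, beta, rng):
    """One Markov step: scale every coordinate and add Gaussian noise of variance beta."""
    scale, sigma = math.sqrt(1.0 - beta), math.sqrt(beta)
    return [tuple(scale * c + sigma * rng.gauss(0.0, 1.0) for c in p) for p in points]

def forward_diffusion(points, betas, seed=0):
    """Apply all steps of the chain in order and return the noised point cloud."""
    rng = random.Random(seed)
    for beta in betas:
        points = diffuse_step(points, beta, rng)
    return points

def signal_fraction(betas):
    """Factor by which the original coordinates are scaled after all steps."""
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
    return math.sqrt(alpha_bar)

# Hypothetical input: four points on a unit circle at height 0.
shape = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
betas = linear_beta_schedule(200)
noised = forward_diffusion(shape, betas)
remaining = signal_fraction(betas)
```

With this schedule the original coordinates are scaled to less than a tenth of their size after 200 steps, so the result is dominated by noise, which is the behaviour the diffusion process is meant to have.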
Unlike the forward diffusion process, which only adds noise to the points, the reverse process generates points by sampling from a normal distribution and aims to recover the desired shape from the input noise:
$$p_\theta\left(x_i^{(0:T)} \mid z\right) = p\left(x_i^{(T)}\right) \prod_{t=1}^{T} p_\theta\left(x_i^{(t-1)} \mid x_i^{(t)}, z\right)$$
where $p_\theta(x_i^{(t-1)} \mid x_i^{(t)}, z)$ is defined as $\mathcal{N}\left(x_i^{(t-1)} \mid \mu_\theta(x_i^{(t)}, t, z), \beta_t I\right)$, and $\mu_\theta$ is the estimated mean value learned by a neural network with parameters $\theta$. Through this process, Gaussian noise is gradually removed: a set of points sampled from the starting standard normal distribution, $p(x_i^{(T)}) = \mathcal{N}(0, I)$, is passed through the reversed Markov chain to obtain a point cloud with the target shape, so that the generated data match the target distribution.
In this study, an initial dataset was first obtained and then used to train a diffusion probability model, which generated an expanded dataset of pile foundation point cloud data in preparation for training a PointMLP network.
3.2. Improving the PointMLP Network Model
PointMLP is a deep learning architecture designed specifically for point cloud data processing. Its core purpose is to extract and enhance the geometric characteristics of point clouds through multilayer perceptrons (MLPs) and an affine transformation, and to optimize the training efficiency of deep networks using residual connections. The architecture mainly consists of two key modules: the geometric affine module and the ResP Block.
The function of the geometric affine module is to carry out an affine transformation on input point cloud data to enhance the expression of geometric features. This module typically includes MLP and Batch Normalization (BN) layers and introduces nonlinearity in conjunction with the ReLU activation function to ensure that the model can learn more complex point cloud distribution patterns. The feature after the affine transformation can better adapt to the geometric changes (such as rotation, translation, or scaling) in the point cloud, improving the robustness of the model.
The ResP Block is the core computing unit of PointMLP. It adopts residual connections to relieve the vanishing gradient problem of deep networks. Each ResP Block contains multiple MLP and BN layers and introduces a variety of feature operations such as Subtraction, Product, Hadamard Product, and Summation to enhance feature interaction. These operations enable the model to capture the local and global geometric relationships of the point cloud in more detail. In addition, the residual connection allows the gradient to be propagated back directly, ensuring efficient training of the deep network.
The entire architecture gradually extracts high-level features by stacking multiple ResP Blocks and finally outputs a point cloud representation with strong discrimination. The advantage of PointMLP lies in its simple and efficient MLP-based design, which avoids complex convolution or graph operations. Meanwhile, it ensures the expression and training stability of the model through geometric affine and residual mechanisms, making it highly effective in point cloud classification, segmentation, and other tasks.
The attention mechanism tells the network where to focus, allowing it to attend automatically to important features and thereby improving its performance. To improve the efficiency and accuracy of the classification, this study added an attention mechanism to PointMLP. The attention mechanism is shown in Figure 3. The original Rectified Linear Unit (ReLU) activation function was replaced with the Sigmoid Linear Unit (SiLU) function, whose smooth, non-monotonic shape helps maintain stable gradients during backpropagation. SiLU is combined with the Squeeze-and-Excitation (SE) attention mechanism: the SE mechanism adaptively recalibrates channel-wise feature responses, while the SiLU activation provides smoother decision boundaries and better gradient flow throughout the network. Together they improve feature representation and make training more stable. The enhanced attentional PointMLP network architecture is illustrated in Figure 4. These changes are intended to improve convergence and yield more discriminative features, particularly in complex 3D vision tasks where point cloud processing is crucial.
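The two components named above can be illustrated with a minimal sketch in plain Python. It assumes the usual SE layout (average over points per channel, a reduction layer, an expansion layer, and a sigmoid gate) with SiLU as the hidden activation. The layer sizes, the weights, and where the block sits inside PointMLP are hypothetical, since the text does not specify them.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def silu(x):
    """Sigmoid Linear Unit: x * sigmoid(x), smooth and non-monotonic near zero."""
    return x * sigmoid(x)

def linear(weights, bias, vec):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

def se_block(features, w1, b1, w2, b2):
    """Squeeze-and-Excitation over a list of per-point feature vectors."""
    n_channels = len(features[0])
    # Squeeze: average every channel over all points.
    squeezed = [sum(f[c] for f in features) / len(features) for c in range(n_channels)]
    # Excitation: reduce, apply SiLU, expand, and squash to (0, 1) gates.
    hidden = [silu(h) for h in linear(w1, b1, squeezed)]
    gates = [sigmoid(g) for g in linear(w2, b2, hidden)]
    # Recalibrate: scale every channel of every point by its gate.
    scaled = [[f[c] * gates[c] for c in range(n_channels)] for f in features]
    return scaled, gates

# Hypothetical features (3 points, 4 channels) and hand-picked weights.
features = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.0, -1.0, 2.0], [1.5, 1.0, 1.0, 0.0]]
w1 = [[0.1, 0.1, 0.1, 0.1], [0.2, -0.2, 0.2, -0.2]]
b1 = [0.0, 0.0]
w2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
b2 = [0.0, 0.0, 0.0, 0.0]

scaled, gates = se_block(features, w1, b1, w2, b2)
```

In a trained network the weights are learned, so the gates end up emphasising the channels that help separate defective from normal piles; the hand-picked values here only show the data flow.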
In this study, the dataset generated via the diffusion probability model was divided into the training set and the test set according to a ratio of 9:1, and the SE-attention PointMLP network was used for training. Finally, the classification network model was obtained. The point cloud data of a single pile foundation was fed into the classification network model, and identification and judgment were carried out to detect the existence and types of defects in the pile foundation.
3.3. Slice Method for Volume Calculation
A defective pile foundation has an irregular shape, so this study used a slice method to calculate the point cloud volume. The basic idea is to slice the point cloud along the Z-axis, calculate the area of each slice, and obtain the total volume by summing the slice volumes. The slicing method consists of four stages: pile foundation point cloud slicing, contour boundary point finding and sorting, slice area calculation, and point cloud volume calculation. The process is as follows:
- (1) Pile foundation point cloud slicing: between the minimum and maximum Z values of the point cloud, a fixed width is set, and the point cloud is cut from bottom to top to obtain successive point cloud slices.
- (2) Contour boundary point finding and sorting: the boundary points are extracted using the grid division method, and the unordered points are then sorted using the polar coordinate method.
- (3) Slice area calculation: the area is calculated from the sorted contour points using the contour point area statistics method.
- (4) Point cloud volume calculation: the product of the area of each slice and the fixed width is accumulated to obtain the volume of the entire pile foundation point cloud.
The key stages are the contour boundary point finding and sorting, and the slice area calculation.
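The slicing and accumulation stages can be sketched as follows. To keep the example self-contained, the slice area is supplied as a function argument, and an axis-aligned bounding rectangle is used as a stand-in for the contour-based area described in this section. The function names, the slice width, and the sample box are hypothetical.

```python
import math

def slice_point_cloud(points, thickness):
    """Group points into horizontal slabs of fixed thickness, from the lowest Z upward."""
    zs = [p[2] for p in points]
    z_min, z_max = min(zs), max(zs)
    n_slices = max(1, math.ceil((z_max - z_min) / thickness))
    slices = [[] for _ in range(n_slices)]
    for p in points:
        index = min(int((p[2] - z_min) / thickness), n_slices - 1)
        slices[index].append(p)
    return slices

def bounding_rectangle_area(slice_points):
    """Stand-in area: rectangle spanned by the slice in the XY plane."""
    xs = [p[0] for p in slice_points]
    ys = [p[1] for p in slice_points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

def volume_by_slices(points, thickness, area_of):
    """Sum area * thickness over all non-empty slices."""
    total = 0.0
    for slice_points in slice_point_cloud(points, thickness):
        if slice_points:
            total += area_of(slice_points) * thickness
    return total

# Hypothetical test shape: corner points of a 2 x 3 box sampled every 0.1 in Z.
corners = [(0.0, 0.0), (2.0, 0.0), (2.0, 3.0), (0.0, 3.0)]
box = [(x, y, k / 10) for k in range(20) for x, y in corners]

box_slices = slice_point_cloud(box, 0.5)
volume = volume_by_slices(box, 0.5, bounding_rectangle_area)
```

For this box the four slices each have area 6 and width 0.5, so the accumulated volume is 12. A thinner slice follows an irregular defect more closely but leaves fewer points per slice for contour extraction.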
3.3.1. Contour Boundary Point Finding and Sorting
The gridding method is divided into three steps: (1) gridding; (2) finding the boundary grid; and (3) extracting the boundary lines. The first step is to create a minimum bounding box for the data point set and partition it with a rectangular grid at a specific interval. Then, the boundary grids are found and connected to form a “coarse boundary” consisting of boundary grids. Finally, for each boundary grid, it is determined whether the points within are boundary points according to certain rules. The initial boundary is connected and smoothed.
Because the contour boundary points found are unordered, and their order strongly affects the subsequent area calculation, this study sorts them with a polar coordinate method. The centroid of the contour boundary points is calculated first, and then the polar angle of each point about the centroid is calculated and the points are sorted by that angle.
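The polar-angle sorting can be sketched as follows; the function name and the sample contour are hypothetical.

```python
import math

def sort_by_polar_angle(points_2d):
    """Order contour points counterclockwise by their angle about the centroid."""
    cx = sum(p[0] for p in points_2d) / len(points_2d)
    cy = sum(p[1] for p in points_2d) / len(points_2d)
    return sorted(points_2d, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))

# Hypothetical unordered contour: the corners of a unit square.
unordered = [(1.0, 1.0), (0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
ordered = sort_by_polar_angle(unordered)
```

Sorting by angle gives a valid boundary order only when every ray from the centroid crosses the contour once (a star-shaped outline); a strongly concave slice would need a different ordering strategy.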
3.3.2. Slice Area Calculation
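If $n$ vertices $P_i(x_i, y_i)$, $i = 1, 2, \ldots, n$, are specified to form a polygon, with the vertices ordered counterclockwise and the last vertex joined to the first, the enclosed area can be calculated as follows:
$$S = \frac{1}{2} \sum_{i=1}^{n} \left( x_i y_{i+1} - x_{i+1} y_i \right), \qquad (x_{n+1}, y_{n+1}) = (x_1, y_1)$$
where $(x_i, y_i)$ are the coordinates of the vertices $P_i$ of the outer contour polygon $P$ of the sliced plane point cloud; $i$ is the vertex index of the outer contour boundary polygon of the point cloud slice; and $n$ is the number of vertices of that polygon.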
In this study, the contour point area statistics method described above was used to calculate the point cloud slice areas. The calculation is simple, which helps preserve the accuracy of the point cloud volume.
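The polygon area calculation for ordered contour points can be sketched with the standard shoelace formula, which is assumed here to correspond to the contour point area statistics method. The function name and the sample polygons are hypothetical.

```python
def polygon_area(vertices):
    """Signed area of a closed polygon: positive for counterclockwise vertex order."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x_i, y_i = vertices[i]
        x_next, y_next = vertices[(i + 1) % n]  # wraps the last vertex to the first
        twice_area += x_i * y_next - x_next * y_i
    return twice_area / 2.0

square_ccw = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangle_ccw = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]

square_area = polygon_area(square_ccw)
triangle_area = polygon_area(triangle_ccw)
reversed_area = polygon_area(list(reversed(square_ccw)))
```

The sign of the result flips when the vertices are listed clockwise, so taking the absolute value makes the slice area independent of the sorting direction.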