Article

GPRAformer: A Geometry-Prior Rational-Activation Transformer for Denoising Multibeam Sonar Point Clouds of Exposed Subsea Pipelines

1 College of Oceanography and Space Informatics, China University of Petroleum (East China), Qingdao 266580, China
2 College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2026, 18(3), 439; https://doi.org/10.3390/rs18030439
Submission received: 30 October 2025 / Revised: 7 January 2026 / Accepted: 29 January 2026 / Published: 30 January 2026

Highlights

What are the main findings?
  • A Transformer model based on geometric priors and a rational-activation mechanism (GPRAformer) is proposed, achieving more accurate noise segmentation of MBES point clouds in complex seabed environments.
  • High-precision and robust MBES point-cloud noise segmentation in complex seabed environments is achieved through the pipeline-informed prior encoder (PIPE) feature-sampling module, the rational-activation Kolmogorov–Arnold network Transformer (RaKANsformer) feature-extraction module, and the class-adaptive loss (CAL)-constrained noise-segmentation module.
What are the implications of the main finding?
  • The PIPE feature sampling module extracts pipeline geometric priors to enhance the separability between pipeline and noise points; the RaKANsformer feature extraction module strengthens feature extraction through self-attention, gated attention, and rational activations; and the CAL constraint noise segmentation module mitigates false and missed detections arising from class imbalance in MBES point-cloud data via class-adaptive weighting.
  • The method completely preserves the geometric contour of exposed pipelines, validating its strong performance and stability in complex marine environments.

Abstract

The detection of exposed subsea pipelines is a key task in current marine remote sensing, and multibeam echosounders (MBESs) are a primary instrument for detecting exposed pipelines. However, complex seabed environments interfere with acoustic echoes, introducing substantial noise points into MBES point-cloud data and substantially degrading its quality. Conventional point-cloud denoising methods struggle to suppress noise while simultaneously preserving pipeline integrity, whereas point-cloud noise-segmentation methods can better address this challenge. Nevertheless, noise-segmentation methods remain constrained by the lack of geometric priors and the presence of class imbalance. To address these issues, this paper proposes a geometry-prior and rational-activation Transformer for MBES point-cloud denoising of exposed subsea pipelines (GPRAformer). The method comprises three core designs: a pipeline-informed prior encoder (PIPE) sampling module to enhance the separability between pipeline points and noise points; a rational-activated Kolmogorov–Arnold network Transformer (RaKANsformer) feature-extraction module that couples gated self-attention with KAN structures using rational-function activations for joint feature extraction, thereby strengthening global dependency modeling and nonlinear expressivity; and a class-adaptive loss (CAL)-constrained noise-segmentation module that introduces intra-class consistency and inter-class separation constraints to mitigate false and missed detections arising from class imbalance. Evaluations on actual measured MBES point-cloud datasets show that, compared with the second-best model under each metric, GPRAformer achieves improvements of 6.83%, 1.78%, 5.12%, and 6.20% in mean intersection over union (mIoU), Accuracy, F1-score, and Recall, respectively. These results indicate a significant enhancement in overall segmentation performance.
Therefore, GPRAformer can achieve high-precision and robust MBES point-cloud noise segmentation in complex seabed environments.

1. Introduction

Subsea pipelines are critical infrastructure for marine energy transmission. However, owing to complex seabed topography and spatiotemporally variable hydrodynamic conditions, pipelines originally buried beneath the seabed are prone to exposure and may suffer damage or even rupture under the combined effects of hydrodynamic erosion and vessel operations, leading to oil and gas leakage and, consequently, severe environmental pollution and economic loss [1]. Therefore, regular inspection of exposed subsea pipelines is essential. Nevertheless, the complex and variable seafloor environment often hinders direct observation of targets, making marine remote sensing an effective approach for pipeline inspection.
Multibeam echosounders (MBESs), which are widely used marine remote sensing instruments, can acquire high-resolution three-dimensional point-cloud data [2,3], providing essential support for detecting exposed subsea pipelines; their operating principle is illustrated in Figure 1. An MBES transmits acoustic pulses toward the seafloor via a transducer and receives the echoes; water depth is then obtained from the acoustic travel time. The system can simultaneously measure tens to hundreds of points in the across-track direction, thereby forming a depth swath with a specific swath width [4]. However, during acquisition, MBES point clouds are limited by the acoustic imaging mechanism and thus contain substantial spatially clustered noise. This noise is primarily caused by water-column disturbances, sidelobe effects, and discontinuities in echo-intensity distributions, which degrade the quality of MBES point-cloud data.
Because of the strong spatial coupling between noise and exposed pipelines, conventional point-cloud denoising methods struggle to effectively suppress noisy points, making it difficult to preserve the integrity of pipeline structures. In contrast, point-cloud noise segmentation casts MBES denoising as a binary segmentation problem that distinguishes noise points from non-noise points, thereby separating noise while maximally preserving the features of exposed pipelines. Existing point-cloud noise-segmentation algorithms mainly include traditional geometry-based methods and deep learning methods. The former primarily rely on explicit features such as normals, curvature, or neighborhood statistics [5,6,7]. Specifically, the region growing algorithm [8] analyzes the continuity of point-cloud normals and curvatures to perform noise segmentation based on local geometric consistency; the k-nearest neighbor (KNN) algorithm [9] partitions point sets via neighborhood clustering according to inter-point distances and density patterns; and the random sample consensus algorithm (RANSAC) [10] iteratively estimates model parameters by random sampling of minimal point sets, thereby achieving noise segmentation while enabling robust extraction of primitive structures such as planes or cylinders under heavy-noise conditions. Although these methods can yield satisfactory results in simple or structured scenes, their generalization capability remains limited in complex seafloor environments.
In recent years, deep learning-based models have substantially improved point-cloud noise-segmentation accuracy through end-to-end feature extraction, among which Transformer architectures excel at capturing long-range dependencies and global context via self-attention [11,12,13,14]. However, when confronted with non-Euclidean, sparse, and unevenly distributed point-cloud data, the feed-forward network (FFN) branch of conventional Transformers often shows limited nonlinear representational capacity in modeling local geometric features. Recently, the Kolmogorov–Arnold Network (KAN) has been proposed to replace fixed activation functions with learnable functions, thereby enhancing the nonlinear characterization of local geometric features while maintaining model interpretability [15,16,17]. Therefore, integrating the strengths of Transformers and KANs enables more effective feature extraction in MBES point clouds under complex seafloor conditions.
However, when applying existing deep models to MBES point-cloud data, it is essential to account for the unique data distribution and noise characteristics inherent to MBES measurements. Noise in MBES point clouds often appears in strip-shaped or clustered patterns and tends to form high-density interference regions near exposed pipelines, causing true pipeline points to be misclassified as outliers and thus disrupting the geometric continuity of the structure. In addition, noise points constitute only a very small proportion of the entire dataset, resulting in a significant class-imbalance problem. Most existing noise-segmentation methods, however, are designed for remote sensing or general 3D scenes and are primarily optimized for typical targets such as terrain and buildings. These approaches lack specificity for the underwater noise characteristics of MBES point clouds. Consequently, they often struggle to meet the stringent geometric-continuity requirements of slender pipeline structures in MBES data and exhibit limited performance when dealing with the unique noise distributions present in underwater environments. Furthermore, the severe class imbalance frequently leads to high false-positive and false-negative rates.
Therefore, this paper proposes a geometry-prior, rational-activation Transformer for the multibeam echosounder point-cloud noise segmentation of exposed subsea pipelines, termed GPRAformer. This method enables precise noise removal in complex seafloor environments while preserving pipeline details. The contributions are as follows:
(1)
To address the tendency of current point-cloud noise-segmentation algorithms to disregard pipeline characteristics and thereby confuse pipeline and noise points, we propose a pipeline-informed prior encoder (PIPE) sampling module that constructs prior features via pipe-axis estimation and cylindrical-coordinate feature design to enhance the separability between pipeline and noise points.
(2)
To address the insufficient nonlinear representation and weak interpretability of existing models in complex seafloor environments, this paper proposes a rational-activated KAN transformer (RaKANsformer) feature extraction module. This module leverages self-attention to fully model the global dependencies of point clouds, employs gated attention to highlight salient features, and integrates the rational-activated KAN (RaKAN) block as its nonlinear modeling component. Compared with conventional piecewise-linear activations, the rational activation provides higher curvature expressiveness with a stable response to extreme inputs, making it better suited for modeling the coexistence of regular pipeline geometry and irregular geometric disturbances in MBES point clouds.
(3)
To address class imbalance between noise points and non-noise points, which often leads to missed detections and misclassifications, we propose a class-adaptive loss (CAL)-constrained noise-segmentation module. By establishing an intra-class consistency loss (LICC) and an inter-class separation loss (LICS) through neighborhood consistency-based graph smoothing constraints and a local support degree-based outlier penalty mechanism, the false and missed detections caused by the imbalance between noise points and non-noise points in MBES point clouds are effectively alleviated.

2. Materials

The encoder of a Transformer primarily consists of multi-head self-attention and an FFN. The former adaptively models global features by computing attention weights in parallel across different subspaces, whereas the latter employs nonlinear mappings to further enhance feature expressiveness. In recent years, Transformers have been progressively introduced into point-cloud noise-segmentation tasks. Guo et al. [18] proposed the Point Cloud Transformer (PCT), whose key idea is to employ an offset-attention module to sharpen attention weights and thereby reduce the impact of sparse noise. However, PCT relies mainly on global attention and lacks explicit geometric constraints, making it less effective in scenarios involving complex seafloor morphology and spatially correlated noise. Wang et al. [19] introduced a transformer-based cross-task interactive primitive segmentation (TCIPS) method that models global spatial relationships among all tasks to improve segmentation accuracy. Liang et al. [20] proposed a Transformer network with multi-level geometric feature embedding (MGFE-T), which leverages 3D structural information in point clouds to enhance Transformer performance in semantic segmentation. Nevertheless, MGFE-T focuses primarily on semantic feature enrichment and does not explicitly address the strong backscatter noise and stratified outliers commonly found in multibeam bathymetric point clouds.
In contrast, KAN replaces the weighted-sum structure of conventional neurons with functional nodes, enabling multidimensional mappings via a family of one-dimensional learnable functions and effectively enhancing the model’s nonlinear representational capacity and interpretability. Recently, KAN-based point-cloud segmentation has been investigated. Zhou et al. [21] proposed a model termed KANFilter, which emulates the real denoising process in an ensemble manner and achieves competitive results. This is beneficial for mitigating small-scale noise commonly found around seafloor microstructures. Building on KAN, Zhou et al. [22] introduced FGPointKAN++, which further strengthens local geometric consistency and detail preservation, enabling improved segmentation of fine-grained structures in relatively clean environments. These works highlight KAN’s advantage in modeling local geometric variations and enhancing noise–target separability at the neighborhood scale. For MBES point clouds, however, noise typically exhibits strong spatial correlation and therefore requires a comprehensive analysis that incorporates global structural information rather than relying solely on local filtering. Consequently, models based solely on KAN remain insufficiently robust for denoising in complex seafloor environments.
In summary, Transformers and KAN offer complementary strengths in feature modeling. Transformers excel at capturing global contextual information and modeling dependencies among distant points through self-attention. KAN, by introducing learnable one-dimensional rational functions into the FFN, achieves efficient nonlinear mappings and interpretable feature transformations, enabling more fine-grained modeling of local geometric variations. To fully exploit this complementary advantage, a RaKAN-based feed-forward unit is designed, where rational functions are used as the basis functions of the KAN layer to enhance its nonlinear representation capability.
Building on this, RaKAN is integrated with multi-head attention and a gated attention mechanism to construct the RaKANsformer module. This design enables the model to capture both global contextual information and local geometric details, thereby achieving high-precision modeling of complex multibeam point-cloud structures.

3. Method

The overall pipeline of the proposed GPRAformer comprises the following three stages: a PIPE feature sampling module, a RaKANsformer feature extraction module, and a CAL constraint noise segmentation module. The network architecture is shown in Figure 2.
First, the input MBES point-cloud data are processed by farthest point sampling (FPS) to select centroids, and local neighborhoods are constructed using KNN. On this basis, a PIPE feature sampling module is designed, which derives geometric priors such as radial, axial, and angular features via pipe-axis estimation and cylindrical-coordinate construction, and it embeds them into the coordinate information as network inputs to enhance the separability between pipeline and noise points. Subsequently, the geometry-prior-augmented point cloud is partitioned into group tokens, encoded, and fed into the RaKANsformer feature extraction module, which models global dependencies through self-attention, performs salient feature selection via gated attention, and enhances nonlinear representation and interpretability through RaKAN blocks, thereby further improving noise-segmentation discriminability. After upsampling and a fully connected layer are applied to the enhanced features, optimization is conducted using the CAL constraint noise-segmentation module, which jointly alleviates false and missed detections caused by class imbalance and achieves precise removal of noise points.
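As a concrete illustration of the first stage, the FPS-plus-KNN grouping described above can be sketched in a few lines of NumPy. This is a minimal sketch of the standard algorithms, not the paper's implementation; the function names, the use of squared distances, and the toy point counts are our own choices:

```python
import numpy as np

def farthest_point_sampling(points, n_samples, seed=0):
    """Greedy FPS: repeatedly pick the point farthest from the chosen set,
    giving centroids that cover the cloud evenly."""
    rng = np.random.default_rng(seed)
    chosen = np.empty(n_samples, dtype=int)
    chosen[0] = rng.integers(points.shape[0])
    # dist[i] = distance from point i to the nearest centroid chosen so far
    dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for k in range(1, n_samples):
        chosen[k] = int(np.argmax(dist))
        dist = np.minimum(dist, np.linalg.norm(points - points[chosen[k]], axis=1))
    return chosen

def knn_groups(points, centroid_idx, k):
    """Gather the indices of the k nearest points around each sampled centroid."""
    diffs = points[centroid_idx][:, None, :] - points[None, :, :]
    d2 = np.einsum('gnc,gnc->gn', diffs, diffs)   # squared distances per group
    return np.argsort(d2, axis=1)[:, :k]

pts = np.random.default_rng(1).normal(size=(500, 3))
centers = farthest_point_sampling(pts, 16)
groups = knn_groups(pts, centers, 32)             # (16, 32) local neighborhoods
```

Because FPS always extends to the point farthest from the current centroid set, it spreads centroids over the cloud far more evenly than random subsampling, which is why it is the usual choice for group-token construction.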

3.1. PIPE Feature Sampling Module

The PIPE feature sampling module achieves structured resampling and feature enhancement of the input point cloud by constructing a local neighborhood with pipeline geometric constraints. The core PIPE block can explicitly model the axial direction and radial distribution features within the local region, providing high-quality geometric inputs for the subsequent RaKANsformer feature extraction.
The core idea of the PIPE block is to leverage the geometric properties of the pipeline. The pipeline is characterized geometrically by rotational symmetry and axial extension; therefore, for an ideal pipe wall, the radial distance from a point to the axis should be approximately constant. This property allows the radial distance to serve as a criterion for distinguishing pipeline points from noise, since the farther a point deviates from the axis, the higher its likelihood of belonging to the background or noise. In practice, however, scale differences arise across regions due to variations in pipe diameter and pose, and using a fixed distance threshold would lead to misclassification. To address this, for each point-cloud frame we estimate a local axis from the principal direction and centroid to compensate for pose differences, and we then normalize the radial distance.
In addition, we construct cylindrical-coordinate features consistent with pipeline morphology, including the radial distance $\rho_i$, sine and cosine expansions of the circumferential angle $\theta$, and the axial position $z$, and we apply robust normalization based on the median and quantiles. The above geometric prior features are concatenated with the coordinates to obtain the enhanced input data.
For each frame, the input point cloud is an $N \times 4$ matrix: the first three columns are the 3D coordinates $(x, y, z)$, and the fourth column is a binary label taking the value 0 or 1, where 0 denotes ground points and 1 denotes noise points. For each frame, the principal direction and the centroid are estimated. The centroid is computed as follows:
$c = \dfrac{1}{N} \sum_{i=1}^{N} x_i \in \mathbb{R}^3$
where $c \in \mathbb{R}^3$ denotes the centroid of the frame’s point cloud, $N$ is the number of points in that frame, and $x_i \in \mathbb{R}^3$ is the 3D coordinate of the $i$-th point. Each point is translated to a coordinate system with the centroid as the origin to construct the centered matrix $\tilde{X}$ as follows:
$\tilde{X} = X - E c^{\top} \in \mathbb{R}^{N \times 3}$
where $E$ is an all-ones column vector and $X$ is the coordinate matrix of the frame’s point cloud. We perform singular value decomposition (SVD) on the centered point-cloud matrix. The first right singular vector is the direction of maximum variance, i.e., the optimal fitting axis. In a numerically stable and hyperparameter-free manner, the pipeline’s axial direction and its orthogonal basis can thus be directly obtained for subsequent prior construction.
$\tilde{X} = U S V^{\top}$
$V = [v_1, v_2, v_3]$
$a = \dfrac{v_1}{\lVert v_1 \rVert_2}$
where $a$ is the unit-vector estimate of the pipe-axis direction, with $\lVert a \rVert_2 = 1$.
The axial and radial decomposition of an arbitrary point $x_i$ is as follows:
$v_i = x_i - c$
$z_i = v_i^{\top} a$
$r_i = v_i - z_i a$
$\rho_i = \lVert r_i \rVert_2$
where $v_i$ is the centered offset vector, $z_i$ the projection coordinate along the axis direction, $r_i$ the radial component, and $\rho_i$ the Euclidean distance to the axis. On the plane orthogonal to $a$, $u$ is obtained by Gram–Schmidt orthogonalization of $e_x$ or $e_y$; with $v = a \times u$, the orthogonal basis $\{u, v\}$ is obtained, and the circumferential angle $\theta_i$ is as follows:
$\theta_i = \operatorname{atan2}\!\left(r_i^{\top} v,\ r_i^{\top} u\right)$
obtaining the sine and cosine representations $\sin\theta_i$ and $\cos\theta_i$, which avoid the numerical discontinuity at $\pm\pi$ and benefit gradient-based optimization and generalization.
Meanwhile, to enhance numerical stability, $\rho_i$ and $z_i$ are standardized as follows:
$\hat{\rho}_i = \dfrac{\rho_i}{Q_{0.95}(\{\rho_j\}) + \varepsilon}$
$\hat{z}_i = \dfrac{z_i - \operatorname{median}(\{z_j\})}{Q_{0.95}(\{\lvert z_j - \operatorname{median}(\{z_k\}) \rvert\}) + \varepsilon}$
where $Q_{0.95}(\cdot)$ denotes the 95th percentile, $\operatorname{median}(\cdot)$ denotes the median, and $\varepsilon$ is a small constant to avoid division by zero. The quantities $\hat{\rho}_i$ and $\hat{z}_i$ are the standardized radial distance and axial position, respectively. Finally, the prior-enhanced feature $\phi_i$ of the pipeline is obtained as follows:
$\phi_i = [\hat{\rho}_i,\ \sin\theta_i,\ \cos\theta_i,\ \hat{z}_i]$
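Putting the axis estimation, the axial–radial decomposition, and the robust normalization together, the PIPE prior construction for one frame can be sketched in NumPy as follows. This is an illustrative implementation of the formulas above; the rule for picking the Gram–Schmidt reference vector ($e_x$ unless the axis is nearly parallel to it) and the value of $\varepsilon$ are our own assumptions:

```python
import numpy as np

def pipe_prior_features(X, eps=1e-8):
    """PIPE prior sketch for one frame of coordinates X with shape (N, 3).
    Returns the (N, 4) feature [rho_hat, sin(theta), cos(theta), z_hat]."""
    c = X.mean(axis=0)                          # centroid c
    Xc = X - c                                  # centered matrix X~
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    a = Vt[0] / np.linalg.norm(Vt[0])           # unit pipe-axis estimate a
    z = Xc @ a                                  # axial projections z_i
    r = Xc - np.outer(z, a)                     # radial components r_i
    rho = np.linalg.norm(r, axis=1)             # distances to the axis rho_i
    # Gram-Schmidt: build (u, v) orthogonal to a from e_x or e_y
    e = np.array([1.0, 0.0, 0.0]) if abs(a[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = e - (e @ a) * a
    u /= np.linalg.norm(u)
    v = np.cross(a, u)
    theta = np.arctan2(r @ v, r @ u)            # circumferential angles theta_i
    # robust normalization via 95th percentile and median
    rho_hat = rho / (np.quantile(rho, 0.95) + eps)
    z_med = np.median(z)
    z_hat = (z - z_med) / (np.quantile(np.abs(z - z_med), 0.95) + eps)
    return np.stack([rho_hat, np.sin(theta), np.cos(theta), z_hat], axis=1)
```

On an ideal cylinder the first column is essentially constant across the pipe wall, which is exactly the separability cue the module is designed to expose.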

3.2. RaKANsformer Feature Extraction Module

In large-scale point-cloud processing, existing methods often struggle to adequately capture point-to-point dependencies and exhibit limited capability to represent complex nonlinear geometric features. When high-density noise is present, the extracted features often fail to maintain both semantic consistency and structural integrity. To address these issues, a feature-extraction module called RaKANsformer is proposed, which includes the following three core components: a multi-head self-attention mechanism that captures long-range relationships among points; a gated attention mechanism that adaptively highlights salient geometric features; and a RaKAN block that enhances nonlinear local geometric representations through rational activation mappings. The overall architecture is shown in Figure 3. In the RaKANsformer module, the geometry-prior-enhanced point cloud is divided into group tokens, encoded, and processed through multi-head self-attention, effectively capturing global features and achieving robust and interpretable feature extraction in complex multibeam point-cloud environments.
Multi-head self-attention constructs multiple attention subspaces in parallel within the feature space to learn point-to-point dependencies at different semantic levels. Its core idea is to generate the query (Q), key (K), and value (V) matrices from the input features and to compute similarity weights between points, thereby forming a global relation matrix as follows:
$\mathrm{Att}(Q, K, V) = \mathrm{Softmax}\!\left(\dfrac{Q K^{\top}}{\sqrt{d_k}}\right) V$
where $d_k$ is the key dimension and $\sqrt{d_k}$ the scaling factor. This mechanism adaptively captures semantic relationships between distant local regions in the point cloud, enabling cross-region feature interaction and global structure modeling, thereby providing rich contextual support for subsequent geometric feature learning.
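The attention formula above can be sketched directly in NumPy (single head; multi-head attention repeats this computation in parallel subspaces and concatenates the results):

```python
import numpy as np

def scaled_dot_attention(Q, K, V):
    """Att(Q, K, V) = Softmax(Q K^T / sqrt(d_k)) V for one attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise similarity weights
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # row-wise softmax
    return w @ V, w

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 4))
out, w = scaled_dot_attention(Q, K, V)            # global relation matrix w
```

Each row of the weight matrix sums to one, so every output point is a convex combination of all value vectors, which is what gives the mechanism its global receptive field.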
To further enhance the model’s feature selection and channel expressivity, a gated-attention module is introduced after the multi-head attention output. This module first applies a $1 \times 1$ convolution for channel compression, then uses a second $1 \times 1$ convolution to restore the channel dimensionality, followed by a Sigmoid activation to generate a channel-wise weight map $A_c$. The weights $A_c$ are multiplied element-wise with the original features to achieve adaptive reweighting as follows:
$F' = F \odot A_c$
where $F'$ is the reweighted output feature and $\odot$ denotes element-wise multiplication. This process highlights semantically salient channels and suppresses noise interference, effectively enhancing the robustness and expressiveness of point-cloud features.
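A minimal sketch of this channel gating follows, with per-point linear maps standing in for the $1 \times 1$ convolutions (on point features the two are equivalent). The channel sizes and weight initialization here are illustrative, not taken from the paper:

```python
import numpy as np

def gated_channel_attention(feats, w_down, w_up):
    """Channel gate: compress channels, restore them, squash with a sigmoid
    into weights A_c in (0, 1), then reweight the input element-wise
    (F' = F (.) A_c)."""
    a_c = 1.0 / (1.0 + np.exp(-((feats @ w_down) @ w_up)))  # channel weight map
    return feats * a_c

rng = np.random.default_rng(0)
feats = rng.normal(size=(10, 16))      # 10 points, 16 channels
w_down = rng.normal(size=(16, 4))      # compression (illustrative sizes)
w_up = rng.normal(size=(4, 16))        # restoration
out = gated_channel_attention(feats, w_down, w_up)
```

Because the sigmoid keeps every gate in $(0, 1)$, the module can only attenuate channels, never amplify them, which is what makes it a feature selector rather than a second mixing layer.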
In addition, the RaKAN module introduces a learnable family of one-dimensional rational functions as the activation units in KAN, allowing both the activation curvature and the effective response range to be adaptively adjusted during training. This design supports fine-grained modeling of highly nonlinear mappings associated with complex geometric patterns in MBES point-cloud data, where regular pipeline structures often coexist with irregular geometric disturbances, including outliers and local density bursts. Concretely, the rational activation in RaKAN is parameterized as the ratio of a second-order numerator polynomial to a second-order denominator polynomial, and all coefficients in both polynomials are trainable. These parameters are progressively optimized through backpropagation, enabling the activation to shape its nonlinear response in a data-driven manner and thereby providing flexible yet stable feature transformations for robust geometric feature extraction.
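The rational unit described above might be sketched as follows. The trainable coefficients and the second-order/second-order structure come from the paragraph above; the $1 + |\cdot|$ form of the denominator is our assumption to keep the ratio pole-free, since the paper does not spell out its exact stabilization scheme:

```python
import numpy as np

def rational_activation(x, p, q):
    """Learnable rational unit: second-order numerator over second-order
    denominator. p = (p0, p1, p2) and q = (q1, q2) are trainable coefficients.
    The 1 + |.| denominator guard (an assumption) keeps the response finite
    even for extreme inputs."""
    num = p[0] + p[1] * x + p[2] * x ** 2
    den = 1.0 + np.abs(q[0] * x + q[1] * x ** 2)
    return num / den
```

For large $|x|$ the ratio approaches a constant set by the leading coefficients, giving the stable response to extreme inputs that the text attributes to rational activations.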
In summary, the RaKANsformer module provides a unified representation of global dependency modeling and local geometric enhancement, maintaining strong feature robustness and discriminative capability under high-noise point-cloud conditions and providing a more reliable feature foundation for MBES point-cloud noise-segmentation tasks.

3.3. CAL Constraint Noise Segmentation Module

In point-cloud noise segmentation, noise points typically constitute the minority class. Conventional cross-entropy loss is susceptible to class imbalance, causing the model to bias toward predicting non-noise points and thus leading to severe missed detections. Moreover, point clouds exhibit pronounced local spatial correlation; if one relies solely on point-wise classification loss, the predictions tend to contain discontinuities or isolated erroneous labels. To address these issues, we design a CAL constraint noise-segmentation module, composed of the following three components: cross-entropy (CE), LICC, and LICS. These three terms are jointly optimized to balance classification accuracy and spatial consistency.
The standard cross-entropy loss $L_{ce}$ is adopted as the primary loss:
$L_{ce} = -\dfrac{1}{N} \sum_{i=1}^{N} \sum_{m=1}^{2} y_{im} \log \hat{y}_{im}$
where $y_{im}$ is the ground-truth label of point $i$, $\hat{y}_{im}$ is the predicted class probability, and $m$ is the class index.
Meanwhile, considering that point clouds exhibit local spatial correlation and neighboring points often belong to the same class, we construct neighborhoods via KNN and constrain the similarity of predicted probabilities within each neighborhood to formulate the LICC as follows:
$L_{icc} = \dfrac{1}{N} \sum_{i=1}^{N} \left\lVert \hat{y}_i - \dfrac{1}{K} \sum_{j \in N_i} \hat{y}_j \right\rVert^2$
where $L_{icc}$ denotes LICC, $\hat{y}_i$ denotes the predicted probability that point $i$ is noise, $N_i$ is the KNN set of point $i$, $\hat{y}_j$ is the predicted probability of a neighbor $j$, and $K$ is the neighborhood size. This loss minimizes the mean squared deviation between each point’s prediction and those of its neighbors, enforcing intra-class proximity and producing smoother, more consistent predictions within each neighborhood.
Additionally, to suppress isolated false detections, an inter-class separation constraint is introduced. Under this mechanism, if a point is predicted as noise but its neighbors provide insufficient support, a penalty is imposed on that point as follows:
$L_{ics} = \dfrac{1}{N} \sum_{i=1}^{N} \hat{y}_i \cdot \max\!\left(0,\ \tau - \dfrac{1}{K} \sum_{j \in N_i} \hat{y}_j\right)$
where $L_{ics}$ denotes LICS and $\tau$ is the neighborhood support threshold.
The overall loss form constructed is as follows:
$L_{CAL} = \alpha L_{ce} + \beta L_{icc} + \gamma L_{ics}$
where $\alpha$, $\beta$, and $\gamma$ are weights. Acting jointly, the three terms enable the model to achieve high classification accuracy in point-cloud denoising while markedly improving spatial consistency and robustness. In implementation, $\alpha$ is set to 1.0 to ensure that the cross-entropy loss remains the dominant supervisory signal, while $\beta$ and $\gamma$ are set to 0.5 to assign moderate strength to the intra-class consistency and inter-class separation terms. This configuration keeps the model focused on the primary classification objective while learning more discriminative and spatially consistent feature representations, yielding a more stable optimization process and enhanced generalization in complex point-cloud scenarios.
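Combining $L_{ce}$, $L_{icc}$, and $L_{ics}$ with the stated weights, the full objective can be sketched as follows (NumPy, binary-probability form of the cross-entropy; the $\varepsilon$ guard inside the logarithm is our addition for numerical safety):

```python
import numpy as np

def cal_loss(y_hat, y, neighbors, tau=0.5, alpha=1.0, beta=0.5, gamma=0.5, eps=1e-12):
    """Sketch of the CAL objective (alpha=1.0, beta=gamma=0.5 as in the paper).
    y_hat: (N,) predicted noise probabilities; y: (N,) binary labels;
    neighbors: (N, K) KNN index array; tau: neighborhood support threshold."""
    # L_ce: two-class cross-entropy written in binary form
    l_ce = -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
    nb_mean = y_hat[neighbors].mean(axis=1)                  # (1/K) sum over N_i
    l_icc = np.mean((y_hat - nb_mean) ** 2)                  # intra-class consistency
    l_ics = np.mean(y_hat * np.maximum(0.0, tau - nb_mean))  # inter-class separation
    return alpha * l_ce + beta * l_icc + gamma * l_ics

# spatially consistent, perfectly classified toy frame: three noise, three clean
y = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
nbrs = np.array([[0, 1, 2]] * 3 + [[3, 4, 5]] * 3)
```

Note how $L_{ics}$ only penalizes a point when its own noise probability is high but its neighborhood mean falls below $\tau$, which is precisely the isolated-false-detection case described above.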

4. Results

4.1. Experimental Data and Setup

The exposed submarine pipeline dataset used in our experiments was acquired during a representative offshore survey using a Sonic 2024 multibeam echosounder (R2Sonic, LLC, Austin, Texas, USA). This dataset exhibits a severe class imbalance, with pipeline points accounting for only 1.57% of all points. The remaining points consist of complex seabed background structures and noise, with non-noise background points accounting for 88.59% and noise points accounting for 11.41%. Moreover, although the data were collected from the same survey area, considerable diversity remains within them. The undulation of the seabed terrain, the exposure morphology of the pipelines, and their local geometric variations differ at different spatial positions. Additionally, the point density and noise distribution characteristics fluctuate with the measurement conditions and the terrain background, so the samples cover varied and diverse inspection scenarios. The collection area and the actual measurement data are shown in Figure 4.
Furthermore, in order to clearly and accurately describe the architecture of GPRAformer, the relevant parameters are shown in Table 1 and the network dimensions are presented in Table 2. Experiments were conducted on a Linux server equipped with an Intel Core i9 CPU, 128 GB system memory, and NVIDIA GeForce RTX 4090 GPU, and the model was optimized using AdamW. During training, the model weights corresponding to the best validation performance were saved dynamically.
In the GPRAformer, the training batch size is set to 16, with each input point cloud consisting of 50,000 three-dimensional coordinate points, forming an initial tensor of size [16, 50,000, 3]. Subsequently, the PIPE geometric prior sampling module performs structured resampling, dividing the point cloud into 128 local groups. Each group contains 32 points, and every point is encoded into a 7-dimensional feature vector composed of 3D coordinates and four geometric prior features, resulting in a geometrically enhanced representation of size [16, 128, 32, 7]. To facilitate sequence modeling in the RaKANsformer module, this tensor is reshaped into the sequence format [16 × 128, 7, 32]. Each local group is then mapped to a 256-dimensional feature vector, producing a group-level global embedding of size [16, 128, 256]. Finally, the network performs point-wise classification through fully connected layers to effectively distinguish between noise points and non-noise points.
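The shape flow described in this paragraph can be traced with dummy tensors (only the shapes matter; the arrays are zero-filled stand-ins for the real features):

```python
import numpy as np

# batch, points per cloud, groups, points per group, embedding dimension
B, N, G, K, C = 16, 50_000, 128, 32, 256

cloud = np.zeros((B, N, 3), dtype=np.float32)        # raw MBES coordinates
grouped = np.zeros((B, G, K, 7), dtype=np.float32)   # after PIPE: xyz + 4 priors
# reshape into the sequence format consumed by RaKANsformer
tokens = grouped.transpose(0, 1, 3, 2).reshape(B * G, 7, K)
embedding = np.zeros((B, G, C), dtype=np.float32)    # group-level global embedding
```

Tracing the shapes this way makes the group-token construction explicit: the 16 x 128 = 2048 local groups become independent 7-channel sequences of length 32.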

4.2. Evaluation Indicators

In the noise-segmentation task for exposed subsea pipelines, true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values are the key metrics for evaluating model performance. TP denotes the number of noise points correctly identified by the model; TN denotes the number of non-noise points correctly identified; FP denotes the number of non-noise points predicted as noise; and FN denotes the number of noise points predicted as non-noise. Building on these metrics, to more comprehensively evaluate performance in this task, we adopt the mean intersection over union (mIoU) as the core metric [23,24]. mIoU measures the overlap between model predictions and ground-truth labels and is widely used in semantic segmentation. The calculation formula is as follows:
$\mathrm{mIoU} = \dfrac{TP}{TP + FP + FN}$
In addition, we further introduce Accuracy [25], F1-Score [26,27], and Recall to more comprehensively assess overall model performance. Accuracy denotes the overall average classification accuracy. Recall evaluates the model’s ability to identify positive samples. F1-Score assesses both the proportion of correct identifications and the degree to which all relevant samples are covered. The calculation of these metrics is as follows:
Accuracy = (TP + TN) / (TP + FP + FN + TN)
Recall = TP / (TP + FN)
F1-Score = (2 × P × Recall) / (P + Recall)
where the precision P is defined as follows:
P = TP / (TP + FP)
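For concreteness, the metric definitions above can be packaged into a small helper. The function and the confusion counts in the example are illustrative, not taken from the paper; note that the printed formula gives the IoU of a single class, and the reported mIoU averages this over classes:

```python
def segmentation_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the noise-segmentation metrics from confusion counts."""
    iou = tp / (tp + fp + fn)                    # IoU of the noise class
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # overall accuracy
    recall = tp / (tp + fn)                      # coverage of true noise
    precision = tp / (tp + fp)                   # P in the F1 formula
    f1 = 2 * precision * recall / (precision + recall)
    return {"IoU": iou, "Accuracy": accuracy, "F1-Score": f1, "Recall": recall}

# Hypothetical counts: 900 noise points found, 100 missed,
# 50 false alarms, 8950 non-noise points correctly kept
m = segmentation_metrics(tp=900, tn=8950, fp=50, fn=100)
print({k: round(v, 4) for k, v in m.items()})
# {'IoU': 0.8571, 'Accuracy': 0.985, 'F1-Score': 0.9231, 'Recall': 0.9}
```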

4.3. Noise Segmentation Experimental Results

To objectively evaluate performance, the point-cloud noise-segmentation results of GPRAformer were compared with those of traditional algorithms and four recent methods. Figure 5, Figure 6, Figure 7, Figure 8, Figure 9 and Figure 10 present two representative MBES point-cloud samples. Blue points denote non-noise points, and red points denote noise points. Figure 5 and Figure 6 show the top view and front view segmentation results of Dataset A. In Figure 6, the region marked by a yellow box indicates an exposed pipeline, which is enlarged for clearer visualization and shown in Figure 7. Figure 8 and Figure 9 display the corresponding segmentation results of Dataset B. The region marked by a yellow box, representing the exposed pipeline area, is enlarged and shown in Figure 10.
From the overall comparison, the RANSAC algorithm exhibits obvious mis-segmentation in high-noise regions; in particular, noise points near the pipeline are not effectively distinguished, causing portions of the pipeline structure to be overwhelmed by noise or to lose information. The PCT algorithm performs well in preserving global structure, but its suppression of local detail noise is limited, leading to a large number of ground points being misclassified. KANFilter and MGFE-T are insufficient at removing large noise clusters and fail to fully eliminate background interference. FGPointKAN++ is unstable: its segmentation on Dataset B is relatively better, but on Dataset A it still shows problems similar to those of KANFilter and MGFE-T. Moreover, none of the above methods effectively distinguishes noise near the pipeline. Some misclassify pipeline points as noise, resulting in missing pipeline information, whereas others fail to detect nearby noise points, causing the pipeline structure to be further obscured by noise.
In contrast, GPRAformer yields clear and complete denoising results on MBES point-cloud data, preserving terrain details while fully retaining the geometric outline of the exposed pipeline. The comparisons of the magnified pipeline regions within the yellow boxes, shown in Figure 7 and Figure 10, demonstrate that GPRAformer achieves superior denoising performance, particularly in the edge-transition areas of the pipeline, where noise and pipeline points are strongly coupled.
To further validate the noise-segmentation performance of each algorithm, we conducted a quantitative evaluation of the denoised MBES point-cloud results. Table 3 presents the average mIoU, Accuracy, F1-score, and Recall over all test data for the different methods.
From the metrics in Table 3, clear differences in segmentation accuracy are observed across methods. The traditional RANSAC algorithm performs the worst overall, with an mIoU of 42.21% and an Accuracy of 69.88%, indicating limited denoising effectiveness under high-noise conditions. MGFE-T and PCT show notable improvements over the traditional approach: MGFE-T attains an mIoU of 47.39%, an Accuracy of 70.76%, and F1-score and Recall of 57.54% and 75.52%, respectively; although these results improve upon the conventional method, the overall accuracy remains low. PCT achieves an mIoU of 85.11%, clearly outperforming MGFE-T and suggesting enhanced global modeling capability; however, its F1-score is only 77.19%, reflecting residual local mis-segmentation. KANFilter reaches an mIoU of 81.10% with an Accuracy of 92.47%, but the overall gain is limited. FGPointKAN++ exhibits strong Accuracy and an F1-score of 82.93%, yet its mIoU is somewhat lower, indicating that, despite higher overall recognition accuracy, some errors persist.
In contrast, GPRAformer achieves the best performance across all metrics as follows: mIoU of 91.94%, Accuracy of 95.60%, F1-score of 88.05%, and Recall of 91.95%. These results indicate that GPRAformer attains an optimal balance between noise suppression and structural preservation and demonstrates superior robustness and accuracy in complex seafloor environments.
To more intuitively demonstrate how segmentation accuracy influences denoising performance, Figure 11 and Figure 12 present a comparison of MBES point-cloud data A and data B before and after denoising. Each figure is organized into four rows to clearly show the results at different stages as follows: (a–c) display the noisy raw point clouds, including the top view, front view, and a zoomed-in pipeline region; (d–f) show the clean reference results after noise removal; (g–i) present the denoising results obtained using the RANSAC method; and (j–l) illustrate the denoising results produced by our method. RANSAC is selected for comparison because its performance is noticeably inferior to that of other methods, making it more suitable for highlighting the impact of segmentation accuracy on the final denoising outcome.
From Figure 11, it can be observed that the RANSAC algorithm mistakenly removes a large number of seabed points, resulting in noticeable holes in the background region. Moreover, in the critical pipeline area, some true pipeline points are incorrectly classified as noise and removed, compromising the structural integrity of the pipeline. In Figure 12, the extensive misclassification of seabed points leads to an even sparser point cloud, further reducing the continuity of the overall geometric shape. In contrast, the method proposed in this paper more effectively preserves both the seabed background and the pipeline structure, achieving a stable balance between noise suppression and geometric fidelity.

5. Discussion

5.1. Ablation Experiment

To further verify the effectiveness of the proposed modules, we designed and conducted ablation studies. The results are shown in Table 4.
In the RaKANsformer-based multibeam point-cloud denoising results, mIoU, Accuracy, F1-score, and Recall are 87.51%, 91.86%, 79.83%, and 88.19%, respectively. After introducing the PIPE module, all four metrics improve markedly to 89.95%, 94.63%, 85.07%, and 89.71%. This indicates that PIPE effectively enhances discriminability and structural consistency for exposed pipelines, yielding more robust separation of noise from valid structures near the pipeline. Building on this, adding the CAL module further boosts performance, with mIoU, Accuracy, F1-score, and Recall increasing to 91.94%, 95.60%, 88.05%, and 91.95%, demonstrating that CAL substantially reduces false and missed detections.

5.2. Hyperparameter Sensitivity Analysis

To determine the optimal strategy, we conducted a sensitivity analysis on hyperparameters such as the learning rate and the number of training epochs. Each parameter setting was independently executed 10 times, and the average results were used to enhance the robustness of the conclusions. The average performance of the four evaluation metrics on the validation set is shown in Figure 13.
When the learning rate is searched within 10⁻⁵ to 10⁻¹, 1 × 10⁻³ yields the best and most balanced performance; an excessively large rate makes training unstable, whereas an overly small rate leads to slow convergence and limited performance. With respect to training epochs, all metrics peak at 200; at 100 and 150, the model remains underfitted, while at 250 and 300, overfitting emerges. Therefore, a learning rate of 1 × 10⁻³ and 200 epochs constitute the optimal setting, which is adopted in subsequent experiments.
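The selection procedure described above (average each learning-rate/epoch configuration over 10 independent runs, then keep the best mean score) can be sketched as follows. Here `train_and_evaluate` is a stand-in that simulates a noisy validation mIoU rather than the actual GPRAformer training pipeline:

```python
import math
import random
import statistics

def train_and_evaluate(lr: float, epochs: int, seed: int) -> float:
    """Stand-in for one training run; returns a simulated validation mIoU
    that peaks at lr = 1e-3 and 200 epochs, plus small run-to-run noise."""
    rng = random.Random(seed)
    base = 0.92 - 0.04 * abs(math.log10(lr) + 3) - 0.0005 * abs(epochs - 200)
    return base + rng.uniform(-0.005, 0.005)

learning_rates = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
epoch_options = [100, 150, 200, 250, 300]

best_score, best_cfg = -1.0, None
for lr in learning_rates:
    for ep in epoch_options:
        # average over 10 independent runs for robustness
        mean_miou = statistics.mean(
            train_and_evaluate(lr, ep, seed) for seed in range(10)
        )
        if mean_miou > best_score:
            best_score, best_cfg = mean_miou, (lr, ep)

print(f"best config: lr={best_cfg[0]}, epochs={best_cfg[1]}")
```

Because the same seeds are reused across configurations, the simulated noise averages out identically and the comparison isolates the hyperparameter effect; in the real experiments each call would be a full training run.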

5.3. Complexity and Inference Time Analysis

To compare the efficiency of different point-cloud noise segmentation algorithms, Table 5 summarizes the computational cost and inference speed of several baseline methods on the MBES exposed-pipeline point-cloud dataset used in this study. Since RANSAC is a classical geometric method without learnable parameters, its model size is not reported. For the remaining deep learning methods, both the parameter counts and the total inference time on the point-cloud test set under the same experimental environment are provided.
It should be noted that the model size of GPRAformer is relatively large, and its inference speed is not the fastest. This is mainly because the network incorporates structured modules such as geometry-prior encoding and rational-activation Transformers, which enhance the model’s ability to capture complex noise patterns and local geometric characteristics while inevitably introducing additional learnable parameters and computational cost. In contrast, other baseline methods have lower computational overhead but exhibit certain limitations in accuracy. Overall, although GPRAformer has slightly higher complexity than some lightweight models, this additional cost is justified by the resulting gains, and the model maintains a favorable balance between accuracy, robustness, and efficiency.

6. Conclusions

To address the severe noise interference in MBES point clouds under complex seafloor conditions and the difficulty of conventional noise-segmentation methods in balancing structural integrity with noise suppression, this paper proposes a geometry-prior, rational-activation transformer-based denoising method for exposed subsea pipeline point clouds (GPRAformer). The approach comprises the following three core modules: a PIPE feature sampling module, a RaKANsformer feature extraction module, and a CAL constraint noise-segmentation module. The PIPE feature sampling module extracts pipeline geometric priors to enhance the separability between pipeline and noise points; the RaKANsformer feature extraction module strengthens feature extraction through self-attention, gated attention, and rational activations; and the CAL constraint noise-segmentation module mitigates false and missed detections arising from class imbalance in MBES point-cloud data via class-adaptive weighting. Experiments on in-situ MBES datasets of exposed subsea pipelines show that GPRAformer outperforms existing methods in mIoU, Accuracy, F1-Score, and Recall. Moreover, the method preserves the full geometric outline of exposed pipelines in the noise-segmentation task, validating its superior performance and robustness in complex marine environments.
Although the proposed method achieves strong performance in terms of noise-segmentation accuracy and robustness, there remains room for improving inference efficiency, which may pose a limitation in real-time or near-real-time engineering applications. It should be noted that the primary objective of this study is to reliably suppress noise while maximally preserving the geometric integrity of subsea pipelines under complex seafloor conditions. Therefore, the current model architecture and hyperparameter settings are designed to prioritize upper-bound performance and stability rather than minimizing latency and resource consumption. For practical deployment, future work will proceed along the following two directions: first, developing a lightweight version of the model and incorporating compression strategies such as pruning and distillation to reduce computation and GPU memory usage while maintaining segmentation accuracy as much as possible; and second, systematically analyzing the trade-off between computational efficiency and accuracy on target hardware platforms and providing recommended configurations and deployment schemes for different application scenarios, thereby better supporting field applications that require both efficiency and reliability.

Author Contributions

Conceptualization: J.Z. and S.D.; methodology: J.Z.; validation: S.D. and J.L.; formal analysis: J.Z. and W.J.; investigation: J.Z. and X.C.; data curation: W.J.; writing—original draft: J.Z.; writing—review and editing: S.D.; visualization: J.Z.; supervision: W.J. and S.D.; project administration: X.C. and J.L.; funding acquisition: X.C. and J.L. J.Z. and S.D. contributed equally to the study and are co-first authors. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 62471494 and 52171341).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Cui, X.; Li, Y.; Li, J.; Zhang, J. Cross-PIC: A cross-scale in-context learning network for 3D multibeam point cloud segmentation of submarine pipelines. Ocean Eng. 2025, 315, 119778. [Google Scholar] [CrossRef]
  2. Downing, E.; O’Reilly, L.; Majcher, J.; O’Mahony, E.; Peters, J. A semi-automated, hybrid GIS–AI approach to seabed boulder detection using high-resolution multibeam echosounder. Remote Sens. 2025, 17, 2711. [Google Scholar] [CrossRef]
  3. Gościewski, D.; Gerus-Gościewska, M.; Szczepańska, A. Application of polynomial interpolation for iterative complementation of the missing nodes in a regular network of squares used for the construction of a digital terrain model. Remote Sens. 2024, 16, 999. [Google Scholar] [CrossRef]
  4. Zhou, J.; Koge, H.; Maki, T. Automation of MBES noise reduction: An approach based on seafloor bathymetry features derived from manual editing procedures. Ocean Eng. 2024, 299, 117397. [Google Scholar] [CrossRef]
  5. Xu, B.; Chen, Z.; Zhu, Q.; Ge, X.; Huang, S.; Zhang, Y.; Liu, T.; Wu, D. Geometrical segmentation of multi-shape point clouds based on adaptive shape prediction and hybrid voting RANSAC. Remote Sens. 2022, 14, 2024. [Google Scholar] [CrossRef]
  6. Zhao, F.; Huang, H.; Xiao, N.; Yu, J.; Geng, G. A point cloud segmentation algorithm based on multi-feature training and weighted random forest. Meas. Sci. Technol. 2024, 36, 015407. [Google Scholar] [CrossRef]
  7. Chen, X.; Mao, J.; Zhao, B.; Wu, C.; Qin, M. Facet-segmentation of point cloud based on multiscale hypervoxel region growing. J. Indian Soc. Remote Sens. 2025, 53, 3775–3796. [Google Scholar] [CrossRef]
  8. Yan, Z.; Zhao, H. Inner wall defect detection in oil and gas pipelines using point cloud data segmentation. Autom. Constr. 2025, 173, 106098. [Google Scholar] [CrossRef]
  9. Ye, J.; Liu, X.; Madhusudanan, H.; Wang, Y.; Zhu, J.; Wang, Y.; Ru, C.; Liu, X.; Sun, Y. Automatic point cloud clustering for surface defect diagnosis. IEEE Trans. Autom. Sci. Eng. 2025, 22, 12538–12547. [Google Scholar] [CrossRef]
  10. Chung, K.-L.; Chang, W.-T. Centralized RANSAC-based point cloud registration with fast convergence and high accuracy. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 5431–5442. [Google Scholar] [CrossRef]
  11. Li, T.; Lin, Y.; Cheng, B.; Ai, G.; Yang, J.; Fang, L. PU-CTG: A point cloud upsampling network using transformer fusion and GRU correction. Remote Sens. 2024, 16, 450. [Google Scholar] [CrossRef]
  12. Zhou, W.; Wang, Q.; Jin, W.; Shi, X.; He, Y. Graph transformer for 3D point clouds classification and semantic segmentation. Comput. Graph. 2024, 124, 104050. [Google Scholar] [CrossRef]
  13. Yuan, T.; Yu, Y.; Wang, X. Semantic segmentation of large-scale point clouds by integrating attention mechanisms and transformer models. Image Vis. Comput. 2024, 146, 105019. [Google Scholar] [CrossRef]
  14. Chu, X.; Zhao, S.; Dai, H. AIFormer: Adaptive interaction transformer for 3D point cloud understanding. Remote Sens. 2024, 16, 4103. [Google Scholar] [CrossRef]
  15. Lee, A.; Gomes, H.M.; Zhang, Y.; Kleijn, W.B. Kolmogorov–Arnold networks still catastrophically forget but differently from MLP. Proc. AAAI Conf. Artif. Intell. 2025, 39, 18053–18061. [Google Scholar] [CrossRef]
  16. Ren, J.; Wen, C.; Zhang, L.; Su, H.; Yang, C.; Lv, Y.; Yang, N.; Qin, X. High performance point-voxel feature set abstraction with mamba for 3D object detection. Expert Syst. Appl. 2025, 286, 128127. [Google Scholar] [CrossRef]
  17. Yang, X.; Wang, X. Kolmogorov–Arnold Transformer. arXiv 2025, arXiv:2409.10594. [Google Scholar]
  18. Guo, M.-H.; Cai, J.-X.; Liu, Z.-N.; Mu, T.-J.; Martin, R.R.; Hu, S.-M. PCT: Point cloud transformer. Comput. Vis. Media 2021, 7, 187–199. [Google Scholar] [CrossRef]
  19. Wang, T.; Xi, W.; Cheng, Y.; Zhang, J.; Yin, R.; Yang, Y. Enhancing primitive segmentation through transformer-based cross-task interaction. Eng. Appl. Artif. Intell. 2025, 158, 111307. [Google Scholar] [CrossRef]
  20. Liang, Z.; Lai, X. Multilevel geometric feature embedding in transformer network for ALS point cloud semantic segmentation. Remote Sens. 2024, 16, 3386. [Google Scholar] [CrossRef]
  21. Zhou, X.; Shi, C.; Zhu, D.; Zhou, C. KANFilter: A simple and effective multi-module point cloud denoising model. Clust. Comput. 2025, 28, 423. [Google Scholar] [CrossRef]
  22. Zhou, G.; Ye, W.; Li, S.; Zhao, J.; Wang, Z.; Li, G.; Li, J. FGPointKAN++ point cloud segmentation and adaptive key cutting plane recognition for cow body size measurement. Artif. Intell. Agric. 2025, 15, 783–801. [Google Scholar] [CrossRef]
  23. Li, Y.; Liu, S.; Wu, J.; Sun, W.; Wen, Q.; Wu, Y.; Qin, X.; Qiao, Y. Multi-scale Kolmogorov–Arnold network (KAN)-based linear attention network: Multi-scale feature fusion with KAN and deformable convolution for urban scene image semantic segmentation. Remote Sens. 2025, 17, 802. [Google Scholar] [CrossRef]
  24. Zhang, R.; Huang, G.; Bao, F.; Guo, X. Multi-neighborhood sparse feature selection for semantic segmentation of LiDAR point clouds. Remote Sens. 2025, 17, 2288. [Google Scholar] [CrossRef]
  25. Farhadpour, S.; Warner, T.A.; Maxwell, A.E. Selecting and interpreting multiclass loss and accuracy assessment metrics for classifications with class imbalance: Guidance and best practices. Remote Sens. 2024, 16, 533. [Google Scholar] [CrossRef]
  26. Xu, H.; Huai, Y.; Zhao, X.; Meng, Q.; Nie, X.; Li, B.; Lu, H. SK-TreePCN: Skeleton-embedded transformer model for point cloud completion of individual trees from simulated to real data. Remote Sens. 2025, 17, 656. [Google Scholar] [CrossRef]
  27. Ciou, T.-S.; Lin, C.-H.; Wang, C.-K. Airborne LiDAR point cloud classification using ensemble learning for DEM generation. Sensors 2024, 24, 6858. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The working principle of MBES and a schematic diagram of the collected multibeam point-cloud data.
Figure 2. GPRAformer network structure diagram.
Figure 3. RaKANsformer block structure diagram.
Figure 4. Actual measurement data collection area and actual measurement data display.
Figure 5. Noise segmentation performance comparison on MBES point-cloud data A (top view).
Figure 6. Noise segmentation performance comparison on MBES point-cloud data A (front view).
Figure 7. Enlarged view of the exposed pipes in point-cloud data A.
Figure 8. Noise segmentation performance comparison on MBES point-cloud data B (top view).
Figure 9. Noise segmentation performance comparison on MBES point-cloud data B (front view).
Figure 10. Enlarged view of the exposed pipes in point-cloud data B.
Figure 11. The effect of noise removal on MBES data A.
Figure 12. The effect of noise removal on MBES data B.
Figure 13. Sensitivity analysis of the learning rate and epochs on the performance of the model.
Table 1. GPRAformer network parameter explanation.
| Category | Specification |
|---|---|
| Attention Heads per block | 8 |
| KNN Neighborhood Size | 32 |
| Rational function numerator degree | 2 |
| Rational function denominator degree | 2 |
| Initial Learning Rate | 0.001 |
| Batch Size | 16 |
| Training Epochs | 200 |
Table 2. Explanation of the network dimensions of the GPRAformer.
| Category | Specification |
|---|---|
| Input dimensions | [16, 50,000, 3] |
| Output dimension of PIPE feature sampling | [16, 128, 32, 7] |
| Input dimension of RaKANsformer feature extraction | [16 × 128, 7, 32] |
| Output dimension of RaKANsformer feature extraction | [16, 128, 256] |
| Output dimensions | [16, 50,000, 2] |
Table 3. Average comparison results of different algorithms.
| Method | mIoU (%) | Accuracy (%) | F1-Score (%) | Recall (%) |
|---|---|---|---|---|
| RANSAC [10] | 42.21 | 69.88 | 56.88 | 75.06 |
| KANFilter [21] | 81.10 | 92.47 | 80.57 | 85.62 |
| FGPointKAN++ [22] | 70.85 | 93.82 | 82.93 | 85.75 |
| MGFE-T [20] | 47.39 | 70.76 | 57.54 | 75.52 |
| PCT [18] | 85.11 | 85.46 | 77.19 | 85.46 |
| Proposed | 91.94 | 95.60 | 88.05 | 91.95 |
Table 4. Module ablation experiment results (✓ indicates that the module is included; × indicates that it is not).
| RaKAN | PIPE | CAL | mIoU (%) | Accuracy (%) | F1-Score (%) | Recall (%) |
|---|---|---|---|---|---|---|
| ✓ | × | × | 87.51 | 91.86 | 79.83 | 88.19 |
| ✓ | ✓ | × | 89.95 | 94.63 | 85.07 | 89.71 |
| ✓ | ✓ | ✓ | 91.94 | 95.60 | 88.05 | 91.95 |
Table 5. Computational cost and inference speed of different methods on the MBES exposed-pipeline dataset.
| Method | Params (M) | Inference Time (s) |
|---|---|---|
| RANSAC | – | 2.8 |
| PCT | 3.0 | 4.5 |
| MGFE-T | 8.4 | 5.3 |
| KANFilter | 2.6 | 8.0 |
| FGPointKAN++ | 10.8 | 8.7 |
| GPRAformer | 7.2 | 11.2 |

Share and Cite

MDPI and ACS Style

Zhang, J.; Dai, S.; Jiang, W.; Cui, X.; Li, J. GPRAformer: A Geometry-Prior Rational-Activation Transformer for Denoising Multibeam Sonar Point Clouds of Exposed Subsea Pipelines. Remote Sens. 2026, 18, 439. https://doi.org/10.3390/rs18030439
