A Hybrid CUBE-IForest Approach for Outlier Detection in Multibeam Bathymetry

Han, Rui; Hong, Yukai; Han, Xibin; Zhang, Yi; Hu, Shunming; Huan, Yuan; Cui, Xiaodong; Li, Xiaohu

doi:10.3390/jmse14030285


by Rui Han ^1,2,3, Yukai Hong ^1,2, Xibin Han ^1,2,*, Yi Zhang ^1,2, Shunming Hu ^4, Yuan Huan ^1,2, Xiaodong Cui ^3 and Xiaohu Li ^1,2

^1 State Key Laboratory of Submarine Geoscience, Hangzhou 310012, China

^2 Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China

^3 College of Geodesy and Geomatics, Shandong University of Science and Technology, Qingdao 266590, China

^4 School of Earth Sciences and Engineering, Hohai University, Nanjing 211100, China

^* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2026, 14(3), 285; https://doi.org/10.3390/jmse14030285

Submission received: 15 December 2025 / Revised: 9 January 2026 / Accepted: 13 January 2026 / Published: 30 January 2026

(This article belongs to the Special Issue Advances in Altimetry Technologies in Marine Observation)


Abstract

With the rapid development and widespread application of multibeam echo-sounding systems, large-scale and high-resolution seafloor topography can be efficiently acquired, enabling precise mapping of seabed terrain. However, due to complex oceanographic conditions, instrumental noise, and acoustic interferences, the acquired multibeam data often contain outliers that deviate from the true seafloor surface. These outliers can distort the representation of seafloor topography, adversely affecting subsequent geological analysis and engineering applications. To address this issue, a hybrid outlier detection method combining CUBE filtering with the Isolation Forest (IForest) algorithm, termed CUBE-IForest, is proposed. The method first employs CUBE filtering to remove gross outliers based on local uncertainty estimation, followed by the application of IForest to identify subtle anomalies in the refined data, achieving hierarchical detection of outliers. Experimental results based on in situ multibeam bathymetric data from the northeastern Pacific demonstrate that compared with traditional filtering methods the CUBE-IForest approach significantly improves detection accuracy and reduces both false positive and false negative rates by approximately 30%, confirming its efficiency and reliability in seafloor mapping and analysis.

Keywords: multibeam bathymetry; outlier removal; isolation forest

1. Introduction

With the rapid development of marine resource exploitation, seabed topographic mapping, and marine engineering construction, multibeam bathymetric technology, as an efficient and high-precision means of seabed terrain detection, has been widely applied in the field of ocean mapping [1,2,3,4]. By transmitting multiple acoustic beams and receiving their reflected signals, multibeam echo sounder (MBES) systems can rapidly acquire three-dimensional bathymetric data over large seabed areas, providing essential data support for marine scientific research, seabed resource exploration, and subsea engineering construction [5]. However, in practical surveys, multibeam bathymetric data are often affected by various factors, such as instrumental noise, environmental interference, and data acquisition errors, resulting in the presence of outliers in the dataset. These outliers not only degrade data quality but also mislead subsequent data analysis and decision-making [6,7]. Therefore, efficiently and accurately detecting and eliminating outliers in multibeam bathymetric data has become an important research topic in the field of ocean mapping.

Outlier detection in multibeam bathymetric data is usually performed using filtering methods, which are mainly categorized into interactive filtering and automatic filtering [8,9]. Interactive filtering methods rely on human–computer interaction and generally provide high filtering quality; however, they are time-consuming and inefficient when dealing with massive datasets. Among automatic methods, Peng Gangyue and other researchers [10,11,12] proposed a two-stage filtering method based on relative density, following a “small-threshold preliminary filtering plus clustering-based data recovery” approach. After gross errors are removed using a small threshold, the eliminated data are partitioned and processed with a clustering algorithm combined with least-squares surface fitting and a cluster-boundary filtering rule, thereby balancing gross error elimination against valid data preservation. However, this method exhibits low computational efficiency when processing large-scale datasets, is sensitive to clustering parameters, and may produce ambiguous cluster boundaries in complex terrains such as steep slopes or gullies, leading to the misclassification of valid terrain points. Dong Jiang et al. [13] proposed a trend surface filtering method that fits the seabed topography by constructing a polynomial surface function based on depth data and planar position coordinates. This method performs well in detecting outliers when the seabed terrain is relatively flat; however, it fails to accurately represent the true seabed morphology in complex terrain conditions. Zhao Xianghong et al. [14] proposed a gross error detection method for multibeam bathymetric data based on a BP neural network. By constructing and training a multilayer feedforward network, the method fits the complex curves of single-ping data and, in combination with correlation analysis and vertical inspection of adjacent pings, achieves gross error localization and removal. However, its performance strongly depends on the quality of the training samples, it involves high computational complexity, and it tends to ignore geomorphic features in areas with significant terrain undulations.

To address the challenges of outlier detection in underwater terrain data, Li [15] proposed an optimized stepwise outlier detection algorithm combining DBSCAN and Isolation Forest (DBSCAN-IForest). By automatically determining the neighborhood radius and minimum-points parameters of the DBSCAN algorithm, the method reduces the influence of manual intervention on the detection results. Experimental results demonstrate that this algorithm achieves a significantly higher outlier detection rate than traditional methods. However, the effects of sample limitations and environmental variables on the algorithm’s generalization capability still require further investigation. A comparison of these traditional methods is presented in Table 1.

Although significant progress has been made in denoising and outlier detection methods for multibeam bathymetric data, several challenges remain [20,21,22,23,24]. Most traditional algorithms rely heavily on empirical parameter tuning, making them difficult to adapt to complex and variable seabed topography. Although intelligent algorithms exhibit adaptive capabilities, they often struggle to strike a balance between detection accuracy and computational efficiency [25]. The multibeam bathymetric data used in this study were collected from a typical tectonic geomorphological region in the northeastern Pacific Ocean, where seamounts and valleys are distributed alternately. The local terrain is highly rugged, with frequent slope transitions and significant elevation changes, representing a complex and challenging scenario for multibeam mapping applications. High-density coverage surveys were conducted along multiple survey lines in this area, and attitude compensation, sound-speed correction, and preliminary quality control were applied to the raw data. Nevertheless, due to complex sea conditions, strong near-surface sound-speed variations, and numerous slope breaks within the survey region, a considerable number of noise points and prominent outliers remained after data inversion. These noisy points were concentrated not only along seamount flanks and valley areas but also in relatively gentle terrains, resulting in locally fragmented surfaces, abrupt elevation discontinuities, and strip-like artifacts in the initial terrain model.

To improve data quality, we tested a variety of traditional denoising methods, including clustering-based algorithms and mainstream processing techniques such as CUBE filtering. Although these methods can effectively remove some significant outliers in smooth or moderately undulating areas, two major challenges persist in regions with strong geomorphic variability, such as seamounts and stepped slopes [26,27]. On the one hand, to avoid mistakenly removing true terrain features, parameter settings are typically conservative, causing many subtle and hidden anomalous points to remain undetected. On the other hand, in areas with abrupt slope changes and highly variable topography, traditional approaches that rely heavily on empirical thresholds tend to misclassify normal terrain points as outliers, resulting in excessive smoothing or “filter-wave” artifacts that compromise terrain authenticity and continuity.

To address these issues, this study proposes an optimized hybrid outlier detection framework that integrates the CUBE filtering approach with the Isolation Forest (IForest) algorithm. The proposed method first employs CUBE filtering based on local uncertainty estimation to remove significant outliers, and subsequently applies the Isolation Forest algorithm to accurately detect subtle and clustered anomalies. This stepwise strategy enhances detection accuracy and robustness under varying terrain conditions. Validation using real multibeam bathymetric data demonstrates that the proposed CUBE-IForest (CIF) method effectively identifies both significant and subtle outliers, achieving substantially higher detection accuracy and a lower false-positive rate compared with traditional methods.

2. Methodology

2.1. CUBE Algorithm

CUBE (Combined Uncertainty and Bathymetry Estimator) is a multibeam bathymetric data processing algorithm proposed by Calder and Mayer [28,29]. Its core concept is to model and propagate the uncertainty of each measurement point and to select the most reliable depth value based on Bayesian estimation [30].

For multiple depth observations within the same grid cell, let each measurement be denoted as z_{i} and its associated uncertainty as σ_{i}. CUBE computes the expected depth using maximum likelihood or minimum variance estimation, which for independent Gaussian errors both reduce to the inverse-variance weighted mean:

z^{*} = \frac{\sum_{i} \frac{z_{i}}{σ_{i}^{2}}}{\sum_{i} \frac{1}{σ_{i}^{2}}},

(1)

where i ranges over [1, n] and n is the total number of depth observations within the grid cell. A measurement is considered an outlier if it deviates from the current depth estimate by more than a specified multiple of the estimated uncertainty.
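The weighted mean in Eq. (1) can be illustrated with a minimal Python sketch. It is a simplification for illustration only: the full CUBE algorithm updates its estimates sequentially and tracks multiple depth hypotheses, which is not reproduced here. The function names, the multiple k = 3, the test of each sounding against k times its own σ_{i}, and the sample soundings are all assumptions, not values taken from this study.

```python
import math

def cell_estimate(depths, sigmas):
    """Inverse-variance weighted mean of Eq. (1) and the uncertainty of that mean."""
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    z_star = sum(w * z for w, z in zip(weights, depths)) / total
    return z_star, math.sqrt(1.0 / total)

def flag_outliers(depths, sigmas, k=3.0):
    """Flag soundings farther than k * sigma_i from the cell estimate.

    k is a hypothetical multiple; the source does not state its value.
    """
    z_star, _ = cell_estimate(depths, sigmas)
    return [abs(z - z_star) > k * s for z, s in zip(depths, sigmas)]

# Illustrative soundings (metres) in one grid cell, not survey data.
depths = [-5001.0, -4999.0, -5000.0, -5002.0, -4998.0,
          -5000.0, -5001.0, -4999.0, -5000.0, -5030.0]
sigmas = [5.0] * len(depths)

z_star, sigma_star = cell_estimate(depths, sigmas)
flags = flag_outliers(depths, sigmas)
print(round(z_star, 3), flags)
```

Because a gross outlier also pulls the weighted mean toward itself, a single-pass test like this one only works when the cell contains enough consistent soundings.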

2.2. Isolation Forest Algorithm

Isolation Forest (IForest) is an efficient unsupervised learning algorithm for detecting outliers in data [31,32,33,34]. It constructs decision trees by randomly selecting features and split points to isolate data points. Outliers are typically few in number and differ significantly from normal data, making them more easily identified through random splits. Since outliers occupy a small proportion of the dataset and their feature distribution deviates from that of normal points, normal data tend to cluster in dense regions, whereas outliers can be isolated into separate subspaces with only a few random splits due to their distinctive features. Unlike traditional distance- or density-based methods, Isolation Forest detects outliers by directly exploiting the property that outliers are more easily isolated. This approach demonstrates significant advantages when handling high-dimensional data and large-scale datasets [35,36,37].

Algorithm Implementation Steps:

Construct Isolation Forest: Randomly select subsamples and recursively partition them to generate multiple isolation trees. In each isolation tree, recursively choose a feature and split value to divide the data into left and right subtrees.
Calculate Path Length: The path length is defined as the number of splits required from the root node to the leaf node. For each data point, calculate the average number of splits needed for isolation across all trees. Anomalous points typically have shorter path lengths (because they are easier to isolate quickly).
Anomaly Determination: The anomaly score s is computed; scores close to 1 indicate anomalies, whereas scores well below 0.5 indicate normal data.

Anomaly Score: The score is calculated from the average path length over all isolation trees:

s(x, n) = 2^{- \frac{E(h(x))}{c(n)}}

(2)

where h(x) is the path length of data point x in a single tree, E(h(x)) is the average path length of x across all trees, and c(n) is the normalization factor, which depends on the number of samples n. The base-2 exponential maps the normalized average path length onto a score interval, so that s ∈ [0, 1], with values closer to 1 indicating a higher likelihood of being an anomaly.
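The score in Eq. (2) can be computed directly once the average path length is known. The sketch below assumes the normalization factor from the original Isolation Forest formulation, c(n) = 2H(n − 1) − 2(n − 1)/n with H(i) ≈ ln(i) + 0.5772, which is not spelled out in this section. The function names and the sample size n = 256 are illustrative choices.

```python
import math

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Normalization factor c(n): average path length of an unsuccessful
    binary-search-tree lookup over n samples (assumed standard definition)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA   # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def anomaly_score(mean_path_length, n):
    """Eq. (2): s = 2 ** (-E(h(x)) / c(n))."""
    return 2.0 ** (-mean_path_length / c_factor(n))

n = 256  # hypothetical subsample size per tree
for mean_depth in (2.0, c_factor(n), 20.0):
    print(round(mean_depth, 2), round(anomaly_score(mean_depth, n), 3))
```

A point whose average path length equals c(n) scores exactly 0.5; shorter paths push the score toward 1 and longer paths push it toward 0.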

2.3. Constructing the Model for the Anomaly Detection Algorithm

CUBE filtering computes statistical measures for the neighborhood of each point in the three-dimensional data and removes obvious outliers by setting thresholds based on these statistics. The Isolation Forest algorithm is then combined with CUBE filtering through feature-level fusion to achieve efficient outlier detection; the process is shown in Figure 1. For each point p, the points within its neighborhood radius are used to construct a covariance matrix. Subsequently, features of the filtered data, including the k-nearest neighbor (KNN) average distance d_{avg}, the distance variance σ_{d}^{2}, the inverse local density \frac{1}{ρ}, the planarity P, and the curvature C, together with the original coordinates X, Y, Z, are used to construct feature vectors. These feature vectors are then fed as input to the Isolation Forest algorithm.

F = [w_{x} X, w_{y} Y, w_{z} Z, w_{d_{avg}} d_{avg}, w_{σ_{d}^{2}} σ_{d}^{2}, w_{\frac{1}{ρ}} \frac{1}{ρ}, w_{P} P, w_{C} C]

(3)

The weights w_{*} are determined using grid search and are configured as follows:

w_{x} = w_{y} = w_{z} = 1.0, w_{d_{avg}} = 2.5, w_{σ_{d}^{2}} = 2.0, w_{\frac{1}{ρ}} = 1.8, w_{P} = w_{C} = 3.0.

The planarity P and curvature C are defined as:

P = \frac{λ_{1} - λ_{2}}{λ_{1} + λ_{2} + λ_{3}}, C = \frac{λ_{3}}{λ_{1} + λ_{2} + λ_{3}}

where λ_{1} \geq λ_{2} \geq λ_{3} are the eigenvalues of the neighborhood covariance matrix.

In the Isolation Forest phase, different weights are assigned to different types of features. High weights (w_{P}, w_{C} \geq 3.0) are given to the geometric features because seabed terrain anomalies often manifest as sudden changes in curvature or planarity; assigning higher weights enhances the model’s sensitivity to structural deformations. For the statistical features, weights of approximately 2.0 (w_{d_{avg}}, w_{σ_{d}^{2}}, w_{\frac{1}{ρ}} \approx 2.0) are used to suppress random noise caused by sea conditions while preserving density gradient anomalies. The coordinates are given low weights to prevent the spatial coordinates from dominating the segmentation process, which could mask local patterns. This approach retains spatial location information while emphasizing local statistical and geometric characteristics.
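The construction of the weighted feature vector can be sketched in Python with NumPy. The weights and the planarity and curvature formulas follow the text. Everything else is an assumption made for illustration: the neighborhood size k = 8, the density definition (k neighbors inside a sphere whose radius is the largest KNN distance), the function name, and the synthetic flat grid. The Z-score standardization applied in the study is omitted to keep the sketch short.

```python
import numpy as np

# Weights from the text, in the order X, Y, Z, d_avg, var_d, 1/rho, P, C.
WEIGHTS = np.array([1.0, 1.0, 1.0, 2.5, 2.0, 1.8, 3.0, 3.0])

def weighted_feature_vector(points, index, k=8):
    """Build the weighted feature vector F of Eq. (3) for one point.

    k and the density definition are assumptions made for this sketch.
    """
    p = points[index]
    dists = np.linalg.norm(points - p, axis=1)
    nearest = np.argsort(dists)[1:k + 1]   # drop the point itself (distance 0)
    knn_d = dists[nearest]

    d_avg = knn_d.mean()
    d_var = knn_d.var()
    # Assumed density: k neighbours inside a sphere of radius max(knn_d).
    inv_density = (4.0 / 3.0) * np.pi * knn_d.max() ** 3 / k

    # Eigenvalues of the neighbourhood covariance; eigvalsh returns them
    # in ascending order, so unpack as l3 <= l2 <= l1.
    cov = np.cov(points[nearest].T)
    l3, l2, l1 = np.linalg.eigvalsh(cov)
    total = l1 + l2 + l3
    planarity = (l1 - l2) / total if total > 0 else 0.0
    curvature = l3 / total if total > 0 else 0.0

    raw = np.array([p[0], p[1], p[2], d_avg, d_var, inv_density,
                    planarity, curvature])
    return WEIGHTS * raw

# Illustrative flat 5 x 5 grid; the centre point (2, 2, 0) has index 12.
grid = np.array([[x, y, 0.0] for x in range(5) for y in range(5)], dtype=float)
features = weighted_feature_vector(grid, 12)
print(features)
```

On a flat, evenly spaced grid the smallest eigenvalue vanishes and the two largest coincide, so both geometric features are close to zero; spikes or steps in the neighborhood raise λ_{3} and therefore C.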

3. Experimental Validation and Analysis

3.1. Data Description and Parameter Settings

The data selected for the experiment are derived from the real-world water depth measurements of the R2 Sonic 2024 multibeam sonar system in a specific region of the Northeast Pacific. The dataset covers an area of approximately 1900 m × 1600 m, containing a total of 158,720 discrete water depth data points. After applying corrections for installation deviations, attitude, sound velocity, and tide level, as shown in Figure 2, numerous anomalous data points were identified. These anomalies led to significant false topography, which severely impacted the accuracy of the true seafloor digital elevation model (DEM) and the contour map. Therefore, these outliers must be filtered out.

CUBE filtering was applied for data cleaning, feature extraction, and standardization. Missing values, duplicates, and obviously erroneous data were removed, and outlier records with depth values outside a reasonable range were excluded. Essential features were extracted for each depth point, including spatial coordinates and depth values. These features served as inputs for the Isolation Forest algorithm to identify potential outliers. Considering the potential differences in scale and range among features, the data were standardized using the Z-score method [38]. This process eliminates the influence of differing scales, allowing all features to be compared on a uniform basis, thereby enhancing the performance and stability of the Isolation Forest algorithm.
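As a brief illustration of the Z-score step, the following sketch standardizes one feature column using only the Python standard library. The function name and the sample depths are illustrative, and the use of the population standard deviation is an assumption, since the text does not say which estimator was applied.

```python
import statistics

def zscore(values):
    """Standardize a feature column to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)      # population standard deviation (assumed)
    if std == 0:
        return [0.0 for _ in values]     # constant column carries no information
    return [(v - mean) / std for v in values]

# Illustrative depths in metres, not survey data.
depths = [-5000.0, -5010.0, -4990.0]
standardized = zscore(depths)
print(standardized)
```

Applying the same transformation to every feature column puts coordinates, distances, and geometric descriptors on a comparable scale before they reach the Isolation Forest.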

After preprocessing the raw data, feature selection was performed to identify the subset of features that contributed most to outlier detection. The dataset comprising the selected features was then used to train the Isolation Forest algorithm, during which the key model parameters, including the number of trees, the maximum depth of each tree, and the number of samples used for splitting, were systematically adjusted to identify the optimal configuration. Cross-validation was employed to ensure that the selected parameters performed well across different datasets. Finally, the anomaly score was used to quantify the degree of anomaly for each data point, and a threshold on this score was applied to identify potential outliers.

In anomaly detection, the setting of thresholds directly affects the accuracy and reliability of the final results. In this study, a sliding window approach was employed for threshold determination. The data were divided into multiple windows, with thresholds computed independently within each window and used to assess anomalies. Local thresholds were applied in dense regions, while sparse regions reverted to a global threshold to prevent misclassification due to uneven data distribution. As shown in Figure 3, the histogram exhibits two main peaks, indicating distinct grouping of thresholds across different depth regions. The data exhibit a right-skewed distribution, which is attributed to the pronounced terrain variations in the region. Landslide areas are characterized by locally high curvature and low density, with gradual transitions at the edges to normal terrain.
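The windowed thresholding idea can be sketched as follows. The text does not give the threshold rule, window size, or density criterion, so every specific here is a hypothetical choice for illustration: a mean plus k standard deviations rule, one-dimensional windows of fixed width, a minimum point count to decide whether a window is dense, and the sample scores.

```python
from collections import defaultdict
import statistics

def adaptive_thresholds(positions, scores, width, min_count, k=1.5):
    """Return one threshold per point: local in dense windows, global otherwise.

    The rule mean + k * std and all parameters are assumptions of this sketch.
    """
    global_thr = statistics.fmean(scores) + k * statistics.pstdev(scores)

    # Group anomaly scores by the spatial window each point falls into.
    windows = defaultdict(list)
    for pos, s in zip(positions, scores):
        windows[int(pos // width)].append(s)

    window_thr = {}
    for key, vals in windows.items():
        if len(vals) >= min_count:       # dense window: use a local threshold
            window_thr[key] = statistics.fmean(vals) + k * statistics.pstdev(vals)
        else:                            # sparse window: fall back to global
            window_thr[key] = global_thr
    return [window_thr[int(pos // width)] for pos in positions]

# Illustrative along-track positions (m) and anomaly scores, not survey data.
positions = [0, 1, 2, 3, 4, 105]
scores = [0.40, 0.42, 0.41, 0.43, 0.80, 0.45]

thresholds = adaptive_thresholds(positions, scores, width=10, min_count=3)
flags = [s > t for s, t in zip(scores, thresholds)]
print(flags)
```

The isolated point at position 105 sits alone in its window, so it is judged against the global threshold rather than against a statistic computed from a single sample.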

The anomaly thresholds exhibit significant variation with seabed depth, as illustrated in Figure 4. The individual thresholds indicated by orange points fluctuate considerably, whereas the layered average thresholds represented by the green line clearly display the trend with depth. The threshold rises sharply (−6000 to −5500 m), stabilizes (−5500 to −4500 m), then decreases, reflecting depth-dependent threshold heterogeneity. In deeper regions (below −5500 m), the thresholds are relatively low, possibly due to sparse anomaly distribution or flatter terrain. Between −5500 m and −4500 m, the thresholds increase rapidly, indicating the presence of complex geomorphic structures in this region and necessitating stricter anomaly detection criteria. Beyond −4500 m, the thresholds gradually decrease, which may reflect smoother terrain or reduced anomaly density. After threshold determination, anomaly detection was performed on the data, as shown in Figure 5. In the figure, blue points represent normal values, while red points indicate detected outliers.

3.2. Other Methods

To validate the feasibility of the CUBE-IForest (CIF) method for detecting outliers in multibeam bathymetric data, three commonly used outlier detection methods were selected for comparison: the CUBE filtering method, a neural network-based filtering method, and the DBSCAN method.

CUBE filtering, based on spatial neighborhood analysis, is widely used for outlier removal in bathymetric data. By analyzing the distribution of depth values within a neighborhood, outliers deviating from the normal range can be identified and removed. Specifically, a threshold of three times the mean deviation was employed to detect outliers in this study. The number of outliers detected by the CUBE filtering method, based on a fitted seabed surface model, is shown in Figure 6, where blue points represent normal values and red points indicate outliers.

The DBSCAN method is a density-based spatial clustering algorithm that distinguishes normal points from outliers using a neighborhood radius (Eps) and a minimum number of points (MinPts) [39]. In this study, the parameters were determined using the K-distance method, with the elbow point of 0.22 selected as the neighborhood radius and the minimum number of points set to 10. The experimental results are presented in Figure 7.
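The K-distance curve used to choose Eps can be computed with a short brute-force sketch. It only builds the sorted curve; locating the elbow is left to inspection, as is common practice. The function name and the toy points are illustrative, and in the setting described above k would correspond to MinPts.

```python
import math

def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest neighbour."""
    curve = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        curve.append(dists[k - 1])
    return sorted(curve)

# Toy example: a tight cluster of four points and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
curve = k_distances(points, k=2)
print(curve)
```

Points inside the cluster share a small k-distance, while the distant point produces the sharp rise at the end of the curve; the elbow just before that rise is the usual candidate for Eps. The quadratic cost of this brute-force version would call for a spatial index on a dataset of the size used in this study.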

The neural network-based filtering methods leverage deep learning techniques to learn the complex nonlinear features of the data through model training, thereby enabling automatic identification and removal of outliers. In this study, an autoencoder was selected as a representative neural network-based filtering method. This method fits a surface model to the data during training and employs a threshold of three times the mean deviation for outlier detection. The experimental results are shown in Figure 8, where blue points represent normal values and red points indicate detected outliers.

A comparison of the detection results shows that the CIF method clearly outperforms the traditional CUBE, DBSCAN, and neural network methods. The CUBE method does not consider local terrain continuity, which makes detection overly sensitive and produces serious false alarms, with a large number of normal terrain undulations misclassified as outliers. The DBSCAN method identifies some anomalous clusters but has poor robustness in density-uniform regions, resulting in blurred and disordered detection edges. The neural network method (Figure 8) shows insufficient generalization ability and fails to extract anomalous features effectively, causing large-scale missed detections. In contrast, the CIF method offers better feature discrimination and spatial consistency: it accurately captures real anomalous blocks in clustered distributions, avoiding scattered misjudgments, and it distinguishes the subtle differences between “sensor noise” and “natural terrain undulations”. It thereby preserves the integrity of the terrain structure and achieves the highest detection precision and recall among the compared methods, demonstrating its suitability for complex 3D terrain data.

3.3. Sensitivity Analysis

To further evaluate the stability and reliability of the model, this section presents a two-factor sensitivity analysis of two key hyperparameters: the number of decision trees and the contamination rate (the assumed proportion of outliers in the data). The experimental results are shown in Figure 9. The values in the figure represent the prediction errors of the model under specific parameter combinations.

The horizontal trend of the heat map shows that the contamination rate is the dominant factor affecting model performance. When the contamination rate increases from 0.01 to 0.1, the model error rises significantly (the color changes from light yellow to deep red).

For a fixed number of decision trees, a lower contamination rate leads to better performance [40]. This indicates that the model is sensitive to the prior assumption about the proportion of anomalies in the data: an excessively high contamination rate causes the model to misclassify some normal samples as anomalous. Along the vertical direction, increasing the number of decision trees improves the accuracy and stability of the model, but with diminishing marginal returns.

When n_estimators (the number of trees) increases from 50 to 250, the values in each column decrease, and the model performance gradually improves.

In particular, at a high contamination rate (such as 0.1), increasing the number of trees significantly alleviates the decline in performance, whereas at a lower contamination rate a relatively small number of trees (such as 150) is sufficient to reach stable performance.

These results suggest that, to ensure prediction accuracy, a larger number of decision trees should be preferred to strengthen the ensemble effect of the model, and the contamination rate should be set carefully according to the actual survey conditions.

3.4. Contour Analysis

Furthermore, by comparing the original data with the processed data and depth contour maps, the effectiveness of the proposed method in detecting outliers in regions with significant seabed relief, as well as its impact on data quality, can be visually observed.

In the contour map of the original data, noticeable anomalous fluctuations and abrupt changes in local contour lines are observed in certain regions (Figure 10a). These fluctuations are caused by outliers arising from noise or environmental interference during data acquisition, which prevent the contours from accurately reflecting the continuity and smoothness of the seabed topography. The contour map generated from the data processed by the CUBE-IForest method (Figure 10b) shows well-constrained edges, without dense contour clustering, accurately representing terrain features and minimizing the influence of outliers on the model. Compared with the original data, the CUBE method smooths the surface effectively, but some outliers still cause local distortions in the contour lines (Figure 10c). DBSCAN effectively identifies isolated outliers, reducing the impact of noise, and generates smoother contour maps compared with the original data and the CUBE method; however, some minor outliers remain (Figure 10d). Although the neural network provides a good fitting performance, some outliers remain unremoved, resulting in locally dense contour lines (Figure 10e). Overall, the CUBE-IForest method demonstrates the best performance in outlier removal, producing the clearest and smoothest contour maps.

3.5. Processing of Seamount Data

To further evaluate the detection accuracy and computational efficiency of the proposed CUBE-IForest method in complex seabed terrains, it was applied to a set of multibeam bathymetric data from a typical seamount region. In the original terrain, multiple anomalous spikes and noise artifacts were observed on the seamount slopes and adjacent seabed areas (Figure 11a). These outliers are primarily caused by multiple reflections of acoustic waves on steep slopes and inconsistencies in beam footprints, which significantly compromise the accuracy and continuity of the terrain data.

The original seabed terrain exhibits noticeable deficiencies: in certain areas highlighted by red circles (Figure 11a), anomalous protruding noise points interfere with the accurate interpretation of the terrain, while the presentation of terrain details is relatively blurred, and the hierarchical structure of the seabed morphology is insufficiently clear. After processing with the CUBE-IForest method (Figure 11b), the visual representation of the terrain is significantly improved. Noise is largely filtered, the color gradation of the seabed becomes more distinct, and the overall contours are more regular, enabling a clearer depiction of the general seabed morphology. The CUBE method (Figure 11c) provides some noise suppression, but residual local anomalies remain, and the restoration of terrain details is insufficient. Compared with the CUBE-IForest method, its optimization of seabed morphology is relatively limited. DBSCAN processing (Figure 11d) improves terrain representation to some extent and reduces part of the noise; however, residual noise is still evident, and the preservation of seabed details remains suboptimal, resulting in a less satisfactory overall terrain depiction. Neural network processing (Figure 11e) effectively removes most noise interference, although the terrain in the lower-left corner remains affected by noise. While it preserves the undulations and structural details of the seabed well, the overall boundaries of the seabed terrain remain unclear.

In summary, the optimization effects of different data processing methods on seabed terrain vary significantly. Considering noise filtering, preservation of terrain details, and the overall contour representation, the CUBE-IForest method demonstrates superior performance in optimizing seabed terrain data, clearly identifying both the variations in elevation and local geomorphic features. It achieves an effective balance between noise suppression and detail preservation, providing more reliable visual and data support for precise analysis and study of seabed topography.

4. Discussion

The map generated from the processed data is shown in Figure 12. In the proposed model, the CUBE algorithm is first employed to estimate depth values and their associated uncertainties at a local scale, enabling spatial gridding and physically constrained pre-screening of the data. This produces a “preliminary smoothed terrain” with uncertainty indicators, reducing extreme outliers. Building on this, an Isolation Forest model is introduced to perform detailed identification and removal of residual and concealed outliers. A sliding window mechanism combined with dynamic thresholding is employed to address the instability of detection thresholds caused by variations in data distribution across different terrain environments. By moving the window spatially and continuously analyzing local anomaly distribution characteristics, the decision threshold for IForest anomaly scores is dynamically adjusted, allowing the detection process to adapt to local variations in complex terrains, such as seamounts and canyons. Statistics of the original and processed data are given in Table 2. Compared with the raw data, the statistics of the processed data fluctuate only slightly, and the distribution characteristics of the normal data are unchanged.

To compare the performance of the proposed method with that of traditional approaches in detecting outliers, precision, recall, and the F1-score were used as evaluation metrics [41]. Precision measures the proportion of correctly predicted positive samples among all samples predicted as positive, reflecting the accuracy of the model’s positive predictions. Recall measures the proportion of true positive samples that are correctly identified by the model, indicating how many actual positive instances are detected. The F1-score is the harmonic mean of precision and recall, providing a combined measure that balances both metrics. A higher F1-score indicates a better balance between precision and recall.

\begin{matrix} \mathrm{Precision} = \frac{TP}{FP + TP} \\ \mathrm{Recall} = \frac{TP}{FN + TP} \\ \mathrm{F\text{-}score} = 2 \times \frac{\mathrm{Recall} \times \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}} \end{matrix}

Here, TP (true positive) denotes a correctly identified outlier; in this study, the true outliers were determined through manual inspection by experts. FP (false positive) denotes a normal point incorrectly identified as an outlier, and FN (false negative) denotes an outlier incorrectly classified as normal. Precision evaluates the effectiveness of outlier detection, with higher values indicating better performance.
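The three metrics can be computed from the confusion counts with a few lines of Python. The counts below are invented purely to exercise the formulas and are not results from this study; the function name is likewise illustrative.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F-score from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score

# Hypothetical counts: 90 outliers found correctly, 10 false alarms, 30 missed.
precision, recall, f_score = detection_metrics(tp=90, fp=10, fn=30)
print(round(precision, 4), round(recall, 4), round(f_score, 4))
```

Because the F-score is a harmonic mean, it stays closer to the lower of the two inputs, so a method cannot score well by trading away recall for precision or the reverse.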

The performance of the four methods was compared through experiments, and the specific results are summarized in Table 3.

As shown in Figure 13, the CUBE-IForest method demonstrates superior performance in detecting outliers in multibeam bathymetric data. It outperforms the other methods in terms of precision, recall, and F-score, while maintaining reasonable computational efficiency (4.76 s, compared with 13.35 s for the neural network; Table 3). Although its recall is 81.41%, the small number of false positives (36 among 614 flagged points, a precision of 94.14%) indicates that high-quality outlier detection can be achieved with minimal false alarms. In applications that are sensitive to false positives and require a balance between detection quality and efficiency, the CUBE-IForest method has a clear technical advantage.

According to the evaluation metrics, the model improves on the traditional methods by approximately 10% in each of precision, recall, and F-score. Taken across these three key indicators together, the overall performance of the model has increased by about 30%. The gain is therefore not confined to a single metric but reflects an improvement of the model across all evaluation dimensions.
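The size of the improvement depends on which baseline is considered. The sketch below computes the differences in percentage points between CUBE-IForest and each comparison method, using the values of Table 3. How these differences are aggregated into a single figure is not stated in the text, so no aggregate is computed here; the variable names are illustrative.

```python
# (precision, recall, F-score) in percent, copied from Table 3.
table3 = {
    "CUBE-IForest": (94.14, 81.41, 87.31),
    "CUBE": (78.45, 70.62, 74.33),
    "DBSCAN": (80.52, 57.23, 66.91),
    "Neural network": (88.67, 78.13, 83.07),
}

proposed = table3["CUBE-IForest"]

# Gain of the proposed method over each baseline, in percentage points.
gains = {
    name: tuple(round(p - b, 2) for p, b in zip(proposed, row))
    for name, row in table3.items()
    if name != "CUBE-IForest"
}

for name, (dp, dr, df) in gains.items():
    print(f"vs {name}: precision +{dp}, recall +{dr}, F-score +{df}")
```

The gains are smallest against the neural network and largest against DBSCAN in recall and F-score.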

5. Conclusions

This study integrates existing techniques and applies them in a distinctive way to multibeam bathymetric data [42,43]. Unlike previous methods, the approach improves the efficiency of data processing while maintaining high accuracy, even in high-noise environments. It addresses the problem of identifying outliers in multibeam bathymetric data, in particular the limitations of traditional algorithms in efficiency and accuracy. A hybrid outlier detection method combining CUBE filtering and the Isolation Forest algorithm (CUBE-IForest) is proposed. The method combines the spatial consistency of CUBE filtering with the statistical discriminative ability of the Isolation Forest to detect various types of outliers hierarchically, thereby improving the efficiency and reliability of data cleaning. Validation using in situ multibeam data from the northeastern Pacific indicates that the method performs well overall in terms of detection accuracy, computational efficiency, and adaptability.

Although the proposed method was validated using complex deep-sea geomorphological datasets, its applicability is not restricted to steep or rugged terrain. In flatter or shallow-water environments, the method is expected to remain effective but may exhibit reduced sensitivity due to weaker terrain-induced feature contrasts. In addition, the performance of the method depends on the availability of sufficient local samples; therefore, extremely sparse survey designs or highly irregular beam configurations may limit the robustness of local feature estimation. Compared with recent deep learning and machine learning techniques for outlier detection, the advantages of the method lie in its faster processing speed and its better detection results in the experiments. However, there are also some limitations. The ground truth in the experiment was determined manually; manual labeling makes full use of prior experience and background knowledge, but its subjectivity can introduce errors. In addition, the method has not yet been fully tested on other seabed landforms, which will be addressed in future work.

Overall, the CUBE-IForest method demonstrates strong performance in detection accuracy, computational efficiency, and adaptability, providing an effective approach to enhance the reliability of multibeam bathymetric data and the accuracy of subsequent terrain analysis. This study not only offers valuable insights for multibeam survey data processing but also provides a novel perspective for outlier identification in complex spatial datasets. From an application standpoint, the method facilitates quality control and automatic outlier removal in multibeam bathymetric data, supporting seabed terrain modeling, geohazard identification, and analysis of subsea structural features. However, some limitations remain: the current validation is primarily based on offline data processing, and the algorithm’s stability and computational performance under real-time data streams have not yet been fully assessed. Future research will focus on adaptive parameter optimization, integration with deep learning models, and embedded implementation within real-time measurement systems, aiming to further enhance the robustness and practicality of the method.

Author Contributions

Conceptualization, R.H. and X.H.; Methodology, R.H.; Software, R.H.; Validation, R.H., Y.Z. and Y.H. (Yukai Hong); Formal analysis, R.H.; Investigation, R.H., Y.H. (Yuan Huan) and S.H.; Writing—review & editing, X.H., X.C. and X.L.; Resources, X.H., X.C. and X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is funded by the National Key R&D Program of China (Grant No. 2023YFC2811305, 2023YFC2811205-01), Scientific Research Fund of the Second Institute of Oceanography, MNR (Grant No. SZ2405), and National Natural Science Foundation of China (Grant No. U2244222).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Chen, X.; Zhu, J.; Zhu, Q.; Jiao, Y.; Ding, X.; Liu, Z.; Ding, D.; Jia, Y.; Li, S.; Liu, Y. Fine Processing and Analysis of Multibeam Bathymetric Data Outlier from Surveying and Mapping in the South China Sea. Earth Sci. 2023, 50, 535–550.
2. Durap, A. Interpretable machine learning for coastal wind prediction: Integrating SHAP analysis and seasonal trends. J. Coast. Conserv. 2025, 29, 24.
3. Durap, A. Predicting ocean parameters with explainable machine learning: Overcoming scale and time challenges. Reg. Stud. Mar. Sci. 2025, 90, 104424.
4. Zhou, S.; Guo, J.; Zhang, H.; Jia, Y.; Sun, H.; Liu, X.; An, D. SDUST2023BCO: A global seafloor model determined from a multi-layer perceptron neural network using multi-source differential marine geodetic data. Earth Syst. Sci. Data 2025, 17, 165–179.
5. Shaw, J.; Potter, D.P.; Wu, Y. Geomorphic diversity and complexity of the inner shelf, Canadian Arctic Archipelago, based on LiDAR and multibeam sonar surveys. Can. J. Earth Sci. 2020, 57, 123–132.
6. Cao, D.; Wang, C.; Du, M.; Xi, X. A Multiscale Filtering Method for Airborne LiDAR Data Using Modified 3D Alpha Shape. Remote Sens. 2024, 16, 1443.
7. Durap, A. Explainable machine learning for bathymetric mapping: Adaptive normalization and feature engineering in complex seabed terrains. Ocean Sci. J. 2025, 60, 52.
8. Hongjian, X. Study on Seabed 3D Modeling Based on Multi-beam Sounding Anomaly Detection. Appl. IC 2022, 39, 186–187.
9. Skytt, V.; Kermarrec, G.; Dokken, T. LR B-splines to approximate bathymetry datasets: An improved statistical criterion to judge the goodness of fit. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102894.
10. Peng, G.; Ji, Y.; Yue, J.; Li, J.; Song, Y. Research on multi-beam bathymetric data filtering method based on cluster analysis. Eng. Surv. Mapp. 2016, 25, 31–34.
11. Wei, Y.; Jin, S.; Li, S.; Wang, L.; Bian, G.; Wang, M. Automatic recognition and cleaning of outliers in multi-beam bathymetric data using clustering algorithm. Acta Geod. Cartogr. Sin. 2022, 51, 2294–2302.
12. Hou, T.; Huff, L.C.; Mayer, L. Automatic Detection of Outliers in Multibeam Echo Sounding Data. In Center for Coastal and Ocean Mapping; University of New Hampshire: Durham, NH, USA, 2001.
13. Dong, J.; Ren, L. Filter of MBSS sounding data based on trend surface. Hydrogr. Surv. Charting 2007, 27, 25–28.
14. Zhao, X.; Bao, J.; Huang, C.; Ouyang, Y.; Lu, X.; Huang, X. Detecting outliers of multibeam sounding with BP neural network. Geomat. Inf. Sci. Wuhan Univ. 2019, 44, 518–524.
15. Li, M.; Su, M.; Zhang, B.; Yue, Y.; Wang, J.; Deng, Y. Research on a DBSCAN-IForest Optimisation-Based Anomaly Detection Algorithm for Underwater Terrain Data. Water 2025, 17, 626.
16. Wang, J.; Jin, S.; Bian, G.; Cui, Y.; Long, Z. A multi-beam outlier automatic filtering algorithm combining uncertainty and density clustering method. Acta Geod. Cartogr. Sin. 2023, 52, 1669–1678.
17. Chen, G.Y.; Krzyzak, A. Wavelet-based 3D Data Cube Denoising Using Three Scales of Dependency. Circuits Syst. Signal Process. 2024, 43, 4010–4020.
18. Li, Q.; Han, B.; Wang, X. Multidimensional Data Anomaly Detection Method Based on Fuzzy Isolated Forest Algorithm. Comput. Digit. Eng. 2020, 48, 862–866.
19. Chen, X.; Zhong, M.; Sun, M.; An, D.; Feng, W.; Yang, M. Recovering Bathymetry Using BP Neural Network Combined with Modified Gravity–Geologic Method: A Case Study in the South China Sea. Remote Sens. 2024, 16, 4023.
20. Yang, F.; Xu, F.; Fan, M.; Bu, X.; Tu, Z.; Yan, X. An Intelligent Detection Method for Different Types of Outliers in Multibeam Bathymetric Point Cloud. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5920710.
21. Li, Z.; Peng, Z.; Zhang, Z.; Chu, Y.; Xu, C.; Yao, S.; García-Fernández, Á.F.; Zhu, X.; Yue, Y.; Levers, A.; et al. Exploring modern bathymetry: A comprehensive review of data acquisition devices, model accuracy, and interpolation techniques for enhanced underwater mapping. Front. Mar. Sci. 2023, 10, 1178845.
22. Zhang, K.; Yang, F.; Zhao, C.; Feng, C. Using robust correlation matching to estimate sand-wave migration in Monterey Submarine Canyon, California. Mar. Geol. 2016, 376, 102–108.
23. Kulanuwat, L.; Chantrapornchai, C.; Maleewong, M.; Wongchaisuwat, P.; Wimala, S.; Sarinnapakorn, K.; Boonya-aroonnet, S. Anomaly Detection Using a Sliding Window Technique and Data Imputation with Machine Learning for Hydrological Time Series. Water 2021, 13, 1862.
24. Shvetsova, N.; Bakker, B.; Fedulova, I.; Schulz, H.; Dylov, D.V. Anomaly Detection in Medical Imaging with Deep Perceptual Autoencoders. IEEE Access 2021, 9, 118571–118583.
25. Xiong, Z.; Zhu, D.; Liu, D.; He, S.; Zhao, L. Anomaly Detection of Metallurgical Energy Data Based on iForest-AE. Appl. Sci. 2022, 12, 9977.
26. Zhou, J.; Koge, H.; Maki, T. Automation of MBES noise reduction: An approach based on seafloor bathymetry features derived from manual editing procedures. Ocean Eng. 2024, 299, 117397.
27. Meng, J.; Yan, J.; Zhang, Q. Anti-Interference Bottom Detection Method of Multibeam Echosounders Based on Deep Learning Models. Remote Sens. 2024, 16, 530.
28. Wei, Y.; Jin, S.; Zhao, W.; Gao, Y.; Zhan, X. Application of filtering multibeam sounding with CUBE in deep sea. Hydrogr. Surv. Charting 2024, 44, 12–15+20.
29. Calder, B.R.; Mayer, L.A. Automatic processing of high-rate, high-density multibeam echosounder data. Geochem. Geophys. Geosystems 2003, 4, 1048.
30. Wang, J.; Jin, S.; Bian, G.; Cui, Y.; Long, Z. Parameter configuration method of CUBE algorithm assimilation model. Hydrogr. Surv. Charting 2022, 42, 14–18.
31. Yuli, L. Multi-source Weighted Sequence Data Mining Method Based on Isolated Forests. Tech. Autom. Appl. 2025, 44, 147–150.
32. Liu, F.T.; Ting, K.M.; Zhou, Z.-H. Isolation Forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Washington, DC, USA, 15–19 December 2008; pp. 413–422.
33. Lifandali, O.; Abghour, N.; Chiba, Z. Feature Selection Using a Combination of Ant Colony Optimization and Random Forest Algorithms Applied To Isolation Forest Based Intrusion Detection System. Procedia Comput. Sci. 2023, 220, 796–805.
34. Heigl, M.; Anand, K.A.; Urmann, A.; Fiala, D.; Schramm, M.; Hable, R. On the Improvement of the Isolation Forest Algorithm for Outlier Detection with Streaming Data. Electronics 2021, 10, 1534.
35. Saremi, M.; Bagheri, M.; Agha Seyyed Mirzabozorg, S.A.; Hassan, N.E.; Hoseinzade, Z.; Maghsoudi, A.; Rezania, S.; Ranjbar, H.; Zoheir, B.; Beiranvand Pour, A. Evaluation of Deep Isolation Forest (DIF) Algorithm for Mineral Prospectivity Mapping of Polymetallic Deposits. Minerals 2024, 14, 1015.
36. Monemizadeh, V.; Kiani, K. Detecting anomalies using rotated isolation forest. Data Min. Knowl. Discov. 2025, 39, 24.
37. Zhou, P.; Chen, J.; Wang, S. A Dual Robust Strategy for Removing Outliers in Multi-Beam Sounding to Improve Seabed Terrain Quality Estimation. Sensors 2024, 24, 1476.
38. Durap, A. Data-driven models for significant wave height forecasting: Comparative analysis of machine learning techniques. Results Eng. 2024, 24, 103573.
39. Wang, X.; Bai, Y. The global Minmax k-means algorithm. Springerplus 2016, 5, 1665.
40. Zhang, J.; Hong, K.; Yuan, Y.; Lin, Y.-T.; Han, D. The Role of Quantified Parameters on River Plume Structure: Numerical Simulation. J. Mar. Sci. Eng. 2024, 12, 321.
41. Li, Z.; Wang, J.; Zhang, Z.; Jin, F.; Yang, J.; Sun, W.; Cao, Y. A Method Based on Improved iForest for Trunk Extraction and Denoising of Individual Street Trees. Remote Sens. 2022, 15, 115.
42. Downing, E.; O’Reilly, L.; Majcher, J.; O’Mahony, E.; Peters, J. A Semi-Automated, Hybrid GIS-AI Approach to Seabed Boulder Detection Using High Resolution Multibeam Echosounder. Remote Sens. 2025, 17, 2711.
43. Lv, X.; Wang, L.; Huang, D.; Wang, S. A Novel Cone Model Filtering Method for Outlier Rejection of Multibeam Bathymetric Point Cloud: Principles and Applications. Sensors 2023, 23, 7483.

Figure 1. Flowchart of the proposed method.

Figure 2. Original data. (There are pronounced outliers).

Figure 3. This histogram shows the frequency distribution of sliding local anomaly thresholds (anomaly proportion: 2.8%). The x-axis is the local anomaly threshold (0.425–0.575), and the y-axis is frequency. Key benchmarks (mean: 0.519) are marked.

Figure 4. This scatter plot illustrates local anomaly thresholds vs. depth (−6000 to −3000 m). Orange points = point-wise thresholds, green curve = layered mean threshold.

Figure 5. CUBE-IForest method detection results.

Figure 6. CUBE filtering detection results.

Figure 7. DBSCAN detection results.

Figure 8. Neural network detection results.

Figure 9. Interaction: Trees vs. Contamination.

Figure 10. Contour Map. (The red frame visually illustrates the comparison of elimination effects).

Figure 11. Processing results of seamount multibeam bathymetric data. (Extreme outliers may significantly affect the range of the color scale; the submarine topography in the red circle shows obvious defects.)

Figure 12. Processed by the CUBE-IForest method. (Outliers were removed).

Figure 13. Statistics of evaluation metrics.

Table 1. Method Comparison.

Method	Advantages	Limitations
DBSCAN [16]	Effectively identifies spatially clustered anomalies	Complex parameter tuning, large computational cost
CUBE [17]	Based on uncertainty modeling, high reliability	May retain noise when handling continuous anomalies
IForest [18]	Efficient, adaptable to high-dimensional data	Potential misclassification in noisy environments
Neural Networks [19]	Highly automated, strong at capturing complex patterns	Dependent on data quality, high computational cost

Table 2. Comparison of data.

Statistic	Original Data	CUBE-IForest Processed
Number of points	158,720	147,927
Kurtosis	x (−1.1775), y (−1.1141), z (−0.4247)	x (−1.1771), y (−1.1105), z (−0.4462)
Skewness	x (−0.0731), y (−0.0602), z (0.5631)	x (−0.0811), y (−0.0632), z (0.5598)
Maximum depth (m)	11,306.85	5882.96
Mean depth (m)	4983.33	4879.59
Median depth (m)	4822.62	4713.92
Minimum depth (m)	159.49	3073.19
IQR (m)	x (2800.0000), y (9000.0000), z (965.6677)	x (2600.0000), y (8800.0000), z (959.6309)
Standard deviation (m)	x (1601.7259), y (5301.6828), z (598.0200)	x (1540.5244), y (5208.1366), z (585.3908)

Table 3. Comparison of Method Evaluation Metrics.

Method	TP	FP	FN	Precision	Recall	F-Score	Time
CUBE-IForest	578	36	132	94.14%	81.41%	87.31%	4.76 s
CUBE	375	103	156	78.45%	70.62%	74.33%	2.28 s
DBSCAN	463	112	346	80.52%	57.23%	66.91%	4.53 s
Neural network	618	79	173	88.67%	78.13%	83.07%	13.35 s

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

