Article

A Semi-Automated, Hybrid GIS-AI Approach to Seabed Boulder Detection Using High Resolution Multibeam Echosounder

by Eoin Downing *, Luke O'Reilly, Jan Majcher, Evan O'Mahony and Jared Peters
Green Rebel Marine Limited, Penrose One, Penrose Dock, T23KW81 Cork, Ireland
*
Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(15), 2711; https://doi.org/10.3390/rs17152711
Submission received: 30 May 2025 / Revised: 30 July 2025 / Accepted: 31 July 2025 / Published: 5 August 2025
(This article belongs to the Section Ocean Remote Sensing)

Abstract

The detection of seabed boulders is a critical step in mitigating geological hazards during the planning and construction of offshore wind energy infrastructure, as well as in supporting benthic ecological and palaeoglaciological studies. Traditionally, side-scan sonar (SSS) has been favoured for such detection, but the growing availability of high-resolution multibeam echosounder (MBES) data offers a cost-effective alternative. This study presents a semi-automated, hybrid GIS-AI approach that combines bathymetric position index filtering and a Random Forest classifier to detect boulders and delineate boulder fields from MBES data. The method was tested on a 0.24 km2 site in Long Island Sound using 0.5 m resolution data, achieving 83% recall, 73% precision, and an F1-score of 77—slightly outperforming the average of expert manual picks while offering a substantial improvement in time-efficiency. The workflow was validated against a consensus-based master dataset and applied across a 79 km2 study area, identifying over 75,000 contacts and delineating 89 contact clusters. The method enables objective, reproducible, and scalable boulder detection using only MBES data. Its ability to reduce reliance on SSS surveys while maintaining high accuracy and offering workflow customization makes it valuable for geohazard assessment, benthic habitat mapping, and offshore infrastructure planning.

Graphical Abstract

1. Introduction

Offshore wind farm developments are becoming central to the ongoing transition from the use of fossil fuels to renewable energy sources. Although the marine environment provides vast energy-harnessing opportunities, engineering efforts are complicated by metocean and geological conditions [1]. Extensive reconnaissance and pre-construction surveys are required to provide detailed site characterizations and minimize risks. Detecting seabed objects that may be hazardous to planned developments is one of the goals of these surveys. Boulders are among the most notorious and commonly occurring geohazards to offshore infrastructure projects (e.g., [2]). Their presence and distribution on the seabed play an important role in hazard assessments and benthic ecological investigations at sites considered for offshore developments [3]. They need to be identified to ensure successful subsea engineering and to avoid hindrances to pile driving, dredging, and cable and pipeline laying operations.
Boulders are also important for various geoscientific research strategies outside of their significance to industry pursuits. Palaeoglaciological research uses ex situ boulders that were transported by glaciers and ice sheets (i.e., “erratics”) as proxies for ice motion and behaviour (e.g., [4]). In unglaciated terrain, boulders provide information on regolith formation and weathering, tectonism, volcanism, and erosion [5].
Dense clusters of boulders, known as boulder fields, often create extensive spatial constraints and necessitate designating no-go zones for subsea engineering. However, despite the importance of boulders to offshore development and research, and the inherent importance of dense groups of boulders, the definition of boulder fields remains inconsistent, and their delineation is often unquantified and created to suit individual needs. For example, Ref. [6] differentiates types of boulder fields based on sorting and texture to characterise depositional mechanisms that include tsunamis. Conversely, Ref. [7] delineates boulder fields based on target density in a GIS to identify geogenic reef. These inconsistencies often reflect the goals of individual studies and are appropriate for the intended purpose. However, many fields of research and industries would likely benefit from a more standardised, quantifiable approach that fosters adaptation and comparisons.
Additionally, as boulder fields and individual boulders provide unique and productive hard-bottom habitats characterized by diverse benthic communities, their detection and the investigation of their spatial characteristics inform ecological studies [7,8]. This is evidenced by the frequency with which boulders are mentioned in the Interpretation Manual of EU Habitats [9], which aids in characterizing coastal seabed habitat types.
Geological seabed data are a common parameter in geospatial site selection analyses for offshore wind energy (OWE) planning [10], and these data are being produced with increasing quality as techniques and equipment improve. Modern offshore surveying techniques allow efficient mapping of the seabed with centimetric accuracy (e.g., [11]). Hull-mounted multibeam echosounder (MBES) and deep-towed side-scan sonar (SSS) devices are standard tools for high-resolution marine remote sensing in many industries (e.g., [12,13,14]). The former is usually used directly for mapping the seabed's topography (i.e., for providing depth measurements) and indirectly for mapping the surficial seabed substrate (through backscatter intensity measurements) [15]. The latter, deep-towed close to the seabed, provides high-resolution backscatter intensity measurements which are mostly used for object detection [16].
Conventionally, SSS backscatter intensity data are used for boulder detection, either through manual picking or applying sophisticated machine learning algorithms [17]. Similar to MBES, SSS provides wide area coverage due to its transducers’ geometry, which in the case of SSS additionally benefits from the device being towed at a controlled altitude above the seabed. In theory this enables higher spatial resolution of seabed images, as the emitted sound travels less distance through the water column than in the case of a ship hull-mounted MBES. Practically, however, SSS data acquisition comes with additional costs and risks; it requires subsea positioning, which is less precise than high-grade surface differential GPS positioning of MBES data, as it requires acoustic relay of information underwater, commonly through Ultra-Short Baseline or USBL devices, before being referenced to GPS [18].
Nonetheless, while SSS data are often collected for industry applications or individual case study projects, high-resolution MBES data are becoming increasingly available as open-source datasets are produced to expand coverage for national and international seabed mapping initiatives (e.g., INFOMAR [19] and NCEI [20]). This enables research [21] and enhances desktop studies for industrial applications (e.g., [22,23]). Such data also improve the possibility of detecting fine-scale features like boulders directly from high-resolution MBES bathymetry; however, despite their ever-increasing availability, they remain relatively underexploited for this purpose. Maximising the information derived from publicly available marine remote sensing datasets becomes paramount to support the research and engineering initiatives that will be critical in supporting the rapid development of urban centres, increasing clean energy needs, and the renewable energy transition in response to the climate crisis.
The volumes of such datasets entail extensive processing and data interpretation efforts, requiring highly skilled and qualified personnel. Additionally, manual delineation and interpretation of seabed features recorded in MBES bathymetry remain highly subjective. Together, these issues mean that deriving reliable seabed information requires substantial technical training and theoretical knowledge, and that interpretation bias can, at best, only be reduced to acceptable levels rather than eliminated. This problem has previously been tackled by research concentrating on the automatic delineation of seabed features and substrates in MBES swath bathymetry. For example, Object-Based Image Analysis (OBIA) has been successfully applied to classify seabed surface substrates by combining MBES-derived bathymetry and backscatter data [24,25], and to extract and classify features like sediment waves [21]. Convolutional Neural Networks (CNNs) have been used to automatically map broad-scale morphology on a large, basin-scale, open-source MBES bathymetry dataset [26]. However, detection of fine-scale objects like boulders in high-resolution MBES data remains relatively under-researched.
One possible approach to detecting such features is to first define their length-scale (i.e., approximate size) and differentiate them from other features of broader length-scales. In the context of seabed mapping, this GIS approach is often referred to as the bathymetric position index [27,28] or high-pass filter [29]. The method provides a baseline, length-scale-dependent seabed morphology classification, which can be refined further to detect fine-scale features of interest [14,30].
Efforts employing machine learning algorithms of various complexity with marine remote sensing datasets, including MBES data, provide promising results for small-scale target detection [31,32]. Nonetheless, they offer mixed recall and accuracy levels, and more case studies are needed to diversify target classification datasets. Further development of this research could facilitate the identification of small targets and the production of reproducible workflows.
Here we objectively detect contacts corresponding to potential boulders and boulder fields in a high-resolution, open-source MBES dataset. We achieve this by employing an algorithm that combines open-source, GIS-based statistical functions with a Random Forest machine learning method. This iterative, semi-automated process is designed to improve the performance of the ML algorithm by enhancing data quality and applicability prior to its use. The demonstrated efficiency of the algorithm in semi-automatically detecting large numbers of boulders in high-resolution MBES data gives an additional layer of information to seabed investigation efforts, whether academic- or industry-led. In particular, we recognize this work as relevant for reducing overall risks for offshore wind farm developments, by providing a time- and cost-efficient assessment of the boulder geohazard. Moreover, utilising MBES instead of SSS data helps ensure more reliable positioning, while applying a semi-automatic workflow reduces subjectivity and fatigue during the process compared with conventional manual contact picking. The reduced subjectivity stems from defining potential boulders and boulder fields using quantifiable characteristics of seabed targets in the depth-encoded raster data, rather than applying traditional sedimentological definitions. The algorithmic, semi-automated approach also improves target picking on MBES data to a point where SSS data may be less necessary for some applications, which would reduce environmental impacts through improved operational efficiency, and reduce occupational hazards offshore.

2. Background Information

Here we outline the conventional definitions of boulders and boulder fields that partially guided the initial stages of this paper and informed the development of methodology. These definitions, however, were subsequently revised to better align with the objectives of this paper, particularly to facilitate the automation of boulder and boulder field delineation in multibeam echosounder data.

2.1. Boulder

Sedimentologically, boulders are typically defined and categorized by size as rocks with a diameter greater than 256 mm [33]. This distinguishes them from other clastic sediments of smaller diameter, such as cobbles (greater than 64 mm) and pebbles (greater than 4 mm) [34]. The ability to detect boulders in any remote sensing data, including MBES bathymetry, is therefore guided by (and restricted to) its spatial resolution, or pixel size.
Taking the traditional boulder definition into consideration, MBES bathymetry should have a spatial resolution of 0.25 m or finer, as a single pixel may not be enough to objectively resolve small boulders; this standard is roughly consistent with other assessments of remote sensing resolution requirements for boulder detection [35]. However, as demonstrated further in this paper, coarser, but still reasonably high (sub-metre), spatial resolutions (like those commonly available open-source, e.g., [19]) can still be used to detect medium-sized and large boulders, which present considerable obstacles to subsea engineering.

2.2. Boulder Field

Boulder fields can be generally defined as landscapes (both terrestrial and underwater) with extensive spreads of boulders [36]. There are no clear, widely accepted definitions of boulder fields for either terrestrial or offshore engineering purposes. A common, intuitive approach is to define boulder fields based on the number of boulders per unit area. For instance, Ref. [37] classify subsurface, buried boulder fields based on seismic anomaly (i.e., potential buried boulder) density. Specifically, they separate boulder fields into low, medium, and high density with <50, 50–100, and >100 boulders, respectively, per 100 × 100 m square blocks.
Although boulder field definitions expressed in count per area provide a quantifiable framework, in seabed surface applications the result largely depends on how the square-block grid used for counting is positioned. It could produce different results for the same site, depending on the alignment of the grid blocks with the survey area and their overlap. If the grid used to calculate densities is shifted or rotated, it might include or exclude boulders on the edges of each block, leading to inconsistencies in boulder density calculations. This variability can impact the classification of a boulder field as low, medium, or high density, thereby affecting engineering decisions, resource allocation, and risk assessments for projects in these areas. This approach is also difficult to apply to sites where survey data coverage is irregular and cannot be captured using a regular rectangular cell grid for calculating densities. To address these limitations, in this paper a clustering algorithm is tested to delineate contact clusters representing potential boulder fields.

3. Materials and Methods

3.1. Methodological Considerations

The development of the workflow presented the authors with the opportunity to revise the boulder and boulder field definitions, as mentioned above, to make them quantifiable in the state-of-the-art multibeam echosounder data. Such revisions of traditional definitions are occasionally necessary in geosciences, especially when (re)considering process-based, or use-specific applications, e.g., [38].
Apart from the revised definitions, this section also makes some remarks regarding the limitations of MBES, or any acoustic geophysical data, in differentiating between boulders and other seabed features of similar shapes and sizes.

3.1.1. Boulder

For this investigation, we attempt to define possible boulders as targets with a minimum exposed diameter equal to or greater than the pixel size of the digital elevation model (DEM) used for detection, identified using a set of GIS functions and a Random Forest machine learning algorithm. Various shape metrics are applied to reduce false positives by narrowing down the detection criteria.

3.1.2. Boulder Field

Boulder fields can be defined using GIS cluster analysis tools. For this investigation, we define contact clusters as groups of at least 10 contacts in which each contact is no more than 100 m away from its nearest neighbouring contact in the same group. Considering the ubiquitous use of GIS applications in offshore engineering, especially in offshore wind farm project desktop studies [23], this definition is relatively straightforward to apply and reduces overall ambiguity when delineating boulder fields.

3.1.3. Morphologies and Nomenclature

There are various anthropogenic and natural seabed features that are embedded in the elevation data collected by MBES. They have diverse shapes, sizes, and other inherent characteristics. For example, potentially large and elongated targets include shipwrecks, moraines, drumlins, and sediment waves. Smaller, near-circular objects may include various debris, discarded fishing gear parts, unexploded ordnance objects, and boulders, which this paper aims to detect.
MBES-derived elevation data allows for a clear visual differentiation between the larger and smaller targets; however, accurately classifying smaller target types remains challenging. For instance, differentiating between a boulder and an anthropogenic feature that resembles a boulder is virtually impossible without utilising other types of information. Depending on research objectives and desired accuracy, additional verification—such as magnetometric data or visual inspection via a remotely operated vehicle—might be needed.
Nonetheless, to prevent overstatement of the capabilities of the methodology presented and discussed here, detected features are referred to as ‘contacts’ rather than ‘boulders’, as they might potentially include other morphologies, which cannot be differentiated using remote sensing data alone. Similarly, potential boulder fields are referred to as ‘contact clusters’.

3.2. Study Area

The study area was chosen based on multiple parameters: open access to high-resolution processed multibeam bathymetric data, an area of potential offshore development in the future, and proximity to a large population centre, i.e., New York City.

3.2.1. Background Information and Regional Geology

The open-source bathymetric data were retrieved using the National Centers for Environmental Information (NCEI) bathymetry data viewer [20]. The data selected for this study were acquired during the National Oceanic and Atmospheric Administration’s National Ocean Service ‘H13385’ survey [39] onboard the RV Able 2, RV Osprey, and RV Ready 2. Teledyne RESON SeaBat T50-R MBES was used with Applanix POS MV 320 v5 and Trimble NetR GNSS, which provided positioning and attitude data. Mean Lower Low Water (MLLW) was used to vertically reference the data, representing the average level of the lowest tide for each day computed over a 19-year period. North American Datum of 1983 (NAD83) Universal Transverse Mercator (UTM) zone 18 N with the EPSG code 26918 was used as the horizontal coordinate reference system. The raw data were processed in CARIS HIPS/SIPS Version 10.4.3 software and the Combined Uncertainty and Bathymetry Estimator (CUBE) algorithm [29] was used for data cleaning and generating the final DEM at 0.5 m spatial resolution.
Figure 1. Study area with inset map showing the 'test site' bathymetry with the colour ramp properties used in the manual picking phase.
The study site north of Sands Point, Manorhaven, and the greater area of Long Island Sound are part of a c. 200-million-year-old erosional formation that was once part of the Appalachian Mountains. During more recent geological history (the last c. 3 million years), the North American glaciations began. Glaciation and stream action in the area were the driving forces that eroded the top of the coastal-plain sediment [40]. Long Island Sound occupies a lowland that was initially carved into the coastal plain by rivers and subsequently glacially modified. The combined erosive effects of the ice advances included re-exposing, wearing down, and smoothing the Appalachian rocks that now form the Connecticut coast; cutting back and sculpting the remaining coastal-plain wedge, which forms the foundation of the study area; and redistributing eroded material in the form of glacial deposits [40]. These glacial deposits are the source of the boulders that form the majority of seabed contacts picked at this study site. Further ice-rafted debris, visible as seabed contacts (i.e., boulders) at the study site, was introduced to Long Island Sound through the deglaciation of the Laurentide Ice Sheet during one of the first Heinrich events [41].

3.2.2. Test Site

This 0.24 km2 area within the Mamaroneck harbour was chosen as our testing site for the master, manual, and automated picks. The area contains a representative mix of geomorphologies, with areas of singular seabed contacts and areas of high concentrations of seabed contacts, i.e., boulder fields.

3.2.3. Training Site

This 0.62 km2 area also lies within the overall study area north of Sands Point, Manorhaven, and was chosen as the training site for the algorithm based on several parameters. The training site displays geomorphologies, seabed contacts, and concentrations of seabed contacts comparable to those of the test site. It is geographically contiguous with the overall study area, while sample bias is reduced because the training and test sites are separated by c. 5 km. The training site is larger than the test site to increase the number of samples available to train the algorithm.

3.3. Master Pick

A working group was set up to collect a verified contact sample, referred to hereafter as the "master pick", to establish a baseline against which to evaluate the performance of the automated and manual picks. The master pick also served as the validation dataset for the HydroBoulder Model. The group consisted of five marine geophysical professionals with varying experience (combined experience of >40 years). The master pick took place over two 2 h sessions and was performed in the test site displayed in Figure 1. The test site was divided into numbered grids, and a standardized bathymetric raster symbology was applied, composed of a single-band pseudocolour layer at 75% transparency overlaid on a multidirectional hillshade raster. The exercise prioritised real-time collaboration and scrutiny over each grid. A "master pick" point shapefile was created to record contacts. A majority-rule approach was applied to pick a contact: once three members of the group agreed that a contact was present, a point was added to record its presence.

3.4. Manual Pick

Six marine geophysical professionals of varying levels of experience were chosen to carry out a manual pick of contacts across the test site. The objective was to pick contacts as accurately as possible. Each picker was provided with a *.tif of the test site, an empty point shapefile, and a grid polygon to make the picking more structured. The pickers were given a fixed amount of time (one hour) to complete the task. No standardised symbology was implemented so the pickers could style the *.tif raster data as they saw fit.

3.5. Semi-Automated Pick

The semi-automated contact detection algorithm was developed and tested using QGIS 3.28.15 Model Builder and Python (v. 3.12). The workflow takes two DEM inputs, corresponding to the test and training sites (Figure 1), and consists of four main steps (Figure 2). A summarised graphical representation is shown in Figure 3. The first step, referred to as the "GIS Filter", produces two outputs for each set: Semi-Filtered Polygons (SFPs) (Figure 3f) and a slope layer (Figure 3c) derived from the High-Pass Filtered (HPF) DEM (Figure 3b). These GIS layers are then used as inputs for the second step, the "Feature Extractor", which calculates statistics for both sets. The third step, the "RF Classifier", trains and implements a Random Forest (RF) machine learning algorithm (Figure 3h). In the final step, the "Exporter" takes the positive SFPs (instances where the RF Classifier has identified a contact within an SFP) of the test site and exports them as a point shapefile. All four steps are described in detail below. After the development phase, the semi-automated algorithm was applied separately to the DEM covering the entire study area (Section 3.7).

3.5.1. GIS Filter

The GIS Filter described here was applied to two separate DEMs corresponding to the test site (Figure 3a) and training site, resulting in two equivalent SFP datasets. The initial phase of the GIS Filter algorithm involves applying a high-pass filter to the DEM. This technique produces a detrended HPF DEM (Figure 3b), where pixel values represent heights relative to a specific pixel neighbourhood around them [28,29,42]. The procedure begins with the calculation of focal statistics for the DEM using a moving mean with an empirically determined circular 3 × 3 pixel window (i.e., neighbourhood), which accounts for the data's spatial resolution (0.5 m) and the typical dimensions of the target features [31].
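A minimal sketch of this detrending step is given below, assuming the DEM is available as a NumPy array (NaN for no-data). The cross-shaped kernel approximates the circular 3 × 3 focal window; the exact window definition and no-data handling of the GIS focal statistics tool may differ.

```python
import numpy as np
from scipy import ndimage

def high_pass_filter(dem: np.ndarray) -> np.ndarray:
    """Return an HPF DEM: height of each pixel relative to its local focal mean."""
    # Approximation of a circular 3 x 3 neighbourhood (centre plus 4 neighbours).
    kernel = np.array([[0., 1., 0.],
                       [1., 1., 1.],
                       [0., 1., 0.]])

    # NaN-aware moving mean: sum of valid values divided by the number of valid cells.
    valid = ~np.isnan(dem)
    local_sum = ndimage.convolve(np.where(valid, dem, 0.0), kernel, mode="nearest")
    local_count = ndimage.convolve(valid.astype(float), kernel, mode="nearest")
    focal_mean = np.divide(local_sum, local_count,
                           out=np.full(dem.shape, np.nan), where=local_count > 0)

    # Subtracting the focal mean detrends the surface, leaving only local relief.
    return np.where(valid, dem - focal_mean, np.nan)
```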
Next, the GDAL slope tool is applied to the HPF DEM (Figure 3c), generating a slope layer that further accentuates features associated with potential contacts. Subsequently, slope contours with a 0.5 m interval are created, as dictated by the DEM's spatial resolution (Figure 3d). The contour layer is then filtered to retain only contours corresponding to slope values > 1°, removing those associated with a locally flat seabed that are unlikely to correspond to contacts (Figure 3e). The remaining contours are then converted to polygons, and holes within these newly created polygons are removed (Figure 3f).
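As an illustration, the slope of the HPF DEM can be approximated in Python as shown below; the GDAL slope tool uses Horn's method, so values near edges and over no-data cells may differ slightly from this simple central-difference sketch.

```python
import numpy as np

def slope_degrees(hpf_dem: np.ndarray, cell_size: float = 0.5) -> np.ndarray:
    """Slope of the detrended surface, in degrees, for a given cell size (m)."""
    dz_dy, dz_dx = np.gradient(hpf_dem, cell_size)        # rise per metre along y and x
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))  # steepest-slope angle

# Usage: only locally steep pixels feed the contouring step, e.g.
# steep = slope_degrees(hpf) > 1.0
```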
Each polygon's area and compactness are then calculated. To calculate the former, we used the $area function, which calculates the ellipsoidal area of each target. To assess the latter, we calculated the IsoPerimetric Quotient (IPQ), also referred to as the Polsby-Popper method [43] or Cox's circularity metric [44]. This measure is expressed in Table A1. The IPQ quantifies how closely a shape approximates a perfect circle, with values closer to 1 indicating greater compactness and more circular geometries. Polygons are further filtered to retain only those with an area > 0.5 m2 and a compactness > 0.75 (Figure 3g). The area criterion can be adjusted based on the resolution of the data and the objectives of the end user, while the compactness limit is chosen to eliminate more elongated seabed features that were determined less likely to be discrete seabed targets based on the morphologies seen in the study area. The remaining polygons are dissolved and separated into single parts, producing both sets (i.e., corresponding to the test site and training site) of SFPs. The training site polygon dataset is then manually filtered to create an ideal dataset for RF training. In total, 2400 SFPs were selected as the final training site dataset, half of which were true positives (i.e., real contacts) and the other half of which were true negatives (i.e., no contact present).
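The shape filter can be expressed compactly with Shapely, as sketched below; planar area is used here as a stand-in for the ellipsoidal $area function, and the thresholds mirror those stated above.

```python
import math
from shapely.geometry import Polygon

def ipq(poly: Polygon) -> float:
    """IsoPerimetric Quotient (Polsby-Popper): 4*pi*A / P^2; 1.0 is a perfect circle."""
    return 4.0 * math.pi * poly.area / (poly.length ** 2)

def keep_candidate(poly: Polygon, min_area: float = 0.5, min_ipq: float = 0.75) -> bool:
    """True if the polygon is large and compact enough to remain an SFP."""
    return poly.area > min_area and ipq(poly) > min_ipq

# Example: a 1 m x 1 m square has IPQ = pi/4 ~ 0.785 and is retained, whereas a
# 2 m x 0.5 m sliver has IPQ ~ 0.50 and is rejected as an elongated feature.
```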

3.5.2. Feature Extractor

The Feature Extractor processes the SFPs produced by the GIS Filter and the slope of the HPF DEM associated with these polygons. For each polygon, zonal statistics of the HPF slope are calculated, comprising the mean, minimum, maximum, standard deviation, and 25th and 75th percentiles, as shown in Appendix A (Figure A1). Additional metrics are derived from these base zonal statistics, such as Variance, Skewness, Kurtosis, Interquartile Range, Peak-to-Mean Ratio, and Peak-to-Trough Difference (Table A1). These metrics are assigned to the training and test sets of SFPs and subsequently used to train and test the RF classifier.
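A small sketch of these derived metrics is shown below, assuming the HPF-slope values inside one polygon are available as a one-dimensional array (e.g., from a zonal-statistics step). The exact definitions in Table A1 take precedence; the peak-to-mean and peak-to-trough formulations here are assumptions matching the metric names.

```python
import numpy as np
from scipy import stats

def derived_metrics(slope_values: np.ndarray) -> dict:
    """Per-polygon statistics of the HPF-slope values used as RF features."""
    v = slope_values[~np.isnan(slope_values)]
    q25, q75 = np.percentile(v, [25, 75])
    return {
        "mean": float(np.mean(v)),
        "min": float(np.min(v)),
        "max": float(np.max(v)),
        "std": float(np.std(v)),
        "p25": float(q25),
        "p75": float(q75),
        "variance": float(np.var(v)),
        "skewness": float(stats.skew(v)),
        "kurtosis": float(stats.kurtosis(v)),
        "iqr": float(q75 - q25),
        "peak_to_mean": float(np.max(v) / np.mean(v)) if np.mean(v) != 0 else float("nan"),
        "peak_to_trough": float(np.max(v) - np.min(v)),
    }
```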

3.5.3. RF Classifier

A Random Forest (RF) classifier [45] was employed to classify polygons (SFPs) from the test site into two groups: contact (1) and non-contact (0). It was implemented using the RandomForestClassifier class from the Scikit-learn library in Python v. 3.12 [46]. This method was chosen for its demonstrated effectiveness in geospatial contexts, particularly in classifying remote sensing data involving small-scale features such as boulders (e.g., [31,47]). Random Forest is well-suited to tasks involving heterogeneous environmental features due to its robustness to noise and capacity to model complex, nonlinear relationships [48,49]. It also performs reliably on small or moderately imbalanced datasets and is less prone to overfitting than many alternatives [50]. Compared with support vector machines or neural networks, RF requires minimal parameter tuning for effective results [51], trains efficiently on tabular data [52], and offers interpretable feature importance metrics [45]. These qualities made RF a suitable and practical choice for the object-based seabed classification task presented here.
The Random Forest algorithm constructs an ensemble of decision trees, each trained on a bootstrap sample of the dataset, and aggregates their predictions through majority voting to achieve robust and accurate classification results [45].
The training dataset consisted of a feature matrix, Xtrain, and a binary target vector, ytrain. Each row of Xtrain corresponded to a polygon, while the columns represented the extracted features described in the Feature Extraction section. The target vector, ytrain, indicated the classification of each polygon, with 1 for contact and 0 for non-contact. Polygons with invalid geometries or insufficient raster coverage were excluded to ensure the integrity of the dataset.
The Random Forest model was configured with 100 decision trees to balance computational efficiency with classification accuracy. Each tree was trained on a bootstrap sample of the data, introducing diversity in the ensemble [45]. Node splitting within the trees was optimized using the Gini index, which measures the impurity of a node as the probability that a randomly selected sample would be misclassified if it were randomly labelled according to the class distribution in that node [53]. To further enhance model robustness and reduce overfitting, the algorithm randomly selected a subset of features at each split. This ensemble approach enabled the model to capture complex, nonlinear relationships within the dataset while maintaining high generalization performance. A fixed random seed (42) was applied throughout the process to ensure reproducibility [46].
After the training process was finished, the Random Forest model was applied to the test site SFPs. For each polygon in the test set, the model predicted whether it should be classified as contact (1) or non-contact (0) based on the extracted features. These predictions were stored in a new column, "Contact", where a value of 1 indicated the presence of a contact and 0 indicated its absence. This predictive process ensured that the model's decisions were consistent with the patterns learned during training, while the use of a test set validated its performance on previously unseen data.
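The classifier set-up described above can be reconstructed with scikit-learn roughly as follows; the DataFrame layout and column names are illustrative assumptions, while the stated configuration (100 trees, Gini splitting, random seed 42) is taken from the text.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

def classify_sfps(train_df: pd.DataFrame, test_df: pd.DataFrame,
                  feature_cols: list[str], label_col: str = "label") -> pd.DataFrame:
    """Train on the curated training SFPs and label the test-site SFPs."""
    rf = RandomForestClassifier(
        n_estimators=100,   # 100 decision trees in the ensemble
        criterion="gini",   # node splits optimised with the Gini index
        random_state=42,    # fixed seed for reproducibility
    )
    rf.fit(train_df[feature_cols], train_df[label_col])  # 1 = contact, 0 = non-contact

    labelled = test_df.copy()
    labelled["Contact"] = rf.predict(labelled[feature_cols])
    return labelled
```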

3.5.4. Exporter

The Exporter filters the test site SFPs where the model predicts contact (1). A new column, "QC", is created, which utilises four threshold parameters: mean (<1.5°), peak-to-trough (<0.8°), skewness (>2.8), and area (<0.9 m2). If at least one threshold parameter is met, a "check" status is populated in the "QC" column to flag the polygon for review. If no threshold parameters are met, a "no check" status is populated in the "QC" column. Once the QC status is allocated to the remaining polygons, the centroid tool is used to create a point at the centre of each polygon (Figure 3h,i). These points are exported as a shapefile, and the attributes associated with each polygon are passed on to the point layer.
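A hedged sketch of this step using GeoPandas is shown below; the column names are assumed to match the attributes written by the Feature Extractor, and the output path is a placeholder.

```python
import geopandas as gpd

def export_contacts(sfps: gpd.GeoDataFrame, out_path: str = "contacts.shp") -> gpd.GeoDataFrame:
    """Keep positive SFPs, flag borderline cases for QC, and export centroids."""
    positives = sfps[sfps["Contact"] == 1].copy()

    # Flag a polygon for manual review if any of the four thresholds is met.
    needs_check = (
        (positives["mean"] < 1.5)
        | (positives["peak_to_trough"] < 0.8)
        | (positives["skewness"] > 2.8)
        | (positives["area"] < 0.9)
    )
    positives["QC"] = needs_check.map({True: "check", False: "no check"})

    # Replace polygon geometries with centroids; attributes carry over unchanged.
    positives["geometry"] = positives.geometry.centroid
    positives.to_file(out_path)
    return positives
```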

3.5.5. Contact Clusters

The common approach of defining contact clusters (or potential boulder fields) by count per area unit (see Section 2.2) is limited by the counting grid design, which can skew results based on its alignment with the survey area. To address this limitation, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm [54] in QGIS 3.28.15 was used directly on the product of the semi-automated contact picking workflow.
The function requires two parameters, a minimum cluster size and the maximum distance allowed between clustered points. In this case, point clusters or potential boulder fields are defined as places where there are at least 10 contacts close together, and each contact is no more than 100 m away from another contact in the same group. The GIS tool assigns cluster IDs to every point falling into a cluster defined this way. Subsequently, a minimum bounding geometry (concave hull) is created for every cluster ID, effectively delineating the clustered points as polygons (i.e., potential boulder fields).
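An equivalent of this clustering step outside QGIS can be sketched with scikit-learn, assuming projected contact coordinates in metres (NAD83 UTM zone 18N); note that QGIS's "minimum cluster size" maps only approximately onto scikit-learn's min_samples parameter, and the concave-hull delineation is left to the GIS as described above.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_contacts(xy: np.ndarray) -> np.ndarray:
    """Assign a cluster ID to each contact point; -1 marks points outside any cluster."""
    return DBSCAN(
        eps=100.0,        # maximum distance (m) between neighbouring contacts
        min_samples=10,   # minimum number of contacts to seed a cluster
    ).fit_predict(xy)

# Usage (illustrative): labels = cluster_contacts(contacts[["x", "y"]].to_numpy())
```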

3.6. Model Validation

Model validation is a fundamental step in assessing the accuracy of a machine learning algorithm [55]. To evaluate the performance of the model, we employed standard validation metrics, including Precision, Recall, and F1-score. These metrics are widely used in classification and object detection tasks to quantify the accuracy of predictions relative to ground truth labels (in our case the test site master pick). The selection of these metrics follows established evaluation methodologies in machine learning and computer vision [10,56,57]. Precision measures the proportion of correctly identified positive instances relative to all instances predicted as positive by the model. A high precision score indicates a low rate of false positives, which is particularly important in applications where false detections can be costly [56]. Recall (also known as Sensitivity) measures the proportion of actual positive instances that the model correctly identifies. High recall is desirable in applications where missing a positive instance has severe consequences (e.g., medical diagnosis or hazard detection) [56]. The F1-score provides a single performance measure by balancing Precision and Recall. It is the harmonic mean of the two metrics. The F1-score is particularly useful when the dataset exhibits class imbalances, as it ensures neither Precision nor Recall dominates the evaluation [56]. Formulas of each metric are displayed in Table A1.
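For reference, the standard formulations of these metrics (consistent with the definitions above and with Table A1) can be written as follows, where TP, FP, and FN denote true positives, false positives, and false negatives:

```latex
\begin{align}
  \mathrm{Precision} &= \frac{TP}{TP + FP},\\
  \mathrm{Recall}    &= \frac{TP}{TP + FN},\\
  F_{1}              &= 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}
                                     {\mathrm{Precision} + \mathrm{Recall}}.
\end{align}
```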
The performance of the model was tested using the master pick dataset. To compare results, a 1 m (radius) buffer was applied to each master pick contact. This value was chosen considering the available open-source dataset's spatial resolution (0.5 m) and the allowable total horizontal uncertainties for 'Special Order' and 'Exclusive Order' surveys (2 and 1 m, respectively) as determined by the International Hydrographic Organization [58]. Subsequently, the 'Count points in polygon' tool was used to separate the detection algorithm contacts into two categories. Any contact that fell within a buffer zone was deemed a True Positive, whereas contacts that did not fall within a buffer zone were considered False Positives. This method was also used when assessing the accuracy of each individual manual pick.
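A reconstruction of this buffer-based matching in GeoPandas might look like the sketch below (assuming a recent GeoPandas version with the predicate keyword and projected coordinates in metres); false negatives are taken here as master picks with no detection inside their buffer, which is an assumption about how recall was derived.

```python
import geopandas as gpd

def validate(detections: gpd.GeoDataFrame, master: gpd.GeoDataFrame,
             radius: float = 1.0) -> dict:
    """Match detections to buffered master picks and return validation metrics."""
    buffers = master.copy()
    buffers["geometry"] = master.geometry.buffer(radius)

    # Detections inside at least one master-pick buffer are True Positives.
    joined = gpd.sjoin(detections, buffers, how="left", predicate="within")
    first_match = joined.groupby(level=0)["index_right"].first()  # one row per detection
    tp = int(first_match.notna().sum())
    fp = int(first_match.isna().sum())

    # Master picks never matched by any detection count as False Negatives.
    fn = len(master) - joined["index_right"].dropna().nunique()

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"TP": tp, "FP": fp, "FN": fn,
            "precision": precision, "recall": recall, "F1": f1}
```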

3.7. Semi-Automated Pick—Whole Study Area

Following the successful validation and development of the model, the semi-automated workflow was applied to the full study area seen in Figure 1. For computational efficiency, the overall study area was split into 4.3 km2 tiles using the GDAL "Retile" tool before applying the GIS Filter. The output SFP polygons were then merged to represent the entire site. The remaining Semi-Automated Pick steps were run using the model setup developed with the training and test site data. Moreover, to assess the delineated clusters of the whole site and provide a comparative visualization of contact densities, a potential boulder heatmap was created (Figure 4c) using the Kernel Density Estimation tool available in QGIS 3.28.15.
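As a rough analogue of the QGIS Kernel Density Estimation tool, a Gaussian KDE of the contact coordinates can be evaluated on a regular grid with SciPy, as sketched below; bandwidth selection and edge handling differ from the QGIS heatmap implementation, so this serves only for comparative visualization.

```python
import numpy as np
from scipy.stats import gaussian_kde

def contact_density(x: np.ndarray, y: np.ndarray, cell: float = 50.0):
    """Return grid coordinates and a relative density surface for contact points."""
    kde = gaussian_kde(np.vstack([x, y]))          # bandwidth chosen by Scott's rule
    gx = np.arange(x.min(), x.max(), cell)
    gy = np.arange(y.min(), y.max(), cell)
    grid_x, grid_y = np.meshgrid(gx, gy)
    density = kde(np.vstack([grid_x.ravel(), grid_y.ravel()])).reshape(grid_x.shape)
    return grid_x, grid_y, density
```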

4. Results

4.1. Master Pick

In total, 613 contacts were agreed to be real and not artefacts or other seabed features observed in the data. There was no prevalent orientation among the picked contacts. Most consisted of a minimum of four data pixels (0.5 m spatial resolution).

4.2. Manual Pick

The average number of contacts manually picked at the test site was 549 (range: 439–792, standard deviation: 114). Of these, the average number of true positives was 445 (range: 309–574, standard deviation: 78), whereas the average number of false positives was 104 (range: 32–207, standard deviation: 80). In total, the manual picks had an average recall value of 73, an average precision value of 82, and an average F1 value of 76.

4.3. Semi-Automated Pick

In total, 749 contacts were identified by the semi-automated pick at the test site. Of these, true positives accounted for 506 of the contacts, whereas false positives accounted for the remaining 243 contacts. The semi-automated pick had a recall value of 83, a precision value of 68, and an F1 value of 74. There was no prevalent orientation among the picked contacts. Most consisted of a minimum of four data pixels (0.5 m spatial resolution). The DBSCAN tool categorized the entire test site as one contact cluster.

4.4. Results Comparison

On average, manual pickers identified 10.4% fewer contacts than the master pick, with true positives matching 72.6% of the master pick contacts. The manual pickers, on average, identified 26.7% fewer contacts than the detection algorithm, with the detection algorithm having slightly more true positives than the average manual pick (algorithm: 506, manual: 445). The detection algorithm had more false positives, however (algorithm: 243, manual: 104). The manual picks had lower average recall values (algorithm: 83, manual: 73), higher average precision values (algorithm: 68, manual: 82), and comparable average F1 values (algorithm: 74, manual: 76) compared with the detection algorithm pick. However, some pickers (Picker 4) had higher recall, precision, and F1 values than the detection algorithm, whereas others (Picker 1) had lower recall, precision, and F1 values. The detection algorithm identified 22% more contacts than the master pick, with true positives matching 82.5% of the master pick contacts.

4.5. Semi-Automated Pick—Whole Study Area

Throughout the entire study area (Figure 5), 75,092 contacts were identified by the detection algorithm. Figure 4b shows the distribution of detected contacts; there is clear clustering of contacts in multiple areas on the periphery of the study area. Figure 4c displays a Kernel Density Estimation (KDE) heatmap of the areas with the most intense clustering.
DBSCAN clustering resulted in delineating 89 individual contact clusters (potential boulder fields) across the site (Figure 4d). Contact cluster size ranged from 4.4 km2 to 0.002 km2. The total area of the delineated contact clusters was 17 km2 or 13% of the entire study area.

5. Discussion

This study represents a comprehensive effort to develop a semi-automated workflow for delineating large quantities of potential boulders and boulder fields using high-resolution MBES-derived DEM data. For this purpose, a combination of GIS-based functions and a machine learning Random Forest classifier was successfully implemented. To demonstrate its applicability, the presented method was tested against an average of expert manual contact picking outputs. This test showed improved overall performance from the semi-automated workflow.
An open-source MBES dataset (cell size 0.5 m) was used in this study [1], over an area that could be considered appropriate for offshore renewable energy developments. The area, situated within Long Island Sound, has operational wind farms nearby (i.e., the South Fork Wind Farm [59]) and is adjacent to the currently planned Empire Wind 1 [60] and Sunrise Wind [61] wind farms. These projects will, cumulatively, generate enough energy to power over 2 million homes [60,61]. The presence of surface and subsurface boulders might pose significant challenges to installation and operations within these upcoming projects [2,61]. Apart from the obvious engineering implications, the detection algorithm development can potentially aid understanding of hard-bottom seabed habitats associated with boulders [8,62] and provide insights into geoscientific research [4].

5.1. Manual Picking Statistics

Despite being given the same goals and direction, the six manual pickers produced contact lists that ranged from 439 to 792 picks. Generally, picks with the smallest number of contacts correlated with higher precision but lower recall. Conversely, picks with a higher number of contacts suffered from lower precision but higher recall. These statistics make intuitive sense and highlight inherent problems with manual picking, which include bias and fatigue.
The statistical differences between some pickers are vast. For example, Picker 1 picked 516 contacts with a recall score of 50%, compared with Picker 4, who picked 792 with a recall score of 94%. Meanwhile, Picker 5 picked 439 contacts with a recall score of 68%. Despite the large difference in recall, the overall accuracies of Picker 4 (82%) and Picker 5 (79%) are comparable. This is largely because low recall scores can be muted by contrasting high precision scores—an outcome that is tied to the picking strategy and highlights the role that bias can play in manual picking and the assessment of manual picks [62,63,64]. This range in results highlights how critical the development of a standard, reproducible, quantitative approach is to academic and industrial pursuits [65].
In general, the manual picks had high F1-scores (overall accuracy), except for one anomalous score (Picker 1, Table 1). Notably, the experience level of the pickers ranged from 1 year to over 10 years, and most pickers had experience levels representative of the people who would oversee this type of work in practice. Because the range of experience encompasses those likely to be involved in such work, no individual result can be treated as an outlier of lesser significance.

5.2. Algorithm Performance—Individual Contacts

The detection algorithm outperformed the average recall of individual pickers (Table 1; Figure 5), implying that the algorithm was better at identifying true positive results compared with the average individual picker. The algorithm’s precision value (68, Table 1) was lower than the average precision value in the manual picks (82, Table 1), suggesting that the improved recall came at the expense of precision (i.e., more false positives were detected by the semi-automated approach).
This can be attributed to the higher pick count of the detection algorithm, which had more false positives (243 versus 104). However, the F1-score achieved by the detection algorithm (74) was comparable to the average of the manual picks (76), suggesting a balanced trade-off between precision and recall. A 30 min quality control process was conducted, which further increased the F1-score of the detection algorithm's pick to 76. This process can be omitted or extended depending on the risk aversion of the project. Nonetheless, the results of this study showcase the detection algorithm's capacity to identify a greater number of contacts—including those potentially overlooked by manual methods—while showing that post-processing QC can help to reduce the number of false positives. Despite this, an algorithm-based approach to identifying contacts in MBES data offers significant cost- and time-saving advantages. Furthermore, the detection algorithm offers the benefit of reproducibility, which would be impracticable with manual picking, even for the same person from one effort to the next. This method reduces the reliance on labour-intensive manual picking and mitigates variability introduced by human bias. This manual picking bias affects both the selection of contacts and the final placement of picked points, which is highlighted by significant variations in pick positioning for the same contacts among individual pickers (Figure 5).
Running the trained detection algorithm on the entire 79 km2 study area identified 75,092 contacts. In this study, an experienced picker took approximately 11 h to identify contacts within the 0.24 km2 test site. Extrapolating this rate, manually picking the entire study area would require 329 h, which is likely conservative given that the effects of human fatigue are unlikely to be linear. In contrast, the detection algorithm completed the task in approximately 11 h, which included a baseline 5 h manual QC check (see Section 3.5.4). This demonstrates that the detection algorithm is not only reproducible but also achieves results roughly 30 times faster than manual interpretation.
The successful semi-automatic delineation of contact clusters, interpreted as potential boulder fields, not only provides another valuable data layer to the workflow but also facilitates the standardization of what constitutes a boulder field. While simple definitions based on the number of boulders per area unit (i.e., density) are intuitive, they are often constrained by methodological challenges relating to the boulder counting method and grid design. The DBSCAN clustering [54] employed here eliminates the need for predefined counting grids by defining clusters based on spatial proximity between contacts. Of note, the DBSCAN parameters used to define clusters (a minimum cluster size of 10 contacts and a maximum inter-contact distance of 100 m) can be adjusted to reflect engineering requirements or research objectives. This method is not only efficient but can also be implemented using a simple GIS tool, ensuring a highly reproducible and scalable workflow.
This was implemented using only a high-resolution bathymetric dataset, diverging from traditional contact picking using SSS data [16]. Due to its high operating frequency and low altitude above the seabed, SSS produces high-resolution imagery, making it appropriate for detailed contact identification. Additionally, the preservation of object-induced acoustic shadows in SSS backscatter enhances contact detection and enables deriving target height. However, deep-towed SSS data acquisition is inherently prone to positioning errors, arising not least from the inherent separation of the sensor from the vessel's satellite positioning system and the resultant necessity of underwater acoustic USBL positioning [18]. These errors can be an order of magnitude larger compared with hull-mounted MBES systems, which benefit from direct integration with high-grade GNSS and motion reference sensors.
Moreover, due to acoustic shadowing effects, backscatter intensity of contacts located within scour pits can be lower than that of contacts on more competent, scour-resistant seabed, reducing their visibility. Additionally, when picking contacts in the SSS waterfall data display, boulders located far from the nadir zone and near the maximum range values are difficult to pick due to the sensor’s low alongtrack resolution in this part of the swath [66]. This effect can be further exacerbated by interactions with a thermocline or halocline. The appearance of a potential boulder contact in SSS data is often associated with a distorted geometry, as the variable insonification angle changes depending on the contact’s distance from the side-scan sonar and the tow setup [32]. Furthermore, dynamic changes in bathymetry may necessitate rapid SSS towfish altitude adjustments, which in turn influence the final quality of the data.
Although most of the SSS contacts are usually covered at least twice by adjacent swaths, several factors contribute to the risk of missed and/or misinterpreted contacts. The along-track resolution variability can lead to data inconsistencies in contact detection. Highly variable acoustic shadow lengths further complicate interpretation, as shadow size depends on the insonification angle and local seabed topography. Moreover, varying backscatter intensities and distorted boulder shapes introduce further uncertainties. These factors affect both human operators and machine learning algorithms tasked with identifying contacts.
Given these limitations, utilising MBES bathymetry data for contact detection and classification instead of SSS data offers several advantages. Bathymetric DEM data represent surface elevation; thus, the fundamental means of identifying contacts shifts completely. Rather than relying solely on the backscatter intensity of the reflected SSS signals, the method presented here leverages minor, local elevation differences captured by the highly accurate vertical measurements of the MBES. Notably, the use of a hull-mounted MBES for this purpose is only feasible if the spatial resolution is high enough to capture the contacts of interest. Otherwise, the implementation of deep-towed sensors is unavoidable.

5.3. Algorithm Performance—Comparison with Other Studies

In this paper, the proposed workflow for detecting potential boulder contacts in MBES bathymetry-derived raster layers achieved an F1-score of 77, indicating a strong balance between precision and recall. Comparatively, a recent study [32] using neural networks on an MBES bathymetry-derived slope raster reported a mean average precision (mAP@50) of 70.46%. While mAP@50 provides a more detailed evaluation of precision across various recall levels, the F1-score directly reflects the detection algorithm's overall classification performance for detected objects. The results suggest that the proposed method demonstrates a comparable ability to identify small-scale contacts, such as boulders.
A study by [67] applied convolutional neural networks to pick boulders in both MBES bathymetry-derived slope and SSS mosaic layers and compared the results against manual classifications performed by two experts. The automatic detection using MBES-derived slope layers had a stronger agreement with the first expert (F1-score of 0.75). In contrast, the automatic detection using SSS backscatter mosaics had a lower agreement with the second expert, yielding an F1-score of 0.63. Notably, the F1-score measuring the agreement between the two experts was 0.61, further emphasizing the subjectivity of manual contact picking [67]. Nonetheless, the F1-score of 0.77 achieved by the detection algorithm presented in this study represents a slight improvement in performance over these automated neural network approaches. However, due to the differences in evaluation metrics, direct numerical comparisons should be interpreted cautiously, considering potential variations in dataset characteristics, ground-truthing (conducted here by comparison against the master pick), and semi-automated picking evaluation approaches.
Recent efforts have combined object-based image analysis (OBIA) with machine learning to classify seabed features from MBES data. For example, Ref. [68] applied a Random Forest model, enhanced through the Boruta feature-selection algorithm, to map large-scale bedforms in high-resolution bathymetry. Although focused on broader geomorphological units, the study highlights the value of spatial derivatives such as slope and curvature for seabed classification. Our approach builds on this by showing that similar object-based techniques can be effectively applied to detect smaller features like boulders, using refined GIS filtering and high-resolution DEMs.
In another comparable study, Ref. [31] used topo-bathymetric Light Detection and Ranging (LiDAR) data and a Random Forest classifier to detect boulders based on colour contrast, height differences, shape, and boundary characteristics. To optimise feature selection, the authors applied Relief-F (the feature selection algorithm of [69]) and tested multiple feature neighbourhood sizes (0.5–3 m), finding 0.5 m to be the optimal size. Their model achieved 99% point accuracy, but boulder detection was limited (recall: 57%, precision: 27%), likely due to class imbalance. Our study differs by using high-resolution MBES data and implementing a GIS-based pre-filtering step (using contour shape and size metrics, and HPF slope statistics; Table A1) before performing Random Forest classification on the associated HPF slope values. Comparing model validation results suggests that a GIS-based filtering approach applied before a Random Forest classifier leads to more accurate boulder detection than the previously tested methods.
Similarly, Ref. [70] developed a semi-automated method for shipwreck detection using MBES bathymetry, combining raster-based slope filtering with object-level classification. Their workflow achieved recall = 0.73 and precision = 0.47, lower than our method's recall = 0.83 and precision = 0.68. However, their use case involved larger, morphologically complex targets over a much broader spatial scale. Like our study, Ref. [70] employed machine learning to transition from pixel-based filters to object-focused detection, reinforcing the value of hybrid topographic + ML approaches. Our higher precision and reduced false-positive rate highlight the advantage of tailoring object metrics and classifier training to specific small-scale seabed features like boulders.
Ref. [71] employed a two-stage deep-learning pipeline (YOLOv11 + Faster R-CNN) on pre-processed SSS mosaics and reported precision = 0.74, recall = 0.88, and mAP @ [0.5:0.95] = 0.41, while reducing interpretation time by ~92×. These metrics slightly exceed ours (precision = 0.68, recall = 0.83), but their workflow depends on extensive backscatter preprocessing and inherits the georeferencing uncertainty of towed SSS. Our MBES-based method operates on minimally processed DEMs, delivering comparable F1 and more reliable spatial accuracy for engineering design.

5.4. Limitations and Future Work

The cell size of the DEM derived from hull-mounted MBES data is the primary limiting factor for the successful implementation of the semi-automated workflow presented in this study. Data with a 0.5 m spatial resolution proved a suitable benchmark for future applications; however, ideally the cell size should be even smaller to detect the smallest boulders within the Wentworth scale [34].
Another limitation of the detection algorithm, as discussed in Section 5.2, is its potential to introduce false positives or false negatives. However, while its precision is slightly lower when compared with the average manual picker performance, its recall is notably higher. Notably, after a brief user QC step, the final F1-score improves marginally, while the detection algorithm remains substantially more time-efficient than any manual picking effort (by nearly 1.5 orders of magnitude). These factors indicate that the algorithm satisfied the research objectives. Future refinement of the individual steps relating to the GIS Filter and Feature Extractor phases of the detection algorithm could further enhance precision and improve overall accuracy.
As stated throughout this paper, the detection algorithm is designed to detect local anomalies in seabed elevation, resulting in contacts characterized by morphologies typically associated with boulders. The development of each step was shaped by the combined expertise of the authors, ensuring that the approach was aligned with established geological and geophysical principles. To benchmark the algorithm's performance, the reference master pick dataset was carefully created to isolate contacts that are potentially boulders. However, without visual confirmation, geotechnical sampling, or other complementary datasets derived from, for example, a magnetometric survey, the final identification of detected contacts cannot be fully guaranteed. Gathering such additional data and incorporating it into the ground-truthing process could enhance the accuracy of the verification dataset, improving confidence in the algorithm's approach.
The detection algorithm demonstrates high performance when detecting boulders on a relatively flat seabed with moderately diverse morphology. However, its ability may be diminished in terrains that are highly heterogeneous (i.e., outcropping bedrock). Adjusting algorithm parameters based on specific seabed characteristics may further improve detection accuracy. This approach aligns with findings that emphasize the critical role of seabed morphology in automated boulder detection [32].
The detection algorithm was developed and validated using a test site with an area of 0.24 km2. While training and testing the detection algorithm on larger sites would likely result in higher precision and recall, it would also require increased human interpretation. The authors acknowledge this as a limitation but emphasise that the chosen site size was carefully selected to test the detection algorithm, based on its size, the dense availability of contacts, and some variability in seafloor morphologies present.

6. Conclusions

GIS-based semi-automated methods are increasingly employed to delineate features according to strict, quantitative definitions [14,28,30]. This paper builds on that approach by replacing traditionally qualitative boulder description with quantitative GIS functions and a Random Forest algorithm applied to MBES-derived marine remote sensing DEM data. After a short quality control of the results, the F1-score representing the overall accuracy of the semi-automated workflow against the verification master pick data was 0.77, slightly higher than the 0.76 average for expert manual picks. The method offers increased objectivity, reproducibility, and time-efficiency in detecting potential boulder contacts and the clusters that represent boulder fields. The results also suggest a slight performance improvement over the automated neural network approaches discussed above, while allowing parameter customisation to suit specific industry or research site-characterization objectives without compromising quantifiability.
The positive results of this semi-automated algorithm are the product of the algorithm itself and the process that was developed and applied in tandem. This process is iterative and based on geostatistical tools applied in a GIS environment that work to refine the data into a bespoke set. This enables the ML process to act more effectively and efficiently. Essentially, the data are treated so that they increase the algorithm’s consistency and reduce the required amount of training data. This process could be compared to allowing a self-driving car algorithm to operate in a simplified model of reality for the sake of testing or revealing a particular attribute of travel (e.g., number of cross-traffic turns). If the environment can be tailored for the benefit of the algorithm without the removal of relevant data or the creation of spurious information, it behoves the users to offer such a dataset rather than forcing the ML to consider extraneous data.
In addition to the development of the detection algorithm, this study showcases the effectiveness of high-resolution MBES for small-scale contact detection. From a geohazard assessment perspective, the number of contacts at a site is often prioritised over their definitive identification, at least at the initial baseline site-characterization stage. Hence, the necessity of employing towed instruments such as SSS for this purpose should be reconsidered. At the same time, the approach can aid benthic habitat mapping and geoscientific research by efficiently delineating boulder regions while relying solely on freely available MBES data. Using high-resolution MBES data provides more precise positional accuracy, reduces setup complexity, lowers project costs, and decreases human exposure to the offshore work environments associated with towed SSS deployment.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs17152711/s1, Supplementary Files: H13385 MB 50 cm MLLW 1of1 clipped validation area and Algorithm Validation Data.

Author Contributions

Conceptualization, E.D. and L.O.; methodology, E.D. and L.O.; software, E.D.; validation, E.D., J.M., J.P., E.O. and L.O.; formal analysis, E.D.; investigation, E.D. and J.M.; resources, J.P.; data curation, E.D., E.O. and J.M.; writing—original draft preparation, E.D., J.M. and L.O.; writing—review and editing, J.M., E.D., J.P., L.O. and E.O.; visualization, E.D., E.O. and J.M.; supervision, J.P.; project administration, E.D., E.O. and J.P.; funding acquisition, J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Green Rebel Marine Ltd.

Data Availability Statement

The bathymetric grid on which this study was conducted is freely available from the NOAA NCEI report for hydrographic survey H13385 [39]. Our manual pick and algorithm pick datasets for the test site are available on request.

Acknowledgments

We would like to thank Brian Óg Smyth and Odhran McCarthy for their contributions to this study. We would also like to thank Green Rebel, specifically Kevin O’Leary, for facilitating the development of this workflow and its algorithms. Finally, we thank NOAA NCEI for making available the high-resolution multibeam data used in this study; without their publicly available high-resolution data, this study would not have been possible.

Conflicts of Interest

The authors would like to disclose that an early version of the enclosed workflow was employed on a commercial project. However, the process presented here is more advanced and includes ML applications. Furthermore, sufficient data have been provided in the text and Supplementary Materials to ensure transparency and reproducibility of this study.

Abbreviations

The following abbreviations are used in this manuscript:
CNN: Convolutional Neural Network
CUBE: Combined Uncertainty and Bathymetry Estimator
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
DEM: Digital Elevation Model
EPSG: European Petroleum Survey Group
GDAL: Geospatial Data Abstraction Library
GIS: Geographical Information Systems
GNSS: Global Navigation Satellite System
GPS: Global Positioning System
HPF: High-Pass Filter
ID: Identification
INFOMAR: Integrated Mapping for the Sustainable Development of Ireland's Marine Resource
IPQ: Isoperimetric Quotient
LiDAR: Light Detection and Ranging
MBES: Multibeam Echo Sounder
ML: Machine Learning
MLLW: Mean Lower Low Water
NCEI: National Centers for Environmental Information
OBIA: Object-Based Image Analysis
OWE: Offshore Wind Energy
POS MV: Position and Orientation System for Marine Vessels
QC: Quality Control
RF: Random Forest
RV: Research Vessel
SFP: Semi-Filtered Polygon
SSS: Side-Scan Sonar
USBL: Ultra-Short Baseline

Appendix A

Table A1. Formulas for statistical equations used throughout the study.

| Name | Description | Formula |
| --- | --- | --- |
| Compactness | Quantifies how closely a shape approximates a perfect circle | $IPQ = \frac{4\pi \times \mathrm{Area}}{\mathrm{Perimeter}^2}$ |
| Variance | A measure of dispersion | $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ |
| Mean | A measure of central tendency | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ |
| Skewness | A measure of asymmetry | $\frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3$ |
| Kurtosis | A measure of the tailedness of a distribution | $\frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$ |
| Interquartile Range (IQR) | A measure of the spread of data | $\mathrm{Percentile}_{75} - \mathrm{Percentile}_{25}$ |
| Peak-to-Mean Ratio | A measure of variability | $\frac{\max(x)}{\mathrm{Mean}}$ |
| Peak-Trough Difference | A measure of the spread of data | $\max(x) - \min(x)$ |
| Precision | Correct positives among predicted positives | $\frac{TP}{TP + FP}$ |
| Recall | Correct positives among actual positives | $\frac{TP}{TP + FN}$ |
| F1 | Harmonic mean of precision and recall | $\frac{2 \times TP}{2TP + FP + FN}$ |
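For readers implementing the Feature Extractor, a minimal Python sketch of the Table A1 descriptors is given below. It assumes a one-dimensional array of slope values clipped to a single polygon, plus that polygon's area and perimeter; the function and variable names are illustrative rather than taken from the study's code.

```python
# Minimal sketch: compute the Table A1 descriptors for one polygon's clipped slope
# values (1-D array) plus its area and perimeter. Written directly from the formulas
# above; names are illustrative, not the study's attribute names.
import numpy as np

def polygon_descriptors(slope: np.ndarray, area: float, perimeter: float) -> dict:
    n = slope.size                     # requires n > 3 for the kurtosis term
    mean = slope.mean()
    s = slope.std(ddof=1)              # sample standard deviation
    z = (slope - mean) / s
    q75, q25 = np.percentile(slope, [75, 25])
    return {
        "compactness": 4 * np.pi * area / perimeter**2,   # IPQ
        "variance": np.mean((slope - mean) ** 2),
        "mean": mean,
        "skewness": n / ((n - 1) * (n - 2)) * np.sum(z**3),
        "kurtosis": (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3)) * np.sum(z**4)
                    - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)),
        "iqr": q75 - q25,
        "peak_to_mean": slope.max() / mean,
        "peak_trough": slope.max() - slope.min(),
    }
```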
Figure A1. A filtered polygon produced from the GIS Filter and the slope values clipped to it. These values are used to derive multiple statistics, which the Random Forest classifier uses to assess whether a contact is present within the polygon.
Figure A2. (a–i) Recreates Figure 3, zoomed in to view the process over a single contact.

References

  1. Peters, J.L.; Butschek, F.; O’Connell, R.; Cummins, V.; Murphy, J.; Wheeler, A.J. Geological seabed stability model for informing Irish offshore renewable energy opportunities. Adv. Geosci. 2020, 54, 55–65. [Google Scholar] [CrossRef]
  2. Wenau, S.; Schwarz, B.; Bihler, V.; Boyer, E.; Preu, B. Derisking offshore windfarm installation by sub-seafloor boulder detection based on dedicated seismic diffraction imaging. First Break 2022, 40, 41–45. [Google Scholar] [CrossRef]
  3. Nyberg, J.; Zillén-Snowball, L.; Strömstedt, E. Spatial characterization of seabed environmental conditions and geotechnical properties for the development of marine renewable energy in Sweden. Q. J. Eng. Geol. Hydrogeol. 2022, 55, qjegh2021-091. [Google Scholar] [CrossRef]
  4. Sutherland, J.L.; Davies, B.J.; Lee, J.R. A litho-tectonic event stratigraphy from dynamic Late Devensian ice flow of the North Sea Lobe, Tunstall, east Yorkshire, UK. Proc. Geol. Assoc. 2020, 131, 168–186. [Google Scholar] [CrossRef]
  5. Darvill, C.M.; Bentley, M.J.; Stokes, C.R. Geomorphology and weathering characteristics of erratic boulder trains on Tierra del Fuego, southernmost South America: Implications for dating of glacial deposits. Geomorphology 2015, 228, 382–397. [Google Scholar] [CrossRef]
  6. Goto, K.; Miyagi, K.; Kawamata, H.; Imamura, F. Discrimination of boulders deposited by tsunamis and storm waves at Ishigaki Island, Japan. Mar. Geol. 2010, 269, 34–45. [Google Scholar] [CrossRef]
  7. Feldens, A.; Marx, D.; Herbst, A.; Darr, A.; Papenmeier, S.; Hinz, M.; Feldens, P. Distribution of boulders in coastal waters of Western Pomerania, German Baltic Sea. Front. Earth Sci. 2023, 11, 1155765. [Google Scholar] [CrossRef]
  8. Franz, M.; von Rönn, G.A.; Barboza, F.R.; Karez, R.; Reimers, H.-C.; Schwarzer, K.; Wahl, M. How do geological structure and biological diversity relate? Benthic communities in boulder fields of the Southwestern Baltic Sea. Estuaries Coasts 2021, 44, 1994–2009. [Google Scholar] [CrossRef]
  9. European Commission DG Environment. Interpretation Manual of European Union Habitats, Version EUR28; European Commission DG Environment: Brussels, Belgium, 2013. [Google Scholar]
  10. Everingham, M.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The PASCAL Visual Object Classes VOC Challenge. Int. J. Comput. Vis. 2010, 88, 303–338. [Google Scholar] [CrossRef]
  11. Aykut, N.O.; Akpınar, B.; Aydın, Ö. Hydrographic data modeling methods for determining precise seafloor topography. Comput. Geosci. 2013, 17, 661–669. [Google Scholar] [CrossRef]
  12. Micallef, A.; Le Bas, T.P.; Huvenne, V.A.; Blondel, P.; Hühnerbach, V.; Deidun, A. A multi-method approach for benthic habitat mapping of shallow coastal areas with high-resolution multibeam data. Cont. Shelf Res. 2012, 39, 14–26. [Google Scholar] [CrossRef]
  13. Pearce, B.; Fariñas-Franco, J.M.; Wilson, C.; Pitts, J.; deBurgh, A.; Somerfield, P.J. Repeated mapping of reefs constructed by Sabellaria spinulosa Leuckart 1849 at an offshore wind farm site. Cont. Shelf Res. 2014, 83, 3–13. [Google Scholar] [CrossRef]
  14. Majcher, J.; Plets, R.; Quinn, R. Residual relief modelling: Digital elevation enhancement for shipwreck site characterisation. Archaeol. Anthropol. Sci. 2020, 12, 6. [Google Scholar] [CrossRef]
  15. Lamarche, G.; Lurton, X. Recommendations for improved and coherent acquisition and processing of backscatter data from seafloor-mapping sonars. Mar. Geophys. Res. 2018, 39, 5–22. [Google Scholar] [CrossRef]
  16. Zhang, F.; Zhang, W.; Cheng, C.; Hou, X.; Cao, C. Detection of small objects in side-scan sonar images using an enhanced YOLOv7-based approach. J. Mar. Sci. Eng. 2023, 11, 2155. [Google Scholar] [CrossRef]
  17. Feldens, P.; Darr, A.; Feldens, A.; Tauber, F. Detection of boulders in side scan sonar mosaics by a neural network. Geosciences 2019, 9, 159. [Google Scholar] [CrossRef]
  18. Luo, Q.; Yan, X.; Ju, C.; Chen, Y.; Luo, Z. An ultra-short baseline underwater positioning system with Kalman filtering. Sensors 2021, 21, 143. [Google Scholar] [CrossRef]
  19. Guinan, J.; McKeon, C.; O’Keeffe, E.; Monteys, X.; Sacchetti, F.; Coughlan, M.; Nic Aonghusa, C. INFOMAR data in the EMODnet Geology data portal supports marine spatial planning and offshore energy development in the Irish offshore. Q. J. Eng. Geol. Hydrogeol. 2020, 54, qjegh2020-033. [Google Scholar] [CrossRef]
  20. National Centers for Environmental Information (NCEI). NOAA Bathymetry Viewer. Available online: https://www.ncei.noaa.gov/maps/bathymetry/?layers=multibeam (accessed on 20 February 2025).
  21. Summers, G.; Lim, A.; Wheeler, A.J. A scalable, supervised classification of seabed sediment waves using an object-based image analysis approach. Remote Sens. 2021, 13, 2317. [Google Scholar] [CrossRef]
  22. Peters, J.L.; Wheeler, A.J.; Cummins, V. Data Resources Assessment—Phase 2. In EirWind Project Deliverable D2.1 Report; MaREI Centre, ERI, University College Cork: Cork, Ireland, 2019. [Google Scholar] [CrossRef]
  23. Peters, J.L.; Remmers, T.; Wheeler, A.J.; Murphy, J.; Cummins, V. A systematic review and meta-analysis of GIS use to reveal trends in offshore wind energy research and offer insights on best practices. Renew. Sustain. Energy Rev. 2020, 128, 109916. [Google Scholar] [CrossRef]
  24. Ierodiaconou, D.; Schimel, A.C.G.; Kennedy, D.; Monk, J.; Gaylard, G.; Young, M.; Diesing, M.; Rattray, A. Combining pixel and object-based image analysis of ultra-high resolution multibeam bathymetry and backscatter for habitat mapping in shallow marine waters. Mar. Geophys. Res. 2018, 39, 271–288. [Google Scholar] [CrossRef]
  25. Janowski, L.; Wroblewski, R.; Dworniczak, J.; Kolakowski, M.; Rogowska, K.; Wojcik, M.; Gajewski, J. Offshore benthic habitat mapping based on object-based image analysis and geomorphometric approach: A case study from the Slupsk Bank, Southern Baltic Sea. Sci. Total Environ. 2021, 801, 149712. [Google Scholar] [CrossRef]
  26. Arosio, R.; Hobley, B.; Wheeler, A.J.; Sacchetti, F.; Conti, L.A.; Furey, T.; Lim, A. Fully convolutional neural networks applied to large-scale marine morphology mapping. Front. Mar. Sci. 2023, 10, 1228867. [Google Scholar] [CrossRef]
  27. Verfaillie, E.; Doornenbal, P.; Mitchell, A.J.; White, J.; Van Lancker, V. The Bathymetric Position Index (BPI) as a Support Tool for Habitat Mapping: Worked Example for the MESH Final Guidance; MESH Project, Marine Institute: Galway, Ireland, 2007; p. 14. [Google Scholar]
  28. Walbridge, S.; Slocum, N.; Pobuda, M.; Wright, D.J. Unified geomorphological analysis workflows with Benthic Terrain Modeler. Geosciences 2018, 8, 94. [Google Scholar] [CrossRef]
  29. Wessel, P. An empirical method for optimal robust regional-residual separation of geophysical data. Math. Geol. 1998, 30, 391–408. [Google Scholar] [CrossRef]
  30. Gafeira, J.; Dolan, M.F.J.; Monteys, X. Geomorphometric characterization of pockmarks by using a GIS-based semi-automated toolbox. Geosciences 2018, 8, 154. [Google Scholar] [CrossRef]
  31. Hansen, S.S.; Ernstsen, V.B.; Andersen, M.S.; Al-Hamdani, Z.; Baran, R.; Niederwieser, M.; Steinbacher, F.; Kroon, A. Classification of boulders in coastal environments using random forest machine learning on topo-bathymetric LiDAR data. Remote Sens. 2021, 13, 4101. [Google Scholar] [CrossRef]
  32. Hinz, M.; Olberg, D.; Schlenz, B.; Schmitt, T. AI-based boulder detection in sonar data—Bridging the gap from experimentation to application. Int. Hydrogr. Rev. 2024, 25, 45–58. [Google Scholar] [CrossRef]
  33. Nichols, G. Sedimentology and Stratigraphy, 2nd ed.; Wiley-Blackwell: Chichester, UK, 2010. [Google Scholar]
  34. Wentworth, C.K. A scale of grade and class terms for clastic sediments. J. Geol. 1922, 30, 377–392. [Google Scholar] [CrossRef]
  35. Feldens, P. Super resolution by deep learning improves boulder detection in side scan sonar backscatter mosaics. Remote Sens. 2020, 12, 2284. [Google Scholar] [CrossRef]
  36. Kleman, J. Preservation of landforms under ice sheets and ice caps. Geomorphology 1994, 9, 19–32. [Google Scholar] [CrossRef]
  37. Pittman, J.; Griffiths, S.; Guigné, J.Y. Near surface sub-seabed boulder detection using a 3D acoustic profiler. In Proceedings of the 9th Offshore Site Investigation and Geotechnics (OSIG) Conference, London, UK, 12 September 2023. [Google Scholar]
  38. Mason, R.J.; Polvi, L.E. How big is a boulder? The importance of boulder definition choice in earth science research and river management. Earth Surf. Process. Landf. 2024, 49, 2840–2854. [Google Scholar] [CrossRef]
  39. National Centers for Environmental Information (NCEI). Hydrographic Survey H13385. Available online: https://www.ngdc.noaa.gov/nos/H12001-H14000/H13385.html (accessed on 20 February 2025).
  40. Young, R.A.; Gordon, L.M.; Owen, L.A.; Huot, S.; Zerfas, T.D. Evidence for a late glacial advance near the beginning of the Younger Dryas in western New York State: An event postdating the record for local Laurentide ice sheet recession. Geosphere 2020, 17, 271–305. [Google Scholar] [CrossRef]
  41. Margold, M.; Stokes, C.R.; Clark, C.D. Reconciling records of ice streaming and ice margin retreat to produce a palaeogeographic reconstruction of the deglaciation of the Laurentide Ice Sheet. Quat. Sci. Rev. 2018, 189, 1–30. [Google Scholar] [CrossRef]
  42. Weiss, A.D. Topographic position and landforms analysis. In Proceedings of the 2001 ESRI User Conference, San Diego, CA, USA, 9–13 July 2001. [Google Scholar]
  43. Polsby, D.D.; Popper, R.D. The third criterion: Compactness as a procedural safeguard against partisan gerrymandering. Yale Law Policy Rev. 1991, 9, 301–353. [Google Scholar] [CrossRef]
  44. Cox, E.P. A method of assigning numerical and percentage values to the degree of roundness of sand grains. J. Paleontol. 1927, 1, 179–183. [Google Scholar]
  45. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  46. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  47. Chen, W.; Xie, X.; Wang, J.; Pradhan, B.; Hong, H.; Bui, D.T.; Duan, Z.; Ma, J. A comparative study of logistic model tree, random forest, and classification and regression tree models for spatial prediction of landslide susceptibility. Catena 2017, 151, 147–160. [Google Scholar] [CrossRef]
  48. Belgiu, M.; Drăguț, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31. [Google Scholar] [CrossRef]
  49. Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J.P. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J. Photogramm. Remote Sens. 2012, 67, 93–104. [Google Scholar] [CrossRef]
  50. Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J. Random forests for classification in ecology. Ecology 2007, 88, 2783–2792. [Google Scholar] [CrossRef] [PubMed]
  51. Probst, P.; Boulesteix, A.-L.; Bischl, B. Tunability: Importance of hyperparameters of machine learning algorithms. WIREs Data Min. Knowl. Discov. 2019, 9, e1301. [Google Scholar] [CrossRef]
  52. Grinsztajn, L.; Oyallon, E.; Varoquaux, G. Why do tree-based models still outperform deep learning on typical tabular data? In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar] [CrossRef]
  53. Farris, F.A. The Gini index and measures of inequality. Am. Math. Mon. 2010, 117, 851–864. [Google Scholar] [CrossRef]
  54. Ester, M.; Kriegel, H.-P.; Sander, J.; Xu, X. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD ’96), Portland, OR, USA, 2–4 August 1996; AAAI Press: Menlo Park, CA, USA, 1996; pp. 226–231. [Google Scholar]
  55. Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI ’95), Portland, OR, USA, 2–4 August 1995; Morgan Kaufmann: San Mateo, CA, USA, 1995; Volume 2, pp. 1137–1143. [Google Scholar]
  56. Van Rijsbergen, C.J. Information Retrieval, 2nd ed.; Butterworth-Heinemann: London, UK, 1979. [Google Scholar]
  57. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
  58. International Hydrographic Organization (IHO). IHO Standards for Hydrographic Surveys, S-44, 6th ed.; Version 6.1.0; International Hydrographic Bureau: Monaco, 2023; Available online: https://iho.int/uploads/user/pubs/standards/s-44/S-44_Edition_6.1.0.pdf (accessed on 2 March 2025).
  59. South Fork Wind. Ørsted: Discover NY’s First Offshore Wind Farm. Available online: https://southforkwind.com (accessed on 26 February 2025).
  60. Empire Wind. Empire Wind: Powering New York’s Future. Available online: https://www.empirewind.com (accessed on 12 February 2025).
  61. Sunrise Wind. Sunrise Wind: A Partnership for Clean Energy. Available online: https://sunrisewindny.com (accessed on 12 February 2025).
  62. Fletcher, T.; Booth, M.T.; Pritt, J.J. A comparison of recreational and survey-grade side-scan sonar systems in mapping reservoir fish habitat. N. Am. J. Fish. Manag. 2024, 44, 1422–1438. [Google Scholar] [CrossRef]
  63. Gonzalez-Socoloske, D.; Olivera-Gomez, L.D. Gentle giants in dark waters: Using side-scan sonar for manatee research. Open Remote Sens. J. 2012, 5, 1–14. [Google Scholar] [CrossRef]
  64. Daugherty, D.J.; Fleming, B.P. Effects of training on side-scan sonar use as a fish survey tool: A case study in alligator gar. J. Fish Wildl. Manag. 2021, 12, 152–163. [Google Scholar] [CrossRef]
  65. Van Unen, P.; Lekkerkerk, H.-J. Machine Learning as a Tool: Detecting Boulders in a Multibeam Point Cloud. Hydro Int. 2021. Available online: https://www.hydro-international.com/content/article/machine-learning-as-a-tool (accessed on 18 March 2025).
  66. Blondel, P. The Handbook of Sidescan Sonar; Springer: Berlin/Heidelberg, Germany, 2009; p. 316. [Google Scholar] [CrossRef]
  67. Feldens, P.; Schmidt, V.; Held, P.; Wilken, D. Automatic detection of boulders by neural networks: A comparison of multibeam echo sounder and side-scan sonar performance. Hydrogr. Nachr. 2021, 119, 6–11. [Google Scholar] [CrossRef]
  68. Janowski, Ł. Advancing seabed bedform mapping in the Kuźnica Deep: Leveraging multibeam echosounders and machine learning for enhanced underwater landscape analysis. Remote Sens. 2025, 17, 373. [Google Scholar] [CrossRef]
  69. Kira, K.; Rendell, L.A. A practical approach to feature selection. In Machine Learning Proceedings 1992; Sleeman, D., Edwards, P., Eds.; Morgan Kaufmann: Burlington, MA, USA, 1992; pp. 249–256. [Google Scholar] [CrossRef]
  70. Pols, C.T.; Sturt, F.; El Safadi, C.; Marcu, A. Shipwreck detection in bathymetry data using semi-automated methods: Combining machine learning and topographic inference approaches. J. Archaeol. Sci. 2025, 181, 106297. [Google Scholar] [CrossRef]
  71. Dowah, H.; Kirawan, K. Automated Boulder Localization and Recognition in Side-Scan Sonar Data Using Deep Learning. Master’s Thesis, Blekinge Institute of Technology, Karlskrona, Sweden, 2025. [Google Scholar]
Figure 1. Study area, with inset map showing the ‘test site’ bathymetry and the colour ramp properties used in the manual picking phase.
Figure 2. Detection algorithm workflow for contact detection and clustering. The workflow consists of five main parts: a GIS Filter that derives filtered contour polygons and slope DEMs from the manipulated bathymetry grids; a Feature Extractor that computes zonal and derived statistics from each polygon’s associated slope DEM; an RF classifier that is trained on the statistics extracted at the training site and used to detect contacts in the test site; Exporter Filters that detect contacts, apply QC thresholds, and generate point geometries; and Contact Clustering, which delineates cluster fields from the spatial distribution of contacts.
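A compact sketch of the classification stage in Figure 2 is shown below. It assumes the Feature Extractor output has been exported to CSV attribute tables containing the Table A1 statistics and a binary interpretation label; the file names, column names, and hyperparameters are illustrative assumptions rather than the study's exact configuration (the study's RF was built with scikit-learn [46]).

```python
# Illustrative sketch of the RF classification step in Figure 2. Feature names are
# assumptions based on Table A1; file names and hyperparameters are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

FEATURES = ["variance", "mean", "skewness", "kurtosis", "iqr",
            "peak_to_mean", "peak_trough", "compactness"]

# Hypothetical export of the Feature Extractor for the training site, with a binary
# 'is_contact' label assigned during manual interpretation.
train = pd.read_csv("train_site_polygon_stats.csv")

clf = RandomForestClassifier(n_estimators=500, random_state=42, n_jobs=-1)
print("Cross-validated F1:",
      cross_val_score(clf, train[FEATURES], train["is_contact"],
                      cv=5, scoring="f1").mean())
clf.fit(train[FEATURES], train["is_contact"])

# Apply the trained classifier to the test-site polygons and keep predicted contacts.
test = pd.read_csv("test_site_polygon_stats.csv")
test["contact"] = clf.predict(test[FEATURES])
test[test["contact"] == 1].to_csv("predicted_contacts.csv", index=False)
```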
Figure 3. A summarised graphical representation of the algorithm’s workflow for the test site (see Appendix A for a single-contact representation of the workflow, Figure A2). (a) Raw 0.5 m bathymetry raster. (b) HPF created from the bathymetry raster. (c) Slope of the HPF. (d) Contours created from the slope raster. (e) Filtered contours to remove excess features. (f) Polygonization of the remaining contours. (g) Filtering of polygons by compactness and area. (h) Random Forest classification of polygons whose statistics signal contact presence. (i) Centroids placed in the remaining polygons.
Figure 4. (a) 0.5 m bathymetry. (b) Contacts picked by the detection algorithm. (c) Kernel density estimation (KDE) heatmap of the contacts. (d) Contact clusters created from the distribution of the picked contacts.
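The clustering step in panel (d) can be sketched with DBSCAN [54] as below; the search radius (eps, in metres), minimum sample count, and input file are illustrative assumptions, and coordinates are assumed to be in a projected CRS so that Euclidean distances are metric.

```python
# Minimal sketch: group detected contact centroids into candidate boulder fields with
# DBSCAN. Parameter values and the input file name are illustrative assumptions.
import numpy as np
from sklearn.cluster import DBSCAN

xy = np.loadtxt("contact_centroids_xy.csv", delimiter=",")  # hypothetical N x 2 array

labels = DBSCAN(eps=25.0, min_samples=10).fit_predict(xy)
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(f"{n_clusters} contact clusters; {np.sum(labels == -1)} isolated contacts (noise)")
```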
Figure 5. Contacts derived from individual picks (greyscale) and detection algorithm (red). White circles represent 1 m circular buffers around the verification Master Pick points.
Table 1. Summary of detection performance of the semi-automated detection method, labeled as “Algo” (automated output) and “Algo with QC” (after a 30 min quality control review). These represent the semi-automated detection picks. Performance is compared against six manual annotators (Pickers 1–6). The “Master” pick was created through a majority-rule consensus by a working group of marine geophysical professionals over a series of collaborative sessions. Reported metrics include the number of detected contacts (No. Contacts), true positives (TP), false positives (FP), and derived values for recall, precision, and F1-score (all expressed as percentages). “Manual Pick Average” and “Standard Deviation” summarize the mean and variability across Pickers 1–6.

| Name | No. Contacts | TP | FP | Recall | Precision | F1-Score |
| --- | --- | --- | --- | --- | --- | --- |
| Master | 613 | 613 | 0 | 100 | 100 | 100 |
| Algo | 749 | 506 | 243 | 83 | 68 | 74 |
| Algo with QC | 696 | 507 | 189 | 83 | 73 | 77 |
| Picker 1 | 516 | 309 | 207 | 50 | 60 | 55 |
| Picker 2 | 466 | 434 | 32 | 71 | 93 | 80 |
| Picker 3 | 545 | 460 | 85 | 75 | 84 | 79 |
| Picker 4 | 792 | 574 | 218 | 94 | 72 | 82 |
| Picker 5 | 439 | 417 | 22 | 68 | 95 | 79 |
| Picker 6 | 537 | 478 | 59 | 78 | 89 | 83 |
| Standard Deviation | 128 | 95 | 98 | 16 | 14 | 12 |
| Manual Pick Average | 549 | 445 | 104 | 73 | 82 | 76 |
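As an illustration of how picks can be scored against the master dataset using the 1 m buffers shown in Figure 5, the sketch below greedily matches each pick to an unmatched master point within tolerance; the greedy matching rule and function name are assumptions and may differ from the study's exact scoring procedure.

```python
# Minimal sketch: score algorithm (or picker) points against master picks using a
# 1 m matching tolerance. Greedy matching is an illustrative assumption.
import numpy as np
from scipy.spatial import cKDTree

def score_picks(picks_xy: np.ndarray, master_xy: np.ndarray, tol: float = 1.0):
    """Return (TP, FP, FN) for picks matched to master points within `tol` metres."""
    tree = cKDTree(master_xy)
    matched_master = set()
    tp = 0
    for p in picks_xy:
        candidates = tree.query_ball_point(p, r=tol)   # master indices within tolerance
        hit = next((i for i in candidates if i not in matched_master), None)
        if hit is not None:
            matched_master.add(hit)
            tp += 1
    fp = len(picks_xy) - tp
    fn = len(master_xy) - tp
    return tp, fp, fn

# Example (coordinates assumed to be in projected metres):
# tp, fp, fn = score_picks(algo_xy, master_xy, tol=1.0)
```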
