Nearshore Benthic Habitat Mapping Based on Multi-Frequency, Multibeam Echosounder Data Using a Combined Object-Based Approach: A Case Study from the Rowy Site in the Southern Baltic Sea

: Recently, the rapid development of the seabed mapping industry has allowed researchers to collect hydroacoustic data in shallow, nearshore environments. Progress in marine habitat mapping has also helped to distinguish the seaﬂoor areas of varied acoustic properties. As a result of these new developments, we have collected a multi-frequency, multibeam echosounder dataset from the valuable nearshore environment of the southern Baltic Sea using two frequencies: 150 kHz and 400 kHz. Despite its small size, the Rowy area is characterized by diverse habitat conditions and the presence of red algae, unique on the Polish coast of the Baltic Sea. This study focused on the utilization of multibeam bathymetry and multi-frequency backscatter data to create reliable maps of the seaﬂoor. Our approach consisted of the extraction of 70 secondary features of bathymetric and backscatter data, including statistic and textural attributes of different scales. Based on ground-truth samples, we have identiﬁed six habitat classes and selected the most relevant features of the bathymetric and backscatter data. Additionally, ﬁve types of image processing pixel-based and object-based classiﬁers were tested. We also evaluated the performance of algorithms using an accuracy assessment based on the validation subset of the ground-truth samples. Our best results reached 93% overall accuracy and a kappa coefﬁcient of 0.90, conﬁrming that nearshore seabed habitats can be accurately distinguished based on multi-frequency, multibeam echosounder measurements. Our predictive habitat mapping of shallow euphotic zones creates a new scientiﬁc perspective and provides relevant data for the management of natural resources. Object-based approaches previously used in various environments and areas suggest that methodology presented in this study may be scalable.


Introduction
Shallow, coastal benthic habitats represent one of the most productive and valuable ecosystems on Earth [1]. The particular hydrodynamic conditions of these environments are responsible for the highly active exchange of nutrients, sediments, and biota. Their locations within euphotic zones make them an ideal place for the growth of macroalgae, which provide good settlements for benthic communities. Nearshore benthic habitats usually form complicated patterns, in which conducting spatial determination analysis is very important for ecosystem management and protection. Finally, [16,17], spectral analysis [18], and neural network analysis (e.g., [19,20]). The image processing approach benefits from different kinds of classification (often supervised) and allows researchers to apply many geomorphometric attributes (e.g., [21,22]). The approach presented here is based on object image analysis related to different acoustic products: backscatter and bathymetry combined in a relational database.

Study Site
This study focused on the nearshore shallow area located within the Polish Exclusive Economic Zone (EEZ). The detailed location of the aforementioned area is presented in Figure 1. The research area has dimensions of around 1.0 × 1.4 km, covering approximately 1.4 km 2 . The outer boundary is located at a distance ranging between 0.5 and 2.0 km from the shoreline. The depth of the analyzed area varies from 4 to 20 m with a mean of 10 m. The geomorphology of the seabed is diversified, including valleys and crests of irregular shapes. The Rowy site neighbors and is partly within the nearshore coastal area of Slowinski National Park in northern Poland, near Gardno lake, at the coast. The protection of the surrounding marine environments has been established since 1995, when the borders of the National Park were expanded to marine areas up to a depth of 10 m as Ramsar site no. 757 [23]. The Rowy site is also located within the area of Natura 2000, no. PLB990002 [24].
The substratum of the study site is made of glacial tills that belong to a large moraine area occurring at the coast. The till outcrops represent relicts of postglacial structures that are crossed by valleys filled with modern marine sand and gravelly sand deposits, which form structures similar to ripple marks [25]. Glacial tills are often covered by large, dense boulder areas, and such terrain is rare The Rowy site neighbors and is partly within the nearshore coastal area of Slowinski National Park in northern Poland, near Gardno lake, at the coast. The protection of the surrounding marine environments has been established since 1995, when the borders of the National Park were expanded to marine areas up to a depth of 10 m as Ramsar site no. 757 [23]. The Rowy site is also located within the area of Natura 2000, no. PLB990002 [24].
The substratum of the study site is made of glacial tills that belong to a large moraine area occurring at the coast. The till outcrops represent relicts of postglacial structures that are crossed by Remote Sens. 2018, 10,1983 4 of 21 valleys filled with modern marine sand and gravelly sand deposits, which form structures similar to ripple marks [25]. Glacial tills are often covered by large, dense boulder areas, and such terrain is rare within the Polish part of the southern Baltic Sea. Such a hard substratum provides a good base for various vegetation and benthic communities. Within the immediate surroundings of the Rowy area there is a lack of big urban or industrial areas, sources of contamination, and big river estuaries, so the environment maintains its relatively original nature. Previous research has confirmed the high biodiversity of the benthic communities within the analyzed area [26]. The presence of six species of red algae has been found there, such as Bangiophyceae, which is very rare in the Polish coast of the Baltic Sea, including unique Furcellaria lumbricalis and Polysiphonia fucoides. Moreover, boulder sites are often colonized by dense, cemented communities of Mytilus trossulus bivalves, with over 2500 individuals per m 2 in some locations [27]. The presence of large patches of such macroalgae is very valuable in terms of the functioning of the ecosystem and increasing the diversity of the phytophile fauna [28].

Hydroacoustic Data Acquisition and Processing
Hydroacoustic data were acquired during two surveys on 26-27 May 2018, using a small boat equipped with a multibeam echosounder (MBES) NORBIT iWBMS (model STX). The MBES was mounted on a pole and oriented vertically downwards during the measurements. The device allowed us to collect bathymetric data in a depth range of 2-150 m. Its angular spread across the ship's track was 150 • , allowing us to collect 512 beams. The beam width had dimensions of 0.9 × 0.9 • at the working frequency of 400 kHz. The maximum angular coverage at the aforementioned frequency could be set up to 210 • . The MBES worked with an integrated GNSS/INS navigation system (Wave Master, manufactured by Applanix: 85 Leek Crescent, Richmond Hill, ON Canada, L4B 3B3), including online RTK corrections for high positioning accuracy. During each survey, the working frequency of the MBES was set to a fixed value of 150 kHz or 400 kHz, depending on the day of the acquisition. In the case of a survey with the 150 kHz frequency, the maximum angular coverage was reduced to 160 • . Our measurements were performed with a maximum ping rate of 30 Hz and a sweep time of 500 µs. To provide accurate profiles of the sound speed in the water column, the sound speed was measured consistently using a sound velocity profiler. Multibeam echosounder surveys were designed and performed to provide full spatial coverage (150%) of the area at a constant speed of 5.5-6 knots.
The hydroacoustic data were processed and cleaned using QPS Qimera 1.6.3 and Fledermaus Geocoder Toolbox (FMGT) 7.8.4 software. We gridded the bathymetric and backscatter data from the MBES with the maximum reliable resolution, which helped to avoid data gaps between the survey lines and to maintain consistency. Therefore, for the 150 kHz frequency, the data was gridded with a grid size of 0.75 m, and for the 400 kHz frequency, the grid size was 0.5 m. In order to determine the sensor errors, we utilized the combined uncertainty and bathymetric estimator (CUBE) algorithm, which obtains multiple estimations of depths related to the variation of the acoustic data [29]. We obtained standard deviations of the CUBE surface of lower than 30 cm. The backscatter mosaic was created using the 'flat' mode of the angle varying gain (AVG) correction tool with 'blend' mosaicking style and a window size of 300 [30]. The AVG correction in Fledermaus Geocoder Toolbox normalizes the backscatter data without angular dependency based on the calculations of the average backscatter response, between 20 and 60 degrees (or grazing angles [8]). 'Flat' mode is a standard type of AVG calculation, which smooths small variations of the backscatter signal and reduces its noise [31]. The window size indicates a series of a specified number of consecutive pings that is used for AVG normalization. The selected number of corrected curves (in our study, 300 pings) is used as a sliding window, moving along the survey lines (e.g., [32]). 'Blend' mosaicking style is a standard method for the management of overlapping lines in FMGT [8]. It blends nadir pixels with other overlapping pixels [31]. The bathymetric and backscatter data were created in reference coordinate system UTM 33 N based on WGS 84.

Ground-Truth Sampling and Analysis
The ground-truth samples were collected on 7 September 2018. They included sediment and video sampling using a Van Veen grab sampler and a remotely operated vehicle. The locations of the samples were carefully selected, chosen because of the particular characteristics of the seafloor based on prior knowledge of the research area [26]. Because of the difficulties of taking sediment samples from tills and boulders, of the total number of 31 samples, 29 were documented by video recordings and 14 were collected by the grab sampler. The sediment ground-truth data were analyzed using granulometric and sieve analysis, including the use of Folk and Ward parameters and Wentworth classification of the sediments [33,34]. A ROV was used in almost all the point locations, and apart from the video recordings in the target place, it was directed to carefully investigate the seafloor of each ground-truth point sampling area. Using mounted sensors, it obtained additional information, such as time, depth, direction, and temperature data, but its positioning and driving path were not capable of being precisely obtained during such seabed investigations. Therefore, despite having over 100 min of video recordings, we decided to generalize the obtained material and identify one class of ground-truth sample per specific point location. A deep investigation of the video recordings in conjunction with sediment analysis and hydroacoustic data distinguished six classes of habitats [6], which are presented in Table 1. The characteristic examples of the backscatter images shown in Table 1 occupy a spatial area of 9 m 2 and were presented using a false composite with R and G bands that corresponded to the backscatter intensity at 400 kHz and 150 kHz, respectively. The image descriptions shown in Table 1 also refer to fragments of backscatter images presented as false RGB composites. The geographic coordinates of the ground-truth samples with their descriptions are shown in Table A1. It should be noted that despite the fact that Samples 11 and 11b were acquired in similar locations, the distance between them was 15 m in a straight line, which was reflected in the different seafloor types at those locations. Because one class of acoustic facies represented artificial structures, such as a shipwreck located at a single, certain site of the area, we decided to perform classification algorithms for five distinct classes and assign the class of artificial structures manually at the end of the process.

Feature Extraction and Selection
We extracted 70 secondary features from the bathymetric and backscatter data, 35 for each of the two analyzed frequencies (150 kHz and 400 kHz). Together with the primary datasets (bathymetry 150 kHz, bathymetry 400 kHz, backscatter 150 kHz, and backscatter 400 kHz), we had 74 parameters in total. Table 2 presents all the extracted secondary features. The bathymetry-based features included the following: slope, aspect, eastness, northness, curvature, planar curvature, profile curvature, surface area to planar area (arc-chord ratio) [35], vector ruggedness measure (VRM) ruggedness [36], kurtosis, standard deviation of bathymetry, variance, fine-scale bathymetric position index (BPI), and broad-scale bathymetric position index [37]. While a majority of the features were calculated based on a sliding window size of 3 × 3 pixels, for some of them (slope, VRM ruggedness, bathymetry standard deviation, and kurtosis), we tested a multiscale approach [38]. For these features, we applied the following scales: 3 × 3, 5 × 5, 7 × 7, and 9 × 9. The backscatter secondary features included the following: backscatter standard deviation and various kinds of grey level co-occurrence matrices (GLCMs) [39], including homogeneity, contrast, dissimilarity, entropy, angular second moment, mean, standard deviation, and correlation. The backscatter secondary features were extracted based on object-based statistics. The spatial extent of all the secondary features (of both the bathymetric and backscatter data) was almost the same within the analyzed dataset (150 kHz or 400 kHz). The sizes of the sliding windows resulted in the occurrence of no data in some parts of the area, which slightly reduced its dimensions in the case of some secondary features, but it did not affect the further analysis.  Table A1. It should be noted that despite the fact that Samples 11 and 11b were acquired in similar locations, the distance between them was 15 m in a straight line, which was reflected in the different seafloor types at those locations. Because one class of acoustic facies represented artificial structures, such as a shipwreck located at a single, certain site of the area, we decided to perform classification algorithms for five distinct classes and assign the class of artificial structures manually at the end of the process.  Table A1. It should be noted that despite the fact that Samples 11 and 11b were acquired in similar locations, the distance between them was 15 m in a straight line, which was reflected in the different seafloor types at those locations. Because one class of acoustic facies represented artificial structures, such as a shipwreck located at a single, certain site of the area, we decided to perform classification algorithms for five distinct classes and assign the class of artificial structures manually at the end of the process. Bare, flat area of very fine sand with worm burrows S similar locations, the distance between them was 15 m in a straight line, which was reflected in the different seafloor types at those locations. Because one class of acoustic facies represented artificial structures, such as a shipwreck located at a single, certain site of the area, we decided to perform classification algorithms for five distinct classes and assign the class of artificial structures manually at the end of the process. Very dark homogenous areas similar locations, the distance between them was 15 m in a straight line, which was reflected in the different seafloor types at those locations. Because one class of acoustic facies represented artificial structures, such as a shipwreck located at a single, certain site of the area, we decided to perform classification algorithms for five distinct classes and assign the class of artificial structures manually at the end of the process. structures, such as a shipwreck located at a single, certain site of the area, we decided to perform classification algorithms for five distinct classes and assign the class of artificial structures manually at the end of the process. structures, such as a shipwreck located at a single, certain site of the area, we decided to perform classification algorithms for five distinct classes and assign the class of artificial structures manually at the end of the process.

Feature Extraction and Selection
We extracted 70 secondary features from the bathymetric and backscatter data, 35 for each of the two analyzed frequencies (150 kHz and 400 kHz). Together with the primary datasets (bathymetry 150 kHz, bathymetry 400 kHz, backscatter 150 kHz, and backscatter 400 kHz), we had 74 parameters in total. Table 2 presents all the extracted secondary features. The bathymetry-based features included the following: slope, aspect, eastness, northness, curvature, planar curvature, profile curvature, surface area to planar area (arc-chord ratio) [35], vector ruggedness measure (VRM) ruggedness [36], kurtosis, standard deviation of bathymetry, variance, fine-scale bathymetric position index (BPI), and broad-scale bathymetric position index [37]. While a majority of the features were calculated based on a sliding window size of 3 × 3 pixels, for some of them (slope, VRM ruggedness, bathymetry standard deviation, and kurtosis), we tested a multiscale approach [38]. For these features, we applied the following scales: 3 × 3, 5 × 5, 7 × 7, and 9 × 9. The backscatter secondary features included the following: backscatter standard deviation and various kinds of grey level cooccurrence matrices (GLCMs) [39], including homogeneity, contrast, dissimilarity, entropy, angular second moment, mean, standard deviation, and correlation. The backscatter secondary features were extracted based on object-based statistics. The spatial extent of all the secondary features (of both the bathymetric and backscatter data) was almost the same within the analyzed dataset (150 kHz or 400

Feature Extraction and Selection
We extracted 70 secondary features from the bathymetric and backscatter data, 35 for each of the two analyzed frequencies (150 kHz and 400 kHz). Together with the primary datasets (bathymetry 150 kHz, bathymetry 400 kHz, backscatter 150 kHz, and backscatter 400 kHz), we had 74 parameters in total. Table 2 presents all the extracted secondary features. The bathymetry-based features included the following: slope, aspect, eastness, northness, curvature, planar curvature, profile curvature, surface area to planar area (arc-chord ratio) [35], vector ruggedness measure (VRM) ruggedness [36], kurtosis, standard deviation of bathymetry, variance, fine-scale bathymetric position index (BPI), and broad-scale bathymetric position index [37]. While a majority of the features were calculated based on a sliding window size of 3 × 3 pixels, for some of them (slope, VRM ruggedness, bathymetry standard deviation, and kurtosis), we tested a multiscale approach [38]. For these features, we applied the following scales: 3 × 3, 5 × 5, 7 × 7, and 9 × 9. The backscatter secondary features included the following: backscatter standard deviation and various kinds of grey level cooccurrence matrices (GLCMs) [39], including homogeneity, contrast, dissimilarity, entropy, angular second moment, mean, standard deviation, and correlation. The backscatter secondary features were extracted based on object-based statistics. The spatial extent of all the secondary features (of both the bathymetric and backscatter data) was almost the same within the analyzed dataset (150 kHz or 400

Feature Extraction and Selection
We extracted 70 secondary features from the bathymetric and backscatter data, 35 for each of the two analyzed frequencies (150 kHz and 400 kHz). Together with the primary datasets (bathymetry 150 kHz, bathymetry 400 kHz, backscatter 150 kHz, and backscatter 400 kHz), we had 74 parameters in total. Table 2 presents all the extracted secondary features. The bathymetry-based features included the following: slope, aspect, eastness, northness, curvature, planar curvature, profile curvature, surface area to planar area (arc-chord ratio) [35], vector ruggedness measure (VRM) ruggedness [36], kurtosis, standard deviation of bathymetry, variance, fine-scale bathymetric position index (BPI), and broad-scale bathymetric position index [37]. While a majority of the features were calculated based on a sliding window size of 3 × 3 pixels, for some of them (slope, VRM ruggedness, bathymetry standard deviation, and kurtosis), we tested a multiscale approach [38]. For these features, we applied the following scales: 3 × 3, 5 × 5, 7 × 7, and 9 × 9. The backscatter secondary features included the following: backscatter standard deviation and various kinds of grey level cooccurrence matrices (GLCMs) [39], including homogeneity, contrast, dissimilarity, entropy, angular second moment, mean, standard deviation, and correlation. The backscatter secondary features were extracted based on object-based statistics. The spatial extent of all the secondary features (of both the bathymetric and backscatter data) was almost the same within the analyzed dataset (150 kHz or 400 kHz). The sizes of the sliding windows resulted in the occurrence of no data in some parts of the area,

Feature Extraction and Selection
We extracted 70 secondary features from the bathymetric and backscatter data, 35 for each of the two analyzed frequencies (150 kHz and 400 kHz). Together with the primary datasets (bathymetry 150 kHz, bathymetry 400 kHz, backscatter 150 kHz, and backscatter 400 kHz), we had 74 parameters in total. Table 2 presents all the extracted secondary features. The bathymetry-based features included the following: slope, aspect, eastness, northness, curvature, planar curvature, profile curvature, surface area to planar area (arc-chord ratio) [35], vector ruggedness measure (VRM) ruggedness [36], kurtosis, standard deviation of bathymetry, variance, fine-scale bathymetric position index (BPI), and broad-scale bathymetric position index [37]. While a majority of the features were calculated based on a sliding window size of 3 × 3 pixels, for some of them (slope, VRM ruggedness, bathymetry standard deviation, and kurtosis), we tested a multiscale approach [38]. For these features, we applied the following scales: 3 × 3, 5 × 5, 7 × 7, and 9 × 9. The backscatter secondary features included the following: backscatter standard deviation and various kinds of grey level cooccurrence matrices (GLCMs) [39], including homogeneity, contrast, dissimilarity, entropy, angular second moment, mean, standard deviation, and correlation. The backscatter secondary features were extracted based on object-based statistics. The spatial extent of all the secondary features (of both the bathymetric and backscatter data) was almost the same within the analyzed dataset (150 kHz or 400 kHz). The sizes of the sliding windows resulted in the occurrence of no data in some parts of the area,

Feature Extraction and Selection
We extracted 70 secondary features from the bathymetric and backscatter data, 35 for each of the two analyzed frequencies (150 kHz and 400 kHz). Together with the primary datasets (bathymetry 150 kHz, bathymetry 400 kHz, backscatter 150 kHz, and backscatter 400 kHz), we had 74 parameters in total. Table 2 presents all the extracted secondary features. The bathymetry-based features included the following: slope, aspect, eastness, northness, curvature, planar curvature, profile curvature, surface area to planar area (arc-chord ratio) [35], vector ruggedness measure (VRM) ruggedness [36], kurtosis, standard deviation of bathymetry, variance, fine-scale bathymetric position index (BPI), and broad-scale bathymetric position index [37]. While a majority of the features were calculated based on a sliding window size of 3 × 3 pixels, for some of them (slope, VRM ruggedness, bathymetry standard deviation, and kurtosis), we tested a multiscale approach [38]. For these features, we applied the following scales: 3 × 3, 5 × 5, 7 × 7, and 9 × 9. The backscatter secondary features included the following: backscatter standard deviation and various kinds of grey level cooccurrence matrices (GLCMs) [39], including homogeneity, contrast, dissimilarity, entropy, angular second moment, mean, standard deviation, and correlation. The backscatter secondary features were extracted based on object-based statistics. The spatial extent of all the secondary features (of both the bathymetric and backscatter data) was almost the same within the analyzed dataset (150 kHz or 400 kHz). The sizes of the sliding windows resulted in the occurrence of no data in some parts of the area, which slightly reduced its dimensions in the case of some secondary features, but it did not affect the

Feature Extraction and Selection
We extracted 70 secondary features from the bathymetric and backscatter data, 35 for each of the two analyzed frequencies (150 kHz and 400 kHz). Together with the primary datasets (bathymetry 150 kHz, bathymetry 400 kHz, backscatter 150 kHz, and backscatter 400 kHz), we had 74 parameters in total. Table 2 presents all the extracted secondary features. The bathymetry-based features included the following: slope, aspect, eastness, northness, curvature, planar curvature, profile curvature, surface area to planar area (arc-chord ratio) [35], vector ruggedness measure (VRM) ruggedness [36], kurtosis, standard deviation of bathymetry, variance, fine-scale bathymetric position index (BPI), and broad-scale bathymetric position index [37]. While a majority of the features were calculated based on a sliding window size of 3 × 3 pixels, for some of them (slope, VRM ruggedness, bathymetry standard deviation, and kurtosis), we tested a multiscale approach [38]. For these features, we applied the following scales: 3 × 3, 5 × 5, 7 × 7, and 9 × 9. The backscatter secondary features included the following: backscatter standard deviation and various kinds of grey level cooccurrence matrices (GLCMs) [39], including homogeneity, contrast, dissimilarity, entropy, angular second moment, mean, standard deviation, and correlation. The backscatter secondary features were extracted based on object-based statistics. The spatial extent of all the secondary features (of both the bathymetric and backscatter data) was almost the same within the analyzed dataset (150 kHz or 400 kHz). The sizes of the sliding windows resulted in the occurrence of no data in some parts of the area, which slightly reduced its dimensions in the case of some secondary features, but it did not affect the Artificial structures, such as a shipwreck All the secondary features were imported (or created within the software in the case of the GLCMs) to eCognition software. The object-based statistics were extracted on the basis of the multiresolution segmentation of the backscatter intensity images with different 'scales' of segmentation (see Sections 2.4 and 3.4). The image objects were simply classified based on the point locations of the training samples (see Table A1). The mean scalar statistics of the classified objects, including all the investigated secondary features, were exported as georeferenced data.
All the secondary features were selected using the Boruta feature selection algorithm in the R software, using the 'Boruta' and 'rgdal' libraries [40,41]. Boruta is a wrapper function based on the random forest classifier, which selects the most important attributes after conducting multiple executions, evaluating performance by combining different subsets of input variables [22]. The result of the algorithm is expressed via feature importance (Z-score). The Z-score expresses a number of standard deviations between the result and the mean score. Features with the highest importance have Z-scores that are significantly higher than their shadow attributes and therefore are selected as Remote Sens. 2018, 10,1983 7 of 21 confirmed [42]. Features without a decision at the end of the analysis are marked as tentative [43]. We used the 'rgdal' library to properly import the georeferenced data to the R software [41]. Table 2. List of extracted secondary features of the bathymetric and backscatter data for each of the analyzed frequencies (150 kHz and 400 kHz). VRM-vector ruggedness measure; BPI-bathymetric position index; and GLCM-grey level co-occurrence matrix.

Image Processing and Evaluation
The backscatter mosaics created using the FMGT software were classified using image processing techniques. We evaluated pixel-based (PB) and object-based (OB) approaches. In our study, we used one unsupervised pixel-based (PB) method of image clustering-Jenks natural breaks classification. The algorithm works by maximizing the variance between the clusters and minimizing the variance within them [44]. We applied the method separately for grey-level backscatter images of different frequencies in the ArcGIS 10.4 software. According to our analysis of the ground-truth data, we computed the algorithm for the five classes of habitats.
The object-based image analysis (OBIA) of the acoustic facies was performed based on a multi-frequency, georeferenced backscatter image. We created objects in the eCognition Developer 9 software based on the multiresolution segmentation algorithm [45]. The technique creates images of objects using a bottom-up region merging method from one pixel based on a defined 'scale' of multiresolution segmentation. The merging process is based on the specific features of the relevant objects, such as their spectral properties or shapes. When the algorithm reaches the homogeneity criterion expressed by the 'scale' parameter, the fusion of neighboring objects stops [45]. Similarly, such as in other OB habitat mapping studies, we used the following multiresolution segmentation parameters: shape 0.1 and compactness 0.5 [46][47][48][49]. We tested the image objects created for the 'scales' of the multiresolution segmentation from 1 to 20 (with steps of 1).
The classifications of the image objects were performed based on a supervised approach using a few algorithms: classification and regression trees [50], support vector machines [51], random forests [52], and k-nearest neighbors. A supervised method assumes the utilization of a subset of ground-truth samples as training sites. Table A1 shows a description of all the ground-truth samples with separation for training and validation types. During the training process, the classifier computed the relationships between the image and the separated ground-truth data. The next step-application-used the inferred function to assign the unclassified areas implicitly [53].
The classification and regression tree (CART) technique generates a decision tree based on recursive partitioning. Decision trees are organized in branches and leaves (or nodes) that concentrate on similar groups of objects. Tree splitting increases the similarity within the groups until the terminal nodes are reached, and the splitting process stops [50]. The strength of the CART classifier is the easily interpreted result of the classification, which is explained as a series of questions. It does not assume any underlying relationships between the predictor and the response features. The weaknesses of the CART classifier include the need for a primary estimation of the right size of the trees and the risk of overfitting due to a large number of splits [54].
The support vector machine (SVM) is machine learning algorithm based on the support vector approach. It partly belongs to the kernel-based classification methods [51]. The kernel is responsible for the transformation of datapoints from the input space to a higher dimensional feature space. The classifier creates the finest decision boundaries (called hyperplanes) that separate the feature vectors inside this feature space [55]. The feature vectors that are nearest (in terms of distance) to a hyperplane are called support vectors. The goal of the classifier is to obtain the largest possible margin that will separate the features in the best way.
Similarly to the CART method, random forest (RF) is a classification technique based on a decision tree approach [52]. The algorithm is responsible for the generation of many simple decision trees based on a random set of variables. The classifier considers an input feature vector, classifying it with all the trees in the forest and resulting in a class with the highest number of 'votes' [56]. One of the best advantages of the RF classifier is the high level of performance that can be achieved after the evaluation of many decision trees.
The k-nearest neighbors (KNN) algorithm is one of the simplest classifiers used in this study. The algorithm classifies a certain query object based on a specified number (K) of training samples located at the direct neighbor of the query point. To measure the influence of the neighbors, the classifier calculates the Euclidean distance between the query point and each instance. The value of K has a significant impact on the classification results. A number that is too small can cause a large variance in the prediction, whereas a number that is too large may result in large model bias. Therefore, it is typically recommended to choose a small value for K but to choose one that is large enough to avoid the probability of misclassification [57].
The performance of the chosen classifiers was evaluated based on an accuracy assessment. Error matrices were calculated for each classification result with cross-tabulation performed between the generated map and the validation subset of the ground-truth samples [58]. We calculated the common accuracy assessment statistics, such as the following: user's and producer's accuracy [59,60], overall accuracy, and kappa index of agreement (KIA) [61]. We anticipated the possibility of several good results related to different methods of classification. In such a case, we combined the best results to strengthen the accuracy, similar to the approach used in a previous study [12]. The general workflow of all the steps required to generate the predictive habitat maps in this study is presented in Figure 2.

Discrimination of Ground-Truth Samples
Our analysis of the ground-truth sediment samples and the ROV video inspections distinguished 5 main habitat classes. One additional class of artificial structures, visible in Table 1 was assigned manually, so it was not considered as an input for the purposes of image processing. Figure 3A,B present the distribution of the mean backscatter intensity versus the specified habitat class for both of the frequencies used: 150 and 400 kHz. The values of the backscatter intensity were expressed as relative intensity values in the logarithmic scale in dB [6]. In general, the diagrams depicting the discrimination of the habitat classes showed a clear separation between the two groups of habitat classes. Sands and very fine sands were characterized by a low return of the acoustic signal, whereas the three remaining classes showed high backscatter. Moreover, the spread of the boxplots for the 150kHz dataset was wider than the spread of the boxplots for the 400 kHz dataset. The thinner spread of the latter suggested that it could separate the habitat classes more clearly than the 150 kHz dataset.

Discrimination of Ground-Truth Samples
Our analysis of the ground-truth sediment samples and the ROV video inspections distinguished 5 main habitat classes. One additional class of artificial structures, visible in Table 1 was assigned manually, so it was not considered as an input for the purposes of image processing. Figure 3A,B present the distribution of the mean backscatter intensity versus the specified habitat class for both of the frequencies used: 150 and 400 kHz. The values of the backscatter intensity were expressed as relative intensity values in the logarithmic scale in dB [6]. In general, the diagrams depicting the discrimination of the habitat classes showed a clear separation between the two groups of habitat classes. Sands and very fine sands were characterized by a low return of the acoustic signal, whereas the three remaining classes showed high backscatter. Moreover, the spread of the boxplots for the 150 kHz dataset was wider than the spread of the boxplots for the 400 kHz dataset. The thinner spread of the latter suggested that it could separate the habitat classes more clearly than the 150 kHz dataset.

Multibeam Echosounder Data Processing
The results of the multibeam data processing using the QPS Qimera and FMGT software are presented in Figure 4. The multi-frequency backscatter mosaic contained bands R (red) and G (green). We assigned the backscatter mosaic for the 400 kHz frequency to band R and backscatter grid of the 150 kHz frequency to band G. Figure 4B shows the location sites of the acquisition of the ground-truth samples. The bathymetric and backscatter datasets were used as a basis to compute 70 secondary features.

Multibeam Echosounder Data Processing
The results of the multibeam data processing using the QPS Qimera and FMGT software are presented in Figure 4. The multi-frequency backscatter mosaic contained bands R (red) and G (green). We assigned the backscatter mosaic for the 400 kHz frequency to band R and backscatter grid of the 150 kHz frequency to band G. Figure 4B shows the location sites of the acquisition of the groundtruth samples. The bathymetric and backscatter datasets were used as a basis to compute 70 secondary features.

Feature Selection
The Boruta feature selection algorithm was used as a basis for the supervised classifiers. For each scale of tested multiresolution segmentations, we extracted values of all the secondary features. We predicted the habitat maps using different sets of important and tentative attributes of the Boruta

Multibeam Echosounder Data Processing
The results of the multibeam data processing using the QPS Qimera and FMGT software are presented in Figure 4. The multi-frequency backscatter mosaic contained bands R (red) and G (green). We assigned the backscatter mosaic for the 400 kHz frequency to band R and backscatter grid of the 150 kHz frequency to band G. Figure 4B shows the location sites of the acquisition of the groundtruth samples. The bathymetric and backscatter datasets were used as a basis to compute 70 secondary features.

Feature Selection
The Boruta feature selection algorithm was used as a basis for the supervised classifiers. For each scale of tested multiresolution segmentations, we extracted values of all the secondary features. We predicted the habitat maps using different sets of important and tentative attributes of the Boruta

Feature Selection
The Boruta feature selection algorithm was used as a basis for the supervised classifiers. For each scale of tested multiresolution segmentations, we extracted values of all the secondary features. We predicted the habitat maps using different sets of important and tentative attributes of the Boruta results. Figure 5A,B present the boxplots of the application of the feature selection algorithm for the best results in this study. For multiresolution segmentation scale 5, Boruta confirmed the importance of three features: backscatter 400 kHz, backscatter 150 kHz, and curvature 400 kHz. Three additional tentative features were suggested: bathymetry 150 kHz, GLCM homogeneity 400 kHz, and bathymetry 400 kHz. For multiresolution segmentation scale 10, the Boruta feature selection technique confirmed the importance of backscatter 400 kHz, backscatter 150 kHz, and bathymetry 400 kHz. It suggested three additional tentative attributes: slope 400 kHz, GLCM entropy 150 kHz, and the standard deviation of bathymetry 400 kHz created with sliding window size 9. Other secondary features were not relevant, so they were left for further analysis.
results. Figure 5A,B present the boxplots of the application of the feature selection algorithm for the best results in this study. For multiresolution segmentation scale 5, Boruta confirmed the importance of three features: backscatter 400 kHz, backscatter 150 kHz, and curvature 400 kHz. Three additional tentative features were suggested: bathymetry 150 kHz, GLCM homogeneity 400 kHz, and bathymetry 400 kHz. For multiresolution segmentation scale 10, the Boruta feature selection technique confirmed the importance of backscatter 400 kHz, backscatter 150 kHz, and bathymetry 400 kHz. It suggested three additional tentative attributes: slope 400 kHz, GLCM entropy 150 kHz, and the standard deviation of bathymetry 400 kHz created with sliding window size 9. Other secondary features were not relevant, so they were left for further analysis.

Image Processing
As described above, object-based image analysis was performed based on multiresolution segmentation of different scales, from 1 to 20. The best classification results were obtained for image

Image Processing
As described above, object-based image analysis was performed based on multiresolution segmentation of different scales, from 1 to 20. The best classification results were obtained for image objects scales 5 and 10. For scale 5, the highest performance used the k-nearest neighbors classifier with K = 1, after applying six selected secondary features (with tentative attributes). Another best classification result was calculated using the random forest classifier for multiresolution segmentation scale 10. In this example, a set of four secondary features was applied: backscatter 400 kHz, backscatter 150 kHz, bathymetry 400 kHz, and slope 400 kHz.
Independently of the object-based image analysis, we performed pixel-based image processing using Jenks natural breaks clustering algorithm with unsupervised separation of single-frequency backscatter intensity images (150 kHz and 400 kHz) for the five classes. The results of all the applied methods of classification are presented in Figure 6. A visual inspection of the generated results supported by knowledge of the ground-truth samples allowed us to determine that both of the pixel-based results ( Figure 6B,C) had difficulties with separating between the very fine sand (VFS) and sand (S) classes. These difficulties were visible in the pixel-based results as large noise in the areas with low backscatter return (the darkest areas in corresponding Figure 6A). It may have been caused by similar and slightly overlapping distribution of the backscatter intensity between these two classes ( Figure 3). Moreover, the pixel-based results probably underestimated the class of red algae (R) and simultaneously overestimated the class of boulders (B). This was visible especially in the results of the Jenks natural breaks algorithm for the 150 kHz frequency, where the areas of red algae, represented in the backscatter false color composite by dark orange areas (see Figure 6A), were almost not separated from the boulder class ( Figure 6C). The separation of the sandy gravel and gravelly sand (SG_GS) class was similar in both PB results ( Figure 6B,C) and one OB result (KNN, see Figure 6D), and it was probably underestimated in the case of the random forest result ( Figure 6E). For comparison with the PB results, it seems that the performance of both of the OB classifiers was good in separating between the very fine sand (VFS) and sand (S) classes. The noise within the areas of low backscatter intensity visible in both PB results was almost absent in the OB results. Between the two OB results there was, however, visible bias in the spatial separation between the boulder (B) and sandy gravel-gravelly sand (SG_GS) classes (compare Figure 6D,E).

Accuracy Assessment of Results
The performances of all the applied approaches of image analysis were evaluated based on error matrices and accuracy assessment statistics, shown in Tables A2-A5. Both object-based results had similar statistics with an overall accuracy of 86% and a kappa index of agreement of 0.81 (Tables A4  and A5). The accuracy assessment of the pixel-based results indicated much lower statistics with an overall accuracy of 42% and a kappa index of agreement of 0.24-0.27 (Tables A2 and A3). The error matrices confirmed our visual evaluations of the classifiers for certain classes suggested in the previous section. The highest user's and producer's accuracy per class indicated that apart from the VFS class, the KNN classifier perfectly determined the SG_GS class, while the RF method ideally separated the R class in comparison with all the other results. We took advantage of these perfect separations by combining both OB results. The predictive model that combines the object-based KNN and RF algorithms is presented in Figure 6F. This model increased the overall accuracy to 93% and the KIA measurement to 0.90. The error matrix of the combined model is shown in Table A6. Despite receiving high statistics in the accuracy assessment, we should note that the small number of ground-truth samples meant that each sample represented a high kappa value. This issue should be paid close attention as a potential source of errors when comparing this study with other marine habitat mapping studies.

Discussion
Multi-frequency, multibeam echosounder data is a promising new approach in the characterization of seabed habitats. Recent research confirms that the simultaneous analysis of many frequencies leads to a better understanding of seafloor properties [62]. Although we did not use a multibeam echosounder with multispectral mode (such as the R2Sonic 2026) for our measurements, we repeated the hydroacoustic surveys with different frequencies. Similar research has been presented in [62], where the surveys were repeated with three different frequencies: 200, 400, and 600 kHz. This approach helped us to make a detailed acoustic characterization of the seabed sediments. In our case, we performed hydroacoustic research at two frequencies: 150 kHz and 400 kHz. The additional information from the ground-truth data allowed us to define the distributions of the acoustic backscatter for all the classes of habitats, which differed depending on the frequency used. All the feature selection results confirmed that attributes of both frequencies were useful to explain the variability of the analyzed data.
The Boruta feature selection algorithm has been tested in the benthic habitat mapping literature a few times, giving promising results [22,63]. Our results confirmed the usefulness of the application of this feature selection method in habitat mapping. We recommend that if the algorithm would work on statistics gathered from object-based image analysis, then the classification should be performed on the same segmentation setting.
Our study confirmed that beyond the primary features, such as backscatter and bathymetry, some other secondary features were useful, such as slope, GLCM entropy, GLCM homogeneity, and the standard deviation of bathymetry. We suggest remembering such attributes for further feature selection actions. The list of suggested secondary features is not yet finished and may include, for example, spatial autocorrelation [63]; hue, saturation, and intensity [64]; angular range analysis [65], Q-values [66]; and maximum orbital velocity [64].
The scale of multiresolution segmentation is a very important setting of OBIA, which has an impact on further analysis, including the results of the classification [45]. Up to now, at least a few benthic habitat mapping studies have included the application of different scales of multiresolution segmentation [46,49]. To estimate the parameter in a proper way, we tested many scales from 1 to 20, with a step of 1-similar to the approach in [49]-for a wider range of the parameters. The best scale was chosen for the best accuracy assessment of the evaluated classification methods. Although the investigation of the dependency between the accuracy and the multiresolution segmentation scale used was not the aim of this study, we tested 80 sets of the OB segmentation-classification results (20 scales × 4 classifiers). Our attempts confirmed that the scale of the multiresolution segmentation was imperfect, and its incorrect determination may have led to poor results in the object-based classification. Future research should take a closer look at this phenomenon and investigate the changes in accuracy depending on the scale of the multiresolution segmentation parameter.
In this study, we performed a robust object-based methodology on a relatively small test area, characterized by diverse habitat conditions with the occurrence of unique red algae. Considering the regional conditions, there are no areas with similar characteristics within the Polish coast of the southern Baltic Sea. It should be noted that in the marine habitat mapping literature, there have been studies based on similar or smaller spatial extents, such as 0.056 km 2 [48] or 0.39 km 2 [12]. Other methods of benthic habitat mapping based on object-based image analysis were previously applied in various environments and areas, from smaller areas [48] to slightly less diverse areas within the Polish coast of the southern Baltic Sea [49] to larger areas [67]. Therefore, we can state that our methodology would be scalable.
In this study, we designed a ground-truth survey to encompass the representativeness of all kinds of habitats. It is necessary to keep in mind that a set of samples that is too small can lead to a falsified accuracy result [68]. Some studies have presented results of seabed mapping after analysis of similarly small but representative numbers of ground-truth samples [10,17,42,49]. In any such case, there is a possibility of errors, for which the sources have been described in detail (e.g., [69]). Despite the relatively small number of samples, we used varied methods of sampling, including Van Veen grabs and ROV video inspections within all the sites. Thus, our ground-truth survey was designed to obtain strict and diverse knowledge of the analyzed area.
Considering the unit of analysis, the methods of classification could be separated between pixel-based (PB) and object-based (OB) methods. The utilization of ground-truth samples allowed for further division between unsupervised and supervised techniques. The Jenks natural breaks method has been applied in habitat mapping studies several times [48,70]. In comparison with similar research, we obtained poor accuracy using this classification in this study. In our pixel-based classifications, there were visible 'salt and pepper' effects caused by the noise of the input data, which was obvious in comparison with the OB approaches [71]. The reason for the poor accuracy may be related to the overlapping distribution of the backscatter intensity for the habitat classes described in Section 3.1.
Different approaches of machine learning or decision trees have been widely used in recent predictive habitat mapping (e.g., [12,22,46,48,67,72,73]). Such approaches belong to both PB and OB techniques and supervised classification methods, executing top-down strategies: "assemble first, predict later" [13]. The OB approach of supervised classifiers has been developed over the last few years in marine habitat mapping (e.g., [12,46,48,65,67]). Many of the aforementioned studies concerned evaluations of classification methods. In particular, the random forest method seems to be a promising method for the automatic classification of benthic habitats. For example, in [65], the RF method achieved an excellent result of 94% overall accuracy and a KIA of 90%. Results with 80% overall accuracy are common in marine habitat mapping when using the random forest classifier [12,22,67].
The KNN classifier has been applied much less often in marine habitat mapping studies with other well-known examples [46,48]. In these studies, the KNN classifier separated classes with an overall accuracy from 52% to 66%. Considering the KIA value (from 0.38 to 0.43), the performance of the KNN classifier in these studies can be described as fair to moderate [46]. In our study, we obtained better accuracy using this method, but possible sources of errors should be kept in mind (see Section 3.5). We recommend continuing to evaluate this method of classification in further habitat mapping studies.
The application of two frequencies of MBES measurements is very interesting from the viewpoint of marine habitat mapping. The acoustic responses of the habitats are dependent on the frequency; therefore, distinct frequencies may reveal different attributes. With two frequencies, we have a better possibility of achieving habitat discrimination. One recent study has suggested that the combination of PB and OB methods can lead to a better separation of classes, resulting in better accuracy [12]. In the aforementioned study, the application of such an approach increased the overall accuracy by 5.1% and the kappa value by 0.06 (overall accuracy-83.6%, KIA-0.78). In comparison, the combination of two OB classifiers in our study allowed us to increase the overall accuracy by 7.1% and the KIA by 0.10. Both results suggest that the combination of the best classification outcomes might be useful and promising in future marine habitat mapping studies.

Conclusions
In this study, we developed a robust workflow for predictive habitat mapping based on multi-frequency, multibeam echosounder data. For the first time, we recognized and distinguished six nearshore habitats of the Rowy area in the southern Baltic Sea. The identified habitats included very rare seascapes for the Polish coast of the Baltic Sea, encompassing species of red algae and boulder sites colonized by Mytilus Trossulus bivalves. Future research will be conducted using the same model of multibeam echosounder device but with an acoustically calibrated option regarding the backscatter strength. Therefore, the composition of the seafloor will be represented from a physical point of view, which would create new perspectives in benthic habitat mapping, such as the ability to track spatial changes of habitats over time [42].
An important part of our workflow was the feature extraction and selection. We extracted 70 secondary features of the bathymetric and backscatter data. They included either pixel-based statistics or object-based GLCM textures. Some features were calculated based on multiscale or object-based approaches. The Boruta feature selection algorithm allowed us to choose relevant attributes, which included the following (beyond bathymetry and backscatter): slope, GLCM entropy, GLCM homogeneity, and the standard deviation of bathymetry. Our results confirmed the usefulness of the application of the Boruta feature selection method in habitat mapping. The proper feature selection helped us to discriminate habitat classes with similar distributions of backscatter intensity. However, the list of secondary features is not yet complete. We suggest expanding it for other attributes and a multiscale approach.
We tested different aspects of image processing, such as pixel-based and object-based image analysis, unsupervised and supervised methods of classification, and habitat mapping based on single-frequency and multi-frequency multibeam echosounder (MBES) datasets. Our results demonstrated the great usefulness of object-based image analysis and supervised classifiers, such as the random forest and k-nearest neighbors algorithms. Because, in our case, each classifier performed better with respect to specific classes of habitats, we took advantage of the best results and combined them, obtaining very good agreement-93% overall accuracy and a 0.90 Kappa coefficient. We applied such a combination based on two object-based results. In our study, the application of the multi-frequency, MBES dataset with the proper selection of secondary features significantly increased the accuracy of the habitat maps with respect to the single-frequency results.
Our workflow encouraged us to offer some additional suggestions. We recommend taking a closer look at the scale of multiresolution segmentation in object-based marine habitat mapping studies. A particularly interesting topic is the changes in accuracy depending on the scale of multiresolution segmentation parameter. We also recommend evaluating the k-nearest neighbors method of classification in future habitat mapping studies.
The rapid development of the hydroacoustic industry will bring about the greater availability of multi-frequency, multibeam echosounder data. Our predictive habitat mapping of shallow euphotic zones creates a new scientific perspective and provides relevant data for the management of natural resources.