1. Introduction
Multibeam echosounders (MBES) have recently become the preferred survey tool for seabed mapping. The high density of soundings, which are acquired in a swath pattern that is typically several times wider than the water depth, makes them ideal for efficient hydrographic charting [1,2]. Additionally, MBES record measurements of acoustic reflectivity, referred to as “backscatter”, which can be used to derive information regarding the nature of the seafloor [3], such as volume heterogeneity (e.g., sediment grain size, distribution, and geological layering) and interface characteristics (e.g., substrate, roughness, bedforms; [4,5]). Therefore, the MBES signal provides both qualitative and quantitative information on seafloor environmental characteristics, and it has been commonly used to delineate and map surficial geology (e.g., [6,7,8,9]). As an extension of this application, relationships between benthic species and physical seafloor characteristics have allowed backscatter to be utilized as a tool for describing benthic habitat. A growing number of studies are now using backscatter, often combined with seabed morphology information derived from bathymetric measurements, to study and map benthic species distributions and biodiversity patterns on the seafloor [10,11,12].
The interpretation of backscatter intensity is complex when compared to bathymetric measurements [5], which are relatively straightforward to derive using the geometry and transit time of the acoustic signal [1]. This is commonly recognized as a limitation of using backscatter as a proxy for seafloor geology or benthic habitat [3,10,12]. There are several confounding factors that need to be considered for backscatter intensity to represent seafloor material characteristics:
(1) The MBES electronics that are involved in transmitting and receiving the acoustic signal, and how they measure acoustic intensity [13]. This commonly varies between sonar manufacturers, between successive generations of sonars from the same manufacturer, and even between different units of the same sonar from the same manufacturer. Information on how these measurements are derived is often proprietary and unknown to the end user of the data [5].
(2) Propagation loss of the acoustic signal in the water column. This is affected by a complex interaction between temporally and geographically dynamic oceanographic parameters, such as temperature and density, and the presence of water column scatterers, such as particulate material, plankton, and nekton, which are often not fully quantified during data acquisition [13,14].
(3) Frequency-dependent interaction between the acoustic signal and substrate. The depth of substrate penetration and the reflection and scattering of the signal at the seabed depend on sediment characteristics and co-depend on grazing angle (see point 4 below), but will also change with operating frequency [13,15].
(4) The angular dependence of the backscatter response, which co-depends on substrate material characteristics. For example, a hard/rough seabed with coarse sediments scatters the acoustic signal in all directions, yielding a backscatter intensity that is largely independent of the grazing angle of individual beams across the MBES swath. However, a soft/smooth surface comprising fine sediments produces less scattering, resulting in a maximum backscatter intensity at nadir and progressively lower returns as the grazing angle decreases toward the outer beams [13,16].
Geometric and radiometric corrections that account for the factors outlined above can now be readily applied via commercial MBES post-processing software, allowing for the production of compensated backscatter mosaics for geological interpretation [17]. Alternatively, some studies have sought to preserve the angular dependence of the backscatter response to inform seafloor classification [16,18,19]. Regardless of how these challenges are handled, the goal is generally to produce backscatter values that consistently represent measurable seafloor characteristics. A single unified solution is difficult to achieve, however, owing to the large number of confounding factors that affect the calculation of backscatter intensity.
Obtaining absolute measurements of seafloor backscatter is a particularly challenging task. Ideally, backscatter measurements should be calibrated so that spatial and temporal datasets from different surveys can be combined or compared [5]. There are two types of MBES backscatter calibrations: absolute and relative. Absolute calibration should preferably be handled by the sonar manufacturer under controlled conditions, where transducer and electronic components are measured to determine frequency response, angular directivity, level linearity, and other relevant parameters for both signal transmission and reception. In practice, this is often not adequately undertaken and no quality standards are currently available for backscatter data; variability is commonly observed between MBES systems [5,20]. Relative calibration has become a practical option in the absence of absolute calibration, and a number of methods have recently been explored including the use of natural reference areas [20,21], calibration targets [22], and comparison against calibrated single-beam sonar backscatter collected simultaneously [23,24]. At present, there are no standard, widely accepted approaches to calibration. The large volume of non-calibrated legacy data will remain relative even if calibration standards are eventually adopted, posing challenges for the use of such datasets in benthic habitat and surficial geology mapping studies.
Seafloor mapping can involve combinations of separate surveys that utilize multiple sonar systems, potentially of various operating frequencies, sometimes over multiple years (e.g., [25,26]). Because standard calibration procedures have not been widely adopted, and owing to the inherent complexities of measuring backscatter return, simply combining backscatter data from separate surveys or systems for use in seabed mapping is likely to introduce large amounts of error when using automated approaches, which can ultimately impact seabed classifications or the quality of statistical models. Furthermore, multi-source backscatter datasets are becoming increasingly common with the maturation of MBES as a technology [12], and they are an invaluable resource given the high cost of acquisition. Therefore, it is necessary to develop robust methods for using multi-source backscatter for geological and biological seabed mapping, but few studies have addressed this challenge. Notable exceptions have generally employed strategies to avoid the integration and harmonization of different backscatter datasets to respect frequency-dependent differences in the sediment response for MBES [26] and side-scan [27] datasets. Such approaches are only feasible with a low number of survey datasets, or with datasets of similar extent, combined with adequate ground-truthing within each survey coverage.
Combining multiple backscatter datasets to generate a single harmonized mosaic is a potential approach for dealing with multi-source backscatter data. The overlap between surveys provides a common area at which to compare backscatter intensities and derive statistical relationships between datasets. Predictive surficial geology and habitat modelling generally rely on continuous-coverage environmental data layers to produce map products, and a single harmonized backscatter mosaic is therefore often desirable (e.g., [9,28]). An uneven distribution of ground truth over separate backscatter datasets inhibits the production of full-coverage habitat maps and enhances uncertainty in the map products [26]. Harmonizing backscatter datasets to produce a single mosaic prior to habitat mapping would ameliorate some of these difficulties. Methods to facilitate the combination of several partially overlapping backscatter datasets have been applied by Hughes Clarke et al. [25] in the Bay of Fundy, Canada, wherein offsets were applied to the values of one backscatter layer to match another. Such “bulk shift” approaches to multi-source backscatter harmonization have not otherwise been widely developed.
Here, we explore methods for harmonizing disparate backscatter datasets for use in seabed mapping. The recent implementation of multispectral MBES, which collect data at multiple frequencies simultaneously to produce multiple backscatter datasets [15,29], provides new opportunities to simulate the application of multi-source harmonization methods. In this study, we use multispectral MBES datasets to simulate the combination of backscatter data from separate surveys, wherein the error of harmonized datasets can be measured against actual values that were obtained from different years or from different operating frequencies. The goals of this paper are to:
(1) assess the feasibility of harmonizing backscatter datasets from different surveys obtained using the same MBES system and operating frequency, and also using different operating frequencies, using bulk shift methodologies at varying levels of complexity;
(2) compare several bulk shift methods for harmonizing backscatter datasets collected from different surveys using the same operating frequency, and with different operating frequencies; and,
(3) provide general recommendations for harmonizing multi-source backscatter datasets based on the findings in this paper, with an R function for implementing these methods and evaluating the results.
2. Materials and Methods
Three multispectral MBES datasets were used in this study to investigate the harmonization of relative backscatter obtained using three different operating frequencies for two different sites, and over two years: (1) Bedford Basin collected in 2016, (2) Bedford Basin in 2017, and (3) Patricia Bay collected in 2017 (Figure 1). All three datasets were collected using an R2Sonic 2026 MBES (two different MBES units were used in 2016 and 2017), operating at 100, 200, and 400 kHz frequencies simultaneously. The MBES was integrated with a Valeport sound velocity probe mounted adjacent to the sonar head, and a POS MV WaveMaster Inertial Navigation System (INS), utilizing two Trimble GPS antennas. All of the systems were integrated through QPS QINSy software installed on the acquisition PC in the wheelhouse of the survey vessel. CTD casts at the time of survey were conducted using an AML Base X2 that was fitted with a set of conductivity, temperature, and pressure probes. Data were processed using the QPS software suite with a consistent workflow for each site.
2.1. Patricia Bay, British Columbia
The physiography and surficial geology of Patricia Bay have been described previously [30]. The multispectral MBES survey covers a site in the central section of the bay (Figure 1a), ranging in depth from 20 to 72 m, deepening from northeast to southwest. The seabed along the centre of the survey, following the long axis, is expected to be muddy, flanked on both sides by patches of sand then gravel at the survey margins [30]. These patches are visible in each of the multispectral frequencies, but there seem to be greater differences in relative backscatter between sediment types at the lowest frequency (100 kHz) than the highest (400 kHz), which appears to be more homogeneous.
2.2. Bedford Basin, Nova Scotia
The two Bedford Basin datasets cover approximately the same area of seafloor at the mouth of the basin (Figure 1b), where depths range from 14 to 63 m. The physiography has been described extensively [15,31]. A shallow ridge at around 15 m depth in the southern part of the survey contains coarse, hard sediment (bedrock and gravel, with boulder-sized clasts). Depth increases to the north, where the sediments are softer and the seafloor becomes relatively flat. Visible differences between backscatter frequencies in the deeper part of the survey are believed to be a result of subsurface dredge spoil, smothered by fine sediments [15].
2.3. MBES Data Processing
Bathymetry and backscatter data quality were monitored during acquisition, and post-processing was conducted using the QPS software suite. The Fledermaus Geocoder Toolbox (FMGT) was used to process multispectral backscatter data. Absorption coefficients were calculated to correct for water column attenuation of each frequency using CTD casts at each area, and all frequency-specific corrections (e.g., pertaining to beam widths, etc.) were applied automatically in FMGT. By extracting frequency-specific pings from the R2Sonic 2026 data in FMGT, separate corrected backscatter mosaics were generated for 100, 200, and 400 kHz frequencies, which represent the relative backscatter intensity without the angular dependence of beam incidence on the seafloor. Bathymetric data were corrected for tides and manually cleaned in QPS Qimera. Each data layer was exported as a 0.5 m ASCII grid file.
2.4. Simulating Harmonization
Each study area was divided into two parts that overlapped by approximately one full survey line to simulate the combination of datasets that were acquired during separate surveys. For each study site, one of the two sections was treated as the reference area, containing the “target” backscatter dataset, while the other section was treated as the test area, containing the “shift” backscatter dataset (Figure 1). The overlap between the target and shift datasets represents the information that is normally available to the mapper for harmonizing datasets from different surveys. Because each frequency of the multispectral datasets has the same extent, the true accuracy of the correction can be determined by comparing the corrected shift layer to the portion of the target dataset that was not used to inform the correction (i.e., the target data within the test area).
Following Hughes Clarke et al. [25], we will refer broadly to the correction of entire backscatter raster datasets as “bulk shift” approaches. If several key assumptions can be accepted, it may be feasible to use the areas where MBES surveys overlap to calibrate their relative backscatter measurements, which can be used to standardize the datasets to produce a single harmonized output. First, if the goal is to generate an internally consistent relative proxy for seabed substrate, then we must accept that the backscatter measurements of each dataset are a function of the same substrate properties. We must also accept that the values obtained within each survey are internally consistent (i.e., that the same substrate produces the same backscatter response). Finally, there must be an acceptable level of temporal homogeneity between separate surveys; large amounts of natural variability throughout time will preclude any attempts at harmonization. If these assumptions are tenable, attempts can be made to adjust the relative values of one backscatter dataset to match the scale of another.
Here, the error (i.e., the difference) between the target and shift datasets from the area of overlap was treated as the response variable when testing bulk shift methods, which was preferred to treating the shift dataset as the response for several reasons. First, it facilitates the visualization of the location and magnitude of error as a function of the dataset being shifted (e.g., Figure 2), which may help considerably in conceptualizing the relationships between the datasets being combined, and therefore in selecting an appropriate bulk shift method. This can also inform on the effects that the shift will have on specific data values, which is relevant, for instance, when the full dynamic range of the shift dataset is not represented in the area of overlap. How a given bulk shift method handles extrapolation of the error at these values can have important consequences for the final harmonized mosaic. Second, the primary map product of bulk shifts that use the error between backscatter layers as the response will be the map of predicted error (i.e., the correction values) for the shift layer, which makes explicit the locations where corrections are occurring if the bulk shift method is non-uniform. This can be useful for identifying the effects that the shift had on the detection of specific features or substrate types, and for assessing limitations of the correction.
A consistent workflow was followed for each bulk shift simulation consisting of five steps. Steps 1–4 can be performed outside of simulation; step 5 is the simulation evaluation:
(1) Visualization: backscatter values where the datasets overlap are plotted, and the error between datasets is plotted to observe and diagnose obvious relationships (e.g., Figure 2). Filters such as locally estimated scatterplot smoothing (LOESS) can be used as a visual aid.
(2) Model fitting: a model is fit to the error between datasets. The fit is assessed statistically and visually against plots from 1 above.
(3) Shifting: the fitted model is used to predict the error across the shift dataset. The prediction is mapped, visually assessed, and added to the shift dataset to obtain a corrected layer with respect to the target.
(4) Mosaicking: the corrected shift layer is mosaicked with the target to produce a single harmonized mosaic.
(5) Evaluation: the corrected shift layer values are compared to the portion of the target dataset withheld for testing to produce the test statistics. The test error is also mapped and visually assessed to aid in determining the quality and limitations of the bulk shift. The harmonized mosaic is visually compared to the full original target dataset.
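The model fitting, shifting, and mosaicking steps can be illustrated with a minimal Python sketch. This is not the R function provided in Supplementary Code S1; the function names (`fit_error_model`, `apply_bulk_shift`, `mosaic`) and the toy dB values are hypothetical. The sketch assumes that error is defined as target minus shift, that a simple linear regression stands in for whichever model is selected, and that the target takes priority wherever both layers have data.

```python
from statistics import mean


def fit_error_model(shift_db, target_db):
    """Fit error = intercept + slope * shift by ordinary least squares.

    Both inputs hold co-located cell values from the area of overlap.
    Error is assumed here to mean target minus shift.
    """
    error = [t - s for s, t in zip(shift_db, target_db)]
    x_bar, e_bar = mean(shift_db), mean(error)
    sxx = sum((x - x_bar) ** 2 for x in shift_db)
    sxy = sum((x - x_bar) * (e - e_bar) for x, e in zip(shift_db, error))
    slope = sxy / sxx
    intercept = e_bar - slope * x_bar
    return intercept, slope


def apply_bulk_shift(shift_db, model):
    """Predict the error for every shift cell and add it to that cell."""
    intercept, slope = model
    return [x + intercept + slope * x for x in shift_db]


def mosaic(target_db, corrected_db):
    """Keep the target where it has data (not None), else the corrected shift."""
    return [t if t is not None else c for t, c in zip(target_db, corrected_db)]


# Toy overlap values in dB (hypothetical).
overlap_shift = [-30.0, -25.0, -20.0, -15.0]
overlap_target = [-27.0, -23.0, -19.0, -15.0]
model = fit_error_model(overlap_shift, overlap_target)
corrected = apply_bulk_shift(overlap_shift, model)
```

In this toy case the error falls linearly from 3 dB to 0 dB across the overlap, so the fitted line reproduces the target values; real data will leave residual error that should be inspected as in the visualization step.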
An R function is included (Supplementary Code S1) that automates the harmonization of two backscatter raster datasets using this process (steps 1–4), also returning the diagnostics that are presented in this study. The function allows for the use of all modelling methods presented here, facilitates alternative modelling approaches, and includes options for tuning model parameters. Additionally, it supports any number of covariates, subsampling for large datasets, and optionally returns diagnostic two-dimensional (2D) and interactive three-dimensional (3D) plots. Details and guidelines for harmonizing backscatter datasets using this function are provided in Supplementary Document S1 as a tutorial.
Modelling methods spanning a range of complexities were tested for each simulated bulk shift (step 2 above). The original 0.5 m backscatter grids were resampled to 1 m resolution (using a mean aggregation) to reduce the sizes of raster files. The area of overlap between the datasets still contained ~180,000 and ~80,000 cells for the Bedford Basin and Patricia Bay datasets, respectively. A random sample of 10,000 cells from each overlap was used for statistical modelling to reduce computation time. At the simplest, the mean of the error between the two layers was added to the shift layer (i.e., an intercept-only model), which assumes that the difference between backscatter responses is constant across the dynamic range of the dataset. Simple and multiple linear regression (SLR and MLR, respectively) were used to test for non-uniform error across the range of values being corrected in the shift dataset. Other regression approaches that can accommodate non-linearity or polynomials might also be used to model non-linear error (see Appendix A), but these may require tuning for each application. Boosted regression trees (BRTs) were used for their flexibility in modelling non-linear errors, and hyperparameters were held constant throughout all tests.
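The data preparation and the simplest of these models can be sketched in Python as follows. The function names are hypothetical, no-data cells are assumed to be stored as `None`, and the aggregation averages the gridded values directly as they are stored, which is an assumption about how the mean aggregation was performed.

```python
import random
from statistics import mean


def aggregate_mean(grid, factor=2):
    """Average factor x factor blocks of a 2D grid (list of rows).

    With factor=2 this turns a 0.5 m grid into a 1 m grid. No-data cells
    (None) are skipped; a block with no data stays None.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [grid[r + i][c + j]
                     for i in range(factor) for j in range(factor)
                     if grid[r + i][c + j] is not None]
            out_row.append(mean(block) if block else None)
        out.append(out_row)
    return out


def sample_cells(pairs, n=10000, seed=0):
    """Randomly sample up to n (shift, target) cell pairs from the overlap."""
    if len(pairs) <= n:
        return pairs
    return random.Random(seed).sample(pairs, n)


def mean_shift(pairs):
    """Intercept-only model: the mean of target minus shift over the sample."""
    return mean(t - s for s, t in pairs)
```

The value returned by `mean_shift` would be added to every cell of the shift layer, which is only appropriate if the difference between the layers is constant across their dynamic range.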
Exploratory analysis suggested that, although error between backscatter datasets was sometimes monotonically related to the dB level of the shift dataset (e.g., Figure 2), this was not always the case (e.g., Figure 3a). This suggested that error might not always be predicted by backscatter alone, and that other variables might influence the discrepancy between datasets. For example, the error between 400 and 200 kHz datasets from Patricia Bay was also linearly related to bathymetry (Figure 3b), and this relationship appeared to be stronger than that of the backscatter datasets alone (Figure 3c).
Therefore, bathymetry was also tested to explain error between backscatter datasets. It was implemented as an independent predictor in SLR, as a covariate in MLR, and as a covariate in BRTs. To test and control for potential overfitting, the addition of bathymetry to BRT models was also tested without interaction by first modelling the relationship between bathymetry and error using SLR, and then modelling the residual error using BRTs to produce an additive model. Other morphometric variables expected to influence backscatter response might also be included (e.g., roughness, slope; but see [32]), depending on the study area, MBES system, and application.
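How bathymetry might enter as a covariate in MLR can be illustrated with a small least-squares sketch in Python. The helper names (`solve_linear`, `fit_mlr`, `predict_error`) are hypothetical, the covariate order (shift backscatter, then depth) is an assumption, and the normal equations are solved directly for brevity; this is an illustration of the idea rather than the implementation used in the study.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mlr(covariates, error):
    """Least-squares fit of error on an intercept plus the covariates.

    covariates: one row per sampled cell, e.g. (shift_db, depth_m).
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    design = [[1.0] + list(row) for row in covariates]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * e for d, e in zip(design, error)) for i in range(p)]
    return solve_linear(xtx, xty)


def predict_error(coef, row):
    """Predicted error (the correction value) for one cell's covariates."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))
```

Passing rows that contain only the depth value gives the SLR case in which bathymetry is the independent predictor. Collinear covariates make the system singular and will raise a `ZeroDivisionError` in this sketch.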
Table 1 summarizes the methods tested here.
Backscatter values from each operating frequency differed between the 2016 and 2017 Bedford Basin datasets (Figure 4a), which were collected using two separate R2Sonic 2026 units. The first simulations attempted to harmonize 100, 200, and 400 kHz datasets that were acquired by surveys from separate years in the Bedford Basin using each of the methods in Table 1. Similarly, data that were collected during the same survey, but using different frequencies, also differed (Figure 4b). The harmonization of datasets of different frequency from the same year was simulated for every combination of datasets for both the Bedford Basin and Patricia Bay datasets using each method in Table 1.
Table 2 summarizes the bulk shift simulations.
2.5. Analysis and Comparison
Statistical and visual analyses were conducted for each bulk shift simulation. The first goal was to assess the feasibility of harmonizing backscatter datasets from different surveys that were obtained using the same MBES system and operating frequency, and to assess the feasibility of harmonizing backscatter datasets that were obtained using different operating frequencies. The second was to compare several methods for harmonizing these datasets to determine whether some are consistently robust, and are generally preferable.
Two statistics were used to compare the relative quality of the bulk shift methods. The mean absolute error (MAE) between the corrected shift layer and the target layer in the test area (including the overlap; Figure 1) was calculated to measure the average error of the bulk shift over all of the raster cells. This describes the error between the shifted and target layers after correction but provides no information on the distributions of the data, which has implications for the quality of the final mosaic. The two-sample Kolmogorov–Smirnov (K-S) statistic, D, measures the difference between cumulative distribution functions (CDFs) for continuous variables (e.g., Figure 5). D was calculated to estimate how closely distributions of the shifted datasets matched the target, where smaller values indicate greater similarity.
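Both statistics are simple to compute from paired or pooled cell values. The following Python sketch uses hypothetical function names; D is taken as the largest absolute difference between the two empirical CDFs, evaluated at every observed value.

```python
from bisect import bisect_right


def mean_absolute_error(corrected, target):
    """Mean absolute difference between co-located cell values."""
    return sum(abs(c - t) for c, t in zip(corrected, target)) / len(target)


def ks_statistic(a, b):
    """Two-sample K-S statistic D between samples a and b.

    The empirical CDFs are step functions that change only at observed
    values, so checking every observed value finds the largest gap.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for v in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, v) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, v) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

Unlike the MAE, D does not require the two samples to be paired cell by cell, so it compares the shapes of the distributions rather than the per-cell agreement.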
In practice, the quality of a bulk shift will often be evaluated on the “fitted” data, where surveys overlap, yet it is important that those statistics also be representative of the full dataset, outside the area of overlap. Fitting a model that is too specific can cause “overfitting”, producing over-optimistic evaluation statistics when compared to its actual performance, and hindering its transferability to new data (in this case, the full backscatter dataset being corrected). The differences between the MAE and D calculated on the withheld test data and those calculated on the fitted data at the area of overlap (test minus fitted) were quantified to determine whether the fitted statistics of a given bulk shift method are representative of its actual performance, or if the method tends to overfit. These differences are denoted θ, wherein a positive value indicates overfitting (the fitted statistics seem better than the test), and a zero or negative value indicates accurate or conservative estimates of performance. Ideally, θ should be zero or negative, indicating that a given model and its fitted evaluation statistics are transferable to the full dataset.
Each bulk shift method (Table 1) was ranked according to the statistical criteria to facilitate their comparison. The ranks for MAE and D were assigned based on the average rank relative to the other methods across all simulations for (i) harmonization of the Bedford Basin datasets between years with the same operating frequency, and (ii) the harmonization of Bedford Basin and Patricia Bay datasets of different operating frequencies. The differences between evaluation statistics of the fitted and test data (θ) were used to assign ranks for overfitting, describing how well the fitted statistics represented the quality of the bulk shifts as compared to the target dataset in the test area. If, after correction, the value of the test statistic was less than or equal to that of the fitted, then the estimate was considered to be conservative, and it was assigned the best rank (tied with any other method demonstrating conservative estimates). The ranks for θ of each statistic were averaged across all simulations for comparison.
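The overfitting measure and the ranking scheme can be sketched in Python as follows. The function names and example values are hypothetical, θ is taken as the test statistic minus the fitted statistic, and ties are handled with competition-style ranks (methods after a tie skip ahead); the tie-handling rule is an assumption, since only the shared best rank for conservative estimates is specified.

```python
def theta(test_stat, fitted_stat):
    """Positive when the fitted statistic looks better (lower) than the test."""
    return test_stat - fitted_stat


def rank_lower_is_better(values):
    """Rank methods within one simulation (1 = best); equal values tie."""
    return {m: 1 + sum(other < v for other in values.values())
            for m, v in values.items()}


def rank_overfitting(thetas):
    """Rank by theta, with every conservative method (theta <= 0) tied for best."""
    clipped = {m: max(t, 0.0) for m, t in thetas.items()}
    return rank_lower_is_better(clipped)


def average_ranks(rank_tables):
    """Average each method's rank across simulations."""
    methods = rank_tables[0].keys()
    return {m: sum(t[m] for t in rank_tables) / len(rank_tables)
            for m in methods}


# Hypothetical theta values for three methods in one simulation.
example = rank_overfitting({"mean": -0.1, "MLR": 0.0, "BRT": 0.4})
```

Clipping θ at zero before ranking is what gives all conservative methods the same best rank, regardless of how far below zero their θ falls.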
The quality of each bulk shift harmonization was also visually assessed. The fit of each model to the backscatter error was visualized using 2D plots for bivariate models (i.e., only backscatter or bathymetry) and 3D plots for multivariate models to assist in understanding how the model corrections affect the data being shifted. The distribution of test error for the shifted layer was mapped by comparing it to the withheld portion of the target layer, and the final harmonized mosaic was visually compared to the original target layer (see Appendix C).
4. Discussion
It is common for seabed mappers to inherit uncalibrated MBES data from several surveys, often in a processed form and without raw data. Ideally, backscatter acquisition should be calibrated to facilitate downstream comparison (e.g., using calibration targets or natural reference areas; [20,22]), yet it is impossible to ensure that these practices are universally adopted. The need to establish methods for efficiently handling uncalibrated multi-source backscatter datasets will persist even if calibration is eventually widely adopted, given the large amount of pre-existing MBES data. In a geological or biological mapping context, independent analysis of the datasets and post hoc combination of results is an effective solution when the number of backscatter datasets is relatively low, and each has been adequately ground-truthed [26], but this becomes problematic when ground-truth locations are unevenly or sparsely distributed over the backscatter layers. Multi-source backscatter harmonization can address some of these issues, but it has previously been performed ad hoc (e.g., [9,25,28]). Here, we present the first investigation, to our knowledge, into standardized and repeatable methods for multi-source backscatter harmonization for seabed mapping. Although we focus on application to MBES data here, these methods are potentially applicable to backscatter data of other sources, such as side-scan sonar, including interferometric and synthetic aperture systems.
The results suggest that the harmonization of backscatter data acquired by a single MBES system during separate surveys is generally feasible using conservative bulk shift approaches. Here, the accuracies of bulk shifts using multiple linear regression were at least comparable, and often superior, to the most complex methods tested (BRTs). Furthermore, multiple linear regression largely avoided issues associated with model overfitting; the fitted model statistics were generally representative of the quality of the bulk shift in the test area. More flexible and complex models sometimes performed well, but nearly always overfit the data, producing fitted evaluation statistics that were over-optimistic regarding the error and distributions of the shifted data. The backscatter datasets used for these simulations were collected one year apart; greater temporal variation might be expected between legacy datasets that comprise longer time periods as a function of changes to the surficial geology of the seafloor.
The results suggest that harmonizing backscatter datasets of multiple frequencies is also feasible, with the caveat that success will partly depend on the extent to which the frequencies differ. In our results, harmonized mosaics were generally of higher quality when frequencies were more similar. Mosaics comprising datasets of different frequency produced using multiple regression were comparable, and sometimes superior, to those produced using the same frequency but from separate surveys (e.g., 200 to 400 kHz in Figure 9 vs. 400 kHz in 2017 to 2016 in Figure 6). Conversely, harmonized mosaics from 100 and 400 kHz frequencies were never as good as those produced using the same frequency from different surveys.
The frequency-dependent response of the seafloor is potentially the greatest challenge when attempting to harmonize backscatter datasets that were acquired using multiple MBES operating frequencies. It is well-accepted that the acoustic response of the seafloor is a function of the operating frequency of the MBES system [33,34,35,36,37]. Frequency-dependent interactions between an acoustic signal and the seafloor (e.g., penetration depth) produce specific backscatter responses, which may be partially or fully lost at different frequencies when applying harmonization approaches [25]. We may never expect to fully match the spectral detail between datasets of disparate operating frequency at a high resolution; these simulations suggest that the feasibility of harmonizing datasets of different frequency depends on the extent to which the frequencies differ, and the site-specific characteristics of the surficial substrata (e.g., stratification within the acoustic penetration depth, as observed at the Bedford Basin; see [15]).
Relatedly, we expect data resolution to be an important factor in determining the feasibility of harmonizing multi-source backscatter datasets. MBES bathymetric and topographic error is partially dependent on spatial scale [38], and we expect the same to be true for backscatter. The expected result of coarsening backscatter data resolution is a reduction in both natural variability (i.e., “noise”) and the ability to resolve fine scale seabed heterogeneity, both of which may facilitate harmonization. Different operating frequencies may detect unique fine-scale features at a high resolution, but they may describe broader, more general substrate trends at coarser resolutions. This is a form of scale dependence that might be exploited to render multi-frequency datasets comparable, depending on the intended use of the backscatter mosaic.
The qualities of the bulk shift methods examined in this study were partly evaluated on the representativeness of the fitted evaluation statistics compared to actual performance (i.e., “overfitting”). Normally, the fitted statistics from where backscatter datasets overlap would be the only information available to the operator. Therefore, it is important to determine how well these represent the actual quality of the correction, and if a given method is prone to overfitting. In some instances, it might be possible to design an independent assessment, but it would need to be spatially explicit (e.g., spatial blocking; [39]), since the large amount of spatial autocorrelation within a continuously-sampled dataset hinders other common assessment techniques, such as cross-validation. An adequate blocking design might be difficult to achieve when the overlap between datasets is minimal, or where large spatial blocks would reduce the range of sampled values. The latter could result in sub-optimal bulk shift corrections where there are non-uniform backscatter differences in environmental, and by association, geographic space.
Visual assessment remains an important component of determining the quality of multi-source backscatter mosaics, and the operator should be attentive to several indicators of poor harmonization. Visual artefacts in the final mosaic along the boundaries between datasets are the most obvious sign that they have been poorly integrated. These may occur at only certain areas of the boundary, which likely means that the difference between backscatter datasets is non-uniform across either the dynamic range of the datasets or other environmental variables. Potential solutions are to explore the integration of auxiliary variables, such as bathymetry, or to adopt a non-linear modelling method, although the results here suggest that the latter should be approached cautiously. Apparent differences in spectral qualities, such as the dynamic range between sections of the mosaic, should be scrutinized. A “washed out” effect in the corrected portion of the mosaic (e.g.,
Figure 11c) might suggest that the shifted dataset had a greater dynamic range than the target in the area where they overlapped. Our results suggest that this effect is more likely to occur with flexible modelling methods, which should be avoided in such cases.
Covariates other than the backscatter values of the dataset being corrected may improve the quality of bulk shifts in some cases. The degree to which bathymetry explained the error between backscatter datasets at both sites in this study was surprising. In all cases, absolute backscatter error increased linearly with depth for datasets of different operating frequency (
Figure 13). At both study sites, the slope of the linear relationship corresponded to the magnitude of difference between operating frequencies (i.e., a steeper slope for 100 to 400 kHz shifts than for 200 to 400 kHz shifts). It was particularly interesting that the error between 100 and 200 kHz datasets was near zero at depths <20 m for both sites, but increased substantially with depth. Initially, it seemed likely that this was caused by a correlation between depth and substrate type (and frequency-dependent response), but the gradation and linearity of the relationship motivated an alternative explanation. The linearity was highly similar for each combination of frequencies at both sites and persisted even when a consistent relationship between backscatter and depth was lacking, suggesting that it was not caused solely by a correlation of substrate with depth. The sign of the slope for this relationship corresponded to the order in which frequencies were harmonized at both sites. Where a lower frequency was corrected to a higher one, greater positive error with depth was observed; where a higher frequency was corrected to a lower one, greater negative error with depth was observed. This suggests that the corrected relative intensity of higher frequencies decreased at a greater rate than that of the lower frequencies with increasing depth.
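The kind of relationship described above can be checked with an ordinary least-squares line through paired depth and error values. The sketch below is illustrative only: the function name `fit_line` and its inputs are assumptions, and it is not the analysis code used to produce the figure.

```python
def fit_line(depth_m, error_db):
    """Ordinary least-squares fit of error_db = intercept + slope * depth_m.

    Returns (slope, intercept). The sign of the slope indicates whether
    the error becomes more positive or more negative with depth.
    """
    n = len(depth_m)
    mean_d = sum(depth_m) / n
    mean_e = sum(error_db) / n
    sxx = sum((d - mean_d) ** 2 for d in depth_m)
    sxy = sum((d - mean_d) * (e - mean_e) for d, e in zip(depth_m, error_db))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_d
    return slope, intercept
```

Comparing the fitted slopes between frequency pairs (e.g., 100 to 400 kHz versus 200 to 400 kHz) is one way to quantify how the depth dependence scales with the difference in operating frequency.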
We suggest two potential explanations for the linear change of backscatter error (i.e., difference) with depth at these sites. The simplest conclusion would be that the calculation of transmission loss (TL) was systematically incorrect because of errors in the absorption coefficient calculation; this is possible, for example, if the frequency-dependent decrease in attenuation with depth was modelled incorrectly. There is some evidence to support this theory, as there were also consistent linear increases in positive error with depth between datasets of the same frequency collected over multiple years at the Bedford Basin. In other words, the intensity of the 2017 data became relatively weaker than that of the 2016 data with increasing depth, for all frequencies. The magnitude of increase in error with depth between frequencies would be surprising, though, at a depth range of only ~50 m if it was due to a depth-attenuation calculation error (see [
1]). Alternatively, the increase of error with depth might be at least partially characteristic of how the signal from the R2Sonic 2026 interfaces with the substrate, given the depth, for each frequency. For example, the increase in error with depth might be a function of frequency-dependent substrate penetration and bottom loss (i.e., sediment attenuation [
33]). As depth increases, the intensity of an acoustic signal attenuates due to absorption throughout the water column, but at a greater rate for higher frequencies [
1]. This is accounted for in post-processing to produce backscatter values that are depth independent. However, a factor that cannot be easily accounted for is how bottom loss (L; [
36]) might co-depend on frequency and depth, given decreased energy at the seafloor due to water column attenuation. This problem becomes increasingly complex when considering the acoustic properties of different substrate types, and the potential for differences in acquisition parameters. Regardless, it seems that the linear relationship between depth and error describes a consistent difference in how each frequency represents the substrate, given water depth.
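The first explanation can be made concrete with the commonly used two-way transmission loss term, TL = 40 log10(R) + 2αR, where R is the slant range and α is the absorption coefficient. The sketch below assumes spherical spreading and a constant α along the path; the function names and any values passed to them are hypothetical and are not taken from the surveys described here. Under these assumptions, an error in α produces a backscatter bias that grows linearly with range, which is the kind of pattern that a miscalculated absorption coefficient would be expected to leave.

```python
import math


def two_way_tl(range_m, alpha_db_per_km):
    """Two-way transmission loss in dB for slant range R (metres).

    Assumes spherical spreading (40 log10 R) and a constant absorption
    coefficient alpha (dB/km) over the whole path.
    """
    spreading = 40.0 * math.log10(range_m)
    absorption = 2.0 * (alpha_db_per_km / 1000.0) * range_m
    return spreading + absorption


def tl_bias(range_m, alpha_error_db_per_km):
    """Bias in compensated backscatter (dB) caused by an error in alpha.

    The spreading term cancels, leaving a bias that is linear in range.
    """
    return 2.0 * (alpha_error_db_per_km / 1000.0) * range_m
```

Because the bias scales with 2R, evaluating `tl_bias` at the maximum slant range of a survey gives a quick sense of how large an absorption error would need to be to explain an observed trend.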
From a survey at the International Marine Geological and Biological Habitat Mapping conference (GeoHab) in 2014, Lucieer et al. [
12] identified “lack of calibration required for optimizing backscatter data”, “the lack of standardized methods available for referencing”, and “the ongoing struggles with large data volumes” as the top concerns among backscatter end-users. Therefore, the work presented here is part of a larger effort to develop repeatable and standardized approaches for the acquisition and use of backscatter data for seabed biological and geological mapping (e.g., [
13,
40]). This effort has become increasingly urgent given the proliferation of MBES as a seabed mapping tool and the importance of backscatter as a proxy for substrate properties. Though the principles of underwater acoustics and sonar for seafloor mapping are well established [
1,
13,
36], the rapid development of MBES technology and its widespread uptake by a diversity of disciplines has created a knowledge gap regarding how to best implement acoustic data for biological and geological applications [
12]. Exciting new developments such as multispectral MBES continue to advance the power of acoustics for these applications, yet the advance of knowledge regarding how to utilize these technologies appropriately needs to keep pace. This is a great challenge, but its importance cannot be overstated given the urgent need for spatial information on marine ecosystems.
5. Conclusions
Our findings suggest that relatively straightforward statistical methods were effective at harmonizing backscatter datasets of the same frequency from separate surveys. We suggest that simple parametric approaches may often be preferable to more flexible ones based on the error and distributional statistics presented here, and the quality of the simulated mosaics. Multiple linear regression facilitates interpretation of multiple predictors while retaining the distribution of the data, largely avoiding issues such as dynamic range compression and overfitting. More flexible approaches in this study produced unnecessarily complex corrections that altered the distribution of backscatter values in the dataset being shifted, to the detriment of mosaic quality. It is possible that flexible machine learning approaches, such as BRTs, may be better suited to instances of extensive overlap between datasets, where the entire dynamic range of the backscatter dataset undergoing correction has also been covered by the target layer. This has not been tested in detail, and requires further investigation.
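To illustrate the general form of a multiple linear regression bulk shift with bathymetry as an additional predictor, a minimal sketch follows. It fits target = b0 + b1 × source + b2 × depth to overlapping samples by solving the normal equations, using only the Python standard library. The function names and the choice of predictors are assumptions for this example; it is not the implementation used in this study.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_bulk_shift(source_db, depth_m, target_db):
    """Least-squares coefficients [b0, b1, b2] from overlapping samples."""
    rows = [[1.0, s, d] for s, d in zip(source_db, depth_m)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, target_db)) for i in range(3)]
    return solve(xtx, xty)


def apply_bulk_shift(coefs, source_db, depth_m):
    """Predict harmonized backscatter for the dataset being shifted."""
    b0, b1, b2 = coefs
    return [b0 + b1 * s + b2 * d for s, d in zip(source_db, depth_m)]
```

Because the correction is linear in the source backscatter, a slope b1 close to 1 leaves the spread of the shifted data largely unchanged, which makes departures from 1 a simple diagnostic of dynamic range compression or expansion.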
Harmonizing backscatter datasets of different frequency is feasible, but whether it is appropriate depends on the magnitude of difference between frequencies. Therefore, critical to the harmonization procedure is the operator’s ability to assess the mosaic quality, and the tools available for this task are the fitted model statistics and the mosaic map. Although not explored in detail here, we also expect the data resolution to be an important factor in determining whether multi-frequency harmonization is feasible. Here, flexible modelling methods sometimes produced accurate mosaics using datasets of different frequency, but the fitted statistics were generally over-optimistic. Based on these findings, we advise caution in applying highly flexible non-parametric models for backscatter harmonization, especially when the overlap between datasets is limited. Again, simpler regression methods can largely avoid issues that are associated with overfitting. Visual analysis of the harmonized mosaic and the data distributions can serve as indicators of the quality of the harmonized product, alongside analysis of the fitted statistics.
It would be highly beneficial to the harmonization procedure if the acquisition of new backscatter data within an area of previous surveys overlapped a representative proportion of benthic conditions that were previously mapped. This could be efficiently accomplished by ensuring that the new survey covers at least the entire backscatter dynamic range of the previous survey, and ideally also the bathymetric range. We suggest that this is more efficient than maximizing the size of the overlapping area for harmonization. Although both are desirable, the former should be prioritized if necessary.
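One simple way to check this recommendation during survey planning or processing is to compute how much of the previous survey's backscatter interval is spanned by values in the overlap area. The sketch below is a hypothetical helper, not a method from this study; it uses the minimum and maximum values, which are sensitive to outliers, so percentile limits may be a more robust choice in practice. The same function could be applied to depth values to check coverage of the bathymetric range.

```python
def range_coverage(previous_db, overlap_db):
    """Fraction of the previous survey's [min, max] interval spanned by the overlap.

    Returns a value between 0.0 (no shared interval) and 1.0 (the overlap
    spans the full range of the previous survey).
    """
    prev_lo, prev_hi = min(previous_db), max(previous_db)
    lo = max(prev_lo, min(overlap_db))
    hi = min(prev_hi, max(overlap_db))
    if prev_hi == prev_lo:
        # Degenerate case: the previous survey has a single value.
        return 1.0 if lo <= hi else 0.0
    return max(0.0, hi - lo) / (prev_hi - prev_lo)
```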