Results from the First Phase of the Seafloor Backscatter Processing Software Inter-Comparison Project

Abstract: Seafloor backscatter mosaics are now routinely produced from multibeam echosounder data and used in a wide range of marine applications. However, large differences (>5 dB) can often be observed between the mosaics produced by different software packages processing the same dataset. The lack of transparency in the processing pipeline and the lack of consistency between software packages raise concerns about the validity of the final results. To identify the source(s) of inconsistency between software, it is necessary to understand at which stage(s) of the data processing chain the differences become substantial. To this end, willing commercial and academic software developers were invited to generate intermediate processed backscatter results from a common dataset, for cross-comparison. The first phase of the study requested intermediate processed results from two stages of the processing sequence: the one-value-per-beam level obtained after reading the raw data, and the level obtained after radiometric corrections but before compensation of the angular dependence. Both of these intermediate results showed large differences between software solutions. This study explores the possible reasons for these differences and highlights the need for collaborative efforts between software developers and their users to improve the consistency and transparency of the backscatter data processing sequence.


Selection of Intermediate Processed Backscatter Levels
A template backscatter data processing pipeline and nomenclature were recently proposed for adoption to help standardize backscatter data processing [9]. In this theoretical pipeline, the various stages of radiometric and geometric corrections are chronologically ordered, and the intermediate backscatter levels obtained between each stage are named (BL0 through to BL4), providing a sequence of intermediate results (Figure 1). However, since each software package applies these corrections in a different order, most of these specific outputs cannot be produced without significantly modifying the data processing code. For the current study, after discussion and agreement with software developers, it was concluded that a phased approach would be most effective. In this first phase, only the intermediate levels that can be provided without significantly altering the code (i.e., BL0 and BL3) were considered.

BL0: The Backscatter Level as Read in the Raw Files
The first stage of backscatter processing consists of reading the raw backscatter data recorded in the MBES raw data files. For both Kongsberg and Teledyne Reson systems, the raw data format organizes the collected information into several types of data units, known as datagrams, and the structure of each datagram type is described in format specifications made publicly available by the manufacturers [21,22]. Not only are backscatter data typically available in different datagrams, but the formats, the intermediate calculations applied, and the output resolution may have changed over the years. For example, in Kongsberg systems, the backscatter data are available in both the "one-value-per-beam" and "several-samples-per-beam" formats in two different datagrams (the "Depth" datagram for the former and the "Seabed Image" datagram for the latter). In November 2005, the "Depth" datagram was superseded by the "XYZ 88" datagram, and the "Seabed Image" datagram was superseded by the "Seabed Image 89" datagram, with both newer datagrams upgrading the data resolution from 0.5 dB to 0.1 dB [21]. Although Kongsberg released the KMALL format in 2017 [23], this format has not been widely adopted by the processing software packages and was not considered during this study. For the Reson system, datagrams containing multiple samples per beam are referred to as "snippets". With the aim of using the same raw data, software developers were requested to start the processing with the Seabed Image/Snippets data as the basis for calculating BL0.

BL3: The Backscatter Level After Radiometric Corrections but Before Compensation for Angular Dependence
Typically, several radiometric corrections are applied to the raw data (BL0) after they are extracted from the file. Schimel et al. [9] suggest the following three classifications: (i) Corrections for Gains applied during Reception (CorGR), (ii) Corrections for propagation through Water column and interaction with Seafloor (CorWS), and (iii) Corrections for Mechanical Properties of the transducer (CorMP). This is not the approach that has been historically taken in different software implementations; some software may apply all corrections in bulk, others may combine several, apply only partial corrections, or apply corrections in different orders. Therefore, this study could only request the levels before and after all radiometric corrections (BL0 and BL3, see Figure 2). BL3, the backscatter level after radiometric corrections, expressed as a function of the incidence angle, is the "angular response curve", which is one of the two backscatter outputs commonly produced. Further corrections would need to be applied to BL3 to obtain a backscatter mosaic, including the flattening of the backscatter angular dependence.

Data Processing by Software Developers
Software developers provided the results as an ASCII text file in the format requested (Table 2). One of the software packages already had some variations of the ASCII export built into its processing routine, while for the others, the ASCII export was developed as a result of this request. Software developers were given the option to include additional columns as desired. The details of the data processing, as implemented by software developers for this project, are outlined in the following sections.

CARIS SIPS Backscatter Processing Workflow
The backscatter processing implementation in CARIS SIPS is a continuation of its bathymetric processing workflow and is aimed towards creating a backscatter mosaic. SIPS supports data sources from Reson and Kongsberg systems in their three record modes: side scan (only applicable to Reson systems), beam average intensities, and snippets. Two separate backscatter processing engines are available within SIPS: Geocoder and the SIPS backscatter processing engine. As the existing SIPS workflow did not allow end-users to extract BL0 and BL3, these data were extracted by the SIPS software developers themselves. The following corrections and settings were selected: Processing Engine: SIPS; Source Data Type: Time series; Slant Range Correction, Beam Pattern Correction; Angular Variation Gains, Adaptive; AVG size filter, 200 samples. As of CARIS SIPS 11.1.3 (released March 2019), end-users can export the intermediate processing stages used by the SIPS processing engine through 'Advanced Settings', by designating a 'Corrections Text Folder' in which an ASCII file containing the results of the intermediate processing stages is stored (Figure 2).
Geosciences 2019, 9, x FOR PEER REVIEW

FMGT Backscatter Processing Workflow
The backscatter data processing in the QPS software suite is implemented in a separate toolbox: Fledermaus Geocoder Toolbox (FMGT). A notable factor in this implementation is that all the survey parameters are read directly from the survey line files. The processing parameters used included "Tx/Rx Power Gain Correction", "Apply Beam Pattern Correction", and "Keep data for ARA (Angle Range analysis)". Backscatter Range was selected based on the minimum and maximum value of backscatter from "calibrated" backscatter with beam angle cut off between 0° and 90°. Export of BL0 and BL3 data is available through the 'ASCII ARA beam detail' export (Figure 3).

SonarScope Data Processing Workflow
SonarScope is a research tool developed by the department of Underwater Acoustics at IFREMER and is available for free under an academic non-commercial use license. It is developed in Matlab as a laboratory tool aimed at research and development, rather than production, and can handle a variety of MBES formats. SonarScope implemented a new backscatter data processing methodology concurrently with this study. This updated workflow provides a detailed analysis of the various processing stages based on the sonar equation [7], exported as an HTML summary file with graphical displays of the various corrections. An ASCII output file is also produced that contains several fields describing the corrections (Figure 4).

MB Process Data Processing and SONAR2MAT Data Conversion
MB Process is a proprietary backscatter data processing tool coded in Matlab and developed and used by Curtin University Centre for Marine Science and Technology (CMST) and Geoscience Australia (GA) researchers to process Kongsberg (.all) files and Reson MBES (files saved as XTF) [24]. CMST-GA MB Process is available to download for free from https://cmst.curtin.edu.au/products/multibeam-software/ (last accessed September 2019). As this study used Reson (.s7k) files, the converter SONAR2MAT [25] was first used to convert the (.s7k) data to MATLAB (.mat) data files. The SONAR2MAT converter supports a variety of MBES data formats and is available to download for free from https://cmst.curtin.edu.au/products/sonar2mat-software/ (last accessed September 2019). The script was used to calculate the mean for each beam (i.e., BL0) from the converted snippets data packet (7028) using the samples that fall within ±5 dB of the bottom detect echo level. The corrections applied followed Parnum and Gavrilov [26], and required other converted data packets, including settings (7000), bathymetry (7027), and beam geometry (7004), to produce BL3 data. Data were then exported into the ASCII format specified in Table 2, except for beam depth.
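As an illustration of the per-beam reduction described above, the following Python sketch collapses one beam's snippet samples into a single value by keeping only the samples within ±5 dB of the bottom detect echo level. It is not the MB Process code: the function name, the argument layout, and in particular the choice between averaging in the linear intensity domain or directly in dB are assumptions, since the text does not specify them.

```python
import math


def beam_level_from_snippet(samples_db, detect_index, window_db=5.0, linear_mean=True):
    """Reduce one beam's snippet samples (assumed already in dB) to one value in dB.

    detect_index is the position of the bottom-detect sample in samples_db.
    """
    reference = samples_db[detect_index]  # echo level at the bottom detection
    # Keep only samples within +/- window_db of the bottom-detect level.
    kept = [s for s in samples_db if abs(s - reference) <= window_db]
    if linear_mean:
        # Assumption: average the intensities, then convert back to dB.
        mean_intensity = sum(10 ** (s / 10) for s in kept) / len(kept)
        return 10 * math.log10(mean_intensity)
    # Alternative assumption: arithmetic mean of the dB values.
    return sum(kept) / len(kept)
```

The two averaging options generally give different values for the same samples, which is one example of how an undocumented choice at the data reading stage can shift BL0 between packages.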

Results
The ASCII files obtained for each software differed in both format and contents. A summary of the contents of the ASCII files is provided in Appendix B (Table A2). The availability of the results on a ping/beam basis made it convenient to compare data from each software. Data inter-comparison was conducted based on the ping/beam number, BL0, BL3, and incidence angle.

Flagged Invalid Beams
The number of beams flagged as "invalid" differed between software packages (Figure 5). For the Kongsberg EM 302 data, FMGT showed almost no flagged beams, while both SIPS and SonarScope showed a large number of flagged beams. These differences were found to be related to how each software deals with soundings that have an invalid bottom detection. Kongsberg's "XYZ 88" datagram provides 'detection information' that specifies, among other things, whether the beam had a valid bottom detection. Beams with invalid bottom detections can, however, be assigned interpolated backscatter to provide continuous backscatter across all beams (see note 4, p. 44 [21]). FMGT uses the beams with invalid bottom detection, while SIPS and SonarScope use only the beams that have a valid bottom detection. For the purposes of comparison, only the beams that were considered valid by all software packages were used.
Figure 4. The SonarScope interface used to select the export of .csv and .html files that provide details of the various corrections applied to produce processed backscatter results. SonarScope version 20190702_R2017b (released 2 July 2019).
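Restricting the comparison to beams considered valid by all packages amounts to intersecting the sets of valid (ping, beam) pairs. A minimal Python sketch, assuming each package's validity flags have been loaded into a dictionary keyed by (ping, beam); the function name and data layout are hypothetical:

```python
def common_valid_beams(flags_by_software):
    """Return the (ping, beam) keys flagged valid by every software package.

    flags_by_software maps a software name to {(ping, beam): is_valid}.
    """
    valid_sets = [
        {key for key, is_valid in flags.items() if is_valid}
        for flags in flags_by_software.values()
    ]
    # A beam missing from one package's output is treated as not valid there.
    return set.intersection(*valid_sets) if valid_sets else set()
```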

Comparison of BL0 and BL3
The software provided results whose patterns were qualitatively comparable but whose relative levels were often very different (BL0 in Figure 6 and BL3 in Figure 7). The mean values of BL0 and BL3 were computed for each beam and ping and showed that the differences between the tools could be larger than 5 dB (Figure 8). It was also evident that these differences are not uniform across the swath. A pair-wise comparison revealed that the differences were more pronounced for the outer beams than for near-nadir beams (Figure 9).
Figure 5. Percentage of beams flagged as "invalid" by each software for the EM 302 data.
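The across-swath comparison can be sketched in Python as follows. This is only one reading of the procedure: it averages the levels per beam over all pings (an arithmetic mean of dB values, which is an assumption) and then differences the per-beam means for each pair of packages. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def mean_by_beam(records):
    """records: iterable of (ping, beam, level_db). Returns {beam: mean level}."""
    by_beam = defaultdict(list)
    for _ping, beam, level in records:
        by_beam[beam].append(level)
    return {beam: mean(levels) for beam, levels in by_beam.items()}


def pairwise_beam_differences(per_software):
    """per_software: {name: {beam: mean level}}.

    Returns {(a, b): {beam: level_a - level_b}} for every pair of packages,
    using only the beams present in both.
    """
    differences = {}
    for a, b in combinations(sorted(per_software), 2):
        shared = per_software[a].keys() & per_software[b].keys()
        differences[(a, b)] = {
            beam: per_software[a][beam] - per_software[b][beam]
            for beam in sorted(shared)
        }
    return differences
```

Plotting each pair's differences against beam number would show whether the offset grows towards the outer beams.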

Comparison of Reported Incidence Angles
CARIS SIPS reported positive incidence angles ranging from 0° to 80°, while FMGT and SonarScope reported incidence angles ranging from −80° to 80°, with port swath incidence angles reported as negative numbers (Figures 10 and 11). Topographically-related variations in incidence angles are clearly visible in the output of SIPS, FMGT, and SonarScope, suggesting that seafloor slope was considered while computing the seafloor incidence angle. However, slight variations in the incidence angle are noticeable and may be related to differences in the cleaning or smoothing of the DTM used to correct for seafloor slope.
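Because the packages use different sign conventions, the incidence angles have to be brought to a common convention before they can be compared beam by beam. A minimal Python sketch, assuming each package's angles are held in a dictionary keyed by (ping, beam); the function names are hypothetical:

```python
def to_unsigned(angles_deg):
    """Drop the port/starboard sign so that all angles are non-negative."""
    return {key: abs(angle) for key, angle in angles_deg.items()}


def incidence_angle_differences(angles_a, angles_b):
    """Return {(ping, beam): |a| - |b|} for beams present in both inputs."""
    a = to_unsigned(angles_a)
    b = to_unsigned(angles_b)
    return {key: a[key] - b[key] for key in sorted(a.keys() & b.keys())}
```

Residual differences after this step would then reflect the slope correction rather than the sign convention.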

Comparison of Corrections Applied for BL3 Processing
The difference between BL3 and BL0 was computed for each software solution in order to obtain the total correction factor applied in the radiometric correction stage (Figures 12 and 13). These show that each software applies very different processing corrections to arrive at BL3. In the case of SIPS, the correction appears as a pattern of along-track stripes, which points to the beam pattern correction (Section 2.3.1). In the case of FMGT, the correction is reminiscent of the incidence angle. In the case of SonarScope, the correction increases somewhat regularly away from nadir. Without knowledge of the intermediate stages between BL0 and BL3 (BL2A and BL2B; see Figure 2), these interpretations are not definitive.

Summary of Differences Between Software for Different Sonar Types
In the previous sections, the differences between SIPS, FMGT, and SonarScope processing an EM 302 data file were explored. In this section, the results of BL0, BL3, and incidence angle for other sonar types are summarized. The results show that EM 710 (Figures 14 and 15), EM 3002 (Figures 16 and 17), EM 2040 (Figures 18 and 19), and SeaBat 7125 (Figures 20 and 21) also present large differences.
The pairwise differences for both BL0 and BL3 vary considerably among processing solutions for the example files from all sonar models. The mean differences (except for the SeaBat 7125 data file) ranged from ~2 dB to ~10 dB, with standard deviations of up to 8 dB. For the SeaBat 7125 data file, the mean difference between FMGT and MB Process results was <1 dB, but the difference was ~100 dB for comparisons involving SIPS. The large discrepancy observed in SIPS results for the SeaBat 7125 data file indicates the application of a large offset while reading the snippets.

Relative Importance of Difference in Raw Data Reading (BL0) Compared to Radiometric Correction (BL3 − BL0)
The results presented above showed that software solutions provided levels that differ both at the initial raw data reading stage (BL0) and after the radiometric corrections have been applied (BL3). A few possible reasons for differences in BL0 will be discussed in Section 4.2.
The question arises as to whether the difference in the end results (BL3) is due mostly to the difference in data reading (BL0) or in the radiometric corrections applied (BL3 − BL0). To assess which of the two sources of difference contributes the most to the difference in end results, the absolute value of the ratio between the difference in radiometric correction and the difference in raw data reading (Equation (1), considering two software solutions A and B) was calculated:
$$\gamma = \left| \frac{(\mathrm{BL}_3 - \mathrm{BL}_0)_A - (\mathrm{BL}_3 - \mathrm{BL}_0)_B}{(\mathrm{BL}_0)_A - (\mathrm{BL}_0)_B} \right| \qquad (1)$$
Small γ values (tending towards zero) indicate that the difference in end results is mostly due to differences in raw data reading. Conversely, large γ values (tending towards infinity) indicate that the difference in end results is mostly due to the difference in radiometric corrections. A γ value of 1 indicates that both sources of difference contribute equally to the difference in end results. In Table 3, we report the median γ value and its interquartile range for each dataset and each pair of software that could be compared in this way.
In the case of Kongsberg systems, the median γ value is almost always less than 1, indicating that for most datasets and most software comparisons, the difference in data reading has more influence on the difference in the end results than the radiometric corrections. Only the SIPS/FMGT comparison on the EM710 dataset shows a median γ larger than 1, indicating that, in this case, differences in radiometric corrections have a (slightly) larger influence. The same analysis applied to the SeaBat 7125 data produced more varied results. MB Process and FMGT read the SeaBat 7125 raw data very similarly, leading to a very large median γ value, which confirms that the difference in end results is almost entirely due to the difference in radiometric corrections. However, SIPS reads the data differently from the other two software packages: the difference in end results seems to be mostly due to this difference in data reading when compared with FMGT (median γ of 0.71), but mostly due to the difference in radiometric processing when compared with MB Process (median γ of 2.27). Note that the interquartile ranges are often quite large, indicating that both sources of difference will need to be addressed to reach the future goal of consistent end results. Currently, however, the most important source of difference in the end results (except for the MB Process and FMGT processing of SeaBat 7125 data) is the original choice of the starting value (BL0).

Discussion
MBES backscatter data are increasingly used to provide information about the nature of the seabed, in resource management projects, to assess the potential environmental impacts of human activities on the seabed, and to monitor and manage marine habitats [8,10]. Many such projects need to merge backscatter data from several sources, which often use different data processing and analysis software packages (e.g., EU national monitoring programs in relation to the EU Marine Strategy Framework Directive [27]; Seamap Australia, a national seafloor habitat classification scheme [28]; and the Marine AREA database for Norwegian waters, MAREANO [29]). In this context, the quality control of the data and final products has important regulatory and legal implications. It is incumbent upon government agencies and scientific institutions to recognize that the software packages used to process the raw data into usable products also affect the interpretation of these products and thus should be accredited for quality level [30]. All parties involved have much to gain from developing quality control approaches for the algorithms and reaching a level of standardization sufficient to merge the products from different software packages. The comparative analysis of intermediate software results, as developed in this paper, is a first step toward processing standardization. We acknowledge that this study has been limited to the analysis of data from a selection of commonly used MBES and only a few backscatter processing packages. Future work of the BSIP/BSWG will incorporate data from a wider range of MBES and will facilitate the analysis of data logged in a variety of formats, including Kongsberg's new KMALL format, for which software manufacturers are only starting to develop stable solutions at this time. We also hope to engage more backscatter processing software packages.

Importance of Accurate, Transparent, and Consistent Software Solutions in Science
Software solutions provide critical functionality to support data acquisition, processing, analysis, and visualization for nearly all scientific disciplines, including benthic studies [31,32]. The choice of processing software is a critical decision. Software solutions may be chosen based on several criteria, including accuracy, transparency, consistency, ease of use, price, fit for the specific processing needs, computing resource requirements, and compatibility with other tools being used by an organization and its project partners. Determining the accuracy, transparency, and consistency of software solutions requires detailed testing that is beyond the scope of a single study such as the present one [33]. However, the unexplained differences between the backscatter results processed by tools that are widely used by scientists are a concern shared by end-users of backscatter data, agencies funding data acquisition and processing, and software solution providers [34,35]. Hook and Kelley [33] identified the lack of quality control, and of means of comparing software output to expected and correct results, as a critical challenge in assessing a software package. The current study compares some of the intermediate processing results of non-transparent processing chains in an attempt to highlight which parts of these processing chains differ the most. Only four software solution providers participated in this study, but it is expected that future efforts will include other software packages. One very positive development is that, through this study and the cooperation of the software manufacturers, each of the three commercial software packages that were studied (QPS, CARIS, and SonarScope) now has the functionality to export intermediate results, which will enable future end-users to assess the processing chain themselves.

Why Do Different Approaches to Reading Raw Data Exist and Which One is Correct?
The results indicate that the raw data in the form of seabed image/snippets are read differently by the various software packages to create what is termed 'beam averaged backscatter', referred to in this study as BL0. The impetus to compute a beam averaged backscatter value stems from the need to reduce the statistical uncertainty of seafloor backscatter [6,15]. Through the commercial development of MBES, different approaches have been taken for the collection and provision of backscatter data, and these differences may offer some explanation for the discrepancies found. For instance, historically, the approaches taken to compute a single representative value per beam from recorded snippets differed based on:
1. Choice of the measure of central tendency, e.g., mean or median;
2. Choice of how the backscatter samples are selected to compute a measure of central tendency, e.g., using all the samples within a beam vs. using some threshold around the bottom detect to obtain a subset of samples vs. some other variation for choosing samples;
3. Choice of the calculation method. MBES samples provided by sonar manufacturers represent backscatter strength in dB. These samples can be used directly to compute their central tendency, or they can first be converted into the linear domain before the average is calculated, with the computed average then converted back to a logarithmic scale.
For the purposes of this study, the software vendors were not required to disclose the details of their processing steps. Discussions with software developers over the course of this study indicated that this information might not be readily disclosed, because the developers are prevented by non-disclosure agreements with hardware manufacturers from openly disclosing the internal processing of the hardware. The information about the computation of BL0 that could be obtained for the various software packages during the study is summarized in Table 4. These choices produce differences in the reported results that depend on the specific dataset and on the range of the recorded backscatter values. They are the most likely reason that the BL0 values reported by the various tools differ. A recommendation to use one approach or another, based on rigorous analysis, is beyond the scope of the current study, but further investigation into this issue should be prioritized in close collaboration with hardware manufacturers as well as software developers.
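To make the effect of these choices concrete, the following Python sketch reduces one synthetic snippet to a single per-beam value under different combinations of central tendency, sample selection, and computation domain. It is an illustration under stated assumptions, not a description of any package's actual method: the function name `beam_level`, its parameters, and the sample values are hypothetical, the sample window is a simple symmetric window around an assumed bottom-detect index, and the dB values are treated as intensity-like quantities (10 log10).

```python
import numpy as np


def beam_level(samples_db, detect_index=None, half_window=None,
               statistic="mean", domain="linear"):
    """Reduce one beam's snippet samples (in dB) to a single value in dB."""
    samples = np.asarray(samples_db, dtype=float)

    # Sample selection: all samples, or a window around the bottom detect
    if detect_index is not None and half_window is not None:
        start = max(detect_index - half_window, 0)
        stop = min(detect_index + half_window + 1, samples.size)
        samples = samples[start:stop]

    # Central tendency: mean or median
    reducers = {"mean": np.mean, "median": np.median}
    if statistic not in reducers:
        raise ValueError("statistic must be 'mean' or 'median'")
    reduce_fn = reducers[statistic]

    # Computation domain: directly in dB, or in linear units and back to dB
    if domain == "db":
        return float(reduce_fn(samples))
    if domain == "linear":
        linear = 10.0 ** (samples / 10.0)   # assumes intensity-like dB values
        return float(10.0 * np.log10(reduce_fn(linear)))
    raise ValueError("domain must be 'db' or 'linear'")


# Synthetic snippet (dB) with the strongest return at the assumed bottom detect
snippet = [-40.0, -30.0, -20.0, -30.0, -40.0]

mean_db = beam_level(snippet, statistic="mean", domain="db")
mean_linear = beam_level(snippet, statistic="mean", domain="linear")
median_db = beam_level(snippet, statistic="median", domain="db")
windowed_db = beam_level(snippet, detect_index=2, half_window=1,
                         statistic="mean", domain="db")
print(mean_db, mean_linear, median_db, windowed_db)
```

For this snippet, averaging in the linear domain gives a value several dB higher than averaging the dB values directly, and restricting the samples to a window around the bottom detect shifts the result again. Differences of this size arise from the reduction step alone, before any radiometric correction is applied.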

Need for Adoption of Metadata Standards
While MBES bathymetry data has long been subject to standards of accuracy [17], quantified uncertainties [36], and validated processing sequences, MBES backscatter mosaics are often considered qualitative products. The long-standing obstacle here is the logistical complexity of calibrating MBES backscatter data, which has delayed the development and application of this data type [11]. The shift from a qualitative treatment of seafloor backscatter products such as backscatter mosaics to repeatable quantitative measurements may not be complete until feasible calibration procedures are developed, agreed upon, and routinely implemented. In the meantime, however, additional tools need to be made available to end-users to analyze the impact of their choices of parameters and algorithms in their backscatter data processing routine. Compilations of results from multisource multibeam echosounders (e.g., [37]) and from multi-frequency systems (e.g., [38]) indicate the growing demand for a consistent processing methodology. The ability to identify the reason(s) for differences in the processed results is, therefore, essential for understanding whether the differences between repeat or adjacent surveys are due to seafloor changes, acquisition differences, or merely post-processing differences. This study reinforces the need for comprehensive metadata to accompany processed results [11]. In the absence of estimates of the accuracy or uncertainty of a data product (such as those available for MBES bathymetry), metadata provide backscatter users with the minimum sufficient information to replicate the final product if necessary and to correct issues that may be discovered.
Metadata also has an essential role in providing information to end-users (e.g., a geologist interpreting seabed sediment type) who may not be actively involved in, or have an in-depth knowledge of, backscatter processing yet whose perception of the data is influenced by the data provenance from acquisition through processing. The development and implementation of a standard metadata format for backscatter data products by the community (involving sonar manufacturers, software developers, and the users of this hardware and software across industry, academia, and government organizations) should, therefore, be a priority.

Collaboration between Backscatter Stakeholders
Our study reveals that much of the difference in backscatter results can be linked to a lack of communication between end-users, sonar manufacturers, and software developers. The current state, in which the results from different software packages are not reliable, is a result of the independent evolution of methodologies without consideration of end-users' need to achieve consistent backscatter results irrespective of which software tool they use. This study has been conducted under the umbrella of the GEOHAB Backscatter Working Group (BSWG), which was organized to provide a platform for academic, commercial, and government entities to collaborate in addressing challenges in backscatter processing. Although the calls for such collaborations have been numerous [39–41], collaborations focused on a specific data type (MBES backscatter) are rare. The lessons learned from the collaboration that made the current study possible include:

• The collaboration works well if all the stakeholders can communicate. BSWG provided an effective communication platform that facilitated the discussions.
• Different entities may have different end goals in mind while collaborating on such projects. The framework of a successful collaboration depends on finding common goals. For example, in this case, the common goal was an improvement in the consistency of backscatter results, which motivated all stakeholders to agree to work closely. For other similar efforts, e.g., efforts to standardize seafloor backscatter segmentation and characterization, the identification of a common goal may not be very clear due to multiple divergent needs of end-users or the desire to protect commercial interests.
• Challenges of navigating proprietary restrictions both for multibeam echosounder software and hardware manufacturers are very real and may hamper successful collaboration between stakeholders [42].

Conclusions
The applications of seafloor backscatter data are expanding. To support such an expansion, there is a critical need for increased output consistency among the various software packages or, at least, a clear explanation for the differences among software solutions. The progress made in this study was possible because of the cooperation of the software providers. For instance, significant differences were encountered between the outputs of several popular backscatter software packages, but through collaboration, a better understanding of where these differences were introduced in the processing pipeline was achieved. This study adapted the standard pipeline and nomenclature proposed by Schimel et al. [9] to produce the results from intermediate backscatter processing stages. However, the data from these intermediate processing stages are currently not produced consistently by all the software developers. Therefore, as this approach is expanded to other software packages, the active participation of software developers will be critical to making the changes needed to export results from intermediate processing stages.
Two intermediate processing levels were assessed during this study: the level read from the raw data files (BL0) and the level after radiometric corrections but before the removal of angular dependence (BL3). Software developers applied the required changes in their processing methodologies and provided data in a beam-ping configuration, with BL0 and BL3 reported for each beam along with the incidence angle. Both BL0 and BL3 showed differences exceeding 10 dB between the software packages. The differences in BL0 indicate that closed-source software packages have adopted different approaches to reading and reducing the raw data. These differences suggest that this stage is one of the major causes of the observed differences in the final products. The observed discrepancy in BL0 calls for standardization at this early stage of backscatter processing, as well as more transparency from software providers in describing their computation choices. Critical choices in the BL0 computation that should be targeted when developing a standard include: (a) the choice of computation method for central tendency, i.e., mean or median; (b) the selection of samples used to compute BL0; and (c) the choice of linear or logarithmic domain for computation.
This study has shown the applicability and usefulness of intermediate processing stages for the inter-comparison of proprietary software without requiring the software vendors to disclose their proprietary algorithms. Hence, although the scope of this study has been limited to understanding the differences between the results of specific software packages, it adds weight to the argument that it is critical for sonar manufacturers, commercial and academic software developers, and end-users from diverse domains to work together to develop methods that can improve the consistency of backscatter processing. It is evident from this and several previous studies that accepted protocols to test and compare software processing results are desired. This study offers a first step towards the implementation of previously proposed processing protocols. As software developers start to offer the results from other intermediate processing stages, it can be envisioned that data test benches can be developed to aid end-users in evaluating the various processing options currently available in processing tools [43].
IFREMER and Curtin University and published with their permission. These comparison results were focused only on developing intermediate backscatter processing stages and did not address software usability, bathymetry processing, and other features that prospective users of these software packages may also want to consider.