A Multi-Objective Geoacoustic Inversion of Modal-Dispersion and Waveform Envelope Data Based on Wasserstein Metric

: The inversion of acoustic ﬁeld data to estimate geoacoustic parameters has been a prominent research focus in the ﬁeld of underwater acoustics for several decades. Modal-dispersion curves have been used to inverse seabed sound speed and density proﬁles, but such techniques do not account for attenuation inversion. In this study, a new approach where modal-dispersion and waveform envelope data are simultaneously inversed under a multi-objective framework is proposed. The inversion is performed using the Multi-Objective Bayesian Optimization (MOBO) method. The posterior probability densities (PPD) of the estimation results are obtained by resampling from the exploited state space using the Gibbs Sampler. In this study, the implemented MOBO approach is compared with individual inversions both from modal-dispersion curves and the waveform data. In addition, the effective use of the Wasserstein metric from optimal transport theory is explored. Then the MOBO performance is tested against two different cost functions based on the L2 norm and the Wasserstein metric, respectively. Numerical experiments are employed to evaluate the effect of different cost functions on inversion performance. It is found that the MOBO approach may have more profound advantages when applied to Wasserstein metrics. Results obtained from our study reveal that the MOBO approach exhibits reduced uncertainty in the inverse results when compared to individual inversion methods, such as modal-dispersion inversion or waveform inversion. However, it is important to note that this enhanced uncertainty reduction comes at the cost of sacriﬁcing accuracy in certain parameters other than the sediment sound speed and attenuation.


Introduction
With the advancement of sonar technology, underwater acoustic propagation modeling [1,2], underwater acoustic signal processing, and high-performance computing technology, the use of acoustic field data or extracted corresponding feature quantities to estimate seabed parameters has received extensive attention for several decades [3,4].The geoacoustic parameters and corresponding profile structures are challenging to measure directly [5].Physical sampling is point-based, and geoacoustic characteristics based on the analysis of sediment physical samples can only provide information about the specific measurement sites [6].Conducting physical sampling to determine the geoacoustic properties of large seabed areas is costly and time-consuming.In contrast, the method of remote sensing can provide a more comprehensive and efficient approach to characterization of the seabed properties over large areas.Consequently, the reversing of remotely sensed acoustic field or corresponding features is an effective route to obtaining knowledge about the seabed geoacoustic properties, particularly at low frequencies.
When a sound wave travels in shallow water environments, compression sound velocity and attenuation of the seafloor have significant effects on its propagation [7].The seafloor is generally parameterized as a layered model of density, attenuation, and Remote Sens. 2023, 15, 4893 2 of 17 sound speed properties.An important method for estimating these acoustic properties is matched-field processing (MFI) [8].To minimize uncertainties in the estimation of seabed parameters, researchers have developed signal processing techniques to derive pertinent details from acoustic field data, such as sound pressure or amplitude phase [9], waveguide invariants [10], reflection coefficient versus angle [11,12] data, etc.The inversion methods using modal amplitude ratio [13], green function [14,15], modal group speed dispersion [16] and vertical coherence [17] have been discussed by Zhou et al. [18], while the most widely used characteristic of propagating normal modes is modal group speed dispersion [19][20][21].
The multiparameter optimization is one of the most commonly used techniques in geoacoustic inversion.However, nonlinearity and ill-posedness are still bottlenecks in theoretical research and engineering applications of geoacoustic parameter estimation [22].Although traditional approaches based on quadratic regularization technique [23] have increased the generalization of the model, they normally result in a stable solution that actively ignores certain details in favor of streamlined results.The method based on modal dispersion curves (DCs) and modal amplitude effectively obtains local geoacoustic parameters in shallow water.These characteristics can be utilized to define multi-objective optimization problems for geoacoustic inversion.A three-objective optimization approach [7] is proposed for geoacoustic parameter estimation, utilizing acoustic normal mode dispersion curves, mode shapes, and modal-based longitudinal horizontal coherence, successfully removing ambiguities and yielding accurate results.However, the criterion principle of modal dispersion inversion, which relies on phase comparison, does not estimate the attenuation.When dealing with a single receiver in a modal context, the attenuation can be estimated by taking into account modal amplitude considerations, such as inversion based on modal amplitude ratios or broadband waveform inversion.Among them, the full waveform envelope obtained via the Hilbert transform (FWH), derived from the received pressure signals, is more robust to the effects of ambient noise than the modal amplitudes.
This paper combines already existing works on the DCs inversion [24] and the waveform inversion [25] to develop a Bayesian joint inversion method that is applied to simulated single receiver data.Modal group speed DCs extracted using time-warping analysis and the FWH as function of time are simultaneously inverted with a MOBO algorithm [26] to solve for the corresponding seabed parametrized in terms of sediment layer thickness H, sound speed c, density ρ, and attenuation α (as well as error parameters).The PPDs of parameters are obtained by resampling from the exploited state space using the Gibbs Sampler.The paper shows that the MOBO can provide a way to reduce uncertainty in inversion results at the expense of partial accuracy (DCs and FWH are still well-matched).It has also been demonstrated that the FWH matching results can be enhanced through the concurrent inversion of DCs.Besides, the geoacoustic inversion is usually performed using L2 norm misfit function or Bartlett (correlator)) processor, traditionally.However, this study innovatively introduces the Wasserstein metric from optimal transport theory into the field of geophysical acoustics inversion.Numerical experiments indicate that using the Wasserstein metric as a cost function is expected to result in reduced uncertainty in the estimation of certain parameters.
The remainder of this article is structured in the following way.In Section 2, the DCs inversion and Waveform inversion are introduced.In Section 3, the Bayesian inversion method is provided.This section will also outline the cost function including the Wasserstein metric and its utilization in acoustic waveform inversion.In Section 4, numerical experiments are conducted to compare the differences between the individual Bayesian optimization inversion approach based on the L2 norm (L2-BO) and the MOBO joint inversion method based on the Wasserstein metric (Wasserstein-MOBO) and L2 norm (L2-MOBO).In Section 4, numerical experiments are performed to compare the Wasserstein-MOBO, L2-MOBO and L2-BO inversion methods.Section 5 provides discussion and conclusions.

Modal Propagation in a Single Receiver Context
In the study of ocean acoustics propagation, the normal mode theory has been extensively utilized in acoustic field computation, feature extraction and other aspects.The normal mode theory can effectively be used to explain acoustic propagation in shallow water (D < 200 m), and low-frequency ( f < 250 Hz) environments.Considering a broadband source emitting a pulse at depth z s , the received pressure field Y( f ) described as the sum of M modal pressures in the frequency domain is given by where k m ( f ), β m ( f ), and Ψ m ( f , z) are the real part of the horizontal wave number, the modal attenuation coefficient, and the modal depth function of mode m at frequency f .The waveguide propagation range is r.The quantity Q = e jπ/4 / √ 8πρ(z S ) represents a constant factor where ρ(z S ) is the density at the source depth and S( f ) is the source spectrum.Equation ( 1) can be simplified as where is the modal amplitude and the mode phase is given by According to Equation (3), the modal attenuation coefficient directly affects the modal amplitude.The Dispersion curve is defined by the time-frequency (TF) position of a given mode.It can be written as The factors t m ( f ) and t s ( f ) in Equation ( 6) represent the arrival time and source group delay, respectively.The group speed v m g ( f ) = dω/dk m at frequency f is given by Jensen et al. [27].

Estimation of the DCs Using Time-Warping
The Short-Time Fourier Transform (STFT) is a spectral analysis technique that allows for the localization of the signal y(t) (given by the inverse Fourier transform of Equation ( 1)) in both time and frequency domains.It accomplishes this by partitioning the received time-domain signal y(t) into segments using window functions hw * (t), and then applying a Fourier transform to each window.This process enables the extraction of the signal's time-frequency characteristics.The STFT is given by The warping [19] transforms a complex unsteady acoustic propagation signal into a quasi-single-frequency signal with a specific frequency by resampling the signal in the time or frequency domain according to a specific relationship.The main objective of warping is to linearize the mode phase φ m and facilitate modal separation of waveguide signals.
Warping transform maps y(t) into the warped signal W h y(t) where the warping function χ(t) = (t 2 + t 2 r ) 1/2 has a derivative χ (t) relative to t [14,20,28,29].t r = r/c w is the signal time origin, with c w the sound speed in the waveguide.Once the mode m has been filtered, the DCs can be used as an input for the geoacoustic inversion [30].
Rather than relying on approximate closed-form expressions for Equations ( 1) and ( 3), an alternative approach involves utilizing numerical simulations obtained from modal codes, like KRAKENC [31] or ORCA [32].These modal codes enable the generation of accurate numerical results.The respective warping function can then be derived through numerical methods to obtain precise and reliable results.

FWH Inversion
Waveform inversion can be defined as an optimization constrained by a discrete envelope structure.Mathematically, it can be represented by m = argminR h(m), y h obs , where R(m) is the mismatch function between the replica signal h(m) and observed signal y h obs .To acquire the envelope, the procedure includes performance of the Hilbert transformation on the signal y obs to produce the FWH y h obs .Tikhonov regularization methods are commonly used to constrain the inversion using suitable prior knowledge.

Inversion Methods
From the perspective of Bayesian inversion, the inversion can be defined in terms of the PPD of the parameters.Consequently, the task of inversion is transformed into the determination of the PPD of the parameters and subsequent analysis and interpretation.This approach ensures a more comprehensive understanding of the inversion problem, utilizing the principles of Bayesian inference.The usual inversion method often selects one optimal model based on a cost function.In contrast, Bayesian inversion not only aims at finding the optimal solution but also evaluates and analyzes the inversion results.That is, the Bayesian inversion result is a collection of multiple inversion results.The uncertainty of the inversion results is expressed in the form of a probability distribution.

Bayesian Inversion Theory
Bayes' theorem is a fundamental theorem that addresses the conditional probability relationship of random events, making it particularly valuable in scenarios where information is limited.Set observed data as d ∈ R N and m ∈ R M represent the candidate model composed of M unknown geoacoustic parameters.This can be rewritten as where P(m|d) is the posterior probability density (PPD).The posterior distribution P(m|d) can be considered as the result of adjusting the prior distribution P(m) by sample information (sampling based on the observed data d).The conditional probability density function (PDF) P(d|m), also known as the likelihood, is the probability of observing the given data d, assuming a specific set of parameter values m.The P(m) encapsulates any prior knowledge, beliefs, or assumptions about the parameters, and it is independent of the observation.The likelihood is an estimate of the probability of observing the measured data given a particular model m.Equation (10) can be expressed as Considering the challenges associated with determining the statistical error uncertainty of parameterization and observation in practical scenarios, the assumption that the errors follow a Gaussian (normal) distribution with zero mean is often employed during data processing.As a result, the form of the likelihood function can be expressed as a Gaussian distribution.Likelihood is expressed as where the data misfit function φ(m) is often used in optimization approaches to quantify the difference between observed and predicted data.It is an essential determinant for admitting and rejecting proposal models (i.e., the multidimensional distance between observed d obs and estimated g m data vectors.).The mean model [33] is used to interpret the PPD, as In this paper, the optimization is carried out numerically using MOBO, an algorithm that combines the NSGA-II method.

Cost Function Based on L2 Norm and Wasserstein Metric
The setting of the objective function is a very important part of the optimization problem, and different objective functions will make the inversion process and results different.If we denote the observed data as d = {d 1 , d 2 , . . .d N } with N data points, the likelihood is defined by where φ(m) is the data misfit distance between observed data vectors d and estimated data vectors d r (m) for the model m.It is given by where T indicates transpose and is the covariance matrix which can be estimated from the residuals [9].The misfit function for DCs inversion is structured to minimize the mean-squared-error between the DCs' estimate, generated by KRAKENC, and extracted DCs' travel times, where frequency is the independent variable, as stated by where N m is the number of data frequency points in the time-frequency domain of DCs, t mr is the DC replica travel time and dt is a time shift.For the waveform matching based on L2 norm, the misfit is defined as where n is a point shift to decrease the mismatch.
The data misfit φ(m), which contains the covariance matrix of the observed data, is almost universally based on the Bartlett processor defined by the L2 norm.The nonconvexity of the L2 norm misfit function in local optimization methods leads to frequent trapping in local minima during inversion, primarily due to the inherently point-bypoint amplitude comparison rather than considering global properties for full waveform inversion (FWI), resulting in the occurrence of the cycle-skipping issues [34].
Another data misfit function, constructed by the Wasserstein metric from optimal transport theory, is proposed to define the misfit [35].Mathematically, let us consider two equal probability measures, µ ∈ P(X) and ν ∈ P(Y), where X and Y are two distinct and complete metric spaces.The corresponding probability density functions p and q are defined as µ = pdx and ν = qdy on the space R n .In this letter, p and q represent the probability densities related to the replica and observation, respectively.For the mapping T: µ → ν , which satisfies µ[U] = v T −1 (U) for any U ⊂ Y, the corresponding minimum transportation cost from µ to ν is defined as where y = T(x), M consists of all transport plans that are able to transfer p into q.The factor c(x, y) indicates the cost of transporting one unit mass from location x to y. Particularly, when c(x, y) = |x − y| 2 , I(p, q, c) becomes the Wasserstein metric Since the Wasserstein metric requires the distributions p and q to be mass-conservation and non-negativity, the recorded acoustic waveform signal cannot meet these demands in its natural form.To overcome this impediment, we need to transform acoustic signals into probability densities through the use of a suitable scaling function where b is a coefficient parameter and < • > is the integral operator.Therefore, considering Wasserstein-MOBO, the r FW H is given by

Multi-Objective Bayesian Optimization
BO is a good approach in the inversion of individual feature quantities, such as DCs' inversion and FWH inversion.However, for a multi-objective optimization (MOO) m ∈ R M is sought for p potentially concurring objectives with R = {r 1 , r 2 , . . . ,r P }.As there is often no single optimal solution that satisfies all objectives simultaneously, the concept of a Pareto front is utilized to identify a set of joint optimum solutions.The goal of MOBO is to learn the Pareto front: the set of optimal trade-offs.To determine the posterior distribution of the objectives with respect to the input seabed parameters vector m, denoted as R(m) = {r 1 (m), r2 (m), . . ., rP (m)}, a specific inference method or algorithm needs to be employed.
Therefore, the likelihood functions for the L2-DCs and L2-FWH inversions can be obtained by bringing Equations ( 16) and (17) into Equation ( 14), respectively, while the likelihood function of the L2-MOBO approach is a combination of these two functions.The likelihood function for the Wasserstein-MOBO approach can be obtained by bringing Equations ( 16) and ( 21) into Equation ( 14).

Environmental Setup and Simulated Data
The particular way DCs and waveform matching approaches are combined in this study is guided by experience acquired from extensive numerical simulations under conditions approximating those encountered in the Shallow Water 2006 (SW06) experiment, as shown in Figure 1.In this simulation, the choice of parameterization scheme for the geoacoustic model and the configuration of environmental parameters are based on previous research findings [36] from the SW06 experiment.

Environmental Setup and Simulated Data
The particular way DCs and waveform matching approaches are combined in this study is guided by experience acquired from extensive numerical simulations under conditions approximating those encountered in the Shallow Water 2006 (SW06) experiment, as shown in Figure 1.In this simulation, the choice of parameterization scheme for the geoacoustic model and the configuration of environmental parameters are based on previous research findings [36] from the SW06 experiment.Synthetic analysis of the environment is conducted using sound speed profiles [15] taken from the SW06 experiment, which shows a sediment layer on top of a deeper subbottom layer.The comparison is drawn utilizing the synthetic data sets computed for the two-layer geoacoustic model specified in Table 1, which parameterizes the seabed layers in terms of thickness H , density  , sound speed c and attenuation coefficient  .
The sound speed and bathymetry are taken as given, and the range-independent Synthetic analysis of the environment is conducted using sound speed profiles [15] taken from the SW06 experiment, which shows a sediment layer on top of a deeper subbottom layer.The comparison is drawn utilizing the synthetic data sets computed for the two-layer geoacoustic model specified in Table 1, which parameterizes the seabed layers in terms of thickness H, density ρ, sound speed c and attenuation coefficient α.The sound speed and bathymetry are taken as given, and the range-independent inversion is used to estimate the seabed parameters.As evidenced in Table 1, the model priors for the inversion were set to a broad range.Hamilton's [37] empirical relationships were employed to place lower and upper limits on sound speed (in meters per second) based on the density (in grams per cubic centimeter) of The source pulse given by a Hanning-weighted 4-period wave is emitted at depth 22 m and recorded 7 km away at depth 55 m.As shown in Figure 2a, the received pulse consists of only one wave packet for the channel frequency response at each sampled frequency.Because the source function is well known for the simulation, no source level convolution is required.Gaussian errors were added to simulated acoustic-pressure spectra to achieve average signal-to-noise ratio (SNR) of 10 dB.The source pulse given by a Hanning-weighted 4-period wave is emitted at depth 2 m and recorded 7 km away at depth 55 m.As shown in Figure 2a, the received puls consists of only one wave packet for the channel frequency response at each sampled fre quency.Because the source function is well known for the simulation, no source leve convolution is required.Gaussian errors were added to simulated acoustic-pressure spec tra to achieve average signal-to-noise ratio (SNR) of 10 dB.Furthermore, the inversion results of the multi-objective inversion approach are an alyzed using two alternative objective functions, the L2 norm and the Wasserstein metric The two components of Bayesian optimization are a probabilistic surrogate model and a acquisition function.In this numerical experiment, we employed a Gaussian process (GP Furthermore, the inversion results of the multi-objective inversion approach are analyzed using two alternative objective functions, the L2 norm and the Wasserstein metric.The two components of Bayesian optimization are a probabilistic surrogate model and an acquisition function.In this numerical experiment, we employed a Gaussian process (GP) as the probabilistic surrogate model to facilitate the execution of MOBO.Expected improvement (EI) [26] serves as the acquisition function, which selects trials based on a heuristic aimed at maximizing improvement.There is no unique global solution, but rather a set of solutions known as the Pareto frontier in multi-objective optimization problems.The misfit residuals (as well as data-error covariance matrix) are firstly determined by attempting an optimization process in iterations of the burn-in phase.Each optimization involved ~1 × 10 3 forward calculations.
By utilizing time warping, the DCs of four lower-order normal modes can be extracted from the 50-250 Hz frequency band and used to invert for geoacoustic parameters.
The corresponding spectrogram in Figure 3c is obtained by time-frequency transformation of the time series presented in Figure 2a.From the spectrogram, it is evident that the signal energy is well contained below 150 Hz.This spectrogram with highly coupled first and second modes cannot be used for modal filtering because they are too heavily coupled.However, it can be used to assess the standard of the filtering result in a visual manner.Figure 3a demonstrates that none of the four warping modes are nearly perfectly horizontal: a sensible conclusion as the warping function is derived from an iso-velocity appropriation.The signal recovered after reverse warping plotted in Figure 2c demonstrates the reversibility of the method.As shown in Figure 3b, applying a self-customized mask to the spectrogram of the warped signal yields the spectrogram of a single warped mode.Then inverse Short-Time Fourier Transform is applied to such modes to realize modal filtering and the dispersion curve is estimated.The results show that even the modes with interleaved peak textures on the spectrogram can still be separated by the warping transform.Additionally, the unwarping transform effectively restores the original signals.These findings highlight the stability and utility of the warping transform as a method for extracting DCs.In conclusion, four modes were extracted from the data within the frequency band 50-150 Hz and applied in the dispersion inversion.
horizontal: a sensible conclusion as the warping function is derived from an iso-velocity appropriation.The signal recovered after reverse warping plotted in Figure 2c demonstrates the reversibility of the method.As shown in Figure 3b, applying a self-customized mask to the spectrogram of the warped signal yields the spectrogram of a single warped mode.Then inverse Short-Time Fourier Transform is applied to such modes to realize modal filtering and the dispersion curve is estimated.The results show that even the modes with interleaved peak textures on the spectrogram can still be separated by the warping transform.Additionally, the unwarping transform effectively restores the original signals.These findings highlight the stability and utility of the warping transform as a method for extracting DCs.In conclusion, four modes were extracted from the data within the frequency band 50-150 Hz and applied in the dispersion inversion.

Comparison between Individual Inversions and Multi-Objective Inversions
The computational cost is a significant consideration in geoacoustic inversion due to the multiple forward calculations and the inversion algorithms.The average running times of the four methods of L2-DCs, L2-FWH, L2-MOBO and Wasserstein-MOBO, are 2750 s, 2910 s, 3620 s and 3305 s, respectively.Multi-objective frameworks have significantly higher time complexity than separated-objective frameworks.Although the Wasserstein metric formula is more complex, the computational cost of the L2-MOBO approach is higher than that of the Wasserstein-MOBO approach.This may be related to the convexity of the objective function.
Table 2 shows that the inverted Bayes estimator values (as well as standard deviation) for individual inversions using DCs are in good agreement whereas, using FWH, some parameters vary substantially.The L2-DCs method outperforms the other three techniques in terms of accurately estimating the sound speed and thickness of the sediment layer, according to Table 2. Similarly, the Wasserstein-MOBO method yields comparable results to L2-DCs in terms of sediment sound speed, demonstrating significantly lower levels of uncertainty compared to both L2-FWH and L2-MOBO approaches.However, all four methods exhibit limited effectiveness when it comes to basement sound speed.This indicates that the algorithms or characteristics employed by these approaches may not sufficiently account for the complex physical processes associated with basement layer.Assessing the NRMSE, both L2-FWH and L2-MOBO methods show similar values, approximately 7.3 × 10 −5 , which are notably higher than the NRMSE value of 5.67 × 10 −5 obtained by the Wasserstein-MOBO method.According to Figure 4, among the observed data, posteriori replicas, and best replicas (obtained through the optimal geoacoustic parameters), the matching performance of the DCs' inversion is particularly strong for the first three DCs.
Figure 4 shows the fit of the individual inversion to the observed modal-dispersion data.This figure illustrates an impressive level of agreement between the measured DCs data and the replica data for ensemble of models taken moderately from the algorithm generations in inversion.Note that the mismatch of the fourth mode below 95 Hz is rather large, which is probably caused by the rough estimation of Airy phase, shown in Figure 3c.According to Figure 4, the robust inversion result can be attributed to the DCs' insensitivity to the parameters, apart from sediment thickness and sound speed.serstein metric formula is more complex, the computational cost of the L2-MOBO approach is higher than that of the Wasserstein-MOBO approach.This may be related to the convexity of the objective function.
Table 2 shows that the inverted Bayes estimator values (as well as standard deviation) for individual inversions using DCs are in good agreement whereas, using FWH, some parameters vary substantially.The L2-DCs method outperforms the other three techniques in terms of accurately estimating the sound speed and thickness of the sediment layer, according to Table 2. Similarly, the Wasserstein-MOBO method yields comparable results to L2-DCs in terms of sediment sound speed, demonstrating significantly lower levels of uncertainty compared to both L2-FWH and L2-MOBO approaches.However, all four methods exhibit limited effectiveness when it comes to basement sound speed.This indicates that the algorithms or characteristics employed by these approaches may not sufficiently account for the complex physical processes associated with basement layer.Assessing the NRMSE, both L2-FWH and L2-MOBO methods show similar values, approximately 7.3 × 10 −5 , which are notably higher than the NRMSE value of 5.67 × 10 −5 obtained by the Wasserstein-MOBO method.According to Figure 4, among the observed data, posteriori replicas, and best replicas (obtained through the optimal geoacoustic parameters), the matching performance of the DCs' inversion is particularly strong for the first three DCs. Figure 4 shows the fit of the individual inversion to the observed modaldispersion data.This figure illustrates an impressive level of agreement between the measured DCs data and the replica data for ensemble of models taken moderately from the algorithm generations in inversion.Note that the mismatch of the fourth mode below 95 Hz is rather large, which is probably caused by the rough estimation of Airy phase, shown in Figure 3c.According to Figure 4, the robust inversion result can be attributed to the DCs' insensitivity to the parameters, apart from sediment thickness and sound speed.The envelope spectrum of the analog received signal shown in Figure 5a indicates that modes are highly coupled together, making modal attenuation information difficult to extract without modal filtering.The first three modes are visible in the real part of the Hilbert analytic signal, and there is no complex coupling, which is essential for geoacoustic attenuation inversion, as shown in Figure 5.The 95% credible interval of the FWH-BO inversion (gray interval in Figure 5b) widens with time, indicating that the uncertainty of FWH inversion results increases with modal order.The credible interval for first-order modal inversion results is 9.7%, while the credible intervals for second-order and thirdorder values are 38.6% and 37.1%, respectively.In general, within 4-6 s sampling, the objective function of FWH is well-matched, and the RMSE of the replica and observation is only 0.0108.inversion (gray interval in Figure 5b) widens with time, indicating that the uncertainty of FWH inversion results increases with modal order.The credible interval for first-order modal inversion results is 9.7%, while the credible intervals for second-order and thirdorder values are 38.6% and 37.1%, respectively.In general, within 4-6 s sampling, the objective function of FWH is well-matched, and the RMSE of the replica and observation is only 0.0108.

Discussion
Figure 6 depicts the distributions of likelihood functions in the posteriori probability space.Based on Figure 6, it can be seen that both misfit measures exhibit numerous local minima, but with different characteristics in terms of specific details.The distribution of the likelihood function space for DCs based on the L2 norm exhibits a convex ridge-like distribution along with the c s axis.Comparing Figure 6b,d, it can be concluded that the misfit function based on the Wasserstein metric significantly improves the convexity of the posterior distribution of the likelihood function obtained from MOBO inversion.
To characterize these uncertainties more generally, the posterior models are used to calculate the NRMSE for each method, as shown in Table 2.The NRMSE of Wasserstein -MOBO inversion is significantly smaller compared to that of the inversion using DCs.To help explain the results, Figures 7 and 8 display marginal probability densities from Bayesian inference of estimated model parameters in MOBO inversion.
The final posterior distribution was constructed by selecting 10,000 models from the exploitation phase.The 5000 models of the posterior distribution were then ordered by likelihood and divided into three groups: the top 20%, 50%(medium), and all models.Figures 7 and 8 show an example of the L2-MOBO analysis and Wasserstein-MOBO analysis, respectively.The seabed sound speed interfaces and corresponding observed data fits from the chosen models in each group, and the posterior distributions of sediment thickness, sediment density, basement density, sediment attenuation and basement attenuation for all models are also shown.To characterize these uncertainties more generally, the posterior models are used to calculate the NRMSE for each method, as shown in Table 2.The NRMSE of Wasserstein -MOBO inversion is significantly smaller compared to that of the inversion using DCs.To help explain the results, Figures 7 and 8 display marginal probability densities from Bayesian inference of estimated model parameters in MOBO inversion.According to Figure 7, the trends of most measured and predicted data points are in accordance with each other within the estimated uncertainty.The majority of DCs show a good fit, except for the third modal Airy phase, while the FWH are not well-matched in all details.The wide distribution of basement attenuation signifies a higher degree of uncertainty and potential bias associated with the parameter.In this simulation, the modal energy for the first four modes is concentrated within 15 m of the sea floor, so the inversion is not sensitive to the basement geoacoustic parameters [21].It is also worth noting the presence of skewness and asymmetry in the distribution of sediment thickness H.The overestimation of low-frequency estimates of the DCs can lead to such an asymmetry, which is consistent with the matching results in Figure 7c.
Figure 8 shows the posterior model results of the Wasserstein-MOBO analysis.By comparing the matching of FWH in Figures 7 and 8, the result in L2-MOBO exhibits a wider distribution of credible intervals in the inversion of sediment sound speed c s .In contrast, the Wasserstein-MOBO method shows a narrower distribution of credible intervals.The bimodal distribution of basement sound speed suggests that the parameter can take on two distinct sets of values with relatively high probabilities.However, for the majority of parameters, the Wasserstein-MOBO inversion yields more accurate results, particularly for sediment sound speed, where it results in narrower marginal peak that is closer to the true value.In addition, both methods provide accurate estimates of the sediment sound speed c s with uncertainties significantly lower than that of the basement sound speed c b .The marginal distribution of basement attenuation α b under both metrics exhibits a bimodal pattern, indicating that the MOBO method is insensitive to this parameter.The final posterior distribution was constructed by selecting 10,000 models from the exploitation phase.The 5000 models of the posterior distribution were then ordered by likelihood and divided into three groups: the top 20%, 50%(medium), and all models.Figures 7 and 8 show an example of the L2-MOBO analysis and Wasserstein-MOBO analysis, respectively.The seabed sound speed interfaces and corresponding observed data fits from the chosen models in each group, and the posterior distributions of sediment thickness, sediment density, basement density, sediment attenuation and basement attenuation for all models are also shown.According to Figure 7, the trends of most measured and predicted data points are in accordance with each other within the estimated uncertainty.The majority of DCs show a good fit, except for the third modal Airy phase, while the FWH are not well-matched in all details.The wide distribution of basement attenuation signifies a higher degree of uncertainty and potential bias associated with the parameter.In this simulation, the modal energy for the first four modes is concentrated within 15 m of the sea floor, so the inversion is not sensitive to the basement geoacoustic parameters [21].It is also worth noting the presence of skewness and asymmetry in the distribution of sediment thickness H.The

Conclusions
Broadband multi-objective geoacoustic inversions using simulated data, under conditions approximating those encountered in SW06, have been performed to exhibit the advantage of the MOBO approach in reducing inversion uncertainty.It was demonstrated that, under conditions of moderate SNRs, inversion techniques employing modal-dispersion analysis and Hilbert spectral analysis of waveform data yield equivalent marginal distri-butions of geoacoustic parameters, including attenuation, when compared to inversions using individual DCs.Nevertheless, the adoption of the L2 norm objective function results in notable variances in the overall multidimensional posterior probability distributions, thus amplifying the uncertainty associated with the inversion process.Introducing the Wasserstein metric, however, significantly alleviates this uncertainty, leading to improved inversion outcomes.
Although MOBO, as a multi-feature inversion method, achieved more stable inversion results than individual-feature inversion methods under the same simulated conditions, it is important to note that numerical experiments did not directly consider the impact of parameterization errors when compared to realistic experiments.It is worth mentioning that 50 chains of random initial models were sampled in parallel in the simulations.This guarantees good coverage of the solution space sampling and also increases confidence in the posterior.This simulated validation reinforces the reliability and effectiveness of the MOBO method and this muti-objective approach also provides new insights and possibilities for geoacoustic inversion.
However, this current approach still requires improvement in selecting the hyperparameters for the softplus function in the Wasserstein metric and in choosing the combination of data sets.In subsequent research, validation experiments will be conducted by incorporating experiment data with different SNR conditions to assess the performance of this method, providing more convincing results for geoacoustic inversion research.Future research could also explore alternative objective functions, such as the mixed L1/Wasserstein metric [38] and the generalized L2 norm [39], to assess the suitability of this approach.

Figure 1 .
Figure 1.Simulated geoacoustic model and sound speed profile obtained from SW06 experiment.The source depth is 22 m, and a single receiver is placed 7.0 km away from the source at depth 55 m.Note that the graph is not to scale.

Figure 1 .
Figure 1.Simulated geoacoustic model and sound speed profile obtained from SW06 experiment.The source depth is 22 m, and a single receiver is placed 7.0 km away from the source at depth 55 m.Note that the graph is not to scale.

1
Except for the basement half space.

Figure 2 .
Figure 2. (Color online) Numerical signals.(a,b) show the received and corresponding warpe signals, respectively.(c) The signal after warping (the continuous curves) and unwarping (the re circles).Keep in mind that the time axes of the two subfigures are not equivalent.

Figure 2 .
Figure 2. (Color online) Numerical signals.(a,b) show the received and corresponding warped signals, respectively.(c) The signal after warping (the continuous curves) and unwarping (the red circles).Keep in mind that the time axes of the two subfigures are not equivalent.

Figure 3 .
Figure 3. (Color online) (a) Spectrogram of the warped signal.The white polygon shows the region of the TF mask applied to filter mode 3. (b) The frequency spectrum of the warped signal for the frequency band 0-50Hz, and the embedded graph is warped mode 3. (c) Spectrogram of the received signal.White dotted lines represent the theoretical DCs for the first four modes.The estimated DCs are also displayed as red curves upon the time-frequency spectrogram (panel (c)).

Figure 3 .
Figure 3. (Color online) (a) Spectrogram of the warped signal.The white polygon shows the region of the TF mask applied to filter mode 3. (b) The frequency spectrum of the warped signal for the frequency band 0-50 Hz, and the embedded graph is warped mode 3. (c) Spectrogram of the received signal.White dotted lines represent the theoretical DCs for the first four modes.The estimated DCs are also displayed as red curves upon the time-frequency spectrogram (panel (c)).

Figure 4 .
Figure 4. (Color online) Data fit achieved in inversions of DCs.Comparison between the observed modal-dispersion data (red solid lines) and the posteriori ensembles (blue dashed lines) obtained from DCs-BO inversion.The green dotted line represents the best model and the gray region is a 95% credible interval from the Bayesian sampling.

Figure 4 .
Figure 4. (Color online) Data fit achieved in inversions of DCs.Comparison between the observed modal-dispersion data (red solid lines) and the posteriori ensembles (blue dashed lines) obtained from DCs-BO inversion.The green dotted line represents the best model and the gray region is a 95% credible interval from the Bayesian sampling.

Figure 5 .
Figure 5. (Color online) (a) The upper envelope (red line) and lower envelope (orange line) of the observed data.(b) Data fits achieved in FWH inversion.Comparison between the observed FWH data (blue solid lines) and the posteriori ensembles (purple dashed lines) obtained from FWH-BO inversion.The green solid line represents the best model and the gray region is the 95% credible interval from the Bayesian sampling.

Figure 5 .
Figure 5. (Color online) (a) The upper envelope (red line) and lower envelope (orange line) of the observed data.(b) Data fits achieved in FWH inversion.Comparison between the observed FWH data (blue solid lines) and the posteriori ensembles (purple dashed lines) obtained from FWH-BO inversion.The green solid line represents the best model and the gray region is the 95% credible interval from the Bayesian sampling.
space.Based on Figure6, it can be seen that both misfit measures exhibit numerous local minima, but with different characteristics in terms of specific details.The distribution of the likelihood function space for DCs based on the L2 norm exhibits a convex ridge-like distribution along with the s c axis.Comparing Figure6b,d, it can be concluded that the misfit function based on the Wasserstein metric significantly improves the convexity of the posterior distribution of the likelihood function obtained from MOBO inversion.

Figure 6 .
Figure 6.The posterior distribution of likelihood function in MOBO inversions, with black solid points representing the true values.(a,b) Likelihood distributions of DCs based on (a) L2 term and (c) Wasserstein metric.(c,d) Likelihood distributions of FWH based on (c) L2 term and (d) Wasserstein metric.

Figure 6 .
Figure 6.The posterior distribution of likelihood function in MOBO inversions, with black solid points representing the true values.(a,b) Likelihood distributions of DCs based on (a) L2 term and (c) Wasserstein metric.(c,d) Likelihood distributions of FWH based on (c) L2 term and (d) Wasserstein metric.

Figure 7 .
Figure 7. Posteriori models based on L2-MOBO and corresponding data fits for (b) FWH and (c) DCs, along with marginal distributions of (d) sediment thickness, (e) sediment sound speed, (f) sediment density, (g) basement density, (h) sediment attenuation and (i) basement attenuation.(a) Models of sound velocity and sediment depth.The results are represented with different colors, indicating the probability of the models, from the best 20%, 50% (medium) and all.Black vertical lines mark the actual sound speed values in the sediment and basement.The horizontal dashed line and the patch show the true value and distribution interval of sediment depth, respectively.The black solid lines, red solid lines and blue dashed in (b,c) indicate the average inversion values, observed values, and misfits under the posterior, respectively.The grouping of the best and full models and the colors are consistent with (a).The grey uncertainty intervals in (b,c) represent the range of values that contains the mean of the data, with a 95% credible interval.Black vertical dashed lines in (d-i) represent the true value of parameters.

Figure 7 .
Figure 7. Posteriori models based on L2-MOBO and corresponding data fits for (b) FWH and (c) DCs, along with marginal distributions of (d) sediment thickness, (e) sediment sound speed, (f) sediment density, (g) basement density, (h) sediment attenuation and (i) basement attenuation.(a) Models of sound velocity and sediment depth.The results are represented with different colors, indicating the probability of the models, from the best 20%, 50% (medium) and all.Black vertical lines mark the actual sound speed values in the sediment and basement.The horizontal dashed line and the patch show the true value and distribution interval of sediment depth, respectively.The black solid lines, red solid lines and blue dashed lines in (b,c) indicate the average inversion values, observed values, and misfits under the posterior, respectively.The grouping of the best and full models and the colors are consistent with (a).The grey uncertainty intervals in (b,c) represent the range of values that contains the mean of the data, with a 95% credible interval.Black vertical dashed lines in (d-i) represent the true value of parameters.

Figure 8 .
Figure 8. Posteriori models based on Wasserstein-MOBO and corresponding data fits for (b) FWH and (c) DCs, along with marginal distributions of (d) sediment thickness, (e) sediment sound speed, (f) sediment density, (g) basement density, (h) sediment attenuation and (i) basement attenuation.(a) Models of sound velocity and sediment depth.The results are represented with different colors, indicating the probability of the models, from the best 20%, 50% (medium) and all.Black vertical lines mark the actual sound speed values in the sediment and basement.The horizontal dashed line and the patch show the true value and distribution interval of sediment depth, respectively.The black solid lines, red solid lines and blue dashed lines in (b,c) indicate the average inversion values, observed values, and misfits under the posterior, respectively.The grouping of the best and full models and the colors are consistent with (a).The grey uncertainty intervals in (b,c) represent the range of values that contains the mean of the data, with a 95% credible interval.Black vertical dashed lines in (d-i) represent the true value of parameters.

Figure 8 .
Figure 8. Posteriori models based on Wasserstein-MOBO and corresponding data fits for (b) FWH and (c) DCs, along with marginal distributions of (d) sediment thickness, (e) sediment sound speed, (f) sediment density, (g) basement density, (h) sediment attenuation and (i) basement attenuation.(a) Models of sound velocity and sediment depth.The results are represented with different colors, indicating the probability of the models, from the best 20%, 50% (medium) and all.Black vertical lines mark the actual sound speed values in the sediment and basement.The horizontal dashed line and the patch show the true value and distribution interval of sediment depth, respectively.The black solid lines, red solid lines and blue dashed lines in (b,c) indicate the average inversion values, observed values, and misfits under the posterior, respectively.The grouping of the best and full models and the colors are consistent with (a).The grey uncertainty intervals in (b,c) represent the range of values that contains the mean of the data, with a 95% credible interval.Black vertical dashed lines in (d-i) represent the true value of parameters.

Table 1 .
Inversion parameters and uniform prior bounds.
1Except for the basement half space.

Table 2 .
Summary of individual BO inversion (DCs & FWH) and MOBO inversion results.A mean of the posterior distribution, called a Bayes estimator, with 95% credible interval uncertainties is also given.The rightmost column gives the normalized root-mean-square error (NRMSE) of this inversion results.