A Graph-Space Optimal Transport Approach Based on Kaniadakis κ-Gaussian Distribution for Inverse Problems Related to Wave Propagation

Data-centric inverse problems infer physical attributes from indirect measurements. Full-waveform inversion (FWI) is a non-linear inverse problem that attempts to obtain a quantitative physical model by comparing the wave equation solution with observed data, optimizing an objective function. However, FWI is strongly dependent on a robust objective function, especially for dealing with cycle-skipping issues and non-Gaussian noise in the dataset. In this work, we present an objective function based on the Kaniadakis κ-Gaussian distribution and the optimal transport (OT) theory to mitigate non-Gaussian noise effects and phase ambiguity concerns that cause cycle skipping. We construct the κ-objective function using the probabilistic maximum likelihood procedure and include it within a well-posed version of the original OT formulation, known as the Kantorovich–Rubinstein metric. We represent the data in the graph space to satisfy the probability axioms required by the Kantorovich–Rubinstein framework. We call our proposal the κ-Graph-Space Optimal Transport FWI (κ-GSOT-FWI). The results suggest that the κ-GSOT-FWI is an effective procedure to circumvent the effects of non-Gaussian noise and cycle-skipping problems. They also show that the Kaniadakis κ-statistics significantly improve the FWI objective-function convergence, resulting in higher-resolution models than classical techniques, especially when κ = 0.6.


Introduction
The task of inferring physical parameters from indirect observations arises in various practical problems. Determining parameters that cannot be directly observed remains a complex issue and involves a robust set of tools that compose the theoretical basis of the inverse problem theory [1]. The goal of an inverse problem consists of obtaining a quantitative model m that explains the observations (or observed data) by matching modeled data d_mod = G(m) to observed data d_obs, in which G denotes the so-called forward operator. The forward operator maps the variables from the model space to the data space through a physical law [2]. For instance, we may want to determine the thermal diffusivity of a material (physical system) by analyzing the observed data: the temporal distribution of the diffusing material density at a given location. In this regard, the model consists of the collective diffusion coefficient, and a diffusion equation represents the forward operator G. The diffusion coefficient is then determined by optimizing an objective function, which measures the distance between modeled and observed data.
In this work, we consider a non-linear inverse problem that has attracted increasing interest in several fields, such as astrophysics [3], biomedicine [4], and machine learning [5]. In particular, we revisit its objective function through optimal transport (OT) theory, which concerns the optimal displacement of materials (mass) from sources to sinks. In recent years, OT theory has received much attention in the broad literature (e.g., refs. [40][41][42][43]), including geophysics issues [44][45][46]. However, the OT-based objective function is suitable for comparing probability distributions, that is, positive and normalized quantities, two requirements that seismic signals do not meet due to their oscillatory nature. Waveforms are therefore commonly distorted through transformations to satisfy the probability axioms, which may introduce unwanted artifacts. Indeed, several applications have demonstrated the effectiveness of OT-based objective functions in mitigating the effects of phase ambiguity; however, all assume that the errors obey Gaussian statistics.
In this work, we explore the κ-objective function in the context of OT theory to introduce an objective function resistant to non-Gaussian noise and less sensitive to cycle-skipping issues. In this regard, we propose a robust framework for matching seismic waveforms using the Wasserstein distance, a well-posed relaxation of the OT formulation. Furthermore, inspired by ref. [47], we consider the representation of the waveforms in the graph space, suitable for real large-scale problems [33]. In this approach, the waveforms are represented by Dirac delta functions in a two-dimensional space (amplitude versus time).
We organize the present work as follows. In Section 2, we briefly introduce the theoretical basis of inverse problems in the context of κ-Gaussian statistics and their robustness properties. In Section 3, we present a well-posed relaxation of the original optimal transport formulation using the Kaniadakis κ-objective function. Then, in Section 4 we present FWI based on optimal transport and κ-Gaussian statistics in the context of the adjoint state method. In Section 5, we demonstrate how the proposed objective function deals with cycle-skipping issues and non-Gaussian noise by considering a Brazilian pre-salt case study. Finally, we devote Section 6 to the final remarks and future applications.

Inverse Problems in the Context of Kaniadakis κ-Statistics
In science, many practical problems are data-centric. Indeed, determining a quantitative physical model that explains the observations is crucial to more accurately model and describe a wide variety of existing physical systems. In this context, the inverse problem theory is an excellent tool.
From a practical point of view, an inverse problem is formulated as an optimization task for obtaining a quantitative model by comparing modeled data to observed data. Modeled data are calculated using an appropriate physical law, and the comparison between modeled and observed data is performed through an objective function. In the classical approach, the objective function is constructed from the assumption that the errors (the differences between modeled and observed data) obey Gaussian statistics. Let ε = {ε_1, ε_2, ..., ε_N} be the errors, assumed independent and identically distributed according to a standard Gaussian distribution,

p_0(ε_i) = (1/√(2π)) exp(−ε_i²/2).    (1)

The associated likelihood function then follows as [11]:

L_0(m | d_obs) = ∏_{i=1}^{N} p_0(ε_i) = (2π)^{−N/2} exp(−(1/2) ∑_{i=1}^{N} ε_i²),    (2)

where L_0 is the Gaussian likelihood; the use of the index 0 will become clear later on. It is worth remembering that the standard Gaussian distribution can be determined from the maximization of the Boltzmann–Gibbs–Shannon entropy subject to the normalization condition and the unit variance constraint,

∫_{−∞}^{+∞} p(ε) dε = 1    (3)

and

∫_{−∞}^{+∞} ε² p(ε) dε = 1.    (4)

In inverse problems, the errors ε depend on the model parameter m and are computed through the difference between modeled (d_mod = G(m)) and observed (d_obs) data, i.e., ε_i(m) = d_mod,i(m) − d_obs,i, where G represents the forward operator. In this way, the model parameter can be obtained by employing the maximum likelihood estimation (MLE) method, which maximizes the likelihood function:

m̂ = arg max_m L_0(m | d_obs),    (5)

where m̂ represents the estimated model. The MLE estimates an unknown model parameter by considering that its optimal value maximizes the probability that the observed data are measured.
Since the minimum of the negative log-likelihood coincides with the maximum of the likelihood function, maximizing L_0 in (5) is equivalent to minimizing the negative log-likelihood, i.e.,

m̂ = arg min_m [ −ln L_0(m | d_obs) ].

From the principle of maximum likelihood, an objective function φ_0 can be obtained from [11]:

φ_0(m) = −ln L_0(m | d_obs) = (N/2) ln(2π) + (1/2) ∑_{i=1}^{N} ε_i²(m),    (8)

which can be rewritten as:

φ_0(m) = (1/2) ∑_{i=1}^{N} ε_i²(m).    (9)

We notice that minimizing Equation (8) or (9) is the same since the term (N/2) ln(2π) is constant. The latter equation is well known and used in solving problems via the least-squares method. Please see Section 2 of ref. [48] for more detail.
However, errors in practice are frequently non-Gaussian, and it is reasonable to go beyond the Gaussian assumption. In this study, we consider that the errors are distributed according to a Kaniadakis κ-Gaussian distribution of the form [22]:

p_κ(ε_i) = (1/Z_κ) exp_κ(−β_κ ε_i²),    (10)

where Z_κ is a normalizing constant, β_κ is a scale parameter, and

exp_κ(y) = ( √(1 + κ²y²) + κy )^{1/κ},  with 0 ≤ |κ| < 1,

is the κ-exponential function [26], a generalization of the exponential function. The κ-exponential becomes the ordinary exponential function in the limit κ → 0: exp_0(y) = exp(y). Considering the normalization (3) and unit variance (4) conditions, Z_κ and β_κ are determined in closed form, holding for |κ| < 2/3. The standard Gaussian distribution (1) is a particular case of the κ-Gaussian distribution (10) in the classical limit κ → 0, since lim_{κ→0} β_κ = 1/2 and lim_{κ→0} Z_κ = √(2π). Figure 1 depicts the plots of the κ-Gaussian distribution (10) for typical κ-values, with the solid black curve referring to the standard Gaussian distribution (κ → 0). Because we assume that the errors are independent and identically distributed by the power-law distribution in (10), we can calculate the corresponding objective function by estimating the most likely state using the probabilistic maximum likelihood method:

m̂ = arg min_m [ −ln L_κ(m | d_obs) ],    (14)

where L_κ(m | d_obs) := ∏_{i=1}^{N} p_κ(ε_i(m)) represents the likelihood function. It is crucial to remember that minimizing the negative log-likelihood is the same as maximizing the likelihood function. In this way, the objective function φ_κ can be obtained from (14), up to an additive constant:

φ_κ(m) = −∑_{i=1}^{N} ln[ exp_κ(−β_κ ε_i²(m)) ],    (15)

where φ_κ is the κ-objective function, which converges to the classical objective function (9) in the limit κ → 0. The κ-objective function is not easily influenced by aberrant measurements (outliers), as it is based on κ-Gaussian criteria [49]. To demonstrate this, we compute the influence function Υ related to the objective function. According to ref. [50], a statistical criterion is not robust if Υ → ±∞ as |ε| → ∞, and robust (outlier-resistant) if Υ → 0 as ε → ±∞.
Given a model m_i, the influence function is defined by [50]:

Υ_κ(ε) = ∂φ_κ(ε | m_i)/∂ε,

where φ_κ(ε | m_i) is the κ-objective function computed from the errors ε given the model m_i. Thus, the κ-objective function generates the following influence function:

Υ_κ(ε) = 2 β_κ ε / √(1 + κ² β_κ² ε⁴),    (18)

for 0 < κ < 2/3, with Υ_κ = Υ_κ(m_i) and ε = ε(m_i). We notice that, as ε tends to ±∞, the influence function associated with the κ-criterion (Υ_κ) approaches 0; the κ-objective function is therefore robust (outlier-resistant). Indeed, in its valid domain (0 < κ < 2/3), the influence function is proportional to 1/ε for large errors (suppressing them) and to ε for small errors (magnifying them). Figure 2 depicts the behavior of the κ-objective function and the associated influence function.
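As a minimal numerical sketch of these robustness properties, the κ-exponential, the per-sample κ-objective (up to its additive constant), and the influence function (18) can be coded as below; for illustration we fix the scale at its classical-limit value β_κ = 1/2 rather than the κ-dependent value discussed above:

```python
import numpy as np

def kappa_exp(y, kappa):
    """Kaniadakis kappa-exponential; reduces to exp(y) in the limit kappa -> 0."""
    if kappa == 0.0:
        return np.exp(y)
    return (np.sqrt(1.0 + kappa**2 * y**2) + kappa * y) ** (1.0 / kappa)

def kappa_objective(eps, kappa, beta=0.5):
    """kappa-objective: negative log-likelihood of kappa-Gaussian errors,
    dropping the constant normalizing term N ln Z_kappa."""
    eps = np.asarray(eps, dtype=float)
    return -np.sum(np.log(kappa_exp(-beta * eps**2, kappa)))

def influence(eps, kappa, beta=0.5):
    """Influence function: derivative of the per-sample objective w.r.t. the error.
    Behaves like 2*beta*eps for small errors and decays like 1/(kappa*eps) for large ones."""
    eps = np.asarray(eps, dtype=float)
    return 2.0 * beta * eps / np.sqrt(1.0 + (kappa * beta * eps**2) ** 2)
```

For κ → 0 the influence function reduces to the raw residual ε (the least-squares, non-robust behavior), while for 0 < κ < 2/3 large residuals are suppressed.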

Optimal Transport Metric Based on Kaniadakis κ-Statistics
In 1781, Gaspard Monge first posed a challenging transportation problem [39], which consisted of moving a pile of sand from one place to another optimally and without losing mass. As formulated by Monge, the optimal transport (OT) problem is ill-posed; hence the solution, if it exists, is not unique. Nearly 200 years later, Leonid Kantorovich proposed a well-posed relaxation of Monge's OT problem in the context of optimal economic resource allocation [51]. In this regard, Kantorovich proposed what is now known as the Kantorovich–Rubinstein metric (also referred to as the Wasserstein distance), which earned him the 1975 Nobel Memorial Prize in Economic Sciences.
The Wasserstein criterion is a metric that defines a distance between two probability distributions. Let us consider two sets of points Ω_1 = {x_i; i = 1, 2, ..., N_1} and Ω_2 = {y_j; j = 1, 2, ..., N_2}, in which each point x_i and y_j carries a "mass", namely µ(x_i) and υ(y_j), respectively. Considering the mass conservation constraint (∑_i µ(x_i) = ∑_j υ(y_j) = 1), we can define the κ-optimal total transport cost W_κ as [30,40]:

W_κ(µ, υ) = min_{T ∈ Λ(µ,υ)} ∑_{i=1}^{N_1} ∑_{j=1}^{N_2} φ_κ(x_i, y_j) T_{i,j},    (19)

where Λ(µ, υ) denotes the set of transport maps T defined in

Λ(µ, υ) = { T_{i,j} ≥ 0 : ∑_j T_{i,j} = µ(x_i) ∀i, ∑_i T_{i,j} = υ(y_j) ∀j }.    (20)

The transport map T assigns how many "sand particles" from µ(x_i) should be transported to υ(y_j) for each pair (x_i, y_j), while the κ-objective function maps each pair (x_i, y_j) to [0, +∞]. The Monge–Kantorovich relaxed transportation problem (19), using the Kaniadakis κ-statistics, can therefore be solved by determining an optimal transport plan T that minimizes the κ-optimal total transport cost W_κ from µ to υ, given a κ-objective function φ_κ.
Let us consider a metric space P(X × Y) formed by a set of probability measures, in which X and Y are two separable and complete metric spaces with µ ∈ X and υ ∈ Y. From this point forward, for practical reasons, we assume that X = Y ⊂ R^N (with N_1 = N_2 = N ∈ N). In addition, let us consider the mass distributions µ and υ represented in terms of the Dirac delta function as follows:

µ(x) = (1/N) ∑_{l=1}^{N} δ(x − u_l)  and  υ(x) = (1/N) ∑_{l=1}^{N} δ(x − w_l),

in which u_l ∈ Ω_1 and w_l ∈ Ω_2 denote the data points describing µ(x) and υ(x). In this context, we can reformulate the optimization problem in Equation (19) as follows:

W_κ(µ, υ) = min_T (1/N) ∑_{i=1}^{N} ∑_{j=1}^{N} φ_κ(u_i, w_j) T_{i,j},    (21)

subject to

∑_{j=1}^{N} T_{i,j} = 1 ∀i,  ∑_{i=1}^{N} T_{i,j} = 1 ∀j,  T_{i,j} ≥ 0.    (22)

From a practical viewpoint, we notice that solving (21) consists of obtaining an optimal transport plan that links data points from P(X) to the corresponding data points in P(Y) and minimizes the κ-optimal total transport cost W_κ. Although each element of the transport plan T_{i,j} may a priori assume fractional values, a classic result states that the optimal solution values are integers, specifically 0 or 1, when the constraints described in Equation (22) are considered [52,53]. Indeed, obtaining the minimum of W_κ implies solving a combinatorial optimization problem, which can be defined as:

W_κ(µ, υ) = min_{σ ∈ S(N)} (1/N) ∑_{i=1}^{N} φ_κ(u_i, w_{σ(i)}),    (23)

where σ represents a permutation solving the linear sum assignment problem in (21) associated with T, and S(N) denotes the set of permutations of {1, 2, ..., N}. Equation (23) represents the Wasserstein metric in the context of κ-Gaussian statistics. Naturally, the Wasserstein metric based on Kaniadakis κ-statistics inherits the advantages provided by κ-Gaussian statistics. However, in this form, the approach is only valid for comparing probability distributions, which is problematic for geophysical applications like FWI: seismic signals are neither normalized nor positive, unlike probability functions.
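A small numerical sketch of this combinatorial problem can be solved with SciPy's linear-sum-assignment routine; for illustration we take one-dimensional point clouds and use the per-pair κ-objective of the difference u_i − w_j as the ground cost, with the classical-limit scale β = 1/2 (both choices are our assumptions):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def kappa_phi(x, kappa, beta=0.5):
    """Per-pair cost: -ln exp_kappa(-beta x^2), the kappa-objective of a difference."""
    y = -beta * np.asarray(x, dtype=float) ** 2
    if kappa == 0.0:
        return -y
    return -np.log((np.sqrt(1.0 + kappa**2 * y**2) + kappa * y) ** (1.0 / kappa))

def wasserstein_kappa(u, w, kappa=0.3):
    """Eq. (23)-style sketch: find the permutation sigma minimizing the mean kappa-cost."""
    cost = kappa_phi(u[:, None] - w[None, :], kappa)  # N x N cost matrix
    rows, sigma = linear_sum_assignment(cost)         # Hungarian-type solver
    return cost[rows, sigma].mean(), sigma            # the mean supplies the 1/N factor
```

For point clouds containing the same values, the optimal permutation simply matches equal points and the distance vanishes.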

FWI Based on Kaniadakis κ-Gaussian Distribution
In this section, we present the main elements of FWI based on the Kaniadakis κ-Gaussian distribution, the metric explained in Section 2. FWI is a non-linear inverse problem whose main goal consists of inferring a quantitative physical model by comparing modeled waveforms (modeled data) with measured waveforms (observed data) [7]. Due to the computational costs, FWI is often formulated as a gradient-based minimization, in which the model parameters are iteratively updated from an initial model m_0 as follows [8]:

m_{i+1} = m_i + α_i h_κ(m_i),  i = 0, 1, ..., N_iter − 1,    (24)

where m represents the model parameter, α_i > 0 is the so-called step length [54], N_iter represents the number of FWI iterations, and h_κ denotes the descent direction at the ith iteration.
In this work, we employ a non-linear conjugate-gradient optimization method based on the so-called Polak–Ribière–Polyak algorithm. In this regard, the descent direction is defined by [55,56]:

h_κ(m_i) = −∇_m φ_κ(m_i) + β_i h_κ(m_{i−1}),  with  β_i = [∇_m φ_κ(m_i)]^T [∇_m φ_κ(m_i) − ∇_m φ_κ(m_{i−1})] / ‖∇_m φ_κ(m_{i−1})‖²,

where ∇_m φ_κ(m) is the gradient of the κ-objective function. The objective function therefore plays a crucial role in obtaining models via FWI; based on (10), it is defined for our problem as:

φ_κ(m) = −∑_s ∑_r ∫_0^T ln[ exp_κ( −β_κ (Γ_{s,r} ψ_s − d_{s,r})² ) ] dt,    (27)

where Γ_{s,r} ψ_s = d_mod,s,r and d_{s,r} = d_obs,s,r represent the modeled and observed data generated by the seismic source s and recorded at the receiver r, while x ∈ R² and t ∈ [0, T] denote the spatial coordinates and the seismic acquisition time.
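A minimal sketch of this update machinery is given below; the max(0, ·) restart, known as PRP+, is a common safeguard that we assume here, not something stated in the text:

```python
import numpy as np

def prp_direction(grad_new, grad_old, h_old):
    """Polak-Ribiere-Polyak conjugate-gradient descent direction.
    The max(0, .) restart (PRP+ variant) is an assumed safeguard."""
    beta = max(0.0, grad_new @ (grad_new - grad_old) / (grad_old @ grad_old))
    return -grad_new + beta * h_old

def fwi_update(m, alpha, h):
    """One model update of Eq. (24): m_{i+1} = m_i + alpha_i * h_i."""
    return m + alpha * h
```

In practice, the step length alpha is chosen by a line search; the first iteration uses plain steepest descent, since no previous gradient is available.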
It is worth mentioning that the observed data d_{s,r} are registered only at the receiver positions x = x_{s,r}, the positions available and chosen during a seismic survey. The seismic wavefield ψ_s is computed in the entire physical domain for each seismic source s by solving a wave equation. Thus, Γ_{s,r} represents a sampling operator that extracts the wavefield at the receiver r for the source s. In this work, we consider the acoustic case; therefore, the ψ_s are pressure wavefields that satisfy the following model:

(1/c²(x)) ∂²ψ_s(x, t)/∂t² − ∇²ψ_s(x, t) = g_s(t) δ(x − x_s),    (28)

where g_s represents the seismic source signature at the fixed position x = x_s, c is the P-wave velocity model of the medium, and ∇² denotes the Laplacian operator. Thus, the gradient of the κ-objective function (27) with respect to the model parameters is given by:

∇_m φ_κ(m) = ∑_s ∑_r ∫_0^T Υ_κ(Δd_{s,r}) Γ_{s,r} (∂ψ_s/∂m) dt,

where Δd_{s,r}(m, t) = Γ_{s,r} ψ_s(x, m, t) − d_{s,r}(x_{s,r}, t) represents the error (or residual data) and ∂ψ_s/∂m is known as the Fréchet derivative. It is worth emphasizing that FWI problems involve model parameters that typically comprise 10^6 to 10^12 variables (coefficients of the wave equation). In this context, we would need to solve the wave equation once in the forward-modeling process plus at least 10^6 times to compute the gradient of the κ-objective function through Fréchet derivatives, which is infeasible in industrial problems.
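As a rough sketch of the forward problem, a 1D analogue of Equation (28) can be marched with a second-order explicit scheme; there are no absorbing boundaries here, and the grid, time step, and point-source injection are illustrative assumptions (stability requires the CFL condition c·dt/dx ≤ 1):

```python
import numpy as np

def acoustic_fd_1d(c, src, isx, dx, dt, nt):
    """Explicit second-order finite differences for the 1D acoustic wave equation
    (1/c^2) d2psi/dt2 - d2psi/dx2 = g delta(x - x_s), a 1D analogue of Eq. (28)."""
    nx = c.size
    prev, cur = np.zeros(nx), np.zeros(nx)
    for it in range(nt):
        lap = np.zeros(nx)
        lap[1:-1] = (cur[2:] - 2.0 * cur[1:-1] + cur[:-2]) / dx**2
        nxt = 2.0 * cur - prev + (c * dt) ** 2 * lap
        nxt[isx] += (c[isx] * dt) ** 2 * src[it]  # inject the source term
        prev, cur = cur, nxt
    return cur  # snapshot of the pressure wavefield at the final time
```

In a homogeneous medium, an impulse injected at the center produces a wavefield that stays symmetric about the source position, a quick sanity check for the scheme.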

Adjoint-State Method
Since calculating Fréchet derivatives can be computationally prohibitive, we compute the gradient of the κ-objective function using the adjoint-state method, which was developed in the 1970s [57]. There are several ways to formulate the adjoint-state approach, such as techniques based on the augmented Lagrangian method or on Green's functions. However, in this work we use perturbation theory to calculate the gradient efficiently. We notice that the κ-objective function can be written as:

φ_κ(m) = f(ψ(m), m),    (31)

where ψ is a state variable that belongs to the complex space Q and satisfies the following equation of state:

F(ψ, m) = 0,    (32)

in which we suppress the subscript s for the sake of a simplified notation. Suppose we consider an arbitrary variation δm of the model parameter m. In that case, the state variable ψ is disturbed by a variation δψ; consequently, the κ-objective function in (31) is also disturbed. In this way, we have:

δφ_κ = ⟨∂f/∂ψ, δψ⟩_Q + (∂f/∂m) δm,    (33)

where we only consider the first-order terms in δm and δψ, and ⟨·, ·⟩_Q is the inner product in Q.
It is worth emphasizing that the perturbations δm and δψ also induce variations in the equation of state (32). Moreover, assuming that there is a unique solution ψ for any model parameter m, we can state that ψ + δψ is the unique solution of F(ψ + δψ, m + δm) = 0. In other words, for a physical realization ψ (that is, F(ψ, m) = 0), we have the following first-order development in δm and δψ:

(∂F/∂ψ) δψ + (∂F/∂m) δm = 0.    (34)

From the latter equation, the perturbation in the state variable ψ is given by:

δψ = −(∂F/∂ψ)^{−1} (∂F/∂m) δm,    (35)

where a^{−1} denotes the inverse of a. So, replacing the result of Equation (35) in Equation (33), we have an efficient way to compute the gradient of the κ-objective function without the Fréchet derivatives:

δφ_κ = (∂f/∂m) δm − ⟨∂f/∂ψ, (∂F/∂ψ)^{−1} (∂F/∂m) δm⟩_Q.    (36)

On the other hand, to obtain an intuitive way to calculate the gradient, Equation (36) can be rewritten so that, in the inner product in Q, one of the terms varies only with ψ and the other with m. For this, we consider the following adjoint-operator property for any variables x and y:

⟨x, R y⟩ = ⟨R† x, y⟩,    (37)

where R† is the adjoint operator of R, and the superscript † represents the adjoint operation (complex-conjugate transpose). Applying the property (37) to the second term of Equation (36), we obtain:

δφ_κ = (∂f/∂m) δm − ⟨[(∂F/∂ψ)^{−1}]† (∂f/∂ψ), (∂F/∂m) δm⟩_Q.    (38)

Furthermore, if we define a new state variable v belonging to the complex space V as the first term of the inner product in Equation (38),

v = [(∂F/∂ψ)†]^{−1} (∂f/∂ψ),    (39)

we have the following equation of state:

(∂F/∂ψ)† v = ∂f/∂ψ,    (40)

which is known as the adjoint-state equation [58,59]; v is therefore called the adjoint-state variable.
In summary, the gradient of the κ-objective function through the adjoint-state method is given by:

∇_m φ_κ(m) = ∂f/∂m − ⟨v, ∂F/∂m⟩_Q,    (41)

where the adjoint-state variable v is calculated from the adjoint-state equation (40). For our problem, we have φ_κ(m) = ∑_s f(ψ_s(m, t), m). In addition, for any model parameter m, let ψ_s be a solution of the equation of state given in (32), that is, a physical realization. Writing the wave equation (28) as F(ψ_s, m) = m(x) ∂²ψ_s/∂t² − ∇²ψ_s − g_s δ(x − x_s) = 0, with m(x) = 1/c²(x), we obtain the following derivative of the equation of state:

∂F/∂m = ∂²ψ_s/∂t²,    (45)

and the following for the κ-objective function, which does not depend explicitly on m (so that ∂f/∂m = 0):

∂f/∂ψ_s = ∑_r Υ_κ(Δd_{s,r}) δ(x − x_{s,r}).    (46)

Thus, the gradient of the κ-objective function via the adjoint-state method is obtained by substituting the derivatives calculated in Equations (45) and (46) in Equations (40) and (41), as follows:

∇_m φ_κ(x) = −∑_s ∫_0^T v_s(x, t) ∂²ψ_s(x, t)/∂t² dt,    (47)

with v_s being the solution of the adjoint-wave equation given by

m(x) ∂²v_s/∂t² − ∇²v_s = ∑_r Υ_κ(Δd_{s,r}) δ(x − x_{s,r}),    (48)

where m(x) = 1/c²(x). In search of a physical meaning for Equations (47) and (48), let us consider a new state variable given by λ_s(x, t) = v_s(x, T − t). The latter equation then becomes:

m(x) ∂²λ_s/∂t² − ∇²λ_s = ∑_r Υ_κ( Δd_{s,r}(T − t) ) δ(x − x_{s,r}).    (49)

We notice that the adjoint-state variable λ_s is calculated in reverse time from Equation (49), i.e., starting the wave propagation from the final time T back to the initial time 0. For this reason, this adjoint-state variable is commonly called the backpropagated wavefield, Equation (49) is called the adjoint-wave equation, and its right-hand side is named the adjoint source [59]. In this way, the κ-objective-function gradient is calculated efficiently from the cross-correlation of the forward wavefield with the backpropagated wavefield. In this context, computing the gradient via the adjoint-state method for each seismic source requires solving the wave equation only twice: first in the forward modeling and second in the backpropagation modeling. We also point out that the robustness properties of the objective function discussed in Section 2 are central to calculating the gradient.
Indeed, the influence function (18) gives the adjoint source used in the inversion process. The particular classical case κ → 0 provides the residual data as the adjoint source.
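A sketch of the resulting gradient computation: a zero-lag cross-correlation of the backpropagated wavefield with the second time derivative of the forward wavefield, for the m = 1/c² parameterization (the array layout and the finite-difference time derivative are our illustrative assumptions):

```python
import numpy as np

def fwi_gradient_kernel(psi, lam, dt):
    """Cross-correlate the backpropagated wavefield lam with the second time
    derivative of the forward wavefield psi. Arrays are (nt, nx); returns (nx,)."""
    d2psi = np.zeros_like(psi)
    d2psi[1:-1] = (psi[2:] - 2.0 * psi[1:-1] + psi[:-2]) / dt**2  # interior samples only
    return np.sum(lam * d2psi, axis=0) * dt  # time integral as a Riemann sum
```

The sign and scaling of the final gradient follow the derivation above; in production codes, the forward wavefield is usually checkpointed or reconstructed to avoid storing it entirely.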

κ-Graph-Space Optimal Transport FWI
From a statistical point of view, non-Gaussian criteria are critical for handling noisy datasets in FWI analysis [9]. In this sense, we also consider the κ-Gaussian-based metric to deal with a challenging issue in FWI called cycle skipping [7]. Cycle skipping occurs when the initial model used in the FWI process is not kinematically accurate or when the analyzed dataset lacks low-frequency content [34]. We therefore consider the criterion based on κ-Gaussian statistics in the context of the OT metric to mitigate the effects of non-Gaussian errors and cycle-skipping issues. However, let us remind the reader that the OT metric measures the distance between probability distributions, which is incompatible with a direct comparison between seismic signals, as discussed earlier. To work around this incompatibility, in this work we represent the non-normalized and oscillatory waveforms in the graph space [47]. Graphs are mathematical structures formed by ordered pairs of disjoint sets (V, E), where V denotes the set of vertices and E the set of edges connecting paired vertices [60].
Hence, we discretize a waveform d(t) as an ensemble of ordered pairs {(t_i, d_i) ∈ R²; i = 1, 2, ..., N} with d_i = d(t_i). The graph-transformed representation of a discretized waveform d = {d_i; i = 1, 2, ..., N} is then defined as:

G: R^N → D(R²),  d ↦ d_G,

where G denotes the graph transformation, d_G(y, t) is the graph-transformed waveform, and D(R²) is a probability space on R². The graph-transformed waveform is defined as:

d_G(y, t) = (1/N) ∑_{i=1}^{N} δ(t − t_i) δ(y − d_i),

where y is associated with the waveform amplitude. In this way, waveforms are represented by normalized and positive quantities. However, in contexts like FWI one needs to differentiate the waveforms (as explained in the previous sections), and the Dirac delta function is not differentiable. For this reason, we consider a smoothed graph transformation in which the Dirac deltas are represented by κ-Gaussian distributions (10). Thus, the κ-graph-transformed representation of a discretized waveform is given by [61]:

G_κ: R^N → C^∞(R, R_+^*),

where C^∞(R, R_+^*) represents the set of strictly positive and infinitely differentiable functions. In this context, the graph-space κ-OT objective function is defined as:

C_κ(d_mod, d_obs) = W^{G_κ}_κ( d^{G_κ}_mod, d^{G_κ}_obs ),

where d^{G_κ}_mod,i = (t_i, d_mod,i) and d^{G_κ}_obs,i = (t_i, d_obs,i), and C_κ represents the κ-Wasserstein criterion applied to the graph-transformed seismic data. The κ-Wasserstein distance is then computed via the following minimization task:

W^{G_κ}_κ( d^{G_κ}_mod, d^{G_κ}_obs ) = min_{σ ∈ S(N_t)} ∑_{i=1}^{N_t} φ_κ( d^{G_κ}_mod,i , d^{G_κ}_obs,σ(i) ),    (55)

where N_t denotes the number of time samples of each waveform, σ is the permutation solving the linear sum assignment problem in (21) associated with a transport map T, and S(N_t) denotes the set of permutations of {1, 2, ..., N_t}. For simplicity, we multiply the κ-Wasserstein distance (23) by the scalar N in the latter equation; indeed, optimizing W_κ is equivalent to optimizing the product N × W_κ. Equation (55) represents the FWI objective function based on κ-OT, namely the κ-GSOT-FWI objective, in reference to κ-Graph-Space Optimal Transport FWI.
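The graph-space comparison can be sketched as follows: each trace becomes a 2D point cloud (t_i, d_i) and the assignment is solved over a ground cost on R². For simplicity we use a quadratic cost with an amplitude-balancing weight eta (both our assumptions; the text uses the κ-objective as the cost):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def gsot_distance(d_mod, d_obs, t, eta=1.0):
    """Graph-space OT distance between two discretized traces.
    Points are (t_i, d_i) in R^2; eta weighs amplitude misfit against time misfit."""
    cost = (t[:, None] - t[None, :]) ** 2 \
         + eta * (d_mod[:, None] - d_obs[None, :]) ** 2
    rows, sigma = linear_sum_assignment(cost)  # optimal permutation over R^2 points
    return cost[rows, sigma].sum(), sigma
```

For identical traces the optimal permutation is the identity and the distance vanishes; for time-shifted traces the plan pairs samples across time, which is the mechanism that tames cycle skipping.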
The gradient of the κ-GSOT objective function (55) with respect to the model parameters follows the adjoint-state formalism of Section 4, with the adjoint source now given by the derivative of W^{G_κ}_κ with respect to the modeled samples, while N_s, N_r, and N_t represent the number of seismic sources, receivers, and time samples used in the acquisition of the seismic data. The statistical interpretation of the residual data (error) associated with the κ-Gaussian statistics is preserved in the κ-GSOT-FWI case. The critical difference is that, in the approach without OT, the waveforms are compared sample by sample, whereas in the κ-GSOT-FWI approach the waveforms are analyzed more completely: each time sample of the observed data is compared with all the time samples of the modeled data according to an optimal assignment given by the permutation solution σ. Figure 3 shows a flow chart of the FWI algorithm, an iterative process in which model updates are computed with respect to the previous model, as described by Equation (24). The first step, called Initial Setup, consists of introducing the input variables, i.e., the initial model and the parameters of the seismic acquisition (the positions of the sources and receivers, the seismic source signature, and the acquisition time). After configuring and organizing all the input variables of the FWI algorithm, modeled wavefields are obtained in the forward problem through the numerical solution of the wave Equation (28), employing the finite-difference method [62]. Then, a sampling operation (Γ_{s,r}) is carried out on the modeled wavefields ψ_s to obtain the modeled data (d_mod,s,r = Γ_{s,r} ψ_s), extracting the wavefields at the positions of the seismic-acquisition receivers. Afterwards, the objective-function gradient is obtained through the adjoint-state method described in Section 4.2 and used to update the model following Equation (24).
Finally, the FWI algorithm checks whether the optimization process reached the pre-defined stopping criteria (which, in our case, was the maximum number of iterations equal to 50). As long as the criteria are not met, the cycle is repeated. If so, the iterative process is interrupted, and the resulting model is the one that minimizes the difference between modeled and observed data.

Numerical Experiments
To demonstrate how the κ-GSOT-FWI deals with non-Gaussian noise and cycle-skipping issues, we carried out numerical examples involving a 2D acoustic time-domain FWI to estimate a P-wave velocity model in a typical Brazilian pre-salt oil region. Such an Earth model, namely Chalda, represents a region with approximate dimensions of 16 km in lateral distance by 7 km in depth, as depicted in Figure 4a. Our problem has 720,484 unknown variables because we discretize the Chalda model on a regular grid with 12.5 m spacing, generating 562 and 1282 grid cells in the vertical and horizontal directions, respectively. In all numerical experiments, we consider a seismic survey comprising 161 seismic sources equally spaced every 75 m at 12.5 m depth. We employ a Ricker wavelet as the seismic source, which is mathematically described by:

f(t) = (1 − 2π² µ_p² t²) exp(−π² µ_p² t²),

in which µ_p represents the peak frequency (the maximum energy in the frequency spectrum). Moreover, to simulate a sparse node acquisition, namely an ocean-bottom-node survey, we take into account 21 receivers deployed on the ocean floor at 400 m intervals. We consider the Chalda model depicted in Figure 4a as the benchmark (or true model). Thus, we generate a seismic dataset by considering the true model, the acquisition geometry, and the finite-difference method with second- and eighth-order approximations in time and space, respectively. In order to simulate an infinite medium, we implement perfectly-matched-layer absorbing boundaries [63]. We consider 7 s of seismic acquisition time at a sampling rate of 2 ms. In addition, to simulate a realistic case, we also apply a high-pass filter to the seismic dataset to remove energy below 2.5 Hz.
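The Ricker source signature above translates directly into code:

```python
import numpy as np

def ricker(t, f_peak):
    """Ricker wavelet f(t) = (1 - 2 pi^2 f^2 t^2) exp(-pi^2 f^2 t^2),
    with peak frequency f_peak in Hz."""
    a = (np.pi * f_peak * np.asarray(t, dtype=float)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)
```

The wavelet peaks at 1 for t = 0, is symmetric in time, and crosses zero at t = ±1/(√2 π f_peak).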
In the FWI experiments, we consider two scenarios involving different initial models to confirm the significance of our proposal. In the first one, we consider an initial model similar to the true model, which is depicted in Figure 4b. We produce such a velocity model by weakly smoothing the true model, applying a Gaussian filter with a standard deviation of 250 m. The idea of this scenario is to simulate a seismic imaging process starting from a kinematically accurate model. We call this model the Good Model. In contrast, we produce the second initial model, referring to the second scenario, by applying a more severe Gaussian filter with a standard deviation of 750 m. We call this model the Bad Model. We notice that the Bad Model lacks the main structures of the true model, particularly in the pre-salt oil region, as depicted in Figure 4c. Since the Bad Model is kinematically inaccurate, it generates cycle-skipped data [34].
For each initial-model scenario, we conduct time-domain FWI by applying the classical FWI based on Gaussian statistics and the κ-GSOT objective function (55) in the classical limit κ → 0 and for κ = 0.1, 0.3, 0.5, and 0.6. We consider 50 FWI iterations in all numerical experiments. To avoid the so-called inversion crime, we perform the forward modeling using a different algorithm than the one used to generate the observed dataset. In this regard, our algorithm solves the forward problem using a finite-difference scheme with second- and fourth-order approximations in time and space on a regular grid with 25 m spacing. In addition, we consider two different circumstances concerning the type of noise in the seismic dataset. First, we consider a dataset contaminated by Gaussian noise with a signal-to-noise ratio (SNR) of 20. In the second circumstance, we consider non-Gaussian noise: the dataset is polluted by Gaussian noise with an SNR of 20 plus a collection of spikes (erratic data, or outliers) with different amplitudes. In this regard, 10% of the time samples were contaminated by outliers at randomly drawn locations. The spikes have intensities ranging from 5P to 15P multiplied by the original waveform amplitude, where P is a standard normal random variable.
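One possible reading of this contamination recipe is sketched below; interpreting the SNR in decibels and the exact way the 5P-to-15P spike amplitudes are drawn are our assumptions, since the text leaves both open:

```python
import numpy as np

def contaminate(d, snr_db=20.0, spike_frac=0.10, seed=0):
    """Add Gaussian noise at a target SNR (interpreted here in dB) plus erratic
    spikes: a fraction of samples is scaled by (5..15) x P, with P standard normal."""
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(np.mean(d**2) / 10.0 ** (snr_db / 10.0))
    noisy = d + rng.normal(0.0, sigma, d.shape)
    n_spk = int(spike_frac * d.size)
    idx = rng.choice(d.size, n_spk, replace=False)          # random spike locations
    amp = rng.uniform(5.0, 15.0, n_spk) * rng.standard_normal(n_spk)
    noisy.flat[idx] += amp * d.flat[idx]                    # amplitude-proportional spikes
    return noisy
```

Setting spike_frac to zero recovers the purely Gaussian first circumstance.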
We perform our numerical simulations using a computer hosting a quad-core processor at 3.50 GHz and 256 GB of RAM. Each FWI iteration takes approximately 6 min: 71.4% is associated with calculating the gradient of the objective function using the adjoint-state method described in Section 4.2, 26.7% with the forward-modeling process (i.e., generating the modeled data by numerically solving Equation (28)), 1.3% with solving the combinatorial optimization problem in (55), and 0.6% with the rest of the algorithm (I/O initialization and initial set-up loading). Figures 5 and 6 show the resulting FWI P-wave models starting from the Good Model for the Gaussian and non-Gaussian noise cases, respectively. From a visual inspection, when only Gaussian noise is considered, all resulting models are satisfactory (Figure 5), since they are very similar to the true model (Figure 4a) regardless of the κ-value. Such successful results are due to the weak Gaussian noise in the observed data combined with a kinematically accurate initial model (Figure 4b).
To quantify the quality of the resulting models, we use Pearson's correlation coefficient,

R = cov(c_true, c_inv) / [ std(c_true) std(c_inv) ],

and the normalized root-mean-square (NRMS) error, the root-mean-square of the model difference c_inv − c_true normalized by that of the true model, where c_true and c_inv are the true and the resulting models, while cov(·) and std(·) denote covariance and standard deviation, respectively. The R-value ranges from −1 to 1, with −1 representing, in this context, a wrong resulting model, while 1 represents a perfect resulting model. The NRMS-value ranges from 0 (perfect resulting model) to ∞ (wrong resulting model). Table 1 summarizes the comparative metrics between the true model and the P-wave velocity models resulting from the first scenario by analyzing data contaminated only by Gaussian noise. In this table, we can see that all resulting models have a low error and are strongly correlated with the true model (R ≥ 0.8, following the strength scale suggested by ref. [64]). Table 1. The comparative metrics between the true model and the resulting models, depicted in Figure 5, from the first scenario in the Gaussian noise case. R represents Pearson's correlation coefficient, while NRMS represents the normalized root-mean-square error. However, when non-Gaussian noise is considered, the classical approach fails as expected (Figure 6a). Such a wrong model results from the classical approach being based on Gaussian statistics and sensitive to cycle-skipping issues. Figure 6b shows the resulting model from the classical GSOT-FWI, which is also based on Gaussian statistics; in this case, the Wasserstein metric was able to mitigate the effects of the outliers, building a satisfactory model. Moreover, as the κ-value increases (which means a more significant deviation from Gaussian behavior), the κ-GSOT-FWI models present a better resolution (Figure 6c–f), especially in the deeper regions of the analyzed area. Although all κ-GSOT-FWI models are strongly correlated with the true model, the case κ = 0.6 has the highest Pearson's coefficient and the smallest NRMS error, as summarized in Table 2. Table 2.
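The two comparison metrics above can be sketched as follows (the NRMS normalization by the RMS of the true model is one common convention that we assume here; the paper's exact definition may differ):

```python
import numpy as np

def pearson_r(c_true, c_inv):
    """Pearson's correlation coefficient R between two flattened velocity models."""
    return np.corrcoef(c_true.ravel(), c_inv.ravel())[0, 1]

def nrms(c_true, c_inv):
    """Root-mean-square model error normalized by the RMS of the true model."""
    return np.sqrt(np.mean((c_inv - c_true) ** 2) / np.mean(c_true**2))
```

Note that R is insensitive to affine rescalings of the recovered model, while NRMS is not; reporting both, as the tables do, therefore gives complementary information.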
The comparative metrics between the true model and the resulting models, depicted in Figure 6, from the first scenario in the non-Gaussian noise case. R represents Pearson's correlation coefficient, while NRMS represents the normalized root-mean-square. Figures 7 and 8 show the P-wave models resulting from FWI starting from the Bad Model for the Gaussian and non-Gaussian noise cases, respectively. From a visual inspection, it is noticeable that the classical FWI approach fails when the initial model is kinematically inaccurate, regardless of whether the data are polluted by Gaussian or non-Gaussian noise, as depicted in Figures 7a and 8a. In contrast, the FWI based on the κ-GSOT approach generates satisfactory models when Gaussian noise is considered, regardless of the κ-value (Figure 7b-f). Again, as the κ-value increases, the resulting models (Figure 7) get closer to the true model (Figure 4a), as endorsed by the statistical metrics summarized in Table 3.
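The two comparative metrics used above can be sketched in a few lines of Python. The Pearson formula matches the definition given in the text; for NRMS we normalize the RMS error by the RMS of the true model, which is one common convention (the paper's exact normalization may differ).

```python
import numpy as np

def pearson_r(c_true, c_inv):
    # R = cov(c_true, c_inv) / (std(c_true) * std(c_inv)), ranging from -1 to 1
    return np.corrcoef(c_true.ravel(), c_inv.ravel())[0, 1]

def nrms(c_true, c_inv):
    # RMS of the model error normalized by the RMS of the true model:
    # 0 for a perfect resulting model, growing without bound as the error grows
    return np.sqrt(np.mean((c_inv - c_true) ** 2) / np.mean(c_true ** 2))
```

A perfect resulting model gives R = 1 and NRMS = 0, while an anti-correlated model gives R = −1.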
Finally, in the second scenario with non-Gaussian noise, the resulting models are drastically affected by the outliers and by the poor quality of the initial model, as depicted in Figure 8. However, when the κ-GSOT-based objective function is applied, the large geological structures of the true model are reconstructed regardless of the κ-value. In particular, the case κ = 0.6 reveals a P-wave velocity model (Figure 8f) that is quite accurate and comparable to the true model (Figure 4a). Likewise, the case κ = 0.6 generated the model closest to the true model, as summarized in Table 4. Indeed, in all numerical tests, the κ-GSOT-FWI with κ = 0.6 generated accurate velocity models, leading to accurate parameter estimations. Table 3. The comparative metrics between the true model and the resulting models, depicted in Figure 7, from the second scenario in the Gaussian noise case. R represents Pearson's correlation coefficient, while NRMS represents the normalized root-mean-square. Table 4. The comparative metrics between the true model and the resulting models, depicted in Figure 8, from the second scenario in the non-Gaussian noise case. R represents Pearson's correlation coefficient, while NRMS represents the normalized root-mean-square. Figure 9 shows the normalized κ-GSOT objective function decay for all numerical tests, in which panels (a) and (b) refer to the first scenario, while panels (c) and (d) correspond to the second scenario. The left column refers to the case in which Gaussian noise is considered, and the right column to the non-Gaussian noise case. The convergence curve of the classical objective function is represented by the solid black line in Figure 9. We notice that the classical objective function decays monotonically only in the most straightforward situation, where the initial model is the Good Model and the data are contaminated by Gaussian noise (Figure 9a).
In this case, the classical approach is the most suitable because its convergence rate is higher than that of our proposal, in addition to generating more accurate models (as summarized in Table 1). In cases where the noise is non-Gaussian or when the inversion process starts from the Bad Model, our proposal with κ = 0.6 exhibits a higher objective function decay rate (see red curves in Figure 9b-d), reconstructing P-wave velocity models closer to the true model, as summarized in Tables 2-4.
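The advantage of OT-based misfits in the cycle-skipping regime can be illustrated with a toy 1-D example. This is only a sketch of the rationale, not the paper's graph-space implementation: the non-negative pulse, its width, and the CDF-based 1-Wasserstein formula are our own illustrative choices (actual seismic traces are oscillatory and signed, which is precisely why the paper moves to the graph space).

```python
import numpy as np

t = np.linspace(0.0, 1.0, 501)
dt = t[1] - t[0]

def pulse(t0, width=0.02):
    # Non-negative pulse, standing in for a trace already mapped to a density
    return np.exp(-((t - t0) / width) ** 2)

def w1(p, q):
    # 1-Wasserstein distance between two 1-D densities via their CDFs:
    # W1 = integral of |F_p - F_q| over t
    cp = np.cumsum(p) / p.sum()
    cq = np.cumsum(q) / q.sum()
    return np.sum(np.abs(cp - cq)) * dt

obs = pulse(0.30)
for shift in (0.05, 0.10, 0.20, 0.40):
    mod = pulse(0.30 + shift)
    l2 = np.sum((mod - obs) ** 2) * dt  # least-squares (classical) misfit
    print(f"shift={shift:.2f}  L2={l2:.4f}  W1={w1(mod, obs):.4f}")
```

Once the pulses no longer overlap, the least-squares misfit saturates to a constant (it loses sensitivity to the time shift, the source of cycle skipping), whereas W1 keeps growing linearly with the shift, preserving a useful gradient direction.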

Final Remarks
In this work, we have examined the applicability of an objective function based on graph-space optimal transport and Kaniadakis κ-Gaussian statistics in the FWI context. In particular, we have analyzed the robustness of our proposal in mitigating two critical problems in seismic imaging via FWI, associated with cycle-skipping issues and the non-Gaussian nature of the errors. We have set up an objective function by employing the probabilistic maximum-likelihood method for computing the most probable state using a κ-Gaussian distribution. Furthermore, we have formulated the FWI in a relaxed version of the optimal transport problem, known as the Kantorovich-Rubinstein metric or Wasserstein distance. Accordingly, we have considered the graph of the seismic data rather than the original data, because the optimal transport framework is predicated on the idea that the compared entities adhere to the probability axioms. We named our proposal the κ-Graph-Space Optimal Transport FWI (or κ-GSOT-FWI, for short).
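For reference, the κ-Gaussian at the heart of the proposal is built on the Kaniadakis κ-exponential, which can be sketched as follows. This is a minimal illustration: the inverse-scale parameter beta and the omitted normalization constant are placeholders, not the exact objective function derived in the paper.

```python
import numpy as np

def kappa_exp(x, kappa):
    # Kaniadakis kappa-exponential: (sqrt(1 + k^2 x^2) + k x)^(1/k),
    # which recovers the ordinary exponential in the limit kappa -> 0
    x = np.asarray(x, dtype=float)
    if kappa == 0.0:
        return np.exp(x)
    return (np.sqrt(1.0 + kappa**2 * x**2) + kappa * x) ** (1.0 / kappa)

def kappa_gaussian(x, kappa, beta=1.0):
    # Unnormalized kappa-Gaussian density exp_kappa(-beta * x^2);
    # for kappa > 0 its tails decay as a power law instead of exponentially
    return kappa_exp(-beta * np.asarray(x, dtype=float) ** 2, kappa)
```

The power-law tails for κ > 0 (e.g., κ = 0.6) assign non-negligible probability to large residuals, which is what makes the κ-based objective function tolerant of outliers compared with its Gaussian counterpart.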
The Brazilian pre-salt case study showed how the κ-GSOT-FWI can be employed to deal with flawed initial models and non-Gaussian noise. The findings demonstrated that the classical approach is ineffective in producing accurate physical models when the initial model is crude or the observed waveforms are contaminated by non-Gaussian errors. However, when the initial model is kinematically accurate and the data well-behaved, the classical approach is the best alternative in terms of computational cost. The results also revealed that the κ-GSOT-FWI lessens the impact of phase ambiguity and non-Gaussian errors on the waveform inversion, demonstrating that our proposal is a powerful way to deal with non-linear inverse problems related to wave propagation. Moreover, we notice that the κ-GSOT-FWI produces more accurate models than those produced by classical approaches, leading to a notable improvement in objective function convergence. Additionally, our numerical experiments demonstrated that a more significant deviation from Gaussian behavior (which in our applications was typified by the κ = 0.6 case) results in a more accurate P-wave velocity model. However, our proposal depends on the choice of a hyperparameter, which demands special investigation into how to obtain it in a real-world application. This issue should be examined in future work.
From a practical point of view, extensive and arduous data processing is usually required to engineer a good initial model that alleviates phase-ambiguity issues and to eliminate erratic data points. In this context, the κ-GSOT-FWI reduces the reliance on human subjectivity, which is appealing for automated techniques to analyze, for instance, recent big datasets. Thus, the κ-OT-based approach has enormous potential for dealing with modern data-centric problems. As a perspective, we intend to test our proposed methodology on field data and evaluate its robustness from several initial conditions. Finally, we underline how readily our concept may be applied to a wide variety of inverse problems, ranging from estimating critical exponents of power-law distributions to modern artificial intelligence applications.