Optimizing Finite-Difference Operator in Seismic Wave Numerical Modeling

The finite-difference method is widely used in seismic wave numerical simulation, imaging, and waveform inversion. In this method, the differential operator is replaced approximately by a finite-difference operator, which can be obtained by truncating the spatial convolution series. The properties of the truncating window function, such as the main and side lobes of its amplitude response, determine the accuracy of the finite difference, which in turn significantly affects the seismic imaging and inversion results. Although numerical dispersion is inevitable in this process, it can be suppressed more effectively by using higher-precision finite-difference operators. In this paper, we use the krill herd algorithm, in contrast with the standard PSO and CDPSO (a variant of PSO), to optimize the finite-difference operator. Numerical simulation results verify that the krill herd algorithm performs well in improving the precision of the differential operator.


Introduction
Seismic wave numerical modeling is a widely used method to simulate the propagation of seismic waves in a known geological landscape. So far, it has been used in seismic exploration and seismology, e.g., earthquake disaster prediction [1], seismic performance assessment of roads, buildings, bridges, and other materials [2,3], seismic oceanography [4], ship detection, and identification [5].
The wave equation is of fundamental importance for describing the propagation of seismic waves [6]. Seismic wave numerical modeling methods based on the wave equation include the finite-difference (FD) method [7], the finite element method [8,9], and the pseudospectral method [10]. The FD method, first presented in 1968 [7], simulates the propagation of seismic waves in various media of the earth and thus can help predict what happens during the propagation process [11]. Compared to other techniques, the FD method is more efficient and more easily implemented [6].
Generally, the wave equation governing seismic wave propagation is a partial differential equation. The FD method transforms the continuous problem into a discrete one, and one can obtain an approximate solution of the original wave equation by solving the difference equation. As the pseudospectral method can be considered the highest-order finite-difference scheme [12], we calculate the finite-difference operator by truncating the convolution series of the pseudospectral method with a window function. However, the discretization of the wave equation inevitably introduces numerical dispersion, which deteriorates the accuracy of seismic wave simulation. The window function method and the optimization method are two essential ways to suppress numerical dispersion.
The size of the difference grid plays an influential role in the modeling precision. Although refined difference grids offer better performance, coarse difference grids are sometimes preferred to reduce the computational cost. For this reason, choosing a suitable window function is demanding in practice. Traditional window functions include the Hanning [13], Gaussian [14], and Kaiser [15] windows. In the past decades, new window functions have also been reported in the literature. Wang et al. [16] proposed a Chebyshev auto-convolution combined window. Zheng et al. [17] developed a cosine-modulated Chebyshev window. Wang and Hong [18] constructed a new window function by combining the cosine function with the original window function.
Optimizing the difference operator is another strategy. In this category, the Newton method, the least squares method, and the swarm intelligence algorithm are frequently used. For example, Zhang and Yao [19] applied the simulated annealing algorithm to search for the operator that best satisfies the constraints. Z.Y. Wang et al. [20] used the improved particle swarm optimization (PSO) algorithm to optimize the finite-difference operator. Ren et al. [21] developed the least-squares method to optimize the staggered-grid finite-difference operator.
Both the window function method and the optimization method aim to find the optimal window function coefficients. The difference is that window functions are directly designed, while optimization methods generate the optimal coefficients through iteration. In this paper, we use nature-inspired optimization (NIO) algorithms to find the fittest finite-difference coefficients. NIO algorithms are widely used to solve optimization problems in continuous and discrete spaces. We implement the krill herd algorithm (KH), a recently proposed NIO algorithm, together with PSO and the center-decenter PSO (CDPSO), a variant of PSO. We derive the finite-difference operator through sinc-function interpolation and Taylor series expansions, and we define the objective function as the error function of the first and second derivatives. Compared with Wang's Chebyshev auto-convolution combined window, all the NIO algorithms show outstanding performance in seismic wave numerical simulation. Among them, KH exhibits greater stability at most orders.
The rest of the paper is structured as follows. Section 2 introduces the finite-difference method and deduces the objective function of the optimization method. The seismic numerical modeling and the three NIO algorithms are also introduced in this section. Section 3 presents the numerical simulations. Section 4 concludes the paper.

Finite-Difference Method
We start with a brief introduction to the FD method. Based on the studies of Chu and Stoffa [22] and Bai et al. [23], we describe the derivation of FD operators as follows. First, the FD operators can be derived from the sinc interpolator. According to the Nyquist sampling theorem, a band-limited continuous signal f(x) can be reconstructed by interpolation with the normalized sinc function:

f(x) = ∑_{n=−∞}^{∞} f(n∆x) sinc((x − n∆x)/∆x),  sinc(u) = sin(πu)/(πu),  (1)

where ∆x is the sampling interval. Thus, 1/∆x is the sampling frequency, and f(n∆x) is the infinite uniformly spaced discrete sequence of f(x). Roughly speaking, the FD method represents waveforms with discrete samples. The first and second derivatives of f are derived from Equation (1); the derivation can be simplified by letting x = 0 (see Appendix A.1).
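The interpolation in Equation (1) can be sketched numerically. The toy sine signal, sampling rate, and evaluation window below are illustrative choices, not values from the paper:

```python
import numpy as np

def sinc_reconstruct(samples, dx, x):
    """Reconstruct a band-limited signal at points x from its uniform
    samples f(n*dx) via sinc interpolation (Equation (1))."""
    n = np.arange(len(samples))
    # np.sinc is the normalized sinc: sin(pi*t) / (pi*t)
    return np.array([np.sum(samples * np.sinc((xi - n * dx) / dx)) for xi in x])

# Toy signal: a 1 Hz sine sampled at 10 Hz (well above its Nyquist rate).
dx = 0.1
t = np.arange(200) * dx
samples = np.sin(2 * np.pi * t)

# Reconstruct between sample points, away from the truncation edges.
x = np.linspace(5.0, 10.0, 101)
recon = sinc_reconstruct(samples, dx, x)
err = np.max(np.abs(recon - np.sin(2 * np.pi * x)))
print(err)   # small: the truncated series closely recovers the signal
```

In practice the series must be truncated, which is exactly what the window function in the next step controls.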
We use a window function w(n) with length N + 1 to truncate the derivative functions, Equations (2) and (3). N is an even number, which is related to the accuracy. Windowing is a weighting process that truncates a function of infinite length to [−N/2, N/2]. The FD operators can then be calculated as in Equations (4) and (5). Chu and Stoffa [22] pointed out that w(n) in the conventional FD operator is a binomial constant coefficient, and they referred to the conventional w(n) as a binomial window. One way to obtain a better FD operator is to design a more suitable w(n), which requires more control over the parameters. For the optimization algorithms, we need to set an objective function. Considering the coefficients of Equations (4) and (5) as a whole, we write b_n = −(1/n) cos(nπ) w(n) and c_n = −(2/n²) cos(nπ) w(n), where b_n and c_n are the finite-difference coefficients for the first and second derivative, respectively. They are related by c_n = 2 b_n / n.
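The coefficient definitions above can be sketched in a few lines. The rectangular window used for the check is an illustrative choice (it recovers the untruncated-sinc coefficients), not a window the paper recommends:

```python
import numpy as np

def fd_coefficients(N, window):
    """Windowed FD coefficients from the truncated sinc series:
    b_n = -cos(n*pi) * w(n) / n   (first derivative),
    c_n = 2 * b_n / n             (second derivative), for n = 1..N/2."""
    n = np.arange(1, N // 2 + 1)
    w = window(n, N)
    b = -np.cos(n * np.pi) * w / n
    c = 2.0 * b / n
    c0 = -2.0 * np.sum(c)        # zero-lag term, from c_{-n} = c_n
    return b, c, c0

# With a rectangular window w(n) = 1 we recover the conventional
# coefficients b_n = (-1)^(n+1) / n.
rect = lambda n, N: np.ones_like(n, dtype=float)
b, c, c0 = fd_coefficients(8, rect)
print(b)   # [1.0, -0.5, 0.333..., -0.25]
```

Swapping `rect` for a tapered window (Hanning, Kaiser, ...) is what the window function method amounts to.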
The FD weights can also be obtained through the Taylor series expansions (Thongyoy and Boonyasiriwat [24]). It reflects the relation between finite-difference operators and the pseudospectral method. The finite-difference operator is formulated as (see Appendix A.2): where f n = f (x + n∆x), b n and c n are defined similarly as before.
Performing the Fourier transform on Equations (7) and (8), the data can be transformed from the conventional spatial and time domains to the frequency-wavenumber domain, which is also known as the F-k domain (see Appendix A.3). We obtain the first derivative in the F-k domain: and the second derivative: where k_x is the wavenumber corresponding to the spatial derivatives ∂/∂x and ∂²/∂x², which is the variable in the F-k domain.
The truncation errors of the first and second derivatives can be expressed as Equations (12) and (13). The FD method replaces the differentials in the wave equation with differences, namely, the difference forms of the derivatives obtained above. To minimize the numerical dispersion introduced by this replacement, we seek the optimal coefficients b_n and c_n that minimize the truncation errors E_1 and E_2. The optimization problem of seismic wave simulation is thus transformed into a coefficient optimization problem. Since b_n and c_n can be derived from each other, we attempt to obtain the optimal b_n using the NIO algorithms. Zhang and Yao [19] expressed the objective function of the optimization method as the requirement that the truncation error stay within a limit ε for all wavenumbers up to k_x^max, where k_x^max is the maximum wavenumber the FD operator can reach and ε is the error limitation. The frequently used symbols are listed in Table 1.

Table 1. List of symbols.
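A sketch of evaluating the first-derivative truncation error E_1 over wavenumber. The spectral form used here, E_1(k_x) = |k_x − (2/∆x) ∑_n b_n sin(n k_x ∆x)|, is our assumption (with unit spacing), consistent with the coefficients b_n = −cos(nπ) w(n)/n reducing to the Fourier series of k_x for a rectangular window:

```python
import numpy as np

def e1(b, kx, dx=1.0):
    """|kx - FD response| of the first-derivative operator in the F-k domain
    (assumed spectral form; see lead-in)."""
    n = np.arange(1, len(b) + 1)
    resp = (2.0 / dx) * (b[None, :] * np.sin(kx[:, None] * n[None, :] * dx)).sum(axis=1)
    return np.abs(kx - resp)

kx = np.linspace(0.0, 0.5, 51)

# Rectangular-window coefficients of two lengths: the longer operator
# tracks kx more closely over the low-wavenumber band.
for m in (5, 50):
    n = np.arange(1, m + 1)
    b = -np.cos(n * np.pi) / n
    print(m, e1(b, kx).max())
```

Minimizing this error over the coefficients b_n is the task handed to the NIO algorithms below.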

Symbol      Meaning
w(n)        window function
E_1, E_2    truncation errors of the first and second derivative
b_n, c_n    FD coefficients of the first and second derivative
ε           error limitation
k_x         wavenumber
k_x^max     max wavenumber
k_cutoff    cutoff wavenumber

Modeling
We use the elastic wave equation to simulate seismic wave propagation. The earthquake hypocenter (source) function is given as follows: where f_peak is the dominant frequency, and i ∈ {0, 1, . . . , n} is the propagation cycle. We set f_peak = 50 Hz.
The maximum iteration number n is calculated by n = t_s/∆t, where t_s is the simulation duration and ∆t is the discrete time interval. Let t_s = 0.3 s and ∆t = 0.75 ms; then, n = 400. We use the homogeneous medium model and the more complex Marmousi velocity model [25] as the basic geological models of propagation. The Marmousi velocity model is an anisotropic medium with very large velocity variations in the horizontal and vertical directions. The grid size of the homogeneous medium model is 200 × 200, and that of the Marmousi model is 767 × 751, as shown in Figure 1. Different media are distinguished by different colors. The earthquake hypocenter is placed at the center, and the receiver is 50 grid points to the right of the center. The grid spacing is set to 10 m.
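As a quick check of the arithmetic:

```python
# Maximum iteration count n = t_s / dt from the simulation settings above.
t_s = 0.3       # simulation duration [s]
dt = 0.75e-3    # time step [s]
n = round(t_s / dt)
print(n)   # 400
```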

Three Optimization Algorithms
We optimize the finite-difference operator by minimizing the difference between the optimized difference operator and the differential operator. A smaller truncation error means lower numerical dispersion. Next, we briefly introduce the three NIO algorithms: PSO, CDPSO, and KH.

Particle Swarm Optimization
PSO is a well-reputed nature-inspired algorithm proposed by Kennedy and Eberhart [26], which mimics bird flocking or fish schooling. In the PSO algorithm, each agent has two attributes: the position and the velocity. The new position of the agent is updated by the velocity. Each agent keeps its current fittest position pbest, and gbest denotes the global fittest position of all agents. The maximum velocity method is used as the update rule [27]. Specifically, the position is updated as

position_{i+1} = position_i + v_{i+1},

where position_i is the position of the agent at the ith iteration, and the velocity is given as

v_{i+1} = w v_i + c_1 r_1 (pbest − position_i) + c_2 r_2 (gbest − position_i),

where w is the inertia weight coefficient; c_1 and c_2 are the individual and social acceleration constants; and r_1 and r_2 are random numbers between 0 and 1.
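The update rules above can be sketched as follows. The sphere objective, search bounds, and vmax value are illustrative stand-ins for the paper's truncation-error objective and its ±0.001 velocity bound:

```python
import numpy as np

rng = np.random.default_rng(0)

def pso(objective, dim, pop=30, iters=500, w=0.8, c1=1.5, c2=1.5,
        vmax=0.1, lo=-1.0, hi=1.0):
    """Minimal PSO with the maximum-velocity rule:
    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v."""
    x = rng.uniform(lo, hi, (pop, dim))
    v = rng.uniform(-vmax, vmax, (pop, dim))
    pbest = x.copy()
    pbest_f = np.array([objective(p) for p in x])
    gbest = pbest[pbest_f.argmin()].copy()
    for _ in range(iters):
        r1 = rng.random((pop, dim))
        r2 = rng.random((pop, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -vmax, vmax)      # maximum-velocity rule
        x = x + v
        f = np.array([objective(p) for p in x])
        improved = f < pbest_f
        pbest[improved] = x[improved]
        pbest_f[improved] = f[improved]
        gbest = pbest[pbest_f.argmin()].copy()
    return gbest, float(pbest_f.min())

# Toy objective (sphere function) standing in for the truncation error E_1.
best, fbest = pso(lambda p: float(np.sum(p ** 2)), dim=4)
print(fbest)   # near 0
```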

Center-Decenter PSO
CDPSO, proposed by Wang [23], is a variant of PSO that avoids the premature convergence and local traps of PSO. It combines two learning strategies, centralized learning and decentralized learning, and each agent switches between them to update its velocity.
In the centralized learning strategy, the velocity at the ith iteration is updated as follows, with the shared coefficients carrying the same meaning as in PSO: where L is a random number representing the number of partial optimal solutions. The velocity in the decentralized learning strategy is updated as follows: where γ is the serial number of the particle, and P is the number of agents in the range of all possible solutions.

Krill Herd Algorithm
The krill herd (KH) algorithm was proposed in 2012 by Gandomi and Alavi [28]. KH is inspired by the herding behavior of krill. The position X_i of the ith krill individual is driven by three main factors: motion induced by the presence of other individuals, N_i; foraging activity, F_i; and random diffusion, D_i:

dX_i/dt = N_i + F_i + D_i.

The induced motion depends on the neighbors within a certain range and on the position of the elite individual. The induced speed N is given as

N_i^new = N^max α_i + ω_n N_i^old,  α_i = α_i^local + α_i^target,

where ω_n is the inertia weight, α_i is the induced direction, α_i^local is the influence of the neighbors, and α_i^target is the influence of the individual with the optimal fitness value. The foraging motion depends on the location of food and on previous experience. The foraging speed F is formulated as

F_i = V_f β_i + ω_f F_i^old,

where ω_f is the inertia weight and β_i is the foraging direction. The physical diffusion is the random movement behavior of the krill population. The diffusion speed D is

D_i = D^max δ,

where δ is a random direction vector.
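A highly simplified sketch of the KH motion model, not the full algorithm: the neighbour and food terms of α_i and β_i are collapsed into attraction toward the current best krill and the best position found so far, and all gain constants (n_max, v_f, d_max, the inertia weights) are illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)

def krill_herd(objective, dim, pop=30, iters=500,
               n_max=0.01, v_f=0.02, d_max=0.005, w_n=0.4, w_f=0.4):
    """Simplified KH: each krill's step sums induced motion N, foraging
    motion F, and random diffusion D (dX/dt = N + F + D)."""
    x = rng.uniform(-1, 1, (pop, dim))
    N = np.zeros((pop, dim))
    F = np.zeros((pop, dim))
    f = np.array([objective(p) for p in x])
    best = x[f.argmin()].copy()
    fbest = float(f.min())
    for _ in range(iters):
        # Induced motion: inertia plus a step toward the current best krill.
        alpha = x[f.argmin()] - x
        N = w_n * N + n_max * alpha
        # Foraging: inertia plus a step toward the best-so-far position.
        beta = best - x
        F = w_f * F + v_f * beta
        # Physical diffusion: a small random walk.
        D = d_max * rng.uniform(-1, 1, (pop, dim))
        x = x + N + F + D
        f = np.array([objective(p) for p in x])
        if f.min() < fbest:
            fbest = float(f.min())
            best = x[f.argmin()].copy()
    return best, fbest

best, fbest = krill_herd(lambda p: float(np.sum(p ** 2)), dim=4)
print(fbest)
```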
We give the flow charts of the above-mentioned algorithms in Figure 2. PSO and CDPSO have similar processes. The difference between them is that CDPSO alters the updating strategy of the velocity.

Simulation Setup
The parameters regarding the seismic wave landscape have been given previously. We let population = 30 and iteration = 500 for PSO and KH, and population = 100 and iteration = 1500 for CDPSO. The coefficients of PSO and CDPSO are assigned as w = 0.8 and c_1 = c_2 = 1.5, and the velocity is bounded between −0.001 and 0.001.
We use the first-order derivative truncation error E_1 in Equation (12) as the objective. Each value of E corresponds to a value of k_x; we denote this one-to-one relation as E(k_x). To make the coefficient convergence plots more demonstrative, this error value E(k_x) is used to count the agents that meet the accuracy requirement. Given the accuracy 5 × 10^(−3), the cost function value (CFV) is determined by Equation (26), where k_cutoff is the cutoff wavenumber, corresponding to the upper limit of k_x. Since k_x ∈ [0, π/∆x] and the sampling interval ∆x is set to π, k_x ∈ [0, 1]. With k_cutoff ∈ [0, 1000], k_x forms the increasing sequence 0, 0.001, ..., k_cutoff/1000. We choose the orders N = 8, 12, 16, 20, 24 with k_cutoff = [0.54, 0.65, 0.74, 0.8, 0.85] × 1000 for the following tests of the three algorithms.
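A sketch of one plausible reading of this setup: the error E(k_x) is evaluated on the grid 0, 0.001, ..., k_cutoff/1000 with ∆x = π, and the cost is the fraction of samples violating the 5 × 10^(−3) tolerance (lower is better). The spectral error form and the fraction-based score are our assumptions:

```python
import numpy as np

def e1(b, kx, dx=np.pi):
    """F-k domain truncation error of the first-derivative operator
    (assumed spectral form, consistent with the coefficient definitions)."""
    n = np.arange(1, len(b) + 1)
    resp = (2.0 / dx) * (b[None, :] * np.sin(kx[:, None] * n[None, :] * dx)).sum(axis=1)
    return np.abs(kx - resp)

def cfv(b, kcut, tol=5e-3, dx=np.pi):
    """Fraction of the wavenumber samples 0, 0.001, ..., kcut/1000 whose
    truncation error exceeds tol; 0 means the accuracy is met everywhere."""
    kx = np.arange(0, kcut + 1) / 1000.0
    return float(np.mean(e1(b, kx, dx) > tol))

# Conventional rectangular-window coefficients at order N = 8,
# scored up to this order's k_cutoff = 540.
n = np.arange(1, 5)
b = -np.cos(n * np.pi) / n
print(cfv(b, kcut=540))
```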

Coefficients Convergence
To test the performance of the algorithms, we use the CFV in Equation (26) to evaluate the fitness of the solutions. The fitness convergence curves of the three NIO algorithms are illustrated in Figure 3. The closer the CFV is to 0, the better the precision of the algorithm. As illustrated in Figure 3, the KH algorithm with 500 generations achieves better accuracy than the CDPSO algorithm with 1500 generations. When the order N = 8, the KH algorithm is slightly better than the PSO algorithm. However, when N = 16, the KH algorithm converges to a good optimum, while PSO and CDPSO do not converge.
We give the first-derivative finite-difference coefficients b_n optimized by the three algorithms in Tables 2-4.

Stability
We calculate the second-derivative coefficients c_n using the formulas from Section 2.1: c_n = 2 b_n / n (n = 1, 2, . . . , N/2) and c_0 = −2 ∑_{n=1}^{N/2} c_n. We then use the absolute errors E_1 (Equation (12)) and E_2 (Equation (13)) to evaluate the stability of the optimized difference operators. The absolute error is illustrated in Figure 4. The curve is flat at the beginning and decreases after the wavenumber reaches about 60. For easier observation, the values are magnified a thousand times (columns 2 and 4). The closer the curve stays to 0, the higher the error stability. The curves fluctuate around 0 and then drop abruptly toward negative infinity at a certain point. PSO performs well at N = 12 and 24 but less well at N = 8 and 16. CDPSO is good at N = 8, but its overall fluctuation is large. The second-derivative error of KH stays between ±0.00075. The KH algorithm shows better stability than PSO and CDPSO: when the percentage of the Nyquist wavenumber is below 40 for the first derivative and below 50 for the second derivative, KH is exceptionally stable.
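The c_n-from-b_n conversion used here can be checked in a few lines; the rectangular-window b_n serve only as illustrative input:

```python
import numpy as np

def second_from_first(b):
    """Second-derivative coefficients from the first-derivative ones:
    c_n = 2*b_n/n for n = 1..N/2, and c_0 = -2*sum(c_n)."""
    n = np.arange(1, len(b) + 1)
    c = 2.0 * b / n
    c0 = -2.0 * np.sum(c)
    return c0, c

# Conventional coefficients b_n = (-1)^(n+1)/n as sample input.
n = np.arange(1, 7)
b = (-1.0) ** (n + 1) / n
c0, c = second_from_first(b)

# The zero-lag term balances the stencil: c0 + 2*sum(c_n) = 0, the discrete
# statement that the second derivative of a constant is zero.
print(c0 + 2 * np.sum(c))   # 0.0
```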

Model Test
We choose the N = 12 finite-difference coefficients of the second derivative, c_n, for the wave field simulation. We take Wang's [16] data as the baseline, denoted as 'source'; they proposed a Chebyshev auto-convolution combined window. The four sets of second-derivative finite-difference coefficients are listed in Table 5. Consider first the wave field simulation in the homogeneous medium. With the earthquake hypocenter obtained from Equation (15), we take the snapshot at 150 ms, as shown in Figure 5. Since the homogeneous medium is isotropic, the entire wave field can be understood by looking at the graph in only one direction; therefore, we stitched the four figures together to facilitate comparison. An operator with less waviness suppresses numerical dispersion better, and its computed parameters are better, too. It is observed that PSO is quite similar to the source, while KH shows the least waviness; KH has significantly suppressed numerical dispersion. We then run the simulation on the Marmousi model shown in Figure 1. Figure 6 gives four wave field snapshots taken at 75 ms intervals.
Due to the complexity of the model and the small differences between the optimized parameters, the obtained wave field information is mixed and difficult to distinguish; it is not obvious which operator is better. Therefore, we plot single traces from the receiver to show the differences. A portion of the trace after the arrival of the main wave is captured and shown in Figure 7. To see the differences more clearly, we zoomed in on the waveform after the arrival of the dominant frequency. CDPSO suppresses numerical dispersion better in the later stages, although its early-stage suppression is not as good as the source data. The overall trend of PSO matches the source data, while KH differs slightly from the source in the late stage. From the waveform at about 625, both KH and PSO achieve better suppression than the source. KH is much better than CDPSO and slightly better than PSO. The differences caused by parameter optimization become less noticeable as the complexity of the model increases. In a nutshell, KH shows the best performance among the three algorithms.

Conclusions
The finite-difference method uses difference formulas to replace the derivatives approximately. We introduced the process of obtaining the FD operator and set the truncation error as the optimization objective. We used the NIO algorithms to optimize the first- and second-derivative coefficients. We compared the performance of the three optimization algorithms (PSO, CDPSO, and KH) in terms of convergence and stability, and we compared them with the Chebyshev auto-convolution combined window in a model test. Compared with the Chebyshev window, KH and PSO perform better. Although we did not perform further tuning, the NIO algorithms still perform well in dispersion suppression. KH is the most stable algorithm among the three, performing well at all orders N. It shows promising prospects for optimizing the finite-difference operator in seismic wave numerical modeling.


Appendix A.1
The first and second derivatives of f can be derived from Equation (1).
When x = 0, sin[(π/∆x)(x − n∆x)] = sin(−nπ) = 0, and the derivatives can be simplified accordingly. Here, x_0 is a differentiable point of f, ∆x is the sampling interval, and O((n∆x)^n) is the approximation error. Next, we calculate the normal derivative formula.
where a_n are constant coefficients. For the derivative calculation, the constant term A should be 0. For the first derivative, the coefficients are b_n; for the second derivative, c_n are the finite-difference coefficients. Letting c_{−n} = c_n, we get c_0 = −2 ∑_{n=1}^{N/2} c_n. The normal derivative formula is then summarized as above, where f_n = f(x + n∆x), N is the order, and c_n and b_n are the FD coefficients. The higher N is, the more accurate the approximation.
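The Taylor-series route can be sketched as a small linear solve. The antisymmetric central stencil, unit spacing, and the choice to match only the first-derivative term are assumptions for illustration:

```python
import numpy as np
from math import factorial

def taylor_fd_first(N):
    """First-derivative weights b_1..b_{N/2} of an antisymmetric central
    stencil f'(x) ~ sum_n b_n (f_n - f_{-n}), from Taylor expansions:
    matching the odd-order terms 2 * sum_n b_n * n^(2k-1) / (2k-1)!
    to the exact derivative gives a linear system (unit spacing)."""
    m = N // 2
    A = np.zeros((m, m))
    for k in range(1, m + 1):              # odd Taylor orders 1, 3, ..., 2m-1
        for j in range(1, m + 1):
            A[k - 1, j - 1] = 2.0 * j ** (2 * k - 1) / factorial(2 * k - 1)
    rhs = np.zeros(m)
    rhs[0] = 1.0                           # keep only the f' term
    return np.linalg.solve(A, rhs)

b = taylor_fd_first(4)
print(b)   # [0.666..., -0.0833...]: the classic 4th-order central weights
```

Unlike the windowed-sinc coefficients, these Taylor-matched weights maximize the formal order of accuracy rather than the usable wavenumber band, which is exactly the trade-off the optimization methods in the paper exploit.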