Article

Resource-Efficient Acoustic Full-Waveform Inversion via Dual-Branch Physics-Informed RNN with Scale Decomposition

1 The School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
2 The School of Resources and Environment, University of Electronic Science and Technology of China, Chengdu 611731, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(2), 941; https://doi.org/10.3390/app15020941
Submission received: 2 December 2024 / Revised: 7 January 2025 / Accepted: 9 January 2025 / Published: 18 January 2025

Abstract
Full-waveform velocity inversion has long been a primary focus in seismic exploration. Full-waveform inversion techniques employing physics-informed recurrent neural networks (PIRNNs) have recently gained significant scholarly attention. However, these approaches demand considerable storage to capture spatiotemporal seismic wave propagation fields and their gradient information, often exceeding the memory capabilities of current GPU resources during field data processing. This study proposes a full-waveform inversion method utilizing a dual-branch PIRNN architecture to effectively minimize GPU resource consumption. The primary PIRNN branch performs forward-wave equation modeling at the original scale and computes the discrepancy between synthetic and observed seismic records. Additionally, a downscaled spatiotemporal PIRNN branch is introduced, transforming the original-scale error into a loss function via scale decomposition, which drives the inversion process in the downscaled domain. This dual-branch design necessitates recording only the spatiotemporal field and gradient information of the downscaled branch, significantly reducing GPU memory requirements. The proposed dual-branch PIRNN framework was validated through full-waveform inversions on synthetic horizontal-layer models and the Marmousi model across various scales. The results demonstrate that this approach markedly reduces resource consumption while maintaining high inversion accuracy.

1. Introduction

Full-waveform inversion (FWI) is a pivotal technique in geophysical exploration, extensively utilized to derive detailed subsurface velocity structures. These structural insights are integral to applications such as resource exploration and seismic hazard analysis. Traditional velocity-inversion methods, including travel-time tomography and ray tracing, rely on limited data, focusing primarily on first-arrival times or specific ray paths. In contrast, FWI leverages the full-waveform information from seismic records, enabling a more comprehensive estimation of subsurface medium parameters. By utilizing the entire seismic wavefield, FWI achieves superior accuracy and detail in reconstructing the Earth’s subsurface properties, representing a significant advancement in characterizing complex geological structures.
The concept of FWI was first introduced in the 1980s by Lailly and Tarantola [1]. It uses wavefield information contained in seismic datasets to reconstruct subsurface parameters by minimizing the discrepancy between recorded observational data (real seismic records) and predicted data (synthetic records derived from an initial model). As a highly nonlinear method, FWI incurs substantial computational costs. Pratt et al. (1998) [2] implemented FWI in the frequency domain, improving computational efficiency; however, its application is constrained by the need for low-frequency data acquisition. Shin and Cha (2008) [3] extended FWI to the Laplace and Laplace–Fourier domains, enhancing imaging capabilities for deep structures but further increasing computational demands. A recurring challenge in FWI is cycle-skipping, which arises when the initial model significantly deviates from the actual subsurface model. To address this, Warner and Guasch (2016) [4] introduced adaptive waveform inversion (AWI), employing a traceable objective function to mitigate cycle-skipping and reduce dependence on low-frequency data, albeit with potential compromises in resolution. To account for the complexity of subsurface media, Pan et al. (2018) [5] enhanced FWI by simultaneously inverting multiple physical parameters (e.g., velocity–density, modulus–density, impedance–density), thereby improving the comprehensiveness of subsurface characterization. However, this approach increased the problem’s complexity and ill-posedness, necessitating additional prior information and computational resources. Further refinements were made by Sun, Alkhalifah et al. (2019) [6], who defined a new objective function using optimal transport theory, enhancing the utilization of low-frequency information. Despite these advancements, FWI continues to face challenges in balancing computational efficiency, accuracy, and reliance on initial models.
In recent years, significant progress in large-scale computing and the rapid development of deep-learning technologies have revolutionized various computational engineering fields. Deep learning has shown remarkable success in structural health monitoring (SHM). For instance, Ta et al. [7] developed a 1D CNN model for concrete stress monitoring using smart aggregate-based impedance signals, demonstrating high accuracy in stress estimation even with noisy data. Similarly, Nguyen et al. [8] proposed an automated bolt-loosening monitoring method by integrating the impedance technique with deep learning, achieving precise identification of bolt locations and loosening degrees with minimal pre-processing requirements. In the field of geophysical exploration, deep-learning techniques have also demonstrated promising applications. Within the FWI domain, traditional deep-learning techniques generally involve three key steps: generating training data, constructing and training deep neural networks, and employing the network for prediction and subsurface structure imaging. For instance, Das et al. [9] utilized convolutional neural networks (CNNs) for seismic impedance inversion. Similarly, Puzyrev and Swidinsky [10] and Moghadas [11] proposed CNN-based one-dimensional deep-learning inversion strategies for electromagnetic and transient electromagnetic induction data. Furthermore, Wang et al. [12] and Liao et al. [13] applied CNNs and enhanced deep belief networks, respectively, to invert 2D MT data. These deep-learning-based techniques are entirely data-driven, relying on training datasets to establish mappings between inputs and outputs. Although these methods simplify the implementation of deep-learning-based inversion, they also present challenges related to interpretability and the significant costs of constructing large-scale datasets and performing the associated computations.
Recent advancements have increasingly focused on physics-informed neural networks (PINNs) and the integration of prior knowledge into FWI as a response to longstanding challenges. Originally introduced by Raissi et al. (2019) [14] for solving complex physical problems, particularly nonlinear partial differential equations, PINNs combine deep learning with traditional scientific computing, offering an innovative framework for addressing both forward and inverse problems. Since their inception, the applications of PINNs have rapidly expanded to various domains. For instance, they have been employed in fluid dynamics for solving the incompressible Navier–Stokes equations (2021) [15] and addressing high-speed flow problems [16]. In seismology, PINNs have gained prominence through applications such as the Earthquake Transformer model developed by Mousavi et al. (2020) [17], which performs earthquake detection and phase identification simultaneously. Additionally, PINNs have demonstrated considerable potential in solving wave equations (2020) [18] and ray tracing problems (2021) [19], underscoring their versatility across a broad spectrum of scientific and engineering challenges. Within the FWI domain, Lu et al. [20] utilized PINNs by incorporating seismic data and prior initial models as inputs for network training, effectively reducing the dependence on extensive datasets. Similarly, Rasht-Behesht et al. [21] proposed a novel method leveraging PINNs to solve FWI problems. This approach employs neural networks to solve the partial differential equations governing acoustic wave propagation, utilizing their mesh-free properties to accommodate diverse boundary conditions. Sun et al. (2020) [22] further extended PINN applications in FWI by developing a theory-guided system that integrates the forward modeling physics of wave equations directly into the training loop.
By employing physics-based data mismatch as the loss function, this system achieves unsupervised deep learning, reframing the process as a physics-guided FWI technique. Compared to purely data-driven methods, PINN-based approaches and FWI methods that incorporate prior knowledge exhibit greater robustness against noise. Furthermore, unlike traditional FWI techniques, PINN-based methods offer a flexible framework for integrating diverse data types and structural prior knowledge, thereby enhancing inversion accuracy and reliability. Their versatility across various wave propagation models makes them particularly advantageous for addressing complex geophysical problems. However, the computational demands of PINN-based methods, especially for large-scale three-dimensional applications, remain a significant barrier to their practical implementation.
To address the challenge of high computational resource consumption, this study introduces a dual-branch physics-informed recurrent neural network vertical seismic profiling (PIRNN VSP) FWI architecture. The central concept involves constructing multiscale branches and facilitating information transfer across these scales. Specifically, the proposed method enables the propagation of information from the original-scale branch to a downscaled branch, effectively performing full-waveform inversion while significantly reducing GPU resource consumption during the inversion process.
The main contributions of this study are as follows:
(1) We propose a novel dual-branch PIRNN architecture for VSP acoustic FWI. This architecture facilitates efficient information transfer and conversion between models at different scales, enabling more accurate and computationally efficient inversion processes.
(2) Our proposed approach incorporates scale decomposition, allowing the transformation of loss information from the original-scale model to a reduced-scale model. This enables PIRNN-based VSP acoustic FWI across multiple scales, significantly enhancing both computational efficiency and inversion accuracy.
(3) We validate the proposed dual-branch PIRNN FWI network through simulation experiments on a homogeneous layered model and the Marmousi model. The results demonstrate that our approach substantially reduces GPU resource consumption while maintaining high inversion accuracy, confirming its effectiveness for practical applications.

2. Theory

2.1. FWI Math Model

As mentioned earlier, FWI is a high-resolution imaging technique that uses the complete seismic waveform information to invert subsurface medium parameters. The core principle of FWI is to estimate these parameters through an iterative optimization process that minimizes the difference between observed and simulated seismic data. This process is fundamentally based on the wave equation, often represented in matrix form as follows [23]:
$$M(x)\,\frac{d^2 u(x,t)}{dt^2} = A(x)\,u(x,t) + s(x,t),$$
where $M$ and $A$ represent the mass and stiffness matrices, respectively. The source term is denoted as $s(x,t)$, while the seismic wavefield is represented by $u(x,t)$, where $x$ denotes the spatial position and $t$ represents time. For a 2D point source, the acoustic wave equation can be expressed as:
$$\nabla^2 u(x,t) = \frac{1}{v^2(x)}\,\frac{\partial^2 u(x,t)}{\partial t^2} + s(x,t)\,\delta(x - x_s),$$
where $u(x,t)$ represents the pressure at position $x$ and time $t$, $v(x)$ denotes the acoustic velocity at position $x$, $s(x,t)$ is the source function, and $\delta(x - x_s)$ is the Dirac delta function, indicating that the energy of the source function is released exclusively at the source location $x_s$. Forward-wave equation modeling is performed using numerical methods to generate simulated data records in the time–space domain. The objective function is defined as the discrepancy between the observed and simulated data. Typically, the L2 norm is employed to quantify this discrepancy, as shown in (3) [24]:
$$J_D = \left\| d_{\mathrm{obs}}(x_r, x_s, t, m) - d_{\mathrm{cal}}(x_r, x_s, t, m_0) \right\|^2,$$
where $m_0$ is the model parameter vector, $d_{\mathrm{obs}}$ represents the observed data obtained from field survey data or simulated data from a true model, $d_{\mathrm{cal}}$ denotes the simulated data obtained by solving the wave equation, and $x_r$ and $x_s$ represent the receiver and source positions, respectively. The method of acquiring observational data $d_{\mathrm{obs}}$ depends on the observation system. For instance, surface wave observation systems deploy multiple receivers on the ground surface to record seismic waves [25], enhancing FWI resolution in shallow layers. This study employs VSP, a specialized seismic exploration technique where seismic receivers are positioned in boreholes, and the seismic source is typically located on the surface [26]. This configuration allows for the direct measurement of seismic waves propagating from the surface to the borehole, providing high-resolution data on subsurface medium properties. In FWI, VSP data offer several advantages, including high signal-to-noise ratio (SNR) near-well measurements, which can serve as initial models or prior constraints. Moreover, VSP data are rich in low-frequency information, which facilitates FWI convergence and enhances inversion accuracy and stability [27,28].
Constructing an appropriate objective function is crucial for enhancing the resolution of FWI, and the integration of additional seismic information into the objective function can significantly accelerate convergence. By defining the objective function $J_D$ for full-waveform inversion, the FWI process is reformulated as an optimization problem:
$$m_{\mathrm{opt}} = \arg\min_{m} J_D,$$
where $J_D$ represents the objective function defined under the L2 norm. FWI is a highly nonlinear and ill-posed inverse problem that typically requires an iterative solution. The gradient of the objective function with respect to the model parameters is generally computed using the adjoint-state method [29]. Subsequently, the model is updated based on the gradient by selecting an appropriate optimization algorithm. Common optimization algorithms include the steepest descent method and the conjugate gradient method, both of which enhance the robustness of FWI [30]. The optimization of the objective function corresponds to the process of updating the model parameters in FWI. Through iterative updates, the value of the objective function is progressively reduced until it converges to a sufficiently small value or satisfies other convergence criteria, yielding the final subsurface model from FWI.
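The iterative model update described above can be sketched in a few lines. The following is a minimal steepest-descent sketch, under the assumption that a routine `grad_fn` (e.g., an adjoint-state implementation, not shown here) supplies the gradient of $J_D$ with respect to the model; the function and parameter names are illustrative, not the authors' implementation:

```python
import numpy as np

def fwi_descent(m0, grad_fn, step=0.1, n_iter=50):
    """Minimal steepest-descent sketch of the FWI model update.

    m0      : initial model parameter vector
    grad_fn : callable returning dJ_D/dm for the current model
              (in practice supplied by the adjoint-state method)
    """
    m = np.asarray(m0, dtype=float).copy()
    for _ in range(n_iter):
        m -= step * grad_fn(m)  # move against the gradient of J_D
    return m
```

For a toy quadratic misfit $J_D = \|m - m_{\mathrm{true}}\|^2$ this loop converges to $m_{\mathrm{true}}$; a production FWI code would replace the fixed step with a line search or a conjugate-gradient/quasi-Newton update, as noted above.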

2.2. PIRNN Forward Modeling

Establishing an accurate seismic forward modeling process is fundamental to FWI. Compared to traditional FWI, PINNs can seamlessly integrate physical constraints, such as the wave equation, into the neural network structure. This integration allows PINNs to leverage the nonlinear fitting capabilities of deep learning while maintaining physical interpretability. Moreover, PINN methods typically require minimal labeled data and can be trained effectively in data-scarce scenarios, which is particularly valuable in geophysical exploration [31]. Additionally, PINNs exhibit superior generalization capabilities, handle complex nonlinear problems, and offer potential advantages in computational efficiency. In this study, we impose physical constraints on neural network-based FWI and employ a first-order pressure–velocity acoustic wave equation within a split-field structure for forward modeling in PIRNNs:
$$\frac{\partial p_x(r,t)}{\partial t} = v^2(r)\,\frac{\partial v_x(r,t)}{\partial x}, \qquad
\frac{\partial p_z(r,t)}{\partial t} = v^2(r)\,\frac{\partial v_z(r,t)}{\partial z}, \qquad
\frac{\partial v_x(r,t)}{\partial t} = \frac{\partial p(r,t)}{\partial x}, \qquad
\frac{\partial v_z(r,t)}{\partial t} = \frac{\partial p(r,t)}{\partial z},$$
where $p_x$ and $p_z$ are the pressure components in the $x$ and $z$ directions, $v_x$ and $v_z$ are the particle velocities in the $x$ and $z$ directions, and $v$ is the acoustic velocity, which serves as the velocity parameter to be inverted. Accurate forward modeling is fundamental to FWI. In this study, we employ high-order finite-difference methods with a staggered-grid format to solve the aforementioned first-order pressure–velocity acoustic wave equation. Furthermore, perfectly matched layer (PML) boundaries are integrated to ensure a high-precision forward modeling process by effectively absorbing outgoing waves and minimizing reflections at the boundaries.
RNNs are a class of artificial neural networks that can be categorized into two types: finite impulse response networks, which use directed acyclic graphs, and infinite impulse response networks, which utilize directed cyclic graphs. Both types can represent dynamic temporal behaviors. Unlike deep neural networks, RNNs leverage their internal state (memory) to process sequential inputs, enabling the current output to be influenced by previous results. This renders RNNs particularly well-suited for time-dependent signal processing tasks, such as text and audio processing. Time-stepping methods are widely employed in numerical simulations of wave equations, which describe the variation of wave fields in both time and space. These methods compute the wave field state at the next time step based on the current state. The core feature of RNNs—their ability to process time-series data—aligns well with these time-stepping solution methods. Specifically, in RNNs, the hidden state carries information from one-time step to the next. This characteristic closely parallels the time-stepping solution method for wave equations, where the hidden state can be interpreted as the wave field information from the previous time step. To further clarify the relationship between the time dependency in wave equation solutions and RNN characteristics, we use v x and p x from the first-order pressure–velocity wave equation as examples. We construct a recursive formula with second-order temporal accuracy and second-order staggered-grid finite-difference accuracy in the time domain:
$$\frac{\partial v_x(r,t)}{\partial t} = \frac{\partial p(r,t)}{\partial x}, \qquad
\frac{\partial p_x(r,t)}{\partial t} = v^2(r)\,\frac{\partial v_x(r,t)}{\partial x}.$$
We discretize the aforementioned equations by defining the pressure components $p_x$ and $p_z$ on the full grid points and the particle velocities $v_x$ and $v_z$ on the half grid points. We expand these at $(k + \tfrac{1}{2})\Delta t$ and $[i_x \Delta x,\, i_z \Delta z]$ [32]:
$$v_x^{k+\frac{1}{2}}\!\left[i_x{+}\tfrac{1}{2},\, i_z\right] = \frac{\Delta t \left( p^{k}[i_x{+}1,\, i_z] - p^{k}[i_x,\, i_z] \right)}{\Delta x} + v_x^{k-\frac{1}{2}}\!\left[i_x{+}\tfrac{1}{2},\, i_z\right],$$
$$p_x^{k+1}[i_x,\, i_z] = v^2[i_x,\, i_z]\,\Delta t\, \frac{ v_x^{k+\frac{1}{2}}\!\left[i_x{+}\tfrac{1}{2},\, i_z\right] - v_x^{k+\frac{1}{2}}\!\left[i_x{-}\tfrac{1}{2},\, i_z\right] }{\Delta x} + p_x^{k}[i_x,\, i_z].$$
From (7), in each time loop, we first use the pressure $p^k$ at time $k\Delta t$ and the particle velocity $v_x^{k-1/2}$ at time $(k-\tfrac{1}{2})\Delta t$ to calculate the particle velocity $v_x^{k+1/2}$ at time $(k+\tfrac{1}{2})\Delta t$. Then, based on the calculated particle velocity $v_x^{k+1/2}$ and the pressure $p_x^k$ at time $k\Delta t$, we compute the pressure $p_x^{k+1}$ for the next time step. In this process, each time the wave equation is advanced, the wavefield data from the previous time steps, $k\Delta t$ and $(k-\tfrac{1}{2})\Delta t$, are utilized. This process enables seismic forward modeling using RNNs, where each layer of the RNN is designed to calculate and store the wavefield information at a specific time step. This structure maps the temporal evolution of the wavefield onto the hierarchical structure of the RNN. Specifically, each layer of the RNN represents the state of the wavefield at a particular time step, including pressure and particle velocity. Within this framework, each RNN layer stores the current wavefield information and receives the output of the previous layer as input. This design effectively simulates the time dependency of the wave equation, where the solution at each time step depends on the state from the previous time step. The aforementioned process is based on PIRNNs for seismic forward modeling, forming the foundation for seismic inversion. To construct an architecture supporting seismic inversion, the PIRNN forward modeling process is shown in Figure 1, where the velocity model is input into the RNN structure, and forward modeling is performed step by step through the SGFD RNN operator to finally obtain the output VSP shot-gather. Here, Input($i$) represents the wavefield information stored in the RNN unit at time step $i$, and Output($i$) represents the simulated shot-gather record generated at time step $i-1$ (recording local wavefield information using a VSP observation system).
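As a concrete illustration of the $x$-component update in Equation (7), one time step of the staggered-grid loop can be sketched as follows. This is a simplified sketch with our own array shapes and names, omitting the PML boundaries, source injection, and $z$-components of the full scheme:

```python
import numpy as np

def sgfd_step_x(p, px, vx, v2, dt, dx):
    """One staggered-grid time step for the x-components in Eq. (7).

    p  : full pressure field p^k on integer grid points, shape (nx, nz)
    px : split pressure component p_x^k, shape (nx, nz)
    vx : particle velocity v_x^{k-1/2} on half grid points, shape (nx-1, nz)
    v2 : squared velocity model v^2 on integer grid points, shape (nx, nz)
    """
    # v_x^{k+1/2} from the spatial derivative of p^k
    vx = vx + dt * (p[1:, :] - p[:-1, :]) / dx
    # p_x^{k+1} from the spatial derivative of v_x^{k+1/2}
    px = px.copy()
    px[1:-1, :] += v2[1:-1, :] * dt * (vx[1:, :] - vx[:-1, :]) / dx
    return px, vx
```

Stacking one such step per RNN layer, with the velocity and split pressure fields carried as the hidden state, reproduces the time-stepping structure described above.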
During this process, the subsurface medium parameters are incorporated as trainable parameters within each SGFD RNN operator. Consequently, the loss calculation includes gradient information with respect to these parameters. This enables the neural network to compute the gradient of the objective function with respect to the subsurface parameters through backpropagation, iteratively updating these parameters. The specific principles underlying this process will be introduced in the next subsection.

2.3. PIRNN Velocity Model Inversion

This section describes the implementation of seismic inversion based on PIRNNs, using the inversion of velocity parameters as an example. The process begins by specifying an initial velocity model, which is incorporated as a trainable parameter within the RNN network [22]. Next, an objective function $J_D$ is defined to quantify the discrepancy between the forward-modeled seismic data and the observed data. The L2 norm is chosen to measure this difference:
$$J_D = \sum_{N_s} \sum_{T} \left\| d_{\mathrm{cal}} - d_{\mathrm{obs}} \right\|^2.$$
Here, $N_s$ and $T$ represent the total number of seismic sources and time steps in forward modeling, respectively. $d_{\mathrm{cal}}$ denotes the shot-gather record obtained through forward modeling using the SGFD RNN operator based on the velocity parameter $v$, while $d_{\mathrm{obs}}$ represents the observed data, i.e., the actual record. The objective of FWI is to minimize the defined objective function. Within the deep-learning framework, gradient-based optimizers such as SGD or Adam can be utilized to update the velocity parameters by leveraging gradients computed during the backpropagation process. In this context, calculating the gradient of the objective function $J_D$ with respect to the velocity parameter $v$ is critical. Sun et al. [22] derived the gradient formula for the objective function with respect to the velocity parameter in the framework of wave propagation simulation using the second-order acoustic wave equation and RNNs. Their findings demonstrated that this full-waveform inversion process is equivalent to traditional FWI. Subsequently, Lu et al. [20] validated the effectiveness of this method using the Marmousi velocity model.
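Equation (8) sums the squared residuals over all sources, time steps, and receivers; a one-line sketch (the function name and array layout are ours):

```python
import numpy as np

def misfit(d_cal, d_obs):
    """L2 misfit of Eq. (8), summed over sources, time steps, and receivers.

    d_cal, d_obs : arrays of shape (n_sources, NT, n_receivers)
    """
    return float(np.sum((d_cal - d_obs) ** 2))
```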
In the neural network framework, the relationships between variables are inherently recorded. During the backpropagation process, gradients are computed via automatic differentiation, and parameters are updated using optimizers. Notably, the automatic differentiation mechanism in RNNs leverages the chain rule to accelerate the backpropagation process. However, this approach demands substantial memory to store the hidden states of each unit, which in this context corresponds to the entire wavefield information, including pressure and particle velocity in the x and z directions. To address this memory constraint, this study proposes a dual-PIRNN network architecture. Transferring the loss from the original-scale model to a downscaled model facilitates inversion operations on the downscaled model, significantly reducing GPU resource consumption. The specific network design and underlying principles will be elaborated in the next section.
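The memory burden noted above stems from backpropagation through time: the chain rule must revisit every stored hidden state. A toy analogue, with a scalar linear recurrence standing in for the wave equation, makes this mechanism explicit; all names here are illustrative and not part of the authors' implementation:

```python
import numpy as np

def bptt_gradient(a, u0, d, n_steps):
    """Gradient of L = sum_k (u_k - d_k)^2 w.r.t. parameter a for the
    recurrence u_{k+1} = a * u_k, via reverse-mode accumulation.

    Every forward state u_k must be stored -- the same storage burden
    that full-wavefield PIRNN training faces at scale.
    """
    # forward pass: store all hidden states
    u = [u0]
    for _ in range(n_steps):
        u.append(a * u[-1])
    # reverse pass: accumulate dL/da through the chain rule
    grad_a, bar_u = 0.0, 0.0
    for k in range(n_steps, 0, -1):
        bar_u += 2.0 * (u[k] - d[k])  # direct loss term dL/du_k
        grad_a += bar_u * u[k - 1]    # via u_k = a * u_{k-1}
        bar_u *= a                    # propagate adjoint to u_{k-1}
    return grad_a
```

In a PIRNN, `a` plays the role of the velocity model and each `u[k]` is an entire spatiotemporal wavefield snapshot, which is why reducing grid points and time steps in the downscaled branch translates directly into GPU memory savings.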

3. Methodology

The proposed dual-PIRNN inversion method comprises two stages. In the first stage, recurrent neural networks (RNNs) are employed to perform forward modeling on the original-scale model, and information is extracted through scale decomposition. In the second stage, inversion is performed on the downscaled model using the extracted information.

3.1. Network Structure

The network architecture of the dual-PIRNN is illustrated in Figure 2. The proposed inversion process is divided into two branches. One branch performs forward modeling on the original-scale model to compute the loss at the original scale. The second branch, referred to as the downscaled branch, conducts FWI by receiving the scale-decomposed loss from the original-scale model. The velocity parameters are updated on the downscaled model, which is subsequently upsampled and fed back into the original-scale branch for further forward modeling, repeating this cycle. The key distinction between the downscaled and original-scale models lies in their parameter settings. For instance, parameters such as the sampling rate $dt$ of the source function, the number of time points $N_T$ for numerical simulation, and the numbers of grid points $n_z$ and $n_x$ exhibit a multiplicative relationship between the two scales. During each iteration of the inversion process, forward modeling is first performed on the initial original-scale model using the SGFD RNN operator to obtain the simulated shot-gather data $d_{\mathrm{cal}}$. The loss at the original scale, $Loss_{\mathrm{original}}$, is then calculated using the L1 norm between $d_{\mathrm{obs}}$ and $d_{\mathrm{cal}}$, as expressed in (9):
$$Loss_{\mathrm{original}} = d_{\mathrm{obs}} - d_{\mathrm{cal}}.$$
Here, $d_{\mathrm{cal}}$ and $d_{\mathrm{obs}}$ obtained through the VSP observation system are time-series matrices of size $N_T \times n_z$. Consequently, the calculated $Loss_{\mathrm{original}}$ is also a time-series matrix of size $N_T \times n_z$. A downsampling method based on average pooling is then applied to extract $Loss_{\mathrm{original}}$, yielding $Loss_{\mathrm{down}}$, which matches the required data size of the downscaled branch model. Forward modeling is subsequently performed using the SGFD RNN operator on the downscaled branch to generate the VSP shot-gather $d_{D\mathrm{cal}}$ on the downscaled model. By summing $Loss_{\mathrm{down}}$ and $d_{D\mathrm{cal}}$, the observed record $d_{D\mathrm{obs}}$ on the downscaled model is obtained, as expressed in (10):
$$Loss_{\mathrm{down}} = \mathrm{ExtractionOperator}\!\left(Loss_{\mathrm{original}}\right), \qquad d_{D\mathrm{obs}} = d_{D\mathrm{cal}} + Loss_{\mathrm{down}}.$$
The extraction operator refers to a downsampling operator based on average pooling. Once the downscaled observed data $d_{D\mathrm{obs}}$ are obtained, the inversion process described in Section 2, based on the PIRNN, can be executed on the downscaled model to update its velocity parameters. The mean squared error (MSE) is employed as the loss function, and the Adam optimizer is used for parameter updates. The loss function for the downscaled model is presented in (11):
$$Loss = \mathrm{MSE}\!\left(d_{D\mathrm{obs}},\, d_{D\mathrm{cal}}\right).$$
The calculated $Loss$ is propagated backward to update the velocity parameter $v$ in the downscaled model. Once updated, the velocity parameters $v$ are upscaled to revert to the original scale, preparing them for the next full-scale forward modeling iteration. This sequence represents the inversion flow within each cycle of the dual-PIRNN network structure (Figure 2). During this process, inversion occurs on the downscaled model. Compared to traditional FWI, which performs inversion at the original scale, the dual-branch architecture of the PIRNN further minimizes GPU resource consumption by leveraging scale decomposition: the original-scale shot gathers are transformed to the downscaled model, which requires fewer grid points and a shorter forward modeling time. Since the PIRNN-based full-waveform inversion records gradient information for the entire spatiotemporal field, its resource consumption is directly proportional to the forward modeling time and the number of grid points. By reducing both factors through downscaling, the dual-branch architecture achieves significant savings in GPU memory and computational cost.
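Under the assumption of a factor-2 scale ratio, the residual transfer of Equations (9) and (10) can be sketched as follows. This is a minimal NumPy sketch; the `avg_pool2` helper and exact pooling arrangement are our own illustration of the average-pooling extraction operator:

```python
import numpy as np

def avg_pool2(x, k=2):
    """Extraction operator: average-pool a (NT, nz) residual by factor k."""
    nt, nz = (x.shape[0] // k) * k, (x.shape[1] // k) * k
    return x[:nt, :nz].reshape(nt // k, k, nz // k, k).mean(axis=(1, 3))

def synthesize_downscaled_obs(d_obs, d_cal, d_D_cal, k=2):
    """Eqs. (9)-(10): build the downscaled 'observed' record d_Dobs."""
    loss_original = d_obs - d_cal            # Eq. (9), original-scale residual
    loss_down = avg_pool2(loss_original, k)  # scale decomposition
    return d_D_cal + loss_down               # Eq. (10)
```

The synthesized record then serves as the inversion target on the downscaled branch, where the MSE loss of Equation (11) is evaluated.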

3.2. SGFD RNN Operator

The SGFD RNN operator, based on the PIRNN, is designed for high-precision forward modeling; it accepts the velocity parameters $v$ and source location $r_s$ as inputs and generates VSP shot-gather records as output. Its internal implementation is illustrated in Figure 3. Here, $v_x^{k-1/2}$, $v_z^{k-1/2}$, $p_x^k$, and $p_z^k$ serve as the current time step inputs. The internal process first combines the split pressure fields $p_x^k$ and $p_z^k$ at time $k$ to form the complete pressure field $p^k$. Following the staggered-grid approach, we first update the particle velocity at time $k+\tfrac{1}{2}$, then use this updated velocity, along with the split pressure fields $p_x^k$ and $p_z^k$ at time $k$ and the internal velocity parameter $v$ of the RNN, to compute the split pressure fields $p_x^{k+1}$ and $p_z^{k+1}$ at time $k+1$. We then add the source terms $S_x^k$ and $S_z^k$ to the $x$ and $z$ components of the pressure field, respectively. Finally, $v_x^{k+1/2}$, $v_z^{k+1/2}$, $p_x^{k+1}$, and $p_z^{k+1}$ are output and fed into the next SGFD RNN operator as inputs. In this structure, $v_x^{k+1/2}$, $v_z^{k+1/2}$, $p_x^{k+1}$, and $p_z^{k+1}$ serve as hidden states for each SGFD RNN operator unit, propagating information between different RNN units. These hidden states encapsulate the chain relationship tied to the velocity parameter $v$ at each time step across the entire time domain; this relationship is exploited during backpropagation, leveraging automatic differentiation to determine the gradients associated with $v$. The velocity parameters are then updated using the Adam optimizer.

3.3. Scale Decomposition

In the field of FWI, scale decomposition is a strategy designed to mitigate the nonlinearity of the inverse problem by progressively incorporating information from low to high frequencies. The core concept involves breaking down complex data or models into components of varying frequencies or scales. This approach is especially critical for seismic data, as it facilitates more effective inversion by isolating information across different scales, thereby enhancing model accuracy and convergence.
In the dual-PIRNN FWI network architecture, scale conversion enables information obtained through forward modeling in the original-scale branch to be utilized in the downscaled branch. To illustrate this, we use the Marmousi velocity model as an example, dividing it into two scales and verifying the correctness of information conversion between them. One model is designated as the original-scale model, and the other as the downscaled model (Figure 4). In this example, the original-scale model has a grid size of $100 \times 100$ with grid spacing $dx = dz = 10$. The source function is a Ricker wavelet with a sampling rate of $f_s = 4000\,\mathrm{Hz}$. The number of time points is $N_T = 1600$. For the downscaled model, the grid spacings $dx$ and $dz$ are half those of the original-scale model. Consequently, the number of time points $N_T$ required for forward modeling is also half that of the original scale. The source function for the downscaled model uses a sampling rate of $f_s = 2000\,\mathrm{Hz}$. We employ both surface seismic and VSP observation systems for data acquisition. Additional parameter settings are detailed in Table 1.
In numerical simulations, we employ a Ricker wavelet as the source, with its discrete-time form given by (12):
R[n] = (1 − 2π²fp²(nΔt)²) exp(−π²fp²(nΔt)²).
Here, f_p is the peak frequency of the Ricker wavelet and Δt is its sampling interval, which determines the sampling rate f_s = 1/Δt. Using the parameter settings in Table 1, we demonstrate the intermediate steps of scale conversion according to (9) and (10), applied to the Marmousi true velocity model and the initial velocity model, as shown in Figure 5.
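Equation (12) is straightforward to evaluate. The sketch below follows the equation directly; the delay `t0` and the peak frequency of 25 Hz in the usage example are illustrative additions (the paper does not state them), with `t0` shifting the peak away from n = 0 as is common in practice.

```python
import numpy as np

def ricker(nt, fp, fs, t0=0.0):
    """Discrete Ricker wavelet of eq. (12):
    R[n] = (1 - 2*pi^2*fp^2*(n*dt)^2) * exp(-pi^2*fp^2*(n*dt)^2),
    with dt = 1/fs and an optional delay t0 (illustrative addition)."""
    dt = 1.0 / fs
    t = np.arange(nt) * dt - t0
    a = (np.pi * fp * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

# Example with the original-scale sampling rate from Table 1
w = ricker(1600, fp=25.0, fs=4000.0, t0=0.04)
```

The wavelet attains its maximum value of 1 at t = t0 and has the two familiar negative side lobes, which is a quick sanity check on any implementation.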
We perform forward modeling with the SGFD RNN operator on both the true and initial models shown in Figure 5 under the two scale parameter settings specified in Table 1. The VSP shot-gather data for the first shot are illustrated in Figure 6. Following the variables in (9) and (10), the four shot-gather records (a), (b), (c), and (d) correspond to d_obs, d_cal, d_Dobs, and d_Dcal, respectively. As the figure shows, the shot-gather records at the two scales exhibit structural similarities, which enables the transformation of information from the original scale to the downscaled domain. In practical scenarios, the records in (a) and (b) can be obtained directly at the original scale using the SGFD RNN operator, while the data in (d) can be generated by the operator at the downscaled level, with the velocity model parameters converted from the original-scale initial velocity model. During inversion, the data corresponding to (c), the actual record at the downscaled level, remain unknown. The objective is therefore to synthesize the true shot-gather record (c) on the downscaled model from the records corresponding to (a), (b), and (d), thereby enabling inversion on the downscaled model. We compare the actual d_Dobs with the d_Dobs synthesized according to (9) and (10) and calculate the correlation coefficient (Figure 7). The results demonstrate that the true shot-gather records synthesized on the downscaled model with this method are highly accurate: under the high-precision forward modeling approach, only minor discontinuities appear in the direct wave, and their impact on the inversion remains within acceptable limits.
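Since equations (9) and (10) are not reproduced in this section, the following is only a plausible sketch of the synthesis step: decimate the original-scale residual (a) − (b) onto the downscaled grid and add it to the downscaled synthetic record (d). The function names and the decimation operator are assumptions; the paper's actual transfer operator may differ.

```python
import numpy as np

def synthesize_downscaled_obs(d_obs, d_cal, d_dcal, t_factor=2, x_factor=1):
    """Hypothetical synthesis of record (c): map the original-scale residual
    (d_obs - d_cal) to the downscaled grid by decimation, then add it to the
    downscaled synthetic record d_dcal."""
    residual = (d_obs - d_cal)[::t_factor, ::x_factor]
    return d_dcal + residual

def correlation_coefficient(a, b):
    """Pearson correlation, as used to compare actual vs. synthesized records."""
    a, b = a.ravel() - a.mean(), b.ravel() - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))
```

In the verification of Figure 7, a record synthesized this way would be compared against the directly simulated d_Dobs via `correlation_coefficient`.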
Based on the analysis of our proposed dual-PIRNN FWI network architecture, the primary components of GPU memory usage are original-scale model forward modeling, downscaled model forward modeling, and inversion. Among these, the most significant portion is occupied by the spatiotemporal wavefield data related to gradients during the inversion process, commonly referred to as intermediate activations in neural networks. Since the inversion is conducted within the PIRNN framework, gradient information must be stored across the entire spatiotemporal field. The example discussed in this section demonstrates decomposition in the time domain, which reduces the amount of gradient information that needs to be stored. Additionally, information can be decomposed simultaneously in both the time and space domains, further optimizing memory utilization. The next section will discuss specific simulation results and resource utilization analysis.

4. Numerical Examples

In this section, we conduct numerical experiments to validate the effectiveness of our proposed dual-PIRNN FWI framework. To ensure a robust evaluation across diverse configurations, we consider (1) two scale-decomposition approaches (time-scale and combined time–space-scale reduction), (2) initial models generated with different Gaussian smoothing levels, and (3) a multi-source acquisition configuration with 11 equidistant sources and a VSP observation system. Our experiments are organized as follows. First, we verify the accuracy of PIRNN forward modeling by examining the effects of PML boundaries and high-order staggered-grid finite-difference methods. Then, we evaluate the inversion performance on two velocity models: a three-layer homogeneous model and the Marmousi model. For each model, we analyze the results obtained with different scale-decomposition strategies and initial models. Finally, we analyze the GPU memory consumption of our network architecture.

4.1. Forward Modeling

In this experiment, we verify the accuracy of forward modeling of the wave equation based on PIRNN and demonstrate the enhanced modeling precision achieved through high-order staggered-grid finite-difference methods and the implementation of PML boundaries. A homogeneous velocity model is selected for forward modeling, with a Ricker wavelet employed as the source function. The specific parameter settings are provided in Table 2.
Using the parameter settings in Table 2, we plotted wavefield snapshots with and without PML layers, as well as wavefields computed with different finite-difference orders (Figure 8). Figure 8a,b illustrate the wavefield at t = 0.5 s with and without PML layers, respectively. Without PML layers, acoustic waves exhibit significant reflections at the boundaries during propagation. These boundary reflections contaminate the shot-gather recordings in VSP observation systems, thereby degrading the resolution of FWI results. Another factor influencing shot-gather quality is numerical dispersion: higher wavelet frequencies improve shot-gather resolution but are more susceptible to dispersion. Figure 8c,d demonstrate the impact of the staggered-grid finite-difference order on the wavefield at a wavelet frequency of f = 40 Hz. Figure 8c employs 10th-order differences, while Figure 8d uses 4th-order differences. Evidently, at this higher wavelet frequency, Figure 8d exhibits noticeable dispersion due to the insufficient difference order. Such dispersion introduces numerous spurious reflection signals into the VSP records, consequently degrading inversion performance. In conclusion, incorporating PML layers and high-order staggered-grid finite differences into our PIRNN model yields high-precision wavefield forward modeling, providing a robust foundation for inversion.
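The accuracy gain from higher stencil orders can be checked numerically. The sketch below derives staggered-grid first-derivative coefficients by solving the Taylor-series matching conditions (a standard construction, reproducing e.g. [9/8, −1/24] at 4th order) and compares the derivative error of the 4th- and 10th-order stencils on a sinusoid sampled at roughly six points per wavelength; the function names are ours.

```python
import numpy as np

def staggered_coeffs(order):
    """Coefficients of the staggered-grid first-derivative stencil of a given
    even order, from the Taylor-series matching conditions."""
    m = order // 2
    off = np.arange(1, m + 1) - 0.5                     # half-grid offsets
    rows = [off ** (2 * k - 1) for k in range(1, m + 1)]
    rhs = np.zeros(m)
    rhs[0] = 0.5                                        # 2 * sum(c * off) = 1
    return np.linalg.solve(np.vstack(rows), rhs)

def d1_staggered(f, dx, coeffs):
    """First derivative evaluated halfway between samples, using the
    antisymmetric staggered stencil."""
    m_max, n = len(coeffs), len(f)
    length = n - 2 * m_max + 1
    out = np.zeros(length)
    for j, c in enumerate(coeffs):
        m = j + 1
        out += c * (f[m_max - 1 + m : m_max - 1 + m + length]
                    - f[m_max - m : m_max - m + length]) / dx
    return out
```

At this coarse sampling, the 10th-order stencil's maximum phase error is orders of magnitude below the 4th-order one, mirroring the dispersion contrast between Figure 8c and Figure 8d.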

4.2. Homogeneous Layer Model

In this experiment, we conduct FWI on a homogeneous layer model and the Marmousi model using the proposed dual-PIRNN network architecture. The Adam optimizer is employed with a learning rate of 40. The time step dt for the finite-difference calculations is set to 0.0005 s, while the specific scale of each model determines the grid spacing and the number of grid points. The forward modeling component of the PIRNN network generates the observed shot-gather records. Based on the grid parameters, 11 sources are positioned equidistantly along the top of the model to simulate synthetic shot-gather data. The receivers of the VSP observation system are arranged in a vertical array at the center of the model.
The actual and initial models for the homogeneous layer case are shown in Figure 9. The original-scale velocity model has a grid size of 100 × 100, with grid spacing dz = dx = 10 m. The three layers have velocities of [2500, 3000, 3500] m/s, respectively. The initial model is uniformly shifted by 200 m/s from the actual model, giving velocities of [2700, 3200, 3700] m/s. Following the dual-PIRNN network architecture, we employ two scale-decomposition strategies: one that decomposes only in time and another that decomposes in both time and space. Table 3 presents the inversion parameters and corresponding data scales across the different model scales. Inversion tests were conducted with both scale-decomposition strategies within the dual-PIRNN FWI network architecture; the results are presented in Figure 10 and Figure 11. To comprehensively evaluate the inversion performance, we employ three quantitative metrics: the R-squared score (R2), the Structural Similarity Index (SSIM), and Normalized Cross-Correlation (NCC). The R2 score measures the linear correlation between the inverted and true models, SSIM evaluates structural similarity, and NCC assesses overall similarity. Figure 10a shows the inversion result using the time-scale reduction strategy, while Figure 10b displays the result using both time- and spatial-scale reduction. We observe that the time-scale reduction strategy in Figure 10a produces striped noise in the inversion result. This is attributed to limited source–receiver coverage, leading to poor resolution in certain directions; additionally, the reduced time scale deprives some areas of the model of high-frequency information, manifesting as rapid changes and stripe artifacts. Although scale decomposition may cause some loss of accuracy in the inversion results, a potential improvement is to incorporate a velocity-smoothness regularization term during the inversion process.
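For reference, the R2 and NCC metrics can be computed as below. This is a sketch: the zero-lag NCC definition is one common choice (the paper does not spell out its exact variant), and SSIM is omitted here since a standard implementation is available as `skimage.metrics.structural_similarity`.

```python
import numpy as np

def r2_score(v_true, v_pred):
    """R-squared between the true and inverted velocity models (flattened)."""
    v_true, v_pred = np.ravel(v_true), np.ravel(v_pred)
    ss_res = np.sum((v_true - v_pred) ** 2)
    ss_tot = np.sum((v_true - v_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def ncc(a, b):
    """Zero-lag normalized cross-correlation (one common definition)."""
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))
```

Both metrics equal 1 for a perfect inversion and decrease as the inverted model departs from the true one.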
In Figure 10b, the inversion is performed on a smaller model grid by reducing both time and spatial scales, which mitigates local minima and results in a more stable inversion process. Furthermore, due to the reduced number of grid points and time steps required for numerical simulation, as shown by the data scale comparison in Table 3, this approach significantly decreases GPU resource consumption and improves computational efficiency during the inversion process.

4.3. Marmousi Model

In this experiment, the Marmousi model is employed for inversion using the proposed dual-PIRNN network architecture. The initial Marmousi velocity models are derived from the true Marmousi velocity model by applying varying degrees of Gaussian smoothing, and the velocity parameters are treated as trainable parameters within the dual-PIRNN network. The original-scale grid size is 100 × 100, with grid spacing dz = dx = 10 m. Split-field perfectly matched layer (SPML) absorbing boundaries are implemented around the model domain. The velocity models are shown in Figure 12: Figure 12a is the true Marmousi velocity model at the original scale, while Figure 12b–d display the initial velocity models obtained by applying Gaussian smoothing to Figure 12a with σ values of 5, 10, and 20, respectively. As described in the previous section, the scale-decomposition strategies comprise time-scale reduction and combined time–space-scale reduction. Inversion is performed using the dual-PIRNN FWI network architecture; the inversion parameters and corresponding data scales for the Marmousi model at the different scale levels are presented in Table 4.
We first conducted inversions on the initial model with σ = 5, applying both the time-scale reduction and the combined time–space reduction strategies according to the parameter settings in Table 4. Figure 13a displays the inversion result under the time-scale reduction strategy. This approach accurately identifies several velocity interfaces and structural features in the shallow layers, and the resolution of structural details in most deep areas is also relatively high. However, Figure 13a shows that the inversion in the deep regions on the left side of the velocity model is less accurate than on the right. This discrepancy arises because the reflections from the velocity structures on the left side of the deep Marmousi model are only partially received by the VSP observation system, whereas the reflections from the sloped structures on the right are more easily captured, leading to noticeably higher inversion accuracy on the right side. Figure 13b shows the inversion result under the combined time- and space-scale reduction strategy. While this approach significantly reduces computational cost by decreasing the number of grid points and time steps (as detailed in Table 4), the decomposition in both the time and space domains lowers the resolution on the coarse grid. This trade-off between computational efficiency and resolution is a key consideration in practical applications. If interpolated back to the original scale, the details could be rendered more clearly; in any case, the overall structure of the Marmousi model is still well reconstructed. Figure 14 presents a further comparison of velocity profiles at x = 300 m and x = 700 m. Evidently, the time-scale-reduction inversion accurately predicts velocities in both shallow and deep regions, with high precision in the highly folded area at x = 300 m. The time–space-reduction inversion accurately predicts velocities in shallow regions but is less accurate in deep areas; nonetheless, it still captures the velocity variation trends at depth.
Similarly, we employed a less accurate initial model, shown in Figure 12c, where the Gaussian smoothing parameter σ is set to 10. Following the same parameter settings and inversion process, we obtained the inversion results presented in Figure 15. Figure 15a displays the inversion result under the time-scale reduction strategy; compared with Figure 13a, the result remains satisfactory even with the less accurate initial model. To further evaluate the inversion performance, we compared the velocity profiles of the inversion results obtained from initial models with different degrees of smoothing, taking profiles at x = 300 m and x = 700 m (Figure 16). In Figure 16a, at x = 300 m, the inversion result from the initial model with σ = 5 slightly outperforms that with σ = 10 in both shallow and deep regions, although the overall difference is minimal. Figure 16b shows that at x = 700 m the inversion results from the two models are nearly identical. This demonstrates that the network architecture possesses a certain degree of robustness and adapts well to different initial models. Comparable results were observed under the combined time–space-scale reduction strategy, as shown in Figure 16c,d.
The quantitative analysis of the inversion results under the different scenarios is summarized in Table 5, encompassing both the homogeneous layer model with different scale-decomposition strategies and the Marmousi model with varying initial models and scale-reduction approaches. Notably, for the Marmousi model, the R2, SSIM, and NCC values remain remarkably consistent across different initial models under the same scale-decomposition strategy. This consistency demonstrates the robustness of our dual-PIRNN network architecture to initial-model perturbations. To further investigate the impact of initial-model errors, we employed the heavily smoothed velocity model with σ = 20 shown in Figure 12d as the initial model. Following the same inversion procedure, we found that local minima emerged under the time-scale reduction strategy, as illustrated in Figure 17. This indicates that when the initial model is excessively smoothed, the non-uniqueness of the optimization problem becomes more pronounced, leading to inversion errors. For cases with less accurate initial models, additional prior information should be incorporated to better constrain the inversion process.

4.4. GPU Memory Usage Analysis

Having validated the effectiveness of our proposed dual-PIRNN FWI network architecture, we now address GPU resource consumption issues during the inversion process. In the deep-learning framework, GPU memory usage can be primarily categorized into four components: model parameters, gradients of model parameters, optimizer states, and intermediate activation values. In the proposed dual-PIRNN FWI framework, the model parameters are defined as the velocity parameters. The gradients of the model parameters refer to the gradients of the loss function with respect to the velocity parameters, which are automatically calculated and stored during the backpropagation process. For the optimizer states, since the Adam optimizer is employed, the first-order and second-order moment states for each model parameter must be maintained. This results in memory consumption approximately twice that of the model parameters. Lastly, in the context of PIRNN-based FWI, the intermediate activation values correspond to the wavefield information within all RNN units across the entire spatiotemporal field. Wavefield data for each time point are sequentially calculated and stored according to the time-series relationship, consuming a significant amount of GPU memory. From a resource utilization perspective, decomposing both time and spatial scales reduces the number of time points and spatial grid points required for numerical simulation. This decomposition significantly decreases GPU memory consumption for intermediate activation values, thereby improving resource efficiency during the inversion process. Compared to traditional FWI methods that require storing the complete forward wavefield for adjoint-state calculations, our dual-PIRNN approach demonstrates distinct advantages in memory management. 
While conventional methods must store the wavefields at all time steps for backpropagation, leading to memory requirements that scale as O(N · T) (where N is the number of spatial grid points and T is the number of time steps), our method's memory usage primarily comes from storing the gradients of the intermediate activations on the downscaled branch. As demonstrated in the preceding simulation analysis (Figure 13, Figure 14, Figure 15 and Figure 16), this approach maintains comparable inversion accuracy while offering flexible memory–accuracy trade-offs through scale decomposition.
In our experiments, we applied three scales to the Marmousi velocity model: the original scale, downtimescale, and downtimespacescale. At the original scale, the dual-PIRNN FWI network architecture is not required; inversion is performed directly following the PIRNN-based inversion process, using 1600 time points. For inversion under downtimescale, scale decomposition reduces the required number of time points for numerical simulation to 800, while the grid size remains unchanged; theoretically, this halves the storage needed for wavefield information in the intermediate activations. However, to perform inversion at the reduced scale with the dual-PIRNN network structure, forward modeling is first conducted on the original-scale branch to obtain shot-gather data, which are then transformed to the reduced-scale branch according to the scale decomposition. The wavefield data involved in this forward modeling are unrelated to gradients and therefore require comparatively little GPU memory. Overall, under downtimescale parameters, total GPU memory consumption is reduced by nearly half compared to the original scale. For downtimespacescale, a similar process is followed: forward modeling is first performed on the original-scale branch, and after the information is transformed to the reduced-scale model, the inversion on the reduced-scale branch requires only 400 time points on a 50 × 50 grid. Since the GPU memory used for the wavefield data in each RNN unit is directly proportional to the number of grid points, reducing the grid from 100 × 100 to 50 × 50 while decreasing the time points from 1600 to 400 sharply reduces resource usage. Consequently, under downtimespacescale, the resource consumption for intermediate activations during inversion is approximately one-sixteenth of that at the original scale; the remaining consumption arises primarily from the forward modeling on the original-scale model.
We used PyTorch's (version 1.12.0) memory profiling tools and an analytical storage estimate to calculate the GPU memory consumption per epoch for the dual-PIRNN FWI network architecture shown in Figure 2. Taking the parameters in Table 4 as an example, the intermediate activations require storing the wavefield information within each RNN unit. As depicted in the internal structure of the SGFD RNN operator in Figure 3, the velocity parameter v is a trainable parameter, so gradient information must be stored for all variables related to v during training, effectively doubling the GPU memory usage for these variables. Figure 3 illustrates a simplified RNN unit structure, excluding the SPML-related computations. The variables related to the velocity parameter v include ∂x v_x^{k+1/2}, ∂z v_z^{k+1/2}, p_x^{k+1}, p_z^{k+1}, v_x^{k+1/2}, and v_z^{k+1/2}. Adding the v-related variables in the SPML, there are 16 intermediate variables requiring gradients and 2 that do not, namely ∂x p^k and ∂z p^k. From this, we can estimate the GPU resources required for one time point. Each variable occupies approximately 120 × 120 × 4 ÷ 1024 ≈ 56.25 KB, and a variable requiring gradients occupies about 56.25 × 2 = 112.5 KB. The total size of all stored variables for one time point is therefore approximately (56.25 × 2 + 112.5 × 16) ÷ 1024 ≈ 1.8677 MB. With 11 shots and 1600 time points per shot, the total space required is 1.8677 × 1600 × 11 ÷ 1024 ≈ 32.1 GB. This is a theoretical estimate; PyTorch's GPU memory profiling functions report approximately 32.5 GB, with the difference attributable to optimizer states and other overheads, which validates the accuracy of our calculation.
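The estimate above can be reproduced in a few lines; the only assumptions are float32 variables and a 100 × 100 grid padded to 120 × 120 by the SPML layers, as stated in the text.

```python
# Re-derivation of the activation-memory estimate for one epoch
KB = 120 * 120 * 4 / 1024                      # one float32 wavefield variable: 56.25 KB
grad_kb = 2 * KB                               # a gradient-tracked variable stores twice as much
per_step_mb = (2 * KB + 16 * grad_kb) / 1024   # 2 plain + 16 tracked variables per time point
total_gb = per_step_mb * 1600 * 11 / 1024      # 1600 time points, 11 shots
print(round(KB, 2), round(per_step_mb, 4), round(total_gb, 1))   # -> 56.25 1.8677 32.1
```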
The above calculations show that, within the current inversion framework, the majority of GPU memory is devoted to storing the intermediate wavefield data associated with the velocity parameter v, and this consumption is directly proportional to the number of time points and grid points. As shown in Table 4, under the downtimescale condition, where the number of time points is reduced from 1600 to 800, the consumption for intermediate activations decreases from approximately 32.1 GB to 16 GB, a 50% reduction. Similarly, under the downtimespacescale condition, the grid size is reduced from 100 × 100 to 50 × 50 and the number of time points from 1600 to 400, lowering the consumption for intermediate activations from 32.1 GB to approximately 2 GB. We previously conducted inversion tests on the Marmousi model under both the downtimescale and downtimespacescale conditions. In this experiment, we performed PIRNN-based velocity inversion at the original scale and compared the results with those from the two reduced-scale conditions (Figure 18). Figure 18a shows the true model, (b) the inversion result at the original scale, (c) the result under downtimescale, and (d) the result under downtimespacescale. Under the current parameter settings, inversion at downtimescale produces results very close to those at the original scale, the primary difference being a more detailed representation of certain regions of the velocity model at the original scale. When both space and time are decomposed (Figure 18d), the inversion quality decreases noticeably compared with time decomposition alone; however, the resource consumption in the downtimespacescale condition is theoretically only 1/16 of that required at the original scale.
Figure 19 displays velocity profiles at x = 300 m and x = 700 m, illustrating the inversion results across the three scales. These profiles likewise indicate that inversion performance follows the order Original Scale > DownTime > DownTimeSpace. The actual resource consumption for each of the three scales is summarized in Table 6.

5. Conclusions

We combined FWI theory with PINNs to simulate the forward propagation of seismic waves, establishing a foundation for PIRNN-based FWI. Building upon this, we introduced scale decomposition and developed the dual-PIRNN FWI network architecture, significantly reducing GPU resource usage in PIRNN-based full-waveform inversion. Data conversion between different scales was implemented for shot gathers, facilitating inversion on downscaled models. This conversion accelerates model convergence while maintaining an acceptable level of inversion resolution. Our simulations on the Marmousi model demonstrate that, within a specific range of parameter settings, the proposed dual-PIRNN FWI network architecture effectively reduces GPU resource consumption while preserving inversion quality. Furthermore, it exhibits robustness and reduces dependence on the initial model.
However, our method still faces several limitations that require further investigation. Although our approach effectively reduces computational costs, it may result in the loss of high-frequency information in complex geological structures. Additionally, insufficient source-receiver coverage can lead to stripe artifacts in the inverted regions, and significant deviations in the initial model may cause the inversion to converge to local minima. To address these challenges, our future work will focus on developing FWI methods that are less sensitive to initial models while incorporating physics-based regularization terms to constrain the inversion process. Furthermore, increasing the source-receiver coverage can enhance convergence performance. These developments will improve the practical applicability of our method in real-world seismic exploration.

Author Contributions

Conceptualization, C.L. and J.L. (Jijun Liu); methodology, J.L. (Jijun Liu); software, J.L. (Jijun Liu); validation, J.L. (Jijun Liu) and H.C.; formal analysis, L.Q.; investigation, L.Q.; resources, J.G.; data curation, J.L. (Jijun Liu) and J.G.; writing—original draft preparation, J.L. (Jijun Liu); writing—review and editing, J.L. (Jiandong Liang); visualization, C.L. and J.L. (Jijun Liu); supervision, C.L.; project administration, C.L. and J.L. (Jiandong Liang); funding acquisition, C.L. and H.C. All authors have read and agreed to the published version of the manuscript.

Funding

The research was supported by the National Natural Science Foundation of China (Grant No. 42474168).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

We are grateful to the reviewers for their valuable comments.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. PIRNN forward-modeling workflow.
Figure 2. Dual-PIRNN process.
Figure 3. SGFD RNN operator.
Figure 4. Scale transform. The black “X” indicates the source location, while the green “X” represents the receiver positions.
Figure 5. Marmousi model. (a) True model. (b) Initial model.
Figure 6. Shot gathers of the true and initial models at different scales. The horizontal axis represents the depth of the geophones, while the vertical axis represents the time points. (a) Shot gather of the true model at the original scale. (b) Shot gather of the initial model at the original scale. (c) Shot gather of the true model at the downscaled level. (d) Shot gather of the initial model at the downscaled level.
Figure 7. Correlation analysis. (a) SSIM map, where SSIM = 0.9979765, NCC = 0.9986. (b) Difference image.
Figure 8. Forward modeling. (a) Forward with PML at t = 0.5 s, f = 20 Hz. (b) Forward without PML at t = 0.5 s, f = 20 Hz. (c) Forward with order = 10 at t = 0.25 s, f = 40 Hz. (d) Forward with order = 4 at t = 0.25 s, f = 40 Hz.
Figure 9. Homogeneous layer model. (a) True model. (b) Initial model.
Figure 10. Inversion results of the homogeneous layer model with different downscaling strategies. (a) Inversion with DownTimeScale. (b) Inversion with DownTimeSpaceScale.
Figure 11. Velocity profiles of the homogeneous layer model and loss curve. (a) V profile at x = 300 m. (b) V profile at x = 700 m.
Figure 12. True velocity model and initial models with different Gaussian smoothing. (a) True model. (b) Initial model, σ = 5. (c) Initial model, σ = 10. (d) Initial model, σ = 20.
Figure 13. Inversion results of the Marmousi model with the initial model at σ = 5. (a) Inversion with DownTimeScale. (b) Inversion with DownTimeSpaceScale.
Figure 14. Velocity profiles of the inversion result at σ = 5. (a) V profile at x = 300 m. (b) V profile at x = 700 m.
Figure 15. Inversion results of the Marmousi model with the initial model at σ = 10. (a) Inversion with DownTimeScale. (b) Inversion with DownTimeSpaceScale.
Figure 16. Inversion velocity profiles for different initial velocity models with Gaussian smoothing. (a) Velocity profiles at x = 300 m with DownTimeScale. (b) Velocity profiles at x = 700 m with DownTimeScale. (c) Velocity profiles at x = 300 m with DownTimeSpaceScale. (d) Velocity profiles at x = 700 m with DownTimeSpaceScale. The green dashed line represents Gaussian smoothing with σ = 5, the red dashed line represents Gaussian smoothing with σ = 10, and the black solid line represents the true model.
Figure 17. Inversion results of the Marmousi model with the initial model at σ = 20. (a) Inversion with DownTimeScale. (b) Inversion with DownTimeSpaceScale.
Figure 18. Inversion results at different scales. (a) True model. (b) Inversion at the original scale. (c) Inversion at DownTimeScale. (d) Inversion at DownTimeSpaceScale.
Figure 19. Inversion velocity profiles at different scales. (a) Velocity profiles at x = 300 m. (b) Velocity profiles at x = 700 m.
Table 1. Parameters for the original and downscaled models.

| Parameters | Original Scale | DownScale |
| --- | --- | --- |
| Grid size | nz = nx = 100 | nz = nx = 100 |
| Grid spacing | dz = dx = 10 | dz = dx = 5 |
| Number of shots | 11 | 11 |
| Number of receivers | 100 | 100 |
| Source sampling rate | 4000 Hz | 2000 Hz |
| Central frequency | 40 Hz | 40 Hz |
| Time step | 0.5 ms | 0.5 ms |
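As a concrete reading of the halved source sampling rate (4000 Hz to 2000 Hz), the sketch below is an illustrative assumption rather than the paper's code (the function name `downscale_time` is hypothetical): it reduces the time sampling of a shot record by a factor of two using block averaging, which also suppresses aliasing.

```python
import numpy as np

def downscale_time(record, factor=2):
    """Average consecutive groups of `factor` samples along the time axis
    (axis 0), returning a record with 1/factor as many time samples."""
    nt = (record.shape[0] // factor) * factor  # drop any trailing remainder
    trimmed = record[:nt]
    return trimmed.reshape(nt // factor, factor, *record.shape[1:]).mean(axis=1)

# A toy shot gather: 1400 time samples x 100 receivers (cf. the scales above).
gather = np.random.default_rng(0).standard_normal((1400, 100))
coarse = downscale_time(gather)
print(coarse.shape)  # (700, 100)
```

Block averaging is one simple choice; proper decimation would low-pass filter before subsampling.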
Table 2. Parameters for forward modeling.

| Parameters | Forward Modeling |
| --- | --- |
| Finite-difference parameters | nz = nx = 200, order = 4/10 |
| PML width | 20 |
| Width of velocity model | 2000 m |
| Depth of velocity model | 2000 m |
| Central frequency | 20/40 Hz |
| Source location | Center of model |
| Velocity value | 3000 m/s |
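The forward-modeling setup above can be sketched with a standard second-order-in-time, fourth-order-in-space finite-difference scheme. The snippet below is a minimal stand-in (homogeneous 3000 m/s model, Ricker source at the model center, PML omitted for brevity), not the paper's SGFD RNN operator; the time step and wavelet are assumed values chosen to satisfy the CFL condition.

```python
import numpy as np

nz = nx = 200        # grid points (Table 2)
dx = 10.0            # grid spacing in metres (2000 m / 200 points)
v = np.full((nz, nx), 3000.0)  # homogeneous velocity model
dt = 0.001           # time step, v*dt/dx = 0.3 (stable)
f0 = 20.0            # central frequency in Hz
nt = 500             # 0.5 s of propagation, as in Figure 8a

def ricker(t, f0):
    """Ricker wavelet delayed by one period."""
    a = (np.pi * f0 * (t - 1.0 / f0)) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def laplacian4(u, dx):
    """Fourth-order accurate 2D Laplacian on interior points."""
    nz_, nx_ = u.shape
    c = np.array([-1.0 / 12, 4.0 / 3, -5.0 / 2, 4.0 / 3, -1.0 / 12]) / dx**2
    out = np.zeros_like(u)
    for k, ck in zip(range(-2, 3), c):
        out[2:-2, 2:-2] += ck * (u[2 + k:nz_ - 2 + k, 2:-2]
                                 + u[2:-2, 2 + k:nx_ - 2 + k])
    return out

u_prev = np.zeros((nz, nx))
u_curr = np.zeros((nz, nx))
sz, sx = nz // 2, nx // 2   # source at the model centre
for it in range(nt):
    # Leapfrog update of the acoustic wave equation u_tt = v^2 * lap(u) + s
    u_next = 2 * u_curr - u_prev + (v * dt) ** 2 * laplacian4(u_curr, dx)
    u_next[sz, sx] += ricker(it * dt, f0) * dt**2
    u_prev, u_curr = u_curr, u_next

print(u_curr.shape)
```

Without absorbing boundaries, energy reflects at the model edges, which is exactly the artifact the PML comparison in Figure 8 illustrates.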
Table 3. Parameters for the homogeneous layer model.

| Parameters | Original Scale | DownTime | DownTimeSpace |
| --- | --- | --- | --- |
| Grid points | nz = nx = 100 | nz = nx = 100 | nz = nx = 50 |
| Grid spacing | dz = dx = 10 | dz = dx = 5 | dz = dx = 5 |
| Number of time steps | 1400 | 700 | 350 |
| Central frequency | 40 Hz | 40 Hz | 40 Hz |
| Source sampling rate | 4000/8000 Hz | 2000 Hz | 2000 Hz |
| Total data scale | 11 × 100 × 100 × 1400 | 11 × 100 × 100 × 700 | 11 × 50 × 50 × 350 |
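The "total data scale" rows are element counts (shots × nz × nx × time steps). Assuming float32 storage (4 bytes per value) for a single stored wavefield, a quick back-of-the-envelope estimate shows the 16× saving of DownTimeSpace over the original scale; gradients and activations multiply these figures further.

```python
# Rough wavefield-storage estimate from the element counts above
# (11 shots x nz x nx x nt), assuming float32 (4 bytes per value).
scales = {
    "Original":      (11, 100, 100, 1400),
    "DownTime":      (11, 100, 100, 700),
    "DownTimeSpace": (11, 50, 50, 350),
}
for name, (ns, nz, nx, nt) in scales.items():
    gib = ns * nz * nx * nt * 4 / 2**30
    print(f"{name:14s} {gib:6.3f} GiB")
```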
Table 4. Parameters for different scale Marmousi models.

| Parameters | Original Scale | DownTime | DownTimeSpace |
| --- | --- | --- | --- |
| Grid points | nz = nx = 100 | nz = nx = 100 | nz = nx = 50 |
| Grid spacing | dz = dx = 10 | dz = dx = 5 | dz = dx = 5 |
| Number of time steps | 1600 | 800 | 400 |
| Central frequency | 40 Hz | 40 Hz | 40 Hz |
| Source sampling rate | 4000/8000 Hz | 2000 Hz | 2000 Hz |
| Total data scale | 11 × 100 × 100 × 1600 | 11 × 100 × 100 × 800 | 11 × 50 × 50 × 400 |
Table 5. Quantitative metrics for different inversion scenarios.

| Model | Scale Strategy | R² | SSIM | NCC |
| --- | --- | --- | --- | --- |
| Homogeneous | DownTime | 0.9574 | 0.5478 | 0.9805 |
| Homogeneous | DownTimeSpace | 0.7978 | 0.6254 | 0.9449 |
| Marmousi (σ = 5) | DownTime | 0.9254 | 0.9751 | 0.9876 |
| Marmousi (σ = 5) | DownTimeSpace | 0.8927 | 0.7212 | 0.9477 |
| Marmousi (σ = 10) | DownTime | 0.9421 | 0.8746 | 0.9711 |
| Marmousi (σ = 10) | DownTimeSpace | 0.8736 | 0.6972 | 0.9373 |
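For reference, R² and NCC values like those reported here can be computed as below. This is a plain-NumPy sketch using the standard definitions (the paper's exact metric settings, e.g. its SSIM window, are not specified here), applied to a toy model pair.

```python
import numpy as np

def r2(true, pred):
    """Coefficient of determination between two velocity models."""
    t, p = true.ravel(), pred.ravel()
    ss_res = np.sum((t - p) ** 2)
    ss_tot = np.sum((t - t.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def ncc(a, b):
    """Normalized cross-correlation (zero-mean, unit-norm inner product)."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy example: a "true" model and a noisy "inverted" one.
rng = np.random.default_rng(1)
v_true = rng.uniform(1500, 4500, size=(100, 100))
v_inv = v_true + rng.normal(0, 50, size=(100, 100))
print(round(r2(v_true, v_inv), 3), round(ncc(v_true, v_inv), 3))
```

SSIM is window-based rather than pointwise and is typically computed with a library implementation such as scikit-image's `structural_similarity`.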
Table 6. GPU memory consumption for different scale Marmousi models.

| Components | Original Scale | DownTime | DownTimeSpace |
| --- | --- | --- | --- |
| Intermediate activations | 32.1 GB | 16 GB | 2.3 GB |
| Model parameters | 1.8 GB | 1.6 GB | 1.6 GB |
| Optimizer status | | | |
| Other usage | | | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Lu, C.; Liu, J.; Qu, L.; Gao, J.; Cai, H.; Liang, J. Resource-Efficient Acoustic Full-Waveform Inversion via Dual-Branch Physics-Informed RNN with Scale Decomposition. Appl. Sci. 2025, 15, 941. https://doi.org/10.3390/app15020941

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
