Article

Accelerated Full Waveform Inversion by Deep Compressed Learning

1 Braude College of Engineering, Karmiel 2161002, Israel
2 Massachusetts Institute of Technology, Cambridge, MA 02139, USA
3 TotalEnergies EP R&T US, Houston, TX 77002, USA
* Author to whom correspondence should be addressed.
Sensors 2026, 26(6), 1832; https://doi.org/10.3390/s26061832
Submission received: 20 January 2026 / Revised: 6 March 2026 / Accepted: 11 March 2026 / Published: 13 March 2026
(This article belongs to the Special Issue Acquisition and Processing of Seismic Signals)

Abstract

We propose and test a method that reduces the dimensionality of Full Waveform Inversion (FWI) inputs to mitigate its computational cost. With modern seismic acquisition systems, the data required as FWI input for an industrial-strength case reaches the terabyte level of storage; solving complex subsurface cases or exploring multiple scenarios with FWI therefore becomes prohibitive. The proposed method utilizes a deep neural network with a binarized sensing layer that learns seismic acquisition layouts, via compressed learning, from a large corpus of subsurface models. Given a large seismic data set to invert, the trained network selects a smaller subset of the data; then, using representation learning, an autoencoder computes latent representations of the shot gathers, followed by K-means clustering of these representations to further select the most relevant shot gathers for FWI. This approach can effectively be seen as a hierarchical selection. The proposed approach consistently outperforms random data sampling, even when utilizing only 10% of the data for 2D FWI, and these results pave the way to accelerating FWI in large-scale 3D inversion.

1. Introduction

Seismic inversion is a fundamental tool in geophysical analysis, involving the estimation of subsurface properties such as velocity, density, and impedance from recorded seismic data. Full Waveform Inversion (FWI) [1] is a widely used seismic inversion tool, and it has become the industry standard for model building/reconstruction in subsurface characterization workflows. It is utilized for hydrocarbon exploration, CO2 sequestration, and shallow hazard assessment, among others. FWI has all the challenges of an inversion process: it is non-linear, non-convex, ill-posed, and computationally demanding [1], in particular for 3D inversion and when more complete physics beyond the acoustic approximation are targeted (i.e., elastic, viscoelastic).
Modern seismic acquisition configurations with a large number of seismic sources (i.e., shots) significantly increase the computational burden and complicate data manipulation. Several research directions aim to optimize seismic configurations and subsequent data manipulation to address these challenges, for example, using diffusion models for reconstruction and importance weighting of sources [2], or creating a sampling pattern based upon the changing complexity of the sampling area [3]. Some studies, such as [4], have explored the integration of Compressed Sensing (CS)-based methods, where, for instance, random sampling of sources is performed and further sources are then added to the sampling space, with the maximum distance between sources controlled as a parameter. In [5], a greedy algorithm conditioned on illumination maps computed from virtual sources selects an optimal subset of the data. A combined deep learning (DL) network and straight-through estimator (STE) [6] was suggested in [7] to limit the number of sources and receivers when sampling shot gathers, thus balancing a restricted number of sources against an optimized selection of sources for the reconstruction of shot gathers. A combination of generative adversarial networks (GANs) and CS was proposed in [8], which introduced shot gather reconstruction based on a conditional GAN combined with repeated creation of sampling schemes to optimize the placement of sources near surface obstacles. In the same context, Jiang et al. [9] utilized a GAN to create a transformed domain that connects a CS-based sampling of the seismic data to the original, non-sampled data. Nevertheless, all the above works focus on seismic data reconstruction, rather than inversion.
Improving FWI with DL is a developing field of research, including the improvement of initial models [10] and the combination of DL networks with FWI principles to improve reconstruction results, for example using an iterative approach for reconstruction [11] or using the wave equation [10,12]. Random shot selection for FWI, which is the industry standard for reducing the FWI computational load, was proposed in [13]. Methods for improving the calculation of the gradient generated by this selection approach were presented in [14,15].
In previous work, we utilized Compressed Learning (CL) [16], a framework for machine learning in the CS domain, for DL-based 3D seismic inversion [17]. In this paper we leverage this approach and augment it with representation learning to reduce the FWI computational load by choosing only a small subset of the available seismic data. This step is essential in realistic scenarios, where very large volumes of seismic data, on the order of terabytes, are typically utilized to reconstruct subsurface models.
The contributions of this paper are two-fold: (i) we present a deep learning workflow that first learns candidate compressed sensing layouts, and then selects online the best layout for the specific data to invert by FWI; and (ii) we introduce the first utilization of representation learning [18] for feature extraction and clustering [19,20] of seismic shot gathers in a learned latent space for inversion. The rest of this paper is organized as follows: Section 2 presents theoretical background on FWI and the main ingredients of the proposed solution: compressed learning and sensing, and representation learning. Section 3 presents the two-stage solution, including deep compressed learning and shot gather clustering by representation learning. Performance evaluation is presented in Section 4, and conclusions are discussed in Section 5.

2. Theoretical Background

FWI is a seismic imaging technique that uses the full seismic wavefield to create high-resolution subsurface models. It works by iteratively adjusting a subsurface model m̂ until synthetic data generated by the forward operator F(m̂) match the observed seismic data d_obs. FWI minimizes the following loss function:
J(\hat{m}) = \sum_{i=1}^{N_s} \left\lVert F_i(\hat{m}) - d_i^{\mathrm{obs}} \right\rVert_2^2,
where N_s is the number of shots (i.e., seismic sources). The complexity of minimizing J(m̂) is therefore linear in N_s, which motivates reducing the FWI running time by selecting only a small subset of the shots that are the most relevant for the inversion process. F is the forward operator, which numerically simulates the propagation of seismic waves through the mechanical medium m̂. In this work, we simulated seismic waves using the acoustic approximation [21], represented by the following wave equation:
\frac{\partial^2 u}{\partial t^2} - V^2 \nabla^2 u = f,
where u = u(x, z, t) is the seismic wave displacement, V is the P-wave velocity model, and f is the perturbation source (i.e., shot) function. The relationship between the forward operator and the wave displacement can be described as F_i(m̂) = R u, where R is a detection operator, responsible for sampling the values calculated by the propagation simulation.
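To make the forward operator F and the detection operator R concrete, the following is a minimal 1D acoustic finite-difference sketch; the grid spacing, wavelet, and boundary treatment are illustrative assumptions, not the 2D solver used in the paper's experiments:

```python
import numpy as np

def forward_1d(v, src, src_ix, rec_ix, dz=7.0, dt=1e-3):
    """Toy 1D acoustic forward operator F: second-order finite differences
    in time and depth, rigid (Dirichlet) boundaries. v is the velocity
    model [m/s] per grid point, src the source wavelet f(t); the returned
    trace is the detection operator R sampling u at receiver index rec_ix."""
    nz, nt = len(v), len(src)
    u_prev, u_curr = np.zeros(nz), np.zeros(nz)
    trace = np.zeros(nt)
    c2 = (v * dt / dz) ** 2                 # squared Courant numbers (stability: < 1)
    for it in range(nt):
        lap = np.zeros(nz)
        lap[1:-1] = u_curr[2:] - 2.0 * u_curr[1:-1] + u_curr[:-2]
        u_next = 2.0 * u_curr - u_prev + c2 * lap
        u_next[src_ix] += dt ** 2 * src[it]  # inject the source term f
        u_prev, u_curr = u_curr, u_next
        trace[it] = u_curr[rec_ix]           # R: record the wavefield at the receiver
    return trace

# Usage: constant 3 km/s medium, 10 Hz Ricker wavelet, 0.4 s of recording.
t = np.arange(400) * 1e-3
arg = (np.pi * 10.0 * (t - 0.1)) ** 2
ricker = (1.0 - 2.0 * arg) * np.exp(-arg)
trace = forward_1d(np.full(200, 3000.0), ricker, src_ix=5, rec_ix=100)
```

Minimizing J(m̂) then amounts to adjusting v until traces such as this one match the observed data at every receiver.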
To reduce the number of utilized shots, we designed a compressed learning [16] deep network, detailed in Section 3, that performs compressed sensing of the shots. Compressed Sensing (CS) enables reconstruction of a signal from a small number of linear projections (i.e., measurements), under certain assumptions detailed in the following. Given a signal x ∈ R^N, an M × N sensing matrix Φ (such that M ≪ N), and a measurement vector y = Φx, the goal of CS is to recover the signal from its measurements. According to CS theory [22], signals that have a sparse representation in the domain of some linear transform can be exactly recovered with high probability from their measurements. While CS was originally developed for general sensing matrices, it was extended [23] to binary sensing matrices. Compressed Learning (CL) is the extension of CS to solving machine learning problems in the CS domain. CL was introduced in [16], which proved that direct inference from the compressive measurements y = Φx is possible with high accuracy in Support Vector Machines. CL extensions to deep learning were introduced in [24,25].
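Shot selection is the special case of binary compressed sensing where each row of Φ picks out a single shot. A small sketch with illustrative sizes (N = 20 candidate shots, M = 4 measurements; the values are toy data, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Illustrative sizes: N = 20 candidate shots, M = 4 selected.
N, M = 20, 4
x = rng.normal(size=N)                  # stand-in signal: one value per shot

# Binary M x N sensing matrix: each row contains a single 1, so the
# measurement y = Phi @ x simply picks out M of the N entries --
# the shot-selection special case of compressed sensing.
selected = rng.choice(N, size=M, replace=False)
Phi = np.zeros((M, N))
Phi[np.arange(M), selected] = 1.0

y = Phi @ x                             # compressive measurements y = Phi x
```

With such a row-selection matrix, recovering x from y is only possible under the sparsity assumptions of CS theory; the network in Section 3 instead learns which rows (shots) to keep.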
In this work, we employed for the first time representation learning (RL) [26] to analyze seismic traces in latent space. RL is utilized for the automatic discovery of useful features from raw data, without relying on manual feature engineering. For RL we utilized an autoencoder, composed of a cascade of encoder and decoder sub-networks. The encoder f_E maps a seismic signal x to a reduced-dimension latent representation z, namely z = f_E(x). The decoder reconstructs the signal from the latent representation, namely x̂ = f_D(z). The latent representation z retains essential information, enabling tasks such as denoising, compression, anomaly detection, and inference, among others, to be performed [18,19].

3. Deep Compressed Learning and Representation Learning in FWI

We designed a 2D Deep Compressed Learning (DCL) architecture (Table 1) that jointly optimizes shot selection and the inversion of the selected shots. A sensing layer for shot selection was designed using a straight-through estimator [6] to create learnable binary weights for all available shots (i.e., '1' indicates a selected shot and '0' a non-selected shot). This layer utilizes a binarization function, ϕ_b(w) = 1(w > 0), returning '1' if w > 0 and '0' otherwise, where w ∈ R is a learnable weight. The binarization function's derivative is zero almost everywhere and undefined at the origin, making it unsuitable for back-propagation. As a result, we employ the binarization function in the forward pass but substitute the hard-sigmoid function [6] in the backward pass. This architecture was trained by minimizing the following mixed loss function:
L(m_i, \hat{m}_i, R, \hat{R}) = \mathrm{MAE}(m_i, \hat{m}_i) + \mu (R - \hat{R})^2,
where MAE is the mean absolute error, and m_i, m̂_i represent the ground-truth and predicted velocity models, respectively. The reconstructed velocity model can be written as m̂_i = R(ϕ_b ⊙ d_i; W), where d_i represents the shot gathers of m_i, ⊙ is the element-wise multiplication operator, and R serves as the DL reconstruction operator. R, R̂ are the target and learned sensing rates, respectively, and μ > 0 controls the trade-off between the two misfit terms. The learned sensing rate is defined as R̂ = (1/N_s) Σ_{i=1}^{N_s} ϕ_b(i), where ϕ_b(i) is the i-th binarized coefficient of the sensing layer. The binarized coefficient's gradient is estimated using an STE. The loss function L(m_i, m̂_i, R, R̂) measures two misfit terms: the mean absolute error (MAE) between the ground-truth velocity model and the reconstructed one m̂_i, provided by the DL reconstruction operator, whose input is the compressively sensed subset of shot gathers, denoted by ϕ_b ⊙ d_i; and the squared error between the target sensing rate (0 < R < 1) and the learned sensing rate R̂. Thus, the optimization of ϕ_b is based on these two misfit terms. Training the DCL model yields two outputs: a set of selected shot gathers (i.e., a learned sensing pattern) and a reconstructed velocity model, which is not used in this work.
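The STE mechanism above (binarize in the forward pass, hard-sigmoid derivative in the backward pass) can be sketched in NumPy; the initialization mirrors Section 3.2, while μ, the learning rate, and the deliberate mismatch between R̂ and the target rate are illustrative assumptions chosen to show a non-zero gradient:

```python
import numpy as np

def phi_b(w):
    """Forward pass of the sensing layer: phi_b(w) = 1 if w > 0 else 0."""
    return (w > 0.0).astype(float)

def ste_grad(w):
    """Backward pass: the STE replaces the zero/undefined derivative of
    phi_b with that of the hard sigmoid clip((w + 1)/2, 0, 1),
    i.e. 1/2 on (-1, 1) and 0 elsewhere."""
    return np.where(np.abs(w) < 1.0, 0.5, 0.0)

# Weights initialized to +/-0.25 as in Section 3.2; here 3 of 8 shots are
# 'on' while the target rate is 0.25, so the rate penalty is active.
w = np.array([0.25, 0.25, 0.25, -0.25, -0.25, -0.25, -0.25, -0.25])
R_target, mu = 0.25, 1.0                       # mu is an illustrative value

R_hat = phi_b(w).mean()                        # learned sensing rate (3/8 here)
# Surrogate gradient of mu * (R_target - R_hat)^2 w.r.t. w through the STE:
grad_w = mu * -2.0 * (R_target - R_hat) * ste_grad(w) / len(w)
w_updated = w - 0.1 * grad_w                   # one SGD step
```

Since R̂ exceeds the target, the surrogate gradient is positive everywhere, so the step nudges every weight down, pushing marginal shots toward deselection.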
We observed that training the DCL multiple times resulted in different sensing patterns, each yielding velocity model reconstructions of comparably high quality (SSIM > 0.9). This variability, namely having several equally good sensing patterns, is reasonable and was also reported by [4] in the context of CS for shot gather reconstruction. Therefore, for all the velocity models in the training data, we can learn multiple sensing patterns of shot gathers by training the DCL network multiple times. However, applying each of the learned patterns to new, unseen data (i.e., selecting different subsets of shots) will not necessarily provide equally good results, and identifying the best sensing pattern for the specific data to invert requires an additional selection step.
Let {θ_1, θ_2, …, θ_P} denote a set of P vectors, where each θ_i ∈ {1, 2, …, N}^K represents a sensing pattern for selecting K shot gathers from a total of N sources, aimed at achieving a high-quality reconstruction of the velocity model m̂. Let S = {X_1, X_2, …, X_N} denote a set of N shot gathers, where each X_i ∈ R^{R×T}, with R, T the number of receivers and time samples, respectively.
Representation learning: we employed a convolutional autoencoder (CAE) to map shot gathers X_i to compact latent representations. The encoder sub-network E projects each shot gather into a low-dimensional latent representation z_i = E(X_i), where z_i ∈ R^{F×H×W} and F×H×W < R×T. The decoder sub-network (including the bottleneck) D reconstructs the CAE input as X̂_i = D(z_i).
We further applied K-means, denoted by f_Kmeans(z_1, …, z_N), using the latent representations as feature vectors, to create K clusters [20,27]. The clustering process results in N labels (one per shot): L = {ŷ(z_1), …, ŷ(z_N)}, where ŷ(z_i) ∈ {1, 2, …, K}. Let S_j denote the subset of shot gathers that belong to class j ∈ {1, …, K}. The subset L_{θ_i} includes only the labels associated with the shot gathers specified by θ_i, defined by
L_{\theta_i} = \left\{ \hat{y}(z_j) \in L \;\middle|\; j \in \theta_i \right\}.
The underlying assumption is that a highly informative sensing pattern includes shot gathers from as many clusters as possible. To quantify this, we assign a diversity score s_1^{θ_i} to each recommendation vector θ_i, calculated as the number of unique elements in L_{θ_i}: s_1^{θ_i} = |unique(L_{θ_i})|, and select the sensing pattern vector with the highest diversity score. In cases where multiple vectors achieve the same maximum value of s_1^{θ_i}, we introduce a distance score s_2^{θ_i}, defined as the sum of pairwise Euclidean distances between the latent representations of the shot gathers within θ_i:
s_2^{\theta_i} = \sum_{n \in \theta_i} \sum_{m \in \theta_i} \lVert z_n - z_m \rVert_2 .
The underlying assumption is that a higher average distance between shot gathers in the latent space, namely a higher s_2^{θ_i} score, indicates that more unique information is carried by these shots, making them more relevant for inversion. The complete solution is summarized in Figure 1 and Algorithm 1.
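The selection rule just described, diversity first, with the pairwise latent distance as a tie-breaker, can be sketched in a few lines; the latent vectors, cluster labels, and candidate patterns below are toy values, not outputs of the trained networks:

```python
import numpy as np

def diversity_score(labels, theta):
    """s1: number of unique K-means cluster labels among the shots in theta."""
    return len({labels[j] for j in theta})

def distance_score(Z, theta):
    """s2: sum of pairwise Euclidean distances between the latent
    representations of the shots in theta."""
    idx = list(theta)
    return sum(np.linalg.norm(Z[n] - Z[m])
               for i, n in enumerate(idx) for m in idx[i + 1:])

def select_pattern(Z, labels, patterns):
    """Selection rule sketch: maximize s1, break ties by the highest s2."""
    return max(patterns, key=lambda th: (diversity_score(labels, th),
                                         distance_score(Z, th)))

# Toy example: 6 shots with 2D latent vectors, 3 clusters (labels assumed
# to come from K-means on the CAE latents), 3 candidate sensing patterns.
Z = np.array([[0., 0.], [0.1, 0.], [5., 5.], [5.1, 5.], [9., 0.], [9.1, 0.]])
labels = np.array([0, 0, 1, 1, 2, 2])
patterns = [(0, 1, 2), (0, 2, 4), (2, 3, 4)]
best = select_pattern(Z, labels, patterns)
```

Here only the pattern (0, 2, 4) covers all three clusters, so it wins on the diversity score alone; the distance score would only matter if several patterns tied on diversity.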
Algorithm 1: Shots Sensing Pattern Selection.

3.1. Dataset Preparation

We created a dataset of 5800 layered 2D velocity models with 5–8 layers using GemPy [28], with velocities ranging from 2 to 4.5 km/s. The dataset was split into the following disjoint sets: 5500 models for training and 250 for validation of the deep learning models (DCL and autoencoder), and 50 models for FWI testing. Each velocity model represented an area of 2 km × 1 km (width × depth), discretized to a grid of 288 × 144 points. The spacing between grid points was set to 6.99 m. Receivers were placed at every other grid point along the lateral dimension. Each shot gather was recorded for a period of one second. For each velocity model, 20 shot gathers were generated, with shots positioned uniformly along the 2 km lateral dimension.

3.2. Deep Learning Architectures and Training

Table 1 and Table 2 present the two deep learning architectures used in our experiments. Table 1 describes our proposed DCL architecture, where all convolution layers include ReLU activations; IN* denotes an instance normalization layer, and Enc1* denotes an encoder layer without max pooling. For the STE, we initialized the model randomly while ensuring that the number of initially selected shot gathers matched the target value. This design enables monitoring of how far the STE deviated from its initial configuration during training. The initial weight for a binary '1' was set to 0.25, and for a binary '0' to −0.25. Table 2 presents the autoencoder architecture used to encode shot gathers (the output of Enc5) for K-means clustering. Prior to training the autoencoder, all shot gathers were normalized to [−1, 1] for consistent input scaling. The ADAM optimizer was used for training all deep learning models, and experiments were conducted on NVIDIA Grace Hopper GH200 GPUs.

4. Performance Evaluation

As a reference for all experiments, FWI was run using the complete set of 20 shot gathers on the 50 test velocity models (not used during training or validation of the deep learning models). Following standard practice, we initialized the FWI executions with a smoothed version of the ground-truth velocity model. We then repeated the FWI experiments on the same test set of velocity models, but using randomly selected subsets of 2, 3, 4, and 5 shot gathers. For each of these configurations, the shot gathers were randomly chosen prior to starting FWI. To ensure statistical robustness, we repeated the random selection experiments three times for each velocity model.
All FWI experiments were run with the following stopping conditions: (i) early stopping after 10 iterations without a decrease in the loss function J(m); (ii) early stopping upon reaching J(m) ≤ 0.05, a threshold beyond which we observed no significant improvements in velocity model reconstruction; or (iii) a limit of 500 iterations. All FWI experiments were performed on multi-core AMD Genoa-X computing nodes. We trained each DCL architecture (i.e., for 2, 3, 4, and 5 shot gathers) three times, resulting in a total of twelve distinct shot gather sensing layers (three per sensing rate). Since we used 50 velocity models for testing, this led to 150 FWI runs per sensing rate. Next, FWI experiments were carried out using each DCL-learned sensing layer, with the standard smoothed initial velocity model. In the final stage, we applied the proposed solution (DCL-RL) to select the shots for FWI and evaluated the inversion quality. Figure 2 presents examples of FWI results under the different shot selection strategies, and Figure 3 summarizes the measured FWI running times for the different numbers of selected shots, clearly indicating the potential for accelerating FWI.
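The three stopping conditions combine into a simple control loop; the sketch below uses a placeholder `step` callable standing in for one FWI iteration (the function name and mock solver are assumptions for illustration):

```python
def run_fwi(step, initial_loss, max_iters=500, patience=10, tol=0.05):
    """Stopping logic sketch: `step(it)` performs one FWI iteration and
    returns the loss J(m). Stops on (i) `patience` iterations without a
    loss decrease, (ii) J(m) <= tol, or (iii) `max_iters` iterations."""
    best, stalled = initial_loss, 0
    loss = initial_loss
    for it in range(1, max_iters + 1):
        loss = step(it)
        if loss <= tol:
            return it, loss, "converged"      # condition (ii)
        if loss < best:
            best, stalled = loss, 0
        else:
            stalled += 1
            if stalled >= patience:
                return it, loss, "stalled"    # condition (i)
    return max_iters, loss, "max_iters"       # condition (iii)

# Usage with a mock solver whose loss halves each iteration:
iters, final_loss, reason = run_fwi(lambda it: 0.5 ** it, initial_loss=1.0)
```

With the halving mock, the loss crosses the 0.05 threshold at iteration 5, so condition (ii) fires long before the patience or iteration limits.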
Table 3 summarizes the results of all the above experiments with noiseless data. The following inversion quality trends were observed: (i) as expected, FWI utilizing a randomly selected subset of the available shots produced less accurate models than using all 20 shots; (ii) DCL-based shot selection is sometimes better than random selection, but not for all sensing rates; (iii) the proposed DCL-RL solution consistently outperforms random shot selection in terms of mean absolute error (MAE, lower is better), structural similarity (SSIM, higher is better), and Peak Signal-to-Noise Ratio (PSNR, higher is better), for all sensing rates. Table 4 summarizes the results of all the above experiments with noisy data (additive Gaussian noise at SNR = 10 dB), clearly demonstrating the advantage of DCL-RL over random selection in the presence of noise as well.

5. Conclusions

We presented a novel solution for seismic source (i.e., shot) selection in FWI, utilizing auxiliary compressed learning and representation learning deep neural networks. The proposed DCL-RL solution achieved a consistent advantage in FWI model reconstruction quality compared to random shot selection, which is often used in practice. Our study also reveals that the advantage of DCL-RL increases at lower sensing rates (e.g., an almost 3 dB PSNR advantage at a 10% sensing rate). These results pave the way for FWI acceleration in large-scale surveys with very large numbers of shot gathers, especially (but not limited to) 3D surveys. Future work should evaluate this method on larger and more diverse 2D and 3D test sets.

Author Contributions

Conceptualization, M.G., A.A. and M.A.-P.; methodology, M.G., A.A. and M.A.-P.; software, M.G. and M.A.-P.; validation, M.G.; formal analysis, M.G., A.A. and M.A.-P.; investigation, M.G.; data curation, M.G.; writing—original draft preparation, M.G., A.A. and M.A.-P.; writing—review and editing, A.A. and M.A.-P.; visualization, M.G. and M.A.-P.; supervision, A.A. and M.A.-P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by TotalEnergies EP R&T US.

Data Availability Statement

Data is contained within the article.

Acknowledgments

The authors acknowledge TotalEnergies EP R&T US, for supporting this work and allowing its publication.

Conflicts of Interest

Author Mauricio Araya-Polo was employed by the company TotalEnergies, EP R&T. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Virieux, J.; Operto, S. An overview of full-waveform inversion in exploration geophysics. Geophysics 2009, 74, WCC1–WCC26. [Google Scholar] [CrossRef]
  2. Wang, X.; Wang, Z.; Lei, X.; Zhu, C.; Gao, J. Seis-PDDN: Seismic Undersampling Design and Reconstruction Using Prior Distribution and Diffusion Null-Space Iteration. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5932313. [Google Scholar] [CrossRef]
  3. Xu, Y.; Yu, S. Active sampling design for seismic data based on dynamic matching distance. In Proceedings of the SEG International Exposition and Annual Meeting; SEG: Houston, TX, USA, 2024; p. SEG-2024. [Google Scholar]
  4. Titova, A.; Wakin, M.B.; Tura, A.C. Achieving robust compressive sensing seismic acquisition with a two-step sampling approach. Sensors 2023, 23, 9519. [Google Scholar] [CrossRef] [PubMed]
  5. Verschuren, M.; Araya-Polo, M. Directional Full-Wave Scatter Source Modelling and Dip-Sensitive Target-Oriented RTM. Annu. Eage Meet. Proc. 2017, 2017, 1–5. [Google Scholar] [CrossRef]
  6. Hubara, I.; Courbariaux, M.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized neural networks. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
  7. Hernandez-Rojas, A.; Arguello, H. Design of Undersampled Seismic Acquisition Geometries via End-to-End Optimization. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5901413. [Google Scholar] [CrossRef]
  8. Zhang, X.; Zhang, S.; Lin, J.; Sun, F.; Zhu, X.; Yang, Y.; Tong, X.; Yang, H. An Efficient Seismic Data Acquisition Based on Compressed Sensing Architecture with Generative Adversarial Networks. IEEE Access 2019, 7, 105948–105961. [Google Scholar] [CrossRef]
  9. Jiang, Y.; Chen, H.; Huang, P.; Guo, L.; Liu, Y. Research on Active Obstacle Avoidance in Seismic Acquisition Using Compressed Sensing and Deep Learning Reconstruction Technology. IEEE Access 2024, 12, 177634–177646. [Google Scholar] [CrossRef]
  10. Dong, X.; Yuan, Z.; Lin, J.; Dong, S.; Tong, X.; Li, Y. Bidirectional physics-constrained full-waveform inversion: Reducing seismic data dependence in velocity model building. Geophys. J. Int. 2026, 244, ggaf466. [Google Scholar] [CrossRef]
  11. Chen, G. Accurate background velocity model building method based on iterative deep learning in sparse transform domain. arXiv 2024, arXiv:2407.19419. [Google Scholar] [CrossRef]
  12. Zheng, Q.; Li, M.; Wu, B. Physics-Guided Self-Supervised Learning Full Waveform Inversion with Pretraining on Simultaneous Source. J. Mar. Sci. Eng. 2025, 13, 1193. [Google Scholar] [CrossRef]
  13. Díaz, E.; Guitton, A. Fast full waveform inversion with random shot decimation. In Proceedings of the SEG International Exposition and Annual Meeting; SEG: Houston, TX, USA, 2011; p. SEG-2011. [Google Scholar]
  14. Kuldeep; Shekar, B. Full waveform inversion with random shot selection using adaptive gradient descent. J. Earth Syst. Sci. 2021, 130, 183. [Google Scholar] [CrossRef]
  15. Gao, F.; Yao, G.; Liu, Z. Seismic waveform inversion with Fourier-based gradient shaping. J. Geophys. Eng. 2025, 22, 1822–1836. [Google Scholar] [CrossRef]
  16. Calderbank, R. Compressed Learning: Universal Sparse Dimensionality Reduction and Learning in the Measurement Domain. Preprint 2009. Available online: https://www.academia.edu/2791697/Compressed_learning_Universal_sparse_dimensionality_reduction_and_learning_in_the_measurement_domain (accessed on 10 March 2026).
  17. Gelboim, M.; Adler, A.; Sun, Y.; Araya-Polo, M. Deep compressed learning for 3D seismic inversion. In Proceedings of the Third International Meeting for Applied Geoscience & Energy Expanded Abstracts; SEG International Exposition and Annual Meeting; SEG: Houston, TX, USA, 2023; Volume 1, p. SEG–2023–3898594. [Google Scholar] [CrossRef]
  18. Fard, M.M.; Thonet, T.; Gaussier, E. Deep K-means: Jointly clustering with k-means and learning representations. Pattern Recognit. Lett. 2020, 138, 185–192. [Google Scholar] [CrossRef]
  19. Lafabregue, B.; Weber, J.; Gançarski, P.; Forestier, G. End-to-end deep representation learning for time series clustering: A comparative study. Data Min. Knowl. Discov. 2022, 36, 29–81. [Google Scholar] [CrossRef]
  20. Carthy, J.; Rey-Devesa, P.; Titos, M.; Benitez, C. Volcano-Seismic Event Detection and Clustering. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 11276–11289. [Google Scholar] [CrossRef]
  21. Tarantola, A. Inversion of seismic reflection data in the acoustic approximation. Geophysics 1984, 49, 1259–1266. [Google Scholar] [CrossRef]
  22. Donoho, D. Compressed sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306. [Google Scholar] [CrossRef]
  23. Lu, W.; Dai, T.; Xia, S.T. Binary Matrices for Compressed Sensing. IEEE Trans. Signal Process. 2018, 66, 77–85. [Google Scholar] [CrossRef]
  24. Adler, A.; Zisselman, E.; Elad, M. Compressed Learning: A Deep Neural Network Approach. In Proceedings of the Signal Processing with Adaptive Sparse Structured Representations (SPARS17); HAL: Paris, France, 2017. [Google Scholar]
  25. Zisselman, E.; Adler, A.; Elad, M. Compressed learning for image classification: A deep neural network approach. In Handbook of Numerical Analysis; Elsevier: Amsterdam, The Netherlands, 2018; Volume 19, pp. 3–17. [Google Scholar]
  26. Bengio, Y.; Courville, A.; Vincent, P. Representation Learning: A Review and New Perspectives. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1798–1828. [Google Scholar] [CrossRef] [PubMed]
  27. Yang, Z.; Li, H.; Tuo, X.; Li, L.; Wen, J. Unsupervised Clustering of Microseismic Signals Using a Contrastive Learning Model. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5903212. [Google Scholar] [CrossRef]
  28. de la Varga, M.; Schaaf, A.; Wellmann, F. GemPy 1.0: Open-source stochastic geological modeling and inversion. Geosci. Model Dev. 2019, 12, 1–32. [Google Scholar] [CrossRef]
Figure 1. The proposed DCL-RL solution evaluates sensing patterns, obtained from trained deep compressed learning models. Using the latent representations of the shot gathers, we assign scores to each learned sensing pattern to identify the most informative subset of shot gathers for reconstructing a velocity model.
Figure 2. FWI results: per each velocity model (AH) the ground truth and inversion using all (20) shot gathers are compared vs. inversion with 4 shots (20% sensing rate), 3 shots (15%), and 2 shots (10%), clearly indicating the advantage of the proposed DCL-RL approach over random sampling.
Figure 3. FWI running time on AMD Genoa-X multi-core vs. number of utilized shot gathers. Blue dots indicate measured times for 2, 3, 5 and 20 shot gathers.
Table 1. Deep compressed learning architecture. IN* denotes instance normalization layer, and Enc1* denotes an encoder layer without max pooling.
Block     Layer   Unit                          Comments
Input     0       20 shot gathers               20 × 864 × 144 grid points
Sensing   1       Binarized Sensing Layer       20 learnable parameters
Enc1      2       Conv2D (32, (5 × 5))          +IN*
          3       Conv2D (32, (5 × 5))          +IN*
          4       MaxPool2D (2 × 2)             +Dropout (0.2)
Enc2      5–7     Enc1 (64)
Enc3      8–10    Enc1 (128)
Enc4      11–13   Enc1 (256)
Enc5      14–15   Enc1* (512)
Dec1      16      ConvTrans2D (256, (2 × 2))    +IN*
          17      Conv2D (256, (2 × 2))         +IN*
          18      Conv2D (256, (2 × 2))         +IN*
Dec2      19–21   Dec1 (128)
Dec3      22–24   Dec1 (64)
Dec4      25–27   Dec1 (32)
          28      Conv2D (1, (1 × 6))           stride = (1 × 6), dilation = (1 × 1)
Output    29      Velocity Model                288 × 144 grid points
Table 2. Autoencoder for shot gathers representation learning.
Block    Layer   Unit
Input    0       Shot gather
Enc1     1       Conv2D (16, (3 × 3), ReLU)
         2       MaxPool2D
Enc2     3–4     Enc1 (8)
Enc3     5–6     Enc1 (8)
Enc4     7–8     Enc1 (8)
Dec1     9       Conv2D (8, (3 × 3), ReLU)
         10      UpSampling2D
Dec2     11–12   Dec1 (8)
Dec3     13–14   Dec1 (8)
Dec4     15–16   Dec1 (16)
Dec5     17      Conv2D (1, (3 × 3), Tanh)
Output   18      Reconstructed Shot Gather
Table 3. FWI results with noiseless data, different sensing rates, and shot selection methods: DCL-RL (Deep CL + Representation Learning + K-means), DCL (Deep CL), and Random Selection. All results are averaged over 50 test velocity models.
Shots (Rate)   Selection   MAE       SSIM      PSNR [dB]
2 (10%)        DCL-RL      0.08698   0.69406   30.265
2              DCL         0.12850   0.63556   27.590
2              Random      0.12872   0.64132   27.559
3 (15%)        DCL-RL      0.06928   0.75678   32.654
3              DCL         0.07670   0.74249   32.141
3              Random      0.09110   0.70532   30.402
4 (20%)        DCL-RL      0.06306   0.76140   32.862
4              DCL         0.06387   0.76695   33.021
4              Random      0.07548   0.74934   31.925
5 (25%)        DCL-RL      0.05076   0.79774   34.355
5              DCL         0.06718   0.77216   33.004
5              Random      0.05776   0.78548   33.506
20 (100%)      All         0.03306   0.86592   36.597
Table 4. FWI results with noisy data (SNR = 10 dB), different sensing rates and shot selection methods. All results averaged over 50 test velocity models.
Shots (Rate)   Selection   MAE       SSIM      PSNR [dB]
2 (10%)        DCL-RL      0.10964   0.61794   28.089
2              DCL         0.14988   0.56901   25.805
2              Random      0.15164   0.57876   25.688
3 (15%)        DCL-RL      0.08712   0.68144   30.211
3              DCL         0.09642   0.66771   29.681
3              Random      0.10930   0.64128   28.358
4 (20%)        DCL-RL      0.07962   0.69448   30.592
4              DCL         0.08043   0.70024   30.769
4              Random      0.08802   0.68682   29.935
5 (25%)        DCL-RL      0.06642   0.73792   32.168
5              DCL         0.08214   0.71129   30.868
5              Random      0.07758   0.71844   31.222
20 (100%)      All         0.04572   0.83075   34.920
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Gelboim, M.; Adler, A.; Araya-Polo, M. Accelerated Full Waveform Inversion by Deep Compressed Learning. Sensors 2026, 26, 1832. https://doi.org/10.3390/s26061832

