2.1. Base Network
This study employs Swin-Unet as the foundational architecture, which deeply integrates the hierarchical feature fusion advantages of classic UNet with the efficient global modeling capability of Swin Transformer. Swin Transformer achieves a balance between local window partitioning and cross-window dependency modeling through the synergistic design of Window-Based Multi-Head Self-Attention (W-MSA) and Shifted Window-Based Multi-Head Self-Attention (SW-MSA), enabling hierarchical capture of multi-scale contextual information [
13].
As a classic framework in image signal processing, UNet’s symmetric encoder–decoder architecture extracts multi-scale semantic features via convolutional downsampling (encoder), restores resolution through deconvolutional upsampling (decoder), and directly fuses shallow spatial details with deep semantic features via skip connections, establishing a hierarchical feature representation system [
14].
Building upon UNet’s encoder–decoder framework and skip connection mechanism, Swin-Unet embeds Transformer modules into the feature extraction stage, forming a composite processing pipeline of “local feature extraction—global contextual modeling—multi-scale feature fusion”. This architecture not only inherits UNet’s sensitivity to detailed features but also enhances long-range dependency modeling through Transformer’s attention mechanisms, offering a technically sound solution for seismic profile reconstruction that balances accuracy and efficiency. The overall architecture is illustrated in
Figure 1 (overall architecture diagram of Swin-Unet).
The overall architecture of Swin-Unet in this study consists of four components: encoder, bottleneck, decoder, and three groups of skip connections. The fundamental computational unit is the Swin Transformer block.
In the encoder stage, spatial partitioning of seismic profile images is performed to convert input data into Sequence Embeddings, with a block size set to 4 × 4 pixels [15]. This partitioning strategy calculates the feature dimension of a single image block as 4 × 4 × 3 (number of channels) = 48 dimensions, thereby transforming the original $H \times W \times 3$ input data into a patch-token representation of size
$$\frac{H}{4} \times \frac{W}{4} \times 48.$$
Furthermore, the linear embedding layer is applied to project the input into an arbitrary dimension, denoted $C$, yielding features of size
$$\frac{H}{4} \times \frac{W}{4} \times C.$$
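To make this partitioning step concrete, a minimal PyTorch-style sketch of the 4 × 4 patch partition and linear embedding is given below; the class name, embedding dimension, and the use of a strided convolution are illustrative assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class PatchEmbedding(nn.Module):
    """Illustrative 4 x 4 patch partition followed by a linear embedding."""
    def __init__(self, patch_size=4, in_channels=3, embed_dim=96):
        super().__init__()
        # A stride-4 convolution is equivalent to cutting the image into 4x4 blocks
        # (4 * 4 * 3 = 48 values each) and applying a shared linear projection to C dims.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):                      # x: (B, 3, H, W)
        x = self.proj(x)                       # (B, C, H/4, W/4)
        return x.flatten(2).transpose(1, 2)    # (B, H/4 * W/4, C) patch tokens

tokens = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tokens.shape)  # torch.Size([1, 3136, 96])
```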
The transformed patch tokens are processed through multi-stage Swin Transformer modules and Patch Merging layers to generate Hierarchical Feature Representations. The Patch Merging layer achieves feature dimension expansion via downsampling operations, while Swin Transformer modules are responsible for deep feature representation learning.
The Transformer decoder, inspired by the U-Net architecture, is composed of Swin Transformer modules and Patch Expanding layers. Cross-scale fusion of Context Features and multi-scale encoder features is achieved via Skip Connections, effectively compensating for spatial information loss during downsampling. In contrast to the downsampling mechanism of Patch Merging, the Patch Expanding layer performs upsampling operations, achieving 2× resolution enhancement through neighborhood dimension reshaping. The final Patch Expanding layer completes 4× upsampling to restore the input resolution ($H \times W$), and pixel-level segmentation predictions are generated via a Linear Projection Layer.
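Similarly, the Patch Merging downsampling step can be sketched as a 2 × 2 neighborhood concatenation followed by a linear channel reduction (Patch Expanding performs the inverse reshaping); the dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    """Sketch of Patch Merging: 2x spatial downsampling by regrouping 2x2 neighbors."""
    def __init__(self, dim):
        super().__init__()
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)  # 4C -> 2C channels

    def forward(self, x):                      # x: (B, H, W, C)
        # Concatenate the four pixels of every 2x2 neighborhood along the channel axis.
        x = torch.cat([x[:, 0::2, 0::2, :], x[:, 1::2, 0::2, :],
                       x[:, 0::2, 1::2, :], x[:, 1::2, 1::2, :]], dim=-1)  # (B, H/2, W/2, 4C)
        return self.reduction(x)               # (B, H/2, W/2, 2C)

print(PatchMerging(96)(torch.randn(1, 56, 56, 96)).shape)  # torch.Size([1, 28, 28, 192])
```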
The core architectural component of Swin-Unet is the Swin Transformer, a Transformer variant specifically optimized for visual tasks. Its key innovation lies in the synergistic design of Window-Based Multi-Head Self-Attention (W-MSA) and Shifted Window-Based Multi-Head Self-Attention (SW-MSA), which enables efficient global contextual modeling while maintaining linear computational complexity [
16]. Specifically, W-MSA partitions features into non-overlapping local windows to capture intra-window dependencies while reducing self-attention computational costs, whereas SW-MSA establishes cross-window feature interactions through window position shifting, achieving effective global context fusion without significant computation overhead.
The fundamental module of Swin Transformer consists of Layer Normalization (LN), Window-Based Multi-Head Self-Attention (W-MSA) units, Shifted Window-Based Multi-Head Self-Attention (SW-MSA) units, and Multi-Layer Perceptrons (MLP). Here, LN normalizes features to provide stable inputs for subsequent attention calculations [
17]. W-MSA and SW-MSA form a hierarchical feature extraction mechanism through alternating stacking, enabling joint modeling of local details and global dependencies. The MLP enhances feature representational power via nonlinear transformations. The specific architecture of this module is illustrated in
Figure 2 (architecture of the Swin Transformer block), while two consecutively stacked Swin Transformer units form a progressive contextual information aggregation system through hierarchical attention connections, as schematically depicted in Figure 3 (schematic diagram of two stacked Swin Transformer blocks).
Based on this window partitioning mechanism, two consecutive Swin Transformer blocks can be formulated as
$$\hat{z}^{l} = \text{W-MSA}\left(\text{LN}\left(z^{l-1}\right)\right) + z^{l-1},$$
$$z^{l} = \text{MLP}\left(\text{LN}\left(\hat{z}^{l}\right)\right) + \hat{z}^{l},$$
$$\hat{z}^{l+1} = \text{SW-MSA}\left(\text{LN}\left(z^{l}\right)\right) + z^{l},$$
$$z^{l+1} = \text{MLP}\left(\text{LN}\left(\hat{z}^{l+1}\right)\right) + \hat{z}^{l+1},$$
where $z^{l-1}$ denotes the input feature matrix of the $(l-1)$-th layer; $\hat{z}^{l}$ and $z^{l}$ represent the intermediate and output features of the $l$-th layer after W-MSA and MLP operations, respectively; $\hat{z}^{l+1}$ and $z^{l+1}$ are the intermediate and output features of the $(l+1)$-th layer after SW-MSA and MLP operations; and LN stands for Layer Normalization.
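For reference, the four equations above can be written as a compact PyTorch-style sketch; the w_msa and sw_msa arguments stand in for window-based and shifted-window attention modules and are assumed placeholders, not the authors' code.

```python
import torch
import torch.nn as nn

class ConsecutiveSwinBlocks(nn.Module):
    """Sketch of two consecutive Swin Transformer blocks (W-MSA block, then SW-MSA block)."""
    def __init__(self, dim, w_msa, sw_msa):
        super().__init__()
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.norm3, self.norm4 = nn.LayerNorm(dim), nn.LayerNorm(dim)
        self.w_msa, self.sw_msa = w_msa, sw_msa   # window / shifted-window attention modules
        self.mlp1 = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.mlp2 = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, z):                         # z: (B, L, C) patch tokens
        z_hat = self.w_msa(self.norm1(z)) + z     # \hat{z}^{l}
        z = self.mlp1(self.norm2(z_hat)) + z_hat  # z^{l}
        z_hat = self.sw_msa(self.norm3(z)) + z    # \hat{z}^{l+1}
        z = self.mlp2(self.norm4(z_hat)) + z_hat  # z^{l+1}
        return z

# Shape check with identity placeholders in place of the two attention modules.
blocks = ConsecutiveSwinBlocks(96, nn.Identity(), nn.Identity())
print(blocks(torch.randn(1, 3136, 96)).shape)  # torch.Size([1, 3136, 96])
```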
The Window-Based Multi-Head Self-Attention (W-MSA) serves as a core component of Transformer architecture. It reduces computational complexity by partitioning the input feature map into non-overlapping windows and computing multi-head self-attention within each window. The computational process of W-MSA is as follows:
Suppose the size of the input feature map is $H \times W \times C$, and it is divided into non-overlapping windows of size $M \times M$. The calculation formula for multi-head self-attention within each window is
$$\text{Attention}(Q, K, V) = \text{SoftMax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V.$$
In self-attention mechanisms, Q (Query), K (Key), and V (Value) correspond to feature matching, similarity measurement, and information aggregation, respectively, where $d_k$ denotes the key dimension for dot-product scaling and the SoftMax function enables normalization. The Window-Based Multi-Head Self-Attention (W-MSA) controls computational complexity effectively while preserving local feature dependencies by performing attention calculations within non-overlapping local windows. As a core innovation of Swin Transformer, the Shifted Window-Based Multi-Head Self-Attention (SW-MSA) captures long-range dependencies by shifting feature windows right and down by half the window size, creating overlapping regions for cross-window pixel interaction. Although its fundamental equations align with W-MSA, the input feature map is shifted prior to computation:
$$\hat{X} = \text{Shift}\!\left(X, \left(\left\lfloor \tfrac{M}{2} \right\rfloor, \left\lfloor \tfrac{M}{2} \right\rfloor\right)\right).$$
The difference lies in the fact that the input feature map of SW-MSA undergoes this shift operation before the window is partitioned, thereby achieving cross-window attention calculation.
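The window mechanics can be illustrated with a short sketch of window partitioning and the half-window cyclic shift used before SW-MSA; the (B, H, W, C) tensor layout and the window size M = 7 are assumptions for illustration.

```python
import torch

def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into non-overlapping M x M windows."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)

M = 7                                       # window size
x = torch.randn(1, 56, 56, 96)              # (B, H, W, C) tokens on a 56 x 56 grid

# W-MSA: self-attention is computed independently inside each of the (56/7)^2 = 64 windows.
windows = window_partition(x, M)            # (64, 49, 96)

# SW-MSA: cyclically shift the map by half the window size before partitioning,
# so pixels near former window borders now fall into a common window.
shifted = torch.roll(x, shifts=(-(M // 2), -(M // 2)), dims=(1, 2))
shifted_windows = window_partition(shifted, M)   # (64, 49, 96)
print(windows.shape, shifted_windows.shape)
```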
Layer Normalization (LN) is a commonly used regularization technique, which is used to accelerate training and improve the generalization ability of the model. Its formula is
$$\text{LN}(x) = \gamma \cdot \frac{x - \mu}{\sigma} + \beta,$$
Here, $x$ represents the input vector, $\mu$ is the input mean, $\sigma$ is the input standard deviation, and $\gamma$ and $\beta$ are learnable parameters for scaling and shifting. Layer Normalization is typically applied to the input and output of Transformer modules to maintain data distribution stability. The Multi-Layer Perceptron (MLP), a nonlinear transformation module in Swin Transformer, further maps features nonlinearly. The MLP is computed as follows
$$\text{MLP}(x) = \text{GELU}\left(x W_1 + b_1\right) W_2 + b_2,$$
where $W_1$ and $W_2$ are weight matrices, $b_1$ and $b_2$ are bias vectors, and GELU is the activation function. The MLP enhances the model’s expressive power through nonlinear transformations, enabling Swin Transformer to better model complex feature relationships. By integrating Layer Normalization (LN), Window-Based Multi-Head Self-Attention (W-MSA), Shifted Window-Based Multi-Head Self-Attention (SW-MSA), and Multi-Layer Perceptron (MLP), Swin Transformer significantly enhances global contextual modeling while maintaining computational efficiency. The synergistic effect of these components makes Swin Transformer an efficient and powerful feature extraction tool, particularly suitable for tasks such as image segmentation and signal reconstruction.
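As a small numerical check of the two formulas above (assuming a PyTorch backend), the manual LN computation matches nn.LayerNorm, and the MLP is sketched as the usual two-layer GELU network; the 4× hidden expansion is an assumption.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(4, 96)                              # four token vectors of dimension 96

# LN(x) = gamma * (x - mu) / sigma + beta, computed per token over the feature dimension.
mu = x.mean(dim=-1, keepdim=True)
sigma = x.var(dim=-1, keepdim=True, unbiased=False).sqrt()
ln_manual = (x - mu) / sigma                        # gamma = 1, beta = 0
ln_torch = nn.LayerNorm(96, eps=0.0, elementwise_affine=False)(x)
print(torch.allclose(ln_manual, ln_torch, atol=1e-5))   # True

# MLP(x) = GELU(x W1 + b1) W2 + b2
mlp = nn.Sequential(nn.Linear(96, 384), nn.GELU(), nn.Linear(384, 96))
print(mlp(ln_torch).shape)                          # torch.Size([4, 96])
```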
2.3. Swin-ReshoUnet Model with Hierarchical Convolution and Regional Coordinate Attention Enhancement
The architecture of the improved Swin-ReshoUnet model, as shown in Figure 5 (network structure diagram of Swin-ReshoUnet), continues the encoder–decoder symmetric structure of Swin-Unet, with its core innovations embodied in the synergistic optimization of three technical dimensions: the encoder stage constructs a hierarchical feature extraction system with multi-scale receptive fields by using a hierarchical convolution module for pre-feature enhancement of the Swin-Transformer units; the decoder replaces the traditional window attention module with the Overall Regional Coordinate Attention Mechanism (ORCA) to achieve balanced optimization of computational complexity and long-range dependency modeling capability; and the skip connections embed a dual-channel Residual Channel Attention Mechanism (RCAM), in which the dynamic allocation of feature channel weights and the upgrade of the nonlinear activation function (SiLU instead of ReLU) significantly improve the gradient conduction efficiency and feature expression ability of deep networks. Through the above iterative optimizations, this architecture retains the inherent advantage of multi-scale feature fusion in U-shaped networks and forms a three-level optimization mechanism: “local feature refinement–global dependency modeling–cross-layer information enhancement”. This mechanism synergistically enhances the ability to capture fine geological features, model long-range structural correlations, and mitigate cross-layer feature mismatch, thereby providing a structured technical solution for high-fidelity reconstruction of seismic profile signals.
In the Swin-Transformer modules of the encoder, hierarchical convolution is introduced before W-MSA, while in the decoder’s Swin-Transformer modules, the original W-MSA and SW-MSA are replaced with the ORCA module.
Figure 6—Structures of Swin-Transformer block, Swin HT block, and Swin OT block—illustrates the original Swin-Transformer architecture, the Swin HT module with embedded hierarchical convolution, and the Swin OT module where ORCA substitutes for W-MSA and SW-MSA. This modification enhances multi-scale feature extraction in the encoder and optimizes long-range dependency modeling in the decoder, as visually compared in the figure.
2.3.1. Hierarchical Dilated Convolutions
In the research on seismic profile data reconstruction, we adopt a hierarchical convolution approach to extract multi-scale feature information through convolutional operations at different levels. Specifically, the network architecture is divided into four stages, each employing distinct convolutional kernels and dilation rates to effectively capture features at various scales in seismic profile data [
18,
19,
20]. This hierarchical convolution mechanism is schematically illustrated in
Figure 7 (hierarchical convolution structure diagram), which demonstrates how multi-level receptive fields enable comprehensive characterization of geological structures with diverse spatial resolutions.
Level 1 employs 1 × 1 convolution kernels to preserve spatial dimensions, focusing on extracting high-frequency shallow features. This operation rapidly captures detailed information, laying a foundation for subsequent feature extraction.
Level 2 utilizes 3 × 3 convolution kernels with a dilation rate of 6. This configuration expands the receptive field moderately without increasing computational cost, effectively capturing medium-scale geological structures.
Level 3 maintains 3 × 3 kernels but increases the dilation rate to 12. By further expanding the receptive field, this level captures broader contextual information, crucial for reconstructing complex geological features in seismic data.
Level 4 uses 3 × 3 kernels with a dilation rate of 18, providing the largest receptive field to capture low-frequency features distributed across the seismic profile, thereby enhancing overall reconstruction quality [
21]. This hierarchical convolution design enables the network to extract multi-scale features efficiently, significantly improving its ability to characterize geological structures with varying spatial resolutions. Compared to traditional convolution, this approach balances receptive field expansion and computational efficiency while mitigating information dilution. Mathematically, the feature extraction process of hierarchical convolution can be expressed as follows.
For the $l$-th layer, the convolution operation is formulated as
$$Y_k^{l} = W_k^{l} * X + b_k^{l}, \qquad k = 1, 2, \ldots, K,$$
where $W_k^{l}$ denotes the weight of the $k$-th convolution kernel in the $l$-th layer, $*$ is the (dilated) convolution operator, $X$ represents the input feature map, $b_k^{l}$ is the bias term, and $K$ is the number of convolution kernels in this layer [22].
Through convolutional operations at different layers, the network can gradually extract feature information from local details to global structures, enabling efficient reconstruction of seismic profile data. This hierarchical design not only enhances the model’s expressive power but also improves its adaptability to geological features at different scales, providing strong support for the accurate reconstruction of seismic profile data.
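To illustrate the four-level configuration described above (a 1 × 1 convolution, and 3 × 3 convolutions with dilation rates 6, 12, and 18), a minimal sketch is given below; the channel widths and the concatenation-based fusion are assumptions rather than the exact Swin-ReshoUnet settings.

```python
import torch
import torch.nn as nn

class HierarchicalDilatedConv(nn.Module):
    """Sketch of the four-level hierarchical (dilated) convolution block."""
    def __init__(self, in_ch=96, out_ch=96):
        super().__init__()
        self.level1 = nn.Conv2d(in_ch, out_ch, kernel_size=1)                          # shallow detail
        self.level2 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=6, dilation=6)   # medium scale
        self.level3 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=12, dilation=12) # broad context
        self.level4 = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=18, dilation=18) # low-frequency trends
        self.fuse = nn.Conv2d(4 * out_ch, out_ch, kernel_size=1)                       # assumed fusion layer

    def forward(self, x):                      # x: (B, C, H, W)
        feats = [self.level1(x), self.level2(x), self.level3(x), self.level4(x)]
        return self.fuse(torch.cat(feats, dim=1))

print(HierarchicalDilatedConv()(torch.randn(1, 96, 56, 56)).shape)  # torch.Size([1, 96, 56, 56])
```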
2.3.2. Overall Regional Coordinate Attention Module
In the task of seismic profile data reconstruction, the ORCA (Overall Regional Coordinate Attention) module is specifically designed to address two key limitations of traditional convolutional neural networks in this domain: their fixed receptive fields struggle to adapt to the multi-scale geological features (e.g., thin layers vs. broad formations) in seismic signals, and they have limited capacity to model long-range structural correlations (e.g., fault continuity and stratum trends). Traditional methods often struggle to simultaneously capture global information in both the height and width dimensions, restricting feature representation capability. To this end, the ORCA module performs global average pooling and max pooling along the height and width directions, respectively, to effectively capture multi-dimensional global information and enhance the comprehensiveness of feature extraction [
23].
Moreover, considering the importance of disparities of features across different channels and spatial locations, the ORCA module introduces an attention mechanism. By generating attention maps for height and width dimensions, it weights the input feature maps to enhance key features and suppress trivial ones [
24,
25]. To tackle the computational complexity and memory overhead of high-resolution images and large-scale data, the ORCA module employs grouped processing, dividing the input feature maps into groups along the channel dimension to reduce per-group computation while maintaining feature diversity.
By integrating multi-dimensional global information with attention mechanisms, ORCA not only leverages global context from both height and width but also boosts the model’s ability to capture complex geological structures in seismic profiles [
26], significantly improving reconstruction accuracy and robustness. This design enables the ORCA module to excel in seismic profile reconstruction, efficiently handling multi-scale features for high-quality results. The architecture of the ORCA module is illustrated in
Figure 8 (ORCA module structure diagram).
The ORCA module enhances feature representation by capturing global information along both height and width dimensions of the feature map, generating corresponding attention maps to weight the input feature maps. This design enables ORCA to focus on critical regions, improving the attention mechanism’s effectiveness and ultimately achieving higher-quality feature extraction and model performance [
27]. The module consists of five main components, which are denoted by different colors in the structure diagram.
The first component is feature map grouping, represented by the yellow box in the diagram. The input feature map $X \in \mathbb{R}^{B \times C \times H \times W}$ is divided into $G$ groups along the channel dimension, with each group containing $C/G$ channels, where $B$ denotes the batch size, $C$ the number of channels, and $H$, $W$ the height and width of the feature map. The grouped feature maps are denoted as $X_g \in \mathbb{R}^{B \times (C/G) \times H \times W}$, $g = 1, 2, \ldots, G$.
The second component is global pooling processing, represented by the green box in the diagram, where global average pooling and global max pooling operations are performed on the grouped feature maps along the height and width directions, respectively:
$$z^{h}_{avg} = \frac{1}{W}\sum_{j=1}^{W} X_g(:,:,:,j), \qquad z^{h}_{max} = \max_{1 \le j \le W} X_g(:,:,:,j),$$
$$z^{w}_{avg} = \frac{1}{H}\sum_{i=1}^{H} X_g(:,:,i,:), \qquad z^{w}_{max} = \max_{1 \le i \le H} X_g(:,:,i,:).$$
Here, $X_g$ is the (grouped) input feature map; $z^{h}_{avg}$ and $z^{h}_{max}$ are the horizontal average/max pooling results, with the spatial dimension compressed along the width; $z^{w}_{avg}$ and $z^{w}_{max}$ are the vertical average/max pooling results, with the spatial dimension compressed along the height. The third component is shared convolutional layers, indicated by the red box in the diagram. For each grouped feature map, we apply shared convolutional layers for feature processing. The shared convolutional layers consist of two 1 × 1 convolutional layers, batch normalization layers, and ReLU activation functions, which are used to reduce and restore the channel dimensions:
$$\tilde{z}^{h}_{avg} = \text{Conv}\!\left(z^{h}_{avg}\right), \quad \tilde{z}^{h}_{max} = \text{Conv}\!\left(z^{h}_{max}\right), \quad \tilde{z}^{w}_{avg} = \text{Conv}\!\left(z^{w}_{avg}\right), \quad \tilde{z}^{w}_{max} = \text{Conv}\!\left(z^{w}_{max}\right).$$
Here, $\text{Conv}(\cdot)$ is the shared convolution operation, used to refine spatial features; $\tilde{z}^{h}_{avg}$ and $\tilde{z}^{h}_{max}$ are the convolved results of the horizontal average/max pooling features; $\tilde{z}^{w}_{avg}$ and $\tilde{z}^{w}_{max}$ are the convolved results of the vertical average/max pooling features; the tensor dimensions of the convolved outputs remain consistent with the corresponding pooling results. The fourth component is attention weight computation, shown as the blue box in the diagram. By summing the outputs of the convolutional layers and applying the Sigmoid activation function (where $\sigma$ denotes the Sigmoid function), attention weights for both height and width directions are generated:
$$A^{h} = \sigma\!\left(\tilde{z}^{h}_{avg} + \tilde{z}^{h}_{max}\right), \qquad A^{w} = \sigma\!\left(\tilde{z}^{w}_{avg} + \tilde{z}^{w}_{max}\right).$$
Here, $A^{h}$ and $A^{w}$ are the horizontal/vertical attention maps, which assign weights to different spatial regions; $\sigma$ is the activation function, used to normalize the attention weights to the range [0, 1]; $\tilde{z}^{h}_{avg} + \tilde{z}^{h}_{max}$ and $\tilde{z}^{w}_{avg} + \tilde{z}^{w}_{max}$ are fusions of the convolved average and max pooling features, integrating global context and local saliency. The tensor dimensions of $A^{h}$ and $A^{w}$ are consistent with the input pooling results, ensuring compatibility with the original feature map for recalibration. The fifth component is the application of attention weights, denoted by the purple section in the diagram. The input feature maps are weighted by the computed attention weights to obtain the output feature maps:
$$Y_g = X_g \odot A^{h} \odot A^{w}.$$
Here, the attention weights $A^{w}$ and $A^{h}$ are expanded along the height and width dimensions, respectively, to match the spatial dimensions of the input feature map.
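Combining the five components, an ORCA-style block can be sketched as follows; the group count, reduction ratio, and the exact ordering of normalization and activation are assumptions based on the description above, not the released implementation.

```python
import torch
import torch.nn as nn

class ORCA(nn.Module):
    """Sketch of the Overall Regional Coordinate Attention (ORCA) block."""
    def __init__(self, channels, groups=4, reduction=4):
        super().__init__()
        self.groups = groups
        c = channels // groups
        # Shared 1x1 convolution stack: reduce and then restore the channel dimension.
        self.shared = nn.Sequential(
            nn.Conv2d(c, c // reduction, 1), nn.BatchNorm2d(c // reduction), nn.ReLU(inplace=True),
            nn.Conv2d(c // reduction, c, 1), nn.BatchNorm2d(c))

    def forward(self, x):                                        # x: (B, C, H, W)
        B, C, H, W = x.shape
        xg = x.view(B * self.groups, C // self.groups, H, W)     # 1) channel grouping
        # 2)-3) avg/max pooling along width (horizontal) and height (vertical), shared convs
        zh = self.shared(xg.mean(3, keepdim=True)) + self.shared(xg.amax(3, keepdim=True))
        zw = self.shared(xg.mean(2, keepdim=True)) + self.shared(xg.amax(2, keepdim=True))
        # 4) Sigmoid attention weights for the two directions
        ah, aw = torch.sigmoid(zh), torch.sigmoid(zw)            # (BG, c, H, 1), (BG, c, 1, W)
        # 5) broadcast the weights over width/height and recalibrate the input
        return (xg * ah * aw).view(B, C, H, W)

print(ORCA(96)(torch.randn(2, 96, 56, 56)).shape)  # torch.Size([2, 96, 56, 56])
```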
2.3.3. Residual Channel Attention Mechanism Module
After incorporating hierarchical convolution in the encoder and replacing the window attention with the ORCA module in the decoder, we observed gradient vanishing. To address this, a Residual Channel Attention Mechanism (RCAM) was introduced into the skip connections of Swin-Unet [
28]. The residual structure resolves gradient vanishing via identity mapping, enabling the network to learn input-output differences rather than directly fitting complex transformations. In seismic signal reconstruction, this enhances high-frequency detail capture and mitigates performance degradation in deep networks. The standard residual block is formulated as
$$y = F\!\left(x, \{W_i\}\right) + x,$$
where $x$ denotes the input feature map, $y$ the output feature map, and $F(x, \{W_i\})$ represents the residual function with $\{W_i\}$ as its parameters.
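A minimal sketch of such a residual block is shown below; the two-convolution residual branch and the SiLU activation (following the activation upgrade mentioned above) are illustrative choices.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: y = F(x, {W_i}) + x."""
    def __init__(self, channels):
        super().__init__()
        self.residual = nn.Sequential(                     # residual function F(x, {W_i})
            nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return self.residual(x) + x                        # identity mapping keeps gradients flowing

print(ResidualBlock(48)(torch.randn(1, 48, 32, 32)).shape)  # torch.Size([1, 48, 32, 32])
```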
Channel attention mechanism enhances feature representation by learning importance weights for each channel of the feature map, suppressing irrelevant channels and enhancing target channels [
29]. In seismic profile reconstruction, this mechanism adaptively weights signal channels with different frequencies and phases, improving the signal-to-noise ratio of reconstructed signals.
Traditional channel attention typically employs only global average pooling (GAP) to aggregate channel information, whereas RCAM introduces global max pooling (GMP) to form a dual-channel structure, capturing both global statistical features and salient characteristics of channels [
30,
31,
32]. Specifically, global average pooling calculates the mean value of all spatial locations within a channel, reflecting the overall response intensity of the channel:
$$z_c^{avg} = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} x_c(i, j).$$
Global max pooling highlights local peaks within channels by extracting the maximum values:
$$z_c^{max} = \max_{1 \le i \le H,\; 1 \le j \le W} x_c(i, j),$$
where $x_c(i, j)$ denotes the feature value of the $c$-th channel at position $(i, j)$, and $H \times W$ represents the spatial dimensions of the feature map. Finally, the dual-channel outputs are concatenated and processed through a Multi-Layer Perceptron (MLP) to generate channel weights:
$$M_c = \sigma\!\left(W_2\,\text{ReLU}\!\left(W_1\left[z^{avg};\, z^{max}\right]\right)\right).$$
Here $W_1 \in \mathbb{R}^{(C/r) \times 2C}$ and $W_2 \in \mathbb{R}^{C \times (C/r)}$ are MLP weight matrices, $r$ is the dimensionality reduction ratio, and $\sigma$ is the Sigmoid activation function that normalizes the weights to [0, 1]. The final channel attention output is
$$Y = M_c \odot X,$$
where $\odot$ denotes channel-wise multiplication.
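The dual-pooling channel attention described above can be sketched as follows; the reduction ratio and the ReLU between the two linear layers are assumptions.

```python
import torch
import torch.nn as nn

class DualPoolChannelAttention(nn.Module):
    """Sketch of dual-channel (average- plus max-pooled) channel attention."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(                        # W1, ReLU, W2 on the concatenated vector
            nn.Linear(2 * channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):                                # x: (B, C, H, W)
        z_avg = x.mean(dim=(2, 3))                       # global average pooling -> (B, C)
        z_max = x.amax(dim=(2, 3))                       # global max pooling     -> (B, C)
        m = torch.sigmoid(self.mlp(torch.cat([z_avg, z_max], dim=1)))  # channel weights M_c
        return x * m[:, :, None, None]                   # channel-wise re-weighting

print(DualPoolChannelAttention(64)(torch.randn(2, 64, 32, 32)).shape)  # torch.Size([2, 64, 32, 32])
```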
The schematic diagram of the channel attention mechanism is shown in
Figure 9 (schematic diagram of the channel attention mechanism).
In the skip connections of Swin UNet, RCAM (Residual Channel Attention Mechanism) can be embedded either inside the residual blocks or within the skip paths, whose core structure is
$$y = x + M_c\!\left(F(x)\right) \odot F(x),$$
where $F(x)$ is the residual function, and $M_c(\cdot)$ is the channel attention module that performs channel-wise weighting on the residual features.
The complete procedure includes residual feature extraction, dual-channel global pooling, channel attention weight computation, and attention weighting with a residual connection, which are expressed by Equations (31)–(34), respectively:
$$u = F(x),$$
$$z^{avg} = \text{GAP}(u), \qquad z^{max} = \text{GMP}(u),$$
$$M_c = \sigma\!\left(W_2\,\text{ReLU}\!\left(W_1\left[z^{avg};\, z^{max}\right]\right)\right),$$
$$y = x + M_c \odot u.$$
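A sketch of the overall RCAM wiring on a skip connection is given below; it reuses the DualPoolChannelAttention sketch above, and the composition of the residual branch is an assumption rather than the exact published structure.

```python
import torch
import torch.nn as nn

class RCAM(nn.Module):
    """Sketch of the Residual Channel Attention Mechanism: y = x + M_c(F(x)) * F(x)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.residual = nn.Sequential(                   # residual function F(x)
            nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.attention = DualPoolChannelAttention(channels, reduction)  # defined in the sketch above

    def forward(self, x):                                # x: skip-connection feature (B, C, H, W)
        f = self.residual(x)                             # residual feature extraction
        return x + self.attention(f)                     # channel-weighted residual + identity

print(RCAM(96)(torch.randn(2, 96, 56, 56)).shape)  # torch.Size([2, 96, 56, 56])
```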
The schematic diagram of the residual channel attention mechanism is shown in
Figure 10 (structure diagram of the residual channel attention mechanism).