1. Introduction
High sandbody-to-mud ratio reservoirs with low connectivity exhibit reduced reservoir continuity due to the presence of laterally extensive, continuous mudstone interlayers between sand bodies. Although these mudstone interlayers are typically very thin, they possess strong lateral continuity, effectively compartmentalizing the sand bodies and forming multiple disconnected reservoir units. Despite their overall low connectivity, such reservoirs offer certain advantages for development. Owing to the significant barrier effect of the mudstone layers on fluid flow, these reservoirs demonstrate favorable compartmentalization performance during staged fracturing and pressure-controlled development, thereby enhancing injection–production efficiency and improving water control. In addition, the dispersed nature of the reservoir units reduces the risk of rapid breakthrough along a single high-permeability channel, offering better controllability and resource utilization potential. Therefore, accurately modeling the spatial architecture of such reservoirs is crucial for achieving fine-scale development and improving recovery efficiency. However, research on modeling these structurally complex reservoirs remains limited, and related simulation methodologies require further advancement. As early as 1965, G. Matheron introduced the concept of the variogram to characterize the spatial structure of regionalized variables, laying the theoretical foundation for two-point geostatistics. This approach uses the variogram to quantify spatial variation between pairs of points and became a primary tool in early geological modeling. However, its inherent limitation lies in capturing only two-point statistical relationships, making it inadequate for representing complex, nonlinear, and geometrically intricate reservoir structures. To address this limitation, researchers proposed the multiple-point statistics (MPS) method [
1], which extracts higher-order spatial relationships by scanning training images with a defined template. This enables the modeling of complex geological patterns beyond the scope of traditional methods. Notably, MPS can provide geologically realistic models even when limited conditioning data are available, thus overcoming the structural expression constraints of conventional variogram-based approaches.
With the advancement of geostatistics, a variety of modeling approaches have been proposed, including those based on geological processes, geological surfaces, and target-based constraints [
2,
3,
4,
5]. These methods have, to some extent, improved the geological realism and structural representation capability of reservoir models. However, compared with multiple-point statistics (MPS), they typically require a deeper understanding of the reservoir architecture and involve more complex modeling procedures.
Despite its advantages, the MPS method encounters significant challenges when applied to high sandbody-to-mud ratio reservoirs with low connectivity. Studies have shown that when the training image has a relatively simple structure, the connectivity of MPS-generated models can closely match that of the training image. However, when the training image contains complex geometric features, MPS tends to produce models with much higher connectivity than the original image [
6]. This issue is particularly pronounced when mudstone interlayers exhibit complex spatial distributions—MPS simulations often fail to preserve the continuity of these features, leading to deficiencies in both geometric realism and geological consistency.
Therefore, there is an urgent need for a modeling approach that maintains the ease of use of MPS while addressing its limitations in reproducing reservoir connectivity under high sandbody-to-mud ratio conditions.
In recent years, with the rapid development of deep learning, neural network methods have been increasingly introduced in various fields. For example, convolutional neural networks (CNNs) have been used for sentence classification [
7] and ultra-deep CNNs for large-scale image recognition [
8]. Particularly, the advent of Generative Adversarial Networks (GANs) [
9] has attracted significant attention due to their unique generative capabilities [
10]. GANs employ adversarial training between a generator and a discriminator to learn complex data distributions, enabling the generation of high-quality images or models.
Following the pioneering work of Laloy et al. [
11], several studies further extended GAN-based approaches to improve geological realism under complex sedimentary structures. Song Suihong et al. [
12], building on Karras et al. [
13,
14], introduced Progressive GANs for hierarchical facies modeling, achieving improved multi-resolution representation and stratigraphic continuity. Chen Mei et al. [
15] subsequently employed Self-Attention GANs (SAGANs) [
16] to capture long-range spatial dependencies, while Chen Qiyu et al. [
17] combined a Laplacian pyramid with self-attention to enhance multi-scale feature learning.
Although these methods improved the visual continuity and realism of generated facies, they share a common limitation: strong dependence on large training datasets. Under limited-data conditions, these GAN-based models tend to overfit the available samples, leading to repetitive patterns, biased connectivity, and reduced geological diversity. Moreover, the training of attention mechanisms in SAGAN-based models often requires thousands of image patches to achieve stable convergence, making them impractical for small geological datasets. Therefore, despite their methodological advances, existing GAN and SAGAN frameworks still struggle to generalize structural variability when only a few or a single training image is available.
To address this limitation, Shaham et al. [
18] proposed SinGAN, a GAN framework trained on a single image. SinGAN performs coarse-to-fine, multi-scale layer-wise training using only one image and can generate diverse results with strong capabilities in structure extraction and local texture learning. Its multi-scale pyramid architecture is particularly advantageous for capturing complex stratification, mudstone interlayers, and irregular boundaries in geological models, making it well-suited for modeling non-stationary and complex reservoirs. Compared to traditional GANs, SinGAN can generate multiple structurally similar yet detail-varied models under limited training data conditions, exhibiting strong generative power and diversity. In this study, a high sandbody-to-mud ratio three-dimensional geological model characterized by thick sand bodies interbedded with thin mudstone layers was selected as the training image. The two-dimensional SinGAN framework was improved and extended to enable three-dimensional generation. Additionally, a self-attention mechanism was incorporated into SinGAN to enhance the model’s ability to represent key features such as mudstone interlayers, facilitating automated modeling of low-connectivity reservoirs with high sandbody-to-mud ratios.
The second section of this paper details the principles and methodologies of generative adversarial networks, single-image generative networks (SinGANs), and the self-attention mechanism. The third section presents empirical testing of the model, comparing simulation results and connectivity performance between MPS and SA-SinGAN. The fourth section provides a summary and discussion of the research findings and modeling workflow.
2. Materials and Methods
2.1. Generative Adversarial Networks
Generative Adversarial Networks (GANs) are an important class of generative models in deep learning. GANs consist of two models that learn and compete with each other, enabling the generation of data similar to the training set, such as images, audio, and text. The two components of a GAN are the generator and the discriminator. The generator takes random noise as input and produces new samples, while the discriminator receives samples generated by the generator and distinguishes them by assigning scores to both generated and real samples.
The objective of the generator is to optimize its model so that the generated samples closely resemble real data. Conversely, the discriminator aims to distinguish between real data samples and those produced by the generator. The generator feeds its outputs to the discriminator for evaluation, and the discriminator’s feedback is used to improve the generator. This adversarial process continues until a dynamic equilibrium is reached, where the generator produces samples that are sufficiently realistic that the discriminator’s ability to differentiate generated samples from real ones approaches random guessing, with scores close to 0.5.
The training of both models is controlled by loss functions, which are optimized iteratively to balance the generator and discriminator. The joint objective of the generator and discriminator is defined as follows (Equation (1)):

$$\min_{G} \max_{D} V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \quad (1)$$

Here, $\mathbb{E}[\cdot]$ represents the expected value over the corresponding distribution, $p_{data}(x)$ represents the distribution of the actual samples, and $p_z(z)$ is a low-dimensional noise distribution. By mapping a noise vector $z$ through the generator function $G$ into the high-dimensional data space, we obtain the generated sample $G(z)$. The computational process and structure of the generative adversarial network are shown in Figure 1.
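To make this adversarial loop concrete, the following minimal PyTorch sketch performs one alternating update under the objective of Equation (1); the generator `G`, discriminator `D` (assumed to output sigmoid scores in [0, 1]), and optimizers are hypothetical placeholders rather than the architecture used later in this paper.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, real, opt_G, opt_D, noise_dim=64):
    """One alternating update following the minimax objective of Equation (1)."""
    z = torch.randn(real.size(0), noise_dim)

    # Discriminator: push scores for real samples toward 1, generated toward 0.
    opt_D.zero_grad()
    d_loss = F.binary_cross_entropy(D(real), torch.ones(real.size(0), 1)) \
           + F.binary_cross_entropy(D(G(z).detach()), torch.zeros(real.size(0), 1))
    d_loss.backward()
    opt_D.step()

    # Generator: push discriminator scores for generated samples toward 1.
    opt_G.zero_grad()
    g_loss = F.binary_cross_entropy(D(G(z)), torch.ones(real.size(0), 1))
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```

At the equilibrium described above, both terms of the discriminator loss approach the value obtained when its scores hover around 0.5.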
2.2. The Core Architecture of SinGAN
The multi-scale pyramid forms the core architectural component of the SinGAN framework. This hierarchical structure enables the progressive generation of 3D geological models from low to high resolution, allowing the network to capture both global structures and fine-scale details at different levels. Through this design, SinGAN performs generation and discrimination sequentially across multiple scales, ensuring consistent feature learning at every resolution.
Each layer (or scale) in the pyramid corresponds to a specific spatial resolution, ranging from the lowest (coarsest) at the bottom to the highest (finest) at the top.
The total number of pyramid levels, denoted $N$, is determined according to the size of the training sample and the desired level of detail. Each level $n$ ($n = 0, 1, \ldots, N$) has its own generator $G_n$ and discriminator $D_n$. The reference model at the $n$-th level, $x_n$, is obtained by progressively downsampling the original model $x_0$ using a fixed scale factor $r$ (typically between 0.7 and 0.8):

$$x_n = x_0 \downarrow_{r^{n}}, \quad n = 0, 1, \ldots, N$$
This coarse-to-fine structure allows each scale to focus on different spatial characteristics: lower levels learn global morphology and connectivity, while higher levels refine local texture and small-scale facies transitions.
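As a concrete illustration of the pyramid construction, the sketch below computes per-level grid sizes for a 200 × 200 × 20 volume under an assumed scale factor r = 0.75 and six levels; the exact factor and level count used in this study may differ.

```python
import math

def pyramid_sizes(base=(200, 200, 20), r=0.75, n_levels=6, min_dim=4):
    """Grid size at level n: each dimension of the original model scaled
    by r**n (level 0 = finest, level n_levels-1 = coarsest), floored at
    min_dim so that thin vertical dimensions never vanish."""
    sizes = []
    for n in range(n_levels):
        s = tuple(max(min_dim, math.ceil(d * r ** n)) for d in base)
        sizes.append(s)
    return sizes

print(pyramid_sizes())
# [(200, 200, 20), (150, 150, 15), (113, 113, 12),
#  (85, 85, 9), (64, 64, 7), (48, 48, 5)]
```

With these assumed settings, the coarsest grid (48 × 48 × 5) is close to the 47 × 47 × 4 noise volume described in Section 2.5.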
At the coarsest level $N$, the generator $G_N$ receives only Gaussian noise $z_N$ as input to produce a rough global structure. For subsequent levels $n < N$, the input is composed of the upsampled output from the previous level, $(\tilde{x}_{n+1}) \uparrow^{r}$, combined with additional noise $z_n$:

$$\tilde{x}_n = G_n\!\left(\sigma_n z_n + (\tilde{x}_{n+1}) \uparrow^{r}\right), \quad n < N$$

where $\sigma_n$ is a noise scaling parameter that controls the degree of randomness introduced at each level. This noise transfer mechanism allows the model to introduce controlled stochasticity while maintaining structural consistency across scales.
During progressive training, the generator and discriminator pairs are trained sequentially from the coarsest to the finest resolution. At each level, the model learns the mapping between the noise input and the reference pattern at the corresponding resolution. Once convergence is achieved at the current scale, its weights are frozen, and the network proceeds to the next finer level. The upsampling operator ensures a smooth transition between scales by resizing both the generated samples and reference images according to the same scale factor $r$.
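A minimal sketch of this coarse-to-fine sampling loop, assuming generators ordered from coarsest to finest and trilinear upsampling, is:

```python
import torch
import torch.nn.functional as F

def generate(generators, sizes, sigmas):
    """Coarse-to-fine sampling: the coarsest level uses pure Gaussian noise;
    each finer level receives the upsampled previous output plus scaled noise.
    `generators`, `sizes` (h, w, d per level), and `sigmas` are ordered from
    coarsest to finest; all names here are illustrative placeholders."""
    out = None
    for G_n, (h, w, d), sigma_n in zip(generators, sizes, sigmas):
        if out is None:                        # coarsest level: noise only
            inp = torch.randn(1, 1, d, h, w)
        else:                                  # finer levels: upsample + noise
            up = F.interpolate(out, size=(d, h, w), mode='trilinear',
                               align_corners=False)
            inp = up + sigma_n * torch.randn_like(up)
        out = G_n(inp)
    return out
```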
As illustrated in Figure 2, the complete pyramid structure can be represented as a generator hierarchy $\{G_0, G_1, \ldots, G_N\}$ and a discriminator hierarchy $\{D_0, D_1, \ldots, D_N\}$. This progressive, coarse-to-fine training strategy enables the model to first capture the global geometry of the reservoir (e.g., channel orientation and connectivity) and then refine fine-scale geological textures such as interbedded mudstone layers and facies transitions, ultimately yielding high-resolution, geologically realistic results.
2.3. Self-Attention Mechanism
The attention mechanism refers to the process of focusing on key information among a large amount of data, effectively identifying and prioritizing the most relevant components. Originally introduced by Bahdanau et al. [
19], the attention mechanism was applied in neural machine translation models, enabling the model to dynamically attend to different parts of the input sentence during translation. This allowed the model to emphasize relevant information instead of relying solely on a fixed-size context vector.
Specifically, the attention mechanism assigns a weight to each word in the input sequence, allowing the model to concentrate on those words that are more relevant to the translation at a given time step. Building upon this concept, Vaswani et al. [
20] proposed the self-attention mechanism, which captures global dependencies by computing the similarity between each position in the input sequence and all other positions. Unlike traditional attention mechanisms that require manually added weights, self-attention automatically calculates the attention weights, significantly improving both parallelization and the model’s capacity to handle long sequences.
The mathematical formulation of the self-attention mechanism is as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$

In this formulation, $Q$, $K$, and $V$ represent the input sequence projected into three distinct subspaces using the learnable matrices $W^{Q}$, $W^{K}$, and $W^{V}$, corresponding to the query, key, and value representations, respectively. The term $\sqrt{d_k}$ is a scaling factor introduced to prevent the dot-product values from becoming excessively large, which could otherwise lead to vanishing gradients during training.
After computing the dot product to measure the similarity between the query and key vectors (i.e., between elements within the sequence), a normalization function—typically the Softmax function—is applied to the resulting scores. This produces a set of attention weights that determine the importance of each element in the sequence relative to others. Specifically, the input feature map is projected into three components: Query, Key, and Value. The dot-product similarity between the Query and Key is computed and normalized using a Softmax function to produce attention weights, which are then used to aggregate the Value vectors. The output is combined with the original feature map via residual connections, thereby enhancing the network’s sensitivity to semantically important regions (
Figure 3).
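The formulation can be verified in a few lines of PyTorch; the sequence length and dimension below are arbitrary illustrative values.

```python
import torch
import torch.nn.functional as F

# Toy sequence: 10 positions, d_k = 16 (illustrative sizes only).
Q = torch.randn(10, 16)
K = torch.randn(10, 16)
V = torch.randn(10, 16)

scores = Q @ K.T / (16 ** 0.5)       # scaled dot-product similarity
weights = F.softmax(scores, dim=-1)  # each row sums to 1 (attention weights)
output = weights @ V                 # weighted aggregation of the values
print(output.shape)                  # torch.Size([10, 16])
```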
To capture horizontal long-range dependencies in 3D facies models while keeping memory and computation costs manageable, we implement a lightweight attention module that operates on the XY plane of each depth slice independently. Given an input tensor $X \in \mathbb{R}^{C \times D \times H \times W}$, for each depth index $d$ we extract the 2D slice $X_d \in \mathbb{R}^{C \times H \times W}$, downsample it by a factor of 4 in each spatial dimension, and compute Query, Key, and Value by $1 \times 1$ convolutions:

$$Q = W_Q * X'_d, \quad K = W_K * X'_d, \quad V = W_V * X'_d$$

Specifically, $X'_d$ is the downsampled slice, and the query/key channel dimension is reduced to $C/8$ for efficiency. After reshaping to matrices $Q, K \in \mathbb{R}^{N \times C/8}$ and $V \in \mathbb{R}^{N \times C}$ (with $N = (H/4)(W/4)$), the attention map is computed by scaled dot-product and softmax:

$$A = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{C/8}}\right)$$

The attended representation is $Y = AV$, which is reshaped and upsampled back to the original slice size and fused via a residual connection:

$$X_d^{\mathrm{out}} = X_d + \gamma \cdot \mathrm{Up}(Y)$$

where $\gamma$ is a learnable scalar initialized to zero. Processing is done slice by slice, so only one $N \times N$ attention matrix is resident in memory at a time, which keeps peak memory usage feasible (e.g., for $H = W = 200$, the downsampled slice gives $N = 50 \times 50 = 2500$, so the attention matrix holds ≈ 6.25 M entries, ≈ 25 MB in float32). We use single-head attention and channel reduction ($C \rightarrow C/8$) rather than multi-head or windowed attention to balance representational capacity and GPU memory constraints. This design preserves lateral continuity while remaining tractable on a single RTX 3060-class GPU (NVIDIA, Santa Clara, CA, USA).
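A condensed PyTorch sketch of this slice-wise module is given below; the layer names and the bilinear resampling choice are illustrative assumptions, not the verbatim implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention3D(nn.Module):
    """Slice-wise XY self-attention with 4x spatial downsampling, C -> C/8
    channel reduction, and a zero-initialized residual gate (Section 2.3)."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.k = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))   # gamma initialized to 0

    def forward(self, x):                           # x: (B, C, D, H, W)
        B, C, D, H, W = x.shape
        out = []
        for d in range(D):                          # one slice in memory at a time
            xd = x[:, :, d]                         # (B, C, H, W)
            xs = F.interpolate(xd, scale_factor=0.25, mode='bilinear',
                               align_corners=False)
            h, w = xs.shape[-2:]
            q = self.q(xs).flatten(2).transpose(1, 2)    # (B, n, C//8)
            k = self.k(xs).flatten(2)                    # (B, C//8, n)
            v = self.v(xs).flatten(2).transpose(1, 2)    # (B, n, C)
            attn = torch.softmax(q @ k / (C // 8) ** 0.5, dim=-1)  # (B, n, n)
            y = (attn @ v).transpose(1, 2).reshape(B, C, h, w)
            y = F.interpolate(y, size=(H, W), mode='bilinear',
                              align_corners=False)
            out.append(xd + self.gamma * y)         # residual fusion
        return torch.stack(out, dim=2)              # (B, C, D, H, W)
```

Because `gamma` starts at zero, the module initially behaves as an identity mapping and gradually learns how much attention-weighted context to inject.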
To further improve numerical stability and convergence speed during training, Batch Normalization layers are applied after each convolutional layer. These layers normalize the feature maps within each mini-batch, mitigating activation distribution shifts and reducing the risk of vanishing gradients, thereby improving the trainability of deep networks and enhancing generalization.
Moreover, to strengthen the model’s nonlinear representation capability, an activation function is applied to the normalized feature maps within each convolutional block. In this study, LeakyReLU is primarily used as the nonlinear activation function. Unlike ReLU, LeakyReLU allows a small portion of negative input values to pass through, effectively alleviating the “dead neuron” issue that can arise in early training stages and supporting the capture of more complex geological textures. The final output layer of each sub-generator uses the Tanh activation function, while the discriminator’s final output layer does not use any activation function, directly producing the discrimination score.
2.4. Design of a Single-Image GAN for Modeling High-Sand-Content Reservoirs
In this study, a three-dimensional single-image generative adversarial network architecture incorporating a self-attention mechanism is proposed. The network is optimized through the selection of appropriate loss functions and a stage-wise training strategy. The resulting model is capable of generating reservoir realizations characterized by high sandbody-to-mud ratios and low connectivity.
2.5. Training Image and Network Architecture
The training image is a three-dimensional geological model with dimensions of 200 × 200 × 20, constructed with reference to the high sandbody-to-mud ratio, low-connectivity model described in Walsh et al. [
6]. The model has a sand-body proportion of approximately 70% (
Figure 4). Due to the presence of thin mudstone interlayers that separate some of the stacked sand bodies, the overall connectivity of the model remains low. This model is used as the input training image for the single-image generative adversarial network.
The generator in this study adopts a progressive generation strategy, composed of multiple cascaded sub-generator modules with identical architectures, as shown in
Table 1. The initial input is a 3D random noise volume with dimensions of 47 × 47 × 4, which is first passed through the initial sub-generator to produce a low-resolution image. Each sub-generator contains five consecutive 3 × 3 × 3 3D convolutional layers. The first four layers use 16 convolutional kernels followed by LeakyReLU activation functions to enhance nonlinear modeling capability and mitigate vanishing gradients.
Meanwhile, to improve the modeling accuracy of key geological structures (such as sandbody boundaries and interbedded mudstones), a three-dimensional self-attention mechanism (SelfAttention3D) was introduced after the convolutional modules of the two intermediate sub-generators, in order to enhance the model’s ability to capture both local and global spatial dependencies.
At the final layer of each sub-generator, a single-channel convolution is applied, followed by a Tanh activation function, to generate a 3D image with dimensions of 47 × 47 × 4. The output is then upsampled via 3D linear interpolation and concatenated with new random noise as the input for the next stage. Each subsequent sub-generator shares the same structure and continues to extract and reconstruct features at higher resolutions. The attention mechanism is applied to the intermediate-resolution sub-generators to enhance structural coherence and feature recognition across different scales while conserving computational resources. Through this progressive architecture, the model can iteratively refine image details, ultimately producing a high-resolution 3D reservoir model with dimensions of 200 × 200 × 20 (
Figure 5).
The discriminator adopts a convolutional neural network architecture symmetrical to that of the generator and consists of multiple sub-discriminators corresponding to the number of sub-generators, with identical structures as shown in
Table 1. Each sub-discriminator comprises five 3 × 3 × 3 3D convolutional layers, combined with LeakyReLU activation functions and BatchNorm normalization layers to progressively extract discriminative features. Unlike the generator, no attention mechanism is introduced in the discriminator in order to maintain architectural simplicity and computational efficiency. This design improves training stability and ensures that the discriminator focuses on capturing local texture and structural differences in the generated images. Each sub-discriminator outputs a single-channel feature map to evaluate the realism of the generated results at different resolutions.
2.6. Network Parameters
In this study, to ensure stable training and improve the balance between the generator and discriminator, the Zero-Centered Gradient Penalty (ZERO-GP) strategy proposed by Roth et al. [
21] was adopted. Unlike the traditional Wasserstein GAN with gradient penalty (WGAN-GP), which enforces a gradient norm of approximately one, ZERO-GP regularizes the discriminator by minimizing the deviation of the gradient norm from zero, effectively preventing gradient explosion and mode collapse. This approach encourages smoother discriminator behavior and yields more stable convergence in limited-data scenarios.
The overall adversarial objective of SA-SinGAN can be formulated as follows:

$$\min_{G} \max_{D} \; \mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] - \lambda_{gp}\, \mathbb{E}_{\hat{x}}\!\left[\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert^{2}\right]$$

where $G$ and $D$ denote the generator and discriminator, respectively, $p_{data}$ is the distribution of real samples, $p_z$ is the noise prior, and $\lambda_{gp}$ controls the strength of the gradient penalty term. The third term represents the ZERO-GP regularization, penalizing large gradient norms near the real and generated data distributions ($\hat{x}$ denotes samples drawn from both).
The discriminator loss is expressed as:

$$L_D = -\,\mathbb{E}_{x \sim p_{data}}[\log D(x)] - \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] + \lambda_{gp}\, \mathbb{E}_{\hat{x}}\!\left[\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert^{2}\right]$$

and the generator loss as:

$$L_G = -\,\mathbb{E}_{z \sim p_z}[\log D(G(z))] + \lambda_{fm} \sum_{i} \lVert f_i(x) - f_i(G(z)) \rVert_{1}$$

where $f_i(x)$ and $f_i(G(z))$ represent intermediate feature maps of the discriminator used in the feature matching loss, and $\lambda_{fm}$ is the corresponding weighting factor. In this study, the weights were empirically set as $\lambda_{adv} = 1$, $\lambda_{fm} = 100$, and $\lambda_{gp} = 10$. These hyperparameters were determined through sensitivity testing to ensure convergence stability and improved generative diversity.
Compared to WGAN-GP and LSGAN regularizations, ZERO-GP demonstrated better numerical stability and faster convergence in our experiments, consistent with previous findings in Roth et al. [
21] and Mescheder et al. [
22], where zero-centered gradient penalties led to improved training robustness, especially in data-limited scenarios such as geological facies modeling.
The specific training parameters are summarized in
Table 2. During training, both the discriminator and generator are optimized synchronously with the same learning rate of 0.0005. This learning rate was selected based on preliminary experiments, which indicated that higher values (e.g., 0.001) caused unstable training and mode collapse, while lower values (e.g., 0.0001) led to very slow convergence. Similar learning rates have also been widely used in prior GAN studies in geoscience applications, providing a balance between training stability and convergence efficiency. The number of training iterations is set to 10,000, with an update frequency of 1:1 between the discriminator and generator. The ZERO-GP method regularizes the discriminator through a zero-centered gradient penalty, which stabilizes training more effectively than conventional gradient penalty approaches.
A zero-centered gradient penalty term is introduced into the discriminator loss function, driving the gradient norm near the real and generated data distributions toward zero, thereby preventing mode collapse and vanishing gradients. The generator loss consists of adversarial loss (weight = 1) and feature matching loss (weight = 100), while the discriminator loss includes real data discrimination loss (weight = 1), generated data discrimination loss (weight = 1), and the zero-centered gradient penalty term (weight = 10). Experimental results demonstrate that this parameter configuration ensures training stability while significantly improving the quality and diversity of generated samples.
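A minimal sketch of the zero-centered gradient penalty, assuming a discriminator `D` that maps a volume to a scalar score per sample, is given below; it illustrates the regularizer rather than reproducing our exact code.

```python
import torch

def zero_centered_gp(D, x, weight=10.0):
    """Penalize the squared gradient norm of D at samples x, pushing
    ||grad D|| toward zero; applied to both real and generated batches."""
    x = x.detach().requires_grad_(True)
    scores = D(x)
    grads, = torch.autograd.grad(outputs=scores.sum(), inputs=x,
                                 create_graph=True)   # keep graph for backprop
    penalty = grads.flatten(1).pow(2).sum(dim=1).mean()
    return weight * penalty
```

During training, this term is evaluated on the real and generated batches and added to the discriminator loss with the weight of 10 stated above.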
3. Results
The proposed SA-SinGAN framework was implemented using the PyTorch 2.1.1+cu118 deep-learning platform. All experiments were conducted on a workstation equipped with an NVIDIA GeForce RTX 3060 GPU (12 GB memory). To ensure strict reproducibility, the pseudorandom number generators in Python 3.10.9, NumPy 1.26.4, and PyTorch were all initialized with a fixed random seed (42). Using a fixed seed guarantees that stochastic processes, including weight initialization, noise sampling, and data shuffling, produce identical outcomes across repeated runs, which is essential for ensuring the reliability and replicability of the presented results. Each scale of the multi-scale pyramid was trained for a maximum of 10,000 iterations. Rather than relying on subjective visual inspection, a quantitative convergence criterion was employed to determine when training at a given scale should terminate. Convergence was declared when the relative change in the loss function satisfied the following condition:
$$\frac{\left| L_t - L_{t-500} \right|}{\left| L_{t-500} \right| + \varepsilon} < \delta$$

for at least 500 consecutive iterations. In this expression, $L_t$ represents the loss value at iteration $t$, $L_{t-500}$ denotes the loss 500 iterations earlier, the numerator measures the recent variation in loss, the denominator normalizes the change to avoid numerical instability when the absolute loss becomes small ($\varepsilon$ is a small positive constant), and the threshold $\delta$ ensures that only negligible loss changes are interpreted as convergence. This criterion provides an objective and stable assessment of convergence under ZERO-GP-regularized adversarial training. The full trajectories of the generator loss, discriminator loss, and gradient-penalty term are presented in
Figure 6, clearly illustrating smooth and stable convergence behavior across all pyramid scales.
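The stopping rule can be implemented as a simple sliding comparison over the recorded loss history; the threshold `delta` below is a placeholder value, since only the 500-iteration window is specified above.

```python
def check_convergence(losses, window=500, delta=1e-3, eps=1e-8):
    """Return True once the relative loss change stays below delta for
    `window` consecutive iterations (delta and eps are assumed values)."""
    if len(losses) < 2 * window:
        return False
    for t in range(len(losses) - window, len(losses)):
        rel = abs(losses[t] - losses[t - window]) / (abs(losses[t - window]) + eps)
        if rel >= delta:
            return False
    return True
```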
Upon completion of the training process, the final SA-SinGAN model was used to generate 30 three-dimensional geological facies realizations with dimensions of 200 × 200 × 20. To ensure reproducibility of the generated realizations, a fixed set of noise tensors was used for all inference procedures. For comparison, an additional 30 realizations of identical size were produced using a conventional Multiple-Point Statistics (MPS) algorithm. Representative samples from both methods were randomly selected for qualitative morphological comparison, followed by quantitative evaluation using similarity and connectivity metrics.
Figure 7 presents a comparison of the images generated by the two methods. The images generated by SA-SinGAN successfully reproduce complex horizontal geological structures similar to those of the training image and exhibit vertically consistent sandbody stacking patterns and continuous mudstone interlayers. In contrast, the models generated by the MPS algorithm fail to effectively reproduce the stacking of sandbodies and the continuity of thin mudstone interlayers. As shown in
Figure 8, the MPS-generated models lack structural continuity in the vertical direction compared to the reference model, resulting in evident structural discontinuities. Moreover, as illustrated in
Figure 9, a comparison of the internal structures of the generated models indicates that SA-SinGAN outperforms MPS in simulating internal geological features.
In addition, the sandbody proportions of the geological models generated by both methods were calculated, and boxplots were plotted accordingly. As shown in
Figure 10, there is little difference (
Table 3) between the SA-SinGAN and MPS methods in terms of simulating sandbody proportion, and both methods are capable of effectively reproducing the high sandbody proportion characteristic.
The variogram can be used to reflect spatial variations in the geological structure of reservoir models, where the range indicates the extent of spatial autocorrelation and continuity of the regional variable. Variograms were calculated for the training image and the geological models generated by the two methods (
Figure 11). The results show that the variogram of the SA-SinGAN-generated models closely resembles that of the training image in shape. In contrast, the variogram values of the MPS-generated models are generally lower than those of the training image, indicating lower similarity in spatial structure.
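For reference, a minimal NumPy sketch of the experimental semivariogram along one axis of a facies volume is shown below; it implements the standard estimator, not the specific software used in this study.

```python
import numpy as np

def variogram_along_x(model, max_lag=50):
    """Experimental semivariogram along the x axis:
    gamma(h) = 0.5 * mean[(Z(x) - Z(x + h))^2], for h = 1..max_lag.
    `model` is a 3D array indexed (x, y, z)."""
    gammas = []
    for h in range(1, max_lag + 1):
        diff = model[h:, :, :].astype(float) - model[:-h, :, :].astype(float)
        gammas.append(0.5 * np.mean(diff ** 2))
    return np.array(gammas)
```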
The connectivity function represents the probability that two points within a given lag distance belong to the same connected body. Theoretically, a rapid decrease in connectivity at short lag distances indicates a complex geological structure with fewer connected bodies. In this study, the connectivity functions of the training image and the models generated by the two methods were calculated (
Figure 12). The results show that the connectivity trend of the SA-SinGAN-generated models is generally consistent with that of the training image, indicating structural resemblance and similar connectivity. In contrast, the MPS-generated models fail to effectively reproduce this structural characteristic.
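The connectivity function can likewise be estimated from labeled connected bodies; the sketch below uses `scipy.ndimage.label` with its default 6-connectivity and measures lags along the x axis only, as a simplified illustration.

```python
import numpy as np
from scipy import ndimage

def connectivity_along_x(sand_mask, max_lag=50):
    """P(h): probability that two sand cells a lag h apart along x belong
    to the same connected body. `sand_mask` is a boolean 3D array (x, y, z)."""
    labels, _ = ndimage.label(sand_mask)        # 0 = mud, >0 = body index
    probs = []
    for h in range(1, max_lag + 1):
        a, b = labels[h:, :, :], labels[:-h, :, :]
        both_sand = (a > 0) & (b > 0)           # pairs where both cells are sand
        same_body = both_sand & (a == b)        # pairs within one connected body
        probs.append(same_body.sum() / max(both_sand.sum(), 1))
    return np.array(probs)
```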
After comparing the geological models generated by the MPS and SA-SinGAN methods, a quantitative evaluation was performed using three metrics: the Fréchet Inception Distance (FID), the average NTG error, and the average connectivity error, based on thirty realizations produced by each method (
Table 3). Among these metrics, the FID was adopted to quantitatively assess the similarity between the generated geological models and the reference training image. It measures the statistical distance between feature distributions extracted by a pretrained Inception network from both real and generated samples. A lower FID value indicates that the generated data distribution is closer to that of the reference, thus representing higher realism and structural consistency. This metric has been widely applied in generative model evaluation and is particularly suitable for comparing the global structural and textural similarity of geological facies models.
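Given feature vectors already extracted by a pretrained Inception network, the FID reduces to the Fréchet distance between two fitted Gaussians; a generic sketch (independent of any particular feature extractor) is:

```python
import numpy as np
from scipy import linalg

def fid(feats_real, feats_gen):
    """FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^{1/2}),
    where mu and S are the mean and covariance of each (N x dim) feature set."""
    mu_r, mu_g = feats_real.mean(0), feats_gen.mean(0)
    s_r = np.cov(feats_real, rowvar=False)
    s_g = np.cov(feats_gen, rowvar=False)
    covmean = linalg.sqrtm(s_r @ s_g)
    if np.iscomplexobj(covmean):        # numerical noise can produce
        covmean = covmean.real          # tiny imaginary components
    return float(((mu_r - mu_g) ** 2).sum()
                 + np.trace(s_r + s_g - 2.0 * covmean))
```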
To evaluate the generalizability of the SA-SinGAN model, training was performed on an edge-type channel geological model, generating multiple realizations.
Figure 13 shows the training image of the edge-type channel, with dimensions of 96 × 64 × 32, created using the target-based method.
Figure 14 presents the corresponding realizations generated by SA-SinGAN.