ADPGAN: Anti-Compression Attention-Based Diffusion Pattern Steganography Model Using GAN

Zhen-Qiang Chen; Yu-Hang Huang; Xin-Yuan Chen; Sio-Long Lo

doi:10.3390/electronics14224426

,

and

¹

Faculty of Innovation Engineering, Macau University of Science and Technology, Macao, China

²

Faculty of Law, Macau University of Science and Technology, Macao, China

^*

Author to whom correspondence should be addressed.

Electronics2025, 14(22), 4426;https://doi.org/10.3390/electronics14224426

This article belongs to the Section Electronic Multimedia

Version Notes

Order Reprints

Review Reports

Abstract

Image steganography is often employed in information security and confidential communications, yet it typically faces challenges of imperceptibility and robustness during transmission. Meanwhile, insufficient attention has been paid to preserving the quality of the secret image after JPEG compression at the receiver, which limits the effectiveness of steganography. In this study, we propose an anti-compression attention-based diffusion pattern steganography model using GAN (ADPGAN). ADPGAN leverages dense connectivity to fuse shallow and deep image features with secret data, achieving high robustness against JPEG compression. Meanwhile, an enhanced attention module and a discriminator are employed to minimize image distortion caused by data embedding, thereby significantly improving the imperceptibility of the host image. Based on ADPGAN, we propose a novel JPEG-compression-resistant image framework that improves the quality of the recovered image by ensuring that the degradation of the reconstructed image primarily stems from sampling rather than JPEG compression. Unlike direct embedding of full-size secret images, we downsample the secret image into a secret data stream and embed it into the cover image via ADPGAN, demonstrating high distortion resistance and high-fidelity recovery of the secret image. Ablation studies validate the effectiveness of ADPGAN, achieving a 0-bit error rate (BER) under JPEG compression at a quality factor of 20, yielding an average Peak Signal-to-Noise Ratio (PSNR) of 39.70 dB for the recovered images.

Keywords:

JPEG compression; steganography; image scaling; deep learning

1. Introduction

With the rapid development of information and communication technologies, as well as the swift adoption of Internet and multimedia technologies, traditional methods of accessing, disseminating, and utilizing information have been disrupted. Daily, millions of individuals post and download digital images via social networking services (SNS). Nevertheless, significant losses have been incurred by rights holders as a result of unlawful actions such as piracy, identity theft, malicious tampering, unauthorized access and distribution, and online fraud, which have been made easier by the public’s yet insecure nature of SNS []. The presence of private elements in most images, including personal facial features, life scenarios, and commercial copyright protections, jeopardizes privacy’s security and integrity. Moreover, to minimize expenses, service providers frequently preprocess images prior to uploading them, employing techniques like JPEG compression to diminish the volume of data requiring processing []. Due to transmission distortion, users may be unable to access or download images with visual quality close to the original. Therefore, three requirements must be satisfied to preserve digital images on SNSs: (1) guaranteeing the integrity of the image’s visual content; (2) making the image resistant to compression-induced distortions; and (3) making the image easily identifiable. However, achieving all three requirements simultaneously has long been considered challenging [].

To fulfill requirement 1, using keys is considered the most direct and efficient approach to safeguarding image security. Image encryption entails converting the original image into a noise-like representation using key-based confusion and diffusion; hence, it prevents users from reverting it to its original form without the key. Peng et al. [] developed a block-based encryption method that provides robust image encryption while preserving the compression efficiency required for real-time transmission. On the other hand, key-based data concealment effectively addresses issues related to confidentiality. Integrating a key prevents unauthorized access to the image, thereby reducing its likelihood of interception. Yang et al. [] integrated deep learning algorithms with keys to facilitate the transmission of confidential data over insecure networks. Singh et al. [] used hidden trigger keys in networks to verify user permissions, preventing unauthorized access and breaches.

For requirement 2, digital media is transmitted and stored through social network channels. Nevertheless, compression processes in the channel, such as JPEG compression, might diminish the visual quality of digital information. JPEG compression reduces file size by performing steps such as color space conversion, downsampling, block splitting, the discrete cosine transform, quantization, and entropy coding. During quantization, the DCT coefficients are processed according to a predefined quantization table, typically discarding smaller coefficients, especially in the high-frequency part. Consequently, the recipient is unable to get the anticipated original digital media, rendering it infeasible for the recipient to retrieve the accurate secret information. This is why many strong watermarking algorithms and steganography methods that work with JPEG compression have been created to protect the integrity of private data. Watermarking emphasizes protecting confidential information, whereas steganography prioritizes statistical security. Ahmadi et al. [] proposed a diffusion watermarking system that leverages a residual structure to learn a robust watermark pattern via end-to-end training. Yang et al. [] presented two advanced modules for neural network-based image steganography to reduce image degradation and enhance the quality of the concealed image. Jia et al. [] improved JPEG resilience by integrating authentic JPEG, simulated JPEG, and noiseless layers. For JPEG robust steganography, both the integrity of the secret information and statistical security are ensured to prevent information loss during transmission through the compression channel.

To meet Requirement 3, we balance security with other factors, including lower processor requirements, bitstream standards, and cryptographic signal processing. An image security method was proposed by Chen et al. [] that focuses on region selection, encrypting only the area of interest, and using a deep learning-based embedding network to embed the encrypted location data into the encrypted image. This method protects sensitive areas while facilitating identification using non-sensitive components; however, the user cannot obtain high-quality images due to JPEG compression.

Based on the aforementioned research, to achieve high imperceptibility and robustness, we propose an anti-compression attention-based diffusion pattern steganography model using GAN (ADPGAN). ADPGAN embeds secret data into the cover image and employs an encoder-noise-decoder structure for end-to-end training to achieve significant robustness against JPEG compression. The image is reshaped and changed into the discrete cosine transform (DCT) domain during the embedding phase. An enhanced dense net is used in the embedding network to leverage both deep and shallow features, making the model more resistant to JPEG compression. Furthermore, incorporating an enhanced attention module enables ADPGAN to identify subtle, detail-rich regions, thereby improving imperceptibility. The incorporation of a discriminator strengthens the security of the secret image by establishing an adversarial dynamic between the domain encoders, thereby enhancing the imperceptibility of the host image. To address the problem of poor image quality extraction under attack, we propose a novel JPEG-compression-resistant image framework (JCIF). The framework first employs an image downsampling network to downsample the secret image into secret data, thereby reducing the amount of data required for embedding by ADPGAN, and then embeds the secret data into the cover image via ADPGAN. Thanks to ADPGAN’s robustness, the quality degradation of the recovered secret image primarily stems from trainable image sampling rather than JPEG compression, thereby improving image quality. Experimental results demonstrate that ADPGAN outperforms in terms of embedded image quality and resilience against JPEG attacks, enabling the framework to maintain the security of the secret image and to achieve near-original fidelity recovery at the receiver. The primary contributions of this study are delineated as follows.

(1) To overcome the limitations of existing deep-learning-based steganography models, which extract only limited data features, we propose ADPGAN, which deploys a circular feature fusion module (CFFM) to learn both shallow and deep image features for fusion with secret data. These features are reused in the transform domain through dense connectivity to enhance robustness.

(2) To enhance image features and reduce visual degradation caused by embedding, we deploy a circular attention module (CAM) that guides data embedding into inconspicuous, textured regions by computing a probability distribution across image feature channels. The integration of CFFM and CAM enables ADPGAN to achieve high imperceptibility and robustness, as confirmed by ablation studies.

(3) Based on ADPGAN, we propose a novel JPEG compression-resistant image framework that ensures high-fidelity images by making the degradation in the revealed image primarily result from sampling rather than JPEG compression. The framework downsamples the secret image into secret data and embeds it into the cover image via ADPGAN. Experimental results show that, even with JPEG compression at a quality factor of 20, the framework maintains a 0 BER while achieving a recovered-image PSNR of 39.70 dB, demonstrating its high fidelity in revealing the secret image under attack.

The rest of the paper is organized as follows: In Section 2, we review the cited related work. In Section 3, we will introduce the details of the proposed scheme in detail and show the experimental results in Section 4. Finally, Section 5 concludes.

2. Related Work

2.1. Image Steganography

Traditional steganography conceals hidden information by altering image pixels in the spatial domain, specifically by manipulating the least significant bit (LSB) []. These methods are straightforward to execute and do not require considerable training data. In contrast to conventional steganography, deep learning-based steganography offers superior robustness and capacity, providing enhanced undetectability and making the concealed data harder to track. Jing et al. [] introduced a novel image-hiding network that uses reversible neural networks to improve image concealment. Lu et al. [] presented a reversible steganography network that enhances payload capacity and facilitates the embedding of numerous images. Cui et al. [] introduced an innovative deep image concealment technique that allows stego-images to exhibit enhanced resilience against steganalysis. To enhance security and robustness, Ahmadi et al. [] incorporated circular convolution into the watermarking framework, enabling the watermark data to diffuse across a substantial image region. Comparing the difference (referred to as the diffusion pattern) between watermark images produced by embedding an all-zero mask and a singular non-zero mask into an identical constant cover image demonstrates that circular convolution enables the diffusion of watermark data across multiple image blocks, rather than merely substituting data within a solitary block.

2.2. Dense-Net

In contrast to conventional deep neural networks that enhance performance by augmenting depth or width, DenseNet is a densely connected neural network in which each layer is connected to all preceding layers []. Concatenating feature maps from multiple layers enhances the diversity of inputs to subsequent layers, enabling each layer to gather additional information []. In DenseNet, each layer obtains input from the feature maps of prior layers, a configuration known as the feedforward property, as shown in Equation (1):

\begin{matrix} X_{i} = Y_{i} ([X_{0}, X_{1}, \dots \dots X_{i - 1}]) \end{matrix}

(1)

The complete dense block consists of eight residual blocks, each with two convolutional layers (

1 \times 1

and

3 \times 3

) []. Every convolutional layer consists of a batch normalization (BN) layer, a rectified linear unit (ReLU) activation, and a convolution (Conv). Batch normalization and ReLU improve training efficiency and introduce non-linearity, thereby enhancing the neural network’s efficacy and stability []. The

1 \times 1

convolution kernel is employed to reduce the number of channels, thereby enhancing computational efficiency []. A

3 \times 3

convolution kernel is used to extract local features from the image, with the number of output feature maps per layer predetermined as r (the growth rate). Due to DenseNet’s advantages in alleviating vanishing gradients, enhancing feature propagation, promoting feature reuse, and reducing model parameters, it has been widely applied in image processing.

2.3. Attention Mechanism

The attention mechanism effectively captures pertinent information in input sequences. During sequential input processing, the model can dynamically allocate weights based on positional information. The attention mechanism skillfully guides the location information in image steganography by focusing on important parts of the image and ignoring unnecessary details. SENet is an exceptional channel attention technique that optimizes feature channel weights via global average pooling and fully connected layers []. To improve hiding and security, Tan et al. integrated SENet into image steganography models to enhance channel properties []. Convolutional block attention module (CBAM) significantly improves representational capacity. Wang et al. [] included CBAM in the model, which helped invisible adversarial watermark images effectively defend against deep neural networks that could misidentify, access, and remove the watermarks. Xie et al. [] added an SE module to the encoder to combine watermark data at both the pixel and channel levels, thereby improving the encoder’s performance. Huang et al. integrated an attention module into the steganographic model to enhance the imperceptibility of the watermarked image by computing an attention mask from global features extracted from the cover image [].

Although the aforementioned models employ various strategies to enhance watermark imperceptibility or robustness, the loss of visual quality in the embedded image caused by severe noise remains a significant concern. In this study, building on prior work, we compute an attention mask in the transform domain to embed data into inconspicuous, texture-rich regions, thereby improving ADPGAN performance. Furthermore, we introduce a JPEG-compression-resistant image framework that integrates an image sampling network and an ADPGAN to mitigate JPEG-compression-induced image degradation. Unlike direct full-size embedding, the framework downsamples the secret image into secret data before embedding it into the image via ADPGAN.

3. Proposed Method

To begin, we show the full structure of JCIF, which consists of two parts: the image sampling network and ADPGAN, which includes an encoder, a noise layer, a decoder, and an adversarial discriminator. Throughout the iterative training process, the parameters of each component will be persistently refined to create a host image characterized by high imperceptibility and robustness, followed by a comprehensive explanation of all the principal modules of JCIF. The comprehensive framework structure is depicted in Figure 1.

Figure 1. The architecture of JCIF.

3.1. Image Sampling Network

The image downsampling network comprises a reversible bijection scaling network and a transformation model, designed to downsample the image into secret data or recover it into a secret image. Drawing inspiration from invertible image rescaling [], we utilize an invertible bijective transformation scaling network to manipulate secret images. The suggested framework reduces the embedding capacity required for the secret images while preserving a satisfactory level of visual quality, since invertible networks have a special advantage in handling degradation-recovery problems with circular compatibility []. From the secret image

I_{S}

, the

i th

Haar transform module is used to get the low-frequency data (

S_{l}^{i}

) and the high-frequency data (

S_{h}^{i}

). This module includes some inductive bias. The application of an additive affine transformation facilitates the creation of a downsampled image

I_{D}

with satisfactory visual quality, alongside an independent and adequately distributed latent representation that encapsulates the lost data, as demonstrated in Equation (2). During the recovery stage, random Gaussian noise is used to replace the high-frequency data lost during downsampling and is incorporated into an enhanced affine transformation. The image

I_{D}

and the random Gaussian noise undergo an inverse affine transformation to recover the secret image

I_{S}^{'}

.

σ (\cdot)

,

ρ (\cdot)

, and

η (\cdot)

are responsible for parameterizing the transformation function of the invertible image rescaling, and we adopt densely connected convolutional blocks based on their effectiveness in image sampling.

\begin{matrix} S_{l}^{i + 1} & = S_{l}^{i} + ϕ (S_{h}^{i}) \\ S_{h}^{i + 1} & = S_{h}^{i} ⊙ \exp (σ (ρ (S_{l}^{i + 1}))) + η (S_{h}^{i + 1}) \\ S_{h}^{i} & = (S_{h}^{i + 1} - η (S_{h}^{i + 1})) ⊙ \exp (- σ (ρ (S_{l}^{i + 1}))) \\ S_{l}^{i} & = S_{l}^{i + 1} - ϕ (S_{h}^{i}) \end{matrix}

(2)

The transformation model enables fine-tuning the downsampled image into secret data. By strategically using the key, the secret image is embedded in the cover image as a byte stream, thereby reducing the volume of data that needs to be embedded, while only individuals with the correct key can access the decrypted image. The key is split into 16 64-bit blocks and passed to the corresponding block. The use of a linear layer ensures that the key is properly segmented and transformed into a pseudo-random key (PRK), which is then XORed with the downsampled image to obtain the secret data (

S_{d a t a}

) for subsequent embedding. This process can be represented as shown in Equation (3):

\begin{matrix} P R K & = Gen (C o n v ((L_{2} (S i g m a (L_{1} (Key)))))) \\ s_{d a t a} & = P R K \cap I_{D} \end{matrix}

(3)

where

C o n v

is the convolutional layer,

L_{1}, L_{2}

are linear layers with an output size of 64, Gen is a pseudorandom number generator,

I_{D}

is the downsampled image, and

s_{d a t a}

is the secret data that will be used in the subsequent embedding process.

3.2. Anti-Compression Attention-Based Diffusion Pattern Steganography Model Using GAN

3.2.1. Encoder Network

Given a cover image

I_{C}

, a binary secret data

S_{d a t a}

, and a strength factor (

α

) as inputs, the sizes of the image and secret data are

H \times W \times 1

and

h \times w \times 1

, respectively. The encoder takes the inputs and generates the host image

I_{H}

, where E consists of the circular attention module (CAM), the circular feature fusion module (CFFM), and two non-trainable transformation layers, including the DCT layer and inverse DCT transformation layers, as shown in Figure 2. The output of encoder (

I_{H}

) is computed by

I_{H} = E (I_{C}, S_{d a t a}, α)

(4)

where

E (\cdot)

represents the encoding process. The use of CAM enables the extraction of global characteristics of the cover image, which are used to guide the development of

I_{H}

with excellent imperceptibility. To acquire both shallow and deep features and integrate them with the secret data to improve robustness against attacks, we used an enhanced CFFM with dense connections. The CFFM incorporates secret data of dimensions

h \times w

into a larger cover image of

H \times W

pixels, yielding an expansion ratio of

\frac{H \times W}{h \times w}

. The CFFM modifies the image mask/pattern in the DCT domain, facilitating the embedding of secret data at multiple locations and thereby significantly improving its imperceptibility and resilience. The secret data residual mask in the spatial domain is subsequently obtained using the inverse DCT transform and incorporated into the cover image with a strength factor to enhance robustness. The subsequent section elaborates on the encoder network’s technical specifications.

Figure 2. The framework of the proposed encoder network.

Space to Depth. The transition from space to depth enhances the cover image’s depth, resulting in a more nuanced feature representation. This process enables the model to extract more intricate details from the cover image and produce more resilient secret data features against compression attacks. To ensure imperceptibility in embedding secret data within a larger cover image, we partition the cover image into multiple blocks of size

M \times N

, with each block accommodating at least one data bit. Given that there are

h \times w

one-bit data to embed according to the size of

S_{d a t a}

, we transform the cover image into a tensor of dimensions

M \times N \times h \times w

. The

M \times N

image blocks are vectorized to constitute each column of the tensor.

DCT/IDCT Transform Layer. Although the spatial domain offers advantages such as a larger embedding capacity and faster, simpler implementation, embedding data in the frequency domain provides better imperceptibility and robustness against attacks []. Utilizing a series of reversible discrete cosine transforms, the basis of an image’s representation is converted between the spatial domain and the DCT domain. Since low-frequency DCT coefficients have minimal impact on overall perceptual quality, fine-tuning the DCT coefficients and adjusting the quantization step sizes can reduce the impact of data embedding on visual quality, thereby enhancing the imperceptibility of the host image. Meanwhile, these coefficients, which capture the image’s primary structure and visual information, are typically more stable than high-frequency coefficients. Therefore, the host image with data embedded in the low-frequency coefficients domain demonstrates superior resilience to JPEG compression attacks. To prevent any modification of the tensor, we employ

1 \times 1

masked convolution with designated inductive biases as a transformation layer to process each block of the cover image that has been turned into a tensor. The data structure is based on the size of the new space, so

n_{F}

convolution masks are needed to support the DCT transformation basis. Each

M \times N

block (t) is reshaped into a column vector to form a

1 \times 1 \times M N

tensor, and the 2D DCT transformation is applied as shown in the formula.

\begin{matrix} t^{⊺} & = {[t^{⊺} (δ)]}_{n_{F} \times 1} \\ = [\frac{1}{M N} \sum_{k = 0}^{N M - 1} t (k) \cos (\frac{(2 ⌊ \frac{k}{M} ⌋ + 1) ⌊ \frac{δ}{M} ⌋ π}{2 M}) \\ \cos (\frac{(2 (k - ⌊ \frac{k}{M} ⌋ M) + 1) (δ - ⌊ \frac{δ}{M} ⌋ M) π}{2 N})]_{n_{F} \times 1} \\ = {[\sum_{k = 0}^{N M - 1} t (k) D (δ, k)]}_{n_{F} \times 1} \\ = {[d_{δ}]}_{n_{F} \times M N} {[t]}_{M N \times 1} = D_{n_{F} \times M N} \cdot t_{M N \times 1} \end{matrix}

(5)

where

k \in [1, M N]

,

δ \in [1, n_{F}]

. During the implementation,

t (k)

is the

k th

element of the vector t.

d_{δ}

, which is a vectorized convolution mask, represents the weight of the

δ th

1 \times 1

convolutional with no bias, while

t^{⊺} (δ)

represents the output of the

δ th

filter mask. The transformation matrix

D = [d_{δ}]

, of dimension

n_{F} \times M N

, has rows containing the corresponding neuron weights, and

t^{⊺}

represents the transformed feature space, i.e., the output of all neurons. Therefore, two-dimensional DCT transformation can be simulated by a fixed-weight convolutional layer.

Circular Attention Module. Employing a suitable attention mask to direct the embedding of secret data in various segments of image blocks can improve image quality and resilience against attacks. Embedding secret data in unobtrusive regions helps minimize image distortion, whereas concealing it in textured sections can effectively resist image processing algorithms. Consequently, the CAM is tasked with identifying covert, texture-rich regions for clandestine data embedding, thereby ensuring that the host image exhibits superior imperceptibility and robustness. The output of CAM (

I_{C A M}

) is computed by

I_{C A M} = C A M (F_{C})

(6)

where

C A M (\cdot)

represents the process of the CAM. Specifically,

L a y e r_{A}

reduces the visual quality degradation caused by data embedding by adjusting the global image features of

I_{C}

, while

L a y e r_{B}

selects inconspicuous yet detailed regions based on the computed probability feature distribution, as illustrated in Figure 2.

L a y e r_{A}

comprises a

C o n v B N R e L U

block (consisting of

3 \times 3

convolution, batch normalization, and ReLU), a

2 \times 2

circular convolution, and a circular-dense block, while

L a y e r_{B}

consists of

C o n_{E A}

,

C o n_{E B}

, and a SoftMax activation function used for calculating the probability distribution across channels.

Inspired by ReDMark [], we strategically use the

C o n v B N R e L U

block and circular convolution to extract local image features. In

L a y e r_{A}

, the circular dense blocks, which substitute standard convolutions with circular convolutions, enhance the robustness and visual quality of image features. The three interconnected, circular, dense blocks, linked via dense connections, are tasked with refining the global features of the input image. In

L a y e r_{B}

,

C o n_{E A}

and

C o n_{E B}

, which consist of batch normalization, ReLU activation, and circular convolution, are responsible for fine-tuning the global features, while the SoftMax activation function calculates the probability distribution of the output channels.

I_{C A M}^{(h, w, q)} = \frac{e^{l a y e r_{B}^{(h, w, q)}}}{\sum_{q = 1}^{F} e^{l a y e r_{B}^{(h, w, p)}}} \times F

(7)

where F represents the quantity of output channels from the blocks. In the CAM, the cover image initially traverses

L a y e r_{A}

to extract deep features, after which

L a y e r_{B}

adjusts these features to derive secret data features that produce inconspicuous, texture-rich feature maps to enhance the visual quality of the host image.

Circular Feature Fusion Module. CFFM is tasked with learning to extract robust image features from all relevant image attributes and integrate them with secret data to counter JPEG compression attacks. After extracting multi-level shallow and deep features, CFFM employs a circular-dense block to integrate secret data with image features

F_{C}

. The output of CFFM (

I_{C F F M}

) is computed by

I_{C F F M} = C F F M (F_{C}, S_{d a t a})

(8)

where

C F F M (\cdot)

represents the process of the CFFM. The multi-layered secret data fusion effectively extracts multi-dimensional features and reuses features, thereby enhancing data integrity.

L a y e r_{C}

consists of a Squeeze-and-Excitation (SE) block [,] and three densely connected circular dense blocks. To counteract significant distortions, we implement the SE module to handle secret data throughout the channel dimension. The input to the circular dense block comprises shallow features from the cover image and the secret data; the output consists of deep features that incorporate the secret data. The utilization of circular dense blocks facilitates the relearning of both shallow and deep features, hence allowing

L a y e r_{C}

to train distinct data patterns. The circular dense block, on the other hand, uses

2 \times 2

circular convolution blocks, which give the neurons in its last layer a larger receptive field than traditional dense blocks, enabling them to share and distribute private data. Despite JPEG compression corrupting an important pixel block, the decoder can still retrieve concealed data from neighboring blocks. Subsequent to

L a y e r_{D}

, comprising

C o n_{E C}

and

C o n_{E D}

blocks, the deep features are subjected to additional processing. The

C o n_{E C}

block reconfigures the pattern of the secret data to enhance resistance to attacks, while the

C o n_{E D}

block adjusts the feature dimensions to align with the CAM result.

Skip Connection and Strength Factor. In CFFM, after generating the deep features, we use the CAM output to refine the patterns of the secret data embedded within the deep features, ensuring that the secret data distribution resides in areas that minimally affect visual quality and are less vulnerable to JPEG compression. To reconcile robustness and imperceptibility, we include a strength factor (

α

) to modulate the strength of the generated features. The output of the encoder is computed by

I_{H} = D C T^{- 1} (α * (I_{C A M} \times I_{C F F M}) + I_{C})

(9)

where

D C T^{- 1}

is the inverse DCT transform. Therefore, while ensuring robustness against attacks, the steganographic features containing secret data can be attenuated to improve the imperceptibility of the host image. To maximize resilience against attacks, we maintain the strength factor constant at one during network training. The host image is generated by incorporating the features, adjusted by the strength factor, into the original image through skip connections. The suggested approach allows the network to generate residual secret data, thereby expediting network training convergence.

3.2.2. Attack Network

We focus on training host images that are resilient to JPEG compression, so that even under significant distortion, secret data can still be accurately retrieved from the relevant feature regions. As illustrated in Figure 1, the proposed framework integrates a differentiable network layer to emulate the JPEG compression attack, thereby enabling end-to-end training. The comprehensive training instructs the encoder to learn more resilient steganographic patterns, enabling the generated host image to perform more effectively in SNSs with JPEG compression.

JPEG compression encodes the DCT coefficients using a quantization table, though this procedure involves non-differentiable operations. Non-differentiable JPEG compression may yield zero gradients during backpropagation, hindering the encoder’s ability to update parameters and produce resilient images, while the decoder fails to get the accurate secret data []. Consequently, a differentiable approximation operation is required to enable assaults on the network. JPEG compression involves encoding and decoding. The encoding process initially segments the image into blocks, then converts them to the DCT domain. The coefficients are subsequently divided by a quantization table established by the quality factor. A differentiable rounding operation is incorporated to prevent the gradient from reaching 0, thereby ensuring the result remains an integer. Likewise, JPEG decoding is the inverse of encoding; after multiplication by the identical quantization table, the result is reverted to the spatial domain. In the differentiable rounding operation, we employ uniform noise

β

to emulate the rounding process, leading to increased distortion of the host image due to the attack:

\begin{matrix} {\tilde{p^{D C T}}}_{1 \times 1 \times M N} & = (\frac{p_{1 \times 1 \times M N}^{D C T}}{Q} + β) \times Q \\ = p_{1, 1, d}^{D C T} + Q β \end{matrix}

(10)

where Q is the quantization table determined by the quality factor,

p_{1 \times 1 \times M N}^{D C T}

is the pixel block of the host image after reshape and DCT transformation,

β \in [- 0.5, 0.5]

is the uniform noise, and

{\tilde{p^{D C T}}}_{1 \times 1 \times M N}

is the output after differentiable quantization. The analysis of Equation (10) indicates that larger elements in the high-frequency region of the quantization matrix result in more pronounced rounding errors. This compels the encoder to reduce the embedding intensity in the high-frequency domain to prevent excessive distortion of high-frequency details caused by quantization errors.

3.2.3. Decoder Network

Figure 3 illustrates that the decoder operates as the inverse of the encoder’s CFFM, yielding a structurally simpler design than the embedding network. The decoder is employed to retrieve secret data from the attack image. The decoder consists of 6 convolutional blocks and uses dense connections to extract deep features from the attacked image.

C o n v_{D 1}

, which is a

C o n v B N R e L U

block, is used to extract shallow features from the attacked image. The other convolutional blocks follow the

C o n v_{C i r}^{2 \times 2}

-

B N

-

L R

structure.

C o n v_{D 6}

is used to reduce the dimensionality of the dense layers to generate features with the same dimensions as the secret data. The concluding

1 \times 1

convolutional layer, comprising neurons with a sigmoid activation function, produces a probability map by using a threshold function for the confidential data, which is presented as the extracted secret data.

Figure 3. The framework of the decoder network.

The implementation of dense connections enables the decoder to obtain deep features similar to those extracted by the encoder, while incorporating a hard threshold function improves the decoder’s error resilience. Despite an attack, if the features extracted from the attacked image differ from the encoded features, the decoder can still effectively retrieve the secret data. When the quality degradation of the host image exceeds the decoder’s maximum error tolerance, the accuracy of the extracted data diminishes, hence impairing the decoder’s extraction capabilities. The decoder’s feedback during training guides the encoder to modify image features to embed data into resilient regions. Simultaneously, owing to the analogous structure, the decoder can effectively extract data features to reconstruct the secret data using a decoding loss.

3.2.4. Adversary Discriminator

The encoder’s performance can be enhanced by a competitive approach using the discriminator []. An enhanced discriminator featuring additional network layers and a more intricate architecture would positively influence the encoder. The discriminator differentiates between the cover image

I_{C}

and the host image

I_{H}

by analyzing the distribution of the cover image. Nevertheless, the encoder produces a host image that closely resembles the cover image, thereby misleading the discriminator and diminishing its detection efficacy. Specifically, after a sequence of adversarial training, the discriminator improves a more secure steganographic method by determining whether the host image contains secret data. Figure 4 illustrates the construction of an efficient discriminator, which features a structure akin to that of the decoder. Following the extraction of shallow features, a configuration including five

C o n v_{C i r}^{2 \times 2} - B N - L R

structures is employed, utilizing

C o n v_{D 6}

blocks. Following five convolutional blocks, pooling, and linear layers are utilized on the output for binary classification.

Figure 4. The framework of the adversary discriminator.

3.3. Training Objective

Our emphasis is on training ADPGAN, in which the embedding network, JPEG simulation layer, adversarial discriminator, and extraction network collectively constitute a holistic, end-to-end trainable model, as illustrated in Figure 1. The JPEG simulation layer emulates JPEG compression and decompression as a differentiable network layer, enabling back-propagation of training gradients and allowing the embedding and extraction networks to train more resilient embedded images during the JPEG compression training phase.

The primary objective of the network module is to develop embedding and extraction networks that produce host images resilient to compression attacks and can completely retrieve the secret data from the host image. The inclusion of the discriminator enhances the visual quality of the host images. Consequently, each network possesses a distinct loss function. The goal of the embedding network is to produce host images that exhibit outstanding invisibility and little distortion. The structural similarity index (SSIM) serves as a training loss function to quantify the decrease in image quality caused by the embedding of secret data.

\begin{matrix} L_{1} & = SSIM (I_{C}, I_{H}) \\ = \frac{(2 μ_{I_{C}} μ_{I_{H}} + c_{1}) (2 σ_{I_{C}, I_{H}} + c_{2})}{(μ_{I_{C}}^{2} μ_{I_{H}}^{2} + c_{1}) (σ_{I_{C}}^{2} σ_{I_{H}}^{2} + c_{2})} \end{matrix}

(11)

where

I_{C}

is the input image of the encoder network,

I_{H}

is the output of the encoder network,

μ_{I_{C}}

and

μ_{I_{H}}

are the mean values of

I_{C}

and

I_{H}

, respectively,

σ_{I_{C}}

and

σ_{I_{H}}

are their variances, and

σ_{I_{C}, I_{H}}

is the covariance of

(I_{C}, I_{H})

. During training, we set the constants of

c_{1}

and

c_{2}

to

10^{- 4}

and

9 \times 10^{- 4}

, respectively. To achieve the robustness of secret information, the decoder uses binary cross-entropy as the loss function.

\begin{matrix} L_{2} & = CE (S_{d a t a}, S_{d a t a}^{'}) \\ = - \sum_{i = 1}^{h \times w} {(S_{d a t a})}_{i} \log ({(S_{d a t a}^{'})}_{i}) + & (1 - {(S_{d a t a})}_{i}) \\ l o g (1 - {(S_{d a t a}^{'})}_{i}) \end{matrix}

(12)

where

S_{d a t a} = [{(S_{d a t a})}_{i}] \in {0, 1}^{h \times w}

is the original binary data,

S_{d a t a}^{'} = [{(S_{d a t a}^{'})}_{i}] \in {[0, 1]}^{h \times w}

is the output of the extraction network, which is the probability of the

h \times w

binary bits

S_{d a t a}

,

P {S_{data}^{'} = S_{d a t a}}

. The discriminator uses binary cross-entropy to update the parameter of network in order to output the likelihood of incorrect judgments, increasing the invisibility of the host image through competition with the encoder.

\begin{matrix} L_{3} & = E_{p \sim I_{C}} [l o g (1 - D (I_{H}))] \end{matrix}

(13)

where

D (\cdot)

is the probability of detecting that

I_{H}

contains

S_{d a t a}

. To achieve high invisibility and robustness in the secret data, we employ a weighted combination of embedding, extraction, and comparison loss functions. The objective is to minimize the total loss, as shown in the Equation (14).

\begin{matrix} L_{total} = λ_{1} L_{1} + λ_{2} L_{2} + λ_{3} L_{3} \end{matrix}

(14)

where

λ_{1}

,

λ_{2}

, and

λ_{3}

represent the weights of

L_{1}

,

L_{2}

, and

L_{3}

, respectively. We employ an end-to-end training methodology for the secret data embedding and extraction process. We adjust the encoder and decoder parameters by minimizing their individual losses. To obtain a host image with high visual quality and enhanced robustness, the encoder is guided not only by the losses from the encoder and the discriminator but also, indirectly, by the decoder’s loss. For instance, the discriminator improves its detection performance on the host image by adjusting its parameters, compelling the encoder to produce images with higher imperceptibility. In contrast, when the encoder receives feedback from the decoder, it adjusts the features of the cover image to make it more attack-resistant while still extracting all secret data from the host image. Consequently, network training achieves commendable performance via multiobjective optimization.

4. Experimental Results

4.1. Experimental Setup and Evaluation Metrics

The proposed framework consists of two independently trained components: an image sampling network and an ADPGAN that handles the embedding and extraction of secret data. Although ADPGAN offers excellent robustness and imperceptibility, it has limitations in embedding capacity. The combination of ADPGAN and the image sampling network enables the framework to overcome the embedding capacity constraint while ensuring high-quality reconstruction of the secret image under attack. In the first component, we adhered to the training procedure described in [], with the modification that we converted the DIV2K dataset [] to grayscale for use as a training set to meet our experimental requirements. In the second section, we utilized the CIFAR-10 dataset [] and the Pascal VOC dataset [] as our training sets. The CIFAR-10 dataset comprises 60,000

32 \times 32

pixel color images, characterized by its small image size, diverse categories, and balanced sample sizes per category, making it one of our chosen training sets. Specifically, we converted the images in this dataset to grayscale. The Pascal VOC dataset, containing over 15,000 color images, can be combined with other datasets to enhance the model’s generalization and robustness. Specifically, we randomly extracted

32 \times 32

pixel image patches from random locations in the dataset images and converted them to grayscale. We combined the processed CIFAR-10 and PASCAL VOC datasets to create training patches with about 1.8 million

32 \times 32

grayscale image patches. To include a variety of patterns, from smooth to detailed, which helps stabilize the training process, we chose a subset that covers the full range of intensities. Codes are available at https://github.com/874434500/ADPGAN.git (accessed on 4 November 2025).

In the secret data embedding and extraction experiment, to achieve a balance between imperceptibility and robustness, we used images measuring

32 \times 32

and secret data sized

4 \times 4

as inputs while configuring the block size to

8 \times 8

during the reshaping phase, resulting in each input image comprising 16 blocks. All convolutional filters in both the embedding and extraction networks were sized at 64, with a stride of 1. To prevent network bias caused by specific data characteristics, we assign secret data to the input blocks using random values. We employed stochastic gradient descent (SGD) to update network parameters and train the network, using the training configurations and settings detailed in Table 1.

Table 1. The training parameters of ADPGAN network.

After training, the secret data end-to-end model only keeps the embedding and extraction layers. These, along with the trained image sampling network, make up the framework. The framework’s robustness is assessed during the testing phase by real-world attacks rather than simulated ones. We use the grayscale-processed ImageNet dataset [] as the set of secret images (

I_{S}

) to test how well the proposed framework performs. ImageNet contains over 1000 different types of scenes that can be used to protect privacy in secret images, such as everyday objects, people at work, and indoor settings. On the other hand, due to the prevalence of the 49 standard test images in the Fdez-Vidal dataset [] for numerical analysis and comparative assessment, we have selected it as the cover image (

I_{C}

). We assess our architecture by embedding

128 \times 128

secret images into

512 \times 512

greyscale cover images. The trained encoder achieves 0.015

(4^{2} / 32^{2})

bits per pixel (BPP) rate, thus enabling the embedding of 4096

(4^{2} / 32^{2} \times 512^{2})

bits of secret data into a

512 \times 512

cover image. In the downgrading phase, the secret image is reduced to a

32 \times 32

scaled format fed into the encoding network to produce the host image (

I_{H}

). After simulating real-world compression attacks, the extraction network recovers the secret data based on the attack image. After the secret data is extracted, it is fed into the image sampling network, where the transformation model creates the

32 \times 32

recovered downsampled image (

I_{D}^{'}

). Finally, the image sampling network generates a recovered secret image (

I_{S}^{'}

) of dimensions

128 \times 128

based on

I_{D}^{'}

.

4.2. Framework Invisibility and Robustness Evaluation

The effectiveness of the proposed framework is demonstrated by the visual effects of recovered images under JPEG compression with different quality factors, as shown in Figure 5. It can be observed that even under severe JPEG compression, the framework still maintains strong anti-distortion capability. This is because ADPGAN exhibits excellent robustness to JPEG compression, allowing the image scaling network and transformation layer to recover high-resolution secret images.

Figure 5. The grayscale visual effect of recovered images by our scheme under JPEG compression with different Quality factors.

The effectiveness of the proposed framework was further validated by evaluating host images with varying intensity parameters. We picked 1000 images at random to use as secret images and evaluated the trained framework by measuring how similar the host image and the cover image were, as well as the quality of the recovered secret image using PSNR and SSIM. Table 2 demonstrates that as

α

increases, the host image exhibits greater robustness against JPEG compression, with host images showing significant resistance to JPEG compression (BER = 0) when

α

exceeds 0.4. However, image quality decreases as

α

increases. This is because a higher

α

enhances the intensity of secret data features in the host image, thereby improving robustness against attacks while reducing imperceptibility. The enhanced secret data features will then alter the pixel blocks of the cover image, thereby reducing the image’s imperceptibility. In summary, the use of

α

enables a balance between invisibility and robustness to be achieved based on different requirements.

Table 2. The image quality of host images with different strength factors and their robustness after being attacked by different JPEG compression levels.

Figure 6 illustrates the image quality of the recovered

I_{D}^{'}

and

I_{S}^{'}

. As

α

increases,

I_{D}^{'}

exhibits improved image quality, enabling the recovery of a high-quality

I_{S}^{'}

. This is because

α

adjusts the intensity of secret data features within the host image, with larger

α

values indicating greater robustness against attacks. Simultaneously, as Q decreases, image quality degrades due to the lower quality factor, causing a significant loss of detail and texture information, particularly in high-frequency components. Nevertheless,

I_{S}^{'}

maintains excellent image quality, attributable to the CFFM module in ADPGAN’s pronounced robustness against severe compression attacks, while the image sampling network provides error tolerance for secret image recovery. In summary, the proposed framework effectively mitigates the degradation in the quality of the secret image under attack.

Figure 6. The recovery performance of the proposed framework under different strength factors and quality factors.

Figure 7 illustrates the invisibility of the framework, encompassing the cover image, host image, secret image, and the recovered secret image. The enlarged sections of the cover image and host image are visually identical, suggesting that the secret data has been embedded in unobtrusive locations. Furthermore, we observe that the intensity of the artifacts generated in the corresponding region of the host image varies across images when residual processing is applied to the designated area, and the difference is amplified by a factor of ten. The fluctuations in the difference matrix indicate that the proposed framework adaptively integrates the secret data based on the image’s local properties. Additionally, after implementing a

Q = 20

JPEG compression attack, we magnified the detailed portions of

I_{S}^{'}

and observed that its texture details closely resemble those of

I_{S}

. This observation is attributed to ADPGAN’s excellent robustness against JPEG compression, which facilitates the recovery of

I_{D}^{'}

into a high-quality

I_{S}^{'}

. The proposed framework enhances the security of the secret image while maintaining the excellent imperceptibility of the host image.

Figure 7. The detailed comparison of different images (

I_{C}

,

I_{H}

,

I_{S}

,

I_{S}^{'}

) under strength factor

α = 0.8

and JPEG compression with

Q = 20

in the proposed framework.

To demonstrate the embedding capacity of the proposed framework, we only adjust the parameters of the image sampling network to generate three models for sampling secret images of sizes

64 \times 64

,

128 \times 128

, and

256 \times 256

, which are then integrated into the framework. To ensure the security of the

I_{S}

, Table 3 presents the performance of different models under compression with

α = 0.8

and Q = 20. Evidently, as the resolution of

I_{S}

increases, the quality of

I_{S}^{'}

degrades; however, the framework still maintains excellent recovered image quality. This is because larger image sizes result in more sampling network parameters and more pixels to process, reducing the model’s ability to minimize information loss. In summary, incorporating the image sampling network ensures the framework maintains embedding capacity, thereby allowing ADPGAN to focus on robustness and imperceptibility. To evaluate the cross-dataset generalization capability of the proposed framework, we utilized the grayscale-processed BOSSBase (v1.01) dataset [] and the Common Objects in Context (COCO) dataset [], as shown in Table 4. The performance across different datasets is significantly consistent within a specific range, and the

I_{S}^{'}

can be recovered with excellent quality. This is because ADPGAN is based on visual perception, enabling the framework to embed data or reconstruct images based on the characteristics of each image. Consequently, the proposed framework maintains robust performance across diverse datasets.

Table 3. The performance of different sieze of secret image under JPEG compression with strength factor

α = 0.8

and JPEG compression with

Q = 20

.

Table 4. The performance of the proposed framework using different datasets under strength factor

α = 0.8

and JPEG compression with

Q = 20

.

4.3. Framework Security Evaluation

The key space size refers to the total number of possible keys in a cryptographic system. It is a critical determinant in evaluating the system’s security, particularly against brute-force attacks. A brute-force attack involves an adversary attempting every possible key until the correct one is found to decrypt the data. A larger key space requires the attacker to try more keys, thereby increasing system security. To ensure a high level of security, the key space should be at least

2^{128}

[]. The proposed framework employs a key size of 1024 bits, yielding a key space of

2^{1024}

and effectively resisting brute-force attacks.

To demonstrate the proposed framework’s resistance to differential attacks, we highlight its high sensitivity to key differences. During the reconstruction of

I_{D}^{'}

, even a minor alteration in the key should result in significant distortion in the recovered image, rendering

I_{S}^{'}

unrecognizable. This characteristic increases the difficulty for attackers to decode the key, thereby enhancing the framework’s robustness. Here, we introduced a slight modification (1 bit) to the initial key, and Figure 8 illustrates the resulting

I_{S}^{'}

. It is evident that

I_{S}^{'}

is heavily corrupted with noise, making the original shape unrecognizable. Table 5 presents the image quality of IR’ and

I_{S}^{'}

, highlighting the proposed framework’s pronounced sensitivity to key changes, where even a 1-bit alteration can result in complete inconsistency in the

P R K

within the image sampling network. As a result, low-resolution

I_{D}^{'}

is recovered as a low-quality

I_{S}^{'}

, thereby enhancing the security of JCIF. In summary, the proposed framework demonstrates excellent robustness against differential attacks.

Figure 8. The comparison of the secret image and the recovered secret image after changing the 1-bit key value.

Table 5. The image quality of the

I_{S}^{'}

after being compressed with

Q = 20

and changing the 1-bit key value.

In addition to JPEG compression, the proposed ADPGAN is applicable to other image processing distortions, such as cropping attacks, Gaussian noise, and more. We introduce a grid-cropping attack that randomly suppresses

8 \times 8

blocks across the entire image to zero, with parameters specifying the percentage of suppressed blocks. Gaussian filters, salt-and-pepper noise, cropping attacks, and sharpening are incorporated into the noise layer during training. Figure 9 and Table 6 illustrate the various attacked images and the model’s robustness against them. The model shows strong resistance to Gaussian and salt-and-pepper noise, as well as sharpening, thanks to DenseNet, which helps it learn and embed data features into strong areas. Simultaneously, even when certain significant image blocks are cropped, the model retains robust data-extraction capabilities, as the incorporation of circular convolution facilitates the diffusion and sharing of secret data. Overall, the suggested framework exhibits robust performance against diverse attacks.

Figure 9. Visual effects of various attacks on host image. (a) Host image. (b) Gaussian noise (

σ

= 45). (c) Salt & Pepper (15%). (d) Grid Crop (30%). (e) Cropping (30%). (f) Sharpening (Rad = 40).

Table 6. The robustness results of the proposed network against different attacks.

4.4. Ablation Study

Effert of Preprocessing Component. This section outlines a series of ablation experiments to clearly illustrate the efficacy of preprocessing techniques and methodologies, such as CFFM. The preprocessing component (Conv-BN-ReLU, circular convolution, and SE block) serves as the initial element of the encoder, tasked with extracting feature maps that are minimally impacted by JPEG compression in the embedded regions. Ablation experiments were devised to evaluate the efficacy of each component of the preprocessing phase, as illustrated in Table 7. It is evident that when specific elements of the preprocessing component are absent, the host images maintain satisfactory visual quality; however, their resilience is markedly diminished compared to the ADPGAN. Given that full secret data extraction is typically possible at

α = 1

, ADPGAN’s advantage is further illustrated by its robustness to JPEG compression at

α = 0.2

. The preprocessing encodes the cover image and the secret data, as well as training feature maps and data attributes that are resilient to JPEG compression.

Table 7. Comparison of different components in preprocessing.

Effert of CFFM. CFFM is tasked with extracting both deep and superficial information from images. The implementation of dense connections and circular convolutions improves robustness. To assess the efficacy of CFFM, three comparable models were employed, wherein the dense connections in CFFM were substituted with successive connections (

C F F M_{S u c}

) and skip connections (

C F F M_{S k i p}

), while circular convolutions were replaced with standard convolutions (

C F F M_{C i r}

). Figure 10 depicts these three kinds of connections. The three models were trained using identical parameters, and their respective outcomes are presented in Table 8.

C F F M_{C i r}

has a PSNR that is 0.55 dB higher than ADPGAN when

α = 0.2

. However, bit error rates rise by 0.09 and 0.042 during compression attacks with Q = 20 and Q = 10, respectively. This is mostly due to circular convolution’s ability to reuse image characteristics, thereby enhancing resilience. The image quality of

C F F M_{S u c}

is worse than that of ADPGAN by 1.58 and 0.6 at

α = 0.2

and

α = 1

, respectively, and its robustness is similarly diminished compared to ADPGAN. This is due to

C F F M_{S u c}

exclusively using the deep features from the circular dense block without feature reutilization, which drastically diminishes robustness. Conversely,

C F F M_{S k i p}

exhibits superior BER compared to

C F F M_{S u c}

due to its utilization of image features to bolster robustness; nonetheless, it exclusively reuses the forward features from the preceding circular dense block. In contrast to others, ADPGAN’s circular dense blocks enhance resilience by relearning shallow features and mitigate image distortion by the reuse of shallow features at the layer level. This indicates that by reusing prior features within circular dense blocks, each block can regain both superficial and deep features, thereby enhancing model performance. In conclusion, feature reuse within circular dense blocks significantly enhances the efficacy of secret data extraction.

Figure 10. Three different types of connections: (a) Successive connection; (b) Non-circular connection; (c) Skip connection.

Table 8. Comparison of different components in CFFM.

Effert of CAM and Attack Layer. Ablation tests on CAM and attack layer were also performed, as illustrated in Table 9. It can be observed that the network without the CAM module (

A D P G A N_{C F F M}

) performs inferior to ADPGAN. This is because using CAM enables the host image to match the features of the cover image, thereby reducing the distortion caused by embedding. Meanwhile, it guides the data to be embedded in inconspicuous, texture-rich regions to enhance robustness. Compared with the network that does not employ circular convolution in the CAM module (

A D P G A N_{A M}

), the CAM module in ADPGAN improves robustness by sharing data among adjacent blocks. The

A D P G A N_{A t t a c k}

model evidently attains superior PSNR and SSIM values relative to the models subjected to attacks. The absence of the attack constraint leads the model to prioritize enhancing imperceptibility, enabling complete data extraction despite decoding loss. Nevertheless, when exposed to a JPEG attack, the resilience markedly diminishes due to the absence of attack training.

Table 9. The comparison of CAM and Attack Layer.

4.5. RGB Visualization

To adapt the framework for color images, we altered the ADPGAN’s parameters (convolutional channels, DCT weights, and quantization table) and transitioned the training set to RGB images to retrain the ADPGAN and the image sampling network, which are then integrated to form the

J C I F_{R G B}

framework. During testing, we used MATLAB R2023b to convert the Fdez-Vidal dataset into RGB images for use as cover images, while the Image_Net dataset served as the secret images. Figure 11 demonstrates the imperceptibility and high fidelity of ADPGAN. Clearly, the cover image and the host image pair at

α = 1

, as well as the secret image and the secret image pair extracted after compression with different quality factors, are visually indistinguishable. Table 10 demonstrates that ADPGAN host images in RGB have superior image quality relative to those in grayscale, as RGB images have more pixels and channels than grayscale images, enabling ADPGAN to detect more subtle regions for data embedding. At

Q = 60

, the BER of ADPGAN escalates due to the increased computational complexity of RGB images, diminishing ADPGAN’s robustness to JPEG compression. Nevertheless, the recovered secret images exhibit high image quality, suggesting that the proposed framework effectively processes color images.

Figure 11. The RGB visual effect of recovered images by our scheme under JPEG compression with different quality factors.

Table 10. The performance of the proposed framework under strength factor

α = 1

.

4.6. Steganographic Analysis

To evaluate the capability of resisting steganalysis, we adopted the methodology presented in [] to transform the BOSSBase v1.01 [] image set into the BOSSBaseColor dataset, which consists of RGB images. The BOSSBaseColor was utilized to generate stego images via HiNet [] with its official parameters, forming cover-stego image pairs. These pairs were then employed as the training set for steganalyzers. We adopted the steganalyzer training methodology presented in [] and evaluated the capability of resisting steganalysis using the miss detection rate (

R_{m d}

), false alarm rate (

R_{f a}

), and error rate (

R_{e r r}

). The

R_{m d}

represents the proportion of stego images incorrectly classified as cover images, the

R_{f a}

denotes the proportion of cover images incorrectly classified as stego images, and

R_{e r r}

is the ratio of the total number of miss detections and false alarms to the total number of samples. The results of different digital image hiding (DIH) methods are shown in Table 11.

Table 11. The security performance (IN %) of DIH methods against known steganalyzers.

J C I F_{R G B}

achieved excellent

R_{m d}

under various steganalyzers, particularly achieving the highest

R_{m d}

against SRNet (89.1%) and CovPoolNet (51.1%). This is attributed to the use of CAM, which embeds data in inconspicuous, texture-rich regions, enabling

J C I F_{R G B}

to effectively evade detection by steganalysis tools. In most cases,

J C I F_{R G B}

achieved the highest

R_{e r r}

, demonstrating its effectiveness in deceiving steganalyzers, except for the results with ZhuNet (low

R_{m d}

and high

R_{f a}

). This is attributed to the steganalyzer’s sensitivity to high-frequency features, which leads it to predominantly predict most samples as cover images. In conclusion, our approach demonstrates excellent resistance to steganalysis.

Under the detection by the steganalysis tool StegExpose [], the performance of JCIF exceeds that of HiNet and ISN but is inferior to MSM-DIH and Baluja, as shown in Figure 12. Because Baluja’s modification scope extends beyond the LSBs, rendering StegExpose, which is designed to capture embedding traces in LSBs, incapable of detection. MSM-DIH focuses on enhancing imperceptibility, whereas JCIF effectively balances robustness and imperceptibility.

Figure 12. The ROC curves of StegExpose for detecting DIH methods.

4.7. Comparison with State-of-the-Art

To demonstrate the robustness of ADPGAN, we compare its steganographic quantitative results for grayscale images with those of other methods, as shown in Table 12. It can be observed that ADPGAN achieves higher extraction accuracy across different quality factors, exhibiting strong resistance to JPEG compression. Among the compared methods, MOANet is the most competitive with ADPGAN, while the others have more errors.

Table 12. Qualitative comparison of steganography restoration under 30, 50, 70 and 90 QFs.

In Figure 13, a visual comparison of our framework with other methods (including HiNet [] and PRIS []) under varying JPEG quality factor degradations clearly demonstrates JCIF’s superior recovery capability. Despite significant color distortion or failure in competing methods when facing severe compression, JCIF effectively reconstructs the secret image with remarkable fidelity and robustness to compression.

Figure 13. The qualitative results of subjective evaluation of different DIH methods in terms of JPEG compression.

To compare with state-of-the-art methods, we conducted experiments simulating degradation and achieved their reported performance on the DIV2K dataset. Table 13 presents the performance of hiding images within host images contaminated by noise or JPEG compression. Our framework not only exhibits superior visual quality in the absence of attacks but also demonstrates remarkable adaptability across varying levels of degradation, with only marginal performance decline, whereas other methods experience significant fidelity degradation. Compared to other schemes, the quality degradation in the proposed framework during secret image recovery primarily stems from information loss during image sampling rather than JPEG compression, enabling controllable image degradation and ensuring high fidelity. Consequently, the retrieved downsampled image

I_{D}^{'}

is distorted to some extent, yet the secret image can still be recovered with high quality.

Table 13. PSNR (dB) results of the proposed

J C I F_{R G B}

and other methods under different levels of degradations. The proposed

J C I F_{R G B}

can achieve superior data fidelity in most settings. The best results are red and the second-best results are blue.

5. Discussion

To achieve excellent imperceptibility and robustness for the proposed framework, we deploy CAM and CFFM in ADPGAN and utilize DCT to enhance network performance. However, this also increases computational complexity, as CAM computation and the multi-layer reuse of secret data increase the network’s parameter count. Therefore, we must acknowledge that, despite ADPGAN achieving strong performance in imperceptibility and robustness, its practical applicability is limited due to high computational cost. Additionally, the use of DCT reduces the embedding capacity of ADPGAN compared to spatial-domain embedding, further limiting its applications.

We evaluate the computational requirements of the proposed framework across different color spaces, with the results presented in Table 14, where the values in parentheses indicate the ADPGAN parameters. It is evident that JCIF is a lightweight and efficient network; however, its limitation is its ability to process only grayscale images. Although

J C I F_{RGB}

can effectively process color images, it requires a larger model size (>15 M) to achieve satisfactory results, with room for improvement in terms of robustness. On the other hand, we evaluated the testing time using a device equipped with 16 GB RAM and an Intel Core i7-6700 processor. The testing times for the two frameworks were 8.32 s and 9.48 s, respectively, demonstrating their suitability for consumer electronics or edge devices.In future work, we first focus on enhancing the embedding capacity of the data while maintaining ADPGAN’s high imperceptibility and robustness. Additionally, we will explore downsampling the secret image before embedding it within the same network and compare its performance with the full-size direct embedding method.

Table 14. The computationally expensive of the proposed framework.

6. Conclusions

This study proposes an anti-compression attention-based diffusion pattern steganography model using GAN (ADPGAN). The model includes an encoder, a decoder, an attack layer, and a discriminator, trained end-to-end to address the performance limitations of existing deep-learning-based steganography models. After transitioning to the transform domain, the circular feature fusion module (CFFM) integrates shallow and deep image features with secret data, enhancing robustness against JPEG compression. The circular attention module (CAM) guides the encoder to embed data into inconspicuous regions, improving the imperceptibility of the host image. To address poor secret-image quality at the receiver, we propose a novel JPEG-compression-resistant image framework (JCIF) based on ADPGAN. Unlike direct full-size embedding, we downsample the secret image using an image downsampling network before embedding it into the image via ADPGAN. Benefiting from ADPGAN’s high robustness, the loss in the recovered secret image at the receiver primarily stems from sampling rather than JPEG compression, enabling the reception of high-quality secret images. Experimental results demonstrate that ADPGAN not only achieves good imperceptibility to evade detection but also exhibits high robustness against JPEG compression. Ablation studies validate ADPGAN’s effectiveness, enabling JCIF to demonstrate enhanced visual quality at the receiver across various levels of degradation. However, pursuing high robustness in ADPGAN results in limited embedding capacity and high computational cost, leading to an oversized framework. In future work, while ensuring robustness and imperceptibility, we focus on increasing the model’s embedding capacity. We will also explore the feasibility of integrating the two networks for joint training—specifically, downsampling the secret image before embedding it—and compare its performance with direct embedding without preprocessing.

Author Contributions

Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Validation, Visualization, Writing—original draft, Writing—review & editing: Z.-Q.C.; Review & editing, project administration, study design: Y.-H.H.; Review & editing, literature retrieval, study design: X.-Y.C.; Review & editing, project administration, methodology, formal analysis: S.-L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Pascal-voc-2012: http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html (accessed on 6 October 2025); Fdez-Vidal: http://decsai.ugr.es/cvg/CG/base.htm (accessed on 6 October 2025); CIFAR-10: https://www.cs.toronto.edu/~kriz/cifar.html; COCO: https://cocodataset.org/#home (accessed on 6 October 2025); BOSSBase(v1.01): https://www.kaggle.com/datasets/lijiyu/bossbase (accessed on 12 October 2025); Image_Net: https://www.image-net.org/update-mar-11-2021.php (accessed on 12 October 2025).

Acknowledgments

The technical ideas, research design, implementation, experimental evaluation, and primary intellectual contributions are solely attributable to the author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Anand, A.; Singh, A.K. A Hybrid Optimization-Based Medical Data Hiding Scheme for Industrial Internet of Things Security. IEEE Trans. Ind. Informat. 2023, 19, 1051–1058. [Google Scholar] [CrossRef]
Melman, A.; Evsutin, O.; Smirnov, D. An Image Watermarking Algorithm in DCT Domain Based on Optimal Patterns. In Proceedings of the 2023 XVIII International Symposium Problems of Redundancy in Information and Control Systems (REDUNDANCY), Moscow, Russian, 24–27 October 2023; pp. 1–5. [Google Scholar]
Kamal, A.A. Searchable encryption of image based on secret sharing scheme. In Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Malaysia, 12–15 December 2017; pp. 1495–1503. [Google Scholar]
Peng, Y.; Fu, C.; Cao, G.; Song, W.; Chen, J.; Sham, C.-W. JPEG-compatible joint image compression and encryption algorithm with file size preservation. ACM Trans. Multimed. Comput. Commun. Appl. 2024, 20, 105. [Google Scholar] [CrossRef]
Yang, H.; Xu, Y.; Liu, X. DKiS: Decay weight invertible image steganography with private key. Neural Netw. 2025, 185, 107148. [Google Scholar] [CrossRef] [PubMed]
Singh, H.K.; Baranwal, N.; Singh, K.N.; Singh, A.K. GANMarked: Using secure GAN for information hiding in digital images. IEEE Trans. Consum. Electron. 2024, 70, 6189–6195. [Google Scholar] [CrossRef]
Ahmadi, M.; Norouzi, A.; Karimi, N.; Samavi, S.; Emami, A. ReDMark: Framework for residual diffusion watermarking based on deep networks. Expert Syst. Appl. 2020, 146, 113157. [Google Scholar] [CrossRef]
Yang, H.; Xu, Y.; Liu, X.; Ma, X. PRIS: Practical robust invertible network for image steganography. Eng. Appl. Artif. Intell. 2024, 133, 108419. [Google Scholar] [CrossRef]
Jia, Z.; Fang, H.; Zhang, W. MBRS: Enhancing robustness of DNN-based watermarking by mini-batch of real and simulated JPEG compression. In Proceedings of the 29th ACM International Conference on Multimedia, New York, NY, USA, 20–24 October 2021; pp. 41–49. [Google Scholar] [CrossRef]
Chen, Z.; Liu, Y.; Ke, G.; Wang, J.; Zhao, W.; Lo, S. A region-selective anti-compression image encryption algorithm based on deep networks. Int. J. Comput. Intell. Syst. 2024, 17, 117. [Google Scholar] [CrossRef]
Mielikainen, J. LSB matching revisited. IEEE Signal Process. Lett. 2006, 13, 285–287. [Google Scholar] [CrossRef]
Jing, J.; Deng, X.; Xu, M.; Wang, J.; Guan, Z. HiNet: Deep Image Hiding by Invertible Network. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 4713–4722. [Google Scholar]
Lu, S.-P.S. Large-capacity Image Steganography Based on Invertible Neural Networks. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10811–10820. [Google Scholar]
Cui, Q.; Tang, W.; Zhou, Z.; Meng, R.; Nan, G.; Shi, Y.-Q. Meta Security Metric Learning for Secure Deep Image Hiding. IEEE Trans. Dependable Secur. Comput. 2024, 21, 4907–4920. [Google Scholar] [CrossRef]
Tembhare, N.P.; Tembhare, P.U.; Chauhan, C.U. Chest X-ray analysis using deep learning. Int. J. Sci. Technol. Eng. 2023, 11, 1441–1447. [Google Scholar]
Huang, G.; Liu, Z.; Pleiss, G.; van der Maaten, L.; Weinberger, K.Q. Convolutional networks with dense connectivity. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 8704–8716. [Google Scholar] [CrossRef]
Glorot, X.; Bordes, A.; Bengio, Y. Deep sparse rectifier neural networks. J. Mach. Learn. Res. 2011, 15, 315–323. [Google Scholar]
Szegedy, C.C. Rethinking the inception architecture for computer vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; Volume 42, pp. 2011–2023. [Google Scholar]
Tan, J.; Liao, X.; Liu, J.; Cao, Y.; Jiang, H. Channel Attention Image Steganography With Generative Adversarial Networks. IEEE Trans. Netw. Sci. Eng. 2022, 9, 888–903. [Google Scholar] [CrossRef]
Wang, J.; Wang, H.; Zhang, J.; Wu, H.; Luo, X.; Ma, B. Invisible Adversarial Watermarking: A Novel Security Mechanism for Enhancing Copyright Protection. ACM Trans. Multimed. Comput. Commun. Appl. 2025, 21, 1–22. [Google Scholar] [CrossRef]
Xie, S.; Zhao, C.; Sun, N.; Li, W.; Ling, H. Picking watermarks from noise (PWFN): An improved robust watermarking model against intensive distortions. In Proceedings of the 2024 IEEE International Conference on Multimedia and Expo (ICME), Niagara Falls, ON, Canada, 15–19 July 2024. [Google Scholar] [CrossRef]
Huang, J.; Luo, T.; Li, L.; Yang, G.; Xu, H.; Chang, C.-C. ARWGAN: Attention-guided robust image watermarking model based on GAN. IEEE Trans. Instrum. Meas. 2023, 72, 5018417. [Google Scholar] [CrossRef]
Xiao, M.; Zheng, S.; Liu, C.; Lin, Z.; Liu, T.-Y. Invertible rescaling network and its extensions. Int. J. Comput. Vis. 2023, 131, 134–159. [Google Scholar] [CrossRef]
Lepcha, D.C.; Goyal, B.; Dogra, A.; Goyal, V. Image super-resolution: A comprehensive review, recent trends, challenges and applications. Inf. Fusion 2023, 91, 230–260. [Google Scholar] [CrossRef]
Dabas, P.; Khanna, K. A study on spatial and transform domain watermarking techniques. Int. J. Comput. Appl. 2013, 71, 38–41. [Google Scholar] [CrossRef]
Wang, D.; Yang, G.; Chen, J.; Ding, X. GAN-based adaptive cost learning for enhanced image steganography security. Expert Syst. Appl. 2024, 249, 123471. [Google Scholar] [CrossRef]
Agustsson, E.A.; Timofte, R. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 126–135. [Google Scholar]
Krizhevsky, A. Learning multiple layers of features from tiny images. Tech. Rep. 2009. Available online: https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 6 October 2025).
Everingham, M.M. The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results. PASCAL VOC2012 Workshop. 2012. Available online: http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html (accessed on 6 October 2025).
Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Li, F.-F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255. [Google Scholar]
Fdez-Vidal, X.R. Visual distinctness metric for coder performance evaluation. In Proceedings of the Visual Distinctness Metric Conference, 2014; Available online: http://decsai.ugr.es/cvg/CG/base.htm (accessed on 6 October 2025).
Bas, P.P. Break our steganographic system: The ins and outs of organizing BOSS. In International Workshop on Information Hiding; Springer: Berlin/Heidelberg, Germany, 2011; pp. 59–70. [Google Scholar]
Lin, T.-Y.; Maire, M.; Belongie, S.; Bourdev, L.; Girshick, R.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C.L.; Dollár, P. Microsoft COCO: Common objects in context. In Proceedings of the Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
Li, R.Z.; Liu, Q.L.; Liu, L.F. Novel image encryption algorithm based on improved logistic map. IET Image Process. 2019, 13, 125–134. [Google Scholar] [CrossRef]
Zeng, J.; Tan, S.; Liu, G.; Li, B.; Huang, J. WISERNet: Wider Separate-Then-Reunion Network for Steganalysis of Color Images. IEEE Trans. Inf. Forensics Secur. 2019, 14, 2735–2748. [Google Scholar] [CrossRef]
Xu, G.; Wu, H.-Z.; Shi, Y.-Q. Structural Design of Convolutional Neural Networks for Steganalysis. IEEE Signal Process. Lett. 2016, 23, 708–712. [Google Scholar] [CrossRef]
Boroumand, M.; Chen, M.; Fridrich, J. Deep Residual Network for Steganalysis of Digital Images. IEEE Trans. Inf. Forensics Secur. 2019, 14, 1181–1193. [Google Scholar] [CrossRef]
Deng, X.; Chen, B.; Luo, W.; Luo, D. Fast and Effective Global Covariance Pooling Network for Image Steganalysis. In Proceedings of the ACM Workshop on Information Hiding and Multimedia Security, New York, NY, USA, 3–5 July 2019; pp. 230–234. [Google Scholar]
Zhang, R.; Zhu, F.; Liu, J.; Liu, G. Depth-Wise Separable Convolutions and Multi-Level Pooling for an Efficient Spatial CNN-Based Steganalysis. IEEE Trans. Inf. Forensics Secur. 2020, 15, 1138–1150. [Google Scholar] [CrossRef]
Boehm, B. StegExpose—A Tool for Detecting LSB Steganography. arXiv 2014, arXiv:1410.6656v1. [Google Scholar] [CrossRef]
Mun, S.-M.S.; Nam, S.-H.N.; Jang, H.-U.J.; Kim, D.K.; Lee, H.-K.L. A robust blind watermarking using convolutional neural network. arXiv 2017, arXiv:1704.03248. [Google Scholar] [CrossRef]
Zhong, X.X.; Huang, P.-C.H.; Mastorakis, S.M.; Shih, F.Y. An automated and robust image watermarking scheme based on deep neural networks. IEEE Trans. Multimed. 2020, 23, 1951–1961. [Google Scholar] [CrossRef]
Luo, X.Y.; Zhan, R.H.; Chang, H.W.; Yang, F.Y.; Milanfar, P. Distortion agnostic deep watermarking. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 13548–13557. [Google Scholar]
Chen, B.B.; Wu, Y.W.; Coatrieux, G.C.; Chen, X.C.; Zheng, Y.H. JSNet: A simulation network of JPEG lossy compression and restoration for robust image watermarking against JPEG attack. Comput. Vis. Image Underst. 2020, 197, 103015. [Google Scholar] [CrossRef]
Li, Z.Z.; Zhang, X.S.; Yang, Y.I.; Gong, X.X. Mixed Order Attention Watermark Network Against JPEG Compression. In Proceedings of the 2024 21st International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China, 14–16 December 2024; pp. 1–7. [Google Scholar]
Baluja, S. Hiding Images within Images. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 1685–1697. [Google Scholar] [CrossRef] [PubMed]
Guan, Z.; Jing, J.; Deng, X.; Xu, M.; Jiang, L.; Zhang, Z.; Li, Y. DeepMIH: Deep Invertible Network for Multiple Image Hiding. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 372–390. [Google Scholar] [CrossRef] [PubMed]
Xu, Y.; Mou, C.; Hu, Y.; Xie, J.; Zhang, J. Robust Invertible Image Steganography. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 7865–7874. [Google Scholar]
Yu, J.; Zhang, X.; Xu, Y.; Zhang, J. CRoSS: Diffusion Model Makes Controllable, Robust and Secure Image Steganography. In Proceedings of the 37th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 10–16 December 2023. [Google Scholar] [CrossRef]
Xu, Y.; Zhang, X.; Meng, X.; Mou, C.; Zhang, J. Diffusion-Based Hierarchical Image Steganography. In Proceedings of the 2025 IEEE International Conference on Multimedia and Expo (ICME), Nantes, France, 30 June–4 July 2025. [Google Scholar] [CrossRef]

Figure 1. The architecture of JCIF.

Figure 2. The framework of the proposed encoder network.

Figure 3. The framework of the decoder network.

Figure 4. The framework of the adversary discriminator.

Figure 5. The grayscale visual effect of recovered images by our scheme under JPEG compression with different Quality factors.

Figure 6. The recovery performance of the proposed framework under different strength factors and quality factors.

Figure 7. The detailed comparison of different images (

I_{C}

,

I_{H}

,

I_{S}

,

I_{S}^{'}

) under strength factor

α = 0.8

and JPEG compression with

Q = 20

in the proposed framework.

Figure 8. The comparison of the secret image and the recovered secret image after changing the 1-bit key value.

Figure 9. Visual effects of various attacks on host image. (a) Host image. (b) Gaussian noise (

σ

= 45). (c) Salt & Pepper (15%). (d) Grid Crop (30%). (e) Cropping (30%). (f) Sharpening (Rad = 40).

Figure 10. Three different types of connections: (a) Successive connection; (b) Non-circular connection; (c) Skip connection.

Figure 11. The RGB visual effect of recovered images by our scheme under JPEG compression with different quality factors.

Figure 12. The ROC curves of StegExpose for detecting DIH methods.

Figure 13. The qualitative results of subjective evaluation of different DIH methods in terms of JPEG compression.

Table 1. The training parameters of ADPGAN network.

Parameter	Value	Description
(W, H)	(32, 32)	Size of training image
(M, N)	(8, 8)	Size of divided image part
(S, R)	(4, 4)	Size of reshaped binary data $S_{d a t a}$
Epoch	150	Training epoch number
LR	$10^{- 4}$	Learning rate
$λ_{1}$	0.4	Encoder’s loss function weights
$λ_{2}$	0.6	Decoder’s loss function weights
$λ_{3}$	$10^{- 4}$	Discriminator’s loss function weights

Table 2. The image quality of host images with different strength factors and their robustness after being attacked by different JPEG compression levels.

$α$	Image Quality	Robustness (BER)
$α$	(PSNR/SSIM)	Q = 90	Q = 70	Q = 50	Q = 30	Q = 20
0.2	$40.95 / 0.990$	$1.06 \times 10^{- 1}$	$1.17 \times 10^{- 1}$	$1.34 \times 10^{- 1}$	$1.67 \times 10^{- 1}$	$1.67 \times 10^{- 1}$
0.4	$35.28 / 0.935$	$2.73 \times 10^{- 2}$	$1.66 \times 10^{- 2}$	$2.63 \times 10^{- 2}$	$4.00 \times 10^{- 2}$	$4.88 \times 10^{- 2}$
0.6	$32.03 / 0.920$	0	0	0	0	$1.00 \times 10^{- 3}$
0.8	$29.76 / 0.922$	0	0	0	0	0
1.0	$28.18 / 0.898$	0	0	0	0	0

Table 3. The performance of different sieze of secret image under JPEG compression with strength factor

α = 0.8

and JPEG compression with

Q = 20

.

Table 3. The performance of different sieze of secret image under JPEG compression with strength factor

α = 0.8

and JPEG compression with

Q = 20

.

Image Size	PSNR		SSIM		BER
Image Size	$I_{H}$	$I_{S}^{'}$	$I_{H}$	$I_{S}^{'}$	$S_{data}^{'}$
$64 \times 64$	29.78	48.06	0.851	0.97	0
$128 \times 128$	29.66	42.90	0.847	0.981	0
$256 \times 256$	29.69	38.72	0.847	0.951	0

Table 4. The performance of the proposed framework using different datasets under strength factor

α = 0.8

and JPEG compression with

Q = 20

.

Table 4. The performance of the proposed framework using different datasets under strength factor

α = 0.8

and JPEG compression with

Q = 20

.

Datasets	PSNR		SSIM		BER
Datasets	$I_{H}$	$I_{S}^{'}$	$I_{H}$	$I_{S}^{'}$	$S_{data}^{'}$
COCO	29.76	41.88	0.851	0.976	0
BOSSBase (v1.01)	29.67	42.19	0.850	0.958	0
Image_Net	29.66	42.90	0.846	0.963	0

Table 5. The image quality of the

I_{S}^{'}

after being compressed with

Q = 20

and changing the 1-bit key value.

Table 5. The image quality of the

I_{S}^{'}

after being compressed with

Q = 20

and changing the 1-bit key value.

$Image$	Image Quality (PSNR/SSIM)
$Image$	$I_{D}^{'}$	$I_{S}^{'}$
1	8.561/−0.0136	6.228/0.0163
2	6.738/−0.0259	5.536/0.0160
3	8.659/−0.0307	6.177/0.0156
4	8.127/−0.0005	5.760/0.0125
5	7.896/0.0001	5.992/0.0224

Table 6. The robustness results of the proposed network against different attacks.

Attack	Robustness (% BER)
	Gaussian Noise ( $σ$ )			Salt & Pepper Noise (%)			Grid Crop (%)			Cropping (%)			Sharpening (Radius)
	40	45	50	10	15	20	30	40	50	10	20	30	30	40	50
ADPGAN	0	0	$9.76 \times 10^{- 4}$	$2.92 \times 10^{- 3}$	$3.90 \times 10^{- 3}$	$1.66 \times 10^{- 2}$	0	0	$3.90 \times 10^{- 3}$	0	$1.85 \times 10^{- 2}$	$5.76 \times 10^{- 2}$	0	0	0

Table 7. Comparison of different components in preprocessing.

SE Block	Cir_Conv	ConvBNReLU	A = 1.0
SE Block	Cir_Conv	ConvBNReLU	Image Quality (PSNR/SSIM)	BER (Q = 20)	BER (Q = 10)
✓ ¹	✓	× ²	28.00/0.840	0	0
×	✓	✓	29.08/0.854³	0	0
×	×	×	28.90/0.854	0	0
✓	×	✓	28.53/0.847	0	0
✓	✓	✓	28.19/0.841	0	0
SE Block	Cir_Conv	ConvBNReLU	A = 0.2
SE Block	Cir_Conv	ConvBNReLU	Image Quality (PSNR/SSIM)	BER (Q = 20)	BER (Q = 20)
✓	✓	×	40.76/0.997	0.260	0.321
×	✓	✓	41.89/0.987	0.201	0.304
×	×	×	41.58/0.981	0.213	0.288
✓	×	✓	41.39/0.980	0.244	0.314
✓	✓	✓	40.96/0.979	0.195	0.26

¹ The model incorporates this component. ² The model lacks this component. ³ Bold indicates the best performance in this column.

Table 8. Comparison of different components in CFFM.

Model	A = 0.2			A = 1.0
Model	Image Quality (PSNR/SSIM)	BER (Q = 20)	BER (Q = 10)	Image Quality (PSNR/SSIM)	BER (Q = 20)	BER (Q = 10)
$C F F M_{C i r}$	41.51/0.989	0.285	0.302	28.57/0.853	0	0
$C F F M_{S u c}$	39.38/0.965	0.264	0.326	27.59/0.843	0.004	0.007
$C F F M_{S k i p}$	40.05/0.971	0.230	0.305	27.28/0.834	0	0.002
ADPGAN	40.96/0.976	0.195	0.260	28.19/0.849	0	0

Table 9. The comparison of CAM and Attack Layer.

Model	A = 0.2			A = 1.0
Model	Image Quality (PSNR/SSIM)	BER (Q = 20)	BER (Q = 10)	Image Quality (PSNR/SSIM)	BER (Q = 20)	BER (Q = 10)
ADPGAN	40.96/0.985	0.195	0.260	28.19/0.849	0	0
$A D P G A N_{C F F M}$	38.25/0.980	0.275	0.335	26.70/0.820	0.176	0.245
$A D P G A N_{A M}$	40.25/0.982	0.226	0.292	28.70/0.838	0	0.005
$A D P G A N_{A t t a c k}$	51.06/0.999	0.433	0.449	47.29/0.995	0.292	0.364

Table 10. The performance of the proposed framework under strength factor

α = 1

.

Table 10. The performance of the proposed framework under strength factor

α = 1

.

Quality Factor	PSNR		SSIM		BER
Quality Factor	$I_{H}$	$I_{S}^{'}$	$I_{H}$	$I_{S}^{'}$	$S_{data}^{'}$
90	34.99	42.41	0.919	0.990	0
80	35.04	42.80	0.923	0.993	0
70	35.33	42.73	0.931	0.991	0
60	34.71	42.50	0.899	0.992	$1.855 \times 10^{- 2}$
50	35.31	42.19	0.934	0.989	$6.347 \times 10^{- 2}$
40	35.16	41.50	0.929	0.992	$1.269 \times 10^{- 1}$

Table 11. The security performance (IN %) of DIH methods against known steganalyzers.

DIH	XuNet []			SRNet []			CovPoolNet []			ZhuNet []
Methods	$R_{md}$	$R_{fa}$	$R_{err}$	$R_{md}$	$R_{fa}$	$R_{err}$	$R_{md}$	$R_{fa}$	$R_{err}$	$R_{md}$	$R_{fa}$	$R_{err}$
MSM-DIH []	76.6	1.00	38.8	36.6	52.4	44.5	8.00	5.00	6.50	12.8	9.00	10.9
HiNet []	13.8	14.2	14.0	0.40	0.00	0.20	3.30	0.20	1.75	0.00	8.20	4.10
ISN []	10.4	19.4	14.9	0.40	0.20	0.30	3.80	4.00	3.90	7.20	43.4	25.3
$J C I F_{R G B}$	67.8	26.9	47.4	89.1	7.30	48.2	51.1	23.1	37.1	0.00	49.1	24.1

Table 12. Qualitative comparison of steganography restoration under 30, 50, 70 and 90 QFs.

Method	30	50	70	90
Mun []	37.8%	36.1%	34.4%	32.2%
HiDDeN []	34.6%	32.6%	31.3%	30.9%
Zhong []	9.3%	6.8%	4.5%	2.3%
Luo []	23.9%	17.6%	11.2%	4.3%
CRWNet []	25.7%	24.5%	23.8%	22.6%
ReDMark []	9.8%	6.7%	2.3%	1.3%
MOANet []	6.9%	4.8%	2.2%	1.3%
ADPGAN	0	0	0	0

Table 13. PSNR (dB) results of the proposed

J C I F_{R G B}

and other methods under different levels of degradations. The proposed

J C I F_{R G B}

can achieve superior data fidelity in most settings. The best results are red and the second-best results are blue.

Table 13. PSNR (dB) results of the proposed

J C I F_{R G B}

and other methods under different levels of degradations. The proposed

J C I F_{R G B}

can achieve superior data fidelity in most settings. The best results are red and the second-best results are blue.

Methods	Clean	Gaussian Noise ( $σ$ )			Gaussian Denoiser ( $σ$ ) []			JPEG Compression (Q)			JPEG Enhancer (Q) []
Methods	Clean	10	20	30	10	20	30	20	40	80	20	40	80
Baluja []	34.24	10.30	7.54	6.92	7.97	6.10	5.49	6.59	8.33	11.92	5.21	6.98	9.88
ISN []	41.83	12.75	10.98	9.93	11.94	9.44	6.65	7.15	9.69	13.44	5.88	8.08	11.63
DeepMIH []	42.98	12.91	11.54	10.23	11.87	9.32	6.87	7.03	9.78	13.23	5.59	8.21	11.88
RIIS []	43.78	26.03	18.89	15.85	20.89	15.97	13.92	22.03	25.41	26.02	13.88	16.74	20.13
CRoSS []	23.79	21.89	20.19	18.77	21.39	21.24	21.02	21.74	22.74	23.51	20.60	21.222	21.19
HIS []	28.39	23.49	21.88	20.02	21.73	22.41	21.98	23.21	26.11	26.23	22.42	23.23	23.15
Ours	40.26	40.25	40.26	40.24	39.47	39.28	39.02	39.70	39.77	40.24	38.12	38.24	38.55

Table 14. The computationally expensive of the proposed framework.

Framework	CPU	GPU	RAM (G)	Training Time (H)	Test Time (S)	Param (M)	Memory (M)
JCIF	Intel Core i9	GeForce RTX 4090	64	171 (99)	5.04	5.24 (2.02)	20.04 (7.77)
$J C I F_{R G B}$	Intel Core i9	GeForce RTX 4090	64	103 (65)	5.66	22.82 (18.46)	87.40 (70.79)
JCIF	Intel Core i7	−	16	−	8.32	5.24 (2.02)	20.04 (7.77)
$J C I F_{R G B}$	Intel Core i7	−	16	−	9.48	22.82 (18.46)	87.40 (70.79)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

ADPGAN: Anti-Compression Attention-Based Diffusion Pattern Steganography Model Using GAN

Abstract

1. Introduction

3. Proposed Method

3.1. Image Sampling Network

3.2. Anti-Compression Attention-Based Diffusion Pattern Steganography Model Using GAN

3.2.1. Encoder Network

3.2.2. Attack Network

3.2.3. Decoder Network

3.2.4. Adversary Discriminator

3.3. Training Objective

4. Experimental Results

4.1. Experimental Setup and Evaluation Metrics

4.2. Framework Invisibility and Robustness Evaluation

4.3. Framework Security Evaluation

4.4. Ablation Study

4.5. RGB Visualization

4.6. Steganographic Analysis

4.7. Comparison with State-of-the-Art

5. Discussion

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

References

Article Metrics

Citations

Article Access Statistics