Article
Peer-Review Record

ADPGAN: Anti-Compression Attention-Based Diffusion Pattern Steganography Model Using GAN

Electronics 2025, 14(22), 4426; https://doi.org/10.3390/electronics14224426
by Zhen-Qiang Chen 1, Yu-Hang Huang 1, Xin-Yuan Chen 2 and Sio-Long Lo 1,*
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 16 October 2025 / Revised: 3 November 2025 / Accepted: 7 November 2025 / Published: 13 November 2025
(This article belongs to the Section Electronic Multimedia)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper proposes JCIF, a JPEG compression-resistant image framework built on robust steganography and an invertible scaling network, which embeds large secret images into host images to protect security and privacy. The framework comprises the proposed anti-compression attention-based diffusion pattern steganography model using GAN (ADPGAN), together with a transformation layer and an image scaling model.

At present, deep learning-based steganography systems are unable to balance high capacity, robustness, and imperceptibility for the secret material used in practice. At the same time, the recovered hidden images are often of poor visual quality, which reduces steganographic effectiveness. In your research, you present a JPEG compression-resistant image framework based on an invertible scaling network and robust steganography (JCIF), which hides the secret image within the cover image and demonstrates high distortion resistance.

According to your research, ADPGAN overcomes these problems by using dense connections to combine shallow and deep image features with the secret data, making it more resistant to JPEG compression. Data is embedded in discrete locations, increasing the host image's imperceptibility. Additionally, by combining the transformation layer and the image scaling model, JCIF satisfies both the required hiding capacity and the security of the secret image. Your experimental data show that JCIF hides images effectively while preserving their quality during transmission, and it achieves better visual quality at the receiver than current methods, even under some damage.

Ablation tests confirm the efficacy of ADPGAN, which enhances the JPEG compression resistance of JCIF. Thanks to JCIF's strong steganographic capability and high image quality, users can integrate hidden images into cover images to preserve visual fidelity and ensure anonymity in secure applications such as social networking platforms and cloud storage. Overall, this paper comprehensively covers all aspects and is well written.

Author Response

We sincerely thank you for taking the time to provide such valuable feedback on our manuscript. Your positive evaluation means a great deal to us.

Reviewer 2 Report

Comments and Suggestions for Authors

This is an interesting paper, though there are several areas where it could be further improved.

1. The introduction should more clearly outline the major challenges in the field and explicitly state the paper's original contributions in addressing them. It would also help to better position the proposed framework within the broader context of JPEG compression-resistant steganography.

2. The framework mainly integrates existing components (e.g., DenseNet, SE block, and GAN). While the engineering design is solid, the novelty is somewhat incremental. Please highlight the unique contributions and technical insights more clearly in both the Related Work and Method sections, explaining how this framework differs from previous combinations.

3. The description of the embedding and decoding modules is relatively brief. It would be beneficial to add more mathematical formulations, architecture diagrams, or flowcharts to help readers better understand the principles and workflow of the proposed JCIF and JCIFRGB frameworks.

4. The Method section would benefit from a clearer explanation of the motivation behind each major component — for example, why specific network structures (e.g., DenseNet or circular convolution) were chosen and how they contribute to compression resistance and reversibility.

5. The Experiment section should include comparisons with more recent and representative baselines in deep steganography and anti-compression steganography. This would strengthen the empirical validation and demonstrate the effectiveness of the proposed approach relative to state-of-the-art methods.

6. The Ablation Study could be further enhanced by quantitatively analyzing the contribution of each module to the final performance, to better justify design choices.

7. The Conclusion section would benefit from adding a discussion of limitations and possible future directions.

Author Response

Comments 1: The introduction should more clearly outline the major challenges in the field and explicitly state the paper's original contributions in addressing them. It would also help to better position the proposed framework within the broader context of JPEG compression-resistant steganography.

 

Response 1:   We thank the reviewer for pointing this out. To provide a clearer overview of the major challenges in this field within the introduction, we outline the encountered issues in the first paragraph: 1) Confidentiality and imperceptibility. Due to illegal activities such as piracy, identity theft, malicious tampering, unauthorized access and distribution, and online fraud, rights holders have suffered significant losses, exacerbated by the public and insecure nature of SNS. 2) Robustness and high fidelity. Due to transmission distortions, users may be unable to obtain or download images with visual quality close to the original. At the end of the introduction, we present our contributions: 1) To overcome the limitations of existing deep-learning-based steganography models, which extract only limited data features, we propose ADPGAN, which deploys a circular feature fusion module (CFFM) to learn both shallow and deep image features for fusion with secret data. These features are reused in the transform domain through dense connectivity to enhance robustness. 2) To enhance image features and reduce visual degradation caused by embedding, we deploy a circular attention module (CAM) that guides data embedding into inconspicuous, textured regions by computing a probability distribution across image feature channels. The integration of CFFM and CAM enables ADPGAN to achieve high imperceptibility and robustness, as confirmed by ablation studies. 3) Based on ADPGAN, we propose a novel JPEG compression-resistant image framework that ensures high-fidelity images by making the degradation in the revealed image primarily result from sampling rather than JPEG compression. The framework downsamples the secret image into secret data and embeds it into the cover image via ADPGAN. 
Experimental results show that, even with JPEG compression at a quality factor of 20, the framework maintains a 0 BER while achieving a recovered-image PSNR of 39.70 dB, demonstrating its high fidelity in revealing the secret image under attack.

 

Comments 2: The framework mainly integrates existing components (e.g., DenseNet, SE block, and GAN). While the engineering design is solid, the novelty is somewhat incremental. Please highlight the unique contributions and technical insights more clearly in both the Related Work and Method sections, explaining how this framework differs from previous combinations.

Response 2: We agree and have revised the manuscript accordingly. We acknowledge that the article did not clearly explain its innovation; we have now highlighted its unique contributions and technical insights in the Introduction and Related Work sections. To better address the loss of visual quality in embedded images under intense noise, we compute an attention mask in the transform domain, rather than the spatial domain, to embed data into inconspicuous, texture-rich regions, thereby enhancing the imperceptibility and robustness of ADPGAN. The inclusion of circular convolution enables DenseNet to better reuse image features, improving image security. Additionally, we propose a novel JPEG compression-resistant image framework that integrates an image downsampling network with ADPGAN to mitigate JPEG-compression-induced image degradation. Unlike direct full-size embedding, the framework downsamples the secret image into secret data before embedding it into the cover image via ADPGAN. Benefiting from ADPGAN's high robustness, the quality degradation of the recovered secret image in the proposed framework primarily results from information loss during trainable sampling. Compared to combinations where the loss mainly stems from JPEG compression, the proposed framework achieves superior-quality secret images at the receiver.

 

Comments 3: The description of the embedding and decoding modules is relatively brief. It would be beneficial to add more mathematical formulations, architecture diagrams, or flowcharts to help readers better understand the principles and workflow of the proposed JCIF and JCIFRGB frameworks.

Response 3: We agree and have updated the manuscript. We have revised the flowchart of our framework to help readers better understand its structure. Additionally, to facilitate comprehension of the ADPGAN workflow, we have incorporated mathematical formulas for each step.

Comments 4:  The Method section would benefit from a clearer explanation of the motivation behind each major component — for example, why specific network structures (e.g., DenseNet or circular convolution) were chosen and how they contribute to compression resistance and reversibility.

Response 4: We agree and have updated the manuscript; we thank the reviewer for recognizing the significance of this issue. Since our objective is to reveal high-definition secret images at the receiver, ADPGAN requires high robustness against JPEG compression. The use of DenseNet enables the reuse and relearning of earlier image features (both shallow and deep), guiding the encoder to train distinct steganographic patterns that enhance robustness. Circular convolution distributes data evenly across all neurons and avoids the degradation that zero-padding inflicts on the watermark. Data diffusion allows damaged pixel blocks to recover data from adjacent blocks, improving the imperceptibility and robustness of ADPGAN. In the experimental section, we validate the effectiveness of DenseNet and circular convolution through ablation studies. Additionally, to avoid confusion, we have replaced the reversible component with an image downsampling module. The model's strong robustness enables the extraction network to retrieve accurate secret data and allows the image downsampling network to recover high-quality secret images.

Comments 5: The Experiment section should include comparisons with more recent and representative baselines in deep steganography and anti-compression steganography. This would strengthen the empirical validation and demonstrate the effectiveness of the proposed approach relative to state-of-the-art methods.

Response 5: We agree and have revised the manuscript. In the revision, we included a comparison of ADPGAN with other schemes under intense compression attacks to demonstrate that the proposed ADPGAN effectively resists JPEG attacks. Additionally, we incorporated the security performance of DIH methods against known steganalyzers and the ROC curves of StegExpose for detecting DIH methods, showcasing the excellent imperceptibility of the proposed framework.

Comments 6:  The Ablation Study could be further enhanced by quantitatively analyzing the contribution of each module to the final performance, to better justify design choices.

Response 6: We agree and have updated the manuscript. We have conducted a quantitative analysis targeting four modules: CFFM, CAM, the preprocessing module, and the attack layer. 1) The preprocessing component (Conv-BN-ReLU, circular convolution, and SE block) serves as the initial element of the encoder, tasked with extracting feature maps whose embedded regions are minimally impacted by JPEG compression. Ablation experiments on the preprocessing block confirm that it encodes the cover image and secret data into feature maps and data attributes robust to JPEG compression. 2) To assess the efficacy of CFFM, three comparable models were employed, wherein the dense connections in CFFM were substituted with successive connections (CFFM-Suc) and skip connections (CFFM-Skip), and circular convolutions were replaced with standard convolutions (CFFM-Cir). Ablation experiments on CFFM demonstrate that its feature reuse significantly enhances the efficacy of secret data extraction. 3) We conducted ablation experiments on the CAM module, removing the CAM module and its circular convolution in turn. The results confirm that the CAM module in ADPGAN improves robustness by sharing data among adjacent blocks. 4) We performed ablation experiments with and without the attack layer. The results indicate that, while the absence of the attack layer yields images with better imperceptibility, their robustness is insufficient to withstand JPEG compression.

Comments 7: The Conclusion section would benefit from adding a discussion of limitations and possible future directions.

Response 7: We have made this change; the added text reads as follows. Pursuing high robustness in ADPGAN results in limited embedding capacity and high computational cost, leading to an oversized framework. In future work, while ensuring robustness and imperceptibility, we will focus on increasing the model's embedding capacity. We will also explore the feasibility of integrating the two networks for joint training, specifically downsampling the secret image before embedding it, and compare its performance with direct embedding without preprocessing.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This paper presents JCIF, a framework for robust steganography. It uses a lossy scaling network to shrink a secret image and then embeds this small payload into a cover image using a new GAN model called ADPGAN, which works in the DCT domain.

The authors have clearly put effort into the ADPGAN network, which looks interesting. However, the overall framework and, more importantly, the experimental comparisons are problematic. The claims about high capacity and robustness aren't fairly supported. I'd recommend a Major Revision to fix these issues.

1). The SOTA comparison in Table 12 really needs a second look. You're comparing your method, which hides a tiny payload (about 0.03 bpp), with full-capacity steganography methods. This comparison isn't really fair to either side, is it?
2). I noticed the paper claims "high embedding capacity" several times. But the 0.03 bpp payload is actually very low. Perhaps it's better to re-brand this as a low-capacity, high-robustness method? That's also a valid and interesting contribution.
3). The name "Invertible Rescaling Network" is a bit confusing. Your Figure 1 and text show that high-frequency data is lost and replaced with noise. To avoid misleading the reader, maybe just call it a "lossy downsampling" module?
4). A key point to clarify: because the rescaling is lossy, the secret image can never be perfectly recovered. It would be great if you could add a baseline PSNR for the secret image just from this down/up-sampling, without any steganography.
5). The robustness and anti-steganalysis results look good. But, to be fair, any method hiding such a tiny payload would be hard to detect and robust. It's important to acknowledge this context; otherwise, the results seem a bit too good.
6). To really show the strength of your method, the SOTA comparison needs to be redone. I'd suggest finding other low-capacity robust watermarking or steganography papers to compare against. That would be a much more convincing benchmark.
7). I was a bit unclear on the payload from Section 3.2. When you XOR the downsampled image ID, is this an 8-bit grayscale image or a 1-bit binary stream? This detail is pretty important for the bpp calculation.
8). The key test in 4.3 is a good start. But it just shows you're using a cipher (like the XOR). To make the security part stronger, could you discuss resistance to other attacks, not just that the key works?
9). I think the ADPGAN part, with its circular convolutions and DCT attention, is actually the strongest part of this paper. Have you considered focusing the paper more on ADPGAN as a robust watermarking technique? The current JCIF framework is a bit problematic.
10). Table 3 was a bit confusing. It lists JCIF64, JCIF128, etc. Does this mean the original secret image size? This seems to contradict the setup in 4.1. A little clarification here would help a lot.
11). To be clearer to the reader, I'd recommend stating in the abstract and intro that the rescaling is lossy. Using the word "invertible" right now is a bit confusing since information is clearly lost.

Author Response

Comments 1: The SOTA comparison in Table 12 really needs a second look. You're comparing your method, which hides a tiny payload (about 0.03 bpp), with full-capacity steganography methods. This comparison isn't really fair to either side, is it?

 

Response 1: We agree and have re-evaluated the state-of-the-art (SOTA) comparison. To this end, we now compare the bit error rate (BER) of ADPGAN with other schemes after JPEG compression, highlighting ADPGAN's robustness and its contribution to the proposed framework. We acknowledge that the comparison against full-capacity steganography methods is not entirely fair, although both approaches aim to enable the receiver to obtain high-fidelity secret images after attacks. Full-capacity steganography methods embed the secret image directly without preprocessing, whereas we first downsample the secret image before embedding it. This methodological difference means that distortion in full-capacity steganography primarily stems from JPEG compression, whereas our recovery loss arises mainly from information loss during downsampling, leading to better high-fidelity performance. This comparison also inspires us to explore the feasibility of jointly training the two networks in future work: we will downsample the secret image before embedding it and compare its performance with direct embedding without preprocessing.

Comments 2: I noticed the paper claims "high embedding capacity" several times. But the 0.03 bpp payload is actually very low. Perhaps it's better to re-brand this as a low-capacity, high-robustness method? That's also a valid and interesting contribution.

Response 2: We thank the reviewer for pointing this out and have revised the manuscript. We acknowledge that the claim of "high embedding capacity" was inappropriate; after revision, ADPGAN is redefined as a steganography model focused on high imperceptibility and robustness.

Comments 3:  The name "Invertible Rescaling Network" is a bit confusing. Your Figure 1 and text show that high-frequency data is lost and replaced with noise. To avoid misleading the reader, maybe just call it a "lossy downsampling" module?

Response 3: We agree and have updated the manuscript. We acknowledge that the term "invertible rescaling network" was confusing. Following the reviewer's suggestion, we have integrated the two modules into a single network, renamed the image sampling network, as both upsampling and downsampling are performed by the same network. Additionally, we have noted in the article that the process is lossy.

Comments 4:    A key point to clarify: because the rescaling is lossy, the secret image can never be perfectly recovered. It would be great if you could add a baseline PSNR for the secret image just from this down/up-sampling, without any steganography.

Response 4: We agree and have revised the article to make this clearer to readers. In the Method section, we clarify that information loss occurs when the secret image is downsampled, and that ADPGAN embeds the output of the image downsampling network into the cover image. Benefiting from ADPGAN's high robustness, the quality degradation of the secret image recovered at the receiver primarily stems from lossy sampling within the framework. Compared to schemes where quality degradation mainly arises from JPEG compression, the proposed framework ensures the recovery of high-fidelity secret images at the receiver.

Comments 5: The robustness and anti-steganalysis results look good. But, to be fair, any method hiding such a tiny payload would be hard to detect and robust. It's important to acknowledge this context; otherwise, the results seem a bit too good.

Response 5: We thank the reviewer for pointing this out and have revised the manuscript. We have disclosed the embedding capacity in the Experimental Setup and Evaluation Metrics section, and in the discussion we explore the limitations of ADPGAN, including its limited embedding capacity. In future work, we will investigate the feasibility of enhancing embedding capacity while maintaining high imperceptibility and robustness.

Comments 6: To really show the strength of your method, the SOTA comparison needs to be redone. I'd suggest finding other low-capacity robust watermarking or steganography papers to compare against. That would be a much more convincing benchmark.

Response 6:    We agree and have updated. In the revision, we included a comparison of ADPGAN with other schemes under intense compression attacks to demonstrate that the proposed ADPGAN effectively resists JPEG attacks.

Comments 7: I was a bit unclear on the payload from Section 3.2. When you XOR the downsampled image ID, is this an 8-bit grayscale image or a 1-bit binary stream? This detail is pretty important for the bpp calculation.

Response 7: We thank the reviewer for the careful and professional review. After downsampling the secret image, we process the 8-bit downsampled image through an XOR operation to obtain a 1-bit binary stream. The effective payload is provided in the Experimental Setup and Evaluation Metrics section; since we set the secret data to 4 × 4 and the cover image to 32 × 32 during ADPGAN training, the bits per pixel (bpp) is 4²/32² ≈ 0.015. In the experiments, we use a 512 × 512 cover image, resulting in an effective payload of 409 bits. Since the proposed model aims for high robustness and imperceptibility, ensuring that the receiver of the proposed framework obtains a high-fidelity secret image, the embedding capacity is limited. In future work, we will focus on increasing ADPGAN's embedding capacity while maintaining high imperceptibility and robustness.
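The bpp arithmetic in the response above can be sketched as follows. This is a minimal illustration, assuming one 1-bit secret-data element per position of the 4 × 4 block; the block and cover sizes are those stated in the response, and the function name is our own.

```python
def bits_per_pixel(data_side: int, cover_side: int, bits_per_element: int = 1) -> float:
    """Embedding rate: secret bits carried per cover pixel for one block."""
    secret_bits = data_side ** 2 * bits_per_element   # e.g. 4*4*1 = 16 bits
    cover_pixels = cover_side ** 2                    # e.g. 32*32 = 1024 pixels
    return secret_bits / cover_pixels

# Training configuration from the response: 4x4 secret data, 32x32 cover.
rate = bits_per_pixel(4, 32)
print(rate)  # 0.015625, reported as 0.015 in the manuscript
```

Under these assumptions the rate is exactly 16/1024 = 0.015625 bpp, matching the rounded 0.015 figure quoted in the response.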

Comments 8: The key test in 4.3 is a good start. But it just shows you're using a cipher (like the XOR). To make the security part stronger, could you discuss resistance to other attacks, not just that the key works?

Response 8: We thank the reviewer for pointing this out and have revised the manuscript. We have incorporated an evaluation of security against brute-force attacks in Section 4.3, Framework Security Evaluation. The key space is the total number of possible keys in a cryptographic system and is a critical determinant of its security, particularly against brute-force attacks. In a brute-force attack, an adversary tries every possible key until the correct one is found to decrypt the data; a larger key space forces the attacker to try more keys, thereby increasing security. To ensure a high level of security, the key space should be at least 2^128. The proposed framework employs a 1024-bit key, yielding a key space of 2^1024 and effectively resisting brute-force attacks.
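As a rough illustration of why a 2^1024 key space defeats exhaustive search, the sketch below estimates the expected search time under a hypothetical, deliberately generous rate of 10^12 key trials per second; the attacker speed is our assumption, not a figure from the manuscript.

```python
KEY_BITS = 1024
key_space = 2 ** KEY_BITS            # total candidate keys: 2^1024

# Hypothetical attacker speed: one trillion key trials per second.
trials_per_second = 10 ** 12
seconds_per_year = 365 * 24 * 3600

# On average, half the key space must be searched before success.
expected_years = key_space // 2 // trials_per_second // seconds_per_year
print(len(str(expected_years)))      # the year count has hundreds of digits
```

Even with this optimistic trial rate, the expected search time dwarfs any physical timescale, which is the substance of the brute-force argument in the revised Section 4.3.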

Comments 9: I think the ADPGAN part, with its circular convolutions and DCT attention, is actually the strongest part of this paper. Have you considered focusing the paper more on ADPGAN as a robust watermarking technique? The current JCIF framework is a bit problematic.

Response 9: We agree. Since the core of the framework stems from ADPGAN's high imperceptibility and robustness, following the reviewer's valuable suggestion we have shifted the paper's focus more toward ADPGAN. Additionally, we have revised the article to provide readers with a clearer understanding of the JCIF framework's process and performance.

Comments 10: Table 3 was a bit confusing. It lists JCIF64, JCIF128, etc. Does this mean the original secret image size? This seems to contradict the setup in 4.1. A little clarification here would help a lot.

Response 10: We thank the reviewer for this comment. We acknowledge that Table 3 was described in an unclear and confusing manner, and we have revised its description. To demonstrate the embedding capacity of the proposed framework, we adjust only the parameters of the image downsampling network to generate models for sampling secret images of sizes 64 × 64, 128 × 128, and 256 × 256, which are then integrated into the framework. Table 3 presents the performance of these models under compression with α = 0.8 and Q = 20. Evidently, as the resolution of the secret image increases, its recovered quality degrades; however, the framework still maintains excellent recovered image quality. This is because larger image sizes entail more downsampling network parameters and more pixels to process, reducing the model's ability to minimize information loss. In summary, incorporating the image sampling network ensures the framework maintains embedding capacity, allowing ADPGAN to focus on robustness and imperceptibility.

Comments 11: To be clearer to the reader, I'd recommend stating in the abstract and intro that the rescaling is lossy. Using the word "invertible" right now is a bit confusing since information is clearly lost.

Response 11: We agree and have updated the manuscript. We acknowledge that the use of "invertible" was confusing, although the term originates from invertible neural networks, where it refers to the mutual invertibility of the forward and backward propagation processes. In the revision, we have removed "invertible" from the abstract and introduction and replaced it with "sampling" to indicate that the process is lossy. Leveraging ADPGAN's robustness, the proposed framework improves image quality by ensuring that the degradation of the reconstructed image primarily stems from sampling rather than JPEG compression. Furthermore, we demonstrate the feasibility of the proposed method by reporting the quality of the recovered secret images after attacks in the experimental section.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

Thanks for addressing all concerns. 

Reviewer 3 Report

Comments and Suggestions for Authors

Good revisions. All my concerns are addressed. I think this manuscript is acceptable now.
