Peer-Review Record

JSN: Design and Analysis of JPEG Steganography Network

Electronics 2024, 13(23), 4821; https://doi.org/10.3390/electronics13234821
by Po-Chyi Su 1,*, Yi-Han Cheng 1 and Tien-Ying Kuo 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 23 October 2024 / Revised: 2 December 2024 / Accepted: 3 December 2024 / Published: 6 December 2024
(This article belongs to the Special Issue Image and Video Coding Technology)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper "JSN: Design and Analysis of JPEG Steganography Network" introduces a novel approach for JPEG-compliant image steganography. The authors propose a method called JSN, which integrates a deep invertible neural network (INN) with Discrete Cosine Transform (DCT) and quantization steps. This model aims to retain the quality of hidden images under lossy JPEG compression. While the approach is technically sound and has potential, there are several areas where the manuscript could be improved for clarity, rigor, and completeness.

1. The initial robustness testing with specific JPEG quality settings is well done and shows a solid grasp of the basics. However, expanding the robustness tests to include a wider range of JPEG quality factors, such as 50, 70, and 100, would provide additional insights. Furthermore, adding standard image transformations, like Gaussian noise, scaling, and rotation, would offer a clearer picture of how JSN performs under real-world conditions, making the robustness claims even stronger.

 

 

2. The authors have done well in terms of computational factors, particularly with their choice of dense blocks to prioritize quality. However, including a more detailed analysis of computational demands would be beneficial, especially for readers interested in deploying JSN on devices with limited resources. Metrics such as processing time, memory usage, and resource comparisons would strengthen the proposed model's practicality.

 

 

3. Using metrics like PSNR and SSIM to assess the image quality is a robust approach. However, evaluating the proposed methodology (JSN) on security-specific metrics such as Mean Structural Similarity (MSSIM), Bit Error Rate (BER), or Peak Signal-to-Interference Ratio (PSIR) would provide better insights into its performance.

 

4. In section 4.2, adding a detailed table summarizing the training, validation, and test sets of the datasets would help readers understand the characteristics of the datasets employed in this study. This table should detail image resolutions and sample sizes across different datasets, providing an overview that strengthens the paper's transparency.

 

5. While the paper is generally well-written, a few minor grammatical issues and typos are present throughout. A thorough proofreading pass would help polish the writing further and enhance readability.

Author Response

The paper “JSN: Design and Analysis of JPEG Steganography Network” introduces a novel approach for JPEG-compliant image steganography. The authors propose a method called JSN, which integrates a deep invertible neural network (INN) with Discrete Cosine Transform (DCT) and quantization steps. This model aims to retain the quality of hidden images under lossy JPEG compression. While the approach is technically sound and has potential, there are several areas where the manuscript could be improved for clarity, rigor, and completeness.

Comment 1: The initial robustness testing with specific JPEG quality settings is well done and shows a solid grasp of the basics. However, expanding the robustness tests to include a wider range of JPEG quality factors, such as 50, 70, and 100, would provide additional insights. Furthermore, adding standard image transformations, like Gaussian noise, scaling, and rotation, would offer a clearer picture of how JSN performs under real-world conditions, making the robustness claims even stronger.

Response 1: Thank you for the valuable suggestion. We have added the results for QF=70 in Table 1, Table 2, and Figure 5. The results for QF=50 are now shown in Figure 6 to demonstrate that JSN can withstand moderate JPEG compression, providing more information than Figure 5. The discussion on scaling and rotation attacks has been incorporated in Section 4.6 (Geometrical Attacks). We acknowledge that the proposed scheme is sensitive to synchronization issues. We have added a discussion on noise addition in Section 4.9, where Figure 15 illustrates the effects of different Gaussian noise levels. While we do not claim robustness in the context of large-volume covert communication provided by JSN, our primary goal remains the generation of JPEG-compliant stego images.
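For readers who want to try a noise test of the kind mentioned in this response, the following is a minimal sketch and not the authors' code. It assumes 8-bit pixel values held in a flat list; the function name `add_gaussian_noise` and the `seed` parameter are hypothetical and chosen only for illustration.

```python
import random

def add_gaussian_noise(pixels, sigma, seed=0):
    """Add zero-mean Gaussian noise to 8-bit pixel values and clip to [0, 255].

    Hypothetical helper: the record does not describe the authors' exact procedure.
    """
    rng = random.Random(seed)  # fixed seed so that a test run is repeatable
    noisy = []
    for p in pixels:
        value = round(p + rng.gauss(0.0, sigma))
        noisy.append(min(255, max(0, value)))  # keep the result a valid 8-bit value
    return noisy

# Example: perturb a few pixel values with a noise level of sigma = 5.
noisy_pixels = add_gaussian_noise([0, 64, 128, 255], sigma=5.0)
```

Sweeping `sigma` over several values and extracting the secret image each time would give a curve comparable in spirit to the different noise levels the response refers to.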

Comment 2: The authors have done well in terms of computational factors, particularly with their choice of dense blocks to prioritize quality. However, including a more detailed analysis of computational demands would be beneficial, especially for readers interested in deploying JSN on devices with limited resources. Metrics such as processing time, memory usage, and resource comparisons would strengthen the proposed model's practicality.

Response 2: Thank you for pointing out the issue. We provide a brief description of the execution time on Page 20. Embedding or extracting a 512x512 color secret image takes approximately 3 to 4 seconds on a 4090 GPU (similar performance is observed on a 3090 GPU). The code is not yet fully optimized, and the execution speed can be improved with more refined optimizations. We plan to release the code publicly so that interested users can run and test it.

Comment 3: Using metrics like PSNR and SSIM to assess the image quality is a robust approach. However, evaluating the proposed methodology (JSN) on security-specific metrics such as Mean Structural Similarity (MSSIM), Bit Error Rate (BER), or Peak Signal-to-Interference Ratio (PSIR) would provide better insights into its performance.

Response 3: Thank you for the insightful suggestion. In our work, we primarily used PSNR to assess performance and included SSIM as an additional reference, which yields similar results. The motivation behind JSN stems from the observation that existing deep learning-based steganography methods tend to fail when even mild JPEG compression is applied; the extremely low PSNR in such cases indicates these failures. Our work aims to keep the PSNR of the extracted secret image above 25 dB, which we acknowledge may not be as high as expected, and the quality of the extracted secret image is affected when the stego image is compressed with a lower QF. Given the lossy nature of JPEG compression, we think PSNR remains a reliable indicator for evaluating performance, and it allows comparisons with existing works such as HiNet, ISN, and PRIS, which also used PSNR, likely for simplicity.
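As a reminder of what the PSNR figures in this response measure, here is a minimal sketch of the standard definition for 8-bit images. It assumes the two images are given as flat lists of pixel values of equal length; the function name `psnr` and the `peak` parameter are illustrative and not taken from the manuscript.

```python
import math

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel positions.
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Example: a uniform error of 10 gray levels on every pixel.
value_db = psnr([100, 100, 100, 100], [110, 110, 110, 110])
```

Under this definition, a uniform error of 10 gray levels corresponds to roughly 28 dB, which gives a sense of scale for the 25 dB threshold mentioned in the response.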

Comment 4: In section 4.2, adding a detailed table summarizing the training, validation, and test sets of the datasets would help readers understand the characteristics of the datasets employed in this study. This table should detail image resolutions and sample sizes across different datasets, providing an overview that strengthens the paper's transparency.

Response 4: Thank you for the suggestion. We have added more details about the data in Sec. 4.2. The DIV2K dataset consists of 1000 images, offering a variety of combinations of cover and secret images. Following the methodology of HiNet, we randomly selected 800 images and created 400 pairs, resulting in 800 stego images. For the ImageNet and COCO datasets, which contain a larger number of images, we again randomly selected cover and secret images to generate 2000 stego images from each dataset. The selection process was random to ensure that the results reflect the average performance. To facilitate batch processing, we cropped the images to a consistent size, i.e., 224x224 for training and 512x512 for inference.

Comment 5: While the paper is generally well-written, a few minor grammatical issues and typos are present throughout. A thorough proofreading pass would help polish the writing further and enhance readability.

Response 5: Thank you for the reminder. We have thoroughly proofread the entire paper and made additional improvements to enhance its readability. All figures have been reviewed, and their quality has been improved. The paper should now read much more smoothly.

Reviewer 2 Report

Comments and Suggestions for Authors

This article proposes a JPEG Steganography Network (JSN) that embeds images within images and withstands JPEG compression. It also presents some considerations, such as the selection of cover images. However, the following issues should be addressed. 

1. The article’s contributions should be clarified and strengthened. A brief review of existing methods for resisting JPEG compression can help readers better understand the motivation and significance of the article.

2. The authors are encouraged to provide a more extensive literature review, for example, including a brief review of reversible data hiding in the encrypted domain or in the shared domain. Some recent work on these active topics includes:

Reversible Data Hiding in Encrypted Images with Asymmetric Coding and Bit-plane Block Compression, IEEE Transactions on Multimedia, doi: 10.1109/TMM.2024.3405717.

Reversible data hiding with hierarchical embedding for encrypted images, IEEE Transactions on Circuits and Systems for Video Technology, 32(2): 451-466, 2022.

 

Reversible data hiding in share JPEG images, ACM Transactions on Multimedia Computing, Communications and Applications, 2024, doi: 10.1145/3695463.

Reversible data hiding in encrypted images with secret sharing and hybrid coding, IEEE Transactions on Circuits and Systems for Video Technology, 33(11):6443-6458, 2023. 

3. The stego image is generated after an IDCT process. But this process is not shown in Figure 2.

4. Three methods are compared in terms of image quality. However, a comparison with more SOTA methods would better demonstrate the superiority of the proposed method.

5. The clarity of the figures should be further improved.

 

Author Response

This article proposes a JPEG Steganography Network (JSN) that embeds images within images and withstands JPEG compression. It also presents some considerations, such as the selection of cover images. However, the following issues should be addressed.

Comment 1: The article’s contributions should be clarified and strengthened. A brief review of existing methods for resisting JPEG compression can help readers better understand the motivation and significance of the article.

Response 1: Thank you for the suggestion. The contributions of our work are highlighted on Page 2, at the end of Section 1. While there are effective methods for resisting JPEG compression, this work focuses on embedding one image into another of the same resolution. Deep learning is commonly used for this task, and invertible neural networks (INNs) offer a promising solution. However, existing INN-based methods typically fail when the stego image undergoes even mild JPEG compression. To address this challenge, we incorporated block DCT into the processing flow, along with additional quantization, to generate a JPEG-compliant stego image and ensure effective image steganography. Related issues are also discussed in this work.
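The role of quantization in producing a JPEG-compliant result can be illustrated with a short sketch. It is not the authors' implementation: it only shows the generic JPEG idea that coefficients already lying on the quantization grid pass through a second quantization with the same steps unchanged. The coefficient values and step sizes below are illustrative assumptions.

```python
def quantize(coeffs, steps):
    """Map DCT coefficients to integer levels, one quantization step per coefficient."""
    return [round(c / q) for c, q in zip(coeffs, steps)]

def dequantize(levels, steps):
    """Reconstruct coefficients from integer levels."""
    return [k * q for k, q in zip(levels, steps)]

# Illustrative DCT coefficients and quantization steps (assumed values).
coeffs = [-415.4, 30.2, -61.0, 27.6]
steps = [16, 11, 10, 16]

levels = quantize(coeffs, steps)
restored = dequantize(levels, steps)

# Quantizing again with the same steps leaves the integer levels unchanged.
requantized = quantize(restored, steps)
```

This idempotence is why a stego image whose coefficients are already quantized with the target steps can survive re-encoding at the same quality setting, while a different QF changes the grid and introduces additional loss.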

Comment 2: The authors are encouraged to provide a more extensive literature review, for example, including a brief review of reversible data hiding in the encrypted domain or in the shared domain. Some recent work on these active topics includes:
- Reversible Data Hiding in Encrypted Images with Asymmetric Coding and Bit-plane Block Compression, IEEE Transactions on Multimedia, doi: 10.1109/TMM.2024.3405717.
- Reversible data hiding with hierarchical embedding for encrypted images, IEEE Transactions on Circuits and Systems for Video Technology, 32(2): 451-466, 2022.
- Reversible data hiding in share JPEG images, ACM Transactions on Multimedia Computing, Communications and Applications, 2024, doi: 10.1145/3695463.
- Reversible data hiding in encrypted images with secret sharing and hybrid coding, IEEE Transactions on Circuits and Systems for Video Technology, 33(11):6443-6458, 2023.

Response 2: Thank you for the suggestion. We have added these papers in Section 2.1 to make the review more comprehensive.

Comment 3: The stego image is generated after an IDCT process. But this process is not shown in Figure 2.

Response 3: Thanks for pointing out the error. Figure 2 has been revised to correct the issue.

Comment 4: Three methods are compared in terms of image quality. However, a comparison with more SOTA methods would better demonstrate the superiority of the proposed method.

Response 4: Thank you for the question. We acknowledge that there are excellent deep learning-based watermarking schemes that are both robust and preserve image quality. However, in this work, we primarily focused on image-in-image steganography and specifically addressed the challenges posed by JPEG compression. Therefore, we compared schemes that are more closely aligned with this type of scenario.

Comment 5: The clarity of the figures should be further improved.

Response 5: Thank you for pointing out this issue. We have removed some figures and added clarifications in the captions. All figures have been reviewed to correct any errors or ambiguities, making the explanations clearer.

Reviewer 3 Report

Comments and Suggestions for Authors

 

1. The proposed method is an embedding method where the guest (secret) image is embedded in the host (cover) image and the guest image can be extracted even after JPEG compression. There are two types of data hiding methods: steganography, which, as described by the author, does not make the embedded image unnoticeable, and watermarking methods, where the host image has a value. The author calls the proposed method a steganographic method, but isn't it a watermarking method? Steganalysis must be used to demonstrate that the guest image is undetectable as embedded if the proposed method is steganography. However, this manuscript does not present any steganalysis results. The results are mandatory if the authors claim this is a steganographic method. The following paper would be a good reference for a steganalysis method.

[A] Fridrich, J., Goljan, M., Hogea, D. (2003). Steganalysis of JPEG Images: Breaking the F5 Algorithm. In: Petitcolas, F.A.P. (eds) Information Hiding. IH 2002. Lecture Notes in Computer Science, vol 2578. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36415-3_20

 

2. In steganography, there are few methods that are tolerant to JPEG compression, so the proposed method has utility in this regard. The proposed method performs the quantization of the JPEG compression outside the INN. It could be implemented within a neural network, as in the following paper. This approach can be used as a reference.

[B] Yamauchi, S.; Kawamura, M. A Neural-Network-Based Watermarking Method Approximating JPEG Quantization. J. Imaging 2024, 10, 138. https://doi.org/10.3390/jimaging10060138

 

3. Image quality has improved compared to conventional methods. But how much image quality do you expect? At present, the image quality is poor. It would be better to reduce the size of the secret image while maintaining the image quality. In other words, I do not understand the reason for using IHI. It seems that the size of the cover image and the secret image should be the same for this network. The cover image is also degraded and is likely to be detected by steganalysis. Please explain why you chose to use an IHI-based model and what are the advantages of this model.

4. Some parts of the English explanation are not clear. In particular, I do not understand the part in section 3.1 that explains the proposed method. English proofread is needed.

* I cannot understand what the (7, 7) band is in line 209, “the channel 63 corresponds to the (7, 7) band”.

* In lines 223-225, enhanced affine transformation is used, but it is unclear where and why this process is used.

* In line 226, “f(·), g(·) and h(·) use the same dense block [31] as shown in Figure 4”, I cannot understand what the dense block refers to. f, g, and h are functions. Why are they represented in the network? Also, I cannot figure out the relationship between Figure 3 and Figure 4. The description should be written for readers who do not know IHI.

* Figure 3 is part of Figure 2. There is no need to show it separately.

* In line 205, x_cover and x_secret represent the WxH pixel image. At line 222, x_cover^i and x_secret^i represent the i-th embedding invertible block. This block size should be clearly specified. The dimensions of the arguments of the f, g, and h functions and the dimension of the returned value should also be specified. The dimension of loss information r should also be clarified.

* I do not understand what the captions for Figure 4 and Figure 10 mean.

5. In the paragraph starting at line 253, you mention loss information r, but I don't understand what you mean. Is it correct to say that the proposed method uses a sampled variable z that follows a Gaussian distribution, since the loss information r at the time of embedding cannot be used at the time of extraction? How valid is the assumption that the distribution is Gaussian? Why did you assume that the distribution is Gaussian? As claimed in line 264, there is the possibility of multiple solutions. How many recovery should be performed? Chapter 4 does not show multiple recovered secret images. Multiple images should be obtained by changing the value of z.

6. In line 275, it states that the JSN will be learned, but the parameters to be learned are not stated in the text or in Figures 2, 3, and 4. The dimensions of the inputs and outputs of the network should be specified.

7. The summations in Eqs. (6) and (7) should start from n=0.

8. In lines 390-392, what measure was used to measure the integrity of the secret information? Provide its definition.

9. Are you sure you have not misplaced the image in Figure 9? The same image is shown in (a). Is the top one uncompressed and the bottom one compressed? The description does not match the figures.

10. Please specify the image quality of Figure 13 in PSNR in the text.

Comments on the Quality of English Language

Please see comment 4.

Author Response

Comment 1: The proposed method is an embedding method where the guest (secret) image is embedded in the host (cover) image and the guest image can be extracted even after JPEG compression. There are two types of data hiding methods: steganography, which, as described by the author, does not make the embedded image unnoticeable, and watermarking methods, where the host image has a value. The author calls the proposed method a steganographic method, but isn't it a watermarking method? Steganalysis must be used to demonstrate that the guest image is undetectable as embedded if the proposed method is steganography. However, this manuscript does not present any steganalysis results. The results are mandatory if the authors claim this is a steganographic method. The following paper would be a good reference for a steganalysis method.
[A] Fridrich, J., Goljan, M., Hogea, D. (2003). Steganalysis of JPEG Images: Breaking the F5 Algorithm. In: Petitcolas, F.A.P. (eds) Information Hiding. IH 2002. Lecture Notes in Computer Science, vol 2578. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36415-3_20

Response 1: Thank you for the question. We categorize the proposed scheme based on whether the hidden information is related to the cover image. Digital watermarking is typically used for copyright protection or ownership declaration, whereas steganography is aimed at covert communication. Another key difference is the payload. Our proposed scheme focuses on hiding an image within another image, so we used INNs. The mentioned reference is a classic paper and should indeed be included. The discussion of steganalysis has been added in Sec. 4.8. Thanks for the reminder.

Comment 2: In steganography, there are few methods that are tolerant to JPEG compression, so the proposed method has utility in this regard. The proposed method performs the quantization of the JPEG compression outside the INN. It could be implemented within a neural network, as in the following paper. This approach can be used as a reference.
[B] Yamauchi, S.; Kawamura, M. A Neural-Network-Based Watermarking Method Approximating JPEG Quantization. J. Imaging 2024, 10, 138. https://doi.org/10.3390/jimaging10060138

Response 2: Thank you for the reminder. We will include this paper in the references.

Comment 3: Image quality has improved compared to conventional methods. But how much image quality do you expect? At present, the image quality is poor. It would be better to reduce the size of the secret image while maintaining the image quality. In other words, I do not understand the reason for using IHI. It seems that the size of the cover image and the secret image should be the same for this network. The cover image is also degraded and is likely to be detected by steganalysis. Please explain why you chose to use an IHI-based model and what are the advantages of this model.

Response 3: That's a great question. The reason for using INNs in our approach is their large steganography capacity, as one RGB image can be embedded into another RGB image of the same size - which is difficult to achieve with traditional methods. However, existing deep learning-based steganography schemes are typically vulnerable to JPEG compression. Specifically, after JPEG compression, the content is often severely distorted, leading to very low PSNR values. In our proposed scheme, the output is a JPEG-compressed image, so the PSNR values may not be as high as expected. The key advantage of our approach is that it ensures a stego JPEG image can be generated, and the secret image can still be retrieved.

Comment 4: Some parts of the English explanation are not clear. In particular, I do not understand the part in section 3.1 that explains the proposed method. English proofread is needed.

* I cannot understand what the (7, 7) band is in line 209, “the channel 63 corresponds to the (7, 7) band”.

Thanks for pointing out this issue. Since 8x8 DCT is used, there will be 64 channels:

(0,0), (0,1), …, (0,7)
(1,0), (1,1), …, (1,7)
  ⋮
(7,0), (7,1), …, (7,7)

where (0,0) is the DC band and (7,7) is the frequency band corresponding to the highest frequency. We have rephrased this on Page 5. Thank you.
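The correspondence between a channel number and its frequency band can be sketched as follows. This is an illustration only and assumes that the 64 channels are numbered in row-major order starting from 0, which is consistent with channel 63 being the (7, 7) band but is not otherwise stated in this record. The function name `channel_to_band` is hypothetical.

```python
def channel_to_band(channel, block_size=8):
    """Return the (row, column) frequency band for a channel index.

    Assumes row-major numbering of the block_size x block_size DCT bands.
    """
    if not 0 <= channel < block_size * block_size:
        raise ValueError("channel index out of range")
    return divmod(channel, block_size)

dc_band = channel_to_band(0)        # the DC band
highest_band = channel_to_band(63)  # the highest-frequency band
```

With this numbering, channel 0 maps to (0, 0) and channel 63 maps to (7, 7), matching the two bands named in the response.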

* In lines 223-225, enhanced affine transformation is used, but it is unclear where and why this process is used.

Thank you for pointing out this ambiguity. We followed the implementation from the paper “Invertible Image Rescaling,” which uses the term “enhanced affine transformation,” and this corresponds to Eq. (2). Since our approach is based on this existing method without significant modifications, we did not elaborate on this part in detail.

* In line 226, “f(·), g(·) and h(·) use the same dense block [31] as shown in Figure 4”, I cannot understand what the dense block refers to. f, g, and h are functions. Why are they represented in the network? Also, I cannot figure out the relationship between Figure 3 and Figure 4. The description should be written for readers who do not know IHI.

Thank you for pointing this out. We follow the common implementation of INNs, where some approaches used convolutional blocks, residual blocks, or dense blocks. After experimenting with different options, we chose to use dense blocks for our implementation. We have revised some sentences below Eq. (2) to clarify this further.
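For readers unfamiliar with INNs, the following sketch shows why a block built from arbitrary functions f, g, and h can still be inverted exactly. It uses a generic affine coupling form and simple stand-in functions; it is an assumption for illustration and is not claimed to match the manuscript's Eq. (2) or its dense blocks.

```python
import math

# Stand-ins for the learned sub-networks. In an INN these would be small
# neural networks (for example dense blocks); they need not be invertible.
def f(v): return [0.5 * x for x in v]
def g(v): return [math.tanh(x) for x in v]
def h(v): return [x * x for x in v]

def forward(cover, secret):
    """One coupling step: update the cover branch, then the secret branch."""
    cover_out = [c + a for c, a in zip(cover, f(secret))]
    scale, shift = g(cover_out), h(cover_out)
    secret_out = [s * math.exp(a) + b for s, a, b in zip(secret, scale, shift)]
    return cover_out, secret_out

def inverse(cover_out, secret_out):
    """Undo the step by reversing the two updates in the opposite order."""
    scale, shift = g(cover_out), h(cover_out)
    secret = [(s - b) * math.exp(-a) for s, a, b in zip(secret_out, scale, shift)]
    cover = [c - a for c, a in zip(cover_out, f(secret))]
    return cover, secret

cover, secret = [0.2, -1.0, 3.0], [1.5, 0.0, -2.0]
recovered_cover, recovered_secret = inverse(*forward(cover, secret))
```

The inverse only ever evaluates f, g, and h in the forward direction, which is why the choice between convolutional, residual, or dense blocks affects quality but not invertibility.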

* Figure 3 is part of Figure 2. There is no need to show it separately.

Thank you for pointing out this duplication. We have modified Figures 2 and 3. Figure 2 now shows blocks representing FIBs and BIBs, and their details are provided in Figure 3.

* In line 205, x_cover and x_secret represent the WxH pixel image. At line 222, x_cover^i and x_secret^i represent the i-th embedding invertible block. This block size should be clearly specified. The dimensions of the arguments of the f, g, and h functions and the dimension of the returned value should also be specified. The dimension of loss information r should also be clarified.

Thank you for the suggestion. Figure 2 has been significantly revised, and all data dimensions are now listed to provide clearer information.

* I do not understand what the captions for Figure 4 and Figure 10 mean.

Thank you for your questions. Figures 4 and 10 in the previous manuscript showed the structure of dense blocks and residual blocks. However, presenting the structure did not significantly clarify the content. As a result, these two figures have been removed in the revised version.

We have thoroughly proofread the paper and carefully reviewed and revised all the figures. The paper should now be much clearer and more readable. Thank you.

Comment 5: In the paragraph starting at line 253, you mention loss information r, but I don't understand what you mean. Is it correct to say that the proposed method uses a sampled variable z that follows a Gaussian distribution, since the loss information r at the time of embedding cannot be used at the time of extraction? How valid is the assumption that the distribution is Gaussian? Why did you assume that the distribution is Gaussian? As claimed in line 264, there is the possibility of multiple solutions. How many recovery should be performed? Chapter 4 does not show multiple recovered secret images. Multiple images should be obtained by changing the value of z.

Response 5: Thank you for the question. Embedding multiple images is achieved by increasing the number of channels. In JSN, we demonstrate this by embedding two secret images. Figure 8 in the previous manuscript was used to illustrate the concept of increasing channels to embed multiple images. However, we found that this figure was not essential and decided to remove it. The explanation has been added in Sec. 4.4.

Using r with a Gaussian distribution is somewhat heuristic. For example, schemes like ISN use a constant value. After experimenting with different approaches, we found that using r with a Gaussian distribution yielded better performance. We acknowledge that this may not be the most optimized solution.

For the implementation of embedding multiple images, we didn’t make significant changes to the design, but instead, we simply increased the number of channels (doubling the channels to embed two images). The z’s are random signals provided separately to each “branch” to help reveal the two secret images.

Comment 6: In line 275, it states that the JSN will be learned, but the parameters to be learned are not stated in the text or in Figures 2, 3, and 4. The dimensions of the inputs and outputs of the network should be specified.

Response 6: Thank you for pointing this out. The learned parameters are the weights of the dense blocks used to implement f, g, and h. In the previous manuscript, the content of Fig. 3 was already shown in Fig. 2, and Fig. 4 did not provide significant clarification. Therefore, we have modified Fig. 2 and Fig. 3, and removed Fig. 4 and Fig. 10 in the revised version.

Comment 7: The summations in Eqs. (6) and (7) should start from n=0.

Response 7: Thanks for the reminder. Adding n=0 certainly makes the equations clearer.

Comment 8: In lines 390-392, what measure was used to measure the integrity of the secret information? Provide its definition.

Response 8: Thanks for pointing out this issue. In this context, “integrity” refers to the “content” of the secret image. In deep learning-based steganography, the retrieved secret image may differ significantly from the original. Therefore, “integrity” here means that the content is preserved. To avoid ambiguity, we have decided not to use the term “integrity” in the revised version.

Comment 9: Are you sure you have not misplaced the image in Figure 9? The same image is shown in (a). Is the top one uncompressed and the bottom one compressed? The description does not match the figures.

Response 9: Thanks for the question. The original figures could have been clearer. We have added text to the figures to clarify each case. Thank you.

Comment 10: Please specify the image quality of Figure 13 in PSNR in the text.

Response 10: Thanks for the reminder. The PSNR values have been added to the figure caption.

Reviewer 4 Report

Comments and Suggestions for Authors

In this paper, to address the challenge of JPEG compression in image steganography, the authors propose a JPEG Steganography Network (JSN), which uses a deep invertible neural network as the backbone and is integrated with the JPEG coding process. An 8 × 8 discrete cosine transform (DCT) is applied and the quantization steps specified by JPEG are considered to create JPEG-compliant stego images. However, I noticed that the following problems still exist in this article:

1. Explain Figure 1 and Figure 3 in detail.

2. give the description of degree (X10) in Figure 6.

3. the process of hiding multiple secret images is not clear in this paper. Please explain it in more detail.

4. whether it can resist steganalysis attack.

5. give an intuitive comparison of whether the compression efficiency is reduced after the improvement.

Author Response

In this paper, to address the challenge of JPEG compression in image steganography, the authors propose a JPEG Steganography Network (JSN), which uses a deep invertible neural network as the backbone and is integrated with the JPEG coding process. An 8 × 8 discrete cosine transform (DCT) is applied and the quantization steps specified by JPEG are considered to create JPEG-compliant stego images. However, I noticed that the following problems still exist in this article:

Comment 1: Explain Figure 1 and Figure 3 in detail.

Response 1: Thanks for the question. Figure 1 illustrates that existing deep learning-based methods are vulnerable to mild JPEG compression, which motivates our research. Figure 3 shows the structure of invertible blocks, which are essentially convolutional blocks derived from the model training process. We chose dense blocks to achieve better performance. The structure in Figure 3 corresponds to Eqs. (1)-(4), which are standard implementations of INNs, and we did not modify them.

Comment 2: give the description of degree (X10) in Figure 6.

Response 2: Thanks for the reminder. We have added the explanation on Page 10, below Table 2 (line 39).

Comment 3: the process of hiding multiple secret images is not clear in this paper. Please explain it in more detail.

Response 3: Thanks for pointing out the issue. To embed multiple secret images, we increase the number of channels (doubling the channels to embed two secret images). Figure 8 of the previous version was found to be less useful and has been removed. The explanation is now included in Sec. 4.4.

Comment 4: whether it can resist steganalysis attack.

Response 4: Thanks for pointing out this issue. We have added a new subsection, Sec. 4.8, titled “Steganalysis.” We acknowledge that if deep learning-based steganalysis models are used and the images are from the proposed JSN, the steganographic images can potentially be detected. To address this issue, we compared our work with HiNet to demonstrate that JSN offers better resistance to steganalysis.

Comment 5: give an intuitive comparison of whether the compression efficiency is reduced after the improvement.

Response 5: One advantage of the proposed scheme is that the resulting JPEG images are not significantly enlarged. The discussions were added at the end of Sec. 4.8.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have adequately addressed the concerns previously raised, and it is therefore recommended that this paper be accepted for publication.

Author Response

Comment: The authors have adequately addressed the concerns previously raised, and it is therefore recommended that this paper be accepted for publication.

Response: Thank you for your valuable feedback and insightful suggestions. They will undoubtedly help us enhance the quality of the paper.

 

 

Reviewer 3 Report

Comments and Suggestions for Authors

I would like authors to mark only those parts of the text that were corrected in the responses to the comments, excluding sentences that were corrected in the English proofreading.

For Response 2, I expected to cite and reference the paper [B] listed in the review comment, but it is not cited in the references.
Other possible implementations of JPEG quantization could be described in Section 1.

 

 

 

Author Response

Comment 1: I would like authors to mark only those parts of the text that were corrected in the responses to the comments, excluding sentences that were corrected in the English proofreading.

Response 1: Thank you for the reminder. The revised sections have been highlighted in the submitted PDF.

Comment 2: For Response 2, I expected to cite and reference the paper [B] listed in the review comment, but it is not cited in the references. Other possible implementations of JPEG quantization could be described in Section 1.

Response 2: We apologize for the oversight in missing the reference. While it was added to the bibliography file, we neglected to insert it into the text. The reference [B] has now been included at the end of Section 2.3, along with a description of the method. Thank you for the reminder.

 
