Review Reports
- Yangjie Zhong
- Jia Liu*
- Peng Luo et al.
Reviewer 1: Anonymous
Reviewer 2: Kateryna Molodetska
Reviewer 3: Fernando Molina-Granja
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The title "Generative Implicit Steganography via Message Mapping" is generalized, but the article concentrates on image data; give examples of other data types relevant to multimedia applications, or otherwise change the title.
The scope needs to be specified clearly in the introduction and conclusion.
The abstract needs to specify the tools and methods used (these can be taken from Section 4).
At the end of the abstract, the last line needs to be supported by a scientific evaluation of the proposed system rather than "To our knowledge,".
The abstract needs to report the results from Section 4 and mention the limitations too.
Line 61: no need to mention that this is the first time…
At the end of Section 1, add a paragraph on the organization of the paper.
Give an introductory statement between a heading and its subheading, e.g., between 2 and 2.1.
Avoid sentences of the kind "Except for reference [21]," (line 147).
Give a comparative study of the existing techniques at line 183.
Heading 3 is not clear; is it the method?
What does the sentence "The process of this article is shown in Figure 2" mean? (line 199)
Figure 3: is there a picture issue? Does it overlook copyright?
3.2.1: the bit-mapping formula is expected (how is Z calculated to be √2, −√2, or 0?).
Line 253: Section 3.3 is missing.
Sections 3.5 and 4 are very generalized and need to be aligned with the article.
Author Response
Reviewer1:
Comment1:
The title "Generative Implicit Steganography via Message Mapping" is generalized, but the article concentrates on image data; give examples of other data types relevant to multimedia applications, or otherwise change the title.
Response1:
Thank you very much for your suggestion. Generative implicit steganography is a method proposed by our team that applies implicit neural representation techniques to generative steganography. Because implicit neural representation can represent diverse data types, we do not restrict the concept to a steganography scheme for image data. The article uses images as an example to demonstrate the feasibility and effectiveness of the proposed scheme, which can be extended by analogy to other types of data without changing the scheme structure: one only needs to update the dataset and retrain the function generator to achieve steganography for other data types. From this perspective, the article therefore keeps "Generative Implicit Steganography via Message Mapping" as the title.
Comment2:
The scope needs to be specified clearly in the introduction and conclusion.
The abstract needs to specify the tools and methods used (these can be taken from Section 4).
At the end of the abstract, the last line needs to be supported by a scientific evaluation of the proposed system rather than "To our knowledge,".
The abstract needs to report the results from Section 4 and mention the limitations too.
Response2:
Thank you very much for your reminder. This article uses images as an example to evaluate the scheme, and the scope has been clarified in the introduction and conclusion. We have supplemented the tools and methods used in the scheme in the abstract, replaced vague wording with more rigorous language, and also stated the limitations of the proposed scheme in the abstract. It has been modified as follows:
Abstract in page 1: Generative Steganography (GS) generates stego-media via secret messages, but existing GS only targets single-type multimedia data with poor universality. The generator and extractor sizes are highly coupled with resolution. Message mapping converts between secret messages and noise, yet current GS schemes based on it use gridded data, failing to generate diverse multimedia universally. Inspired by Implicit Neural Representation (INR), we propose Generative Implicit Steganography via Message Mapping (GIS). We design single-bit and multi-bit message mapping schemes in the function domain. Its function generator eliminates the coupling between the model size and the gridded data size, enabling diverse multimedia generation and breaking resolution limits. A dedicated point cloud extractor is trained for adaptability. Through a literature review, this scheme is the first to perform message mapping in the functional domain. In the experiments, taking images as an example, methods such as PSNR, StegExpose, and neural pruning were used to demonstrate that the generated images are almost indistinguishable from real images. At the same time, the generated images are robust. The accuracy of message extraction reaches the baseline level or above. The training cost of the scheme still needs to be considered in practical deployment.
Comment3:
Line 61: no need to mention that this is the first time…
Response3:
Thank you very much for your suggestion. The purpose of mentioning 'this is the first time...' in the article was to emphasize the novelty of the scheme, which was built from scratch. Based on your suggestion, we have reduced the use of such statements in the article and keep the explanation only at the beginning.
Comment4:
At the end of Section 1, add a paragraph on the organization of the paper.
Response4:
Thank you very much for your suggestion. At the end of the first section, we have added a paragraph describing the organization of the article, modified as follows:
Section 1 in page 3 line 79:
The organizational structure of the article is as follows:
Chapter 1: Introduction. First, the classification of generative steganography and its existing problems are briefly introduced, and then the basic framework and contributions of the scheme are presented.
Chapter 2: Related Work. This chapter mainly introduces generative steganography and its three classifications, as well as steganography based on implicit neural representation and its two classifications. It also points out the existing problems and the ideas of this paper.
Chapter 3: The Proposed Method. This chapter mainly introduces the main process of the scheme and describes it in detail in terms of data representation, the message mapping scheme, the function generator, and the message extractor based on point cloud data.
Chapter 4: Experiments and Analysis. This chapter presents the experimental analysis of the scheme and quantitatively evaluates it from the perspectives of visual security, message extraction accuracy, non-detectability, robustness, efficiency, and super-resolution sampling.
Chapter 5: Conclusion. This chapter summarizes the method and its effect.
Comment5:
Give an introductory statement between a heading and its subheading, e.g., between 2 and 2.1.
Response5:
Thank you very much for your suggestion. To make the article more coherent, introductory statements have been added between headings and subheadings. The relevant modifications are as follows:
Section 2 in page 3 line 95: The current research status of steganography is introduced in two categories: generative steganography and steganography based on implicit neural representation.
Section 3.4 in page 9 line 313: This chapter will introduce the basic architecture, training parameters, and loss function definition of the point cloud message extractor.
Comment6:
Avoid sentences of the kind "Except for reference [21]," (line 147).
Response6:
Thank you very much for your suggestion. We have modified the expression as follows:
Section 2.1 in page 5 line 169: Except for Zhong, who uses an implicit-data-oriented generator, most generative steganography schemes use explicit-data-oriented generators; the type and size of the media data are single and fixed, which limits their universality.
Comment7:
Give a comparative study of the existing techniques at line 183.
Response7:
Thank you very much for your suggestion. We have added two examples to illustrate the shortcomings of implicit-representation steganography based on generative models mentioned in the text, and modified it as follows:
Section 2.2 in page 5 line 205: The above schemes iteratively optimize the cover-media to obtain stego-media rather than generating stego-media directly.
Comment8:
Heading 3 is not clear; is it the method?
Response8:
Thank you very much for your reminder. To make the title clearer, the heading of Section 3 has been changed to "The Proposed Generative Implicit Steganography via Message Mapping".
Comment9:
What does the sentence "The process of this article is shown in Figure 2" mean? (line 199)
Response9:
Thank you for your comments. This sentence means that Figure 2 presents the overall framework and process of the scheme proposed in this paper. To make the statement clearer, it has been modified as follows:
Section 3 in page 6 line 223: As illustrated in Fig. 2, the proposed steganography framework consists of two phases.
Comment10:
Figure 3: is there a picture issue? Does it overlook copyright?
Response10:
Thank you for your comments. The real images used in this paper are from the CelebA-HQ dataset, and the rest are fake images generated by the generator. The use of the dataset in the paper complies with its non-commercial research terms and cites the source (following Karras et al.'s practice), so no copyright issue is involved.
Comment11:
3.2.1: the bit-mapping formula is expected (how is Z calculated to be √2, −√2, or 0?).
Response11:
Thank you for your comments. The purpose of the message mapping scheme is to extract secret messages accurately, so the simplest mapping method was selected in the design. Referring to the mapping method used by Hu et al., we designed the single-bit and multi-bit mapping methods in this paper. The design of the mapping scheme is based on the uniqueness of the interval division and the controllability of the noise distribution.
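For readers who want to see the mapping concretely, below is a minimal sketch of one plausible single-bit interval mapping of the kind described (in the spirit of Hu et al.); the bound √2, the uniform sampling inside each interval, and the sign-based extraction rule are assumptions made for illustration and may differ from the exact formula used in the paper.

```python
import numpy as np

BOUND = np.sqrt(2)  # assumed interval bound; the paper's exact constant may differ

def map_bits_to_noise(bits, rng=None):
    """Single-bit mapping sketch: bit 0 -> one noise value drawn from [-sqrt(2), 0),
    bit 1 -> one noise value drawn from [0, sqrt(2)]."""
    if rng is None:
        rng = np.random.default_rng()
    bits = np.asarray(bits)
    low = np.where(bits == 0, -BOUND, 0.0)
    high = np.where(bits == 0, 0.0, BOUND)
    return rng.uniform(low, high)

def extract_bits_from_noise(noise):
    """Inverse mapping: the interval (here simply the sign) of each recovered
    noise value determines the extracted bit."""
    return (np.asarray(noise) >= 0).astype(int)

bits = np.array([0, 1, 1, 0])
z = map_bits_to_noise(bits)
assert np.array_equal(extract_bits_from_noise(z), bits)
```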
Comment12:
Line 253: Section 3.3 is missing.
Response12:
Thank you for your comments. The title was duplicated due to an error during compilation; this has been corrected.
Comment13:
Sections 3.5 and 4 are very generalized and need to be aligned with the article.
Response13:
Thank you for your comments. Section 3.5 mainly describes the architecture, training parameters, and loss function definition of the point-cloud-based message extractor. We have supplemented the specific parameters of the message extractor so that the code can be reproduced; the modifications are as follows:
Section 3.4.1 in page 9 line 328: After calculation, the total number of parameters of the point cloud convolutional layers and fully connected layers is about 3.1M. The overall structure adopts progressive downsampling to effectively extract multi-scale features, and BatchNorm is used to ensure quality during the training process. At the same time, the output has no Sigmoid, which is suitable for continuous noise reconstruction.
Section 3.4.2 in page 9 line 335: ...it uses a batch size of 32, a learning rate of 0.001, and a regularization weight of 10.0.
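As a further reproducibility aid, the following is a minimal PyTorch sketch consistent with the hyperparameters quoted above (batch size 32, learning rate 0.001, regularization weight 10.0); the layer widths, the five-dimensional point input, the max pooling, the output dimension, and the Adam optimizer are our own assumptions for illustration, not the authors' exact 3.1M-parameter architecture.

```python
import torch
import torch.nn as nn

class PointCloudExtractor(nn.Module):
    """Illustrative point-cloud message extractor: per-point 1x1 convolutions
    with BatchNorm, global max pooling, and a linear (non-Sigmoid) head for
    continuous noise reconstruction."""
    def __init__(self, in_dim=5, out_dim=1):  # e.g. (x, y, r, g, b) per point
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_dim, 64, 1), nn.BatchNorm1d(64), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 256, 1), nn.BatchNorm1d(256), nn.ReLU(),
        )
        self.head = nn.Linear(256, out_dim)  # no Sigmoid: continuous output

    def forward(self, points):            # points: (batch, in_dim, n_points)
        feats = self.net(points)          # per-point features
        pooled = feats.max(dim=2).values  # global max pooling over points
        return self.head(pooled)

model = PointCloudExtractor()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # learning rate from the response
criterion = nn.MSELoss()
batch_size = 32    # batch size as stated above
reg_weight = 10.0  # regularization weight as stated above; how it enters the loss is assumed
```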
The fourth section describes the experiments in terms of evaluation indices, experimental environment, visual security, message extraction accuracy, non-detectability, robustness, efficiency, and super-resolution sampling. In the message extraction accuracy part, additional control groups have been added for comparison; in the visual security part, images generated by other schemes have been added for comparison to highlight the advantages of our scheme. The amendments are as follows:
Section 4.4 in page 11 line 373: We define the number of secret message bits embedded per pixel as the embedding capacity, measured in bpp. To illustrate the impact of training epochs on the accuracy of the extractor, we set the learning rate to 0.001 and the message embedding capacity from 1 bpp to 2 bpp..... The results show that at a capacity of 1 bpp the accuracy is competitive with other methods under the same conditions: slightly lower than that of GSN and Diffusion-Stego, but still strong. When the capacity is increased to 2 bpp, the accuracy decreases but remains at a reasonable level, indicating that our steganography scheme performs adequately under different capacity requirements.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This paper presents an interesting and quite original idea: using implicit neural representations for message mapping in generative steganography. The authors propose a function generator together with a point-cloud extractor in order to break the tight link between model size and image resolution, aiming for a more flexible and universal system. Embedding messages in the function domain instead of directly into gridded data is a fresh approach, and the extractor design combined with GASP training makes the contribution technically appealing.
At the same time, the paper is not yet ready for publication. The main difficulties lie in the methodology, internal consistency, and the strength of the evaluation. The biggest issue is the way embedding capacity is defined and reported: the manuscript jumps between raw bit counts and bits-per-pixel in ways that do not match mathematically, which calls into question the reliability of the results. In addition, the proposed message mapping schemes are not explained convincingly. The single-bit mapping relies on discrete values that do not truly follow the Gaussian prior, while the multi-bit mapping slices the noise space into ranges without proving that the resulting distribution still looks Gaussian. Without solid quantitative tests of the distribution, the claim that the scheme remains invisible is not persuasive.
Overall, the paper explores a promising and potentially valuable direction. But given the current methodological inconsistencies, I cannot recommend acceptance yet. A major revision is necessary. If the authors can clearly define and measure embedding capacity, ensure the extractor is trained in a way that really matches its intended purpose, and provide stronger evidence for invisibility and robustness, then this work could indeed become an important contribution to the field.
Comments on the Quality of English Language
The manuscript requires significant language editing to meet the standards of an international academic journal.
Terminology is inconsistent: for example, “Celebahq” is used instead of the standard “CelebA-HQ” and abbreviations are not always introduced or applied consistently. Figure captions and section headings contain mistakes, including duplicated labels and errors in numbering, which disrupt the structure of the paper. Some descriptions, particularly of the methodology, are unclear due to imprecise or confusing phrasing, while vague expressions such as “excellent” or “good robustness” should be replaced with precise, academic wording.
Author Response
Reviewer2:
Comment1:
This paper presents an interesting and quite original idea: using implicit neural representations for message mapping in generative steganography. The authors propose a function generator together with a point-cloud extractor in order to break the tight link between model size and image resolution, aiming for a more flexible and universal system. Embedding messages in the function domain instead of directly into gridded data is a fresh approach, and the extractor design combined with GASP training makes the contribution technically appealing.
At the same time, the paper is not yet ready for publication. The main difficulties lie in the methodology, internal consistency, and the strength of the evaluation. The biggest issue is the way embedding capacity is defined and reported: the manuscript jumps between raw bit counts and bits-per-pixel in ways that do not match mathematically, which calls into question the reliability of the results. In addition, the proposed message mapping schemes are not explained convincingly. The single-bit mapping relies on discrete values that do not truly follow the Gaussian prior, while the multi-bit mapping slices the noise space into ranges without proving that the resulting distribution still looks Gaussian. Without solid quantitative tests of the distribution, the claim that the scheme remains invisible is not persuasive.
Overall, the paper explores a promising and potentially valuable direction. But given the current methodological inconsistencies, I cannot recommend acceptance yet. A major revision is necessary. If the authors can clearly define and measure embedding capacity, ensure the extractor is trained in a way that really matches its intended purpose, and provide stronger evidence for invisibility and robustness, then this work could indeed become an important contribution to the field.
Response1:
Thank you very much for your comments. We reply below to each of the points that need improvement.
Definition and reporting of embedding capacity: we have redefined the embedding capacity as the number of secret message bits embedded per pixel, measured in bpp. This is modified in Section 4.4 of the article.
Section 4.4 in page 11 line 373: We define the number of secret message bits embedded per pixel as the embedding capacity, measured in bpp. To illustrate the impact of training epochs on the accuracy of the extractor, we set the learning rate to 0.001 and the message embedding capacity from 1 bpp to 2 bpp.
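For clarity, one plausible reading of this definition (our assumption, with H and W the image height and width in pixels) is

$$\text{capacity (bpp)} = \frac{\text{number of embedded secret bits}}{H \times W},$$

so a 64×64 stego-image at 1 bpp would carry 64 × 64 × 1 = 4096 secret bits.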
Quantitative test of the noise distribution of the message mapping scheme: the noise in the scheme is produced by a GAN-based noise generator whose input is a 0/1 bit string and whose output is noise. The design of the mapping scheme is based on the uniqueness of the interval division and the controllability of the noise distribution; the core purpose of the mapping is to ensure accurate message extraction through a one-to-one correspondence between noise and secret messages. The distribution of the mapped noise is not the focus of the scheme, so it was not quantitatively tested. We have corrected the inaccurate statement that the noise "conforms to a Gaussian distribution". Following the message mapping scheme of Hu et al. [4], the purpose of this paper is to demonstrate the feasibility of this new approach, which applies implicit neural representation techniques to message-mapping-based generative steganography.
Comment2:
The manuscript requires significant language editing to meet the standards of an international academic journal.
Terminology is inconsistent: for example, “Celebahq” is used instead of the standard “CelebA-HQ” and abbreviations are not always introduced or applied consistently. Figure captions and section headings contain mistakes, including duplicated labels and errors in numbering, which disrupt the structure of the paper. Some descriptions, particularly of the methodology, are unclear due to imprecise or confusing phrasing, while vague expressions such as “excellent” or “good robustness” should be replaced with precise, academic wording.
Response2:
Thank you very much for your reminder. We have changed all dataset names in the paper to the standard "CelebA-HQ" and replaced vague expressions with more precise academic wording, as follows:
Abstract in page 1 line 11: Through a literature review, this scheme is the first to perform message mapping in the functional domain. In the experiments, taking images as an example, methods such as PSNR, StegExpose, and neural pruning were used to demonstrate that the generated images are almost indistinguishable from real images. At the same time, the generated images are robust. The accuracy of message extraction reaches the baseline level or above. The training cost of the scheme still needs to be considered in practical deployment.
Section 4.6 in page 13 line 403: Experimental results have shown that the accuracy of message extraction can be maintained when the encrypted carrier is within an acceptable pruning range.
Section 5 in page 15 line 433: Experimental results have shown that our scheme can resist neural pruning attacks and ensure message extraction accuracy within a certain pruning ratio.
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
- Provide additional details in Methods (e.g., generator and extractor parameter counts, reproducibility of code).
- Extend the comparative analysis to include more recent generative models (e.g., diffusion-based schemes and VAE-based steganography).
- Expand the discussion of limitations, especially regarding training time (120 h for the extractor) and scalability for real-world deployment.
- Improve English editing: avoid redundancy and enhance transitions between sections.
- Simplify and improve the readability of Figures 2–5.

The manuscript introduces a novel approach to generative steganography by integrating message mapping in the function domain with implicit neural representation (INR) and a point-cloud extractor. The idea is innovative and addresses important challenges in existing methods, such as resolution coupling, universality of data types, and robustness. The experimental results are promising, showing strong performance in terms of visual quality, message extraction accuracy, and resistance to steganalysis. However, several points could be improved to strengthen the work and make it more accessible to the research community:

Methodological clarity and reproducibility
- While the framework is explained, some technical details remain under-specified. For instance, the exact architecture of the function generator and point-cloud extractor should be described in greater depth (e.g., number of layers, activation functions, number of parameters).
- The training setup should include more information about hyperparameters (batch size, optimizer, loss weights, initialization methods) and whether results were averaged over multiple runs. This will enhance reproducibility.
- Consider providing supplementary material (e.g., code repository or pseudo-code) to allow replication of the experiments.

Comparative analysis
- The paper compares its scheme against a limited set of baselines (mainly DCGAN-based and diffusion-based models). Expanding the comparisons to include other recent methods, such as VAE-Stega, coverless image steganography approaches, or alternative INR-based techniques, would make the evaluation stronger.
- In addition to accuracy and undetectability, including metrics such as computational cost per sample, memory requirements, or scalability analysis would provide a more comprehensive understanding of the advantages and trade-offs of the proposed method.

Limitations and discussion
- The paper could benefit from a more explicit section discussing limitations. For example, training the extractor takes approximately 120 hours, which may hinder practical deployment. Similarly, the scheme is tested primarily on CelebA-HQ, which is a specific image dataset; broader validation across other modalities (e.g., audio, 3D models) would strengthen the universality claims.
- It would be helpful to provide insight into future work, such as optimizing training time, extending the method to video data, or exploring alternative data representations.

Figures and visualization
- Several figures (especially Figures 2–5) are dense and somewhat difficult to interpret. Simplifying the diagrams, increasing font sizes, and using consistent visual styles would improve readability.
- The visual results (Figures 8–13) are convincing, but adding more side-by-side comparisons with competing methods would better highlight the improvements achieved.

Language and presentation
- The English is generally clear but could benefit from minor editing to improve fluency and avoid redundancy. For example, the phrase "in this paper we propose" is repeated multiple times and could be replaced with varied expressions.
- Transitions between sections could be smoother, particularly between Related Work and The Proposed Method.

References
- The reference list is appropriate and up-to-date. However, some entries lack complete bibliographic details (e.g., missing page numbers or publisher information for conference proceedings). Standardizing the reference format according to journal style is recommended.
Author Response
To make the response easier to read, the reply containing pictures and tables is provided in a PDF file for the reviewers' review.
Author Response File:
Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have incorporated most of the suggestions, but the coherence of the text is poor and the reader loses interest; the flow of information delivery has to be improved. Sentences and paragraphs need to be linked. More concentration on proofreading is expected.
The abstract does not yet reflect the results; the results from the tables can be summarized here, for example: it showed a maximum accuracy of 96.88% for …………….
Lines 79……….: the organization of the paper is not in chapters; it can be referred to as sections and presented in a paragraph. Remove "chapter" from all the text and replace it with "section".
Figures need to be allocated to the right positions; for example, Figure 7 should be in Section 3.4.1.
A better explanation of the figures is expected, with values and their consequences.
This research selected particular functions but did not mention the reason for selecting them over other alternatives, e.g., in 3.4.3.
Table 1: "ours", "ours" (replace by your model name).
Line 430: "it can handle various kinds of multimedia data" (what is the proof?). The conclusion needs to be improved by highlighting performance, limitations, conclusions, and future research guidelines.
Author Response
Reviewer1:
Comment1:
The authors have incorporated most of the suggestions, but the coherence of the text is poor and the reader loses interest; the flow of information delivery has to be improved. Sentences and paragraphs need to be linked. More concentration on proofreading is expected.
Response1:
Thank you very much for your suggestion. We have read through the full text and worked on the transitions between paragraphs and sentences. The following are the revised parts of the text:
Section 1 in page 2 line 53: The above problems still exist in generative steganography. Our goal is therefore to construct a generative steganography scheme based on message mapping that simultaneously addresses the generality of the generator and extractor and, on this basis, improves robustness.
Section 2.2 in page 5 line 200: The above method only changes the representation of the data and is still, in essence, multimedia steganography.
Section 3.2 in page 7 line 266: The purpose of the message mapping scheme is to extract secret messages accurately, so the simplest mapping method was selected in the design. The core of the mapping scheme design is the uniqueness of the interval partition and the controllability of the noise distribution.
Comment2:
The abstract does not yet reflect the results; the results from the tables can be summarized here, for example: it showed a maximum accuracy of 96.88% for …………….
Response2:
Thank you very much for your suggestion. Following your suggestion, we have summarized the results of the paper in the abstract and modified it as follows:
Abstract in line 15: The accuracy of message extraction can reach 96.88% when the embedding capacity is 1 bpp, 89.84% when the embedding capacity is 2 bpp, and 82.21% when the pruning rate is 0.3.
Comment3:
Lines 79……….: the organization of the paper is not in chapters; it can be referred to as sections and presented in a paragraph. Remove "chapter" from all the text and replace it with "section".
Response3:
Thank you very much for your suggestion. We have restated the organization of the paper and revised it as follows:
Section 1 in page 3 line 80: The organizational structure of this paper unfolds through several logically coherent sections. The specific arrangement is as follows: first, the "Introduction" section summarizes the classification of generative steganography and the problems existing in the current field, then introduces the basic framework of this scheme and clearly describes the core contributions of the research. Next, the "Related Work" section focuses on generative steganography and introduces its three specific categories in detail; it also explains steganography based on implicit neural representation and its two classifications, and on this basis points out the shortcomings in current related research, leading to the research idea of this paper. Then, the "Algorithm Design" section, as the core content, comprehensively introduces the main process of the proposed scheme and describes its specific details along the key dimensions of data representation, the message mapping scheme, the function generator, and the message extractor based on point cloud data. The subsequent "Experiments and Analysis" section carries out a systematic experimental analysis of the scheme and quantifies its performance from several important angles, such as visual security, message extraction accuracy, non-detectability, robustness, efficiency, and super-resolution sampling, to verify the effectiveness of the scheme. Finally, the "Conclusion" section summarizes the methods proposed in this paper and their practical effects, and distills the core results of the research.
Comment4:
Figures need to be allocated to the right positions; for example, Figure 7 should be in Section 3.4.1.
Response4:
Thank you very much for your suggestion. Due to typesetting restrictions, we have tried to place the figures at their respective positions; the final layout will be determined by the editor.
Comment5:
A better explanation of the figures is expected, with values and their consequences.
Response5:
Thank you very much for your suggestion. We have re-explained the figures and tables and revised the text as follows:
Section 4.3 in page 11 line 380: It can be seen that the details of the stego-media generated from the noise produced by the message mapping scheme are clear and natural; the overall visual effect is smooth, with no obvious distortion, blur, or abnormal texture. In terms of structural background, color texture, and style fidelity, the stego-media is almost indistinguishable from the other two types of images.
Section 4.6 in page 13 line 425: It can be seen from the results that when the pruning rate is low (e.g., 0.01 or 0.05), the visual quality of the image hardly declines: the details of the face remain clear and the overall integrity is maintained. This shows that the model retains the key information well under small-scale pruning, with little impact on image quality. As the pruning rate increases, the image degrades noticeably, which means that an excessively high pruning rate has a greater impact on the stego-media. The extraction accuracy for images with different pruning rates is shown in Table 2. The results show that as the pruning rate increases, the accuracy of message extraction decreases. When the pruning rate is low, the extraction accuracy drops somewhat but still remains at a high level. When the pruning rate reaches 0.1, the message extraction accuracy can still reach 87.50%, which indicates that the scheme is robust within this range.
Section 4.7 in page 14 line 443: The time cost of message extraction and stego-media generation is controllable. However, in high-resolution scenarios, the time cost of both increases significantly, especially the generation time of stego-media, which may affect the practicability of the steganography scheme for high-resolution images (for example, in applications with strict real-time requirements).
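As a concrete illustration of the pruning evaluation described in the Section 4.6 revision above, the following is a hypothetical PyTorch sketch: a fraction of the smallest-magnitude weights of a stand-in implicit (stego) network is removed at each stated pruning rate, after which the image would be re-rendered and the extractor run on it. The tiny MLP and the use of L1 unstructured pruning are assumptions for illustration, not the authors' exact attack setup.

```python
import copy
import torch.nn as nn
import torch.nn.utils.prune as prune

# Stand-in for the implicit stego network produced by the function generator.
stego_network = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 3))

for rate in (0.01, 0.05, 0.1, 0.3):        # pruning rates reported in the response
    pruned = copy.deepcopy(stego_network)  # prune a fresh copy at each rate
    for module in pruned:
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=rate)
            prune.remove(module, "weight")  # make the pruning permanent
    # ...render the image from `pruned` and measure message extraction accuracy here
```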
Comment6:
This research selected particular functions but did not mention the reason for selecting them over other alternatives, e.g., in 3.4.3.
Response6:
Thank you very much for your reminder. We have explained the choice of function as follows:
Section 3.4.3 in page 10 line 350: The message extraction task is not a discrete task, so the loss function used in training must suit a continuous-valued output. MSE has simple mathematical characteristics, low optimization difficulty, and low computational cost, which makes it suitable for large-scale data training.
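For reference, the standard MSE objective referred to here can be written as (notation assumed: $z_i$ the mapped noise value, $\hat{z}_i$ the extractor's reconstruction, $N$ the number of values)

$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N}\sum_{i=1}^{N}\left(z_i - \hat{z}_i\right)^2.$$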
Comment7:
Table 1: "ours", "ours" (replace by your model name).
Response7:
Thank you very much for your suggestion. We have relabeled the experimental results of our scheme in the article from "ours" to "GIS", using the abbreviation of the scheme name for readability.
Table 1. Comparison of message extraction accuracy among different schemes.

| Methods | Image size | Capacity (bpp) | Acc |
| --- | --- | --- | --- |
| GSN | 64×64 | 1 | 97.15% |
| DCGANs | 64×64 | 1 | 95.8% |
| DCGANs | 64×64 | 2 | 94% |
| Diffusion-Stego | 64×64 | 1 | 98.12% |
| GSN | 64×64 | 1 | 97% |
| CopyRNeRF | / | 1 | 88.31% |
| NeRFProtector | / | 1 | 92.69% |
| GIS | 64×64 | 1 | 96.88% |
| GIS | 64×64 | 2 | 89.84% |
Comment8:
Line 430: "it can handle various kinds of multimedia data" (what is the proof?). The conclusion needs to be improved by highlighting performance, limitations, conclusions, and future research guidelines.
Response8:
Thank you very much for your suggestion. The claim that the scheme can handle various kinds of multimedia data comes from Dupont's paper, which shows visual representations of various data types as point cloud data. However, because complete experiments on other data types have not yet been conducted, the conclusion needs to be modified. Following your suggestion, we have revised it as follows:
Conclusion in page 16 line 465: ....The versatility of point cloud data enables it to handle a variety of different data types. At the same time, by designing a point cloud message extractor, the model size of the extractor is not limited by the data resolution and it can accept multimedia data of different sizes. Experiments show that our scheme achieves 96.88% message extraction accuracy when the embedding capacity is 1 bpp. When the pruning rate is 0.3, the message extraction accuracy can still reach 82.21%, demonstrating good robustness. The scheme also has advantages in the quality of the stego-media. In the future, we will consider applying implicit neural representation techniques to generative steganography for data with a time axis, such as video and audio, to construct a fully universal implicit-representation generative steganography scheme.
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Dear authors,
The presentation has become clearer and the work looks mature. For the final polish, I recommend only minor changes: add a brief quantitative check of the ‘invisibility’ of message mapping (at least one or two metrics of convergence to Gaussian).
Wish you all the best.
Author Response
Reviewer2:
Comment1:
The presentation has become clearer and the work looks mature. For the final polish, I recommend only minor changes: add a brief quantitative check of the ‘invisibility’ of message mapping (at least one or two metrics of convergence to Gaussian).
Wish you all the best.
Response1:
Thank you very much for your suggestion. We have conducted a Kolmogorov–Smirnov (KS) test and a t-test comparing the mapped noise with random Gaussian noise, and plotted the distribution statistics. The modification is as follows:
Section 4.5 in page 12 line 410: To further assess the invisibility of the scheme, we use the Kolmogorov–Smirnov (KS) test and the t-test to statistically compare the noise obtained after message mapping with random Gaussian noise. As shown in Figure 10, there are some differences in the distributions of the two noises, but the frequencies in several intervals are close. In our experiments, the KS test statistic is 0.1562 with a p-value of 0.3438, and the t-test statistic is 0.1807 with a p-value of 0.8569. Both tests show that, at the 5% significance level, the null hypothesis that the two distributions are the same cannot be rejected, indicating that the difference between the mapped noise distribution and random Gaussian noise is not statistically significant, which further supports the invisibility of the scheme.
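For readers who wish to reproduce this check, a minimal SciPy sketch of the two-sample tests described above is given below; the mapped-noise array is simulated here as a placeholder, whereas the real experiment would use the scheme's actual mapped output.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mapped_noise = rng.normal(size=1000)    # placeholder for noise produced by message mapping
gaussian_noise = rng.normal(size=1000)  # reference N(0, 1) samples

ks_stat, ks_p = stats.ks_2samp(mapped_noise, gaussian_noise)
t_stat, t_p = stats.ttest_ind(mapped_noise, gaussian_noise, equal_var=False)

# p-values above 0.05 mean the hypothesis that both samples come from the same
# distribution cannot be rejected at the 5% significance level.
print(f"KS test: statistic={ks_stat:.4f}, p={ks_p:.4f}")
print(f"t-test : statistic={t_stat:.4f}, p={t_p:.4f}")
```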
Author Response File:
Author Response.pdf