Peer-Review Record

DICTION: DynamIC robusT whIte bOx Watermarking Scheme for Deep Neural Networks

Appl. Sci. 2025, 15(13), 7511; https://doi.org/10.3390/app15137511
by Reda Bellafqira * and Gouenou Coatrieux
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 2 June 2025 / Revised: 26 June 2025 / Accepted: 1 July 2025 / Published: 4 July 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript presents a dynamic robust white-box watermarking scheme for deep neural networks (DICTION). Using a DNN to extract watermark information through the intermediate feature map of the trigger set is a good idea. Although experimental results show that the proposed DICTION has good fidelity, robustness, security, integrity, etc., the manuscript needs to be further improved.

 

There are some issues as follows.

  1. The sentence “Its main originality stands on ...” in the Abstract is rather long. It is recommended to shorten it for readability.
  2. Add an introduction to the BN layer, since some white-box model watermarking methods select BN layer parameters for watermark embedding.
  3. Section 2 is rather lengthy, which makes the sections imbalanced.
  4. When describing convolutional network parameters, the shape notation is now more commonly used (this is also how PyTorch describes them). Please consider whether to keep the current representation.
  5. Do not reuse the same symbol for different quantities, such as “d” for both kernel size and distance. Use consistent symbols throughout the paper. The format of Formula (17) needs to be adjusted.
  6. Much of this background knowledge has already been covered in other papers, so please keep the description concise. Focus on summarizing other work and then presenting your own contributions.
  7. Please describe in detail how to design the structure of the projection model (in addition to a 2-layer fully connected neural network, other networks can also be used, such as ResNet50 with its classification fully connected layer replaced by an N-bit-output fully connected layer), as well as the training strategy; see the sketch after this comment list. These points can also be illustrated in Figure 2.
  8. The expression of the loss function can be improved. For example, the loss can be divided into two parts: the loss of the watermarked model and the loss of the projection model. Then, introduce how the two models are trained.

(1) Watermarked model training loss:

original classification model training loss = classification loss on the training set

watermark embedding loss = watermark extraction loss on the watermarked model's trigger set

watermarked model training loss = original classification model training loss + watermark embedding loss

(2) Projection model training loss:

watermark trivial loss = random watermark extraction loss on the original model's trigger set

projection model training loss = watermark embedding loss + watermark trivial loss
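
For illustration only, a minimal PyTorch-style sketch of this two-part decomposition is given below. The `features` attribute used to read the trigger-set activations, the binary cross-entropy losses, and the weighting factor `lambda_wm` are assumptions made for the example and are not taken from the manuscript; the projection model is assumed to output per-bit values in [0, 1].

```python
import torch
import torch.nn.functional as F

lambda_wm = 1.0  # assumed trade-off weight between task loss and watermark loss


def watermarked_model_loss(model, projection, x_train, y_train, x_trigger, watermark):
    # (1) original classification model training loss on the training set
    task_loss = F.cross_entropy(model(x_train), y_train)
    # watermark embedding loss: extract the watermark from the trigger-set
    # activations of the watermarked model and compare it with the target bits
    acts = model.features(x_trigger)  # assumed hook/attribute for activations
    embed_loss = F.binary_cross_entropy(projection(acts.flatten(1)), watermark)
    # watermarked model training loss = task loss + watermark embedding loss
    return task_loss + lambda_wm * embed_loss


def projection_model_loss(original_model, model, projection, x_trigger, watermark):
    # watermark embedding loss, as above, computed on the watermarked model
    acts_wm = model.features(x_trigger)
    embed_loss = F.binary_cross_entropy(projection(acts_wm.flatten(1)), watermark)
    # (2) watermark trivial loss: extraction from the original (non-watermarked)
    # model's trigger-set activations should only yield a random watermark
    acts_orig = original_model.features(x_trigger).detach()
    random_bits = torch.randint(0, 2, watermark.shape, device=watermark.device).float()
    trivial_loss = F.binary_cross_entropy(projection(acts_orig.flatten(1)), random_bits)
    # projection model training loss = watermark embedding loss + watermark trivial loss
    return embed_loss + trivial_loss
```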

  9. In all tables of experimental results, the order of the methods should be fixed, and the best results should be highlighted.
  10. Some of the text in the figures is too small. Please enlarge it appropriately (even though they are vector images), for example in Figures 2, 3, and 4.
  11. Please check the caption of Figure 3, which may contain template text. In addition, please optimize the layout of Figure 3.
  12. Abbreviations used in the tables and figures need to be explained. For example, BM1 in Table 4 refers to Benchmark 1 mentioned earlier, which can be explained in Section 4.1.
  13. In Section 4.5, Figure 4 and Table 5 show that DICTION preserves the distribution spanned by the model. Please explain why this effect can be achieved (since the training samples in each training batch are different, i.e., randomly selected from the training set, even models with the same architecture may have different optimization directions for their parameters).
  14. Please explain the meaning of "CNN Layers" in Table 8 and give the corresponding detailed settings.
  15. Check the grammar throughout the manuscript. For example, there are two consecutive commas in Section 2.2.1.
  16. The formatting of the tables in the manuscript can be improved.
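
As a concrete illustration of comment 7 above, here is a minimal sketch of two possible projection-model architectures: the 2-layer fully connected network and a ResNet50 whose classification layer is replaced with an N-bit output layer. The watermark length `N_BITS` and the hidden width are placeholder values, and feeding activation maps into a ResNet50 backbone would additionally require reshaping them into 3-channel image-like tensors.

```python
import torch.nn as nn
from torchvision import models

N_BITS = 256  # placeholder watermark length, not taken from the paper


def build_mlp_projection(in_features: int, n_bits: int = N_BITS) -> nn.Module:
    """2-layer fully connected projection model: flattened activation map -> n_bits."""
    return nn.Sequential(
        nn.Linear(in_features, 512),  # hidden width chosen arbitrarily
        nn.ReLU(),
        nn.Linear(512, n_bits),
        nn.Sigmoid(),                 # per-bit outputs in [0, 1]
    )


def build_resnet50_projection(n_bits: int = N_BITS) -> nn.Module:
    """ResNet50 backbone with its classification layer replaced by an n_bits output."""
    backbone = models.resnet50(weights=None)
    backbone.fc = nn.Linear(backbone.fc.in_features, n_bits)
    return nn.Sequential(backbone, nn.Sigmoid())
```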

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The authors introduce DICTION, a dynamic watermarking method for neural networks operating in a white-box setting. The scheme leverages a generative adversarial network setup and embeds data within the activation maps of the model. However, there are a few important issues for which I did not find answers and which should be addressed in the text of the article:

  1. The authors did not clearly define the research problem in the introduction.
  2. The authors should discuss how the proposed DICTION scheme performs on large-scale, real-world models such as GPT, BERT, and EfficientNet. The evaluation is limited to simple datasets (e.g., MNIST and CIFAR-10) and relatively small architectures, which does not provide sufficient insight into the scalability and practical applicability of the method.
  3. The authors should standardize the testing conditions by using the same number of bits, the same network layer, and consistent evaluation metrics such as BER and fidelity (a BER sketch is given after this list). It would also be beneficial to perform tests with a unified set of hyperparameters and different trigger-set variants.
  4. The reviewer suggests extending the analysis to include anomaly detection mechanisms in the activation space, as well as evaluating the scheme's resilience to more advanced attacks such as gradient inversion and fingerprinting. Additionally, it is recommended to conduct tests using latent spaces generated from distributions other than the standard normal, in order to better assess the method's generality and robustness.
  5. Please describe and justify whether the proposed method can be effectively applied to models operating in resource-constrained environments, such as mobile devices or embedded systems.
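
Regarding comment 3, a consistent definition of BER across all experiments could follow this simple sketch; binary watermarks are assumed and the function name is illustrative.

```python
import torch


def bit_error_rate(extracted: torch.Tensor, embedded: torch.Tensor) -> float:
    """Fraction of watermark bits that differ between the extracted and embedded marks."""
    return (extracted.round() != embedded.round()).float().mean().item()
```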

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

In this paper, authors present a unified framework for white-box deep neural network (DNN) watermarking methods, outlining their theoretical connections, and propose a new method called DICTION. DICTION is a dynamic and robust watermarking scheme that leverages a generative adversarial network (GAN) setup, where the watermark extraction function acts as a discriminator and the target model serves as the generator. Positioned as a generalization of DeepSigns—the only other dynamic white-box watermarking method—DICTION significantly improves performance in terms of watermark capacity, model accuracy, and robustness against various attacks. However, following are the few suggestions and recommendations that could help to improve the overall quality of the paper:

  1. Add quantitative results at the end of the Abstract.
  2. Add the contributions in bullet form at the end of the Introduction.
  3. Related work should be a main section rather than a subsection.
  4. Section 1.2 is not required; such a section belongs in thesis-type documents. In a research paper, its content should be part of the Introduction.
  5. Advantages and drawbacks of the approach: this should be a subsection in each section.
  6. An equation number is missing after Equation (24).
  7. The figure captions are too long.
  8. Rename Section 3 to "Proposed Methodology" and introduce the scheme name within that section.
  9. The authors should propose future directions.
  10. Outline the limitations of the study.
  11. Add references from 2024 and 2025.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

The similarity report shows 51%, which is very high. The authors should reduce it.

Author Response

Comment 1: The similarity report shows 51%, which is very high. The authors should reduce it.

Response 1: Thank you for your valuable comment regarding the similarity report. We appreciate your careful review.

We would like to clarify that the similarity percentage is mainly due to the overlap with our own preprint version of the manuscript, which was made publicly available prior to submission, in accordance with the journal's preprint policy. Since the first round of review, we have significantly revised and improved the manuscript, particularly the clarity and presentation of the experimental results, while incorporating all suggestions from the reviewers across the different sections.

We made further efforts to rephrase overlapping sentences where possible without compromising scientific accuracy and clarity.

Thank you again for your insightful feedback.
