Article
Peer-Review Record

An End-to-End Video Steganography Network Based on a Coding Unit Mask

Electronics 2022, 11(7), 1142; https://doi.org/10.3390/electronics11071142
by Huanhuan Chai 1, Zhaohong Li 1,*, Fan Li 2 and Zhenzhen Zhang 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 3 March 2022 / Revised: 30 March 2022 / Accepted: 31 March 2022 / Published: 5 April 2022
(This article belongs to the Special Issue Digital and Optical Security Algorithms via Machine Learning)

Round 1

Reviewer 1 Report

The manuscript describes an end-to-end steganographic system. The manuscript is hard to read and seems to have been written carelessly. The English should be improved: please revise the manuscript in terms of commas/full stops, grammar, word choice, and sentence structure. Clarity should also be improved (e.g., "attention mechanism", Fig. 3). Fonts in Fig. 4 are too small. Figs. 4-6 include too much layer description; it is better to summarize the descriptions in tables. In Figs. 4 and 5, the size of the concatenated tensor is not the sum of the three input tensors. Why? The idea of using the CU partitioning is unclear. Section 4 should be revised to remove the unnecessary sentence and header. It is better to create one comparison table for Tables 1-4. The obtained results should first be reported and then compared, without subsections.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

TITLE
An End-to-End Video Steganography Network based on CU Mask:
Please substitute "CU" with "Coding Unit" in the title.

 

ABSTRACT
The first third of the abstract reads like a brief introduction before the official Introduction.
The abstract should briefly set the research problem, as well as summarize paper results and key findings without many details. 


KEYWORDS
Please replace 'CU mask' with 'Coding Unit mask'.

Consider adding 'PyraGAN' to the Keywords.

 

SECTIONS 
2. Proposed End-to-End Video Steganography ==> 2. Proposed End-to-End Video Steganography System

Please avoid too brief section titles; provide descriptive section titles instead.  For instance: 
2.1.4. Discriminator
3.1. Comparison of Parameters (what Parameters?) 
3.2. Comparison of Performance (of which thing?) 
3.2.2. Imperceptibility, and so on. 

 

FIGURES
In Figure 3: please explain the function of the boxes (left & right).
The fonts in Figures 3, 4, 5, and 6 are too small.
Figures 4, 5: please explain the meaning of the gray background.
Figure 7: I suggest replacing the caption (The discriminator component of PyraGAN). I think that Figure 7 displays results about the visual quality of the steganographic video frame.


TABLES
I don't see why two numbers are underlined in Tables 3 & 4.
Table captions: the full stop after "Table" should be removed. For instance: Table.1 ==> Table 1 (and so on).

 

FORMATTING
The paragraph between Lines 36-46 is not fully justified. 

 

ENGLISH: GRAMMATICAL ERRORS, SYNTACTICAL ERRORS 
As a general rule, avoid overly long sentences; separate the main clause from the secondary clause(s) using a semicolon or words like "that".
For instance: 
Discriminator as show in Fig.6 plays the role of steganalysis accepts as input both cover frames and steganographic frames.

The paper needs proof-reading. There are several grammatical and syntactical errors. For instance: 

L13: Considering video has the characteristic  ==> Considering that video has the characteristic

L47-49: The end-to-end steganography network different from traditional methods is a new type of data hiding that is simultaneously trained to create the hiding and revealing processes and is designed to specifically work as a pair. ==>
The end-to-end steganography network is different from traditional methods in that it is a new type of data hiding that is simultaneously trained to create the hiding and revealing processes and is designed specifically to work as a pair.

L84: ...is shown in Fig.1 consists of three modules:  ==> is shown in Fig.1 and consists of three modules:

L93: We can see the CU mask fits the content of image very well ==> We can see that the CU mask fits the content of image very well 

L98: Meanwhile, Convolutional Block Attention Module (CBAM)[30] in Fig.3 is utilize as our attention mechanism. ==>
As shown in Figure 3, a Convolutional Block Attention Module (CBAM) [30]  is used as our attention mechanism.

L104: Encoder as shown in Fig.4 accepts as input cover video frame ==> 
The Encoder, shown in Fig.4, accepts as input a cover video frame

L109: CU mask which is gotten by compressing the ==>
CU mask which is/are obtained by compressing the (?)
(Please clarify the subject of "gotten" (obtained): is it one thing (the CU mask) or two (the feature channels and the CU mask)?)

L128: Decoder as shown in Fig.5 accepts the steganographic video frames ==> 
The Decoder (shown in Fig.5) accepts the steganographic video frames

L130: one finally convolutional layer. ==>  one final convolutional layer. 

L133: video frames can be gotten.  ==>  video frames can be obtained.

L203: A good steganography should take into consider properties of ==> A good steganographic system (or algorithm) should take into account the properties of

L228: proved that the stego frames ==> provided that the stego frames (?)

L255: In general, proposed model of P_H16_3 is the best on the performance of capacity, imperceptibility and accuracy ==> 
In general, the proposed model of P_H16_3 is the best in terms of capacity, imperceptibility and accuracy

 

ENGLISH: TYPOGRAPHICAL ERRORS, PUNCTUATION ERRORS        

As a general rule, place a space before "[".  For instance: 
L30: general public[1]. ==> general public [1].

L64: ...information embedding and recovery, To resist...  ==> information embedding and recovery. To resist

L116: PyraGAN_H32_2 and This network  ==>  PyraGAN_H32_2. This network

L138: message’.  ==> message. 

L 150:  algorithm. all operations are ==> algorithm. All operations are

As a general rule, place a space after ")".  For instance: 
L154:  (1)Message ==> (1) Message

Eq. (1): A right parenthesis ")" is missing.

 

REPETITIONS
L74: VVC (the latest video coding standard) 
L96: (VVC) which is the latest video coding standard
L158, L160: "also" is repeated.


ACRONYMS
Perhaps  MSE & PSNR should be defined. 

 

METHODOLOGY
Please explain the encoder and decoder in more detail; comment on the blocks of Figures 4 & 5.

 

CONCLUSION
I strongly encourage improving the conclusion.

The sentence of Lines 259-260 seems unnecessary: 
This section is not mandatory but can be added to the manuscript if the discussion is unusually long or complex.

Section 6 ("Patents") is actually the conclusion. Line 261 ("Section 6. Patents") should be removed.

What do your results mean to a wider scientific community? 
What are the limitations of your method? 
 

REFERENCES
L165: DIV2K training set: please provide references or links.
L166: MSCOCO data set: please provide references or links.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

I think the presentation of this paper is too short, so the hypotheses do not emerge and the contribution becomes less clear. A few things need to be added:
1. It is necessary to add a section that discusses the state of the art in end-to-end video steganography. Why use PyraGAN and CU? What are the advantages over previous methods?
2. The encoder and decoder steps need to be explained step by step in more detail.
3. The dataset used needs to be explained.
4. In Eq. (2), rather than just giving the MSE formula, the MSE formula needs to be explained more clearly.
5. If the ACCU value cannot reach 1, does it mean that the message cannot be perfectly recovered? If so, how can the message be read?
6. The analysis of the results is too short, and there is no conclusion.
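As background for point 5, the extraction accuracy (ACCU) discussed here is typically the fraction of correctly recovered message bits. A minimal sketch, assuming binary messages as NumPy arrays (the function name is illustrative, not from the manuscript):

```python
import numpy as np

def bit_accuracy(message, decoded) -> float:
    """Fraction of message bits recovered correctly; 1.0 means perfect extraction."""
    message = np.asarray(message, dtype=np.uint8)
    decoded = np.asarray(decoded, dtype=np.uint8)
    # Element-wise comparison gives a boolean array; its mean is the hit rate.
    return float(np.mean(message == decoded))
```

An accuracy below 1.0 means some bits were flipped; without an error-correcting code on top, even a single flipped bit can render the decoded message unreadable, which is the concern the reviewer's question raises.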

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

All concerns are addressed.

Author Response

Thank you very much for your comments and suggestions.

Reviewer 3 Report

A step-by-step explanation written with numbering will be easier to read than in the form of paragraphs.

 

The variables used need to be explained; I am having trouble finding an explanation of them. Perhaps you could add a notation list.

The MSE formula is computed over C×H×W. Why is there no summation of the squared errors in the MSE formula? Does it compare the MSE of the entire video or just a single frame? If it is just a frame, what is the difference from a steganographic image? For reference, see papers 10.1007/s11042-020-10035-z and 10.1109/TIP.2013.2273671.
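To make the frame-versus-video question concrete, here is a minimal sketch of frame-level MSE (where taking the mean over all C×H×W elements already performs the summation of squared errors, then divides by C×H×W) versus video-level MSE, plus the resulting PSNR. This assumes 8-bit frames as NumPy arrays and is illustrative only, not the manuscript's implementation:

```python
import numpy as np

def frame_mse(cover, stego) -> float:
    """MSE of a single frame: squared errors summed over all C*H*W
    elements and divided by C*H*W (np.mean does both steps)."""
    diff = cover.astype(np.float64) - stego.astype(np.float64)
    return float(np.mean(diff ** 2))

def video_mse(cover_frames, stego_frames) -> float:
    """MSE of an entire video: the average of the per-frame MSEs."""
    return float(np.mean([frame_mse(c, s)
                          for c, s in zip(cover_frames, stego_frames)]))

def psnr(mse: float, peak: float = 255.0) -> float:
    """PSNR in dB for 8-bit data (peak signal value 255)."""
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```

Reporting video-level PSNR (from `video_mse`) rather than a single frame's PSNR is what distinguishes the video setting from plain image steganography.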

 

In traditional steganography, perfect extraction is possible, and this is considered very necessary because a 1-bit error may make the message unreadable. Perhaps the perfection of bit extraction needs to be discussed in more depth. Does this problem only occur in deep-learning-based steganography? If so, can deep-learning-based steganography methods not be deployed for now?

 

Suppose I compare it with traditional steganography, for example the paper 10.1007/s11277-019-06393-z. Why is the PSNR value produced by this method not remarkable, especially given that deep learning is computationally heavier than traditional steganography?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 3

Reviewer 3 Report

I am satisfied with your response and revision result, this version is acceptable.

However, there are some small typos, such as "modle"; these can be fixed during proofreading.
