Human Action Recognition Based on Improved Two-Stream Convolution Network
Round 1
Reviewer 1 Report
In this paper, the authors propose an improved two-stream convolution network. The single-frame recognition mode of the spatial stream is changed to multi-frame image recognition by using a BiGRU network, which addresses the shortcomings of many existing neural networks in perceiving the coherence features of action appearance. The theory of the paper is correct, the structure is rigorous, and the experiments are sufficient.
My suggestions are as follows:
- In the introduction and Section 2.2, please explain why you use the SimAM attention mechanism.
- Figure 3 and Figure 6 are classical figures from the original algorithms; please redraw them or cite their sources when using them in your paper.
- In formulas 13 to 17, please explain the meaning of the left arrow, right arrow, and wavy sign on the symbols.
- Please check formula 17 (between lines 320 and 321).
- In line 387, how many repeated experiments were run to obtain Figure 10?
- In Section 4.3, “Results of experiments and analysis”, it may be better to divide the content into just three parts: “Ablation experiments”, “Comparative experiment”, and “Experimental overall analysis”.
- Applied Science is an excellent journal with many excellent related papers; please study and cite them.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
The paper presents an improved version of the two-stream CNN with improved comparative results. However, the following issues need to be addressed in the revised version.
- In the abstract (line 5), the authors mention that the paper utilizes the strong mining capabilities of BiGRU. What those strong mining capabilities are, and how they have been utilized, is not clearly presented in the paper.
- In the introduction section (lines 92-93), as a contribution of the paper, the authors claim that the proposed structure can well solve the shortcomings of the original network and will provide the possibility for more complex fuzzy recognition tasks. However, there is no clear justification for this argument. It appears to be the authors' assumption presented as a contribution.
- In Section 2.2 (line 116), the attention mechanism is presented. My suggestion is to present the attention mechanism in detail, describing what it is and how it is beneficial.
- In Figure 2, three input streams are presented in the spatial stream network. How do these streams differ from each other in feature extraction?
- In line 213 (Yang et al. ..), the reference is missing from the statement.
- In line 222, please clarify and explain why the small energy of each neuron is more important.
- In Section 4 (line 261), please clarify and explain what is meant by "several independent repeated experiments". What are they, and how do they differ from each other?
- The authors present five datasets but use only two of them because of hardware limitations (line 306). What hardware limitation prevents the use of the three omitted datasets?
- In line 360, the authors claim a satisfactory result, but it is unclear what level counts as satisfactory.
- In the conclusion section, the contribution of the paper is currently weak in relation to the summary.
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 2 Report
The revised version has addressed the issues I have raised.