Tohjm-Trained Multiscale Spatial Temporal Graph Convolutional Neural Network for Semi-Supervised Skeletal Action Recognition
Round 1
Reviewer 1 Report
In general, applying spatial-temporal graph convolutional networks to human body action recognition is a highly relevant scientific and applied task. The authors develop a Tohjm-trained multiscale spatial temporal graph convolutional neural network for semi-supervised action recognition. The work proposes a time-level online hard joint mining strategy which, according to the experimental results, improves the overall performance and helps training focus on hard skeleton joints at both coarse and fine scales. The authors apply semi-supervised training with unlabeled data, which can be useful for processing large volumes of data without the need to label them manually.
The work could be of interest to a wide range of video processing experts. It is technically sound, the subject matter is presented comprehensively, and the references are relevant and sufficient; it can therefore be recommended for publication.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 2 Report
The authors studied a multiscale spatial-temporal graph convolutional neural network for semi-supervised skeletal action recognition. The complexity of the model and the useful results give the work good novelty. The manuscript is well structured, and the writing and language are easy to comprehend, which will help future researchers make use of it. However, I have some minor comments, listed below:
- It would be good to discuss some limitations in the conclusion.
- The related work section is very short. The authors need to explain the other existing methods, along with their gaps and limitations, highlighting the most important techniques (at least for the chosen problem) and, above all, comparing them with the technique proposed in the paper. A final table summarizing the technical aspects of the various methods would be very useful.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf
Reviewer 3 Report
This work proposes an improved ML-based approach for skeletal action recognition that can be trained in a semi-supervised manner, focusing on training joints. Overall, it is an interesting and well-designed paper, but it needs a few minor revisions.
Some comments for further improvements are:
a) There are some grammar and syntax mistakes that need to be addressed.
b) The introduction needs some minor improvements in presenting the drawbacks of ST-GCN-based approaches and the work's main contributions. Please present them properly by using lists.
c) The related works section should be extended. Consider adding the following additional similar works:
https://www.mdpi.com/1424-8220/20/17/4943
https://www.hindawi.com/journals/mpe/2021/6650632/
https://ieeexplore.ieee.org/abstract/document/9772775
d) The presentation in some chapters needs further improvement structure-wise. When presenting a specific module, please consider adding a dedicated subsection.
e) Please do a careful proofread of the paper.
Author Response
Please see the attachment.
Author Response File: Author Response.pdf