Open Access, Feature Paper, Article

Peer-Review Record

Action Recognition Using Deep 3D CNNs with Sequential Feature Aggregation and Attention

Electronics 2020, 9(1), 147; https://doi.org/10.3390/electronics9010147

by Fazliddin Anvarov, Dae Ha Kim and Byung Cheol Song *

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 27 November 2019 / Revised: 4 January 2020 / Accepted: 9 January 2020 / Published: 12 January 2020

(This article belongs to the Section Computer Science & Engineering)

Round 1

Reviewer 1 Report

This study proposes a 3D convolutional neural network model for video-based action recognition based on sequential feature aggregation. The paper addresses an important problem in the field; however, some information is missing and a revision is required. Please consider the comments below to improve the quality of your manuscript.

Please discuss the research gap briefly in the abstract. Include the state-of-the-art techniques, what their problems are, and how your model solves these problems. Accuracy is not always a good indicator; it depends on the data population. Ideally, include other metrics such as kappa. For more information, please refer to this work: "deep convolutional neural network designed for age assessment based on orthopantomography data".
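The point about accuracy and the data population can be illustrated with a minimal sketch of Cohen's kappa computed from a confusion matrix. The function name `cohens_kappa` and the two-class counts are hypothetical and are not taken from the manuscript; rows are assumed to be true classes and columns predicted classes.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    Assumes rows are true classes and columns are predicted classes.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal (plain accuracy).
    observed = sum(confusion[i][i] for i in range(n)) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if expected == 1:
        # Degenerate case: every sample has the same true and predicted class.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical imbalanced example (not data from the paper):
# the classifier always predicts class 0, and 90 of 100 samples are class 0.
imbalanced = [[90, 0],
              [10, 0]]
accuracy = (imbalanced[0][0] + imbalanced[1][1]) / 100
kappa = cohens_kappa(imbalanced)
print(accuracy, kappa)
```

In this constructed case accuracy is high only because one class dominates, while kappa shows no agreement beyond chance.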

Include other evaluation metrics in the evaluation section, such as precision and recall. Tables 5 and 6 are not discussed properly in the manuscript; please explain and discuss the results. How was the reported time complexity obtained, and on how much data? Does it cover training plus testing? If not, what is the training time? What are the values in Tables 3 and 4? Please include this information in their captions.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper presents an interesting method for performing activity recognition using convolutional neural networks combined with squeeze-and-excitation and self-attention.

The paper must be reorganised.

-The Introduction must present the motivation of the paper together with the new aspects it introduces.

-The proposed method must be described more clearly: introduce a dedicated section that presents the proposed architecture (with Figures 1 and 3) and highlights the new aspects.

-Figures and tables must appear after they are referred to in the text.

-The evaluation results must be extended: add the confusion matrix to show which activities are recognised with the best accuracy. What is the processing time? Is it a real-time method?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Thank you for providing the revised version of your manuscript and the response to the reviewer's letter. Some comments have been addressed; however, some issues still need to be addressed in the manuscript. The provided confusion matrix is for HMDB51, on which the results are quite inaccurate (~74%). Please include a similar confusion matrix for UCF-101, on which the results are ~95%. Based on this confusion matrix, you can calculate and report the precision and recall. Moreover, based on this confusion matrix, please report the balanced accuracy, (True Positive Rate + True Negative Rate) / 2. Some text in the figures is very small and hardly readable; please enlarge it. Regarding the model description, please include a table showing the layers of your model; for an example, please see Table 1 of this work and include it in your introduction: "deep convolutional neural network designed for age assessment based on orthopantomography data". Please expand the Qualitative Results section and include some real qualitative analysis.
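As a rough sketch of how these quantities follow from a confusion matrix, each class can be treated one-vs-rest. This extension of the two-class balanced accuracy formula to many classes is an assumption, as are the function name `per_class_metrics` and the three-class counts, none of which come from the manuscript.

```python
def per_class_metrics(confusion):
    """Precision, recall and balanced accuracy per class, one-vs-rest.

    Assumes rows are true classes and columns are predicted classes.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                          # class k predicted as another class
        fp = sum(confusion[i][k] for i in range(n)) - tp     # other classes predicted as k
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0          # true positive rate
        tnr = tn / (tn + fp) if tn + fp else 0.0             # true negative rate
        results.append({
            "precision": precision,
            "recall": recall,
            "balanced_accuracy": (recall + tnr) / 2,
        })
    return results


# Hypothetical 3-class confusion matrix (not data from the paper).
example = [[8, 2, 0],
           [1, 7, 2],
           [0, 1, 9]]
metrics = per_class_metrics(example)
for k, m in enumerate(metrics):
    print(k, m)
```

With many classes the true negative rate is close to 1 for every class, so the per-class recall and precision are usually the more informative columns.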

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Thank you for your comments. Please add further explanation for Figure 5 (confusion matrix):

-How were the 20 classes used for the confusion matrix chosen?

-You can plot a normalised confusion matrix for all classes (see: https://pythonhealthcare.org/category/machine-learning/page/1/, point 107).

-Explain the obtained results: which classes are recognised best, and which classes have poor results?
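A minimal sketch of the normalisation and ranking requested above, assuming rows are true classes and columns are predicted classes. The helper names `normalise_rows` and `rank_classes`, the class labels, and the counts are all hypothetical; plotting is left out so that the example stays dependency-free.

```python
def normalise_rows(confusion):
    """Divide each row by its total so entries become per-class fractions."""
    normalised = []
    for row in confusion:
        row_total = sum(row)
        normalised.append([c / row_total if row_total else 0.0 for c in row])
    return normalised


def rank_classes(confusion, labels):
    """Sort classes by the normalised diagonal (per-class recall), best first."""
    norm = normalise_rows(confusion)
    scores = [(labels[i], norm[i][i]) for i in range(len(labels))]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical labels and counts (not data from the paper).
labels = ["class_a", "class_b", "class_c"]
counts = [[8, 2, 0],
          [1, 7, 2],
          [0, 1, 9]]
normalised = normalise_rows(counts)
ranked = rank_classes(counts, labels)
print(ranked[0], ranked[-1])
```

The first and last entries of `ranked` give the best and worst recognised classes, and the off-diagonal entries in the worst class's row show which classes it is confused with.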

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
