Article
Peer-Review Record

Basketball Action Recognition Method of Deep Neural Network Based on Dynamic Residual Attention Mechanism

Information 2023, 14(1), 13; https://doi.org/10.3390/info14010013
by Jiongen Xiao 1,2, Wenchun Tian 3,* and Liping Ding 2
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 28 November 2022 / Revised: 22 December 2022 / Accepted: 23 December 2022 / Published: 27 December 2022
(This article belongs to the Special Issue Deep Learning for Human-Centric Computer Vision)

Round 1

Reviewer 1 Report (Previous Reviewer 5)

The following are the observations made on the manuscript:

1. The assertions on lines 33-37 require citations.

2. Provide a rationale/justification for the assertion in lines 59-61 and in lines 65-70.

3. The paper lacks a clear research problem (research gap) that is anchored on a thorough review of extant literature. The background of the study is poor. The novelty of the study is in question.

4. Citation(s) is needed for the assertions made in lines 99-104.

5. How did the authors arrive at the more than 97% accuracy for posture recognition for their study?

6. The implications and contributions of the study findings should naturally follow from the study results and discussion. This is missing in the manuscript. Authors should state the implications/contributions of the study findings to theory and practice.

7. Authors should provide the justification for the assertion "the traditional C3D does not accurately recognize basketball poses".  Provide citation(s).

8. The discussion of the findings on pages 9 and 10 is not adequate. I expect a robust discussion of the findings, with reference to findings from previous related studies and how they compare and/or contrast with the findings of the current study.

 

Author Response

Review 1:

  1. The assertions on lines 33-37 require citations.

Reply: We greatly appreciate your valuable comment! We have added citations for the assertions on lines 33-37.

  2. Provide a rationale/justification for the assertion in lines 59-61 and in lines 65-70.

Reply: We greatly appreciate your valuable comment! We have added citations that provide a rationale for the assertions in lines 59-61 and in lines 65-70.

  3. The paper lacks a clear research problem (research gap) that is anchored on a thorough review of extant literature. The background of the study is poor. The novelty of the study is in question.

Reply: We greatly appreciate your valuable comment! We have added a thorough review of extant literature to establish a clear research problem and background, as shown in red on page 2.

  4. Citation(s) is needed for the assertions made in lines 99-104.

Reply: We greatly appreciate your valuable comment! We have added citations for the assertions on lines 99-104.

  5. How did the authors arrive at the more than 97% accuracy for posture recognition for their study?

Reply: We greatly appreciate your valuable comment! The recognition accuracy of our proposed method exceeds 97% only for certain basketball movements, not for all movements.

  6. The implications and contributions of the study findings should naturally follow from the study results and discussion. This is missing in the manuscript. Authors should state the implications/contributions of the study findings to theory and practice.

Reply: We greatly appreciate your valuable comment! We have added the implications/contributions of the study findings to theory and practice, as shown in red on pages 9 and 10.

  7. Authors should provide the justification for the assertion "the traditional C3D does not accurately recognize basketball poses".  Provide citation(s).

Reply: We greatly appreciate your valuable comment! We have added a citation for this assertion.

  8. The discussion of the findings on pages 9 and 10 is not adequate. I expect a robust discussion of the findings, with reference to findings from previous related studies and how they compare and/or contrast with the findings of the current study.

Reply: We greatly appreciate your valuable comment! We have added more discussion of the findings of the study, as shown in red on pages 9 and 10.

Reviewer 2 Report (Previous Reviewer 4)

The authors have replied to my comments properly. However, there are still many grammatical errors. Some of these grammatical errors have been highlighted in the attached file. Please read the whole paper again and correct possible typos. 

Comments for author File: Comments.pdf

Author Response

I have carefully checked the manuscript for grammatical errors and corrected them. Thank you for your corrections.

Reviewer 3 Report (Previous Reviewer 3)

The comments have been addressed.

 

Author Response

Thank you very much for your support and professional review of the article.

Reviewer 4 Report (Previous Reviewer 1)

There are no new comments.

Author Response

Thank you for your guidance and corrections, and thank you for your recognition of the article.

Round 2

Reviewer 1 Report (Previous Reviewer 5)

The following are the observations made on the manuscript.

1. The authors have not informed the reader on how they arrived at 97.82% recognition accuracy rate.

2. The results are not discussed. The authors presented the results without discussing them.

3. What are the implications of the results and how do the results contribute to theory and practice in the field of computer vision and other related fields?

4. To what extent is the problem identified in lines 77-81 solved in the study? Authors should be very precise and should clearly state how the problem was solved/resolved and to what extent it was solved. This is necessary because prior studies have already proffered solutions to the problems identified in the study, so the work is not really novel.

5. Has the method been implemented in a real-life context? If yes, this should be reported; if not, it is necessary to implement it in an actual basketball setting to serve as validation for the study.

 

Author Response

Dear Editors and Reviewers,

Thank you for your letter and for the reviewers’ comments. Those comments are all valuable and very helpful for revising and improving our paper and also are very important for guiding our research. We have studied the comments carefully and have made corrections which we hope meet with approval. Revised parts are marked in red in the new manuscript.

 

#Response to Reviewers

 

Reviewer#1

The following are the observations made on the manuscript.

  1. The authors have not informed the reader on how they arrived at 97.82% recognition accuracy rate.
    Response: Thanks sincerely for the insightful suggestion.
    "In previous approaches, there is the problem that keyframes are difficult to be focused. To solve this problem, we designed a deep neural network based on a dynamic attention mechanism from several perspectives. First, we used a median filtering method to pre-process the images to get basketball motion images with less noise. In addition, we also modified the original convolutional layer, into dynamic residual convolution, which not only improved the correct rate but also made the final network extraction more efficient. Most importantly, we further improved the attention mechanism to be able to extract the most important features in basketball sports images, further improving the accuracy of basketball sports recognition."

    To help the reader understand the mechanics of how our method works, we have added this passage to the Conclusion and discussion section.
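
The median-filtering pre-processing step mentioned in the passage above can be illustrated with a minimal sketch. The record does not state the window size or border handling the authors used, so the 3x3 window, the edge-replication rule, and the function name `median_filter` are all assumptions made for illustration only.

```python
from statistics import median


def median_filter(image, size=3):
    """Apply a size x size median filter to a 2D list of pixel values.

    Border pixels are handled by clamping coordinates to the image
    (edge replication). Window size and border rule are assumptions;
    the review record does not specify them.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    height, width = len(image), len(image[0])
    radius = size // 2
    output = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, clamping indices at the borders.
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            # The median discards isolated outliers such as salt-and-pepper noise.
            output[y][x] = median(window)
    return output


# A single bright outlier in a flat region is removed by the filter.
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
denoised = median_filter(noisy)
```

In practice a library routine would likely be used on full video frames; the pure-Python version only shows why an isolated noisy pixel does not survive the median.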

************************************************************************************************

 

  2. The results are not discussed. The authors presented the results without discussing them.
    Response: Thanks sincerely for the insightful suggestion.
    We have added the discussion in the Conclusion and discussion section and marked it in red.

************************************************************************************************

 

  3. What are the implications of the results and how do the results contribute to theory and practice in the field of computer vision and other related fields?
    Response: Thanks sincerely for the insightful suggestion.
    "The contribution of our research to computer vision and its related fields can be summarized in three main parts. First, we propose a paradigm for solving sports-like behavior recognition, and this problem-solving paradigm can be easily extended to other fields, not only for basketball sports behavior recognition. Second, we bring inspiration for other complex behavior recognition. Importantly, we introduce attention mechanisms and improve them for specific problems, and the experimental part illustrates the effectiveness of our approach. Third, we bring some proven ideas for network design. In this paper, to circumvent the problem of network degradation due to the deepening of network layers, we use an improved residual network structure, which is different from the traditional residual network structure. This structure, we discuss in detail in the methods section."

We have added this passage in the Conclusion and discussion section and marked it in red.

************************************************************************************************

 

  4. To what extent is the problem identified in lines 77-81 solved in the study? Authors should be very precise and should clearly state how the problem was solved/resolved and to what extent it was solved. This is necessary because prior studies have already proffered solutions to the problems identified in the study, so the work is not really novel.
    Response: Thanks sincerely for the insightful suggestion.
    We now describe the previous studies in more detail to illustrate the validity and novelty of our proposed method. We have revised lines 77-81 and marked the changes in red.

************************************************************************************************

 

  5. Has the method been implemented in a real-life context? If yes, this should be reported; if not, it is necessary to implement it in an actual basketball setting to serve as validation for the study.
    Response: Thanks sincerely for the insightful suggestion.
    We are currently discussing how to apply this method to actual sports scenes. Deployment of specific solutions has been affected by the epidemic, and it may be more complex than the method itself. For example, we need to ensure the stability of the sensors and overcome the problems of occlusion, shadows, and strong illumination in real-life scenarios. The application to real-life settings is already in progress, and once it is completed, we will submit that work to the journal Information. It is worth noting that the datasets used in the experimental part are all real datasets from daily life and that our method achieved good results on them, so there is reason to believe that the method will perform comparably in real-life settings.

************************************************************************************************

 

Round 3

Reviewer 1 Report (Previous Reviewer 5)

The following observations are made on the manuscript:

1. What is the full form of C3D? At the first mention of the term, the full form of the term should be written first with its short form placed in bracket, thereafter, the short form can then be used in the manuscript.

2. Authors should separate the discussion from the conclusion section. It is advisable that the discussion either be made a section of its own (e.g., 'Discussion') or be made a co-section with the results (e.g., 'Results and Discussion').

Essentially, what is expected in the discussion section is: 1) analysis of results, 2) comparison of the current study findings with findings from prior related studies, 3) the implications of the findings for theory (existing literature in the field) and practice (industry), etc.

3. The authors have not informed the reader on how they arrived at the 97.82% recognition accuracy rate.

4. From the explanation provided in 2. above, the results have not been discussed.

5. Also, the authors should indicate the implications of the study results to theory and practice.

6. A real-life implementation of the proposed method is necessary to validate the proposed method.

7. I expected that the study's future work would concentrate on the observed limitation of the current study as stated on lines 421-423.

 

Author Response

Dear Editors and Reviewers,

Thank you for your letter and for the reviewers’ comments. Those comments are all valuable and very helpful for revising and improving our paper and also are very important for guiding our research. We have studied the comments carefully and have made corrections which we hope meet with approval. Revised parts are marked in red in the new manuscript.
The following observations are made on the manuscript:

  1. What is the full form of C3D? At the first mention of the term, the full form of the term should be written first with its short form placed in bracket, thereafter, the short form can then be used in the manuscript.
    Response: Thanks sincerely for the insightful suggestion.
    C3D refers to the name of a convolutional neural network. Its full name is Convolutional 3D, and we have added its full name in line 11 and marked it in red.

 

  2. Authors should separate the discussion from the conclusion section. It is advisable that the discussion either be made a section of its own (e.g., 'Discussion') or be made a co-section with the results (e.g., 'Results and Discussion').

Essentially, what is expected in the discussion section is: 1) analysis of results, 2) comparison of the current study findings with findings from prior related studies, 3) the implications of the findings for theory (existing literature in the field) and practice (industry), etc.

Response: Thanks sincerely for the insightful suggestion.
We have added a discussion section.
“In previous approaches, there is the problem that keyframes are difficult to be focused. To solve this problem, we designed a deep neural network based on a dynamic attention mechanism from several perspectives. First, we used a median filtering method to pre-process the images to get basketball motion images with less noise. In addition, we also modified the original convolutional layer, into dynamic residual convolution, which not only improved the correct rate but also made the final network extraction more efficient. Most importantly, we further improved the attention mechanism to be able to extract the most important features in basketball sports images, further improving the accuracy of basketball sports recognition.

The contribution of our research to computer vision and its related fields can be summarized in three main parts. First, we propose a paradigm for solving sports-like behavior recognition, and this problem-solving paradigm can be easily extended to other fields, not only for basketball sports behavior recognition. Second, we bring inspiration for other complex behavior recognition. Importantly, we introduce attention mechanisms and improve them for specific problems, and the experimental part illustrates the effectiveness of our approach. Third, we bring some proven ideas for network design. In this paper, to circumvent the problem of network degradation due to the deepening of network layers, we use an improved residual network structure, which is different from the traditional residual network structure. This structure, we discuss in detail in the methods section.”

 


  3. The authors have not informed the reader on how they arrived at the 97.82% recognition accuracy rate.
    Response: Thanks sincerely for the insightful suggestion.
    We have added the experimental details in section 4.1 and marked them in red.


  4. From the explanation provided in 2. above, the results have not been discussed.

Response: Thanks sincerely for the insightful suggestion.
We have added to section 4.3 to provide a discussion of the results.


5. Also, the authors should indicate the implications of the study results to theory and practice.

Response: Thanks sincerely for the insightful suggestion.
We have added the practical impact of the method on basketball and computer vision theory to the discussion section and marked them in red.


6. A real-life implementation of the proposed method is necessary to validate the proposed method.

Response: Thanks sincerely for the insightful suggestion.
The proposed method does require real-time implementation. In lines 100-104, we illustrate the performance of our proposed method in real time.


7. I expected that the study's future works will concentrate on the observed limitation of the current study as stated on lines 421-423.
Response: Thanks sincerely for the insightful suggestion.
We agree that future improvements should target this limitation so that the method can recognize more kinds of basketball actions.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

This paper improves C3D into a dynamic residual convolutional network in order to extract sufficient feature information and apply it to basketball action recognition. The subject matter of this paper is interesting, however, there are some comments and questions about the manuscript as follows.

1. On page 2, line 38, "tame" should be "game".

2. All graphic headings should be centered.

3. On page 6, line 158, "jump" should be "jumps".

4. In line 288 on page 11, "Final" should be "Finally".
5. Almost all cited papers were published before 2020, suggest adding new ones from recent years.
6. Relatively few methods. It is suggested to add comparison experiments with state-of-the-art algorithms.

Author Response

Review 1:

  1. On page 2, line 38, "tame" should be "game".

Reply: We greatly appreciate your valuable comment! We have corrected “tame” to “game”, as shown in red on line 38 of page 2.

  2. All graphic headings should be centered.

Reply: We greatly appreciate your valuable comment! We have corrected the format of the graphic headings.

  3. On page 6, line 158, "jump" should be "jumps".

Reply: We greatly appreciate your valuable comment! We have corrected “jump” to “jumps”, as shown in red on line 158 of page 6.

  4. In line 288 on page 11, "Final" should be "Finally".

Reply: We greatly appreciate your valuable comment! We have corrected “Final” to “Finally”, as shown in red on line 288 of page 11.

  5. Almost all cited papers were published before 2020, suggest adding new ones from recent years.

Reply: We greatly appreciate your valuable comment! We have added some references from 2021-2022.

  6. Relatively few methods. It is suggested to add comparison experiments with state-of-the-art algorithms.

Reply: We greatly appreciate your valuable comment! We have added comparison experiments with state-of-the-art algorithms, as shown in red in Section 4.3 on page 9.

Reviewer 2 Report

This paper presents a method for action recognition that proposes an improved residual connection and an improved attention mechanism based on the C3D network. To prove the effectiveness of the proposed method, the authors demonstrate that action recognition performance improves compared to the C3D baseline. However, this paper lacks novelty and needs more experimental results to confirm its effectiveness.

 

Q1. What is the jump coefficient? Please clearly define the jump coefficient and specify in Fig. 5.

 

Q2. What is the difference between the proposed attention mechanism and existing methods? Please compare action recognition performance with other methods.

 

[1] Wang, Xiaolong, et al. "Non-local neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.

[2] Wang, Qilong, et al. "ECA-Net: Efficient channel attention for deep convolutional neural networks." Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, WA, USA. 2020.

 

Q3. The improvement contributed by each proposed module is unclear. Please present an ablation study to verify action recognition performance step by step.

 

Q4. The authors should present qualitative results that show the effectiveness of the proposed attention mechanism visually. Please present spatial attention maps and temporal attention weights for videos.

 

Q5. The authors need more experimental results of the proposed method on other action recognition datasets. Otherwise, the results of other methods on the basketball technical action dataset should be presented.

Author Response

Review 2:

  1. What is the jump coefficient? Please clearly define the jump coefficient and specify in Fig. 5.

Reply: We gratefully appreciate for your valuable comment! The jump coefficient is the  in Fig.5. In this paper, we obtain the parameter  by two FC layers and Sigmoid function that scales the one-dimensional parameters in the range of 0-1, which is equivalent to adaptively learning the parameters according to the process of updating the network parameters. This adaptively jumps the residual blocks of the connection line more conducive to the flow of information and accelerates the network training to improve the final recognition effect of the model, as shown in the red part of Section 3.2 on page 6.
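
As a rough illustration of the reply above, the sketch below computes a scalar gate from two fully connected layers followed by a sigmoid and uses it to scale a skip connection. Everything beyond "two FC layers and Sigmoid" is an assumption: the ReLU between the layers, the pooled input `descriptor`, the form `F(x) + alpha * x`, and all function and parameter names are hypothetical and are not taken from the manuscript.

```python
import math


def sigmoid(z):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(vec, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def jump_coefficient(descriptor, w1, b1, w2, b2):
    """Two FC layers plus sigmoid, yielding one coefficient in (0, 1).

    `descriptor` stands for a pooled summary of the block input
    (an assumption). The ReLU between the layers is also assumed.
    """
    hidden = [max(0.0, h) for h in dense(descriptor, w1, b1)]
    (logit,) = dense(hidden, w2, b2)  # second layer has a single output
    return sigmoid(logit)


def gated_residual(x, fx, alpha):
    """Assumed form of the gated skip connection: F(x) + alpha * x."""
    return [f + alpha * xi for f, xi in zip(fx, x)]


# With all-zero weights the logit is 0, so the gate is exactly 0.5.
alpha = jump_coefficient([0.3, -0.7], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [[0.0, 0.0]], [0.0])
output = gated_residual([2.0, 4.0], [1.0, 1.0], alpha)
```

Because the weights are trained with the rest of the network, the gate can move toward 0 or 1 per block, which is one way to read the reply's description of the coefficient as being learned adaptively.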

  2. What is the difference between the proposed attention mechanism and existing methods? Please compare action recognition performance with other methods.

[1] Wang, Xiaolong, et al. "Non-local neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.

[2] Wang, Qilong, et al. "ECA-Net: Efficient channel attention for deep convolutional neural networks." Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Seattle, WA, USA. 2020.

Reply: We greatly appreciate your valuable comment! We have cited the two references. The difference between the proposed attention mechanism and existing methods is as follows:

The traditional attention mechanism can only focus on some specific features of the image through channel attention and spatial attention, but cannot focus on the key frames in time. Our proposed improved attention mechanism can generate the attention feature map along the three dimensions of channel, space and time, as shown in the red part of Section 2.3 on page 4 and Section 3.3 on page 6.
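
The temporal part of this idea, weighting frames so that key frames contribute more, can be sketched as follows. This is not the authors' module: the frame score used here (mean activation) is a stand-in for whatever learned scoring the network uses, and the name `temporal_attention` is hypothetical. Channel and spatial attention would apply the same weighting pattern along their own axes.

```python
import math


def temporal_attention(frames):
    """Weight per-frame feature vectors by a softmax over frame scores.

    `frames` is a list of T equal-length feature vectors. The score of a
    frame is its mean activation, an assumption standing in for a learned
    scoring function. Returns (weights, pooled_feature).
    """
    scores = [sum(f) / len(f) for f in frames]
    top = max(scores)
    # Subtract the maximum before exponentiating for numerical stability.
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    pooled = [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]
    return weights, pooled


# The middle frame has the strongest response and receives the largest weight.
weights, pooled = temporal_attention([[0.0, 0.0], [2.0, 2.0], [0.0, 0.0]])
```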

  3. The improvement contributed by each proposed module is unclear. Please present an ablation study to verify action recognition performance step by step.

Reply: We greatly appreciate your valuable comment! We have added an ablation study of each proposed module, as shown in red in Section 4.4 on page 11.

  5. The authors need more experimental results of the proposed method on other action recognition datasets. Otherwise, the results of other methods on the basketball technical action dataset should be presented.

Reply: We greatly appreciate your valuable comment! We have added more comparative experimental results on other datasets and on different basketball moves, as shown in red in Section 4.1 on page 8 and Section 4.3 on pages 9-11.

Reviewer 3 Report

The comments have been addressed well, and I recommend the paper for publication. However, the linguistic quality should be improved.

Reviewer 4 Report

This paper presents a deep neural network method for basketball action recognition based on a dynamic residual attention mechanism, addressing the low recognition rate of the traditional C3D network on basketball actions. This work is interesting. However, I would like to see revisions that make the paper more complete.

#1. There are many language errors. Please read the whole paper again and correct the possible typos.

#2. All equations should be rewritten to be clearer. 

#3. The experimental results and their comparison are not supported by statistical evaluation.

#4. Some remarks on the main results would be necessary and helpful!

#5. The authors do not mention the limitations of this work. It would be useful to discuss the limitations of this work.

#6. The authors should consider updating the references and must cite recently published papers as this may be helpful in making some comments and comparisons.

#7. Future works can be added to attract future researchers and readers.

Reviewer 5 Report

The following are the observations made on the manuscript:

1. Authors claim that "the traditional C3D cannot accurately and hilariously recognize basketball poses". This claim should be justified.

2. Does the study have an improvement or answer to the issues raised in 1. above? If yes, to what extent will the study provide such improvement/answer?

3. Specifically, what research gap is the study filling from a thorough review of extant literature? The background of the study is lacking. The authors should improve on this and clearly show what gap in literature their study is closing from a sound review of literature.

4. Authors should provide a step-by-step protocol for the study that explains the methodology used in the study.

5. The results of the study were not discussed at all. The results should be compared with results from prior related studies.

6. The manuscript lacks information on the implications and contributions of the study to theory and practice and to methodology and society.

7. The limitation of the study was not reported.

8. The direction for future studies was not provided.

9. Most of the references are old and should be updated. 

10. Did the study address the claim made in 1. above? Justify.

Round 2

Reviewer 1 Report

There are no new comments.

Reviewer 2 Report

Thank you for your response. Nevertheless, a few questions remain, and the paper seems less clear with the answers and the additional descriptions.

 

Q1. "Non-local Neural Networks" used space-time attention modules to capture global dependency in the feature map, whereas this paper processes each attention separately. Was action recognition performance improved by the separated attention design?

 

Q2. In Table 3, the cross-validation protocol is not specified for the NTU-RGBD dataset, and the action recognition accuracy (98.4%) in the last row for the NTU-RGBD dataset seems very high.

 

Q3. The authors additionally presented the results of the Kinetics skeleton dataset. This paper handles RGB videos, not skeletons.

 

Q4. What are "Only the attention mechanism network" and "Dynamic-attention mechanism network" in Table 3?

 

Q5. In Figure 8, the action classes seem to be defined at the instance level. How does the proposed method recognize actions at the instance level?

 

Q6. The labels of the instances in Figure 8 are different from what the authors specified in Section 4.1.
