Article
Peer-Review Record

Gesture Detection and Recognition Based on Object Detection in Complex Background

Appl. Sci. 2023, 13(7), 4480; https://doi.org/10.3390/app13074480
by Renxiang Chen and Xia Tian *
Submission received: 4 March 2023 / Revised: 16 March 2023 / Accepted: 28 March 2023 / Published: 31 March 2023

Round 1

Reviewer 1 Report

The authors have proposed a methodology which already exists, which is my great concern and remark.

1. The authors shall give a comprehensive review of the related works and divide them into different categories.

2. The advantages and disadvantages of the previous work are not clearly expounded; in other words, the motivation for writing the paper is not explained.

3. The references can also be improved. The authors should make a detailed review of the latest research progress from the past three years. Recent related works could be mentioned to further strengthen the references.

4. As the authors aim for gesture detection and recognition, is the model suitable for moving videos as well as static images?

5. What are the limitations of the proposed model?

 

Author Response

Dear Reviewers:

 

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “Gesture Detection and Recognition Based on Object Detection in Complex Background” (ID: applsci-2292500). Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope will meet with your approval.

 

  1. Authors have proposed a methodology which already exists, which is my great concern.

The author’s answer: Thank you for your comments. We are also very concerned about the novelty of the paper, so we searched the major databases using the keyword YOLOv5 together with the paper’s innovation points and did not find any articles that use the same approach.

  2. The authors shall give a comprehensive review of the related works and divide them into different categories.

The author’s answer: Thank you for your comments, which we have carefully considered and found to be valid. We have therefore classified the existing methods and summarized each category comprehensively in Section 1 of the paper.

  3. The advantages and disadvantages of the previous work are not clearly expounded; in other words, the motivation for writing the paper is not explained.

The author’s answer: Thank you for your comments. We have added a summary of the strengths and weaknesses of the previous work in the first section of the paper, which leads into the motivation for writing it.

  4. The references can also be improved. The authors should make a detailed review of the latest research progress from the past three years. Recent related works could be mentioned to further strengthen the references.

The author’s answer: Thank you for your comments. In line with them, we have added more than 20 relevant papers from the last three years in the first two sections of the paper and summarized the recent related work at the end of Section 2.

  5. As the authors aim for gesture detection and recognition, is the model suitable for moving videos as well as static images?

The author’s answer: Thank you for your comments. Our model is trained on static image data, so our approach naturally applies to static images; however, since it achieves a recognition speed of 64 FPS, it is also suitable for real-time gesture detection and recognition and for detecting and recognizing gestures in video.

  6. What are the limitations of the proposed model?

The author’s answer: Thank you for your comments. The shortcomings of our approach are outlined in the last section of the paper. The two main points are: first, it requires a large amount of training data; second, it requires anchor boxes of suitable size to be preset before training.

Reviewer 2 Report

Title: Gesture Detection and Recognition Based on Object Detection in Complex Background

 

Manuscript-ID: applsci-2292500

1. The paper aims to provide an improved YOLOv5 network structure to solve the gesture recognition problem in complex backgrounds. You have mentioned the main contributions exactly the same in both the abstract and the introduction. Please remove point 4 from the contributions, as it is not an objective but rather represents the experimental results.

2. Why did you use YOLOv5 as a base model? Give more detail of the model improvements in the methodology section.

3. Why did you select mAP0.5:0.95 as the accuracy metric? Did you test mAP0.1:0.5? What do you expect?

4. I think the frame rate of 64 fps is large for a frame size of 640×640. Was it an average value?

5. The related-work survey in the introduction section is not sufficient. More up-to-date references, especially for complex backgrounds, must be investigated.

6.   What was the total number of parameters before and after improvement?

7. Did you calculate the execution time for a real-time test?

8. The results in Figure 5 compare detection across different methods. Do you consider the gambling table a complex environment for hand gesture recognition? It would be better to train your model on more than two datasets to show its validity.

9.   What cross validation strategy was used for splitting data?

10. Give some future directions in the conclusion section.

Comments for author File: Comments.pdf

Author Response

Dear Reviewers:

 

 

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “Gesture Detection and Recognition Based on Object Detection in Complex Background” (ID: applsci-2292500). Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections which we hope will meet with your approval.

 

  1. The paper aims to provide an improved YOLOv5 network structure to solve the gesture recognition problem in complex backgrounds. You have mentioned the main contributions exactly the same in both the abstract and the introduction. Please remove point 4 from the contributions, as it is not an objective but rather represents the experimental results.

The author’s answer: Thank you for your valuable comments. You are correct; we have revised the corresponding section of the introduction and removed point 4 from the main contributions.

  2. Why did you use YOLOv5 as a base model? Give more detail of the model improvements in the methodology section.

The author’s answer: Thank you for your valuable comments. The reasons why we chose YOLOv5 as the base model have been added, with the relevant descriptions, at the beginning of Section 3. They are mainly as follows: as a one-stage object detection algorithm, YOLOv5 transforms the detection problem into a regression problem. Compared with other algorithms, YOLOv5 offers faster detection speed and higher detection accuracy, which meets the needs of real-time detection and recognition of gestures in complex backgrounds.

A detailed description of the model improvements has also been added to the methodology section.

  3. Why did you select mAP0.5:0.95 as the accuracy metric? Did you test mAP0.1:0.5? What do you expect?

The author’s answer: Thank you for your question. Choosing mAP0.5:0.95 as the evaluation metric allows a more comprehensive and more rigorous assessment of the effectiveness of this paper’s method. The mAP0.1:0.5 metric, by contrast, sets its IoU thresholds too low: most models achieve good accuracy values under it, so it cannot demonstrate the superiority of a model.
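For context, mAP0.5:0.95 is the COCO-style metric: average precision is computed at ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 and then averaged. A minimal sketch of the averaging step (the per-threshold AP values below are hypothetical illustrations, not results from the paper):

```python
def map_50_95(ap_per_threshold):
    """Average AP over the IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [0.50 + 0.05 * i for i in range(10)]
    if len(ap_per_threshold) != len(thresholds):
        raise ValueError("expected one AP value per IoU threshold")
    return sum(ap_per_threshold) / len(thresholds)

# Hypothetical AP values: strong at IoU 0.5, degrading at stricter thresholds.
aps = [0.90, 0.88, 0.85, 0.80, 0.72, 0.60, 0.45, 0.30, 0.15, 0.05]
print(f"mAP@0.5:0.95 = {map_50_95(aps):.3f}")
```

Because the strict thresholds (0.90, 0.95) pull the mean down sharply for imprecise boxes, this metric separates models that a loose-threshold metric like mAP0.1:0.5 would score nearly identically.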

  4. I think the frame rate of 64 fps is large for a frame size of 640×640. Was it an average value?

The author’s answer: Thank you for your question. The recognition speed of 64 FPS was measured by inputting 640×640 images on a 1080Ti GPU; the result was obtained by running 100 images and averaging the recognition speed.
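The averaging procedure described above (run N images, divide the count by the total inference time) can be sketched as follows; `fake_infer`, the image placeholders, and the timing loop are illustrative stand-ins, not the authors’ actual benchmark code:

```python
import time

def average_fps(infer, images):
    """Time one inference per image and return the averaged frames per second."""
    total = 0.0
    for img in images:
        start = time.perf_counter()
        infer(img)
        total += time.perf_counter() - start
    return len(images) / total

# Hypothetical stand-in for the detector; a real benchmark would call the
# trained model on 640x640 inputs instead of sleeping.
def fake_infer(img):
    time.sleep(0.001)

fps = average_fps(fake_infer, [None] * 100)
print(f"average speed: {fps:.1f} FPS")
```

Averaging over many images, as the authors describe, smooths out per-frame jitter (GPU warm-up, scheduling noise) that would make a single-frame timing misleading.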

  5. The related-work survey in the introduction section is not sufficient. More up-to-date references, especially for complex backgrounds, must be investigated.

The author’s answer: Thank you for your suggestion. We have added summaries of more than 20 relevant studies from the last three years, including studies related to complex backgrounds.

  6. What was the total number of parameters before and after the improvement?

The author’s answer: Thank you for your question. The total computation of our model before the improvement is 108.2 GFLOPs, and after the improvement it is 97.2 GFLOPs.

  7. Did you calculate the execution time for a real-time test?

The author’s answer: Thank you for your question. We have performed the actual timing tests, and the results are approximately the same as the times reported in the paper.

  8. The results in Figure 5 compare detection across different methods. Do you consider the gambling table a complex environment for hand gesture recognition? It would be better to train your model on more than two datasets to show its validity.

The author’s answer: Thank you for your question and suggestion. The dataset used in this paper contains a variety of complex backgrounds, of which Figure 5 shows only a part; the variation of complex scenes is itself one of the difficulties of gesture recognition in complex backgrounds. In Figure 5, the first column of images has a dark background, the second a skin-tone-like background, the third motion blur, and the fourth very small gestures, all of which are complex-background difficulties. As for proving the validity of the model, the paper verifies and compares it through a large number of ablation experiments.

  9. What cross-validation strategy was used for splitting the data?

The author’s answer: Thank you for your question. All our experimental data are split randomly: we set different random seeds to divide the dataset over multiple rounds of experiments, and the paper reports the average of the results.
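The seeded random split described above could look like the following in outline; the split fraction, seed values, and helper name are assumptions for illustration, not taken from the paper:

```python
import random

def split_dataset(items, seed, train_frac=0.8):
    """Shuffle a copy of the items with a fixed seed, then split train/test."""
    rng = random.Random(seed)       # independent RNG so the seed is reproducible
    shuffled = items[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Multiple rounds with different seeds; the reported metric would be the
# average over the rounds.
data = list(range(1000))
for seed in (0, 1, 2):
    train, test = split_dataset(data, seed)
    assert len(train) == 800 and len(test) == 200
    # ...train and evaluate the model here, then average the results...
```

Fixing the seed per round makes each split reproducible, while varying the seed across rounds approximates the variance a formal k-fold cross-validation would expose.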

  10. Give some future directions in the conclusion section.

The author’s answer: Thank you for your valuable suggestions. We have included future directions in the conclusion of the revised paper.

Round 2

Reviewer 1 Report

The authors have responded to the comments provided. The paper can be published in its present form.

Regards,

Reviewer 2 Report

No comments.
