Wrist-Wearable sEMG Gesture Recognition System Based on ThinNet Lightweight Neural Network
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors: see attached file for revised version
Comments for author File:
Comments.pdf
Author Response
Reviewer 1 Comment 1:
The abstract should clearly state the scalability and real-world deployment considerations of the proposed system.
Response 1:
We thank the reviewer for this valuable comment. We have revised the abstract to explicitly include the scalability and practical deployment considerations of the proposed wrist-worn sEMG system. Specifically, we highlighted that the system supports multiple users, requires only a small amount of fine-tuning data for high accuracy, and is suitable for real-time deployment in real-world applications. These modifications are reflected on Page 1, Abstract section of the revised manuscript.
Reviewer 1 Comment 2:
The first statement at the start of the introduction section is grammatically weak and conceptually broad. The phrase “fundamentally transforming how human interface with machines” is informal.
Response 2:
We thank the reviewer for this constructive comment. The opening sentence of the Introduction has been rewritten for clarity and a formal tone. We now describe the growing importance of gesture recognition in human-computer interaction, emphasizing specific applications in virtual reality, rehabilitation, and robotics. This revision appears on Page 2, Introduction section of the revised manuscript.
Reviewer 1 Comment 3:
In the introduction section, the authors have mentioned some related work regarding gesture recognition. In this regard, the authors should include a comparative table to summarize the existing studies such as the applied machine learning techniques and their accuracy, datasets, and key constraints or limitations.
Response 3:
We thank the reviewer for this suggestion. A comparative table summarizing existing sEMG-based gesture recognition studies, including applied machine learning techniques, reported accuracies, datasets, and key limitations, has been added. This table is presented as Table 1 in the Introduction section (Page 3) of the revised manuscript.
Reviewer 1 Comment 4:
Convert the summary of the proposed work into bullet forms at the end of introduction section.
Response 4:
We thank the reviewer for this suggestion. The summary of the proposed work at the end of the Introduction has been reformatted into bullet points, clearly outlining the main contributions: (1) development of a high-performance wristband; (2) design of the ThinNet architecture; (3) implementation of a three-tier buffered decision strategy; (4) demonstration of data efficiency and practical applicability. This change appears on Page 3, Introduction section of the revised manuscript.
Reviewer 1 Comment 5:
It is suggested to provide more comprehensive and technically detailed explanation about the framework of the system (i.e., Figure 1). In addition, it is simplistic and does not evidently signify the overall architecture of the proposed system such as data flow and its functionality, and interaction among the system components.
Response 5:
We thank the reviewer for this valuable comment. Figure 1 and its accompanying description have been updated to provide a comprehensive technical overview of the system, clearly illustrating the data flow among the wristband, host software, and processing/classification modules, along with the functionality and interaction of each component. The detailed explanation appears in Section 2.1.1 on Page 4, and the revised figure is included as Figure 1.
Reviewer 1 Comment 6:
Enhance the quality and readability of Figures 1-4 in the manuscript. Besides, Figures 5 and 6 are the same category so Figure 6 is enough for the evidence and remove Figure 5.
Response 6:
We thank the reviewer for this suggestion. The quality and readability of Figures 1–4 have been enhanced by increasing the resolution to 300 DPI and refining all text labels and arrows. Figure 5 has been removed, as Figure 6 sufficiently illustrates the wristband placement. These changes are reflected in Figures 1–4 and 6 on Pages 4–7 of the revised manuscript.
Reviewer 1 Comment 7:
Convert Table 2 into a graph (i.e., a bar graph or line graph) to improve readability and ease the comparison of accuracy.
Response 7:
We thank the reviewer for this suggestion. Table 2 has been converted into a bar graph, presenting the classification accuracy of Baseline, Fine-tune, and ThinNet networks, with numerical values annotated on each bar. The revised figure is presented as Figure 12 on Page 18 of the revised manuscript.
Reviewer 1 Comment 8:
Enhance the quality of Figure 9 and add proper text labels vertically and horizontally on each dot and lines.
Response 8:
We thank the reviewer for this comment. Figure 9 has been updated to a step plot representing real-time predicted gesture labels, with key label change points annotated, axes clearly labeled, and first vs second test distinguished by line style. The resolution has been increased to 300 DPI. The revised figure is shown on Page 17 of the revised manuscript.
Reviewer 1 Comment 9:
Enhance the quality of all figures by increasing the DPI value to 250.
Response 9:
We thank the reviewer for this suggestion. All figures in the manuscript have been updated to a minimum of 300 DPI, improving clarity and readability for print and online publication. This change affects all figures throughout the manuscript (Figures 1–12).
Reviewer 1 Comment 10:
In section 3 (Results), the authors are encouraged to properly explain the procedure of the experimental evaluation; particularly, explain the participant setup, evaluation protocol, the process of data collection, and testing conditions used for validating the proposed gesture recognition system.
Response 10:
We thank the reviewer for this suggestion. Section 3 has been revised to provide a detailed description of the experimental evaluation, including participant setup (100 healthy adults, age 22 ± 3), evaluation protocol (six predefined gestures with randomized sequences), data collection process (8-channel sEMG, 1 kHz, filtered and segmented), and testing conditions (offline cross-validation and simulated online tests). These changes appear in Sections 2.1.3–2.2.3, Pages 8–15 of the revised manuscript.
Reviewer 1 Comment 11:
In Section 3.3, for clarity, provide a clear discussion of which metrics are used to calculate the accuracy.
Response 11:
We thank the reviewer for this comment. Section 3.3 now explicitly states that recognition accuracy is calculated as the proportion of correctly identified gestures relative to the total number of gestures presented, consistent across networks and test sessions. This clarification appears on Pages 16–17.
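The accuracy definition given in this response can be written down in a few lines. The following is a minimal sketch, not the authors' evaluation code; the function name `recognition_accuracy` and the example labels are hypothetical and serve only to illustrate the ratio of correctly identified gestures to gestures presented.

```python
def recognition_accuracy(true_labels, predicted_labels):
    """Fraction of presented gestures whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one gesture is required")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels for six gestures (0-5); these are not data from the study.
truth = [0, 1, 2, 3, 4, 5, 0, 1, 2, 3]
preds = [0, 1, 2, 3, 4, 5, 0, 2, 2, 1]
print(f"accuracy = {recognition_accuracy(truth, preds):.2%}")
```

Because this metric weights every presented gesture equally, it is only directly comparable across networks and sessions when the same gesture set and presentation counts are used, which is what the response states.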
Reviewer 1 Comment 12:
Similarly, also include the average age with standard deviation of participants.
Response 12:
We thank the reviewer for this comment. The average age and standard deviation of participants (22 ± 3 years) have been added to Section 2.1.3 and referenced in the Results section on Page 10.
Reviewer 1 Comment 13:
Briefly summarize the overall results at the end of the result section.
Response 13:
We thank the reviewer for this suggestion. A summary paragraph has been added at the end of Section 3, highlighting key findings: Baseline 1D-CNN achieved 73.01% accuracy, Fine-tune improved to 86.90%, and ThinNet achieved 90.47% accuracy; online testing confirmed real-time feasibility and cross-subject robustness. This summary appears on Page 18.
Reviewer 1 Comment 14:
The citations are not according to the journal format.
Response 14:
We thank the reviewer for pointing this out. All references have been reformatted to comply with the journal’s citation style, including in-text citations and the reference list. These changes are reflected throughout the manuscript, Pages 2–23.
Reviewer 1 Comment 15:
The conclusion is very short. To extend it, briefly restate the problem and contribution, summarize the key results, and emphasize the significance along with limitations and future work.
Response 15:
We thank the reviewer for this comment. The Conclusion section has been expanded to restate the research problem, summarize key contributions (wristband hardware, ThinNet architecture, three-tier buffered decision strategy), highlight main results (accuracy improvement and data efficiency), and emphasize both the significance and limitations of the study, as well as future work for improving robustness, expanding gesture sets, and evaluating broader populations. The revised conclusion appears on Page 23.
Reviewer 2 Report
Comments and Suggestions for Authors: The paper discusses an sEMG-based gesture recognition system including a ThinNet approach. The paper is interesting. A minor revision is required before its publication, addressing the following points:
- Introduction: the state of the art should be expanded to cover CNN-based hand movement recognition, including robotic applications (doi: 10.3390/s22030831; DOI: 10.1109/JSEN.2026.3676256; arXiv:2602.14099v1 [cs.RO]), opening the proposed framework to other application fields;
- Materials and Methods: add more details of the PCB of Figure 3;
- Results: add more details about the performance comparison of Fig. 10;
- Discussion: add more details about advantages, limitations and perspectives.
Important: please highlight the differences with respect to the paper DOI: 10.1007/s11036-020-01590-8
Author Response
Reviewer 2 Comment 1 (Introduction):
The state of the art should be expanded to cover CNN-based hand movement recognition, including robotic applications (doi: 10.3390/s22030831; DOI: 10.1109/JSEN.2026.3676256; arXiv:2602.14099v1 [cs.RO]), opening the proposed framework to other application fields.
Response 1:
We thank the reviewer for this valuable comment. We have expanded the Introduction to include a discussion of recent CNN-based hand gesture recognition methods applied to robotic control and human-robot interaction, highlighting how these approaches extend the framework to other application fields. Relevant references have been added (doi: 10.3390/s22030831; DOI: 10.1109/JSEN.2026.3676256; arXiv:2602.14099v1 [cs.RO]), and differences from the approach in DOI: 10.1007/s11036-020-01590-8 are explicitly discussed. These modifications are reflected on Pages 2–3 of the revised manuscript.
Reviewer 2 Comment 2 (Materials and Methods):
Add more details of the PCB of Figure 3.
Response 2:
We thank the reviewer for this suggestion. Section 2.1.2 has been updated to provide additional technical details of the PCB shown in Figure 3, including layout of the Main Control Board and Front-End Board, FPC connectors, and key components. The description now clarifies signal and power routing, as well as modular design for real-time data acquisition. These updates appear on Pages 5–6.
Reviewer 2 Comment 3 (Results):
Add more details about the performance comparison of Fig. 10.
Response 3:
We thank the reviewer for this comment. Section 3.4 has been revised to provide additional details regarding the performance comparison presented in Figure 10, including trends with increasing pretraining data, cross-validation metrics, and the effect of data volume on ThinNet’s accuracy. These details appear on Page 19 of the revised manuscript.
Reviewer 2 Comment 4 (Discussion):
Add more details about advantages, limitations and perspectives.
Response 4:
We thank the reviewer for this suggestion. The Discussion section has been expanded to more comprehensively address the advantages of our framework (high SNR wristband, lightweight ThinNet, robust real-time performance), its limitations (sensitivity to electrode placement, muscle fatigue, dataset size), and perspectives for future work (adaptive calibration, larger gesture sets, deployment across diverse user populations). Differences with prior work, including DOI: 10.1007/s11036-020-01590-8, are highlighted. These revisions appear on Pages 20–22 of the revised manuscript.
Reviewer 3 Report
Comments and Suggestions for Authors: The topic of the paper is interesting and the results are promising.
However, before the paper can be published several issues have to be addressed, especially the availability of the data and the source code, improved experimental design, and more detailed description of the authors' method.
1. It is stated in the paper: "Data Availability Statement: Data are not publicly available due to privacy restrictions." Making the data available is an absolute need for any scientific paper to be accepted, which guarantees that the results can be recreated and verified. The data can be anonymized by removing the personal data of participants and thus not causing any privacy concerns.
2. I cannot find any link in the paper to the source code of the software created by the authors (the neural network and the whole pipeline). Making the source code available is another absolute need for any scientific paper to be accepted, which together with the availability of the data guarantees that the results can be recreated and verified.
3. 100 participants' data were divided into training (80 subjects), validation (10 subjects), and test (10 subjects). The results are presented and discussed on the 10 test subjects only.
That is definitely too small a sample to draw any conclusions and to reliably assess the performance of the method. The proportions are OK, but in such a case the experiments should be repeated 10 times (as in cross-validation) so that each participant appears once in a validation subset and once in a test subset. The results should be discussed based on the average and distribution over 100 test cases, not on 10 cases only.
4. Line 254: "The baseline 1D-CNN follows the same architecture as in a previous work. [20]". This architecture should be shortly presented here so that the reader does not have to look for another paper.
5. Why was such an architecture proposed for the ThinNet? There is a description of the architecture, but the rationale for choosing it is missing.
6. There should be a description of the data structure (the data received from the wrist sensor and how it was fed to the neural network). What was the number of neurons in particular layers of the ThinNet?
Author Response
Reviewer 3 Comment 1:
It is stated in the paper: "Data Availability Statement: Data are not publicly available due to privacy restrictions." Making the data available is an absolute need for any scientific paper to be accepted, which guarantees that the results can be recreated and verified. The data can be anonymized by removing the personal data of participants and thus not causing any privacy concerns.
Response 1:
We thank the reviewer for emphasizing the importance of data availability for scientific reproducibility. We fully agree that open data is ideal. However, we would like to clarify that this study is part of a collaborative project with Huawei Technologies Co., Ltd. The dataset contains proprietary information and is subject to strict confidentiality agreements and data privacy regulations. Therefore, we are legally restricted from making the full dataset publicly available in an open-access repository. To address this concern while respecting these constraints, we have included a small subset of anonymized/synthetic data in the supplementary materials to demonstrate the data format and allow basic testing of the methodology. This update is reflected in the Data Availability Statement on Page 24 of the revised manuscript.
Reviewer 3 Comment 2:
I cannot find any link in the paper to the source code of the software created by the authors (the neural network and the whole pipeline). Making the source code available is another absolute need for any scientific paper to be accepted, which together with the availability of the data guarantees that the results can be recreated and verified.
Response 2:
We thank the reviewer for this comment. While the full software pipeline contains proprietary commercial modules, we are willing to release a simplified version of the core algorithm (excluding proprietary components) for academic research purposes upon request. This ensures that the methodology can be tested while respecting confidentiality agreements. This information is added to the Data Availability Statement on Page 24.
Reviewer 3 Comment 3:
100 participants' data were divided into training (80 subjects), validation (10 subjects), and test (10 subjects). The results are presented and discussed on the 10 test subjects only. That is definitely too small a sample to draw any conclusions and to reliably assess the performance of the method. The proportions are OK, but in such a case the experiments should be repeated 10 times (as in cross-validation) so that each participant appears once in a validation subset and once in a test subset. The results should be discussed based on the average and distribution over 100 test cases, not on 10 cases only.
Response 3:
We thank the reviewer for pointing this out. To ensure robust evaluation, all offline experiments were conducted using 10-fold cross-validation across all 100 participants, such that each participant appeared once in the validation set and once in the test set. Results reported in the manuscript represent averages and distributions over all 100 participants, addressing concerns regarding small sample size. This is described in Sections 2.2.1 and 3.2, Pages 12–16.
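The subject-wise rotation described in this response can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' split: the function name `subject_folds`, the fixed shuffle seed, and the choice of the "next" group as the validation set are all hypothetical. It only shows one way to make each of the 100 participants appear exactly once in a test set and exactly once in a validation set while keeping the 80/10/10 proportions.

```python
import random


def subject_folds(subject_ids, n_folds=10, seed=0):
    """Split subjects into n_folds rotations of train/validation/test groups."""
    ids = list(subject_ids)
    if len(ids) % n_folds != 0:
        raise ValueError("number of subjects must be divisible by n_folds")
    random.Random(seed).shuffle(ids)  # assumed: random assignment of subjects to groups
    size = len(ids) // n_folds
    groups = [ids[i * size:(i + 1) * size] for i in range(n_folds)]

    folds = []
    for k in range(n_folds):
        val_k = (k + 1) % n_folds  # assumed: the next group serves as validation
        train = [s for g, group in enumerate(groups)
                 if g not in (k, val_k) for s in group]
        folds.append({"train": train, "val": groups[val_k], "test": groups[k]})
    return folds


folds = subject_folds(range(1, 101))
for k, fold in enumerate(folds):
    print(k, len(fold["train"]), len(fold["val"]), len(fold["test"]))
```

Splitting by subject rather than by window keeps all data from a given participant inside a single partition per fold, so the averaged test results reflect performance on unseen users.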
Reviewer 3 Comment 4:
Line 254: "The baseline 1D-CNN follows the same architecture as in a previous work. [20]". This architecture should be shortly presented here so that the reader does not have to look for another paper.
Response 4:
We thank the reviewer for this suggestion. The baseline 1D-CNN architecture has been briefly described in Section 2.2.2 (Pages 13–14), including convolutional layers, filter numbers, kernel sizes, stride, fully connected layers, and input structure. This allows readers to understand the model without consulting external references.
Reviewer 3 Comment 5:
Why was such an architecture proposed for the ThinNet? There is a description of the architecture, but the rationale for choosing it is missing.
Response 5:
We thank the reviewer for this comment. The rationale for the ThinNet architecture has been added to Section 2.2.2 (Page 14). We explain that the fully convolutional design reduces parameters by 87%, hierarchical downsampling expands the receptive field efficiently, and global average pooling enhances robustness to inter-channel variations, balancing accuracy, computational efficiency, and cross-subject generalization.
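The two structural arguments in this response (fewer parameters without dense layers, and a receptive field that grows with strided downsampling) can be illustrated with simple arithmetic. The sketch below uses a hypothetical layer stack, hidden width, and unpadded convolutions; these are assumptions chosen for illustration and are not the published ThinNet configuration, so the numbers it prints do not reproduce the 87% figure quoted above.

```python
# Hypothetical (in_channels, out_channels, kernel, stride) stack; NOT the published ThinNet.
LAYERS = [(8, 16, 5, 2), (16, 32, 5, 2), (32, 64, 3, 2)]
INPUT_LENGTH = 500   # samples per window, as stated in the response to Comment 6
NUM_CLASSES = 6      # six predefined gestures
HIDDEN_UNITS = 128   # assumed width of a dense head used only for comparison


def conv1d_params(c_in, c_out, kernel):
    """Weights plus biases of a 1D convolution."""
    return c_in * c_out * kernel + c_out


def output_length(length, layers):
    """Sequence length after a stack of unpadded strided convolutions."""
    for _, _, kernel, stride in layers:
        length = (length - kernel) // stride + 1
    return length


def receptive_field(layers):
    """Number of input samples seen by one output position of the stack."""
    field, jump = 1, 1
    for _, _, kernel, stride in layers:
        field += (kernel - 1) * jump
        jump *= stride
    return field


backbone = sum(conv1d_params(i, o, k) for i, o, k, _ in LAYERS)
final_channels = LAYERS[-1][1]
final_length = output_length(INPUT_LENGTH, LAYERS)

# Head A: flatten -> dense(HIDDEN_UNITS) -> dense(NUM_CLASSES).
flat = final_channels * final_length
dense_head = (flat * HIDDEN_UNITS + HIDDEN_UNITS) + (HIDDEN_UNITS * NUM_CLASSES + NUM_CLASSES)

# Head B: global average pooling followed by a 1x1 convolution classifier.
conv_head = conv1d_params(final_channels, NUM_CLASSES, 1)

print("backbone parameters:", backbone)
print("dense head parameters:", dense_head)
print("GAP + 1x1 conv head parameters:", conv_head)
print("receptive field (samples):", receptive_field(LAYERS))
```

In this toy configuration almost all parameters of the dense variant sit in the first fully connected layer, because its size scales with the flattened feature map; the pooled 1x1 head depends only on the channel count and the number of classes.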
Reviewer 3 Comment 6:
There should be a description of the data structure (the data received from the wrist sensor and how it was fed to the neural network). What was the number of neurons in particular layers of the ThinNet?
Response 6:
We thank the reviewer for this suggestion. Section 2.2.2 (Pages 13–14) has been updated to describe the data structure: 8-channel sEMG segments of 500 samples per window are Z-score normalized and used as input to all networks. The ThinNet layer structure is fully detailed, including filter numbers, kernel sizes, stride, and the 1×1 convolution classification head. This ensures readers understand both the input data format and the network layer configuration.
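The input format described in this response (8 channels, 500 samples per window, Z-score normalized) can be sketched as below. This is a minimal illustration under assumptions: the response does not say whether windows overlap or whether normalization statistics are computed per channel, per window, or over the whole recording, so the non-overlapping step and the per-channel, per-window statistics used here are hypothetical, as are the function names and the synthetic signal.

```python
import random
from statistics import fmean, pstdev

CHANNELS = 8   # sEMG channels, as stated in the response
WINDOW = 500   # samples per window, as stated in the response


def segment(recording, window=WINDOW, step=WINDOW):
    """Cut a [channel][sample] recording into windows; a trailing partial window is dropped."""
    n = len(recording[0])
    if any(len(channel) != n for channel in recording):
        raise ValueError("all channels must have the same number of samples")
    return [[channel[start:start + window] for channel in recording]
            for start in range(0, n - window + 1, step)]


def zscore(window, eps=1e-8):
    """Normalize each channel of one window to zero mean and unit variance (assumed scheme)."""
    normalized = []
    for channel in window:
        mu = fmean(channel)
        sigma = pstdev(channel, mu)
        normalized.append([(x - mu) / (sigma + eps) for x in channel])
    return normalized


# Synthetic stand-in for a recording: 8 channels x 1200 samples of Gaussian noise.
rng = random.Random(0)
recording = [[rng.gauss(0.0, 50.0) for _ in range(1200)] for _ in range(CHANNELS)]

windows = [zscore(w) for w in segment(recording)]
print(len(windows), "windows of shape", len(windows[0]), "x", len(windows[0][0]))
```

At the 1 kHz sampling rate mentioned in the response to Reviewer 1, a 500-sample window corresponds to 0.5 s of signal per classification input.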
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors: Minor Changes and Accepted
Comments for author File:
Comments.pdf
Reviewer 3 Report
Comments and Suggestions for Authors: The authors have addressed all my comments. In particular, they explained why the original data and software cannot be made publicly available, and stated that special versions of the data and software will be available. The authors have also added more explanations of the architecture and methods, and conducted cross-validation on the offline data. Taking that into account, in my opinion the paper can be accepted for publication in its current form.

