Article
Peer-Review Record

Simultaneous Speech and Eating Behavior Recognition Using Data Augmentation and Two-Stage Fine-Tuning

Sensors 2025, 25(5), 1544; https://doi.org/10.3390/s25051544
by Toshihiro Tsukagoshi 1, Masafumi Nishida 1 and Masafumi Nishimura 1,2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 31 December 2024 / Revised: 16 February 2025 / Accepted: 27 February 2025 / Published: 2 March 2025
(This article belongs to the Special Issue AI-Based Automated Recognition and Detection in Healthcare)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Very interesting manuscript on Simultaneous Speech and Eating Behavior Recognition Using Data Augmentation and Two-Stage Fine-Tuning.

In the introduction, the authors could state more clearly how this work differs from their previous publication (https://www.mdpi.com/1424-8220/21/10/3378).

Some other suggestions are given in the attached PDF file.


Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

While the paper presents an interesting and potentially valuable approach, it has several critical weaknesses that need to be addressed before it can be considered for publication.

1. The paper mostly compares different versions of its own method instead of benchmarking against state-of-the-art models. How does this perform against other multitask learning models for speech and sound recognition? Without that, it’s hard to judge the real impact.
2. Some of the reported accuracy differences are very small (e.g., CER 16.22% vs. 16.24%). Are these improvements meaningful? Running significance tests or reporting confidence intervals would make the results more convincing.
3. The two-stage fine-tuning idea is interesting, but the explanation is too technical. A simpler breakdown of why this approach works better would help make it clearer.
4. Table 1 is packed with numbers, but a graph or visual comparison would make it much easier to see trends. Also, Figure 6 should highlight where the proposed method improves or struggles.
5. All tests are done in a lab, but real life is messy. What happens in a noisy restaurant? What if the microphone placement shifts? Right now, the results don’t show how well this would actually work in daily use.
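On point 2, one common way to check whether such a small CER gap is meaningful is a paired bootstrap over utterances. The sketch below is purely illustrative and not from the paper: the per-utterance error counts are synthetic placeholders generated around the two reported CER values, and the variable names are hypothetical.

```python
# Illustrative paired-bootstrap confidence interval for a CER difference.
# All data here are synthetic; only the two CER levels (16.22% vs. 16.24%)
# come from the review comment above.
import random

random.seed(0)

# Hypothetical per-utterance (edit_errors, reference_length) pairs for two systems.
n_utts = 200
ref_lens = [random.randint(20, 60) for _ in range(n_utts)]
errs_a = [max(0, round(l * 0.1622 + random.gauss(0, 1))) for l in ref_lens]
errs_b = [max(0, round(l * 0.1624 + random.gauss(0, 1))) for l in ref_lens]

def cer(errs, lens, idx):
    """Corpus-level CER over the utterances selected by idx."""
    return sum(errs[i] for i in idx) / sum(lens[i] for i in idx)

idx_all = range(n_utts)
observed = cer(errs_a, ref_lens, idx_all) - cer(errs_b, ref_lens, idx_all)

# Paired bootstrap: resample utterances with replacement, recompute the gap.
B = 2000
diffs = []
for _ in range(B):
    idx = [random.randrange(n_utts) for _ in range(n_utts)]
    diffs.append(cer(errs_a, ref_lens, idx) - cer(errs_b, ref_lens, idx))
diffs.sort()
ci_low, ci_high = diffs[int(0.025 * B)], diffs[int(0.975 * B)]

print(f"observed CER gap: {observed:+.4f}")
print(f"95% bootstrap CI: [{ci_low:+.4f}, {ci_high:+.4f}]")
```

If the resulting interval spans zero, a 0.02-point CER gap cannot be distinguished from noise, which is exactly the concern raised in point 2.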

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

accept
