Peer-Review Record

Embedded Implementation of Real-Time Voice Command Recognition on PIC Microcontroller

Automation 2025, 6(4), 79; https://doi.org/10.3390/automation6040079 (registering DOI)

by Mohamed Shili¹, Salah Hammedi²,³, Amjad Gawanmeh⁴,* and Khaled Nouri⁵

Reviewer 1: Napa Sae-Bae

Reviewer 2: Anonymous

Reviewer 3: Lukáš Beňo

Reviewer 4: Honggui Li

Submission received: 26 July 2025 / Revised: 17 October 2025 / Accepted: 24 October 2025 / Published: 28 November 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper presents the implementation of Real-Time Voice Command Recognition on a PIC Microcontroller, detailing both the hardware setup and the recognition model. However, the techniques used demonstrate limited novelty compared to existing methods in the field, with the paper's primary contribution lying in the specific combination and optimization for this constrained platform, rather than introducing new algorithms or features.

In addition, the content is not well organized, and some subsections largely overlap (e.g., Sections 2.2 and 2.5), leading to significant inconsistencies and potential confusion for the reader.

Also, a major organizational flaw is the contradictory information regarding the feature extraction techniques used in the system:

◦ The abstract, introduction, and the dedicated "Feature Extraction" Section 3.3.2 consistently state that Zero-Crossing Rate (ZCR) and Short-Time Energy (STE) are the features derived from the audio signal.

◦ However, the "System Architecture" diagram (Figure 2) labels the feature extraction block with "(MFCC, LPC, FFT)".

Author Response

Response to Reviewer 1 Comments

It is with excitement that we resubmit a revised version of manuscript automation-3811211, “Embedded Implementation of Real-Time Voice Command Recognition on PIC Microcontroller”, to Automation.

Thank you for your valuable comments and advice. They were very helpful for revising and improving our paper and provide important guidance for our research. We have studied the comments carefully and made corrections that we hope will meet with your approval. Revised portions are marked in yellow in the paper. The main corrections and our responses to the reviewer’s comments are as follows:

Reviewer 1: The paper presents the implementation of Real-Time Voice Command Recognition on a PIC Microcontroller, detailing both the hardware setup and the recognition model. However, the techniques used demonstrate limited novelty compared to existing methods in the field, with the paper's primary contribution lying in the specific combination and optimization for this constrained platform, rather than introducing new algorithms or features.

Comment 1: In addition, the content is not well organized, and some subsections largely overlap (e.g., Sections 2.2 and 2.5), leading to significant inconsistencies and potential confusion for the reader.

Authors’ Response: We thank the reviewer for this comment. Sections 2.2 and 2.5 have been revised to clarify their focus: 2.2 now discusses the theoretical role and constraints of PIC microcontrollers, while 2.5 presents practical experimental implementations, including examples, accuracy, and optimization strategies. This revision removes overlap and improves clarity (see pages 3 and 4).

Comment 2: Also, a major organizational flaw is the contradictory information regarding the feature extraction techniques used in the system. The abstract, introduction, and the dedicated "Feature Extraction" Section 3.3.2 consistently state that Zero-Crossing Rate (ZCR) and Short-Time Energy (STE) are the features derived from the audio signal. However, the "System Architecture" diagram (Figure 2) labels the feature extraction block with "(MFCC, LPC, FFT)".

Authors’ Response: We thank the reviewer for this observation. The system uses only Zero-Crossing Rate (ZCR) and Short-Time Energy (STE) for feature extraction. The reference to MFCC, LPC, and FFT in Figure 2 was included in error and has been corrected. We have updated Figure 2 (Section 3.1) and revised the Abstract, Introduction, and Section 3.3.2 to ensure consistency throughout the manuscript (see pages 1, 2, 6, 7, 9–10).
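For reference, the two time-domain features named in this response can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the frame length, the non-overlapping framing, and the normalization are assumptions, and the samples are assumed to be zero-centered (ADC offset already removed).

```python
def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (assumed normalization)."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for prev, cur in zip(frame, frame[1:])
        if (prev >= 0) != (cur >= 0)
    )
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """Mean of the squared sample amplitudes in the frame (assumed normalization)."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def extract_features(samples, frame_len=4):
    """Split samples into non-overlapping frames and return one (ZCR, STE) pair per frame.

    frame_len is a hypothetical value chosen for illustration only.
    """
    return [
        (zero_crossing_rate(samples[i:i + frame_len]),
         short_time_energy(samples[i:i + frame_len]))
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
```

Both features need only comparisons, multiplications, and additions per sample, which is presumably why they suit a microcontroller better than spectral features such as MFCC or FFT.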

Thank you for your thoughtful review. We believe we have responded satisfactorily to your concerns.

We have studied the comments carefully and made corrections that we hope will meet with your approval.

Sincerely,

The Authors.

Reviewer 2 Report

Comments and Suggestions for Authors

The manuscript presents a low-power, real-time voice command recognition system on a PIC-series microcontroller. The authors implement an end-to-end pipeline for detecting four spoken commands using only on-board processing. The approach uses simple time-domain audio features – specifically Zero Crossing Rate (ZCR) and signal energy – and a lightweight multi-layer perceptron (MLP) classifier to recognize the commands. The system is demonstrated on a PIC microcontroller, with a minimal hardware setup: a MEMS microphone feeding the PIC’s ADC, and the PIC driving output actuators (e.g. DC motors via a driver) based on recognized commands. Overall, this work demonstrates the feasibility of keyword spotting on a modest PIC microcontroller, contributing to the growing area of TinyML for voice interfaces.

=== Comments ===

1. This manuscript is written from a "What I did" perspective. The authors need to add a paragraph that explains why a PIC device is required in the proposal and which points are improved over previous TinyML voice recognition works (Table 1).
2. Line 157: Why does the term "image recognition" appear?
3. The authors need to clarify why the proposed system requires motor control. Is it mandatory for the TinyML model or the voice recognition flow? If control output is included in the contributions, the authors need to add a paragraph that explains its novelty.
4. The authors need to compare results with previous works using TinyML voice recognition such as PIC or Arm Cortex-M. Additionally, experimental results of Figure 5 would be better if they can be compared with other accuracy results.
5. The manuscript has several English grammatical issues. The reviewer recommends correcting them to improve readability.

- Line 359: There is a missing space.
- Line 352: The phrase “fret with” is awkward in this context.

Author Response

Response to Reviewer 2 Comments

Reviewer 2: The manuscript presents a low-power, real-time voice command recognition system on a PIC-series microcontroller. The authors implement an end-to-end pipeline for detecting four spoken commands using only on-board processing. The approach uses simple time-domain audio features – specifically Zero Crossing Rate (ZCR) and signal energy – and a lightweight multi-layer perceptron (MLP) classifier to recognize the commands. The system is demonstrated on a PIC microcontroller, with a minimal hardware setup: a MEMS microphone feeding the PIC’s ADC, and the PIC driving output actuators (e.g. DC motors via a driver) based on recognized commands. Overall, this work demonstrates the feasibility of keyword spotting on a modest PIC microcontroller, contributing to the growing area of TinyML for voice interfaces.

Comment 1: This manuscript is written from a "What I did" perspective. The authors need to add a paragraph that explains why a PIC device is required in the proposal and which points are improved over previous TinyML voice recognition works (Table 1).

Authors’ Response: We thank the reviewer for the comment. Section 2 has been revised to emphasize the research gaps, clarifying that prior works seldom combine real-time stress estimation with signal-based adaptive control. Our study addresses this gap to enhance EV efficiency and drivetrain durability (see pages 5 and 6).

Comment 2: line 157: Why does the term "image recognition" appear?

Authors’ Response: We thank the reviewer for pointing this out. The term “image recognition” was included in error and has been corrected to “voice command recognition” throughout the manuscript, specifically in Section 2.5 (Experimental Implementations on PIC Microcontrollers) and the Literature Review (see pages 4–5).

Comment 3: The authors need to clarify why the proposed system requires motor control. Is it mandatory for the TinyML model or the voice recognition flow? If control output is included in the contributions, the authors need to add a paragraph that explains its novelty.

Authors’ Response: We thank the reviewer for this comment. A description clarifying the role and novelty of motor control has been added after Table 2 in Section 3.1 (see pages 5 and 6).

Comment 4: The authors need to compare results with previous works using TinyML voice recognition such as PIC or Arm Cortex-M. Additionally, experimental results of Figure 5 would be better if they can be compared with other accuracy results.

Authors’ Response: We thank the reviewer for the comment. A new section, 4.5. Comparison with Previous TinyML Voice Recognition Works, has been added to compare our results with prior studies on PIC and ARM Cortex-M platforms and to provide context for the experimental results in Figure 5 (see pages 17–18).

Comment 5: The manuscript has several English grammatical issues. The reviewer recommends correcting them to improve readability.

- line 359: There is a missing space.

- line 352: The phrase “fret with” is awkward in this context.

Authors’ Response: We thank the reviewer for this comment. English grammatical issues, including the missing space on line 359 and the awkward phrase “fret with” on line 352, have been corrected to improve readability (see pages 14–15).

Thank you for your thoughtful review. We believe we have responded satisfactorily to your concerns.

We have studied the comments carefully and made corrections that we hope will meet with your approval.

Sincerely,

The Authors.

Reviewer 3 Report

Comments and Suggestions for Authors

The proposed solution demonstrates an efficient approach for voice command recognition on a resource-constrained microcontroller. However, it would be useful to clarify where such an implementation could be applied in real practice. Current consumer devices already provide sufficient performance to support real-time speech translation and complex voice commands—for example in automobiles with voice control, in robotics, or in other embedded systems.

One point that remains unclear is why no activation phrase (“wake word”) is used in the proposed design. In commercial solutions from Apple, Microsoft, Amazon, BMW, Mercedes and others, the activation phrase is an important element: it signals to the system that a command is starting, reduces false activations, and ensures that the speech processing module only extracts commands when needed. Without such a mechanism, it is not obvious how the proposed system avoids misclassification or unintended activations in real-world environments with background noise.

Could the authors comment on how their approach could be extended with an activation phrase, or explain why they intentionally did not use one?

Author Response

Response to Reviewer 3 Comments

Reviewer 3: The proposed solution demonstrates an efficient approach for voice command recognition on a resource-constrained microcontroller. However, it would be useful to clarify where such an implementation could be applied in real practice. Current consumer devices already provide sufficient performance to support real-time speech translation and complex voice commands—for example in automobiles with voice control, in robotics, or in other embedded systems.

Comment 1: One point that remains unclear is why no activation phrase (“wake word”) is used in the proposed design. In commercial solutions from Apple, Microsoft, Amazon, BMW, Mercedes and others, the activation phrase is an important element: it signals to the system that a command is starting, reduces false activations, and ensures that the speech processing module only extracts commands when needed. Without such a mechanism, it is not obvious how the proposed system avoids misclassification or unintended activations in real-world environments with background noise.

Authors’ Response: We thank the reviewer for this observation. The system focuses on lightweight, real-time operation on a resource-constrained PIC microcontroller. Wake-word detection was omitted to prioritize efficiency, targeting applications like IoT devices, simple robotics, and home automation. (Introduction, pages 2-3)

Comment 2: Could the authors comment on how their approach could be extended with an activation phrase, or explain why they intentionally did not use one?

Authors’ Response: We thank the reviewer for this observation. A wake-word detector could trigger the command classifier to reduce false activations and improve robustness; this extension is planned in future work. (Conclusion, pages 19-20)
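The extension described in this response, a wake-word detector that triggers the command classifier, can be sketched as a simple gate. Everything here is hypothetical and for illustration only: the class name, the two callables standing in for the detector and the classifier, and the length of the listening window are assumptions and do not come from the manuscript.

```python
class WakeWordGate:
    """Pass frames to the command classifier only after a wake word is detected."""

    def __init__(self, is_wake_word, classify_command, window_frames=3):
        self.is_wake_word = is_wake_word          # callable: features -> bool
        self.classify_command = classify_command  # callable: features -> label or None
        self.window_frames = window_frames        # frames to listen for after wake-up
        self.frames_left = 0                      # 0 means the gate is closed

    def process(self, features):
        """Handle one frame of features; return a command label or None."""
        if self.frames_left == 0:
            # Gate closed: only the (cheaper) wake-word check runs.
            if self.is_wake_word(features):
                self.frames_left = self.window_frames
            return None
        # Gate open: run the command classifier on this frame.
        self.frames_left -= 1
        command = self.classify_command(features)
        if command is not None:
            self.frames_left = 0  # close the gate once a command is accepted
        return command
```

In this arrangement a command spoken without the wake word is never classified, which is the false-activation reduction the reviewer asks about, at the cost of running a second detector on every frame.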

Thank you for your thoughtful review. We believe we have responded satisfactorily to your concerns.

We have studied the comments carefully and made corrections that we hope will meet with your approval.

Sincerely,

The Authors.

Reviewer 4 Report

Comments and Suggestions for Authors

1 General Comments

This paper proposes a framework for efficient classification of voice commands in real-time, power-constrained environments. The contributions of this work include the merging of lightweight signal processing and embedded neural inference for voice command classification, while achieving initial classification in under 50 ms for real-time response. The experimental results show that the proposed method is correct and effective.

2 Specific Comments

There are some problems with the theoretical and experimental analyses in this manuscript, which can be revised in the following respects.

(1) The whole paper should be carefully checked to avoid some possible typographical and grammatical mistakes. For instance, simultaneously using semicolon and comma to separate the keywords is unsuitable.

(2) All mathematical expressions should be meticulously examined to prevent some latent errors. For example, “(x_n)-1” in Equation (1) should be “x_(n-1)”.

(3) Other performance metrics, such as precision, recall, and F1 score, can be considered in the experimental section.

(4) The comparison with the state-of-the-art methods is expected in the experimental section.

Author Response

Response to Reviewer 4 Comments

Reviewer 4: This paper proposes a framework for efficient classification of voice commands in real-time, power-constrained environments. The contributions of this work include the merging of lightweight signal processing and embedded neural inference for voice command classification, while achieving initial classification in under 50 ms for real-time response. The experimental results show that the proposed method is correct and effective.

There are some problems with the theoretical and experimental analyses in this manuscript, which can be revised in the following respects.

Comment 1: The whole paper should be carefully checked to avoid some possible typographical and grammatical mistakes. For instance, simultaneously using semicolon and comma to separate the keywords is unsuitable.

Authors’ Response: We thank the reviewer for this comment. The manuscript has been carefully proofread and corrected for grammar and formatting issues.

Comment 2: All mathematical expressions should be meticulously examined to prevent some latent errors. For example, “(x_n)-1” in Equation (1) should be “x_(n-1)”.

Authors’ Response: We thank the reviewer for this comment. We have carefully reviewed and corrected all mathematical expressions, including Equation (1) (see page 9).

Comment 3: Other performance metrics, such as precision, recall, and F1 score, can be considered in the experimental section.

Authors’ Response: We thank the reviewer for this valuable suggestion. To provide a more complete evaluation of the classifier’s performance, we have added precision, recall, and F1-score metrics in the experimental results (see pages 16–17, new Section 4.3.3: Precision, Recall, and F1-Score Analysis).
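For readers unfamiliar with these metrics, the sketch below shows how per-class precision, recall, and F1 score are conventionally derived from a confusion matrix. It illustrates the standard definitions only; the function name and the matrix layout (rows are true classes, columns are predicted classes) are assumptions, and no values from the manuscript are reproduced.

```python
def per_class_metrics(confusion, labels):
    """Return {label: (precision, recall, f1)} from a square confusion matrix.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    metrics = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]                            # correctly predicted as class k
        fn = sum(confusion[k]) - tp                     # class k predicted as something else
        fp = sum(row[k] for row in confusion) - tp      # other classes predicted as class k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = (precision, recall, f1)
    return metrics
```

Reporting these per class, rather than overall accuracy alone, shows whether particular commands are systematically confused with one another.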

Comment 4: The comparison with the state-of-the-art methods is expected in the experimental section.

Authors’ Response: We thank the reviewer for this comment. We have included a comparison with state-of-the-art TinyML and embedded voice recognition systems in the experimental section (see pages 17–18).

Thank you for your thoughtful review. We believe we have responded satisfactorily to your concerns.

We have studied the comments carefully and made corrections that we hope will meet with your approval.

Sincerely,

The Authors.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript has improved significantly. However, there are still some minor issues.

In Figure 6, the units of computational time and power consumption are different, so the two quantities should not be placed in the same graph. It is better to present them in another format, e.g.,

The advantages of the proposed method listed in Table 11 should be attributed to their specific origins, whether they stem from the hardware or from the novel features themselves. A comparison of power consumption should also be included to provide a comprehensive evaluation of the method's performance and efficiency.

Author Response

Response to Reviewer 1 Comments (Round 2)

Reviewer 1: The manuscript has improved significantly. However, there are still some minor issues.

Comment 1: In Figure 6, the units of computational time and power consumption are different, so the two quantities should not be placed in the same graph. It is better to present them in another format, e.g.,

Authors’ Response: We appreciate this insightful comment. After revision, the figure originally numbered as Figure 6 has been updated and is now presented as Figure 7 (see page 14). The measurements of inference time and power consumption have been separated into two distinct plots to clearly distinguish computational latency from energy usage, providing a more accurate representation of system performance.

Comment 2: The advantages of the proposed method listed in Table 11 should be attributed to their specific origins, whether they stem from the hardware or from the novel features themselves. A comparison of power consumption should also be included to provide a comprehensive evaluation of the method's performance and efficiency.

Authors’ Response: We appreciate this insightful comment. In response, we have revised Section 4.5, “Comparison with Previous TinyML Voice Recognition Works” (see pages 17–19), to clearly attribute the advantages of the proposed method to their specific origins, distinguishing between hardware benefits and methodological improvements. Additionally, we have included a detailed comparison of power consumption to provide a comprehensive evaluation of the system’s performance and efficiency.

Thank you for your thoughtful review. We believe we have responded satisfactorily to your concerns.

We have studied the comments carefully and made corrections that we hope will meet with your approval.

Sincerely,

The Authors.

Reviewer 2 Report

Comments and Suggestions for Authors

I appreciate the authors' responses to the reviewer's comments.

This manuscript is written from a "What I did" perspective. The authors need to add a paragraph that explains why a PIC device is required in the proposal and which points are improved over previous TinyML voice recognition works (Table 1).

We thank the reviewer for the comment. Section 2 has been revised to emphasize the research gaps, clarifying that prior works seldom combine real-time stress estimation with signal-based adaptive control. Our study addresses this gap to enhance EV efficiency and drivetrain durability (see pages 5 and 6).

Addressed. If the improved terms are indicated in bold, it will contribute to better readability.

line 157: Why does the term "image recognition" appear?

We thank the reviewer for pointing this out. The term “image recognition” was included in error and has been corrected to “voice command recognition” throughout the manuscript, specifically in Section 2.5 (Experimental Implementations on PIC Microcontrollers) and the Literature Review (see pages 4–5).

Addressed.

The authors need to clarify why the proposed system requires motor control. Is it mandatory for the TinyML model or the voice recognition flow? If control output is included in the contributions, the authors need to add a paragraph that explains its novelty.

We thank the reviewer for this comment. A description clarifying the role and novelty of motor control has been added after Table 2 in Section 3.1 (see pages 5 and 6).

Addressed.

The authors need to compare results with previous works using TinyML voice recognition such as PIC or Arm Cortex-M. Additionally, experimental results of Figure 5 would be better if they can be compared with other accuracy results.

We thank the reviewer for the comment. A new section, 4.5. Comparison with Previous TinyML Voice Recognition Works, has been added to compare our results with prior studies on PIC and ARM Cortex-M platforms and to provide context for the experimental results in Figure 5 (see pages 17–18).

There are two figures labeled Figure 5, on pages 12 and 13. Moreover, the content related to the authors’ response cannot be found in Figure 5. Table 11 appears to be relevant to the response, and it is therefore assumed to have been added in response to this comment. When comparing with previous works, it would be advisable to include proper citations or, at minimum, explain the conditions used for comparison. For example, the ARM Cortex-M in Ref1 has various series such as M0, M3, and M4, with clock frequencies that can be configured differently. These variations directly affect the inference time. Therefore, please revise the comparison with other works to incorporate the reviewer’s comments.

The manuscript has several English grammatical issues. The reviewer recommends correcting them to improve readability.
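The dependence of inference time on clock frequency mentioned above can be made concrete with a first-order model, in which time equals cycle count divided by clock frequency. This is a rough sketch under stated assumptions: the function names are hypothetical, and the model ignores differences in instruction set, DSP or FPU support, and memory wait states between cores, so it cannot by itself normalize results across M0, M3, and M4 parts or between ARM and PIC devices.

```python
def inference_time_ms(cycle_count, clock_hz):
    """First-order inference time in milliseconds: cycles divided by clock frequency."""
    return 1000.0 * cycle_count / clock_hz


def rescale_time_ms(measured_ms, measured_clock_hz, target_clock_hz):
    """Rescale a measured time to another clock, assuming an unchanged cycle count."""
    return measured_ms * measured_clock_hz / target_clock_hz
```

Even this simple model shows why a latency figure quoted without the core type and clock setting cannot be compared fairly with another system's figure.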

We thank the reviewer for this comment. English grammatical issues, including the missing space on line 359 and the awkward phrase “fret with” on line 352, have been corrected to improve readability (see pages 14–15).

Addressed.

========================

New Comments

========================

The authors must carefully verify the figure and table indexing in the manuscript. There are instances of duplicated or skipped figure numbers.

Author Response

Response to Reviewer 2 Comments (Round 2)

Reviewer 2: New Comments

Comment 1: The authors must carefully verify the figure and table indexing in the manuscript. There are instances of duplicated or skipped figure numbers.

Authors’ Response: Thank you for pointing this out. We agree with the reviewer’s comment. Therefore, we have carefully verified and corrected the numbering of all figures and tables to ensure consistency throughout the manuscript.

All figure and table references in the text have also been updated accordingly.

Thank you for your thoughtful review. We believe we have responded satisfactorily to your concerns.

We have studied the comments carefully and made corrections that we hope will meet with your approval.

Sincerely,

The Authors.

Reviewer 3 Report

Comments and Suggestions for Authors

All my remarks were answered.

Author Response

Authors’ Response: Thank you for your positive feedback. We are glad that all previous remarks have been satisfactorily addressed.
