Open Access Article

Peer-Review Record

Speech Enhancement Based on Two-Stage Processing with Deep Neural Network for Laser Doppler Vibrometer

Appl. Sci. 2023, 13(3), 1958; https://doi.org/10.3390/app13031958

by Chengkai Cai¹,*, Kenta Iwai² and Takanobu Nishiura²,*

Reviewer 1: Gerrit Vermeir

Reviewer 2: Giuseppe Ciaburro

Reviewer 3: Anonymous

Submission received: 28 November 2022 / Revised: 16 December 2022 / Accepted: 26 December 2022 / Published: 2 February 2023

(This article belongs to the Special Issue Audio and Acoustic Signal Processing)

Round 1

Reviewer 1 Report

Overall impression

Well-structured, orderly text and drawings, well-documented. Personally, I am not sufficiently specialised to give a detailed assessment of the scientific value relative to peer work. Still, my overall impression is that this is good work, worthy of publication, with the prospect of exciting applications. A possible question on my part is whether the procedure can function in real time. It might be good to mention this in relation to the application.

Comments/suggestions

Regarding the caption of Figure 1, it might not be a bad idea to add some detail regarding the mounting of the different samples (c-f). It is also somewhat regrettable that the complete procedures were not applied directly to these objects. Is there a reason why this was not done? Perhaps it would be best to mention this in the text.

Author Response

Thank you for reviewing our manuscript. Our responses have been written in the PDF document. Please check it.

Author Response File: Author Response.pdf

Reviewer 2 Report

Section 1 must be improved.

- Authors should emphasize contribution and novelty; the introduction needs to clarify the motivation, challenges, contribution, objectives, and significance/implication.

- Briefly introduce how LDV can be used for real-time speech-signal acquisition

- You must properly introduce your work, specify well what were the goals you set yourself and how you approached the problem.

Section 2 must be improved.

- In this section you introduce how LDV can be used for real-time speech-signal acquisition.

- Start by explaining why you need to measure the vibration of objects

- Explain why you used those objects in Figure 1

- Briefly describe how you made the measurements shown in Figure 1

- Keep in mind that the paper can also be read by non-expert readers of the subject.

- Also, since this section is short, I suggest you merge it with section 3 and rename it Materials and Methods. In that section you could then add two subsections with the titles you've already used.

Section 3 must be improved.

- I suggest you merge it with section 2 and rename it Materials and Methods. In that section you could then add two subsections with the titles you've already used.

- In section 3.1, first present your methodology shown in the flowchart of Figure 3 and then present the results of Figure 2.

- You must properly introduce the equation and list in detail the variables it contains, with a concise description of the meaning of each. To make them more readable, show them in a bulleted list. In this way the reader will be able to understand the contribution of each variable.

- Why did you use three figures to show the methodology? You could have merged Figures 4 and 5 into Figure 3.

- Introduce adequately the Deep Learning techniques used

Section 4 must be improved.

- Describe in detail the equipment used to make the measurements (LDV). Extract this data from the datasheet of the instrumentation manufacturer. To make reading the specifications of the instruments more immediate, you can insert them in a table, listing the instruments used and the specific characteristics for each.

- Add a photo of the experimental set-up, Figure 10 is not clear.

- Furthermore, a description of the hardware and software used for data processing is completely missing. Describe in detail the hardware used: Extract this data from the datasheet of the hardware manufacturer. To make reading the specifications of the hardware more immediate, you can insert them in a table, listing the instruments used and the specific characteristics for each.

- Also, you should describe in detail the software platform you used.

- Also describe the machine learning-based libraries you used.

Section 5 must be improved.

- Paragraphs reporting the possible practical applications of the results of this study are missing. It is necessary to describe how these results can serve people, adding possible uses of this study that justify its publication.

33-34) Remove double occurrences of the term LDV

42) Use this format for reference: Li et al. [9], in this way the reference number follows the name of the author. I have seen that you often use this format, so I will not repeat this advice again, it also applies to the other occurrences.

89) Adequately introduce the topic (spectrogram)

155) “RNN” Do not use acronyms until you have presented the full definition. I will not repeat this advice again; it also applies to the other occurrences.

173-174) Why was a PET bottle used? Please explain.

Author Response

Thank you for reviewing our manuscript. Our responses have been written in the PDF document. Please check it.

Author Response File: Author Response.pdf

Reviewer 3 Report

The authors proposed a distant-talk measurement system with a laser Doppler vibrometer (LDV). A literature review on the background of LDV was provided. The figure quality looks good. However, there are some concerns about the English grammar. Please check the English grammar very carefully with native English-speaking colleagues. The manuscript should then be revised accordingly.

1. The novelty of the proposed approach needs to be described in a little more detail in the Abstract section, because it seems that the authors do not mention it in detail.

2. In Figures 1, 2, and 11, please use a comma as the thousands separator in the values. For example, change 8000 to 8,000.

3. Please mark the highest frequency point in Figure 2c.

4. Please describe how you set the ambient noise level of 20.8 dB and the sampling frequency of 16 kHz.

5. Please support the statement (These microphones can pick up ~) with the reference (https://www.sciencedirect.com/science/article/abs/pii/S0003682X20306538)

6. Before the conclusion section, the authors should provide summarized data.

7. Please use abbreviated journal names in reference section.

8. There is no data availability section.

9. Why did the authors use wideband perceptual evaluation of speech quality, LSD, and STOI? Please describe why these evaluation metrics are useful.

10. Please indicate who the corresponding author is.

Author Response

Thank you for reviewing our manuscript. Our responses have been written in the PDF document. Please check it.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The authors addressed the reviewer's comments with attention and modified the paper with the suggestions provided. The new version of the paper has improved both in the presentation and in the contents.

Minor revision

- Authors should rearrange text, figures and tables to make the best use of journal space. I see that there are several spaces left blank because the figure has moved to the next page.
