Review Reports - An Artificial Intelligence-Based Algorithm for the Assessment of Substitution Voicing

Round 1

Reviewer 1 Report

In this research manuscript, the authors present a new artificial intelligence-based method for evaluating substitution voicing (SV) and speech following laryngeal oncosurgery. Named ASVI, the proposed technique provides a fast and efficient option for assessing SV and speech in patients after laryngeal oncosurgery. For the ML part of this work, the authors used well-known CNN models for image categorization. In the proposed ML process, a Mel-frequency spectrogram input is fed into a deep neural network architecture, yielding excellent SV classification results.

The suggested ASVI results are comparable to the auditory-perceptual SV evaluation performed by medical professionals. The proposed AI method was trained on 309 male participants and later evaluated on 70 speech samples that were not part of the training. The proposed ASVI has the potential to be used in research and clinical practice as an easy-to-use metric for SV and speech changes in patients after laryngeal oncosurgery. ASVI would help the patients after laryngeal oncosurgery to seek treatment on time, as changes in voice and speech are usually the first sign of laryngeal cancer recurrence.

This is a very well-written paper by the authors, and I recommend it for publication in its current form. My minor suggestion for the authors would be to break up the introduction section as it is too long. The introduction section can be broken down into two sections. One section briefly outlines the proposed research, and the second section has the background.

Author Response

Thank you for your comments. Our team has addressed them and hopes that this will help to improve the quality of our manuscript. As suggested by the reviewers, we have consulted an editing service to make moderate English changes and improve the readability of the text.

Author Response File: Author Response.pdf

Reviewer 2 Report

Dear Authors

This paper proposes an Artificial Intelligence-based Algorithm for the Assessment of Substitution Voicing.

The paper is well written, but in my opinion it lacks a fuller description of the state of the art and a clearer explanation of the novelty of this work. In particular, the authors should compare their resolution, power, and SNR results with the literature in order to demonstrate their contribution.

Page 1, line 17: in my opinion the explanation is not very clear, and I could not understand the justification for using 70 samples or 309 male participants.

Pages 1-2, lines 42-46: I cannot understand the explanation. Please explain it more clearly.

Page 8: The sentence is unclear. Please improve it, and probably split it into two parts.

Page 9: please improve Figure 4. The label for Figure 4 is missing.

Page 4, line 153: what is SD?

Figures 1 and 2 need a better explanation.

Please include a better explanation of your convolutional neural networks, for example as block diagrams or a sketch.

The ASVI and the SV assessment done by qualified laryngologists had a statistically significant strong connection with rs = 0.863 (p = 0.001). ASVI differences in the control, cordectomy and partial laryngectomy, and whole laryngectomy patient groups were statistically significant (p < 0.001). The refined lightweight ASVI algorithm achieved reaction times of 3.56 ms.

This is my major concern regarding this work. At this stage, the authors should compare their work with others in order to demonstrate its novelty. Compared with other 2 works… Is it better or worse?

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Excellent paper with interesting approach and results!

It is suggested to make minor changes:

1) Resize the sequence diagram (on page 9) so that it fits within the page and the figure caption is readable. Currently, the figure caption is overlapped.

2) Explain, with reference to the figure on page 9, why the metrics are computed during the recording or, if needed, alter the sequence diagram to present a different approach. Are the metrics and neural network training performed on recorded audio samples, or are they used DURING the recording?

3) The title of this manuscript emphasizes substitution voicing. Why is there no clear presentation of the difference between initial voicing (of male persons before surgery or healthy ones) and substitution voicing, i.e., altered voicing (of patients after surgery)? If the focus is on SUBSTITUTION, what kind of substitution is made or is possible? The practical implications of the research need to be highlighted.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Summary: A study for the evaluation of substitution voicing using a convolutional neural network is described. The motivation behind the development of the proposed method is spelt out and the methodology used is explained clearly. The authors conclude that their technique yields ASVI results comparable to medical substitution voice evaluation.

Main Comments:

1. The paper is well written and clear. However, some aspects of the initial paragraphs are very similar to one of the authors' published works in Cancers 2022, 14, 2366. The authors could mention the differences between these two studies, which were carried out on practically the same group of patients. The algorithm is shown to perform well on test data.

Overall Assessment: This work is of good quality, and the methodology applied seems to be new in this area. There is one minor correction, indicated below. This work is suitable for publication.

Minor corrections:

(i). Page 7, line 275: 'int into' is not clear.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Dear Authors

Excellent changes; your manuscript looks better.