Review Reports - Modelling of Amplitude Modulated Vocal Fry Glottal Area Waveforms Using an Analysis-by-Synthesis Approach

Round 1

Reviewer 1 Report

This paper reports a study on modeling different types of amplitude-modulated vocal fry glottal area waveforms (GAWs). The aim of developing such models is to automatically segregate euphonic GAWs from various types of vocal fry, based on their cyclicity and the number of pulses in a cycle.

The paper is clear and well organized, the modeling method is explained in detail, and the results are quantified, which allows the reliability of the modulating model to be estimated.

Major concerns

The Discussion section reads like a Conclusion because it summarizes the article. A proper Discussion section is currently lacking. In such a section, the results should be compared against outcomes from the literature, and the argument should establish the performance and limits of the proposed method in the context of other studies on the subject. Please add a Discussion section before this reviewer can encourage publication of this article.

Minor issues

- The Introduction section could be expanded, in particular regarding the relation between vocal fry and vibration of the ventricular folds, as investigated in [Bailly, L., Bernardoni, N. H., Müller, F., Rohlfs, A. K., & Hess, M. (2014). Ventricular-fold dynamics in human phonation. Journal of Speech, Language, and Hearing Research, 57(4), 1219-1242].

- Figure 1 could be complemented by a series of camera images over one glottal cycle. For each of the four vibratory behaviors, a visual illustration would make it possible to discuss the type of modal pattern and would provide material for further discussion.

- The title of Section 2.4 might be changed.

- In Section 3 (Results), the so-called “non-modulating model” is the basic waveform to which the amplitude modulation is then applied to obtain the “modulating model”, correct? The “non-modulating model” should be clearly defined.

- Lines 277 to 283, from “As a result, ...” to “… using the non-modulating model”, state the obvious and could be omitted.

- In Fig. 8, should CAMS and AAMS read CAMP and AAMP?

- For the discussion of the efficiency of the entropy and the carrier frequency ratio as estimators of the different subtypes of vocal fry GAWs, please run an ANOVA test on the data related to Fig. 8. It should be possible to conclude that the differences between the subtypes are significant, which would validate the reliability of the estimators.
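As an illustration of the ANOVA request above, a minimal sketch of a one-way ANOVA F statistic is given below. The function name `one_way_anova_f`, the group labels, and all numeric values are hypothetical and are not taken from the manuscript; they only show the shape of the computation. The sketch stops at the F statistic and its degrees of freedom; a p-value would still have to be obtained from the F distribution, for example with a statistics package.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical entropy values for three GAW subtypes (illustrative only).
entropy_by_subtype = {
    "subtype_a": [1.0, 1.2, 1.1],
    "subtype_b": [2.0, 2.1, 2.3],
    "subtype_c": [3.0, 3.2, 2.9],
}

f_stat, df_b, df_w = one_way_anova_f(list(entropy_by_subtype.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F relative to the critical value for the given degrees of freedom would suggest that at least one subtype mean differs from the others; a post-hoc test would be needed to say which ones.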

Decision of this reviewer

Elements of this article, in its present form, are worth publishing because the proposed model is original, carefully presented, and its performance is quantified. The lack of a proper discussion of these results prevents direct publication; this reviewer recommends adding such a section, and addressing the minor issues listed above, before the article can be published.

Author Response

Dear Sir/Madam

First, thank you for reviewing the manuscript.

I have attached the Word file containing replies and corrections to your comments.

Regards,

Vinod

Author Response File: Author Response.docx

Reviewer 2 Report

The authors investigated different types of amplitude modulated vocal fry glottal area waveforms. They were modelled using an analysis-by-synthesis approach and distinguished automatically from euphonic GAWs based on their modelling errors.

Language. The English style requires some minor corrections so that the article is clearly explained.

Introduction. The introduction is well structured but should contain more relevant information with updated references.

References. The references are adequately discussed in a fair context, but some of them need to be updated. Some key references in the field are missing. The references section could be improved. Ten of the 20 references are more than ten years old, and 14 of the 20 are more than five years old, so some of them could be removed or updated.

Structure. The manuscript is well structured, containing all relevant sections required in a technical paper.

Results and discussion. The structure of the manuscript should be improved. It contains all relevant sections required in a technical paper. Still, some sections should be improved: a) The description of the experimental setup is very poor, and the authors should provide more detail about the system under investigation.

Novelty/contribution. I appreciate the hard work done by the authors, but the manuscript at this stage does not introduce any relevant novelty; it just applies some well-known techniques (FFT and some statistical methods) that have been used in many related published papers. Having said that, the following comments bear significance concerning this paper:

a) Define SVM at its first appearance in the text.

b) In Fig. 1, what does “px” mean?

c) The manuscript does not discuss the noise present in the signals. Why not? All signals have noise. Two types of uncertainty should be considered in simulations: the first is measurement noise (which is Gaussian), and the second is noise intrinsic to the system, from unidentified sources (which is also introduced within the simulations). Results obtained with these noise considerations lead to realistic observations, both in terms of the performance of the experiment and in terms of testing the proposed methodology. The authors should explain and justify the choice of just one kind of noise (Gaussian noise).

d) It is already known that the Fourier transform is not suitable for non-linear signals. I still consider GAWs to be non-linear and stochastic signals, so why not use a technique more suitable for this kind of signal?

e) In Fig. 2, what is the switching time between y(t) and sys(t)?

f) You use first-order statistical parameters (mean and standard deviation). Could higher-order statistical parameters provide better results?

g) Are the mean, variance, and Shannon entropy measurements adequate for GAW modelling? What is the influence of the noise? What are the warning/alarm levels of these indicators? Are they equal for all signals? There are too many unanswered questions that must be fully addressed to validate a new diagnostic method, and they are not even mentioned in the presented work.

h) No comparison has been made with the performance of other well-established methods, which are not cited in the references. Many relevant papers in this field have not been cited. It is mandatory that the authors carry out a comparison with at least two other methodologies.
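Comments c), f), and g) above could be explored with a small experiment along the following lines. This is only a sketch under stated assumptions: the toy half-wave-rectified sinusoid stands in for a GAW and is not the authors' data, the helper names (`shannon_entropy`, `standardized_moment`, `add_gaussian_noise`), the bin count, and the noise level are all hypothetical, and the histogram-based entropy estimate is one of several possible definitions.

```python
import math
import random


def shannon_entropy(samples, bins=16):
    """Histogram-based Shannon entropy (in bits) of a list of samples."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return 0.0  # a constant signal carries no amplitude uncertainty
    counts = [0] * bins
    for x in samples:
        idx = min(int((x - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(samples)
    return -sum(c / n * math.log2(c / n) for c in counts if c)


def standardized_moment(samples, order):
    """Standardized central moment: order 3 is skewness, order 4 is kurtosis."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return sum((x - mu) ** order for x in samples) / n / var ** (order / 2)


def add_gaussian_noise(samples, sigma, seed=0):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in samples]


# Toy stand-in for a GAW: a half-wave-rectified sinusoid (assumption).
signal = [max(0.0, math.sin(2 * math.pi * 5 * t / 1000)) for t in range(1000)]
noisy = add_gaussian_noise(signal, sigma=0.05)

for name, s in (("clean", signal), ("noisy", noisy)):
    print(name,
          round(shannon_entropy(s), 3),
          round(standardized_moment(s, 3), 3),
          round(standardized_moment(s, 4), 3))
```

Repeating the comparison over several noise levels would give a first indication of how sensitive each indicator is to measurement noise, which is the kind of evidence the comments above ask for.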

Author Response

Dear Sir/Madam,

Thank you for your reviews.

I have attached the document containing corrections in response to your reviews.

Regards,

Vinod

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

After reviewing the revised article, this reviewer concludes that the new submission is significantly improved over the first version. The article presents an interesting methodology, and the results are promising. In the new version, the concerns and questions raised about the first manuscript are clearly addressed, which makes the paper much more useful and interesting.

The revised article is therefore well done. The authors have addressed all issues.

Thus, this reviewer recommends this manuscript for publication in Applied Sciences.

Author Response

Dear Sir/Madam,

We have revised the manuscript for minor spelling corrections. We have also updated the results. The revised version of the manuscript is attached to this mail.

Thanks for your reviews,

Vinod

Author Response File: Author Response.docx