Article
Peer-Review Record

Automatic Assessment of Prosodic Quality in Down Syndrome: Analysis of the Impact of Speaker Heterogeneity

Appl. Sci. 2019, 9(7), 1440; https://doi.org/10.3390/app9071440
by Mario Corrales-Astorgano 1,*, Pastora Martínez-Castilla 2,*, David Escudero-Mancebo 1,*, Lourdes Aguilar 3,*, César González-Ferreras 1,* and Valentín Cardeñoso-Payo 1,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 19 February 2019 / Revised: 2 April 2019 / Accepted: 3 April 2019 / Published: 5 April 2019

Round 1

Reviewer 1 Report


This work analyses how the heterogeneity of Down syndrome patients can affect automatic assessment of prosodic quality.  Evaluations are done with a therapist and an expert in prosody, to judge the prosodic appropriateness of speech collected in the context of a video game.  The paper is well-written, and describes useful experiments.

Specific changes needed (in each item, the text from the article comes first, followed by my comment):

… that considers the high number of variables and… - what does this mean?  be more clear here

…that predicts the quality… - quality of what?

…the relationship of some prosodic features …- what does this mean?  be more clear here

 We observe the different importance of the prosodic features in the automatic classification … - again, be more specific

This result seems to indicate that… - avoid vague words such as “seems”

 this heterogeneity must be taken into account when developing an automatic assessment of the prosodic quality of people with Down syndrome.

…associated with trisomy 21 (third copy of chromosome 21) … - why say this?

…description, phenotypic variability … - explain what this is

…[8] report disfluencies …
->
…[8] reports disfluencies …

… difficulty of separating the effects of each of the suprasegmental features on communication together… -what does this mean? e.g., what is “separating the effects”? to what end? why?

…lead to insecurities and low self esteem. - state that this refers to the patient

Therefore, we propose that any improvement in the suprasegmental domain will lead to a better verbal interaction. - far too general a statement

Both the fields of perception and production are included in the design of the tasks. - far too general a statement

 Concerning this question, the attempts to classify different speech dimensions is well researched, - vague; rephrase

… the recognition of speech emotions and autism spectrum disorders have … - rephrase; is one recognizing both of these? if so, then say “has”

 The point is that all these works … - avoid trite sayings such as “ The point is”

… by experts as a gold standard … - avoid trite sayings such as “gold standard” (this has nothing to do with gold…)

… from the audios of the corpus.
->
… from the recordings of the corpus.

…whether the user must repeat …. - “must”? if not, what then?

….and the most informative ones are selected. -how is this determined?

…was rightly resolved, Cont (Continue) means that the activity was satisfactorily resolved… - what is the difference between rightly and satisfactorily? These words mean the same.

section 2.2.1 uses present tense, while section 2.1 used past tense, why?

hitting a concrete key … -what is this?

… frustrated, when ever possible, the therapist
->
… frustrated, whenever possible, the therapist

With the aid of a web, … -what is this?

…she listened to each audio file and decided whether the speaker should repeat the sentence, since she/he … - confusing use of pronouns; do the TWO “she” refer to the same person? I think not…

The judgments were made relying on a purely auditive basis, - what does auditive mean?

As explained is section 2.2, …
->
As explained in section 2.2, …

According to this, the output of the different classifiers are
->
According to this, the outputs of the different classifiers are

I suggest not reporting percentages to 2 decimal places; one is enough

 the least qualified speakers in Table 2. - least qualified in what sense?

is highly and statistically correlated… - does “highly” not already imply “statistically”?

…. highly correlated with the prosodic production competences (MProdT), but the correlation did not reach statistical significance. - how can we have “highly correlated” yet not statistical significance?

…significant ROC area value (above 0.6). - was ROC explained earlier in the paper?

 Phenotype variability is common … - what is this?

…to be related with a good assessment of the recordings,
->
…to be related to a good assessment of the recordings,

… depends on the speakers developmental levels
->
… depends on the speaker’s developmental levels

At the top of this, differences
->
In addition, differences

…within the video game.
->
…while playing the video game.

…has also been evidenced
->
…has also been shown

 There are some coincidences among speakers, - what are these?

 build a generalist solution. - what is this?

 References:
- do not repeat the year date multiple times (e.g., ref. 1)

- use initial capital letters for titles

29. Raven, J.; Raven, J.C.; others. - this looks strange
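
A note on the comment above about a correlation described as high that does not reach statistical significance: with few speakers, even a sizeable Pearson coefficient can fall short of the usual 5% threshold. The sketch below is illustrative only. The coefficient r = 0.6 and the sample sizes 8 and 30 are assumed values, not figures from the manuscript, and the helper names are hypothetical; the critical values are the standard two-sided 5% Student t table entries.

```python
import math

# Two-sided 5% critical values of Student's t for selected degrees of freedom
# (standard table values; only the entries needed for this illustration).
T_CRIT_05 = {6: 2.447, 28: 2.048}

def t_statistic(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0, given Pearson r from n pairs."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def is_significant(r: float, n: int) -> bool:
    """True if |t| exceeds the two-sided 5% critical value for n - 2 df."""
    return abs(t_statistic(r, n)) > T_CRIT_05[n - 2]

# Assumed values for illustration: the same r with a small and a larger sample.
for n in (8, 30):
    print(n, round(t_statistic(0.6, n), 2), is_significant(0.6, n))
```

Under these assumed values, the same coefficient is not significant with 8 speakers but is with 30, which is why effect size and significance are best reported separately.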


Author Response

Please find the response to Reviewer 1 in the attached PDF.

Author Response File: Author Response.pdf

Reviewer 2 Report

Summary

This paper examined the automatic detection of prosodic accuracy measured using a video game designed to improve the prosody and pragmatic abilities of children with intellectual disability (including Down syndrome). The preliminary investigation reports some new findings and is of interest to the readership of this journal. Further replication and expansion appear to be needed before this technique can be effectively applied in a clinical setting. A number of comments are highlighted below with the aim of improving the quality of this paper.

 

Major comments

1.     Abstract

a.     The abstract provides a clear and succinct summary of the study. A few details can be adapted for clarity as follows:

b.     It may be useful for readers if authors included a definition of prosody in the abstract, including specific speech features that were considered in the perceptual analysis.

c.     The aims could be outlined more explicitly, e.g., training the automatic classifier and then analysis of specific prosodic features.

d.     Lines 10-11, “In addition, the relationship of some prosodic…”; this sentence is hard to follow, authors can be more explicit – do authors mean the relationship between prosodic features detected by the perceptual assessment compared with the automatic classifier?

2.     Introduction

a.     The first line of the introduction can be clarified – do authors mean that Down syndrome is a genetic anomaly, and that intellectual disability is a common feature of Down syndrome?

b.     The first few paragraphs provide a good introduction to the field of interest.

c.     Page 2, line 44 – this sentence requires expansion. An introductory line on technology in this field more generally can guide the reader to the new topic introduced in this paragraph. Further, what areas of therapy do the existing tools focus on?

d.     Authors may like to expand on why and how it’s “agreed that prosodic skills have great power to improve communication abilities”. This can be done in paragraphs 2-3, page 2. Some further details on this are included on page 3, lines 68-70, this information can be brought forward in the introduction.

e.     The introduction includes some methodological details that can be moved to the methods (e.g., the details of the game, the section outlining assessments completed by the trained listener, etc.).

f.      The final paragraph does not need to include an outline of the paper, rather, the aims can be more clearly outlined, e.g., “this study aimed to… “. The methods become difficult to follow without clearly defined aims in the introduction. Authors can differentiate the aims that relate specifically to the PRADIA software, and those that relate to instrumental measures of prosodic features in general.

3.     Methods

a.     What microphone was used to record the speech samples? It is known that the methods of recording can affect acoustic results.

b.     It is unclear which variables listeners rated and how these were compared to the automatic classification, i.e., did the two listeners rate the specific features of prosody (intonation, accent and phrasing), simply whether the responses were “right” or “wrong”, or the individual features outlined in Table 6? These details do not become clear until later in the manuscript. Authors might like to clarify these details and clearly highlight how specific comparisons relate to identified aims.

c.     I am not an expert in statistical analysis and have not commented on statistical analyses in great detail. The manuscript would benefit from review by a statistician specializing in this area. Nonetheless, the sample sizes appear to be small for correlation analysis, please provide further details regarding power calculations.

4.     Results

a.     Details regarding the reason behind selection of different “Cases” in table 3 can be explained further in the methods section.

b.     As the aims are not explicitly highlighted in the results, it is difficult to determine how each step of the methods is related to study questions. Once the aims are clearly outlined in the introduction, the structure of the methods and results may be easier to follow.

5.     Discussion

a.     As with the results and methods, when the aims are clarified, the discussion may be easier to follow; with each paragraph aligning to a specific aim.

b.     Authors might like to also draw more on existing literature examining automatic measures of speech. Further, broader literature in the field of speech-language pathology suggests that perceptual ratings are more likely to be consistent between raters if the speech is more ‘typical’, when compared to ratings made of disordered speech. This can also be discussed further in light of the study outcomes.

c.     Authors could provide further discussion on the results from Table 3.

d.     The limitations highlight some important points for consideration and future work.

e.     Authors might also like to include some direct clinical implications of this work. Results may need to be replicated and expanded – i.e., to include specific perceptual prosodic ratings and identify significant automatically identified prosodic features – in order for this technique to be utilized effectively in a clinical setting.

6.     Conclusion

a.     Authors might like to clarify what they mean by “quality of recordings” – this could be interpreted as quality of the speech sound recording, rather than the amount that the utterance deviates from typical.

 

Minor issues

1.     Inappropriate use of in text references – authors of referenced papers need to be included in the main text if they form part of the sentence, e.g., page 1 line 22 “… as described in [4]”; line 26 “… the study in [6] demonstrated that children with DS….”; page 2, line 31 “… [8] report disfluencies …”.

2.     Page 1 line 22, authors may prefer to say “all areas of language may be impaired”, rather than “are impaired”.

3.     Page 3, paragraph starting line 94, there is no “section 1” mentioned.

4.     Page 11 paragraph beginning line 329, this paragraph comes out of the blue, it may benefit from further linking details.

5.     There are a number of typographic and grammatical errors, alongside informal language use, throughout the manuscript.
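
Regarding major comment 3c on power calculations for the correlation analyses, a rough a priori estimate can be obtained with the Fisher z approximation. This is a minimal sketch under stated assumptions: the target correlations, the two-sided alpha of 0.05, and the power of 0.80 are conventional illustrative choices, not values taken from the manuscript, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of pairs needed to detect a correlation r with a
    two-sided test, using the Fisher z transformation."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # quantile for the test level
    z_beta = NormalDist().inv_cdf(power)               # quantile for the target power
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)

# Assumed effect sizes for illustration only.
for r in (0.3, 0.5, 0.7):
    print(r, n_for_correlation(r))
```

The required sample size grows quickly as the expected correlation shrinks, so a study with few speakers is only powered to detect strong associations.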


Author Response

Please find the response to Reviewer 2 in the attached PDF.

Author Response File: Author Response.pdf

Reviewer 3 Report

The paper is well written and has many merits. Its main weakness is that the conclusions are based on a small group of speakers. The purpose of the work is not clear. It is probably the cognitive aspect, because there are no conclusions that could be useful in diagnosis or rehabilitation.

Author Response

Please find the response to Reviewer 3 in the attached PDF.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Authors have comprehensively revised the manuscript, considering my feedback and suggestions appropriately. This area of research is important and interesting to investigate. I have re-read the manuscript and only have a few minor comments as outlined below.

 

Minor comments:

 

Page 11, line 345: grammar, did authors mean sub-corpora, plural? (“… included more speakers than the other subcorpus but much fewer samples of each speaker.”)

 

Page 11 lines 354-356 are not grammatical. This can be divided into two sentences.

 

Page 12, line 402: thank you for addressing this point so diligently. The phrasing of this sentence is such that readers may think different acoustic parameters were considered for those with impaired speech compared to those with typical speech. Authors may like to re-phrase this sentence to emphasize that they are referring to how acoustic features were more variable for those with individual variability. A speaker may not ‘use’ acoustic parameters; rather, their speech signal may be characterised by a specific set of (more variable) acoustic parameters.


Author Response

Please find the response to Reviewer 2 in the attached PDF.

Author Response File: Author Response.pdf
