Article
Peer-Review Record

Improving Aphasic Speech Recognition by Using Novel Semi-Supervised Learning Methods on AphasiaBank for English and Spanish

Appl. Sci. 2021, 11(19), 8872; https://doi.org/10.3390/app11198872
by Iván G. Torre *, Mónica Romero and Aitor Álvarez *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 31 July 2021 / Revised: 16 September 2021 / Accepted: 20 September 2021 / Published: 24 September 2021
(This article belongs to the Special Issue Applications of Speech and Language Technologies in Healthcare)

Round 1

Reviewer 1 Report

The paper can be accepted in its present form.

Author Response

"Please see the attachment."

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper focuses on speech recognition using a semi-supervised learning approach for patients suffering from an aphasic condition. 

The topic is interesting and worth investigating. The paper includes an adequate literature review and the purpose of the paper is clearly defined.

The authors are kindly asked to mention why they have not chosen a cross-validation approach, given the relatively limited corpus.
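
To make the request concrete, a speaker-grouped k-fold split can be sketched as follows. This is only an illustration under assumed inputs: the `(utterance_id, speaker_id)` pairs and the function name `speaker_kfold` are hypothetical and are not taken from the manuscript or from AphasiaBank.

```python
from collections import defaultdict


def speaker_kfold(utterances, k):
    """Yield (train, test) lists of utterance ids with no speaker in both.

    `utterances` is a list of (utterance_id, speaker_id) pairs. Both names
    are hypothetical placeholders, not fields of the actual corpus.
    """
    by_speaker = defaultdict(list)
    for utt_id, speaker in utterances:
        by_speaker[speaker].append(utt_id)

    # Assign whole speakers to folds, largest speaker first, always into the
    # currently smallest fold, so that fold sizes stay roughly balanced.
    folds = [[] for _ in range(k)]
    sizes = [0] * k
    for speaker in sorted(by_speaker, key=lambda s: (-len(by_speaker[s]), s)):
        i = sizes.index(min(sizes))
        folds[i].append(speaker)
        sizes[i] += len(by_speaker[speaker])

    # Each fold serves once as the test set; the remaining folds form training.
    for i in range(k):
        test = [u for s in folds[i] for u in by_speaker[s]]
        train = [u for j in range(k) if j != i
                 for s in folds[j] for u in by_speaker[s]]
        yield train, test
```

Because whole speakers are assigned to folds, every utterance is tested exactly once and no speaker appears on both sides of a split, which also addresses speaker leakage in a small corpus.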

The achieved CER and WER are rather low and should be compared to the state of the art in the domain.

It would be advisable to share the resulting models (and perhaps the tools) with other researchers through a public repository, such as Zenodo or GitHub, in order to promote research reproducibility.

The authors could also consider including an example of translated speech from the considered dataset.

Author Response

"Please see the attachment."

Author Response File: Author Response.pdf

Reviewer 3 Report

This article considers the XLSR-53 model (ref. [45]), built on top of the wav2vec 2.0 architecture (ref. [27]) and pre-trained on 53 languages, including Spanish and English, to improve aphasic speech recognition. Several metrics are calculated, such as Word Error Rate, on which the authors' best model achieves better results than the given baseline for all severity levels of aphasia.

The article is well written and organized, giving a general and clear overview of the problem.

The novelty of the work lies in the evaluation on the Spanish dataset, for which only 2.2 h of transcribed speech was available. No severity levels are annotated there; however, the resulting metrics can establish a new baseline for Spanish aphasic speech recognition.

The use of state-of-the-art ASR techniques for aphasic speech seems particularly important when dealing with small audio datasets and it contributes to the ASR domain by studying in detail how the semi-supervised methods can improve recognition performance.

Questions/comments:

- For the Aphasic Spanish dataset, does it include Latin American speakers?

- It is not clear to me how the speakers are divided into folds (train, validation, test). Is it ensured that speakers seen at training time are not repeated at test time? The splits for the presented baseline scenarios are different from the splits used by the authors.

- Table 3. What does AM refer to?

- Section 5. Why are CER and WER calculated, and not PER and WER as in the presented baseline scenarios? In my opinion, the explanation of which metrics are used, and why, could be improved and expanded.
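
For reference on the metric question, WER and CER are the same edit-distance calculation applied to different units (words versus characters); PER applies it to phoneme sequences, which additionally requires a phonetic transcription. A minimal sketch, assuming plain-string transcripts and a non-empty reference; the function names are illustrative and this is not the authors' scoring script:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions,
    insertions and deletions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count.
    Assumes the reference contains at least one word."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits (spaces included)
    divided by reference length. Assumes a non-empty reference."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# One wrong vowel costs a whole word in WER but a single character in CER.
print(wer("the cat sat", "the cat sit"))  # 1 error over 3 words
print(cer("the cat sat", "the cat sit"))  # 1 error over 11 characters
```

The example shows why the two metrics are not interchangeable with PER-based baselines: the same recognition error is weighted very differently depending on the unit of comparison.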

Minor comments

- Table 3. For BLSTM-RNN, the "very severe" scenario should be 63.17 instead of 53.17, according to the cited paper.

- Tables 4 and 5. The first column gives the same information in every row. It can be deleted and mentioned in the caption or in the text instead. The two tables could even be merged, since their first four columns are the same.

- The metrics should be marked with % in the tables where applicable, since the same values are given with % in the text.

Author Response

"Please see the attachment."

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

I would like to thank the authors for the changes made.
