Peer-Review Record

Word Categorization of Vowel Durational Changes in Speech-Modulated Bone-Conducted Ultrasound

Audiol. Res. 2021, 11(3), 357-364; https://doi.org/10.3390/audiolres11030033
by Tadao Okayasu 1,*, Tadashi Nishimura 1, Akinori Yamashita 1, Yoshiki Nagatani 2, Takashi Inoue 3, Yuka Uratani 1, Toshiaki Yamanaka 1, Hiroshi Hosoi 4 and Tadashi Kitahara 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 31 May 2021 / Revised: 3 July 2021 / Accepted: 12 July 2021 / Published: 14 July 2021
(This article belongs to the Special Issue Bone and Cartilage Conduction)

Round 1

Reviewer 1 Report

The paper studied the transmission of prosodic information via speech-modulated bone-conducted ultrasound (SM-BCU) and compared it with that via air-conducted audible sound (ACAS). The paper is well organized, and the results are clear. I have only a few minor comments.

1) Could the authors discuss whether the results derived from ‘hato’ and ‘haato’ are also valid for other vowels, such as in ‘tori’ and ‘toori’?

2) In line 108, is reference [6] a previous study by any of the authors?

  1. Lenhardt M.L., Skellett R., Wang P., Clarke A.M. Human ultrasonic speech perception. Science 1991, 253, 82–85. 208

3) Please fix the grammatical error in the first sentence of the Abstract.

4) The authors did not randomize the ACAS experiment with the SM-BCU experiment. Could the difference in threshold between ACAS and SM-BCU result from this?

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The manuscript describes a study of the perception of word duration. Words from a vowel-duration continuum were presented via the normal acoustic pathway or via bone-conducted ultrasound. Word-identification functions were fitted with a logistic function from which two dependent measures were extracted. The first of these, referred to as the categorization boundary or simply the threshold, was the midpoint of the identification function, the duration for which two responses were equally likely. That's a straightforward measurement in speech-identification research. The manuscript would be strengthened if the actual identification data were shown, in addition to the fitted curves in Fig. 2. That wouldn't be difficult, given that only 8 listeners participated, and it would enable the reader to judge the quality of the fitting.

The threshold for bone-conducted ultrasound was about 5 msec longer than the threshold for acoustic speech. That difference reached statistical significance, but whether it's a meaningful difference seems to me to be an open question. The manuscript devotes too much text and a figure to an attempt to explain the difference in terms of fine structure; I did not find that speculation to be convincing because it's based on such a small amount of data. If a consistent duration effect were demonstrated over a range of words, vowels, durations, and talkers, then I'd be willing to consider this interpretation. At this point, the small threshold difference seems more likely to be fortuitous.

A better use of the Discussion would be to consider some more subjective aspects of the results. Did the listeners perceive the bone-conducted sounds as speech-like? If not, how were they different? Did they respond because they actually understood the words, or did they base their responses on some other attribute?
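
To make the request for raw data next to the fitted curve concrete, here is a minimal sketch of how an identification function and its midpoint might be computed and tabulated. Everything in it is an assumption for illustration: the durations and response counts are hypothetical (not the study's data), the function names `logistic` and `fit_logistic` are invented here, and the fit is a simple least-squares line through logit-transformed proportions rather than whatever procedure the authors used (a maximum-likelihood fit would usually be preferred).

```python
import math


def logistic(d, midpoint, k):
    """Probability of a 'long vowel' response at vowel duration d (ms)."""
    return 1.0 / (1.0 + math.exp(-k * (d - midpoint)))


def fit_logistic(durations, n_long, n_trials):
    """Least-squares line through logit-transformed response proportions.

    Counts are nudged by 0.5 so that proportions of exactly 0 or 1 still
    have a finite logit. Returns (midpoint, k), where k is the steepness
    of the logistic function.
    """
    xs, ys = [], []
    for d, c in zip(durations, n_long):
        p = (c + 0.5) / (n_trials + 1.0)
        xs.append(d)
        ys.append(math.log(p / (1.0 - p)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    intercept = mean_y - k * mean_x
    # The midpoint is where the fitted logit crosses zero, i.e. p = 0.5.
    midpoint = -intercept / k
    return midpoint, k


# Hypothetical listener: 7 tokens on a duration continuum, 10 trials each.
durations = [60, 80, 100, 120, 140, 160, 180]   # ms, assumed values
n_long = [0, 1, 2, 5, 8, 9, 10]                 # 'long' responses, assumed
n_trials = 10

midpoint, k = fit_logistic(durations, n_long, n_trials)
print(f"categorization boundary (midpoint): {midpoint:.1f} ms")
print(f"logistic steepness k: {k:.4f} per ms")

# Observed proportions next to fitted values, so fit quality can be judged.
for d, c in zip(durations, n_long):
    fitted = logistic(d, midpoint, k)
    print(f"{d:4d} ms  observed {c / n_trials:.2f}  fitted {fitted:.2f}")
```

Listing the observed proportions beside the fitted values, per listener, is the kind of presentation that would let a reader judge the fit directly.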

The second measure is called the “differential threshold”.  That term, or something similar to it, is more appropriately applied to data in which a listener hears two or more sounds, and makes a judgment about whether one of them is different from the others.  That is not what was done in this study.  What is actually being reported here is a number that reflects the slope of the fitted identification function.  It would be preferable to convert those numbers to actual slopes and label them correctly.  The slope is related to the difference threshold, but it is not identical to it.
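
As an illustration of how a width-type number relates to a slope, consider a two-parameter logistic identification function. The parameterization and the symbols $t$ (vowel duration), $t_0$ (midpoint) and $k$ (steepness) are assumptions made here; the record does not state how the manuscript defines its "differential threshold", so the 25%–75% width below is only one plausible reading.

```latex
\begin{align}
p(t) &= \frac{1}{1 + e^{-k\,(t - t_0)}}, \\
\left.\frac{dp}{dt}\right|_{t = t_0} &= k\,p(t_0)\,\bigl(1 - p(t_0)\bigr) = \frac{k}{4}, \\
t_{75} - t_{25} &= \frac{2\ln 3}{k}.
\end{align}
```

Under these assumptions the slope at the boundary and the 25%–75% width are inversely related through $k$, so either can be converted to the other, but they are different quantities with different units (proportion per ms versus ms).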

Specific comments:

line 78:  "20 dB HL or lower" might be clearer than "within 20 dB HL"

lines 101-102:  I think this means each token was presented 10 times, for a total of 70 words.  Re-phrase to make what was done clearer.

line 103:  It's unclear what "seven sessions in random order" means, since the stimuli presented in each session seem to be identical.

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

My comments have been addressed.  However, I continue to think that some actual data should have been presented (individual fitted functions in Fig. 2 are not the actual data).
