Advances in Clinical Voice Quality Analysis with VOXplot
Abstract
1. Introduction
2. Materials and Methods
2.1. Participants
2.2. Auditory-Perceptual Judgment
2.3. Acoustic Measurements
2.4. Statistics
3. Results
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Group | Type of Dysphonia | Number | Female | Male | Age in Years (Mean) | Age in Years (SD) |
---|---|---|---|---|---|---|
Dysphonia group | Carcinoma of head and neck | 55 | 13 | 42 | 61.25 | 10.18 |
Dysphonia group | Functional dysphonia | 38 | 26 | 12 | 52.11 | 16.48 |
Dysphonia group | Larynx carcinoma | 28 | 1 | 27 | 69.96 | 9.05 |
Dysphonia group | Paralyses | 25 | 14 | 11 | 63.36 | 16.09 |
Dysphonia group | Nodules | 8 | 5 | 3 | 33.25 | 19.43 |
Dysphonia group | Reflux laryngitis | 4 | 4 | 0 | 54.50 | 5.45 |
Dysphonia group | Cancer of unknown primary syndrome | 4 | 2 | 2 | 61.00 | 8.21 |
Dysphonia group | Mutational falsetto | 3 | 0 | 3 | 15.67 | 3.06 |
Dysphonia group | Leukoplakia | 2 | 0 | 2 | 57.00 | 8.49 |
Dysphonia group | Granuloma | 2 | 0 | 2 | 42.00 | 11.31 |
Dysphonia group | Laryngitis | 2 | 1 | 1 | 39.50 | 12.02 |
Dysphonia group | Parkinson’s | 2 | 0 | 2 | 74.00 | 11.31 |
Dysphonia group | Polyp | 1 | 0 | 1 | 60.00 | - |
Dysphonia group | Laryngeal trauma | 1 | 0 | 1 | 78.00 | - |
Control group | None | 43 | 23 | 20 | 26.79 | 7.06 |
Category | Acoustic Measure | Abbreviation (Unit) |
---|---|---|
Fourier and linear prediction coefficient spectra | Smoothed cepstral peak prominence: the distance between the first rahmonic peak and the point with equal quefrency on the regression line through the smoothed cepstrum. | CPPS (dB) |
Fourier and linear prediction coefficient spectra | Difference between the amplitudes of the first and second harmonics in the spectrum. To localize the first harmonic peak, a cepstrum was computed for F0 determination. | H1H2 (dB) |
Fourier and linear prediction coefficient spectra | Relative level of high-frequency noise: the energy from 0 to 6 kHz relative to the energy from 6 to 10 kHz. | HF-Noise (dB) |
Fourier and linear prediction coefficient spectra | Harmonics-to-noise ratio: the base-10 logarithm of the ratio between the periodic energy and the noise energy, multiplied by 10. | HNR (dB) |
Fourier and linear prediction coefficient spectra | Harmonics-to-noise ratio from Dejonckere and Lebacq, which analyzes the harmonic emergence in the spectral display within the frequency band between 500 Hz and 1500 Hz. A cepstrum was computed to determine F0 and thus to localize the harmonic structure in the long-term average spectrum. | HNR-D (dB) |
Fourier and linear prediction coefficient spectra | General slope of the spectrum: the difference between the energy within 0–1000 Hz and the energy within 1000–10,000 Hz of the long-term average spectrum. | Slope (dB) |
Fourier and linear prediction coefficient spectra | Tilt of the regression line through the spectrum: the difference between the energy within 0–1000 Hz and the energy within 1000–10,000 Hz of the trendline through the long-term average spectrum. | Tilt (dB) |
Frequency short-term perturbation measures | Period standard deviation: the standard deviation of the period lengths. The length of the sample is important for a valid computation of the standard deviation. | PSD (ms) |
Frequency short-term perturbation measures | Jitter local (first of two jitter variants): the average absolute difference between successive periods, divided by the average period. | Jitter local (%) |
Frequency short-term perturbation measures | Five-point period perturbation quotient (second jitter variant): the average absolute difference between a period and the average of it and its four closest neighbors, divided by the average period. | PPQ5 (%) |
Amplitude short-term perturbation measures | Shimmer local (first of two shimmer variants): the average absolute difference between the amplitudes of successive periods, divided by the average amplitude. | Shimmer (%) |
Amplitude short-term perturbation measures | Shimmer local dB (second shimmer variant): the average absolute base-10 logarithm of the difference between the amplitudes of successive periods, multiplied by 20. | Shimmer (dB) |
Combined spectral and perturbation features | Glottal-to-noise excitation (GNE) ratio with a maximum frequency of 4500 Hz. | GNE |
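
The perturbation and noise definitions in the table above can be made concrete with a minimal Python sketch. It is not the VOXplot implementation: it assumes that the glottal period lengths and peak amplitudes have already been extracted from the signal, it takes absolute differences throughout, and it interprets the "difference" in Shimmer (dB) as the amplitude ratio of successive periods. All function names are hypothetical.

```python
import math
from typing import Sequence


def _local_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference of successive values over the mean value, in %."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    mean_value = sum(values) / len(values)
    return 100.0 * (sum(diffs) / len(diffs)) / mean_value


def jitter_local(periods: Sequence[float]) -> float:
    """Jitter local (%) from a sequence of period lengths."""
    return _local_perturbation(periods)


def shimmer_local(amplitudes: Sequence[float]) -> float:
    """Shimmer local (%) from a sequence of per-period peak amplitudes."""
    return _local_perturbation(amplitudes)


def ppq5(periods: Sequence[float]) -> float:
    """Five-point period perturbation quotient (%)."""
    n = len(periods)
    if n < 5:
        raise ValueError("PPQ5 needs at least five periods")
    # Deviation of each period from the mean of itself and its four neighbors.
    devs = [
        abs(periods[i] - sum(periods[i - 2 : i + 3]) / 5.0)
        for i in range(2, n - 2)
    ]
    mean_period = sum(periods) / n
    return 100.0 * (sum(devs) / len(devs)) / mean_period


def shimmer_local_db(amplitudes: Sequence[float]) -> float:
    """Shimmer local (dB); assumes the log is taken of the amplitude ratio."""
    if len(amplitudes) < 2:
        raise ValueError("need at least two amplitudes")
    steps = [
        abs(20.0 * math.log10(b / a))
        for a, b in zip(amplitudes[:-1], amplitudes[1:])
    ]
    return sum(steps) / len(steps)


def hnr_db(periodic_energy: float, noise_energy: float) -> float:
    """Harmonics-to-noise ratio: 10 * log10(periodic energy / noise energy)."""
    return 10.0 * math.log10(periodic_energy / noise_energy)


if __name__ == "__main__":
    # Invented period lengths in seconds, for illustration only.
    toy_periods = [0.0100, 0.0102, 0.0099, 0.0101, 0.0100, 0.0098]
    print(jitter_local(toy_periods), ppq5(toy_periods))
```

Because every measure is normalized by the mean period or mean amplitude, the percentages are independent of the unit in which periods and amplitudes are stored.
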
Voice Quality Parameter | Validation Parameter | Hoarseness | Breathiness |
---|---|---|---|
CPPS (dB) | Correlation | −0.76 * | −0.81 * |
CPPS (dB) | AROC | 0.823 * | 0.915 ** |
CPPS (dB) | Threshold | 15.02 dB | 14.47 dB |
CPPS (dB) | Sensitivity | 84.7% | 88.1% |
CPPS (dB) | Specificity | 71.2% | 81.7% |
GNE | Correlation | −0.70 | −0.78 * |
GNE | AROC | 0.798 * | 0.886 * |
GNE | Threshold | 0.91 | 0.89 |
GNE | Sensitivity | 88.9% | 91.7% |
GNE | Specificity | 62.3% | 74.3% |
H1H2 (dB) | Correlation | 0.03 | 0.12 |
H1H2 (dB) | AROC | 0.448 | 0.584 |
H1H2 (dB) | Threshold | Chance-level based on AROC | 6.39 dB |
H1H2 (dB) | Sensitivity | Chance-level based on AROC | 40.4% |
H1H2 (dB) | Specificity | Chance-level based on AROC | 82.6% |
HNR (dB) | Correlation | −0.71 * | −0.56 |
HNR (dB) | AROC | 0.812 * | 0.794 * |
HNR (dB) | Threshold | 23.34 dB | 23.34 dB |
HNR (dB) | Sensitivity | 90.3% | 78.9% |
HNR (dB) | Specificity | 62.9% | 68.5% |
HNR-D (dB) | Correlation | −0.57 | −0.38 |
HNR-D (dB) | AROC | 0.760 * | 0.701 * |
HNR-D (dB) | Threshold | 31.77 dB | 24.23 dB |
HNR-D (dB) | Sensitivity | 61.1% | 77.1% |
HNR-D (dB) | Specificity | 80.8% | 53.2% |
HF-Noise (dB) | Correlation | −0.48 | −0.49 |
HF-Noise (dB) | AROC | 0.698 | 0.728 * |
HF-Noise (dB) | Threshold | 2.28 dB | 2.29 dB |
HF-Noise (dB) | Sensitivity | 80.6% | 77.1% |
HF-Noise (dB) | Specificity | 54.1% | 62.4% |
Jitter local (%) | Correlation | 0.68 | 0.57 |
Jitter local (%) | AROC | 0.839 * | 0.808 * |
Jitter local (%) | Threshold | 0.50% | 0.57% |
Jitter local (%) | Sensitivity | 70.8% | 71.0% |
Jitter local (%) | Specificity | 84.7% | 78.0% |
PPQ5 (%) | Correlation | 0.71 * | 0.55 |
PPQ5 (%) | AROC | 0.833 * | 0.799 * |
PPQ5 (%) | Threshold | 0.29% | 0.32% |
PPQ5 (%) | Sensitivity | 67.2% | 67.0% |
PPQ5 (%) | Specificity | 84.5% | 75.9% |
PSD (ms) | Correlation | 0.59 | 0.41 |
PSD (ms) | AROC | 0.802 * | 0.730 * |
PSD (ms) | Threshold | 0.00012 ms | 0.00018 ms |
PSD (ms) | Sensitivity | 65.3% | 50.5% |
PSD (ms) | Specificity | 81.9% | 88.1% |
Shimmer (%) | Correlation | 0.65 | 0.53 |
Shimmer (%) | AROC | 0.773 * | 0.780 * |
Shimmer (%) | Threshold | 3.08% | 3.58% |
Shimmer (%) | Sensitivity | 53.5% | 57.0% |
Shimmer (%) | Specificity | 91.7% | 90.8% |
Shimmer (dB) | Correlation | 0.66 | 0.55 |
Shimmer (dB) | AROC | 0.783 * | 0.786 * |
Shimmer (dB) | Threshold | 0.27 dB | 0.33 dB |
Shimmer (dB) | Sensitivity | 54.9% | 57.9% |
Shimmer (dB) | Specificity | 91.7% | 91.7% |
Slope (dB) | Correlation | −0.09 | −0.11 |
Slope (dB) | AROC | 0.617 | 0.602 |
Slope (dB) | Threshold | −25.08 dB | −25.34 dB |
Slope (dB) | Sensitivity | 81.9% | 80.7% |
Slope (dB) | Specificity | 39.7% | 43.1% |
Tilt (dB) | Correlation | 0.30 | 0.43 |
Tilt (dB) | AROC | 0.592 | 0.673 |
Tilt (dB) | Threshold | −10.32 dB | −11.73 dB |
Tilt (dB) | Sensitivity | 34.9% | 81.7% |
Tilt (dB) | Specificity | 86.1% | 46.8% |
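
To illustrate how a threshold, sensitivity, specificity and AROC such as those in the table above relate to one another, the following Python sketch applies a cut-off to a small invented data set. Only the 15.02 dB CPPS threshold for hoarseness is taken from the table. The sample values, the labels, the function names and the choice of a strict inequality at the cut-off are assumptions for illustration. The direction flag reflects the sign of the correlation: measures that correlate negatively with severity (for example CPPS) indicate abnormality below the threshold, and measures that correlate positively (for example jitter) indicate it above. The AROC is computed in its rank-based interpretation, as the probability that a randomly chosen dysphonic voice scores as more abnormal than a randomly chosen normal voice.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    values: Sequence[float],
    is_dysphonic: Sequence[bool],
    threshold: float,
    higher_is_abnormal: bool,
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for a single cut-off value."""
    tp = fn = tn = fp = 0
    for value, truth in zip(values, is_dysphonic):
        # Assumption: strict inequality at the cut-off.
        flagged = value > threshold if higher_is_abnormal else value < threshold
        if truth and flagged:
            tp += 1
        elif truth:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auroc(
    values: Sequence[float],
    is_dysphonic: Sequence[bool],
    higher_is_abnormal: bool,
) -> float:
    """Area under the ROC curve via pairwise comparison; ties count one half."""
    sign = 1.0 if higher_is_abnormal else -1.0
    pos = [sign * v for v, t in zip(values, is_dysphonic) if t]
    neg = [sign * v for v, t in zip(values, is_dysphonic) if not t]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented CPPS values (dB) and labels, for illustration only.
toy_cpps = [10.0, 12.5, 14.0, 16.0, 15.5, 17.5, 18.0, 14.5]
toy_labels = [True, True, True, True, False, False, False, False]

if __name__ == "__main__":
    sens, spec = sensitivity_specificity(
        toy_cpps, toy_labels, threshold=15.02, higher_is_abnormal=False
    )
    print(sens, spec, auroc(toy_cpps, toy_labels, higher_is_abnormal=False))
```

An AROC near 0.5, as reported for H1H2 in the hoarseness column, means the measure separates the groups no better than chance, which is why no meaningful threshold can be derived for it.
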
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Barsties v. Latoszek, B.; Mayer, J.; Watts, C.R.; Lehnert, B. Advances in Clinical Voice Quality Analysis with VOXplot. J. Clin. Med. 2023, 12, 4644. https://doi.org/10.3390/jcm12144644