A Case of Speciﬁcity: How Does the Acoustic Voice Quality Index Perform in Normophonic Subjects?

: The acoustic voice quality index (AVQI) is a multiparametric tool based on six acoustic measurements to quantify overall voice quality in an objective manner, with the smoothed version of the cepstral peak prominence (CPPS) as its main contributor. In the last decade, many studies demonstrated its robust diagnostic accuracy and high sensitivity to voice changes across voice therapy in di ﬀ erent languages. The aim of the present study was to provide information regarding AVQI’s and CPPS’s performance in normophonic non-treatment-seeking subjects, since these data are still scarce; concatenated voice samples, consisting of sustained vowel phonation and continuous speech, from 123 subjects (72 females, 51 males; between 20 and 60 years old) without vocally relevant complaints were evaluated by three raters and run in AVQI v.02.06. According to this auditory-perceptual evaluation, two cohorts were set up (normophonia versus slight perceived dysphonia). First, gender e ﬀ ects were investigated. Secondly, between-cohort di ﬀ erences in AVQI and CPPS were investigated. Thirdly, with the number of judges giving G = 1 to partition three sub-levels of slight hoarseness as an independent factor, di ﬀ erences in AVQI and CPPS across these sub-levels were investigated; for AVQI, no signiﬁcant gender e ﬀ ect was found, whereas, for CPPS, signiﬁcant trends were observed. For both AVQI and CPPS, no signiﬁcant di ﬀ erences were found between normophonic and slightly dysphonic subjects. For AVQI, however, this di ﬀ erence did approach signiﬁcance; these ﬁndings emphasize the need for a normative study with a greater sample size and subsequently greater statistical power to detect possible signiﬁcant e ﬀ ects and di ﬀ erences. F.W.; data C.B.; and and and


Introduction
In the last few decades, it has been a challenge to standardize the evaluation of overall voice quality in practice. Until now, auditory-perceptual evaluation of voice quality has been considered to be the gold standard due to its accessibility and the perceptual nature of voice quality. Subsequently, it

Participants
Normative data were collected from 123 subjects (72 females and 51 males) between 20 and 60 years old. Participants were not selected at random but were recruited by way of snowball sampling, since there was an appeal on volunteers such as family members, friends, and acquaintances. They mainly originated from the West Flemish region of Belgium, Europe. All participants included did not have any voice-related complaints, were not diagnosed with vocal pathology, and were not seeking voice treatment.

Voice Recordings
All subjects were asked to sustain the vowel /a:/ for at least 5 s and to read a Dutch phonetically balanced text at comfortable pitch and loudness. Both types of voice samples were recorded using an AKG C420 head-mounted condenser microphone (AKG Acoustics, München, Germany), digitized at a sampling rate of 44.1 kHz and a resolution of 16 bits using the Computerized Speech Lab model (KayPentax, Lincoln Park, NJ, USA), and saved in WAV-format. The microphone was positioned at 8-10 centimeters and at a 45 • azimuthal angle from the mouth. Recording took place in an anechoic audiometric cabin with a low mean ambient noise of 38 dBA. Vowel samples were edited to include only the medial 3 s and continuous speech samples were cropped to contain only the first two sentences of the speech task.

Overall Dysphonia Ratings
Using the software Praat both vowel samples and continuous speech samples were edited and emerged resulting into a concatenation of the first two sentences of the speech task, a pause of two seconds, followed by the medial three seconds of the vowel /a:/. The concatenated samples were evaluated by three raters, of whom one experienced speech-language pathologist and two final-year speech-language pathology students with minimal experience with dysphonic voices and voice pathology. Perceptual overall voice quality (i.e., hoarseness) was evaluated using the first parameter 'Grade' (i.e., G) following the GRBAS scale [2], proposed by the Japan Society of Logopedics and Phoniatrics. As recommended by Wuyts et al. [38], a 4-point equal-appearing interval scale was used (i.e., 0 = normally or absence of hoarseness, 1 = slightly hoarse, 2 = moderately hoarse, 3 = severely hoarse). Rating occurred fully separately and independently. Only those voice samples that were unanimously rated as G = 0 by all three raters, were considered to be true normophonic. Consequently, two cohorts were set up, with the first cohort consisting of 83 subjects with no perceived dysphonia and the second cohort consisting of 40 subjects with slight perceived dysphonia. Furthermore, three sub-levels of slight hoarseness were obtained according to the number of judges rating G as 1 (1 = majority normophonia, 2 = majority slight dysphonia, 3 = unanimously slight dysphonia).

Acoustic Measures and AVQI
Because certain acoustic measures used in AVQI are only valid for voiced segments, the continuous speech samples needed to be edited. Detection and extraction of voiced segments from the first two sentences of the speech task were performed in the software Praat using the extraction Praat-script from Maryn et al. (2009) [5], an algorithm based on three criteria, proposed by Parsa and Jamieson (2001) [6]. A segment was considered to be voiced if (a) sound energy exceeded 30% of the overall sound energy, (b) zero crossing rate was less than 1500 Hz, and (c) the normalized autocorrelation peak was above 0.3. Subsequently, the medial three seconds of the vowel /a:/ were appended to the voiced segments, resulting in a concatenation sample with a correspondent single sound waveform. The six acoustic measures implemented in AVQI were obtained using the software Praat. First, harmonics-to-noise-ratio (i.e., HNR) was specified as the base-10-logarithm of the ratio between the periodic energy and the noise energy multiplied by 10. Secondly, shimmer local (i.e., SL) was the absolute mean difference between the amplitudes of successive periods divided by the average amplitude. Thirdly, the shimmer local dB (i.e., SLdB) was obtained as the base-10-logarithm of the differences between the amplitudes of successive periods multiplied by 20. Fourthly, the general spectral slope (i.e., Slope) was calculated as the difference between the energy in the 0-1 kHz range and the energy in the 1-10 kHz range of the long-term average spectrum. Fifthly, the spectral trendline inclination (i.e., Tilt) was measured as the difference between the energy in the 0-1 kHz range and the energy in the 1-10 kHz range of the trendline through the long-term average spectrum. Finally, the sixth and main acoustic measure implemented in AVQI, smoothed cepstral peak prominence (i.e., CPPS), was calculated as the distance between the first rahmonic's peak and the point with equal quefrency on the regression line through the smoothed cepstrum. To determine the CPPS in the software Praat on a Windows or Mac computer, the following steps were completed: First, click on the "Analyze periodicity -" menu, choose "To PowerCepstrogram . . . ", and then complete the "Sound: To PowerCepstrogram" form as shown in Figure 1. Second, query the resulting PowerCepstrogram object by clicking on the "Query -" menu, choose the "Get CPPS . . . " option, and then finally complete the "PowerCepstrogram: Get CPPS" form as shown in Figure 2. The result of these steps is that the CPPS is shown as a value in dB in a "Praat Info" screen. Ultimately, AVQI (v.02.06) was calculated according to the following regression formula:

Statistical Analysis
Statistical analysis was completed using the software package SPSS for Windows version 15 (SPSS Inc., Chicago, IL, USA). Non-parametric statistical tests were used for small sample sizes (N < 30), as for larger sample sizes (N ≥ 30) parametric statistical tests were used. Assumptions of normality were checked and verified using the one-sample Kolmogorov-Smirnov test. First, gender effects for AVQI and CPPS were investigated using the Student t-test. Secondly, between-cohort differences in AVQI and CPPS were investigated using the Mann-Whitney U test. Thirdly, differences across the normophonic cohort and the three sub-levels of slight dysphonia were investigated using the omnibus Kruskal-Wallis H test and post-hoc Mann-Whitney U tests. All results were considered statistically significant at p < 0.05. Upper and lower cut-off values were calculated using the standard deviation value: mean + and − (2 × standard deviation), respectively.

Statistical Analysis
Statistical analysis was completed using the software package SPSS for Windows version 15 (SPSS Inc., Chicago, IL, USA). Non-parametric statistical tests were used for small sample sizes (N < 30), as for larger sample sizes (N ≥ 30) parametric statistical tests were used. Assumptions of normality were checked and verified using the one-sample Kolmogorov-Smirnov test. First, gender effects for AVQI and CPPS were investigated using the Student t-test. Secondly, between-cohort differences in AVQI and CPPS were investigated using the Mann-Whitney U test. Thirdly, differences across the normophonic cohort and the three sub-levels of slight dysphonia were investigated using the omnibus Kruskal-Wallis H test and post-hoc Mann-Whitney U tests. All results were considered statistically significant at p < 0.05. Upper and lower cut-off values were calculated using the standard deviation value: mean + and − (2 × standard deviation), respectively.

Statistical Analysis
Statistical analysis was completed using the software package SPSS for Windows version 15 (SPSS Inc., Chicago, IL, USA). Non-parametric statistical tests were used for small sample sizes (N < 30), as for larger sample sizes (N ≥ 30) parametric statistical tests were used. Assumptions of normality were checked and verified using the one-sample Kolmogorov-Smirnov test. First, gender effects for AVQI and CPPS were investigated using the Student t-test. Secondly, between-cohort differences in AVQI and CPPS were investigated using the Mann-Whitney U test. Thirdly, differences across the normophonic cohort and the three sub-levels of slight dysphonia were investigated using the omnibus Kruskal-Wallis H test and post-hoc Mann-Whitney U tests. All results were considered statistically significant at p < 0.05. Upper and lower cut-off values were calculated using the standard deviation value: mean + and − (2 × standard deviation), respectively.

Normative Data for AVQI and CPPS
Descriptive data of AVQI and CPPS in normophonic and slightly dysphonic subjects are presented in Table 2, for both men and women. In the true normophonic cohort, a mean AVQI score of 2.48 was found. The upper limit of the 95% prediction interval was 4.04. For CPPS, mean values of CPPS vowel , CPPS sentences , and CPPS combination were 15.86, 7.45, and 14.27, respectively. Lower limits of their 95% prediction intervals were 10.84, 5.91, and 10.63, respectively.

Normal Versus Subclinical Voice Quality
Although mean AVQI was higher and mean CPPS vowel and mean CPPS combination were lower in the subclinical group (Table 4), none of the differences between normal and subclinical voice quality levels reached significance.

Subclinical Levels of Dysphonia
AVQI and CPPS data for the different subclinical levels of dysphonia are provided in Table 5. As more judges rated G as 1 (i.e., slight dysphonia), the higher mean AVQI values were observed. A significant difference was found between the true normophonic cohort and the second subclinical dysphonia cohort (Figure 3a, U = 392.00, p = 0.028). For CPPS vowel , a declining trend was observed with a significant difference between the true normophonic and the second subclinical dysphonia cohort (Figure 3b, U = 422.00, p = 0.048). For CPPS sentences and CPPS combination no consistent trend was observed (Figure 3c,d, respectively). Comparisons with the third subclinical cohort were not conducted since this cohort included only three subjects and the statistical power consequently would be too low.

Discussion
This study investigated the performance of AVQI and CPPS in a strictly normophonic subject group to enhance the interpretation of dysphonic voice samples and to increase the clinical utility of these measures in the assessment of overall voice quality. In 2009, Maryn et al. [5] developed the multiparametric tool AVQI and found a threshold score of 2.95 using receiver operating characteristics (ROC) curve analysis. This value needs to be interpreted as a cut-off score to discriminate between normophonic and dysphonic voices with a weighted sensitivity and specificity of 74% and 96%, respectively. In the present study, an upper cut-off value of 4.04 was found, which implies a very high specificity of 97.5% but low sensitivity. To our knowledge, only one other normative study on AVQI has been conducted. In 2017, Barsties et al. [39] performed AVQI analysis on concatenated voice samples of 123 vocally healthy Lithuanian speaking individuals and investigated the influence of age and gender. They reported a mean AVQI score of 2.32 and an upper limit value of 3.90, which approaches our findings. Similar to the present study, no significant gender effects were found. Consequently, one could suggest that AVQI is relatively unaffected by genderbased differences in vocal anatomy and physiology. In the present study, mean AVQI values were higher in subclinical subjects compared to normophonic subjects. This finding, however, did not reach significance. Furthermore, as more judges rated G as 1 (i.e., slight dysphonia), the higher mean AVQI values were observed. A significant difference was only found between the true normophonic cohort and the second subclinical dysphonia cohort.

Discussion
This study investigated the performance of AVQI and CPPS in a strictly normophonic subject group to enhance the interpretation of dysphonic voice samples and to increase the clinical utility of these measures in the assessment of overall voice quality. In 2009, Maryn et al. [5] developed the multiparametric tool AVQI and found a threshold score of 2.95 using receiver operating characteristics (ROC) curve analysis. This value needs to be interpreted as a cut-off score to discriminate between normophonic and dysphonic voices with a weighted sensitivity and specificity of 74% and 96%, respectively. In the present study, an upper cut-off value of 4.04 was found, which implies a very high specificity of 97.5% but low sensitivity. To our knowledge, only one other normative study on AVQI has been conducted. In 2017, Barsties et al. [39] performed AVQI analysis on concatenated voice samples of 123 vocally healthy Lithuanian speaking individuals and investigated the influence of age and gender. They reported a mean AVQI score of 2.32 and an upper limit value of 3.90, which approaches our findings. Similar to the present study, no significant gender effects were found. Consequently, one could suggest that AVQI is relatively unaffected by gender-based differences in vocal anatomy and physiology. In the present study, mean AVQI values were higher in subclinical subjects compared to normophonic subjects. This finding, however, did not reach significance. Furthermore, as more judges rated G as 1 (i.e., slight dysphonia), the higher mean AVQI values were observed. A significant difference was only found between the true normophonic cohort and the second subclinical dysphonia cohort.
It is also interesting to compare our CPPS values to those found in other studies with normative data, also using CPPS measurement in the software Praat. Latoszek et al. [40] performed CPPS measurements on concatenated voice samples of 530 normophonic voices and found a mean CPPS value of 11.92 (SD = 2.15). Phadke et al. [41] reported a mean value of CPPS of 13.9 (SD = 1.9) and 10.5 (SD = 1.2) for 40 normophonic vowel samples and speech samples, respectively. Heman-Ackah et al. [35] reported a mean CPPS value of 4.77 (SD = 0.97) for 87 normophonic speech samples using SpeechTool software and reported a cut-off value of 4.0 discriminating between normal and dysphonic voices with a sensitivity and specificity of 92.4% and 79% respectively, using receiver operating characteristic (ROC) curve analysis. These differences in CPPS values can be explained by different settings and methodology within and among software packages. Furthermore, comparisons with CPPS values found in the present study can only be carried out when similar settings in Praat are used (Figures 1  and 2). In the present study, specific significant gender effects were found for CPPS. Significantly higher values of CPPS vowel (t = −3.734, p < 0.001) and CPPS combination (t = −3.028, p = 0.004) were found in males and significantly higher values of CPPS sentences (t = 2.182, p = 0.032) in females (Table 5). It is known that the type of speech task causes important variability in the perceptual-auditory evaluation of voice quality. A previous study conducted by Maryn et al. [42] showed that the two speech tasks-i.e., sustained vowels and continuous speech-yielded significant differences in their ratings of degree of dysphonia severity (i.e., Grade). There is a tendency towards significantly higher G scores for sustained vowel samples than for continuous speech samples. The gender effect of these different types of speech tasks has however never been investigated. Based on the present study, the presumption could be made that, due to gender-related anatomical and/or physiological differences, males are somehow more fit to produce a clean normophonic sustained vowel and females are somehow more fit to produce a normophonic continuous speech task. This is, however, merely an assumption, which could be interesting to explore in future studies. Sustained vowel samples and concatenated samples show the same trend, which can be explained by the fact that in this study the sustained vowel task represents a greater contribution to the concatenated sample than the continuous speech task does.
There are some limitations regarding this study that are worth mentioning. First, the perceptual evaluation was performed by three raters, of whom two were final-year speech-language pathology students with only limited experience with dysphonic voices. Secondly, subjects mainly originated from the West-Flanders region of Belgium and were recruited by way of snowball sampling, since there was an appeal to volunteers such as family members, friends, and acquaintances. The relevance of this limitation can, however, be questioned, since multiple studies in the last decade already showed that AVQI is relatively unaffected by inter-language phonetic differences. Consequently, the influence of dialect can be negligible. Finally, only 123 subjects between 20 and 60 years old were included in the present study. In the future, a normative study with a greater number of subjects divided into different age groups is warranted to investigate whether AVQI and CPPS are able to differentiate significantly between the true normophonic cohort and the subclinical slight dysphonia cohort, as well to explore possible age effects. If AVQI and/or CPPS are found to be such sensitive measures to remark such small differences between normophonia and slight dysphonia, they could be useful as complementary tools in practice to support perceptual evaluation of voice quality in case of doubt.

Conclusions
In conclusion, the present study provided norm-referencing data of AVQI and CPPS to increase their clinical utility in practice. For AVQI, no significant gender effects were found. Consequently, one could suggest that AVQI is relatively unaffected by gender-based differences in vocal anatomy and physiology. For both AVQI and CPPS, no significant differences were found between normophonic and slightly dysphonic subjects, which emphasizes the need of a normative study with a greater sample size and subsequently greater statistical power to detect possible significant effects and differences. Future studies should be conducted with the newest version of AVQI (v.03.01). Conflicts of Interest: Youri Maryn has financial interest in an instructional and commercial website-based company (Phonanium) that provides information and sells software described in this manuscript.

Abbreviations
AVQI acoustic voice quality index CAPE-V consensus auditory-perceptual evaluation of voice CPPS smoothed version of the cepstral peak prominence G grade GRBAS scale grade, roughness, breathiness, asthenia and strain scale HNR harmonics-to-noise-ratio SL shimmer local SLdB shimmer local dB Slope general spectral slope Tilt spectral trendline inclination