Relationship of Cepstral Peak Prominence-Smoothed and Long-Term Average Spectrum with Auditory–Perceptual Analysis

Abstract: Cepstral peak prominence-smoothed (CPPs) and long-term average spectrum (LTAS) are robust measures that represent the glottal source and source-filter interactions, respectively. Until now, little has been known about how physiological events impact auditory–perceptual characteristics in the objective measures of CPPs and LTAS (alpha ratio; L1–L0). Thus, this paper aims to analyze the relationship between these acoustic measures and auditory–perceptual analysis and then determine which acoustic measure best represents voice quality. We analyzed 53 voice samples of vocally healthy participants (vocally healthy group, VHG) and 49 voice samples of participants with behavioral dysphonia (dysphonic group, DG). Each voice sample was composed of the sustained vowel /a/ and connected speech. CPPs seem to be the best predictor of voice deviation in both studied populations because there were moderate to strong negative correlations with general degree, breathiness, roughness, and strain (auditory–perceptual parameters). Regarding L1–L0, this measure is related to breathiness (moderate negative correlations). Hence, L1–L0 provides information about air leak through the closed glottis, assisting the phonatory efficiency analysis.


Introduction
According to Zhang [1], the voice is a suprasegmental aspect of speech that communicates meaning, ideas, opinions, and intentions by its different modulations. Thus, a balanced phonation system must convert aerodynamic energy into acoustic energy. Hence, the vocal folds (source) modulate the air through the glottis and produce acoustic energy, which will propagate and be modulated by the vocal tract (filter), thus amplifying or attenuating frequencies/harmonics. The harmonic energy modulation determines the speech sounds and their meanings, so the harmonic energy must be above the noise level for effective communication and its meanings to happen [1].
Voice disorders (i.e., dysphonia) include pitch, loudness, and vocal quality disturbances, varying from mild to complete vocal loss. Voice disorders decrease speech intelligibility and communication [2] as well as impact verbal and emotional messages and social interactions [3]. Thus, speech-language pathologists must assess the vocal quality so they can understand vocal disorders and then choose the best treatment (i.e., suppress the communicative handicap).
Acoustic analysis of vocal quality, along with auditory-perceptual analysis, is one of the components of the multidimensional voice assessment. It adds information on the vocal quality of the voice signal. Hence, when L0 (F0 energy level) is higher than L1 (F1 energy level), there is glottal hypoadduction, whereas when L0 is lower than L1, there is glottal hyperadduction [22,31]. Furthermore, some studies observed that close values of L1 and L0 are associated with economic voice production [32,33].
Guzmán et al. [23] found that strained or projected emotional patterns, such as anger and joy, have higher (less negative) alpha ratio and L1-L0 values, whereas emotions with a higher level of breathiness, such as eroticism, have lower values.
Previous studies analyzed CPPs only in vocally healthy and dysphonic (behavioral dysphonia) women [4], in participants with unilateral vocal fold paralysis (organic dysphonia) [17], and in participants with behavioral dysphonia using only the sustained vowel [18]. Finally, one study analyzed how emotional patterns affect LTAS measures. Hence, until now little has been known about how the physiological events that create the auditory-perceptual characteristics (general degree, roughness, breathiness, and strain) impact the objective measures of CPPs, alpha ratio, and L1-L0.
By analyzing the same variables in both groups (vocally healthy and dysphonic) and both genders, and with the sustained vowel and connected speech, this study aims to contribute to scientific progress by implementing objective voice assessment measures. The auditory-perceptual analysis has low reproducibility and depends on individual characteristics and speech pathologists' previous experiences (i.e., the examiner). Therefore, using robust algorithms with higher reproducibility and reliability to measure the voice signal such as CPPs and LTAS could be complementary to auditory-perceptual analysis, thus improving the voice assessment. Hence, it is key that we understand the correlations that lie between these acoustic measures and the auditory-perceptual parameters.
Thus, this study mainly aims to analyze the correlations between acoustic measures (CPPs, alpha ratio, and L1-L0) and auditory-perceptual analysis (general degree, roughness, breathiness, and strain). Moreover, its second aim is to determine which acoustic measures best represent the vocal quality of vocally healthy participants and of those who have behavioral dysphonia.

Study Design
This was an observational, cross-sectional, and retrospective study.

Ethical Aspects
This research followed the rules of the National Health Council/National Research Ethics Committee. It was further approved by the Committee for Ethics in Research on Human Beings (no. 3.284.883).

Research Team
The research team consisted of four speech pathologists who were blinded to procedures:

Sample
All participants registered in the database completed an initial assessment protocol form (i.e., vocal complaints, self-reported health issues) and had been diagnosed with normal voice and larynx or behavioral dysphonia [34] (i.e., vocal complaints, benign mass lesion, vocal fold edema, glottic gap, and minor structural alterations).
From the database, we selected voice samples of participants aged 18-50 years who had no vocal complaints and were vocally healthy to create the vocally healthy group (VHG), and voice samples of participants with behavioral dysphonia to create the dysphonic group (DG). Participants who reported endocrine, vascular, or pulmonary alterations or who were smokers were excluded. Thus, we selected a total of 102 voice samples, which were distributed as follows: (1) VHG: 25 females, 28 males; 53 total participants (mean age = 31.0 years); (2) DG: 29 females, 20 males; 49 total participants (mean age = 26.4 years).

Voice Recording
Participants had their voices recorded in an acoustically treated room at the Speech-Language Pathology and Audiology Clinic. We used an AKG microphone (model C 44 PP; AKG Acoustics GmbH, Vienna, Austria) positioned at a 45° angle from the participants' mouth and 2 cm away from their labial commissure. The microphone was connected to a computer with a Creative Sound Blaster sound card (model Audigy II; Creative Technology Ltd., Singapore). The voice samples were recorded with the Sound Forge program (version 10; MAGIX, Munich, Germany) at a sample rate of 44,100 Hz and saved as 16-bit .wav files. Participants were asked to sustain the vowel /a/ for approximately 5 s, at a comfortable pitch and loudness, and then count from 1 to 10 (i.e., assessing continuous speech).

Acoustic Measures
CPPs, alpha ratio, and L1-L0 were measured by using the Praat software (version 6.0.43; https://www.fon.hum.uva.nl/praat) from the central 3 s of vowel /a/ emission and the entire continuous speech.
1. Open Praat and the voice sample and then select "Analyze Periodicity";
[...]
4. Click on the newly generated file, select "Query", and then click "Get CPPS";
5. A new window will open with the CPPs value.
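As a minimal sketch of how the central 3 s of a sustained vowel could be located before analysis, the helper below computes the sample indices of a centered window. The function name `central_segment` and the choice to raise an error for recordings that are too short are illustrative assumptions, not part of the Praat procedure described above; the 44,100 Hz rate is taken from the recording setup.

```python
def central_segment(n_samples, sample_rate_hz=44100, duration_s=3.0):
    """Return (start, end) sample indices of a window centered in the recording."""
    window = int(round(duration_s * sample_rate_hz))
    if window > n_samples:
        raise ValueError("recording is shorter than the requested window")
    start = (n_samples - window) // 2
    return start, start + window


# Example: a 5 s vowel recorded at 44,100 Hz keeps the samples from 1 s to 4 s.
start, end = central_segment(5 * 44100)
```

Centering the window discards the onset and offset of the vowel, which tend to be less stable than the middle portion.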
To measure LTAS, the researchers used a spectral analysis with pitch correction, keeping the standard values of the software. The alpha ratio (spectral slope) was obtained as the difference in spectral energy between the ranges of 50 to 1000 Hz and 1000 to 5000 Hz. The L1-L0 measure (phonation mode) was obtained as the difference between the ranges of 50 to 300 Hz and 300 to 800 Hz [23].
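The two LTAS measures can be illustrated with a short sketch. It assumes that the LTAS is available as a list of bin frequencies (Hz) and levels (dB), that each band level is an energy average of its bins, and that both measures are computed as the upper band minus the lower band (which yields the negative alpha ratio values discussed in this paper). The function names and the synthetic spectrum are hypothetical and are not the Praat implementation.

```python
import math


def band_level_db(freqs_hz, levels_db, f_lo, f_hi):
    """Energy-average the LTAS levels (dB) of the bins with f_lo <= f < f_hi."""
    powers = [10 ** (lv / 10) for f, lv in zip(freqs_hz, levels_db) if f_lo <= f < f_hi]
    if not powers:
        raise ValueError("no LTAS bins fall inside the requested band")
    return 10 * math.log10(sum(powers) / len(powers))


def alpha_ratio(freqs_hz, levels_db):
    """Level of the 1000-5000 Hz band minus level of the 50-1000 Hz band (dB)."""
    return (band_level_db(freqs_hz, levels_db, 1000, 5000)
            - band_level_db(freqs_hz, levels_db, 50, 1000))


def l1_minus_l0(freqs_hz, levels_db):
    """Level of the 300-800 Hz band (L1) minus level of the 50-300 Hz band (L0) (dB)."""
    return (band_level_db(freqs_hz, levels_db, 300, 800)
            - band_level_db(freqs_hz, levels_db, 50, 300))


# Synthetic LTAS (hypothetical values): 60 dB below 300 Hz, 54 dB up to 1 kHz, 40 dB above.
freqs = list(range(50, 5000, 50))
levels = [60.0 if f < 300 else 54.0 if f < 1000 else 40.0 for f in freqs]
alpha = alpha_ratio(freqs, levels)
l1_l0 = l1_minus_l0(freqs, levels)
```

In this synthetic spectrum L0 exceeds L1, so L1-L0 is negative, the pattern that the text associates with glottal hypoadduction.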

Auditory-Perceptual Analysis
Three experienced speech pathologists, who were blinded to the groups, analyzed the vowel /a/ emission and continuous speech with the following parameters: general degree, roughness, breathiness, and strain. They were instructed to mark, at any point on a 100-mm visual analog scale (VAS), the intensity of the deviation. The left end indicated no deviation and the right end the highest deviation.
To obtain the deviation intensity, we calculated the mean of the speech pathologists' scores. Furthermore, they reanalyzed 20% of the samples to verify the intra-examiner reliability.

Data Analysis
The sample homogeneity assessment showed that all variables had a normal distribution, except for the auditory-perceptual parameter strain. Thus, the Pearson correlation coefficient was used to analyze the correlations, and the Spearman correlation coefficient was used to analyze those involving the strain parameter. Both tests had their significance level fixed at 5% (p < 0.05).
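The choice between the two coefficients can be sketched as follows. This is a minimal standard-library illustration, assuming that the normality decision has already been made elsewhere; the function names and the example values are hypothetical, and p-values are not computed here.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank the values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))


def correlate(x, y, normally_distributed):
    """Use Pearson for normally distributed variables, Spearman otherwise."""
    return pearson_r(x, y) if normally_distributed else spearman_rho(x, y)


# Hypothetical values: CPPs (dB) against two VAS parameters (mm).
cpps = [14.0, 12.0, 10.0, 8.0, 6.0]
breathiness = [5.0, 15.0, 25.0, 35.0, 45.0]
strain = [1.0, 2.0, 4.0, 8.0, 50.0]  # skewed, so it is treated as non-normal
r_breathiness = correlate(cpps, breathiness, normally_distributed=True)
rho_strain = correlate(cpps, strain, normally_distributed=False)
```

Because Spearman works on ranks, the skewed strain values still yield a perfect monotonic association in this toy example.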
The systematic and random error calculation was used to determine auditory-perceptual analysis intra-examiner and inter-examiner reliability.

Results
According to the Fleiss, Levin, and Paik [37] classification, intra-examiner reliability for vowel /a/ was excellent for all examiners (0.85, 0.88, and 0.88). For continuous speech, the reliability for two examiners was excellent (0.75 and 0.79), and that of the third was satisfactory (0.73). Inter-examiner reliability was satisfactory among all examiners in vowel /a/ (0.51, 0.59, and 0.68) and continuous speech (0.64, 0.70, and 0.72). Tables 1 and 2 show a complementary analysis comparing the groups regarding both acoustic and auditory-perceptual analyses. We used a t-test with a significance level fixed at 5% (p < 0.05). As mentioned before, the strain parameter of auditory-perceptual analysis does not have a normal distribution, so we used the Mann-Whitney test (p < 0.05) to complete the complementary analysis (Figure 1).
Appl. Sci. 2020, 10, x FOR PEER REVIEW 6 of 13
Tables 3 and 4 show all the correlation results. The statistically significant results were highlighted with an *.
Between the acoustic variables and auditory-perceptual analysis, and for vowel /a/ emission (Table 3), VHG showed the following statistically significant correlations according to the Hinkle et al. [38] classification: moderate negative between CPPs and general degree (p < 0.001); strong negative between CPPs and breathiness (p < 0.001); weak positive between the alpha ratio and strain (p = 0.019); weak negative between L1-L0 and general degree (p = 0.004); and moderate negative between L1-L0 and breathiness (p < 0.001). DG had the following statistically significant correlations: moderate negative between CPPs and general degree (p < 0.001); strong negative between CPPs and breathiness (p < 0.001); moderate negative between CPPs and strain (p = 0.003); and moderate negative between L1-L0 and breathiness (p < 0.001).
Table 3. Correlation analysis (r-value) between the intensity of auditory-perceptual parameters (mm) and acoustic measures (dB) for vowel /a/ emission in both groups: vocally healthy and dysphonic.
Regarding continuous speech (Table 4), VHG had the following statistically significant correlations according to Hinkle, Wiersma, and Jurs [38]: weak negative between CPPs and general degree (p = 0.007); moderate negative between CPPs and roughness (p = 0.001); and weak negative between the alpha ratio and breathiness. DG had the following statistically significant correlations: weak negative between the alpha ratio and roughness (p = 0.013); weak negative between L1-L0 and general degree (p = 0.039); and weak negative between L1-L0 and roughness (p = 0.049). Figures 2 and 3 show the scatter plots of the statistically significant correlations for VHG and DG, respectively.
Table 4. Correlation analysis (r-value) between the intensity of auditory-perceptual parameters (mm) and acoustic measures (dB) for connected speech (counting 1-10) emission in both groups: vocally healthy and dysphonic.
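The verbal labels used above (weak, moderate, strong) can be reproduced from an r-value with a small helper. The cutoffs below are the ones commonly attributed to Hinkle et al. (0.30, 0.50, 0.70, and 0.90); they are not stated in this text, so the cutoffs, the mapping of "weak" and "strong" onto those bands, the treatment of boundary values, and the example r-values are all assumptions.

```python
def hinkle_label(r):
    """Describe a correlation coefficient by strength and direction (assumed cutoffs)."""
    size = abs(r)
    if size > 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if size >= 0.90:
        strength = "very strong"
    elif size >= 0.70:
        strength = "strong"
    elif size >= 0.50:
        strength = "moderate"
    elif size >= 0.30:
        strength = "weak"
    else:
        return "negligible"
    direction = "negative" if r < 0 else "positive"
    return f"{strength} {direction}"


# Hypothetical r-values, for illustration only.
labels = [hinkle_label(r) for r in (-0.75, -0.62, 0.35, 0.10)]
```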

Discussion
The human voice is a product of the conversion of aerodynamic energy (i.e., subglottic pressure provided by the lungs) into acoustic energy [39]. The transglottal airflow pulse is determined by how long the vocal folds remain closed, which is associated with the intensity and definition of the harmonics produced in the glottal source [40]. Hence, incomplete glottal closure produces turbulent airflow, which reduces the harmonic structure of the acoustic signal and increases the amplitude variability [4,41], resulting in the audible vocal quality disturbance classified as breathiness [42]. Furthermore, differences in muscle tone and mass in the vocal folds cause variability of frequency and amplitude in the acoustic signal and result in an audible vocal quality disturbance classified as roughness [4]. Roughness is described as an unpleasant low-pitched noise [42]. Finally, strain and general degree indicate the auditory perception of vocal effort and global characteristics of the voice, respectively [43].
Regarding auditory-perceptual analysis, since the VAS scores were below 35.5 points, the VHG scores indicate a normal variation of voice quality [44]. DG had a mild to moderate vocal quality deviation (35.6 to 50.5 points). Thus, the complementary analysis showed that VHG had significantly lower values than DG.
Since auditory-perceptual analysis may be inconsistent and biased due to its subjectivity [19], acoustic measures (i.e., a robust and objective method for voice analysis) must be associated with the auditory-perceptual analysis to assist the understanding of phonation system mechanisms. Besides, CPPs and LTAS measures use robust algorithms, are more accurate, allow a complete analysis, are noninvasive and low cost, and are more frequently used in studies to assess vocal quality [16,23-26,42,45-47]. Thus, this study contributes to evidence by associating robust measures with auditory-perceptual analysis.
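The two VAS bands mentioned above can be written as a simple lookup. This sketch uses only the cutoffs stated in this paragraph (up to 35.5 and 35.6 to 50.5); the function name is hypothetical, and scores above 50.5 are deliberately left unlabeled because the corresponding bands are not given in this text.

```python
def vas_category(score_mm):
    """Map a 0-100 mm VAS score to the deviation bands quoted in the text."""
    if not 0 <= score_mm <= 100:
        raise ValueError("VAS scores range from 0 to 100 mm")
    if score_mm <= 35.5:
        return "normal variation of voice quality"
    if score_mm <= 50.5:
        return "mild to moderate deviation"
    return "above 50.5 mm (band not stated in this text)"


# Hypothetical scores, for illustration only.
examples = [vas_category(s) for s in (20.0, 42.3)]
```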
CPPs amplitude provides information on the periodicity of the dominant rahmonic, which corresponds to F0 [9]. The harmonics are multiples of F0 and are produced at the glottal source; the moment at which the cepstral peak occurs, called quefrency, determines the period of a glottal cycle [15] and can be expressed by:
$$T = \frac{1}{F_0}$$
where T is the period, which in this case corresponds to the quefrency (s), and F0 is the fundamental frequency value (Hz). We highlight that rahmonic and quefrency are anagrams of harmonic and frequency, respectively. When CPPs have increased amplitude, there is a better harmonic definition, which is a characteristic of vocal fold cycle regularity [11]. We could observe that in our study because, for vowel /a/ emission, there were negative correlations with breathiness and general degree in both groups and with strain in DG. Regarding continuous speech, there were negative correlations with general degree and roughness in VHG. In other words, we believe that voices with an increased deviation of general degree, roughness, and breathiness have a less regular acoustic signal produced in the glottal source and thus decreased CPPs. Furthermore, excessive contraction of intrinsic and extrinsic laryngeal muscles, which results in the auditory-perceptual parameter of strain [42] and shows a higher deviation in dysphonic participants, can also reduce the acoustic signal periodicity.
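The relation between quefrency and F0, and the idea of a peak standing above the cepstral trend, can be illustrated with a simplified sketch. It assumes that a cepstrum is already available as quefrency (s) and level (dB) lists, fits one straight line to all points, and omits the time and quefrency smoothing that distinguishes CPPs from plain CPP. The 60-330 Hz search range, the function names, and the synthetic cepstrum are illustrative assumptions rather than the software's procedure.

```python
def linear_fit(xs, ys):
    """Least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def cepstral_peak_prominence(quefrency_s, level_db, f0_min_hz=60.0, f0_max_hz=330.0):
    """Return (prominence in dB, peak quefrency in s) for one cepstrum."""
    q_lo, q_hi = 1.0 / f0_max_hz, 1.0 / f0_min_hz  # T = 1 / F0
    candidates = [i for i, q in enumerate(quefrency_s) if q_lo <= q <= q_hi]
    if not candidates:
        raise ValueError("no cepstral bins inside the F0 search range")
    peak = max(candidates, key=lambda i: level_db[i])
    slope, intercept = linear_fit(quefrency_s, level_db)
    trend_at_peak = slope * quefrency_s[peak] + intercept
    return level_db[peak] - trend_at_peak, quefrency_s[peak]


# Synthetic cepstrum: a falling trend from 1 ms to 20 ms plus a 12 dB peak at 5 ms.
quefrency = [i * 0.0005 for i in range(2, 41)]
levels = [30.0 - 500.0 * q for q in quefrency]
levels[8] += 12.0  # quefrency[8] is 5 ms, the period of a 200 Hz voice
cpp_db, peak_q = cepstral_peak_prominence(quefrency, levels)
f0_hz = 1.0 / peak_q
```

The recovered prominence is slightly below 12 dB because the peak itself pulls the fitted trend line upward; a flatter cepstrum with no clear rahmonic would give a much smaller value.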
The alpha ratio provides information on the spectral slope [27], and there was a negative correlation with breathiness for continuous speech in VHG. We observed that when the alpha ratio values increase (less negative values), the deviation degree of breathiness decreases. A possible explanation is that when the vocal tract (filter) is inertive and the flow declination rate is faster than the glottis area declination rate, the voice is brighter and more projected, leading to a less steep spectral slope (i.e., more energy in harmonics above 1 kHz) [33,48,49]. It is important to highlight that the glottis must be normotensive and close properly.
On the other hand, when loudness or the rigidity of the phonatory system increases, the subglottic pressure needed to start phonation also increases, making the closed phase of the glottal cycle longer [23] and increasing vocal strain. This situation also leads to a less steep spectral slope [23,28,42]. Our findings corroborate this information, as we observed a weak positive correlation between the alpha ratio and strain in VHG, indicating that an increase in vocal strain (even within a normal variation of voice quality) also leads to a less steep spectral slope.
Even though participants in DG have behavioral dysphonia, characterized by increased loudness and closed phase of the glottal cycle, there were no significant correlations of the alpha ratio with breathiness and strain. However, for continuous speech, there was a negative correlation between the alpha ratio and roughness. We believe that the vocal quality aperiodicity of frequency and amplitude reduced the intensity of the harmonics above 1 kHz, thus decreasing alpha ratio values.
As previously described [23,28,42], the alpha ratio can indicate a less steep spectral slope in two situations: higher vocal projection and increased vocal strain. This fact was found in the complementary analysis of the current study. There was no significant difference between both groups, VHG and DG, for alpha ratio. We believe that people with behavioral dysphonia have vocal strain to compensate for other deviations of vocal quality and the inefficiency of the phonatory system. The absence of a significant difference between groups was also observed for L1-L0.
The L1-L0 measure provides information on the phonation mode and is the difference between the F1 intensity level (L1) and the F0 intensity level (L0) [31]. According to the non-linear source-filter coupling theory [50], the various filter configurations influence vocal fold vibration and emphasize different harmonics. Thus, the formants represent the way in which the supraglottal airflow is molded by the filter [51]. Hence, the F1 intensity is determined by the mandibular lowering [51] and by the emission intensity [25]. Finally, previous studies found that, when L0 is near or just below L1, the source has an efficient vibratory pattern due to the reactive vocal tract, and there is an improved vocal economy [32,52,53].
The significant correlations with L1-L0 were negative, which indicates that lower values of that measure are associated with a higher deviation of the auditory-perceptual parameter. This could be seen in the current study regarding general degree and breathiness in both groups and roughness in DG. Breathy vocal patterns reduce L1-L0 [23], which was expected and corroborated our findings. Gauffin and Sundberg [28] suggest that, in the spectrum of voices with high glottal adduction, L0 has lower energy than L1, whereas in the spectrum of voices with insufficient glottal adduction, L0 has higher energy than L1. It is important to highlight that normal voices can also be breathy as a normal variation of voice quality.
The negative correlation of L1-L0 with roughness in DG was a surprising finding. We believe that roughness, an auditory-perceptual parameter generated from differences in muscle tone and mass in the vocal folds (benign mass lesion; condition of DG), may have affected the phonation mode, making it less efficient with lower values of L1-L0.
Although continuous speech better represents daily vocal use [54,55], this voice sample may have higher variability in its auditory-perceptual analysis due to the influences of each individual's articulation, prosody, and accent [4,34]. A sustained vowel, which is not biased in that way, has its auditory-perceptual parameters more easily analyzed and may provide more information about the source. Thus, we think that this was the reason for the weaker correlations between acoustic measures and auditory-perceptual analysis in continuous speech.
We encourage future studies to analyze the correlations of voice intensity with CPPs and LTAS measures. Thus, a complete and objective analysis will be possible. Also, analyzing these measures in different vocal conditions may provide a better understanding of phonatory adjustments among voice pathologies and source-filter interactions.
Finally, the current study corroborates previous studies that analyzed other populations, such as vocally healthy and dysphonic women [4], participants with unilateral vocal fold paralysis [17], and dysphonic participants (women and men) assessed only with the sustained vowel [18]. Thus, the current study contributes to scientific evidence regarding the relationship and complementarity of the analysis of robust acoustic measures (CPPs, alpha ratio, and L1-L0). Hence, it provides a complete analysis of the vocal sample (sustained vowel and continuous speech) in vocally healthy participants and in those with behavioral dysphonia.
Since CPPs and LTAS measurements are more accurate, enable a complete analysis, are not invasive, and are low cost, they may be used in the clinical practice to assess clients and as research outcomes.

Conclusions
CPPs appear to be the best predictor of voice deviation in both studied populations because there were moderate to strong negative correlations with general degree, breathiness, roughness, and strain. Regarding L1-L0, because of moderate negative correlations, this measure is related to breathiness. Hence, L1-L0 provides information on air leakage through the closed glottis, assisting the phonatory efficiency analysis.
Finally, the alpha ratio results do not objectively reflect the physiological events of vocal production. Therefore, because most of its correlations were weak and ambiguous, we suggest that this measure cannot stand alone. Thus, other measures, such as CPPs and L1-L0, are needed to complement it and provide a better understanding.

Funding:
We would like to thank the São Paulo Research Foundation (grant #2018/15768-5) for supporting the current research. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Conflicts of Interest:
The authors declare no conflict of interest.