Search Results (31)

Search Parameters:
Keywords = vowel quality

20 pages, 2817 KiB  
Article
Escalate Prognosis of Parkinson’s Disease Employing Wavelet Features and Artificial Intelligence from Vowel Phonation
by Rumana Islam and Mohammed Tarique
BioMedInformatics 2025, 5(2), 23; https://doi.org/10.3390/biomedinformatics5020023 - 30 Apr 2025
Viewed by 1416
Abstract
Background: This work presents an artificial intelligence-based algorithm for detecting Parkinson’s disease (PD) from voice signals. The detection of PD at pre-symptomatic stages is imperative to slow disease progression. Speech signal processing-based PD detection can play a crucial role here, as it has been reported in the literature that PD affects the voice quality of patients at an early stage. Hence, speech samples can be used as biomarkers of PD, provided that suitable voice features and artificial intelligence algorithms are employed. Methods: Advanced signal-processing techniques are used to extract audio features from the sustained vowel /a/. The extracted audio features include baseline features, intensities, formant frequencies, bandwidths, vocal fold parameters, and Mel-frequency cepstral coefficients (MFCCs), which together form a first feature vector. This feature vector is then enriched with wavelet-based features to form a second feature vector. For classification, two popular machine learning models, namely, support vector machine (SVM) and k-nearest neighbors (kNN), are trained to distinguish patients with PD. Results: The results demonstrate that the inclusion of wavelet-based voice features enhances the performance of both the SVM and kNN models for PD detection. However, kNN provides better accuracy, detection speed, training time, and misclassification cost than SVM. Conclusions: This work concludes that wavelet-based voice features are important for detecting neurodegenerative diseases like PD, as they can enhance the classification performance of machine learning models. It also concludes that kNN is preferable to SVM for the investigated voice features, regardless of whether the wavelet features are included.
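The kNN classifier recommended above labels a new voice sample by a majority vote among the k most similar training samples. The following is a minimal sketch of that decision rule in plain Python, not the authors' implementation: the z-score scaling, Euclidean distance, `k = 3`, and the labels `"healthy"` and `"PD"` are illustrative assumptions, and in practice the feature vectors would hold the acoustic and wavelet features described in the abstract.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def zscore_fit(train_X):
    """Return per-feature (mean, std) computed on the training set only."""
    columns = list(zip(*train_X))
    # Substitute 1.0 for a zero std so constant features do not divide by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]


def zscore_apply(x, stats):
    """Scale one feature vector with statistics from zscore_fit."""
    return [(v - m) / s for v, (m, s) in zip(x, stats)]


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training vectors closest to `query`.

    Vote ties go to the label encountered first, i.e. the one with the
    nearest member, because Counter.most_common keeps first-seen order
    for equal counts.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    stats = zscore_fit(train_X)
    scaled_train = [zscore_apply(x, stats) for x in train_X]
    scaled_query = zscore_apply(query, stats)
    # Pair each training label with its Euclidean distance to the query.
    ranked = sorted(
        zip((math.dist(x, scaled_query) for x in scaled_train), train_y),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Scaling is included because features such as formant frequencies (in Hz) and MFCCs occupy very different numeric ranges; without it, the largest-valued features would dominate the distance.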

14 pages, 2353 KiB  
Article
Sensitivity of Acoustic Voice Quality Measures in Simulated Reverberation Conditions
by Ahmed M. Yousef and Eric J. Hunter
Bioengineering 2024, 11(12), 1253; https://doi.org/10.3390/bioengineering11121253 - 11 Dec 2024
Cited by 3 | Viewed by 1125
Abstract
Room reverberation can affect oral/aural communication and is especially critical in computer analysis of voice. High levels of reverberation can distort voice recordings, impacting the accuracy of quantifying voice production quality and vocal health evaluations. This study quantifies the impact of additive simulated reverberation on otherwise clean voice recordings as reflected in voice metrics commonly used for voice quality evaluation. From a larger database of voice recordings collected in a low-noise, low-reverberation environment, voice samples of a sustained [a:] vowel produced with two different speaker intents (comfortable and clear) by five college-age female native English speakers with healthy voices were used. Using the reverb effect in Audacity, eight reverberation situations indicating a range of reverberation times (T20 between 0.004 and 1.82 s) were simulated and convolved with the original recordings. All voice samples, both original and reverberation-affected, were analyzed using the freely available PRAAT software (version 6.0.13) to calculate five common voice parameters: jitter, shimmer, harmonic-to-noise ratio (HNR), alpha ratio, and smoothed cepstral peak prominence (CPPs). Statistical analyses assessed the sensitivity of the voice metrics to a range of simulated room reverberation conditions. Results showed that jitter, HNR, and alpha ratio were stable at simulated reverberation times below a T20 of 1 s, with HNR and jitter more stable in the clear vocal style. Shimmer was highly sensitive even at a T20 of 0.53 s, which would reflect a common room, while CPPs remained stable across all simulated reverberation conditions. Understanding the sensitivity and stability of these voice metrics to a range of room acoustics effects allows for targeted use of certain metrics even in less controlled environments, enabling selective application of stable measures like CPPs and cautious interpretation of shimmer, ensuring more reliable and accurate voice assessments.
(This article belongs to the Special Issue Models and Analysis of Vocal Emissions for Biomedical Applications)
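Jitter and shimmer, two of the five parameters above, quantify cycle-to-cycle instability in period length and peak amplitude. The sketch below implements their textbook "local" definitions (the ones documented for Praat), assuming the glottal periods (in seconds) and per-cycle peak amplitudes have already been extracted; that extraction step, which is exactly what reverberation can disturb, is not shown. Praat also discards periods outside configurable limits, so its values will not match this simplified version exactly.

```python
from statistics import mean


def _mean_abs_consecutive_diff(values):
    """Average of |x[i+1] - x[i]| over all consecutive pairs."""
    if len(values) < 2:
        raise ValueError("need at least two consecutive cycles")
    return mean(abs(b - a) for a, b in zip(values, values[1:]))


def local_jitter(periods):
    """Local jitter: mean absolute difference between consecutive periods
    divided by the mean period (a ratio; multiply by 100 for percent)."""
    return _mean_abs_consecutive_diff(periods) / mean(periods)


def local_shimmer(amplitudes):
    """Local shimmer: mean absolute difference between the peak amplitudes
    of consecutive periods divided by the mean amplitude."""
    return _mean_abs_consecutive_diff(amplitudes) / mean(amplitudes)
```

Both measures are ratios to the mean, so they are independent of overall pitch and recording level, which is what makes them comparable across speakers.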

18 pages, 2045 KiB  
Article
An Acoustic–Phonetic Description of Hidatsa Vowels
by John P. Boyle, Jiaang Dong, Armik Mirzayan and V. B. Scott
Languages 2024, 9(10), 315; https://doi.org/10.3390/languages9100315 - 29 Sep 2024
Viewed by 1126
Abstract
In this study, we report on the results of a preliminary acoustic–phonetic analysis of the Hidatsa vowel system. We conducted acoustic measurements of Hidatsa vowels in terms of the averaged temporal and spectral properties of these phones. Our durational analysis provides strong evidence that Hidatsa has a ten-vowel system with phonemically long and short vowels, in addition to two diphthongs. Our spectral measurements consisted of averages and time-evolution dynamic properties of the first three formants (F1, F2, and F3) at 30 equally spaced time points along the central portion of each vowel. The centers and distributions of the F1 and F2 formants, as well as their time-averaged trajectories, provide strong evidence for separate vowel qualities for both the short and long vowels. These measurements also show that all Hidatsa vowels have some degree of time-dependent spectral change, with the back vowels generally displaying a longer time-evolution track. Lastly, our results indicate that in Hidatsa, short mid vowels do not appear as frequently as the other vowels, and that the short [é] has no unstressed counterpart.
(This article belongs to the Special Issue An Acoustic Analysis of Vowels)
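Sampling each formant track at a fixed number of proportional time points, as described above, makes vowels of different durations directly comparable and lets trajectories be averaged point by point. The sketch below shows one way to compute such sample times and average tracks. `edge_fraction`, the share of the duration trimmed from each end to isolate the "central portion", is a hypothetical parameter because the abstract does not define that portion; the formant values themselves would come from a tracker such as Praat.

```python
def sample_times(onset, offset, n_points=30, edge_fraction=0.1):
    """Return n_points equally spaced times (s) across the central portion
    of a vowel, skipping edge_fraction of its duration at each edge.

    edge_fraction is a hypothetical choice, not taken from the study.
    """
    if offset <= onset:
        raise ValueError("offset must follow onset")
    if not 0 <= edge_fraction < 0.5:
        raise ValueError("edge_fraction must be in [0, 0.5)")
    if n_points < 2:
        raise ValueError("need at least two points")
    duration = offset - onset
    start = onset + edge_fraction * duration
    end = offset - edge_fraction * duration
    step = (end - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]


def mean_trajectory(tracks):
    """Average several equal-length formant tracks point by point."""
    if not tracks or len({len(t) for t in tracks}) != 1:
        raise ValueError("need one or more tracks of equal length")
    return [sum(values) / len(values) for values in zip(*tracks)]
```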

29 pages, 2920 KiB  
Article
Acoustic Analysis of Vowels in Australian Aboriginal English Spoken in Victoria
by Debbie Loakes and Adele Gregory
Languages 2024, 9(9), 299; https://doi.org/10.3390/languages9090299 - 12 Sep 2024
Viewed by 1267
Abstract
(1) Background: Australian Aboriginal English (AAE) is a variety known to differ in various ways from the mainstream, but to date very little phonetic analysis has been carried out. This study is a description of L1 Aboriginal English in southern Australia, aiming to comprehensively describe the acoustics of vowels, focusing in particular on vowels known to be undergoing change in Mainstream Australian English. Previous work has focused on static measures of F1/F2; here we expand on this by adding duration analyses as well as dynamic F1/F2 measures. (2) Methods: This paper uses acoustic-phonetic analyses to describe the vowels produced by speakers of Aboriginal Australian English from two communities in southern Australia (Mildura and Warrnambool). The focus is on vowels undergoing change in the mainstream variety: the short vowels in KIT, DRESS, TRAP, STRUT, and LOT, and the long vowel GOOSE, examining duration and static and dynamic F1/F2. As part of this description, we analyse the data using the sociophonetic variables gender, region, and age, and also compare the Aboriginal Australian English vowels to those of Mainstream Australian English. (3) Results: On the whole, few sociophonetic differences were observed for duration. For static F1/F2, L1 Aboriginal English vowel spaces tend to be similar to Mainstream Australian English but can be analysed as more conservative (having undergone less change), as has also been observed for L2 Aboriginal English, in particular for KIT, DRESS, and TRAP. The Aboriginal English speakers had a less peripheral vowel space than Mainstream Australian English speakers. Dynamic analyses also highlighted dialectal differences between Aboriginal and Mainstream Australian English speakers, with greater F1/F2 movement in the vowel trajectories overall for AAE speakers, which was more evident for some vowels (TRAP, STRUT, LOT, and GOOSE). Regional differences in vowel quality between the two locations were minimal, and more evident in the dynamic analyses. (4) Conclusions: This paper further highlights how Aboriginal Australian English is uniquely different from Mainstream Australian English with respect to certain vowel differences, and it also highlights some ways in which the varieties align. The differences, i.e., a more compressed vowel space and greater F1/F2 movement in the trajectories of short vowels for AAE speakers, are specific ways in which Aboriginal Australian English and Mainstream Australian English accents differ in these communities in the southern Australian state of Victoria.
(This article belongs to the Special Issue An Acoustic Analysis of Vowels)
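A "less peripheral" or "more compressed" vowel space is often summarized as the area of the polygon formed by the mean (F2, F1) positions of a set of vowels. The abstract does not state that this index was used; the sketch below only illustrates the idea with the shoelace formula, assuming the vowel means are listed in order around the edge of the space. The result is in Hz² when the inputs are in Hz.

```python
def polygon_area(points):
    """Shoelace formula: area of a simple polygon whose vertices are given
    in order around its perimeter (clockwise or counter-clockwise)."""
    points = list(points)
    if len(points) < 3:
        raise ValueError("a polygon needs at least three vertices")
    twice_area = 0.0
    # Pair each vertex with the next one, wrapping the last back to the first.
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0
```

Comparing this area between speaker groups gives one number for overall compression, but it hides which individual vowels moved, which is why per-vowel static and dynamic analyses such as those above remain necessary.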

43 pages, 11069 KiB  
Article
Maintenance of Lexical Pitch Accent in Heritage Lithuanian: A Study of Perception and Production
by Jessica Kantarovich
Languages 2024, 9(9), 296; https://doi.org/10.3390/languages9090296 - 3 Sep 2024
Viewed by 1376
Abstract
This study investigates how the unique circumstances of heritage language acquisition impact prosody, an understudied aspect of heritage speech. I examine the perception and production of lexical pitch accent by two generations of heritage speakers (HSs) of Lithuanian in Chicago (n = 13), with a qualitative comparison to one normative native speaker (NS) also living in Chicago. The speakers participated in the following: (1) a perception task requiring them to identify meaning distinctions between pairs of words that differ only by accent; and (2) a production task in which they produced sentences containing nine nominal declensions, where pitch accent plays a morphological role. In task (1), speakers across the board were unable to identify meaning distinctions in accent-based minimal pairs, irrespective of their frequency, and were more accurate at perceiving pairs that differed on the basis of segmental phonological features. However, HSs with more education perceived more accent-based distinctions, as did HSs who were more engaged in the Chicago community. Older HSs maintained more distinctions than either the NS or the younger HSs, which suggests a change in progress in the language or the Chicago Lithuanian community. In task (2), none of the speakers consistently used pitch to signal word-level prominence. Instead, all speakers relied on changes in duration and vowel quality to signal word-level prominence, suggesting that, for these speakers, there has been a shift to a stress-accent system. The older HSs also patterned more like the NS in their retention of the expected stress in the nominal declensions. Dialect was also found to play a role in the retention of standard accent patterns in both perception and production.

28 pages, 4013 KiB  
Article
Buenas no[tʃ]es y mu[ts]isimas gracias: A Sociophonetic Study of the Alveolar Affricate in Peninsular Spanish Political Speech
by Matthew Pollock
Languages 2024, 9(6), 218; https://doi.org/10.3390/languages9060218 - 14 Jun 2024
Viewed by 2093
Abstract
While variation in the southern Peninsular Spanish affricate /tʃ/ has been considered in the context of deaffrication to [ʃ], this study examines an emergent variant [ts] in the context of sociolinguistic identity and style in political speech. Based on a corpus of public speech from Madrid and Andalusia, Spain, this study examines the phonetic and sociolinguistic characteristics of the affricate, finding variation in the quality of the frication portion of the segment through an analysis of segment duration (ms), the center of gravity (Hz), and a categorical identification of realization type. The results suggest that both linguistic variables, like phonetic environment, stress, lexical frequency, and following vowel formant height, and extralinguistic variables, like speaker city, gender, political affiliation, and speech context, condition its use. Based on these findings, it appears that production of the alveolar affricate [ts] is an incipient sociolinguistic marker in the process of acquiring social meaning. It is particularly associated with female speech and prestige norms that transcend regional identification. This alveolar variant serves as an additional sociolinguistic resource accessible for identity development among politicians and offers insight into ongoing change in the affricate inventory of southern and northern-central Peninsular Spanish.
(This article belongs to the Special Issue Phonetics and Phonology of Ibero-Romance Languages)
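The spectral centre of gravity used above to characterize the frication portion is the mean frequency of the spectrum weighted by |X(f)|^p. A minimal NumPy sketch follows, with p = 2 (weighting by the power spectrum, the default in Praat's centre-of-gravity query). It assumes the frication interval has already been isolated and applies no window or pre-filtering; the study's exact analysis settings are not given in the abstract.

```python
import numpy as np


def centre_of_gravity(frame, fs, power=2.0):
    """Spectral centre of gravity (Hz) of a 1-D signal sampled at fs Hz.

    Each frequency bin is weighted by |X(f)|**power; power=2 weights by
    the power spectrum. No window is applied here.
    """
    frame = np.asarray(frame, dtype=float)
    if frame.ndim != 1 or frame.size == 0:
        raise ValueError("frame must be a non-empty 1-D array")
    magnitude = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(frame.size, d=1.0 / fs)
    weights = magnitude ** power
    total = weights.sum()
    if total == 0:
        raise ValueError("frame has no spectral energy")
    return float((freqs * weights).sum() / total)
```

A higher value indicates energy concentrated at higher frequencies, which is why an [ts]-like frication would be expected to show a different centre of gravity from a palatoalveolar [tʃ].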

18 pages, 3932 KiB  
Article
Phonation Patterns in Spanish Vowels: Spectral and Spectrographic Analysis
by Carolina González, Susan L. Cox and Gabrielle R. Isgar
Languages 2024, 9(6), 214; https://doi.org/10.3390/languages9060214 - 12 Jun 2024
Viewed by 2051
Abstract
This article provides a detailed examination of voice quality in word-final vowels in Spanish. The experimental task involved the pronunciation of words in two prosodic contexts by native Spanish speakers from diverse dialects. A total of 400 vowels (10 participants × 10 words × 2 contexts × 2 repetitions) were analyzed acoustically in Praat. Waveforms and spectrograms were inspected visually for voice, creak, breathy voice, and devoicing cues. In addition, the relative amplitude difference between the first two harmonics (H1–H2) was obtained via FFT spectra. The findings reveal that while creaky voice is pervasive, breathy voice is also common, and devoicing occurs in 11% of tokens. We identify multiple phonation types (up to three) within the same vowel, of which modal voice followed by breathy voice was the most common combination. While creaky voice was more frequent overall for males, modal voice tended to be more common in females. In addition, creaky voice was significantly more common at the end of higher prosodic constituents. The analysis of spectral tilt shows that H1–H2 clearly distinguishes breathy voice from modal voice in both males and females, while H1–H2 values consistently discriminate creaky and modal voice in male participants only.
(This article belongs to the Special Issue Phonetics and Phonology of Ibero-Romance Languages)
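H1–H2, the spectral-tilt measure above, is the level of the first harmonic minus that of the second, in dB; higher values are generally associated with breathier phonation and lower values with creakier phonation. A minimal sketch from an FFT spectrum follows, assuming f0 is already known for the analysed frame (for example, from a pitch tracker). The Hann window and the ±10% search band around each harmonic are assumptions, and no formant correction is applied, so this is uncorrected H1–H2.

```python
import numpy as np


def h1_minus_h2(frame, fs, f0, search_fraction=0.1):
    """Uncorrected H1-H2 in dB for one analysis frame with known f0 (Hz)."""
    frame = np.asarray(frame, dtype=float)
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(frame.size)))
    freqs = np.fft.rfftfreq(frame.size, d=1.0 / fs)

    def harmonic_level_db(target_hz):
        # Strongest bin within +/- search_fraction of the expected harmonic.
        band = np.abs(freqs - target_hz) <= search_fraction * target_hz
        if not band.any():
            raise ValueError(f"no FFT bins near {target_hz:.1f} Hz; use a longer frame")
        return 20.0 * np.log10(spectrum[band].max())

    return harmonic_level_db(f0) - harmonic_level_db(2.0 * f0)
```

The frame must span several glottal cycles so that the FFT bin spacing (fs divided by the frame length) is well below f0; otherwise the two harmonics cannot be resolved.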

46 pages, 2878 KiB  
Article
A Stratal Phonological Analysis of Stem-Level and Word-Level Effects in Old French Compensatory Vowel Lengthening upon Coda /s/ Deletion
by Francisco Antonio Montaño
Languages 2024, 9(5), 177; https://doi.org/10.3390/languages9050177 - 13 May 2024
Viewed by 2198
Abstract
The well-known deletion of coda sibilants in Old French (11th–14th centuries) induced a compensatory lengthening effect on the preceding vowel, generally described as applying uniformly where coda /s/ was lost. This study highlights and analyzes phonological contexts where lengthening likely did not occur, examining their interaction with stress assignment, vowel quality, schwa adjustment, prothesis, and morphological structure. The Stratal OT analysis formalizes the proposed pattern differentiating the long and short vowel reflexes identified especially for mid vowels: while categorical in tonic syllables and low vowels /a, ɑ/ irrespective of stress, lengthening only prevails in atonic mid vowels when coda /s/ deletion impacts a syllable assigned stress within the specific stratal phonological cycle when /s/ is deleted from input. The resulting length is transmitted and preserved in subsequent stratal cycles regardless of eventual word-level stress reassignment, especially (but not exclusively) because of word-level schwa adjustment, allowing a shift to word-final stress and producing an opacity effect of a long atonic mid vowel inherited from an earlier cycle. The stratal account formalizes observed analogical effects between lexical items and derived forms with respect to vowel quality and length and proposes that they result instead from the interplay of morphology and phonology.
(This article belongs to the Special Issue Phonetic and Phonological Complexity in Romance Languages)

17 pages, 4441 KiB  
Article
The Effect of Pitch Accent on the Perception of English Lexical Stress: Evidence from English and Mandarin Chinese Listeners
by Fenqi Wang, Delin Deng, Kevin Tang and Ratree Wayland
Languages 2024, 9(3), 87; https://doi.org/10.3390/languages9030087 - 1 Mar 2024
Cited by 1 | Viewed by 4449
Abstract
The relative weighting of f0 and vowel reduction in English spoken word recognition at the sentence level was investigated in a two-alternative forced-choice word identification experiment. In the experiment, an H* pitch-accented or a deaccented word fragment (e.g., AR- in the word archive) was presented at the end of a carrier sentence for identification. The results revealed differences in the cue weighting of English lexical stress perception between native and non-native listeners. For native English listeners, vowel quality was a more prominent cue than f0, while native Mandarin Chinese listeners employed both vowel quality and f0 in a comparable fashion. These results suggest that (a) vowel reduction is superior to f0 in signaling initial stress in words and (b) f0 facilitates the recognition of word-initial stress, an effect that is modulated by first language.
(This article belongs to the Special Issue Advances in L2 Perception and Production)

30 pages, 2462 KiB  
Systematic Review
Systematic Review of Auditory Training Outcomes in Adult Cochlear Implant Recipients and Meta-Analysis of Outcomes
by James R. Dornhoffer, Shreya Chidarala, Terral Patel, Karl R. Khandalavala, Shaun A. Nguyen, Kara C. Schvartz-Leyzac, Judy R. Dubno, Matthew L. Carlson, Aaron C. Moberly and Theodore R. McRackan
J. Clin. Med. 2024, 13(2), 400; https://doi.org/10.3390/jcm13020400 - 11 Jan 2024
Cited by 5 | Viewed by 3111
Abstract
Objective: to review evidence on the efficacy of auditory training in adult cochlear implant recipients. Data Sources: PRISMA guidelines for a systematic review of the literature were followed. PubMed, Scopus, and CINAHL databases were queried on 29 June 2023 for terms involving cochlear implantation and auditory training. Studies were limited to the English language and adult patient populations. Study Selection: Three authors independently reviewed publications for inclusion in the review based on a priori inclusion and exclusion criteria. Inclusion criteria encompassed adult cochlear implant populations, an analysis of clinician- or patient-directed auditory training, and an analysis of one or more measures of speech recognition and/or patient-reported outcome. Exclusion criteria included studies with only pediatric implant populations, music or localization training in isolation, and single-sample case studies. Data Extraction: Data were collected on study design, patient population, auditory training modality, auditory training timing, speech outcomes, and the durability of outcomes. A quality assessment of the literature was performed using a quality metric adapted from the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) Working Group guidelines. Data Synthesis and Meta-Analysis: Data were qualitatively summarized for 23 studies. All but four studies demonstrated significant improvement in at least one measured or patient-reported outcome measure with training. For 11 studies with sufficient data reporting, pre-intervention and post-intervention pooled means of different outcome measures were compared for 132 patients using meta-analysis. Patient-directed training was associated with significant improvement in vowel-phoneme recognition and speech recognition in noise (p < 0.05 and p < 0.001, respectively), and clinician-directed training showed significant improvement in sentence recognition in noise (p < 0.001). Conclusions: The literature on auditory training for adult cochlear implant recipients is limited and heterogeneous, including a small number of studies with limited levels of evidence and external validity. However, the current evidence suggests that auditory training can improve speech recognition in adult cochlear implant recipients.

15 pages, 1856 KiB  
Article
Interaction of Voice Onset Time with Vocal Hyperfunction and Voice Quality
by Maria Francisca de Paula Soares, Marília Sampaio and Meike Brockmann-Bauser
Appl. Sci. 2023, 13(15), 8956; https://doi.org/10.3390/app13158956 - 4 Aug 2023
Cited by 1 | Viewed by 2465
Abstract
The main aim of the present work was to investigate whether vocal hyperfunction (VH), perceptual voice quality (VQ), gender, and phonetic environment influence Voice Onset Time (VOT). The investigated group consisted of 30 adults, including 19 women (mean age 46.1 ± 13.7 years) and 11 men (mean age 47.5 ± 11.0 years), who had either phonotraumatic vocal hyperfunction (PVH) or non-phonotraumatic vocal hyperfunction (NPVH). VQ was judged considering the overall severity of dysphonia (OS) and the subcharacteristics of roughness, breathiness, and strain. Phonetic variables such as vowel stress, syllable stress, and mode of speech task were analyzed. Four samples of syllables with [p] plus a vowel or diphthong were retrieved from recordings of CAPE-V sentences. Acoustic analysis with Praat comprised VOT, mean fundamental frequency (fo), intensity (SPL dB(A)), and coefficient of variation of fundamental frequency (CV_fo %). VOT was significantly influenced by OS (p ≤ 0.001) but not by VH condition (PVH versus NPVH) (p = 0.90). However, CV_fo was affected by the VH condition (p = 0.02). Gender effects were only found for mean fo (p ≤ 0.001) and SPL (p = 0.01). All VQ characteristics (OS, roughness, breathiness, and strain) correlated with VOT (p ≤ 0.001) and SPL (p ≤ 0.001) but not with fo. In summary, VOT was affected by voice quality but not by vocal hyperfunction condition. Therefore, VOT has the potential to objectively describe the onset of voicing in voice diagnostics and may be one underlying objective characteristic of perceptual voice quality.
(This article belongs to the Special Issue Computational Methods and Engineering Solutions to Voice III)
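CV_fo expresses f0 variability relative to the speaker's mean f0, which keeps it comparable across speakers whose mean fo differs, for example between women and men. A minimal sketch, assuming a list of f0 estimates in Hz from a pitch tracker; treating 0 as an unvoiced-frame marker and using the sample (n - 1) standard deviation are assumptions, since the abstract specifies neither.

```python
from statistics import mean, stdev


def cv_percent(f0_values):
    """Coefficient of variation of f0 in percent: 100 * SD / mean.

    Frames coded as 0 (assumed here to mean 'unvoiced') are dropped, and
    the sample standard deviation (n - 1 denominator) is used.
    """
    values = [f for f in f0_values if f > 0]
    if len(values) < 2:
        raise ValueError("need at least two voiced f0 values")
    return 100.0 * stdev(values) / mean(values)
```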

16 pages, 3585 KiB  
Article
Voice Disorder Multi-Class Classification for the Distinction of Parkinson’s Disease and Adductor Spasmodic Dysphonia
by Valerio Cesarini, Giovanni Saggio, Antonio Suppa, Francesco Asci, Antonio Pisani, Alessandra Calculli, Rayan Fayad, Mohamad Hajj-Hassan and Giovanni Costantini
Appl. Sci. 2023, 13(15), 8562; https://doi.org/10.3390/app13158562 - 25 Jul 2023
Cited by 10 | Viewed by 2586
Abstract
Parkinson’s Disease and Adductor-type Spasmodic Dysphonia are two neurological disorders that greatly decrease the quality of life of millions of patients worldwide. Despite their wide prevalence, the related diagnoses are often made empirically, whereas it would be valuable to rely on objective, measurable biomarkers; among these, researchers have been considering features related to voice impairment, which can be useful indicators but can sometimes lead to confusion. Therefore, our aim here was to develop a robust Machine Learning approach for multi-class classification based on 6373 voice features extracted from a convenient voice dataset made of the sustained vowel /e/ and an ad hoc selected Italian sentence, performed by 111 healthy subjects, 51 Parkinson’s disease patients, and 60 dysphonic patients. Correlation, Information Gain, Gain Ratio, and Genetic Algorithm-based methodologies were compared for feature selection, to build subsets analyzed by means of Naïve Bayes, Random Forest, and Multi-Layer Perceptron classifiers, trained with 10-fold cross-validation. As a result, spectral, cepstral, prosodic, and voicing-related features were assessed as the most relevant, the Genetic Algorithm performed as the most effective feature selector, and the adopted classifiers performed similarly. In particular, a Genetic Algorithm + Naïve Bayes approach achieved one of the highest accuracies in multi-class voice analysis: 95.70% for a sustained vowel and 99.46% for a sentence.
(This article belongs to the Section Acoustics and Vibrations)
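10-fold cross-validation, as used above, trains on nine tenths of the data and tests on the remaining tenth, rotating ten times so that every recording is tested exactly once. The sketch below shows plain (unstratified) splitting of sample indices; the fixed seed and the lack of stratification are assumptions. With imbalanced classes such as 111/51/60, a stratified split, which keeps class proportions roughly equal in each fold, is often preferred.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds.

    Yields (train_indices, test_indices) pairs; every sample appears in
    exactly one test fold. Fold sizes differ by at most one.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

Because this is a generator, the argument check runs when iteration starts, for example inside `list(k_fold_indices(...))`.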

10 pages, 1294 KiB  
Article
Advances in Clinical Voice Quality Analysis with VOXplot
by Ben Barsties v. Latoszek, Jörg Mayer, Christopher R. Watts and Bernhard Lehnert
J. Clin. Med. 2023, 12(14), 4644; https://doi.org/10.3390/jcm12144644 - 12 Jul 2023
Cited by 14 | Viewed by 3021
Abstract
Background: Voice quality can be assessed perceptually in standard clinical practice, which may also include acoustic evaluation of digital voice recordings to validate and further interpret perceptual judgments. The goal of the present study was to determine the strongest acoustic voice quality parameters for perceived hoarseness and breathiness when analyzing the sustained vowel [a:] using a new clinical acoustic tool, the VOXplot software. Methods: A total of 218 voice samples from individuals with and without voice disorders were subjected to perceptual and acoustic analyses. Overall, 13 single acoustic parameters were included to determine validity aspects in relation to perceptions of hoarseness and breathiness. Results: Four single acoustic measures could be clearly associated with perceptions of hoarseness or breathiness. For hoarseness, the harmonics-to-noise ratio (HNR) and the pitch perturbation quotient with a smoothing factor of five periods (PPQ5), and, for breathiness, the smoothed cepstral peak prominence (CPPS) and the glottal-to-noise excitation ratio (GNE), were shown to be highly valid, with a significant difference being demonstrated for each of the other perceptual voice quality aspects. Conclusions: Two acoustic measures, the HNR and the PPQ5, were both strongly associated with perceptions of hoarseness and were able to discriminate hoarseness from breathiness with good confidence. Two other acoustic measures, the CPPS and the GNE, were both strongly associated with perceptions of breathiness and were able to discriminate breathiness from hoarseness with good confidence.
(This article belongs to the Special Issue New Advances in the Management of Voice Disorders)
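PPQ5 compares each glottal period with the average of a five-period window centred on it, so slow drifts in f0 are smoothed out and only short-term perturbation is measured. The sketch below follows the definition documented for Praat (the average absolute difference between a period and the average of it and its four closest neighbours, divided by the average period), assuming period lengths have already been extracted. VOXplot's internal settings are not described in the abstract and may differ, for example in how irregular periods are excluded.

```python
from statistics import mean


def ppq5(periods):
    """Five-point period perturbation quotient (a ratio; x100 for percent).

    For each period with two neighbours on each side, take the absolute
    difference between it and the mean of the five-period window centred
    on it; average these differences and divide by the mean period.
    """
    if len(periods) < 5:
        raise ValueError("need at least five periods")
    deviations = [
        abs(periods[i] - mean(periods[i - 2:i + 3]))
        for i in range(2, len(periods) - 2)
    ]
    return mean(deviations) / mean(periods)
```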

31 pages, 9159 KiB  
Article
Distributional and Acoustic Characteristics of Filler Particles in German with Consideration of Forensic-Phonetic Aspects
by Beeke Muhlack, Jürgen Trouvain and Michael Jessen
Languages 2023, 8(2), 100; https://doi.org/10.3390/languages8020100 - 31 Mar 2023
Viewed by 2929
Abstract
In this study, we investigate the use of the filler particles (FPs) uh, um, and hm, as well as glottal FPs and tongue clicks, by 100 male native German speakers in a corpus of spontaneous speech. For this purpose, the frequency distribution, FP duration, duration of pauses surrounding FPs, voice quality of FPs, and their vowel quality are investigated in two conditions, namely, normal speech and Lombard speech. Speaker-specific patterns are investigated on the basis of twelve sample speakers. Our results show that tongue clicks and glottal FPs are as common as typically described FPs and should be part of disfluency research. Moreover, the frequency of uh, um, and hm decreases in the Lombard condition, while the opposite is found for tongue clicks. Furthermore, along with the usual F1 increase, a considerable reduction in vowel space is found in the Lombard condition for the vowels in uh and um. A high degree of within- and between-speaker variation is found at the individual speaker level.
(This article belongs to the Special Issue Pauses in Speech)

32 pages, 3868 KiB  
Article
Quantitative Acoustic versus Deep Learning Metrics of Lenition
by Ratree Wayland, Kevin Tang, Fenqi Wang, Sophia Vellozzi and Rahul Sengupta
Languages 2023, 8(2), 98; https://doi.org/10.3390/languages8020098 - 29 Mar 2023
Cited by 6 | Viewed by 3225
Abstract
Spanish voiced stops /b, d, ɡ/ surface as fricatives [β, ð, ɣ] in intervocalic position due to a phonological process known as spirantization or, more broadly, lenition. However, phonetic studies reveal a great deal of variation and gradience in these surface forms, conditioned by factors such as stress, place of articulation, flanking vowel quality, and speaking rate, and ranging from fricative-like to approximant-like [β, ð, ɣ]. Several acoustic measurements have been used to quantify the degree of lenition, but none is standard. In this study, posterior probabilities of the sonorant and continuant phonological features, estimated by the deep learning model Phonet on a corpus of Argentinian Spanish, were compared as measures of lenition with traditional acoustic measurements of intensity, duration, and periodicity. When evaluated against known lenition factors (stress, place of articulation, surrounding vowel quality, word status, and speaking rate), the sonorant and continuant posterior probabilities predict lenition patterns similar to those predicted by relative acoustic intensity measures, in the direction expected by the effort-based view of lenition and previous findings. These results suggest that Phonet is a reliable alternative or additional approach to investigating the degree of lenition.
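One widely used relative-intensity measure of lenition is the intensity difference: the minimum intensity within the consonant minus the maximum intensity in the following vowel, in dB. A smaller dip (a value closer to 0 dB) is usually read as more lenited. The abstract does not specify which relative-intensity measures were used, so the sketch below, including the non-overlapping frame-wise RMS and the default frame length (160 samples, i.e. 10 ms at an assumed 16 kHz), is an illustrative assumption rather than the study's procedure.

```python
import math


def frame_levels_db(samples, frame_len):
    """RMS level in dB (re 1.0) of consecutive non-overlapping frames."""
    if frame_len < 1 or len(samples) < frame_len:
        raise ValueError("need at least one full frame")
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20.0 * math.log10(max(rms, 1e-12)))  # floor avoids log(0)
    return levels


def intensity_difference(consonant, following_vowel, frame_len=160):
    """Consonant minimum minus following-vowel maximum, in dB (usually <= 0)."""
    return min(frame_levels_db(consonant, frame_len)) - max(
        frame_levels_db(following_vowel, frame_len)
    )
```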
