Search Results (56)

Search Parameters:
Keywords = high vowels

12 pages, 639 KiB  
Article
Identification of Perceptual Phonetic Training Gains in a Second Language Through Deep Learning
by Georgios P. Georgiou
AI 2025, 6(7), 134; https://doi.org/10.3390/ai6070134 - 23 Jun 2025
Cited by 1 | Viewed by 473
Abstract
Background/Objectives: While machine learning has made substantial strides in pronunciation detection in recent years, there remains a notable gap in the literature regarding research on improvements in the acquisition of speech sounds following a training intervention, especially in the domain of perception. This study addresses this gap by developing a deep learning algorithm designed to identify perceptual gains resulting from second language (L2) phonetic training. Methods: The participants underwent multiple sessions of high-variability phonetic training, focusing on discriminating challenging L2 vowel contrasts. The deep learning model was trained on perceptual data collected before and after the intervention. Results: The results demonstrated good model performance across a range of metrics, confirming that learners’ gains in phonetic training could be effectively detected by the algorithm. Conclusions: This research underscores the potential of deep learning techniques to track improvements in phonetic training, offering a promising and practical approach for evaluating language learning outcomes and paving the way for more personalized, adaptive language learning solutions. Deep learning enables the automatic extraction of complex patterns in learner behavior that might be missed by traditional methods. This makes it especially valuable in educational contexts where subtle improvements need to be captured and assessed objectively.
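The classification task described here (did this listener's perception data come from before or after training?) can be illustrated with a toy model. The sketch below is not the authors' deep network; it is a minimal NumPy logistic-regression classifier trained on entirely synthetic pre/post discrimination scores, shown only to make the pre/post detection setup concrete.

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=500):
    """Minimal logistic regression trained by batch gradient descent."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid probabilities
        grad_w = X.T @ (p - y) / len(y)         # cross-entropy gradient
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    return (1.0 / (1.0 + np.exp(-(X @ w + b))) >= 0.5).astype(int)

# Synthetic stand-in data: per-listener discrimination accuracies on three
# hypothetical vowel contrasts, before (label 0) and after (label 1) training.
rng = np.random.default_rng(1)
pre = rng.normal(loc=0.55, scale=0.05, size=(40, 3))
post = rng.normal(loc=0.75, scale=0.05, size=(40, 3))
X = np.vstack([pre, post])
y = np.array([0] * 40 + [1] * 40)

w, b = train_logistic(X, y)
acc = np.mean(predict(X, w, b) == y)  # training accuracy on synthetic data
```

A real replication would use held-out listeners and the richer perceptual features the paper's deep model consumes; the linear model here only demonstrates the labeling scheme.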

17 pages, 3915 KiB  
Article
Bangla Character Detection Using Enhanced YOLOv11 Models: A Deep Learning Approach
by Mahbuba Aktar, Nur Islam and Chaoyu Yang
Appl. Sci. 2025, 15(11), 6326; https://doi.org/10.3390/app15116326 - 4 Jun 2025
Viewed by 1139
Abstract
Recognising the Bangla alphabet remains a significant challenge within the fields of computational linguistics and artificial intelligence, primarily due to the script’s inherent structural complexity and wide variability in writing styles. The Bangla script is characterised by intricate ligatures, overlapping diacritics, and visually similar graphemes, all of which complicate automated recognition tasks. Despite ongoing advancements in deep learning (DL), machine learning (ML), and image processing (IP), accurately identifying Bangla characters continues to be a demanding and unresolved issue. A key limitation lies in the absence of robust detection frameworks capable of accommodating the script’s complex visual patterns and nuances. To address this gap, we propose an enhanced object detection model based on the YOLOv11 architecture, incorporating a ResNet50 backbone for improved feature extraction. The YOLOv11 framework is particularly effective in capturing discriminative features from input images, enabling real-time detection with high precision. This is especially beneficial in overcoming challenges such as character overlap and stylistic diversity, which often hinder conventional recognition techniques. Our approach was evaluated on a custom dataset comprising 50 primary Bangla characters (including vowels and consonants) along with 10 numerical digits. The proposed model achieved a recognition confidence of 99.9%, markedly outperforming existing methods in terms of accuracy and robustness. This work underscores the potential of single-shot detection models for the recognition of complex scripts such as Bangla. Beyond its technical contributions, the model has practical implications in areas including the digitisation of historical documents, the development of educational tools, and the advancement of inclusive multilingual technologies. By effectively addressing the unique challenges posed by the Bangla script, this research contributes meaningfully to both computational linguistics and the preservation of linguistic heritage.

6 pages, 797 KiB  
Proceeding Paper
Machine Learning Classifiers for Voice Health Assessment Under Simulated Room Acoustics
by Ahmed M. Yousef and Eric J. Hunter
Eng. Proc. 2024, 81(1), 16; https://doi.org/10.3390/engproc2024081016 - 7 May 2025
Viewed by 400
Abstract
Machine learning (ML) robustness for voice disorder detection was evaluated using reverberation-augmented recordings. Common vocal health assessment voice features from steady vowel samples (135 pathological, 49 controls) were used to train/test six ML classifiers. Detection performance was evaluated under low-reverb conditions and under simulated medium (0.48 s) and high (1.82 s) reverberation times. All models’ performance declined with longer reverberation. Support Vector Machine exhibited slight robustness but still faced performance challenges. Random Forest and Gradient Boosting, though strong under low reverb, lacked generalizability in medium/high reverb. Training and testing ML models on augmented data is essential to enhance their reliability in real-world voice assessments.
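Reverberation augmentation of the kind described here amounts to convolving a clean recording with a room impulse response. The sketch below is a generic illustration, not the authors' pipeline: it builds a crude synthetic impulse response (exponentially decaying noise whose level falls 60 dB over a chosen decay time) and convolves it with a synthetic sustained-vowel stand-in. The sample rate and signal are assumptions.

```python
import numpy as np

SR = 16000  # sample rate in Hz; an assumption for this sketch

def exp_decay_ir(t60, sr=SR):
    """Crude room model: white noise shaped by an exponential decay
    whose level drops 60 dB over t60 seconds."""
    n = int(t60 * sr)
    rng = np.random.default_rng(0)
    decay = 10.0 ** (-3.0 * np.arange(n) / n)  # -60 dB across n samples
    return rng.standard_normal(n) * decay

def add_reverb(signal, t60):
    """Convolve a clean recording with the synthetic IR and renormalize."""
    wet = np.convolve(signal, exp_decay_ir(t60))[: len(signal)]
    return wet / np.max(np.abs(wet))

# Clean stand-in "vowel": a 1-second harmonic complex at 220 Hz.
t = np.arange(SR) / SR
clean = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 4))
medium = add_reverb(clean, 0.48)  # the paper's medium condition (0.48 s)
high = add_reverb(clean, 1.82)    # the paper's high condition (1.82 s)
```

Training classifiers on both the clean and the reverberation-convolved versions is the augmentation strategy the abstract recommends.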
(This article belongs to the Proceedings of The 1st International Online Conference on Bioengineering)

18 pages, 585 KiB  
Article
Improving Diacritical Arabic Speech Recognition: Transformer-Based Models with Transfer Learning and Hybrid Data Augmentation
by Haifa Alaqel and Khalil El Hindi
Information 2025, 16(3), 161; https://doi.org/10.3390/info16030161 - 20 Feb 2025
Viewed by 1637
Abstract
Diacritical Arabic (DA) refers to Arabic text with diacritical marks that guide pronunciation and clarify meanings, making their recognition crucial for accurate linguistic interpretation. These diacritical marks (short vowels) significantly influence meaning and pronunciation, and their accurate recognition is vital for the effectiveness of automatic speech recognition (ASR) systems, particularly in applications requiring high semantic precision, such as voice-enabled translation services. Despite its importance, leveraging advanced machine learning techniques to enhance ASR for diacritical Arabic has remained underexplored. A key challenge in developing DA ASR is the limited availability of training data. This study introduces a transformer-based approach leveraging transfer learning and data augmentation to address these challenges. Using a cross-lingual speech representation (XLSR) model pretrained on 53 languages, we fine-tune it on DA and integrate connectionist temporal classification (CTC) with transformers for improved performance. Data augmentation techniques, including volume adjustment, pitch shift, speed alteration, and hybrid strategies, further mitigate data limitations, significantly reducing word error rates (WER). Our methods achieve a WER of 12.17%, outperforming traditional ASR systems and setting a new benchmark for DA ASR. These findings demonstrate the potential of advanced machine learning to address longstanding challenges in DA ASR and enhance its accuracy.
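The word error rate (WER) reported above is the standard ASR metric: the word-level Levenshtein distance (substitutions + insertions + deletions) between hypothesis and reference, divided by the reference length. A minimal self-contained implementation:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)
```

For example, a one-word substitution in a four-word reference gives a WER of 0.25; the paper's 12.17% corresponds to roughly one word error per eight reference words.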

19 pages, 3051 KiB  
Article
An Acoustic Study of Romanian Stressed Vowels with Special Reference to Mid Central [ɨ] and [ə]
by Daniel Recasens and Fernando Sánchez-Miret
Languages 2025, 10(1), 12; https://doi.org/10.3390/languages10010012 - 15 Jan 2025
Viewed by 1395
Abstract
The present study is concerned with some aspects of the production of [ɨ] and [ə] in Romanian, i.e., their position within the vowel space, degree of acoustic variability and acoustic duration. To this end, acoustic data were collected for the Romanian stressed vowels [i e a o u ɨ ə] produced by six speakers in controlled consonantal context conditions and in real and nonsense words. The formant frequency data reveal that [ɨ] and [ə] do not overlap along the F1 dimension, which may be ascribed to the need to keep the two central vowels phonologically contrastive. Moreover, [ɨ] is clearly more variable in F2, and thus in vowel fronting, than schwa. Regarding segmental duration, [ɨ] is as short as the high vowels and shorter than schwa, whose duration is comparable to that of the mid vowels. The phonetic characteristics of stressed schwa in Romanian contrast with those of the same vowel in the world’s other languages, in which it is highly variable and shorter than all or most peripheral vowels. This behaviour may be attributed to the existence of two central vowels, and suggests that [ə] has a well-defined articulatory target in Romanian.
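The paper's two formant-based findings (no F1 overlap between [ɨ] and [ə], and greater F2 variability for [ɨ]) can be expressed as simple statistics over per-token F1/F2 measurements. The values below are hypothetical illustrations, not the study's data:

```python
import numpy as np

# Hypothetical F1/F2 token measurements in Hz; rows are tokens,
# columns are (F1, F2). Illustrative values only.
tokens = {
    "ɨ": np.array([[310, 1650], [325, 1900], [300, 1450], [318, 1750]]),
    "ə": np.array([[500, 1500], [515, 1530], [490, 1480], [505, 1510]]),
}

stats = {
    v: {"F1_mean": f[:, 0].mean(),
        "F2_mean": f[:, 1].mean(),
        "F2_sd": f[:, 1].std(ddof=1)}  # F2 spread = variability in fronting
    for v, f in tokens.items()
}

# "No overlap along F1": every [ɨ] token has a lower F1 than every [ə] token.
f1_separated = tokens["ɨ"][:, 0].max() < tokens["ə"][:, 0].min()
```

With these illustrative numbers, `f1_separated` holds and the F2 standard deviation of [ɨ] far exceeds that of [ə], mirroring the direction of the reported pattern.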
(This article belongs to the Special Issue An Acoustic Analysis of Vowels)

36 pages, 4793 KiB  
Article
Cross-Regional Patterns of Obstruent Voicing and Gemination: The Case of Roman and Veneto Italian
by Angelo Dian, John Hajek and Janet Fletcher
Languages 2024, 9(12), 383; https://doi.org/10.3390/languages9120383 - 20 Dec 2024
Viewed by 1744
Abstract
Italian has a length contrast in its series of voiced and voiceless obstruents while also presenting phonetic differences across regional varieties. Northern varieties of the language, including Veneto Italian (VI), are described as maintaining the voicing contrast but, in some cases, not the length contrast. In central and southern varieties, the opposite trend may occur. For instance, Roman Italian (RI) is reported to optionally pre-voice intervocalic voiceless singleton obstruents whilst also maintaining the length contrast for this consonant class. This study looks at the acoustic realization of selected obstruents in VI and RI and investigates (a) prevoicing patterns and (b) the effects and interactions of regional variety, gemination, and (phonological and phonetic) voicing on consonant (C) and preceding-vowel (V) durations, as well as the ratio between the two (C/V), with a focus on that particular measure. An acoustic phonetic analysis is conducted on 3703 tokens from six speakers from each variety, producing eight repetitions of 40 real CV́C(C)V and CVC(C)V́CV words embedded in carrier sentences, with /p, pp, t, tt, k, kk, b, bb, d, dd, ɡ, ɡɡ, f, ff, v, vv, t∫, tt∫, dʒ, ddʒ/ as the target intervocalic consonants. The results show that both VI and RI speakers produce geminates, yielding high C/V ratios in both varieties, although there are cross-regional differences in the realization of singletons. On the one hand, RI speakers tend to pre-voice voiceless singletons and produce overall shorter C durations and lower C/V ratios for these consonants. On the other hand, VI speakers produce longer C durations and higher C/V ratios for all voiceless singletons, triggering some overlap between the C length categories, which results in partial degemination through singleton lengthening, although only for voiceless obstruents. The implications of a trading relationship between phonetic voicing and duration of obstruents in Italian gemination are discussed.
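The study's central measure, the C/V ratio, is simply consonant duration divided by preceding-vowel duration; geminates combine a long consonant with a shortened preceding vowel, so they yield high ratios. A minimal sketch with hypothetical durations (illustrative, not the study's data):

```python
# Hypothetical segment durations in ms:
# (preceding-vowel duration, consonant duration, length category)
measurements = [
    (95, 80, "singleton"),
    (90, 85, "singleton"),
    (70, 160, "geminate"),
    (65, 175, "geminate"),
]

ratios = {}
for v_dur, c_dur, category in measurements:
    ratios.setdefault(category, []).append(c_dur / v_dur)

# Mean C/V per length category; geminates should come out markedly higher.
means = {cat: sum(r) / len(r) for cat, r in ratios.items()}
```

With these toy numbers the geminate mean C/V is roughly 2.5 against under 1.0 for singletons, which is the kind of separation the abstract describes; the cross-regional effects come from how singleton C durations shift that ratio in each variety.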
(This article belongs to the Special Issue Speech Variation in Contemporary Italian)

14 pages, 2353 KiB  
Article
Sensitivity of Acoustic Voice Quality Measures in Simulated Reverberation Conditions
by Ahmed M. Yousef and Eric J. Hunter
Bioengineering 2024, 11(12), 1253; https://doi.org/10.3390/bioengineering11121253 - 11 Dec 2024
Cited by 3 | Viewed by 1118
Abstract
Room reverberation can affect oral/aural communication and is especially critical in computer analysis of voice. High levels of reverberation can distort voice recordings, impacting the accuracy of quantifying voice production quality and vocal health evaluations. This study quantifies the impact of additive simulated reverberation on otherwise clean voice recordings as reflected in voice metrics commonly used for voice quality evaluation. From a larger database of voice recordings collected in a low-noise, low-reverberation environment, voice samples of a sustained [a:] vowel produced at two different speaker intents (comfortable and clear) by five healthy-voiced, college-age female native English speakers were used. Using the reverb effect in Audacity, eight reverberation conditions spanning a range of reverberation times (T20 between 0.004 and 1.82 s) were simulated and convolved with the original recordings. All voice samples, both original and reverberation-affected, were analyzed using the freely available PRAAT software (version 6.0.13) to calculate five common voice parameters: jitter, shimmer, harmonic-to-noise ratio (HNR), alpha ratio, and smoothed cepstral peak prominence (CPPs). Statistical analyses assessed the sensitivity and variation of the voice metrics across the simulated room reverberation conditions. Results showed that jitter, HNR, and alpha ratio were stable at simulated reverberation times below a T20 of 1 s, with HNR and jitter more stable in the clear vocal style. Shimmer was highly sensitive even at a T20 of 0.53 s, which would reflect a common room, while CPPs remained stable across all simulated reverberation conditions. Understanding the sensitivity and stability of these voice metrics under a range of room acoustics effects allows for targeted use of certain metrics even in less controlled environments, enabling selective application of stable measures like CPPs and cautious interpretation of shimmer, ensuring more reliable and accurate voice assessments.
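Two of the metrics compared here have simple cycle-level definitions: local jitter is the mean absolute difference between consecutive pitch periods divided by the mean period, and local shimmer is the same measure applied to cycle peak amplitudes. The study used PRAAT itself; the sketch below just restates those definitions on hypothetical cycle-by-cycle values:

```python
import numpy as np

def local_jitter(periods):
    """Mean absolute difference between consecutive pitch periods,
    normalized by the mean period (the 'local jitter' measure)."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """The analogous measure over cycle peak amplitudes ('local shimmer')."""
    amps = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amps))) / np.mean(amps)

# Hypothetical cycle-by-cycle measurements from a sustained [a:]
# (illustrative values, not the study's data):
periods = [0.00502, 0.00498, 0.00505, 0.00497, 0.00503]  # seconds
amps = [0.81, 0.79, 0.82, 0.80, 0.78]                    # arbitrary units

j = local_jitter(periods)   # fraction of the mean period (~1% here)
s = local_shimmer(amps)     # fraction of the mean amplitude
```

Reverberation smears energy across cycles, perturbing the peak amplitudes more than the period estimates, which is consistent with shimmer being the most reverberation-sensitive metric in the study.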
(This article belongs to the Special Issue Models and Analysis of Vocal Emissions for Biomedical Applications)

12 pages, 516 KiB  
Article
Evaluation of the Peripheral and Central Auditory Systems in Children and Adolescents Before and After COVID-19 Infection
by Julia Siqueira, Milaine Dominici Sanfins, Piotr Henryk Skarzynski, Magdalena Beata Skarzynska and Maria Francisca Colella-Santos
Children 2024, 11(12), 1454; https://doi.org/10.3390/children11121454 - 28 Nov 2024
Viewed by 1025
Abstract
COVID-19 is an infectious disease caused by the SARS-CoV-2 virus. During and after COVID-19, audiovestibular symptoms and impairments have been reported. Objectives: This study aimed to investigate the impacts of COVID-19 on the peripheral and central auditory systems of children and adolescents following the acute COVID-19 phase, based on behavioral, electroacoustic, and electrophysiological audiological assessments. Methods: This is a primary, prospective, observational, and cross-sectional study of 23 children aged 8 to 15 years with confirmed COVID-19 who, before infection, had no auditory complaints or school difficulties. The results were compared with pre-pandemic data collected from a similar group of 23 children who had normal peripheral and central hearing and good school performance. Each participant answered a questionnaire about child development, school, and health history and underwent tests including pure-tone and high-frequency audiometry, immittance testing, transient evoked otoacoustic emissions, and distortion product otoacoustic emissions. They also received tests of Brainstem Auditory Evoked Potentials, Long Latency Auditory Evoked Potentials, the Dichotic Digits Test, the Sentence Identification Test, the Dichotic Consonant–Vowel Test, the Frequency Pattern Test, and the Gaps-In-Noise Test. Results: Significant differences were observed between the groups, with the study group showing worse thresholds than the control group at both standard audiometric frequencies and higher frequencies, although both groups were still within normal limits (p ≤ 0.05). In addition, the study group had a higher prevalence of absent responses, as identified by otoacoustic emissions and acoustic reflexes. In terms of central auditory performance, the study group showed ABRs with significantly longer latencies of waves I, III, and V compared to the control group. The study group also performed less well on the Dichotic Digits and Pediatric Speech Identification tests. Conclusions: COVID-19 appears to alter the auditory system, both peripherally at the level of the outer hair cells and more centrally.
(This article belongs to the Section Pediatric Otolaryngology)

23 pages, 3561 KiB  
Article
“It’s a Bit Tricky, Isn’t It?”—An Acoustic Study of Contextual Variation in /ɪ/ in the Conversational Speech of Young People from Perth
by Gerard Docherty, Paul Foulkes and Simon Gonzalez
Languages 2024, 9(11), 343; https://doi.org/10.3390/languages9110343 - 31 Oct 2024
Viewed by 1222
Abstract
This study presents an acoustic analysis of vowel realisations in contexts where, in Australian English, a historical contrast between unstressed /ɪ/ and /ə/ has largely diminished in favour of a central schwa-like variant. The study is motivated by indications that there is greater complexity in this area of vowel variation than has been conventionally set out in the existing literature, and our goal is to shed new light by studying a dataset of conversational speech produced by 40 young speakers from Perth, WA. In doing so, we also offer some critical thoughts on the use of Wells’ lexical sets as a framework for analysis in work of this kind, in particular with reference to the treatment of items in unstressed position, and of grammatical (or function) words. The acoustic analysis focused on the realisation in F1/F2 space of a range of /ɪ/ and /ə/ variants in both accented and unaccented syllables (thus a broader approach than a focus on stressed kit vowels). For the purposes of comparison, we also analysed tokens of the fleece and happy-tensing lexical sets. Grammatical and non-grammatical words were analysed independently in order to understand the extent to which a high-frequency grammatical word such as it might contribute to the overall pattern of vowel alternation. Our findings are largely consistent with the small amount of previous work that has been carried out in this area, pointing to a continuum of realisations across a range of accented and unaccented contexts. The data suggest that the reduced historical /ɪ/ vowel encountered in unaccented syllables cannot be straightforwardly analysed as a merger with /ə/. We also highlight the way in which the grammatical word it participates in this alternation.
(This article belongs to the Special Issue Advances in Australian English)

32 pages, 11315 KiB  
Article
Correspondence of Consonant Clustering with Particular Vowels in German Dialects
by Samantha Link
Languages 2024, 9(7), 255; https://doi.org/10.3390/languages9070255 - 22 Jul 2024
Viewed by 1565
Abstract
Recent work found a correspondence between consonant clustering probability in monosyllabic lexemes and the three vowel types (short monophthong, long monophthong, and diphthong) in German dialects. Furthermore, that correspondence was found to be bound to a North–South divide. This paper explores the clustering preferences of particular vowels by analyzing the PhonD2-Corpus, a large database of phonotactic and morphological information. The clustering probability of the diphthongs was positively correlated with frequency, while the other vowels showed particular preferences that were not. However, all of them are governed by a threefold pattern: short monophthongs prefer coda clusters, diphthongs prefer onset clusters, and long monophthongs are balanced. Furthermore, this threefold pattern appears to have evolved from an originally twofold pattern in Middle High and Low German (short monophthongs preferring coda clusters; long monophthongs and diphthongs preferring onset clusters). This result is then considered in light of the compensation of syllable weight and moraicity. Some interesting parallels with the syllable- vs. word-language typology framework are also noted.
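The reported link between clustering probability and frequency is a correlation claim, which can be checked with a standard Pearson coefficient. The sketch below uses entirely hypothetical lexeme counts (not corpus values) just to show the computation:

```python
import numpy as np

# Hypothetical data: token frequency vs. probability of an onset cluster for
# diphthong-nucleus monosyllables. Illustrative values only.
freq = np.array([12, 30, 55, 80, 140, 210], dtype=float)
onset_cluster_prob = np.array([0.18, 0.22, 0.27, 0.30, 0.38, 0.45])

# Log-transform frequency (word frequencies are heavily right-skewed)
# and take the off-diagonal entry of the correlation matrix.
r = np.corrcoef(np.log(freq), onset_cluster_prob)[0, 1]
```

A positive `r` reproduces the direction of the effect reported for diphthongs; for the other vowel types the paper found no such positive correlation.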

18 pages, 4144 KiB  
Article
Auditory Sensory Gating: Effects of Noise
by Fan-Yin Cheng, Julia Campbell and Chang Liu
Biology 2024, 13(6), 443; https://doi.org/10.3390/biology13060443 - 18 Jun 2024
Cited by 1 | Viewed by 2236
Abstract
Cortical auditory evoked potentials (CAEPs) indicate that noise degrades auditory neural encoding, causing decreased peak amplitude and increased peak latency. Different types of noise affect CAEP responses, with greater informational masking causing additional degradation. In noisy conditions, attention can improve target signals’ neural encoding, reflected by an increased CAEP amplitude, which may be facilitated through various inhibitory mechanisms at both pre-attentive and attentive levels. While previous research has mainly focused on inhibition effects during attentive auditory processing in noise, the impact of noise on the neural response during the pre-attentive phase remains unclear. Therefore, this preliminary study aimed to assess the auditory gating response, reflective of the sensory inhibitory stage, to repeated vowel pairs presented in background noise. CAEPs were recorded via high-density EEG in fifteen normal-hearing adults in quiet and noise conditions with low and high informational masking. The difference between the average CAEP peak amplitude evoked by each vowel in the pair was compared across conditions. Scalp maps were generated to observe general cortical inhibitory networks in each condition. Significant gating occurred in quiet, while noise conditions resulted in a significantly decreased gating response. The gating function was significantly degraded in noise with less informational masking content, coinciding with a reduced activation of inhibitory gating networks. These findings illustrate the adverse effect of noise on pre-attentive inhibition related to speech perception.
(This article belongs to the Special Issue Neural Correlates of Perception in Noise in the Auditory System)

24 pages, 1546 KiB  
Article
Articulatory Characteristics of Secondary Palatalization in Romanian Fricatives
by Laura Spinu, Alexei Kochetov and Maida Percival
Languages 2024, 9(6), 201; https://doi.org/10.3390/languages9060201 - 31 May 2024
Viewed by 1322
Abstract
The production of fricatives involves the complex interaction of articulatory constraints resulting from the formation of the appropriate oral constriction, the control of airflow through the constriction so as to achieve frication and, in the case of voiced fricatives, the maintenance of glottal oscillation by attending to transglottal pressure. To better understand this mechanism in a relatively understudied language, we explore the articulatory characteristics of five pairs of plain and palatalized Romanian fricatives produced by 10 native speakers using ultrasound imaging. Our analysis includes an assessment of the robustness of the plain-palatalized contrast at different places of articulation, a comparison of secondary palatalization with other relevant word-final [Ci] structures, and the identification of individual variation patterns. Since our study is the first to document the articulatory properties of secondary palatalization in Romanian, our findings are of descriptive interest.
(This article belongs to the Special Issue Phonetic and Phonological Complexity in Romance Languages)

32 pages, 7815 KiB  
Article
Neural Adaptation at Stimulus Onset and Speed of Neural Processing as Critical Contributors to Speech Comprehension Independent of Hearing Threshold or Age
by Jakob Schirmer, Stephan Wolpert, Konrad Dapper, Moritz Rühle, Jakob Wertz, Marjoleen Wouters, Therese Eldh, Katharina Bader, Wibke Singer, Etienne Gaudrain, Deniz Başkent, Sarah Verhulst, Christoph Braun, Lukas Rüttiger, Matthias H. J. Munk, Ernst Dalhoff and Marlies Knipper
J. Clin. Med. 2024, 13(9), 2725; https://doi.org/10.3390/jcm13092725 - 6 May 2024
Cited by 4 | Viewed by 2152
Abstract
Background: It is assumed that speech comprehension deficits in background noise are caused by age-related or acquired hearing loss. Methods: We examined young, middle-aged, and older individuals with and without hearing threshold loss using pure-tone (PT) audiometry, short-pulsed distortion-product otoacoustic emissions (pDPOAEs), auditory brainstem responses (ABRs), auditory steady-state responses (ASSRs), speech comprehension (OLSA), and syllable discrimination in quiet and noise. Results: A noticeable decline of hearing sensitivity in extended high-frequency regions and its influence on low-frequency-induced ABRs was striking. When testing for differences in OLSA thresholds normalized for PT thresholds (PTTs), marked differences in speech comprehension ability exist not only in noise, but also in quiet, and they exist throughout the whole age range investigated. Listeners with poor speech comprehension in quiet exhibited a relatively lower pDPOAE and, thus, cochlear amplifier performance independent of PTT, smaller and delayed ABRs, and lower performance in vowel-phoneme discrimination below phase-locking limits (/o/-/u/). When OLSA was tested in noise, listeners with poor speech comprehension independent of PTT had larger pDPOAEs and, thus, cochlear amplifier performance, larger ASSR amplitudes, and higher uncomfortable loudness levels, all linked with lower performance of vowel-phoneme discrimination above the phase-locking limit (/i/-/y/). Conclusions: This study indicates that listening in noise in humans has a sizable disadvantage in envelope coding when basilar-membrane compression is compromised. Clearly, and in contrast to previous assumptions, both good and poor speech comprehension can exist independently of differences in PTTs and age, a phenomenon that urgently requires improved techniques to diagnose sound processing at stimulus onset in the clinical routine.

24 pages, 689 KiB  
Article
A Feature Alignment Approach to Plural Realization in Eastern Andalusian Spanish
by Stuart Davis and Matthew Pollock
Languages 2024, 9(5), 166; https://doi.org/10.3390/languages9050166 - 2 May 2024
Cited by 1 | Viewed by 2091
Abstract
Using an optimality theoretic analysis, this study offers a conception of the problem of plural realization in Eastern Andalusian Spanish (EAS) where plural suffix /s/ was deleted diachronically that differs from other accounts that assign the EAS plural an underlying suffixal /s/ synchronically. Using alignment constraints, we argue that plural /s/ does not appear in the underlying form synchronically in EAS, but that instead the plural morpheme is represented by a floating [–ATR]PL feature that aligns to the right edge of the word and spreads left. The [–ATR] feature, represented phonetically as a laxing or opening of vowels, applies to all mid vowels, low vowels in word final position, and combines with vowel epenthesis to explain Eastern Andalusian pluralization tendencies in words with final consonants. We discuss the behavior of high vowels, which can be transparent to harmony, and focus in particular on the plural of words that end in a final stressed vowel, which have been rarely discussed in the EAS literature. We develop an optimality-theoretic analysis of the Granada variety and extend that analysis to other varieties with somewhat different patterns.
(This article belongs to the Special Issue Phonetics and Phonology of Ibero-Romance Languages)
17 pages, 1130 KiB  
Article
The Intelligibility Benefits of Modern Computer-Synthesized Speech for Normal-Hearing and Hearing-Impaired Listeners in Non-Ideal Listening Conditions
by Yizhen Ma and Yan Tang
J. Otorhinolaryngol. Hear. Balance Med. 2024, 5(1), 5; https://doi.org/10.3390/ohbm5010005 - 18 Apr 2024
Viewed by 1780
Abstract
Speech intelligibility is a concern for public health, especially in non-ideal listening conditions where listeners often attend to the target speech in the presence of background noise. With advances in technology, synthetic speech has been increasingly used in lieu of actual human voices in human–machine interfaces, such as public announcement systems, answering machines, virtual personal assistants, and GPS, to interact with users. However, previous studies showed that speech generated by computer speech synthesizers was often intrinsically less natural and intelligible than natural speech produced by human speakers. In noise, listening to synthetic speech is challenging even for listeners with normal hearing (NH), let alone for hearing-impaired (HI) listeners. Recent developments in speech synthesis have significantly improved the naturalness of synthetic speech. In this study, the intelligibility of speech generated by commercial synthesizers from Google, Amazon, and Microsoft was evaluated by both NH and HI listeners in different noise conditions. Compared to a natural female voice as the baseline, the NH cohort's listening performance suggested that some of the synthetic speech was significantly more intelligible, even under rather adverse listening conditions. Further acoustical analyses revealed that elongated vowel sounds and reduced spectral tilt were primarily responsible for the improved intelligibility for NH listeners, but not for HI listeners, due to their impairment at high frequencies and possible cognitive decline associated with aging.
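Spectral tilt, one of the two acoustic factors identified here, is commonly estimated as the slope of a line fit to the log-magnitude spectrum against frequency; a flatter (less negative) slope means relatively more high-frequency energy. The sketch below is one simple way to compute it (not the paper's exact analysis), sampling the spectrum at the harmonics of two synthetic "vowels" with different roll-offs; the sample rate and signals are assumptions:

```python
import numpy as np

SR = 16000  # sample rate in Hz; an assumption for this sketch

def spectral_tilt(signal, sr, f0, n_harm):
    """Slope (dB per kHz) of a least-squares line through the log-magnitude
    spectrum sampled at the first n_harm harmonics of f0."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    harmonics = f0 * np.arange(1, n_harm + 1)
    # Map each harmonic frequency to its nearest FFT bin.
    bins = np.rint(harmonics * len(signal) / sr).astype(int)
    db = 20 * np.log10(spectrum[bins] + 1e-12)
    slope, _intercept = np.polyfit(harmonics / 1000.0, db, 1)
    return slope

t = np.arange(SR) / SR
# Two synthetic "vowels" at 150 Hz: harmonic amplitudes falling as 1/k
# (shallow roll-off) vs. 1/k^2 (steep roll-off).
shallow = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 20))
steep = sum(np.sin(2 * np.pi * 150 * k * t) / k**2 for k in range(1, 20))

tilt_shallow = spectral_tilt(shallow, SR, 150, 19)
tilt_steep = spectral_tilt(steep, SR, 150, 19)
```

The shallow-roll-off signal yields the less negative (flatter) tilt, i.e., more high-frequency energy; this is the property the more intelligible synthetic voices exhibited, and it helps NH listeners but not HI listeners with high-frequency hearing loss.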