Search Results (251)

Search Parameters:
Keywords = accent

19 pages, 972 KB  
Article
Differential Effects of Educational and Occupational Cognitive Reserve on Foreign-Accented Speech Evaluation by Older Adults
by Jolanta Sypiańska and Zuzanna Cal
Brain Sci. 2026, 16(3), 280; https://doi.org/10.3390/brainsci16030280 - 28 Feb 2026
Viewed by 260
Abstract
Background/Objectives: Older adults are increasingly exposed to foreign-accented speech in everyday communication, yet such speech is known to increase perceptual and cognitive demands even when intelligibility is preserved. While previous research has documented age-related differences in processing foreign-accented speech, less is known about how individual differences in cognitive reserve shape older adults’ subjective evaluations of accented speech. The present study examined whether cognitive reserve is associated with senior listeners’ ratings of foreign-accented English across multiple perceptual dimensions. Methods: Thirty native English-speaking British adults aged 66–84 years completed an online foreign accent rating task. Participants rated non-native English speech on four dimensions: perceived accent strength, comprehensibility, perceived intelligibility and listening effort. Cognitive reserve was operationalized using a multidimensional proxy approach incorporating educational attainment, occupational complexity and leisure activities. These components were combined into a composite cognitive reserve score and were also examined separately. The ratings were analyzed using cumulative link mixed models with a by-participant random intercept. Self-reported experience with foreign-accented speech was included as a covariate in all models. Results: Composite cognitive reserve showed no significant association with comprehensibility, perceived intelligibility or listening effort, and only a marginal negative association with perceived foreign-accent strength. When cognitive reserve was decomposed into its components, educational cognitive reserve significantly predicted higher comprehensibility, perceived intelligibility and listening effort ratings. Occupational cognitive reserve showed significant effects in the opposite direction for comprehensibility, perceived intelligibility and listening effort; however, it was also associated with lower perceived accent strength. Self-reported experience with foreign-accented speech was not a significant predictor of any outcome. Conclusions: These findings indicate that cognitive reserve does not exert a uniform influence on older adults’ perception of foreign-accented speech. Instead, occupational and educational components of cognitive reserve show distinct and opposing associations with perceptual evaluations. The results highlight the importance of using multidimensional approaches to cognitive reserve when examining individual differences in speech perception in later adulthood.
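The cumulative link mixed models named in the Methods can be illustrated with a minimal Python sketch of the link itself: how a linear predictor, made of fixed effects plus a by-listener random intercept, is turned into probabilities for ordered rating categories. The logit link, the five-category scale, the threshold convention P(Y ≤ k) = logistic(θ_k − η) and every number below are assumptions for illustration; they are not the study's model or estimates, and actual model fitting would use a dedicated package rather than this code.

```python
import math


def logistic(x: float) -> float:
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


def category_probs(thresholds: list[float], eta: float) -> list[float]:
    """Probabilities of ordered rating categories 1..K under a cumulative logit model.

    Uses P(Y <= k) = logistic(theta_k - eta) with strictly increasing thresholds,
    so a larger linear predictor eta shifts probability toward higher ratings.
    In a mixed model, eta = fixed effects + the listener's random intercept.
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [logistic(theta - eta) for theta in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k - 1)
        previous = c
    return probs


def expected_rating(probs: list[float]) -> float:
    """Mean rating when categories are scored 1..K."""
    return sum((k + 1) * p for k, p in enumerate(probs))


# Hypothetical values, not estimates from the study.
THRESHOLDS = [-2.0, -0.5, 0.5, 2.0]  # 4 cut-points -> 5 rating categories
beta_reserve = 0.4                    # fixed effect of a standardized reserve score
u_listener = -0.3                     # random intercept for one listener

eta = beta_reserve * 1.0 + u_listener  # listener whose reserve score is +1 SD
print([round(p, 3) for p in category_probs(THRESHOLDS, eta)])
```

Raising η (for example, a higher reserve score combined with a positive coefficient) moves probability toward higher categories; this is how the sign of a fitted coefficient translates into "higher ratings".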

12 pages, 285 KB  
Article
On New Classes of Stretch Minkowskian Product Finsler Manifolds
by Fengyu Zheng, Yong He, Ruijia Yang and Jingya Chen
Axioms 2026, 15(3), 161; https://doi.org/10.3390/axioms15030161 - 25 Feb 2026
Viewed by 226
Abstract
Let $(M_1, F_1)$ and $(M_2, F_2)$ be two Finsler manifolds. A Minkowskian product Finsler manifold is the product manifold $M_1 \times M_2$ endowed with a Finsler metric $F$, obtained as the square root of a product function $f$ evaluated on the squares of the original metrics, $F = \sqrt{f(F_1^2, F_2^2)}$. This paper focuses on new classes of stretch Minkowskian product Finsler manifolds. We prove that the Minkowskian product Finsler manifold $(M, F)$ is a $\tilde{B}$-manifold (resp. $\tilde{B}$-stretch manifold, $H$-stretch manifold) if and only if $(M_1, F_1)$ and $(M_2, F_2)$ are both $\tilde{B}$-manifolds (resp. $\tilde{B}$-stretch manifolds, $H$-stretch manifolds). This yields an effective method for constructing the special Finsler manifolds mentioned above.
(This article belongs to the Section Geometry and Topology)
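Written out, the construction described above takes the following form. That $f$ is positively homogeneous of degree one is assumed here because it is exactly what makes $F$ positively homogeneous of degree one, as any Finsler metric must be; the further regularity conditions the paper places on $f$ are not reproduced. For instance, $f(s,t) = s + t$ with Riemannian $F_1$, $F_2$ recovers the ordinary Riemannian product metric.

```latex
F(x, y) = \sqrt{f\big(F_1^2(x_1, y_1),\, F_2^2(x_2, y_2)\big)},
\qquad x = (x_1, x_2) \in M_1 \times M_2,\; y = (y_1, y_2) \in T_{x_1}M_1 \oplus T_{x_2}M_2.

% Assuming f(\lambda s, \lambda t) = \lambda f(s, t) for all \lambda > 0,
% and using F_i(x_i, \lambda y_i) = \lambda F_i(x_i, y_i):
F(x, \lambda y)
  = \sqrt{f\big(\lambda^2 F_1^2,\, \lambda^2 F_2^2\big)}
  = \sqrt{\lambda^2 f\big(F_1^2,\, F_2^2\big)}
  = \lambda\, F(x, y), \qquad \lambda > 0.
```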
12 pages, 766 KB  
Article
Evaluation of the Human Capacity to Detect Spanish Deepfake Audios with a Paraguayan Accent
by María Vianella Giménez Ramos, Juan Pinto-Ríos, Pastor Pérez-Estigarribia and Enrique Dávalos
Appl. Sci. 2026, 16(4), 1910; https://doi.org/10.3390/app16041910 - 14 Feb 2026
Viewed by 418
Abstract
Deepfakes, synthetic multimedia files generated by artificial intelligence, are drastically undermining digital credibility. Their ability to manipulate our perception of reality has created a new and complex battleground for disinformation and poses a critical threat to non-English audio with distinctive accents. Consequently, the objective of this study is to determine the human capacity to detect deepfake audio in Spanish with a Paraguayan accent through an experiment conducted with an Android application called ReFake, developed specifically for this research. In this experiment, 450 participants aged 16–72 evaluated 10 audio samples of up to 15 s each, classifying them as authentic (real recordings of Paraguayan journalists) or fake (generated with ElevenLabs). The findings suggest that the human ear is more accurate than artificial intelligence (AI) at detecting vocal ‘naturalness’. This ability is influenced by age and educational level, with younger people and those with postgraduate degrees performing better. Conversely, gender and nationality do not influence detection, although the high prosodic quality of deepfakes still leads to errors in human judgment. Given these results, it is crucial to adapt and develop new strategies for a secure and resilient online ecosystem.
(This article belongs to the Section Computing and Artificial Intelligence)

19 pages, 721 KB  
Article
Communication Experiences and Challenges in Adult Cochlear Implant Users: Associations with Age and Occupation
by Jack Y. Lin, Hugh M. Birky, Aaron C. Moberly and Terrin N. Tamati
J. Clin. Med. 2026, 15(4), 1450; https://doi.org/10.3390/jcm15041450 - 12 Feb 2026
Viewed by 308
Abstract
Background/Objectives: Over one million individuals have received cochlear implants (CIs) worldwide, a monumental milestone in improving speech perception for those otherwise unable to hear. Although implantation is now routine, communication outcomes vary widely. This study investigated the effects of age and occupational status on the communication challenges of postlingually deafened adult CI users. Methods: Sixty-nine experienced (>6 months of use) CI users between the ages of 18 and 83 years completed a lab-developed survey. Self-reported communication challenges were compared between younger (<65 years) and older (≥65 years) CI users, and between working and retired individuals. Results: Younger CI users placed greater importance on communicating with colleagues than older CI users (p = 0.001), a pattern also observed among those working compared with retired individuals (p < 0.001). Compared with older CI users, younger participants reported fewer difficulties understanding fast (p = 0.012) and unclear speech (p = 0.016), but greater difficulties with soft (p = 0.044) and foreign-accented speech (p = 0.047). Similarly, working CI users reported fewer difficulties with fast (p = 0.028) and unclear speech (p < 0.001). Regardless of age or occupational status, most participants reported persistent listening fatigue and the tendency to avoid difficult conversations. Conclusions: Together, these findings demonstrate that while adult CI users report common struggles like fatigue, specific communication challenges differ across age and occupational status. Recognizing these factors may inform more personalized counseling and rehabilitation strategies to enhance everyday communication outcomes for CI users.

44 pages, 3809 KB  
Review
Electrochemical (Bio)Sensors Based on Nanotechnologies for the Detection of Important Biomolecules in Plants and Plant-Related Samples: The Future of Smart and Precision Agriculture
by Ioana Silvia Hosu, Radu-Claudiu Fierăscu and Irina Fierăscu
Biosensors 2026, 16(2), 107; https://doi.org/10.3390/bios16020107 - 6 Feb 2026
Viewed by 460
Abstract
Considering present environmental concerns, nanomaterial-based methods should be applied to support bioeconomic sustainability initiatives and climate change mitigation. Plants and plant extracts are among the most underused sources of biomass and bioactive ingredients. Moreover, crop loss is now one of the main problems the world faces, together with the depletion of natural resources, a growing population and limited arable land, all of which increase food scarcity and demand. To correctly allocate and use plant-based bioresources, or to decide rapidly which farming operations should be performed before crop loss occurs, we should be able to characterize plants or plant-based resources by the desired useful characteristics, such as (bio)chemical ones, rather than simply by observing physical traits (by the time these traits become visible, it may be too late to mitigate crop loss). Plant crops could be optimized, for example, using electrochemical methods that assess nutrient uptake and nutrient use efficiency (NUE), or the oxidative stress burst that precedes crop loss, in order to improve crop yields and quality. Other important analytes (such as hormones, pathogens, metabolites, etc.) or plant characteristics (such as genus, species, phylogenetic analysis, etc.) can also be evaluated with these electrochemical sensors and methods. In the present review, we focus on the application of nanomaterials/nanotechnologies to the development of fast, accurate, accessible, cost-effective, sensitive and selective electrochemical analytical methods for detecting relevant biomolecules in plants or plant-related samples (plant extracts, plant cells, plant tissues, and/or plant-derived natural drinks/foods, as well as entire plants/plant parts), both in vivo and ex vivo, and both in situ and ex situ. This review systematically presents and critically discusses the outcomes of current electrochemical methods (applied both in the lab and as wearable/implantable sensors) and the future perspectives of these nanotechnology-based sensors, with an emphasis on wearable sensors for smart and precision agriculture as real-world sensing technologies with significant practical impact. The novelty of this article lies in the abundance of electrochemical analytical parameters gathered and discussed for such a large number of analyte categories.

25 pages, 2133 KB  
Article
Phonological Feature Posteriors and Cue-Specific Accent Perception in Hindi- and Tamil-Accented English
by Nitin Venkateswaran and Ratree Wayland
Brain Sci. 2026, 16(2), 177; https://doi.org/10.3390/brainsci16020177 - 31 Jan 2026
Viewed by 391
Abstract
Background/Objectives: Accented speech reflects systematic deviation from target-language phonetic norms. This study demonstrates that perceived accent strength covaries with selective, gradient differences in phonological feature realization. We examine whether perceived accents in Hindi- and Tamil-accented English reflect uniform segmental deviation or cue-specific patterns of phonological feature realization. Methods: English speech produced by native speakers of Hindi and Tamil was evaluated using accentedness ratings from native listeners. Phonetic variation was analyzed using posterior probabilities of phonological features derived from a machine learning model, Phonet. The analyses focused on liquids (laterals and rhotics, e.g., /l/, /ɭ/, and /ɻ/) and labial segments in the fricative–glide space (e.g., /v/, /w/, and /ʋ/), with attention to word position and feature-level generalization. Results: Accentedness ratings differed systematically for Hindi- and Tamil-accented English and covaried with a subset of phonological feature dimensions, yielding contrast- and context-specific patterns of perceptually relevant variation. Not all features that varied in production contributed to perceived accent strength. Conclusions: These findings support a cue-specific, perception-grounded account of accentedness and establish phonological feature posteriors derived from Phonet as interpretable phonological categories through which listeners evaluate gradient L2 production differences.
(This article belongs to the Special Issue Language Perception and Processing)

21 pages, 2995 KB  
Article
Language Experience Shapes Neural Grouping of Speech by Accent: EEG Evidence from Native, Second-Language, and Heritage Listeners
by Lauren L. Hong, Chao Han and Philip J. Monahan
Brain Sci. 2026, 16(2), 174; https://doi.org/10.3390/brainsci16020174 - 31 Jan 2026
Viewed by 478
Abstract
Background: Accented speech contains talker-indexical cues that listeners can use to infer social group membership, yet it remains unclear how the auditory system categorizes accent variability and how this process depends on language experience. Methods: The current study used EEG and the MMN oddball paradigm to test pre-attentive neural sensitivity to accent changes in the English word ‘stopped’ produced by Canadian English or Mandarin Chinese-accented English talkers. Three participant groups were tested: Native English listeners, L1-Mandarin listeners, and Heritage Mandarin listeners. Results: In the Native English and L1-Mandarin groups, we observed MMNs to the Canadian-accented English deviant, indicating that the brain can group speech by accent despite substantial inter-talker variation and that this grouping is consistent with an experience-dependent sensitivity to accent. Exposure to Mandarin Chinese-accented English modulated MMN magnitude. Time-frequency analyses suggested that α and low-β power during accent encoding varied with language background, with Native English listeners showing stronger activity when presented with Mandarin Chinese-accented English. Finally, the neurophysiological response in the Heritage Mandarin group reflected a broader phonological space encompassing both Canadian English and Mandarin-accented English, and its magnitude was predicted by Chinese proficiency. Conclusions: These findings provide brain-based evidence that automatic accent categorization is not uniform across listeners but interacts with native phonology and second-language experience.
(This article belongs to the Special Issue Language Perception and Processing)
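The MMN logic in an oddball design like this one can be sketched in a few lines of Python: epochs are averaged separately for standard and deviant trials, and the MMN is read off the deviant-minus-standard difference wave within a post-stimulus latency window. The 100–250 ms window, the sampling grid and the toy data are hypothetical; real pipelines add filtering, baseline correction and artifact rejection, and this is not the authors' analysis code.

```python
from statistics import mean


def average_erp(epochs: list[list[float]]) -> list[float]:
    """Average equal-length epochs sample by sample (e.g. microvolts at one electrode)."""
    return [mean(samples) for samples in zip(*epochs)]


def mmn_amplitude(standard_epochs, deviant_epochs, times_ms, window=(100.0, 250.0)):
    """Mean amplitude of the deviant-minus-standard difference wave in a latency window.

    The MMN is a negativity, so a more negative value means a larger MMN.
    The 100-250 ms window is an illustrative assumption.
    """
    standard = average_erp(standard_epochs)
    deviant = average_erp(deviant_epochs)
    difference = [d - s for d, s in zip(deviant, standard)]
    in_window = [v for v, t in zip(difference, times_ms) if window[0] <= t <= window[1]]
    if not in_window:
        raise ValueError("no samples fall inside the analysis window")
    return mean(in_window), difference


# Toy data: 5 samples at 0, 50, ..., 200 ms (hypothetical, not real EEG).
times = [0.0, 50.0, 100.0, 150.0, 200.0]
standards = [[0.0, 0.1, 0.0, -0.1, 0.0], [0.0, -0.1, 0.0, 0.1, 0.0]]
deviants = [[0.0, 0.0, -1.0, -2.0, -1.0], [0.0, 0.0, -1.0, -2.0, -1.0]]
amplitude, diff_wave = mmn_amplitude(standards, deviants, times)
print(round(amplitude, 3))  # -1.333
```

Comparing such window means across listener groups (and against zero) is the basic quantity behind statements like "we observed MMNs to the deviant".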

20 pages, 707 KB  
Article
Beyond Native Norms: A Perceptually Grounded and Fair Framework for Automatic Speech Assessment
by Mewlude Nijat, Yang Wei, Shuailong Li, Abdusalam Dawut and Askar Hamdulla
Appl. Sci. 2026, 16(2), 647; https://doi.org/10.3390/app16020647 - 8 Jan 2026
Viewed by 393
Abstract
Pronunciation assessment is central to computer-assisted pronunciation training (CAPT) and speaking tests, yet most systems still adopt a native norm, treating deviations from canonical L1 pronunciations as errors. In contrast, rating rubrics and psycholinguistic evidence emphasize intelligibility for a target listener population and show that listeners rapidly adapt their phonetic categories to new accents. We argue that automatic assessment should likewise be referenced to the target learner group. We build a Transformer-based mispronunciation detection (MD) model that computationally mimics listener adaptation: it is first pre-trained on multi-speaker LibriSpeech and then fine-tuned on the non-native L2-ARCTIC corpus, which represents a specific learner population. Fine-tuning, using either synthetic or human MD labels, is restricted to the phonetic space (i.e., the representation space that encodes phone-level distinctions: the learned phone/phonetic embedding space and its alignment with acoustic representations), so only the phonetic module is updated while the rest of the model stays fixed. Relative to the pre-trained model, L2 adaptation substantially improves MD recall and F1 and increases ROC–AUC from 0.72 to 0.85. The results support a target-population norm and inform the design of perception-aligned, fairer automatic pronunciation assessment systems.
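The reported MD metrics can be made concrete with a small Python sketch showing how recall, F1 and ROC–AUC are computed from phone-level labels and detector scores. Treating a mispronounced phone as the positive class, the 0.5 decision threshold and the toy data are assumptions; this is not the authors' evaluation code.

```python
def precision_recall_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float, float]:
    """Precision, recall and F1 with 1 = mispronounced phone (positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties count as one half (Mann-Whitney form); O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical phone-level labels and detector scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
predictions = [1 if s >= 0.5 else 0 for s in scores]
print(precision_recall_f1(labels, predictions), round(roc_auc(labels, scores), 3))
```

Unlike recall and F1, ROC–AUC does not depend on the decision threshold, which is why it can summarize the effect of L2 adaptation independently of any particular operating point.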

24 pages, 3121 KB  
Article
On the Constituent Structure of Augmented Plurals in Russian
by Ora Matushansky
Languages 2025, 10(12), 304; https://doi.org/10.3390/languages10120304 - 16 Dec 2025
Viewed by 729
Abstract
This article examines augmented plurals in Russian, focusing mostly on those in -ĭj- (e.g., pero/perʲja ‘feather.sg/pl’). The accentual behavior of -ĭj-plurals is sensitive to animacy: inanimates show stem-final stress, while animates show inflectional stress. This is explained by different constituent structures: for inanimates, -ĭj- combines with the stem, whereas animate stems require complex suffix formation so as not to create neuter animates, which Russian does not tolerate. The position of the accent then follows from the usual assumptions about Russian stress together with the hypothesis that -ĭj- is accented but unaccentable. Other plural augments are also discussed.
(This article belongs to the Special Issue SinFonIJA 17 (Syntax, Phonology and Language Analysis))
23 pages, 910 KB  
Article
Fractal Modeling of Generalized Weighted Pre-Invex Functions with Applications to Random Variables and Special Means
by Muhammad Muddassar, Maria Bibi, Kashif Nazar and Adil Jhangeer
Axioms 2025, 14(12), 897; https://doi.org/10.3390/axioms14120897 - 2 Dec 2025
Viewed by 331
Abstract
This article introduces certain algebraic properties of generalized $(\tilde{h}_1, \tilde{h}_2)$-pre-invex functions on $\mathbb{R}^{\alpha}$ $(0 < \alpha \leq 1)$. A new fractal weighted integral identity is established and further employed to obtain several Ostrowski-type results in the fractal setting for functions whose first derivatives in absolute value belong to the class of generalized $(\tilde{h}_1, \tilde{h}_2)$-pre-invex functions. An illustrative example is presented to validate the theoretical findings. Moreover, applications of the main results are derived in connection with generalized random variables and various special means, highlighting the effectiveness and potential scope of the proposed approach.
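For orientation, the classical (non-fractal) Ostrowski inequality that such ‘Ostrowski-type’ results extend is stated below for $f:[a,b]\to\mathbb{R}$ differentiable on $(a,b)$ with $|f'(t)| \le M$ for all $t \in (a,b)$. The fractal, weighted and pre-invex versions in the paper replace these hypotheses and are not reproduced here.

```latex
\left| f(x) - \frac{1}{b-a}\int_a^b f(t)\,dt \right|
  \le \left[ \frac{1}{4} + \frac{\left(x - \frac{a+b}{2}\right)^2}{(b-a)^2} \right] (b-a)\, M,
  \qquad x \in [a, b].
```

The bound is smallest at the midpoint $x = (a+b)/2$, where it reduces to $(b-a)M/4$.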

14 pages, 782 KB  
Article
Measured vs. Estimated V̇O2max in the Yo-Yo Endurance Test: An Exploratory Study in Professional Soccer Players
by Antonio Buglione, Dario Pompa, Marco Beato, Marco Bruno Luigi Rocchi, Cristian Savoia, Maurizio Bertollo, Davide Curzi, Davide Sisti and Fabrizio Perroni
Sports 2025, 13(12), 424; https://doi.org/10.3390/sports13120424 - 2 Dec 2025
Viewed by 1259
Abstract
Accurate assessment of aerobic fitness is crucial in soccer; however, the validity of field-based predictive tests remains uncertain in professional players. This study examined the relationship between directly measured and estimated maximal oxygen uptake (V̇O2max) during the Yo-Yo Endurance Test Level 1 (YYET1) in professional soccer players and evaluated seasonal changes after six months of training and competition. Seventeen players from an Italian third-division team performed the YYET1 in pre- and mid-season conditions, while V̇O2 was continuously recorded using a portable metabolic system. V̇O2max was estimated using Bangsbo’s distance-based formula. Linear regression and Bland–Altman analyses were used to assess relationships and agreement between methods. Measured V̇O2max increased significantly from pre- to mid-season (+13.9%, p < 0.001), whereas estimated values showed a smaller rise (+5.2%, p < 0.001). The predictive method systematically underestimated V̇O2max (bias −2.3 to −7.0 mL·kg⁻¹·min⁻¹), and regression analyses revealed only moderate shared variance (R² = 0.18–0.20) between estimated and measured values. These findings demonstrate that Bangsbo’s equation lacks validity for estimating V̇O2max in professional players and cannot accurately track aerobic adaptations across a season. For precise physiological evaluation, direct measurement using portable metabolic systems is required, while submaximal soccer-specific protocols may offer practical alternatives for longitudinal monitoring.
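The agreement analysis named above can be sketched in Python: the Bland–Altman bias is the mean paired difference, and the 95% limits of agreement are the bias ± 1.96 SD of those differences. Bangsbo’s prediction equation is not given in the abstract and is not reproduced; the values below are hypothetical stand-ins for its output, and `percent_change` is an illustrative helper for the seasonal comparison.

```python
from statistics import mean, stdev


def bland_altman(measured: list[float], estimated: list[float]) -> tuple[float, float, float]:
    """Bias and 95% limits of agreement for paired measurements.

    Differences are estimated - measured, so a negative bias means the
    prediction underestimates the directly measured value.
    """
    if len(measured) != len(estimated) or len(measured) < 2:
        raise ValueError("need two equal-length series with at least two pairs")
    differences = [e - m for m, e in zip(measured, estimated)]
    bias = mean(differences)
    sd = stdev(differences)  # sample SD (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def percent_change(before: float, after: float) -> float:
    """Relative change, e.g. from pre- to mid-season, in percent."""
    return 100.0 * (after - before) / before


# Hypothetical VO2max values in mL/kg/min (not the study's data).
measured = [55.0, 58.0, 60.0, 52.0]
estimated = [50.0, 53.0, 56.0, 48.0]
bias, lower, upper = bland_altman(measured, estimated)
print(f"bias {bias:.2f}, limits of agreement [{lower:.2f}, {upper:.2f}]")
```

Taking differences as estimated minus measured matches the sign of the reported bias (negative values indicate underestimation).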

14 pages, 2974 KB  
Data Descriptor
Articulatory Data on Preboundary Lengthening Across Prominence Conditions in American English
by Jiyoung Jang, Sahyang Kim and Taehong Cho
Data 2025, 10(12), 197; https://doi.org/10.3390/data10120197 - 1 Dec 2025
Viewed by 437
Abstract
This article presents articulatory–kinematic data on preboundary lengthening (Intonational Phrase-final lengthening) from the productions of ten native speakers of American English, a relatively rare class of phonetic data compared with the more widely available acoustic data. The dataset includes three trisyllabic nonce words (bábaba, babába, bababá), each designed to manipulate the location of lexical stress. These were produced under prosodic conditions that varied in boundary position and focus-induced phrasal prominence, enabling analysis of how preboundary lengthening is distributed across words with different lexical stress locations and how it interacts with prosodic prominence. Articulatory data were collected using electromagnetic articulography (EMA, Carstens AG200), providing kinematic measurements such as movement duration, peak velocity, and displacement of articulatory gestures. The accompanying files allow examination of individual speaker variation in these measures as modulated by prosodic structure, including boundary and prominence effects. While the theoretical findings have been reported in a previous study, the full dataset, including detailed descriptions of individual speaker patterns, is made available here. By releasing these less common articulatory data publicly, we aim to promote broad reuse and support further research in prosody, articulatory phonetics, and speech production.
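The dataset already provides movement duration, peak velocity and displacement; the Python sketch below only illustrates how such measures are commonly derived from a sensor trajectory. The one-dimensional trajectory, the 200 Hz sampling rate and the 20%-of-peak-velocity criterion for gesture onset and offset are assumptions made for illustration, not details taken from the descriptor.

```python
def gesture_kinematics(positions_mm: list[float], fs_hz: float, threshold: float = 0.2) -> dict:
    """Peak velocity, displacement and duration of one articulatory gesture.

    positions_mm: one-dimensional sensor trajectory (e.g. vertical lip position).
    Velocity uses central differences (one-sided at the edges). Onset and offset
    are the outermost samples around the velocity peak whose speed stays at or
    above `threshold` x peak speed (20% assumed here).
    """
    n = len(positions_mm)
    if n < 3:
        raise ValueError("need at least three samples")
    dt = 1.0 / fs_hz
    velocity = [(positions_mm[1] - positions_mm[0]) / dt]
    velocity += [(positions_mm[i + 1] - positions_mm[i - 1]) / (2 * dt) for i in range(1, n - 1)]
    velocity.append((positions_mm[-1] - positions_mm[-2]) / dt)
    speed = [abs(v) for v in velocity]
    peak_idx = max(range(n), key=speed.__getitem__)
    cutoff = threshold * speed[peak_idx]
    onset = peak_idx
    while onset > 0 and speed[onset - 1] >= cutoff:
        onset -= 1
    offset = peak_idx
    while offset < n - 1 and speed[offset + 1] >= cutoff:
        offset += 1
    return {
        "peak_velocity_mm_s": speed[peak_idx],
        "displacement_mm": abs(positions_mm[offset] - positions_mm[onset]),
        "duration_ms": (offset - onset) * dt * 1000.0,
    }


# Hypothetical closing gesture sampled at an assumed 200 Hz.
trajectory = [0.0, 0.0, 1.0, 3.0, 5.0, 6.0, 6.0, 6.0]
print(gesture_kinematics(trajectory, fs_hz=200.0))
```

A velocity-threshold criterion is used here because it gives stable onset and offset points even when the trajectory drifts slightly before and after the gesture.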

14 pages, 2679 KB  
Article
The KIF18A Inhibitor ATX020 Induces Mitotic Arrest and DNA Damage in Chromosomally Instable High-Grade Serous Ovarian Cancer Cells
by Jayakumar Nair, Tzu-Ting Huang, Maureen Lynes, Sanjoy Khan, Serena Silver and Jung-Min Lee
Cells 2025, 14(23), 1863; https://doi.org/10.3390/cells14231863 - 26 Nov 2025
Viewed by 2786
Abstract
High-grade serous ovarian cancer (HGSOC) is the most common (~80%) and most lethal ovarian cancer subtype in the United States, characterized by TP53 mutations and DNA repair defects that cause chromosomal instability (CIN). KIF18A is a cytoskeletal motor protein essential for cell division in CIN+ cancer cells but not necessary for cell division in normal cells. Therefore, KIF18A represents a promising therapeutic target in CIN+ cancers. We investigated the use of a novel KIF18A inhibitor, ATX020, to selectively target CIN+ HGSOC cells, using growth inhibition assays, invasion assays, immunoassays, cell cycle analysis, and immunofluorescence techniques. Using DepMap and flow cytometry, we classified a panel of HGSOC cell lines based on aneuploidy scores (AS) and ploidy levels and identified a correlation between these classifications and sensitivity to ATX020. ATX020 induced cytotoxicity through mitotic arrest and DNA damage, and reduced tumor growth in HGSOC with high AS. Mechanistically, ATX020 blocks KIF18A’s plus-end movement on spindle fibers, increasing spindle length and resulting in chromosomal mis-segregation, aneuploidy, and DNA damage. Our findings suggest that ATX020 inhibits CIN+ HGSOC cells mainly by disrupting KIF18A’s function in mitosis, thereby inducing mitotic arrest and DNA damage.

22 pages, 2265 KB  
Article
A Secure and Robust Multimodal Framework for In-Vehicle Voice Control: Integrating Bilingual Wake-Up, Speaker Verification, and Fuzzy Command Understanding
by Zhixiong Zhang, Yao Li, Wen Ren and Xiaoyan Wang
Eng 2025, 6(11), 319; https://doi.org/10.3390/eng6110319 - 10 Nov 2025
Viewed by 1231
Abstract
Intelligent in-vehicle voice systems face critical challenges in robustness, security, and semantic flexibility under complex acoustic conditions. To address these issues holistically, this paper proposes a novel multimodal and secure voice-control framework. The system integrates a hybrid dual-channel wake-up mechanism, combining a commercial English engine (Picovoice) with a custom lightweight ResNet-Lite model for Chinese, to achieve robust cross-lingual activation. For reliable identity authentication, an optimized ECAPA-TDNN model is introduced, enhanced with spectral augmentation, sliding-window feature fusion, and an adaptive threshold mechanism. Furthermore, a two-tier fuzzy command matching algorithm operating at the character and pinyin levels is designed to significantly improve tolerance to speech variations and ASR errors. Comprehensive experiments on a test set encompassing various Chinese dialects, English accents, and noise environments demonstrate that the proposed system achieves high performance across all components: the wake-up mechanism maintains commercial-grade reliability for English and provides a functional baseline for Chinese; the improved ECAPA-TDNN attains low equal error rates of 2.37% (quiet), 5.59% (background music), and 3.12% (high-speed noise), outperforming standard baselines and showing strong noise robustness compared with the state of the art; and the fuzzy matcher raises command recognition accuracy to over 95.67% in quiet environments and above 92.7% under noise, outperforming hard matching by approximately 30%. End-to-end tests confirm an overall interaction success rate of 93.7%. This work offers a practical, integrated solution for developing secure, robust, and flexible voice interfaces in intelligent vehicles.
(This article belongs to the Section Electrical and Electronic Engineering)
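The abstract does not give the matching or scoring details, so the following Python sketch only illustrates the two general ideas under stated assumptions. The first is a two-tier matcher that compares characters first and then falls back to pinyin, using `difflib.SequenceMatcher` similarity, a hypothetical nine-character pinyin table and illustrative 0.8 thresholds. The second is an equal error rate (EER) computed by sweeping a decision threshold over speaker-verification scores. Neither function is the authors' implementation.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical, deliberately tiny character-to-pinyin table (tones dropped).
# A real system would need a full converter that also handles polyphonic characters.
PINYIN = {"打": "da", "开": "kai", "关": "guan", "闭": "bi", "空": "kong",
          "调": "tiao", "条": "tiao", "车": "che", "窗": "chuang"}
COMMANDS = ["打开空调", "关闭空调", "打开车窗", "关闭车窗"]


def to_pinyin(text: str) -> str:
    """Space-separated pinyin; characters missing from the table are kept as-is."""
    return " ".join(PINYIN.get(ch, ch) for ch in text)


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def match_command(asr_text: str, commands: list[str], char_threshold: float = 0.8,
                  pinyin_threshold: float = 0.8) -> Optional[str]:
    """Two-tier fuzzy match: characters first, then pinyin to absorb homophone ASR errors."""
    best = max(commands, key=lambda c: similarity(asr_text, c))
    if similarity(asr_text, best) >= char_threshold:
        return best
    query = to_pinyin(asr_text)
    best = max(commands, key=lambda c: similarity(query, to_pinyin(c)))
    if similarity(query, to_pinyin(best)) >= pinyin_threshold:
        return best
    return None  # reject rather than guess


def equal_error_rate(genuine: list[float], impostor: list[float]) -> float:
    """Approximate EER by sweeping the decision threshold over all observed scores."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine speakers rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


print(match_command("打开空条", COMMANDS))  # homophone error: 条 (tiao) instead of 调 (tiao)
print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.75]))
```

In the example, the character tier scores the misrecognized input at only 0.75 against the intended command, so the pinyin tier makes the decision. An unrelated utterance falls below both thresholds and is rejected.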

7 pages, 1456 KB  
Proceeding Paper
Towards a More Natural Urdu: A Comprehensive Approach to Text-to-Speech and Voice Cloning
by Muhammad Ramiz Saud, Muhammad Romail Imran and Raja Hashim Ali
Eng. Proc. 2025, 87(1), 112; https://doi.org/10.3390/engproc2025087112 - 20 Oct 2025
Cited by 13 | Viewed by 1689
Abstract
This paper introduces a comprehensive approach to building natural-sounding Urdu Text-to-Speech (TTS) and voice cloning systems, addressing the lack of computational resources for Urdu. We developed a large-scale dataset of over 100 h of Urdu speech, carefully cleaned and phonetically aligned through an automated transcription pipeline to preserve linguistic accuracy. The dataset was then used to fine-tune Tacotron2, a neural network model originally trained for English, with modifications tailored to Urdu’s phonological and morphological features. To further enhance naturalness, we integrated voice cloning techniques that capture regional accents and produce personalized speech outputs. Model performance was evaluated through mean opinion score (MOS), word error rate (WER), and speaker similarity, showing substantial improvements compared to previous Urdu systems. The results demonstrate clear progress toward natural and intelligible Urdu speech synthesis, while also revealing challenges such as handling dialectal variation and preventing model overfitting. This work contributes an essential resource and methodology for advancing Urdu natural language processing (NLP), with promising applications in education, accessibility, entertainment, and assistive technologies.
(This article belongs to the Proceedings of The 5th International Electronic Conference on Applied Sciences)
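Of the metrics named, WER can be stated exactly: it is the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and an ASR transcript of the synthesized audio, divided by the number of reference words. MOS is the mean of listener ratings on a 1–5 scale. The sketch below is a minimal Python version of both. Whitespace tokenization is an assumption and may be too naive for Urdu text with irregular spacing, and the example sentences are illustrative only.

```python
from statistics import fmean


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1,         # insertion
                             dist[i - 1][j - 1] + cost)  # substitution or match
    return dist[len(ref)][len(hyp)] / len(ref)


def mean_opinion_score(ratings: list[int]) -> float:
    """MOS: mean of listener ratings on a 1-5 scale."""
    if not ratings or any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be non-empty and within 1-5")
    return fmean(ratings)


# "This is an example" vs. a transcript with one substituted word ("This was an example").
print(word_error_rate("یہ ایک مثال ہے", "یہ ایک مثال تھی"))  # 0.25
print(mean_opinion_score([4, 5, 3, 4]))  # 4.0
```

Because insertions are counted, WER can exceed 1.0 when the transcript is much longer than the reference.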
