Search Results (664)

Search Parameters:
Keywords = speech intelligibility

22 pages, 2321 KB  
Article
A Deployment-Aware Data Processing Approach for Accuracy and Authenticity Evaluation of Artificial Emotional Intelligence in IoT Edge with Deep Learning
by Şükrü Mustafa Kaya
Appl. Sci. 2026, 16(9), 4394; https://doi.org/10.3390/app16094394 - 30 Apr 2026
Abstract
Artificial Emotional Intelligence (AEI) has gained significant attention for enabling machines to recognize and interpret human affective states through modalities such as speech. While deep learning-based speech emotion recognition (SER) models have achieved promising accuracy levels, their practical deployment in resource-constrained IoT edge environments remains insufficiently explored. In particular, there is a lack of systematic evaluation approaches that jointly consider classification performance, computational efficiency, and deployment feasibility under edge-oriented operational constraints. In this study, I address this gap by proposing a deployment-aware evaluation perspective for SER systems operating under IoT edge constraints. Rather than introducing a new model architecture, I focus on establishing a unified and reproducible evaluation framework that reflects practical deployment considerations for edge-based intelligent systems. Within this framework, three widely used deep learning architectures, convolutional neural networks (CNN), long short-term memory (LSTM), and dense neural networks, are systematically analyzed using the EMODB dataset. The experimental results demonstrate that CNN-based models achieve the most consistent classification performance, with peak validation accuracy reaching approximately 84%, while also providing a favorable balance between recognition performance and computational efficiency. To better reflect deployment-oriented evaluation, the study also considers latency-related behavior and computational characteristics relevant to edge computing environments based on benchmark-driven estimations. The findings highlight the importance of deployment-aware evaluation strategies and provide practical insights for selecting suitable model architectures in edge-oriented speech emotion recognition scenarios. This study contributes to bridging the gap between theoretical deep learning performance and practical feasibility considerations in IoT-based intelligent systems. Full article

20 pages, 2281 KB  
Technical Note
Development and Evaluation of a Low-Cost Open-Source Nasometer
by Liwei Wang, Alessia Romani, Scott Adams, Joshua M. Pearce and Vijay Parsa
Sensors 2026, 26(9), 2739; https://doi.org/10.3390/s26092739 - 28 Apr 2026
Abstract
Hypernasality is a common characteristic of several speech disorders and can significantly affect perceived speech intelligibility and quality. Nasometry quantifies nasalance by calculating the proportion of acoustic energy emitted from the nasal cavity relative to the combined nasal and oral acoustic output during speech production and is commonly used in clinical assessment and research. However, commercially available nasometers are costly and limited in portability, restricting their use in resource-limited or remote settings. The primary purpose of this study was to design and build a low-cost, open-source mobile nasometer prototype (“mNasometer”) by leveraging advances in 3D printing, off-the-shelf electronic components, and a custom open-source mobile application. A secondary aim was to compare the electroacoustic and subjective performance of mNasometer with that of a gold-standard commercial nasometer. Electroacoustic analyses focused on comparing long-term averaged spectra and the oral/nasal acoustic isolation between the gold-standard commercial nasometer and the proposed mNasometer, which incorporates a 3D-printed nasal separation plate. In addition, nasalance scores were collected from ten healthy young adult participants using both systems during structured speech production tasks (i.e., reading standard passages or nasal sentences). Agreement between devices was evaluated using correlational analyses and comparative statistical procedures. Long-term averaged spectra exhibited similar profiles between the commercial nasometer and the mNasometer across different test stimuli, indicating comparable capture of stimulus energy distributions. Although the mNasometer demonstrated reduced oral–nasal acoustic isolation relative to the commercial system, objective nasalance scores followed similar overall trends between devices, with statistically significant stimulus-dependent differences observed. Frame-wise correlational analyses revealed significant correlations between nasalance measures obtained from the commercial nasometer and the mNasometer across most of the speech production tasks, suggesting that the reduced isolation did not critically compromise measurement correspondence. In summary, the low-cost, open-source mNasometer prototype provides nasalance measurements that show promising agreement with those of a gold-standard commercial device. Its reduced cost and increased portability suggest potential for expanded research and field-based applications in the objective assessment of nasalance. Full article
(This article belongs to the Section Biomedical Sensors)
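
The nasalance measure described in the abstract above is an energy ratio: acoustic energy captured at the nasal microphone divided by the combined nasal-plus-oral energy. The sketch below is a minimal frame-wise illustration of that ratio only, not the mNasometer's actual processing chain; the frame length, hop size, and variable names are assumptions.

```python
import numpy as np

def nasalance_percent(nasal: np.ndarray, oral: np.ndarray,
                      frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    """Frame-wise nasalance: nasal energy / (nasal + oral energy) * 100.

    `nasal` and `oral` are assumed to be time-aligned single-channel
    signals from the nasal and oral microphones; filtering and calibration
    steps used by real nasometers are omitted.
    """
    scores = []
    n = min(len(nasal), len(oral))
    for start in range(0, n - frame_len + 1, hop):
        e_nasal = np.sum(nasal[start:start + frame_len] ** 2)
        e_oral = np.sum(oral[start:start + frame_len] ** 2)
        total = e_nasal + e_oral
        if total > 0:  # skip silent frames
            scores.append(100.0 * e_nasal / total)
    return np.asarray(scores)
```

A passage-level nasalance score is then typically summarized from these frame values (e.g., their mean over voiced frames), which is the kind of score compared across devices in the study.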
27 pages, 3977 KB  
Review
Recovering Speech from Vibrations: Principles and Algorithms in Radar and Laser Sensing
by Emily Bederov, Baruch Berdugo and Israel Cohen
Sensors 2026, 26(8), 2553; https://doi.org/10.3390/s26082553 - 21 Apr 2026
Viewed by 307
Abstract
Sensing audio using non-acoustic modalities such as millimeter-wave radar and laser-based systems has emerged as an active research area with significant implications for privacy, security, and robust speech processing. These approaches recover speech-related information from vibration measurements captured by non-acoustic sensing modalities. Prior work spans a wide range of techniques, from classical signal-processing pipelines to modern machine-learning and deep-learning models, enabling applications such as speech reconstruction, eavesdropping, automatic speech recognition, and noise-robust enhancement. Some systems rely on radar or laser sensing as a standalone audio surrogate, while others fuse radar-derived features with microphone signals to improve robustness in noisy or non-line-of-sight environments. Experimental results across the literature demonstrate that recovering intelligible speech or discriminative speech features from radar or laser-sensed vibrations is feasible under controlled conditions. However, performance remains sensitive to practical factors including sensing distance, object material and geometries, environmental interference, multipath effects, and task complexity. Not all speech-related tasks are reliably solved, particularly in unconstrained real-world scenarios. Overall, the field is rapidly evolving, with open challenges in robustness, generalization, and deployment, offering several promising directions for future research. Full article

36 pages, 6746 KB  
Article
An Archaeoacoustic Analysis of a Single-Nave Hall in the Cellars of Diocletian’s Palace in Split, Croatia
by Mateja Nosil Mešić, Marko Horvat and Zoran Veršić
Acoustics 2026, 8(2), 26; https://doi.org/10.3390/acoustics8020026 - 20 Apr 2026
Viewed by 329
Abstract
Diocletian’s palace with its cellars represents one of the most important cultural heritage sites of the ancient Roman civilisation on the present-day Croatian territory. The cellar complex has been rediscovered only recently and has been preserved remarkably well due to its centuries-long concealment beneath mediaeval urban matrices. An archaeoacoustic analysis was performed on a selected single-nave hall as a small part of this complex. A model of the hall was developed in room acoustics simulation software and calibrated based on the results of field measurements. Acoustic suitability of the hall for speech-based events and music performances was then evaluated according to contemporary objective criteria, and the findings were compared with the results of similar studies performed on other heritage sites. The hall was found to be very well suited for speech in terms of intelligibility and mid-frequency reverberation, thus showing potential for revitalisation, with excessive low-frequency reverberation in the hall and reduced audibility in the farthest part of the audience as potential issues. With a feasible audience size, the hall is not reverberant enough for music performances but provides high clarity. In terms of sound strength, the hall is suitable for solo performers or small ensembles. Excessive perceptive broadening of the sound source is expected due to strong early lateral energy. In terms of traditional Dalmatian a cappella singing, the acoustics of the hall are likely to support and enhance such performances. Full article
(This article belongs to the Collection Historical Acoustics)

29 pages, 417 KB  
Article
An AI-Based Security Architecture for Fraud Detection in Cloud Call Centers for Low-Resource Languages: Arabic as a Use Case
by Pinar Boluk and Hana’a Maratouq
Electronics 2026, 15(8), 1718; https://doi.org/10.3390/electronics15081718 - 18 Apr 2026
Viewed by 173
Abstract
Cloud-based telephony platforms face growing fraud risks including voice phishing (vishing), subscription abuse, and organizational impersonation, with detection being especially challenging in low-resource languages such as Arabic. We present an Artificial Intelligence (AI)-based security architecture for fraud detection in Arabic cloud call centers, combining onboarding verification, behavioral monitoring, domain-adapted Automatic Speech Recognition (ASR), semantic transcript search, and Large Language Model (LLM)-based entity verification. The domain-adapted Langa ASR model achieves a Word Error Rate (WER) of 41.0% and Character Error Rate (CER) of 18.2%, outperforming all evaluated commercial baselines. LLM-based entity extraction with multi-call consensus achieves 97.3% company-name accuracy (Generative Pre-trained Transformer 4, GPT-4) and 92.0% in the cost-effective deployed configuration (GPT-3.5 with log-probability filtering). Evaluated on production data from a Middle East and North Africa (MENA)-region provider spanning more than 1000 accounts, the pipeline flagged 47 accounts of which 41 were confirmed fraudulent (directly observed precision 87.2%, 95% confidence interval (CI): 74.3–95.2%; estimated recall 51–82% under conservative base-rate assumptions—not directly measured), providing evidence for the viability of a unified, threat-model-driven architecture for low-resource telephony fraud detection. Full article
(This article belongs to the Special Issue AI-Enhanced Security: Advancing Threat Detection and Defense)
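
The precision figure in the abstract above is direct arithmetic on the flagged and confirmed counts (41 of 47). The abstract does not name the confidence-interval method; the sketch below uses an exact (Clopper–Pearson) binomial interval as one common choice, purely to illustrate how such an interval is obtained.

```python
# 41 confirmed fraudulent accounts out of 47 flagged (from the abstract).
# The authors' exact CI method is not stated; Clopper-Pearson is assumed here.
from scipy.stats import binomtest

confirmed, flagged = 41, 47
precision = confirmed / flagged  # 0.872 -> 87.2%
ci = binomtest(confirmed, flagged).proportion_ci(confidence_level=0.95,
                                                 method="exact")
print(f"precision = {precision:.1%}, 95% CI = ({ci.low:.1%}, {ci.high:.1%})")
```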

13 pages, 566 KB  
Article
Effects of Stimulus Complexity on the Phonemic Restoration Effect
by Nirmal Srinivasan, Sadie O’Neill and Chhayakanta Patro
Audiol. Res. 2026, 16(2), 60; https://doi.org/10.3390/audiolres16020060 - 15 Apr 2026
Viewed by 223
Abstract
Background/Objectives: Phonemic restoration refers to improved speech understanding when periodic silent interruptions are replaced by a plausible masking sound, reflecting an interaction between perceptual continuity and top-down linguistic inference. This study tested whether the magnitude and rate dependence of phonemic restoration vary systematically with stimulus complexity, operationalized using speech materials that differ in response constraints and linguistic variability. Methods: Young adults with normal audiometric thresholds completed an interrupted-speech identification task using five corpora spanning closed-set and open-set speech corpora. Stimuli were periodically interrupted at 2 Hz and 3 Hz with a 50% duty cycle. For each corpus and rate, interruption intervals were either left silent or filled with speech-shaped noise. Results: Closed-set materials yielded higher intelligibility than open-set materials across conditions. Replacing silent gaps with speech-shaped noise improved intelligibility for all corpora. Importantly, the joint influence of interruption rate and gap-filler depended on the stimulus type: rate-by-filler interactions were most evident for the open-set corpora as compared to the closed-set corpora. Keyword identification varied systematically with word position for the open-set materials, indicating nonuniform vulnerability across sentence structures. Conclusions: These results indicate that phonemic restoration is robust but material-dependent. Stimulus complexity shapes how temporal sampling and masking plausibility combine to support perceptual repair, and open-set, high-variability materials are particularly sensitive to these interactions. Full article
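
The interruption paradigm in the abstract above (periodic gating at 2 Hz or 3 Hz with a 50% duty cycle, gaps left silent or filled with speech-shaped noise) can be illustrated with a short signal-processing sketch. This is a generic illustration under assumed parameters, not the authors' stimulus-generation code; details such as onset/offset ramps and level matching are omitted.

```python
import numpy as np

def interrupt_speech(signal: np.ndarray, fs: int, rate_hz: float = 2.0,
                     duty: float = 0.5, filler=None) -> np.ndarray:
    """Periodically interrupt `signal` at `rate_hz` with the given duty cycle.

    Gaps are silent unless `filler` (e.g., speech-shaped noise of the same
    length as `signal`) is supplied, in which case gaps are filled with it.
    """
    period = int(fs / rate_hz)          # samples per on/off cycle
    on_len = int(period * duty)         # samples of speech kept per cycle
    out = signal.copy()
    for start in range(0, len(signal), period):
        gap_start = start + on_len
        gap_end = min(start + period, len(signal))
        if filler is None:
            out[gap_start:gap_end] = 0.0
        else:
            out[gap_start:gap_end] = filler[gap_start:gap_end]
    return out
```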

21 pages, 748 KB  
Systematic Review
Accuracy of Machine Learning Models in Predicting Clinical Outcomes in Bipolar Disorder: A Systematic Review
by Jing Ling Tay, Ling Zhang and Kang Sim
Brain Sci. 2026, 16(4), 415; https://doi.org/10.3390/brainsci16040415 - 15 Apr 2026
Viewed by 379
Abstract
Background/Objectives: Bipolar disorder (BD) is one of the leading causes of disability worldwide, causing significant functional impairments in those affected. The heterogeneous course of BD renders the prediction of clinical progress and outcomes challenging, but it can be potentially enhanced with the use of artificial intelligence methods. In this systematic review, we aimed to examine the extant literature regarding the predictive accuracy of clinical functioning, illness affective state, relapse, and relevant predictors amongst patients with BD, using artificial intelligence methods. Methods: The study was guided by PRISMA and the Cochrane Handbook for Systematic Reviews. Six electronic databases were systematically searched from inception for relevant studies until July 2025 and relevant data were summarised in tables. The protocol of the review was registered on Prospero, ID: CRD42024590343. Results: Forty articles were included in this review. The area under the curve (AUC) values for clinical functioning, illness affective state, and relapse prediction were 0.59–0.72 (poor to acceptable), 0.57–0.97 (poor to outstanding), and 0.45–0.98 (poor to outstanding), respectively. Supervised, tree-based algorithms performed the best. Predictive factors included sociodemographic, clinical and psychological factors and wearable data, as well as speech and video recordings. Conclusions: Existing studies showed the potential of machine learning methods in the prediction of clinical progress and outcomes of BD (specifically functional status, affective state, and relapse) based on relevant collected variables. Longitudinal studies can further clarify and validate the associated predictive factors for earlier identification of those at risk of poorer prognosis to enhance management of BD. Full article

11 pages, 394 KB  
Review
Emerging Speech-in-Noise Tools for the Assessment of Hearing Loss: A Scoping Review
by Andrea Migliorelli, Marianna Manuelli, Chiara Visentin, Chiara Bianchini, Francesco Stomeo, Stefano Pelucchi, Nicola Prodi and Andrea Ciorba
Audiol. Res. 2026, 16(2), 57; https://doi.org/10.3390/audiolres16020057 - 11 Apr 2026
Viewed by 350
Abstract
Background/Objectives: The objective of this scoping review was to map and critically describe emerging speech-in-noise assessment tools developed over the last decade for the evaluation of hearing loss beyond conventional audiological measures. Methods: This review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines. A comprehensive literature search was performed in the PubMed/MEDLINE, Scopus, and Embase databases. A comprehensive review of studies describing novel or emerging SIN-based assessment tools was conducted, with a particular emphasis on those including adult participants with normal hearing and hearing loss. Results: Nine studies met the inclusion criteria and were included in the review. The identified tools cover a range of methodological innovations, including advanced digits-in-noise paradigms, antiphasic and binaural presentation modes, optimized adaptive procedures, and digital or automated testing platforms. Several studies also incorporated artificial intelligence-based approaches, such as machine learning, text-to-speech, and automatic speech recognition, to enhance test development, administration, and hearing loss classification. Across all studies, SIN measures demonstrated the ability to reliably differentiate between normal hearing listeners and individuals with hearing loss and to provide complementary information beyond pure-tone audiometry. Conclusions: Emerging speech-in-noise tools show considerable potential to improve the functional assessment of hearing loss and to support more sensitive, accessible, and scalable approaches for hearing evaluation. Further research is required to assess their clinical integration and long-term impact on hearing screening and diagnostic pathways. Full article

20 pages, 489 KB  
Systematic Review
Linguistic Markers in At-Risk Mental States Using Natural Language Processing: A Systematic Review
by Yuhan Zhang, Alba Carrió, Julia Sevilla-Llewellyn-Jones, Enrique Gutiérrez, Ana Calvo, Jose-Blas Navarro and Ana Barajas
Healthcare 2026, 14(8), 999; https://doi.org/10.3390/healthcare14080999 - 10 Apr 2026
Viewed by 351
Abstract
Background/Objectives: In recent years, research on psychosis has increasingly focused on prevention, aiming to implement early interventions that mitigate or reduce its impact. Within this framework, the analysis of linguistic markers in individuals with at-risk mental states (ARMS) has proven valuable for identifying those at risk and predicting psychosis onset. Artificial intelligence tools, particularly natural language processing (NLP), have emerged as effective resources for detecting these language-based indicators. This study aims to synthesize the existing scientific evidence on linguistic markers analyzed through NLP techniques in individuals with ARMS. Methods: A systematic review following the PRISMA 2020 protocol was conducted. Three databases (PubMed, PsycInfo, and Scopus) were searched for published articles from their inception to October 2025. Rayyan software was used to manage references and article downloads. Out of ninety initial search results, fifteen studies involving 1313 participants from diverse groups were included in the review. Results: The findings indicated that alterations in semantic coherence, syntactic complexity, referential cohesion, and speech/content poverty differentiated ARMS individuals from healthy controls. Several of these markers, analyzed with NLP methods, predicted the onset of psychosis with accuracy levels ranging from 79% to 100%, although these findings should be interpreted with caution due to the significant methodological heterogeneity and variability in sample sizes across the included studies. Conclusions: NLP techniques offer a powerful approach for detecting language alterations that distinguish ARMS individuals and provide meaningful predictions of psychosis onset, highlighting their potential as a complement to traditional clinical assessments for early identification and prevention. Full article

21 pages, 288 KB  
Article
In the Space Between Words: Speech–Silence Dynamics, Religio–Racial Formations, and Christian–Muslim Relationships in The Netherlands
by Deniz Aktaş
Genealogy 2026, 10(2), 43; https://doi.org/10.3390/genealogy10020043 - 10 Apr 2026
Viewed by 334
Abstract
In Western Europe, and particularly in The Netherlands, speech is rarely neutral: to talk is to participate morally and civically, while silence is frequently marked as evasive, passive, or suspect. The capacities for speech, for being heard, understood, and responsive, are widely regarded as hallmarks of autonomous, transparent, free-thinking, and sovereign subjectivity, celebrated as expressions of a shared progressive modernity. These ideals of subjectivity are routinely placed in tension within the so-called secular–religious binary framework, in which the compatibility of non-secular sensibilities or non-Christian religions, especially Islam, with such Dutch societal values is persistently and heavily problematized. Within such accounts, speech becomes a criterion Muslims in Europe are then expected to meet, not merely by speaking but by doing so in ways deemed proper and intelligible. To complicate and deepen understanding of these dynamics, this article draws on ethnographic insights from (secular) Christian–Muslim couples in The Netherlands, looking at how the dynamics of speech–silence function within intimate contexts, where they take place, where they break down, and ultimately where their limits lie. Attuned to the cacophony of multivocal gestures, whether in acts of refusal, the quiet eloquence of silence, or the directness of vocal protest, the article reveals the intricate and consequential interplay between these dynamics and the structuring and affective forms of secular and religio-racial norms in everyday life. Full article
(This article belongs to the Special Issue Secularism, Multiculturalism and Race–Religion Entanglements)
31 pages, 1954 KB  
Article
HASCom: A Heterogeneous Affective-Semantic Communication Framework for Speech Transmission
by Zhenjia Yu, Taojie Zhu, Md Arman Hossain, Zineb Zbarna and Lei Wang
Sensors 2026, 26(7), 2158; https://doi.org/10.3390/s26072158 - 31 Mar 2026
Viewed by 605
Abstract
Driven by the development of next-generation wireless networks and the widespread adoption of sensing, communication is shifting from traditional bit-level transmission to intelligent, rich interactions within our digital social system. However, existing speech semantic communication frameworks predominantly focus on textual accuracy, neglecting the critical affective information (e.g., tone and emotion) that is essential for natural human-centric interactions in the real world. To address this limitation, we propose the Heterogeneous Affective Speech Semantic Communication (HASCom) framework, designed for the robust transmission of highly expressive speech over complex wireless channels. Specifically, we design a heterogeneous dual-stream transmission architecture that decouples discrete phoneme-level linguistic content from continuous emotional embeddings. For discrete semantic information, we use reliable digital coding protected by Low-Density Parity-Check (LDPC) to guarantee strict recoverability. Conversely, for emotional features, we employ Deep Joint Source-Channel Coding (JSCC) analog transmission to prevent irreversible quantization errors and the cliff effect. Additionally, we develop a prior-guided diffusion reconstruction module at the receiving end. This module leverages a structural prior network to align the decoded semantics, which then steers the reverse diffusion process conditioned on the recovered affective features. Extensive experiments under both AWGN and Rayleigh fading channels demonstrate that HASCom significantly outperforms state-of-the-art baselines. Specifically, it achieves superior objective semantic similarity and subjective Mean Opinion Score (MOS) at low Signal-to-Noise Ratios (SNRs), while the JSCC transmission modules maintain an ultra-low inference latency of less than 0.1 ms, validating its high efficiency and robustness for practical deployments. Full article

17 pages, 6806 KB  
Article
Personalization and Generative Dialogue in Social Robotics for Eldercare: A User Study
by Luca Pozzi, Marco Nasato, Nicola Toscani, Francesco Braghin and Marta Gandolla
Appl. Sci. 2026, 16(7), 3369; https://doi.org/10.3390/app16073369 - 31 Mar 2026
Viewed by 478
Abstract
Service robots have the potential to support cognitive and social well-being in long-term care facilities, yet their widespread adoption depends on intuitive interaction modalities that minimize user learning effort and the need for a technical expert on-ground. Spoken dialogue is a natural interface, and recent advances in large language models (LLMs) promise more flexible and engaging exchanges than traditional scripted systems. In this study, we implemented a modular speech-based architecture combining automatic speech recognition, text-to-speech synthesis, and a conversational agent capable of switching between a fully scripted and LLM-driven dialogue. The implemented architecture was embodied in a TIAGo robot (PAL Robotics) and tested to compare three conversational strategies: (1) scripted, pre-defined dialogue, (2) LLM-based free-form conversation, and (3) LLM-based conversation augmented with personal information provided through the prompt. Eighteen younger adults and eighteen older adults engaged in a five-minute interaction with the robot under all three conditions in a within-subject design, and subsequently completed the Almere model questionnaire. Across all subscales and both participant groups, differences between dialogue strategies were small and statistically non-significant, despite informal comments from several older participants indicating a perceived increase in intelligence or naturalness for the LLM conditions. The findings suggest that generative dialogue and basic personalization alone do not meaningfully shift perceived acceptance in brief, task-neutral encounters, underscoring the importance of longer-term deployment and functionally meaningful robot roles in future evaluations. Full article
(This article belongs to the Special Issue Latest Advances and Prospects of Human-Robot Interaction (HRI))

13 pages, 235 KB  
Article
A Comparative Cross-Sectional Study of Prosthodontic Residents and Large Language Models on Standardized Multiple-Choice Questions
by Gül Ates and Ali Can Bulut
Appl. Sci. 2026, 16(7), 3296; https://doi.org/10.3390/app16073296 - 29 Mar 2026
Viewed by 289
Abstract
Recent advances in artificial intelligence have expanded the use of large language models (LLMs) beyond speech-based applications and increased interest in their potential roles in dental education. However, evidence regarding LLM performance in postgraduate dental education, particularly in prosthodontics, remains limited. Therefore, this study aimed to compare the accuracy of responses from prosthodontic residents and LLMs to standardized multiple-choice questions in prosthodontics and to explore the potential role of artificial intelligence in prosthodontic education. Thirty-two prosthodontic residents participated in this cross-sectional study. Participants completed a standardized 30-item multiple-choice test comprising four demographic items and 26 questions assessing basic knowledge, general dentistry, and advanced prosthodontic specialty questions. The same questions were administered to seven large language models (LLMs): ChatGPT-4o, ChatGPT-o1, ChatGPT-o3-mini, Claude Sonnet 3.7, Gemini 2.5 Pro, Microsoft Copilot (web interface, accessed in August 2025), and DeepSeek V3. Response accuracy and consistency were evaluated. Statistical analyses were performed using IBM SPSS Statistics (version 27.0), with statistical significance set at p < 0.05. A statistically significant difference was observed between prosthodontic residents and LLMs in responses to advanced-level prosthodontic specialty questions (p < 0.05), with higher correct response rates recorded for LLMs. No statistically significant differences were identified between the two groups for basic knowledge and general dentistry questions (p > 0.05). In addition, no significant association was found between the duration of prosthodontic residency training and residents’ response accuracy (p > 0.05). LLMs achieved high scores on this structured MCQ-based assessment, particularly in advanced theoretical prosthodontic items. However, these findings should be interpreted with caution within the limits of a written examination format and do not represent overall clinical competence or real-world patient care performance. Accordingly, artificial intelligence may be considered a supportive educational tool in postgraduate prosthodontic education rather than a replacement for clinical training. Full article
22 pages, 1060 KB  
Systematic Review
Artificial Intelligence in EFL Speaking Instruction: A Systematic Review of Pedagogical Design, Affective Conditions and Instructional Input
by Sareen Kaur Bhar
Encyclopedia 2026, 6(4), 74; https://doi.org/10.3390/encyclopedia6040074 - 27 Mar 2026
Viewed by 1177
Abstract
Speaking proficiency remains one of the most challenging skills for learners of English as a Foreign Language (EFL), particularly in contexts where sustained spoken interaction is limited. This systematic review synthesises 36 empirical studies (2015–2025) identified through a PRISMA-guided Scopus search to examine how artificial intelligence (AI)-mediated instruction supports EFL speaking development. The included studies were analysed according to AI modality, pedagogical integration, instructional input characteristics, and linguistic and affective outcomes. Findings indicate that AI tools—such as chatbots, automatic speech recognition systems, and large language models—consistently support affective outcomes, including reduced speaking anxiety and increased willingness to communicate. Improvements in fluency, pronunciation, and accuracy were frequently reported, particularly when AI tools were embedded within task-based and pedagogically structured instructional designs. However, evidence for sustained development of higher-order communicative competence was more variable. The review proposes a mediated input framework conceptualising AI as a design-sensitive instructional resource rather than an autonomous teaching agent. Full article
(This article belongs to the Section Arts & Humanities)

18 pages, 7435 KB  
Article
A Comparative Analysis of Deep-Learning-Based Speech Enhancement Models: Assessing Biometric Speaker Verification in Real-World Noisy Environments
by Md Jahangir Alam Khondkar, Ajan Ahmed, Stephanie Schuckers and Masudul H. Imtiaz
Big Data Cogn. Comput. 2026, 10(3), 98; https://doi.org/10.3390/bdcc10030098 - 23 Mar 2026
Viewed by 616
Abstract
Speech enhancement through denoising is essential for maintaining signal intelligibility and quality in biometric speaker verification pipelines that operate in acoustically adverse conditions. Despite the proliferation of deep learning (DL) architectures for speech denoising, simultaneously optimizing noise attenuation, perceptual fidelity, and speaker-identity preservation remains an open problem. We address this gap by benchmarking three architecturally distinct DL-based enhancement models—Wave-U-Net, CMGAN, and U-Net—on three independent, domain-diverse corpora (SpEAR, VPQAD, and Clarkson) that the models never encountered during training and by introducing commercial-grade VeriSpeak speaker-verification scores as a biometric evaluation dimension absent from prior comparative studies. Our experiments reveal a clear three-way trade-off: U-Net achieves the highest signal-to-noise ratio (SNR) gains (+61.44% on SpEAR, +67.05% on VPQAD, +235.3% on Clarkson) but sacrifices naturalness; CMGAN yields the best perceptual evaluation of speech quality (PESQ) values (3.33, 1.35, and 2.50, respectively), favoring listening-comfort applications; and Wave-U-Net delivers the strongest biometric fidelity (VeriSpeak improvements of +11.63%, +30.22%, and +29.24%) while offering competitive perceptual quality. These results highlight that model selection must be driven by the target deployment scenario and provide actionable guidance for improving biometric verification robustness under real-world noise. Full article
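
The abstract above compares enhancement models on SNR gain, PESQ, and speaker-verification scores. How the percentage SNR gains were derived (relative improvement versus dB difference) is not specified there; the sketch below only shows how the underlying SNR and PESQ values are typically computed for a reference/enhanced pair, using the open-source `pesq` package. Variable names and the sampling rate are assumptions, and this is not the authors' evaluation code.

```python
import numpy as np
from pesq import pesq  # pip install pesq (ITU-T P.862 implementation)

def snr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """SNR of `estimate` against the clean `reference`, in dB."""
    noise = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(noise ** 2))

def evaluate(reference, noisy, enhanced, fs=16000):
    snr_gain_db = snr_db(reference, enhanced) - snr_db(reference, noisy)
    pesq_wb = pesq(fs, reference, enhanced, "wb")  # wideband PESQ at 16 kHz
    return {"snr_gain_db": snr_gain_db, "pesq": pesq_wb}
```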
