Search Results (661)

Search Parameters:
Keywords = speech intelligibility

36 pages, 6746 KB  
Article
An Archaeoacoustic Analysis of a Single-Nave Hall in the Cellars of Diocletian’s Palace in Split, Croatia
by Mateja Nosil Mešić, Marko Horvat and Zoran Veršić
Acoustics 2026, 8(2), 26; https://doi.org/10.3390/acoustics8020026 - 20 Apr 2026
Abstract
Diocletian’s palace with its cellars represents one of the most important cultural heritage sites of the ancient Roman civilisation on the present-day Croatian territory. The cellar complex has been rediscovered only recently and has been preserved remarkably well due to its centuries-long concealment beneath mediaeval urban matrices. An archaeoacoustic analysis was performed on a selected single-nave hall as a small part of this complex. A model of the hall was developed in room acoustics simulation software and calibrated based on the results of field measurements. Acoustic suitability of the hall for speech-based events and music performances was then evaluated according to contemporary objective criteria, and the findings were compared with the results of similar studies performed on other heritage sites. The hall was found to be very well suited for speech in terms of intelligibility and mid-frequency reverberation, thus showing potential for revitalisation, with excessive low-frequency reverberation in the hall and reduced audibility in the farthest part of the audience as potential issues. With a feasible audience size, the hall is not reverberant enough for music performances but provides high clarity. In terms of sound strength, the hall is suitable for solo performers or small ensembles. Excessive perceptive broadening of the sound source is expected due to strong early lateral energy. In terms of traditional Dalmatian a cappella singing, the acoustics of the hall are likely to support and enhance such performances. Full article
(This article belongs to the Collection Historical Acoustics)

29 pages, 409 KB  
Article
An AI-Based Security Architecture for Fraud Detection in Cloud Call Centers for Low-Resource Languages: Arabic as a Use Case
by Pinar Boluk and Hana’a Maratouq
Electronics 2026, 15(8), 1718; https://doi.org/10.3390/electronics15081718 - 18 Apr 2026
Abstract
Cloud-based telephony platforms face growing fraud risks including voice phishing (vishing), subscription abuse, and organizational impersonation, with detection being especially challenging in low-resource languages such as Arabic. We present an Artificial Intelligence (AI)-based security architecture for fraud detection in Arabic cloud call centers, combining onboarding verification, behavioral monitoring, domain-adapted Automatic Speech Recognition (ASR), semantic transcript search, and Large Language Model (LLM)-based entity verification. The domain-adapted Langa ASR model achieves a Word Error Rate (WER) of 41.0% and Character Error Rate (CER) of 18.2%, outperforming all evaluated commercial baselines. LLM-based entity extraction with multi-call consensus achieves 97.3% company-name accuracy (Generative Pre-trained Transformer 4, GPT-4) and 92.0% in the cost-effective deployed configuration (GPT-3.5 with log-probability filtering). Evaluated on production data from a Middle East and North Africa (MENA)-region provider spanning more than 1000 accounts, the pipeline flagged 47 accounts of which 41 were confirmed fraudulent (directly observed precision 87.2%, 95% confidence interval (CI): 74.3–95.2%; estimated recall 51–82% under conservative base-rate assumptions—not directly measured), providing evidence for the viability of a unified, threat-model-driven architecture for low-resource telephony fraud detection. Full article
(This article belongs to the Special Issue AI-Enhanced Security: Advancing Threat Detection and Defense)
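The Word Error Rate and Character Error Rate quoted in the abstract above follow the standard edit-distance definition, WER = (S + D + I) / N. A minimal stdlib sketch of that computation (illustrative only, not the authors' pipeline):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance over reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / len(ref)
```

CER is the same computation over characters instead of words; note that a WER above 100% is possible when insertions outnumber reference words.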
13 pages, 566 KB  
Article
Effects of Stimulus Complexity on the Phonemic Restoration Effect
by Nirmal Srinivasan, Sadie O’Neill and Chhayakanta Patro
Audiol. Res. 2026, 16(2), 60; https://doi.org/10.3390/audiolres16020060 - 15 Apr 2026
Abstract
Background/Objectives: Phonemic restoration refers to improved speech understanding when periodic silent interruptions are replaced by a plausible masking sound, reflecting an interaction between perceptual continuity and top-down linguistic inference. This study tested whether the magnitude and rate dependence of phonemic restoration vary systematically with stimulus complexity, operationalized using speech materials that differ in response constraints and linguistic variability. Methods: Young adults with normal audiometric thresholds completed an interrupted-speech identification task using five speech corpora spanning closed-set and open-set materials. Stimuli were periodically interrupted at 2 Hz and 3 Hz with a 50% duty cycle. For each corpus and rate, interruption intervals were either left silent or filled with speech-shaped noise. Results: Closed-set materials yielded higher intelligibility than open-set materials across conditions. Replacing silent gaps with speech-shaped noise improved intelligibility for all corpora. Importantly, the joint influence of interruption rate and gap-filler depended on the stimulus type: rate-by-filler interactions were most evident for the open-set corpora as compared to the closed-set corpora. Keyword identification varied systematically with word position for the open-set materials, indicating nonuniform vulnerability across sentence structures. Conclusions: These results indicate that phonemic restoration is robust but material-dependent. Stimulus complexity shapes how temporal sampling and masking plausibility combine to support perceptual repair, and open-set, high-variability materials are particularly sensitive to these interactions. Full article

21 pages, 748 KB  
Systematic Review
Accuracy of Machine Learning Models in Predicting Clinical Outcomes in Bipolar Disorder: A Systematic Review
by Jing Ling Tay, Ling Zhang and Kang Sim
Brain Sci. 2026, 16(4), 415; https://doi.org/10.3390/brainsci16040415 - 15 Apr 2026
Abstract
Background/Objectives: Bipolar disorder (BD) is one of the leading causes of disability worldwide, causing significant functional impairments in those affected. The heterogeneous course of BD renders the prediction of clinical progress and outcomes challenging, but it can be potentially enhanced with the use of artificial intelligence methods. In this systematic review, we aimed to examine the extant literature regarding the predictive accuracy of clinical functioning, illness affective state, relapse, and relevant predictors amongst patients with BD, using artificial intelligence methods. Methods: The study was guided by PRISMA and the Cochrane Handbook for Systematic Reviews. Six electronic databases were systematically searched from inception for relevant studies until July 2025 and relevant data were summarised in tables. The protocol of the review was registered on Prospero, ID: CRD42024590343. Results: Forty articles were included in this review. The area under the curve (AUC) values for clinical functioning, illness affective state, and relapse prediction were 0.59–0.72 (poor to acceptable), 0.57–0.97 (poor to outstanding), and 0.45–0.98 (poor to outstanding), respectively. Supervised, tree-based algorithms performed the best. Predictive factors included sociodemographic, clinical and psychological factors and wearable data, as well as speech and video recordings. Conclusions: Existing studies showed the potential of machine learning methods in the prediction of clinical progress and outcomes of BD (specifically functional status, affective state, and relapse) based on relevant collected variables. Longitudinal studies can further clarify and validate the associated predictive factors for earlier identification of those at risk of poorer prognosis to enhance management of BD. Full article
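For readers interpreting the AUC ranges reported above: the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case (ties counted half). A minimal stdlib sketch of that rank-based definition (illustrative, not drawn from any of the reviewed studies):

```python
def auc(scores_pos, scores_neg):
    """AUC as P(score of random positive > score of random negative)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))
```

On this scale 0.5 is chance, which is why values such as 0.45 (below chance) and 0.59–0.72 are described as poor to acceptable.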

11 pages, 394 KB  
Review
Emerging Speech-in-Noise Tools for the Assessment of Hearing Loss: A Scoping Review
by Andrea Migliorelli, Marianna Manuelli, Chiara Visentin, Chiara Bianchini, Francesco Stomeo, Stefano Pelucchi, Nicola Prodi and Andrea Ciorba
Audiol. Res. 2026, 16(2), 57; https://doi.org/10.3390/audiolres16020057 - 11 Apr 2026
Abstract
Background/Objectives: The objective of this scoping review was to map and critically describe emerging speech-in-noise (SIN) assessment tools developed over the last decade for the evaluation of hearing loss beyond conventional audiological measures. Methods: This review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines. A comprehensive literature search was performed in the PubMed/MEDLINE, Scopus, and Embase databases. Studies describing novel or emerging SIN-based assessment tools were reviewed, with a particular emphasis on those including adult participants with normal hearing and hearing loss. Results: Nine studies met the inclusion criteria and were included in the review. The identified tools cover a range of methodological innovations, including advanced digits-in-noise paradigms, antiphasic and binaural presentation modes, optimized adaptive procedures, and digital or automated testing platforms. Several studies also incorporated artificial intelligence-based approaches, such as machine learning, text-to-speech, and automatic speech recognition, to enhance test development, administration, and hearing loss classification. Across all studies, SIN measures demonstrated the ability to reliably differentiate between normal hearing listeners and individuals with hearing loss and to provide complementary information beyond pure-tone audiometry. Conclusions: Emerging speech-in-noise tools show considerable potential to improve the functional assessment of hearing loss and to support more sensitive, accessible, and scalable approaches for hearing evaluation. Further research is required to assess their clinical integration and long-term impact on hearing screening and diagnostic pathways. Full article

20 pages, 489 KB  
Systematic Review
Linguistic Markers in At-Risk Mental States Using Natural Language Processing: A Systematic Review
by Yuhan Zhang, Alba Carrió, Julia Sevilla-Llewellyn-Jones, Enrique Gutiérrez, Ana Calvo, Jose-Blas Navarro and Ana Barajas
Healthcare 2026, 14(8), 999; https://doi.org/10.3390/healthcare14080999 - 10 Apr 2026
Abstract
Background/Objectives: In recent years, research on psychosis has increasingly focused on prevention, aiming to implement early interventions that mitigate or reduce its impact. Within this framework, the analysis of linguistic markers in individuals with at-risk mental states (ARMS) has proven valuable for identifying those at risk and predicting psychosis onset. Artificial intelligence tools, particularly natural language processing (NLP), have emerged as effective resources for detecting these language-based indicators. This study aims to synthesize the existing scientific evidence on linguistic markers analyzed through NLP techniques in individuals with ARMS. Methods: A systematic review following the PRISMA 2020 protocol was conducted. Three databases (PubMed, PsycInfo, and Scopus) were searched for published articles from their inception to October 2025. Rayyan software was used to manage references and article downloads. Out of ninety initial search results, fifteen studies involving 1313 participants from diverse groups were included in the review. Results: The findings indicated that alterations in semantic coherence, syntactic complexity, referential cohesion, and speech/content poverty differentiated ARMS individuals from healthy controls. Several of these markers, analyzed with NLP methods, predicted the onset of psychosis with accuracy levels ranging from 79% to 100%, although these findings should be interpreted with caution due to the significant methodological heterogeneity and variability in sample sizes across the included studies. Conclusions: NLP techniques offer a powerful approach for detecting language alterations that distinguish ARMS individuals and provide meaningful predictions of psychosis onset, highlighting their potential as a complement to traditional clinical assessments for early identification and prevention. Full article

21 pages, 288 KB  
Article
In the Space Between Words: Speech–Silence Dynamics, Religio–Racial Formations, and Christian–Muslim Relationships in The Netherlands
by Deniz Aktaş
Genealogy 2026, 10(2), 43; https://doi.org/10.3390/genealogy10020043 - 10 Apr 2026
Abstract
In Western Europe, and particularly in The Netherlands, speech is rarely neutral: to talk is to participate morally and civically, while silence is frequently marked as evasive, passive, or suspect. The capacities for speech, for being heard, understood, and responsive, are widely regarded as hallmarks of autonomous, transparent, free-thinking, and sovereign subjectivity, celebrated as expressions of a shared progressive modernity. These ideals of subjectivity are routinely placed in tension within the so-called secular–religious binary framework, in which the compatibility of non-secular sensibilities or non-Christian religions, especially Islam, with such Dutch societal values is persistently and heavily problematized. Within such accounts, speech becomes a criterion Muslims in Europe are then expected to meet, not merely by speaking but by doing so in ways deemed proper and intelligible. To complicate and deepen understanding of these dynamics, this article draws on ethnographic insights from (secular) Christian–Muslim couples in The Netherlands, looking at how the dynamics of speech–silence function within intimate contexts, where they take place, where they break down, and ultimately where their limits lie. Attuned to the cacophony of multivocal gestures, whether in acts of refusal, the quiet eloquence of silence, or the directness of vocal protest, the article reveals the intricate and consequential interplay between these dynamics and the structuring and affective forms of secular and religio-racial norms in everyday life. Full article
(This article belongs to the Special Issue Secularism, Multiculturalism and Race–Religion Entanglements)
31 pages, 1954 KB  
Article
HASCom: A Heterogeneous Affective-Semantic Communication Framework for Speech Transmission
by Zhenjia Yu, Taojie Zhu, Md Arman Hossain, Zineb Zbarna and Lei Wang
Sensors 2026, 26(7), 2158; https://doi.org/10.3390/s26072158 - 31 Mar 2026
Abstract
Driven by the development of next-generation wireless networks and the widespread adoption of sensing, communication is shifting from traditional bit-level transmission to intelligent, rich interactions within our digital social system. However, existing speech semantic communication frameworks predominantly focus on textual accuracy, neglecting the critical affective information (e.g., tone and emotion) that is essential for natural human-centric interactions in the real world. To address this limitation, we propose the Heterogeneous Affective Speech Semantic Communication (HASCom) framework, designed for the robust transmission of highly expressive speech over complex wireless channels. Specifically, we design a heterogeneous dual-stream transmission architecture that decouples discrete phoneme-level linguistic content from continuous emotional embeddings. For discrete semantic information, we use reliable digital coding protected by Low-Density Parity-Check (LDPC) to guarantee strict recoverability. Conversely, for emotional features, we employ Deep Joint Source-Channel Coding (JSCC) analog transmission to prevent irreversible quantization errors and the cliff effect. Additionally, we develop a prior-guided diffusion reconstruction module at the receiving end. This module leverages a structural prior network to align the decoded semantics, which then steers the reverse diffusion process conditioned on the recovered affective features. Extensive experiments under both AWGN and Rayleigh fading channels demonstrate that HASCom significantly outperforms state-of-the-art baselines. Specifically, it achieves superior objective semantic similarity and subjective Mean Opinion Score (MOS) at low Signal-to-Noise Ratios (SNRs), while the JSCC transmission modules maintain an ultra-low inference latency of less than 0.1 ms, validating its high efficiency and robustness for practical deployments. Full article

17 pages, 6806 KB  
Article
Personalization and Generative Dialogue in Social Robotics for Eldercare: A User Study
by Luca Pozzi, Marco Nasato, Nicola Toscani, Francesco Braghin and Marta Gandolla
Appl. Sci. 2026, 16(7), 3369; https://doi.org/10.3390/app16073369 - 31 Mar 2026
Abstract
Service robots have the potential to support cognitive and social well-being in long-term care facilities, yet their widespread adoption depends on intuitive interaction modalities that minimize user learning effort and the need for a technical expert on-ground. Spoken dialogue is a natural interface, and recent advances in large language models (LLMs) promise more flexible and engaging exchanges than traditional scripted systems. In this study, we implemented a modular speech-based architecture combining automatic speech recognition, text-to-speech synthesis, and a conversational agent capable of switching between a fully scripted and LLM-driven dialogue. The implemented architecture was embodied in a TIAGo robot (PAL Robotics) and tested to compare three conversational strategies: (1) scripted, pre-defined dialogue, (2) LLM-based free-form conversation, and (3) LLM-based conversation augmented with personal information provided through the prompt. Eighteen younger adults and eighteen older adults engaged in a five-minute interaction with the robot under all three conditions in a within-subject design, and subsequently completed the Almere model questionnaire. Across all subscales and both participant groups, differences between dialogue strategies were small and statistically non-significant, despite informal comments from several older participants indicating a perceived increase in intelligence or naturalness for the LLM conditions. The findings suggest that generative dialogue and basic personalization alone do not meaningfully shift perceived acceptance in brief, task-neutral encounters, underscoring the importance of longer-term deployment and functionally meaningful robot roles in future evaluations. Full article
(This article belongs to the Special Issue Latest Advances and Prospects of Human-Robot Interaction (HRI))

13 pages, 235 KB  
Article
A Comparative Cross-Sectional Study of Prosthodontic Residents and Large Language Models on Standardized Multiple-Choice Questions
by Gül Ates and Ali Can Bulut
Appl. Sci. 2026, 16(7), 3296; https://doi.org/10.3390/app16073296 - 29 Mar 2026
Abstract
Recent advances in artificial intelligence have expanded the use of large language models (LLMs) beyond speech-based applications and increased interest in their potential roles in dental education. However, evidence regarding LLM performance in postgraduate dental education, particularly in prosthodontics, remains limited. Therefore, this study aimed to compare the accuracy of responses from prosthodontic residents and LLMs to standardized multiple-choice questions in prosthodontics and to explore the potential role of artificial intelligence in prosthodontic education. Thirty-two prosthodontic residents participated in this cross-sectional study. Participants completed a standardized 30-item multiple-choice test comprising four demographic items and 26 questions assessing basic knowledge, general dentistry, and advanced prosthodontic specialty questions. The same questions were administered to seven large language models (LLMs): ChatGPT-4o, ChatGPT-o1, ChatGPT-o3-mini, Claude Sonnet 3.7, Gemini 2.5 Pro, Microsoft Copilot (web interface, accessed in August 2025), and DeepSeek V3. Response accuracy and consistency were evaluated. Statistical analyses were performed using IBM SPSS Statistics (version 27.0), with statistical significance set at p < 0.05. A statistically significant difference was observed between prosthodontic residents and LLMs in responses to advanced-level prosthodontic specialty questions (p < 0.05), with higher correct response rates recorded for LLMs. No statistically significant differences were identified between the two groups for basic knowledge and general dentistry questions (p > 0.05). In addition, no significant association was found between the duration of prosthodontic residency training and residents’ response accuracy (p > 0.05). LLMs achieved high scores on this structured MCQ-based assessment, particularly in advanced theoretical prosthodontic items. However, these findings should be interpreted with caution within the limits of a written examination format and do not represent overall clinical competence or real-world patient care performance. Accordingly, artificial intelligence may be considered a supportive educational tool in postgraduate prosthodontic education rather than a replacement for clinical training. Full article
22 pages, 1060 KB  
Systematic Review
Artificial Intelligence in EFL Speaking Instruction: A Systematic Review of Pedagogical Design, Affective Conditions and Instructional Input
by Sareen Kaur Bhar
Encyclopedia 2026, 6(4), 74; https://doi.org/10.3390/encyclopedia6040074 - 27 Mar 2026
Abstract
Speaking proficiency remains one of the most challenging skills for learners of English as a Foreign Language (EFL), particularly in contexts where sustained spoken interaction is limited. This systematic review synthesises 36 empirical studies (2015–2025) identified through a PRISMA-guided Scopus search to examine how artificial intelligence (AI)-mediated instruction supports EFL speaking development. The included studies were analysed according to AI modality, pedagogical integration, instructional input characteristics, and linguistic and affective outcomes. Findings indicate that AI tools—such as chatbots, automatic speech recognition systems, and large language models—consistently support affective outcomes, including reduced speaking anxiety and increased willingness to communicate. Improvements in fluency, pronunciation, and accuracy were frequently reported, particularly when AI tools were embedded within task-based and pedagogically structured instructional designs. However, evidence for sustained development of higher-order communicative competence was more variable. The review proposes a mediated input framework conceptualising AI as a design-sensitive instructional resource rather than an autonomous teaching agent. Full article
(This article belongs to the Section Arts & Humanities)

18 pages, 7435 KB  
Article
A Comparative Analysis of Deep-Learning-Based Speech Enhancement Models: Assessing Biometric Speaker Verification in Real-World Noisy Environments
by Md Jahangir Alam Khondkar, Ajan Ahmed, Stephanie Schuckers and Masudul H. Imtiaz
Big Data Cogn. Comput. 2026, 10(3), 98; https://doi.org/10.3390/bdcc10030098 - 23 Mar 2026
Abstract
Speech enhancement through denoising is essential for maintaining signal intelligibility and quality in biometric speaker verification pipelines that operate in acoustically adverse conditions. Despite the proliferation of deep learning (DL) architectures for speech denoising, simultaneously optimizing noise attenuation, perceptual fidelity, and speaker-identity preservation remains an open problem. We address this gap by benchmarking three architecturally distinct DL-based enhancement models—Wave-U-Net, CMGAN, and U-Net—on three independent, domain-diverse corpora (SpEAR, VPQAD, and Clarkson) that the models never encountered during training and by introducing commercial-grade VeriSpeak speaker-verification scores as a biometric evaluation dimension absent from prior comparative studies. Our experiments reveal a clear three-way trade-off: U-Net achieves the highest signal-to-noise ratio (SNR) gains (+61.44% on SpEAR, +67.05% on VPQAD, +235.3% on Clarkson) but sacrifices naturalness; CMGAN yields the best perceptual evaluation of speech quality (PESQ) values (3.33, 1.35, and 2.50, respectively), favoring listening-comfort applications; and Wave-U-Net delivers the strongest biometric fidelity (VeriSpeak improvements of +11.63%, +30.22%, and +29.24%) while offering competitive perceptual quality. These results highlight that model selection must be driven by the target deployment scenario and provide actionable guidance for improving biometric verification robustness under real-world noise. Full article
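The SNR figures in the abstract above rest on the standard decibel definition, SNR = 10·log10(P_signal / P_noise). A minimal stdlib sketch (function names are my own; the percentage-gain helper reflects one plausible reading of gains such as "+61.44%", namely relative change in SNR, which the abstract does not spell out):

```python
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from mean sample power."""
    p_sig = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(p_sig / p_noise)

def snr_gain_percent(snr_before_db, snr_after_db):
    """Relative SNR improvement, e.g. 10 dB -> 20 dB is a +100% gain."""
    return (snr_after_db - snr_before_db) / snr_before_db * 100
```

Because the scale is logarithmic, large percentage gains (e.g. +235.3% on Clarkson) can correspond to starting points near 0 dB, which is worth bearing in mind when comparing models across corpora.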

24 pages, 19961 KB  
Article
Spatial Distribution and Influencing Factors of Speech Intelligibility in Round-Table Conversation Scenarios
by Lingling Liu, Linda Liang, Kangying Huang, Miao Ren and Yang Song
Buildings 2026, 16(6), 1258; https://doi.org/10.3390/buildings16061258 - 23 Mar 2026
Abstract
Round-table conversations, as common social environments, greatly depend on effective verbal communication to enrich the interactive experience. However, considerable variations in speech intelligibility (SI) occur among listeners at different positions under negative factors. This study employed numerical simulations, in situ measurements, and subjective listening tests to evaluate the main factors affecting SI, and quantified SI using the Speech Transmission Index (STI) and Speech Reception Threshold (SRT). The results demonstrate that SI varies with listener position, with the extent of these variations surpassing expectations. The listeners closer to the speaker have a significantly greater SI than those across the table, with STI variations reaching 0.55 in the free field and 0.23 (SRT variations up to 3.1 dB) in the actual room. Both speaker orientation and listener head orientation greatly influence SI distribution and its positional sensitivity. Furthermore, the overall STI among listeners decreases by no more than 0.2 for each increase in table diameter. Overall, the trend of the change in SI in the actual room is essentially consistent with those in the free field, but reflections improve SI for listeners in less favorable positions. These findings reveal SI distribution patterns in round-table scenarios, providing evidence and insights for future research. Full article
(This article belongs to the Section Building Energy, Physics, Environment, and Systems)

33 pages, 1935 KB  
Article
Smart Industrial Safety in High-Noise Environments Using IoT and AI
by Alessia Bramanti, Luca Catarinucci, Mattia Cotardo, Rosaria Del Sorbo, Claudia Giliberti, Mazhar Jan, Luca Landi, Raffaele Mariconte, Teodoro Montanaro, Federico Paolucci, Luigi Patrono, Davide Rollo, Francesco Antonio Salzano and Ilaria Sergi
Electronics 2026, 15(6), 1311; https://doi.org/10.3390/electronics15061311 - 20 Mar 2026
Abstract
High noise levels in industrial workplaces pose significant challenges to occupational safety, particularly regarding hearing protection and effective communication. Traditional hearing protection devices, while effectively attenuating harmful noise, often compromise situational awareness by excessively isolating workers from the acoustic environment and preventing the perception of critical auditory cues (e.g., emergency alarms), thereby introducing additional safety risks. This paper presents a smart industrial safety system that integrates the Internet of Things (IoT) and artificial intelligence (AI) and is based on intelligent hearing protection devices to (a) selectively attenuate hazardous industrial noise, (b) preserve human speech, and (c) reproduce targeted audio notifications to workers near malfunctioning or hazardous machinery. A real-time voice activity detection (VAD) model is employed to distinguish vocal components from background noise and adaptively control digital signal processing filters. Furthermore, indoor localization enables the delivery of targeted audio messages to workers in proximity to relevant events. Experimental evaluations on embedded hardware demonstrate that the selected VAD model operates well within real-time constraints and effectively supports dynamic noise filtering. Objective evaluation of the filtering stage using Mean Opinion Score (MOS), signal-to-noise ratio (SNR), and Harmonics-to-Noise Ratio (HNR) shows consistent quality improvements across all tested conditions, with MOS gains up to +118%, SNR increases between +10.4 and +29.0 dB, and HNR improvements up to +6.22 dB, indicating enhanced speech intelligibility and preservation of voice harmonic structure even under high-noise scenarios. Robustness validation of the VAD module across varying acoustic conditions confirms reliable speech detection performance, achieving perfect classification at +10 dB SNR, very high accuracy at 0 dB (98.3%, ROC AUC 0.998), and stable operation even at −7 dB SNR (79.8% accuracy, ROC AUC 0.878). The proposed architecture achieves a balanced trade-off between hearing protection and speech intelligibility while enhancing the effectiveness of safety communications in noisy industrial environments. Full article
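The SNR improvements reported above (+10.4 to +29.0 dB) follow from the standard definition of SNR as the ratio of signal power to noise power on a decibel scale. A minimal sketch of that arithmetic, with a toy tone-plus-noise signal (helper name and test data are ours, not from the paper):

```python
import math
import random

def snr_db(signal, noise):
    """Signal-to-noise ratio in dB, from separate signal and noise estimates:
    SNR = 10 * log10(P_signal / P_noise), with P the mean squared amplitude."""
    p_sig = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_sig / p_noise)

# Toy example: a 220 Hz 'speech' tone in Gaussian noise, before and
# after a crude filter that attenuates the noise component by 20 dB.
random.seed(0)
speech = [math.sin(2 * math.pi * 220 * n / 16000) for n in range(16000)]
noise = [random.gauss(0.0, 0.5) for _ in range(16000)]

before = snr_db(speech, noise)
after = snr_db(speech, [0.1 * x for x in noise])  # amplitude x0.1 = -20 dB noise
print(round(after - before, 1))  # → 20.0
```

Scaling the noise amplitude by a factor of 0.1 cuts its power by a factor of 100, i.e. exactly 20 dB, regardless of the signal, which is why the improvement prints as 20.0 independent of the random seed.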

15 pages, 416 KB  
Review
Artificial Intelligence for the Early Detection of Patients with Cognitive Impairment: A Scoping Review
by María Moreno-Pineda, Víctor Ortiz-Mallasén and Águeda Cervera-Gasch
Healthcare 2026, 14(6), 768; https://doi.org/10.3390/healthcare14060768 - 18 Mar 2026
Abstract
Background/Objectives: Cognitive impairment affects multiple brain functions, and its early detection is essential to prevent progression to dementia; artificial intelligence has shown considerable potential in this field. This scoping review aims to map the impact of artificial intelligence–based tools for the early detection of cognitive impairment by identifying the main technologies used, examining their effectiveness, and exploring their ethical implications. Methods: A scoping review was conducted between April and May 2025 following the PRISMA-ScR methodological framework; the review protocol was previously registered on the Open Science Framework. PubMed, Scopus, and Cochrane databases were searched using natural language and controlled vocabulary terms via Medical Subject Headings. The search was limited to articles published between 2020 and 2025, in English or Spanish, with free full-text access. Methodological quality was assessed using CASPe, JBI, and MMAT. Results: A total of 14 studies were included after the selection and critical appraisal process. The findings show that artificial intelligence–based tools such as deep-learning models applied to neuroimaging, speech and gait analysis, electronic health record analysis, and mobile health applications demonstrate promising accuracy in detecting early cognitive changes. These technologies enable the identification of subtle patterns that may be difficult to detect using conventional clinical assessments. Conclusions: AI-based tools can provide substantial support for clinical decision-making by effectively identifying subtle changes that are imperceptible to human observers. However, their use also raises ethical issues related to patient privacy and data security. Full article
