MDPI - Publisher of Open Access Journals

23 pages, 3741 KiB

Open Access Article

Multi-Corpus Benchmarking of CNN and LSTM Models for Speaker Gender and Age Profiling

by Jorge Jorrin-Coz, Mariko Nakano, Hector Perez-Meana and Leobardo Hernandez-Gonzalez

Computation 2025, 13(8), 177; https://doi.org/10.3390/computation13080177 - 23 Jul 2025

Viewed by 201

Speaker profiling systems are often evaluated on a single corpus, which complicates reliable comparison. We present a fully reproducible evaluation pipeline that trains Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) models independently on three speech corpora representing distinct recording conditions: studio-quality TIMIT, crowdsourced Mozilla Common Voice, and in-the-wild VoxCeleb1. All models share the same architecture, optimizer, and data preprocessing; no corpus-specific hyperparameter tuning is applied. We describe a detailed preprocessing and feature extraction procedure, evaluate multiple configurations, and validate their applicability and effectiveness in improving the results. A feature analysis shows that Mel spectrograms benefit CNNs, whereas Mel Frequency Cepstral Coefficients (MFCCs) suit LSTMs, and that the optimal Mel-bin count grows with the corpus signal-to-noise ratio (SNR). With this fixed recipe, EfficientNet achieves 99.82% gender accuracy on Common Voice (+1.25 pp over the previous best) and 98.86% on VoxCeleb1 (+0.57 pp). MobileNet attains 99.86% age-group accuracy on Common Voice (+2.86 pp) and a 5.35-year mean absolute error (MAE) for age estimation on TIMIT using a lightweight configuration. The consistent, near-state-of-the-art results across three acoustically diverse datasets substantiate the robustness and versatility of the proposed pipeline. Code and pre-trained weights are released to facilitate downstream research.
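
The abstract ties feature quality to the number of Mel bins. As a rough illustration of what that parameter controls, the sketch below spaces filter centre frequencies evenly on the Mel scale. It assumes the common HTK-style Hz-to-Mel formula; the function names, the 8 kHz upper limit, and the bin counts are illustrative assumptions and are not taken from the paper.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style Hz-to-Mel conversion (an assumed variant of the Mel scale)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_mels: int, f_min: float, f_max: float) -> list[float]:
    """Centre frequencies (Hz) of n_mels filters spaced evenly on the Mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)  # n_mels centres lie strictly between the two edges
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_mels)]


if __name__ == "__main__":
    # More bins give finer frequency resolution over the same range.
    for n in (40, 128):  # hypothetical bin counts
        centers = mel_center_frequencies(n, 0.0, 8000.0)
        print(n, round(centers[0], 1), round(centers[-1], 1))
```

Raising `n_mels` narrows each filter, which may help only when the recording is clean enough for the extra detail to be signal rather than noise.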

(This article belongs to the Section Computational Engineering)

27 pages, 1817 KiB

Open Access Article

A Large Language Model-Based Approach for Multilingual Hate Speech Detection on Social Media

by Muhammad Usman, Muhammad Ahmad, Grigori Sidorov, Irina Gelbukh and Rolando Quintero Tellez

Computers 2025, 14(7), 279; https://doi.org/10.3390/computers14070279 - 15 Jul 2025

Viewed by 571

Abstract

The proliferation of hate speech on social media platforms poses significant threats to digital safety, social cohesion, and freedom of expression. Detecting such content, especially across diverse languages, remains a challenging task due to linguistic complexity, cultural context, and resource limitations. To address these challenges, this study introduces a comprehensive approach to multilingual hate speech detection and makes several key contributions. First, we created a novel trilingual hate speech dataset consisting of 10,193 manually annotated tweets in English, Spanish, and Urdu. Second, we applied two innovative techniques, joint multilingual and translation-based approaches, for cross-lingual hate speech detection that have not been previously explored for these languages. Third, we developed detailed hate speech annotation guidelines tailored specifically to all three languages to ensure consistent and high-quality labeling. Finally, we conducted 41 experiments employing machine learning models with TF–IDF features, deep learning models utilizing FastText and GloVe embeddings, and transformer-based models leveraging advanced contextual embeddings to comprehensively evaluate our approach. Additionally, we employed a large language model with advanced contextual embeddings to identify the best solution for the hate speech detection task. The experimental results showed that our GPT-3.5-turbo model significantly outperforms strong baselines, achieving up to an 8% improvement over XLM-R in Urdu hate speech detection and an average gain of 4% across all three languages. This research not only contributes a high-quality multilingual dataset but also offers a scalable and inclusive framework for hate speech detection in underrepresented languages.
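
The classical baselines mentioned above rely on TF–IDF features. The sketch below shows one way such weights can be computed from raw text. It assumes whitespace tokenization and a smoothed inverse document frequency; the paper's actual tokenizer, weighting variant, and classifier are not given in the abstract, and the function name and sample texts are placeholders.

```python
import math
from collections import Counter


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} mapping per document.

    Assumptions: lowercase whitespace tokens, relative term frequency,
    and the smoothed IDF log((1 + N) / (1 + df)) + 1.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return weights


if __name__ == "__main__":
    sample = ["good day", "bad day"]  # placeholder texts, not from the dataset
    for row in tfidf(sample):
        print(row)
```

Terms that occur in every document ("day" here) receive the lowest weight, which is why TF–IDF tends to emphasise the words that distinguish one class of text from another.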

(This article belongs to the Special Issue Recent Advances in Social Networks and Social Media)

16 pages, 2365 KiB

Open Access Article

Fast Inference End-to-End Speech Synthesis with Style Diffusion

by Hui Sun, Jiye Song and Yi Jiang

Electronics 2025, 14(14), 2829; https://doi.org/10.3390/electronics14142829 - 15 Jul 2025

Viewed by 382

Abstract

In recent years, deep learning-based end-to-end Text-To-Speech (TTS) models have made significant progress in enhancing speech naturalness and fluency. However, existing Variational Inference Text-to-Speech (VITS) models still face challenges such as insufficient pitch modeling, inadequate contextual dependency capture, and low inference efficiency in the decoder. To address these issues, this paper proposes an improved TTS framework named Q-VITS. Q-VITS incorporates Rotary Position Embedding (RoPE) into the text encoder to enhance long-sequence modeling, adopts a frame-level prior modeling strategy to optimize one-to-many mappings, and designs a style extractor based on a diffusion model for controllable style rendering. Additionally, the proposed decoder, ConfoGAN, integrates explicit F0 modeling, Pseudo-Quadrature Mirror Filter (PQMF) multi-band synthesis, and a Conformer structure. The experimental results demonstrate that Q-VITS outperforms VITS in terms of speech quality, pitch accuracy, and inference efficiency in both subjective Mean Opinion Score (MOS) and objective Mel-Cepstral Distortion (MCD) and Root Mean Square Error (RMSE) evaluations on a single-speaker dataset, achieving performance close to ground-truth audio. These improvements provide an effective solution for efficient and controllable speech synthesis.
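
Rotary Position Embedding is named above as the mechanism for long-sequence modeling. The sketch below illustrates the core idea on a single vector: each consecutive pair of dimensions is rotated by an angle proportional to the token position. It assumes the interleaved-pair layout and a base of 10000; how Q-VITS wires this into its text encoder is not described in the abstract, and the function names are hypothetical.

```python
import math


def rope(x: list[float], position: int, base: float = 10000.0) -> list[float]:
    """Rotate consecutive pairs (x[0], x[1]), (x[2], x[3]), ... by position-dependent angles."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even number of dimensions")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # pair k = i / 2 uses frequency base^(-2k/d)
        c, s = math.cos(theta), math.sin(theta)
        out.append(x[i] * c - x[i + 1] * s)
        out.append(x[i] * s + x[i + 1] * c)
    return out


def dot(a: list[float], b: list[float]) -> float:
    return sum(u * v for u, v in zip(a, b))


if __name__ == "__main__":
    q = [0.5, -1.0, 2.0, 0.25]
    k = [1.5, 0.3, -0.7, 1.0]
    # Both pairs of positions are 2 apart, so the two scores agree.
    print(dot(rope(q, 3), rope(k, 1)), dot(rope(q, 7), rope(k, 5)))
```

Because the attention score between a rotated query and a rotated key depends only on the distance between their positions, the encoding carries relative position information without a learned position table.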

(This article belongs to the Special Issue Deep Learning in Image Processing and Pattern Recognition, 2nd Edition)

15 pages, 2125 KiB

Open Access Article

Psychometric Properties of a 17-Item German Language Short Form of the Speech, Spatial, and Qualities of Hearing Scale and Their Correlation to Audiometry in 97 Individuals with Unilateral Menière’s Disease from a Prospective Multicenter Registry

by Jennifer L. Spiegel, Bernhard Lehnert, Laura Schuller, Irina Adler, Tobias Rader, Tina Brzoska, Bernhard G. Weiss, Martin Canis, Chia-Jung Busch and Friedrich Ihler

J. Clin. Med. 2025, 14(14), 4953; https://doi.org/10.3390/jcm14144953 - 13 Jul 2025

Viewed by 340

Abstract

Background/Objectives: Menière’s disease (MD) is a debilitating disorder with episodic and variable ear symptoms. Diagnosis can be challenging, and evidence for therapeutic approaches is low. Furthermore, patients show a unique and fluctuating configuration of audiovestibular impairment. As a psychometric instrument to assess hearing-specific disability is currently lacking, we evaluated a short form of the Speech, Spatial, and Qualities of Hearing Scale (SSQ) in a cohort of patients with MD. Methods: Data were collected in the context of a multicenter prospective patient registry intended for the long-term follow-up of MD patients. Hearing was assessed by pure tone and speech audiometry. The SSQ was applied in the German language version with 17 items. Results: In total, 97 consecutive patients with unilateral MD with a mean age of 56.2 ± 5.0 years were included. A total of 55 individuals (57.3%) were female, and 72 (75.0%) were categorized as having definite MD. The average total score of the SSQ was 6.0 ± 2.1. Cronbach’s alpha for internal consistency was 0.960 for the total score. We did not observe undue floor or ceiling effects. SSQ values showed a statistically significant negative correlation with hearing thresholds and a statistically significant positive correlation with speech recognition scores of affected ears. Conclusions: The short form of the SSQ provides insight into hearing-specific disability in patients with MD. Therefore, it may be informative regarding disease stage and rehabilitation needs.
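
Internal consistency is reported above as Cronbach's alpha. For readers unfamiliar with the statistic, the sketch below computes it from a respondents-by-items score table using the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the toy data are illustrative only and are unrelated to the registry data.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a table with one row per respondent and one column per item."""
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


if __name__ == "__main__":
    toy = [  # made-up scores for three items
        [7, 6, 7],
        [4, 5, 4],
        [9, 8, 9],
        [2, 3, 3],
    ]
    print(round(cronbach_alpha(toy), 3))
```

Alpha approaches 1 when items rise and fall together across respondents, so a value near 0.96 suggests that the items measure closely related aspects of the same construct.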

(This article belongs to the Special Issue Clinical Diagnosis and Management of Vestibular Disorders)

33 pages, 519 KiB

Open Access Systematic Review

Impact of Oncological Treatment on Quality of Life in Patients with Head and Neck Malignancies: A Systematic Literature Review (2020–2025)

by Raluca Grigore, Paula Luiza Bejenaru, Gloria Simona Berteșteanu, Ruxandra Ioana Nedelcu-Stancalie, Teodora Elena Schipor-Diaconu, Simona Andreea Rujan, Bianca Petra Taher, Șerban Vifor Gabriel Berteșteanu, Bogdan Popescu, Irina Doinița Popescu, Alexandru Nicolaescu, Anca Ionela Cîrstea and Catrinel Beatrice Simion-Antonie

Curr. Oncol. 2025, 32(7), 379; https://doi.org/10.3390/curroncol32070379 - 30 Jun 2025

Viewed by 364

Abstract

Background: Quality of life (QoL) is a critical indicator in assessing the success of oncological treatments for head and neck malignancies, reflecting their impact on physiological functions and psychosocial well-being beyond mere survival. Treatments (surgery, radiotherapy, chemotherapy) pose multiple functional and emotional challenges, and recent advancements underscore the necessity of evaluating post-treatment QoL. Objective: This literature review investigates the impact of oncological treatment on the QoL of patients with head and neck cancers (oral, oropharyngeal, hypopharyngeal, laryngeal) and identifies factors influencing their QoL index. Methodology: Using a PICO framework, studies from PubMed Central were analyzed, selected based on inclusion (English publications, full text, PROM results) and exclusion criteria. The last search was conducted on 6 April 2025. From 231 identified studies, 49 were included after applying filters (MeSH: “Quality of Life,” “laryngeal cancer,” “oral cavity cancer,” etc.). Data were organized in Excel, and the methodology adhered to PRISMA standards. Results: Treatment Impact: Oncological treatments significantly affect QoL, with acute post-treatment declines in functions such as speech, swallowing, and emotional well-being (anxiety, depression). Partial recovery depends on rehabilitative interventions. Influencing Factors: Treatment type, disease stage, and socioeconomic and demographic contexts influence QoL. De-escalated treatments and prompt rehabilitation improve recovery, while complications like trismus, dysphagia, or persistent hearing issues reduce long-term QoL. Assessment Tools: Standardized PROM questionnaires (EORTC QLQ-C30, QLQ-H&N35, MDADI, HADS) highlighted QoL variations. Studies from Europe, North America, and Asia indicate regional differences in outcomes. Limitations: Retrospective designs, small sample sizes, and PROM variability limit generalizability. Multicentric studies with extended follow-up are recommended. Conclusions: Oncological treatments for head and neck malignancies have a complex impact on QoL, necessitating personalized and multidisciplinary strategies. De-escalated therapies, early rehabilitation, and continuous monitoring are essential for optimizing functional and psychosocial outcomes. Methodological gaps highlight the need for standardized research.

(This article belongs to the Section Head and Neck Oncology)

25 pages, 2093 KiB

Open Access Article

Deep Learning-Based Speech Enhancement for Robust Sound Classification in Security Systems

by Samuel Yaw Mensah, Tao Zhang, Nahid AI Mahmud and Yanzhang Geng

Electronics 2025, 14(13), 2643; https://doi.org/10.3390/electronics14132643 - 30 Jun 2025

Viewed by 661

Abstract

Deep learning has emerged as a powerful technique for speech enhancement, particularly in security systems where audio signals are often degraded by non-stationary noise. Traditional signal processing methods struggle in such conditions, making it difficult to detect critical sounds like gunshots, alarms, and unauthorized speech. This study investigates a hybrid deep learning framework that combines Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GANs) to enhance speech quality and improve sound classification accuracy in noisy security environments. The proposed model is trained and validated using real-world datasets containing diverse noise distortions, including VoxCeleb for benchmarking speech enhancement and UrbanSound8K and ESC-50 for sound classification. Performance is evaluated using industry-standard metrics such as Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), and Signal-to-Noise Ratio (SNR). The architecture includes multi-layered neural networks, residual connections, and dropout regularization to ensure robustness and generalizability. Additionally, the paper addresses key challenges in deploying deep learning models for security applications, such as computational complexity, latency, and vulnerability to adversarial attacks. Experimental results demonstrate that the proposed DNN + GAN-based approach significantly improves speech intelligibility and classification performance in high-interference scenarios, offering a scalable solution for enhancing the reliability of audio-based security systems.
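
Of the metrics listed above, SNR is the simplest to state precisely: it is the ratio of signal power to noise power, expressed in decibels. The sketch below computes it for a clean reference and a degraded copy of the same length, taking the difference between the two as the noise. The function name and the sample values are illustrative assumptions; PESQ and STOI require dedicated implementations and are not sketched here.

```python
import math


def snr_db(clean: list[float], degraded: list[float]) -> float:
    """Signal-to-noise ratio in dB, treating (degraded - clean) as the noise."""
    if not clean or len(clean) != len(degraded):
        raise ValueError("signals must be non-empty and of equal length")
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum((d - s) ** 2 for s, d in zip(clean, degraded)) / len(clean)
    if noise_power == 0.0:
        return math.inf  # identical signals: no noise at all
    return 10.0 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    clean = [1.0, -1.0, 1.0, -1.0]
    degraded = [1.1, -1.1, 1.1, -1.1]  # made-up samples
    print(round(snr_db(clean, degraded), 2))
```

An enhancement model is usually judged by how much this value rises between its input and its output, alongside perceptual scores.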

24 pages, 4466 KiB

Open Access Article

Natural Interaction in Virtual Heritage: Enhancing User Experience with Large Language Models

by Isabel Sánchez-Berriel, Fernando Pérez-Nava and Lucas Pérez-Rosario

Electronics 2025, 14(12), 2478; https://doi.org/10.3390/electronics14122478 - 18 Jun 2025

Viewed by 375

Abstract

In recent years, Virtual Reality (VR) has emerged as a powerful tool for disseminating Cultural Heritage (CH), often incorporating Virtual Humans (VHs) to guide users through historical recreations. The advent of Large Language Models (LLMs) now enables natural, unscripted communication with these VHs, even on limited devices. This paper details a natural interaction system for VHs within a VR application of San Cristóbal de La Laguna, a UNESCO World Heritage Site. Our system integrates Speech-to-Text, LLM-based dialogue generation, and Text-to-Speech synthesis. Adhering to user-centered design (UCD) principles, we conducted two studies: a preliminary study revealing user interest in historically adapted language, and a qualitative test that identified key user experience improvements, such as incorporating feedback mechanisms and gender selection for VHs. The project successfully developed a prioritized user experience, focusing on usability evaluation, immersion, and dialogue quality. We propose a generalist methodology and recommendations for integrating unscripted VH dialogue in VR. However, limitations include dialogue generation latency and reduced quality in non-English languages. While a formative usability test evaluated the process, the small sample size restricts broad generalizations about user behavior.

(This article belongs to the Special Issue Recent Advances in Virtual Reality and Computer Vision Based on Deep Learning)

14 pages, 345 KiB

Open Access Article

Construct Validity and Internal Consistency of the Italian Version of the PedsQL^TM 4.0 Generic Core Scale and PedsQL^TM 3.0 Cerebral Palsy Module

by Ilaria Pedrinelli, Sofia Biagi, Domenico Marco Romeo, Elisa Musto, Valeria Fagiani, Martina Lanza, Erika Guastafierro, Alice Colombo, Andrea Giordano, Cristina Montomoli, Cristiana Rezzani, Tiziana Casalino, Eugenio Mercuri, Daria Riva, Matilde Leonardi, Giovanni Baranello and Emanuela Pagliano

Children 2025, 12(6), 749; https://doi.org/10.3390/children12060749 - 9 Jun 2025

Viewed by 367

Abstract

Background: Health-related quality of life (HRQoL) has emerged as a meaningful outcome measure in clinical trials and healthcare interventions in children with cerebral palsy (CwCP). We assessed the construct validity and internal consistency of the Italian version of the Paediatric QoL inventory (PedsQL^TM) 4.0 Generic Core Scales (GCS) and PedsQL^TM 3.0 Cerebral Palsy Module (CPM). Methods: A total of 125 CwCP and their parents were enrolled. Participants completed both the GCS and the CPM modules, and the results were compared to those of a sample of 121 healthy peers and their parents. The dimensionality of the two modules was assessed through exploratory factor analysis. Construct validity was assessed by a known-groups method evaluating the differences between CwCP and the healthy sample. Results: Only a few GCS subscales were unidimensional, while all CPM subscales proved to be unidimensional, except for the Speech and Communication subscales of child self-reports. GCS internal consistency was good for all subscales of the parent proxy-reports, as well as for the Physical Activities and Psychosocial Health subscales of child self-reports. CPM internal consistency was good for both parent proxy-reports and, with a few exceptions, child self-reports. As for the PedsQL^TM validity, the GCS proved effective in discriminating between CwCP and healthy participants; the CPM showed a significant association between lower neurofunctional abilities and lower HRQoL. Parent–child concordance shows that child self-report scores were always higher than those of the proxy-reports for both the GCS and CPM modules. Conclusions: The present study confirms the internal consistency and construct validity of the Italian version of both PedsQL^TM modules. In CwCP, greater functional disability resulted in lower HRQoL scores, and there was significant discrepancy between the parent and child ratings.

(This article belongs to the Special Issue Children with Cerebral Palsy and Other Developmental Disabilities)

14 pages, 2086 KiB

Open Access Protocol

Orofacial Myofunctional Therapy: Investigating a Novel Therapeutic Approach for Pediatric Obstructive Sleep Apnea in Children with and Without Down Syndrome—A Study Protocol

by Jolien Verbeke, Iris Meerschman, Karlien Dhondt, Els De Leenheer, Julie Willekens, Kristiane Van Lierde and Sofie Claeys

Children 2025, 12(6), 737; https://doi.org/10.3390/children12060737 - 6 Jun 2025

Viewed by 1638

Abstract

Background/Objectives: Pediatric obstructive sleep apnea (OSA) is a prevalent medical condition, affecting 1–5% of non-syndromic children and 30–90% of children with Down syndrome. Given the severity of the condition and the associated health risks, early and effective treatment is crucial. However, current treatment modalities are often invasive or suffer from poor patient adherence. Additionally, adenotonsillectomy, the first-line treatment in pediatric OSA, seems not to be effective in every child, leaving children with residual OSA postoperatively. These challenges are particularly pronounced in high-risk populations, such as children with Down syndrome, highlighting the need for alternative therapeutic strategies. Therefore, a protocol is presented to evaluate the effectiveness of orofacial myofunctional therapy (OMT) as a treatment for OSA in two pediatric populations: (1) Non-syndromic children aged 4–18 years: 10 weeks of OMT. (2) Children with Down syndrome aged 4–18 years: 20 weeks of OMT. Effects of the OMT program will be evaluated on: sleep parameters (e.g., obstructive Apnea–Hypopnea Index (oAHI), snoring frequency); orofacial functions (e.g., breathing pattern, tongue position at rest); quality of life outcomes. Methods: A pretest–posttest design will be used to evaluate the effectiveness of OMT in children with OSA, both with and without Down syndrome. Both objective measures and patient-reported outcomes are being collected. Results: OMT is expected to improve orofacial functions, reduce OSA severity and symptoms, and enhance quality of life in both non-syndromic and syndromic children. Conclusions: This multidisciplinary research protocol, involving collaboration between ENT specialists and speech-language pathologists, aims to provide a comprehensive understanding of the potential benefits of OMT in treating OSA.

(This article belongs to the Special Issue Current Advances in Paediatric Sleep Medicine)

12 pages, 494 KiB

Open Access Article

Design of a Dual-Path Speech Enhancement Model

by Seorim Hwang, Sung Wook Park and Youngcheol Park

Appl. Sci. 2025, 15(11), 6358; https://doi.org/10.3390/app15116358 - 5 Jun 2025

Viewed by 512

Abstract

Although both noise suppression and speech restoration are fundamental to speech enhancement, many deep neural network (DNN)-based approaches tend to focus disproportionately on one, often overlooking the importance of their joint handling. In this study, we propose a dual-path architecture designed to balance noise suppression and speech restoration. The main path consists of an encoder and two specialized decoders: one dedicated to estimating the clean speech spectrum and the other to predicting a noise suppression mask. To reinforce the joint modeling of noise suppression and speech restoration, we introduce an auxiliary refinement path. This path consists of a separate encoder–decoder structure and is designed to further refine the enhanced speech by incorporating complementary information, learned independently from the main path. By using this dual-path architecture, the model better preserves fine speech details while reducing residual noise. Experimental results on the VoiceBank + DEMAND dataset show that our model surpasses conventional methods across multiple evaluation metrics in the causal setup. Specifically, it achieves a PESQ score of 3.33, reflecting improved speech quality, and a CSIG score of 4.48, indicating enhanced intelligibility. Furthermore, it demonstrates superior noise suppression, achieving an SNRseg of 10.44 and a CBAK score of 3.75.
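
The SNRseg figure quoted above is a segmental SNR: the SNR is computed frame by frame and then averaged, so that quiet passages count as much as loud ones. The sketch below follows that idea. It assumes non-overlapping frames and the commonly used clamp of each frame to the range from -10 dB to 35 dB; the paper's exact frame length, overlap, and limits are not stated in the abstract, and the function name is hypothetical.

```python
import math


def segmental_snr_db(clean: list[float], enhanced: list[float], frame_len: int = 256,
                     lo: float = -10.0, hi: float = 35.0, eps: float = 1e-10) -> float:
    """Mean of per-frame SNRs (dB), each clamped to [lo, hi]."""
    n = min(len(clean), len(enhanced))
    frame_snrs = []
    for start in range(0, n - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        est = enhanced[start:start + frame_len]
        signal_energy = sum(v * v for v in ref)
        error_energy = sum((a - b) ** 2 for a, b in zip(ref, est))
        # eps keeps the ratio finite for silent or perfectly restored frames.
        snr = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        frame_snrs.append(min(max(snr, lo), hi))
    if not frame_snrs:
        raise ValueError("signals are shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)


if __name__ == "__main__":
    clean = [1.0] * 8
    enhanced = [0.9] * 8  # made-up output with a constant error
    print(round(segmental_snr_db(clean, enhanced, frame_len=4), 2))
```

The clamp prevents a few silent or near-perfect frames from dominating the average.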

(This article belongs to the Special Issue Application of Deep Learning in Speech Enhancement Technology)

15 pages, 885 KiB

Open Access Article

Adapting the Bogenhausen Dysarthria Scales (BoDyS) to Chilean Spanish Speakers: Face and Content Validation

by Marcela Sanhueza-Garrido, Virginia García-Flores, Sebastián Contreras-Cubillos and Jaime Crisosto-Alarcón

Brain Sci. 2025, 15(6), 604; https://doi.org/10.3390/brainsci15060604 - 4 Jun 2025

Viewed by 703

Abstract

Background: Dysarthria is a neuromotor speech disorder that significantly impacts patients’ quality of life. In Chile, there is a lack of culturally validated instruments for assessing dysarthria. This study aimed to cross-culturally adapt the Bogenhausen Dysarthria Scales (BoDyS) into Chilean Spanish and to conduct face and content validation. Methods: The adaptation process included translation and back-translation, followed by validation by a panel of experts. Clarity, format, and length were evaluated, and the Kappa index (KI), content validity index (CVI), and content validity ratio (CVR) were calculated to confirm item relevance. A pilot test was subsequently conducted with ten speech–language pathologists to apply the adapted version to patients. Results: The adaptation process produced a consensus version that preserved the semantic and cultural characteristics of the original scale. The statistical measures (KI = 1.00; I-CVI = 1.00; S-CVI/Ave = 1.00; S-CVI/UA = 1.00; CVR = 1.00) indicated satisfactory levels of agreement. The pilot test demonstrated the scale’s appropriateness and effectiveness for assessing dysarthria within the Chilean context, although some experts recommended reducing task repetition for patients prone to fatigue. Conclusions: The Chilean version of the BoDyS (BoDyS-CL) is a valid and useful tool for evaluating dysarthria in Chile. This study provides a foundation for further research and the systematic implementation of this scale in local clinical practice.
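
The content-validity indices reported above can be derived from a simple table of expert ratings. The sketch below shows one conventional way to do so, assuming a 4-point relevance scale on which ratings of 3 or 4 count as "relevant" and reusing that same count for Lawshe's CVR, which is a simplification. The study's actual rating scale and panel size are not given in the abstract, and the function name and sample ratings are hypothetical.

```python
def content_validity(ratings: list[list[int]], relevant_min: int = 3) -> dict:
    """Compute I-CVI, S-CVI/Ave, S-CVI/UA and CVR from ratings[item][expert]."""
    n_experts = len(ratings[0])
    # Number of experts who judged each item relevant.
    agree = [sum(r >= relevant_min for r in item) for item in ratings]
    i_cvi = [a / n_experts for a in agree]
    # Lawshe's ratio: (n_e - N/2) / (N/2), ranging from -1 to 1.
    cvr = [(a - n_experts / 2) / (n_experts / 2) for a in agree]
    return {
        "i_cvi": i_cvi,
        "s_cvi_ave": sum(i_cvi) / len(i_cvi),                       # mean of item-level CVIs
        "s_cvi_ua": sum(a == n_experts for a in agree) / len(agree),  # share of items with universal agreement
        "cvr": cvr,
    }


if __name__ == "__main__":
    sample = [  # made-up ratings: two items, four experts
        [4, 4, 4, 4],
        [4, 3, 2, 1],
    ]
    print(content_validity(sample))
```

When every expert rates every item as relevant, all four indices equal 1.00, which matches the pattern of values reported in the abstract.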

(This article belongs to the Special Issue Recent Advances in Assessment and Rehabilitation of Individuals with Communication and Language Disorders)

25 pages, 3131 KiB

Open Access Article

Evaluating the Clinical- and Cost-Effectiveness of Cochlear Implant Sound Processor Upgrades in Older Adults: Outcomes from a Large Australian Multicenter Study

by Paola Vittoria Incerti, Jermy Pang, Jason Gavrilis, Vicky W. Zhang, Jessica Tsiolkas, Rajan Sharma, Elizabeth Seil, Antonio Ahumada-Canale, Bonny Parkinson and Padraig Thomas Kitterick

J. Clin. Med. 2025, 14(11), 3765; https://doi.org/10.3390/jcm14113765 - 28 May 2025

Viewed by 956

Abstract

Background: Many older Australian adults with cochlear implants (CI) lack funding for replacement sound processors, risking complete device failure and reduced quality of life. The need for replacement CI devices for individuals with obsolete sound processors and no access to funding poses an increasing public health challenge in Australia and worldwide. We aimed to investigate the clinical and cost-effectiveness of upgrading obsolete CI sound processors in older adults. Methods: Alongside an Australian Government-funded upgrade program, a prospective, mixed-methodology design study was undertaken. Participants were aged 65 and over, with obsolete Cochlear™ sound processors and no funding for replacements. This study compared speech perception in noise, as well as self-reported outcome measures, including cognition, listening effort, fatigue, device benefit, mental well-being, participation, empowerment and user experiences, between upgraded and obsolete hearing aid processors. The economic impact of the upgrade was evaluated using two state-transition microsimulation models of adults using CIs. Results: The multi-site study ran from 20 May 2021 to 21 April 2023, with recruitment from June 2021 to May 2022. A total of 340 Cochlear™ sound processors were upgraded in 304 adults. The adults’ mean age was 77.4 years (SD 6.6), and 48.5% were female. Hearing loss onset occurred on average at 30 years (SD 21.0), with 12 years (SD 6.2) of CI use. The outcomes show significant improvements in speech understanding in noise and reduced communication difficulties, self-reported listening effort and fatigue. Semi-structured interviews revealed that upgrades alleviated the anxiety and fear of sudden processor failure. Health economic analysis found that the cost-effectiveness of upgrades stemmed from preventing device failures, rather than from access to newer technology features. Conclusions: Our study identified significant clinical and self-reported benefits from upgrading Cochlear™ sound processors. Economic value came from avoiding scenarios where a total device failure renders the user unable to access sound. The evidence gathered can be used to inform policy on CI processor upgrades for older adults.

(This article belongs to the Special Issue The Challenges and Prospects in Cochlear Implantation)

12 pages, 964 KiB

Open Access Article

A Machine Learning Model to Predict Postoperative Speech Recognition Outcomes in Cochlear Implant Recipients: Development, Validation, and Comparison with Expert Clinical Judgment

by Alexey Demyanchuk, Eugen Kludt, Thomas Lenarz and Andreas Büchner

J. Clin. Med. 2025, 14(11), 3625; https://doi.org/10.3390/jcm14113625 - 22 May 2025

Viewed by 559

Abstract

Background/Objectives: Cochlear implantation (CI) significantly enhances speech perception and quality of life in patients with severe-to-profound sensorineural hearing loss, yet outcomes vary substantially. Accurate preoperative prediction of CI outcomes remains challenging. This study aimed to develop and validate a machine learning model predicting postoperative speech recognition using a large, single-center dataset. Additionally, we compared model performance with expert clinical predictions to evaluate potential clinical utility. Methods: We retrospectively analyzed data from 2571 adult patients with postlingual hearing loss who received their cochlear implant between 2000 and 2022 at Hannover Medical School, Germany. A decision tree regression model was trained to predict monosyllabic (MS) word recognition score one to two years post-implantation using preoperative clinical variables (age, duration of deafness, preoperative MS score, pure tone average, onset type, and contralateral implantation status). Model evaluation was performed using a random data split (10%), a chronological future cohort (patients implanted after 2020), and a subset where experienced audiologists predicted outcomes for comparison. Results: The model achieved a mean absolute error (MAE) of 17.3% on the random test set and 17.8% on the chronological test set, demonstrating robust predictive performance over time. Compared to expert audiologist predictions, the model showed similar accuracy (MAE: 19.1% for the model vs. 18.9% for experts), suggesting comparable effectiveness. Conclusions: Our machine learning model reliably predicts postoperative speech outcomes and matches expert clinical predictions, highlighting its potential for supporting clinical decision-making. Future research should include external validation and prospective trials to further confirm clinical applicability.
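
Two parts of the evaluation described above are easy to make concrete: the chronological hold-out (patients implanted after 2020) and the mean absolute error used for scoring. The sketch below illustrates both on made-up records. The field names `implant_year` and `ms_score`, the helper names, and all values are hypothetical; the decision tree model itself is not reproduced here.

```python
def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between observed and predicted scores."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def chronological_split(records: list[dict], cutoff_year: int) -> tuple[list[dict], list[dict]]:
    """Train on records up to and including cutoff_year, test on later ones."""
    train = [r for r in records if r["implant_year"] <= cutoff_year]
    test = [r for r in records if r["implant_year"] > cutoff_year]
    return train, test


if __name__ == "__main__":
    records = [  # made-up patients; ms_score is a percentage
        {"implant_year": 2015, "ms_score": 60.0},
        {"implant_year": 2019, "ms_score": 45.0},
        {"implant_year": 2021, "ms_score": 70.0},
        {"implant_year": 2022, "ms_score": 50.0},
    ]
    train, test = chronological_split(records, cutoff_year=2020)
    # Naive baseline: predict the training mean for every test patient.
    baseline = sum(r["ms_score"] for r in train) / len(train)
    observed = [r["ms_score"] for r in test]
    print(mean_absolute_error(observed, [baseline] * len(test)))
```

Comparing a model's MAE against a naive baseline like this one, on the same future cohort, gives a sense of how much the preoperative variables actually contribute.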

(This article belongs to the Special Issue The Challenges and Prospects in Cochlear Implantation)

20 pages, 5649 KiB

Open Access Article

Edge-Deployed Band-Split Rotary Position Encoding Transformer for Ultra-Low-Signal-to-Noise-Ratio Unmanned Aerial Vehicle Speech Enhancement

by Feifan Liu, Muying Li, Luming Guo, Hao Guo, Jie Cao, Wei Zhao and Jun Wang

Drones 2025, 9(6), 386; https://doi.org/10.3390/drones9060386 - 22 May 2025

Cited by 1 | Viewed by 780

Abstract

Addressing the significant challenge of speech enhancement in ultra-low-Signal-to-Noise-Ratio (SNR) scenarios for Unmanned Aerial Vehicle (UAV) voice communication, particularly under edge deployment constraints, this study proposes the Edge-Deployed Band-Split Rotary Position Encoding Transformer (Edge-BS-RoFormer), a novel, lightweight band-split rotary position encoding transformer. While existing deep learning methods face limitations in dynamic UAV noise suppression under such constraints, including insufficient harmonic modeling and high computational complexity, the proposed Edge-BS-RoFormer distinctively synergizes a band-split strategy for fine-grained spectral processing, a dual-dimension Rotary Position Encoding (RoPE) mechanism for superior joint time–frequency modeling, and FlashAttention to optimize computational efficiency, pivotal for its lightweight nature and robust ultra-low-SNR performance. Experiments on our self-constructed DroneNoise-LibriMix (DN-LM) dataset demonstrate Edge-BS-RoFormer’s superiority. Under a −15 dB SNR, it achieves Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) improvements of 2.2 dB over Deep Complex U-Net (DCUNet), 25.0 dB over the Dual-Path Transformer Network (DPTNet), and 2.3 dB over HTDemucs. Correspondingly, the Perceptual Evaluation of Speech Quality (PESQ) is enhanced by 0.11, 0.18, and 0.15, respectively. Crucially, its efficacy for edge deployment is substantiated by a minimal model storage of 8.534 MB, 11.617 GFLOPs (an 89.6% reduction vs. DCUNet), a runtime memory footprint of under 500 MB, a Real-Time Factor (RTF) of 0.325 (latency: 330.830 ms), and a power consumption of 6.536 W on an NVIDIA Jetson AGX Xavier, fulfilling real-time processing demands. This study delivers a validated lightweight solution, exemplified by its minimal computational overhead and real-time edge inference capability, for effective speech enhancement in complex UAV acoustic scenarios, including dynamic noise conditions. Furthermore, the open-sourced dataset and model contribute to advancing research and establishing standardized evaluation frameworks in this domain.
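The Real-Time Factor quoted in the abstract is commonly defined as processing time divided by the duration of the audio processed, with values below 1 meaning the system keeps up with the incoming signal. The sketch below assumes that common definition; the abstract does not state how the authors measured it, and the function names and the one-second example segment are hypothetical.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF under the common definition: processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    if processing_seconds < 0:
        raise ValueError("processing time cannot be negative")
    return processing_seconds / audio_seconds


def is_real_time(rtf: float) -> bool:
    """A system keeps up with a live stream when it processes faster than playback."""
    return rtf < 1.0


# Hypothetical measurement: a 1.0 s audio segment enhanced in 0.325 s.
rtf = real_time_factor(processing_seconds=0.325, audio_seconds=1.0)
keeps_up = is_real_time(rtf)
```

Under this definition, an RTF of 0.325 means the model spends roughly a third of each audio second on computation, leaving headroom for other onboard tasks.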

(This article belongs to the Section Drone Communications)

13 pages, 1270 KiB

Open Access Article

Evidence for the Necessity of Objective Hearing Tests in Cochlear Implantation Assessment: Excluding Functional Hearing Loss Cases

by Anita Gáborján, Márton Kondé, Marianna Küstel, Nóra Kecskeméti, László Tamás, Ildikó Baranyi, Gábor Polony and Judit F. Szigeti

J. Clin. Med. 2025, 14(10), 3585; https://doi.org/10.3390/jcm14103585 - 20 May 2025

Viewed by 466

Abstract

Background/Objectives: Cochlear implantation is a crucial intervention for individuals with severe hearing loss, aiming to restore auditory function and improve quality of life. The decision to recommend cochlear implantation critically depends on accurate audiological evaluations. However, challenges arise when subjective assessments of hearing loss do not align with objective audiological measurements, leading to potential misdiagnoses. This study compares subjective and objective results and investigates the characteristics, warning signs, and risk factors of functional hearing loss (FHL). Methods: We performed a retrospective study of hearing loss presentations at an otorhinolaryngological university clinic between 2020 and 2024 and collected the FHL cases. The evaluation process included measurements of subjectively perceived hearing loss through pure-tone audiometry, speech understanding, and communication testing. The objective assessments comprised impedance measurement, otoacoustic emission measurement, auditory brainstem responses, auditory steady-state responses, and medical imaging. Results: During the studied period, we identified 11 patients, with an average age of 35.2 years (range: 13 to 64 years), who were originally referred for cochlear implantation evaluation and subsequently diagnosed with FHL. The majority (10 patients) were female. No organic cause was identified in four cases, while seven cases exhibited some organic ear abnormalities insufficient to justify the reported hearing loss. The degree of FHL ranged from 30 dB to 90 dB, with an average of 60 dB. Conclusions: Diagnosing FHL is challenging and requires comprehensive assessment and interdisciplinary collaboration. Failure to recognize it may lead to inappropriate treatment, including unnecessary cochlear implantation. This study advocates for the mandatory integration of ABR and ASSR in the clinical evaluation of all cochlear implant candidates to ensure accurate diagnosis and optimal treatment.

(This article belongs to the Special Issue Current Advances in Assessment and Intervention for Hearing Loss and Cochlear Implantation)

Search Results (263)