Search Results (650)

Search Parameters:
Keywords = speech quality

12 pages, 445 KiB  
Article
The Effect of Phoniatric and Logopedic Rehabilitation on the Voice of Patients with Puberphonia
by Lidia Nawrocka, Agnieszka Garstecka and Anna Sinkiewicz
J. Clin. Med. 2025, 14(15), 5350; https://doi.org/10.3390/jcm14155350 - 29 Jul 2025
Viewed by 202
Abstract
Background/Objective: Puberphonia is a voice disorder characterized by the persistence of a high-pitched voice in sexually mature males. In phoniatrics and speech-language pathology, it is also known as post-mutational voice instability, mutational falsetto, persistent fistulous voice, or functional falsetto. The absence of an age-appropriate vocal pitch may adversely affect psychological well-being and hinder personal, social, and occupational functioning. The aim of this study was to evaluate the impact of phoniatric and logopedic rehabilitation on voice quality in patients with puberphonia. Methods: The study included 18 male patients, aged 16 to 34 years, rehabilitated for voice mutation disorders. Phoniatric and logopedic rehabilitation included voice therapy tailored to each subject. A logopedist led exercises aimed at lowering and stabilizing the pitch of the voice and improving its quality. A phoniatrician supervised the therapy, monitoring the condition of the vocal apparatus and providing additional diagnostic and therapeutic recommendations as needed. The duration and intensity of the therapy were adjusted for each patient. Before and after voice rehabilitation, the subjects completed the following questionnaires: the Voice Handicap Index (VHI), the Vocal Tract Discomfort (VTD) scale, and the Voice-Related Quality of Life (V-RQOL). They also underwent an acoustic voice analysis. Results: Statistical analysis of the VHI, VTD, and V-RQOL scores, as well as the voice’s acoustic parameters, showed statistically significant differences before and after rehabilitation (p < 0.005). Conclusions: Phoniatric and logopedic rehabilitation is an effective method of lowering the pitch and maintaining a stable, euphonic male voice in patients with functional puberphonia. Effective voice therapy positively impacts selected aspects of psychosocial functioning reported by patients, improves voice-related quality of life, and reduces physical discomfort in the vocal tract. Full article
(This article belongs to the Section Otolaryngology)

17 pages, 8512 KiB  
Article
Interactive Holographic Display System Based on Emotional Adaptability and CCNN-PCG
by Yu Zhao, Zhong Xu, Ting-Yu Zhang, Meng Xie, Bing Han and Ye Liu
Electronics 2025, 14(15), 2981; https://doi.org/10.3390/electronics14152981 - 26 Jul 2025
Viewed by 271
Abstract
Against the backdrop of the rapid advancement of intelligent speech interaction and holographic display technologies, this paper introduces an interactive holographic display system. The system applies 2D-to-3D technology during acquisition and uses a Complex-valued Convolutional Neural Network Point Cloud Gridding (CCNN-PCG) algorithm to generate a computer-generated hologram (CGH) with depth information from point cloud data. During digital human hologram building, 2D-to-3D conversion yields high-precision point cloud data. The system uses ChatGLM for natural language processing and emotion-adaptive responses, enabling multi-turn voice dialogs and text-driven model generation. The CCNN-PCG algorithm reduces computational complexity and improves display quality. Simulations and experiments show that CCNN-PCG enhances reconstruction quality and speeds up computation by over 2.2 times. This research provides a theoretical framework and practical technology for holographic interactive systems, applicable in virtual assistants, educational displays, and other fields. Full article
(This article belongs to the Special Issue Artificial Intelligence, Computer Vision and 3D Display)

23 pages, 3741 KiB  
Article
Multi-Corpus Benchmarking of CNN and LSTM Models for Speaker Gender and Age Profiling
by Jorge Jorrin-Coz, Mariko Nakano, Hector Perez-Meana and Leobardo Hernandez-Gonzalez
Computation 2025, 13(8), 177; https://doi.org/10.3390/computation13080177 - 23 Jul 2025
Viewed by 252
Abstract
Speaker profiling systems are often evaluated on a single corpus, which complicates reliable comparison. We present a fully reproducible evaluation pipeline that trains Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) models independently on three speech corpora representing distinct recording conditions—studio-quality TIMIT, crowdsourced Mozilla Common Voice, and in-the-wild VoxCeleb1. All models share the same architecture, optimizer, and data preprocessing; no corpus-specific hyperparameter tuning is applied. We perform a detailed preprocessing and feature extraction procedure, evaluating multiple configurations and validating their applicability and effectiveness in improving the obtained results. A feature analysis shows that Mel spectrograms benefit CNNs, whereas Mel Frequency Cepstral Coefficients (MFCCs) suit LSTMs, and that the optimal Mel-bin count grows with the corpus signal-to-noise ratio (SNR). With this fixed recipe, EfficientNet achieves 99.82% gender accuracy on Common Voice (+1.25 pp over the previous best) and 98.86% on VoxCeleb1 (+0.57 pp). MobileNet attains 99.86% age-group accuracy on Common Voice (+2.86 pp) and a 5.35-year MAE for age estimation on TIMIT using a lightweight configuration. The consistent, near-state-of-the-art results across three acoustically diverse datasets substantiate the robustness and versatility of the proposed pipeline. Code and pre-trained weights are released to facilitate downstream research. Full article
(This article belongs to the Section Computational Engineering)
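
The feature comparison reported above (Mel spectrograms suiting CNNs, MFCCs suiting LSTMs) can be pictured with a short, purely illustrative extraction sketch. It assumes the librosa library, a 16 kHz mono recording named speaker.wav, and parameter values (n_fft, hop_length, bin counts) that are placeholders rather than the paper's configuration.

```python
# Illustrative only: the two feature types compared in the benchmark.
import librosa

y, sr = librosa.load("speaker.wav", sr=16000)       # hypothetical input file

# Log-Mel spectrogram (reported to suit CNN front-ends); n_mels is the tunable bin count.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512, hop_length=160, n_mels=64)
log_mel = librosa.power_to_db(mel)                   # shape: (n_mels, n_frames)

# MFCCs (reported to suit LSTM front-ends).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20, n_fft=512, hop_length=160)

print(log_mel.shape, mfcc.shape)
```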

19 pages, 1711 KiB  
Article
TSDCA-BA: An Ultra-Lightweight Speech Enhancement Model for Real-Time Hearing Aids with Multi-Scale STFT Fusion
by Zujie Fan, Zikun Guo, Yanxing Lai and Jaesoo Kim
Appl. Sci. 2025, 15(15), 8183; https://doi.org/10.3390/app15158183 - 23 Jul 2025
Viewed by 231
Abstract
Lightweight speech denoising models have made remarkable progress in improving both speech quality and computational efficiency. However, most models rely on long temporal windows as input, limiting their applicability in low-latency, real-time scenarios on edge devices. To address this challenge, we propose a lightweight hybrid module, Temporal Statistics Enhancement, Squeeze-and-Excitation-based Dual Convolutional Attention, and Band-wise Attention (TSE, SDCA, BA) Module. The TSE module enhances single-frame spectral features by concatenating statistical descriptors—mean, standard deviation, maximum, and minimum—thereby capturing richer local information without relying on temporal context. The SDCA and BA module integrates a simplified residual structure and channel attention, while the BA component further strengthens the representation of critical frequency bands through band-wise partitioning and differentiated weighting. The proposed model requires only 0.22 million multiply–accumulate operations (MMACs) and contains a total of 112.3 K parameters, making it well suited for low-latency, real-time speech enhancement applications. Experimental results demonstrate that among lightweight models with fewer than 200K parameters, the proposed approach outperforms most existing methods in both denoising performance and computational efficiency, significantly reducing processing overhead. Furthermore, real-device deployment on an improved hearing aid confirms an inference latency as low as 2 milliseconds, validating its practical potential for real-time edge applications. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
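
As a rough illustration of the TSE idea summarized above, augmenting a single spectral frame with its own mean, standard deviation, maximum, and minimum instead of temporal context, the NumPy sketch below shows the descriptor concatenation; the frame construction and sizes are assumptions, not the authors' implementation.

```python
import numpy as np

def tse_features(frame_mag: np.ndarray) -> np.ndarray:
    """Concatenate a single-frame magnitude spectrum with four statistical descriptors."""
    stats = np.array([frame_mag.mean(), frame_mag.std(),
                      frame_mag.max(), frame_mag.min()])
    return np.concatenate([frame_mag, stats])

frame = np.abs(np.fft.rfft(np.random.randn(256)))    # dummy 256-sample frame
print(tse_features(frame).shape)                     # 129 bins + 4 descriptors -> (133,)
```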

17 pages, 1467 KiB  
Article
Confidence-Based Knowledge Distillation to Reduce Training Costs and Carbon Footprint for Low-Resource Neural Machine Translation
by Maria Zafar, Patrick J. Wall, Souhail Bakkali and Rejwanul Haque
Appl. Sci. 2025, 15(14), 8091; https://doi.org/10.3390/app15148091 - 21 Jul 2025
Viewed by 389
Abstract
The transformer-based deep learning approach represents the current state-of-the-art in machine translation (MT) research. Large-scale pretrained transformer models produce state-of-the-art performance across a wide range of MT tasks for many languages. However, such deep neural network (NN) models are often data-, compute-, space-, power-, and energy-hungry, typically requiring powerful GPUs or large-scale clusters to train and deploy. As a result, they are often regarded as “non-green” and “unsustainable” technologies. Distilling knowledge from large deep NN models (teachers) to smaller NN models (students) is a widely adopted sustainable development approach in MT as well as in broader areas of natural language processing (NLP), including speech and image processing. However, distilling large pretrained models presents several challenges. First, training time and cost increase with the volume of data used to train a student model. This could pose a challenge for translation service providers (TSPs), as they may have limited budgets for training. Moreover, CO2 emissions generated during model training are typically proportional to the amount of data used, contributing to environmental harm. Second, when querying teacher models, including encoder–decoder models such as NLLB, the translations they produce for low-resource languages may be noisy or of low quality. This can undermine sequence-level knowledge distillation (SKD), as student models may inherit and reinforce errors from inaccurate labels. In this study, the teacher model’s confidence estimation is employed to filter those instances from the distilled training data for which the teacher exhibits low confidence. We tested our methods on a low-resource Urdu-to-English translation task operating within a constrained training budget in an industrial translation setting. Our findings show that confidence estimation-based filtering can significantly reduce the cost and CO2 emissions associated with training a student model without a drop in translation quality, making it a practical and environmentally sustainable solution for TSPs. Full article
(This article belongs to the Special Issue Deep Learning and Its Applications in Natural Language Processing)
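
The confidence-based filtering step described above can be pictured with the minimal sketch below. It assumes the teacher's confidence is approximated by a mean per-token log-probability, and every name in it (distilled_pairs, teacher_logprob, THRESHOLD) is hypothetical rather than taken from the paper.

```python
import math

THRESHOLD = math.log(0.6)   # assumed confidence cut-off; would be tuned on a dev set

def filter_distilled_data(distilled_pairs, teacher_logprob):
    """Keep only (source, teacher_translation) pairs the teacher is confident about."""
    kept = []
    for src, hyp in distilled_pairs:
        # teacher_logprob(src, hyp): mean per-token log-probability of hyp given src
        if teacher_logprob(src, hyp) >= THRESHOLD:
            kept.append((src, hyp))
    return kept
```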

22 pages, 1492 KiB  
Article
An Embedded Mixed-Methods Study with a Dominant Quantitative Strand: The Knowledge of Jordanian Mothers About Risk Factors for Childhood Hearing Loss
by Shawkat Altamimi, Mohamed Tawalbeh, Omar Shawkat Al Tamimi, Tariq N. Al-Shatanawi, Saba’ Azzam Jarrar, Eftekhar Khalid Al Zoubi, Aya Shawkat Altamimi and Ensaf Almomani
Audiol. Res. 2025, 15(4), 87; https://doi.org/10.3390/audiolres15040087 - 16 Jul 2025
Viewed by 270
Abstract
Background: Childhood hearing loss is a public health problem of critical importance associated with speech development, academic achievement, and quality of life. Parents’ awareness and knowledge about risk factors contribute to early detection and timely intervention. Objective: This study aims to examine Jordanian mothers’ knowledge of childhood hearing loss risk factors and investigate the impact of education level and socioeconomic status (SES) on the accuracy and comprehensiveness of this knowledge with the moderating effect of health literacy. Material and Methods: The approach employed an embedded mixed-methods design with a dominant quantitative strand supported by qualitative data, utilizing quantitative surveys (n = 250), analyzed using structural equation modeling (SEM) in SmartPLS, and qualitative interviews (n = 10), analyzed thematically to expand upon the quantitative findings by exploring barriers to awareness and healthcare-seeking behaviors. Results: Maternal knowledge of hearing loss risk factors positively influenced the accuracy and comprehensiveness of that knowledge. Maternal knowledge was significantly associated with both education level and socioeconomic status (SES). Furthermore, the relationship between maternal knowledge and accuracy was significantly moderated by health literacy, such that mothers with higher health literacy exhibited a stronger relationship between knowledge and accuracy. Qualitative findings revealed that individuals encountered barriers to accessing reliable information and comprehending medical advice and faced financial difficulties due to limited options for healthcare services. Conclusions: These results underscore the need for maternal education programs that address specific issues, provide simplified healthcare communication, and enhance access to pediatric audiology services. Future research should explore longitudinal assessments and intervention-based strategies to enhance mothers’ awareness and detect early childhood hearing loss. Full article

27 pages, 1817 KiB  
Article
A Large Language Model-Based Approach for Multilingual Hate Speech Detection on Social Media
by Muhammad Usman, Muhammad Ahmad, Grigori Sidorov, Irina Gelbukh and Rolando Quintero Tellez
Computers 2025, 14(7), 279; https://doi.org/10.3390/computers14070279 - 15 Jul 2025
Viewed by 694
Abstract
The proliferation of hate speech on social media platforms poses significant threats to digital safety, social cohesion, and freedom of expression. Detecting such content—especially across diverse languages—remains a challenging task due to linguistic complexity, cultural context, and resource limitations. To address these challenges, this study introduces a comprehensive approach for multilingual hate speech detection. To facilitate robust hate speech detection across diverse languages, this study makes several key contributions. First, we created a novel trilingual hate speech dataset consisting of 10,193 manually annotated tweets in English, Spanish, and Urdu. Second, we applied two innovative techniques—joint multilingual and translation-based approaches—for cross-lingual hate speech detection that have not been previously explored for these languages. Third, we developed detailed hate speech annotation guidelines tailored specifically to all three languages to ensure consistent and high-quality labeling. Finally, we conducted 41 experiments employing machine learning models with TF–IDF features, deep learning models utilizing FastText and GloVe embeddings, and transformer-based models leveraging advanced contextual embeddings to comprehensively evaluate our approach. Additionally, we employed a large language model with advanced contextual embeddings to identify the best solution for the hate speech detection task. The experimental results showed that our GPT-3.5-turbo model significantly outperforms strong baselines, achieving up to an 8% improvement over XLM-R in Urdu hate speech detection and an average gain of 4% across all three languages. This research not only contributes a high-quality multilingual dataset but also offers a scalable and inclusive framework for hate speech detection in underrepresented languages. Full article
(This article belongs to the Special Issue Recent Advances in Social Networks and Social Media)
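
Among the experiment families listed above, the classical TF-IDF machine-learning baseline is the simplest to sketch. The snippet below is a generic scikit-learn example with placeholder tweets and labels, not the authors' pipeline or hyperparameters.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = ["example tweet one", "example tweet two"] * 50    # placeholder data
labels = [0, 1] * 50                                        # 0 = non-hate, 1 = hate

X_tr, X_te, y_tr, y_te = train_test_split(texts, labels, test_size=0.2, random_state=0)
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1),
                    LogisticRegression(max_iter=1000))
clf.fit(X_tr, y_tr)
print("macro F1:", f1_score(y_te, clf.predict(X_te), average="macro"))
```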

16 pages, 2365 KiB  
Article
Fast Inference End-to-End Speech Synthesis with Style Diffusion
by Hui Sun, Jiye Song and Yi Jiang
Electronics 2025, 14(14), 2829; https://doi.org/10.3390/electronics14142829 - 15 Jul 2025
Viewed by 468
Abstract
In recent years, deep learning-based end-to-end Text-To-Speech (TTS) models have made significant progress in enhancing speech naturalness and fluency. However, existing Variational Inference Text-to-Speech (VITS) models still face challenges such as insufficient pitch modeling, inadequate contextual dependency capture, and low inference efficiency in the decoder. To address these issues, this paper proposes an improved TTS framework named Q-VITS. Q-VITS incorporates Rotary Position Embedding (RoPE) into the text encoder to enhance long-sequence modeling, adopts a frame-level prior modeling strategy to optimize one-to-many mappings, and designs a style extractor based on a diffusion model for controllable style rendering. Additionally, the proposed decoder ConfoGAN integrates explicit F0 modeling, Pseudo-Quadrature Mirror Filter (PQMF) multi-band synthesis and Conformer structure. The experimental results demonstrate that Q-VITS outperforms the VITS in terms of speech quality, pitch accuracy, and inference efficiency in both subjective Mean Opinion Score (MOS) and objective Mel-Cepstral Distortion (MCD) and Root Mean Square Error (RMSE) evaluations on a single-speaker dataset, achieving performance close to ground-truth audio. These improvements provide an effective solution for efficient and controllable speech synthesis. Full article
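
One concrete piece of the Q-VITS description above, Rotary Position Embedding in the text encoder, can be sketched as follows. This is a generic NumPy rendering of RoPE with arbitrary dimensions, not code from the paper.

```python
import numpy as np

def apply_rope(x: np.ndarray) -> np.ndarray:
    """Apply rotary position embedding to x of shape (seq_len, dim), dim even."""
    seq_len, dim = x.shape
    half = dim // 2
    pos = np.arange(seq_len)[:, None]                     # (seq_len, 1)
    freqs = 1.0 / (10000 ** (np.arange(half) / half))     # (half,)
    angles = pos * freqs                                   # (seq_len, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

print(apply_rope(np.random.randn(10, 64)).shape)          # (10, 64)
```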

15 pages, 2125 KiB  
Article
Psychometric Properties of a 17-Item German Language Short Form of the Speech, Spatial, and Qualities of Hearing Scale and Their Correlation to Audiometry in 97 Individuals with Unilateral Menière’s Disease from a Prospective Multicenter Registry
by Jennifer L. Spiegel, Bernhard Lehnert, Laura Schuller, Irina Adler, Tobias Rader, Tina Brzoska, Bernhard G. Weiss, Martin Canis, Chia-Jung Busch and Friedrich Ihler
J. Clin. Med. 2025, 14(14), 4953; https://doi.org/10.3390/jcm14144953 - 13 Jul 2025
Viewed by 361
Abstract
Background/Objectives: Menière’s disease (MD) is a debilitating disorder with episodic and variable ear symptoms. Diagnosis can be challenging, and evidence for therapeutic approaches is low. Furthermore, patients show a unique and fluctuating configuration of audiovestibular impairment. As a psychometric instrument to assess hearing-specific disability is currently lacking, we evaluated a short form of the Speech, Spatial, and Qualities of Hearing Scale (SSQ) in a cohort of patients with MD. Methods: Data was collected in the context of a multicenter prospective patient registry intended for the long-term follow up of MD patients. Hearing was assessed by pure tone and speech audiometry. The SSQ was applied in the German language version with 17 items. Results: In total, 97 consecutive patients with unilateral MD with a mean age of 56.2 ± 5.0 years were included. A total of 55 individuals (57.3%) were female, and 72 (75.0%) were categorized as having definite MD. The average total score of the SSQ was 6.0 ± 2.1. Cronbach’s alpha for internal consistency was 0.960 for the total score. We did not observe undue floor or ceiling effects. SSQ values showed a statistically negative correlation with hearing thresholds and a statistically positive correlation with speech recognition scores of affected ears. Conclusions: The short form of the SSQ provides insight into hearing-specific disability in patients with MD. Therefore, it may be informative regarding disease stage and rehabilitation needs. Full article
(This article belongs to the Special Issue Clinical Diagnosis and Management of Vestibular Disorders)
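
For readers unfamiliar with the internal-consistency figure quoted above, the sketch below shows one standard way Cronbach's alpha is computed from an item-level response matrix; the random 97 × 17 matrix is only a placeholder matching the cohort and item counts, not study data.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: (n_respondents, n_items) matrix of questionnaire scores."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var_sum / total_var)

responses = np.random.uniform(0, 10, size=(97, 17))   # placeholder: 97 patients x 17 items
print(round(cronbach_alpha(responses), 3))
```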

12 pages, 421 KiB  
Article
Function and Health in Adults with Dyskinetic Cerebral Palsy—A Follow-Up Study
by Kate Himmelmann and Meta N. Eek
J. Clin. Med. 2025, 14(14), 4909; https://doi.org/10.3390/jcm14144909 - 10 Jul 2025
Viewed by 272
Abstract
Background/Objectives: Dyskinetic cerebral palsy (DCP) often implies severe motor impairment and risk of health problems. Our aim was to follow up a group of young adults with DCP that we previously examined as children, to describe health, function, and living conditions. Methods: Interviews regarding health issues, treatments, and living conditions, and quality of life (RAND-36) and fatigue questionnaires were completed. Gross and fine motor function, communication, and speech ability were classified, and weight, height, spasticity, and dystonia were assessed and compared to previous data. Joint range of motion (ROM) was compared to older adults with DCP. Results: Dystonia was present in all fifteen participants, and spasticity in all but two. A decrease in dystonia and spasticity was found mainly in those who received intrathecal baclofen (ITB). ROM limitations were most pronounced in shoulder flexion, abduction and inward rotation (while outward rotation was hypermobile), hip abduction, hamstrings, and knee extension. The majority had frequent contact with primary and specialist healthcare. Seven participants were underweight, eight had a gastrostomy, and seven had ITB. Upper gastrointestinal and respiratory problems were frequent. Orthopedic surgery was reported for scoliosis in five participants and for the lower extremities in nine, while fractures were reported in six participants. RAND-36 revealed physical functioning, general health, and vitality as the greatest problem areas. Fatigue was significant in 64%. Eight participants lived with their parents. Participants at more functional levels completed tertiary education and lived independently. Conclusions: Most participants had severe impairment and many health issues, despite decreased dystonia and spasticity due to ITB. Sleep problems and pain were uncommon. Full article

13 pages, 940 KiB  
Review
Management of Dysarthria in Amyotrophic Lateral Sclerosis
by Elena Pasqualucci, Diletta Angeletti, Pamela Rosso, Elena Fico, Federica Zoccali, Paola Tirassa, Armando De Virgilio, Marco de Vincentiis and Cinzia Severini
Cells 2025, 14(14), 1048; https://doi.org/10.3390/cells14141048 - 9 Jul 2025
Viewed by 512
Abstract
Amyotrophic lateral sclerosis (ALS) stands as the leading neurodegenerative disorder affecting the motor system. One of the hallmarks of ALS, especially its bulbar form, is dysarthria, which significantly impairs the quality of life of ALS patients. This review provides a comprehensive overview of the current knowledge on the clinical manifestations, diagnostic differentiation, underlying mechanisms, diagnostic tools, and therapeutic strategies for the treatment of dysarthria in ALS. We provide an update on the most promising digital speech biomarkers of ALS that are critical for early and differential diagnosis. Advances in artificial intelligence and digital speech processing have transformed the analysis of speech patterns and offer the opportunity to start therapy early to improve vocal function, as speech rate appears to decline significantly before the diagnosis of ALS is confirmed. In addition, we discuss the impact of interventions that can improve vocal function and quality of life for patients, such as compensatory speech techniques, surgical options, improving lung function and respiratory muscle strength, percutaneous dilated tracheostomy (possibly with adjunctive therapies to treat respiratory insufficiency), and, finally, assistive devices for alternative communication. Full article
(This article belongs to the Special Issue Pathology and Treatments of Amyotrophic Lateral Sclerosis (ALS))

15 pages, 815 KiB  
Article
Tests of the Influence of DAF (Delayed Auditory Feedback) on Changes in Speech Signal Parameters
by Dominika Kanty and Piotr Staroniewicz
Appl. Sci. 2025, 15(13), 7524; https://doi.org/10.3390/app15137524 - 4 Jul 2025
Viewed by 251
Abstract
Contemporary phonetics and speech therapy continuously seek new techniques and methods that could contribute to improving verbal communication for individuals with speech disorders. One such phenomenon, Delayed Auditory Feedback (DAF), involves the speaker hearing their own voice with a specific delay relative to real-time speech. Although the research presented in this study was conducted on healthy individuals, it offers valuable insights into the mechanisms controlling speech, which may also apply to individuals with speech disorders. This article introduces a novel method and measurement setup, focusing on selected key speech signal parameters. To characterize the impact of Delayed Auditory Feedback (DAF) on fluent speakers, speech signal parameters were measured in 5 women and 5 men during spontaneous speech and reading. Parameters such as speech rate, fundamental frequency, formants, speech duration, jitter, and shimmer were analyzed both during and prior to the application of DAF. The results of this study may find practical applications in the field of telecommunications, especially in improving the efficiency and quality of human communication. Full article
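
To make the DAF mechanism above concrete, here is a hedged sketch of a delayed-feedback loop: the speaker's microphone signal is played back after a fixed delay. It assumes the sounddevice package, a 200 ms delay, and simplified buffering; it is not the measurement setup used in the study.

```python
import numpy as np
import sounddevice as sd

SR = 16000
DELAY_S = 0.200                                    # assumed feedback delay in seconds
delay_buf = np.zeros(int(SR * DELAY_S), dtype=np.float32)

def callback(indata, outdata, frames, time, status):
    """Emit the oldest buffered samples, then append the newest microphone samples."""
    global delay_buf
    outdata[:, 0] = delay_buf[:frames]
    delay_buf = np.concatenate([delay_buf[frames:], indata[:, 0]])

# Run the feedback loop for 10 seconds (blocksize must stay below the buffer length).
with sd.Stream(samplerate=SR, blocksize=1024, channels=1, callback=callback):
    sd.sleep(10_000)
```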

33 pages, 519 KiB  
Systematic Review
Impact of Oncological Treatment on Quality of Life in Patients with Head and Neck Malignancies: A Systematic Literature Review (2020–2025)
by Raluca Grigore, Paula Luiza Bejenaru, Gloria Simona Berteșteanu, Ruxandra Ioana Nedelcu-Stancalie, Teodora Elena Schipor-Diaconu, Simona Andreea Rujan, Bianca Petra Taher, Șerban Vifor Gabriel Berteșteanu, Bogdan Popescu, Irina Doinița Popescu, Alexandru Nicolaescu, Anca Ionela Cîrstea and Catrinel Beatrice Simion-Antonie
Curr. Oncol. 2025, 32(7), 379; https://doi.org/10.3390/curroncol32070379 - 30 Jun 2025
Viewed by 442
Abstract
Background: Quality of life (QoL) is a critical indicator in assessing the success of oncological treatments for head and neck malignancies, reflecting their impact on physiological functions and psychosocial well-being beyond mere survival. Treatments (surgery, radiotherapy, chemotherapy) pose multiple functional and emotional challenges, and recent advancements underscore the necessity of evaluating post-treatment QoL. Objective: This literature review investigates the impact of oncological treatment on the QoL of patients with malignant head and neck cancers (oral, oropharyngeal, hypopharyngeal, laryngeal) and identifies factors influencing their QoL index. Methodology: Using a PICO framework, studies from PubMed Central were analyzed, selected based on inclusion (English publications, full text, PROM results) and exclusion criteria. The last research was conducted on 6 April 2025. From 231 identified studies, 49 were included after applying filters (MeSH: “Quality of Life,” “laryngeal cancer,” “oral cavity cancer,” etc.). Data were organized in Excel, and the methodology adhered to PRISMA standards. Results: Treatment Impact: Oncological treatments significantly affect QoL, with acute post-treatment declines in functions such as speech, swallowing, and emotional well-being (anxiety, depression). Partial recovery depends on rehabilitative interventions. Influencing Factors: Treatment type, disease stage, socioeconomic, and demographic contexts influence QoL. De-escalated treatments and prompt rehabilitation improve recovery, while complications like trismus, dysphagia, or persistent hearing issues reduce long-term QoL. Assessment Tools: Standardized PROM questionnaires (EORTC QLQ-C30, QLQ-H&N35, MDADI, HADS) highlighted QoL variations. Studies from Europe, North America, and Asia indicate regional differences in outcomes. Limitations: Retrospective designs, small sample sizes, and PROM variability limit generalizability. Multicentric studies with extended follow-up are recommended. Conclusions: Oncological treatments for head and neck malignancies have a complex impact on QoL, necessitating personalized and multidisciplinary strategies. De-escalated therapies, early rehabilitation, and continuous monitoring are essential for optimizing functional and psychosocial outcomes. Methodological gaps highlight the need for standardized research. Full article
(This article belongs to the Section Head and Neck Oncology)

25 pages, 2093 KiB  
Article
Deep Learning-Based Speech Enhancement for Robust Sound Classification in Security Systems
by Samuel Yaw Mensah, Tao Zhang, Nahid Al Mahmud and Yanzhang Geng
Electronics 2025, 14(13), 2643; https://doi.org/10.3390/electronics14132643 - 30 Jun 2025
Viewed by 764
Abstract
Deep learning has emerged as a powerful technique for speech enhancement, particularly in security systems where audio signals are often degraded by non-stationary noise. Traditional signal processing methods struggle in such conditions, making it difficult to detect critical sounds like gunshots, alarms, and unauthorized speech. This study investigates a hybrid deep learning framework that combines Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Generative Adversarial Networks (GANs) to enhance speech quality and improve sound classification accuracy in noisy security environments. The proposed model is trained and validated using real-world datasets containing diverse noise distortions, including VoxCeleb for benchmarking speech enhancement and UrbanSound8K and ESC-50 for sound classification. Performance is evaluated using industry-standard metrics such as Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), and Signal-to-Noise Ratio (SNR). The architecture includes multi-layered neural networks, residual connections, and dropout regularization to ensure robustness and generalizability. Additionally, the paper addresses key challenges in deploying deep learning models for security applications, such as computational complexity, latency, and vulnerability to adversarial attacks. Experimental results demonstrate that the proposed DNN + GAN-based approach significantly improves speech intelligibility and classification performance in high-interference scenarios, offering a scalable solution for enhancing the reliability of audio-based security systems. Full article
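
Of the evaluation metrics named above, the simplest, SNR, can be computed directly; the sketch below shows a generic residual-based SNR in dB on placeholder signals, while PESQ and STOI come from dedicated packages and are not re-implemented here.

```python
import numpy as np

def snr_db(clean: np.ndarray, enhanced: np.ndarray) -> float:
    """Signal-to-noise ratio of the enhancement residual, in dB."""
    noise = clean - enhanced
    return 10.0 * np.log10(np.sum(clean ** 2) / (np.sum(noise ** 2) + 1e-12))

clean = np.random.randn(16000)                     # placeholder clean signal
enhanced = clean + 0.1 * np.random.randn(16000)    # placeholder enhanced output
print(f"{snr_db(clean, enhanced):.1f} dB")
```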

29 pages, 643 KiB  
Review
Psychological Distress and Quality of Life in Patients with Laryngeal Cancer: A Review
by Maria Octavia Murariu, Eugen Radu Boia, Adrian Mihail Sitaru, Cristian Ion Mot, Mihaela Cristina Negru, Alexandru Cristian Brici, Delia Elena Zahoi and Nicolae Constantin Balica
Healthcare 2025, 13(13), 1552; https://doi.org/10.3390/healthcare13131552 - 29 Jun 2025
Viewed by 522
Abstract
Laryngeal cancer significantly affects not only survival but also core functions such as speech, swallowing, and breathing. These impairments often result in substantial psychological distress and reduced health-related quality of life (HRQoL). This review aims to synthesize current evidence regarding the psychological impact, quality of life outcomes, and system-level challenges faced by laryngeal cancer patients while identifying strategies for integrated survivorship care. Anxiety and depressive symptoms are highly prevalent among laryngeal cancer patients, particularly those undergoing total laryngectomy or chemoradiotherapy. HRQoL outcomes vary significantly depending on treatment modality, with long-term deficits noted in domains such as voice, swallowing, and emotional well-being. Access to psychological support and rehabilitation remains inconsistent, hindered by institutional, socioeconomic, and cultural barriers. Structured survivorship models, psychological screening, and patient-centered rehabilitation have demonstrated benefits but are not universally implemented. Comprehensive care for laryngeal cancer must extend beyond tumor control to address persistent functional and psychological sequelae. A multidisciplinary, anticipatory, and personalized approach—centered on integrated rehabilitation and mental health support—is essential to optimize survivorship outcomes and improve long-term quality of life. Full article
