Search Results (460)

Search Parameters:
Keywords = emotional speech

20 pages, 1536 KiB  
Article
Graph Convolution-Based Decoupling and Consistency-Driven Fusion for Multimodal Emotion Recognition
by Yingmin Deng, Chenyu Li, Yu Gu, He Zhang, Linsong Liu, Haixiang Lin, Shuang Wang and Hanlin Mo
Electronics 2025, 14(15), 3047; https://doi.org/10.3390/electronics14153047 - 30 Jul 2025
Viewed by 228
Abstract
Multimodal emotion recognition (MER) is essential for understanding human emotions from diverse sources such as speech, text, and video. However, modality heterogeneity and inconsistent expression pose challenges for effective feature fusion. To address this, we propose a novel MER framework combining a Dynamic Weighted Graph Convolutional Network (DW-GCN) for feature disentanglement and a Cross-Attention Consistency-Gated Fusion (CACG-Fusion) module for robust integration. DW-GCN models complex inter-modal relationships, enabling the extraction of both common and private features. The CACG-Fusion module subsequently enhances classification performance through dynamic alignment of cross-modal cues, employing attention-based coordination and consistency-preserving gating mechanisms to optimize feature integration. Experiments on the CMU-MOSI and CMU-MOSEI datasets demonstrate that our method achieves state-of-the-art performance, significantly improving the ACC7, ACC2, and F1 scores. Full article
(This article belongs to the Section Computer Science & Engineering)

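To make the fusion step concrete, here is a minimal PyTorch sketch of cross-attention followed by a consistency gate, in the spirit of the CACG-Fusion module the abstract describes. The class name, dimensions, and gating form are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: cross-attention plus a consistency gate for two modalities.
# Dimensions and the gating form are assumptions, not the authors' code.
import torch
import torch.nn as nn

class CrossAttentionGatedFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, text_feats, audio_feats):
        # Text queries attend over audio keys/values (cross-modal alignment).
        aligned, _ = self.attn(text_feats, audio_feats, audio_feats)
        # Consistency gate: down-weight audio cues that disagree with text.
        g = self.gate(torch.cat([text_feats, aligned], dim=-1))
        return g * aligned + (1 - g) * text_feats

fusion = CrossAttentionGatedFusion()
text = torch.randn(8, 20, 256)    # (batch, text length, feature dim)
audio = torch.randn(8, 50, 256)   # (batch, audio frames, feature dim)
fused = fusion(text, audio)       # -> (8, 20, 256)
```
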
17 pages, 8512 KiB  
Article
Interactive Holographic Display System Based on Emotional Adaptability and CCNN-PCG
by Yu Zhao, Zhong Xu, Ting-Yu Zhang, Meng Xie, Bing Han and Ye Liu
Electronics 2025, 14(15), 2981; https://doi.org/10.3390/electronics14152981 - 26 Jul 2025
Viewed by 315
Abstract
Against the backdrop of rapid advances in intelligent speech interaction and holographic display technologies, this paper introduces an interactive holographic display system. It applies 2D-to-3D conversion during acquisition and uses a Complex-valued Convolutional Neural Network Point Cloud Gridding (CCNN-PCG) algorithm to generate a computer-generated hologram (CGH) with depth information from point cloud data. During digital human hologram construction, the 2D-to-3D conversion yields high-precision point cloud data. The system uses ChatGLM for natural language processing and emotion-adaptive responses, enabling multi-turn voice dialogs and text-driven model generation. The CCNN-PCG algorithm reduces computational complexity and improves display quality. Simulations and experiments show that CCNN-PCG enhances reconstruction quality and speeds up computation by over 2.2 times. This research provides a theoretical framework and practical technology for holographic interactive systems, applicable to virtual assistants, educational displays, and other fields. Full article
(This article belongs to the Special Issue Artificial Intelligence, Computer Vision and 3D Display)

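For context, the expensive computation that CCNN-PCG is said to accelerate is generating a hologram from point cloud data. Below is a NumPy sketch of the classical point-source baseline (one spherical wave per cloud point); the wavelength, pixel pitch, and toy point cloud are assumed values, and this is the conventional baseline, not the CCNN-PCG network itself.

```python
# Classical point-source CGH baseline: sum one spherical wave per cloud point.
# Wavelength, pitch, resolution, and the two-point cloud are assumptions.
import numpy as np

wavelength = 532e-9                     # green laser (assumed)
k = 2 * np.pi / wavelength
pitch = 8e-6                            # SLM pixel pitch (assumed)
H = W = 512

ys, xs = np.mgrid[:H, :W]
x = (xs - W / 2) * pitch                # hologram-plane coordinates (m)
y = (ys - H / 2) * pitch

points = np.array([[0.0, 0.0, 0.10],    # toy (x, y, z) cloud in metres
                   [1e-3, -1e-3, 0.12]])

field = np.zeros((H, W), dtype=np.complex128)
for px, py, pz in points:
    r = np.sqrt((x - px) ** 2 + (y - py) ** 2 + pz ** 2)
    field += np.exp(1j * k * r) / r     # spherical wave from each point

hologram = np.angle(field)              # phase-only CGH for a phase SLM
```

The cost of this baseline grows with points × pixels, which is why learned approaches that predict the hologram directly can deliver the reported speedups of over 2.2 times.
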
20 pages, 651 KiB  
Review
Communication Disorders and Mental Health Outcomes in Children and Adolescents: A Scoping Review
by Lifan Xue, Yifang Gong, Shane Pill and Weifeng Han
Healthcare 2025, 13(15), 1807; https://doi.org/10.3390/healthcare13151807 - 25 Jul 2025
Viewed by 447
Abstract
Background/Objectives: Communication disorders in childhood, including expressive, receptive, pragmatic, and fluency impairments, have been consistently linked to mental health challenges such as anxiety, depression, and behavioural difficulties. However, existing research remains fragmented across diagnostic categories and developmental stages. This scoping review aimed to synthesise empirical evidence on the relationship between communication disorders and mental health outcomes in children and adolescents and to identify key patterns and implications for practice and policy. Methods: Following the PRISMA Extension for Scoping Reviews (PRISMA-ScR) and Arksey and O’Malley’s framework, this review included empirical studies published in English between 2000 and 2024. Five databases were searched, and ten studies met the inclusion criteria. Data were charted and thematically analysed to explore associations across communication profiles and emotional–behavioural outcomes. Results: Four interconnected themes were identified: (1) emotional and behavioural manifestations of communication disorders; (2) social burden linked to pragmatic and expressive difficulties; (3) family and environmental stressors exacerbating child-level challenges; and (4) a lack of integrated care models addressing both communication and mental health needs. The findings highlight that communication disorders frequently co-occur with emotional difficulties, often embedded within broader social and systemic contexts. Conclusions: This review underscores the need for developmentally informed, culturally responsive, and interdisciplinary service models that address both communication and mental health in children. Early identification, family-centred care, and policy reforms are critical to reducing inequities and improving outcomes for this underserved population. Full article

13 pages, 1177 KiB  
Perspective
Banking on My Voice: Life with Motor Neurone Disease
by Ian Barry and Sarah El-Wahsh
Healthcare 2025, 13(14), 1770; https://doi.org/10.3390/healthcare13141770 - 21 Jul 2025
Viewed by 378
Abstract
This perspective paper presents a first-person account of life with motor neurone disease (MND). Through the lens of lived experience, it explores the complex and often prolonged diagnostic journey, shaped in part by the protective grip of denial. This paper then delves into the emotional impact of MND on the individual and their close relationships, capturing the strain on identity and family dynamics. It also highlights the vital role of the multidisciplinary team in providing support throughout the journey. A central focus of the paper is the personal journey of voice banking. It reflects on the restorative experience of reclaiming a pre-disease voice through tools such as ElevenLabs™. This narrative underscores the critical importance of early intervention and timely access to voice banking, positioning voice not only as a tool for communication but also as a powerful anchor of identity, dignity, and agency. The paper concludes by highlighting key systemic gaps in MND care. It calls for earlier referral to speech pathology, earlier access to voice banking, access to psychological support from the time of diagnosis, and better integration between research and clinical care. Full article
(This article belongs to the Special Issue Improving Care for People Living with ALS/MND)

16 pages, 317 KiB  
Perspective
Listening to the Mind: Integrating Vocal Biomarkers into Digital Health
by Irene Rodrigo and Jon Andoni Duñabeitia
Brain Sci. 2025, 15(7), 762; https://doi.org/10.3390/brainsci15070762 - 18 Jul 2025
Viewed by 528
Abstract
The human voice is an invaluable tool for communication, carrying information about a speaker’s emotional state and cognitive health. Recent research highlights the potential of acoustic biomarkers to detect early signs of mental health and neurodegenerative conditions. Despite their promise, vocal biomarkers remain underutilized in clinical settings, with limited standardized protocols for assessment. This Perspective article argues for the integration of acoustic biomarkers into digital health solutions to improve the detection and monitoring of cognitive impairment and emotional disturbances. Advances in speech analysis and machine learning have demonstrated the feasibility of using voice features such as pitch, jitter, shimmer, and speech rate to assess these conditions. Moreover, we propose that singing, particularly simple melodic structures, could be an effective and accessible means of gathering vocal biomarkers, offering additional insights into cognitive and emotional states. Given its potential to engage multiple neural networks, singing could function as an assessment tool and an intervention strategy for individuals with cognitive decline. We highlight the necessity of further research to establish robust, reproducible methodologies for analyzing vocal biomarkers and standardizing voice-based diagnostic approaches. By integrating vocal analysis into routine health assessments, clinicians and researchers could significantly advance early detection and personalized interventions for cognitive and emotional disorders. Full article
(This article belongs to the Topic Language: From Hearing to Speech and Writing)
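
As a rough illustration of the voice features named here, the sketch below estimates pitch, approximate jitter and shimmer, and a speech-rate proxy with librosa; the file name is hypothetical, and the jitter/shimmer formulas are crude relative-variability approximations rather than the Praat definitions common in clinical work.

```python
# Hedged sketch of simple vocal-biomarker extraction; approximations only.
import numpy as np
import librosa

y, sr = librosa.load("sample.wav", sr=16000)    # hypothetical recording

f0 = librosa.yin(y, fmin=75, fmax=400, sr=sr)   # per-frame pitch estimates
voiced = f0[(f0 > 75) & (f0 < 400)]             # drop boundary/unvoiced frames
periods = 1.0 / voiced

# Jitter ~ mean cycle-to-cycle period variation, relative to the mean period.
jitter = np.mean(np.abs(np.diff(periods))) / np.mean(periods)

# Shimmer ~ frame-to-frame amplitude variation over the active portion.
rms = librosa.feature.rms(y=y)[0]
active = rms[rms > 0.5 * np.median(rms)]
shimmer = np.mean(np.abs(np.diff(active))) / np.mean(active)

# Speech-rate proxy: syllable-like onset peaks per second.
onsets = librosa.onset.onset_detect(y=y, sr=sr)
rate = len(onsets) / (len(y) / sr)

print(f"pitch={np.mean(voiced):.1f} Hz, jitter={jitter:.4f}, "
      f"shimmer={shimmer:.4f}, rate={rate:.2f} peaks/s")
```
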
21 pages, 497 KiB  
Article
Small Language Models for Speech Emotion Recognition in Text and Audio Modalities
by José L. Gómez-Sirvent, Francisco López de la Rosa, Daniel Sánchez-Reolid, Roberto Sánchez-Reolid and Antonio Fernández-Caballero
Appl. Sci. 2025, 15(14), 7730; https://doi.org/10.3390/app15147730 - 10 Jul 2025
Viewed by 677
Abstract
Speech emotion recognition has become increasingly important in a wide range of applications, driven by the development of large transformer-based natural language processing models. However, the large size of these architectures limits their usability, which has led to a growing interest in smaller models. In this paper, we evaluate nineteen of the most popular small language models for the text and audio modalities for speech emotion recognition on the IEMOCAP dataset. Based on their cross-validation accuracy, the best architectures were selected to create ensemble models to evaluate the effect of combining audio and text, as well as the effect of incorporating contextual information on model performance. The experiments conducted showed a significant increase in accuracy with the inclusion of contextual information and the combination of modalities. The results were highly competitive: the proposed ensemble model achieved an accuracy of 82.12% on the IEMOCAP dataset, outperforming several recent approaches. These results demonstrate the effectiveness of ensemble methods for improving speech emotion recognition performance, and highlight the feasibility of training multiple small language models on consumer-grade computers. Full article
(This article belongs to the Section Computing and Artificial Intelligence)

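One way to realize the audio–text ensemble the abstract describes is simple late fusion: average the class probabilities from the two modalities. The label set, weighting, and stand-in logits below are placeholders; the paper's actual architectures and ensembling scheme may differ.

```python
# Minimal late-fusion sketch: weighted average of per-modality probabilities.
import torch
import torch.nn.functional as F

LABELS = ["angry", "happy", "neutral", "sad"]     # illustrative subset

def fuse(text_logits: torch.Tensor, audio_logits: torch.Tensor,
         w_text: float = 0.5) -> int:
    p_text = F.softmax(text_logits, dim=-1)
    p_audio = F.softmax(audio_logits, dim=-1)
    p = w_text * p_text + (1 - w_text) * p_audio  # weighted average
    return int(p.argmax())

# Stand-in logits, as if produced by two small models on one utterance.
text_logits = torch.tensor([0.2, 1.5, 0.1, -0.3])
audio_logits = torch.tensor([0.9, 0.8, -0.2, 0.0])
print(LABELS[fuse(text_logits, audio_logits)])    # -> "happy"
```
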
29 pages, 4973 KiB  
Article
Speech and Elocution Training (SET): A Self-Efficacy Catalyst for Language Potential Activation and Career-Oriented Development for Higher Vocational Students
by Xiaojian Zheng, Mohd Hazwan Mohd Puad and Habibah Ab Jalil
Educ. Sci. 2025, 15(7), 850; https://doi.org/10.3390/educsci15070850 - 2 Jul 2025
Viewed by 448
Abstract
This study explores how Speech and Elocution Training (SET) activates language potential and fosters career-oriented development among higher vocational students through self-efficacy mechanisms. Through qualitative interviews with four vocational graduates who participated in SET 5 to 10 years ago, the research identifies three key findings. First, SET comprises curriculum content (e.g., workplace communication modules such as hosting, storytelling, and sales pitching) and classroom training using multimodal TED resources and Toastmasters International-simulated practices, which spark language potential through skill-focused, realistic exercises. Second, these pedagogies facilitate a progression where initial language potential evolves from nascent career interests into concrete job-seeking intentions and long-term career plans: completing workplace-related speech tasks boosts confidence in career choices, planning, and job competencies, enabling adaptability to professional challenges. Third, SET aligns with Bandura’s four self-efficacy determinants; these are successful experiences (including personalized and virtual skill acquisition and certified affirmation), vicarious experiences (via observation platforms and constructive peer modeling), verbal persuasion (direct instructional feedback and indirect emotional support), and the arousal of optimistic emotions (the cognitive reframing of challenges and direct desensitization to anxieties). These mechanisms collectively create a positive cycle that enhances self-efficacy, amplifies language potential, and clarifies career intentions. While highlighting SET’s efficacy, this study notes a small sample size limitation, urging future mixed-methods studies with diverse samples to validate these mechanisms across broader vocational contexts and refine understanding of language training’s role in fostering linguistic competence and career readiness. Full article

33 pages, 519 KiB  
Systematic Review
Impact of Oncological Treatment on Quality of Life in Patients with Head and Neck Malignancies: A Systematic Literature Review (2020–2025)
by Raluca Grigore, Paula Luiza Bejenaru, Gloria Simona Berteșteanu, Ruxandra Ioana Nedelcu-Stancalie, Teodora Elena Schipor-Diaconu, Simona Andreea Rujan, Bianca Petra Taher, Șerban Vifor Gabriel Berteșteanu, Bogdan Popescu, Irina Doinița Popescu, Alexandru Nicolaescu, Anca Ionela Cîrstea and Catrinel Beatrice Simion-Antonie
Curr. Oncol. 2025, 32(7), 379; https://doi.org/10.3390/curroncol32070379 - 30 Jun 2025
Viewed by 489
Abstract
Background: Quality of life (QoL) is a critical indicator in assessing the success of oncological treatments for head and neck malignancies, reflecting their impact on physiological functions and psychosocial well-being beyond mere survival. Treatments (surgery, radiotherapy, chemotherapy) pose multiple functional and emotional challenges, and recent advancements underscore the necessity of evaluating post-treatment QoL. Objective: This literature review investigates the impact of oncological treatment on the QoL of patients with malignant head and neck cancers (oral, oropharyngeal, hypopharyngeal, laryngeal) and identifies factors influencing their QoL index. Methodology: Using a PICO framework, studies from PubMed Central were analyzed, selected based on inclusion (English publications, full text, PROM results) and exclusion criteria. The last research was conducted on 6 April 2025. From 231 identified studies, 49 were included after applying filters (MeSH: “Quality of Life,” “laryngeal cancer,” “oral cavity cancer,” etc.). Data were organized in Excel, and the methodology adhered to PRISMA standards. Results: Treatment Impact: Oncological treatments significantly affect QoL, with acute post-treatment declines in functions such as speech, swallowing, and emotional well-being (anxiety, depression). Partial recovery depends on rehabilitative interventions. Influencing Factors: Treatment type, disease stage, socioeconomic, and demographic contexts influence QoL. De-escalated treatments and prompt rehabilitation improve recovery, while complications like trismus, dysphagia, or persistent hearing issues reduce long-term QoL. Assessment Tools: Standardized PROM questionnaires (EORTC QLQ-C30, QLQ-H&N35, MDADI, HADS) highlighted QoL variations. Studies from Europe, North America, and Asia indicate regional differences in outcomes. Limitations: Retrospective designs, small sample sizes, and PROM variability limit generalizability. Multicentric studies with extended follow-up are recommended. Conclusions: Oncological treatments for head and neck malignancies have a complex impact on QoL, necessitating personalized and multidisciplinary strategies. De-escalated therapies, early rehabilitation, and continuous monitoring are essential for optimizing functional and psychosocial outcomes. Methodological gaps highlight the need for standardized research. Full article
(This article belongs to the Section Head and Neck Oncology)

29 pages, 643 KiB  
Review
Psychological Distress and Quality of Life in Patients with Laryngeal Cancer: A Review
by Maria Octavia Murariu, Eugen Radu Boia, Adrian Mihail Sitaru, Cristian Ion Mot, Mihaela Cristina Negru, Alexandru Cristian Brici, Delia Elena Zahoi and Nicolae Constantin Balica
Healthcare 2025, 13(13), 1552; https://doi.org/10.3390/healthcare13131552 - 29 Jun 2025
Viewed by 557
Abstract
Laryngeal cancer significantly affects not only survival but also core functions such as speech, swallowing, and breathing. These impairments often result in substantial psychological distress and reduced health-related quality of life (HRQoL). This review aims to synthesize current evidence regarding the psychological impact, quality of life outcomes, and system-level challenges faced by laryngeal cancer patients while identifying strategies for integrated survivorship care. Anxiety and depressive symptoms are highly prevalent among laryngeal cancer patients, particularly those undergoing total laryngectomy or chemoradiotherapy. HRQoL outcomes vary significantly depending on treatment modality, with long-term deficits noted in domains such as voice, swallowing, and emotional well-being. Access to psychological support and rehabilitation remains inconsistent, hindered by institutional, socioeconomic, and cultural barriers. Structured survivorship models, psychological screening, and patient-centered rehabilitation have demonstrated benefits but are not universally implemented. Comprehensive care for laryngeal cancer must extend beyond tumor control to address persistent functional and psychological sequelae. A multidisciplinary, anticipatory, and personalized approach—centered on integrated rehabilitation and mental health support—is essential to optimize survivorship outcomes and improve long-term quality of life. Full article

24 pages, 1664 KiB  
Review
A Comprehensive Review of Multimodal Emotion Recognition: Techniques, Challenges, and Future Directions
by You Wu, Qingwei Mi and Tianhan Gao
Biomimetics 2025, 10(7), 418; https://doi.org/10.3390/biomimetics10070418 - 27 Jun 2025
Viewed by 1823
Abstract
This paper presents a comprehensive review of multimodal emotion recognition (MER), a process that integrates multiple data modalities such as speech, visual, and text to identify human emotions. Grounded in biomimetics, the survey frames MER as a bio-inspired sensing paradigm that emulates the way humans seamlessly fuse multisensory cues to communicate affect, thereby transferring principles from living systems to engineered solutions. By leveraging various modalities, MER systems offer a richer and more robust analysis of emotional states compared to unimodal approaches. The review covers the general structure of MER systems, feature extraction techniques, and multimodal information fusion strategies, highlighting key advancements and milestones. Additionally, it addresses the research challenges and open issues in MER, including lightweight models, cross-corpus generalizability, and the incorporation of additional modalities. The paper concludes by discussing future directions aimed at improving the accuracy, explainability, and practicality of MER systems for real-world applications. Full article
(This article belongs to the Special Issue Intelligent Human–Robot Interaction: 4th Edition)

18 pages, 1498 KiB  
Article
Speech Emotion Recognition on MELD and RAVDESS Datasets Using CNN
by Gheed T. Waleed and Shaimaa H. Shaker
Information 2025, 16(7), 518; https://doi.org/10.3390/info16070518 - 21 Jun 2025
Viewed by 1157
Abstract
Speech emotion recognition (SER) plays a vital role in enhancing human–computer interaction (HCI) and can be applied in affective computing, virtual support, and healthcare. This research presents a high-performance SER framework based on a lightweight 1D Convolutional Neural Network (1D-CNN) and a multi-feature fusion technique. Rather than employing spectrograms as image-based input, frame-level characteristics (Mel-Frequency Cepstral Coefficients, Mel-Spectrograms, and Chroma vectors) are calculated throughout the sequences to preserve temporal information and reduce the computing expense. The model attained classification accuracies of 94.0% on MELD (multi-party talks) and 91.9% on RAVDESS (acted speech). Ablation experiments demonstrate that the integration of complementary features significantly outperforms the utilisation of a singular feature as a baseline. Data augmentation techniques, including Gaussian noise and time shifting, enhance model generalisation. The proposed method demonstrates significant potential for real-time emotion recognition using audio only in embedded or resource-constrained devices. Full article
(This article belongs to the Special Issue Artificial Intelligence Methods for Human-Computer Interaction)

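To illustrate the input representation described above, here is a short sketch that stacks frame-level MFCC, Mel-spectrogram, and chroma features and passes them through a small 1D CNN; the file name, feature counts, and layer sizes are assumptions, not the paper's exact configuration.

```python
# Hedged sketch: frame-level multi-feature fusion feeding a tiny 1D CNN.
import numpy as np
import librosa
import torch
import torch.nn as nn

y, sr = librosa.load("clip.wav", sr=16000)               # hypothetical file
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)
chroma = librosa.feature.chroma_stft(y=y, sr=sr)         # 12 bins
feats = np.vstack([mfcc, librosa.power_to_db(mel), chroma])  # (72, frames)

cnn = nn.Sequential(
    nn.Conv1d(72, 128, kernel_size=5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(128, 7),                                   # 7 emotion classes
)
x = torch.from_numpy(feats).float().unsqueeze(0)         # (1, 72, frames)
logits = cnn(x)
```
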
20 pages, 297 KiB  
Article
Schizotypal Traits in Children with Autism Spectrum Disorder and the Impact on Social, Emotional and Behavioral Functioning
by Evdokia Tagkouli, Evangelia Chrysanthi Kouklari, Bruce J. Tonge, Vassiliki Ntre, Artemios Pehlivanidis, Nikos C. Stefanis, Christos Pantelis and Katerina Papanikolaou
Brain Sci. 2025, 15(7), 668; https://doi.org/10.3390/brainsci15070668 - 20 Jun 2025
Viewed by 1471
Abstract
Background: Schizotypal traits are considered to be clinical and cognitive features of Schizotypal Disorder in children (SDc). These traits are also seen in children and adolescents with high-functioning Autism Spectrum Disorder (ASD). This study examines the influence of schizotypal traits (and their severity) on the capacity of children with ASD to manage emotions, develop relationships with others, and adapt in school and family life. Methods: The schizotypal traits of 63 children (6–12 years old) with high-functioning ASD were measured by the Melbourne Assessment of Schizotypy in Kids (MASK). Parents and teachers of the participating children completed the Child Behavior Checklist (CBCL) and Teachers’ Report Form (TRF) from the Achenbach System of Empirically Based Assessment and the Aberrant Behavior Checklist (ABC). Results: Overall, the results indicated correlations between the MASK scores and problems recorded by teachers, such as Internalizing problems (i.e., Anxious/Depressed, Withdrawn/Depressed, and Other problems score) according to the TRF, and Inappropriate Speech scores according to the teacher’s ABC scales. Schizotypal traits impact the social, emotional, and behavioral functioning of children with ASD in both home and school environments. Conclusions: The assessment of schizotypal traits in children with ASD provides critical information about a child’s functionality and cognitive development, also leading to the identification of potential cognitive-neuropsychological endophenotypes within ASD with characteristics of both the Autism and Schizophrenia spectra. The development of a valid assessment tool is required, as well as the design of targeted interventions to prevent the loss of functionality. Full article
(This article belongs to the Section Neuropsychology)
25 pages, 1822 KiB  
Article
Emotion Recognition from Speech in a Subject-Independent Approach
by Andrzej Majkowski and Marcin Kołodziej
Appl. Sci. 2025, 15(13), 6958; https://doi.org/10.3390/app15136958 - 20 Jun 2025
Cited by 1 | Viewed by 650
Abstract
The aim of this article is to critically and reliably assess the potential of current emotion recognition technologies for practical applications in human–computer interaction (HCI) systems. The study made use of two databases: one in English (RAVDESS) and another in Polish (EMO-BAJKA), both containing speech recordings expressing various emotions. The effectiveness of recognizing seven and eight different emotions was analyzed. A range of acoustic features, including energy features, mel-cepstral features, zero-crossing rate, fundamental frequency, and spectral features, were utilized to analyze the emotions in speech. Machine learning techniques such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and support vector machines with a cubic kernel (cubic SVMs) were employed in the emotion classification task. The research findings indicated that the effective recognition of a broad spectrum of emotions in a subject-independent approach is limited. However, significantly better results were obtained in the classification of paired emotions, suggesting that emotion recognition technologies could be effectively used in specific applications where distinguishing between two particular emotional states is essential. To ensure a reliable and accurate assessment of the emotion recognition system, care was taken to divide the dataset in such a way that the training and testing data contained recordings of completely different individuals. The highest classification accuracies for pairs of emotions were achieved for Angry–Fearful (0.8), Angry–Happy (0.86), Angry–Neutral (1.0), Angry–Sad (1.0), Angry–Surprise (0.89), Disgust–Neutral (0.91), and Disgust–Sad (0.96) in the RAVDESS. In the EMO-BAJKA database, the highest classification accuracies for pairs of emotions were for Joy–Neutral (0.91), Surprise–Neutral (0.80), Surprise–Fear (0.91), and Neutral–Fear (0.91). Full article
(This article belongs to the Special Issue New Advances in Applied Machine Learning)

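The decisive evaluation detail above is the subject-independent split: no speaker appears in both the training and test data. A minimal scikit-learn sketch with a cubic-kernel SVM follows; the feature matrix and speaker IDs are synthetic placeholders.

```python
# Hedged sketch: speaker-disjoint train/test split plus a cubic SVM.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))            # toy acoustic feature vectors
y = rng.integers(0, 2, size=200)          # one pair of emotions
speakers = rng.integers(0, 24, size=200)  # toy speaker IDs

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train, test = next(splitter.split(X, y, groups=speakers))
assert not set(speakers[train]) & set(speakers[test])  # disjoint speakers

clf = SVC(kernel="poly", degree=3)        # the "cubic SVM" of the abstract
clf.fit(X[train], y[train])
print(clf.score(X[test], y[test]))
```
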
16 pages, 1569 KiB  
Article
Virtual Reality-Assisted, Single-Session Exposure for Public Speaking Anxiety: Improved Self-Reports and Heart Rate but No Significant Change in Heart Rate Variability
by Tonia-Flery Artemi, Thekla Konstantinou, Stephany Naziri and Georgia Panayiotou
Virtual Worlds 2025, 4(2), 27; https://doi.org/10.3390/virtualworlds4020027 - 19 Jun 2025
Viewed by 640
Abstract
Introduction: This study examines the combined use of objective physiological measures (heart rate [HR], heart rate variability [HRV]) and subjective self-reports to gain a comprehensive understanding of anxiety reduction mechanisms—specifically, habituation—in the context of Virtual Reality Exposure (VRE) for public speaking anxiety (PSA). The present study evaluated whether a single-session, personalized VRE intervention could effectively reduce PSA. Methods: A total of 39 university students (mean age = 20.97, SD = 3.05) with clinically significant PSA were randomly assigned to a VRE group or a control group. Participants completed a 2 min speech task before and after the intervention and reported subjective distress (Subjective Units of Distress, SUDs), public speaking confidence (Personal Report of Confidence as a Speaker, PRCS), and willingness to speak in public. Heart rate (HR) and heart rate variability (HRV; RMSSD) were recorded at baseline and during speech tasks. The VRE protocol used personalized, hierarchical exposure to virtual audiences, with repeated trials until a criterion reduction in SUDs was achieved. Non-parametric analyses assessed group and time effects. Results: VRE participants showed significant reductions in subjective distress (p < 0.001) and HR (p < 0.001), with HR returning to baseline post-intervention. No such reductions were observed in the control group. Willingness to speak improved significantly only in the VRE group (p = 0.001). HRV did not differ significantly across time or groups. Conclusions: A single, personalized VRE session can produce measurable reductions in PSA, particularly in subjective distress and autonomic arousal, supporting habituation as a primary mechanism of change, even after one session. The lack of HRV change suggests that emotion regulation may require more prolonged interventions. These findings support VRE’s potential as an efficient and scalable treatment option for PSA. Full article

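For reference, RMSSD, the HRV index used above, is the root mean square of successive differences between inter-beat (RR) intervals. A small NumPy sketch with made-up RR values:

```python
# RMSSD from a toy series of RR intervals in milliseconds.
import numpy as np

rr_ms = np.array([812, 798, 845, 830, 790, 805, 822])
rmssd = np.sqrt(np.mean(np.diff(rr_ms) ** 2))
print(f"RMSSD = {rmssd:.1f} ms")
```
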
20 pages, 1481 KiB  
Article
Analysis and Research on Spectrogram-Based Emotional Speech Signal Augmentation Algorithm
by Huawei Tao, Sixian Li, Xuemei Wang, Binkun Liu and Shuailong Zheng
Entropy 2025, 27(6), 640; https://doi.org/10.3390/e27060640 - 15 Jun 2025
Viewed by 387
Abstract
Data augmentation techniques are widely applied in speech emotion recognition to increase the diversity of data and enhance the performance of models. However, existing research has not deeply explored the impact of these data augmentation techniques on emotional data. Inappropriate augmentation algorithms may distort emotional labels, thereby reducing the performance of models. To address this issue, in this paper we systematically evaluate the influence of common data augmentation algorithms on emotion recognition from three dimensions: (1) we design subjective auditory experiments to intuitively demonstrate the impact of augmentation algorithms on the emotional expression of speech; (2) we jointly extract multi-dimensional features from spectrograms based on the Librosa library and analyze the impact of data augmentation algorithms on the spectral features of speech signals through heatmap visualization; and (3) we objectively evaluate the recognition performance of the model by means of indicators such as cross-entropy loss and introduce statistical significance analysis to verify the effectiveness of the augmentation algorithms. The experimental results show that “time stretching” may distort speech features, affect the attribution of emotional labels, and significantly reduce the model’s accuracy. In contrast, “reverberation” (RIR) and “resampling” within a limited range have the least impact on emotional data, enhancing the diversity of samples. Moreover, their combination can increase accuracy by up to 7.1%, providing a basis for optimizing data augmentation strategies. Full article

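The three augmentations compared above can be sketched with common tools: time stretching and resampling via librosa, and reverberation by convolving the signal with a room impulse response. The file names and parameter values are placeholders.

```python
# Hedged sketch of the three augmentations; file names are hypothetical.
import numpy as np
import librosa
from scipy.signal import fftconvolve

y, sr = librosa.load("utterance.wav", sr=16000)

stretched = librosa.effects.time_stretch(y, rate=1.1)    # found harmful above
resampled = librosa.resample(y, orig_sr=sr, target_sr=int(sr * 1.05))

rir, _ = librosa.load("room_ir.wav", sr=sr)              # room impulse response
reverbed = fftconvolve(y, rir)[: len(y)]
reverbed /= np.max(np.abs(reverbed)) + 1e-8              # renormalize
```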