Search Results (1,920)

Search Parameters: Keywords = emotions recognition

21 pages, 6601 KB  
Article
UDC-SNN: An Uncertainty-Aware Dynamic Cascading Framework with Spiking Neural Network for Balancing Performance and Energy in Multimodal Emotion Recognition
by Guihao Ran, Shengzhe Li, Zhiwen Jiang, Han Zhang, Xinyuan Long and Dakun Lai
Sensors 2026, 26(9), 2859; https://doi.org/10.3390/s26092859 - 3 May 2026
Abstract
The aim of this study is to propose an uncertainty-aware dynamic cascading framework based on a spiking neural network (UDC-SNN) for multimodal emotion recognition, particularly to address the inherent trade-off between recognition performance and energy efficiency. An asymmetric dynamic routing mechanism was proposed to enable demand-driven activation of the high-power electroencephalogram (EEG) branch, coupled with preliminary inference on a low-power electrocardiogram (ECG) branch and uncertainty quantification via Shannon entropy. Meanwhile, a parameter-free log-linear aggregation strategy was developed to transform modality-specific entropy into dynamic Bayesian weights through an exponential decay function, effectively mitigating the negative transfer effects induced by unimodal noise. The UDC-SNN was evaluated on the multimodal affective dataset DREAMER, comprising 23 subjects (170,660 segments). The averaged recognition accuracy and energy consumption across the three dimensions of valence, arousal, and dominance were 90.75% and 4.62 μJ, respectively. The obtained results suggest that the proposed framework could potentially achieve a favorable balance between high recognition accuracy and low energy consumption, thereby establishing its applicability for real-time monitoring in resource-constrained scenarios. Full article
(This article belongs to the Special Issue Advanced Sensing Techniques in Biomedical Signal Processing)
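
The cascading logic described in this abstract — run the low-power ECG branch first, quantify its uncertainty with Shannon entropy, and invoke the costly EEG branch only when that uncertainty is high, then fuse by entropy-derived weights — can be sketched in a few lines. The threshold value, the exp(-H) weighting, and the stub branch functions below are illustrative assumptions, not the published UDC-SNN implementation:

```python
# Minimal NumPy sketch of entropy-gated cascading and entropy-weighted fusion.
# Illustrative only: the threshold, the exp(-H) weighting, and the stub
# "branch" functions are assumptions, not the published UDC-SNN code.
import numpy as np

def shannon_entropy(p, eps=1e-12):
    """Shannon entropy (nats) of a probability vector."""
    p = np.clip(p, eps, 1.0)
    return float(-np.sum(p * np.log(p)))

def cascade_predict(ecg_probs, eeg_branch, threshold=0.6):
    """Run the cheap ECG branch first; invoke the costly EEG branch only when
    the ECG prediction is too uncertain, then fuse by entropy weights."""
    h_ecg = shannon_entropy(ecg_probs)
    if h_ecg < threshold:                              # confident: stop early, save energy
        return ecg_probs, {"eeg_used": False, "ecg_entropy": h_ecg}

    eeg_probs = eeg_branch()                           # demand-driven activation
    h_eeg = shannon_entropy(eeg_probs)
    w = np.array([np.exp(-h_ecg), np.exp(-h_eeg)])     # lower entropy -> higher weight
    w /= w.sum()
    fused = w[0] * ecg_probs + w[1] * eeg_probs
    return fused, {"eeg_used": True, "weights": w.tolist()}

# Toy usage with made-up softmax outputs for a 2-class valence task.
ecg = np.array([0.55, 0.45])                           # uncertain ECG prediction
fused, info = cascade_predict(ecg, eeg_branch=lambda: np.array([0.9, 0.1]))
print(fused, info)
```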

23 pages, 2606 KB  
Article
Adaptive Confidence-Gated Hybrid Ensemble Framework for Speech Emotion Recognition
by Salem Titouni, Nadhir Djeffal, Abdallah Hedir, Massinissa Belazzoug, Boualem Hammache and Idris Messaoudene
Electronics 2026, 15(9), 1931; https://doi.org/10.3390/electronics15091931 - 2 May 2026
Abstract
Speech Emotion Recognition (SER) is a key enabling technology for advanced human–computer interaction and affective computing. This paper presents an adaptive hybrid SER framework that combines a deep neural feature extraction module with a heterogeneous ensemble of machine learning classifiers, including XGBoost, Support Vector Machines (SVMs), and Random Forest. To overcome the limitations of static fusion strategies, a confidence-gated meta-classification mechanism is introduced to dynamically weight the contribution of each base classifier according to its instance-level reliability. The proposed approach is evaluated on two widely adopted benchmark datasets, EmoDB and SAVEE, achieving competitive accuracies of 98.88% and 91.92%, respectively. Experimental results demonstrate that the proposed fusion strategy significantly improves robustness against inter-speaker variability and emotional ambiguity, while maintaining low computational complexity suitable for real-time implementation. These findings highlight the effectiveness of the proposed framework as a robust and efficient solution for speech emotion recognition. While the model is evaluated on benchmark datasets, it is intended as a foundational component for future emotion-aware systems, including applications in human–computer interaction. Full article
(This article belongs to the Section Bioelectronics)
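
The confidence-gated fusion described above can be illustrated with scikit-learn: each base classifier's per-instance contribution is weighted by its own confidence, taken here as its maximum predicted class probability. The stand-in classifiers, the synthetic features, and the max-probability gating rule are assumptions for illustration, not the authors' pipeline:

```python
# Illustrative sketch of instance-level, confidence-weighted fusion of a
# heterogeneous classifier ensemble (stand-ins for XGBoost/SVM/RF; the
# max-probability gating rule is an assumption, not the authors' gate).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=40, n_classes=4,
                           n_informative=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

bases = [SVC(probability=True, random_state=0),
         RandomForestClassifier(random_state=0),
         GradientBoostingClassifier(random_state=0)]
for clf in bases:
    clf.fit(X_tr, y_tr)

def gated_predict(X):
    """Weight each base classifier per sample by its own confidence
    (max class probability), then average the weighted probabilities."""
    probas = np.stack([clf.predict_proba(X) for clf in bases])   # (n_clf, n, k)
    conf = probas.max(axis=2, keepdims=True)                     # per-instance confidence
    weights = conf / conf.sum(axis=0, keepdims=True)
    return (weights * probas).sum(axis=0).argmax(axis=1)

acc = (gated_predict(X_te) == y_te).mean()
print(f"confidence-gated ensemble accuracy: {acc:.3f}")
```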

25 pages, 837 KB  
Article
Dual-Branch Network with Dynamic Time Warping: Enhancing Micro-Expression Recognition Through Temporal Alignment
by Qiaohong Yao, Mengmeng Wang, Dayu Chen, Dan Liu and Yubin Li
Symmetry 2026, 18(5), 775; https://doi.org/10.3390/sym18050775 - 1 May 2026
Abstract
Micro-expressions, subtle and often asymmetric facial movements, play a pivotal role in nonverbal emotional communication. Addressing the core challenges of temporal misalignment, fragmented feature extraction, and slow real-time detection in micro-expression recognition (MER), we propose a novel dual-branch spatiotemporal model for dynamic sequence MER. Leveraging MediaPipe for 3D facial feature extraction and Dynamic Time Warping (DTW) for sequence alignment, our method nonlinearly maps variable-length sequences to a fixed length. A hybrid data augmentation technique enhances model robustness, while the dual-branch network simultaneously captures local spatial features and global temporal dynamics. Experimental results on the CASMEII dataset demonstrate state-of-the-art performance with 99.22% accuracy, along with a significant improvement in real-time detection speed. This approach holds substantial practical value for applications in deception detection, mental health assessment, and human–computer interaction. Full article
(This article belongs to the Section Computer)
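
A minimal sketch of the alignment step — using Dynamic Time Warping to map a variable-length facial-landmark sequence onto a fixed number of time steps — is given below. The linearly resampled template and the mean-pooling of aligned frames are assumptions for illustration, not the paper's exact procedure:

```python
# Minimal NumPy sketch of DTW-based mapping of a variable-length landmark
# sequence to a fixed length. The reference template (a linear resample of the
# sequence itself) and mean-pooling of aligned frames are assumptions.
import numpy as np

def dtw_path(a, b):
    """Classic O(len(a)*len(b)) DTW with Euclidean frame distance; returns the warping path."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    i, j, path = n, m, []                       # backtrack from (n, m) to (0, 0)
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

def warp_to_fixed_length(seq, target_len=32):
    """Align seq (T, D) to a fixed-length template and average frames mapped to each slot."""
    idx = np.linspace(0, len(seq) - 1, target_len).astype(int)
    template = seq[idx]                         # crude fixed-length reference
    path = dtw_path(seq, template)
    out = np.zeros((target_len, seq.shape[1]))
    counts = np.zeros(target_len)
    for i, j in path:
        out[j] += seq[i]
        counts[j] += 1
    return out / np.maximum(counts, 1)[:, None]

frames = np.random.rand(57, 9)                  # e.g., 57 frames of 3 facial landmarks (x, y, z)
print(warp_to_fixed_length(frames).shape)       # (32, 9)
```
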
20 pages, 19486 KB  
Article
A Hierarchical Attention Synergetic Network for Facial Expression Recognition in Service Robots
by Dengpan Zhang, Qingping Ma, Zhihao Shen, Wenwen Ma, Yonggang Yan and Song Kong
Appl. Sci. 2026, 16(9), 4417; https://doi.org/10.3390/app16094417 - 30 Apr 2026
Abstract
Facial expression recognition (FER) is crucial for endowing service robots with emotional perception capabilities. Achieving high-performance facial expression recognition hinges on effectively balancing the capture of subtle local textures with the understanding of overall facial configurations. However, coordinating local feature variations with global semantic dependencies in unconstrained environments while maintaining semantic alignment remains a challenge. To address this issue, we propose FER-SDAM, a network architecture based on hierarchical attention collaboration. Through a dual-attention hierarchical collaboration mechanism, this architecture introduces an Attention Consistency Loss (ACL) to explicitly align shallow structural awareness with deep global dependencies. It simultaneously captures structural sensitivity and cross-regional correlations, facilitating the effective fusion of local structural information with global semantics, thereby balancing accuracy, robustness, and computational efficiency. We conducted extensive experiments on AffectNet, RAF-DB, and their subsets containing occlusion and pose variations, achieving accuracy rates of 68.12%, 66.68%, and 88.87% on the AffectNet-7, AffectNet-8, and RAF-DB datasets, respectively. The experimental results demonstrate that FER-SDAM achieves a critical balance between accuracy and efficiency, delivering highly competitive recognition performance while maintaining low computational overhead, making it an ideal solution for real-time deployment in service robots. Full article
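
One plausible reading of an attention-consistency constraint between shallow and deep attention maps can be sketched in PyTorch as follows. The resizing, normalisation, and MSE penalty are assumptions; the paper's exact ACL formulation is not reproduced here:

```python
# A minimal PyTorch sketch of one plausible attention-consistency loss:
# resize a shallow-layer attention map to the deep map's resolution,
# normalise both to spatial distributions, and penalise their difference.
# This is an assumption, not the published ACL definition.
import torch
import torch.nn.functional as F

def attention_consistency_loss(shallow_attn, deep_attn):
    """shallow_attn: (B, 1, Hs, Ws), deep_attn: (B, 1, Hd, Wd) spatial attention maps."""
    shallow = F.interpolate(shallow_attn, size=deep_attn.shape[-2:],
                            mode="bilinear", align_corners=False)
    shallow = shallow.flatten(1)
    deep = deep_attn.flatten(1)
    shallow = shallow / (shallow.sum(dim=1, keepdim=True) + 1e-8)
    deep = deep / (deep.sum(dim=1, keepdim=True) + 1e-8)
    return F.mse_loss(shallow, deep)

# Toy usage: align a 28x28 shallow map with a 7x7 deep map.
loss = attention_consistency_loss(torch.rand(4, 1, 28, 28), torch.rand(4, 1, 7, 7))
print(loss.item())
```
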
35 pages, 16605 KB  
Article
Facial Emotion Recognition Through a Smart Glasses Prototype: Improving Social Interaction for Visually Impaired Users Through Enhanced Deep Learning CBAM Architectures
by Nursel Yalcin and Muthana Alisawi
Appl. Sci. 2026, 16(9), 4415; https://doi.org/10.3390/app16094415 - 30 Apr 2026
Abstract
This research focuses on creating a real-time facial emotion recognition system for smart glasses designed for visually impaired users. By integrating a Convolutional Block Attention Module (CBAM) into a lightweight classification head on top of a pre-trained deep learning model, we obtain a model capable of successfully predicting emotions from facial features. The model is complemented by a comprehensive preprocessing pipeline that includes face detection and alignment, standard normalization, and data augmentation for underrepresented classes. The model was trained on a merged benchmark dataset (FER24, RAF-DB, CK+) and evaluated across seven emotion classes: surprise, happiness, disgust, fear, sadness, neutral, and anger. Two models were compared: FaceNet–CBAM and EmoFormer–CBAM (a ViT-Base model enhanced with a 1D-CBAM attention module). EmoFormer–CBAM achieved 98% and ~72% test accuracy on the new dataset (CleanFER25_RAF_CK) and AffectNet, respectively. In addition, a small set of external real-world images is used as a pilot qualitative evaluation to assess robustness under unconstrained conditions. A detailed analytical study of both models was performed to determine the impact of their structural components on overall performance using the available data. Based on the results, the most successful model under all conditions, EmoFormer–CBAM, was selected as the prototype for the smart glasses for the visually impaired. The necessary mechanisms for future deployment and implementation of the smart glasses prototype for the target users were also studied, in accordance with the ethical approvals previously obtained from Gazi University in Türkiye. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
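
For readers unfamiliar with the acronym, a standard CBAM block (channel attention followed by spatial attention, after Woo et al., 2018) looks roughly like the PyTorch sketch below; the paper's 1D variant attached to a ViT head is not reproduced here:

```python
# Standard CBAM-style channel + spatial attention as a PyTorch sketch.
# Shows the generic module the acronym refers to, not the paper's 1D variant.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(                      # shared MLP for channel attention
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))             # channel attention from avg pool
        mx = self.mlp(x.amax(dim=(2, 3)))              # ... and from max pool
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        avg_map = x.mean(dim=1, keepdim=True)          # spatial attention inputs
        max_map = x.amax(dim=1, keepdim=True)
        attn = torch.sigmoid(self.spatial(torch.cat([avg_map, max_map], dim=1)))
        return x * attn

feat = torch.rand(2, 64, 14, 14)
print(CBAM(64)(feat).shape)                            # torch.Size([2, 64, 14, 14])
```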

22 pages, 2321 KB  
Article
A Deployment-Aware Data Processing Approach for Accuracy and Authenticity Evaluation of Artificial Emotional Intelligence in IoT Edge with Deep Learning
by Şükrü Mustafa Kaya
Appl. Sci. 2026, 16(9), 4394; https://doi.org/10.3390/app16094394 - 30 Apr 2026
Abstract
Artificial Emotional Intelligence (AEI) has gained significant attention for enabling machines to recognize and interpret human affective states through modalities such as speech. While deep learning-based speech emotion recognition (SER) models have achieved promising accuracy levels, their practical deployment in resource-constrained IoT edge environments remains insufficiently explored. In particular, there is a lack of systematic evaluation approaches that jointly consider classification performance, computational efficiency, and deployment feasibility under edge-oriented operational constraints. In this study, I address this gap by proposing a deployment-aware evaluation perspective for SER systems operating under IoT edge constraints. Rather than introducing a new model architecture, I focus on establishing a unified and reproducible evaluation framework that reflects practical deployment considerations for edge-based intelligent systems. Within this framework, three widely used deep learning architectures, convolutional neural networks (CNN), long short-term memory (LSTM), and dense neural networks, are systematically analyzed using the EMODB dataset. The experimental results demonstrate that CNN-based models achieve the most consistent classification performance, with peak validation accuracy reaching approximately 84%, while also providing a favorable balance between recognition performance and computational efficiency. To better reflect deployment-oriented evaluation, the study also considers latency-related behavior and computational characteristics relevant to edge computing environments based on benchmark-driven estimations. The findings highlight the importance of deployment-aware evaluation strategies and provide practical insights for selecting suitable model architectures in edge-oriented speech emotion recognition scenarios. This study contributes to bridging the gap between theoretical deep learning performance and practical feasibility considerations in IoT-based intelligent systems. Full article
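
The kind of deployment-aware comparison the study argues for — reporting accuracy alongside parameter count and inference latency — can be set up with a small profiling harness such as the sketch below; the stand-in models and the timing protocol are assumptions, not the paper's benchmark setup:

```python
# Sketch of a deployment-aware comparison: report parameter count and CPU
# inference latency per candidate architecture. Model definitions and the
# latency protocol are illustrative assumptions.
import time
import torch
import torch.nn as nn

def profile(model, sample, n_runs=50):
    """Return parameter count and mean single-sample CPU latency in milliseconds."""
    model.eval()
    n_params = sum(p.numel() for p in model.parameters())
    with torch.no_grad():
        model(sample)                                   # warm-up
        start = time.perf_counter()
        for _ in range(n_runs):
            model(sample)
        latency_ms = (time.perf_counter() - start) / n_runs * 1e3
    return n_params, latency_ms

# Toy stand-ins for CNN / dense SER models over 40-dim MFCC frames x 128 steps.
candidates = {
    "cnn": nn.Sequential(nn.Conv1d(40, 64, 5, padding=2), nn.ReLU(),
                         nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(64, 7)),
    "dense": nn.Sequential(nn.Flatten(), nn.Linear(40 * 128, 128), nn.ReLU(),
                           nn.Linear(128, 7)),
}
x = torch.rand(1, 40, 128)
for name, model in candidates.items():
    params, ms = profile(model, x)
    print(f"{name}: {params/1e3:.1f}k params, {ms:.2f} ms/sample")
```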

29 pages, 6711 KB  
Article
Age Differences in the Relationship Between Interoception and Emotional Processing
by Sophie Cawkwell, Kata Pauly-Takacs, Katerina Zoe Kolokotroni and Gaby Pfeifer
Behav. Sci. 2026, 16(5), 672; https://doi.org/10.3390/bs16050672 - 29 Apr 2026
Abstract
Understanding how bodily signals shape emotional cognition across adulthood is critical for explaining age-related changes in emotional learning and memory. This study investigated age-related differences in interoceptive sensitivity and emotional associative memory. Interoceptive sensitivity was used as an umbrella term to refer to sensitivity to internal bodily signals across interoceptive accuracy, attention, beliefs, and insight, while emotional associative memory was defined as the ability to learn and remember emotional face–name associations. Forty younger (18–39 years) and forty older (60–85 years) adults completed behavioural and self-report interoceptive measures alongside an emotional face–name learning, recall, and recognition paradigm. No significant age differences emerged for interoceptive accuracy, attention, or insight. However, older adults reported greater trust in, and less worry about, bodily sensations, indicating selective changes in interoceptive beliefs. Older adults also showed a robust positivity bias, learning, recalling, and recognising happy face–name pairs more accurately and faster than angry or neutral pairs, whereas younger adults showed uniform performance across emotional conditions. Interoception–emotion relationships differed by age: Young adults’ interoceptive attention was positively associated with learning neutral pairs, while older adults’ interoceptive accuracy correlated with better encoding and recall of angry pairs. These findings demonstrate that age-related differences in emotional associative memory are partly rooted in changes to interoceptive processing and extend Socioemotional Selectivity Theory by identifying interoception as a physiological contributor to the positivity bias in ageing. Full article
(This article belongs to the Section Developmental Psychology)

13 pages, 227 KB  
Article
Phased Traumatic Stress Responses Among Caregivers of Children and Adults Recently Diagnosed with Acute Leukemia: A Grounded Theory Study
by Carmine Malfitano, Stephanie M. Nanos, Luigi Grassi, Rosangela Caruso and Gary Rodin
Curr. Oncol. 2026, 33(5), 255; https://doi.org/10.3390/curroncol33050255 - 29 Apr 2026
Abstract
A diagnosis of acute leukemia (AL) represents a sudden, life-threatening event that places family caregivers (FCs) at high risk for traumatic stress. While traumatic stress symptoms have been documented among FCs later in the cancer trajectory, little is known about how these responses unfold during the immediate peri-diagnostic period, when acute stress disorder (ASD) may emerge, and early intervention could be most impactful. We conducted a qualitative study using a constructivist grounded theory approach to examine early traumatic stress responses among FCs of adults and children with newly diagnosed AL. Semi-structured interviews were conducted with 18 caregivers within the first six months of diagnosis as part of two clinical trials at major cancer centres in Toronto, Canada, and were analyzed iteratively using constant comparative methods. Caregivers described a coherent trajectory of traumatic stress responses across three phases. The anticipatory phase was characterized by prolonged uncertainty, helplessness, and mounting fear during diagnostic investigations. The acute phase, beginning at diagnosis, involved an abrupt shift toward emotional numbing, deliberate avoidance of catastrophic thoughts, and a narrowed focus on immediate tasks, often described as operating on “autopilot.” In the post-acute phase, as patients stabilized and discharge approached, caregivers reported increased emotional access, including grief, anger, and recognition of their own trauma, alongside emerging concerns about long-term caregiving and life disruption. These findings suggest that FCs of individuals with newly diagnosed AL exhibit a phased pattern of traumatic stress responses, marked by an early, adaptive dissociative coping response followed by delayed emotional processing, underscoring the importance of phase-sensitive psychosocial care in oncology. Full article
(This article belongs to the Special Issue Psychological Interventions for Cancer Survivors)
20 pages, 3850 KB  
Article
Dimensional Emotion-Guided Conditional Modulation for Context-Aware Multimodal Driver Affect Recognition
by Wei Shen, Xingang Mou, Jing Yi and Songqing Le
Appl. Sci. 2026, 16(9), 4312; https://doi.org/10.3390/app16094312 - 28 Apr 2026
Abstract
Driver emotion recognition constitutes a fundamental pillar of intelligent cockpit systems, playing a pivotal role in enhancing driving safety and optimizing human–machine interaction. Despite the integration of vehicle sensor data in recent multimodal approaches, conventional fusion paradigms frequently encounter performance degradation due to the inherent noise and weak semantic correlation between vehicle telemetry and emotional states. To address these challenges, this study introduces a Dimensional Emotion-Guided Multi-task (DEGM) framework, a novel architecture designed to explicitly formalize the asymmetric roles of visual and vehicular modalities. Rather than employing simplistic feature concatenation, the proposed method maps multivariate vehicle data into a continuous Valence–Arousal–Dominance (VAD) space to characterize latent emotional tendencies within specific driving contexts. These predicted dimensions subsequently serve as semantic priors to conditionally modulate global facial representations through a Feature-wise Linear Modulation (FiLM) mechanism, facilitating robust and interpretable cross-modal interaction. Furthermore, the framework adopts a multi-task learning strategy that jointly optimizes discrete emotion classification and continuous dimension regression, leveraging the latter as a structural regularizer to refine the latent feature space. Comprehensive evaluations on the public PPB driving emotion dataset demonstrate that the proposed DEGM achieves a competitive accuracy of 87.50% and a weighted F1-score of 0.8727. The results validate that our framework provides a lightweight and robust paradigm for context-aware affect sensing, demonstrating strong potential for practical deployment in intelligent transportation systems. Full article
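
The FiLM-style conditioning described above — predicting per-channel scale and shift from a valence–arousal–dominance estimate and applying them to the facial representation — can be sketched in PyTorch as follows; the dimensions and the surrounding encoders are assumptions, not the DEGM implementation:

```python
# Minimal PyTorch sketch of FiLM-style conditional modulation: a 3-dim VAD
# estimate (assumed to come from vehicle signals) is mapped to per-channel
# scale and shift applied to the facial feature vector. Layer sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

class FiLMFusion(nn.Module):
    def __init__(self, feat_dim=512, vad_dim=3, num_classes=5):
        super().__init__()
        self.film = nn.Linear(vad_dim, 2 * feat_dim)     # predicts [gamma, beta]
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, face_feat, vad):
        gamma, beta = self.film(vad).chunk(2, dim=-1)
        modulated = (1 + gamma) * face_feat + beta        # residual-style FiLM
        return self.classifier(modulated)

model = FiLMFusion()
logits = model(face_feat=torch.rand(8, 512), vad=torch.rand(8, 3))
print(logits.shape)                                       # torch.Size([8, 5])
```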

21 pages, 496 KB  
Article
Access Intimacy as Feeling, Practice, and Political Vision: An Inclusive Research with Visually Impaired Participants in Hong Kong
by Winnie Hiu-ting Chan and Wenyan Chen
Soc. Sci. 2026, 15(5), 282; https://doi.org/10.3390/socsci15050282 - 27 Apr 2026
Abstract
This article explores access intimacy as feeling, interactional practice, and political vision through an inclusive research project in Hong Kong, where 12 visually impaired adults and 35 university students collaboratively developed accessible board games. Drawing on Mingus’s interdependence framework and Valentine’s justice-based access, we position visually impaired participants as primary knowledge producers while critically examining vulnerability, power dynamics, and research ethics. Analysis of field observations and in-depth interviews reveals three key dimensions: (1) collaborative game design enabled visually impaired participants to experience emotional access by fostering friendship, recognition, and belonging beyond logistical accessibility; (2) negotiation around “independence” and “fairness” generated transformative empowerment for both visually impaired and sighted participants, reframing interdependence as strength; and (3) reciprocal vulnerability in sighted guiding practices disrupted ableist assumptions about autonomy, care, and risk, revealing care as mutual rather than unidirectional. We argue that access intimacy functions as a learnable relational skill, and that attending to it in research design, community planning, and accessibility policy fosters justice-based paradigms that move beyond accommodation toward genuine interdependence and solidarity. Full article
(This article belongs to the Section Community and Urban Sociology)
19 pages, 1430 KB  
Article
AI-Boosted Affective Real-Time Educational Software Adaptation
by Athanasios Nikolaidis, Athanasios Voulgaridis, Charalambos Strouthopoulos and Vassilios Chatzis
Appl. Sci. 2026, 16(9), 4117; https://doi.org/10.3390/app16094117 - 23 Apr 2026
Abstract
Nowadays, educational software across all learning levels is increasingly enhanced with Artificial Intelligence (AI), primarily through content generation or post-session learning analytics. However, most existing systems remain weakly connected to learners’ real-time affective states and rarely exploit emotional information as a direct control signal for instructional adaptation. In this work, we propose a proof-of-concept closed-loop affect-aware educational adaptation framework that integrates real-time facial emotion recognition into a dynamic learning control system. The proposed approach is built upon a dual-model ensemble architecture, combining a transformer-based model (CAGE) and a CNN-based model (DDAMFN++) trained on large-scale in-the-wild datasets. To bridge heterogeneous emotion representations, we introduce a probabilistic fusion strategy that aligns continuous valence–arousal predictions with discrete emotion classification via a Gaussian Mixture Model (GMM), enabling unified emotion inference in real time. Based on the fused emotional state, a temporal aggregation mechanism is applied to capture sustained affective trends rather than transient expressions. These aggregated signals are then mapped to instructional decisions through an emotion-driven adaptive control policy, which adjusts activity difficulty using an Average Emotion Score (AES). This establishes a fully automated closed-loop adaptation cycle, where detected learner affect directly influences the learning environment without requiring explicit user input or post-session questionnaires. The framework is integrated into an open-source educational platform (eduActiv8) to demonstrate feasibility and system-level behavior. Results from alpha-level validation show that the system can continuously monitor learner affect, generate interpretable emotional analytics, and dynamically adjust task difficulty in real time, while reducing user interaction overhead. This study contributes a modular architecture for affect-aware educational systems by combining real-time ensemble emotion recognition, probabilistic fusion of heterogeneous outputs, and closed-loop instructional adaptation. The proposed framework provides a foundation for future research in scalable, emotion-driven intelligent tutoring and adaptive learning environments. Full article
(This article belongs to the Special Issue The Age of Transformers: Emerging Trends and Applications)
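
The closed-loop adaptation step — aggregating per-frame emotion estimates into an Average Emotion Score (AES) and adjusting task difficulty when it crosses thresholds — reduces to a few lines of Python; the valence weights and thresholds below are illustrative assumptions, not the values used in the study:

```python
# Sketch of the closed-loop idea: map each frame's emotion label to a scalar,
# aggregate over a sliding window into an Average Emotion Score (AES), and
# adjust difficulty at threshold crossings. Weights and thresholds are assumed.
from collections import deque

VALENCE = {"happy": 1.0, "neutral": 0.0, "surprise": 0.2,
           "sad": -0.6, "angry": -0.8, "fear": -0.7, "disgust": -0.7}

class AffectAdapter:
    def __init__(self, window=30, lower=-0.3, upper=0.4):
        self.scores = deque(maxlen=window)      # temporal aggregation buffer
        self.lower, self.upper = lower, upper
        self.difficulty = 3                     # arbitrary 1..5 scale

    def update(self, emotion_label):
        self.scores.append(VALENCE.get(emotion_label, 0.0))
        aes = sum(self.scores) / len(self.scores)
        if aes < self.lower and self.difficulty > 1:
            self.difficulty -= 1                # learner struggling: ease off
        elif aes > self.upper and self.difficulty < 5:
            self.difficulty += 1                # learner comfortable: push harder
        return aes, self.difficulty

adapter = AffectAdapter(window=5)
for label in ["neutral", "sad", "angry", "sad", "fear"]:
    print(adapter.update(label))
```
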
24 pages, 2667 KB  
Article
Hybrid Deep Neural Network-Based Modeling of Multimodal Emotion Recognition for Novice Drivers
by Jianzhuo Li, Ye Yu, Zhao Dai and Panyu Dai
Future Internet 2026, 18(4), 221; https://doi.org/10.3390/fi18040221 - 21 Apr 2026
Abstract
Driver emotion recognition is a crucial method for reducing traffic accidents. Most existing research focuses on experienced drivers as the primary research subjects, overlooking novice drivers. However, novice drivers can easily lose control of their emotions due to the high mental load during driving, which can lead to serious traffic accidents. Therefore, to recognize the emotions of novice drivers for timely warnings, we propose an emotion recognition model based on multimodal information. The model consists of a facial feature extraction module, an eye movement feature extraction module, and a classifier. The facial feature extraction module uses ViT-B/16 to extract the facial features of novice drivers. The eye movement feature extraction module is a hybrid network combining a Bi-LSTM with a Transformer to extract eye movement features. Facial and eye movement features are fused and fed to the classifier, which outputs one of five emotion categories: surprise, anger, calm, happy, or other. The experimental results demonstrate that our model accurately recognizes the emotions of novice drivers with an accuracy of 98.72%, surpassing that of other models. Full article
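
A stripped-down version of the fusion idea — concatenating a precomputed ViT-style facial embedding with a Bi-LSTM encoding of the eye-movement sequence before classification — is sketched below; the dimensions, the single-layer Bi-LSTM, and the omission of the Transformer stage are simplifying assumptions, not the published architecture:

```python
# Minimal PyTorch sketch of late fusion of a precomputed facial embedding
# (e.g., from ViT-B/16) with a Bi-LSTM encoding of an eye-movement sequence.
# All dimensions and the simplified encoder are assumptions.
import torch
import torch.nn as nn

class DriverEmotionNet(nn.Module):
    def __init__(self, face_dim=768, eye_dim=6, hidden=64, num_classes=5):
        super().__init__()
        self.eye_encoder = nn.LSTM(eye_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Sequential(
            nn.Linear(face_dim + 2 * hidden, 128), nn.ReLU(),
            nn.Linear(128, num_classes))                 # surprise/anger/calm/happy/other

    def forward(self, face_feat, eye_seq):
        _, (h, _) = self.eye_encoder(eye_seq)            # h: (2, B, hidden)
        eye_feat = torch.cat([h[0], h[1]], dim=-1)       # concat both directions
        return self.classifier(torch.cat([face_feat, eye_feat], dim=-1))

model = DriverEmotionNet()
logits = model(face_feat=torch.rand(4, 768), eye_seq=torch.rand(4, 120, 6))
print(logits.shape)                                      # torch.Size([4, 5])
```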

18 pages, 892 KB  
Article
Emotional Recognition Under Multimodal Conflict: A Gaze-Based Response Task
by Alessandro De Santis, Giusi Antonia Toto, Martina Rossi, Laura D’Amico and Pierpaolo Limone
Psychol. Int. 2026, 8(2), 26; https://doi.org/10.3390/psycholint8020026 - 20 Apr 2026
Abstract
Emotional recognition relies on the integration of multiple affective cues. In everyday contexts, however, facial expressions, vocal prosody, and semantic content may convey incongruent emotional information, generating emotional conflict and increasing cognitive demands. The present study examined how multimodal emotional conflict affects emotion recognition during video viewing, focusing on short videos in which a single actor simultaneously conveyed incongruent emotional cues across facial, vocal, and semantic channels. Forty-seven undergraduate students completed a gaze-based response task in which, after each short video, they provided a single judgment of the overall emotion conveyed by the stimulus. The videos depicted either congruent or incongruent combinations of semantic content, facial expressions, and vocal prosody across six basic emotions and a neutral condition. Data were analyzed using repeated-measures ANOVAs and generalized linear mixed-effects models. Accuracy was consistently higher for congruent than incongruent stimuli across all domains, indicating a robust emotional interference effect. Critically, the magnitude of this effect differed by domain. Semantic content showed the largest performance reduction under incongruence, followed by facial expression and vocal prosody. Mixed-effects models confirmed these effects while accounting for participant- and item-level variability and revealed a significant Congruency × Domain interaction. In a gaze-based response task requiring a single overall emotion judgment, emotional conflict disrupted recognition in a domain-specific manner, with semantic information being particularly vulnerable to multimodal interference. Full article
(This article belongs to the Section Cognitive Psychology)
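
The core analysis — a repeated-measures ANOVA of accuracy over congruency and domain — can be reproduced in structure on synthetic data with statsmodels, as in the sketch below; the data are randomly generated and only the design mirrors the study:

```python
# Sketch of the congruency-by-domain analysis on synthetic data: a repeated-
# measures ANOVA over per-participant mean accuracy. Data are random; only the
# analysis structure follows the design described above.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rows = []
for pid in range(47):
    for domain in ["semantic", "facial", "prosody"]:
        for congruency in ["congruent", "incongruent"]:
            penalty = {"semantic": 0.20, "facial": 0.12, "prosody": 0.08}[domain]
            mean_acc = 0.85 - (penalty if congruency == "incongruent" else 0.0)
            rows.append({"participant": pid, "domain": domain,
                         "congruency": congruency,
                         "accuracy": np.clip(rng.normal(mean_acc, 0.05), 0, 1)})
df = pd.DataFrame(rows)

anova = AnovaRM(df, depvar="accuracy", subject="participant",
                within=["congruency", "domain"]).fit()
print(anova)
```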

23 pages, 601 KB  
Article
Novel Ensemble Models for Enhanced Accuracy in Time Series Classification: Application to Multimodal Emotion Detection
by Mohamed Hanafy Abdel-Kader Mahmoud, Sherine Nagy Saleh, Amin Shoukry and Yousry Elgamal
Computers 2026, 15(4), 256; https://doi.org/10.3390/computers15040256 - 20 Apr 2026
Abstract
Emotions are fundamental to the human experience and are increasingly analyzed in applications such as marketing, healthcare, and human–computer interaction. Many recent approaches to human emotion recognition rely on deep learning, which typically demands large labeled datasets and substantial computational resources and often suffers from limited interpretability. Applying classical machine-learning methods to sensor time series is more lightweight but may struggle to reach high accuracy, especially when the temporal structure is not explicitly modelled. This paper introduces three subinterval voting-based ensemble models designed for user-specific emotion classification from multimodal time-series data acquired by smartwatch inertial sensors and heart-rate measurements. Each model partitions a time window into subwindows and performs window-level voting, thereby exploiting the temporal consistency of emotional responses while remaining compatible with standard classifiers such as logistic regression and Random Forests (with or without hyperparameter tuning). The models are evaluated on a public smartwatch emotion benchmark dataset under both binary (happy vs. sad) and three-class (happy, sad, neutral) settings. The relative accuracy improvement over the corresponding baseline reported in prior work ranges from 4.68% to 26.05%, with a mean gain of 12.34%. For the three-class tasks, improvements range from 11.17% to 37.10%, with a mean gain of 21.63%. Within the evaluated experimental setting, these results show that the proposed subinterval ensembles consistently enhance performance while remaining model-agnostic and compatible with standard user-specific classification pipelines in sensor-based emotion recognition. Full article
(This article belongs to the Special Issue Wearable Computing and Activity Recognition)
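
The subinterval-voting idea — split each labelled window into subwindows, classify every subwindow, and let the majority vote decide the window label — can be sketched with scikit-learn as follows; the mean/std features and window sizes are illustrative assumptions, not the paper's feature set:

```python
# Sketch of subinterval voting with scikit-learn: per-subwindow classification
# followed by a majority vote per window. Feature extraction and sizes are
# illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def subwindow_features(window, n_sub=4):
    """window: (T, C) sensor samples -> (n_sub, 2*C) mean/std features, one row per subwindow."""
    parts = np.array_split(window, n_sub)
    return np.stack([np.concatenate([p.mean(axis=0), p.std(axis=0)]) for p in parts])

def fit_predict(train_windows, train_labels, test_windows, n_sub=4):
    X = np.vstack([subwindow_features(w, n_sub) for w in train_windows])
    y = np.repeat(train_labels, n_sub)                    # subwindows inherit the window label
    clf = RandomForestClassifier(random_state=0).fit(X, y)
    preds = []
    for w in test_windows:
        sub_pred = clf.predict(subwindow_features(w, n_sub)).astype(int)
        preds.append(np.bincount(sub_pred).argmax())      # majority vote across subwindows
    return np.array(preds)

# Toy data: 200-sample x 4-channel windows whose mean shifts with the class label.
rng = np.random.default_rng(0)
train_labels = rng.integers(0, 2, 40)
test_labels = rng.integers(0, 2, 10)
train_w = [rng.normal(loc=float(l), size=(200, 4)) for l in train_labels]
test_w = [rng.normal(loc=float(l), size=(200, 4)) for l in test_labels]
print((fit_predict(train_w, train_labels, test_w) == test_labels).mean())
```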

27 pages, 2923 KB  
Article
An Assistant System for Speaker and Sentiment Recognition Using RAM and a Hybrid AI Model
by Fatma Bozyiğit, İrfan Aygün, Oğuzhan Sağlam, Eren Özcan, Emin Borandağ and Bahadır Karasulu
Electronics 2026, 15(8), 1731; https://doi.org/10.3390/electronics15081731 - 19 Apr 2026
Abstract
In the age of remote communication and digital archiving, automated analysis of voice data has become increasingly important in various application areas. Despite significant advances in the field of Automatic Speech Recognition, integrating speaker recognition, textual sentiment analysis, and acoustic sentiment detection within a unified real-time processing pipeline remains a challenging task. Current approaches are often limited to monolithic designs or operate in batch processing modes, which restricts their scalability and real-time applicability. To address this gap, this work proposes a novel feature selection method called RAM, along with a hybrid decision-level merging approach combining a Conv1D CNN and AutoML-based models. The proposed hybrid framework enables independent model training and integrates the models' probabilistic outputs through a weighted merging strategy for performance improvement. Furthermore, a scalable microservice-based software architecture has been developed to support real-time processing, feature selection, and model deployment. This design enhances system modularity, flexibility, and integration capability in practical applications. Experimental results show that when the proposed RAM method is used in conjunction with a hybrid AI model, it achieves over 97% accuracy in speaker recognition and over 82% accuracy in emotion classification, even with short audio samples. These findings demonstrate that the proposed approach provides a robust and efficient solution for real-time speech analysis tasks. Full article
(This article belongs to the Special Issue Techniques and Applications of Multimodal Data Fusion)
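
The decision-level merging step — blending the class probabilities of two independently trained models with a weight chosen on validation data — can be sketched in NumPy as below; the grid search over a single mixing weight is an assumption standing in for the paper's weighted merging strategy:

```python
# Sketch of decision-level merging: two models emit class probabilities and a
# single mixing weight, chosen on validation data, blends them. The grid
# search and the synthetic outputs are illustrative assumptions.
import numpy as np

def merge(p_cnn, p_automl, alpha):
    """Weighted average of two probability matrices of shape (n_samples, n_classes)."""
    return alpha * p_cnn + (1 - alpha) * p_automl

def tune_alpha(p_cnn, p_automl, y_val, grid=np.linspace(0, 1, 21)):
    """Pick the mixing weight that maximises validation accuracy."""
    accs = [(merge(p_cnn, p_automl, a).argmax(1) == y_val).mean() for a in grid]
    return grid[int(np.argmax(accs))]

def noisy_probs(y, n_classes, sharpness, rng):
    """Synthetic softmax outputs centred on the true label."""
    logits = rng.normal(size=(len(y), n_classes)) + sharpness * np.eye(n_classes)[y]
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy validation outputs for a 4-class emotion task.
rng = np.random.default_rng(0)
y_val = rng.integers(0, 4, 200)
p_cnn = noisy_probs(y_val, 4, sharpness=1.5, rng=rng)
p_automl = noisy_probs(y_val, 4, sharpness=1.0, rng=rng)
alpha = tune_alpha(p_cnn, p_automl, y_val)
print(f"best alpha = {alpha:.2f}, merged accuracy = "
      f"{(merge(p_cnn, p_automl, alpha).argmax(1) == y_val).mean():.3f}")
```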
