Search Results (107)

Search Parameters: Keywords = video-based emotion recognition

21 pages, 1709 KiB  
Article
Decoding Humor-Induced Amusement via Facial Expression Analysis: Toward Emotion-Aware Applications
by Gabrielle Toupin, Arthur Dehgan, Marie Buffo, Clément Feyt, Golnoush Alamian, Karim Jerbi and Anne-Lise Saive
Appl. Sci. 2025, 15(13), 7499; https://doi.org/10.3390/app15137499 - 3 Jul 2025
Viewed by 191
Abstract
Humor is widely recognized for its positive effects on well-being, including stress reduction, mood enhancement, and cognitive benefits. Yet, the lack of reliable tools to objectively quantify amusement—particularly its temporal dynamics—has limited progress in this area. Existing measures often rely on self-report or coarse summary ratings, providing little insight into how amusement unfolds over time. To address this gap, we developed a Random Forest model to predict the intensity of amusement evoked by humorous video clips, based on participants’ facial expressions—particularly the co-activation of Facial Action Units 6 and 12 (“% Smile”)—and video features such as motion, saliency, and topic. Our results show that exposure to humorous content significantly increases “% Smile”, with amusement peaking toward the end of videos. Importantly, we observed emotional carry-over effects, suggesting that consecutive humorous stimuli can sustain or amplify positive emotional responses. Even when trained solely on humorous content, the model reliably predicted amusement intensity, underscoring the robustness of our approach. Overall, this study provides a novel, objective method to track amusement on a fine temporal scale, advancing the measurement of nonverbal emotional expression. These findings may inform the design of emotion-aware applications and humor-based therapeutic interventions to promote well-being and emotional health. Full article
(This article belongs to the Special Issue Emerging Research in Behavioral Neuroscience and in Rehabilitation)
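As a loose illustration of the modeling step this abstract describes, the sketch below fits a Random Forest regressor to a "% Smile" feature plus simple video descriptors to predict a continuous amusement rating; the feature names, synthetic data, and hyperparameters are assumptions, not the authors' pipeline.

```python
# Minimal sketch (not the authors' code): predict a continuous amusement rating
# from a "% Smile" feature (AU6 + AU12 co-activation) plus simple video
# descriptors with a Random Forest, as the abstract describes. The feature
# names and the synthetic data are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_clips = 200
X = np.column_stack([
    rng.uniform(0, 100, n_clips),   # % Smile: share of frames with AU6 & AU12 active
    rng.uniform(0, 1, n_clips),     # mean motion magnitude of the clip
    rng.uniform(0, 1, n_clips),     # mean visual saliency
    rng.integers(0, 5, n_clips),    # topic category (encoded as an integer)
])
# Toy target: amusement intensity loosely driven by % Smile plus noise.
y = 0.04 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.3, n_clips)

model = RandomForestRegressor(n_estimators=300, random_state=0)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print("cross-validated R^2:", scores.mean().round(3))
```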

22 pages, 932 KiB  
Review
Advances in Video Emotion Recognition: Challenges and Trends
by Yun Yi, Yunkang Zhou, Tinghua Wang and Jin Zhou
Sensors 2025, 25(12), 3615; https://doi.org/10.3390/s25123615 - 9 Jun 2025
Viewed by 905
Abstract
Video emotion recognition (VER), situated at the convergence of affective computing and computer vision, aims to predict the primary emotion evoked in most viewers through video content, with extensive applications in video recommendation, human–computer interaction, and intelligent education. This paper commences with an analysis of the psychological models that constitute the foundation of VER theory. The paper further elaborates on datasets and evaluation metrics commonly utilized in VER. Then, the paper reviews VER algorithms according to their categories, and compares and analyzes the experimental results of classic methods on four datasets. Based on a comprehensive analysis and investigations, the paper identifies the prevailing challenges currently faced in the VER field, including gaps between emotional representations and labels, large-scale and high-quality VER datasets, and the efficient integration of multiple modalities. Furthermore, this study proposes potential research directions to address these challenges, e.g., advanced neural network architectures, efficient multimodal fusion strategies, high-quality emotional representation, and robust active learning strategies. Full article
(This article belongs to the Section Sensing and Imaging)

17 pages, 1941 KiB  
Article
MMER-LMF: Multi-Modal Emotion Recognition in Lightweight Modality Fusion
by Eun-Hee Kim, Myung-Jin Lim and Ju-Hyun Shin
Electronics 2025, 14(11), 2139; https://doi.org/10.3390/electronics14112139 - 24 May 2025
Viewed by 483
Abstract
Recently, multimodal approaches that combine various modalities have been attracting attention to recognizing emotions more accurately. Although multimodal fusion delivers strong performance, it is computationally intensive and difficult to handle in real time. In addition, there is a fundamental lack of large-scale emotional datasets for learning. In particular, Korean emotional datasets have fewer resources available than English-speaking datasets, thereby limiting the generalization capability of emotion recognition models. In this study, we propose a more lightweight modality fusion method, MMER-LMF, to overcome the lack of Korean emotional datasets and improve emotional recognition performance while reducing model training complexity. To this end, we suggest three algorithms that fuse emotion scores based on the reliability of each model, including text emotion scores extracted using a pre-trained large-scale language model and video emotion scores extracted based on a 3D CNN model. Each algorithm showed similar classification performance except for slight differences in disgust emotion performance with confidence-based weight adjustment, correlation coefficient utilization, and the Dempster–Shafer Theory-based combination method. The accuracy was 80% and the recall was 79%, which is higher than 58% using text modality and 72% using video modality. This is a superior result in terms of learning complexity and performance compared to previous studies using Korean datasets. Full article
(This article belongs to the Special Issue Modeling of Multimodal Speech Recognition and Language Processing)
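The lightweight late-fusion idea described here, combining per-class emotion scores from a text model and a video model according to each model's reliability, could look roughly like the confidence-weighted sketch below; the class list, scores, and weighting rule are simplified stand-ins for the paper's confidence-based variant, not its exact algorithms.

```python
# Illustrative sketch of confidence-weighted late fusion of text and video
# emotion scores (a simplified stand-in for one of the three fusion strategies
# the abstract mentions). The class list, scores, and weighting rule are
# hypothetical.
import numpy as np

EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]

def fuse(text_probs: np.ndarray, video_probs: np.ndarray) -> np.ndarray:
    """Weight each modality by its confidence (max class probability)."""
    w_text = text_probs.max()
    w_video = video_probs.max()
    fused = w_text * text_probs + w_video * video_probs
    return fused / fused.sum()

text_probs = np.array([0.05, 0.10, 0.05, 0.55, 0.10, 0.10, 0.05])
video_probs = np.array([0.10, 0.05, 0.05, 0.40, 0.25, 0.10, 0.05])
fused = fuse(text_probs, video_probs)
print("fused prediction:", EMOTIONS[int(fused.argmax())])
```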

19 pages, 1061 KiB  
Article
The Co-Creation of a Psychosocial Support Website for Advanced Cancer Patients Obtaining a Long-Term Response to Immunotherapy or Targeted Therapy
by Laura C. Zwanenburg, Marije L. van der Lee, José J. Koldenhof, Janneke van der Stap, Karijn P. M. Suijkerbuijk and Melanie P. J. Schellekens
Curr. Oncol. 2025, 32(5), 284; https://doi.org/10.3390/curroncol32050284 - 19 May 2025
Viewed by 496
Abstract
Due to new treatment options, the number of patients living longer with advanced cancer is rapidly growing. While this is promising, many long-term responders (LTRs) face difficulties adapting to life with cancer due to persistent uncertainty, feeling misunderstood, and insufficient tools to navigate their “new normal”. Using the Person-Based Approach, this study developed and evaluated a website in co-creation with LTRs, healthcare professionals, and service providers, offering evidence-based information and tools for LTRs. We identified the key issues (i.e., living with uncertainty, relationships with close others, mourning losses, and adapting to life with cancer) and established the website’s main goals: acknowledging and normalizing emotions, difficulties, and challenges LTRs face and providing tailored information and practical tools. The prototype was improved through repeated feedback from a user panel (n = 9). In the evaluation phase (n = 43), 68% of participants rated the website’s usability as good or excellent. Interview data indicated that participants experienced recognition through portrait videos and quotes, valued the psycho-education via written text and (animated) videos, and made use of the practical tools (e.g. conversation aid), confirming that the main goals were achieved. Approximately 90% of participants indicated they would recommend the website to other LTRs. The Dutch website—Doorlevenmetkanker (i.e., continuing life with cancer) was officially launched in March 2025 in the Netherlands. Full article
(This article belongs to the Section Psychosocial Oncology)

31 pages, 13317 KiB  
Article
3D Micro-Expression Recognition Based on Adaptive Dynamic Vision
by Weiyi Kong, Zhisheng You and Xuebin Lv
Sensors 2025, 25(10), 3175; https://doi.org/10.3390/s25103175 - 18 May 2025
Viewed by 646
Abstract
In the research on intelligent perception, dynamic emotion recognition has been the focus in recent years. Small samples and unbalanced data are the main reasons for the low recognition accuracy of current technologies. Inspired by circular convolution networks, this paper innovatively proposes an adaptive dynamic micro-expression recognition algorithm based on self-supervised learning, namely MADV-Net. Firstly, a basic model is pre-trained with accurate tag data, and then an efficient facial motion encoder is used to embed facial coding unit tags. Finally, a cascaded pyramid structure is constructed by the multi-level adaptive dynamic encoder, and the multi-level head perceptron is used as the input into the classification loss function to calculate facial micro-motion features in the dynamic video stream. In this study, a large number of experiments were carried out on the open-source datasets SMIC, CASME-II, CAS(ME)2, and SAMM. Compared with the 13 mainstream SOTA methods, the average recognition accuracy of MADV-Net is 72.87%, 89.94%, 83.32% and 89.53%, respectively. The stable generalization ability of this method is proven, providing a new research paradigm for automatic emotion recognition. Full article
(This article belongs to the Section Intelligent Sensors)

15 pages, 2021 KiB  
Article
Toward Annotation, Visualization, and Reproducible Archiving of Human–Human Dialog Video Recording Applications
by Verena Schreyer, Marco Xaver Bornschlegl and Matthias Hemmje
Information 2025, 16(5), 349; https://doi.org/10.3390/info16050349 - 26 Apr 2025
Viewed by 354
Abstract
The COVID-19 pandemic increased the number of video conferences, for example, through online teaching and home office meetings. Even in the medical environment, consultation sessions are now increasingly conducted in the form of video conferencing. This includes sessions between psychotherapists and one or more call participants (individual/group calls). To subsequently document and analyze patient conversations, as well as any other human–human dialog, it is possible to record these video conferences. This allows experts to concentrate better on the conversation during the dialog and to perform analysis afterward. Artificial intelligence (AI) and its machine learning approach, which has already been used extensively for innovations, can provide support for subsequent analyses. Among other things, emotion recognition algorithms can be used to determine dialog participants’ emotions and record them automatically. This can alert experts to any noticeable sections of the conversation during subsequent analysis, thus simplifying the analysis process. As a result, experts can identify the cause of such sections based on emotion sequence data and exchange ideas with other experts within the context of an analysis tool. Full article
(This article belongs to the Special Issue Advances in Human-Centered Artificial Intelligence)
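One way to picture the analysis step described here, using emotion sequence data to point experts at noticeable stretches of a recorded dialog, is a simple threshold pass over per-segment predictions; the data layout, labels, and threshold below are assumptions, not the authors' tool.

```python
# Hypothetical sketch: flag time segments of a recorded dialog whose predicted
# negative-emotion probability exceeds a threshold, so an expert can jump to
# them during later analysis. The data layout and threshold are assumptions.
from dataclasses import dataclass

@dataclass
class Segment:
    start_s: float          # segment start time in seconds
    end_s: float            # segment end time in seconds
    emotion: str            # dominant predicted emotion
    negative_prob: float    # summed probability of negative emotions

def noticeable(segments: list[Segment], threshold: float = 0.6) -> list[Segment]:
    return [s for s in segments if s.negative_prob >= threshold]

timeline = [
    Segment(0.0, 10.0, "neutral", 0.12),
    Segment(10.0, 20.0, "sadness", 0.71),
    Segment(20.0, 30.0, "neutral", 0.22),
]
for s in noticeable(timeline):
    print(f"review {s.start_s:.0f}-{s.end_s:.0f}s ({s.emotion})")
```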

10 pages, 1379 KiB  
Proceeding Paper
Recognizing Human Emotions Through Body Posture Dynamics Using Deep Neural Networks
by Arunnehru Jawaharlalnehru, Thalapathiraj Sambandham and Dhanasekar Ravikumar
Eng. Proc. 2025, 87(1), 49; https://doi.org/10.3390/engproc2025087049 - 16 Apr 2025
Viewed by 771
Abstract
Body posture dynamics have garnered significant attention in recent years due to their critical role in understanding the emotional states conveyed through human movements during social interactions. Emotions are typically expressed through facial expressions, voice, gait, posture, and overall body dynamics. Among these, body posture provides subtle yet essential cues about emotional states. However, predicting an individual’s gait and posture dynamics poses challenges, given the complexity of human body movement, which involves numerous degrees of freedom compared to facial expressions. Moreover, unlike static facial expressions, body dynamics are inherently fluid and continuously evolving. This paper presents an effective method for recognizing 17 micro-emotions by analyzing kinematic features from the GEMEP dataset using video-based motion capture. We specifically focus on upper body posture dynamics (skeleton points and angle), capturing movement patterns and their dynamic range over time. Our approach addresses the complexity of recognizing emotions from posture and gait by focusing on key elements of kinematic gesture analysis. The experimental results demonstrate the effectiveness of the proposed model, achieving a high accuracy rate of 91.48% for angle metric + DNN and 93.89% for distance + DNN on the GEMEP dataset using a deep neural network (DNN). These findings highlight the potential for our model to advance posture-based emotion recognition, particularly in applications where human body dynamics distance and angle are key indicators of emotional states. Full article
(This article belongs to the Proceedings of The 5th International Electronic Conference on Applied Sciences)
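The distance and angle features this approach feeds to a DNN can be illustrated with a small helper that turns 2D skeleton keypoints into joint angles and pairwise distances; the keypoint set and values below are hypothetical, and the GEMEP-specific processing is omitted.

```python
# Rough illustration (not the authors' code) of kinematic posture features:
# joint angles and pairwise distances computed from 2D skeleton keypoints,
# the kind of "distance" and "angle" inputs the abstract feeds to a DNN.
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by points a-b-c."""
    v1, v2 = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def pairwise_distances(points):
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Hypothetical upper-body keypoints: shoulder, elbow, wrist, head, torso.
keypoints = {"shoulder": (0.0, 1.0), "elbow": (0.5, 0.6), "wrist": (0.9, 0.8),
             "head": (0.0, 1.5), "torso": (0.0, 0.0)}
elbow_angle = joint_angle(keypoints["shoulder"], keypoints["elbow"], keypoints["wrist"])
dists = pairwise_distances(list(keypoints.values()))
features = np.concatenate([[elbow_angle], dists[np.triu_indices(len(keypoints), k=1)]])
print(features.shape)  # one frame's feature vector for the DNN
```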

22 pages, 3427 KiB  
Article
A Multimodal Artificial Intelligence Model for Depression Severity Detection Based on Audio and Video Signals
by Liyuan Zhang, Shuai Zhang, Xv Zhang and Yafeng Zhao
Electronics 2025, 14(7), 1464; https://doi.org/10.3390/electronics14071464 - 4 Apr 2025
Viewed by 1304
Abstract
In recent years, artificial intelligence (AI) has increasingly utilized speech and video signals for emotion recognition, facial recognition, and depression detection, playing a crucial role in mental health assessment. However, the AI-driven research on detecting depression severity remains limited, and the existing models are often too large for lightweight deployment, restricting their real-time monitoring capabilities, especially in resource-constrained environments. To address these challenges, this study proposes a lightweight and accurate multimodal method for detecting depression severity, aiming to provide effective support for smart healthcare systems. Specifically, we design a multimodal detection network based on speech and video signals, enhancing the recognition of depression severity by optimizing the cross-modal fusion strategy. The model leverages Long Short-Term Memory (LSTM) networks to capture long-term dependencies in speech and visual sequences, effectively extracting dynamic features associated with depression. Considering the behavioral differences of respondents when interacting with human versus robotic interviewers, we train two separate sub-models and fuse their outputs using a Mixture of Experts (MOE) framework capable of modeling uncertainty, thereby suppressing the influence of low-confidence experts. In terms of the loss function, the traditional Mean Squared Error (MSE) is replaced with Negative Log-Likelihood (NLL) to better model prediction uncertainty and enhance robustness. The experimental results show that the improved AI model achieves an accuracy of 83.86% in depression severity recognition. The model’s floating-point operations per second (FLOPs) reached 0.468 GFLOPs, with a parameter size of only 0.52 MB, demonstrating its compact size and strong performance. These findings underscore the importance of emotion and facial recognition in AI applications for mental health, offering a promising solution for real-time depression monitoring in resource-limited environments. Full article
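Two ingredients named in this abstract, a Gaussian negative log-likelihood objective and an uncertainty-aware combination that down-weights low-confidence experts, can be sketched as below; the inverse-variance fusion rule is one plausible reading, not the paper's exact MOE.

```python
# Sketch only: a Gaussian NLL objective and an inverse-variance fusion of two
# "experts" (human-interviewer and robot-interviewer sub-models), each of which
# predicts a depression-severity mean and a variance. The fusion rule is an
# assumed, simplified stand-in for the paper's uncertainty-aware MOE.
import math

def gaussian_nll(y: float, mu: float, var: float) -> float:
    """Negative log-likelihood of y under N(mu, var)."""
    return 0.5 * (math.log(2 * math.pi * var) + (y - mu) ** 2 / var)

def fuse_experts(mu1: float, var1: float, mu2: float, var2: float):
    """Inverse-variance weighting: less certain experts contribute less."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    mu = (w1 * mu1 + w2 * mu2) / (w1 + w2)
    var = 1.0 / (w1 + w2)
    return mu, var

mu, var = fuse_experts(mu1=12.0, var1=4.0, mu2=15.0, var2=1.0)
print(f"fused severity estimate: {mu:.2f} (var {var:.2f})")
print(f"NLL of ground truth 14: {gaussian_nll(14.0, mu, var):.3f}")
```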

20 pages, 1411 KiB  
Article
CBR-Net: A Multisensory Emotional Electroencephalography (EEG)-Based Personal Identification Model with Olfactory-Enhanced Video Stimulation
by Rui Ouyang, Minchao Wu, Zhao Lv and Xiaopei Wu
Bioengineering 2025, 12(3), 310; https://doi.org/10.3390/bioengineering12030310 - 18 Mar 2025
Viewed by 592
Abstract
Electroencephalography (EEG)-based personal identification has gained significant attention, but fluctuations in emotional states often affect model accuracy. Previous studies suggest that multisensory stimuli, such as video and olfactory cues, can enhance emotional responses and improve EEG-based identification accuracy. This study proposes a novel deep learning-based model, CNN-BiLSTM-Residual Network (CBR-Net), for EEG-based identification and establishes a multisensory emotional EEG dataset with both video-only and olfactory-enhanced video stimulation. The model includes a convolutional neural network (CNN) for spatial feature extraction, Bi-LSTM for temporal modeling, residual connections, and a fully connected classification module. Experimental results show that olfactory-enhanced video stimulation significantly improves the emotional intensity of EEG signals, leading to better recognition accuracy. The CBR-Net model outperforms video-only stimulation, achieving the highest accuracy for negative emotions (96.59%), followed by neutral (94.25%) and positive emotions (95.42%). Ablation studies reveal that the Bi-LSTM module is crucial for neutral emotions, while CNN is more effective for positive emotions. Compared to traditional machine learning and existing deep learning models, CBR-Net demonstrates superior performance across all emotional states. In conclusion, CBR-Net enhances identity recognition accuracy and validates the advantages of multisensory stimuli in EEG signals. Full article
(This article belongs to the Section Biosignal Processing)
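A compact PyTorch skeleton in the spirit of the described architecture (CNN for spatial features, BiLSTM for temporal modeling, a residual connection, and a fully connected classifier) is sketched below; the layer sizes, residual placement, and input shape are guesses, not the published CBR-Net.

```python
# Loose PyTorch skeleton of a CNN + BiLSTM + residual classifier in the spirit
# of the CBR-Net description. Channel counts, the residual placement, and the
# input shape (batch, eeg_channels, time) are assumptions, not the paper's model.
import torch
import torch.nn as nn

class CBRSketch(nn.Module):
    def __init__(self, eeg_channels=62, n_classes=20, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(                       # spatial feature extraction
            nn.Conv1d(eeg_channels, hidden, kernel_size=7, padding=3),
            nn.BatchNorm1d(hidden), nn.ReLU(),
        )
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, hidden)       # match shapes for the residual
        self.head = nn.Linear(hidden, n_classes)        # identity classification

    def forward(self, x):                               # x: (batch, eeg_channels, time)
        feats = self.cnn(x).transpose(1, 2)             # (batch, time, hidden)
        out, _ = self.lstm(feats)                       # (batch, time, 2*hidden)
        out = self.proj(out) + feats                    # residual connection
        return self.head(out.mean(dim=1))               # pool over time, classify

logits = CBRSketch()(torch.randn(4, 62, 250))
print(logits.shape)  # torch.Size([4, 20])
```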

15 pages, 974 KiB  
Article
Overcoming Challenges in Video-Based Health Monitoring: Real-World Implementation, Ethics, and Data Considerations
by Simão Ferreira, Catarina Marinheiro, Catarina Mateus, Pedro Pereira Rodrigues, Matilde A. Rodrigues and Nuno Rocha
Sensors 2025, 25(5), 1357; https://doi.org/10.3390/s25051357 - 22 Feb 2025
Viewed by 1204
Abstract
In the context of evolving healthcare technologies, this study investigates the application of AI and machine learning in video-based health monitoring systems, focusing on the challenges and potential of implementing such systems in real-world scenarios, specifically for knowledge workers. The research underscores the criticality of addressing technological, ethical, and practical hurdles in deploying these systems outside controlled laboratory environments. Methodologically, the study spanned three months and employed advanced facial recognition technology embedded in participants’ computing devices to collect physiological metrics such as heart rate, blinking frequency, and emotional states, thereby contributing to a stress detection dataset. This approach ensured data privacy and aligns with ethical standards. The results reveal significant challenges in data collection and processing, including biases in video datasets, the need for high-resolution videos, and the complexities of maintaining data quality and consistency, with 42% (after adjustments) of data lost. In conclusion, this research emphasizes the necessity for rigorous, ethical, and technologically adapted methodologies to fully realize the benefits of these systems in diverse healthcare contexts. Full article

17 pages, 3422 KiB  
Article
TheraSense: Deep Learning for Facial Emotion Analysis in Mental Health Teleconsultation
by Hayette Hadjar, Binh Vu and Matthias Hemmje
Electronics 2025, 14(3), 422; https://doi.org/10.3390/electronics14030422 - 22 Jan 2025
Cited by 5 | Viewed by 3275
Abstract
Background: This paper presents TheraSense, a system developed within the Supporting Mental Health in Young People: Integrated Methodology for cLinical dEcisions and evidence (Smile) and Sensor Enabled Affective Computing for Enhancing Medical Care (SenseCare) projects. TheraSense is designed to enhance teleconsultation services by leveraging deep learning for real-time emotion recognition through facial expressions. It integrates with the Knowledge Management-Ecosystem Portal (SenseCare KM-EP) platform to provide mental health practitioners with valuable emotional insights during remote consultations. Method: We describe the conceptual design of TheraSense, including its use case contexts, architectural structure, and user interface layout. The system’s interoperability is discussed in detail, highlighting its seamless integration within the teleconsultation workflow. The evaluation methods include both quantitative assessments of the video-based emotion recognition system’s performance and qualitative feedback through heuristic evaluation and survey analysis. Results: The performance evaluation shows that TheraSense effectively recognizes emotions in video streams, with positive user feedback on its usability and integration. The system’s real-time emotion detection capabilities provide valuable support for mental health practitioners during remote sessions. Conclusions: TheraSense demonstrates its potential as an innovative tool for enhancing teleconsultation services. By providing real-time emotional insights, it supports better-informed decision-making in mental health care, making it an effective addition to remote telehealth platforms. Full article

24 pages, 3261 KiB  
Article
A Video-Based Cognitive Emotion Recognition Method Using an Active Learning Algorithm Based on Complexity and Uncertainty
by Hongduo Wu, Dong Zhou, Ziyue Guo, Zicheng Song, Yu Li, Xingzheng Wei and Qidi Zhou
Appl. Sci. 2025, 15(1), 462; https://doi.org/10.3390/app15010462 - 6 Jan 2025
Viewed by 1564
Abstract
The cognitive emotions of individuals during tasks largely determine the success or failure of tasks in various fields such as the military, medical, industrial fields, etc. Facial video data can carry more emotional information than static images because emotional expression is a temporal process. Video-based Facial Expression Recognition (FER) has received increasing attention from the relevant scholars in recent years. However, due to the high cost of marking and training video samples, feature extraction is inefficient and ineffective, which leads to a low accuracy and poor real-time performance. In this paper, a cognitive emotion recognition method based on video data is proposed, in which 49 emotion description points were initially defined, and the spatial–temporal features of cognitive emotions were extracted from the video data through a feature extraction method that combines geodesic distances and sample entropy. Then, an active learning algorithm based on complexity and uncertainty was proposed to automatically select the most valuable samples, thereby reducing the cost of sample labeling and model training. Finally, the effectiveness, superiority, and real-time performance of the proposed method were verified utilizing the MMI Facial Expression Database and some real-time-collected data. Through comparisons and testing, the proposed method showed satisfactory real-time performance and a higher accuracy, which can effectively support the development of a real-time monitoring system for cognitive emotions. Full article
(This article belongs to the Special Issue Advanced Technologies and Applications of Emotion Recognition)
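The selection idea described here, preferring unlabeled samples that are both uncertain for the current model and complex, can be sketched as a combined ranking score; the entropy-based uncertainty, the complexity proxy, and the equal weighting below are assumptions, not the paper's algorithm.

```python
# Illustrative sketch (not the paper's algorithm): rank unlabeled video samples
# by a combined score of model uncertainty (predictive entropy) and a
# per-sample complexity value, then pick the top-k for labeling. The complexity
# proxy and the 50/50 weighting are assumptions.
import numpy as np

def predictive_entropy(probs: np.ndarray) -> np.ndarray:
    """Entropy of each row of class probabilities, shape (n_samples, n_classes)."""
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_samples(probs, complexity, k=5, alpha=0.5):
    u = predictive_entropy(probs)
    u = (u - u.min()) / (u.max() - u.min() + 1e-12)       # normalize to [0, 1]
    c = (complexity - complexity.min()) / (complexity.max() - complexity.min() + 1e-12)
    score = alpha * u + (1 - alpha) * c
    return np.argsort(score)[::-1][:k]                    # most valuable first

rng = np.random.default_rng(1)
probs = rng.dirichlet(np.ones(7), size=100)        # current model's predictions
complexity = rng.uniform(size=100)                 # e.g., sample entropy of features
print("query these indices next:", select_samples(probs, complexity))
```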

21 pages, 4242 KiB  
Article
A Learning Emotion Recognition Model Based on Feature Fusion of Photoplethysmography and Video Signal
by Xiaoliang Zhu, Zili He, Chuanyong Wang, Zhicheng Dai and Liang Zhao
Appl. Sci. 2024, 14(24), 11594; https://doi.org/10.3390/app142411594 - 12 Dec 2024
Viewed by 1481
Abstract
The ability to recognize learning emotions facilitates the timely detection of students’ difficulties during the learning process, supports teachers in modifying instructional strategies, and allows for personalized student assistance. The detection of learning emotions through the capture of convenient, non-intrusive signals such as photoplethysmography (PPG) and video offers good practicality; however, it presents new challenges. Firstly, PPG-based emotion recognition is susceptible to external factors like movement and lighting conditions, leading to signal quality degradation and recognition accuracy issues. Secondly, video-based emotion recognition algorithms may witness a reduction in accuracy within spontaneous scenes due to variations, occlusions, and uneven lighting conditions, etc. Therefore, on the one hand, it is necessary to improve the performance of the two recognition methods mentioned above; on the other hand, using the complementary advantages of the two methods through multimodal fusion needs to be considered. To address these concerns, our work mainly includes the following: (i) the development of a temporal convolutional network model incorporating channel attention to overcome PPG-based emotion recognition challenges; (ii) the introduction of a network model that integrates multi-scale spatiotemporal features to address the challenges of emotion recognition in spontaneous environmental videos; (iii) an exploration of a dual-mode fusion approach, along with an improvement of the model-level fusion scheme within a parallel connection attention aggregation network. Experimental comparisons demonstrate the efficacy of the proposed methods, particularly the bimodal fusion, which substantially enhances the accuracy of learning emotion recognition, reaching 95.75%. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
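The channel-attention ingredient added to the PPG branch's temporal convolutional network can be illustrated with a generic squeeze-and-excitation style block over 1D feature maps; the shapes and reduction ratio below are assumptions, not the authors' network.

```python
# Generic squeeze-and-excitation style channel attention over 1D (temporal)
# feature maps, illustrating the "channel attention" ingredient the abstract
# adds to its temporal convolutional network. Shapes and the reduction ratio
# are assumptions; this is not the authors' implementation.
import torch
import torch.nn as nn

class ChannelAttention1d(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                     # x: (batch, channels, time)
        weights = self.fc(x.mean(dim=-1))     # squeeze over time -> (batch, channels)
        return x * weights.unsqueeze(-1)      # re-weight each channel

block = nn.Sequential(
    nn.Conv1d(1, 32, kernel_size=5, padding=2),   # PPG is a single input channel
    nn.ReLU(),
    ChannelAttention1d(32),
)
print(block(torch.randn(8, 1, 256)).shape)        # torch.Size([8, 32, 256])
```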

40 pages, 20840 KiB  
Article
Facial Biosignals Time–Series Dataset (FBioT): A Visual–Temporal Facial Expression Recognition (VT-FER) Approach
by João Marcelo Silva Souza, Caroline da Silva Morais Alves, Jés de Jesus Fiais Cerqueira, Wagner Luiz Alves de Oliveira, Orlando Mota Pires, Naiara Silva Bonfim dos Santos, Andre Brasil Vieira Wyzykowski, Oberdan Rocha Pinheiro, Daniel Gomes de Almeida Filho, Marcelo Oliveira da Silva and Josiane Dantas Viana Barbosa
Electronics 2024, 13(24), 4867; https://doi.org/10.3390/electronics13244867 - 10 Dec 2024
Viewed by 1290
Abstract
Visual biosignals can be used to analyze human behavioral activities and serve as a primary resource for Facial Expression Recognition (FER). FER computational systems face significant challenges, arising from both spatial and temporal effects. Spatial challenges include deformations or occlusions of facial geometry, while temporal challenges involve discontinuities in motion observation due to high variability in poses and dynamic conditions such as rotation and translation. To enhance the analytical precision and validation reliability of FER systems, several datasets have been proposed. However, most of these datasets focus primarily on spatial characteristics, rely on static images, or consist of short videos captured in highly controlled environments. These constraints significantly reduce the applicability of such systems in real-world scenarios. This paper proposes the Facial Biosignals Time–Series Dataset (FBioT), a novel dataset providing temporal descriptors and features extracted from common videos recorded in uncontrolled environments. To automate dataset construction, we propose Visual–Temporal Facial Expression Recognition (VT-FER), a method that stabilizes temporal effects using normalized measurements based on the principles of the Facial Action Coding System (FACS) and generates signature patterns of expression movements for correlation with real-world temporal events. To demonstrate feasibility, we applied the method to create a pilot version of the FBioT dataset. This pilot resulted in approximately 10,000 s of public videos captured under real-world facial motion conditions, from which we extracted 22 direct and virtual metrics representing facial muscle deformations. During this process, we preliminarily labeled and qualified 3046 temporal events representing two emotion classes. As a proof of concept, these emotion classes were used as input for training neural networks, with results summarized in this paper and available in an open-source online repository. Full article
(This article belongs to the Section Artificial Intelligence)
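The normalized-measurement idea behind VT-FER, turning raw landmark coordinates into scale-invariant metrics tracked over time, can be sketched as below; the specific landmarks and the inter-ocular normalization are assumptions, not the dataset's actual 22 metrics.

```python
# Hypothetical sketch of a normalized facial metric tracked over video frames:
# mouth opening divided by inter-ocular distance, so the signal is invariant to
# face scale, in the spirit of the FACS-based normalized measurements the
# abstract describes. Landmark indices are assumptions.
import numpy as np

def normalized_mouth_opening(landmarks: np.ndarray,
                             upper_lip=0, lower_lip=1,
                             left_eye=2, right_eye=3) -> float:
    """landmarks: (n_points, 2) array of (x, y) pixel coordinates for one frame."""
    mouth = np.linalg.norm(landmarks[upper_lip] - landmarks[lower_lip])
    inter_ocular = np.linalg.norm(landmarks[left_eye] - landmarks[right_eye])
    return float(mouth / (inter_ocular + 1e-9))

# Fake 3-frame sequence with 4 landmarks per frame (upper lip, lower lip, eyes).
frames = np.array([
    [[50, 80], [50, 86], [35, 40], [65, 40]],
    [[50, 80], [50, 92], [35, 40], [65, 40]],
    [[50, 80], [50, 84], [35, 40], [65, 40]],
], dtype=float)
signal = [normalized_mouth_opening(f) for f in frames]
print(np.round(signal, 3))   # one temporal descriptor for the time-series dataset
```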

24 pages, 1341 KiB  
Article
Emotion Classification from Electroencephalographic Signals Using Machine Learning
by Jesus Arturo Mendivil Sauceda, Bogart Yail Marquez and José Jaime Esqueda Elizondo
Brain Sci. 2024, 14(12), 1211; https://doi.org/10.3390/brainsci14121211 - 29 Nov 2024
Cited by 2 | Viewed by 1823
Abstract
Background: Emotions significantly influence decision-making, social interactions, and medical outcomes. Leveraging emotion recognition through Electroencephalography (EEG) signals offers potential advancements in personalized medicine, adaptive technologies, and mental health diagnostics. This study aimed to evaluate the performance of three neural network architectures—ShallowFBCSPNet, Deep4Net, and EEGNetv4—for emotion classification using the SEED-V dataset. Methods: The SEED-V dataset comprises EEG recordings from 16 individuals exposed to 15 emotion-eliciting video clips per session, targeting happiness, sadness, disgust, neutrality, and fear. EEG data were preprocessed with a bandpass filter, segmented by emotional episodes, and split into training (80%) and testing (20%) sets. Three neural networks were trained and evaluated to classify emotions from the EEG signals. Results: ShallowFBCSPNet achieved the highest accuracy at 39.13%, followed by Deep4Net (38.26%) and EEGNetv4 (25.22%). However, significant misclassification issues were observed, such as EEGNetv4 predicting all instances as “Disgust” or “Neutral” depending on the configuration. Compared to state-of-the-art methods, such as ResNet18 combined with differential entropy, which achieved 95.61% accuracy on the same dataset, the tested models demonstrated substantial limitations. Conclusions: Our results highlight the challenges of generalizing across emotional states using raw EEG signals, emphasizing the need for advanced preprocessing and feature-extraction techniques. Despite these limitations, this study provides valuable insights into the potential and constraints of neural networks for EEG-based emotion recognition, paving the way for future advancements in the field. Full article
(This article belongs to the Section Computational Neuroscience and Neuroinformatics)
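The preprocessing this abstract outlines, bandpass filtering the EEG and an 80/20 train/test split, can be sketched with SciPy and scikit-learn; the band edges, sampling rate, and synthetic shapes below are assumptions.

```python
# Sketch of the preprocessing steps the abstract mentions: bandpass-filter EEG
# segments and split them 80/20 for training and testing. Band edges (4-45 Hz),
# the 200 Hz sampling rate, and the synthetic data are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.model_selection import train_test_split

def bandpass(eeg: np.ndarray, low=4.0, high=45.0, fs=200.0, order=4) -> np.ndarray:
    """eeg: (n_trials, n_channels, n_samples); filter along the time axis."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, eeg, axis=-1)

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 62, 400))          # 150 trials, 62 channels, 2 s at 200 Hz
y = rng.integers(0, 5, size=150)             # 5 emotion classes as in SEED-V
X_filt = bandpass(X)
X_train, X_test, y_train, y_test = train_test_split(
    X_filt, y, test_size=0.2, random_state=0, stratify=y)
print(X_train.shape, X_test.shape)           # (120, 62, 400) (30, 62, 400)
```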
