Search Results (627)

Search Parameters:
Keywords = facial expression recognition

31 pages, 2039 KB  
Article
AI Creation of Facial Expression Database for Advanced Emotion Recognition Using Diffusion Model and Pre-Trained CNN Models
by Jia Jun Ho, Wee How Khoh, Ying Han Pang, Hui Yen Yap and Fang Chuen Lim Alvin
Appl. Sci. 2026, 16(6), 2769; https://doi.org/10.3390/app16062769 - 13 Mar 2026
Abstract
With applications in psychology, security, and human–computer interaction, facial expression recognition (FER) has become an essential tool for analyzing non-verbal communication. Current research often categorizes expressions into micro- and macro-types, yet existing datasets suffer from inconsistent class labelling, limited diversity, and insufficient scale. To address these gaps, this work proposes a novel framework combining a diffusion model with pre-trained CNNs. Leveraging original images from the established CASME II dataset, we generate synthetic facial expressions to augment training data, mitigating bias and inconsistency. The synthetic dataset is evaluated using the ResNet-50, VGG16, and Inception V3 architectures in three configurations: Inception V3 trained on the proposed AI-generated dataset and tested on CASME II; VGG16 with data augmentation, trained on CASME II and tested on the proposed AI-generated dataset; and Inception V3 with 30% of its layers frozen, trained on the proposed AI-generated dataset and tested on CASME II. The data augmentation and layer-freezing strategies significantly improved model performance, and all three configurations achieved state-of-the-art results, outperforming most existing approaches benchmarked in this study. Full article
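As a rough illustration of the layer-freezing strategy this abstract mentions, the sketch below freezes the first 30% of a pre-trained Inception V3's parameters before fine-tuning. The torchvision weights enum and the five-class head are assumptions for illustration, not the authors' setup.

```python
import torch
import torchvision.models as models

# A minimal sketch, not the authors' code: freeze the first 30% of the
# parameters of a pre-trained Inception V3 before fine-tuning, one
# plausible reading of the abstract's "30% freezing layers" strategy.
model = models.inception_v3(weights=models.Inception_V3_Weights.DEFAULT)
params = list(model.parameters())
for p in params[: int(0.3 * len(params))]:
    p.requires_grad = False                           # keep early generic filters fixed
model.fc = torch.nn.Linear(model.fc.in_features, 5)   # class count is hypothetical
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```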
39 pages, 7170 KB  
Article
Deep-Learning-Derived Facial Electromyogram Signatures of Emotion in Immersive Virtual Reality (bWell): Exploring the Impact of Emotional, Cognitive, and Physical Demands
by Zohreh H. Meybodi, Francis Thibault, Budhachandra Khundrakpam, Gino De Luca, Jing Zhang, Joshua A. Granek and Nusrat Choudhury
Sensors 2026, 26(6), 1827; https://doi.org/10.3390/s26061827 - 13 Mar 2026
Abstract
Emotional and workload-related states unfold dynamically during immersive virtual reality (VR) experiences, yet reliable physiological modeling in such environments remains challenging. We investigated whether multi-channel facial electromyography (fEMG), combined with spatio-temporal deep learning, can (i) accurately classify calibrated facial expressions across participants and (ii) transfer to spontaneous, task-elicited behavior in immersive VR. Twelve adults completed a calibration phase involving four intentional expressions (smile, frown, raised eyebrow, neutral), followed by VR scenes designed to elicit emotional, cognitive, physical, and dual task demands. After participant-level physiological normalization, a single shared Convolutional Neural Network–Temporal Convolutional Network (CNN–TCN) model was trained and evaluated using leave-one-participant-out (LOPO) validation. The model achieved strong cross-participant performance (Macro-F1 = 0.88 ± 0.13; ROC-AUC = 0.95 ± 0.06). When applied to unlabeled spontaneous VR task-elicited fEMG recordings, the trained model generated continuous expression classes. Derived static and temporal expression features showed scene-dependent modulation and False Discovery Rate (FDR)-surviving associations, primarily with perceived physical demand (NASA-TLX). The observed muscle activation patterns were physiologically plausible and aligned with Facial Action Coding System (FACS)-based interpretations of underlying muscle activity. These findings demonstrate that end-to-end spatio-temporal modeling of raw fEMG enables facial expression sensing in immersive VR using a single shared model following physiological normalization. The proposed framework bridges calibrated expression learning and spontaneous task-elicited behavior, supporting privacy-preserving, continuous and physiologically grounded monitoring in human-centered VR applications. Full article
(This article belongs to the Special Issue Emotion Recognition Based on Sensors (3rd Edition))
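The leave-one-participant-out (LOPO) protocol this abstract relies on is easy to reproduce with scikit-learn's grouped splitter. A minimal sketch follows; array shapes and the windows-per-person count are invented placeholders, and the CNN–TCN model itself is elided.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

# Sketch of LOPO validation: each fold holds out every window from one
# participant, so evaluation is always cross-participant.
X = np.random.randn(240, 8, 500)        # fEMG windows: (n, channels, samples)
y = np.random.randint(0, 4, 240)        # four calibrated expressions
groups = np.repeat(np.arange(12), 20)   # participant ID per window

for fold, (tr, te) in enumerate(LeaveOneGroupOut().split(X, y, groups)):
    # train the shared model on tr, evaluate on the held-out participant
    print(f"fold {fold}: test participant {groups[te][0]}, n_test={len(te)}")
```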
21 pages, 3006 KB  
Article
Emotion Recognition from Facial Expressions Considering Individual Differences in Emotional Intelligence
by Yubin Kim, Ayoung Cho, Hyunwoo Lee and Mincheol Whang
Biomimetics 2026, 11(3), 174; https://doi.org/10.3390/biomimetics11030174 - 2 Mar 2026
Abstract
Facial expression recognition (FER) in naturalistic settings is constrained by label ambiguity and variability in stimulus–response alignment. Adopting a data-centric perspective, this study examined whether emotional intelligence (EI)-stratified training data influence FER performance by treating EI as a qualitative factor associated with affective data consistency. Naturally elicited facial expressions were collected in a controlled emotion induction experiment with subjective arousal and valence ratings. Using response-driven labeling, neutral ratings were retained as indicators of ambiguity. Participants were grouped into High and Low EI based on the alignment between subjective evaluations and outputs from a pretrained affect estimator. Identical binary classifiers for arousal and valence recognition were trained while varying only the training data composition and evaluated across baseline, unambiguous, and ambiguous test sets using independent training repetitions with repetition-level statistical aggregation. EI-stratified training was associated with statistically detectable, context-dependent performance differences: group effects were observed primarily under baseline conditions and, to a lesser extent, under ambiguous conditions, whereas no reliable differences emerged under unambiguous conditions. Pooled discrimination differences were modest, but item-level analyses identified significant differences in classification correctness in specific task–condition combinations. Comparable patterns were observed across alternative backbone architectures. These findings indicate that FER performance in naturalistic contexts is influenced not only by model architecture but also by the statistical structure and internal coherence of the training data, supporting EI-informed data selection in ambiguity-prone scenarios. Full article
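A loose sketch of the EI-stratification step described above: participants are split into High and Low groups by how well their subjective ratings align with a pretrained estimator's outputs, and one stratum forms the training pool. The median split, column names, and correlation measure are all assumptions.

```python
import numpy as np
import pandas as pd

# Sketch under assumed names: per-participant alignment is the correlation
# between subjective valence ratings and a pretrained estimator's predictions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "participant": np.repeat(np.arange(20), 30),
    "valence_rating": rng.uniform(-1, 1, 600),
    "valence_pred": rng.uniform(-1, 1, 600),
})
align = df.groupby("participant")[["valence_rating", "valence_pred"]].apply(
    lambda g: g["valence_rating"].corr(g["valence_pred"])
)
high_ei = align[align >= align.median()].index
train_df = df[df["participant"].isin(high_ei)]   # EI-stratified training pool
```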
25 pages, 1175 KB  
Article
Facial Expression Recognition Integrating Multi-Stage Feature Sparse Constraints and Key Region Graph Learning
by Guanghui Xu, Yan Hong, Wanli Zhao, Zhongjie Mao, Duantengchuan Li and Yue Li
Information 2026, 17(3), 246; https://doi.org/10.3390/info17030246 - 2 Mar 2026
Abstract
Current facial expression recognition methods typically extract facial features indiscriminately, incorporating expression-irrelevant information that compromises recognition accuracy. To overcome this, we propose Multi-stage Feature Sparse Constraints (MFSC), a novel model that integrates a Multi-scale Attention-based Sparse Window Selection (MSAWS) mechanism with key region graph learning. Notably, MFSC operates without dependence on pre-extracted facial landmarks, enabling more flexible deployment. The MSAWS mechanism progressively filters redundant features through multi-stage sparse attention, adaptively selecting the most discriminative facial patches. The selected tokens are structured into a dynamic graph to model regional relationships via graph neural networks (GNNs). Critically, our framework further introduces a global-guided fusion module, which effectively integrates fine-grained local features from an IR50 backbone with the global topological features from the GNN through cross-attention. This integration enables complementary strengths, where local details are enhanced by global semantic context. Comprehensive experiments on the RAF-DB, FER2013, and AffectNet-7 datasets demonstrate MFSC’s superior performance, achieving state-of-the-art accuracies of 92.31%, 76.21%, and 67.35%, respectively. These results validate the effectiveness of our approach in focusing computational resources on expression-salient regions while maintaining a lightweight and efficient architecture. Full article
(This article belongs to the Section Artificial Intelligence)
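A toy version of attention-guided sparse token selection in the spirit of the MSAWS stage: keep only the top-k patch tokens ranked by an attention score. The keep ratio, tensor shapes, and scoring are assumptions, not the paper's design.

```python
import torch

# Select the most discriminative patch tokens by attention score.
def select_sparse_tokens(tokens, scores, keep_ratio=0.3):
    """tokens: (B, N, D) patch features; scores: (B, N) attention weights."""
    k = max(1, int(keep_ratio * tokens.size(1)))
    idx = scores.topk(k, dim=1).indices                       # (B, k)
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))   # (B, k, D)
    return torch.gather(tokens, 1, idx)

tokens, scores = torch.randn(2, 196, 768), torch.rand(2, 196)
print(select_sparse_tokens(tokens, scores).shape)   # torch.Size([2, 58, 768])
```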
21 pages, 1469 KB  
Article
Development of Surveillance Robots Based on Face Recognition Using High-Order Statistical Features and Evidence Theory
by Slim Ben Chaabane, Rafika Harrabi, Anas Bushnag and Hassene Seddik
J. Imaging 2026, 12(3), 107; https://doi.org/10.3390/jimaging12030107 - 28 Feb 2026
Abstract
Recent advancements in technologies such as artificial intelligence (AI), computer vision (CV), and the Internet of Things (IoT) have significantly advanced various fields, particularly surveillance systems. These innovations enable real-time facial recognition, enhancing security and safety. Moreover, mobile robots are commonly employed in surveillance systems to handle risky tasks that are beyond human capability. In this paper, we present a prototype of a cost-effective mobile surveillance robot built on the Raspberry Pi 4, designed for integration into various industrial environments. This smart robot detects intruders using IoT and face recognition technology. The proposed system is equipped with a passive infrared (PIR) sensor and a camera for capturing live-streaming video and photos, which are sent to the control room through IoT technology. Additionally, the system uses face recognition algorithms to differentiate between company staff and potential intruders. The face recognition method combines high-order statistical features and evidence theory to improve accuracy and robustness: high-order statistical features capture complex patterns in facial images, enhancing discrimination between individuals, while evidence theory integrates multiple information sources, allowing for better decision-making under uncertainty. This approach effectively addresses challenges such as variations in lighting, facial expressions, and occlusions, resulting in a more reliable and accurate face recognition system. When the system detects an unfamiliar individual, it sends alert notifications and emails with the captured picture to the control room via IoT. A web interface has also been set up to control the robot remotely over a Wi-Fi connection. The proposed face recognition method is evaluated, and a comparative analysis with existing techniques is conducted. Experimental results with 400 test images of 40 individuals demonstrate the effectiveness of combining various attribute images, with the algorithm identifying human faces with an accuracy of 98.63%. Full article
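One plausible reading of "high-order statistical features" is block-wise variance, skewness, and kurtosis over the face image; the sketch below illustrates that reading. Block size and the 64×64 input are assumptions, not the paper's settings.

```python
import numpy as np
from scipy.stats import kurtosis, skew

# Per-block high-order statistics as a simple face descriptor.
def block_stats(img, block=16):
    feats = []
    for i in range(0, img.shape[0], block):
        for j in range(0, img.shape[1], block):
            patch = img[i:i + block, j:j + block].ravel()
            feats += [patch.var(), skew(patch), kurtosis(patch)]
    return np.asarray(feats)

face = np.random.rand(64, 64)
print(block_stats(face).shape)   # (48,) = 16 blocks x 3 statistics
```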
29 pages, 1696 KB  
Article
Optimizing Lightweight Convolutional Networks via Topological Attention and Entropy-Constrained Distillation: A Spectral–Topological Approach for Robust Facial Expression Recognition
by Xiaohong Dong, Yu Gao, Mengyan Liu and Wenxiaoman Yu
Algorithms 2026, 19(3), 177; https://doi.org/10.3390/a19030177 - 26 Feb 2026
Abstract
Deep learning models typically rely on large-scale datasets with accurate annotations, yet real-world applications inevitably suffer from label noise, which severely degrades generalization, particularly for lightweight neural networks with limited capacity. Existing methods for learning with noisy labels are mainly designed for over-parameterized models and are often unsuitable for resource-constrained deployment. To address this challenge, we propose a robust framework that integrates a Micro Hybrid Attention Module (MHAM) with knowledge distillation (KD) for lightweight architectures such as MobileNetV3. MHAM employs a decoupled channel–spatial attention design to enhance discriminative feature extraction while suppressing noise-sensitive background responses. From a graph–signal perspective, MHAM can be interpreted as a spectral smoothing operator that improves optimization stability. In addition, knowledge distillation with soft teacher supervision mitigates overfitting to corrupted hard labels and reduces prediction uncertainty. Extensive experiments demonstrate the effectiveness of the proposed method. On FER2013, a real-world noisy facial expression recognition benchmark, our approach achieves 68.5% accuracy with only 0.52M parameters, while reducing optimization variance by 24%. On CIFAR-10 with 40% symmetric label noise, it improves accuracy from 54.85% to 60.10%. On CIFAR-10N with multiple types of real-world human annotation noise, the proposed method consistently achieves 63.9–71.9% accuracy under different noise protocols. These results show that the proposed framework provides an efficient and robust solution for noisy-label learning in lightweight facial expression and object classification on edge devices. Full article
(This article belongs to the Special Issue Deep Neural Networks and Optimization Algorithms (2nd Edition))
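The soft-teacher distillation this abstract describes is typically implemented as a temperature-softened KL term blended with the hard-label loss. A minimal sketch follows; the temperature T and mixing weight alpha are placeholder values, not the paper's.

```python
import torch
import torch.nn.functional as F

# Standard soft-label knowledge distillation loss.
def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                   # rescale softened gradients
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard      # soft labels damp noisy hard ones
```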
18 pages, 1956 KB  
Article
Dynamic Occlusion-Aware Facial Expression Recognition Guided by AA-ViT
by Xiangwei Mou, Xiuping Xie, Yongfu Song and Rijun Wang
Electronics 2026, 15(4), 764; https://doi.org/10.3390/electronics15040764 - 11 Feb 2026
Abstract
In complex natural scenarios, facial expression recognition often encounters partial occlusions caused by glasses, hand gestures, and hairstyles, making it difficult for models to extract effective features and thereby reducing recognition accuracy. Existing methods often employ attention mechanisms to enhance expression-related features, but they fail to adequately address the issue where high-frequency responses in occluded regions can disperse attention weights (e.g., incorrectly focus on occluded areas), making it challenging to effectively utilize local cues around the occlusions and limiting performance improvement. To address this, this paper proposes a network based on an adaptive attention mechanism (Adaptive Attention Vision Transformer, AA-ViT). First, an Adaptive Attention module (ADA) is designed to dynamically adjust attention scores in occluded regions, enhancing the effective information in features. Next, a Dual-Branch Multi-Layer Perceptron (DB-MLP) replaces the single linear layer to improve feature representation and model classification capability. Additionally, a Random Erasure (RE) strategy is introduced to enhance model robustness. Finally, to address the issue of model training instability caused by class imbalance in the training dataset, a hybrid loss function combining Focal Loss and Cross-Entropy Loss is adopted to ensure training stability. Experimental results show that AA-ViT achieves expression recognition accuracies of 90.66% and 90.01% on the RAF-DB and FERPlus datasets, respectively, representing improvements of 4.58 and 18.9 percentage points over the baseline ViT model, with only a 24.3% increase in parameter count. Compared to existing methods, the proposed approach demonstrates superior performance in occluded facial expression recognition tasks. Full article
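A minimal sketch of the hybrid Focal plus Cross-Entropy loss the abstract adopts for class imbalance; the focusing parameter gamma and the mixing weight are assumptions, not the paper's values.

```python
import torch
import torch.nn.functional as F

# Blend of Focal Loss and Cross-Entropy for imbalanced expression classes.
def hybrid_loss(logits, targets, gamma=2.0, lam=0.5):
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                      # model's probability for the true class
    focal = ((1.0 - pt) ** gamma) * ce       # down-weight easy, frequent classes
    return lam * focal.mean() + (1.0 - lam) * ce.mean()
```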
23 pages, 2302 KB  
Article
Learnable Feature Disentanglement with Temporal-Complemented Motion Enhancement for Micro-Expression Recognition
by Yu Qian, Shucheng Huang and Kai Qu
Entropy 2026, 28(2), 180; https://doi.org/10.3390/e28020180 - 4 Feb 2026
Abstract
Micro-expressions (MEs) are involuntary facial movements that reveal genuine emotions, holding significant value in fields like deception detection and psychological diagnosis. However, micro-expression recognition (MER) is fundamentally challenged by the entanglement of subtle emotional motions with identity-specific features. Traditional methods, such as those based on Robust Principal Component Analysis (RPCA), attempt to separate identity and motion components through fixed preprocessing and coarse decomposition. However, these methods can inadvertently remove subtle emotional cues and are disconnected from subsequent module training, limiting the discriminative power of features. Inspired by the Bruce–Young model of facial cognition, which suggests that facial identity and expression are processed via independent neural routes, we recognize the need for a more dynamic, learnable disentanglement paradigm for MER. We propose LFD-TCMEN, a novel network that introduces an end-to-end learnable feature disentanglement framework. The network is synergistically optimized by a multi-task objective unifying orthogonality, reconstruction, consistency, cycle, identity, and classification losses. Specifically, the Disentangle Representation Learning (DRL) module adaptively isolates pure motion patterns from subject-specific appearance, overcoming the limitations of static preprocessing, while the Temporal-Complemented Motion Enhancement (TCME) module integrates purified motion representations—highlighting subtle facial muscle activations—with optical flow dynamics to comprehensively model the spatiotemporal evolution of MEs. Extensive experiments on CAS(ME)3 and DFME benchmarks demonstrate that our method achieves state-of-the-art cross-subject performance, validating the efficacy of the proposed learnable disentanglement and synergistic optimization. Full article
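Among the losses the abstract lists, the orthogonality term is the one that most directly enforces identity–motion disentanglement. The sketch below shows one common formulation (squared cosine similarity between the two embeddings); the exact form used in the paper is not specified and is assumed here.

```python
import torch
import torch.nn.functional as F

# Push identity and motion embeddings toward orthogonal subspaces.
def orthogonality_loss(identity_feat, motion_feat):
    """Both tensors: (B, D)."""
    id_n = F.normalize(identity_feat, dim=1)
    mo_n = F.normalize(motion_feat, dim=1)
    cos = (id_n * mo_n).sum(dim=1)   # per-sample cosine similarity
    return (cos ** 2).mean()         # zero when the two codes are orthogonal
```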
5 pages, 484 KB  
Proceeding Paper
Dynamic Facial Expression Recognition by Concatenation of Raw, Semi-Raw, and Distance Features
by Jose Sotelo-Barrales, Mariko Nakano-Miyatake, David Mata-Mendoza, Hector Perez-Meana and Enrique Escamilla-Hernandez
Eng. Proc. 2026, 123(1), 5; https://doi.org/10.3390/engproc2026123005 - 2 Feb 2026
Abstract
We propose a method for dynamic facial expression recognition that integrates three complementary feature streams from video sequences: (1) raw texture features extracted with EfficientNet-B0, (2) deep geometric features from face mesh representations (semi-raw, EfficientNet-B0), and (3) explicit geometric features derived from facial landmark distances. After refinement with Neighborhood Component Analysis (NCA), the features are concatenated and fed to a Bi-LSTM that models temporal dynamics. The method achieved 58.25% (UAR) and 58.40% (WAR) on CREMA-D, and 82.81% (UAR) and 82.99% (WAR) on RAVDESS. The Bi-LSTM contains 8.68 M parameters, while the EfficientNet-B0 feature extractors add approximately 4 M parameters. Full article
(This article belongs to the Proceedings of First Summer School on Artificial Intelligence in Cybersecurity)
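The concatenate-then-Bi-LSTM pipeline outlined above reduces to a few lines in PyTorch. In this sketch the per-frame feature dimensions, sequence length, and six-class head are invented for illustration; the NCA refinement step is elided.

```python
import torch
import torch.nn as nn

# Fuse three per-frame feature streams, then model temporal dynamics.
raw, mesh, dist = (torch.randn(4, 30, 256) for _ in range(3))  # (B, T, D) streams
fused = torch.cat([raw, mesh, dist], dim=-1)                   # (4, 30, 768)
bilstm = nn.LSTM(768, 128, batch_first=True, bidirectional=True)
out, _ = bilstm(fused)                                         # (4, 30, 256)
logits = nn.Linear(256, 6)(out[:, -1])                         # classify last step
```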
24 pages, 401 KB  
Article
A Multimodal Transformer-Based Framework for Emotion Analysis in Multilingual Video Content
by Sehmus Yakut, Yusuf Taha Tuten, Eren Caglar and Mehmet S. Aktas
Computers 2026, 15(2), 77; https://doi.org/10.3390/computers15020077 - 1 Feb 2026
Abstract
This research addresses the challenge of inferring complex psychological states, including stress, fatigue, anxiety, cognitive load, and boredom, from facial expressions. We propose an interpretable, literature-informed emotion-weighting methodology that transforms the eight-emotion probability outputs of facial emotion recognition models into continuous estimates of these five psychological states using weights derived from the Valence–Arousal framework, providing a principled bridge between discrete emotion predictions and higher-level affective constructs. The proposed formulation is evaluated across six representative deep learning architectures—a baseline CNN (ResNet-50), a modern CNN (ConvNeXt), a hybrid attention-based model (DDAMFN), and three Transformer-based models (ViT, BEiT, and Swin). Our results demonstrate that strong performance on discrete FER tasks does not directly translate to consistent behavior in complex state inference; instead, architectures capable of preserving subtle and distributed affective cues yield more stable and interpretable state estimates, with DDAMFN and Vision Transformer models exhibiting the most consistent performance across the evaluated psychological states. These findings highlight the central role of the proposed emotion-weighting formulation and the importance of architecture selection beyond categorical accuracy in complex affective state analysis. Full article
(This article belongs to the Special Issue Computational Science and Its Applications 2025 (ICCSA 2025))
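Mechanically, the emotion-weighting formulation is a fixed linear map from an eight-emotion probability vector to five state scores. The sketch below shows the shape of that computation; the weights are random placeholders, not the paper's literature-derived Valence–Arousal weights.

```python
import numpy as np

# Map discrete emotion probabilities to continuous psychological states.
rng = np.random.default_rng(1)
W = rng.uniform(-1, 1, size=(5, 8))   # states x emotions (placeholder weights)
p = rng.dirichlet(np.ones(8))         # one FER model's output distribution
states = W @ p                        # stress, fatigue, anxiety, load, boredom
```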
24 pages, 18520 KB  
Article
Cross-Dataset Facial Micro-Expression Recognition with Regularization Learning and Action Unit-Guided Data Augmentation
by Ju Zhou, Xinyu Liu, Lin Wang, Tao Wang and Haolin Xia
Entropy 2026, 28(2), 150; https://doi.org/10.3390/e28020150 - 29 Jan 2026
Abstract
With the growing development of facial micro-expression recognition technology, its practical application value has attracted increasing attention. In real-world scenarios, facial micro-expression recognition typically involves cross-dataset evaluation, where training and testing samples come from different datasets. Specifically, cross-dataset micro-expression recognition employs multi-dataset composite training and unseen single-dataset testing. This setup introduces two major challenges: inconsistent feature distributions across training sets and data imbalance. To address the distribution discrepancy of the same category across different training datasets, we propose a plug-and-play batch regularization learning module that constrains weight discrepancies across datasets through information-theoretic regularization, facilitating the learning of domain-invariant representations while preventing overfitting to specific source domains. To mitigate the data imbalance issue, we propose an Action Unit (AU)-guided generative adversarial network (GAN) for synthesizing micro-expression samples. This approach uses K-means clustering to obtain cluster centers of AU intensities for each category, which are then used to guide the GAN in generating balanced micro-expression samples. To validate the effectiveness of the proposed methods, extensive experiments are conducted on CNN, ResNet, and PoolFormer architectures. The results demonstrate that our approach achieves superior performance in cross-dataset recognition compared to state-of-the-art methods. Full article
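The AU-guidance step described above amounts to per-category K-means over Action Unit intensity vectors, with the cluster centers conditioning the generator. A minimal sketch follows; the sample counts, 17 AUs, and four clusters per class are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Per-category cluster centers of AU intensities, used to guide the GAN.
rng = np.random.default_rng(2)
au = rng.random((300, 17))            # AU intensity vectors
labels = rng.integers(0, 3, 300)      # three micro-expression categories
centers = {
    c: KMeans(n_clusters=4, n_init=10).fit(au[labels == c]).cluster_centers_
    for c in np.unique(labels)
}                                     # centers[c]: (4, 17) conditioning codes
```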
19 pages, 3470 KB  
Article
Driver Monitoring System Using Computer Vision for Real-Time Detection of Fatigue, Distraction and Emotion via Facial Landmarks and Deep Learning
by Tamia Zambrano, Luis Arias, Edgar Haro, Victor Santos and María Trujillo-Guerrero
Sensors 2026, 26(3), 889; https://doi.org/10.3390/s26030889 - 29 Jan 2026
Abstract
Car accidents remain a leading cause of death worldwide, with drowsiness and distraction accounting for roughly 25% of fatal crashes in Ecuador. This study presents a real-time driver monitoring system that uses computer vision and deep learning to detect fatigue, distraction, and emotions from facial expressions. It combines a MobileNetV2-based CNN trained on RAF-DB for emotion recognition with MediaPipe's 468 facial landmarks, which are used to compute the Eye Aspect Ratio (EAR), the Mouth Aspect Ratio (MAR), gaze direction, and head pose. Tests with 27 participants in both real and simulated driving environments showed strong results: 100% accuracy in detecting distraction, 85.19% for yawning, and 88.89% for eye closure. The system also effectively recognized happiness (100%) and anger/disgust (96.3%). However, it struggled with sadness and failed to detect fear, likely due to the subtlety of real-world expressions and limitations in the training dataset. Despite these challenges, the results highlight the importance of integrating emotional awareness into driver monitoring systems, which helps reduce false alarms and improve response accuracy. This work supports the development of lightweight, non-invasive technologies that enhance driving safety through intelligent behavior analysis. Full article
(This article belongs to the Special Issue Sensor Fusion for the Safety of Automated Driving Systems)
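The EAR metric the system computes from landmarks has a standard closed form: the ratio of the two vertical eyelid distances to the horizontal eye width. Which MediaPipe mesh indices map to the six points is left to the caller here.

```python
import numpy as np

# Standard Eye Aspect Ratio from six eye landmarks.
def eye_aspect_ratio(p):
    """p: (6, 2) eye landmarks ordered corner, top-left, top-right,
    corner, bottom-right, bottom-left."""
    v1 = np.linalg.norm(p[1] - p[5])
    v2 = np.linalg.norm(p[2] - p[4])
    h = np.linalg.norm(p[0] - p[3])
    return (v1 + v2) / (2.0 * h)   # sustained low values signal eye closure
```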
12 pages, 474 KB  
Article
Toward Generalized Emotion Recognition in VR by Bridging Natural and Acted Facial Expressions
by Rahat Rizvi Rahman, Hee Yun Choi, Joonghyo Lim, Go Eun Lee, Seungmoo Lee, Chungyean Cho and Kostadin Damevski
Sensors 2026, 26(3), 845; https://doi.org/10.3390/s26030845 - 28 Jan 2026
Abstract
Recognizing emotions accurately in virtual reality (VR) enables adaptive and personalized experiences across gaming, therapy, and other domains. However, most existing facial emotion recognition models rely on acted expressions collected under controlled settings, which differ substantially from the spontaneous and subtle emotions that arise during real VR experiences. To address this challenge, the objective of this study is to develop and evaluate generalizable emotion recognition models that jointly learn from both acted and natural facial expressions in virtual reality. We integrate two complementary datasets collected using the Meta Quest Pro headset, one capturing natural emotional reactions and another containing acted expressions. We evaluate multiple model architectures, including convolutional and domain-adversarial networks, and a mixture-of-experts model that separates natural and acted expressions. Our experiments show that models trained jointly on acted and natural data achieve stronger cross-domain generalization. In particular, the domain-adversarial and mixture-of-experts configurations yield the highest accuracy on natural and mixed-emotion evaluations. Analysis of facial action units (AUs) reveals that natural and acted emotions rely on partially distinct AU patterns, while generalizable models learn a shared representation that integrates salient AUs from both domains. These findings demonstrate that bridging acted and natural expression domains can enable more accurate and robust VR emotion recognition systems. Full article
(This article belongs to the Section Wearables)
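The domain-adversarial configuration mentioned above is usually built around a gradient reversal layer (DANN-style), which flips gradients between the feature extractor and the domain classifier so the extractor learns domain-invariant features. A minimal sketch:

```python
import torch

# Gradient reversal: identity in the forward pass, negated gradient backward.
class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # flip gradients flowing to the extractor

features = torch.randn(8, 128, requires_grad=True)
reversed_feats = GradReverse.apply(features, 1.0)  # input to the domain head
```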
17 pages, 6316 KB  
Article
Research on a Lightweight Real-Time Facial Expression Recognition System Based on an Improved Mini-Xception Algorithm
by Xuchen Sun, Jianfeng Yang and Yi Zhou
Information 2026, 17(1), 111; https://doi.org/10.3390/info17010111 - 22 Jan 2026
Abstract
This paper proposes a lightweight facial expression recognition model based on an improved Mini-Xception algorithm to address the issue of deploying existing models on resource-constrained devices. The model achieves lightweight facial expression recognition, particularly for elder-oriented applications, by introducing depthwise separable convolutions, residual connections, and a four-class expression reconstruction. These designs significantly reduce the number of parameters and computational complexity while maintaining high accuracy. The model achieves an accuracy of 79.96% on the FER2013 dataset, outperforming various other popular models, and enables efficient real-time inference in standard CPU environments. Full article
(This article belongs to the Section Artificial Intelligence)
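The depthwise separable convolution credited here for the parameter savings splits a standard convolution into per-channel spatial filtering followed by a cheap 1x1 channel mix. A generic PyTorch block follows; channel sizes are illustrative, not the model's exact configuration.

```python
import torch.nn as nn

# Depthwise separable convolution, the core building block of Mini-Xception.
class DepthwiseSeparable(nn.Module):
    def __init__(self, cin, cout):
        super().__init__()
        self.depthwise = nn.Conv2d(cin, cin, 3, padding=1, groups=cin, bias=False)
        self.pointwise = nn.Conv2d(cin, cout, 1, bias=False)
        self.bn = nn.BatchNorm2d(cout)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # per-channel spatial filtering, then a 1x1 channel mix
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```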
20 pages, 9549 KB  
Article
Micro-Expression Recognition via LoRA-Enhanced DinoV2 and Interactive Spatio-Temporal Modeling
by Meng Wang, Xueping Tang, Bing Wang and Jing Ren
Sensors 2026, 26(2), 625; https://doi.org/10.3390/s26020625 - 16 Jan 2026
Abstract
Micro-expression recognition (MER) is challenged by a brief duration, low intensity, and heterogeneous spatial frequency patterns. This study introduces a novel MER architecture that reduces computational cost by fine-tuning a large feature extraction model with LoRA, while integrating frequency-domain transformation and graph-based temporal modeling to minimize preprocessing requirements. A Spatial Frequency Adaptive (SFA) module decomposes high- and low-frequency information with dynamic weighting to enhance sensitivity to subtle facial texture variations. A Dynamic Graph Attention Temporal (DGAT) network models video frames as a graph, combining Graph Attention Networks and LSTM with frequency-guided attention for temporal feature fusion. Experiments on the SAMM, CASME II, and SMIC datasets demonstrate superior performance over existing methods. On the SAMM 5-class setting, the proposed approach achieves an unweighted F1 score (UF1) of 81.16% and an unweighted average recall (UAR) of 85.37%, outperforming the next best method by 0.96% and 2.27%, respectively. Full article
(This article belongs to the Section Intelligent Sensors)
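LoRA, the mechanism used above to fine-tune the large DinoV2 extractor cheaply, adds a trainable low-rank update around a frozen linear layer. A self-contained sketch follows; the rank r and scaling alpha are placeholder values.

```python
import torch
import torch.nn as nn

# LoRA adapter: frozen base weights plus a trainable low-rank correction.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                    # keep pretrained weights fixed
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no-op start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```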