Article

Multi-Domain Feature Fusion Transformer with Cross-Domain Robustness for Facial Expression Recognition

by Katherine Lin Shu 1 and Mu-Jiang-Shan Wang 2,*

1 Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9PL, UK
2 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
* Author to whom correspondence should be addressed.
Symmetry 2026, 18(1), 15; https://doi.org/10.3390/sym18010015
Submission received: 18 November 2025 / Revised: 4 December 2025 / Accepted: 10 December 2025 / Published: 21 December 2025
(This article belongs to the Section Computer)

Abstract

Facial expression recognition (FER) is a key task in affective computing and human–computer interaction, aiming to decode facial muscle movements into emotional categories. Although deep learning-based FER has achieved remarkable progress, robust recognition under uncontrolled conditions (e.g., illumination change, pose variation, occlusion, and cultural diversity) remains challenging. Traditional Convolutional Neural Networks (CNNs) are effective at local feature extraction but limited in modeling global dependencies, while Vision Transformers (ViTs) provide global context modeling yet often neglect the fine-grained texture and frequency cues that are critical for discriminating subtle expressions. Moreover, existing approaches usually rely on single-domain representations and lack adaptive strategies for integrating heterogeneous cues across the spatial, semantic, and spectral domains, which limits cross-domain generalization. To address these limitations, this study proposes a unified Multi-Domain Feature Enhancement and Fusion Transformer (MDFEFT) framework that combines a ViT-based global encoder with three complementary branches (channel, spatial, and frequency) for comprehensive feature learning. Taking into account the approximate bilateral symmetry of human faces and the asymmetric distortions introduced by pose, occlusion, and illumination, MDFEFT is designed to learn symmetry-aware and asymmetry-robust representations for facial expression recognition across diverse domains. An adaptive Cross-Domain Feature Enhancement and Fusion (CDFEF) module is further introduced to align and integrate the heterogeneous features, yielding domain-consistent and illumination-robust expression understanding. Experimental results show that the proposed method consistently outperforms existing CNN-, Transformer-, and ensemble-based models, achieving accuracies of 0.997, 0.796, and 0.776 on KDEF, FER2013, and RAF-DB, respectively. Compared with the strongest baselines, it improves accuracy by 0.3%, 2.2%, and 1.9% while also providing higher F1-scores and better robustness in cross-domain testing. These results confirm the effectiveness and strong generalization ability of the proposed framework for real-world facial expression recognition.
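To make the multi-branch design concrete, the following is a minimal PyTorch sketch of how a ViT token sequence could be refined by channel, spatial, and frequency branches and then adaptively fused. It is an illustration reconstructed from the abstract alone, not the authors' implementation: every module name (ChannelBranch, SpatialBranch, FrequencyBranch, CDFEF) and design choice (SE-style channel gating, token-wise spatial gating, FFT filtering, softmax-gated residual fusion) is an assumption.

# Minimal sketch only -- NOT the authors' released code. All names and
# design choices below are assumptions reconstructed from the abstract.
import torch
import torch.nn as nn

class ChannelBranch(nn.Module):
    # Squeeze-and-excitation style channel attention over ViT tokens.
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.Sigmoid())

    def forward(self, x):                  # x: (B, N, D) patch tokens
        w = self.fc(x.mean(dim=1))         # pool over tokens -> (B, D)
        return x * w.unsqueeze(1)          # reweight feature channels

class SpatialBranch(nn.Module):
    # Token-wise (spatial) gating over the patch grid.
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                  # x: (B, N, D)
        return x * torch.sigmoid(self.score(x))  # (B, N, 1) gate per token

class FrequencyBranch(nn.Module):
    # Learnable spectral filter applied along the token axis via FFT.
    def __init__(self, dim, n_tokens):
        super().__init__()
        self.filt = nn.Parameter(torch.randn(n_tokens // 2 + 1, dim, 2) * 0.02)

    def forward(self, x):                  # x: (B, N, D)
        f = torch.fft.rfft(x, dim=1)                    # token-axis spectrum
        f = f * torch.view_as_complex(self.filt)        # filter each band
        return torch.fft.irfft(f, n=x.shape[1], dim=1)  # back to (B, N, D)

class CDFEF(nn.Module):
    # Adaptive cross-domain fusion: softmax-gated residual sum of branches.
    def __init__(self, dim, n_branches=3):
        super().__init__()
        self.gate = nn.Linear(dim, n_branches)

    def forward(self, base, branches):     # branches: list of (B, N, D)
        g = torch.softmax(self.gate(base.mean(dim=1)), dim=-1)  # (B, 3)
        fused = sum(g[:, i, None, None] * b for i, b in enumerate(branches))
        return base + fused                # residual fusion with backbone

Usage with stand-in ViT-Base tokens (a real pipeline would take these from a pretrained ViT encoder and feed the fused tokens to a classification head):

tokens = torch.randn(8, 196, 768)          # (batch, patches, embed dim)
branches = [ChannelBranch(768)(tokens), SpatialBranch(768)(tokens),
            FrequencyBranch(768, 196)(tokens)]
fused = CDFEF(768)(tokens, branches)       # (8, 196, 768)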
Keywords: facial expression recognition; Vision Transformer; multi-domain feature enhancement; frequency–spatial fusion; cross-domain generalization

