Combining Fixed-Weight ArcFace Loss and Vision Transformer for Facial Expression Recognition
Abstract
1. Introduction
- Through an in-depth analysis of the loss functions used in face recognition, and of ArcFace loss in particular, the authors add a new loss term with a constrained weight vector to the fully connected layer on top of the ArcFace loss. This design enforces a uniform angular distribution of class centers, enhancing intra-class compactness and inter-class separability. As a result, it improves recognition of minority expression categories and mitigates the adverse impact of data imbalance on classification performance.
- The model features a lightweight and efficient training design. Because the classification-layer weights are fixed, they no longer need to be updated by back-propagation, which significantly reduces the number of trainable parameters and the gradient-computation overhead. This leads to a substantial reduction in training time and computational cost.
- On the FER2013 [33] and RAF-DB [34] facial expression databases, the authors conducted experiments using both Vision Transformers and traditional convolutional networks as feature extractors. They systematically compared the proposed method with traditional loss functions and the standard ArcFace loss, obtaining consistent and reliable improvements.
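To make the loss design above concrete, here is a minimal NumPy sketch of an ArcFace-style forward pass with a fixed (non-trainable) classifier weight matrix. The function names, dimensions, and hyperparameter values (`s = 30`, `m = 0.5`) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def arcface_logits(features, weights, labels, s=30.0, m=0.5):
    """ArcFace-style logits: L2-normalize features and class weights,
    add the angular margin m to the ground-truth class, scale by s."""
    x = features / np.linalg.norm(features, axis=1, keepdims=True)
    W = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(x @ W.T, -1.0, 1.0)          # cos(theta_j) for every class j
    theta = np.arccos(cos)
    target = np.eye(W.shape[0], dtype=bool)[labels]
    return s * np.where(target, np.cos(theta + m), cos)

def cross_entropy(logits, labels):
    """Mean cross-entropy over the batch (numerically stable log-softmax)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_p[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 16))    # 4 samples, 16-dim embeddings
W_fixed = rng.normal(size=(7, 16))  # 7 expression classes; held fixed, never updated
labels = np.array([0, 1, 2, 3])
loss = cross_entropy(arcface_logits(feats, W_fixed, labels), labels)
```

Because `W_fixed` is never updated, no gradient need be propagated into the classification layer, which is the source of the training-cost savings claimed above.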
2. Proposed Approach
2.1. Review of ArcFace Loss
2.2. Advanced Fixed Weight
Algorithm 1: Computing the fixed weights for the ArcFace loss. (The pseudocode table did not survive extraction; only its Input, training-loop, and Output structure remain.)
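The role of Algorithm 1, precomputing fixed and maximally separated class-center weights, can be sketched with a generic repulsion scheme. The helper name, step count, and learning rate below are illustrative assumptions; the authors' actual update rule may differ:

```python
import numpy as np

def uniform_class_weights(n_classes=7, dim=16, steps=1000, lr=0.1, seed=0):
    """Spread unit class-center vectors on the hypersphere by gradient
    descent on the sum of squared pairwise cosine similarities, then
    freeze the result as the (non-trainable) classifier weights."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_classes, dim))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    for _ in range(steps):
        cos = W @ W.T
        np.fill_diagonal(cos, 0.0)                     # ignore self-similarity
        W -= lr * (cos @ W)                            # repulsion: push neighbours apart
        W /= np.linalg.norm(W, axis=1, keepdims=True)  # re-project onto the unit sphere
    return W

W = uniform_class_weights()
```

When the feature dimension is at least the number of classes, this drives the pairwise cosines toward zero, i.e., a (near-)uniform angular distribution of class centers.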
2.3. Expression Feature Extraction Network
2.3.1. Vision Transformer
2.3.2. Lightweight Convolutional Neural Network
2.4. Proposed Model Based on Advanced Fixed-Weight ArcFace Loss and Vision Transformer
3. Experiment
3.1. Preprocessing
3.2. Experimental Datasets
1. FER2013 Dataset [33]
2. RAF-DB Dataset [34]
3.3. Implementation Details
3.4. Experimental Results
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Kara, O.; Churamani, N.; Gunes, H. Towards Fair Affective Robotics: Continual Learning for Mitigating Bias in Facial Expression and Action Unit Recognition. arXiv 2021, arXiv:2103.09233. [Google Scholar] [CrossRef]
- Essahraui, S.; Lamaakal, I.; El Hamly, I.; Maleh, Y.; Ouahbi, I.; El Makkaoui, K.; Bouami, M.F.; Pławiak, P.; Alfarraj, O.; El-Latif, A.A.A. Real-Time Driver Drowsiness Detection Using Facial Analysis and Machine Learning Techniques. Sensors 2025, 25, 812. [Google Scholar] [CrossRef]
- Ekman, P.; Friesen, W.V. Constants across cultures in the face and emotion. J. Pers. Soc. Psychol. 1971, 17, 124–129. [Google Scholar] [CrossRef]
- Luo, T.; Liu, Y.; Liu, Y.; Zhang, A.; Wang, X.; Zhan, Y.; Tang, C.; Liu, L.; Chen, Z. DIG-FACE: De-biased Learning for Generalized Facial Expression Category Discovery. arXiv 2024, arXiv:2409.20098. [Google Scholar]
- Sajjad, M.; Ullah, F.U.M.; Ullah, M.; Christodoulou, G.; Cheikh, F.A.; Hijji, M.; Muhammad, K.; Rodrigues, J.J. A comprehensive survey on deep facial expression recognition: Challenges, applications, and future guidelines. Alex. Eng. J. 2023, 68, 817–840. [Google Scholar] [CrossRef]
- Wang, Y.; Yan, S.; Liu, Y.; Song, W.; Liu, J.; Chang, Y.; Mai, X.; Hu, X.; Zhang, W.; Gan, Z. A Survey on Facial Expression Recognition of Static and Dynamic Emotions. arXiv 2024, arXiv:2408.15777. [Google Scholar] [CrossRef]
- Giannopoulos, P.; Perikos, I.; Hatzilygeroudis, I. Deep Learning Approaches for Facial Emotion Recognition: A Case Study on FER-2013. In Advances in Hybridization of Intelligent Methods; Springer: Cham, Switzerland, 2017; pp. 1–16. [Google Scholar] [CrossRef]
- Zhang, Y.; Li, Y.; Liu, X.; Deng, W. Leave No Stone Unturned: Mine Extra Knowledge for Imbalanced Facial Expression Recognition. arXiv 2023, arXiv:2310.19636. [Google Scholar] [CrossRef]
- Lu, J.; Wu, B. A Loss Function Base on Softmax for Expression Recognition. Mob. Inf. Syst. 2022, 2022, 8230154. [Google Scholar] [CrossRef]
- Nagata, M.; Okajima, K. Effect of observer’s cultural background and masking condition of target face on facial expression recognition for machine-learning dataset. PLoS ONE 2024, 19, e0313029. [Google Scholar] [CrossRef]
- Pham, T.-D.; Duong, M.-T.; Ho, Q.-T.; Lee, S.; Hong, M.-C. CNN-Based Facial Expression Recognition with Simultaneous Consideration of Inter-Class and Intra-Class Variations. Sensors 2023, 23, 9658. [Google Scholar] [CrossRef]
- Mao, Y. Optimization of Facial Expression Recognition on ResNet-18 Using Focal Loss and CosFace Loss. In Proceedings of the 2022 International Symposium on Advances in Informatics, Electronics and Education (ISAIEE), Frankfurt, Germany, 17–19 December 2022; pp. 161–163. [Google Scholar]
- Waldner, D.; Mitra, S. Pairwise Discernment of AffectNet Expressions with ArcFace. arXiv 2024, arXiv:2412.01860. [Google Scholar] [CrossRef]
- Wen, Y.; Zhang, K.; Li, Z.; Qiao, Y. A Discriminative Feature Learning Approach for Deep Face Recognition. In Computer Vision-ECCV 2016; Leibe, B., Matas, J., Sebe, N., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2016. [Google Scholar]
- Kopalidis, T.; Solachidis, V.; Vretos, N.; Daras, P. Advances in Facial Expression Recognition: A Survey of Methods, Benchmarks, Models, and Datasets. Information 2024, 15, 135. [Google Scholar] [CrossRef]
- Zhong, Y.; Deng, W.; Hu, J.; Zhao, D.; Li, X.; Wen, D. SFace: Sigmoid-Constrained Hypersphere Loss for Robust Face Recognition. IEEE Trans. Image Process. 2021, 30, 2587–2598. [Google Scholar] [CrossRef] [PubMed]
- Liu, Y.; Wang, W.; Zhan, Y.; Feng, S.; Liu, K.; Chen, Z. Pose-disentangled Contrastive Learning for Self-supervised Facial Representation. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 18–22 June 2023; pp. 9717–9728. [Google Scholar]
- Xu, J.; Liu, X.; Zhang, X.; Si, Y.-W.; Li, X.; Shi, Z.; Wang, K.; Gong, X. X2-Softmax: Margin Adaptive Loss Function for Face Recognition. Expert Syst. Appl. 2024, 249, 123791. [Google Scholar] [CrossRef]
- Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4690–4699. [Google Scholar]
- Kato, S.; Hotta, K. Enlarged Large Margin Loss for Imbalanced Classification. In Proceedings of the 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Honolulu, HI, USA, 1–4 October 2023; pp. 1696–1701. [Google Scholar]
- Tutuianu, G.I.; Liu, Y.; Alamäki, A.; Kauttonen, J. Benchmarking deep Facial Expression Recognition: An extensive protocol with balanced dataset in the wild. Eng. Appl. Artif. Intell. 2024, 136, 108983. [Google Scholar] [CrossRef]
- Fard, A.P.; Hosseini, M.M.; Sweeny, T.D.; Mahoor, M.H. AffectNet+: A Database for Enhancing Facial Expression Recognition with Soft-Labels. IEEE Trans. Affect. Comput. 2025; early access. [Google Scholar] [CrossRef]
- Gaya-Morey, F.X.; Manresa-Yee, C.; Martinie, C.; Buades-Rubio, J.M. Evaluating Facial Expression Recognition Datasets for Deep Learning: A Benchmark Study with Novel Similarity Metrics. arXiv 2025, arXiv:2503.20428. [Google Scholar] [CrossRef]
- Chen, T.; Zhang, D.; Lee, D.-J. A New Joint Training Method for Facial Expression Recognition with Inconsistently Annotated and Imbalanced Data. Electronics 2024, 13, 3891. [Google Scholar] [CrossRef]
- Li, S.; Deng, W. Deep facial expression recognition: A survey. IEEE Trans. Affect. Comput. 2020, 13, 1195–1215. [Google Scholar] [CrossRef]
- Hu, P.; Tao, Y.; Bao, Q.; Wang, G.; Yang, W. EvenFace: Deep Face Recognition with Uniform Distribution of Identities. In Proceedings of the 2023 IEEE International Conference on Multimedia and Expo (ICME), Brisbane, Australia, 10–14 July 2023; pp. 1733–1738. [Google Scholar]
- Jiang, F.; Yang, X.; Ren, H.; Li, Z.; Shen, K.; Jiang, J.; Li, Y. DuaFace: Data uncertainty in angular based loss for face recognition. Pattern Recognit. Lett. 2023, 167, 25–29. [Google Scholar] [CrossRef]
- Zhang, Z.; Gong, X.; Chen, J. Face recognition based on adaptive margin and diversity regularization constraints. IET Image Process. 2020, 15, 1105–1114. [Google Scholar] [CrossRef]
- Ngo, Q.T.; Yoon, S. Facial expression recognition based on weighted-cluster loss and deep transfer learning using a highly imbalanced dataset. Sensors 2020, 20, 2639. [Google Scholar] [CrossRef]
- Liu, Y.; Li, Y.; Yi, X.; Hu, Z.; Zhang, H.; Liu, Y. Lightweight ViT model for micro-expression recognition enhanced by transfer learning. Front. Neurorobot. 2022, 16, 922761. [Google Scholar] [CrossRef]
- Jiang, H. Analyzing the Current Status of the Transformer Model for a Face Recognition Application. In Proceedings of the International Conference on Computer Science and Electronic Information Technology (CSEIT 2025), Toronto, ON, Canada, 19–20 July 2025; Volume 78, p. 04029. [Google Scholar]
- Guliyev, N.J.; Ismailov, V.E. On the approximation by single hidden layer feedforward neural networks with fixed weights. Neural Netw. 2018, 98, 296–304. [Google Scholar] [CrossRef] [PubMed]
- Goodfellow, I.J.; Erhan, D.; Carrier, P.L.; Courville, A.; Mirza, M.; Hamner, B.; Cukierski, W.; Tang, Y.; Thaler, D.; Lee, D.H.; et al. Challenges in representation learning: A report on three machine learning contests. Neural Netw. 2015, 64, 59–63. [Google Scholar] [CrossRef]
- Li, S.; Deng, W.; Du, J. Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2584–2593. [Google Scholar] [CrossRef]
- Liu, W.; Wen, Y.; Yu, Z.; Yang, M. Large-Margin Softmax Loss for Convolutional Neural Networks. arXiv 2016, arXiv:1612.02295. [Google Scholar]
- Hasan, M.; Sami, S.M.; Nasrabadi, N. Text-Guided Face Recognition using Multi-Granularity Cross-Modal Contrastive Learning. In Proceedings of the 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2024; pp. 5772–5781. [Google Scholar]
- Liu, W.; Wen, Y.; Raj, B.; Singh, R.; Weller, A. SphereFace Revived: Unifying Hyperspherical Face Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2458–2474. [Google Scholar] [CrossRef]
- Wang, H.; Wang, Y.; Zhou, Z.; Ji, X.; Gong, D.; Zhou, J.; Li, Z.; Liu, W. Cosface: Large margin cosine loss for deep face recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. In Proceedings of the 26th Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
- Zhao, F.; Zhang, P.; Zhang, R.; Li, M. UnifiedFace: A Uniform Margin Loss Function for Face Recognition. Appl. Sci. 2023, 13, 2350. [Google Scholar] [CrossRef]
- Wang, Z.; Zeng, F.; Liu, S.; Zeng, B. OAENet: Oriented attention ensemble for accurate facial expression recognition. Pattern Recognit. 2021, 112. [Google Scholar] [CrossRef]
- Son, M.; Koo, I.; Park, J.; Kim, C. Difficulty-aware Balancing Margin Loss for Long-tailed Recognition. Proc. AAAI Conf. Artif. Intell. 2025, 39, 20522–20530. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Zhao, Z.; Liu, Q. Former-dfer: Dynamic facial expression recognition transformer. In Proceedings of the 29th ACM International Conference on Multimedia, Chengdu, China, 20–24 October 2021; pp. 1553–1561. [Google Scholar]

| Symbol | Definition |
|---|---|
| L_Softmax | Softmax loss |
| L_Cos | Cosine-Softmax loss |
| L_Arc | ArcFace loss with angular margin |
| L_C | Inter-class angular constraint loss |
| L_Total | Total loss function |
| x_i | Feature vector of the i-th sample |
| W_j | Weight vector of the j-th class |
| b_j | Bias term of the j-th class |
| y_i | Ground-truth class label of the i-th sample |
| N | Total number of training samples |
| n | Total number of classes |
| θ_j | Angle between the feature vector and the weight vector of the j-th class |
| cos θ_j | Cosine similarity between the feature vector and the weight vector of the j-th class |
| s | Feature scaling factor |
| m | Angular margin in ArcFace loss |
| ω | Weight coefficient of the inter-class angular constraint loss |
| φ | Weight coefficient of the inter-class angular constraint term in the total loss |
| W_i · W_j | Dot product of the weight vectors of the i-th and j-th classes |
| θ_(i,j) | Angle between the weight vectors of the i-th and j-th classes |
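With W_i denoting the weight vector of class i, θ_(i,j) the angle between the weight vectors of classes i and j, n the number of classes, and φ the weight of the constraint term, one plausible formalization of the inter-class angular constraint and the total objective is the following sketch; it is consistent with the symbol definitions but not necessarily the authors' exact formulation:

```latex
\cos\theta_{i,j} = \frac{W_i \cdot W_j}{\lVert W_i \rVert \, \lVert W_j \rVert},
\qquad
L_{C} = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \cos\theta_{i,j},
\qquad
L_{Total} = L_{Arc} + \varphi \, L_{C}
```

Minimizing the average pairwise cosine similarity of the class-center weights pushes the centers toward a uniform angular spread on the hypersphere.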
| Method | Transformer | ResNet18 | VGG19 |
|---|---|---|---|
| Softmax-loss [9,10,11,12] | 71.382 | 82.366 | 80.963 |
| Island-loss [14] | 71.089 | 83.472 | 81.672 |
| ArcFace-loss [19] | 71.683 | 84.126 | 82.040 |
| Our method | 72.197 | 84.485 | 82.562 |
| Method | Private | Public |
|---|---|---|
| Softmax-loss [9,10,11,12] | 68.877 | 70.605 |
| Island-loss [14] | 69.260 | 71.684 |
| ArcFace-loss [19] | 69.685 | 70.828 |
| Our method | 70.828 | 71.942 |
| Method | Average Training Time per Round (s) |
|---|---|
| Softmax-loss [9,10,11,12] | 37.28 |
| Island-loss [14] | 39.62 |
| ArcFace-loss [19] | 36.84 |
| Our method | 36.50 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Xu, Y.; Duan, X.; Fan, P.; Zhao, Z.; Guo, X. Combining Fixed-Weight ArcFace Loss and Vision Transformer for Facial Expression Recognition. Sensors 2025, 25, 7166. https://doi.org/10.3390/s25237166

