AFCN: An Attention-Based Fusion Consistency Network for Facial Emotion Recognition
Abstract
1. Introduction
2. Related Works
2.1. Gradient-Weighted Class Activation Mapping
2.2. Weighted Cross Entropy Loss Function
3. The Proposed Method
3.1. Overview of the Attention-Based Fusion Consistency Network
3.2. Sample Certainty Analysis Module
3.3. Label Correction Module
3.4. Attention Generation Module
3.5. Fusion Consistency Learning Module
3.6. Joint Optimization
4. Experiments and Analyses
4.1. Experimental Setup
4.2. Comparisons of Classification Performance
4.3. Ablation Studies
4.4. Parameters Analyses
4.5. Potential Limitations and Practical Considerations
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References




| Methods | Noise | RAF-DB (%) | FER+ (%) | AffectNet (%) | 
|---|---|---|---|---|
| Baseline | 10% | 81.01 | 83.29 | 57.24 | 
| SCN | 10% | 82.15 | 84.99 | 58.60 | 
| RUL | 10% | 86.17 | 86.93 | 60.54 | 
| EAC | 10% | 88.02 | 87.03 | 61.11 | 
| AFCN (Ours) | 10% | 82.15 | 84.99 | 58.60 | 
| Baseline | 20% | 77.98 | 82.34 | 55.89 | 
| SCN | 20% | 79.79 | 83.35 | 57.51 | 
| RUL | 20% | 84.32 | 85.05 | 59.01 | 
| EAC | 20% | 86.05 | 86.07 | 60.29 | 
| AFCN (Ours) | 20% | 87.78 | 88.50 | 59.58 | 
| Baseline | 30% | 75.50 | 79.77 | 52.16 | 
| SCN | 30% | 77.45 | 82.20 | 54.60 | 
| RUL | 30% | 82.06 | 83.90 | 56.93 | 
| EAC | 30% | 84.42 | 85.44 | 58.31 | 
| AFCN (Ours) | 30% | 87.45 | 87.83 | 58.71 | 

| Cases | Label Correction | | | | | RAF-DB |
|---|---|---|---|---|---|---|
| 1 | × | √ | √ | √ | √ | 86.57% | 
| 2 | √ | × | √ | √ | √ | 86.39% | 
| 3 | √ | √ | × | √ | √ | 85.01% | 
| 4 | √ | √ | √ | × | √ | 86.70% | 
| 5 | √ | √ | √ | √ | × | 86.60% | 
| 6 | √ | √ | √ | √ | √ | 87.45% | 

| Dataset | | 1 | 3 | 5 | 7 | 10 |
|---|---|---|---|---|---|---|
| RAF-DB | 0.1 | 87.03% | 86.93% | 87.31% | 86.83% | 86.54% |
| | 0.3 | 86.99% | 87.35% | 86.96% | 86.44% | 86.86% |
| | 0.5 | 87.08% | 87.45% | 86.54% | 87.29% | 86.57% |
| | 0.7 | 87.13% | 87.39% | 86.89% | 86.76% | 86.73% |
| | 1 | 87.21% | 87.27% | 87.16% | 86.97% | 86.75% |
| FERPlus | 0.1 | 86.78% | 86.75% | 86.73% | 86.62% | 86.59% |
| | 0.3 | 86.89% | 87.01% | 87.15% | 86.97% | 86.83% |
| | 0.5 | 87.10% | 87.83% | 87.71% | 86.65% | 86.57% |
| | 0.7 | 87.21% | 87.46% | 86.97% | 87.03% | 86.88% |
| | 1 | 87.15% | 87.32% | 87.36% | 86.81% | 86.69% |

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Wei, Q.; Pei, H.; Mao, S. AFCN: An Attention-Based Fusion Consistency Network for Facial Emotion Recognition. Electronics 2025, 14, 3523. https://doi.org/10.3390/electronics14173523