Facial Expression Recognition with Contrastive Disentangled Generative Adversarial Network
Abstract
1. Introduction
- (1)
- To learn disentangled expression representations, a model called the CD-GAN is proposed. Through adversarial learning, the CD-GAN is encouraged to separate redundant identity information from expression information, resulting in better performance on FER.
- (2)
- To integrate conventional facial expression recognition methods with Generative Adversarial Networks (GANs), the encoder component of the GAN is modified to use a ResNet-18 architecture capable of loading pre-trained weights. Fine-tuning from the pre-trained network effectively enhances the performance of the expression recognition system.
- (3)
- A new supervised contrastive loss learning strategy is formulated for disentanglement, based on the concept of contrastive learning. This strategy pulls features of the same expression closer together and pushes all other features apart in the embedding space, thereby maximizing the use of the limited supervision available.
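The contrastive strategy in contribution (3) follows the general form of the supervised contrastive loss of Khosla et al. [cited in the references below]. The following NumPy sketch is illustrative only: the function name, temperature value, and batch handling are assumptions, not the paper's implementation.

```python
import numpy as np

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over a batch of embeddings.

    Features sharing an expression label are pulled together; all
    other features are pushed apart in the embedding space. Assumes
    every label in the batch appears at least twice.
    """
    # L2-normalize so the dot product is cosine similarity
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    n = len(labels)
    eye = np.eye(n, dtype=bool)
    logits = feats @ feats.T / temperature
    logits = np.where(eye, -np.inf, logits)        # exclude self-similarity
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    # log-softmax over each anchor's similarities to all other samples
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & ~eye    # same-label pairs
    # mean log-probability of same-label pairs per anchor, negated
    return -(np.where(pos, log_prob, 0.0).sum(axis=1) / pos.sum(axis=1)).mean()
```

With this form, a batch whose labels match its embedding clusters yields a lower loss than one whose labels cut across the clusters, which is the gradient signal that drives same-expression features together.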
2. Related Work
2.1. Identity-Free Facial Expression Recognition
2.2. Contrastive Learning
3. Proposed Method
3.1. Network Architecture of CD-GAN
3.1.1. Generator
3.1.2. Discriminator
3.2. Contrastive Disentanglement Loss
3.3. Model Training
3.3.1. Discriminator Loss
3.3.2. Generator Loss
4. Experiment
4.1. Dataset and Experimental Setup
4.2. Experiment Setting
4.3. Quantitative Analysis
4.4. Visualization and Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Li, S.; Deng, W. Deep facial expression recognition: A survey. IEEE Trans. Affect. Comput. 2022, 13, 1195–1215.
- Sajjad, M.; Ullah, F.M.; Ullah, M.; Christodoulou, G.; Cheikh, F.A.; Hijji, M.; Muhammad, K.; Rodrigues, J.J.P.C. A comprehensive survey on deep facial expression recognition. Alex. Eng. J. 2023, 68, 817–840.
- Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53.
- Zhang, S.; Zhang, Y.; Zhang, Y.; Wang, Y.; Song, Z. A Dual-Direction Attention Mixed Feature Network for Facial Expression Recognition. Electronics 2023, 12, 3595.
- Jiang, M.; Yin, S. Facial expression recognition based on convolutional block attention module and multi-feature fusion. Int. J. Comput. Vis. Robot. 2023, 13, 21–37.
- Chen, X.; Duan, Y.; Houthooft, R.; Schulman, J.; Sutskever, I.; Abbeel, P. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Volume 29.
- Jiang, J.; Deng, W. Disentangling identity and pose for facial expression recognition. IEEE Trans. Affect. Comput. 2022, 13, 1868–1878.
- Liu, X.; Kumar, B.V.K.V.; You, J.; Jia, P. Adaptive Deep Metric Learning for Identity-Aware Facial Expression Recognition. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 522–531.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
- Yang, H.; Ciftci, U.; Yin, L. Facial expression recognition by de-expression residue learning. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2168–2177.
- Yang, H.; Zhang, Z.; Yin, L. Identity-Adaptive Facial Expression Recognition through Expression Regeneration Using Conditional Generative Adversarial Networks. In Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), Xi’an, China, 15–19 May 2018; pp. 294–301.
- Choi, Y.; Choi, M.; Kim, M. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8789–8797.
- Cai, J.; Meng, Z.; Khan, A.S.; O’Reilly, J.; Li, Z.; Han, S.; Tong, Y. Identity-free facial expression recognition using conditional generative adversarial network. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 1344–1348.
- Zhang, W.; Ji, X.; Chen, K.; Ding, Y.; Fan, C. Learning a facial expression embedding disentangled from identity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6759–6768.
- Gutmann, M.; Hyvärinen, A. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Chia Laguna Resort, Sardinia, Italy, 13–15 May 2010; pp. 297–304.
- Sohn, K. Improved deep metric learning with multi-class n-pair loss objective. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Volume 29.
- Caron, M.; Misra, I.; Mairal, J.; Goyal, P.; Bojanowski, P.; Joulin, A. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments. Adv. Neural Inf. Process. Syst. 2020, 33, 9912–9924.
- Li, J.; Zhou, P.; Xiong, C.; Hoi, S.C.H. Prototypical Contrastive Learning of Unsupervised Representations. arXiv 2020, arXiv:2005.04966.
- He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 9729–9738.
- Khosla, P.; Teterwak, P.; Wang, C.; Sarna, A.; Tian, Y.; Isola, P.; Maschinot, A.; Liu, C.; Krishnan, D. Supervised contrastive learning. Adv. Neural Inf. Process. Syst. 2020, 33, 18661–18673.
- Kingma, D.P.; Rezende, D.J.; Mohamed, S.; Welling, M. Semi-supervised learning with deep generative models. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; Volume 27.
- Lucey, P.; Cohn, J.F.; Kanade, T.; Saragih, J.; Ambadar, Z.; Matthews, I. The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition-Workshops, San Francisco, CA, USA, 13–18 June 2010; pp. 94–101.
- Chen, L.; Yen, Y. Taiwanese Facial Expression Image Database; Brain Mapping Laboratory, Institute of Brain Science, National Yang-Ming University: Taipei, Taiwan, 2007.
- Li, S.; Deng, W.; Du, J.P. Reliable crowdsourcing and deep locality-preserving learning for unconstrained facial expression recognition. IEEE Trans. Image Process. 2018, 28, 356–370.
- Shen, W.; Liu, R. Learning residual images for face attribute manipulation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4030–4038.
- Xie, S.; Hu, H. Facial expression recognition with two-branch disentangled generative adversarial network. IEEE Trans. Circuits Syst. Video Technol. 2020, 31, 2359–2371.
- Vaidya, K.S.; Patil, P.M.; Alagirisamy, M. Hybrid CNN-SVM Classifier for Human Emotion Recognition Using ROI Extraction and Feature Fusion. Wirel. Pers. Commun. 2023, 132, 1099–1135.
- Farajzadeh, N.; Hashemzadeh, M. Exemplar-based facial expression recognition. Inf. Sci. 2018, 460, 318–330.
- Chirra, V.R.R.; Uyyala, S.R.; Kolli, V.K.K. Virtual facial expression recognition using deep CNN with ensemble learning. J. Ambient. Intell. Humaniz. Comput. 2021, 12, 10581–10599.
- Baygin, M.; Tuncer, I.; Dogan, S.; Barua, P.D.; Tuncer, T.; Cheong, K.H.; Acharya, U.R. Automated facial expression recognition using exemplar hybrid deep feature generation technique. Soft Comput. 2023, 27, 8721–8737.
- Fard, A.P.; Mahoor, M.H. Ad-Corre: Adaptive correlation-based loss for facial expression recognition in the wild. IEEE Access 2022, 10, 26756–26768.
- Wang, K.; Peng, X.; Yang, J.; Meng, D.; Qiao, Y. Region attention networks for pose and occlusion robust facial expression recognition. IEEE Trans. Image Process. 2020, 29, 4057–4069.
- Saurav, S.; Gidde, P.; Saini, R.; Singh, S. Dual integrated convolutional neural network for real-time facial expression recognition in the wild. Vis. Comput. 2022, 38, 1083–1096.
- Xue, F.; Wang, Q.; Guo, G. TransFER: Learning relation-aware facial expression representations with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montréal, QC, Canada, 11–17 October 2021; pp. 3601–3610.
- Wu, H.; Jia, J.; Xie, L.; Qi, G.; Shi, Y.; Tian, Q. Cross-VAE: Towards Disentangling Expression from Identity for Human Faces. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 4087–4091.
Methods | CK+ | TFEID |
---|---|---|
DAM-CNN [25] | 95.88% | 93.20% |
TD-GAN [26] | 97.53% | 97.20% |
CNN-SVM [27] | 94.12% | 85.10% |
Exemplar-based [28] | 97.14% | 98.90% |
DCNN-VC [29] | 99.42% | 99.58% |
Auto-FER [30] | 100% | 97.01% |
ResNet-18 | 98.72% | 97.53% |
CD-GAN (ours) | 99.11% | 98.21% |
Methods | RAF-DB |
---|---|
TD-GAN [26] | 81.91% |
Ad-corre [31] | 86.96% |
RAN [32] | 86.90% |
gACNN [33] | 85.53% |
DLN [14] | 86.40% |
ViT-FER [34] | 90.91% |
Cross-VAE [35] | 84.81% |
ResNet-18 | 85.13% |
CD-GAN (ours) | 88.21% |
Methods | Avg. Accuracy | RAF-DB |
---|---|---|
ResNet-18 | 79.46% | 0.8013 |
CD-GAN (ours) | 82.35% | 0.8432 |
Method | RAF-DB |
---|---|
Baseline (ResNet-18) | 85.13% |
Baseline+FEC | 85.82% |
Baseline+CL | 86.54% |
Baseline+FEC+CL (CD-GAN) | 88.21% |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, S.; Ni, S.; Cai, H. Facial Expression Recognition with Contrastive Disentangled Generative Adversarial Network. Electronics 2025, 14, 3795. https://doi.org/10.3390/electronics14193795