Cross-Domain Facial Expression Recognition through Reliable Global–Local Representation Learning and Dynamic Label Weighting
Abstract
1. Introduction
2. Related Work
2.1. Cross-Domain Facial Expression Recognition
2.2. Pseudo-Label Learning
3. Methodology
3.1. Overview
3.2. Pseudo-Complementary Label Generation
3.2.1. Pseudo-Label Generation
3.2.2. Complementary Label Learning
3.3. Label Dynamic Weight Matching
3.4. Loss Function
Algorithm 1 Global–Local Representation Learning and Dynamic Label Weighting framework
Input: $\mathcal{D}_s$: source domain dataset; $\mathcal{D}_t$: target domain dataset; $C$: total number of categories; $\tau$: fixed threshold; $w_j$: the label weight for category $j$; $n_j$: the number of generated pseudo-labels for category $j$; $\alpha(\cdot)$: weak augmentation strategy.
1: while not reach the maximum iteration do
2:   for $j = 1$ to $C$ do
3:     Count $n_j$, the number of pseudo-labels generated for category $j$
4:     Update the label weight $w_j$ from $n_j$ (Section 3.3)
5:   end for
6:   Sample mini-batch of size $B$ from $\mathcal{D}_s$: $\{(x_i^s, y_i^s)\}_{i=1}^{B}$
7:   Sample mini-batch of size $B$ from $\mathcal{D}_t$: $\{x_i^t\}_{i=1}^{B}$
8:   for $i = 1$ to $B$ do
9:     if $\max_c p(c \mid \alpha(x_i^t)) \geq w_{c^\ast} \cdot \tau$, where $c^\ast = \arg\max_c p(c \mid \alpha(x_i^t))$, then
10:      Assign the pseudo-label $\hat{y}_i = c^\ast$
11:      Generate the complementary label for $x_i^t$
12:      Accumulate the pseudo-label and complementary-label losses;
13:    end if
14:  end for
15:  Update the model parameters by minimizing the overall loss (Section 3.4)
16: end while
Output: Model parameters.
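To make steps 2–14 concrete, the following is a minimal PyTorch-style sketch of one round of dynamic label weighting and pseudo-/complementary-label generation. The weight mapping (a FlexMatch-style normalization by the most-populated class), the helper names `update_label_weights` and `pseudo_label_step`, and the choice of the least-likely class as the complementary label are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F


def update_label_weights(counts: torch.Tensor) -> torch.Tensor:
    # Map per-class pseudo-label counts n_j to weights w_j in [0, 1].
    # Assumption: FlexMatch-style normalization by the largest count,
    # so under-populated classes get a lower effective threshold.
    return counts.float() / counts.max().clamp(min=1).float()


def pseudo_label_step(model, x_t_weak, tau, weights, counts):
    # One target-domain mini-batch, corresponding to steps 8-14.
    #   x_t_weak: weakly augmented target images alpha(x_t), shape [B, ...]
    #   tau:      fixed confidence threshold
    #   weights:  per-class label weights w_j, shape [C]
    #   counts:   running pseudo-label counts n_j, shape [C], long tensor
    with torch.no_grad():
        probs = F.softmax(model(x_t_weak), dim=1)   # p(c | alpha(x_t))
    conf, pseudo = probs.max(dim=1)                 # confidence and argmax class
    # Dynamic, class-dependent threshold: keep a sample only when its
    # confidence exceeds w_j * tau for its predicted class j.
    mask = conf >= weights[pseudo] * tau
    # Complementary label: the least-likely class, used as a
    # "definitely not this class" signal (one plausible choice).
    complementary = probs.argmin(dim=1)
    # Update n_j so the weights can be refreshed next round (steps 2-5).
    counts += torch.bincount(pseudo[mask], minlength=counts.numel())
    return pseudo, complementary, mask, counts
```

In one training iteration, `update_label_weights` would refresh $w_j$ from the running counts (steps 2–5), and `pseudo_label_step` would apply the class-dependent threshold $w_j \cdot \tau$ to each weakly augmented target sample (steps 8–14); the returned mask selects which samples contribute to the pseudo-label and complementary-label losses.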
3.5. Evaluation Metrics
3.6. Implementation Details
3.6.1. Network Architecture
3.6.2. Training Details
4. Experiments and Analyses
4.1. Datasets
4.2. Comparisons with State-of-the-Art Methods
4.3. Ablation Studies
4.4. Parameter Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Lu, C.; Zong, Y.; Zheng, W.; Li, Y.; Tang, C.; Schuller, B.W. Domain invariant feature learning for speaker-independent speech emotion recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 2022, 30, 2217–2230.
- Zhang, T.; Zong, Y.; Zheng, W.; Chen, C.L.P.; Hong, X.; Tang, C.; Cui, Z.; Zhao, G. Cross-database micro-expression recognition: A benchmark. IEEE Trans. Knowl. Data Eng. 2022, 34, 544–559.
- Zhang, S.; Zhang, Y.; Zhang, Y.; Wang, Y.; Song, Z. A Dual-Direction Attention Mixed Feature Network for Facial Expression Recognition. Electronics 2023, 12, 3595.
- Yan, L.; Shi, Y.; Wei, M.; Wu, Y. Multi-feature fusing local directional ternary pattern for facial expressions signal recognition based on video communication system. Alex. Eng. J. 2023, 63, 307–320.
- Li, S.; Deng, W. Deep facial expression recognition: A survey. IEEE Trans. Affect. Comput. 2020, 13, 1195–1215.
- Sun, Z.; Zhong, H.H.; Bai, J.T.; Liu, M.Y.; Hu, Z.P. A discriminatively deep fusion approach with improved conditional GAN (Im-cGAN) for facial expression recognition. Pattern Recognit. 2023, 135, 109157.
- Li, S.; Deng, W.; Du, J. Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2852–2861.
- Goodfellow, I.J.; Erhan, D.; Carrier, P.L.; Courville, A.; Mirza, M.; Hamner, B.; Cukierski, W.; Tang, Y.; Thaler, D.; Lee, D.H.; et al. Challenges in representation learning: A report on three machine learning contests. In Proceedings of Neural Information Processing, Daegu, Republic of Korea, 3–7 November 2013; pp. 117–124.
- Lucey, P.; Cohn, J.F.; Kanade, T.; Saragih, J.; Ambadar, Z.; Matthews, I. The extended Cohn-Kanade dataset (CK+): A complete dataset for action unit and emotion-specified expression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, San Francisco, CA, USA, 13–18 June 2010; pp. 94–101.
- Lyons, M.; Akamatsu, S.; Kamachi, M.; Gyoba, J. Coding facial expressions with Gabor wavelets. In Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, Nara, Japan, 14–16 April 1998; pp. 200–205.
- Dhall, A.; Goecke, R.; Lucey, S.; Gedeon, T. Static facial expression analysis in tough conditions: Data, evaluation protocol and benchmark. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Barcelona, Spain, 6–13 November 2011; pp. 2106–2112.
- Zhang, Z.; Luo, P.; Loy, C.C.; Tang, X. From facial expression recognition to interpersonal relation prediction. Int. J. Comput. Vis. 2018, 126, 550–569.
- Yan, H. Transfer subspace learning for cross-dataset facial expression recognition. Neurocomputing 2016, 208, 165–173.
- Miao, Y.Q.; Araujo, R.; Kamel, M.S. Cross-domain facial expression recognition using supervised kernel mean matching. In Proceedings of the International Conference on Machine Learning and Applications, Boca Raton, FL, USA, 12–15 December 2012; Volume 2, pp. 326–332.
- Sun, Z.; Chiong, R.; Hu, Z.P.; Dhakal, S. A dynamic constraint representation approach based on cross-domain dictionary learning for expression recognition. J. Vis. Commun. Image Represent. 2022, 85, 103458.
- Ni, T.; Zhang, C.; Gu, X. Transfer model collaborating metric learning and dictionary learning for cross-domain facial expression recognition. IEEE Trans. Comput. Soc. Syst. 2020, 8, 1213–1222.
- Wang, C.; Ding, J.; Yan, H.; Shen, S. A Prototype-Oriented Contrastive Adaption Network for Cross-Domain Facial Expression Recognition. In Proceedings of the Asian Conference on Computer Vision, Macau, China, 4–8 December 2022; pp. 4194–4210.
- Bozorgtabar, B.; Mahapatra, D.; Thiran, J.P. ExprADA: Adversarial domain adaptation for facial expression analysis. Pattern Recognit. 2020, 100, 107111.
- Liang, G.; Wang, S.; Wang, C. Pose-aware adversarial domain adaptation for personalized facial expression recognition. arXiv 2020, arXiv:2007.05932.
- Yang, H.; Zhang, Z.; Yin, L. Identity-adaptive facial expression recognition through expression regeneration using conditional generative adversarial networks. In Proceedings of the IEEE International Conference on Automatic Face & Gesture Recognition, Xi'an, China, 15–19 May 2018; pp. 294–301.
- Xie, Y.; Chen, T.; Pu, T.; Wu, H.; Lin, L. Adversarial graph representation adaptation for cross-domain facial expression recognition. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 1255–1264.
- Chen, T.; Pu, T.; Wu, H.; Xie, Y.; Liu, L.; Lin, L. Cross-domain facial expression recognition: A unified evaluation benchmark and adversarial graph learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 9887–9903.
- Li, Y.; Gao, Y.; Chen, B.; Zhang, Z.; Zhu, L.; Lu, G. JDMAN: Joint discriminative and mutual adaptation networks for cross-domain facial expression recognition. In Proceedings of the ACM International Conference on Multimedia, Virtual Event, 20–24 October 2021; pp. 3312–3320.
- Xie, Y.; Gao, Y.; Lin, J.; Chen, T. Learning Consistent Global-Local Representation for Cross-Domain Facial Expression Recognition. In Proceedings of the International Conference on Pattern Recognition, Montreal, QC, Canada, 21–25 August 2022; pp. 2489–2495.
- Zheng, W.; Zong, Y.; Zhou, X.; Xin, M. Cross-domain color facial expression recognition using transductive transfer subspace learning. IEEE Trans. Affect. Comput. 2016, 9, 21–37.
- Zong, Y.; Zheng, W.; Huang, X.; Shi, J.; Cui, Z.; Zhao, G. Domain regeneration for cross-database micro-expression recognition. IEEE Trans. Image Process. 2018, 27, 2484–2498.
- Li, S.; Deng, W. A deeper look at facial expression dataset bias. IEEE Trans. Affect. Comput. 2020, 13, 881–893.
- Lu, S.; Liu, M.; Yin, L.; Yin, Z.; Liu, X.; Zheng, W. The multi-modal fusion in visual question answering: A review of attention mechanisms. PeerJ Comput. Sci. 2023.
- Luo, Y.; Zheng, L.; Guan, T.; Yu, J.; Yang, Y. Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 2507–2516.
- Tsai, Y.H.; Sohn, K.; Schulter, S.; Chandraker, M. Domain adaptation for structured output via discriminative patch representations. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1456–1465.
- Zhang, X.; Huang, D.; Li, H.; Zhang, Y.; Xia, Y.; Liu, J. Self-training maximum classifier discrepancy for EEG emotion recognition. CAAI Trans. Intell. Technol. 2023.
- Gao, D.; Wang, H.; Guo, X.; Wang, L.; Gui, G.; Wang, W.; Yin, Z.; Wang, S.; Liu, Y.; He, T. Federated Learning Based on CTC for Heterogeneous Internet of Things. IEEE Internet Things J. 2023.
- Wang, H.; Xiao, R.; Li, Y.; Feng, L.; Niu, G.; Chen, G.; Zhao, J. PiCO: Contrastive Label Disambiguation for Partial Label Learning. In Proceedings of the International Conference on Learning Representations, Virtual, 3–7 May 2021; pp. 1–18.
- Zheng, D.; Xiao, J.; Chen, K.; Huang, X.; Chen, L.; Zhao, Y. Soft Pseudo-Label Shrinkage for Unsupervised Domain Adaptive Person Re-identification. Pattern Recognit. 2022, 127, 108615.
- Wang, J.; Zhang, X.L. Improving pseudo labels with intra-class similarity for unsupervised domain adaptation. Pattern Recognit. 2023, 138, 109379.
- Rizve, M.; Duarte, K.; Rawat, Y.; Shah, M. In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning. In Proceedings of the International Conference on Learning Representations, Virtual, 3–7 May 2021; pp. 1–20.
- Xie, Q.; Luong, M.T.; Hovy, E.; Le, Q.V. Self-Training With Noisy Student Improves ImageNet Classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10684–10695.
- Sohn, K.; Berthelot, D.; Li, C.L.; Zhang, Z.; Carlini, N.; Cubuk, E.; Kurakin, A.; Zhang, H.; Raffel, C. FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; pp. 596–608.
- Xie, Q.; Dai, Z.; Hovy, E.; Luong, T.; Le, Q. Unsupervised Data Augmentation for Consistency Training. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–12 December 2020; pp. 6256–6268.
- Yi, C.; Chen, H.; Xu, Y.; Liu, Y.; Jiang, L.; Tan, H. ATPL: Mutually enhanced adversarial training and pseudo labeling for unsupervised domain adaptation. Knowl. Based Syst. 2022, 250, 108831.
- Zhang, B.; Wang, Y.; Hou, W.; Wu, H.; Wang, J.; Okumura, M.; Shinozaki, T. FlexMatch: Boosting Semi-Supervised Learning with Curriculum Pseudo Labeling. In Proceedings of the Advances in Neural Information Processing Systems, Virtual, 6–14 December 2021; pp. 18408–18419.
- Guo, Y.; Zhang, L.; Hu, Y.; He, X.; Gao, J. MS-Celeb-1M: A dataset and benchmark for large-scale face recognition. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 87–102.
- Long, M.; Cao, Z.; Wang, J.; Jordan, M.I. Conditional adversarial domain adaptation. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; pp. 1647–1657.
- Xu, R.; Li, G.; Yang, J.; Lin, L. Larger norm more transferable: An adaptive feature norm approach for unsupervised domain adaptation. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1426–1435.
- Lee, C.Y.; Batra, T.; Baig, M.H.; Ulbricht, D. Sliced Wasserstein discrepancy for unsupervised domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10285–10295.
(i) Source = RAF-DB, Backbone = ResNet50

Method | CK+ | JAFFE | SFEW2.0 | FER2013 | ExpW | Mean |
---|---|---|---|---|---|---|
CADA [43] | 72.09 | 52.11 | 53.44 | 57.61 | 63.15 | 59.68 |
SAFN [44] | 75.97 | 61.03 | 52.98 | 55.64 | 64.91 | 62.11 |
SWD [45] | 75.19 | 54.93 | 52.06 | 55.84 | 68.35 | 61.27 |
ECAN [27] | 79.77 | 57.28 | 52.29 | 56.46 | 47.37 | 58.63 |
AGRA [22] | 85.27 | 61.50 | 56.43 | 58.95 | 68.50 | 66.13 |
CGLRL [24] | 82.95 | 59.62 | 56.88 | 59.30 | 70.02 | 65.75 |
Ours | 88.37 | 68.54 | 56.88 | 61.10 | 73.25 | 69.63 |
(ii) Source = FER2013, Backbone = ResNet50

Method | CK+ | JAFFE | SFEW2.0 | RAF-DB | ExpW | Mean |
---|---|---|---|---|---|---|
CADA [43] | 81.40 | 45.07 | 46.33 | 65.96 | 54.84 | 58.72 |
SAFN [44] | 68.99 | 45.07 | 38.07 | 62.80 | 53.91 | 53.77 |
SWD [45] | 65.89 | 49.30 | 45.64 | 65.28 | 56.05 | 56.43 |
ECAN [27] | 60.47 | 41.76 | 46.01 | 53.41 | 48.88 | 50.11 |
AGRA [22] | 85.69 | 52.74 | 49.31 | 67.62 | 60.23 | 63.12 |
CGLRL [24] | 79.84 | 53.52 | 52.29 | 71.84 | 61.94 | 63.87 |
Ours | 82.95 | 60.09 | 49.08 | 75.68 | 54.68 | 64.50 |
(iii) Source = RAF-DB, Backbone = MobileNet-v2

Method | CK+ | JAFFE | SFEW2.0 | FER2013 | ExpW | Mean |
---|---|---|---|---|---|---|
CADA [43] | 62.79 | 53.05 | 43.12 | 49.34 | 59.40 | 53.54 |
SAFN [44] | 66.67 | 45.07 | 40.14 | 49.90 | 61.40 | 52.64 |
SWD [45] | 68.22 | 55.40 | 43.58 | 50.30 | 60.04 | 55.51 |
ECAN [27] | 53.49 | 43.08 | 35.09 | 45.77 | 45.09 | 44.50 |
AGRA [22] | 72.87 | 55.40 | 45.64 | 51.05 | 63.94 | 57.78 |
CGLRL [24] | 69.77 | 52.58 | 49.77 | 52.46 | 64.87 | 57.89 |
Ours | 75.97 | 49.77 | 47.25 | 52.97 | 64.89 | 58.17 |
(iv) Source = FER2013, Backbone = MobileNet-v2

Method | CK+ | JAFFE | SFEW2.0 | RAF-DB | ExpW | Mean |
---|---|---|---|---|---|---|
CADA [43] | 66.67 | 50.23 | 41.28 | 53.15 | 51.84 | 52.63 |
SAFN [44] | 66.67 | 37.56 | 35.78 | 38.73 | 45.56 | 44.86 |
SWD [45] | 53.49 | 48.36 | 35.78 | 47.44 | 50.02 | 47.02 |
ECAN [27] | 55.65 | 44.12 | 28.46 | 42.31 | 41.53 | 42.41 |
AGRA [22] | 67.44 | 47.89 | 41.74 | 52.27 | 59.41 | 53.75 |
CGLRL [24] | 68.22 | 46.95 | 46.79 | 59.15 | 54.30 | 55.08 |
Ours | 76.74 | 47.72 | 46.79 | 64.79 | 54.70 | 58.13 |
Module | CK+ | JAFFE | SFEW2.0 | FER2013 | ExpW | Mean |
---|---|---|---|---|---|---|
Baseline | 73.64 | 59.15 | 52.29 | 56.88 | 68.93 | 62.18 |
PLG + CLG | 88.37 | 66.67 | 55.73 | 60.56 | 72.72 | 68.81 |
PLG + LDWM | 88.37 | 67.13 | 56.65 | 60.93 | 73.04 | 69.22 |
Ours | 88.37 | 68.54 | 56.88 | 61.10 | 73.25 | 69.63 |
Method | CK+ | JAFFE | SFEW2.0 | FER2013 | ExpW | Mean |
---|---|---|---|---|---|---|
Ours HFs | 88.37 | 68.08 | 55.96 | 60.08 | 72.65 | 69.03 |
Ours | 88.37 | 68.54 | 56.88 | 61.10 | 73.25 | 69.63 |