Attention-Guided Network Model for Image-Based Emotion Recognition
Abstract
Featured Application
1. Introduction
1.1. Related Work
1.1.1. Facial Emotion Recognition (FER)
1.1.2. Attention Mechanisms
1.1.3. Facial Emotion Recognition with Attention
2. Materials and Methods
2.1. Base Network Architecture
2.2. Attention Modules
2.2.1. Squeeze-and-Excitation (SE)
2.2.2. Convolutional Block Attention Module (CBAM)
2.2.3. Attention-Guided Facial Emotion Recognition
2.3. Guidance Strategy
2.4. Database Descriptions
2.5. Image Pre-Processing
2.6. Performance Criteria
2.7. Training Options
3. Results
3.1. Dataset Distribution
3.2. Model Performance
4. Discussion
5. Ablation Study
5.1. Model Stride Reduction
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385.
- Samek, W.; Wiegand, T.; Müller, K.-R. Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models. arXiv 2017, arXiv:1708.08296.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Computer Vision–ECCV 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2018; Volume 11211, pp. 3–19. ISBN 978-3-030-01233-5.
- Ling, H.; Wu, J.; Wu, L.; Huang, J.; Chen, J.; Li, P. Self Residual Attention Network for Deep Face Recognition. IEEE Access 2019, 7, 55159–55168.
- Fukui, H.; Hirakawa, T.; Yamashita, T.; Fujiyoshi, H. Attention Branch Network: Learning of Attention Mechanism for Visual Explanation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 10705–10714.
- Sepas-Moghaddam, A.; Etemad, A.; Pereira, F.; Correia, P.L. Facial Emotion Recognition Using Light Field Images with Deep Attention-Based Bidirectional LSTM. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 3367–3371.
- Jung, H.; Lee, S.; Yim, J.; Park, S.; Kim, J. Joint Fine-Tuning in Deep Neural Networks for Facial Expression Recognition. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 2983–2991.
- Mehrabian, A. Communication without Words. In Communication Theory; Mortensen, C.D., Ed.; Routledge: London, UK, 2017; ISBN 978-1-351-52752-1.
- Arabian, H.; Wagner-Hartl, V.; Chase, J.G.; Möller, K. Image Pre-Processing Significance on Regions of Impact in a Trained Network for Facial Emotion Recognition. IFAC-PapersOnLine 2021, 54, 299–303.
- de Gelder, B. Why Bodies? Twelve Reasons for Including Bodily Expressions in Affective Neuroscience. Philos. Trans. R. Soc. B Biol. Sci. 2009, 364, 3475–3484.
- de Gelder, B.; Van den Stock, J.; Meeren, H.K.M.; Sinke, C.B.A.; Kret, M.E.; Tamietto, M. Standing up for the Body. Recent Progress in Uncovering the Networks Involved in the Perception of Bodies and Bodily Expressions. Neurosci. Biobehav. Rev. 2010, 34, 513–527.
- Lang, P.J.; Bradley, M.M. Emotion and the Motivational Brain. Biol. Psychol. 2010, 84, 437–450.
- Vuilleumier, P. How Brains Beware: Neural Mechanisms of Emotional Attention. Trends Cogn. Sci. 2005, 9, 585–594.
- Mancini, C.; Falciati, L.; Maioli, C.; Mirabella, G. Happy Facial Expressions Impair Inhibitory Control with Respect to Fearful Facial Expressions but Only When Task-Relevant. Emotion 2022, 22, 142–152.
- Mirabella, G.; Grassi, M.; Mezzarobba, S.; Bernardis, P. Angry and Happy Expressions Affect Forward Gait Initiation Only When Task Relevant. Emotion 2023, 23, 387–399.
- Mancini, C.; Falciati, L.; Maioli, C.; Mirabella, G. Threatening Facial Expressions Impact Goal-Directed Actions Only If Task-Relevant. Brain Sci. 2020, 10, 794.
- Leo, M.; Del Coco, M.; Carcagni, P.; Distante, C.; Bernava, M.; Pioggia, G.; Palestra, G. Automatic Emotion Recognition in Robot-Children Interaction for ASD Treatment. In Proceedings of the 2015 IEEE International Conference on Computer Vision Workshop (ICCVW), Santiago, Chile, 7–13 December 2015; pp. 537–545.
- Ravindran, V.; Osgood, M.; Sazawal, V.; Solorzano, R.; Turnacioglu, S. Virtual Reality Support for Joint Attention Using the Floreo Joint Attention Module: Usability and Feasibility Pilot Study. JMIR Pediatr. Parent. 2019, 2, e14429.
- Hendrycks, D.; Dietterich, T. Benchmarking Neural Network Robustness to Common Corruptions and Perturbations. arXiv 2019, arXiv:1903.12261.
- Zhao, G.; Huang, X.; Taini, M.; Li, S.Z.; Pietikäinen, M. Facial Expression Recognition from Near-Infrared Videos. Image Vis. Comput. 2011, 29, 607–619.
- Ebner, N.C.; Riediger, M.; Lindenberger, U. FACES—A Database of Facial Expressions in Young, Middle-Aged, and Older Women and Men: Development and Validation. Behav. Res. Methods 2010, 42, 351–362.
- Koller, D.; Friedman, N. Probabilistic Graphical Models: Principles and Techniques; Adaptive Computation and Machine Learning; MIT Press: Cambridge, MA, USA, 2009; ISBN 978-0-262-01319-2.
- Arabian, H.; Wagner-Hartl, V.; Geoffrey Chase, J.; Möller, K. Facial Emotion Recognition Focused on Descriptive Region Segmentation. In Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine Biology Society (EMBC), Virtual, 1–5 November 2021; pp. 3415–3418.
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2020, 128, 336–359.
- Khaireddin, Y.; Chen, Z. Facial Emotion Recognition: State of the Art Performance on FER2013. arXiv 2021, arXiv:2105.03588.
- Challenges in Representation Learning: Facial Expression Recognition Challenge. Available online: https://kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge (accessed on 9 August 2021).
- Mehendale, N. Facial Emotion Recognition Using Convolutional Neural Networks (FERC). SN Appl. Sci. 2020, 2, 446.
- Zhao, X.; Liang, X.; Liu, L.; Li, T.; Han, Y.; Vasconcelos, N.; Yan, S. Peak-Piloted Deep Network for Facial Expression Recognition. arXiv 2017, arXiv:1607.06997.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. arXiv 2014, arXiv:1409.4842.
- Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. arXiv 2015, arXiv:1409.0575.
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Li, Y.; Lu, G.; Li, J.; Zhang, Z.; Zhang, D. Facial Expression Recognition in the Wild Using Multi-Level Features and Attention Mechanisms. IEEE Trans. Affect. Comput. 2020, 14, 451–462.
- Vardazaryan, A.; Mutter, D.; Marescaux, J.; Padoy, N. Weakly-Supervised Learning for Tool Localization in Laparoscopic Videos. In Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis; Stoyanov, D., Taylor, Z., Balocco, S., Sznitman, R., Martel, A., Maier-Hein, L., Duong, L., Zahnd, G., Demirci, S., Albarqouni, S., et al., Eds.; Springer International Publishing: Cham, Switzerland, 2018; pp. 169–179.
- Fernandez, P.D.M.; Pena, F.A.G.; Ren, T.I.; Cunha, A. FERAtt: Facial Expression Recognition with Attention Net. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019; IEEE: Long Beach, CA, USA, 2019; pp. 837–846.
- Lin, M.; Chen, Q.; Yan, S. Network In Network. arXiv 2014, arXiv:1312.4400.
- Viola, P.; Jones, M. Rapid Object Detection Using a Boosted Cascade of Simple Features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2001, Kauai, HI, USA, 8–14 December 2001; Volume 1, p. I–I.
- Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987.
- Castrillón, M.; Déniz, O.; Guerra, C.; Hernández, M. ENCARA2: Real-Time Detection of Multiple Faces at Different Resolutions in Video Streams. J. Vis. Commun. Image Represent. 2007, 18, 130–140.
- Haddad, J.; Lezoray, O.; Hamel, P. 3D-CNN for Facial Emotion Recognition in Videos. In Advances in Visual Computing; Bebis, G., Yin, Z., Kim, E., Bender, J., Subr, K., Kwon, B.C., Zhao, J., Kalkofen, D., Baciu, G., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; Volume 12510, pp. 298–309. ISBN 978-3-030-64558-8.
- Yu, Z.; Liu, G.; Liu, Q.; Deng, J. Spatio-Temporal Convolutional Features with Nested LSTM for Facial Expression Recognition. Neurocomputing 2018, 317, 50–57.
- Yu, Z.; Liu, Q.; Liu, G. Deeper Cascaded Peak-Piloted Network for Weak Expression Recognition. Vis. Comput. 2018, 34, 1691–1699.
- Ding, H.; Zhou, S.K.; Chellappa, R. Facenet2expnet: Regularizing a Deep Face Recognition Net for Expression Recognition. In Proceedings of the 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, USA, 30 May–3 June 2017; IEEE: Washington, DC, USA, 2017; pp. 118–126.
- Zhang, W.; Li, D.; Min, X.; Zhai, G.; Guo, G.; Yang, X.; Ma, K. Perceptual Attacks of No-Reference Image Quality Models with Human-in-the-Loop. Adv. Neural Inf. Process. Syst. 2022, 35, 2916–2929.
Emotion | OULU-CASIA (Original) | OULU-CASIA (After Image Pre-Processing) | FACES (Original) | FACES (After Image Pre-Processing)
---|---|---|---|---
Anger | 1790 | 1315 | 342 | 292
Disgust | 1633 | 1195 | 342 | 292
Fear | 1796 | 1503 | 342 | 303
Happiness | 1668 | 1502 | 342 | 338
Neutral | N/A | 1219 | 342 | 330
Sadness | 1668 | 1200 | 342 | 317
Surprise | 1701 | 1365 | N/A | N/A
Total | 10,379 | 9299 | 2052 | 1872
Age Group | AGFER (%) | Base (%) | SEFER (%) | CBAMFER (%)
---|---|---|---|---|
Young | 49.60 ± 0.95 | 17.35 ± 0.45 | 43.95 ± 8.89 | 48.81 ± 3.36 |
Middle-aged | 48.50 ± 1.30 | 18.07 ± 0.43 | 41.80 ± 7.57 | 46.57 ± 4.71 |
Old | 41.40 ± 1.92 | 20.17 ± 0.50 | 35.70 ± 4.66 | 40.56 ± 6.92 |
Mean | 46.50 ± 1.39 | 18.53 ± 0.46 | 40.48 ± 7.04 | 45.31 ± 5.00 |
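As a quick consistency check on the table above (assuming the Mean row is the unweighted average over the three age groups, which the table itself does not state), the AGFER column gives

$$\frac{49.60 + 48.50 + 41.40}{3} = 46.50,$$

and the Base, SEFER, and CBAMFER columns reproduce their Mean entries in the same way.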
Approach | Mean Accuracy (%) |
---|---|
Jung H. et al. [10] | 81.46 |
Haddad J. et al. [42] | 84.17 |
Zhao X. et al. [31] | 84.59 |
Yu Z. et al. [43] | 84.72 |
Yu Z. et al. [44] | 86.23 |
Ding H. et al. [45] | 87.71 |
Yu Z. et al. [43] (with LSTM) | 88.98 |
Proposed AGFER approach | 89.19 |
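For context, a simple difference over the table above shows the margin of the proposed approach over the strongest listed prior result:

$$89.19\% - 88.98\% = 0.21 \text{ percentage points}.$$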
Dataset | Overall Stride | AGFER | SEFER | CBAMFER |
---|---|---|---|---|
OULU-CASIA | 32 | 86.81% ± 0.64 | 86.00% ± 0.74 | 86.84% ± 1.69 |
OULU-CASIA | 16 | 89.19% ± 0.75 | 88.73% ± 0.91 | 89.77% ± 0.70
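A minimal sketch of why the overall stride matters, assuming a square input of side $N$ (the value $N = 224$ below is an illustrative assumption, not a figure stated in this excerpt): the final feature map has spatial side of roughly $N/s$ for overall stride $s$, so

$$\frac{224}{32} = 7 \quad \text{versus} \quad \frac{224}{16} = 14,$$

i.e., halving the stride from 32 to 16 enlarges the final feature map from $7 \times 7$ to $14 \times 14$ and quadruples the number of spatial locations it retains.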
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Arabian, H.; Battistel, A.; Chase, J.G.; Moeller, K. Attention-Guided Network Model for Image-Based Emotion Recognition. Appl. Sci. 2023, 13, 10179. https://doi.org/10.3390/app131810179