IAE-Net: Incremental Learning-Based Attention-Enhanced DenseNet for Robust Facial Emotion Recognition
Abstract
1. Introduction
- IAE-Net is a compact facial emotion recognition (FER) framework that combines a DenseNet121 backbone with channel attention (CA) for channel re-weighting and deformable attention (DA) for feature alignment, improving discriminative emphasis and geometric consistency without relying on heavy ensembles.
- The pipeline is extended to an incremental learning setting through weight transfer, a class-balanced exemplar rehearsal buffer, and knowledge distillation, so that new emotion classes can be learned sequentially while catastrophic forgetting is reduced.
- Class-incremental performance is reported across multiple stages, and retention is quantified with the forgetting rate (FR) and total knowledge loss (TKL), providing evidence of stability under sequential updates in addition to offline accuracy.
- Extensive evaluation is conducted on FER2013, FERPlus, KDEF, and AffectNet under standard within-dataset protocols, including backbone and attention ablations under a unified training setup, along with Grad-CAM visualizations to inspect expression-relevant regions that drive predictions.
2. Methodology
2.1. Preprocessing and Data Preparation
2.2. DenseNet121 Backbone for Feature Extraction
2.3. Channel Attention (CA)
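As a rough illustration of channel re-weighting, the sketch below assumes a CBAM-style design in which global average pooling (GAP) and global max pooling (GMP) descriptors pass through a shared two-layer MLP, are summed, and are squashed by a sigmoid into per-channel weights. The function name `channel_attention`, the weight matrices `w1` and `w2`, and the toy values are hypothetical; the exact CA block used in IAE-Net may differ.

```python
import math

def channel_attention(fmap, w1, w2):
    """Re-weight the channels of a C x H x W feature map given as nested lists.

    Assumed CBAM-style design (not confirmed by the paper): GAP and GMP
    descriptors go through a shared MLP (w1: hidden x C, w2: C x hidden),
    are summed, and a sigmoid turns them into per-channel weights.
    """
    flat = [[v for row in ch for v in row] for ch in fmap]
    gap = [sum(ch) / len(ch) for ch in flat]  # global average pooling
    gmp = [max(ch) for ch in flat]            # global max pooling

    def mlp(desc):
        # Hidden layer with ReLU, then a linear projection back to C channels.
        hidden = [max(0.0, sum(w * d for w, d in zip(row, desc))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    logits = [a + m for a, m in zip(mlp(gap), mlp(gmp))]
    weights = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    scaled = [[[v * s for v in row] for row in ch] for ch, s in zip(fmap, weights)]
    return scaled, weights

# Toy usage: 2 channels of 2 x 2 activations, hidden size 1, made-up weights.
fmap = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 1.0]]]
w1 = [[0.5, -0.5]]
w2 = [[1.0], [-1.0]]
scaled, weights = channel_attention(fmap, w1, w2)
```

With these toy weights the first channel is emphasized and the second is suppressed, which is the qualitative behavior channel re-weighting is meant to provide.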
2.4. Deformable Attention (DA)
2.5. Classification Head and Training Objective
2.6. Continual Learning Under Sequential Updates
2.6.1. Exemplar Replay and Weight Transfer
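The class-balanced exemplar rehearsal buffer named in the contributions can be sketched as follows. This is a minimal sketch under stated assumptions: exemplars are chosen uniformly at random within each class, and the function name `build_balanced_buffer`, the `budget` parameter, and the toy image ids are hypothetical. The selection strategy actually used by the authors (for example, herding) is not specified here.

```python
import random
from collections import defaultdict

def build_balanced_buffer(samples, budget, seed=0):
    """Keep about budget / num_classes exemplars per class.

    samples: list of (item, label) pairs seen so far.
    Random per-class selection is an assumption, not the paper's stated rule.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append(item)
    if not by_class:
        return []
    per_class = budget // len(by_class)  # equal share for every class
    buffer = []
    for label in sorted(by_class):
        items = by_class[label]
        chosen = rng.sample(items, min(per_class, len(items)))
        buffer.extend((item, label) for item in chosen)
    return buffer

# Toy usage: "happy" is over-represented in the stream, yet the buffer is balanced.
seen = [(f"img{i}", "happy") for i in range(50)]
seen += [(f"img{i}", "fear") for i in range(50, 60)]
buffer = build_balanced_buffer(seen, budget=10)
```

Capping each class at the same share keeps frequent classes from crowding rare ones out of the replay set, which is the point of class balancing.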
2.6.2. Incremental Training Objective
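One common way to combine classification with knowledge distillation in class-incremental training is a cross-entropy term on the current labels plus a temperature-softened distillation term on the old-class outputs. The sketch below illustrates that general pattern only; the weighting `lam`, the `temperature`, and all function names are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    """Standard cross-entropy for one sample with an integer class label."""
    return -math.log(softmax(logits)[label])

def distillation(student_old_logits, teacher_logits, temperature=2.0):
    """Soft-target cross-entropy between teacher and student on old classes."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_old_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

def incremental_loss(student_logits, label, teacher_logits, lam=1.0, temperature=2.0):
    """Cross-entropy on all classes plus lam times distillation on old classes.

    Assumes the first len(teacher_logits) student outputs are the old classes.
    """
    n_old = len(teacher_logits)
    ce = cross_entropy(student_logits, label)
    kd = distillation(student_logits[:n_old], teacher_logits, temperature)
    return ce + lam * kd

# Toy usage: two old classes, one new class (index 2), made-up logits.
loss = incremental_loss([1.0, 0.2, 2.5], label=2, teacher_logits=[1.2, 0.1])
```

The distillation term is smallest when the student reproduces the previous model's old-class distribution, so it penalizes drift on earlier emotions while the cross-entropy term fits the new ones.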
3. Results and Discussion
3.1. Experimental Setup
3.2. Datasets
3.3. Comparative Analysis of the Proposed Model Against SOTA Methods
| Method | FER2013 (%) | FERPlus (%) | KDEF (%) | AffectNet (%) |
|---|---|---|---|---|
| VGG [27] | 65.80 | – | 86.75 | – |
| Mollahosseini [41] | 66.40 | – | – | – |
| InceptionV3 [24] | 68.86 | – | 90.25 | – |
| Inception-ResNetV2 [25] | 69.72 | – | 94.70 | – |
| DenseNet201 [23] | 68.52 | – | 92.52 | – |
| Transfer Learning DCNN [50] | 62.30 | – | – | – |
| VGG16 Transfer Learning [51] | 55.80 | 68.40 | – | 59.20 |
| FaceLiveNet [52] | 68.60 | – | – | – |
| Dense_FaceLiveNet [53] | 69.99 | – | 95.89 | – |
| Deep Fusion [54] | – | – | 98.30 | – |
| FMA+MLP [43] | 59.77 | – | 92.28 | – |
| FMA+LD [43] | 66.60 | – | 93.67 | – |
| FMA+SVM [43] | 61.11 | – | 92.05 | – |
| DCNN [50] | 63.80 | – | 89.54 | – |
| DBN [56] | – | – | 90.22 | – |
| CBiLSTM [28] | 58.09 | – | 94.23 | – |
| GA-Dense-FaceLiveNet [55] | – | – | 99.17 | – |
| PDREP [57] | 73.50 | – | 76.33 | – |
| GA [58] | 77.40 | – | – | – |
| iVABL [19] | 69.60 | – | 95.63 | – |
| VGG (tuned) [71] | 69.65 | – | 95.92 | – |
| ECAN [59] | 58.21 | – | 86.49 | 51.84 |
| AU-ViT [60] | – | 90.15 | – | 65.59 |
| VTFF [61] | – | 88.81 | – | 61.85 |
| ESSRN [62] | 50.98 | – | 80.83 | – |
| Novel CNN [63] | 72.16 | – | 89.93 | – |
| LCANet [64] | – | 91.43 | – | 64.43 |
| TAN + OLC [65] | – | 90.67 | – | 65.17 |
| MFER [66] | – | 91.09 | – | 67.06 |
| SSFER [67] | – | 85.82 | – | 65.37 |
| FERMixNet [68] | – | 90.58 | – | 66.40 |
| FMR-CapsNet [69] | – | 91.82 | – | 71.12 |
| EmotionLens [70] | – | – | – | 73.96 |
| IAE-Net (Ours) | 79.15 | 92.03 | 99.48 | 74.20 |
3.4. Ablation Study on Channel and Deformable Attention
| Dataset | Method | CA | DA | P | R | F1 | Acc (%) |
|---|---|---|---|---|---|---|---|
| FER2013 | Inception | × | ✓ | 0.6045 | 0.6416 | 0.6225 | 61.25 |
| FER2013 | ResNet50 | × | ✓ | – | – | – | 64.92 |
| FER2013 | Xception | × | ✓ | 0.6733 | 0.6831 | 0.6781 | 67.73 |
| FER2013 | DenseNet121 | ✓ | × | 0.7177 | 0.7514 | 0.7341 | 73.23 |
| FER2013 | Inception | ✓ | ✓ | 0.7291 | 0.7618 | 0.7451 | 74.31 |
| FER2013 | MobileNetV2 | ✓ | ✓ | 0.7512 | 0.5793 | 0.6541 | 66.00 |
| FER2013 | ConvNeXt | ✓ | ✓ | 0.7313 | 0.7213 | 0.7263 | 72.30 |
| FER2013 | ViT | ✓ | ✓ | 0.6738 | 0.7005 | 0.6869 | 68.55 |
| FER2013 | ResNet50 | ✓ | × | 0.7051 | 0.7185 | 0.7117 | 71.00 |
| FER2013 | Xception | ✓ | ✓ | 0.7059 | 0.7276 | 0.7166 | 71.90 |
| FER2013 | DenseNet121 | ✓ | ✓ | 0.7890 | 0.8015 | 0.7952 | 79.15 |
| FERPlus | Inception | × | ✓ | 0.8826 | 0.8795 | 0.8810 | 88.08 |
| FERPlus | ResNet50 | ✓ | ✓ | 0.8983 | 0.8942 | 0.8962 | 89.58 |
| FERPlus | Xception | × | ✓ | 0.8452 | 0.8401 | 0.8427 | 84.25 |
| FERPlus | DenseNet121 | × | ✓ | 0.8667 | 0.8770 | 0.8718 | 87.10 |
| FERPlus | Inception | ✓ | × | 0.8135 | 0.8249 | 0.8191 | 81.75 |
| FERPlus | MobileNetV2 | ✓ | ✓ | 0.8700 | 0.8599 | 0.8650 | 86.10 |
| FERPlus | ConvNeXt | ✓ | ✓ | 0.8585 | 0.8746 | 0.8665 | 87.10 |
| FERPlus | ViT | ✓ | ✓ | 0.8130 | 0.8356 | 0.8241 | 82.65 |
| FERPlus | ResNet50 | × | ✓ | 0.7890 | 0.7933 | 0.7912 | 79.02 |
| FERPlus | Xception | ✓ | × | 0.8382 | 0.8286 | 0.8334 | 83.23 |
| FERPlus | DenseNet121 | ✓ | ✓ | 0.9080 | 0.9340 | 0.9208 | 92.03 |
| KDEF | Inception | × | ✓ | 0.9328 | 0.9204 | 0.9266 | 92.63 |
| KDEF | ResNet50 | ✓ | ✓ | 0.8981 | 0.8866 | 0.8923 | 89.15 |
| KDEF | Xception | ✓ | × | 0.9278 | 0.9361 | 0.9320 | 93.15 |
| KDEF | DenseNet121 | × | ✓ | 0.9545 | 0.9597 | 0.9570 | 95.86 |
| KDEF | Inception | ✓ | × | 0.9746 | 0.9712 | 0.9729 | 97.28 |
| KDEF | MobileNetV2 | ✓ | ✓ | 0.8913 | 0.8939 | 0.8926 | 88.95 |
| KDEF | ConvNeXt | ✓ | ✓ | 0.9520 | 0.9520 | 0.9502 | 95.10 |
| KDEF | ViT | ✓ | ✓ | 0.9215 | 0.9313 | 0.9264 | 92.35 |
| KDEF | ResNet50 | × | ✓ | 0.8642 | 0.8780 | 0.8710 | 87.00 |
| KDEF | Xception | × | ✓ | 0.9322 | 0.9458 | 0.9400 | 93.95 |
| KDEF | DenseNet121 | ✓ | ✓ | 0.9935 | 0.9960 | 0.9948 | 99.48 |
| AffectNet | Inception | × | × | 0.6193 | 0.6596 | 0.6388 | 64.78 |
| AffectNet | ResNet50 | ✓ | × | 0.6344 | 0.6436 | 0.6390 | 63.53 |
| AffectNet | Xception | ✓ | × | 0.6541 | 0.6638 | 0.6589 | 66.33 |
| AffectNet | DenseNet121 | ✓ | × | 0.7261 | 0.7427 | 0.7343 | 72.90 |
| AffectNet | Inception | × | ✓ | 0.7132 | 0.7302 | 0.7216 | 71.55 |
| AffectNet | MobileNetV2 | ✓ | ✓ | 0.6223 | 0.6416 | 0.6318 | 62.65 |
| AffectNet | ConvNeXt | ✓ | ✓ | 0.6912 | 0.6744 | 0.6827 | 69.00 |
| AffectNet | ViT | ✓ | ✓ | 0.6456 | 0.6584 | 0.6519 | 64.60 |
| AffectNet | ResNet50 | × | × | 0.6057 | 0.5906 | 0.5981 | 59.48 |
| AffectNet | Xception | × | ✓ | 0.6789 | 0.6994 | 0.6890 | 68.73 |
| AffectNet | DenseNet121 | ✓ | ✓ | 0.7363 | 0.7533 | 0.7447 | 74.20 |
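As a quick sanity check on how the P, R, and F1 columns relate, the sketch below recomputes F1 as the harmonic mean of precision and recall for three DenseNet121 rows with both CA and DA enabled. It assumes the table's F1 is the standard harmonic mean of the listed P and R, which the section does not state explicitly; the helper name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# DenseNet121 with CA and DA; values copied from the table as (P, R, reported F1).
rows = {
    "FER2013": (0.7890, 0.8015, 0.7952),
    "FERPlus": (0.9080, 0.9340, 0.9208),
    "AffectNet": (0.7363, 0.7533, 0.7447),
}
recomputed = {name: round(f1_score(p, r), 4) for name, (p, r, _) in rows.items()}
for name, (_, _, reported) in rows.items():
    print(name, recomputed[name], reported)
```

For these three rows the recomputed values agree with the reported F1 to four decimal places.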
3.5. Incremental Learning Evaluation
3.6. Qualitative Evaluation of Visual Results
3.7. Efficiency and Complexity Analysis
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| FER | Facial Emotion Recognition |
| DL | Deep Learning |
| ML | Machine Learning |
| CNN | Convolutional Neural Network |
| ND | Neurological and neurodegenerative disorder |
| IAE-Net | Incremental Attention-Enhanced Network |
| CA | Channel Attention |
| DA | Deformable Attention |
| GAP | Global Average Pooling |
| GMP | Global Max Pooling |
| SGD | Stochastic Gradient Descent |
References
1. Mannepalli, K.; Sastry, P.N.; Suman, M. A novel adaptive fractional deep belief networks for speaker emotion recognition. Alex. Eng. J. 2017, 56, 485–497.
2. Nan, Y.; Ju, J.; Hua, Q.; Zhang, H.; Wang, B. A-MobileNet: An approach of facial expression recognition. Alex. Eng. J. 2022, 61, 4435–4444.
3. Jeong, M.; Ko, B.C. Driver’s facial expression recognition in real-time for safe driving. Sensors 2018, 18, 4270.
4. Shen, F.; Dai, G.; Lin, G.; Zhang, J.; Kong, W.; Zeng, H. EEG-based emotion recognition using 4D convolutional recurrent neural network. Cogn. Neurodyn. 2020, 14, 815–828.
5. Yun, S.S.; Choi, J.; Park, S.K.; Bong, G.Y.; Yoo, H. Social skills training for children with autism spectrum disorder using a robotic behavioral intervention system. Autism Res. 2017, 10, 1306–1323.
6. Kaulard, K.; Cunningham, D.W.; Bülthoff, H.H.; Wallraven, C. The MPI facial expression database—A validated database of emotional and conversational facial expressions. PLoS ONE 2012, 7, e32321.
7. Canal, F.Z.; Müller, T.R.; Matias, J.C.; Scotton, G.G.; de Sa Junior, A.R.; Pozzebon, E.; Sobieranski, A.C. A survey on facial emotion recognition techniques: A state-of-the-art literature review. Inf. Sci. 2022, 582, 593–617.
8. Mellouk, W.; Handouzi, W. Facial emotion recognition using deep learning: Review and insights. Procedia Comput. Sci. 2020, 175, 689–694.
9. Afzal, S.; Khan, H.A.; Ali, S.; Lee, J.W. Virtual reality environment: Detecting and inducing emotions. In Proceedings of the 2025 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 11–14 January 2025; pp. 1–4.
10. Ricciardi, L.; Visco-Comandini, F.; Erro, R.; Morgante, F.; Bologna, M.; Fasano, A.; Ricciardi, D.; Edwards, M.J.; Kilner, J. Facial emotion recognition and expression in Parkinson’s disease: An emotional mirror mechanism? PLoS ONE 2017, 12, e0169110.
11. Lin, J.; Chen, Y.; Wen, H.; Yang, Z.; Zeng, J. Weakness of eye closure with central facial paralysis after unilateral hemispheric stroke predicts a worse outcome. J. Stroke Cerebrovasc. Dis. 2017, 26, 834–841.
12. Ferrari, C.; Berretti, S.; Pala, P.; Del Bimbo, A. Measuring 3D face deformations from RGB images of expression rehabilitation exercises. Virtual Real. Intell. Hardw. 2022, 4, 306–323.
13. Gomez, L.F.; Morales, A.; Orozco-Arroyave, J.R.; Daza, R.; Fierrez, J. Improving Parkinson Detection Using Dynamic Features From Evoked Expressions in Video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Nashville, TN, USA, 20–25 June 2021; pp. 1562–1570.
14. Rakshith, D.; Kenchannavar, H. Hybrid deep optimal network for recognizing emotions using facial expressions at real time. Int. J. Intell. Syst. Appl. 2024, 16, 47–58.
15. Tanaka, H.; Umeda, R.; Kurogi, T.; Nagata, Y.; Ishimaru, D.; Fukuhara, K.; Nakai, S.; Tenjin, M.; Nishikawa, T. Clinical utility of an assessment scale for engagement in activities for patients with moderate-to-severe dementia: Additional analysis. Psychogeriatrics 2022, 22, 433–444.
16. Bevilacqua, V.; D’Ambruoso, D.; Mandolino, G.; Suma, M. A new tool to support diagnosis of neurological disorders by means of facial expressions. In Proceedings of the IEEE International Symposium on Medical Measurements and Applications, Bari, Italy, 30–31 May 2011; pp. 544–549.
17. Dantcheva, A.; Bilinski, P.; Nguyen, H.T.; Broutart, J.C.; Bremond, F. Expression recognition for severely demented patients in music reminiscence-therapy. In Proceedings of the 2017 25th European Signal Processing Conference (EUSIPCO), Kos, Greece, 28 August–2 September 2017; pp. 783–787.
18. Jin, B.; Qu, Y.; Zhang, L.; Gao, Z. Diagnosing Parkinson disease through facial expression recognition: Video analysis. J. Med. Internet Res. 2020, 22, e18697.
19. Kerr-Gaffney, J.; Mason, L.; Jones, E.; Hayward, H.; Ahmad, J.; Harrison, A.; Loth, E.; Murphy, D.; Tchanturia, K. Emotion recognition abilities in adults with anorexia nervosa are associated with autistic traits. J. Clin. Med. 2020, 9, 1057.
20. Carcagnì, P.; Del Coco, M.; Leo, M.; Distante, C. Facial expression recognition and histograms of oriented gradients: A comprehensive study. SpringerPlus 2015, 4, 645.
21. Soyel, H.; Demirel, H. Improved SIFT matching for pose robust facial expression recognition. In Proceedings of the 2011 IEEE International Conference on Automatic Face & Gesture Recognition (FG), Santa Barbara, CA, USA, 21–25 March 2011.
22. Chen, L.; Zhou, C.; Shen, L. Facial expression recognition based on SVM in E-learning. IERI Procedia 2012, 2, 781–787.
23. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
24. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
25. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. Proc. AAAI Conf. Artif. Intell. 2017, 31, 4278–4284.
26. Tang, Y. Deep learning using linear support vector machines. arXiv 2013, arXiv:1306.0239.
27. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
28. Liang, D.; Liang, H.; Yu, Z.; Zhang, Y. Deep convolutional BiLSTM fusion network for facial expression recognition. Vis. Comput. 2020, 36, 499–508.
29. Pan, X.; Ying, G.; Chen, G.; Li, H.; Li, W. A deep spatial and temporal aggregation framework for video-based facial expression recognition. IEEE Access 2019, 7, 48807–48815.
30. Sun, N.; Li, Q.; Huan, R.; Liu, J.; Han, G. Deep spatial-temporal feature fusion for facial expression recognition in static images. Pattern Recognit. Lett. 2019, 119, 49–61.
31. Ye, Y.; Pan, Y.; Liang, Y.; Pan, J. A cascaded spatiotemporal attention network for dynamic facial expression recognition. Appl. Intell. 2023, 53, 5402–5415.
32. Huo, H.; Yu, Y.; Liu, Z. Facial expression recognition based on improved depthwise separable convolutional network. Multimed. Tools Appl. 2023, 82, 18635–18652.
33. Zhu, Q.; Mao, Q.; Jia, H.; Noi, O.E.N.; Tu, J. Convolutional relation network for facial expression recognition in the wild with few-shot learning. Expert Syst. Appl. 2022, 189, 116046.
34. Khan, T.; Choi, G.; Lee, S. EFFNet-CA: An efficient driver distraction detection based on multiscale features extractions and channel attention mechanism. Sensors 2023, 23, 3835.
35. Wadhawan, R.; Gandhi, T.K. Landmark-aware and part-based ensemble transfer learning network for static facial expression recognition from images. IEEE Trans. Artif. Intell. 2022, 4, 349–361.
36. Li, R.; Ren, C.; Zhang, X.; Hu, B. A novel ensemble learning method using multiple objective particle swarm optimization for subject-independent EEG-based emotion recognition. Comput. Methods Programs Biomed. 2022, 140, 105080.
37. Khan, T.; Yasir, M.; Choi, C. Attention-enhanced optimized deep ensemble network for effective facial emotion recognition. Alex. Eng. J. 2025, 119, 111–123.
38. Lin, Z.; Wang, Y.; Zhou, Y.; Du, F.; Yang, Y. MLM-EOE: Automatic depression detection via sentimental annotation and multi-expert ensemble. IEEE Trans. Affect. Comput. 2025, 16, 2842–2858.
39. Wang, Y.; Lin, Z.; Yang, C.; Zhou, Y.; Yang, Y. Automatic depression recognition with an ensemble of multimodal spatio-temporal routing features. IEEE Trans. Affect. Comput. 2025, 16, 1855–1872.
40. Afzal, S.; Khan, H.A.; Piran, M.J.; Lee, J.W. A comprehensive survey on affective computing: Challenges, trends, applications, and future directions. IEEE Access 2024, 12, 96150–96168.
41. Mollahosseini, A.; Chan, D.; Mahoor, M.H. Going deeper in facial expression recognition using deep neural networks. In Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, 7–10 March 2016; IEEE: Piscataway, NJ, USA, 2016.
42. Lasri, I.; Riadsolh, A.; Elbelkacemi, M. Facial emotion recognition of deaf and hard-of-hearing students for engagement detection using deep learning. Educ. Inf. Technol. 2023, 28, 4069–4092.
43. Solis-Arrazola, M.A.; Sanchez-Yañez, R.E.; Garcia-Capulin, C.H.; Rostro-Gonzalez, H. Enhancing image-based facial expression recognition through muscle activation-based facial feature extraction. Comput. Vis. Image Underst. 2024, 240, 103927.
44. Hussain, A.; Ullah, W.; Khan, N.; Khan, Z.A.; Yar, H.; Baik, S.W. Class-incremental learning network for real-time anomaly recognition in surveillance environments. Pattern Recognit. 2026, 170, 112064.
45. Goodfellow, I.J.; Erhan, D.; Carrier, P.L.; Courville, A.; Mirza, M.; Hamner, B.; Cukierski, W.; Tang, Y.; Thaler, D.; Lee, D.H.; et al. Challenges in Representation Learning: A report on three machine learning contests. arXiv 2013, arXiv:1307.0414.
46. Barsoum, E.; Zhang, C.; Ferrer, C.C.; Zhang, Z. Training Deep Networks for Facial Expression Recognition with Crowd-Sourced Label Distribution. In Proceedings of the 18th ACM International Conference on Multimodal Interaction (ICMI), Tokyo, Japan, 12–16 November 2016; pp. 279–283.
47. Calvo, M.G.; Lundqvist, D. Facial expressions of emotion (KDEF): Identification under different display-duration conditions. Behav. Res. Methods 2008, 40, 109–115.
48. Mollahosseini, A.; Hasani, B.; Mahoor, M.H. AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild. IEEE Trans. Affect. Comput. 2019, 10, 18–31.
49. Kaggle. AffectNet Dataset (Kaggle Subset). 2024. Available online: https://www.kaggle.com/datasets/mstjebashazida/affectnet (accessed on 1 March 2026).
50. Akhand, M.A.H.; Roy, S.; Siddique, N.; Kamal, M.A.S.; Shimamura, T. Facial emotion recognition using transfer learning in the deep CNN. Electronics 2021, 10, 1036.
51. Avcı, S.O.; Akay, O. Employment and Investigation of Various CNN Models and Datasets for Facial Expression Recognition and Classification. In Proceedings of the 2023 14th International Conference on Electrical and Electronics Engineering (ELECO), Bursa, Turkiye, 30 November–2 December 2023; pp. 1–5.
52. Ming, Z.; Chazalon, J.; Luqman, M.M.; Visani, M.; Burie, J.C. FaceLiveNet: End-to-end networks combining face verification with interactive facial expression-based liveness detection. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 3507–3512.
53. Hung, J.C.; Lin, K.C.; Lai, N.X. Recognizing learning emotion based on convolutional neural networks and transfer learning. Appl. Soft Comput. 2019, 84, 105724.
54. Sun, Z.; Zhang, H.; Bai, J.; Liu, M.; Hu, Z. A discriminatively deep fusion approach with improved conditional GAN (im-cGAN) for facial expression recognition. Pattern Recognit. 2023, 135, 109157.
55. Aghabeigi, F.; Nazari, S.; Osati Eraghi, N. An optimized facial emotion recognition architecture based on a deep convolutional neural network and genetic algorithm. Signal Image Video Process. 2024, 18, 1119–1129.
56. Vedantham, R.; Reddy, E.S. A robust feature extraction with optimized DBN-SMO for facial expression recognition. Multimed. Tools Appl. 2020, 79, 21487–21512.
57. Chen, X.; Li, D.; Tang, Y.; Huang, S.; Wu, Y.; Wu, Y. Pairwise dependency-based robust ensemble pruning for facial expression recognition. Multimed. Tools Appl. 2023, 83, 37089–37117.
58. Nida, N.; Yousaf, M.H.; Irtaza, A.; Javed, S.; Velastin, S.A. Spatial deep feature augmentation technique for FER using genetic algorithm. Neural Comput. Appl. 2024, 36, 4563–4581.
59. Li, S.; Deng, W. A deeper look at facial expression dataset bias. IEEE Trans. Affect. Comput. 2020, 13, 881–893.
60. Mao, S.; Li, X.; Wu, Q.; Peng, X. Au-aware vision transformers for biased facial expression recognition. arXiv 2022, arXiv:2211.06609.
61. Ma, F.; Sun, B.; Li, S. Facial Expression Recognition with Visual Transformers and Attentional Selective Fusion. IEEE Trans. Affect. Comput. 2023, 14, 1236–1248.
62. Xu, X.; Zong, Y.; Lu, C.; Jiang, X. Enhanced sample self-revised network for cross-dataset facial expression recognition. Entropy 2022, 24, 1475.
63. Khan, N.; Singh, A.V.; Agrawal, R. Novel Approach of Facial Expression Recognition for Cross-Datasets. In Proceedings of the 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 6–8 July 2023; pp. 1–7.
64. Hu, P.; Tang, X.; Yang, L.; Kong, C.; Xia, D. LCANet: A model for analysis of students real-time sentiment by integrating attention mechanism and joint loss function. Complex Intell. Syst. 2025, 11, 27.
65. Ma, F.; Sun, B.; Li, S. Transformer-Augmented Network with Online Label Correction for Facial Expression Recognition. IEEE Trans. Affect. Comput. 2024, 15, 593–605.
66. Xu, J.; Li, Y.; Yang, G.; He, L.; Luo, K. Multiscale Facial Expression Recognition Based on Dynamic Global and Static Local Attention. IEEE Trans. Affect. Comput. 2025, 16, 683–696.
67. Oadud, M.A.; HaiYu, H.; Kayes, A. Facial Expression Recognition with Limited Labels: A Self-Supervised and Semi-Supervised Approach: Overcoming Label Scarcity in FER. In Proceedings of the 2025 10th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 16–18 May 2025; pp. 1171–1178.
68. Huang, Y.; Peng, J.; Zhang, W.; Zhao, T.; Chen, G.; Tan, S.; Yi, F.; Wang, L. FERMixNet: An Occlusion Robust Facial Expression Recognition Model with Facial Mixing Augmentation and Mid-Level Representation Learning. IEEE Trans. Affect. Comput. 2025, 16, 639–654.
69. Verma, B. In-the-wild facial emotion recognition using relation-aware geometric features and CapsNet. Comput. Electr. Eng. 2025, 128, 110685.
70. Singh, M.; Bhargava, K.; Natarajan, K. EmotionLens: Optimizing FER with a Lightweight Residual CNN Architecture. In Proceedings of the 2025 3rd International Conference on Sustainable Computing and Data Communication Systems (ICSCDS), Erode, India, 6–8 August 2025; pp. 1–6.
71. Khaireddin, Y.; Chen, Z. Facial emotion recognition: State of the art performance on FER2013. arXiv 2021, arXiv:2105.03588.






| Dataset | Images | Classes | Train | Val | Test | Input Size | Reference |
|---|---|---|---|---|---|---|---|
| FER2013 | 35,887 | 7 | 28,709 | 3589 | 3589 | 48 × 48 | [45] |
| FERPlus | 35,887 | 8 | 28,709 | 3589 | 3589 | 48 × 48 | [46] |
| KDEF | 4900 | 7 | 3920 | 490 | 490 | 224 × 224 | [47] |
| AffectNet (Kaggle subset) | 28,175 | 8 | 22,540 | 2817 | 2818 | 224 × 224 | [48,49] |
| Backbone | Method | P (%) | R (%) | F1 (%) | Acc (%) | Params (M) |
|---|---|---|---|---|---|---|
| DenseNet121 | Baseline | 87.88 | 88.96 | 88.41 | 88.59 | 7.31 |
| DenseNet121 | + STN | 91.84 | 90.45 | 91.14 | 91.25 | 7.33 |
| DenseNet121 | + DeformConv | 93.80 | 91.41 | 92.59 | 92.50 | 10.07 |
| DenseNet121 | + DA | 97.21 | 95.07 | 96.13 | 96.35 | 8.44 |
| Incremental Step | KDEF P | KDEF R | KDEF F1 | KDEF Acc | KDEF FR | FER2013 P | FER2013 R | FER2013 F1 | FER2013 Acc | FER2013 FR | AffectNet P | AffectNet R | AffectNet F1 | AffectNet Acc | AffectNet FR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Stage 1 | 99.80 | 99.70 | 99.75 | 99.75 | 0 | 85.20 | 84.40 | 84.80 | 84.80 | 0 | 86.10 | 84.80 | 85.45 | 85.40 | 0 |
| Stage 2 | 99.74 | 99.65 | 99.69 | 99.70 | 0.05 | 83.90 | 83.10 | 83.50 | 83.60 | 0.40 | 84.20 | 83.10 | 83.65 | 83.60 | 0.32 |
| Stage 3 | 99.67 | 99.59 | 99.63 | 99.63 | 0.11 | 82.70 | 82.00 | 82.30 | 82.40 | 0.65 | 82.90 | 81.70 | 82.30 | 82.20 | 0.58 |
| Stage 4 | 99.62 | 99.54 | 99.58 | 99.58 | 0.09 | 81.80 | 81.00 | 81.40 | 81.50 | 0.58 | 81.30 | 80.50 | 80.90 | 80.80 | 0.47 |
| Stage 5 | 99.57 | 99.49 | 99.53 | 99.53 | 0.16 | 80.90 | 80.10 | 80.50 | 80.60 | 0.92 | 79.80 | 79.00 | 79.40 | 79.30 | 0.93 |
| Stage 6 | 99.54 | 99.46 | 99.50 | 99.48 | 0.20 | 80.20 | 79.40 | 79.80 | 79.90 | 0.88 | 78.60 | 77.70 | 78.15 | 78.10 | 0.88 |
| Final evaluation | – | – | – | – | – | 79.50 | 78.80 | 79.10 | 79.15 | 1.35 | 77.00 | 75.90 | 76.45 | 74.20 | 1.42 |

TKL per dataset: KDEF 0.0871; FER2013 0.5975; AffectNet 0.5750.

| Dataset | Method | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|
| KDEF | Naive fine-tuning | 97.15 | 95.51 | 96.32 | 96.10 |
| KDEF | LwF | 93.69 | 95.10 | 94.39 | 94.35 |
| KDEF | Proposed network | 99.54 | 99.46 | 99.50 | 99.48 |
| FER2013 | Naive fine-tuning | 75.55 | 78.96 | 77.22 | 76.75 |
| FER2013 | LwF | 70.67 | 69.72 | 70.19 | 71.25 |
| FER2013 | Proposed network | 79.50 | 78.80 | 79.10 | 79.15 |
| AffectNet | Naive fine-tuning | 68.37 | 69.57 | 68.97 | 69.85 |
| AffectNet | LwF | 71.61 | 69.59 | 70.59 | 70.25 |
| AffectNet | Proposed network | 77.00 | 75.90 | 76.45 | 74.20 |
| Method | CPU FPS | GPU FPS | Batch Size | FLOPs (G) | Latency (ms) | Params (M) | Accuracy (%) |
|---|---|---|---|---|---|---|---|
| MobileNetV2 | 16–19 | 61–63 | 64 | 0.57 | 15.9–16.4 | 4.9 | 88.95 |
| InceptionV3 | 5–6 | 48–51 | 64 | 5.72 | 19.6–20.8 | 27.7 | 97.28 |
| ResNet50 | 6–8 | 36–49 | 64 | 4.09 | 20.4–27.8 | 29.5 | 89.15 |
| ConvNeXtTiny | 6–9 | 49–52 | 64 | 4.49 | 19.2–20.4 | 28.8 | 95.10 |
| Xception | 5–6 | 44–47 | 64 | 8.40 | 21.3–22.7 | 26.8 | 93.95 |
| Proposed Network | 11–14 | 54–58 | 64 | 2.94 | 17.2–18.5 | 8.7 | 99.48 |
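The GPU FPS and latency columns appear to be reciprocals of each other (FPS ≈ 1000 / latency in ms). This is an observation from the table values, not a relationship the section states, so the sketch below is only a consistency check; the helper name `fps_from_latency_ms` is illustrative.

```python
def fps_from_latency_ms(latency_ms):
    """Frames per second implied by a per-image latency in milliseconds."""
    return 1000.0 / latency_ms

# Latency range reported for the proposed network: 17.2 to 18.5 ms.
slowest = fps_from_latency_ms(18.5)  # longest latency gives the lowest FPS
fastest = fps_from_latency_ms(17.2)  # shortest latency gives the highest FPS
print(round(slowest), round(fastest))  # prints "54 58"
```

The result matches the 54–58 GPU FPS range listed for the proposed network, and the same conversion reproduces the GPU FPS ranges of the other rows after rounding.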
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Khan, H.A.; Lee, J.-H. IAE-Net: Incremental Learning-Based Attention-Enhanced DenseNet for Robust Facial Emotion Recognition. Mathematics 2026, 14, 1023. https://doi.org/10.3390/math14061023

