An Explainable Framework for Mental Health Monitoring Using Lightweight and Privacy-Preserving Federated Facial Emotion Recognition
Abstract
1. Introduction
- Lightweight and Edge-Deployable FER Architecture: A compact CNN optimized for efficient training and low-latency inference, balancing accuracy against computational cost. With only 1.45 M parameters and 0.107 GFLOPs, the model reaches 75.5% and 74.3% average accuracy under centralized and federated configurations, respectively, and generalizes across Facial Emotion Recognition 2013 (FER2013), the Real-world Affective Faces Database (RAF-DB), and Expression in the Wild (ExpW) better than previously reported cross-dataset FER architectures, while remaining suitable for real-time emotion monitoring on resource-constrained edge devices.
- Federated Learning with Integrated Explainability for Trustworthy Model Selection: A privacy-preserving federated learning framework that explicitly incorporates explainability into evaluation and model optimization (a minimal aggregation sketch follows this list). The framework combines federated training with multi-level interpretability analysis, enabling systematic comparison of model behavior across configurations, and uses explainability to guide model selection, enhancing transparency, interpretability, and accountability for deployment in privacy-sensitive mental health monitoring systems.
- Quantitative Explainability Assessment: A systematic evaluation of Gradient-weighted Class Activation Mapping++ (Grad-CAM++) explanations using perturbation metrics, including Insertion Area Under Curve (IAUC), Deletion Area Under Curve (DAUC), Average Drop (AD), Increase in Confidence (IC), Average Drop in Accuracy (ADA), and Active Pixel Ratio, to provide objective, reproducible, and comparable measures of explanation quality and model trustworthiness.
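The aggregation step of the federated framework is not reproduced in this outline; as a minimal sketch, the snippet below shows sample-weighted federated averaging in the style of FedAvg [60], on which the privacy-preserving setup described above builds. The client grouping, weight lists, and tensor shapes are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Sample-weighted federated averaging (FedAvg, McMahan et al. [60]):
    each layer of the global model is the average of the clients' layers,
    weighted by the number of local training samples."""
    total = sum(client_sizes)
    return [
        sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in range(len(client_weights[0]))
    ]

# Illustrative round with three clients, e.g., one per dataset shard
# (FER2013, RAF-DB, ExpW); the weights here are random stand-ins.
rng = np.random.default_rng(0)
clients = [[rng.normal(size=(3, 3)), rng.normal(size=3)] for _ in range(3)]
sizes = [13269, 9983, 55739]  # training-set sizes from Section 3.3
global_weights = fedavg(clients, sizes)
print([w.shape for w in global_weights])  # [(3, 3), (3,)]
```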
2. Related Work
2.1. FER Systems and Lightweight Models
2.2. FER for Assistive, Clinical, and Contextual Applications
2.3. Research Gap
- Limited Explainability and Lack of Standardized Quantitative XAI Evaluation: Many mental-health-focused FER studies either ignore explainability [24,25,26] or use it only as a basic visualization tool, limiting the trustworthiness and interpretability of their decisions. Even where XAI is applied, explanation reliability is assessed mainly by subjective visual inspection rather than by standardized quantitative metrics [27,28,31,32,33,34,35,37,38].
- High Computational Overhead and Lack of Lightweight Architecture Optimization: Most FER systems rely on computationally expensive deep learning models unsuitable for resource-constrained environments [26,27,28,29,34,35,36,37,38,40,42,43,44]. Current research lacks lightweight architectures that simultaneously achieve high accuracy and reliable XAI interpretability.
- Limited Cross-Dataset Generalization and Privacy-Preserving Deployment: FER systems still struggle to generalize to real-world clinical environments, as many studies rely on a single dataset or on controlled, posed emotion data that fail to capture in-the-wild variability [31,32,33,34,39,41]. Federated learning settings also remain limited, often involving few clients and acted datasets, which weakens real-world generalization and deployability [43,45,46]. Some approaches, moreover, remain conceptual or compromise privacy, without achieving true cross-dataset learning under privacy-preserving constraints [47].
3. Proposed Methodology
3.1. FER for Mental Health
3.2. Federated Learning Collaborative Training
3.3. Datasets
- Real-world Affective Faces Database (RAF-DB) [54]: contributes 12,488 images split into 9983 training and 2505 test samples, capturing a wide demographic range with variations in age, ethnicity, gender, head pose, and illumination. We retained the official split of approximately 80:20 (train:test) to ensure comparability with prior work.
- Facial Emotion Recognition 2013 (FER2013) [55]: provides 16,482 images under challenging in-the-wild conditions such as occlusion and variable lighting, split into 13,269 training and 3213 test samples (80:20).
- Expression in the Wild (ExpW) [56]: adds situational diversity with 79,650 images of both posed and spontaneous expressions, split into 55,739 training and 23,911 test samples (70:30). The three splits are summarized in the sketch after this list.
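For quick reference, the published counts can be recorded as a small configuration dictionary; the sketch below simply verifies the stated train:test ratios from the numbers above (a bookkeeping aid, not part of the authors' training pipeline).

```python
# Published image counts from Section 3.3; the ratios are derived from
# these figures rather than assumed.
splits = {
    "RAF-DB":  {"train": 9983,  "test": 2505},   # official split, ~80:20
    "FER2013": {"train": 13269, "test": 3213},   # ~80:20
    "ExpW":    {"train": 55739, "test": 23911},  # ~70:30
}

for name, s in splits.items():
    total = s["train"] + s["test"]
    print(f"{name}: {total} images, train fraction = {s['train'] / total:.3f}")
# RAF-DB: 12488 images, train fraction = 0.799
# FER2013: 16482 images, train fraction = 0.805
# ExpW: 79650 images, train fraction = 0.700
```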
3.4. Explainability, Performance, and Trustworthiness-Driven CNN Architecture Optimization
Grad-CAM++ Explainability Evaluation
3.5. Assessing Grad-CAM++ Explanations with Perturbation-Based Metrics
- Insertion and Deletion AUC (IAUC/DAUC), which quantify how the model’s confidence evolves as the most salient regions are progressively revealed or removed.
- Average Drop (AD) and Average Drop in Accuracy (ADA), which measure the reduction in confidence and accuracy when the important regions highlighted by the explanation are removed.
- Increase in Confidence (IC), which measures the fraction of cases in which retaining only the salient regions yields higher confidence than the full image. A minimal computational sketch of the insertion/deletion procedure follows this list.
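As a concrete illustration of how the insertion and deletion curves are traced, the sketch below implements a generic single-image pass in NumPy. It assumes a `model(x)` callable returning the target-class probability and a saliency map from any explainer; the step count and blank-image baseline are assumptions rather than the paper's exact perturbation schedule (given in Appendix A).

```python
import numpy as np

def insertion_deletion_auc(model, image, saliency, steps=50):
    """Trace insertion and deletion confidence curves for one image.

    `model` is any callable returning the target-class probability for an
    input of the same shape as `image`; `saliency` is an (H, W) explanation
    map, e.g., from Grad-CAM++. Pixels are ranked once by saliency and then
    progressively revealed (insertion) or removed (deletion)."""
    order = np.argsort(saliency.ravel())[::-1]   # most salient pixels first
    baseline = np.zeros_like(image)              # blank baseline (assumption)
    ins_curve, del_curve = [], []
    for j in range(1, steps + 1):
        k = int(order.size * j / steps)
        idx = np.unravel_index(order[:k], image.shape[:2])
        revealed = baseline.copy()
        revealed[idx] = image[idx]               # top-k pixels inserted
        removed = image.copy()
        removed[idx] = baseline[idx]             # top-k pixels deleted
        ins_curve.append(model(revealed))
        del_curve.append(model(removed))
    # Averaging each curve approximates its area under the curve on [0, 1].
    return float(np.mean(ins_curve)), float(np.mean(del_curve))
```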
4. Results Analysis and Discussion
4.1. Optimization and Selection of CNN Architectures for Explainability Analysis
4.2. Reproducibility and Statistical Evaluation
4.3. Discussion on Client Scaling and Data Characteristics
5. Comparison with Cross-Dataset Evaluation Methods
6. Edge Deployment and Runtime Evaluation
7. Ethical and Practical Considerations for Deployment
8. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Perturbation-Based Metrics
Appendix A.1. Insertion and Deletion Area Under Curve
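Assuming the standard formulation of Petsiuk et al. [140], let $h_{\mathrm{ins}}(k)$ and $h_{\mathrm{del}}(k)$ denote the model's confidence for the target class after a fraction $k$ of the most salient pixels has been inserted into a baseline image or deleted from the original, respectively. The two areas are then

$$\mathrm{IAUC} = \int_0^1 h_{\mathrm{ins}}(k)\,dk, \qquad \mathrm{DAUC} = \int_0^1 h_{\mathrm{del}}(k)\,dk,$$

approximated in practice by averaging the confidence curve over a fixed number of perturbation steps.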
Appendix A.2. Average Drop
- $Y_i^c$ represents the model's confidence for class $c$ of the $i$th sample when the entire image is used as input.
- $\tilde{Y}_i^c$ is the model's confidence for class $c$ of the $i$th sample when the important regions are removed.
- $N$ is the total number of images.
- $\mathbb{1}[\cdot]$ is the indicator function that equals 1 if its condition is true and 0 otherwise.
- $\hat{y}_i$ and $\tilde{y}_i$ denote the predicted classes for the original and masked images, respectively.
- $y_i$ represents the true class label for the $i$th sample.
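With the symbols above, one consistent reconstruction of the two scores, following the definitions popularized by Grad-CAM++ [114] (published versions may scale by 100 to report percentages), is

$$\mathrm{AD} = \frac{1}{N}\sum_{i=1}^{N}\frac{\max\!\left(0,\,Y_i^c-\tilde{Y}_i^c\right)}{Y_i^c}, \qquad \mathrm{ADA} = \frac{1}{N}\sum_{i=1}^{N}\left(\mathbb{1}\!\left[\hat{y}_i=y_i\right]-\mathbb{1}\!\left[\tilde{y}_i=y_i\right]\right).$$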
Appendix A.3. Increase in Confidence
- $Y_i^c$ represents the model's confidence for class $c$ of the $i$th sample when the entire image is used as input.
- $O_i^c$ represents the model's confidence for class $c$ of the $i$th sample when only the important regions identified by the explanation map are kept and the remaining regions are replaced with the image mean.
- $\mathbb{1}[\cdot]$ is an indicator function that returns 1 when its argument is true (i.e., when $O_i^c > Y_i^c$) and 0 otherwise.
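Under the same notational assumptions, the metric reads

$$\mathrm{IC} = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}\!\left[O_i^c > Y_i^c\right].$$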
References
- World Health Organization. Mental Disorders. 2022. Available online: https://www.who.int/news-room/fact-sheets/detail/mental-disorders (accessed on 1 November 2025).
- World Health Organization. Comprehensive Mental Health Action Plan 2013–2030. 2021. Available online: https://www.who.int/publications/i/item/9789240031029 (accessed on 25 August 2025).
- Fredrickson, B.L.; Losada, M.F. Positive affect and the complex dynamics of human flourishing. Am. Psychol. 2005, 60, 678–686. [Google Scholar] [CrossRef]
- Ong, A.D.; Bergeman, C.; Chow, S.M. Positive emotions as a basic building block of resilience in adulthood. In Handbook of Adult Resilience; Guilford Press: New York, NY, USA, 2010; pp. 81–93. [Google Scholar]
- Barker, D.; Tippireddy, M.K.R.; Farhan, A.; Ahmed, B. Ethical Considerations in Emotion Recognition Research. Psychol. Int. 2025, 7, 43. [Google Scholar] [CrossRef]
- McMahan, H.B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), Ft. Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
- Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and open problems in federated learning. Found. Trends Mach. Learn. 2021, 14, 1–210. [Google Scholar] [CrossRef]
- Minaee, S.; Minaei, M.; Abdolrashidi, A. Deep-emotion: Facial expression recognition using attentional convolutional network. Sensors 2021, 21, 3046. [Google Scholar] [CrossRef] [PubMed]
- Hou, C.; Ai, J.; Lin, Y.; Guan, C.; Li, J.; Zhu, W. Evaluation of Online Teaching Quality Based on Facial Expression Recognition. Future Internet 2022, 14, 177. [Google Scholar] [CrossRef]
- Shi, C.; Tan, C.; Wang, L. A facial expression recognition method based on a multibranch cross-connection convolutional neural network. IEEE Access 2021, 9, 39255–39274. [Google Scholar] [CrossRef]
- Zhou, N.; Liang, R.; Shi, W. A lightweight convolutional neural network for real-time facial expression detection. IEEE Access 2020, 9, 5573–5584. [Google Scholar] [CrossRef]
- Kim, J.; Kang, J.K.; Kim, Y. A Resource Efficient Integer-Arithmetic-Only FPGA-Based CNN Accelerator for Real-Time Facial Emotion Recognition. IEEE Access 2021, 9, 104367–104381. [Google Scholar] [CrossRef]
- Zhao, G.; Wei, W.; Xie, X.; Fan, S.; Sun, K. An FPGA-Based BNN Real-Time Facial Emotion Recognition Algorithm. In Proceedings of the 2022 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, 24–26 June 2022; pp. 20–24. [Google Scholar]
- Saurav, S.; Gidde, P.; Saini, R.; Singh, S. Dual integrated convolutional neural network for real-time facial expression recognition in the wild. Vis. Comput. 2022, 38, 1083–1096. [Google Scholar] [CrossRef]
- Saurav, S.; Saini, R.; Singh, S. EmNet: A deep integrated convolutional neural network for facial emotion recognition in the wild. Appl. Intell. 2021, 51, 5543–5570. [Google Scholar] [CrossRef]
- Zhao, G.; Yang, H.; Yu, M. Expression recognition method based on a lightweight convolutional neural network. IEEE Access 2020, 8, 38528–38537. [Google Scholar] [CrossRef]
- Huang, Z.Y.; Chiang, C.C.; Chen, J.H.; Chen, Y.C.; Chung, H.L.; Cai, Y.P.; Hsu, H.C. A study on computer vision for facial emotion recognition. Sci. Rep. 2023, 13, 8425. [Google Scholar] [CrossRef]
- Dias, W.; Andaló, F.; Padilha, R.; Bertocco, G.; Almeida, W.; Costa, P.; Rocha, A. Cross-dataset emotion recognition from facial expressions through convolutional neural networks. J. Vis. Commun. Image Represent. 2022, 82, 103395. [Google Scholar] [CrossRef]
- Liang, L.; Lang, C.; Li, Y.; Feng, S.; Zhao, J. Fine-grained facial expression recognition in the wild. IEEE Trans. Inf. Forensics Secur. 2020, 16, 482–494. [Google Scholar] [CrossRef]
- Reghunathan, R.K.; Ramankutty, V.K.; Kallingal, A.; Vinod, V. Facial Expression Recognition Using Pre-trained Architectures. Eng. Proc. 2024, 62, 22. [Google Scholar]
- Gupta, S.; Kumar, P.; Tekchandani, R.K. Facial emotion recognition based real-time learner engagement detection system in online learning context using deep learning models. Multimed. Tools Appl. 2023, 82, 11365–11394. [Google Scholar] [CrossRef]
- Hu, F.; He, K.; Wang, C.; Zheng, Q.; Zhou, B.; Li, G.; Sun, Y. STRFLNet: Spatio-Temporal Representation Fusion Learning Network for EEG-Based Emotion Recognition. IEEE Trans. Affect. Comput. 2025; in press. [Google Scholar] [CrossRef]
- Cai, M.; Chen, J.; Hua, C.; Wen, G.; Fu, R. EEG emotion recognition using EEG-SWTNS neural network through EEG spectral image. Inf. Sci. 2024, 680, 121198. [Google Scholar] [CrossRef]
- Ma, Y.; Shen, J.; Zhao, Z.; Liang, H.; Tan, Y.; Liu, Z.; Qian, K.; Yang, M.; Hu, B. What can facial movements reveal? Depression recognition and analysis based on optical flow using Bayesian networks. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 3459–3468. [Google Scholar] [CrossRef] [PubMed]
- Shangguan, Z.; Liu, Z.; Li, G.; Chen, Q.; Ding, Z.; Hu, B. Dual-stream multiple instance learning for depression detection with facial expression videos. IEEE Trans. Neural Syst. Rehabil. Eng. 2022, 31, 554–563. [Google Scholar] [CrossRef] [PubMed]
- Gue, J.X.; Chong, C.Y.; Lim, M.K. Facial Expression Recognition as markers of Depression. In Proceedings of the 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Taipei, Taiwan, 31 October–3 November 2023; pp. 674–680. [Google Scholar]
- Ramis Guarinos, S.; Manresa Yee, C.; Buades Rubio, J.M.; Gaya-Morey, F.X. Explainable Facial Expression Recognition for People with Intellectual Disabilities. In Proceedings of the XXIII International Conference on Human Computer Interaction, Copenhagen, Denmark, 23–28 July 2023; pp. 1–7. [Google Scholar]
- Rathod, M.; Dalvi, C.; Kaur, K.; Patil, S.; Gite, S.; Kamat, P.; Kotecha, K.; Abraham, A.; Gabralla, L.A. Kids’ emotion recognition using various deep-learning models with explainable ai. Sensors 2022, 22, 8066. [Google Scholar] [CrossRef] [PubMed]
- Ruangdit, T.; Sungkhin, T.; Phenglong, W.; Phaisangittisagul, E. Integration of Facial and Speech Expressions for Multimodal Emotional Recognition. In Proceedings of the TENCON 2023–2023 IEEE Region 10 Conference (TENCON), Chiang Mai, Thailand, 31 October–3 November 2023; pp. 519–523. [Google Scholar]
- Hettiarachchi, H.; Ekanayake, M.; Kaveendhya, G.; Koralage, O.; Samarasekara, P.; Kasthurirathna, D. Gender influence on emotional recognition using facial expressions and voice. In Proceedings of the 2023 5th International Conference on Advancements in Computing (ICAC), Colombo, Sri Lanka, 7–8 December 2023; pp. 721–726. [Google Scholar]
- Cesarelli, M.; Martinelli, F.; Mercaldo, F.; Santone, A. Emotion recognition from facial expression using explainable deep learning. In Proceedings of the 2022 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), Falerna, Italy, 12–15 September 2022; pp. 1–6. [Google Scholar]
- Kandeel, A.A.; Abbas, H.M.; Hassanein, H.S. Explainable model selection of a convolutional neural network for driver’s facial emotion identification. In Proceedings of the International Conference on Pattern Recognition, Milan, Italy, 10–15 January 2021; pp. 699–713. [Google Scholar]
- Deramgozin, M.; Jovanovic, S.; Rabah, H.; Ramzan, N. A hybrid explainable ai framework applied to global and local facial expression recognition. In Proceedings of the 2021 IEEE International Conference on Imaging Systems and Techniques (IST), Online, 24–26 August 2021; pp. 1–5. [Google Scholar]
- Zhu, H.; Yu, C.; Cangelosi, A. Explainable emotion recognition for trustworthy human–robot interaction. In Proceedings of the Workshop Context-Awareness Human-Robot Interaction Approaches Challenges ACM/IEEE HRI, Sapporo, Japan, 7–10 March 2022. [Google Scholar]
- Borriero, A.; Milazzo, M.; Diano, M.; Orsenigo, D.; Villa, M.C.; DiFazio, C.; Tamietto, M.; Perotti, A. Explainable Emotion Decoding for Human and Computer Vision. In Proceedings of the World Conference on Explainable Artificial Intelligence, Valletta, Malta, 17–19 July 2024; pp. 178–201. [Google Scholar]
- Punuri, S.B.; Kuanar, S.K.; Mishra, T.K.; Rao, V.V.R.M.; Reddy, S.S. Decoding Human Facial Emotions: A Ranking Approach using Explainable AI. IEEE Access 2024, 12, 186229–186245. [Google Scholar] [CrossRef]
- Manresa-Yee, C.; Ramis, S.; Gaya-Morey, F.X.; Buades, J.M. Impact of explanations for trustworthy and transparent artificial intelligence. In Proceedings of the XXIII International Conference on Human Computer Interaction, Copenhagen, Denmark, 23–28 July 2023; pp. 1–8. [Google Scholar]
- Rajpal, A.; Sehra, K.; Bagri, R.; Sikka, P. Xai-fr: Explainable ai-based face recognition using deep neural networks. Wirel. Pers. Commun. 2023, 129, 663–680. [Google Scholar] [CrossRef] [PubMed]
- Gaya-Morey, F.X.; Ramis-Guarinos, S.; Manresa-Yee, C.; Buades-Rubio, J.M. Unveiling the human-like similarities of automatic facial expression recognition: An empirical exploration through explainable ai. Multimed. Tools Appl. 2024, 83, 85725–85753. [Google Scholar] [CrossRef]
- Lorch, S.; Gebele, J.; Brune, P. Towards Trustworthy AI: Evaluating SHAP and LIME for Facial Emotion Recognition. In Proceedings of the 58th Hawaii International Conference on System Sciences, Big Island, HI, USA, 7–10 January 2025; pp. 7532–7546. [Google Scholar]
- Di Luzio, F.; Rosato, A.; Panella, M. An explainable fast deep neural network for emotion recognition. Biomed. Signal Process. Control 2025, 100, 107177. [Google Scholar] [CrossRef]
- Jaswanth, M.; Narayana, N.; Rahul, S.; Amudha, J.; Aiswariya Milan, K. Emotion and Advertising Effectiveness: A Novel Facial Expression Analysis Approach Using Federated Learning. In Proceedings of the 2023 IEEE 20th India Council International Conference (INDICON), Hyderabad, India, 14–17 December 2023; pp. 368–373. [Google Scholar]
- Zhang, C.; Li, M.; Wu, D. Federated multidomain learning with graph ensemble autoencoder GMM for emotion recognition. IEEE Trans. Intell. Transp. Syst. 2022, 24, 7631–7641. [Google Scholar] [CrossRef]
- Franco, D.; Oneto, L.; Navarin, N.; Anguita, D. Toward learning trustworthily from data combining privacy, fairness, and explainability: An application to face recognition. Entropy 2021, 23, 1047. [Google Scholar] [CrossRef]
- Ghosh, T.; Banna, M.H.A.; Nahian, M.J.A.; Kaiser, M.S.; Mahmud, M.; Li, S.; Pillay, N. A privacy-preserving federated-mobilenet for facial expression detection from images. In Proceedings of the International Conference on Applied Intelligence and Informatics, Reggio Calabria, Italy, 1–3 September 2022; pp. 277–292. [Google Scholar]
- Simić, N.; Suzić, S.; Milošević, N.; Stanojev, V.; Nosek, T.; Popović, B.; Bajović, D. Enhancing Emotion Recognition through Federated Learning: A Multimodal Approach with Convolutional Neural Networks. Appl. Sci. 2024, 14, 1325. [Google Scholar] [CrossRef]
- Qi, F.; Zhang, Z.; Yang, X.; Zhang, H.; Xu, C. Feeling Without Sharing: A Federated Video Emotion Recognition Framework Via Privacy-Agnostic Hybrid Aggregation. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 151–160. [Google Scholar] [CrossRef]
- Treadway, M.T.; Zald, D.H. Reconsidering anhedonia in depression: Lessons from translational neuroscience. Neurosci. Biobehav. Rev. 2011, 35, 537–555. [Google Scholar] [CrossRef]
- Pizzagalli, D.A. Depression, stress, and anhedonia: Toward a synthesis and integrated model. Annu. Rev. Clin. Psychol. 2014, 10, 393–423. [Google Scholar] [CrossRef]
- Rottenberg, J. Mood and emotion in major depression. Curr. Dir. Psychol. Sci. 2005, 14, 167–170. [Google Scholar] [CrossRef]
- Bylsma, L.M.; Morris, B.H.; Rottenberg, J. A meta-analysis of emotional reactivity in major depressive disorder. Clin. Psychol. Rev. 2008, 28, 676–691. [Google Scholar] [CrossRef]
- Martin, L.A.; Neighbors, H.W.; Griffith, D.M. The experience of symptoms of depression in men vs women: Analysis of the National Comorbidity Survey Replication. JAMA Psychiatry 2013, 70, 1100–1106. [Google Scholar] [CrossRef]
- Busch, F.N. Anger and depression. Adv. Psychiatr. Treat. 2009, 15, 271–278. [Google Scholar] [CrossRef]
- Li, S.; Zhang, M.; Liu, Z.; Chen, W.; Yang, Y.; Jiang, Z. RAF-DB: A Real-world Affective Faces Database. In Proceedings of the 2017 International Conference on Affective Computing and Intelligent Interaction (ACII), San Antonio, TX, USA, 23–26 October 2017; pp. 51–57. [Google Scholar]
- Goodfellow, I.; DeVries, W.; Fidler, S.; Grosse, R.; Mnih, V.; Taylor, G.; An, Y. The Facial Expression Recognition 2013 (FER2013) Dataset. arXiv 2013, arXiv:1307.6888. [Google Scholar]
- Zhang, Z.; Luo, P.; Loy, C.C.; Tang, X. From Facial Expression Recognition to Interpersonal Relation Prediction. arXiv 2016, arXiv:1609.06426v2. [Google Scholar] [CrossRef]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar] [CrossRef]
- Zhang, X.; Zhou, X.; Qiao, Y. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar] [CrossRef]
- Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated learning: Challenges, methods, and future directions. IEEE Signal Process. Mag. 2020, 37, 50–60. [Google Scholar] [CrossRef]
- Chattopadhay, A.; Sarkar, A.; Howlader, P.; Balasubramanian, V.N. Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 839–847. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Fong, R.; Patrick, M.; Vedaldi, A. Understanding deep networks via extremal perturbations and smooth masks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 2950–2958. [Google Scholar]
- Gomez, T.; Fréour, T.; Mouchère, H. Metrics for saliency map evaluation of deep learning explanation methods. In Proceedings of the International Conference on Pattern Recognition and Artificial Intelligence, Paris, France, 1–3 June 2022; pp. 84–95. [Google Scholar]
- Gao, Y.; Cai, Y.; Bi, X.; Li, B.; Li, S.; Zheng, W. Cross-domain facial expression recognition through reliable global–local representation learning and dynamic label weighting. Electronics 2023, 12, 4553. [Google Scholar] [CrossRef]
- Xu, R.; Li, G.; Yang, J.; Lin, L. Larger norm more transferable: An adaptive feature norm approach for unsupervised domain adaptation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1426–1435. [Google Scholar]
- Long, M.; Cao, Z.; Wang, J.; Jordan, M.I. Conditional adversarial domain adaptation. Adv. Neural Inf. Process. Syst. 2018, 31, 1647–1657. [Google Scholar]
- Li, S.; Deng, W. A deeper look at facial expression dataset bias. IEEE Trans. Affect. Comput. 2020, 13, 881–893. [Google Scholar] [CrossRef]
- Chen, T.; Pu, T.; Wu, H.; Xie, Y.; Liu, L.; Lin, L. Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchmark and Adversarial Graph Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 9887–9903. [Google Scholar] [CrossRef]
- Xie, Y.; Gao, Y.; Lin, J.; Chen, T. Learning consistent global-local representation for cross-domain facial expression recognition. In Proceedings of the 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 21–25 August 2022; pp. 2489–2495. [Google Scholar]
- Lee, J.; Hwang, K.i. YOLO with adaptive frame control for real-time object detection applications. Multimed. Tools Appl. 2022, 81, 36375–36396. [Google Scholar] [CrossRef]
- Dey, A.; Srivastava, S.; Singh, G.; Pettit, R.G. Real-Time Performance Benchmarking of TinyML Models in Embedded Systems (PICO: Performance of Inference, CPU, and Operations). In Proceedings of the 2025 28th International Symposium on Real-Time Distributed Computing (ISORC), Toulouse, France, 26–28 May 2025; pp. 279–284. [Google Scholar]
- Pascual, A.M.; Valverde, E.C.; Kim, J.i.; Jeong, J.W.; Jung, Y.; Kim, S.H.; Lim, W. Light-FER: A Lightweight Facial Emotion Recognition System on Edge Devices. Sensors 2022, 22, 9524. [Google Scholar] [CrossRef]
- Allan, A. Benchmarking TensorFlow Lite on the New Raspberry Pi 4, Model B. 2019. Available online: https://www.hackster.io/news/benchmarking-tensorflow-lite-on-the-new-raspberry-pi-4-model-b-3fd859d05b98 (accessed on 1 November 2025).
- Liao, L.; Wu, S.; Song, C.; Fu, J. RS-Xception: A lightweight network for facial expression recognition. Electronics 2024, 13, 3217. [Google Scholar] [CrossRef]
- Gursesli, M.C.; Lombardi, S.; Duradoni, M.; Bocchi, L.; Guazzini, A.; Lanata, A. Facial emotion recognition (FER) through custom lightweight CNN model: Performance evaluation in public datasets. IEEE Access 2024, 12, 45543–45559. [Google Scholar] [CrossRef]
- Zhu, Q.; Zhuang, H.; Zhao, M.; Xu, S.; Meng, R. A study on expression recognition based on improved mobilenetV2 network. Sci. Rep. 2024, 14, 8121. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Li, S.; Deng, W. Deep facial expression recognition: A survey. IEEE Trans. Affect. Comput. 2020, 13, 1195–1215. [Google Scholar] [CrossRef]
- Han, S.; Mao, H.; Dally, W.J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv 2015, arXiv:1510.00149. [Google Scholar]
- Alvarez, J.M.; Colmenarejo, A.B.; Elobaid, A.; Fabbrizzi, S.; Fahimi, M.; Ferrara, A.; Ghodsi, S.; Mougan, C.; Papageorgiou, I.; Reyero, P.; et al. Policy advice and best practices on bias and fairness in AI. Ethics Inf. Technol. 2024, 26, 31. [Google Scholar] [CrossRef]
- Latif, S.; Ali, H.S.; Usama, M.; Rana, R.; Schuller, B.; Qadir, J. Ai-based emotion recognition: Promise, peril, and prescriptions for prosocial path. arXiv 2022, arXiv:2211.07290. [Google Scholar] [CrossRef]
- Martinez-Martin, N. What are important ethical implications of using facial recognition technology in health care? AMA J. Ethics 2019, 21, E180. [Google Scholar] [PubMed]
- Ferrara, E. Fairness and bias in artificial intelligence: A brief survey of sources, impacts, and mitigation strategies. Sci 2024, 6, 3. [Google Scholar] [CrossRef]
- González-Sendino, R.; Serrano, E.; Bajo, J. Mitigating bias in artificial intelligence: Fair data generation via causal models for transparent and explainable decision-making. Future Gener. Comput. Syst. 2024, 155, 384–401. [Google Scholar] [CrossRef]
- Benouis, M.; Andre, E.; Can, Y.S. Balancing Between Privacy and Utility for Affect Recognition Using Multitask Learning in Differential Privacy–Added Federated Learning Settings: Quantitative Study. JMIR Ment. Health 2024, 11, e60003. [Google Scholar] [CrossRef]
- Petsiuk, V.; Das, A.; Saenko, K. RISE: Randomized Input Sampling for Explanation of Black-box Models. arXiv 2018, arXiv:1806.07421. [Google Scholar] [CrossRef]
- Mitra, S.; Sukul, A.; Roy, S.K.; Singh, P.; Verma, V. Reliable or Deceptive? Investigating Gated Features for Smooth Visual Explanations in CNNs. arXiv 2024, arXiv:2404.19341. [Google Scholar] [CrossRef]
Three-block filter configurations, FER2013:

| Metric | (16, 32, 64) | (32, 64, 128) | (64, 128, 256) | (128, 128, 256) |
|---|---|---|---|---|
| Accuracy | 0.675 | 0.704 | 0.711 | 0.711 |
| F1 Score | 0.673 | 0.702 | 0.712 | 0.709 |
| Recall | 0.675 | 0.704 | 0.711 | 0.711 |
| Precision | 0.688 | 0.712 | 0.723 | 0.719 |
Three-block filter configurations, RAF-DB:

| Metric | (16, 32, 64) | (32, 64, 128) | (64, 128, 256) | (128, 128, 256) |
|---|---|---|---|---|
| Accuracy | 0.778 | 0.813 | 0.828 | 0.807 |
| F1 Score | 0.775 | 0.812 | 0.827 | 0.804 |
| Recall | 0.778 | 0.813 | 0.828 | 0.807 |
| Precision | 0.774 | 0.812 | 0.828 | 0.807 |
Three-block filter configurations, ExpW:

| Metric | (16, 32, 64) | (32, 64, 128) | (64, 128, 256) | (128, 128, 256) |
|---|---|---|---|---|
| Accuracy | 0.665 | 0.690 | 0.688 | 0.691 |
| F1 Score | 0.658 | 0.681 | 0.681 | 0.681 |
| Recall | 0.665 | 0.690 | 0.688 | 0.691 |
| Precision | 0.665 | 0.680 | 0.679 | 0.682 |
Four-block filter configurations, FER2013:

| Metric | (16, 32, 64, 128) | (32, 64, 128, 256) | (64, 128, 256, 512) | (128, 128, 256, 512) |
|---|---|---|---|---|
| Accuracy | 0.679 | 0.696 | 0.704 | 0.716 |
| F1 Score | 0.678 | 0.695 | 0.702 | 0.715 |
| Recall | 0.679 | 0.696 | 0.704 | 0.716 |
| Precision | 0.685 | 0.705 | 0.723 | 0.719 |
Four-block filter configurations, RAF-DB:

| Metric | (16, 32, 64, 128) | (32, 64, 128, 256) | (64, 128, 256, 512) | (128, 128, 256, 512) |
|---|---|---|---|---|
| Accuracy | 0.760 | 0.785 | 0.806 | 0.797 |
| F1 Score | 0.757 | 0.784 | 0.802 | 0.795 |
| Recall | 0.760 | 0.785 | 0.806 | 0.797 |
| Precision | 0.756 | 0.784 | 0.811 | 0.794 |
Four-block filter configurations, ExpW:

| Metric | (16, 32, 64, 128) | (32, 64, 128, 256) | (64, 128, 256, 512) | (128, 128, 256, 512) |
|---|---|---|---|---|
| Accuracy | 0.649 | 0.676 | 0.690 | 0.682 |
| F1 Score | 0.647 | 0.669 | 0.679 | 0.676 |
| Recall | 0.649 | 0.676 | 0.690 | 0.682 |
| Precision | 0.645 | 0.666 | 0.687 | 0.675 |
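The tuples in the preceding tables denote per-block convolutional filter counts. A minimal Keras sketch of such a configurable block-based CNN is shown below; the exact layer arrangement, kernel sizes, input resolution (48 × 48 grayscale), and four-class head are assumptions, so the parameter count only approximates the reported 1.45 M.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(filters=(64, 128, 256), num_classes=4, input_shape=(48, 48, 1)):
    """Block-based CNN: one Conv-BN-ReLU-MaxPool block per filter count.
    The (64, 128, 256) tuple matches the best three-block configuration
    in the tables above; the head and input size are assumptions."""
    inputs = layers.Input(shape=input_shape)
    x = inputs
    for f in filters:
        x = layers.Conv2D(f, 3, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.MaxPooling2D()(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_cnn((64, 128, 256))
model.summary()  # parameter count is indicative only
```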
Grad-CAM++ overlays across feature blocks FB1–FB4 for the three-block (64, 128, 256) and four-block (64, 128, 256, 512) models on Anger, Sad, and Happy samples (images omitted).
Grad-CAM++ overlays at feature block FB3 comparing the (64, 128, 256) and (64, 128, 256, 512) models on Anger, Sad, Happy, and Neutral samples (images omitted).
| Metric | Anger, (64, 128, 256) | Anger, (64, 128, 256, 512) | Happy, (64, 128, 256) | Happy, (64, 128, 256, 512) | Sad, (64, 128, 256) | Sad, (64, 128, 256, 512) | Neutral, (64, 128, 256) | Neutral, (64, 128, 256, 512) |
|---|---|---|---|---|---|---|---|---|
| Original Confidence (↑) | 0.982 | 0.972 | 0.992 | 0.965 | 0.970 | 0.912 | 0.962 | 0.918 |
| IAUC (↑) | 0.226 | 0.191 | 0.268 | 0.145 | 0.230 | 0.154 | 0.699 | 0.777 |
| DAUC (↓) | 0.258 | 0.230 | 0.211 | 0.247 | 0.263 | 0.197 | 0.672 | 0.765 |
| AD (↑) | 0.708 | 0.557 | 0.369 | 0.456 | 0.690 | 0.667 | 0.308 | 0.177 |
| IC (↑) | 0.436 | 0.222 | 0.519 | 0.355 | 0.414 | 0.234 | 0.768 | 0.920 |
| ADA (↑) | 0.713 | 0.563 | 0.331 | 0.422 | 0.702 | 0.674 | 0.303 | 0.172 |
| Active Pixels (↓) | 0.401 | 0.481 | 0.373 | 0.547 | 0.464 | 0.529 | 0.451 | 0.510 |
| Accuracy After Masking (↓) | 0.287 | 0.437 | 0.669 | 0.578 | 0.298 | 0.326 | 0.697 | 0.828 |
| Model | Original Confidence (↑) | IAUC (↑) | DAUC (↓) | AD (↑) | IC (↑) | ADA (↑) | Active Pixels (↓) | Accuracy After Masking (↓) |
|---|---|---|---|---|---|---|---|---|
| (64, 128, 256) | 0.976 | 0.356 | 0.351 | 0.519 | 0.534 | 0.512 | 0.422 | 0.488 |
| (64, 128, 256, 512) | 0.942 | 0.317 | 0.360 | 0.464 | 0.432 | 0.458 | 0.517 | 0.542 |
| Dataset | Metric | Mean | Std | 95% CI (Lower) | 95% CI (Upper) |
|---|---|---|---|---|---|
| FER2013 | Accuracy | 71.59 | 0.37 | 71.33 | 71.86 |
| FER2013 | F1 | 71.52 | 0.36 | 71.27 | 71.78 |
| FER2013 | Recall | 71.59 | 0.37 | 71.33 | 71.86 |
| FER2013 | Precision | 72.30 | 0.34 | 72.05 | 72.55 |
| ExpW | Accuracy | 68.82 | 0.29 | 68.61 | 69.03 |
| ExpW | F1 | 68.09 | 0.25 | 67.91 | 68.27 |
| ExpW | Recall | 68.82 | 0.29 | 68.61 | 69.03 |
| ExpW | Precision | 68.00 | 0.33 | 67.76 | 68.24 |
| RAF-DB | Accuracy | 81.24 | 0.68 | 80.75 | 81.72 |
| RAF-DB | F1 | 81.06 | 0.69 | 80.56 | 81.55 |
| RAF-DB | Recall | 81.24 | 0.68 | 80.75 | 81.72 |
| RAF-DB | Precision | 81.16 | 0.76 | 80.61 | 81.70 |
| Dataset | Clients | Accuracy (%) |
|---|---|---|
| FER2013 | 3 | 71.99 |
| FER2013 | 6 | 70.12 |
| FER2013 | 9 | 70.68 |
| FER2013 | 15 | 70.25 |
| FER2013 | Mean | 70.76 |
| FER2013 | Std | 0.85 |
| RAF-DB | 3 | 80.80 |
| RAF-DB | 6 | 81.44 |
| RAF-DB | 9 | 80.92 |
| RAF-DB | 15 | 80.24 |
| RAF-DB | Mean | 80.85 |
| RAF-DB | Std | 0.49 |
| ExpW | 3 | 68.95 |
| ExpW | 6 | 68.30 |
| ExpW | 9 | 67.71 |
| ExpW | 15 | 66.86 |
| ExpW | Mean | 67.95 |
| ExpW | Std | 0.89 |
| Method | Backbone | RAF-DB | ExpW | FER2013 | Avg. |
|---|---|---|---|---|---|
| Large-Scale Architectures | |||||
| SAFN [65] | ResNet-50 | 62.8 | 64.9 | 55.6 | 61.1 |
| CADA [66] | ResNet-50 | 66.0 | 63.2 | 57.6 | 62.3 |
| ECAN [67] | ResNet-50 | 53.4 | 47.4 | 56.5 | 52.4 |
| AGRA [68] | ResNet-50 | 67.6 | 68.5 | 59.0 | 65.0 |
| CGLRL [69] | ResNet-50 | 71.8 | 70.0 | 59.3 | 67.0 |
| LDWM [64] | ResNet-50 | 75.9 | 73.3 | 61.1 | 70.1 |
| Lightweight Architectures | |||||
| SAFN [65] | MobileNet-V2 | 38.7 | 61.4 | 49.9 | 50.0 |
| CADA [66] | MobileNet-V2 | 53.2 | 59.4 | 49.3 | 54.0 |
| ECAN [67] | MobileNet-V2 | 42.3 | 45.1 | 45.8 | 44.4 |
| AGRA [68] | MobileNet-V2 | 52.3 | 64.0 | 45.8 | 54.0 |
| CGLRL [69] | MobileNet-V2 | 62.0 | 64.9 | 52.5 | 59.8 |
| LDWM [64] | MobileNet-V2 | 64.8 | 64.9 | 53.0 | 60.9 |
| Proposed (Centralized) | Custom | 83.6 | 70.6 | 72.3 | 75.5 |
| Proposed (Federated) | Custom | 82.8 | 68.8 | 71.3 | 74.3 |
| Metric | Value |
|---|---|
| Model Size | 5.5 MB |
| Average Latency | 13.92 ± 1.46 ms |
| Minimum Latency | 12.33 ms |
| Maximum Latency | 18.35 ms |
| Throughput | 71.84 FPS |
| CPU Usage | 25.30 ± 8.51% |
| Resident Set Size (RSS) | 156.10 MB |
| Steady-State Memory | 151.12 MB |
| Model Size (TFLite) | 5.54 MB |
| MFLOPs per Inference | ∼107.4 |
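Latency figures of this kind can be gathered with a simple timing loop; the sketch below shows one way to benchmark the exported TFLite model on-device. The model filename is a placeholder, and the warm-up and measurement counts are assumptions rather than the paper's protocol.

```python
import time
import numpy as np
import tensorflow as tf

# Placeholder path; substitute the exported 5.54 MB TFLite model.
interpreter = tf.lite.Interpreter(model_path="fer_model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x = np.random.rand(*inp["shape"]).astype(np.float32)  # dummy input frame

for _ in range(10):                       # warm-up runs (assumption)
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()

times = []
for _ in range(200):                      # measurement runs (assumption)
    t0 = time.perf_counter()
    interpreter.set_tensor(inp["index"], x)
    interpreter.invoke()
    _ = interpreter.get_tensor(out["index"])
    times.append((time.perf_counter() - t0) * 1e3)

print(f"mean {np.mean(times):.2f} ms, "
      f"throughput {1e3 / np.mean(times):.1f} FPS")
```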