Deep Learning for Facial Emotion Analysis and Human Activity Recognition

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 15 February 2026 | Viewed by 14005

Special Issue Editors


Guest Editor
Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
Interests: facial expression analysis; pain assessment; depression detection; partial label learning; multi-instance learning

Guest Editor
Key Laboratory of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University, Xi’an 710071, China
Interests: medical image classification and recognition; big data analysis and mining; artificial intelligence algorithms

Guest Editor
Academy of Advanced Interdisciplinary Research, Xidian University, Xi’an 710071, China
Interests: video-based action recognition; action quality assessment; computer-aided diagnosis of developmental coordination disorder

Guest Editor
Academy of Advanced Interdisciplinary Research, Xidian University, Xi’an 710071, China
Interests: multi-organ segmentation; motion-compensated 4DCBCT reconstruction

Special Issue Information

Dear Colleagues,

We are pleased to announce a Special Issue on "Deep Learning for Facial Emotion Analysis and Human Activity Recognition" in Electronics. This Special Issue aims to explore the advancements and applications of deep learning in facial emotion analysis and human activity recognition, with a focus on their significance in the domains of health, interaction, and security.

Facial emotion and human activity are two of the most important human biological characteristics: they are the most direct and powerful signals by which people express their deepest emotional states and intentions. Facial emotion analysis and human activity recognition are therefore key to intelligently perceiving human emotions and activities, and they have received extensive attention in fields such as disease-assisted diagnosis, human–computer interaction, autonomous driving, national defence and security, intelligent education, and intelligent surveillance. Deep learning techniques have demonstrated remarkable performance in extracting discriminative features and modelling complex patterns from facial images, enabling accurate and robust facial emotion analysis.

This Special Issue aims to bring together researchers and experts from diverse fields, such as computer vision, psychology, healthcare, human–computer interaction, and security, to present their original research, review articles, and technical reports on topics related to deep learning for facial emotion analysis and human activity recognition.

The scope of this Special Issue includes, but is not limited to, the following topics:  

  • Deep learning for facial expression recognition;
  • Deep learning for facial pain assessment;
  • Deep learning-based depression detection;
  • Driver fatigue detection using facial emotion analysis;
  • Multi-modal fusion for enhanced facial emotion analysis;
  • Real-time facial emotion analysis for interactive systems;
  • Transfer learning and domain adaptation for facial emotion analysis;
  • Facial emotion analysis in virtual reality and augmented reality environments;
  • Explainable deep learning models for facial emotion analysis;
  • Deep learning for human action recognition;
  • Spatiotemporal action localization;
  • Action quality assessment;
  • Emotion generation.

By exploring the application of deep learning to facial emotion analysis and human activity recognition in health, interaction, and security, this Special Issue aims to provide valuable insight into the potential of deep learning techniques for understanding and utilizing facial expressions and human activity. The contributions to this Special Issue will foster advances in healthcare diagnostics, human–computer interaction, and security systems, leading to improved well-being, enhanced user experiences, and better safety measures.

We look forward to receiving your contributions.

Dr. Shasha Mao
Prof. Dr. Shuiping Gou
Dr. Ruimin Li
Dr. Nuo Tong
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • deep learning
  • facial emotion analysis
  • facial expression recognition
  • pain estimation
  • depression detection
  • affective computing
  • human activity recognition
  • human behaviour analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)


Research

15 pages, 6085 KB  
Article
AFCN: An Attention-Based Fusion Consistency Network for Facial Emotion Recognition
by Qi Wei, Hao Pei and Shasha Mao
Electronics 2025, 14(17), 3523; https://doi.org/10.3390/electronics14173523 - 3 Sep 2025
Viewed by 692
Abstract
Due to the local similarities between different facial expressions and the subjective influences of annotators, large-scale facial expression datasets contain significant label noise, and noisy labels are a key challenge in deep facial expression recognition (FER). To address this, this paper proposes a simple and effective attention-based fusion consistency network (AFCN), which suppresses the impact of uncertainty and prevents deep networks from overemphasising local features. Specifically, the AFCN comprises four modules: a sample certainty analysis module, a label correction module, an attention fusion module, and a fusion consistency learning module. The sample certainty analysis module calculates the certainty of each input facial expression image; the label correction module re-labels samples with low certainty based on the model’s predictions; the attention fusion module identifies all possible key regions of facial expressions and fuses them; and the fusion consistency learning module constrains the model to keep the regions of interest for the given labels consistent with the fusion of all possible key regions. This guides the model to perceive and learn global facial expression features and prevents it from classifying expressions solely on the basis of local features associated with noisy labels. Experiments on multiple noisy datasets validate the effectiveness of the proposed method. The results show that it outperforms current state-of-the-art methods, achieving in particular a 3.03% accuracy improvement on the 30% noisy RAF-DB dataset.
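As a rough illustration of the fusion-consistency idea the abstract describes, the following PyTorch sketch constrains the attention map of the given label to agree with a fusion of all per-class attention maps. The module names, shapes, and loss weighting are assumptions for illustration only, not the authors' implementation.

```python
# Minimal sketch of a fusion-consistency constraint (assumed design, not the AFCN code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionConsistencySketch(nn.Module):
    def __init__(self, backbone_dim=512, num_classes=7):
        super().__init__()
        # per-class spatial attention head (assumed 1x1 convolution)
        self.attn_head = nn.Conv2d(backbone_dim, num_classes, kernel_size=1)
        self.classifier = nn.Linear(backbone_dim, num_classes)

    def forward(self, feats, labels):
        # feats: (B, C, H, W) backbone features; labels: (B,) possibly noisy labels
        attn = torch.sigmoid(self.attn_head(feats))              # (B, K, H, W) per-class attention
        fused = attn.max(dim=1).values                           # (B, H, W) fusion of all key regions
        label_attn = attn[torch.arange(feats.size(0)), labels]   # (B, H, W) region for the given label
        # consistency: the labelled region should agree with the fused key regions
        consistency_loss = F.mse_loss(label_attn, fused.detach())
        logits = self.classifier(feats.mean(dim=(2, 3)))         # global-average-pooled classification
        cls_loss = F.cross_entropy(logits, labels)
        return cls_loss + consistency_loss
```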

19 pages, 771 KB  
Article
FRU-Adapter: Frame Recalibration Unit Adapter for Dynamic Facial Expression Recognition
by Myungbeom Her, Hamza Ghulam Nabi and Ji-Hyeong Han
Electronics 2025, 14(5), 978; https://doi.org/10.3390/electronics14050978 - 28 Feb 2025
Cited by 1 | Viewed by 1752
Abstract
Dynamic facial expression recognition (DFER) is one of the most important challenges in computer vision, as it plays a crucial role in human–computer interaction. Recently, adapter-based approaches have been introduced into DFER and have achieved remarkable success. However, these adapters still suffer from two problems: they overlook irrelevant frames and interfere with pre-trained information. In this paper, we propose a frame recalibration unit adapter (FRU-Adapter), which combines the strengths of a frame recalibration unit (FRU) and temporal self-attention (T-SA) to address these issues. The FRU first recalibrates the frames by emphasizing important frames and suppressing less relevant ones. The recalibrated frames are then fed into T-SA to capture the correlations between meaningful frames. As a result, the FRU-Adapter captures enhanced temporal dependencies while accounting for irrelevant frames in a clip. Furthermore, we propose attaching the FRU-Adapter to each encoder layer in parallel to reduce the loss of pre-trained information. Notably, the FRU-Adapter uses only 2% of the total training parameters per task while achieving improved accuracy. Extensive experiments on DFER tasks show that the proposed FRU-Adapter not only outperforms state-of-the-art models but is also parameter-efficient. The source code will be made publicly available.
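The mechanism the abstract describes — frame recalibration followed by temporal self-attention, attached in parallel to a frozen encoder layer — can be sketched roughly as below. The dimensions, gating design, and residual attachment are assumptions for illustration, not the published architecture.

```python
# Hedged sketch of a frame-recalibration + temporal-self-attention adapter (assumed design).
import torch
import torch.nn as nn

class FRUAdapterSketch(nn.Module):
    def __init__(self, dim=768, reduction=4):
        super().__init__()
        # frame recalibration: a per-frame gate computed from each frame token
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, 1), nn.Sigmoid(),
        )
        # temporal self-attention over the recalibrated frame tokens
        self.t_sa = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, x):
        # x: (B, T, D) frame tokens from a frozen encoder layer
        weights = self.gate(x)                          # (B, T, 1): emphasize important frames
        x_recal = x * weights                           # suppress less relevant frames
        out, _ = self.t_sa(x_recal, x_recal, x_recal)   # correlations between meaningful frames
        return x + out                                  # parallel/residual attachment preserves the pre-trained path
```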

20 pages, 4882 KB  
Article
Empowering Recovery: The T-Rehab System’s Semi-Immersive Approach to Emotional and Physical Well-Being in Tele-Rehabilitation
by Hayette Hadjar, Binh Vu and Matthias Hemmje
Electronics 2025, 14(5), 852; https://doi.org/10.3390/electronics14050852 - 21 Feb 2025
Viewed by 1329
Abstract
The T-Rehab System delivers a semi-immersive tele-rehabilitation experience by integrating Affective Computing (AC) through facial expression analysis and contactless heartbeat monitoring. T-Rehab closely monitors patients’ mental health as they engage in a personalized, semi-immersive Virtual Reality (VR) game on a desktop PC, using a webcam with MediaPipe to track their hand movements for interactive exercises; this allows the system to tailor treatment content for increased engagement and comfort. T-Rehab’s evaluation comprises two assessments: system performance and cognitive walkthroughs. The first evaluation focuses on system performance, assessing the tested game, middleware, and facial emotion monitoring to ensure hardware compatibility and effective support for AC, gaming, and tele-rehabilitation. The second uses cognitive walkthroughs to examine usability and identify potential issues in emotion detection and tele-rehabilitation. Together, these evaluations provide insight into T-Rehab’s functionality, usability, and impact in supporting both physical rehabilitation and emotional well-being. The thorough integration of technology within T-Rehab ensures a holistic approach to tele-rehabilitation, allowing patients to participate comfortably and efficiently from anywhere. This approach not only improves physical therapy outcomes but also promotes mental resilience, marking an important step forward in tele-rehabilitation practices.
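For readers unfamiliar with the tracking component, the sketch below shows webcam hand tracking with MediaPipe, the library named in the abstract. The landmark choice and loop structure are illustrative assumptions, not the T-Rehab implementation.

```python
# Minimal sketch of webcam hand tracking with MediaPipe (assumed usage, not T-Rehab code).
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.5)
cap = cv2.VideoCapture(0)  # default webcam

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input; OpenCV captures BGR
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        for hand in results.multi_hand_landmarks:
            # e.g. the index fingertip (landmark 8) could drive an in-game cursor
            tip = hand.landmark[8]
            print(f"index fingertip at ({tip.x:.2f}, {tip.y:.2f})")
    cv2.imshow("hand tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
hands.close()
cv2.destroyAllWindows()
```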

20 pages, 647 KB  
Article
Towards Realistic Human Motion Prediction with Latent Diffusion and Physics-Based Models
by Ziliang Ren, Miaomiao Jin, Huabei Nie, Jianqiao Shen, Ani Dong and Qieshi Zhang
Electronics 2025, 14(3), 605; https://doi.org/10.3390/electronics14030605 - 4 Feb 2025
Cited by 2 | Viewed by 3593
Abstract
Many applications benefit from the prediction of 3D human motion based on past observations, e.g., human–computer interaction and autonomous driving. However, while existing encoding–decoding methods achieve good performance, prediction over a horizon of seconds still suffers from errors and a scarcity of motion switching. In this paper, we propose a Latent Diffusion and Physical Principles Model (LDPM) to achieve accurate human motion prediction. Our framework performs human motion prediction by learning a latent space, generating motion from noise, and incorporating physical control of body motion, where physical principles estimate the next frame through the Euler–Lagrange equation. The framework effectively accomplishes motion switching and reduces the error accumulated over time. The proposed architecture is evaluated on three challenging datasets: Human3.6M (Human 3D Motion Capture Dataset), HumanEva-I (Human Evaluation Dataset I), and AMASS (Archive of Motion Capture as Surface Shapes). We experimentally demonstrate the significant superiority of the proposed framework for prediction over a horizon of seconds.
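For reference, the Euler–Lagrange equation mentioned in the abstract takes the standard form below; the specific Lagrangian the authors use for body motion is not given here.

```latex
% Standard Euler–Lagrange equation for generalized coordinates q(t);
% \mathcal{L} denotes the Lagrangian (kinetic minus potential energy).
\frac{d}{dt}\left(\frac{\partial \mathcal{L}}{\partial \dot{q}}\right) - \frac{\partial \mathcal{L}}{\partial q} = 0
```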

17 pages, 3422 KB  
Article
TheraSense: Deep Learning for Facial Emotion Analysis in Mental Health Teleconsultation
by Hayette Hadjar, Binh Vu and Matthias Hemmje
Electronics 2025, 14(3), 422; https://doi.org/10.3390/electronics14030422 - 22 Jan 2025
Cited by 8 | Viewed by 5769
Abstract
Background: This paper presents TheraSense, a system developed within the Supporting Mental Health in Young People: Integrated Methodology for cLinical dEcisions and evidence (Smile) and Sensor Enabled Affective Computing for Enhancing Medical Care (SenseCare) projects. TheraSense is designed to enhance teleconsultation services by leveraging deep learning for real-time emotion recognition through facial expressions. It integrates with the Knowledge Management-Ecosystem Portal (SenseCare KM-EP) platform to provide mental health practitioners with valuable emotional insights during remote consultations. Method: We describe the conceptual design of TheraSense, including its use case contexts, architectural structure, and user interface layout. The system’s interoperability is discussed in detail, highlighting its seamless integration within the teleconsultation workflow. The evaluation methods include both quantitative assessments of the video-based emotion recognition system’s performance and qualitative feedback through heuristic evaluation and survey analysis. Results: The performance evaluation shows that TheraSense effectively recognizes emotions in video streams, with positive user feedback on its usability and integration. The system’s real-time emotion detection capabilities provide valuable support for mental health practitioners during remote sessions. Conclusions: TheraSense demonstrates its potential as an innovative tool for enhancing teleconsultation services. By providing real-time emotional insights, it supports better-informed decision-making in mental health care, making it an effective addition to remote telehealth platforms.
