Search Results (76)

Search Parameters:
Keywords = facial action unit

28 pages, 3111 KB  
Article
Context-Aware Visual Emotion Recognition Through Hierarchical Fusion of Facial Micro-Features and Scene Semantics
by Karn Yongsiriwit, Parkpoom Chaisiriprasert, Thannob Aribarg and Sokliv Kork
Appl. Sci. 2025, 15(24), 13160; https://doi.org/10.3390/app152413160 - 15 Dec 2025
Abstract
Visual emotion recognition in unconstrained environments remains challenging, as single-stream deep learning models often fail to capture the localized facial cues and contextual information necessary for accurate classification. This study introduces a hierarchical multi-level feature fusion framework that systematically combines low-level micro-textural features (Local Binary Patterns), mid-level facial cues (Facial Action Units), and high-level scene semantics (Places365) with ResNet-50 global embeddings. Evaluated on the large-scale EmoSet-3.3M dataset, which contains 3.3 million images across eight emotion categories, the framework demonstrates marked performance gains: its best configuration (LBP-FAUs-Places365-ResNet) achieves 74% accuracy and a macro-averaged F1-score of 0.75, a five-percentage-point improvement over the ResNet-50 baseline. The approach excels at distinguishing high-intensity emotions while maintaining efficient inference (2.2 ms per image, 29 M parameters), and analysis confirms that integrating facial muscle activations with scene context enables nuanced emotional differentiation. These results validate that hierarchical feature integration significantly advances robust, human-aligned visual emotion recognition, making it suitable for real-world Human–Computer Interaction (HCI) and affective computing applications. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
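To make the fusion step above concrete, the following minimal PyTorch sketch concatenates the four feature levels (an LBP histogram, an AU vector, Places365 scene probabilities, and a ResNet-50 embedding) before a small classification head. The feature dimensions (a 59-bin LBP histogram, 17 AUs) and the plain concatenation are illustrative assumptions standing in for the paper's hierarchical fusion, not its actual architecture.

```python
import torch
import torch.nn as nn

class HierarchicalFusionClassifier(nn.Module):
    """Concatenate low-, mid-, and high-level features with a global CNN
    embedding and classify into eight emotion categories (dims assumed)."""
    def __init__(self, lbp_dim=59, au_dim=17, scene_dim=365,
                 cnn_dim=2048, n_classes=8):
        super().__init__()
        fused = lbp_dim + au_dim + scene_dim + cnn_dim
        self.head = nn.Sequential(
            nn.Linear(fused, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, n_classes),
        )

    def forward(self, lbp_hist, au_vec, scene_probs, cnn_emb):
        # Simple fusion: concatenate the per-level feature vectors.
        x = torch.cat([lbp_hist, au_vec, scene_probs, cnn_emb], dim=1)
        return self.head(x)

# Toy usage with random tensors standing in for real feature extractors.
model = HierarchicalFusionClassifier()
logits = model(torch.rand(4, 59), torch.rand(4, 17),
               torch.rand(4, 365), torch.rand(4, 2048))
print(logits.shape)  # torch.Size([4, 8])
```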

20 pages, 14885 KB  
Article
MultiPhysio-HRC: A Multimodal Physiological Signals Dataset for Industrial Human–Robot Collaboration
by Andrea Bussolan, Stefano Baraldo, Oliver Avram, Pablo Urcola, Luis Montesano, Luca Maria Gambardella and Anna Valente
Robotics 2025, 14(12), 184; https://doi.org/10.3390/robotics14120184 - 5 Dec 2025
Abstract
Human–robot collaboration (HRC) is a key focus of Industry 5.0, aiming to enhance worker productivity while ensuring well-being. The ability to perceive human psycho-physical states, such as stress and cognitive load, is crucial for adaptive and human-aware robotics. This paper introduces MultiPhysio-HRC, a multimodal dataset containing physiological, audio, and facial data collected during real-world HRC scenarios. The dataset includes electroencephalography (EEG), electrocardiography (ECG), electrodermal activity (EDA), respiration (RESP), electromyography (EMG), voice recordings, and facial action units. The dataset integrates controlled cognitive tasks, immersive virtual reality experiences, and industrial disassembly activities performed manually and with robotic assistance, to capture a holistic view of the participants’ mental states. Rich ground truth annotations were obtained using validated psychological self-assessment questionnaires. Baseline models were evaluated for stress and cognitive load classification, demonstrating the dataset’s potential for affective computing and human-aware robotics research. MultiPhysio-HRC is publicly available to support research in human-centered automation, workplace well-being, and intelligent robotic systems. Full article
(This article belongs to the Special Issue Human–Robot Collaboration in Industry 5.0)

7 pages, 645 KB  
Proceeding Paper
Detection of Students’ Emotions in an Online Learning Environment Using a CNN-LSTM Model
by Bilkisu Muhammad Bashir and Hadiza Ali Umar
Eng. Proc. 2025, 87(1), 116; https://doi.org/10.3390/engproc2025087116 - 2 Dec 2025
Abstract
Emotion recognition through facial expressions is crucial in fields like healthcare, entertainment, and education, offering insights into user experiences. In online learning, traditional methods fail to capture students’ emotions effectively. This research introduces a hybrid Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) model to recognize learning emotions (interest, boredom, and confusion) during online lectures. A custom dataset was constructed by mapping action units from FER2013, CK+48, and JAFFE datasets into three learning-related categories. Images were preprocessed (grayscale conversion, resizing, normalization) and divided into training and testing sets. The CNN layers extract spatial facial features, while the LSTM layers capture temporal dependencies across video frames. Evaluation metrics included accuracy, precision, recall, and F1-score. The model achieved 98.0% accuracy, 97% precision, 98% recall, and 98% F1-score, surpassing existing CNN-only methods. This advancement enhances online learning by enabling personalized support and has applications in education, psychology, and human–computer interaction, contributing to affective computing development. Full article
(This article belongs to the Proceedings of The 5th International Electronic Conference on Applied Sciences)
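The following is a minimal PyTorch sketch of the generic CNN-LSTM pattern the abstract describes: per-frame convolutional features fed to an LSTM over the frame sequence, ending in a three-class head. The 48x48 grayscale input, layer sizes, and clip length are assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class CNNLSTMEmotion(nn.Module):
    """Per-frame CNN features followed by an LSTM over the frame sequence,
    classifying into three learning-related emotions (sizes assumed)."""
    def __init__(self, n_classes=3, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(                      # 48x48 grayscale frames
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),                              # 64 * 12 * 12 features
        )
        self.lstm = nn.LSTM(64 * 12 * 12, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, clips):                          # (B, T, 1, 48, 48)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.reshape(b * t, 1, 48, 48)).reshape(b, t, -1)
        _, (h, _) = self.lstm(feats)                   # last hidden state
        return self.fc(h[-1])

model = CNNLSTMEmotion()
print(model(torch.rand(2, 8, 1, 48, 48)).shape)        # torch.Size([2, 3])
```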

25 pages, 5070 KB  
Article
An Emotional AI Chatbot Using an Ontology and a Novel Audiovisual Emotion Transformer for Improving Nonverbal Communication
by Yun Wang, Liege Cheung, Patrick Ma, Herbert Lee and Adela S.M. Lau
Electronics 2025, 14(21), 4304; https://doi.org/10.3390/electronics14214304 - 31 Oct 2025
Abstract
One of the key limitations of AI chatbots is the lack of human-like nonverbal communication. Although there are many research studies on video or audio emotion recognition for detecting human emotions, there is no research that combines video, audio, and ontology methods to develop an AI chatbot with human-like communication. Therefore, this research aims to develop an audio-video emotion recognition model and an emotion-ontology-based chatbot engine to improve human-like communication with emotion detection. This research proposed a novel model of cluster-based audiovisual emotion recognition for improving emotion detection with both video and audio signals and compared it with existing methods using video or audio signals only. Twenty-two audio features, the Mel spectrogram, and facial action units were extracted, and the last two were fed into a cluster-based independent transformer to learn long-term temporal dependencies. Our model was validated on three public audiovisual datasets: RAVDESS, SAVEE, and RML. The results demonstrated that the accuracy scores of the clustered transformer model for RAVDESS, SAVEE, and RML were 86.46%, 92.71%, and 91.67%, respectively, outperforming the existing best model with accuracy scores of 86.3%, 75%, and 60.2%, respectively. An emotion-ontology-based chatbot engine was implemented to make inquiry responses based on the detected emotion. A case study of the HKU Campusland metaverse was used as proof of concept of the emotional AI chatbot for nonverbal communication. Full article
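As a rough illustration of the audiovisual idea (not the paper's cluster-based transformer), the sketch below runs two independent Transformer encoders, one over Mel-spectrogram frames and one over facial-AU frames, then fuses their pooled outputs for emotion classification. All dimensions, depths, and the mean-pooling fusion are assumptions.

```python
import torch
import torch.nn as nn

class AudioVisualEmotion(nn.Module):
    """Two independent Transformer encoders, one per modality, fused by
    concatenating temporally pooled outputs (sizes are illustrative)."""
    def __init__(self, n_mels=64, n_aus=17, d_model=128, n_classes=8):
        super().__init__()
        self.audio_proj = nn.Linear(n_mels, d_model)
        self.visual_proj = nn.Linear(n_aus, d_model)
        make_enc = lambda: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.audio_enc, self.visual_enc = make_enc(), make_enc()
        self.head = nn.Linear(2 * d_model, n_classes)

    def forward(self, mel, aus):             # (B, Ta, n_mels), (B, Tv, n_aus)
        a = self.audio_enc(self.audio_proj(mel)).mean(dim=1)   # temporal pool
        v = self.visual_enc(self.visual_proj(aus)).mean(dim=1)
        return self.head(torch.cat([a, v], dim=1))

model = AudioVisualEmotion()
print(model(torch.rand(2, 100, 64), torch.rand(2, 50, 17)).shape)  # (2, 8)
```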

13 pages, 1590 KB  
Article
Anorexia Nervosa Dampens Subjective and Facial Pain Responsiveness
by Stefan Lautenbacher, Miriam Kunz and Karl-Jürgen Bär
Brain Sci. 2025, 15(10), 1082; https://doi.org/10.3390/brainsci15101082 - 7 Oct 2025
Abstract
Background/Objectives: Individuals with anorexia nervosa (AN) are known to exhibit both reduced pain sensitivity—when assessed via thresholds and subjective ratings—and diminished facial expressions of emotion. Therefore, investigating the facial response to pain in this population is of particular interest. Method: Seventeen patients with AN and 18 age- and sex-matched healthy controls were assessed using a thermode to induce heat pain. Subjective pain measures included pain threshold, pain tolerance, and pain ratings of suprathreshold stimuli, rated on a numerical rating scale (NRS). Facial responses to the suprathreshold stimuli were analyzed using the Facial Action Coding System (FACS). Eating pathology was assessed using the Eating Attitudes Test (EAT-26), the Eating Disorder Inventory-2 (EDI-2), and the body mass index (BMI), while depression was measured using the Beck Depression Inventory-II (BDI-II). Results: Compared with healthy controls, AN patients showed significantly reduced facial expressions of pain overall, with particularly pronounced reductions in Action Units AU 6_7 and AU 9_10. In contrast, subjective pain measures showed only marginal differences between groups. Importantly, the reduction in facial expression could not be accounted for by differences in pain thresholds or ratings, nor by levels of eating pathology or depression. Conclusions: Individuals with AN display a markedly reduced facial expression of pain, observed here for the first time and consistent with similar findings for facial expressions of emotion. As this reduction cannot be explained by subjective pain report, it suggests that the communication of pain is impaired on two levels in AN: in both verbal and nonverbal signaling. This may hinder the ability of others to recognize and respond to their pain appropriately. Full article
(This article belongs to the Section Neuropsychiatry)

23 pages, 3668 KB  
Article
Graph-Driven Micro-Expression Rendering with Emotionally Diverse Expressions for Lifelike Digital Humans
by Lei Fang, Fan Yang, Yichen Lin, Jing Zhang and Mincheol Whang
Biomimetics 2025, 10(9), 587; https://doi.org/10.3390/biomimetics10090587 - 3 Sep 2025
Abstract
Micro-expressions, characterized by brief and subtle facial muscle movements, are essential for conveying nuanced emotions in digital humans, yet existing rendering techniques often produce rigid or emotionally monotonous animations due to the inadequate modeling of temporal dynamics and action unit interdependencies. This paper proposes a graph-driven framework for micro-expression rendering that generates emotionally diverse and lifelike expressions. We employ a 3D-ResNet-18 backbone network to perform joint spatio-temporal feature extraction from facial video sequences, enhancing sensitivity to transient motion cues. Action units (AUs) are modeled as nodes in a symmetric graph, with edge weights derived from empirical co-occurrence probabilities and processed via a graph convolutional network to capture structural dependencies and symmetric interactions. This symmetry is justified by the bilateral nature of human facial anatomy: AU relationships derived from co-occurrence statistics and FACS-based anatomical analysis are non-directional, so an undirected graph with a symmetric adjacency matrix matches the assumptions of classic spectral GCNs. Predicted AU activations and timestamps are interpolated into continuous motion curves using B-spline functions and mapped to skeletal controls within a real-time animation pipeline (Unreal Engine). Experiments on the CASME II dataset demonstrate superior performance, achieving an F1-score of 77.93% and an accuracy of 84.80% (k-fold cross-validation, k = 5), outperforming baselines in temporal segmentation. Subjective evaluations confirm that the rendered digital human exhibits improvements in perceptual clarity, naturalness, and realism. This approach bridges micro-expression recognition and high-fidelity facial animation, enabling more expressive virtual interactions through curve extraction from AU values and timestamps. Full article
(This article belongs to the Section Bioinspired Sensorics, Information Processing and Control)
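A minimal sketch of the symmetric-graph idea, assuming a generic spectral-style GCN layer: build a symmetrically normalized adjacency from an AU co-occurrence matrix and propagate AU node features through it. The number of AUs, feature sizes, and single-layer design are illustrative, not the paper's network.

```python
import torch
import torch.nn as nn

def normalized_adjacency(cooc: torch.Tensor) -> torch.Tensor:
    """Build D^-1/2 (A + I) D^-1/2 from an AU co-occurrence probability
    matrix (assumed symmetric)."""
    a = cooc + torch.eye(cooc.size(0))
    d_inv_sqrt = a.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * a * d_inv_sqrt.unsqueeze(0)

class AUGraphConv(nn.Module):
    """One spectral-style GCN layer over AU nodes: H' = ReLU(A_hat H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, a_hat):              # h: (n_aus, in_dim)
        return torch.relu(a_hat @ self.w(h))

# Toy example with 12 AUs and 16-dimensional node features.
cooc = torch.rand(12, 12); cooc = (cooc + cooc.T) / 2   # enforce symmetry
a_hat = normalized_adjacency(cooc)
layer = AUGraphConv(16, 32)
print(layer(torch.rand(12, 16), a_hat).shape)           # torch.Size([12, 32])
```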

23 pages, 10088 KB  
Article
Development of an Interactive Digital Human with Context-Sensitive Facial Expressions
by Fan Yang, Lei Fang, Rui Suo, Jing Zhang and Mincheol Whang
Sensors 2025, 25(16), 5117; https://doi.org/10.3390/s25165117 - 18 Aug 2025
Abstract
With the increasing complexity of human–computer interaction scenarios, conventional digital human facial expression systems show notable limitations in handling multi-emotion co-occurrence, dynamic expression, and semantic responsiveness. This paper proposes a digital human system framework that integrates multimodal emotion recognition and compound facial expression generation. The system establishes a complete pipeline for real-time interaction and compound emotional expression, following a sequence of “speech semantic parsing—multimodal emotion recognition—Action Unit (AU)-level 3D facial expression control.” First, a ResNet18-based model is employed for robust emotion classification using the AffectNet dataset. Then, an AU motion curve driving module is constructed on the Unreal Engine platform, where dynamic synthesis of basic emotions is achieved via a state-machine mechanism. Finally, Generative Pre-trained Transformer (GPT) is utilized for semantic analysis, generating structured emotional weight vectors that are mapped to the AU layer to enable language-driven facial responses. Experimental results demonstrate that the proposed system significantly improves facial animation quality, with naturalness increasing from 3.54 to 3.94 and semantic congruence from 3.44 to 3.80. These results validate the system’s capability to generate realistic and emotionally coherent expressions in real time. This research provides a complete technical framework and practical foundation for high-fidelity digital humans with affective interaction capabilities. Full article
(This article belongs to the Special Issue Emotion Recognition Based on Sensors (3rd Edition))

14 pages, 1124 KB  
Article
The Correlation Between Body Pain Indicators and the Facial Expression Scale in Sows During Farrowing and Pre-Weaning: The Effects of Parity, the Farrowing Moment, and Suckling Events
by Elena Navarro, Raúl David Guevara, Eva Mainau, Ricardo de Miguel and Xavier Manteca
Animals 2025, 15(15), 2225; https://doi.org/10.3390/ani15152225 - 28 Jul 2025
Abstract
Parturition is accepted as a painful situation. Few studies explore pain-specific behaviours during farrowing in sows. The objectives of this study were, first, to assess if behavioural pain indicators (BPIs) are affected by the farrowing moment, parity, and suckling events, and second, to determine the relationship between the Facial Action Units (FAUs) and BPIs during farrowing. Ten Danbred sows were recorded throughout farrowing and on day 19 post-farrowing. Continuous observations of five BPIs and five FAUs were obtained across the three moments studied: (i) at the expulsion of the piglets, (ii) the time interval between the delivery of each piglet, and (iii) 19 days after farrowing, used as a control. Primiparous sows had more BPIs but fewer postural changes than multiparous sows. The BPIs were more frequent during suckling events in the pre-weaning moment. All the FAUs and BPIs were rare or absent post-farrowing (p < 0.05), and almost all of them were more frequent during farrowing (especially at the moment of delivery). Back arching showed the highest correlation with all the FAUs, and tension above the eyes showed the highest correlation with four of the BPIs. The BPIs and FAUs indicate that sows experience more pain during farrowing than during the third week post-farrowing, and piglet expulsion is the most painful moment in farrowing. Full article
(This article belongs to the Section Animal Welfare)
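A small sketch of the kind of correlation analysis the abstract reports, using synthetic stand-in counts and Spearman correlation; the indicator counts and distributions are invented for illustration only.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical per-observation counts: rows are observation windows, columns
# are indicators. Values are synthetic stand-ins, not the study's data.
rng = np.random.default_rng(0)
faus = rng.poisson(2, size=(60, 5))    # five Facial Action Units
bpis = rng.poisson(3, size=(60, 5))    # five behavioural pain indicators

# spearmanr on two 2-D arrays returns a combined 10x10 correlation matrix;
# the off-diagonal block holds the FAU-vs-BPI correlations of interest.
rho, p = spearmanr(faus, bpis)
fau_vs_bpi = rho[:5, 5:]
print(np.round(fau_vs_bpi, 2))
```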

18 pages, 5806 KB  
Article
Optical Flow Magnification and Cosine Similarity Feature Fusion Network for Micro-Expression Recognition
by Heyou Chang, Jiazheng Yang, Kai Huang, Wei Xu, Jian Zhang and Hao Zheng
Mathematics 2025, 13(15), 2330; https://doi.org/10.3390/math13152330 - 22 Jul 2025
Abstract
Recent advances in deep learning have significantly advanced micro-expression recognition, yet most existing methods process the entire facial region holistically, struggling to capture subtle variations in facial action units, which limits recognition performance. To address this challenge, we propose the Optical Flow Magnification and Cosine Similarity Feature Fusion Network (MCNet). MCNet introduces a multi-facial action optical flow estimation module that integrates global motion-amplified optical flow with localized optical flow from the eye and mouth–nose regions, enabling precise capture of facial expression nuances. Additionally, an enhanced MobileNetV3-based feature extraction module, incorporating Kolmogorov–Arnold networks and convolutional attention mechanisms, effectively captures both global and local features from optical flow images. A novel multi-channel feature fusion module leverages cosine similarity between Query and Key token sequences to optimize feature integration. Extensive evaluations on four public datasets—CASME II, SAMM, SMIC-HS, and MMEW—demonstrate MCNet’s superior performance, achieving state-of-the-art results with 92.88% UF1 and 86.30% UAR on the composite dataset, surpassing the best prior method by 1.77% in UF1 and 6.0% in UAR. Full article
(This article belongs to the Special Issue Representation Learning for Computer Vision and Pattern Recognition)
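The sketch below illustrates the general idea of cosine-similarity-based fusion between Query and Key token sequences (similarities used as weights over Value tokens); it is a generic attention-style formulation with assumed shapes, not MCNet's actual fusion module.

```python
import torch
import torch.nn.functional as F

def cosine_similarity_fusion(query: torch.Tensor, key: torch.Tensor,
                             value: torch.Tensor) -> torch.Tensor:
    """Fuse two token sequences by weighting Value tokens with the cosine
    similarity between Query and Key tokens (a sketch of the idea only)."""
    q = F.normalize(query, dim=-1)           # (B, Tq, D)
    k = F.normalize(key, dim=-1)             # (B, Tk, D)
    sim = q @ k.transpose(1, 2)              # cosine similarities in [-1, 1]
    weights = sim.softmax(dim=-1)
    return weights @ value                   # (B, Tq, D)

# Toy usage: fuse global-flow tokens (query) with local eye/mouth tokens.
fused = cosine_similarity_fusion(torch.rand(2, 49, 64),
                                 torch.rand(2, 16, 64),
                                 torch.rand(2, 16, 64))
print(fused.shape)                           # torch.Size([2, 49, 64])
```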

21 pages, 1709 KB  
Article
Decoding Humor-Induced Amusement via Facial Expression Analysis: Toward Emotion-Aware Applications
by Gabrielle Toupin, Arthur Dehgan, Marie Buffo, Clément Feyt, Golnoush Alamian, Karim Jerbi and Anne-Lise Saive
Appl. Sci. 2025, 15(13), 7499; https://doi.org/10.3390/app15137499 - 3 Jul 2025
Abstract
Humor is widely recognized for its positive effects on well-being, including stress reduction, mood enhancement, and cognitive benefits. Yet, the lack of reliable tools to objectively quantify amusement—particularly its temporal dynamics—has limited progress in this area. Existing measures often rely on self-report or coarse summary ratings, providing little insight into how amusement unfolds over time. To address this gap, we developed a Random Forest model to predict the intensity of amusement evoked by humorous video clips, based on participants’ facial expressions—particularly the co-activation of Facial Action Units 6 and 12 (“% Smile”)—and video features such as motion, saliency, and topic. Our results show that exposure to humorous content significantly increases “% Smile”, with amusement peaking toward the end of videos. Importantly, we observed emotional carry-over effects, suggesting that consecutive humorous stimuli can sustain or amplify positive emotional responses. Even when trained solely on humorous content, the model reliably predicted amusement intensity, underscoring the robustness of our approach. Overall, this study provides a novel, objective method to track amusement on a fine temporal scale, advancing the measurement of nonverbal emotional expression. These findings may inform the design of emotion-aware applications and humor-based therapeutic interventions to promote well-being and emotional health. Full article
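A minimal scikit-learn sketch of the modeling setup the abstract describes: a Random Forest regressor predicting amusement intensity from a "% Smile" feature (AU6 and AU12 co-activation) plus simple video features. The data here are synthetic and the feature set is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical features per video segment (synthetic stand-ins):
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(0, 100, 500),     # % of frames with AU6 and AU12 co-active
    rng.uniform(0, 1, 500),       # video motion
    rng.uniform(0, 1, 500),       # video saliency
])
y = 0.05 * X[:, 0] + rng.normal(0, 0.5, 500)   # synthetic amusement score

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print(model.feature_importances_)   # "% Smile" should dominate here
```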

21 pages, 2869 KB  
Article
Multimodal Feature-Guided Audio-Driven Emotional Talking Face Generation
by Xueping Wang, Yuemeng Huo, Yanan Liu, Xueni Guo, Feihu Yan and Guangzhe Zhao
Electronics 2025, 14(13), 2684; https://doi.org/10.3390/electronics14132684 - 2 Jul 2025
Abstract
Audio-driven emotional talking face generation aims to generate talking face videos with rich facial expressions and temporal coherence. Current diffusion model-based approaches predominantly depend on either single-label emotion annotations or external video references, which often struggle to capture the complex relationships between modalities, resulting in less natural emotional expressions. To address these issues, we propose MF-ETalk, a multimodal feature-guided method for emotional talking face generation. Specifically, we design an emotion-aware multimodal feature disentanglement and fusion framework that leverages Action Units (AUs) to disentangle facial expressions and models the nonlinear relationships among AU features using a residual encoder. Furthermore, we introduce a hierarchical multimodal feature fusion module that enables dynamic interactions among audio, visual cues, AUs, and motion dynamics. This module is optimized through global motion modeling, lip synchronization, and expression subspace learning, enabling full-face dynamic generation. Finally, an emotion-consistency constraint module is employed to refine the generated results and ensure the naturalness of expressions. Extensive experiments on the MEAD and HDTF datasets demonstrate that MF-ETalk outperforms state-of-the-art methods in both expression naturalness and lip-sync accuracy. For example, it achieves an FID of 43.052 and E-FID of 2.403 on MEAD, along with strong synchronization performance (LSE-C of 6.781, LSE-D of 7.962), confirming the effectiveness of our approach in producing realistic and emotionally expressive talking face videos. Full article

20 pages, 3651 KB  
Article
A Meta-Learner Based on the Combination of Stacking Ensembles and a Mixture of Experts for Balancing Action Unit Recognition
by Andrew Sumsion and Dah-Jye Lee
Electronics 2025, 14(13), 2665; https://doi.org/10.3390/electronics14132665 - 30 Jun 2025
Abstract
Facial action units (AUs) are used throughout animation, clinical settings, and robotics. AU recognition usually works better for these downstream tasks when it achieves high performance across all AUs. Current facial AU recognition approaches tend to perform unevenly across AUs. One cause, among others, is their focus on maximizing the overall average F1 score, which can be raised by strong performance on a few AUs even when performance on the remaining AUs is poor. Building on our previous approach, which achieved the highest average F1 score, this work focuses on improving performance across all AUs to address this challenge. We propose a mixture of experts as the meta-learner to combine the outputs of an explicit stacking ensemble; the ensemble itself is a heterogeneous, negative-correlation, explicit stacking ensemble. We introduce an additional measurement, Borda ranking, to better evaluate overall performance across all AUs. As indicated by this metric, our method not only maintains the best overall average F1 score but also achieves the highest performance across all AUs on the BP4D and DISFA datasets. We also release a synthetic dataset as additional training data, the first with balanced AU labels. Full article
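For reference, a generic Borda-count computation over per-AU F1 scores looks like the sketch below; the scores are invented and the exact ranking variant used in the paper may differ.

```python
import numpy as np

def borda_scores(f1_per_au: np.ndarray) -> np.ndarray:
    """Generic Borda count across AUs: for each AU, rank the methods (best
    rank earns the most points), then sum points per method.
    Rows = methods, columns = AUs."""
    # argsort twice gives ranks (0 = worst); the rank value is the point count.
    points = np.argsort(np.argsort(f1_per_au, axis=0), axis=0)
    return points.sum(axis=1)

# Three methods evaluated on four AUs (illustrative F1 scores).
f1 = np.array([[0.60, 0.55, 0.70, 0.40],
               [0.58, 0.60, 0.65, 0.55],
               [0.62, 0.50, 0.60, 0.50]])
print(borda_scores(f1))   # higher total = better balanced across AUs
```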

23 pages, 3258 KB  
Article
Trade-Off Between Energy Consumption and Three Configuration Parameters in Artificial Intelligence (AI) Training: Lessons for Environmental Policy
by Sri Ariyanti, Muhammad Suryanegara, Ajib Setyo Arifin, Amalia Irma Nurwidya and Nur Hayati
Sustainability 2025, 17(12), 5359; https://doi.org/10.3390/su17125359 - 10 Jun 2025
Abstract
Rapid advancements in artificial intelligence (AI) have led to a substantial increase in energy consumption, particularly during the training phase of AI models. As AI adoption continues to grow, its environmental impact presents a significant challenge to the achievement of the United Nations’ Sustainable Development Goals (SDGs). This study examines how three key training configuration parameters—early-stopping epochs, training data size, and batch size—can be optimized to balance model accuracy and energy efficiency. Through a series of experimental simulations, we analyze the impact of each parameter on both energy consumption and model performance, offering insights that contribute to the development of environmental policies that are aligned with the SDGs. The results demonstrate strong potential for reducing energy usage without compromising model reliability. The results highlight three lessons: promoting early-stopping epochs as an energy-efficient practice, limiting training data size to enhance energy efficiency, and developing standardized guidelines for batch size optimization. The practical applicability of these three lessons is illustrated through the implementation of a smart building attendance system using facial recognition technology within an Ecocampus environment. This real-world application highlights how energy-conscious AI training configurations support sustainable urban innovation and contribute to climate action and environmentally responsible AI development. Full article
(This article belongs to the Special Issue Artificial Intelligence and Sustainable Development)
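The early-stopping lever can be made concrete with a small sketch: given a validation-loss curve, compute the epoch at which patience-based early stopping would halt training, and hence how many planned epochs (and their energy) are avoided. The loss curve and patience value are illustrative, not the study's settings.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training stops: the first epoch
    after which validation loss has failed to improve for `patience` epochs."""
    best, stale = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses)

# Illustrative curve: improvement stalls after epoch 12, so with patience=5
# training stops at epoch 17 instead of running all 100 planned epochs,
# avoiding the energy cost of the remaining 83 epochs.
losses = [1 / e for e in range(1, 13)] + [0.084] * 88
print(early_stopping_epoch(losses, patience=5))   # 17
```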

23 pages, 1664 KB  
Article
Seeing the Unseen: Real-Time Micro-Expression Recognition with Action Units and GPT-Based Reasoning
by Gabriela Laura Sălăgean, Monica Leba and Andreea Cristina Ionica
Appl. Sci. 2025, 15(12), 6417; https://doi.org/10.3390/app15126417 - 6 Jun 2025
Abstract
This paper presents a real-time system for the detection and classification of facial micro-expressions, evaluated on the CASME II dataset. Micro-expressions are brief and subtle indicators of genuine emotions, posing significant challenges for automatic recognition due to their low intensity, short duration, and inter-subject variability. To address these challenges, the proposed system integrates advanced computer vision techniques, rule-based classification grounded in the Facial Action Coding System, and artificial intelligence components. The architecture employs MediaPipe for facial landmark tracking and action unit extraction, expert rules to resolve common emotional confusions, and deep learning modules for optimized classification. Experimental validation demonstrated a classification accuracy of 93.30% on CASME II, highlighting the effectiveness of the hybrid design. The system also incorporates mechanisms for amplifying weak signals and adapting to new subjects through continuous knowledge updates. These results confirm the advantages of combining domain expertise with AI-driven reasoning to improve micro-expression recognition. The proposed methodology has practical implications for various fields, including clinical psychology, security, marketing, and human-computer interaction, where the accurate interpretation of emotional micro-signals is essential. Full article
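A minimal sketch of the landmark-plus-rules flavor of pipeline described above, assuming MediaPipe Face Mesh: extract facial landmarks and apply a simple threshold rule to a geometric cue. The landmark indices, the threshold, and the rule itself are illustrative assumptions, not the paper's calibrated FACS rules.

```python
import numpy as np
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh

def eye_openness(landmarks, upper=159, lower=145):
    """Vertical distance between (assumed) upper and lower eyelid landmarks."""
    return abs(landmarks[upper].y - landmarks[lower].y)

with mp_face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as mesh:
    frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in RGB frame
    results = mesh.process(frame)
    if results.multi_face_landmarks:
        lm = results.multi_face_landmarks[0].landmark
        openness = eye_openness(lm)
        # Illustrative rule: widened eyes as a surprise-like cue.
        label = "eyes widened" if openness > 0.06 else "neutral"
        print(f"eye openness {openness:.3f}: {label}")
    else:
        print("no face detected in this frame")
```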

17 pages, 4080 KB  
Article
Defining and Analyzing Nervousness Using AI-Based Facial Expression Recognition
by Hyunsoo Seo, Seunghyun Kim and Eui Chul Lee
Mathematics 2025, 13(11), 1745; https://doi.org/10.3390/math13111745 - 25 May 2025
Abstract
Nervousness is a complex emotional state characterized by high arousal and ambiguous valence, often triggered in high-stress environments. This study presents a mathematical and computational framework for defining and classifying nervousness using facial expression data projected onto a valence–arousal (V–A) space. A statistical approach employing the Minimum Covariance Determinant (MCD) estimator is used to construct 90% and 99% confidence ellipses for nervous and non-nervous states, respectively, using Mahalanobis distance. These ellipses form the basis for binary labeling of the AffectNet dataset. We apply a deep learning model trained via knowledge distillation, with EmoNet as the teacher and MobileNetV2 as the student, to efficiently classify nervousness. The experimental results on the AffectNet dataset show that our proposed method achieves a classification accuracy of 81.08%, improving over the baseline by approximately 6%. These results are obtained by refining the valence–arousal distributions and applying knowledge distillation from EmoNet to MobileNetV2. We use accuracy and F1-score as evaluation metrics to validate the performance. Furthermore, we perform a qualitative analysis using action unit (AU) activation graphs to provide deeper insight into nervous facial expressions. The proposed method demonstrates how mathematical tools and deep learning can be integrated for robust affective state modeling. Full article
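A small sketch of the robust-ellipse labeling idea, assuming scikit-learn's MinCovDet: fit a robust location and covariance to synthetic valence-arousal points, compute squared Mahalanobis distances, and keep points inside a chi-square-based 90% confidence ellipse. The data and the single-class setup are stand-ins for the paper's two-ellipse labeling scheme.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

# Synthetic valence-arousal points standing in for one affective state.
rng = np.random.default_rng(0)
va_points = rng.multivariate_normal([-0.2, 0.7],
                                    [[0.05, 0.01], [0.01, 0.03]],
                                    size=500)

mcd = MinCovDet(random_state=0).fit(va_points)
d2 = mcd.mahalanobis(va_points)                 # squared Mahalanobis distances
threshold = chi2.ppf(0.90, df=2)                # 90% ellipse for 2-D data
inside = d2 <= threshold
print(f"{inside.mean():.1%} of points fall inside the 90% confidence ellipse")
```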
