Search Results (149)

Search Parameters:
Keywords = facial affective computing

22 pages, 964 KB  
Article
Multi-Modal Emotion Detection and Tracking System Using AI Techniques
by Werner Mostert, Anish Kurien and Karim Djouani
Computers 2025, 14(10), 441; https://doi.org/10.3390/computers14100441 - 16 Oct 2025
Viewed by 352
Abstract
Emotion detection significantly impacts healthcare by enabling personalized patient care and improving treatment outcomes. Single-modality emotion recognition often lacks reliability due to the complexity and subjectivity of human emotions. This study proposes a multi-modal emotion detection platform integrating visual, audio, and heart rate data using AI techniques, including convolutional neural networks and support vector machines. The system outperformed single-modality approaches, demonstrating enhanced accuracy and robustness. This improvement underscores the value of multi-modal AI in emotion detection, offering potential benefits across healthcare, education, and human–computer interaction.
(This article belongs to the Special Issue Advances in Semantic Multimedia and Personalized Digital Content)
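
As a rough illustration of the kind of late fusion such a multi-modal system might use (the listing does not include the authors' code; the emotion classes and modality weights below are illustrative assumptions), each modality's classifier can output class probabilities that are combined by a weighted average:

```python
import numpy as np

# Illustrative emotion classes and per-modality fusion weights
# (assumptions, not values from the paper).
CLASSES = ["angry", "happy", "neutral", "sad"]
WEIGHTS = {"visual": 0.5, "audio": 0.3, "heart_rate": 0.2}

def fuse_modalities(probs: dict[str, np.ndarray]) -> str:
    """Weighted late fusion of per-modality class-probability vectors."""
    fused = np.zeros(len(CLASSES))
    for modality, p in probs.items():
        fused += WEIGHTS[modality] * np.asarray(p)
    fused /= fused.sum()  # renormalise to a probability distribution
    return CLASSES[int(np.argmax(fused))]

# Example: outputs of a visual CNN, an audio model, and a heart-rate model.
print(fuse_modalities({
    "visual":     [0.10, 0.70, 0.15, 0.05],
    "audio":      [0.20, 0.50, 0.20, 0.10],
    "heart_rate": [0.25, 0.40, 0.25, 0.10],
}))  # -> "happy"
```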

9 pages, 3434 KB  
Communication
Equine Skull Fractures: A Review of 13 Cases Managed Conservatively (2018–2022)
by Melanie Perrier, Maty Looijen and Gabriel Manso-Diaz
Sinusitis 2025, 9(2), 20; https://doi.org/10.3390/sinusitis9020020 - 15 Oct 2025
Viewed by 194
Abstract
This retrospective study reviews the clinical features, computed tomography (CT) findings, complications and outcomes of horses with skull fractures involving the facial bones. Medical records from the Royal Veterinary College, Hatfield, United Kingdom, and the Universidad Complutense, Madrid, Spain, were reviewed to identify horses presented for head CT with a history of skull fracture involving the facial bones between 2018 and 2022. Thirteen horses were included. Secondary sinusitis was present in 10 of the horses, with the rostral maxillary, caudal maxillary and ventral conchal sinuses being the most commonly affected. There was associated fracture of dental structures in three cases. Treatment was conservative in seven cases, while in six horses minimal surgical intervention was undertaken, comprising removal of loose bony fragments and trephination for sinoscopy in two cases, fragment removal and sinus flush through a Foley catheter in three cases, and dental extraction in one case. Prognosis was reported to be good to excellent in 10 horses. Among the most common complications, cosmetic sequelae were recorded in three cases. Overall, conservative management of skull fractures should be considered a viable option for cases where perfect cosmetic results are not expected and where economics may be a limitation.

14 pages, 920 KB  
Article
AI-Based Facial Emotion Analysis for Early and Differential Diagnosis of Dementia
by Letizia Bergamasco, Anita Coletta, Gabriella Olmo, Aurora Cermelli, Elisa Rubino and Innocenzo Rainero
Bioengineering 2025, 12(10), 1082; https://doi.org/10.3390/bioengineering12101082 - 4 Oct 2025
Viewed by 785
Abstract
Early and differential diagnosis of dementia is essential for timely and targeted care. This study investigated the feasibility of using an artificial intelligence (AI)-based system to discriminate between different stages and etiologies of dementia by analyzing facial emotions. We collected video recordings of 64 participants exposed to standardized audio-visual stimuli. Facial emotion features in terms of valence and arousal were extracted and used to train machine learning models on multiple classification tasks, including distinguishing individuals with mild cognitive impairment (MCI) and overt dementia from healthy controls (HCs) and differentiating Alzheimer’s disease (AD) from other types of cognitive impairment. Nested cross-validation was adopted to evaluate the performance of different tested models (K-Nearest Neighbors, Logistic Regression, and Support Vector Machine models) and optimize their hyperparameters. The system achieved a cross-validation accuracy of 76.0% for MCI vs. HCs, 73.6% for dementia vs. HCs, and 64.1% in the three-class classification (MCI vs. dementia vs. HCs). Among cognitively impaired individuals, a 75.4% accuracy was reached in distinguishing AD from other etiologies. These results demonstrated the potential of AI-driven facial emotion analysis as a non-invasive tool for early detection of cognitive impairment and for supporting differential diagnosis of AD in clinical settings.
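
The nested cross-validation scheme described in this abstract (an inner loop for hyperparameter tuning, an outer loop for performance estimation) can be sketched with scikit-learn; the feature matrix, hyperparameter grid, and fold counts here are placeholders rather than the study's actual configuration:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 20))    # placeholder valence/arousal features
y = rng.integers(0, 2, size=64)  # placeholder labels (e.g., MCI vs. HC)

# Inner loop: hyperparameter search; outer loop: unbiased performance estimate.
inner = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

model = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]},
    cv=inner,
)
scores = cross_val_score(model, X, y, cv=outer, scoring="accuracy")
print(f"nested CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```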

19 pages, 5381 KB  
Article
Context-Driven Emotion Recognition: Integrating Multi-Cue Fusion and Attention Mechanisms for Enhanced Accuracy on the NCAER-S Dataset
by Merieme Elkorchi, Boutaina Hdioud, Rachid Oulad Haj Thami and Safae Merzouk
Information 2025, 16(10), 834; https://doi.org/10.3390/info16100834 - 26 Sep 2025
Viewed by 398
Abstract
In recent years, most conventional emotion recognition approaches have concentrated primarily on facial cues, often overlooking complementary sources of information such as body posture and contextual background. This limitation reduces their effectiveness in complex, real-world environments. In this work, we present a multi-branch emotion recognition framework that separately processes facial, bodily, and contextual information using three dedicated neural networks. To better capture contextual cues, we intentionally mask the face and body of the main subject within the scene, prompting the model to explore alternative visual elements that may convey emotional states. To further enhance the quality of the extracted features, we integrate both channel and spatial attention mechanisms into the network architecture. Evaluated on the challenging NCAER-S dataset, our model achieves an accuracy of 56.42%, surpassing the state-of-the-art GLAMOUR-Net. These results highlight the effectiveness of combining multi-cue representation and attention-guided feature extraction for robust emotion recognition in unconstrained settings. The findings also underscore the importance of accurate emotion recognition for human–computer interaction, where affect detection enables systems to adapt to users and deliver more effective experiences.
(This article belongs to the Special Issue Multimodal Human-Computer Interaction)
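
Channel and spatial attention of the kind mentioned above are commonly implemented along the lines of CBAM; the following PyTorch sketch shows generic versions of such blocks applied to a context-branch feature map, not the authors' actual architecture:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze spatial dimensions, then reweight each channel (CBAM-style)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        w = torch.sigmoid(avg + mx)[:, :, None, None]
        return x * w

class SpatialAttention(nn.Module):
    """Reweight each spatial location from channel-wise statistics."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(stats))

feat = torch.randn(4, 64, 28, 28)  # placeholder context-branch feature map
out = SpatialAttention()(ChannelAttention(64)(feat))
print(out.shape)                   # torch.Size([4, 64, 28, 28])
```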

26 pages, 7399 KB  
Article
ECL-ConvNeXt: An Ensemble Strategy Combining ConvNeXt and Contrastive Learning for Facial Beauty Prediction
by Junying Gan, Wenchao Xu, Hantian Chen, Zhen Chen, Zhenxin Zhuang and Huicong Li
Electronics 2025, 14(19), 3777; https://doi.org/10.3390/electronics14193777 - 24 Sep 2025
Viewed by 362
Abstract
Facial beauty prediction (FBP) is a cutting-edge topic in deep learning, aiming to endow computers with human-like esthetic judgment capabilities. Current facial beauty datasets are characterized by multi-class classification and imbalanced sample distributions. Most FBP methods focus on improving accuracy (ACC) as their primary goal, aiming to indirectly optimize other metrics. In contrast to ACC, which is well known to be a poor metric for highly imbalanced datasets, recall measures the proportion of correctly identified samples for each class, effectively evaluating classification performance across all classes without being affected by sample imbalances, thereby providing a fairer assessment of minority class performance. Therefore, targeting recall improvement facilitates balanced classification across all classes. The Macro Recall (MR), which averages the recall of all the classes, serves as a comprehensive metric for evaluating a model’s performance. Among numerous classic models, ConvNeXt, which integrates the designs of the Swin Transformer and ResNet, performs exceptionally well regarding its MR but still suffers from inter-class confusion in certain categories. To address this issue, this paper introduces contrastive learning (CL) to enhance the class separability by optimizing feature representations and reducing confusion. However, directly applying CL to all the classes may degrade the performance for high-recall categories. To this end, we propose an ensemble strategy, ECL-ConvNeXt: First, ConvNeXt is used for multi-class prediction on all of dataset A to identify the most confused class pairs. Second, samples predicted to belong to these class pairs are extracted from the multi-class results to form dataset B. Third, true samples of these class pairs are extracted from dataset A to form dataset C, and CL is applied to improve their separability, training a dedicated auxiliary binary classifier (ConvNeXtCL-ABC) based on ConvNeXt. Subsequently, ConvNeXtCL-ABC is used to reclassify dataset B. Finally, the predictions of ConvNeXtCL-ABC replace the corresponding class predictions of ConvNeXt, while preserving the high recall performance for the other classes. The experimental results demonstrate that ECL-ConvNeXt significantly improves the classification performance for confused class pairs while maintaining strong performance for high-recall classes. On the LSAFBD dataset, it achieves 72.09% ACC and 75.43% MR; on the MEBeauty dataset, 73.23% ACC and 67.50% MR; on the HotOrNot dataset, 62.62% ACC and 49.29% MR. The approach is also generalizable to other multi-class imbalanced data scenarios.
(This article belongs to the Special Issue Applications of Computer Vision, 3rd Edition)
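
The Macro Recall metric and the "find the most confused class pair, then reclassify only those samples" step described above can be sketched with scikit-learn and NumPy; the labels and the auxiliary classifier below are stand-ins, not the paper's models:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3])
y_pred = np.array([0, 1, 1, 2, 2, 2, 1, 2, 2, 3, 3, 0])  # placeholder predictions

# Macro Recall: unweighted mean of per-class recall.
print("MR:", recall_score(y_true, y_pred, average="macro"))

# Find the most confused (off-diagonal) class pair in the confusion matrix.
cm = confusion_matrix(y_true, y_pred)
off = cm + cm.T
np.fill_diagonal(off, 0)
a, b = np.unravel_index(np.argmax(off), off.shape)
print("most confused pair:", (a, b))

# Reclassify only the samples predicted as class a or b with an auxiliary
# binary classifier (here a trivial stand-in for ConvNeXtCL-ABC).
def auxiliary_binary_classifier(preds):
    return preds  # placeholder: a real classifier would re-predict a vs. b

mask = np.isin(y_pred, [a, b])
y_pred[mask] = auxiliary_binary_classifier(y_pred[mask])
```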

30 pages, 2050 KB  
Article
An Ensemble Learning Approach for Facial Emotion Recognition Based on Deep Learning Techniques
by Manal Almubarak and Fawaz A. Alsulaiman
Electronics 2025, 14(17), 3415; https://doi.org/10.3390/electronics14173415 - 27 Aug 2025
Cited by 1 | Viewed by 899
Abstract
Facial emotion recognition (FER) is an evolving sub-field of computer vision and affective computing. It entails the development of algorithms and models to detect, analyze, and interpret facial expressions, thereby determining individuals’ emotional states. This paper explores the effectiveness of transfer learning using the EfficientNet-B0 convolutional neural network for FER, alongside the utilization of stacking techniques. The pretrained EfficientNet-B0 model is trained on a dataset comprising a diverse range of natural human face images for emotion recognition. This dataset consists of grayscale images categorized into eight distinct emotion classes. Our approach involves fine-tuning the pretrained EfficientNet-B0 model, adapting its weights and layers to capture subtle facial expressions. Moreover, this study utilizes ensemble learning by integrating transfer learning from pretrained models, a strategic tuning approach, binary classifiers, and a meta-classifier. Our approach achieves superior performance in accurately identifying and classifying emotions within facial images. Experimental results for the meta-classifier demonstrate 100% accuracy on the test set. For further assessment, we also train our meta-classifier on the Extended Cohn–Kanade (CK+) dataset, achieving 92% accuracy on the test set. These findings highlight the effectiveness and potential of employing transfer learning and stacking techniques with EfficientNet-B0 for FER tasks.
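
A compact way to express the stacking idea (base classifiers whose outputs feed a meta-classifier) is scikit-learn's StackingClassifier; the synthetic features and simple base estimators below are stand-ins for the fine-tuned EfficientNet-B0 pipeline described above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Placeholder features standing in for embeddings of facial images.
X, y = make_classification(n_samples=400, n_features=32, n_classes=4,
                           n_informative=16, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("rf", RandomForestClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),  # the meta-classifier
)
print("stacked accuracy:", stack.fit(X_tr, y_tr).score(X_te, y_te))
```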

23 pages, 811 KB  
Article
Efficient Dynamic Emotion Recognition from Facial Expressions Using Statistical Spatio-Temporal Geometric Features
by Yacine Yaddaden
Big Data Cogn. Comput. 2025, 9(8), 213; https://doi.org/10.3390/bdcc9080213 - 19 Aug 2025
Viewed by 1099
Abstract
Automatic Facial Expression Recognition (AFER) is a key component of affective computing, enabling machines to recognize and interpret human emotions across various applications such as human–computer interaction, healthcare, entertainment, and social robotics. Dynamic AFER systems, which exploit image sequences, can capture the temporal evolution of facial expressions but often suffer from high computational costs, limiting their suitability for real-time use. In this paper, we propose an efficient dynamic AFER approach based on a novel spatio-temporal representation. Facial landmarks are extracted, and all possible Euclidean distances are computed to model the spatial structure. To capture temporal variations, three statistical metrics are applied to each distance sequence. A feature selection stage based on the Extremely Randomized Trees (ExtRa-Trees) algorithm is then performed to reduce dimensionality and enhance classification performance. Finally, the emotions are classified using a linear multi-class Support Vector Machine (SVM) and compared against the k-Nearest Neighbors (k-NN) method. The proposed approach is evaluated on three benchmark datasets: CK+, MUG, and MMI, achieving recognition rates of 94.65%, 93.98%, and 75.59%, respectively. Our results demonstrate that the proposed method achieves a strong balance between accuracy and computational efficiency, making it well-suited for real-time facial expression recognition applications.
(This article belongs to the Special Issue Perception and Detection of Intelligent Vision)
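
The feature construction described above (all pairwise landmark distances per frame, then a few statistics over each distance's time series) is straightforward to sketch in NumPy/SciPy; the landmark count and the specific statistics are illustrative assumptions, and the resulting vector would then feed ExtRa-Trees feature selection and a linear SVM:

```python
import numpy as np
from scipy.spatial.distance import pdist

def spatio_temporal_features(landmarks: np.ndarray) -> np.ndarray:
    """landmarks: (n_frames, n_points, 2) facial landmark trajectories.

    For each frame, compute all pairwise Euclidean distances between landmarks;
    then summarise each distance's time series with three statistics.
    """
    dists = np.stack([pdist(frame) for frame in landmarks])  # (n_frames, n_pairs)
    stats = [dists.mean(axis=0), dists.std(axis=0), np.ptp(dists, axis=0)]
    return np.concatenate(stats)                             # fixed-length vector

seq = np.random.default_rng(0).normal(size=(30, 68, 2))      # 30 frames, 68 points
print(spatio_temporal_features(seq).shape)                   # (6834,) = 3 * 68*67/2
```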

23 pages, 10088 KB  
Article
Development of an Interactive Digital Human with Context-Sensitive Facial Expressions
by Fan Yang, Lei Fang, Rui Suo, Jing Zhang and Mincheol Whang
Sensors 2025, 25(16), 5117; https://doi.org/10.3390/s25165117 - 18 Aug 2025
Viewed by 1041
Abstract
With the increasing complexity of human–computer interaction scenarios, conventional digital human facial expression systems show notable limitations in handling multi-emotion co-occurrence, dynamic expression, and semantic responsiveness. This paper proposes a digital human system framework that integrates multimodal emotion recognition and compound facial expression generation. The system establishes a complete pipeline for real-time interaction and compound emotional expression, following a sequence of “speech semantic parsing—multimodal emotion recognition—Action Unit (AU)-level 3D facial expression control.” First, a ResNet18-based model is employed for robust emotion classification using the AffectNet dataset. Then, an AU motion curve driving module is constructed on the Unreal Engine platform, where dynamic synthesis of basic emotions is achieved via a state-machine mechanism. Finally, Generative Pre-trained Transformer (GPT) is utilized for semantic analysis, generating structured emotional weight vectors that are mapped to the AU layer to enable language-driven facial responses. Experimental results demonstrate that the proposed system significantly improves facial animation quality, with naturalness increasing from 3.54 to 3.94 and semantic congruence from 3.44 to 3.80. These results validate the system’s capability to generate realistic and emotionally coherent expressions in real time. This research provides a complete technical framework and practical foundation for high-fidelity digital humans with affective interaction capabilities.
(This article belongs to the Special Issue Emotion Recognition Based on Sensors (3rd Edition))
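
The final step described above, mapping a structured emotion-weight vector onto AU intensities, can be approximated as a linear blend of per-emotion AU templates; the template values and AU selections below are loose illustrations based on common FACS associations, not the paper's rig or mapping:

```python
# Illustrative per-emotion AU templates (FACS-style associations; assumed values).
AU_TEMPLATES = {
    "happiness": {"AU06": 0.8, "AU12": 1.0},            # cheek raiser, lip corner puller
    "sadness":   {"AU01": 0.7, "AU04": 0.6, "AU15": 0.8},
    "surprise":  {"AU01": 0.8, "AU02": 0.8, "AU26": 0.9},
}

def emotion_weights_to_aus(weights: dict[str, float]) -> dict[str, float]:
    """Blend AU templates by the emotion weights produced by semantic analysis."""
    aus: dict[str, float] = {}
    for emotion, w in weights.items():
        for au, intensity in AU_TEMPLATES.get(emotion, {}).items():
            aus[au] = min(1.0, aus.get(au, 0.0) + w * intensity)
    return aus

# e.g. a compound "pleasantly surprised" state emitted by the language model
print(emotion_weights_to_aus({"happiness": 0.6, "surprise": 0.4}))
```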

25 pages, 1203 KB  
Review
Perception and Monitoring of Sign Language Acquisition for Avatar Technologies: A Rapid Focused Review (2020–2025)
by Khansa Chemnad and Achraf Othman
Multimodal Technol. Interact. 2025, 9(8), 82; https://doi.org/10.3390/mti9080082 - 14 Aug 2025
Viewed by 1557
Abstract
Sign language avatar systems have emerged as a promising solution to bridge communication gaps where human sign language interpreters are unavailable. However, the design of these avatars often fails to account for the diversity in how users acquire and perceive sign language. This study presents a rapid review of 17 empirical studies (2020–2025) to synthesize how linguistic and cognitive variability affects sign language perception and how these findings can guide avatar development. We extracted and synthesized key constructs, participant profiles, and capture techniques relevant to avatar fidelity. This review finds that delayed exposure to sign language is consistently linked to persistent challenges in syntactic processing, classifier use, and avatar comprehension. In contrast, early-exposed signers demonstrate more robust parsing and greater tolerance of perceptual irregularities. Key perceptual features, such as smooth transitions between signs, expressive facial cues for grammatical clarity, and consistent spatial placement of referents, emerge as critical for intelligibility, particularly for late learners. These findings highlight the importance of participatory design and user-centered validation in advancing accessible, culturally responsive human–computer interaction through next-generation avatar systems.

24 pages, 10460 KB  
Article
WGGLFA: Wavelet-Guided Global–Local Feature Aggregation Network for Facial Expression Recognition
by Kaile Dong, Xi Li, Cong Zhang, Zhenhua Xiao and Runpu Nie
Biomimetics 2025, 10(8), 495; https://doi.org/10.3390/biomimetics10080495 - 27 Jul 2025
Viewed by 654
Abstract
Facial expression plays an important role in human–computer interaction and affective computing. However, existing expression recognition methods cannot effectively capture multi-scale structural details contained in facial expressions, leading to a decline in recognition accuracy. Inspired by the multi-scale processing mechanism of the biological visual system, this paper proposes a wavelet-guided global–local feature aggregation network (WGGLFA) for facial expression recognition (FER). Our WGGLFA network consists of three main modules: the scale-aware expansion (SAE) module, which combines dilated convolution and wavelet transform to capture multi-scale contextual features; the structured local feature aggregation (SLFA) module based on facial keypoints to extract structured local features; and the expression-guided region refinement (ExGR) module, which enhances features from high-response expression areas to improve the collaborative modeling between local details and key expression regions. All three modules utilize the spatial frequency locality of the wavelet transform to achieve high-/low-frequency feature separation, thereby enhancing fine-grained expression representation under frequency domain guidance. Experimental results show that our WGGLFA achieves accuracies of 90.32%, 91.24%, and 71.90% on the RAF-DB, FERPlus, and FED-RO datasets, respectively, demonstrating that our WGGLFA is effective and more robust and generalizable than state-of-the-art (SOTA) expression recognition methods.
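
The high-/low-frequency separation that guides the network can be illustrated with a single level of the 2-D discrete wavelet transform; the sketch below uses the PyWavelets package and a random placeholder image, which the paper does not necessarily use:

```python
import numpy as np
import pywt

face = np.random.default_rng(0).random((128, 128))  # placeholder grayscale face

# One level of the 2-D DWT: cA holds the low-frequency approximation,
# (cH, cV, cD) hold horizontal/vertical/diagonal high-frequency detail.
cA, (cH, cV, cD) = pywt.dwt2(face, "haar")
low_freq = cA                                        # coarse facial structure
high_freq = np.stack([cH, cV, cD])                   # fine expression details
print(low_freq.shape, high_freq.shape)               # (64, 64) (3, 64, 64)
```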

24 pages, 9767 KB  
Article
Facial Bone Defects Associated with Lateral Facial Clefts Tessier Type 6, 7 and 8 in Syndromic Neurocristopathies: A Detailed Micro-CT Analysis on Historical Museum Specimens
by Jana Behunova, Helga Rehder, Anton Dobsak, Susanne G. Kircher, Lucas L. Boer, Andreas A. Mueller, Janina M. Patsch, Eduard Winter, Roelof-Jan Oostra, Eva Piehslinger and Karoline M. Reich
Biology 2025, 14(7), 872; https://doi.org/10.3390/biology14070872 - 17 Jul 2025
Viewed by 907
Abstract
Lateral facial clefts are rare and often part of more complex syndromic neurocristopathies. According to Tessier’s classification, they correspond to facial cleft numbers 6, 7 and 8. Using micro-computed tomography (micro-CT), we analyzed their underlying bone defects (resolution 50 and 55 µm/voxel) in the context of the known syndrome-specific genetic background. Lateral facial clefts were diagnosed in three severely affected museum specimens representing mandibulofacial dysostosis type Treacher Collins syndrome (TCS), acrofacial dysostosis syndrome of Rodriguez (AFD-Rod) and tetra-amelia syndrome (TETAMS). The TCS specimen mainly showed an absence of the zygomatic bones and most of the lateral maxilla. The AFD-Rod specimen showed an extensive defect of the lateral maxilla, zygomatic bones, and mandible. The TETAMS specimen showed almost isolated agnathia. Possible relationships are discussed between the diverse facial bone defects due to apoptosis of neural crest-derived cells, known to be associated with ribosomopathies and spliceosomopathies, such as TCS and AFD-Rod, and the more targeted bone defects due to genetic variants known to cause TETAMS.
(This article belongs to the Section Neuroscience)

26 pages, 15354 KB  
Article
Adaptive Neuro-Affective Engagement via Bayesian Feedback Learning in Serious Games for Neurodivergent Children
by Diego Resende Faria and Pedro Paulo da Silva Ayrosa
Appl. Sci. 2025, 15(13), 7532; https://doi.org/10.3390/app15137532 - 4 Jul 2025
Viewed by 897
Abstract
Neuro-Affective Intelligence (NAI) integrates neuroscience, psychology, and artificial intelligence to support neurodivergent children through personalized Child–Machine Interaction (CMI). This paper presents an adaptive neuro-affective system designed to enhance engagement in children with neurodevelopmental disorders through serious games. The proposed framework incorporates real-time biophysical signals—including EEG-based concentration, facial expressions, and in-game performance—to compute a personalized engagement score. We introduce a novel mechanism, Bayesian Immediate Feedback Learning (BIFL), which dynamically selects visual, auditory, or textual stimuli based on real-time neuro-affective feedback. A multimodal CNN-based classifier detects mental states, while a probabilistic ensemble merges affective state classifications derived from facial expressions. A multimodal weighted engagement function continuously updates stimulus–response expectations. The system adapts in real time by selecting the most appropriate cue to support the child’s cognitive and emotional state. Experimental validation with 40 children (ages 6–10) diagnosed with Autism Spectrum Disorder (ASD) and Attention Deficit Hyperactivity Disorder (ADHD) demonstrates the system’s effectiveness in sustaining attention, improving emotional regulation, and increasing overall game engagement. The proposed framework—combining neuro-affective state recognition, multimodal engagement scoring, and BIFL—significantly improved cognitive and emotional outcomes: concentration increased by 22.4%, emotional engagement by 24.8%, and game performance by 32.1%. Statistical analysis confirmed the significance of these improvements (p < 0.001, Cohen’s d > 1.4). These findings demonstrate the feasibility and impact of probabilistic, multimodal, and neuro-adaptive AI systems in therapeutic and educational applications.
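
One generic way to realise feedback-driven stimulus selection of the kind BIFL describes is a Beta-Bernoulli (Thompson-sampling) update over the three cue types; this is a simplified stand-in under stated assumptions, not the authors' BIFL algorithm or engagement function:

```python
import random

# Beta(alpha, beta) belief about how often each cue type raises engagement.
beliefs = {"visual": [1.0, 1.0], "auditory": [1.0, 1.0], "textual": [1.0, 1.0]}

def select_cue() -> str:
    """Thompson sampling: pick the cue whose sampled success rate is highest."""
    return max(beliefs, key=lambda c: random.betavariate(*beliefs[c]))

def update(cue: str, engagement_improved: bool) -> None:
    """Bayesian update from the observed engagement response."""
    beliefs[cue][0 if engagement_improved else 1] += 1.0

random.seed(0)
for _ in range(20):  # simulated interaction loop
    cue = select_cue()
    update(cue, engagement_improved=(cue == "visual"))  # toy feedback signal
print(beliefs)       # "visual" accumulates the most successes
```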

21 pages, 1709 KB  
Article
Decoding Humor-Induced Amusement via Facial Expression Analysis: Toward Emotion-Aware Applications
by Gabrielle Toupin, Arthur Dehgan, Marie Buffo, Clément Feyt, Golnoush Alamian, Karim Jerbi and Anne-Lise Saive
Appl. Sci. 2025, 15(13), 7499; https://doi.org/10.3390/app15137499 - 3 Jul 2025
Viewed by 668
Abstract
Humor is widely recognized for its positive effects on well-being, including stress reduction, mood enhancement, and cognitive benefits. Yet, the lack of reliable tools to objectively quantify amusement—particularly its temporal dynamics—has limited progress in this area. Existing measures often rely on self-report or coarse summary ratings, providing little insight into how amusement unfolds over time. To address this gap, we developed a Random Forest model to predict the intensity of amusement evoked by humorous video clips, based on participants’ facial expressions—particularly the co-activation of Facial Action Units 6 and 12 (“% Smile”)—and video features such as motion, saliency, and topic. Our results show that exposure to humorous content significantly increases “% Smile”, with amusement peaking toward the end of videos. Importantly, we observed emotional carry-over effects, suggesting that consecutive humorous stimuli can sustain or amplify positive emotional responses. Even when trained solely on humorous content, the model reliably predicted amusement intensity, underscoring the robustness of our approach. Overall, this study provides a novel, objective method to track amusement on a fine temporal scale, advancing the measurement of nonverbal emotional expression. These findings may inform the design of emotion-aware applications and humor-based therapeutic interventions to promote well-being and emotional health.
(This article belongs to the Special Issue Emerging Research in Behavioral Neuroscience and in Rehabilitation)
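
The "% Smile" feature (co-activation of AU6 and AU12) and the Random Forest regression step can be sketched as follows; the column names mimic OpenFace-style AU intensity outputs and all data are synthetic, so this is purely an illustration of the idea:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

def percent_smile(frames: pd.DataFrame, threshold: float = 1.0) -> float:
    """Share of frames (in %) where AU06 and AU12 are active at the same time."""
    co_active = (frames["AU06_r"] >= threshold) & (frames["AU12_r"] >= threshold)
    return 100.0 * co_active.mean()

# Toy per-frame AU intensities (illustrative values only).
frames = pd.DataFrame({"AU06_r": [0.2, 1.5, 2.0, 0.8],
                       "AU12_r": [0.1, 1.2, 2.5, 1.4]})
print(percent_smile(frames))        # 50.0: two of four frames co-activate both AUs

# Synthetic per-clip features feeding a Random Forest amusement-intensity model.
rng = np.random.default_rng(0)
X = pd.DataFrame({"pct_smile": rng.uniform(0, 100, 50),
                  "motion": rng.uniform(0, 1, 50),
                  "saliency": rng.uniform(0, 1, 50)})
y = 0.8 * X["pct_smile"] / 100 + rng.normal(0, 0.05, 50)  # toy amusement ratings
model = RandomForestRegressor(random_state=0).fit(X, y)
print(round(model.score(X, y), 3))  # training R^2 on the toy data
```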

22 pages, 568 KB  
Review
A Review of Methods for Unobtrusive Measurement of Work-Related Well-Being
by Zoja Anžur, Klara Žinkovič, Junoš Lukan, Pietro Barbiero, Gašper Slapničar, Mohan Li, Martin Gjoreski, Maike E. Debus, Sebastijan Trojer, Mitja Luštrek and Marc Langheinrich
Mach. Learn. Knowl. Extr. 2025, 7(3), 62; https://doi.org/10.3390/make7030062 - 1 Jul 2025
Viewed by 1508
Abstract
Work-related well-being is an important research topic, as it is linked to various aspects of individuals’ lives, including job performance. To measure it effectively, unobtrusive sensors are desirable to minimize the burden on employees. Because the psychological literature lacks consensus on the definition and dimensions of well-being, our work begins by proposing a conceptualization of well-being based on the refined definition of health provided by the World Health Organization. We focus on reviewing the existing literature on the unobtrusive measurement of well-being, covering affect, engagement, fatigue, stress, sleep deprivation, physical comfort, and social interactions. Our initial search returned a total of 644 studies, of which we reviewed 35. The reviewed methods capture a variety of behavioral markers, most commonly body movement, facial expressions, and posture, along with eye movements and speech. The most commonly used sensing devices were red, green, and blue (RGB) cameras, followed by microphones and smartphones. Our work serves as an investigation into various unobtrusive measurement methods applicable to the workplace context, aiming to foster a more employee-centric approach to the measurement of well-being and to emphasize its affective component.
(This article belongs to the Special Issue Sustainable Applications for Machine Learning)

26 pages, 3494 KB  
Article
A Hyper-Attentive Multimodal Transformer for Real-Time and Robust Facial Expression Recognition
by Zarnigor Tagmatova, Sabina Umirzakova, Alpamis Kutlimuratov, Akmalbek Abdusalomov and Young Im Cho
Appl. Sci. 2025, 15(13), 7100; https://doi.org/10.3390/app15137100 - 24 Jun 2025
Cited by 3 | Viewed by 1167
Abstract
Facial expression recognition (FER) plays a critical role in affective computing, enabling machines to interpret human emotions through facial cues. While recent deep learning models have achieved progress, many still fail under real-world conditions such as occlusion, lighting variation, and subtle expressions. In this work, we propose FERONet, a novel hyper-attentive multimodal transformer architecture tailored for robust and real-time FER. FERONet integrates a triple-attention mechanism (spatial, channel, and cross-patch), a hierarchical transformer with token merging for computational efficiency, and a temporal cross-attention decoder to model emotional dynamics in video sequences. The model fuses RGB, optical flow, and depth/landmark inputs, enhancing resilience to environmental variation. Experimental evaluations across five standard FER datasets—FER-2013, RAF-DB, CK+, BU-3DFE, and AFEW—show that FERONet achieves superior recognition accuracy (up to 97.3%) and real-time inference speeds (<16 ms per frame), outperforming prior state-of-the-art models. The results confirm the model’s suitability for deployment in applications such as intelligent tutoring, driver monitoring, and clinical emotion assessment.
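
The temporal cross-attention idea, letting the current frame's tokens attend to features from earlier frames, can be expressed with PyTorch's built-in multi-head attention; the dimensions are illustrative and this is not FERONet's actual decoder:

```python
import torch
import torch.nn as nn

dim, heads = 256, 8
cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

current = torch.randn(2, 49, dim)      # tokens from the current frame (query)
history = torch.randn(2, 4 * 49, dim)  # tokens pooled from earlier frames (key/value)

# Queries come from the present frame; keys/values from the temporal context,
# so the output mixes in motion cues relevant to the evolving expression.
out, attn_weights = cross_attn(query=current, key=history, value=history)
print(out.shape, attn_weights.shape)   # torch.Size([2, 49, 256]) torch.Size([2, 49, 196])
```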
