Search Results (21)

Search Parameters:
Keywords = facial action unit detection

20 pages, 3651 KiB  
Article
A Meta-Learner Based on the Combination of Stacking Ensembles and a Mixture of Experts for Balancing Action Unit Recognition
by Andrew Sumsion and Dah-Jye Lee
Electronics 2025, 14(13), 2665; https://doi.org/10.3390/electronics14132665 - 30 Jun 2025
Viewed by 230
Abstract
Facial action units (AUs) are used throughout animation, clinical settings, and robotics. AU recognition usually works better for these downstream tasks when it achieves high performance across all AUs. Current facial AU recognition approaches tend to perform unevenly across all AUs. Among other potential reasons, one cause is their focus on improving the overall average F1 score, where good performance on a small number of AUs increases the overall average F1 score even with poor performance on other AUs. Building on our previous success, which achieved the highest average F1 score, this work focuses on improving its performance across all AUs to address this challenge. We propose a mixture of experts as the meta-learner to combine the outputs of an explicit stacking ensemble. For our ensemble, we use a heterogeneous, negative correlation, explicit stacking ensemble. We introduce an additional measurement called Borda ranking to better evaluate the overall performance across all AUs. As indicated by this additional metric, our method not only maintains the best overall average F1 score but also achieves the highest performance across all AUs on the BP4D and DISFA datasets. We also release a synthetic dataset as additional training data, the first with balanced AU labels. Full article
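For illustration, the meta-learning idea can be sketched as a small gating network that assigns per-AU weights to the per-AU probabilities produced by an ensemble of base models. This is a minimal sketch; the layer sizes, expert count, and AU count are assumptions for the example, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn

class MoEMetaLearner(nn.Module):
    """Toy mixture-of-experts meta-learner that fuses the per-AU
    probabilities produced by a stacking ensemble of base models."""
    def __init__(self, num_experts: int, num_aus: int):
        super().__init__()
        # Gating network: maps the concatenated expert outputs to one
        # weight per (expert, AU) pair.
        self.gate = nn.Sequential(
            nn.Linear(num_experts * num_aus, 128),
            nn.ReLU(),
            nn.Linear(128, num_experts * num_aus),
        )
        self.num_experts = num_experts
        self.num_aus = num_aus

    def forward(self, expert_probs: torch.Tensor) -> torch.Tensor:
        # expert_probs: (batch, num_experts, num_aus) AU probabilities.
        flat = expert_probs.flatten(1)
        weights = self.gate(flat).view(-1, self.num_experts, self.num_aus)
        weights = torch.softmax(weights, dim=1)        # weights over experts
        return (weights * expert_probs).sum(dim=1)     # (batch, num_aus)

# Example: 5 ensemble members, 12 AUs (the usual BP4D AU label set size).
moe = MoEMetaLearner(num_experts=5, num_aus=12)
fused = moe(torch.rand(8, 5, 12))
print(fused.shape)  # torch.Size([8, 12])
```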

23 pages, 1664 KiB  
Article
Seeing the Unseen: Real-Time Micro-Expression Recognition with Action Units and GPT-Based Reasoning
by Gabriela Laura Sălăgean, Monica Leba and Andreea Cristina Ionica
Appl. Sci. 2025, 15(12), 6417; https://doi.org/10.3390/app15126417 - 6 Jun 2025
Viewed by 1086
Abstract
This paper presents a real-time system for the detection and classification of facial micro-expressions, evaluated on the CASME II dataset. Micro-expressions are brief and subtle indicators of genuine emotions, posing significant challenges for automatic recognition due to their low intensity, short duration, and inter-subject variability. To address these challenges, the proposed system integrates advanced computer vision techniques, rule-based classification grounded in the Facial Action Coding System, and artificial intelligence components. The architecture employs MediaPipe for facial landmark tracking and action unit extraction, expert rules to resolve common emotional confusions, and deep learning modules for optimized classification. Experimental validation demonstrated a classification accuracy of 93.30% on CASME II, highlighting the effectiveness of the hybrid design. The system also incorporates mechanisms for amplifying weak signals and adapting to new subjects through continuous knowledge updates. These results confirm the advantages of combining domain expertise with AI-driven reasoning to improve micro-expression recognition. The proposed methodology has practical implications for various fields, including clinical psychology, security, marketing, and human-computer interaction, where the accurate interpretation of emotional micro-signals is essential. Full article
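A minimal sketch of the landmark-tracking stage, assuming MediaPipe Face Mesh is used roughly as described; the landmark indices and the brow-distance rule below are illustrative placeholders, not the paper's actual AU rules.

```python
import cv2
import mediapipe as mp
import numpy as np

mp_face_mesh = mp.solutions.face_mesh

def extract_landmarks(image_path):
    """Return a (468, 3) array of normalized face-mesh landmarks, or None."""
    image = cv2.imread(image_path)
    if image is None:
        return None
    with mp_face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as mesh:
        result = mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
    if not result.multi_face_landmarks:
        return None
    lms = result.multi_face_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lms])

# Illustrative rule-based cue: a large brow-to-eyelid distance can serve as a
# proxy for brow-raiser activity (AU1/AU2). Indices 105 (left brow) and
# 159 (left upper eyelid) are example mesh points, not the paper's choice.
lm = extract_landmarks("frame.png")  # hypothetical frame path
if lm is not None:
    brow_eye_dist = np.linalg.norm(lm[105] - lm[159])
    print("brow-eye distance:", brow_eye_dist)
```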

25 pages, 6904 KiB  
Article
A Weighted Facial Expression Analysis for Pain Level Estimation
by Parkpoom Chaisiriprasert and Nattapat Patchsuwan
J. Imaging 2025, 11(5), 151; https://doi.org/10.3390/jimaging11050151 - 9 May 2025
Viewed by 685
Abstract
Accurate assessment of pain intensity is critical, particularly for patients who are unable to verbally express their discomfort. This study proposes a novel weighted analytical framework that integrates facial expression analysis through action units (AUs) with a facial feature-based weighting mechanism to enhance the estimation of pain intensity. The proposed method was evaluated on a dataset comprising 4084 facial images from 25 individuals and demonstrated an average accuracy of 92.72% using the weighted pain level estimation model, in contrast to 83.37% achieved using conventional approaches. The observed improvements are primarily attributed to the strategic utilization of AU zones and expression-based weighting, which enable more precise differentiation between pain-related and non-pain-related facial movements. These findings underscore the efficacy of the proposed model in enhancing the accuracy and reliability of automated pain detection, especially in contexts where verbal communication is impaired or absent. Full article
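The AU-weighting idea can be sketched as a weighted sum of AU intensities, with larger weights on AUs commonly associated with pain (as in the Prkachin and Solomon pain intensity formulation); the weights and thresholds below are placeholders, not the values used in the study.

```python
import numpy as np

# Hypothetical per-AU weights for a pain score; pain-related AUs get
# larger weights than AUs that mostly encode non-pain expressions.
AU_WEIGHTS = {
    "AU04": 1.0,   # brow lowerer
    "AU06": 0.8,   # cheek raiser
    "AU07": 0.8,   # lid tightener
    "AU09": 0.9,   # nose wrinkler
    "AU10": 0.9,   # upper lip raiser
    "AU43": 0.7,   # eyes closed
    "AU12": 0.1,   # lip corner puller (smiling, usually not pain)
}

def pain_score(au_intensities: dict) -> float:
    """Weighted sum of AU intensities (each on a 0-5 scale)."""
    return sum(AU_WEIGHTS.get(au, 0.0) * v for au, v in au_intensities.items())

def pain_level(score: float, thresholds=(1.0, 3.0, 6.0)) -> int:
    """Map the continuous score to discrete pain levels 0-3."""
    return int(np.searchsorted(thresholds, score, side="right"))

frame_aus = {"AU04": 2.5, "AU06": 1.5, "AU07": 2.0, "AU12": 0.2}
print(pain_level(pain_score(frame_aus)))
```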

24 pages, 2289 KiB  
Article
A Non-Invasive Approach for Facial Action Unit Extraction and Its Application in Pain Detection
by Mondher Bouazizi, Kevin Feghoul, Shengze Wang, Yue Yin and Tomoaki Ohtsuki
Bioengineering 2025, 12(2), 195; https://doi.org/10.3390/bioengineering12020195 - 17 Feb 2025
Cited by 1 | Viewed by 1682
Abstract
A significant challenge that hinders advancements in medical research is the sensitive and confidential nature of patient data in available datasets. In particular, sharing patients’ facial images poses considerable privacy risks, especially with the rise of generative artificial intelligence (AI), which could misuse such data if accessed by unauthorized parties. However, facial expressions are a valuable source of information for doctors and researchers, which creates a need for methods to derive them without compromising patient privacy or safety by exposing identifiable facial images. To address this, we present a quick, computationally efficient method for detecting action units (AUs) and their intensities—key indicators of health and emotion—using only 3D facial landmarks. Our proposed framework extracts 3D face landmarks from video recordings and employs a lightweight neural network (NN) to identify AUs and estimate AU intensities based on these landmarks. Our proposed method reaches a 79.25% F1-score in AU detection for the main AUs and a Root Mean Square Error (RMSE) of 0.66 in AU intensity estimation. This performance shows that it is possible for researchers to share 3D landmarks, which are far less intrusive than facial images, instead of the images themselves while maintaining high accuracy in AU detection. Moreover, to showcase the usefulness of our AU detection model, we used the detected AUs and estimated intensities to train state-of-the-art Deep Learning (DL) models to detect pain. Our method reaches 91.16% accuracy in pain detection, which is not far behind the 93.14% accuracy obtained when employing a convolutional neural network (CNN) with residual blocks trained on actual images and the 92.11% accuracy obtained when employing all the ground-truth AUs. Full article
(This article belongs to the Section Biosignal Processing)
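A minimal sketch of the landmark-to-AU mapping, assuming a small multilayer perceptron over flattened 3D landmarks; the layer sizes, landmark count, and AU count are illustrative and not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

NUM_LANDMARKS = 478   # e.g., MediaPipe face mesh with iris refinement
NUM_AUS = 12

class LandmarkAUNet(nn.Module):
    """Lightweight MLP: flattened 3D landmarks -> per-AU probabilities."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_LANDMARKS * 3, 256),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(256, 64),
            nn.ReLU(),
            nn.Linear(64, NUM_AUS),
        )

    def forward(self, landmarks: torch.Tensor) -> torch.Tensor:
        # landmarks: (batch, NUM_LANDMARKS, 3), already normalized.
        return torch.sigmoid(self.net(landmarks.flatten(1)))

model = LandmarkAUNet()
probs = model(torch.rand(4, NUM_LANDMARKS, 3))   # (4, 12) AU probabilities
loss = nn.BCELoss()(probs, torch.randint(0, 2, (4, NUM_AUS)).float())
loss.backward()
```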

25 pages, 3228 KiB  
Article
Microexpression Recognition Method Based on ADP-DSTN Feature Fusion and Convolutional Block Attention Module
by Junfang Song, Shanzhong Lei and Wenzhe Wu
Electronics 2024, 13(20), 4012; https://doi.org/10.3390/electronics13204012 - 12 Oct 2024
Viewed by 1507
Abstract
Microexpressions are subtle facial movements that occur within an extremely brief time frame, often revealing suppressed emotions. These expressions hold significant importance across various fields, including security monitoring and human–computer interaction. However, the accuracy of microexpression recognition is severely constrained by the inherent characteristics of these expressions. To address the issue of low detection accuracy regarding the subtle features present in microexpressions’ facial action units, this paper proposes a microexpression action unit detection algorithm, Attention-embedded Dual Path and Shallow Three-stream Networks (ADP-DSTN), that incorporates an attention-embedded dual path and a shallow three-stream network. First, an attention mechanism was embedded after each Bottleneck layer in the foundational Dual Path Networks to extract static features representing subtle texture variations that have significant weights in the action units. Subsequently, a shallow three-stream 3D convolutional neural network was employed to extract optical flow features that were particularly sensitive to temporal and discriminative characteristics specific to microexpression action units. Finally, the acquired static facial feature vectors and optical flow feature vectors were concatenated to form a fused feature vector that encompassed more effective information for recognition. Each facial action unit was then trained individually to address the issue of weak correlations among the facial action units, thereby facilitating the classification of microexpression emotions. The experimental results demonstrated that the proposed method achieved great performance across several microexpression datasets. The unweighted average recall (UAR) values were 80.71%, 89.55%, 44.64%, 80.59%, and 88.32% for the SAMM, CASME II, CAS(ME)³, SMIC, and MEGC2019 datasets, respectively. The unweighted F1 scores (UF1) were 79.32%, 88.30%, 43.03%, 81.12%, and 88.95%, respectively. Furthermore, when compared to the benchmark model, our proposed model achieved better performance with lower computational complexity, characterized by a Floating Point Operations (FLOPs) value of 1087.350 M and a total of 6.356 × 10⁶ model parameters. Full article
(This article belongs to the Section Artificial Intelligence)
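The fusion stage can be sketched as concatenating a static-appearance feature vector with an optical-flow feature vector and attaching one small binary head per AU, mirroring the per-AU training described above; all dimensions below are placeholders rather than the paper's settings.

```python
import torch
import torch.nn as nn

class FusionAUHeads(nn.Module):
    """Concatenate static and optical-flow features; one binary head per AU."""
    def __init__(self, static_dim=512, flow_dim=256, num_aus=9):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(static_dim + flow_dim, 64),
                          nn.ReLU(),
                          nn.Linear(64, 1))
            for _ in range(num_aus)
        )

    def forward(self, static_feat, flow_feat):
        fused = torch.cat([static_feat, flow_feat], dim=1)
        # Each AU gets its own classifier, reflecting weak inter-AU correlation.
        return torch.cat([torch.sigmoid(h(fused)) for h in self.heads], dim=1)

model = FusionAUHeads()
au_probs = model(torch.rand(2, 512), torch.rand(2, 256))   # (2, 9)
print(au_probs.shape)
```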

22 pages, 1736 KiB  
Article
Demystifying Mental Health by Decoding Facial Action Unit Sequences
by Deepika Sharma, Jaiteg Singh, Sukhjit Singh Sehra and Sumeet Kaur Sehra
Big Data Cogn. Comput. 2024, 8(7), 78; https://doi.org/10.3390/bdcc8070078 - 9 Jul 2024
Viewed by 4081
Abstract
Mental health is indispensable for effective daily functioning and stress management. Facial expressions may provide vital clues about the mental state of a person as they are universally consistent across cultures. This study intends to detect the emotional variances through facial micro-expressions using facial action units (AUs) to identify probable mental health issues. In addition, convolutional neural networks (CNNs) were used to detect and classify the micro-expressions. Further, combinations of AUs were identified for the segmentation of micro-expression classes using K-means square. Two benchmark datasets, CASME II and SAMM, were employed for the training and evaluation of the model. The model achieved accuracies of 95.62% on the CASME II dataset and 93.21% on the SAMM dataset. Subsequently, a case analysis was conducted to identify depressive patients using the proposed framework, and it attained an accuracy of 92.99%. This experiment revealed that emotions such as disgust, sadness, anger, and surprise are the prominent emotions experienced by depressive patients during communication. The findings suggest that leveraging facial action units for micro-expression detection offers a promising approach to mental health diagnostics. Full article
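The AU-combination step can be illustrated with scikit-learn's KMeans applied to binary AU activation vectors, so frames sharing an AU combination fall into the same cluster; the AU set, sample vectors, and cluster count are assumptions for the example, not values from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

# Rows are samples, columns are binary activations for a fixed AU set,
# e.g., [AU1, AU2, AU4, AU6, AU7, AU12, AU14].
au_vectors = np.array([
    [1, 1, 0, 0, 0, 0, 0],   # brow raise -> surprise-like
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],   # brow lower + lid tighten -> anger-like
    [0, 0, 0, 1, 0, 1, 0],   # cheek raise + lip corner pull -> happiness-like
    [0, 0, 0, 0, 0, 0, 1],   # dimpler -> contempt-like
])

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(au_vectors)
print(kmeans.labels_)           # cluster index per sample
print(kmeans.cluster_centers_)  # dominant AU combination per cluster
```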

16 pages, 5449 KiB  
Article
Explainable Depression Detection Based on Facial Expression Using LSTM on Attentional Intermediate Feature Fusion with Label Smoothing
by Yanisa Mahayossanunt, Natawut Nupairoj, Solaphat Hemrungrojn and Peerapon Vateekul
Sensors 2023, 23(23), 9402; https://doi.org/10.3390/s23239402 - 25 Nov 2023
Cited by 7 | Viewed by 5890
Abstract
Machine learning is used for a fast pre-diagnosis approach to prevent the effects of Major Depressive Disorder (MDD). The objective of this research is to detect depression using a set of important facial features extracted from interview video, e.g., radians, gaze angles, action unit intensity, etc. The model is based on LSTM with an attention mechanism. It aims to combine those features using the intermediate fusion approach. Label smoothing was applied to further improve the model’s performance. Unlike other black-box models, the integrated gradient was used as the model explanation to show the important features of each patient. The experiment was conducted on 474 video samples collected at Chulalongkorn University. The dataset was divided into 134 depressed and 340 non-depressed samples. The results showed that our model performed best, with an 88.89% F1-score, 87.03% recall, 91.67% accuracy, and 91.40% precision. Moreover, the model can capture important features of depression, including head turning, no specific gaze, slow eye movement, no smiles, frowning, grumbling, and scowling, which express a lack of concentration, social disinterest, and negative feelings that are consistent with the assumptions in the depressive theories. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
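A minimal sketch of the sequence model, assuming an LSTM over per-frame feature vectors trained with label smoothing (PyTorch's CrossEntropyLoss exposes this directly); feature dimensions are placeholders, and the paper's attention and intermediate-fusion components are omitted.

```python
import torch
import torch.nn as nn

class DepressionLSTM(nn.Module):
    """LSTM over per-frame features (gaze, head pose, AU intensities, ...)."""
    def __init__(self, feat_dim=49, hidden=128, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, x):                    # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)
        return self.classifier(out[:, -1])   # logits from the last time step

model = DepressionLSTM()
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # soften one-hot targets

logits = model(torch.rand(8, 300, 49))                # 300 frames per clip
loss = criterion(logits, torch.randint(0, 2, (8,)))
loss.backward()
```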

19 pages, 1688 KiB  
Article
Electromyographic Validation of Spontaneous Facial Mimicry Detection Using Automated Facial Action Coding
by Chun-Ting Hsu and Wataru Sato
Sensors 2023, 23(22), 9076; https://doi.org/10.3390/s23229076 - 9 Nov 2023
Cited by 7 | Viewed by 2446
Abstract
Although electromyography (EMG) remains the standard, researchers have begun using automated facial action coding system (FACS) software to evaluate spontaneous facial mimicry despite the lack of evidence of its validity. Using the facial EMG of the zygomaticus major (ZM) as a standard, we confirmed the detection of spontaneous facial mimicry in action unit 12 (AU12, lip corner puller) via an automated FACS. Participants were alternately presented with real-time model performance and prerecorded videos of dynamic facial expressions, while simultaneous ZM signal and frontal facial videos were acquired. Facial videos were estimated for AU12 using FaceReader, Py-Feat, and OpenFace. The automated FACS is less sensitive and less accurate than facial EMG, but AU12 mimicking responses were significantly correlated with ZM responses. All three software programs detected enhanced facial mimicry by live performances. The AU12 time series showed a roughly 100 to 300 ms latency relative to the ZM. Our results suggested that while the automated FACS could not replace facial EMG in mimicry detection, it could serve a purpose for large effect sizes. Researchers should be cautious with the automated FACS outputs, especially when studying clinical populations. In addition, developers should consider the EMG validation of AU estimation as a benchmark. Full article
(This article belongs to the Special Issue Advanced-Sensors-Based Emotion Sensing and Recognition)
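The reported 100 to 300 ms latency can be estimated by cross-correlating the AU12 time series with the (resampled) ZM EMG signal and taking the lag of maximum correlation; the sketch below uses synthetic signals and an assumed 30 Hz frame rate, not the study's recordings.

```python
import numpy as np
from scipy import signal

def estimate_lag(au12: np.ndarray, zm: np.ndarray, fs: float) -> float:
    """Return the lag (in seconds) of au12 relative to zm at sampling rate fs.
    Both series must already be resampled to the same rate."""
    au12 = (au12 - au12.mean()) / au12.std()
    zm = (zm - zm.mean()) / zm.std()
    corr = signal.correlate(au12, zm, mode="full")
    lags = signal.correlation_lags(len(au12), len(zm), mode="full")
    return lags[np.argmax(corr)] / fs

# Synthetic check: AU12 is the ZM envelope delayed by 0.2 s at 30 Hz.
fs = 30.0
t = np.arange(0, 10, 1 / fs)
zm = np.sin(2 * np.pi * 0.5 * t)
au12 = np.roll(zm, int(0.2 * fs))
print(estimate_lag(au12, zm, fs))   # ~0.2
```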

17 pages, 2984 KiB  
Article
Emotion Recognition in Individuals with Down Syndrome: A Convolutional Neural Network-Based Algorithm Proposal
by Nancy Paredes, Eduardo Caicedo-Bravo and Bladimir Bacca
Symmetry 2023, 15(7), 1435; https://doi.org/10.3390/sym15071435 - 17 Jul 2023
Cited by 5 | Viewed by 2665
Abstract
This research introduces an algorithm that automatically detects five primary emotions in individuals with Down syndrome: happiness, anger, sadness, surprise, and neutrality. The study was conducted in a specialized institution dedicated to caring for individuals with Down syndrome, which allowed for collecting samples in uncontrolled environments and capturing spontaneous emotions. Collecting samples through facial images strictly followed a protocol approved by certified Ethics Committees in Ecuador and Colombia. The proposed system consists of three convolutional neural networks (CNNs). The first network analyzes facial microexpressions by assessing the intensity of action units associated with each emotion. The second network utilizes transfer learning based on the mini-Xception architecture, using the Dataset-DS, which comprises images collected from individuals with Down syndrome, as the validation dataset. Finally, these two networks are combined in a third CNN to enhance accuracy. The final CNN processes the information, resulting in an accuracy of 85.30% in emotion recognition. In addition, the algorithm was optimized by tuning specific hyperparameters of the network, leading to 91.48% emotion recognition accuracy, specifically for people with Down syndrome. Full article
(This article belongs to the Section Computer)
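The combination stage can be approximated by stacking: concatenate the class-probability vectors from the AU-based network and the mini-Xception-based network and fit a small meta-classifier on top. Logistic regression stands in here for the paper's third CNN, and all data are random stand-ins; real upstream probabilities would carry the signal.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, num_classes = 200, 5   # happiness, anger, sadness, surprise, neutrality

# Stand-ins for the probability outputs of the two upstream CNNs.
probs_au_net = rng.dirichlet(np.ones(num_classes), size=n)
probs_xception = rng.dirichlet(np.ones(num_classes), size=n)
labels = rng.integers(0, num_classes, size=n)

stacked = np.hstack([probs_au_net, probs_xception])   # (n, 10) meta-features
meta = LogisticRegression(max_iter=1000).fit(stacked, labels)
pred = meta.predict(stacked)
print((pred == labels).mean())   # training accuracy of the meta-classifier
```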

18 pages, 4010 KiB  
Article
Processing Real-Life Recordings of Facial Expressions of Polish Sign Language Using Action Units
by Anna Irasiak, Jan Kozak, Adam Piasecki and Tomasz Stęclik
Entropy 2023, 25(1), 120; https://doi.org/10.3390/e25010120 - 6 Jan 2023
Cited by 2 | Viewed by 2489
Abstract
Automatic translation between the national language and sign language is a complex process similar to translation between two different foreign languages. A very important aspect is the precision of not only manual gestures but also facial expressions, which are extremely important in the overall context of a sentence. In this article, we present the problem of including facial expressions in the automation of Polish-to-Polish Sign Language (PJM) translation—this is part of an ongoing project related to a comprehensive solution allowing for the animation of manual gestures, body movements and facial expressions. Our approach explores the possibility of using action unit (AU) recognition in the automatic annotation of recordings, which in the subsequent steps will be used to train machine learning models. This paper aims to evaluate entropy in real-life translation recordings and analyze the data associated with the detected action units. Our approach has been subjected to evaluation by experts related to Polish Sign Language, and the results obtained allow for the development of further work related to automatic translation into Polish Sign Language. Full article
(This article belongs to the Special Issue Entropy in Real-World Datasets and Its Impact on Machine Learning)
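The entropy analysis can be illustrated by computing the Shannon entropy of the distribution of detected AUs in a recording; the AU counts below are invented for the example.

```python
import numpy as np
from scipy.stats import entropy

# Hypothetical counts of how often each AU was detected in one recording.
au_counts = {
    "AU01": 120, "AU02": 95, "AU04": 40, "AU12": 210,
    "AU15": 15, "AU25": 300, "AU26": 180,
}

counts = np.array(list(au_counts.values()), dtype=float)
p = counts / counts.sum()                 # empirical AU distribution
h = entropy(p, base=2)                    # Shannon entropy in bits
h_max = np.log2(len(counts))              # uniform-distribution upper bound
print(f"entropy = {h:.2f} bits (max {h_max:.2f})")
```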

43 pages, 9806 KiB  
Article
ADABase: A Multimodal Dataset for Cognitive Load Estimation
by Maximilian P. Oppelt, Andreas Foltyn, Jessica Deuschel, Nadine R. Lang, Nina Holzer, Bjoern M. Eskofier and Seung Hee Yang
Sensors 2023, 23(1), 340; https://doi.org/10.3390/s23010340 - 28 Dec 2022
Cited by 15 | Viewed by 10292
Abstract
Driver monitoring systems play an important role in lower to mid-level autonomous vehicles. Our work focuses on the detection of cognitive load as a component of driver-state estimation to improve traffic safety. By inducing single and dual-task workloads of increasing intensity on 51 subjects while continuously measuring signals from multiple modalities (physiological measurements such as ECG, EDA, EMG, PPG, respiration rate, skin temperature, and eye tracker data; behavioral measurements such as action units extracted from facial videos; performance metrics such as reaction time; and subjective feedback from questionnaires), we create ADABase (Autonomous Driving Cognitive Load Assessment Database). As a reference method to induce cognitive load onto subjects, we use the well-established n-back test, in addition to our novel simulator-based k-drive test, motivated by real-world semi-autonomous vehicles. We extract expert features from all measurements and find significant changes in multiple modalities. Ultimately, we train and evaluate machine learning algorithms using single and multimodal inputs to distinguish cognitive load levels. We carefully evaluate model behavior and study feature importance. In summary, we introduce a novel cognitive load test, create a cognitive load database, validate changes using statistical tests, introduce novel classification and regression tasks for machine learning, and train and evaluate machine learning models. Full article
(This article belongs to the Special Issue Sensor Based Multi-Modal Emotion Recognition)
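Training on the extracted expert features can be sketched with a standard pipeline that concatenates per-window features from several modalities and cross-validates a classifier against the induced load level; feature dimensions and values below are random placeholders, not ADABase data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_windows = 500

# Stand-ins for per-window expert features from several modalities.
ecg_feats = rng.normal(size=(n_windows, 8))    # e.g., heart-rate variability
eda_feats = rng.normal(size=(n_windows, 4))    # e.g., SCR amplitude/counts
au_feats = rng.normal(size=(n_windows, 17))    # facial action unit statistics
X = np.hstack([ecg_feats, eda_feats, au_feats])
y = rng.integers(0, 3, size=n_windows)         # low / medium / high load

clf = GradientBoostingClassifier()
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())   # chance level here, since the features are random
```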

15 pages, 1584 KiB  
Article
AU-Guided Unsupervised Domain-Adaptive Facial Expression Recognition
by Xiaojiang Peng, Yuxin Gu and Panpan Zhang
Appl. Sci. 2022, 12(9), 4366; https://doi.org/10.3390/app12094366 - 26 Apr 2022
Cited by 10 | Viewed by 2548
Abstract
Domain diversities, including inconsistent annotation and varied image collection conditions, inevitably exist among different facial expression recognition (FER) datasets, posing an evident challenge for adapting FER models trained on one dataset to another one. Recent works mainly focus on domain-invariant deep feature learning with adversarial learning mechanisms, ignoring the sibling facial action unit (AU) detection task, which has obtained great progress. Considering that AUs objectively determine facial expressions, this paper proposes an AU-guided unsupervised domain-adaptive FER (AdaFER) framework to relieve the annotation bias between different FER datasets. In AdaFER, we first leverage an advanced model for AU detection on both a source and a target domain. Then, we compare the AU results to perform AU-guided annotating, i.e., target faces that own the same AUs as source faces would inherit the labels from the source domain. Meanwhile, to achieve domain-invariant compact features, we utilize an AU-guided triplet training, which randomly collects anchor–positive–negative triplets on both domains with AUs. We conduct extensive experiments on several popular benchmarks and show that AdaFER achieves state-of-the-art results on all these benchmarks. Full article
(This article belongs to the Special Issue Deep Learning for Facial Expression Analysis)
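The AU-guided triplet training can be sketched with PyTorch's TripletMarginLoss, where a positive shares the anchor's detected binary AU vector and a negative does not; the projection head and feature sizes are placeholders, and the AU-based triplet mining itself is only described in the comments.

```python
import torch
import torch.nn as nn

# Stand-in projector that maps backbone features to a compact embedding.
embed = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 64))
triplet_loss = nn.TripletMarginLoss(margin=1.0)

# Toy triplets: the anchor is a source-domain face, the positive is a target
# face whose detected binary AU vector equals the anchor's, and the negative
# is a target face with a different AU vector (the AU-guided selection step).
feat_anchor = torch.rand(16, 512)
feat_pos = torch.rand(16, 512)
feat_neg = torch.rand(16, 512)

loss = triplet_loss(embed(feat_anchor), embed(feat_pos), embed(feat_neg))
loss.backward()
print(loss.item())
```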

23 pages, 832 KiB  
Article
A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset
by Cristina Luna-Jiménez, Ricardo Kleinlein, David Griol, Zoraida Callejas, Juan M. Montero and Fernando Fernández-Martínez
Appl. Sci. 2022, 12(1), 327; https://doi.org/10.3390/app12010327 - 30 Dec 2021
Cited by 79 | Viewed by 11944
Abstract
Emotion recognition is attracting the attention of the research community due to its multiple applications in different fields, such as medicine or autonomous driving. In this paper, we proposed an automatic emotion recognizer system that consisted of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the whole model by appending a multilayer perceptron on top of it, confirming that the training was more robust when it did not start from scratch and the previous knowledge of the network was similar to the target task. Regarding the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance of static models against sequential models. Results showed that sequential models outperformed static models by a narrow margin. Error analysis reported that the visual systems could improve with a detector of frames with high emotional load, which opened a new line of research to discover new ways to learn from videos. Finally, combining these two modalities with a late fusion strategy, we achieved 86.70% accuracy on the RAVDESS dataset on a subject-wise 5-CV evaluation, classifying eight emotions. Results demonstrated that these modalities carried relevant information to detect users’ emotional state and that their combination improved the final system performance. Full article
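The late-fusion step can be sketched as a weighted average of the class-probability vectors from the speech and facial recognizers over the eight RAVDESS classes; the fusion weight and probability values below are placeholders to be tuned on validation data, not the paper's configuration.

```python
import numpy as np

EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]   # RAVDESS classes

def late_fusion(p_speech: np.ndarray, p_face: np.ndarray, w: float = 0.5):
    """Weighted average of the two modality-wise probability vectors."""
    fused = w * p_speech + (1.0 - w) * p_face
    return EMOTIONS[int(np.argmax(fused))], fused

# Example outputs of the two recognizers for one clip (each sums to 1).
p_speech = np.array([0.05, 0.05, 0.55, 0.05, 0.10, 0.05, 0.05, 0.10])
p_face   = np.array([0.10, 0.05, 0.40, 0.10, 0.15, 0.05, 0.05, 0.10])

label, fused = late_fusion(p_speech, p_face, w=0.6)
print(label, fused.round(2))   # "happy" plus the fused distribution
```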

9 pages, 1354 KiB  
Article
Viewpoint Robustness of Automated Facial Action Unit Detection Systems
by Shushi Namba, Wataru Sato and Sakiko Yoshikawa
Appl. Sci. 2021, 11(23), 11171; https://doi.org/10.3390/app112311171 - 25 Nov 2021
Cited by 10 | Viewed by 3125
Abstract
Automatic facial action detection is important, but no previous studies have evaluated pre-trained models on the accuracy of facial action detection as the angle of the face changes from frontal to profile. Using static facial images obtained at various angles (0°, 15°, 30°, and 45°), we investigated the performance of three automated facial action detection systems (FaceReader, OpenFace, and Py-Feat). The overall performance was best for OpenFace, followed by FaceReader and Py-Feat. The performance of FaceReader significantly decreased at 45° compared to that at other angles, while the performance of Py-Feat did not differ among the four angles. The performance of OpenFace decreased as the target face turned sideways. Prediction accuracy and robustness to angle changes varied with the target facial components and action detection system. Full article
(This article belongs to the Special Issue Research on Facial Expression Recognition)
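The viewpoint analysis reduces to computing a detection score separately per yaw angle; the sketch below groups synthetic predictions by angle and reports per-angle F1 with scikit-learn, simply to show the bookkeeping, not to reproduce the paper's results.

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)
angles = np.repeat([0, 15, 30, 45], 250)          # yaw angle per image
y_true = rng.integers(0, 2, size=angles.size)     # ground-truth AU present?

# Simulate predictions that get noisier as the head turns away from frontal.
flip = rng.random(angles.size) < (0.05 + angles / 300.0)
y_pred = np.where(flip, 1 - y_true, y_true)

for angle in (0, 15, 30, 45):
    mask = angles == angle
    print(angle, round(f1_score(y_true[mask], y_pred[mask]), 3))
```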

14 pages, 1736 KiB  
Article
Synthesising Facial Macro- and Micro-Expressions Using Reference Guided Style Transfer
by Chuin Hong Yap, Ryan Cunningham, Adrian K. Davison and Moi Hoon Yap
J. Imaging 2021, 7(8), 142; https://doi.org/10.3390/jimaging7080142 - 11 Aug 2021
Cited by 4 | Viewed by 4111
Abstract
Long video datasets of facial macro- and micro-expressions remain in strong demand with the current dominance of data-hungry deep learning methods. There are limited methods of generating long videos which contain micro-expressions. Moreover, there is a lack of performance metrics to quantify the generated data. To address the research gaps, we introduce a new approach to generate synthetic long videos and recommend assessment methods to inspect dataset quality. For synthetic long video generation, we use the state-of-the-art generative adversarial network style transfer method, StarGANv2. Using StarGANv2 pre-trained on the CelebA dataset, we transfer the style of a reference image from SAMM long videos (a facial micro- and macro-expression long video dataset) onto a source image of the FFHQ dataset to generate a synthetic dataset (SAMM-SYNTH). We evaluate SAMM-SYNTH by conducting an analysis based on the facial action units detected by OpenFace. For quantitative measurement, our findings show high correlation between the original and synthetic data on two Action Units (AUs), AU12 and AU6, with Pearson’s correlations of 0.74 and 0.72, respectively. This is further supported by the evaluation method proposed by OpenFace on those AUs, which also yields high scores of 0.85 and 0.59. Additionally, optical flow is used to visually compare the original facial movements and the transferred facial movements. With this article, we publish our dataset to enable future research and to increase the data pool of micro-expression research, especially in the spotting task. Full article
(This article belongs to the Special Issue Imaging Studies for Face and Gesture Analysis)
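The quantitative check can be outlined with SciPy: compute Pearson's correlation between the AU intensity traces extracted (e.g., by OpenFace) from the original and synthetic videos; the arrays below are synthetic stand-ins for those traces.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(7)
frames = 500

# Stand-ins for AU12 intensity traces of the original long video and the
# style-transferred (synthetic) video, clipped to non-negative intensities.
au12_original = np.clip(
    np.sin(np.linspace(0, 20, frames)) + 0.1 * rng.normal(size=frames), 0, None)
au12_synthetic = np.clip(
    au12_original + 0.3 * rng.normal(size=frames), 0, None)

r, p_value = pearsonr(au12_original, au12_synthetic)
print(f"AU12 Pearson r = {r:.2f} (p = {p_value:.1e})")
```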
