Search Results (10)

Search Parameters:
Keywords = facial pose and expression transfer

22 pages, 23958 KB  
Article
A Lightweight Dual-Stream Network with an Adaptive Strategy for Efficient Micro-Expression Recognition
by Xinyu Liu, Ju Zhou, Feng Chen, Shigang Li, Hanpu Wang, Yingjuan Jia and Yuhao Shan
Sensors 2025, 25(9), 2866; https://doi.org/10.3390/s25092866 - 1 May 2025
Cited by 1 | Viewed by 1237
Abstract
Micro-expressions (MEs), characterized by their brief duration and subtle facial muscle movements, pose significant challenges for accurate recognition. These ultra-fast signals, typically captured by high-speed vision sensors, require specialized computational methods to extract spatio-temporal features effectively. In this study, we propose a lightweight dual-stream network with an adaptive strategy for efficient ME recognition. Firstly, a motion magnification network based on transfer learning is employed to magnify the motion states of facial muscles in MEs. This process can generate additional samples, thereby expanding the training set. To effectively capture the dynamic changes of facial muscles, dense optical flow is extracted from the onset frame and the magnified apex frame, thereby obtaining magnified dense optical flow (MDOF). Subsequently, we design a dual-stream spatio-temporal network (DSTNet), using the magnified apex frame and MDOF as inputs for the spatial and temporal streams, respectively. An adaptive strategy that dynamically adjusts the magnification factor based on the top-1 confidence is introduced to enhance the robustness of DSTNet. Experimental results show that our proposed method outperforms existing methods in terms of F1-score on the SMIC, CASME II, SAMM, and composite dataset, as well as in cross-dataset tasks. Adaptive DSTNet significantly enhances the handling of sample imbalance while demonstrating robustness and featuring a lightweight design, indicating strong potential for future edge sensor deployment.
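To make the MDOF idea concrete, the sketch below (not the authors' code) computes dense optical flow between an onset frame and an apex frame with OpenCV's Farneback method and feeds the apex frame plus the flow into a toy two-stream classifier. The learned motion-magnification network and the confidence-driven magnification factor from the paper are omitted, and the layer sizes and class count are illustrative assumptions.

```python
# Minimal sketch: dense optical flow (onset -> apex) as the temporal stream,
# apex frame as the spatial stream, fused for classification.
import cv2
import numpy as np
import torch
import torch.nn as nn

def dense_flow(onset_gray: np.ndarray, apex_gray: np.ndarray) -> np.ndarray:
    """Return an HxWx2 dense optical flow field from the onset to the apex frame."""
    return cv2.calcOpticalFlowFarneback(onset_gray, apex_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)

class DualStreamNet(nn.Module):
    """Toy spatial/temporal two-stream classifier."""
    def __init__(self, num_classes: int = 3):
        super().__init__()
        def stream(in_ch):
            # tiny convolutional encoder for one stream
            return nn.Sequential(nn.Conv2d(in_ch, 16, 3, 2, 1), nn.ReLU(),
                                 nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.spatial = stream(1)    # magnified apex frame (grayscale)
        self.temporal = stream(2)   # 2-channel dense optical flow
        self.head = nn.Linear(64, num_classes)

    def forward(self, apex, flow):
        # concatenate the two 32-dim stream embeddings and classify
        return self.head(torch.cat([self.spatial(apex), self.temporal(flow)], dim=1))
```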

22 pages, 4938 KB  
Article
Transfer Learning for Facial Expression Recognition
by Rajesh Kumar, Giacomo Corvisieri, Tullio Flavio Fici, Syed Ibrar Hussain, Domenico Tegolo and Cesare Valenti
Information 2025, 16(4), 320; https://doi.org/10.3390/info16040320 - 17 Apr 2025
Cited by 7 | Viewed by 4592
Abstract
Facial expressions reflect psychological states and are crucial for understanding human emotions. Traditional facial expression recognition methods face challenges in real-world healthcare applications due to variations in facial structure, lighting conditions, and occlusion. We present a methodology based on transfer learning with the pre-trained models VGG-19 and ResNet-152, and we highlight dataset-specific preprocessing techniques that include resizing images to 124 × 124 pixels, augmenting the data, and selectively freezing layers to enhance the robustness of the model. This study explores the application of deep learning-based facial expression recognition in healthcare, particularly for remote patient monitoring and telemedicine, where accurate facial expression recognition can enhance patient assessment and early diagnosis of psychological conditions such as depression and anxiety. The proposed method achieved an average accuracy of 0.98 on the CK+ dataset, demonstrating its effectiveness in controlled environments. However, performance varied across datasets, with accuracy rates of 0.44 on FER2013 and 0.89 on JAFFE, reflecting the challenges posed by noisy and diverse data. Our findings emphasize the potential of deep learning-based facial expression recognition in healthcare applications while underscoring the importance of dataset-specific model optimization to improve generalization across different data distributions. This research contributes to the advancement of automated facial expression recognition in telemedicine, supporting enhanced doctor–patient communication and improving patient care.
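A minimal sketch of the transfer-learning setup described above, assuming torchvision's pre-trained VGG-19. Which layers are frozen, the 7-class head, and the augmentation choices are illustrative, not the paper's exact configuration.

```python
# Minimal sketch: freeze the convolutional backbone, replace the final dense layer,
# and preprocess inputs at 124x124 with light augmentation.
import torch.nn as nn
from torchvision import models, transforms

model = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
for p in model.features.parameters():       # selectively freeze the convolutional layers
    p.requires_grad = False
model.classifier[6] = nn.Linear(4096, 7)    # new output layer for 7 expression classes

preprocess = transforms.Compose([
    transforms.Resize((124, 124)),          # dataset-specific resizing
    transforms.RandomHorizontalFlip(),      # simple augmentation
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
```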

21 pages, 1358 KB  
Article
A 3D Face Recognition Algorithm Directly Applied to Point Clouds
by Xingyi You and Xiaohu Zhao
Biomimetics 2025, 10(2), 70; https://doi.org/10.3390/biomimetics10020070 - 23 Jan 2025
Cited by 1 | Viewed by 2575
Abstract
Face recognition technology, despite its widespread use in various applications, still faces challenges related to occlusions, pose variations, and expression changes. Three-dimensional face recognition with depth information, particularly using point cloud-based networks, has shown effectiveness in overcoming these challenges. However, due to the scarcity of large-scale 3D facial data and the non-rigid nature of facial structures, extracting distinct facial representations directly from point clouds remains challenging. To address this, our research proposes two key approaches. Firstly, we introduce a learning framework, guided by a small amount of real face data, that is based on morphable models with Gaussian processes; this framework uses a novel method for generating large-scale virtual face scans, addressing the scarcity of 3D data. Secondly, we present a dual-branch network, built on kernel point convolution (KPConv), that directly extracts non-rigid facial features from point clouds. A local neighborhood adaptive feature learning module employs context sampling to hierarchically downsample the feature-sensitive points that are critical for the deep transfer and aggregation of discriminative facial features. Notably, our training strategy combines large-scale face scanning data with 967 real face scans from the FRGC v2.0 subset, demonstrating the effectiveness of guiding the model with a small amount of real data. Experiments on the FRGC v2.0 and Bosphorus datasets demonstrate the effectiveness and potential of our method.
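As a rough illustration of recognition performed directly on point clouds, the sketch below uses a PointNet-style shared MLP with max pooling as a stand-in for the paper's KPConv-based dual-branch network; the embedding size, toy data, and cosine-similarity matching are assumptions.

```python
# Minimal sketch: encode an N x 3 face scan into an identity embedding and match
# a probe against a gallery by cosine similarity (PointNet-style stand-in, not KPConv).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointFaceEncoder(nn.Module):
    """Per-point MLP followed by max pooling into a global embedding."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, 128), nn.ReLU(),
                                 nn.Linear(128, embed_dim))

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) -> per-point features -> L2-normalized global embedding
        feats = self.mlp(points)
        return F.normalize(feats.max(dim=1).values, dim=1)

encoder = PointFaceEncoder()
gallery = torch.randn(10, 2048, 3)            # 10 enrolled face scans (toy data)
probe = torch.randn(1, 2048, 3)               # one probe scan
scores = encoder(probe) @ encoder(gallery).T  # (1, 10) cosine similarity scores
predicted_identity = scores.argmax(dim=1)
```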
(This article belongs to the Special Issue Exploration of Bioinspired Computer Vision and Pattern Recognition)

30 pages, 43651 KB  
Article
AmazingFS: A High-Fidelity and Occlusion-Resistant Video Face-Swapping Framework
by Zhiqiang Zeng, Wenhua Shao, Dingli Tong and Li Liu
Electronics 2024, 13(15), 2986; https://doi.org/10.3390/electronics13152986 - 29 Jul 2024
Viewed by 5448
Abstract
Current video face-swapping technologies face challenges such as poor facial fitting and the inability to handle obstructions. This paper introduces Amazing FaceSwap (AmazingFS), a novel framework for producing cinematic-quality, realistic face swaps. Key innovations include a Source-Target Attention Mechanism (STAM) that improves face-swap quality while preserving the target face's expressions and poses. We also enhanced the AdaIN style transfer module to better retain the identity features of the source face. To address obstructions such as hair and glasses during face-swap synthesis, we created the AmazingSeg network and a small dataset, AST. Extensive qualitative and quantitative experiments demonstrate that AmazingFS significantly outperforms other SOTA networks, achieving remarkable face-swap results.
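The AdaIN operation that the framework builds on can be written in a few lines; this is the generic formulation (re-normalizing content features to match style statistics), not the authors' enhanced module.

```python
# Minimal sketch of adaptive instance normalization (AdaIN).
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """content, style: (B, C, H, W). Shift/scale content features so their per-channel
    mean and std match those of the style features."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean
```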
(This article belongs to the Section Artificial Intelligence)

14 pages, 437 KB  
Article
Cross-Domain Facial Expression Recognition through Reliable Global–Local Representation Learning and Dynamic Label Weighting
by Yuefang Gao, Yiteng Cai, Xuanming Bi, Bizheng Li, Shunpeng Li and Weiping Zheng
Electronics 2023, 12(21), 4553; https://doi.org/10.3390/electronics12214553 - 6 Nov 2023
Cited by 7 | Viewed by 2904
Abstract
Cross-Domain Facial Expression Recognition (CD-FER) aims to develop a facial expression recognition model that can be trained in one domain and deliver consistent performance in another. CD-FER poses significant challenges due to changes in marginal and class distributions between source and target domains. Existing methods primarily emphasize achieving domain-invariant features through global feature adaptation, often neglecting the potential benefits of transferable local features across different domains. To address this issue, we propose a novel framework for CD-FER that combines reliable global–local representation learning and dynamic label weighting. Our framework incorporates two key modules: the Pseudo-Complementary Label Generation (PCLG) module, which leverages pseudo-labels and complementary labels obtained using a credibility threshold to learn domain-invariant global and local features, and the Label Dynamic Weight Matching (LDWM) module, which assesses the learning difficulty of each category and adaptively assigns corresponding label weights, thereby enhancing the classification performance in the target domain. We evaluate our approach through extensive experiments and analyses on multiple public datasets, including RAF-DB, FER2013, CK+, JAFFE, SFW2.0, and ExpW. The experimental results demonstrate that our proposed model outperforms state-of-the-art methods, with an average accuracy improvement of 3.5% across the five datasets.
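A minimal sketch of the two ideas at play, thresholded pseudo-labels and dynamic class weights; the threshold value, the accuracy-based weighting rule, and the 7-class setup are illustrative assumptions rather than the paper's PCLG/LDWM formulation.

```python
# Minimal sketch: keep only confident pseudo-labels on target-domain samples and
# weight the loss so harder categories contribute more.
import torch
import torch.nn.functional as F

def pseudo_labels(logits: torch.Tensor, threshold: float = 0.9):
    """Keep predictions whose top-1 probability exceeds a credibility threshold."""
    probs = logits.softmax(dim=1)
    conf, labels = probs.max(dim=1)
    mask = conf > threshold
    return labels[mask], mask

def dynamic_class_weights(per_class_acc: torch.Tensor) -> torch.Tensor:
    """Assign larger weights to categories the model currently finds harder."""
    weights = 1.0 - per_class_acc                # low accuracy -> high weight
    return weights / weights.sum() * len(weights)

# Toy usage on unlabeled target-domain logits:
logits = torch.randn(32, 7)
labels, mask = pseudo_labels(logits)
weights = dynamic_class_weights(torch.tensor([0.9, 0.6, 0.8, 0.5, 0.7, 0.85, 0.4]))
if mask.any():
    loss = F.cross_entropy(logits[mask], labels, weight=weights)
```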
(This article belongs to the Section Artificial Intelligence)

16 pages, 932 KB  
Article
Synthesizing Human Activity for Data Generation
by Ana Romero, Pedro Carvalho, Luís Côrte-Real and Américo Pereira
J. Imaging 2023, 9(10), 204; https://doi.org/10.3390/jimaging9100204 - 29 Sep 2023
Cited by 4 | Viewed by 2264
Abstract
Gathering sufficiently representative data, such as data on human actions, shapes, and facial expressions, is costly and time-consuming, yet such data are required to train robust models. This has led to techniques such as transfer learning and data augmentation. However, these are often insufficient. To address this, we propose a semi-automated mechanism that allows the generation and editing of visual scenes with synthetic humans performing various actions, with features such as background modification and manual adjustment of the 3D avatars, allowing users to create data with greater variability. We also propose a two-fold evaluation methodology for assessing the results obtained with our method: (i) running an action classifier on the output data produced by the mechanism and (ii) generating masks of the avatars and the actors and comparing them through segmentation. The avatars were robust to occlusion, and their actions were recognizable and faithful to their respective input actors. The results also showed that even though the action classifier concentrates on the pose and movement of the synthetic humans, it strongly depends on contextual information to precisely recognize the actions. Generating avatars for complex activities also proved problematic both for action recognition and for the clean and precise formation of the masks.
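One simple way to realize the mask-comparison part of the evaluation is an intersection-over-union score between the avatar's and the actor's segmentation masks; the sketch below is a generic IoU helper, not the authors' evaluation code.

```python
# Minimal sketch: IoU between an avatar mask and the corresponding actor mask.
import numpy as np

def mask_iou(avatar_mask: np.ndarray, actor_mask: np.ndarray) -> float:
    """Both masks are HxW boolean (or 0/1) arrays; returns IoU in [0, 1]."""
    a, b = avatar_mask.astype(bool), actor_mask.astype(bool)
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / union if union else 1.0
```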
(This article belongs to the Special Issue Machine Learning for Human Activity Recognition)

14 pages, 5005 KB  
Article
Facial Pose and Expression Transfer Based on Classification Features
by Zhiyi Cao, Lei Shi, Wei Wang and Shaozhang Niu
Electronics 2023, 12(8), 1756; https://doi.org/10.3390/electronics12081756 - 7 Apr 2023
Cited by 4 | Viewed by 3233
Abstract
Transferring facial pose and expression features from one face to another is a challenging problem and an interesting topic in pattern recognition, and one of great importance with many applications. However, existing models usually learn to transfer pose and expression features with classification labels, which cannot capture all the differences in shape and size between conditional faces and source faces. To solve this problem, we propose a generative adversarial network model based on classification features for facial pose and facial expression transfer. We first construct a two-stage classifier to capture the high-dimensional classification features of each face. Then, the proposed generation model transfers pose and expression features using these classification features. In addition, we successfully combined two cost functions with different convergence speeds to learn pose and expression features. Compared to state-of-the-art models, the proposed model achieved leading scores for facial pose and expression transfer on two datasets.
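A minimal sketch of combining two objectives with different convergence speeds, here an adversarial term plus an L1 match on classification features; the specific losses and the weight lambda_feat are assumptions, not the paper's exact cost functions.

```python
# Minimal sketch: weight the slower-converging feature term so both objectives
# keep contributing throughout generator training.
import torch
import torch.nn.functional as F

def combined_generator_loss(disc_fake_logits: torch.Tensor,
                            generated_class_feats: torch.Tensor,
                            target_class_feats: torch.Tensor,
                            lambda_feat: float = 10.0) -> torch.Tensor:
    # adversarial term: push the discriminator to label generated faces as real
    adv = F.binary_cross_entropy_with_logits(
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    # feature term: match the classification features of the generated face
    feat = F.l1_loss(generated_class_feats, target_class_feats)
    return adv + lambda_feat * feat
```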

19 pages, 6036 KB  
Article
Facial Emotion Recognition Using Transfer Learning in the Deep CNN
by M. A. H. Akhand, Shuvendu Roy, Nazmul Siddique, Md Abdus Samad Kamal and Tetsuya Shimamura
Electronics 2021, 10(9), 1036; https://doi.org/10.3390/electronics10091036 - 27 Apr 2021
Cited by 276 | Viewed by 30497
Abstract
Human facial emotion recognition (FER) has attracted the attention of the research community for its promising applications. Mapping different facial expressions to the respective emotional states is the main task in FER. The classical FER consists of two major steps: feature extraction and emotion recognition. Currently, deep neural networks, especially the Convolutional Neural Network (CNN), are widely used in FER by virtue of their inherent ability to extract features from images. Several works have applied CNNs with only a few layers to FER problems. However, standard shallow CNNs with straightforward learning schemes have limited feature extraction capability to capture emotion information from high-resolution images. A notable drawback of most existing methods is that they consider only frontal images (i.e., ignore profile views for convenience), although profile views taken from different angles are important for a practical FER system. For developing a highly accurate FER system, this study proposes very deep CNN (DCNN) modeling through a Transfer Learning (TL) technique, where a pre-trained DCNN model is adopted by replacing its dense upper layer(s) with layers compatible with FER, and the model is fine-tuned with facial emotion data. A novel pipeline strategy is introduced, where training of the dense layer(s) is followed by tuning each of the pre-trained DCNN blocks successively, which gradually improves FER accuracy. The proposed FER system is verified on eight different pre-trained DCNN models (VGG-16, VGG-19, ResNet-18, ResNet-34, ResNet-50, ResNet-152, Inception-v3 and DenseNet-161) and the well-known KDEF and JAFFE facial image datasets. FER is very challenging even for frontal views alone. FER on the KDEF dataset poses further challenges due to the diversity of images with different profile views together with frontal views. The proposed method achieved remarkable accuracy on both datasets with pre-trained models. Under 10-fold cross-validation, the best FER accuracies achieved with DenseNet-161 on the KDEF and JAFFE test sets are 96.51% and 99.52%, respectively. The evaluation results reveal the superiority of the proposed FER system over existing ones regarding emotion detection accuracy. Moreover, the achieved performance on the KDEF dataset with profile views is promising, as it clearly demonstrates the proficiency required for real-life applications.
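A minimal sketch of the pipeline strategy, assuming torchvision's ResNet-50 as the pre-trained DCNN: the dense head is replaced and trained first, then the pre-trained blocks are unfrozen and fine-tuned one at a time; the block grouping and the commented training-loop placeholder are illustrative, not the authors' exact schedule.

```python
# Minimal sketch: replace the dense layer, train it, then successively unfreeze
# and fine-tune the pre-trained blocks from the last block backwards.
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 7)   # new dense layer for 7 emotion classes

for p in model.parameters():                    # stage 0: freeze the whole backbone
    p.requires_grad = False
for p in model.fc.parameters():                 # ...and train only the dense head
    p.requires_grad = True
# train_for_a_few_epochs(model)                 # placeholder for your training loop

# stages 1..4: unfreeze the pre-trained blocks one at a time, last block first,
# fine-tuning after each unfreezing step
for block in (model.layer4, model.layer3, model.layer2, model.layer1):
    for p in block.parameters():
        p.requires_grad = True
    # train_for_a_few_epochs(model)
```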
(This article belongs to the Special Issue Deep Learning Technologies for Machine Vision and Audition)

16 pages, 6966 KB  
Article
Adaptive 3D Model-Based Facial Expression Synthesis and Pose Frontalization
by Yu-Jin Hong, Sung Eun Choi, Gi Pyo Nam, Heeseung Choi, Junghyun Cho and Ig-Jae Kim
Sensors 2020, 20(9), 2578; https://doi.org/10.3390/s20092578 - 1 May 2020
Cited by 4 | Viewed by 7529
Abstract
Facial expressions are one of the important non-verbal ways used to understand human emotions during communication. Thus, acquiring and reproducing facial expressions is helpful in analyzing human emotional states. However, owing to complex and subtle facial muscle movements, facial expression modeling from images with non-frontal face poses is difficult to achieve. To handle this issue, we present a method for acquiring facial expressions from a non-frontal single photograph using a 3D-aided approach. In addition, we propose a contour-fitting method that improves the modeling accuracy by automatically rearranging 3D contour landmarks corresponding to fixed 2D image landmarks. The acquired facial expression input can be parametrically manipulated to create various facial expressions through a blendshape or expression transfer based on the FACS (Facial Action Coding System). To achieve a realistic facial expression synthesis, we propose an exemplar-texture wrinkle synthesis method that extracts and synthesizes appropriate expression wrinkles according to the target expression. To do so, we constructed a wrinkle table of various facial expressions from 400 people. As one of the applications, we showed through a quantitative evaluation that the expression-pose synthesis method is suitable for expression-invariant face recognition, and demonstrated its effectiveness through a qualitative evaluation. We expect our system to benefit various fields such as face recognition, HCI, and data augmentation for deep learning.
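Blendshape-based synthesis, which the manipulation step relies on, amounts to adding weighted offsets to the neutral mesh; this generic sketch is illustrative only and does not include the FACS mapping or wrinkle-table lookup described in the paper.

```python
# Minimal sketch: synthesize an expression as the neutral mesh plus a weighted
# sum of blendshape displacements.
import numpy as np

def synthesize_expression(neutral: np.ndarray, blendshapes: np.ndarray,
                          weights: np.ndarray) -> np.ndarray:
    """neutral: (V, 3) vertices; blendshapes: (K, V, 3) expression targets;
    weights: (K,) blend weights, typically in [0, 1]."""
    offsets = blendshapes - neutral[None, :, :]           # per-blendshape displacement
    return neutral + np.tensordot(weights, offsets, 1)    # weighted sum of displacements

# Toy usage: 50% of blendshape 0 and 20% of blendshape 1.
neutral = np.zeros((1000, 3))
blendshapes = np.random.randn(2, 1000, 3) * 0.01
expression = synthesize_expression(neutral, blendshapes, np.array([0.5, 0.2]))
```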
(This article belongs to the Special Issue Sensor Applications on Emotion Recognition)

27 pages, 2200 KB  
Review
A Review on Automatic Facial Expression Recognition Systems Assisted by Multimodal Sensor Data
by Najmeh Samadiani, Guangyan Huang, Borui Cai, Wei Luo, Chi-Hung Chi, Yong Xiang and Jing He
Sensors 2019, 19(8), 1863; https://doi.org/10.3390/s19081863 - 18 Apr 2019
Cited by 168 | Viewed by 19304
Abstract
Facial Expression Recognition (FER) can be widely applied to various research areas, such as mental disease diagnosis and human social/physiological interaction detection. With emerging advanced technologies in hardware and sensors, FER systems have been developed to support real-world application scenes instead of laboratory environments. Although laboratory-controlled FER systems achieve very high accuracy, around 97%, transferring the technology from the laboratory to real-world applications faces a great barrier of very low accuracy, approximately 50%. In this survey, we comprehensively discuss three significant challenges in unconstrained real-world environments, namely illumination variation, head pose, and subject-dependence, which may not be resolved by only analysing images/videos in the FER system. We focus on those sensors that may provide extra information and help FER systems detect emotion in both static images and video sequences. We introduce three categories of sensors that may help improve the accuracy and reliability of an expression recognition system by tackling the challenges mentioned above in pure image/video processing. The first group is detailed-face sensors, such as eye-trackers, which detect small dynamic changes in a facial component and may help distinguish facial features from background noise. The second group is non-visual sensors, such as audio, depth, and EEG sensors, which provide information beyond the visual dimension and improve recognition reliability, for example under illumination variation or position shifts. The last group is target-focused sensors, such as infrared thermal sensors, which can help FER systems filter out irrelevant visual content and resist illumination variation. We also discuss methods for fusing the different inputs obtained from multimodal sensors in an emotion recognition system. We comparatively review the most prominent multimodal emotional expression recognition approaches and point out their advantages and limitations. We briefly introduce the benchmark data sets related to FER systems for each category of sensors and extend our survey to the open challenges and issues. Meanwhile, we design a framework for an expression recognition system that uses multimodal sensor data (provided by the three categories of sensors) to supply more complete information about emotions and to assist pure face image/video analysis. We theoretically analyse the feasibility and achievability of our new expression recognition system, especially for use in in-the-wild environments, and point out future directions for designing an efficient emotional expression recognition system.
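As one concrete example of fusing multimodal inputs, the sketch below performs decision-level (late) fusion by averaging per-modality emotion probabilities with fixed weights; the weights and the 7-class toy vectors are assumptions, and the survey itself reviews a range of fusion strategies beyond this one.

```python
# Minimal sketch: weighted decision-level fusion of per-modality emotion probabilities.
import numpy as np

def late_fusion(prob_list, weights=None):
    """prob_list: list of (num_classes,) probability vectors, one per modality
    (e.g. face image, audio, EEG). Returns the fused label and distribution."""
    probs = np.stack(prob_list)
    w = np.ones(len(prob_list)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    fused = (w[:, None] * probs).sum(axis=0)
    return fused.argmax(), fused

# Toy example: visual and audio classifiers over 7 emotion classes.
visual = np.array([0.6, 0.1, 0.05, 0.05, 0.1, 0.05, 0.05])
audio = np.array([0.3, 0.4, 0.05, 0.05, 0.1, 0.05, 0.05])
label, fused = late_fusion([visual, audio], weights=[0.7, 0.3])
```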
(This article belongs to the Special Issue Sensor Applications on Face Analysis)