Search Results (30)

Search Parameters:
Keywords = facial landmark localization

26 pages, 3207 KiB  
Article
A Novel Face Frontalization Method by Seamlessly Integrating Landmark Detection and Decision Forest into Generative Adversarial Network (GAN)
by Mahmood H. B. Alhlffee and Yea-Shuan Huang
Mathematics 2025, 13(3), 499; https://doi.org/10.3390/math13030499 - 2 Feb 2025
Viewed by 1484
Abstract
In real-world scenarios, posture variation and low-quality image resolution are two well-known factors that compromise the accuracy and reliability of face recognition systems. These challenges can be overcome using various methods, including Generative Adversarial Networks (GANs). Despite this, concerns over the accuracy and reliability of GAN methods are increasing as the facial recognition market expands rapidly. Existing frameworks such as the Two-Pathway GAN (TP-GAN) have demonstrated superiority over numerous GAN methods, providing better face-texture details thanks to a unique deep neural network structure that perceives local details and global structure in a supervised manner. TP-GAN overcomes some of the obstacles associated with face frontalization tasks through the use of landmark detection and synthesis functions, but it remains challenging to achieve the desired performance across a wide range of datasets. To address the inherent limitations of TP-GAN, we propose a novel face frontalization method (NFF) combining landmark detection, decision forests, and data augmentation. NFF provides 2D landmark detection to integrate global structure with local details in the generator model so that more accurate facial feature representations and more robust feature extraction can be achieved. NFF enhances the stability of the discriminator model over time by integrating decision forest capabilities into the core TP-GAN discriminator architecture, which allows a wide range of facial pose tasks to be performed. Moreover, NFF uses data augmentation techniques to maximize training data by generating completely new synthetic data from existing data. Our evaluations are based on the Multi-PIE, FEI, and CAS-PEAL datasets. The results indicate that TP-GAN performance can be significantly enhanced by resolving the challenges described above, leading to high-quality visualizations and rank-1 face identification.
(This article belongs to the Special Issue Advanced Machine Vision with Mathematics)
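
As a sketch of how a decision forest can sit inside a discriminator and remain trainable end to end, here is a minimal soft (differentiable) forest head in PyTorch. The tree count, depth, and sigmoid routing are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class SoftForestHead(nn.Module):
    """Differentiable decision-forest scoring head (illustrative sketch).

    Each tree routes a feature vector through sigmoid split nodes and
    sums leaf scores weighted by the probability of reaching each leaf,
    so the forest stays differentiable inside a GAN discriminator.
    """
    def __init__(self, in_dim, n_trees=4, depth=3):
        super().__init__()
        self.depth = depth
        n_splits = 2 ** depth - 1                      # inner nodes per tree
        self.split_fns = nn.ModuleList(
            [nn.Linear(in_dim, n_splits) for _ in range(n_trees)])
        self.leaf_scores = nn.ParameterList(
            [nn.Parameter(torch.zeros(2 ** depth)) for _ in range(n_trees)])

    def forward(self, feats):                          # feats: (B, in_dim)
        tree_scores = []
        for split_fn, leaves in zip(self.split_fns, self.leaf_scores):
            d = torch.sigmoid(split_fn(feats))         # P(route right) per node
            mu = feats.new_ones(feats.size(0), 1)      # prob. of reaching node
            for level in range(self.depth):
                first = 2 ** level - 1                 # this level's nodes
                p = d[:, first:first + 2 ** level]
                # split every path: left child gets mu*(1-p), right gets mu*p
                mu = torch.stack((mu * (1 - p), mu * p), dim=2).flatten(1)
            tree_scores.append(mu @ torch.sigmoid(leaves))   # (B,) per tree
        return torch.stack(tree_scores, dim=1).mean(dim=1)   # real/fake score
```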

17 pages, 1381 KiB  
Article
Comparison of Mirroring and Overlapping Analysis and Three-Dimensional Soft Tissue Spatial Angle Wireframe Template in Evaluating Facial Asymmetry
by Gengchen Yang, Liang Lyu, Aonan Wen, Yijiao Zhao, Yong Wang, Jing Li, Huichun Yan, Mingjin Zhang, Yi Yu, Tingting Yu and Dawei Liu
Bioengineering 2025, 12(1), 79; https://doi.org/10.3390/bioengineering12010079 - 16 Jan 2025
Viewed by 1081
Abstract
Aim: The purpose of this study was to evaluate the accuracy and efficacy of a new wireframe template methodology for analyzing three-dimensional facial soft tissue asymmetry. Materials and methods: Three-dimensional facial soft tissue data were obtained for 24 patients. The wireframe template was established by identifying 34 facial landmarks and then forming a template on the face with the MeshLab 2020 software. The angle asymmetry index was automatically scored using the template. The mirroring and overlapping technique, which acquires deviation values for the face, is accepted as the gold standard method for diagnosing facial asymmetry. Consistency rates between the two methodologies were determined through a statistical comparison of the angle asymmetry index and the deviation values. Results: Overall consistency rates in the labial, mandibular angle, cheek, chin, and articular regions were 87.5%, 95.8%, 87.5%, 91.7%, and 100%, respectively. Consistency rates exceeded 85% along the x- and z-axes of all regions and along the y-axis of the mandibular angle, chin, and articular regions. Conclusions: Soft tissue facial asymmetry can be diagnosed accurately and effectively using a three-dimensional soft tissue spatial angle wireframe template. It offers precise localization of asymmetry and can identify tiny, otherwise indiscernible asymmetries.
(This article belongs to the Section Biosignal Processing)
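
For context, a minimal sketch of the mirror-and-overlap idea applied to bare landmarks; the x = 0 midsagittal plane and the explicit landmark pairing are simplifying assumptions (the study mirrors and registers full 3D scans):

```python
import numpy as np

def mirrored_deviation(landmarks, pairs, midline_ids=()):
    """Deviation of each landmark from its mirrored counterpart.

    Assumes the face is aligned so the midsagittal plane is x = 0.
    landmarks: (N, 3) array; pairs: (left_idx, right_idx) contralateral
    pairs; midline_ids: landmarks compared against their own mirror.
    """
    mirrored = landmarks * np.array([-1.0, 1.0, 1.0])   # reflect across x = 0
    dev = {}
    for li, ri in pairs:
        dev[li] = np.linalg.norm(landmarks[li] - mirrored[ri])
        dev[ri] = np.linalg.norm(landmarks[ri] - mirrored[li])
    for mi in midline_ids:                              # midline drift off x = 0
        dev[mi] = np.linalg.norm(landmarks[mi] - mirrored[mi])
    return dev
```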

19 pages, 3783 KiB  
Article
MCCA-VNet: A ViT-Based Deep Learning Approach for Micro-Expression Recognition Based on Facial Coding
by Dehao Zhang, Tao Zhang, Haijiang Sun, Yanhui Tang and Qiaoyuan Liu
Sensors 2024, 24(23), 7549; https://doi.org/10.3390/s24237549 - 26 Nov 2024
Viewed by 1182
Abstract
In terms of facial expressions, micro-expressions are more realistic than macro-expressions and provide more valuable information, which can be widely used in psychological counseling and clinical diagnosis. In the past few years, deep learning methods based on optical flow and Transformers have achieved excellent results in this field, but most current algorithms concentrate on establishing a serialized token sequence through the self-attention model and do not take into account the spatial relationships between facial landmarks. To address the locality of micro-expressions and the subtle changes in facial state, we propose the Transformer-based deep learning model MCCA-VNet. We extract the changing features as the input of the model and fuse channel attention and spatial attention into the Vision Transformer to capture correlations between features in different dimensions, which enhances the accuracy of micro-expression identification. To verify the effectiveness of the algorithm, we conducted experiments on the SAMM, CASME II, and SMIC datasets and compared the results with previous best-performing algorithms. Our algorithm improves the UF1 and UAR scores to 0.8676 and 0.8622, respectively, on the composite dataset, outperforming other algorithms on multiple indicators and achieving the best overall performance.
(This article belongs to the Section Optical Sensors)
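
A minimal sketch of fused channel and spatial attention in the CBAM style the abstract evokes; the exact MCCA-VNet block layout is not reproduced here:

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """CBAM-style channel + spatial attention over a feature grid
    (a generic sketch of the fusion the abstract describes)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):                              # x: (B, C, H, W)
        # channel attention from average- and max-pooled descriptors
        avg = x.mean(dim=(2, 3))
        mx = x.amax(dim=(2, 3))
        ca = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ca[:, :, None, None]
        # spatial attention from channel-pooled maps
        sa = torch.sigmoid(self.spatial_conv(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa
```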

16 pages, 8982 KiB  
Article
A Two-Stream Method for Human Action Recognition Using Facial Action Cues
by Zhimao Lai, Yan Zhang and Xiubo Liang
Sensors 2024, 24(21), 6817; https://doi.org/10.3390/s24216817 - 23 Oct 2024
Cited by 1 | Viewed by 1440
Abstract
Human action recognition (HAR) is a critical area in computer vision with wide-ranging applications, including video surveillance, healthcare monitoring, and abnormal behavior detection. Current HAR methods predominantly rely on full-body data, which can limit their effectiveness in real-world scenarios where occlusion is common. In such situations, the face often remains visible, providing valuable cues for action recognition. This paper introduces Face in Action (FIA), a novel two-stream method that leverages facial action cues for robust action recognition under conditions of significant occlusion. FIA consists of an RGB stream and a landmark stream. The RGB stream processes facial image sequences using a fine-spatio-multitemporal (FSM) 3D convolution module, which employs smaller spatial receptive fields to capture detailed local facial movements and larger temporal receptive fields to model broader temporal dynamics. The landmark stream processes facial landmark sequences using a normalized temporal attention (NTA) module within an NTA-GCN block, enhancing the detection of key facial frames and improving overall recognition accuracy. We validate the effectiveness of FIA using the NTU RGB+D and NTU RGB+D 120 datasets, focusing on action categories related to medical conditions. Our experiments demonstrate that FIA significantly outperforms existing methods in scenarios with extensive occlusion, highlighting its potential for practical applications in surveillance and healthcare settings.
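
As a rough sketch of normalized temporal attention over a landmark sequence, here is a simple frame-weighting module; the shapes and the single linear scorer are assumptions, not the paper's NTA-GCN block:

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Softmax-normalized attention over frames of a landmark sequence,
    emphasizing key frames before pooling (illustrative sketch)."""
    def __init__(self, feat_dim):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, x):                     # x: (B, T, D) per-frame features
        w = torch.softmax(self.score(x), dim=1)    # (B, T, 1) frame weights
        return (w * x).sum(dim=1)                  # weighted temporal pooling
```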

21 pages, 11958 KiB  
Article
Deep Learning-Based Fine-Tuning Approach of Coarse Registration for Ear–Nose–Throat (ENT) Surgical Navigation Systems
by Dongjun Lee, Ahnryul Choi and Joung Hwan Mun
Bioengineering 2024, 11(9), 941; https://doi.org/10.3390/bioengineering11090941 - 20 Sep 2024
Viewed by 1476
Abstract
Accurate registration between medical images and patient anatomy is crucial for surgical navigation systems in minimally invasive surgeries. This study introduces a novel deep learning-based refinement step to enhance the accuracy of surface registration without disrupting established workflows. The proposed method integrates a machine learning model between conventional coarse registration and ICP fine registration. A deep learning model was trained using simulated anatomical landmarks with introduced localization errors. The model architecture features global feature-based learning, an iterative prediction structure, and independent processing of rotational and translational components. Validation with silicone-masked head phantoms and CT imaging compared the proposed method to both conventional registration and a recent deep learning approach. The results demonstrated significant improvements in target registration error (TRE) across different facial regions and depths. The average TRE for the proposed method (1.58 ± 0.52 mm) was significantly lower than that of the conventional (2.37 ± 1.14 mm) and previous deep learning (2.29 ± 0.95 mm) approaches (p < 0.01). The method showed consistent performance across facial regions and enhanced registration accuracy for deeper areas. This advancement could significantly enhance precision and safety in minimally invasive surgical procedures.
(This article belongs to the Special Issue Optical Imaging for Biomedical Applications)
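
Two textbook pieces the pipeline above presupposes, sketched for concreteness: a least-squares rigid (Kabsch) alignment for the coarse landmark registration, and the per-target TRE used for validation. Neither is the paper's learned refinement model:

```python
import numpy as np

def kabsch(P, Q):
    """Least-squares rigid transform (R, t) mapping point set P onto Q."""
    Pc, Qc = P - P.mean(0), Q - Q.mean(0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                       # proper rotation (det = +1)
    t = Q.mean(0) - P.mean(0) @ R.T
    return R, t

def target_registration_error(targets_img, targets_patient, R, t):
    """Per-target TRE: distance between patient-space targets and
    image-space targets mapped through the estimated transform."""
    return np.linalg.norm(targets_img @ R.T + t - targets_patient, axis=1)
```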

18 pages, 2857 KiB  
Article
AnyFace++: Deep Multi-Task, Multi-Domain Learning for Efficient Face AI
by Tomiris Rakhimzhanova, Askat Kuzdeuov and Huseyin Atakan Varol
Sensors 2024, 24(18), 5993; https://doi.org/10.3390/s24185993 - 15 Sep 2024
Cited by 1 | Viewed by 2620
Abstract
Accurate face detection and subsequent localization of facial landmarks are mandatory steps in many computer vision applications, such as emotion recognition, age estimation, and gender identification. Thanks to advancements in deep learning, numerous facial applications have been developed for human faces. However, most have to employ multiple models to accomplish several tasks simultaneously. As a result, they require more memory usage and increased inference time. Also, less attention is paid to other domains, such as animals and cartoon characters. To address these challenges, we propose an input-agnostic face model, AnyFace++, to perform multiple face-related tasks concurrently. The tasks are face detection and prediction of facial landmarks for human, animal, and cartoon faces, including age estimation, gender classification, and emotion recognition for human faces. We trained the model using deep multi-task, multi-domain learning with a heterogeneous cost function. The experimental results demonstrate that AnyFace++ generates outcomes comparable to cutting-edge models designed for specific domains.
(This article belongs to the Section Biomedical Sensors)
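
A hedged sketch of what a heterogeneous multi-task cost can look like: per-task losses are masked so each sample contributes only to the tasks it is labeled for, letting human, animal, and cartoon faces share one batch. Task names, loss choices, and weights are illustrative assumptions, not the paper's exact function:

```python
import torch.nn.functional as F

def heterogeneous_loss(preds, labels, masks, weights):
    """Masked multi-task cost over a mixed-domain batch (sketch).

    preds/labels: dicts of tensors keyed by task; masks: dict of bool
    tensors (B,) marking which samples carry labels for each task.
    Classification labels are class indices (long tensors)."""
    task_losses = {
        'boxes': F.smooth_l1_loss,       # face detection regression
        'landmarks': F.mse_loss,         # landmark coordinates
        'age': F.l1_loss,                # age estimation
        'gender': F.cross_entropy,       # classification heads
        'emotion': F.cross_entropy,
    }
    total = 0.0
    for task, loss_fn in task_losses.items():
        m = masks[task]
        if m.any():                      # skip tasks absent from this batch
            total = total + weights[task] * loss_fn(preds[task][m],
                                                    labels[task][m])
    return total
```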

16 pages, 4067 KiB  
Article
TriCAFFNet: A Tri-Cross-Attention Transformer with a Multi-Feature Fusion Network for Facial Expression Recognition
by Yuan Tian, Zhao Wang, Di Chen and Huang Yao
Sensors 2024, 24(16), 5391; https://doi.org/10.3390/s24165391 - 21 Aug 2024
Cited by 2 | Viewed by 2189
Abstract
In recent years, significant progress has been made in facial expression recognition methods. However, tasks related to facial expression recognition in real environments still require further research. This paper proposes a tri-cross-attention transformer with a multi-feature fusion network (TriCAFFNet) to improve facial expression recognition performance under challenging conditions. By combining LBP (Local Binary Pattern) features, HOG (Histogram of Oriented Gradients) features, landmark features, and CNN (convolutional neural network) features from facial images, the model is provided with a rich input that improves its ability to discern subtle differences between images. Additionally, tri-cross-attention blocks are designed to facilitate information exchange between different features, enabling mutual guidance among them to capture salient attention. Extensive experiments on several widely used datasets show that TriCAFFNet achieves state-of-the-art performance, with 92.17% on RAF-DB, 67.40% on AffectNet (7 classes), and 63.49% on AffectNet (8 classes).
(This article belongs to the Section Intelligent Sensors)
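
A minimal sketch of extracting the two handcrafted inputs named above with scikit-image; the parameter choices are assumptions, not TriCAFFNet's:

```python
import numpy as np
from skimage.feature import local_binary_pattern, hog

def handcrafted_features(gray):
    """LBP histogram + HOG descriptor for one grayscale face image
    (float values in [0, 1]); parameters are illustrative."""
    # uniform LBP with P=8 neighbors yields codes 0..9, hence 10 bins
    lbp = local_binary_pattern(gray, P=8, R=1, method='uniform')
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    hog_vec = hog(gray, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2))
    return np.concatenate([lbp_hist, hog_vec])
```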

14 pages, 3807 KiB  
Article
Region-Aware Deep Feature-Fused Network for Robust Facial Landmark Localization
by Xuxin Lin and Yanyan Liang
Mathematics 2023, 11(19), 4026; https://doi.org/10.3390/math11194026 - 22 Sep 2023
Cited by 1 | Viewed by 1412
Abstract
In facial landmark localization, facial region initialization usually plays an important role in guiding the model to learn critical face features. Most facial landmark detectors assume a well-cropped face as input and may underperform in real applications if the input is unexpected. To alleviate this problem, we present a region-aware deep feature-fused network (RDFN). The RDFN consists of a region detection subnetwork and a region-wise landmark localization subnetwork to explicitly solve the input initialization problem and derive the landmark score maps, respectively. To exploit the association between tasks, we develop a cross-task feature fusion scheme to extract multi-semantic region features while trading off their importance in different dimensions via global channel attention and global spatial attention. Furthermore, we design a within-task feature fusion scheme to capture the multi-scale context and improve the gradient flow for the landmark localization subnetwork. At the inference stage, a location reweighting strategy is employed to transform the score maps into 2D landmark coordinates. Extensive experimental results demonstrate that our method has competitive performance compared to recent state-of-the-art methods, achieving NMEs of 3.28%, 1.48%, and 3.43% on the 300W, AFLW, and COFW datasets, respectively.
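
Two small pieces for concreteness: a soft-argmax decode from a score map to 2D coordinates (a common form of location reweighting; RDFN's exact strategy may differ) and the NME metric behind the quoted percentages:

```python
import numpy as np

def decode_score_map(score_map):
    """Soft-argmax decode: softmax-normalize a landmark score map and
    take the expected (x, y) location (illustrative reweighting)."""
    h, w = score_map.shape
    p = np.exp(score_map - score_map.max())
    p /= p.sum()
    ys, xs = np.mgrid[0:h, 0:w]
    return float((p * xs).sum()), float((p * ys).sum())

def nme(pred, gt, norm_dist):
    """Normalized mean error as a percentage: mean landmark distance
    divided by a dataset-specific normalizer (e.g., inter-ocular)."""
    return 100.0 * np.linalg.norm(pred - gt, axis=1).mean() / norm_dist
```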

14 pages, 16370 KiB  
Article
An Automated Method of 3D Facial Soft Tissue Landmark Prediction Based on Object Detection and Deep Learning
by Yuchen Zhang, Yifei Xu, Jiamin Zhao, Tianjing Du, Dongning Li, Xinyan Zhao, Jinxiu Wang, Chen Li, Junbo Tu and Kun Qi
Diagnostics 2023, 13(11), 1853; https://doi.org/10.3390/diagnostics13111853 - 25 May 2023
Cited by 3 | Viewed by 4719
Abstract
Background: Three-dimensional facial soft tissue landmark prediction is an important tool in dentistry. Several methods have been developed in recent years, including a deep learning algorithm that relies on converting 3D models into 2D maps, which results in a loss of information and precision. Methods: This study proposes a neural network architecture capable of directly predicting landmarks from a 3D facial soft tissue model. First, the region of each organ is obtained by an object detection network. Second, prediction networks obtain landmarks from the 3D models of the individual organs. Results: The mean error of this method in local experiments is 2.62 ± 2.39 mm, lower than that of other machine learning or geometric information algorithms. Additionally, over 72% of landmark errors on the test data fall within 2.5 mm, and 100% fall within 3 mm. Moreover, this method predicts 32 landmarks, more than any other machine learning-based algorithm. Conclusions: The proposed method can precisely predict a large number of 3D facial soft tissue landmarks, demonstrating the feasibility of directly using 3D models for prediction.
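
For concreteness, a minimal sketch of the error statistics the Results section reports, using the 2.5 mm and 3 mm thresholds from the text:

```python
import numpy as np

def landmark_error_stats(pred, gt):
    """Mean ± SD of per-landmark Euclidean error (mm) and the fraction
    of errors within the thresholds quoted in the abstract."""
    err = np.linalg.norm(pred - gt, axis=1)     # per-landmark error (mm)
    return {'mean': err.mean(), 'sd': err.std(),
            'within_2.5mm': (err <= 2.5).mean(),
            'within_3mm': (err <= 3.0).mean()}
```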

17 pages, 2657 KiB  
Article
Combining CNNs and Markov-like Models for Facial Landmark Detection with Spatial Consistency Estimates
by Ahmed Gdoura, Markus Degünther, Birgit Lorenz and Alexander Effland
J. Imaging 2023, 9(5), 104; https://doi.org/10.3390/jimaging9050104 - 22 May 2023
Cited by 6 | Viewed by 3138
Abstract
The accurate localization of facial landmarks is essential for several tasks, including face recognition, head pose estimation, facial region extraction, and emotion detection. Although the number of required landmarks is task-specific, models are typically trained on all available landmarks in the datasets, limiting efficiency. Furthermore, model performance is strongly influenced by scale-dependent local appearance information around landmarks and the global shape information generated by them. To account for this, we propose a lightweight hybrid model for facial landmark detection designed specifically for pupil region extraction. Our design combines a convolutional neural network (CNN) with a Markov random field (MRF)-like process trained on only 17 carefully selected landmarks. The advantage of our model is the ability to run different image scales on the same convolutional layers, resulting in a significant reduction in model size. In addition, we employ an approximation of the MRF that is run on a subset of landmarks to validate the spatial consistency of the generated shape. This validation is performed against a learned conditional distribution expressing the location of one landmark relative to its neighbor. Experimental results on popular facial landmark localization datasets such as 300W, WFLW, and HELEN demonstrate the accuracy of our proposed model. Furthermore, our model achieves state-of-the-art performance on a well-defined robustness metric. In conclusion, the results demonstrate the ability of our lightweight model to filter out spatially inconsistent predictions, even with significantly fewer training landmarks.
(This article belongs to the Topic Computer Vision and Image Processing)
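
A simplified stand-in for the MRF-like consistency check: each neighbor offset is scored against a learned Gaussian, and the shape is rejected if any offset is implausible. The Gaussian parameterization and the cutoff are assumptions:

```python
import numpy as np

def spatially_consistent(landmarks, neighbor_pairs, mean_offsets,
                         inv_covs, cutoff=9.21):
    """Accept a predicted shape only if every neighbor offset fits its
    learned Gaussian; 9.21 is the 99% chi-square cutoff for 2 dof."""
    for k, (i, j) in enumerate(neighbor_pairs):
        d = landmarks[j] - landmarks[i] - mean_offsets[k]
        if d @ inv_covs[k] @ d > cutoff:        # squared Mahalanobis distance
            return False
    return True
```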

22 pages, 7434 KiB  
Article
Heatmap-Guided Selective Feature Attention for Robust Cascaded Face Alignment
by Jaehyun So and Youngjoon Han
Sensors 2023, 23(10), 4731; https://doi.org/10.3390/s23104731 - 13 May 2023
Cited by 3 | Viewed by 2545
Abstract
Face alignment methods have been actively studied using coordinate and heatmap regression tasks. Although these regression tasks have the same objective for facial landmark detection, each task requires different valid feature maps. Therefore, it is not easy to simultaneously train two kinds of tasks with a multi-task learning network structure. Some studies have proposed multi-task learning networks with two kinds of tasks, but they do not suggest an efficient network that can train them simultaneously because of the shared noisy feature maps. In this paper, we propose a heatmap-guided selective feature attention for robust cascaded face alignment based on multi-task learning, which improves the performance of face alignment by efficiently training coordinate regression and heatmap regression. The proposed network improves the performance of face alignment by selecting valid feature maps for heatmap and coordinate regression and using the background propagation connection for tasks. This study also uses a refinement strategy that detects global landmarks through a heatmap regression task and then localizes landmarks through cascaded coordinate regression tasks. To evaluate the proposed network, we tested it on the 300W, AFLW, COFW, and WFLW datasets and obtained results that outperformed other state-of-the-art networks.
(This article belongs to the Section Sensing and Imaging)
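
A hedged sketch of a joint objective for the two regression tasks the abstract pairs; the loss choices and weighting are assumptions, not the proposed network's training recipe:

```python
import torch.nn.functional as F

def alignment_loss(pred_heatmaps, gt_heatmaps, pred_coords, gt_coords,
                   coord_weight=0.5):
    """Joint multi-task objective: heatmap MSE for global landmark
    detection plus L1 on the cascaded coordinate refinements."""
    return (F.mse_loss(pred_heatmaps, gt_heatmaps)
            + coord_weight * F.l1_loss(pred_coords, gt_coords))
```

Selecting which feature maps feed each branch (the "selective feature attention") is the paper's contribution and is not modeled here.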

16 pages, 2959 KiB  
Article
Comparison Study of Extraction Accuracy of 3D Facial Anatomical Landmarks Based on Non-Rigid Registration of Face Template
by Aonan Wen, Yujia Zhu, Ning Xiao, Zixiang Gao, Yun Zhang, Yong Wang, Shengjin Wang and Yijiao Zhao
Diagnostics 2023, 13(6), 1086; https://doi.org/10.3390/diagnostics13061086 - 13 Mar 2023
Cited by 6 | Viewed by 2785
Abstract
(1) Background: Three-dimensional (3D) facial anatomical landmarks are the premise and foundation of facial morphology analysis. At present, there is no ideal automatic determination method for 3D facial anatomical landmarks. This research aims to realize the automatic determination of 3D facial anatomical landmarks based on the non-rigid registration algorithm developed by our research team and to evaluate its landmark localization accuracy. (2) Methods: A 3D facial scanner, Face Scan, was used to collect 3D facial data of 20 adult males without significant facial deformities. Using the radial basis function optimized non-rigid registration algorithm, TH-OCR, developed by our research team (experimental group: TH group) and the non-rigid registration algorithm, MeshMonk (control group: MM group), a 3D face template constructed in our previous research was deformed and registered to each participant’s data. The automatic determination of 3D facial anatomical landmarks was realized according to the index of 32 facial anatomical landmarks determined on the 3D face template. Considering these 32 facial anatomical landmarks manually selected by experts on the 3D facial data as the gold standard, the distance between the automatically determined and the corresponding manually selected facial anatomical landmarks was calculated as the “landmark localization error” to evaluate the effect and feasibility of the automatic determination method (template method). (3) Results: The mean landmark localization error of all facial anatomical landmarks in the TH and MM groups was 2.34 ± 1.76 mm and 2.16 ± 1.97 mm, respectively. The automatic determination of the anatomical landmarks in the middle face was better than that in the upper and lower face in both groups. Further, the automatic determination of anatomical landmarks in the center of the face was better than in the marginal part. (4) Conclusions: In this study, the automatic determination of 3D facial anatomical landmarks was realized based on non-rigid registration algorithms. There is no significant difference in the automatic landmark localization accuracy between the TH-OCR algorithm and the MeshMonk algorithm, and both can meet the needs of oral clinical applications to a certain extent.
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)
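
A generic Gaussian-RBF warp for intuition about radial-basis-function non-rigid registration; this is not the TH-OCR algorithm, and the kernel width is an assumed parameter:

```python
import numpy as np

def rbf_warp(src_ctrl, dst_ctrl, points, sigma=30.0):
    """Gaussian-RBF non-rigid warp: solve for displacement weights that
    carry the template control landmarks onto the target's, then apply
    the same deformation to arbitrary template points."""
    def K(A, B):                                  # Gaussian kernel matrix
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    W = np.linalg.solve(K(src_ctrl, src_ctrl) + 1e-6 * np.eye(len(src_ctrl)),
                        dst_ctrl - src_ctrl)      # (n_ctrl, 3) weights
    return points + K(points, src_ctrl) @ W      # warped points
```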

20 pages, 3390 KiB  
Article
Relation between Nasal Septum Deviation and Facial Asymmetry: An Ontogenetic Analysis from Infants to Children Using Geometric Morphometrics
by Azalea Shamaei-Tousi, Alessio Veneziano and Federica Landi
Appl. Sci. 2022, 12(22), 11362; https://doi.org/10.3390/app122211362 - 9 Nov 2022
Cited by 3 | Viewed by 4820
Abstract
The nasal septum has been postulated to have an intrinsic growth power and to act as a pacemaker for facial development, with its interactions with local craniofacial structures likely to influence facial anatomy and morphology. Recent studies have begun to investigate the link between nasal septum deviation and facial asymmetry; however, the magnitude and mechanisms of this relation are still unclear. This study aimed to analyse the degree of nasal septum deviation in a sample of infants and children (males and females from 0 to 8 years old) and its correlation with the three-dimensional structure of the facial skeleton. The scope was to test whether septal deviation is linked to, and might cause, the development of a more asymmetric face. For this aim, 41 3D landmarks (homologous points) were collected on the nasal septum and cranial surface of 46 specimens extracted from medical CT scans and were analysed using Geometric Morphometrics, multiple linear regressions, multivariate ANOVAs, and Principal Component Analysis (PCA). Results showed no significant correlation between the magnitude of septal deviation and the ontogeny (changes in age) or sex of the sample, but a significant association was found between the side of deviation and septal deviation magnitude and frequency. The asymmetric PCA reveals that most of the asymmetry identified is fluctuating, and that changes in the asymmetric morphology of the face are not associated with a specific side of septal deviation. In addition, a series of multivariate ANOVAs showed that age, sex, and septal deviation have no impact on facial asymmetry, with only age affecting the symmetric development of facial morphology. When looking at factors affecting the general morphology of the face, age is again the only major driving component, with fluctuating asymmetry and sex only approaching significance. These results could imply a certain degree of dissociation between the mechanisms of facial and septal growth and development; however, an investigation of other key developmental stages in facial morphology is needed to further understand the relation between septal deviation and facial growth.
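
For intuition about the Geometric Morphometrics pipeline, a bare-bones Procrustes-plus-PCA sketch; SciPy's pairwise Procrustes stands in for full generalized Procrustes analysis, and the study's symmetric/asymmetric decomposition is omitted:

```python
import numpy as np
from scipy.spatial import procrustes

def shape_pca(shapes, n_pc=2):
    """Superimpose each specimen's landmark configuration onto the
    first via Procrustes, then run PCA on the aligned coordinates.
    shapes: list of (41, 3) landmark arrays, one per specimen."""
    aligned = [procrustes(shapes[0], s)[1] for s in shapes]
    X = np.stack([a.ravel() for a in aligned])    # specimens x coordinates
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_pc].T                     # PC scores per specimen
    var_explained = S[:n_pc] ** 2 / (S ** 2).sum()
    return scores, var_explained
```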

25 pages, 620 KiB  
Article
Personalized Federated Multi-Task Learning over Wireless Fading Channels
by Matin Mortaheb, Cemil Vahapoglu and Sennur Ulukus
Algorithms 2022, 15(11), 421; https://doi.org/10.3390/a15110421 - 9 Nov 2022
Cited by 10 | Viewed by 3784
Abstract
Multi-task learning (MTL) is a paradigm for learning multiple tasks simultaneously through a shared network, in which a distinct header network is further tailored and fine-tuned for each distinct task. Personalized federated learning (PFL) can be achieved through MTL in the context of federated learning (FL), where tasks are distributed across clients; this is referred to as personalized federated MTL (PF-MTL). Statistical heterogeneity, caused by differences in task complexity across clients and the non-i.i.d. (not independent and identically distributed) nature of local datasets, degrades system performance. To overcome this degradation, we propose FedGradNorm, a distributed dynamic weighting algorithm that balances learning speeds across tasks by normalizing the corresponding gradient norms in PF-MTL. We prove an exponential convergence rate for FedGradNorm. Further, we propose HOTA-FedGradNorm by utilizing over-the-air (OTA) aggregation with FedGradNorm in a hierarchical FL (HFL) setting. HOTA-FedGradNorm is designed for efficient communication between the parameter server (PS) and clients in the power- and bandwidth-limited regime. We conduct experiments with both FedGradNorm and HOTA-FedGradNorm using the multi-task facial landmark (MTFL) and wireless communication system (RadComDynamic) datasets. The results indicate that both frameworks achieve faster training than equal-weighting strategies. In addition, FedGradNorm and HOTA-FedGradNorm compensate for imbalanced datasets across clients and for adverse channel effects.
(This article belongs to the Special Issue Gradient Methods for Optimization)
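
A simplified, single-node sketch of the gradient-norm balancing idea: task weights are scaled so each task's gradient norm on the shared trunk moves toward the mean. The actual FedGradNorm learns the weights and operates over a federated (and, for HOTA, hierarchical over-the-air) setting:

```python
import torch

def balance_task_weights(task_losses, weights, shared_params):
    """One reweighting step in the spirit of gradient-norm balancing:
    measure each task's gradient norm on the shared parameters and
    rescale its weight toward the mean norm (illustrative sketch)."""
    norms = []
    for loss in task_losses:
        grads = torch.autograd.grad(loss, shared_params, retain_graph=True)
        norms.append(torch.cat([g.reshape(-1) for g in grads]).norm())
    norms = torch.stack(norms)
    with torch.no_grad():
        new_w = weights * norms.mean() / (norms + 1e-12)  # equalize norms
        new_w *= len(new_w) / new_w.sum()                 # keep total mass
    return new_w
```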

21 pages, 5598 KiB  
Article
Kids’ Emotion Recognition Using Various Deep-Learning Models with Explainable AI
by Manish Rathod, Chirag Dalvi, Kulveen Kaur, Shruti Patil, Shilpa Gite, Pooja Kamat, Ketan Kotecha, Ajith Abraham and Lubna Abdelkareim Gabralla
Sensors 2022, 22(20), 8066; https://doi.org/10.3390/s22208066 - 21 Oct 2022
Cited by 19 | Viewed by 9711
Abstract
Human ideas and sentiments are mirrored in facial expressions. Expressions give the spectator a plethora of social cues, such as the viewer’s focus of attention, intention, motivation, and mood, which can help develop better interactive solutions for online platforms. This could be particularly helpful in teaching children, cultivating a better interactive connection between teachers and students, given the increasing shift toward online education platforms due to the COVID-19 pandemic. To this end, the authors propose kids’ emotion recognition based on visual cues, with a justified reasoning model of explainable AI. Two datasets were used: the first is the LIRIS Children Spontaneous Facial Expression Video Database, and the second is a novel author-created dataset of emotions displayed by children aged 7 to 10. Prior work on the LIRIS dataset achieved only 75% accuracy and has not been improved upon; the authors achieve the highest accuracy of 89.31% on LIRIS and 90.98% on their own dataset. The authors also observed that facial structure differs between children and adults, and that children do not always express a given emotion with the same facial configuration as adults. Hence, the authors used 468 3D landmark points to create two additional versions of the selected datasets, LIRIS-Mesh and Authors-Mesh. In total, four dataset types were used, namely LIRIS, the authors’ dataset, LIRIS-Mesh, and Authors-Mesh, and a comparative analysis was performed using seven different CNN models. The authors not only compared all dataset types on the different CNN models but also, for every CNN and dataset combination, explained how test images are perceived by the deep-learning models using explainable artificial intelligence (XAI), which helps localize the features contributing to particular emotions. Three XAI methods were used, namely Grad-CAM, Grad-CAM++, and SoftGrad, which help users establish the reason for a detected emotion by revealing the contribution of individual features.
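
A generic Grad-CAM implementation for reference, since it is the first of the three XAI methods named; this is not the authors' exact pipeline, and the model/layer arguments are assumptions:

```python
import torch
import torch.nn.functional as F

def grad_cam(model, layer, image, class_idx):
    """Plain Grad-CAM on one conv layer: weight the layer's activations
    by spatially averaged gradients of the class score, then ReLU and
    upsample to image size. image: (C, H, W); model returns logits."""
    acts, grads = [], []
    h1 = layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    try:
        score = model(image.unsqueeze(0))[0, class_idx]
        model.zero_grad()
        score.backward()
    finally:
        h1.remove()
        h2.remove()
    w = grads[0].mean(dim=(2, 3), keepdim=True)        # channel importances
    cam = F.relu((w * acts[0]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[1:], mode='bilinear',
                        align_corners=False)
    return (cam / cam.max().clamp(min=1e-8)).squeeze() # normalized heatmap
```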
