Search Results (1,483)

Search Parameters:
Keywords = facial feature

21 pages, 4182 KB  
Article
Gender-Aware Driver Drowsiness Detection Using Multi-Stream Shifted-Window-Based Hierarchical Vision Transformers
by M. Faisal Nurnoby and El-Sayed M. El-Alfy
Appl. Sci. 2026, 16(7), 3353; https://doi.org/10.3390/app16073353 - 30 Mar 2026
Abstract
Given the substantial contribution of driver fatigue to traffic accidents, detecting and mitigating it has become one of the main goals of intelligent driver-assistance systems for enhancing driving safety and comfort. Among various approaches, vision-based facial analysis using deep learning has emerged as an effective and non-intrusive method for identifying driver drowsiness, a key manifestation of fatigue. However, current drowsiness detection models do not account for demographic factors such as gender, even though recent research has shown gender-related behavioral differences in eye closure duration, blink frequency, yawning patterns, and facial muscle relaxation. In this paper, we present a fine-grained multi-stream transformer architecture that incorporates gender awareness and shifted-window attention for spatial feature fusion. Integrating a gender embedding that modulates the region-based features allows the model to effectively learn gender-conditioned drowsiness features, minimizing bias and diluted representations. Using the NTHU-DDD dataset, we evaluated gender-aware and gender-agnostic two-stream and three-stream variants across three facial region contexts: the face region with a 20% margin, the bare face region, and key facial regions (face, eyes, and mouth). A comprehensive ablation study was conducted to identify the most effective model setup. The results demonstrate that incorporating the gender embedding improves detection performance, achieving an accuracy of 95.47% on the evaluation set. Moreover, the proposed three-stream model (SWT-DD-3S) produced better results. Full article
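As a rough illustration of the gender-conditioned modulation described in this abstract, the sketch below applies a FiLM-style scale and shift, derived from a learned gender embedding, to per-region features before classification. All module names, dimensions, and the two-class head are assumptions for illustration; this is not the authors' SWT-DD implementation.

```python
# Minimal sketch: gender-conditioned modulation of per-region features (FiLM-style).
# Illustrative only; names and dimensions are assumptions, not the authors' SWT-DD code.
import torch
import torch.nn as nn

class GenderConditionedFusion(nn.Module):
    def __init__(self, feat_dim=256, n_regions=3, n_classes=2):
        super().__init__()
        self.gender_emb = nn.Embedding(2, 32)           # 0 / 1 gender coding (assumed)
        self.to_scale = nn.Linear(32, feat_dim)         # per-channel scale from gender embedding
        self.to_shift = nn.Linear(32, feat_dim)         # per-channel shift from gender embedding
        self.head = nn.Linear(feat_dim * n_regions, n_classes)  # drowsy / alert

    def forward(self, region_feats, gender):
        # region_feats: (B, n_regions, feat_dim), e.g. face / eyes / mouth streams
        g = self.gender_emb(gender)                     # (B, 32)
        scale = self.to_scale(g).unsqueeze(1)           # (B, 1, feat_dim)
        shift = self.to_shift(g).unsqueeze(1)
        modulated = region_feats * (1 + scale) + shift  # gender-conditioned features
        return self.head(modulated.flatten(1))

feats = torch.randn(4, 3, 256)                          # dummy features from three streams
gender = torch.tensor([0, 1, 1, 0])
print(GenderConditionedFusion()(feats, gender).shape)   # torch.Size([4, 2])
```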
21 pages, 18952 KB  
Article
Evaluating AI-Based Image Inpainting Techniques for Facial Components Restoration Using Semantic Masks
by Hussein Sharadga, Abdullah Hayajneh and Erchin Serpedin
AI 2026, 7(4), 119; https://doi.org/10.3390/ai7040119 - 30 Mar 2026
Abstract
This paper presents a comparative analysis of advanced AI-based techniques for human face inpainting using semantic masks that fully occlude targeted facial components. The primary objective is to evaluate the ability of image inpainting methods to accurately restore semantically meaningful facial features. Our results show that existing inpainting models face significant challenges when semantic masks completely obscure the underlying facial structures. In contrast to random masks, which leave partial visual cues, semantic masks remove all structural information, making reconstruction substantially more difficult. We assess the performance of generative adversarial networks (GANs), transformer-based models, and diffusion models in restoring fully occluded facial components. To address these challenges, we explore three retraining strategies: using semantic masks, using random masks, and a hybrid approach combining both. While the hybrid strategy leverages the complementary strengths of each mask type and improves contextual understanding, fully accurate reconstruction remains challenging. These findings demonstrate that inpainting under fully occluding semantic masks is a critical yet underexplored area, offering opportunities for developing new AI architectures and strategies for advanced facial reconstruction. Full article
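The occlusion setting described here can be pictured with a short sketch: a binary inpainting mask is derived from a face-parsing label map so that a chosen component is fully covered, leaving the model no structural cues. The label indices below are hypothetical and not tied to any particular parser.

```python
# Minimal sketch: building a fully occluding semantic mask for one facial component
# from a face-parsing label map. Label indices are hypothetical.
import numpy as np

EYE_LABELS = {4, 5}          # assumed parser labels for the left/right eye regions

def semantic_mask(parsing_map: np.ndarray, labels=EYE_LABELS) -> np.ndarray:
    """Return a binary mask (1 = pixel to inpaint) covering the chosen component entirely."""
    return np.isin(parsing_map, list(labels)).astype(np.uint8)

def apply_mask(image: np.ndarray, mask: np.ndarray, fill=0) -> np.ndarray:
    """Blank out the masked region so the inpainting model sees no structural cues there."""
    out = image.copy()
    out[mask.astype(bool)] = fill
    return out

parsing = np.random.randint(0, 10, size=(256, 256))                  # dummy parsing map
img = np.random.randint(0, 255, size=(256, 256, 3), dtype=np.uint8)  # dummy face image
masked = apply_mask(img, semantic_mask(parsing))
```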
16 pages, 3976 KB  
Article
Spiking Feature-Driven Event Simulation with Movement-Aware Polarity Integration
by Jiwoong Oh, Byeongjun Kang, Hyungsik Shin and Dongwoo Kang
Electronics 2026, 15(7), 1420; https://doi.org/10.3390/electronics15071420 - 29 Mar 2026
Abstract
Event-based face detection has attracted significant interest due to the unique advantages of event cameras, including high temporal resolution, high dynamic range, and low power consumption. However, the lack of annotated public datasets remains a major challenge for training effective event-based face detection models. In this paper, we propose a spiking feature-driven synthetic event generation framework that utilizes a spiking neural network (SNN) in conjunction with a pretrained convolutional backbone to generate synthetic event representations from a single RGB image. To incorporate motion-induced ON/OFF polarity information, we introduce a movement-aware polarity integration (MPI) module that assumes four directional facial movements. An event-similarity score is further employed to select representations most consistent with real event data for training. Unlike conventional approaches relying on video-based simulators, our method enables efficient synthetic event dataset construction without requiring video inputs or additional simulation training. Experimental results on the N-Caltech101 dataset demonstrate a face detection accuracy of 99.91%, outperforming existing event-based face detection methods. Full article
(This article belongs to the Special Issue Edge-Intelligent Sustainable Cyber-Physical Systems)
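To make the movement-aware polarity idea concrete, the sketch below shifts a single grayscale frame in four assumed directions and thresholds the intensity change into ON/OFF event maps. It illustrates the concept only and is not the authors' SNN-based MPI module.

```python
# Minimal sketch of motion-induced ON/OFF polarity from a single image: shift the frame in four
# assumed directions and threshold the brightness change. Conceptual illustration only.
import numpy as np

def directional_events(img: np.ndarray, shift=2, thresh=10):
    """Return a dict of (ON, OFF) binary event maps for four assumed facial movements."""
    shifts = {"up": (-shift, 0), "down": (shift, 0), "left": (0, -shift), "right": (0, shift)}
    events = {}
    for name, (dy, dx) in shifts.items():
        moved = np.roll(img.astype(np.int16), (dy, dx), axis=(0, 1))
        diff = moved - img.astype(np.int16)                   # brightness change induced by motion
        events[name] = ((diff > thresh).astype(np.uint8),     # ON polarity
                        (diff < -thresh).astype(np.uint8))    # OFF polarity
    return events

frame = np.random.randint(0, 255, size=(128, 128), dtype=np.uint8)
ev = directional_events(frame)
print({k: (on.sum(), off.sum()) for k, (on, off) in ev.items()})
```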
24 pages, 1254 KB  
Article
ConvNeXt Meets Vision Transformers: A Powerful Hybrid Framework for Facial Age Estimation
by Gaby Maroun, Salah Eddine Bekhouche and Fadi Dornaika
Appl. Sci. 2026, 16(7), 3281; https://doi.org/10.3390/app16073281 - 28 Mar 2026
Abstract
Age estimation based on facial images is a challenging task due to the complex and nonlinear nature of facial aging, which is influenced by both genetic and environmental factors. To address this challenge, we propose a hybrid ConvNeXt–Transformer framework that combines convolutional local feature extraction with attention-based global contextual modeling within a unified age regression pipeline. The methodological contribution of this work lies in the sequential integration of these two complementary paradigms for facial age estimation, allowing the model to capture both fine-grained textural cues—such as wrinkles and skin spots—and long-range spatial dependencies. We evaluate the proposed framework on benchmark datasets including MORPH II, CACD, UTKFace, and AFAD. The results show competitive performance across these datasets and confirm the effectiveness of the proposed hybrid design through extensive ablation analyses. Experimental results demonstrate that our approach achieves state-of-the-art MAE on MORPH II (2.26), CACD (4.35), and AFAD (3.09) under the adopted benchmark settings while remaining competitive on UTKFace. To address computational efficiency, we employ ImageNet pre-trained backbones and explore different architectural configurations, including fusion strategies and varying depths of the Transformer module, as well as regularization techniques such as stochastic depth and label smoothing. Ablation studies confirm the contribution of each component, particularly the role of attention mechanisms, in enhancing the model’s sensitivity to age-relevant features. Overall, the proposed hybrid framework provides a robust and accurate solution for facial age estimation, effectively balancing performance and computational cost. Full article
(This article belongs to the Special Issue Applications of Data Science and Artificial Intelligence, 2nd Edition)
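A minimal sketch of the sequential CNN-to-Transformer design the abstract describes is given below: ConvNeXt feature maps are flattened into tokens and refined by a Transformer encoder before a regression head. The backbone variant, encoder depth, and head are assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a hybrid ConvNeXt-to-Transformer age regressor; illustrative hyperparameters.
import torch
import torch.nn as nn
from torchvision.models import convnext_tiny

class HybridAgeRegressor(nn.Module):
    def __init__(self, depth=2, heads=8):
        super().__init__()
        self.backbone = convnext_tiny(weights=None).features    # (B, 768, H/32, W/32)
        layer = nn.TransformerEncoderLayer(d_model=768, nhead=heads,
                                           dim_feedforward=1536, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(768, 1)                            # scalar age estimate

    def forward(self, x):
        f = self.backbone(x)                                     # local convolutional features
        tokens = f.flatten(2).transpose(1, 2)                    # (B, 49, 768) spatial tokens
        tokens = self.encoder(tokens)                            # global attention over tokens
        return self.head(tokens.mean(dim=1)).squeeze(-1)         # pooled regression output

age = HybridAgeRegressor()(torch.randn(2, 3, 224, 224))
print(age.shape)                                                 # torch.Size([2])
```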
22 pages, 5163 KB  
Article
How Blue–Green Integration Shapes Urban Emotional Behavior: Evidence from Facial Expressions in Social Media Photos
by Xiaolu Wu, Huihui Liu, Jing Wu and Ziyi Li
Land 2026, 15(4), 553; https://doi.org/10.3390/land15040553 - 27 Mar 2026
Abstract
Urban mental health is increasingly influenced by daily environmental exposures, yet limited empirical evidence exists regarding how the spatial configuration of blue–green environments, rather than their mere quantity, relates to emotional behavior in high-density cities. Guided by restoration theories and a perception-based perspective on landscape integration, this study analyzes the urban core of Shanghai by linking blue–green configurations to emotional states inferred from 20,907 geotagged social media facial photographs. Facial expressions serve to derive indices for emotional valence and arousal. The results demonstrate significant spatial clustering of emotional behavior, where hotspots are concentrated in higher-quality and more open settings, while coldspots cluster in dense areas with sparse vegetation. Emotional behavior also exhibits demographic heterogeneity, as females display higher valence and arousal than males. Furthermore, happiness tends to increase with age across both genders, whereas arousal declines specifically among male age groups. Crucially, emotional outcomes align more consistently with landscape integration and configuration than with isolated blue or green areas. Factors such as high connectivity, superior vegetation condition, and configurations featuring water embedded within green space are associated with favorable emotional responses. Conversely, extensive edge-dominated interfaces and high traffic exposure correlate with less favorable outcomes. These findings suggest a shift in blue–green planning from increasing total area toward optimizing spatial composition. Specifically, priority should be given to embedded and cohesive designs alongside the reduction of ambient stressors to foster emotionally supportive environments in dense urban cores. Methodologically, image-derived behavioral traces provide a scalable and ecologically grounded approach for investigating place-based affect at a city scale. Full article
20 pages, 17596 KB  
Article
Enhanced Facial Realism in Personalized Diffusion Models: A Memory-Optimized DreamBooth Implementation for Consumer Hardware
by Sandeep Gupta, Kanad Ray, Shamim Kaiser, Sazzad Hossain and Jocelyn Faubert
Algorithms 2026, 19(4), 257; https://doi.org/10.3390/a19040257 - 27 Mar 2026
Abstract
Despite significant progress in general-purpose diffusion-based models capable of producing high-quality media, this approach is still too difficult to implement on consumer/gamer hardware. We present here a memory-optimized DreamBooth framework designed for consumer-grade GPUs with 16 GB of VRAM, that allows for end-to-end image personalization and addresses some of the limitations of existing solutions. Our system reduces peak GPU memory from 22 GB (baseline DreamBooth) to 14.2 GB through novel hierarchical memory management, including attention slicing, Variational Autoencoder (VAE) tiling, gradient accumulation, and gradient checkpointing integrated within the Hugging Face Accelerate ecosystem. The framework further incorporates state-of-the-art techniques for preserving facial features and a comprehensive automated quality management system. The result is a complete end-to-end pipeline achieving a peak memory of 14.2 GB, with quantitative performance (LPIPS: 0.139, SSIM: 0.879, identity: 0.852, and FID: 23.1) competitive with methods requiring significantly more hardware resources. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
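The memory-saving toggles named in this abstract map onto public diffusers and accelerate APIs, as in the hedged sketch below. The checkpoint ID is a placeholder and the loss helper is hypothetical; this is not the authors' full DreamBooth pipeline.

```python
# Minimal sketch of the memory-saving toggles described in the abstract, using public
# diffusers / accelerate APIs. Checkpoint ID and step counts are placeholders.
import torch
from accelerate import Accelerator
from diffusers import StableDiffusionPipeline

MODEL_ID = "runwayml/stable-diffusion-v1-5"   # placeholder; any Stable Diffusion checkpoint

accelerator = Accelerator(gradient_accumulation_steps=4,   # trade steps for lower peak memory
                          mixed_precision="fp16")

pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)

pipe.enable_attention_slicing()             # compute attention in slices instead of one pass
pipe.enable_vae_tiling()                    # encode/decode the VAE in tiles
pipe.unet.enable_gradient_checkpointing()   # recompute activations during the backward pass

# Inside a DreamBooth-style loop, gradients would be accumulated like this:
# with accelerator.accumulate(pipe.unet):
#     loss = compute_dreambooth_loss(...)   # hypothetical loss helper, not a diffusers API
#     accelerator.backward(loss)
```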
16 pages, 348 KB  
Article
Challenges in Diagnosis and Management of Coffin–Lowry Syndrome—Single-Center Experience
by Ana Maria Chirilas, Alexandru Cărămizaru, Anca-Lelia Riza, Andreea Mitut-Veliscu, Andrei Costache, Rebecca-Cristiana Șerban, Aritina Morosanu, Carmen Niculescu, Alexandru-Cătălin Pâslaru, Florin Burada and Ioana Streata
Diagnostics 2026, 16(7), 990; https://doi.org/10.3390/diagnostics16070990 - 25 Mar 2026
Abstract
Background/Objectives: Coffin–Lowry syndrome (CLS) is a rare X-linked disease caused by pathogenic variants in the RPS6KA3 gene. It is generally characterized by syndromic intellectual disability and distinctive facial features, skeletal abnormalities, stimulus-induced drop attacks in males, and variable manifestations in females. Methods: We report clinical and genetic findings in a series of 10 cases, eight males and two females, evaluated at the Regional Centre of Medical Genetics Dolj—Emergency Clinical County Hospital Craiova. Results: Genetic testing identified 10 de novo variants in the RPS6KA3 gene consisting of six missense mutations, one nonsense variant, one frameshift, and two variants in non-coding or intronic regions. Case management requires multidisciplinary coordination and is limited to resources mostly available in reference centers. Conclusions: CLS highlights the importance of molecular diagnosis in rare genetic disorders, particularly when clinical features are subtle or atypical. These findings have practical implications for clinical management, suggesting the need for comprehensive genetic screening and individualized care approaches. Full article
18 pages, 1085 KB  
Article
Self-Learning Multimodal Emotion Recognition Based on Multi-Scale Dilated Attention
by Xiuli Du and Luyao Zhu
Brain Sci. 2026, 16(4), 350; https://doi.org/10.3390/brainsci16040350 - 25 Mar 2026
Abstract
Background/Objectives: Emotions can be recognized through external behavioral cues and internal physiological signals. Owing to the inherently complex psychological and physiological nature of emotions, models relying on a single modality often suffer from limited robustness. This study aims to improve emotion recognition performance by effectively integrating electroencephalogram (EEG) signals and facial expressions through a multimodal framework. Methods: We propose a multimodal emotion recognition model that employs a Multi-Scale Dilated Attention Convolution (MSDAC) network tailored for facial expression recognition, integrates an EEG emotion recognition method based on three-dimensional features, and adopts a self-learning decision-level fusion strategy. MSDAC incorporates Multi-Scale Dilated Convolutions and a Dual-Branch Attention (D-BA) module to capture discontinuous facial action units. For EEG processing, raw signals are converted into a multidimensional time–frequency–spatial representation to preserve temporal, spectral, and spatial information. To overcome the limitations of traditional stitching or fixed-weight fusion approaches, a self-learning weight fusion mechanism is introduced at the decision level to adaptively adjust modality contributions. Results: The facial analysis branch achieved average accuracies of 74.1% on FER2013, 99.69% on CK+, and 98.05% (valence)/96.15% (arousal) on DEAP. On the DEAP dataset, the complete multimodal model reached 98.66% accuracy for valence and 97.49% for arousal classification. Conclusions: The proposed framework enhances emotion recognition by improving facial feature extraction and enabling adaptive multimodal fusion, demonstrating the effectiveness of combining EEG and facial information for robust emotion analysis. Full article
(This article belongs to the Section Cognitive, Social and Affective Neuroscience)
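One plausible reading of the self-learning decision-level fusion is sketched below: per-modality logits are combined with weights that are themselves trainable parameters rather than fixed. The two-modality setup and dimensions are assumptions for illustration.

```python
# Minimal sketch of self-learning decision-level fusion with trainable modality weights.
# Illustrative only; not the authors' implementation.
import torch
import torch.nn as nn

class SelfLearningFusion(nn.Module):
    def __init__(self, n_modalities=2):
        super().__init__()
        # One trainable weight per modality; softmax keeps contributions positive and summing to 1.
        self.logit_weights = nn.Parameter(torch.zeros(n_modalities))

    def forward(self, face_logits, eeg_logits):
        w = torch.softmax(self.logit_weights, dim=0)
        return w[0] * face_logits + w[1] * eeg_logits    # weighted decision-level fusion

fusion = SelfLearningFusion()
out = fusion(torch.randn(8, 2), torch.randn(8, 2))       # dummy per-modality valence logits
print(out.shape)                                          # torch.Size([8, 2])
```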
11 pages, 2071 KB  
Article
Heimler Syndrome Caused by Novel PEX6 Variants: Clinical and Genetic Characterization in a Saudi Cohort
by Basamat AlMoallem
Genes 2026, 17(4), 360; https://doi.org/10.3390/genes17040360 - 24 Mar 2026
Abstract
Background: Heimler syndrome (HS) is a rare autosomal recessive disorder representing the mildest end of the peroxisome biogenesis disorder spectrum. It is caused by hypomorphic mutations in peroxisomal assembly genes, most commonly PEX1 and PEX6, and is characterized by sensorineural hearing loss, amelogenesis imperfecta, and retinal dystrophy. Due to phenotypic overlap with other inherited sensory disorders, particularly Usher syndrome, diagnosis of this condition is frequently delayed. Methods: We investigated two unrelated Saudi families presenting with congenital hearing loss and retinal dystrophy who were initially diagnosed with Usher syndrome. Detailed clinical evaluation, including comprehensive ophthalmologic and audiologic assessments, was performed. Whole-exome sequencing (WES) was conducted to identify the underlying genetic cause, followed by variant filtering and in silico pathogenicity prediction. Results: We identified a novel homozygous missense variant, p.Val97Gly (V97G), in the PEX6 gene that co-segregated with the disease phenotype in both families. This variant was absent from major population databases, including dbSNP, the 1000 Genomes Project, ExAC, and gnomAD, and was predicted to be deleterious by multiple in silico prediction tools. Clinically, affected individuals presented with congenital sensorineural hearing loss, pigmentary retinal dystrophy with electrophysiological evidence of cone–rod dysfunction, enamel abnormalities consistent with amelogenesis imperfecta, and mild dysmorphic facial features, supporting a diagnosis within the Heimler syndrome spectrum. Conclusions: Our findings expand the mutational spectrum of PEX6 and highlight Heimler syndrome as an important differential diagnosis in patients presenting with Usher-like phenotypes. To the best of our knowledge, this study represents the first report of the PEX6 p.Val97Gly variant associated with Heimler syndrome in a Saudi population, underscoring the value of whole-exome sequencing for accurate diagnosis and genetic counseling in individuals with inherited sensory disorders. Full article
(This article belongs to the Special Issue The Genetic Lens: A New Era in Ophthalmology)
24 pages, 5930 KB  
Article
Style-Abstraction-Based Data Augmentation for Robust Affective Computing
by Xu Qiu, Taewan Kim and Bongjae Kim
Appl. Sci. 2026, 16(6), 3109; https://doi.org/10.3390/app16063109 - 23 Mar 2026
Abstract
Personality recognition and emotion recognition, two core tasks within affective computing, are fundamentally constrained by data scarcity, as collecting and annotating human behavioral data is expensive and restricted by privacy concerns. Under these limited data conditions, existing models tend to rely on superficial shortcut features such as background appearance, lighting conditions, or color variations, rather than behavior-relevant cues including facial expressions, posture, and motion dynamics. To address this issue, we propose Style-Abstraction-based Data Augmentation, a style transfer-based augmentation strategy that reduces dependency on low-level appearance information while preserving high-level semantic cues. Specifically, we employ cartoonization to generate stylized variants of training videos that retain expressive characteristics but remove stylistic bias. We validate our approach on three diverse personality benchmarks (First Impression v2, UDIVA v0.5, and KETI) and an emotion benchmark (Emotion Dataset) using state-of-the-art models including ViViT (Video Vision Transformer), TimeSformer, and VST (Video Swin Transformer). Our experiments indicate that increasing the proportion of style-abstracted data in the training set can improve performance on the evaluated datasets. Notably, our method yields consistent gains across all benchmarks: a 0.0893 reduction in MSE on UDIVA v0.5 (with VST), a 0.0023 improvement in 1-MAE on KETI (with TimeSformer), and a 0.0051 improvement on First Impression v2 (with TimeSformer). Furthermore, extending style-abstraction-based data augmentation to a four-class categorical emotion recognition task demonstrates similar performance gains, achieving up to a 3.44% accuracy increase with the TimeSformer backbone. These findings verify that our style-abstraction-based data augmentation facilitates learning of behavior-relevant features by reducing reliance on superficial shortcuts. Overall, cartoonization-based style abstraction for data augmentation functions as both an effective augmentation strategy and a regularization mechanism, encouraging the model to learn more stable and generalizable representations for affective computing applications. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Digital Image Processing)
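A minimal sketch of the augmentation strategy, under the assumption of a frame-level stylization function, is shown below: a dataset wrapper serves a chosen proportion of samples in cartoonized form while keeping labels unchanged. The `cartoonize` helper is a hypothetical stand-in for any style-abstraction model.

```python
# Minimal sketch of mixing style-abstracted (cartoonized) samples into training at a chosen
# proportion. `cartoonize` is a hypothetical stand-in for a stylization model.
import random
from torch.utils.data import Dataset

def cartoonize(frames):
    """Placeholder for a cartoonization / style-abstraction model applied to video frames."""
    return frames  # identity stand-in; a real model would abstract low-level appearance detail

class StyleAbstractedDataset(Dataset):
    def __init__(self, base_dataset, stylized_ratio=0.5):
        self.base = base_dataset
        self.ratio = stylized_ratio          # fraction of samples served in stylized form

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        frames, label = self.base[idx]
        if random.random() < self.ratio:
            frames = cartoonize(frames)      # same label, appearance-abstracted input
        return frames, label
```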
16 pages, 475 KB  
Article
Skeletal Characteristics and Clinical Treatment Patterns in Orthognathic Surgery: A Virtual Surgical Planning-Based Study
by Merve Berika Kadıoğlu, Mehmet Emre Yurttutan, Mehmet Alp Eriş, Meyra Durmaz and Ömer Faruk Kocamaz
Healthcare 2026, 14(6), 809; https://doi.org/10.3390/healthcare14060809 - 22 Mar 2026
Abstract
Background/Objectives: Virtual surgical planning (VSP) allows three-dimensional assessment of complex dentofacial deformities and has become integral to modern orthognathic surgery. However, evidence remains limited regarding how skeletal characteristics and malocclusion patterns translate into surgical movement selection. This study aimed to evaluate demographic features, skeletal malocclusion patterns, and clinical treatment strategies in patients undergoing VSP-guided orthognathic surgery. Methods: This retrospective study included 158 patients who underwent VSP-assisted orthognathic surgery between 2019 and 2025. Sagittal skeletal classification, vertical growth pattern, facial asymmetry, and maxillary crossbite were evaluated together with planned maxillary and mandibular movements. Surgical procedures were analyzed according to skeletal malocclusion classes (Class I, II, and III). Group comparisons were performed using chi-square and Kruskal–Wallis tests. Multivariable logistic regression analysis was conducted to assess factors associated with bimaxillary surgery (p < 0.05). Results: Skeletal Class I malocclusion was most prevalent (46.8%), followed by Class III (29.7%) and Class II (23.4%). Hyperdivergent growth patterns were predominantly observed in Class II patients, whereas normodivergent patterns were most common in Class III cases (p < 0.05). Mandibular advancement and setback generally followed expected class-based trends but were also observed across non-corresponding skeletal classes. Maxillary impaction and mandibular autorotation were frequently incorporated. Bimaxillary surgery was performed in 84.2% of cases. Logistic regression analysis showed no independent predictors of bimaxillary surgery (p > 0.05). Conclusions: VSP-assisted orthognathic surgery demonstrates that surgical planning cannot be reduced to sagittal skeletal classification alone. Treatment decisions are shaped by combined sagittal, vertical, transverse, and patient-specific factors, supporting a multidimensional and individualized planning approach. Full article
(This article belongs to the Special Issue Oral and Maxillofacial Health Care: Third Edition)
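The reported analysis steps correspond to standard routines in scipy and statsmodels, as the toy sketch below shows for chi-square association, Kruskal-Wallis comparison, and multivariable logistic regression. Column names and the synthetic patient table are purely illustrative.

```python
# Minimal sketch of the reported analysis steps with standard libraries; toy data and
# hypothetical column names, not the study's dataset.
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2_contingency, kruskal

df = pd.DataFrame({                       # toy stand-in for the patient table
    "skeletal_class": ["I", "II", "III", "I", "III", "II"] * 10,
    "growth_pattern": ["hyper", "normo", "hypo", "normo", "normo", "hyper"] * 10,
    "age":            [22, 25, 19, 31, 27, 24] * 10,
    "bimaxillary":    [1, 1, 0, 0, 1, 0] * 10,
})

# Chi-square: skeletal class vs. growth pattern
chi2, p_chi, _, _ = chi2_contingency(pd.crosstab(df.skeletal_class, df.growth_pattern))

# Kruskal-Wallis: age across skeletal classes
groups = [g.age.values for _, g in df.groupby("skeletal_class")]
h_stat, p_kw = kruskal(*groups)

# Multivariable logistic regression for bimaxillary surgery
X = sm.add_constant(pd.get_dummies(df[["skeletal_class"]], drop_first=True).astype(float))
model = sm.Logit(df.bimaxillary, X).fit(disp=0)
print(round(p_chi, 3), round(p_kw, 3), model.pvalues.round(3).to_dict())
```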
19 pages, 13660 KB  
Article
CA-GFNet: A Cross-Modal Adaptive Gated Fusion Network for Facial Emotion Recognition
by Sitara Afzal and Jong-Ha Lee
Mathematics 2026, 14(6), 1068; https://doi.org/10.3390/math14061068 - 21 Mar 2026
Abstract
Facial emotion recognition (FER) plays an important role in healthcare, human–computer interaction, and intelligent security systems. However, despite recent advances, many state-of-the-art FER methods depend on computationally intensive CNN or transformer backbones and large-scale annotated datasets while suffering noticeable performance degradation under cross-dataset evaluation because of domain shift. These limitations hinder practical usage in resource-constrained and real-world environments. To address this issue, we propose Cross-Adaptive Gated Fusion Network (CA-GFNet), a lightweight dual-stream FER framework that explicitly combines shallow structural features with deep semantic representations. The proposed architecture integrates domain-robust gradient-based descriptors with compact deep features extracted from a VGG-based backbone. After face detection and normalization, the structural stream captures fine-grained local appearance cues, whereas the semantic stream encodes high-level facial configurations. The two feature streams are projected into a shared latent space and adaptively fused using a gated fusion mechanism that learns sample-specific weights, allowing the model to prioritize the more reliable feature source under dataset shift. Extensive experiments on KDEF along with zero-shot cross-dataset evaluation on CK+ using a strict train-on-KDEF/test-on-CK+ protocol with subject-independent splits demonstrate the effectiveness of the proposed method. CA-GFNet achieves 99.30% accuracy on KDEF and 98.98% on CK+ while requiring significantly fewer parameters than conventional deep FER models. These results confirm that adaptive gated fusion of shallow and deep features can deliver both high recognition accuracy and strong cross-dataset robustness. Full article
(This article belongs to the Special Issue Advanced Algorithms in Multimodal Affective Computing)
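A minimal sketch of sample-adaptive gated fusion in the spirit of this abstract is given below: a small gate network predicts per-sample weights for the structural and semantic streams before classification. The gate design and feature dimensions are assumptions, not the CA-GFNet implementation.

```python
# Minimal sketch of adaptive gated fusion of two feature streams; illustrative dimensions.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim=128, n_classes=7):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, structural, semantic):
        # structural, semantic: (B, dim) features already projected to a shared latent space
        w = self.gate(torch.cat([structural, semantic], dim=-1))   # (B, 2) per-sample weights
        fused = w[:, :1] * structural + w[:, 1:] * semantic        # reliability-weighted mix
        return self.classifier(fused)

logits = GatedFusion()(torch.randn(4, 128), torch.randn(4, 128))
print(logits.shape)                                                # torch.Size([4, 7])
```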
19 pages, 34223 KB  
Article
A Real Time Multi Modal Computer Vision Framework for Automated Autism Spectrum Disorder Screening
by Lehel Dénes-Fazakas, Ioan Catalin Mateas, Alexandru George Berciu, László Szilágyi, Levente Kovács and Eva-H. Dulf
Electronics 2026, 15(6), 1287; https://doi.org/10.3390/electronics15061287 - 19 Mar 2026
Abstract
Background: The early detection of autism spectrum disorder (ASD) is imperative for enhancing long-term developmental outcomes. Nevertheless, conventional screening methods depend on time-consuming, expert-driven behavioral assessments and are characterized by limited scalability. Automated video-based analysis provides a noninvasive and objective approach for the extraction of behavioral biomarkers from naturalistic recordings. Methods: A modular multimodal framework was developed that integrates motion-based video analysis and facial feature extraction for the purpose of ASD versus typically developing (TD) classification. The system is capable of processing RGB videos, skeleton/stickman representations, and motion trajectory streams. A comprehensive set of kinematic features was extracted, encompassing joint trajectories, velocity and acceleration profiles, posture variability, movement smoothness, and bilateral asymmetry. The repetitive stereotypical behaviors exhibited by the subjects were characterized using frequency-domain analysis via FFT within the 0.3–7.0 Hz band. Facial expression features derived from normalized face crops and landmark-based morphological descriptors were integrated as complementary modalities. The feature-level fusion process was executed subsequent to z-score normalization, and the classification procedure was conducted using a Random Forest model with stratified 5-fold cross validation. The implementation of GPU acceleration was instrumental in facilitating near real-time inference. Results: The motion-based ComplexVideos pipeline demonstrated a cross-validated accuracy of 94.2 ± 2.1% with an area under the ROC curve (AUC) of 0.93. Skeleton-based KinectStickman inputs demonstrated moderate performance, with an accuracy range of 60–80%. In contrast, facial-only models exhibited an accuracy of approximately 60%. The integration of multiple modalities through feature fusion has been demonstrated to enhance the robustness of classification algorithms and mitigate the occurrence of false negative outcomes, thereby surpassing the performance of single-modality models. The mean inference time remained below one second per video frame under standard operating conditions. Conclusions: The experimental results demonstrate that the integration of multimodal cues, including motion and facial features, facilitates the development of effective and efficient video-based screening methods for autism spectrum disorder (ASD). The proposed framework is designed to offer a scalable, extensible, and computationally efficient solution that can support early screening in clinical and remote assessment settings. Full article
(This article belongs to the Special Issue Computer Vision and Machine Learning for Biometric Systems)
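The classification stage described here can be sketched with standard scikit-learn components: z-score normalization of fused motion and facial features followed by a Random Forest under stratified 5-fold cross-validation. The synthetic feature matrix below stands in for the extracted descriptors.

```python
# Minimal sketch of the reported classification stage; synthetic features and toy labels only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
motion_feats = rng.normal(size=(120, 40))        # e.g. velocity, smoothness, FFT-band energy
facial_feats = rng.normal(size=(120, 20))        # e.g. landmark-based morphological descriptors
X = np.hstack([motion_feats, facial_feats])      # feature-level fusion
y = rng.integers(0, 2, size=120)                 # 0 = TD, 1 = ASD (toy labels)

clf = make_pipeline(StandardScaler(),            # z-score normalization
                    RandomForestClassifier(n_estimators=300, random_state=0))
scores = cross_val_score(clf, X, y,
                         cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
print(scores.mean(), scores.std())
```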
22 pages, 7355 KB  
Article
IAE-Net: Incremental Learning-Based Attention-Enhanced DenseNet for Robust Facial Emotion Recognition
by Haseeb Ali Khan and Jong-Ha Lee
Mathematics 2026, 14(6), 1023; https://doi.org/10.3390/math14061023 - 18 Mar 2026
Abstract
Facial emotion recognition (FER) is an important component of human–computer interaction and healthcare-oriented affective computing. However, reliable deployment remains difficult in unconstrained settings due to appearance and geometric variability (e.g., pose, illumination, and occlusion), demographic imbalance, and dataset bias. In practice, two additional constraints frequently limit real-world FER systems: the computational overhead of heavy architectures and limited adaptability when data evolve over time, where sequential updates can cause catastrophic forgetting. To address these challenges, we propose the Incremental Attention-Enhanced Network (IAE-Net), a compact single-branch framework built on a DenseNet121 backbone and a cascaded refinement pipeline. The model incorporates Channel Attention (CA) to emphasize expression-relevant feature channels and suppress less informative responses, followed by a deformable attention module (DA) that reduces feature misalignment caused by non-rigid facial motion and pose shifts, thereby improving robustness under geometric variability. For continual deployment, IAE-Net supports class-incremental updates via weight transfer, exemplar replay, and knowledge distillation to improve retention during sequential learning. We evaluate IAE-Net on four widely used benchmarks, FER2013, FERPlus, KDEF, and AffectNet, covering both controlled and in-the-wild conditions under a unified training protocol. The proposed approach achieves accuracies of 79.15%, 92.03%, 99.48%, and 74.20% on FER2013, FERPlus, KDEF, and AffectNet, respectively, with balanced precision, recall, and F1-score trends. These results indicate that IAE-Net provides an efficient and extensible FER framework with potential utility in dynamic real-world and longitudinal healthcare-oriented applications. Full article
(This article belongs to the Special Issue Recent Advances and Applications of Artificial Neural Networks)
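As one common realization of channel attention, the squeeze-and-excitation-style block below reweights backbone channels by their global response; the exact CA design in IAE-Net may differ, and the feature-map shape assumes a DenseNet121 backbone.

```python
# Minimal sketch of a squeeze-and-excitation-style channel attention block; the CA module in
# IAE-Net may be designed differently.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        # x: (B, C, H, W) feature map from the backbone
        w = self.fc(x.mean(dim=(2, 3)))               # squeeze: global average pool per channel
        return x * w[:, :, None, None]                # excite: reweight expression-relevant channels

feat = torch.randn(2, 1024, 7, 7)                     # e.g. DenseNet121 final feature map
print(ChannelAttention(1024)(feat).shape)             # torch.Size([2, 1024, 7, 7])
```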
23 pages, 2010 KB  
Article
Visibility-Prior Guided Dual-Stream Mixture-of-Experts for Robust Facial Expression Recognition Under Complex Occlusions
by Siyuan Ma, Long Liu, Mingzhi Cheng, Peijun Qin, Zixuan Han, Cui Chen, Shizhao Yang and Hongjuan Wang
Electronics 2026, 15(6), 1230; https://doi.org/10.3390/electronics15061230 - 16 Mar 2026
Abstract
Facial occlusion induces sample-wise reliability shifts in facial expression recognition (FER), where the usefulness of global context and local discriminative cues varies dramatically with the amount of visible facial information. Existing occlusion-robust FER studies often evaluate under limited or homogeneous occlusion settings and commonly adopt static fusion strategies, which are insufficient for complex and heterogeneous real-world occlusions. In this work, we establish a rigorous occlusion robustness evaluation protocol by constructing a fixed offline test benchmark with diverse synthetic occlusion patterns (e.g., masks, sunglasses, texture blocks, and mixed occlusions) on top of public FER test splits. We further propose a Dual-Stream Adaptive Weighting Mixture-of-Experts framework (DS-AW-MoE) that fuses a global contextual expert and a local discriminative expert via an occlusion-aware weighting network. Crucially, we introduce a facial visibility assessment as a task-agnostic prior to explicitly regulate expert contributions, enabling dynamic re-allocation of model capacity according to input-dependent feature reliability. Extensive experiments on public datasets and the constructed occlusion benchmark demonstrate that DS-AW-MoE achieves more stable recognition under complex occlusions, characterized by a smaller and more consistent performance drop. To support reproducibility under dataset license constraints, we will release an anonymous, fully runnable repository containing the complete occlusion synthesis pipeline, evaluation protocol, and configuration files, allowing researchers to reproduce the benchmark after obtaining the original datasets. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 3rd Edition)
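The benchmark construction mentioned here can be illustrated with a small sketch that pastes a reproducible random-texture block over an image region, one of the occlusion patterns the paper lists. Region coordinates and sizes are illustrative only.

```python
# Minimal sketch of one synthetic occlusion pattern for a fixed test benchmark: a texture block
# pasted at a reproducible location. Region choice and size are illustrative assumptions.
import numpy as np

def add_texture_block(img: np.ndarray, top: int, left: int, size: int, seed: int = 0) -> np.ndarray:
    """Overwrite a size x size patch with random texture, seeded for reproducibility."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    block = rng.integers(0, 256, size=(size, size, img.shape[2]), dtype=np.uint8)
    out[top:top + size, left:left + size] = block
    return out

face = np.zeros((112, 112, 3), dtype=np.uint8)                 # dummy aligned face crop
occluded = add_texture_block(face, top=30, left=28, size=40)   # roughly the eye region
```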