Search Results (1,483)

Search Parameters:
Keywords = facial feature

21 pages, 4182 KB  
Article
Gender-Aware Driver Drowsiness Detection Using Multi-Stream Shifted-Window-Based Hierarchical Vision Transformers
by M. Faisal Nurnoby and El-Sayed M. El-Alfy
Appl. Sci. 2026, 16(7), 3353; https://doi.org/10.3390/app16073353 - 30 Mar 2026
Abstract
Given the substantial contribution of driver fatigue to traffic accidents, detecting and mitigating it has become one of the main goals of intelligent driver-assistance systems for enhancing driving safety and comfort. Among various approaches, vision-based facial analysis using deep learning has emerged as an effective and non-intrusive method for identifying driver drowsiness, a key manifestation of fatigue. However, current drowsiness detection models do not account for demographic factors such as gender, even though recent research has shown gender-related behavioral differences in eye closure duration, blink frequency, yawning patterns, and facial muscle relaxation. In this paper, we present a fine-grained multi-stream transformer architecture that incorporates gender awareness and shifted-window attention for spatial feature fusion. Integrating a gender embedding that modulates the region-based features allows the model to effectively learn gender-conditioned drowsiness features, minimizing bias and diluted representations. Using the NTHU-DDD dataset, we evaluated gender-aware and gender-agnostic two-stream and three-stream variants across three facial region contexts: the face region with a 20% margin, the bare face region, and key facial regions (face, eyes, and mouth). A comprehensive ablation study was conducted to identify the most effective model setup. The results demonstrate that incorporating the gender embedding improves detection performance, achieving an accuracy of 95.47% on the evaluation set. Moreover, the proposed three-stream model (SWT-DD-3S) produced better results. Full article
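As a rough illustration of the gender-conditioned modulation described in this abstract, the sketch below applies a FiLM-style scale and shift, derived from a learned gender embedding, to per-region features before classification. All module names, dimensions, and the two-class head are assumptions for illustration; this is not the authors' SWT-DD implementation.

```python
# Minimal sketch: gender-conditioned modulation of per-region features (FiLM-style).
# Illustrative only; names and dimensions are assumptions, not the authors' SWT-DD code.
import torch
import torch.nn as nn

class GenderConditionedFusion(nn.Module):
    def __init__(self, feat_dim=256, n_regions=3, n_classes=2):
        super().__init__()
        self.gender_emb = nn.Embedding(2, 32)           # 0 / 1 gender coding (assumed)
        self.to_scale = nn.Linear(32, feat_dim)         # per-channel scale from gender embedding
        self.to_shift = nn.Linear(32, feat_dim)         # per-channel shift from gender embedding
        self.head = nn.Linear(feat_dim * n_regions, n_classes)  # drowsy / alert

    def forward(self, region_feats, gender):
        # region_feats: (B, n_regions, feat_dim), e.g. face / eyes / mouth streams
        g = self.gender_emb(gender)                     # (B, 32)
        scale = self.to_scale(g).unsqueeze(1)           # (B, 1, feat_dim)
        shift = self.to_shift(g).unsqueeze(1)
        modulated = region_feats * (1 + scale) + shift  # gender-conditioned features
        return self.head(modulated.flatten(1))

feats = torch.randn(4, 3, 256)                          # dummy features from three streams
gender = torch.tensor([0, 1, 1, 0])
print(GenderConditionedFusion()(feats, gender).shape)   # torch.Size([4, 2])
```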
21 pages, 18952 KB  
Article
Evaluating AI-Based Image Inpainting Techniques for Facial Components Restoration Using Semantic Masks
by Hussein Sharadga, Abdullah Hayajneh and Erchin Serpedin
AI 2026, 7(4), 119; https://doi.org/10.3390/ai7040119 - 30 Mar 2026
Abstract
This paper presents a comparative analysis of advanced AI-based techniques for human face inpainting using semantic masks that fully occlude targeted facial components. The primary objective is to evaluate the ability of image inpainting methods to accurately restore semantically meaningful facial features. Our results show that existing inpainting models face significant challenges when semantic masks completely obscure the underlying facial structures. In contrast to random masks, which leave partial visual cues, semantic masks remove all structural information, making reconstruction substantially more difficult. We assess the performance of generative adversarial networks (GANs), transformer-based models, and diffusion models in restoring fully occluded facial components. To address these challenges, we explore three retraining strategies: using semantic masks, using random masks, and a hybrid approach combining both. While the hybrid strategy leverages the complementary strengths of each mask type and improves contextual understanding, fully accurate reconstruction remains challenging. These findings demonstrate that inpainting under fully occluding semantic masks is a critical yet underexplored area, offering opportunities for developing new AI architectures and strategies for advanced facial reconstruction. Full article
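The occlusion setting described here can be pictured with a short sketch: a binary inpainting mask is derived from a face-parsing label map so that a chosen component is fully covered, leaving the model no structural cues. The label indices below are hypothetical and not tied to any particular parser.

```python
# Minimal sketch: building a fully occluding semantic mask for one facial component
# from a face-parsing label map. Label indices are hypothetical.
import numpy as np

EYE_LABELS = {4, 5}          # assumed parser labels for the left/right eye regions

def semantic_mask(parsing_map: np.ndarray, labels=EYE_LABELS) -> np.ndarray:
    """Return a binary mask (1 = pixel to inpaint) covering the chosen component entirely."""
    return np.isin(parsing_map, list(labels)).astype(np.uint8)

def apply_mask(image: np.ndarray, mask: np.ndarray, fill=0) -> np.ndarray:
    """Blank out the masked region so the inpainting model sees no structural cues there."""
    out = image.copy()
    out[mask.astype(bool)] = fill
    return out

parsing = np.random.randint(0, 10, size=(256, 256))                  # dummy parsing map
img = np.random.randint(0, 255, size=(256, 256, 3), dtype=np.uint8)  # dummy face image
masked = apply_mask(img, semantic_mask(parsing))
```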
16 pages, 3976 KB  
Article
Spiking Feature-Driven Event Simulation with Movement-Aware Polarity Integration
by Jiwoong Oh, Byeongjun Kang, Hyungsik Shin and Dongwoo Kang
Electronics 2026, 15(7), 1420; https://doi.org/10.3390/electronics15071420 - 29 Mar 2026
Abstract
Event-based face detection has attracted significant interest due to the unique advantages of event cameras, including high temporal resolution, high dynamic range, and low power consumption. However, the lack of annotated public datasets remains a major challenge for training effective event-based face detection models. In this paper, we propose a spiking feature-driven synthetic event generation framework that utilizes a spiking neural network (SNN) in conjunction with a pretrained convolutional backbone to generate synthetic event representations from a single RGB image. To incorporate motion-induced ON/OFF polarity information, we introduce a movement-aware polarity integration (MPI) module that assumes four directional facial movements. An event-similarity score is further employed to select representations most consistent with real event data for training. Unlike conventional approaches relying on video-based simulators, our method enables efficient synthetic event dataset construction without requiring video inputs or additional simulation training. Experimental results on the N-Caltech101 dataset demonstrate a face detection accuracy of 99.91%, outperforming existing event-based face detection methods. Full article
(This article belongs to the Special Issue Edge-Intelligent Sustainable Cyber-Physical Systems)
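To make the movement-aware polarity idea concrete, the sketch below shifts a single grayscale frame in four assumed directions and thresholds the intensity change into ON/OFF event maps. It illustrates the concept only and is not the authors' SNN-based MPI module.

```python
# Minimal sketch of motion-induced ON/OFF polarity from a single image: shift the frame in four
# assumed directions and threshold the brightness change. Conceptual illustration only.
import numpy as np

def directional_events(img: np.ndarray, shift=2, thresh=10):
    """Return a dict of (ON, OFF) binary event maps for four assumed facial movements."""
    shifts = {"up": (-shift, 0), "down": (shift, 0), "left": (0, -shift), "right": (0, shift)}
    events = {}
    for name, (dy, dx) in shifts.items():
        moved = np.roll(img.astype(np.int16), (dy, dx), axis=(0, 1))
        diff = moved - img.astype(np.int16)                   # brightness change induced by motion
        events[name] = ((diff > thresh).astype(np.uint8),     # ON polarity
                        (diff < -thresh).astype(np.uint8))    # OFF polarity
    return events

frame = np.random.randint(0, 255, size=(128, 128), dtype=np.uint8)
ev = directional_events(frame)
print({k: (on.sum(), off.sum()) for k, (on, off) in ev.items()})
```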
24 pages, 1254 KB  
Article
ConvNeXt Meets Vision Transformers: A Powerful Hybrid Framework for Facial Age Estimation
by Gaby Maroun, Salah Eddine Bekhouche and Fadi Dornaika
Appl. Sci. 2026, 16(7), 3281; https://doi.org/10.3390/app16073281 - 28 Mar 2026
Abstract
Age estimation based on facial images is a challenging task due to the complex and nonlinear nature of facial aging, which is influenced by both genetic and environmental factors. To address this challenge, we propose a hybrid ConvNeXt–Transformer framework that combines convolutional local feature extraction with attention-based global contextual modeling within a unified age regression pipeline. The methodological contribution of this work lies in the sequential integration of these two complementary paradigms for facial age estimation, allowing the model to capture both fine-grained textural cues—such as wrinkles and skin spots—and long-range spatial dependencies. We evaluate the proposed framework on benchmark datasets including MORPH II, CACD, UTKFace, and AFAD. The results show competitive performance across these datasets and confirm the effectiveness of the proposed hybrid design through extensive ablation analyses. Experimental results demonstrate that our approach achieves state-of-the-art MAE on MORPH II (2.26), CACD (4.35), and AFAD (3.09) under the adopted benchmark settings while remaining competitive on UTKFace. To address computational efficiency, we employ ImageNet pre-trained backbones and explore different architectural configurations, including fusion strategies and varying depths of the Transformer module, as well as regularization techniques such as stochastic depth and label smoothing. Ablation studies confirm the contribution of each component, particularly the role of attention mechanisms, in enhancing the model’s sensitivity to age-relevant features. Overall, the proposed hybrid framework provides a robust and accurate solution for facial age estimation, effectively balancing performance and computational cost. Full article
(This article belongs to the Special Issue Applications of Data Science and Artificial Intelligence, 2nd Edition)
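A minimal sketch of the sequential CNN-to-Transformer design the abstract describes is given below: ConvNeXt feature maps are flattened into tokens and refined by a Transformer encoder before a regression head. The backbone variant, encoder depth, and head are assumptions, not the authors' exact configuration.

```python
# Minimal sketch of a hybrid ConvNeXt-to-Transformer age regressor; illustrative hyperparameters.
import torch
import torch.nn as nn
from torchvision.models import convnext_tiny

class HybridAgeRegressor(nn.Module):
    def __init__(self, depth=2, heads=8):
        super().__init__()
        self.backbone = convnext_tiny(weights=None).features    # (B, 768, H/32, W/32)
        layer = nn.TransformerEncoderLayer(d_model=768, nhead=heads,
                                           dim_feedforward=1536, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(768, 1)                            # scalar age estimate

    def forward(self, x):
        f = self.backbone(x)                                     # local convolutional features
        tokens = f.flatten(2).transpose(1, 2)                    # (B, 49, 768) spatial tokens
        tokens = self.encoder(tokens)                            # global attention over tokens
        return self.head(tokens.mean(dim=1)).squeeze(-1)         # pooled regression output

age = HybridAgeRegressor()(torch.randn(2, 3, 224, 224))
print(age.shape)                                                 # torch.Size([2])
```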
22 pages, 5163 KB  
Article
How Blue–Green Integration Shapes Urban Emotional Behavior: Evidence from Facial Expressions in Social Media Photos
by Xiaolu Wu, Huihui Liu, Jing Wu and Ziyi Li
Land 2026, 15(4), 553; https://doi.org/10.3390/land15040553 - 27 Mar 2026
Abstract
Urban mental health is increasingly influenced by daily environmental exposures, yet limited empirical evidence exists regarding how the spatial configuration of blue–green environments, rather than their mere quantity, relates to emotional behavior in high-density cities. Guided by restoration theories and a perception-based perspective on landscape integration, this study analyzes the urban core of Shanghai by linking blue–green configurations to emotional states inferred from 20,907 geotagged social media facial photographs. Facial expressions serve to derive indices for emotional valence and arousal. The results demonstrate significant spatial clustering of emotional behavior, where hotspots are concentrated in higher-quality and more open settings, while coldspots cluster in dense areas with sparse vegetation. Emotional behavior also exhibits demographic heterogeneity, as females display higher valence and arousal than males. Furthermore, happiness tends to increase with age across both genders, whereas arousal declines specifically among male age groups. Crucially, emotional outcomes align more consistently with landscape integration and configuration than with isolated blue or green areas. Factors such as high connectivity, superior vegetation condition, and configurations featuring water embedded within green space are associated with favorable emotional responses. Conversely, extensive edge-dominated interfaces and high traffic exposure correlate with less favorable outcomes. These findings suggest a shift in blue–green planning from increasing total area toward optimizing spatial composition. Specifically, priority should be given to embedded and cohesive designs alongside the reduction of ambient stressors to foster emotionally supportive environments in dense urban cores. Methodologically, image-derived behavioral traces provide a scalable and ecologically grounded approach for investigating place-based affect at a city scale. Full article
20 pages, 17596 KB  
Article
Enhanced Facial Realism in Personalized Diffusion Models: A Memory-Optimized DreamBooth Implementation for Consumer Hardware
by Sandeep Gupta, Kanad Ray, Shamim Kaiser, Sazzad Hossain and Jocelyn Faubert
Algorithms 2026, 19(4), 257; https://doi.org/10.3390/a19040257 - 27 Mar 2026
Abstract
Despite significant progress in general-purpose diffusion-based models capable of producing high-quality media, this approach is still too difficult to implement on consumer/gamer hardware. We present here a memory-optimized DreamBooth framework designed for consumer-grade GPUs with 16 GB of VRAM, that allows for end-to-end image personalization and addresses some of the limitations of existing solutions. Our system reduces peak GPU memory from 22 GB (baseline DreamBooth) to 14.2 GB through novel hierarchical memory management, including attention slicing, Variational Autoencoder (VAE) tiling, gradient accumulation, and gradient checkpointing integrated within the Hugging Face Accelerate ecosystem. The framework further incorporates state-of-the-art techniques for preserving facial features and a comprehensive automated quality management system. The result is a complete end-to-end pipeline achieving a peak memory of 14.2 GB, with quantitative performance (LPIPS: 0.139, SSIM: 0.879, identity: 0.852, and FID: 23.1) competitive with methods requiring significantly more hardware resources. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
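The memory-saving toggles named in this abstract map onto public diffusers and accelerate APIs, as in the hedged sketch below. The checkpoint ID is a placeholder and the loss helper is hypothetical; this is not the authors' full DreamBooth pipeline.

```python
# Minimal sketch of the memory-saving toggles described in the abstract, using public
# diffusers / accelerate APIs. Checkpoint ID and step counts are placeholders.
import torch
from accelerate import Accelerator
from diffusers import StableDiffusionPipeline

MODEL_ID = "runwayml/stable-diffusion-v1-5"   # placeholder; any Stable Diffusion checkpoint

accelerator = Accelerator(gradient_accumulation_steps=4,   # trade steps for lower peak memory
                          mixed_precision="fp16")

pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)

pipe.enable_attention_slicing()             # compute attention in slices instead of one pass
pipe.enable_vae_tiling()                    # encode/decode the VAE in tiles
pipe.unet.enable_gradient_checkpointing()   # recompute activations during the backward pass

# Inside a DreamBooth-style loop, gradients would be accumulated like this:
# with accelerator.accumulate(pipe.unet):
#     loss = compute_dreambooth_loss(...)   # hypothetical loss helper, not a diffusers API
#     accelerator.backward(loss)
```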
16 pages, 348 KB  
Article
Challenges in Diagnosis and Management of Coffin–Lowry Syndrome—Single-Center Experience
by Ana Maria Chirilas, Alexandru Cărămizaru, Anca-Lelia Riza, Andreea Mitut-Veliscu, Andrei Costache, Rebecca-Cristiana Șerban, Aritina Morosanu, Carmen Niculescu, Alexandru-Cătălin Pâslaru, Florin Burada and Ioana Streata
Diagnostics 2026, 16(7), 990; https://doi.org/10.3390/diagnostics16070990 - 25 Mar 2026
Abstract
Background/Objectives: Coffin–Lowry syndrome (CLS) is a rare X-linked disease caused by pathogenic variants in the RPS6KA3 gene. It is generally characterized by syndromic intellectual disability and distinctive facial features, skeletal abnormalities, stimulus-induced drop attacks in males, and variable manifestations in females. Methods: We report clinical and genetic findings in a series of 10 cases, eight males and two females, evaluated at the Regional Centre of Medical Genetics Dolj—Emergency Clinical County Hospital Craiova. Results: Genetic testing identified 10 de novo variants in the RPS6KA3 gene consisting of six missense mutations, one nonsense variant, one frameshift, and two variants in non-coding or intronic regions. Case management requires multidisciplinary coordination and is limited to resources mostly available in reference centers. Conclusions: CLS highlights the importance of molecular diagnosis in rare genetic disorders, particularly when clinical features are subtle or atypical. These findings have practical implications for clinical management, suggesting the need for comprehensive genetic screening and individualized care approaches. Full article
18 pages, 1085 KB  
Article
Self-Learning Multimodal Emotion Recognition Based on Multi-Scale Dilated Attention
by Xiuli Du and Luyao Zhu
Brain Sci. 2026, 16(4), 350; https://doi.org/10.3390/brainsci16040350 - 25 Mar 2026
Abstract
Background/Objectives: Emotions can be recognized through external behavioral cues and internal physiological signals. Owing to the inherently complex psychological and physiological nature of emotions, models relying on a single modality often suffer from limited robustness. This study aims to improve emotion recognition performance by effectively integrating electroencephalogram (EEG) signals and facial expressions through a multimodal framework. Methods: We propose a multimodal emotion recognition model that employs a Multi-Scale Dilated Attention Convolution (MSDAC) network tailored for facial expression recognition, integrates an EEG emotion recognition method based on three-dimensional features, and adopts a self-learning decision-level fusion strategy. MSDAC incorporates Multi-Scale Dilated Convolutions and a Dual-Branch Attention (D-BA) module to capture discontinuous facial action units. For EEG processing, raw signals are converted into a multidimensional time–frequency–spatial representation to preserve temporal, spectral, and spatial information. To overcome the limitations of traditional stitching or fixed-weight fusion approaches, a self-learning weight fusion mechanism is introduced at the decision level to adaptively adjust modality contributions. Results: The facial analysis branch achieved average accuracies of 74.1% on FER2013, 99.69% on CK+, and 98.05% (valence)/96.15% (arousal) on DEAP. On the DEAP dataset, the complete multimodal model reached 98.66% accuracy for valence and 97.49% for arousal classification. Conclusions: The proposed framework enhances emotion recognition by improving facial feature extraction and enabling adaptive multimodal fusion, demonstrating the effectiveness of combining EEG and facial information for robust emotion analysis. Full article
(This article belongs to the Section Cognitive, Social and Affective Neuroscience)
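One plausible reading of the self-learning decision-level fusion is sketched below: per-modality logits are combined with weights that are themselves trainable parameters rather than fixed. The two-modality setup and dimensions are assumptions for illustration.

```python
# Minimal sketch of self-learning decision-level fusion with trainable modality weights.
# Illustrative only; not the authors' implementation.
import torch
import torch.nn as nn

class SelfLearningFusion(nn.Module):
    def __init__(self, n_modalities=2):
        super().__init__()
        # One trainable weight per modality; softmax keeps contributions positive and summing to 1.
        self.logit_weights = nn.Parameter(torch.zeros(n_modalities))

    def forward(self, face_logits, eeg_logits):
        w = torch.softmax(self.logit_weights, dim=0)
        return w[0] * face_logits + w[1] * eeg_logits    # weighted decision-level fusion

fusion = SelfLearningFusion()
out = fusion(torch.randn(8, 2), torch.randn(8, 2))       # dummy per-modality valence logits
print(out.shape)                                          # torch.Size([8, 2])
```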
11 pages, 2071 KB  
Article
Heimler Syndrome Caused by Novel PEX6 Variants: Clinical and Genetic Characterization in a Saudi Cohort
by Basamat AlMoallem
Genes 2026, 17(4), 360; https://doi.org/10.3390/genes17040360 - 24 Mar 2026
Abstract
Background: Heimler syndrome (HS) is a rare autosomal recessive disorder representing the mildest end of the peroxisome biogenesis disorder spectrum. It is caused by hypomorphic mutations in peroxisomal assembly genes, most commonly PEX1 and PEX6, and is characterized by sensorineural hearing loss, amelogenesis imperfecta, and retinal dystrophy. Due to phenotypic overlap with other inherited sensory disorders, particularly Usher syndrome, diagnosis of this condition is frequently delayed. Methods: We investigated two unrelated Saudi families presenting with congenital hearing loss and retinal dystrophy who were initially diagnosed with Usher syndrome. Detailed clinical evaluation, including comprehensive ophthalmologic and audiologic assessments, was performed. Whole-exome sequencing (WES) was conducted to identify the underlying genetic cause, followed by variant filtering and in silico pathogenicity prediction. Results: We identified a novel homozygous missense variant, p.Val97Gly (V97G), in the PEX6 gene that co-segregated with the disease phenotype in both families. This variant was absent from major population databases, including dbSNP, the 1000 Genomes Project, ExAC, and gnomAD, and was predicted to be deleterious by multiple in silico prediction tools. Clinically, affected individuals presented with congenital sensorineural hearing loss, pigmentary retinal dystrophy with electrophysiological evidence of cone–rod dysfunction, enamel abnormalities consistent with amelogenesis imperfecta, and mild dysmorphic facial features, supporting a diagnosis within the Heimler syndrome spectrum. Conclusions: Our findings expand the mutational spectrum of PEX6 and highlight Heimler syndrome as an important differential diagnosis in patients presenting with Usher-like phenotypes. To the best of our knowledge, this study represents the first report of the PEX6 p.Val97Gly variant associated with Heimler syndrome in a Saudi population, underscoring the value of whole-exome sequencing for accurate diagnosis and genetic counseling in individuals with inherited sensory disorders. Full article
(This article belongs to the Special Issue The Genetic Lens: A New Era in Ophthalmology)
24 pages, 5930 KB  
Article
Style-Abstraction-Based Data Augmentation for Robust Affective Computing
by Xu Qiu, Taewan Kim and Bongjae Kim
Appl. Sci. 2026, 16(6), 3109; https://doi.org/10.3390/app16063109 - 23 Mar 2026
Abstract
Personality recognition and emotion recognition, two core tasks within affective computing, are fundamentally constrained by data scarcity, as collecting and annotating human behavioral data is expensive and restricted by privacy concerns. Under these limited data conditions, existing models tend to rely on superficial shortcut features such as background appearance, lighting conditions, or color variations, rather than behavior-relevant cues including facial expressions, posture, and motion dynamics. To address this issue, we propose Style-Abstraction-based Data Augmentation, a style transfer-based augmentation strategy that reduces dependency on low-level appearance information while preserving high-level semantic cues. Specifically, we employ cartoonization to generate stylized variants of training videos that retain expressive characteristics but remove stylistic bias. We validate our approach on three diverse personality benchmarks (First Impression v2, UDIVA v0.5, and KETI) and an emotion benchmark (Emotion Dataset) using state-of-the-art models including ViViT (Video Vision Transformer), TimeSformer, and VST (Video Swin Transformer). Our experiments indicate that increasing the proportion of style-abstracted data in the training set can improve performance on the evaluated datasets. Notably, our method yields consistent gains across all benchmarks: a 0.0893 reduction in MSE on UDIVA v0.5 (with VST), a 0.0023 improvement in 1-MAE on KETI (with TimeSformer), and a 0.0051 improvement on First Impression v2 (with TimeSformer). Furthermore, extending style-abstraction-based data augmentation to a four-class categorical emotion recognition task demonstrates similar performance gains, achieving up to a 3.44% accuracy increase with the TimeSformer backbone. These findings verify that our style-abstraction-based data augmentation facilitates learning of behavior-relevant features by reducing reliance on superficial shortcuts. Overall, cartoonization-based style abstraction for data augmentation functions as both an effective augmentation strategy and a regularization mechanism, encouraging the model to learn more stable and generalizable representations for affective computing applications. Full article
(This article belongs to the Special Issue Advances in Computer Vision and Digital Image Processing)
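A minimal sketch of the augmentation strategy, under the assumption of a frame-level stylization function, is shown below: a dataset wrapper serves a chosen proportion of samples in cartoonized form while keeping labels unchanged. The `cartoonize` helper is a hypothetical stand-in for any style-abstraction model.

```python
# Minimal sketch of mixing style-abstracted (cartoonized) samples into training at a chosen
# proportion. `cartoonize` is a hypothetical stand-in for a stylization model.
import random
from torch.utils.data import Dataset

def cartoonize(frames):
    """Placeholder for a cartoonization / style-abstraction model applied to video frames."""
    return frames  # identity stand-in; a real model would abstract low-level appearance detail

class StyleAbstractedDataset(Dataset):
    def __init__(self, base_dataset, stylized_ratio=0.5):
        self.base = base_dataset
        self.ratio = stylized_ratio          # fraction of samples served in stylized form

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        frames, label = self.base[idx]
        if random.random() < self.ratio:
            frames = cartoonize(frames)      # same label, appearance-abstracted input
        return frames, label
```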
16 pages, 475 KB  
Article
Skeletal Characteristics and Clinical Treatment Patterns in Orthognathic Surgery: A Virtual Surgical Planning-Based Study
by Merve Berika Kadıoğlu, Mehmet Emre Yurttutan, Mehmet Alp Eriş, Meyra Durmaz and Ömer Faruk Kocamaz
Healthcare 2026, 14(6), 809; https://doi.org/10.3390/healthcare14060809 - 22 Mar 2026
Abstract
Background/Objectives: Virtual surgical planning (VSP) allows three-dimensional assessment of complex dentofacial deformities and has become integral to modern orthognathic surgery. However, evidence remains limited regarding how skeletal characteristics and malocclusion patterns translate into surgical movement selection. This study aimed to evaluate demographic features, skeletal malocclusion patterns, and clinical treatment strategies in patients undergoing VSP-guided orthognathic surgery. Methods: This retrospective study included 158 patients who underwent VSP-assisted orthognathic surgery between 2019 and 2025. Sagittal skeletal classification, vertical growth pattern, facial asymmetry, and maxillary crossbite were evaluated together with planned maxillary and mandibular movements. Surgical procedures were analyzed according to skeletal malocclusion classes (Class I, II, and III). Group comparisons were performed using chi-square and Kruskal–Wallis tests. Multivariable logistic regression analysis was conducted to assess factors associated with bimaxillary surgery (p < 0.05). Results: Skeletal Class I malocclusion was most prevalent (46.8%), followed by Class III (29.7%) and Class II (23.4%). Hyperdivergent growth patterns were predominantly observed in Class II patients, whereas normodivergent patterns were most common in Class III cases (p < 0.05). Mandibular advancement and setback generally followed expected class-based trends but were also observed across non-corresponding skeletal classes. Maxillary impaction and mandibular autorotation were frequently incorporated. Bimaxillary surgery was performed in 84.2% of cases. Logistic regression analysis showed no independent predictors of bimaxillary surgery (p > 0.05). Conclusions: VSP-assisted orthognathic surgery demonstrates that surgical planning cannot be reduced to sagittal skeletal classification alone. Treatment decisions are shaped by combined sagittal, vertical, transverse, and patient-specific factors, supporting a multidimensional and individualized planning approach. Full article
(This article belongs to the Special Issue Oral and Maxillofacial Health Care: Third Edition)
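The reported analysis steps correspond to standard routines in scipy and statsmodels, as the toy sketch below shows for chi-square association, Kruskal-Wallis comparison, and multivariable logistic regression. Column names and the synthetic patient table are purely illustrative.

```python
# Minimal sketch of the reported analysis steps with standard libraries; toy data and
# hypothetical column names, not the study's dataset.
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2_contingency, kruskal

df = pd.DataFrame({                       # toy stand-in for the patient table
    "skeletal_class": ["I", "II", "III", "I", "III", "II"] * 10,
    "growth_pattern": ["hyper", "normo", "hypo", "normo", "normo", "hyper"] * 10,
    "age":            [22, 25, 19, 31, 27, 24] * 10,
    "bimaxillary":    [1, 1, 0, 0, 1, 0] * 10,
})

# Chi-square: skeletal class vs. growth pattern
chi2, p_chi, _, _ = chi2_contingency(pd.crosstab(df.skeletal_class, df.growth_pattern))

# Kruskal-Wallis: age across skeletal classes
groups = [g.age.values for _, g in df.groupby("skeletal_class")]
h_stat, p_kw = kruskal(*groups)

# Multivariable logistic regression for bimaxillary surgery
X = sm.add_constant(pd.get_dummies(df[["skeletal_class"]], drop_first=True).astype(float))
model = sm.Logit(df.bimaxillary, X).fit(disp=0)
print(round(p_chi, 3), round(p_kw, 3), model.pvalues.round(3).to_dict())
```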
19 pages, 13660 KB  
Article
CA-GFNet: A Cross-Modal Adaptive Gated Fusion Network for Facial Emotion Recognition
by Sitara Afzal and Jong-Ha Lee
Mathematics 2026, 14(6), 1068; https://doi.org/10.3390/math14061068 - 21 Mar 2026
Abstract
Facial emotion recognition (FER) plays an important role in healthcare, human–computer interaction, and intelligent security systems. However, despite recent advances, many state-of-the-art FER methods depend on computationally intensive CNN or transformer backbones and large-scale annotated datasets while suffering noticeable performance degradation under cross-dataset evaluation because of domain shift. These limitations hinder practical usage in resource-constrained and real-world environments. To address this issue, we propose Cross-Adaptive Gated Fusion Network (CA-GFNet), a lightweight dual-stream FER framework that explicitly combines shallow structural features with deep semantic representations. The proposed architecture integrates domain-robust gradient-based descriptors with compact deep features extracted from a VGG-based backbone. After face detection and normalization, the structural stream captures fine-grained local appearance cues, whereas the semantic stream encodes high-level facial configurations. The two feature streams are projected into a shared latent space and adaptively fused using a gated fusion mechanism that learns sample-specific weights, allowing the model to prioritize the more reliable feature source under dataset shift. Extensive experiments on KDEF along with zero-shot cross-dataset evaluation on CK+ using a strict train-on-KDEF/test-on-CK+ protocol with subject-independent splits demonstrate the effectiveness of the proposed method. CA-GFNet achieves 99.30% accuracy on KDEF and 98.98% on CK+ while requiring significantly fewer parameters than conventional deep FER models. These results confirm that adaptive gated fusion of shallow and deep features can deliver both high recognition accuracy and strong cross-dataset robustness. Full article
(This article belongs to the Special Issue Advanced Algorithms in Multimodal Affective Computing)
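A minimal sketch of sample-adaptive gated fusion in the spirit of this abstract is given below: a small gate network predicts per-sample weights for the structural and semantic streams before classification. The gate design and feature dimensions are assumptions, not the CA-GFNet implementation.

```python
# Minimal sketch of adaptive gated fusion of two feature streams; illustrative dimensions.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim=128, n_classes=7):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, structural, semantic):
        # structural, semantic: (B, dim) features already projected to a shared latent space
        w = self.gate(torch.cat([structural, semantic], dim=-1))   # (B, 2) per-sample weights
        fused = w[:, :1] * structural + w[:, 1:] * semantic        # reliability-weighted mix
        return self.classifier(fused)

logits = GatedFusion()(torch.randn(4, 128), torch.randn(4, 128))
print(logits.shape)                                                # torch.Size([4, 7])
```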
19 pages, 34223 KB  
Article
A Real Time Multi Modal Computer Vision Framework for Automated Autism Spectrum Disorder Screening
by Lehel Dénes-Fazakas, Ioan Catalin Mateas, Alexandru George Berciu, László Szilágyi, Levente Kovács and Eva-H. Dulf
Electronics 2026, 15(6), 1287; https://doi.org/10.3390/electronics15061287 - 19 Mar 2026
Abstract
Background: The early detection of autism spectrum disorder (ASD) is imperative for enhancing long-term developmental outcomes. Nevertheless, conventional screening methods depend on time-consuming, expert-driven behavioral assessments and are characterized by limited scalability. Automated video-based analysis provides a noninvasive and objective approach for the extraction of behavioral biomarkers from naturalistic recordings. Methods: A modular multimodal framework was developed that integrates motion-based video analysis and facial feature extraction for the purpose of ASD versus typically developing (TD) classification. The system is capable of processing RGB videos, skeleton/stickman representations, and motion trajectory streams. A comprehensive set of kinematic features was extracted, encompassing joint trajectories, velocity and acceleration profiles, posture variability, movement smoothness, and bilateral asymmetry. The repetitive stereotypical behaviors exhibited by the subjects were characterized using frequency-domain analysis via FFT within the 0.3–7.0 Hz band. Facial expression features derived from normalized face crops and landmark-based morphological descriptors were integrated as complementary modalities. The feature-level fusion process was executed subsequent to z-score normalization, and the classification procedure was conducted using a Random Forest model with stratified 5-fold cross validation. The implementation of GPU acceleration was instrumental in facilitating near real-time inference. Results: The motion-based ComplexVideos pipeline demonstrated a cross-validated accuracy of 94.2 ± 2.1% with an area under the ROC curve (AUC) of 0.93. Skeleton-based KinectStickman inputs demonstrated moderate performance, with an accuracy range of 60–80%. In contrast, facial-only models exhibited an accuracy of approximately 60%. The integration of multiple modalities through feature fusion has been demonstrated to enhance the robustness of classification algorithms and mitigate the occurrence of false negative outcomes, thereby surpassing the performance of single-modality models. The mean inference time remained below one second per video frame under standard operating conditions. Conclusions: The experimental results demonstrate that the integration of multimodal cues, including motion and facial features, facilitates the development of effective and efficient video-based screening methods for autism spectrum disorder (ASD). The proposed framework is designed to offer a scalable, extensible, and computationally efficient solution that can support early screening in clinical and remote assessment settings. Full article
(This article belongs to the Special Issue Computer Vision and Machine Learning for Biometric Systems)
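The classification stage described here can be sketched with standard scikit-learn components: z-score normalization of fused motion and facial features followed by a Random Forest under stratified 5-fold cross-validation. The synthetic feature matrix below stands in for the extracted descriptors.

```python
# Minimal sketch of the reported classification stage; synthetic features and toy labels only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
motion_feats = rng.normal(size=(120, 40))        # e.g. velocity, smoothness, FFT-band energy
facial_feats = rng.normal(size=(120, 20))        # e.g. landmark-based morphological descriptors
X = np.hstack([motion_feats, facial_feats])      # feature-level fusion
y = rng.integers(0, 2, size=120)                 # 0 = TD, 1 = ASD (toy labels)

clf = make_pipeline(StandardScaler(),            # z-score normalization
                    RandomForestClassifier(n_estimators=300, random_state=0))
scores = cross_val_score(clf, X, y,
                         cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
print(scores.mean(), scores.std())
```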
22 pages, 7355 KB  
Article
IAE-Net: Incremental Learning-Based Attention-Enhanced DenseNet for Robust Facial Emotion Recognition
by Haseeb Ali Khan and Jong-Ha Lee
Mathematics 2026, 14(6), 1023; https://doi.org/10.3390/math14061023 - 18 Mar 2026
Abstract
Facial emotion recognition (FER) is an important component of human–computer interaction and healthcare-oriented affective computing. However, reliable deployment remains difficult in unconstrained settings due to appearance and geometric variability (e.g., pose, illumination, and occlusion), demographic imbalance, and dataset bias. In practice, two additional constraints frequently limit real-world FER systems: the computational overhead of heavy architectures and limited adaptability when data evolve over time, where sequential updates can cause catastrophic forgetting. To address these challenges, we propose the Incremental Attention-Enhanced Network (IAE-Net), a compact single-branch framework built on a DenseNet121 backbone and a cascaded refinement pipeline. The model incorporates Channel Attention (CA) to emphasize expression-relevant feature channels and suppress less informative responses, followed by a deformable attention module (DA) that reduces feature misalignment caused by non-rigid facial motion and pose shifts, thereby improving robustness under geometric variability. For continual deployment, IAE-Net supports class-incremental updates via weight transfer, exemplar replay, and knowledge distillation to improve retention during sequential learning. We evaluate IAE-Net on four widely used benchmarks, FER2013, FERPlus, KDEF, and AffectNet, covering both controlled and in-the-wild conditions under a unified training protocol. The proposed approach achieves accuracies of 79.15%, 92.03%, 99.48%, and 74.20% on FER2013, FERPlus, KDEF, and AffectNet, respectively, with balanced precision, recall, and F1-score trends. These results indicate that IAE-Net provides an efficient and extensible FER framework with potential utility in dynamic real-world and longitudinal healthcare-oriented applications. Full article
(This article belongs to the Special Issue Recent Advances and Applications of Artificial Neural Networks)
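As one common realization of channel attention, the squeeze-and-excitation-style block below reweights backbone channels by their global response; the exact CA design in IAE-Net may differ, and the feature-map shape assumes a DenseNet121 backbone.

```python
# Minimal sketch of a squeeze-and-excitation-style channel attention block; the CA module in
# IAE-Net may be designed differently.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        # x: (B, C, H, W) feature map from the backbone
        w = self.fc(x.mean(dim=(2, 3)))               # squeeze: global average pool per channel
        return x * w[:, :, None, None]                # excite: reweight expression-relevant channels

feat = torch.randn(2, 1024, 7, 7)                     # e.g. DenseNet121 final feature map
print(ChannelAttention(1024)(feat).shape)             # torch.Size([2, 1024, 7, 7])
```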
23 pages, 2010 KB  
Article
Visibility-Prior Guided Dual-Stream Mixture-of-Experts for Robust Facial Expression Recognition Under Complex Occlusions
by Siyuan Ma, Long Liu, Mingzhi Cheng, Peijun Qin, Zixuan Han, Cui Chen, Shizhao Yang and Hongjuan Wang
Electronics 2026, 15(6), 1230; https://doi.org/10.3390/electronics15061230 - 16 Mar 2026
Abstract
Facial occlusion induces sample-wise reliability shifts in facial expression recognition (FER), where the usefulness of global context and local discriminative cues varies dramatically with the amount of visible facial information. Existing occlusion-robust FER studies often evaluate under limited or homogeneous occlusion settings and commonly adopt static fusion strategies, which are insufficient for complex and heterogeneous real-world occlusions. In this work, we establish a rigorous occlusion robustness evaluation protocol by constructing a fixed offline test benchmark with diverse synthetic occlusion patterns (e.g., masks, sunglasses, texture blocks, and mixed occlusions) on top of public FER test splits. We further propose a Dual-Stream Adaptive Weighting Mixture-of-Experts framework (DS-AW-MoE) that fuses a global contextual expert and a local discriminative expert via an occlusion-aware weighting network. Crucially, we introduce a facial visibility assessment as a task-agnostic prior to explicitly regulate expert contributions, enabling dynamic re-allocation of model capacity according to input-dependent feature reliability. Extensive experiments on public datasets and the constructed occlusion benchmark demonstrate that DS-AW-MoE achieves more stable recognition under complex occlusions, characterized by a smaller and more consistent performance drop. To support reproducibility under dataset license constraints, we will release an anonymous, fully runnable repository containing the complete occlusion synthesis pipeline, evaluation protocol, and configuration files, allowing researchers to reproduce the benchmark after obtaining the original datasets. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 3rd Edition)
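The benchmark construction mentioned here can be illustrated with a small sketch that pastes a reproducible random-texture block over an image region, one of the occlusion patterns the paper lists. Region coordinates and sizes are illustrative only.

```python
# Minimal sketch of one synthetic occlusion pattern for a fixed test benchmark: a texture block
# pasted at a reproducible location. Region choice and size are illustrative assumptions.
import numpy as np

def add_texture_block(img: np.ndarray, top: int, left: int, size: int, seed: int = 0) -> np.ndarray:
    """Overwrite a size x size patch with random texture, seeded for reproducibility."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    block = rng.integers(0, 256, size=(size, size, img.shape[2]), dtype=np.uint8)
    out[top:top + size, left:left + size] = block
    return out

face = np.zeros((112, 112, 3), dtype=np.uint8)                 # dummy aligned face crop
occluded = add_texture_block(face, top=30, left=28, size=40)   # roughly the eye region
```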