MDPI - Publisher of Open Access Journals

12 pages, 412 KB

Open Access Article

Evaluation of Facial Soft Tissue Angles in Adolescents with Angle Class I, II, and III Malocclusion Using Profile Image Analysis

by Kristina Cernova, Andris Abeltins, Oskars Radzins and Anda Slaidina

Dent. J. 2026, 14(6), 324; https://doi.org/10.3390/dj14060324 - 29 May 2026

Viewed by 196

Abstract

Background/Objectives: Soft tissue profile plays a crucial role in orthodontic diagnosis and treatment planning. However, limited data exist regarding differences in facial soft tissue angles among adolescents with different classes of malocclusion. This study aimed to evaluate variations in soft tissue facial angles among patients with Angle Class I, II, and III malocclusions aged 12–16 years using profile photographs. Methods: This retrospective observational study included 489 patients (330 females and 159 males; mean age 13.69 ± 1.30 years) examined between January 2008 and December 2018. 3D Slicer (Brigham and Women’s Hospital, Boston, MA, USA) was used only for landmark positioning and coordinate extraction from 2D profile photographs. Five facial angles were measured: Nasion–Nose tip–Pogonion (Na-T-Pg), Glabella–Subnasale–Pogonion (Gl-Sn-Pg), Pogonion–Nasion–Upper lip (Pg-Na-Ls), Pogonion–Nasion–Lower lip (Pg-Na-Li), and Pogonion–Subnasale–Upper lip (Pg-Sn-Ls). Statistical analysis was performed using R software, including ANOVA and t-tests, with significance set at p < 0.05. Results: Patients with Class III malocclusion demonstrated significantly higher mean values of the Na-T-Pg and Gl-Sn-Pg angles and lower values of the Pg-Na-Ls, Pg-Na-Li, and Pg-Sn-Ls angles compared with Class I and Class II malocclusions (p < 0.05), indicating mandibular protrusion. Conversely, Class II malocclusion showed lower Na-T-Pg and Gl-Sn-Pg angles and higher Pg-Na-Ls, Pg-Na-Li, and Pg-Sn-Ls values, consistent with mandibular retrusion relative to the maxilla. No clinically significant sex-related differences were observed in most parameters. Conclusions: Significant differences in facial soft tissue angles exist among adolescents with different malocclusion classes. 
These findings highlight the importance of soft tissue analysis in orthodontic diagnosis and may support the development of artificial intelligence-based tools for automated malocclusion assessment and treatment planning. Full article
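As a rough illustration of how an angle such as Na-T-Pg could be derived from extracted 2D landmark coordinates, the sketch below computes the angle at the middle landmark of a three-point label. This is a minimal sketch, not the authors' procedure: it assumes the middle landmark is the vertex, and the function name `angle_at_vertex` and the pixel coordinates are hypothetical.

```python
import math


def angle_at_vertex(a, vertex, b):
    """Return the angle in degrees at `vertex` between rays vertex->a and vertex->b (2D points)."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2 of |cross| and dot gives an unsigned angle in [0, 180] degrees.
    return math.degrees(math.atan2(abs(cross), dot))


# Hypothetical pixel coordinates for illustration only (not data from the study).
nasion = (100.0, 50.0)
nose_tip = (130.0, 110.0)
pogonion = (105.0, 200.0)

# Assumption: in the label Na-T-Pg the middle landmark (nose tip) is the vertex.
na_t_pg = angle_at_vertex(nasion, nose_tip, pogonion)
```

The `atan2` form is used instead of `acos` of the normalized dot product because it stays numerically stable when the three points are nearly collinear.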

20 pages, 461 KB

Open Access Systematic Review

The Role of Virtual and Augmented Reality in Transsphenoidal Surgical Approaches to the Sellar and Parasellar Area—A Systematic Review

by Kristian Bechev, Daniel Markov, Vladimir Aleksiev, Galabin Markov, Elena Poryazova and Antoaneta Fasova

J. Clin. Med. 2026, 15(11), 4142; https://doi.org/10.3390/jcm15114142 - 27 May 2026

Viewed by 204

Abstract

Background/Objectives: Transsphenoidal surgery has become the gold standard for the treatment of sellar and parasellar lesions, but it remains associated with significant anatomical challenges and the risk of intraoperative complications. The limitations of conventional imaging in depicting the complex three-dimensional anatomy of the skull base have led to a growing interest in virtual (VR) and augmented reality (AR) technologies, which offer enhanced spatial visualization, preoperative simulation, and image-guided intraoperative navigation. This systematic review aims to evaluate the current evidence on the role of virtual and augmented reality in transsphenoidal surgical interventions, with a focus on their impact on preoperative planning, intraoperative orientation, surgical outcomes, and neurosurgical training. Methods: A systematic literature search was conducted in accordance with PRISMA 2020 guidelines across PubMed, Scopus, and Web of Science for the period 2015–2025. MeSH terms and free-text keywords related to transsphenoidal surgery, sphenoid sinus anatomy, and VR/AR technologies were combined using Boolean operators. Risk of bias was assessed using RoB 2.0 for RCTs; methodological quality was assessed using the Newcastle–Ottawa Scale for observational studies and AMSTAR 2 for systematic reviews. Clinical, morphometric, and experimental studies evaluating VR/AR applications were included. Data were extracted using a standardized protocol and synthesized through qualitative analysis, with subgroup analysis by technology type (VR vs. AR) and clinical application domain. Results: A total of 218 publications were identified, of which 52 met the inclusion criteria (clinical studies n = 12, simulation and technology studies n = 30, morphological studies n = 10). VR-based three-dimensional reconstructions were consistently associated with improved preoperative spatial orientation and anatomical landmark recognition. 
AR systems demonstrated a meaningful contribution to intraoperative navigation, with reported reductions in time to target and improved visualization of critical neurovascular structures. VR platforms showed high effectiveness in surgical training, with shorter learning curves and improved technical performance. However, the majority of included studies were small observational cohorts, simulation studies, or expert overviews, with substantial heterogeneity in methodology, technology platforms, and outcome measures, precluding quantitative meta-analysis. Conclusions: Virtual and augmented reality represent clinically promising adjuncts to transsphenoidal surgery, with demonstrated benefits in preoperative planning, intraoperative navigation, and surgical training. These conclusions should be interpreted in the context of a predominantly early-phase and heterogeneous evidence base. Standardized protocols, larger prospective studies, and randomized trials are needed before the integration of VR/AR with navigation systems and artificial intelligence can be established as a routine component of personalized transsphenoidal surgery. Full article

(This article belongs to the Topic Assessment of Craniofacial Morphology: Traditional Methods and Innovative Approaches)

19 pages, 2395 KB

Open Access Article

Reproducible RGB Video Screening of Amyotrophic Lateral Sclerosis Using Spherical-Coordinate Landmark Correlations

by Daniela Suárez-Hernández, Sulema Torres-Ramos, Stewart R. Santos-Arce and Israel Román-Godínez

Eng 2026, 7(5), 238; https://doi.org/10.3390/eng7050238 - 14 May 2026

Viewed by 326

Abstract

Amyotrophic lateral sclerosis (ALS) is a progressive neurodegenerative disorder for which delayed recognition may limit timely clinical management. This study investigates a reproducible computer-aided screening approach based on facial motion analysis from standard RGB video recorded during the diadochokinetic /pataka/ task. Facial landmarks were extracted using a face-mesh model and mapped into spherical coordinates to represent facial motion trajectories. Coordinated facial behavior was characterized through pairwise Pearson correlation matrices computed between landmark trajectories, yielding correlation-based descriptors of inter-region motion patterns. We compared a domain-informed Manual-24 reference configuration with data-driven feature-selection strategies (ElasticNet and mRMR) under a leakage-aware nested cross-validation design using the Toronto NeuroFace dataset. Performance was reported as mean ± standard deviation across outer folds, with sensitivity emphasized because of its relevance for screening-oriented applications. The primary configuration (mRMR, k = 3, ϕ + kNN) achieved 61.11 ± 19.24% accuracy, 61.11 ± 9.62% sensitivity, and 61.11 ± 34.70% specificity. These results suggest that correlation-derived coordination patterns contain discriminative information for ALS/HC separation, although fold-level variability indicates that performance should be interpreted cautiously. Task-aligned comparisons with prior /pataka/-based studies highlight the influence of sensing modality, evaluation level, and uncertainty reporting on apparent performance. Overall, correlation-based facial motion descriptors combined with leakage-aware feature selection provide a transparent proof-of-concept framework for RGB video-based ALS screening, motivating validation on larger cohorts and independent datasets. Full article
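The idea of correlating landmark trajectories in spherical coordinates can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' pipeline: it assumes that ϕ denotes the azimuthal angle, that landmarks are given as (x, y, z) relative to some chosen origin, and the names `to_spherical`, `pearson`, and `phi_correlation_matrix`, as well as the toy trajectories, are hypothetical.

```python
import math


def to_spherical(x, y, z):
    """Convert Cartesian (x, y, z) to (r, theta, phi): radius, polar angle, azimuth."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r) if r > 0 else 0.0
    phi = math.atan2(y, x)
    return r, theta, phi


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return float("nan")
    return cov / (norm_u * norm_v)


def phi_correlation_matrix(trajectories):
    """Pairwise Pearson correlations between the phi time series of each landmark."""
    phis = [[to_spherical(*point)[2] for point in traj] for traj in trajectories]
    return [[pearson(a, b) for b in phis] for a in phis]


# Toy data: three landmarks over four frames (hypothetical values, not from the dataset).
trajectories = [
    [(1.0, 0.1, 0.5), (1.0, 0.2, 0.5), (1.0, 0.4, 0.5), (1.0, 0.3, 0.5)],
    [(1.0, 0.2, 0.4), (1.0, 0.4, 0.4), (1.0, 0.8, 0.4), (1.0, 0.6, 0.4)],
    [(1.0, 0.4, 0.6), (1.0, 0.3, 0.6), (1.0, 0.1, 0.6), (1.0, 0.2, 0.6)],
]
matrix = phi_correlation_matrix(trajectories)
```

In this toy example the first two landmarks move together and the third moves in the opposite direction, so the off-diagonal entries take opposite signs. In a real setting the upper triangle of such a matrix would be flattened into a feature vector before any feature selection step.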

(This article belongs to the Special Issue Advanced Artificial Intelligence Techniques for Disease Prediction, Diagnosis and Management)

19 pages, 2064 KB

Open Access Article

Clinical Equivalence of a CNN-Based Automated Soft Tissue Landmark Detection System on 2D Facial Images

by Argun Ege Türkün, Müslim Ege Kalender, Murat Kurt and Servet Doğan

Diagnostics 2026, 16(10), 1464; https://doi.org/10.3390/diagnostics16101464 - 11 May 2026

Viewed by 508

Abstract

Background/Objectives: The aim of this study was to evaluate and compare the accuracy, reliability, and time efficiency of a convolutional neural network (CNN)-based deep learning model with manual annotation in the identification of soft tissue landmarks on two-dimensional (2D) facial images for orthodontic applications. Materials and Methods: Three-dimensional (3D) facial scans were obtained from 100 participants (50 females, 50 males) aged 18–25 years using the Revopoint Pop2 3D Scanner. Frontal and profile 2D images were extracted from the 3D models. Manual landmark identification was performed by a single investigator using LabelMe software, marking 22 landmarks on frontal images and 15 landmarks on profile images. A novel CNN model was developed and trained on these manually annotated images. The model’s automatic landmark identifications were compared with manual annotations in terms of positional error, identification time, and reproducibility. Results: The CNN model achieved a mean localization accuracy of 96.07%. The mean prediction error ranged from 2.3% to 4.5% across various anatomical points. Trichion, Menton, and Gonion points exhibited relatively higher error rates. The model significantly reduced the annotation time compared to manual identification (manual method: 237 s per image). Intra-observer reliability analysis demonstrated excellent agreement for manual landmarking (ICC: 0.85–0.95). The AI model provided consistent predictions for identical inputs. Conclusions: The deep learning-based model demonstrated comparable accuracy to manual landmark identification while significantly improving the annotation speed and reproducibility. These results suggest that CNN-based systems offer a promising alternative for clinical orthodontic analysis and digital workflow integration. Full article

(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

32 pages, 25709 KB

Open Access Article

Landmark-Based Features for Vehicle Trajectory Anomaly Detection from Traffic Video in Urban Intersections—A Case Study

by Nicolae Cleju and Constantin Catargiu

Sensors 2026, 26(10), 3027; https://doi.org/10.3390/s26103027 - 11 May 2026

Viewed by 949

Abstract

We study trajectory feature representations in the context of detecting spatially anomalous vehicle trajectories in urban intersections, using trajectory data from video streams captured by camera monitoring systems. These trajectories are extracted using an object detection pipeline and have particular characteristics like short lengths, variable endpoints, and other viewpoint-dependent detection artifacts, which make existing spatial feature approaches less effective. We introduce two feature representations adapted for intersection-level trajectories, based on distances to a fixed set of landmark points, which provide fixed-length vectors compatible with common tabular anomaly detector algorithms. We evaluate using a dataset of 5378 labeled trajectories collected from camera recordings in one deployment site, as well as on other existing city-wide benchmark datasets, showing that, in the evaluated setting, the proposed feature representations improve upon several existing spatial features and enable better detection of both shape and placement anomalies. Full article

(This article belongs to the Special Issue Sensors and Sensing Technologies for Traffic, Driving and Transportation)

18 pages, 3465 KB

Open Access Article

Geometric Radiomic Analysis of Hip Joint Space for Automatic Detection of Developmental Dysplasia of the Hip in Infants

by Olga Sitsiani, Andreas Vezakis, Nektaria Karangeli, Ioannis Vezakis, Stavros T. Miloulis, Eleftherios Kontopodis, Ioannis Kakkos and George K. Matsopoulos

Appl. Sci. 2026, 16(9), 4345; https://doi.org/10.3390/app16094345 - 29 Apr 2026

Viewed by 262

Abstract

Developmental dysplasia of the hip (DDH) is a common musculoskeletal disorder in infancy, and early detection is essential for optimal clinical outcomes. Radiographic assessment is traditionally based on angular measurements, which may be limited by variability in landmark identification and do not fully capture the complex morphology of the hip joint. In this study, we investigate whether geometric features derived from the hip joint articulation space can be used to differentiate between normal and dysplastic hips in infant radiographs. Pelvic X-ray images from infants (mean age 4.5 ± 0.83 months) were analyzed, and custom segmentation masks were developed to isolate the joint space region. A total of 99 geometric and radiomic features were extracted and evaluated using statistical analysis and supervised machine learning methods. Multiple features demonstrated strong discriminative power between normal and DDH (p < 0.001), with shape and spatial distribution characteristics showing the highest relevance. Classification models achieved an F1-score of approximately 80% on the full dataset. Notably, patient age was identified as a significant confounding factor, and analysis on an age-matched subset improved classification performance to 94% accuracy and 93% recall. These findings suggest that geometric characterization of the hip joint space provides a promising and interpretable framework for DDH detection. The results also highlight the importance of age-stratified analysis in pediatric imaging. Further validation on larger and more diverse datasets is required to assess clinical applicability. Full article

(This article belongs to the Section Biomedical Engineering)

19 pages, 3599 KB

Open Access Article

Automated Pomelo Posture Detection: A Lightweight Deep Learning Solution for Conveyor-Based Fruit Processing

by Qingting Jin, Runqi Yuan, Jiayan Fang, Jing Huang, Jiayu Chen, Shilei Lyu, Zhen Li and Yu Deng

Agriculture 2026, 16(9), 946; https://doi.org/10.3390/agriculture16090946 - 24 Apr 2026

Viewed by 960

Abstract

In modern intelligent food processing, the unpredictable variability in pomelo orientation on high-speed conveyors poses a significant challenge to automated grading and precision peeling operations. To address this, a deep learning-based method is proposed for the real-time detection of pomelo posture. Firstly, a pomelo posture dataset was constructed to support model training and validation. Secondly, to balance the extraction of posture features from uniform fruits with the low-power constraints of edge deployment, a domain-specific architectural optimization is presented. Building on the YOLOv8n framework, the proposed model synergistically integrates specialized modules. A lightweight GhostHGNetV2 foundation is utilized to significantly reduce computational redundancy while maintaining the resolution required to detect key anatomical landmarks. To overcome spatial confusion and capture multi-scale global appearance information, a multi-path coordinate attention (MPCA) module is introduced. Furthermore, the SlimNeck architecture and VoVGSCSP module streamline multi-scale feature fusion via one-time aggregation, effectively preventing computational bottlenecks. This design optimizes the computational efficiency of the model while maintaining detection accuracy. Experimental results demonstrate that compared with the baseline YOLOv8n model, the proposed method increased the mAP50 accuracy by 3.67% while reducing parameter count and computational load by 17.5% and 23.3%, respectively. Additionally, it achieved a processing speed of 19.3 FPS on the Jetson Orin Nano 6G edge platform. This research provides a critical technical foundation for the recognition of pomelo posture, enabling subsequent orientation rectification and fostering the development of streamlined, automated pomelo processing lines. Full article

(This article belongs to the Special Issue Application of Smart Agricultural Technologies in Mountain Farming Systems)

31 pages, 1645 KB

Open Access Review

The Mediterranean Diet and Cardiovascular Protection: Biochemical Mechanisms with Emphasis on Platelet-Activating Factor

by Paraskevi Detopoulou, Smaragdi Antonopoulou, Pinelopi Douvogianni and Constantinos A. Demopoulos

Nutrients 2026, 18(9), 1320; https://doi.org/10.3390/nu18091320 - 22 Apr 2026

Viewed by 1082

Abstract

Landmark epidemiological studies and clinical trials, such as the Seven Countries Study, the Lyon Diet Heart Study, the PREDIMED Study and the CORDIOPREV Study, have shown significant reductions in cardiovascular events in those following the Mediterranean diet (MD). The aim of the present work is to summarize the most robust available evidence and the major biological pathways underlying the protective effects of the MD, with particular emphasis on the role of PAF inhibitors. Mechanistically, MD functions through a complex synergy of antioxidant, anti-inflammatory, and antithrombotic effects that collectively improve lipid profiles, enhance endothelial function, optimize postprandial metabolism and cell membrane signaling, making it a functional model for human longevity. The PAF-Implicated Atherosclerosis Theory has emerged as a key unifying framework, proposing that Platelet-Activating Factor (PAF)—a highly potent lipid inflammatory mediator—plays a central role in the initiation and progression of atherosclerosis. Oxidized LDL promotes the production of PAF and PAF-like lipids, leading to endothelial dysfunction, vascular inflammation, and atherosclerotic plaque formation. Traditional Mediterranean foods are rich in natural PAF inhibitors, particularly the polar lipid fractions of extra virgin olive oil, as well as wine, fish, vegetables, onions, and garlic. Animal studies demonstrate that these compounds can reduce or even regress atherosclerotic lesions, independently of serum cholesterol levels. Human dietary interventions have further shown that MD-based meals and functional foods enriched with PAF inhibitors reduce PAF activity and improve thrombosis-related biomarkers. This mechanistic framework helps explain phenomena such as the “French Paradox” and the cardio-protective effects associated with fish consumption. 
Moreover, the extraction of PAF inhibitors from Mediterranean food by-products, such as olive pomace, offers promising ecological and economic advantages. Collectively, targeting PAF and increasing dietary intake of PAF inhibitors represent promising strategies for the prevention and management of atherosclerosis and other inflammatory diseases, supporting the view that PAF may function as a major, modifiable risk factor in these conditions. Full article

(This article belongs to the Special Issue Mediterranean Diet and Cardiovascular Diseases)

23 pages, 53680 KB

Open Access Article

A Movement Description Language for Functional Training Exercise Analysis

by Lúcia Sousa, Daniel Canedo, Pedro Santos and António Neves

J. Funct. Morphol. Kinesiol. 2026, 11(2), 162; https://doi.org/10.3390/jfmk11020162 - 21 Apr 2026

Viewed by 417

Abstract

Objective: Functional training exercises involve complex multi-joint movements that challenge traditional rule-based or data-driven recognition systems. This paper introduces a Movement Description Language (MDL) designed to formally represent, analyze, and evaluate such exercises using camera-based pose estimation and interpretable, composable structures. Methods: The proposed MDL models each exercise as a finite-state machine defined by pose-derived angle proxy transitions, allowing movements to be described in a modular and reusable way. The framework is demonstrated with MediaPipe landmark extraction from monocular video, although the MDL remains compatible with any pose estimation algorithm, and it focuses on exercise phase detection and repetition counting. Experimental validation was conducted on a dataset of 1513 videos of 12 functional exercises (squats, deadlifts, lunges, shoulder presses, planks, push-ups, pull-ups, bent-over rows, box jumps, thrusters, overhead squats, and burpees) obtained from public pose datasets, competition footage, and recordings of 9 participants in real-world environments. Results: Automated repetition counts were compared against manually annotated ground truth, showing an overall repetition-counting accuracy of 97.2%, with a mean per-exercise accuracy of 98.8% (range 95–100%). The MDL successfully handled both simple and compound exercises, maintaining reliable phase detection despite variations in execution speed, camera perspective, and environmental conditions. Conclusions: The system was implemented using real-time pose estimation to demonstrate the practical execution of the MDL framework. The proposed MDL provides a transparent, extensible, and computationally efficient framework for functional exercise analysis. By bridging human-readable movement semantics with executable motion logic, it enables interpretable automatic repetition counting and phase detection, offering an alternative to black-box recognition approaches.
The results support its potential for scalable deployment in training, monitoring and movement analysis applications. The proposed system is not intended for biomechanical measurement or clinical-grade kinematic analysis, but rather for interpretable modeling of exercise structure and repetition detection using approximate pose-derived signals. Full article
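To make the finite-state-machine idea concrete, the sketch below counts repetitions from a per-frame joint-angle signal using two states and two thresholds. It is a minimal sketch and not the MDL syntax or implementation described in the paper: the function name `count_repetitions`, the state names, the threshold values, and the sample angle sequences are all hypothetical.

```python
def count_repetitions(angles, down_threshold=100.0, up_threshold=160.0):
    """Count repetitions in a sequence of joint angles (degrees) with a two-state machine.

    A repetition is counted each time the signal goes from the "up" state,
    below `down_threshold` (entering "down"), and back above `up_threshold`.
    The gap between the two thresholds acts as hysteresis against jitter.
    """
    state = "up"
    reps = 0
    for angle in angles:
        if state == "up" and angle < down_threshold:
            state = "down"
        elif state == "down" and angle > up_threshold:
            state = "up"
            reps += 1
    return reps


# Hypothetical knee-angle proxy for two squats (illustrative values only).
squat_angles = [170.0, 150.0, 90.0, 120.0, 165.0, 170.0, 95.0, 170.0]
squat_reps = count_repetitions(squat_angles)

# A shallow dip that never crosses the lower threshold is not counted.
shallow_reps = count_repetitions([170.0, 120.0, 170.0])
```

Using separate entry and exit thresholds, rather than a single cut-off, is a common design choice for noisy pose-derived signals, since small fluctuations around one threshold would otherwise be counted as extra repetitions.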

(This article belongs to the Section Kinesiology and Biomechanics)

42 pages, 7524 KB

Open Access Article

3D Face Reconstruction with Deep Learning: Architectures, Datasets, and Benchmark Analysis

by Sankarshan Dasgupta, Ju Shen and Tam V. Nguyen

Sensors 2026, 26(8), 2540; https://doi.org/10.3390/s26082540 - 20 Apr 2026

Viewed by 1369

Abstract

Three-Dimensional (3D) face reconstruction from monocular Red-Green-Blue (RGB) imagery remains a fundamental yet ill-posed challenge in computer vision, with applications in biometrics, augmented reality/virtual reality (AR/VR), and intelligent visual sensing systems. While deep learning has significantly improved reconstruction fidelity and realism, existing surveys primarily focus on network architectures in isolation, often overlooking how sensing conditions, data acquisition protocols, and geometric calibration influence reconstruction reliability and evaluation outcomes. This paper presents a sensor-aware, end-to-end review of deep learning-based 3D face reconstruction and introduces a unified modular framework that connects sensing hardware, data acquisition, calibration, representation learning, and geometric refinement within a coherent pipeline. The reconstruction process is organized into four stages: sensor-driven acquisition and calibration, landmark estimation and feature extraction, 3D representation and parameter regression, and iterative refinement via differentiable rendering. Within this framework, we examine how sensor characteristics, calibration accuracy, representation models, and supervision strategies affect reconstruction accuracy, perceptual quality, robustness, and computational efficiency. We further synthesize the reported results across widely used benchmarks using both geometric and perceptual metrics, highlighting trade-offs between reconstruction fidelity and deployment constraints. By integrating sensing-aware analysis with architectural evaluation, this survey provides practical insights for developing scalable and reliable 3D face reconstruction systems under real-world conditions. Full article

(This article belongs to the Special Issue Visual Sensing Methods for 3D Object Detection, Tracking, and Quantification)

27 pages, 3995 KB

Open Access Article

Video-Based Arabic Sign Language Recognition with Mediapipe and Deep Learning Techniques

by Dana El-Rushaidat, Nour Almohammad, Raine Yeh and Kinda Fayyad

J. Imaging 2026, 12(4), 177; https://doi.org/10.3390/jimaging12040177 - 20 Apr 2026

Viewed by 955

Abstract

This paper addresses the critical communication barrier experienced by deaf and hearing-impaired individuals in the Arab world through the development of an affordable, video-based Arabic Sign Language (ArSL) recognition system. Designed for broad accessibility, the system eliminates specialized hardware by leveraging standard mobile or laptop cameras. Our methodology employs Mediapipe for real-time extraction of hand, face, and pose landmarks from video streams. These anatomical features are then processed by a hybrid deep learning model integrating Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), specifically Bidirectional Long Short-Term Memory (BiLSTM) layers. The CNN component captures spatial features, such as intricate hand shapes and body movements, within individual frames. Concurrently, BiLSTMs model long-term temporal dependencies and motion trajectories across consecutive frames. This integrated CNN-BiLSTM architecture is critical for generating a comprehensive spatiotemporal representation, enabling accurate differentiation of complex signs where meaning relies on both static gestures and dynamic transitions, thus preventing misclassification that CNN-only or RNN-only models would incur. Rigorously evaluated on the author-created JUST-SL dataset and the publicly available KArSL dataset, the system achieved 96% overall accuracy for JUST-SL and an impressive 99% for KArSL. These results demonstrate the system’s superior accuracy compared to previous research, particularly for recognizing full Arabic words, thereby significantly enhancing communication accessibility for the deaf and hearing-impaired community. Full article

(This article belongs to the Section Computer Vision and Pattern Recognition)

9 pages, 4519 KB

Open Access Proceeding Paper

UAV Position Tracking with Ground Cameras

by Andrea Masiero, Paolo Dabove, Vincenzo Di Pietra, Marco Piragnolo, Alberto Guarnieri, Charles Toth, Wioleta Blaszczak-Bak, Jelena Gabela and Kai-Wei Chiang

Eng. Proc. 2026, 126(1), 50; https://doi.org/10.3390/engproc2026126050 - 15 Apr 2026

Viewed by 432

Abstract

The use of Unmanned Aerial Vehicles (UAVs) has become quite popular in several applications during the last few years. Their spread is motivated by the flexibility of usage of UAVs and by their ability to automatically execute several tasks, mostly thanks to the availability of Global Navigation Satellite Systems (GNSSs), which usually allow reliable outdoor localization of aerial vehicles. However, the extension of task automatic execution indoors, and in other challenging working conditions for the GNSS, requires an alternative positioning system able to compensate for the unreliability or unavailability of GNSS in those cases. To this end, additional sensors are usually considered. Among them, cameras are probably the most popular ones. The most common case of a vision-based positioning system is a camera mounted on a moving platform used to determine its ego-motion in a dead-reckoning approach, i.e., visual odometry. Although this solution is affordable and does not require the installation of any infrastructure, it enables absolute positioning of the camera, i.e., of the UAV, only if certain landmarks, with known position, are visible in the flying area. In contrast, this work considers the use of external cameras installed in the flying area to track the UAV movements. This approach is similar to the one implemented in motion capture systems as well, where a set of static cameras is used to triangulate some target positions using calibrated cameras. Instead, this work investigates the use of vision and machine learning tools to (i) extract the UAV position from each video frame and (ii) estimate its 3D position. Estimation of the 3D UAV position is performed with a single camera, exploiting machine learning tools in order to avoid the need for camera calibration. Performance analysis is provided for a dataset collected at the Agripolis campus of the University of Padua. Full article

(This article belongs to the Proceedings of European Navigation Conference 2025)

28 pages, 3548 KB

Open AccessArticle

Edge Computing Approach to AI-Based Gesture for Human–Robot Interaction and Control

by Nikola Ivačko, Ivan Ćirić and Miloš Simonović

Computers 2026, 15(4), 241; https://doi.org/10.3390/computers15040241 - 14 Apr 2026

Viewed by 1073

Abstract

This paper presents an edge-deployable vision-based framework for human–robot interaction using an xArm collaborative robot, a single RGB camera mounted on the robot wrist, and lightweight AI-based perception modules. The system enables intuitive, contact-free control by combining hand understanding and object detection within a unified perception–decision–control pipeline. Hand landmarks are extracted using MediaPipe Hands, from which continuous hand trajectories, static gestures, and dynamic gestures are derived. Task objects are detected using a YOLO-based model, and both hand and object observations are mapped into the robot workspace using ArUco-based planar calibration. To ensure stable robot motion, the hand control signal is smoothed using low-pass and Kalman filtering, while dynamic gestures such as waving are recognized using a lightweight LSTM classifier. The complete pipeline runs locally on edge hardware, specifically NVIDIA Jetson Orin Nano and Raspberry Pi 5 with a Hailo AI accelerator. Experimental evaluation covers trajectory stability, gesture recognition reliability, and runtime performance on both platforms. Results show that filtering significantly reduces hand-tracking jitter, gesture recognition provides stable command states for control, and both edge devices support real-time operation, with Jetson achieving consistently lower runtime than Raspberry Pi. The proposed system demonstrates the feasibility of low-cost edge AI solutions for responsive and practical human–robot interaction in collaborative industrial environments.

(This article belongs to the Special Issue Intelligent Edge: When AI Meets Edge Computing)
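
The abstract above mentions low-pass smoothing of the hand control signal but does not say which filter is used. As a rough illustration only, the sketch below assumes a first-order exponential moving average over 2D hand positions; the function name `low_pass` and the parameter `alpha` are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


def low_pass(points: Iterable[Point], alpha: float = 0.3) -> List[Point]:
    """Smooth a 2D trajectory with an exponential moving average.

    Assumption: a first-order low-pass filter; the paper's actual filter
    design and parameters are not given in the abstract.
    alpha close to 0 smooths heavily, alpha = 1 returns the input unchanged.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[Point] = []
    for x, y in points:
        if not smoothed:
            # The first sample initializes the filter state.
            smoothed.append((x, y))
        else:
            px, py = smoothed[-1]
            # Move a fraction alpha of the way toward the new sample.
            smoothed.append((px + alpha * (x - px), py + alpha * (y - py)))
    return smoothed
```

A smaller `alpha` suppresses more jitter at the cost of more lag, which is the usual trade-off when a smoothed signal drives robot motion.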

26 pages, 2634 KB

Open AccessArticle

Minimal Angular Facial Representation for Real-Time Emotion Recognition

by Gerardo Garcia-Gil

Appl. Sci. 2026, 16(7), 3572; https://doi.org/10.3390/app16073572 - 6 Apr 2026

Viewed by 693

Abstract

Real-time facial emotion recognition remains challenging due to the high dimensionality and computational cost of dense facial representations, which limit their applicability in resource-constrained and real-time scenarios. This study proposes a compact, anatomically informed angular facial representation for efficient, interpretable emotion recognition under real-time constraints. Facial landmarks are first extracted using a standard landmark detection framework, from which a reduced facial mesh of 27 anatomically selected points is defined. Internal geometric angles computed from this mesh are analyzed using temporal variability and redundancy criteria, resulting in a minimal set of eight angular descriptors that capture the most expressive facial dynamics while preserving geometric invariance and computational efficiency. The proposed representation is evaluated using multiple supervised machine learning classifiers under two complementary validation strategies: stratified frame-level cross-validation and strict Leave-One-Subject-Out evaluation. Under mixed-subject stratified validation, the best-performing model (MLP) achieved macro-averaged F1-scores exceeding 0.95 and near-unity ROC–AUC values. However, subject-independent evaluation revealed reduced generalization performance (average accuracy ≈55%), highlighting the influence of inter-subject morphological variability embedded in absolute angular descriptors. These findings indicate that a minimal angular geometric encoding provides strong intra-subject discriminative capability while transparently characterizing its cross-subject generalization limits, offering a practical and interpretable alternative for data- and resource-constrained real-time scenarios.

(This article belongs to the Topic Applied Computer Vision and Pattern Recognition: 2nd Edition)
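
The abstract above builds its descriptors from internal angles between facial landmarks. The sketch below shows one common way to compute such an angle from three 2D points; it is an illustration under stated assumptions, not the author's implementation, and the function name `angle_at` and the sample coordinates are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def angle_at(a: Point, b: Point, c: Point) -> float:
    """Return the angle ABC in degrees, measured at vertex b.

    Assumption: landmarks are plain 2D image coordinates. The result does
    not change under translation, rotation, or uniform scaling of the points.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("vertex coincides with another point")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Hypothetical landmark triple forming a right angle at the middle point.
right_angle = angle_at((1.0, 0.0), (0.0, 0.0), (0.0, 1.0))
```

Because the angle depends only on the directions of the two rays from the vertex, it is unaffected by face size in the image, which is consistent with the geometric invariance the abstract refers to.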

21 pages, 13964 KB

Open AccessArticle

Towards Generalizable Deepfake Detection via Facial Landmark-Guided Convolution and Local Structure Awareness

by Hao Chen, Zhengxu Zhang, Qin Li and Chunhui Feng

Algorithms 2026, 19(4), 270; https://doi.org/10.3390/a19040270 - 1 Apr 2026

Viewed by 668

Abstract

As deepfakes become increasingly realistic, there is a growing need for robust and highly accurate facial forgery detection algorithms. Existing studies show that global feature modeling approaches (Transformer, VMamba) are effective in capturing long-range dependencies, yet they often lack sufficient sensitivity to localized facial tampering artifacts. Meanwhile, traditional convolutional methods excel at extracting local image features but struggle to incorporate prior knowledge about facial anatomy, resulting in limited representational capability. To address these limitations, this paper proposes LGMamba, a novel detection framework that integrates global modeling with facial guidance focused on the key facial components and fine-grained detail regions commonly manipulated in deepfakes. First, we introduce an innovative Landmark-Guided Convolution (LGConv), which adaptively adjusts convolutional sampling positions using facial landmark information. This allows the model to attend to forgery-prone facial regions, such as the eyes and mouth. Second, we design a parallel Facial Structure Awareness Block (FSAB) to operate alongside the VMamba-based visual State-Space Model. Equipped with a multi-stage residual design and a CBAM attention mechanism, FSAB enhances the model’s sensitivity to subtle facial artifacts, enabling joint exploitation of global semantic consistency and fine-grained forgery cues within a unified architecture. The proposed LGMamba achieves superior performance compared to existing mainstream approaches. In cross-dataset evaluations, it attains AUC scores of 92.34% on CD1 and 96.01% on CD2, outperforming all compared methods.

Search Results (342)
