Search Results (255)

Search Parameters:
Keywords = MediaPipe

22 pages, 3288 KB  
Article
An Intelligent Real-Time System for Sentence-Level Recognition of Continuous Saudi Sign Language Using Landmark-Based Temporal Modeling
by Adel BenAbdennour, Mohammed Mukhtar, Osama Almolike, Bilal A. Khawaja and Abdulmajeed M. Alenezi
Sensors 2026, 26(5), 1652; https://doi.org/10.3390/s26051652 - 5 Mar 2026
Abstract
A persistent challenge for Deaf and Hard-of-Hearing individuals is the communication gap between sign language users and the hearing community, particularly in regions with limited automated translation resources. In Saudi Arabia, this gap is amplified by the reliance on Saudi Sign Language (SSL) and the scarcity of real-time, sentence-level translation systems. This paper presents a real-time system for sentence-level recognition of continuous SSL and direct mapping to natural spoken Arabic. The proposed system operates end-to-end on live video streams or pre-recorded content, extracting spatio-temporal landmark features using the MediaPipe Holistic framework. For classification, the input feature vector consists of 225 features derived from hand and body pose landmarks. These features are processed by a Bidirectional Long Short-Term Memory (BiLSTM) network trained on the ArabSign (ArSL) dataset to perform direct sentence-level classification over a vocabulary of 50 continuous Arabic sign language sentences, supported by an idle-based segmentation mechanism that enables natural, uninterrupted signing. Experimental evaluation demonstrates robust generalization: under a Leave-One-Signer-Out (LOSO) cross-validation protocol, the model attains a mean sentence-level accuracy of 94.2%, outperforming the fixed signer-independent split baseline of 92.07%, while maintaining real-time performance suitable for interactive use. To enhance linguistic fluency, an optional post-recognition refinement stage is incorporated using a large language model (LLM), followed by text-to-speech synthesis to produce audible Arabic output; this refinement operates strictly as post-processing and is not included in the reported recognition accuracy metrics. The results demonstrate that direct sentence-level modeling, combined with landmark-based feature extraction and real-time segmentation, provides an effective and practical solution for continuous SSL sentence recognition in real time.
(This article belongs to the Special Issue Sensor Systems for Gesture Recognition (3rd Edition))
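The 225-dimensional input described above maps exactly onto MediaPipe Holistic's output: 33 pose landmarks plus 21 landmarks per hand, with (x, y, z) for each (99 + 63 + 63 = 225). A minimal sketch of such a per-frame extractor, assuming zero-filling for undetected parts (the abstract does not state how the paper handles missing detections):

```python
import numpy as np

def extract_features(results):
    """Flatten MediaPipe Holistic output into a 225-dim per-frame vector."""
    def flat(part, n):
        # Zero-fill when a body part is not detected (an assumption).
        if part is None:
            return np.zeros(n * 3)
        return np.array([[p.x, p.y, p.z] for p in part.landmark]).ravel()

    return np.concatenate([
        flat(results.pose_landmarks, 33),        # 99 values
        flat(results.left_hand_landmarks, 21),   # 63 values
        flat(results.right_hand_landmarks, 21),  # 63 values
    ])                                           # shape (225,)
```

Stacking these vectors frame by frame yields the (time × 225) sequences a BiLSTM consumes.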

19 pages, 18999 KB  
Article
TFS Point-on-Hand Sign Recognition Using Part Affinity Fields
by Jinnavat Sanalohit and Tatpong Katanyukul
Appl. Sci. 2026, 16(5), 2416; https://doi.org/10.3390/app16052416 - 2 Mar 2026
Abstract
Our study investigates an application of a bottom-up design for keypoint regression, Part Affinity Fields (PAFs), for sign language recognition. Automatic sign language recognition could facilitate communication between deaf people and the hearing majority. Sign languages generally employ both semantic and finger-spelling signing. Semantic signing includes acting out to convey meaning, while finger spelling complements signing through the spelling out of proper names. Specifically, this article addresses an automatic recognition framework for the static point-on-hand (PoH) signing of Thai Finger Spelling (TFS)—the finger-spelling part of Thai Sign Language (TSL). From a pattern recognition perspective, PoH signing is quite distinct among signing schemes in its requirement for precise localization of key parts on the signing hands. A recent study addressed PoH using an off-the-shelf version of MediaPipe Hands (MPH) and found shortcomings, particularly when there was a high degree of hand-to-hand interaction. The top-down design of MPH was hypothesized to be the culprit. Our study investigates a bottom-up design, Part Affinity Fields (PAFs), along with an examination of related factors. The results support the hypothesis of a high degree of hand-to-hand interaction posited by the MPH study. However, the overall performance of the PAF-based approach is shown to be only modestly effective (72% accuracy vs. 58% and 47% for the MPH- and X-Pose-based approaches, respectively). In addition, its generalization is shown to be lacking. Thus, TFS point-on-hand sign recognition remains a challenge.
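For context, the core PAF association step (in the style of OpenPose, not the authors' implementation) scores a candidate connection between two detected keypoints by a line integral of the predicted 2D vector field along the connecting segment; `paf_x` and `paf_y` below are assumed to be the field's two channels at image resolution:

```python
import numpy as np

def paf_connection_score(paf_x, paf_y, p1, p2, num_samples=10):
    """Average alignment between the PAF field and the p1 -> p2 direction."""
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    v = p2 - p1
    length = np.linalg.norm(v)
    if length < 1e-8:
        return 0.0
    v /= length
    scores = []
    for t in np.linspace(0.0, 1.0, num_samples):
        x, y = np.round(p1 + t * (p2 - p1)).astype(int)
        scores.append(paf_x[y, x] * v[0] + paf_y[y, x] * v[1])  # dot product
    return float(np.mean(scores))
```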

17 pages, 1732 KB  
Article
Lightweight Visual Dynamic Gesture Recognition System Based on CNN-LSTM-DSA
by Zhenxing Wang, Ziyan Wu, Ruidi Qi and Xuan Dou
Sensors 2026, 26(5), 1558; https://doi.org/10.3390/s26051558 - 2 Mar 2026
Abstract
Addressing the challenges of large-scale gesture recognition models, high computational complexity, and inefficient deployment on embedded devices, this study designs and implements a visual dynamic gesture recognition system based on a lightweight CNN-LSTM-DSA model. The system captures user hand images via a camera, extracts the 3D coordinates of 21 hand keypoints using MediaPipe, and employs a lightweight hybrid model to perform spatial and temporal feature modeling on keypoint sequences, achieving high-precision recognition of complex dynamic gestures. In static gesture recognition, the system determines the gesture state through joint angle calculation and a sliding-window smoothing algorithm, ensuring smooth mapping of the servo motor angles and stability of the robotic hand’s movements. In dynamic gesture recognition, the system models the keypoint time series based on the CNN-LSTM-DSA hybrid model, enabling accurate classification and reproduction of gesture actions. Experimental results show that the proposed system demonstrates good robustness under various lighting and background conditions, with a static gesture recognition accuracy of up to 96%, a dynamic gesture recognition accuracy of 90.19%, and an overall response delay of less than 300 ms.
(This article belongs to the Section Sensing and Imaging)
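A hedged sketch of the static-gesture path described above: the angle at a joint computed from three MediaPipe hand keypoints, smoothed with a sliding window before being mapped to servo angles. The window length and angle definition are illustrative assumptions, not the paper's parameters:

```python
from collections import deque
import numpy as np

def joint_angle(a, b, c):
    """Angle at keypoint b (degrees) formed by 3D points a-b-c."""
    ba = np.asarray(a, float) - np.asarray(b, float)
    bc = np.asarray(c, float) - np.asarray(b, float)
    cos = ba @ bc / (np.linalg.norm(ba) * np.linalg.norm(bc) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

window = deque(maxlen=5)  # sliding-window length is an illustrative choice

def smoothed_angle(a, b, c):
    """Window mean stabilizes the servo-angle mapping from frame to frame."""
    window.append(joint_angle(a, b, c))
    return float(np.mean(window))
```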

7 pages, 5296 KB  
Proceeding Paper
Multi-Step Action Recognition for Long-Term Care Using Temporal Convolutional Network–Dynamic Time Warping–Finite State Machine and MediaPipe
by Feng-Jung Liu, Mei-Jou Lu and Min Chao
Eng. Proc. 2026, 129(1), 21; https://doi.org/10.3390/engproc2026129021 - 28 Feb 2026
Abstract
An intelligent multi-step action recognition system was designed for long-term caregiver training and assessment. Leveraging MediaPipe for precise and real-time human pose estimation, the system extracts detailed spatiotemporal body and hand keypoints. Temporal convolutional networks are employed to effectively capture temporal dependencies and complex features from sequential motion data. Dynamic time warping provides robust sequence alignment, allowing flexible comparison between performed actions and standard templates despite temporal variations in execution speed or style. A finite state machine imposes logical constraints by modeling expected action step sequences, enabling accurate detection of sequence anomalies or deviations. This hybrid architecture supports comprehensive evaluation and real-time feedback, facilitating improved caregiver skill acquisition, process adherence, and quality control within long-term care settings. The system aims to advance digital transformation in healthcare education by providing a scalable, precise, and adaptive training solution.
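The dynamic time warping stage named above follows the standard cumulative-cost recurrence D[i, j] = d(x_i, y_j) + min(D[i-1, j], D[i, j-1], D[i-1, j-1]); a compact generic version (not the authors' code) over keypoint-feature sequences:

```python
import numpy as np

def dtw_distance(x, y):
    """DTW cost between two (T, F) keypoint-feature sequences."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])  # frame-wise distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```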

22 pages, 1021 KB  
Article
Clinical Validation of an On-Device AI-Driven Real-Time Human Pose Estimation and Exercise Prescription Program: A Prospective Single-Arm Quasi-Experimental Study
by Seoyoon Heo, Taeseok Choi and Wansuk Choi
Healthcare 2026, 14(4), 482; https://doi.org/10.3390/healthcare14040482 - 13 Feb 2026
Abstract
Background: Physical inactivity remains a major public health challenge, particularly for underserved populations lacking exercise facility access. AI-powered smartphone applications with real-time human pose estimation offer scalable solutions, but they lack rigorous clinical validation. Objective: This study validates the clinical efficacy of a 16-week on-device AI-driven resistance training program using MediaPipe pose estimation technology in young adults with limited facility access. Primary outcomes included muscular strength (1RM squat), body composition, functional movement (FMS), and cardiorespiratory fitness (VO2max). Methods: A single-group pre–post study enrolled 216 participants (mean age 23.77 ± 4.02 years; 69.2% male), with 146 (67.6%) completing the protocol. Participants performed three 30 min weekly sessions of seven compound exercises delivered via a smartphone app providing real-time pose analysis (97.2% keypoint accuracy, 28.6 ms inference), multimodal feedback, and personalized progression using self-selected equipment. Results: Significant improvements were observed across all domains: muscular strength (+4.39 kg 1RM squat, p < 0.001, d = 1.148), body fat (−2.92%, p < 0.001, d = −1.373), skeletal muscle mass (+2.19 kg, p < 0.001, d = 1.433), FMS (+0.29 points, p = 0.001, d = 0.285), and VO2max (+1.82 mL/kg/min, p < 0.001, d = 0.917). Pose classification accuracy reached 95.8% vs. physiotherapist assessment (ICC = 0.94). Conclusions: This study provides the first clinical evidence that on-device AI pose estimation enables facility-independent resistance training with outcomes comparable to traditional programs. Unlike cloud-based systems, our lightweight model (28.6 ms inference) supports real-time mobile deployment, advancing accessible precision exercise medicine. Limitations include a single-arm design and gender imbalance, warranting future RCTs with diverse cohorts.
(This article belongs to the Special Issue Artificial Intelligence and Machine Learning in Rehabilitation)

27 pages, 5316 KB  
Article
Webcam-Based Exergame for Motor Recovery with Physical Assessment via DTW
by Norapat Labchurat, Kingkarn Sookhanaphibarn, Worawat Choensawat and Pujana Paliyawan
Sensors 2026, 26(4), 1219; https://doi.org/10.3390/s26041219 - 13 Feb 2026
Abstract
This paper presents RehabHub, a home-based exergaming system that integrates standardized physical assessment directly into gameplay by using a common webcam and MediaPipe for real-time pose estimation. The system quantifies upper-limb movement quality, specifically abduction, shoulder flexion, and elbow flexion based on FMA-UE guidelines, by applying Dynamic Time Warping (DTW) together with a Z-score-based scoring model that relies on data from non-clinical adult participants. A pilot study, which included movements simulated with a 5-kg resistance band, evaluated three feature-extraction methods. The findings indicate that the single-angle method provides the clearest distinction between normal and abnormal movements, particularly for abduction and elbow flexion. In the case of shoulder flexion, the score separation was less distinct because of movement variability and posture-related angle fluctuations, which suggests that further refinement of feature design is needed. The cloud-based platform supports remote monitoring and gives caregivers access to both performance scores and recorded exercise videos. Overall, the results demonstrate the feasibility of a low-cost webcam-based assessment integrated into exergaming, and they highlight important trends for improving abnormal-movement detection in home rehabilitation systems.
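The Z-score scoring model described above can be illustrated generically: a performed movement's DTW distance to the reference template is standardized against the distribution collected from non-clinical participants. The bounded score mapping below is an assumption, not the paper's formula:

```python
import numpy as np

def zscore_movement_score(dtw_dist, ref_mean, ref_std):
    """Standardize a DTW distance against the non-clinical reference
    distribution; larger positive z means further from typical movement."""
    z = (dtw_dist - ref_mean) / (ref_std + 1e-8)
    return float(np.clip(100.0 - 20.0 * max(z, 0.0), 0.0, 100.0))
```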

17 pages, 1091 KB  
Article
ASD Recognition Through Weighted Integration of Landmark-Based Handcrafted and Pixel-Based Deep Learning Features
by Asahi Sekine, Abu Saleh Musa Miah, Koki Hirooka, Najmul Hassan, Md. Al Mehedi Hasan, Yuichi Okuyama, Yoichi Tomioka and Jungpil Shin
Computers 2026, 15(2), 124; https://doi.org/10.3390/computers15020124 - 13 Feb 2026
Abstract
Autism Spectrum Disorder (ASD) is a neurological condition that affects communication and social interaction skills, with individuals experiencing a range of challenges that often require specialized care. Automated systems for recognizing ASD face significant challenges due to the complexity of identifying distinguishing features from facial images. This study proposes an incremental advancement in ASD recognition by introducing a dual-stream model that combines handcrafted facial-landmark features with deep learning-based pixel-level features. The model processes images through two distinct streams to capture complementary aspects of facial information. In the first stream, facial landmarks are extracted using MediaPipe (v0.10.21), with a focus on 137 symmetric landmarks. The face’s position is adjusted using in-plane rotation based on eye-corner angles, and geometric features along with 52 blendshape features are processed through Dense layers. In the second stream, RGB image features are extracted using pre-trained CNNs (e.g., ResNet50V2, DenseNet121, InceptionV3) enhanced with Squeeze-and-Excitation (SE) blocks, followed by feature refinement through Global Average Pooling (GAP) and Dense layers. The outputs from both streams are fused using weighted concatenation through a softmax gate, followed by further feature refinement for classification. This hybrid approach significantly improves the ability to distinguish between ASD and non-ASD faces, demonstrating the benefits of combining geometric and pixel-based features. The model achieved an accuracy of 96.43% on the Kaggle dataset and 97.83% on the YTUIA dataset. Statistical hypothesis testing further confirms that the proposed approach provides a statistically meaningful advantage over strong baselines, particularly in terms of classification correctness and robustness across datasets. While these results are promising, they show incremental improvements over existing methods, and future work will focus on optimizing performance to exceed current benchmarks.
(This article belongs to the Special Issue Machine and Deep Learning in the Health Domain (3rd Edition))
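The in-plane rotation step mentioned above is a standard roll correction. A sketch using the outer eye corners commonly referenced in MediaPipe Face Mesh (indices 33 and 263; the paper's exact landmark choice is not stated here):

```python
import numpy as np

EYE_L, EYE_R = 33, 263  # commonly used outer eye corners in MediaPipe Face Mesh

def align_in_plane(landmarks):
    """Rotate an (N, 2) landmark array about its centroid so the line
    through the eye corners becomes horizontal (roll correction)."""
    l, r = landmarks[EYE_L], landmarks[EYE_R]
    theta = -np.arctan2(r[1] - l[1], r[0] - l[0])
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    center = landmarks.mean(axis=0)
    return (landmarks - center) @ R.T + center
```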

31 pages, 3468 KB  
Article
From RGB-D to RGB-Only: Reliability and Clinical Relevance of Markerless Skeletal Tracking for Postural Assessment in Parkinson’s Disease
by Claudia Ferraris, Gianluca Amprimo, Gabriella Olmo, Marco Ghislieri, Martina Patera, Antonio Suppa, Silvia Gallo, Gabriele Imbalzano, Leonardo Lopiano and Carlo Alberto Artusi
Sensors 2026, 26(4), 1146; https://doi.org/10.3390/s26041146 - 10 Feb 2026
Abstract
Axial postural abnormalities in Parkinson’s Disease (PD) are traditionally assessed using clinical rating scales, although picture-based assessment is considered the gold standard. This study evaluates the reliability and clinical relevance of two markerless body-tracking frameworks, the RGB-D-based Microsoft Azure Kinect (providing the reference KIN_3D model) and the RGB-only Google MediaPipe Pose (MP), using a synchronous dual-camera setup. Forty PD patients performed a 60 s static standing task. We compared KIN_3D with three MP models (at different complexity levels) across horizontal, vertical, sagittal, and 3D joint angles. Results show that lower-complexity MP models achieved high congruence with KIN_3D for trunk and shoulder alignment (ρ > 0.75), while the lateral view significantly improved tracking of sagittal angles (ρ ≥ 0.72). Conversely, the high-complexity model introduced significant skeletal distortions. Clinically, several angular parameters emerged as robust metrics for postural assessment and global motor impairments, while sagittal angles correlated with motor complications. Unexpectedly, a more upright frontal alignment was associated with greater freezing of gait severity, suggesting that static postural metrics may serve as proxies for dynamic gait performance. In addition, both RGB-only and RGB-D frameworks effectively discriminated between postural severity clusters. While the higher-complexity MP model should be avoided due to inaccurate 3D reconstructions, our findings demonstrate that low- and medium-complexity MP models represent a reliable alternative to RGB-D sensors for objective postural assessment in PD, facilitating the widespread application of objective posture measurements in clinical contexts.
(This article belongs to the Special Issue Sensors for Human Motion Analysis and Applications)
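As an illustration of the angular parameters compared above, a frontal-plane trunk-tilt angle can be derived from shoulder and hip midpoints. Landmark indices follow MediaPipe Pose; the study's precise angle definitions may differ:

```python
import numpy as np

L_SHO, R_SHO, L_HIP, R_HIP = 11, 12, 23, 24  # MediaPipe Pose landmark indices

def lateral_trunk_tilt(lm):
    """Angle (degrees) between the hip-to-shoulder midline and vertical,
    for a (33, 3) landmark array with image y growing downward."""
    trunk = (lm[L_SHO] + lm[R_SHO]) / 2.0 - (lm[L_HIP] + lm[R_HIP]) / 2.0
    vertical = np.array([0.0, -1.0, 0.0])
    cos = trunk @ vertical / (np.linalg.norm(trunk) + 1e-8)
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```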

17 pages, 7804 KB  
Article
A 3D Camera-Based Approach for Real-Time Hand Configuration Recognition in Italian Sign Language
by Luca Ulrich, Asia De Luca, Riccardo Miraglia, Emma Mulassano, Simone Quattrocchio, Giorgia Marullo, Chiara Innocente, Federico Salerno and Enrico Vezzetti
Sensors 2026, 26(3), 1059; https://doi.org/10.3390/s26031059 - 6 Feb 2026
Abstract
Deafness poses significant challenges to effective communication, particularly in contexts where access to sign language interpreters is limited. Hand configuration recognition represents a fundamental component of sign language understanding, as configurations constitute a core cheremic element in many sign languages, including Italian Sign Language (LIS). In this work, we address configuration-level recognition as an independent classification task and propose a machine vision framework based on RGB-D sensing. The proposed approach combines MediaPipe-based hand landmark extraction with normalized three-dimensional geometric features and a Support Vector Machine classifier. The first contribution of this study is the formulation of LIS hand configuration recognition as a standalone, configuration-level problem, decoupled from temporal gesture modeling. The second contribution is the integration of sensor-acquired RGB-D depth measurements into the landmark-based feature representation, enabling a direct comparison with estimated depth obtained from monocular data. The third contribution consists of a systematic experimental evaluation on two LIS configuration sets (6 and 16 classes), demonstrating that the use of real depth significantly improves classification performance and class separability, particularly for geometrically similar configurations. The results highlight the critical role of depth quality in configuration-level recognition and provide insights into the design of robust vision-based systems for LIS analysis.
(This article belongs to the Special Issue Sensing and Machine Learning Control: Progress and Applications)
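A minimal sketch of the classification stage described above, assuming wrist-centred, scale-normalized landmarks and an RBF SVM (hyperparameters illustrative). With real RGB-D input, the z column would carry measured rather than monocular-estimated depth:

```python
import numpy as np
from sklearn.svm import SVC

def normalize_hand(landmarks):
    """landmarks: (21, 3) hand keypoints. Translate to the wrist (index 0)
    and scale by the wrist-to-middle-MCP distance (index 9) so features
    are position- and size-invariant."""
    centred = landmarks - landmarks[0]
    scale = np.linalg.norm(centred[9]) + 1e-8
    return (centred / scale).ravel()  # shape (63,)

# X: (n_samples, 63) feature matrix; y: configuration labels
clf = SVC(kernel="rbf", C=10.0)
# clf.fit(X_train, y_train); preds = clf.predict(X_test)
```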

22 pages, 763 KB  
Article
Comparative Evaluation of LSTM and 3D CNN Models in a Hybrid System for IoT-Enabled Sign-to-Text Translation in Deaf Communities
by Samar Mouti, Hani Al Chalabi, Mohammed Abushohada, Samer Rihawi and Sulafa Abdalla
Informatics 2026, 13(2), 27; https://doi.org/10.3390/informatics13020027 - 5 Feb 2026
Abstract
This paper presents a hybrid deep learning framework for real-time sign language recognition (SLR) tailored to Internet of Things (IoT)-enabled environments, enhancing accessibility for Deaf communities. The proposed system integrates a Long Short-Term Memory (LSTM) network for static gesture recognition and a 3D Convolutional Neural Network (3D CNN) for dynamic gesture recognition. Implemented on a Raspberry Pi device using MediaPipe for landmark extraction, the system supports low-latency, on-device inference suitable for resource-constrained edge computing. Experimental results demonstrate that the LSTM model achieves its highest stability and performance for static signs at 1000 training epochs, yielding an average F1-score of 0.938 and an accuracy of 86.67%. In contrast, at 2000 epochs, the model exhibits a catastrophic performance collapse (F1-score of 0.088) due to overfitting and weight instability, highlighting the necessity of careful training regulation. Despite this, the overall system achieves consistently high classification performance under controlled conditions. Meanwhile, the 3D CNN component maintains robust and consistent performance across all evaluated training phases (500–2000 epochs), achieving up to 99.6% accuracy on dynamic signs. When deployed on a Raspberry Pi platform, the system achieves real-time performance with a frame rate of 12–15 FPS and an average inference latency of approximately 65 ms per frame. The hybrid architecture effectively balances recognition accuracy with computational efficiency by routing static gestures to the LSTM and dynamic gestures to the 3D CNN. This work presents a detailed epoch-wise comparative analysis of model stability and computational feasibility, contributing a practical and scalable IoT-enabled solution for inclusive, real-time sign-to-text communication in intelligent environments.
(This article belongs to the Section Machine Learning)
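The static/dynamic routing at the heart of the hybrid design could be as simple as a motion-energy gate over the landmark sequence; the metric and threshold below are assumptions, not taken from the paper:

```python
import numpy as np

MOTION_THRESHOLD = 0.01  # mean per-frame landmark displacement, illustrative

def route_gesture(landmark_seq):
    """landmark_seq: (T, N, 3) landmark positions over a clip. Low overall
    motion routes to the static LSTM branch, high motion to the 3D CNN."""
    motion = np.mean(np.linalg.norm(np.diff(landmark_seq, axis=0), axis=-1))
    return "static_lstm" if motion < MOTION_THRESHOLD else "dynamic_3dcnn"
```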

14 pages, 5659 KB  
Article
Computer Vision 360 Video Analysis in Sports: 3D Athlete Pose and Rig Motion Estimation in Olympic Sailing
by Lars Martin Ølstad, Eirik E. Semb, Sander Hjortland and Martin Steinert
Appl. Sci. 2026, 16(3), 1386; https://doi.org/10.3390/app16031386 - 29 Jan 2026
Abstract
This paper presents a novel system for estimating 3D athlete pose, boom angle, and rudder angle in Olympic dinghy sailing using onboard 360° video footage. The proposed approach integrates adaptive panorama slicing, keypoint-based rig detection, and geometric ray-casting into an end-to-end pipeline for quantitative performance analysis under real-world on-water conditions. Traditionally, restrictive International Laser Class Association (ILCA) rules have prohibited advanced sensor systems during competition. However, recent rule changes permit a single onboard camera, enabling unobtrusive and rule-compliant measurement solutions. The purpose of this study is to evaluate whether a competition-legal 360° camera combined with computer vision can provide meaningful performance-related measurements in Olympic sailing. The experimental results indicate that computer-vision-based analysis can complement traditional performance assessment and provide access to data previously limited to physical sensors or manual estimation. The system can support teams and coaches in identifying technique-related performance opportunities.
(This article belongs to the Special Issue Sports Performance: Data Measurement, Analysis and Improvement)

19 pages, 3470 KB  
Article
Driver Monitoring System Using Computer Vision for Real-Time Detection of Fatigue, Distraction and Emotion via Facial Landmarks and Deep Learning
by Tamia Zambrano, Luis Arias, Edgar Haro, Victor Santos and María Trujillo-Guerrero
Sensors 2026, 26(3), 889; https://doi.org/10.3390/s26030889 - 29 Jan 2026
Abstract
Car accidents remain a leading cause of death worldwide, with drowsiness and distraction accounting for roughly 25% of fatal crashes in Ecuador. This study presents a real-time driver monitoring system that uses computer vision and deep learning to detect fatigue, distraction, and emotions from facial expressions. It combines a MobileNetV2-based CNN trained on RAF-DB for emotion recognition and MediaPipe’s 468 facial landmarks to compute the EAR (Eye Aspect Ratio), the MAR (Mouth Aspect Ratio), the gaze, and the head pose. Tests with 27 participants in both real and simulated driving environments showed strong results: 100% accuracy in detecting distraction, 85.19% for yawning, and 88.89% for eye closure. The system also effectively recognized happiness (100%) and anger/disgust (96.3%). However, it struggled with sadness and failed to detect fear, likely due to the subtlety of real-world expressions and limitations in the training dataset. Despite these challenges, the results highlight the importance of integrating emotional awareness into driver monitoring systems, which helps reduce false alarms and improve response accuracy. This work supports the development of lightweight, non-invasive technologies that enhance driving safety through intelligent behavior analysis.
(This article belongs to the Special Issue Sensor Fusion for the Safety of Automated Driving Systems)
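The EAR referenced above is the standard eye aspect ratio, EAR = (||p2 - p6|| + ||p3 - p5||) / (2 ||p1 - p4||), computed over six eye landmarks selected from the 468-point face mesh. The landmark ordering below is the conventional one, not necessarily the authors' index choice:

```python
import numpy as np

def eye_aspect_ratio(p):
    """p: (6, 2) eye landmarks ordered corner, upper x2, corner, lower x2."""
    v1 = np.linalg.norm(p[1] - p[5])
    v2 = np.linalg.norm(p[2] - p[4])
    h = np.linalg.norm(p[0] - p[3])
    return float((v1 + v2) / (2.0 * h + 1e-8))

# Eye closure is typically flagged when EAR stays below a threshold
# (e.g., 0.2) for several consecutive frames.
```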

22 pages, 8373 KB  
Article
Real-Time Automated Ergonomic Monitoring: A Bio-Inspired System Using 3D Computer Vision
by Gabriel Andrés Zamorano Núñez, Nicolás Norambuena, Isabel Cuevas Quezada, José Luis Valín Rivera, Javier Narea Olmos and Cristóbal Galleguillos Ketterer
Biomimetics 2026, 11(2), 88; https://doi.org/10.3390/biomimetics11020088 - 26 Jan 2026
Abstract
Work-related musculoskeletal disorders (MSDs) remain a global occupational health priority, with recognized limitations in current point-in-time assessment methodologies. This research extends prior computer vision ergonomic assessment approaches by implementing biological proprioceptive feedback principles into a continuous, real-time monitoring system. Unlike traditional periodic ergonomic evaluation methods such as “Rapid Upper Limb Assessment” (RULA), our bio-inspired system translates natural proprioceptive mechanisms—which enable continuous postural monitoring through spinal feedback loops operating at 50–150 ms latencies—into automated assessment technology. The system integrates (1) markerless 3D pose estimation via MediaPipe Holistic (33 anatomical landmarks at 30 FPS), (2) depth validation via Orbbec Femto Mega RGB-D camera (640 × 576 resolution, Time-of-Flight sensor), and (3) proprioceptive-inspired alert architecture. Experimental validation with 40 adult participants (age 18–25, n = 26 female, n = 14 male) performing standardized load-lifting tasks (6 kg) demonstrated that 62.5% exhibited critical postural risk (RULA ≥ 5) during dynamic movement versus 7.5% at static rest, with McNemar test p < 0.001 (Cohen’s h = 1.22, 95% CI: 0.91–0.97). The system achieved 95% Pearson correlation between risk elevation and alert activation, with response latency of 42.1 ± 8.3 ms. This work demonstrates technical feasibility for continuous occupational monitoring. However, long-term prospective studies are required to establish whether continuous real-time feedback reduces workplace injury incidence. The biomimetic design framework provides a systematic foundation for translating biological feedback principles into occupational health technology.
(This article belongs to the Section Bioinspired Sensorics, Information Processing and Control)
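A sketch of the proprioceptive-inspired alert logic implied above: a continuously computed RULA-style score triggers feedback only after risk persists briefly, echoing the 50–150 ms biological latency band the authors cite. The threshold and dwell time are illustrative, not the authors' parameters:

```python
from collections import deque

RISK_THRESHOLD = 5   # RULA >= 5: action-level risk
DWELL_FRAMES = 3     # ~100 ms at 30 FPS, inside the 50-150 ms band

recent = deque(maxlen=DWELL_FRAMES)

def should_alert(rula_score):
    """Fire only when the score stays at or above threshold for the window."""
    recent.append(rula_score)
    return len(recent) == DWELL_FRAMES and all(s >= RISK_THRESHOLD for s in recent)
```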

21 pages, 8159 KB  
Article
Accuracy and Reliability of Markerless Human Pose Estimation for Upper Limb Kinematic Analysis Across Full and Partial Range of Motion Tasks
by Carlalberto Francia, Lucia Donno, Filippo Motta, Veronica Cimolin, Manuela Galli and Antonella LoMauro
Appl. Sci. 2026, 16(3), 1202; https://doi.org/10.3390/app16031202 - 24 Jan 2026
Abstract
Markerless human pose estimation is increasingly used for kinematic assessment, but evidence of its applicability to upper limb movements across different ranges of motion (ROM) remains limited. This study examined the accuracy and reliability of a markerless pose estimation system for shoulder, elbow and wrist flexion–extension analysis under full and partial ROM tasks. Ten healthy participants performed standardized movements which were synchronously recorded, with an optoelectronic motion capture system used as a reference. Joint angles were compared using RMSE, percentage RMSE (%RMSE), accuracy (Acc), intraclass correlation coefficients (ICC), and Pearson correlation of ROM values. The markerless system reproduced the temporal morphology of the movement with high coherence, showing ICC values above 0.91 for the elbow and 0.94 for the shoulder in full ROM trials. Wrist tracking presented the lowest RMSE values and low inter-subject variability. The main critical aspect was a systematic underestimation of maximum flexion, especially at the shoulder, indicating a magnitude bias likely influenced by occlusion and joint geometry rather than by temporal fluctuations. Despite this limitation, the system adapted consistently to different ROM amplitudes, maintaining proportional variations in joint excursion across tasks. Overall, the findings outline the conditions in which markerless pose estimation provides reliable upper limb kinematics and where methodological improvements are still required, particularly in movements involving extreme flexion and occlusion.
(This article belongs to the Section Mechanical Engineering)
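The agreement metrics named above (RMSE, %RMSE normalized by the reference excursion, and Pearson correlation of ROM values) in generic code form; a sketch, not the study's analysis script:

```python
import numpy as np

def rmse(markerless, reference):
    d = np.asarray(markerless, float) - np.asarray(reference, float)
    return float(np.sqrt(np.mean(d ** 2)))

def pct_rmse(markerless, reference):
    excursion = np.max(reference) - np.min(reference)  # reference joint ROM
    return 100.0 * rmse(markerless, reference) / (excursion + 1e-8)

def rom_correlation(rom_a, rom_b):
    return float(np.corrcoef(rom_a, rom_b)[0, 1])  # Pearson r across trials
```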

27 pages, 11232 KB  
Article
Aerokinesis: An IoT-Based Vision-Driven Gesture Control System for Quadcopter Navigation Using Deep Learning and ROS2
by Sergei Kondratev, Yulia Dyrchenkova, Georgiy Nikitin, Leonid Voskov, Vladimir Pikalov and Victor Meshcheryakov
Technologies 2026, 14(1), 69; https://doi.org/10.3390/technologies14010069 - 16 Jan 2026
Abstract
This paper presents Aerokinesis, an IoT-based software–hardware system for intuitive gesture-driven control of quadcopter unmanned aerial vehicles (UAVs), developed within the Robot Operating System 2 (ROS2) framework. The proposed system addresses the challenge of providing an accessible human–drone interaction interface for operators in scenarios where traditional remote controllers are impractical or unavailable. The architecture comprises two hierarchical control levels: (1) high-level discrete command control utilizing a fully connected neural network classifier for static gesture recognition, and (2) low-level continuous flight control based on three-dimensional hand keypoint analysis from a depth camera. The gesture classification module achieves an accuracy exceeding 99% using a multi-layer perceptron trained on MediaPipe-extracted hand landmarks. For continuous control, we propose a novel approach that computes Euler angles (roll, pitch, yaw) and throttle from 3D hand pose estimation, enabling intuitive four-degree-of-freedom quadcopter manipulation. A hybrid signal filtering pipeline ensures robust control signal generation while maintaining real-time responsiveness. Comparative user studies demonstrate that gesture-based control reduces task completion time by 52.6% for beginners compared to conventional remote controllers. The results confirm the viability of vision-based gesture interfaces for IoT-enabled UAV applications.
(This article belongs to the Section Information and Communication Technologies)
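A hedged sketch of the continuous-control mapping described above: roll, pitch, and yaw estimated from a hand coordinate frame built out of MediaPipe-style 3D keypoints. The landmark choice and axis conventions are assumptions, not the paper's geometry:

```python
import numpy as np

WRIST, INDEX_MCP, PINKY_MCP = 0, 5, 17  # MediaPipe hand landmark indices

def hand_euler_angles(lm):
    """lm: (21, 3) hand keypoints -> (roll, pitch, yaw) in degrees."""
    x_axis = lm[INDEX_MCP] - lm[PINKY_MCP]                      # across the palm
    y_axis = (lm[INDEX_MCP] + lm[PINKY_MCP]) / 2.0 - lm[WRIST]  # wrist to knuckles
    z_axis = np.cross(x_axis, y_axis)                           # palm normal
    roll = np.degrees(np.arctan2(x_axis[1], x_axis[0]))
    pitch = np.degrees(np.arctan2(y_axis[2], np.linalg.norm(y_axis[:2])))
    yaw = np.degrees(np.arctan2(z_axis[0], z_axis[2]))
    return roll, pitch, yaw
```

In a full pipeline, these angles and a throttle channel would then be filtered and mapped to four-degree-of-freedom setpoints, as the paper describes.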
