Search Results (68)

Search Parameters:
Keywords = MediaPipe Pose

19 pages, 3470 KB  
Article
Driver Monitoring System Using Computer Vision for Real-Time Detection of Fatigue, Distraction and Emotion via Facial Landmarks and Deep Learning
by Tamia Zambrano, Luis Arias, Edgar Haro, Victor Santos and María Trujillo-Guerrero
Sensors 2026, 26(3), 889; https://doi.org/10.3390/s26030889 - 29 Jan 2026
Abstract
Car accidents remain a leading cause of death worldwide, with drowsiness and distraction accounting for roughly 25% of fatal crashes in Ecuador. This study presents a real-time driver monitoring system that uses computer vision and deep learning to detect fatigue, distraction, and emotions from facial expressions. It combines a MobileNetV2-based CNN trained on RAF-DB for emotion recognition with MediaPipe’s 468 facial landmarks to compute the EAR (Eye Aspect Ratio), the MAR (Mouth Aspect Ratio), gaze, and head pose. Tests with 27 participants in both real and simulated driving environments showed strong results: 100% accuracy in detecting distraction, 85.19% for yawning, and 88.89% for eye closure. The system also effectively recognized happiness (100%) and anger/disgust (96.3%). However, it struggled with sadness and failed to detect fear, likely due to the subtlety of real-world expressions and limitations in the training dataset. Despite these challenges, the results highlight the importance of integrating emotional awareness into driver monitoring systems, which helps reduce false alarms and improve response accuracy. This work supports the development of lightweight, non-invasive technologies that enhance driving safety through intelligent behavior analysis.
(This article belongs to the Special Issue Sensor Fusion for the Safety of Automated Driving Systems)
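
As a rough illustration of the landmark-based measures mentioned in this abstract, the sketch below computes the Eye Aspect Ratio (EAR) from MediaPipe Face Mesh output. It is not the authors' code; the six eye indices and the input file name are assumptions, and the same pattern extends to the MAR.

```python
# Illustrative sketch (not the authors' code): Eye Aspect Ratio from MediaPipe Face Mesh.
# The six left-eye indices below are a commonly used choice for the 468-point mesh and
# are an assumption here, as is the input file name.
import cv2
import numpy as np
import mediapipe as mp

LEFT_EYE = [33, 160, 158, 133, 153, 144]  # assumed p1..p6 ordering around the left eye

def eye_aspect_ratio(p):
    # EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|); low values indicate a closed eye
    p1, p2, p3, p4, p5, p6 = p
    return (np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)) / (2.0 * np.linalg.norm(p1 - p4))

frame = cv2.imread("driver_frame.jpg")                 # placeholder input image (BGR)
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
with mp.solutions.face_mesh.FaceMesh(static_image_mode=True, refine_landmarks=True) as fm:
    result = fm.process(rgb)
if result.multi_face_landmarks:
    lm = result.multi_face_landmarks[0].landmark
    h, w = frame.shape[:2]
    eye = np.array([[lm[i].x * w, lm[i].y * h] for i in LEFT_EYE])
    print("EAR:", eye_aspect_ratio(eye))
```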

22 pages, 8373 KB  
Article
Real-Time Automated Ergonomic Monitoring: A Bio-Inspired System Using 3D Computer Vision
by Gabriel Andrés Zamorano Núñez, Nicolás Norambuena, Isabel Cuevas Quezada, José Luis Valín Rivera, Javier Narea Olmos and Cristóbal Galleguillos Ketterer
Biomimetics 2026, 11(2), 88; https://doi.org/10.3390/biomimetics11020088 - 26 Jan 2026
Abstract
Work-related musculoskeletal disorders (MSDs) remain a global occupational health priority, with recognized limitations in current point-in-time assessment methodologies. This research extends prior computer vision ergonomic assessment approaches by implementing biological proprioceptive feedback principles into a continuous, real-time monitoring system. Unlike traditional periodic ergonomic evaluation methods such as “Rapid Upper Limb Assessment” (RULA), our bio-inspired system translates natural proprioceptive mechanisms—which enable continuous postural monitoring through spinal feedback loops operating at 50–150 ms latencies—into automated assessment technology. The system integrates (1) markerless 3D pose estimation via MediaPipe Holistic (33 anatomical landmarks at 30 FPS), (2) depth validation via Orbbec Femto Mega RGB-D camera (640 × 576 resolution, Time-of-Flight sensor), and (3) proprioceptive-inspired alert architecture. Experimental validation with 40 adult participants (age 18–25, n = 26 female, n = 14 male) performing standardized load-lifting tasks (6 kg) demonstrated that 62.5% exhibited critical postural risk (RULA ≥ 5) during dynamic movement versus 7.5% at static rest, with McNemar test p < 0.001 (Cohen’s h = 1.22, 95% CI: 0.91–0.97). The system achieved 95% Pearson correlation between risk elevation and alert activation, with response latency of 42.1 ± 8.3 ms. This work demonstrates technical feasibility for continuous occupational monitoring. However, long-term prospective studies are required to establish whether continuous real-time feedback reduces workplace injury incidence. The biomimetic design framework provides a systematic foundation for translating biological feedback principles into occupational health technology.
(This article belongs to the Section Bioinspired Sensorics, Information Processing and Control)
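
The posture quantities behind RULA-style scoring reduce to joint angles between landmark segments. The following sketch, not taken from the paper, shows one such angle computed from MediaPipe Pose/Holistic world landmarks; the choice of right-elbow landmarks is illustrative.

```python
# Hedged sketch (not the paper's pipeline): a joint angle from three MediaPipe landmarks,
# the elementary quantity behind RULA-style posture scores. Landmark choice is illustrative.
import numpy as np
import mediapipe as mp

def joint_angle(a, b, c):
    # Angle at vertex b (degrees) between segments b->a and b->c.
    a, b, c = np.asarray(a), np.asarray(b), np.asarray(c)
    u, v = a - b, c - b
    cos_ang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(cos_ang, -1.0, 1.0))))

PL = mp.solutions.pose.PoseLandmark

def right_elbow_angle(landmarks):
    # landmarks: results.pose_world_landmarks.landmark from MediaPipe Pose or Holistic
    get = lambda i: (landmarks[i].x, landmarks[i].y, landmarks[i].z)
    return joint_angle(get(PL.RIGHT_SHOULDER), get(PL.RIGHT_ELBOW), get(PL.RIGHT_WRIST))
```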

21 pages, 8159 KB  
Article
Accuracy and Reliability of Markerless Human Pose Estimation for Upper Limb Kinematic Analysis Across Full and Partial Range of Motion Tasks
by Carlalberto Francia, Lucia Donno, Filippo Motta, Veronica Cimolin, Manuela Galli and Antonella LoMauro
Appl. Sci. 2026, 16(3), 1202; https://doi.org/10.3390/app16031202 - 24 Jan 2026
Abstract
Markerless human pose estimation is increasingly used for kinematic assessment, but evidence of its applicability to upper limb movements across different ranges of motion (ROM) remains limited. This study examined the accuracy and reliability of a markerless pose estimation system for shoulder, elbow and wrist flexion–extension analysis under full and partial ROM tasks. Ten healthy participants performed standardized movements which were synchronously recorded, with an optoelectronic motion capture system used as a reference. Joint angles were compared using RMSE, percentage RMSE (%RMSE), accuracy (Acc), intraclass correlation coefficients (ICC), and Pearson correlation of ROM values. The markerless system reproduced the temporal morphology of the movement with high coherence, showing ICC values above 0.91 for the elbow and 0.94 for the shoulder in full ROM trials. Wrist tracking presented the lowest RMSE values and low inter-subject variability. The main critical aspect was a systematic underestimation of maximum flexion, especially at the shoulder, indicating a magnitude bias likely influenced by occlusion and joint geometry rather than by temporal fluctuations. Despite this limitation, the system adapted consistently to different ROM amplitudes, maintaining proportional variations in joint excursion across tasks. Overall, the findings outline the conditions in which markerless pose estimation provides reliable upper limb kinematics and where methodological improvements are still required, particularly in movements involving extreme flexion and occlusion.
(This article belongs to the Section Mechanical Engineering)
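
A minimal sketch of the agreement metrics named in the abstract (RMSE, %RMSE, Pearson r) applied to two joint-angle time series; the normalization of %RMSE by the reference range of motion is an assumption about the authors' definition.

```python
# Minimal sketch (hypothetical data): agreement metrics between markerless and reference
# joint-angle series. The %RMSE normalization by the reference ROM is an assumption.
import numpy as np
from scipy.stats import pearsonr

def agreement(markerless, reference):
    markerless, reference = np.asarray(markerless), np.asarray(reference)
    err = markerless - reference
    rmse = float(np.sqrt(np.mean(err ** 2)))
    pct_rmse = 100.0 * rmse / (reference.max() - reference.min())  # normalized by reference ROM
    r, _ = pearsonr(markerless, reference)
    return rmse, pct_rmse, float(r)
```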

27 pages, 11232 KB  
Article
Aerokinesis: An IoT-Based Vision-Driven Gesture Control System for Quadcopter Navigation Using Deep Learning and ROS2
by Sergei Kondratev, Yulia Dyrchenkova, Georgiy Nikitin, Leonid Voskov, Vladimir Pikalov and Victor Meshcheryakov
Technologies 2026, 14(1), 69; https://doi.org/10.3390/technologies14010069 - 16 Jan 2026
Abstract
This paper presents Aerokinesis, an IoT-based software–hardware system for intuitive gesture-driven control of quadcopter unmanned aerial vehicles (UAVs), developed within the Robot Operating System 2 (ROS2) framework. The proposed system addresses the challenge of providing an accessible human–drone interaction interface for operators in scenarios where traditional remote controllers are impractical or unavailable. The architecture comprises two hierarchical control levels: (1) high-level discrete command control utilizing a fully connected neural network classifier for static gesture recognition, and (2) low-level continuous flight control based on three-dimensional hand keypoint analysis from a depth camera. The gesture classification module achieves an accuracy exceeding 99% using a multi-layer perceptron trained on MediaPipe-extracted hand landmarks. For continuous control, we propose a novel approach that computes Euler angles (roll, pitch, yaw) and throttle from 3D hand pose estimation, enabling intuitive four-degree-of-freedom quadcopter manipulation. A hybrid signal filtering pipeline ensures robust control signal generation while maintaining real-time responsiveness. Comparative user studies demonstrate that gesture-based control reduces task completion time by 52.6% for beginners compared to conventional remote controllers. The results confirm the viability of vision-based gesture interfaces for IoT-enabled UAV applications.
(This article belongs to the Section Information and Communication Technologies)
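
To illustrate the high-level gesture path described here, the hedged sketch below pairs MediaPipe Hands landmark extraction with a scikit-learn multi-layer perceptron. The feature layout, network size, and training data are hypothetical, not the authors' configuration.

```python
# Sketch under assumptions (not the authors' code): MediaPipe hand landmarks feeding a
# multi-layer perceptron for static gesture classification. Feature layout, network size,
# and training data (X, y) are hypothetical.
import numpy as np
import mediapipe as mp
from sklearn.neural_network import MLPClassifier

def hand_features(rgb_image):
    # 21 landmarks * (x, y, z) -> 63-dim feature vector, or None if no hand is detected
    with mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        res = hands.process(rgb_image)
    if not res.multi_hand_landmarks:
        return None
    lm = res.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm]).ravel()

clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500)
# clf.fit(X, y)                                   # X: (n, 63) features, y: gesture labels
# command = clf.predict(hand_features(frame_rgb).reshape(1, -1))
```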

27 pages, 80350 KB  
Article
Pose-Based Static Sign Language Recognition with Deep Learning for Turkish, Arabic, and American Sign Languages
by Rıdvan Yayla, Hakan Üçgün and Mahmud Abbas
Sensors 2026, 26(2), 524; https://doi.org/10.3390/s26020524 - 13 Jan 2026
Abstract
Advancements in artificial intelligence have significantly enhanced communication for individuals with hearing impairments. This study presents a robust cross-lingual Sign Language Recognition (SLR) framework for Turkish, American English, and Arabic sign languages. The system utilizes the lightweight MediaPipe library for efficient hand landmark extraction, ensuring stable and consistent feature representation across diverse linguistic contexts. Datasets were meticulously constructed from nine public-domain sources (four Arabic, three American, and two Turkish). The final training data comprises curated image datasets, with frames for each language carefully selected from varying angles and distances to ensure high diversity. A comprehensive comparative evaluation was conducted across three state-of-the-art deep learning architectures—ConvNeXt (CNN-based), Swin Transformer (ViT-based), and Vision Mamba (SSM-based)—all applied to identical feature sets. The evaluation demonstrates the superior performance of contemporary vision Transformers and state space models in capturing subtle spatial cues across diverse sign languages. Our approach provides a comparative analysis of model generalization capabilities across three distinct sign languages, offering valuable insights for model selection in pose-based SLR systems.
(This article belongs to the Special Issue Sensor Systems for Gesture Recognition (3rd Edition))

20 pages, 4633 KB  
Article
Teleoperation System for Service Robots Using a Virtual Reality Headset and 3D Pose Estimation
by Tiago Ribeiro, Eduardo Fernandes, António Ribeiro, Carolina Lopes, Fernando Ribeiro and Gil Lopes
Sensors 2026, 26(2), 471; https://doi.org/10.3390/s26020471 - 10 Jan 2026
Abstract
This paper presents an immersive teleoperation framework for service robots that combines real-time 3D human pose estimation with a Virtual Reality (VR) interface to support intuitive, natural robot control. The operator is tracked using MediaPipe for 2D landmark detection and an Intel RealSense D455 RGB-D (Red-Green-Blue plus Depth) camera for depth acquisition, enabling 3D reconstruction of key joints. Joint angles are computed using efficient vector operations and mapped to the kinematic constraints of an anthropomorphic arm on the CHARMIE service robot. A VR-based telepresence interface provides stereoscopic video and head-motion-based view control to improve situational awareness during manipulation tasks. Experiments in real-world object grasping demonstrate reliable arm teleoperation and effective telepresence; however, vision-only estimation remains limited for axial rotations (e.g., elbow and wrist yaw), particularly under occlusions and unfavorable viewpoints. The proposed system provides a practical pathway toward low-cost, sensor-driven, immersive human–robot interaction for service robotics in dynamic environments.
(This article belongs to the Section Intelligent Sensors)
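
Lifting a 2D MediaPipe landmark to 3D with a depth reading is a standard pinhole back-projection. The sketch below shows that step only, with camera intrinsics passed in as assumed parameters; it is not the CHARMIE pipeline itself.

```python
# Illustrative sketch (assumed intrinsics, not the authors' implementation): back-projecting
# a 2D MediaPipe landmark to a camera-frame 3D point using a depth value (pinhole model).
import numpy as np

def backproject(u, v, depth_m, fx, fy, cx, cy):
    # Pixel (u, v) with depth z -> 3D point (X, Y, Z) in metres, camera frame
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

# Usage idea: u, v = lm.x * width, lm.y * height (MediaPipe returns normalized coordinates),
# with depth_m read from the aligned depth image at (v, u); joint angles then follow from
# the reconstructed 3D points with ordinary vector operations.
```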

13 pages, 1221 KB  
Article
A 2D Hand Pose Estimation System Accuracy for Finger Tapping Test Monitoring: A Pilot Study
by Saeid Edriss, Cristian Romagnoli, Rossella Rotondo, Maria Francesca De Pandis, Elvira Padua, Vincenzo Bonaiuto, Giuseppe Annino and Lloyd Smith
Appl. Sci. 2026, 16(1), 229; https://doi.org/10.3390/app16010229 - 25 Dec 2025
Abstract
Accurate and accessible motor function quantification is important for monitoring the progression of movement disorders. Manual muscle testing and wearable sensors can be costly or can reduce degrees of freedom. Artificial intelligence, especially human pose estimation (PE), offers promising alternatives. This work compares the accuracy of a 2D PE tool for the Finger Tapping Test (FTT) against a 3D infrared motion capture system (MoCap). PE tracked three anatomical landmarks (wrist, thumb, index finger), reflective markers were placed at corresponding locations, and both tools were used to measure wrist-centered angles. Trials of slow and rapid FTT sessions were statistically analyzed with rank correlation analysis, Friedman, Bland–Altman, and Kruskal–Wallis tests to assess agreement and repeatability. PE and MoCap measurements showed no significant differences (p > 0.05), with high reliability (ICC 0.87–0.91), low variability (CV 6–8.6%), and negligible effect size. Bland–Altman slopes indicated a minor amplitude-dependent bias, while RMSE (2.92–4.48°) and MAPE (6.38–8.22%) errors were observed across slow and rapid conditions. These results demonstrate that 2D PE provides a reliable, accessible, and low-cost alternative for quantifying finger movement, and suggest that PE can serve as an assistive method for monitoring motor function. Future work should extend this approach to population-level studies involving patients with neurological disorders.
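
As a hedged illustration of the Bland–Altman analysis cited above, the snippet below computes the bias and 95% limits of agreement between PE- and MoCap-derived angles; variable names and data are hypothetical.

```python
# Hedged sketch (hypothetical data): Bland-Altman agreement between PE- and MoCap-derived
# wrist-centered angles, returning the bias and 95% limits of agreement.
import numpy as np

def bland_altman(pe_angles, mocap_angles):
    pe, mc = np.asarray(pe_angles), np.asarray(mocap_angles)
    diff = pe - mc
    bias = float(diff.mean())
    half_width = 1.96 * float(diff.std(ddof=1))
    return bias, (bias - half_width, bias + half_width)
```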

14 pages, 9414 KB  
Article
AutoMCA: A Robust Approach for Automatic Measurement of Cranial Angles
by Junjian Chen, Yuqian Wang, Xinyu Shi and Yan Luximon
Automation 2025, 6(4), 88; https://doi.org/10.3390/automation6040088 - 5 Dec 2025
Abstract
Head posture assessment commonly involves measuring cranial angles, with photogrammetry favored for its simplicity over CT scans or goniometers. However, most photo-based measurements remain manual, making them time-consuming and inefficient. Existing automatic measuring approaches often require specific markers and clean backgrounds, limiting their usability. We present AutoMCA, a robust automatic measurement system for cranial angles that uses accessible markers and tolerates typical indoor backgrounds. AutoMCA integrates MediaPipe Pose, a machine-learning solution, for head–neck segmentation and applies color thresholding and morphological operations for marker detection. Validation tests demonstrated Pearson correlation coefficients above 0.98 compared to manual Kinovea measurements for both the craniovertebral angle (CVA) and cranial rotation angle (CRA), confirming high accuracy. Further validation on individuals with neck disorders showed similarly strong correlations, supporting clinical applicability. Speed comparison tests revealed that AutoMCA significantly reduces measurement time compared to traditional photogrammetry, and robustness tests confirmed reliable performance across varied backgrounds and marker types. In conclusion, AutoMCA measures head posture efficiently and lowers the requirements for instruments and space, making the assessment more versatile and widely applicable.
(This article belongs to the Topic Intelligent Image Processing Technology)
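
The marker-detection step described in the abstract (color thresholding followed by morphological operations) might look roughly like the OpenCV sketch below. The HSV thresholds and minimum blob area are assumptions, not AutoMCA's actual parameters.

```python
# Minimal sketch (assumed HSV thresholds and blob size, not AutoMCA itself): marker detection
# by color thresholding plus morphological cleanup, returning marker centroids.
import cv2
import numpy as np

def detect_markers(bgr, lo=(40, 60, 60), hi=(80, 255, 255), min_area=20):
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lo), np.array(hi))           # keep pixels in the color band
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)         # remove speckle noise
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)        # fill small holes
    n, _, stats, centroids = cv2.connectedComponentsWithStats(mask)
    return [tuple(centroids[i]) for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] >= min_area]
```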

17 pages, 3038 KB  
Article
Research on Deep Learning-Based Human–Robot Static/Dynamic Gesture-Driven Control Framework
by Gong Zhang, Jiahong Su, Shuzhong Zhang, Jianzheng Qi, Zhicheng Hou and Qunxu Lin
Sensors 2025, 25(23), 7203; https://doi.org/10.3390/s25237203 - 25 Nov 2025
Abstract
For human–robot gesture-driven control, this paper proposes a deep learning-based approach that employs both static and dynamic gestures to drive and control robots for object-grasping and delivery tasks. The method utilizes two-dimensional Convolutional Neural Networks (2D-CNNs) for static gesture recognition and a hybrid architecture combining three-dimensional Convolutional Neural Networks (3D-CNNs) and Long Short-Term Memory networks (3D-CNN+LSTM) for dynamic gesture recognition. Results on a custom gesture dataset demonstrate validation accuracies of 95.38% for static gestures and 93.18% for dynamic gestures, respectively. Then, in order to control and drive the robot to perform corresponding tasks, hand pose estimation was performed. The MediaPipe machine learning framework was first employed to extract hand feature points. These 2D feature points were then converted into 3D coordinates using a depth camera-based pose estimation method, followed by coordinate system transformation to obtain hand poses relative to the robot’s base coordinate system. Finally, an experimental platform for human–robot gesture-driven interaction was established, deploying both gesture recognition models. Four participants were invited to perform 100 trials each of gesture-driven object-grasping and delivery tasks under three lighting conditions: natural light, low light, and strong light. Experimental results show that the average success rates for completing tasks via static and dynamic gestures are no less than 96.88% and 94.63%, respectively, with task completion times consistently within 20 s. These findings demonstrate that the proposed approach enables robust vision-based robotic control through natural hand gestures, showing great prospects for human–robot collaboration applications.

15 pages, 2030 KB  
Article
Automated Classification of Baseball Pitching Phases Using Machine Learning and Artificial Intelligence-Based Posture Estimation
by Shin Osawa, Atsuyuki Inui, Yutaka Mifune, Kohei Yamaura, Tomoya Yoshikawa, Issei Shinohara, Masaya Kusunose, Shuya Tanaka, Shunsaku Takigami, Yutaka Ehara, Daiji Nakabayashi, Takanobu Higashi, Ryota Wakamatsu, Shinya Hayashi, Tomoyuki Matsumoto and Ryosuke Kuroda
Appl. Sci. 2025, 15(22), 12155; https://doi.org/10.3390/app152212155 - 16 Nov 2025
Abstract
High-precision analyses of baseball pitching have traditionally relied on optical motion capture systems, which, despite their accuracy, are complex and impractical for widespread use. Classifying sequential pitching phases, essential for biomechanical evaluation, conventionally requires manual expert labeling, a time-consuming and labor-intensive process. Accurate identification of phase boundaries is critical because they correspond to key temporal events related to pitching injuries. This study developed and validated a smartphone-based system for automatically classifying the five key pitching phases—wind-up, stride, arm-cocking, arm acceleration, and follow-through—using pose estimation artificial intelligence and machine learning. Slow-motion videos (240 frames per second, 1080p) of 500 healthy right-handed high school pitchers were recorded from the front using a single smartphone. Skeletal landmarks were extracted using MediaPipe, and 33 kinematic features, including joint angles and limb distances, were computed. Expert-annotated phase labels were used to train classification models. Among the models evaluated, Light Gradient Boosting Machine (LightGBM) achieved a classification accuracy of 99.7% and processed each video in a few seconds, demonstrating feasibility for on-site analysis. This system enables high-accuracy phase classification directly from video without motion capture, supporting future tools to detect abnormal pitching mechanics, prevent throwing-related injuries, and broaden access to pitching analysis.
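
A hedged sketch of the classification stage described above: a LightGBM classifier over per-frame kinematic features. The feature matrix here is random placeholder data purely to make the snippet runnable; it is not the study's dataset or tuned model.

```python
# Hedged sketch (placeholder data, not the study's model): per-frame phase classification
# with LightGBM over 33 kinematic features, mirroring the pipeline the abstract describes.
import numpy as np
from lightgbm import LGBMClassifier

PHASES = ["wind-up", "stride", "arm-cocking", "arm acceleration", "follow-through"]

X = np.random.rand(1000, 33)          # placeholder: joint angles / limb distances per frame
y = np.random.randint(0, 5, 1000)     # placeholder: expert-annotated phase labels
clf = LGBMClassifier(n_estimators=300, learning_rate=0.05)
clf.fit(X, y)
print([PHASES[i] for i in clf.predict(X[:5])])   # predicted phases for the first five frames
```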

32 pages, 16687 KB  
Article
Toward Robust Human Pose Estimation Under Real-World Image Degradations and Restoration Scenarios
by Nada E. Elshami, Ahmad Salah, Amr Abdellatif and Heba Mohsen
Information 2025, 16(11), 970; https://doi.org/10.3390/info16110970 - 10 Nov 2025
Abstract
Human Pose Estimation (HPE) models have varied applications and represent a cutting-edge branch of study, with systems such as MediaPipe (MP), OpenPose (OP), and AlphaPose (ALP) showing marked success. The impact of image degradation on the accuracy of HPE models, however, remains inadequately researched. Image degradation refers to images whose visual quality has been deliberately reduced, for example by brightness adjustments (increases or decreases in intensity), geometric rotations, or resolution downscaling. How these types of degradation affect HPE performance is a virtually unexplored area, and the efficacy of existing image restoration techniques in recovering degraded images for HPE has likewise not been rigorously evaluated. In this study, we demonstrate a clear decline in HPE precision when image quality is degraded: our qualitative and quantitative measurements show wide differences in landmark detection performance as images undergo changes in brightness, rotation, or reductions in resolution. We also test a variety of existing image enhancement methods for restoring low-quality images in support of HPE. Interestingly, for rotated images, using Pillow or OpenCV improves landmark recognition precision drastically, nearly restoring it to the level observed on high-quality images. For brightness variation and low-quality images, however, existing enhancement methods fail to yield the anticipated improvements, highlighting a direction that warrants further research. We therefore propose a framework for systematically classifying types of image degradation and selecting appropriate restoration algorithms. A key finding is that a tuned RotNet model achieves 92.04% accuracy in predicting the rotation angle of images, significantly outperforming the official RotNet classifier (61.59%). Finally, to facilitate future research, we provide a new dataset of reference images and corresponding degraded images, addressing a notable gap in controlled comparative studies.
(This article belongs to the Special Issue Artificial Intelligence for Signal, Image and Video Processing)
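
For the rotation case the abstract highlights, restoration amounts to rotating the image back by the predicted angle before running pose estimation. A minimal OpenCV sketch, with the predicted angle assumed to come from a RotNet-style classifier:

```python
# Minimal sketch (hypothetical angle source): undo a geometric rotation with OpenCV before
# running pose estimation; the predicted angle is assumed to come from a RotNet-style model.
import cv2

def derotate(bgr, predicted_angle_deg):
    h, w = bgr.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), -predicted_angle_deg, 1.0)  # inverse rotation
    return cv2.warpAffine(bgr, M, (w, h))
```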

38 pages, 10491 KB  
Article
Development of Control Algorithms for an Adaptive Running Platform for a Musculoskeletal Rehabilitation System
by Artem Obukhov and Andrey Volkov
Sensors 2025, 25(21), 6667; https://doi.org/10.3390/s25216667 - 1 Nov 2025
Abstract
An essential component of modern musculoskeletal rehabilitation systems is treadmills of various sizes, the control of which may rely either on manual adjustment of treadmill speed, fixed for the entire training session, or on automatic regulation based on analysis of the user’s movements and velocity. The aim of this study was to experimentally compare the control functions of an adaptive treadmill designed for musculoskeletal rehabilitation and to assess the influence of the hardware configuration and tracking systems on user stability and the smoothness of transient processes. Two running platforms (of different lengths, one equipped with handrails and one without), two tracking systems (virtual reality trackers and a computer vision system using the MediaPipe Pose model), and three control functions—linear, nonlinear, and proportional-integral-derivative (PID)—were investigated. A set of metrics with both metrical and physiological interpretability was proposed (including positional stability, duration and amplitude of transient processes in position and velocity, subjective assessment, and others), all integrated into a single quality control criterion. This study presents extensive experimental research comparing various designs of adaptive running platforms and tracking systems, exploring the relationships between the available working area length and user comfort, and determining the optimal parameters for the selected control functions. The optimal control function was identified as the linear law for the tracking system based on virtual reality trackers and the PID function for the computer-vision-based tracking system. The conducted experiments made it possible to formulate recommendations regarding the minimum permissible working area length of treadmill platforms and the selection of tracking systems and control functions for musculoskeletal rehabilitation systems. The obtained results are of practical relevance for developing adaptive rehabilitation simulators and creating control algorithms that ensure smooth and stable treadmill motion, thereby enhancing user comfort, efficiency, and safety during musculoskeletal rehabilitation exercises.
(This article belongs to the Section Biomedical Sensors)
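
One of the three control functions compared in this study is a PID law. The sketch below is a generic discrete PID regulating treadmill speed from the user's positional error; the gains and the position signal are assumptions, not the paper's tuned values.

```python
# Hedged sketch (assumed gains, not the authors' tuned controller): a discrete PID law that
# adjusts treadmill speed from the user's deviation from a target position on the belt.
class PID:
    def __init__(self, kp=1.2, ki=0.1, kd=0.05):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def update(self, error, dt):
        # error: tracked user position minus target position (metres); dt: control period (s)
        self.integral += error * dt
        derivative = (error - self.prev_err) / dt if dt > 0 else 0.0
        self.prev_err = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# speed_command = base_speed + pid.update(user_x - target_x, dt)
```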

18 pages, 1906 KB  
Article
Generalizable Interaction Recognition for Learning from Demonstration Using Wrist and Object Trajectories
by Jagannatha Charjee Pyaraka, Mats Isaksson, John McCormick, Sheila Sutjipto and Fouad Sukkar
Electronics 2025, 14(21), 4297; https://doi.org/10.3390/electronics14214297 - 31 Oct 2025
Abstract
Learning from Demonstration (LfD) enables robots to acquire manipulation skills by observing human actions. However, existing methods often face challenges such as high computational cost, limited generalizability, and a loss of key interaction details. This study presents a compact representation for interaction recognition in LfD that encodes human–object interactions using 2D wrist trajectories and 3D object poses. A lightweight extraction pipeline combines MediaPipe-based wrist tracking with FoundationPose-based 6-DoF object estimation to obtain these trajectories directly from RGB-D video without specialized sensors or heavy preprocessing. Experiments on the GRAB and FPHA datasets show that the representation effectively captures task-relevant interactions, achieving 94.6% accuracy on GRAB and 96.0% on FPHA with well-calibrated probability predictions. Both Bidirectional Long Short-Term Memory (Bi-LSTM) with attention and Transformer architectures deliver consistent performance, confirming robustness and generalizability. The method achieves sub-second inference, a memory footprint under 1 GB, and reliable operation on both GPU and CPU platforms, enabling deployment on edge devices such as NVIDIA Jetson. By bridging pose-based and object-centric paradigms, this approach offers a compact and efficient foundation for scalable robot learning while preserving essential spatiotemporal dynamics.
(This article belongs to the Section Artificial Intelligence)

27 pages, 11871 KB  
Article
Experiences Using MediaPipe to Make the Arms of a Humanoid Robot Imitate a Video-Recorded Dancer Performing a Robot Dance
by Eduard Clotet, David Martínez and Jordi Palacín
Robotics 2025, 14(11), 153; https://doi.org/10.3390/robotics14110153 - 26 Oct 2025
Abstract
This paper presents our first results obtained in the direction of using a humanoid robot to perform a robot dance at a level comparable to that of a human dancer. The scope of this first approach is limited to performing an offline analysis of the movements of the arms of the dancer and to replicating these movements with the arms of the robot. To this end, the movements of a dancer performing a static robot dance (without moving the hips or feet) were recorded. These movements were analyzed offline using the MediaPipe BlazePose framework, adapted to the mechanical configuration of the arms of the humanoid robot, and finally reproduced by the robot. Results showed that MediaPipe has some inaccuracies when detecting sudden movements of the dancer’s arms that appeared blurred in the images. In general, the humanoid robot was capable of replicating the movement of the dancer’s arms but was unable to follow the original rhythm of the robotic dance due to acceleration limitations of its actuators.
(This article belongs to the Section Humanoid and Human Robotics)

41 pages, 4151 KB  
Systematic Review
AI Video Analysis in Parkinson’s Disease: A Systematic Review of the Most Accurate Computer Vision Tools for Diagnosis, Symptom Monitoring, and Therapy Management
by Lazzaro di Biase, Pasquale Maria Pecoraro and Francesco Bugamelli
Sensors 2025, 25(20), 6373; https://doi.org/10.3390/s25206373 - 15 Oct 2025
Abstract
Background. Clinical assessment of Parkinson’s disease (PD) is limited by high subjectivity and inter-rater variability. Markerless video analysis, namely Computer Vision (CV), offers objective and scalable characterization of motor signs. We systematically reviewed CV technologies suited for PD diagnosis, symptom monitoring, and treatment management. Methods. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines, we searched PubMed for articles published between 1 January 1984 and 9 May 2025. We used the following search strategy: (“Parkinson Disease” [MeSH Terms] OR “parkinson’s disease” OR “parkinson disease”) AND (“computer vision” OR “video analysis” OR “pose estimation” OR “OpenPose” OR “DeepLabCut” OR “OpenFace” OR “YOLO” OR “MediaPipe” OR “markerless motion capture” OR “skeleton tracking”). Results. Out of 154 identified studies, 45 met eligibility criteria and were synthesized. Gait was assessed in 42% of studies, followed by bradykinesia items (17.7%). OpenPose and custom CV solutions were each used in 36% of studies, followed by MediaPipe (16%), DeepLabCut (9%), and YOLO (4%). Across aims, CV pipelines consistently showed diagnostic discrimination and severity tracking aligned with expert ratings. Conclusions. CV non-invasively quantifies PD motor impairment, holding potential for objective diagnosis, longitudinal monitoring, and therapy response. Guidelines for standardized video-recording protocols and software usage are needed for real-world applications.
(This article belongs to the Collection Sensors for Gait, Human Movement Analysis, and Health Monitoring)
