Search Results (71)

Search Parameters:
Keywords = MediaPipe Hand

19 pages, 759 KB  
Article
Dual-Stream BiLSTM–Transformer Architecture for Real-Time Two-Handed Dynamic Sign Language Gesture Recognition
by Enachi Andrei, Turcu Corneliu-Octavian, Culea George, Andrioaia Dragos-Alexandru, Ungureanu Andrei-Gabriel and Sghera Bogdan-Constantin
Appl. Sci. 2026, 16(6), 2912; https://doi.org/10.3390/app16062912 - 18 Mar 2026
Viewed by 151
Abstract
Two-handed dynamic gesture recognition represents a fundamental component of sign language interpretation involving the modeling of temporal dependencies and inter-hand coordination. In this task, a major challenge is modeling asymmetric motion patterns, as well as bidirectional and long-range temporal dependencies. Most existing frameworks rely on early fusion strategies that merge joints, keypoints or landmarks from both hands in early processing stages, primarily to reduce model complexity and enforce a unified representation. In this work, a novel dual-stream BiLSTM–Transformer architecture is proposed for two-handed dynamic sign language recognition, where parallel encoders process the trajectories of each hand independently. To capture spatial and temporal dependencies for each hand, an attention-based cross-hand fusion mechanism is employed, with hand landmarks extracted by the MediaPipe Hands framework as a preprocessing step to enable real-time CPU-based inference. Experimental evaluation conducted on custom Romanian Sign Language dynamic gesture datasets indicates that the proposed dual-stream-based system outperforms single-handed baselines, achieving high recognition accuracy for asymmetric gestures and consistent performance gains for synchronized two-handed gestures. The proposed architecture represents an efficient and lightweight solution suitable for real-time sign language recognition and interpretation.
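The paper itself includes no code, but the preprocessing step it describes, extracting a separate landmark trajectory per hand so that each hand can feed its own encoder, can be sketched roughly as follows (Python; splitting the streams by MediaPipe's handedness label is an assumption, not necessarily the authors' exact pipeline):

```python
# Rough sketch: per-hand landmark sequences for a dual-stream model.
# Uses the legacy MediaPipe "solutions" API; not the authors' code.
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands

def extract_two_hand_sequences(video_path: str):
    """Return two (T, 21, 3) landmark sequences, one per hand."""
    left_seq, right_seq = [], []
    cap = cv2.VideoCapture(video_path)
    with mp_hands.Hands(max_num_hands=2, min_detection_confidence=0.5) as hands:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            # Default to zeros when a hand is not detected in this frame (assumption).
            per_frame = {"Left": np.zeros((21, 3)), "Right": np.zeros((21, 3))}
            if result.multi_hand_landmarks:
                for lms, handed in zip(result.multi_hand_landmarks,
                                       result.multi_handedness):
                    label = handed.classification[0].label  # "Left" / "Right"
                    per_frame[label] = np.array(
                        [[p.x, p.y, p.z] for p in lms.landmark])
            left_seq.append(per_frame["Left"])
            right_seq.append(per_frame["Right"])
    cap.release()
    # Each stacked sequence would feed its own BiLSTM-Transformer stream.
    return np.stack(left_seq), np.stack(right_seq)
```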
(This article belongs to the Section Computing and Artificial Intelligence)

12 pages, 3058 KB  
Proceeding Paper
AI Facial Acupuncture Point Interactive Voice Health Care Teaching System
by Wen-Cheng Chen, Yu-Hsuan Chen, Yu-Hsing Chen, Jiu-Wen Wang, Hung-Jen Chen and Jr-Wei Tsai
Eng. Proc. 2026, 128(1), 37; https://doi.org/10.3390/engproc2026128037 - 16 Mar 2026
Viewed by 182
Abstract
We developed an AI-based system for facial acupoint recognition and healthcare support, integrating MediaPipe facial and hand tracking technologies to address the problems of inaccurate and non-standardized acupoint identification in traditional Chinese medicine (TCM). By leveraging facial landmark detection and fingertip tracking, the [...] Read more.
We developed an AI-based system for facial acupoint recognition and healthcare support, integrating MediaPipe facial and hand tracking technologies to address the problems of inaccurate and non-standardized acupoint identification in traditional Chinese medicine (TCM). By leveraging facial landmark detection and fingertip tracking, the system enables accurate localization of facial acupoints to ensure precise stimulation. The system contributes to the standardization of acupoint recognition, intelligent health consultation, and the digital transformation of TCM practices. Further enhancements are necessary by expanding acupoint recognition to other body parts (e.g., ears, hands, feet, and back) and integrating with wearable devices to further promote personalized and precise TCM healthcare. Full article
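As a loose illustration of the fingertip-to-acupoint matching the abstract describes, the check could amount to finding the closest mapped facial landmark to the tracked fingertip; the acupoint names and landmark indices below are hypothetical, not taken from the paper:

```python
# Sketch: check whether a tracked fingertip is close to a facial acupoint.
# The acupoint-to-landmark index mapping here is illustrative only.
import numpy as np

ACUPOINT_LANDMARKS = {"acupoint_a_example": 168, "acupoint_b_example": 18}  # hypothetical indices

def nearest_acupoint(fingertip_xy, face_landmarks_xy, tol=0.03):
    """Return the closest acupoint name if the fingertip lies within `tol`
    (normalized image coordinates) of its mapped landmark, otherwise None."""
    best_name, best_dist = None, float("inf")
    for name, idx in ACUPOINT_LANDMARKS.items():
        d = np.linalg.norm(np.asarray(fingertip_xy) - face_landmarks_xy[idx])
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= tol else None
```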

22 pages, 3288 KB  
Article
An Intelligent Real-Time System for Sentence-Level Recognition of Continuous Saudi Sign Language Using Landmark-Based Temporal Modeling
by Adel BenAbdennour, Mohammed Mukhtar, Osama Almolike, Bilal A. Khawaja and Abdulmajeed M. Alenezi
Sensors 2026, 26(5), 1652; https://doi.org/10.3390/s26051652 - 5 Mar 2026
Viewed by 386
Abstract
A persistent challenge for Deaf and Hard-of-Hearing individuals is the communication gap between sign language users and the hearing community, particularly in regions with limited automated translation resources. In Saudi Arabia, this gap is amplified by the reliance on Saudi Sign Language (SSL) and the scarcity of real-time, sentence-level translation systems. This paper presents a real-time system for sentence-level recognition of continuous SSL and direct mapping to natural spoken Arabic. The proposed system operates end-to-end on live video streams or pre-recorded content, extracting spatio-temporal landmark features using the MediaPipe Holistic framework. For classification, the input feature vector consists of 225 features derived from hand and body pose landmarks. These features are processed by a Bidirectional Long Short-Term Memory (BiLSTM) network trained on the ArabSign (ArSL) dataset to perform direct sentence-level classification over a vocabulary of 50 continuous Arabic sign language sentences, supported by an idle-based segmentation mechanism that enables natural, uninterrupted signing. Experimental evaluation demonstrates robust generalization: under a Leave-One-Signer-Out (LOSO) cross-validation protocol, the model attains a mean sentence-level accuracy of 94.2%, outperforming the fixed signer-independent split baseline of 92.07%, while maintaining real-time performance suitable for interactive use. To enhance linguistic fluency, an optional post-recognition refinement stage is incorporated using a large language model (LLM), followed by text-to-speech synthesis to produce audible Arabic output; this refinement operates strictly as post-processing and is not included in the reported recognition accuracy metrics. The results demonstrate that direct sentence-level modeling, combined with landmark-based feature extraction and real-time segmentation, provides an effective and practical solution for continuous SSL sentence recognition in real time.
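The 225-dimensional input is consistent with 33 pose landmarks plus 2 × 21 hand landmarks, each as (x, y, z): 99 + 126 = 225. A minimal sketch of how such a per-frame vector could be assembled from MediaPipe Holistic output (zero-filling undetected hands is an assumption):

```python
# Sketch: build a 225-dim per-frame feature vector from MediaPipe Holistic results.
# 33 pose landmarks * 3 + 2 * 21 hand landmarks * 3 = 99 + 126 = 225.
import numpy as np

def frame_features(results) -> np.ndarray:
    def flat(lms, n):
        if lms is None:
            return np.zeros(n * 3)  # assumption: zero-fill missing landmarks
        return np.array([[p.x, p.y, p.z] for p in lms.landmark]).flatten()

    pose = flat(results.pose_landmarks, 33)
    lhand = flat(results.left_hand_landmarks, 21)
    rhand = flat(results.right_hand_landmarks, 21)
    return np.concatenate([pose, lhand, rhand])  # shape (225,), one BiLSTM time step
```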
(This article belongs to the Special Issue Sensor Systems for Gesture Recognition (3rd Edition))

19 pages, 18999 KB  
Article
TFS Point-on-Hand Sign Recognition Using Part Affinity Fields
by Jinnavat Sanalohit and Tatpong Katanyukul
Appl. Sci. 2026, 16(5), 2416; https://doi.org/10.3390/app16052416 - 2 Mar 2026
Viewed by 215
Abstract
Our study investigates an application of a bottom-up design for keypoint regression, Part Affinity Fields (PAFs), for sign language recognition. Automatic sign language recognition could facilitate communication between deaf people and the hearing majority. Sign languages generally employ both semantic and finger-spelling signing. Semantic signing includes acting out to convey meaning, while finger spelling complements signing through the spelling out of proper names. Specifically, this article addresses an automatic recognition framework for the static point-on-hand (PoH) signing of Thai Finger Spelling (TFS)—the finger-spelling part of Thai Sign Language (TSL). From a pattern recognition perspective, PoH signing is quite distinct among signing schemes in its requirement for precise localization of key parts on the signing hands. A recent study addressed PoH using an off-the-shelf version of MediaPipe Hands (MPH) and found shortcomings, particularly when there was a high degree of hand-to-hand interaction. The top-down design of MPH was hypothesized to be the culprit. Our study investigates a bottom-up design, Part Affinity Fields (PAFs), along with an examination of related factors. The results support the hypothesis posited by the MPH study regarding the effect of a high degree of hand-to-hand interaction. However, the PAF-based approach is shown to be only modestly effective overall (72% accuracy vs. 58% and 47% for the MPH- and X-Pose-based approaches, respectively). In addition, its generalization is shown to be lacking. Thus, TFS point-on-hand sign recognition remains a challenge.

17 pages, 1732 KB  
Article
Lightweight Visual Dynamic Gesture Recognition System Based on CNN-LSTM-DSA
by Zhenxing Wang, Ziyan Wu, Ruidi Qi and Xuan Dou
Sensors 2026, 26(5), 1558; https://doi.org/10.3390/s26051558 - 2 Mar 2026
Viewed by 324
Abstract
Addressing the challenges of large-scale gesture recognition models, high computational complexity, and inefficient deployment on embedded devices, this study designs and implements a visual dynamic gesture recognition system based on a lightweight CNN-LSTM-DSA model. The system captures user hand images via a camera, extracts the 3D coordinates of 21 hand keypoints using MediaPipe, and employs a lightweight hybrid model to perform spatial and temporal feature modeling on keypoint sequences, achieving high-precision recognition of complex dynamic gestures. In static gesture recognition, the system determines the gesture state through joint angle calculation and a sliding window smoothing algorithm, ensuring smooth mapping of the servo motor angles and stability of the robotic hand’s movements. In dynamic gesture recognition, the system models the keypoint time series based on the CNN-LSTM-DSA hybrid model, enabling accurate classification and reproduction of gesture actions. Experimental results show that the proposed system demonstrates good robustness under various lighting and background conditions, with a static gesture recognition accuracy of up to 96%, a dynamic gesture recognition accuracy of 90.19%, and an overall response delay of less than 300 ms.
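The sliding-window smoothing and servo-angle mapping mentioned for static gestures could look roughly like the following sketch; the window size, joint-angle range, and servo limits are assumptions, not values from the paper:

```python
# Sketch: sliding-window smoothing of a measured joint angle and mapping
# of the smoothed value onto a servo motor range.
from collections import deque
import numpy as np

class ServoAngleMapper:
    def __init__(self, window: int = 5, servo_min: float = 0.0, servo_max: float = 180.0):
        self.buf = deque(maxlen=window)          # window size is an assumption
        self.servo_min, self.servo_max = servo_min, servo_max

    def update(self, joint_angle_deg: float) -> float:
        """Smooth the incoming joint angle and map it to the servo range."""
        self.buf.append(joint_angle_deg)
        smoothed = float(np.mean(self.buf))
        # Assume the finger joint angle of interest spans roughly 0-90 degrees.
        frac = np.clip(smoothed / 90.0, 0.0, 1.0)
        return self.servo_min + frac * (self.servo_max - self.servo_min)
```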
(This article belongs to the Section Sensing and Imaging)

7 pages, 5296 KB  
Proceeding Paper
Multi-Step Action Recognition for Long-Term Care Using Temporal Convolutional Network–Dynamic Time Warping–Finite State Machine and MediaPipe
by Feng-Jung Liu, Mei-Jou Lu and Min Chao
Eng. Proc. 2026, 129(1), 21; https://doi.org/10.3390/engproc2026129021 - 28 Feb 2026
Viewed by 203
Abstract
An intelligent multi-step action recognition system was designed for long-term caregiver training and assessment. Leveraging MediaPipe for precise and real-time human pose estimation, the system extracts detailed spatiotemporal body and hand keypoints. Temporal convolutional networks are employed to effectively capture temporal dependencies and complex features from sequential motion data. Dynamic time warping provides robust sequence alignment, allowing flexible comparison between performed actions and standard templates despite temporal variations in execution speed or style. A finite state machine imposes logical constraints by modeling expected action step sequences, enabling accurate detection of sequence anomalies or deviations. This hybrid architecture supports comprehensive evaluation and real-time feedback, facilitating improved caregiver skill acquisition, process adherence, and quality control within long-term care settings. The system aims to advance digital transformation in healthcare education by providing a scalable, precise, and adaptive training solution.
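The finite-state-machine idea amounts to requiring recognized step labels to appear in the expected order and flagging anything else as a deviation; a minimal sketch (the step names are hypothetical, not the care procedure used in the paper):

```python
# Sketch: a finite state machine that enforces an expected order of action steps.
EXPECTED_STEPS = ["wash_hands", "prepare_equipment", "position_patient", "perform_transfer"]  # illustrative

class StepSequenceFSM:
    def __init__(self, steps=EXPECTED_STEPS):
        self.steps = steps
        self.index = 0           # index of the next expected step
        self.deviations = []     # (expected index, observed label) pairs

    def observe(self, recognized_step: str) -> bool:
        """Advance on the expected step; record any out-of-order observation."""
        if self.index < len(self.steps) and recognized_step == self.steps[self.index]:
            self.index += 1
            return True
        self.deviations.append((self.index, recognized_step))
        return False

    @property
    def completed(self) -> bool:
        return self.index == len(self.steps)
```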

17 pages, 7804 KB  
Article
A 3D Camera-Based Approach for Real-Time Hand Configuration Recognition in Italian Sign Language
by Luca Ulrich, Asia De Luca, Riccardo Miraglia, Emma Mulassano, Simone Quattrocchio, Giorgia Marullo, Chiara Innocente, Federico Salerno and Enrico Vezzetti
Sensors 2026, 26(3), 1059; https://doi.org/10.3390/s26031059 - 6 Feb 2026
Viewed by 375
Abstract
Deafness poses significant challenges to effective communication, particularly in contexts where access to sign language interpreters is limited. Hand configuration recognition represents a fundamental component of sign language understanding, as configurations constitute a core cheremic element in many sign languages, including Italian Sign Language (LIS). In this work, we address configuration-level recognition as an independent classification task and propose a machine vision framework based on RGB-D sensing. The proposed approach combines MediaPipe-based hand landmark extraction with normalized three-dimensional geometric features and a Support Vector Machine classifier. The first contribution of this study is the formulation of LIS hand configuration recognition as a standalone, configuration-level problem, decoupled from temporal gesture modeling. The second contribution is the integration of sensor-acquired RGB-D depth measurements into the landmark-based feature representation, enabling a direct comparison with estimated depth obtained from monocular data. The third contribution consists of a systematic experimental evaluation on two LIS configuration sets (6 and 16 classes), demonstrating that the use of real depth significantly improves classification performance and class separability, particularly for geometrically similar configurations. The results highlight the critical role of depth quality in configuration-level recognition and provide insights into the design of robust vision-based systems for LIS analysis.
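A minimal sketch of the landmark-normalization plus SVM classification pipeline the abstract outlines; the normalization scheme (translate to the wrist, scale by a reference bone length) and the SVM hyperparameters are assumptions:

```python
# Sketch: SVM classification of hand configurations from normalized 3D landmarks.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def normalize_landmarks(lms: np.ndarray) -> np.ndarray:
    """lms: (21, 3) hand landmarks (x, y, depth). Translate to the wrist and
    scale by the wrist-to-middle-MCP distance to remove position/size effects."""
    centered = lms - lms[0]                      # MediaPipe index 0 = wrist
    scale = np.linalg.norm(centered[9]) + 1e-8   # index 9 = middle finger MCP
    return (centered / scale).flatten()          # 63-dim feature vector

def train_classifier(X_train, y_train):
    """X_train: (N, 21, 3) landmark arrays; y_train: configuration labels."""
    feats = np.array([normalize_landmarks(x) for x in X_train])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))  # hyperparameters assumed
    clf.fit(feats, y_train)
    return clf
```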
(This article belongs to the Special Issue Sensing and Machine Learning Control: Progress and Applications)

27 pages, 11232 KB  
Article
Aerokinesis: An IoT-Based Vision-Driven Gesture Control System for Quadcopter Navigation Using Deep Learning and ROS2
by Sergei Kondratev, Yulia Dyrchenkova, Georgiy Nikitin, Leonid Voskov, Vladimir Pikalov and Victor Meshcheryakov
Technologies 2026, 14(1), 69; https://doi.org/10.3390/technologies14010069 - 16 Jan 2026
Viewed by 710
Abstract
This paper presents Aerokinesis, an IoT-based software–hardware system for intuitive gesture-driven control of quadcopter unmanned aerial vehicles (UAVs), developed within the Robot Operating System 2 (ROS2) framework. The proposed system addresses the challenge of providing an accessible human–drone interaction interface for operators in scenarios where traditional remote controllers are impractical or unavailable. The architecture comprises two hierarchical control levels: (1) high-level discrete command control utilizing a fully connected neural network classifier for static gesture recognition, and (2) low-level continuous flight control based on three-dimensional hand keypoint analysis from a depth camera. The gesture classification module achieves an accuracy exceeding 99% using a multi-layer perceptron trained on MediaPipe-extracted hand landmarks. For continuous control, we propose a novel approach that computes Euler angles (roll, pitch, yaw) and throttle from 3D hand pose estimation, enabling intuitive four-degree-of-freedom quadcopter manipulation. A hybrid signal filtering pipeline ensures robust control signal generation while maintaining real-time responsiveness. Comparative user studies demonstrate that gesture-based control reduces task completion time by 52.6% for beginners compared to conventional remote controllers. The results confirm the viability of vision-based gesture interfaces for IoT-enabled UAV applications.
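The continuous-control idea, deriving roll, pitch, and yaw from 3D hand keypoints, could be sketched roughly as below; the axis conventions, landmark choices, and angle definitions are assumptions for illustration, not the authors' formulation:

```python
# Sketch: rough roll/pitch/yaw estimate from 3D hand keypoints (one possible convention).
import numpy as np

def hand_euler_angles(landmarks_3d: np.ndarray):
    """landmarks_3d: (21, 3) array in camera coordinates (x right, y down, z away
    from the camera); indices follow MediaPipe Hands (0 = wrist, 5 = index MCP,
    9 = middle MCP, 17 = pinky MCP). Returns (roll, pitch, yaw) in radians."""
    wrist, index_mcp, middle_mcp, pinky_mcp = landmarks_3d[[0, 5, 9, 17]]
    forward = middle_mcp - wrist            # along the palm
    across = index_mcp - pinky_mcp          # across the palm
    forward = forward / np.linalg.norm(forward)
    across = across / np.linalg.norm(across)
    roll = np.arctan2(across[1], across[0])                              # in-plane rotation of the across-palm axis
    pitch = np.arctan2(-forward[2], np.hypot(forward[0], forward[1]))    # tilt of the palm direction out of the image plane
    yaw = np.arctan2(forward[0], -forward[1])                            # in-plane heading of the palm direction
    return float(roll), float(pitch), float(yaw)
```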
(This article belongs to the Section Information and Communication Technologies)

27 pages, 80350 KB  
Article
Pose-Based Static Sign Language Recognition with Deep Learning for Turkish, Arabic, and American Sign Languages
by Rıdvan Yayla, Hakan Üçgün and Mahmud Abbas
Sensors 2026, 26(2), 524; https://doi.org/10.3390/s26020524 - 13 Jan 2026
Viewed by 694
Abstract
Advancements in artificial intelligence have significantly enhanced communication for individuals with hearing impairments. This study presents a robust cross-lingual Sign Language Recognition (SLR) framework for Turkish, American English, and Arabic sign languages. The system utilizes the lightweight MediaPipe library for efficient hand landmark extraction, ensuring stable and consistent feature representation across diverse linguistic contexts. Datasets were meticulously constructed from nine public-domain sources (four Arabic, three American, and two Turkish). The final training data comprises curated image datasets, with frames for each language carefully selected from varying angles and distances to ensure high diversity. A comprehensive comparative evaluation was conducted across three state-of-the-art deep learning architectures—ConvNeXt (CNN-based), Swin Transformer (ViT-based), and Vision Mamba (SSM-based)—all applied to identical feature sets. The evaluation demonstrates the superior performance of contemporary vision Transformers and state space models in capturing subtle spatial cues across diverse sign languages. Our approach provides a comparative analysis of model generalization capabilities across three distinct sign languages, offering valuable insights for model selection in pose-based SLR systems.
(This article belongs to the Special Issue Sensor Systems for Gesture Recognition (3rd Edition))

27 pages, 4631 KB  
Article
Multimodal Minimal-Angular-Geometry Representation for Real-Time Dynamic Mexican Sign Language Recognition
by Gerardo Garcia-Gil, Gabriela del Carmen López-Armas and Yahir Emmanuel Ramirez-Pulido
Technologies 2026, 14(1), 48; https://doi.org/10.3390/technologies14010048 - 8 Jan 2026
Viewed by 535
Abstract
Current approaches to dynamic sign language recognition commonly rely on dense landmark representations, which impose high computational cost and hinder real-time deployment on resource-constrained devices. To address this limitation, this work proposes a computationally efficient framework for real-time dynamic Mexican Sign Language (MSL) recognition based on a multimodal minimal angular-geometry representation. Instead of processing complete landmark sets (e.g., MediaPipe Holistic with up to 468 keypoints), the proposed method encodes the relational geometry of the hands, face, and upper body into a compact set of 28 invariant internal angular descriptors. This representation substantially reduces feature dimensionality and computational complexity while preserving linguistically relevant manual and non-manual information required for grammatical and semantic discrimination in MSL. A real-time end-to-end pipeline is developed, comprising multimodal landmark extraction, angular feature computation, and temporal modeling using a Bidirectional Long Short-Term Memory (BiLSTM) network. The system is evaluated on a custom dataset of dynamic MSL gestures acquired under controlled real-time conditions. Experimental results demonstrate that the proposed approach achieves 99% accuracy and 99% macro F1-score, matching state-of-the-art performance while using dramatically fewer features. The compactness, interpretability, and efficiency of the minimal angular descriptor make the proposed system suitable for real-time deployment on low-cost devices, contributing toward more accessible and inclusive sign language recognition technologies.
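Each angular descriptor of the kind described here can be computed as the internal angle at a middle landmark of a landmark triplet; a minimal sketch (the specific triplets are hypothetical, and the paper's 28 descriptors are not reproduced):

```python
# Sketch: internal-angle descriptors from landmark triplets.
import numpy as np

def internal_angle(a, b, c) -> float:
    """Angle (radians) at landmark b formed by landmarks a-b-c in 3D."""
    u = np.asarray(a) - np.asarray(b)
    v = np.asarray(c) - np.asarray(b)
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return float(np.arccos(np.clip(cosang, -1.0, 1.0)))

def angular_features(landmarks: np.ndarray, triplets) -> np.ndarray:
    """landmarks: (L, 3) stacked hand/face/body points; triplets: list of index triples.
    With 28 triplets this yields the compact per-frame descriptor fed to the BiLSTM."""
    return np.array([internal_angle(landmarks[i], landmarks[j], landmarks[k])
                     for i, j, k in triplets])
```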
(This article belongs to the Special Issue Image Analysis and Processing)

31 pages, 9303 KB  
Article
Automatic Quadrotor Dispatch Missions Based on Air-Writing Gesture Recognition
by Pu-Sheng Tsai, Ter-Feng Wu and Yen-Chun Wang
Processes 2025, 13(12), 3984; https://doi.org/10.3390/pr13123984 - 9 Dec 2025
Viewed by 676
Abstract
This study develops an automatic dispatch system for quadrotor UAVs that integrates air-writing gesture recognition with a graphical user interface (GUI). The DJI RoboMaster quadrotor UAV (DJI, Shenzhen, China) was employed as the experimental platform, combined with an ESP32 microcontroller (Espressif Systems, Shanghai, China) and the RoboMaster SDK (version 3.0). On the Python (version 3.12.7) platform, a GUI was implemented using Tkinter (version 8.6), allowing users to input addresses or landmarks, which were then automatically converted into geographic coordinates and imported into Google Maps for route planning. The generated flight commands were transmitted to the UAV via a UDP socket, enabling remote autonomous flight. For gesture recognition, a Raspberry Pi integrated with the MediaPipe Hands module was used to capture 16 types of air-written flight commands in real time through a camera. The training samples were categorized into one-dimensional coordinates and two-dimensional images. In the one-dimensional case, X/Y axis coordinates were concatenated after data augmentation, interpolation, and normalization. In the two-dimensional case, three types of images were generated, namely font trajectory plots (T-plots), coordinate-axis plots (XY-plots), and composite plots combining the two (XYT-plots). To evaluate classification performance, several machine learning and deep learning architectures were employed, including a multi-layer perceptron (MLP), support vector machine (SVM), one-dimensional convolutional neural network (1D-CNN), and two-dimensional convolutional neural network (2D-CNN). The results demonstrated effective recognition accuracy across different models and sample formats, verifying the feasibility of the proposed air-writing trajectory framework for non-contact gesture-based UAV control. Furthermore, by combining gesture recognition with a GUI-based map planning interface, the system enhances the intuitiveness and convenience of UAV operation. Future work, such as incorporating aerial image object recognition, could extend the framework’s applications to scenarios including forest disaster management, vehicle license plate recognition, and air pollution monitoring.
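The one-dimensional sample format described above (X/Y fingertip coordinates interpolated, normalized, and concatenated) could be prepared roughly as in the sketch below; the fixed length of 64 samples is an assumption:

```python
# Sketch: turn a variable-length air-writing fingertip trajectory into a
# fixed-length 1D feature vector (X samples followed by Y samples).
import numpy as np

def trajectory_to_1d(points: np.ndarray, n_samples: int = 64) -> np.ndarray:
    """points: (T, 2) fingertip positions over time. Returns a (2*n_samples,) vector
    suitable as input for an MLP, SVM, or 1D-CNN."""
    t_old = np.linspace(0.0, 1.0, len(points))
    t_new = np.linspace(0.0, 1.0, n_samples)
    x = np.interp(t_new, t_old, points[:, 0])   # resample to a fixed length
    y = np.interp(t_new, t_old, points[:, 1])

    def norm(v):
        # Normalize each axis to [0, 1] so writing position and size do not matter.
        rng = v.max() - v.min()
        return (v - v.min()) / rng if rng > 0 else np.zeros_like(v)

    return np.concatenate([norm(x), norm(y)])
```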

17 pages, 3038 KB  
Article
Research on Deep Learning-Based Human–Robot Static/Dynamic Gesture-Driven Control Framework
by Gong Zhang, Jiahong Su, Shuzhong Zhang, Jianzheng Qi, Zhicheng Hou and Qunxu Lin
Sensors 2025, 25(23), 7203; https://doi.org/10.3390/s25237203 - 25 Nov 2025
Cited by 1 | Viewed by 1154
Abstract
For human–robot gesture-driven control, this paper proposes a deep learning-based approach that employs both static and dynamic gestures to drive and control robots for object-grasping and delivery tasks. The method utilizes two-dimensional Convolutional Neural Networks (2D-CNNs) for static gesture recognition and a hybrid architecture combining three-dimensional Convolutional Neural Networks (3D-CNNs) and Long Short-Term Memory networks (3D-CNN+LSTM) for dynamic gesture recognition. Results on a custom gesture dataset demonstrate validation accuracies of 95.38% for static gestures and 93.18% for dynamic gestures, respectively. Hand pose estimation was then performed so that the robot could be controlled and driven to perform the corresponding tasks. The MediaPipe machine learning framework was first employed to extract hand feature points. These 2D feature points were then converted into 3D coordinates using a depth camera-based pose estimation method, followed by coordinate system transformation to obtain hand poses relative to the robot’s base coordinate system. Finally, an experimental platform for human–robot gesture-driven interaction was established, deploying both gesture recognition models. Four participants were invited to perform 100 trials each of gesture-driven object-grasping and delivery tasks under three lighting conditions: natural light, low light, and strong light. Experimental results show that the average success rates for completing tasks via static and dynamic gestures are no less than 96.88% and 94.63%, respectively, with task completion times consistently within 20 s. These findings demonstrate that the proposed approach enables robust vision-based robotic control through natural hand gestures, showing great prospects for human–robot collaboration applications.
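The 2D-to-3D step outlined above (back-projecting a detected hand keypoint with the depth image and camera intrinsics, then transforming it into the robot base frame) could be sketched as follows; the intrinsic parameters and the camera-to-base transform are placeholders, not calibration values from the paper:

```python
# Sketch: pixel + depth -> camera-frame 3D point -> robot-base-frame point.
import numpy as np

def pixel_to_camera(u: float, v: float, depth_m: float, fx, fy, cx, cy) -> np.ndarray:
    """Pinhole back-projection of pixel (u, v) with metric depth."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])

def camera_to_base(p_cam: np.ndarray, T_base_cam: np.ndarray) -> np.ndarray:
    """Apply a 4x4 homogeneous transform from the camera frame to the robot base frame."""
    p_h = np.append(p_cam, 1.0)
    return (T_base_cam @ p_h)[:3]

# Usage with placeholder calibration values (illustrative only):
# T_base_cam = np.eye(4)
# p_base = camera_to_base(pixel_to_camera(320, 240, 0.8, 600, 600, 320, 240), T_base_cam)
```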

22 pages, 8469 KB  
Article
Virtual Trainer for Learning Mexican Sign Language Using Video Similarity Analysis
by Felipe de Jesús Rivera-Cervantes, Diana-Margarita Córdova-Esparza, Juan Terven, Julio-Alejandro Romero-González, Jaime-Rodrigo González-Rodríguez, Mauricio-Arturo Ibarra-Corona and Pedro-Alfonso Ramírez-Pedraza
Technologies 2025, 13(12), 540; https://doi.org/10.3390/technologies13120540 - 21 Nov 2025
Viewed by 726
Abstract
Learning Mexican Sign Language (MSL) benefits from interactive systems that provide immediate feedback without requiring specialized sensors. This work presents a virtual training platform that operates with a conventional RGB camera and applies computer vision techniques to guide learners in real time. A dataset of 335 videos was recorded across 12 lessons with professional interpreters and used as the reference material for practice sessions. From each video, 48 keypoints corresponding to hands and facial landmarks were extracted using MediaPipe, normalized, and compared with user trajectories through Dynamic Time Warping (DTW). A sign is accepted when the DTW distance is below a similarity threshold, allowing users to receive quantitative feedback on performance. Additionally, an experimental baseline using video embeddings generated by the Qwen2.5-VL, VideoMAEv2, and VJEPA2 models and classified via Matching Networks was evaluated for scalability. Results show that the DTW-based module provides accurate and interpretable feedback for guided practice with minimal computational cost, while the embedding-based approach serves as an exploratory baseline for larger-scale classification and semi-automatic labeling. A user study with 33 participants provided evidence of feasibility and perceived usefulness (all category means significantly above neutral; Cronbach’s α = 0.81). Overall, the proposed framework offers an accessible, low-cost, and effective solution for inclusive MSL education and represents a promising foundation for future multimodal sign-language learning tools.
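The DTW acceptance test described above can be sketched with a standard O(T1·T2) dynamic program over flattened keypoint frames; the distance metric, length normalization, and threshold below are illustrative assumptions:

```python
# Sketch: DTW distance between a user keypoint sequence and a reference
# sequence, with a simple acceptance threshold.
import numpy as np

def dtw_distance(seq_a: np.ndarray, seq_b: np.ndarray) -> float:
    """seq_a: (T1, D), seq_b: (T2, D) flattened keypoint frames."""
    t1, t2 = len(seq_a), len(seq_b)
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[t1, t2] / (t1 + t2)   # length-normalized accumulated cost

def sign_accepted(user_seq, reference_seq, threshold=0.5) -> bool:  # threshold is an assumption
    return dtw_distance(user_seq, reference_seq) <= threshold
```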

21 pages, 4379 KB  
Article
ReHAb Playground: A DL-Based Framework for Game-Based Hand Rehabilitation
by Samuele Rasetto, Giorgia Marullo, Ludovica Adamo, Federico Bordin, Francesca Pavesi, Chiara Innocente, Enrico Vezzetti and Luca Ulrich
Future Internet 2025, 17(11), 522; https://doi.org/10.3390/fi17110522 - 17 Nov 2025
Cited by 1 | Viewed by 1359
Abstract
Hand rehabilitation requires consistent, repetitive exercises that can often reduce patient motivation, especially in home-based therapy. This study introduces ReHAb Playground, a deep learning-based system that merges real-time gesture recognition with 3D hand tracking to create an engaging and adaptable rehabilitation experience built in the Unity Game Engine. The system utilizes a YOLOv10n model for hand gesture classification and MediaPipe Hands for 3D hand landmark extraction. Three mini-games were developed to target specific motor and cognitive functions: Cube Grab, Coin Collection, and Simon Says. Key gameplay parameters, namely repetitions, time limits, and gestures, can be tuned according to therapeutic protocols. Experiments with healthy participants were conducted to establish reference performance ranges based on average completion times and standard deviations. The results showed a consistent decrease in both task completion and gesture times across trials, indicating learning effects and improved control of gesture-based interactions. The most pronounced improvement was observed in the more complex Coin Collection task, confirming the system’s ability to support skill acquisition and engagement in rehabilitation-oriented activities. ReHAb Playground was conceived with modularity and scalability at its core, enabling the seamless integration of additional exercises, gesture libraries, and adaptive difficulty mechanisms. While preliminary, the findings highlight its promise as an accessible, low-cost rehabilitation platform suitable for home use, capable of monitoring motor progress over time and enhancing patient adherence through engaging, game-based interactions. Future developments will focus on clinical validation with patient populations and the implementation of adaptive feedback strategies to further personalize the rehabilitation process.
(This article belongs to the Special Issue Advances in Deep Learning and Next-Generation Internet Technologies)

15 pages, 2030 KB  
Article
Automated Classification of Baseball Pitching Phases Using Machine Learning and Artificial Intelligence-Based Posture Estimation
by Shin Osawa, Atsuyuki Inui, Yutaka Mifune, Kohei Yamaura, Tomoya Yoshikawa, Issei Shinohara, Masaya Kusunose, Shuya Tanaka, Shunsaku Takigami, Yutaka Ehara, Daiji Nakabayashi, Takanobu Higashi, Ryota Wakamatsu, Shinya Hayashi, Tomoyuki Matsumoto and Ryosuke Kuroda
Appl. Sci. 2025, 15(22), 12155; https://doi.org/10.3390/app152212155 - 16 Nov 2025
Viewed by 1572
Abstract
High-precision analyses of baseball pitching have traditionally relied on optical motion capture systems, which, despite their accuracy, are complex and impractical for widespread use. Classifying sequential pitching phases, essential for biomechanical evaluation, conventionally requires manual expert labeling, a time-consuming and labor-intensive process. Accurate identification of phase boundaries is critical because they correspond to key temporal events related to pitching injuries. This study developed and validated a smartphone-based system for automatically classifying the five key pitching phases—wind-up, stride, arm-cocking, arm acceleration, and follow-through—using pose estimation artificial intelligence and machine learning. Slow-motion videos (240 frames per second, 1080p) of 500 healthy right-handed high school pitchers were recorded from the front using a single smartphone. Skeletal landmarks were extracted using MediaPipe, and 33 kinematic features, including joint angles and limb distances, were computed. Expert-annotated phase labels were used to train classification models. Among the models evaluated, Light Gradient Boosting Machine (LightGBM) achieved a classification accuracy of 99.7% and processed each video in a few seconds, demonstrating feasibility for on-site analysis. This system enables high-accuracy phase classification directly from video without motion capture, supporting future tools to detect abnormal pitching mechanics, prevent throwing-related injuries, and broaden access to pitching analysis.
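The per-frame classification step (33 kinematic features per frame, five phase labels) maps naturally onto LightGBM's scikit-learn interface; a minimal sketch with assumed hyperparameters and an assumed train/test split, not the paper's setup:

```python
# Sketch: train a LightGBM classifier to label pitching phases per frame.
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def train_phase_classifier(X: np.ndarray, y: np.ndarray):
    """X: (n_frames, 33) kinematic features (joint angles, limb distances, ...);
    y: (n_frames,) expert phase labels, e.g. wind-up, stride, arm-cocking,
    arm acceleration, follow-through."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
    model = LGBMClassifier(n_estimators=300, learning_rate=0.05)  # hyperparameters assumed
    model.fit(X_tr, y_tr)
    print("held-out accuracy:", accuracy_score(y_te, model.predict(X_te)))
    return model
```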
