Search Results (752)

Search Parameters:
Keywords = hand gestures

14 pages, 3479 KB  
Article
Electrospun Surface-Modified Epidermal Strain Sensors Enable Silent Speech and Hand Gesture Recognition for Virtual Reality Interaction
by Zuowei Wang, Fuzheng Zhang, Qijing Lin, Hongze Ke, Yueming Gao, Wufeng Zhang, Jiawen He, Yan Ma, Na Liu, Dan Xian, Ping Yang, Libo Zhao, Ryutaro Maeda, Yael Hanein and Zhuangde Jiang
Nanomaterials 2026, 16(9), 520; https://doi.org/10.3390/nano16090520 - 25 Apr 2026
Abstract
Voice disorders severely limit verbal communication, creating a need for intuitive assistive technologies. To meet this need, we present epidermal strain sensors that capture strain signals during silent speech and hand gestures. A thin electrospun nanofiber layer integrated onto commercial polyurethane films guides uniform, controlled microcrack formation in screen-printed carbon conductive paths, achieving a gauge factor of up to 243 over 0–40% strain. Signals from the seven-channel strain sensor array are recognized by a hybrid neural network that combines convolutional and Transformer architectures, reaching over 98% accuracy. The recognized outputs are rendered in virtual reality (VR), enabling intuitive, real-time communication. Moreover, the approach simplifies fabrication by enabling crack-based strain sensing with only a thin electrospun surface layer on commercial polyurethane films, eliminating the need for thick freestanding electrospun substrates. This cost-effective approach addresses limitations of conventional electrospun substrates by minimizing the thickness of the electrospun layer, thereby shortening the electrospinning time. Overall, the work demonstrates a method for translating natural non-verbal expressions into speech and text in VR, with promising applications in healthcare and assistive communication.
(This article belongs to the Section Nanoelectronics, Nanosensors and Devices)
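
The gauge factor reported above is, by definition, the slope of relative resistance change versus strain. A minimal sketch of how such a value is typically extracted from resistance–strain data by linear regression, using synthetic values rather than the paper's measurements:

```python
# Illustrative sketch: extracting a gauge factor GF = (dR/R0) / strain from
# resistance-vs-strain data. All values are synthetic, not the paper's data.
import numpy as np

strain = np.linspace(0.0, 0.40, 50)            # 0-40% strain range
R0 = 1.0e3                                     # assumed baseline resistance (ohms)
GF_true = 243.0                                # gauge factor reported in the abstract
resistance = R0 * (1.0 + GF_true * strain)     # idealized linear crack response
resistance *= 1.0 + 0.01 * np.random.default_rng(0).standard_normal(strain.size)

rel_dR = (resistance - R0) / R0                # relative resistance change dR/R0
gf_fit = np.polyfit(strain, rel_dR, 1)[0]      # slope of dR/R0 vs strain = GF
print(f"fitted gauge factor: {gf_fit:.1f}")
```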

8 pages, 2823 KB  
Proceeding Paper
Innovative Filipino Sign Language Translation and Interpretation with MediaPipe
by Zylwyn A. Alejo, Nathan Cyvel Jann R. Fuentes, Maria Patricia Z. Lungay, Alpha Isabel D. Maniquez, Paul Emmanuel G. Empas and John Paul T. Cruz
Eng. Proc. 2026, 134(1), 75; https://doi.org/10.3390/engproc2026134075 - 22 Apr 2026
Viewed by 254
Abstract
Filipino Sign Language (FSL) serves as a vital means of communication for the Deaf and hard-of-hearing in the Philippines. However, its societal use remains limited due to the scarcity of qualified interpreters and the general lack of FSL literacy among the population. Therefore, this study aims to address the gap between FSL development and automated FSL translation by employing machine learning and computer vision techniques. A model was trained using the FSL-105 dataset, which comprises video clips of gestures related to greetings and colors, and utilized MediaPipe for real-time detection of hand, face, and body landmarks. Through iterative training with transfer learning, the model’s performance improved from an initial accuracy of 80% to a final accuracy of 98.75%. The results demonstrate that the MediaPipe-based model can reliably interpret FSL gestures, positioning it as a potentially accessible assistive tool for the Deaf and hard-of-hearing community. This technology holds promise for applications in education, healthcare, and public service, offering new opportunities to promote the social inclusion of Filipino Deaf communities through more inclusive communication.
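
For readers unfamiliar with the landmark step, the sketch below shows per-frame hand/face/pose extraction with the public MediaPipe Holistic API; the video filename is hypothetical and error handling is reduced to the essentials:

```python
# Minimal sketch of per-frame landmark extraction with MediaPipe Holistic,
# as described in the abstract. The video path is hypothetical.
import cv2
import mediapipe as mp

holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
cap = cv2.VideoCapture("fsl_clip.mp4")  # hypothetical FSL video clip

while cap.isOpened():
    ok, frame_bgr = cap.read()
    if not ok:
        break
    results = holistic.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    # Each landmark set may be None when that body part is not visible.
    if results.right_hand_landmarks:
        wrist = results.right_hand_landmarks.landmark[0]
        print(f"right wrist (normalized): x={wrist.x:.3f}, y={wrist.y:.3f}")

cap.release()
holistic.close()
```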

27 pages, 3995 KB  
Article
Video-Based Arabic Sign Language Recognition with Mediapipe and Deep Learning Techniques
by Dana El-Rushaidat, Nour Almohammad, Raine Yeh and Kinda Fayyad
J. Imaging 2026, 12(4), 177; https://doi.org/10.3390/jimaging12040177 - 20 Apr 2026
Viewed by 275
Abstract
This paper addresses the critical communication barrier experienced by deaf and hearing-impaired individuals in the Arab world through the development of an affordable, video-based Arabic Sign Language (ArSL) recognition system. Designed for broad accessibility, the system eliminates specialized hardware by leveraging standard mobile or laptop cameras. Our methodology employs Mediapipe for real-time extraction of hand, face, and pose landmarks from video streams. These anatomical features are then processed by a hybrid deep learning model integrating Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), specifically Bidirectional Long Short-Term Memory (BiLSTM) layers. The CNN component captures spatial features, such as intricate hand shapes and body movements, within individual frames. Concurrently, BiLSTMs model long-term temporal dependencies and motion trajectories across consecutive frames. This integrated CNN-BiLSTM architecture is critical for generating a comprehensive spatiotemporal representation, enabling accurate differentiation of complex signs where meaning relies on both static gestures and dynamic transitions, thus preventing misclassification that CNN-only or RNN-only models would incur. Rigorously evaluated on the author-created JUST-SL dataset and the publicly available KArSL dataset, the system achieved 96% overall accuracy for JUST-SL and an impressive 99% for KArSL. These results demonstrate the system’s superior accuracy compared to previous research, particularly for recognizing full Arabic words, thereby significantly enhancing communication accessibility for the deaf and hearing-impaired community.
(This article belongs to the Section Computer Vision and Pattern Recognition)
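
A minimal Keras sketch of the spatial-then-temporal design the abstract describes, with per-frame convolutions followed by BiLSTM layers; the sequence length, feature width, and class count below are assumptions, not the authors' settings:

```python
# Hedged sketch of a CNN + BiLSTM stack over per-frame landmark vectors.
import tensorflow as tf

T, F, NUM_SIGNS = 60, 226, 100   # assumed: frames, landmark features/frame, classes

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(T, F)),
    # Per-frame spatial feature extraction (the CNN role).
    tf.keras.layers.Conv1D(128, kernel_size=3, padding="same", activation="relu"),
    tf.keras.layers.Conv1D(128, kernel_size=3, padding="same", activation="relu"),
    # Bidirectional temporal modeling across frames (the BiLSTM role).
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(NUM_SIGNS, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```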

15 pages, 3786 KB  
Article
A Flexible Copper Electrode Array for High-Density Surface Electromyography
by Chaoxin Li, Chenghong Lu, Jiuqiang Li and Kai Guo
Bioengineering 2026, 13(4), 467; https://doi.org/10.3390/bioengineering13040467 - 16 Apr 2026
Viewed by 271
Abstract
Precise monitoring of forearm muscle groups is crucial for decoding motor intentions in human–machine interfaces (HMIs) and rehabilitation. However, traditional surface electromyography (sEMG) electrodes face significant challenges in densely packed muscle regions with large skin deformations, leading to severe signal crosstalk and unstable contact. Here, we report a flexible, low-cost 16-channel copper electrode array system designed for the high-density monitoring of multiple forearm muscle activities. Through a facile fabrication process, rigid copper is transformed into a conformable sensing interface. The optimized serpentine interconnects endow the array with excellent stretchability and effectively isolate motion-induced stress, ensuring high-quality signal acquisition under complex deformations. The high-density 2 × 8 array enables the spatiotemporal mapping of distributed flexor and extensor muscle groups. Integrated with a customized wireless data acquisition system, the array successfully demonstrates real-time, multi-channel sEMG monitoring of various hand movements (e.g., fist clenching, wrist flexion/extension), clearly revealing specific muscle activation patterns. This low-cost, high-performance flexible sensor array provides a highly promising tool for complex gesture decoding, electromyographic imaging, and next-generation wearable HMIs.
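
Spatiotemporal activation mapping of this kind usually starts from windowed RMS amplitudes per channel. A sketch on synthetic 16-channel data, with the sampling rate and window length assumed:

```python
# Illustrative sketch: windowed RMS activation for a 2 x 8 electrode array,
# a standard first step for spatiotemporal muscle maps. Signal is synthetic.
import numpy as np

fs = 1000                                  # assumed sampling rate (Hz)
n_channels, n_samples = 16, 5 * fs         # 16 channels, 5 s of data
emg = np.random.default_rng(1).standard_normal((n_channels, n_samples))

win = int(0.2 * fs)                        # 200 ms analysis window
n_win = n_samples // win
rms = np.sqrt(
    (emg[:, : n_win * win] ** 2).reshape(n_channels, n_win, win).mean(axis=2)
)
# Reshape each time step into the physical 2 x 8 electrode layout.
activation_maps = rms.T.reshape(n_win, 2, 8)
print(activation_maps.shape)               # (windows, rows, cols)
```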

28 pages, 3548 KB  
Article
Edge Computing Approach to AI-Based Gesture for Human–Robot Interaction and Control
by Nikola Ivačko, Ivan Ćirić and Miloš Simonović
Computers 2026, 15(4), 241; https://doi.org/10.3390/computers15040241 - 14 Apr 2026
Viewed by 461
Abstract
This paper presents an edge-deployable vision-based framework for human–robot interaction using an xArm collaborative robot, a single RGB camera mounted on the robot wrist, and lightweight AI-based perception modules. The system enables intuitive, contact-free control by combining hand understanding and object detection within a unified perception–decision–control pipeline. Hand landmarks are extracted using MediaPipe Hands, from which continuous hand trajectories, static gestures, and dynamic gestures are derived. Task objects are detected using a YOLO-based model, and both hand and object observations are mapped into the robot workspace using ArUco-based planar calibration. To ensure stable robot motion, the hand control signal is smoothed using low-pass and Kalman filtering, while dynamic gestures such as waving are recognized using a lightweight LSTM classifier. The complete pipeline runs locally on edge hardware, specifically NVIDIA Jetson Orin Nano and Raspberry Pi 5 with a Hailo AI accelerator. Experimental evaluation includes trajectory stability, gesture recognition reliability, and runtime performance on both platforms. Results show that filtering significantly reduces hand-tracking jitter, gesture recognition provides stable command states for control, and both edge devices support real-time operation, with Jetson achieving consistently lower runtime than Raspberry Pi. The proposed system demonstrates the feasibility of low-cost edge AI solutions for responsive and practical human–robot interaction in collaborative industrial environments.
(This article belongs to the Special Issue Intelligent Edge: When AI Meets Edge Computing)
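
A minimal sketch of the smoothing stage described above: an exponential low-pass followed by a scalar Kalman filter on one tracked coordinate. The noise variances and gain are illustrative, not the authors' tuning:

```python
# Jitter reduction on a single hand-landmark coordinate: low-pass + Kalman.
import numpy as np

rng = np.random.default_rng(2)
true_x = np.linspace(0.2, 0.8, 200)                  # smooth hand motion
measured = true_x + 0.02 * rng.standard_normal(200)  # noisy landmark estimates

# Stage 1: exponential low-pass.
alpha, lp = 0.3, np.empty_like(measured)
lp[0] = measured[0]
for t in range(1, len(measured)):
    lp[t] = alpha * measured[t] + (1 - alpha) * lp[t - 1]

# Stage 2: scalar Kalman filter (random-walk state model).
q, r = 1e-5, 4e-4            # assumed process / measurement noise variances
x, p = lp[0], 1.0
smoothed = []
for z in lp:
    p += q                   # predict
    k = p / (p + r)          # Kalman gain
    x += k * (z - x)         # update with measurement z
    p *= 1 - k
    smoothed.append(x)
print(f"residual std: {np.std(np.array(smoothed) - true_x):.4f}")
```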

14 pages, 2318 KB  
Article
A Flexible Wearable Data Glove Based on Hybrid Fiber-Optic Sensing for Hand Motion Monitoring
by Jing Li, Xiangting Hou, Ke Du, Huiying Piao and Cheng Li
Materials 2026, 19(8), 1525; https://doi.org/10.3390/ma19081525 - 10 Apr 2026
Viewed by 398
Abstract
Wearable data gloves often suffer from electromagnetic interference, insufficient substrate stability, and limited capability for multi-degree-of-freedom motion measurement. To address these limitations, a flexible glove incorporating a hybrid POF-FBG sensing scheme was designed and fabricated. Plastic optical fibers (POFs) were side-polished and patterned with long-period gratings to improve sensitivity to wrist flexion-extension and abduction-adduction. Then fiber Bragg gratings (FBGs) were embedded in a polydimethylsiloxane substrate and encapsulated using thermoplastic polyurethane fixtures to reduce the influence of skin stretching and improve the measurement accuracy of finger-joint angles. Moreover, a thermoplastic polyurethane skeleton with an adaptive sliding-rail structure was 3D printed to maintain the stability of the sensor placement at the joints. Experimental results demonstrated mean absolute errors of 4.06°, 1.38°, and 1.70° for wrist flexion-extension, abduction-adduction, and finger-joint bending, respectively, along with excellent gesture classification using a support vector machine algorithm, which indicates great potential in virtual reality interaction and hand rehabilitation applications.
(This article belongs to the Special Issue Advances in Optical Fiber Materials and Their Applications)
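
The final classification step can be illustrated with scikit-learn's SVC over joint-angle feature vectors; the gesture count, feature dimensionality, and data below are synthetic stand-ins for the glove's measurements:

```python
# Hedged sketch: SVM gesture classification from joint-angle features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n_gestures, per_class, n_joints = 6, 50, 7      # assumed gesture set / features
centers = rng.uniform(0, 90, size=(n_gestures, n_joints))   # degrees
X = np.vstack([c + 5 * rng.standard_normal((per_class, n_joints)) for c in centers])
y = np.repeat(np.arange(n_gestures), per_class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = SVC(kernel="rbf", C=10.0).fit(X_tr, y_tr)
print(f"gesture classification accuracy: {clf.score(X_te, y_te):.2f}")
```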

27 pages, 6782 KB  
Article
Development and Evaluation of a Data Glove-Based System for Assisting Puzzle Solving
by Shashank Srikanth Bharadwaj, Kazuma Sato and Lei Jing
Sensors 2026, 26(8), 2341; https://doi.org/10.3390/s26082341 - 10 Apr 2026
Viewed by 413
Abstract
Many hands-on tasks remain difficult to fully automate because they require human dexterity and flexible object handling. Data gloves offer a promising interface for sensing hand–object interactions, but most prior systems focus on gesture recognition or object classification rather than closed-loop, step-by-step task guidance. In this work, we develop and evaluate a tactile-sensing operation support system using an e-textile data glove with 88 pressure sensors, a tactile pressure sheet for placement verification, and a GUI that provides step-by-step instructions. As a core component, a CNN classifies the grasped state as bare hand or one of four discs with 93.3% accuracy using 16,175 training samples collected from five participants. In a user study on the Tower of Hanoi task as a controlled proxy for multi-step manipulation, the system reduced mean solving time by 51.5% (from 242.6 s to 117.8 s), reduced the number of disc movements (from 35.4 to 15, about 20 fewer moves on average), and lowered perceived workload (NASA-TLX) by 53.1% (from 68.5 to 32.1), while achieving a SUS score of 75. These results demonstrate the feasibility of tactile-based step verification and guidance in a controlled multi-step task; broader generalization requires evaluation with larger and more diverse participant groups and tasks.
(This article belongs to the Section Intelligent Sensors)
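
A hedged sketch of such a grasp-state classifier: a small 1D CNN mapping an 88-sensor pressure frame to five classes (bare hand plus four discs). Layer sizes are assumptions, not the paper's architecture:

```python
# Illustrative grasp-state classifier over an 88-sensor pressure frame.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(88, 1)),   # one pressure value per sensor
    tf.keras.layers.Conv1D(32, 5, activation="relu"),
    tf.keras.layers.MaxPooling1D(2),
    tf.keras.layers.Conv1D(64, 5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),   # bare hand + 4 discs
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```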

7 pages, 1242 KB  
Proceeding Paper
Real-Time Recognition of Dual-Arm Motion Using Joint Direction Vectors and Temporal Deep Learning
by Yi-Hsiang Tseng, Che-Wei Hsu and Yih-Guang Leu
Eng. Proc. 2025, 120(1), 75; https://doi.org/10.3390/engproc2025120075 - 9 Apr 2026
Viewed by 226
Abstract
We developed a dual-arm motion recognition system designed for real-time upper-limb movement analysis using video input. The system integrates MediaPipe Hands for skeletal critical point detection, a feature extraction pipeline that encodes spatial and temporal characteristics from upper-limb joints, and a three-layer long short-term memory network for temporal modeling and classification. By computing directional vectors from the shoulder to the elbow and wrist, a 168-dimensional feature vector is generated per frame. Sequences of 90 frames are used to capture full motion patterns. The system effectively supports multi-class recognition of coordinated dual-arm gestures, offering applications in rehabilitation, gesture control, and human–computer interaction.
(This article belongs to the Proceedings of 8th International Conference on Knowledge Innovation and Invention)
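
The direction-vector features reduce to normalized shoulder-to-elbow and shoulder-to-wrist vectors per frame. A minimal sketch with placeholder keypoints standing in for pose-estimation output:

```python
# Joint direction vectors for one arm in one frame (keypoints are placeholders).
import numpy as np

def unit(v):
    """Normalize a 3-D vector, guarding against zero length."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

# Hypothetical 3-D keypoints for one frame (one arm shown).
shoulder = np.array([0.45, 0.30, 0.0])
elbow    = np.array([0.50, 0.45, 0.1])
wrist    = np.array([0.55, 0.60, 0.2])

dir_shoulder_elbow = unit(elbow - shoulder)
dir_shoulder_wrist = unit(wrist - shoulder)
frame_features = np.concatenate([dir_shoulder_elbow, dir_shoulder_wrist])
print(frame_features)   # stacking such vectors across joints and frames yields
                        # the per-frame feature vector fed to the LSTM
```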

24 pages, 4042 KB  
Article
Memory Cueing and Augmented Sensory Feedback in Virtual Reality as an Assistive Technology for Enhancing Hand Motor Performance
by Zachary Marvin, Sophie Dewil, Yu Shi, Noam Y. Harel and Raviraj Nataraj
Technologies 2026, 14(4), 217; https://doi.org/10.3390/technologies14040217 - 8 Apr 2026
Viewed by 406
Abstract
Neurological injuries and disorders affecting hand motor control can severely impair the ability to perform activities of daily living and substantially reduce quality of life. Technologies such as virtual reality (VR) are increasingly used to address fundamental challenges in therapy, including motivation and engagement; further, programmable features of digital interfaces offer additional opportunities to personalize and optimize motor training. In this proof-of-concept study, we developed and evaluated a novel VR-based training framework to support improved dexterity and hand function using physiological (sensory-driven) and cognitive (memory) cues designed to promote greater task-relevant neural engagement. The proposed approach leverages the integration of augmented sensory feedback (ASF) with memory-anchored cues for motor learning of target hand gestures. Using a within-subjects design, thirteen neurotypical adults completed four training conditions: (1) control (baseline gesture-matching in VR), (2) visual ASF (enhanced visualization and feedback of gesture accuracy), (3) memory-anchored cues (associating gestures with semantically meaningful entities, loosely analogous to American Sign Language), and (4) hybrid multimodal (visual ASF + memory-anchored cues). Training with the hybrid condition produced the fastest skill acquisition (9.3 trials to reach an 80% accuracy threshold) and the steepest initial learning slope (1.86 ± 0.12%/trial), with all conditions differing significantly in initial slope (all p < 0.002). Post-training assessment showed that the hybrid condition achieved the highest gesture accuracy (95.2%), greatest normalized post-training accuracy gain (14.3% above baseline), fastest execution time to target gesture (1.14 s), and lowest variability in gestural kinematics (SD = 3.9%). Both ASF and memory-anchored cue conditions each also independently outperformed the control condition on gesture accuracy (both p ≤ 0.002), with omnibus ANOVAs indicating significant condition effects across metrics. Together, these findings suggest that pairing ASF cues with memory-based cognitive scaffolding can yield additive benefits for motor skill acquisition and stability. Pending validation in clinical populations, such approaches may inform the design of VR-based motor training frameworks for rehabilitation.
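
Two of the reported learning metrics, trials to an 80% accuracy threshold and the initial learning slope, can be computed from a per-trial accuracy series as sketched below on synthetic data:

```python
# Illustrative learning-curve metrics; the accuracy series is synthetic and the
# "initial" window for the slope fit is an assumption.
import numpy as np

rng = np.random.default_rng(4)
trials = np.arange(1, 31)
accuracy = np.clip(60 + 1.8 * trials + rng.normal(0, 2, trials.size), 0, 100)

trials_to_threshold = trials[np.argmax(accuracy >= 80.0)]   # first crossing
early = slice(0, 10)                                        # assumed early window
slope = np.polyfit(trials[early], accuracy[early], 1)[0]    # %/trial
print(f"trials to 80%: {trials_to_threshold}, initial slope: {slope:.2f} %/trial")
```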

16 pages, 1624 KB  
Article
Surface EMG-Based Hand Gesture Recognition Using a Hybrid Multistream Deep Learning Architecture
by Yusuf Çelik and Umit Can
Sensors 2026, 26(7), 2281; https://doi.org/10.3390/s26072281 - 7 Apr 2026
Viewed by 436
Abstract
Surface electromyography (sEMG) enables non-invasive measurement of muscle activity for applications such as human–machine interaction, rehabilitation, and prosthesis control. However, high noise levels, inter-subject variability, and the complex nature of muscle activation hinder robust gesture classification. This study proposes a multistream hybrid deep-learning architecture for the FORS-EMG dataset to address these challenges. The model integrates Temporal Convolutional Networks (TCN), depthwise separable convolutions, bidirectional Long Short-Term Memory (LSTM)–Gated Recurrent Unit (GRU) layers, and a Transformer encoder to capture complementary temporal and spectral patterns, and an ArcFace-based classifier to enhance class separability. We evaluate the approach under three protocols: subject-wise, random split without augmentation, and random split with augmentation. In the augmented random-split setting, the model attains 96.4% accuracy, surpassing previously reported values. In the subject-wise setting, accuracy is 74%, revealing limited cross-user generalization. The results demonstrate the method’s high performance and highlight the impact of data-partition strategies for real-world sEMG-based gesture recognition.
(This article belongs to the Special Issue Machine Learning in Biomedical Signal Processing)
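
The ArcFace head mentioned above replaces plain softmax logits with margin-adjusted cosine similarities. A NumPy sketch of the core computation, using common default scale and margin values rather than the paper's settings:

```python
# Hedged sketch of ArcFace-style logits: cosine similarity with an additive
# angular margin on the target class, then scaling.
import numpy as np

def arcface_logits(embeddings, weights, labels, s=30.0, m=0.50):
    """embeddings: (N, D); weights: (C, D) class centers; labels: (N,)."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)           # cosine similarity to each class
    theta = np.arccos(cos)
    margin = np.zeros_like(cos)
    margin[np.arange(len(labels)), labels] = m  # add margin m to the true class
    return s * np.cos(theta + margin)           # scaled, margin-adjusted logits

rng = np.random.default_rng(5)
logits = arcface_logits(rng.standard_normal((4, 8)),
                        rng.standard_normal((12, 8)),
                        np.array([0, 3, 7, 11]))
print(logits.shape)   # (4, 12): feed into softmax cross-entropy
```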

23 pages, 5784 KB  
Article
Learning Italian Hand Gesture Culture Through an Automatic Gesture Recognition Approach
by Chiara Innocente, Giorgio Di Pisa, Irene Lionetti, Andrea Mamoli, Manuela Vitulano, Giorgia Marullo, Simone Maffei, Enrico Vezzetti and Luca Ulrich
Future Internet 2026, 18(4), 177; https://doi.org/10.3390/fi18040177 - 24 Mar 2026
Viewed by 351
Abstract
Italian hand gestures constitute a distinctive and widely recognized form of nonverbal communication, deeply embedded in everyday interaction and cultural identity. Despite their prominence, these gestures are rarely formalized or systematically taught, posing challenges for foreign speakers and visitors seeking to interpret their meaning and pragmatic use. Moreover, their ephemeral and embodied nature complicates traditional preservation and transmission approaches, positioning them within the broader domain of intangible cultural heritage. This paper introduces a machine learning–based framework for recognizing iconic Italian hand gestures, designed to support cultural learning and engagement among foreign speakers and visitors. The approach combines RGB–D sensing with depth-enhanced geometric feature extraction, employing interpretable classification models trained on a purpose-built dataset. The recognition system is integrated into a non-immersive virtual reality application simulating an interactive digital totem conceived for public arrival spaces, providing tutorial content, real-time gesture recognition, and immediate feedback within a playful and accessible learning environment. Three supervised machine learning pipelines were evaluated, and Random Forest achieved the best overall performance. Its integration with an Isolation Forest module was further considered for deployment, achieving a macro-averaged accuracy and F1-score of 0.82 under a 5-fold cross-validation protocol. An experimental user study was conducted with 25 subjects to evaluate the proposed interactive system in terms of usability, user engagement, and learning effectiveness, obtaining favorable results and demonstrating its potential as a practical tool for cultural education and intercultural communication.
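
A minimal sketch of the two-stage decision the abstract outlines: an Isolation Forest screens out-of-distribution inputs before a Random Forest assigns a gesture label. Features and labels are synthetic stand-ins for the depth-enhanced geometric features:

```python
# Hedged sketch: Isolation Forest gating + Random Forest classification.
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.default_rng(6)
X = rng.standard_normal((300, 12))          # synthetic geometric feature vectors
y = rng.integers(0, 8, 300)                 # 8 assumed gesture classes

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
novelty = IsolationForest(random_state=0).fit(X)

sample = rng.standard_normal((1, 12))
if novelty.predict(sample)[0] == 1:         # 1 = inlier, -1 = outlier
    print("gesture:", clf.predict(sample)[0])
else:
    print("not a known gesture")
```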

19 pages, 759 KB  
Article
Dual-Stream BiLSTM–Transformer Architecture for Real-Time Two-Handed Dynamic Sign Language Gesture Recognition
by Enachi Andrei, Turcu Corneliu-Octavian, Culea George, Andrioaia Dragos-Alexandru, Ungureanu Andrei-Gabriel and Sghera Bogdan-Constantin
Appl. Sci. 2026, 16(6), 2912; https://doi.org/10.3390/app16062912 - 18 Mar 2026
Viewed by 293
Abstract
Two-handed dynamic gesture recognition represents a fundamental component of sign language interpretation involving the modeling of temporal dependencies and inter-hand coordination. In this task, a major challenge is modeling asymmetric motion patterns, as well as bidirectional and long-range temporal dependencies. Most existing frameworks rely on early fusion strategies that merge joints, keypoints, or landmarks from both hands in early processing stages, primarily to reduce model complexity and enforce a unified representation. In this work, a novel dual-stream BiLSTM–Transformer model architecture is proposed for two-handed dynamic sign language recognition, where parallel encoders process the trajectories of each hand independently. To capture spatial and temporal dependencies for each hand, an attention-based cross-hand fusion mechanism is employed, with hand landmarks extracted by the MediaPipe Hands framework as a preprocessing step to enable real-time CPU-based inference. Experimental evaluation conducted on custom Romanian Sign Language dynamic gesture datasets indicates that the proposed dual-stream-based system outperforms single-handed baselines, achieving higher recognition accuracy for asymmetric gestures and consistent performance gains for synchronized two-handed gestures. The proposed architecture represents an efficient and lightweight solution suitable for real-time sign language recognition and interpretation.
(This article belongs to the Section Computing and Artificial Intelligence)
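
A hedged Keras sketch of a dual-stream design in this spirit: one BiLSTM encoder per hand, fused by cross-hand attention. The dimensions, the shared attention layer, and the class count are assumptions, not the authors' architecture:

```python
# Dual-stream per-hand encoders with cross-hand attention fusion (sketch).
import tensorflow as tf

T, F = 48, 63                      # assumed: frames, 21 landmarks x 3 coords/hand
left = tf.keras.Input(shape=(T, F), name="left_hand")
right = tf.keras.Input(shape=(T, F), name="right_hand")

def encoder(x):
    return tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64, return_sequences=True))(x)

h_left, h_right = encoder(left), encoder(right)
# Cross-hand fusion: each stream attends to the other (shared attention weights).
att = tf.keras.layers.MultiHeadAttention(num_heads=4, key_dim=32)
fused = tf.keras.layers.Concatenate()([
    att(query=h_left, value=h_right),
    att(query=h_right, value=h_left),
])
pooled = tf.keras.layers.GlobalAveragePooling1D()(fused)
out = tf.keras.layers.Dense(20, activation="softmax")(pooled)  # 20 signs assumed

model = tf.keras.Model([left, right], out)
model.summary()
```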

24 pages, 3935 KB  
Article
PSO Trajectory Optimization of Robot Arm for Ultrasonic Testing of Complex Curved Surface
by Rao Yao, Yahui Lv, Kai Wang, Yan Gao and Dazhong Wang
Coatings 2026, 16(3), 332; https://doi.org/10.3390/coatings16030332 - 8 Mar 2026
Viewed by 301
Abstract
In ultrasonic nondestructive testing, maintaining the ultrasonic sensor in normal contact with curved surfaces is pivotal for acquiring valid defect signals. Replacing manual operation with a robotic arm ensures stable signal collection, while stable and fast trajectory planning for complex curved-surface tracking remains a key challenge. This research investigates gesture-driven robotic trajectory planning and impact optimization via the particle swarm optimization (PSO) algorithm in the robot joint space for rapid and smooth movement. Gesture trajectories are acquired via a Leap Motion device, with unified mapping established through spatial transformations among gesture, simulation, and experimental robot spaces. PSO is utilized to optimize trajectories, enhancing accuracy and controllability. Median filtering is applied to trajectory coordinate data to suppress errors from hand tremor and sensor limitations, followed by introducing a surface normal offset to generate pose matrices at each trajectory point. Systematic comparison of interpolation methods (polynomial, cubic spline, circular, cubic B-spline) reveals that cubic B-spline interpolation achieves the shortest execution time under angular acceleration constraints. The results show that PSO optimizes point-to-point trajectories based on 5-5-5 polynomial interpolation, with impact force and execution time as objectives, yielding the optimal trajectory with minimal time under acceleration constraints. This research provides valuable methodological references for robotic manipulator trajectory planning and optimization in complex curved-surface ultrasonic testing.
(This article belongs to the Section Surface Characterization, Deposition and Modification)
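
The 5-5-5 interpolation refers to quintic polynomials per segment, whose six coefficients follow from position, velocity, and acceleration boundary conditions; PSO then searches over such segments for minimal impact and time. A sketch of one segment with made-up boundary values:

```python
# One quintic trajectory segment: solve q(t) = c0 + c1 t + ... + c5 t^5 from
# boundary conditions on position, velocity, and acceleration.
import numpy as np

def quintic_coeffs(q0, qf, tf, v0=0.0, vf=0.0, a0=0.0, af=0.0):
    """Coefficients of a quintic joint trajectory on [0, tf]."""
    A = np.array([
        [1, 0,  0,      0,       0,        0],        # q(0)
        [0, 1,  0,      0,       0,        0],        # q'(0)
        [0, 0,  2,      0,       0,        0],        # q''(0)
        [1, tf, tf**2,  tf**3,   tf**4,    tf**5],    # q(tf)
        [0, 1,  2*tf,   3*tf**2, 4*tf**3,  5*tf**4],  # q'(tf)
        [0, 0,  2,      6*tf,    12*tf**2, 20*tf**3], # q''(tf)
    ], dtype=float)
    return np.linalg.solve(A, np.array([q0, v0, a0, qf, vf, af]))

c = quintic_coeffs(q0=0.0, qf=0.8, tf=2.0)      # joint angle 0 -> 0.8 rad in 2 s
t = np.linspace(0, 2.0, 5)
q = sum(ci * t**i for i, ci in enumerate(c))    # evaluate the trajectory
print(np.round(q, 4))                           # smooth start/stop, zero end rates
```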

17 pages, 1701 KB  
Article
CLIP-ArASL: A Lightweight Multimodal Model for Arabic Sign Language Recognition
by Naif Alasmari
Appl. Sci. 2026, 16(5), 2573; https://doi.org/10.3390/app16052573 - 7 Mar 2026
Viewed by 338
Abstract
Arabic sign language (ArASL) is the primary communication medium for Deaf and hard-of-hearing people across Arabic-speaking communities. Most current ArASL recognition systems are based solely on visual features and do not incorporate linguistic or semantic information that could improve generalization and semantic grounding. This paper introduces CLIP-ArASL, a lightweight CLIP-style multimodal approach for static ArASL letter recognition that aligns visual hand gestures with bilingual textual descriptions. The approach integrates an EfficientNet-B0 image encoder with a MiniLM text encoder to learn a shared embedding space using a hybrid objective that combines contrastive and cross-entropy losses. This design supports supervised classification on seen classes and zero-shot prediction on unseen classes using textual class representations. The proposed approach is evaluated on two public datasets, ArASL2018 and ArASL21L. Under supervised evaluation, recognition accuracies of 99.25±0.14% and 91.51±1.29% are achieved, respectively. Zero-shot performance is assessed by withholding 20% of gesture classes during training and predicting them using only their textual descriptions. In this setting, accuracies of 55.2±12.15% on ArASL2018 and 37.6±9.07% on ArASL21L are obtained. These results show that multimodal vision–language alignment supports semantic transfer and enables recognition of unseen classes.
(This article belongs to the Special Issue Machine Learning in Computer Vision and Image Processing)
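
A hedged sketch of the hybrid objective described above: a symmetric CLIP-style contrastive loss over paired image/text embeddings plus a cross-entropy term. The temperature, weighting, and shapes are assumptions:

```python
# Hybrid contrastive + cross-entropy loss in the spirit of CLIP (sketch).
import tensorflow as tf

def hybrid_loss(img_emb, txt_emb, class_logits, labels, tau=0.07, lam=0.5):
    img = tf.math.l2_normalize(img_emb, axis=1)
    txt = tf.math.l2_normalize(txt_emb, axis=1)
    sim = tf.matmul(img, txt, transpose_b=True) / tau   # (N, N) similarities
    targets = tf.range(tf.shape(sim)[0])                # matched pairs on diagonal
    contrastive = 0.5 * (
        tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
            targets, sim, from_logits=True)) +
        tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
            targets, tf.transpose(sim), from_logits=True)))
    ce = tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
        labels, class_logits, from_logits=True))
    return lam * contrastive + (1 - lam) * ce

# Example shapes: batch of 8, 256-d shared space, 32 letter classes (assumed).
N, D, C = 8, 256, 32
loss = hybrid_loss(tf.random.normal((N, D)), tf.random.normal((N, D)),
                   tf.random.normal((N, C)),
                   tf.random.uniform((N,), 0, C, tf.int32))
print(float(loss))
```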

23 pages, 417 KB  
Review
A Review of the Effectiveness of Hand Gestures in Second Language Phonetic Training
by Xiaotong Xi and Peng Li
Languages 2026, 11(3), 43; https://doi.org/10.3390/languages11030043 - 4 Mar 2026
Cited by 1 | Viewed by 690
Abstract
This narrative review synthesizes 24 empirical studies on the role of four types of pedagogical gestures (beat, durational, pitch, and articulatory) in second language (L2) phonetic training since 2010. We reviewed studies involving training interventions to assess the efficacy, mediating factors, and robustness of multimodal training. The findings confirm that gestural training is a powerful tool, yielding the most robust positive effects for L2 speech production and the acquisition of suprasegmental features. Crucially, the effectiveness is highly dependent on gesture-sound consistency and visual saliency of the target phonetic/prosodic feature. However, results are mixed regarding perceptual learning and the generalization of gains to untrained items or novel contexts. While the literature supports the value of gestural training, there are gaps in determining the optimal training paradigm (observing gestures vs. performing gestures), accounting for individual learner differences, and establishing long-term retention and ecological validity. Future research should incorporate longitudinal designs and neurophysiological methods to fully illuminate the cognitive mechanisms that drive the body–mind link in L2 speech acquisition.