Search Results (479)

Search Parameters:
Keywords = hand gesture recognition

26 pages, 5171 KB  
Article
A Deep Forest and Histogram Feature Fusion Framework for sEMG-Based Hand Gesture Recognition with Enhanced Signal Representation
by Huibin Li, Xiaorong Guan, Sijing Wang and Zhihua Yuan
Electronics 2026, 15(9), 1935; https://doi.org/10.3390/electronics15091935 - 2 May 2026
Abstract
A novel hand gesture recognition framework based on surface electromyography (sEMG) is proposed for soldier operational scenarios under small-sample conditions. The framework integrates Empirical Mode Decomposition (EMD) for signal reconstruction, histogram-based features, and the Deep Forest (DF) classifier. Evaluations are conducted under two protocols: subject-wise evaluation and mixed-subject nested 8-fold cross-validation. Under subject-wise evaluation, the proposed EMD-HIST-DF method achieves 99.94% accuracy with 0.00027 ms per sample. Under mixed-subject nested 8-fold cross-validation, 98.41% accuracy is maintained with 0.00053 ms per sample. Ablation studies confirm the significant contribution of EMD-based signal enhancement in the mixed-subject setting (approximately 10.6 percentage points, p < 0.001). Parameter sensitivity analysis guides optimal parameter selection, and statistical tests confirm significant performance gains over baseline methods. Confusion matrices illustrate high per-class accuracy with minimal inter-class confusion. The framework shows potential as a promising solution for accurate, efficient, and sample-sparing gesture recognition in resource-constrained environments such as supernumerary robotic limb control. Full article
(This article belongs to the Section Circuit and Signal Processing)
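As a rough illustration of the histogram-based feature idea named in this abstract (a generic sketch, not the authors' implementation; window length, stride, bin count, and amplitude range are assumed values), each sliding window of a multi-channel sEMG recording can be summarized by a fixed-bin amplitude histogram per channel:

```python
import numpy as np

def histogram_features(emg, win=200, step=100, bins=20, lo=-1.0, hi=1.0):
    """Summarize each sliding window of a (samples, channels) sEMG array
    with a per-channel amplitude histogram, normalized to a probability."""
    feats = []
    for start in range(0, emg.shape[0] - win + 1, step):
        window = emg[start:start + win]                      # (win, channels)
        per_ch = [np.histogram(window[:, c], bins=bins, range=(lo, hi))[0]
                  for c in range(window.shape[1])]
        hist = np.concatenate(per_ch).astype(float)
        feats.append(hist / hist.sum())                      # normalize counts
    return np.asarray(feats)                                 # (n_windows, channels*bins)

# toy usage: 4-channel synthetic signal
emg = np.random.randn(2000, 4) * 0.2
X = histogram_features(emg)
print(X.shape)   # (19, 80)
```

In the paper these features are computed after EMD-based signal reconstruction and fed to a Deep Forest classifier; both of those stages are omitted from this toy sketch.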

28 pages, 12791 KB  
Article
Empirical Validation of Fitts’ Law in Virtual Reality: Modeling, Prediction, and Modality Comparison
by Nikolina Rodin, Dario Ogrizović, Luka Batistić and Sandi Ljubic
Multimodal Technol. Interact. 2026, 10(5), 49; https://doi.org/10.3390/mti10050049 - 1 May 2026
Viewed by 9
Abstract
Fitts’ law is a foundational model for predicting pointing performance and has been increasingly explored in immersive virtual reality (VR) environments. This paper presents a controlled experimental framework for deriving modality-specific Fitts’ law models in VR and evaluating their predictive transfer to applied interaction tasks. The framework comprises two scenarios. The first replicates a standardized ISO 9241 pointing task in a 3D virtual environment to derive predictive movement time models by systematically varying target distance (20–50 cm), target size (2.5–5 cm), and spatial configuration (0°, 45°, 90°, 135°). The second simulates an applied warehouse-inspired task involving tool sorting and structured placement actions to evaluate the generalizability of the derived models in more ecologically valid VR interactions. Thirty-two participants completed all tasks using the Meta Quest 3 headset and two interaction modalities: a handheld controller and hand tracking with gesture recognition. Results show that Fitts’ law remains a strong predictor of movement time for 3D pointing in VR, with high linear fits for both the controller (R² = 0.9615) and hand tracking (R² = 0.9668). However, models derived from standardized pointing tasks showed limited transferability to applied object-manipulation scenarios, producing prediction errors of approximately 27–35% and systematically underestimating movement times. Additionally, both objective metrics and subjective evaluations indicated that controller-based interaction outperformed hand tracking in efficiency, accuracy, perceived workload, and usability. These findings highlight both the robustness and limitations of Fitts-based performance modeling in realistic VR interaction contexts. Full article
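For readers unfamiliar with the model, the Shannon formulation of Fitts’ law predicts movement time MT from an index of difficulty ID = log2(D/W + 1), where D is target distance and W is target width, and the modality-specific models above are linear fits MT = a + b·ID. A minimal sketch of such a fit, using made-up sample points rather than the study's measurements:

```python
import numpy as np

def index_of_difficulty(distance_cm, width_cm):
    """Shannon formulation: ID = log2(D / W + 1), in bits."""
    return np.log2(distance_cm / width_cm + 1.0)

# hypothetical (distance, width, mean movement time) triples, NOT the paper's data
D = np.array([20, 30, 40, 50, 20, 50], dtype=float)
W = np.array([5.0, 5.0, 2.5, 2.5, 2.5, 5.0])
MT = np.array([0.62, 0.71, 0.95, 1.02, 0.83, 0.78])   # seconds

ID = index_of_difficulty(D, W)
b, a = np.polyfit(ID, MT, 1)          # MT ≈ a + b * ID
pred = a + b * ID
ss_res = np.sum((MT - pred) ** 2)
ss_tot = np.sum((MT - MT.mean()) ** 2)
print(f"a={a:.3f} s, b={b:.3f} s/bit, R^2={1 - ss_res/ss_tot:.3f}")
```
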
27 pages, 7124 KB  
Article
HAMSNet: An Explainable Multi-Scale 1D Hydra-CNN for sEMG-Based Hand Gesture Recognition
by Nebras Sobahi, Salih Taha Alperen Özçelik, Muhammed Halil Akpınar and Abdulkadir Sengur
Symmetry 2026, 18(5), 777; https://doi.org/10.3390/sym18050777 - 1 May 2026
Viewed by 51
Abstract
Background/Objectives: Surface Electromyography (sEMG) presents tremendous potential as a non-invasive interface for the detection of motor intent, yet the low signal-to-noise ratio, subject variability, and the need to capture patterns at both long and short timescales make the recognition of hand gestures challenging. Methods: In this paper, the HAMSNet model is presented, which is designed for the recognition of ten different hand gestures using the sEMG signal. Sliding window segmentation is employed to segment the signal into fixed-length time windows, and channel-wise z-score normalization is applied to reduce amplitude variations. To capture the signal at different timescales, the model utilizes the Hydra 1D convolutional neural network (1D CNN), which extracts both short-range and long-range features. Furthermore, the learned features are refined using the multi-head self-attention technique, which highlights the more discriminative time regions. Finally, the Squeeze-and-Excitation (SE) technique is employed to refine the obtained features by channel-wise recalibration. Results: The model is trained in end-to-end fashion, and the results are validated using the 80/20 split method, where the model achieves 0.9894 accuracy, Macro F1 of 0.9894, and an ROC-AUC score of 0.99977. Additionally, the model achieves an MSE score of 0.001969. Furthermore, the model also achieves high accuracy under the leave-one-subject-out cross-validation (LOSO-CV) protocol, providing encouraging evidence of subject-independent performance within the evaluated dataset. Conclusions: The obtained HAMSNet model’s results are compared with the existing results from the literature on the same dataset. The comparisons show that the HAMSNet outperforms the existing methods. An ablation study is conducted to validate the contribution of each component to the proposed model and an explainability analysis is conducted to indicate the interpretability of the model’s decisions. Full article
(This article belongs to the Section Computer)
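The windowing and normalization steps described in this abstract are standard preprocessing; a minimal sketch (window length, stride, and array shapes are assumptions, not taken from the paper):

```python
import numpy as np

def zscore_per_channel(emg, eps=1e-8):
    """Channel-wise z-score over the whole recording: (samples, channels)."""
    mu = emg.mean(axis=0, keepdims=True)
    sd = emg.std(axis=0, keepdims=True)
    return (emg - mu) / (sd + eps)

def sliding_windows(emg, win, step):
    """Cut a (samples, channels) array into overlapping (win, channels) segments."""
    return np.stack([emg[s:s + win]
                     for s in range(0, emg.shape[0] - win + 1, step)])

emg = np.random.randn(5000, 8)            # toy 8-channel recording
segments = sliding_windows(zscore_per_channel(emg), win=256, step=128)
print(segments.shape)                      # (n_windows, 256, 8)
```
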

14 pages, 3479 KB  
Article
Electrospun Surface-Modified Epidermal Strain Sensors Enable Silent Speech and Hand Gesture Recognition for Virtual Reality Interaction
by Zuowei Wang, Fuzheng Zhang, Qijing Lin, Hongze Ke, Yueming Gao, Wufeng Zhang, Jiawen He, Yan Ma, Na Liu, Dan Xian, Ping Yang, Libo Zhao, Ryutaro Maeda, Yael Hanein and Zhuangde Jiang
Nanomaterials 2026, 16(9), 520; https://doi.org/10.3390/nano16090520 - 25 Apr 2026
Viewed by 723
Abstract
Voice disorders severely limit verbal communication, creating a need for intuitive assistive technologies. To meet this need, we present epidermal strain sensors that capture strain signals during silent speech and hand gestures. A thin electrospun nanofiber layer integrated onto commercial polyurethane films guides uniform, controlled microcrack formation in screen-printed carbon conductive paths, achieving a gauge factor up to 243 over 0–40% strain. Signals from the seven-channel strain sensor array are recognized by a hybrid neural network that combines convolutional and Transformer architectures, reaching over 98% accuracy. The recognized outputs are rendered in virtual reality (VR), enabling intuitive, real-time communication. Moreover, the approach simplifies fabrication by enabling crack-based strain sensing with only a thin electrospun surface layer on commercial polyurethane films, eliminating the need for thick freestanding electrospun substrates. This cost-effective approach addresses limitations of conventional electrospun substrates by minimizing the thickness of the electrospun layer, thereby shortening the electrospinning time. Overall, the work demonstrates a method for translating natural non-verbal expressions into speech and text in VR, with promising applications in healthcare and assistive communication. Full article
(This article belongs to the Section Nanoelectronics, Nanosensors and Devices)
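For context, the gauge factor quoted above relates relative resistance change to applied strain, GF = (ΔR/R0)/ε. A one-line arithmetic illustration, using the reported operating point and the simplifying assumption that the maximum gauge factor holds across the full range:

```python
# Gauge factor: GF = (delta_R / R0) / strain
strain = 0.40          # 40% strain, upper end of the reported 0-40% range
GF = 243               # reported maximum gauge factor
relative_resistance_change = GF * strain   # simplification: GF treated as constant
print(relative_resistance_change)          # 97.2, i.e. delta_R ≈ 97 * R0 at 40% strain
```
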

27 pages, 3995 KB  
Article
Video-Based Arabic Sign Language Recognition with Mediapipe and Deep Learning Techniques
by Dana El-Rushaidat, Nour Almohammad, Raine Yeh and Kinda Fayyad
J. Imaging 2026, 12(4), 177; https://doi.org/10.3390/jimaging12040177 - 20 Apr 2026
Viewed by 410
Abstract
This paper addresses the critical communication barrier experienced by deaf and hearing-impaired individuals in the Arab world through the development of an affordable, video-based Arabic Sign Language (ArSL) recognition system. Designed for broad accessibility, the system eliminates specialized hardware by leveraging standard mobile or laptop cameras. Our methodology employs Mediapipe for real-time extraction of hand, face, and pose landmarks from video streams. These anatomical features are then processed by a hybrid deep learning model integrating Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), specifically Bidirectional Long Short-Term Memory (BiLSTM) layers. The CNN component captures spatial features, such as intricate hand shapes and body movements, within individual frames. Concurrently, BiLSTMs model long-term temporal dependencies and motion trajectories across consecutive frames. This integrated CNN-BiLSTM architecture is critical for generating a comprehensive spatiotemporal representation, enabling accurate differentiation of complex signs where meaning relies on both static gestures and dynamic transitions, thus preventing misclassification that CNN-only or RNN-only models would incur. Rigorously evaluated on the author-created JUST-SL dataset and the publicly available KArSL dataset, the system achieved 96% overall accuracy for JUST-SL and an impressive 99% for KArSL. These results demonstrate the system’s superior accuracy compared to previous research, particularly for recognizing full Arabic words, thereby significantly enhancing communication accessibility for the deaf and hearing-impaired community. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
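A minimal PyTorch sketch of the kind of CNN + BiLSTM stack described in this abstract (a generic illustration, not the authors' architecture; the per-frame landmark dimensionality, clip length, and layer sizes are assumptions):

```python
import torch
import torch.nn as nn

class CnnBiLstm(nn.Module):
    """Per-frame 1D conv over landmark features, then a BiLSTM over time."""
    def __init__(self, n_features=1662, n_classes=80, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(256, 128, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.lstm = nn.LSTM(128, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                 # x: (batch, time, n_features)
        z = self.conv(x.transpose(1, 2))  # conv over time: (batch, 128, time)
        out, _ = self.lstm(z.transpose(1, 2))
        return self.head(out[:, -1])      # classify from the last time step

model = CnnBiLstm()
logits = model(torch.randn(2, 30, 1662))  # 2 clips, 30 frames of assumed landmark features
print(logits.shape)                        # torch.Size([2, 80])
```

The 1662-value frame vector is an assumed figure corresponding to concatenated MediaPipe pose, face, and hand landmarks; the paper's exact feature layout may differ.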

28 pages, 3548 KB  
Article
Edge Computing Approach to AI-Based Gesture for Human–Robot Interaction and Control
by Nikola Ivačko, Ivan Ćirić and Miloš Simonović
Computers 2026, 15(4), 241; https://doi.org/10.3390/computers15040241 - 14 Apr 2026
Viewed by 621
Abstract
This paper presents an edge-deployable vision-based framework for human–robot interaction using an xArm collaborative robot, a single RGB camera mounted on the robot wrist, and lightweight AI-based perception modules. The system enables intuitive, contact-free control by combining hand understanding and object detection within a unified perception–decision–control pipeline. Hand landmarks are extracted using MediaPipe Hands, from which continuous hand trajectories, static gestures, and dynamic gestures are derived. Task objects are detected using a YOLO-based model, and both hand and object observations are mapped into the robot workspace using ArUco-based planar calibration. To ensure stable robot motion, the hand control signal is smoothed using low-pass and Kalman filtering, while dynamic gestures such as waving are recognized using a lightweight LSTM classifier. The complete pipeline runs locally on edge hardware, specifically NVIDIA Jetson Orin Nano and Raspberry Pi 5 with a Hailo AI accelerator. Experimental evaluation includes trajectory stability, gesture recognition reliability, and runtime performance on both platforms. Results show that filtering significantly reduces hand-tracking jitter, gesture recognition provides stable command states for control, and both edge devices support real-time operation, with Jetson achieving consistently lower runtime than Raspberry Pi. The proposed system demonstrates the feasibility of low-cost edge AI solutions for responsive and practical human–robot interaction in collaborative industrial environments. Full article
(This article belongs to the Special Issue Intelligent Edge: When AI Meets Edge Computing)
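The smoothing stage mentioned in this abstract can be illustrated with a generic first-order low-pass followed by a scalar constant-position Kalman filter; the noise parameters below are assumed values, not the paper's tuning:

```python
import numpy as np

def exp_lowpass(x, alpha=0.3):
    """First-order low-pass: y[t] = alpha*x[t] + (1-alpha)*y[t-1]."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    for t in range(1, len(x)):
        y[t] = alpha * x[t] + (1 - alpha) * y[t - 1]
    return y

def kalman_1d(z, q=1e-4, r=1e-2):
    """Scalar constant-position Kalman filter (process var q, measurement var r)."""
    x_est, p = z[0], 1.0
    out = []
    for meas in z:
        p = p + q                        # predict
        k = p / (p + r)                  # Kalman gain
        x_est = x_est + k * (meas - x_est)
        p = (1 - k) * p
        out.append(x_est)
    return np.array(out)

# toy noisy trajectory for one image-plane coordinate of the tracked wrist
raw_x = np.cumsum(np.random.randn(100) * 0.01) + np.random.randn(100) * 0.02
smooth_x = kalman_1d(exp_lowpass(raw_x))
```
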

27 pages, 6782 KB  
Article
Development and Evaluation of a Data Glove-Based System for Assisting Puzzle Solving
by Shashank Srikanth Bharadwaj, Kazuma Sato and Lei Jing
Sensors 2026, 26(8), 2341; https://doi.org/10.3390/s26082341 - 10 Apr 2026
Viewed by 452
Abstract
Many hands-on tasks remain difficult to fully automate because they require human dexterity and flexible object handling. Data gloves offer a promising interface for sensing hand–object interactions, but most prior systems focus on gesture recognition or object classification rather than closed-loop, step-by-step task guidance. In this work, we develop and evaluate a tactile-sensing operation support system using an e-textile data glove with 88 pressure sensors, a tactile pressure sheet for placement verification, and a GUI that provides step-by-step instructions. As a core component, a CNN classifies the grasped state as bare hand or one of four discs with 93.3% accuracy using 16,175 training samples collected from five participants. In a user study on the Tower of Hanoi task as a controlled proxy for multi-step manipulation, the system reduced mean solving time by 51.5% (from 242.6 s to 117.8 s), reduced the number of disc movements (35.4 to 15, about 20 fewer moves on average), and lowered perceived workload (NASA-TLX) by 53.1% (from 68.5 to 32.1), while achieving a SUS score of 75. These results demonstrate the feasibility of tactile-based step verification and guidance in a controlled multi-step task; broader generalization requires evaluation with larger and more diverse participant groups and tasks. Full article
(This article belongs to the Section Intelligent Sensors)

7 pages, 1242 KB  
Proceeding Paper
Real-Time Recognition of Dual-Arm Motion Using Joint Direction Vectors and Temporal Deep Learning
by Yi-Hsiang Tseng, Che-Wei Hsu and Yih-Guang Leu
Eng. Proc. 2025, 120(1), 75; https://doi.org/10.3390/engproc2025120075 - 9 Apr 2026
Viewed by 246
Abstract
We developed a dual-arm motion recognition system designed for real-time upper-limb movement analysis using video input. The system integrates MediaPipe Hands for skeletal critical point detection, a feature extraction pipeline that encodes spatial and temporal characteristics from upper-limb joints, and a three-layer long short-term memory network for temporal modeling and classification. By computing directional vectors from the shoulder to the elbow and wrist, a 168-dimensional feature vector is generated per frame. Sequences of 90 frames are used to capture full motion patterns. The system effectively supports multi-class recognition of coordinated dual-arm gestures, offering applications in rehabilitation, gesture control, and human–computer interaction. Full article
(This article belongs to the Proceedings of 8th International Conference on Knowledge Innovation and Invention)
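The directional-vector features described above (shoulder-to-elbow and shoulder-to-wrist) can be sketched roughly as follows; the landmark choice and normalization are assumptions for illustration, not the authors' exact encoding:

```python
import numpy as np

def unit(v, eps=1e-8):
    """Normalize a vector to unit length."""
    return v / (np.linalg.norm(v) + eps)

def arm_direction_features(shoulder, elbow, wrist):
    """Concatenate unit vectors shoulder->elbow and shoulder->wrist (3D points)."""
    return np.concatenate([unit(elbow - shoulder), unit(wrist - shoulder)])

# toy 3D joint positions for one arm in one frame
shoulder = np.array([0.0, 0.0, 0.0])
elbow    = np.array([0.1, -0.25, 0.05])
wrist    = np.array([0.2, -0.50, 0.10])
frame_feat = arm_direction_features(shoulder, elbow, wrist)   # 6 values per arm
print(frame_feat.shape)
```

Stacking such per-frame vectors for both arms over a 90-frame clip yields the fixed-length sequence fed to the LSTM; the 168-dimensional per-frame vector in the paper presumably combines more joints and derived quantities than this toy example.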

16 pages, 1624 KB  
Article
Surface EMG-Based Hand Gesture Recognition Using a Hybrid Multistream Deep Learning Architecture
by Yusuf Çelik and Umit Can
Sensors 2026, 26(7), 2281; https://doi.org/10.3390/s26072281 - 7 Apr 2026
Viewed by 523
Abstract
Surface electromyography (sEMG) enables non-invasive measurement of muscle activity for applications such as human–machine interaction, rehabilitation, and prosthesis control. However, high noise levels, inter-subject variability, and the complex nature of muscle activation hinder robust gesture classification. This study proposes a multistream hybrid deep-learning architecture for the FORS-EMG dataset to address these challenges. The model integrates Temporal Convolutional Networks (TCN), depthwise separable convolutions, bidirectional Long Short-Term Memory (LSTM)–Gated Recurrent Unit (GRU) layers, and a Transformer encoder to capture complementary temporal and spectral patterns, and an ArcFace-based classifier to enhance class separability. We evaluate the approach under three protocols: subject-wise, random split without augmentation, and random split with augmentation. In the augmented random-split setting, the model attains 96.4% accuracy, surpassing previously reported values. In the subject-wise setting, accuracy is 74%, revealing limited cross-user generalization. The results demonstrate the method’s high performance and highlight the impact of data-partition strategies for real-world sEMG-based gesture recognition. Full article
(This article belongs to the Special Issue Machine Learning in Biomedical Signal Processing)
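Of the components listed in this abstract, the ArcFace-based classifier is the most compact to illustrate. Below is a generic additive-angular-margin head (a sketch; the scale s and margin m are assumed values, and this is not the paper's exact configuration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Additive angular margin classifier head: scale * cos(theta + m) on the true class."""
    def __init__(self, embed_dim, n_classes, s=30.0, m=0.30):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n_classes, embed_dim))
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.m), cos)
        return self.s * logits            # feed to nn.CrossEntropyLoss

head = ArcFaceHead(embed_dim=128, n_classes=12)
labels = torch.tensor([0, 3, 5, 11])
loss = nn.CrossEntropyLoss()(head(torch.randn(4, 128), labels), labels)
```
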

23 pages, 5784 KB  
Article
Learning Italian Hand Gesture Culture Through an Automatic Gesture Recognition Approach
by Chiara Innocente, Giorgio Di Pisa, Irene Lionetti, Andrea Mamoli, Manuela Vitulano, Giorgia Marullo, Simone Maffei, Enrico Vezzetti and Luca Ulrich
Future Internet 2026, 18(4), 177; https://doi.org/10.3390/fi18040177 - 24 Mar 2026
Viewed by 404
Abstract
Italian hand gestures constitute a distinctive and widely recognized form of nonverbal communication, deeply embedded in everyday interaction and cultural identity. Despite their prominence, these gestures are rarely formalized or systematically taught, posing challenges for foreign speakers and visitors seeking to interpret their meaning and pragmatic use. Moreover, their ephemeral and embodied nature complicates traditional preservation and transmission approaches, positioning them within the broader domain of intangible cultural heritage. This paper introduces a machine learning–based framework for recognizing iconic Italian hand gestures, designed to support cultural learning and engagement among foreign speakers and visitors. The approach combines RGB–D sensing with depth-enhanced geometric feature extraction, employing interpretable classification models trained on a purpose-built dataset. The recognition system is integrated into a non-immersive virtual reality application simulating an interactive digital totem conceived for public arrival spaces, providing tutorial content, real-time gesture recognition, and immediate feedback within a playful and accessible learning environment. Three supervised machine learning pipelines were evaluated, and Random Forest achieved the best overall performance. Its integration with an Isolation Forest module was further considered for deployment, achieving a macro-averaged accuracy and F1-score of 0.82 under a 5-fold cross-validation protocol. An experimental user study was conducted with 25 subjects to evaluate the proposed interactive system in terms of usability, user engagement, and learning effectiveness, obtaining favorable results and demonstrating its potential as a practical tool for cultural education and intercultural communication. Full article
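One plausible way to pair a Random Forest classifier with an Isolation Forest rejection stage, as this abstract describes, is sketched below with scikit-learn; the feature layout and rejection rule are assumptions, not the authors' pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, IsolationForest

# toy geometric feature vectors (e.g., fingertip distances/angles); shapes are assumptions
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(300, 12)), rng.integers(0, 6, size=300)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
novelty = IsolationForest(random_state=0).fit(X_train)   # models the "known gesture" region

def predict_with_rejection(x):
    """Return a gesture label, or None when the sample looks unlike any trained gesture."""
    if novelty.predict(x.reshape(1, -1))[0] == -1:        # -1 = outlier
        return None
    return int(clf.predict(x.reshape(1, -1))[0])

print(predict_with_rejection(rng.normal(size=12)))
```
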

19 pages, 759 KB  
Article
Dual-Stream BiLSTM–Transformer Architecture for Real-Time Two-Handed Dynamic Sign Language Gesture Recognition
by Enachi Andrei, Turcu Corneliu-Octavian, Culea George, Andrioaia Dragos-Alexandru, Ungureanu Andrei-Gabriel and Sghera Bogdan-Constantin
Appl. Sci. 2026, 16(6), 2912; https://doi.org/10.3390/app16062912 - 18 Mar 2026
Viewed by 333
Abstract
Two-handed dynamic gesture recognition represents a fundamental component of sign language interpretation involving the modeling of temporal dependencies and inter-hand coordination. In this task, a major challenge is modeling asymmetric motion patterns, as well as bidirectional and long-range temporal dependencies. Most existing frameworks rely on early fusion strategies that merge joints, keypoints or landmarks from both hands in early processing stages, primarily to reduce model complexity and enforce a unified representation. In this work, a novel dual-stream BiLSTM–Transformer model architecture is proposed for two-handed dynamic sign language recognition, where parallel encoders process the trajectories of each hand independently. To capture spatial and temporal dependencies for each hand, an attention-based cross-hand fusion mechanism is employed, with hand landmarks extracted by the MediaPipe Hands framework as a preprocessing step to enable real-time CPU-based inference. Experimental evaluation conducted on custom Romanian Sign Language dynamic gesture datasets indicates that the proposed dual-stream-based system outperforms single-handed baselines, achieving high recognition accuracy for asymmetric gestures and consistent performance gains for synchronized two-handed gestures. The proposed architecture represents an efficient and lightweight solution suitable for real-time sign language recognition and interpretation. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
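A cross-hand attention fusion of the kind described above could look roughly like the generic sketch below, where each hand's encoder output attends to the other hand's; the 128-dimensional per-hand features and single fusion layer are assumptions, not the authors' exact module:

```python
import torch
import torch.nn as nn

class CrossHandFusion(nn.Module):
    """Let left-hand features attend to right-hand features, and vice versa."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.l_to_r = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.r_to_l = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, left, right):          # (batch, time, dim) per hand
        l_fused, _ = self.l_to_r(left, right, right)    # queries=left, keys/values=right
        r_fused, _ = self.r_to_l(right, left, left)
        return torch.cat([left + l_fused, right + r_fused], dim=-1)

fusion = CrossHandFusion()
left, right = torch.randn(2, 60, 128), torch.randn(2, 60, 128)
print(fusion(left, right).shape)             # torch.Size([2, 60, 256])
```
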

17 pages, 1701 KB  
Article
CLIP-ArASL: A Lightweight Multimodal Model for Arabic Sign Language Recognition
by Naif Alasmari
Appl. Sci. 2026, 16(5), 2573; https://doi.org/10.3390/app16052573 - 7 Mar 2026
Viewed by 364
Abstract
Arabic sign language (ArASL) is the primary communication medium for Deaf and hard-of-hearing people across Arabic-speaking communities. Most current ArASL recognition systems are based solely on visual features and do not incorporate linguistic or semantic information that could improve generalization and semantic grounding. This paper introduces CLIP-ArASL, a lightweight CLIP-style multimodal approach for static ArASL letter recognition that aligns visual hand gestures with bilingual textual descriptions. The approach integrates an EfficientNet-B0 image encoder with a MiniLM text encoder to learn a shared embedding space using a hybrid objective that combines contrastive and cross-entropy losses. This design supports supervised classification on seen classes and zero-shot prediction on unseen classes using textual class representations. The proposed approach is evaluated on two public datasets, ArASL2018 and ArASL21L. Under supervised evaluation, recognition accuracies of 99.25±0.14% and 91.51±1.29% are achieved, respectively. Zero-shot performance is assessed by withholding 20% of gesture classes during training and predicting them using only their textual descriptions. In this setting, accuracies of 55.2±12.15% on ArASL2018 and 37.6±9.07% on ArASL21L are obtained. These results show that multimodal vision–language alignment supports semantic transfer and enables recognition of unseen classes. Full article
(This article belongs to the Special Issue Machine Learning in Computer Vision and Image Processing)
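The hybrid objective described in this abstract (contrastive image-text alignment plus cross-entropy) can be sketched generically as below; the temperature, mixing weight, embedding size, and class count are assumed values, not the paper's settings:

```python
import torch
import torch.nn.functional as F

def clip_style_hybrid_loss(img_emb, txt_emb, labels, class_txt_emb,
                           temperature=0.07, alpha=0.5):
    """Contrastive image-text alignment plus cross-entropy over class text embeddings.
    img_emb: (B, d) image features; txt_emb: (B, d) matching description features;
    class_txt_emb: (C, d) one text embedding per class; labels: (B,) class ids."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature                 # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    contrastive = (F.cross_entropy(logits, targets) +
                   F.cross_entropy(logits.t(), targets)) / 2
    cls_logits = img @ F.normalize(class_txt_emb, dim=-1).t() / temperature
    return alpha * contrastive + (1 - alpha) * F.cross_entropy(cls_logits, labels)

loss = clip_style_hybrid_loss(torch.randn(8, 256), torch.randn(8, 256),
                              torch.randint(0, 28, (8,)), torch.randn(28, 256))
```

At inference, the same class-text embeddings make zero-shot prediction possible: an unseen class is assigned by nearest text embedding, which is the setting evaluated in the abstract's withheld-class experiments.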

17 pages, 1732 KB  
Article
Lightweight Visual Dynamic Gesture Recognition System Based on CNN-LSTM-DSA
by Zhenxing Wang, Ziyan Wu, Ruidi Qi and Xuan Dou
Sensors 2026, 26(5), 1558; https://doi.org/10.3390/s26051558 - 2 Mar 2026
Viewed by 529
Abstract
Addressing the challenges of large-scale gesture recognition models, high computational complexity, and inefficient deployment on embedded devices, this study designs and implements a visual dynamic gesture recognition system based on a lightweight CNN-LSTM-DSA model. The system captures user hand images via a camera, extracts the 3D coordinates of 21 hand keypoints using MediaPipe, and employs a lightweight hybrid model to perform spatial and temporal feature modeling on keypoint sequences, achieving high-precision recognition of complex dynamic gestures. In static gesture recognition, the system determines the gesture state through joint angle calculation and a sliding window smoothing algorithm, ensuring smooth mapping of the servo motor angles and stability of the robotic hand’s movements. In dynamic gesture recognition, the system models the keypoint time series based on the CNN-LSTM-DSA hybrid model, enabling accurate classification and reproduction of gesture actions. Experimental results show that the proposed system demonstrates good robustness under various lighting and background conditions, with a static gesture recognition accuracy of up to 96%, dynamic gesture recognition accuracy of 90.19%, and an overall response delay of less than 300 ms. Full article
(This article belongs to the Section Sensing and Imaging)
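The static-gesture stage described above relies on joint angles plus sliding-window smoothing; a generic sketch (the landmark triple and window length are assumptions for illustration):

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by 3D points a-b-c, e.g. three joints of one finger."""
    v1, v2 = a - b, c - b
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def smooth(values, window=5):
    """Sliding-window (moving average) smoothing of an angle stream."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# toy keypoints for one finger in one frame, then a smoothed stream of angles
print(joint_angle(np.array([0, 0, 0]), np.array([0, 1, 0]), np.array([0.5, 1.8, 0])))
angles = smooth(np.random.uniform(150, 170, size=30))
```
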

17 pages, 4471 KB  
Article
Utilizing Data Quality Indices for Strategic Sensor Channel Selection to Enhance Performance of Hand Gesture Recognition Systems
by Shen Zhang, Hao Zhou, Rayane Tchantchane and Gursel Alici
Sensors 2026, 26(4), 1213; https://doi.org/10.3390/s26041213 - 12 Feb 2026
Viewed by 456
Abstract
This study proposes a data quality-driven channel selection methodology to improve hand gesture recognition performance in multi-channel wearable Human–Machine Interface (HMI) systems. The methodology centers on (i) calculating five data quality indices for both surface electromyography (sEMG) and pressure-based force myography (pFMG) signals and (ii) establishing a relationship between these data quality indices and the accuracy of gesture recognition for applications typified by prosthetic hand control. Machine learning (ML)-based and correlation-based methods were used to select three optimal channel/pair configurations from an eight-channel/pair system. Evaluations on the UOW and Ninapro DB2 datasets showed that the proposed methods consistently outperformed random channel selection, with the ML-based approach achieving the best results (76.36% for sEMG, 71.59% for pFMG, and 88.2% for fused sEMG-pFMG on the UOW dataset and 70.28% on Ninapro DB2). Notably, using three pairs of strategically selected sEMG-pFMG channels achieved 88.2% accuracy, which is comparable to the 88.38% accuracy obtained with a full eight-channel sEMG system on the UOW dataset, highlighting the efficacy of our channel selection methodologies. These results highlight the value of data quality indices for sensor selection and provide a foundation for developing more efficient wearable HMI systems. Full article
(This article belongs to the Section Intelligent Sensors)
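As a rough illustration of quality-index-driven channel selection (the index below is a generic stand-in, not one of the paper's five indices, and the selection rule is simplified to a plain top-k ranking):

```python
import numpy as np

def quality_index(channel):
    """Stand-in data quality index: RMS amplitude over RMS of the first difference,
    a crude signal-to-noise style score; the paper's indices are not reproduced here."""
    return np.sqrt(np.mean(channel ** 2)) / (np.sqrt(np.mean(np.diff(channel) ** 2)) + 1e-8)

def top_k_channels(recording, k=3):
    """recording: (samples, channels). Rank channels by the quality index, keep the best k."""
    scores = np.array([quality_index(recording[:, c]) for c in range(recording.shape[1])])
    return np.argsort(scores)[::-1][:k], scores

rng = np.random.default_rng(1)
emg = rng.normal(size=(4000, 8))                    # toy 8-channel recording
selected, scores = top_k_channels(emg, k=3)
print(selected)                                      # indices of the 3 retained channels
```
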

18 pages, 1956 KB  
Article
Dynamic Occlusion-Aware Facial Expression Recognition Guided by AA-ViT
by Xiangwei Mou, Xiuping Xie, Yongfu Song and Rijun Wang
Electronics 2026, 15(4), 764; https://doi.org/10.3390/electronics15040764 - 11 Feb 2026
Viewed by 431
Abstract
In complex natural scenarios, facial expression recognition often encounters partial occlusions caused by glasses, hand gestures, and hairstyles, making it difficult for models to extract effective features and thereby reducing recognition accuracy. Existing methods often employ attention mechanisms to enhance expression-related features, but they fail to adequately address the issue where high-frequency responses in occluded regions can disperse attention weights (e.g., incorrectly focus on occluded areas), making it challenging to effectively utilize local cues around the occlusions and limiting performance improvement. To address this, this paper proposes a network based on an adaptive attention mechanism (Adaptive Attention Vision Transformer, AA-ViT). First, an Adaptive Attention module (ADA) is designed to dynamically adjust attention scores in occluded regions, enhancing the effective information in features. Next, a Dual-Branch Multi-Layer Perceptron (DB-MLP) replaces the single linear layer to improve feature representation and model classification capability. Additionally, a Random Erasure (RE) strategy is introduced to enhance model robustness. Finally, to address the issue of model training instability caused by class imbalance in the training dataset, a hybrid loss function combining Focal Loss and Cross-Entropy Loss is adopted to ensure training stability. Experimental results show that AA-ViT achieves expression recognition accuracies of 90.66% and 90.01% on the RAF-DB and FERPlus datasets, respectively, representing improvements of 4.58 and 18.9 percentage points over the baseline ViT model, with only a 24.3% increase in parameter count. Compared to existing methods, the proposed approach demonstrates superior performance in occluded facial expression recognition tasks. Full article
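The hybrid Focal plus Cross-Entropy objective mentioned in this abstract, in a generic form (the focusing parameter gamma and the mixing weight are assumed values, not the paper's settings):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Focal loss: cross-entropy down-weighted for well-classified examples."""
    log_p = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_p, targets, reduction="none")
    pt = torch.exp(-ce)                        # probability of the true class
    return ((1 - pt) ** gamma * ce).mean()

def hybrid_loss(logits, targets, lam=0.5):
    """Weighted mix of focal loss and plain cross-entropy."""
    return lam * focal_loss(logits, targets) + (1 - lam) * F.cross_entropy(logits, targets)

loss = hybrid_loss(torch.randn(16, 7), torch.randint(0, 7, (16,)))
```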