Search Results (49)

Search Parameters:
Keywords = gesture recognition and representation

24 pages, 29852 KB  
Article
Dual-Axis Transformer-GNN Framework for Touchless Finger Location Sensing by Using Wi-Fi Channel State Information
by Minseok Koo and Jaesung Park
Electronics 2026, 15(3), 565; https://doi.org/10.3390/electronics15030565 - 28 Jan 2026
Viewed by 163
Abstract
Camera, lidar, and wearable-based gesture recognition technologies face practical limitations such as lighting sensitivity, occlusion, hardware cost, and user inconvenience. Wi-Fi channel state information (CSI) can be used as a contactless alternative to capture subtle signal variations caused by human motion. However, existing CSI-based methods are highly sensitive to domain shifts and often suffer notable performance degradation when applied to environments different from the training conditions. To address this issue, we propose a domain-robust touchless finger location sensing framework that operates reliably even in a single-link environment composed of commercial Wi-Fi devices. The proposed system applies preprocessing procedures to reduce noise and variability introduced by environmental factors and introduces a multi-domain segment combination strategy to increase the domain diversity during training. In addition, the dual-axis transformer learns temporal and spatial features independently, and the GNN-based integration module incorporates relationships among segments originating from different domains to produce more generalized representations. The proposed model is evaluated using CSI data collected from various users and days; experimental results show that the proposed method achieves an in-domain accuracy of 99.31% and outperforms the best baseline by approximately 4% and 3% in cross-user and cross-day evaluation settings, respectively, even in a single-link setting. Our work demonstrates a viable path for robust, calibration-free finger-level interaction using ubiquitous single-link Wi-Fi in real-world and constrained environments, providing a foundation for more reliable contactless interaction systems. Full article
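As a rough illustration of the dual-axis idea described above, the PyTorch sketch below applies self-attention separately along the temporal and subcarrier axes of a CSI tensor; the block name, embedding size, and head count are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a dual-axis attention block for Wi-Fi CSI:
# self-attention is applied separately along the temporal axis and the
# subcarrier (spatial) axis. Dimensions and names are illustrative only.
import torch
import torch.nn as nn

class DualAxisBlock(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_t = nn.LayerNorm(dim)
        self.norm_s = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, subcarriers, dim)
        b, t, s, d = x.shape
        # Temporal axis: attend across time for each subcarrier.
        xt = x.permute(0, 2, 1, 3).reshape(b * s, t, d)
        xt = self.norm_t(xt + self.temporal_attn(xt, xt, xt)[0])
        x = xt.reshape(b, s, t, d).permute(0, 2, 1, 3)
        # Spatial axis: attend across subcarriers at each time step.
        xs = x.reshape(b * t, s, d)
        xs = self.norm_s(xs + self.spatial_attn(xs, xs, xs)[0])
        return xs.reshape(b, t, s, d)

csi = torch.randn(2, 100, 30, 64)   # (batch, packets, subcarriers, embedding)
print(DualAxisBlock()(csi).shape)   # torch.Size([2, 100, 30, 64])
```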

27 pages, 4802 KB  
Article
Fine-Grained Radar Hand Gesture Recognition Method Based on Variable-Channel DRSN
by Penghui Chen, Siben Li, Chenchen Yuan, Yujing Bai and Jun Wang
Electronics 2026, 15(2), 437; https://doi.org/10.3390/electronics15020437 - 19 Jan 2026
Viewed by 167
Abstract
With the ongoing miniaturization of smart devices, fine-grained hand gesture recognition using millimeter-wave radar has attracted increasing attention, yet practical deployment remains challenging in continuous-gesture segmentation, robust feature extraction, and reliable classification. This paper presents an end-to-end fine-grained gesture recognition framework based on frequency-modulated continuous-wave (FMCW) millimeter-wave radar, including gesture design, data acquisition, feature construction, and neural network-based classification. Ten gesture types are recorded (eight valid gestures and two return-to-neutral gestures); for classification, the two return-to-neutral gesture types are merged into a single invalid class, yielding a nine-class task. A sliding-window segmentation method is developed using short-time Fourier transform (STFT)-based Doppler-time representations, and a dataset of 4050 labeled samples is collected. Multiple signal classification (MUSIC)-based super-resolution estimation is adopted to construct range–time and angle–time representations, and instance-wise normalization is applied to Doppler and range features to mitigate inter-individual variability without test leakage. For recognition, a variable-channel deep residual shrinkage network (DRSN) is employed to improve robustness to noise, supporting single-, dual-, and triple-channel feature inputs. Results under both subject-dependent evaluation with repeated random splits and subject-independent leave-one-subject-out (LOSO) cross-validation show that the DRSN architecture consistently outperforms the RefineNet-based baseline, and the triple-channel configuration achieves the best performance (98.88% accuracy). Overall, the variable-channel design enables flexible feature selection to meet diverse application requirements. Full article
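For readers unfamiliar with Doppler-time representations, the following sketch builds one from a synthetic slow-time signal with SciPy's STFT; the sampling rate, window parameters, and signal are placeholders rather than the paper's radar configuration.

```python
# Sketch of building a Doppler-time map from a slow-time radar signal with an
# STFT, as commonly done for FMCW gesture data. Values below are synthetic.
import numpy as np
from scipy.signal import stft

fs = 1000.0                                # slow-time sampling rate (placeholder)
t = np.arange(0, 2.0, 1 / fs)
signal = np.exp(1j * 2 * np.pi * 40 * t)   # synthetic micro-Doppler tone

f, tt, Z = stft(signal, fs=fs, nperseg=128, noverlap=96, return_onesided=False)
doppler_time_db = 20 * np.log10(np.abs(np.fft.fftshift(Z, axes=0)) + 1e-12)
print(doppler_time_db.shape)               # (freq bins, time frames)
```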

26 pages, 3626 KB  
Article
A Lightweight Frozen Multi-Convolution Dual-Branch Network for Efficient sEMG-Based Gesture Recognition
by Shengbiao Wu, Zhezhe Lv, Yuehong Li, Chengmin Fang, Tao You and Jiazheng Gui
Sensors 2026, 26(2), 580; https://doi.org/10.3390/s26020580 - 15 Jan 2026
Viewed by 220
Abstract
Gesture recognition is important for rehabilitation assistance and intelligent prosthetic control. However, surface electromyography (sEMG) signals exhibit strong non-stationarity, and conventional deep-learning models require long training time and high computational cost, limiting their use on resource-constrained devices. This study proposes a Frozen Multi-Convolution Dual-Branch Network (FMC-DBNet) to address these challenges. The model employs randomly initialized and fixed convolutional kernels for training-free multi-scale feature extraction, substantially reducing computational overhead. A dual-branch architecture is adopted to capture complementary temporal and physiological patterns from raw sEMG signals and intrinsic mode functions (IMFs) obtained through variational mode decomposition (VMD). In addition, positive-proportion (PPV) and global-average-pooling (GAP) statistics enhance lightweight multi-resolution representation. Experiments on the Ninapro DB1 dataset show that FMC-DBNet achieves an average accuracy of 96.4% ± 1.9% across 27 subjects and reduces training time by approximately 90% compared with a conventional trainable CNN baseline. These results demonstrate that frozen random-convolution structures provide an efficient and robust alternative to fully trained deep networks, offering a promising solution for low-power and computationally efficient sEMG gesture recognition. Full article
(This article belongs to the Section Electronic Sensors)
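The frozen-convolution idea can be illustrated in a few lines of PyTorch: random, never-trained 1-D kernels filter the raw signal and only PPV and GAP statistics are kept. The kernel count, kernel length, and input shape below are assumptions for illustration, not the FMC-DBNet configuration.

```python
# Sketch of frozen (untrained) random 1-D convolutions with positive-proportion
# (PPV) and global-average-pooling (GAP) statistics; sizes are illustrative.
import torch
import torch.nn.functional as F

def frozen_conv_features(x: torch.Tensor, n_kernels: int = 64, k: int = 9,
                         seed: int = 0) -> torch.Tensor:
    # x: (batch, channels, time) raw sEMG window
    g = torch.Generator().manual_seed(seed)
    w = torch.randn(n_kernels, x.shape[1], k, generator=g)  # fixed, never trained
    with torch.no_grad():
        y = F.conv1d(x, w, padding=k // 2)
        ppv = (y > 0).float().mean(dim=-1)   # proportion of positive values
        gap = y.mean(dim=-1)                 # global average pooling
    return torch.cat([ppv, gap], dim=1)      # (batch, 2 * n_kernels)

emg = torch.randn(8, 10, 200)                # 8 windows, 10 channels, 200 samples
print(frozen_conv_features(emg).shape)       # torch.Size([8, 128])
```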

27 pages, 4631 KB  
Article
Multimodal Minimal-Angular-Geometry Representation for Real-Time Dynamic Mexican Sign Language Recognition
by Gerardo Garcia-Gil, Gabriela del Carmen López-Armas and Yahir Emmanuel Ramirez-Pulido
Technologies 2026, 14(1), 48; https://doi.org/10.3390/technologies14010048 - 8 Jan 2026
Viewed by 309
Abstract
Current approaches to dynamic sign language recognition commonly rely on dense landmark representations, which impose high computational cost and hinder real-time deployment on resource-constrained devices. To address this limitation, this work proposes a computationally efficient framework for real-time dynamic Mexican Sign Language (MSL) recognition based on a multimodal minimal angular-geometry representation. Instead of processing complete landmark sets (e.g., MediaPipe Holistic with up to 468 keypoints), the proposed method encodes the relational geometry of the hands, face, and upper body into a compact set of 28 invariant internal angular descriptors. This representation substantially reduces feature dimensionality and computational complexity while preserving linguistically relevant manual and non-manual information required for grammatical and semantic discrimination in MSL. A real-time end-to-end pipeline is developed, comprising multimodal landmark extraction, angular feature computation, and temporal modeling using a Bidirectional Long Short-Term Memory (BiLSTM) network. The system is evaluated on a custom dataset of dynamic MSL gestures acquired under controlled real-time conditions. Experimental results demonstrate that the proposed approach achieves 99% accuracy and 99% macro F1-score, matching state-of-the-art performance while using dramatically fewer features. The compactness, interpretability, and efficiency of the minimal angular descriptor make the proposed system suitable for real-time deployment on low-cost devices, contributing toward more accessible and inclusive sign language recognition technologies. Full article
(This article belongs to the Special Issue Image Analysis and Processing)
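A minimal example of the kind of invariant internal angle such descriptors are built from is sketched below with NumPy; the landmark coordinates and indices are hypothetical and not taken from the paper's 28-descriptor set.

```python
# Sketch of computing an invariant internal angle from three 3-D landmarks,
# the building block of compact angular-geometry representations.
import numpy as np

def joint_angle(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> float:
    """Angle at vertex b (degrees) formed by points a-b-c."""
    v1, v2 = a - b, c - b
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))

# e.g. an index-finger flexion angle from MediaPipe-style hand landmarks (hypothetical values)
mcp = np.array([0.10, 0.20, 0.00])
pip = np.array([0.15, 0.30, 0.00])
tip = np.array([0.20, 0.25, 0.00])
print(joint_angle(mcp, pip, tip))
```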

24 pages, 15172 KB  
Article
Real-Time Hand Gesture Recognition for IoT Devices Using FMCW mmWave Radar and Continuous Wavelet Transform
by Anna Ślesicka and Adam Kawalec
Electronics 2026, 15(2), 250; https://doi.org/10.3390/electronics15020250 - 6 Jan 2026
Viewed by 364
Abstract
This paper presents an intelligent framework for real-time hand gesture recognition using Frequency-Modulated Continuous-Wave (FMCW) mmWave radar and deep learning. Unlike traditional radar-based recognition methods that rely on Discrete Fourier Transform (DFT) signal representations and focus primarily on classifier optimization, the proposed system introduces a novel pre-processing stage based on the Continuous Wavelet Transform (CWT). The CWT enables the extraction of discriminative time–frequency features directly from raw radar signals, improving the interpretability and robustness of the learned representations. A lightweight convolutional neural network architecture is then designed to process the CWT maps for efficient classification on edge IoT devices. Experimental validation with data collected from 20 participants performing five standardized gestures demonstrates that the proposed framework achieves an accuracy of up to 99.87% using the Morlet wavelet, with strong generalization to unseen users (82–84% accuracy). The results confirm that the integration of CWT-based radar signal processing with deep learning forms a computationally efficient and accurate intelligent system for human–computer interaction in real-time IoT environments. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, 4th Edition)
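A minimal sketch of a Morlet CWT scalogram, the type of time-frequency map the framework feeds to its CNN, is shown below using PyWavelets; the synthetic signal and scale range are placeholders for the actual radar data.

```python
# Sketch of a Morlet continuous wavelet transform of a 1-D signal, producing
# a scalogram usable as CNN input. Signal and scales are synthetic placeholders.
import numpy as np
import pywt

fs = 2000.0
t = np.arange(0, 1.0, 1 / fs)
sig = np.cos(2 * np.pi * 60 * t) + 0.5 * np.cos(2 * np.pi * 180 * t)

scales = np.arange(1, 128)
coeffs, freqs = pywt.cwt(sig, scales, "morl", sampling_period=1 / fs)
cwt_map = np.abs(coeffs)          # (scales, time) scalogram
print(cwt_map.shape, freqs[:3])
```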

17 pages, 1312 KB  
Article
RGB Fusion of Multiple Radar Sensors for Deep Learning-Based Traffic Hand Gesture Recognition
by Hüseyin Üzen
Electronics 2026, 15(1), 140; https://doi.org/10.3390/electronics15010140 - 28 Dec 2025
Viewed by 353
Abstract
Hand gesture recognition (HGR) systems play a critical role in modern intelligent transportation frameworks by enabling reliable communication between pedestrians, traffic operators, and autonomous vehicles. This work presents a novel traffic hand gesture recognition method that combines nine grayscale radar images captured from multiple millimeter-wave radar nodes into a single RGB representation through an optimized rotation–shift fusion strategy. This transformation preserves complementary spatial information while minimizing inter-image interference, enabling deep learning models to more effectively utilize the distinctive micro-Doppler and spatial patterns embedded in radar measurements. Extensive experimental studies were conducted to verify the model’s performance, demonstrating that the proposed RGB fusion approach provides higher classification accuracy than single-sensor or unfused representations. In addition, the proposed model outperformed state-of-the-art methods in the literature with an accuracy of 92.55%. These results highlight its potential as a lightweight yet powerful solution for reliable gesture interpretation in future intelligent transportation and human–vehicle interaction systems. Full article
(This article belongs to the Special Issue Advanced Techniques for Multi-Agent Systems)
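As a simplified illustration of fusing multiple grayscale radar maps into one RGB input, the sketch below merely groups nine images into the three colour channels; the paper's optimized rotation-shift fusion is more elaborate, so this is only a conceptual stand-in.

```python
# Sketch of combining nine grayscale radar maps into a single RGB array by
# grouping three maps per colour channel; not the paper's fusion strategy.
import numpy as np

def fuse_to_rgb(images: np.ndarray) -> np.ndarray:
    """images: (9, H, W) grayscale maps -> (H, W, 3) uint8 RGB image."""
    groups = images.reshape(3, 3, *images.shape[1:]).mean(axis=1)  # 3 maps per channel
    rgb = np.moveaxis(groups, 0, -1)                               # (H, W, 3)
    rgb = 255 * (rgb - rgb.min()) / (np.ptp(rgb) + 1e-9)
    return rgb.astype(np.uint8)

radar = np.random.rand(9, 64, 64)
print(fuse_to_rgb(radar).shape)   # (64, 64, 3)
```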

24 pages, 2879 KB  
Article
Skeleton-Based Real-Time Hand Gesture Recognition Using Data Fusion and Ensemble Multi-Stream CNN Architecture
by Maki K. Habib, Oluwaleke Yusuf and Mohamed Moustafa
Technologies 2025, 13(11), 484; https://doi.org/10.3390/technologies13110484 - 26 Oct 2025
Viewed by 1500
Abstract
Hand Gesture Recognition (HGR) is a vital technology that enables intuitive human–computer interaction in various domains, including augmented reality, smart environments, and assistive systems. Achieving both high accuracy and real-time performance remains challenging due to the complexity of hand dynamics, individual morphological variations, and computational limitations. This paper presents a lightweight and efficient skeleton-based HGR framework that addresses these challenges through an optimized multi-stream Convolutional Neural Network (CNN) architecture and a trainable ensemble tuner. Dynamic 3D gestures are transformed into structured, noise-minimized 2D spatiotemporal representations via enhanced data-level fusion, supporting robust classification across diverse spatial perspectives. The ensemble tuner strengthens semantic relationships between streams and improves recognition accuracy. Unlike existing solutions that rely on high-end hardware, the proposed framework achieves real-time inference on consumer-grade devices without compromising accuracy. Experimental validation across five benchmark datasets (SHREC2017, DHG1428, FPHA, LMDHG, and CNR) confirms consistent or superior performance with reduced computational overhead. Additional validation on the SBU Kinect Interaction Dataset highlights generalization potential for broader Human Action Recognition (HAR) tasks. This advancement bridges the gap between efficiency and accuracy, supporting scalable deployment in AR/VR, mobile computing, interactive gaming, and resource-constrained environments. Full article
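A toy version of turning a 3-D skeleton sequence into a 2-D image-like input for a CNN is sketched below; the normalization and array shapes are illustrative and do not reproduce the paper's data-level fusion.

```python
# Sketch of encoding a skeleton gesture sequence as an image-like array
# (frames x joints, with x/y/z as channels); shapes are illustrative.
import numpy as np

def skeleton_to_image(seq: np.ndarray) -> np.ndarray:
    """seq: (frames, joints, 3) -> uint8 image of shape (frames, joints, 3)."""
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)
    norm = (seq - lo) / (hi - lo + 1e-9)     # per-coordinate min-max scaling
    return (255 * norm).astype(np.uint8)

gesture = np.random.randn(64, 22, 3)         # 64 frames, 22 hand joints
print(skeleton_to_image(gesture).shape)      # (64, 22, 3)
```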

23 pages, 4949 KB  
Article
Hybrid LDA-CNN Framework for Robust End-to-End Myoelectric Hand Gesture Recognition Under Dynamic Conditions
by Hongquan Le, Marc in het Panhuis, Geoffrey M. Spinks and Gursel Alici
Robotics 2025, 14(6), 83; https://doi.org/10.3390/robotics14060083 - 17 Jun 2025
Cited by 1 | Viewed by 1857
Abstract
Gesture recognition based on conventional machine learning is the main control approach for advanced prosthetic hand systems. Its primary limitation is the need for feature extraction, which must meet real-time control requirements. On the other hand, deep learning models could potentially overfit when trained on small datasets. For these reasons, we propose a hybrid Linear Discriminant Analysis–convolutional neural network (LDA-CNN) framework to improve the gesture recognition performance of sEMG-based prosthetic hand control systems. Within this framework, 1D-CNN filters are trained to generate a latent representation that closely approximates Fisher’s (LDA’s) discriminant subspace, constructed from handcrafted features. Under the train-one-test-all evaluation scheme, our proposed hybrid framework consistently outperformed the 1D-CNN trained with cross-entropy loss only, showing improvements from 4% to 11% across two public datasets featuring hand gestures recorded under various limb positions and arm muscle contraction levels. Furthermore, our framework exhibited advantages in terms of induced spectral regularization, which led to a state-of-the-art recognition error of 22.79% with the extended 23-feature set when tested on the multi-limb position dataset. The main novelty of our hybrid framework is that it decouples feature extraction from inference time, enabling the future incorporation of a more extensive set of features while keeping the inference computation time minimal. Full article
(This article belongs to the Special Issue AI for Robotic Exoskeletons and Prostheses)
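The hybrid idea of steering a CNN latent space toward an LDA subspace can be sketched as an auxiliary regression term alongside cross-entropy, as below with scikit-learn and PyTorch; the random features, loss weight, and stand-in tensors are assumptions, not the authors' training setup.

```python
# Sketch: an LDA subspace fitted on handcrafted features serves as a regression
# target for a CNN latent layer, combined with cross-entropy. Illustrative only.
import numpy as np
import torch
import torch.nn as nn
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X_hand = np.random.randn(256, 40)            # handcrafted sEMG features (random stand-in)
y = np.random.randint(0, 8, size=256)        # 8 gesture classes
lda = LinearDiscriminantAnalysis(n_components=7).fit(X_hand, y)
target = torch.tensor(lda.transform(X_hand), dtype=torch.float32)  # (256, 7)

latent = torch.randn(256, 7, requires_grad=True)   # stands in for the CNN latent layer
logits = torch.randn(256, 8, requires_grad=True)   # stands in for the CNN output
loss = nn.CrossEntropyLoss()(logits, torch.tensor(y)) \
     + 0.5 * nn.MSELoss()(latent, target)          # subspace-approximation term
loss.backward()
print(float(loss))
```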

18 pages, 4982 KB  
Article
Unsupervised Clustering and Ensemble Learning for Classifying Lip Articulation in Fingerspelling
by Nurzada Amangeldy, Nazerke Gazizova, Marek Milosz, Bekbolat Kurmetbek, Aizhan Nazyrova and Akmaral Kassymova
Sensors 2025, 25(12), 3703; https://doi.org/10.3390/s25123703 - 13 Jun 2025
Viewed by 1048
Abstract
This paper presents a new methodology for analyzing lip articulation during fingerspelling, aimed at extracting robust visual patterns that can overcome the inherent ambiguity and variability of lip shape. The proposed approach is based on unsupervised clustering of lip movement trajectories to identify consistent articulatory patterns across different time profiles. The methodology is not limited to a single model; rather, it includes the exploration of varying cluster configurations and an assessment of their robustness, as well as a detailed analysis of the correspondence between individual alphabet letters and specific clusters. In contrast to direct classification based on raw visual features, this approach pre-tests clustered representations using a model-based assessment of their discriminative potential. This structured approach enhances the interpretability and robustness of the extracted features, highlighting the importance of lip dynamics as an auxiliary modality in multimodal sign language recognition. The obtained results demonstrate that trajectory clustering can serve as a practical method for generating features, providing more accurate and context-sensitive gesture interpretation. Full article
(This article belongs to the Section Intelligent Sensors)
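A minimal sketch of clustering flattened lip-movement trajectories with k-means is given below; the trajectory dimensions and the number of clusters are hypothetical choices, since the paper explores several cluster configurations.

```python
# Sketch of grouping lip-movement trajectories into articulatory patterns with
# k-means; sample counts, trajectory length, and k are illustrative.
import numpy as np
from sklearn.cluster import KMeans

trajectories = np.random.randn(300, 40, 20)        # 300 samples, 40 frames, 20 lip features
X = trajectories.reshape(len(trajectories), -1)    # flatten each trajectory
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))                          # cluster occupancy
```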

25 pages, 9742 KB  
Article
Autism Spectrum Disorder Detection Using Skeleton-Based Body Movement Analysis via Dual-Stream Deep Learning
by Jungpil Shin, Abu Saleh Musa Miah, Manato Kakizaki, Najmul Hassan and Yoichi Tomioka
Electronics 2025, 14(11), 2231; https://doi.org/10.3390/electronics14112231 - 30 May 2025
Cited by 2 | Viewed by 2466
Abstract
Autism Spectrum Disorder (ASD) poses significant challenges in diagnosis due to its diverse symptomatology and the complexity of early detection. Atypical gait and gesture patterns, prominent behavioural markers of ASD, hold immense potential for facilitating early intervention and optimising treatment outcomes. These patterns can be efficiently and non-intrusively captured using modern computational techniques, making them valuable for ASD recognition. Various types of research have been conducted to detect ASD through deep learning, including facial feature analysis, eye gaze analysis, and movement and gesture analysis. In this study, we optimise a dual-stream architecture that combines image classification and skeleton recognition models to analyse video data for body motion analysis. The first stream processes Skepxels—spatial representations derived from skeleton data—using ConvNeXt-Base, a robust image recognition model that efficiently captures aggregated spatial embeddings. The second stream encodes angular features, embedding relative joint angles into the skeleton sequence and extracting spatiotemporal dynamics using the Multi-Scale Graph 3D Convolutional Network (MSG3D), a combination of Graph Convolutional Networks (GCNs) and Temporal Convolutional Networks (TCNs). We replace the ViT model from the original architecture with ConvNeXt-Base to evaluate the efficacy of CNN-based models in capturing gesture-related features for ASD detection. Additionally, we experimented with a Stack Transformer in the second stream instead of MSG3D but found it to result in lower accuracy, thus highlighting the importance of GCN-based models for motion analysis. The integration of these two streams ensures comprehensive feature extraction, capturing both global and detailed motion patterns. A pairwise Euclidean distance loss is employed during training to enhance the consistency and robustness of feature representations. The results from our experiments demonstrate that the two-stream approach, combining ConvNeXt-Base and MSG3D, offers a promising method for effective autism detection. This approach not only enhances accuracy but also contributes valuable insights into optimising deep learning models for gesture-based recognition. By integrating image classification and skeleton recognition, we can better capture both global and detailed motion patterns, which are crucial for improving early ASD diagnosis and intervention strategies. Full article
(This article belongs to the Special Issue Convolutional Neural Networks and Vision Applications, 4th Edition)
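The pairwise Euclidean distance loss mentioned above can be written in a few lines of PyTorch, as in the sketch below; the embedding size and batch are illustrative only, not the paper's configuration.

```python
# Sketch of a pairwise Euclidean distance loss that keeps the two stream
# embeddings of the same sample consistent; sizes are illustrative.
import torch

def pairwise_distance_loss(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    # z1, z2: (batch, dim) embeddings from the two streams
    return torch.norm(z1 - z2, dim=1).mean()

z_img, z_skel = torch.randn(16, 256), torch.randn(16, 256)
print(float(pairwise_distance_loss(z_img, z_skel)))
```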

15 pages, 4192 KB  
Article
Enhancing Kazakh Sign Language Recognition with BiLSTM Using YOLO Keypoints and Optical Flow
by Zholdas Buribayev, Maria Aouani, Zhansaya Zhangabay, Ainur Yerkos, Zemfira Abdirazak and Mukhtar Zhassuzak
Appl. Sci. 2025, 15(10), 5685; https://doi.org/10.3390/app15105685 - 20 May 2025
Cited by 2 | Viewed by 1852
Abstract
Sign languages are characterized by complex and subtle hand movements, which are challenging for computer vision systems to recognize accurately. This study proposes an innovative deep learning pipeline specifically designed for reliable gesture recognition of Kazakh Sign Language. The approach combines key point detection using the YOLO model, optical flow estimation, and a bidirectional long short-term memory (BiLSTM) network. At the initial stage, a dataset is generated using MediaPipe, which is then used to train the YOLO model to accurately identify key hand points. After training, the YOLO model extracts key points and bounding boxes from video recordings of gestures, creating consistent representations of movements. To improve the recognition of dynamic gestures, optical flow is calculated in a region covering 10% of the area around the key points, which captures movement dynamics and provides additional temporal characteristics. The BiLSTM network is trained on multimodal input that combines keypoint, bounding-box, and optical-flow data, resulting in improved gesture classification accuracy. The experimental results demonstrate that the proposed approach is superior to traditional methods based solely on key points, especially in recognizing fast and complex gestures. The proposed structure promotes the development of sign language recognition technologies, especially for under-studied languages such as Kazakh, paving the way to more inclusive and effective communication tools. Full article
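To illustrate optical flow restricted to a patch around a keypoint, the sketch below uses OpenCV's Farnebäck method on synthetic frames; the patch size, keypoint position, and flow parameters are assumptions rather than the paper's exact settings.

```python
# Sketch of dense optical flow computed only in a patch around a detected
# keypoint; frames are random placeholders, parameters are illustrative.
import numpy as np
import cv2

prev = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
curr = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
kx, ky, half = 320, 240, 32                     # keypoint position and patch half-size

p0 = prev[ky - half:ky + half, kx - half:kx + half]
p1 = curr[ky - half:ky + half, kx - half:kx + half]
# positional args: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
flow = cv2.calcOpticalFlowFarneback(p0, p1, None, 0.5, 3, 15, 3, 5, 1.2, 0)
print(flow.shape)                               # (64, 64, 2) per-pixel (dx, dy)
```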

23 pages, 8209 KB  
Article
Spatio-Temporal Transformer with Kolmogorov–Arnold Network for Skeleton-Based Hand Gesture Recognition
by Pengcheng Han, Xin He, Takafumi Matsumaru and Vibekananda Dutta
Sensors 2025, 25(3), 702; https://doi.org/10.3390/s25030702 - 24 Jan 2025
Cited by 2 | Viewed by 3215
Abstract
Manually crafted features often suffer from subjectivity, inadequate accuracy, or a lack of robustness in recognition. Meanwhile, existing deep learning methods often overlook the structural and dynamic characteristics of the human hand, failing to fully explore the contextual information of joints in both the spatial and temporal domains. To effectively capture dependencies between hand joints that are not adjacent but may have potential connections, it is essential to learn long-term relationships. This study proposes ST-KT, a skeleton-based hand gesture recognition framework that combines a spatio-temporal graph convolution network with a transformer based on the Kolmogorov–Arnold Network (KAN). It incorporates spatio-temporal graph convolution network (ST-GCN) modules and a spatio-temporal transformer module with KAN (KAN–Transformer). The ST-GCN modules, which include a spatial graph convolution network (SGCN) and a temporal convolution network (TCN), extract primary features from skeleton sequences by leveraging the strength of graph convolutional networks in the spatio-temporal domain. A spatio-temporal position embedding method integrates node features, enriching representations by including node identities and temporal information. The transformer layer includes a spatial KAN–Transformer (S-KT) and a temporal KAN–Transformer (T-KT), which further extract joint features by learning edge weights and node embeddings, providing richer feature representations and the capability for nonlinear modeling. We evaluated the performance of our method on two challenging skeleton-based dynamic gesture datasets: our method achieved an accuracy of 97.5% on the SHREC’17 track dataset and 94.3% on the DHG-14/28 dataset. These results demonstrate that our proposed method, ST-KT, effectively captures dynamic skeleton changes and complex joint relationships. Full article
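The spatio-temporal position embedding step can be pictured as adding learnable joint-identity and frame-index embeddings to the node features, as in the PyTorch sketch below; the tensor sizes are illustrative, not the ST-KT configuration.

```python
# Sketch of adding learnable spatial (joint identity) and temporal (frame
# index) position embeddings to skeleton node features; sizes are illustrative.
import torch
import torch.nn as nn

frames, joints, dim = 32, 22, 64
spatial_pe = nn.Parameter(torch.zeros(1, 1, joints, dim))   # one embedding per joint
temporal_pe = nn.Parameter(torch.zeros(1, frames, 1, dim))  # one embedding per frame

x = torch.randn(8, frames, joints, dim)      # node features from a preceding stage
x = x + spatial_pe + temporal_pe             # broadcasts over batch, frames, joints
print(x.shape)                               # torch.Size([8, 32, 22, 64])
```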

18 pages, 9899 KB  
Article
A Robotic Teleoperation System with Integrated Augmented Reality and Digital Twin Technologies for Disassembling End-of-Life Batteries
by Feifan Zhao, Wupeng Deng and Duc Truong Pham
Batteries 2024, 10(11), 382; https://doi.org/10.3390/batteries10110382 - 30 Oct 2024
Cited by 11 | Viewed by 4567
Abstract
Disassembly is a key step in remanufacturing, especially for end-of-life (EoL) products such as electric vehicle (EV) batteries, which are challenging to dismantle due to uncertainties in their condition and potential risks of fire, fumes, explosions, and electrical shock. To address these challenges, this paper presents a robotic teleoperation system that leverages augmented reality (AR) and digital twin (DT) technologies to enable a human operator to work away from the danger zone. By integrating AR and DTs, the system not only provides a real-time visual representation of the robot’s status but also enables remote control via gesture recognition. A bidirectional communication framework established within the system synchronises the virtual robot with its physical counterpart in an AR environment, which enhances the operator’s understanding of both the robot and task statuses. In the event of anomalies, the operator can interact with the virtual robot through intuitive gestures based on information displayed on the AR interface, thereby improving decision-making efficiency and operational safety. The application of this system is demonstrated through a case study involving the disassembly of a busbar from an EoL EV battery. Furthermore, the performance of the system in terms of task completion time and operator workload was evaluated and compared with that of AR-based control methods without informational cues and ‘smartpad’ controls. The findings indicate that the proposed system reduces operation time and enhances user experience, demonstrating its broad application potential in complex industrial settings. Full article
(This article belongs to the Section Battery Processing, Manufacturing and Recycling)

21 pages, 1550 KB  
Article
Using 3D Hand Pose Data in Recognizing Human–Object Interaction and User Identification for Extended Reality Systems
by Danish Hamid, Muhammad Ehatisham Ul Haq, Amanullah Yasin, Fiza Murtaza and Muhammad Awais Azam
Information 2024, 15(10), 629; https://doi.org/10.3390/info15100629 - 12 Oct 2024
Cited by 1 | Viewed by 2972
Abstract
Object detection and action/gesture recognition have become imperative in security and surveillance fields, finding extensive applications in everyday life. Advancement in such technologies will help further cybersecurity and extended reality systems through the accurate identification of users and their interactions, which plays a pivotal role in the security management of an entity and in providing an immersive experience. Essentially, it enables the identification of human–object interaction to track actions and behaviors along with user identification. Yet traditional camera-based methods face considerable difficulties, since occlusion, differing camera viewpoints, and background noise lead to significant appearance variation. Deep learning techniques also demand large, labeled datasets and a large amount of computational power. In this paper, a novel approach to the recognition of human–object interactions and the identification of interacting users is proposed, based on three-dimensional hand pose data from an egocentric camera view. A multistage approach is proposed that integrates object detection with interaction recognition and user identification using data from hand joints and vertices. Our approach uses a statistical attribute-based model for feature extraction and representation. The proposed technique is tested on the HOI4D dataset using the XGBoost classifier, achieving an average F1-score of 81% for human–object interaction and an average F1-score of 80% for user identification, hence proving to be effective. This technique is mostly targeted at extended reality systems, as proper interaction recognition and user identification are key to keeping systems secure and personalized. Its relevance extends into cybersecurity, augmented reality, virtual reality, and human–robot interactions, offering a potent solution for security enhancement along with enhanced interactivity in such systems. Full article
(This article belongs to the Special Issue Extended Reality and Cybersecurity)
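A rough sketch of statistical attribute features over hand-joint sequences followed by an XGBoost classifier is shown below; the random data, feature set, and class count only demonstrate the shape of such a pipeline, not the HOI4D experiments.

```python
# Sketch of statistical attribute features from 3-D hand-joint sequences fed to
# an XGBoost classifier; data below are random stand-ins, not HOI4D.
import numpy as np
from xgboost import XGBClassifier

def stat_features(seq: np.ndarray) -> np.ndarray:
    """seq: (frames, joints, 3) -> concatenated per-dimension statistics."""
    flat = seq.reshape(seq.shape[0], -1)                       # (frames, joints*3)
    return np.concatenate([flat.mean(0), flat.std(0),
                           flat.min(0), flat.max(0)])

X = np.stack([stat_features(np.random.randn(60, 21, 3)) for _ in range(200)])
y = np.random.randint(0, 5, size=200)                          # 5 interaction classes
clf = XGBClassifier(n_estimators=50, max_depth=4).fit(X, y)
print(clf.predict(X[:3]))
```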

18 pages, 9066 KB  
Article
Semi-Supervised FMCW Radar Hand Gesture Recognition via Pseudo-Label Consistency Learning
by Yuhang Shi, Lihong Qiao, Yucheng Shu, Baobin Li, Bin Xiao, Weisheng Li and Xinbo Gao
Remote Sens. 2024, 16(13), 2267; https://doi.org/10.3390/rs16132267 - 21 Jun 2024
Cited by 2 | Viewed by 2466
Abstract
Hand gesture recognition is pivotal in facilitating human–machine interaction within the Internet of Things. Nevertheless, it encounters challenges, including labeling expenses and robustness. To tackle these issues, we propose a semi-supervised learning framework guided by pseudo-label consistency. This framework utilizes a dual-branch structure with a mean-teacher network. Within this setup, a global and locally guided self-supervised learning encoder acts as a feature extractor in a teacher–student network to efficiently extract features, maximizing data utilization to enhance feature representation. Additionally, we introduce a pseudo-label Consistency-Guided Mean-Teacher model, where simulated noise is incorporated to generate newly unlabeled samples for the teacher model before advancing to the subsequent stage. By enforcing consistency constraints between the outputs of the teacher and student models, we alleviate accuracy degradation resulting from individual differences and interference from other body parts, thereby bolstering the network’s robustness. Ultimately, the teacher model undergoes refinement through exponential moving averages to achieve stable weights. We evaluate our semi-supervised method on two publicly available hand gesture datasets and compare it with several state-of-the-art fully-supervised algorithms. The results demonstrate the robustness of our method, achieving an accuracy rate exceeding 99% across both datasets. Full article
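The mean-teacher ingredients (an EMA-updated teacher and a consistency loss on noise-perturbed inputs) can be sketched in PyTorch as below; the tiny MLP, noise level, and EMA rate are illustrative assumptions, not the proposed network.

```python
# Sketch of mean-teacher training: the teacher is an exponential moving average
# of the student, and a consistency loss ties their predictions on noisy inputs.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)                  # teacher is never trained by backprop

@torch.no_grad()
def ema_update(alpha: float = 0.99) -> None:
    for tp, sp in zip(teacher.parameters(), student.parameters()):
        tp.mul_(alpha).add_(sp, alpha=1 - alpha)

x_unlabeled = torch.randn(32, 128)
noisy = x_unlabeled + 0.1 * torch.randn_like(x_unlabeled)      # simulated noise
consistency = F.mse_loss(F.softmax(student(x_unlabeled), dim=1),
                         F.softmax(teacher(noisy), dim=1))
consistency.backward()                        # updates flow only into the student
ema_update()                                  # teacher follows via moving average
print(float(consistency))
```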
