Search Results (134)

Search Parameters:
Keywords = operating hand recognition

24 pages, 6540 KiB  
Article
A Hybrid Control Approach Integrating Model-Predictive Control and Fractional-Order Admittance Control for Automatic Internal Limiting Membrane Peeling Surgery
by Hongcheng Liu, Xiaodong Zhang, Yachun Wang, Zirui Zhao and Ning Wang
Actuators 2025, 14(7), 328; https://doi.org/10.3390/act14070328 - 1 Jul 2025
Viewed by 157
Abstract
As the prevalence of related diseases continues to rise, a corresponding increase in the demand for internal limiting membrane (ILM) peeling surgery has been observed. However, significant challenges are encountered in ILM peeling surgery, including limited force feedback, inadequate depth perception, and surgeon hand tremors. Research on fully autonomous ILM peeling surgical robots has been conducted to address the imbalance between medical resource availability and patient demand while enhancing surgical safety. An automatic control framework for break initiation in ILM peeling is proposed in this study, which integrates model-predictive control with fractional-order admittance control. Additionally, a multi-vision task surgical scene perception method is introduced based on target detection, key point recognition, and sparse binocular matching. A surgical trajectory planning strategy for break initiation in ILM peeling aligned with operative specifications is proposed. Finally, validation experiments for automatic break initiation in ILM peeling were performed using eye phantoms. The results indicated that the positional error of the micro-forceps tip remained within 40 μm. At the same time, the contact force overshoot was limited to under 6%, thereby ensuring both the effectiveness and safety of break initiation during ILM peeling. Full article
(This article belongs to the Special Issue Motion Planning, Trajectory Prediction, and Control for Robotics)
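The abstract does not give the controller equations, but a fractional-order admittance law is commonly written as M·x'' + B·D^α(x') + K·x = F_ext and discretized with Grünwald–Letnikov (GL) weights. The sketch below is only a minimal illustration of that discretization; the mass, damping, stiffness, order α, and force profile are placeholder assumptions, not values from the paper.

```python
# Minimal sketch (not the authors' implementation): a fractional-order
# admittance law  M*x'' + B*D^alpha(x') + K*x = F_ext, discretized with
# Grunwald-Letnikov weights and explicit Euler. All gains are illustrative.
import numpy as np

def gl_weights(alpha: float, n: int) -> np.ndarray:
    """Grunwald-Letnikov binomial weights w_j for D^alpha."""
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        w[j] = w[j - 1] * (1.0 - (alpha + 1.0) / j)
    return w

def simulate_admittance(force, h=1e-3, M=0.5, B=8.0, K=40.0, alpha=0.7):
    """Return tip displacement responding to a contact-force profile."""
    n = len(force)
    w = gl_weights(alpha, n)
    x = np.zeros(n)          # displacement [m]
    v = np.zeros(n)          # velocity [m/s]
    for k in range(1, n):
        # GL approximation of D^alpha applied to the velocity history
        d_alpha_v = (w[:k][::-1] @ v[:k]) / h**alpha
        a = (force[k] - B * d_alpha_v - K * x[k - 1]) / M
        v[k] = v[k - 1] + h * a
        x[k] = x[k - 1] + h * v[k]
    return x

if __name__ == "__main__":
    f = np.full(5000, 0.02)   # assumed 20 mN step contact force
    x = simulate_admittance(f)
    print(f"displacement after 5 s ≈ {x[-1]*1e6:.0f} um")
```

In the paper this admittance layer is combined with model-predictive control; the sketch covers only the compliant response to a measured contact force.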

27 pages, 569 KiB  
Article
Construction Worker Activity Recognition Using Deep Residual Convolutional Network Based on Fused IMU Sensor Data in Internet-of-Things Environment
by Sakorn Mekruksavanich and Anuchit Jitpattanakul
IoT 2025, 6(3), 36; https://doi.org/10.3390/iot6030036 - 28 Jun 2025
Viewed by 262
Abstract
With the advent of Industry 4.0, sensor-based human activity recognition has become increasingly vital for improving worker safety, enhancing operational efficiency, and optimizing workflows in Internet-of-Things (IoT) environments. This study introduces a novel deep learning-based framework for construction worker activity recognition, employing a deep residual convolutional neural network (ResNet) architecture integrated with multi-sensor fusion techniques. The proposed system processes data from multiple inertial measurement unit sensors strategically positioned on workers’ bodies to identify and classify construction-related activities accurately. A comprehensive pre-processing pipeline is implemented, incorporating Butterworth filtering for noise suppression, data normalization, and an adaptive sliding window mechanism for temporal segmentation. Experimental validation is conducted using the publicly available VTT-ConIoT dataset, which includes recordings of 16 construction activities performed by 13 participants in a controlled laboratory setting. The results demonstrate that the ResNet-based sensor fusion approach outperforms traditional single-sensor models and other deep learning methods. The system achieves classification accuracies of 97.32% for binary discrimination between recommended and non-recommended activities, 97.14% for categorizing six core task types, and 98.68% for detailed classification across sixteen individual activities. Optimal performance is consistently obtained with a 4-second window size, balancing recognition accuracy with computational efficiency. Although the hand-mounted sensor proved to be the most effective as a standalone unit, multi-sensor configurations delivered significantly higher accuracy, particularly in complex classification tasks. The proposed approach demonstrates strong potential for real-world applications, offering robust performance across diverse working conditions while maintaining computational feasibility for IoT deployment. This work advances the field of innovative construction by presenting a practical solution for real-time worker activity monitoring, which can be seamlessly integrated into existing IoT infrastructures to promote workplace safety, streamline construction processes, and support data-driven management decisions. Full article
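As a rough illustration of the pre-processing pipeline described above (Butterworth filtering, normalization, sliding-window segmentation), the sketch below processes a multi-channel IMU stream. The sampling rate, cut-off frequency, and 50% overlap are assumptions; only the 4 s window length is taken from the reported results.

```python
# Minimal sketch of an IMU pre-processing pipeline: low-pass filtering,
# z-score normalization, and sliding-window segmentation. Parameter values
# other than the 4 s window are assumptions, not taken from the paper.
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_imu(signal: np.ndarray, fs: float = 50.0, cutoff: float = 10.0,
                   win_s: float = 4.0, overlap: float = 0.5) -> np.ndarray:
    """signal: (n_samples, n_channels) raw IMU -> (n_windows, win, channels)."""
    # 1) 4th-order zero-phase Butterworth low-pass filter per channel
    b, a = butter(4, cutoff / (0.5 * fs), btype="low")
    filtered = filtfilt(b, a, signal, axis=0)
    # 2) per-channel z-score normalization
    norm = (filtered - filtered.mean(0)) / (filtered.std(0) + 1e-8)
    # 3) sliding-window segmentation with overlap
    win = int(win_s * fs)
    step = int(win * (1.0 - overlap))
    windows = [norm[s:s + win] for s in range(0, len(norm) - win + 1, step)]
    return np.stack(windows)

# Example: 60 s of 6-axis IMU data sampled at 50 Hz
segments = preprocess_imu(np.random.randn(3000, 6))
print(segments.shape)   # (n_windows, 200, 6)
```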

30 pages, 8644 KiB  
Article
Development of a UR5 Cobot Vision System with MLP Neural Network for Object Classification and Sorting
by Szymon Kluziak and Piotr Kohut
Information 2025, 16(7), 550; https://doi.org/10.3390/info16070550 - 27 Jun 2025
Viewed by 315
Abstract
This paper presents the implementation of a vision system for a collaborative robot equipped with a web camera and a Python-based control algorithm for automated object-sorting tasks. The vision system aims to detect, classify, and manipulate objects within the robot’s workspace using only 2D camera images. The vision system was integrated with the Universal Robots UR5 cobot and designed for object sorting based on shape recognition. The software stack includes OpenCV for image processing, NumPy for numerical operations, and scikit-learn for multilayer perceptron (MLP) models. The paper outlines the calibration process, including lens distortion correction and camera-to-robot calibration in a hand-in-eye configuration to establish the spatial relationship between the camera and the cobot. Object localization relied on a virtual plane aligned with the robot’s workspace. Object classification was conducted using contour similarity with Hu moments, SIFT-based descriptors with FLANN matching, and MLP-based neural models trained on preprocessed images. The performance evaluation covered accuracy metrics for the identification methods used (MLP classifier, contour similarity, and feature descriptor matching) and the effectiveness of the vision system in controlling the cobot for sorting tasks. The evaluation focused on classification accuracy and sorting effectiveness, using sensitivity, specificity, precision, accuracy, and F1-score metrics. Results showed that neural network-based methods outperformed traditional methods in all categories while also offering a more straightforward implementation. Full article
(This article belongs to the Section Information Applications)
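One of the three identification methods above, contour similarity with Hu moments, can be sketched with OpenCV's matchShapes, which compares Hu-moment invariants internally. The thresholding, template dictionary, and label handling below are illustrative assumptions, not the authors' implementation (OpenCV 4.x API assumed).

```python
# Minimal sketch of contour-similarity classification with Hu moments.
import cv2
import numpy as np

def largest_contour(gray: np.ndarray) -> np.ndarray:
    """Otsu-threshold the image and return its largest external contour."""
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)

def classify_by_shape(query_gray: np.ndarray, templates: dict):
    """templates: dict label -> grayscale template image. Lower score = closer."""
    q = largest_contour(query_gray)
    scores = {label: cv2.matchShapes(q, largest_contour(t),
                                     cv2.CONTOURS_MATCH_I1, 0.0)
              for label, t in templates.items()}
    return min(scores, key=scores.get), scores
```

In the paper this shape matcher is compared against SIFT/FLANN descriptor matching and an MLP classifier, with the MLP reported as the strongest option.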

27 pages, 6771 KiB  
Article
A Deep Neural Network Framework for Dynamic Two-Handed Indian Sign Language Recognition in Hearing and Speech-Impaired Communities
by Vaidhya Govindharajalu Kaliyaperumal and Paavai Anand Gopalan
Sensors 2025, 25(12), 3652; https://doi.org/10.3390/s25123652 - 11 Jun 2025
Viewed by 475
Abstract
Language is the medium through which people communicate effectively, and sign language serves as a bridge across the communication gap for the hearing- and speech-impaired. Recognizing it remains difficult, however, because hand gestures must be identified from varied and ambiguous palm configurations. This challenge is addressed with a novel Enhanced Convolutional Transformer with Adaptive Tuna Swarm Optimization (ECT-ATSO) recognition framework proposed for double-handed sign language. In order to improve both model generalization and image quality, preprocessing is applied to images prior to prediction, and the proposed dataset is organized to handle multiple dynamic words. Feature graining is employed to obtain local features, and the ViT transformer architecture is then utilized to capture global features from the preprocessed images. After concatenation, this generates a feature map that is then divided into individual words using an Inverted Residual Feed-Forward Network (IRFFN). Using an enhanced form of the Tuna Swarm Optimization (TSO) algorithm, the proposed Enhanced Convolutional Transformer (ECT) model is optimally tuned with respect to the problem dimensionality and convergence parameters. To overcome local optima when adjusting positions during the tuna update process, a mutation operator is introduced. The performance of the proposed framework is assessed through dataset visualization, recognition accuracy, and convergence behavior, demonstrating superior effectiveness compared with alternative state-of-the-art methods. Full article
(This article belongs to the Section Intelligent Sensors)

14 pages, 4259 KiB  
Article
Preparation and Performance of a Grid-Based PCL/TPU@MWCNTs Nanofiber Membrane for Pressure Sensor
by Ping Zhu and Qian Lan
Sensors 2025, 25(10), 3201; https://doi.org/10.3390/s25103201 - 19 May 2025
Viewed by 599
Abstract
The intrinsic trade-off among sensitivity, response speed, and measurement range continues to hinder the wider adoption of flexible pressure sensors in areas such as medical diagnostics and gesture recognition. In this work, we propose a grid-structured polycaprolactone/thermoplastic-polyurethane nanofiber pressure sensor decorated with multi-walled carbon nanotubes (PCL/TPU@MWCNTs). By introducing a gradient grid membrane, the strain distribution and reconstruction of the conductive network can be modulated, thereby alleviating the conflict between sensitivity, response speed, and operating range. First, static mechanical simulations were performed to compare the mechanical responses of planar and grid membranes, confirming that the grid architecture offers superior sensitivity. Next, PCL/TPU@MWCNT nanofiber membranes were fabricated via coaxial electrospinning followed by vacuum filtration and assembled into three-layer planar and grid piezoresistive pressure sensors. Their sensing characteristics were evaluated through simple index-finger motions and mouse-wheel scrolling. Within 0–34 kPa, the sensitivities of the planar and grid sensors reached 1.80 kPa−1 and 2.24 kPa−1, respectively; in the 35–75 kPa range, they were 1.03 kPa−1 and 1.27 kPa−1. The rise/decay times of the output signals were 10.53 ms/11.20 ms for the planar sensor and 9.17 ms/9.65 ms for the grid sensor. Both sensors successfully distinguished active index-finger bending at 0–0.5 Hz. The dynamic range of the grid sensor is 105 dB during index-finger extension and 55 dB during mouse-wheel scrolling, affording higher measurement stability and a broader operating window and fully meeting the requirements for high-precision hand-motion recognition. Full article
(This article belongs to the Special Issue Advanced Flexible Electronics and Wearable Biosensing Systems)
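For readers unfamiliar with the kPa−1 figures above: piezoresistive sensitivity is commonly taken as the slope of the relative output change versus applied pressure, S = d(ΔI/I0)/dP, fitted over each pressure segment. The sketch below estimates it with a linear fit; the calibration numbers are invented for illustration and are not the paper's data, and the exact definition used by the authors may differ.

```python
# Minimal sketch: estimate piezoresistive sensitivity (kPa^-1) as the slope
# of relative current change vs. pressure. Sample data are made up.
import numpy as np

def sensitivity(pressure_kpa: np.ndarray, current_uA: np.ndarray) -> float:
    i0 = current_uA[0]
    rel_change = (current_uA - i0) / i0          # ΔI / I0
    slope, _ = np.polyfit(pressure_kpa, rel_change, 1)   # units: kPa^-1
    return slope

p = np.array([0, 5, 10, 20, 34], dtype=float)        # applied pressure [kPa]
i = np.array([1, 10, 19, 37, 62], dtype=float)        # illustrative output [uA]
print(f"S ≈ {sensitivity(p, i):.2f} kPa^-1")
```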

28 pages, 10571 KiB  
Article
Towards Seamless Human–Robot Interaction: Integrating Computer Vision for Tool Handover and Gesture-Based Control
by Branislav Malobický, Marián Hruboš, Júlia Kafková, Jakub Krško, Mário Michálik, Rastislav Pirník and Pavol Kuchár
Appl. Sci. 2025, 15(7), 3575; https://doi.org/10.3390/app15073575 - 25 Mar 2025
Cited by 1 | Viewed by 1023
Abstract
This paper presents the development of a robotic workstation that integrates a collaborative robot as an assistant, leveraging advanced computer vision techniques to enhance human–robot interaction. The system employs state-of-the-art computer vision models, YOLOv7 and YOLOv8, for precise tool detection and gesture recognition, enabling the robot to seamlessly interpret operator commands and hand over tools based on gestural cues. The primary objective is to facilitate intuitive, non-verbal control of the robot, improving collaboration between human operators and robots in dynamic work environments. The results show that this approach enhances the efficiency and reliability of human–robot cooperation, particularly in manufacturing settings, by streamlining tasks and boosting productivity. By integrating real-time computer vision into the robot’s decision-making process, the system demonstrates heightened adaptability and responsiveness, paving the way for more natural and effective human–robot collaboration in industrial contexts. Full article
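A minimal sketch of how the detection-plus-command step could look with the ultralytics YOLOv8 API. The weight file name, gesture classes, and gesture-to-command mapping are hypothetical; the paper's trained models and class lists are not part of this listing.

```python
# Minimal sketch: run a (hypothetical) custom YOLOv8 model on webcam frames
# and map recognized gesture classes to robot commands.
from ultralytics import YOLO
import cv2

model = YOLO("tools_gestures_yolov8n.pt")   # assumed custom-trained weights
gesture_to_command = {"open_palm": "hand_over_tool", "fist": "stop"}  # assumed

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)[0]
    for box in results.boxes:
        label = results.names[int(box.cls)]
        if label in gesture_to_command:
            print("command:", gesture_to_command[label])
cap.release()
```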

19 pages, 28961 KiB  
Article
Human-like Dexterous Grasping Through Reinforcement Learning and Multimodal Perception
by Wen Qi, Haoyu Fan, Cankun Zheng, Hang Su and Samer Alfayad
Biomimetics 2025, 10(3), 186; https://doi.org/10.3390/biomimetics10030186 - 18 Mar 2025
Cited by 2 | Viewed by 1189
Abstract
Dexterous robotic grasping with multifingered hands remains a critical challenge in non-visual environments, where diverse object geometries and material properties demand adaptive force modulation and tactile-aware manipulation. To address this, we propose the Reinforcement Learning-Based Multimodal Perception (RLMP) framework, which integrates human-like grasping intuition through operator-worn gloves with tactile-guided reinforcement learning. The framework’s key innovation lies in its Tactile-Driven DCNN architecture—a lightweight convolutional network achieving 98.5% object recognition accuracy using spatiotemporal pressure patterns—coupled with an RL policy refinement mechanism that dynamically correlates finger kinematics with real-time tactile feedback. Experimental results demonstrate reliable grasping performance across deformable and rigid objects while maintaining force precision critical for fragile targets. By bridging human teleoperation with autonomous tactile adaptation, RLMP eliminates dependency on visual input and predefined object models, establishing a new paradigm for robotic dexterity in occlusion-rich scenarios. Full article
(This article belongs to the Special Issue Biomimetic Innovations for Human–Machine Interaction)

23 pages, 10794 KiB  
Article
Hand–Eye Separation-Based First-Frame Positioning and Follower Tracking Method for Perforating Robotic Arm
by Handuo Zhang, Jun Guo, Chunyan Xu and Bin Zhang
Appl. Sci. 2025, 15(5), 2769; https://doi.org/10.3390/app15052769 - 4 Mar 2025
Viewed by 704
Abstract
In subway tunnel construction, current hand–eye integrated drilling robots use a camera mounted on the drilling arm for image acquisition. However, dust interference and long-distance operation cause a decline in image quality, affecting the stability and accuracy of the visual recognition system. Additionally, the computational complexity of high-precision detection models limits deployment on resource-constrained edge devices, such as industrial controllers. To address these challenges, this paper proposes a dual-arm tunnel drilling robot system with hand–eye separation, utilizing the first-frame localization and follower tracking method. The vision arm (“eye”) provides real-time position data to the drilling arm (“hand”), ensuring accurate and efficient operation. The study employs an RFBNet model for initial frame localization, replacing the original VGG16 backbone with ShuffleNet V2. This reduces model parameters by 30% (135.5 MB vs. 146.3 MB) through channel splitting and depthwise separable convolutions to reduce computational complexity. Additionally, the GIoU loss function is introduced to replace the traditional IoU, further optimizing bounding box regression through the calculation of the minimum enclosing box. This resolves the gradient vanishing problem in traditional IoU and improves average precision (AP) by 3.3% (from 0.91 to 0.94). For continuous tracking, a SiamRPN-based algorithm combined with Kalman filtering and PID control ensures robustness against occlusions and nonlinear disturbances, increasing the success rate by 1.6% (0.639 vs. 0.629). Experimental results show that this approach significantly improves tracking accuracy and operational stability, achieving 31 FPS inference speed on edge devices and providing a deployable solution for tunnel construction’s safety and efficiency needs. Full article
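The GIoU loss mentioned above follows the standard definition GIoU = IoU − (|C| − |A∪B|)/|C|, where C is the smallest box enclosing the predicted and target boxes, and the loss is 1 − GIoU. Below is a minimal PyTorch sketch of that formula for boxes in [x1, y1, x2, y2] format; it is the generic definition, not code from the paper.

```python
# Minimal sketch of the GIoU loss for axis-aligned boxes [x1, y1, x2, y2].
import torch

def giou_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # intersection
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    # union
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / union.clamp(min=1e-7)
    # smallest enclosing box C
    lt_c = torch.min(pred[:, :2], target[:, :2])
    rb_c = torch.max(pred[:, 2:], target[:, 2:])
    area_c = (rb_c - lt_c).clamp(min=0).prod(dim=1)
    giou = iou - (area_c - union) / area_c.clamp(min=1e-7)
    return (1.0 - giou).mean()
```

Unlike plain IoU, the enclosing-box term still produces a gradient when the boxes do not overlap, which is the property the abstract credits for the improved regression.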

10 pages, 2588 KiB  
Proceeding Paper
Combining Interactive Technology and Visual Cognition—A Case Study on Preventing Dementia in Older Adults
by Chung-Shun Feng and Chao-Ming Wang
Eng. Proc. 2025, 89(1), 16; https://doi.org/10.3390/engproc2025089016 - 25 Feb 2025
Viewed by 567
Abstract
According to the World Health Organization, the global population is aging, with cognitive and memory functions declining from the age of 40–50. Individuals aged 65 and older are particularly prone to dementia. Therefore, we developed an interactive system for visual cognitive training to prevent dementia and delay the onset of memory loss. The system comprises three “three-dimensional objects” with printed 2D barcodes and near-field communication (NFC) tags, together with operating software that processes text, images, and multimedia content. Electroencephalography (EEG) data from a brainwave sensor were used to interpret brain signals. The system operates through interactive games combined with real-time feedback from EEG data to reduce the likelihood of dementia. The system provides feedback based on textual, visual, and multimedia information and offers a new form of entertainment. Thirty participants were invited to participate in a pre-test questionnaire survey. Different tasks were assigned to randomly selected participants with three-dimensional objects. Sensing technologies such as quick-response (QR) codes and NFC were used to display information on smartphones. Visual content included text-image narratives and media playback. EEG was used for visual recognition and perception responses. The system was evaluated using the system usability scale (SUS). Finally, the data obtained from participants using the system were analyzed. The system improved hand-eye coordination and brain memory through interactive games. After receiving visual information, brain function was stimulated through focused reading, which helps prevent dementia. This system could be introduced into the healthcare industry to accumulate long-term cognitive function data for the brain and personal health data to prevent the occurrence of dementia. Full article

19 pages, 8196 KiB  
Article
Human–Robot Interaction Using Dynamic Hand Gesture for Teleoperation of Quadruped Robots with a Robotic Arm
by Jianan Xie, Zhen Xu, Jiayu Zeng, Yuyang Gao and Kenji Hashimoto
Electronics 2025, 14(5), 860; https://doi.org/10.3390/electronics14050860 - 21 Feb 2025
Cited by 2 | Viewed by 2267
Abstract
Human–Robot Interaction (HRI) using hand gesture recognition offers an effective and non-contact approach to enhancing operational intuitiveness and user convenience. However, most existing studies primarily focus on either static sign language recognition or the tracking of hand position and orientation in space. These approaches often prove inadequate for controlling complex robotic systems. This paper proposes an advanced HRI system leveraging dynamic hand gestures for controlling quadruped robots equipped with a robotic arm. The proposed system integrates both semantic and pose information from dynamic gestures to enable comprehensive control over the robot’s diverse functionalities. First, a Depth–MediaPipe framework is introduced to facilitate the precise three-dimensional (3D) coordinate extraction of 21 hand bone keypoints. Subsequently, a Semantic-Pose to Motion (SPM) model is developed to analyze and interpret both the pose and semantic aspects of hand gestures. This model translates the extracted 3D coordinate data into corresponding mechanical actions in real-time, encompassing quadruped robot locomotion, robotic arm end-effector tracking, and semantic-based command switching. Extensive real-world experiments demonstrate the proposed system’s effectiveness in achieving real-time interaction and precise control, underscoring its potential for enhancing the usability of complex robotic platforms. Full article
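A minimal sketch of the keypoint-extraction step in the spirit of the Depth–MediaPipe framework above: MediaPipe Hands yields 21 normalized landmarks, which are lifted to 3D camera coordinates with an aligned depth frame. The camera intrinsics and depth source are assumptions; this is not the authors' implementation.

```python
# Minimal sketch: extract 21 hand keypoints with MediaPipe Hands and lift
# them to 3D using an aligned depth image and assumed pinhole intrinsics.
import cv2
import mediapipe as mp
import numpy as np

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.6)

def keypoints_3d(color_bgr, depth_m, fx, fy, cx, cy):
    """Return a (21, 3) array of hand keypoints in camera coordinates [m]."""
    h, w = depth_m.shape
    res = hands.process(cv2.cvtColor(color_bgr, cv2.COLOR_BGR2RGB))
    if not res.multi_hand_landmarks:
        return None
    pts = []
    for lm in res.multi_hand_landmarks[0].landmark:   # 21 landmarks
        u, v = int(lm.x * w), int(lm.y * h)
        z = depth_m[np.clip(v, 0, h - 1), np.clip(u, 0, w - 1)]
        pts.append([(u - cx) * z / fx, (v - cy) * z / fy, z])
    return np.asarray(pts)
```

In the paper, such 3D keypoints feed the Semantic-Pose to Motion (SPM) model, which maps them to locomotion, end-effector tracking, and mode-switching commands.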

13 pages, 35894 KiB  
Article
An Artificial Intelligence Approach to the Craniofacial Recapitulation of Crisponi/Cold-Induced Sweating Syndrome 1 (CISS1/CISS) from Newborns to Adolescent Patients
by Giulia Pascolini, Dario Didona and Luigi Tarani
Diagnostics 2025, 15(5), 521; https://doi.org/10.3390/diagnostics15050521 - 21 Feb 2025
Viewed by 864
Abstract
Background/Objectives: Crisponi/cold-induced sweating syndrome 1 (CISS1/CISS, MIM#272430) is a genetic disorder due to biallelic variants in CRLF1 (MIM*604237). The related phenotype is mainly characterized by abnormal thermoregulation and sweating, facial muscle contractions in response to tactile and crying-inducing stimuli at an early age, skeletal anomalies (camptodactyly of the hands, scoliosis), and craniofacial dysmorphisms, comprising full cheeks, micrognathia, a high and narrow palate, low-set ears, and a depressed nasal bridge. The condition is associated with high lethality during the neonatal period and can benefit from timely symptomatic therapy. Methods: We collected frontal images of all patients with CISS1/CISS published to date, which were analyzed with Face2Gene (F2G), a machine-learning technology for the facial diagnosis of syndromic phenotypes. In total, 75 portraits were subdivided into three cohorts, based on age (Cohorts 1 and 2) and the presence of the typical facial trismus (Cohort 3). These portraits were uploaded to F2G to test their suitability for facial analysis and to verify the capacity of the AI tool to correctly recognize the syndrome based on the facial features only. The photos which passed this phase (62 images) were fed to three different AI algorithms: DeepGestalt, Facial D-Score, and GestaltMatcher. Results: The DeepGestalt algorithm results, including the correct diagnosis using a frontal portrait, suggested a similar facial phenotype in the first two cohorts. Cohort 3 seemed to be highly differentiable. The results were expressed in terms of the area under the curve (AUC) of the receiver operating characteristic (ROC) curve and the p-value. The Facial D-Score values indicated the presence of a consistent degree of dysmorphic signs in the three cohorts, which was also confirmed by the GestaltMatcher algorithm. Interestingly, the latter allowed us to identify overlapping genetic disorders. Conclusions: This is the first AI-powered image analysis to define the craniofacial contour of CISS1/CISS and to determine the feasibility of training the tool used for its clinical recognition. The obtained results showed that the use of F2G can provide valid support in the diagnostic process of CISS1/CISS, especially in more severe phenotypes manifesting with facial contractions and potentially lethal consequences. Full article

20 pages, 1820 KiB  
Article
Hybrid Solution Through Systematic Electrical Impedance Tomography Data Reduction and CNN Compression for Efficient Hand Gesture Recognition on Resource-Constrained IoT Devices
by Salwa Sahnoun, Mahdi Mnif, Bilel Ghoul, Mohamed Jemal, Ahmed Fakhfakh and Olfa Kanoun
Future Internet 2025, 17(2), 89; https://doi.org/10.3390/fi17020089 - 14 Feb 2025
Cited by 2 | Viewed by 965
Abstract
The rapid advancement of edge computing and Tiny Machine Learning (TinyML) has created new opportunities for deploying intelligence in resource-constrained environments. With the growing demand for intelligent Internet of Things (IoT) devices that can efficiently process complex data in real-time, there is an urgent need for innovative optimisation techniques that overcome the limitations of IoT devices and enable accurate and efficient computations. This study investigates a novel approach to optimising Convolutional Neural Network (CNN) models for Hand Gesture Recognition (HGR) based on Electrical Impedance Tomography (EIT), which requires complex signal processing, energy efficiency, and real-time processing, by simultaneously reducing input complexity and using advanced model compression techniques. By systematically reducing and halving the input complexity of a 1D CNN from 40 to 20 Boundary Voltages (BVs) and applying an innovative compression method, we achieved remarkable model size reductions of 91.75% and 97.49% for 40 and 20 BVs EIT inputs, respectively. Additionally, the Floating-Point operations (FLOPs) are significantly reduced, by more than 99% in both cases. These reductions have been achieved with a minimal loss of accuracy, maintaining the performance of 97.22% and 94.44% for 40 and 20 BVs inputs, respectively. The most significant result is the 20 BVs compressed model. In fact, at only 8.73 kB and a remarkable 94.44% accuracy, our model demonstrates the potential of intelligent design strategies in creating ultra-lightweight, high-performance CNN-based solutions for resource-constrained devices with near-full performance capabilities specifically for the case of HGR based on EIT inputs. Full article
(This article belongs to the Special Issue Joint Design and Integration in Smart IoT Systems)
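As an illustration of how compact a 1D CNN over a 20-value boundary-voltage vector can be, the sketch below defines a toy model and prints its parameter count. The layer sizes, gesture count, and overall architecture are assumptions for illustration only, not the authors' compressed model.

```python
# Minimal sketch: a tiny 1D CNN classifying hand gestures from a vector of
# EIT boundary voltages (here 20 inputs). Architecture is illustrative.
import torch
import torch.nn as nn

class TinyEITCNN(nn.Module):
    def __init__(self, n_inputs: int = 20, n_gestures: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(16, n_gestures)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_inputs) boundary voltages -> (batch, 1, n_inputs)
        z = self.features(x.unsqueeze(1)).flatten(1)
        return self.classifier(z)

model = TinyEITCNN()
print(sum(p.numel() for p in model.parameters()), "parameters")
```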

25 pages, 2844 KiB  
Article
Real-Time Gesture-Based Hand Landmark Detection for Optimized Mobile Photo Capture and Synchronization
by Pedro Marques, Paulo Váz, José Silva, Pedro Martins and Maryam Abbasi
Electronics 2025, 14(4), 704; https://doi.org/10.3390/electronics14040704 - 12 Feb 2025
Viewed by 2039
Abstract
Gesture recognition technology has emerged as a transformative solution for natural and intuitive human–computer interaction (HCI), offering touch-free operation across diverse fields such as healthcare, gaming, and smart home systems. In mobile contexts, where hygiene, convenience, and the ability to operate under resource constraints are critical, hand gesture recognition provides a compelling alternative to traditional touch-based interfaces. However, implementing effective gesture recognition in real-world mobile settings involves challenges such as limited computational power, varying environmental conditions, and the requirement for robust offline–online data management. In this study, we introduce ThumbsUp, which is a gesture-driven system, and employ a partially systematic literature review approach (inspired by core PRISMA guidelines) to identify the key research gaps in mobile gesture recognition. By incorporating insights from deep learning–based methods (e.g., CNNs and Transformers) while focusing on low resource consumption, we leverage Google’s MediaPipe in our framework for real-time detection of 21 hand landmarks and adaptive lighting pre-processing, enabling accurate recognition of a “thumbs-up” gesture. The system features a secure queue-based offline–cloud synchronization model, which ensures that the captured images and metadata (encrypted with AES-GCM) remain consistent and accessible even with intermittent connectivity. Experimental results under dynamic lighting, distance variations, and partially cluttered environments confirm the system’s superior low-light performance and decreased resource consumption compared to baseline camera applications. Additionally, we highlight the feasibility of extending ThumbsUp to incorporate AI-driven enhancements for abrupt lighting changes and, in the future, electromyographic (EMG) signals for users with motor impairments. Our comprehensive evaluation demonstrates that ThumbsUp maintains robust performance on typical mobile hardware, showing resilience to unstable network conditions and minimal reliance on high-end GPUs. These findings offer new perspectives for deploying gesture-based interfaces in the broader IoT ecosystem, thus paving the way toward secure, efficient, and inclusive mobile HCI solutions. Full article
(This article belongs to the Special Issue AI-Driven Digital Image Processing: Latest Advances and Prospects)
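A minimal sketch of the offline-queue idea described above, encrypting each capture with AES-GCM (via the Python cryptography package) before it is synchronized to the cloud. The queue layout, key handling, and upload hook are illustrative assumptions rather than the ThumbsUp implementation.

```python
# Minimal sketch: AES-GCM-encrypt captured photos plus metadata and queue
# them for later cloud synchronization. Key storage is out of scope here.
import json, os, queue
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)   # in practice: stored securely
aesgcm = AESGCM(key)
outbox: "queue.Queue[dict]" = queue.Queue()

def enqueue_capture(image_bytes: bytes, metadata: dict) -> None:
    nonce = os.urandom(12)                    # unique per message
    aad = json.dumps(metadata).encode()       # authenticated, not encrypted
    ciphertext = aesgcm.encrypt(nonce, image_bytes, aad)
    outbox.put({"nonce": nonce, "aad": aad, "ciphertext": ciphertext})

def sync_when_online(upload) -> None:
    """Drain the queue with a caller-supplied upload callable."""
    while not outbox.empty():
        upload(outbox.get())
```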

17 pages, 20814 KiB  
Article
Vision-Based Gesture-Driven Drone Control in a Metaverse-Inspired 3D Simulation Environment
by Yaseen, Oh-Jin Kwon, Jaeho Kim, Jinhee Lee and Faiz Ullah
Drones 2025, 9(2), 92; https://doi.org/10.3390/drones9020092 - 24 Jan 2025
Cited by 2 | Viewed by 2174
Abstract
Moving beyond traditional remote control systems for unmanned aerial vehicles (UAVs) and drones, active research is being carried out on vision-based hand gesture recognition systems for drone control. However, in contrast to static and sensor-based hand gesture recognition, recognizing dynamic hand gestures is challenging due to the complex, multi-dimensional nature of hand gesture data present in 2D images. In a real-time application scenario, performance and safety are crucial. We therefore propose a hybrid lightweight dynamic hand gesture recognition system and a 3D-simulator-based drone control environment for live simulation. We used transfer learning-based computer vision techniques to detect dynamic hand gestures in real time. Based on the recognized gestures, predetermined commands are selected and sent to a drone simulation environment running on a different computer via socket connectivity. Without conventional input devices, hand gesture detection integrated with the virtual environment offers a user-friendly and immersive way to control drone motions, improving user interaction. The efficacy of this technique is illustrated through a variety of test scenarios, highlighting its potential uses in remote-control systems, gaming, and training. The system is tested and evaluated in real time, outperforming state-of-the-art methods. The code utilized in this study is publicly accessible; further details can be found in the “Data Availability Statement”. Full article
(This article belongs to the Special Issue Mobile Fog and Edge Computing in Drone Swarms)
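A minimal sketch of forwarding a recognized gesture as a predetermined command to the simulator machine over a socket, as the abstract describes. The host, port, command names, and JSON wire format are assumptions; the paper's actual protocol is not given in this listing.

```python
# Minimal sketch: map a recognized gesture to a drone command and send it
# to a simulator running on another machine via TCP.
import json
import socket

GESTURE_TO_COMMAND = {"swipe_up": "ascend", "swipe_down": "descend",
                      "swipe_left": "yaw_left", "swipe_right": "yaw_right"}

def send_command(gesture: str, host: str = "192.168.1.20", port: int = 9000) -> None:
    command = GESTURE_TO_COMMAND.get(gesture)
    if command is None:
        return                                    # ignore unknown gestures
    with socket.create_connection((host, port), timeout=1.0) as sock:
        sock.sendall((json.dumps({"cmd": command}) + "\n").encode())
```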

17 pages, 8323 KiB  
Article
A Symmetrical Leech-Inspired Soft Crawling Robot Based on Gesture Control
by Jiabiao Li, Ruiheng Liu, Tianyu Zhang and Jianbin Liu
Biomimetics 2025, 10(1), 35; https://doi.org/10.3390/biomimetics10010035 - 8 Jan 2025
Viewed by 1015
Abstract
This paper presents a novel soft crawling robot controlled by gesture recognition, aimed at enhancing the operability and adaptability of soft robots through natural human–computer interactions. The Leap Motion sensor is employed to capture hand gesture data, and Unreal Engine is used for gesture recognition. Using the UE4Duino, gesture semantics are transmitted to an Arduino control system, enabling direct control over the robot’s movements. For accurate and real-time gesture recognition, we propose a threshold-based method for static gestures and a backpropagation (BP) neural network model for dynamic gestures. In terms of design, the robot utilizes cost-effective thermoplastic polyurethane (TPU) film as the primary pneumatic actuator material. Through a positive and negative pressure switching circuit, the robot’s actuators achieve controllable extension and contraction, allowing for basic movements such as linear motion and directional changes. Experimental results demonstrate that the robot can successfully perform diverse motions under gesture control, highlighting the potential of gesture-based interaction in soft robotics. Full article
(This article belongs to the Special Issue Design, Actuation, and Fabrication of Bio-Inspired Soft Robotics)
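A minimal sketch of a threshold-based static-gesture rule of the kind mentioned above, applied to per-finger extension values in [0, 1] (as could be derived from Leap Motion joint data). The thresholds and the gesture-to-motion mapping are illustrative assumptions; dynamic gestures in the paper are handled separately by a BP neural network.

```python
# Minimal sketch: threshold-based classification of static hand gestures
# from per-finger extension values in [0, 1].
from typing import Dict, Optional

EXTENDED, CURLED = 0.7, 0.3   # assumed thresholds

def classify_static_gesture(ext: Dict[str, float]) -> Optional[str]:
    fingers = ["thumb", "index", "middle", "ring", "pinky"]
    up = [f for f in fingers if ext[f] >= EXTENDED]
    down = [f for f in fingers if ext[f] <= CURLED]
    if len(up) == 5:
        return "open_palm"       # e.g., crawl forward
    if len(down) == 5:
        return "fist"            # e.g., stop
    if up == ["index"] and len(down) == 4:
        return "point"           # e.g., change direction
    return None                  # ambiguous reading: ignore
```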
