Search Results (207)

Search Parameters:
Keywords = sign language recognition

19 pages, 709 KiB  
Article
Fusion of Multimodal Spatio-Temporal Features and 3D Deformable Convolution Based on Sign Language Recognition in Sensor Networks
by Qian Zhou, Hui Li, Weizhi Meng, Hua Dai, Tianyu Zhou and Guineng Zheng
Sensors 2025, 25(14), 4378; https://doi.org/10.3390/s25144378 - 13 Jul 2025
Viewed by 141
Abstract
Sign language is a complex and dynamic visual language that requires the coordinated movement of various body parts, such as the hands, arms, and limbs, making it an ideal application domain for sensor networks to capture and interpret human gestures accurately. To address the intricate task of precise and efficient sign language recognition (SLR) from raw videos, this study introduces a novel deep learning approach: a multimodal framework for SLR. Specifically, feature extraction models are built on two modalities, skeleton and RGB images. We first propose a Multi-Stream Spatio-Temporal Graph Convolutional Network (MSGCN) that relies on three modules: a decoupling graph convolutional network, a self-emphasizing temporal convolutional network, and a spatio-temporal joint attention module. These modules are combined to capture the spatio-temporal information in multi-stream skeleton features. Second, we propose a 3D ResNet model based on deformable convolution (D-ResNet) to model complex spatial and temporal sequences in the raw images. Finally, a gating-mechanism-based Multi-Stream Fusion Module (MFM) merges the results of the two modalities. Extensive experiments on the public datasets AUTSL and WLASL achieve competitive results compared to state-of-the-art systems.
(This article belongs to the Special Issue Intelligent Sensing and Artificial Intelligence for Image Processing)
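For readers unfamiliar with gated fusion, a minimal PyTorch sketch of merging two modality embeddings with a learned gate is shown below; the layer sizes and the GatedFusion name are illustrative assumptions, not the authors' MFM implementation.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Merge two modality embeddings with a learned sigmoid gate (illustrative)."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, skel_feat, rgb_feat):
        # g in (0, 1) decides, per channel, how much each stream contributes
        g = self.gate(torch.cat([skel_feat, rgb_feat], dim=-1))
        return g * skel_feat + (1 - g) * rgb_feat

# Toy usage: batch of 4 clips, 512-d features from each stream (assumed sizes)
fused = GatedFusion(512)(torch.randn(4, 512), torch.randn(4, 512))
print(fused.shape)  # torch.Size([4, 512])
```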

17 pages, 5876 KiB  
Article
Optimization of Knitted Strain Sensor Structures for a Real-Time Korean Sign Language Translation Glove System
by Youn-Hee Kim and You-Kyung Oh
Sensors 2025, 25(14), 4270; https://doi.org/10.3390/s25144270 - 9 Jul 2025
Viewed by 177
Abstract
Herein, an integrated system is developed based on knitted strain sensors for real-time translation of sign language into text and audio voices. To investigate how the structural characteristics of the knit affect the electrical performance, the position of the conductive yarn and the presence or absence of elastic yarn are set as experimental variables, and five distinct sensors are manufactured. A comprehensive analysis of the electrical and mechanical performance, including sensitivity, responsiveness, reliability, and repeatability, reveals that the sensor with a plain-plated-knit structure, no elastic yarn included, and the conductive yarn positioned uniformly on the back exhibits the best performance, with a gauge factor (GF) of 88. The sensor exhibited a response time of less than 0.1 s at 50 cycles per minute (cpm), demonstrating that it detects and responds promptly to finger joint bending movements. Moreover, it exhibits stable repeatability and reliability across various angles and speeds, confirming its optimization for sign language recognition applications. Based on this design, an integrated textile-based system is developed by incorporating the sensor, interconnections, snap connectors, and a microcontroller unit (MCU) with built-in Bluetooth Low Energy (BLE) technology into the knitted glove. The complete system successfully recognized 12 Korean Sign Language (KSL) gestures in real time and output them as both text and audio through a dedicated application, achieving a high recognition accuracy of 98.67%. Thus, the present study quantitatively elucidates the structure–performance relationship of a knitted sensor and proposes a wearable system that accounts for real-world usage environments, thereby demonstrating the commercialization potential of the technology.
(This article belongs to the Section Wearables)
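The gauge factor quoted above follows the standard definition GF = (ΔR/R0)/ε; the snippet below simply evaluates that formula, with made-up resistance and strain values chosen to reproduce a GF of 88.

```python
def gauge_factor(r0_ohm: float, r_ohm: float, strain: float) -> float:
    """GF = (delta R / R0) / strain, the standard strain-sensor sensitivity measure."""
    return ((r_ohm - r0_ohm) / r0_ohm) / strain

# Hypothetical numbers: resistance rising from 100 ohm to 980 ohm at 10% strain gives GF = 88
print(gauge_factor(100.0, 980.0, 0.10))  # 88.0
```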

18 pages, 4982 KiB  
Article
Unsupervised Clustering and Ensemble Learning for Classifying Lip Articulation in Fingerspelling
by Nurzada Amangeldy, Nazerke Gazizova, Marek Milosz, Bekbolat Kurmetbek, Aizhan Nazyrova and Akmaral Kassymova
Sensors 2025, 25(12), 3703; https://doi.org/10.3390/s25123703 - 13 Jun 2025
Viewed by 349
Abstract
This paper presents a new methodology for analyzing lip articulation during fingerspelling, aimed at extracting robust visual patterns that can overcome the inherent ambiguity and variability of lip shape. The proposed approach is based on unsupervised clustering of lip movement trajectories to identify consistent articulatory patterns across different time profiles. The methodology is not limited to a single model: it explores varying cluster configurations, assesses their robustness, and analyzes in detail the correspondence between individual alphabet letters and specific clusters. In contrast to direct classification based on raw visual features, this approach pre-tests clustered representations using a model-based assessment of their discriminative potential. This structured approach enhances the interpretability and robustness of the extracted features, highlighting the importance of lip dynamics as an auxiliary modality in multimodal sign language recognition. The obtained results demonstrate that trajectory clustering can serve as a practical method for generating features, providing more accurate and context-sensitive gesture interpretation.
(This article belongs to the Section Intelligent Sensors)
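A minimal sketch of the kind of clustering-plus-robustness check described above, assuming flattened lip-trajectory feature vectors and using scikit-learn's KMeans with silhouette scores; the feature layout and cluster counts are placeholders, not the paper's configuration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Assume each row is a flattened, length-normalized lip-landmark trajectory (made-up shape)
rng = np.random.default_rng(0)
trajectories = rng.normal(size=(200, 60))  # 200 samples, 30 frames x 2 coordinates

for k in (4, 6, 8):  # explore several cluster configurations, as the paper does
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(trajectories)
    print(k, round(silhouette_score(trajectories, labels), 3))  # higher = more consistent clusters
```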

27 pages, 6771 KiB  
Article
A Deep Neural Network Framework for Dynamic Two-Handed Indian Sign Language Recognition in Hearing and Speech-Impaired Communities
by Vaidhya Govindharajalu Kaliyaperumal and Paavai Anand Gopalan
Sensors 2025, 25(12), 3652; https://doi.org/10.3390/s25123652 - 11 Jun 2025
Viewed by 475
Abstract
Language is the medium through which people communicate effectively with one another. Sign language bridges communication gaps for the hearing- and speech-impaired, yet its recognition remains challenging because hand gestures must be identified from varied, often ambiguous palm configurations. This challenge is addressed with a novel Enhanced Convolutional Transformer with Adaptive Tuna Swarm Optimization (ECT-ATSO) recognition framework proposed for two-handed sign language. In order to improve both model generalization and image quality, preprocessing is applied to images prior to prediction, and the proposed dataset is organized to handle multiple dynamic words. Feature graining is employed to obtain local features, and the ViT transformer architecture is then utilized to capture global features from the preprocessed images. After concatenation, this generates a feature map that is then divided into various words using an Inverted Residual Feed-Forward Network (IRFFN). The Enhanced Convolutional Transformer (ECT) model is tuned with an enhanced form of the Tuna Swarm Optimization (TSO) algorithm to handle the problem dimensionality and convergence parameters, and a mutation operator is introduced into the tuna position-update process to avoid local optima. The performance of the proposed framework is measured through dataset visualization, recognition accuracy, and convergence behavior, demonstrating better effectiveness than alternative cutting-edge methods.
(This article belongs to the Section Intelligent Sensors)
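The enhanced TSO itself is not reproduced here; the sketch below only illustrates the general idea of a mutation operator applied to a candidate position during a swarm update, with all names and parameters chosen arbitrarily for illustration.

```python
import numpy as np

def mutate(position: np.ndarray, rate: float = 0.1, scale: float = 0.05,
           rng: np.random.Generator = np.random.default_rng()) -> np.ndarray:
    """Randomly perturb a fraction of the dimensions of a candidate solution."""
    mask = rng.random(position.shape) < rate          # which dimensions to mutate
    return position + mask * rng.normal(scale=scale, size=position.shape)

# Toy update: mutate a 10-dimensional candidate after the usual swarm position move
candidate = np.zeros(10)
print(mutate(candidate))
```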

16 pages, 2108 KiB  
Article
One Possible Path Towards a More Robust Task of Traffic Sign Classification in Autonomous Vehicles Using Autoencoders
by Ivan Martinović, Tomás de Jesús Mateo Sanguino, Jovana Jovanović, Mihailo Jovanović and Milena Djukanović
Electronics 2025, 14(12), 2382; https://doi.org/10.3390/electronics14122382 - 11 Jun 2025
Viewed by 580
Abstract
The increasing deployment of autonomous vehicles (AVs) has exposed critical vulnerabilities in traffic sign classification systems, particularly against adversarial attacks that can compromise safety. This study proposes a dual-purpose defense framework based on convolutional autoencoders to enhance robustness against two prominent white-box attacks: Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD). Experiments on the German Traffic Sign Recognition Benchmark (GTSRB) dataset show that, although these attacks can significantly degrade system performance, the proposed models are capable of partially recovering lost accuracy. Notably, the defense demonstrates strong capabilities in both detecting and reconstructing manipulated traffic signs, even under low-perturbation scenarios. Additionally, a feature-based autoencoder is introduced, which—despite a high false positive rate—achieves perfect detection in critical conditions, a tradeoff considered acceptable in safety-critical contexts. These results highlight the potential of autoencoder-based architectures as a foundation for resilient AV perception while underscoring the need for hybrid models integrating visual-language frameworks for real-time, fail-safe operation.
(This article belongs to the Special Issue Autonomous and Connected Vehicles)
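As a reference point for the attacks mentioned above, here is a minimal PyTorch sketch of FGSM on a toy 43-class classifier (GTSRB has 43 sign classes); the model and epsilon are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

def fgsm(model: nn.Module, x: torch.Tensor, y: torch.Tensor, eps: float) -> torch.Tensor:
    """Return x perturbed by eps * sign(grad of loss wrt x), i.e. the FGSM attack."""
    x = x.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Toy setup: a linear classifier over flattened 32x32 RGB "traffic sign" images
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 43))
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 43, (8,))
x_adv = fgsm(model, x, y, eps=0.03)
print((x_adv - x).abs().max())  # perturbation is bounded by eps
```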

26 pages, 8022 KiB  
Article
Toward a Recognition System for Mexican Sign Language: Arm Movement Detection
by Gabriela Hilario-Acuapan, Keny Ordaz-Hernández, Mario Castelán and Ismael Lopez-Juarez
Sensors 2025, 25(12), 3636; https://doi.org/10.3390/s25123636 - 10 Jun 2025
Viewed by 642
Abstract
This paper describes ongoing work surrounding the creation of a recognition system for Mexican Sign Language (LSM). We propose a general sign decomposition that is divided into three parts, i.e., hand configuration (HC), arm movement (AM), and non-hand gestures (NHGs). This paper focuses on the AM features and reports the approach created to analyze visual patterns in arm joint movements (wrists, shoulders, and elbows). For this research, a proprietary dataset—one that does not limit the recognition of arm movements—was developed, with active participation from the deaf community and LSM experts. We analyzed two case studies involving three sign subsets. For each sign, the pose was extracted to generate shapes of the joint paths during the arm movements and fed to a CNN classifier. YOLOv8 was used for pose estimation and visual pattern classification purposes. The proposed approach, based on pose estimation, shows promising results for constructing CNN models to classify a wide range of signs.
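A rough sketch of turning pose keypoints into a joint-path image that a CNN could classify, using the Ultralytics YOLOv8 pose model and OpenCV; the video path, the choice of the right wrist, and the drawing details are assumptions for illustration.

```python
import cv2
import numpy as np
from ultralytics import YOLO

model = YOLO("yolov8n-pose.pt")            # COCO-keypoint pose model
cap = cv2.VideoCapture("sign_clip.mp4")    # hypothetical input clip
canvas, path = None, []

while True:
    ok, frame = cap.read()
    if not ok:
        break
    if canvas is None:
        canvas = np.zeros_like(frame)      # blank image sized like the video frames
    res = model(frame, verbose=False)[0]
    if res.keypoints is not None and len(res.keypoints.xy) > 0:
        x, y = res.keypoints.xy[0][10].tolist()  # COCO index 10 = right wrist
        path.append((int(x), int(y)))
cap.release()

# The drawn joint path is the kind of shape image a CNN classifier could take as input
if canvas is not None and len(path) > 1:
    cv2.polylines(canvas, [np.array(path, dtype=np.int32)], False, (255, 255, 255), 2)
    cv2.imwrite("right_wrist_path.png", canvas)
```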

24 pages, 6881 KiB  
Article
Sign Language Anonymization: Face Swapping Versus Avatars
by Marina Perea-Trigo, Manuel Vázquez-Enríquez, Jose C. Benjumea-Bellot, Jose L. Alba-Castro and Juan A. Álvarez-García
Electronics 2025, 14(12), 2360; https://doi.org/10.3390/electronics14122360 - 9 Jun 2025
Viewed by 470
Abstract
The visual nature of Sign Language datasets raises privacy concerns that hinder data sharing, which is essential for advancing deep learning (DL) models in Sign Language recognition and translation. This study evaluated two anonymization techniques, realistic avatar synthesis and face swapping (FS), designed to anonymize the identities of signers, while preserving the semantic integrity of signed content. A novel metric, Identity Anonymization with Expressivity Preservation (IAEP), is introduced to assess the balance between effective anonymization and the preservation of facial expressivity crucial for Sign Language communication. In addition, the quality evaluation included the LPIPS and FID metrics, which measure perceptual similarity and visual quality. A survey with deaf participants further complemented the analysis, providing valuable insight into the practical usability and comprehension of anonymized videos. The results show that while face swapping achieved acceptable anonymization and preserved semantic clarity, avatar-based anonymization struggled with comprehension. These findings highlight the need for further research efforts on securing privacy while preserving Sign Language understandability, both for dataset accessibility and the anonymous participation of deaf people in digital content.
(This article belongs to the Special Issue Application of Machine Learning in Graphics and Images, 2nd Edition)
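For context on the perceptual-similarity metric, the lpips package can be used as follows; the random tensors merely stand in for an original frame and its anonymized counterpart.

```python
import torch
import lpips

# LPIPS expects RGB tensors scaled to [-1, 1] with shape (N, 3, H, W)
loss_fn = lpips.LPIPS(net="alex")
original = torch.rand(1, 3, 224, 224) * 2 - 1    # stand-in for an original signer frame
anonymized = torch.rand(1, 3, 224, 224) * 2 - 1  # stand-in for its anonymized version
print(loss_fn(original, anonymized).item())       # lower = more perceptually similar
```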

24 pages, 4340 KiB  
Article
Real-Time Mobile Application for Translating Portuguese Sign Language to Text Using Machine Learning
by Gonçalo Fonseca, Gonçalo Marques, Pedro Albuquerque Santos and Rui Jesus
Electronics 2025, 14(12), 2351; https://doi.org/10.3390/electronics14122351 - 8 Jun 2025
Viewed by 1009
Abstract
Communication barriers between deaf and hearing individuals present significant challenges to social inclusion, highlighting the need for effective technological aids. This study aimed to bridge this gap by developing a mobile system for the real-time translation of Portuguese Sign Language (LGP) alphabet gestures into text, addressing a specific technological void for LGP. The core of the solution is a mobile application integrating two distinct machine learning approaches trained on a custom LGP dataset: firstly, a Convolutional Neural Network (CNN) optimized with TensorFlow Lite for efficient, on-device image classification, enabling offline use; secondly, a method utilizing MediaPipe for hand landmark extraction from the camera feed, with classification performed by a server-side Multilayer Perceptron (MLP). Evaluation tests confirmed that both approaches recognize LGP alphabet gestures with good accuracy (F1-scores of approximately 76% for the CNN and 77% for MediaPipe+MLP) and processing speed (1 to 2 s per gesture on high-end devices using the CNN, and 3 to 5 s under typical network conditions using MediaPipe+MLP), facilitating efficient real-time translation, though trade-offs between speed and accuracy were observed under different conditions. The implementation of this dual-method system provides crucial flexibility, adapting to varying network conditions and device capabilities, and offers a scalable foundation for future expansion to include more complex gestures. This work delivers a practical tool that may contribute to improving communication accessibility and the societal integration of the deaf community in Portugal.
(This article belongs to the Special Issue Virtual Reality Applications in Enhancing Human Lives)
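The landmark-based branch can be illustrated with a short MediaPipe Hands sketch that turns one frame into the 63-dimensional vector (21 landmarks x 3 coordinates) a classifier would consume; the file name is hypothetical and the MLP itself is omitted.

```python
import cv2
import mediapipe as mp

# Extract the 21 hand landmarks that a server-side classifier could consume
hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)
image = cv2.imread("lgp_letter.jpg")  # hypothetical input frame
result = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if result.multi_hand_landmarks:
    lm = result.multi_hand_landmarks[0].landmark
    features = [coord for p in lm for coord in (p.x, p.y, p.z)]  # 63-d feature vector
    print(len(features))  # 63, ready to send to the classifier
```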

21 pages, 512 KiB  
Article
Enhancing Sign Language Recognition Performance Through Coverage-Based Dynamic Clip Generation
by Taewan Kim and Bongjae Kim
Appl. Sci. 2025, 15(11), 6372; https://doi.org/10.3390/app15116372 - 5 Jun 2025
Viewed by 543
Abstract
Sign Language Recognition (SLR) has made substantial progress through advances in deep learning and video-based action recognition. Conventional SLR systems typically segment input videos into a fixed number of clips (e.g., five clips per video), regardless of the video's actual length, to meet the fixed-length input requirements of deep learning models. While this approach simplifies model design and training, it fails to account for temporal variations inherent in sign language videos. Specifically, applying a fixed number of clips to videos of varying lengths can lead to significant information loss: longer videos suffer from excessive frame skipping, causing the model to miss critical gestural cues, whereas shorter videos require frame duplication, introducing temporal redundancy that distorts motion dynamics. To address these limitations, we propose a dynamic clip generation method that adaptively adjusts the number of clips during inference based on a novel coverage metric. This metric quantifies how effectively a clip selection captures the temporal information in a given video, enabling the system to maintain both temporal fidelity and computational efficiency. Experimental results on benchmark SLR datasets using multiple models, including 3D CNNs, R(2+1)D, Video Swin Transformer, and Multiscale Vision Transformers, demonstrate that our method consistently outperforms fixed clip generation methods. Notably, our approach achieves 98.67% accuracy with the Video Swin Transformer while reducing inference time by 28.57%. These findings highlight the effectiveness of coverage-based dynamic clip generation in improving both accuracy and efficiency, particularly for videos with high temporal variability.
(This article belongs to the Topic Applied Computing and Machine Intelligence (ACMI))
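The paper's coverage metric is not spelled out in the abstract; the sketch below shows one plausible reading, frame coverage of uniformly spaced clips, and how a clip count could be increased at inference time until a coverage target is met. All names, formulas, and thresholds here are assumptions, not the authors' definition.

```python
def coverage(num_frames: int, clips: list[range]) -> float:
    """Fraction of the video's frames that fall inside at least one selected clip.
    Only a plausible stand-in for the paper's coverage metric."""
    covered = set()
    for clip in clips:
        covered.update(clip)
    return len(covered & set(range(num_frames))) / num_frames

def pick_num_clips(num_frames: int, clip_len: int = 16, target: float = 0.9) -> int:
    """Increase the clip count until uniformly spaced clips reach the target coverage."""
    for n in range(1, 65):
        step = max(1, (num_frames - clip_len) // max(1, n - 1)) if n > 1 else 0
        clips = [range(i * step, min(i * step + clip_len, num_frames)) for i in range(n)]
        if coverage(num_frames, clips) >= target:
            return n
    return 64

print(pick_num_clips(40), pick_num_clips(400))  # short videos need fewer clips than long ones
```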

15 pages, 4192 KiB  
Article
Enhancing Kazakh Sign Language Recognition with BiLSTM Using YOLO Keypoints and Optical Flow
by Zholdas Buribayev, Maria Aouani, Zhansaya Zhangabay, Ainur Yerkos, Zemfira Abdirazak and Mukhtar Zhassuzak
Appl. Sci. 2025, 15(10), 5685; https://doi.org/10.3390/app15105685 - 20 May 2025
Viewed by 630
Abstract
Sign languages are characterized by complex and subtle hand movements, which are challenging for computer vision systems to recognize accurately. This study proposes an innovative deep learning pipeline specifically designed for reliable gesture recognition of Kazakh Sign Language. The approach combines keypoint detection using the YOLO model, optical flow estimation, and a bidirectional long short-term memory (BiLSTM) network. At the initial stage, a dataset is generated using MediaPipe, which is then used to train the YOLO model to accurately identify key hand points. After training, the YOLO model extracts keypoints and bounding boxes from video recordings of gestures, creating consistent representations of movements. To improve the recognition of dynamic gestures, the optical flow is calculated in an area covering 10% of the region around each keypoint, which captures movement dynamics and provides additional temporal characteristics. The BiLSTM network is trained on multimodal input that combines keypoints, bounding boxes, and optical flow, resulting in improved gesture classification accuracy. The experimental results demonstrate that the proposed approach is superior to traditional methods based solely on keypoints, especially in recognizing fast and complex gestures. The proposed structure promotes the development of sign language recognition technologies, especially for under-studied languages such as Kazakh, paving the way to more inclusive and effective communication tools.
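A small OpenCV sketch of computing dense optical flow in a window around a single keypoint, in the spirit of the 10% neighborhood described above; the window sizing, Farneback parameters, and toy frames are illustrative assumptions.

```python
import cv2
import numpy as np

def flow_around_keypoint(prev_gray, gray, kx, ky, frac=0.10):
    """Mean optical flow inside a window (~10% of the frame size) around one keypoint."""
    h, w = gray.shape
    r = int(min(h, w) * frac / 2)
    x0, x1 = max(0, kx - r), min(w, kx + r)
    y0, y1 = max(0, ky - r), min(h, ky + r)
    flow = cv2.calcOpticalFlowFarneback(prev_gray[y0:y1, x0:x1], gray[y0:y1, x0:x1],
                                        None, 0.5, 3, 15, 3, 5, 1.2, 0)
    return flow.reshape(-1, 2).mean(axis=0)  # (mean dx, mean dy) for this keypoint

# Toy frames; in the pipeline these would come from the gesture video and YOLO keypoints
prev_gray = np.random.randint(0, 255, (480, 640), np.uint8)
gray = np.roll(prev_gray, 3, axis=1)  # simulate a small horizontal shift
print(flow_around_keypoint(prev_gray, gray, 320, 240))
```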

29 pages, 4936 KiB  
Article
Continuous Arabic Sign Language Recognition Models
by Nahlah Algethami, Raghad Farhud, Manal Alghamdi, Huda Almutairi, Maha Sorani and Noura Aleisa
Sensors 2025, 25(9), 2916; https://doi.org/10.3390/s25092916 - 5 May 2025
Cited by 1 | Viewed by 778
Abstract
A significant communication gap persists between the deaf and hearing communities, often leaving deaf individuals isolated and marginalised. This challenge is especially pronounced for Arabic-speaking individuals, given the lack of publicly available Arabic Sign Language datasets and dedicated recognition systems. This study is the first to use the Temporal Convolutional Network (TCN) model for Arabic Sign Language (ArSL) recognition. We created a custom dataset of the 30 most common sentences in ArSL. We also improved recognition performance by enhancing a Recurrent Neural Network (RNN) that incorporates a Bidirectional Long Short-Term Memory (BiLSTM) model, achieving outstanding accuracy compared to baseline RNN-BiLSTM models. This study contributes to developing recognition systems that could bridge communication barriers for the hearing-impaired community. Through a comparative analysis, we assessed the performance of the TCN and the enhanced RNN architecture in capturing the temporal dependencies and semantic nuances unique to Arabic Sign Language. The models are trained and evaluated on the created dataset of Arabic sign gestures based on recognition accuracy, processing speed, and robustness to variations in signing styles. This research provides insights into the strengths and limitations of TCNs and the enhanced RNN-BiLSTM by investigating their applicability in sign language recognition scenarios. The results indicate that the TCN model achieved an accuracy of 99.5%, while the original RNN-BiLSTM model initially achieved 96% accuracy and improved to 99% after enhancement. While the accuracy gap between the two models was small, the TCN model demonstrated significant advantages in computational efficiency, requiring fewer resources and achieving faster inference times. These factors make TCNs more practical for real-time sign language recognition applications.
(This article belongs to the Special Issue Sensor-Based Behavioral Biometrics)
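For readers unfamiliar with TCNs, a minimal dilated, causal residual block in PyTorch is sketched below; channel counts, kernel size, and dilations are placeholders rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class TCNBlock(nn.Module):
    """One dilated, causal temporal-convolution block with a residual connection."""
    def __init__(self, channels: int, kernel_size: int = 3, dilation: int = 1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, time)
        out = self.conv(nn.functional.pad(x, (self.pad, 0)))  # left-pad keeps it causal
        return self.relu(out) + x

# Toy sequence of 64-d keypoint features over 30 frames; stacked blocks grow the receptive field
x = torch.randn(2, 64, 30)
model = nn.Sequential(TCNBlock(64, dilation=1), TCNBlock(64, dilation=2), TCNBlock(64, dilation=4))
print(model(x).shape)  # torch.Size([2, 64, 30])
```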

22 pages, 7640 KiB  
Article
Bilingual Sign Language Recognition: A YOLOv11-Based Model for Bangla and English Alphabets
by Nawshin Navin, Fahmid Al Farid, Raiyen Z. Rakin, Sadman S. Tanzim, Mashrur Rahman, Shakila Rahman, Jia Uddin and Hezerul Abdul Karim
J. Imaging 2025, 11(5), 134; https://doi.org/10.3390/jimaging11050134 - 27 Apr 2025
Cited by 2 | Viewed by 1480
Abstract
Communication through sign language effectively helps both hearing- and speech-impaired individuals connect. However, interlingual communication between Bangla Sign Language (BdSL) and American Sign Language (ASL) is hindered by the absence of a unified system. This study introduces a detection system that incorporates these two sign languages to enhance the flow of communication for those who use them. We developed and tested a deep learning-based sign-language detection system that can recognize both BdSL and ASL alphabets concurrently in real time. The approach uses a YOLOv11 object detection architecture trained on an open-source dataset of 9556 images containing 64 different letter signs from both languages. Data preprocessing was applied to enhance the performance of the model. Evaluation criteria, including precision, recall, mAP, and other parameter values, were also computed. The performance analysis of the proposed method shows a precision of 99.12% and an average recall of 99.63% after 30 epochs. The results show that the proposed model outperforms current techniques in sign language recognition (SLR) and can be used in assistive communication technologies and human–computer interaction systems.
(This article belongs to the Section Computer Vision and Pattern Recognition)
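Training a YOLO11 detector on a letter-sign dataset with the Ultralytics API looks roughly like the following; the dataset YAML and test image are hypothetical, and the hyperparameters only echo the 30 epochs mentioned above.

```python
from ultralytics import YOLO

# Fine-tune a small YOLO11 detector on a letter-sign dataset described by a
# hypothetical data YAML (64 classes covering the BdSL and ASL alphabets)
model = YOLO("yolo11n.pt")
model.train(data="bdsl_asl_letters.yaml", epochs=30, imgsz=640)

metrics = model.val()                 # precision / recall / mAP on the validation split
results = model("test_letter.jpg")    # hypothetical test image
print(results[0].boxes.cls, results[0].boxes.conf)
```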

22 pages, 8938 KiB  
Article
Enhancing Hand Gesture Image Recognition by Integrating Various Feature Groups
by Ismail Taha Ahmed, Wisam Hazim Gwad, Baraa Tareq Hammad and Entisar Alkayal
Technologies 2025, 13(4), 164; https://doi.org/10.3390/technologies13040164 - 19 Apr 2025
Cited by 1 | Viewed by 930
Abstract
Human gesture image recognition is the process of identifying, deciphering, and classifying human gestures in images or video frames using computer vision algorithms. These gestures range from simple hand motions, body positions, and facial expressions to complicated gestures. Two significant problems affecting the performance of human gesture image recognition methods are ambiguity and invariance. Ambiguity occurs when gestures have the same shape but different orientations, while invariance guarantees that gestures are correctly classified even when scale, lighting, or orientation varies. To overcome this issue, hand-crafted features can be combined with deep learning to greatly improve the performance of hand gesture image recognition models. This combination improves the model's overall accuracy and dependability in identifying a variety of hand movements by enhancing its capacity to capture both shape and texture properties. Thus, in this study, we propose a hand gesture recognition method that combines ResNet50 feature extraction with the Tamura texture descriptor and uses the adaptability of a GAM to represent intricate interactions between the features. Experiments were carried out on publicly available datasets containing images of American Sign Language (ASL) gestures. As Tamura-ResNet50-OptimizedGAM achieved the highest accuracy rate on the ASL datasets, it is believed to be the best option for human gesture image recognition. According to the experimental results, the accuracy rate was 96%, which is higher than that of the state-of-the-art techniques currently in use.
(This article belongs to the Section Information and Communication Technologies)
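A sketch of the general feature-fusion idea, deep ResNet50 features concatenated with hand-crafted texture statistics and fed to a classifier, is shown below; simple mean/std statistics stand in for the Tamura descriptors, and logistic regression stands in for the paper's optimized GAM.

```python
import numpy as np
import torch
from torchvision import models
from sklearn.linear_model import LogisticRegression

# Deep branch: ResNet50 penultimate features (2048-d) for a batch of gesture images
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()
resnet.eval()
images = torch.rand(8, 3, 224, 224)  # stand-in gesture crops
with torch.no_grad():
    deep_feats = resnet(images).numpy()  # shape (8, 2048)

# Hand-crafted branch: simple texture statistics as a stand-in for Tamura descriptors
gray = images.mean(dim=1).numpy()
texture_feats = np.stack([gray.mean(axis=(1, 2)), gray.std(axis=(1, 2))], axis=1)

X = np.concatenate([deep_feats, texture_feats], axis=1)  # fused feature vectors
y = np.random.randint(0, 5, size=8)                      # toy ASL class labels
clf = LogisticRegression(max_iter=1000).fit(X, y)        # stand-in for the optimized GAM
print(clf.score(X, y))
```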

24 pages, 9841 KiB  
Article
Mexican Sign Language Recognition: Dataset Creation and Performance Evaluation Using MediaPipe and Machine Learning Techniques
by Mario Rodriguez, Outmane Oubram, A. Bassam, Noureddine Lakouari and Rasikh Tariq
Electronics 2025, 14(7), 1423; https://doi.org/10.3390/electronics14071423 - 1 Apr 2025
Cited by 2 | Viewed by 1042
Abstract
In Mexico, around 2.4 million people (1.9% of the national population) are deaf, and Mexican Sign Language (MSL) support is essential for people with communication disabilities. Research and technological prototypes of sign language recognition have been developed to support public communication systems without human interpreters. However, most of these systems and studies are closely related to American Sign Language (ASL) or the sign languages of other spoken languages, where the highest levels of accuracy in recognizing letters and words have been achieved. The objective of the current study is to develop and evaluate a sign language recognition system tailored to MSL, aiming for accurate recognition of dactylology and the first ten numerical digits (1–10). A database covering the 29 characters of MSL dactylology and the first ten digits was created with a camera. MediaPipe was then applied for feature extraction for both hands (21 points per hand), and machine learning and deep learning techniques were applied to recognize MSL signs. Recognition of MSL patterns yielded an accuracy of 92% for static signs (29 classes) with a Support Vector Machine (SVM) and 86% for continuous signs (10 classes) with a Gated Recurrent Unit (GRU). The trained models assume scenes in which both hands are fully visible, so signing must be performed under these conditions. Increasing the number of samples is suggested to improve accuracy.
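A minimal sketch of the static-sign branch, an SVM over two-hand MediaPipe landmark vectors (2 x 21 x 3 = 126 features), with random placeholders in place of the MSL dataset; the kernel and C value are arbitrary choices.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Each sample: 2 hands x 21 MediaPipe landmarks x (x, y, z) = 126 features.
# Random placeholders stand in for the MSL dactylology dataset (29 static classes).
rng = np.random.default_rng(0)
X = rng.normal(size=(580, 126))
y = np.repeat(np.arange(29), 20)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = SVC(kernel="rbf", C=10).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))  # near chance on random data; real landmarks separate far better
```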

21 pages, 5202 KiB  
Article
Real-Time American Sign Language Interpretation Using Deep Learning and Keypoint Tracking
by Bader Alsharif, Easa Alalwany, Ali Ibrahim, Imad Mahgoub and Mohammad Ilyas
Sensors 2025, 25(7), 2138; https://doi.org/10.3390/s25072138 - 28 Mar 2025
Viewed by 5311
Abstract
Communication barriers pose significant challenges for the Deaf and Hard-of-Hearing (DHH) community, limiting their access to essential services, social interactions, and professional opportunities. To bridge this gap, assistive technologies leveraging artificial intelligence (AI) and deep learning have gained prominence. This study presents a real-time American Sign Language (ASL) interpretation system that integrates deep learning with keypoint tracking to enhance accessibility and foster inclusivity. By combining the YOLOv11 model for gesture recognition with MediaPipe for precise hand tracking, the system achieves high accuracy in identifying ASL alphabet letters in real time. The proposed approach addresses challenges such as gesture ambiguity, environmental variations, and computational efficiency. Additionally, this system enables users to spell out names and locations, further improving its practical applications. Experimental results demonstrate that the model attains a mean Average Precision (mAP@0.5) of 98.2%, with an inference speed optimized for real-world deployment. This research underscores the critical role of AI-driven assistive technologies in empowering the DHH community by enabling seamless communication and interaction.
(This article belongs to the Special Issue Sensor Systems for Gesture Recognition (3rd Edition))
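The real-time side can be sketched as a simple webcam loop around an Ultralytics detector; the weight file name is hypothetical, and the MediaPipe hand-tracking stage is omitted for brevity.

```python
import cv2
from ultralytics import YOLO

# Hypothetical fine-tuned ASL-letter detector; the webcam loop shows the real-time deployment side
model = YOLO("asl_letters_yolo11.pt")
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    res = model(frame, verbose=False)[0]
    annotated = res.plot()                 # draw detected letter boxes and labels
    cv2.imshow("ASL letters", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```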
