Search Results (47)

Search Parameters:
Keywords = american sign language recognition

18 pages, 6253 KB  
Article
Exploring Sign Language Dataset Augmentation with Generative Artificial Intelligence Videos: A Case Study Using Adobe Firefly-Generated American Sign Language Data
by Valentin Bercaru and Nirvana Popescu
Information 2025, 16(9), 799; https://doi.org/10.3390/info16090799 - 15 Sep 2025
Viewed by 506
Abstract
Currently, high-quality datasets focused on Sign Language Recognition are either private, proprietary, or difficult to obtain due to cost. Therefore, we aim to mitigate this problem by augmenting a publicly available dataset with artificially generated data in order to enrich it and obtain a more diverse dataset. The performance of Sign Language Recognition (SLR) systems is highly dependent on the quality and diversity of training datasets. However, acquiring large-scale and well-annotated sign language video data remains a significant challenge. This experiment explores the use of Generative Artificial Intelligence (GenAI), specifically Adobe Firefly, to create synthetic video data for American Sign Language (ASL) fingerspelling. Thirteen of the 26 letters were selected for generation, and short videos representing each sign were synthesized and processed into static frames. These synthetic frames replaced approximately 7.5% of the original dataset and were integrated into the training data of a publicly available Convolutional Neural Network (CNN) model. After retraining the model with the augmented dataset, accuracy did not drop, and validation accuracy remained approximately the same. The resulting model achieved a maximum accuracy of 98.04%. While the performance gain was limited (less than 1%), the approach illustrates the feasibility of using GenAI tools to generate training data and supports further research into data augmentation for low-resource SLR tasks. Full article
(This article belongs to the Section Artificial Intelligence)
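
The augmentation step described above (replacing roughly 7.5% of the real frames with generated ones before retraining) could look like the following minimal sketch; the directory layout, file format, and per-class swap ratio are assumptions for illustration, not the authors' pipeline.

```python
# Minimal sketch: swap a fixed fraction of real frames for synthetic ones
# before retraining a CNN. Paths and the 7.5% ratio are illustrative assumptions.
import random
from pathlib import Path

def mix_in_synthetic(real_dir: str, synth_dir: str, ratio: float = 0.075, seed: int = 0):
    """Return (path, label) pairs where `ratio` of the real frames per letter
    class are replaced by synthetic frames, when synthetic frames exist."""
    rng = random.Random(seed)
    samples = []
    for class_dir in sorted(Path(real_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        real = sorted(class_dir.glob("*.png"))
        synth = sorted((Path(synth_dir) / class_dir.name).glob("*.png"))
        n_swap = min(int(len(real) * ratio), len(synth))
        keep = rng.sample(real, len(real) - n_swap)
        samples += [(p, class_dir.name) for p in keep]
        samples += [(p, class_dir.name) for p in rng.sample(synth, n_swap)]
    return samples
```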

20 pages, 16450 KB  
Article
A Smart Textile-Based Tactile Sensing System for Multi-Channel Sign Language Recognition
by Keran Chen, Longnan Li, Qinyao Peng, Mengyuan He, Liyun Ma, Xinxin Li and Zhenyu Lu
Sensors 2025, 25(15), 4602; https://doi.org/10.3390/s25154602 - 25 Jul 2025
Viewed by 804
Abstract
Sign language recognition plays a crucial role in enabling communication for deaf individuals, yet current methods face limitations such as sensitivity to lighting conditions, occlusions, and lack of adaptability in diverse environments. This study presents a wearable multi-channel tactile sensing system based on smart textiles, designed to capture subtle wrist and finger motions for static sign language recognition. The system leverages triboelectric yarns sewn into gloves and sleeves to construct a skin-conformal tactile sensor array, capable of detecting biomechanical interactions through contact and deformation. Unlike vision-based approaches, the proposed sensor platform operates independently of environmental lighting or occlusions, offering reliable performance in diverse conditions. Experimental validation on American Sign Language letter gestures demonstrates that the proposed system achieves high signal clarity after customized filtering, leading to a classification accuracy of 94.66%. Experimental results show effective recognition of complex gestures, highlighting the system’s potential for broader applications in human-computer interaction. Full article
(This article belongs to the Special Issue Advanced Tactile Sensors: Design and Applications)
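
A rough sketch of the signal path implied by the abstract: per-channel filtering of the raw triboelectric traces, simple statistical features, and a classifier. The sampling rate, cutoff frequency, feature set, and SVM choice are assumptions for illustration only.

```python
# Sketch: filtering of multi-channel tactile signals followed by a simple
# classifier. Sampling rate, cutoff, and model choice are assumptions,
# not the authors' configuration.
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.svm import SVC

def lowpass(signals: np.ndarray, fs: float = 1000.0, cutoff: float = 20.0) -> np.ndarray:
    """signals: (n_samples, n_timesteps, n_channels) raw triboelectric traces."""
    b, a = butter(4, cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, signals, axis=1)

def featurize(signals: np.ndarray) -> np.ndarray:
    # Per-channel statistics as features: mean, std, peak-to-peak.
    return np.concatenate(
        [signals.mean(1), signals.std(1), signals.max(1) - signals.min(1)], axis=1
    )

# X_raw: (N, T, C) array of glove/sleeve channels, y: letter labels
# clf = SVC(kernel="rbf").fit(featurize(lowpass(X_raw)), y)
```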

22 pages, 7640 KB  
Article
Bilingual Sign Language Recognition: A YOLOv11-Based Model for Bangla and English Alphabets
by Nawshin Navin, Fahmid Al Farid, Raiyen Z. Rakin, Sadman S. Tanzim, Mashrur Rahman, Shakila Rahman, Jia Uddin and Hezerul Abdul Karim
J. Imaging 2025, 11(5), 134; https://doi.org/10.3390/jimaging11050134 - 27 Apr 2025
Cited by 4 | Viewed by 2804
Abstract
Communication through sign language effectively helps both hearing- and speaking-impaired individuals connect. However, interlingual communication between Bangla Sign Language (BdSL) and American Sign Language (ASL) is hindered by the absence of a unified system. This study aims to introduce a detection system that incorporates these two sign languages to enhance the flow of communication for those who use them. This study developed and tested a deep learning-based sign-language detection system that can recognize both BdSL and ASL alphabets concurrently in real time. The approach uses a YOLOv11 object detection architecture trained on an open-source dataset of 9556 images containing 64 different letter signs from both languages. Data preprocessing was applied to enhance the performance of the model. Evaluation criteria, including precision, recall, mAP, and other parameters, were also computed to evaluate the model. The performance analysis of the proposed method shows a precision of 99.12% and an average recall of 99.63% over 30 epochs. The results show that the proposed model outperforms current techniques in sign language recognition (SLR) and can be used in assistive communication technologies and human–computer interaction systems. Full article
(This article belongs to the Section Computer Vision and Pattern Recognition)
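
Training a YOLO11 detector on a combined 64-class BdSL/ASL alphabet dataset might look like the sketch below, assuming the Ultralytics package; the dataset YAML name, model size, and hyperparameters are placeholders rather than the authors' settings.

```python
# Sketch of training a YOLO11 detector on a combined BdSL + ASL alphabet
# dataset with the Ultralytics package. Dataset config and hyperparameters
# are illustrative assumptions.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")                  # pretrained nano checkpoint
model.train(
    data="bdsl_asl_alphabet.yaml",          # hypothetical 64-class dataset config
    epochs=30,
    imgsz=640,
)
metrics = model.val()                        # reports precision, recall, mAP
results = model("test_letter_sign.jpg")      # inference on a single image
```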

22 pages, 8938 KB  
Article
Enhancing Hand Gesture Image Recognition by Integrating Various Feature Groups
by Ismail Taha Ahmed, Wisam Hazim Gwad, Baraa Tareq Hammad and Entisar Alkayal
Technologies 2025, 13(4), 164; https://doi.org/10.3390/technologies13040164 - 19 Apr 2025
Cited by 5 | Viewed by 1895
Abstract
Human gesture image recognition is the process of identifying, deciphering, and classifying human gestures in images or video frames using computer vision algorithms. These gestures can vary from the simplest hand motions, body positions, and facial emotions to complicated gestures. Two significant problems affecting the performance of human gesture image recognition methods are ambiguity and invariance. Ambiguity occurs when gestures have the same shape but different orientations, while invariance guarantees that gestures are correctly classified even when scale, lighting, or orientation varies. To overcome these issues, hand-crafted features can be combined with deep learning to greatly improve the performance of hand gesture image recognition models. This combination improves the model’s overall accuracy and dependability in identifying a variety of hand movements by enhancing its capacity to capture both shape and texture properties. Thus, in this study, we propose a hand gesture recognition method that combines ResNet50 feature extraction with the Tamura texture descriptor and uses the adaptability of GAM to represent intricate interactions between the features. Experiments were carried out on publicly available datasets containing images of American Sign Language (ASL) gestures. As Tamura-ResNet50-OptimizedGAM achieved the highest accuracy rate in the ASL datasets, it is believed to be the best option for human gesture image recognition. According to the experimental results, the accuracy rate was 96%, which is higher than the total accuracy of the state-of-the-art techniques currently in use. Full article
(This article belongs to the Section Information and Communication Technologies)
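
A hedged sketch of the feature-fusion idea: pooled ResNet-50 features concatenated with hand-crafted texture statistics. The Tamura computation is stubbed with placeholder statistics and the downstream GAM is not shown; this is not the paper's implementation.

```python
# Sketch of fusing ResNet-50 deep features with hand-crafted texture features
# before a downstream classifier. The Tamura function below is a placeholder.
import numpy as np
import torch
import torch.nn as nn
from torchvision import models, transforms

resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone = nn.Sequential(*list(resnet.children())[:-1]).eval()  # 2048-d pooled features

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def deep_features(pil_image) -> np.ndarray:
    with torch.no_grad():
        x = preprocess(pil_image).unsqueeze(0)
        return backbone(x).flatten().numpy()          # shape (2048,)

def tamura_features(gray: np.ndarray) -> np.ndarray:
    # Placeholder for coarseness / contrast / directionality; implementations vary.
    return np.array([gray.std(), gray.mean(), np.abs(np.diff(gray, axis=1)).mean()])

# fused = np.concatenate([deep_features(img), tamura_features(np.array(img.convert("L")))])
```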

24 pages, 9841 KB  
Article
Mexican Sign Language Recognition: Dataset Creation and Performance Evaluation Using MediaPipe and Machine Learning Techniques
by Mario Rodriguez, Outmane Oubram, A. Bassam, Noureddine Lakouari and Rasikh Tariq
Electronics 2025, 14(7), 1423; https://doi.org/10.3390/electronics14071423 - 1 Apr 2025
Cited by 4 | Viewed by 1928
Abstract
In Mexico, around 2.4 million people (1.9% of the national population) are deaf, and Mexican Sign Language (MSL) support is essential for people with communication disabilities. Research and technological prototypes for sign language recognition have been developed to support public communication systems without human interpreters. However, most of these systems and studies target American Sign Language (ASL) or other sign languages for which recognition of letters and words has reached the highest accuracy. The objective of the current study is to develop and evaluate a sign language recognition system tailored to MSL. The research aims to achieve accurate recognition of dactylology and the first ten numerical digits (1–10) in MSL. A database covering the 29 different characters of MSL dactylology and the first ten digits was created with a camera. MediaPipe was then applied for feature extraction on both hands (21 points per hand). Once the features were extracted, machine learning and deep learning techniques were applied to recognize MSL signs. Recognition of MSL patterns for static signs (29 classes) and continuous signs (10 classes) yielded accuracies of 92% with a Support Vector Machine (SVM) and 86% with a Gated Recurrent Unit (GRU), respectively. The trained algorithms assume full scenes with both hands visible; therefore, signing must be performed under these conditions. To improve accuracy, it is suggested to increase the number of samples. Full article
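
The landmark-extraction step (21 MediaPipe points per hand, zero-padded when a hand is absent) feeding an SVM could be sketched as follows; the padding scheme and feature ordering are assumptions.

```python
# Sketch: 21 MediaPipe points per hand (x, y, z), zero-padded when a hand is
# missing, fed to an SVM for static dactylology classes.
import cv2
import numpy as np
import mediapipe as mp
from sklearn.svm import SVC

mp_hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2)

def landmark_vector(bgr_image: np.ndarray) -> np.ndarray:
    result = mp_hands.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
    coords = []
    if result.multi_hand_landmarks:
        for hand in result.multi_hand_landmarks[:2]:
            coords += [v for lm in hand.landmark for v in (lm.x, lm.y, lm.z)]
    coords += [0.0] * (2 * 21 * 3 - len(coords))   # pad to a fixed 126-d vector
    return np.array(coords)

# X = np.stack([landmark_vector(cv2.imread(p)) for p in image_paths])
# clf = SVC(kernel="rbf").fit(X, labels)          # static 29-class dactylology
```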

21 pages, 5202 KB  
Article
Real-Time American Sign Language Interpretation Using Deep Learning and Keypoint Tracking
by Bader Alsharif, Easa Alalwany, Ali Ibrahim, Imad Mahgoub and Mohammad Ilyas
Sensors 2025, 25(7), 2138; https://doi.org/10.3390/s25072138 - 28 Mar 2025
Cited by 2 | Viewed by 9474
Abstract
Communication barriers pose significant challenges for the Deaf and Hard-of-Hearing (DHH) community, limiting their access to essential services, social interactions, and professional opportunities. To bridge this gap, assistive technologies leveraging artificial intelligence (AI) and deep learning have gained prominence. This study presents a real-time American Sign Language (ASL) interpretation system that integrates deep learning with keypoint tracking to enhance accessibility and foster inclusivity. By combining the YOLOv11 model for gesture recognition with MediaPipe for precise hand tracking, the system achieves high accuracy in identifying ASL alphabet letters in real time. The proposed approach addresses challenges such as gesture ambiguity, environmental variations, and computational efficiency. Additionally, this system enables users to spell out names and locations, further improving its practical applications. Experimental results demonstrate that the model attains a mean Average Precision (mAP@0.5) of 98.2%, with an inference speed optimized for real-world deployment. This research underscores the critical role of AI-driven assistive technologies in empowering the DHH community by enabling seamless communication and interaction. Full article
(This article belongs to the Special Issue Sensor Systems for Gesture Recognition (3rd Edition))
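
A minimal sketch of a real-time loop that combines a trained YOLO letter detector with MediaPipe hand-keypoint overlays on a webcam stream; the weights file name is hypothetical and the loop is illustrative rather than the authors' system.

```python
# Sketch of a real-time loop: YOLO letter detections plus MediaPipe hand
# keypoints drawn on webcam frames. The weights file is a hypothetical name.
import cv2
import mediapipe as mp
from ultralytics import YOLO

detector = YOLO("asl_letters_yolo11.pt")           # hypothetical trained weights
hands = mp.solutions.hands.Hands(max_num_hands=1)
drawer = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    for box in detector(frame, verbose=False)[0].boxes:       # detected letters
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    for hand in result.multi_hand_landmarks or []:             # keypoint overlay
        drawer.draw_landmarks(frame, hand, mp.solutions.hands.HAND_CONNECTIONS)
    cv2.imshow("ASL", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```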

15 pages, 910 KB  
Brief Report
Real-Time Norwegian Sign Language Recognition Using MediaPipe and LSTM
by Md. Zia Uddin, Costas Boletsis and Pål Rudshavn
Multimodal Technol. Interact. 2025, 9(3), 23; https://doi.org/10.3390/mti9030023 - 3 Mar 2025
Cited by 4 | Viewed by 4300
Abstract
The application of machine learning models for sign language recognition (SLR) is a well-researched topic. However, many existing SLR systems focus on widely used sign languages, e.g., American Sign Language, leaving other underrepresented sign languages such as Norwegian Sign Language (NSL) relatively underexplored. This work presents a preliminary system for recognizing NSL gestures, focusing on numbers 0 to 10. Mediapipe is used for feature extraction and Long Short-Term Memory (LSTM) networks for temporal modeling. This system achieves a testing accuracy of 95%, aligning with existing benchmarks and demonstrating its robustness to variations in signing styles, orientations, and speeds. While challenges such as data imbalance and misclassification of similar gestures (e.g., Signs 3 and 8) were observed, the results underscore the potential of our proposed approach. Future iterations of the system will prioritize expanding the dataset by including additional gestures and environmental variations as well as integrating additional modalities. Full article
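
The temporal model described (an LSTM over per-frame MediaPipe landmark vectors for the eleven number classes 0 to 10) might be sketched in Keras as below; the sequence length, feature dimension, and layer sizes are assumptions, not the authors' exact settings.

```python
# Sketch of the temporal model: LSTM layers over per-frame landmark vectors.
import tensorflow as tf

SEQ_LEN, N_FEATURES, N_CLASSES = 30, 126, 11    # 30 frames, 2 hands x 21 x (x, y, z)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(SEQ_LEN, N_FEATURES)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50)
```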

19 pages, 9789 KB  
Article
A Static Sign Language Recognition Method Enhanced with Self-Attention Mechanisms
by Yongxin Wang, He Jiang, Yutong Sun and Longqi Xu
Sensors 2024, 24(21), 6921; https://doi.org/10.3390/s24216921 - 29 Oct 2024
Cited by 3 | Viewed by 2336
Abstract
When wearable devices are applied across diverse user groups, they commonly suffer from degraded static sign language recognition accuracy, weak noise resistance, and insufficient system robustness caused by differences in how users perform gestures. This paper proposes a novel static sign language recognition method enhanced by a self-attention mechanism. A weighting function highlights the key features for sign language gesture classification, the self-attention mechanism then focuses on these key features, and a convolutional neural network extracts the features and classifies them, enabling accurate recognition of different types of static sign language for both standard and non-standard gestures. Experimental results reveal that the proposed method achieves an average accuracy of 99.52% on the standard static sign language recognition task when tested against the 36 standard static gestures selected from the reference American Sign Language dataset. Under random angular bias conditions of ±(0°–9°] and ±(9°–18°], the average recognition rates were 98.63% and 86.33%, respectively. These findings indicate that, compared to existing methods, the proposed method not only maintains a high recognition rate for standard static gestures but also exhibits superior noise resistance and robustness, rendering it suitable for static sign language recognition among diverse user populations. Full article
(This article belongs to the Section Sensing and Imaging)
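
A hedged sketch of a CNN whose spatial feature-map positions are re-weighted by multi-head self-attention before classification; the channel sizes and attention configuration are illustrative, not the paper's architecture.

```python
# Sketch of a CNN enhanced with self-attention over spatial feature-map
# positions before classification. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class AttnCNN(nn.Module):
    def __init__(self, n_classes: int = 36):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                      # x: (B, 3, H, W)
        f = self.conv(x)                       # (B, 64, H/4, W/4)
        tokens = f.flatten(2).transpose(1, 2)  # (B, H*W/16, 64) spatial tokens
        attended, _ = self.attn(tokens, tokens, tokens)
        return self.head(attended.mean(dim=1)) # pool attended tokens, classify

# logits = AttnCNN()(torch.randn(8, 3, 64, 64))
```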

20 pages, 7762 KB  
Article
Applying Swin Architecture to Diverse Sign Language Datasets
by Yulia Kumar, Kuan Huang, Chin-Chien Lin, Annaliese Watson, J. Jenny Li, Patricia Morreale and Justin Delgado
Electronics 2024, 13(8), 1509; https://doi.org/10.3390/electronics13081509 - 16 Apr 2024
Cited by 4 | Viewed by 3560
Abstract
In an era where artificial intelligence (AI) bridges crucial communication gaps, this study extends AI’s utility to American and Taiwan Sign Language (ASL and TSL) communities through advanced models like the hierarchical vision transformer with shifted windows (Swin). This research evaluates Swin’s adaptability across sign languages, aiming for a universal platform for the unvoiced. Utilizing deep learning and transformer technologies, it has developed prototypes for ASL-to-English translation, supported by an educational framework to facilitate learning and comprehension, with the intention to include more languages in the future. This study highlights the efficacy of the Swin model, along with other models such as the vision transformer with deformable attention (DAT), ResNet-50, and VGG-16, in ASL recognition. The Swin model’s accuracy across various datasets underscores its potential. Additionally, this research explores the challenges of balancing accuracy with the need for real-time, portable language recognition capabilities and introduces the use of cutting-edge transformer models like Swin, DAT, and video Swin transformers for diverse datasets in sign language recognition. This study explores the integration of multimodality and large language models (LLMs) to promote global inclusivity. Future efforts will focus on enhancing these models and expanding their linguistic reach, with an emphasis on real-time translation applications and educational frameworks. These achievements not only advance the technology of sign language recognition but also provide more effective communication tools for the deaf and hard-of-hearing community. Full article
(This article belongs to the Special Issue Applications of Deep Learning Techniques)
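
Adapting a pretrained Swin transformer to an ASL alphabet dataset could be sketched with the timm library as follows; the checkpoint name, class count, and optimizer settings are assumptions.

```python
# Sketch of fine-tuning a pretrained Swin transformer for ASL classification
# with timm. Checkpoint name and class count are illustrative.
import timm
import torch

model = timm.create_model(
    "swin_tiny_patch4_window7_224", pretrained=True, num_classes=26
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

def train_step(images, labels):                # images: (B, 3, 224, 224)
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```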

15 pages, 2849 KB  
Article
American Sign Language Recognition and Translation Using Perception Neuron Wearable Inertial Motion Capture System
by Yutong Gu, Hiromasa Oku and Masahiro Todoh
Sensors 2024, 24(2), 453; https://doi.org/10.3390/s24020453 - 11 Jan 2024
Cited by 11 | Viewed by 4808
Abstract
Sign language is designed as a natural communication method to convey messages among the deaf community. In the study of sign language recognition through wearable sensors, the data sources are limited, and the data acquisition process is complex. This research aims to collect an American Sign Language dataset with a wearable inertial motion capture system and realize the recognition and end-to-end translation of sign language sentences with deep learning models. In this work, a dataset consisting of 300 commonly used sentences is gathered from three volunteers. In the design of the recognition network, the model mainly consists of three layers: convolutional neural network, bi-directional long short-term memory, and connectionist temporal classification. The model achieves accuracy rates of 99.07% in word-level evaluation and 97.34% in sentence-level evaluation. In the design of the translation network, the encoder-decoder structured model is mainly based on long short-term memory with global attention. The word error rate of end-to-end translation is 16.63%. The proposed method has the potential to recognize more sign language sentences with reliable inertial data from the device. Full article
(This article belongs to the Section Wearables)
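
The three-layer recognition design named in the abstract (convolutional network, bidirectional LSTM, connectionist temporal classification) can be sketched as below for multi-channel inertial sequences; all dimensions and the vocabulary size are illustrative assumptions.

```python
# Sketch of a CNN + BiLSTM + CTC layout over inertial channel sequences.
# Channel count, hidden sizes, and vocabulary size are assumptions.
import torch
import torch.nn as nn

class InertialSLR(nn.Module):
    def __init__(self, n_channels: int = 72, vocab_size: int = 300):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_channels, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.bilstm = nn.LSTM(128, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, vocab_size + 1)        # +1 for the CTC blank

    def forward(self, x):                               # x: (B, T, n_channels)
        f = self.cnn(x.transpose(1, 2)).transpose(1, 2) # (B, T, 128)
        h, _ = self.bilstm(f)                           # (B, T, 256)
        return self.fc(h).log_softmax(-1)               # (B, T, vocab + 1)

# Usage with CTC (log-probs must be time-major):
# loss = nn.CTCLoss(blank=300)(
#     model(x).transpose(0, 1), targets, input_lengths, target_lengths)
```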

7 pages, 852 KB  
Proceeding Paper
Improving Hand Pose Recognition Using Localization and Zoom Normalizations over MediaPipe Landmarks
by Miguel Ángel Remiro, Manuel Gil-Martín and Rubén San-Segundo
Eng. Proc. 2023, 58(1), 69; https://doi.org/10.3390/ecsa-10-16215 - 15 Nov 2023
Cited by 3 | Viewed by 2619
Abstract
Hand pose recognition presents significant challenges that need to be addressed, such as varying lighting conditions or complex backgrounds, which can hinder accurate and robust hand pose estimation. This can be mitigated by employing MediaPipe to facilitate the efficient extraction of representative landmarks from static images combined with the use of Convolutional Neural Networks. Extracting these landmarks from the hands mitigates the impact of lighting variability or the presence of complex backgrounds. However, the variability of the location and size of the hand is still not addressed by this process. Therefore, the use of processing modules to normalize these points regarding the location of the wrist and the zoom of the hands can significantly mitigate the effects of these variabilities. In all the experiments performed in this work based on American Sign Language alphabet datasets of 870, 27,000, and 87,000 images, the application of the proposed normalizations has resulted in significant improvements in the model performance in a resource-limited scenario. Particularly, under conditions of high variability, applying both normalizations resulted in a performance increment of 45.08%, increasing the accuracy from 43.94 ± 0.64% to 89.02 ± 0.40%. Full article
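
The two normalizations, localization relative to the wrist and zoom by hand extent, reduce to a few lines; the specific extent measure used here is an assumption.

```python
# Sketch of localization (wrist-centered) and zoom (extent-scaled)
# normalization of MediaPipe hand landmarks.
import numpy as np

def normalize_landmarks(landmarks: np.ndarray) -> np.ndarray:
    """landmarks: (21, 2) array of (x, y) points; index 0 is the wrist."""
    centered = landmarks - landmarks[0]                 # localization normalization
    extent = np.abs(centered).max()                     # zoom normalization
    return centered / extent if extent > 0 else centered
```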

26 pages, 3814 KB  
Article
SDViT: Stacking of Distilled Vision Transformers for Hand Gesture Recognition
by Chun Keat Tan, Kian Ming Lim, Chin Poo Lee, Roy Kwang Yang Chang and Ali Alqahtani
Appl. Sci. 2023, 13(22), 12204; https://doi.org/10.3390/app132212204 - 10 Nov 2023
Cited by 4 | Viewed by 2495
Abstract
Hand gesture recognition (HGR) is a rapidly evolving field with the potential to revolutionize human–computer interactions by enabling machines to interpret and understand human gestures for intuitive communication and control. However, HGR faces challenges such as the high similarity of hand gestures, real-time performance, and model generalization. To address these challenges, this paper proposes the stacking of distilled vision transformers, referred to as SDViT, for hand gesture recognition. An initially pretrained vision transformer (ViT) featuring a self-attention mechanism is introduced to effectively capture intricate connections among image patches, thereby enhancing its capability to handle the challenge of high similarity between hand gestures. Subsequently, knowledge distillation is proposed to compress the ViT model and improve model generalization. Multiple distilled ViTs are then stacked to achieve higher predictive performance and reduce overfitting. The proposed SDViT model achieves a promising performance on three benchmark datasets for hand gesture recognition: the American Sign Language (ASL) dataset, the ASL with digits dataset, and the National University of Singapore (NUS) hand gesture dataset. The accuracies achieved on these datasets are 100.00%, 99.60%, and 100.00%, respectively. Full article
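
A sketch of the two ingredients the abstract names: a distillation loss mixing soft teacher targets with hard labels, and stacking by combining the distilled students' predictions. The temperature, weighting, and averaging combiner are assumptions.

```python
# Sketch: knowledge-distillation loss and a simple stacked (averaged) prediction
# over several distilled students. Hyperparameters are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=3.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

def stacked_predict(students, images):
    probs = [F.softmax(m(images), dim=-1) for m in students]
    return torch.stack(probs).mean(dim=0).argmax(dim=-1)
```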

20 pages, 1550 KB  
Article
Deep Learning Technology to Recognize American Sign Language Alphabet
by Bader Alsharif, Ali Salem Altaher, Ahmed Altaher, Mohammad Ilyas and Easa Alalwany
Sensors 2023, 23(18), 7970; https://doi.org/10.3390/s23187970 - 19 Sep 2023
Cited by 36 | Viewed by 10920
Abstract
Historically, individuals with hearing impairments have faced neglect, lacking the necessary tools to facilitate effective communication. However, advancements in modern technology have paved the way for the development of various tools and software aimed at improving the quality of life for hearing-disabled individuals. This research paper presents a comprehensive study employing five distinct deep learning models to recognize hand gestures for the American Sign Language (ASL) alphabet. The primary objective of this study was to leverage contemporary technology to bridge the communication gap between hearing-impaired individuals and individuals with no hearing impairment. The models utilized in this research, namely AlexNet, ConvNeXt, EfficientNet, ResNet-50, and VisionTransformer, were trained and tested using an extensive dataset comprising over 87,000 images of ASL alphabet hand gestures. Numerous experiments were conducted, involving modifications to the architectural design parameters of the models to obtain maximum recognition accuracy. The experimental results of our study revealed that ResNet-50 achieved an exceptional accuracy rate of 99.98%, the highest among all models. EfficientNet attained an accuracy rate of 99.95%, ConvNeXt achieved 99.51% accuracy, AlexNet attained 99.50% accuracy, while VisionTransformer yielded the lowest accuracy of 88.59%. Full article
(This article belongs to the Collection Machine Learning and AI for Sensors)
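
Fine-tuning one of the listed backbones (ResNet-50) on an ASL alphabet image folder might look like the following sketch; the directory layout, batch size, and training schedule are placeholders, not the authors' configuration.

```python
# Sketch of fine-tuning ResNet-50 on an ASL alphabet image folder.
# Paths and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_set = datasets.ImageFolder("asl_alphabet/train", transform=tfm)  # one folder per class
loader = DataLoader(train_set, batch_size=64, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))     # new head

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
for images, labels in loader:                         # one pass shown for brevity
    optimizer.zero_grad()
    criterion(model(images), labels).backward()
    optimizer.step()
```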

20 pages, 5367 KB  
Article
Deep Learning in Sign Language Recognition: A Hybrid Approach for the Recognition of Static and Dynamic Signs
by Ahmed Mateen Buttar, Usama Ahmad, Abdu H. Gumaei, Adel Assiri, Muhammad Azeem Akbar and Bader Fahad Alkhamees
Mathematics 2023, 11(17), 3729; https://doi.org/10.3390/math11173729 - 30 Aug 2023
Cited by 52 | Viewed by 13058
Abstract
A speech impairment limits a person’s capacity for oral and auditory communication. A great improvement in communication between the deaf and the general public would be represented by a real-time sign language detector. This work proposes a deep learning-based algorithm that can detect and identify words from a person’s gestures. There have been many studies on this topic, but the development of static and dynamic sign language recognition models is still a challenging area of research. The difficulty is in obtaining an appropriate model that addresses the challenges of continuous signs that are independent of the signer. Different signers’ speeds, durations, and many other factors make it challenging to create a model with high accuracy and continuity. For the accurate and effective recognition of signs, this study uses two different deep learning-based approaches. We create a real-time American Sign Language detector using the skeleton model, which reliably categorizes continuous signs in sign language in most cases using a deep learning approach. In the second deep learning approach, we create a sign language detector for static signs using YOLOv6. This application is very helpful for sign language users and learners to practice sign language in real time. After training both algorithms separately for static and continuous signs, we create a single algorithm using a hybrid approach. The proposed model, consisting of LSTM with MediaPipe holistic landmarks, achieves around 92% accuracy for different continuous signs, and the YOLOv6 model achieves 96% accuracy over different static signs. Throughout this study, we determine which approach is best for sequential movement detection and which for the classification of different signs, and both show remarkable accuracy in real time. Full article
(This article belongs to the Special Issue New Advances in Computer Vision and Deep Learning)
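
The continuous-sign branch (MediaPipe Holistic landmarks per frame, stacked into sequences for an LSTM) could be sketched as below; the feature layout is an assumption and the YOLOv6 static-sign branch is not shown.

```python
# Sketch of per-frame MediaPipe Holistic keypoint extraction for the
# continuous-sign LSTM branch. Feature layout is an assumption.
import cv2
import numpy as np
import mediapipe as mp

holistic = mp.solutions.holistic.Holistic()

def frame_keypoints(bgr_frame: np.ndarray) -> np.ndarray:
    res = holistic.process(cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2RGB))
    def flat(lms, n):       # flatten a landmark list, or zeros if not detected
        if lms is None:
            return np.zeros(n * 3)
        return np.array([v for lm in lms.landmark for v in (lm.x, lm.y, lm.z)])
    return np.concatenate([
        flat(res.pose_landmarks, 33),
        flat(res.left_hand_landmarks, 21),
        flat(res.right_hand_landmarks, 21),
    ])                                                   # (225,) per frame

# sequence = np.stack([frame_keypoints(f) for f in frames])   # (T, 225) -> LSTM
```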

15 pages, 1906 KB  
Article
Multi-Stream General and Graph-Based Deep Neural Networks for Skeleton-Based Sign Language Recognition
by Abu Saleh Musa Miah, Md. Al Mehedi Hasan, Si-Woong Jang, Hyoun-Sup Lee and Jungpil Shin
Electronics 2023, 12(13), 2841; https://doi.org/10.3390/electronics12132841 - 27 Jun 2023
Cited by 33 | Viewed by 3475
Abstract
Sign language recognition (SLR) aims to bridge speech-impaired and general communities by recognizing signs from given videos. However, due to complex backgrounds, lighting conditions, and subject structures in videos, researchers still face challenges in developing effective SLR systems. Many researchers have recently sought to develop skeleton-based sign language recognition systems to overcome the subject and background variation in hand gesture sign videos. However, skeleton-based SLR is still under exploration, mainly due to a lack of information and hand key point annotations. More recently, researchers have included body and face information along with hand gesture information for SLR; however, the obtained performance accuracy and generalizability properties remain unsatisfactory. In this paper, we propose a multi-stream graph-based deep neural network (SL-GDN) for a skeleton-based SLR system in order to overcome the above-mentioned problems. The main purpose of the proposed SL-GDN approach is to improve the generalizability and performance accuracy of the SLR system while maintaining a low computational cost based on the human body pose in the form of 2D landmark locations. We first construct a skeleton graph based on 27 whole-body key points selected among 67 key points to address the high computational cost problem. Then, we utilize the multi-stream SL-GDN to extract features from the whole-body skeleton graph considering four streams. Finally, we concatenate the four different features and apply a classification module to refine the features and recognize corresponding sign classes. Our data-driven graph construction method increases the system’s flexibility and brings high generalizability, allowing it to adapt to varied data. We use two large-scale benchmark SLR data sets to evaluate the proposed model: the Turkish Sign Language data set (AUTSL) and the Chinese Sign Language data set (CSL). The reported performance accuracy results demonstrate the outstanding ability of the proposed model, and we believe that it will be considered a great innovation in the SLR domain. Full article
(This article belongs to the Special Issue Machine Learning and Deep Learning Based Pattern Recognition)
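
The skeleton-graph idea, a normalized adjacency over 27 selected whole-body key points with graph convolution over per-joint features, is sketched below; the edge list is a placeholder and the paper's multi-stream design is not reproduced.

```python
# Sketch: normalized adjacency over a 27-node skeleton graph and one
# graph-convolution step over per-joint features. Edges are placeholders.
import torch
import torch.nn as nn

N_JOINTS = 27
edges = [(0, 1), (1, 2), (2, 3)]                 # placeholder skeleton edges

A = torch.eye(N_JOINTS)                          # self-loops
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
deg_inv_sqrt = A.sum(1).pow(-0.5)
A_hat = deg_inv_sqrt[:, None] * A * deg_inv_sqrt[None, :]   # symmetric normalization

class GraphConv(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x):                        # x: (B, N_JOINTS, in_dim)
        return torch.relu(self.lin(A_hat @ x))   # aggregate neighbors, transform

# out = GraphConv(2, 64)(torch.randn(8, N_JOINTS, 2))   # 2D landmark inputs
```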
