Sensors
  • Article
  • Open Access

1 September 2022

Sign Language Recognition Method Based on Palm Definition Model and Multiple Classification

1 Faculty of Information Technologies, L.N. Gumilyov Eurasian National University, Nur-Sultan 010008, Kazakhstan
2 Institute of Economics, Information Technologies and Professional Education, Zhangir Khan West Kazakhstan Agrarian-Technical University, Uralsk 090000, Kazakhstan
3 Department of Information and Technical Sciences, Faculty of Information Technologies and Economics, Kazakh Humanitarian Law Innovative University, East Kazakhstan Region, Semey 701400, Kazakhstan
* Author to whom correspondence should be addressed.
This article belongs to the Section Physical Sensors

Abstract

Technologies for pattern recognition are used in various fields. One of the most relevant and important directions is the application of pattern recognition technology, such as gesture recognition, to socially significant tasks such as the development of real-time automatic sign language interpretation systems. More than 5% of the world's population (about 430 million people, including 34 million children) are deaf-mute and are not always able to use the services of a live sign language interpreter. Almost 80% of people with disabling hearing loss live in low- and middle-income countries. The development of low-cost automatic sign language interpretation systems, without the use of expensive sensors and special cameras, would improve the lives of people with disabilities and contribute to their unhindered integration into society. To this end, this article analyzes gesture recognition methods in the context of their use in automatic gesture recognition systems in order to identify the most suitable ones. Based on this analysis, an algorithm built on the palm definition model and linear models for recognizing the shapes of the numbers and letters of Kazakh Sign Language is proposed. The advantage of the proposed algorithm is that it recognizes 41 of the 42 letters of the Kazakh sign alphabet; until now, only the Russian letters within the Kazakh alphabet had been recognized. In addition, a unified function for configuring the frame depth map mode has been integrated into our system, which has improved recognition performance and can be used to create a multimodal database of video data of gesture words for the gesture recognition system.

1. Introduction

Globally, 432 million adults and 34 million children need rehabilitation for “disabling” hearing loss. It is estimated that by 2050, more than 700 million people—or 1 in 10 people—will have a disabling hearing loss. The prevalence of hearing loss increases with age, with over 25% of people over 60 years of age suffering from a disabling hearing loss [1].
Recently, more attention has been paid in the world to improving the quality of life of people with disabilities. Necessary conditions for movement, training, and interaction with the public for people with disabilities are being created; special hardware, software, scientific and technical products are being developed, and various social state programs and programs of inclusive education are being implemented. For scientists in countries around the world, creating a barrier-free society for people with disabilities is one of the most important tasks.
In modern Kazakhstan, one of the most important directions of state policy concerns the equal right of all citizens to education. The prerequisite for ensuring the accessibility of education is an inclusive environment. The modern task of inclusive education is to support intellectual development while ensuring equal access to education for all segments of the population, taking into account their psycho-physiological and individual characteristics. The process of inclusive education is governed by normative legal documents, such as the Law of the Republic of Kazakhstan on Education [2] and the Concept of Development of Inclusive Education in the Republic of Kazakhstan [3], which strengthen the requirements for the professional activities of teachers, the process of barrier-free training of people with disabilities, and access to software products.
Object recognition technologies are a key factor that can provide solutions to improve the quality of life for people with disabilities. The ability of a machine to understand human gestures, interpret deaf-mute people’s intentions, and react accordingly is one of the most important aspects of human–machine interaction. At the same time, gesture recognition is quite a challenge, not only because of the variety of contexts, multiple interpretations, spatial and temporal variations, and complex non-rigid hand properties, but also because of different lighting levels and complex backgrounds. The first attempts to create various automated systems capable of perceiving the world like humans were made decades ago. Over time, greatly improved, these technologies have been widely used in many fields.
The authors propose an algorithm for recognizing the dactylic alphabet of Kazakh Sign Language. This algorithm can, in turn, be used in recognition systems for Kazakh sign speech. The novelty of the proposed algorithm lies in the fact that there are no full-fledged recognition systems for the Kazakh dactylic alphabet. Because of its similarity to Russian Sign Language, the scientific community has so far been limited to recognizing only the Russian dactylic alphabet, which consists of 33 letters, whereas the Kazakh dactylic alphabet consists of 42 letters, in which the letters i, ң, ғ, к, қ, ө, ë, ъ, and ь are denoted by several hand positions. The authors of [4] compared Kazakh, Russian, English, and Turkish sign languages to show that Kazakh Sign Language can exist as a separate sign language. A study of the form of gesture demonstration was carried out in terms of configuration (hand/forearm), place of execution (localization), direction of movement, nature of movement, and the component that is not performed manually (facial expression and articulation). Despite the roughly 50% similarity with Russian Sign Language, Kazakh Sign Language can be considered a separate language: of the 1050 Russian Sign Language words employed in the course of that study, about 30% are borrowed from English. The vocabulary of sign languages is many times smaller than that of natural languages. In addition, when communicating with each other, hard-of-hearing people continue to create new words, adapting them to the conversation. Therefore, we can conclude that Kazakh Sign Language, in terms of vocabulary, is a separate sign language with its own specifics.
The next innovation of this work is that a unified “draw_people” functionality was integrated into the system for recording and demonstrating gestures in real time, with which users can set up a frame depth map mode; this, in turn, contributed to better results. The draw_people functionality makes it possible to obtain an approximately equal depth map for the dataset and for the frames shown in real time.
Moreover, linguists and sign language interpreters should also consider Kazakh Sign Language, as it is necessary to consolidate its status as a separate sign language and prevent its extinction.

3. Problem Description and Proposed Solution

3.1. Problem Description

A gesture (from the Latin gestus, body movement) is a movement of the human body or its parts that carries a certain meaning, i.e., serves as a symbol or emblem.
Sign language is a method of interpersonal communication among hearing-impaired people, characterized by specific lexical and grammatical patterns and conveyed through gestures.
Sign language is a system of non-verbal communication between people with normal hearing and people with hearing impairments; for the latter it actually serves as the main mode of communication, in which a gesture corresponds to each word [15]. The basic unit of sign language is the gesture, i.e., the ability to indicate an object through hand movements, facial expressions and articulation, head turns, etc., visualizing the parameters of the object.
In most cases, proper names and foreign, technical, and medical terms cannot be conveyed with the help of sign language. Therefore, along with sign language, deaf and hearing-impaired people widely use the dactylic alphabet as a supplement (Figure 1).
Figure 1. Kazakh Sign Language dactyl alphabet.
The grammar of the dactylic language resembles that of the native language of the deaf. Dactylology can often be described as writing with the fingers in the air: it relies on visual perception and on all the rules of spelling, just like writing. Punctuation marks are the exception: exclamation and question marks are conveyed through appropriate facial expressions, periods and ellipses through pauses, while dashes, colons, and other punctuation marks, although they have peculiar forms of expression, are not indicated in dactyl writing.
To parameterize gesture demonstration, five components of a gesture are distinguished: configuration (hand/forearm), place of performance (localization), direction of movement, nature of movement, and a component not performed by the hands (facial expression and articulation) [39,40]. Let us take localization, movement, and direction of the palm as the basic properties of gesture demonstration and introduce the following concepts and notations for the model construction (Table 1).
Table 1. The basic parameters of Kazakh Sign Language, received at demonstration.
Based on the features entered in Table 1, we can determine the complexity of the input data for solving the problem of sign language recognition. On the basis of the parameters designated in this table, the classification of words of Kazakh Sign Language is carried out. For the sign language recognition task, the place of the gesture (Figure 2) is very important: if the gesture is performed in a neutral area, the object can be separated from the background by dressing the speaker in clothes of a single color, but if the place of display involves touching the face or neck, it is difficult to separate the hand from the face or neck. If the display location is at the waist or around the shoulders, certain requirements on the background must be met in order to separate the object from it.
Figure 2. Localization.
In general, gestures can be divided into static and dynamic gestures. Static gestures represent the position of the hand without any movement in space, while dynamic gestures are characterized by sequential hand movements from a starting point to an end point over a certain period of time (Figure 3). At the same time, many letters of the dactyl alphabet can be classified as static gestures (Figure 4).
Figure 3. Direction of movement (straight, intermittent, jumping, repetitive).
Figure 4. Example of static gestures.
When demonstrating the gestures «eкi (two)», «қaзaн (caldron)», «қыcқaшa (briefly)», and «ipi (large)» in sign language, the hand takes only one position, whereas when demonstrating dynamic gestures, not only the hand position changes but also the configuration, together with all the parameters from Table 1.
Gestures with two hands are called symmetrical if the shape and direction of movement of the two wrists coincide or mirror each other. In the words «дәлдiк (accuracy)» and «бapaбaн (drum)» depicted in the figures, both hands move symmetrically to each other. In asymmetric two-handed gestures, one hand (the passive hand) often does not move or moves in a different direction, while the other hand (the active hand) can perform complex movements; the passive hand often makes it possible to determine the shape and movement of the active hand.
For static and dynamic gestures with the palm pointing toward the camera or toward the speaker (Figure 4 and Figure 5), various algorithms can be used to accurately determine the configuration or shape of the hand [41].
Figure 5. Example of dynamic gestures.
If the palm is oriented left, right, up, or down, the hand configuration may not be correctly captured by a simple camera, which impairs the detection of hand features and object recognition (Figure 6).
Figure 6. An example of gestures in which it is difficult to recognize the orientation of the palm.
In this paper, we proposed an algorithm for recognizing one-handed and two-handed gesture types that satisfies the following conditions:
  • Ω HA ψ PTFS↔ ML/DULRS
  • Ω HA/OH ψ PTFS↔ ML/DULRS
  • Ω HA/RLH ψ PTFS↔ ML/DULRS
  • Ω HA/TF ψ PTFS↔ ML/DULRS
  • Ω HA/TN ψ PTFS↔ ML/DULRS
  • Ω NZ ψ PTFS↔ ML/DULRS
  • Ω NRLSH ψ PTFS↔ ML/DULRS
  • Ω W ψ PTFS↔ ML/DULRS

3.2. Proposed Solution

For a gesture recognition system, the technologies that are used to collect raw data on hand movements, facial expressions or body language play a crucial role. In general, input data acquisition devices for gesture recognition systems fall into two categories: simple cameras and various sensors. Accordingly, it can be said that the methods and algorithms used in gesture recognition are directly dependent on these data collection devices.
The MediaPipe technology used in this paper can, with only a simple camera and without special sensors or gloves, provide the information about the key points, characteristics, and position of the hand that would otherwise be supplied by sensors and dedicated devices.
Since the 1990s, the use of special gloves in gesture recognition applications has been widespread, and there has been great interest in methods using various sensors. The solution to the problem of gesture recognition using sensors is still relevant and has been described in many modern works.
Saggio et al. [42] proposed a system based on wearable electronic devices and two different classification algorithms. The system has been tested on 10 Italian sign language words: “costo”, “grazie”, “maestro”, as well as on international words such as “Google”, “internet”, “jogging”, “pizza”, “TV”, “Twitter”, and “ciao”. Hou et al. [43] proposed a SignSpeaker system based on a smartwatch. Hou described how each sign has its own specific motion model and can be converted into unique gyroscope and accelerometer signals. They implemented their system on a pair of ready-made commercial devices—a smartwatch and a smartphone.
The FinGTrAC system [44] demonstrates the feasibility of fine-grained finger gesture tracking using a minimally invasive wearable sensor platform (a smart ring worn on the index finger and a smartwatch worn on the wrist). The main contribution is scaling gesture recognition to hundreds of gestures using only a sparse set of wearable sensors, whereas previous work detected only dozens of hand gestures. Such sparse sensors are comfortable to wear, but they cannot track all fingers and provide only limited information.
Yin et al. [45] proposed a gesture recognition system based on an information glove; their glove included a FLEX2.2 sensor and an STM32 chip to detect finger position. The data received from the sensor are fed to the input of an ANN, and a sign is recognized as a result of comparison with a reference. Similarly, Bairagi et al. [46] used an information glove fitted with an ADXL335 accelerometer and a resistance sensor. The data from the accelerometer were converted into digital form using an ADC and recognized using a microcontroller. The result was then sent to an Android device via a Bluetooth module and converted from text to speech.
Chiu et al. [47] proposed a gesture recognition system using an autonomous current source attached to the back of the hand; that is, the gesture was recognized by detecting the movement of the joints of the hand using a triboelectric nanogenerator.
Mummadi et al. [48] developed a data glove that detects fine-grained hand shapes based on IMU sensors on all fingertips. It takes advantage of recent System-on-Chip designs that are powerful enough to perform real-time data fusion and classification in the glove itself. A total of five IMUs, a multiplexer, and an embedded microprocessor make up the entire configuration of the glove. To improve the noisy and drift-prone sensor readings from the IMUs, an additional filter was used to generate a smooth and consistent signal. In addition, fusing the data allows accurate measurement of orientation and finger movements.
The aforementioned research on glove use has made a major contribution to gesture recognition. However, the problem of the number of recognized gestures is still relevant, since there is a limited amount of test data, despite the fact that many of them are effective and show results above 90%. In addition, the systems in question use gloves and additional devices to obtain information about hand joints and finger positions, and these devices are known to be very expensive and inconvenient to use in the household. The MediaPipe technology [49] used in our work is a revolutionary product in the field of gesture recognition because it does not require the use of additional, very expensive, or unavailable devices to formalize and track palm and finger movements.
Many current automatic sign language translation systems, especially those based on machine learning, require significant processing power; the architecture and the number of layers of the artificial neural network must be large enough to recognize dynamic gestures.
The algorithm we propose does not require high processing performance and can also be ported to mobile devices (Figure 7). This algorithm, based on the MediaPipe Hands technology and the OpenCV library, is able to recognize two-handed and one-handed gestures in real time and provide reliable results. MediaPipe Hands is one of the best available solutions for hand and finger tracking.
Figure 7. Sign language recognition in a real-time system based on the palm definition model.
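A minimal sketch of this real-time pipeline in Python is given below; it assumes a classifier trained as described in Section 3.2.3 and stored in a file named clf.pkl (the file name, label mapping, and feature layout are illustrative assumptions, not the authors' exact implementation):

    # Webcam frame -> MediaPipe Hands landmarks -> flattened feature vector -> SVM prediction.
    import pickle
    import cv2
    import mediapipe as mp

    with open("clf.pkl", "rb") as f:                 # hypothetical path to a trained classifier
        clf = pickle.load(f)

    hands = mp.solutions.hands.Hands(max_num_hands=2, min_detection_confidence=0.7)
    cap = cv2.VideoCapture(0)

    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)          # MediaPipe expects RGB input
        result = hands.process(rgb)
        if result.multi_hand_landmarks:
            lm = result.multi_hand_landmarks[0].landmark
            features = [v for p in lm for v in (p.x, p.y, p.z)]   # 21 points x 3 values
            label = clf.predict([features])[0]
            cv2.putText(frame, str(label), (10, 40),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
        cv2.imshow("KSL recognition", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):                 # quit on 'q'
            break

    cap.release()
    cv2.destroyAllWindows()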

3.2.1. Get Image

Within the framework of the article, a gesture recording functionality was developed, in which the active region (ROI) and depth map mode can be configured, which in turn significantly improved the results of gesture recognition compared to previous similar systems [29,30,31,32,33]. This functionality has been integrated into the gesture recording module and into the real-time dactyl letter recognition module.
The gesture recording system consists of the following steps:
(1) Starting the camera;
(2) Clicking on the screen;
(3) The layout of the human upper body appears (Figure 8);
Figure 8. Draw_people functionality.
(4) Adjusting the layout;
(5) Starting recording.
As emphasized in the Introduction, it was essential for us to develop a budget system using an ordinary everyday camera without additional devices. The recording was made with a regular Logitech WebCam C270 USB webcam, which provides 1280 × 720 video at up to 30 frames per second; the system works with any camera with a resolution of at least 480 × 640. The proposed system works in real time, so it is necessary to record, analyze, and pre-process the video in a short time and feed the information to the input of the model. In the Get Image module, recording is performed in real time until 500 frames are reached or until the speaker presses the 'q' key. Every 5th frame of the 500 recorded frames is saved; thus, 100 frames are stored for every 500 captured. A total of seven people (four aged 20, one aged 40, one aged 11, and one aged 8) participated in the experiment, and 4100 frames of the KSL alphabet and 1500 frames of the numbers from 1 to 15 were captured for each.
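A minimal sketch of this recording step (in Python with OpenCV) is shown below; the 500-frame limit, the every-5th-frame rule, and the 'q' key follow the description above, while the output folder and file names are illustrative assumptions:

    # Capture frames from a webcam until 500 frames are read or 'q' is pressed,
    # keeping every 5th frame (100 images per recording session).
    import os
    import cv2

    out_dir = "dataset/class_A"        # hypothetical folder for one gesture class
    os.makedirs(out_dir, exist_ok=True)

    cap = cv2.VideoCapture(0)
    frame_idx, saved = 0, 0
    while cap.isOpened() and frame_idx < 500:
        ok, frame = cap.read()
        if not ok:
            break
        frame_idx += 1
        if frame_idx % 5 == 0:                             # keep every 5th frame
            cv2.imwrite(os.path.join(out_dir, f"{saved:03d}.jpg"), frame)
            saved += 1
        cv2.imshow("recording", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):              # stop early on 'q'
            break

    cap.release()
    cv2.destroyAllWindows()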

3.2.2. Get Data

Recognizing the configuration of the human hand and its direction of movement is one of the vital components, as it opens up possibilities for natural human–computer interaction. Among the five components described above, the configuration, direction of movement, and nature of movement are the most important, since the main information about the gesture is read from them. Correspondingly, reading the coordinates of the finger joints and the trajectory of each finger is a prerequisite for sign language recognition.
To determine the initial location of the hand, we use the BlazePalm single-shot palm detector model available in MediaPipe. Hand detection is quite a complex task: the system must handle hands of different sizes, in different lighting, on different backgrounds, with crossed fingers or clasped hands, etc. In addition, after the palm region is separated from the rest of the frame, the system processes only the values inside that region, which in turn reduces the computational load on the machine by a factor of five to six.
The Get Data module reads a static frame, converts the image from BGR to RGB, mirrors it along the Y-axis, sends it to the MediaPipe input, and detects a palm.
The detected palm is represented as a list of 21 base coordinates, in which each point consists of x, y, and z values; x and y are normalized to the range [0.0, 1.0] by the image width and height, respectively, and z represents the depth of the point. The wrist depth serves as the reference point, and the smaller the value, the closer the point is to the camera. The z value uses approximately the same scale as x.
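A minimal sketch of this extraction step follows; it reads the frames saved by the Get Image module and writes one row of 21 × 3 landmark values per frame to a CSV file (the paths, the class label, and the CSV layout are illustrative assumptions):

    # Extract 21 (x, y, z) palm landmarks from saved frames and append them
    # to a CSV file, one row per frame with a detected palm.
    import csv
    import glob
    import cv2
    import mediapipe as mp

    hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=2)

    with open("class_A.csv", "w", newline="") as f:        # hypothetical output file
        writer = csv.writer(f)
        for path in glob.glob("dataset/class_A/*.jpg"):
            image = cv2.imread(path)
            rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)   # BGR -> RGB for MediaPipe
            result = hands.process(rgb)
            if not result.multi_hand_landmarks:
                continue                                   # skip frames without a detected palm
            lm = result.multi_hand_landmarks[0].landmark
            row = [v for p in lm for v in (p.x, p.y, p.z)] # x, y normalized to [0, 1]
            writer.writerow(["A"] + row)                   # class label + 63 feature values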
After obtaining the coordinates of the 21 joints of the human palm, these values are transferred to real coordinates of the virtual three-dimensional world, and the hand orientation is determined on the basis of magnetic positioning (Figure 1). The coordinate system of digital three-dimensional space has its own terminology, which, although unconventional, helps programmers create 3D applications and games: here, x is width, y is height, and z is depth relative to the mapping.
From the 100 frames recorded in the Get Image module, the palm is detected by BlazePalm and extracted from the frames in which it was found (Figure 9). A CSV data file is created from these palm detection frames.
Figure 9. Detected palm.
When submitting input data to an artificial neural network, correct processing of the input data is as important as the network itself. Accordingly, it is first necessary to analyze the correlations and select the correct data. To realize this goal, data visualization is performed (Figure 10). Knowing how to choose the right type of graph is a key skill, because distortion can lead to misinterpretation of the data obtained by qualitative data analysis. Therefore, the data were visualized using two tools.
Figure 10. An example of data visualization of some classes.
For example, if class “4” initially has 100 frames, palms are detected in about 70 of these frames, and the hand values are recorded in a CSV file. The mutual similarity of the recorded values can be seen in the following graph (Figure 11).
Figure 11. An example of data.
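The text does not name the two visualization tools; as one plausible illustration, the landmark rows of a single class could be inspected with pandas and matplotlib as in the following sketch (the file name and column layout are assumptions):

    # Plot the recorded landmark values of one class to check that its frames
    # are mutually similar.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("class_4.csv", header=None)   # hypothetical CSV from the Get Data module
    features = df.iloc[:, 1:]                      # drop the class-label column

    features.T.plot(legend=False, alpha=0.3, color="steelblue")
    plt.xlabel("landmark coordinate index (21 points x 3 values)")
    plt.ylabel("normalized value")
    plt.title("Landmark values of ~70 frames of class 4")
    plt.show()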

3.2.3. Get Train and Get Classification

The Support Vector Machine (SVM) method is a very powerful and versatile machine learning model, capable of performing linear or non-linear classification, regression, and even outlier detection. SVM methods are particularly well suited for classifying complex but small or medium datasets, such as a gesture language dataset.
The fundamental idea behind SVM methods can be better revealed using illustrations. Figure 12 shows part of the gesture dataset 0 and 1. The two classes can be easily and clearly separated with a straight line (they are linearly separable).
Figure 12. Visualization of gesture data 1.5.
The graph on the left shows the decision boundaries of two possible linear classifiers for these gestures. One of the models is so poor that it does not even separate the classes properly.
For multi-class classification problems, we used SVC with a one-versus-one (OVO) strategy. Although linear SVM classifiers are efficient and work surprisingly well in numerous cases, many datasets are far from linearly separable. One approach to handling nonlinear datasets is to add extra features computed with a similarity function, which measures how similar each sample is to a particular landmark. For example, consider a one-dimensional dataset to which a landmark is added at x1 = 0.6, as illustrated by the graph in Figure 13:
Figure 13. One-dimensional dataset with a landmark.
Subsequently, we defined the proximity function as a Gaussian Radial Basis Function (RBF) with γ = 0.1 (Equation (1)):
ϕ_γ(x, ℓ) = exp(−γ ‖x − ℓ‖²)        (1)
The Gaussian RBF is a bell-shaped function that varies from 0 (very far from the landmark) to 1 (at the landmark). New features can be computed in this way: a landmark is created at the location of each sample in the dataset. This approach creates many dimensions and thus increases the chances that the transformed dataset will be linearly separable; the downside is that a very large training set yields an equally large number of features. The kernel trick, however, allows an SVM to obtain similar results as if many proximity features had been added, without actually adding them.
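As a small numerical illustration of Equation (1), with the landmark ℓ = 0.6 and γ = 0.1 used above (the sample values are arbitrary):

    # Gaussian RBF similarity feature from Equation (1): phi(x) = exp(-gamma * ||x - l||^2).
    import numpy as np

    gamma, landmark = 0.1, 0.6
    x = np.array([-4.0, 0.6, 3.0])                 # arbitrary one-dimensional samples
    phi = np.exp(-gamma * (x - landmark) ** 2)
    print(phi)   # ~[0.12, 1.00, 0.56]: 1 at the landmark, decaying with distance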
We tested the Gaussian RBF kernel using the SVC class (Figure 14): svm = SVC(C = 15, gamma = 0.1, kernel = ‘rbf’).
Figure 14. The result of classifying numbers 1–15.
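A minimal sketch of this training step with scikit-learn, using the hyperparameters named above (C = 15, gamma = 0.1, RBF kernel) and an OVO decision scheme, is given below; the dataset file name and the train/test split are illustrative assumptions:

    # Train the RBF-kernel SVC on the landmark dataset and report test accuracy.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    data = pd.read_csv("numbers_1_15.csv", header=None)     # hypothetical combined dataset
    X, y = data.iloc[:, 1:], data.iloc[:, 0]                # features and class labels
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    svm = SVC(C=15, gamma=0.1, kernel="rbf", decision_function_shape="ovo")
    svm.fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, svm.predict(X_test)))

Note that scikit-learn's SVC already applies a one-versus-one scheme internally for multi-class problems, so the decision_function_shape argument only affects the shape of the decision function.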
Figure 15 shows the models trained with the gamma (γ) and C hyperparameter values for the 41 classes of the dactyl alphabet.
Figure 15. The result of the classification of the Kazakh dactylic alphabet.

4. Results

As part of this work, a dataset of the Kazakh alphabet and of the numbers from 1 to 15, with approximately the same depth map, was created. Next, the Get_Data module calculates the coordinates of the joints of the human palm using the BlazePalm palm detector model and creates a dataset for each gesture class. The fastest and most convenient way to store the dataset is the Pickle format, since even a CSV file cannot compare with the speed of reading, processing, and viewing pkl files. The SVM is trained on this file; models were prepared for the dataset of numbers from 1 to 15 and for the alphabet. Two-handed and one-handed gestures were trained separately.
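The Pickle-based storage described above might look like the following sketch (the file names are illustrative, and the CSV of landmarks is assumed to come from the Get Data module):

    # Store the landmark dataset and a trained model in Pickle format for fast reloading.
    import pickle
    import pandas as pd
    from sklearn.svm import SVC

    data = pd.read_csv("numbers_1_15.csv", header=None)     # hypothetical landmark CSV
    X, y = data.iloc[:, 1:], data.iloc[:, 0]

    with open("ksl_numbers.pkl", "wb") as f:                # dataset in Pickle format
        pickle.dump({"X": X, "y": y}, f)

    clf = SVC(C=15, gamma=0.1, kernel="rbf").fit(X, y)
    with open("clf.pkl", "wb") as f:                        # trained model for the real-time module
        pickle.dump(clf, f)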
System example: the system was trained on a dataset recorded by a 20-year-old woman and showed high recognition results at the same distance for children aged 8 and 11 (Figure 16). When the depth coordinates were changed, the system output erroneous classes (Figure 17).
Figure 16. An example of a correctly working system.
Figure 17. An example of incorrect system operation.
The results obtained during the experiment are clearly shown in the Confusion Matrix table.
The algorithm is usable, but the model does not always perform well because the data were filmed in adverse lighting conditions with a weak webcam. Retraining the model under ideal conditions would open up new possibilities for the program. Therefore, the program is yet to be improved further (Figure 18 and Figure 19).
Figure 18. Confusion Matrix of numbers.
Figure 19. Confusion Matrix of the Kazakh dactylic alphabet.
We use the graph of the confusion matrix to see how the currently selected classifier performs in each class.
Classification accuracy is the measure we usually have in mind when we use the term “accuracy”: it is obtained by calculating the ratio of correct predictions to the total number of input samples.
Classification accuracy alone, however, can give a false sense of achieving high performance, because the probability of misclassifying samples of a minority class is very high.
In our example, using the datasets of the numbers 1–15 and the 41 letters of the Kazakh alphabet, each row of the matrix corresponds to a true class and each column to a predicted class. In the top row, 99% of the samples of the first class are distinguished from the other numbers and classified correctly; this 99% is the true positive rate for this class, shown in the green cell of the True Positive column. Overall, we obtained the highest average score of 99.6% for the numbers and 90.6% for the dactylic alphabet.
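As a small self-contained illustration of these two measures (the labels below are synthetic, not taken from the experiment):

    # Classification accuracy = correct predictions / total samples; the confusion
    # matrix breaks the result down per class (rows = true class, columns = predicted class).
    from sklearn.metrics import accuracy_score, confusion_matrix

    y_true = [1, 2, 2, 3, 3, 3]
    y_pred = [1, 2, 3, 3, 3, 3]
    print(accuracy_score(y_true, y_pred))                      # 5 of 6 correct, i.e. ~0.83
    print(confusion_matrix(y_true, y_pred, normalize="true"))  # row-normalized, as in Figures 18 and 19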

5. Discussion

In comparison with the methods described in the reference sources considered above, the proposed method recognizes static and short dynamic gestures, including those performed with two hands, in real time.
There are 42 letters in Kazakh Sign Language. Of these, the Kazakh letters i, ң, ғ, k, қ, ө, ë, ъ, and ь are different from the letters of other languages and possess dynamic elements.
In general, the results show the feasibility of the proposed machine learning approach. In particular, it is shown that the SVM classification model can be trained on a large set of available images processed by the hand-tracking algorithm (MediaPipe Hands) and then successfully tested in the system.

6. Conclusions

In this paper, a method based on the palm recognition model and linear models for recognizing the dactyl alphabet and sign language numbers was discussed. The method was tested experimentally on data from the magnetic positioning system using a kinematic model of the hand. During the test, results for letters displayed in two positions were presented. The findings confirm the feasibility of the approach, with approximately 97% classification accuracy.
Therefore, the method enables the development of efficient automated sign language translation systems. Such systems are capable of supporting effective human–machine communication and interaction for the deaf and hard of hearing.
Future development of the issues considered in this paper includes applying the proposed method and experimental setup to problems of hand movement recognition.

7. Future Work

Firstly, by expanding the proposed system, it will be possible to develop a sign language recognition system that also recognizes proper names. Secondly, the unified functionality offered in this system can serve as a prerequisite for developing multimodal video datasets of sign language words. Thirdly, our system can be a prerequisite for the creation of multimodal sign corpora of the Kazakh language.

Author Contributions

N.A. was engaged in the writing and layout of the article, preparing data for the experiment, methodology, conducting the experiment, and project management. S.K. (Saule Kudubayeva) was engaged in consulting and methodology, as she has experience in the field of sign language recognition. A.K. (Akmaral Kassymova) helped to write part of the problem description and conducted research on the forms of gesture demonstration. A.K. (Ardak Karipzhanova) helped to edit the Related Works section, that is, helped to revise this part completely. B.R. helped choose the linear model and verified the model. S.K. (Serikbay Kuralov) was involved in conducting the experiment and preparing the data for the experiment. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. World Health Organization (WHO). Deafness and Hearing Loss. Available online: http://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss (accessed on 2 August 2022).
  2. Law of the Republic of Kazakhstan “On Education”. Available online: https://adilet.zan.kz/kaz/docs/Z070000319_ (accessed on 2 August 2022).
  3. The Concept of Development of Inclusive Education in Kazakhstan. Available online: https://legalacts.egov.kz/application/downloadconceptfile?id=2506747 (accessed on 2 August 2022).
  4. Amangeldy, N.; Kudubayeva, S.A.; Tursynova, N.A.; Baymakhanova, A.; Yerbolatova, A.; Abdieva, S. Comparative analysis on the form of demonstration words of Kazakh sign language with other sign languages. TURKLANG 2022, 114. [Google Scholar]
  5. Gani, E.; Kika, A. Albanian Sign Language (AlbSL) Number Recognition from Both Hand’s Gestures Acquired by Kinect Sensors. Int. J. Adv. Comput. Sci. Appl. 2016, 7. [Google Scholar] [CrossRef]
  6. Sharma, A.; Mittal, A.; Singh, S.; Awatramani, V. Hand Gesture Recognition using Image Processing and Feature Extraction Techniques. Procedia Comput. Sci. 2020, 173, 181–190. [Google Scholar] [CrossRef]
  7. Malik, M.S.A.; Kousar, N.; Abdullah, T.; Ahmed, M.; Rasheed, F.; Awais, M. Pakistan Sign Language Detection using PCA and KNN. Int. J. Adv. Comput. Sci. Appl. 2018, 9. [Google Scholar] [CrossRef]
  8. Patil, A.; Tavade, C.M. Performance analysis and high recognition rate of automated hand gesture recognition though GMM and SVM-KNN classifiers. Int. J. Adv. Trends Comput. Sci. Eng. 2020, 9, 7712–7722. [Google Scholar] [CrossRef]
  9. Kagalkar, R.M.; Gumaste, S.V. Mobile Application Based Translation of Sign Language to Text Description in Kannada Language. Int. J. Interact. Mob. Technol. 2018, 12, 92–112. [Google Scholar] [CrossRef]
  10. Qin, M.; He, G. Gesture Recognition with Multiple Spatial Feature Fusion. In Proceedings of the 2016 4th International Conference on Machinery, Materials and Computing Technology, Changsha, China, 18–20 March 2016; Atlantis Press: Amsterdam, The Netherlands, 2016. [Google Scholar] [CrossRef][Green Version]
  11. Su, R.; Chen, X.; Cao, S.; Zhang, X. Random Forest-Based Recognition of Isolated Sign Language Subwords Using Data from Accelerometers and Surface Electromyographic Sensors. Sensors 2016, 16, 100. [Google Scholar] [CrossRef]
  12. Kenshimov, C.; Buribayev, Z.; Amirgaliyev, Y.; Ataniyazova, A.; Aitimov, A. Sign language dactyl recognition based on machine learning algorithms. East.-Eur. J. Enterp. Technol. 2021, 4, 58–72. [Google Scholar] [CrossRef]
  13. Koller, O.; Zargaran, S.; Ney, H.; Bowden, R. Deep Sign: Enabling Robust Statistical Continuous Sign Language Recognition via Hybrid CNN-HMMs. Int. J. Comput. Vis. 2018, 126, 1311–1325. [Google Scholar] [CrossRef]
  14. Koller, O.; Zargaran, S.; Ney, H.; Bowden, R. Deep Sign: Hybrid CNN-HMM for Continuous Sign Language Recognition. In Proceedings of the British Machine Vision Conference 2016, York, UK, 19–22 September 2016. [Google Scholar] [CrossRef]
  15. Raj, H.; Duggal, A.; Uppara, S. Hand Motion Analysis using CNN. Int. J. Soft Comput. Eng. 2020, 9, 26–30. [Google Scholar] [CrossRef]
  16. Bendarkar, D.S.; Somase, P.A.; Rebari, P.K.; Paturkar, R.R.; Khan, A.M. Web Based Recognition and Translation of American Sign Language with CNN and RNN. Int. J. Online Biomed. Eng. 2021, 17, 34–50. [Google Scholar] [CrossRef]
  17. Rahim, A.; Islam, R.; Shin, J. Non-Touch Sign Word Recognition Based on Dynamic Hand Gesture Using Hybrid Segmentation and CNN Feature Fusion. Appl. Sci. 2019, 9, 3790. [Google Scholar] [CrossRef]
  18. Nafis, A.F.; Suciati, N. Sign Language Recognition on Video Data Based on Graph Convolutional Network. J. Theor. Appl. Inf. Technol. 2021, 99. [Google Scholar]
  19. Adithya, V.; Rajesh, R. A Deep Convolutional Neural Network Approach for Static Hand Gesture Recognition. Procedia Comput. Sci. 2020, 171, 2353–2361. [Google Scholar] [CrossRef]
  20. Ahuja, R.; Jain, D.; Sachdeva, D.; Garg, A.; Rajput, C. Convolutional Neural Network Based American Sign Language Static Hand Gesture Recognition. Int. J. Ambient Comput. Intell. 2019, 10, 60–73. [Google Scholar] [CrossRef]
  21. Rahim, A.; Shin, J.; Yun, K.S. Hand Gesture-based Sign Alphabet Recognition and Sentence Interpretation using a Convolutional Neural Network. Ann. Emerg. Technol. Comput. 2020, 4, 20–27. [Google Scholar] [CrossRef]
  22. Bastwesy, M.R.M.; Elshennawy, N.M.; Saidahmed, M.T.F. Deep Learning Sign Language Recognition System Based on Wi-Fi CSI. Int. J. Intell. Syst. Appl. 2020, 12, 33–45. [Google Scholar] [CrossRef]
  23. Xiao, Q.; Chang, X.; Zhang, X.; Liu, X. Multi-Information Spatial–Temporal LSTM Fusion Continuous Sign Language Neural Machine Translation. IEEE Access 2020, 8, 216718–216728. [Google Scholar] [CrossRef]
  24. Hossain, B.; Adhikary, A.; Soheli, S.J. Sign Language Digit Recognition Using Different Convolutional Neural Network Model. Asian J. Res. Comput. Sci. 2020, 16–24. [Google Scholar] [CrossRef]
  25. Sai-Kumar, S.; Sundara-Krishna, Y.K.; Tumuluru, P.; Ravi-Kiran, P. Design and Development of a Sign Language Gesture Recognition using Open CV. Int. J. Adv. Trends Comput. Sci. Eng. 2020, 9, 8504–8508. [Google Scholar] [CrossRef]
  26. Mohammed, A.A.Q.; Lv, J.; Islam, S. A Deep Learning-Based End-to-End Composite System for Hand Detection and Gesture Recognition. Sensors 2019, 19, 5282. [Google Scholar] [CrossRef]
  27. Khari, M.; Garg, A.K.; Gonzalez-Crespo, R.; Verdu, E. Gesture Recognition of RGB and RGB-D Static Images Using Convolutional Neural Networks. Int. J. Interact. Multimed. Artif. Intell. 2019, 5, 22. [Google Scholar] [CrossRef]
  28. Jia, Y.; Ding, R.; Ren, W.; Shu, J.; Jin, A. Gesture Recognition of Somatosensory Interactive Acupoint Massage Based on Image Feature Deep Learning Model. Trait. Signal 2021, 38, 565–572. [Google Scholar] [CrossRef]
  29. Caputo, A.; Giachetti, A.; Soso, S.; Pintani, D.; D’Eusanio, A.; Pini, S.; Borghi, G.; Simoni, A.; Vezzani, R.; Cucchiara, R.; et al. SHREC 2021: Skeleton-based hand gesture recognition in the wild. Comput. Graph. 2021, 99, 201–211. [Google Scholar] [CrossRef]
  30. Halder, A.; Tayade, A. Sign Language to Text and Speech Translation in Real Time Using Convolutional Neural Network. Int. J. Res. Publ. Rev. 2021, 8, 9–17. [Google Scholar]
  31. Gomase, K.; Dhanawade, A.; Gurav, P.; Lokare, S. Sign Language Recognition using Mediapipe. Int. Res. J. Eng. Technol. 2022, 9. [Google Scholar]
  32. Alvin, A.; Husna-Shabrina, N.; Ryo, A.; Christian, E. Hand Gesture Detection for American Sign Language using K-Nearest Neighbor with Mediapipe. Ultim. Comput. J. Sist. Komput. 2021, 13, 57–62. [Google Scholar] [CrossRef]
  33. Chakraborty, S.; Bandyopadhyay, N.; Chakraverty, P.; Banerjee, S.; Sarkar, Z.; Ghosh, S. Indian Sign Language Classification (ISL) using Machine Learning. Am. J. Electron. Commun. 2021, 1, 17–21. [Google Scholar] [CrossRef]
  34. Zhang, S.; Meng, W.; Li, H.; Cui, X. Multimodal Spatiotemporal Networks for Sign Language Recognition. IEEE Access 2019, 7, 180270–180280. [Google Scholar] [CrossRef]
  35. Yu, T.; Jin, H.; Tan, W.-T.; Nahrstedt, K. SKEPRID. ACM Trans. Multimedia Comput. Commun. Appl. 2018, 14, 1–24. [Google Scholar] [CrossRef]
  36. Luqman, H.; El-Alfy, E.-S. Towards Hybrid Multimodal Manual and Non-Manual Arabic Sign Language Recognition: mArSL Database and Pilot Study. Electronics 2021, 10, 1739. [Google Scholar] [CrossRef]
  37. Kagirov, I.; Ivanko, D.; Ryumin, D.; Axyonov, A.; Karpov, A. TheRuSLan: Database of Russian sign language. In Proceedings of the LREC 2020—12th International Conference on Language Resources and Evaluation, Conference Proceedings, Marseille, France, 11–16 May 2020. [Google Scholar]
  38. Ryumin, D.; Kagirov, I.; Ivanko, D.; Axyonov, A.; Karpov, A.A. Automatic detection and recognition of 3d manual gestures for human-machine interaction. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, XLII-2/W12, 179–183. [Google Scholar] [CrossRef]
  39. Dimskis, L.S. We Study Sign Language; Academia Publishing Center: Moscow, Russia, 2002; p. 128. [Google Scholar]
  40. Zaitseva, G.L. Signed speech. In Dactylology; Vlados: Moscow, Russia, 2000; p. 192. [Google Scholar]
  41. Kudubayeva, S.; Amangeldy, N.; Sundetbayeva, A.; Sarinova, A. The use of correlation analysis in the algorithm of dynamic gestures recognition in video sequence. In Proceedings of the 5th International Conference on Engineering and MIS, Pahang, Malaysia, 6–8 June 2019. [Google Scholar] [CrossRef]
  42. Saggio, G.; Cavallo, P.; Ricci, M.; Errico, V.; Zea, J.; Benalcázar, M. Sign Language Recognition Using Wearable Electronics: Implementing k-Nearest Neighbors with Dynamic Time Warping and Convolutional Neural Network Algorithms. Sensors 2020, 20, 3879. [Google Scholar] [CrossRef] [PubMed]
  43. Hou, J.; Li, X.-Y.; Zhu, P.; Wang, Z.; Wang, Y.; Qian, J.; Yang, P. SignSpeaker: A Real-time, High-Precision SmartWatch-based Sign Language Translator. In Proceedings of the MobiCom’19: The 25th Annual International Conference on Mobile Computing and Networking, Los Cabos, Mexico, 21–25 October 2019; pp. 1–15. [Google Scholar] [CrossRef]
  44. Liu, Y.; Jiang, F.; Gowda, M. Finger Gesture Tracking for Interactive Applications. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2020, 4, 1–21. [Google Scholar] [CrossRef]
  45. Yin, S.; Yang, J.; Qu, Y.; Liu, W.; Guo, Y.; Liu, H.; Wei, D. Research on Gesture Recognition Technology of Data Glove Based on Joint Algorithm. In Proceedings of the 2018 International Conference on Mechanical, Electronic, Control and Automation Engineering, Qingdao, China, 30–31 March 2018; Atlantis Press: Amsterdam, The Netherlands, 2018. [Google Scholar] [CrossRef]
  46. Bairagi, V.K. Gloves based hand gesture recognition using indian sign language. Int. J. Latest Trends Eng. Technol. 2017, 8, 131–137. [Google Scholar] [CrossRef]
  47. Chiu, C.-M.; Chen, S.-W.; Pao, Y.-P.; Huang, M.-Z.; Chan, S.-W.; Lin, Z.-H. A smart glove with integrated triboelectric nanogenerator for self-powered gesture recognition and language expression. Sci. Technol. Adv. Mater. 2019, 20, 964–971. [Google Scholar] [CrossRef]
  48. Mummadi, C.K.; Leo, F.P.P.; Verma, K.D.; Kasireddy, S.; Scholl, P.M.; Kempfle, J.; van Laerhoven, K. Real-time and embedded detection of hand gestures with an IMU-based glove. Informatics 2018, 5, 28. [Google Scholar] [CrossRef]
  49. MediaPipe. Available online: https://pypi.org/project/mediapipe/
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
