Article

Real-Time Chinese Sign Language Gesture Prediction Based on Surface EMG Sensors and Artificial Neural Network

School of Mechanical Engineering, Shanghai DianJi University, Shanghai 201306, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(22), 4374; https://doi.org/10.3390/electronics14224374
Submission received: 10 October 2025 / Revised: 3 November 2025 / Accepted: 7 November 2025 / Published: 9 November 2025

Abstract

Sign language recognition aims to capture and classify hand and arm motion signals to enable intuitive communication for individuals with hearing and speech impairments. This study proposes a real-time Chinese Sign Language (CSL) recognition framework that integrates a dual-stage segmentation strategy with a lightweight three-layer artificial neural network to achieve early gesture prediction before completion of motion sequences. The system was evaluated on a 21-class CSL dataset containing several highly similar gestures and achieved an accuracy of 91.5%, with an average inference latency of approximately 20 ms per prediction cycle. Furthermore, training set truncation experiments demonstrate that using only the first 50% of each gesture instance preserves model accuracy while reducing training time by half, thereby enhancing real-time efficiency and practical deployability for embedded or assistive applications.

1. Introduction

Sign language (SL), which consists of specific gestures, is one of the most common ways for many deaf people to communicate. It is challenging for most non-disabled individuals to understand these gestures directly. SL recognition is the process of identifying and classifying gesture information captured by sensors with a computer, and it is a crucial area of human–computer interaction research. SL can be translated into speech or text, helping individuals with hearing and speech impairments communicate with others through body language or SL. It also finds applications in areas such as gesture control. Compared with general gesture-based human–computer interaction, natural SL involves hand shape, position, movement, and other elements, making it more complex and variable [1]. In recent years, it has become an active research focus. Many SLs have been studied, such as American Sign Language [2,3], Korean Sign Language [4], Indian Sign Language [5], and Chinese Sign Language (CSL) [1,6].
Traditional gesture recognition research can be divided into two categories according to the sensing modality: computer vision [9] and data gloves [7,8]. Computer-vision-based methods use cameras to capture images and apply image processing techniques to recognize signs. They do not require participants to wear any equipment and are low in cost, but the background environment and light source strongly affect the recognition results. Data gloves are wearable devices equipped with multiple sensors that capture hand pose, finger flexion, and motion trajectories, providing rich spatial and temporal information for sign recognition. This approach achieves high recognition rates, but the devices are complex to wear, difficult to carry, and expensive, which hinders widespread adoption. In summary, computer-vision-based techniques are inexpensive and non-invasive but highly sensitive to illumination and background variation, whereas data glove systems capture finger-joint motion precisely yet remain costly, cumbersome, and impractical for daily use. These trade-offs motivate the exploration of sEMG-based sensing, which combines portability, robustness, and real-time potential.
Compared with these traditional methods, sEMG signals reflect muscle electrical activity arising from neural activation; when recorded from the forearm, they correlate with the muscle contractions that drive finger and wrist motions. Originally developed for prosthetic hand control [10], surface EMG techniques are equally applicable to sign language recognition because they capture the same neuromuscular activation patterns responsible for hand and finger movements [11]. sEMG sensing is low in cost and largely insensitive to the environment, and the emergence of multi-channel EMG armbands has greatly increased its portability and practicality. However, the collected multi-channel EMG signals often contain a large amount of data as well as substantial noise, which takes time to process, while SL recognition typically has real-time requirements. This poses challenges for researchers. Real-time responsiveness is critical for practical applications such as assistive communication, prosthetic control, and human–robot interaction. Even small delays can interrupt the natural flow of conversation or motion feedback, increase user fatigue, and reduce confidence in system reliability. Therefore, minimizing latency is not only a technical goal but also a prerequisite for seamless human–machine communication. This paper presents an improved artificial neural network (ANN)-based CSL prediction model that segments the training set to reduce the amount of data processed and greatly improve recognition speed.
The main contributions of this paper are as follows:
(1) We proposed a real-time CSL recognition framework based on sEMG, which adopts an efficient sub-window segmentation strategy to enable early gesture prediction before the action is completed.
(2) Using threshold segmentation, we can achieve a recognition accuracy of more than 91% using only 50% of the training data, which significantly shortens the training time.
(3) We use a lightweight ANN model with only three layers that is suitable for low-computing power devices, making our system easier to deploy in practical assistive scenarios.
The rest of the paper is organized as follows: Section 2 describes related work; the methods and experiments are detailed in Section 3; Section 4 presents the experimental results and discusses the impact of different training set sizes; Section 5 concludes the paper.

2. Related Work

Recent research on SL recognition has increasingly explored sEMG, especially with the availability of wearable armbands such as the Myo, which allow for multi-channel recording in a compact and portable form. As a subset of gesture recognition, SL recognition has also benefited from the broader advances in real-time gesture recognition.
Early sEMG studies demonstrated that compact time domain features can achieve reliable classification with modest computational demand, laying the foundation for real-time SL systems. Many researchers have combined sEMG with other sensor modalities. For example, Wu et al. [12] integrated sEMG and inertial measurement units (IMUs) to classify 80 ASL words using classical classifiers, achieving promising user-dependent performance in online tests. Similarly, Yang et al. [1] evaluated the classification capability of sEMG, accelerometer, and gyroscope signals and proposed a tree-structured framework that achieved 94.31% and 87.02% accuracy in user-dependent and user-independent tests, respectively, for 150 CSL subwords. While sensor fusion often improves recognition accuracy, it also inevitably increases system complexity and latency.
Motivated by real-time constraints, several studies focused on sEMG-only approaches. Savur et al. [13] collected eight-channel sEMG from the forearm and extracted ten features per channel, achieving 82.3% real-time accuracy on 26 ASL letters using support vector machine (SVM). With the rise of consumer wearable devices, the Myo armband has played a pivotal role in enabling SL recognition with low cost and high accessibility. Early Myo-based works established real-time baselines with classical classifiers, achieving sub-second latency on alphabetic gestures [14]. More recently, Kadavath et al. [15] designed an EMG-based SL system using Myo that combined wearability and rapid deployment with competitive performance, while Umut et al. [16] demonstrated real-time SL-to-text/voice conversion, confirming its practicality for continuous assistive scenarios.
Another line of work investigates robustness to electrode displacement and user variability. For example, Wang et al. [17] systematically analyzed the effect of limb position and electrode shift on recognition performance and proposed strategies for faster re-calibration and improved robustness. Such studies reflect the community’s growing interest in adaptive sEMG systems that are suitable for daily deployment.
To further improve accuracy, deep learning models have also been applied. López et al. [18] proposed a CNN–LSTM hybrid using spectrogram features, which improved robustness but significantly increased computational cost, illustrating the trade-off between modeling capacity and real-time feasibility on embedded devices. Similarly, Wu et al. [19] compared four classifiers on 40 daily-life gestures, emphasizing that models with high training costs are not well suited for real-time deployment, especially with small training sets.
Within this context, our research proposes a unique CSL prediction framework based on an ANN, which innovates upon previous Myo-based ANN or SVM systems. First, a two-stage segmentation strategy combining early time truncation and sliding sub-windows enables the model to predict gestures before the action is completed, achieving true real-time response. Second, by utilizing a training set truncation method, over 91% accuracy is achieved using only the first half of each gesture time series, halving the training time compared to using the full dataset. Finally, a lightweight three-layer ANN architecture optimized for low-latency embedded operations is employed. These components collectively establish a novel real-time CSL prediction framework that balances accuracy, computational simplicity, and training efficiency. Distinct from prior Myo-based ANN/SVM systems, our framework jointly introduces dual-stage segmentation for early prediction and training set truncation for efficient learning, while retaining accuracy on a 21-class CSL set with subtle inter-class similarities.

3. Methodology

The sEMG signal lies in a high-dimensional space and exhibits nonlinear and non-stationary characteristics. Traditional gesture recognition models are usually highly complex, and training them requires large datasets, long processing times, and substantial memory. From a practical point of view, a real-time SL prediction model should have low complexity and achieve good results with a small number of samples. The framework of the proposed model is shown in Figure 1. The training set segmentation method saves processing time and thereby enables real-time prediction of CSL gestures.

3.1. EMG Data Acquisition

Eight healthy subjects (four males and four females, aged 22–27 years, average age 23) participated in the experiment. None had previously been trained in CSL. During the experiment, they performed SL actions by imitating pictures. To collect sEMG data, a low-cost eight-channel consumer device, the Myo armband, was used, as shown in Figure 2; it consists of eight pairs of dry electrodes and has a low sampling rate (200 Hz).
Although dry electrodes are less accurate and less robust to motion artifacts than traditional gel-based electrodes [20], the user does not need to shave and clean the skin in advance to obtain optimal contact between skin and electrodes; the armband only needs to be worn directly on the arm, making it very easy to use (Figure 3).
In order to facilitate the recording of experimental data, we selected 21 gestures, including 20 common SL gestures in Chinese and 1 relaxation gesture. An illustration of the gestures is presented in Figure 4.
Together with the relaxed-state gesture, a total of 21 CSL gestures were recognized, all performed with the right hand alone. The collection device was worn at a fixed position on each subject's right forearm. Each CSL gesture was performed 30 times, and the sEMG signal was recorded for 2 s per action cycle, yielding 400 data points per repetition; every participant followed the same procedure. For each gesture, 5 of the 30 repetitions were randomly selected for the training set, and the remaining 25 were used for the test set. The main parameters of the sensors and dataset collection are shown in Table 1.

3.2. Data Preprocessing

The raw signal may be contaminated by noise or artifacts arising from skin temperature, tissue structure, the measurement site, and other factors, which can degrade feature extraction and thus the identification of EMG signals [21]. Therefore, the raw signal needs to be preprocessed. First, it is normalized using max–min scaling. We then apply the short-time Fourier transform to obtain the spectrum of the signal and compute the norm of the spectrum to detect the region of muscle activity during hand movement, removing the inactive head and tail segments. This shortens the training time and improves accuracy. Finally, we rectify the signal with an absolute-value function and apply a 4th-order Butterworth low-pass filter with a cutoff frequency of 5 Hz to smooth it and remove residual noise, as shown in Figure 5.
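For concreteness, the following Python sketch illustrates this preprocessing chain on a single channel. The normalization range, the STFT segment length, and the activity-detection threshold are illustrative assumptions rather than parameters reported in this paper, and the original experiments were implemented in MATLAB.

```python
import numpy as np
from scipy.signal import butter, filtfilt, stft

FS = 200  # Myo armband sampling rate (Hz)

def preprocess_channel(x, fs=FS, cutoff=5.0, activity_ratio=0.1):
    """Sketch of the preprocessing chain: normalization, STFT-based activity
    detection, rectification, and 4th-order Butterworth low-pass filtering."""
    x = np.asarray(x, dtype=float)

    # 1. Max-min normalization (scaled to [-1, 1] here so rectification is meaningful)
    x = 2.0 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1.0

    # 2. STFT-based activity detection: keep frames whose spectral norm exceeds
    #    a fraction of the maximum, trimming the inactive head and tail
    f, t, Z = stft(x, fs=fs, nperseg=32)
    frame_norm = np.linalg.norm(np.abs(Z), axis=0)
    active = np.where(frame_norm > activity_ratio * frame_norm.max())[0]
    if active.size:
        start = int(t[active[0]] * fs)
        stop = min(int(t[active[-1]] * fs) + 1, len(x))
        x = x[start:stop]

    # 3. Rectification + 4th-order Butterworth low-pass at 5 Hz (smooth envelope)
    b, a = butter(4, cutoff / (fs / 2.0), btype="low")
    return filtfilt(b, a, np.abs(x))
```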

3.3. Data Segmentation

From the perspective of practical applications, it is difficult to obtain a large number of datasets for a specific user model, and the model needs to be trained for each person’s use. Therefore, it is very important to save the training time and cost by training real-time models on a limited dataset. In our proposed model, after the data is preprocessed, we need to segment the training data in order to help each user to train in the shortest time possible before use.
Since our model can recognize gestures while they are being performed, we can make predictions before the actions are completed; the specific method is described later. Before training, we segment the training data, as shown in Figure 6: the length of the training signal returned by the muscle activity detection function is L, and we take the data of length m from the starting point to form a new training set, $T_N$.
$$T_N = K[\,\mathrm{abs}(F_N)\,] = (s_1, s_2, s_3, \ldots, s_m)$$
where $N$ denotes the sEMG channel number, $K$ represents the sub-windowing operation applied during segmentation, and $F_N$ indicates the original windowed signal from which features are extracted. This notation ensures that the segmentation and feature extraction process is explicitly defined for each channel and each sub-window. A new signal $T$ is obtained after segmentation.
$$T = [T_1, T_2, \ldots, T_8] \in [0, 1]^{n \times 8}$$
In this work, we adopt a two-stage segmentation strategy to support early gesture prediction. The process of data segmentation is shown in Figure 6. First, once the muscle activity region is detected, we apply temporal truncation to retain only the initial portion of the gesture sequence. This fixed-window truncation ensures that the model focuses on the early stage of muscle activation, allowing the system to anticipate gestures before they are completed. Second, within the truncated segment, we introduce a sliding sub-window mechanism, where short overlapping windows are continuously extracted and processed by the classifier. This hierarchical segmentation, i.e., temporal truncation followed by a sliding sub-window strategy, combines the advantages of early decision making and fine-grained temporal resolution, improving responsiveness and prediction stability compared to traditional fixed-length or overlapping window strategies that are applied to the entire gesture. For this work, we used a uniform window length of 25 points for the training set and testing set, and a shorter window length can achieve better real-time performance. Given our sampling frequency of 200 Hz, each window corresponds to a time duration of 125 ms, which is sufficient to capture the dynamic muscle activity for the majority of CSL gestures. We deliberately frame the task as a short-window classification problem. Each 25-point sub-window provides sufficient temporal context for a lightweight ANN to achieve accurate recognition while maintaining an inference time of about 20 ms per cycle, which is crucial for deployment on embedded devices. A shorter window would not include enough temporal information, while longer windows introduce latency and may reduce real-time responsiveness.
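The two-stage segmentation can be sketched as follows. The 25-point sub-window follows the description above, whereas the truncation ratio and the sliding step are assumptions introduced purely for illustration (the stride is not stated explicitly in the paper).

```python
import numpy as np

WINDOW = 25   # 25-point sub-window = 125 ms at 200 Hz

def two_stage_segmentation(active_emg, truncation_ratio=0.5, window=WINDOW, step=1):
    """Sketch of the dual-stage strategy: (1) temporal truncation of the detected
    activity region, (2) overlapping sliding sub-windows over the truncated part.
    `active_emg` is an (n_samples, 8) array returned by the activity detector."""
    # Stage 1: keep only the early portion of the gesture so that prediction
    # can start before the gesture finishes
    n_keep = max(window, int(len(active_emg) * truncation_ratio))
    early = active_emg[:n_keep]

    # Stage 2: slide short overlapping sub-windows over the truncated segment
    subwindows = [early[i:i + window]
                  for i in range(0, len(early) - window + 1, step)]
    return np.stack(subwindows)  # shape: (n_subwindows, window, n_channels)
```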

3.4. Feature Extraction and CSL Classification

After the signal has been preprocessed and segmented, feature extraction is performed. Appropriate feature extraction is very important for the identification and classification of sEMG signals. Time domain statistical features have been widely used in research; compared with frequency domain and time–frequency methods, they can satisfy real-time constraints under simple hardware conditions [22]. To facilitate the calculation, we extracted six representative low-dimensional features from the preprocessed EMG signal, including waveform length (WL), slope sign changes (SSC), root mean square (RMS), variance (VAR), and mean absolute value (MAV). These features were adopted based on their demonstrated effectiveness in previous EMG-based gesture recognition studies, including our own prior work [23,24]. Following established practices ensures consistency with the literature and provides reliable performance without introducing unnecessary computational overhead. In this study, we also re-validated their empirical performance under a constrained real-time setting. Compared with frequency domain, Fourier-based, or wavelet-based time–frequency descriptors, time domain features can be extracted with minimal latency and do not require extensive windowing or large-scale matrix operations, thereby maintaining very low computational costs. At the same time, they capture essential information on amplitude variation, signal complexity, and spectral dynamics, which makes them particularly suitable for deployment on wearable or embedded devices with limited processing power and strict latency constraints.
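A minimal sketch of these time domain features is given below for a single channel, concatenated across all eight channels. Only the five features named above are implemented, the formulations follow standard definitions from the sEMG literature, and the SSC threshold is an illustrative assumption.

```python
import numpy as np

def td_features(w, ssc_thresh=0.0):
    """Standard time domain features for one sub-window of one channel."""
    mav = np.mean(np.abs(w))                    # mean absolute value (MAV)
    rms = np.sqrt(np.mean(w ** 2))              # root mean square (RMS)
    var = np.var(w)                             # variance (VAR)
    wl = np.sum(np.abs(np.diff(w)))             # waveform length (WL)
    d1, d2 = w[1:-1] - w[:-2], w[1:-1] - w[2:]  # slope sign changes (SSC)
    ssc = np.sum(d1 * d2 > ssc_thresh)
    return np.array([mav, rms, var, wl, ssc])

def feature_vector(subwindow):
    """Concatenate per-channel features for a (window_len, 8) sub-window."""
    return np.concatenate([td_features(subwindow[:, ch])
                           for ch in range(subwindow.shape[1])])
```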
For the classification part, we used a simple three-layer feedforward ANN classifier, because it is computationally efficient, easy to implement, and well suited for low-latency, real-time prediction when the dataset is limited. Given the real-time constraints and the relatively small size of the training dataset, ANNs offer a good balance between performance and computational complexity. Although recurrent models such as LSTM and GRU are effective for modeling long-term dependencies in sequential data, they typically incur higher computational and memory costs, which would hinder deployment on resource-constrained platforms where low-latency operation is critical. The number of parameters depends on the complexity of the network structure and the input dimension. In the gesture recognition application of this study, the ANN structure is relatively simple compared with CNNs and LSTMs, and its parameter count is proportional to the number of layers and the number of neurons in each layer; a CNN extracts local features through convolution kernels, and its parameter count is lower than that of a fully connected network but still grows with the number of channels; an LSTM has a gating mechanism with multiple weight matrices in each unit, so its parameter count far exceeds that of an ANN [25].
In this study, the ANN has three layers, namely the input layer, hidden layer, and output layer. The input layer receives the feature vector extracted from each sub-window (six features for each of the 8 channels), for a total of 48 neurons; the hidden layer has 128 nodes and uses the tanh transfer function to introduce the necessary nonlinearity into the model; the output layer uses a softmax activation function to normalize the output into a probability distribution, with 21 nodes corresponding to the gesture categories. The model is trained using the Adam optimizer, which combines the advantages of momentum-based methods and adaptive learning rates; Adam was chosen because it is efficient and well suited to our relatively small dataset. The model was trained for 150 iterations, which was sufficient to achieve convergence according to preliminary tests, and the batch size was 64 to balance training speed and memory usage. We count the labels returned by the ANN over the sliding sub-windows and set a threshold.
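As a rough illustration, the configuration below mirrors the described architecture and training settings (48 inputs, one 128-unit tanh hidden layer, a 21-class softmax output, Adam, 150 iterations, batch size 64). The use of scikit-learn is an assumption made purely for illustration; the experiments in this paper were carried out in MATLAB.

```python
from sklearn.neural_network import MLPClassifier

ann = MLPClassifier(hidden_layer_sizes=(128,),  # single hidden layer of 128 units
                    activation="tanh",          # hidden-layer nonlinearity
                    solver="adam",              # Adam optimizer, as in the paper
                    batch_size=64,
                    max_iter=150,               # 150 training iterations
                    random_state=0)

# X_train: (n_subwindows, 48) feature vectors; y_train: gesture labels 0..20
# ann.fit(X_train, y_train)
# probs = ann.predict_proba(x_window)  # softmax-normalized class probabilities
```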
$$Y_i = \begin{cases} t, & \text{if } m = \tau \\ 0, & \text{otherwise} \end{cases}$$
where $t \in \{0, 1, 2, \ldots, 20\}$ represents the recognized gesture, $m$ is the count of the label returned by the ANN, and $\tau$ is the decision threshold. Real-time prediction was carried out while the sub-window moved backward, and the gesture category was output once the threshold was reached, which can effectively improve the real-time response of the system.
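A minimal sketch of this counting-and-threshold decision rule is shown below. The threshold value tau and the convention of returning the rest label when no decision is reached are assumptions made for illustration.

```python
from collections import Counter

def predict_with_threshold(ann, subwindow_features, tau=5, rest_label=0):
    """Classify each incoming sub-window and emit a gesture as soon as one
    label has been returned tau times (early decision)."""
    counts = Counter()
    for feat in subwindow_features:             # sub-windows arrive in real time
        label = int(ann.predict(feat.reshape(1, -1))[0])
        counts[label] += 1
        if counts[label] >= tau:                # threshold reached: output early
            return label
    return rest_label                           # no decision: report the relaxed class
```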

4. Analysis of Results

We used three different methods to evaluate and analyze the performance of the system. The first is to analyze the prediction accuracy of all 21 gestures across all subjects. The second is to examine the recognition accuracy obtained with our data segmentation and to evaluate how the training time changes with the segmented dataset. Finally, we evaluate the advantage of the proposed real-time system in terms of response time.

4.1. SL Gesture Prediction Performance Evaluation

In this section, we present the confusion matrix aggregated over the test sets of all subjects, as shown in Figure 7. The overall prediction accuracy over the 21 actions reached 91.5%; the best-performing gestures reached 100%, and the worst-performing gesture reached 79.5%.
Across all 21 gestures, the mean accuracy was 91.5 ± 2.7%, indicating good consistency among participants. Most misclassifications occurred between gestures with similar forearm muscle activations, such as ‘SAN’ and ‘Z’, or between motionless gestures and low-amplitude movements. The standard deviation of class-wise accuracy across users remained below 3%, demonstrating stable recognition, even with natural inter-subject variation. These findings confirm that the system’s main errors stem from physiologically similar gesture patterns rather than random noise, reflecting the overall robustness of the proposed ANN-based framework. The evaluation results presented in Figure 7 reflect the aggregated classification performance across all eight participants (four males and four females). Each subject contributed 30 repetitions per gesture, resulting in a comprehensive multi-user dataset. While Table 2 highlights accuracy and training time for four representative users to illustrate training scalability, the confusion matrix in Figure 7 provides a complete visualization of the system’s prediction accuracy across all 21 gesture classes and all eight users. This aggregation ensures that the model’s performance reflects inter-user variability and generalization capability.

4.2. Evaluation of Training Set Size

In our model, only a portion of the training set is used for training in order to save training time. We evaluated training sets of different sizes, ranging from 20% to 100%, and analyzed the results in terms of prediction accuracy and average training time, as shown in Figure 8. Here, accuracy refers to the overall accuracy over all test sets, and the average training time is the training time for each group of data, measured in MATLAB 2018b on the host machine (OS: Windows 10; CPU: Intel Core i7-9750H; RAM: 16 GB).
From Figure 8, we can see that as the length of the training set increases, the accuracy first rises, then gradually approaches a plateau, and finally stabilizes at about 91%. Beyond a training set size of 50%, the accuracy changes little, because our real-time system can complete gesture prediction before most gestures are finished. The 50% training data does not refer to reducing the number of gesture repetitions but to the time truncation of each gesture instance. Specifically, each gesture lasts 2 s and contains 400 samples, and we extract only the first 50% of the time series, i.e., 200 samples, from each repetition for training. The training time increases almost linearly with the training set length, as expected: as the training data grow, the training cost inevitably increases. Therefore, for our proposed model, time can be saved by halving the original data while still achieving the expected prediction results. This design enables the system to learn to predict gestures in the early stages of gesture execution, thereby improving real-time responsiveness. The total number of training samples (i.e., gesture instances) remains unchanged; only the signal duration used for each instance is shortened. This strategy ensures both training efficiency and early prediction capability without compromising category coverage or representation balance.
In Table 2, we can see that when the training set is only 20%, the accuracy of user 2 is only 17%. This is probably because the length of the training set is too small, as it is less than the length of the sub-window, and a large number of gestures are recognized as no gesture. As the training set increases, the accuracy rate increases rapidly and eventually remains stable.
To further quantify the inter-subject differences, we calculated the average classification accuracy and standard deviation for all subjects at each training set size. As shown in Table 2, the average accuracy when using 50% of the training data is 0.90, with a standard deviation of ±0.03. In addition, the 95% confidence interval at this data ratio is [0.86, 0.95], indicating that the model performance remains at a high level across different users. Although ANOVA/t-tests were not applied in this study, the observed performance trends were consistent across users, and more rigorous statistical validation could be carried out in future work. These results also show that the system maintains good generalization capability despite natural variations in muscle activation patterns, arm size, and electrode alignment.

4.3. Real-Time Performance of the Model

The overall processing timeline of gesture recognition is illustrated in Figure 9. A complete gesture spans approximately 2 s, although the detected muscle activity occupies only part of this interval. Within the active region, the system applies sliding sub-windows of 25 points to generate predictions. This design enables early decision making, with the system consistently producing a stable prediction within 200 ms of gesture onset (response time), rather than waiting for the gesture to finish.
Figure 10 further compares the gesture action time with the response time. Here, movement time represents the duration of detected muscle activity, while response time denotes the interval from gesture onset to the first correct prediction produced by the sliding sub-window mechanism. As shown, a full gesture takes about 1500 ms, but the system outputs a reliable prediction after only ~200 ms, thereby significantly reducing overall recognition delay.
To validate the real-time capability of the proposed CSL recognition system under continuous use, we conducted a streaming experiment in which subjects performed CSL gestures sequentially without interruption. The system continuously acquired and processed sEMG signals in real time using the sliding sub-window strategy. The measured end-to-end latency—including acquisition, preprocessing, feature extraction, and ANN inference—was approximately 20 ms per prediction cycle. This latency remained consistent across gesture types and subjects, and the system maintained robust performance, even during overlapping gesture transitions.
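Putting the stages together, a streaming loop of the kind used in this experiment could look like the sketch below, which reuses the feature_vector helper sketched in Section 3.4. The acquisition callback, buffer handling, stride, and threshold are all assumptions introduced for illustration.

```python
import numpy as np
from collections import Counter

def realtime_loop(read_block, ann, window=25, tau=5):
    """Streaming sketch: read_block() is assumed to yield (k, 8) arrays of new
    sEMG samples from the armband; each 25-point sub-window is featurized,
    classified, and counted until a stable early decision is reached."""
    buffer = np.empty((0, 8))
    counts = Counter()
    for block in read_block():                       # continuous acquisition
        buffer = np.vstack([buffer, block])
        while len(buffer) >= window:
            sub = buffer[:window]                    # current 25-point sub-window
            feat = feature_vector(sub)               # time domain features
            label = int(ann.predict(feat.reshape(1, -1))[0])
            counts[label] += 1
            if counts[label] >= tau:                 # stable early decision
                yield label
                counts.clear()
            buffer = buffer[1:]                      # slide the sub-window forward
```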

4.4. Comparison with Other Methods

To contextualize the effectiveness of our proposed method, we compare it with several representative works on sEMG-based gesture recognition, as shown in Table 3. Simao et al. [26] applied recurrent neural networks (RNNs) for online gesture classification and achieved 92.2% accuracy on a small set of six gestures. Xie et al. [27] utilized convolutional networks for gesture recognition using wearable sensors, attaining 90.0% accuracy over 10 classes, although the real-time capability was not specified. Zhang et al. [28] also employed RNNs for sEMG-based prediction and reported 89.7% accuracy across 12 hand gestures.
In contrast, our proposed ANN-based model achieves 91.5% accuracy on 21 CSL gestures, using only sEMG signals collected from the MYO armband. ANNs are preferred over more complex architectures such as CNNs, LSTMs, or Transformers, because our target application prioritizes low latency and low computational costs in a real-time environment. Unlike more complex RNN or CNN architectures, our approach maintains a lightweight structure that is suitable for real-time deployment and supports early prediction via a moving sub-window strategy. Furthermore, our model achieves high accuracy using just 50% of the training data, highlighting its training efficiency. These results demonstrate that, while competitive with recent deep learning methods, our system balances accuracy, simplicity, and speed, which is critical for real-world assistive applications. This design explicitly targets real-time responsiveness under limited training duration and embedded compute budgets, a combination that is not concurrently addressed by earlier Myo-based approaches.

5. Conclusions

This paper proposes a real-time gesture prediction model. The model takes as input the sEMG of the forearm muscles, measured by a muscle armband. For any user, the model can learn to recognize gestures through a training process. Unlike other high-complexity methods that require a large number of samples to train, we employed a low-complexity model trained with a limited number of samples and evaluated it on a larger dataset, achieving competitive prediction accuracy.
The model proposed in this paper has better real-time performance than traditional gesture recognition, which is mainly reflected in three time-saving aspects. First, we used a muscle activity detection function during training to quickly remove the inactive head and tail of the original signal. We then segmented the training set and used only part of the signal to train the model; experimental results show that only about 50% of the training set data is needed to reach the final prediction accuracy. Finally, we used an improved ANN classifier to count and classify the labels returned by the sliding sub-window in real time, so that SL gestures can be predicted in real time. It should be noted that this study involved eight participants (aged 22–27 years, all right-handed) under similar physical conditions. Such demographic homogeneity may limit the generalization of the proposed model to individuals of different ages, muscle strengths, limb sizes, or dominant hands. Future work will include participants with more diverse physiological and demographic backgrounds to systematically evaluate inter-subject variability and improve the robustness and adaptability of the system. In future work, we also plan to extend the framework to recognize bimanual CSL gestures. This will involve equipping both forearms with synchronized sEMG armbands and implementing signal fusion strategies to jointly analyze muscle activations from the left and right arms. Possible approaches include feature-level concatenation and attention-based temporal fusion to model the coordination between the two hands. Such an extension will allow the system to handle a broader range of CSL vocabulary and improve its applicability to real-world communication scenarios.
In addition to extending the system to support bimanual CSL gestures, we plan to make several methodological improvements to enhance statistical robustness and deployment reliability. Future work will incorporate stratified k-fold cross-validation to better assess the generalization ability of the model, especially under class imbalance and small-sample conditions. This will allow us to more rigorously evaluate the between-class variance and support model selection to reduce bias. Additionally, the current decision threshold in the sliding window voting mechanism is chosen empirically by validation on a holdout subset. While this approach is effective, more adaptive methods such as confidence-weighted fusion or dynamic temporal voting could further improve robustness, especially under noisy-signal conditions or in the presence of user-specific variability.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/electronics14224374/s1.

Author Contributions

Conceptualization, J.C. and K.Y.; methodology, J.C.; software, K.Y.; validation, J.C., X.H. and K.Y.; formal analysis, J.C.; investigation, X.H.; resources, J.C. and K.Y.; data curation, J.C.; writing—original draft preparation, J.C.; writing—review and editing, X.H. and K.Y.; visualization, K.Y.; project administration, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study did not require ethical approval.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The research data are unavailable due to privacy restrictions; see the Supplementary Materials.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yang, X.; Chen, X.; Cao, X.; Wei, S.; Zhang, X. Chinese sign language recognition based on an optimized tree-structure framework. IEEE J. Biomed. Health Inform. 2016, 21, 994–1004. [Google Scholar] [CrossRef]
  2. Hellara, H.; Barioul, R.; Sahnoun, S.; Fakhfakh, A.; Kanoun, O. Improving the accuracy of hand sign recognition by chaotic swarm algorithm-based feature selection applied to fused surface electromyography and force myography signals. Eng. Appl. Artif. Intell. 2025, 154, 110878. [Google Scholar] [CrossRef]
  3. Singh, S.K.; Chaturvedi, A. A reliable and efficient machine learning pipeline for American Sign Language gesture recognition using EMG sensors. Multimed. Tools Appl. 2023, 82, 23833–23871. [Google Scholar] [CrossRef]
  4. Shin, J.; Miah, A.S.M.; Suzuki, K.; Hirooka, K.; Hasan, M.A.M. Dynamic Korean sign language recognition using pose estimation based and attention-based neural network. IEEE Access 2023, 11, 143501–143513. [Google Scholar] [CrossRef]
  5. Nadaf, A.I.; Pardeshi, S.; Gupta, R. Efficient gesture recognition in Indian sign language using SENet fusion of multimodal data. J. Integr. Sci. Technol. 2025, 13, 1145. [Google Scholar] [CrossRef]
  6. Li, Y.; Chen, X.; Zhang, X.; Wang, K.; Wang, Z.J. A sign-component-based framework for Chinese sign language recognition using accelerometer and sEMG data. IEEE Trans. Biomed. Eng. 2012, 59, 2695–2704. [Google Scholar] [CrossRef]
  7. Galka, J.; Masior, M.; Zaborski, M.; Barczewska, K. Inertial Motion Sensing Glove for Sign Language Gesture Acquisition and Recognition. IEEE Sens. J. 2016, 16, 6310–6316. [Google Scholar] [CrossRef]
  8. Gao, W.; Fang, G.; Zhao, D.; Chen, Y. A Chinese sign language recognition system based on SOFM/SRN/HMM. Pattern Recognit. 2004, 37, 2389–2402. [Google Scholar] [CrossRef]
  9. Molchanov, P.; Gupta, S.; Kim, K.; Kautz, J. Hand gesture recognition with 3D convolutional neural networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops 2015, Boston, MA, USA, 1–7 October 2015. [Google Scholar] [CrossRef]
  10. Oskoei, M.A.; Hu, H. Myoelectric control systems—A survey. Biomed. Signal Process. Control 2007, 2, 275–294. [Google Scholar] [CrossRef]
  11. Orban, M.; Zhang, X.; Lu, Z.; Marcal, A.; Emad, A.; Masengo, G. Precise Control Method on Prosthetic Hand Using sEMG Signals. In Proceedings of the IEEE International Conference on Cyber Technology in Automation, Control and Intelligent Systems (CYBER), Xi’an, China, 10–13 October 2020; pp. 326–331. [Google Scholar] [CrossRef]
  12. Wu, J.; Sun, L.; Jafari, R. A Wearable System for Recognizing American Sign Language in Real-Time Using IMU and Surface EMG Sensors. IEEE J. Biomed. Health Inform. 2016, 20, 1281–1290. [Google Scholar] [CrossRef]
  13. Savur, C.; Sahin, F. American Sign Language Recognition system by using surface EMG signal. In Proceedings of the 2016 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2016—Conference Proceedings 2017, Budapest, Hungary, 9–12 October 2016; pp. 2872–2877. [Google Scholar] [CrossRef]
  14. Tepe, C.; Demir, M.C. Real-time classification of emg myo armband data using support vector machine. IRBM 2022, 43, 300–308. [Google Scholar] [CrossRef]
  15. Kadavath, M.R.K.; Nasor, M.; Imran, A. Enhanced hand gesture recognition with surface electromyogram and machine learning. Sensors 2024, 24, 5231. [Google Scholar] [CrossRef]
  16. Umut, İ.; Kumdereli, Ü.C. Novel Wearable System to Recognize Sign Language in Real Time. Sensors 2024, 24, 4613. [Google Scholar] [CrossRef]
  17. Wang, B.; Li, J.; Hargrove, L.; Kamavuako, E.N. Unravelling influence factors in pattern recognition myoelectric control systems: The impact of limb positions and electrode shifts. Sensors 2024, 24, 4840. [Google Scholar] [CrossRef]
  18. López, L.I.B.; Ferri, F.M.; Zea, J.; Caraguay, Á.L.V.; Benalcázar, M.E. CNN-LSTM and post-processing for EMG-based hand gesture recognition. Intell. Syst. Appl. 2024, 22, 200352. [Google Scholar] [CrossRef]
  19. Wu, J.; Tian, Z.; Sun, L.; Estevez, L.; Jafari, R. Real-time American Sign Language Recognition using wrist-worn motion and surface EMG sensors. In Proceedings of the 2015 IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks, BSN 2015, Cambridge, MA, USA, 9–12 June 2015; pp. 1–6. [Google Scholar] [CrossRef]
  20. Stegeman, D.F.; Kleine, B.U.; Lapatki, B.G.; Van Dijk, J.P. High-density surface emg: Techniques and applications at a motor unit level. Biocybern. Biomed. Eng. 2012, 32, 3–27. [Google Scholar] [CrossRef]
  21. Benalcazar, M.E.; Motoche, C.; Zea, J.A.; Jaramillo, A.G.; Anchundia, C.E.; Zambrano, P.; Segura, M.; Palacios, F.B.; Perez, M. Real-time hand gesture recognition using the Myo armband and muscle activity detection. In Proceedings of the 2017 IEEE 2nd Ecuador Technical Chapters Meeting, ETCM 2017, Salinas, Ecuador, 16–20 October 2017; pp. 1–6. [Google Scholar] [CrossRef]
  22. Tkach, D.; Huang, H.; Kuiken, T.A. Study of stability of time-domain features for electromyographic pattern recognition. J. Neuroeng. Rehabil. 2010, 7, 1–13. [Google Scholar] [CrossRef]
  23. Zhang, Z.; Yang, K.; Qian, J.; Zhang, L. Real-Time Surface EMG Pattern Recognition for Hand Gestures Based on an Artificial Neural Network. Sensors 2019, 19, 3170. [Google Scholar] [CrossRef]
  24. Le, H.; Panhuis, M.I.H.; Spinks, G.M.; Alici, G. The effect of dataset size on EMG gesture recognition under diverse limb positions. In Proceedings of the 2024 10th IEEE RAS/EMBS International Conference for Biomedical Robotics and Biomechatronics (BioRob), Heidelberg, Germany, 1–4 September 2024; IEEE: New York, NY, USA, 2024; pp. 303–308. [Google Scholar] [CrossRef]
  25. Wei, H.; Nie, J.; Yang, H. Rapid Calibration of High-Performance Wavelength Selective Switches Based on the Few-Shot Transfer Learning. J. Light. Technol. 2025, 43, 6682–6689. [Google Scholar] [CrossRef]
  26. Simao, M.A.; Neto, P.; Gibaru, O. EMG-based Online Classification of Gestures with Recurrent Neural Networks. Pattern Recognit. Lett. 2019, 128, 45–51. [Google Scholar] [CrossRef]
  27. Xie, B.; Li, B.; Harland, A. Movement and Gesture Recognition Using Deep Learning and Wearable-sensor Technology. In Proceedings of the 2018 International Conference on Artificial Intelligence and Pattern Recognition, Beijing, China, 18–20 August 2018; pp. 26–31. [Google Scholar] [CrossRef]
  28. Zhang, Z.; He, C.; Yang, K. A novel surface electromyographic signal-based hand gesture prediction using a recurrent neural network. Sensors 2020, 20, 3994. [Google Scholar] [CrossRef]
  29. Abreu, J.G.; Teixeira, J.M.; Figueiredo, L.S.; Teichrieb, V. Evaluating Sign Language Recognition Using the Myo Armband. In Proceedings of the 18th Symposium on Virtual and Augmented Reality, SVR 2016, Gramado, Brazil, 21–24 June 2016; pp. 64–70. [Google Scholar] [CrossRef]
Figure 1. CSL prediction model system.
Figure 2. The MYO armband and its components.
Figure 3. The electrode position on the right arm: (a) front view; (b) back view.
Figure 4. The 20 recognized CSL gestures.
Figure 5. (a) Raw signal of one channel. (b) Preprocessed signal.
Figure 6. (a) Cut-out of the yellow box for the first interception. (b) Second interception of data in black box.
Figure 7. Confusion matrix showing gesture-level prediction accuracy, aggregated across all eight users and 21 CSL classes (rows represent true labels, columns predicted labels; diagonal indicates correct predictions).
Figure 8. Evaluation of the impact of training set size on training time and accuracy.
Figure 9. Overall processing timeline of gesture recognition.
Figure 10. Average activity time and response time.
Table 1. Main parameters of dataset collection.
Acquisition device | Myo armband
Channel number | 8
Sensor placement | Right forearm
Sampling frequency | 200 Hz
Subject number | 8
Male/Female | 4/4
Gestures | 21
Repetitions | 30
Sampling time of a repetition | 2 s
Training set : test set | 1:5
Table 2. The user classification accuracy and training time for different training set sizes.
Training Set Size | Subject 1: Training Time (ms) / Accuracy (%) | Subject 2: Training Time (ms) / Accuracy (%) | Subject 3: Training Time (ms) / Accuracy (%) | Subject 4: Training Time (ms) / Accuracy (%) | Mean Accuracy ± SD | 95% CI
20% | 33.30 / 70 | 8.68 / 17 | 28.50 / 29 | 30.47 / 62 | 0.44 ± 0.255 | [0.04, 0.85]
30% | 65.90 / 84 | 34.07 / 70 | 49.28 / 68 | 62.37 / 82 | 0.76 ± 0.082 | [0.63, 0.89]
40% | 109.24 / 90 | 62.86 / 88 | 82.87 / 79 | 104.12 / 85 | 0.86 ± 0.048 | [0.78, 0.93]
50% | 140.56 / 92 | 87.55 / 93 | 114.67 / 89 | 126.59 / 87 | 0.90 ± 0.027 | [0.86, 0.95]
60% | 181.15 / 94 | 113.56 / 94 | 141.31 / 90 | 162.16 / 88 | 0.91 ± 0.030 | [0.87, 0.96]
70% | 208.43 / 94 | 144.62 / 94 | 183.17 / 91 | 200.35 / 87 | 0.92 ± 0.033 | [0.86, 0.97]
80% | 246.87 / 93 | 163.53 / 94 | 201.04 / 89 | 251.05 / 88 | 0.91 ± 0.029 | [0.86, 0.96]
90% | 262.21 / 94 | 176.02 / 93 | 227.23 / 89 | 274.53 / 88 | 0.91 ± 0.029 | [0.86, 0.96]
100% | 349.89 / 94 | 220.91 / 93 | 260.99 / 89 | 298.82 / 88 | 0.91 ± 0.029 | [0.86, 0.96]
Table 3. Effectiveness of our proposed method compared with several representative works on sEMG-based gesture recognition.
Reference | Task Type | Sensor Setup | Model Type | Gestures | Accuracy | Real-Time Capable
Wu et al. [12] | ASL (80 words) | sEMG + IMU | SVM, RF | 80 | 92.00% | Yes
Savur et al. [13] | ASL (26 letters) | sEMG | SVM | 26 | 82.30% | Yes
Abreu et al. [29] | Brazilian SL | sEMG (MYO) | SVM | 20 | 87.00% | Yes
Simao et al. [26] | Generic gestures | sEMG | LSTM/GRU | 8 | 92.20% | Yes
Xie et al. [27] | Hand motions | sEMG (MYO) | CNN | 17 | 90.00% | Not specified
Zhang et al. [28] | Hand gestures | sEMG (MYO) | RNN | 21 | 89.60% | Yes
Our method (this work) | CSL (21 classes) | sEMG (MYO) | ANN | 21 | 91.50% | Yes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
