Low-Density sEMG-Based Pattern Recognition of Unrelated Movements Rejection for Wrist Joint Rehabilitation

sEMG-based pattern recognition commonly assumes a limited number of target categories, and the classifiers often predict each target category depending on probability. In wrist rehabilitation training, the patients may make movements that do not belong to the target category unconsciously. However, most pattern recognition methods can only identify limited patterns and are prone to be disturbed by abnormal movement, especially for wrist joint movements. To address the above the problem, a sEMG-based rejection method for unrelated movements is proposed to identify wrist joint unrelated movements using center loss. In this paper, the sEMG signal collected by the Myo armband is used as the input of the sEMG control method. First, the sEMG signal is processed by sliding signal window and image coding. Then, the CNN with center loss and softmax loss is used to describe the spatial information from the sEMG image to extract discriminative features and target movement recognition. Finally, the deep spatial information is used to train the AE to reject unrelated movements based on the reconstruction loss. The results show that the proposed method can realize the target movements recognition and reject unrelated movements with an F-score of 93.4% and a rejection accuracy of 95% when the recall is 0.9, which reveals the effectiveness of the proposed method.


Introduction
The loss of one side of upper limb mobility function and sensory information hampers the activities of daily living (ADL) for hemiplegic patients [1]. Many clinical studies have shown that the plasticity of the central nervous system can restore the function of the injured central nervous system. This remodeling can be strengthened and consolidated through continuous motor relearning to help the functional repair and reconstruction of the neuromuscular system. Therefore, appropriate rehabilitation training can help patients recover their motor function to a certain extent. The delay of intervention therapy will affect patients' affected side functional capability restoration and increase rehabilitation duration [2,3]. Bilateral rehabilitation training is considered to be an effective strategy for the rehabilitation of hemiplegic patients [4]. The maximum latency of 300ms was recommended for sEMG-based pattern recognition method [5]. The sEMG-based pattern recognition method mainly finds the internal information of multi-channel sEMG signals through feature learning, which assumes that the internal information of sEMG signals for the same movements are similar, but the internal information of sEMG signals for different movements are different [6]. Therefore, the upper limb intention recognition with high accuracy and reliability is an important part of the exoskeleton control system [7], especially for remote monitoring which needs a more efficient and intelligent framework to decode the subject's intention. sEMG signals can directly reflect the activation of superficial muscles  Accurate and reliable motion intention perception and prediction is the key to the exoskeleton control system, especially for remote monitoring. In this paper, the rejection of unrelated movements is considered to achieve rehabilitation training with high accuracy and reliability. The proposed method overview is shown in Figure 1b, which mainly includes sEMG image encoding, discriminative features extract module, and unrelated movements rejection module. The method proposed in this paper is based on the assumption that although the signals between the wrist joint target movements and unrelated movements are similar, these movements are activated by different muscle combinations. Therefore, the deep spatial features of the sEMG images extracted from these movements are also different. First, the sEMG signal is processed by a sliding signal window and image coding, and then CNN is used to extract deep discrimination features from these sEMG images. CNN is also used to predict the possible target category of the sample combined with the center loss and softmax with cross-entropy loss, and then the AE network is trained to reject samples higher than the set threshold by calculating reconstruction error. If the reconstruction error is below a threshold, it is considered that the sample belongs to the target movements, and the movements category is the prediction result of CNN; If the reconstruction error is above the threshold, the sample is considered as unrelated movements and rejected. These steps are introduced in detail below.

sEMG Dataset
The sEMG data contains two distinct sub-datasets in this paper, which are The NinaPro DB5 [29] dataset and NewMyo dataset. The NinaPro DB5 dataset, recorded with the Myo Armband, contains data from 10 able-bodied participants performing a total of 53 different movements (including neutral which denotes resting state) divided into three exercise sets. The NewMyo dataset is collected from two able-bodied subjects in this paper (one male and one female; right-hand; age from 22 to 32). The experimental content was introduced to each subject in detail, and each subject signed the informed consent. All the experimental procedures are approved by the Institutional Review Board (IRB) in the faculty of Engineering Kagawa University (Ref. No. 01-011 from February 2020), which follows the ethical principle of declaration of Helsinki.
The experimental scheme adopted in this paper is shown in Figure 2. sEMG signals are collected using the Myo armband (Thalmic Labs Inc. Kitchener, Ontario, Canada), which is a commercially available device with eight equidistance sEMG sensors that transfers the data through a Bluetooth low-energy connection to the computer. The Myo armband sensors distribution is shown in Figure 2a, in which the electrode with the LED light and Myo logo that shows the sync state is channel 4, followed by channel 5 in a clockwise direction and channel 3 in a counterclockwise direction. Limited by the number of Myo armbands, the red dotted line frame is the main muscle group of wrist joint motion, including brachioradialis, flexor carpi radialis, extensor carpi radialis, palmaris longus, flexor carpi ulnaris, and so on. The initial position of channel 4 is placed about 2 cm away from the right elbow joint, which is located in the brachioradialis muscle. The real-time sEMG data at 200 Hz can be acquired through the software development kit (SDK) of the Myo armband. The Myo armband includes a 50 Hz notch filter which eliminates the power-line interference (50 Hz) [30]. The subjects were asked to perform 12 different movements, including seven target movements and five rejection movements which are highly similar to the sample signal of the target movements, as shown in Figure 2b,c. Seven target movements include wrist pronation (T1)/supination (T2) (axis: little finger), wrist extension (T3)/flexion (T4), wrist radial deviation (T5)/ulnar deviation (T6), and neutral. The five rejection movements include the abduction of all fingers (R1), fingers flexed together in a fist (R2), wrist pronation (R3)/supination (R4) (axis: middle finger), and wrist extension with closed hand (R5). The data collection scheme strictly replicated the scheme from Ninapro DB5. The subjects were asked to perform the movements with the right hand for six repetitions. Each movement repetition lasted 5 s and was followed by 3 s of rest to avoid muscular and mental fatigue of the subjects. wrist extension (T3)/flexion (T4), wrist radial deviation (T5)/ulnar deviation (T6), and neutral. The five rejection movements include the abduction of all fingers (R1), fingers flexed together in a fist (R2), wrist pronation (R3)/supination (R4) (axis: middle finger), and wrist extension with closed hand (R5). The data collection scheme strictly replicated the scheme from Ninapro DB5. The subjects were asked to perform the movements with the right hand for six repetitions. Each movement repetition lasted 5 s and was followed by 3 s of rest to avoid muscular and mental fatigue of the subjects.  For the NinaPro DB5 dataset, the second exercise set is of particular interest to this paper. The first eight-channel sEMG signals of seven target movement signals and five unrelated movement signals related to this study are obtained as shown in Figure 2b,c. The purpose of this paper is to achieve wrist target movements classification and unrelated movements rejection based on the low-density sEMG signals. Considering the robustness of the model, the sEMG signals of each target motion for 12 recruiters are randomly sampled in the proportion of 60%, 20%, and 20% to form the training dataset, verification dataset, and test dataset depending on target motion categories. The training dataset is used to train the model. The validation dataset is used to calculate the average reconstruction error of target movement samples. The test data consists of the test dataset and all unrelated movement samples, which are used to evaluate the performance of the model proposed in this paper.

Data Segmentation and sEMG Image Encoding
The sEMG signal is collected through the Myo armband in this paper, which can be regarded as a low-density array signal, including additional spatial information. CNN is good at extracting spatial information. Therefore, this procedure preprocesses low-density sEMG signals into images. First, the sliding windows are used to segment the multichannel sEMG signals into a series of overlapped analysis windows. Limited by real-time requirements, the maximum latency of 300 ms was recommended in [31]. Therefore, the window length and window step are set as 250 ms and 100 ms, respectively. Therefore, sEMG is segmented into a series of sEMG window matrices of size 50 × 8. Then, the amplitude mean and standard deviation of all the signals in the neutral state are calculated, and the amplitude threshold method is used to filter each signal window signal to obtain the active segment signal of the wrist joint. The threshold is defined as the sum of the mean and three times the standard deviation of the neutral signal, as shown in the Equation (1).
where Threshold is the boundary threshold between the active segment and the neutral segment, which is used to obtain signals of the muscle active segment. If the amplitude of a window signal is above the Threshold, it can be considered as a muscle activity state. The µ static and σ static denote the expectation and standard deviation of all neutral signals. Then three time-domain (TD) features are extracted for the signals of muscle activity state, that is, root mean square (RMS), mean absolute value (MAV), and waveform length (WL). They are calculated with Equations (2)-(4). The RMS, MAV, and WL are the most commonly used TD features in motion pattern recognition [5,7,10,12,30], and they are closely associated with the amplitude of sEMG, which can directly reflect the intensity of muscle contraction with low computational complexity. The studies [7,28,32] also showed that the combination of TD features and a small-scale CNN could reach satisfactory control performance. Therefore, the CNN combined with three time-domain features is applied in this paper with the advantages of simple network structure, less computation, and excellent results.
where x n denotes the value of the n th point for each window signal. Therefore, each window signal belonging to muscle activity is transformed into a 3 × 53 × 8 feature matrix as shown in Figure 3, which can directly represent the state of muscle activity.
where x denotes the value of the n point for each window signal. Therefore, each window signal belonging to muscle activity is transformed into a 3 × 53 × 8 feature matrix as shown in Figure 3, which can directly represent the state of muscle activity.

The Discriminative Feature Extraction Network
Many studies [6,15,[24][25][26][27][28][29][30] have verified the effectiveness of the convolution neural network for mining the internal information of sEMG signal. In this procedure, CNN is used to extract discriminative features from low-density sEMG images. The input of CNN is the sEMG feature images. This module makes two contributions: one is to provide AE with the deep discriminative features of sEMG signals, and the other is to identify target movements. The architecture of the CNN developed in this paper is shown in Figure 4. This CNN contains six layers, including three convolutional layers, two fully-connected layers, and an output layer. Three 2D convolution kernels (16 × 2 filter with a stride of 2 and a padding of 0, 8 × 3 filter with a stride of (2,1) and a padding of (2,1), and 3 × 3 filter with a stride of 1 and a padding of (1,1)) are used to process the sEMG images for the first three convolution layers in turn, and the convolution layer activation function is LeakyReLU activation. During training, the dropout rate between Layer4 and Layer5 is set to 20% to prevent model overfitting. The adaptive moment estimation (Adam) algorithm is adopted with the batch size set to 32 to train the network. We optimized the network using the training samples for 200 epochs with a learning rate of 0.0001. The trained CNN is used for extracting discriminant features, and the parameters are fixed during the training of the AE. The center loss and softmax loss are applied to optimize the network, which makes the deep characteristics of the fifth layer output of the network discriminative. Generally, only softmax loss is used for CNN model training, as shown in Equation (5).

The Discriminative Feature Extraction Network
Many studies [6,15,[24][25][26][27][28][29][30] have verified the effectiveness of the convolution neural network for mining the internal information of sEMG signal. In this procedure, CNN is used to extract discriminative features from low-density sEMG images. The input of CNN is the sEMG feature images. This module makes two contributions: one is to provide AE with the deep discriminative features of sEMG signals, and the other is to identify target movements. The architecture of the CNN developed in this paper is shown in Figure 4. This CNN contains six layers, including three convolutional layers, two fully-connected layers, and an output layer. Three 2D convolution kernels (16 × 2 filter with a stride of 2 and a padding of 0, 8 × 3 filter with a stride of (2, 1) and a padding of (2, 1), and 3 × 3 filter with a stride of 1 and a padding of (1, 1)) are used to process the sEMG images for the first three convolution layers in turn, and the convolution layer activation function is LeakyReLU activation. During training, the dropout rate between Layer4 and Layer5 is set to 20% to prevent model overfitting. The adaptive moment estimation (Adam) algorithm is adopted with the batch size set to 32 to train the network. We optimized the network using the training samples for 200 epochs with a learning rate of 0.0001. The trained CNN is used for extracting discriminant features, and the parameters are fixed during the training of the AE. The center loss and softmax loss are applied to optimize the network, which makes the deep characteristics of the fifth layer output of the network discriminative. Generally, only softmax loss is used for CNN model training, as shown in Equation (5).
where x i ∈ R d , and y i denote deep features and label of the i th record, respectively. w m is the m th column of the parameters W = [w 1 , w 2 . . . , w m ] ∈ R d×m . b ∈ R c is the bias. N and M denote the length of mini-batch samples and the number of categories, respectively. The softmax with cross-entropy loss is good at learning inter-class information, which describes the distance of two probability distributions [33]. For the multi-classification task, softmax activation can map the deep features to a category probability distribution, and cross-entropy can describe the loss between real label value and predicted probability.
The lower the softmax with cross-entropy loss, the better separability of deep features. However, the CNN trained only with softmax loss (CNN-AE-S) is more interested in the accuracy of the prediction probability for correct labels while ignoring the differences of other incorrect labels, resulting in the scattered distribution of learned features. That is, the optimization of intra-class distance is relatively weak, resulting in the poor discriminant performance of the model. In general, especially for a very similar sample distribution, small intra-class differences and large inter-class differences need to be considered for model training. Therefore, inspired by A. Farzaneh et al. [33], the center loss combined with the softmax loss is used to train the network. According to [33], the center loss can be calculated as shown in Equations (6)- (10).
where c y i ∈ R d is the y i th class center of deep features. c y i will be updated with the change of deep features. Therefore, it will be inefficient and even impractical to average the features of each class in each iteration for all training datasets. Therefore, in this paper, c y i is updated using a mini-batch sample for each iteration. And considering mislabeling samples caused by large disturbances, the scalars α (restricted in [0, 1]) are used to control the learning rate of center c y i . Therefore, the gradients of L center for x i and the update of c y i are defined as: When y i is different from j of c j (that is, δ(condition) = 0), c j will not be updated. Conversely, δ(condition) = 1 if y i is the same as j of c j , c j will be updated. Therefore, the loss function in this paper is defined as weighted the softmax loss and center loss as shown in Equation (10). The scalar λ is used to balance two loss functions. The softmax loss is used to increase inter-class spacing, and center loss is used to reduce intra-class spacing, thus making the deep features more discriminative.

Unrelated Movements Rejection Module
In this part, the AE network is trained to reject any samples that do not belong to the target movements by calculating reconstruction error. The CNN model developed in this paper can recognize target movements well, but it does not recognize the performance to reject unrelated movements. The AE network framework adopted by the proposed method is shown in Figure 5. The network adopts a mirror architecture, which can be divided into an encoder and a decoder. The encoder compresses the input into a low dimensional feature space, which only retains the main mode of the input data, and then the decoder can recover the raw data. If the distribution of the input and training data is different, then there are obvious differences in the output of the encoder, and the decoder is unable to recover the input data causing large reconstruction errors. The mean square error (MSE) loss is used to train AE. The input of AE is the layer5 output (512 × 1) of the CNN network on the training dataset of the target movement samples. The Adam algorithm with a mini-batch (set to 32) is adopted to optimize the parameters in the network. The learning rate is set as 0.0001 with the 1000 epochs to train the AE network. To test the discriminant network, the Pearson correlation(PC) [16] is applied to measure the difference between the input data and the reconstructed data as shown in Equation (11).
where x and y represent the j dimension values of the i input vector x and the reconstructed vector y , respectively, and x and y denote average of x and y , respectively. This distance can normalize the high-dimensional vector and calculate the difference. When an unknown sample is input into the model trained in this paper, the discriminative features of the sample are extracted through the CNN, and then these features are input into the trained AE network to calculate the reconstruction error. If the reconstruction error is below the threshold, it is considered that the sample belongs to the target movements, and the motion category is predicted by the CNN. If the reconstruction error is above the threshold, the sample is considered as unrelated movements and rejected. Therefore, it is necessary to select an appropriate reconstruction error threshold to distinguish between target movements and denial movements. Therefore, a prior recall factor is defined, and when the mean reconstruction error on the verification dataset of the target samples reaches the prior recall, this reconstruction error is recorded. Finally, this value is defined as the threshold to distinguish between movements and unrelated movements.

Unrelated Movements Rejection Module
In this part, the AE network is trained to reject any samples that do not belong to the target movements by calculating reconstruction error. The CNN model developed in this paper can recognize target movements well, but it does not recognize the performance to reject unrelated movements. The AE network framework adopted by the proposed method is shown in Figure 5. The network adopts a mirror architecture, which can be divided into an encoder and a decoder. The encoder compresses the input into a low dimensional feature space, which only retains the main mode of the input data, and then the decoder can recover the raw data. If the distribution of the input and training data is different, then there are obvious differences in the output of the encoder, and the decoder is unable to recover the input data causing large reconstruction errors. The mean square error (MSE) loss is used to train AE. The input of AE is the layer5 output (512 × 1) of the CNN network on the training dataset of the target movement samples. The Adam algorithm with a mini-batch (set to 32) is adopted to optimize the parameters in the network. The learning rate is set as 0.0001 with the 1000 epochs to train the AE network. To test the discriminant network, the Pearson correlation (PC) [16] is applied to measure the difference between the input data and the reconstructed data as shown in Equation (11).
where x ij and y ij represent the j th dimension values of the i th input vector x i and the reconstructed vector y i , respectively, and x ij and y ij denote average of x i and y i , respectively. This distance can normalize the high-dimensional vector and calculate the difference. When an unknown sample is input into the model trained in this paper, the discriminative features of the sample are extracted through the CNN, and then these features are input into the trained AE network to calculate the reconstruction error. If the reconstruction error is below the threshold, it is considered that the sample belongs to the target movements, and the motion category is predicted by the CNN. If the reconstruction error is above the threshold, the sample is considered as unrelated movements and rejected. Therefore, it is necessary to select an appropriate reconstruction error threshold to distinguish between target movements and denial movements. Therefore, a prior recall factor is defined, and when the mean reconstruction error on the verification dataset of the target samples reaches the prior recall, this reconstruction error is recorded. Finally, this value is defined as the threshold to distinguish between movements and unrelated movements. Micromachines 2023, 14, x FOR PEER REVIEW 10 of 21 Figure 5. The architecture of the discriminant network using multiple autoencoders.

Evaluation Criteria
In this paper, accuracy (Acc) [5], F-score, and receiver operating characteristic (ROC) curve are used to verify the performance of the proposed method. The corresponding equations are defined as follows.
where the subscript j represents the index of different movements, j = 1, 2, … , 7 correspond to the T1-T6 and neural movements, respectively. As shown in Equations (12)-(16), TP represents true positive, FP represents false positive, TN represents true negative, and FN represents false negative. The curve (ROC) indicates that the model is better at distinguishing unrelated samples. It is a binary classification problem to judge the target samples and unrelated samples. False positive rate (FPR) represents the probability of the target movement samples (FP) accounting for among all the samples whose prediction results are unrelated movement samples; true positive rate (TPR) represents the probability of the target movement samples predicted by model accounts for all target movement samples. When TPR is 1 and FPR is 0, the accuracy of the model for secondary classification reaches 100%. The area under the curve (AUC) further quantifies the performance of the model represented by the ROC curve. The AUC value is in the range of 0-1. Therefore, the larger the AUC value, the higher the overall performance of the model.

Experiment results and Discussion
In this section, the model obtained through the proposed method is presented and discussed. To verify the effectiveness of center loss, the visualization of deep features is provided on the training dataset and we also discussed the influence of hyperparameter

Evaluation Criteria
In this paper, accuracy (Acc) [5], F-score, and receiver operating characteristic (ROC) curve are used to verify the performance of the proposed method. The corresponding equations are defined as follows.
where the subscript j represents the index of different movements, j = 1, 2, . . . , 7 correspond to the T1-T6 and neural movements, respectively. As shown in Equations (12)-(16), TP represents true positive, FP represents false positive, TN represents true negative, and FN represents false negative. The curve (ROC) indicates that the model is better at distinguishing unrelated samples. It is a binary classification problem to judge the target samples and unrelated samples. False positive rate (FPR) represents the probability of the target movement samples (FP) accounting for among all the samples whose prediction results are unrelated movement samples; true positive rate (TPR) represents the probability of the target movement samples predicted by model accounts for all target movement samples. When TPR is 1 and FPR is 0, the accuracy of the model for secondary classification reaches 100%. The area under the curve (AUC) further quantifies the performance of the model represented by the ROC curve. The AUC value is in the range of 0-1. Therefore, the larger the AUC value, the higher the overall performance of the model.

Experiment Results and Discussion
In this section, the model obtained through the proposed method is presented and discussed. To verify the effectiveness of center loss, the visualization of deep features is provided on the training dataset and we also discussed the influence of hyperparameter λ on the model. To verify the performance of the model, the reconstruction error distribution on the validation dataset and the reconstruction error distribution of all unrelated movement samples are provided. Different evaluation indicators are adopted for quantitative evaluation to compare the performance of different methods.

Influence of Different Hyper-Parameter λ
To verify that the center loss function is available for optimizing deep features to make them separable and discriminative, the t-distributed stochastic neighbor embedding (t-SNE) [34] is applied to reveal the distributions of the deep features with different λ values, as shown in Figure 6a-i. Figure 6a shows the distributions of deep features trained only with the softmax loss using the t-SNE visualization method. Although these deep features have good separability in high-dimensions, the intra-class cluster capability of samples is inadequate and even overlaps. Therefore, these features have poor discrimination and may be unavailable and difficult for the rejection of unrelated movements. As shown in Figure 6b-i, the two-dimensional deep features distribution of training samples is shown when λ is 5 × 10 −6 , 1 × 10 −5 , 1.5 × 10 −5 , 2 × 10 −5 , 1 × 10 −4 , 2.5 × 10 −4 , 5.0 × 10 −4 , and 1 × 10 −3 , respectively. The hyperparameter λ impacts the model performance by influencing the distributions of the deep features. It can be seen that the larger the λ value, the smaller the intra-class spacing, that is, the more compact the deep features of each category are extracted. The joint supervision loss is used to train CNN, which reduces the inter-class spacing of deep features for samples, and the intra-class spacing is also effectively reduced simultaneously, which shows the effectiveness of center loss for the distribution of deep features where λ denotes the weight of center loss. Figure 6a-i shows that the deep feature distribution is affected by the hyperparameter λ, which is crucial to the rejection performance of unrelated movements.
As shown in Figure 7, the mean F-score and mean accuracy of all subjects of the training model are calculated with λ value defined as 5 × 10 −6 , 1 × 10 −5 , 1.5 × 10 −5 , 2 × 10 −5 , 1 × 10 −4 , 2.5 × 10 −4 , 5.0 × 10 −4 , and 1 × 10 −3 , respectively. As shown in Figure 7a,b, the green curve indicates the results of the mean f-score and accuracy when the recall is defined as 0.9, and the blue curve indicates the experimental results of the mean f-score and accuracy when the recall is defined as 0.85. When λ = 0, only the softmax loss is considered to train the model. The mean F-score and mean accuracy are 0.825 and 0.822, corresponding (recall = 0.9) with the worst performance. When λ > 0 (considering the center loss), the model performance is improved, and the model performance will change according to the value of λ. In addition, the increase of λ improves the rejection performance of the model, but the mean f-score and accuracy are reduced with the increase of λ. This is due to the large λ that may greatly compress the intra-class spacing, and CNN is not sensitive to the small differences between samples. Therefore, an appropriate λ is important for deep feature extraction and rejection of unrelated motions, which should be defined according to the practical application. In this paper, λ is defined as 0.00025, which shows better the performance of the model. the λ value, the smaller the intra-class spacing, that is, the more compact the deep features of each category are extracted. The joint supervision loss is used to train CNN, which reduces the inter-class spacing of deep features for samples, and the intra-class spacing is also effectively reduced simultaneously, which shows the effectiveness of center loss for the distribution of deep features where λ denotes the weight of center loss. Figure 6a-i shows that the deep feature distribution is affected by the hyperparameter λ, which is crucial to the rejection performance of unrelated movements. As shown in Figure 7, the mean F-score and mean accuracy of all subjects of the training model are calculated with λ value defined as 5 × 10 , 1 × 10 , 1.5 × 10 , 2 × 10 , 1 × 10 , 2.5 × 10 , 5.0 × 10 , and 1 × 10 , respectively. As shown in Figure 7a,b, the green curve indicates the results of the mean f-score and accuracy when the recall is defined as 0.9, and the blue curve indicates the experimental results of the mean f-score and accuracy when the recall is defined as 0.85. When λ = 0, only the softmax loss is considered to train the model. The mean F-score and mean accuracy are 0.825 and 0.822, corresponding (recall = 0.9) with the worst performance. When λ 0 (considering the center loss), the model performance is improved, and the model performance will change according to the value of λ. In addition, the increase of λ improves the rejection performance of the model, but the mean f-score and accuracy are reduced with the increase of λ. This is due to the large λ that may greatly compress the intra-class spacing, and CNN according to the value of λ. In addition, the increase of λ improves the rejection perfor-mance of the model, but the mean f-score and accuracy are reduced with the increase of λ. This is due to the large λ that may greatly compress the intra-class spacing, and CNN is not sensitive to the small differences between samples. Therefore, an appropriate λ is important for deep feature extraction and rejection of unrelated motions, which should be defined according to the practical application. In this paper, λ is defined as 0.00025, which shows better the performance of the model.

Performance of the Trained Model
In this section, the threshold is used to distinguish the target movements from the unrelated movements, and the threshold is determined according to the reconstruction error. Specifically, first a recall factor value for all target movements is preset, then the numerical value is recorded when the reconstruction error of all target movement classes in the verification set reaches the recall rate, and this value is regarded as the rejection threshold of each class. The ROC curve is used to estimate the performance of the proposed method for the target movements and rejection movements under different thresholds. A larger recall factor will make the threshold of all target movements categories larger, and more reconstruction errors in target movements samples are lower than the threshold. The smaller recall factor will reduce the threshold value of the target movements category, inevitably reducing the target movements recognition performance, but it will strengthen the rejection performance of unrelated movements. The reconstruction error of the target motion is relatively small, whereas the reconstruction error of the unrelated motion samples is significantly higher than that of the target category samples. As shown in Figure 8a, the reconstruction loss distribution is described of target movements on the validation dataset, and Figure 8b shows the reconstruction loss distribution of all unrelated movements in the test dataset by the trained model (λ = 0.00025). When the recall of the validation dataset achieves 0.80, 0.85, and 0.90, the corresponding thresholds are obtained, which are 0.0170, 0.0194, and 0.0235, respectively. As shown in Figure 8a,b, it is apparent that most of the sample reconstruction errors in the verification dataset are distributed on the left side of the threshold, whereas most of the reconstruction errors of unrelated samples are distributed on the right side of the threshold. Therefore, there is a threshold to distinguish target movements and unrelated movements. unrelated movements in the test dataset by the trained model (λ = 0.00025). When the recall of the validation dataset achieves 0.80, 0.85, and 0.90, the corresponding thresholds are obtained, which are 0.0170, 0.0194, and 0.0235, respectively. As shown in Figure 8a,b, it is apparent that most of the sample reconstruction errors in the verification dataset are distributed on the left side of the threshold, whereas most of the reconstruction errors of unrelated samples are distributed on the right side of the threshold. Therefore, there is a threshold to distinguish target movements and unrelated movements.  Figure 8c,d show the PC distribution for each movement, that is, the reconstruction error distribution of each category sample after the trained model on the test dataset when the recall of the model is 0.85 and 0.9, respectively. The blue dotted line indicates the rejection performance of the unrelated movements, which are 98.3% and 95.0%. Therefore, an appropriate threshold can effectively segment the target and unrelated samples.
As shown in Figure 9, The true positive rate (TPR) and false positive rate (FPR) are obtained under different thresholds, and their ROC curves are described. The proposed method reaches the maximum AUC value of 94.3%, which is-higher than that from the CNN-AE with softmax loss (the AUC value of 92.1%) when the recall is 0.9. Therefore, there is a threshold that can obtain the highest TPR while keeping the FPR very low, making the target movements more distinguishable from unrelated movements. In addition, the discriminative deep features combined with simple AE can achieve rejection performance. According to Figures 6 and 8, the center loss can compact intra-class variations and enlarge inter-class differences to improve rejection performance.  Figure 8c,d show the PC distribution for each movement, that is, the reconstruction error distribution of each category sample after the trained model on the test dataset when the recall of the model is 0.85 and 0.9, respectively. The blue dotted line indicates the rejection performance of the unrelated movements, which are 98.3% and 95.0%. Therefore, an appropriate threshold can effectively segment the target and unrelated samples.
As shown in Figure 9, The true positive rate (TPR) and false positive rate (FPR) are obtained under different thresholds, and their ROC curves are described. The proposed method reaches the maximum AUC value of 94.3%, which is-higher than that from the CNN-AE with softmax loss (the AUC value of 92.1%) when the recall is 0.9. Therefore, there is a threshold that can obtain the highest TPR while keeping the FPR very low, making the target movements more distinguishable from unrelated movements. In addition, the discriminative deep features combined with simple AE can achieve rejection performance. According to Figures 6 and 8, the center loss can compact intra-class variations and enlarge inter-class differences to improve rejection performance. method reaches the maximum AUC value of 94.3%, which is-higher than that from the CNN-AE with softmax loss (the AUC value of 92.1%) when the recall is 0.9. Therefore, there is a threshold that can obtain the highest TPR while keeping the FPR very low, making the target movements more distinguishable from unrelated movements. In addition, the discriminative deep features combined with simple AE can achieve rejection performance. According to Figures 6 and 8, the center loss can compact intra-class variations and enlarge inter-class differences to improve rejection performance.   Figure 10 shows the confusion matrices by the proposed method (λ = 0.00025) and CNN-AE-S (λ = 0) when the recall factor is defined as 0.85 and 0.9, respectively. The confusion matrices averaged over all subjects with the proposed method on the test dataset are shown in Figure 10a,b. When the recall is 0.9, the mean classification accuracy is 92% and the rejection performance of unrelated motions reached 95% by the CNN-AE-SC method, which is higher than that (82.2% and 71.6%) from the CNN-AE-S. When the recall is 0.85, the mean classification accuracy is 92% and the rejection performance of unrelated motions reached 95% by the CNN-AE-SC method, which is higher than that (82.2% and 71.6%) from the CNN-AE-S. According to Figure 10, for the model trained with the CNN-AE-SC method, all misclassifications of target motions (A1-A6) are determined to be the unrelated pattern (R1-R5) by mistake, which leads to no response of the control system. However, for the model trained with the CNN-AE-S, there are some misclassifications of target movements determined to other target movements by mistake, which leads to abnormal movement of the control system because of the wrong recognition.
To further compare the performance of the proposed methods, Table 1 reveals the mean recognition accuracies and rejection performance for each movement when the recall is defined as 0.8, 0.85, and 0.9. As shown in Table 1, although the traditional machine learning methods of SVM and LDA achieve the highest recognition performance (96.9% and 86.1%, respectively) for target motions, they cannot reject any unrelated samples and they finally misclassify them as target motion tasks. However, other methods are all capable of rejecting unrelated samples depending on the recall factor on the verification dataset. These rejections also affects the ability of the model to recognize the target samples. When the recall factor is set to 0.8, that is, the model has a poor ability to recognize target action, the recognition accuracies of target motion obtained by the LDA-MD, CNN-AE-S, CNN-AE-SC all drop to about 80%, whereas the rejection performance of unrelated motion are 84.0%, 84.3%, 99.7% by these methods, respectively. It is worth noting that the rejection performance of unrelated motion by the CNN-AE-SC is obviously better than the other two methods. Then, when the recall factor is increased to 0.85, the recognition performance of all methods is correspondingly improved to about 85%, whereas the average rejection performance of unrelated movements obtained by the LDA-MD (72.7%), CNN-AE-S (79.3%), and CNN-AE-SC (98.3%) methods slightly drop. The rejection performance of unrelated motion by the CNN-AE-SC method is obviously better than the other two methods. Finally, when the recall factor is defined as 0.9, the recognition performance of all methods for the target movements are correspondingly improved to about 90%, whereas the average rejection performance of unrelated movements by the LDA-MD (59.0%), CNN-AE-S (71.7%), CNN-AE-SC (95.0%) methods drop sharply. it can be seen that with the increase of the recall factor, the proposed method could always maintain better rejection ability for unrelated movements. Therefore, the proposed method is available, and its performance is significantly better than the other two methods, which could be applied to the real-time control of the upper limb exoskeleton tele-rehabilitation robot system. However, all of the above processing flows are performed offline, but how to apply the research framework in this study to the online stage to control the exoskeleton device is a consideration in our future work.
Micromachines 2023, 14, x FOR PEER REVIEW 15 of 21 Figure 10 shows the confusion matrices by the proposed method (λ = 0.00025) and CNN-AE-S (λ = 0) when the recall factor is defined as 0.85 and 0.9, respectively. The confusion matrices averaged over all subjects with the proposed method on the test dataset are shown in Figure 10a,b. When the recall is 0.9, the mean classification accuracy is 92% and the rejection performance of unrelated motions reached 95% by the CNN-AE-SC method, which is higher than that (82.2% and 71.6%) from the CNN-AE-S. When the recall is 0.85, the mean classification accuracy is 92% and the rejection performance of unrelated motions reached 95% by the CNN-AE-SC method, which is higher than that (82.2% and 71.6%) from the CNN-AE-S. According to Figure 10, for the model trained with the CNN-AE-SC method, all misclassifications of target motions (A1-A6) are determined to be the unrelated pattern (R1-R5) by mistake, which leads to no response of the control system. However, for the model trained with the CNN-AE-S, there are some misclassifications of target movements determined to other target movements by mistake, which leads to abnormal movement of the control system because of the wrong recognition. To further compare the performance of the proposed methods, Table 1 reveals the mean recognition accuracies and rejection performance for each movement when the recall is defined as 0.8, 0.85, and 0.9. As shown in Table 1, although the traditional machine learning methods of SVM and LDA achieve the highest recognition performance (96.9% and 86.1%, respectively) for target motions, they cannot reject any unrelated samples and they finally misclassify them as target motion tasks. However, other methods are all capable of rejecting unrelated samples depending on the recall factor on the verification dataset. These rejections also affects the ability of the model to recognize the target samples.  The number of samples in different reconstruction error intervals are shown in Figure 11a,b. The reconstruction loss distribution is achieved by the CNN-AE-S method in the verification dataset and unrelated samples. Compared with Figure 8a,b, although some target movement sample reconstruction errors and unrelated movement sample reconstruction errors can be correctly distinguished, many sample reconstruction errors are wrongly distinguished, indicating that samples are wrongly judged. When the recall coefficient is defined as 0.8, that is, the discrimination threshold is small (threshold = 0.0170, λ = 0.00025), the average recognition accuracy of target samples by the CNN-AE-S method and CNN-AE-SC method drops to about 80%. The average accuracy of rejection of unrelated samples by CNN-AE-S methods is 84.3% with an average F-score of 83.7%. The average rejection accuracy is 99.7% for unrelated samples by the CNN-AE-SC method proposed in this paper, with an average F-score of 92.8%. When the recall is defined as 0.9, that is, the discrimination threshold is large (threshold = 0.0235, λ = 0.00025). The average recognition accuracy of the two methods for target samples correspondingly increases to about 90%, but the rejection performance of the CNN-AE-S method for unrelated samples drops sharply to 71.7% with an average F-score of 83.8%. The proposed method still has a high rejection performance with an average accuracy of 95.0% and an average F-score of 93.7%. Therefore, the CNN-AE-SC method is superior to the CNN-AE-S method. Although the recognition accuracy of the model for target samples is improved with the increase in recall rate, the rejection performance for unrelated samples will decrease accordingly. However, the model trained with the proposed method has a higher recognition accuracy (90.0%) for target samples and a higher rejection performance (95.0%) for unrelated samples.
The sEMG signal is very sensitive to muscle contraction, and small unrelated movements can introduce interference to the sEMG control system. The unrelated movements will cause the system to execute wrong instructions, which may cause safety accidents while reducing user experience. As shown in Table 2, some comparison results of different methods are summarized. In summary, the existing solutions could be divided into domain-based methods [26] and refactoring-based methods [28], etc.. However, the above work of rejecting irrelevant actions often assumes that the irrelevant actions and target actions are significantly different, so a relatively simple method can be used to distinguish them. Q. Ding [26] constructed an adaptive incremental hybrid classifier by combining oneclass support vector data description and multi-class LDA with an average target motions recognition rate of 91.7% and unrelated motions rejection rate of about 90.0%. However, in actual application situations, unrelated actions are unpredictable and often very similar to target actions. Y. Zhang [28], based on a high-density EMG signal, proposed a myoelectric control method to alleviate the interference brought by novel motion tasks with an average target motions recognition rate of 88.6% and the unrelated motions rejection rate of 96.4%. To resolve the above problems, this paper proposes a method for rejecting unrelated motion based on low-density EMG signals that can be used for portable rehabilitation of the wrist joint at home. The results show that the average recognition rate of the method proposed in this paper for the target motion is about 90.0%, and the rejection rate for irrelevant motion can reach 95.0%. Compared with [26] and [28], this method realizes the unrelated motion rejection based on low-density sEMG signals when the unrelated motion and target motion signals are very similar. The sEMG signal is very sensitive to muscle contraction, and small unrelated movements can introduce interference to the sEMG control system. The unrelated movements will cause the system to execute wrong instructions, which may cause safety accidents while reducing user experience. As shown in Table 2, some comparison results of different methods are summarized. In summary, the existing solutions could be divided into domain-based methods [26] and refactoring-based methods [28], etc.. However, the above work of rejecting irrelevant actions often assumes that the irrelevant actions and target actions are significantly different, so a relatively simple method can be used to distinguish them. Q. Ding [26] constructed an adaptive incremental hybrid classifier by combining one-class support vector data description and multi-class LDA with an average target motions recognition rate of 91.7% and unrelated motions rejection rate of about 90.0%. However, in actual application situations, unrelated actions are unpredictable and often very similar to target actions. Y. Zhang [28], based on a high-density EMG signal, proposed a myoelectric control method to alleviate the interference brought by novel motion tasks with an average target motions recognition rate of 88.6% and the unrelated motions rejection rate of 96.4%. To resolve the above problems, this paper proposes a method for rejecting unrelated motion based on low-density EMG signals that can be used for portable rehabilitation of the wrist joint at home. The results show that the average recognition rate of the method proposed in this paper for the target motion is about 90.0%, and the rejection rate for irrelevant motion can reach 95.0%. Compared with [26] and [28], this method realizes the unrelated motion rejection based on low-density sEMG signals when the unrelated motion and target motion signals are very similar.

Conclusions
The performance of rejecting unrelated samples is very important for the sEMGcontrolled exoskeleton rehabilitation robot. In this paper, a sEMG-based pattern recognition method for wrist rehabilitation is proposed to improve the anti-interference performance of unrelated wrist movements. First, the method uses the idea of metric learning to extract deep distinguishing features from low-density sEMG feature images. Then the automatic encoder is used to calculate the reconstruction error to distinguish the unrelated samples and target samples. Compared with only using softmax loss to train the model, the proposed method enables the model to have a higher recognition accuracy (90.0%) for target samples with a higher rejection performance (95.0%) for unrelated samples, which shows that the proposed method is a viable method to improve the robustness of sEMG-based pattern recognition for exoskeleton rehabilitation robots. In the future, we will further consider the influence of the inter-subject difference of sEMG to model performance and apply it to the real-time control of the upper limb exoskeleton rehabilitation robot system.  Informed Consent Statement: Written informed consent has been obtained from the subjects to publish this paper.
Data Availability Statement: Not applicable.