Design of Mirror Therapy System Based on Multi-Channel Surface-Electromyography Signal Pattern Recognition and Mobile Augmented Reality

Abstract: Numerous studies have shown that mirror therapy can make rehabilitation more effective for hemiparesis following a stroke. Using surface electromyography (SEMG) to predict gestures is an important subject in related research areas, including rehabilitation medicine, sports medicine, and prosthetic control. However, current signal analysis methods still fail to recognize multimode motion reliably because the physiological signal is weak and the signal-to-noise ratio is low. In this paper, a mirror therapy system based on multi-channel SEMG signal pattern recognition and mobile augmented reality is studied. A wavelet transform method is designed to mitigate the noise, and the spectrogram obtained by analyzing the electromyography signals is used as an image. Two approaches, a Convolutional Neural Network (CNN) and a grid-optimized Support Vector Machine (SVM), are designed to classify the SEMG of different gestures. Mobile augmented reality provides a virtual hand movement in the real environment to perform the mirror therapy process. The experimental results show that the overall accuracy of the SVM is 93.07%, and that of the CNN is up to 97.8%.


Introduction
According to incomplete statistics, strokes cause about 1.5 million deaths each year in China, and about 2 million new patients suffer a stroke annually [1]. The biggest problem of stroke lies in its physical impact: most survivors of a stroke show some kind of limb weakness [2]. Research and clinical studies indicate that timely and effective rehabilitation training can restore limb movement to a certain degree in stroke patients [3]. The main rehabilitation method for stroke patients is hand-function training, including the ability to pinch, to grab, and to catch [4].
In traditional rehabilitation, patients train under the guidance of therapists; however, these methods provide limited feedback and motivation. In contrast, mirror therapy is characterized by the ability to map other people's movements directly in the observer's brain; its mechanism aims to activate contralateral mirror neurons, which fire both when a person performs an action and when the person observes others performing it [5]. Research has shown that mirror therapy benefits stroke patients [2], and studies showed that task-oriented motor training was able to improve the impaired limbs [18]. Being unsystematic and difficult to conduct, traditional manual training is often unsatisfactory and fails to meet patients' requirements. Patients with an amputated limb may feel that the limb is still there, and these phantom pain sensations make them suffer. In mirror therapy, a person uses his or her intact limb to represent the disabled limb in a mirror, so that the pain is relieved or the motor function is trained.
The schematic diagram of the mirror therapy system based on multi-channel SEMG signal pattern recognition and mobile augmented reality is shown in Figure 1. A wearable device is employed as the gesture sensor to collect the SEMG signal data. In the training mode, the training data is transmitted to a personal computer; after pre-processing and analysis, a CNN is trained as the classification model. In the test mode, the test data is sent to the mobile terminal, pre-processed, and then used for feature extraction and action matching. The recognized result is fed back to the mobile side, and after AR processing on the mobile terminal, users can see the movement they are trying to make.


SEMG Pre-Processing
In this proposed mirror therapy system, achieving SEMG signal pattern recognition requires processing the raw signal; the processing includes data acquisition, noise reduction, spectrogram drawing, and feature extraction.

SEMG Signal Data Acquisition and Data Segmentation
The main process of the data acquisition in the personal computer terminal is shown in Figure 2. The data acquisition in the mobile phone terminal is similar to that on the personal computer terminal. The sampled data is sent to the computer and saved in a CSV file. As a device widely used for gesture recognition and manufactured by Thalmic Labs, MYO has eight interconnected stainless-steel sensor components. In this study, the MYO armband is used as the data acquisition device, and a total of 10 types of SEMG signal data for gesture motion are sampled. The sampling frequency for each type of gesture motion is 100 Hz, with a relaxing time of 3 s between two motions completed by the subject. The 10 types of motions are defined in Figure 3. Data segmentation is necessary to study every single gesture: a window is added to get the rough range, then the data is smoothed to obtain the absolute maximum value, and 20% of that maximum is chosen as the boundary on both the left and the right. The waveforms of channel three's 10 single gestures are shown in Figure 4.
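The segmentation rule above can be sketched in a few lines of NumPy. The smoothing window length and the exact boundary search are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def segment_gesture(emg, win=10, ratio=0.2):
    """Locate one gesture's active region in a single-channel SEMG record.

    The rectified signal is smoothed with a moving-average window, the
    absolute maximum is found, and the segment boundaries are the first
    and last samples whose smoothed amplitude reaches `ratio` (20%) of
    that maximum, as described in the text.
    """
    rectified = np.abs(np.asarray(emg, dtype=float))
    kernel = np.ones(win) / win
    smooth = np.convolve(rectified, kernel, mode="same")
    threshold = ratio * smooth.max()
    active = np.flatnonzero(smooth >= threshold)
    return active[0], active[-1]  # left and right boundary indices
```

Applied to a windowed recording, this returns the index range that is kept for the per-gesture analysis.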

Noise Reduction
SEMG signals are nonstationary signals produced during limb motion, and they have a low signal-to-noise ratio because of the environmental and physiological noise picked up during acquisition. Studies show that the energy of SEMG signals is concentrated in 20-450 Hz. The common noise mainly includes power frequency interference, sudden peaks, white noise, and high-frequency noise. Noise cancelation methods are therefore indispensable.

A Notch Filter to Remove Power Frequency Interference
A notch filter attenuates an input signal sharply at a certain frequency to hinder the passage of that frequency component. It is mainly used to eliminate interference at a specific frequency, here the power-line frequency.
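A minimal SciPy sketch of such a notch is shown below. The 50 Hz mains frequency matches the Chinese grid; the 200 Hz sampling rate and the quality factor Q are assumptions for illustration (a 50 Hz notch requires a sampling rate above 100 Hz):

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt

fs = 200.0          # assumed sampling rate for this sketch
f0, Q = 50.0, 30.0  # mains frequency in China; Q sets the notch width

b, a = iirnotch(f0, Q, fs)

def remove_mains(x):
    # Zero-phase filtering avoids the phase distortion a single pass adds.
    return filtfilt(b, a, x)
```

The narrower the notch (larger Q), the less the surrounding SEMG band is affected.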

High Pass Filter to Remove the Sudden Peak
A high pass filter removes the unnecessary low-frequency part of a signal and eliminates low-frequency interference. It is therefore very useful in this situation: the cutoff frequency is set near zero, which turns out to be effective for cutting off the sudden peak near 0 Hz.
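A Butterworth high pass with a near-zero cutoff can serve this role; the sampling rate, filter order, and 1 Hz cutoff below are illustrative assumptions:

```python
from scipy.signal import butter, filtfilt

fs = 200.0   # assumed sampling rate for this sketch
fc = 1.0     # cutoff "near zero": removes the DC offset / baseline drift
b, a = butter(4, fc / (fs / 2), btype="highpass")

def remove_baseline(x):
    # Zero-phase pass so the SEMG waveform shape is preserved.
    return filtfilt(b, a, x)
```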

The Wavelet Transforms to Remove White Noise and High-Frequency Noise
Wavelet transform (WT) overcomes some disadvantages of the short-time Fourier transform [19]: unlike the short-time Fourier transform, its window size varies with frequency, which makes it ideal for the time-frequency analysis and processing of signals.
Wavelet noise reduction methods fall into several categories, including the modulus-maxima de-noising method, shielded de-noising based on the correlation of wavelet coefficients at each scale, wavelet threshold de-noising, and the translation-invariant method. In this paper, the design uses wavelet threshold de-noising based on translation invariance; the calculation process is as follows.


1. Time domain translation

The signal is circularly shifted by h samples, x_h(n) = x((n + h) mod N), where x_h is the signal after translation and N is the signal length.

2. Wavelet basis selection and decomposition

The wavelet transform of the translated signal is carried out and the wavelet decomposition coefficients are obtained. An analysis of the wavelet coefficients of signal and noise at each scale shows that a finite number of coefficients carry the main characteristics of the signal at large scales, and this finite number of wavelet coefficients can well reconstruct the original signal.

3. Threshold selection

The wavelet-domain coefficients are processed with the soft limiting amplitude function:

η_t(x_{i,j}) = sgn(x_{i,j}) (|x_{i,j}| − t) if |x_{i,j}| ≥ t, and 0 otherwise,

where η_t(x_{i,j}) represents the wavelet coefficient after threshold processing and t represents the threshold value. The selection of the threshold is based on two filtering conditions, one being smoothness and the other adaptivity; here, t is chosen to meet both conditions.

4. Reverse time domain translation

The translation is the reverse of the first step, x(n) = x_h((n − h) mod N). If T represents the whole WT and soft-threshold process, and S_h the translation by h samples, the complete procedure can be expressed as

x̂ = S_{−h}(T(S_h(x))),

where x̂ is the result of wavelet threshold de-noising based on translation invariance. According to the result shown in Figure 5, the white noise and high-frequency noise have been effectively removed, as can be concluded from the comparison between Figure 5a and Figure 5b.
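The four steps above can be sketched in pure NumPy with a single-level Haar wavelet and averaging over several shifts (cycle spinning). The Haar basis, the number of shifts, and the universal threshold t = σ√(2 ln N) are illustrative assumptions, not the paper's exact choices:

```python
import numpy as np

def haar_dwt(x):
    # Single-level orthonormal Haar decomposition: approximation and detail.
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_idwt(a, d):
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x

def soft(w, t):
    # Soft limiting amplitude function: shrink coefficients toward zero by t.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def ti_denoise(x, shifts=8):
    """Translation-invariant soft-threshold de-noising (steps 1-4 above)."""
    x = np.asarray(x, dtype=float)
    n = len(x)  # assumed even for the Haar split
    out = np.zeros(n)
    for h in range(shifts):
        xs = np.roll(x, h)                      # 1. time-domain translation
        a, d = haar_dwt(xs)                     # 2. wavelet decomposition
        sigma = np.median(np.abs(d)) / 0.6745   # robust noise-level estimate
        t = sigma * np.sqrt(2 * np.log(n))      # 3. universal threshold (assumed)
        ys = haar_idwt(a, soft(d, t))           #    soft thresholding + inverse WT
        out += np.roll(ys, -h)                  # 4. reverse translation
    return out / shifts                         # average over all shifts
```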


Gesture Classification and Recognition
In gesture recognition based on EMG, every motion has its own unique features. The common methods for feature extraction include time domain analysis, frequency domain analysis, and time-frequency domain analysis. Several features will be tested to form the feature vectors, and the choice of features has a direct influence on the recognition accuracy. In this section, two methods are studied to classify gestures: a support vector machine optimized by the grid search method and a convolutional neural network.

Gesture Recognition Based on SVM Optimized by Grid Search Method
As a machine learning method based on statistical learning theory, the Support Vector Machine [20] has seen great development in pattern recognition and in regression analysis for function estimation and time series prediction. The extracted gesture features are stored in a text file whose format is explained in Table 1.
• Gesture: the ten gestures from AG to GF are each repeated 100 times, which means that 100 groups of each gesture from AG to GF have been recorded;
• Features: F1, F2, F3, F4, F5, and F6 represent the features Mean, Variance, Kurtosis, Skewness, Peak, and Signal Slope, respectively. The calculation formulas of Mean, Variance, Kurtosis, and Skewness are as follows.
Ȳ = (1/N) Σ_{i=1}^{N} Y_i
s² = Σ_{i=1}^{N} (Y_i − Ȳ)² / (N − 1)
Kurtosis = Σ_{i=1}^{N} (Y_i − Ȳ)⁴ / ((N − 1) s⁴)
Skewness = Σ_{i=1}^{N} (Y_i − Ȳ)³ / ((N − 1) s³)

where Y_i represents the univariate data, Ȳ is the mean, s is the standard deviation, and N is the number of data points.

Table 1. Format of the feature file: Gestures | SEMG Channel 1 | … | SEMG Channel 8 | Label.
• EMG Channel: since the MYO armband has 8 channels, features are computed for EMG channels 1 to 8; that is, 8 groups of features F1-F6 are included in the text file;
• Label: labels 0 to 9 are assigned to the 10 gestures from AG to GF.
These time-domain features are effective in classifying the 10 gestures, and calculating the features from all 8 channels is more valuable than using only one channel. The text file in this format is fed directly into the SVM training model.
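The six per-channel features can be computed as below. The Signal Slope definition (mean absolute first difference) is an assumption, since the paper does not give its formula:

```python
import numpy as np

def channel_features(y):
    """Six features per channel: Mean, Variance, Kurtosis, Skewness, Peak,
    and Signal Slope (taken here as the mean absolute first difference)."""
    y = np.asarray(y, dtype=float)
    n, ybar = len(y), np.mean(y)
    s = np.std(y, ddof=1)
    dev = y - ybar
    return np.array([
        ybar,                                   # F1: mean
        s ** 2,                                 # F2: variance
        np.sum(dev ** 4) / ((n - 1) * s ** 4),  # F3: kurtosis
        np.sum(dev ** 3) / ((n - 1) * s ** 3),  # F4: skewness
        np.max(np.abs(y)),                      # F5: peak
        np.mean(np.abs(np.diff(y))),            # F6: signal slope (assumed)
    ])

def feature_vector(channels):
    # 8 channels x 6 features -> a 48-dimensional vector fed to the SVM.
    return np.concatenate([channel_features(c) for c in channels])
```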
In the SVM training process, the penalty parameter c and the kernel parameter g need to be optimized. In this paper, cross validation (CV) combined with a grid optimizer was used to obtain the best values of c and g. Figure 6 shows the processing flow of the grid-optimized SVM.
Cross-validation is used to verify the performance of the classifier statistically. The basic idea is to split the dataset into a training set and a test set: the classifier is trained on the training set, and the test set is used to verify the trained model as a performance indicator for the classifier. K-fold cross validation has been used in this work: the original dataset is divided into K groups (generally of equal size), each subset is used once as the verification set while the remaining K−1 subsets are used as the training set, which results in K models. The average classification accuracy on the verification sets is used as the performance index of the classifier under K-CV.
The basic principle of the grid search method [21] is to divide a certain range of c and g into a grid and traverse all points in the grid. For each candidate pair of c and g, K-CV is used to obtain the classification accuracy on the training set; finally, the pair of c and g that yields the highest cross-validation accuracy is selected as the best parameters.
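With scikit-learn, the grid search over c and g combined with K-fold CV can be sketched as follows; the synthetic data and the grid ranges are stand-ins, not the paper's exact setup:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the 48-dimensional feature vectors of Table 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 48))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Grid over the penalty parameter c and RBF kernel parameter g,
# scored by K-fold cross validation (K = 5).
grid = GridSearchCV(
    SVC(kernel="rbf"),
    {"C": 2.0 ** np.arange(-2, 5), "gamma": 2.0 ** np.arange(-6, 1)},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

`grid.best_params_` then holds the (c, g) pair with the highest K-CV accuracy.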
The best SVM training accuracy and the average prediction accuracy in the control console are shown in Table 2 below. The average prediction accuracy is about 93.07%; this is the result of one iteration, and since the test and training data sets are chosen randomly, the result changes in every iteration.
Cohen's kappa coefficient is an index for evaluating the classification process and measuring the classification accuracy. It can be calculated by the formula

κ = (p_o − p_e) / (1 − p_e),

where p_o is the observed proportion of agreement and p_e is the proportion of agreement expected by chance.
The kappa score is about 92.25%, as shown in Table 2, which means the classification accuracy is very high. In Figure 7, the average accuracy of each of 100 iterations is drawn as a line chart. The maximum average accuracy over the 100 iterations is about 97.03%, and the minimum is about 90.10%. Since the test and training data sets of each iteration are chosen randomly, this result is more reliable. Overall, the grid-optimized SVM performs well in the classification.
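The kappa formula above translates directly into code; p_e is computed from the marginal label frequencies of the true and predicted labels:

```python
import numpy as np

def cohen_kappa(y_true, y_pred, k=10):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) over k classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    p_o = np.mean(y_true == y_pred)                       # observed agreement
    p_e = sum(np.mean(y_true == c) * np.mean(y_pred == c) # chance agreement
              for c in range(k))
    return (p_o - p_e) / (1 - p_e)
```

Perfect agreement yields κ = 1, while agreement no better than chance yields κ ≤ 0.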

Gesture Recognition Based on Convolutional Neural Network
Nowadays, many high-precision recognition systems have been built with deep learning, which has achieved significant progress in many fields, such as computer vision and voice recognition. In this study, a deep CNN that takes SEMG signals transformed into images as its input is proposed.

After data pre-processing, the data of the gestures in each channel are saved, and feature extraction is performed on the data of each single gesture. First, a Fourier transform of a single motion signal is performed; then time and frequency are used as the two dimensions of an image to plot the spectrogram of each motion in every channel. Second, after combining several convolution layers and pooling layers, the SEMG signal is modeled using output results corresponding to multiple motion models of the limb. The steps of transforming SEMG signals into images are as follows.

1. Calculate the maximum difference value derived from the SEMG signal of each channel. This value is combined with the other 8 channels to obtain a ninth channel, so there are 9 channels in total, which makes the feature expression more reliable.

2. Plot the time domain figure of every gesture in each channel.

3. Plot the frequency domain figure of every gesture in each channel. The short-time Fourier transform,

X(m, ω) = Σ_n x(n) w(n − m) e^{−jωn},

converts the time-domain SEMG signal x(n) into the frequency domain, where w is the analysis window.

4. Spectrogram drawing. Time and frequency are used as the two dimensions of the image to obtain the spectrogram of the SEMG signal, as shown in Figure 8a. The SEMG signals of the 8 channels and the maximum difference channel are combined and converted into a spectrogram, as shown in Figure 8b.

These spectrograms in the form of images are used as the inputs to a CNN classifier. Furthermore, the SEMG signals are modeled based on the output results corresponding to multiple motion modes of the upper limbs after combining the convolutional layers with the pooling layers.
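The image-construction steps can be sketched with SciPy. The sampling rate, the STFT window parameters, and the reading of "maximum difference value" as the per-sample range across channels are illustrative assumptions:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 200.0                                # assumed sampling rate
rng = np.random.default_rng(0)
emg8 = rng.normal(size=(8, 400))          # stand-in for one segmented gesture

# Step 1: ninth channel from the per-sample spread across the 8 channels
# (one possible reading of "maximum difference value").
ninth = emg8.max(axis=0) - emg8.min(axis=0)
emg9 = np.vstack([emg8, ninth])

# Steps 3-4: STFT of each channel, stacked into one image for the CNN.
images = []
for ch in emg9:
    f, t, Sxx = spectrogram(ch, fs=fs, nperseg=64, noverlap=48)
    images.append(np.log1p(Sxx))          # log scale tames the dynamic range
image = np.stack(images)                  # shape: (9, n_freq_bins, n_frames)
print(image.shape)
```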
To improve generalization by exploiting the information shared among multiple training tasks, the multitask learning (MTL) and multi-label classification concepts are adopted: a shared representation is used to train multiple tasks in parallel by MTL [22]. When extracting SEMG signal features through the five convolutional layers and the pooling layers, the gesture classification label of each channel is used as an output of the CNN. The multi-label classification scheme is shown in Figure 9. The output of the convolutional layers is connected through three fully connected layers, and SoftMax layers output the 10 categories of gesture labels, with one output head matching each of the SEMG signals of the 8 channels and their maximum difference channel.
In this study, 10 types of SEMG signal data for gesture motion are sampled from 100 subjects, with 5-10 groups sampled for every motion of each subject. Therefore, a total of 5000 to 10,000 groups of data are sampled; 4/5 of the data is used for training, and the rest is used for model testing.
The single-label and multi-label classification models are trained with the network shown in Figure 9. A spectrogram image integrated from the 9-channel SEMG signal is taken as the input, and each channel has its own SoftMax gesture classifier. The final label of the gesture is obtained with a majority voting algorithm.
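The voting step can be written compactly; the tie-breaking rule (toward the smaller label index) is a design assumption:

```python
import numpy as np

def majority_vote(channel_labels, n_classes=10):
    """Fuse the per-channel SoftMax decisions (one label per channel head)
    into a single gesture label by majority voting. Ties are broken toward
    the smaller label index (an assumption of this sketch)."""
    counts = np.bincount(np.asarray(channel_labels), minlength=n_classes)
    return int(counts.argmax())
```

For example, if seven of the nine channel heads agree, their label wins regardless of the two dissenting heads.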


Comparison between SVM and CNN
SVM and CNN present two quite different approaches to classification: the CNN takes the spectrogram image as its input, while the SVM uses the time domain features. As shown in Table 3, the comparison covers the following aspects. Accuracy: according to the experimental results, the accuracy of the SVM is around 93%; over many iterations it sometimes falls below 90% but stays above 87%. Occasionally the accuracy of the SVM is higher than that of the CNN (the maximum average accuracy of the SVM is 97.03%), which may be because the test and training data sets are chosen randomly in every SVM training run. The accuracy of the CNN is up to 97.8% over multiple iterations and is more stable, because the automatic feature extraction makes it more robust.
Training time: the training time of the CNN is longer than that of the SVM, which is reasonable because the features for the SVM are manually selected and extracted before training. This means that for different signals, the features need to be selected through tests and hundreds of rounds of manual extraction, whereas the CNN finds and extracts the features by itself, which saves this effort. The training time of the SVM is also influenced by the number of data samples: the time becomes much longer as more data is added.
The requirement for hardware: since the CNN operates on images, its hardware requirements, especially for the GPU, are higher than those of the SVM.
The requirement for data: the CNN has high requirements for its data, which needs to be well de-noised and have low redundancy; in other words, the CNN needs a high-quality data input. This conclusion is based on the experience of this experiment: data redundancy influences the feature extraction process.

System Implementation
In this section, the mirror therapy system for the upper limb based on SEMG pattern recognition is implemented to identify the gestures of the patient's intact hand. The feedback is sent to a phone placed over the disabled hand to display the identified gesture, thus performing mirror therapy for the upper limb.

System Structure
The designed mirror therapy system structure is shown in Figure 10. The wearable device MYO is connected to the mobile phone via Bluetooth. When the user exercises a certain finger pair movement for rehabilitation training, MYO collects the SEMG signal data, which is then transmitted to the software on the mobile terminal. After pre-processing and analysis, the data is sent to the cloud server for feature extraction and action matching. The cloud server stores the trained classification model, determines the type of action, and offers instant feedback to the mobile side. The mobile-side software then plays the corresponding finger pair movement on receipt of the feedback, and users can see the movement they are trying to make through mobile augmented reality, which overlays a virtual hand movement on the real environment and the real arm, so as to accomplish the mirror therapy process.
The flowchart and the main structure of the rehabilitation system built in this study are demonstrated in Figure 11. The operation steps are as follows:
1. Connect the MYO Armband to the Android platform through Bluetooth;
2. The patient performs a certain hand gesture with the undamaged hand;
3. The Android platform receives the SEMG data from the MYO Armband and, after pre-processing, sends it to the cloud server;
4. The cloud server extracts the features of the SEMG data, classifies the gesture the patient made, and transmits real-time feedback to the Android platform (telling it what the gesture is);
5. The Android platform on the damaged-hand side displays the recognized gesture as a MAR scene;
6. The patient watches the video to accomplish the task of the mirror therapy.
Figure 10. The designed mirror therapy system structure diagram.
Figure 11. The mirror therapy system diagram.
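The six operation steps can be sketched end-to-end in code. All function names below are hypothetical placeholders (the paper does not expose its APIs), and the cloud call is stubbed out with a trivial local classifier so that the sketch is self-contained:

```python
def preprocess(window):
    """Step 3 (client side): rectify and mean-normalize each channel."""
    rectified = [[abs(v) for v in sample] for sample in window]
    n = len(rectified)
    means = [sum(s[c] for s in rectified) / n for c in range(len(rectified[0]))]
    return [[s[c] - means[c] for c in range(len(s))] for s in rectified]

def cloud_classify(features):
    """Step 4, stubbed locally: a stand-in for the cloud server's model.
    Here: pick the gesture keyed to the channel with the most energy."""
    energy = [sum(s[c] ** 2 for s in features) for c in range(len(features[0]))]
    gestures = ["fist", "spread", "pinch", "rest"]
    return gestures[energy.index(max(energy)) % len(gestures)]

def display_mar_scene(gesture):
    """Steps 5-6: the damaged-hand phone renders the MAR animation."""
    return "playing MAR animation: " + gesture

# Steps 2-6 for one short window of 8-channel samples (step 1, the Bluetooth
# pairing, is omitted). Channel 0 alternates, so it dominates the energy.
window = [[2.0 * (i % 2), 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1] for i in range(4)]
gesture = cloud_classify(preprocess(window))
print(display_mar_scene(gesture))
```

The real system replaces `cloud_classify` with an HTTP round trip to the trained CNN/SVM model on the server; the control flow is otherwise the same.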

Cloud Server and Android Application
Cloud services offer many advantages: they are convenient to use, cloud hosting provides flexible storage, the server is relatively safe and reliable, it is easier to upgrade, the response speed is high, and the cost-performance ratio is good. In this work, the Alibaba Cloud server (Aliyun) is employed to implement the cloud services for the mirror therapy.
The cloud server runs the classification algorithm. The trained model is deployed on the cloud server to perform prediction and gesture matching.
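The paper does not describe the server's internals, so the sketch below shows one plausible shape for the matching step on the cloud host: a pure-Python nearest-prototype "model" standing in for the deployed classifier, wrapped as a function an HTTP handler could call. The prototype vectors and request format are hypothetical:

```python
import json
import math

# Stand-in for the trained model stored on the cloud server: one prototype
# feature vector per gesture (a real deployment would load CNN/SVM weights).
MODEL = {
    "fist":   [1.0, 0.2, 0.1, 0.1],
    "spread": [0.1, 1.0, 0.2, 0.1],
    "pinch":  [0.1, 0.2, 1.0, 0.3],
}

def classify(features):
    """Return the gesture whose prototype is nearest in Euclidean distance."""
    def dist(p):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(features, p)))
    return min(MODEL, key=lambda g: dist(MODEL[g]))

def handle_request(body):
    """What an HTTP endpoint on the cloud server might do with one request."""
    features = json.loads(body)["features"]
    return json.dumps({"gesture": classify(features)})

print(handle_request('{"features": [0.9, 0.3, 0.1, 0.1]}'))
```

Keeping the model server-side, as the paper does, means the Android application only ships feature data and receives a gesture label, so model updates never require an app update.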
On the other hand, the Android application is the main part of our mirror therapy system. Its designed functions include receiving and pre-processing the SEMG signal data, communicating with the cloud server, and displaying the MAR scene of the gesture performed by the patient. The basic functions needed by the mirror therapy system are implemented.


Implementation of the Mobile Augmented Reality
The basic process of the mobile augmented reality implementation is introduced below. In this study, a marker-based type of AR is utilized for this part of the operation. Besides, ARToolKit [23], an open-source AR project, is applied. The flowchart is described as follows.
Firstly, a marker is placed on the hand that is to be replaced, so that the virtual hand can be overlaid on it. After that, the camera on the mobile phone captures the image containing the marker. The application detects the location of the marker and obtains its direction, thereby determining where and how to place the virtual hand. At last, the virtual hand animation is composited with the camera image so that the animation appears at the position of the marker.
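ARToolKit performs the actual marker detection in native code; the geometry it recovers can be illustrated with a small self-contained computation. Given the four detected corner points of a square marker (an assumed input format, not ARToolKit's real API), the overlay position and in-plane rotation for the virtual hand follow directly:

```python
import math

def marker_pose(corners):
    """Centre and in-plane rotation (degrees) of a square marker from its
    four corner points, ordered top-left, top-right, bottom-right, bottom-left.
    (Image y-axis inversion is ignored for simplicity.)"""
    cx = sum(x for x, _ in corners) / 4.0
    cy = sum(y for _, y in corners) / 4.0
    # The direction of the top edge gives the rotation of the overlay.
    (x0, y0), (x1, y1) = corners[0], corners[1]
    angle = math.degrees(math.atan2(y1 - y0, x1 - x0))
    return (cx, cy), angle

# A unit square rotated by 30 degrees about the origin.
theta = math.radians(30)
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rotated = [(x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta)) for x, y in square]
center, angle = marker_pose(rotated)
print("center=%s angle=%.1f" % (center, angle))
```

A full AR pipeline additionally recovers out-of-plane tilt from the marker's perspective distortion; this sketch covers only the in-plane part of the placement.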

The Demonstration of the Real System
The real system implemented is shown in Figure 12. As the picture shows, the left hand is the undamaged hand, and the right hand is the damaged or disabled one. When the left hand performs a gesture from the rehabilitation set, the mobile platform on the right displays the gesture that the patient carried out with the left hand.
During this process, the patient needs to focus attention on the phone (or another Android platform) on the damaged-hand side, so that the brain receives feedback suggesting that the damaged hand is functioning normally like an undamaged hand.


Conclusions
In this paper, a mirror therapy system based on multi-channel SEMG signal pattern recognition and mobile augmented reality is proposed. The objective is the rehabilitation of hemiparesis following a stroke. The spectrogram obtained by analyzing the electromyography signals is used as an image. Two approaches, a Convolutional Neural Network (CNN) and a grid-optimized Support Vector Machine (SVM), are investigated to classify the SEMG of different gestures. The experimental results show that both approaches achieve high overall accuracy, with the CNN method offering higher accuracy and stability. The proposed mirror therapy system is presented and shown to achieve the mirror therapy goal. In future work, we will focus on the following aspects. On the algorithm level, we will enhance the overall accuracy of gesture recognition by adjusting the model structure and improving the quality of the data, involving more subjects in the data-collection process. On the system level, a doctor end will be added to the system, so that the doctor can give rehabilitation opinions on the patient's condition, make corresponding rehabilitation plans at any time, and further improve the rehabilitation effect.