A User-Specific Hand Gesture Recognition Model Based on Feed-Forward Neural Networks, EMGs, and Correction of Sensor Orientation

Abstract: Hand gesture recognition systems have several applications, including medicine and engineering. A gesture recognition system should identify the class, time, and duration of a gesture executed by a user. Gesture recognition systems based on electromyographies (EMGs) produce good results when the EMG sensor is placed in the same orientation for training and testing. However, when the orientation of the sensor changes between training and testing, which is very common in practice, the classification and recognition accuracies degrade significantly. In this work, we propose a system for recognizing, in real time, five gestures of the right hand. These gestures are the same ones recognized by the proprietary system of the Myo armband. The proposed system is based on the use of a shallow artificial feed-forward neural network. This network takes as input the covariances between the channels of an EMG and the result of a bag of five functions applied to each channel of an EMG. To correct the rotation of the EMG sensor, we also present an algorithm based on finding the channel of maximum energy given a set of synchronization EMGs, which for this work correspond to the gesture wave out. The classification and recognition accuracies obtained here show that the recognition system, together with the algorithm for correcting the orientation, allows a user to wear the EMG sensor in different orientations for training and testing, without a significant performance reduction. Finally, to reproduce the results obtained in this paper, we have made the code and the dataset used here publicly available.


Introduction
A gesture can be considered a symbol that communicates expressions or physical behavior. Many investigations use gestures as a communication tool between a computer and a human to interact with or control devices in different fields, including medical applications, human-machine interfaces, the automotive field, robot control, sign language recognition, sign language translation, movement modeling, and virtual reality [1][2][3][4][5][6]. Gesture recognition allows us to determine the user's intention based on the identification of facial expressions, eye gaze, hand motion, and the movement or posture of body segments. The development of a hand gesture recognition model is a challenging problem because it requires high accuracy to identify the class of a given movement, the instant of time when it happens, and its duration. Moreover, to develop a system with acceptable portability and usability characteristics, the model needs to be able to work with limited computational resources [2,7,8].
Currently, several works are focused on hand gesture classification instead of recognition [8][9][10][11]. Classification models only identify the class of a given hand movement from a predefined set of gesture classes. In other words, classification only returns a label that identifies the gesture (which class), whereas recognition returns the label as well as its timestamp (when it is produced and how long it lasts). As a result, gesture recognition is a harder problem than classification. Moreover, a recognition model, as part of its architecture, usually includes a classification model [7].
There are different types of sensors for developing hand gesture recognition models, such as gloves equipped with force and flexion sensors [12,13], optical and infrared cameras [14], electromyographic (EMG) sensors [15], and inertial measurement units (IMUs) [16][17][18][19]. Each of these sensor types poses challenges that must be solved to achieve promising results in hand gesture recognition applications. For example, in vision-based gesture recognition models, the detection and tracking of the hand movement may be affected by changes of illumination and occlusions [20]. Using a glove can be uncomfortable if the glove does not fit the user's hand properly or if it is used for a long period of time. Signals measured with IMUs (e.g., angular velocity and acceleration) are usually noisy, thus making it difficult to extract the part of the signal that conveys information about the gesture executed.
For this work, EMG sensors are used because they do not suffer from some of the problems mentioned above. For example, EMG sensors do not have problems with occlusion and changes in illumination; they are usually easy and comfortable to use, and they return less noisy signals than IMUs. EMG sensors measure electric signals on skeletal muscles. These electric signals, called EMGs, contain information about a voluntary movement commanded by the brain. EMG sensors can be of two types: intramuscular and surface sensors. On one hand, intramuscular sensors use a set of invasive electrodes (i.e., needles), which make good and localized contact with the part of a skeletal muscle under examination, thus making it possible to record EMGs from deep skeletal muscles, with little interference. However, intramuscular sensors are not practical for implementing gesture recognition models because one or more needles must be inserted in the muscle, causing pain to the user. On the other hand, a surface sensor consists of a set of non-invasive electrodes, such as wearable bracelets or adhesive pods, so wearing this type of sensor is easy and comfortable [21,22]. For this reason, in this work, we use a surface EMG sensor called the Myo armband, produced by Thalmic Labs (Kitchener, ON, Canada).
As we mentioned before, an EMG sensor measures the electrical signals produced when a voluntary movement is generated by the contraction of skeletal muscles. This contraction is produced by the activation of a set of muscle fibers, called a motor unit (MU), where each fiber of the MU is connected to the axon of a motor neuron. The force produced by a muscle depends on the number of MUs activated by the brain. MUs contract when a train of electrical pulses, generated by the brain, reaches each of their fibers. This train of pulses that reaches an MU is called a Motor Unit Action Potential Train (MUAPT). The superposition (i.e., linear summation) of the MUAPTs of several motor units is what we call an EMG. Additionally, an EMG contains not only MUAPTs but also Gaussian noise [23] due to signals generated by nearby organs, environmental noise, or noise produced by the electrical equipment used to measure the EMG. It is important to highlight that an EMG signal behaves as a non-stationary process because the distribution of these signals changes with time. The statistics of an EMG depend on several factors, such as the location of the sensor, corporal temperature, and body fatness [24]. Consequently, defining a mathematical model to characterize an EMG signal is a difficult task. In Figure 1, we show an example of EMGs measured by the Myo armband for each of the five gestures recognized in this work: wave in, wave out, fist, fingers spread, and double tap. These five gestures are the ones recognized by the proprietary system of the Myo armband, a sensor that returns EMGs composed of eight channels measured at 200 Hz. It is worth mentioning that, in other works, some authors used sensors with EMG sampling rates from 200 to 1000 Hz, obtaining varied results in their recognition systems [7].
However, the Myo armband's sampling rate of 200 Hz has been demonstrated to provide high-performance results in hand gesture recognition applications, which makes it one of the most popular EMG sensors on the market [7,25].
There are two types of hand gesture recognition models: user-specific and general. On one hand, a general model is designed to recognize gestures from any user. For the training phase, a general model uses the data gathered from a set of users, and then the model can be tested with other users. Therefore, a general model must be trained only once. On the other hand, a user-specific model must be trained and tested with the data of each user. Although user-specific models must be retrained for each new user, it is well known that these types of models provide higher accuracy than general models. Consequently, for this research, we have chosen to design a user-specific model. It is important to note that even a slight variation of the orientation of the EMG sensor between training and testing can reduce the accuracy of both user-specific and general models [26]. This variation happens when the sensor's pods are placed in a different position every time a user wears the armband. Therefore, a correction of the sensor orientation is needed to compensate for any change in the orientation between training and testing. There are few works in the literature that address the orientation correction problem of the EMG sensor. For example, a set of experiments carried out in [26] demonstrates that shifting the sensor 2 cm can cause the classification accuracy to drop considerably, to between 50% and 60%. In another work, it is demonstrated that, if the sensor is rotated, a remapping can be performed to reorganize the distribution of the sensors and thus avoid drops in the classification accuracy due to the incorrect position of the sensor [27]. Finally, another method can be applied by analyzing the angle shift of the EMG sensor [28]. However, such approaches are computationally expensive due to the high complexity of the calculations required to perform the orientation correction procedure.
Some researchers [11,27,28] use the Myo armband or the Delsys sensor [29] to measure EMGs. Even though they claim to propose hand gesture recognition systems, in practice they only perform classification. As mentioned before, classification is different from recognition, so most of the literature provides classification systems instead of recognition systems. In [26], a high-density electrode array is used. This sensor array is composed of 96 channels that measure EMGs used to distinguish 11 hand and wrist movements. However, this proposal is focused on the classification task only. Regarding the correction of sensor orientation, there are a few works that tackle this problem using the Myo armband [27,28]. Although these two works solve the problem of correcting the sensor rotation, they do not perform recognition. Therefore, it is important to design systems that perform recognition together with the correction of sensor orientation.
In this work, a user-specific model with correction of sensor orientation is proposed. The proposed algorithm for the correction of orientation identifies the EMG channel with the maximum energy, and then the EMG channels are rearranged based on this channel. To the best of our knowledge, this is the first proposal that applies the correction of orientation to a user-specific hand gesture recognition model. Therefore, the main contribution of this work is a user-specific hand gesture recognition model that allows a user to wear the bracelet in different orientations between training and testing without experiencing a significant reduction in the recognition accuracy. In this way, we avoid the need to retrain a user-specific model each time it is used, which is the usual approach for user-specific models.
To implement our hand recognition model, five different modules are proposed: data acquisition, pre-processing, feature extraction, classification, and post-processing. The data acquisition module is in charge of gathering the signals returned by the EMG sensor. The collected EMGs correspond to five hand gestures executed for 5 s each; these gestures are the same ones recognized by the proprietary recognition system of the Myo armband. The pre-processing module is responsible for preparing and cleaning the collected EMG data. Typical approaches used in this module include filtering [30], offset compensation [7], rectification [31], normalization [32], and segmentation [22]. The feature extraction and feature selection module allows us to extract independent features that accurately represent each EMG to be processed. The main challenge of this module is to determine the most significant and meaningful features to be fed to the classification module; performing feature selection is a difficult task because it is a combinatorial problem [33]. Feature extraction methods can provide time, frequency, and time-frequency features from the EMGs. Some of the feature extraction methods that can be used are the mean absolute value (MAV), root mean square (RMS), variance (VAR), and standard deviation (SD), among others [13,34,35]. Once the features are selected, this information is classified into different classes using the classification module. Some of the classifiers that can be used are K-nearest neighbors [36], decision trees [37], and artificial neural networks (ANNs) [38]. Finally, the post-processing module consists of refining the classifier responses by smoothing or filtering spurious labels [39].
The main contributions of this paper include: (i) a user-specific hand gesture recognition model, together with (ii) an algorithm for the correction of sensor orientation. The proposed recognition model is based on the use of a shallow feed-forward artificial neural network that takes as input a feature vector composed of the covariances between the channels of an EMG and a bag of five functions applied to the channels of an EMG. To reproduce the results obtained in this paper and to analyze the fine details of the proposed model, we have made the code and dataset publicly available.
Following this introduction, the remainder of this document is organized into four sections. In Section 2, we describe the dataset and the methodology used in this work. In Section 3, we present the results obtained with the proposed model. In Section 4, we present an analysis and a discussion of these results. Finally, in Section 5, we present the conclusions and outline future work.

Dataset
The dataset used for this work is composed of EMG signals. These signals were measured using the Myo armband, which is a commercial and low-cost EMG bracelet manufactured by Thalmic Labs (Figure 2). The bracelet has eight pods distributed uniformly along its circumference. The digital EMG signals returned by the sensor are organized into eight channels, in accordance with the order of the Myo pods, and are measured using a sampling frequency of 200 Hz. Once the Myo armband is worn on the forearm, the user is requested to make five hand gestures: wave in, wave out, fist, fingers spread, and double tap. The rest position, as well as other gestures, is referred to as no gesture. Figure 2 highlights the gestures used in this work in their final positions.
The dataset used here is composed of EMGs of 120 users, which were randomly divided into two groups: training and testing. Subsequently, the EMGs of each user in each of these two groups were randomly divided into two subsets: one for training and the other for testing (see Figure 3). The training subset for each user is D_train = {(F_1, Y_1), …, (F_N, Y_N)}, with N = 130, where F_i ∈ [−1, +1]^(1000×8) is a matrix that contains the i-th EMG signal recorded for 5 s. The label for F_i is denoted by the categorical variable Y_i ∈ {wave in, wave out, fist, fingers spread, double tap}, with i = 1, 2, …, N. In the subset D_train, we have 25 EMGs for each of the five gestures to be recognized and five EMGs for the hand relaxed. Each of the five gestures for this work was performed following a sequence of three positions: rest, final position, and rest.
The testing subset D_test = {(G_1, Y_1), …, (G_M, Y_M)} for each user is also composed of a total of M = 130 EMGs, with 25 EMGs for each gesture to be recognized and five EMGs for the hand relaxed.
The dataset used for this paper contains a total of 15,600 EMGs. In this dataset, 75% of users are men and the remaining 25% are women. The age of the users varies from 17 up to 29 years. Finally, 91% of users are right-handed and the remaining 9% are left-handed, and 16% of the users suffered some injury to the arm. For the left-handed users, the EMGs were recorded on the right forearm.

Proposed Model
In this subsection, we describe the process applied to each EMG for the training and testing subsets of each user, which is represented in Figure 4.

Data Acquisition:
To simulate the recognition model functioning in online mode, the concept of a sliding window was used for data acquisition, as can be observed in Figure 5. In this work, we simulate the operation of the system because the data were acquired offline. We used a window of n = 66 points, with a stride of 66 points. This window was subsequently divided into six non-overlapping sub-windows of length m = 11 points each. The segment of EMG observed through a sub-window was stored in a matrix A^(i) ∈ [−1, +1]^(m×8), where i denotes the number of the sub-window from which the data came, with i = 1, 2, …, 6. It should be taken into account that F_i denotes the EMG measured during the 5 s of data acquisition for each sample, while A^(i) denotes exclusively the portion of an EMG viewed through an 11-point sub-window. Figure 5 illustrates how a 66-point window of an EMG recording is divided into sub-windows. The EMG signal is non-stationary, but if small windows are taken, the statistical variation of the probability distribution of the features is lower than with large windows. We use sub-windows because we observed empirically that predicting a label for each EMG sub-window and then taking a majority vote among these labels to predict the window label gave better results than predicting a single label considering only the 66-point window.
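The windowing scheme above can be sketched as follows. This is an illustrative Python/NumPy version (the published code is in Matlab), and the helper names `sliding_windows` and `window_label` are our own:

```python
import numpy as np
from collections import Counter

def sliding_windows(emg, win=66, stride=66, sub=11):
    """Split a (samples x 8) EMG into 66-point windows with a 66-point
    stride; each window is divided into six 11-point sub-windows."""
    windows = []
    for start in range(0, emg.shape[0] - win + 1, stride):
        w = emg[start:start + win]
        # the six sub-window observations A(1), ..., A(6), each of shape (11, 8)
        windows.append([w[j:j + sub] for j in range(0, win, sub)])
    return windows

def window_label(sub_labels):
    """Majority vote over the six sub-window labels to label the window."""
    return Counter(sub_labels).most_common(1)[0][0]

emg = np.random.uniform(-1.0, 1.0, size=(1000, 8))  # one 5 s recording at 200 Hz
ws = sliding_windows(emg)
# 1000 points yield 15 windows, i.e., 90 sub-window observations
```

With this exact scheme, a 1000-point EMG yields 15 windows of six sub-windows each, close to the roughly 91 sub-window observations per EMG used later for counting training samples.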
Pre-Processing: Each obtained sub-window A^(i) goes through a process of rectification and filtering [30,31]. This process is used to obtain the envelope of each channel. The EMG envelope is assumed to contain the actual information of the gesture performed. For rectification, we used the absolute value function, thus obtaining the matrix B^(i) = abs(A^(i)), which has its values in the interval [0, 1]. For filtering, a second-order Butterworth low-pass filter, with a cutoff frequency of 1 Hz, was applied to each column of the matrix B^(i), obtaining the matrix C^(i) with the filtered signal, with i = 1, 2, …, 6.
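A minimal sketch of this rectification-and-filtering step, using SciPy's `butter` and `filtfilt`; the zero-phase filtering direction is an assumption of this sketch, since the text does not state it:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 200  # Myo armband sampling frequency (Hz)

def envelope(sub_window, order=2, cutoff=1.0):
    """Rectify a sub-window A(i) and low-pass filter each channel (column),
    producing the envelope matrix C(i)."""
    b, a = butter(order, cutoff, btype='low', fs=FS)
    rectified = np.abs(sub_window)         # B(i) = abs(A(i)), values in [0, 1]
    return filtfilt(b, a, rectified, axis=0)
```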
Feature Extraction: In this step, we want to represent each pre-processed sub-window observation with a feature vector that belongs to a feature vector space where the clusters of feature vectors of each class are compact and well separated from each other. For this purpose, we computed the covariance between the pre-processed channels of each sub-window observation. Together with the covariance, we also used the following functions for feature extraction: band power, mean frequency, occupied bandwidth, mean absolute value, and waveform length. The input of these five functions was the raw EMG signal observed through each sub-window. Below, we briefly describe each function used for feature extraction:
• Covariance: This feature is used to measure how the values of an EMG channel vary with respect to the values of the other seven channels. The covariance is calculated between each pair of channels, and it returns a vector x_σ composed of (8 choose 2) = 28 elements. It is a popular feature used in EMG hand gesture recognition applications [40].
The vector x_mav has eight elements.
• Waveform length: This feature represents the cumulative length of the waveform over time [41] and returns a vector x_wl of eight elements.
All the vectors described above were concatenated into a single feature vector x. This new vector contains 28 + 5 × 8 = 68 elements and will be used as the input of the classification module.
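A partial sketch of this feature extraction in Python/NumPy, showing the 28 pairwise covariances plus the MAV and waveform-length vectors; the band power, mean frequency, and occupied bandwidth vectors (computed, e.g., with Matlab's bandpower, meanfreq, and obw) would be appended in the same way to complete the 68 elements. The helper name `feature_vector` is ours:

```python
import numpy as np
from itertools import combinations

def feature_vector(pre, raw):
    """Partial 68-element feature vector for one sub-window: 28 pairwise
    covariances of the pre-processed channels (pre), plus the MAV and
    waveform-length vectors of the raw sub-window (raw). The three
    remaining spectral vectors are omitted in this sketch."""
    cov = np.cov(pre, rowvar=False)                      # 8 x 8 covariance matrix
    x_cov = np.array([cov[i, j] for i, j in combinations(range(8), 2)])  # 28 values
    x_mav = np.mean(np.abs(raw), axis=0)                 # mean absolute value (8)
    x_wl = np.sum(np.abs(np.diff(raw, axis=0)), axis=0)  # waveform length (8)
    return np.concatenate([x_cov, x_mav, x_wl])          # 44 of the 68 elements
```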
Classification: To design the classification module, we used a feed-forward artificial neural network (ANN) because this family of models includes universal approximators (i.e., an ANN with a continuous squashing function in the output is capable of approximating any Borel function, provided it has sufficiently many hidden units) [42]. The ANN used for this work contains three layers: input, hidden, and output. The input layer contains 68 nodes, each of which takes one component of the feature vector x. The hidden layer is composed of 500 neurons with the ReLU activation function. The output layer has six neurons, with the softmax activation function, where each neuron estimates the conditional probability P(Y|x) that the feature vector x belongs to the class Y ∈ {wave in, wave out, fist, fingers spread, double tap, relax}. For training the ANN, we minimized a loss function that is the sum of the weights (regularization) plus the cross entropy between the six probabilities predicted by the network and the one-hot encoding of the actual label of the feature vector x. For minimizing the cross entropy, we used 500 epochs of the Conjugate Gradient Backpropagation with Polak-Ribière Updates algorithm, which adjusts the step size at each iteration [43].
The number of training samples for every individual model is the total number of sub-window observations from the 130 EMGs, which is around 11,830 (i.e., 130 × 91). Each user has 130 EMGs, and each EMG includes 1000 points; these points can be divided into around 91 sub-window observations of 11 points each. It is worth mentioning that, for each user (in both the training and testing user groups), the same network architecture was trained using only his/her EMGs. This implies that, during the training process, an individual adaptation is carried out, which is encoded in the weights of his/her corresponding network. To reduce the chance of model overfitting, we use regularization by weight decay (with λ = 10^−5), similarly as proposed by Chen et al. [44].
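The network architecture can be sketched as a forward pass in NumPy; the weights below are random placeholders standing in for those fitted with conjugate-gradient backpropagation, so this sketch shows only the shape of the computation:

```python
import numpy as np

rng = np.random.default_rng(0)
# 68 -> 500 (ReLU) -> 6 (softmax); placeholder weights, not trained values
W1, b1 = rng.normal(0.0, 0.1, (68, 500)), np.zeros(500)
W2, b2 = rng.normal(0.0, 0.1, (500, 6)), np.zeros(6)

def predict_proba(x):
    """Forward pass: estimates P(Y | x) over the six classes."""
    h = np.maximum(0.0, x @ W1 + b1)   # hidden layer, ReLU
    z = h @ W2 + b2
    z = z - z.max()                    # shift logits for numerical stability
    p = np.exp(z)
    return p / p.sum()                 # softmax output

def weight_decay(lam=1e-5):
    """Regularization term added to the cross-entropy loss (lambda = 1e-5)."""
    return lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
```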
Post-Processing: In this step, in order to refine the decision returned by the classification module, two steps were applied: reduction and filtering of atypical values. The selection of a baseline model was achieved after the evaluation of different hyper-parameter combinations, as can be observed in Appendix A, and more specifically in Appendix A.1. The chosen baseline model was the hyper-parameter combination with the best recognition accuracy over the validation tests. Using this baseline model, an analysis of the effect and importance of the pre-processing and post-processing modules is carried out in Appendix A.2. From both of these analyses, we present in Table 1 a summary of the hyper-parameters of each module that best fit our dataset for the proposed hand gesture recognition model.

Method for Correction of Sensor Orientation
Correction of orientation is required in EMG-based hand gesture recognition systems because a rotation of the sensor causes a shift in the sequence of features fed to a classifier. This shift can cause a significant reduction of the recognition accuracy. To illustrate this problem, let us assume that we have a very simple recognition system composed of only two modules: feature extraction and classification (Figure 6). This system takes as input EMGs of n ∈ Z+ channels, where each channel is denoted by CH_i, with i = 1, 2, …, n, and returns a label y corresponding to the gesture performed. The feature extractor maps an EMG to a feature vector x = (x_1, x_2, …, x_n) ∈ R^n, where x_i = f(CH_i), with i = 1, 2, …, n. The classifier g returns a label y = g(x_1, x_2, …, x_n). Note that the result of the classifier depends on the sequence in which the features are arranged in the vector x. If the EMG sensor is rotated, then x_i = f(CH_j), with i, j = 1, 2, …, n and i ≠ j, except for rotations of ±360 degrees, thus causing a shift in the sequence of features fed to the classifier, which in turn reduces the accuracy of the recognition system.

Now, we describe the proposed algorithm for the correction of orientation. The aim of this algorithm is to correct, by software, the rotation of the EMG sensor by arranging its channels with respect to the channel with the highest energy, using a set of synchronization gestures. By default, we assume that the sequence in which the pods of the EMG sensor are arranged is S = (0, 1, …, 7). Given a set of synchronization gestures G_sync = {E_1, E_2, …, E_m}, where E_i ∈ [−1, +1]^(k×8), with k ∈ Z+ and i = 1, 2, …, m, and the sequence S, the algorithm for the correction of orientation consists of the following steps:

1. For each EMG E_i ∈ G_sync:
(a) Compute the envelope Ê_i of the EMG E_i. For this task, we first rectify the EMG by computing its absolute value and then filter the rectified EMG using a second-order Butterworth low-pass filter Ψ, with a cutoff frequency of 10 Hz, thus obtaining the signal Ê_i = Ψ(abs(E_i)).
(b) Compute the channel of Ê_i with the maximum energy.
2. Rearrange the sequence S with respect to the channel that most frequently exhibits the maximum energy over the m synchronization EMGs, so that the channels of every subsequent EMG are read starting from this channel.

Selecting the gesture for synchronization is a key aspect for the correction of orientation to be successful. In this work, we selected the gesture for synchronization by testing each of the five gestures considered here using 20 users from the training set. For each of these 20 training users and for each candidate gesture for synchronization, we randomly selected four EMGs out of the 25 training EMGs of that candidate. With these four EMGs for synchronization, we followed the procedure stated for experiment 4, which is described in the next section. In this way, we estimated the recognition accuracy for each of the five candidate gestures for synchronization over 20 training users: wave in (83.68%), wave out (96.76%), fist (91.80%), fingers spread (83.96%), and double tap (84.12%). Therefore, based on these recognition accuracies, we selected wave out as the synchronization gesture for this work.
In practice, the recognition model is combined with the algorithm for correcting the orientation as follows (assuming the recognition model is already trained). After wearing the bracelet, a user has to perform the synchronization gesture a given number of times. Then, the algorithm for correcting the orientation finds an arrangement of the pods based on the synchronization EMGs. After this, this new arrangement of the pods is used to rearrange the channels of every EMG before it is passed to the recognition system. Note that the process of acquiring the synchronization EMGs must be performed each time a user wants to use the recognition system.
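The orientation-correction procedure can be sketched as follows. This sketch assumes the rearranged sequence is anchored so that the max-energy channel maps to slot 0; the actual reference slot is not stated in this excerpt, and the helper names are ours:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def max_energy_channel(emg, fs=200):
    """Index of the channel with the highest envelope energy for one
    synchronization EMG of shape (k, 8)."""
    b, a = butter(2, 10, btype='low', fs=fs)   # 10 Hz cutoff, as in step 1(a)
    env = filtfilt(b, a, np.abs(emg), axis=0)  # rectify, then low-pass filter
    return int(np.argmax(np.sum(env ** 2, axis=0)))

def corrected_sequence(sync_emgs, reference=0):
    """Rotate the default pod sequence S = 0..7 so that the channel that
    most frequently has the maximum energy maps to a reference slot."""
    votes = [max_energy_channel(e) for e in sync_emgs]
    c = max(set(votes), key=votes.count)       # majority max-energy channel
    return [(ch - c + reference) % 8 for ch in range(8)]
```

For example, if channel 3 dominates the synchronization EMGs, the corrected sequence shifts every channel by five slots so that channel 3 is read first.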

Experiments and Results
In this section, we describe the experiments that we executed in this work and also analyze the results obtained. For reproducing the results of the proposed model, we make publicly available the Matlab code [45] and the dataset used for this paper [46].

Experiments
For this work, a total of four experiments were executed combining rotation and no rotation of the bracelet with and without the correction of orientation proposed here. The details of each experiment are presented below.
Experiment 1: In this experiment, we evaluated the performance of the proposed algorithm when: (a) there is no rotation of the bracelet for training and testing, and (b) there is no correction of the orientation of the training and testing EMGs. This experiment simulates the ideal scenario where each user trains and tests the recognition model with EMGs acquired by placing the bracelet always in exactly the same orientation. For this experiment, all users wore the bracelet, for training and testing, in the orientation recommended by Thalmic Labs, the manufacturer of the Myo armband. This recommended position implies that a user should wear the bracelet in such a way that pod number 4 is always parallel to the palm of the hand.

Experiment 2: In this experiment, we evaluated the performance of the proposed algorithm when: (a) there is no rotation of the bracelet for training, but there is rotation for testing, and (b) there is no correction of the orientation of the training and testing EMGs. In this experiment, the training EMGs were acquired with the bracelet placed in the orientation recommended by the manufacturer of the Myo armband.
Experiment 3: In this experiment, we evaluated the performance of the proposed algorithm when: (a) there is no rotation of the bracelet for training, but there is rotation for testing, and (b) there is correction of the orientation of the training and testing EMGs. In this experiment, the training EMGs were also acquired with the bracelet placed in the orientation recommended by the manufacturer of the Myo armband.
Experiment 4: In this experiment, we evaluated the performance of the proposed algorithm when: (a) there is rotation of the bracelet for training and testing, and (b) there is correction of the orientation of the training and testing EMGs. This experiment simulates the general scenario where the user trains the recognition model using an orientation different from the orientation used for testing. The experiments described above are summarized in Table 2.

Simulation of rotation of the bracelet: All the EMGs of the dataset used for this work were acquired with the users wearing the bracelet in the orientation recommended by Thalmic Labs. To simulate rotations of the bracelet, we assume that, by default, the pods of the Myo armband are ordered according to the sequence S = (0, 1, …, 7). Then, with uniform probability, we randomly selected a number r from the set {−3, −2, −1, 0, +1, +2, +3, +4}. Then, we simulated the rotation of the bracelet by computing the new sequence S′ = (s′_1, s′_2, …, s′_8) of the pods, where s′_i = mod(s_i + r, 8), with s_i ∈ S and i = 1, 2, …, 8. Note that, in this way, we simulated rotations of the bracelet clockwise and counterclockwise in steps of 45 degrees.
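The rotation simulation is a modular shift of the pod sequence; a minimal sketch (the helper name `simulate_rotation` is ours):

```python
import random

def simulate_rotation(seq, r=None):
    """Shift each pod index by r positions modulo 8, simulating a rotation
    of the bracelet in steps of 45 degrees."""
    if r is None:
        r = random.choice([-3, -2, -1, 0, 1, 2, 3, 4])  # uniform choice of shift
    return [(s + r) % 8 for s in seq], r

S = list(range(8))                    # default pod sequence 0..7
S_rot, r = simulate_rotation(S, r=2)  # a two-pod (90 degree) rotation
# S_rot == [2, 3, 4, 5, 6, 7, 0, 1]
```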
Protocol for model evaluation: The recognition model was designed and trained using the EMGs of the 60 users from the training subset, processed according to the settings established for each experiment. Likewise, for testing, we used the EMGs of the 60 users from the testing subset. For each user from the testing subset, the model was trained using the 25 training EMGs for each gesture to be recognized and the five EMGs of the hand relaxed. Once the model was trained, it was applied to the 25 testing EMGs for each gesture to be recognized, including the five testing EMGs of the hand relaxed, using a 66-point window, with a stride of 66 points. In this way, for each testing EMG, we obtained two vectors: a vector l̂ = (l̂_1, …, l̂_p) containing the labels predicted by the model, and a vector t̂ = (t̂_1, …, t̂_p) containing the instants of time to which these labels correspond.
For evaluating the recognition accuracy, we used two sets: A and B. The set A = {(ŷ_i, t_i)} for i = 1, 2, …, k is formed by the ordered pairs (ŷ_i, t_i), which are obtained from the vectors l̂ and t̂, where k denotes the number of time points of the EMG, with k = 1000 for this work. For example, given the predicted vectors of labels and times l̂ = (1, 3, 3, 3, 2, 1, 1) and t̂ = (10, 20, 30, 40, 50, 60, 70), respectively, the first 10 pairs of the set A will be (1, 1), (1, 2), (1, 3), …, (1, 10), the next 10 pairs will be (3, 11), (3, 12), (3, 13), …, (3, 20), and so on. The set B = {(y_i, t_i)} for i = 1, 2, …, k is the ground truth for the recognition, where the pair (y_i, t_i) contains the actual label y_i for the time t_i of the EMG. The values of y_i and t_i were obtained from the manual segmentation of the EMGs.
A recognition was considered successful if each of the following three conditions was met: (a) all the labels of the ordered pairs in the set A* must be equal to each other, where the set A* is obtained from the set A by excluding all the ordered pairs that contain the label no gesture; (b) all the ordered pairs in A* must be connected, which means that, for at least one permutation of the times of the pairs in A*, the absolute value of the difference between adjacent times in that permutation must be exactly equal to 1; and (c) the overlapping factor, computed using the following equation, must be equal to or greater than a given threshold ρ:

ρ = 2|A* ∩ B*| / (|A*| + |B*|),

where the set B* is obtained from the set B by excluding all the pairs that contain the label no gesture. For this work, we used a threshold ρ = 0.25. It is important to note that, for evaluating the recognition, we only processed the 25 testing EMGs of each of the five gestures of interest, thus excluding the class no gesture. An illustration of the calculation of the value of ρ through the overlap between the ground truth and the vector of predictions can be observed in Figure 7. At this point, it is worth emphasizing that classification and recognition are different concepts. Classification compares a predicted label with the ground-truth label of a given EMG sample, which allows for identifying the corresponding hand gesture class for that sample. Recognition, on the other hand, involves not only inferring the label of a given sample, but also calculating the instants of time when the gesture was performed. For this purpose, the vector of points related to the ground truth must be compared with the corresponding vector of predictions for a given EMG sample by using the overlapping factor ρ. The time response of the proposed model was evaluated using a desktop computer with a 4-core, 4 GHz processor and 16 GB of RAM.
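The three conditions can be checked mechanically on the pair sets. In the sketch below, the Dice-style form ρ = 2|A* ∩ B*| / (|A*| + |B*|) is an assumption about the overlapping-factor equation, and the `no gesture` label value 0 is likewise an illustrative choice:

```python
def recognition_success(A, B, no_gesture=0, rho_min=0.25):
    """Check the three recognition conditions on per-sample pair sets.

    A, B : lists of (label, time) pairs (predictions and ground truth).
    Pairs carrying the `no gesture` label are removed first, giving
    A* and B*.  Assumed overlap: rho = 2|A* ^ B*| / (|A*| + |B*|)."""
    A_star = [(l, t) for (l, t) in A if l != no_gesture]
    B_star = [(l, t) for (l, t) in B if l != no_gesture]
    if not A_star or not B_star:
        return False
    # (a) a single label over all of A*
    if len({l for (l, _) in A_star}) != 1:
        return False
    # (b) the times of A* form one consecutive run (connectedness)
    times = sorted(t for (_, t) in A_star)
    if any(b - a != 1 for a, b in zip(times, times[1:])):
        return False
    # (c) overlap with the ground truth at least rho_min (0.25 here)
    inter = len(set(A_star) & set(B_star))
    rho = 2 * inter / (len(A_star) + len(B_star))
    return rho >= rho_min
```

For instance, a prediction covering times 40-60 of a gesture whose ground truth spans times 30-70 passes all three conditions, while a prediction with two different labels, or with a gap in its times, fails.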
The time response obtained for the proposed model is the average of the times that the model takes to process each 66-point window observation.
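This average per-window time can be measured as in the following sketch, where `process_window` is a hypothetical callable standing in for the full per-window pipeline (feature extraction plus network inference):

```python
import time
import numpy as np

def mean_window_time(process_window, windows):
    """Average the wall-clock time that `process_window` takes over a
    set of 66-point window observations."""
    elapsed = []
    for w in windows:
        start = time.perf_counter()
        process_window(w)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)

# Usage with a trivial stand-in pipeline over ten 66-point windows:
windows = [np.zeros((66, 8)) for _ in range(10)]
avg_seconds = mean_window_time(lambda w: w.sum(), windows)
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock, which suits short per-window measurements.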

Results
The averages of the classification and recognition accuracies, and the processing times obtained using the EMGs from the testing subset, are presented in Table 3. For experiments 3 and 4, we tested the algorithm for the correction of the orientation using 1, 2, and 4 synchronization EMGs, which for this work correspond to EMGs of the gesture wave out.
Finally, in Table 4, we present the classification and recognition accuracies grouped according to sex (female and male) and handedness (left- and right-handed) for experiment 4, with four synchronization EMGs. It is important to mention that the classes sex and handedness are not mutually exclusive. The time included in Table 4 corresponds to the time that the model takes to classify a 66-point window.

Discussion
For all the experiments, the difference between the classification and the recognition accuracies is due to the way in which these performances are computed. Let us remember that, for classification, the time at which a gesture is executed is not taken into account, unlike for recognition, where the time matters in the computation of the result.
The results presented in Table 3 show that the highest classification and recognition accuracies are obtained in experiment 1, where we simulate an ideal case in which a user always wears the bracelet in exactly the same orientation both for training and testing. It is worth pointing out that the results of experiment 1 are the targets for the results of the remaining three experiments. When, in experiment 2, we simulate the rotation of the sensor for testing and the recognition model does not include a correction of orientation, the classification and recognition accuracies deteriorate hugely, making the recognition model almost useless, since even a random model would achieve a classification accuracy of about 17% for six classes. In experiment 3, there is rotation of the sensor for testing, the training EMGs are acquired in the orientation recommended for the Myo armband, and the correction of orientation is included for the training and testing EMGs. For experiment 3, the classification and recognition accuracies improve significantly with respect to the accuracies of experiment 2. The highest accuracies of experiment 3 are obtained with four synchronization EMGs. In experiment 4, the training and testing EMGs are acquired in orientations different from the one recommended for the Myo armband, the orientation of the sensor for training is different from the orientation for testing, and the correction of orientation is included for training and testing; here, we also observe an improvement of the accuracies with respect to those of experiment 2. Once again, among the accuracies of experiment 4, the highest results are obtained with four synchronization EMGs.
The high standard deviations of the best accuracies of experiments 3 and 4 (both achieved with four synchronization EMGs) indicate that the recognition model with the correction of orientation proposed in this work has a performance that varies significantly from one user to another. This result could be caused by a failure of the algorithm for the correction of orientation for some users.
The results presented in Table 4 allow us to better understand the recognition and classification accuracy of the proposed model under the configuration set for experiment 4, with four synchronization EMGs. For the class sex, which is composed of the categories male (with 45 testing users) and female (with 25 testing users), the model achieved recognition accuracies of (93.90 ± 12.52)% and (95.89 ± 2.80)%, respectively. It is interesting to see that the proposed model produced, on average, better recognition accuracies for females than for males. However, the overlap of the confidence intervals µ ± σ of the recognition accuracy for females and for males suggests that this difference might not be statistically significant. The standard deviation of the classification and recognition accuracies is significantly lower for females than for males. Regarding the class handedness, on average, better recognition accuracy was achieved for the two left-handed testing users, who, according to our protocol for EMG acquisition, performed the gestures using their right hand. As in the case of the class sex, for the class handedness, we cannot conclude that there is a statistically significant difference between the performance of our model for left-handed and right-handed users, since the confidence intervals µ ± σ of the recognition accuracies of these two categories overlap. In-depth research needs to be performed to analyze whether there is a statistical difference between the classification and recognition accuracies for the classes sex and handedness.
Finally, regarding the processing time, the average times of all the experiments do not vary significantly from each other. This suggests that the algorithm for the correction of orientation adds little processing time, making its use possible in real-time recognition systems, which should respond in less than 300 ms.

Conclusions and Future Work
In this work, we have presented a model for hand gesture recognition. Together with this model, we have also presented an algorithm for correcting, by software, the orientation of the EMG bracelet that is used to measure EMGs on the forearm for recognizing, in real time, five gestures of the hand: wave in, wave out, fist, fingers spread, and double tap. The proposed gesture recognition model is user-specific, which means that it is trained and tested with EMGs of the same user, but its architecture is the same for all users. For the architecture of the recognition model, we use a shallow feed-forward artificial neural network, which takes as input a feature vector composed of the covariances between the channels of an EMG and a bag of five functions applied to each channel of an EMG. The algorithm for the correction of orientation is based on finding the index of the channel of maximum energy for a set of synchronization EMGs, which for this work corresponds to the gesture wave out.
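The correction idea, finding the channel of maximum energy over the synchronization EMGs and shifting the channels accordingly, can be sketched as follows. The sum-of-squares energy definition and the reference channel index are assumptions for illustration, not details taken from the paper:

```python
import numpy as np

def max_energy_channel(sync_emgs):
    """Return the index of the channel with maximum total energy over a
    set of synchronization EMGs (recordings of `wave out`).
    Energy is taken here as the sum of squared samples (an assumption)."""
    energy = np.zeros(8)
    for emg in sync_emgs:            # each emg: (n_samples, 8)
        energy += np.sum(emg ** 2, axis=0)
    return int(np.argmax(energy))

def correct_orientation(emg, sync_emgs, reference_channel=0):
    """Circularly shift the channels of `emg` so that the max-energy
    channel of the synchronization EMGs lands on `reference_channel`
    (the reference index is a hypothetical choice)."""
    shift = reference_channel - max_energy_channel(sync_emgs)
    return np.roll(emg, shift, axis=1)
```

Because the shift is circular, the same correction works for any of the eight simulated 45-degree rotations of the bracelet.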
The average classification and recognition accuracies of the proposed model are relatively high and have low standard deviations when the bracelet is placed in exactly the same orientation for training and testing. However, by simulating rotations of the bracelet for testing (which are very likely to occur in reality), we have demonstrated that the accuracies of the proposed model deteriorate so much that it behaves almost as badly as a random recognition algorithm. The inclusion, as part of the recognition model, of the algorithm for correcting the orientation of the bracelet significantly improves the classification and recognition accuracies. This improvement increases, on average, as more synchronization EMGs are included. In this work, using four synchronization EMGs allowed us to obtain average accuracies that are almost as good as when the bracelet is placed in exactly the same orientation for training and testing. However, the main drawback of the proposed model for gesture recognition is the high standard deviation of its accuracy when the algorithm for correcting the orientation of the EMG bracelet is included.
Even though the experiments and results presented in this paper were obtained using EMGs measured with the Myo armband, the proposed model has the potential to be used with EMGs acquired using other sensors (i.e., bracelets) whose designs are similar to that of the Myo armband, for example, the new EMG sensor gForce-pro.
Future work includes testing the proposed algorithm for correcting the orientation of the EMG bracelet with general gesture recognition models, as well as studying statistically the difference in recognition accuracy for the classes sex and handedness.

Funding:
The authors gratefully acknowledge the financial support provided by the Escuela Politécnica Nacional (EPN) for the development of the research project "PIGR-19-07 Reconocimiento de gestos de la mano usando señales electromiográficas e inteligencia artificial y su aplicación para la implementación de interfaces humano-máquina y humano-humano".

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Model Parameters
Appendix A.1. Hyper-Parameter Tuning
This appendix provides details of the hyper-parameter tuning process that we performed to obtain the best possible configuration for our hand gesture recognition model. For this purpose, we considered the number of hidden units, the regularization parameter λ, and the number of epochs (i.e., to analyze early stopping) as the hyper-parameters to tune. Table A1 summarizes the combinations tested over the validation user set (i.e., users 31-60); the configuration used for testing, which was the best configuration according to the recognition metric, is highlighted. Note that the results are sorted by recognition accuracy.
In Table A2, we present the confusion matrix of the best obtained model, which was selected as the baseline for this work. As mentioned before, the confusion matrix is a tool for analyzing classification results. It can be observed that the lowest sensitivity corresponds to relax (46.67%), which means that several relax EMGs were classified incorrectly. On the other hand, the highest precision is obtained for the gesture fingers spread (95.45%).
In Table A3, it can be observed that the classification results are similar across these tests (around 96%; see Tables A4-A6). On the other hand, the recognition accuracy results show the effect and importance of the post-processing module, which increases performance from 64.35% to 93.25%. These results justify the pre- and post-processing modules used in this hand gesture recognition model.