Label Self-Advised Support Vector Machine (LSA-SVM)—Automated Classification of Foot Drop Rehabilitation Case Study

Stroke represents a major health problem in our society. One of the effects of stroke is foot drop. Foot drop (FD) is a weakness that occurs in specific muscles of the ankle and foot, such as the anterior tibialis, gastrocnemius, plantaris and soleus muscles. Foot flexion and extension are normally generated by lower motor neurons (LMN), and the affected muscles impair both the downward and upward motion of the ankle and foot. One possible approach to FD is to investigate the movement based on the bio-signal (myoelectric signal) of the muscles. Bio-signal control systems such as electromyography (EMG) are used in rehabilitation devices that address foot drop; one such system is functional electrical stimulation (FES). This paper proposes new methods and algorithms to improve the performance of myoelectric pattern recognition (M-PR), to enhance automated rehabilitation devices, and to test these methodologies on offline and real-time experimental datasets. Label classification is a predictive data mining technique with many real-world applications, including the automatic labeling of resources such as videos, music, images and texts. We combine the label classification method with the self-advised support vector machine (SA-SVM) to create an adapted label classification method, named the label self-advised support vector machine (LSA-SVM). For the experimental data, we collected data from foot drop patients using a surface EMG (sEMG) device at the Metro Rehabilitation Hospital in Sydney, Australia, under Ethical Approval (UTS HREC NO. ETH15-0152). The experimental results on the EMG dataset and benchmark datasets demonstrate its benefits. Furthermore, the experimental results on the UCI datasets indicate that LSA-SVM achieves the best performance compared with SA-SVM and SVM. This paper describes the state-of-the-art procedures for M-PR and studies the possible system structures.


Introduction
Stroke represents a major health problem in today's society. One of the consequences of stroke is foot drop. Foot drop is a weakness that appears in specific muscles of the ankle and foot, such as the anterior tibialis (AT), gastrocnemius (Gas.), plantaris and soleus muscles. Foot flexion and extension are normally generated by the lower motor neurons (LMN), and lesions in the LMN cause foot drop. Foot drop manifests in 52% to 67% of patients with spinal upper motor neuron (UMN) lesions [1]. Foot drop (FD) is a common disorder that occurs regardless of age and affects around 1% of women and 2.8% of men [2]. Bio-signal control systems based on electromyography (EMG) are used in devices that target leg rehabilitation. These devices address various leg impairments, including foot drop. Electromyography, which records myoelectric signals from muscle activity, has been widely used to detect the user's intended action [3]. The EMG electrodes are placed on the subject's limb, either invasively or non-invasively. Many people prefer not to have electrodes implanted inside the body and instead use surface EMG (sEMG). However, sEMG has several disadvantages, such as crosstalk from other muscles and limited robustness. Furthermore, it is hard to acquire myoelectric signals from deeper muscles, which makes sEMG-based myoelectric signal processing difficult. Myoelectric pattern recognition (M-PR) methods can also be applied in artificial intelligence applications. Such a system involves multiple steps: first, the sEMG data is filtered to remove noise; then feature extraction and reduction are applied to reduce the large sample data. In addition, methodologies such as the support vector machine (SVM), cSA-SVM, vSA-SVM and label classification are studied in this paper. The support vector machine is one of the most popular machine learning methods for classifying data.
Vapnik [3,4] proposed the support vector machine as an influential classification method; various forms of SVM have been presented in the literature and applied in several different applications. SVM handles both two-class and multi-class problems. The SVM method yields an optimal decision boundary between two or more classes: the margin created separates the classes, and the distance to the decision boundary is maximized. To formulate binary classification over a training set of N samples, consider an input vector for the i-th sample, labeled according to its class. The purpose of SVM is to separate the binary-labeled training data with the hyperplane that has maximum distance from them; this is called the maximum margin hyperplane [3,5]. Through the tolerance parameters in the objective function and constraints, the standard SVM disregards, during the training stage, the training data that cannot be separated linearly by the kernels. For this reason, test data that are similar to, or match, the data misclassified in training will also be classified incorrectly. This happens because the data that are close to the misclassified data are left unspecified, which results in misclassification that is neither sensible nor controlled [6]. A non-repeating, self-advising method for SVM was introduced [7], which extracts additional knowledge from the training phase without adding extra parameters. The misclassified data come from two prospective sources: first, outliers, and second, data that have not been separated correctly [8]. Many researchers have adapted versions of SVM with the goal of raising classification efficiency and performance for particular applications. Using label classification on this experience-based knowledge is one such method, and it adds no cost since no extra parameter is introduced.
For example, the authors of [9,10] proposed novel texture-analysis methodologies to improve the single-label classification of facial features. Masood, A. et al. [6] suggested enhancing label classification with class classification techniques. In addition, they addressed the problem of the limited labeled data available, especially for histopathological images. They proposed a novel learning model, built on a deep belief neural network and a semi-advised SVM, to make effective use of labeled data along with unlabeled data in the training phase. It displayed improved performance when compared with various state-of-the-art approaches for skin cancer diagnosis, for which the model was used. Multi-label classification is a predictive data mining technique with multiple real-world applications, including the automatic labeling of resources such as videos, music, images and texts. Multi-label data can be learned by different methods, such as the problem transformation method (which has two common variants), the adaptation method, and ensembles of classifiers [3]. Several applications include research using label classification methods (LCM) to improve the search for related information on Twitter, where five different labels are defined to categorize tweets, including news.
A system was also proposed to analyze complex motion in events; it combines the tracking and multi-label hypergraphs of moving targets in video sequences [11]. The adaptation methodologies note that some classification models were primarily intended for resolving binary problems and were later extended to multi-class problems, whereas other methods can easily work with several classes. In this case study, a novel recognition system that integrates label classification methods with SA-SVM, yielding LSA-SVM for two-class classification, is used to overcome these problems and improve the reliability of the diagnosis process. It is important to develop computational tools for automated diagnosis that operate on quantitative measures. Such tools can provide objective mathematical judgment complementary to that of medical experts and help them identify the affected areas more efficiently, with more accurate diagnosis and less time wasted in treatment, while staying within the bounds required for real-time operation.

The Standard SVM
The basic idea of the SVM is the pair (w, b) that defines the separating hyperplane < w, x > + b = 0. SVM can produce a non-linear decision function by projecting the training data into a higher-dimensional inner product space, known as the feature space, through a non-linear map φ(x): R^n → R^d.
Although the optimal linear hyperplane is calculated in the feature space, kernels make it possible to carry out the necessary computations in the input space, using k(x_i, x_j) = < φ(x_i), φ(x_j) >, which is an inner product in the feature space. In terms of these kernels, the decision function can be written as Equation (1):

f(x) = sign( Σ_{i=1}^{N} α_i y_i k(x_i, x) + b )  (1)

The decision value for each x of the test set is either negative or positive, depending on the position of x relative to the hyperplane, as in Equation (2):

D(x) = Σ_{i=1}^{N} α_i y_i k(x_i, x) + b  (2)

Three kernel functions are common in SVM: the Radial Basis Function (RBF) kernel, the polynomial kernel and the sigmoid kernel. This paper uses the RBF kernel of Equation (3):

k(x_i, x_j) = e^{−γ ||x_i − x_j||^2}  (3)
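As a concrete illustration, Equations (1)-(3) can be reproduced with scikit-learn's RBF-kernel SVM. This is a hedged sketch on synthetic data; the blob layout and the parameter values C = 10 and γ = 1 are illustrative assumptions, not values from the study.

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated 2-D blobs: class -1 near the origin, class +1 near (3, 3).
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) * 0.3,          # class -1 samples
               rng.randn(20, 2) * 0.3 + 3.0])   # class +1 samples
y = np.array([-1] * 20 + [1] * 20)

# RBF-kernel SVM, Equation (3); gamma is the kernel width parameter.
clf = SVC(kernel="rbf", C=10.0, gamma=1.0).fit(X, y)

# Equation (2): the decision value of each point; Equation (1): the
# predicted label is the sign of that decision value.
decision_values = clf.decision_function(X)
predictions = np.sign(decision_values).astype(int)
```

For a binary SVC, the sign of `decision_function` agrees with `predict`, which is exactly the relationship between Equations (1) and (2).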

Self-Advised Support Vector Machine (SA-SVM)
The Advised-Weight SVM addresses the standard SVM's neglect of the information that can be obtained from misclassified training data. It creates advised weights, which depend on the distance between the misclassified and the correctly classified training data, and applies these weights together with the SVM decision values in the test phase, which helps the procedure reduce the influence of outlier data [12,13]. The Self-Advised SVM procedure is described by the following steps:
1. Find the classifying hyperplane by applying the decision function of Equation (4):

f(x) = sign( Σ_{i=1}^{N} α_i y_i k(x_i, x) + b )  (4)

2. Recognize the data samples misclassified in the first training phase. The misclassified data set (MD) of the training phase is given by Equation (5):

MD = { x_i | y_i f(x_i) < 0, i = 1, . . . , N }  (5)

The MD set may be empty, but empirical results show that the presence of misclassified data in the training phase is common. Note that any technique that tries to benefit from misclassified data must be controlled so that outlier data do not dominate; when the misclassified data resemble other samples, using them improves the classification accuracy [14].
3. If MD is empty, go to the testing phase; otherwise compute the neighborhood length (NL) for each x_i of MD, defined by Equation (6):

NL(x_i) = min_{x_j ∉ MD} ||x_i − x_j||  (6)

where x_j, j = 1, . . . , N, denotes the training data that do not belong to the MD set. If the training data are mapped to a higher dimension, the distance between x_i and x_j is evaluated with reference to the related RBF kernel, as in Equation (7):

||φ(x_i) − φ(x_j)|| = ( k(x_i, x_i) + k(x_j, x_j) − 2k(x_i, x_j) )^{0.5}  (7)

4. Calculate the advised weight AW(x_k) for each sample x_k of the test set using Equation (8):

AW(x_k) = max_{x_i ∈ MD, ||x_k − x_i|| ≤ NL(x_i)} ( 1 − ||x_k − x_i|| / NL(x_i) ), with AW(x_k) = 0 if no such x_i exists  (8)

These AWs represent how close the test data are to the misclassified data.
5. The absolute values of the SVM decision values for each x_k of the test set are computed and scaled into [0, 1].
6. Finally, for each x_k of the test set, Equation (9) applies: if AW(x_k) < decision value(x_k), then

y_k = sign( Σ_{i=1}^{N} α_i y_i k(x_k, x_i) + b )  (9)

which coincides with the normal SVM; otherwise, y_k takes the label of the misclassified training point x_i ∈ MD with ||x_k − x_i|| ≤ NL(x_i).
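The steps above can be sketched compactly in code. This is a hedged, minimal reading of the procedure on top of scikit-learn's SVC, not the authors' implementation; in particular, the advised-weight formula is one plausible interpretation of Equation (8) (depth inside a misclassified point's neighborhood, scaled to [0, 1]), and all data and parameters are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def sa_svm_predict(X_train, y_train, X_test, C=10.0, gamma=0.5):
    """Sketch of the SA-SVM test-phase rule (steps 1-6 above)."""
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_train, y_train)

    # Step 2: misclassified training data (MD), Equation (5).
    md_mask = clf.predict(X_train) != y_train
    MD, MD_labels = X_train[md_mask], y_train[md_mask]
    if len(MD) == 0:                      # Step 3: MD empty -> plain SVM.
        return clf.predict(X_test)

    # Step 3: neighborhood length NL(x_i), Equation (6): distance from each
    # misclassified point to its closest correctly classified neighbor.
    ok = X_train[~md_mask]
    NL = np.array([np.linalg.norm(ok - m, axis=1).min() for m in MD])
    NL = np.maximum(NL, 1e-12)            # guard against zero lengths

    # Step 5: |decision values| scaled into [0, 1].
    dv = np.abs(clf.decision_function(X_test))
    if dv.max() > 0:
        dv = dv / dv.max()

    # Step 4: advised weight AW(x_k) -- one plausible reading of Equation (8):
    # how deeply x_k falls inside the neighborhood of a misclassified point.
    d = np.linalg.norm(X_test[:, None, :] - MD[None, :, :], axis=2)
    AW = np.where(d <= NL[None, :], 1.0 - d / NL[None, :], 0.0).max(axis=1)

    # Step 6: when the advised weight dominates, take the label of the
    # nearest misclassified point instead of the plain SVM label.
    pred = clf.predict(X_test)
    advised = AW >= dv
    pred[advised] = MD_labels[d.argmin(axis=1)[advised]]
    return pred

# Overlapping blobs, so that some training points end up misclassified.
rng = np.random.RandomState(1)
X_tr = np.vstack([rng.randn(30, 2), rng.randn(30, 2) + 1.0])
y_tr = np.array([-1] * 30 + [1] * 30)
X_te = rng.randn(10, 2) + 0.5
pred = sa_svm_predict(X_tr, y_tr, X_te)
```

The early return in step 3 matches the algorithm's "if MD is null, go to the testing phase" branch.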

Label Classification
Label Classification (LC) covers two types of problems, Single-Label Classification (SLC) and Multi-Label Classification (MLC), which are supervised learning problems where sample data are associated with a single label or with multiple labels. Applications of SLC and MLC have grown across different fields, for example text classification, scene and video classification, bioinformatics and biomedical text data [15]. Binary Relevance (BR) is a common method for MLC: it treats every label as an independent binary problem, and its main weakness is that it does not directly model label correlations. Most existing methods that do model inter-dependencies between the labels add considerable complexity. Another approach used in MLC is problem transformation, where a multi-label problem is transformed into one or more single-label (binary or multi-class) problems. This enables single-label classifiers, whose single-label predictions are then transformed back into multi-label predictions. Problem transformation offers both flexibility and scalability and can employ Support Vector Machines, Naive Bayes, k-Nearest Neighbor methods and the Perceptron [16]. Let X ⊂ R^d be the input domain, so that each sample is a vector of d features x = [x_1, . . . , x_d], while the output domain is the set of labels L = {1, . . . , L}. Each sample x is related to a subset of these labels, represented as an L-vector y = [y_1, . . . , y_L], where y_j = 1 if and only if label j is associated with sample x, and 0 otherwise. Assume a set of training data D of N labeled patterns, D = {(x_i, y_i) | i = 1, . . . , N}. The multi-label accuracy over a set of N test samples can then be written as Equation (10):

Accuracy = (1/N) Σ_{i=1}^{N} |y_i ∧ ŷ_i| / |y_i ∨ ŷ_i|  (10)

Ensemble Classifier Chains (ECC) produce a vector of vote totals Ŵ = [ŵ_1, . . . , ŵ_L] ∈ R^L, where ŵ_j is the total for the j-th label. For the prediction vectors ŷ_1, . . . , ŷ_m from repetitions 1, . . . , m, the totals are evaluated as in Equation (11):

ŵ_j = Σ_{k=1}^{m} ŷ_{k,j}  (11)

A threshold function f_t is then applied to ŵ to obtain a bipartition into relevant and irrelevant labels: y = f_t(ŵ). Softmax functions offer single-label classification; in practice, they are applied to the multi-label scenario by problem transformation, and the softmax loss function can be modified for the multi-label scenario as shown in Equation (12) [17].
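To make the Binary Relevance transformation concrete, the following sketch trains one independent binary SVM per label and scores the result with a Jaccard-style accuracy in the spirit of Equation (10). The toy labeling rule (label j is "on" when feature j is positive) and all parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Toy multi-label problem: N samples, d features, L binary labels.
rng = np.random.RandomState(0)
X = rng.randn(60, 4)
Y = (X[:, :3] > 0).astype(int)            # shape (N, L) with L = 3 labels

# Binary Relevance: one independent binary classifier per label.
models = [SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, Y[:, j])
          for j in range(Y.shape[1])]

def predict_multilabel(X_new):
    # Stack the L single-label predictions back into a multi-label vector.
    return np.column_stack([m.predict(X_new) for m in models])

Y_hat = predict_multilabel(X)

# Multi-label (Jaccard-style) accuracy in the spirit of Equation (10):
# per-sample |y AND y_hat| / |y OR y_hat|, averaged over samples.
inter = np.logical_and(Y, Y_hat).sum(axis=1)
union = np.logical_or(Y, Y_hat).sum(axis=1)
accuracy = np.where(union > 0, inter / np.maximum(union, 1), 1.0).mean()
```

Because each label's classifier is trained in isolation, this sketch also makes BR's limitation visible: nothing in the training couples the L models, so label correlations are ignored, exactly as the text notes.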

Materials
The myoelectric pattern recognition system consists of two main parts, the software and the hardware, which are further elaborated in the phases of the real-time M-PR system, as shown in Figures 1 and 2. All phases of the real-time, sEMG-based pattern recognition are displayed in Table 1. In the experimental stages, the system collects data from healthy and unhealthy subjects. The collected data are then used to train the system, producing the classification output: the trained classifier and the OpenSim prediction and simulation of gait level. The collected EMG signals were processed on a personal computer (Intel Core i7, 2.8 GHz, 16 GB RAM) running the Windows 10 operating system. A band-pass filter was used to filter the signals in the 25-550 Hz frequency band, and a notch filter was applied to remove the 50 Hz line noise. The EMG signals were downsampled to 1000 Hz to reduce their size. To evaluate the proposed mSA-SVM, LSA-SVM and ELM-LSA-SVM, the experiment used 13 datasets from the UCI machine learning repository [18,19]. These databases were selected from the most popular benchmarks for classification and diagnosis, and their diversity supports the validation in this study. The number of instances and attributes of each database is shown in the tables of each section. It should be noted that we used 6 multi-class datasets and 7 two-class datasets [20].
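The pre-processing chain described above (band-pass 25-550 Hz, 50 Hz notch, downsampling to 1000 Hz) can be sketched with scipy.signal. The 2000 Hz acquisition rate, the synthetic test signal and the filter orders are assumptions for illustration; the paper only states the filter bands and the final 1000 Hz rate.

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, decimate

# Hypothetical raw sEMG trace: 2 s at an assumed 2000 Hz acquisition rate.
fs = 2000
t = np.arange(0, 2, 1 / fs)
emg = np.sin(2 * np.pi * 100 * t) + 0.8 * np.sin(2 * np.pi * 50 * t)  # 50 Hz = mains hum

# Band-pass 25-550 Hz: 4th-order Butterworth, applied zero-phase.
b_bp, a_bp = butter(4, [25 / (fs / 2), 550 / (fs / 2)], btype="band")
emg_bp = filtfilt(b_bp, a_bp, emg)

# Notch filter suppressing the 50 Hz power-line interference.
b_n, a_n = iirnotch(w0=50, Q=30, fs=fs)
emg_clean = filtfilt(b_n, a_n, emg_bp)

# Downsample 2000 Hz -> 1000 Hz (decimate applies an anti-aliasing filter).
emg_1k = decimate(emg_clean, 2)
```

Note that a 25-550 Hz band-pass is only realizable before downsampling, since 550 Hz exceeds the Nyquist frequency of a 1000 Hz signal; hence the assumed higher acquisition rate in this sketch.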

Procedure for Collecting sEMG Signal Data
The data were collected at the hospital, based on a design procedure that received ethical approval to collect data from FD patients at the Metro Rehabilitation Hospital in Sydney, Australia, under Ethical Approval (UTS HREC NO. ETH15-0152). For the experiment, data were collected from 13 subjects. These 13 subjects, involved in the offline experiment, consisted of 6 females and 7 males aged between 18 and 84 years, with an average age of 51 years. Ten of them were affected by foot drop; the other three were healthy, with no muscle disorder. During the experiment the subjects were seated with the knee in a fixed position, as shown in Figure 3, to avoid the influence of positional movements on the EMG signals. A few digital filters were applied during data collection: a band-pass filter between 25 and 550 Hz and a notch filter to remove the 50 Hz line noise. The EMG signals were downsampled to 1000 Hz. Data were collected for 12 s in each subject's trial, and for 156 s across all repetitions by the subject. Three-fold cross-validation was conducted for the offline classification. To measure the dorsal/plantar flexion range of the leg, we used a goniometric measurement tool, a protractor (angle finder and bevel square head), as shown in Figure 10. Table 2 presents the characteristics of the sick and healthy subjects from the Metro Rehabilitation Hospital. For the Metro Hospital dataset, we collected surface electromyography (sEMG) signals from 13 subjects, from the Rectus Femoris (RF), Gastrocnemius (Gas), Soleus (Sol) and Tibialis Anterior (TA). The OpenSim dataset for CG provides surface EMG signals recorded from ten subjects; sEMG signals were collected from the Medial Hamstrings (mH), Biceps Femoris long head (BF), Rectus Femoris (RF), Gastrocnemius (Gas) and Tibialis Anterior (TA). Each trial was 5 s with 2 repetitions, making 10 s in total.
Three trials were performed for each subject, giving 30 s per class; at 1000 samples per second this yields 30,000 samples per class and, with 4 classes, 120,000 samples per channel. With 4 channels, the entire sample data amounts to 480,000 samples. The collected data were divided into training and test data using 3-fold cross-validation.
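The sample-count arithmetic above can be verified directly (all values are taken from the text):

```python
# Sanity check of the quoted sample counts.
seconds_per_trial = 10        # 5 s x 2 repetitions
trials_per_subject = 3
sampling_rate = 1000          # samples per second
n_classes = 4
n_channels = 4

seconds_per_class = trials_per_subject * seconds_per_trial      # 30 s
samples_per_class = seconds_per_class * sampling_rate           # 30,000
samples_per_channel = samples_per_class * n_classes             # 120,000
total_samples = samples_per_channel * n_channels                # 480,000
```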

Method: Label Self-Advised Support Vector Machine (LSA-SVM)
The following concepts describe the LSA-SVM, in which the misclassified data are handled by calculating the neighborhood length using the labels of the data instead of the feature values used in the single-classification method. This minimizes the processing time and allows large data to be processed.
1. Apply the decision function, as in Equation (13), to find the classifying hyperplane:

y_k = sign( Σ_{i=1}^{N} α_i y_i k(x_i, x) + b )  (13)

where x_i is the input vector of the i-th sample, labeled with y_i according to its class, and α_i is the non-negative Lagrange multiplier, as in standard SVM training.
2. Recognize the data samples misclassified in the first training phase. The misclassified data set (MD) of the training phase is given by Equation (14):

MD = { x_i | y_i f(x_i) < 0, i = 1, . . . , N }  (14)

3. If MD is empty, go to the testing phase; otherwise calculate the neighborhood length (NL) for each label y_i of MD, defined by Equation (15):

NL(y_i) = min_{y_j ∉ MD} ||y_i − y_j||  (15)

where y_j, j = 1, . . . , N, are the labels of the training data that do not belong to the MD set. When the labels of the training data are mapped to a higher dimension, the distance between y_i and y_j is computed according to Equations (16) and (17), with reference to the related RBF kernel:

||φ(y_i) − φ(y_j)|| = ( k(y_i, y_i) + k(y_j, y_j) − 2k(y_i, y_j) )^{0.5}  (16)

k(y_i, y_j) = e^{−γ ||y_i − y_j||^2}  (17)

4. For each label y_k of the test data, compute the label advised weight LAW(y_k) as in Equation (18). These LAWs represent how close the label test data are to the labels of the misclassified data.
5. The absolute value of the SVM decision value for each x_k of the test set is calculated and scaled into [0, 1]. Then, for each y_k of the labels of the test set: if LAW(y_k) < decision value(y_k), then

y_k = sign( Σ_{α_j > 0} y_i α_j k(x_k, x_i) + b )

which is compatible with normal SVM labeling; otherwise

y_k = y_i such that ||y_k − y_i|| ≤ NL(y_i) and x_i ∈ MD.

Figure 11 shows a flow chart of the steps above.
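The essential change from SA-SVM is that the neighborhood length is computed over scalar class labels rather than feature vectors, which makes each pairwise distance a single subtraction. A minimal sketch of that label-based NL and of the step-5 decision rule follows; the function names and example labels are hypothetical, and this is one reading of the procedure rather than the authors' implementation.

```python
import numpy as np

def label_neighborhood_length(md_labels, ok_labels):
    """NL over labels, in the spirit of Equation (15): for each misclassified
    label y_i, the minimum |y_i - y_j| over labels y_j outside MD. Labels are
    scalars, so this is far cheaper than the feature-space distances used in
    SA-SVM -- the speed advantage claimed for LSA-SVM."""
    md_labels = np.asarray(md_labels, dtype=float)
    ok_labels = np.asarray(ok_labels, dtype=float)
    return np.array([np.abs(ok_labels - yi).min() for yi in md_labels])

def lsa_svm_decide(decision_value, law, svm_label, advising_label):
    """Step 5: keep the plain SVM label when the scaled decision value
    dominates the label-advised weight LAW; otherwise take the advising
    MD label."""
    return svm_label if law < decision_value else advising_label

# Illustrative class labels: misclassified labels {1, 3}, others {1, 2, 4}.
nl = label_neighborhood_length([1, 3], [1, 2, 4])
kept = lsa_svm_decide(decision_value=0.9, law=0.4, svm_label=-1, advising_label=1)
```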

Experiments and Results
The experiments evaluate the performance of LSA-SVM for single-class classification, conducted through the state-of-the-art pattern recognition system shown in Figure 12. This flowchart describes each of the methods applied to obtain the output: classifying subjects as healthy or unhealthy using the novel LSA-SVM.

Experiments on Hospital Datasets
Various experiments were performed to test the performance of LSA-SVM in myoelectric pattern recognition. First, we adjusted the parameters C and g over the range (2^−9, 2^−8, . . . , 2^9, 2^10), as shown in Table 3; we then examined the classification accuracy of v-SVM, c-SVM, v-SA-SVM, c-SA-SVM and LSA-SVM, comparing all the classifiers. Some analysis is given for each experiment: the first experiment used the first dataset (FD) and the second dataset (CG) to estimate the accuracy of each classifier, while the second experiment used the third group of datasets, from UCI. Figure 13 and Table 4 show that the accuracy of LSA-SVM was higher than that of SA-SVM and the other classifiers on all dataset types. The average accuracy of vLSA-SVM compared with the other five classifiers is slightly higher, at 99.06%, versus 98.75% for vSA-SVM on the FD Hospital dataset. For the CG OpenSim dataset, vLSA-SVM gives 82.01% and vSA-SVM 80.01%, with five-fold cross-validation training. Figure 14 shows that the training times of vSA-SVM on the two groups of datasets are much faster than those of vLSA-SVM. In this respect, the LSA-SVM method did not achieve the best performance, with a time consumption of 69.4 ms for vLSA-SVM against 46.3 ms for vSA-SVM, but it still met the real-time requirement of less than 300 ms, the standard real-time threshold [22,23]. Overall, in most cases, the label-function adaptation used in LSA-SVM improves the performance of the classical support vector machine and of SA-SVM.
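The C and g tuning described above is a standard grid search; a hedged sketch with scikit-learn follows. The synthetic data stand in for the hospital features, and the grid is deliberately reduced from the paper's 2^−9 ... 2^10 range to keep the example fast.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic two-class stand-in for the hospital features.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(25, 3), rng.randn(25, 3) + 2.0])
y = np.array([0] * 25 + [1] * 25)

# Powers-of-two grid for C and gamma (reduced range for speed).
param_grid = {"C": [2.0 ** p for p in range(-3, 4)],
              "gamma": [2.0 ** p for p in range(-3, 4)]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
best_C, best_gamma = search.best_params_["C"], search.best_params_["gamma"]
```

Cross-validated accuracy over the grid mirrors the five-fold validation used for the hospital datasets.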

Experiments on UCI Datasets
LSA-SVM proved capable of classifying four classes of leg movements: three classes of unhealthy legs (mild, moderate and severe patients) and a fourth, healthy class. This section investigates the performance of LSA-SVM on the benchmark datasets accessible online from the UCI machine learning repository. The validation scheme depends on the size of the data: we performed 3-fold cross-validation on the larger datasets and 5-fold cross-validation on the small and medium-sized datasets. Table 6 shows the data specification of the 2-class benchmark dataset types. This experiment involved six classifiers: v-SVM, c-SVM, c-SA-SVM, v-SA-SVM, c-LSA-SVM and v-LSA-SVM. The optimal parameters were established, noting their effects on the accuracy and time performance of each classifier; Table 7 lists all the parameters used in this experiment. Table 8 and Figure 15 show that LSA-SVM performed reasonably across the seven different 2-class datasets. A comparison of LSA-SVM and SA-SVM shows that their average accuracies are quite similar on some datasets. Within LSA-SVM, the accuracy of v-LSA-SVM is significantly better than that of c-LSA-SVM only on the "Breast Cancer" dataset, at 92.09%. The results show that LSA-SVM is the most accurate classifier across the seven datasets, reaching 96.42% on "Australian Credit (Statlog)", except on the "Pima Indians Diabetes" and "Spambase" datasets, where it reaches 83.31% and 62.52%, respectively. The processing time (time consumption) of the classifiers was also measured; Table 9 reports the training times. Figure 16 shows that LSA-SVM is one of the slowest classifiers compared with the others across all datasets. Its performance worsens when vLSA-SVM works on big data such as the "Skin Segmentation" dataset, at 133.1 ms.
The cLSA-SVM is the slowest classifier, taking around 267 ms to learn the "Skin Segmentation" dataset, while vSA-SVM shows the fastest time, at 0.537 ms.

Conclusions
LSA-SVM has an advantage over SA-SVM in that it works on label data instead of the data values. In addition to myoelectric leg-motion classification, LSA-SVM has been applied to a wide collection of classification problems using UCI machine learning datasets. The experimental results show that LSA-SVM performs well over a wide range of dataset sizes (small to large). Overall, LSA-SVM is a promising classifier for several classification applications, particularly myoelectric pattern recognition. In the training test, executed to compare the results of LSA-SVM with all the other algorithms, the p-value of the ANOVA test at significance level α = 0.05 was 0.038, which indicates a statistically significant difference between these groups. Therefore, it can be concluded that LSA-SVM achieved better results than these algorithms. In addition, 68.80% sensitivity and 76.5% specificity were achieved in classifying the hospital testing data. The Label Self-Advised Support Vector Machine (LSA-SVM) was implemented as an extension of the Self-Advised Support Vector Machine (SA-SVM) for leg-motion recognition using sEMG signals. Overall, LSA-SVM could classify four leg movements with an accuracy of 99.06%, making it comparable with renowned classifiers such as SA-SVM and SVM. Therefore, LSA-SVM can improve the performance of advised-based SVM.
This study presented a new label classification method, called the Label Self-Advised Support Vector Machine (LSA-SVM), to diagnose leg movements in foot drop patients. Data were collected using a surface electromyography (sEMG) device from foot drop patients at the Metro Rehabilitation Hospital in Sydney, Australia, under Ethical Approval (UTS HREC NO. ETH15-0152). The experimental results on the sEMG dataset and on the UCI and OpenSim benchmark datasets demonstrate its benefits.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: