Classification of Guillain–Barré Syndrome Subtypes Using Sampling Techniques with Binary Approach

Abstract: Guillain–Barré Syndrome (GBS) is an unusual disorder in which the body's immune system attacks the peripheral nervous system. GBS has four main subtypes, whose treatments differ. Severe cases of GBS can be fatal. This work aimed to investigate whether balancing an original GBS dataset improves the predictive models created in a previous study. Balancing a dataset means pursuing symmetry in the number of instances of each class. The dataset includes 129 records of Mexican patients diagnosed with some subtype of GBS. We created 10 binary datasets from the original dataset. Then, we balanced these datasets using four different methods to undersample the majority class and one method to oversample the minority class. Finally, we used three classifiers with different approaches to create predictive models. The results show that balancing the original dataset improves the previous predictive models. The goal of the predictive models is to identify the GBS subtypes by applying Machine Learning algorithms. It is expected that specialists may use the model as a complementary diagnostic tool that relies on a reduced set of relevant features. Early identification of the subtype will allow the appropriate treatment to be started for patient recovery. This work contributes to exploring the performance of balancing techniques with real data.

Author contributions: Software, M.T.-V. and J.H.-T.; Validation, M.T.-V., J.H.-T., O.C.-B., and B.H.-O.; Writing–original draft preparation, M.T.-V., J.H.-T., and O.C.-B.; Writing–review and editing, M.T.-V., J.H.-T., O.C.-B., and B.H.-O.


Guillain-Barré Syndrome
Guillain-Barré Syndrome (GBS) was first described in 1916 by Guillain, Barré and Strohl. It is a rare acute paralytic polyneuropathy with four principal clinical variants. It is an autoimmune disorder of the peripheral nervous system [1]. GBS is characterized by rapid development, normally over a few days and up to four weeks, with an incidence of about one to two per 100,000 people. It occurs in adults and children. GBS can damage the nerves controlling movement, pain, temperature, and touch sensations [2]. In critical cases, GBS may lead to respiratory failure and can be fatal. The progression of GBS can be described in three phases:
1. Initial phase: evolution of symptoms lasting days to up to four weeks.
2. Plateau phase: lasting weeks to months.
3. Recovery phase: remyelination, lasting weeks to months. Critical patients can take two years or more. Full recovery is not achieved in some cases.
The exact cause is unknown, but GBS is frequently associated with a respiratory or gastrointestinal infection. Cytomegalovirus and Zika are associated with GBS [3].
The main GBS subtypes are [4]: AIDP, AMAN, AMSAN, and MF. Table 1 describes the characteristics of each of the GBS subtypes [5].

Table 1. Features of GBS subtypes [5].

AIDP
Most common variant (85% of cases). Primarily motor inflammatory demyelination ± secondary axonal damage. Maximum of four weeks of progression.
Macrophages invade intact myelin sheaths and denude the axons.

AMAN
Motor only with early and severe respiratory involvement. Primary axonal degeneration. Often affects children, young adults. Up to 75% positive Campylobacter jejuni serology.
Macrophages invade the nodes of Ranvier where they insert between the axon and the surrounding Schwann-cell axolemma, leaving the myelin sheath intact.

AMSAN
Motor and sensory affection with critical course of respiratory and bulbar involvement. Primary axonal degeneration with poorer prognosis.
Similar to AMAN but also involving ventral and dorsal roots.
Abnormality in sensory conduction, although the underlying pathology is not clear.
The first approach in the diagnosis of GBS is based on clinical features, since this is a non-invasive method. Nevertheless, diagnostic mechanisms such as cerebrospinal fluid (CSF) analysis and electrodiagnostic studies are useful to determine the specific subtype from which the patient is suffering [6]. These methods have several disadvantages, since they are invasive and costly. In this exploratory study, we used different sampling methods to balance the GBS multiclass dataset. We aimed to create different predictive models using real data to identify which of the four main GBS subtypes a patient suffers from, applying Machine Learning algorithms. It is expected that specialists may use the model as a complementary diagnostic tool that relies on a reduced set of relevant features. Early diagnosis of the GBS subtype is essential due to the rapid progress of this disorder. The treatments vary according to the subtype. Sequelae and economic costs can be high unless proper treatment is started immediately.

Imbalanced Data Classification
A dataset is imbalanced when one of its classes (the minority class) has fewer instances than the other class (the majority class) [7]. An instance is a row in a dataset. In this study, there are 129 instances, each belonging to a patient diagnosed with some type of GBS. Classes are the groups into which the data in a dataset are divided. For example, in this work, there are four classes in the original dataset, and each class represents a subtype of GBS. Standard classifiers are designed to work with balanced datasets. When a dataset is imbalanced, the classifiers favor the majority class in their decisions and tend to ignore the minority class. This affects the performance of the classifiers because, in real-life cases, the minority class is generally the one that needs to be identified [8]. For example, in cancer diagnosis, there are more healthy patients than patients diagnosed with the disease. If we apply a classifier to imbalanced data to identify cancer patients, the classifier biases the result towards healthy patients (majority class) and ignores cancer patients (minority class). The accuracy will be high; however, it is more important to identify cancer patients than healthy patients.
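The cancer example can be made concrete with a small sketch. The counts below (95 healthy patients, 5 cancer patients) are hypothetical and chosen only for illustration; the "classifier" simply predicts the majority class for everyone, which is the extreme form of the bias described above.

```python
# Hypothetical illustration: 95 healthy patients (majority) and 5 cancer
# patients (minority). These counts are invented for the example.
y_true = ["healthy"] * 95 + ["cancer"] * 5

# A degenerate classifier that always predicts the majority class.
y_pred = ["healthy"] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of all instances that were labelled correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of the instances of class `positive` that were found."""
    found = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    total = sum(t == positive for t in y_true)
    return found / total


acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred, "cancer")
print(f"accuracy = {acc:.2f}, recall on cancer = {rec:.2f}")
```

The accuracy looks high even though no cancer patient is detected, which is why accuracy alone can be misleading on imbalanced data.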
There are two types of imbalanced data. Binary imbalance occurs when a dataset consists of two classes and one of them (the minority class) has fewer instances than the other (the majority class). Multiclass imbalance, on the other hand, is present when the dataset has more than two classes and the classes have unequal numbers of instances [9]. There are three main methods used in the literature to handle imbalanced data:
* Algorithm Level: It modifies the algorithm, generally by adding more weight to the minority class. This method requires a deep knowledge of the operation of the algorithm to be modified. Each algorithm must be adapted to the dataset to be used.
* Data Level: It consists of balancing the training set by matching the majority class with the minority class. This method is known as preprocessing since the data are modified before the classification algorithm is applied. Standard classifiers are designed to work with a balanced dataset. The advantages of these methods are that they are easy to configure and can be used with any classification algorithm. There are three sampling methods:
  * Undersampling: It consists of eliminating instances of the majority class until their number matches that of the minority class. Other undersampling variants eliminate instances in a directed manner, such as noise or instances that lie on the border of the decision area.
  * Oversampling: This method adds instances to the minority class until it is balanced with the majority class. There are different variants of oversampling. For example, Random Oversampling (ROS) randomly selects existing instances and adds copies of them. SMOTE is one of the most successful methods for oversampling. It adds synthetic instances to the minority class. There are also variants of SMOTE which have demonstrated great precision.
  * Hybrid: It is the combination of different oversampling and undersampling methods.
* Cost-sensitive: It combines the Data Level and Algorithm Level methods and takes into account the costs associated with misclassification.
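As a minimal sketch of data-level balancing, Random Oversampling can be written in a few lines. The function name `random_oversample`, the toy rows, and the fixed seed are assumptions made for illustration; this is not the implementation used in the study.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until both classes have the
    same number of instances. Assumes a binary problem."""
    rng = random.Random(seed)
    counts = Counter(labels)
    (maj_label, n_maj), (min_label, n_min) = counts.most_common(2)
    minority_idx = [i for i, y in enumerate(labels) if y == min_label]
    # Draw (with replacement) as many minority rows as are missing.
    extra = [rng.choice(minority_idx) for _ in range(n_maj - n_min)]
    new_rows = list(rows) + [rows[i] for i in extra]
    new_labels = list(labels) + [min_label] * len(extra)
    return new_rows, new_labels


# Toy data (hypothetical): five majority rows and two minority rows.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9], [1.0]]
labels = ["maj"] * 5 + ["min"] * 2
ros_rows, ros_labels = random_oversample(rows, labels)
print(Counter(ros_labels))
```

Because the added rows are exact copies, ROS adds no new information, which is one motivation for synthetic approaches such as SMOTE.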
Preprocessing methods have shown that balancing the training set by oversampling and undersampling classes significantly improves classifier results on imbalanced data [10][11][12].
The goal of this research was to identify the best algorithm to balance the Guillain-Barré Syndrome (GBS) dataset by applying different data balancing techniques at the data level, oversampling the minority class and undersampling the majority class. In the specialized literature, there are no studies to classify the subtypes of GBS using Machine Learning algorithms. In previous studies [13,14], predictive models were created to classify the four main GBS subtypes using different classifiers. These models were created using an imbalanced dataset and obtained an accuracy of 90%. In this experimental study, the data were preprocessed using different balancing techniques to balance the original dataset, so that the classifiers use balanced data and we can determine whether it is possible to outperform the previously created models. The results show that balancing the data helps the performance of predictive models. In some cases, the models exceeded the 90% accuracy obtained previously.
In this study, we try to make the number of instances of each subtype symmetrical by applying four different undersampling algorithms: Random Undersampling (RUS), Tomek Link (TML), One Side Selection (OSS), and Neighborhood Cleaning Rule (NCR). Then, we compared these results with those found by the Synthetic Minority Oversampling Technique (SMOTE) using different percentages of oversampling. We binarized the multiclass dataset with two different techniques: One versus All (OVA) and One versus One (OVO). We used three classifiers with different approaches: Decision tree (C4.5), Support Vector Machines (SVM) and JRip. Decision tree and JRip create predictive models that are understandable by humans. This is an advantage, especially in this case, since the models obtained may be useful for physicians to diagnose GBS subtypes. Moreover, C4.5, JRip, and SVM stand out for their excellent results in classification tasks.
The goal was to investigate whether data balancing techniques make it possible to create a predictive model that differs in a statistically significant way from a predictive model built with imbalanced data.
This article is organized as follows. In Section 2, we show a literature review. In Section 3, we present a description of the dataset, machine learning algorithms and the performance measure used in the study. Section 4 describes the experimental procedure. In Section 5, we show and discuss the experimental results. Finally, in Section 6, we summarize results, provide conclusions, and suggest future work.

Related Work
In real life, imbalanced data are frequent in medical diagnosis and in the identification of variants of diseases. The main problem arises because there are more cases of healthy patients than of patients with a disease. For this type of challenge, researchers have applied data preprocessing techniques which consist of oversampling the minority class or undersampling the majority class. These techniques have shown that balancing datasets significantly improves the performance of classifiers.
In [15], Han and coworkers proposed Distribution-Sensitive (DS). This is an oversampling algorithm for Medical Diagnosis for imbalanced data. DS analyzes the position of the minority class instances and carefully classifies them into noise samples, unstable samples, limit samples, and stable samples. Each of these samples is processed differently by the algorithm. The objective is to choose the most suitable sample to synthesize new samples. Authors apply sample synthesis methods according to the closeness among surrounding samples, and thus guarantee that the newly synthesized samples and the original minority samples share characteristics. The results showed that the accuracy of the classification algorithm is improved.
Bach et al., in 2016 [16], analyzed a dataset of 729 patients. In total, 92.6% belonged to healthy cases and 7% of cases suffered from Osteoporosis. For this imbalanced data, the authors applied oversampling and undersampling methods to detect patients with Osteoporosis. To oversample the dataset, they applied SMOTE. To undersample, they used two different methods, Random Undersampling (RU) and Edited Nearest Neighbours (ENN). Bach found that SMOTE at 300% combined with ENN gave the best results.
In Kalwa et al. [17], a smartphone application was used to diagnose melanoma, a type of skin cancer considered the most deadly and difficult to treat in advanced stages. The application analyzes images and compares them with 200 images from a public dataset. This research uses SMOTE to oversample the cases of melanoma patients. The results were compared with those obtained without any preprocessing technique; SMOTE obtained better performance than the data that were not oversampled.
In [18], Le et al. propose a framework for self-care problems detection of children with physical and motor disabilities. This research uses SMOTE to improve the prediction for the SCADI (Self-Care Activities Dataset) dataset. The results show that extreme gradient boosting using SMOTE outperforms Artificial Neural Network, Support Vector Machine and Random Forest (RF). The accuracy of their framework reaches 85.4%.
Fazal proposes a Hybrid Prediction Model (HPM) [19]. This study analyzes a dataset to improve early diagnosis of Type 2 Diabetes and Hypertension. HPM consists of Density-based Spatial Clustering of Applications with noise-based outlier detection, SMOTE, and RF. The authors successfully predict diabetes and hypertension using three benchmark datasets.
Elreedy et al. [20] conducted an experimental study to explore SMOTE performance factors, analyzing the relationship between the number of records created and the dataset dimension. They also analyzed the performance of some classifiers and the effects of applying SMOTE. Finally, they included in the study some variants of SMOTE such as Borderline_SMOTE1, Borderline_SMOTE2 and ADASYN and their performance. For this work, they used five public datasets taken from UCI. They found that SMOTE improves the performance of the classifiers; however, the improvement varies from one type of classifier to another. They also found that the more examples of the minority class exist, the greater the accuracy. This is because the K-nearest neighbor patterns become closer to each other. They concluded that SMOTE can be used in classification problems for small datasets since increasing the size of the data improves the classification performance.
In [21], Devi and coworkers presented a modification of the Tomek Link undersampling algorithm, based on the fact that, in addition to class imbalance, there are other factors such as the existence of redundant borderline records and outliers in the data space that critically reduce the performance of classifiers. They used 10 public UCI datasets and four single classifiers for their experiments. The proposed algorithm facilitates the removal of redundant boundary records rather than simple boundary ones, with the aim of creating a sparse majority region near the decision boundary. This may help convergence towards a balanced class distribution. This undersampling method achieves less loss of information and better performance.
Bach et al. [22] compared four different undersampling methods to balance data (Edited Nearest Neighbor, Neighborhood Cleaning Rule, Tomek Link, and Random Undersampling) against their proposed algorithm, called KNN_Order. This algorithm removes records from high-density areas to minimize loss of information. They tested the performance of this algorithm using 18 public datasets.
In addition to class imbalance and noise, the superposition of instances of different classes affects the performance of classifiers. In [23], they proposed to remove potentially overlapped data points to tackle binary class imbalance, using Neighborhood search with different criteria. This method identifies and eliminates instances of the majority class. They use 66 synthetic datasets and 24 public datasets of UCI and Keel repository in their experiments. These methods were compared with other balancing methods, achieving competitive performance over traditional methods.
In [24], Kovacs et al. performed a detailed comparison of 85 variants of oversampling techniques for the minority class. They used 104 imbalanced datasets as well as four classifiers for their experiments. They found that oversampling leads to better results in classification on imbalanced datasets. Regarding SMOTE variants, polynom-fit-SMOTE, ProWSyn, and SMOTE-IPF gave the best results.
In [25], the authors introduced Farthest SMOTE (FSMOTE), a modification of SMOTE. This approach increases the decision area, considering minority samples closer to the boundary. They compared it with different oversampling methods: SMOTE, ADASYN, borderline SMOTE, and safe-level SMOTE. For experiments, they used seven datasets and two classifiers: Naive Bayes and SVM. Results showed that FSMOTE improves on the existing techniques.
Debashree and coworkers [26] proposed a modification of the Tomek-Link undersampling method. They present a solution to class imbalance and classes overlapping, as these two problems affect the performance of standard classifiers. The objective of their research was overlapping region detection, cleaning up of overlapping region, undersampling of the majority records, and an effective data-preprocessing framework. The proposed model increases the performance of the minority class while maintaining an intact majority class performance.
On the other hand, there are several studies employing bioinformatics techniques, such as microarray tests [27]. However, the most significant disadvantage of microarrays is the high cost of a single experiment.
Data balancing through sampling methods can be applied to any imbalanced dataset, regardless of the subject. In finance, for example, classification can be improved: in [28], SMOTE was applied to create financial risk models. These models help companies to prevent threats from the external economic environment or bad financial decisions. In this study, the authors used 2628 Chinese companies listed on the stock exchange. The imbalance occurs because there are more companies with healthy finances (2190 belonging to the majority class) than companies with financial risk (438 belonging to the minority class). They performed three types of experiments. In the first experiment, they used the imbalanced data and applied Adaboost and Support Vector Machine (SVM). In the second experiment, they applied data balancing with SMOTE and subsequently applied Adaboost with SVM. For the third experiment, they executed Adaboost with SVM; however, SMOTE worked at the same time as the classifiers. The results show that balancing the data improved on the models built with the imbalanced data. Among the balanced models, the third model showed a significant improvement over the second model.
Online banking operations using credit cards have been increasing every day; with this growth, credit card fraud is also more common. In a study of this problem, the results showed that the best classifiers were Bagging and SVM. SMOTE-ENN obtained the best performance compared to the other oversampling methods. For the undersampling methods, TL obtained the best performance.
Phishing is a technique used by cybercriminals to deceive and obtain personal information such as passwords, credit card data, and bank account numbers. This is achieved through fraudulent emails. The large amount of mail sent and received can help build models with Machine Learning algorithms that help predict future cyber-attacks. However, most of the emails that reach the inbox are legitimate rather than phishing emails. This results in an imbalance of data. In [30], the authors used SMOTE to balance a dataset with 812 instances obtained from the UCI Machine Learning Repository. The dataset is divided into three classes (phishy, suspicious and legitimate). Three algorithms were used to create the models (Support Vector Machine, Random Forests, and XGBoost). The results show that the models built on the imbalanced data performed poorly, whereas those built on data balanced with SMOTE achieved better performance.

Dataset
The dataset used in this work consists of records of 129 patients diagnosed with Guillain-Barré Syndrome (GBS). They received treatment for one of the four subtypes of GBS: AIDP, AMAN, AMSAN and MF. The data were collected at the Instituto Nacional de Neurología y Neurocirugía. Table 2 shows the characteristics of the dataset. Table 3 shows the 16 relevant features selected in a previous study [31]. These attributes were selected from the original dataset with 365 features. The features V22, V29, V30, and V31 are integer values; the remaining ones are decimal.

Imbalance Ratio
In binary classification, it is common to find real-life cases where highly imbalanced data are present. An example is credit card fraud detection, where there are usually more correctly executed operations than fraudulent ones [32]. However, when the number of records of one class is similar to that of another, it is not easy to determine whether a dataset is imbalanced. For example, in [33] the researchers classified three different types of pediatric brain tumors with a dataset of 90 patients divided into three classes of 38, 42, and 10 patients. In cases like this, there is no consensus among experts in the field on whether there is an imbalance of data between classes.
The imbalance ratio (IR) is the widely accepted measure for determining whether data are imbalanced. As Equation (1) shows, IR is the ratio of the number of records of the majority class, $N_{maj}$, to the number of records of the minority class, $N_{min}$ [34]:

$$\mathrm{IR} = \frac{N_{maj}}{N_{min}} \qquad (1)$$

A dataset can be considered imbalanced if IR > 1.5 [35].
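As a brief worked example, the imbalance ratio and the 1.5 threshold can be computed as follows. The helper names `imbalance_ratio` and `is_imbalanced` are illustrative assumptions; the first set of counts is taken from the financial-risk dataset cited in the Related Work section (2190 healthy companies versus 438 at risk), and the second is hypothetical.

```python
def imbalance_ratio(n_majority, n_minority):
    """IR = number of majority-class records / number of minority-class records."""
    return n_majority / n_minority


def is_imbalanced(n_majority, n_minority, threshold=1.5):
    """Apply the IR > 1.5 rule of thumb quoted in the text."""
    return imbalance_ratio(n_majority, n_minority) > threshold


# Counts from the financial-risk study cited in Related Work.
ir_finance = imbalance_ratio(2190, 438)
print(ir_finance, is_imbalanced(2190, 438))

# Hypothetical near-balanced dataset: 60 versus 50 records.
ir_mild = imbalance_ratio(60, 50)
print(round(ir_mild, 2), is_imbalanced(60, 50))
```

Under this rule the first dataset counts as imbalanced (IR = 5), while the second, with IR = 1.2, does not.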

Machine Learning Algorithms
In this study, we include four undersampling methods with different approaches. These methods have demonstrated their success in improving the performance of classifiers by eliminating instances of the majority class [36]. We applied these methods to investigate whether eliminating random instances of the majority class affects the performance of classifiers. On the other hand, it has been shown that not only the imbalance between classes but also factors such as noise affect the performance of classifiers [37]. For this reason, we apply three different undersampling methods for noise elimination. We also apply SMOTE, the most commonly used method for oversampling the minority class with synthetic data, using six different synthetic oversampling percentages. This method has demonstrated its success with imbalanced datasets [38]. We used three classifiers from different families because we wanted to investigate which of them achieves the best performance compared to those reported in previous studies using the imbalanced dataset.

Random Undersampling (RUS)
RUS is a non-heuristic method that reduces data at random. RUS takes the majority class and randomly removes instances according to the percentage requested. The objective is to reduce the majority class until the desired balance between the two classes is reached [39]. One of the advantages of this method is that it decreases the run time [40].
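A minimal sketch of RUS for the fully balanced case is shown below. The function name `random_undersample`, the toy data, and the fixed seed are assumptions for illustration; the study itself used an R package, not this code.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly discard majority rows until both classes have the same size.
    Assumes a binary problem."""
    rng = random.Random(seed)
    counts = Counter(labels)
    (maj_label, n_maj), (min_label, n_min) = counts.most_common(2)
    majority_idx = [i for i, y in enumerate(labels) if y == maj_label]
    # Keep a random subset of the majority class, as large as the minority class.
    keep_majority = set(rng.sample(majority_idx, n_min))
    keep = [i for i, y in enumerate(labels)
            if y == min_label or i in keep_majority]
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy data (hypothetical): six majority rows and two minority rows.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
labels = ["maj"] * 6 + ["min"] * 2
rus_rows, rus_labels = random_undersample(rows, labels)
print(Counter(rus_labels))
```

Since the discarded rows are chosen blindly, RUS may drop informative majority records, which is what the directed methods in the following subsections try to avoid.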

Tomek Link (TML)
It is one of the most used data undersampling techniques [41]. TML is based on the Condensed Nearest Neighbor algorithm. TML is also known as a data cleaning method since it eliminates noise from the majority or minority class. TML does not balance the classes; instead, it looks for Tomek Links and, for each Tomek Link found, deletes only the example of the majority class. The algorithm works as follows: a pair of records $m_i$ and $m_j$ is called a Tomek Link if they are from different classes and are each other's nearest neighbors. Namely, there is no record $m_l$ such that $d(m_i, m_l) < d(m_i, m_j)$ or $d(m_j, m_l) < d(m_i, m_j)$, where $d(m_i, m_l)$ is the distance between $m_i$ and $m_l$. Two records forming a Tomek Link indicate that one of them is noise or that both are at the boundary [42].
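The definition of a Tomek Link can be sketched as a mutual nearest-neighbor check. This is a simplified illustration, assuming Euclidean distance and toy two-dimensional points; the function names are hypothetical, and ties in distance are resolved by taking the first nearest neighbor found.

```python
import math


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))


def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def remove_majority_in_links(points, labels, majority):
    """Delete only the majority-class member of every Tomek Link."""
    drop = {k for pair in tomek_links(points, labels)
            for k in pair if labels[k] == majority}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Toy data (hypothetical): the majority point (3, 3) sits next to a minority point.
points = [(0, 0), (0, 1), (1, 0), (3, 3), (3.4, 3), (5, 5), (5, 6)]
labels = ["maj", "maj", "maj", "maj", "min", "min", "min"]

links = tomek_links(points, labels)
clean_points, clean_labels = remove_majority_in_links(points, labels, "maj")
print(links, len(clean_points))
```

In this toy set the only Tomek Link joins the points at indices 3 and 4, so a single majority record is removed and the class sizes change only slightly, which matches the remark that TML cleans rather than balances.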

One Side Selection (OSS)
OSS is the combination of two different undersampling methods that carefully remove records of the majority class. First, OSS applies Condensed nearest-neighbor US-CNN, which removes records of the majority class being far from the decision area boundary (redundant examples). Subsequently, OSS uses TML to remove records of the majority class that are noisy examples and also instances that are at the border of the decision area (unsafe examples). Instances of the majority class that were not eliminated are used for learning (safe examples) [43]. Algorithm 1 shows OSS steps.

Algorithm 1: One Side Selection (OSS).
Data: T (the original training set)
Result: S (the resulting set)
begin
    D = all minority instances from T and randomly selected majority instances;
    Classify T with the 1-NN rule using the records in D, and compare the assigned labels with the original ones;
    Move all misclassified records into D, which is now consistent with T while being smaller;
    Remove from D all majority instances that are believed to be borderline and/or noisy;
    S = all minority instances and the majority instances retained in D;
end
The objective of OSS is to balance the training set, keeping only the most significant records of the majority class without eliminating instances of the minority class [44].
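The steps of Algorithm 1 can be sketched as follows. This is an illustrative reading under stated assumptions, not the implementation used in the study: D starts with a single randomly chosen majority record (the text does not give the number), distances are Euclidean, and borderline or noisy majority records are identified as those taking part in a Tomek Link inside D. All names are hypothetical.

```python
import math
import random


def nearest(x, candidates, points):
    """Index (into points) of the candidate record closest to x."""
    return min(candidates, key=lambda j: math.dist(x, points[j]))


def one_side_selection(points, labels, majority, seed=0):
    rng = random.Random(seed)
    everyone = range(len(points))
    minority_idx = [i for i in everyone if labels[i] != majority]
    majority_idx = [i for i in everyone if labels[i] == majority]

    # Step 1: D = all minority records plus one random majority record (assumption).
    d = set(minority_idx) | {rng.choice(majority_idx)}

    # Steps 2-3: classify the rest of T with 1-NN over the initial D and
    # move every misclassified record into D.
    start = sorted(d)
    for i in everyone:
        if i not in d and labels[nearest(points[i], start, points)] != labels[i]:
            d.add(i)

    # Step 4: remove majority records of D that form a Tomek Link inside D.
    members = sorted(d)

    def nn_in_d(i):
        return nearest(points[i], [j for j in members if j != i], points)

    noisy = {i for i in members
             if labels[i] == majority
             and labels[nn_in_d(i)] != majority
             and nn_in_d(nn_in_d(i)) == i}

    # Step 5: S = whatever remains in D.
    kept = [i for i in members if i not in noisy]
    return [points[i] for i in kept], [labels[i] for i in kept]


# Toy data (hypothetical): a compact majority cluster, one majority record
# lying next to the minority class, and three minority records.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (3, 3),
          (3.4, 3), (5, 5), (5, 6)]
labels = ["maj"] * 6 + ["min"] * 3
oss_points, oss_labels = one_side_selection(points, labels, "maj")
print(oss_labels)
```

All minority records survive, the redundant records of the majority cluster are never added to D, and the majority record next to the minority class is removed as borderline.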

Neighborhood Cleaning Rule (NCR)
NCR is a modification of the Edited Nearest Neighbor Rule (ENN) [45]. NCR improves the cleaning of the majority class in imbalanced binary data. NCR stands out among other undersampling methods because it considers the quality of the deleted data. It focuses on data cleansing rather than on balancing the classes of the training set [46]. NCR works as follows: for each sample $N_1$ in the training set, find its three nearest neighbors. When $N_1$ belongs to the majority class and the classification given by its neighbors is the opposite of the original class of $N_1$, then $N_1$ is removed. When $N_1$ belongs to the minority class and its neighbors belong to the majority class, then the nearest neighbors belonging to the majority class are removed [47]. Algorithm 2 shows NCR steps.
NCR eliminates outliers in the majority class of imbalanced datasets [48].
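The rule described above can be sketched in a few lines. This is a simplified illustration assuming Euclidean distance, a binary problem, and a majority vote among the three nearest neighbors; the function names and the toy points are hypothetical.

```python
import math
from collections import Counter


def k_nearest(i, points, k=3):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def neighborhood_cleaning(points, labels, majority, k=3):
    drop = set()
    for i in range(len(points)):
        neigh = k_nearest(i, points, k)
        vote = Counter(labels[j] for j in neigh).most_common(1)[0][0]
        if vote == labels[i]:
            continue  # the neighbors agree with the record's own class
        if labels[i] == majority:
            # A majority record contradicted by its neighbors is removed.
            drop.add(i)
        else:
            # A minority record contradicted by its neighbors is kept;
            # its majority-class neighbors are removed instead.
            drop.update(j for j in neigh if labels[j] == majority)
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]


# Toy data (hypothetical): one majority record lies inside the minority cluster.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (5, 5), (5, 6), (6, 5), (6, 6), (5.5, 5.5)]
labels = ["maj"] * 4 + ["min"] * 4 + ["maj"]
ncr_points, ncr_labels = neighborhood_cleaning(points, labels, "maj")
print(len(ncr_points), ncr_labels)
```

Only the majority record inside the minority cluster is deleted, so the data are cleaned without being balanced.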

Synthetic Minority Oversampling Technique (SMOTE)
SMOTE, introduced in [49], is one of the most successful and commonly used oversampling methods for binary class imbalance problems. This technique oversamples the minority class by creating synthetic or artificial data based on the feature-space similarities between existing minority examples. SMOTE introduces synthetic examples along the line segments that join a minority example to any of its nearest neighbors in the minority class. Depending on the amount of oversampling required, neighbors are chosen at random from the nearest neighbors. These synthetic data improve on previous techniques that oversample by simple replication. Synthetic data balance the training set, helping the classifier to significantly improve the result [50]. Algorithm 3 shows SMOTE steps.
In Figure 1, we show the operation of SMOTE. Synthetic objects in the minority class are created by interpolating between an object and its k nearest neighbors. In Figure 1a, we can see the dataset consisting of two classes, a majority and a minority class. Figure 1b shows the nearest neighbors selected to apply SMOTE. The synthetic instances of the minority class are also shown. Figure 1c shows the balanced dataset obtained by synthetic oversampling. We used SMOTE for oversampling the minority class of our imbalanced dataset.
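The interpolation step can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function name `smote`, the `percent` parameter (200 means two synthetic records per minority record), Euclidean distance, and the toy points are all assumptions.

```python
import math
import random


def smote(minority, percent=200, k=5, seed=0):
    """Create synthetic minority records by interpolating between each
    minority record and randomly chosen members of its k nearest neighbors."""
    rng = random.Random(seed)
    per_sample = percent // 100
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority neighbors of x (fewer if the class is tiny).
        neighbors = sorted((j for j in range(len(minority)) if j != i),
                           key=lambda j: math.dist(x, minority[j]))[:k]
        for _ in range(per_sample):
            nb = minority[rng.choice(neighbors)]
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Toy minority class (hypothetical two-dimensional records).
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (2.5, 2.5)]
new_points = smote(minority, percent=200, k=3)
print(len(new_points))
```

Each synthetic record lies on a segment between two existing minority records, so the new data stay inside the region already occupied by the minority class.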

Single Classifiers
Decision tree (C4.5): C4.5 divides the original problem into sub-groups. At each iteration, the tree is extended using the selected feature with the best gain. The decision tree is constructed top-down. The feature with the highest information gain is used to make the decision [51]. This method is one of the most popular inductive algorithms. It has been successfully applied to diagnose medical cases [52].

Support Vector Machines (SVM): SVM is used in binary classification problems. Given a training set, SVM searches for the optimal hyperplane, the one with the maximum margin between the classes [53]. The larger the margin between the classes, the lower the error and the higher the accuracy of the classifier [54]. SVM is kernel-based.

RIPPER (JRip): JRip, a rule-based approach, is one of the most popular algorithms for classification problems [55]. Classes are examined in order of increasing size. Then, an initial rule set for the class is created using incremental reduced-error pruning. JRip creates a rule set for all the records of each class, one by one [56].

Performance Measure
We used the Receiver Operating Characteristics (ROC) curve as the performance measure, a frequently used tool for evaluating classifiers [57]. It has advantages over other evaluation measures, such as precision-recall. The ROC curve is a two-dimensional graph that provides a good summary of the performance of a classification model in the presence of imbalanced datasets with unequal error costs [58]. An ROC curve is generally employed in medical scenarios, where diagnosing the presence or absence of an abnormal condition is common [59].
The area under the curve has a value between 0.5 and 1, where a value of 1 represents a perfect diagnosis and 0.5 represents a test with no discriminatory capacity.
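The area under the ROC curve can be computed without drawing the curve, using its rank interpretation: it equals the fraction of (positive, negative) pairs in which the positive record receives the higher score, with ties counted as one half. The sketch below assumes this formulation; the function name `auc` and the scores are hypothetical.

```python
def auc(scores, labels, positive):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = ["pos", "pos", "neg", "pos", "neg"]

# A scorer that ranks every positive above every negative.
perfect = auc([0.9, 0.8, 0.2, 0.7, 0.1], labels, "pos")

# A scorer that gives everyone the same score has no discriminatory capacity.
uninformative = auc([0.5, 0.5, 0.5, 0.5, 0.5], labels, "pos")

# A scorer that makes one ranking mistake (a positive scored 0.3, below a negative at 0.7).
partial = auc([0.9, 0.8, 0.7, 0.3, 0.2], labels, "pos")

print(perfect, uninformative, round(partial, 3))
```

The first two cases reproduce the reference values quoted above, 1 for a perfect diagnosis and 0.5 for no discriminatory capacity.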

Binarization Techniques
In multiclass classification, it is common to decompose the original dataset containing all the classes into binary datasets. One versus All (OVA) and One versus One (OVO) are two approaches commonly used for binarization. OVA and OVO facilitate the application of data preprocessing techniques to balance the data before the training set goes to the classifier [60]. The OVA approach takes one class as the minority class, and the remaining classes are combined and transformed into the majority class. This procedure is repeated for each of the n classes of the dataset [61]. OVO trains a classifier for each possible pair of classes, n(n-1)/2 in total (pairwise learning) [62]. Figures 2 and 3 show examples of OVA and OVO approaches used in a multiclass imbalanced dataset.
We use the OVA and OVO binarization techniques, which are widely used in classification problems [63]. From a medical perspective, OVA and OVO may assist physicians in distinguishing one subtype from another, an important task since each subtype varies in severity and treatment.
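Both binarization schemes can be sketched on a list of class labels. The four subtype names come from the dataset description; the per-class counts below are hypothetical, and the function names are illustrative.

```python
from itertools import combinations


def one_vs_all(labels):
    """For each class c, relabel every other class as 'rest'."""
    classes = sorted(set(labels))
    return {c: [y if y == c else "rest" for y in labels] for c in classes}


def one_vs_one(labels):
    """For each pair of classes, keep only the labels of those two classes."""
    classes = sorted(set(labels))
    return {(a, b): [y for y in labels if y in (a, b)]
            for a, b in combinations(classes, 2)}


# Hypothetical class counts, for illustration only.
labels = ["AIDP"] * 5 + ["AMAN"] * 4 + ["AMSAN"] * 3 + ["MF"] * 2

ova = one_vs_all(labels)
ovo = one_vs_one(labels)
print(len(ova), len(ovo))
```

With n = 4 classes, OVA yields 4 binary problems and OVO yields 4 x 3 / 2 = 6, giving 10 binary datasets in total.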

Validation
We used train-test evaluation for each single classifier, employing two-thirds of the data for training and one-third for testing. Figure 4 describes the experimental procedure. We tackle our multiclass classification problem by dividing it into binary subproblems using the OVA and OVO approaches. The sampling methods operate on binary datasets, each composed of a minority class and a majority class. For this reason, we used two different techniques to binarize our original GBS multiclass dataset. We created 10 binary datasets divided into two groups. The OVA technique takes one subtype of GBS as the minority class. The majority class is made up of the sum of the other three subtypes of GBS. Applying OVA, we obtained four imbalanced subsets. The OVO technique forms all possible pairs of classes in a dataset. For this experimental study, six imbalanced subsets were obtained, one for each pair of GBS subtypes from the original dataset.

We conducted a Wilcoxon test [64] to search for a statistical difference among the models using a significance value of 0.05. A nonparametric test was used since it does not require a particular data distribution [35].
R is a language for statistical analysis that allows data to be manipulated quickly and accurately. R creates high-quality graphics and is free and open source. It is an object-oriented language. RStudio is an integrated development environment (IDE), that is, a program to manage R and use it more conveniently. RStudio includes a console and a syntax editor that supports code execution, as well as tools for plotting, debugging, and managing the workspace. The R experiments were performed in RStudio 1.2.1335.
A package is a collection of functions, data, and documentation that extends the capabilities of R. Packages are available on CRAN (Comprehensive R Archive Network). We used the DMwR package [65] to oversample the minority class with SMOTE, and the unbalanced package [66] to undersample the majority class with the RUS, TML, OSS, and NCR methods. To create the predictive models, we applied three classifiers, using the RWeka package [67] for C4.5 and JRip and the e1071 package [68] for the SVM classifier.
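To make the two sampling ideas concrete, here is a minimal Python sketch of random undersampling (RUS) and of SMOTE-style interpolation. It is not the code of the R packages named above. The functions `random_undersample` and `smote` are hypothetical, the features are assumed to be numeric, at least two minority rows are required, and "SMOTE at N%" is read as N/100 synthetic rows per minority row, which is an assumption about how the percentages are defined.

```python
import math
import random


def random_undersample(majority, minority, seed=0):
    """Randomly keep as many majority rows as there are minority rows."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), minority


def smote(minority, percent=100, k=5, seed=0):
    """Create percent/100 synthetic rows per minority row by interpolating
    between the row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    per_row = percent // 100
    synthetic = []
    for i, row in enumerate(minority):
        others = [r for j, r in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda r: math.dist(row, r))[:k]
        for _ in range(per_row):
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # position along the segment row -> neighbour
            synthetic.append([a + gap * (b - a) for a, b in zip(row, neighbour)])
    return synthetic


# Hypothetical two-feature data.
minority = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.2]]
majority = [[float(i), float(i % 3)] for i in range(10)]

kept_majority, _ = random_undersample(majority, minority)
new_rows = smote(minority, percent=200, k=2)
print(len(kept_majority), len(new_rows))  # 3 6
```

Undersampling discards majority rows, whereas SMOTE adds new minority rows that lie on line segments between existing minority rows, so no original information is removed.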
We also used rJava [69], a low-level interface to Java that allows the creation of objects. The data partition and the confusion matrix were created using the package caret [70]. To calculate the imbalance ratio we used imbalance [71]. The ROC curve was created using pROC [72]. We used lattice [73] for data visualization and rpart [74], a recursive partitioning package for classification trees. To plot the models created by rpart we used rpart.plot [75]. SVM was tuned with the tune function, assigning the values 0.001, 0.01, 0.1, 1, 10, 50, 80, and 100 to the C parameter.

Results and Discussion
This section shows the results obtained by applying the four undersampling techniques and the SMOTE oversampling technique to the four imbalanced subsets obtained using OVA, as well as to the six imbalanced subsets obtained using OVO. Each value is the average ROC value obtained across 60 runs, each with a different seed.
We applied the C4.5, SVM, and JRip classifiers after balancing the data, and we evaluated model performance using ROC, the most accepted metric for imbalanced problems. We used the Wilcoxon test to evaluate whether there is a statistically significant difference between the models using imbalanced data and the models using balanced data.
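The averaging of a ROC-based score over several runs can be sketched in Python as follows. This assumes that the reported ROC value is the area under the ROC curve (AUC), which the text does not state explicitly. The functions `auc` and `mean_auc` and the scores are hypothetical.

```python
def auc(scores, labels):
    """AUC as the probability that a positive case scores above a negative one.

    Ties between a positive and a negative score count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def mean_auc(runs):
    """Average the AUC over a list of (scores, labels) runs."""
    values = [auc(scores, labels) for scores, labels in runs]
    return sum(values) / len(values)


# Two hypothetical runs; the study averages over 60 runs with different seeds.
run_a = ([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
run_b = ([0.9, 0.7, 0.2, 0.1], [1, 1, 0, 0])
print(auc(*run_a), auc(*run_b), mean_auc([run_a, run_b]))  # 0.75 1.0 0.875
```

Because the AUC depends only on how positives rank against negatives, it is not dominated by the majority class in the way that plain accuracy can be.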
In Tables 8 and 9, we show the imbalance ratio (IR) computed for the GBS subsets obtained from OVA and OVO. The highest IR values were obtained with OVA. This is because the IR grows as the majority class becomes larger relative to the minority class, and OVA combines three subtypes into the majority class. However, in GBS3 the IR = 1.1864. Some authors consider that a dataset is imbalanced when IR > 1 [76]. For OVO, in all cases, IR > 1.5. Tables 10-13 show in bold the cases with a statistically significant difference. The structure of the four tables is as follows: the first column shows the subsets obtained using the binarization techniques (OVA, OVO), the GBS subtypes included, and the number of instances for each of them. The second column shows the three classifiers used for each subset. The third column shows the results of the classifiers using the imbalanced data.
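The imbalance ratio is the size of the majority class divided by the size of the minority class. A small Python sketch follows. The function `imbalance_ratio` is hypothetical, and the class counts 70 and 59 are illustrative values chosen only because they sum to 129 and give a ratio close to the 1.1864 quoted above. The actual class sizes are those reported in Tables 8 and 9.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Number of majority-class instances divided by minority-class instances."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical class sizes, not taken from the tables.
labels = ["majority"] * 70 + ["minority"] * 59
print(round(imbalance_ratio(labels), 4))  # 1.1864
```

An IR of exactly 1 means the two classes are the same size, and larger values indicate stronger imbalance.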
Subsequent columns show the results of applying the balancing techniques and their corresponding Wilcoxon test, where NS (Not Significant) indicates that the difference between the results using imbalanced data and those using balanced data is not statistically significant, NC (Not Computed) means that the test could not be performed, either because many results were identical across the 60 runs or because the best results were obtained using imbalanced data, and S (Significant) indicates a statistically significant difference between the results using imbalanced data and those using balanced data.
Table 10 shows the results obtained after applying RUS, TML, OSS, and NCR to the four imbalanced subsets obtained through OVA. A total of 48 balanced-data cases were obtained. In 16 cases, balanced data could not improve on imbalanced data. In 24 cases, balanced data improved on the imbalanced data with no statistically significant difference. Eight cases presented a statistically significant difference. These cases are listed below with their corresponding ROC value. The GBS4 subset obtained the best results. In all 12 cases, the balanced data improved on the imbalanced data, across all four undersampling methods and all three classifiers. Furthermore, a statistically significant difference was found in four of them. The GBS3 subset obtained the worst performance. Balanced data could not improve on the imbalanced data in eight cases and improved on it in only four cases, with no statistically significant difference.
The best undersampling method using OVA was RUS because it improved on imbalanced data in 8 cases, half of them with a statistically significant difference. OSS improved results in seven cases, three of them with a statistically significant difference. NCR improved on imbalanced data in 8 cases; however, only one of them showed a statistically significant difference. TML obtained the worst performance: although results improved in nine cases, none of them showed a statistically significant difference.
We conducted 16 experiments for each classifier, derived from applying four undersampling methods to four GBS subsets. From these experiments, C4.5 obtained the best results: in 11 cases balanced data improved on imbalanced data, three of them with a statistically significant difference. Applying SVM, in 13 cases balanced data improved on imbalanced data, but only two of them with a statistically significant difference. Finally, with JRip, in nine cases balanced data improved on imbalanced data, three of them with a statistically significant difference.
Table 11 shows the results obtained after applying RUS, TML, OSS, and NCR to the 6 imbalanced subsets obtained through OVO. A total of 72 balanced-data cases were obtained. In 40 cases, balanced data could not improve on imbalanced data. In 20 cases, balanced data improved on the imbalanced data with no statistically significant difference. Twelve cases presented a statistically significant difference. These cases are listed below with their corresponding ROC value. The GBS6 subset obtained the best results. In 11 out of 12 cases the balanced data improved on the imbalanced data, 5 of them with a statistically significant difference. In only one case the balanced data could not improve on the imbalanced data. The GBS1 subset had the worst performance. In none of the 12 cases did the balanced data improve on the imbalanced data.
The best undersampling method using OVO was TML, since it improved on imbalanced data in 9 cases, 4 of them with a statistically significant difference. RUS and OSS behaved the same: in 8 cases the balanced data improved on the imbalanced data, 3 of them with a statistically significant difference. NCR had the worst performance: in 7 cases the balanced data improved on the imbalanced data, 2 of them with a statistically significant difference.
We conducted 24 experiments for each classifier, derived from applying four undersampling methods to six GBS subsets. From these experiments, C4.5 obtained the best results: in 13 cases the balanced data improved on the imbalanced data, 8 of them with a statistically significant difference. Applying JRip, in 13 cases the balanced data improved on the imbalanced data, but only 2 of them with a statistically significant difference. With SVM, in 6 cases the balanced data improved on the imbalanced data, 2 of them with a statistically significant difference.
Table 12 shows the results obtained after applying SMOTE at 100%, 200%, 300%, 400%, 500%, and 1000% to the 4 imbalanced subsets obtained through OVA. A total of 72 balanced-data cases were obtained as a result of applying three classifiers to 24 balanced subsets (four subsets at six SMOTE percentages). In 28 cases, balanced data could not improve on imbalanced data. In 26 cases, balanced data improved on the imbalanced data with no statistically significant difference. Eighteen cases presented a statistically significant difference. These cases are listed below with their corresponding ROC value. The GBS4 subset obtained the best results. Of its 18 balancing cases with SMOTE, in only one case balanced data could not improve on imbalanced data. In 7 cases, balanced data improved on imbalanced data without a statistically significant difference. In 10 cases, a statistically significant difference was found. On the other hand, GBS2 obtained the worst performance. In only one case was a statistically significant difference found. In 4 cases, balanced data improved on imbalanced data; however, a statistically significant difference was not found. In 13 cases, balanced data could not improve on imbalanced data.
For the OVA and SMOTE techniques, the best performance was obtained applying SMOTE at 100%, since in seven cases balanced data improved on the imbalanced data, 5 of them with a statistically significant difference. SMOTE at 400% obtained the worst performance: in 9 cases balanced data improved on the imbalanced data; however, only one showed a statistically significant difference.
As for the classifiers, JRip obtained the best performance, given that in 13 cases balanced data improved on imbalanced data without a statistically significant difference. In addition, in another 8 cases we found a statistically significant difference. With C4.5, in 11 cases balanced data improved on imbalanced data; however, only 5 of them showed a statistically significant difference. Applying SVM, in 12 cases balanced data improved on imbalanced data, but only 5 of them with a statistically significant difference.
We conclude that SMOTE at 100% combined with JRip obtained the best results.
Table 13 shows the results obtained after applying SMOTE at 100%, 200%, 300%, 400%, 500%, and 1000% to the 6 imbalanced subsets obtained through OVO. A total of 108 balanced-data cases were obtained as a result of applying 3 classifiers to 36 balanced subsets (six subsets at six SMOTE percentages). In 72 cases, balanced data could not improve on imbalanced data. In 29 cases, balanced data improved on the imbalanced data with no statistically significant difference. Seven cases presented a statistically significant difference. These cases are listed below with their corresponding ROC value. The GBS4 subset obtained the best results. In 6 cases, a statistically significant difference was found. In 2 cases, balanced data improved on the imbalanced data with no statistically significant difference. In 10 cases, balanced data could not improve on the imbalanced data. The GBS3 subset obtained the worst performance. In all 18 cases, balanced data could not improve on the imbalanced data.
For the OVO and SMOTE techniques, the best performance was obtained applying SMOTE at 100%: in 5 cases balanced data improved on the imbalanced data without a statistically significant difference, and in 2 cases a statistically significant difference was found. In 11 cases, balanced data could not improve on the imbalanced data. SMOTE at 400% obtained the worst performance, since in 14 cases balanced data could not improve on the imbalanced data. In 4 cases, balanced data improved on the imbalanced data; however, only one case showed a statistically significant difference.
As for the classifiers, JRip obtained the best performance. In 8 cases balanced data improved on the imbalanced data with no statistically significant difference, and in 6 cases we found a statistically significant difference. In 16 cases balanced data could not improve on the imbalanced data. Applying C4.5, in 19 cases balanced data could not improve on the imbalanced data, and in 11 cases balanced data improved on the imbalanced data without a statistically significant difference. SVM obtained the worst performance: in only 5 cases did balanced data improve on the imbalanced data, and no statistically significant difference was found.
We conclude that for OVO, as for OVA, SMOTE at 100% combined with JRip obtained the best results.

Conclusions
The aim of this work was to investigate if balancing the original GBS dataset improves the predictive models to identify GBS subtypes created in a previous study. We performed 4 independent experiments applying data-level techniques.
We started by creating 10 binary datasets divided into two groups. We used OVA and OVO techniques on the original dataset obtaining 4 and 6 binary subsets respectively. We divided each GBSn subset into 2 sets, 66% for training and 34% for testing. We balanced the training subset using two sampling methods. The majority class for each training subset was undersampled applying 4 different methods: RUS, NCR, OSS, and TML. Furthermore, the minority class of the training subset was oversampled applying SMOTE at 100%, 200%, 300%, 400%, 500%, and 1000%. Undersampling and oversampling were applied for OVA and OVO.
Once the training subsets were balanced, we applied 3 different classifiers: C4.5, JRip, and SVM. The scores are the average ROC values obtained over 60 runs, each with a different seed. We used the Wilcoxon test to assess whether there is a statistically significant difference between the imbalanced models and the balanced models.
The number of cases with statistically significant difference between imbalanced data and balanced data across the 4 experiments was: 8 for OVA with undersampling, 12 for OVO with undersampling, 18 for OVA with SMOTE, and 7 for OVO with SMOTE.
From all 4 sampling experiments, the best results were obtained combining SMOTE with OVA. Regarding classifiers, JRip obtained the best performance since it found more cases with statistically significant differences for all experiments.
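The comparison across the four experiments can be checked with a short Python calculation that divides the number of statistically significant cases by the total number of cases in each experiment. The counts are those reported in this paper. Expressing them as proportions is an illustrative choice made here, not an analysis taken from the study.

```python
# (significant cases, total cases) as reported for each experiment.
experiments = {
    "OVA + undersampling": (8, 48),
    "OVO + undersampling": (12, 72),
    "OVA + SMOTE": (18, 72),
    "OVO + SMOTE": (7, 108),
}

rates = {name: sig / total for name, (sig, total) in experiments.items()}
best = max(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
print("highest proportion:", best)  # highest proportion: OVA + SMOTE
```

Under this reading, SMOTE with OVA has the largest share of significant cases (18 of 72, or 25%), while SMOTE with OVO has the smallest (7 of 108).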
Balancing a data subset using oversampling yielded better performance. Adding synthetic instances to the minority class with SMOTE helped the classifiers achieve their best performance. On the other hand, eliminating instances of the majority class resulted in losing information that the classifiers needed to achieve better performance. However, factors independent of class imbalance, such as noise, can affect the performance of the classifiers. We found that the best results were obtained in the combinations where the majority class clearly exceeds the minority class. In these cases, the instances of the two classes are clearly distinguishable from each other, and the undersampling algorithms only eliminated noise or class overlap, which helped improve the performance of the classifiers. In contrast, the worst results were produced when the classes had a similar number of instances.
The results achieved in this research show that balancing the original dataset improves the previous predictive models. In addition, this predictive model can help specialists to identify the subtype of GBS that a patient suffers from. Early identification of the subtype will allow starting with the appropriate treatment for patient recovery. This is a contribution to exploring the performance of balancing techniques with real data.
As future work, we will experiment with different variants of SMOTE, and we will apply a hybrid approach using the OVA and OVO techniques. Also, we plan to build more accurate predictive models using different single and ensemble methods.