An Oversampling Method for Class Imbalance Problems on Large Datasets

Abstract: Several oversampling methods have been proposed for solving the class imbalance problem. However, most of them require searching for the k-nearest neighbors to generate synthetic objects. This requirement makes them time-consuming and therefore unsuitable for large datasets. In this paper, an oversampling method for large class imbalance problems that does not require the k-nearest neighbors' search is proposed. According to our experiments on large datasets with different degrees of imbalance, the proposed method is at least twice as fast as the fastest method reported in the literature while obtaining similar oversampling quality.


Introduction
One of the current research topics of interest in supervised classification is the class imbalance problem, which frequently appears in several real-world datasets [1][2][3][4][5]. The class imbalance problem occurs when, in a dataset, one of the classes, usually called the minority class, has far fewer objects than the other class, usually called the majority class. This problem produces a poor classification rate for the minority class, which is usually the most important one. Consequently, it becomes difficult for a classifier to effectively discriminate between the minority and majority classes, especially if the class imbalance is extreme. For example, among several patients screened for a suspected disease, very few are likely to have cancer (the minority class), while most are likely not to have cancer (the majority class). Here, a false negative means a patient with cancer is misclassified as not having the disease, which is a severe error. Another example of the class imbalance problem is distinguishing 100 spam emails from 100,000 emails, given that most emails are non-spam. As these examples suggest, the class imbalance problem is widespread and appears in many areas, such as industry, medicine, and social applications, among others.
In the literature, we can find different types of solutions, such as class decomposition [6,7] and oversampling methods, the latter being the most frequently used solutions for the class imbalance problem. SMOTE (Synthetic Minority Oversampling Technique) [8] is the most widely used and referenced method among the oversampling methods. SMOTE addresses the class imbalance problem by increasing the number of objects of the minority class through the generation of synthetic objects for this class.
Due to its success, many other oversampling methods based on SMOTE have been proposed in the literature; these methods are characterized by modifications of some of the SMOTE steps and/or the addition of new steps. Other oversampling methods that do not follow SMOTE's approach have also been proposed [22,23,26,[31][32][33]. However, most of the oversampling methods reported in the literature require computing the k-nearest neighbors of objects from the minority class or other slow steps, making them unsuitable for balancing large datasets. In this paper, we use the term large datasets to refer to datasets bigger than the conventional datasets used in the oversampling literature, which have fewer than 10K objects. Among the oversampling methods not based on the k-nearest neighbors' search, we can highlight LMCMO [34], ROS (Random OverSampling), and NDO [35]. However, LMCMO needs to find the minimum distance between the minority class objects and the majority class objects, becoming very slow for large datasets. On the other hand, although ROS is fast, it produces low oversampling quality because it only generates exact copies of the objects of the minority class [36][37][38][39]. Meanwhile, NDO is an oversampling method based not on the search for the k-nearest neighbors but on a fast generation of synthetic objects, and it shows good oversampling results. Therefore, NDO can be applied to large datasets with imbalanced classes. The above shows that the development of fast oversampling methods not based on the k-nearest neighbors' search or other slow steps has been little studied in the literature. For this reason, this paper introduces a method that follows this approach, making it applicable to large datasets with imbalanced classes. Our experiments with large datasets show that the proposed method is at least twice as fast as the fastest method reported in the literature while obtaining similar oversampling quality.
The rest of the paper is organized as follows: In Section 2, we present the related work. In Section 3, our oversampling method for class imbalance problems on large datasets is introduced. Section 4 shows our experimental results and compares our proposed method against the most successful oversampling state-of-the-art methods. We also include experiments on real large datasets, as well as scalability experiments that confirm the good performance of the proposed method in terms of time and quality of oversampling. Finally, in Section 5, we present our conclusions and future work.

Related Work
The most successful and referenced oversampling method for class imbalance problems is SMOTE [8]. The main idea of SMOTE is to oversample the minority class by randomly generating synthetic objects between objects of the minority class and some of their k-nearest neighbors, selected at random.
In SMOTE, to generate a synthetic object, the feature difference between an object of the minority class and one of its k-nearest neighbors, selected at random, is computed. Then, for each feature, this difference is multiplied by a random number between 0 and 1. Finally, the multiplied differences are added to the object of the minority class in order to obtain a synthetic object that is then added to the training set. A pseudocode of SMOTE is shown in Algorithm 1. Many SMOTE-based oversampling methods have been proposed [40][41][42][43][44]. These methods modify SMOTE in different ways in order to improve the oversampling quality. In the next paragraphs, we describe only some of the most widely cited and recent SMOTE-based methods.
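The interpolation step described above can be sketched as follows. This is an illustrative helper, not the authors' code: the function name is ours, and the neighbor search is done by brute force for clarity; the per-feature random multiplier follows the description in the text.

```python
import numpy as np

def smote_sample(X_min, k=5, rng=None):
    """Generate one synthetic object with SMOTE's interpolation step.

    X_min is an (n, m) array of minority-class objects; the k nearest
    neighbors are found by a brute-force distance computation.
    """
    rng = np.random.default_rng(rng)
    i = rng.integers(len(X_min))              # pick a random minority object
    x = X_min[i]
    # indices of the k nearest neighbors of x in the minority class (skip x)
    d = np.linalg.norm(X_min - x, axis=1)
    neighbors = np.argsort(d)[1:k + 1]
    nn = X_min[rng.choice(neighbors)]         # random neighbor among the k
    gap = rng.random(X_min.shape[1])          # random factor per feature
    return x + gap * (nn - x)                 # interpolate toward the neighbor
```

Because each feature of the synthetic object lies between the corresponding values of the object and its neighbor, the result always stays inside the minority class's observed feature ranges.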

Algorithm 1 SMOTE
Borderline-SMOTE [9] first determines which objects of the minority class are border objects. An object is labeled as border if the number of its k-nearest neighbors, computed from the whole training set, that belong to the majority class is greater than or equal to k/2 and smaller than k. Then, Borderline-SMOTE generates synthetic objects between the border objects and their k-nearest neighbors in the minority class.
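The border test above can be expressed compactly; `is_border` is a hypothetical helper name, with `majority_neighbors` denoting the count of majority-class objects among the k nearest neighbors:

```python
def is_border(majority_neighbors, k):
    """Borderline-SMOTE's border test: an object is 'border' when at least
    half, but not all, of its k nearest neighbors (computed in the whole
    training set) belong to the majority class."""
    return k / 2 <= majority_neighbors < k
```

Objects failing the lower bound are considered safe, and objects with all k neighbors in the majority class are considered noise, so neither group is used for oversampling.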
Safe Level SMOTE [11] computes, for each object of the minority class, a safe level (sl) value. This value is the number of neighbors belonging to the minority class among the object's k-nearest neighbors computed in the whole training set. Safe Level SMOTE generates synthetic objects between objects of the minority class with sl > 0 and their k-nearest neighbors in the minority class. If the safe level of the selected neighbor is 0, the object of the minority class is duplicated. Otherwise, Safe Level SMOTE computes a safe level ratio (slr) by dividing the sl value of the object of the minority class by the sl value of the selected neighbor. Then, to generate a synthetic object as SMOTE does, the random number used for multiplying the feature difference between the object of the minority class and its selected neighbor is generated in a different interval, according to the slr value, as follows: if slr = 1, the random number is generated between 0 and 1; if slr > 1, between 0 and 1/slr; and if slr < 1, between (1 − slr) and 1.
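The three intervals for the random multiplier can be summarized in a small helper (a sketch with illustrative names; `sl_obj` and `sl_neighbor` are the safe-level counts of the minority object and its selected neighbor):

```python
def sls_gap_interval(sl_obj, sl_neighbor):
    """Interval from which Safe Level SMOTE draws the random multiplier.

    Returns (low, high). A gap interval of (0, 0) encodes the duplication
    case, where the minority object is copied unchanged.
    """
    if sl_neighbor == 0:
        return (0.0, 0.0)          # duplicate: a gap of 0 copies the object
    slr = sl_obj / sl_neighbor     # safe level ratio
    if slr == 1:
        return (0.0, 1.0)          # equally safe: interpolate anywhere
    if slr > 1:
        return (0.0, 1.0 / slr)    # object safer: stay close to the object
    return (1.0 - slr, 1.0)        # neighbor safer: stay close to the neighbor
```

The intervals bias the synthetic object toward whichever endpoint has the higher safe level.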
LNE [12] is based on Safe Level SMOTE (SLS) [11] but, in the generation of the synthetic objects, rather than computing the k-nearest neighbors in the minority class, LNE computes the k-nearest neighbors in the whole training set. Then, when a neighbor of the majority class is selected to generate a synthetic object, the random number (r) is computed as in SLS and adjusted as r = r(sl_n/k), where sl_n is the safe level of the selected neighbor.
NDO [35] generates synthetic objects around each object of the minority class by adding, to each feature, a fraction of the standard deviation of that feature's values. This fraction is computed by dividing the standard deviation by the square root of the number of objects in the minority class and multiplying the result by a random number drawn from a normal distribution with mean 0 and standard deviation 1.
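NDO's generation rule, as described above, can be sketched as follows (an illustrative reading of the description, not the original implementation; the function name is ours):

```python
import numpy as np

def ndo_sample(X_min, rng=None):
    """Generate one synthetic object following NDO's rule: jitter a random
    minority-class object by a fraction of each feature's standard deviation,
    scaled by 1/sqrt(n) and a standard normal random number."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    frac = X_min.std(axis=0) / np.sqrt(n)   # per-feature std / sqrt(|minority|)
    x = X_min[rng.integers(n)]              # a random minority object
    return x + frac * rng.standard_normal(X_min.shape[1])
```

Note that, unlike interpolation-based methods, this rule needs no neighbor search, which is what makes NDO fast.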
SMOTE-RSB* (S-RSB*) [13] first applies SMOTE, and then only keeps as synthetic objects the objects that belong to the lower approximation of the minority class, according to the Rough Set Theory.
K-means-SMOTE (K-means-S) [24] applies K-means for clustering the training set and then selects the clusters with an Imbalance Ratio (IR) lower than a given threshold. Finally, K-means-SMOTE applies SMOTE on the selected clusters; the amount of oversampling for each cluster is computed based on the sparsity and density values of the minority class objects. Following this idea, A-SUWO [25] also first groups the training set, but using Complete-Linkage Hierarchical Clustering, and then applies SMOTE over each cluster.
Farthest SMOTE (Farthest-S) [27] uses a similar approach to that of SMOTE, but rather than using the k-nearest neighbors, it uses the k-farthest neighbors.
To generate a synthetic object, Geometric-SMOTE (G-SMOTE) [28] first randomly selects an object of the minority class as the center of a hyper-spherical region and also randomly selects one of its nearest neighbors, computed in the whole training set, to define the size of the hyper-spherical region based on the distance between the two selected objects. Then, the synthetic object is randomly generated as a point inside the hyper-spherical region; this generation is repeated as many times as there are synthetic objects to be generated. SMOTE-SF (S-SF) [29] selects a subset of features to handle high-dimensional datasets (those with a high number of features). The selection of the subset of features is performed by applying a feature ranking method (many such methods are studied in their work). Then, SMOTE-SF generates synthetic objects as SMOTE does, but only until reaching the number specified by an input parameter.
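G-SMOTE's core geometric step, generating a point inside the hyper-spherical region, can be sketched as follows. This is a simplified reading that omits the method's direction and deformation options; the function name is illustrative.

```python
import numpy as np

def gsmote_sample(center, neighbor, rng=None):
    """Sample a point uniformly inside the hypersphere centered at `center`
    whose radius is the distance to `neighbor` (G-SMOTE's basic region)."""
    rng = np.random.default_rng(rng)
    m = center.shape[0]
    radius = np.linalg.norm(neighbor - center)
    direction = rng.standard_normal(m)
    direction /= np.linalg.norm(direction)    # uniform direction on the sphere
    r = radius * rng.random() ** (1.0 / m)    # uniform radius within the ball
    return center + r * direction
```

The exponent 1/m on the radial draw makes the sampling uniform over the ball's volume rather than concentrated near the center.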
In the literature, other methods that are not based on computing the k-nearest neighbors for generating synthetic objects have also been proposed. However, these methods use computationally expensive techniques for generating synthetic objects, such as particle swarm optimization [22], self-organizing maps [23], multi-objective swarm fusion [26], genetic algorithms [31], differential evolution [32], deep generative models and variational autoencoders [33], and feature correlation computation [29], among others. Therefore, these methods are time-consuming and unsuitable for large datasets with imbalanced classes.
From the above review of oversampling methods, we can see that, with the exception of NDO, oversampling methods are based on the search for the k-nearest neighbors or other slow techniques. This makes them time-consuming when applied to large datasets with imbalanced classes. Therefore, in the next section, we propose a fast oversampling method for class imbalance problems on large datasets.

Proposed Method
SMOTE is the most widely cited method for addressing the class imbalance problem and has inspired, and continues to inspire, the development of many oversampling methods [45][46][47][48][49][50][51]. However, SMOTE, as well as the SMOTE-based methods, has poor time performance on large class imbalance problems because all of these methods either require searching for the k-nearest neighbors or use other time-consuming procedures for balancing the minority class.
If SMOTE avoided searching for the k-nearest neighbors to generate synthetic objects, its runtime would improve. An alternative to the k-nearest neighbors' search is, instead of taking one object of the minority class at a time and selecting one of its k-nearest neighbors as SMOTE does, to use a faster criterion for determining where in the minority class the new synthetic object should be generated.
To generate a new synthetic object, SMOTE uses an object of the minority class and one of its nearest neighbors. In some sense, this is done to guarantee that the synthetic object is generated in the neighborhood of an object belonging to the minority class. Using the nearest neighbors also aims to guarantee that the synthetic object is located in a region of the minority class, because each of its feature values is computed by generating a value between the values of an object in the minority class and one of its nearest neighbors. Thus, in our proposed method, instead of using objects of the minority class to generate synthetic objects, as is done in SMOTE, we use, for each feature, the mode jointly with the minimum and maximum values. We tested other central tendency measures, such as the mean or the median, and other statistical dispersion measures, such as the standard deviation or the variance, in the proposed method; however, the mode jointly with the minimum and maximum values obtained the best oversampling quality results.
To compute the mode of a numerical feature rapidly, we followed [52]: we obtain the minimum and maximum values of the feature and divide the feature's values in the sample into 10 equal-width bins. Then, the mode is computed as described by Equation (1):

Mode = l + w · (f − f_{−1}) / ((f − f_{−1}) + (f − f_{+1}))    (1)

where l is the lower limit of the bin with the highest frequency, f is the highest frequency, w is the bin size, f_{−1} is the frequency of the bin preceding the bin with the highest frequency, and f_{+1} is the frequency of the bin following the bin with the highest frequency.
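Under these definitions, the grouped-data mode of Equation (1) can be computed as follows (a sketch with illustrative names, using 10 equal-width bins as described in the text):

```python
import numpy as np

def grouped_mode(values, bins=10):
    """Mode of a numeric feature from grouped data: bin the values into
    `bins` equal-width bins and interpolate within the modal bin
    following Equation (1)."""
    counts, edges = np.histogram(values, bins=bins)
    j = int(np.argmax(counts))                      # modal bin
    f = counts[j]                                   # highest frequency
    f_prev = counts[j - 1] if j > 0 else 0          # preceding bin frequency
    f_next = counts[j + 1] if j < bins - 1 else 0   # following bin frequency
    l = edges[j]                                    # lower limit of modal bin
    w = edges[j + 1] - edges[j]                     # bin width
    denom = (f - f_prev) + (f - f_next)
    if denom == 0:                                  # flat neighborhood
        return l + w / 2                            # fall back to bin midpoint
    return l + w * (f - f_prev) / denom
```

Binning requires only one pass after the min/max pass, which matches the two-scan, O(n)-per-feature cost stated below.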
It is important to highlight that the mode, the minimum, and the maximum can be computed quickly since they can be computed in two scans of the minority class, i.e., computing them for a feature is O(n), where n is the number of objects in the minority class. Thus, the complexity of computing them for all features is O(mn), where m is the number of features.
In our method, we propose computing, for each feature in the minority class, the absolute differences between the mode value and the minimum and maximum values. Then, by subtracting these differences from, or adding them to, each feature's mode value after multiplying them by a random factor, it is possible to generate a new synthetic object in the minority class. Similar to SMOTE, our method generates a synthetic object located between the mode (a central tendency measure) of the class and the objects that are no farther from the mode than the minimum or maximum values. We want to point out that using the minimum and maximum values allows our method to keep the synthetic generated values in the observed value range of the features.
Following the idea mentioned above, it is possible to avoid searching for the k-nearest neighbors, which is the most time-consuming step of SMOTE. Consequently, we obtain an oversampling method that is faster than SMOTE and the SMOTE-based methods.
The first step of our method computes, for each feature, the minimum, maximum, and mode values in the minority class. Then, the absolute differences between the mode and the minimum and maximum values are computed. Next, a random number is generated for each feature; this random number is multiplied by one of the differences and subtracted from or added to the corresponding feature mode. To decide whether a difference is subtracted or added, we generate a random number between 0.0 and 1.0; if this number is lower than 0.5, the difference is subtracted. Otherwise, the difference is added. Thus, the feature values of the generated synthetic objects will be clustered around the mode, within the region defined by each feature's difference values. In the proposed method, we use a random number between 0 and 1; this allows the generation of objects as far away as the differences allow, but not farther, avoiding the generation of feature values outside the observed feature value range.
We want to point out that if the random number that is multiplied by the difference is close to 0, the synthetic object is generated close to the mode of the class on the corresponding feature; in the opposite case, if the random number is close to 1, the feature value of the synthetic object is generated farther from the mode of the class and closer to the minimum or maximum value.
Finally, the above-described procedure is repeated as many times as the number of synthetic objects to be generated N (an input parameter of our method). A pseudo-code of the proposed method is shown in Algorithm 2.
To determine the computational complexity of Fast-SMOTE, we will analyze each one of its steps (see Algorithm 2). Steps 1-3 of Fast-SMOTE are O(nf) (where n is the number of objects in the minority class and f is the number of features of the dataset) because, as is well known, computing the minimum, the maximum, and the mode from grouped data of n values is O(n), and this must be done for each feature. Steps 4 and 5 are O(f) because they involve just one difference for each of the f features. The FOR loop of step 6 is repeated N times (once per synthetic object), and inside this loop there is a nested FOR loop that is repeated f times. In the inner loop, steps 9-15 are O(1). Therefore, the complexity of the FOR loop of step 6 is O(Nf), and hence the complexity of Fast-SMOTE is O((n + N)f), i.e., linear in the number of objects and features. This complexity is smaller than the complexity of SMOTE and SMOTE-based methods, which is at least O(n²f) [22,53] due to the k-nearest neighbors search. Moreover, this complexity is similar to that of NDO, which is the only other oversampling method reported in the literature that is not based on computing the k-nearest neighbors and can therefore be applied to large datasets.

Algorithm 2 Fast-SMOTE
Input: A minority class; N, the number of synthetic objects to be generated.
Output: An oversampled minority class.
1: MIN ← minimum value of each feature in the minority class
2: MAX ← maximum value of each feature in the minority class
3: MODE ← mode value of each feature in the minority class
4: MinDif ← MODE − MIN (difference between the mode and the minimum value of each feature)
5: MaxDif ← MAX − MODE (difference between the mode and the maximum value of each feature)
6: for i = 1 to N do
7:   Syn ← empty object
8:   for each feature f do
9:     r ← random number in [0, 1]
10:    s ← random number in [0, 1]
11:    if s < 0.5 then
12:      Syn[f] ← MODE[f] − r · MinDif[f]
13:    else
14:      Syn[f] ← MODE[f] + r · MaxDif[f]
15:    end if
16:   end for
17:   Add Syn to the minority class
18: end for
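The procedure above can be sketched in Python as follows. This is an illustrative implementation of the textual description, not the authors' code; as a simplification, the per-feature mode is taken as the midpoint of the modal bin rather than the grouped-data interpolation of Equation (1).

```python
import numpy as np

def fast_smote(X_min, N, rng=None):
    """Sketch of Fast-SMOTE: generate N synthetic objects around the
    per-feature mode, never farther than the observed minimum/maximum."""
    rng = np.random.default_rng(rng)
    n, m = X_min.shape
    mn, mx = X_min.min(axis=0), X_min.max(axis=0)
    # per-feature mode from 10 equal-width bins (midpoint of the modal bin,
    # a simplification of the paper's grouped-data mode)
    mode = np.empty(m)
    for j in range(m):
        counts, edges = np.histogram(X_min[:, j], bins=10)
        k = np.argmax(counts)
        mode[j] = (edges[k] + edges[k + 1]) / 2
    min_dif, max_dif = mode - mn, mx - mode
    syn = np.empty((N, m))
    for i in range(N):
        r = rng.random(m)       # distance factor per feature
        side = rng.random(m)    # < 0.5: subtract toward MIN, else add toward MAX
        syn[i] = np.where(side < 0.5, mode - r * min_dif, mode + r * max_dif)
    return np.vstack([X_min, syn])   # oversampled minority class
```

Because r lies in [0, 1) and the differences are measured against the observed extremes, every generated feature value stays within the feature's observed range, as the method intends.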

Experimental Results
In this section, we first present an experiment to show the synthetic objects generated by SMOTE, NDO, and our proposed method Fast-SMOTE. Then, in the second experiment, we evaluate the proposed method by comparing it to the methods described in the related work section using standard imbalanced public datasets. Since these datasets are relatively small, in the third experiment we compare the proposed method to the fastest methods from the previous experiment on larger datasets. Finally, in the last experiment, we demonstrate the scalability of the proposed method.

Synthetic Objects Generated by SMOTE, NDO and Fast-SMOTE
This experiment applies SMOTE, NDO, and Fast-SMOTE over three real datasets to show the synthetic objects generated by each method.
For this experiment, we used the mammography dataset that was also used in [54] for this purpose, and we included the glass6 and ecoli1 datasets from the KEEL repository [55]. For a better appreciation of the plotted objects, for this experiment, we selected only 10% of majority class objects and 10% of minority class objects. In this way, the IR of each dataset is preserved.
To show the synthetic objects generated by each method, in Figures 1-3 we plotted, in 2-D projections, the mammography, glass6, and ecoli1 datasets, respectively, together with 10 synthetic objects generated by each method. The mammography dataset was projected on the fourth and fifth features, the glass6 dataset on the aluminum and silicon features, and the ecoli1 dataset on the Mcg and Alm2 features. In all three figures, green circles represent the objects of the majority class, and red crosses represent the objects of the minority class. Additionally, Table 1 shows the mode values computed by the proposed method on each dataset (see rows) for the x and y axes (see columns) of Figures 1-3. In Figures 1-3, we can see that Fast-SMOTE, as well as SMOTE and NDO, generates synthetic objects close to the objects of the minority class.

Comparison against State of the Art Oversampling Methods
In this experiment, we compare the proposed method Fast-SMOTE to the state-of-the-art oversampling methods in terms of oversampling quality and runtime. We used small and medium size datasets so that SMOTE and the other state-of-the-art methods could be applied. We used all 66 imbalanced datasets from the KEEL repository [55] (shown in Table 2) and assessed the oversampling quality of the methods as follows: a training dataset from the repository is oversampled until reaching a full balance between classes, a classifier is trained with the oversampled dataset, and the trained classifier is evaluated on the testing set that is also available in the repository. The quality of the classification result is interpreted as the oversampling quality of the employed oversampling method. Since all oversampling methods generate random numbers when a synthetic object is built, the oversampling methods were applied 30 times for each one of the 66 imbalanced datasets using the 5-fold cross-validation partition available in the repository. We used the supervised classifiers CART (Classification and Regression Tree), KNN (K-Nearest Neighbor classifier, K = 5), and Naïve Bayes. Additionally, as suggested in [56], we used the AUC (Area Under the ROC Curve), the most used measure for evaluating classification quality on imbalanced class problems, to assess the classification results. Table 3 shows the average AUC of the 150 results (30 repetitions for each one of the 5 folds) obtained with each classifier (CART, KNN, and Naïve Bayes; see rows) over the 66 datasets for each oversampling method (see columns). As mentioned above, most of the compared oversampling methods are based on finding the k-nearest neighbors; the results in Table 3 show that these methods obtain the best AUC results. In particular, we can observe that SMOTE and S-RSB* obtained the best average oversampling quality.
On the other hand, the methods that are not based on finding the nearest neighbors (NDO and Fast-SMOTE) do not obtain the lowest oversampling quality; the methods that obtained the worst AUC results were K-Means-S and G-SMOTE. The proposed method and NDO obtain lower average AUC results than SMOTE and S-RSB*, although, unlike these methods, Fast-SMOTE and NDO can be applied to large datasets.
Additionally, in this experiment we assessed the runtime of the oversampling methods. This experiment was run in MATLAB 2020b on a computer with a Ryzen 5300X 3.60 GHz processor and 32 GB of DDR4 RAM, running 64-bit Windows 10. Table 4 shows the runtime in seconds to fully balance each KEEL dataset with all oversampling methods. An examination of the data presented in the table shows that S-RSB* was the most time-consuming oversampling method. By contrast, the proposed Fast-SMOTE method is clearly the fastest of all the oversampling methods under comparison. The closest method in runtime to our proposed method is NDO, but it was more than twice as time-consuming as the proposed method.

Table 4. Runtime in seconds spent by the oversampling methods reviewed in Section 2 for oversampling all datasets from Table 2.

Based on this experiment, we conclude that the methods that obtain the best oversampling quality are among the slowest. By contrast, the fastest methods obtain a lower oversampling quality, although it is similar to that of the methods obtaining the best results. However, the KEEL repository datasets are relatively small and therefore do not provide a good assessment of the runtime advantage of the proposed method. Hence, in the following experiment, the proposed method is compared to NDO, the second fastest method in this experiment, on larger datasets.

Assessing the Proposed Method in Large Datasets
In this experiment, we compared the proposed method to the fastest competing method in the previous experiment, namely NDO, on larger datasets taken from the UCI repository [57] (see Table 5); these datasets are 5, 7, 10, 44, and 100 times larger than the largest dataset used in the previous experiments (page-blocks0). We followed the same evaluation scheme as in the previous experiment to measure the oversampling quality of the methods. However, given the datasets' size, the oversampling methods were applied only 10 times on each imbalanced dataset of Table 5 using a 5-fold cross-validation partition. Additionally, we only used the CART classifier because, compared with the other two supervised classifiers, it produces good results in a reasonable time for datasets of this size. Table 6 shows the average AUC of the 50 results (10 repetitions for each one of the 5 folds) obtained from each dataset in Table 5 for NDO and Fast-SMOTE. As observed from the data presented in this table, the proposed method and NDO obtain quite similar oversampling quality results for all of the datasets. To validate these results, we applied the Wilcoxon test, which showed that Fast-SMOTE and NDO were statistically similar on the first three datasets (Default of credit card clients, Online News Popularity, and Statlog (Shuttle)). Meanwhile, for the Skin segmentation dataset, Fast-SMOTE was statistically better than NDO, and for the Buzz in social media (Twitter) dataset, NDO was statistically better than Fast-SMOTE. On the other hand, Table 7 shows the runtime spent by NDO and Fast-SMOTE for fully balancing every dataset in Table 5. Table 7 shows that the proposed method's runtime is approximately 50% lower than NDO's runtime; moreover, the difference grows as the size of the dataset increases.
For example, for the largest dataset, with 583,250 objects (153,487 of them in the minority class) and 78 features, Fast-SMOTE required only 2860.4 s, while NDO required 8771.89 s; in this case, NDO spent more than three times as long as our method. Although we could have included larger datasets, the datasets of Table 5 are sufficient for showing that the proposed method is clearly faster than NDO on large datasets.

Scalability
In this experiment, we evaluate the scalability of the proposed method Fast-SMOTE. We randomly generated two artificial datasets with numeric features in [0, 1]: one with 10,000,000 objects and 10 features, and another with 10,000 objects and 10,000 features. Then, to evaluate the scalability of the proposed method regarding the number of objects, we measured the runtime required for oversampling minority classes of 1 million to 10 million objects, in increments of 2 million objects, taken from the artificial dataset with 10 features. Additionally, to evaluate the scalability of Fast-SMOTE regarding the number of features, we measured the runtime required for oversampling minority classes with 1000 to 10,000 features, in increments of 1000 features, taken from the artificial dataset with 10,000 objects. Because minority classes of this size are unmanageable in MATLAB, Fast-SMOTE and NDO were implemented in C++ for this experiment. The experiment was run on a computer with an Intel Core i7-3820 3.60 GHz processor and 64 GB of DDR4 RAM, running 64-bit Ubuntu. Although we could have used larger datasets, the selected datasets are sufficient for demonstrating the scalability of Fast-SMOTE. Figure 4 shows a graph of the scalability of the proposed method Fast-SMOTE with respect to the number of objects. For comparison purposes, we include NDO in the graph. In this graph, the x-axis shows the number of objects of the minority classes used in this experiment, while the y-axis shows the runtime in seconds required by Fast-SMOTE (blue line) and NDO (red line) for oversampling the respective minority class. From this figure, we can observe that Fast-SMOTE scales better than NDO as the number of objects increases, allowing oversampling of larger datasets in a much shorter runtime.
Figure 5 shows a graph of the scalability of Fast-SMOTE with respect to the number of features. We also included NDO results in the graph. In this graph, the x-axis shows the number of features that describe the objects of the minority classes used in this experiment, while the y-axis shows the runtime in seconds required by Fast-SMOTE (blue line) and NDO (red line) for oversampling the respective minority class. It is observed from this figure that Fast-SMOTE scales better than NDO as the number of features increases. This experiment shows that when the number of objects or features grows, Fast-SMOTE has better scalability than NDO, making it more suitable for practical use.

Conclusions
In this paper, we introduced Fast-SMOTE, a fast oversampling method not based on the k-nearest neighbors search, which is the most time-expensive step of SMOTE-based methods. For each feature, the proposed method uses the value that appears most often in the minority class (the mode) jointly with the minimum and maximum values (all three can be computed quickly) for the generation of synthetic objects around the mode, in the region of the minority class defined by the minimum and maximum values. This allowed us to obtain an oversampling method of linear complexity that is much faster than SMOTE-based methods, whose complexity is at least quadratic. Moreover, the proposed method has a complexity similar to that of NDO, which is the only other oversampling method reported in the literature that is not based on the k-nearest neighbors; however, according to our experiments, the proposed method scales better as the number of objects and features increases.
Comparing the synthetic objects generated by the proposed method against those generated by SMOTE and NDO, we have shown that, similar to SMOTE and NDO, Fast-SMOTE generates synthetic objects close to the objects of the minority class and near the decision region between the classes.
On small imbalanced datasets, where state-of-the-art oversampling methods can oversample the minority class in a short time, the proposed method obtains an oversampling quality similar to that of these methods while being the fastest.
Our comparison between Fast-SMOTE and NDO for large datasets with regard to runtime and AUC shows that Fast-SMOTE produces oversampled datasets that allow training supervised classifiers to obtain classification results that are statistically similar to those classification results obtained by training the classifiers with datasets oversampled by NDO, but in much less runtime. From all of these experiments, we can conclude that Fast-SMOTE is the best method for oversampling large datasets with imbalanced classes.
As mentioned above, the development of oversampling methods not based on the search for the nearest neighbors is a poorly explored research direction. Hence, in future work, we propose to continue developing oversampling methods following this approach. In particular, we are interested in developing oversampling methods that can work on large mixed datasets with imbalanced classes. Additionally, the implementation of oversampling methods on GPU, parallel CPU, or distributed computing platforms [58][59][60][61][62] is an active research area; thus, in future work, we will implement our oversampling method on these platforms to further improve its runtime. We will also address the imbalance problem in combination with other problems in supervised classification, such as noise, missing data, or multiple minority class groups.

Acknowledgments: The first author gratefully acknowledges the National Council of Science and Technology of Mexico (CONACyT) for his Ph.D. scholarship.

Conflicts of Interest:
The authors declare no conflict of interest.