Article

Application of the Gravitational Search Algorithm for Constructing Fuzzy Classifiers of Imbalanced Data

Faculty of Security, Tomsk State University of Control Systems and Radioelectronics, 40 Lenina Prospect, 634050 Tomsk, Russia
*
Author to whom correspondence should be addressed.
Symmetry 2019, 11(12), 1458; https://doi.org/10.3390/sym11121458
Submission received: 28 September 2019 / Revised: 28 October 2019 / Accepted: 24 November 2019 / Published: 28 November 2019
(This article belongs to the Special Issue Information Technologies and Electronics)

Abstract

The presence of imbalance in data significantly complicates the classification task, including for fuzzy systems. Because of the large number of instances in the majority classes, instances of the minority classes are often not recognized correctly, so additional tools for improving the quality of classification are required. The most common methods for handling imbalanced data have several disadvantages. For example, methods that generate additional instances of the minority classes can worsen classification if instances of different classes strongly overlap, while methods that directly modify the fuzzy classification algorithm reduce the interpretability of the model. In this paper, we study the efficiency of the gravitational search algorithm in the tasks of selecting features and tuning term parameters for fuzzy classifiers of imbalanced data. We consider only data with two classes and apply an algorithm based on the extreme values of the classes to construct models with a minimum number of rules. In addition, we propose a new quality metric based on the sum of the overall accuracy and the geometric mean, with a priority coefficient between them.

1. Introduction

The classification task is to divide objects in the feature space into classes or categories based on retrospective observations with given class label values. Real data are characterized by an imbalanced class distribution, in which the number of instances of some classes greatly exceeds that of others. This situation is mainly explained by the limited occurrence of minority class instances [1]. For example, when classifying Internet traffic, normal web browsing traffic is dominant; however, detecting rare malicious connections is very important for training [1]. Similar examples can be given from the fields of medical diagnosis, detection of bank fraud, and diagnosis of equipment malfunctions.
The search for regularities in imbalanced data is a difficult task for specialists in data mining, machine learning, pattern recognition, and statistics [2]. The main problem in constructing classifiers of imbalanced data is the poor adaptation of standard training algorithms, which leads to a significant reduction in classification effectiveness. Due to the imbalance between classes, a standard classifier usually misclassifies instances of minority classes, since the model overfits to instances of the larger classes [1].
It is not enough to evaluate the constructed classifier of imbalanced data using the overall accuracy [3]. Positive classes (those with the smallest number of instances) are usually more important than negative classes (those with the largest number of instances). Reducing the misclassification of minority class instances is crucial in many real-world challenges [4,5]. However, improving the classification quality of positive classes often leads to poor recognition of negative class instances, as instances of different classes often overlap. Thus, in each classification task, the developer of the data analysis system must set a priority: focus on improving the overall accuracy, try to correctly identify positive instances at the cost of some degradation in recognizing negative ones, or look for a compromise. Ultimately, the choice depends on the purpose of the model and the requirements imposed on it.
There are many classification methods, for example, naive Bayes classifiers, support vector machines, and artificial neural networks. Unlike other methods, fuzzy classification does not assume rigid boundaries between neighboring classes: an object may belong to several classes with various degrees of confidence. The advantage of a fuzzy classifier is the understandability and interpretability of its rules, which makes fuzzy classifiers a practically useful data analysis tool.
In many real-world applications, a system must be not only accurate but also computationally simple. Therefore, we propose to use two procedures for constructing a fuzzy classifier. The first shrinks the input feature space to reduce complexity. The second tunes the fuzzy classifier parameters, which improves the quality of determining the output class label. Since both procedures are formulated as optimization problems, a single optimization algorithm is applied to solve them. We use the gravitational search algorithm (GSA), which has previously proven itself well when working with a fuzzy classifier [6].
Since the goal of our work is to improve the efficiency of the fuzzy classifier of imbalanced data, it is necessary to choose an appropriate metric to use as a fitness function for the GSA. We explore the possibilities of applying the following metrics: the overall accuracy, the geometric mean, and a new function that combines the two previous estimates to find a compromise version of the classifier.
The main contributions of this paper are as follows.
  • We propose a new metric based on the sum of the overall accuracy and the geometric mean of the class accuracies. A coefficient controls the priority between the two estimates.
  • We demonstrate the use of a feature selection method based on the binary gravitational search algorithm to reduce the effect of imbalance on classification. Using the new metric as the fitness function helps find subsets of features relevant to both classes.
  • We present a combination of binary and continuous algorithms for constructing fuzzy classifiers of imbalanced data. The continuous gravitational search algorithm further increases the classification quality on the selected features.
This article is organized as follows. Section 2 discusses the levels of problems that arise when working with imbalanced data and reviews the basic methods for solving them. Section 3 describes the procedure for constructing a fuzzy classifier, the objective functions under consideration, and the gravitational search algorithm. Section 4 and Section 5 present the experimental results and their analysis, respectively. Finally, we present the conclusions of our work in Section 6.

2. Related Works

Here, we review the main approaches to improving the quality of imbalanced data classification. There are three levels of problems in training on such data: (1) problems associated with the definition of classification performance indexes, (2) problems related to the learning algorithm, and (3) problems related to the training data [7].
The first level stems from the lack of objective quantitative measures for evaluating classifiers in order to select the optimal one. The understanding that overall accuracy is an insufficient measure for classifying imbalanced data has led to the application of new metrics such as the AUC (the area under the ROC curve) [8], the geometric mean, the balanced accuracy, the Fβ-measure, and others [9]. To assess the effectiveness of classifiers, the authors of [9] have proposed 18 indicators, which fall into the following three categories:
  • Threshold metrics geared towards minimizing the number of errors, such as the overall accuracy, the averaged accuracy (arithmetic and geometric), the Fβ-measure, and the Kappa statistic;
  • Metrics based on the probabilistic understanding of an error and used to assess the reliability of classifiers, such as the mean absolute error, the mean square error, and the cross-entropy;
  • Metrics based on estimating instance separability, for example, the AUC, which is equivalent to Mann–Whitney–Wilcoxon statistics [9] for two classes.
After analyzing these 18 indicators, the authors of [7] conclude that the choice of metric for imbalanced data is of paramount importance. Fernandez et al. [10] described the use of a multiobjective evolutionary algorithm with a pair of metrics, the overall accuracy and the F1-measure, and concluded that simultaneously optimizing this pair can lead to balanced accuracy for both classes.
At the second level, classification algorithms are modified in their construction and training processes in order to reduce the influence of class imbalance on classification quality [7].
Cost-sensitive learning methods are based on modifying the classification algorithm so that the cost of misclassifying an instance of a minority class is greater than that of a majority-class instance. A typical solution here is a weight matrix that takes into account the cost of each incorrectly classified instance [11]. This solution is not suitable for a fuzzy classifier, since such a classifier does not estimate the probability of assigning an object to a particular class.
There is a small list of methods for creating fuzzy classifiers in the presence of imbalance. Weights were added to fuzzy rules in [12,13,14]; a weight function allows setting the priority of some rules over others when determining the output of the classifier, and the weight values are most often tuned by optimization algorithms. Another way to modify the fuzzy classification tool is to introduce a bipolar model that uses the maximum-rule principle of assigning class labels. In the bipolar fuzzy classifier, the adjusted degree of belonging to each class is calculated from the positive and negative degrees of membership [15]. The disadvantages of this model are the need to introduce and adjust an additional matrix of dissimilarity coefficients and the difficulty of applying the method with a different label-assignment principle. Furthermore, adding supplementary modifications to fuzzy systems complicates the interpretation of the resulting models. Consequently, methods that improve the quality of the classifier without interfering directly with the classification algorithm are relevant.
Data play an integral role in machine learning and data mining research, and a number of data preprocessing methods have been developed to correct imbalance in the data. Oversampling methods try to produce a balanced dataset by creating additional instances of the minority class, while undersampling methods reduce the number of majority class instances to achieve a quantitative balance. The best-known oversampling method is SMOTE and its modifications [5,16,17,18], in which new instances of the positive class are generated according to their proximity to existing instances. Among undersampling methods, random undersampling (RUS) is often used; this non-heuristic method eliminates class imbalance by randomly excluding instances of the negative class. An obvious disadvantage of RUS is the loss of information about the negative class [7,16,19].
Hybrid methods that combine two previous strategies of adding and removing data instances are described in [20,21]. In order to preserve useful information about majority classes, clustering methods have recently been applied [7,22,23].
Preprocessing methods are universal and easy to apply but have low efficiency and cannot serve as the only tool for solving the class imbalance problem. In addition, creating new data instances is not acceptable for some classification tasks; for example, artificially created patient records can lead to errors in diagnosing diseases.
Another way to modify the data in order to improve the recognition of minority classes is to carry out a procedure for selecting informative features. Feature selection consists of choosing, from the input feature space, a subset that has fewer attributes but provides comparable classification accuracy relative to the full set. The resulting subset should be sufficient to adequately represent all classes in the training samples. Selection methods are usually divided into four types: embedded (built-in) methods, filters, wrappers, and hybrid methods.
Embedded (built-in) methods perform feature selection as part of the general mechanism of training a model on specific data [24]. An example is the selection of features during the training of a decision tree [25]. However, not every classification algorithm embeds the selection process into the learning process.
Filtering methods, on the contrary, are universal, as they are used independently of the classifier at the data preparation stage. Four groups of filters are distinguished in [26]. The first group is distance based: these methods select features that provide the greatest distance between classes. The second group of filters relies on the amount of information: such methods select features which, when added to an existing set, reduce its entropy [27]. The third group determines the relationship between features and classes using the correlation coefficient or mutual information [28]. The fourth group is represented by filters that minimize the number of inconsistencies; an inconsistency is the presence of two instances that belong to different classes but have the same feature values. Filter algorithms are easy to use but have low efficiency.
Wrappers are methods that evaluate each subset of features based on the effectiveness of the constructed classifier. As a search algorithm, they usually use metaheuristic algorithms. Since such algorithms are iterative, the classifier needs to be reconstructed after each iteration. Wrapper methods can require considerable time and resources for large datasets [24]. The advantage of wrappers is the ability to choose a set of features that will be optimal for a particular classification algorithm.
The application of a genetic algorithm for feature selection in the wrapper mode with an SVM classifier is described in [29]. The fitness function of this algorithm is a compromise between the geometric mean and the proportion of selected features. The results showed that the proposed method selects features that improve the recognition of minority classes.
Hybrid feature selection methods combine filters and wrappers. First, a filter is used for preliminary selection; then a classifier is built on the resulting subset and a wrapper algorithm is launched [30]. This approach is described in [31], which uses symmetric uncertainty for filtering in order to weigh features relative to their dependence on class labels, and harmony search as the wrapper algorithm. Hybrid selection methods can be a good solution for data with a large number of features.

3. Materials and Methods

3.1. The Fuzzy Classifier

3.1.1. The Fuzzy Classifier Structure

Classification algorithms determine the most suitable class from the set of all classes C = {c1, c2, …, cl} for each object xp = {xp1, xp2, …, xpm} from the set of n objects (p ∈ [1, n]), where xpk is the value of the kth feature of the pth object, k ∈ [1, m], and m is the number of features. The fuzzy classifier is constructed on the basis of production rules, each of which has its own set of fuzzy terms. A fuzzy term is a structure on the feature definition domain, reflecting the degree of object membership to a rule. The terms can be described by membership functions of various kinds, such as triangular, trapezoidal, bell-shaped, or Gaussian functions. In this work, we used Gaussian membership functions, which differ from others by the property of symmetry. Figure 1 shows an example of partitioning an attribute x1 by Gaussian terms.
A Gaussian fuzzy term characterizing the kth feature in the ith rule is given by the following expression:
$$T_{ik}(x) = \exp\left(-\left(\frac{x - b_{ki}}{c_{ki}}\right)^2\right),$$
where i is the rule number to which the term (i ∈ [1, r]) belongs, r is the number of fuzzy rules, b is the coordinate of the term vertex, and c is the function dispersion. The term parameters listed sequentially for each feature compose the antecedent vector θ = (b11, c11, b12, c12, b13, c13, b21, c21, …, bmr, cmr).
The standard fuzzy rule consists of the antecedent part, which lists the variables and their terms, and the consequent part, which specifies the output class label as:
Ri: If x1 is Ti1 and x2 is Ti2 and … and xm is Tim then class is cj,
where cj is the label of the jth class from the set of classes C, class is an output variable.
To use the possibility of feature selection in the wrapper mode, the binary feature vector S = (s1, s2, …, sm) must be introduced into the antecedent part. If sk = 1, then the kth feature is taken into account in the classification; otherwise the feature is ignored. Given the vector S, the fuzzy rule will change as follows:
Ri: If (s1˄x1) is Ti1 and (s2˄x2) is Ti2 and … and (sm˄xm) is Tim then class is cj,
where the record (sk˄xk) indicates whether the feature and its terms are used (sk = 1) or ignored (sk = 0) in the classifier. The binary vector S = (s1, s2, …, sm) is formed by the feature selection algorithm.

3.1.2. Generation of the Fuzzy Rule Base

There are various methods for generating fuzzy terms and forming a fuzzy classifier rule base such as uniform partitioning, random generation, clustering [32], and others. In this paper, we apply an algorithm based on the extreme values of classes of the training data. This algorithm constructs compact classifiers by using the minimum possible number of rules. In this case, the number of rules is equal to the number of classes, that is, there is one rule for each class.
The algorithm based on the extreme values of classes is presented in [6]. The first step is to determine the minimum and maximum values of the features for each class. In the second step, the terms are generated in such a way that the entire definition area between the two extremes is covered, with the vertex of the term located in the middle of this segment. In the third step, the rule base is formed. Each feature is represented in a rule by only one term. The terms belonging to each separate class are combined in the antecedent part of the rule by the conjunction operation, and the consequent part of the rule contains the label of this class.
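As an illustration, the following sketch (not the authors' exact implementation) generates one Gaussian rule per class from the class extremes; the mapping of the class interval half-width to the Gaussian dispersion is an assumption made here for concreteness.

```python
import numpy as np

def generate_rule_base(X, y):
    """One Gaussian rule per class, built from the extreme values of that class."""
    rules = []
    for label in np.unique(y):
        Xc = X[y == label]                      # instances of this class
        lo, hi = Xc.min(axis=0), Xc.max(axis=0) # feature-wise extremes
        b = (lo + hi) / 2.0                     # term vertex in the middle of the interval
        c = (hi - lo) / 2.0 + 1e-6              # dispersion wide enough to cover the interval
        rules.append({"b": b, "c": c, "label": label})
    return rules
```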
The presented algorithm is very simple, but its efficiency is not high. Therefore, it is necessary to use parameter tuning as an additional training step. The description of the procedure for term parameter tuning with the gravitational search algorithm is given in Section 3.2.

3.1.3. Output of Fuzzy Classifier

The output of the classifier for the input string xp is formed by sequentially performing three steps. In the first step, the value of the membership function of the object to each term is calculated:
$$\mu_{ik}(x_{pk}) = T_{ik}(x_{pk}).$$
The degree of the object membership to each rule is evaluated in the second step:
$$\beta_i(x_p) = \prod_{k=1}^{m} \mu_{ik}(x_{pk}).$$
The third step is to define the output class by the maximum rule. The output of the classifier will be the class that corresponds to the rule with the highest degree of membership:
$$\mathrm{class}(x_p) = c_{j^*}, \quad j^* = \arg\max_{1 \le i \le r} \beta_i(x_p).$$
After the output has been formed, the constructed model can be evaluated using various performance indexes.
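A minimal sketch of these three steps is shown below; it assumes the rule dictionaries produced by the previous sketch and a binary feature vector S, with ignored features simply excluded from the product.

```python
import numpy as np

def classify(x, rules, S):
    """Fuzzy classifier output for one instance x: Gaussian memberships (step 1),
    product over the selected features (step 2), maximum rule (step 3)."""
    S = np.asarray(S, dtype=bool)
    best_label, best_beta = None, -1.0
    for rule in rules:
        mu = np.exp(-((x - rule["b"]) / rule["c"]) ** 2)   # membership to each term
        beta = np.prod(mu[S]) if S.any() else 0.0          # firing strength of the rule
        if beta > best_beta:                               # keep the rule with maximum degree
            best_label, best_beta = rule["label"], beta
    return best_label
```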

3.1.4. Classification Quality Evaluation

The most common classification quality criterion is the overall accuracy, which is the percentage of correct classification. In the observation table {(xp; cp), p ∈ [1, z]}, where z is the number of instances, the measure of accuracy can be given as follows:
$$Acc(\theta, S) = \frac{1}{z} \sum_{p=1}^{z} \begin{cases} 1, & \text{if } c_p = \arg\max\limits_{1 \le j \le l} f_j(x_p; \theta, S) \\ 0, & \text{otherwise,} \end{cases}$$
where f(xp; θ, S) is the output of the fuzzy classifier with the parameter vector θ and the binary feature vector S at the point xp. As noted earlier, the overall accuracy is not an objective assessment of classification quality when there is an imbalance in the class distribution.
The geometric mean is a sensitive estimate for the accuracy of each class:
$$GM(\theta, S) = \left(\prod_{i=1}^{l} Acc_i(\theta, S)\right)^{1/l},$$
where Acci(θ, S) is the classification accuracy of ith class:
$$Acc_i(\theta, S) = \frac{1}{z_i} \sum_{p=1}^{z_i} \begin{cases} 1, & \text{if } c_p = \arg\max\limits_{1 \le j \le l} f_j(x_p; \theta, S) \\ 0, & \text{otherwise,} \end{cases}$$
where zi is the number of instances with the ith class label. Thus, the fewer instances a class contains, the more significantly the geometric mean increases with each additional correctly classified instance of that class. If one of the classes is classified completely incorrectly, the geometric mean is zero.
When the overall accuracy is used as the objective function, the classifier tends to focus on recognizing negative classes. The geometric mean, on the other hand, can lead to a large loss in the classification quality of negative classes, even if the accuracy of positive classes remains low. We propose to use a compromise option that combines both of these metrics and allows varying their relative importance using a coefficient γ ∈ [0; 1]:
$$Fit(\theta, S) = \gamma \cdot GM(\theta, S) + (1 - \gamma) \cdot Acc(\theta, S).$$
The problem of constructing a fuzzy classifier reduces to searching for the maximum of the selected function.
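A straightforward NumPy sketch of the proposed metric is given below; it computes the overall accuracy, the per-class accuracies, their geometric mean, and the weighted sum with the priority coefficient γ.

```python
import numpy as np

def fitness(y_true, y_pred, gamma=0.5):
    """Proposed metric: gamma * GM + (1 - gamma) * overall accuracy."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    acc = np.mean(y_true == y_pred)                          # overall accuracy
    per_class = [np.mean(y_pred[y_true == c] == c)           # accuracy of each class
                 for c in np.unique(y_true)]
    gm = float(np.prod(per_class)) ** (1.0 / len(per_class)) # geometric mean
    return gamma * gm + (1.0 - gamma) * acc
```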

3.2. Training a Classifier with the Gravitational Search Algorithm

For selecting features and tuning term parameters, we use the gravitational search algorithm in two versions: a binary version for optimizing the binary feature vector S and a continuous version for optimizing the continuous vector of term parameters θ. The GSA was first proposed by Rashedi, Nezamabadi-pour, and Saryazdi in 2009 [33], and its binary version was described in the same year [34]. The algorithm is widely used to solve various problems. For example, the GSA was applied to optimize the parameters of a geothermal power generation system in the study of Özkaraca and Keçebaş [35] and to determine the location of a microseismic source in order to warn about explosions in tunnels in [36]. Mahanipour and Nezamabadi-pour described the use of the GSA for the automatic creation of computer programs in [37] and for feature construction in [38].
The application of the binary and continuous versions of the GSA to the fuzzy classifier has been described in detail earlier in [6]. In the binary GSA, a population of particles corresponding to binary feature vectors S is generated randomly. At each iteration, the algorithm calculates particle masses, gravitational forces, accelerations, and velocities. A transfer function then converts the obtained velocity value into the probability of flipping the corresponding bit of the feature vector. In this paper, we use a V-shaped transfer function:
$$\text{If } rand(0;1) < \left|\frac{2}{\pi}\arctan\left(\frac{\pi}{2} v_i^d(t+1)\right)\right|, \text{ then } s_i^d(t+1) = \overline{s_i^d(t)}, \text{ else } s_i^d(t+1) = s_i^d(t),$$
where rand(0;1) is a random number in the range from 0 to 1, $v_i^d$ is the velocity of the dth element of the ith particle, $s_i^d$ is the value of the dth element of the ith feature vector, and t is the iteration number.
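The position update of the binary GSA with this V-shaped transfer function can be sketched as follows (only the bit-flipping step is shown; the mass, force, and velocity calculations are omitted).

```python
import numpy as np

def update_binary_position(s, v, rng=np.random.default_rng()):
    """Flip bit s_d with probability |2/pi * arctan(pi/2 * v_d)|, otherwise keep it."""
    s = np.asarray(s, dtype=int)
    prob = np.abs(2.0 / np.pi * np.arctan(np.pi / 2.0 * np.asarray(v)))
    flip = rng.random(len(s)) < prob
    s_new = s.copy()
    s_new[flip] = 1 - s_new[flip]    # complement the flipped bits
    return s_new
```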
The continuous GSA (GSAc) optimizes the numerical vector θ consisting of the term parameters. In this version of the algorithm, the population is formed as follows: the first vector is the one produced at the stage of creating the classifier structure, and the remaining vectors are generated from it with some deviation. Unlike in the binary version, in the GSAc the vector value is updated by simply adding the calculated velocity to the current value:
$$\theta_i^d(t+1) = \theta_i^d(t) + v_i^d(t+1),$$
where $\theta_i^d$ is the value of the dth element of the ith vector.
Five parameters are used in both versions of the GSA: the number of iterations t, the number of particles P, the value of the gravitational constant G0, the coefficient of the gravitational constant decrease α, and the variable ε used when calculating the attractive force. The computational complexity of the GSA with n agents is O(n × d), where d is the search space dimension [39]. We did not modify the original GSA; therefore, both algorithms have complexity O(P × d), where P is the number of particles and d is the dimension of the corresponding search space.
The classifier training procedure is as follows. First, the algorithm based on the extreme values of classes creates the initial vector θ. The binary GSA then searches for the optimal vector S, the classifier is rebuilt on the obtained feature set Sbest, and the algorithm for optimizing the term parameters is launched. The continuous GSA runs for a given number of iterations and returns the best parameter vector θbest. Finally, the resulting Sbest and θbest are used to construct and validate the classifier on the test data.
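For reference, a minimal continuous GSA for maximizing a fitness function is sketched below. It follows the standard formulation of [33] (masses from normalized fitness values, an exponentially decreasing gravitational constant, and acceleration, velocity, and position updates) and seeds the population around an initial vector, but it omits details of the authors' implementation such as the periodic population restart described in Section 4.

```python
import numpy as np

def gsa_continuous(fitness, x0, n_particles=15, n_iter=750,
                   G0=10.0, alpha=10.0, eps=0.01, spread=0.1, seed=0):
    """Minimal continuous gravitational search algorithm (maximization)."""
    rng = np.random.default_rng(seed)
    d = len(x0)
    X = x0 + spread * rng.standard_normal((n_particles, d))  # deviations around x0
    X[0] = x0                                                 # keep the initial vector itself
    V = np.zeros((n_particles, d))
    best_x, best_f = np.array(x0, dtype=float), fitness(x0)

    for t in range(n_iter):
        fit = np.array([fitness(x) for x in X])
        if fit.max() > best_f:
            best_f, best_x = fit.max(), X[fit.argmax()].copy()

        worst, best = fit.min(), fit.max()
        m = (fit - worst) / (best - worst + 1e-12)            # masses from fitness values
        M = m / (m.sum() + 1e-12)

        G = G0 * np.exp(-alpha * t / n_iter)                  # decreasing gravitational constant
        A = np.zeros_like(X)                                  # accelerations
        for i in range(n_particles):
            for j in range(n_particles):
                if i != j:
                    R = np.linalg.norm(X[i] - X[j])
                    A[i] += rng.random(d) * G * M[j] * (X[j] - X[i]) / (R + eps)

        V = rng.random((n_particles, d)) * V + A              # velocity update
        X = X + V                                             # position update
    return best_x, best_f
```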

4. Experimental Results

The experiment was performed on imbalanced binary datasets from the KEEL repository [40]. The sets are described in Table 1. Here, Fall is the number of features in a dataset, Strall is the number of instances, Str+ is the number of instances of the smallest class, Str− is the number of instances of the largest class, and IR is the imbalance ratio, i.e., the ratio of the number of negative-class instances to the number of positive-class instances.
Five-fold cross-validation was applied in all stages of the experiment. The data were divided into five pairs of training and test samples. The structure of the fuzzy classifier was formed by the algorithm based on the extreme values of classes with symmetric Gaussian terms. Since only two classes are represented in all data, the number of rules in all cases was two.
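A stratified split such as the following can reproduce this setup; the synthetic data here merely stand in for one of the KEEL datasets, and the exact partitions used in the experiment may differ.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (rng.random(200) < 0.1).astype(int)        # roughly 10% positive class

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
    # build the rule base on the training fold, tune it with the GSA, evaluate on the test fold
```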
In the first stage of the experiment, the efficiency of the continuous gravitational algorithm was tested for different values of the priority coefficient γ in the fitness function. The fuzzy classifier parameters were tuned on the full sets of features. The following parameters were set for the GSAc: 750 iterations, 15 particles, G0 = 10, α = 10, and ε = 0.01. The particle population was cleared after every 150th iteration, except for the best particle, on the basis of which the population was generated anew. The parameters were chosen empirically as the most universal for the selected datasets.
Table 2 contains the results of the first experimental stage. The quality of the constructed models was assessed with the following indexes: the classification accuracy, the geometric mean, the percentage of correctly classified positive-class instances relative to the total number of positive instances (true positive rate), and the percentage of correctly classified negative-class instances relative to the total number of negative instances (true negative rate). The table shows the results obtained on the test data as an average over three runs (Avr.) and the best run (Best).
The purpose of the second experimental stage was to verify the effectiveness of the GSA in selecting features in the wrapper mode for the fuzzy classifier of imbalanced data. The binary gravitational algorithm with the same coefficient γ was run three times on each sample. Due to the stochasticity of the algorithm, one to three different feature sets could be obtained on the same sample. The set of features with the highest fitness function value was then selected, a classifier was built on this set, and the parameters of the created model were tuned by the continuous algorithm. The obtained values of the quality indicators were averaged over three independent runs of the GSAc.
The following parameters were empirically selected for the binary gravitational algorithm: 750 iterations, 15 particles, G0 = 10, α = 10, and ε = 0.01. The parameters of the continuous algorithm did not differ from those used in the first stage of the experiment. Table 3 shows the results of the classifier on the selected feature sets before parameter tuning (GSAb) and after optimization (GSAb + GSAc). In Table 3 and the following tables, the cells are formatted according to a color scale to visualize the results: the values in each row are compared with each other, and the hue depends on the relative magnitude of a value compared to the other cells in the row; the worst results are marked in red, the best are highlighted in green, and the remaining values are colored in intermediate hues.
Table 4 shows fuzzy classifiers based on the best feature sets. The best sets here are those that gain the highest averaged value of the objective function with a given value γ over five samples.

5. Discussion

To confirm the effectiveness of the gravitational algorithm for optimizing the fuzzy classifier of imbalanced data, we performed a five-stage comparison.
The task of the first stage was to check the quality of the fuzzy classifier in the presence of feature selection. For this purpose, we compared the results of fuzzy classifiers constructed on the complete datasets (Table 2, average values over three runs) with those built on the reduced sets of features (Table 3). In both cases, the results obtained after tuning the parameters with the GSAc were taken into account. Table 5 shows the results of the pairwise comparison of the number of features by the Wilcoxon signed-rank test for paired samples. The significance level is 0.05; the null hypothesis states that the median of the differences between the two samples is zero.
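Such a paired comparison can be reproduced with SciPy; the feature counts below are taken from Table 1 (Fall) and Table 3 (Fbin at γ = 0). Note that SciPy's exact method yields a slightly different p-value than the asymptotic standardized statistic reported in Table 5.

```python
from scipy.stats import wilcoxon

f_all = [18, 5, 19, 10, 13, 13, 7, 8]                 # Fall, Table 1
f_bin = [10.2, 3.6, 7.2, 3.8, 6.2, 4.0, 3.0, 3.2]     # Fbin at gamma = 0, Table 3

stat, p = wilcoxon(f_all, f_bin)
print(p, p < 0.05)   # reject the null hypothesis of zero median difference if True
```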
The first three rows of the table are the comparison of the number of features in the original set (Fall) and in the selected feature sets (Fbin). The last three rows are the comparison of the number of features when using the GSAb with different values of the coefficient γ in the fitness function.
On the basis of the results of this verification, we conclude that the binary gravitational algorithm can significantly reduce the number of features when working with imbalanced data in the wrapper mode of the fuzzy classifier. In addition, there is no significant difference in the number of features for different values of γ.
Table 6 shows the results of comparing the performance indexes of classifiers built on the complete and the selected sets of features for different values of the priority coefficient γ in the fitness function. The obtained values of the Wilcoxon signed-rank test are grouped for each of the four quality indexes (the overall accuracy, the geometric mean, the percentage of correctly classified positive-class instances, and the percentage of correctly identified negative-class instances).
Thus, the results of the first stage of the comparison show that the use of the GSAb for selecting features in the wrapper mode of the fuzzy classifier of imbalanced data significantly reduces the number of features while maintaining or increasing the quality of classification.
In the second stage, the effectiveness of the binary gravitational algorithm was tested in comparison with popular methods of selecting features. We used a random search (RS) and a filtering algorithm based on mutual information (MI).
The filter was executed as follows. The value of mutual information between each feature and the class label was estimated using three randomly-selected neighbors, and the arithmetic mean of these values was computed; the set of selected features included only those variables whose mutual information exceeded this mean. Both algorithms were run three times, and among the obtained feature sets, those with the best accuracy were selected. Fuzzy classifiers were constructed on the selected feature sets using the algorithm based on the extreme values of classes. The obtained values were compared with the results of fuzzy classifiers built on the feature sets found by the GSAb (Table 3); in this case, we considered the results without parameter optimization. The average performance indexes of the classifiers are given in Table 7 (F is the number of features).
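The filter can be approximated with the k-nearest-neighbors mutual information estimator from scikit-learn (three neighbors), keeping the features whose estimate exceeds the arithmetic mean; the authors' exact estimator may differ.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mi_filter(X, y, n_neighbors=3, random_state=0):
    """Keep features whose estimated mutual information with the class label
    exceeds the arithmetic mean over all features."""
    mi = mutual_info_classif(X, y, n_neighbors=n_neighbors, random_state=random_state)
    return np.where(mi > mi.mean())[0]   # indices of the selected features
```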
Table 8 demonstrates the results of a pairwise comparison of the performance indexes of the obtained systems by the Wilcoxon signed-rank test for paired samples. Here, STS is the standardized test statistic, p is the p-value, and NH is the null hypothesis. The left half of Table 8 shows the comparison with the random search algorithm, and the right half shows the comparison with the filter based on mutual information.
The algorithms are statistically indistinguishable by the number of selected features. However, the values of the standardized test statistic show that fuzzy classifiers constructed on the features selected by the gravitational search algorithm have higher classification quality in most cases. Hence, for imbalanced data classification, the binary gravitational algorithm is preferable to the random search and to the filter based on mutual information.
In the third stage, we compared our results with fuzzy classifiers built on imbalanced data preprocessed by the SMOTE algorithm. We used the implementation of the algorithm from the open imbalanced-learn library [41] with all parameters at their default values. After applying SMOTE, the numbers of instances of the positive and negative classes were equal. Next, we conducted five-fold cross-validation. Fuzzy classifiers were constructed with the algorithm based on the extreme values of the classes; feature selection was not performed. Table 9 presents the results of the fuzzy classifiers averaged over the five samples.
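The oversampling step with default parameters looks as follows; the synthetic dataset and the fixed random seed are not part of the original experiment and are used here only to make the sketch reproducible.

```python
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic stand-in for one of the imbalanced KEEL datasets.
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y), Counter(y_res))   # the two classes become equal in size
```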
We compared the obtained results with the results demonstrated in Table 2, where fuzzy classifiers were constructed on complete sets of imbalanced data and optimized by the continuous GSA. The Wilcoxon’s criterion values for the third stage are presented in Table 10.
The comparison shows that the fuzzy classifiers constructed on the original datasets and tuned by the GSAc demonstrate better overall accuracy than the fuzzy classifiers built on oversampled data, with comparable recognition quality of the positive class. Therefore, if it is important not only to classify the positive class correctly but also to avoid large losses in recognizing the negative class, a fuzzy classifier with parameter tuning by the GSAc is the preferable tool.
At the next stage of comparison, the feature selection was carried out on the oversampled data. Table 11 presents the results of fuzzy classification averaged over five samples on subsets of features obtained by the random search algorithm.
Table 12 presents the values of the performance indexes obtained after selecting features by the filter based on mutual information.
We compared these values with the results of constructing fuzzy classifiers with feature selection and parameter tuning using the GSA on the initial datasets (Table 3). Table 13 shows the results of the comparison by the Wilcoxon test.
The results demonstrate that fuzzy classifiers optimized by the gravitational search algorithm show better results than fuzzy classifiers constructed on selected sets of features after data oversampling using the SMOTE.
The last stage of the comparison was to check the effectiveness of the fuzzy classifier that uses the GSA for selecting features and tuning parameters against state-of-the-art classification algorithms. Using the open scikit-learn library, the following classifiers were built on the complete datasets: Gaussian naive Bayes (GNB), logistic regression classifier (LR), decision tree classifier (DT), multilayer perceptron classifier (MLP), linear support vector classifier (LSV), K-nearest neighbors classifier with k = 3 (3NN), AdaBoost classifier (AB), random forest classifier (RF), and gradient boosting for classification (GB) [42]. All algorithm parameters were left at their default values.
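A sketch of this comparison with scikit-learn defaults is shown below; the synthetic data and the simple train/test split stand in for one cross-validation fold of a KEEL dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "GNB": GaussianNB(), "LR": LogisticRegression(), "DT": DecisionTreeClassifier(),
    "MLP": MLPClassifier(), "LSV": LinearSVC(), "3NN": KNeighborsClassifier(n_neighbors=3),
    "AB": AdaBoostClassifier(), "RF": RandomForestClassifier(), "GB": GradientBoostingClassifier(),
}
for name, model in models.items():
    acc = model.fit(X_train, y_train).score(X_test, y_test)   # overall accuracy on the test part
    print(name, round(acc, 3))
```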
Table 14 contains the results of constructing various classifiers on selected data sets. The last three columns show the fuzzy classifiers from Table 4.
The obtained values were compared using the Wilcoxon signed-rank test for paired samples (Table A1, Table A2, Table A3 and Table A4). The fuzzy classifier demonstrates results comparable with its analogues in terms of the overall accuracy and the geometric mean but uses fewer features. It shows the best results for the TPrate value when the coefficient γ is equal to one. With γ equal to 0.5, the fuzzy classifier shows statistically comparable TPrate results and is inferior to only three algorithms in terms of TNrate.
Thus, if the chosen priority coefficient γ is zero, the proposed metric reduces to the overall accuracy. The classifier then focuses on recognizing the negative class, and, as a result, the model has a low Type I error but a high Type II error.
When γ is equal to 1, the function is identical to the geometric mean. The efficiency of the fuzzy classifier with respect to the positive class then increases; as a result, the Type II error decreases, but the Type I error can increase significantly.
When γ is close to 0.5, a system with low values of both errors is obtained. The proposed metric can be useful for data such as vowel0, ecoli4, and yeast4, where a high-quality classification of one class can lead to large losses in the ability of the model to recognize the other class.

6. Conclusions

We considered the possibility of applying the gravitational search algorithm to improve the efficiency of the fuzzy classifier in the presence of data imbalance. The binary GSA reduced the input feature space by selecting informative feature subsets in the wrapper mode for the fuzzy classifier, and the continuous GSA helped to improve the quality of classification. We proposed a new metric whose priority coefficient influences the final performance indexes of the model. The ability to change the priority between the correctly recognized positive and negative classes allows the developer to flexibly configure the fuzzy classifier. In future work, we plan to further study the impact of the coefficient on the results and to formulate recommendations on its value for particular dataset characteristics.

Author Contributions

Conceptualization and methodology, M.B. and I.H.; software, M.B.; validation, I.H. and M.B.; investigation, K.S. and I.H.; writing, I.H. and M.B.; writing—review and editing, M.B., I.H., and A.K.; supervision, A.K. and A.S.; project administration, A.K. and A.S.; funding acquisition, A.S.

Funding

This research was supported by the Ministry of Education and Science of the Russian Federation, Government Order no. 2.8172.2017/8.9 (TUSUR).

Conflicts of Interest

The authors declare no conflict of interest. The sponsors had no role in the design, execution, interpretation, or writing of the study.

Appendix A

Table A1. The results of the comparison of various classification algorithms with fuzzy classifiers optimized with the gravitational search algorithm by the value of the overall accuracy.
Algorithm | FC, γ = 0 (STS / p / NH) | FC, γ = 1 (STS / p / NH) | FC, γ = 0.5 (STS / p / NH)
GNB | −2.521 / 0.012 / Reject | −2.521 / 0.012 / Reject | −2.521 / 0.012 / Reject
LR | −0.21 / 0.833 / Retain | 0.7 / 0.484 / Retain | 0.42 / 0.674 / Retain
DT | −0.56 / 0.575 / Retain | 0.42 / 0.674 / Retain | 0.14 / 0.889 / Retain
MLP | −0.14 / 0.889 / Retain | 1.122 / 0.262 / Retain | 0.7 / 0.484 / Retain
LSV | −1.12 / 0.263 / Retain | 0.14 / 0.889 / Retain | −0.28 / 0.779 / Retain
3NN | 0 / 1 / Retain | 0.98 / 0.327 / Retain | 0.7 / 0.484 / Retain
AB | −0.14 / 0.889 / Retain | 0.98 / 0.327 / Retain | 0.7 / 0.484 / Retain
RF | −0.491 / 0.624 / Retain | 1.26 / 0.208 / Retain | 0.771 / 0.441 / Retain
GB | 0.07 / 0.944 / Retain | 1.183 / 0.237 / Retain | 0.845 / 0.398 / Retain
Table A2. The results of the comparison of various classification algorithms with fuzzy classifiers optimized with the gravitational search algorithm by the value of the geometric mean accuracy of each class.
Algorithm | FC, γ = 0 (STS / p / NH) | FC, γ = 1 (STS / p / NH) | FC, γ = 0.5 (STS / p / NH)
GNB | 0.28 / 0.779 / Retain | −2.521 / 0.012 / Reject | −2.521 / 0.012 / Reject
LR | 0.421 / 0.674 / Retain | −1.54 / 0.123 / Retain | −1.54 / 0.123 / Retain
DT | 1.12 / 0.263 / Retain | −1.82 / 0.069 / Retain | −1.68 / 0.093 / Retain
MLP | 1.26 / 0.208 / Retain | −1.4 / 0.161 / Retain | −1.26 / 0.208 / Retain
LSV | −0.7 / 0.484 / Retain | −1.963 / 0.05 / Reject | −1.82 / 0.069 / Retain
3NN | 1.26 / 0.208 / Retain | −1.521 / 0.128 / Retain | −1.26 / 0.208 / Retain
AB | 0.84 / 0.401 / Retain | −1.69 / 0.091 / Retain | −1.26 / 0.208 / Retain
RF | 0 / 1 / Retain | −1.68 / 0.093 / Retain | −1.68 / 0.093 / Retain
GB | 0.84 / 0.401 / Retain | −1.54 / 0.123 / Retain | −1.4 / 0.161 / Retain
Table A3. The results of the comparison of various classification algorithms with fuzzy classifiers optimized with the gravitational search algorithm by the value of the true positive rate.
Algorithm | FC, γ = 0 (STS / p / NH) | FC, γ = 1 (STS / p / NH) | FC, γ = 0.5 (STS / p / NH)
GNB | 1.859 / 0.063 / Retain | −0.42 / 0.674 / Retain | 0.42 / 0.674 / Retain
LR | 0.338 / 0.735 / Retain | −2.24 / 0.025 / Reject | −1.54 / 0.123 / Retain
DT | 1.014 / 0.31 / Retain | −2.521 / 0.012 / Reject | −1.82 / 0.069 / Retain
MLP | 1.54 / 0.123 / Retain | −2.028 / 0.043 / Reject | −1.26 / 0.208 / Retain
LSV | −0.28 / 0.779 / Retain | −1.992 / 0.046 / Reject | −1.68 / 0.093 / Retain
3NN | 1.521 / 0.128 / Retain | −2.521 / 0.012 / Reject | −1.4 / 0.161 / Retain
AB | 1.014 / 0.31 / Retain | −2.24 / 0.025 / Reject | −1.4 / 0.161 / Retain
RF | −0.169 / 0.866 / Retain | −2.524 / 0.012 / Reject | −1.68 / 0.093 / Retain
GB | 1.4 / 0.161 / Retain | −2.24 / 0.025 / Reject | −1.54 / 0.123 / Retain
Table A4. The results of the comparison of various classification algorithms with fuzzy classifiers optimized with the gravitational search algorithm by the value of the true negative rate.
Algorithm | FC, γ = 0 (STS / p / NH) | FC, γ = 1 (STS / p / NH) | FC, γ = 0.5 (STS / p / NH)
GNB | −2.521 / 0.012 / Reject | −2.366 / 0.018 / Reject | −2.521 / 0.012 / Reject
LR | −1.12 / 0.263 / Retain | 1.26 / 0.208 / Retain | 0.98 / 0.327 / Retain
DT | −2.38 / 0.017 / Reject | 1.183 / 0.237 / Retain | 0.98 / 0.327 / Retain
MLP | −1.54 / 0.123 / Retain | 1.26 / 0.208 / Retain | 1.12 / 0.263 / Retain
LSV | −1.68 / 0.093 / Retain | 0.56 / 0.575 / Retain | 0.42 / 0.674 / Retain
3NN | −1.54 / 0.123 / Retain | 1.26 / 0.208 / Retain | 1.332 / 0.183 / Retain
AB | −1.54 / 0.123 / Retain | 2.383 / 0.017 / Reject | 1.96 / 0.05 / Reject
RF | −1.262 / 0.207 / Retain | 2.103 / 0.035 / Reject | 2.1 / 0.036 / Reject
GB | −0.631 / 0.528 / Retain | 2.38 / 0.017 / Reject | 2.1 / 0.036 / Reject

References

  1. Peng, L.; Zhang, H.; Yang, B.; Chen, Y. A new approach for imbalanced data classification based on data gravitation. Inf. Sci. 2014, 288, 347–373. [Google Scholar] [CrossRef]
  2. Special Issue on Recent advances in Theory, Methodology and Applications of Imbalanced Learning. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 763. [CrossRef]
  3. He, H.; Garcia, E.A. Learning from Imbalanced Data. IEEE Trans. Know. Data Eng. 2009, 21, 1263–1284. [Google Scholar] [CrossRef]
  4. Ali, A.; Shamsuddin, S.M.; Ralescu, A. Classification with class imbalance problem: A review. Int. J. Adv. Soft Comput. Appl. 2013, 5, 1–30. [Google Scholar]
  5. Mathew, J.; Pang, C.K.; Luo, M.; Leong, W.H. Classification of Imbalanced Data by Oversampling in Kernel Space of Support Vector Machines. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 4065–4076. [Google Scholar] [CrossRef] [PubMed]
  6. Bardamova, M.; Konev, A.; Hodashinsky, I.; Shelupanov, A. A Fuzzy Classifier with Feature Selection Based on the Gravitational Search Algorithm. Symmetry 2018, 10, 609. [Google Scholar] [CrossRef]
  7. He, H.; Ma, Y. (Eds.) Imbalanced Learning: Foundations, Algorithms, and Applications; John Wiley & Sons Inc.: Hoboken, NJ, USA, 2013; p. 216. [Google Scholar]
  8. Hand, D. Measuring classifier performance: A coherent alternative to the area under the ROC curve. Mach. Learn. 2009, 77, 103–123. [Google Scholar] [CrossRef]
  9. Ferri, C.; Haernandez-Orallo, J.; Modroiu, R. An experimental comparison of performance measures for classification. Pattern Recognit. Lett. 2009, 30, 27–38. [Google Scholar] [CrossRef]
  10. Fernandez, J.C.; Carbonero, M.; Gutierrez, P.A.; Hervas-Martınez, C. Multi-objective evolutionary optimization using the relationship between F1 and accuracy metrics in classification tasks. Appl. Intell. 2019, 49, 3447–3463. [Google Scholar] [CrossRef]
  11. Lopez, V.; Fernandez, A.; Garcia, S.; Paladec, V.; Herrera, F. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 2013, 250, 113–141. [Google Scholar] [CrossRef]
  12. Lopez, V.; del Rio, S.; Benitez, J.M.; Herrera, F. Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data. Fuzzy Set Syst. 2015, 258, 5–38. [Google Scholar] [CrossRef]
  13. Vluymans, S.; Tarrago, D.S.; Saeys, Y.; Cornelis, C.; Herrera, F. Fuzzy rough classifiers for class imbalanced multi-instance data. Pattern Recognit. 2016, 53, 36–45. [Google Scholar] [CrossRef]
  14. Fernández, A.; Del Jesus, M.J.; Herrera, F. Hierarchical fuzzy rule based classification systems with genetic rule selection for imbalanced datasets. Int. J. Approx. Reason. 2009, 50, 561–577. [Google Scholar] [CrossRef]
  15. Villarino, G.; Gómez, D.; Rodríguez, J.T.; Montero, J. A bipolar knowledge representation model to improve supervised fuzzy classification algorithms. Soft Comput. 2018, 22, 5121–5146. [Google Scholar] [CrossRef]
  16. Haixiang, G.; Li, Y.; Shang, J.; Mingyun, G.; Yuanyue, H.; Gong, B. Learning from class-imbalanced data: Review of methods and application. Expert Syst. Appl. 2017, 73, 220–239. [Google Scholar] [CrossRef]
  17. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  18. Liu, G.; Yang, Y.; Li, B. Fuzzy rule-based oversampling technique for imbalanced and incomplete data learning. Knowl. Based Syst. 2018, 158, 154–174. [Google Scholar] [CrossRef]
  19. D‘Addabbo, A.; Maglietta, R. Parallel selective sampling method for imbalanced and large data classification. Pattern Recognit. Lett. 2015, 62, 61–67. [Google Scholar] [CrossRef]
  20. Diez-Pastor, J.F.; Rodriguez, J.J.; García-Osorio, C.; Kuncheva, L.I. Random balance: ensembles of variable priors classifiers for imbalanced data. Knowl. Based Syst. 2015, 85, 96–111. [Google Scholar] [CrossRef]
  21. Saez, J.A.; Luengo, J.; Stefanowski, J.; Herrera, F. SMOTE–IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Inf. Sci. 2015, 291, 184–203. [Google Scholar] [CrossRef]
  22. Lin, W.-C.; Tsai, C.-F.; Hu, Y.-H.; Jhang, J.-S. Clustering-based undersampling in class-imbalanced data. Inf. Sci. 2017, 409–410, 17–26. [Google Scholar] [CrossRef]
  23. Ofek, N.; Rokach, L.; Stern, R.; Shabtai, A. Fast-CBUS: a fast clustering-based undersampling method for addressing the class imbalance problem. Neurocomputing 2017, 243, 88–102. [Google Scholar] [CrossRef]
  24. Diao, R. Feature Selection with Harmony Search and Its Applications. Available online: https://www.researchgate.net/publication/283652269_Feature_selection_with_harmony_search_and_its_applications (accessed on 10 March 2019).
  25. Witten, I.H.; Frank, E. Data Mining Practical Machine Learning Tools and Techniques, 2nd ed.; Morgan Kaufmann: Amsterdam, The Netherlands, 2011; 558p. [Google Scholar]
  26. Liu, H.; Yu, L. Toward Integrating Feature Selection Algorithms for Classification and Clustering. IEEE Trans. Knowl. Data Eng. 2005, 17, 491–502. [Google Scholar]
  27. Senthamarai Kannan, S.; Ramaraj, N. A novel hybrid feature selection via symmetrical uncertainty ranking based local memetic search algorithm. Knowl. Based Syst. 2010, 23, 580–585. [Google Scholar] [CrossRef]
  28. Bonnlander, B.; Weigend, A. Selecting input variables using mutual information and nonparametric density estimation. Int. Symp. Artif. Neural Netw. 1994, 49, 42–50. [Google Scholar]
  29. Du, L.; Xu, Y.; Zhu, H. Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm. Ann. Data Sci. 2015, 2, 293–300. [Google Scholar] [CrossRef]
  30. Hernandez, J.C.H.; Duval, B.; Hao, J.-K. A genetic embedded approach for gene selection and classification of microarray data. In Lecture Notes in Computer Science. Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. 5th European Conference, EvoBIO 2007, Valencia, Spain, 11–13 April 2007; Marchiori, E., Moore, J.H., Rajapakse, J.C., Eds.; Springer: Berlin/Heidelberg, Germany, 2007; Volume 4447, pp. 90–101. [Google Scholar] [CrossRef]
  31. Moayedikia, A.; Ong, K.-L.; Boo, Y.L.; Yeoh, W.G.S.; Jensen, R. Feature selection for high dimensional imbalanced class data using harmony search. Eng. Appl. Artif. Intell. 2017, 57, 38–49. [Google Scholar] [CrossRef]
  32. Hodashinsky, I.; Sarin, K. Feature Selection for Classification through Population Random Search with Memory. Autom. Remote Control 2019, 80, 324–333. [Google Scholar] [CrossRef]
  33. Rashedi, E.; Nezamabadi-pour, H.; Saryazdi, S. GSA: A Gravitational Search Algorithm. Inf. Sci. 2009, 179, 2232–2248. [Google Scholar] [CrossRef]
  34. Rashedi, E.; Nezamabadi-pour, H.; Saryazdi, S. BGSA: Binary gravitational search algorithm. Nat. Comput. 2010, 9, 727–745. [Google Scholar] [CrossRef]
  35. Özkaraca, O.; Keçebaş, A. Performance analysis and optimization for maximum exergy efficiency of a geothermal power plant using gravitational search algorithm. Energy Convers. Manag. 2019, 185, 155–168. [Google Scholar] [CrossRef]
  36. Ma, C.; Jiang, Y.; Li, T. Gravitational Search Algorithm for Microseismic Source Location in Tunneling: Performance Analysis and Engineering Case Study. Rock Mech. Rock Eng. 2019, 1–18. [Google Scholar] [CrossRef]
  37. Mahanipour, A.; Nezamabadi-pour, H. GSP: an automatic programming technique with gravitational search algorithm. Appl. Intell. 2019, 49, 1502–1516. [Google Scholar] [CrossRef]
  38. Mahanipour, A.; Nezamabadi-pour, H. A multiple feature construction method based on gravitational search algorithm. Expert Syst. Appl. 2019, 127, 199–209. [Google Scholar] [CrossRef]
  39. Pelusi, D.; Mascella, R.; Tallini, L.; Nayak, J.; Naik, B.; Abraham, A. Neural network and fuzzy system for the tuning of Gravitational Search Algorithm parameters. Expert Syst. Appl. 2018, 102, 234–244. [Google Scholar] [CrossRef]
  40. Knowledge Extraction Based on Evolutionary Learning. Available online: http://keel.es (accessed on 10 May 2019).
  41. Lemaître, G.; Nogueira, F.; Aridas, C.K. Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning. J. Mach. Learn. Res. 2017, 18, 1–5. [Google Scholar]
  42. Scikit-learn. User Guide. Supervised Learning. Available online: https://scikit-learn.org/stable/supervised_learning.html#supervised-learning (accessed on 13 August 2019).
Figure 1. Example of a fuzzy partition of feature x1 by two symmetric fuzzy terms.
Table 1. Description of the datasets used in the experiment.
No. | Data Set | Fall | Strall | Str+ | Str− | IR
1 | vehicle0 | 18 | 846 | 199 | 647 | 3.25
2 | newthyroid2 | 5 | 215 | 35 | 180 | 5.14
3 | segment0 | 19 | 2308 | 329 | 1979 | 6.02
4 | page-blocks0 | 10 | 5472 | 559 | 4913 | 8.79
5 | vowel0 | 13 | 988 | 90 | 898 | 9.98
6 | cleveland-0vs4 | 13 | 177 | 13 | 164 | 12.62
7 | ecoli4 | 7 | 336 | 20 | 316 | 15.8
8 | yeast4 | 8 | 1484 | 51 | 1433 | 28.1
Table 2. Classification results obtained while using the continuous gravitational search algorithm for tuning fuzzy classifier parameters.
Metric | γ = 0 (Avr. / Best) | γ = 1 (Avr. / Best) | γ = 0.25 (Avr. / Best) | γ = 0.5 (Avr. / Best) | γ = 0.75 (Avr. / Best)
vehicle0
Acc. | 81.64 / 82.50 | 81.52 / 82.03 | 84.75 / 85.11 | 82.62 / 86.28 | 82.82 / 84.28
GM | 57.47 / 59.56 | 69.28 / 70.36 | 74.85 / 81.45 | 74.28 / 81.63 | 74.12 / 80.40
TPrate | 34.84 / 37.69 | 54.10 / 55.78 | 62.56 / 77.23 | 63.65 / 74.87 | 62.93 / 74.82
TNrate | 96.03 / 96.29 | 89.95 / 90.11 | 91.55 / 87.48 | 88.46 / 89.80 | 88.91 / 87.16
newthyroid2
Acc. | 98.76 / 99.07 | 98.76 / 99.07 | 98.60 / 99.07 | 98.91 / 99.53 | 98.45 / 99.07
GM | 98.05 / 98.24 | 98.85 / 99.44 | 98.35 / 99.44 | 98.54 / 99.72 | 97.46 / 98.24
TPrate | 97.14 / 97.14 | 99.05 / 100.00 | 98.10 / 100.00 | 98.10 / 100.00 | 96.19 / 97.14
TNrate | 99.07 / 99.44 | 98.70 / 98.89 | 98.70 / 98.89 | 99.07 / 99.44 | 98.89 / 99.44
segment0
Acc. | 91.29 / 91.42 | 90.83 / 90.90 | 91.41 / 91.46 | 91.13 / 91.16 | 90.87 / 90.90
GM | 92.36 / 92.82 | 94.15 / 94.31 | 93.53 / 93.57 | 93.85 / 93.99 | 94.05 / 94.07
TPrate | 93.92 / 94.83 | 99.09 / 99.39 | 96.65 / 96.65 | 97.87 / 98.18 | 98.78 / 98.78
TNrate | 90.85 / 90.85 | 89.46 / 89.49 | 90.53 / 90.60 | 90.01 / 89.99 | 89.56 / 89.59
page-blocks0
Acc. | 93.24 / 93.93 | 88.96 / 91.03 | 93.37 / 94.19 | 92.85 / 93.59 | 90.99 / 91.01
GM | 65.30 / 72.39 | 76.79 / 80.59 | 74.64 / 77.05 | 74.01 / 79.42 | 74.18 / 78.28
TPrate | 44.07 / 53.31 | 64.52 / 69.59 | 57.36 / 60.64 | 56.95 / 65.30 | 58.64 / 65.86
TNrate | 98.83 / 98.55 | 91.74 / 93.47 | 97.47 / 98.01 | 96.93 / 96.80 | 94.67 / 93.87
vowel0
Acc. | 92.11 / 92.71 | 88.59 / 89.67 | 96.86 / 97.67 | 96.19 / 96.86 | 95.75 / 96.46
GM | 47.99 / 54.09 | 90.22 / 92.15 | 93.94 / 96.69 | 95.01 / 96.75 | 94.75 / 97.04
TPrate | 36.67 / 46.67 | 92.59 / 95.56 | 90.74 / 95.56 | 93.70 / 97.78 | 93.70 / 97.78
TNrate | 97.66 / 95.88 | 88.20 / 89.09 | 97.48 / 97.89 | 96.44 / 94.77 | 95.96 / 96.33
cleveland-0vs4
Acc. | 92.86 / 95.51 | 87.03 / 90.37 | 91.76 / 95.49 | 90.79 / 93.25 | 88.92 / 90.41
GM | 54.43 / 73.17 | 74.50 / 80.36 | 71.47 / 73.00 | 66.10 / 72.20 | 70.73 / 76.32
TPrate | 38.46 / 53.85 | 61.54 / 69.23 | 56.67 / 56.67 | 46.15 / 53.85 | 57.78 / 66.67
TNrate | 97.15 / 98.78 | 89.02 / 92.07 | 94.73 / 98.77 | 94.31 / 96.34 | 91.67 / 92.69
ecoli4
Acc. | 96.91 / 97.32 | 94.25 / 97.02 | 96.92 / 97.62 | 95.78 / 95.84 | 94.84 / 95.53
GM | 76.17 / 79.06 | 91.06 / 95.90 | 78.71 / 81.74 | 78.25 / 85.83 | 82.07 / 85.73
TPrate | 61.00 / 65.00 | 88.33 / 95.00 | 65.00 / 70.00 | 66.00 / 80.00 | 73.33 / 80.00
TNrate | 99.18 / 99.37 | 94.62 / 97.15 | 98.94 / 99.37 | 97.66 / 96.84 | 96.20 / 96.52
yeast4
Acc. | 96.52 / 96.63 | 81.81 / 86.66 | 92.57 / 92.05 | 89.31 / 88.54 | 85.47 / 88.88
GM | 2.11 / 6.32 | 78.28 / 80.18 | 69.12 / 74.76 | 76.55 / 83.22 | 74.41 / 78.20
TPrate | 0.65 / 1.96 | 75.16 / 74.51 | 51.45 / 60.55 | 66.01 / 78.43 | 64.61 / 68.55
TNrate | 100.00 / 100.00 | 82.04 / 87.09 | 94.03 / 93.17 | 90.14 / 88.90 | 86.21 / 89.60
Table 3. The results of constructing fuzzy classifiers on imbalanced datasets obtained with feature selection and parameter tuning.
Metric | γ = 0 (GSAb / GSAb + GSAc) | γ = 1 (GSAb / GSAb + GSAc) | γ = 0.5 (GSAb / GSAb + GSAc)
vehicle0
Features | 10.20 | 7.60 | 9.00
Accuracy | 83.33 / 84.43 | 80.61 / 77.07 | 81.09 / 84.28
GM | 66.40 / 67.25 | 75.28 / 78.01 | 71.78 / 78.28
TPrate | 47.24 / 47.91 | 67.34 / 83.08 | 58.79 / 69.35
TNrate | 94.44 / 95.67 | 84.70 / 75.22 | 87.94 / 88.87
newthyroid2
Features | 3.60 | 3.20 | 3.20
Accuracy | 99.53 / 99.07 | 98.60 / 98.45 | 98.60 / 98.45
GM | 98.52 / 97.03 | 99.16 / 98.66 | 99.16 / 98.66
TPrate | 97.14 / 94.29 | 100.00 / 99.05 | 100.00 / 99.05
TNrate | 100.00 / 100.00 | 98.33 / 98.33 | 98.33 / 98.33
segment0
Features | 7.20 | 6.60 | 6.80
Accuracy | 97.36 / 97.88 | 96.45 / 98.73 | 97.40 / 98.60
GM | 95.76 / 96.80 | 96.28 / 98.79 | 97.08 / 98.34
TPrate | 93.62 / 95.34 | 96.05 / 98.89 | 96.66 / 97.97
TNrate | 97.98 / 98.30 | 96.51 / 98.70 | 97.52 / 98.70
page-blocks0
Features | 3.80 | 4.20 | 2.80
Accuracy | 93.60 / 94.49 | 88.54 / 88.13 | 92.20 / 92.59
GM | 67.85 / 74.65 | 74.14 / 81.93 | 73.31 / 76.89
TPrate | 46.69 / 56.65 | 60.00 / 75.19 | 56.17 / 61.96
TNrate | 98.94 / 98.80 | 91.80 / 89.61 | 96.30 / 96.07
vowel0
Features | 6.20 | 6.60 | 6.60
Accuracy | 88.86 / 92.11 | 87.45 / 97.64 | 88.25 / 97.20
GM | 85.64 / 75.59 | 90.02 / 96.97 | 88.94 / 94.85
TPrate | 82.22 / 67.78 | 93.33 / 96.30 | 90.00 / 92.22
TNrate | 89.53 / 94.54 | 86.86 / 97.77 | 88.08 / 97.70
cleveland-0vs4
Features | 4.00 | 6.80 | 6.60
Accuracy | 93.78 / 93.79 | 88.70 / 92.06 | 85.86 / 89.97
GM | 39.17 / 47.80 | 82.38 / 82.46 | 68.01 / 66.57
TPrate | 30.77 / 33.33 | 76.92 / 74.36 | 53.85 / 48.72
TNrate | 98.78 / 98.58 | 89.63 / 93.50 | 88.41 / 93.29
ecoli4
Features | 3.00 | 3.20 | 3.00
Accuracy | 98.21 / 98.02 | 96.13 / 94.14 | 97.92 / 97.12
GM | 89.01 / 86.89 | 87.35 / 84.36 | 85.81 / 87.11
TPrate | 80.00 / 76.67 | 80.00 / 76.67 | 75.00 / 78.33
TNrate | 99.37 / 99.37 | 97.15 / 95.25 | 99.37 / 98.31
yeast4
Features | 3.20 | 3.20 | 2.40
Accuracy | 96.23 / 96.23 | 78.24 / 84.05 | 87.26 / 90.43
GM | 6.30 / 6.30 | 66.99 / 77.69 | 67.05 / 79.26
TPrate | 1.96 / 1.96 | 58.82 / 71.90 | 52.94 / 69.28
TNrate | 99.58 / 99.58 | 78.93 / 84.48 | 88.49 / 91.18
Table 4. The results of constructing fuzzy classifiers on the best feature sets found by the binary gravitational algorithm.
Columns correspond to γ = 0 | γ = 1 | γ = 0.5.
Data set: vehicle0
Features | 1, 4, 8, 9, 10, 13, 14, 15, 16 | 1, 4, 6, 7, 9, 10, 12, 13, 15, 16, 18 | 1, 5, 7, 9, 10, 11, 12, 15, 16, 17, 18
F | 9 | 11 | 11
Acc. | 85.07 | 78.30 | 84.00
GM | 66.87 | 82.86 | 80.58
TPrate | 46.40 | 93.33 | 75.88
TNrate | 96.96 | 73.64 | 86.50
Data set: newthyroid2
Features | 1, 2, 3, 5 | 1, 2, 5 | 1, 2, 5
F | 4 | 3 | 3
Acc. | 99.53 | 99.84 | 99.53
GM | 98.52 | 99.51 | 99.32
TPrate | 97.14 | 99.05 | 99.05
TNrate | 100.00 | 100.00 | 99.63
Data set: segment0
Features | 1, 4, 6, 11, 14, 18, 19 | 1, 6, 8, 14, 16, 18 | 6, 8, 11, 14, 18, 19
F | 7 | 6 | 6
Acc. | 98.93 | 99.08 | 99.02
GM | 98.22 | 99.08 | 98.66
TPrate | 97.26 | 99.09 | 98.18
TNrate | 99.21 | 99.07 | 99.16
Data set: page-blocks0
Features | 1, 2, 5, 10 | 4, 10 | 4, 10
F | 4 | 2 | 2
Acc. | 94.77 | 91.75 | 92.85
GM | 77.06 | 84.92 | 81.52
TPrate | 60.35 | 77.22 | 70.60
TNrate | 98.68 | 93.41 | 95.39
Data set: vowel0
Features | 5, 6, 7, 8, 9, 10, 13 | 4, 5, 6, 7, 9, 13 | 4, 5, 6, 7, 8, 13
F | 7 | 6 | 6
Acc. | 96.39 | 98.18 | 97.74
GM | 82.83 | 98.11 | 97.41
TPrate | 70.00 | 98.15 | 97.04
TNrate | 99.03 | 98.18 | 97.81
Data set: cleveland-0vs4
Features | 4, 8, 10 | 1, 4, 7, 9, 10, 13 | 10, 12
F | 3 | 6 | 2
Acc. | 94.72 | 93.04 | 93.02
GM | 54.96 | 85.19 | 86.17
TPrate | 41.03 | 76.92 | 82.05
TNrate | 98.98 | 94.31 | 93.90
Data set: ecoli4
Features | 5, 6, 7 | 2, 3, 4, 5, 7 | 2, 3, 5, 7
F | 3 | 5 | 4
Acc. | 98.71 | 96.13 | 97.92
GM | 88.90 | 90.11 | 93.88
TPrate | 80.00 | 85.00 | 90.00
TNrate | 99.89 | 96.84 | 98.42
Data set: yeast4
Features | 1, 2, 3, 7, 8 | 1, 3, 5 | 1, 3
F | 5 | 3 | 2
Acc. | 95.62 | 84.05 | 91.19
GM | 19.73 | 79.93 | 80.40
TPrate | 9.8 | 76.47 | 70.59
TNrate | 98.67 | 84.32 | 91.93
Table 5. The results of comparing classifiers by the number of selected features.
Feature Sets | Standardized Test Statistic | p-Value | Null Hypothesis
Fall − Fbin, γ = 0 | 2.521 | 0.012 | Reject
Fall − Fbin, γ = 1 | 2.521 | 0.012 | Reject
Fall − Fbin, γ = 0.5 | 2.524 | 0.012 | Reject
Fbin, γ = 0 − Fbin, γ = 1 | 0 | 1 | Retain
Fbin, γ = 0 − Fbin, γ = 0.5 | 0.851 | 0.395 | Retain
Fbin, γ = 1 − Fbin, γ = 0.5 | 0.638 | 0.524 | Retain
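The pairwise comparisons reported in Tables 5, 6, 8, 10 and 13 are of the type produced by a Wilcoxon signed-rank test over the eight datasets. A minimal SciPy sketch is given below; it is illustrative only, since scipy.stats.wilcoxon returns the rank-sum statistic rather than the standardized test statistic listed in the tables, and the 0.05 significance level is assumed here.

```python
from scipy.stats import wilcoxon

# Per-dataset GM values of the GSAb classifier for gamma = 0 and gamma = 1
# (numbers copied from Table 3, in the dataset order used throughout).
gm_gamma0 = [66.40, 98.52, 95.76, 67.85, 85.64, 39.17, 89.01, 6.30]
gm_gamma1 = [75.28, 99.16, 96.28, 74.14, 90.02, 82.38, 87.35, 66.99]

stat, p_value = wilcoxon(gm_gamma0, gm_gamma1)
print(f"W = {stat:.3f}, p = {p_value:.3f}")
# The null hypothesis of equal performance is rejected when p < 0.05.
```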
Table 6. The results of comparing classification performance indexes in the absence and presence of feature selection performed using the binary gravitational search algorithm.
Metric | γ | Standardized Test Statistic | p-Value | Null Hypothesis
Accuracy (all - bin) | 0 | −2.197 | 0.028 | Reject
Accuracy (all - bin) | 1 | −0.98 | 0.327 | Retain
Accuracy (all - bin) | 0.5 | −1.68 | 0.093 | Retain
GM (all - bin) | 0 | −1.82 | 0.069 | Retain
GM (all - bin) | 1 | −1.4 | 0.161 | Retain
GM (all - bin) | 0.5 | −2.24 | 0.025 | Reject
TPrate (all - bin) | 0 | −1.544 | 0.123 | Retain
TPrate (all - bin) | 1 | −1.051 | 0.293 | Retain
TPrate (all - bin) | 0.5 | −2.036 | 0.042 | Reject
TNrate (all - bin) | 0 | −0.73 | 0.465 | Retain
TNrate (all - bin) | 1 | −0.594 | 0.553 | Retain
TNrate (all - bin) | 0.5 | −0.877 | 0.38 | Retain
Table 7. The results of constructing fuzzy classifiers obtained with the different feature selection algorithms.
Metric | GSAb (γ = 0) | GSAb (γ = 1) | GSAb (γ = 0.5) | RS | MI
Data set: vehicle0
F | 10.20 | 7.60 | 9.00 | 6.60 | 8.40
Acc. | 83.33 | 80.61 | 81.09 | 70.08 | 79.43
GM | 66.40 | 75.28 | 71.78 | 62.67 | 65.45
TPrate | 47.24 | 67.34 | 58.79 | 53.29 | 48.22
TNrate | 94.44 | 84.70 | 87.94 | 75.26 | 89.02
Data set: newthyroid2
F | 3.60 | 3.20 | 3.20 | 2.80 | 3.00
Acc. | 99.53 | 98.60 | 98.60 | 95.35 | 99.53
GM | 98.52 | 99.16 | 99.16 | 94.85 | 98.52
TPrate | 97.14 | 100.0 | 100.0 | 94.29 | 97.14
TNrate | 100.0 | 98.33 | 98.33 | 95.56 | 100.0
Data set: segment0
F | 7.20 | 6.60 | 6.80 | 10.60 | 9.40
Acc. | 97.36 | 96.45 | 97.40 | 90.99 | 91.12
GM | 95.76 | 96.28 | 97.08 | 88.67 | 85.08
TPrate | 93.62 | 96.05 | 96.66 | 85.72 | 77.85
TNrate | 97.98 | 96.51 | 97.52 | 91.86 | 93.33
Data set: page-blocks0
F | 3.80 | 4.20 | 2.80 | 6.80 | 5.60
Acc. | 93.60 | 88.54 | 92.20 | 81.49 | 87.65
GM | 67.85 | 74.14 | 73.31 | 59.80 | 51.98
TPrate | 46.69 | 60.00 | 56.17 | 42.04 | 31.82
TNrate | 98.94 | 91.80 | 96.30 | 85.98 | 94.00
Data set: vowel0
F | 6.20 | 6.60 | 6.60 | 5.80 | 5.40
Acc. | 88.86 | 87.45 | 88.25 | 77.93 | 85.29
GM | 85.64 | 90.02 | 88.94 | 81.13 | 75.59
TPrate | 82.22 | 93.33 | 90.00 | 77.17 | 70.00
TNrate | 89.53 | 86.86 | 88.08 | 85.56 | 86.55
Data set: cleveland-0vs4
F | 4.00 | 6.80 | 6.60 | 6.40 | 3.00
Acc. | 93.78 | 88.70 | 85.86 | 53.51 | 98.22
GM | 39.17 | 82.38 | 68.01 | 39.52 | 88.49
TPrate | 30.77 | 76.92 | 53.85 | 53.75 | 80.00
TNrate | 98.78 | 89.63 | 88.41 | 56.67 | 99.37
Data set: ecoli4
F | 3.00 | 3.20 | 3.00 | 6.40 | 4.80
Acc. | 98.21 | 96.13 | 97.92 | 96.73 | 87.34
GM | 89.01 | 87.35 | 85.81 | 68.70 | 88.90
TPrate | 80.00 | 80.00 | 75.00 | 50.00 | 91.11
TNrate | 99.37 | 97.15 | 99.37 | 99.68 | 86.96
Data set: yeast4
F | 3.20 | 3.20 | 2.40 | 6.40 | 3.00
Acc. | 96.23 | 78.24 | 87.26 | 94.21 | 91.24
GM | 6.30 | 66.99 | 67.05 | 29.46 | 62.79
TPrate | 1.96 | 58.82 | 52.94 | 16.00 | 45.64
TNrate | 99.58 | 78.93 | 88.49 | 97.00 | 92.88
Table 8. Comparison of fuzzy classifier results obtained using different algorithms for feature selection.
Algorithm | Standardized Test Statistic | p-Value | Null Hypothesis
Features
RS - GSA (γ = 0) | 0.981 | 0.326 | Retain
RS - GSA (γ = 1) | 1.123 | 0.261 | Retain
RS - GSA (γ = 0.5) | 1.122 | 0.262 | Retain
MI - GSA (γ = 0) | 0.281 | 0.778 | Retain
MI - GSA (γ = 1) | 0.421 | 0.674 | Retain
MI - GSA (γ = 0.5) | 0.35 | 0.726 | Retain
Accuracy
RS - GSA (γ = 0) | −2.521 | 0.012 | Reject
RS - GSA (γ = 1) | −1.4 | 0.161 | Retain
RS - GSA (γ = 0.5) | −1.96 | 0.05 | Reject
MI - GSA (γ = 0) | −1.859 | 0.063 | Retain
MI - GSA (γ = 1) | −0.14 | 0.889 | Retain
MI - GSA (γ = 0.5) | −0.7 | 0.484 | Retain
GM
RS - GSA (γ = 0) | −1.26 | 0.208 | Retain
RS - GSA (γ = 1) | −2.521 | 0.012 | Reject
RS - GSA (γ = 0.5) | −2.521 | 0.012 | Reject
MI - GSA (γ = 0) | −0.169 | 0.866 | Retain
MI - GSA (γ = 1) | −1.68 | 0.093 | Retain
MI - GSA (γ = 0.5) | −1.26 | 0.208 | Retain
TPrate
RS - GSA (γ = 0) | −0.14 | 0.889 | Retain
RS - GSA (γ = 1) | −2.521 | 0.012 | Reject
RS - GSA (γ = 0.5) | −2.371 | 0.018 | Reject
MI - GSA (γ = 0) | 0.338 | 0.735 | Retain
MI - GSA (γ = 1) | −1.82 | 0.069 | Retain
MI - GSA (γ = 0.5) | −0.84 | 0.401 | Retain
TNrate
RS - GSA (γ = 0) | −2.383 | 0.017 | Reject
RS - GSA (γ = 1) | −1.12 | 0.263 | Retain
RS - GSA (γ = 0.5) | −1.682 | 0.092 | Retain
MI - GSA (γ = 0) | −2.197 | 0.028 | Reject
MI - GSA (γ = 1) | 0.84 | 0.401 | Retain
MI - GSA (γ = 0.5) | −0.14 | 0.889 | Retain
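Tables 7 and 8 compare the wrapper-style selection performed by the binary gravitational search (GSAb) with two filter methods, labelled RS and MI. Assuming MI denotes a mutual-information ranking, the filter principle can be reproduced with scikit-learn as in the sketch below; the synthetic dataset, the number of retained features and the use of SelectKBest are illustrative assumptions rather than the authors' implementation.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Toy imbalanced dataset standing in for one of the benchmark sets.
X, y = make_classification(n_samples=500, n_features=18, n_informative=6,
                           weights=[0.9, 0.1], random_state=42)

# Keep the six features with the highest mutual information with the label.
selector = SelectKBest(score_func=mutual_info_classif, k=6)
X_reduced = selector.fit_transform(X, y)
print("selected feature indices:", selector.get_support(indices=True))
```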
Table 9. Results of fuzzy classifiers after using the over-sampling algorithm.
Metrics | vhc0 | nth2 | sgm0 | pbl0 | vwl0 | clv04 | ecl4 | yst4
Accuracy | 66.46 | 99.17 | 89.97 | 68.49 | 50.00 | 95.57 | 83.91 | 72.29
GM | 60.50 | 99.16 | 89.93 | 63.19 | 0.00 | 95.50 | 83.90 | 72.07
TPrate | 69.68 | 98.33 | 88.62 | 73.31 | 0.00 | 94.01 | 83.78 | 73.79
TNrate | 63.26 | 100.00 | 91.31 | 63.68 | 100.00 | 97.14 | 84.04 | 70.80
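Table 9 summarizes the fuzzy classifiers built after over-sampling the minority class with SMOTE. A minimal sketch of this preprocessing step with the imbalanced-learn package is shown below; the synthetic data, the split and the default SMOTE settings are assumptions used only to illustrate the resampling, not the authors' exact procedure.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Toy imbalanced dataset standing in for one of the benchmark sets.
X, y = make_classification(n_samples=500, n_features=8, weights=[0.9, 0.1],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Over-sample only the training part so that the test data keep the
# original class imbalance.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)
print("class counts after SMOTE:", Counter(y_res))
```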
Table 10. Comparison of fuzzy classification results with and without preprocessing.
Metrics | Algorithms | Standardized Test Statistic | p-Value | Null Hypothesis
Accuracy | SMOTE - GSA (γ = 0) | −1.96 | 0.05 | Reject
Accuracy | SMOTE - GSA (γ = 1) | −1.96 | 0.05 | Reject
Accuracy | SMOTE - GSA (γ = 0.5) | −1.96 | 0.05 | Reject
GM | SMOTE - GSA (γ = 0) | 0.84 | 0.401 | Retain
GM | SMOTE - GSA (γ = 1) | −1.4 | 0.161 | Retain
GM | SMOTE - GSA (γ = 0.5) | −0.84 | 0.401 | Retain
TPrate | SMOTE - GSA (γ = 0) | 1.4 | 0.161 | Retain
TPrate | SMOTE - GSA (γ = 1) | −0.14 | 0.889 | Retain
TPrate | SMOTE - GSA (γ = 0.5) | 0.84 | 0.401 | Retain
TNrate | SMOTE - GSA (γ = 0) | −1.26 | 0.208 | Retain
TNrate | SMOTE - GSA (γ = 1) | −0.84 | 0.401 | Retain
TNrate | SMOTE - GSA (γ = 0.5) | −1.12 | 0.263 | Retain
Table 11. Results of fuzzy classifier construction on features selected after using the SMOTE algorithm.
Metrics | vhc0 | nth2 | sgm0 | pbl0 | vwl0 | clv04 | ecl4 | yst4
F. | 8.40 | 3.00 | 8.80 | 1.20 | 5.60 | 1.00 | 6.20 | 2.00
Acc. | 60.51 | 97.78 | 86.00 | 62.05 | 49.38 | 90.98 | 86.52 | 50.70
GM | 44.98 | 97.75 | 85.75 | 48.90 | 14.62 | 90.91 | 86.20 | 11.57
TPrate | 69.22 | 100.00 | 92.26 | 71.35 | 6.25 | 93.39 | 87.67 | 40.77
TNrate | 51.78 | 95.56 | 79.75 | 52.74 | 92.50 | 88.57 | 85.38 | 60.63
Table 12. Results of constructing fuzzy classifiers on subsets of features found by the filter after using the oversampling algorithm.
Metrics | vhc0 | nth2 | sgm0 | pbl0 | vwl0 | clv04 | ecl4 | yst4
F. | 8.60 | 1.80 | 12.00 | 4.00 | 6.00 | 5.00 | 5.20 | 5.00
Acc. | 69.47 | 94.72 | 90.40 | 53.81 | 52.19 | 91.30 | 89.48 | 71.49
GM | 66.65 | 94.55 | 90.34 | 27.49 | 9.84 | 90.86 | 89.44 | 68.40
TPrate | 72.75 | 89.44 | 89.08 | 62.83 | 5.00 | 86.75 | 88.78 | 71.49
TNrate | 66.20 | 100.00 | 91.72 | 44.78 | 99.38 | 95.87 | 90.18 | 71.50
Table 13. Comparison of the results of constructing fuzzy classifiers on oversampled and original data using feature selection.
Metrics | Algorithm 1 | Algorithm 2 | Standardized Test Statistic | p-Value | Null Hypothesis
Features | SMOTE + RS | GSA (γ = 0) | −0.841 | 0.4 | Retain
Features | SMOTE + RS | GSA (γ = 1) | −0.631 | 0.528 | Retain
Features | SMOTE + RS | GSA (γ = 0.5) | −0.7 | 0.484 | Retain
Features | SMOTE + MI | GSA (γ = 0) | 0.983 | 0.326 | Retain
Features | SMOTE + MI | GSA (γ = 1) | 0.771 | 0.441 | Retain
Features | SMOTE + MI | GSA (γ = 0.5) | 0.84 | 0.401 | Retain
Accuracy | SMOTE + RS | GSA (γ = 0) | −2.521 | 0.012 | Reject
Accuracy | SMOTE + RS | GSA (γ = 1) | −2.521 | 0.012 | Reject
Accuracy | SMOTE + RS | GSA (γ = 0.5) | −2.24 | 0.025 | Reject
Accuracy | SMOTE + MI | GSA (γ = 0) | −2.521 | 0.012 | Reject
Accuracy | SMOTE + MI | GSA (γ = 1) | −2.521 | 0.012 | Reject
Accuracy | SMOTE + MI | GSA (γ = 0.5) | −2.38 | 0.017 | Reject
GM | SMOTE + RS | GSA (γ = 0) | −0.84 | 0.401 | Retain
GM | SMOTE + RS | GSA (γ = 1) | −1.823 | 0.068 | Retain
GM | SMOTE + RS | GSA (γ = 0.5) | −1.963 | 0.05 | Reject
GM | SMOTE + MI | GSA (γ = 0) | −0.42 | 0.674 | Retain
GM | SMOTE + MI | GSA (γ = 1) | −1.82 | 0.069 | Retain
GM | SMOTE + MI | GSA (γ = 0.5) | −1.54 | 0.123 | Retain
TPrate | SMOTE + RS | GSA (γ = 0) | 1.26 | 0.208 | Retain
TPrate | SMOTE + RS | GSA (γ = 1) | −0.98 | 0.327 | Retain
TPrate | SMOTE + RS | GSA (γ = 0.5) | 0 | 1 | Retain
TPrate | SMOTE + MI | GSA (γ = 0) | 0.98 | 0.327 | Retain
TPrate | SMOTE + MI | GSA (γ = 1) | −0.84 | 0.401 | Retain
TPrate | SMOTE + MI | GSA (γ = 0.5) | 0.14 | 0.889 | Retain
TNrate | SMOTE + RS | GSA (γ = 0) | −2.521 | 0.012 | Reject
TNrate | SMOTE + RS | GSA (γ = 1) | −2.521 | 0.012 | Reject
TNrate | SMOTE + RS | GSA (γ = 0.5) | −2.521 | 0.012 | Reject
TNrate | SMOTE + MI | GSA (γ = 0) | −2.028 | 0.043 | Reject
TNrate | SMOTE + MI | GSA (γ = 1) | −1.68 | 0.093 | Retain
TNrate | SMOTE + MI | GSA (γ = 0.5) | −1.68 | 0.093 | Retain
Table 14. The results of constructing various classification algorithms on imbalanced datasets.
The columns GNB through GB are the reference classification algorithms; the last three columns are the fuzzy classifiers built with γ = 0, γ = 1 and γ = 0.5.
Metric | GNB | LR | DT | MLP | LSV | 3NN | AB | RF | GB | γ = 0 | γ = 1 | γ = 0.5
Data set: vhc0
Acc. | 64.9 | 96.6 | 93.6 | 98.1 | 96.8 | 94.8 | 96.2 | 95.6 | 96.5 | 85.1 | 78.3 | 84.0
GM | 70.7 | 95.6 | 91.7 | 97.4 | 96.0 | 92.3 | 95.6 | 94.1 | 95.0 | 66.9 | 82.9 | 80.6
TPrate | 85.4 | 94.0 | 88.4 | 96.0 | 94.5 | 87.9 | 94.5 | 91.5 | 92.5 | 46.4 | 93.3 | 75.9
TNrate | 58.6 | 97.4 | 95.2 | 98.8 | 97.5 | 96.9 | 96.8 | 96.9 | 97.7 | 97.0 | 73.6 | 86.5
Data set: nth2
Acc. | 96.3 | 98.1 | 95.8 | 97.7 | 98.1 | 98.1 | 99.1 | 96.7 | 97.2 | 99.5 | 99.8 | 99.5
GM | 96.5 | 95.3 | 91.1 | 95.0 | 96.5 | 95.1 | 98.2 | 91.5 | 91.6 | 98.5 | 99.5 | 99.3
TPrate | 97.1 | 91.4 | 85.7 | 91.4 | 94.3 | 91.4 | 97.1 | 85.7 | 85.7 | 97.1 | 99.0 | 99.0
TNrate | 96.1 | 99.4 | 97.8 | 98.9 | 98.9 | 99.4 | 99.4 | 98.9 | 99.4 | 100.0 | 100.0 | 99.6
Data set: sgm0
Acc. | 83.3 | 99.7 | 99.2 | 99.7 | 96.8 | 99.3 | 99.6 | 99.4 | 99.3 | 98.9 | 99.1 | 99.0
GM | 89.2 | 99.3 | 98.4 | 98.4 | 97.6 | 99.1 | 99.1 | 98.5 | 98.3 | 98.2 | 99.1 | 98.7
TPrate | 98.5 | 98.8 | 97.3 | 99.1 | 99.1 | 98.8 | 98.5 | 97.3 | 97.0 | 97.3 | 99.1 | 98.2
TNrate | 80.7 | 99.9 | 99.5 | 99.8 | 96.4 | 99.4 | 99.7 | 99.8 | 99.7 | 99.2 | 99.1 | 99.2
Data set: pbl0
Acc. | 88.7 | 94.1 | 95.3 | 95.4 | 93.9 | 95.3 | 94.3 | 95.7 | 96.5 | 94.8 | 91.8 | 92.9
GM | 65.4 | 74.9 | 85.1 | 86.1 | 71.8 | 83.4 | 84.1 | 86.3 | 87.7 | 77.1 | 84.9 | 81.5
TPrate | 47.4 | 58.1 | 74.6 | 76.0 | 53.3 | 71.2 | 73.5 | 76.4 | 79.2 | 60.3 | 77.2 | 70.6
TNrate | 93.4 | 98.1 | 97.7 | 97.6 | 98.5 | 98.1 | 96.6 | 97.9 | 98.4 | 98.7 | 93.4 | 95.4
Data set: vwl0
Acc. | 93.7 | 91.2 | 95.2 | 94.7 | 89.3 | 94.4 | 96.2 | 95.5 | 97.0 | 96.4 | 98.2 | 97.7
GM | 87.3 | 71.0 | 86.2 | 80.5 | 65.8 | 78.3 | 81.7 | 78.5 | 84.2 | 82.8 | 98.1 | 97.4
TPrate | 81.1 | 58.9 | 77.8 | 73.3 | 55.6 | 63.3 | 68.9 | 63.3 | 74.4 | 70.0 | 98.1 | 97.0
TNrate | 95.0 | 94.4 | 97.0 | 96.9 | 92.7 | 97.6 | 98.9 | 98.8 | 99.2 | 99.0 | 98.2 | 97.8
Data set: clv04
Acc. | 87.9 | 95.4 | 91.3 | 95.9 | 92.5 | 95.9 | 93.1 | 93.1 | 93.0 | 94.7 | 93.0 | 93.0
GM | 84.9 | 82.0 | 60.1 | 80.2 | 60.8 | 80.2 | 55.2 | 37.0 | 45.5 | 55.0 | 85.2 | 86.2
TPrate | 84.6 | 69.2 | 46.2 | 69.2 | 46.2 | 69.2 | 38.5 | 23.1 | 53.8 | 41.0 | 76.9 | 82.1
TNrate | 88.1 | 97.5 | 95.0 | 98.1 | 96.3 | 98.1 | 97.5 | 98.8 | 96.3 | 99.0 | 94.3 | 93.9
Data set: ecl4
Acc. | 81.2 | 93.4 | 94.6 | 94.0 | 94.0 | 93.4 | 95.8 | 96.7 | 96.7 | 98.7 | 96.1 | 97.9
GM | 83.9 | 85.7 | 74.6 | 83.6 | 88.6 | 85.7 | 81.8 | 78.2 | 84.5 | 88.9 | 90.1 | 93.9
TPrate | 95.0 | 80.0 | 60.0 | 75.0 | 85.0 | 80.0 | 70.0 | 65.0 | 75.0 | 80.0 | 85.0 | 90.0
TNrate | 80.4 | 94.3 | 96.8 | 95.3 | 94.6 | 94.3 | 97.5 | 98.7 | 98.1 | 99.9 | 96.8 | 98.4
Data set: yst4
Acc. | 16.0 | 96.6 | 96.0 | 95.3 | 96.5 | 96.8 | 96.4 | 96.0 | 96.4 | 95.6 | 84.1 | 91.2
GM | 34.6 | 30.1 | 54.7 | 44.7 | 6.3 | 47.5 | 47.2 | 30.1 | 45.1 | 19.7 | 79.9 | 80.4
TPrate | 96.1 | 11.8 | 31.4 | 27.5 | 2.0 | 23.5 | 23.5 | 11.8 | 23.5 | 9.8 | 76.5 | 70.6
TNrate | 13.1 | 99.7 | 98.3 | 97.7 | 99.9 | 99.4 | 99.0 | 99.0 | 99.0 | 98.7 | 84.3 | 91.9
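For Table 14 the abbreviations are read here as Gaussian naive Bayes (GNB), logistic regression (LR), decision tree (DT), multilayer perceptron (MLP), linear support vector machine (LSV), 3-nearest neighbours (3NN), AdaBoost (AB), random forest (RF) and gradient boosting (GB). The sketch below shows how such reference classifiers can be instantiated with scikit-learn; the expansions of the abbreviations and the hyperparameters (library defaults) are assumptions, not the exact settings behind the table.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              GradientBoostingClassifier)

# Reference classifiers with (mostly) default hyperparameters.
baselines = {
    "GNB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=42),
    "MLP": MLPClassifier(max_iter=1000, random_state=42),
    "LSV": LinearSVC(),
    "3NN": KNeighborsClassifier(n_neighbors=3),
    "AB": AdaBoostClassifier(random_state=42),
    "RF": RandomForestClassifier(random_state=42),
    "GB": GradientBoostingClassifier(random_state=42),
}
# Each model exposes fit(X_train, y_train) and predict(X_test), so the
# metrics from the earlier sketch can be computed for every baseline.
```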
