Application of the Gravitational Search Algorithm for Constructing Fuzzy Classifiers of Imbalanced Data

Abstract: The presence of imbalance in data significantly complicates the classification task, including for fuzzy systems. Because of the large number of majority-class instances, instances of minority classes are not recognized correctly, so additional tools for improving classification quality are required. The most common methods for handling imbalanced data have several disadvantages. For example, methods that generate additional instances of minority classes can worsen classification if instances from different classes strongly overlap, while methods that directly modify the fuzzy classification algorithm reduce the interpretability of the model. In this paper, we study the efficiency of the gravitational search algorithm in the tasks of selecting features and tuning term parameters for fuzzy classifiers of imbalanced data. We consider only two-class data and apply an algorithm based on the extreme values of classes to construct models with the minimum number of rules. In addition, we propose a new quality metric based on the sum of the overall accuracy and the geometric mean, with a coefficient that sets the priority between them.


Introduction
The classification task is to divide objects in the feature space into classes or categories based on retrospective observations with given class label values. Real data are often characterized by an imbalanced class distribution, in which the number of instances in some classes greatly exceeds the number of instances in others. This situation is mainly explained by the limited occurrence of minority-class instances [1]. For example, normal web browsing traffic is dominant when classifying Internet traffic, yet detecting rare malicious connections is very important for training [1]. Similar examples can be given from medical diagnosis, bank fraud detection, and the diagnosis of equipment malfunctions.
The search for regularities in imbalanced data is a difficult task for specialists in data mining, machine learning, pattern recognition, and statistics [2]. The main problem in constructing classifiers of imbalanced data is the poor adaptation of standard training algorithms, which leads to a significant reduction in classification effectiveness. Due to the imbalance between classes, a standard classifier usually misclassifies instances of minority classes, since the model overfits to instances of the majority classes [1].
It is not enough to evaluate a classifier of imbalanced data using the overall accuracy alone [3]. Positive classes (those with the fewest instances) are usually more important than negative classes (those with the most instances). Reducing the misclassification of minority-class instances is crucial in many real-world challenges [4,5]. However, improving the classification quality of positive classes often leads to poorer recognition of negative-class instances, as instances of different classes often intersect. Thus, in each classification task, the developer of the data analysis system must set priorities: either focus on improving the overall accuracy, try to correctly identify positive instances at the cost of some worsening in the recognition of negative ones, or look for a compromise. Ultimately, it all depends on the purpose of the model and the requirements placed on it.
There is a large list of classification methods, for example, naive Bayes classifiers, support vector machines, and artificial neural networks. Unlike other methods, fuzzy classification does not assume rigid boundaries between neighboring classes: an object being classified may belong to several classes with various degrees of confidence. The advantage of a fuzzy classifier is the understandability and interpretability of its rules, which makes fuzzy classifiers a practically useful data analysis tool.
In many real-world applications, not only an accurate but also a computationally simple system is required. Therefore, we propose to use two procedures for constructing a fuzzy classifier. The first shrinks the input feature space to reduce complexity. The second tunes the fuzzy classifier parameters, which increases the quality of determining the output class label. Since both procedures are formulated as optimization problems, a single optimization algorithm is applied to solve them. We use the gravitational search algorithm (GSA), which has previously proven itself well when working with fuzzy classifiers [6].
Since the goal of our work is to improve the efficiency of the fuzzy classifier of imbalanced data, it is necessary to choose an appropriate metric to use as a fitness function for the GSA. We explore the possibilities of applying the following metrics: the overall accuracy, the geometric mean, and a new function that combines the two previous estimates to find a compromise version of the classifier.
The main contributions of this paper are as follows.
1. We propose a new metric based on the sum of the overall accuracy and the geometric mean of the per-class accuracies; a coefficient controls the priority between the two estimates.
2. We demonstrate the use of a feature selection method based on the binary gravitational search algorithm to reduce the effect of imbalance on classification. Applying the new metric as the fitness function helped to find subsets of features relevant to both classes.
3. We present a combination of the binary and continuous algorithms for constructing fuzzy classifiers of imbalanced data. The continuous gravitational search algorithm helped to increase the classification quality on the selected features.
This article is organized as follows. Section 2 discusses the levels of problems when working with imbalanced data and provides basic methods for solving them. The procedure for constructing a fuzzy classifier and objective functions under consideration are described in Section 3. Section 4 gives a short description of the gravitational search algorithm. Sections 5 and 6 present the experimental results and their analysis, respectively. Finally, we present the conclusions of our work in Section 7.

Related Works
Here, we review the main approaches to improving the quality of imbalanced data classification. There are three levels of problems when training on such data: (1) problems associated with the definition of classification performance indexes, (2) problems related to the learning algorithm, and (3) problems related to the training data [7].
The first level is determined by the lack of an objective method (quantitative measures) for evaluating existing knowledge in order to select the optimal classifier. The understanding that the overall accuracy is an insufficient measure for classifying imbalanced data has led to the application of new metrics such as the AUC (the area under the ROC curve) [8], the geometric mean, the balanced accuracy, the Fβ-measure, and others [9]. To assess the effectiveness of classifiers, the authors of [9] proposed 18 indicators, which fall into the following three categories:
1. Threshold metrics geared towards minimizing the number of errors, i.e., the overall accuracy, the averaged accuracy (arithmetic and geometric), the Fβ-measure, and the Kappa statistic;
2. Metrics based on a probabilistic understanding of error, used to assess the reliability of classifiers, such as the mean absolute error, the mean square error, and the cross-entropy;
3. Metrics based on estimating instance separability, for example, the AUC, which for two classes is equivalent to the Mann–Whitney–Wilcoxon statistic [9].
After analyzing the above 18 indicators, the authors of [7] conclude that the choice of metrics for imbalanced data is of paramount importance. Fernandez et al. [10] described the use of a multiobjective evolutionary algorithm with a pair of metrics, the overall accuracy and the F1-measure, and concluded that simultaneous optimization of this pair of metrics can lead to a balanced accuracy for both classes.
At the second level, classification algorithms make changes to their construction and training processes in order to reduce the influence of class imbalance on classification quality [7].
Cost-sensitive learning methods modify the classification algorithm so that the costs of misclassifying minority-class instances are greater than those of majority-class instances. A typical solution here is to use a weight matrix that accounts for the cost of each incorrectly classified instance [11]. This solution is not suitable for a fuzzy classifier, since a fuzzy classifier does not estimate the probability of assigning an object to a particular class.
There is a short list of methods for creating fuzzy classifiers in the presence of imbalance. Weights were added to fuzzy rules in [12][13][14]; adding a weight function makes it possible to prioritize some rules over others when determining the classifier output, and the weight values are most often tuned by optimization algorithms. Another way to modify the fuzzy classification tool is to introduce a bipolar model using the maximum-rule principle for assigning class labels. In the bipolar fuzzy classifier, the adjusted degree of belonging to each class is calculated from positive and negative degrees of membership [15]. The disadvantages of this model are the need to additionally introduce and adjust a matrix of dissimilarity coefficients and the difficulty of applying the method with another principle of label assignment. Furthermore, adding supplementary modifications to fuzzy systems complicates the interpretation of the resulting models. Consequently, methods for improving the quality of the classifier without interfering directly with the classification algorithm are relevant.
Data play an integral role in machine learning and data mining research, and a number of data preprocessing methods have been developed to correct imbalance in the data. Oversampling methods try to produce a balanced dataset by creating additional instances of the minority (positive) class, while undersampling methods reduce the number of majority-class instances to achieve a quantitative balance. The best-known oversampling method is SMOTE and its modifications [5,[16][17][18], in which the generation of new positive-class instances depends on a measure of proximity to existing instances. Among undersampling methods, random undersampling (RUS) is often used. This non-heuristic method aims to eliminate class imbalance by randomly excluding instances of the negative class. Obviously, the disadvantage of RUS is the loss of information about negative-class data [7,16,19].
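As a minimal illustration of the undersampling strategy described above, the following sketch implements random undersampling; the function name and the equal-class-size stopping criterion are our own choices for this example.

```python
import random

def random_undersample(X, y, negative_label, rng=None):
    """Random undersampling (RUS): randomly discard negative-class
    instances until both classes contain the same number of instances."""
    rng = rng or random.Random(0)
    neg = [i for i, label in enumerate(y) if label == negative_label]
    pos = [i for i, label in enumerate(y) if label != negative_label]
    keep = sorted(pos + rng.sample(neg, len(pos)))  # keep all positives
    return [X[i] for i in keep], [y[i] for i in keep]
```

Because the discarded rows are chosen at random, some information about the negative class is inevitably lost, which is exactly the drawback of RUS noted above.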
Hybrid methods that combine two previous strategies of adding and removing data instances are described in [20,21]. In order to preserve useful information about majority classes, clustering methods have recently been applied [7,22,23].
Preprocessing methods are universal and easy to apply but have low efficiency and cannot be used as the only tool for solving the class imbalance problem. In addition, creating new data instances is not acceptable for some classification tasks. For example, artificially created patient records can lead to errors in diagnosing diseases.
Another way to change the data to improve the recognition of minority classes is to carry out a procedure for selecting informative features. Feature selection consists of choosing, from the input feature space, a subset that has fewer attributes but provides classification accuracy comparable to the full set. The formed subset should be sufficient to adequately represent all classes in the training samples. Selection methods are usually divided into four types, namely, integrated (embedded) methods, filters, wrappers, and hybrid methods.
A peculiarity of integrated (embedded) methods is that feature selection is part of the general mechanism of training a model on specific data [24]. An example is the selection of features during the training of a decision tree [25]. However, not every classification algorithm embeds the selection process into learning.
Filtering methods, on the contrary, are universal, as they are used independently of the classifier at the data preparation stage. Four groups of filters are distinguished in [26]. The methods in the first group are distance based: they select features that provide the greatest distance between classes. The second group of filters calculates the amount of information: such methods select features which, when added to an existing set, reduce its entropy [27]. The third group determines the relationship between features and classes using the correlation coefficient or mutual information [28]. The fourth group is represented by filters that minimize the number of inconsistencies; a case of inconsistency is the presence of two instances that belong to different classes but have the same values of the same features. Filter algorithms are easy to use but have low efficiency.
Wrappers are methods that evaluate each subset of features based on the effectiveness of the constructed classifier. As the search algorithm, they usually use metaheuristics. Since such algorithms are iterative, the classifier must be reconstructed at each iteration, so wrapper methods can require considerable time and resources for large datasets [24]. The advantage of wrappers is the ability to choose a feature set that is optimal for a particular classification algorithm.
The application of a genetic algorithm for feature selection in the wrapper mode with an SVM classifier is described in [29]. The fitness function of this algorithm is a compromise between the geometric mean and the share of selected features. The results showed that the proposed method selects features that improve the recognition of minority classes.
Hybrid feature selection methods combine filters and wrappers: first, a filter performs preliminary selection; then a classifier is built on the resulting subset and a wrapper algorithm is launched [30]. This approach is described in [31], which uses symmetric uncertainty for filtering, weighting features by their dependence on class labels, and harmony search as the wrapper algorithm. Hybrid selection methods can be a good solution for data with a large number of features.

The Fuzzy Classifier Structure
Classification algorithms assign the most suitable class from the set of all classes C = {c1, c2, …, cl} to each object xp = (xp1, xp2, …, xpm) from a set of n objects (p ∈ [1, n]), where xpk is the value of the kth feature of the pth object, k ∈ [1, m], and m is the number of features. The fuzzy classifier is constructed on the basis of production rules, each of which has its own set of fuzzy terms. A fuzzy term is a structure on the feature definition domain reflecting the degree of object membership to a rule. Terms can be described by membership functions of various kinds, such as triangular, trapezoidal, bell-shaped, or Gaussian functions. In this work, we used Gaussian membership functions, which differ from the others by their symmetry. Figure 1 shows an example of partitioning an attribute x1 by Gaussian terms. A Gaussian fuzzy term characterizing the kth feature in the ith rule is given by the following expression:

μik(x) = exp(−(x − bik)² / (2cik²)),

where i is the number of the rule to which the term belongs (i ∈ [1, r]), r is the number of fuzzy rules, b is the coordinate of the term vertex, and c is the dispersion of the function. The term parameters, listed sequentially for each feature, compose the antecedent vector θ = (b11, c11, b12, c12, b13, c13, b21, c21, …, bmr, cmr). The standard fuzzy rule consists of the antecedent part, which lists the variables and their terms, and the consequent part, which specifies the output class label:

Ri: If x1 is Ti1 and x2 is Ti2 and … and xm is Tim then class is cj,

where cj is the label of the jth class from the set of classes C and class is the output variable.
To enable feature selection in the wrapper mode, the binary feature vector S = (s1, s2, …, sm) must be introduced into the antecedent part. If sk = 1, then the kth feature is taken into account in the classification; otherwise the feature is ignored. Given the vector S, the fuzzy rule changes as follows:

Ri: If (s1˄x1) is Ti1 and (s2˄x2) is Ti2 and … and (sm˄xm) is Tim then class is cj,

where the record (sk˄xk) indicates the use (sk = 1) or exclusion (sk = 0) of the feature and its terms in the classifier. The binary vector S is formed by the feature selection algorithm.
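The mechanics of a single rule can be sketched as follows; the function names are ours, and the product used to combine the term memberships of the selected features is one common choice of conjunction.

```python
import math

def gaussian_mu(x, b, c):
    """Gaussian membership of value x to a term with vertex b and dispersion c."""
    return math.exp(-((x - b) ** 2) / (2.0 * c ** 2))

def rule_firing(xp, terms, s):
    """Degree to which object xp fires a rule.

    terms: list of (b, c) pairs, one Gaussian term per feature;
    s: binary feature vector -- features with s[k] == 0 are ignored,
    exactly as the vector S does in the rule antecedent above.
    """
    degree = 1.0
    for k, (b, c) in enumerate(terms):
        if s[k] == 1:
            degree *= gaussian_mu(xp[k], b, c)
    return degree
```

Setting a bit of `s` to zero simply drops that feature's term from the conjunction, so the rule length shrinks with the selected feature subset.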

Generation of the Fuzzy Rule Base
There are various methods for generating fuzzy terms and forming a fuzzy classifier rule base, such as uniform partitioning, random generation, clustering [32], and others. In this paper, we apply an algorithm based on the extreme values of the classes in the training data. This algorithm constructs compact classifiers with the minimum possible number of rules: the number of rules equals the number of classes, that is, there is one rule for each class.
The algorithm based on the extreme values of classes is presented in [6]. In the first step, the minimum and maximum values of each feature are determined for each class. In the second step, a term is generated for each feature so that it covers the interval between the two extremes, with the vertex of the term located in the middle of this segment. In the third step, the rule base is formed: each feature is represented in a rule by only one term, the terms belonging to each separate class are combined in the antecedent part of the rule by the conjunction operation, and the consequent part of the rule contains the label of that class.
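A compact sketch of this procedure is given below, under the assumption that "covering" the interval means placing the term vertex at its midpoint and taking half the interval width as the dispersion; the exact width rule in [6] may differ.

```python
def build_rule_base(X, y, classes):
    """One rule per class, built from the class extremes of each feature.

    For each class and each feature, a single Gaussian term (b, c) is
    generated: b is the midpoint of the class's [min, max] interval and
    c is half its width (an illustrative choice of dispersion).
    """
    rules = []
    for cls in classes:
        rows = [x for x, label in zip(X, y) if label == cls]
        terms = []
        for k in range(len(X[0])):
            values = [row[k] for row in rows]
            lo, hi = min(values), max(values)
            b = (lo + hi) / 2.0             # term vertex in the middle
            c = max((hi - lo) / 2.0, 1e-6)  # floor avoids zero dispersion
            terms.append((b, c))
        rules.append((terms, cls))          # antecedent terms -> class label
    return rules
```

With two classes, the returned base contains exactly two rules, matching the minimal structure described above.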
The presented algorithm is very simple, but its efficiency is not high. Therefore, it is necessary to use parameter tuning as an additional training step. The description of the procedure for term parameter tuning with the gravitational search algorithm is given in Section 3.2.

Output of the Fuzzy Classifier
The output of the classifier for the input vector xp is formed by sequentially performing three steps. In the first step, the membership value of the object in each term is calculated:

μik(xpk) = exp(−(xpk − bik)² / (2cik²)).

In the second step, the degree of membership of the object in each rule is evaluated as the conjunction (product) of the term memberships over the selected features:

μi(xp) = ∏ μik(xpk), where the product is taken over the features with sk = 1.

The third step defines the output class by the maximum rule: the output of the classifier is the class that corresponds to the rule with the highest degree of membership:

class(xp) = cj*, where j* = arg max i∈[1, r] μi(xp).

After the output-forming procedure has finished, the constructed model can be evaluated using various performance indexes.
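The three steps can be sketched as one self-contained function; the product conjunction and the function name are our illustrative choices.

```python
import math

def classify(xp, rules, s):
    """Fuzzy classifier output for input vector xp.

    rules: list of (terms, class_label), where terms is a list of (b, c)
    Gaussian parameters, one per feature; s: binary feature-selection
    vector. Step 1 computes term memberships, step 2 aggregates them into
    a rule firing degree (product conjunction over selected features),
    step 3 returns the class of the rule with the maximum degree.
    """
    best_class, best_degree = None, -1.0
    for terms, label in rules:
        degree = 1.0
        for k, (b, c) in enumerate(terms):
            if s[k] == 1:
                degree *= math.exp(-((xp[k] - b) ** 2) / (2.0 * c ** 2))
        if degree > best_degree:
            best_degree, best_class = degree, label
    return best_class
```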

Classification Quality Evaluation
The most common classification quality criterion is the overall accuracy, the share of correctly classified instances. For the observation table {(xp; cp), p ∈ [1, z]}, where z is the number of instances, the accuracy measure can be written as follows:

Acc(θ, S) = |{p ∈ [1, z] : f(xp; θ, S) = cp}| / z,

where f(xp; θ, S) is the output of the fuzzy classifier with the parameter vector θ and the binary feature vector S at the point xp. As noted earlier, the overall accuracy is not an objective assessment of classification quality when the class distribution is imbalanced. The geometric mean is an estimate sensitive to the accuracy of each class:

GM(θ, S) = (Acc1(θ, S) × Acc2(θ, S) × … × Accl(θ, S))^(1/l),

where Acci(θ, S) is the classification accuracy of the ith class:

Acci(θ, S) = |{p : cp = ci and f(xp; θ, S) = ci}| / zi,

where zi is the number of instances with the ith class label. Thus, the fewer instances a class contains, the more significantly the geometric mean increases with an increment in the number of correctly classified instances of that class. If one of the classes is classified entirely incorrectly, the geometric mean is zero. When the overall accuracy is used as the objective function, the classifier prefers to focus on recognizing negative classes; the geometric mean, on the other hand, can lead to a large loss in the classification quality of negative classes, even if the accuracy of positive classes is low. We propose to use a compromise option that combines both of these metrics and allows varying their importance with a coefficient γ ∈ [0; 1]:

E(θ, S) = γ · Acc(θ, S) + (1 − γ) · GM(θ, S).

The problem of constructing a fuzzy classifier reduces to searching for the maximum of the selected function.
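A sketch of the combined criterion follows, under the assumption that γ weights the overall accuracy and (1 − γ) the geometric mean; the function name is ours.

```python
import math

def fitness(y_true, y_pred, gamma):
    """Combined quality metric: gamma * Acc + (1 - gamma) * GM.

    Acc is the overall accuracy; GM is the geometric mean of the
    per-class accuracies, which drops to zero if any class is
    classified entirely incorrectly.
    """
    labels = sorted(set(y_true))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    per_class = []
    for cls in labels:
        idx = [i for i, t in enumerate(y_true) if t == cls]
        per_class.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    gm = math.prod(per_class) ** (1.0 / len(labels))
    return gamma * acc + (1.0 - gamma) * gm
```

With gamma = 1 the criterion reduces to the overall accuracy, and with gamma = 0 to the pure geometric mean, so the coefficient directly sets the priority between the two estimates.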

Training a Classifier with the Gravitational Search Algorithm
For feature selection and term parameter tuning we propose using the gravitational search algorithm in two versions: binary, for optimizing the binary feature vector S, and continuous, for optimizing the continuous vector of term parameters θ. The GSA was first proposed by Rashedi, Nezamabadi-pour, and Saryazdi in 2009 [33], and its binary version was described in the same year [34]. The algorithm is widely used to solve various problems. For example, the GSA was applied to optimize parameters of a geothermal power generation system in the study of Özkaraca and Keçebaş [35] and to determine the location of a microseismic source in order to warn about explosions in tunnels in [36]. Mahanipour and Nezamabadi-pour described the use of the GSA for the automatic creation of computer programs in [37] and for feature construction in [38].
The application of the binary (GSAb) and continuous (GSAc) versions of the GSA to the fuzzy classifier was described in detail earlier in [6]. In the binary GSA, a population of particles corresponding to binary feature vectors S is generated randomly. At each iteration, the algorithm calculates particle masses, gravitational forces, accelerations, and velocities. To update the feature vector, a transfer function is applied to convert the obtained velocity value into a binary equivalent. In this paper, we use a V-type transfer function: a bit of the particle is complemented if rand(0;1), a random number in the range from 0 to 1, is less than the value of the transfer function of the corresponding velocity. The continuous GSA optimizes the numerical vector θ consisting of the term parameters. In this version of the algorithm, the population is formed as follows: the first vector is input to the algorithm after the stage of creating the classifier structure, and the remaining vectors are generated from the first one with some deviation. Unlike the binary version, in the GSAc the vector value is updated by simply adding the calculated velocity to the current value:

θid(t + 1) = θid(t) + vid(t + 1),

where θid is the value of the dth element of the ith vector and vid is the corresponding velocity.
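The two position-update rules can be sketched as follows. Here |tanh(v)| stands in for the V-shaped transfer function, since it is one of several common variants and the exact function used in the paper is not reproduced; both function names are ours.

```python
import math
import random

def update_bit(s_k, velocity, rng=None):
    """Binary GSA position update: the bit is complemented with
    probability V(velocity), where V is a V-shaped transfer function
    (here |tanh|, an illustrative choice)."""
    rng = rng or random.Random(0)
    if rng.random() < abs(math.tanh(velocity)):
        return 1 - s_k
    return s_k

def update_position(theta_d, velocity_d):
    """Continuous GSA position update: simple addition of the velocity
    to the current value of the dth vector element."""
    return theta_d + velocity_d
```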
Five parameters are used in both versions of the GSA: the number of iterations t, the number of particles P, the value of the gravitational constant G0, the coefficient of the gravitational constant decrease α, and the small constant ε used in calculating the attractive force. The computational complexity of the GSA with n agents is O(n × d), where d is the search space dimension [39]. We did not modify the original GSA; therefore, both algorithms have the complexity O(P × d), where P is the number of particles and d is the dimension of the optimized vector.
The classifier training procedure is as follows:
1. The algorithm based on the extreme values of classes creates the initial vector θ.
2. The binary GSA searches for the optimal feature vector S.
3. The classifier is rebuilt on the obtained feature set Sbest, and the algorithm for optimizing the term parameters is launched.
4. The continuous GSA runs for a given number of iterations and provides the best parameter vector θbest.
5. The resulting Sbest and θbest are used to construct and validate the classifier on the test data.

Experimental Results
The experiment was performed on imbalanced binary datasets from the KEEL repository [40]. The sets are described in Table 1. Here, Fall is the number of features in a dataset, Strall is the number of rows, Str+ is the number of rows of the smallest class, Str− is the number of rows of the largest class, and IR is the imbalance ratio, i.e., the ratio of the number of rows of the negative class to the number of rows of the positive class. Five-fold cross-validation was applied at all stages of the experiment: the data were divided into five pairs of training and test samples. The structure of the fuzzy classifier was formed by the algorithm based on the extreme values of classes with symmetric Gaussian terms. Since all datasets contain only two classes, the number of rules was always two.
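The IR column of Table 1 follows directly from its definition:

```python
def imbalance_ratio(y, positive_label):
    """IR: number of negative-class rows divided by the number of
    positive-class rows."""
    n_pos = sum(1 for label in y if label == positive_label)
    return (len(y) - n_pos) / n_pos
```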
In the first stage of the experiment, the efficiency of the continuous gravitational algorithm was tested as the priority coefficient γ in the fitness function was varied. The fuzzy classifier parameters were tuned on the full sets of features. The following parameters were set for the GSAc: 750 iterations, 15 particles, G0 = 10, α = 10, and ε = 0.01. The particle population was cleared after every 150th iteration, except for the best particle, on the basis of which the population was generated anew. The parameters were chosen empirically as the most universal for the selected datasets. Table 2 contains the results of the first experimental stage. The quality of the constructed models was assessed by the classification accuracy, the geometric mean, the percentage of correctly classified positive-class instances relative to the total number of positive-class instances (true positive rate), and the percentage of correctly classified negative-class instances relative to the total number of negative-class instances (true negative rate). The table shows the results obtained on the test data, both averaged over three runs (Avr.) and for the best run (Best).

The purpose of the second experimental stage was to verify the effectiveness of the GSA in selecting features in the wrapper mode for the fuzzy classifier of imbalanced data. The binary gravitational algorithm with the same coefficient γ was run three times on each sample. Due to the stochasticity of the algorithm, one to three different feature sets could be obtained on the same sample. Next, the feature set with the highest fitness function value was selected, a classifier was built on this set, and the parameters of the created model were tuned by the continuous algorithm. The obtained values of the quality indicators were averaged over three independent runs of the GSAc.
The following parameters were empirically selected for the binary gravitational algorithm: 750 iterations, 15 particles, G0 = 10, α = 10, and ε = 0.01. The parameters of the continuous algorithm did not differ from those used in the first stage of the experiment. Table 3 shows the results of the classifier on the selected feature sets before parameter tuning (GSAb) and after optimization (GSAb + GSAc). In this and the following tables, cells are formatted according to a color scale to visualize the results: the values in each row are compared with each other, and the hue depends on the relative magnitude of the value compared to the other cells in the row. Thus, the worst results are marked in red, the best are highlighted in green, and the remaining values are colored in intermediate shades. Table 4 shows fuzzy classifiers based on the best feature sets, i.e., those that attain the highest value of the objective function with a given γ, averaged over the five samples.

Discussion
To confirm the effectiveness of the gravitational algorithm for optimizing the fuzzy classifier of imbalanced data, we performed a five-stage comparison.
The task of the first stage was to check the quality of the fuzzy classifier in the presence of feature selection. For this purpose, we compared the results of fuzzy classifiers constructed on the complete datasets (Table 2, average values over three runs) with those built on the reduced feature sets (Table 3). In both cases, the results obtained after parameter tuning by the GSAc were taken into account. Table 5 shows the results of the pairwise comparison of the number of features by Wilcoxon's signed-rank test for related samples. The significance level is 0.05; the null hypothesis states that the median difference between the two samples is zero.
The first three rows of the table compare the number of features in the original sets (Fall) and in the selected feature sets (Fbin). The last three rows compare the number of features when the GSAb is used with different values of the coefficient γ in the fitness function. On the basis of the verification results, we conclude that the binary gravitational algorithm can significantly reduce the number of features when working with imbalanced data in the wrapper mode of the fuzzy classifier. In addition, there is no significant difference in the number of features between the different values of γ. Table 6 shows the results of comparing the performance indexes for classifiers built on the complete and selected feature sets as the priority coefficient γ in the fitness function changes. The obtained values of Wilcoxon's signed-rank test are grouped for each of the four quality indexes (the overall accuracy, the geometric mean, the percentage of correctly classified positive-class instances, and the percentage of correctly identified negative-class instances). Thus, the results of the first stage of the comparison show that using the GSAb to select features in the wrapper mode of the fuzzy classifier of imbalanced data significantly reduces the number of features while maintaining or increasing the quality of classification.

In the second stage, the effectiveness of the binary gravitational algorithm was compared with popular feature selection methods: a random search (RS) and a filtering algorithm based on mutual information (MI).
The filter was executed as follows: the mutual information of each feature was estimated with three randomly selected neighbors, and the algorithm then found the arithmetic mean of these values. The set of selected features included only those variables whose mutual information exceeded the arithmetic mean. Both algorithms were run three times; among the obtained feature sets, those with the best accuracy were selected. Fuzzy classifiers were constructed on the selected feature sets using the algorithm based on the extreme values of classes. The obtained values were compared with the results of fuzzy classifiers built on the feature sets found by the GSAb (Table 3); in this case, we considered the results without parameter optimization. The average performance indexes of the classifiers are given in Table 7 (F is the number of features). Table 8 demonstrates the results of a pairwise comparison of the performance indexes of the obtained systems by Wilcoxon's signed-rank test for related samples. Here, STS is the standardized test statistic, p is the p-value, and NH is the null hypothesis. The left half of Table 8 shows the comparison with the random search algorithm, and the right half the comparison with the filter based on mutual information. The algorithms are statistically indistinguishable by the number of selected features, but the values of the standardized test statistic show that fuzzy classifiers constructed on the features selected by the gravitational search algorithm have higher classification quality in most cases. Hence, the binary gravitational algorithm is preferable for imbalanced data classification in contrast to the random search or the filter based on mutual information.
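The selection-by-mean-MI rule can be sketched as follows. The histogram-based MI estimate used here is a simplification of the neighbor-based estimator described above, and the `bins` parameter is our own choice.

```python
import math
from collections import Counter

def mi_filter(X, y, bins=5):
    """Select features whose mutual information with the class label
    exceeds the arithmetic mean of MI over all features.

    MI is estimated from a histogram discretization of each feature
    (a simplification; the paper's filter uses a neighbor-based estimate).
    """
    n = len(X)

    def mutual_info(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against constant features
        disc = [min(int((v - lo) / width), bins - 1) for v in values]
        p_xy = Counter(zip(disc, y))
        p_x, p_y = Counter(disc), Counter(y)
        return sum((c / n) * math.log((c / n) / ((p_x[a] / n) * (p_y[b] / n)))
                   for (a, b), c in p_xy.items())

    scores = [mutual_info([row[k] for row in X]) for k in range(len(X[0]))]
    mean_score = sum(scores) / len(scores)
    return [k for k, s in enumerate(scores) if s > mean_score]
```

A feature that separates the classes well scores high MI and is kept, while a constant or class-independent feature scores near zero and is dropped.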

In the third stage of the comparison, we compared our results with fuzzy classifiers built on imbalanced data preprocessed by the SMOTE algorithm. We used an implementation of the algorithm from the open library [41] with all parameters at their default values. After applying SMOTE, the numbers of instances of the positive and negative classes were equal. Next, we conducted five-fold cross-validation. Fuzzy classifiers were constructed with the algorithm based on the extreme values of the classes; feature selection was not performed. Table 9 presents the results of the fuzzy classifiers averaged over five samples. We compared these results with those in Table 2, where fuzzy classifiers were constructed on complete sets of imbalanced data and optimized by the continuous GSA. The Wilcoxon test values for the third stage are presented in Table 10.
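The core idea of SMOTE can be illustrated with a simplified NumPy sketch; this is not the library implementation used in the paper, only the interpolation scheme it is based on (new minority samples are placed on segments between existing minority samples and their nearest minority-class neighbors):

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by interpolating between
    a randomly chosen sample and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise distances inside the minority class; exclude self-distances.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]
    synthetic = np.empty((n_new, X_min.shape[1]))
    for t in range(n_new):
        i = rng.integers(n)                # random minority sample
        j = neighbors[i, rng.integers(k)]  # one of its k nearest neighbors
        gap = rng.random()                 # interpolation coefficient in [0, 1)
        synthetic[t] = X_min[i] + gap * (X_min[j] - X_min[i])
    return synthetic
```

Generating as many synthetic samples as the class imbalance requires yields the equal class sizes mentioned above.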

The comparison shows that fuzzy classifiers constructed on the original datasets and tuned by the GSAc demonstrate better overall accuracy than fuzzy classifiers built on oversampled data, with comparable recognition quality for the positive class. Therefore, if it is important not only to classify the positive class correctly but also to avoid large losses in recognizing the negative class, a fuzzy classifier with parameter tuning by the GSAc is the preferable tool.
At the next stage of comparison, feature selection was carried out on the oversampled data. Table 11 presents the fuzzy classification results, averaged over five samples, on the feature subsets obtained by the random search algorithm. Table 12 presents the performance indexes obtained after selecting features with the filter based on mutual information. We compared these values with the results of constructing fuzzy classifiers with feature selection and parameter tuning by the GSA on the initial datasets (Table 3). Table 13 shows the Wilcoxon test comparison of the results of constructing fuzzy classifiers on oversampled and original data with feature selection. The results demonstrate that fuzzy classifiers optimized by the gravitational search algorithm outperform fuzzy classifiers constructed on feature sets selected after data oversampling with SMOTE.
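The pairwise comparisons reported in Tables 8, 10, and 13 rely on Wilcoxon's signed-rank test for paired samples; with SciPy this looks roughly as follows (the paired accuracy values below are made up for illustration, not taken from the paper's tables):

```python
from scipy.stats import wilcoxon

# Hypothetical paired accuracies of two classifiers on the same datasets.
acc_a = [0.91, 0.88, 0.95, 0.90, 0.87, 0.93, 0.85, 0.92]
acc_b = [0.89, 0.86, 0.94, 0.88, 0.85, 0.92, 0.86, 0.90]

# Test the null hypothesis that the paired differences are symmetric about zero.
stat, p = wilcoxon(acc_a, acc_b)
reject_null = p < 0.05  # reject at the 0.05 significance level
```

When the null hypothesis is retained, the two algorithms are treated as statistically indistinguishable on that performance index.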

The last stage of the comparison checked the effectiveness of the fuzzy classifier with GSA-based feature selection and parameter tuning against state-of-the-art classification algorithms. Using the open sklearn library, the following classifiers were built on the complete datasets: Gaussian naive Bayes (GNB), logistic regression classifier (LR), decision tree classifier (DT), multilayer perceptron classifier (MLP), linear support vector classifier (LSV), K-nearest neighbors classifier with k = 3 (3NN), AdaBoost classifier (AB), random forest classifier (RF), and gradient boosting for classification (GB) [42]. All algorithm parameters were left at their default values. Table 14 contains the results of constructing the various classifiers on the selected datasets; the last three columns show the fuzzy classifiers from Table 4. The obtained values were compared using Wilcoxon's signed-rank test for paired samples (Tables A1-A4). The fuzzy classifier demonstrates results comparable to the alternatives in terms of the overall accuracy and the geometric mean but uses fewer features. It shows the best TPrate value when the coefficient γ is equal to one. When the coefficient γ is equal to 0.5, the fuzzy classifier shows statistically comparable TPrate values and is inferior to only three algorithms in the TNrate value.
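A minimal sketch of how such default-parameter baselines can be assembled with scikit-learn follows; a synthetic imbalanced dataset stands in for the benchmark data, and only a subset of the listed classifiers is shown (LR is given extra iterations solely to suppress convergence warnings):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-class imbalanced dataset (9:1 class ratio).
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# A subset of the baselines from Table 14, with default parameters.
models = {
    "GNB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    "3NN": KNeighborsClassifier(n_neighbors=3),
    "RF": RandomForestClassifier(random_state=0),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te) for name, m in models.items()}
```

Note that plain accuracy can be misleading here: with a 9:1 imbalance, a classifier that always predicts the majority class already scores about 0.9, which is exactly why TPrate, TNrate, and the geometric mean are tracked separately in the paper.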
Thus, if the chosen priority coefficient γ is zero, the proposed metric reduces to the overall accuracy. The classifier then focuses on recognizing the negative class; as a result, the model has a low Type I error but a high Type II error.
When γ is equal to 1, the function is identical to the geometric mean. The efficiency of the fuzzy classifier with respect to the positive class then increases: the Type II error decreases, but the Type I error can increase significantly.
With a coefficient γ close to 0.5, a system with low values of both errors is obtained simultaneously. The proposed metric can be useful for data such as vowel0, ecoli4, and yeast4, where a high-quality classification of one class can lead to large losses in the ability of the model to recognize the other class.
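The limiting behavior described above is consistent with a convex combination of the overall accuracy and the geometric mean. The exact weighting form below is an assumption made for illustration; it matches the stated limiting cases (γ = 0 gives accuracy, γ = 1 gives the geometric mean):

```python
import math

def priority_metric(tp, fn, tn, fp, gamma=0.5):
    """Assumed form of the proposed metric:
    (1 - gamma) * accuracy + gamma * G-mean.
    gamma = 0 reduces to overall accuracy, gamma = 1 to the geometric mean."""
    acc = (tp + tn) / (tp + fn + tn + fp)   # overall accuracy
    tp_rate = tp / (tp + fn)                # sensitivity (positive class)
    tn_rate = tn / (tn + fp)                # specificity (negative class)
    g_mean = math.sqrt(tp_rate * tn_rate)
    return (1 - gamma) * acc + gamma * g_mean
```

For a confusion matrix with tp = 5, fn = 5, tn = 85, fp = 5, the accuracy is 0.9 while the geometric mean is only about 0.69, so sweeping γ from 0 to 1 shifts the optimization pressure from the majority class toward the minority class.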

Conclusions
We considered the possibility of applying the gravitational search algorithm to improve the efficiency of the fuzzy classifier in the presence of data imbalance. The binary GSA reduced the input feature space by selecting informative feature subsets in wrapper mode for the fuzzy classifier, and the continuous GSA improved the classification quality. We proposed a new metric that influences the final performance indexes of the model through the choice of a priority coefficient. The ability to shift priority between the numbers of correctly recognized positive and negative instances allows the developer to configure the fuzzy classifier flexibly. In future work, we plan to study further the impact of the coefficient on the result and to recommend coefficient values for certain characteristics of the dataset.

Conflicts of Interest:
The authors declare no conflict of interest. The sponsors had no role in the design, execution, interpretation, or writing of the study.