Mathematics
  • Article
  • Open Access

8 July 2022

An Efficient Heap Based Optimizer Algorithm for Feature Selection

1 Computer Science Department, College of Computer Science and Information Technology, King Faisal University, Al Ahsa 400, Saudi Arabia
2 Computer Science Department, Faculty of Computers and Artificial Intelligence, Benha University, Benha 12311, Egypt
3 Department of Computer Science, Faculty of Computers and Information, Misr International University, Cairo 12585, Egypt
4 Department of Information System, Faculty of Computers and Artificial Intelligence, Benha University, Benha 12311, Egypt
This article belongs to the Special Issue Advanced Optimization Methods and Applications

Abstract

The heap-based optimizer (HBO) is an innovative meta-heuristic inspired by human social behavior. In this research, binary adaptations of the heap-based optimizer, denoted B_HBO, are presented and used to determine the optimal features for classification in wrapper form. In addition, HBO balances exploration and exploitation by employing self-adaptive parameters that can adaptively search the solution domain for the optimal solution. In the feature selection domain, the presented binary B_HBO algorithms are used to find feature subsets that maximize classification performance while lowering the number of selected features. The k-nearest neighbor (k-NN) classifier ensures that the selected features are significant. The new binary methods are compared to eight common optimization methods recently employed in this field, including Ant Lion Optimization (ALO), the Archimedes Optimization Algorithm (AOA), the Backtracking Search Algorithm (BSA), the Crow Search Algorithm (CSA), Levy Flight Distribution (LFD), Particle Swarm Optimization (PSO), the Slime Mold Algorithm (SMA), and the Tree Seed Algorithm (TSA), in terms of fitness, accuracy, precision, sensitivity, F-score, the number of selected features, and statistical tests. Twenty datasets from the UCI repository are evaluated and compared using a set of evaluation indicators. The non-parametric Wilcoxon rank-sum test was used to determine whether the results of the proposed algorithms differ statistically significantly from those of the other compared methods. The comparison analysis demonstrates that B_HBO is superior or equivalent to the other algorithms used in the literature.

1. Introduction

On one hand, the massive amounts of data collected in all industries at present provide more specific and valuable information. On the other hand, analyzing these data becomes more difficult when not all of the information is relevant. Identifying the appropriate aspects of data is a difficult challenge. Dimension reduction is a strategy used to solve classification and regression problems by identifying a subset of characteristics and eliminating duplicate ones. This method is very useful when there are numerous attributes, and not all of them are needed to interpret the data and conduct additional experiments on the attributes. The essential principle of selecting features is that, for many pattern classification tasks, a large number of features does not necessarily translate into high classification accuracy. Ideally, the selected attribute subset will improve classifier performance and provide a quicker, more cost-effective classification, resulting in comparable or even higher classification accuracy than using all of the attributes [1].
Selecting feature subsets with a powerful and distinctive impact for high-dimensional data analysis is a critical step. High-dimensional datasets have recently become more prevalent in various real-world applications, including genome ventures, data mining, and computer vision. However, the high dimensionality of the datasets may result from unnecessary or redundant features, which can reduce the effectiveness of the learning algorithm or result in data overfitting [2].
Feature selection (FS) has become a viable data preparation method for addressing the curse of dimensionality. FS strategies focus on selecting feature subsets using various selection criteria while keeping the physical meanings of the original characteristics [3]. It can make learning models easier to comprehend and perceive. FS has proven its efficiency in various real-world machine learning and data mining problems, such as pattern recognition, information retrieval, object-based image classification, intrusion detection, and spam detection, to name only a few [4]. The FS process aims to reduce the search space’s dimension to improve the learning algorithm’s efficiency [5].
The feature selection methodology derives its strength from two main processes, the search and the evaluation. Choosing the most valuable features from the original set with passing all the incoming subsets may face a combinatorial explosion. Therefore, search methodologies are adopted to select the worthy features efficiently. The traditional greedy search strategies such as forward and backward search have been used. The problem with this type of searching is that it may succumb to locally optimal solutions, resulting in non-optimal features. The evaluation function can handle this issue by assessing each feature subset’s overall importance, which may help discover the globally optimal or near-optimal solution. Based on the methods used to evaluate feature subsets, the feature selection algorithms are categorized into three primary approaches: filter, wrapper, and embedding methods [6].
Since FS seeks out the near-optimal feature subset, it is considered an optimization problem. Thus, exhaustive search methodologies will be unreliable in this situation, as they generate all potential solutions to find only the best one [7].
Meta-heuristic algorithms gain their superiority from their ability to find the most appropriate solutions in an acceptable, realistic time [8]. In general, meta-heuristic and evolutionary algorithms can avoid the problem of local-optima better than traditional optimization algorithms [9]. Recently, nature-inspired meta-heuristic algorithms have been used most frequently to tackle optimization problems [10].
Typically, the feature selection problem can be mathematically phrased as a multi-objective problem with two objectives: decreasing the size of the selected feature set and maximizing classification accuracy. These two goals usually conflict, and the ideal answer is a compromise between them.
Meta-heuristic algorithms are stochastic algorithms that fall into two categories: single-solution-based and population-based. In single-solution-based algorithms, one randomly generated solution is iteratively improved until it reaches the optimum result [11]. In contrast, population-based algorithms evolve a set of solutions (i.e., a population) in a given search space over many iterations until the best solution is obtained. Population-based algorithms are commonly categorized into evolutionary algorithms, swarm intelligence techniques, and physics law-based algorithms. Evolutionary algorithms (EA) are based on the survival of the fittest; the Genetic Algorithm (GA), for instance, draws its strategy from natural evolutionary processes (e.g., reproduction, mutation, recombination, and selection). Swarm intelligence (SI) techniques are based on the mutual intelligence of swarms. Finally, physics law-based algorithms are motivated by physical processes such as electrostatic induction, gravitational force, and the heating of materials [11].
Several algorithms proved their efficiency in both optimization and feature selection fields. The Genetic Algorithms (GA), especially binary GA approaches, are regarded as leading evolution-based algorithms that have been used to handle FS problems [12].
The Particle Swarm Optimization (PSO) algorithm, constructed for continuous optimization problems [13], also has a binary version (BPSO) that was presented for binary optimization problems [14]. The BPSO has likewise been applied to FS [15,16,17]. Furthermore, many other optimization algorithms have succeeded in solving FS problems, such as the Ant Colony Optimization (ACO) algorithm [18], Artificial Bee Colony (ABC) algorithm [19], Binary Gravitational Search Algorithm (BGSA) [20], Scatter Search Algorithm (SSA) [21], Archimedes Optimization Algorithm (AOA) [22], Backtracking Search Algorithm (BSA) [23], Marine Predators Algorithm (MPA) [24], and Whale Optimization Algorithm (WOA) [25].
The common challenges of the previously suggested metaheuristics for FS are a slow convergence rate, poor scalability [26], and a lack of precision and consistency. Moreover, the characteristics of large-scale FS problems may differ across datasets. As a result, solving diverse large-scale FS problems using an existing approach with only one candidate-solution-generating process may be inefficient [27]. Furthermore, identifying an appropriate FS approach and parameter values to efficiently address a large-scale FS problem takes time. This limitation motivates the current study, which proposes a novel algorithm for the FS task using the Heap Based Optimizer.
In this research, we propose an enhancement to a recent optimization technique known as the Heap Based Optimizer (HBO), a new human behavior-based algorithm [28]. The HBO is a novel meta-heuristic inspired by the corporate rank hierarchy (CRH) and related human behavior. An adaptive opposition strategy is proposed to enable the original algorithm to achieve more precise outcomes on increasingly complex challenges.
HBO has displayed highly competitive performance and demonstrated effectiveness on optimization problems. It provides numerous benefits, including few parameters, straightforward configuration, simple implementation, and precise calculation, and it requires fewer iterations than many alternatives. It has a straightforward approach, a low computational burden, rapid convergence, near-global solutions, problem independence, and a gradient-free nature [29,30]. All of these features are beneficial for resolving the FS problem.
This paper reports the following main contributions:
  • An improved Heap Based Optimizer (HBO) algorithm, termed B_HBO, is proposed for the feature selection problem.
  • The proposed improved version was tested on 20 datasets, of which 8 belong to a considerably high-dimensional class. The performance of meta-heuristic algorithms on FS problems for such high-dimensional datasets is rarely investigated.
  • The performance of the proposed B_HBO in terms of fitness, accuracy, precision, sensitivity, F-score, and the number of selected features is compared with that of several recent optimization methods.
The remainder of the article is organized as follows: Section 2 reviews the literature on FS metaheuristic algorithms. Section 3 introduces the steps of the continuous Heap Based Optimizer. The binary HBO strategy is detailed in Section 4. The experimental results are discussed in Section 5. Section 6 presents conclusions and future work.

3. Procedure and Methodology

The proposed framework of the binary heap-based optimizer B_HBO for feature selection contains three significant steps, illustrated in Figure 1. The Heap Based Optimizer (HBO) is a recent SI algorithm, proposed in 2020 by Qamar Askari, Mehreen Saeed, and Irfan Younas [28]. It exhibits effectiveness in tackling optimization difficulties and provides a variety of advantages, including fewer parameters, simple configuration, ease of implementation, and high calculation accuracy. Furthermore, the HBO algorithm requires fewer iterations; it has a simple technique, a low computational cost, rapid convergence, a near-global solution, problem independence, and a gradient-free nature [28]. All of these benefits are critical in resolving the FS issue. In the CEC 2017 benchmark comparison, HBO ranked second and displayed exceptionally competitive performance against LSHADE-cnEpSin, the highest-performing technique and a CEC 2017 winner. HBO can be called a high-performance optimizer because it statistically outperforms GA, PSO, GSA, CS, and SSA.
Figure 1. The framework of the proposed B _ H B O for feature selection based on KNN classifier.

3.1. Continuous Heap Based Optimizer

This section describes the steps of the heap-based optimizer algorithm (HBO). The HBO imitates the job titles, duties, and job descriptions of employees [28]. Although designations differ from company to company and business to business, they are all structured hierarchically. This structure is known by several names, including the corporate rank hierarchy (CRH), organizational chart tree, and corporate hierarchy structure [69]. The organizational structure is a set of strategies for dividing tasks into specific responsibilities and coordinating them. The main body of HBO is presented in Algorithm 1, and its mathematical model is discussed in this section.
Algorithm 1 HBO Pseudo-code
  • for (t ← 1 to T) do
  •     γ is calculated using Equation (3)
  •     p1 is calculated using Equation (6)
  •     p2 is calculated using Equation (7)
  •     for (I ← N down to 2) do
  •         i ← heap[I].value
  •         b_i ← heap[parent(I)].value
  •         c_i ← heap[colleague(I)].value
  •         B ← x_{b_i}
  •         S ← x_{c_i}
  •         for (k ← 1 to D) do
  •             p ← rand()
  •             x_temp^k ← update x_i^k(t) with Equation (9)
  •         end for
  •         if f(x_temp) ≥ f(x_i(t)) then
  •             x_i(t+1) ← x_i(t)
  •         else
  •             x_i(t+1) ← x_temp
  •         end if
  •         Heapify_Up(I)
  •     end for
  • end for
  • return x ← heap[1].value

3.1.1. Mathematical Formulating of the Collaboration with the Direct Boss

In a centralized organizational structure, upper-level policies and norms are imposed, and subordinates report to their immediate superior. This behavior may be simulated by changing each search agent’s position x i regarding its parent node B as shown in Equation (1)
x_i^k(t+1) = B^k + γ λ^k |B^k − x_i^k(t)|
where t represents the current iteration, k represents the kth component of the vector, and | · | denotes the absolute value. λ^k represents the kth component of the vector λ, which is generated randomly as demonstrated by Equation (2):
λ = 2r − 1
where r is a random number drawn uniformly from the range [0, 1]. γ is a carefully chosen parameter in Equation (1), and it is computed as shown in Equation (3):
γ = | 2 − (t mod T/C) / (T / (4C)) |
where T is the total number of iterations and C is a user-defined parameter, as described below. γ decreases linearly from 2 to 0 throughout the iterations and, after reaching 0, rises again to 2 as the iterations continue. C specifies the number of such cycles γ completes in T iterations.
To determine the effect of the parameter C on the performance of HBO, a variety of unimodal and multimodal benchmark functions were solved while varying C from its minimum to its maximum value. After repeating this experiment for many other functions, the balanced value of C was determined by dividing the maximum number of iterations by 25, i.e., C = T/25.
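As a concrete illustration, the γ schedule of Equation (3) can be sketched in a few lines of Python (the paper's experiments were run in Matlab; the function and variable names here are illustrative):

```python
def gamma(t, T, C):
    """HBO's gamma parameter (Equation (3)): |2 - ((t mod T/C) / (T/(4C)))|.

    gamma oscillates between 2 and 0, completing C cycles over T iterations.
    """
    cycle = T / C
    return abs(2 - (t % cycle) / (cycle / 4))

T = 1000      # total number of iterations
C = T // 25   # balanced value suggested by the authors (C = T/25)
values = [gamma(t, T, C) for t in range(1, T + 1)]
# gamma stays within [0, 2] and hits 2 at the start of each cycle
```

Plotting `values` shows the sawtooth-like oscillation described above: γ falls from 2 toward 0 and climbs back within each of the C cycles.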

3.1.2. Mathematical Formulating of the Collaboration between Colleagues

Similar-ranking officials are referred to as “colleagues”. They collaborate to execute official responsibilities. We assume that nodes on the same level of the heap are colleagues, and each search agent x_i modifies its position relative to a randomly chosen colleague S_r using Equation (4):
x_i^k(t+1) =
    S_r^k + γ λ^k |S_r^k − x_i^k(t)|,    if f(S_r) < f(x_i(t))
    x_i^k + γ λ^k |S_r^k − x_i^k(t)|,    if f(S_r) ≥ f(x_i(t))
where f is the objective function that computes the fitness of a search agent. The position-updating process of Equation (4) is fairly similar to that of Equation (1); in contrast, Equation (4) permits the search agent to explore the region surrounding S_r^k if f(S_r) < f(x_i(t)) and the region surrounding x_i^k otherwise.

3.1.3. Self Contribution of an Employee

This phase’s mapping procedure is relatively straightforward: it depicts the concept of an employee’s self-contribution. The behavior is modeled by keeping the employee’s previous position in the next iteration, as shown in Equation (5):
x i k ( t + 1 ) = x i k ( t )
The search agent x i in Equation (5) does not modify its position for its kth design variable in the next iteration. This behavior is used to control a search agent’s rate of change.

3.1.4. Putting It All Together

This part explains how the position-updating equations of the previous subsections are combined into a single equation. Choosing the selection probabilities of the three equations is a delicate task, as these probabilities play an important part in balancing exploration and exploitation. A roulette wheel, divided into three proportions p1, p2, and p3, is designed to balance them. By selecting the proportion p1, a search agent updates its location according to Equation (5). The limit of p1 is calculated by Equation (6):
p1 = 1 − t/T
where t stands for the current iteration and T stands for the total number of iterations. By selecting the proportion p2, a search agent updates its position according to Equation (1). The limit of p2 is calculated by Equation (7):
p2 = p1 + (1 − p1)/2
Finally, the proportion p3 corresponds to the update of Equation (4), and its limit is calculated by Equation (8):
p3 = p2 + (1 − p1)/2 = 1
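The three proportions can be checked with a few lines of Python (an illustrative sketch; names are not from the paper):

```python
def proportions(t, T):
    """Roulette-wheel proportions of HBO (Equations (6)-(8))."""
    p1 = 1 - t / T          # Eq. (6): shrinks linearly with the iteration count
    p2 = p1 + (1 - p1) / 2  # Eq. (7): half of the remaining span
    p3 = p2 + (1 - p1) / 2  # Eq. (8): the other half, so p3 always equals 1
    return p1, p2, p3
```

For example, at t = 250 of T = 1000 iterations, the spans are p1 = 0.75, p2 = 0.875, and p3 = 1, so the self-contribution branch is still dominant early in the run.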
The following equation depicts HBO’s general position update mechanism:
x_i^k(t+1) =
    x_i^k(t),                              if p ≤ p1
    B^k + γ λ^k |B^k − x_i^k(t)|,          if p1 < p ≤ p2
    S_r^k + γ λ^k |S_r^k − x_i^k(t)|,      if p2 < p ≤ p3 and f(S_r) < f(x_i(t))
    x_i^k + γ λ^k |S_r^k − x_i^k(t)|,      if p2 < p ≤ p3 and f(S_r) ≥ f(x_i(t))
where p is a number chosen at random within the range [0, 1]. It is worth noting that Equation (5) supports exploration, Equation (1) supports exploitation and convergence, and Equation (4) supports both exploration and exploitation. Based on these observations, p1 starts high and is linearly decreased over the iterations, decreasing exploration and increasing exploitation. After p1 is computed, the remaining span is split into two equal parts, making attraction to the boss and to colleagues equally likely.
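Under these rules, one component update of Equation (9) can be sketched as follows (a minimal Python sketch; the function and argument names are illustrative, not the authors' implementation):

```python
import random

def update_component(x_ik, B_k, Sr_k, f_Sr, f_xi, p1, p2, gamma):
    """One-component HBO position update (Equation (9)).

    x_ik : current value of the k-th design variable of agent i
    B_k  : k-th component of the parent (boss) position B
    Sr_k : k-th component of a randomly chosen colleague S_r
    f_Sr, f_xi : fitness of the colleague and of agent i (minimisation)
    """
    p = random.random()
    lam = 2 * random.random() - 1       # lambda_k ~ U(-1, 1), Equation (2)
    if p <= p1:                         # self-contribution, Equation (5)
        return x_ik
    if p <= p2:                         # follow the direct boss, Equation (1)
        return B_k + gamma * lam * abs(B_k - x_ik)
    if f_Sr < f_xi:                     # fitter colleague: search around S_r
        return Sr_k + gamma * lam * abs(Sr_k - x_ik)
    return x_ik + gamma * lam * abs(Sr_k - x_ik)  # otherwise search around x_i
```

Calling this once per dimension k, with p1, p2 recomputed each iteration, reproduces the inner loop of Algorithm 1.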

3.1.5. The HBO Step by Step

This section describes the HBO phases and algorithm in detail.
  • Initialize generic parameters such as the population size (N), the number of design variables/dimensions (D), the maximum number of iterations (T), and the ranges of the design variables. The algorithm-specific parameter C is computed as C = T/25.
  • Create the initial population: randomly generate a population P of N search agents, each with D dimensions. The population P is represented as follows:
    P = [x_1, x_2, …, x_N]^T =
        ⎡ x_1^1  x_1^2  x_1^3  …  x_1^D ⎤
        ⎢ x_2^1  x_2^2  x_2^3  …  x_2^D ⎥
        ⎢   ⋮      ⋮      ⋮         ⋮   ⎥
        ⎣ x_N^1  x_N^2  x_N^3  …  x_N^D ⎦
A heap is typically represented by a d-ary tree; to implement the CRH, a 3-ary heap is used. Because a heap is a complete tree-shaped data structure, it can be built efficiently using an array. The following are the essential d-ary heap operations that HBO requires:
  • parent(i): Receives the index of a node and returns the index of the node's parent, assuming the heap is implemented as an array. The index of the parent of node i is calculated as follows:
    parent(i) = ⌊(i + 1)/d⌋
    ⌊·⌋ is the floor operator, which produces the largest integer less than or equal to the input.
  • child(i, j): This method returns the index of the jth child of the given node. A node in a 3-ary heap can have no more than three children; according to the CRH concept, a leader may have no more than three direct subordinates. The function is computed in constant time as follows:
    child(i, j) = d·i − d + j + 1
  • depth(i): The depth of any node i may be determined in constant time using the following formula, with the root level at depth 0:
    depth(i) = ⌈log_d(d·i − i + 1)⌉ − 1
    ⌈·⌉ is the ceiling function, which returns the smallest integer greater than or equal to the input.
  • colleague(i): All nodes at the same level as node i are considered its colleagues. This function returns the index of a randomly chosen colleague of node i, determined by producing a random integer in the range [ (d^depth(i) − 1)/(d − 1) + 1 , (d^(depth(i)+1) − 1)/(d − 1) ].
  • Heapify_Up(i): To maintain the heap property, this operation searches upward in the heap and moves node i to its proper spot. Algorithm 2 contains the pseudo-code for this operation.
    Algorithm 2 Heapify_Up (i) Pseudo-code
    • Inputs: i (the index of the node being heapified)
    •                         ▹ Assuming the remaining nodes satisfy the heap property
    • while i ≠ root and heap[i].key < heap[parent(i)].key do
    •     Swap(heap[i], heap[parent(i)])
    •     i ← parent(i)
    • end while
    Finally, Algorithm 3 describes the heap-building algorithm.
    Algorithm 3 Build_Heap (P, N) Pseudo-code
    • Inputs: N is the population size, P is the search agents population
    • for ( i 1 to N) do
    •     heap[i].value ←i
    •     heap[i].key ← f ( x i )
    •     Heapify_Up (i)
    • end for
  • Position updating mechanism: Search agents update their positions regularly according to the previously described equations to converge on the best global solution.
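The heap operations above follow directly from the formulas. The sketch below implements them for a 3-ary, 1-indexed array heap (illustrative Python, not the authors' Matlab code; depth() chases parents instead of evaluating the closed-form logarithm, which is equivalent but avoids floating-point edge cases):

```python
import random

D = 3  # branching factor: the 3-ary heap mirrors "at most three direct subordinates"

def parent(i):
    """Index of the parent of node i (1-indexed array heap): floor((i + 1) / d)."""
    return (i + 1) // D

def child(i, j):
    """Index of the j-th child (1 <= j <= d) of node i: d*i - d + j + 1."""
    return D * i - D + j + 1

def depth(i):
    """Depth of node i (root at depth 0).

    Equivalent to ceil(log_d(d*i - i + 1)) - 1, computed by chasing parents
    to sidestep floating-point rounding in the logarithm.
    """
    level = 0
    while i > 1:
        i = parent(i)
        level += 1
    return level

def colleague(i):
    """Index of a random node on the same level as node i."""
    first = (D ** depth(i) - 1) // (D - 1) + 1    # first index at this depth
    last = (D ** (depth(i) + 1) - 1) // (D - 1)   # last index at this depth
    return random.randint(first, last)

def heapify_up(heap, i):
    """Algorithm 2: move heap[i] up while it is fitter (smaller key) than its parent."""
    while i > 1 and heap[i]["key"] < heap[parent(i)]["key"]:
        heap[i], heap[parent(i)] = heap[parent(i)], heap[i]
        i = parent(i)
```

Build_Heap (Algorithm 3) then amounts to filling the array agent by agent and calling heapify_up on each new entry.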

4. The Proposed Binary HBO ( B _ HBO ) for Feature Selection

This section describes the proposed Heap Based Optimizer's (HBO) steps for solving feature selection using KNN as the classifier. The proposed technique combines the B_HBO and KNN algorithms for classification, feature selection, and parameter optimization. In B_HBO, KNN parameters are used to identify the best selection accuracy, and the selected features are used for all cross-validation folds. Figure 1 depicts the flowchart of the suggested B_HBO-KNN approach, showing the three steps of the proposed method. Algorithm 4 shows the pseudocode of the proposed B_HBO with the KNN classification algorithm.
Algorithm 4 The Pseudo code of the proposed B _ H B O based on KNN classifier.
  • Inputs: the population size N, the maximum number of generations T, the classifier G, the feature set X, the dataset D, and the fitness function (fobj)
  • Outputs: the prediction accuracy for each iteration (optimal location) and the highest accuracy value
  • Randomly initialize the population X_i (i = 1, 2, …, N)
  • while the stop condition is not met do
  •     Compute the fitness function using the feature selection strategy and the k-NN classifier
  •     for (t ← 1 to T) do
  •         γ is calculated using Equation (3)
  •         p1 is calculated using Equation (6)
  •         p2 is calculated using Equation (7)
  •         for (I ← N down to 2) do
  •             i ← heap[I].value
  •             b_i ← heap[parent(I)].value
  •             c_i ← heap[colleague(I)].value
  •             B ← x_{b_i}
  •             S ← x_{c_i}
  •             for (k ← 1 to D) do
  •                 Compute the fitness of the candidate subset using the feature selection technique and the k-NN classifier
  •                 p ← rand()
  •                 x_temp^k ← update x_i^k(t) using Equation (9)
  •             end for
  •             if f(x_temp) ≥ f(x_i(t)) then
  •                 x_i(t+1) ← x_i(t)
  •             else
  •                 x_i(t+1) ← x_temp
  •             end if
  •             Heapify_Up(I)
  •         end for
  •     end for
  •     return x ← heap[1].value
  • end while

4.1. FS for Classification

Classification is among the most important problems in data mining; its fundamental function is to estimate the class of an unknown object. A dataset (also known as a training set) typically consists of rows (referred to as objects) and columns (referred to as features) that correspond to predetermined classes (decision features). A significant number of redundant or irrelevant features in the dataset may be the primary factor affecting a classifier's accuracy and performance. Redundant features may negatively impact the classifier in various ways: adding more features to a dataset requires adding more examples, which increases the learning time of the classifier. Moreover, a classifier that learns only from relevant features is more accurate than one that also learns from irrelevant data, because irrelevant features can confuse the classifier and cause it to overfit the data. In addition, duplicated and irrelevant inputs increase the complexity of the classifier, making the learned results more challenging to comprehend. As demonstrated previously, the selection of a suitable search strategy in FS techniques is crucial for optimizing the efficiency of the learning algorithm. FS often aids in detecting redundant and unneeded features and eliminating them to enhance the classifier's results in terms of learning time and accuracy, as well as simplifying the findings to make them more understandable. By selecting the most informative features and removing unneeded and redundant ones, the dimension of the feature space is decreased and the convergence rate of the learning algorithm is accelerated.
Because of the above, the HBO was selected as the optimization engine in a wrapper FS method, since it has shown sufficient efficacy in solving several optimization issues compared to SI-based optimization techniques. The HBO is a new optimizer that has not yet been applied to FS problems, and its distinctive properties make it a suitable search engine for global optimization and FS tasks. The HBO is efficient, adaptable, simple, and straightforward to deploy. To balance exploration and exploitation, HBO has only one parameter.

4.2. The Proposed Binary HBO ( B _ H B O )

Searching for the optimal feature subset in FS is a difficult problem, particularly for wrapper-based approaches. This is because the supervised learning (e.g., classifier) must evaluate the selected subset at each optimization step. Consequently, a suitable optimization approach is crucial to minimize the number of evaluations.
The comparative performance of HBO prompted us to suggest this method as a search strategy in a wrapper-based FS procedure. We propose a binary version of the HBO to solve the FS problem because the search space can be represented by binary values {0, 1}. Binary operators are considerably simpler than their continuous counterparts. In the continuous form of HBO, each agent's location is updated depending on its current position, the position of the best solution so far (target), and the positions of the other solutions, as shown in Equations (1) and (5). The new solution derived from Equation (4) is obtained by adding the step vector to the position vector. However, this addition operator cannot be used directly in a binary space because the position vector only contains 0s and 1s. The next three subsections elaborate on these approaches.

4.3. B _ H B O Proposed for FS Based on KNN

Previous sections demonstrated the significance of an effective search strategy for FS approaches. Another feature of FS techniques is evaluating the quality of the selected subset. Since the suggested method is wrapper-based, a learning algorithm (such as a classifier) must be incorporated into the evaluation process. This study employs the k-Nearest Neighbor classifier. The classification quality of the chosen features is integrated into the proposed fitness values because the primary problem of this study is feature selection, not classification, which is tackled by the HBO technique. Each algorithm is executed 51 times with 1000 iterations, and the iteration with the highest classification accuracy is selected for each run. Therefore, a simple classifier is required to reduce each method's complexity and execution time.
In this proposed approach, the KNN is employed as a classifier to guarantee the quality of the selected features. When relevant features are selected from a subset, the classification accuracy is enhanced. One of the primary goals of FS approaches is to improve classification accuracy; another is to reduce the number of selected features, since the superiority of a solution increases as its number of components decreases. The proposed fitness function considers both contradicting goals: Equation (16) depicts the fitness function that accounts for classification accuracy and the number of selected features when evaluating a subset of features across all techniques. In HBO, KNN parameters are used to identify the best selection accuracy, and the selected features are used for all cross-validation folds. Figure 1 depicts the flowchart of the suggested HBO-KNN approach, showing the three steps of the proposed method: preprocessing, followed by the FS and optimization phase, and then the classification and cross-validation phase. Algorithm 4 shows the pseudocode of the proposed B_HBO based on the KNN classifier.

4.4. Fitness Function for FS

To define FS as an optimization problem, two crucial factors must be examined: how to represent a solution and how to evaluate it. A wrapper FS strategy employing HBO as a search algorithm and a k-NN classifier as an evaluator has been developed. A feature subset is represented as a binary vector whose length equals the total number of features in the dataset. If the value at a position is 1, the corresponding feature has been selected; otherwise, it has not. The quality of a feature subset is determined simultaneously by the classification accuracy (error rate) and the number of selected features. These two contradicting objectives are represented by a single fitness function denoted by Equation (14).
Fitness = α · γ_R(D) + β · |R| / |C|
where |R| is the number of selected features in a reduct, |C| is the number of conditional features in the original dataset, and α ∈ [0, 1] and β = 1 − α are two parameters weighting the significance of classification performance and subset length, respectively.
The proposed fitness function governs the accuracy of the selected features. During the iterative process, the solutions found by HBO must be evaluated to verify the performance of each iteration. Before the fitness evaluation, a binary conversion is performed using Equation (15), and the HBO fitness function is defined by Equation (16).
x_i^bin = 1 if x_i^t > 0.5, and 0 otherwise
Fit = 0.99 × R + 0.01 × |c| / C
where R is the classification error rate computed by k-NN (80% of the instances for training and 20% for testing), C denotes the total number of features, and c denotes the selected relevant features.
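Equations (15) and (16) translate almost line for line into code (a Python sketch; the names are illustrative):

```python
def binarize(x, threshold=0.5):
    """Equation (15): map a continuous position vector to a binary feature mask."""
    return [1 if xi > threshold else 0 for xi in x]

def fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Equation (16): Fit = alpha * R + (1 - alpha) * |c| / C.

    error_rate : k-NN classification error R on the held-out split
    n_selected : number of selected features |c|
    n_total    : total number of features C
    """
    return alpha * error_rate + (1 - alpha) * n_selected / n_total
```

For example, a solution with error rate 0.10 that keeps 5 of 20 features scores 0.99 · 0.10 + 0.01 · 0.25 = 0.1015, slightly worse than an equally accurate solution keeping fewer features, which is exactly the intended bias.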
As illustrated in Figure 1, HBO is customized to choose the most important and best features.

5. Results and Discussion

In this section, a comparison between the results of the developed FS approach and other methods is performed. The proposed B_HBO algorithm is compared with eight recent evolutionary feature selection algorithms: ALO, AOA, BSA, CSA, LFD, PSO, SMA, and TSA. Each compared algorithm was run 51 times with a population size of 30 and 1000 iterations. The suggested B_HBO algorithm was implemented in Matlab, and all algorithms were executed in the same environment on a computer with an Intel(R) Core i7 2.80 GHz processor and 32 GB RAM.
The experiments use several datasets with different characteristics; their details are given in the following section.

5.1. Datasets and Parameter Setup

Twenty datasets from the UCI machine learning repository [70], listed in Table 2, are used in the experiments to evaluate the effectiveness of the suggested method. Each dataset's instances are randomly partitioned into 80% for training and 20% for testing. The datasets are ordered from low to high dimensionality: low-dimensional datasets have fewer than ten features, whereas high-dimensional datasets have more than ten. The challenge is to find an optimal subset of features with high accuracy. This study employs a wrapper-based feature selection method built on the k-NN classifier, with K = 5 found to be the best choice for all datasets. Table 3 presents the parameter settings of the algorithms considered in this work.
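The wrapper evaluation described above (80/20 random split, k-NN with K = 5, error rate on the held-out fold) can be sketched as follows. This is a self-contained Python illustration on synthetic data, not the paper's MATLAB implementation; the helper names (`knn_error`, `subset_error`) and the toy dataset are assumptions made here for clarity:

```python
import numpy as np

def knn_error(X_tr, y_tr, X_te, y_te, k=5):
    """Error rate of a plain k-NN classifier (Euclidean distance, majority vote)."""
    wrong = 0
    for x, y in zip(X_te, y_te):
        d = np.linalg.norm(X_tr - x, axis=1)       # distances to all training points
        nearest = y_tr[np.argsort(d)[:k]]          # labels of the k nearest neighbors
        if np.bincount(nearest).argmax() != y:     # majority vote
            wrong += 1
    return wrong / len(y_te)

def subset_error(X, y, mask, k=5, test_frac=0.2, seed=0):
    """Wrapper evaluation: k-NN error on the features flagged by `mask` (80/20 split)."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 1.0                                 # an empty subset is invalid
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(len(y) * (1 - test_frac))
    tr, te = idx[:cut], idx[cut:]
    return knn_error(X[tr][:, cols], y[tr], X[te][:, cols], y[te], k)

# Toy data: 4 features, only feature 0 is informative for the class label.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (rng.random(200) < 0.5).astype(int)
X[:, 0] += 3 * y                                   # feature 0 separates the classes
err_all = subset_error(X, y, np.array([1, 1, 1, 1]))
err_best = subset_error(X, y, np.array([1, 0, 0, 0]))
```

A search algorithm such as B _ H B O would repeatedly call `subset_error` on candidate masks, favoring subsets that keep the informative features and drop the noise.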
Table 2. Details of Used Datasets.
Table 3. Parameters settings of B_HBO and other computational algorithms.

5.2. Performance Metrics

It is imperative to quantify the relevant performance metrics that guide the analysis of an algorithm's behavior. Accordingly, the following evaluation metrics and measures were computed for the proposed method ( B _ H B O ), developed to solve the feature selection problem [71].
  • Average fitness value: the mean of the best fitness values F i t v a l obtained over N runs of an algorithm. It reflects both a low selection ratio and a low classification error rate. It is calculated by Equation (17):
    $Mean = \frac{1}{N} \sum_{i=1}^{N} Fit_{val}^{i}$
  • Standard deviation ( StdDev ): an indicator of the stability of the algorithm. It is calculated by Equation (18):
    $StdDev = \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left( Fit_{val}^{i} - Mean \right)^{2}}$
  • Average accuracy ( AVG ACC ): the accuracy metric ( A C C ) identifies the rate of correctly classified instances. It is calculated by Equation (19):
    $ACC = \frac{TP + TN}{TP + FN + FP + TN}$
    In our study, nine different algorithms are each run N times, so it is more suitable to use the A V G A C C metric, calculated by Equation (20):
    $AVG_{ACC} = \frac{1}{N} \sum_{i=1}^{N} ACC_{i}$
  • Sensitivity or true positive rate (TPR): the rate of correctly predicted positive patterns. It is calculated by Equation (21):
    $TPR = \frac{TP}{TP + FN}$
  • Specificity or true negative rate (TNR): the percentage of actual negatives that are correctly detected. It is calculated by Equation (22):
    $TNR = \frac{TN}{FP + TN}$
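The metrics above follow directly from the confusion-matrix counts and the per-run fitness values. A minimal Python sketch (the illustrative counts and fitness values are assumptions, not results from the paper):

```python
from statistics import mean, stdev

def metrics(tp, tn, fp, fn):
    """Accuracy (Eq. 19), sensitivity/TPR (Eq. 21), specificity/TNR (Eq. 22)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    return acc, tpr, tnr

acc, tpr, tnr = metrics(tp=40, tn=45, fp=5, fn=10)
# acc = 85/100 = 0.85, tpr = 40/50 = 0.80, tnr = 45/50 = 0.90

# Aggregating fitness over N runs (Eqs. 17-18; `stdev` uses the N-1 denominator).
fits = [0.10, 0.12, 0.11, 0.09, 0.13]   # illustrative fitness values from 5 runs
avg, sd = mean(fits), stdev(fits)
```

Note that `statistics.stdev` divides by N − 1, matching Equation (18), whereas a population standard deviation would divide by N.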

5.3. Comparison of B _ H B O with Other Metaheuristics

In this section, the performance of B _ H B O is compared with that of other well-known meta-heuristic algorithms. The results are discussed in terms of several performance measures:
  • In terms of fitness: The comparison between the suggested B _ H B O and the competing algorithms is shown in Table 4. The obtained results make it evident that B _ H B O outperforms the others: it attains the smallest fitness values on 17 datasets, i.e., 85% of the tested datasets. ALO follows, achieving the smallest fitness value on two datasets, while AOA, BSA, and CSA perform worst.
    Table 4. The average fitness values of B_HBO against other recent optimizers.
  • In terms of accuracy: The following points can be observed from the results in Table 5. First, B _ H B O achieves the highest accuracy on nearly 80% of the datasets and is more stable than all the other tested algorithms, although it records the worst accuracy on three datasets. This indicates the high efficiency of the proposed B _ H B O . It is followed by ALO, which ranks second in accuracy on seven datasets, while SMA performs worst. The standard deviation is computed to evaluate the stability of the fitness value for each FS method; the Std results show that B _ H B O is more stable than the other algorithms on 14 datasets.
    Table 5. The average Accuracy of B_HBO against other recent optimizers.
  • In terms of precision: Table 6 lists the precision of the proposed B _ H B O and the eight wrapper FS algorithms. Examining the average precision over all 20 datasets shows that B _ H B O outperforms all competitors: it attains the highest average precision on eight datasets, i.e., 40% of the tested datasets. CSA follows, achieving the highest precision on five datasets, while LFD, BSA, AOA, and TSA perform worst.
    Table 6. The average Precision of B_HBO against other recent optimizers.
  • In terms of sensitivity: Table 7 reports the sensitivity of the proposed B _ H B O and the eight wrapper FS algorithms. Examining the average sensitivity over the 20 datasets reveals that B _ H B O outperforms all competitors, attaining the best results on eight datasets, i.e., 40% of the tested datasets. It is followed by CSA, which achieves the highest sensitivity on five datasets, while LFD, BSA, AOA, and TSA perform worst.
    Table 7. The average S e n s i t i v i t y of B_HBO against other recent optimizers.
  • In terms of F-score and number of selected features: Table 8 reveals that the proposed B _ H B O outperforms all competitors in F-score, attaining the highest values on eight datasets, i.e., 40% of the tested datasets. PSO follows, achieving the highest F-score on five datasets, while LFD, BSA, SMA, and TSA perform worst.
    Table 8. The average test F s c o r e of B_HBO against other recent optimizers.
    Based on Table 9, which reports the number of selected features, the proposed B _ H B O exhibits excellent performance in selecting relevant features: it attains the smallest subsets on 15 datasets, i.e., 75% of the tested datasets. SMA follows, achieving the smallest number of selected features on ten datasets.
    Table 9. The number of selected features of B_HBO against other recent optimizers.
The performance of the proposed B _ H B O algorithm for feature selection and classification was investigated using six statistical metrics (average fitness value, average accuracy, average sensitivity, average precision, average F-score, and number of selected features, over 51 runs per algorithm). Table 4 shows that the proposed B _ H B O recorded the best fitness values on most of the used datasets. The findings in Table 5 demonstrate that the suggested B _ H B O outperformed competing methods on nearly all datasets, achieving the highest accuracy rate on 80% of them. ALO ranks second in performance after the proposed B _ H B O , while SMA ranks third. Overall, the proposed B _ H B O algorithm achieved the best result for most datasets.
Precision and sensitivity are shown in Table 6 and Table 7; the higher an algorithm's precision and sensitivity, the better its performance. The proposed B _ H B O achieves high precision and sensitivity values on eight datasets, whereas the CSA algorithm provides better precision and sensitivity on only six datasets, and ALO ranks third with three datasets. These results confirm the superiority of the proposed B _ H B O over the compared algorithms.
Table 8 shows that the proposed B _ H B O provides a higher F-score than the others: it achieves the best F-score on 1.6 times as many datasets as CSA and PSO, and on twice as many as ALO. Thus, B _ H B O ranks first in performance, CSA and PSO rank second, and ALO ranks third, while the BSA and LFD algorithms rank last and perform worst.
Table 9 displays the number of features selected by each technique. The results demonstrate that B _ H B O is highly effective for the FS procedure.

5.4. Convergence Curve

This section is devoted to the convergence evaluation of the proposed B _ H B O algorithm for the FS problem on the chosen datasets. It illustrates the relationship between the number of optimization iterations and the prediction error attained, via the convergence curves of the proposed B _ H B O technique. The convergence curves of B _ H B O with k-NN against the other metaheuristics (ALO, AOA, BSA, CSA, LFD, PSO, SMA, and TSA) on the 20 benchmark datasets are shown in Figure 2, Figure 3, Figure 4 and Figure 5.
Figure 2. Convergence curves of the proposed approach (B_HBO) over 1000 iterations as a stop criterion. (a) Convergence Curve of the proposed approach for Arrhythmia dataset; (b) Convergence Curve of the proposed approach for Breast-cancer dataset; (c) Convergence Curve of the proposed approach for BreastEW dataset; (d) Convergence Curve of the proposed approach for CongressEW dataset; (e) Convergence Curve of the proposed approach for Diabetes dataset.
Figure 3. (a) Convergence Curve of the proposed approach for German dataset; (b) Convergence Curve of the proposed approach for Glass dataset; (c) Convergence Curve of the proposed approach for Heart-C dataset; (d) Convergence Curve of the proposed approach for Heart-StatLog dataset; (e) Convergence Curve of the proposed approach for Hepatitis dataset.
Figure 4. (a) Convergence Curve of the proposed approach for Hillvalley dataset; (b) Convergence Curve of the proposed approach for Ionosphere dataset; (c) Convergence Curve of the proposed approach for Iris dataset; (d) Convergence Curve of the proposed approach for Lung-Cancer dataset; (e) Convergence Curve of the proposed approach for Lymphography dataset.
Figure 5. (a) Convergence Curve of the proposed approach for Vowel dataset; (b) Convergence Curve of the proposed approach for WaveformEW dataset; (c) Convergence Curve of the proposed approach for WDBC dataset; (d) Convergence Curve of the proposed approach for Wine dataset; (e) Convergence Curve of the proposed approach for Zoo dataset.
As can be observed in the graphs, B _ H B O achieved better outcomes than the other algorithms on almost every dataset, as its curves dominate theirs. B _ H B O also exhibits an increased convergence rate toward the optimal solutions; this can be noticed on the Diabetes, German, Iris, Lymphography, Vowel, WaveformEW, Wine, and Zoo datasets. Most of these high convergence rates are obtained on high-dimensional datasets.

5.5. Boxplot

The boxplot is used to further analyze the behavior of B _ H B O in terms of the different performance measures. Figure 6, Figure 7, Figure 8 and Figure 9 show boxplots of the accuracies achieved on all 20 benchmark datasets by the optimizers ALO, AOA, BSA, CSA, LFD, PSO, SMA, and TSA, and by the proposed method B _ H B O with k-NN. A boxplot comprises five elements: the minimum, maximum, median, first quartile ( Q 1 ) , and third quartile ( Q 3 ) of the data. The red line inside each box marks the median, representing the algorithm's classification accuracy. Compared with the other algorithms, the boxplots of B _ H B O lie higher.
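The five boxplot elements listed above are the five-number summary of each algorithm's accuracies over its runs. A short Python sketch (the accuracy sample is synthetic, for illustration only):

```python
import numpy as np

# Illustrative accuracies over 51 runs of one algorithm on one dataset.
rng = np.random.default_rng(0)
acc = rng.normal(loc=0.93, scale=0.01, size=51)

# The five elements drawn by a boxplot: min, Q1, median, Q3, max.
q1, med, q3 = np.percentile(acc, [25, 50, 75])
five_number = (acc.min(), q1, med, q3, acc.max())
```

A tight box (small Q3 − Q1) indicates a stable algorithm, while a higher median indicates better typical accuracy; both are the properties compared across optimizers in Figures 6 to 9.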
Figure 6. (a) Classification error of the proposed approach for Arrhythmia dataset; (b) Classification error of the proposed approach for Breastcancer dataset; (c) Classification error of the proposed approach for BreastEW dataset; (d) Classification error of the proposed approach for Congress dataset; (e) Classification error of the proposed approach for Diabetes dataset.
Figure 7. Boxplots of the results achieved by the B_HBO regarding classification error over German, Glass, Heart-C, Heart-StatLog, and Hepatitis datasets. (a) Classification error of the proposed approach for German dataset. (b) Classification error of the proposed approach for Glass dataset. (c) Classification error of the proposed approach for Heart-C dataset. (d) Classification error of the proposed approach for Heart-StatLog dataset. (e) Classification error of the proposed approach for Hepatitis dataset.
Figure 8. Boxplots of the results achieved by the B_HBO regarding classification error over Hillvalley, Ionosphere, Iris, Lung-Cancer, and Lymphography datasets. (a) Classification error of the proposed approach for Hillvalley dataset. (b) Classification error of the proposed approach for Ionosphere dataset. (c) Classification error of the proposed approach for Iris dataset. (d) Classification error of the proposed approach for Lung-Cancer dataset. (e) Classification error of the proposed approach for Lymphography dataset.
Figure 9. Boxplots of the results achieved by the B_HBO regarding classification error over Vowel, WaveformEW, WDBC, Wine, and Zoo datasets. (a) Classification error of the proposed approach for Vowel dataset. (b) Classification error of the proposed approach for WaveformEW dataset. (c) Classification error of the proposed approach for WDBC dataset. (d) Classification error of the proposed approach for Wine dataset. (e) Classification error of the proposed approach for Zoo dataset.
It is evident that B _ H B O has the lowest boxplot for fitness value on most tested datasets, especially the high-dimensional ones, except for four datasets (Arrhythmia, Hepatitis, Hillvalley, and Lymphography). Analyzing the boxplot results leads to the following points. First, the presented B _ H B O has a lower boxplot on 80% of the datasets. Second, for some datasets the boxplots indicate that the competing FS methods have nearly the same statistical description. Finally, most of the obtained results fall in the first quartile, indicating that the proposed B _ H B O obtains a small classification error.
We can conclude that B _ H B O with k-NN has the best boxplots for most datasets compared with the other algorithms, and that the median of the B _ H B O algorithm is higher. Depending on the dataset, the second-best algorithm is ALO.
Finally, it is clear that:
  • B _ H B O outperforms the Ant Lion Optimizer (ALO), Archimedes Optimization Algorithm (AOA), Backtracking Search Algorithm (BSA), Crow Search Algorithm (CSA), Levy flight distribution (LFD), Particle Swarm Optimization (PSO), Slime Mold Algorithm (SMA), and Tree Seed Algorithm (TSA).
  • Compared with these eight algorithms, the proposed method achieves higher classification accuracy, a smaller number of selected features, and better sensitivity and specificity.

5.6. The Wilcoxon Test

Statistical analysis is necessary to compare the efficiency of B _ H B O with that of the other competitive algorithms. Wilcoxon's test assesses the superiority of the presented B _ H B O over the other FS methods; its main aim is to determine whether there is a significant difference between B _ H B O (as the control group) and each of the tested FS methods. Wilcoxon's test is a pair-wise non-parametric statistical test with two hypotheses: the null hypothesis supposes there is no significant difference between B _ H B O and the other methods, and the alternative hypothesis assumes there is a significant difference. The alternative hypothesis is accepted if the p-value is less than 0.05. Table 10 shows the p-values obtained using Wilcoxon's rank-sum test for accuracy. The results show that B _ H B O differs significantly in accuracy from ALO, AOA, BSA, CSA, LFD, PSO, SMA, and TSA on 17 datasets; in most cases there is a significant difference from the other methods on more than 14 datasets. Following this criterion, B _ H B O outperforms all other algorithms to varying degrees, indicating that B _ H B O benefits from extensive exploitation. In general, B _ H B O is statistically significant against 85% of the algorithms. Therefore, we can conclude that B _ H B O has a high exploration capability to investigate the most promising regions of the search space and provides superior results compared with the competing algorithms.
Table 10. Wilcoxon rank-sum statistical test based on accuracy.
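The pair-wise comparison described above is a standard application of the Wilcoxon rank-sum test. A hedged Python sketch using `scipy.stats.ranksums` on synthetic per-run accuracies (the accuracy samples are illustrative, not the paper's results):

```python
import numpy as np
from scipy.stats import ranksums

# Illustrative accuracies over 51 independent runs for two algorithms.
rng = np.random.default_rng(0)
acc_bhbo = rng.normal(loc=0.95, scale=0.01, size=51)
acc_rival = rng.normal(loc=0.90, scale=0.02, size=51)

# Null hypothesis: both samples come from the same distribution.
stat, p = ranksums(acc_bhbo, acc_rival)
reject_null = p < 0.05   # significant difference at the 5% level
```

Because the test is non-parametric, it makes no normality assumption about the accuracy distributions, which is why it is preferred over a t-test for comparing stochastic optimizers.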

6. Conclusions

This paper proposes a new binary version of the heap-based optimizer (HBO), called B_HBO, to solve the FS problem. The experiments are applied to 20 benchmark datasets from the UCI repository, and several evaluation criteria are used to investigate the performance of the proposed algorithm. The experimental results revealed that the proposed algorithm achieves superior results compared with eight recent state-of-the-art algorithms: ALO, AOA, BSA, CSA, LFD, PSO, SMA, and TSA. Furthermore, the results proved that B_HBO selects the smallest number of features while achieving the best classification accuracy in a reasonable amount of time for most datasets, and that it exhibits a considerable advantage on significantly large datasets. Regarding average accuracy, sensitivity, specificity, and feature-subset size, B_HBO ranked first with the fewest selected features; after B_HBO, ALO and CSA ranked second in performance.

Author Contributions

All authors contributed equally to this paper. D.S.A.E.: supervision, methodology, conceptualization, formal analysis, writing (review and editing). F.R.P.P.: methodology, formal analysis, writing (review and editing), code implementation and execution. M.A.S.A.: software, formal analysis, resources, writing (original draft), supervision, conceptualization, methodology, writing (review and editing). All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia Project No. AN000565.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This work was supported by the Deanship of Scientific Research, Vice Presidency for Graduate Studies and Scientific Research, King Faisal University, Saudi Arabia [Project No. AN000565].

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ACO: Ant Colony Optimization
ALO: Ant Lion Optimization
AOA: Archimedes Optimization Algorithm
BALO: Binary Ant Lion Optimization
BCFA: Binary Clonal Flower Pollination Algorithm
BGOA: Binary Grasshopper Optimization Algorithm
BGSA: Binary Gravitational Search Algorithm
BGWO: Binary Gray Wolf Optimization
B_HBO: Binary Heap-Based Optimizer
BSA: Backtracking Search Algorithm
BSSO: Binary Swallow Swarm Optimization
BSHO: Binary Spotted Hyena Optimizer
BPSO: Binary Particle Swarm Optimization
BWOA: Binary Whale Optimization Algorithm
CSA: Crow Search Algorithm
EO: Equilibrium Optimizer
FOA: Forest Optimization Algorithm
FPA: Flower Pollination Algorithm
FS: Feature Selection
GA: Genetic Algorithm
GBO: Gradient-Based Optimizer
GSA: Gravitational Search Algorithm
GWO: Gray Wolf Optimizer
HBO: Heap-Based Optimizer
HGSO: Henry Gas Solubility Optimization Algorithm
LFD: Levy Flight Distribution
PSO: Particle Swarm Optimization
SCA: Sine Cosine Algorithm
SMA: Slime Mold Algorithm
SSA: Salp Swarm Algorithm
TSA: Tree Seed Algorithm
WOA: Whale Optimization Algorithm

References

  1. Zawbaa, H.M.; Emary, E.; Grosan, C. Feature selection via chaotic antlion optimization. PLoS ONE 2016, 11, e0150652.
  2. Faris, H.; Mafarja, M.M.; Heidari, A.A.; Aljarah, I.; Ala’M, A.Z.; Mirjalili, S.; Fujita, H. An efficient binary salp swarm algorithm with crossover scheme for feature selection problems. Knowl.-Based Syst. 2018, 154, 43–67.
  3. Huang, Y.; Jin, W.; Yu, Z.; Li, B. Supervised feature selection through Deep Neural Networks with pairwise connected structure. Knowl.-Based Syst. 2020, 204, 106202.
  4. Mafarja, M.; Aljarah, I.; Heidari, A.A.; Faris, H.; Fournier-Viger, P.; Li, X.; Mirjalili, S. Binary dragonfly optimization for feature selection using time-varying transfer functions. Knowl.-Based Syst. 2018, 161, 185–204.
  5. Zhang, J.; Hu, X.; Li, P.; He, W.; Zhang, Y.; Li, H. A hybrid feature selection approach by correlation-based filters and svm-rfe. In Proceedings of the 2014 22nd International Conference on Pattern Recognition, Stockholm, Sweden, 24–28 August 2014; pp. 3684–3689.
  6. Teng, X.; Dong, H.; Zhou, X. Adaptive feature selection using v-shaped binary particle swarm optimization. PLoS ONE 2017, 12, e0173907.
  7. Motoda, H.; Liu, H. Feature Selection, Extraction and Construction; Communication of IICM (Institute of Information and Computing Machinery Taiwan): Taiwan, 2002; Volume 5, p. 2.
  8. Talbi, E.G. Metaheuristics: From Design to Implementation; John Wiley & Sons: Hoboken, NJ, USA, 2009; Volume 74.
  9. Gnana, D.A.A.; Balamurugan, S.A.A.; Leavline, E.J. Literature review on feature selection methods for high-dimensional data. Int. J. Comput. Appl. 2016, 975, 8887.
  10. Hussien, A.G.; Hassanien, A.E.; Houssein, E.H.; Bhattacharyya, S.; Amin, M. S-shaped binary whale optimization algorithm for feature selection. In Recent Trends in Signal and Image Processing; Springer: Berlin/Heidelberg, Germany, 2019; pp. 79–87.
  11. Dhiman, G.; Kaur, A. Optimizing the design of airfoil and optical buffer problems using spotted hyena optimizer. Designs 2018, 2, 28.
  12. Oh, I.S.; Lee, J.S.; Moon, B.R. Hybrid genetic algorithms for feature selection. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 1424–1437.
  13. Kennedy, J.; Eberhart, R. Particle Swarm Optimisation. In Proceedings of the ICNN’95-International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995.
  14. Kennedy, J.; Eberhart, R.C. A discrete binary version of the particle swarm algorithm. In Proceedings of the 1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation, Orlando, FL, USA, 12–15 October 1997; Volume 5, pp. 4104–4108.
  15. Chakraborty, B. Feature subset selection by particle swarm optimization with fuzzy fitness function. In Proceedings of the 2008 3rd International Conference on Intelligent System and Knowledge Engineering, Xiamen, China, 17–19 November 2008; Volume 1, pp. 1038–1042.
  16. Wang, X.; Yang, J.; Teng, X.; Xia, W.; Jensen, R. Feature selection based on rough sets and particle swarm optimization. Pattern Recognit. Lett. 2007, 28, 459–471.
  17. Xue, B.; Zhang, M.; Browne, W.N. Particle swarm optimisation for feature selection in classification: Novel initialisation and updating mechanisms. Appl. Soft Comput. 2014, 18, 261–276.
  18. Aghdam, M.H.; Ghasem-Aghaee, N.; Basiri, M.E. Text feature selection using ant colony optimization. Expert Syst. Appl. 2009, 36, 6843–6853.
  19. Karaboga, D. An Idea Based on Honey Bee Swarm for Numerical Optimization; Technical Report; Citeseer: Princeton, NJ, USA, 2005.
  20. Rashedi, E.; Nezamabadi-Pour, H.; Saryazdi, S. BGSA: Binary gravitational search algorithm. Nat. Comput. 2010, 9, 727–745.
  21. Wang, J.; Hedar, A.R.; Wang, S.; Ma, J. Rough set and scatter search metaheuristic based feature selection for credit scoring. Expert Syst. Appl. 2012, 39, 6123–6128.
  22. Hashim, F.A.; Hussain, K.; Houssein, E.H.; Mabrouk, M.S.; Al-Atabany, W. Archimedes optimization algorithm: A new metaheuristic algorithm for solving optimization problems. Appl. Intell. 2021, 51, 1531–1551.
  23. Van Beek, P. Backtracking search algorithms. In Foundations of Artificial Intelligence; Elsevier: Amsterdam, The Netherlands, 2006; Volume 2, pp. 85–134.
  24. Abd Elminaam, D.S.; Nabil, A.; Ibraheem, S.A.; Houssein, E.H. An efficient marine predators algorithm for feature selection. IEEE Access 2021, 9, 60136–60153.
  25. Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
  26. Sharma, M.; Kaur, P. A Comprehensive Analysis of Nature-Inspired Meta-Heuristic Techniques for Feature Selection Problem. Arch. Comput. Methods Eng. 2021, 28, 1103–1127.
  27. Xue, Y.; Xue, B.; Zl, M. Self-Adaptive particle swarm optimization for large-scale feature selection in classification. ACM Trans. Knowl. Discov. Data 2019, 13, 50.
  28. Askari, Q.; Saeed, M.; Younas, I. Heap-based optimizer inspired by corporate rank hierarchy for global optimization. Expert Syst. Appl. 2020, 161, 113702.
  29. AbdElminaam, D.S.; Houssein, E.H.; Said, M.; Oliva, D.; Nabil, A. An efficient heap-based optimizer for parameters identification of modified photovoltaic models. Ain Shams Eng. J. 2022, 13, 101728.
  30. Elsayed, S.K.; Kamel, S.; Selim, A.; Ahmed, M. An improved heap-based optimizer for optimal reactive power dispatch. IEEE Access 2021, 9, 58319–58336.
  31. Zarshenas, A.; Suzuki, K. Binary coordinate ascent: An efficient optimization technique for feature subset selection for machine learning. Knowl.-Based Syst. 2016, 110, 191–201.
  32. Chuang, L.Y.; Tsai, S.W.; Yang, C.H. Improved binary particle swarm optimization using catfish effect for feature selection. Expert Syst. Appl. 2011, 38, 12699–12707.
  33. Zhang, Y.; Wang, S.; Phillips, P.; Ji, G. Binary PSO with mutation operator for feature selection using decision tree applied to spam detection. Knowl.-Based Syst. 2014, 64, 22–31.
  34. Mafarja, M.; Aljarah, I.; Faris, H.; Hammouri, A.I.; Ala’M, A.Z.; Mirjalili, S. Binary grasshopper optimisation algorithm approaches for feature selection problems. Expert Syst. Appl. 2019, 117, 267–286.
  35. Emary, E.; Zawbaa, H.M.; Hassanien, A.E. Binary ant lion approaches for feature selection. Neurocomputing 2016, 213, 54–65.
  36. Asgarnezhad, R.; Monadjemi, S.A.; Soltanaghaei, M. An application of MOGW optimization for feature selection in text classification. J. Supercomput. 2021, 77, 5806–5839.
  37. Neggaz, N.; Ewees, A.A.; Abd Elaziz, M.; Mafarja, M. Boosting salp swarm algorithm by sine cosine algorithm and disrupt operator for feature selection. Expert Syst. Appl. 2020, 145, 113103.
  38. Kumar, V.; Kaur, A. Binary spotted hyena optimizer and its application to feature selection. J. Ambient. Intell. Humaniz. Comput. 2020, 11, 2625–2645.
  39. Nakamura, R.Y.; Pereira, L.A.; Costa, K.A.; Rodrigues, D.; Papa, J.P.; Yang, X.S. BBA: A binary bat algorithm for feature selection. In Proceedings of the 2012 25th SIBGRAPI Conference on Graphics, Patterns and Images, Ouro Preto, Brazil, 22–25 August 2012; pp. 291–297.
  40. Mafarja, M.M.; Eleyan, D.; Jaber, I.; Hammouri, A.; Mirjalili, S. Binary dragonfly algorithm for feature selection. In Proceedings of the 2017 International Conference on New Trends in Computing Sciences (ICTCS), Amman, Jordan, 11–13 October 2017; pp. 12–17.
  41. Neggaz, N.; Houssein, E.H.; Hussain, K. An efficient henry gas solubility optimization for feature selection. Expert Syst. Appl. 2020, 152, 113364.
  42. Jiang, Y.; Luo, Q.; Wei, Y.; Abualigah, L.; Zhou, Y. An efficient binary Gradient-based optimizer for feature selection. Math. Biosci. Eng. 2021, 18, 3813–3854.
  43. Ouadfel, S.; Abd Elaziz, M. Enhanced crow search algorithm for feature selection. Expert Syst. Appl. 2020, 159, 113572.
  44. Chaudhuri, A.; Sahu, T.P. Feature selection using Binary Crow Search Algorithm with time varying flight length. Expert Syst. Appl. 2021, 168, 114288.
  45. Too, J.; Mirjalili, S. General learning equilibrium optimizer: A new feature selection method for biological data classification. Appl. Artif. Intell. 2021, 35, 247–263.
  46. Hamidzadeh, J.; Kelidari, M. Feature selection by using chaotic cuckoo optimization algorithm with levy flight, opposition-based learning and disruption operator. Soft Comput. 2021, 25, 2911–2933.
  47. Sayed, S.A.F.; Nabil, E.; Badr, A. A binary clonal flower pollination algorithm for feature selection. Pattern Recognit. Lett. 2016, 77, 21–27.
  48. Moorthy, U.; Gandhi, U.D. Forest optimization algorithm-based feature selection using classifier ensemble. Comput. Intell. 2020, 36, 1445–1462.
  49. Hodashinsky, I.; Sarin, K.; Shelupanov, A.; Slezkin, A. Feature selection based on swallow swarm optimization for fuzzy classification. Symmetry 2019, 11, 1423.
  50. Ghosh, M.; Guha, R.; Alam, I.; Lohariwal, P.; Jalan, D.; Sarkar, R. Binary genetic swarm optimization: A combination of GA and PSO for feature selection. J. Intell. Syst. 2019, 29, 1598–1610.
  51. Liu, M.K.; Tran, M.Q.; Weng, P.Y. Fusion of vibration and current signatures for the fault diagnosis of induction machines. Shock Vib. 2019, 2019, 7176482.
  52. Tran, M.Q.; Elsisi, M.; Liu, M.K. Effective feature selection with fuzzy entropy and similarity classifier for chatter vibration diagnosis. Measurement 2021, 184, 109962.
  53. Tran, M.Q.; Li, Y.C.; Lan, C.Y.; Liu, M.K. Wind Farm Fault Detection by Monitoring Wind Speed in the Wake Region. Energies 2020, 13, 6559.
  54. Aljarah, I.; Habib, M.; Faris, H.; Al-Madi, N.; Heidari, A.A.; Mafarja, M.; Elaziz, M.A.; Mirjalili, S. A dynamic locality multi-objective salp swarm algorithm for feature selection. Comput. Ind. Eng. 2020, 147, 106628.
  55. Alweshah, M.; Khalaileh, S.A.; Gupta, B.B.; Almomani, A.; Hammouri, A.I.; Al-Betar, M.A. The monarch butterfly optimization algorithm for solving feature selection problems. Neural Comput. Appl. 2020.
  56. Arora, S.; Anand, P. Binary butterfly optimization approaches for feature selection. Expert Syst. Appl. 2019, 116, 147–160.
  57. Gao, Y.; Zhou, Y.; Luo, Q. An Efficient Binary Equilibrium Optimizer Algorithm for Feature Selection. IEEE Access 2020, 8, 140936–140963.
  58. Ghosh, K.K.; Singh, P.K.; Hong, J.; Geem, Z.W.; Sarkar, R. Binary social mimic optimization algorithm with X-shaped transfer function for feature selection. IEEE Access 2020, 8, 97890–97906.
  59. Ghosh, K.K.; Ahmed, S.; Singh, P.K.; Geem, Z.W.; Sarkar, R. Improved binary sailfish optimizer based on adaptive β-Hill climbing for feature selection. IEEE Access 2020, 8, 83548–83560.
  60. Guha, R.; Ghosh, M.; Kapri, S.; Shaw, S.; Mutsuddi, S.; Bhateja, V.; Sarkar, R. Deluge based Genetic Algorithm for feature selection. Evol. Intell. 2021, 14, 357–367.
  61. Hammouri, A.I.; Mafarja, M.; Al-Betar, M.A.; Awadallah, M.A.; Abu-Doush, I. An improved Dragonfly Algorithm for feature selection. Knowl.-Based Syst. 2020, 203, 106131.
  62. Han, C.; Zhou, G.; Zhou, Y. Binary Symbiotic Organism Search Algorithm for Feature Selection and Analysis. IEEE Access 2019, 7, 166833–166859.
  63. Nouri-Moghaddam, B.; Ghazanfari, M.; Fathian, M. A novel multi-objective forest optimization algorithm for wrapper feature selection. Expert Syst. Appl. 2021, 175, 114737.
  64. Yan, C.; Ma, J.; Luo, H.; Patel, A. Hybrid binary Coral Reefs Optimization algorithm with Simulated Annealing for Feature Selection in high-dimensional biomedical datasets. Chemom. Intell. Lab. Syst. 2019, 184, 102–111.
  65. Alweshah, M.; Alkhalaileh, S.; Albashish, D.; Mafarja, M.; Bsoul, Q.; Dorgham, O. A hybrid mine blast algorithm for feature selection problems. Soft Comput. 2021, 25, 517–534.
  66. Anand, P.; Arora, S. A novel chaotic selfish herd optimizer for global optimization and feature selection. Artif. Intell. Rev. 2020, 53, 1441–1486. [Google Scholar] [CrossRef]
  67. Anter, A.M.; Ali, M. Feature selection strategy based on hybrid crow search optimization algorithm integrated with chaos theory and fuzzy c-means algorithm for medical diagnosis problems. Soft Comput. 2020, 24, 1565–1584. [Google Scholar] [CrossRef]
  68. Qasim, O.S.; Al-Thanoon, N.A.; Algamal, Z.Y. Feature selection based on chaotic binary black hole algorithm for data classification. Chemom. Intell. Lab. Syst. 2020, 204, 104104. [Google Scholar] [CrossRef]
  69. Ahmady, G.A.; Mehrpour, M.; Nikooravesh, A. Organizational structure. Procedia-Soc. Behav. Sci. 2016, 230, 455–462. [Google Scholar] [CrossRef]
  70. Dheeru, D.; Karra Taniskidou, E. UCI Machine Learning Repository; Irvine, School of Information and Computer Sciences, University of California: Irvine, CA, USA, 2017. [Google Scholar]
  71. Jiao, Y.; Du, P. Performance measures in evaluating machine learning based bioinformatics predictors for classifications. Quant. Biol. 2016, 4, 320–330. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
